1© StreamSets, Inc. All rights reserved.
Ingesting Bulk and Streaming Data
for Analysis in Apache Ignite
Pat Patterson
pat@streamsets.com
@metadaddy
Agenda
Product Support Use Case
Continuous Queries in Apache Ignite
Integrating StreamSets Data Collector with Apache Ignite
Demo
Wrap-up
Who Am I?
Pat Patterson / pat@streamsets.com / @metadaddy
Past: Sun Microsystems, Salesforce
Present: Director of Evangelism, StreamSets
I run far 
Who is StreamSets?
Seasoned leadership team
History of innovation
Customer base from Global 8000: 50%
Unique commercial downloaders: 2000+
Open source downloads worldwide: 3,000,000+
Broad connectivity: 50+
Use Case: Product Support
Customer service platform (SaaS) holds support ticket status,
assignment to support engineers
HR system (relational database) holds reporting structure
Device monitoring system (flat delimited files) provides fault data
How do we discover which director’s team has the most high-priority tickets?
How do we get notifications of faults for high-priority tickets?
Continuous Queries (1)
Enable you to listen to data modifications occurring on Ignite caches
Specify an optional initial query, a remote filter, and a local listener
Initial query can be any type: Scan, SQL, or Text
Remote filter executes on primary and backup nodes
Continuous Queries (2)
Local listener executes in your app’s JVM
Can use BinaryObjects for generic, performant code
Can use ContinuousQueryWithTransformer to run a remote transformer
• Restrict results to a subset of the available fields
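The three moving parts described above (initial query, filter, listener) can be pictured with a toy, single-JVM model. This is only a sketch and not Ignite's API: `ToyCache`, `query`, and `put` are hypothetical names, one predicate stands in for both the initial query and the remote filter, and everything runs locally instead of on cluster nodes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Consumer;

// Toy model of the continuous query pattern (hypothetical, NOT Ignite's API).
class ToyCache<K, V> {
    private final Map<K, V> data = new LinkedHashMap<>();
    private final List<BiPredicate<K, V>> filters = new ArrayList<>();
    private final List<Consumer<Map.Entry<K, V>>> listeners = new ArrayList<>();

    // Returns the entries that already match (the "initial query"), then
    // registers the filter/listener pair for future updates.
    List<Map.Entry<K, V>> query(BiPredicate<K, V> filter, Consumer<Map.Entry<K, V>> listener) {
        List<Map.Entry<K, V>> initial = new ArrayList<>();
        for (Map.Entry<K, V> e : data.entrySet()) {
            if (filter.test(e.getKey(), e.getValue())) {
                initial.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            }
        }
        filters.add(filter);
        listeners.add(listener);
        return initial;
    }

    // Every update is offered to each registered filter; only matches reach the listener.
    void put(K key, V value) {
        data.put(key, value);
        for (int i = 0; i < filters.size(); i++) {
            if (filters.get(i).test(key, value)) {
                listeners.get(i).accept(new AbstractMap.SimpleImmutableEntry<>(key, value));
            }
        }
    }
}
```

In this model, entries written before `query` is called come back from the call itself, while later writes arrive through the listener, which mirrors the split between iterating the cursor and receiving notifications.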
Continuous Query with Binary Objects - Setup
// Get a cache object
IgniteCache<Object, BinaryObject> cache = ignite.cache(cacheName).withKeepBinary();

// Create a continuous query
ContinuousQuery<Object, BinaryObject> qry = new ContinuousQuery<>();

// Set an initial query - match a field value
qry.setInitialQuery(new ScanQuery<>((IgniteBiPredicate<Object, BinaryObject>) (key, val) -> {
    System.out.println("### applying initial query predicate");
    return val.field(filterFieldName).toString().equals("fieldValue");
}));

// Filter the cache updates
qry.setRemoteFilterFactory(() -> event -> {
    System.out.println("### evaluating cache entry event filter");
    return event.getValue().field(filterFieldName).toString().equals("fieldValue");
});
Continuous Query with Binary Objects - Listener
// Process notifications
qry.setLocalListener((evts) -> {
    for (CacheEntryEvent<? extends Object, ? extends BinaryObject> e : evts) {
        Object key = e.getKey();
        BinaryObject newValue = e.getValue();
        System.out.println("Cache entry with ID: " + key +
            " was " + e.getEventType().toString().toLowerCase());
        // The old value is only present when the event carries one
        BinaryObject oldValue = (e.isOldValueAvailable()) ? e.getOldValue() : null;
        processChange(key, oldValue, newValue);
    }
});
Continuous Query with Binary Objects – Run the Query
// Run the continuous query
try (QueryCursor<Cache.Entry<Object, BinaryObject>> cur = cache.query(qry)) {
    // Iterate over existing cache data
    for (Cache.Entry<Object, BinaryObject> e : cur) {
        processRecord(e.getKey(), e.getValue());
    }

    // Sleep until killed; the query stays registered while the cursor is open
    boolean done = false;
    while (!done) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before leaving the loop
            Thread.currentThread().interrupt();
            done = true;
        }
    }
}
Demo: Continuous Query Basics
Continuous Query with Transformer
ContinuousQueryWithTransformer<Object, BinaryObject, String> qry =
    new ContinuousQueryWithTransformer<>();

// Transform result - executes remotely
qry.setRemoteTransformerFactory(() -> event -> {
    System.out.println("### applying transformation");
    // transformerField holds the name of the single field to send back
    return event.getValue().field(transformerField).toString();
});

// Process notifications - executes locally
qry.setLocalListener((values) -> {
    for (String value : values) {
        System.out.println(transformerField + ": " + value);
    }
});
Demo: Continuous Query Transformers
Key Learnings
Need to enable peer class loading so that the app can send code to execute on remote nodes:
<property name="peerClassLoadingEnabled" value="true"/>
By default, CREATE TABLE City … in SQL means you have to use ignite.cache("SQL_PUBLIC_CITY")
Override by adding WITH "cache_name=city" to the CREATE TABLE statement
Binary Objects make your life simpler and faster
RTFM! CacheContinuousQueryExample.java is very helpful!
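The default cache name follows the pattern SQL_<SCHEMA>_<TABLE>. A minimal sketch of that rule, assuming unquoted identifiers (which are uppercased); `SqlCacheNames` and `defaultCacheName` are hypothetical helper names, not part of Ignite:

```java
import java.util.Locale;

// Hypothetical helper that reproduces the default naming rule for
// caches backing tables created through SQL DDL.
class SqlCacheNames {
    static String defaultCacheName(String schema, String table) {
        // Unquoted SQL identifiers are treated as uppercase
        return "SQL_" + schema.toUpperCase(Locale.ROOT)
            + "_" + table.toUpperCase(Locale.ROOT);
    }
}
```

For the City table in the default PUBLIC schema, `defaultCacheName("PUBLIC", "City")` gives the name that would be passed to `ignite.cache(...)`.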
The StreamSets DataOps Platform
Data Lake
A Swiss Army Knife for Data
StreamSets Data Collector and Ignite
SELECT c.case_number, f.serial_number, f.fault
FROM Case c RIGHT JOIN Fault f
ON c.serial_number = f.serial_number;
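The RIGHT JOIN keeps every fault row, even when no support case exists for that device. A plain-Java sketch of that behaviour, under the simplifying assumption of at most one case per serial number; `RightJoinSketch` and the sample data shapes are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of Case RIGHT JOIN Fault.
class RightJoinSketch {
    // caseBySerial: serial_number -> case_number
    // faults: rows of {serial_number, fault}
    static List<String> rightJoin(Map<String, String> caseBySerial, List<String[]> faults) {
        List<String> rows = new ArrayList<>();
        for (String[] f : faults) {
            // null when no case matches: the fault row is still emitted
            String caseNumber = caseBySerial.get(f[0]);
            rows.add(caseNumber + " | " + f[0] + " | " + f[1]);
        }
        return rows;
    }
}
```

A fault with no matching case shows up with a null case number, which is how unticketed device faults remain visible in the result.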
Ignite JDBC Driver
org.apache.ignite.IgniteJdbcThinDriver
JDBC Thin Driver - connects to cluster node, lightweight!
Located in ignite-core-2.7.0.jar
JDBC URL of form: jdbc:ignite:thin://hostname
NOTE – table, column names must be UPPERCASE!!! - IGNITE-9730
Also JDBC Client Driver – starts its own client node
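A small sketch of how a client might prepare the thin-driver URL and uppercase its identifiers before building SQL. The driver class name and URL form come from the slide; `IgniteJdbcSketch`, `thinUrl`, `upper`, and `insertSql` are hypothetical helpers, and nothing here opens a connection.

```java
import java.util.Locale;

// Hypothetical helpers for working with the Ignite JDBC Thin Driver.
class IgniteJdbcSketch {
    static final String THIN_DRIVER = "org.apache.ignite.IgniteJdbcThinDriver";

    // JDBC URL of the form jdbc:ignite:thin://hostname
    static String thinUrl(String hostname) {
        return "jdbc:ignite:thin://" + hostname;
    }

    // Table and column names are uppercased to sidestep the case issue
    static String upper(String identifier) {
        return identifier.toUpperCase(Locale.ROOT);
    }

    // Builds a parameterized INSERT with uppercase identifiers
    static String insertSql(String table, String... columns) {
        StringBuilder cols = new StringBuilder();
        StringBuilder params = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                cols.append(", ");
                params.append(", ");
            }
            cols.append(upper(columns[i]));
            params.append("?");
        }
        return "INSERT INTO " + upper(table) + " (" + cols + ") VALUES (" + params + ")";
    }
}
```

With a reachable node and the Ignite jar on the classpath, the URL would be passed to `java.sql.DriverManager.getConnection(...)` and the generated statement to `Connection.prepareStatement(...)`.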
Demo: Ingest from CSV
Data Architecture
MySQL
Salesforce
Demo: Ingest Kafka, MySQL, Salesforce
But…
IGNITE-9606 breaks JDBC integrations 🙁
metadata.getPrimaryKeys() returns _KEY as the column name, and returns the actual column name (e.g. ID) as the primary key name
Fixed in forthcoming 2.8.0 release 🙁
Had to include ugly workaround to get my demo working:
ResultSet result = metadata.getPrimaryKeys(connection.getCatalog(), schema, table);
while (result.next()) {
    // Workaround for Ignite bug: the real column name arrives in PK_NAME
    String pk = result.getString(COLUMN_NAME);
    if ("_KEY".equals(pk)) {
        pk = result.getString(PK_NAME);
    }
    keys.add(pk);
}
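The same fallback can be exercised without a database by standing in for the metadata rows with maps. This is a sketch only: `PrimaryKeyWorkaround` is a hypothetical class, and each map imitates one row of the `getPrimaryKeys()` result using the standard JDBC column labels COLUMN_NAME and PK_NAME.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, database-free version of the primary key workaround.
class PrimaryKeyWorkaround {
    static List<String> primaryKeys(List<Map<String, String>> rows) {
        List<String> keys = new ArrayList<>();
        for (Map<String, String> row : rows) {
            String pk = row.get("COLUMN_NAME");
            // Affected drivers report _KEY here and put the real name in PK_NAME
            if ("_KEY".equals(pk)) {
                pk = row.get("PK_NAME");
            }
            keys.add(pk);
        }
        return keys;
    }
}
```

Rows from a driver that reports the column name correctly pass through unchanged, so the check should be harmless once the fix is available.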
Continuous Streaming Application
Listen for high-priority service tickets
Get the last 10 critical/error sensor readings for the affected device
Continuous Streaming Application
// SQL query to run with serial number
SqlFieldsQuery sql = new SqlFieldsQuery(
    "SELECT timestamp, fault FROM fault " +
    "WHERE serial_number = ? AND fault IN ('Critical', 'Error') " +
    "ORDER BY timestamp DESC LIMIT 10"
);

// Process notifications - executes locally
qry.setLocalListener((values) -> {
    for (String serialNumber : values) {
        System.out.println("Device serial number " + serialNumber);
        System.out.println("Last 10 faults:");
        // Bind the serial number received from the remote transformer
        QueryCursor<List<?>> query = faultCache.query(sql.setArgs(serialNumber));
        for (List<?> result : query.getAll()) {
            System.out.println(result.get(0) + " | " + result.get(1));
        }
    }
});
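What the SQL query selects can be spelled out in plain Java over an in-memory list. This sketch only mirrors the WHERE, ORDER BY, and LIMIT clauses; `FaultReading` and `LastFaults` are hypothetical types, and the timestamp is assumed to be a long.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical row type standing in for the fault table.
class FaultReading {
    final String serialNumber;
    final long timestamp;
    final String fault;

    FaultReading(String serialNumber, long timestamp, String fault) {
        this.serialNumber = serialNumber;
        this.timestamp = timestamp;
        this.fault = fault;
    }
}

class LastFaults {
    static List<FaultReading> lastFaults(List<FaultReading> all, String serial, int limit) {
        List<FaultReading> matches = new ArrayList<>();
        for (FaultReading r : all) {
            // WHERE serial_number = ? AND fault IN ('Critical', 'Error')
            boolean severe = r.fault.equals("Critical") || r.fault.equals("Error");
            if (r.serialNumber.equals(serial) && severe) {
                matches.add(r);
            }
        }
        // ORDER BY timestamp DESC
        matches.sort(Comparator.comparingLong((FaultReading r) -> r.timestamp).reversed());
        // LIMIT n
        return matches.size() > limit ? new ArrayList<>(matches.subList(0, limit)) : matches;
    }
}
```

In the real application the cache runs this selection, so only the matching rows travel back to the listener.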
Demo: Ignite Streaming App
Other Common StreamSets Use Cases
Data Lake Replatforming
IoT
Cybersecurity
Real-time applications
Customer Success
“StreamSets allowed us to build and operate over 175,000 pipelines and synchronize 97% of our structured data in R&D to our data lake within 4 months. This will save us billions of dollars.”
“StreamSets strikes the perfect balance between ease-of-use and extensibility for users with varying coding skills. E*Trade has been able to unify all security log data in a single location for all cybersecurity needs.”
“StreamSets has dramatically improved time to analysis and reliability of our data science efforts around revenue, quality control and ad inventory for our websites.”
Conclusion
Ignite’s continuous queries provide a robust notification mechanism for acting on changing data
Ignite’s Thin JDBC driver is still a work in progress, but promises to be a useful, lightweight alternative to the older Client Driver
StreamSets Data Collector can read data from a wide variety of sources and write to Ignite via the JDBC driver
References
Apache Ignite Continuous Queries
apacheignite.readme.io/docs/continuous-queries
Apache Ignite JDBC Driver
apacheignite-sql.readme.io/docs/jdbc-driver
Download StreamSets
streamsets.com/opensource
StreamSets Community
streamsets.com/community
Thank you
Pat Patterson
pat@streamsets.com
@metadaddy


Editor's notes

  1. Companies across industries use StreamSets for a broad range of technical use cases. Customers use StreamSets for data lake replatforming, simplifying migration to Hadoop, rationalization of data warehouse spend, and the building of hybrid cloud architectures. StreamSets is also used for a broad range of IoT use cases in which it enables the streaming of data from IoT devices, ingest and compute at the edge, and predictive maintenance for connected devices. Customers also use StreamSets for cybersecurity, ingesting from new data sources like network systems and endpoints to detect advanced persistent threats and improve forensics. And finally, StreamSets is used to build and enable real-time applications, mixing stream processing and batch loads into converged workloads to modernize existing applications.