
Ingesting Streaming Data for Analysis in Apache Ignite (StreamSets theme)

Apache Ignite provides a distributed platform for a wide variety of workloads, but often the issue is simply getting data into the database in the first place. The wide variety of data sources and formats presents a challenge to any data engineer; in addition, 'data drift', the constant and inevitable mutation of the incoming data's structure and semantics, can break even the most well-engineered integration.

This session, aimed at data architects, data engineers and developers, will explore how we can use the open source StreamSets Data Collector to build robust data pipelines. Attendees will learn how to collect data from cloud platforms such as Amazon and Salesforce, devices, relational databases and other sources, continuously stream it to Ignite, and then use features such as Ignite's continuous queries to perform streaming analysis.

We'll start by covering the basics of reading files from disk, move on to relational databases, then look at more challenging sources such as APIs and message queues. You will learn how to:
* Build data pipelines to ingest a wide variety of data into Apache Ignite
* Anticipate and manage data drift to ensure that data keeps flowing
* Perform simple and complex ad-hoc queries in Ignite via SQL
* Write applications using Ignite to run continuous queries, combining data from multiple sources

Transcript: Ingesting Streaming Data for Analysis in Apache Ignite (StreamSets theme)

  1. Ingesting Bulk and Streaming Data for Analysis in Apache Ignite. Pat Patterson, pat@streamsets.com, @metadaddy
  2. Agenda: Product Support Use Case; Continuous Queries in Apache Ignite; Integrating StreamSets Data Collector with Apache Ignite; Demo; Wrap-up
  3. Who Am I? Pat Patterson / pat@streamsets.com / @metadaddy. Past: Sun Microsystems, Salesforce. Present: Director of Evangelism, StreamSets. I run far.
  4. Who is StreamSets? Seasoned leadership team. Customer base from Global 8000: 50%. Unique commercial downloaders: 2,000+. Open source downloads worldwide: 3,000,000+. Broad connectivity: 50+. History of innovation.
  5. Use Case: Product Support. A customer service platform (SaaS) holds support ticket status and assignment to support engineers. An HR system (relational database) holds the reporting structure. A device monitoring system (flat delimited files) provides fault data. How do we discover which director's team has the most high-priority tickets? How do we get notifications of faults for high-priority tickets?
  6. Continuous Queries (1). Enable you to listen to data modifications occurring on Ignite caches. Specify an optional initial query, a remote filter, and a local listener. The initial query can be any type: Scan, SQL, or TEXT. The remote filter executes on primary and backup nodes.
  7. Continuous Queries (2). The local listener executes in your app's JVM. Can use BinaryObjects for generic, performant code. Can use ContinuousQueryWithTransformer to run a remote transformer and restrict results to a subset of the available fields.
  8. Continuous Query with Binary Objects - Setup

      // Get a cache object
      IgniteCache<Object, BinaryObject> cache = ignite.cache(cacheName).withKeepBinary();

      // Create a continuous query
      ContinuousQuery<Object, BinaryObject> qry = new ContinuousQuery<>();

      // Set an initial query - match a field value
      qry.setInitialQuery(new ScanQuery<>((IgniteBiPredicate<Object, BinaryObject>) (key, val) -> {
          System.out.println("### applying initial query predicate");
          return val.field(filterFieldName).toString().equals("fieldValue");
      }));

      // Filter the cache updates
      qry.setRemoteFilterFactory(() -> event -> {
          System.out.println("### evaluating cache entry event filter");
          return event.getValue().field(filterFieldName).toString().equals("fieldValue");
      });
  9. Continuous Query with Binary Objects - Listener

      // Process notifications
      qry.setLocalListener((evts) -> {
          for (CacheEntryEvent<? extends Object, ? extends BinaryObject> e : evts) {
              Object key = e.getKey();
              BinaryObject newValue = e.getValue();
              System.out.println("Cache entry with ID: " + e.getKey() + " was "
                  + e.getEventType().toString().toLowerCase());
              BinaryObject oldValue = (e.isOldValueAvailable()) ? e.getOldValue() : null;
              processChange(key, oldValue, newValue);
          }
      });
  10. Continuous Query with Binary Objects - Run the Query

      // Run the continuous query
      try (QueryCursor<Cache.Entry<Object, BinaryObject>> cur = cache.query(qry)) {
          // Iterate over existing cache data
          for (Cache.Entry<Object, BinaryObject> e : cur) {
              processRecord(e.getKey(), e.getValue());
          }

          // Sleep until killed
          boolean done = false;
          while (!done) {
              try {
                  Thread.sleep(1000);
              } catch (InterruptedException e) {
                  done = true;
              }
          }
      }

      (A sketch after the transcript shows how a cache put triggers this query's filter and listener.)
  11. Demo: Continuous Query Basics
  12. Continuous Query with Transformer

      ContinuousQueryWithTransformer<Object, BinaryObject, String> qry =
          new ContinuousQueryWithTransformer<>();

      // Transform result - executes remotely
      qry.setRemoteTransformerFactory(() -> event -> {
          System.out.println("### applying transformation");
          return event.getValue().field("fieldname").toString();
      });

      // Process notifications - executes locally
      qry.setLocalListener((values) -> {
          for (String value : values) {
              System.out.println(transformerField + ": " + value);
          }
      });

      (See the sketch after the transcript for running this query.)
  13. Demo: Continuous Query Transformers
  14. Key Learnings
      * Need to enable peer class loading so that the app can send code to execute on remote nodes:
        <property name="peerClassLoadingEnabled" value="true"/>
      * By default, CREATE TABLE City ... in SQL means you have to use ignite.cache("SQL_PUBLIC_CITY");
        override the cache name when creating the table with CACHE_NAME=City.
      * Binary Objects make your life simpler and faster.
      * RTFM! CacheContinuousQueryExample.java is very helpful!
      (A configuration sketch follows the transcript.)
  15. The StreamSets DataOps Platform (diagram: Data Lake)
  16. A Swiss Army Knife for Data
  17. StreamSets Data Collector and Ignite

      SELECT c.case_number, f.serial_number, f.fault
      FROM Case c
      RIGHT JOIN Fault f ON c.serial_number = f.serial_number;
  18. Ignite JDBC Driver
      * org.apache.ignite.IgniteJdbcThinDriver
      * JDBC Thin Driver - connects to a cluster node; lightweight!
      * Located in ignite-core-2.7.0.jar
      * JDBC URL of the form jdbc:ignite:thin://hostname
      * NOTE: table and column names must be UPPERCASE! (IGNITE-9730)
      * There is also a JDBC Client Driver, which starts its own client node
      (A connection sketch follows the transcript.)
  19. Demo: Ingest from CSV
  20. Data Architecture (diagram: MySQL, Salesforce)
  21. Demo: Ingest Kafka, MySQL, Salesforce
  22. But… IGNITE-9606 breaks JDBC integrations 🙁
      metadata.getPrimaryKeys() returns _KEY as the column name and returns the column name
      (e.g. ID) as the primary key name. Fixed in the forthcoming 2.8.0 release. 🙁
      Had to include an ugly workaround to get my demo working:

      ResultSet result = metadata.getPrimaryKeys(connection.getCatalog(), schema, table);
      while (result.next()) {
          // Workaround for Ignite bug
          String pk = result.getString(COLUMN_NAME);
          if ("_KEY".equals(pk)) {
              pk = result.getString(PK_NAME);
          }
          keys.add(pk);
      }
  23. Continuous Streaming Application. Listen for high-priority service tickets; get the last 10 critical/error sensor readings for the affected device.
  24. Continuous Streaming Application

      // SQL query to run with serial number
      SqlFieldsQuery sql = new SqlFieldsQuery(
          "SELECT timestamp, fault FROM fault WHERE serial_number = ? " +
          "AND fault IN ('Critical', 'Error') ORDER BY timestamp DESC LIMIT 10"
      );

      // Process notifications - executes locally
      qry.setLocalListener((values) -> {
          for (String serialNumber : values) {
              System.out.println("Device serial number " + serialNumber);
              System.out.println("Last 10 faults:");
              QueryCursor<List<?>> query = faultCache.query(sql.setArgs(serialNumber));
              for (List<?> result : query.getAll()) {
                  System.out.println(result.get(0) + " | " + result.get(1));
              }
          }
      });
  25. Demo: Ignite Streaming App
  26. Other Common StreamSets Use Cases: Data Lake, Replatforming, IoT, Cybersecurity, Real-time applications
  27. Customer Success
      * "StreamSets allowed us to build and operate over 175,000 pipelines and synchronize 97% of our structured data in R&D to our data lake within 4 months. This will save us billions of dollars."
      * "StreamSets strikes the perfect balance between ease-of-use and extensibility for users with varying coding skills. E*Trade has been able to unify all security log data in a single location for all cybersecurity needs."
      * "StreamSets has dramatically improved time to analysis and reliability of our data science efforts around revenue, quality control and ad inventory for our websites."
  28. Conclusion
      * Ignite's continuous queries provide a robust notification mechanism for acting on changing data.
      * Ignite's thin JDBC driver is still a work in progress, but promises to be a useful, lightweight alternative to the older client driver.
      * StreamSets Data Collector can read data from a wide variety of sources and write to Ignite via the JDBC driver.
  29. References
      * Apache Ignite Continuous Queries: apacheignite.readme.io/docs/continuous-queries
      * Apache Ignite JDBC Driver: apacheignite-sql.readme.io/docs/jdbc-driver
      * Download StreamSets: streamsets.com/opensource
      * StreamSets Community: streamsets.com/community
  30. Thank You! Pat Patterson, pat@streamsets.com, @metadaddy
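
A minimal sketch of how an update reaches the continuous query from slides 8-10, assuming the ignite, cache and filterFieldName variables from those slides; the "Ticket" type name, the "priority" field and the key value are hypothetical. Putting an entry whose filter field matches causes the remote filter to pass and the local listener to fire.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.binary.BinaryObject;

    public class TriggerListenerSketch {
        // Call after cache.query(qry) has opened the cursor (slide 10).
        static void putMatchingEntry(Ignite ignite, IgniteCache<Object, BinaryObject> cache,
                                     String filterFieldName) {
            BinaryObject ticket = ignite.binary().builder("Ticket")  // hypothetical type name
                .setField(filterFieldName, "fieldValue")             // matches the remote filter predicate
                .setField("priority", "High")                        // hypothetical extra field
                .build();

            // The put is evaluated by the remote filter on the primary/backup node;
            // if the filter returns true, the event is delivered to the local listener.
            cache.put(42, ticket);
        }
    }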
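
Slide 12 configures the transformer query but stops before starting it. A minimal sketch, assuming the cache and qry objects from that slide: opening the cursor registers the query, and the local listener receives transformed values until the cursor is closed.

    import javax.cache.Cache;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.binary.BinaryObject;
    import org.apache.ignite.cache.query.ContinuousQueryWithTransformer;
    import org.apache.ignite.cache.query.QueryCursor;

    public class RunTransformerQuerySketch {
        static void run(IgniteCache<Object, BinaryObject> cache,
                        ContinuousQueryWithTransformer<Object, BinaryObject, String> qry)
                throws InterruptedException {
            // Opening the cursor registers the continuous query with the cluster.
            try (QueryCursor<Cache.Entry<Object, BinaryObject>> cur = cache.query(qry)) {
                // Transformed values arrive in the local listener while the cursor is open.
                Thread.sleep(60_000); // listen for a minute, then cancel the query
            }
        }
    }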
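
A minimal configuration sketch for the key learnings on slide 14, assuming a node started from code rather than the XML shown there; peer class loading is what lets the remote filter and transformer lambdas be shipped to server nodes.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class PeerClassLoadingSketch {
        public static void main(String[] args) {
            // Programmatic equivalent of <property name="peerClassLoadingEnabled" value="true"/>.
            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setPeerClassLoadingEnabled(true);

            try (Ignite ignite = Ignition.start(cfg)) {
                // A plain "CREATE TABLE City ..." backs the table with a cache named
                // SQL_PUBLIC_CITY; adding WITH "CACHE_NAME=City" lets you call
                // ignite.cache("City") instead.
                System.out.println(ignite.cacheNames());
            }
        }
    }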
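
A minimal sketch of connecting through the thin JDBC driver from slide 18, assuming a node on localhost with the default client port and a hypothetical FAULT table; note the uppercase identifiers per IGNITE-9730.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class IgniteThinJdbcSketch {
        public static void main(String[] args) throws Exception {
            // Register the thin driver shipped in ignite-core-2.7.0.jar.
            Class.forName("org.apache.ignite.IgniteJdbcThinDriver");

            try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://localhost");
                 Statement stmt = conn.createStatement()) {

                // Hypothetical table for fault data, with the cache name overridden (slide 14).
                stmt.executeUpdate(
                    "CREATE TABLE IF NOT EXISTS FAULT (" +
                    " ID BIGINT PRIMARY KEY, SERIAL_NUMBER VARCHAR, FAULT VARCHAR)" +
                    " WITH \"CACHE_NAME=Fault\"");

                // Table and column names are treated as UPPERCASE (IGNITE-9730).
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT SERIAL_NUMBER, FAULT FROM FAULT LIMIT 10")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " | " + rs.getString(2));
                    }
                }
            }
        }
    }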
