Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job

For many businesses, the batch-oriented architecture of Big Data, where data is captured in large, scalable stores and processed later, is simply too slow. A new breed of “Fast Data” architectures has evolved to be stream-oriented: data is processed as it arrives, which gives businesses a competitive advantage.

There are many stream processing tools, so which ones should you choose? It helps to consider several factors in the context of your applications:

* Low latency: How low (or high) is needed?
* High volume: How much volume must be handled?
* Integration with other tools: Which ones and how?
* Data processing: What kinds? In bulk? As individual events?

In this talk by Dean Wampler, Ph.D., VP of Fast Data Engineering at Lightbend, we’ll look at the criteria you need to consider when selecting technologies, plus specific examples of how four streaming tools (Akka Streams, Kafka Streams, Apache Flink, and Apache Spark) serve particular needs and use cases when working with continuous streams of data.

Published in: Software

Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job

  1. 1. Check out these resources: Dean’s book, webinars, etc. Fast Data Architectures for Streaming Applications: Getting Answers Now from Data Sets that Never End. By Dean Wampler, Ph.D., VP of Fast Data Engineering. lightbend.com/products/fast-data-platform
  2. 2. Streaming Engines in Context…
  3. 3. Classic Batch Architecture: Hadoop
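  4. 4. [Diagram: Hadoop batch architecture. Labels: logs ingested with Flume; databases (DBs) ingested with Sqoop; persistence in HDFS, S3, SQL/NoSQL, and search; YARN Resource Manager and Node Managers; batch jobs with MapReduce, Spark, …]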
  9. 9. New Streaming, “Fast Data” Architecture (but it also supports batch)
  10. 10. [Diagram: Fast Data architecture with components numbered 1 to 11. Labels: Mesos, Kubernetes, YARN, …; cloud, on premise, …; logs, sockets, REST; ZooKeeper cluster; Kafka cluster; microservices (RP, Go, Node.js, …); low latency: Flink, Kafka Streams, Akka Streams, Beam, …; low latency and mini-batch: Spark Streaming; batch: Spark, …; persistence: S3, HDFS, SQL/NoSQL, search]
  15. 15. • Why Kafka? • Before: producers (Service 1, Service 2, Service 3, log and other files, internet services) link directly to consumers (services): N * M links. • After (with Kafka in between): the same producers and consumers need only N + M links.
  19. 19. Streaming Engines
  20. 20. Features to Consider
  21. 21. • Low latency? How low? • High volume? How high? • Which kinds of data processing, analytics? • Process data in bulk or individually? • Bulk processing of records? • Individual processing of events? • Preferred application architecture?
  22. 22. • Low latency? How low? www.spacex.com/news
  23. 23. • Low latency? How low? • Real real time? pico- to microseconds www.spacex.com/news
  24. 24. • Low latency? How low? • < 100 microseconds? tradinghub.co/watch-list-for-mar-26th-2015/ www.usa.philips.com/
  25. 25. • Low latency? How low? • < 10 milliseconds? money.cnn.com/2017/05/12/pf/credit-card-mistakes/index.html
  26. 26. • Low latency? How low? • < 100s of milliseconds? github.com/keen/dashboards coursera.org/learn/machine-learning
  27. 27. • Low latency? How low? • < 1 second to minutes [Diagram labels: ETL: logs, Kafka, raw logs topic, parsed logs topic, Kafka Streams job. Model training: storage, data, model training, model serving, other logic]
  28. 28. • Low latency? How low? • > 1 minute? • Use short batch jobs
  29. 29. • High volume? How high?
  30. 30. • High volume? How high? • < 10K-100K per second? drdobbs.com/web-development/soa-web-services-and-restful-systems/199902676
  31. 31. • High volume? How high? • > 1M per second? https://store.nest.com/product/thermostat/
  32. 32. • Which kinds of data processing, analytics? • SQL? SELECT COUNT(*) FROM my-iot-data GROUP BY zip-code; val input = spark.read.format("parquet").stream("my-iot-data"); input.groupBy("zip-code").count()
  33. 33. • Which kinds of data processing, analytics? • “Dataflow”? val sc = new SparkContext("local[*]", "Inverted Idx"); sc.textFile("data/crawl").map { line => val Array(path, text) = line.split("\t", 2); (path, text) } flatMap { case (path, text) => text.split("""\W+""").map((_, path)) } map { case (w, p) => ((w, p), 1) } reduceByKey { case (n1, n2) => n1 + n2 } map {
  34. 34. • Which kinds of data processing, analytics? • ETL? [Diagram labels: logs, Kafka, raw logs topic, parsed logs topic, Kafka Streams job]
  35. 35. • Which kinds of data processing, analytics? • Train and serve ML models? [Diagram labels: storage, data, model training, model serving, other logic]
  36. 36. • Process data in bulk or individually? • Individual events (i.e., CEP). • Records in bulk (i.e., each datum’s identity unimportant). [Diagram labels: microservices; events; router actor; Service Actor 1 (SA11, SA12, SA13); Service Actor 2 (SA21, SA22, SA23); …] Bulk example: SELECT COUNT(*) FROM my-iot-data GROUP BY zip-code
  37. 37. • Preferred application architecture • Streaming library in an app? • Distributed services running your job? [Diagram labels: mini-batch: Spark Streaming; low latency: Flink, Kafka Streams, Akka Streams, Beam, …]
  38. 38. Best of Breed Streaming Engines
  39. 39. [Diagram: streaming engines and storage, with components numbered 1 to 10. Labels: Kafka cluster; low latency: Flink, Kafka Streams, Akka Streams, Beam, …; low latency and mini-batch: Spark Streaming; batch: Spark, …; persistence: S3, HDFS, SQL/NoSQL, search]
  40. 40. • Apache Beam • (Formerly Google Dataflow) • Define your flows; run with Flink, Spark, etc. • Beam is defining the state of the art for streaming semantics
  41. 41. • Apache Flink • Low-latency streaming • Best Beam runner • SQL, ML, etc.
  42. 42. • Apache Spark • Best known; large community • Batch, mini-batch, and new low-latency streaming • SQL, ML, etc.
  43. 43. • Akka Streams • Low-latency streaming • Rich dataflow language • Rich APIs for microservices, data sources and sinks • Excellent for model serving
  44. 44. • Kafka Streams • Read, write Kafka topics • Stream and Table abstractions • SQL on streams
  45. 45. • Spark or Flink? • Best for massive data sets • Rich analytics • Akka Streams or Kafka Streams? • Best for microservice integration • Wider flexibility
  47. 47. For more information on Lightbend Fast Data Platform: lightbend.com/fast-data-platform
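
The inverted-index dataflow on slide 33 is cut off in the transcript. As a rough sketch of the same idea, and not the speaker's actual code, the version below uses plain Scala collections instead of Spark so that it runs without a cluster. The names `InvertedIndexSketch` and `invertedIndex` and the sample file names are hypothetical. Input lines are assumed to have the form path, tab, text, as the slide's `split` call suggests. The final sort by descending count is also an assumption, since the slide's last `map` step is not visible.

```scala
object InvertedIndexSketch {
  // Hypothetical helper: maps each word to the (path, count) pairs where it
  // occurs. Lines without a tab separator are skipped.
  def invertedIndex(lines: Seq[String]): Map[String, Seq[(String, Int)]] = {
    // Steps 1 and 2 of the slide: split each line into (path, text),
    // then emit one (word, path) pair per word occurrence.
    val wordPathPairs: Seq[(String, String)] =
      lines
        .flatMap { line =>
          line.split("\t", 2) match {
            case Array(path, text) => Some((path, text))
            case _                 => None
          }
        }
        .flatMap { case (path, text) =>
          text.split("""\W+""").filter(_.nonEmpty).map(word => (word, path)).toSeq
        }

    // Steps 3 and 4 of the slide: count occurrences per (word, path) key.
    // groupBy plus size stands in for Spark's reduceByKey here.
    wordPathPairs
      .groupBy(identity)
      .toSeq
      .map { case ((word, path), occurrences) => (word, (path, occurrences.size)) }
      // Assumed final step: collect the (path, count) pairs for each word,
      // most frequent path first, ties broken by path name.
      .groupBy { case (word, _) => word }
      .map { case (word, entries) =>
        word -> entries.map(_._2).sortBy { case (path, n) => (-n, path) }
      }
  }
}
```

Calling `InvertedIndexSketch.invertedIndex(Seq("a.txt\tspark kafka spark", "b.txt\tkafka"))` returns a map from each word to the files that contain it. In Spark the same shape would be distributed across a cluster, which is the trade-off the talk weighs against library-style engines that run inside an application.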
