
Redis+Spark Structured Streaming: Roshan Kumar


RedisConf19

Published in: Technology

  1. PRESENTED BY Redis + Spark Structured Streaming: A Perfect Combination to Scale Out Your Continuous Applications. Roshan Kumar, Redis Labs
  2. This Presentation Is About… How to collect and process data streams in real time at scale: IoT, user activity, messages
  4. http://bit.ly/spark-redis
  5. Breaking up the Solution into Functional Blocks: click data → 1. Data Ingest (record all clicks) → 2. Data Processing (count clicks in real time) → 3. Data Querying (query clicks by assets)
  6. The Actual Building Blocks of Our Solution: click data flows into ClickAnalyzer, which consists of 1. Data Ingest: Redis Stream; 2. Data Processing: Structured Stream Processing, writing results to a Redis Hash; 3. Data Querying: Spark SQL over the Redis Hash
  7. Part 1: Data Ingest
  8. Data Ingest Using Redis Streams
  9. What Is Redis Streams?
  10. Redis Streams in Its Simplest Form: Producer → Consumer
  11. Redis Streams Can Connect Many Producers and Consumers: Producers 1, 2, 3, …, m → Consumers 1, 2, 3, …, n
  12. Comparing Redis Streams with Redis Pub/Sub, Lists, and Sorted Sets
      • Pub/Sub: fire and forget; no persistence; no lookback queries
      • Lists: tight coupling between producers and consumers; persistence for transient data only; no lookback queries
      • Sorted Sets: data ordering isn't built in (the producer controls the order); no maximum limit; the data structure is not designed to handle data streams
  13. What Is Redis Streams?
      • It is like Pub/Sub, but with persistence
      • It is like Lists, but it decouples producers and consumers
      • It is like Sorted Sets, but asynchronous
      Plus:
      • Lifecycle management of streaming data
      • Built-in support for time-series data
      • A rich choice of options for consumers to read streaming and static data
      • Super-fast lookback queries powered by radix trees
      • Automatic eviction of data based on an upper limit
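The stream properties listed on this slide (time-based entry IDs, ordered lookback queries, eviction at an upper limit) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Redis itself: the names `MiniStreamModel`, `EntryId`, `MiniStream`, `add`, and `range` are hypothetical, and a sorted `TreeMap` stands in for the radix tree Redis actually uses. `add` mimics `XADD` with an auto-generated ID plus `MAXLEN`-style trimming; `range` mimics `XRANGE`.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical in-memory model of a Redis Stream, for illustration only.
// Redis keeps entries in a radix tree; a sorted TreeMap stands in for it here.
object MiniStreamModel {
  // Entry IDs look like "1553536458910-0": <milliseconds>-<sequence>.
  final case class EntryId(ms: Long, seq: Long) {
    override def toString: String = s"$ms-$seq"
  }
  object EntryId {
    // Order by time first, then by sequence number.
    implicit val ordering: Ordering[EntryId] =
      Ordering.by((id: EntryId) => (id.ms, id.seq))
  }

  final class MiniStream(maxLen: Int) {
    private var entries = TreeMap.empty[EntryId, Map[String, String]]
    private var lastId = EntryId(0L, 0L)

    // Like XADD with "*": IDs never go backwards; entries in the same
    // millisecond (or after the clock moved back) get increasing sequence numbers.
    def add(nowMs: Long, fields: Map[String, String]): EntryId = {
      val id =
        if (nowMs > lastId.ms) EntryId(nowMs, 0L)
        else EntryId(lastId.ms, lastId.seq + 1)
      entries = entries + (id -> fields)
      lastId = id
      // Like MAXLEN: evict the oldest entries once the upper limit is exceeded.
      while (entries.size > maxLen) entries = entries - entries.firstKey
      id
    }

    // Like XRANGE: entries with from <= id <= to, in ID order.
    def range(from: EntryId, to: EntryId): Seq[(EntryId, Map[String, String])] =
      entries.range(from, to).toSeq ++ entries.get(to).map(v => (to, v)).toList
  }
}
```

With `maxLen = 2`, adding a third entry silently evicts the first one, which is the "automatic eviction based on an upper limit" behavior in miniature.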
  14. Redis Streams Benefits: It enables asynchronous data exchange between producers and consumers, as well as historical range queries. (Diagram: a producer feeds consumers used for messaging, analytics, and data backup.)
  15. Redis Streams Benefits: With consumer groups, you can scale out and avoid backlogs. (Diagram: a producer writes to a Redis Stream at an arrival rate of 500/sec; five image processors together consume at 500/sec.)
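The backlog argument on this slide is simple arithmetic. A sketch, assuming (the slide does not say so) that each image processor handles a fixed rate; five processors sharing 500/sec works out to 100 images/sec each. `BacklogModel`, `backlogAfter`, and `consumersNeeded` are hypothetical names.

```scala
// Back-of-the-envelope backlog model. Assumption (not stated on the slide):
// every consumer processes a fixed number of entries per second.
object BacklogModel {
  /** Entries left unprocessed after `seconds`, starting from an empty stream. */
  def backlogAfter(seconds: Int, arrivalPerSec: Int, perConsumerPerSec: Int, consumers: Int): Long = {
    val consumptionPerSec = perConsumerPerSec.toLong * consumers
    val netGrowthPerSec = arrivalPerSec.toLong - consumptionPerSec
    math.max(0L, netGrowthPerSec) * seconds
  }

  /** Smallest number of consumers whose combined rate keeps up with arrivals. */
  def consumersNeeded(arrivalPerSec: Int, perConsumerPerSec: Int): Int =
    (arrivalPerSec + perConsumerPerSec - 1) / perConsumerPerSec // ceiling division
}
```

Under that assumption, a single processor falls behind by 400 entries every second (24,000 after one minute), while a consumer group of five keeps the backlog at zero.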
  16. Redis Streams Benefits: Simplify data collection, processing, and distribution to support complex scenarios. (Diagram: producers 1…m write with XADD; a consumer group of classifiers 1…n performs deep learning-based classification, reading with XREADGROUP and acknowledging with XACK; other consumers read with XREAD for messaging, analytics, and data backup.)
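How a consumer group splits work among its members can be modeled in a few lines. This is an illustrative sketch, not a Redis client: `ConsumerGroupModel`, `Group`, `readGroup`, `ack`, and `pendingFor` are hypothetical names that mimic the semantics of `XREADGROUP` with the special ID `>` (deliver entries never delivered to any member of the group) and `XACK` (remove an entry from the consumer's pending list). Entries are identified by position rather than real stream IDs to keep it short.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of consumer-group bookkeeping, for illustration only.
object ConsumerGroupModel {
  final class Group(entries: IndexedSeq[String]) {
    private var nextIndex = 0 // plays the role of the group's last-delivered ID
    // Pending entries: delivered to a consumer but not yet acknowledged.
    private val pending = mutable.Map.empty[String, mutable.LinkedHashSet[Int]]

    /** Like XREADGROUP ... STREAMS key ">": hand out entries no group member has seen. */
    def readGroup(consumer: String, count: Int): Seq[(Int, String)] = {
      val end = math.min(nextIndex + count, entries.size)
      val batch = (nextIndex until end).map(i => i -> entries(i))
      nextIndex = end
      pending.getOrElseUpdate(consumer, mutable.LinkedHashSet.empty[Int]) ++= batch.map(_._1)
      batch
    }

    /** Like XACK: returns true only if the entry was pending for this consumer. */
    def ack(consumer: String, index: Int): Boolean =
      pending.get(consumer).exists(_.remove(index))

    def pendingFor(consumer: String): Set[Int] =
      pending.get(consumer).map(_.toSet).getOrElse(Set.empty[Int])
  }
}
```

Each entry goes to exactly one classifier, and anything a classifier received but never acknowledged stays visible in its pending list, which is what lets a real deployment detect and reprocess work from a failed consumer.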
  17. Our Ingest Solution (1. Data Ingest: Redis Stream)
      Command: xadd clickstream * img [image_id]
      Sample data:
      127.0.0.1:6379> xrange clickstream - +
      1) 1) "1553536458910-0"
         2) 1) "image_1"
            2) "1"
      2) 1) "1553536469080-0"
         2) 1) "image_3"
            2) "1"
      3) 1) "1553536489620-0"
         2) 1) "image_3"
            2) "1"
      . . . .
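The IDs in the sample output are generated by `XADD` when the ID argument is `*`, and have the form `<Unix time in milliseconds>-<sequence number>`. A minimal sketch that decodes them and orders them numerically (comparing IDs as plain strings would put `...-10` before `...-9`); `StreamIdDecoder`, `parse`, `timestamp`, and `idOrdering` are hypothetical names.

```scala
import java.time.Instant

// Minimal sketch: decode Redis Stream entry IDs such as "1553536458910-0".
object StreamIdDecoder {
  /** Splits "<ms>-<seq>" into its two numeric parts. */
  def parse(id: String): (Long, Long) = id.split("-") match {
    case Array(ms, seq) => (ms.toLong, seq.toLong)
    case _              => throw new IllegalArgumentException(s"not a stream ID: $id")
  }

  /** The wall-clock time encoded in an auto-generated ID. */
  def timestamp(id: String): Instant = Instant.ofEpochMilli(parse(id)._1)

  /** Numeric ordering: time first, then sequence number. */
  val idOrdering: Ordering[String] = Ordering.by((id: String) => parse(id))
}
```

For example, `timestamp("1553536458910-0")` decodes to 2019-03-25T17:54:18.910Z, which is why these IDs double as built-in time-series keys.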
  18. Part 2: Data Processing
  19. Data Processing Using Spark's Structured Streaming
  20. What Is Structured Streaming?
  21. Definition: "Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming."
  22. How Does Structured Streaming Work?
      Source (data stream) → micro-batches as DataFrames (tables) → DataFrame operations → output sink
      DataFrame operations:
      • Selection: df.select("xyz").where("a > 10")
      • Filtering (typed Dataset API): df.filter(_.a > 10).map(_.b)
      • Aggregation: df.groupBy("xyz").count()
      • Windowing: df.groupBy(window($"timestamp", "10 minutes", "5 minutes"), $"word").count()
      • Deduplication: df.dropDuplicates("guid")
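The windowing example groups events into 10-minute windows that slide every 5 minutes, so each event lands in two overlapping windows. A sketch of that assignment in plain Scala, assuming windows are aligned to the Unix epoch (Spark's default when no `startTime` is given); `SlidingWindows.windowStarts` is a hypothetical helper, not a Spark API.

```scala
// Sketch of sliding-window assignment; not Spark code.
// Assumption: windows start at multiples of the slide, counted from the epoch.
object SlidingWindows {
  /** Start times (ms) of every window [start, start + windowMs) that contains tsMs. */
  def windowStarts(tsMs: Long, windowMs: Long, slideMs: Long): Seq[Long] = {
    require(windowMs > 0 && slideMs > 0 && tsMs >= 0)
    val lastStart = tsMs - (tsMs % slideMs) // latest window that begins at or before tsMs
    Iterator.iterate(lastStart)(_ - slideMs)
      .takeWhile(start => start > tsMs - windowMs) // window still covers tsMs
      .toSeq
      .reverse // earliest window first
  }
}
```

An event at minute 7 falls into the windows starting at minutes 0 and 5; an event at exactly minute 10 falls into [5, 15) and [10, 20), because window ends are exclusive.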
  23. Spark-Redis Library (Redis Streams as data source; Redis as data sink)
      • Developed using Scala
      • Compatible with Spark 2.3 and higher
      • Supports RDDs, DataFrames, and Structured Streaming
  24. Redis Streams as Data Source
      1. Connect to the Redis instance
      2. Map the Redis Stream to a Structured Streaming schema
      3. Create the query object
      4. Run the query
  25. Redis Streams as Data Source, step 1: Connect to the Redis instance
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.types.{StringType, StructField, StructType}
      val spark = SparkSession.builder()
        .appName("redis-df")
        .master("local[*]")
        .config("spark.redis.host", "localhost")
        .config("spark.redis.port", "6379")
        .getOrCreate()
      val clickstream = spark.readStream
        .format("redis")
        .option("stream.keys", "clickstream")
        .schema(StructType(Array(
          StructField("img", StringType)
        )))
        .load()
      val queryByImg = clickstream.groupBy("img").count
  26. Redis Streams as Data Source, step 2: Map the Redis Stream to a Structured Streaming schema. The schema declares a single column, img (StringType), matching the img field that the producer writes with xadd clickstream * img [image_id]:
      .schema(StructType(Array(
        StructField("img", StringType)
      )))
  27. Redis Streams as Data Source, step 3: Create the query object, which counts clicks per image:
      val queryByImg = clickstream.groupBy("img").count
  28. Redis Streams as Data Source, step 4: Run the query, writing to a custom output sink
      val clickWriter: ClickForeachWriter = new ClickForeachWriter("localhost", "6379")
      val query = queryByImg.writeStream
        .outputMode("update")
        .foreach(clickWriter)
        .start()
      query.awaitTermination()
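With `outputMode("update")`, each trigger passes to the sink only the aggregate rows that changed in that micro-batch, carrying their new running totals. The sketch below models that behavior for `groupBy("img").count` in plain Scala, without Spark; `UpdateModeModel.run` is a hypothetical name, and each input batch is simply the list of image IDs clicked during that micro-batch.

```scala
// Plain-Scala model of update-mode output for a running count; not Spark code.
object UpdateModeModel {
  /** For each micro-batch, returns the rows that would be handed to the sink. */
  def run(batches: Seq[Seq[String]]): Seq[Map[String, Long]] = {
    var state = Map.empty[String, Long] // running counts kept across batches
    batches.map { batch =>
      // New totals for the images that appear in this batch only.
      val changed = batch.groupBy(identity).map { case (img, clicks) =>
        img -> (state.getOrElse(img, 0L) + clicks.size)
      }
      state = state ++ changed
      changed // unchanged images are not re-emitted
    }
  }
}
```

In the test data, the second batch emits only image_3, now with a count of 2; image_1 is not re-sent because its count did not change.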
  29. Redis as Output Sink
      Create a custom class that extends ForeachWriter and override its process() method (ForeachWriter also requires open() and close()):
      override def process(record: Row): Unit = {
        val img = record.getString(0)
        val count = record.getLong(1)
        if (jedis == null) {
          connect()
        }
        jedis.hset("clicks:" + img, "img", img)
        jedis.hset("clicks:" + img, "count", count.toString)
      }
      Each result is saved as a Hash with the structure clicks:[image] → { img: [image], count: [count] }. Example:
      clicks:image_1001   img image_1001   count 1029
      clicks:image_1002   img image_1002   count 392
      . . . .
      Table: Clicks
      +----------+-----+
      |       img|count|
      +----------+-----+
      |image_1001| 1029|
      |image_1002|  392|
      |       ...|  ...|
      +----------+-----+
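Because update mode sends the full running count for each changed image, the two `HSET` calls simply overwrite the previous field values, so writing the same image again is safe. A sketch of the resulting key/field layout, with a mutable `Map` standing in for Redis; `ClickHashLayout`, `FakeRedis`, `keyFor`, and `writeClick` are hypothetical names, not Jedis APIs.

```scala
// Sketch of the sink's key/field layout; a mutable Map stands in for Redis.
object ClickHashLayout {
  type FakeRedis = scala.collection.mutable.Map[String, Map[String, String]]

  def keyFor(img: String): String = "clicks:" + img

  /** Mirrors the two HSET calls in process(): set the img and count fields. */
  def writeClick(redis: FakeRedis, img: String, count: Long): Unit = {
    val key = keyFor(img)
    val existing = redis.getOrElse(key, Map.empty[String, String])
    redis(key) = existing ++ Map("img" -> img, "count" -> count.toString)
  }
}
```

Writing image_1001 twice leaves a single hash holding the latest count, exactly one key per image.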
  30. Part 3: Data Querying
  31. Query Redis Using Spark SQL
  32. 3 Steps to Query Redis Using Spark SQL
      1. Initialize the Spark context with Redis
      2. Create a table
      3. Run a query
      Redis Hash to SQL mapping: the hash clicks:image_1001 (img image_1001, count 1029) becomes the row (image_1001, 1029); clicks:image_1002 (img image_1002, count 392) becomes the row (image_1002, 392); and so on.
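The mapping on this slide treats each hash whose key starts with the table name followed by a colon as one row, with the hash fields as columns. A sketch of that mapping in plain Scala, assuming the clicks:[image] layout written by the output sink; `HashToTable` and `ClickRow` are hypothetical names, and the real mapping is performed by spark-redis.

```scala
// Plain-Scala sketch of the Redis Hash to SQL row mapping; not spark-redis code.
object HashToTable {
  final case class ClickRow(img: String, count: Int)

  /** Every hash under "<table>:" becomes one row; other keys are ignored. */
  def rows(table: String, hashes: Map[String, Map[String, String]]): Seq[ClickRow] =
    hashes.toSeq
      .collect { case (key, fields) if key.startsWith(table + ":") =>
        ClickRow(fields("img"), fields("count").toInt) // count declared as INT
      }
      .sortBy(_.img)
}
```

Keys outside the clicks: prefix, such as the clickstream stream itself, never show up in the clicks table.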
  33. How to Query Redis Using Spark SQL
      1. Initialize
      scala> import org.apache.spark.sql.SparkSession
      scala> val spark = SparkSession.builder().appName("redis-test").master("local[*]").config("spark.redis.host", "localhost").config("spark.redis.port", "6379").getOrCreate()
      scala> val sc = spark.sparkContext
      scala> import spark.sql
      scala> import spark.implicits._
      2. Create table
      scala> sql("CREATE TABLE IF NOT EXISTS clicks(img STRING, count INT) USING org.apache.spark.sql.redis OPTIONS (table 'clicks')")
  34. How to Query Redis Using Spark SQL (continued)
      3. Run query
      scala> sql("select * from clicks").show()
      +----------+-----+
      |       img|count|
      +----------+-----+
      |image_1001| 1029|
      |image_1002|  392|
      |       ...|  ...|
      +----------+-----+
  35. Recap
  37. Building Blocks of Our Solution: 1. Data Ingest: Redis Stream; 2. Data Processing: Structured Stream Processing, writing to a Redis Hash; 3. Data Querying: Spark SQL over the Redis Hash. The Spark-Redis library is used for Redis Streams as a data source and Redis as a data sink.
  38. Questions?
  39. Thank you! Roshan Kumar, roshan@redislabs.com, @roshankumar
  41. Agenda
      1. Problem statement and the demo
      2. Building blocks of the solution:
         • Data ingest: Redis Streams
         • Data processing: Structured Streaming + Spark-Redis
         • Data querying: Spark SQL + Redis
      3. Recap
