SlideShare una empresa de Scribd logo
1 de 39
Descargar para leer sin conexión
®
© 2015 MapR Technologies 1
®
© 2014 MapR Technologies
Overview of Apache Spark Streaming
Carol McDonald
®
© 2015 MapR Technologies 2
Agenda
•  Why Apache Spark Streaming ?
•  What is Apache Spark Streaming?
–  Key Concepts and Architecture
•  How it works by Example
®
© 2015 MapR Technologies 3
Why Spark Streaming?
•  Process Time Series data :
–  Results in near-real-time
•  Use Cases
–  Social network trends
–  Website statistics, monitoring
–  Fraud detection
–  Advertising click monetization
put
put
put
put
Time stamped data
data
•  Sensor, System Metrics, Events, log files
•  Stock Ticker, User Activity
•  Hi Volume, Velocity
Data for real-time
monitoring
®
© 2015 MapR Technologies 4
What is time series data?
•  Stuff with timestamps
–  Sensor data
–  log files
–  Phones..
Credit Card Transactions Web user behaviour
Social media
Log files
Geodata
Sensors
®
© 2015 MapR Technologies 5
Why Spark Streaming ?
What If?
•  You want to analyze data as it arrives?
For Example Time Series Data: Sensors, Clicks, Logs, Stats
®
© 2015 MapR Technologies 6
Batch Processing
It's 6:01 and 72 degrees
It's 6:02 and 75 degrees
It's 6:03 and 77 degrees
It's 6:04 and 85 degrees
It's 6:05 and 90 degrees
It's 6:06 and 85 degrees
It's 6:07 and 77 degrees
It's 6:08 and 75 degrees
It was hot at 6:05
yesterday!
Batch processing may be too late for some events
®
© 2015 MapR Technologies 7
Event Processing
It's 6:05 and
90 degrees
Someone should
open a window!
Streaming
Its becoming important to process events as they arrive
®
© 2015 MapR Technologies 8
What is Spark Streaming?
•  extension of the core Spark AP
•  enables scalable, high-throughput, fault-tolerant stream
processing of live data
Data Sources Data Sinks
®
© 2015 MapR Technologies 9
Stream Processing Architecture
Streaming
Sources/Apps
MapR-FS
Data Ingest
Topics
MapR-DB
Data Storage
MapR-FS
Apps	
  
Stream
Processing
®
© 2015 MapR Technologies 10
Key Concepts
•  Data Sources:
–  File Based: HDFS
–  Network Based: TCP sockets,
Twitter, Kafka, Flume, ZeroMQ, Akka Actor
•  Transformations
•  Output Operations
MapR-FS
Topics
®
© 2015 MapR Technologies 11
Spark Streaming Architecture
•  Divide	
  data	
  stream	
  into	
  batches	
  of	
  X	
  seconds	
  	
  
– Called	
  DStream	
  =	
  	
  sequence	
  of	
  RDDs	
  	
  	
  
Spark
Streaming
input data
stream
DStream RDD batches
Batch
interval
data from
time 0 to 1
data from
time 1 to 2
RDD @ time 2
data from
time 2 to 3
RDD @ time 3RDD @ time 1
®
© 2015 MapR Technologies 12
Resilient Distributed Datasets (RDD)
Spark revolves around RDDs
•  read only collection of
elements
®
© 2015 MapR Technologies 13
Resilient Distributed Datasets (RDD)
Spark revolves around RDDs
•  read only collection of
elements
•  operated on in parallel
•  Cached in memory
–  Or on disk
•  Fault tolerant
®
© 2015 MapR Technologies 14
Working With RDDs
RDD
RDD
RDD
RDD
Transformations
Action Value
linesWithErrorRDD.count()!
6!
!
linesWithErrorRDD.first()!
# Error line!
textFile = sc.textFile(”SomeFile.txt”)!
linesWithErrorRDD = linesRDD.filter(lambda line: “ERROR” in
line)!
®
© 2015 MapR Technologies 15
Process DStream
transform	
  
Transform	
  
map	
  
reduceByValue	
  
count	
  
DStream
RDDs
Dstream	
  
RDDs	
  
transform	
  transform	
  
•  Process	
  using	
  transformaBons	
  	
  
– creates	
  new	
  RDDs	
  
data from
time 0 to 1
data from
time 1 to 2
RDD @ time 2
data from
time 2 to 3
RDD @ time 3RDD @ time 1
RDD @ time 1 RDD @ time 2 RDD @ time 3
®
© 2015 MapR Technologies 16
Key Concepts
•  Data Sources
•  Transformations: create new DStream
–  Standard RDD operations: map, filter, union, reduce, join, …
–  Stateful operations: UpdateStateByKey(function),
countByValueAndWindow, …
•  Output Operations
®
© 2015 MapR Technologies 17
Spark Streaming Architecture
•  processed	
  results	
  are	
  pushed	
  out	
  	
  in	
  batches	
  
Spark
batches of processed
results
Spark
Streaming
input data
stream
DStream RDD batches
data from
time 0 to 1
data from
time 1 to 2
RDD @ time 2
data from
time 2 to 3
RDD @ time 3RDD @ time 1
®
© 2015 MapR Technologies 18
Key Concepts
•  Data Sources
•  Transformations
•  Output Operations: trigger Computation
–  saveAsHadoopFiles – save to HDFS
–  saveAsHadoopDataset – save to Hbase
–  saveAsTextFiles
–  foreach – do anything with each batch of RDDs
MapR-DB
MapR-FS
®
© 2015 MapR Technologies 19
Learning Goals
•  How it works by example
®
© 2015 MapR Technologies 20
Use Case: Time Series Data
Data for
real-time monitoring
read
Spark Processing
Spark
Streaming
Oil Pump Sensor data
®
© 2015 MapR Technologies 21
Convert Line of CSV data to Sensor Object
case class Sensor(resid: String, date: String, time: String,
hz: Double, disp: Double, flo: Double, sedPPM: Double,
psi: Double, chlPPM: Double)
def parseSensor(str: String): Sensor = {
val p = str.split(",")
Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble,
p(6).toDouble, p(7).toDouble, p(8).toDouble)
}
®
© 2015 MapR Technologies 22
Schema
•  All events stored, data CF could be set to expire data
•  Filtered alerts put in alerts CF
•  Daily summaries put in Stats CF
Row key
CF data CF alerts CF stats
hz … psi psi … hz_avg … psi_min
COHUTTA_3/10/14_1:01 10.37 84 0
COHUTTA_3/10/14 10 0
®
© 2015 MapR Technologies 23
Basic Steps for Spark Streaming code
These are the basic steps for Spark Streaming code:
1.  create a Dstream
1.  Apply transformations
2.  Apply output operations
2.  Start receiving data and processing it
–  using streamingContext.start().
3.  Wait for the processing to be stopped
–  using streamingContext.awaitTermination().
®
© 2015 MapR Technologies 24
Create a DStream
val ssc = new StreamingContext(sparkConf, Seconds(2))
val linesDStream = ssc.textFileStream(“/mapr/stream")
batch	
  
'me	
  0-­‐1	
  
linesDStream
batch	
  
'me	
  1-­‐2	
  
batch	
  
'me	
  1-­‐2	
  
DStream:	
  a	
  sequence	
  of	
  RDDs	
  represenBng	
  a	
  
stream	
  of	
  data	
  
stored	
  in	
  memory	
  as	
  an	
  
RDD	
  
®
© 2015 MapR Technologies 25
Process DStream
val linesDStream = ssc.textFileStream(”directory path")
val sensorDStream = linesDStream.map(parseSensor)
map	
  
new	
  RDDs	
  created	
  
for	
  every	
  batch	
  	
  
batch	
  
'me	
  0-­‐1	
  
linesDStream
RDDs
sensorDstream	
  
RDDs	
  
batch	
  
'me	
  1-­‐2	
  
map	
  map	
  
batch	
  
'me	
  1-­‐2	
  
®
© 2015 MapR Technologies 26
Process DStream
// for Each RDD
sensorDStream.foreachRDD { rdd =>
// filter sensor data for low psi
val alertRDD = rdd.filter(sensor => sensor.psi < 5.0)
. . .
}
®
© 2015 MapR Technologies 27
DataFrame and SQL Operations
// for Each RDD parse into a sensor object filter
sensorDStream.foreachRDD { rdd =>
. . .
alertRdd.toDF().registerTempTable(”alert”)
// join alert data with pump maintenance info
val res = sqlContext.sql(
"select s.resid,s.psi, p.pumpType
from alert s join pump p on s.resid = p.resid
join maint m on p.resid=m.resid")
. . .
}
®
© 2015 MapR Technologies 28
Save to HBase
// for Each RDD parse into a sensor object filter
sensorDStream.foreachRDD { rdd =>
. . .
// convert alert to put object write to HBase alerts
alertRDD.map(Sensor.convertToPutAlert)
.saveAsHadoopDataset(jobConfig)
}
®
© 2015 MapR Technologies 29
Save to HBase
rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig)
map	
  
Put	
  objects	
  wriFen	
  	
  
To	
  HBase	
  
batch	
  
'me	
  0-­‐1	
  
linesRDD
DStream
sensorRDD	
  
Dstream	
  
batch	
  
'me	
  1-­‐2	
  
map	
  map	
  
batch	
  
'me	
  1-­‐2	
  
HBase
save save save
output	
  opera'on:	
  persist	
  data	
  to	
  external	
  storage	
  
®
© 2015 MapR Technologies 30
Start Receiving Data
sensorDStream.foreachRDD { rdd =>
. . .
}
// Start the computation
ssc.start()
// Wait for the computation to terminate
ssc.awaitTermination()
®
© 2015 MapR Technologies 31
Using HBase as a Source and Sink
read
write
Spark applicationHBase database
EXAMPLE: calculate and store summaries,
Pre-Computed, Materialized View
®
© 2015 MapR Technologies 32
HBase
HBase Read and Write
val hBaseRDD = sc.newAPIHadoopRDD(
conf,classOf[TableInputFormat],
classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
classOf[org.apache.hadoop.hbase.client.Result])
keyStatsRDD.map { case (k, v) => convertToPut(k,
v) }.saveAsHadoopDataset(jobConfig)
newAPIHadoopRDD
Row key Result
saveAsHadoopDataset
Key Put
HBase
Scan Result
®
© 2015 MapR Technologies 33
Read HBase
// Load an RDD of (rowkey, Result) tuples from HBase table
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
classOf[org.apache.hadoop.hbase.client.Result])
// get Result
val resultRDD = hBaseRDD.map(tuple => tuple._2)
// transform into an RDD of (RowKey, ColumnValue)s
val keyValueRDD = resultRDD.map(
result => (Bytes.toString(result.getRow()).split(" ")(0),
Bytes.toDouble(result.value)))
// group by rowkey , get statistics for column value
val keyStatsRDD = keyValueRDD.groupByKey().mapValues(list =>
StatCounter(list))
®
© 2015 MapR Technologies 34
Write HBase
// save to HBase table CF data
val jobConfig: JobConf = new JobConf(conf, this.getClass)
jobConfig.setOutputFormat(classOf[TableOutputFormat])
jobConfig.set(TableOutputFormat.OUTPUT_TABLE, tableName)
// convert psi stats to put and write to hbase table stats column family
keyStatsRDD.map { case (k, v) =>
convertToPut(k, v) }.saveAsHadoopDataset(jobConfig)
®
© 2015 MapR Technologies 35
MapR Blog: Using Apache Spark DataFrames for Processing of Tabular Data
•  https://www.mapr.com/blog/spark-streaming-hbase
®
© 2015 MapR Technologies 36
Free HBase On Demand Training
(includes Hive and MapReduce with HBase)
•  https://www.mapr.com/services/mapr-academy/big-data-hadoop-
online-training
®
© 2015 MapR Technologies 37
Soon to Come
•  Spark On Demand Training
–  https://www.mapr.com/services/mapr-academy/
®
© 2015 MapR Technologies 38
References
•  Spark web site: http://spark.apache.org/
•  https://databricks.com/
•  Spark on MapR:
–  http://www.mapr.com/products/apache-spark
•  Spark SQL and DataFrame Guide
•  Apache Spark vs. MapReduce – Whiteboard Walkthrough
•  Learning Spark - O'Reilly Book
•  Apache Spark
®
© 2015 MapR Technologies 39
Q&A
@mapr maprtech
Engage with us!
MapR
maprtech
mapr-technologies

Más contenido relacionado

La actualidad más candente

Cmu-2011-09.pptx
Cmu-2011-09.pptxCmu-2011-09.pptx
Cmu-2011-09.pptxTed Dunning
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Modern Data Stack France
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksLegacy Typesafe (now Lightbend)
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Summit
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over YarnInMobi Technology
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down InternetMapR Technologies
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17spark-project
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...The Hive
 
Drill at the Chug 9-19-12
Drill at the Chug 9-19-12Drill at the Chug 9-19-12
Drill at the Chug 9-19-12Ted Dunning
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04Ted Dunning
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...DataWorks Summit/Hadoop Summit
 

La actualidad más candente (20)

Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Cmu-2011-09.pptx
Cmu-2011-09.pptxCmu-2011-09.pptx
Cmu-2011-09.pptx
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over Yarn
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down Internet
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Yahoo's Experience Running Pig on Tez at Scale
Yahoo's Experience Running Pig on Tez at ScaleYahoo's Experience Running Pig on Tez at Scale
Yahoo's Experience Running Pig on Tez at Scale
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
 
Drill at the Chug 9-19-12
Drill at the Chug 9-19-12Drill at the Chug 9-19-12
Drill at the Chug 9-19-12
 
October 2014 HUG : Hive On Spark
October 2014 HUG : Hive On SparkOctober 2014 HUG : Hive On Spark
October 2014 HUG : Hive On Spark
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 

Destacado

Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)tatsuya6502
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...DataWorks Summit/Hadoop Summit
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtMichael Stack
 
HBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpFwardNetwork
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandJosh Elser
 
Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)tatsuya6502
 

Destacado (7)

Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
 
HBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejp
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Spark + HBase
Spark + HBase Spark + HBase
Spark + HBase
 
Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)
 

Similar a Apache Spark streaming and HBase

Spark Streaming Data Pipelines
Spark Streaming Data PipelinesSpark Streaming Data Pipelines
Spark Streaming Data PipelinesMapR Technologies
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkCloudera, Inc.
 
Deep dive into spark streaming
Deep dive into spark streamingDeep dive into spark streaming
Deep dive into spark streamingTao Li
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Djamel Zouaoui
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream DataDataWorks Summit
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient DataCarol McDonald
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache SparkAmir Sedighi
 
Spark Study Notes
Spark Study NotesSpark Study Notes
Spark Study NotesRichard Kuo
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaData Science Thailand
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingJack Gudenkauf
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Data Con LA
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache SparkIndicThreads
 
Big data processing with Apache Spark and Oracle Database
Big data processing with Apache Spark and Oracle DatabaseBig data processing with Apache Spark and Oracle Database
Big data processing with Apache Spark and Oracle DatabaseMartin Toshev
 

Similar a Apache Spark streaming and HBase (20)

Spark Streaming Data Pipelines
Spark Streaming Data PipelinesSpark Streaming Data Pipelines
Spark Streaming Data Pipelines
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
 
Deep dive into spark streaming
Deep dive into spark streamingDeep dive into spark streaming
Deep dive into spark streaming
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient Data
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
Spark Study Notes
Spark Study NotesSpark Study Notes
Spark Study Notes
 
MATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and CapabilitiesMATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and Capabilities
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream Processing
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Big data processing with Apache Spark and Oracle Database
Big data processing with Apache Spark and Oracle DatabaseBig data processing with Apache Spark and Oracle Database
Big data processing with Apache Spark and Oracle Database
 

Más de Carol McDonald

Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUsCarol McDonald
 
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Carol McDonald
 
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DBAnalyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DBCarol McDonald
 
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...Carol McDonald
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningCarol McDonald
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBCarol McDonald
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Carol McDonald
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Carol McDonald
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Carol McDonald
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareCarol McDonald
 
Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningCarol McDonald
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Carol McDonald
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Carol McDonald
 
Spark machine learning predicting customer churn
Spark machine learning predicting customer churnSpark machine learning predicting customer churn
Spark machine learning predicting customer churnCarol McDonald
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Carol McDonald
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APICarol McDonald
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesCarol McDonald
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataCarol McDonald
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine LearningCarol McDonald
 

Más de Carol McDonald (20)

Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUs
 
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
 
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DBAnalyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
 
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine Learning
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health Care
 
Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep Learning
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures
 
Spark machine learning predicting customer churn
Spark machine learning predicting customer churnSpark machine learning predicting customer churn
Spark machine learning predicting customer churn
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka API
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine Learning
 

Último

Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 

Último (20)

Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 

Apache Spark streaming and HBase

  • 1. ® © 2015 MapR Technologies 1 ® © 2014 MapR Technologies Overview of Apache Spark Streaming Carol McDonald
  • 2. ® © 2015 MapR Technologies 2 Agenda •  Why Apache Spark Streaming ? •  What is Apache Spark Streaming? –  Key Concepts and Architecture •  How it works by Example
  • 3. ® © 2015 MapR Technologies 3 Why Spark Streaming? •  Process Time Series data : –  Results in near-real-time •  Use Cases –  Social network trends –  Website statistics, monitoring –  Fraud detection –  Advertising click monetization put put put put Time stamped data data •  Sensor, System Metrics, Events, log files •  Stock Ticker, User Activity •  Hi Volume, Velocity Data for real-time monitoring
  • 4. ® © 2015 MapR Technologies 4 What is time series data? •  Stuff with timestamps –  Sensor data –  log files –  Phones.. Credit Card Transactions Web user behaviour Social media Log files Geodata Sensors
  • 5. ® © 2015 MapR Technologies 5 Why Spark Streaming ? What If? •  You want to analyze data as it arrives? For Example Time Series Data: Sensors, Clicks, Logs, Stats
  • 6. ® © 2015 MapR Technologies 6 Batch Processing It's 6:01 and 72 degrees It's 6:02 and 75 degrees It's 6:03 and 77 degrees It's 6:04 and 85 degrees It's 6:05 and 90 degrees It's 6:06 and 85 degrees It's 6:07 and 77 degrees It's 6:08 and 75 degrees It was hot at 6:05 yesterday! Batch processing may be too late for some events
  • 7. ® © 2015 MapR Technologies 7 Event Processing It's 6:05 and 90 degrees Someone should open a window! Streaming Its becoming important to process events as they arrive
  • 8. ® © 2015 MapR Technologies 8 What is Spark Streaming? •  extension of the core Spark AP •  enables scalable, high-throughput, fault-tolerant stream processing of live data Data Sources Data Sinks
  • 9. ® © 2015 MapR Technologies 9 Stream Processing Architecture Streaming Sources/Apps MapR-FS Data Ingest Topics MapR-DB Data Storage MapR-FS Apps   Stream Processing
  • 10. ® © 2015 MapR Technologies 10 Key Concepts •  Data Sources: –  File Based: HDFS –  Network Based: TCP sockets, Twitter, Kafka, Flume, ZeroMQ, Akka Actor •  Transformations •  Output Operations MapR-FS Topics
  • 11. ® © 2015 MapR Technologies 11 Spark Streaming Architecture •  Divide  data  stream  into  batches  of  X  seconds     – Called  DStream  =    sequence  of  RDDs       Spark Streaming input data stream DStream RDD batches Batch interval data from time 0 to 1 data from time 1 to 2 RDD @ time 2 data from time 2 to 3 RDD @ time 3RDD @ time 1
  • 12. ® © 2015 MapR Technologies 12 Resilient Distributed Datasets (RDD) Spark revolves around RDDs •  read only collection of elements
  • 13. ® © 2015 MapR Technologies 13 Resilient Distributed Datasets (RDD) Spark revolves around RDDs •  read only collection of elements •  operated on in parallel •  Cached in memory –  Or on disk •  Fault tolerant
  • 14. ® © 2015 MapR Technologies 14 Working With RDDs RDD RDD RDD RDD Transformations Action Value linesWithErrorRDD.count()! 6! ! linesWithErrorRDD.first()! # Error line! textFile = sc.textFile(”SomeFile.txt”)! linesWithErrorRDD = linesRDD.filter(lambda line: “ERROR” in line)!
  • 15. ® © 2015 MapR Technologies 15 Process DStream transform   Transform   map   reduceByValue   count   DStream RDDs Dstream   RDDs   transform  transform   •  Process  using  transformaBons     – creates  new  RDDs   data from time 0 to 1 data from time 1 to 2 RDD @ time 2 data from time 2 to 3 RDD @ time 3RDD @ time 1 RDD @ time 1 RDD @ time 2 RDD @ time 3
  • 16. ® © 2015 MapR Technologies 16 Key Concepts •  Data Sources •  Transformations: create new DStream –  Standard RDD operations: map, filter, union, reduce, join, … –  Stateful operations: UpdateStateByKey(function), countByValueAndWindow, … •  Output Operations
  • 17. ® © 2015 MapR Technologies 17 Spark Streaming Architecture •  processed  results  are  pushed  out    in  batches   Spark batches of processed results Spark Streaming input data stream DStream RDD batches data from time 0 to 1 data from time 1 to 2 RDD @ time 2 data from time 2 to 3 RDD @ time 3RDD @ time 1
  • 18. ® © 2015 MapR Technologies 18 Key Concepts •  Data Sources •  Transformations •  Output Operations: trigger Computation –  saveAsHadoopFiles – save to HDFS –  saveAsHadoopDataset – save to Hbase –  saveAsTextFiles –  foreach – do anything with each batch of RDDs MapR-DB MapR-FS
  • 19. ® © 2015 MapR Technologies 19 Learning Goals •  How it works by example
  • 20. ® © 2015 MapR Technologies 20 Use Case: Time Series Data Data for real-time monitoring read Spark Processing Spark Streaming Oil Pump Sensor data
  • 21. ® © 2015 MapR Technologies 21 Convert Line of CSV data to Sensor Object case class Sensor(resid: String, date: String, time: String, hz: Double, disp: Double, flo: Double, sedPPM: Double, psi: Double, chlPPM: Double) def parseSensor(str: String): Sensor = { val p = str.split(",") Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble, p(6).toDouble, p(7).toDouble, p(8).toDouble) }
  • 22. ® © 2015 MapR Technologies 22 Schema •  All events stored, data CF could be set to expire data •  Filtered alerts put in alerts CF •  Daily summaries put in Stats CF Row key CF data CF alerts CF stats hz … psi psi … hz_avg … psi_min COHUTTA_3/10/14_1:01 10.37 84 0 COHUTTA_3/10/14 10 0
  • 23. ® © 2015 MapR Technologies 23 Basic Steps for Spark Streaming code These are the basic steps for Spark Streaming code: 1.  create a Dstream 1.  Apply transformations 2.  Apply output operations 2.  Start receiving data and processing it –  using streamingContext.start(). 3.  Wait for the processing to be stopped –  using streamingContext.awaitTermination().
  • 24. ® © 2015 MapR Technologies 24 Create a DStream val ssc = new StreamingContext(sparkConf, Seconds(2)) val linesDStream = ssc.textFileStream(“/mapr/stream") batch   'me  0-­‐1   linesDStream batch   'me  1-­‐2   batch   'me  1-­‐2   DStream:  a  sequence  of  RDDs  represenBng  a   stream  of  data   stored  in  memory  as  an   RDD  
  • 25. ® © 2015 MapR Technologies 25 Process DStream val linesDStream = ssc.textFileStream(”directory path") val sensorDStream = linesDStream.map(parseSensor) map   new  RDDs  created   for  every  batch     batch   'me  0-­‐1   linesDStream RDDs sensorDstream   RDDs   batch   'me  1-­‐2   map  map   batch   'me  1-­‐2  
  • 26. ® © 2015 MapR Technologies 26 Process DStream // for Each RDD sensorDStream.foreachRDD { rdd => // filter sensor data for low psi val alertRDD = rdd.filter(sensor => sensor.psi < 5.0) . . . }
  • 27. ® © 2015 MapR Technologies 27 DataFrame and SQL Operations // for Each RDD parse into a sensor object filter sensorDStream.foreachRDD { rdd => . . . alertRdd.toDF().registerTempTable(”alert”) // join alert data with pump maintenance info val res = sqlContext.sql( "select s.resid,s.psi, p.pumpType from alert s join pump p on s.resid = p.resid join maint m on p.resid=m.resid") . . . }
  • 28. ® © 2015 MapR Technologies 28 Save to HBase // for Each RDD parse into a sensor object filter sensorDStream.foreachRDD { rdd => . . . // convert alert to put object write to HBase alerts alertRDD.map(Sensor.convertToPutAlert) .saveAsHadoopDataset(jobConfig) }
  • 29. ® © 2015 MapR Technologies 29 Save to HBase rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig) map   Put  objects  wriFen     To  HBase   batch   'me  0-­‐1   linesRDD DStream sensorRDD   Dstream   batch   'me  1-­‐2   map  map   batch   'me  1-­‐2   HBase save save save output  opera'on:  persist  data  to  external  storage  
  • 30. ® © 2015 MapR Technologies 30 Start Receiving Data sensorDStream.foreachRDD { rdd => . . . } // Start the computation ssc.start() // Wait for the computation to terminate ssc.awaitTermination()
  • 31. ® © 2015 MapR Technologies 31 Using HBase as a Source and Sink read write Spark applicationHBase database EXAMPLE: calculate and store summaries, Pre-Computed, Materialized View
  • 32. ® © 2015 MapR Technologies 32 HBase HBase Read and Write val hBaseRDD = sc.newAPIHadoopRDD( conf,classOf[TableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], classOf[org.apache.hadoop.hbase.client.Result]) keyStatsRDD.map { case (k, v) => convertToPut(k, v) }.saveAsHadoopDataset(jobConfig) newAPIHadoopRDD Row key Result saveAsHadoopDataset Key Put HBase Scan Result
  • 33. ® © 2015 MapR Technologies 33 Read HBase // Load an RDD of (rowkey, Result) tuples from HBase table val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], classOf[org.apache.hadoop.hbase.client.Result]) // get Result val resultRDD = hBaseRDD.map(tuple => tuple._2) // transform into an RDD of (RowKey, ColumnValue)s val keyValueRDD = resultRDD.map( result => (Bytes.toString(result.getRow()).split(" ")(0), Bytes.toDouble(result.value))) // group by rowkey , get statistics for column value val keyStatsRDD = keyValueRDD.groupByKey().mapValues(list => StatCounter(list))
  • 34. ® © 2015 MapR Technologies 34 Write HBase // save to HBase table CF data val jobConfig: JobConf = new JobConf(conf, this.getClass) jobConfig.setOutputFormat(classOf[TableOutputFormat]) jobConfig.set(TableOutputFormat.OUTPUT_TABLE, tableName) // convert psi stats to put and write to hbase table stats column family keyStatsRDD.map { case (k, v) => convertToPut(k, v) }.saveAsHadoopDataset(jobConfig)
  • 35. ® © 2015 MapR Technologies 35 MapR Blog: Using Apache Spark DataFrames for Processing of Tabular Data •  https://www.mapr.com/blog/spark-streaming-hbase
  • 36. ® © 2015 MapR Technologies 36 Free HBase On Demand Training (includes Hive and MapReduce with HBase) •  https://www.mapr.com/services/mapr-academy/big-data-hadoop- online-training
  • 37. ® © 2015 MapR Technologies 37 Soon to Come •  Spark On Demand Training –  https://www.mapr.com/services/mapr-academy/
  • 38. ® © 2015 MapR Technologies 38 References •  Spark web site: http://spark.apache.org/ •  https://databricks.com/ •  Spark on MapR: –  http://www.mapr.com/products/apache-spark •  Spark SQL and DataFrame Guide •  Apache Spark vs. MapReduce – Whiteboard Walkthrough •  Learning Spark - O'Reilly Book •  Apache Spark
  • 39. ® © 2015 MapR Technologies 39 Q&A @mapr maprtech Engage with us! MapR maprtech mapr-technologies