Noam Shaish
Spark Streaming
Scale
Fault tolerance
High throughput
Agenda
❖ Overview
❖ Architecture
❖ Fault-tolerance
❖ Why Spark Streaming? We have Storm
❖ Demo
Overview
❖ Spark Streaming is an extension of the core Spark API. It enables scalable, high-throughput, fault-tolerant processing of live data streams.
❖ Connectors exist for most common data sources, such as Kafka, Flume, Twitter, ZeroMQ, Kinesis, TCP, etc.
❖ Spark Streaming differs from most online processing solutions by adopting a mini-batch approach instead of treating the data as a continuous stream.
❖ Based on the Discretized Streams paper:
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing
Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica
Berkeley EECS (2012-12-14)
www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf
Overview
Spark Streaming runs a streaming computation as a series of very small, deterministic batch jobs
(Diagram: live data stream → Spark Streaming → batches of X milliseconds → Spark → processed results)
❖ Chops the live stream into batches of X milliseconds
❖ Spark treats each batch of data as an RDD
❖ The processed results of the RDD operations are returned in batches
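The chopping step can be modeled in a few lines of plain Scala. This is only a sketch on standard collections, not Spark's implementation; the `Event` type, the `chop` helper and the 1000 ms interval are assumptions made for illustration.

```scala
// Plain-Scala model of micro-batching (illustrative only; not Spark code).
object MicroBatchSketch {
  // Hypothetical record type: an arrival timestamp in milliseconds plus a payload.
  final case class Event(timestampMs: Long, payload: String)

  // Groups events into consecutive batches of `batchMs` milliseconds.
  // Each batch plays the role of one RDD; intervals with no events yield no batch here.
  def chop(events: Seq[Event], batchMs: Long): Seq[(Long, Seq[Event])] =
    events
      .groupBy(e => e.timestampMs / batchMs) // batch index = arrival time / interval
      .toSeq
      .sortBy(_._1)                          // hand batches over in time order

  def main(args: Array[String]): Unit = {
    val events = Seq(Event(100, "a"), Event(900, "b"), Event(1200, "c"), Event(2500, "d"))
    chop(events, batchMs = 1000).foreach { case (index, batch) =>
      println(s"batch $index: ${batch.map(_.payload).mkString(", ")}")
    }
  }
}
```

Because every batch is a fixed, finite collection, the same deterministic batch operations can be applied to each one in turn.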
DStream, not just RDD
* DataStax Cassandra connector
Transformations
• map()
• flatMap()
• filter()
• count()
• repartition()
• union()
• reduce()
• countByValue()
• reduceByKey()
• join()
• cogroup()
• transform()
• updateStateByKey()
Output Operations
• print()
• foreachRDD()
• saveAsObjectFiles()
• saveAsTextFiles()
• saveAsHadoopFiles()
• *saveToCassandra()
Window Operations
• window()
• countByWindow()
• reduceByWindow()
• reduceByKeyAndWindow()
• countByValueAndWindow()
Example 1 - DStream to RDD
val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
(Diagram: Twitter Streaming API → tweets DStream, a sequence of batches @ t, t+1, t+2, t+3; each batch is stored in memory as an RDD (immutable, distributed))
Example 1 - DStream to RDD relation
val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
val hashTags = tweets.flatMap(status => getTags(status))
(Diagram: flatMap is applied to each batch @ t, t+1, t+2, t+3 of the tweets DStream, creating new RDDs for each batch; together these form the new hashTags DStream, e.g. [#hobbitch, #bilboleggins, …])
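The relation between a DStream transformation and the per-batch RDDs can be sketched with ordinary collections. In this rough model a "DStream" is just a sequence of batches; `getTags` is a hypothetical stand-in for the helper named on the slide, whose real definition is not shown, and `flatMapBatches` is an illustrative name.

```scala
// Plain-Scala model: a DStream transformation is the same function applied to every batch.
object FlatMapPerBatchSketch {
  // Hypothetical stand-in for getTags: whitespace-separated words that start with '#'.
  def getTags(status: String): Seq[String] =
    status.split("\\s+").filter(_.startsWith("#")).toSeq

  // Applies `f` to each batch independently, producing one new batch per input batch.
  def flatMapBatches[A, B](batches: Seq[Seq[A]])(f: A => Seq[B]): Seq[Seq[B]] =
    batches.map(batch => batch.flatMap(f))
}
```

The output has exactly as many batches as the input, which mirrors how a transformed DStream is a new sequence of RDDs, one per batch interval.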
Example 1 - DStream to RDD
val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
val hashTags = tweets.flatMap(status => getTags(status))
hashTags.saveToCassandra("keyspace", "tableName")
(Diagram: tweets DStream → flatMap → hashTags DStream [#hobbitch, #bilboleggins, …] → save; every batch is saved to Cassandra)
Example 2 - DStream to RDD relation
val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
val hashTags = tweets.flatMap(status => getTags(status))
val tagCounts = hashTags.countByValue()
(Diagram: tweets DStream → flatMap → hashTags → map → reduceByKey → tagCounts, e.g. [(#hobbitch, 10), (#bilboleggins, 34), …])
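The diagram decomposes countByValue into a map step followed by reduceByKey. A minimal sketch of that decomposition for a single batch, using plain Scala collections rather than Spark, might look as follows; the `reduceByKey` helper here is a simplified local stand-in, not Spark's method.

```scala
// Plain-Scala model of countByValue = map to (value, 1) + reduceByKey(_ + _), for one batch.
object CountByValueSketch {
  // Simplified stand-in for reduceByKey: merge all values that share a key.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(merge: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (key, group) => key -> group.map(_._2).reduce(merge) }

  // Pair every element with a count of 1, then sum the counts per distinct element.
  def countByValue[T](batch: Seq[T]): Map[T, Long] =
    reduceByKey(batch.map(tag => (tag, 1L)))(_ + _)
}
```

For a batch such as `Seq("#a", "#b", "#a")` the map step yields three pairs and the reduce step collapses them to one entry per distinct tag.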
Example 3 - Count the hash tags over last 10 minutes
val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
val hashTags = tweets.flatMap(status => getTags(status))
val tagCounts = hashTags.window(Minutes(10), Seconds(1)).countByValue()
(Annotations: window(...) is the sliding window operation; Minutes(10) is the window length; Seconds(1) is the sliding interval)
Example 3 - Count the hash tags over last 10 minutes
val tagCounts = hashTags.window(Minutes(10), Seconds(1)).countByValue()
(Diagram: a sliding window moves over the hashTags batches t-1, t, t+1, t+2, t+3; the count is computed over all data in the window)
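Counting over all data in the window means that every slide re-reads every batch the window still covers. A small sketch of that behaviour with plain collections, assuming for simplicity that the window length is expressed as a number of batches and the slide is one batch (`windowAt` and `countsPerStep` are illustrative names, not Spark API):

```scala
// Plain-Scala model of window(...).countByValue(): recount the whole window at every step.
object NaiveWindowSketch {
  // At batch index t (0-based) the window holds the last `windowLen` batches ending at t.
  def windowAt[T](batches: Seq[Seq[T]], t: Int, windowLen: Int): Seq[T] =
    batches.slice(math.max(0, t - windowLen + 1), t + 1).flatten

  // One count map per step; each step touches every element still inside the window.
  def countsPerStep[T](batches: Seq[Seq[T]], windowLen: Int): Seq[Map[T, Int]] =
    batches.indices.map { t =>
      windowAt(batches, t, windowLen).groupBy(identity).map { case (k, v) => k -> v.size }
    }
}
```

With a 10-minute window sliding every second, as in the example, most of the data is recounted on each slide, which is the cost the next example avoids.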
Example 4 - Count hash tags over last 10 minutes smartly
val tagCounts = hashTags.countByValueAndWindow(Minutes(10), Seconds(1))
(Diagram: as the sliding window moves over the hashTags batches t-1, t, t+1, t+2, t+3, the count of the new batch entering the window is added (+) and the count of the batch leaving the window is subtracted (-))
A generalization of the smart window reduce exists:
reduceByKeyAndWindow(reduce, inverseReduce, window, interval)
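The add/subtract idea can be sketched with plain Scala maps. This is a model of the technique, not Spark's implementation: `add` plays the role of the reduce function, `subtract` the inverse reduce, and the window length is assumed to be a number of batches with a slide of one batch. Dropping keys whose count reaches zero is a choice made here to keep the result small.

```scala
// Plain-Scala model of an incremental window count using a reduce and an inverse reduce.
object IncrementalWindowSketch {
  type Counts = Map[String, Int]

  def countBatch(batch: Seq[String]): Counts =
    batch.groupBy(identity).map { case (k, v) => k -> v.size }

  // reduce: add the counts of the batch entering the window.
  def add(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (k, n)) => acc.updated(k, acc.getOrElse(k, 0) + n) }

  // inverse reduce: subtract the counts of the batch leaving the window, dropping zeros.
  def subtract(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (k, n)) =>
      val left = acc.getOrElse(k, 0) - n
      if (left > 0) acc.updated(k, left) else acc - k
    }

  // Each step touches only the entering batch and (once the window is full) the leaving one.
  def countsPerStep(batches: Seq[Seq[String]], windowLen: Int): Seq[Counts] = {
    val perBatch = batches.map(countBatch).toIndexedSeq
    perBatch.indices
      .scanLeft(Map.empty[String, Int]: Counts) { (window, t) =>
        val grown = add(window, perBatch(t))
        if (t >= windowLen) subtract(grown, perBatch(t - windowLen)) else grown
      }
      .tail // drop the initial empty window
  }
}
```

The work per slide is proportional to two batches rather than to the whole window, which only works because the reduce function has an inverse.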
Architecture
❖ Receivers divide data into mini-batches
❖ The batch size is defined in milliseconds (best practice is greater than 500 milliseconds)
(Diagram: input streams → receivers (Spark Streaming) → batches of input RDDs → Spark engine → batches of output RDDs)
Fault-tolerance
❖ Input RDDs are not generated from a fault-tolerant source
❖ Received data is replicated among worker nodes (default replication factor of 2)
❖ Stateful jobs should use checkpoints
❖ Journaling, as in a database, can be activated
(Diagram: tweets RDD → flatMap → hashTags RDD; input data is replicated in memory, and lost partitions are recomputed on other workers)
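Recomputing a lost partition works because each derived dataset is the result of a deterministic function over input that is still available. A very small sketch of that idea, with a hypothetical `Derived` type standing in for a transformed RDD (none of these names are Spark API):

```scala
// Plain-Scala model of recovery by recomputation instead of by restoring a stored copy.
object LineageRecomputeSketch {
  // A derived dataset remembers its parent partitions and the deterministic function applied.
  final case class Derived[A, B](parent: Vector[Seq[A]], f: A => Seq[B]) {
    // Rebuilds one partition from the (replicated) parent data.
    // Because `f` is deterministic, the result is the same on any worker, any number of times.
    def compute(partition: Int): Seq[B] = parent(partition).flatMap(f)
  }
}
```

Only the input needs to be replicated; everything downstream can be rebuilt on demand from it.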
Fault-tolerance
❖ Two kinds of data to recover in the event of failure:
• Data received and replicated: this data survives the failure of a single worker node, since a copy of it exists on one of the other nodes.
• Data received but buffered for replication: as this data is not replicated, the only way to recover it is to get it from the source again.
Fault-tolerance
❖ Two receiver semantics:
• Reliable receiver: acknowledges the source only after the received data has been replicated. If the receiver fails, buffered data is not acknowledged to the source. When the receiver is restarted, the source resends the data, so no data is lost due to the failure.
• Unreliable receiver: such receivers can lose data when they fail due to worker or driver failures.
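The acknowledge-after-replication rule can be illustrated with a toy model. Everything here is hypothetical and written only to show the ordering of steps: `ReplayableSource` stands for a source that re-delivers unacknowledged records, and `receive` for one run of a reliable receiver.

```scala
import scala.collection.mutable

// Toy model of a reliable receiver: acknowledge only after the data has been replicated.
object ReliableReceiverSketch {
  // Hypothetical source that keeps re-delivering every record until it is acknowledged.
  final class ReplayableSource(records: Seq[String]) {
    private val unacked = mutable.LinkedHashSet.empty[String] ++= records
    def pending: Seq[String] = unacked.toList
    def ack(record: String): Unit = unacked -= record
  }

  // Hypothetical receiver run: buffer the pending records, replicate them, then acknowledge.
  // With failBeforeReplication = true the buffer is lost and nothing is acknowledged,
  // so the source still holds the records for the next run.
  def receive(source: ReplayableSource,
              replicated: mutable.Buffer[String],
              failBeforeReplication: Boolean): Unit = {
    val buffered = source.pending
    if (!failBeforeReplication) {
      replicated ++= buffered
      buffered.foreach(source.ack)
    }
  }
}
```

Calling `receive` once with `failBeforeReplication = true` and then again with `false` leaves every record in the replicated store exactly once in this model; an unreliable receiver would correspond to acknowledging before the replication step.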
Fault-tolerance
Data loss by deployment scenario (receiver failure vs. driver failure):
❖ Without write-ahead log
• Receiver failure: buffered data lost with unreliable receivers; zero data loss with reliable receivers and files
• Driver failure: buffered data lost with unreliable receivers; past data lost with all receivers; zero data loss with files
❖ With write-ahead log
• Receiver failure: zero data loss with receivers and files
• Driver failure: zero data loss with receivers and files
Why Spark Streaming? We have Storm
One model to rule them all
❖ Same model for offline AND online processing
❖ Common code base for offline AND online processing
❖ Fewer bugs due to duplication
❖ Fewer bugs due to framework differences
❖ Increased developer productivity
One stack to rule them all
❖ Explore data interactively using the Spark shell to identify problems
❖ Use the same code in Spark standalone to identify problems in the production environment
❖ Use similar code in Spark Streaming to monitor problems online
$ ./spark-shell
scala> val file = sc.hadoopFile("smallLogs")
...
scala> val filtered = file.filter(_.contains("ERROR"))
...
scala> va

object ProcessProductionData {
  def main(args: Array[String]) {
    val sc = new SparkContext(...)
    val file = sc.hadoopFile("productionLogs")
    val filtered = file.filter(_.contains("ERROR"))
    val mapped = filtered.map(...)
    ...
  }
}

object ProcessLiveStream {
  def main(args: Array[String]) {
    val sc = new StreamingContext(...)
    val stream = sc.kafkaStream(...)
    val filtered = stream.filter(_.contains("ERROR"))
    val mapped = filtered.map(...)
    ...
  }
}
Performance
❖ Higher throughput than Storm
• Spark Streaming: 670k records/second/node
• Storm: 115k records/second/node
(Charts: Grep and WordCount throughput per node (MB/s) against record size (100 and 1000 bytes), Spark vs. Storm)
Tested with 100 EC2 instances with 4 cores each
Comparison taken from Tathagata Das and Reynold Xin's Hadoop Summit 2013 presentation
Community
Monitoring
In addition, the StreamingListener interface provides additional information at various levels (application, job, task, etc.)
Language
Utilization
❖ Spark 1.2 introduces dynamic cluster resource allocation
❖ Jobs can request more resources and release them
❖ Available only on YARN
Demo
https://github.com/NoamShaish/spark-streaming-workshop.git

Más contenido relacionado

La actualidad más candente

H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupSri Ambati
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapePaco Nathan
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataPaco Nathan
 
Jupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and ErasmusJupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and ErasmusPaco Nathan
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapePaco Nathan
 
Intro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversityIntro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversitySri Ambati
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 KeynotePeter Wang
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with SparkKrishna Sankar
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsPaco Nathan
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Djamel Zouaoui
 
H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling WaterSri Ambati
 
Architecture in action 01
Architecture in action 01Architecture in action 01
Architecture in action 01Krishna Sankar
 
H2O Big Join Slides
H2O Big Join SlidesH2O Big Join Slides
H2O Big Join SlidesSri Ambati
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.darach
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneSri Ambati
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217Sri Ambati
 

La actualidad más candente (20)

H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscape
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
 
Jupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and ErasmusJupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and Erasmus
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
Intro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversityIntro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara University
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analytics
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
Hands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop EcosystemHands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop Ecosystem
 
H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling Water
 
Architecture in action 01
Architecture in action 01Architecture in action 01
Architecture in action 01
 
H2O Big Join Slides
H2O Big Join SlidesH2O Big Join Slides
H2O Big Join Slides
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
 
LinkedIn
LinkedInLinkedIn
LinkedIn
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217
 

Destacado

Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataJetlore
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
Spark Streaming Data Pipelines
Spark Streaming Data PipelinesSpark Streaming Data Pipelines
Spark Streaming Data PipelinesMapR Technologies
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabImpetus Technologies
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemApache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemBryan Bende
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comknowbigdata
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Chris Fregly
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache SparkAmir Sedighi
 
[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming OverviewStratio
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark applicationdatamantra
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streamingdatamantra
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksLegacy Typesafe (now Lightbend)
 
Dive into Spark Streaming
Dive into Spark StreamingDive into Spark Streaming
Dive into Spark StreamingGerard Maas
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Spark Streaming - The simple way
Spark Streaming - The simple waySpark Streaming - The simple way
Spark Streaming - The simple wayYogesh Kumar
 

Destacado (20)

Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Spark Streaming Data Pipelines
Spark Streaming Data PipelinesSpark Streaming Data Pipelines
Spark Streaming Data Pipelines
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLab
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemApache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview
 
Apache Spark & Streaming
Apache Spark & StreamingApache Spark & Streaming
Apache Spark & Streaming
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Dive into Spark Streaming
Dive into Spark StreamingDive into Spark Streaming
Dive into Spark Streaming
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Spark Streaming - The simple way
Spark Streaming - The simple waySpark Streaming - The simple way
Spark Streaming - The simple way
 

Similar a Spark streaming

Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Tathagata Das
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17spark-project
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.pptAbhijitManna19
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.pptsnowflakebatch
 
strata spark streaming strata spark streamingsrata spark streaming
strata spark streaming strata spark streamingsrata spark streamingstrata spark streaming strata spark streamingsrata spark streaming
strata spark streaming strata spark streamingsrata spark streamingShidrokhGoudarzi1
 
What no one tells you about writing a streaming app
What no one tells you about writing a streaming appWhat no one tells you about writing a streaming app
What no one tells you about writing a streaming apphadooparchbook
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...Spark Summit
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.pptrveiga100
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData
 
Deep dive into spark streaming
Deep dive into spark streamingDeep dive into spark streaming
Deep dive into spark streamingTao Li
 
Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0Anyscale
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lightbend
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedMichael Spector
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsRavindra kumar
 

Similar a Spark streaming (20)

Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Spark streaming

  • 1. Noam Shaish: Spark Streaming. Scale, fault tolerance, high throughput.
  • 2. Agenda
    ❖ Overview
    ❖ Architecture
    ❖ Fault tolerance
    ❖ Why Spark Streaming? We have Storm
    ❖ Demo
  • 3. Overview
    ❖ Spark Streaming is an extension of the core Spark API. It enables scalable, high-throughput, fault-tolerant processing of live data streams.
    ❖ Connections are available for most common data sources, such as Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP sockets.
    ❖ Spark Streaming differs from most online processing solutions by adopting a mini-batch approach instead of treating the data as a continuous stream.
    ❖ Based on the Discretized Streams paper:
      Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing. Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica. Berkeley EECS (2012-12-14). www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf
  • 4. Overview
    Spark Streaming runs a streaming computation as a series of very small, deterministic batch jobs.
    [Diagram: live data stream → Spark Streaming → batches of X milliseconds → Spark → processed results]
    ❖ Chops the live stream into batches of X milliseconds.
    ❖ Spark treats each batch of data as an RDD.
    ❖ The processed results of the RDD operations are returned in batches.
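The chopping step can be illustrated without Spark at all. The following is a minimal plain-Scala sketch, not Spark's implementation: `MicroBatchSketch`, `Record`, `discretize` and `processBatches` are hypothetical names introduced only for this illustration, and arrival times are assumed to be non-negative. Unlike Spark Streaming, the sketch does not emit empty batches for intervals in which nothing arrived.

```scala
// Plain-Scala illustration of micro-batching; no Spark dependency.
// All names here are hypothetical and exist only for this sketch.
object MicroBatchSketch {
  // A record tagged with its arrival time in milliseconds (assumed >= 0).
  final case class Record(arrivalMs: Long, payload: String)

  // Assign each record to the batch interval that contains its arrival time
  // and return (batchStartMs, payloads) pairs in time order.
  def discretize(records: Seq[Record], batchMs: Long): Seq[(Long, Seq[String])] = {
    require(batchMs > 0, "batch interval must be positive")
    records
      .groupBy(r => r.arrivalMs / batchMs) // index of the batch interval
      .toSeq
      .sortBy(_._1)
      .map { case (idx, rs) => (idx * batchMs, rs.sortBy(_.arrivalMs).map(_.payload)) }
  }

  // Run the same deterministic job over every batch, one small batch job per interval.
  def processBatches[A](batches: Seq[(Long, Seq[String])])(job: Seq[String] => A): Seq[(Long, A)] =
    batches.map { case (start, data) => (start, job(data)) }
}
```

For example, calling `discretize` with a batch interval of 1000 ms groups records that arrived at 0, 400 and 999 ms into the first batch, and `processBatches(batches)(_.size)` then counts the records of each batch independently.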
  • 5. DStream, not just RDD
    Transformations: map(), flatMap(), filter(), count(), repartition(), union(), reduce(), countByValue(), reduceByKey(), join(), cogroup(), transform(), updateStateByKey()
    Output operations: print(), foreachRDD(), saveAsObjectFiles(), saveAsTextFiles(), saveAsHadoopFiles(), saveToCassandra()*
    Window operations: window(), countByWindow(), reduceByWindow(), reduceByKeyAndWindow(), countByValueAndWindow()
    * Provided by the DataStax Cassandra connector.
  • 6. Example 1 - DStream to RDD
        val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
    [Diagram: the Twitter Streaming API feeds the tweets DStream, a sequence of batches at t, t + 1, t + 2, t + 3; each batch is stored in memory as an RDD (immutable, distributed)]
  • 7. Example 1 - DStream to RDD relation
        val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
        val hashTags = tweets.flatMap(status => getTags(status))
    [Diagram: flatMap is applied to every batch (t, t + 1, t + 2, t + 3) of the tweets DStream; the new RDDs created for each batch form a new DStream, hashTags, e.g. [#hobbitch, #bilboleggins, …]]
  • 8. Example 1 - DStream to RDD
        val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
        val hashTags = tweets.flatMap(status => getTags(status))
        hashTags.saveToCassandra("keyspace", "tableName")
    [Diagram: flatMap turns each batch of the tweets DStream into a batch of the hashTags DStream, e.g. [#hobbitch, #bilboleggins, …]; every batch is then saved to Cassandra]
  • 9. Example 2 - DStream to RDD relation
        val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
        val hashTags = tweets.flatMap(status => getTags(status))
        val tagCounts = hashTags.countByValue()
    [Diagram: for every batch, flatMap produces hashTags from the tweets DStream, then map and reduceByKey produce the per-batch counts (tagCounts), e.g. [(#hobbitch, 10), (#bilboleggins, 34), …]]
  • 10. Example 3 - Count the hash tags over the last 10 minutes
        val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
        val hashTags = tweets.flatMap(status => getTags(status))
        val tagCounts = hashTags.window(Minutes(10), Seconds(1)).countByValue()
    [Annotations: window() is the sliding window operation; Minutes(10) is the window length; Seconds(1) is the sliding interval]
  • 11. Example 3 - Count the hash tags over the last 10 minutes
        val tagCounts = hashTags.window(Minutes(10), Seconds(1)).countByValue()
    [Diagram: a sliding window moves over the batches t-1, t, t+1, t+2, t+3 of hashTags; each result is a count over all the data in the window]
  • 12. Example 4 - Count hash tags over the last 10 minutes smartly
        val tagCounts = hashTags.countByValueAndWindow(Minutes(10), Seconds(1))
    [Diagram: as the sliding window moves over the batches t-1, t, t+1, t+2, t+3 of hashTags, the counts of the batch entering the window are added (+) and the counts of the batch leaving the window are subtracted (-)]
    A generalization of this smart window reduce exists: reduceByKeyAndWindow(reduce, inverseReduce, window, interval)
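The difference between the naive window of Example 3 and the incremental window of Example 4 can be sketched in plain Scala. This is an illustration of the add/subtract idea only, not Spark's code: `WindowCountSketch` and its functions are hypothetical names, the window length is expressed as a number of batches, and the window is assumed to slide by one batch at a time.

```scala
// Plain-Scala illustration of naive vs. incremental window counts; no Spark dependency.
// All names here are hypothetical and exist only for this sketch.
object WindowCountSketch {
  type Counts = Map[String, Long]

  // Count the occurrences of each value in a single batch.
  def countBatch(batch: Seq[String]): Counts =
    batch.groupBy(identity).map { case (k, v) => k -> v.size.toLong }

  // The "reduce" function: merge two count maps by adding.
  def add(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (k, n)) => acc.updated(k, acc.getOrElse(k, 0L) + n) }

  // The "inverse reduce" function: subtract counts and drop keys that reach zero.
  def subtract(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (k, n)) =>
      val left = acc.getOrElse(k, 0L) - n
      if (left <= 0L) acc - k else acc.updated(k, left)
    }

  // Naive: recount every batch inside the window each time the window slides.
  def naive(batches: Seq[Seq[String]], windowLen: Int): Seq[Counts] =
    batches.indices.map { i =>
      batches
        .slice(math.max(0, i - windowLen + 1), i + 1)
        .map(countBatch)
        .foldLeft(Map.empty[String, Long])(add)
    }

  // Incremental: previous window + batch entering - batch leaving.
  def incremental(batches: Seq[Seq[String]], windowLen: Int): Seq[Counts] = {
    val perBatch = batches.map(countBatch)
    perBatch.indices.scanLeft(Map.empty[String, Long]) { (prev, i) =>
      val withNew = add(prev, perBatch(i))
      if (i >= windowLen) subtract(withNew, perBatch(i - windowLen)) else withNew
    }.tail
  }
}
```

Both functions return the same counts for every window position; the incremental version touches only two batches per slide, which matters when the window (10 minutes) is much longer than the sliding interval (1 second).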
  • 13. Architecture
    ❖ Receivers divide the incoming data into mini-batches.
    ❖ The batch size is defined in milliseconds (best practice is more than 500 milliseconds).
    [Diagram: input streams → receivers → batches of input RDDs → Spark engine → batches of output RDDs, all inside Spark Streaming]
  • 14. Fault-tolerance
    ❖ The input RDDs are not generated from a fault-tolerant source.
    ❖ Received data is therefore replicated among worker nodes (default replication factor of 2).
    ❖ Stateful jobs should use checkpoints.
    ❖ Journaling (a write-ahead log), as in a database, can be activated.
    [Diagram: flatMap turns the Tweets RDD into the hashTags RDD; the input data is replicated in memory, and lost partitions are recomputed on other workers]
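The recovery idea on this slide (replicated input plus deterministic transformations) can be sketched in plain Scala. This is a toy model under stated assumptions, not Spark's mechanism: `LineageSketch`, `Cluster`, `readInput`, `extractTags` and `recompute` are hypothetical names, and `extractTags` merely stands in for the `getTags` helper used in the earlier examples.

```scala
// Plain-Scala toy model of recomputing a lost partition from replicated input.
// All names here are hypothetical and exist only for this sketch.
object LineageSketch {
  // worker name -> (input partition id -> replicated input data)
  final case class Cluster(replicas: Map[String, Map[Int, Seq[String]]]) {
    // Simulate the loss of one worker together with everything it held.
    def fail(worker: String): Cluster = Cluster(replicas - worker)

    // Read an input partition from any surviving replica.
    def readInput(partition: Int): Option[Seq[String]] =
      replicas.values.flatMap(_.get(partition)).headOption
  }

  // A deterministic transformation (stand-in for getTags on the slides).
  def extractTags(tweets: Seq[String]): Seq[String] =
    tweets.flatMap(_.split("\\s+")).filter(_.startsWith("#"))

  // Rebuild a derived partition by re-running the transformation on the input.
  // Returns None when no replica of the input survived.
  def recompute(cluster: Cluster, partition: Int): Option[Seq[String]] =
    cluster.readInput(partition).map(extractTags)
}
```

With a replication factor of 2, failing one worker still leaves a copy of the input, so `recompute` returns the same hashtags as before the failure; failing both workers yields `None`, which is the case that checkpoints and journaling are meant to cover.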
  • 15. Fault-tolerance
    ❖ Two kinds of data must be recovered in the event of a failure:
      • Data received and replicated: this data survives the failure of a single worker node, since a copy of it exists on one of the other nodes.
      • Data received but still buffered for replication: since this data has not been replicated yet, the only way to recover it is to fetch it from the source again.
  • 16. Fault-tolerance
    ❖ Two receiver semantics:
      • Reliable receiver: acknowledges the source only after the received data has been replicated. If the receiver fails, the buffered data is never acknowledged. When the receiver is restarted, the source resends that data, so no data is lost due to the failure.
      • Unreliable receiver: such receivers can lose data when they fail due to worker or driver failures.
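The two receiver semantics can be contrasted with a small plain-Scala model. It is a sketch under simplifying assumptions, not Spark's Receiver API: `ReliableReceiverSketch`, `ReplayableSource` and `receive` are hypothetical names, messages are assumed to be unique, and a crash is simulated with a boolean flag.

```scala
// Plain-Scala model of reliable vs. unreliable receivers; no Spark dependency.
// All names here are hypothetical and exist only for this sketch.
object ReliableReceiverSketch {
  // A source that keeps every message until it has been acknowledged.
  final class ReplayableSource(messages: Seq[String]) {
    private var unacked: Vector[String] = messages.toVector
    def pending: Vector[String] = unacked
    def ack(delivered: Seq[String]): Unit =
      unacked = unacked.filterNot(m => delivered.contains(m))
  }

  // One receive attempt. Returns the replicated store after the attempt.
  // reliable = true : acknowledge only after the data has been replicated.
  // reliable = false: acknowledge as soon as the data is buffered.
  def receive(
      source: ReplayableSource,
      store: Vector[String],
      reliable: Boolean,
      crashBeforeReplication: Boolean): Vector[String] = {
    val buffered = source.pending
    if (!reliable) source.ack(buffered) // the source forgets the data immediately
    if (crashBeforeReplication) store   // buffered data is gone, nothing was stored
    else {
      if (reliable) source.ack(buffered) // safe to acknowledge now
      store ++ buffered
    }
  }
}
```

With `reliable = true`, a crash leaves the messages pending at the source, so a second `receive` call after the restart stores them all. With `reliable = false`, the same crash leaves both the store and the source empty, which is the data loss described for unreliable receivers.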
  • 17. Fault-tolerance
    Deployment scenario: without write-ahead log
      Receiver failure: buffered data lost with unreliable receivers; zero data loss with reliable receivers and files
      Driver failure: buffered data lost with unreliable receivers; past data lost with all receivers; zero data loss with files
    Deployment scenario: with write-ahead log
      Receiver failure: zero data loss with receivers and files
      Driver failure: zero data loss with receivers and files
  • 18. Why Spark Streaming? We have Storm
  • 19. One model to rule them all
    ❖ Same model for offline AND online processing
    ❖ Common code base for offline AND online processing
    ❖ Fewer bugs caused by code duplication
    ❖ Fewer bugs caused by framework differences
    ❖ Increased developer productivity
  • 20. One stack to rule them all
    ❖ Explore the data interactively in the Spark shell to identify the problem:
        $ ./spark-shell
        scala> val file = sc.hadoopFile("smallLogs")
        ...
        scala> val filtered = file.filter(_.contains("ERROR"))
        ...
        scala> va
    ❖ Use the same code in Spark standalone to identify the problem in the production environment:
        object ProcessProductionData {
          def main(args: Array[String]) {
            val sc = new SparkContext(...)
            val file = sc.hadoopFile("productionLogs")
            val filtered = file.filter(_.contains("ERROR"))
            val mapped = filtered.map(...)
            ...
          }
        }
    ❖ Use similar code in Spark Streaming to monitor the problem online:
        object ProcessLiveStream {
          def main(args: Array[String]) {
            val sc = new StreamingContext(...)
            val stream = sc.kafkaStream(...)
            val filtered = stream.filter(_.contains("ERROR"))
            val mapped = filtered.map(...)
            ...
          }
        }
  • 21. Performance
    ❖ Higher throughput than Storm
      • Spark Streaming: 670k records/second/node
      • Storm: 115k records/second/node
    [Charts: Grep and WordCount; throughput per node (MB/s) versus record size (100 and 1000 bytes), Spark compared with Storm]
    Tested on 100 EC2 instances with 4 cores each. Comparison taken from Tathagata Das and Reynold Xin's Hadoop Summit 2013 presentation.
  • 25. Monitoring
    In addition, the StreamListener interface provides further information at various levels (application, job, task, etc.).
  • 27. Utilization
    ❖ Spark 1.2 introduces dynamic cluster resource allocation.
    ❖ Jobs can request more resources and release resources they no longer need.
    ❖ Available only on YARN.