SlideShare a Scribd company logo
1 of 35
Download to read offline
Spark Streaming and IoT
Michael J. Freedman
iobeam
Technology confluence in IoT
UBIQUITOUS SENSORS
REAL-TIME
SYSTEMS
MACHINE
LEARNING
DATA
ANALYSIS
INTERSECTION OF 3 MAJOR TRENDS
Data analysis is the killer app
CASE STUDY: PREDICTIVE MAINTENANCE
Predicting motor failure through
analysis of vibration data
CASE STUDY: HEALTH & FITNESS
Exercise identification based on
3D motion data analysis
CASE STUDY: SMART CITIES
Traffic and air quality monitoring via
GPS and environmental sensor
CASE STUDY: SMART GRID
Demand-response optimizations on
supply-side capacity, spot prices
Challenges in applying Spark to IoT
REQUIREMENTS
2
Devices send data at varying
delays and rates
2 Handling delayed data transparently
3
Processing many low-volume,
independent streams
1
One IoT app performs tasks
at different time intervals
1
Supporting full spectrum of
batch to real-time analysis
3
Within org, multiple IoT apps
run concurrently
4
Multi-tenancy with low-volume apps
and high utilization
CHALLENGES
Potential economic impact of IoT is >$11 trillion per year,
even while 99% of IoT data goes unused today.
— 2015 McKinsey study
Required: Programming + data infra abstractions
Supporting full spectrum of
batch to real-time analysis
1
IoT analysis spans many intervals
BATCH PROCESSING
(HOURS, NIGHTLY)
STREAM PROCESSING
(REAL-TIME)
Fire / Hazard Detection
Immediately
Bus Location Updates
15 sec
Traffi
c Conditions
1 min
Environmental Conditions
15 min
Traffi
c Optimizations
Daily
Spark simplifies programming across intervals
val readings = iobeamInterface.getInputStreamRecords()
// Trigger temperatures that fall outside acceptable conditions
val bad_temps = readings.filter(t => t > highTempThreshold || t < lowTempThreshold)
val triggers = new TriggerEventStream(bad_temps.map(t => new TriggerEvent("bad_temperature", t)))
// Compute mean temperatures over 5 min windows
val windows = readings.groupByKeyAndWindow(Seconds( 300 ), Seconds(60))
val mean_temps = new TimeSeriesStream("mean_temperature", windows.map(t => t.sum / t.length))
new OutputStreams(mean_temps, triggers)
30
1800
But programming != data abstractions
DATA STREAMS
(KAFKA, FLUME, SOCKETS, ETC.)
DATA FILES
(HDFS, ETC.)
BATCH PROCESSING
(HOURS, NIGHTLY)
STREAM PROCESSING
(REAL-TIME)
Programming != data abstractions
Traffi
c Conditions
30
sec
Traffi
c Conditions
1 hour
Frequencies change as products evolve1
Programming != data abstractions
Joining real-time with historical data2
Frequencies change as products evolve1
5 min mean vs. trailing - hourly mean
- hourly mean from yesterday
- hourly mean from last week
Programming != data abstractions
Joining real-time with historical data2
Supporting backfill for delayed data3
Frequencies change as products evolve1
Programming != data abstractions
Joining real-time with historical data2
Supporting backfill for delayed data3
Frequencies change as products evolve1
Data Series Abstraction
Handling delayed data transparently2
Windows in streaming DBs
Tumbling windows
titjtk
Windows in streaming DBs
Sliding windows
• Defined over # of tuples
• Defined over time period
…using arrival_time of tuples
titjtk
But IoT data is often delayed
Seconds due to
network congestion
Minutes due to duty cycling
for energy savings
Minutes to hours due to
intermittent connectivity
Windowing data by arrival time has no semantic meaning
titjtk
Wanted: Data generation time, not arrival time
titjtk
filter by timestamp
Data semantics defined over timestamp
e.g., aggregation
JOIN ( , historical data)
Wanted: Backfill does not change semantics
titjtk
filter by timestamp
Data semantics defined over timestamp
e.g., aggregation
JOIN ( , historical data)
…from recent streaming data…
…from historical archive…
Wanted: Better data infra abstractions
titjtk
filter by timestamp
e.g., aggregation
JOIN ( , historical data)
Data Series Abstraction
…from recent streaming data…
…from historical archive…
Processing many low-volume,
independent streams
3
IoT Device Streams
Wanted: Maintain state across batches
Map to good/bad conditions
titjtk
Alert on condition transition
Spark: Share state through RDDs
Click streams
Ad impressions
Market feeds
Shared state b/w
stream partitions
‣ Transforms RDD, makes state available across cluster
‣ Many great uses, e.g., learning parameters in iterative ML
‣ But increases data lineage increases checkpointing cost
Maintain shared state via updateKeyByState()
IoT: Many independent streams
‣ Each worker handles 1+ streams, not multiple workers per stream
‣ Use language data structures (e.g., Java Map) to maintain state within worker
‣ No RDD transform no lineage increase no increased checkpointing cost
Independent state
per stream
IoT Device Streams
Often only need to maintain state within individual streams
Multi-tenancy with low-volume apps
and high utilization
4
Multi-tenancy for batch processing
Job Queue
Server Cores
Spark: 1 worker = 1 server core
Goal: Minimize time-to-completion
Multi-tenancy for batch processing
Job Queue
Server Cores
Spark: 1 worker = 1 server core
Goal: Minimize time-to-completion
Multi-tenancy for stream processing
Job Queue
Server Cores
Spark: 1 worker = 1 server core
Problem: Low utilization with low-rate apps
Multi-tenancy for stream processing
Job Queue
Server Cores
Virtual Cores
(e.g., resource-limited
containers)
1 worker = 1 virtual core
N workers = 1 server core
Goal: Improve utilization with low-rate apps
Multi-tenancy for stream processing
Job Queue
Server Cores
Virtual Cores
(e.g., resource-limited
containers)
1 worker = 1 virtual core
N workers = 1 server core
Goal: Improve utilization with low-rate apps
Spark + Unified Data Infrastructure
Required: Programming + data infra abstractions
Required: Programming + data infra abstractions
Device-Model-Infra (DMI) framework for IoT
Questions?
Developers: docs.iobeam.com
Whitepaper: www.iobeam.com/docs/iobeam-DMI.pdf

More Related Content

What's hot

Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Simplilearn
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 

What's hot (20)

Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
Modern real-time streaming architectures
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architectures
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Kafka Retry and DLQ
Kafka Retry and DLQKafka Retry and DLQ
Kafka Retry and DLQ
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
YARN Federation
YARN Federation YARN Federation
YARN Federation
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 

Similar to Spark Streaming and IoT by Mike Freedman

So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer Overlords
Ian Foster
 
Rpi talk foster september 2011
Rpi talk foster september 2011Rpi talk foster september 2011
Rpi talk foster september 2011
Ian Foster
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 

Similar to Spark Streaming and IoT by Mike Freedman (20)

Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsight
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
Wikibon #IoT #HyperConvergence Presentation via @theCUBE
Wikibon #IoT #HyperConvergence Presentation via @theCUBE Wikibon #IoT #HyperConvergence Presentation via @theCUBE
Wikibon #IoT #HyperConvergence Presentation via @theCUBE
 
Hyper-Convergence CrowdChat
Hyper-Convergence CrowdChatHyper-Convergence CrowdChat
Hyper-Convergence CrowdChat
 
ParStream - Big Data for Business Users
ParStream - Big Data for Business UsersParStream - Big Data for Business Users
ParStream - Big Data for Business Users
 
So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer Overlords
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Future Grid Overview 2018
Future Grid Overview 2018Future Grid Overview 2018
Future Grid Overview 2018
 
RTI Data-Distribution Service (DDS) Master Class 2011
RTI Data-Distribution Service (DDS) Master Class 2011RTI Data-Distribution Service (DDS) Master Class 2011
RTI Data-Distribution Service (DDS) Master Class 2011
 
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 
EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?
 
Rpi talk foster september 2011
Rpi talk foster september 2011Rpi talk foster september 2011
Rpi talk foster september 2011
 
Les objets connectés : de nombreux cas d'usage
Les objets connectés : de nombreux cas d'usage Les objets connectés : de nombreux cas d'usage
Les objets connectés : de nombreux cas d'usage
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
 

More from Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
HyderabadDolls
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 

Recently uploaded (20)

Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 

Spark Streaming and IoT by Mike Freedman

  • 1. Spark Streaming and IoT Michael J. Freedman iobeam
  • 2. Technology confluence in IoT UBIQUITOUS SENSORS REAL-TIME SYSTEMS MACHINE LEARNING DATA ANALYSIS INTERSECTION OF 3 MAJOR TRENDS
  • 3. Data analysis is the killer app CASE STUDY: PREDICTIVE MAINTENANCE Predicting motor failure through analysis of vibration data CASE STUDY: HEALTH & FITNESS Exercise identification based on 3D motion data analysis CASE STUDY: SMART CITIES Traffic and air quality monitoring via GPS and environmental sensor CASE STUDY: SMART GRID Demand-response optimizations on supply-side capacity, spot prices
  • 4. Challenges in applying Spark to IoT REQUIREMENTS 2 Devices send data at varying delays and rates 2 Handling delayed data transparently 3 Processing many low-volume, independent streams 1 One IoT app performs tasks at different time intervals 1 Supporting full spectrum of batch to real-time analysis 3 Within org, multiple IoT apps run concurrently 4 Multi-tenancy with low-volume apps and high utilization CHALLENGES Potential economic impact of IoT is >$11 trillion per year, even while 99% of IoT data goes unused today. — 2015 McKinsey study
  • 5. Required: Programming + data infra abstractions
  • 6. Supporting full spectrum of batch to real-time analysis 1
  • 7. IoT analysis spans many intervals BATCH PROCESSING (HOURS, NIGHTLY) STREAM PROCESSING (REAL-TIME) Fire / Hazard Detection Immediately Bus Location Updates 15 sec Traffi c Conditions 1 min Environmental Conditions 15 min Traffi c Optimizations Daily
  • 8. Spark simplifies programming across intervals val readings = iobeamInterface.getInputStreamRecords() // Trigger temperatures that fall outside acceptable conditions val bad_temps = readings.filter(t => t > highTempThreshold || t < lowTempThreshold) val triggers = new TriggerEventStream(bad_temps.map(t => new TriggerEvent("bad_temperature", t))) // Compute mean temperatures over 5 min windows val windows = readings.groupByKeyAndWindow(Seconds( 300 ), Seconds(60)) val mean_temps = new TimeSeriesStream("mean_temperature", windows.map(t => t.sum / t.length)) new OutputStreams(mean_temps, triggers) 30 1800
  • 9. But programming != data abstractions DATA STREAMS (KAFKA, FLUME, SOCKETS, ETC.) DATA FILES (HDFS, ETC.) BATCH PROCESSING (HOURS, NIGHTLY) STREAM PROCESSING (REAL-TIME)
  • 10. Programming != data abstractions Traffi c Conditions 30 sec Traffi c Conditions 1 hour Frequencies change as products evolve1
  • 11. Programming != data abstractions Joining real-time with historical data2 Frequencies change as products evolve1 5 min mean vs. trailing - hourly mean - hourly mean from yesterday - hourly mean from last week
  • 12. Programming != data abstractions Joining real-time with historical data2 Supporting backfill for delayed data3 Frequencies change as products evolve1
  • 13. Programming != data abstractions Joining real-time with historical data2 Supporting backfill for delayed data3 Frequencies change as products evolve1 Data Series Abstraction
  • 14. Handling delayed data transparently2
  • 15. Windows in streaming DBs Tumbling windows titjtk
  • 16. Windows in streaming DBs Sliding windows • Defined over # of tuples • Defined over time period …using arrival_time of tuples titjtk
  • 17. But IoT data is often delayed Seconds due to network congestion Minutes due to duty cycling for energy savings Minutes to hours due to intermittent connectivity Windowing data by arrival time has no semantic meaning titjtk
  • 18. Wanted: Data generation time, not arrival time titjtk filter by timestamp Data semantics defined over timestamp e.g., aggregation JOIN ( , historical data)
  • 19. Wanted: Backfill does not change semantics titjtk filter by timestamp Data semantics defined over timestamp e.g., aggregation JOIN ( , historical data) …from recent streaming data… …from historical archive…
  • 20. Wanted: Better data infra abstractions titjtk filter by timestamp e.g., aggregation JOIN ( , historical data) Data Series Abstraction …from recent streaming data… …from historical archive…
  • 21. Processing many low-volume, independent streams 3 IoT Device Streams
  • 22. Wanted: Maintain state across batches Map to good/bad conditions titjtk Alert on condition transition
  • 23. Spark: Share state through RDDs Click streams Ad impressions Market feeds Shared state b/w stream partitions ‣ Transforms RDD, makes state available across cluster ‣ Many great uses, e.g., learning parameters in iterative ML ‣ But increases data lineage increases checkpointing cost Maintain shared state via updateKeyByState()
  • 24. IoT: Many independent streams ‣ Each worker handles 1+ streams, not multiple workers per stream ‣ Use language data structures (e.g., Java Map) to maintain state within worker ‣ No RDD transform no lineage increase no increased checkpointing cost Independent state per stream IoT Device Streams Often only need to maintain state within individual streams
  • 25. Multi-tenancy with low-volume apps and high utilization 4
  • 26. Multi-tenancy for batch processing Job Queue Server Cores Spark: 1 worker = 1 server core Goal: Minimize time-to-completion
  • 27. Multi-tenancy for batch processing Job Queue Server Cores Spark: 1 worker = 1 server core Goal: Minimize time-to-completion
  • 28. Multi-tenancy for stream processing Job Queue Server Cores Spark: 1 worker = 1 server core Problem: Low utilization with low-rate apps
  • 29. Multi-tenancy for stream processing Job Queue Server Cores Virtual Cores (e.g., resource-limited containers) 1 worker = 1 virtual core N workers = 1 server core Goal: Improve utilization with low-rate apps
  • 30. Multi-tenancy for stream processing Job Queue Server Cores Virtual Cores (e.g., resource-limited containers) 1 worker = 1 virtual core N workers = 1 server core Goal: Improve utilization with low-rate apps
  • 31. Spark + Unified Data Infrastructure
  • 32. Required: Programming + data infra abstractions
  • 33. Required: Programming + data infra abstractions