SlideShare una empresa de Scribd logo
1 de 51
Building scalable IoT apps
using OSS technologies
Pavel Hardak
Basho Technologies
Disclaimer: some of the opinions expressed here are mine and might not fully agree with those of
IOT & INDUSTRY VERTICALS
IoT market - growth prediction
Number of connected “things”
• 2016 – about 6.4 B
• 30% YoY growth, 5.5M activations per day
• 2020 – about 21 B
“By 2020 more than half of new major business processes
and systems will incorporate some element of Internet of
Things”
Reality Check - let us get a second opinion
IoT Project Plan
• Investigate those “things” and figure out
• What protocols they support (CoAP, MQTT, HTTP, …)
• What data they generate (temperature, humidity, location, speed, ...)
• Collect this data in our data center
• Implement protocols and parsing routines
• Store into persistent storage (“Data Lake” architecture)
• Once stored in Data Lake
• Analyze, summarize, “slice and dice”
• Predict, discover insights
• Declare a victory – make profit & go for IPO
Data Lake
IoT
Devices
SQL
Apps &
AnalyticsMQTT, CoAP and
HTTP
REFERENCE ARCHITECTURE (?)
Not so fast, my friend.
What is wrong with “Data Lake” for IoT ?
Auto Insurance - Micro Case Study
• One of top 5 auto insurance companies in USA, appears in Fortune-500 list
• More than $10B in annual revenue, above $15B in assets
• About 20,000 employees and 50,000 insurance agents
• More than 19 million individual policies across all 50 states
How this “rating info” influences your payment ?
• Garaging Zip – what neighborhood is the car parked when it is
not used? There is a high correlation between Zip code and the
probability of car being stolen or vandalized.
• Current and Previous Annual Mileage – if the insured drives
for longer distances, it leads to the higher probability of road
accidents or car malfunctions.
• Vehicle Usage – do you use your car for work or pleasure? Are
you commuter, student, stay-at-home parent or Uber driver?
Depending on your usage, the company will calculate the risk
and adjust the rate.
• Years of Driving Experience – young drivers are put into
higher risk categories, where older people are considered safer
drivers due to more time behind the wheel. Note - average
young driver vs. average experienced driver.
Sampling Frequency and Dataset Size
• Mileage
• From one sample per year to 52 (weekly) or 365 (daily)
• Better - let us do hourly to “see” the car usage (commuter, …)
• Location (used to be “Garaging Zip”)
• From one sample per year to 365 (daily)
• Better - hourly, allows to learn when car is parked for several hours
• New factors for rating algorithm based on weekly summaries
• Hard brakes, hard accelerations, going above the speed limit, …
• Amount of time series data to be stored and analyzed
• Grows by factor of 365x, then by another 24x = 8760x
Each week – at least 50x more data than the whole previous year.
What is different special about IoT?
It is about the “things”… and more.
IoT Data Categories
Category Description
Metadata
&
Profiles
Devices Device info (model, SN, firmware, sensors, ..), configuration, owner, …
Users Personal info, preferences, billing info, registered devices, …
Time
Series
Ingested
(“Raw”)
Measurements, statuses and events from devices.
Aggregated
(“Derived”)
Calculated data - from devices & profiles
• Rollups – aggregate metrics from low resolution to higher ones (min -
hour – day) using min, max, avg, ...
• Aggregations – aggregate measurements, configuration and profiles
(model, region, …) over time ranges
IOT - NETWORKING TECHNOLOGIES
NETWORK WISH LIST
• Extreme Reliability
• Guaranteed Delivery
• End-to-End Low Latency
• Quality of Service
• Engineered Topology
• Committed Bandwidth (CIR)
• Fiber-optic network
• Dedicated Channel
• Strong Signal
• Interference and Crosstalk Resistant
• High SNR (Signal to Noise Ratio)
• Very Low BER (Bit Error Rate)
REALITY CHECK - LET US LOOK AGAIN
IOT & NETWORK - REALITY
• Wireless technologies
• Shared transmission media
• Limited bandwidth
• Mesh or Ad-hoc Topology
• Possible signals interference
• Mis-ordered or lost packets
• Low cost hardware components
• Low power radio transmitters
• Very small antennas
• “Custom-made” firmware
• Constrained Application Protocol
(CoAP)
• “Best Effort” QoS (“shoot and forget”)
IoT is “Big Data” - by definition.
Actually, lots and lots of Big Data.
Five “V”s IoT data
Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries: user and device
profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of
conflicts.
Five “V”s IoT data
Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device
profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of
conflicts.
Variety Sensors data (time series), users and devices profiles, also time series “derived” data
(e.g. rollups, aggregations).
Five “V”s IoT data
Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device
profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of
conflicts.
Variety Sensors data (time series), users and devices profiles, also time series “derived” data
(e.g. rollups, aggregations).
Volume Starts small, grows quickly, keeps coming 24x7x365 (nights, weekends and holidays).
Spikes up on new model launches or successful marketing campaign. But can slow down,
but will keep growing. Efficient data retention policy is critical to prevent overflows.
Five “V”s IoT data
Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device
profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of
conflicts.
Variety Sensors data (time series), users and devices profiles, also time series “derived” data
(e.g. rollups, aggregations).
Volume Starts small, grows quickly, keeps coming 24x7x365 (nights, weekends and holidays).
Spikes up on new model launches or successful marketing campaign. But can slow down,
but will keep growing. Efficient data retention policy is critical to prevent overflows.
Veracity Generally trustworthy, but beware of “low cost” sensors with low accuracy. Sent over not-
so-reliable transport - expect that some data will be corrupted or arrive late or might be
lost. (Hopefully the devices were not hijacked or impersonated by hackers)
Five “V”s IoT data
Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device
profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of
conflicts.
Variety Sensors data (time series), users and devices profiles, also time series “derived” data
(e.g. rollups, aggregations).
Volume Starts small, grows quickly, keeps coming 24x7x365 (nights, weekends and holidays).
Spikes up on new model launches or successful marketing campaign. But can slow down,
but will keep growing. Efficient data retention policy is critical to prevent overflows.
Veracity Generally trustworthy, but beware of “low cost” sensors with low accuracy. Sent over not-
so-reliable transport - expect that some data will be corrupted or arrive late or might be
lost. (Hopefully the devices were not hijacked or impersonated by hackers)
Value Profiles and summaries are much more valuable than raw data samples. The value of
“raw” time series quickly goes down was processed and clock advances. Aggregated
(”derived”) data are more valuable than raw data.
Exceptions: financial transactions, life support, nuclear plants, oil rigs, …
Five “V”s IoT data
Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device
profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of
conflicts.
Variety Sensors data (time series), users and devices profiles, also time series “derived” data
(e.g. rollups, aggregations).
Volume Starts small, grows quickly, keeps coming 24x7x365 (nights, weekends and holidays).
Spikes up on new model launches or successful marketing campaign. But can slow down,
but will keep growing. Efficient data retention policy is critical to prevent overflows.
Veracity Generally trustworthy, but beware of “low cost” sensors with low accuracy. Sent over not-
so-reliable transport - expect that some data will be corrupted or arrive late or might be
lost. (Hopefully the devices were not hijacked or impersonated by hackers)
Value Profiles and summaries are much more valuable than raw data samples. The value of
“raw” time series quickly goes down was processed and clock advances. Aggregated
(”derived”) data are more valuable than raw data.
Exceptions: financial transactions, life support, nuclear plants, oil rigs, …
Complexity Poly-structured using simple schemas and simple relations (usually implicit). Some data is
treated as unstructured (”opaque”) for speed or flexibility.
What architecture would work ?
Architectural Blueprints
• Lambda Architecture by Nathan Marz (ex-Twitter)
• Kappa Architecture by Jay Kreps (Confluent)
• Zeta Architecture by Jim Scott (MapR)
• … and their variants
Lambda
Kappa
Zeta
Data Processing Framework for IoT
• Uses “Best of breed” OSS technologies
• Combines two paradigms
• “Speed Layer” – pipeline for Stream Processing for “Data in Motion”
• “Serving Layer” – analytics for “Data in Motion” and “Data at Rest”
• Every component is “Distributed by Design”
• Collection Layer
• Message Queue
• Stream Processing
• Data Storage (Database, Object System, Data Warehouse)
• Query and Analytics Engines
Data Access Patterns
Category Description R:W %
Metadata
&
Profiles
Devices &
Users
Many low latency small reads - all over the dataset. Occasional
updates – possibly by different “actors” (web, device, app), conflicts
need to be prevented or resolved. Fewer creates and deletes.
90:10
Time
Series
Data Access Patterns
Category Description R:W %
Metadata
&
Profiles
Devices &
Users
Many low latency small reads - all over the dataset. Occasional
updates – possibly by different “actors” (web, device, app), conflicts
need to be prevented or resolved. Fewer creates and deletes.
90:10
Time
Series
Ingested
(“Raw”)
Very high throughout of relatively small writes. Most reads are over
recent time range “slice”. Updates are rare (corrections).
This category is a biggest part of the IoT application dataset.
10:90
Data Access Patterns
Category Description R:W %
Metadata
&
Profiles
Devices &
Users
Many low latency small reads - all over the dataset. Occasional
updates – possibly by different “actors” (web, device, app), conflicts
need to be prevented or resolved. Fewer creates and deletes.
90:10
Time
Series
Ingested
(“Raw”)
Very high throughout of relatively small writes. Most reads are over
recent time range “slice”. Updates are rare (corrections).
This category is a biggest part of the IoT application dataset.
10:90
Aggregated
(“Derived”)
Mostly reads – users, platform services, reports. Writes are
periodical on each time interval or from batch jobs.
80:20
Data store for IoT – “Wish list”
• Ingested (Raw) Time Series
• Very high write throughput
• Fast slice (time range) reads
• Aggregated (Derived) Time Series
• Auto-distributed + slice locality
• SQL-like queries
• Aggregations
• Bulk queries (analytics)
• Secondary Indexes (Tags)
• Efficient Storage
• Auto Data Retention (TTL)
• Build-in anti entropy
• Compression
• Hot Backups
• Profiles and Metadata
• Many concurrent reads with low latency
• Reliable writes (ACID or conflict
resolution)
• Unstructured or partially structured
• Secondary Indexes + Text Search
• Scalability and Availability
• Distributed architecture, no SPoF
• Linearly scalable - up and down
• Operational simplicity
• Master-less architecture
• Automatic rebalancing
• Metrics, logs, events
• Rolling upgrades
What DB type is a good fit for TS use cases?
Database Type For IoT or Time Series
Relational Key Value Document Wide Column Graph
MySQL Riak KV MongoDB Cassandra Neo4J
PostgreSQL DynamoDB CouchBase HBase Titan
Oracle Voldemort RethinkDB Accumulo Infinite Graph
There is a need for a new type of NoSQL database – Time
Series
None of existing DB types was designed to handle time series data
• Wide column DBs have high write throughput, but reads and updates are not their strength
• Key Value and Document DBs handle metadata well, but struggle with heavy writes and time-slicing
reads
• Relational - good with metadata (unless number of updates is high), but a bad choice for TS data
• Graph DB – not a good choice for either time series or metadata, can be added later on
Database Type For IoT or Time Series
Relational Key Value Document Wide Column Graph
MySQL Riak KV MongoDB Cassandra Neo4J
PostgreSQL DynamoDB CouchBase HBase Titan
Oracle Voldemort RethinkDB Accumulo Infinite Graph
Time Series
InfluxDB Riak TS Blueflood
KairosDB Prometeus Druid
OpenTSDB Dalmatiner Graphite
Iot Sensors Data – Hot to Cold
SENSORS DATA – HOT N’ COLD
Temp Purpose Description Immutable?
Boiling
Hot
App usage
Last known value(s) and/or for last N minutes, useful for
immediate responses, very frequently accessed
No
Hot Operational
dataset
Last 24 hours to several days or weeks (rarely months),
frequently accessed, dashboards and online analytics
Almost*
Warm Historical data
Older data, less frequently accessed, used mostly for
offline analytics and historical analysis
Yes
Cold Archives
Used only in rare situations, kept in long term storage for
regulatory or unpredicted purposes
Yes
STORAGE TIERS – FROM HOT TO COLD
RAM → Database (TSDB) → Object Storage → Archive
Data Lake
Temp Purpose Storage Products Immutable?
Boiling
Hot
App usage Internal app cache, Redis or Memcached No
Hot Operational
dataset
NoSQL Database (preferably Time Series DB)
Riak TS, OpenTSDB, KairosDB, Cassandra, HBase
Almost*
Warm Historical data
Object storage – HDFS (Hadoop), Ceph, Minio,
Riak S2 or AWS S3
Yes
Cold Archives Various Yes
STORAGE TIERS – REALITY CHECK
RAM → Database (TSDB) → Object Storage → Archive
Elastic Cache (Redis) → Database (Postgres, DynamoDB) → AWS S3 →
Glacier
Data Lake
Temp AWS Service Storage price, GB per month
Boiling Hot Elastic Cache (Redis) $15-45
Hot DynamoDB
RDS (Postgres)
$ 0.25-0.35 (SSD)
from $0.1 (Magnetic)
Warm Simple Storage Service (S3) $0.024 to $0.030
Cold Glacier $0.007
OSS technologies for scalable IoT apps
Component Open Source Technologies
Load balancer Ngnix, HA Proxy
Ingestion Kafka, RabbitMQ, ZeroMQ, Flume
Stream Computing Spark Streaming, Apache Flink, Kafka Streams, Samza
Time Series Store InfluxDB, KairosDB, Riak, Cassandra, OpenTSDB
Profiles Store CouchBase, Riak, MySQL, Postgres, MongoDB
Search Solr, Elastic Search
Object Storage HDFS (Hadoop), Minio, Riak S2, Ceph
Analytics Framework Apache Spark, MapReduce, Hive
SQL Query Engine Spark SQL, Presto, Impala, Drill
Cluster Manager Mesosphere DC/OS or Mesos, Kubernetes, Docker Swarm
Checklist for IoT technology stack
❑Is it vendor lock-in or open source software? Are there open APIs?
❑Can it be deployed in cloud? At the edge? In a data center? Using hybrid
approach?
❑Can it be used it for free or low cost (no big upfront investment)?
❑Can you develop your app on your laptop? How many “moving parts”?
❑Are the components pre-integrated or can be easily integrated together?
❑Can you easily scale each component in this architecture by 10x? 20x? 50x?
❑Is there a roadmap, actively worked on, which is aligned with your vision?
❑Is there a company behind the technology to provide 24x7 support when needed?
Come to Basho booth to learn about
• Riak TS (Time Series) - highly scalable NoSQL database for IoT and Time
Series
… and more
• Riak Spark Connector for Apache Spark
• Riak Integrations with Redis and Kafka
• Riak Mesos Framework (RMF) for DC/OS
QUESTIONS?
Building Scalable IoT Apps (QCon S-F)

Más contenido relacionado

La actualidad más candente

Powering the Internet of Things with Apache Hadoop
Powering the Internet of Things with Apache HadoopPowering the Internet of Things with Apache Hadoop
Powering the Internet of Things with Apache HadoopCloudera, Inc.
 
IOT Platform as a Service
IOT Platform as a ServiceIOT Platform as a Service
IOT Platform as a Servicekidozen
 
ParStream - Big Data for Business Users
ParStream - Big Data for Business UsersParStream - Big Data for Business Users
ParStream - Big Data for Business UsersParStream Inc.
 
IoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStream
IoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStreamIoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStream
IoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStreamgogo6
 
Overcoming the AIoT Obstacles through Smart Component Integration
Overcoming the AIoT Obstacles through Smart Component IntegrationOvercoming the AIoT Obstacles through Smart Component Integration
Overcoming the AIoT Obstacles through Smart Component IntegrationInnodisk Corporation
 
Architect Your IoT Platform for Success
Architect Your IoT Platform for SuccessArchitect Your IoT Platform for Success
Architect Your IoT Platform for SuccessSolace
 
CL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and PlanningCL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and PlanningCisco
 
Brian Isle: The Internet of Things: Manufacturing Panacea - or - Hacker's Dream?
Brian Isle: The Internet of Things: Manufacturing Panacea - or - Hacker's Dream?Brian Isle: The Internet of Things: Manufacturing Panacea - or - Hacker's Dream?
Brian Isle: The Internet of Things: Manufacturing Panacea - or - Hacker's Dream?360mnbsu
 
2015-09-16 IoT in Oil and Gas Conference
2015-09-16 IoT in Oil and Gas Conference2015-09-16 IoT in Oil and Gas Conference
2015-09-16 IoT in Oil and Gas ConferenceMark Reynolds
 
Short introduction to Big Data Analytics, the Internet of Things, and their s...
Short introduction to Big Data Analytics, the Internet of Things, and their s...Short introduction to Big Data Analytics, the Internet of Things, and their s...
Short introduction to Big Data Analytics, the Internet of Things, and their s...Andrei Khurshudov
 
Powering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big Data
Powering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big DataPowering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big Data
Powering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big DataDataWorks Summit
 
Connected barrels_IoT in Oil and Gas_deloitte
Connected barrels_IoT in Oil and Gas_deloitteConnected barrels_IoT in Oil and Gas_deloitte
Connected barrels_IoT in Oil and Gas_deloitteAnshu Mittal
 
Real-Time Communications and the Industrial Internet of Things
 Real-Time Communications and the Industrial Internet of Things Real-Time Communications and the Industrial Internet of Things
Real-Time Communications and the Industrial Internet of ThingsReal-Time Innovations (RTI)
 
HPE Presentation on Internet of Things at IoT World 2016 - Dubai
HPE Presentation on Internet of Things at IoT World 2016 - DubaiHPE Presentation on Internet of Things at IoT World 2016 - Dubai
HPE Presentation on Internet of Things at IoT World 2016 - DubaiAlpha Data
 
Accelerating the AIoT @ the EDGE
Accelerating the AIoT @ the EDGE Accelerating the AIoT @ the EDGE
Accelerating the AIoT @ the EDGE Amazon Web Services
 
Internet of Things Stack - Presentation Version
Internet of Things Stack - Presentation VersionInternet of Things Stack - Presentation Version
Internet of Things Stack - Presentation VersionPostscapes
 

La actualidad más candente (20)

Powering the Internet of Things with Apache Hadoop
Powering the Internet of Things with Apache HadoopPowering the Internet of Things with Apache Hadoop
Powering the Internet of Things with Apache Hadoop
 
IOT Platform as a Service
IOT Platform as a ServiceIOT Platform as a Service
IOT Platform as a Service
 
ParStream - Big Data for Business Users
ParStream - Big Data for Business UsersParStream - Big Data for Business Users
ParStream - Big Data for Business Users
 
IoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStream
IoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStreamIoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStream
IoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStream
 
AI as a Catalyst for IoT
AI as a Catalyst for IoTAI as a Catalyst for IoT
AI as a Catalyst for IoT
 
IoT-Use-Case-eBook
IoT-Use-Case-eBookIoT-Use-Case-eBook
IoT-Use-Case-eBook
 
The IOT scenario in the digital age
The IOT scenario in the digital ageThe IOT scenario in the digital age
The IOT scenario in the digital age
 
Overcoming the AIoT Obstacles through Smart Component Integration
Overcoming the AIoT Obstacles through Smart Component IntegrationOvercoming the AIoT Obstacles through Smart Component Integration
Overcoming the AIoT Obstacles through Smart Component Integration
 
Architect Your IoT Platform for Success
Architect Your IoT Platform for SuccessArchitect Your IoT Platform for Success
Architect Your IoT Platform for Success
 
CL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and PlanningCL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and Planning
 
Brian Isle: The Internet of Things: Manufacturing Panacea - or - Hacker's Dream?
Brian Isle: The Internet of Things: Manufacturing Panacea - or - Hacker's Dream?Brian Isle: The Internet of Things: Manufacturing Panacea - or - Hacker's Dream?
Brian Isle: The Internet of Things: Manufacturing Panacea - or - Hacker's Dream?
 
2015-09-16 IoT in Oil and Gas Conference
2015-09-16 IoT in Oil and Gas Conference2015-09-16 IoT in Oil and Gas Conference
2015-09-16 IoT in Oil and Gas Conference
 
Short introduction to Big Data Analytics, the Internet of Things, and their s...
Short introduction to Big Data Analytics, the Internet of Things, and their s...Short introduction to Big Data Analytics, the Internet of Things, and their s...
Short introduction to Big Data Analytics, the Internet of Things, and their s...
 
Powering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big Data
Powering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big DataPowering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big Data
Powering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big Data
 
World of Watson IoT Journey Map
World of Watson IoT Journey MapWorld of Watson IoT Journey Map
World of Watson IoT Journey Map
 
Connected barrels_IoT in Oil and Gas_deloitte
Connected barrels_IoT in Oil and Gas_deloitteConnected barrels_IoT in Oil and Gas_deloitte
Connected barrels_IoT in Oil and Gas_deloitte
 
Real-Time Communications and the Industrial Internet of Things
 Real-Time Communications and the Industrial Internet of Things Real-Time Communications and the Industrial Internet of Things
Real-Time Communications and the Industrial Internet of Things
 
HPE Presentation on Internet of Things at IoT World 2016 - Dubai
HPE Presentation on Internet of Things at IoT World 2016 - DubaiHPE Presentation on Internet of Things at IoT World 2016 - Dubai
HPE Presentation on Internet of Things at IoT World 2016 - Dubai
 
Accelerating the AIoT @ the EDGE
Accelerating the AIoT @ the EDGE Accelerating the AIoT @ the EDGE
Accelerating the AIoT @ the EDGE
 
Internet of Things Stack - Presentation Version
Internet of Things Stack - Presentation VersionInternet of Things Stack - Presentation Version
Internet of Things Stack - Presentation Version
 

Destacado

Event Driven Streaming Analytics - Demostration on Architecture of IoT
Event Driven Streaming Analytics - Demostration on Architecture of IoTEvent Driven Streaming Analytics - Demostration on Architecture of IoT
Event Driven Streaming Analytics - Demostration on Architecture of IoTLei Xu
 
Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Time Series Data with Apache Cassandra (ApacheCon EU 2014)Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Time Series Data with Apache Cassandra (ApacheCon EU 2014)Eric Evans
 
DataStax et Apache Cassandra pour la gestion des flux IoT
DataStax et Apache Cassandra pour la gestion des flux IoTDataStax et Apache Cassandra pour la gestion des flux IoT
DataStax et Apache Cassandra pour la gestion des flux IoTVictor Coustenoble
 
Armando scannone recopilación de recetas
Armando scannone   recopilación de recetasArmando scannone   recopilación de recetas
Armando scannone recopilación de recetasFree lancer
 
Rethinking Topology In Cassandra (ApacheCon NA)
Rethinking Topology In Cassandra (ApacheCon NA)Rethinking Topology In Cassandra (ApacheCon NA)
Rethinking Topology In Cassandra (ApacheCon NA)Eric Evans
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache CassandraEric Evans
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache CassandraEric Evans
 
Creator Ci40 IoT kit & Framework - scalable LWM2M IoT dev platform for business
Creator Ci40 IoT kit & Framework - scalable LWM2M IoT dev platform for businessCreator Ci40 IoT kit & Framework - scalable LWM2M IoT dev platform for business
Creator Ci40 IoT kit & Framework - scalable LWM2M IoT dev platform for businessPaul Evans
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandranickmbailey
 
Hands-on with AWS IoT
Hands-on with AWS IoTHands-on with AWS IoT
Hands-on with AWS IoTJulien SIMON
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache SparkJosef Adersberger
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit
 
Why Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldWhy Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldDean Wampler
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
A reference architecture for the internet of things
A reference architecture for the internet of thingsA reference architecture for the internet of things
A reference architecture for the internet of thingsCharles Gibbons
 
Introduction to Network Function Virtualization (NFV)
Introduction to Network Function Virtualization (NFV)Introduction to Network Function Virtualization (NFV)
Introduction to Network Function Virtualization (NFV)rjain51
 

Destacado (17)

Event Driven Streaming Analytics - Demostration on Architecture of IoT
Event Driven Streaming Analytics - Demostration on Architecture of IoTEvent Driven Streaming Analytics - Demostration on Architecture of IoT
Event Driven Streaming Analytics - Demostration on Architecture of IoT
 
Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Time Series Data with Apache Cassandra (ApacheCon EU 2014)Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Time Series Data with Apache Cassandra (ApacheCon EU 2014)
 
DataStax et Apache Cassandra pour la gestion des flux IoT
DataStax et Apache Cassandra pour la gestion des flux IoTDataStax et Apache Cassandra pour la gestion des flux IoT
DataStax et Apache Cassandra pour la gestion des flux IoT
 
Armando scannone recopilación de recetas
Armando scannone   recopilación de recetasArmando scannone   recopilación de recetas
Armando scannone recopilación de recetas
 
Rethinking Topology In Cassandra (ApacheCon NA)
Rethinking Topology In Cassandra (ApacheCon NA)Rethinking Topology In Cassandra (ApacheCon NA)
Rethinking Topology In Cassandra (ApacheCon NA)
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache Cassandra
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache Cassandra
 
Creator Ci40 IoT kit & Framework - scalable LWM2M IoT dev platform for business
Creator Ci40 IoT kit & Framework - scalable LWM2M IoT dev platform for businessCreator Ci40 IoT kit & Framework - scalable LWM2M IoT dev platform for business
Creator Ci40 IoT kit & Framework - scalable LWM2M IoT dev platform for business
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Hands-on with AWS IoT
Hands-on with AWS IoTHands-on with AWS IoT
Hands-on with AWS IoT
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
Why Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldWhy Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data World
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
A reference architecture for the internet of things
A reference architecture for the internet of thingsA reference architecture for the internet of things
A reference architecture for the internet of things
 
Introduction to Network Function Virtualization (NFV)
Introduction to Network Function Virtualization (NFV)Introduction to Network Function Virtualization (NFV)
Introduction to Network Function Virtualization (NFV)
 
IoT architecture
IoT architectureIoT architecture
IoT architecture
 

Similar a Building Scalable IoT Apps (QCon S-F)

Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...
Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...
Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...Spark Summit
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsAbhishekKumarAgrahar2
 
Lecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfLecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfahmedibrahimghnnam01
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxAIMLSEMINARS
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteRoger Barga
 
Dell Digital Transformation Through AI and Data Analytics Webinar
Dell Digital Transformation Through AI and  Data Analytics WebinarDell Digital Transformation Through AI and  Data Analytics Webinar
Dell Digital Transformation Through AI and Data Analytics WebinarBill Wong
 
SplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunk
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
How to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT OperationsHow to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT OperationsExtraHop Networks
 
CrateDB Machine Data Platform Webinar
CrateDB Machine Data Platform Webinar CrateDB Machine Data Platform Webinar
CrateDB Machine Data Platform Webinar Caroline Stewart
 
bigdataintro.pptx
bigdataintro.pptxbigdataintro.pptx
bigdataintro.pptxAlbert Alex
 
Harnessing Big Data_UCLA
Harnessing Big Data_UCLAHarnessing Big Data_UCLA
Harnessing Big Data_UCLAPaul Barsch
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataTreasure Data, Inc.
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes StrategicMapR Technologies
 
Predictive Analytics World Chicago 2015
Predictive Analytics World Chicago 2015Predictive Analytics World Chicago 2015
Predictive Analytics World Chicago 2015Dan Potter
 

Similar a Building Scalable IoT Apps (QCon S-F) (20)

Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...
Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...
Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea...
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
Lecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfLecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdf
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
 
Dell Digital Transformation Through AI and Data Analytics Webinar
Dell Digital Transformation Through AI and  Data Analytics WebinarDell Digital Transformation Through AI and  Data Analytics Webinar
Dell Digital Transformation Through AI and Data Analytics Webinar
 
SplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT Breakout
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
How to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT OperationsHow to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT Operations
 
CrateDB Machine Data Platform Webinar
CrateDB Machine Data Platform Webinar CrateDB Machine Data Platform Webinar
CrateDB Machine Data Platform Webinar
 
bigdataintro.pptx
bigdataintro.pptxbigdataintro.pptx
bigdataintro.pptx
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
DNA: an overview
DNA: an overviewDNA: an overview
DNA: an overview
 
Harnessing Big Data_UCLA
Harnessing Big Data_UCLAHarnessing Big Data_UCLA
Harnessing Big Data_UCLA
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_data
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
 
Predictive Analytics World Chicago 2015
Predictive Analytics World Chicago 2015Predictive Analytics World Chicago 2015
Predictive Analytics World Chicago 2015
 

Último

Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROmotivationalword821
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 

Último (20)

Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTRO
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 

Building Scalable IoT Apps (QCon S-F)

  • 1. Building scalable IoT apps using OSS technologies Pavel Hardak Basho Technologies Disclaimer: some of the opinions expressed here are mine and might not fully agree with those of
  • 2. IOT & INDUSTRY VERTICALS
  • 3. IoT market - growth prediction Number of connected “things” • 2016 – about 6.4 B • 30% YoY growth, 5.5M activations per day • 2020 – about 21 B “By 2020 more than half of new major business processes and systems will incorporate some element of Internet of Things”
  • 4. Reality Check - let us get a second opinion
  • 5.
  • 6. IoT Project Plan • Investigate those “things” and figure out • What protocols they support (CoAP, MQTT, HTTP, …) • What data they generate (temperature, humidity, location, speed, ...) • Collect this data in our data center • Implement protocols and parsing routines • Store into persistent storage (“Data Lake” architecture) • Once stored in Data Lake • Analyze, summarize, “slice and dice” • Predict, discover insights • Declare a victory – make profit & go for IPO
  • 7. Data Lake IoT Devices SQL Apps & AnalyticsMQTT, CoAP and HTTP REFERENCE ARCHITECTURE (?) Not so fast, my friend.
  • 8. What is wrong with “Data Lake” for IoT ?
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Auto Insurance - Micro Case Study • One of top 5 auto insurance companies in USA, appears in Fortune-500 list • More than $10B in annual revenue, above $15B in assets • About 20,000 employees and 50,000 insurance agents • More than 19 million individual policies across all 50 states
  • 14. How this “rating info” influences your payment ? • Garaging Zip – what neighborhood is the car parked when it is not used? There is a high correlation between Zip code and the probability of car being stolen or vandalized. • Current and Previous Annual Mileage – if the insured drives for longer distances, it leads to the higher probability of road accidents or car malfunctions. • Vehicle Usage – do you use your car for work or pleasure? Are you commuter, student, stay-at-home parent or Uber driver? Depending on your usage, the company will calculate the risk and adjust the rate. • Years of Driving Experience – young drivers are put into higher risk categories, where older people are considered safer drivers due to more time behind the wheel. Note - average young driver vs. average experienced driver.
  • 15.
  • 16. Sampling Frequency and Dataset Size • Mileage • From one sample per year to 52 (weekly) or 365 (daily) • Better - let us do hourly to “see” the car usage (commuter, …) • Location (used to be “Garaging Zip”) • From one sample per year to 365 (daily) • Better - hourly, allows to learn when car is parked for several hours • New factors for rating algorithm based on weekly summaries • Hard brakes, hard accelerations, going above the speed limit, … • Amount of time series data to be stored and analyzed • Grows by factor of 365x, then by another 24x = 8760x Each week – at least 50x more data than the whole previous year.
  • 17.
  • 18. What is different special about IoT? It is about the “things”… and more.
  • 19.
  • 20.
  • 21. IoT Data Categories Category Description Metadata & Profiles Devices Device info (model, SN, firmware, sensors, ..), configuration, owner, … Users Personal info, preferences, billing info, registered devices, … Time Series Ingested (“Raw”) Measurements, statuses and events from devices. Aggregated (“Derived”) Calculated data - from devices & profiles • Rollups – aggregate metrics from low resolution to higher ones (min - hour – day) using min, max, avg, ... • Aggregations – aggregate measurements, configuration and profiles (model, region, …) over time ranges
  • 22. IOT - NETWORKING TECHNOLOGIES
  • 23. NETWORK WISH LIST • Extreme Reliability • Guaranteed Delivery • End-to-End Low Latency • Quality of Service • Engineered Topology • Committed Bandwidth (CIR) • Fiber-optic network • Dedicated Channel • Strong Signal • Interference and Crosstalk Resistant • High SNR (Signal to Noise Ratio) • Very Low BER (Bit Error Rate)
  • 24. REALITY CHECK - LET US LOOK AGAIN
  • 25. IOT & NETWORK - REALITY • Wireless technologies • Shared transmission media • Limited bandwidth • Mesh or Ad-hoc Topology • Possible signals interference • Mis-ordered or lost packets • Low cost hardware components • Low power radio transmitters • Very small antennas • “Custom-made” firmware • Constrained Application Protocol (CoAP) • “Best Effort” QoS (“shoot and forget”)
  • 26. IoT is “Big Data” - by definition. Actually, lots and lots of Big Data.
  • 27. Five “V”s IoT data Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries: user and device profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of conflicts.
  • 28. Five “V”s IoT data Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of conflicts. Variety Sensors data (time series), users and devices profiles, also time series “derived” data (e.g. rollups, aggregations).
  • 29. Five “V”s IoT data Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of conflicts. Variety Sensors data (time series), users and devices profiles, also time series “derived” data (e.g. rollups, aggregations). Volume Starts small, grows quickly, keeps coming 24x7x365 (nights, weekends and holidays). Spikes up on new model launches or successful marketing campaign. But can slow down, but will keep growing. Efficient data retention policy is critical to prevent overflows.
  • 30. Five “V”s IoT data Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of conflicts. Variety Sensors data (time series), users and devices profiles, also time series “derived” data (e.g. rollups, aggregations). Volume Starts small, grows quickly, keeps coming 24x7x365 (nights, weekends and holidays). Spikes up on new model launches or successful marketing campaign. But can slow down, but will keep growing. Efficient data retention policy is critical to prevent overflows. Veracity Generally trustworthy, but beware of “low cost” sensors with low accuracy. Sent over not- so-reliable transport - expect that some data will be corrupted or arrive late or might be lost. (Hopefully the devices were not hijacked or impersonated by hackers)
  • 31. Five “V”s IoT data Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of conflicts. Variety Sensors data (time series), users and devices profiles, also time series “derived” data (e.g. rollups, aggregations). Volume Starts small, grows quickly, keeps coming 24x7x365 (nights, weekends and holidays). Spikes up on new model launches or successful marketing campaign. But can slow down, but will keep growing. Efficient data retention policy is critical to prevent overflows. Veracity Generally trustworthy, but beware of “low cost” sensors with low accuracy. Sent over not- so-reliable transport - expect that some data will be corrupted or arrive late or might be lost. (Hopefully the devices were not hijacked or impersonated by hackers) Value Profiles and summaries are much more valuable than raw data samples. The value of “raw” time series quickly goes down was processed and clock advances. Aggregated (”derived”) data are more valuable than raw data. Exceptions: financial transactions, life support, nuclear plants, oil rigs, …
  • 32. Five “V”s IoT data Velocity Torrent of small writes (sensors). Reads – millions of low-latency queries, user and device profiles, range queries for TS data (slices). Stream of updates (profiles) - beware of conflicts. Variety Sensors data (time series), users and devices profiles, also time series “derived” data (e.g. rollups, aggregations). Volume Starts small, grows quickly, keeps coming 24x7x365 (nights, weekends and holidays). Spikes up on new model launches or successful marketing campaign. But can slow down, but will keep growing. Efficient data retention policy is critical to prevent overflows. Veracity Generally trustworthy, but beware of “low cost” sensors with low accuracy. Sent over not- so-reliable transport - expect that some data will be corrupted or arrive late or might be lost. (Hopefully the devices were not hijacked or impersonated by hackers) Value Profiles and summaries are much more valuable than raw data samples. The value of “raw” time series quickly goes down was processed and clock advances. Aggregated (”derived”) data are more valuable than raw data. Exceptions: financial transactions, life support, nuclear plants, oil rigs, … Complexity Poly-structured using simple schemas and simple relations (usually implicit). Some data is treated as unstructured (”opaque”) for speed or flexibility.
  • 34. Architectural Blueprints • Lambda Architecture by Nathan Marz (ex-Twitter) • Kappa Architecture by Jay Kreps (Confluent) • Zeta Architecture by Jim Scott (MapR) • … and their variants Lambda Kappa Zeta
  • 35. Data Processing Framework for IoT • Uses “Best of breed” OSS technologies • Combines two paradigms • “Speed Layer” – pipeline for Stream Processing for “Data in Motion” • “Serving Layer” – analytics for “Data in Motion” and “Data at Rest” • Every component is “Distributed by Design” • Collection Layer • Message Queue • Stream Processing • Data Storage (Database, Object System, Data Warehouse) • Query and Analytics Engines
  • 36. Data Access Patterns Category Description R:W % Metadata & Profiles Devices & Users Many low latency small reads - all over the dataset. Occasional updates – possibly by different “actors” (web, device, app), conflicts need to be prevented or resolved. Fewer creates and deletes. 90:10 Time Series
  • 37. Data Access Patterns Category Description R:W % Metadata & Profiles Devices & Users Many low latency small reads - all over the dataset. Occasional updates – possibly by different “actors” (web, device, app), conflicts need to be prevented or resolved. Fewer creates and deletes. 90:10 Time Series Ingested (“Raw”) Very high throughout of relatively small writes. Most reads are over recent time range “slice”. Updates are rare (corrections). This category is a biggest part of the IoT application dataset. 10:90
  • 38. Data Access Patterns Category Description R:W % Metadata & Profiles Devices & Users Many low latency small reads - all over the dataset. Occasional updates – possibly by different “actors” (web, device, app), conflicts need to be prevented or resolved. Fewer creates and deletes. 90:10 Time Series Ingested (“Raw”) Very high throughout of relatively small writes. Most reads are over recent time range “slice”. Updates are rare (corrections). This category is a biggest part of the IoT application dataset. 10:90 Aggregated (“Derived”) Mostly reads – users, platform services, reports. Writes are periodical on each time interval or from batch jobs. 80:20
  • 39. Data store for IoT – “Wish list” • Ingested (Raw) Time Series • Very high write throughput • Fast slice (time range) reads • Aggregated (Derived) Time Series • Auto-distributed + slice locality • SQL-like queries • Aggregations • Bulk queries (analytics) • Secondary Indexes (Tags) • Efficient Storage • Auto Data Retention (TTL) • Build-in anti entropy • Compression • Hot Backups • Profiles and Metadata • Many concurrent reads with low latency • Reliable writes (ACID or conflict resolution) • Unstructured or partially structured • Secondary Indexes + Text Search • Scalability and Availability • Distributed architecture, no SPoF • Linearly scalable - up and down • Operational simplicity • Master-less architecture • Automatic rebalancing • Metrics, logs, events • Rolling upgrades
  • 40. What DB type is a good fit for TS use cases?
  • 41. Database Type For IoT or Time Series Relational Key Value Document Wide Column Graph MySQL Riak KV MongoDB Cassandra Neo4J PostgreSQL DynamoDB CouchBase HBase Titan Oracle Voldemort RethinkDB Accumulo Infinite Graph There is a need for a new type of NoSQL database – Time Series None of existing DB types was designed to handle time series data • Wide column DBs have high write throughput, but reads and updates are not their strength • Key Value and Document DBs handle metadata well, but struggle with heavy writes and time-slicing reads • Relational - good with metadata (unless number of updates is high), but a bad choice for TS data • Graph DB – not a good choice for either time series or metadata, can be added later on
  • 42. Database Type For IoT or Time Series Relational Key Value Document Wide Column Graph MySQL Riak KV MongoDB Cassandra Neo4J PostgreSQL DynamoDB CouchBase HBase Titan Oracle Voldemort RethinkDB Accumulo Infinite Graph Time Series InfluxDB Riak TS Blueflood KairosDB Prometeus Druid OpenTSDB Dalmatiner Graphite
  • 43. Iot Sensors Data – Hot to Cold
  • 44. SENSORS DATA – HOT N’ COLD Temp Purpose Description Immutable? Boiling Hot App usage Last known value(s) and/or for last N minutes, useful for immediate responses, very frequently accessed No Hot Operational dataset Last 24 hours to several days or weeks (rarely months), frequently accessed, dashboards and online analytics Almost* Warm Historical data Older data, less frequently accessed, used mostly for offline analytics and historical analysis Yes Cold Archives Used only in rare situations, kept in long term storage for regulatory or unpredicted purposes Yes
  • 45. STORAGE TIERS – FROM HOT TO COLD RAM → Database (TSDB) → Object Storage → Archive Data Lake Temp Purpose Storage Products Immutable? Boiling Hot App usage Internal app cache, Redis or Memcached No Hot Operational dataset NoSQL Database (preferably Time Series DB) Riak TS, OpenTSDB, KairosDB, Cassandra, HBase Almost* Warm Historical data Object storage – HDFS (Hadoop), Ceph, Minio, Riak S2 or AWS S3 Yes Cold Archives Various Yes
  • 46. STORAGE TIERS – REALITY CHECK RAM → Database (TSDB) → Object Storage → Archive Elastic Cache (Redis) → Database (Postgres, DynamoDB) → AWS S3 → Glacier Data Lake Temp AWS Service Storage price, GB per month Boiling Hot Elastic Cache (Redis) $15-45 Hot DynamoDB RDS (Postgres) $ 0.25-0.35 (SSD) from $0.1 (Magnetic) Warm Simple Storage Service (S3) $0.024 to $0.030 Cold Glacier $0.007
  • 47. OSS technologies for scalable IoT apps Component Open Source Technologies Load balancer Ngnix, HA Proxy Ingestion Kafka, RabbitMQ, ZeroMQ, Flume Stream Computing Spark Streaming, Apache Flink, Kafka Streams, Samza Time Series Store InfluxDB, KairosDB, Riak, Cassandra, OpenTSDB Profiles Store CouchBase, Riak, MySQL, Postgres, MongoDB Search Solr, Elastic Search Object Storage HDFS (Hadoop), Minio, Riak S2, Ceph Analytics Framework Apache Spark, MapReduce, Hive SQL Query Engine Spark SQL, Presto, Impala, Drill Cluster Manager Mesosphere DC/OS or Mesos, Kubernetes, Docker Swarm
  • 48. Checklist for IoT technology stack ❑Is it vendor lock-in or open source software? Are there open APIs? ❑Can it be deployed in cloud? At the edge? In a data center? Using hybrid approach? ❑Can it be used it for free or low cost (no big upfront investment)? ❑Can you develop your app on your laptop? How many “moving parts”? ❑Are the components pre-integrated or can be easily integrated together? ❑Can you easily scale each component in this architecture by 10x? 20x? 50x? ❑Is there a roadmap, actively worked on, which is aligned with your vision? ❑Is there a company behind the technology to provide 24x7 support when needed?
  • 49. Come to Basho booth to learn about • Riak TS (Time Series) - highly scalable NoSQL database for IoT and Time Series … and more • Riak Spark Connector for Apache Spark • Riak Integrations with Redis and Kafka • Riak Mesos Framework (RMF) for DC/OS