© 2016 Silver Spring Networks. All rights reserved.
Silver Spring Networks
Greg Brosman
Product Manager
SilverLink Data Platform
Silver Spring Networks
• Silver Spring Networks helps global utilities and cities connect, optimize, and manage smart energy and smart city infrastructure
• Over 22 million connected devices
• 200B records read per year
• 2 million remote operations per year
Integrate Renewables
Engage Customers
Improve Operational Efficiency
Improve Reliability
Manage Peak
Automate Measurement
Improve Energy Efficiency
Reduce Truck Rolls for Device Maintenance
More Devices, More Data
• How can we do more with our network?
- We deployed a network to support meter reading. It works great, but we’re ready for the next thing to leverage these investments
• How do we manage these new devices and make all this data accessible and secure?
- There are lots of opportunities to enhance our service by making use of advanced analytics, but we can’t get the data to the right people
• How can we reduce the cost, time, and pain of integrating with 3rd party apps?
- The ecosystem of 3rd party apps is growing, but we need a scalable way to connect apps with data
Managing the volume, variety, and velocity of data
SilverLink Data Platform
• Automatically ingest smart grid data
• Enrich data with valuable context
• Enable real-time and batch applications
• Archive raw and enriched data
• Connect apps through standard APIs
• Explore data through BI tool integrations
Seamlessly connecting apps with sensor data
Diagram: SilverLink Data Platform
• Data Sources: Devices, Silver Spring Networks Data, Utility Data, 3rd Party Data
• Data Ingestion
• Real-Time; Storage & Batch
• Security & API Management
• Applications: Silver Spring Networks Apps, 3rd Party Apps, In-House Apps
Starfish
• A Worldwide Wireless IPv6 Network Service for the IoT. Starfish enables cities, utilities, enterprises, and developers to connect and manage a new generation of intelligent devices
• Focus areas include water, energy, food, traffic, transportation and safety
• 2016 Global IoT Hackathon Series: an opportunity to develop and test innovations and collaborate with leading IoT technologists
Building a new ecosystem of IoT services
IoT Big Data Ingestion & Processing in Hadoop
Darin Nee
Silver Spring Networks
Agenda
• Context & scope of our use case
• Tour a DataTorrent app we built
• Some technical hurdles & solutions we came up with
• Q & A
Kinds of Data
• Sensor reads
• Meter register reads & interval data
• Threshold events, traps
• Device metadata
Data Flow
• NICs collect data from meters
• Head-end software polls the NICs
• Some data is sent asynchronously to the head end
• Agents send data to SilverLink
• Data is processed using DataTorrent and other tools
• Data is consumed via APIs and SQL
Security
• Encryption of data at rest & in transit
• Ranger & Knox
• Custom requirements to satisfy local laws
• Auditing
• No data leakage across tenants
• Being secure is not enough: we need to prove it
Multi-Tenancy
• Shared resources to cut costs
• Customers with millions of devices, and pilots with only a handful
• Centralized management of software & operations
• Selling “shared anything” to our customers is a challenge
Scalability
• 23 million network endpoints in service today
• Up to 96 intervals a day
• Each interval has 4 channels
• So, up to roughly 8.8 billion interval channel values per day (23M × 96 × 4)
• Keep this data forever
• Also, 100 million events a day
• And, sensors that can collect data every 10 s
• 19.4 GB per million meters per day
• ½ TB per day
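The slide's volume figures can be checked with a few lines of Java. This is a back-of-the-envelope sketch under two assumptions: every endpoint reports the maximum 96 intervals a day, and the 19.4 GB per million meters figure scales linearly to all 23 million endpoints.

```java
// Back-of-the-envelope volume estimate using the figures on the Scalability slide.
final class IngestVolume {
    // Upper bound on interval channel values per day: endpoints x intervals/day x channels/interval.
    static long channelValuesPerDay(long endpoints, long intervalsPerDay, long channelsPerInterval) {
        return endpoints * intervalsPerDay * channelsPerInterval;
    }

    // Scales the per-million-meter daily volume (GB) up to the whole fleet.
    static double gigabytesPerDay(double gbPerMillionMeters, long meters) {
        return gbPerMillionMeters * meters / 1_000_000.0;
    }

    // Readings per day for one sensor sampling every `periodSeconds` seconds.
    static long readingsPerDay(long periodSeconds) {
        return 24L * 60 * 60 / periodSeconds;
    }

    public static void main(String[] args) {
        System.out.println(channelValuesPerDay(23_000_000L, 96, 4)); // 8832000000
        System.out.printf("%.1f GB/day%n", gigabytesPerDay(19.4, 23_000_000L));
        System.out.println(readingsPerDay(10)); // 8640
    }
}
```

The result, about 446 GB a day, is consistent with the slide's "½ TB per day", and a single 10 s sensor alone produces 8,640 readings a day.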
High Availability & Disaster Recovery
• Clustering
• Automated fail-overs
• Rolling upgrades
Tech Architecture
• HDFS
• Kafka
• DataTorrent
• Elasticsearch
• OpenTSDB & HBase
• Oozie
• Hive
• Mule
• Apigee
• Tableau
Why DataTorrent (DT)?
• Management UI Console
• Malhar Library + Java
• Support
• Rapid Development
• Stats, Operability, Auto-Scaling
Strengths
• Resilient operators (availability)
• Easily partition operators (scalability)
• Any Java programmer can build a simple app
• Facilitates management hand-off to operations
• Easy to detect failures with the UI and stats
Our focus areas for improvement
• No “back pressure”
• If a container crashes with an OOM error, it is restored to the same state that ran out of memory
• No good way to stop an app and save its context
• Logs can be difficult to navigate
Example DT App: AMM Export Ingestion
Example App: AMM Export Ingestion
Input Operator
• Scans the last 2 days’ HDFS directories
• Emits filenames
• Too fast! It emits filenames faster than downstream operators can process them
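A minimal sketch of how the Input Operator might choose which directories to scan, assuming a hypothetical date-partitioned layout (`<base>/yyyy/MM/dd`); the base path and layout are illustrative, not taken from the deck. In a real operator the listing itself would go through HDFS (for example Hadoop's `FileSystem.listStatus`); only the directory selection is shown here, in plain Java.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

final class RecentDirs {
    // Hypothetical date-partitioned layout: <baseDir>/yyyy/MM/dd
    private static final DateTimeFormatter DAY_PATH = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Day directories to scan, oldest first, covering the last `days` days up to and including `today`.
    static List<String> dirsToScan(String baseDir, LocalDate today, int days) {
        List<String> dirs = new ArrayList<>();
        for (int i = days - 1; i >= 0; i--) {
            dirs.add(baseDir + "/" + today.minusDays(i).format(DAY_PATH));
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Prints [/data/amm-export/2016/02/29, /data/amm-export/2016/03/01]
        System.out.println(dirsToScan("/data/amm-export", LocalDate.of(2016, 3, 1), 2));
    }
}
```

Because directories are computed from the date rather than listed recursively, the scan cost stays bounded no matter how much history accumulates under the base path.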
Example App: AMM Export Ingestion
AMM File Reader
• Parses different types
• Emits Avro tuples
• XML parsing can be slow
• File & tuple sizes vary a lot
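Because file sizes vary a lot and XML parsing can be slow, one common approach is streaming (StAX) parsing, which pulls one event at a time instead of building a whole DOM tree. The sketch below assumes a hypothetical export schema in which each `<Reading>` element's attributes become one flat tuple; the real AMM formats, and the Avro encoding step, are not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingReadParser {
    // Turns each <Reading .../> element (hypothetical schema) into one flat tuple.
    // StAX pulls events one at a time, so the document is never held as a DOM tree.
    static List<Map<String, String>> parse(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // refuse DTDs / external entities
        List<Map<String, String>> tuples = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "Reading".equals(reader.getLocalName())) {
                        Map<String, String> tuple = new LinkedHashMap<>();
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            tuple.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                        }
                        tuples.add(tuple);
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
        return tuples;
    }

    public static void main(String[] args) {
        String xml = "<Export><Reading meter=\"m1\" value=\"1.5\"/><Reading meter=\"m2\" value=\"2.0\"/></Export>";
        System.out.println(parse(xml).size()); // 2
    }
}
```

In an operator, each tuple would be emitted as soon as its element is parsed, rather than collected into a list as it is here for simplicity.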
Example App: AMM Export Ingestion
Enricher
• Adds metadata to every tuple
• External dependency on Elasticsearch
• Uses a thread pool, since a single client can’t keep up with the load from one YARN container
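A minimal sketch of the thread-pool idea: lookups for a batch of tuples run concurrently, and results are collected in input order. The Elasticsearch query is replaced by a hypothetical `lookup` function (device ID to metadata) and the `deviceId` field name is assumed; a real client shared this way would need to be thread-safe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class PooledEnricher implements AutoCloseable {
    private final ExecutorService pool;
    private final Function<String, Map<String, String>> lookup; // hypothetical stand-in for an Elasticsearch query

    PooledEnricher(int threads, Function<String, Map<String, String>> lookup) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.lookup = lookup;
    }

    // Submits one lookup per tuple, then waits for each in order so output order matches input order.
    List<Map<String, String>> enrich(List<Map<String, String>> tuples) {
        List<Future<Map<String, String>>> pending = new ArrayList<>();
        for (Map<String, String> tuple : tuples) {
            pending.add(pool.submit(() -> {
                Map<String, String> enriched = new LinkedHashMap<>(tuple);
                enriched.putAll(lookup.apply(tuple.get("deviceId")));
                return enriched;
            }));
        }
        List<Map<String, String>> out = new ArrayList<>(tuples.size());
        try {
            for (Future<Map<String, String>> f : pending) {
                out.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while enriching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Metadata lookup failed", e.getCause());
        }
        return out;
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

Waiting on futures in submission order keeps the operator's output deterministic, at the cost of one slow lookup briefly holding back the tuples behind it.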
Example App: AMM Export Ingestion
Avro Converter
• Normalizes tuples across schema versions
• Outputs many tuples from one
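A sketch of both behaviors on a simplified record model. The field names are hypothetical: older records are assumed to carry `meter_id` and newer ones `deviceId`, and channel values are assumed to sit under `ch1`..`chN`. Normalization maps both ID spellings to one output field, and one interval record fans out into one tuple per channel. Avro encoding is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IntervalFanOut {
    // One normalized output tuple: a single channel value for one interval.
    static final class ChannelReading {
        final String deviceId;
        final long timestamp;
        final int channel;
        final double value;

        ChannelReading(String deviceId, long timestamp, int channel, double value) {
            this.deviceId = deviceId;
            this.timestamp = timestamp;
            this.channel = channel;
            this.value = value;
        }
    }

    // Schema normalization (hypothetical field names): older records use "meter_id",
    // newer ones use "deviceId"; both end up in the same output field.
    static String deviceIdOf(Map<String, String> record) {
        String id = record.get("deviceId");
        return id != null ? id : record.get("meter_id");
    }

    // One interval record carrying channels "ch1".."chN" becomes N output tuples.
    static List<ChannelReading> fanOut(Map<String, String> record) {
        String deviceId = deviceIdOf(record);
        long timestamp = Long.parseLong(record.get("timestamp"));
        List<ChannelReading> out = new ArrayList<>();
        for (int ch = 1; record.containsKey("ch" + ch); ch++) {
            out.add(new ChannelReading(deviceId, timestamp, ch, Double.parseDouble(record.get("ch" + ch))));
        }
        return out;
    }
}
```

With 4 channels per interval, each input record becomes 4 output tuples, which is one reason tuple counts grow as data moves through the DAG.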
Example App: AMM Export Ingestion
Enriched Persister
• Writes Avro tuples to HDFS files
• Names output files by date, input file, part, etc.
• HDFS can be slow: another external dependency
• Container death causes tuples to be rewritten
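A sketch of output naming by date, input file, and part number. The layout `<base>/<yyyy-MM-dd>/<input stem>.part-NNNNN.avro` is a hypothetical choice for illustration, not the deck's actual scheme. One property of a deterministic name like this is that a restarted writer targets the same path for the same input and part.

```java
import java.time.LocalDate;

final class OutputFileNamer {
    // Hypothetical layout: <baseDir>/<yyyy-MM-dd>/<input file name without extension>.part-<5-digit part>.avro
    static String outputPath(String baseDir, LocalDate day, String inputFile, int part) {
        String name = inputFile.substring(inputFile.lastIndexOf('/') + 1); // strip directories
        int dot = name.lastIndexOf('.');
        String stem = dot > 0 ? name.substring(0, dot) : name;             // strip extension
        return String.format("%s/%s/%s.part-%05d.avro", baseDir, day, stem, part);
    }

    public static void main(String[] args) {
        // Prints /enriched/2016-03-01/export_0042.part-00007.avro
        System.out.println(outputPath("/enriched", LocalDate.of(2016, 3, 1), "/incoming/export_0042.xml", 7));
    }
}
```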
Example App: AMM Export Ingestion
TSDB Writer
• Embedded instance of OpenTSDB
• External dependency on HBase
• Slow during metric creation and HBase region splits
AMM Export Ingestion
Continuing to extend the DAG with new operators
Scalability & Throughput
• The classic YARN application solution is to spin up more containers
• Not so simple, due to:
- External dependencies
- Highly variable loads (tuple mix, tuple size, kind of tuple)
• Buffering tuples in the DAG
• Static partitioning means the DAG has to be slow
• Throughput: how many tuples an operator can emit per window
• We need dynamic throughput management
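The definition of throughput above (tuples emitted per window) can be enforced with a per-window counter. The plain-Java class below only models the idea; in DataTorrent/Apex, an input operator's `beginWindow` and `emitTuples` callbacks would play the roles of the methods here. `setMaxPerWindow` is a hypothetical hook through which a tuning component could change the cap at runtime.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class WindowRateLimitedSource {
    private final Deque<String> pending = new ArrayDeque<>();
    private int maxPerWindow;       // throughput cap: tuples allowed per window
    private int emittedThisWindow;  // tuples already emitted in the current window

    WindowRateLimitedSource(int maxPerWindow) {
        this.maxPerWindow = maxPerWindow;
    }

    void enqueue(String filename) {
        pending.add(filename);
    }

    // Hypothetical hook for a controller that retunes the cap between windows.
    void setMaxPerWindow(int max) {
        this.maxPerWindow = max;
    }

    // Resets the per-window budget at each window boundary.
    void beginWindow() {
        emittedThisWindow = 0;
    }

    // May be called many times per window; emits nothing once the cap is reached.
    List<String> emitTuples() {
        List<String> out = new ArrayList<>();
        while (emittedThisWindow < maxPerWindow && !pending.isEmpty()) {
            out.add(pending.poll());
            emittedThisWindow++;
        }
        return out;
    }
}
```

With a fixed cap, this is the static behavior the slide criticizes: a value safe for the heaviest load mix is needlessly slow for lighter ones, which motivates adjusting `maxPerWindow` dynamically.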
Throughput Management
We use a Stats Listener to “auto-tune” the throughput rate
Throughput Management
First implementation
• Any pair of logical operators
• Adjusts upstream operator throughput every N windows
• Scales it by a factor based on downstream operator backlog threshold levels
• A lagging correction, since it is based on operator stats from prior windows
• Observed overall processing rate across the DAG oscillates
• Control theory says this won’t work: it will never converge to a reasonable value
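A sketch of a threshold rule like the one described for the first implementation, with hypothetical thresholds and scaling factor. Each adjustment reacts to a backlog reported in earlier windows, so by the time the limit changes the backlog has already moved. Combined with fixed multiplicative steps, that lag lets the rate overshoot in both directions instead of settling.

```java
final class ThresholdTuner {
    // Hypothetical values, for illustration only.
    static final long HIGH_BACKLOG = 10_000; // above this, slow the upstream operator down
    static final long LOW_BACKLOG = 1_000;   // below this, speed it up
    static final double FACTOR = 2.0;

    // Applied every N windows, using the downstream backlog reported in earlier windows.
    static long adjust(long currentLimit, long reportedBacklog) {
        if (reportedBacklog > HIGH_BACKLOG) {
            return Math.max(1, (long) (currentLimit / FACTOR)); // never drop to zero
        }
        if (reportedBacklog < LOW_BACKLOG) {
            return (long) (currentLimit * FACTOR);
        }
        return currentLimit; // inside the dead band: leave the limit alone
    }
}
```

The rule only knows "too much" or "too little", not by how much, so it has no way to home in on the downstream operator's actual processing rate.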
Throughput Management
Second implementation
• Compute a backlog
• Try to maintain a target backlog that is a multiple of the downstream operator processing rate
• Problem: starvation
- Stats are not reported when throughput is set to zero
- Solution 1: small, positive min throughput
- Solution 2: fractional/probabilistic emit
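One way to express the second implementation is as a proportional correction toward a target backlog. The deck does not give the exact formula; `targetMultiple`, `correctionWindows`, and `minRate` are hypothetical parameters. The floor at `minRate` corresponds to Solution 1. `tuplesToEmit` illustrates Solution 2: a fractional rate is honored on average by emitting one extra tuple with probability equal to the fractional part.

```java
import java.util.Random;

final class BacklogTargetTuner {
    private final double targetMultiple;    // target backlog = targetMultiple x downstream rate
    private final double correctionWindows; // spread the backlog error over this many windows
    private final double minRate;           // > 0, so the operator never stops (and keeps reporting stats)

    BacklogTargetTuner(double targetMultiple, double correctionWindows, double minRate) {
        if (minRate <= 0) {
            throw new IllegalArgumentException("minRate must be positive");
        }
        this.targetMultiple = targetMultiple;
        this.correctionWindows = correctionWindows;
        this.minRate = minRate;
    }

    // downstreamRate: tuples/window processed downstream; backlog: tuples waiting downstream.
    double nextRate(double downstreamRate, double backlog) {
        double targetBacklog = targetMultiple * downstreamRate;
        double rate = downstreamRate + (targetBacklog - backlog) / correctionWindows;
        return Math.max(minRate, rate);
    }

    // Fractional/probabilistic emit: a rate of 2.3 emits 2 tuples, plus a 3rd with probability 0.3,
    // so the expected number emitted per window equals the rate, even when it is below 1.
    static int tuplesToEmit(double rate, Random rng) {
        int whole = (int) Math.floor(rate);
        double fraction = rate - whole;
        return whole + (rng.nextDouble() < fraction ? 1 : 0);
    }
}
```

When the backlog sits exactly at target, the upstream rate equals the downstream rate. A larger error produces a proportionally larger correction, unlike the fixed-factor steps of the first implementation.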
Throughput Management
Successes
• Operators don’t run out of memory and crash
• Overall throughput across the DAG is much higher
• Can adapt to a wide mix of loads
• General enough that we are using it in all our apps
• We ingested 4 multi-month pilot datasets successfully
• Reduced the time it takes to ingest 1 day’s worth of data from 1½ hrs to 15 min
• Hands-off, automated tuning
Throughput Management
Remaining problems
• Throughput management is based on tuple count, but not all tuples are the same
• Garbage collection causes uneven performance
• Slow to converge
• Hard to test and debug
Stopping DAGs
• Persist processed state for files & Kafka messages
- Save Kafka offsets in ZooKeeper
- Rename input files to .processed
• Checkpoint Listener
- Wait to persist state until a tuple has fully transited the DAG
- Prevents data loss
• However, some tuples get processed twice
• Suspend script
- Use the REST API to set a flag on the Input Operator
- Wait until there is no more activity
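A sketch of the "wait until fully transited" idea for input files. The class is modeled on the `committed(long windowId)` callback that DataTorrent/Apex operators can receive through `Operator.CheckpointListener`, but it is standalone and does not implement that interface. A file is only marked processed once a window at or after the one in which its last tuple was emitted has been committed. `markProcessed` is a hypothetical hook, for example a rename to `<name>.processed`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

final class CommitTracker {
    // windowId -> files whose last tuple was emitted in that window
    private final TreeMap<Long, List<String>> finishedByWindow = new TreeMap<>();
    private final Consumer<String> markProcessed; // hypothetical, e.g. rename to <name>.processed

    CommitTracker(Consumer<String> markProcessed) {
        this.markProcessed = markProcessed;
    }

    // Record that the input operator finished emitting `file` in window `windowId`.
    void fileFinished(long windowId, String file) {
        finishedByWindow.computeIfAbsent(windowId, w -> new ArrayList<>()).add(file);
    }

    // All windows up to and including `windowId` are committed, so their tuples have
    // passed through every operator; only now is it safe to mark their files as done.
    void committed(long windowId) {
        Map<Long, List<String>> done = finishedByWindow.headMap(windowId, true);
        for (List<String> files : done.values()) {
            files.forEach(markProcessed);
        }
        done.clear(); // removes the entries from the backing map too
    }
}
```

Deferring the rename trades duplicates for safety: a file that was fully emitted but not yet committed when the app stops will be read again on restart. That matches the slide's note that some tuples get processed twice.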
Versions
• Hadoop 2.3.0
• DataTorrent 3.1.1

 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 

IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks

  • 1. Silver Spring Networks
    Greg Brosman, Product Manager, SilverLink Data Platform
    © 2016 Silver Spring Networks. All rights reserved.
  • 2. Silver Spring Networks
    - Silver Spring Networks helps global utilities and cities connect, optimize, and manage smart energy and smart city infrastructure
    - Over 22 million connected devices
    - 200B records read per year
    - 2 million remote operations per year
    [Diagram: Integrate Renewables; Engage Customers; Improve Operational Efficiency; Improve Reliability; Manage Peak; Automate Measurement; Improve Energy Efficiency; Reduce Truck Rolls for Device Maintenance]
  • 3. More Devices, More Data: Managing the volume, variety, and velocity of data
    - How can we do more with our network?
      - We deployed a network to support meter reading. It works great, but we’re ready for the next thing to leverage these investments.
    - How do we manage these new devices and make all this data accessible and secure?
      - There are lots of opportunities to enhance our service by making use of advanced analytics, but we can’t get the data to the right people.
    - How can we reduce the cost, time, and pain of integrating with 3rd party apps?
      - The ecosystem of 3rd party apps is growing, but we need a scalable way to connect apps with data.
  • 4. SilverLink Data Platform: Seamlessly connecting apps with sensor data
    - Automatically ingest smart grid data
    - Enrich data with valuable context
    - Enable real-time and batch applications
    - Archive raw and enriched data
    - Connect apps through standard APIs
    - Explore data through BI tool integrations
    [Architecture diagram: Data Sources (Devices, Silver Spring Networks Data, Utility Data, 3rd Party Data); SilverLink Data Platform (Data Ingestion, Real-Time, Storage & Batch, Security & API Management); Applications (Silver Spring Networks Apps, 3rd Party Apps, In-House Apps)]
  • 5. Starfish: Building a new ecosystem of IoT services
    - A worldwide wireless IPv6 network service for the IoT. Starfish enables cities, utilities, enterprises, and developers to connect and manage a new generation of intelligent devices.
    - Focus areas include water, energy, food, traffic, transportation, and safety
    - 2016 Global IoT Hackathon Series: an opportunity to develop and test innovations and collaborate with leading IoT technologists
  • 6. IOT Big Data Ingestion & Processing in Hadoop
    Darin Nee, Silver Spring Networks
  • 7. Agenda
    - Context & scope of our use case
    - Tour of a DataTorrent app we built
    - Some technical hurdles & solutions we came up with
    - Q & A
  • 8. Kinds of Data
    - Sensor reads
    - Meter register reads & interval data
    - Threshold events, traps
    - Device metadata
  • 9. Data Flow
    - NICs collect data from meters
    - Head-end software polls the NICs
    - Some data is sent asynchronously to the head end
    - Agents send data to SilverLink
    - Data is processed using DataTorrent and more
    - Data is consumed via APIs and SQL
  • 10. Security
    - Encryption of data at rest & in transit
    - Ranger & Knox
    - Custom requirements to satisfy local laws
    - Auditing
    - No data leakage across tenants
    - It is not enough to be secure; we need to prove it
  • 11. Multi-Tenancy
    - Shared resources to cut costs
    - Customers with millions of devices, and pilots with a handful of them
    - Centralized management of software & operations
    - Challenge: selling "shared anything" to our customers
  • 12. Scalability
    - 23 million network endpoints in service today
    - Up to 96 intervals a day
    - Each interval has 4 channels
    - So, approximately 8 billion intervals per day
    - Keep this data forever
    - Also, 100 million events a day
    - And sensors that can collect data every 10 s
    - 19.4 GB per million meters per day
    - ½ TB per day
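The Scalability figures can be reproduced with a few lines of arithmetic. The inputs below are the slide's own numbers; the per-value size at the end is derived, and it assumes (our assumption, not the slide's) that the 19.4 GB figure covers interval data only. Note that 23 million × 96 × 4 comes to about 8.8 billion, so "approximately 8 billion" reads as a rounded figure for an upper bound ("up to 96 intervals").

```python
# Back-of-the-envelope check of the Scalability slide (inputs are the slide's numbers).
ENDPOINTS = 23_000_000           # network endpoints in service
INTERVALS_PER_DAY = 96           # upper bound: one interval every 15 minutes
CHANNELS = 4                     # channels per interval
GB_PER_MILLION_METERS = 19.4     # stated daily volume per million meters
SENSOR_PERIOD_S = 10             # sensors can collect data every 10 s

interval_values_per_day = ENDPOINTS * INTERVALS_PER_DAY * CHANNELS
daily_gb = GB_PER_MILLION_METERS * ENDPOINTS / 1_000_000
reads_per_sensor_per_day = 24 * 60 * 60 // SENSOR_PERIOD_S

# Derived, assuming the 19.4 GB covers interval values only (not stated on the slide).
bytes_per_value = GB_PER_MILLION_METERS * 1e9 / (1_000_000 * INTERVALS_PER_DAY * CHANNELS)

print(f"{interval_values_per_day / 1e9:.1f} billion interval values per day")
print(f"{daily_gb:.0f} GB per day (about {daily_gb / 1000:.2f} TB)")
print(f"{reads_per_sensor_per_day} reads per 10-second sensor per day")
print(f"about {bytes_per_value:.0f} bytes per stored interval value")
```

This prints about 8.8 billion values, 446 GB per day (consistent with the slide's ½ TB), 8,640 reads per sensor per day, and roughly 51 bytes per stored value.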
  • 13. High Availability & Disaster Recovery
    - Clustering
    - Automated failovers
    - Rolling upgrades
  • 14. Tech Architecture
    - HDFS
    - Kafka
    - DataTorrent
    - Elasticsearch
    - OpenTSDB & HBase
    - Oozie
    - Hive
    - Mule
    - Apigee
    - Tableau
  • 15. Why DT (DataTorrent)?
    - Management UI console
    - Malhar library + Java
    - Support
    - Rapid development
    - Stats, operability, auto-scaling
  • 16. Strengths
    - Resilient operators (availability)
    - Easily partitioned operators (scalability)
    - Any Java programmer can build a simple app
    - Facilitates management hand-off to operations
    - Easy to detect failures with the UI and stats
  • 17. Our focus areas for improvement
    - No “back pressure”
    - If a container crashes with an out-of-memory (OOM) error, it is restored to the same state that ran out of memory
    - No good way to stop an app and save its context
    - Logs can be difficult to navigate
  • 18. Example DT App: AMM Export Ingestion
  • 19. Example App: AMM Export Ingestion (Input Operator)
    - Scans the last 2 days’ HDFS directories
    - Emits filenames
    - Too fast!
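The Input Operator's job can be sketched as a plain scan: list the date-partitioned directories for the last two days and emit each filename once. This is a minimal model, not DataTorrent's file-input operator; the `<root>/YYYY/MM/DD` layout and the `list_dir` callback (standing in for an HDFS directory listing) are hypothetical. Nothing here limits the emit rate, which is the "Too fast!" problem that the Throughput Management slides address.

```python
from datetime import date, timedelta

def recent_partition_dirs(root, today, days=2):
    """Date-partitioned directories for the last `days` days, oldest first.

    Assumes a hypothetical <root>/YYYY/MM/DD layout.
    """
    root = root.rstrip("/")
    return [f"{root}/{today - timedelta(days=offset):%Y/%m/%d}"
            for offset in range(days - 1, -1, -1)]

def scan_for_new_files(list_dir, root, today, seen):
    """Yield paths not emitted on an earlier scan; `list_dir` stands in for an HDFS listing."""
    for directory in recent_partition_dirs(root, today):
        for name in sorted(list_dir(directory)):
            path = f"{directory}/{name}"
            if path not in seen:
                seen.add(path)
                yield path

# Usage with an in-memory listing in place of HDFS.
listing = {"/data/amm/2016/06/29": ["b.xml", "a.xml"],
           "/data/amm/2016/06/30": ["c.xml"]}
seen = set()
first = list(scan_for_new_files(lambda d: listing.get(d, []), "/data/amm", date(2016, 6, 30), seen))
second = list(scan_for_new_files(lambda d: listing.get(d, []), "/data/amm", date(2016, 6, 30), seen))
```

The `seen` set keeps a rescan from re-emitting files; a real operator would also need to persist that state across restarts.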
  • 20. Example App: AMM Export Ingestion (AMM File Reader)
    - Parses different file types
    - Emits Avro tuples
    - XML parsing can be slow
    - File & tuple sizes vary a lot
  • 21. Example App: AMM Export Ingestion (Enricher)
    - Adds metadata to every tuple
    - External dependency on Elasticsearch
    - Uses a thread pool, since one YARN container is too big for a single client
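One way to picture the Enricher's thread pool: the operator fans metadata lookups out across worker threads so that a single container is not limited to one client's throughput. The sketch assumes a hypothetical `lookup_metadata(device_id)` callable standing in for the Elasticsearch query, and it de-duplicates device IDs within a batch, which is our addition rather than something the slide describes.

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_batch(tuples, lookup_metadata, max_workers=8):
    """Attach device metadata to every tuple, running lookups on a thread pool.

    `lookup_metadata(device_id) -> dict` is a hypothetical stand-in for the
    external metadata store (Elasticsearch in the talk).
    """
    device_ids = sorted({t["device_id"] for t in tuples})
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One lookup per distinct device; map() yields results in input order.
        metadata = dict(zip(device_ids, pool.map(lookup_metadata, device_ids)))
    # Build new dicts so the input tuples are left unchanged.
    return [{**t, "meta": metadata[t["device_id"]]} for t in tuples]

# Usage with an in-memory dict in place of Elasticsearch.
directory = {"m1": {"region": "north"}, "m2": {"region": "south"}}
batch = [{"device_id": "m1", "kwh": 1.2},
         {"device_id": "m2", "kwh": 0.4},
         {"device_id": "m1", "kwh": 0.9}]
enriched = enrich_batch(batch, directory.get)
```

Because Elasticsearch is an external dependency, a production version would also need timeouts and a policy for failed lookups.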
  • 22. Example App: AMM Export Ingestion (Avro Converter)
    - Normalizes tuples across schema versions
    - Outputs many tuples from one input tuple
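The Avro Converter's two jobs, normalizing across schema versions and emitting many tuples from one, can be sketched with plain dictionaries in place of Avro records. Both record layouts below (`version` 1 and 2 and all of their field names) are hypothetical; the talk does not show the actual AMM export schemas.

```python
def normalize(record):
    """Map a versioned record to (device_id, start_ts, step_s, {channel: [values]})."""
    version = record["version"]
    if version == 1:   # hypothetical v1 layout: channels keyed by string
        return (record["meterId"], record["start"], record["intervalSec"],
                {int(ch): vals for ch, vals in record["values"].items()})
    if version == 2:   # hypothetical v2 layout: list of channel objects
        return (record["device_id"], record["start_ts"], record["interval_seconds"],
                {c["channel"]: c["readings"] for c in record["channels"]})
    raise ValueError(f"unsupported schema version: {version}")

def explode(record):
    """One input record -> one output tuple per (channel, interval)."""
    device_id, start, step, channels = normalize(record)
    for channel in sorted(channels):
        for i, value in enumerate(channels[channel]):
            yield {"device_id": device_id, "channel": channel,
                   "ts": start + i * step, "value": value}

# The same readings expressed in both hypothetical schema versions.
v1 = {"version": 1, "meterId": "m1", "start": 1467331200, "intervalSec": 900,
      "values": {"1": [0.5, 0.7], "2": [0.1]}}
v2 = {"version": 2, "device_id": "m1", "start_ts": 1467331200, "interval_seconds": 900,
      "channels": [{"channel": 2, "readings": [0.1]},
                   {"channel": 1, "readings": [0.5, 0.7]}]}
out1, out2 = list(explode(v1)), list(explode(v2))
```

Keeping version handling in `normalize` means downstream operators only ever see one tuple shape.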
  • 23. Example App: AMM Export Ingestion (Enriched Persister)
    - Writes Avro tuples to HDFS files
    - Names output files by date, input file, part, etc.
    - HDFS can be slow; it is another external dependency
    - Container death causes tuples to be rewritten
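The Enriched Persister slide links file naming to the rewrite-on-container-death issue. A sketch of one naming approach, under our assumption that output names should depend only on the data: if the name is a function of date, input file, and part number, a replay after a container restart targets the same path rather than creating a second copy under a new name. The `date=` directory prefix, the `.part-NNNNN.avro` suffix, and `output_path` itself are hypothetical.

```python
import posixpath
from datetime import datetime, timezone

def output_path(root, event_time, input_file, part):
    """Deterministic output path built from date, input file, and part number.

    Nothing container- or attempt-specific goes into the name, so replaying the
    same tuples after a container restart targets the same file.
    """
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    stem = posixpath.splitext(posixpath.basename(input_file))[0]
    return posixpath.join(root, f"date={day}", f"{stem}.part-{part:05d}.avro")

path = output_path("/enriched",
                   datetime(2016, 6, 30, 23, 15, tzinfo=timezone.utc),
                   "/landing/2016/06/30/export_0042.xml", 3)
```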
  • 24. Example App: AMM Export Ingestion (TSDB Writer)
    - Embedded instance of OpenTSDB
    - External dependency on HBase
    - Slow during metric creation and HBase region splits
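To make the TSDB Writer's output concrete, here is how a normalized interval tuple might map onto an OpenTSDB data point: a metric name, a timestamp, a value, and at least one tag. For readability the sketch renders OpenTSDB's telnet-style `put <metric> <timestamp> <value> <tagk=tagv ...>` line, whereas the operator on the slide uses an embedded OpenTSDB instance; the `amm.interval.chN` metric naming and the `device` tag are hypothetical. In OpenTSDB each new metric name is assigned a UID stored in HBase, which may be part of why metric creation is slow.

```python
def to_put_line(t, metric_prefix="amm.interval"):
    """Render one interval tuple as an OpenTSDB telnet-style `put` line.

    The metric prefix and tag names are hypothetical; OpenTSDB requires at least one tag.
    """
    metric = f"{metric_prefix}.ch{t['channel']}"
    tags = {"device": t["device_id"]}
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {t['ts']} {t['value']} {tag_str}"

line = to_put_line({"device_id": "m1", "channel": 1, "ts": 1467331200, "value": 0.5})
```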
  • 25. AMM Export Ingestion
    Continuing to extend the DAG with new operators
  • 26. Scalability & Throughput
    - The classic YARN application solution is to spin up more containers
    - Not so simple, due to external dependencies and highly variable loads:
      - Tuple mix
      - Tuple size
      - Kind of tuple
    - Buffering tuples in the DAG
    - Static partitioning means the DAG has to be slow
    - Throughput: how many tuples an operator can emit per window
    - We need dynamic throughput management
  • 27. Throughput Management
    We use a Stats Listener to “auto-tune” the throughput rate
  • 28. Throughput Management: First implementation
    - Works on any pair of logical operators
    - Adjusts the upstream operator’s throughput every N windows
    - Scales it by a factor based on threshold levels of the downstream operator’s backlog
    - The correction lags, since it is based on operator stats from prior windows
    - The observed overall processing rate across the DAG oscillates
    - Control theory says this approach will never converge to a reasonable value
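The first implementation's failure mode can be reproduced with a toy model. This is our reconstruction, not the team's code: every adjustment period the upstream emit limit is multiplied by a fixed factor chosen by which band the downstream backlog falls in, but the controller only sees the backlog from the previous period (the lag the slide describes). The thresholds, factors, rates, and starting value are all hypothetical.

```python
def adjust_by_thresholds(throughput, backlog, low=1_000, high=10_000,
                         up=1.5, down=0.5, floor=1):
    """Scale the per-window emit limit by a fixed factor chosen from backlog bands."""
    if backlog > high:
        return max(floor, int(throughput * down))
    if backlog < low:
        # Guarantee growth even when throughput * up rounds down to the same value.
        return max(throughput + 1, int(throughput * up))
    return throughput

def simulate(steps=30, rate=1_000, windows_per_adjust=10, start=500):
    """Downstream drains `rate` tuples per window; the controller acts on last period's backlog."""
    throughput, backlog, history = start, 0, []
    for _ in range(steps):
        observed = backlog  # stale stats from prior windows
        backlog = max(0, backlog + (throughput - rate) * windows_per_adjust)
        throughput = adjust_by_thresholds(throughput, observed)
        history.append(throughput)
    return history

history = simulate()
```

In this run the emit limit repeatedly climbs well above the downstream rate of 1,000 tuples per window and then collapses far below it, instead of settling near it, which mirrors the oscillation reported on the slide.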
  • 29. Throughput Management: Second implementation
    - Compute a backlog
    - Try to maintain a target backlog that is a multiple of the downstream operator’s processing rate
    - Problem: starvation
      - Stats are not reported when throughput is set to zero
      - Solution 1: a small, positive minimum throughput
      - Solution 2: fractional/probabilistic emit
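The second implementation can be sketched as a target-backlog controller with a positive floor, plus probabilistic emission so that fractional rates are meaningful. The specific formula (emit the downstream rate plus a fraction `gain` of the gap to the target backlog) and all parameter values are our assumptions; the slide only states the goal of holding the backlog at a multiple of the downstream processing rate.

```python
import random

def target_throughput(backlog, downstream_rate, k=2.0, gain=0.5, min_throughput=0.1):
    """Per-window emit rate steering the downstream backlog toward k * downstream_rate.

    The floor (Solution 1) keeps the rate positive, so the upstream operator
    never goes fully idle and keeps reporting stats.
    """
    target_backlog = k * downstream_rate
    desired = downstream_rate + gain * (target_backlog - backlog)
    return max(min_throughput, desired)

def tuples_to_emit(throughput, rng=random):
    """Probabilistic emit (Solution 2): 2.3 -> 2 tuples, plus a third with p = 0.3."""
    whole = int(throughput)
    return whole + (1 if rng.random() < throughput - whole else 0)

# A floor of 0.1 tuples per window works out to about one tuple every ten windows.
rng = random.Random(42)
emitted = sum(tuples_to_emit(0.1, rng) for _ in range(10_000))
```

Probabilistic emit lets a floor below one tuple per window still produce occasional tuples, so the pair of operators never falls completely silent.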
  • 30. Throughput Management: Successes
    - Operators don’t run out of memory and crash
    - Overall throughput across the DAG is much higher
    - Can adapt to a wide mix of loads
    - General enough that we are using it in all our apps
    - We ingested 4 multi-month pilot datasets successfully
    - Reduced the time it takes to ingest 1 day’s worth of data from 1½ hrs to 15 min
    - Hands-off, automated tuning
  • 31. Throughput Management: Remaining problems
    - Throughput management is based on tuple count, and not all tuples are the same
    - Garbage collection causes uneven performance
    - Slow to converge
    - Hard to test and debug
  • 32. Stopping DAGs
    - Persist processed state for files & Kafka messages
      - Save Kafka offsets in ZooKeeper
      - Rename input files to .processed
    - Checkpoint Listener
      - Wait to persist state until a tuple has fully transited the DAG
      - Prevents loss of data
    - However, some tuples get processed twice
    - Suspend script
      - Use the REST API to set a flag on the Input Operator
      - Wait until there is no more activity
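The Checkpoint Listener idea, holding back the "this input is done" bookkeeping until the DAG has committed the window in which the input was read, can be modeled in a few lines. The class below is a plain-Python model whose `committed(window_id)` method echoes the callback of the same name in the Apex/DataTorrent `CheckpointListener` interface; `rename` and `save_offset` are hypothetical stand-ins for the HDFS rename and the ZooKeeper offset write on the slide, and appending `.processed` to the filename is our reading of "rename input files to .processed".

```python
class CommitAwareInput:
    """Finalize source bookkeeping only after a window is committed DAG-wide.

    `rename(src, dst)` and `save_offset(offset)` are hypothetical callbacks.
    """

    def __init__(self, rename, save_offset):
        self.rename = rename
        self.save_offset = save_offset
        self.pending = {}  # window id -> (files fully read, last Kafka offset or None)

    def end_window(self, window_id, finished_files, kafka_offset=None):
        # Record what this window consumed, but do not mark anything done yet.
        self.pending[window_id] = (list(finished_files), kafka_offset)

    def committed(self, window_id):
        # Committed means every operator has checkpointed at least through
        # window_id, so tuples read up to that window have left the DAG.
        for wid in sorted(w for w in self.pending if w <= window_id):
            files, offset = self.pending.pop(wid)
            for path in files:
                self.rename(path, path + ".processed")
            if offset is not None:
                self.save_offset(offset)

renamed, offsets = [], []
op = CommitAwareInput(lambda src, dst: renamed.append((src, dst)), offsets.append)
op.end_window(1, ["/in/a.xml"], kafka_offset=100)
op.end_window(2, ["/in/b.xml"], kafka_offset=180)
op.committed(1)  # only window 1's file is renamed and its offset saved
```

The gap between reading an input and committing its window is also where duplicates come from: if a container fails in between, processing resumes from the last checkpoint and those tuples pass through the DAG again, which matches the slide's note that some tuples get processed twice.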
  • 33. Versions
    - Hadoop 2.3.0
    - DataTorrent 3.1.1