© 2014 MapR Technologies
Who am I?
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
e-book available courtesy of MapR
http://bit.ly/1jQ9QuL
A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)
Agenda
• The Internet is turning upside down
• A story
• The last (mile) shall be first
• Time series on NoSQL
• Faster time series on NoSQL
• Summary
How the Internet Works
• Big content servers feed data across the backbone to
• Regional caches and servers feed data across neighborhood transport to
• The “last mile”
• Bits are nearly conserved, $ are concentrated centrally
– But total $ mass at the edge is much higher
How The Internet Works
[Diagram: Server → Caches → Gateways / Switches → Firewalls → clients c1, c2]
Conservation of Bits Decreases Bandwidth
[Diagram: Server → Caches → Gateways / Switches → Firewalls → clients c1, c2]
Total Investment Dominated by Last Mile
[Diagram: Server → Caches → Gateways / Switches → Firewalls → clients c1, c2]
The Rub
• What's the problem?
– Speed (end-to-end latency, backbone bandwidth)
– Feasibility (cost for consumer links)
– Caching
• What do we need?
– Cheap last-mile hardware
– Good caches
First:
An apology for going
off-script
Now, the story
By the 1840s, the NY-SF
sailing time was down to
130-180 days
In 1851, the record was
set at 89 days by the
Flying Cloud
The difference was due
(in part) to big data
and a primitive kind of
time-series database
These charts were free …
If you donated your data
But how does this apply
today?
What has changed?
Where will it lead?
Things
Emitting data
How The Internet Works
[Diagram: Server → Caches → Gateways / Switches → Firewalls → clients c1, c2]
How the Internet is Going to Work
[Diagram: Server → Caches → Gateways / Switches → Controllers → machines m1 to m6]
Where Will The $ Go?
[Diagram: Server → Caches → Gateways / Switches → Controllers → machines m1 to m6]
Sensors
Controllers
The Problems
• Sensors and controllers have little processing power or storage
– SIM cards = 20 MHz processor, 128 kb of space (= 16 kB)
– Arduino mini = 15 kB RAM (more EPROM)
– BeagleBone/Raspberry Pi = ~500 MB RAM
• Sensors and controllers have little power
– Very common to power down 99% of the time
• Sensors and controllers often have very low bandwidth
– Mesh networks with base rates << 1 Mb/s
– Power line networking
– Intermittent 3G/4G/LTE connectivity
What Do We Need to Do With a Time Series
• Acquire
– Measurement, transmission, reception
– Mostly not our problem
• Store
– We own this
• Retrieve
– We have to allow this
• Analyze and visualize
– We facilitate this via retrieval
Retrieval Requirements
• Retrieve by time-series, time range, tags
– Possibly pull millions of data points at a time
– Possibly do on-the-fly windowed aggregations
• Search by unstructured data
– Typically requires time-windowed faceting after search
– Also need to drill down with the first kind of retrieval
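As a rough illustration of the on-the-fly windowed aggregation requirement, here is a minimal Python sketch. It assumes the points of one already-selected series arrive as (epoch_seconds, value) pairs and computes a per-window mean over a time range; the function name and the choice of the mean as the aggregate are illustrative assumptions, not the API of any particular time-series database.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (epoch_seconds, value); an assumed representation


def windowed_mean(points: List[Point], start: int, end: int, window: int) -> Dict[int, float]:
    """Mean value per fixed-size window for points with start <= t < end.

    Windows are aligned to multiples of `window` seconds and keyed by their start time.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for t, v in points:
        if start <= t < end:          # time-range filter
            w = t - t % window        # start of the window containing t
            sums[w] += v
            counts[w] += 1
    return {w: sums[w] / counts[w] for w in sorted(sums)}


pts = [(0, 1.0), (10, 3.0), (60, 5.0), (75, 7.0), (130, 9.0)]
print(windowed_mean(pts, 0, 120, 60))  # {0: 2.0, 60: 6.0}
```

Filtering by tags would happen before this step; with millions of points per request, a real system would push this aggregation down to the storage layer rather than ship raw points to the client.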
Storage choices and trade-offs
• Flat files
– Great for rapid ingest with massive data
– Handles essentially any data type
– Less good for data requiring frequent updates
– Harder to find specific ranges
• Traditional relational db
– Ingests up to tens of thousands of rows/sec; prefers well-structured (numerical) data; expensive
• Non-relational db: Tables (such as MapR tables in M7 or HBase)
– Ingests up to 100,000 rows/sec
– Handles wide variety of data
– Good for frequent updates
– Easily scanned in a range
Specific Example
• Consider a server farm
• Lots of system metrics
• Typically 100-300 stats / 30 s
• Loads, RPCs, packets, requests/s
• Common to have 100 – 10,000 machines
The General Outline
• 10 samples / second / machine
x 1,000 machines
= 10,000 samples / second
• This is what OpenTSDB was designed to handle
• Install and go, but don’t test at scale
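The 10 samples/second/machine figure matches the upper end of the 100-300 stats every 30 s quoted for a server farm. A back-of-the-envelope Python sketch of that arithmetic (the helper name is hypothetical):

```python
def samples_per_second(stats_per_interval: float, interval_s: float, machines: int) -> float:
    """Aggregate ingest rate when each machine reports `stats_per_interval` stats every `interval_s` seconds."""
    return stats_per_interval / interval_s * machines


# 300 stats every 30 s is 10 samples/s per machine.
per_machine = samples_per_second(300, 30, 1)
# 1,000 machines at that rate gives the 10,000 samples/s on the slide.
farm = samples_per_second(300, 30, 1_000)
# The largest farms mentioned (10,000 machines) would need 100,000 samples/s.
big_farm = samples_per_second(300, 30, 10_000)
print(per_machine, farm, big_farm)
```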
Specific Example
• Consider oil drilling rigs
• When drilling wells, there are *lots* of moving parts
• Typically a drilling rig makes about 10K samples/s
• Temperatures, pressures, magnetics,
machine vibration levels, salinity, voltage,
currents, many others
• Typical project has 100 rigs
The General Outline
• 10K samples / second / rig
x 100 rigs
= 1M samples / second
• But wait, there’s more
– Suppose you want to test your system
– Perhaps with a year of data
– And you want to load that data in << 1 year
• 100x real-time = 100M samples / second
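The same arithmetic for the drilling-rig case, extended to the backfill scenario. This Python sketch assumes a 365-day year; under that assumption, replaying a year of history at 100x real time takes a few days, at the cost of sustaining 100M samples/second during the load.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def backfill_plan(rate_per_source: float, sources: int, speedup: float,
                  history_s: float = SECONDS_PER_YEAR):
    """Return (real-time rate, required load rate, load duration in seconds, total samples)."""
    realtime_rate = rate_per_source * sources      # samples/s arriving live
    load_rate = realtime_rate * speedup            # samples/s needed to replay history
    load_time_s = history_s / speedup              # wall-clock time for the replay
    total_samples = realtime_rate * history_s      # samples in the history being loaded
    return realtime_rate, load_rate, load_time_s, total_samples


rt, load, secs, total = backfill_plan(10_000, 100, 100)
print(f"real time: {rt:,.0f} samples/s")
print(f"backfill:  {load:,.0f} samples/s")
print(f"one year loads in {secs / 86_400:.2f} days ({total:.3g} samples)")
```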
© 2014 MapR Technologies 49
How Should That Work?
[Diagram: Samples → Message queue → Collector → MapR table → Web service → Users]
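The flow from samples to users can be sketched in Python with in-memory stand-ins: `queue.Queue` for the message queue, a dict of lists for the MapR table, and a plain function for the web service's range query. All names here are hypothetical; a real deployment would use a durable message queue and real tables, and the collector would batch its writes rather than append one point at a time.

```python
import queue
import threading
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

Sample = Tuple[str, int, float]                        # (series, epoch seconds, value)
Table = DefaultDict[str, List[Tuple[int, float]]]      # series -> [(time, value), ...]


def collector(q: "queue.Queue[Optional[Sample]]", table: Table) -> None:
    """Drain the message queue into the table; a None message stops the collector."""
    while True:
        item = q.get()
        if item is None:
            break
        series, ts, value = item
        table[series].append((ts, value))


def query(table: Table, series: str, start: int, end: int) -> List[Tuple[int, float]]:
    """Web-service side: points for one series with start <= time < end."""
    return [(t, v) for t, v in table.get(series, []) if start <= t < end]


q: "queue.Queue[Optional[Sample]]" = queue.Queue()
table: Table = defaultdict(list)
worker = threading.Thread(target=collector, args=(q, table))
worker.start()
for i in range(5):                                     # sensors publish samples
    q.put(("rig1.pressure", 1_000 + i, 100.0 + i))
q.put(None)                                            # shut the collector down
worker.join()
print(query(table, "rig1.pressure", 1_001, 1_003))
```

Decoupling sensors from the table through a queue lets the collector absorb bursts and retry failed writes without back-pressuring the sources.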
© 2014 MapR Technologies 50
Example Time Series
...
1409497082 327810227706 mysql.bytes_received schema=foo host=db1
1409497099 6604859181710 mysql.bytes_sent schema=foo host=db1
1409497106 327812421706 mysql.bytes_received schema=foo host=db1
1409497113 6604901075387 mysql.bytes_sent schema=foo host=db
...
Fields, left to right: UNIX epoch timestamp ($(date +%s)), value, a metric (often hierarchical), and two tags
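A minimal Python sketch for reading lines like these. It follows the field order shown in the example (timestamp, value, metric, tags) and assumes integer counter values. OpenTSDB's own telnet-style `put` command orders the fields as metric, timestamp, value, tags, which is what the hypothetical `to_put` helper emits.

```python
from typing import Dict, NamedTuple


class Sample(NamedTuple):
    timestamp: int          # UNIX epoch seconds
    value: int              # assumed integer counter, as in the example
    metric: str             # e.g. "mysql.bytes_received"
    tags: Dict[str, str]    # e.g. {"schema": "foo", "host": "db1"}


def parse_line(line: str) -> Sample:
    """Parse '<timestamp> <value> <metric> <tagk=tagv> ...' in the example's field order."""
    ts, value, metric, *tag_fields = line.split()
    tags = dict(field.split("=", 1) for field in tag_fields)
    return Sample(int(ts), int(value), metric, tags)


def to_put(s: Sample) -> str:
    """Render as an OpenTSDB telnet-style 'put <metric> <timestamp> <value> <tags>' command."""
    tag_text = " ".join(f"{k}={v}" for k, v in s.tags.items())
    return f"put {s.metric} {s.timestamp} {s.value} {tag_text}"


s = parse_line("1409497082 327810227706 mysql.bytes_received schema=foo host=db1")
print(to_put(s))
```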
The Whole Picture
[Diagram: the full pipeline, with HBase or MapR-DB as the table store]
Wide Table Design: Point-by-Point
Wide Table Design: Hybrid Point-by-Point + Blob
Inserting the data as a blob makes the original columns redundant
The tables are non-relational, but you can still query them with Apache Drill
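A simplified Python sketch of the two wide-table designs, using a dict as a stand-in for an HBase or MapR-DB table. It assumes one row per series per hour, keyed by (series, hour-aligned base time), with one column per sample whose qualifier is the offset in seconds; the hypothetical `compact_row` then packs the row into a single blob cell and drops the now-redundant per-point columns. OpenTSDB's real encoding (UID-based row keys, bit-packed qualifiers) differs from this layout.

```python
import struct
from collections import defaultdict
from typing import Dict, List, Tuple

ROW_SPAN_S = 3600                 # one row per series per hour (assumption)
RowKey = Tuple[str, int]          # (series id, hour-aligned base time)
POINT = struct.Struct(">Hd")      # 2-byte offset + 8-byte float = 10 bytes per point

# Stand-in for a wide table: table[row_key][column_qualifier] = cell bytes
table: Dict[RowKey, Dict[str, bytes]] = defaultdict(dict)


def put_point(series: str, ts: int, value: float) -> None:
    """Point-by-point design: one cell per sample, qualifier = offset within the hour."""
    base = ts - ts % ROW_SPAN_S
    table[(series, base)][f"o{ts - base}"] = struct.pack(">d", value)


def compact_row(key: RowKey) -> None:
    """Hybrid design: rewrite the row's point cells as one sorted blob cell.

    Assumes the hour is complete and the row has not been compacted before.
    """
    row = table[key]
    points = sorted((int(q[1:]), cell) for q, cell in row.items() if q.startswith("o"))
    blob = b"".join(struct.pack(">H", off) + cell for off, cell in points)
    table[key] = {"blob": blob}   # the per-point columns are now redundant


def read_blob(key: RowKey) -> List[Tuple[int, float]]:
    """Decode a compacted row back into (offset_seconds, value) pairs."""
    return list(POINT.iter_unpack(table[key]["blob"]))


for ts, v in [(1409497082, 1.0), (1409497106, 2.0)]:
    put_point("mysql.bytes_received host=db1", ts, v)
key = ("mysql.bytes_received host=db1", 1409497082 - 1409497082 % ROW_SPAN_S)
compact_row(key)
print(read_blob(key))
```

The compaction step is the second write per sample mentioned on the next slide: every point is written once as its own cell and again as part of the blob.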
Status to This Point
• Each sample requires one insertion; compaction requires another
• Typical performance on SE cluster
– 1 edge node + 4 cluster nodes
– 20,000 samples per second observed
– Would be faster on a performance cluster, though possibly not by much
• Suitable for server monitoring
• Not suitable for large scale history ingestion
• Bulk load helps a little, but not much
• Still 1000x too slow for industrial work
Speeding up OpenTSDB
20,000 data points per second per node in the test cluster
Why can’t it be faster?
Speeding up OpenTSDB: open source MapR extensions
Available on Github: https://github.com/mapr-demos/opentsdb
Status to This Point
• 3600 samples require one insertion
• Typical results on SE cluster
– 1 edge node + 4 cluster nodes
– 14 million samples per second observed
– ~700x faster ingestion
• Typical results on performance cluster
– 2-4 edge nodes + 4-9 cluster nodes
– 110 million samples/s (4 nodes) to >200 million samples/s (8 nodes)
• Suitable for large scale history ingestion
• 30 million data points retrieved in 20s
• Ready for industrial work
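A quick Python check of these figures, using only numbers from the slides. The blob design's 14 million samples/s on the SE cluster is 700x the earlier 20,000 samples/s, yet at 3600 samples per insertion it needs only about 3,900 table insertions per second, fewer than the point-by-point design was already performing. That suggests the table itself is no longer the bottleneck.

```python
POINT_BY_POINT_RATE = 20_000       # samples/s, point-by-point design, SE cluster
BLOB_RATE = 14_000_000             # samples/s, blob design, same SE cluster
SAMPLES_PER_INSERTION = 3600       # from the slide: 3600 samples per insertion


def insertions_per_second(samples_per_s: float, samples_per_insertion: int) -> float:
    """Table insertions needed to sustain a sample rate when samples are batched."""
    return samples_per_s / samples_per_insertion


speedup = BLOB_RATE / POINT_BY_POINT_RATE
blob_puts = insertions_per_second(BLOB_RATE, SAMPLES_PER_INSERTION)
point_puts = insertions_per_second(POINT_BY_POINT_RATE, 1)  # not counting compaction writes
print(f"speedup ~{speedup:.0f}x, blob insertions/s ~{blob_puts:,.0f}, "
      f"point-by-point insertions/s {point_puts:,.0f}")
```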
Going Further
• OpenTSDB is substantially limited in many respects
– Millisecond resolution is a bit of a hack
– Data formats “just growed”, better design needed
– Internal code is difficult to modify safely
• Possible improvements
– Compress and batch at collectors
– Use advanced compression technology
– Interface with modern query systems (Apache Drill)
Compression example
• Samples are a 64-bit time and a 16-bit sample value
• Samples are taken at 10 kHz
• Sample-time jitter makes it important to keep the original time-stamp
• How much overhead does it cost to retain the time-stamp?
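One way to answer the overhead question is sketched below in Python, under stated assumptions: timestamps are integer microseconds, nominally 100 µs apart (10 kHz) with up to ±2 µs of jitter, and each timestamp after the first is stored as a delta-of-delta, zigzag-mapped and written as a variable-length integer. This is a generic technique chosen for illustration, not necessarily what the authors used. Under these assumptions every timestamp after the second costs one byte, so keeping exact timestamps adds about half the size of a 16-bit sample instead of four times it.

```python
import random
from typing import List


def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return (n << 1) ^ (n >> 63)


def varint_len(u: int) -> int:
    """Bytes needed to store u as a LEB128-style varint (7 payload bits per byte)."""
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length


def encoded_size(timestamps: List[int]) -> int:
    """Bytes to store timestamps: first raw (8 bytes), then delta-of-deltas as varints."""
    if not timestamps:
        return 0
    size = 8
    prev, prev_delta = timestamps[0], None
    for t in timestamps[1:]:
        delta = t - prev
        # The first delta is stored as-is; later ones as the change from the previous delta.
        dod = delta if prev_delta is None else delta - prev_delta
        size += varint_len(zigzag(dod))
        prev, prev_delta = t, delta
    return size


rng = random.Random(42)
n, period_us = 10_000, 100                      # 10 kHz sampling, microsecond timestamps
ts = [i * period_us + rng.randint(-2, 2) for i in range(n)]
print(f"raw: {8 * n} bytes, delta-of-delta: {encoded_size(ts)} bytes "
      f"({encoded_size(ts) / n:.2f} bytes per timestamp)")
```

With ±2 µs jitter the delta-of-delta never exceeds ±8, so each fits in a single varint byte; a bit-level encoding could shrink this further, at the cost of more complex decoding.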
Key Results
• Ingestion is network limited
– Edge nodes are the critical resource
– Number of edge nodes defines a limit to scaling
• With enough edge nodes, scaling is near perfect
• Performance of raw OpenTSDB is limited by its stateless daemon
• Modified OpenTSDB can run 1000x faster
Overall Ingestion Rate
[Chart: total ingestion rate (millions of points/second) versus number of nodes (4, 5, 8, 9), with one ingestor and with two ingestors]
Normalized Ingestion Rate
[Chart: ingestion rate per node (millions of points/second) versus number of nodes (4, 5, 8, 9), with one ingestor and with two ingestors]
Why MapR?
• MapR tables are inherently faster, safer
– Sustained > 1GB/s ingest rate in tests
• Mirror to M5 or M7 cluster to isolate analytics load
• Transaction logs involve frequent appends and many files
When is this All Wrong?
• In some cases, retrieval by series ID + time range is not sufficient
• May need very flexible retrieval of events based on text-like criteria
• Search may be better than a classic time-series database
• Can scale Lucene-based search to > 1 million events / second
When is it Even More Right?
• In many industrial settings, data rates from individual sensors are relatively high
– Latency to view is still measured in seconds, not sample points
• This allows batching at source
• Common requirement for highly variable sample rates
– 1 sample/s baseline, switching to 10k samples/s
– Small batches during slow times are just fine, since the number of sensors is constant
– Requires variable window sizes
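A minimal Python sketch of batching at the source with a variable window. The class and parameter names are hypothetical. It assumes each sensor closes its current batch when the next sample falls outside a fixed time window or when the batch reaches a size cap, so slow periods produce small batches and bursts produce large batches that cover a short time span. Times come from the sample timestamps rather than a wall clock, so an idle batch is only sent on the next sample or an explicit `flush()`.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (timestamp_s, value)


class SourceBatcher:
    """Batch one sensor's samples, closing a batch on age or size."""

    def __init__(self, send: Callable[[List[Point]], None],
                 max_points: int = 1000, max_age_s: float = 1.0) -> None:
        self.send = send
        self.max_points = max_points
        self.max_age_s = max_age_s
        self.batch: List[Point] = []

    def add(self, ts: float, value: float) -> None:
        # Close the current window first if this sample falls outside it.
        if self.batch and ts - self.batch[0][0] >= self.max_age_s:
            self.flush()
        self.batch.append((ts, value))
        # At high rates, cap the batch size instead of waiting for the window.
        if len(self.batch) >= self.max_points:
            self.flush()

    def flush(self) -> None:
        if self.batch:
            self.send(self.batch)
            self.batch = []


sent: List[List[Point]] = []
batcher = SourceBatcher(sent.append)
for i in range(5):                        # baseline: 1 sample/s
    batcher.add(float(i), 0.0)
for i in range(5_000):                    # burst: 10k samples/s starting at t = 5 s
    batcher.add(5.0 + i / 10_000, 1.0)
batcher.flush()
print([len(b) for b in sent])
```

During the 1 sample/s baseline each batch holds a single point; during the burst each batch holds 1,000 points spanning about 0.1 s, so viewing latency stays within the one-second window in both regimes.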
Summary
• The internet is turning upside down
• This will make time series ubiquitous
• Current open source systems are much too slow
• We can fix that with modern NoSQL systems
– (I wear a red hat for a reason)
Questions
Thank You
@mapr maprtech
tdunning@mapr.com
tdunning@apache.org
Ted Dunning, Chief Application Architect
MapR Technologies
maprtech
mapr-technologies

Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Recently uploaded (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

How the Internet of Things are Turning the Internet Upside Down

  • 1. © 2014 MapR Technologies
  • 2. Who am I?
    Ted Dunning, Chief Applications Architect, MapR Technologies
    Email: tdunning@mapr.com, tdunning@apache.org
    Twitter: @Ted_Dunning
  • 3. e-book available courtesy of MapR: http://bit.ly/1jQ9QuL
    A New Look at Anomaly Detection, by Ted Dunning and Ellen Friedman, © June 2014 (published by O’Reilly)
  • 4. Agenda
    • The Internet is turning upside down
    • A story
    • The last (mile) shall be first
    • Time series on NoSQL
    • Faster time series on NoSQL
    • Summary
  • 5. How the Internet Works
    • Big content servers feed data across the backbone to regional caches and servers
    • Regional caches and servers feed data across neighborhood transport to the “last mile”
    • Bits are nearly conserved; $ are concentrated centrally
      – But total $ mass at the edge is much higher
  • 6. How The Internet Works
    [Diagram: a server feeds two caches; the caches feed edge sites (gateway, switch, firewall), each serving clients c1 and c2]
  • 7. Conservation of Bits Decreases Bandwidth
    [Same diagram as slide 6]
  • 8. Total Investment Dominated by Last Mile
    [Same diagram as slide 6]
  • 9. The Rub
    • What's the problem?
      – Speed (end-to-end latency, backbone bandwidth)
      – Feasibility (cost for consumer links)
      – Caching
    • What do we need?
      – Cheap last-mile hardware
      – Good caches
  • 10. First: An apology for going off-script
  • 11. Now, the story
  • 13. By the 1840s, the New York to San Francisco sailing time was down to 130-180 days
  • 15. In 1851, the record was set at 89 days by the Flying Cloud
  • 16. The difference was due (in part) to big data and a primitive kind of time-series database
  • 20. These charts were free… if you donated your data
  • 21. But how does this apply today?
  • 22. What has changed? Where will it lead?
  • 33. Things
  • 34. Emitting data
  • 35. How The Internet Works
    [Same diagram as slide 6]
  • 36. How the Internet is Going to Work
    [Diagram: a server feeds two caches; the caches feed edge sites (gateway, switch, controller), connected to devices m1 through m6]
  • 37. Where Will The $ Go?
    [Same diagram as slide 36]
  • 38. Sensors
  • 39. Controllers
  • 40. The Problems
    • Sensors and controllers have little processing power or storage
      – SIM cards: 20 MHz processor, 128 kb of storage (= 16 kB)
      – Arduino Mini: 15 kB RAM (more in EPROM)
      – BeagleBone/Raspberry Pi: ~500 MB RAM
    • Sensors and controllers have little power
      – Very common to power down 99% of the time
    • Sensors and controllers often have very low bandwidth
      – Mesh networks with base rates << 1 Mb/s
      – Power line networking
      – Intermittent 3G/4G/LTE connectivity
  • 41. What Do We Need to Do With a Time Series?
    • Acquire
      – Measurement, transmission, reception
      – Mostly not our problem
    • Store
      – We own this
    • Retrieve
      – We have to allow this
    • Analyze and visualize
      – We facilitate this via retrieval
  • 42. Retrieval Requirements
    • Retrieve by time series, time range, tags
      – Possibly pull millions of data points at a time
      – Possibly do on-the-fly windowed aggregations
    • Search by unstructured data
      – Typically requires time-windowed faceting after search
      – Also need to drill down with the first kind of retrieval
  • 43. Storage choices and trade-offs
    • Flat files
      – Great for rapid ingest with massive data
      – Handles essentially any data type
      – Less good for data requiring frequent updates
      – Harder to find specific ranges
    • Traditional relational db
      – Ingests up to tens of thousands per second
      – Prefers well-structured (numerical) data
      – Expensive
    • Non-relational db: tables (such as MapR tables in M7, or HBase)
      – Ingests up to 100,000 rows/s
      – Handles a wide variety of data
      – Good for frequent updates
      – Easily scanned in a range
  • 44. Specific Example
    • Consider a server farm
    • Lots of system metrics
    • Typically 100-300 stats / 30 s
    • Loads, RPCs, packets, requests/s
    • Common to have 100 – 10,000 machines
  • 45. The General Outline
    • 10 samples / second / machine × 1,000 machines = 10,000 samples / second
    • This is what OpenTSDB was designed to handle
    • Install and go, but don’t test at scale
  • 46. Specific Example
    • Consider oil drilling rigs
    • When drilling wells, there are *lots* of moving parts
    • Typically a drilling rig makes about 10K samples/s
    • Temperatures, pressures, magnetics, machine vibration levels, salinity, voltage, currents, many others
    • Typical project has 100 rigs
  • 47. The General Outline
    • 10K samples / second / rig × 100 rigs = 1M samples / second
  • 48. The General Outline
    • 10K samples / second / rig × 100 rigs = 1M samples / second
    • But wait, there’s more
      – Suppose you want to test your system
      – Perhaps with a year of data
      – And you want to load that data in << 1 year
    • 100x real-time = 100M samples / second
  • 49. How Should That Work?
    [Diagram components: samples, message queue, collector, MapR table, web service, users]
  • 50. Example Time Series
        ...
        1409497082 327810227706 mysql.bytes_received schema=foo host=db1
        1409497099 6604859181710 mysql.bytes_sent schema=foo host=db1
        1409497106 327812421706 mysql.bytes_received schema=foo host=db1
        1409497113 6604901075387 mysql.bytes_sent schema=foo host=db
        ...
    Each line holds a UNIX epoch timestamp ($(date +%s)), a value, a metric name (often hierarchical), and two tags.
  • 51. The Whole Picture
    [Diagram labeled: HBase or MapR-DB]
  • 52. Wide Table Design: Point-by-Point
  • 53. Wide Table Design: Hybrid Point-by-Point + Blob
    • Insertion of data as a blob makes the original columns redundant
    • Non-relational, but you can query these tables with Drill
  • 54. Status to This Point
    • Each sample requires one insertion; compaction requires another
    • Typical performance on SE cluster
      – 1 edge node + 4 cluster nodes
      – 20,000 samples per second observed
      – Would be faster on a performance cluster, possibly not by much
    • Suitable for server monitoring
    • Not suitable for large-scale history ingestion
    • Bulk load helps a little, but not much
    • Still 1000x too slow for industrial work
  • 55. Speeding up OpenTSDB
    • 20,000 data points per second per node in the test cluster
    • Why can’t it be faster?
  • 56. Speeding up OpenTSDB: open source MapR extensions
    • Available on GitHub: https://github.com/mapr-demos/opentsdb
  • 57. Status to This Point
    • 3600 samples require one insertion
    • Typical results on SE cluster
      – 1 edge node + 4 cluster nodes
      – 14 million samples per second observed
      – ~700x faster ingestion
    • Typical results on performance cluster
      – 2-4 edge nodes + 4-9 cluster nodes
      – 110 million samples/s (4 nodes) to >200 million samples/s (8 nodes)
    • Suitable for large-scale history ingestion
    • 30 million data points retrieved in 20 s
    • Ready for industrial work
  • 58. Going Further
    • OpenTSDB is substantially limited in many respects
      – Millisecond resolution is a bit of a hack
      – Data formats “just growed”; better design needed
      – Internal code is difficult to modify safely
    • Possible improvements
      – Compress and batch at collectors
      – Use advanced compression technology
      – Interface with modern query systems (Apache Drill)
  • 59. Compression example
    • Samples are a 64-bit timestamp plus a 16-bit sample value
    • Sampled at 10 kHz
    • Sample-time jitter makes it important to keep the original timestamp
    • How much overhead does it cost to retain the timestamp?
  • 60. Key Results
    • Ingestion is network limited
      – Edge nodes are the critical resource
      – The number of edge nodes defines a limit to scaling
    • With enough edge nodes, scaling is near perfect
    • Performance of raw OpenTSDB is limited by the stateless daemon
    • Modified OpenTSDB can run 1000x faster
  • 61. Overall Ingestion Rate
    [Chart: total ingestion rate (millions of points/second, axis 0 to 250) vs. number of nodes (4, 5, 8, 9), for one ingestor and two ingestors]
  • 62. Normalized Ingestion Rate
    [Chart: ingestion per node (millions of points/second, axis 0 to 40) vs. number of nodes (4, 5, 8, 9), for one ingestor and two ingestors]
  • 63. Why MapR?
    • MapR tables are inherently faster, safer
      – Sustained > 1 GB/s ingest rate in tests
    • Mirror to M5 or M7 cluster to isolate analytics load
    • Transaction logs involve frequent appends, many files
  • 64. When is this All Wrong?
    • In some cases, retrieval by series-id + time range is not sufficient
    • May need very flexible retrieval of events based on text-like criteria
    • Search may be better than a classic time-series database
    • Can scale Lucene-based search to > 1 million events / second
  • 65. When is it Even More Right?
    • In many industrial settings, data rates from individual sensors are relatively high
      – Latency to view is still measured in seconds, not sample points
    • This allows batching at the source
    • Common requirement for highly variable sample rates
      – 1 sample/s baseline, switching to 10k samples/s
      – Small batches during slow times are just fine since the number of sensors is constant
      – Requires variable window sizes
  • 66. Summary
    • The internet is turning upside down
    • This will make time series ubiquitous
    • Current open source systems are much too slow
    • We can fix that with modern NoSQL systems
      – (I wear a red hat for a reason)
  • 67. Questions
  • 68. Thank You
    Ted Dunning, Chief Application Architect, MapR Technologies
    tdunning@mapr.com, tdunning@apache.org
    @mapr, maprtech, mapr-technologies
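The ingest-rate arithmetic on the drilling-rig slides (about 10K samples/s per rig, 100 rigs, history replayed at 100x real time) can be checked with a few lines of Python. This is only a back-of-envelope sketch: the constants are taken from the slides, the helper names are hypothetical, and the "one year in 3.65 days" figure is simply 365 days divided by the 100x speedup.

```python
SAMPLES_PER_SEC_PER_RIG = 10_000  # "about 10K samples/s" per drilling rig
RIGS = 100                        # "Typical project has 100 rigs"
REPLAY_SPEEDUP = 100              # load a year of history at 100x real time


def required_rate(per_source: int, sources: int, speedup: int = 1) -> int:
    """Samples per second the storage tier must absorb."""
    return per_source * sources * speedup


def replay_days(history_days: float, speedup: int) -> float:
    """Wall-clock days needed to load `history_days` of data at `speedup`x real time."""
    return history_days / speedup


live = required_rate(SAMPLES_PER_SEC_PER_RIG, RIGS)
replay = required_rate(SAMPLES_PER_SEC_PER_RIG, RIGS, REPLAY_SPEEDUP)

print(f"live:   {live:,} samples/s")    # live:   1,000,000 samples/s
print(f"replay: {replay:,} samples/s")  # replay: 100,000,000 samples/s
print(f"one year of history loads in {replay_days(365, REPLAY_SPEEDUP):.2f} days")  # 3.65
```

Comparing the 100M samples/s replay target with the roughly 20,000 samples/s observed for unmodified OpenTSDB on the test cluster shows why the slides call the original design far too slow for industrial work.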
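The example time series on slide 50 lists each sample as a timestamp, a value, a metric name, and tag=value pairs. A minimal parser for that layout might look like this; it assumes integer values (as in the byte counters shown), and `parse_line`, `to_put_line`, and `Sample` are hypothetical names. The conversion renders a sample in the order of OpenTSDB's telnet-style `put` command (`put <metric> <timestamp> <value> <tags>`), which puts the metric first rather than the timestamp.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sample:
    timestamp: int         # UNIX epoch seconds, e.g. from $(date +%s)
    value: int             # assumed integer, like the slide's byte counters
    metric: str            # often hierarchical, e.g. "mysql.bytes_sent"
    tags: Dict[str, str]   # e.g. {"schema": "foo", "host": "db1"}


def parse_line(line: str) -> Sample:
    """Parse 'timestamp value metric tag=val ...' in the order shown on the slide."""
    ts, value, metric, *tag_fields = line.split()
    tags = dict(field.split("=", 1) for field in tag_fields)
    return Sample(int(ts), int(value), metric, tags)


def to_put_line(s: Sample) -> str:
    """Render as an OpenTSDB telnet-style command: put <metric> <ts> <value> <tags>."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(s.tags.items()))
    return f"put {s.metric} {s.timestamp} {s.value} {tag_str}"


line = "1409497082 327810227706 mysql.bytes_received schema=foo host=db1"
print(to_put_line(parse_line(line)))
# put mysql.bytes_received 1409497082 327810227706 host=db1 schema=foo
```

Sorting the tags is a design choice for deterministic output; the order of tags in the original line is not significant.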
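Slide 59 asks how much it costs to keep the original timestamp when samples arrive at 10 kHz with jitter. One common technique, shown here as an illustration rather than as the method used in the talk, is to store the first timestamp in full and then only each interval's deviation from the nominal period, as a zigzag-encoded varint. The sketch assumes nanosecond timestamps and a small, bounded jitter (under 100 ns here); under those assumptions each later timestamp costs one or two bytes instead of eight. The function names are hypothetical.

```python
from typing import List

NOMINAL_PERIOD_NS = 100_000  # 10 kHz sampling; timestamps assumed to be in nanoseconds


def zigzag(n: int) -> int:
    """Map signed deviations to unsigned ints so small magnitudes stay small."""
    return 2 * n if n >= 0 else -2 * n - 1


def varint(u: int) -> bytes:
    """LEB128-style varint: 7 payload bits per byte, high bit set means 'more follows'."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_timestamps(ts: List[int], period: int = NOMINAL_PERIOD_NS) -> bytes:
    """First timestamp as 8 raw bytes, then each interval's deviation from `period`."""
    out = bytearray(ts[0].to_bytes(8, "big"))
    for prev, cur in zip(ts, ts[1:]):
        out += varint(zigzag(cur - prev - period))
    return bytes(out)


if __name__ == "__main__":
    start = 1_409_497_082 * 1_000_000_000  # epoch seconds -> nanoseconds
    n = 10_000                              # one second of samples at 10 kHz
    # Deterministic stand-in for jitter: offsets in [-100, 99] ns around the ideal time.
    ts = [start + i * NOMINAL_PERIOD_NS + (i * 37) % 200 - 100 for i in range(n)]
    encoded = encode_timestamps(ts)
    print(f"raw: {8 * n} bytes, encoded: {len(encoded)} bytes "
          f"({8 * len(encoded) / n:.1f} bits per timestamp)")
```

Because the deviation between consecutive jitter offsets here never exceeds 199 ns, its zigzag value stays below 16,384 and every varint fits in at most two bytes; larger jitter would cost more.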

Editor's Notes

  1. Talk track: This is the second in a series; the first was on how to build a simple recommender. This one, on anomaly detection, is being sold by O’Reilly on Amazon, but for a limited time MapR is giving away the e-book for free. Here’s the link where you can register to get one.
  2. Ted’s original talk notes: OpenTSDB consists of a Time Series Daemon (TSD) as well as a set of command-line utilities. Interaction with OpenTSDB is primarily achieved by running one or more TSDs. Each TSD is independent: there is no master and no shared state, so you can run as many TSDs as required to handle any load you throw at it. Each TSD uses the open source database HBase to store and retrieve time-series data. The HBase schema is highly optimized for fast aggregations of similar time series to minimize storage space. Users of the TSD never need to access HBase directly. You can communicate with the TSD via a simple telnet-style protocol, an HTTP API, or a simple built-in GUI. All communications happen on the same port (the TSD figures out the client's protocol by looking at the first few bytes it receives).
  3. Key ideas: a unique row key based on an id for each time series (looked up from a separate lookup table); an important part of the design's efficiency is that each column is a time offset from the start time shown in the row key. Note that data is stored point-by-point in this wide table design. Ted’s notes from his original slide: One technique for increasing the rate at which data can be retrieved from a time series database is to store many values in each row. Doing this allows data points to be retrieved at a higher speed. Because both HBase and MapR-DB store data ordered by the primary key, this design will cause rows containing data from a single time series to wind up near one another on disk. Retrieving data from a particular time series for a time range will involve largely sequential disk operations and therefore will be much faster than would be the case if the rows were widely scattered. Typically, the time window is adjusted so that 100–1,000 samples are in each row.
  4. Ted’s notes from original slide: The table design is improved by collapsing all of the data for a row into a single data structure known as a blob. This blob can be highly compressed so that less data needs to be read from disk. Also, having a single column per row decreases the per-column overhead incurred by the on-disk format that HBase uses, which further increases performance. Data can be progressively converted to the compressed format as soon as it is known that little or no new data is likely to arrive for that time series and time window. Commonly, once the time window ends, new data will only arrive for a few more seconds, and the compression of the data can begin. Since compressed and uncompressed data can coexist in the same row, if a few samples arrive after the row is compressed, the row can simply be compressed again to merge the blob and the late-arriving samples.
  5. Richard: This is based on a figure from Chapter 3 of our book. The point here is to show that with standard OpenTSDB, data is loaded into the wide table point-by-point, then pulled out and compressed to a blob, then reloaded to form the hybrid table. This is a fairly efficient arrangement. The next slide will show how this is sped up with the MapR open source extensions. Here are Ted’s original notes: Since data is inserted in the uncompressed format, the arrival of each data point requires a row update operation to insert the value into the database. The data is then read again by the blob maker, so reads are approximately equal to writes. Once data is compressed to blobs, it is written to the database again. This row update can limit the insertion rate for data to as little as 20,000 data points per second per node in the cluster.
  6. Richard: Also based on a figure from Chapter 3 of the book: This slide shows the increased performance using the open source code MapR made available on GitHub. I’ve added the GitHub link. The key difference is that blob production occurs upstream, before the data is ever loaded into the table. The restart logs are useful so that if there were ever a glitch in the process of compressing data to blobs and inserting them, you would not lose the original data. Note that there is still the delay while blobs are made… see the explanation in the book, chapters 3 and 4. Richard: Please preserve the rest of the material on fast ingestion with MapR extensions (direct blob loading) for Ted’s talk on Sat. Use this slide as a preview and mention that Ted will be talking about this on Friday. Ted’s original notes: The direct blob insertion data flow allows the insertion rate to be increased by as much as roughly 1,000-fold. How does the direct blob approach get this bump in performance? The essential difference is that the blob maker has been moved into the data flow between the catcher and the NoSQL time series database. This way, the blob maker can use incoming data from a memory cache rather than extracting its input from wide table rows already stored in the storage tier. The full data stream is only written to the memory cache, which is fast, rather than to the database. Data is not written to the storage tier until it’s compressed into blobs, so writing can be much faster. The number of database operations is decreased by a factor equal to the average number of data points in each of the compressed data blobs. This decrease can easily be a factor in the thousands.