© 2017 MapR Technologies 1
Geo-Distributed Big Data and Analytics:
Data where you need it
Computation where you want it
Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Board member, Apache Software Foundation
O’Reilly author
Email tdunning@mapr.com tdunning@apache.org
Twitter @ted_dunning
Contact Information
Ellen Friedman, PhD
Principal Technologist, MapR Technologies
Committer Apache Drill & Apache Mahout projects
O’Reilly author
Email efriedman@mapr.com ellenf@apache.org
Twitter @Ellen_Friedman
Imagine a future where …
You easily collect, access & analyze big data wherever you need it across
the globe in a seamless system under the same security & administration
The future is here: Global Data Fabric
• Global data fabric lets you update business globally
• Local activities coordinate with global analyses
• Do this without requiring huge teams at each site
• Do this affordably, reliably, with low latency and at large scale
We’re here to explain how, but first: a real-world case study
MapR customer:
“A year in the bank”
A Year of Technical Credit, not Debt
• Streaming media delivery is inherently geo-distributed
• Must measure if you want to manage
– metrics need to be collected and processed locally
– and distributed back to central facility
• How?
Streams!
Collect Data
[Diagram labels: data center, web servers, web-server logs, log-stash, log consolidator, log_events]
And Transport to Global Analytics
[Diagram labels: data center (web servers, web-server logs, log-stash, log consolidator, log_events); GHQ (log_events, Elaborate (log-stash), events, Aggregate, Signal detection)]
With Many Sources
[Diagram, repeated over three build slides: one, then two, then three data centers (each with web servers, web-server logs, log-stash, a log consolidator, and a log_events stream) alongside GHQ (log_events, Elaborate (log-stash), events, Aggregate, Signal detection)]
Analytics Doesn’t Care About Location
[Diagram labels: GHQ, log_events, Elaborate (log-stash), events, Aggregate, Signal detection]
MapR Streams:
Geo-distributed data
plus separation of concerns
Why stream?
Mechanism for a Global Data Fabric: Streaming
• Streaming data is becoming mainstream
• Innovative technologies are emerging to handle and process
streaming data
• Stream-1st architecture is a powerful approach with surprisingly
widespread advantages
Images © E. Friedman
IoT Data: Sensors & Smart Parts
©WesAbrams
Revolution in Manufacturing: Smart Tools
• Respond quickly to new requirements
• IoT enabled “smart tools” on
manufacturing floor
• Reconfigurable factory
• Removes barriers to communication:
engineering, design, analysis,
manufacture in one space
Image credit Bond Bryan Architecture, used with permission
Factory 2050: Boeing AMRC at University of Sheffield
Streaming data has value
beyond real-time insights
Predictive Maintenance
• Streaming sensor data + long term maintenance histories 
• Machine learning model detects anomalous pattern
• “Failure signature” warns of need for maintenance before
damage occurs
Image courtesy of Mtell, used with permission, in Real World Hadoop by Dunning & Friedman © 2015
Stream-1st Architecture
[Diagram labels: Real-time analytics, EMR, Patient Facilities management, Insurance audit, Medical tests, Medical test results; use-case classes marked A, B, C]
People often start with event data streamed to a real-time application (A)
Image © 2016 Ted Dunning & Ellen Friedman from Chap 1 of O’Reilly book Streaming Architecture, used with permission
Heart of Stream-1st Architecture: Message Transport
[Diagram labels: Real-time analytics, EMR, Patient Facilities management, Insurance audit, Medical tests, Medical test results; use-case classes marked A, B, C]
The right messaging tool supports other classes of use cases (B & C)
Image © 2016 Ted Dunning & Ellen Friedman from Chap 1 of O’Reilly book Streaming Architecture, used with permission
Key capabilities:
Message Transport: Apache Kafka & MapR Streams
• Highly scalable
• High throughput, low latency
• Multiple producers &
consumers: decoupled
• Durable messages
• Geo-distributed replication
preserves offsets (unique to
MapR Streams)
[Diagram labels: Producer (x2), Messages, Consumer group (x3)]
Image © 2016 Ted Dunning & Ellen Friedman from Chap 2 of O’Reilly book Streaming Architecture
used with permission
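The capabilities listed above (durable messages, multiple decoupled producers and consumers) can be illustrated with a toy in-memory model. This is a minimal sketch and not the Kafka or MapR Streams API: the `Topic` class, its methods, and the group names are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class Topic:
    """Toy append-only message log; each consumer group tracks its own position."""

    def __init__(self):
        self.messages = []               # ordered log (in memory here, durable in a real system)
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, message):
        """Append a message and return its offset."""
        self.messages.append(message)
        return len(self.messages) - 1

    def poll(self, group, max_messages=10):
        """Return unread messages for one consumer group and advance its offset."""
        start = self.offsets[group]
        batch = self.messages[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch


topic = Topic()
for event in ["login", "play", "pause"]:
    topic.publish(event)

# Two consumer groups read the same messages at their own pace.
realtime = topic.poll("realtime-analytics")
audit_first = topic.poll("audit", max_messages=1)
topic.publish("stop")
audit_rest = topic.poll("audit")
```

Because reading does not remove messages, a slow consumer group never holds back a fast one, and producers need no knowledge of who consumes.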
Stream transport
supports micro services
Stream-1st Architecture: Basis for Micro-Services
Stream instead of database as the shared “truth”
[Diagram labels: POS 1..n, Fraud detector, Last card use, Updater, Card analytics, Other card activity]
Image © 2016 Ted Dunning & Ellen Friedman from Chap 6 of O’Reilly book Streaming Architecture used with permission
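The idea of a stream, rather than a database, as the shared "truth" can be sketched as two independent services that each read the same event sequence and build their own state. The event fields, the `min_gap` threshold, and the fraud rule below are assumptions made up for illustration; they are not taken from the talk or the book.

```python
# Hypothetical card-swipe events; field names are illustrative only.
card_events = [
    {"card": "c1", "city": "SF", "ts": 100},
    {"card": "c2", "city": "NY", "ts": 105},
    {"card": "c1", "city": "NY", "ts": 110},
]


def update_last_use(events):
    """Updater service: materialize a 'last card use' view from the stream."""
    last_use = {}
    for e in events:
        last_use[e["card"]] = e
    return last_use


def detect_fraud(events, min_gap=60):
    """Fraud detector: flag a card used in two cities within min_gap seconds (assumed rule)."""
    last_seen = {}
    alerts = []
    for e in events:
        prev = last_seen.get(e["card"])
        if prev and prev["city"] != e["city"] and e["ts"] - prev["ts"] < min_gap:
            alerts.append(e["card"])
        last_seen[e["card"]] = e
    return alerts


# Each service derives its own state from the same events; neither calls the other.
last_use = update_last_use(card_events)
alerts = detect_fraud(card_events)
```

Adding a third service, such as card analytics, would mean adding another reader of the same events without touching the first two.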
MapR Streams:
efficient bi-directional,
multi-master replication
With MapR, Geo-Distributed Data Appears Local
[Diagram, built up over three slides; labels: Data source, stream, stream, Consumer, Regional Data Center, Global Data Center]
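The deck states that MapR Streams replication preserves offsets. A toy model of why that matters for a consumer that moves between sites is sketched below. The `Replica` class and `replicate` function are hypothetical and say nothing about how MapR implements replication; the sketch is one-directional for brevity.

```python
class Replica:
    """Toy copy of a stream held at one site."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)

    def read_from(self, offset):
        return self.log[offset:]


def replicate(source, target):
    """Copy the messages the target is missing, in order, so offsets line up."""
    target.log.extend(source.log[len(target.log):])


regional, global_dc = Replica(), Replica()
for m in ["m0", "m1", "m2", "m3"]:
    regional.append(m)
replicate(regional, global_dc)

# A consumer that had processed m0 and m1 at the regional site can resume
# at the other site from the same offset, with no gaps and no repeats.
committed_offset = 2
resumed = global_dc.read_from(committed_offset)
```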
Unique to MapR: Manage Topics at Stream Level
• Many more topics on a MapR cluster
• Topics are grouped together in a Stream (different from Kafka)
• Policies such as time-to-live and ACEs are set at the Stream level (controlling
access at this level is different from Kafka)
• Geo-distributed stream replication (different from Kafka)
[Diagram: a Stream containing Topic 1, Topic 2, Topic 3]
Image © 2016 Ted Dunning & Ellen Friedman from Chap 5 of O’Reilly book Streaming Architecture used with permission
Multi-Master Matters: MapR Table Replication
[Diagram: two panels, A and B, each showing an SF DB and an NY DB, each DB with its own source]
Better: Bi-directional table replication
Multi-Master Replication Matters
From O’Reilly report “Data Where You Want It: Geo-Distribution of Big Data and
Analytics” © 2017 by Ted Dunning & Ellen Friedman, used with permission
MapR: Files, tables, streams in one technology
[Diagram labels: Data Center; Legacy Applications, Big Data 1.0 Applications, Next-Gen Applications; Converged Data Platform; High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace; Web-Scale Storage, Real-Time Database, Stream Transport]
Example
[Diagram labels: Cluster, Volume mount point, Directories, Files, Table, Streams]
Geo-distributed data
with ease of management
Remember this? Universal Pathname
[Diagram labels: Cluster, Volume mount point, Directories, Files, Table, Streams]
Global Namespace: Advantage for Geo-Distribution
• Enables program to refer to data anywhere
– On premise, in cloud, around the world
• Easier to manage: Supports separation of concerns
• Unique to MapR
The Need for Cloud
• Fewer machines to purchase, flexibility regarding hardware
• Eases pressure on IT
• Cloud-bursting possible for short-term increase in computing
• Lets you optimize cluster usage (especially if you maintain
cloud neutrality)
Cloud Neutrality for Optimization
[Diagram labels: Core (private, on-premise data center): 4x cheaper for base load. Burst: 4x cheaper for peak loads.]
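A small worked calculation shows why splitting core and burst load can pay off. Only the 4x price ratio for base load is borrowed from the slide; the cost model (on-premise capacity is paid for whether used or not, cloud capacity only when used), the rates, and the load profile are assumptions chosen for illustration.

```python
ON_PREM_RATE = 1.0  # assumed cost per provisioned machine-hour (paid even when idle)
CLOUD_RATE = 4.0    # assumed cost per machine-hour actually used


def all_on_prem(load):
    """Provision on-premise capacity for the peak and pay for it every hour."""
    return max(load) * len(load) * ON_PREM_RATE


def all_cloud(load):
    """Pay cloud rates for every machine-hour used."""
    return sum(load) * CLOUD_RATE


def hybrid(load, core):
    """Run `core` machines on-premise and burst anything above that to the cloud."""
    burst = sum(max(0, machines - core) for machines in load)
    return core * len(load) * ON_PREM_RATE + burst * CLOUD_RATE


# Hypothetical day: 100 machines of base load, with one hour spiking to 500.
load = [100] * 23 + [500]

cost_on_prem = all_on_prem(load)     # 500 * 24 * 1.0
cost_cloud = all_cloud(load)         # 2800 * 4.0
cost_hybrid = hybrid(load, core=100) # 100 * 24 * 1.0 + 400 * 4.0
```

Under these assumptions the hybrid plan costs 4,000 units against 12,000 for on-premise only and 11,200 for cloud only; a flatter or spikier load profile would shift the balance.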
Hybrid Cloud On-Premise Architecture
• Optimization of resources, but…
• Too complicated unless you have a good platform for geo-distributed
data
• MapR is such a platform
The Need for Containers
• To deploy exactly the same thing in many data-centers
• To get predictable behavior in production compared with testing
• Result: Docker-style containers becoming ubiquitous
Key to Lightweight Containers: Persist State to MapR
[Diagram labels: Stateful Application (x2), Stateless Application, Container management system, Data platform]
• Works for files, tables, streams
• Run stateful applications in stateless containers
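Running a stateful application in a stateless container comes down to keeping the state outside the container and reloading it on start. The sketch below assumes a temporary directory as a stand-in for a path on the data platform; `run_counter` and the file name are hypothetical.

```python
import json
import os
import tempfile


def run_counter(state_path, events):
    """Process events, resuming from and saving to state kept outside the container."""
    state = {"count": 0}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    state["count"] += len(events)
    with open(state_path, "w") as f:
        json.dump(state, f)
    return state["count"]


# A temporary directory stands in for a mounted data-platform path (assumption).
platform_dir = tempfile.mkdtemp()
state_path = os.path.join(platform_dir, "counter.json")

first = run_counter(state_path, ["a", "b"])  # first container instance
second = run_counter(state_path, ["c"])      # a replacement container resumes where it left off
```

The container image itself holds nothing worth keeping, so it can be killed, rescheduled, or deployed identically at another site.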
Data where you want it,
compute where you need it…
... including processing
at the IoT edge
MapR Edge
• Small footprint cluster for remote processing
• Intended to run right next to the data-producing device
– Unified end-to-end security policy
– Reliable replication to cloud and data center, even with occasional
connections
• Small but full MapR data services (files, tables, streams)
– Normal data protection (snapshots, mirroring, replication)
– Normal management capabilities (volumes, fine-grained access
control, monitoring)
Why MapR Edge is Useful
[Diagram labels: Data source (x4), Report]
• Designed to sit at IoT edge
• Intended to run right next to data-producing
device
Who needs MapR Edge?
• Connected car industry
• Telecommunications industry
• Hospitals and medical testing facilities
• Anyone who benefits from global learning but needs local
action
© Ellen Friedman
© Ellen Friedman
©WesAbrams
MapR Edge: Improves Time to Insight
Use case: before MapR Edge → after MapR Edge
• Oil & gas: 48 hours → < 2 hours
• Medical device: 12 hours → < 15 minutes
• Car test & dev: 24 hours → < 5 minutes
Use Case: Telecommunications
Callers
Towers
cdr data
Streaming in Telecom
• Data collection & handling happens at different levels
– tower, local data center, central data center
• Batch: Can take 30 minutes per level
• Streaming: Latency drops to seconds or sub-seconds per level
• Ability to respond as events occur
• MapR Streams enables stream replication with offsets across data
centers
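The per-level figures above add up quickly across the three levels. The arithmetic below assumes that delays simply add from level to level and takes "seconds or sub-seconds" as one second per level; both are simplifying assumptions, not measurements.

```python
LEVELS = ["tower", "local data center", "central data center"]

BATCH_SECONDS_PER_LEVEL = 30 * 60  # from the slide: batch can take 30 minutes per level
STREAM_SECONDS_PER_LEVEL = 1.0     # assumption: "seconds or sub-seconds", taken as 1 s


def end_to_end(seconds_per_level, levels=LEVELS):
    """Total delay if each level adds its own delay in sequence (assumed model)."""
    return seconds_per_level * len(levels)


batch_total = end_to_end(BATCH_SECONDS_PER_LEVEL)    # 5400 s, i.e. 90 minutes
stream_total = end_to_end(STREAM_SECONDS_PER_LEVEL)  # 3.0 s
```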
Telecom Reporting and Logging
[Diagram labels: Data source (x2), Tower 1, Tower 2, HQ, Aggregate]
Use Case: Automotive IoT
[Diagram labels: Car, CAN Bus, https, REST GW, REST, Data Center, Raw stream, Dispatcher, Data stream, Metrics, μ (x2)]
Global Machine Learning Foundation
[Diagram labels: GHQ (global analytics, metrics, models); data center 1 and data center 2, each with metrics (m1, m2, m3, m4) and models]
Learn globally, act locally
Use Case: Container Shipping
Over 20% of world’s shipping containers pass through Singapore’s port.
Image © Ellen Friedman 2015, used with permission. From Chap 7 “Streaming
Architecture” book. Read free online: http://bit.ly/streams-ebook-ch7
IoT Data for Container Shipping
Tokyo:
Sensors stream data to on-board
cluster that reports to onshore cluster
while in port
En route to Singapore:
MapR Streams geo-replication sends
data to next port before ship arrives.
Problem in Sydney:
Real-time insights alert to “high
humidity” in some containers
[Map labels: Tokyo, Singapore, Sydney, Corporate HQ; callouts A, B, C]
Details in Chapter 7 “Streaming Architecture” book. Read free online here: http://bit.ly/streams-ebook-ch7
Figure used with permission.
Additional Resources
O’Reilly report by Ted Dunning & Ellen Friedman © March 2017
Download free pdf courtesy of MapR:
http://bit.ly/mapr-geo-distribution-ebook-pdf
O’Reilly book by Ted Dunning & Ellen Friedman
© March 2016
Read free courtesy of MapR
https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/
Please support women in tech – help build
girls’ dreams of what they can accomplish
© Ellen Friedman 2015
#womenintech #datawomen
Q&A
@mapr
Maprtechnologies
tdunning@mapr.com
ENGAGE WITH US
@ted_dunning
@Ellen_Friedman

More Related Content

What's hot

An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...MapR Technologies
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Carol McDonald
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Meruvian - Introduction to MapR
Meruvian - Introduction to MapRMeruvian - Introduction to MapR
Meruvian - Introduction to MapRThe World Bank
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleIan Downard
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Technologies
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Carol McDonald
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataMathieu Dumoulin
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Carol McDonald
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Carol McDonald
 
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR Technologies
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
 
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Carol McDonald
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 

What's hot (20)

An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Meruvian - Introduction to MapR
Meruvian - Introduction to MapRMeruvian - Introduction to MapR
Meruvian - Introduction to MapR
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
 
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 

Similar to Geo-Distributed Big Data and Analytics

Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Why Stream? Advantages of Streaming Architecture #StrataData SJ 2017 presenta...
Why Stream? Advantages of Streaming Architecture #StrataData SJ 2017 presenta...Why Stream? Advantages of Streaming Architecture #StrataData SJ 2017 presenta...
Why Stream? Advantages of Streaming Architecture #StrataData SJ 2017 presenta...Ellen Friedman
 
Big Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data FabricBig Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data FabricMatt Stubbs
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Mathieu Dumoulin
 
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsBig Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsMatt Stubbs
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Carol McDonald
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksJustin Brandenburg
 
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Chris Fregly
 
Digital Transformation - #StrataData London 2017 - Data101
Digital Transformation - #StrataData London 2017 - Data101Digital Transformation - #StrataData London 2017 - Data101
Digital Transformation - #StrataData London 2017 - Data101Ellen Friedman
 
Container and Kubernetes without limits
Container and Kubernetes without limitsContainer and Kubernetes without limits
Container and Kubernetes without limitsAntje Barth
 
Map r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupMap r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupAlan Iovine
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkTugdual Grall
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Carol McDonald
 
MapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn GloballyMapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn Globallyridhav
 
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloudJeff Hung
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR Technologies
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataCarol McDonald
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBCarol McDonald
 

Similar to Geo-Distributed Big Data and Analytics (20)

Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Why Stream? Advantages of Streaming Architecture #StrataData SJ 2017 presenta...
Why Stream? Advantages of Streaming Architecture #StrataData SJ 2017 presenta...Why Stream? Advantages of Streaming Architecture #StrataData SJ 2017 presenta...
Why Stream? Advantages of Streaming Architecture #StrataData SJ 2017 presenta...
 
Big Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data FabricBig Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data Fabric
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsBig Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business Solutions
 
Streaming patterns revolutionary architectures
Geo-Distributed Big Data and Analytics

  • 1. © 2017 MapR Technologies 1 Geo-Distributed Big Data and Analytics: Data where you need it Computation where you want it
  • 2. Contact Information: Ted Dunning, PhD; Chief Application Architect, MapR Technologies; Board member, Apache Software Foundation; O’Reilly author. Email: tdunning@mapr.com, tdunning@apache.org. Twitter: @ted_dunning
  • 3. Contact Information: Ellen Friedman, PhD; Principal Technologist, MapR Technologies; Committer, Apache Drill & Apache Mahout projects; O’Reilly author. Email: efriedman@mapr.com, ellenf@apache.org. Twitter: @Ellen_Friedman
  • 4. Imagine a future where … you easily collect, access & analyze big data wherever you need it across the globe, in a seamless system under the same security & administration
  • 5. The future is here: Global Data Fabric • Global data fabric lets you update business globally • Local activities coordinate with global analyses • Do this without requiring huge teams at each site • Do this affordably, reliably, with low latency and at large scale. We’re here to explain how, but first: a real-world case study
  • 6. MapR customer: “A year in the bank”
  • 7. A Year of Technical Credit, not Debt • Streaming media delivery is inherently geo-distributed • Must measure if you want to manage – metrics need to be collected and processed locally – and distributed back to central facility • How? Streams!
  • 8. Collect Data [Diagram; data center labels: web server (x2), Web-server Log (x2), log-stash (x2), log consolidator, log_events]
  • 9. And Transport to Global Analytics [Diagram; data center labels as on slide 8, plus GHQ labels: log_events, events, Elaborate events (log-stash), Aggregate, Signal detection]
  • 10. With Many Sources [Diagram; same labels as slide 9]
  • 11. With Many Sources [Diagram; same GHQ labels, now with two data centers]
  • 12. With Many Sources [Diagram; same GHQ labels, now with three data centers]
  • 13. Analytics Doesn’t Care About Location [Diagram; GHQ labels: log_events, events, Elaborate events (log-stash), Aggregate, Signal detection]
  • 14. MapR Streams: Geo-distributed data plus separation of concerns
  • 15. Why stream?
  • 16. Mechanism for a Global Data Fabric: Streaming • Streaming data is becoming mainstream • Innovative technologies are emerging to handle and process streaming data • Stream-1st architecture is a powerful approach with surprisingly widespread advantages
  • 17. IoT Data: Sensors & Smart Parts (Images © E. Friedman, © Wes Abrams)
  • 18. Revolution in Manufacturing: Smart Tools • Respond quickly to new requirements • IoT enabled “smart tools” on manufacturing floor • Reconfigurable factory • Removes barriers to communication: engineering, design, analysis, manufacture in one space. Factory 2050: Boeing AMRC at University of Sheffield (Image credit Bond Bryan Architecture, used with permission)
  • 19. Streaming data has value beyond real-time insights
  • 20. Predictive Maintenance • Streaming sensor data + long term maintenance histories • Machine learning model detects anomalous pattern • “Failure signature” warns of need for maintenance before damage occurs (Image courtesy Mtell, used with permission, in Real World Hadoop by Dunning & Friedman © 2015)
  • 21. Stream-1st Architecture: People often start with event data streamed to a real-time application (A) [Diagram; labels: Patient, Medical tests, Medical test results, Real-time analytics, EMR, Facilities management, Insurance audit; paths A, B, C] (Image © 2016 Ted Dunning & Ellen Friedman, from Chap 1 of O’Reilly book Streaming Architecture, used with permission)
  • 22. Heart of Stream-1st Architecture: Message Transport. The right messaging tool supports other classes of use cases (B & C) [Diagram; same labels as slide 21] (Image © 2016 Ted Dunning & Ellen Friedman, from Chap 1 of O’Reilly book Streaming Architecture, used with permission)
  • 23. Key capabilities: Message Transport: Apache Kafka & MapR Streams • Highly scalable • High throughput, low latency • Multiple producers & consumers: decoupled • Durable messages • Geo-distributed replication preserves offsets (unique to MapR Streams) [Diagram; labels: Producer (x2), Messages, Consumer group (x3)] (Image © 2016 Ted Dunning & Ellen Friedman, from Chap 2 of O’Reilly book Streaming Architecture, used with permission)
  • 24. Stream transport supports micro services
  • 25. Stream-1st Architecture: Basis for Micro-Services. Stream instead of database as the shared “truth” [Diagram; labels: POS 1..n, Fraud detector, Last card use, Updater, Card analytics, Other card activity] (Image © 2016 Ted Dunning & Ellen Friedman, from Chap 6 of O’Reilly book Streaming Architecture, used with permission)
  • 26. MapR Streams: efficient bi-directional, multi-master replication
  • 27. With MapR, Geo-Distributed Data Appears Local [Diagram; labels: Data source, stream, Consumer]
  • 28. With MapR, Geo-Distributed Data Appears Local [Diagram; labels: Data source, stream (x2), Consumer]
  • 29. With MapR, Geo-Distributed Data Appears Local [Diagram; labels: Data source, stream (x2), Consumer, Global Data Center, Regional Data Center]
  • 30. Unique to MapR: Manage Topics at Stream Level • Many more topics on MapR cluster • Topics are grouped together in Stream (different from Kafka) • Policies set at the Stream level such as time-to-live, ACEs (controlled access at this level is different than Kafka) • Geo-distributed stream replication (different from Kafka) [Diagram; labels: Stream, Topic 1, Topic 2, Topic 3] (Image © 2016 Ted Dunning & Ellen Friedman, from Chap 5 of O’Reilly book Streaming Architecture, used with permission)
  • 31. Multi-Master Matters: MapR Table Replication. Better: Bi-directional table replication [Diagram; panels A and B, each with labels: SF DB, NY DB, Source (x2)]
  • 32. Multi-Master Replication Matters (From O’Reilly report “Data Where You Want It: Geo-Distribution of Big Data and Analytics” © 2017 by Ted Dunning & Ellen Friedman, used with permission)
  • 33. MapR: Files, tables, streams in one technology [Diagram; labels: Data Center; Legacy Applications; Big Data 1.0 Applications; Next-Gen Applications; Converged Data Platform; High Availability; Real Time; Unified Security; Multi-tenancy; Disaster Recovery; Global Namespace; Real-Time Database; Stream Transport; Web-Scale Storage]
  • 34. Example [Diagram; labels: Cluster, Volume mount point, Directories, Files, Table, Streams]
  • 35. Geo-distributed data with ease of management
  • 36. Remember this? Universal Pathname [Diagram; labels: Cluster, Volume mount point, Directories, Files, Table, Streams]
  • 37. Global Namespace: Advantage for Geo-Distribution • Enables program to refer to data anywhere – On premise, in cloud, around the world • Easier to manage: Supports separation of concerns • Unique to MapR
  • 38. The Need for Cloud • Fewer machines to purchase, flexibility regarding hardware • Eases pressure on IT • Cloud-bursting possible for short-term increase in computing • Lets you optimize cluster usage (especially if you maintain cloud neutrality)
  • 39. Cloud Neutrality for Optimization [Diagram; labels: Burst; Private; On-premise data center; Core; 4x cheaper for base load; 4x cheaper for peak loads]
  • 40. Hybrid Cloud On-Premise Architecture • Optimization of resources, but… • Too complicated unless you have a good platform for geo-distributed data • MapR is such a platform
  • 41. The Need for Containers • To deploy exactly the same thing in many data centers • To get predictable behavior in production compared with testing • Result: Docker-style containers becoming ubiquitous
  • 42. Key to Lightweight Containers: Persist State to MapR • Works for files, tables, streams • Run stateful applications in stateless containers [Diagram; labels: Data platform, Stateful Application (x2), Stateless Application, Container management system]
  • 43. Data where you want it, compute where you need it…
  • 44. ... including processing at the IoT edge
  • 45. MapR Edge • Small footprint cluster for remote processing • Intended to run right next to the data-producing device – Unified end-to-end security policy – Reliable replication to cloud and data center, even with occasional connections • Small but full MapR data services (files, tables, streams) – Normal data protection (snapshots, mirroring, replication) – Normal management capabilities (volumes, fine-grained access control, monitoring)
  • 46. Why MapR Edge is Useful • Designed to sit at IoT edge • Intended to run right next to data-producing device [Diagram; labels: Data source (x4), Report]
  • 47. Who needs MapR Edge? • Connected car industry • Telecommunications industry • Hospitals and medical testing facilities • Anyone who benefits from global learning but needs local action (Images © Ellen Friedman, © Wes Abrams)
  • 48. MapR Edge: Improves Time to Insight • Oil & gas: 48 hours before MapR Edge, < 2 hours after • Medical device: 12 hours before MapR Edge, < 15 minutes after • Car test & dev: 24 hours before MapR Edge, < 5 minutes after
  • 49. Use Case: Telecommunications [Diagram; labels: Callers, Towers, cdr data]
  • 50. Streaming in Telecom • Data collection & handling happens at different levels (tower, local data center, central data center) • Batch: Can take 30 minutes per level • Streaming: Latency drops to seconds or sub-seconds per level • Ability to respond as events occur • MapR Streams enables stream replication with offsets across data centers
  • 51. Telecom Reporting and Logging [Diagram; labels: Tower 1, Tower 2, Data source (x2), HQ, Aggregate]
  • 52. Use Case: Automotive IoT [Diagram; labels: Car, CAN Bus, μ, REST, https, REST GW, Data Center, Raw stream, Dispatcher, Data stream, μ, Metrics]
  • 53. Global Machine Learning Foundation [Diagram; GHQ labels: Global analytics, models, metrics; data center 1 and data center 2 labels: metrics, m1 m2 m3 m4, models]
  • 54. Learn globally, act locally
  • 55. Use Case: Container Shipping. Over 20% of world’s shipping containers pass through Singapore’s port. (Image © Ellen Friedman 2015, used with permission. From Chap 7 of “Streaming Architecture” book. Read free online: http://bit.ly/streams-ebook-ch7)
  • 56. IoT Data for Container Shipping • Tokyo: Sensors stream data to on-board cluster that reports to onshore cluster while in port • En route to Singapore: MapR Streams geo-replication sends data to next port before ship arrives • Problem in Sydney: Real-time insights alert to “high humidity” in some containers [Map; labels: Tokyo, Singapore, Sydney, Corporate HQ; points A, B, C] Details in Chapter 7 of “Streaming Architecture” book. Read free online here: http://bit.ly/streams-ebook-ch7 Figure used with permission.
  • 57. Additional Resources • O’Reilly report by Ted Dunning & Ellen Friedman © March 2017. Download free pdf courtesy of MapR: http://bit.ly/mapr-geo-distribution-ebook-pdf • O’Reilly book by Ted Dunning & Ellen Friedman © March 2016. Read free courtesy of MapR: https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/
  • 58. Please support women in tech – help build girls’ dreams of what they can accomplish. #womenintech #datawomen (© Ellen Friedman 2015)
  • 59. Q&A. Engage with us: @mapr, Maprtechnologies, tdunning@mapr.com, @ted_dunning, @Ellen_Friedman
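
The slides state that geo-distributed replication in MapR Streams preserves offsets across data centers (slides 23 and 50). The following is a minimal toy sketch of why that property matters to a consumer. It is not the MapR Streams or Kafka API: `ToyStream`, `replicate`, and the site names `regional` and `headquarters` are hypothetical names invented for illustration, and the stream is modeled as a plain in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """In-memory stand-in for one topic partition (hypothetical, not a MapR API)."""
    log: list = field(default_factory=list)

    def append(self, message: str) -> int:
        """Store a message and return its offset (its position in the log)."""
        self.log.append(message)
        return len(self.log) - 1

    def read_from(self, offset: int) -> list:
        """Return every message at or after the given offset."""
        return self.log[offset:]


def replicate(source: ToyStream, replica: ToyStream) -> None:
    """Copy the messages the replica lacks, keeping each at the same offset."""
    replica.log.extend(source.log[len(replica.log):])


# Assumed scenario: events are produced at a regional site and replicated to HQ.
regional = ToyStream()
headquarters = ToyStream()
for event in ["e0", "e1", "e2", "e3", "e4"]:
    regional.append(event)
replicate(regional, headquarters)

# A consumer has handled the first three events at the regional site,
# so the next offset it needs is 3.
committed_offset = 3

# Because offsets match on both sides, the same number is meaningful at HQ.
resumed = headquarters.read_from(committed_offset)
print(resumed)
```

In this toy model the consumer can move between sites carrying only one integer. If a replica assigned its own offsets, the consumer would instead need some mapping between the two numbering schemes before it could resume.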