2. About me
Eyal Ben Ivri
Big Data & Cloud Architect, Sela Group
Focus On Hadoop Eco-System & Big-Data +
NoSQL Solutions
3. Modern Data – The Big Picture
IoT
User Data
Media Files
Documents
Machine Data
Log Files
4.
5. The Light Rail problem – TLV Railway
Imagine the new light rail maintenance company
IoT – Internet of Trains (and cameras, cash registers, carts, rails and more…)
Analyze data in stream and in batch
Dashboards
Alerts
The perfect problem
6. What We Need
An integrated data solution that will be:
Able to process events from external sources
Able to walk data through different pipelines
Fast and responsive
Big-Data Ready
7. In Other Words
Consume: BI, Dashboards, Applications
Process: ETL, Aggregations, Computation, Analysis, Querying
Persist: Hadoop, SQL, NoSQL
Ingest: IoT, Structured Data, Un-Structured Data
8. Microsoft Azure Services for IoT and BigData
Devices
Device Connectivity: Event Hubs, Service Bus, External Data Sources
Storage: SQL Database, Table/Blob Storage, DocumentDB, Data Lake Store, External Data Sources
Analytics: Machine Learning, Stream Analytics, HDInsight, Data Factory, Data Lake Analytics
Presentation & Action: App Service, Power BI, Notification Hubs, Mobile Services, BizTalk Services
10. Event Hub
Messages at scale
Why not throw it into a queue, and have a
listener at the backend?
Scaling limits, because of the architecture of queues
and topics of a standard Service Bus
Event Hub uses a partition model
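The partition idea can be sketched in a few lines of Python. This is an illustration only, not how Event Hubs is implemented internally: the hash choice and the names `partition_for` and `route` are assumptions made for the example. Each event carries a key, the key is hashed to one of N partitions, and every partition is an independent, ordered log that its own consumer can read in parallel.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events, partition_count):
    """Append each (key, payload) event to the log of its partition."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, partition_count)].append((key, payload))
    return partitions


# Hypothetical events from five trains, routed into four partitions.
events = [(f"train-{i % 5}", {"seq": i}) for i in range(20)]
partitions = route(events, partition_count=4)
```

Events with the same key always land in the same partition, so per-train ordering is kept while different partitions are consumed independently.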
11. Getting Started
Easy to set up
Two Configurations
Partition Count – depends on the number of consumers (2-32)
Message Retention (days) – between 1 and 7 days
Secured using SAS Policies
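A rough sketch of how a client might build a SAS token from a policy name and key, using only the Python standard library. It follows the token layout (`sr`, `sig`, `se`, `skn`) as I understand it from the Service Bus and Event Hubs documentation, so it should be verified against the current docs before use. The namespace, hub name, policy name, and key below are made up for the example.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def make_sas_token(resource_uri, policy_name, policy_key, ttl_seconds=3600, now=None):
    """Build a SharedAccessSignature token string for the given resource."""
    issued_at = time.time() if now is None else now
    expiry = str(int(issued_at + ttl_seconds))
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # The signed string is the URL-encoded resource URI, a newline, and the expiry.
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    digest = hmac.new(policy_key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode("ascii"), safe="")
    return (
        f"SharedAccessSignature sr={encoded_uri}"
        f"&sig={signature}&se={expiry}&skn={policy_name}"
    )


# Hypothetical values; `now` is fixed only to make the example repeatable.
token = make_sas_token(
    "https://example.servicebus.windows.net/trains",
    policy_name="send-policy",
    policy_key="not-a-real-key",
    ttl_seconds=60,
    now=1000,
)
```

The token would typically be sent in the `Authorization` header of a request to the hub; a policy with send-only rights keeps devices from reading other devices' events.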
13. IoT & Data Processing Patterns
Devices
RTOS, Linux, Windows, Android, iOS
Field Gateway
Protocol Adaptation
Device Connectivity & Management
Cloud Gateway
Event Hubs & IoT Hub
Analytics & Operationalized Insights
Hot Path Analytics
Azure Stream Analytics, Azure HDInsight Storm
Hot Path Business Logic
Service Fabric & Actor Framework
Batch Analytics & Visualizations
Azure HDInsight, AzureML, Power BI, Azure Data Factory
14. TLV Railway
Can now ingest millions of messages each second
These messages carry data from:
Devices
End-Machines
Servers
Next, we need to use this data to create real-time alerts when something goes wrong
15. Azure Stream Analytics
Automatic recovery
Monitoring and alerting
Scale on demand
Managed Cloud Service
Each unit handles 1MB/s
Can scale up to 1GB/s
SQL-like language
Temporal windowing semantics
support for reference data
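The throughput figures on this slide allow a quick capacity estimate. The sketch below is back-of-the-envelope arithmetic only. The event rate and message size are invented inputs, and 1 GB/s is taken as 1000 MB/s.

```python
import math

UNIT_MB_PER_S = 1     # per-unit throughput quoted on the slide
MAX_MB_PER_S = 1000   # 1 GB/s ceiling quoted on the slide, taken as 1000 MB/s


def units_needed(events_per_second, bytes_per_event):
    """Estimate how many units a stream needs at 1 MB/s per unit."""
    mb_per_s = events_per_second * bytes_per_event / 1_000_000
    if mb_per_s > MAX_MB_PER_S:
        raise ValueError(f"{mb_per_s:.0f} MB/s exceeds the quoted 1 GB/s ceiling")
    return max(1, math.ceil(mb_per_s / UNIT_MB_PER_S))


# Hypothetical load: 50,000 events per second at 200 bytes each is 10 MB/s.
estimate = units_needed(50_000, 200)
```

Real sizing also depends on query complexity and partitioning, which this estimate ignores.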
16. Stream Analytics – Main Concepts
Inputs
Can be stream or reference data (metadata)
Stream Data sources can be Event Hub, Blob Storage
(using blobs with timestamps) or IoT Hub (preview)
Serialization types support CSV, JSON, and Avro
Query
A SQL query that selects from the input(s) and
writes results to the output(s)
Output
Can be Blob, SQL, Event Hub (notification), Power BI
(preview), Table storage, Service Bus or DocumentDB
17. Tumbling Windows
How many trains entered each station every 5 minutes?
(assuming each entry event carries a StationId)
SELECT StationId, COUNT(*) AS TrainCount FROM EntryStream
GROUP BY StationId, TumblingWindow(minute, 5)
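To see what a tumbling window computes, here is a small Python sketch of the same per-station count, run outside Stream Analytics. The event shape `(timestamp_seconds, station)` and the function name are assumptions for the example. The sketch also assigns an event that falls exactly on a boundary to the window that starts there, which may differ from the service's own boundary rule.

```python
from collections import Counter


def tumbling_counts(events, window_seconds):
    """Count events per (window_start, station) in fixed, non-overlapping windows."""
    counts = Counter()
    for ts, station in events:
        window_start = ts - (ts % window_seconds)  # floor to the window boundary
        counts[(window_start, station)] += 1
    return dict(counts)


# Hypothetical entry events: (seconds since start, station).
events = [(10, "A"), (200, "A"), (299, "B"), (300, "A"), (601, "B")]
result = tumbling_counts(events, window_seconds=300)
```

Every event is counted in exactly one window, which is what distinguishes tumbling windows from overlapping ones.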
18. Temporal Windows
Tumbling Window
A series of fixed-sized, non-overlapping and
contiguous time intervals
Hopping Window
Scheduled overlapping windows
Sliding Window
Outputs events only for those points in time when
the content of the window actually changes
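The difference between tumbling and hopping windows can be shown by asking which windows a single event belongs to. This is a minimal sketch under stated assumptions: windows are `[start, start + size)`, they start at multiples of the hop from time 0, and the function name is invented for the example.

```python
def hopping_window_starts(ts, size, hop):
    """Return the start times of every window [start, start + size) containing ts."""
    first_index = max(0, (ts - size) // hop + 1)  # earliest window still covering ts
    last_index = ts // hop                        # latest window that has started
    return [k * hop for k in range(first_index, last_index + 1)]


# A 10-second window hopping every 5 seconds: each event falls in two windows.
overlapping = hopping_window_starts(12, size=10, hop=5)

# When the hop equals the size, the windows tumble: one window per event.
tumbling = hopping_window_starts(12, size=5, hop=5)
```

A sliding window has no fixed schedule at all; it is re-evaluated only when an event enters or leaves the window, so it is not covered by this sketch.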
19. TLV Railway
Can now respond in near-real-time to events as
they happen
Track and maintain malfunctioning equipment
Receive real time data regarding customers
entering and leaving stations
Data can now be processed, so we need a place
to save it, preferably at scale.
20. DocumentDB and Azure Data Services
Fully managed, scalable, queryable, schema-free JSON
document database service for modern applications
transactional processing
rich query
managed as a service
elastic scale
internet accessible http/rest
schema-free data model
arbitrary data formats
21. DocumentDB features
JSON Documents
SQL support
LINQ Support
REST API Support
JS Support (triggers, UDFs, stored procedures)
Automatic Index
Multiple Document Transactions
Tunable Consistency
22. DocumentDB Key Concept
Collection
A collection of Documents
Not a table (different entities can go into the same
collection)
Collections = Partitions
Not just logical containers, but physical ones
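The "not a table" point can be illustrated with plain Python dictionaries standing in for JSON documents. This does not use the DocumentDB SDK, and the document fields and the `query` helper are invented for the example. Documents of different entity types sit in one collection and are told apart by their own properties rather than by a fixed schema.

```python
# One collection holding two kinds of entities, with no shared schema.
collection = [
    {"id": "1", "type": "sensorReading", "trainId": "T-17", "speedKmh": 42},
    {"id": "2", "type": "sensorReading", "trainId": "T-04", "speedKmh": 0},
    {"id": "3", "type": "station", "name": "Station A", "platforms": 2},
]


def query(docs, **equals):
    """Return documents whose properties match every given key/value pair."""
    return [d for d in docs if all(d.get(k) == v for k, v in equals.items())]


readings = query(collection, type="sensorReading")
stopped_trains = [d["trainId"] for d in readings if d["speedKmh"] == 0]
```

A document that lacks a queried property simply does not match, which is how a schema-free store tolerates mixed entities in one container.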
24. TLV Railway
Can now store its data in a highly scalable store
Great for interactive querying of any data
Messages from sensors
Reference Data
But this data (and other data) needs to move to
other places (SQL, Batch processing, ML). How?
25. What is Azure Data Factory?
Azure Data Factory is a managed service to produce
trusted information from data stored in the cloud
and on-premises. Easily create, orchestrate and
schedule highly available, fault-tolerant workflows
to move and transform your data at scale.
26. Evolving Approaches to Analytics
ETL approach: Extract Original Data → ETL Tool (SSIS, etc.) Transform → Load Transformed Data → EDW (SQL Server, Teradata, etc.) → BI Tools
Scale-out approach: Ingest Original Data (including Streaming data) → Scale-out Storage & Compute (HDFS, Blob Storage, etc.) → Transform & Load → Data Marts, Data Lake(s) → Dashboards, Apps
27. Data Factory – Main concepts
Data Store
A data source/sink component
(SQL (Azure or on-premises), Storage, DocumentDB and
more)
Data Set
A defined data set that is contained inside a data store
One data store can have many data sets
Compute
A service for computation
HDInsight, Azure Batch, Data Lake Analytics, Azure ML
28. Data Factory – Main concepts
Pipeline
Set of instructions
“Take data from data set A and move to compute,
then store results in data set B”
Slices
Everything is time sliced
A data set (source) declares the time intervals at
which its data is sliced, and the pipeline is
activated when a new slice is ready
JSON
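Time slicing can be sketched in plain Python. This only illustrates the idea and is not Data Factory's scheduler or its JSON definition format. The function names, the dates, and the "copy" activity are assumptions for the example. A data set is cut into fixed intervals, and the pipeline's activity runs once per slice that is complete.

```python
from datetime import datetime, timedelta


def ready_slices(start, now, interval):
    """Return (slice_start, slice_end) pairs for every slice fully elapsed by `now`."""
    slices = []
    slice_start = start
    while slice_start + interval <= now:
        slices.append((slice_start, slice_start + interval))
        slice_start += interval
    return slices


def run_pipeline(slices, activity):
    """Run the activity once per ready slice and record its result by slice start."""
    return {s: activity(s, e) for s, e in slices}


# Hypothetical hourly data set observed at 03:30: three complete slices.
start = datetime(2016, 1, 1, 0, 0)
now = datetime(2016, 1, 1, 3, 30)
hourly = ready_slices(start, now, timedelta(hours=1))
results = run_pipeline(hourly, lambda s, e: f"copied {s:%H:%M}-{e:%H:%M}")
```

The slice that started at 03:00 is not processed yet because it is still filling; it becomes ready at 04:00.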
29.
32. TLV Railway
Can now integrate different services and
different data sources
Move data with ease and as little hassle as
possible
What about aggregations, deeper dive into
data, for more complex analysis?
33.
34. HDInsight
Hadoop-as-a-Service
Based on the Hortonworks distribution
Few flavors:
Hadoop (Windows + Linux)
Storm (Windows + Linux)
HBase (Windows + Linux)
Spark (Windows + Linux)
37. TLV Railway - Summary
Can now perform advanced analytics on top of
large amounts of data, in a variety of formats
(not just structured, boring data)
Can integrate all the loose ends of data coming
in with data generated in “Old-School” data
platforms like SQL, collected from Line-of-Business
applications
We’ve covered data ingestion, responding in
real-time, querying, storing and processing
Azure Stack
38. Hadoop and OSS vs. Azure IoT and BigData Ecosystem
Azure Ecosystem | OSS
Event Hubs | Kafka
Stream Analytics | Storm
HDInsight | Hadoop
Map Reduce | Map Reduce
Hive | Hive
Spark | Spark
HBase | HBase
Azure ML | Mahout
Data Factory | Pig
DocumentDB | MongoDB / Couchbase
Key goal of slide:
IoT as you know is a hot area these days and there are a number of players that claim to be active in this space…. And they tend to focus on specific elements you see in this diagram.
Microsoft has the most comprehensive portfolio of cloud services that customers need to develop and deploy end-to-end IoT solutions.
Customers are adopting these services and are successfully deploying their solutions today (reference Rockwell, ThyssenKrupp)
Talk track [Short Version for Sam’s Leadership Session]:
As we think about Azure IoT services, Microsoft has the most comprehensive portfolio of cloud services that customers need to develop and deploy end-to-end IoT solutions,
ranging from devices that produce data, to connecting them to cloud storage, to driving analytics that yield valuable business insights and allow enterprises to take action.
Talk track [Long Version Chris’ Breakout Session]:
As we think about Azure IoT services, there are a collection of capabilities involved.
First there are Producers. These can be basic sensors, small form factor devices, traditional computer systems, or even complex assets made up of a number of data sources.
Next we have the Connect Devices capabilities at the ingress level within and around Azure. The primary destination is Service Bus & Event Hubs, but this relies on client agent technology either at the edge device level or within a field or cloud gateway. We also have capabilities for other external data sources to provide data.
As data is ingressed to Azure, there are various Storage options, and a number of destinations can be engaged. Traditional database technology, table or blob storage, or even more complex destinations like DocumentDB are possible. External or third-party technologies can also be used. This is where the flexibility and agility of a platform shows its strength, and where analysts like Gartner are forming opinions about just how robust our platform can be.
As this data is processed in Azure, there are a number of capabilities that can be utilized. Machine Learning, HDInsight and Stream Analytics are examples of tools that can analyze the data in various ways.
Finally the concept of Take Actions uses Azure services. Data may populate a LOB portal, be pushed to apps, or presented in analytics and productivity tools. These are all ways that the data gets out of these architecture points to allow organizations to use analysis to change / transform their business.
Through all of these areas, there is the possibility of utilizing existing investments either within your Azure environment, or elsewhere.