Ford Motor Company's mission to become both an Automotive and a Mobility company has required an evolution in our analytics data flow, from traditional batch processing systems to dynamically routed, stream-processing-based systems. Valuable data is continually generated across the enterprise, from consumer WiFi in dealerships, from robots working on the assembly line, and from vehicle diagnostics, and it now flows into Ford's Real Time Streaming Architecture (RTSA). Our goal was to develop a provider-agnostic, end-to-end solution that ingests and dynamically routes individual streams of data in less than one second from edge node to Ford's on-premise data center, or vice versa. The architecture scales dynamically in the cloud to reliably handle thousands of outbound and inbound transactions per second, with data provenance capabilities to audit data flow from end to end.
3. Product Vision / Mission Statement
• Experiments (BDD 2.0)
  • No platform to do ‘Streaming’ experiments
  • How do we enable ‘Self-Service’ streaming?
• Utility Ingestion
  • The existing Storm solution would not scale operationally the way it had been implemented.
  • Today, applications develop their own one-off ingestion solutions to deal with proxy and firewall rules. How do we reduce the exposed surface area while handling multiple types of ingest?
4. SCA-V / BDD BUSINESS VALUE
BDD (Big Data Drive) drives value across the enterprise today and in the future
Pillar 1: Collection
Pillar 2: Configuration
Pillar 3: Edge Analytics
Enables
• Off cycle credit validation
• Intelligent Customer Interactions
• Vehicle performance insights
• Customer specific city solutions
• Fleet based telematics
• Warranty reduction across fleets
• Powertrain fuel efficiency improvement
• Automotive cybersecurity
• High-touch customer / dealer engagement
• Product feature validation
• Vehicle feature deployment
• Product development lifecycle reduction
• Vehicle diagnostic and prognostic enhancements
5. Multi-Platform Data and Analytics Ecosystem
[Diagram labels: SCA-V (Single Complete Actionable Vehicle); SCA-C (Single Complete Actionable Customer); other; Landing Zone; Discovery Zone; Data Supply Chain; Data and Analytics Ecosystem]
6. BDD 2.0 ACCOMPLISHMENTS: A THIN SLICE
• Development leverages the product team approach, which promotes cross-functional partnerships in FordLabs, PD, IT and GDI&A
• Developed the first edge computing platform, which emulates the fully networked vehicle 1 and 2 (FNV-1/FNV-2) and provides production-grade, web-based software to support this vehicle platform
• Created the first real-time streaming application in the enterprise
• Represents a significant shift toward data-driven decision making by leveraging rich, connected vehicle data. The solution includes Natural Language Search, Real Time Streaming, vehicle architecture agnosticism, software deployable anywhere (ePID2.0, TCU, Sync, ECG), and rapid vehicle data validation processes
• The platform can accommodate a diverse set of vehicles across the fleet
With BDD, we created a cloud agnostic, Ford owned and managed real time streaming solution
7. Real Time Streaming Analytics - Conceptual
Real Time streaming is an incremental capability over traditional batch processing to ingest, transform and score individual streams of real time data.
Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods.
[Diagram: Routing, Pub/Sub, Processing, Analyze, Store. Real-Time: the model is executed. Batch: model is trained, optimized and deployed; historical persistence.]
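The Lambda idea described on this slide can be illustrated with a minimal sketch. It is not the RTSA implementation: the class name `LambdaPipeline`, its fields, and the signal names are all hypothetical, and plain in-memory counters stand in for the batch store and the real-time layer. It only shows how a query can merge a periodically recomputed batch view with a continuously updated real-time view.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaPipeline:
    """Toy Lambda architecture (hypothetical names, in-memory only)."""
    master_log: list = field(default_factory=list)        # full history, input to the batch layer
    batch_view: Counter = field(default_factory=Counter)  # recomputed from the full history
    speed_view: Counter = field(default_factory=Counter)  # updated on every event

    def ingest(self, signal: str) -> None:
        # Each event is appended to the history and counted immediately
        # in the real-time view.
        self.master_log.append(signal)
        self.speed_view[signal] += 1

    def run_batch(self) -> None:
        # The batch layer recomputes its view from the complete history;
        # the real-time view then only has to cover newer events.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, signal: str) -> int:
        # Queries merge the batch view with the real-time view.
        return self.batch_view[signal] + self.speed_view[signal]


pipeline = LambdaPipeline()
for name in ["vehicle_speed", "engine_speed", "vehicle_speed"]:
    pipeline.ingest(name)
pipeline.run_batch()
pipeline.ingest("vehicle_speed")  # arrives after the batch run
```

After the batch run, the late event is visible only through the real-time view, yet `query` still returns the complete count because it reads both views.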
8. Real Time Streaming Analytics – Conceptual
RTSA – Analytics & Data Flow Life-Cycle
1) Real time streaming data ingested, routed, transformed
2) Data passed from speed layer to batch/storage layer
3) Analytical apps consuming/producing data in the real-time speed layer
4) Historical data analyzed, models developed and trained
5) Trained analytical models deployed to the real-time speed layer
[Diagram: the conceptual figure from slide 7, annotated with callouts 1 to 5; labels: Apps, Data, Analytics, Speed.]
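The life-cycle on this slide (persist streaming data, train a model on the history, deploy it to the speed layer, score new events) can be sketched as follows. This is a hedged illustration only: `ThresholdModel`, `train` and `SpeedLayer` are hypothetical names, and a simple mean and standard deviation threshold stands in for whatever analytical models RTSA actually serves.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ThresholdModel:
    """Deliberately simple stand-in model: flag values far from the mean."""
    mean: float
    stdev: float
    k: float = 3.0

    def is_anomaly(self, value: float) -> bool:
        return abs(value - self.mean) > self.k * self.stdev


def train(historical: List[float]) -> ThresholdModel:
    # Step 4: historical data is analyzed and a model is trained (batch side).
    return ThresholdModel(statistics.fmean(historical), statistics.pstdev(historical))


class SpeedLayer:
    def __init__(self) -> None:
        self.model: Optional[ThresholdModel] = None
        self.persisted: List[float] = []  # stand-in for the batch/storage layer

    def deploy(self, model: ThresholdModel) -> None:
        # Step 5: the trained model is deployed to the real-time layer.
        self.model = model

    def on_event(self, value: float) -> Optional[bool]:
        # Step 2: every event is also handed to storage.
        self.persisted.append(value)
        # Step 3: score in real time once a model has been deployed.
        if self.model is None:
            return None
        return self.model.is_anomaly(value)


speed = SpeedLayer()
for reading in [50.0, 52.0, 48.0, 51.0, 49.0]:
    speed.on_event(reading)           # no model yet, events are only persisted
speed.deploy(train(speed.persisted))  # train on history, then deploy
flagged = speed.on_event(90.0)        # scored against the deployed model
```

Keeping training and scoring behind separate functions mirrors the split on the slide: the model is built where the history lives and executed where the events arrive.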
10. Real Time Streaming Analytics – Physical (Live Demo)
Live Demo - Data Flow Narrative
1) Data from OpenXC ingested via Cloud Foundry WebSocket
2) Data routed from Cloud to Ford data center via NiFi
3) Specific data consumed by an analytical app
4) Data published to Kafka on prem
5) Data persisted in Hadoop on prem
[Diagram: Dynamic Stream Routing. Labels: EDGE/IoT, Vehicle, Intelligent Mobile Apps, Public Internet, Azure CLOUD, WebSocket, EventHub/IoTHub, NiFi, Apps XYZ, PMML, Firewall, Ford Network and Data Center, NiFi, Apps XYZ, PMML, HDFS, HBase; flows marked Push, Push and Pull*.]
*Native NiFi Site-2-Site HTTP Proxy Capability. Fixes Storm Endpoint Scaling Ops problem today.
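The dynamic routing in the demo narrative above (one ingested stream, several consumers chosen by content) can be approximated with a small content-based router. This is a sketch under stated assumptions, not NiFi's API: `StreamRouter`, the destination names, and the sample message are hypothetical, and in-memory lists stand in for Kafka, Hadoop and the analytical app. The message shape assumes an OpenXC-style JSON object with `name` and `value` fields.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Predicate = Callable[[Message], bool]


class StreamRouter:
    """Content-based router whose rules can be added while data is flowing."""

    def __init__(self) -> None:
        self.routes: List[Tuple[Predicate, str]] = []
        self.sinks: Dict[str, List[Message]] = defaultdict(list)

    def add_route(self, predicate: Predicate, destination: str) -> None:
        # Routes are plain data, so a new consumer can be attached at runtime.
        self.routes.append((predicate, destination))

    def route(self, raw: str) -> List[str]:
        # Parse one message and deliver it to every destination whose rule matches.
        message = json.loads(raw)
        delivered = []
        for predicate, destination in self.routes:
            if predicate(message):
                self.sinks[destination].append(message)
                delivered.append(destination)
        return delivered


router = StreamRouter()
router.add_route(lambda m: True, "hadoop")                              # persist everything
router.add_route(lambda m: m.get("name") == "vehicle_speed", "speed_app")  # one app, one signal
destinations = router.route('{"name": "vehicle_speed", "value": 42}')
```

A message that matches several rules is copied to each matching destination, which is how a single stream can feed both persistence and a specific analytical app.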
11. Summary of Key Concepts
RTSA is…
• Fully developed, managed, and deployed by Ford
• We own the data at every step
• Fully cloud and data center agnostic
• Push and pull capable
• No additional Ford Data Center exposure
• Horizontally scalable
With BDD (Big Data Drive), we created a cloud agnostic, Ford owned and managed real time streaming solution
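One way to read "push and pull capable" together with "no additional Ford Data Center exposure" is that the data center initiates the transfer and the cloud never connects inward. The sketch below models only that idea with in-memory objects. `CloudBuffer` and `DataCenterPuller` are hypothetical names, and this is not the NiFi Site-to-Site protocol itself.

```python
from collections import deque
from typing import Deque, Dict, List


class CloudBuffer:
    """Stand-in for a cloud-side queue that holds events until they are fetched."""

    def __init__(self) -> None:
        self._events: Deque[Dict] = deque()

    def push(self, event: Dict) -> None:
        # Edge devices push into the cloud.
        self._events.append(event)

    def fetch(self, max_events: int) -> List[Dict]:
        # Called by the data center; the cloud holds no reference to it.
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


class DataCenterPuller:
    """The data center opens the connection outward and pulls in batches."""

    def __init__(self, cloud: CloudBuffer) -> None:
        self.cloud = cloud
        self.received: List[Dict] = []

    def poll(self, max_events: int = 100) -> int:
        batch = self.cloud.fetch(max_events)
        self.received.extend(batch)
        return len(batch)


cloud = CloudBuffer()
for i in range(3):
    cloud.push({"seq": i})
puller = DataCenterPuller(cloud)
pulled = puller.poll(max_events=2)  # the first poll drains at most two events
```

Because only `DataCenterPuller` holds a reference to `CloudBuffer`, the direction of connection setup is always from the data center outward, which is the property the bullet list claims.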
12. Roadmap
Ingestion, Transformation, Processing, and Persistence of Streaming Data in Real-Time
• RTSA product to provide foundational enterprise services:
  – Data ingest
  – Data processing
  – Stream routing
    • Including cloud to on-premise
  – Analytics
  – Data persistence on-premise
Foundational services available in production environment Q1 for applications promoted from experiment status.
13. Other Opportunities
[Diagram: the Dynamic Stream Routing figure from slide 10, repeated.]
14. Other Opportunities
[Diagram: the Dynamic Stream Routing figure from slide 10, with an added REST label.]
NY FordHub Cisco Meraki WiFi
• Data started flowing 2/28 via RTSA
• Production infrastructure in Q1
15. Other Opportunities??
[Diagram: the Dynamic Stream Routing figure from slide 10, with an added REST label.]
16. Dynamic Stream Routing
[Diagram: the Dynamic Stream Routing figure from slide 10, extended with Third Party Data Sources and Third Party Data Consumers (as needed), and with ingest labels REST, WebSocket, REST and MQTT.]
Event and/or streaming data made available to authorized third party partners as needed
• DPF Regen
• Silver
• Security
• Plant Floor
• ControlTec
• LCV Telematics
• MiniFi
• Cisco Meraki
  - Dealer WiFi
  - Other Hubs
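Handling several ingest protocols behind one entry point, as the REST, WebSocket and MQTT labels on this slide suggest, usually means normalizing each input into a common record before routing. A minimal sketch follows, assuming JSON payloads throughout. The `Envelope` type, the adapter functions, and the path and topic conventions are all hypothetical, and no real REST, WebSocket or MQTT library is involved.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Envelope:
    """Common record handed to routing, whatever protocol delivered the data."""
    protocol: str
    stream: str
    payload: Dict[str, Any]


def from_rest(path: str, body: bytes) -> Envelope:
    # Assumed convention: the last path segment names the stream,
    # e.g. POST /ingest/dealer_wifi -> "dealer_wifi".
    stream = path.rstrip("/").rsplit("/", 1)[-1]
    return Envelope("rest", stream, json.loads(body))


def from_websocket(stream: str, text_frame: str) -> Envelope:
    # Assumed convention: the stream name is fixed when the socket is opened.
    return Envelope("websocket", stream, json.loads(text_frame))


def from_mqtt(topic: str, payload: bytes) -> Envelope:
    # Assumed convention: the topic hierarchy becomes a dotted stream name.
    return Envelope("mqtt", topic.replace("/", "."), json.loads(payload))


wifi = from_rest("/ingest/dealer_wifi/", b'{"clients": 12}')
vehicle = from_websocket("openxc", '{"name": "vehicle_speed", "value": 42}')
plant = from_mqtt("plant/floor/robot7", b'{"temp": 71.5}')
```

With every source reduced to the same `Envelope`, only the thin adapters are protocol-specific, which keeps the exposed surface area small while new ingest types are added.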
18. RTSA Product Organization
Andrea Siudara
Tom Bryans
Melissa Richards
Kevin Cooper
RTSA Product Owner
Tracy Hewiit
Dan Totten
Core RTSA Organization
3/11/2017
Laura Churchill
PM
T Young
J Niemiec
G Gwidz
D Hickey
Jill Johnson
PM
Raju Doma
Delivery Supervisor
C Petras
E Ulicny
D Godwin
GDIA
Information Technology
GDIA
Smart Mobility Analytics
1) Intro RTSA
Lambda
2) BDD was to validate and instantiate the RTSA
3) Demo - Live Drive
- Oldie but goodies
- Huey
4) Vision
5) Roadmap - production plans
- NY Hub
- BDD 2
Continued support for experiments
- Plant floor (FIS)
- Security
- Silver
- DPF regen
- Dealer WiFi (Meraki)
GDIA is building an enterprise single, complete and actionable data and analytics ecosystem, centered around SCA-C, focused on ingesting and curating data from Ford’s internal applications and warehouses and providing analytics-as-a-service opportunities. This important work can be accomplished with a shared vision and roadmap with IT. But as we enter the emerging world of connectivity-driven customer experience management and data-driven everything, the data and analytics ecosystem must expand to include other edge nodes, including the car. This integrated multi-platform data analytics ecosystem cannot be delivered by GDIA and IT alone. The partnership needs to be expanded to include PD.
Winners are the ones moving through build -> measure -> learn fastest. Competitors that were born into this way of working understand this.
Just another node. Not part of data analytics ecosystem.
Emerging requires shared understanding with PD.
Real-time analytics is a term used to refer to analytics that are able to be accessed as they come into a system. In general, the term analytics is used to define data patterns that provide meaning to a business or other entity, where analysts collect valuable information by sorting through and analyzing that data.
Vast amounts of data are flowing at high velocity over the wire today. Organizations that can process and act on this streaming data in real time can dramatically improve efficiencies and differentiate themselves in the market.
Some additional bullet points for the ‘what is real time streaming’
Real time data ingestion and analysis –
the speed of today’s processing systems has moved us from classical data-warehousing batch reporting to the realm of real-time processing and analytics.
Real-time means near-zero latency and access to information whenever it is required.
DPF Regen
Silver
Security
Plant Floor
Cisco Meraki
Dealer WiFi
New York Hub