Making Sense of Streaming Sensor Data: How Uber Detects On-Trip Car Crashes
Nikolas Anderson, Safety
Jin Yang, Safety
October 9, 2019
Who We Are
Nikolas Anderson
− Software Engineer, Safety
Jin Yang
− Software Engineer, Safety
RideCheck Announcement
[Figure labels: Rider, Driver, Proactive Response Team]
Agenda
■ A First Model
■ Choosing and Integrating Flink
■ 1st Iteration: A Modular, Light Topology
■ 2nd Iteration: On a Reusable Sensor Platform
■ 3rd Iteration: On-Trip Detection
What Does a Crash Look Like?
Trip Context Features
● Distance from dropoff to destination
● Rider/Driver cancellation vs normal trip completion
● Overall length of trip (time, distance)
● Location context (highway, movie theatre, airport, etc.)
Sensor Event Features
● GPS: Long Stop
● Accelerometer: Spikes (Large Force, >3g)
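A minimal sketch of the ">3g" spike feature, assuming accelerometer samples arrive as x/y/z values in m/s². The class name, method name, and input layout are hypothetical and are not taken from the talk; only the 3g threshold comes from the slide.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: flags accelerometer samples whose magnitude exceeds a g-force threshold. */
final class SpikeDetector {
    static final double STANDARD_GRAVITY = 9.80665; // m/s^2
    static final double THRESHOLD_G = 3.0;          // the ">3g" figure from the slide

    /** Returns the indices of samples (x, y, z in m/s^2) whose magnitude exceeds the threshold. */
    static List<Integer> spikeIndices(double[][] samplesMps2) {
        List<Integer> spikes = new ArrayList<>();
        for (int i = 0; i < samplesMps2.length; i++) {
            double[] s = samplesMps2[i];
            double magnitudeG = Math.sqrt(s[0] * s[0] + s[1] * s[1] + s[2] * s[2]) / STANDARD_GRAVITY;
            if (magnitudeG > THRESHOLD_G) {
                spikes.add(i);
            }
        }
        return spikes;
    }
}
```

A production detector would likely also filter noise and account for device orientation; this sketch only shows the thresholding idea.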
Building a Model
● Uber has very high-accuracy labels
● Extremely unbalanced dataset
● Model trained using Apache Spark
● Can host the model for streaming using a platform called Michelangelo
The Apache Spark logo is either a registered trademark or a trademark of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of this mark.
Why Flink?
● Uber migrated from Samza to Flink
● Rich API: keyBy, join, window, etc.
● Supports batch processing
● Exactly-once guarantee
The Apache Spark and Apache Flink logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Uber Infrastructure: Schema’d Kafka
● Many Kafka topics at Uber have enforced schemas
● Centralized schema registry that stores Avro schemas
● Wrote a custom, config-driven SourceFunction/SinkFunction that loads into and out of generated Java classes
[Figure: configuration showing topic-name and feature-name]
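// Look up the config-driven source for this topic and register it with the job.
// AvroTypeInfo gives Flink the type information for the generated Avro class T.
DataStream<T> inputStream = env.addSource(
    (SourceFunction<T>) getInputs().get(topicName),
    new AvroTypeInfo<>(tClass)
);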
Uber Infrastructure: M3 Metrics
● M3 is Uber's in-house, open source metrics system, roughly compatible with Prometheus
● Implemented a custom MetricReporter, lightly adapted from Flink’s PrometheusReporter
● A Prometheus scraper then ingests into the Uber metrics system
● Use M3 with our internal alerting and monitoring
[Figure: example query in the M3 Query Language]
Sensor Data at Uber
[Figure labels: 5 minutes, 3 minutes, 6 minutes; Hive; Cassandra]
● GPS: points sent up one at a time; 0.5 Hz, latitude/longitude/speed, ~3 TB/day
● Accelerometer: ~5-minute batched payloads; 25 Hz, 3 dimensions, ~10 TB/day
● Uber operates tens of millions of trips daily
● Sensor data is MBs per trip
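To make the sampling rates above concrete, the following sketch works out how many samples a 5-minute window holds at each rate. The class and method names are hypothetical. The rates are from the slide, and the counts are plain arithmetic rather than measured figures.

```java
/** Hypothetical back-of-the-envelope sizing from the sampling rates on the slide. */
final class PayloadSizing {
    /** Number of samples produced at rateHz over durationSeconds. */
    static long samples(double rateHz, long durationSeconds) {
        return Math.round(rateHz * durationSeconds);
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60;                        // 300 seconds
        long accelPerAxis = samples(25.0, fiveMinutes);   // 7,500 samples per axis
        long accelValues = accelPerAxis * 3;              // 22,500 values across 3 dimensions
        long gpsPoints = samples(0.5, fiveMinutes);       // 150 GPS points
        System.out.println("accel values per 5-minute payload: " + accelValues);
        System.out.println("GPS points per 5 minutes: " + gpsPoints);
    }
}
```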
Joining TBs of Sensor Streams
● Managing state is difficult; state is sensitive to failures
● Trade-offs between state size and data coverage
● Focus on reducing stream joins
● GPS: points sent up one or two at a time; 0.5 Hz, latitude/longitude/speed, ~3 TB/day
● Accel: ~5-minute payloads; 25 Hz, 3 dimensions, ~10 TB/day
Condensing Prior to Trip Joins/Aggregations
● Accelerometer Payloads (10 TB) → Detect Spikes (60 GB) → Aggregate Spikes by Trip
● Trip GPS (3 TB) → Detect Stops (1 GB)
● Join Stops and Spikes by Trip
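The "Detect Stops" step condenses a trip's GPS points into a handful of stop events. A minimal sketch of that idea follows, assuming a stop is a run of consecutive points at or below a speed threshold that lasts at least a minimum duration. The class names, thresholds, and the definition of a stop are illustrative assumptions, not Uber's actual logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: condenses time-ordered GPS points into long-stop events. */
final class StopDetector {
    /** One GPS fix: timestamp in seconds and speed in m/s. */
    static final class GpsPoint {
        final long timeSeconds;
        final double speedMps;
        GpsPoint(long timeSeconds, double speedMps) {
            this.timeSeconds = timeSeconds;
            this.speedMps = speedMps;
        }
    }

    /** A detected stop, bounded by its first and last slow point. */
    static final class Stop {
        final long startSeconds;
        final long endSeconds;
        Stop(long startSeconds, long endSeconds) {
            this.startSeconds = startSeconds;
            this.endSeconds = endSeconds;
        }
    }

    /** Emits one Stop per run of slow points lasting at least minDurationSeconds. */
    static List<Stop> detectStops(List<GpsPoint> points, double maxSpeedMps, long minDurationSeconds) {
        List<Stop> stops = new ArrayList<>();
        boolean inRun = false;
        long runStart = 0;
        long runEnd = 0;
        for (GpsPoint p : points) {
            if (p.speedMps <= maxSpeedMps) {
                if (!inRun) {
                    inRun = true;
                    runStart = p.timeSeconds;
                }
                runEnd = p.timeSeconds;
            } else {
                if (inRun && runEnd - runStart >= minDurationSeconds) {
                    stops.add(new Stop(runStart, runEnd));
                }
                inRun = false;
            }
        }
        // Close a run that is still open at the end of the input.
        if (inRun && runEnd - runStart >= minDurationSeconds) {
            stops.add(new Stop(runStart, runEnd));
        }
        return stops;
    }
}
```

Emitting only stop events instead of raw points is what makes the later per-trip join cheap: the downstream state holds a few small records per trip rather than the full GPS trace.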
A Modular Post-Trip Crash Detection Topology
[Figure: job topology; labels in approximate flow order]
● Accelerometer Payloads → Detect Spikes → Aggregate Spikes by Trip
● Trip End Event Kafka Topic → Fetch GPS Route (Location Service) → Detect Stops
● Stops and Spikes → Fetch Trip Context (Trip Service) → Scored by Model (Michelangelo: Machine Learning Platform) → To RideCheck Service
Why so many jobs?
● Resource isolation
● "Paper trail"/debuggability
● Reuse intermediate features
● Facilitates cross-team collaboration
The Power of Flink: Joining by Trip ID
● Use a SessionWindow
● First ensure that both streams are deduplicated by trip ID
● The configured “gap” roughly acts as an expiry time
● The power of windows in Flink:
○ The window is triggered the moment that both sides have arrived, immediately freeing state
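The join semantics described here can be sketched in plain Java. This is not Flink API code and not the authors' implementation. It only simulates the behaviour the slide describes: hold whichever side arrives first, emit and free state as soon as the other side arrives, and expire entries that stay incomplete longer than the gap. All names are hypothetical, and each side is assumed to be deduplicated by trip ID already.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Plain-Java simulation of a join by trip ID with early firing and gap-based expiry. */
final class TripJoiner<L, R> {
    private static final class Pending<A, B> {
        A left;
        B right;
        long lastSeenMs;
    }

    private final Map<String, Pending<L, R>> state = new HashMap<>();
    private final long gapMs;

    TripJoiner(long gapMs) {
        this.gapMs = gapMs;
    }

    /** Records the left side; returns the joined pair if the right side is already present. */
    Optional<Map.Entry<L, R>> onLeft(String tripId, L left, long nowMs) {
        Pending<L, R> p = touch(tripId, nowMs);
        p.left = left;
        return emitIfComplete(tripId, p);
    }

    /** Records the right side; returns the joined pair if the left side is already present. */
    Optional<Map.Entry<L, R>> onRight(String tripId, R right, long nowMs) {
        Pending<L, R> p = touch(tripId, nowMs);
        p.right = right;
        return emitIfComplete(tripId, p);
    }

    /** Drops entries idle for longer than the gap; returns how many were dropped. */
    int expire(long nowMs) {
        int before = state.size();
        state.values().removeIf(p -> nowMs - p.lastSeenMs > gapMs);
        return before - state.size();
    }

    int pendingCount() {
        return state.size();
    }

    private Pending<L, R> touch(String tripId, long nowMs) {
        Pending<L, R> p = state.computeIfAbsent(tripId, k -> new Pending<L, R>());
        p.lastSeenMs = nowMs;
        return p;
    }

    private Optional<Map.Entry<L, R>> emitIfComplete(String tripId, Pending<L, R> p) {
        if (p.left == null || p.right == null) {
            return Optional.empty();
        }
        state.remove(tripId); // both sides arrived: free state immediately
        Map.Entry<L, R> pair = new AbstractMap.SimpleImmutableEntry<>(p.left, p.right);
        return Optional.of(pair);
    }
}
```

In Flink itself, the equivalent behaviour would presumably come from a session window keyed by trip ID with a trigger that fires once both sides are present; the exact trigger used is not shown in the slides.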
Platformizing: When Use Cases Diverge
● Example: stop detection differs between RideCheck's Crash and Trip Anomaly use cases
● Different products have different criteria for data latency, data quality, precision, and the definition of contextual features
● The same feature ends up engineered differently for different applications
Platformizing: Demand for Sensor Data
● The effort of joining large data streams to give data context is not unique to crash detection
● Other use cases: fraud detection, ETA calculation
● Examples of aggregations: per trip, rider/driver match, geolocation (segments of streets, region), time
Adding Sensor Embeddings to the Model
● Use deep learning to learn features from raw sensor data
○ GPS
○ Accelerometer
○ Gyroscope
● Produce a 100-dimension embedding
● Add the output as features for the existing model
● TensorFlow sub-model runs within Flink
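Adding the embedding "as features for the existing model" amounts to appending it to the existing feature vector. A minimal sketch, assuming both are plain double arrays; the class and method names are hypothetical, and only the 100-dimension size comes from the slide.

```java
import java.util.Arrays;

/** Hypothetical sketch: appends a sensor embedding to the existing model's feature vector. */
final class FeatureAssembler {
    static final int EMBEDDING_DIM = 100; // embedding size stated on the slide

    /** Returns baseFeatures followed by the embedding, leaving both inputs unmodified. */
    static double[] withEmbedding(double[] baseFeatures, double[] embedding) {
        if (embedding.length != EMBEDDING_DIM) {
            throw new IllegalArgumentException("expected " + EMBEDDING_DIM + " dimensions, got " + embedding.length);
        }
        double[] out = Arrays.copyOf(baseFeatures, baseFeatures.length + embedding.length);
        System.arraycopy(embedding, 0, out, baseFeatures.length, embedding.length);
        return out;
    }
}
```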
Sensor Trip Aggregation
● A few things make this more feasible:
○ More demand for clean, convenient sensor data from other teams within Uber
○ Reliable GPS now included in batched sensor payloads
● Time for a Sensor Platform that does the aggregation once for everybody
● Unlocks:
○ Full-trip raw data analysis
○ Easy use of trip context data
○ Data quality guarantees
Consolidation
[Figure: consolidated topology; labels in approximate flow order]
● Inputs: Accel/Gyro/GPS Payloads; Trip Event Kafka Topics
● Trip Aggregation: emits Per-Trip Sensor Data and Trip Events
● Consolidated Crash Detection: Extract Stops, Spikes, Embeddings; Fetch Trip Context (Trip Service); Scored by Model (Michelangelo: Hosted ML Models)
● Output: To RideCheck
Why move to a single job now?
● Platform has simplified things
● Much more stable now; less need to isolate
● Rapid iteration has slowed; less need for debugging
On-Trip Crash Detection: A Hybrid Solution
● The hardest part is forgoing some valuable trip context
● Model performance is inevitably lower due to
○ Giving up Post-Trip features
○ Considering only a sliding window of data
● Meant to be run in tandem with the Post-Trip pipeline
On-Trip Crash Detection
[Figure: on-trip topology; labels in approximate flow order]
● 1-Minute Payloads and Trip Events → Trip Aggregation → On-Trip Crash Detection → To RideCheck
● Retain at most 5 minutes of data
● Still emit original Per-Trip Sensor Data
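The "retain at most 5 minutes" rule can be sketched as a rolling buffer over 1-minute payloads. This assumes payloads arrive in time order and that the window is measured back from the end of the newest payload. The class names and the eviction rule are illustrative assumptions, not the authors' implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: keeps only the most recent window of sensor payloads for one trip. */
final class RollingSensorBuffer {
    /** A batched sensor payload covering [startMs, endMs). */
    static final class Payload {
        final long startMs;
        final long endMs;
        Payload(long startMs, long endMs) {
            this.startMs = startMs;
            this.endMs = endMs;
        }
    }

    private final Deque<Payload> payloads = new ArrayDeque<>();
    private final long retentionMs;

    RollingSensorBuffer(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    /** Adds a payload, evicts payloads that ended before the window, and returns the current window. */
    List<Payload> add(Payload p) {
        payloads.addLast(p);
        long cutoff = p.endMs - retentionMs;
        while (!payloads.isEmpty() && payloads.peekFirst().endMs <= cutoff) {
            payloads.removeFirst();
        }
        return new ArrayList<>(payloads);
    }
}
```

With a 5-minute retention and 1-minute payloads, the buffer never holds more than five payloads, which bounds per-trip state regardless of trip length.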
The Future
● Drive down the delay further
○ There would be enormous value in being able to respond in seconds
● On-device heuristics/model
○ Trigger early upload of batched sensor data
○ Backend still does heavy lifting
Thank you!
nikolas@uber.com
jiny@uber.com
Proprietary and confidential © 2019 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any
form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without
permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains
information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified
that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate,
or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent
necessary for consultations with authorized personnel of Uber.


Making Sense of Streaming Sensor Data: How Uber Detects on Trip Car Crashes - Nikolas Anderson & Jin Yang, Uber

  • 1. Making Sense of Streaming Sensor Data: How Uber Detects On-Trip Car Crashes Nikolas Anderson, Safety Jin Yang, Safety October 9, 2019
  • 2. Who We Are Nikolas Anderson − Software Engineer, Safety Jin Yang − Software Engineer, Safety
  • 5. ■ A First Model ■ Choosing and Integrating Flink ■ 1st Iteration: A Modular, Light Topology ■ 2nd Iteration: On a Reusable Sensor Platform ■ 3rd Iteration: On-Trip Detection Agenda
  • 6. ■ A First Model ■ Choosing and Integrating Flink ■ 1st Iteration: A Modular, Light Topology ■ 2nd Iteration: On a Reusable Sensor Platform ■ 3rd Iteration: On-Trip Detection Agenda
  • 7. What Does a Crash Look Like?
  • 8. ● Distance from dropoff to destination ● Rider/Driver cancellation vs normal trip completion ● Overall length of trip (time, distance) ● Location context (highway, movie theatre, airport, etc.) Trip Context Features
  • 9. Large Force Long Stop >3g Sensor Event Features GPS Long Stop Accelerometer Spikes
  • 10. Building a Model ● Uber has very high-accuracy labels ● Extremely unbalanced dataset ● Model trained using Apache Spark ● Can host model for streaming using platform called Michelangelo The Apache Spark logo is either a registered trademark or a trademark of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of this mark.
  • 11. ■ A First Model ■ Choosing and Integrating Flink ■ 1st Iteration: A Modular, Light Topology ■ 2nd Iteration: On a Reusable Sensor Platform ■ 3rd Iteration: On-Trip Detection Agenda
  • 12. Why Flink? ● Uber migrated from Samza to Flink ● Rich API: keyBy, join, window, etc. ● Supports batch processing ● Exactly once guarantee The Apache Spark and Apache Flink logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
  • 13. Uber Infrastructure: Schema’d Kafka ● Many Kafka topics at Uber have enforced schemas ● Centralized schema registry that stores Avro schemas ● Wrote a custom, config-driven SourceFunction/SinkFunction that loads into and out of generated Java classes topic-name feature-name DataStream<T> inputStream = env.addSource( (SourceFunction<T>) getInputs().get(topicName), new AvroTypeInfo<>(tClass) );
  • 14. Uber Infrastructure: M3 Metrics ● In-house, open source metrics system, M3, roughly compatible with Prometheus ● Implemented custom MetricReporter, lightly adapted from Flink’s PrometheusReporter ● A Prometheus scraper then ingests into Uber metrics system ● Utilize M3 with our internal alerting and monitoring M3 Query Language:
  • 15. ■ A First Model ■ Choosing and Integrating Flink ■ 1st Iteration: A Modular, Light Topology ■ 2nd Iteration: On a Reusable Sensor Platform ■ 3rd Iteration: On-Trip Detection Agenda
  • 16. Sensor Data at Uber 5 minutes 3 minutes 6 minutes GPS: Points sent up one at a time; 0.5 Hz, latitude/longitude/speed, ~3 TB/day Accelerometer: ~5-minute batched payloads; 25 Hz, 3 dimensions, ~10 TB/day Hive Cassandra ● Uber operates tens of millions of trips daily ● Sensor data is MBs per trip
  • 17. Joining TBs of Sensor Streams ● Managing state is difficult; state is sensitive to failures ● Trade-offs between state size and data coverage ● Focus on reducing stream joins ? GPS Accel Points sent up one or two at a time; 0.5 Hz, latitude/longitude/speed, ~3 TB/day ~5-minute payloads; 25 Hz, 3 dimensions, ~10 TB/day
  • 18. Condensing Prior to Trip Joins/Aggregations ● Accelerometer Payloads (10 TB) → Detect Spikes ● Trip GPS (3 TB) → Detect Stops ● Condensed output sizes shown on the diagram: 60 GB and 1 GB
  • 19. Condensing Prior to Trip Joins/Aggregations ● Accelerometer Payloads (10 TB) → Detect Spikes → Aggregate Spikes by Trip ● Trip GPS (3 TB) → Detect Stops ● Join Stops and Spikes by Trip ● Condensed output sizes shown on the diagram: 60 GB and 1 GB
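The idea of condensing can be sketched as reducing raw samples to a handful of events before any join. The example below is a minimal illustration, not Uber's detectors: it assumes a spike is a sample whose acceleration magnitude exceeds a threshold (an earlier slide labels a large force as >3g) and a stop is a run of low-speed GPS points lasting at least a minimum duration. All names, units, and thresholds are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CondenseSketch {
    /** Indices of samples whose magnitude, in g, exceeds thresholdG. */
    static List<Integer> detectSpikes(double[][] xyzInG, double thresholdG) {
        List<Integer> spikes = new ArrayList<>();
        for (int i = 0; i < xyzInG.length; i++) {
            double x = xyzInG[i][0], y = xyzInG[i][1], z = xyzInG[i][2];
            if (Math.sqrt(x * x + y * y + z * z) > thresholdG) {
                spikes.add(i);
            }
        }
        return spikes;
    }

    /** Stops as {startIndex, endIndex} runs of points at or below maxSpeedMps spanning at least minSeconds. */
    static List<int[]> detectStops(double[] speedMps, double[] timeSec,
                                   double maxSpeedMps, double minSeconds) {
        List<int[]> stops = new ArrayList<>();
        int start = -1;
        // Iterate one step past the end so a run that reaches the last point is flushed.
        for (int i = 0; i <= speedMps.length; i++) {
            boolean slow = i < speedMps.length && speedMps[i] <= maxSpeedMps;
            if (slow && start < 0) {
                start = i;
            }
            if (!slow && start >= 0) {
                if (timeSec[i - 1] - timeSec[start] >= minSeconds) {
                    stops.add(new int[] {start, i - 1});
                }
                start = -1;
            }
        }
        return stops;
    }
}
```

Only the event indices leave each detector, so the downstream per-trip join handles a small fraction of the original volume.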
  • 20. A Modular Post-Trip Crash Detection Topology ● Diagram labels: Accelerometer Payloads → Detect Spikes → Aggregate Spikes by Trip; Trip End Event Kafka Topic; Fetch GPS Route (Location Service); Detect Stops; Fetch Trip Context (Trip Service); Stops and Spikes Scored by Model (Michelangelo: Machine Learning Platform); To RideCheck Service ● Why so many jobs? ○ Resource isolation ○ "Paper trail"/debuggability ○ Reuse intermediate features ○ Facilitates cross-team collaboration
  • 21. The Power of Flink: Joining by Trip ID ● Use SessionWindow ● We first ensure that both streams are deduplicated by trip ID ● Configured “gap” roughly acts as an expiry time ● See power of windows in Flink: ○ Triggered the moment that both sides have arrived, immediately freeing state
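The join behaviour described on this slide can be illustrated without Flink. The sketch below is a plain-Java emulation, not the Flink API and not Uber's job: it keeps one pending entry per trip ID, emits the joined record the moment both sides are present and frees the state, and drops half-joined entries after a configurable gap. It assumes each stream carries at most one record per trip ID (already deduplicated, as the slide states). The class name, the string payloads, and the caller-supplied clock are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class TripJoinSketch {
    private static final class Pending {
        String stops;    // null until the stops record arrives
        String spikes;   // null until the spikes record arrives
        long lastSeenMs; // time of the most recent record for this trip
    }

    private final Map<String, Pending> state = new HashMap<>();
    private final long gapMs;

    TripJoinSketch(long gapMs) {
        this.gapMs = gapMs;
    }

    /** Returns the joined record if this event completes the pair, otherwise null. */
    String onStops(String tripId, String stops, long nowMs) {
        return add(tripId, stops, null, nowMs);
    }

    /** Returns the joined record if this event completes the pair, otherwise null. */
    String onSpikes(String tripId, String spikes, long nowMs) {
        return add(tripId, null, spikes, nowMs);
    }

    private String add(String tripId, String stops, String spikes, long nowMs) {
        expire(nowMs);
        Pending p = state.get(tripId);
        if (p == null) {
            p = new Pending();
            state.put(tripId, p);
        }
        if (stops != null) p.stops = stops;
        if (spikes != null) p.spikes = spikes;
        p.lastSeenMs = nowMs;
        if (p.stops != null && p.spikes != null) {
            state.remove(tripId); // both sides arrived: emit and free state immediately
            return tripId + ":" + p.stops + "+" + p.spikes;
        }
        return null;
    }

    /** Drops half-joined trips that have been idle for at least gapMs. */
    private void expire(long nowMs) {
        Iterator<Map.Entry<String, Pending>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue().lastSeenMs >= gapMs) {
                it.remove();
            }
        }
    }

    int pendingTrips() {
        return state.size();
    }
}
```

The gap bounds how long an unmatched side can occupy state, while the early emit keeps the common case (both sides arriving close together) from holding state for the full gap.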
  • 22. Agenda ■ A First Model ■ Choosing and Integrating Flink ■ 1st Iteration: A Modular, Light Topology ■ 2nd Iteration: On a Reusable Sensor Platform ■ 3rd Iteration: On-Trip Detection
  • 23. Platformizing: When Use Cases Diverge ● Example: stop detection differs between RideCheck's Crash and Trip Anomaly use cases ● Different products have different criteria for data latency, data quality, and precision, and different definitions of contextual features ● The same feature is engineered differently for different applications
  • 24. Platformizing: Demand for Sensor Data ● The effort of joining large data streams to give data context is not unique to one team ● Other use cases: fraud detection, ETA calculation ● Examples of aggregations: per trip, rider/driver match, geolocation (segments of streets, region), time
  • 25. Adding Sensor Embeddings to the Model ● Use deep learning to learn features from raw sensor data ○ GPS ○ Accelerometer ○ Gyroscope ● Produce 100-dimension embedding ● Add output as features for existing model ● TensorFlow sub-model runs within Flink
  • 26. Sensor Trip Aggregation ● A few things make this more feasible: ○ More demand for clean, convenient sensor data from other teams within Uber ○ Reliable GPS now included in batched sensor payloads ● Time for a Sensor Platform that does the aggregation once for everybody ● Unlocks: ○ Full-trip raw data analysis ○ Easy use of trip context data ○ Data quality guarantees
  • 27. Consolidated Crash Detection ● Diagram labels: Accel/Gyro/GPS Payloads; Trip Event Kafka Topics; Trip Events; Trip Aggregation; Per-Trip Sensor Data; Consolidation; Extract Stops, Spikes, Embeddings; Fetch Trip Context (Trip Service); Scored by Model (Michelangelo: Hosted ML Models); To RideCheck ● Why move to a single job now? ○ Platform has simplified things ○ Much more stable now; less need to isolate ○ Rapid iteration has slowed; less need for debugging
  • 28. Agenda ■ A First Model ■ Choosing and Integrating Flink ■ 1st Iteration: A Modular, Light Topology ■ 2nd Iteration: On a Reusable Sensor Platform ■ 3rd Iteration: On-Trip Detection
  • 29. On-Trip Crash Detection: A Hybrid Solution ● Hardest part is forgoing some valuable trip context ● Model performance is inevitably lower due to ○ Giving up Post-Trip features ○ Considering only a sliding window of data ● Meant to be run in tandem with Post-Trip pipeline
  • 30. On-Trip Crash Detection ● Diagram labels: 1-Minute Payloads; Trip Events; Trip Aggregation; On-Trip Crash Detection; To RideCheck ● Diagram notes: Retain at most 5 minutes of data; Still emit original Per-Trip Sensor Data
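The "retain at most 5 minutes of data" note can be sketched as a rolling buffer that evicts the oldest payloads as new ones arrive. This is a minimal illustration under stated assumptions, not the platform's implementation: payloads are represented only by their start and end times, they arrive in time order, and one buffer instance is kept per trip. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RollingTripBuffer {
    private final Deque<long[]> payloads = new ArrayDeque<>(); // each entry is {startMs, endMs}
    private final long retentionMs;

    RollingTripBuffer(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    /** Adds a payload covering [startMs, endMs) and evicts payloads that fall outside the retention window. */
    void add(long startMs, long endMs) {
        payloads.addLast(new long[] {startMs, endMs});
        long cutoffMs = endMs - retentionMs;
        // A payload is dropped once it ends at or before the cutoff.
        while (!payloads.isEmpty() && payloads.peekFirst()[1] <= cutoffMs) {
            payloads.removeFirst();
        }
    }

    int size() {
        return payloads.size();
    }

    /** Start time of the oldest retained payload, or -1 if the buffer is empty. */
    long oldestStartMs() {
        return payloads.isEmpty() ? -1L : payloads.peekFirst()[0];
    }
}
```

With 1-minute payloads and a 5-minute retention, the buffer settles at five payloads per trip, so state grows with the number of active trips rather than with trip length.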
  • 31. The Future ● Drive down the delay further ○ There would be enormous value in being able to respond in seconds ● On-device heuristics/model ○ Trigger early upload of batched sensor data ○ Backend still does heavy lifting
  • 33. Proprietary and confidential © 2019 Uber Technologies, Inc. All rights reserved.