Flink Forward San Francisco 2019: How John Deere uses Flink to process millions of sensor measurements per second - Greg Finch & Adam Butler

Processing millions of measurements per second
Flink Streaming at John Deere

© 2019, Deere & Co. All rights reserved.
About John Deere
• Agricultural Equipment
• Construction Equipment
• Turf Equipment
• Forestry Equipment

Our Purpose: Committed to Those Linked to the Land
We will help our customers – those who cultivate, harvest, transform, enrich, or
build upon the land – meet the world's dramatically increasing need for food,
fuel, and infrastructure. In so doing, we will support a higher quality of life
around the world.
Global population is increasing
Arable land is fixed
About John Deere

John Deere Intelligent Solutions Group

ExactEmerge™ Planter
15 sensor readings x 5 hertz x 32 row units = 2400 readings / sec
10 miles / hr
160k seeds / ac
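The per-planter rate on this slide is a straight product of three factors. A minimal sketch of that arithmetic follows; the constant and function names are illustrative and do not come from the talk.

```python
# Figures taken from the slide; the names are illustrative only.
SENSOR_READINGS_PER_ROW_UNIT = 15
SAMPLE_RATE_HZ = 5
ROW_UNITS = 32


def readings_per_second(readings: int, sample_rate_hz: int, row_units: int) -> int:
    """Readings produced by one planter each second."""
    return readings * sample_rate_hz * row_units


planter_rate = readings_per_second(
    SENSOR_READINGS_PER_ROW_UNIT, SAMPLE_RATE_HZ, ROW_UNITS
)
print(planter_rate)  # 2400
```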

A “typical” Field
• 48 Acres
• 1.5 Million Corn Plants
• 2 Billion Kernels
• Spatially divided into 100000 3’x3’ sections

World Wide Data Processing
• Each dot represents a machine capturing data
• 5738 active sessions
• 12 million measurements per second
• 720 million measurements in 60 seconds
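As a quick consistency check on these figures, the 60-second total follows from the per-second rate, and dividing by the session count gives a rough per-session average. The average is only a back-of-the-envelope figure derived here; the talk does not state it.

```python
# Figures taken from the slide.
ACTIVE_SESSIONS = 5738
MEASUREMENTS_PER_SECOND = 12_000_000

# Total over the 60-second window quoted on the slide.
measurements_in_60s = MEASUREMENTS_PER_SECOND * 60

# Rough average per active session (an illustration, not a stated figure).
avg_per_session = MEASUREMENTS_PER_SECOND / ACTIVE_SESSIONS

print(measurements_in_60s)     # 720000000
print(round(avg_per_session))  # 2091
```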

Use Cases – Precision Analysis
• Data is rasterized at the operation level for precision analysis and visualization
• Full resolution to 0.1493 m/cell, on a 256x256 cell raster
• Can perform real-time evaluation, combination and visualization of 1 to n measurements via a robust API
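To make the raster figures concrete, here is a minimal sketch of binning point measurements into a 256x256 grid at 0.1493 m per cell. It assumes planar coordinates in metres measured from one corner of the tile and combines values by averaging. Both choices are assumptions for illustration; the talk does not describe the actual projection or combination rule.

```python
CELL_SIZE_M = 0.1493  # from the slide
RASTER_DIM = 256      # from the slide


def cell_index(x_m: float, y_m: float):
    """Map an offset in metres from the tile corner to (row, col).

    Returns None when the point falls outside the tile.
    """
    col = int(x_m // CELL_SIZE_M)
    row = int(y_m // CELL_SIZE_M)
    if 0 <= row < RASTER_DIM and 0 <= col < RASTER_DIM:
        return row, col
    return None


def rasterize(points):
    """Average the values landing in each cell.

    points: iterable of (x_m, y_m, value). Averaging is an assumed rule.
    """
    sums, counts = {}, {}
    for x_m, y_m, value in points:
        cell = cell_index(x_m, y_m)
        if cell is None:
            continue  # point lies outside this tile
        sums[cell] = sums.get(cell, 0.0) + value
        counts[cell] = counts.get(cell, 0) + 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# One tile covers roughly 38.2 m on a side.
tile_extent_m = CELL_SIZE_M * RASTER_DIM
```

For example, `rasterize([(0.05, 0.05, 10.0), (0.10, 0.10, 20.0)])` places both points in cell (0, 0) and averages them.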

Use Cases – Large Scale Analysis
• 1 to n sessions can be aggregated to generate totals
• Arbitrary criteria can be used to filter results
• Spatially organized
• 2.5B stored layers
• Example - Average yield of corn in Polk County, Iowa, in 2018, grouped by average harvester speed
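The example query above can be sketched as a filter followed by a grouped aggregate. The session records, field names, and numbers below are entirely hypothetical, and an area-weighted average over 1 mph speed buckets is just one reasonable reading of "grouped by average harvester speed".

```python
from collections import defaultdict

# Hypothetical session summaries; none of these values come from the talk.
sessions = [
    {"crop": "corn", "county": "Polk", "state": "IA", "year": 2018,
     "avg_speed_mph": 4.2, "area_ac": 50.0, "yield_bu": 9000.0},
    {"crop": "corn", "county": "Polk", "state": "IA", "year": 2018,
     "avg_speed_mph": 4.8, "area_ac": 100.0, "yield_bu": 19500.0},
    {"crop": "corn", "county": "Polk", "state": "IA", "year": 2018,
     "avg_speed_mph": 5.5, "area_ac": 40.0, "yield_bu": 6800.0},
    {"crop": "soybeans", "county": "Polk", "state": "IA", "year": 2018,
     "avg_speed_mph": 4.1, "area_ac": 60.0, "yield_bu": 3300.0},
    {"crop": "corn", "county": "Polk", "state": "IA", "year": 2017,
     "avg_speed_mph": 4.4, "area_ac": 70.0, "yield_bu": 12000.0},
]


def average_yield_by_speed(sessions, predicate, bucket_width_mph=1.0):
    """Area-weighted average yield (bu/ac) per speed bucket."""
    totals = defaultdict(lambda: [0.0, 0.0])  # bucket -> [yield_bu, area_ac]
    for s in sessions:
        if not predicate(s):
            continue  # arbitrary filter criteria
        bucket = int(s["avg_speed_mph"] // bucket_width_mph) * bucket_width_mph
        totals[bucket][0] += s["yield_bu"]
        totals[bucket][1] += s["area_ac"]
    return {b: y / a for b, (y, a) in sorted(totals.items())}


def polk_corn_2018(s):
    return (s["crop"] == "corn" and s["county"] == "Polk"
            and s["state"] == "IA" and s["year"] == 2018)


result = average_yield_by_speed(sessions, polk_corn_2018)
print(result)  # {4.0: 190.0, 5.0: 170.0}
```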

Ingestion
Constant Stream
Micro-batches
Large Batch

Ingestion
Stream or Batch Processing?
• Zip up the stream and process it as a batch?
• Unzip the batch and process it as a stream?
• Some of both?

Streaming – The Lowest Common Denominator
Kinesis Data Stream
… but not always the best choice

Retaining Batch Cohesion
Kinesis Data Stream

Stateless Stream Processing
Decoder
Concerns:
• Can I keep up?
• Can I recover?

Keeping Up - Options
1. More Shards
2. Bigger Decoder Instances
3. Fan Out (diagram: one Consumer feeding three Decoders)
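The fan-out option works because decoding is stateless: any decoder can handle any record. A minimal sketch follows, using a thread pool as a stand-in for separate decoder instances. The record format (`session_id,value` as UTF-8 bytes) and the function names are hypothetical, not the talk's actual wire format or code.

```python
from concurrent.futures import ThreadPoolExecutor


def decode(record: bytes) -> dict:
    """Decode one record. The 'session_id,value' layout is hypothetical."""
    session_id, value = record.decode("utf-8").split(",")
    return {"session_id": session_id, "value": float(value)}


def fan_out(records, decoder_count=3):
    """A single consumer hands records to a pool of stateless decoders.

    pool.map returns results in input order even though the work itself
    may run concurrently on different workers.
    """
    with ThreadPoolExecutor(max_workers=decoder_count) as pool:
        return list(pool.map(decode, records))


decoded = fan_out([b"s1,1.5", b"s2,2.0", b"s1,3.25"])
```

In a real deployment the decoders would more likely be separate processes or instances behind a queue; the thread pool only illustrates the shape of the pattern.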

Stateful Stream Processing
512,107 seeds 4,804,347 seeds
More Concerns:
1. How do I group related data?
2. How do I handle late-arriving data?
3. How do I ensure exactly-once processing?
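The three concerns above can be illustrated with a small plain-Python sketch: keying by session groups related data, a watermark with an allowed-lateness margin decides what to do with late records, and de-duplicating on a record ID makes redelivery harmless. This is a simplified stand-in, not Flink's API. In particular, the watermark here is simply the highest event time seen, and de-duplication only gives idempotent updates rather than Flink's checkpoint-based exactly-once guarantee. All names are hypothetical.

```python
from collections import defaultdict


class SeedCounter:
    """Counts seeds per (session, time window) with late-data handling."""

    def __init__(self, window_s=60, allowed_lateness_s=30):
        self.window_s = window_s
        self.allowed_lateness_s = allowed_lateness_s
        self.counts = defaultdict(int)  # (session_id, window_start) -> seeds
        self.seen = set()               # record IDs already processed
        self.watermark = 0              # highest event time seen so far
        self.dropped_late = 0

    def process(self, record_id, session_id, event_time_s, seeds):
        if record_id in self.seen:
            return  # duplicate delivery: ignore so it is counted once
        self.seen.add(record_id)
        self.watermark = max(self.watermark, event_time_s)

        window_start = event_time_s - event_time_s % self.window_s
        window_end = window_start + self.window_s
        if window_end + self.allowed_lateness_s <= self.watermark:
            self.dropped_late += 1  # too late: window already finalized
            return
        self.counts[(session_id, window_start)] += seeds


counter = SeedCounter(window_s=60, allowed_lateness_s=30)
counter.process("r1", "A", 10, 100)
counter.process("r2", "B", 20, 50)
counter.process("r1", "A", 10, 100)  # duplicate, ignored
counter.process("r3", "A", 70, 200)
counter.process("r4", "A", 50, 25)   # late, but within allowed lateness
counter.process("r5", "A", 200, 10)
counter.process("r6", "A", 55, 999)  # far too late, dropped
```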

Apache Flink

Checkpoints, Savepoints, and Other Painpoints
Some problems we’ve had:
• Long checkpoint durations
• Very large checkpoints & savepoints
• S3 throttling
• Checkpoint timeout spiral

Checkpoints, Savepoints, and Other Painpoints
Some tips:
• Try to avoid backpressure
• Limit / reduce the amount of state we are keeping
• Very long checkpoint duration
• Removing checkpoints altogether
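One common way to limit the amount of state, and with it checkpoint size, is to expire entries that have not been updated for a while. Flink provides a state time-to-live feature for this purpose. The class below is a plain-Python stand-in showing the idea, not Flink's API, and the talk does not say which technique the team used.

```python
class ExpiringState:
    """Keyed state whose entries expire ttl_s seconds after the last update."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._entries = {}  # key -> (value, last_update_s)

    def put(self, key, value, now_s):
        self._entries[key] = (value, now_s)

    def get(self, key, now_s):
        entry = self._entries.get(key)
        if entry is None or now_s - entry[1] >= self.ttl_s:
            self._entries.pop(key, None)  # lazily drop expired entries
            return None
        return entry[0]

    def evict_expired(self, now_s):
        """Remove all expired entries and return how many were removed."""
        expired = [k for k, (_, t) in self._entries.items()
                   if now_s - t >= self.ttl_s]
        for k in expired:
            del self._entries[k]
        return len(expired)

    def __len__(self):
        return len(self._entries)


state = ExpiringState(ttl_s=60)
state.put("session-a", 1, now_s=0)
state.put("session-b", 2, now_s=50)
```

Calling `state.evict_expired(now_s=70)` at this point would remove `session-a` and keep `session-b`.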

Scaling and Spillway
• Flink/EMR does not autoscale
• Our data is very spiky.
• Irregular bursts of data
• Inconsistent record size

Scaling and Spillway
Solution - Spillway
• If backpressure is detected, start piping records to a new stream
• Monitor stream, if record count goes up, spin up a new cluster
• When record count goes down, tear down cluster
• Can cascade if needed
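The Spillway steps above can be sketched as a small state machine. This is a simplified illustration: backpressure is modelled as a bounded in-memory queue filling up, and the "cluster" is just a boolean flag. The class name, thresholds, and detection rule are assumptions, since the talk does not describe how backpressure is detected or how clusters are launched.

```python
from collections import deque


class Spillway:
    """Routes overflow to a spill stream and scales a spill cluster."""

    def __init__(self, primary_capacity, scale_up_at, scale_down_at):
        self.primary_capacity = primary_capacity
        self.scale_up_at = scale_up_at
        self.scale_down_at = scale_down_at
        self.primary = deque()
        self.spill = deque()
        self.spill_cluster_running = False

    def backpressured(self):
        # Assumed detection rule: the primary queue is full.
        return len(self.primary) >= self.primary_capacity

    def ingest(self, record):
        target = self.spill if self.backpressured() else self.primary
        target.append(record)

    def monitor(self):
        """Spin the spill cluster up or down based on spill stream depth."""
        depth = len(self.spill)
        if not self.spill_cluster_running and depth >= self.scale_up_at:
            self.spill_cluster_running = True   # stand-in for cluster launch
        elif self.spill_cluster_running and depth <= self.scale_down_at:
            self.spill_cluster_running = False  # stand-in for teardown
        return self.spill_cluster_running

    def drain_spill(self, max_records):
        """The spill cluster consumes up to max_records from the spill stream."""
        if not self.spill_cluster_running:
            return []
        count = min(max_records, len(self.spill))
        return [self.spill.popleft() for _ in range(count)]


spillway = Spillway(primary_capacity=2, scale_up_at=3, scale_down_at=0)
for i in range(6):
    spillway.ingest(i)
```

Cascading would mean giving the spill cluster its own spillway in turn; that is left out here to keep the sketch short.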

Validation at Scale
• 26.8 Trillion Measurements (so far)
• Even at 6 Sigma that is 92 Million failures
• How to tackle this:
• Logging - Elasticsearch/Kibana with careful grooming of what to log
• Audits - Periodic jobs that evaluate statistical success
• Monitoring - CloudWatch Dashboards and Alarms
• Investigator - Internally developed Spark-based tool that does analysis on failures at scale
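The failure estimate on this slide can be reproduced approximately. The sketch below assumes the conventional Six Sigma figure of 3.4 defects per million opportunities, which the slide does not state explicitly. With that assumption the result is about 91 million, the same order as the 92 million quoted; the small difference presumably comes from rounding in the inputs.

```python
# 26.8 trillion measurements, from the slide.
TOTAL_MEASUREMENTS = 26_800_000_000_000

# Conventional Six Sigma defect rate: 3.4 per million (an assumption here).
# Kept as 34 per 10 million so the arithmetic stays in integers.
DEFECTS_PER_TEN_MILLION = 34

expected_failures = TOTAL_MEASUREMENTS * DEFECTS_PER_TEN_MILLION // 10_000_000
print(expected_failures)  # 91120000
```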

John Deere Careers
http://jobs.deere.com
Now hiring:
• ML / AI
• Vision and Perception
• Data Science
• Telematics
• Robotics
• Mobile Software
• Embedded Software
• Software Engineering
• Architecture
