Flink Forward San Francisco 2019: How John Deere uses Flink to process millions of sensor measurements per second - Greg Finch & Adam Butler

The John Deere data platform receives and processes millions of sensor measurements per second from machines around the world. In this talk, we'll discuss the importance of stream data processing and how we are using it to improve machine automation and to help our customers improve the efficiency of their farm operations. We'll review the evolution of our streaming data platform, discuss lessons learned along the way, and share why we chose Flink to solve some of our most difficult problems. Finally, we'll walk through one of our Flink applications in detail and share techniques that we use to process data at massive scale.

Published in: Technology

  1. Processing millions of measurements per second: Flink Streaming at John Deere
  2. About John Deere (© 2019, Deere & Co. All rights reserved.)
     Agricultural Equipment, Construction Equipment, Turf Equipment, Forestry Equipment
  3. About John Deere
     Our Purpose: Committed to Those Linked to the Land
     We will help our customers – those who cultivate, harvest, transform, enrich, or build upon the land – meet the world's dramatically increasing need for food, fuel, and infrastructure. In so doing, we will support a higher quality of life around the world.
     Global population is increasing
     Arable land is fixed
  4. John Deere Intelligent Solutions Group
  5. ExactEmerge™ Planter
     15 sensor readings x 5 hertz x 32 row units = 2400 readings / sec
     10 miles / hr
     160k seeds / ac
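
The planter arithmetic on this slide can be checked in a few lines. This is only a sketch: the function and parameter names are illustrative, and it assumes the 15 sensor readings are per row unit, which the slide implies but does not state.

```python
# Figures come from the slide; the names below are illustrative assumptions.
def readings_per_second(sensors_per_row: int, sample_hz: int, row_units: int) -> int:
    """Sensor readings one planter produces each second."""
    return sensors_per_row * sample_hz * row_units


planter_rate = readings_per_second(sensors_per_row=15, sample_hz=5, row_units=32)
print(planter_rate)  # 2400
```
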
  6. A “typical” Field
     48 Acres
     1.5 Million Corn Plants
     2 Billion Kernels
     Spatially divided into 100000 3’x3’ sections
  7. World Wide Data Processing
     • Each dot represents a machine capturing data
     • 5738 active sessions
     • 12 million measurements per second
     • 720 million measurements in 60 seconds
  8. Use Cases – Precision Analysis
     • Data is rasterized at the operation level for precision analysis and visualization
     • Full resolution to 0.1493 m/cell, on a 256x256 cell raster
     • Can perform real-time evaluation, combination and visualization of 1 to n measurements via a robust API
  9. Use Cases – Large Scale Analysis
     • 1 to n sessions can be aggregated to generate totals
     • Arbitrary criteria can be used to filter results
     • Spatially organized
     • 2.5B stored layers
     • Example - Average yield of corn in Polk County, Iowa, in 2018, grouped by average harvester speed
  10. Ingestion
      Constant Stream
      Micro-batches
      Large Batch
  11. Ingestion: Stream or Batch Processing?
      • Zip up the stream and process it as a batch?
      • Unzip the batch and process it as a stream?
      • Some of both?
  12. Streaming – The Lowest Common Denominator
      Kinesis Data Stream
      … but not always the best choice
  13. Retaining Batch Cohesion
      Kinesis Data Stream
  14. Stateless Stream Processing
      Decoder
      Concerns:
      • Can I keep up?
      • Can I recover?
  15. Keeping Up - Options
      1. More Shards
      2. Bigger Decoder Instances
      3. Fan Out
      Diagram labels: Consumer, Decoder, Decoder, Decoder
  16. Stateful Stream Processing
      512,107 seeds
      4,804,347 seeds
      More Concerns:
      1. How do I group related data?
      2. How do I handle late arriving data?
      3. How do I ensure exactly once processing?
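
The first two concerns on this slide (grouping related data and handling late arrivals) can be illustrated with a toy event-time window in plain Python. This is not Flink code and not the speakers' implementation: the class, its parameters, and the simplified watermark (the highest event time seen so far) are all assumptions made for illustration. Exactly-once processing is not modelled here.

```python
from collections import defaultdict


class KeyedWindowSum:
    """Toy per-key tumbling-window sum over event time (hypothetical, not Flink)."""

    def __init__(self, window_s: int, allowed_lateness_s: int):
        self.window_s = window_s
        self.allowed_lateness_s = allowed_lateness_s
        self.watermark = float("-inf")  # simplified: highest event time seen
        self.open = defaultdict(int)    # (key, window_start) -> running sum
        self.closed = {}                # finalized (key, window_start) -> sum
        self.dropped = 0                # records that arrived too late

    def add(self, key: str, event_time: int, value: int) -> None:
        start = event_time - event_time % self.window_s
        deadline = start + self.window_s + self.allowed_lateness_s
        if deadline <= self.watermark:
            self.dropped += 1           # its window has already been finalized
            return
        self.open[(key, start)] += value  # concern 1: group by key and window
        self.watermark = max(self.watermark, event_time)
        self._close_expired()

    def _close_expired(self) -> None:
        # Concern 2: a window stays open until the watermark passes its
        # end plus the allowed lateness, so slightly late records still count.
        for key, start in list(self.open):
            if start + self.window_s + self.allowed_lateness_s <= self.watermark:
                self.closed[(key, start)] = self.open.pop((key, start))


counter = KeyedWindowSum(window_s=10, allowed_lateness_s=5)
counter.add("planter-1", 1, 100)   # window [0, 10)
counter.add("planter-1", 12, 50)   # window [10, 20)
counter.add("planter-1", 8, 25)    # late, but within the allowed lateness
counter.add("planter-2", 30, 7)    # advances the watermark, closing older windows
counter.add("planter-1", 9, 1)     # too late: its window is already closed
```

After these calls, `counter.closed` holds the two finalized `planter-1` windows, the `planter-2` window is still open, and one record has been dropped.
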
  17. Apache Flink
  18. Checkpoints, Savepoints, and Other Painpoints
      Some problems we’ve had:
      • Long checkpoint durations
      • Very large checkpoints & savepoints
      • S3 throttling
      • Checkpoint timeout spiral
  19. Checkpoints, Savepoints, and Other Painpoints
      Some tips:
      • Try to avoid backpressure
      • Limit / reduce the amount of state we are keeping
      • Very long checkpoint duration
      • Removing checkpoints altogether
  20. Scaling and Spillway
      • Flink/EMR does not autoscale
      • Our data is very spiky
      • Irregular bursts of data
      • Inconsistent record size
  21. Scaling and Spillway: Solution - Spillway
      • If backpressure is detected, start piping records to a new stream
      • Monitor stream, if record count goes up, spin up a new cluster
      • When record count goes down, tear down cluster
      • Can cascade if needed
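
The spillway bullets above can be sketched as a small in-memory model. Everything here is hypothetical: the class, the thresholds, and the use of Python lists in place of real streams and clusters are assumptions for illustration, and the slide does not say how backpressure is detected or which record-count signal drives scaling.

```python
from dataclasses import dataclass, field


@dataclass
class Spillway:
    """Toy model of the spillway idea; names and thresholds are hypothetical."""

    scale_up_at: int      # spill backlog that triggers an extra cluster
    scale_down_at: int    # spill backlog at which the extra cluster is removed
    primary: list = field(default_factory=list)
    spill: list = field(default_factory=list)
    extra_cluster_running: bool = False

    def route(self, record, backpressured: bool) -> None:
        """Pipe the record to the spill stream only while backpressure is seen."""
        (self.spill if backpressured else self.primary).append(record)

    def monitor(self) -> None:
        """Spin the extra cluster up or down based on the spill backlog."""
        backlog = len(self.spill)
        if not self.extra_cluster_running and backlog >= self.scale_up_at:
            self.extra_cluster_running = True
        elif self.extra_cluster_running and backlog <= self.scale_down_at:
            self.extra_cluster_running = False

    def drain(self, n: int) -> list:
        """Stand-in for the extra cluster consuming n spilled records."""
        taken, self.spill = self.spill[:n], self.spill[n:]
        return taken


spillway = Spillway(scale_up_at=3, scale_down_at=0)
for i in range(5):
    spillway.route(i, backpressured=(i >= 1))  # backpressure starts at record 1
spillway.monitor()
running_during_burst = spillway.extra_cluster_running
drained = spillway.drain(4)
spillway.monitor()
running_after_burst = spillway.extra_cluster_running
```

Cascading, as the last bullet mentions, would amount to giving the extra cluster a spillway of its own.
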
  22. Validation at Scale
      • 26.8 Trillion Measurements (so far)
      • Even at 6 Sigma that is 92 Million failures
      • How to tackle this:
        • Logging - Elasticsearch/Kibana with careful grooming of what to log
        • Audits - Periodic jobs that evaluate statistical success
        • Monitoring - CloudWatch Dashboards and Alarms
        • Investigator - Internally developed Spark-based tool that does analysis on failures at scale
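
The failure estimate on this slide can be approximated as follows. This sketch assumes the conventional Six Sigma figure of 3.4 defects per million opportunities, which the slide does not state. Under that assumption the result is about 91 million, the same order as the 92 million quoted.

```python
# 26.8 trillion comes from the slide; 3.4 defects per million is an assumption.
MEASUREMENTS = 26_800_000_000_000
DEFECTS_PER_TEN_MILLION = 34  # 3.4 per million, kept as an integer ratio

expected_failures = MEASUREMENTS * DEFECTS_PER_TEN_MILLION // 10_000_000
print(f"{expected_failures:,}")  # 91,120,000
```
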
  23. John Deere Careers
      http://jobs.deere.com
      Now hiring:
      • ML / AI
      • Vision and Perception
      • Data Science
      • Telematics
      • Robotics
      • Mobile Software
      • Embedded Software
      • Software Engineering
      • Architecture
