
Flink Forward San Francisco 2019: Realtime Store Visit Predictions at Scale - Luca Giovagnoli


This talk aims to inspire attendees with a multidisciplinary Flink application, where different fields have come together with a graceful synergy. You will hear about geospatial clustering algorithms, a gradient boosting ML model, and cutting-edge stream-processing technology - all in the same talk! And, if you are wondering, you can incorporate all this into your SOA using Async I/O!
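
As a rough illustration of the idea behind Async I/O (many requests to an external service in flight at once, bounded by a capacity, with results returned in input order), here is a minimal sketch in plain Python `asyncio`. It is not Flink's API. `score_visit`, `enrich_stream`, the simulated latency and the score formula are all hypothetical stand-ins.

```python
import asyncio


async def score_visit(visit_id: int) -> dict:
    """Hypothetical stand-in for a call to a remote model service."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"visit_id": visit_id, "score": round(1.0 / (1 + visit_id), 3)}


async def enrich_stream(visit_ids, capacity: int = 4):
    """Score visits concurrently with at most `capacity` requests in flight."""
    semaphore = asyncio.Semaphore(capacity)

    async def bounded(visit_id):
        async with semaphore:
            return await score_visit(visit_id)

    # gather() returns results in the order the awaitables were passed in,
    # regardless of which request finishes first.
    return await asyncio.gather(*(bounded(v) for v in visit_ids))


results = asyncio.run(enrich_stream(range(8)))
```

The capacity bound plays the role of back-pressure: without it, a burst of input could open an unbounded number of concurrent requests against the service.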

After introducing our product use case (real-time notifications for nearby local businesses), we’ll dive into the big data challenges. The talk describes a Visit Detection algorithm we built to cluster raw GPS pings into Visits, using Flink state management and custom processing constructs (custom Windows, Triggers and Evictors). Finally, we will discuss a real-time machine learning model that predicts the correct nearby business, leveraging Flink’s Async I/O at scale.
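
To make the clustering idea concrete, here is a minimal sketch in plain Python (not Flink code) of one way raw pings could be turned into arrival and departure events. The rule is an assumption read off the slides, not the talk's confirmed algorithm: an arrival is emitted once pings have stayed within `d` meters of the first ping for more than `t` minutes, and a departure is emitted when a ping lands farther than `d + eps` meters away. The names `Ping`, `detect_visits` and the default thresholds are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Ping:
    minute: int  # event time, in minutes since midnight
    lat: float
    lon: float


def haversine_m(a: Ping, b: Ping) -> float:
    """Great-circle distance between two pings, in meters."""
    radius = 6_371_000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def detect_visits(pings, d=50.0, eps=20.0, t=5):
    """Return (kind, start_minute, end_minute) tuples; end is None while a visit is open."""
    events = []
    anchor = None       # first ping of the current cluster
    last_inside = None  # latest ping still within d meters of the anchor
    arrived = False
    for ping in sorted(pings, key=lambda p: p.minute):
        if anchor is None:
            anchor, last_inside, arrived = ping, ping, False
            continue
        dist = haversine_m(anchor, ping)
        if dist < d:
            last_inside = ping
            if not arrived and ping.minute - anchor.minute > t:
                events.append(("ArrivalVisit", anchor.minute, None))
                arrived = True
        elif dist > d + eps:
            if arrived:
                events.append(("DepartureVisit", anchor.minute, last_inside.minute))
            anchor, last_inside, arrived = ping, ping, False
        # Pings between d and d + eps are ignored (assumed hysteresis band).
    return events


# Pings at 03:21, 03:27 and 03:30 close together, then one about 1 km away at 03:55.
pings = [
    Ping(201, 37.7910, -122.4170),
    Ping(207, 37.7911, -122.4170),
    Ping(210, 37.7910, -122.4171),
    Ping(235, 37.8010, -122.4170),
]
events = detect_visits(pings)
```

With these inputs the sketch yields an open arrival starting at 03:21 (minute 201), followed by a departure covering 03:21 to 03:30 once the far ping arrives.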

Flink enabled us to scale complex algorithms to thousands of operations per second and to power hundreds of thousands of daily push notifications. It proved to be a clearly superior alternative: its performance netted Yelp great cost savings and allowed us to move away from poorly scalable Python alternatives.

Published in: Technology


  1. Luca Giovagnoli: Realtime Store Visit Predictions at Scale
  2. Outline: Product, Algorithm, Flink Implementation, Impact
  3. Yelp’s Mission: Connecting people with great local businesses.
  4. Map labels: The Wreck Room, St. Francis Memorial Hospital, Trader Joe’s, Creator
  5. Input location: Luca @ California & Hyde St. XGBoost ML model. Candidate businesses: Creator, Trader Joe’s, The Wreck Room, St. Francis Hospital, ... ?
  6. System overview: Pings (~10^3/s); Flink Clustering; Store Visits; Online model inference (Async I/O + Dispatch); Data Pipeline; Py service; Flink app
  7. Clustering (ProcessWindowFunction). Time labels: 03:21, 03:27, 03:30, 03:55. Distance (m) labels: < d, d + ε. Time threshold: > t mins. Emitted: ArrivalVisit (03:21, +inf), DepartureVisit (03:21, 03:30)
  8. GlobalWindow. Time labels: 03:21, 03:30, 03:55, 04:10, 04:29. Distance (m) labels: < d, d + ε. Emitted: ArrivalVisit [03:21, +inf), DepartureVisit [03:21, 03:30], ArrivalVisit [04:10, +inf)
  9. Custom Trigger. GlobalWindow, time labels 03:21 and 03:30: OnAnyPingTrigger fires here
  10. Count Evictor. GlobalWindow, event-time labels 03:21 and 03:30: CountEvictor removes the oldest ping
  11. ● Mobiles go offline ● Mobile clocks can be off. * Picture by Tyler Akidau
  12. LatePingEvictor. GlobalWindow event-time labels: 03:55, 03:10, 03:21, with one ping marked as evicted. Processing-time labels: 03:22, 03:24, 03:57
  13. Location (input): Luca @ California & Hyde St. Columns: candidate business; features (distance, directions, ...); confidence score. Creator: 0.001, 0.026, ..., 0.99. Trader Joe’s: 0.021, 0.071, ..., 0.13. St. Francis Hospital: ...
  14. Origins story: ● Fragile in-house state ● No concurrency, no scaling ● At-least-once guarantee --> duplicate pushes
  15. Impact: x10 Visits recall increase; up to ~10^3 ML predictions / sec; 14 Flink instances, down from ~10^2 Python; ~5x cheaper
  16. We're Hiring!
  17. @YelpEngineering
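
The evictor slides (10 and 12) can be illustrated with a small plain-Python sketch; this is not Flink code. `count_evict` mirrors the general behavior of a count-based evictor, which keeps only the newest elements of the window buffer. The rule in `late_ping_evict` is an assumption inferred from the slide labels, not the talk's confirmed logic: a ping is dropped when it arrives (in processing time) after a ping that already carries a newer event time. Both function names and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# Each ping is (event_minute, processing_minute), in minutes since midnight.
PingTimes = Tuple[int, int]


def count_evict(window: List[PingTimes], max_count: int) -> List[PingTimes]:
    """Keep only the newest `max_count` pings in arrival order, dropping the oldest."""
    return window[-max_count:] if max_count > 0 else []


def late_ping_evict(window: List[PingTimes]) -> List[PingTimes]:
    """Assumed rule: drop a ping whose event time is older than one that arrived before it."""
    kept: List[PingTimes] = []
    newest_event = None
    # Walk the pings in the order they were received (processing time).
    for event_minute, processing_minute in sorted(window, key=lambda p: p[1]):
        if newest_event is not None and event_minute < newest_event:
            continue  # late ping: its event time precedes an already-seen ping
        kept.append((event_minute, processing_minute))
        newest_event = event_minute
    return kept


# Event 03:21 received at 03:22, event 03:10 received at 03:24 (late),
# event 03:55 received at 03:57.
window = [(201, 202), (190, 204), (235, 237)]
on_time = late_ping_evict(window)
newest_only = count_evict([(201, 202), (210, 211)], 1)
```

Under this assumed rule, the ping with event time 03:10 is evicted because it reached the window after the 03:21 ping, which is one way to cope with phones that go offline or report skewed clocks.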