Maxim Fateev - Beyond the Watermark: On-Demand Backfilling in Flink

http://flink-forward.org/kb_sessions/beyond-the-watermark-on-demand-backfilling-in-flink/

Flink has consistency guarantees and an efficient checkpointing model, which make it a good fit for Uber’s money-related use cases, such as driver incentives. However, Flink’s time-progress model is built around a single watermark, which is incompatible with Uber’s business need to generate aggregates retroactively. The talk covers our solution for on-demand backfilling. It also outlines other abstractions and features we expect Flink to support as it matures.
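
To make the watermark conflict concrete, here is a minimal sketch in plain Python. It is a toy model, not Flink's API: the class `SingleWatermarkAggregator`, the window names, and the integer timestamps are all assumptions made for illustration. It shows only the general idea that once a single watermark has passed a point in event time, events behind it are treated as late, so an aggregate defined retroactively over that period never receives them.

```python
from dataclasses import dataclass, field


@dataclass
class SingleWatermarkAggregator:
    """Toy model (not Flink API): one global watermark shared by all windows."""
    watermark: int = 0                                # event time declared complete
    totals: dict = field(default_factory=dict)        # window name -> running sum
    dropped: list = field(default_factory=list)       # events rejected as late

    def advance(self, ts: int) -> None:
        # The watermark only ever moves forward.
        self.watermark = max(self.watermark, ts)

    def add(self, window: str, ts: int, value: int) -> None:
        if ts <= self.watermark:
            # Event time is behind the watermark: treated as late and dropped.
            self.dropped.append((window, ts, value))
        else:
            self.totals[window] = self.totals.get(window, 0) + value


agg = SingleWatermarkAggregator()
agg.add("incentive-A", ts=10, value=1)   # live event, accepted
agg.advance(100)                          # the stream has progressed to t=100

# Hypothetical incentive B is created retroactively and covers t=20..t=60,
# so every event it needs carries a timestamp behind the watermark.
agg.add("incentive-B", ts=20, value=1)
agg.add("incentive-B", ts=60, value=1)
```

In this toy run only `incentive-A` accumulates a total; both events for the retroactive `incentive-B` end up in `dropped`.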

Published in: Data & Analytics

  1. Beyond the Watermark: On-Demand Backfilling in Flink. Maxim Fateev, Staff SDE, 2016
  2. Who Am I
     ● Amazon Internal Messaging Infrastructure
     ● AWS SQS Storage Engine
     ● AWS Simple Workflow Service (SWF)
     ● Uber Cherami Messaging System
       ○ to be open sourced fall 2016
  3. Uber Marketplace
     ● Ride within minutes
     ● City needs a minimal number of riders and drivers
     ● Incentives is a mechanism to bootstrap a marketplace
     ● Incentives are specific to location, time, type of vehicle, driver rating, etc.
  4. Driver Incentive Example
     ● Guarantee of $40 an hour
       ○ UberX
       ○ From August 21st to August 26th
       ○ San Francisco
       ○ Minimum 20 hours online
       ○ Minimum rating of 4.5
       ○ Acceptance rate of 0.8
  5. Driver Incentive Pipeline
     [Diagram labels: Driver Status Change Log; Source; Incentive Filter; Key By Driver; Per Driver Window; Custom Trigger; Incentive Evaluator; DB Sink; Incentive Progress; Microservices]
  6. Driver Incentive Pipeline
     [Diagram labels: Driver Status Change Log; Source; Incentive Filter; Key By Driver; Per Driver Window; Custom Trigger; Incentive Evaluator; DB Sink; Incentive Progress; Microservices]
     ● Thousands of incentives at any given time!
  7. Driver Incentives Pipeline
     [Diagram labels: Driver Status Change Log; Source; Incentive Filter; Key By Driver; Join with Incentives; Incentives DB; Per Driver/Incentive Window; Custom Trigger; Incentive Evaluator; DB Sink; Incentive Progress; Microservices]
  8. Driver Incentive Pipeline
     [Diagram labels: Driver Status Change Log; Source; Incentive Filter; Key By Driver; Join with Incentives; Incentives DB; Per Driver/Incentive Window; Custom Trigger; Incentive Evaluator; DB Sink; Incentive Progress; Microservices]
     ● Some incentives are created retroactively
  9. Retroactive Incentive Creation
     ● Pipeline for incentives created up front
     ● Backfill pipeline that runs periodically for retroactively created incentives
  10. Retroactive Incentive Creation
     ● Pipeline for incentives created up front
     ● Backfill pipeline that runs periodically for retroactively created incentives
     ● What to do when backfill reaches “current” events?
  11. Retroactive Incentive Creation
     ● Pipeline for incentives created up front
     ● Backfill pipeline that runs periodically for retroactively created incentives
     ● What to do when backfill reaches “current” events?
       ○ Keep running it until end of all incentive periods, or
       ○ Hand over incentive to the “current events” pipeline
  12. Ideal Solution
     ● Single pipeline instance
     ● Supports retroactive incentive creation
  13. Our Solution: On-Demand Query “Source”
     ● Not a Flink Source, as it consumes a DataStream of incentives
     ● Reads the Driver Status Change Log
     ● Emits state change / incentive pairs
     ● For every incentive, emits pairs from the beginning of the incentive period
       ○ Internally has multiple source instances
       ○ Periodically starts a source stream scan from the oldest incentive to backfill
     ● Global watermark is not used
       ○ Per-incentive watermark would be great
     ● Checkpoint includes the list of not yet completed incentives and each internal source checkpoint
  14. On-Demand Backfilling Pipeline
     [Diagram labels: On-Demand Source; Embedded Sources; Incentives Stream; Driver Status Change Log; Source; Incentive Filter; Key By Driver; Driver/Incentive Window; Custom Trigger; Incentive Evaluator; DB Sink; Incentives Payments; Microservices]
  15. Summary
     ● DataStream that contains union of current and backfill messages
     ● DataStream source doesn’t need to be at the start of a pipeline
     ● Source that changes its behavior based on its inputs is a useful abstraction
     ● Global Watermark is not adequate
  16. Generic Solution?
  17. Driver Incentive Pipeline
     [Diagram labels: Driver Status Change Log; Source; Incentive Filter; Key By Driver; Per Driver Window; Custom Trigger; Incentive Evaluator; DB Sink; Incentive Progress; Microservices]
  18. Strawman: Pipeline Template
     ● Pipeline that depends on some parameters to be instantiated
       ○ Driver Incentive would be such a parameter
     ● Parameter values are specified when the pipeline is instantiated
     ● All instances of the templated pipeline share the same operator instances
     ● All streams and operators are implicitly keyed on parameter values
     ● Any sources, operators and sinks have access to parameter values
     ● Watermarks and state values are scoped to an instance
     ● Implementation of sources, operators and sinks might be optimized to share resources between instances
       ○ Source that performs a single Kafka stream read for all instances that were started in the last hour
  19. Driver Incentive Pipeline Template
     [Diagram labels: Incentive; Driver Status Change Log; Source; Incentive Filter; Key By Driver; Per Driver Window; Custom Trigger; Incentive Evaluator; DB Sink; Incentive Progress; Microservices]
  20. Additional Feature Requests
     ● Per-message error handling
     ● Runtime Visibility
       ○ Look at the state of any window and associated trigger in the system
       ○ Overhear any data stream for a task
     ● Triggering on empty windows
     ● Pipeline graph rewriting
       ○ Interceptors
       ○ Platform components
     ● Pre-checkpoint callback for sources
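
The on-demand query “source” described on slide 13 can be sketched roughly as follows. This is an illustrative model in plain Python, not Uber's implementation and not Flink API: `Incentive`, `OnDemandSource`, the in-memory `log`, the integer timestamps, and the per-incentive `progress` map are all hypothetical names and simplifications. The sketch assumes the log is a sorted list that can be re-read from any position, and it tracks progress per incentive instead of relying on a global watermark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incentive:
    id: str
    start: int  # inclusive start of the incentive period (event time)
    end: int    # exclusive end of the incentive period


class OnDemandSource:
    """Hypothetical sketch: consumes incentives and re-reads the log for each."""

    def __init__(self, log):
        self.log = log        # list of (ts, driver, status), sorted by ts
        self.progress = {}    # incentive -> last log timestamp already emitted

    def add_incentive(self, incentive):
        # Every incentive starts at the beginning of its own period,
        # even when that period lies in the past (retroactive creation).
        self.progress[incentive] = incentive.start - 1

    def scan(self, now):
        """One periodic scan up to `now`; returns (event, incentive id) pairs."""
        pending = [i for i in self.progress
                   if self.progress[i] < min(now, i.end - 1)]
        if not pending:
            return []
        # Start reading from the oldest position that still needs backfilling.
        oldest = min(self.progress[i] for i in pending)
        pairs = []
        for ts, driver, status in self.log:
            if ts <= oldest or ts > now:
                continue
            for inc in pending:
                if self.progress[inc] < ts < inc.end:
                    pairs.append(((ts, driver, status), inc.id))
        for inc in pending:
            self.progress[inc] = min(now, inc.end - 1)
        return pairs

    def checkpoint(self):
        # Not yet completed incentives together with their scan positions.
        return {i.id: p for i, p in self.progress.items() if p < i.end - 1}


log = [(1, "d1", "online"), (5, "d1", "offline"), (9, "d2", "online")]
source = OnDemandSource(log)
source.add_incentive(Incentive("current", start=8, end=20))
source.add_incentive(Incentive("retro", start=0, end=6))  # created after the fact
pairs = source.scan(now=10)
```

A single scan here yields a union of backfill and current pairs: the two older events are paired with `retro` and the newest with `current`. After the scan, `retro` is complete and drops out of `checkpoint()`, while `current` remains with its position.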
