Uber's mission is to make transportation as reliable as running water. The business is fundamentally driven by real-time data: more than half of Uber's employees, many of them non-technical, use SQL regularly to analyze data and inform their business decisions. We are building AthenaX, a stream processing platform on top of Apache Flink that lets our users write SQL to process real-time data efficiently and reliably at Uber's scale. Using Apache Calcite as the query parser, AthenaX compiles SQL down to Flink jobs. Leveraging Flink's streaming capabilities, AthenaX supports (1) consistent, reliable computations thanks to exactly-once guarantees, (2) nontrivial analytics (e.g., windowing and joins) on multiple data sources, and (3) efficient, cost-effective execution in production through code generation and elastic scaling.
9. Example
Real-time dashboard for restaurants
[Chart: average preparation time (AvgPrep.time) plotted over time]
SELECT meal_id, AVG(meal_prep_time)
FROM eats_order
GROUP BY meal_id, HOP(proctime(),
    INTERVAL '1' MINUTE,
    INTERVAL '15' MINUTE)
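To make the window semantics concrete, the following is a minimal Python sketch of what the query computes, not of how AthenaX or Flink execute it. It assumes the Calcite argument order HOP(time, slide, size), so a 15-minute window advances every minute and each order falls into 15 overlapping windows. The names hop_windows and hop_average and the (timestamp, meal_id, prep_time) tuple layout are hypothetical, and the window size is assumed to be a multiple of the slide.

```python
from collections import defaultdict

MINUTE = 60  # seconds


def hop_windows(ts, slide=1 * MINUTE, size=15 * MINUTE):
    """Return the start times of every hopping window that contains ts.

    A window starting at s covers the half-open interval [s, s + size).
    Assumes size is a multiple of slide.
    """
    last_start = ts - (ts % slide)           # latest window start <= ts
    first_start = last_start - size + slide  # earliest window still containing ts
    return list(range(first_start, last_start + 1, slide))


def hop_average(orders, slide=1 * MINUTE, size=15 * MINUTE):
    """Average prep time per (meal_id, window_start).

    orders is an iterable of (timestamp_seconds, meal_id, prep_time) tuples,
    a hypothetical stand-in for rows of eats_order.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (meal_id, start) -> [total, count]
    for ts, meal_id, prep_time in orders:
        for start in hop_windows(ts, slide, size):
            acc = sums[(meal_id, start)]
            acc[0] += prep_time
            acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Because the windows overlap, a dashboard reading these results gets a fresh 15-minute average every minute rather than one value per 15 minutes.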
10. Example (cont.)
Building stream processing applications with SQL
SELECT AVG(meal_prep_time)
FROM eats_order
GROUP BY meal_id, HOP(proctime(),
    INTERVAL '1' MINUTE,
    INTERVAL '15' MINUTE)
11. Example (cont.)
SELECT * FROM (
    SELECT EXPECTED_TIME(meal_id) AS e,
        meal_id,
        AVG(meal_prep_time) AS t
    FROM eats_order
    GROUP BY meal_id, HOP(proctime(),
        INTERVAL '1' MINUTE,
        INTERVAL '15' MINUTE)
Building stream processing applications with SQL
Tables are more generic than analytical stores
RPC
13. The case of UberEATS
• Three-way marketplace
• Real-time metrics
• Estimated Time to Delivery (ETD)
• Transactions
• Demand forecasts
15. Predicting the ETD
• Key metric: time to prepare a meal (t_prep)
• Periodically learn a function f: (order status) → t_prep
• Predict the ETD for current orders using f
• AthenaX extracts features for both learning and prediction
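The learn-then-predict loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which model Uber uses, so a per-meal historical mean stands in for f, and the names learn_f, history, and the meal_id key of the order status are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def learn_f(history):
    """Fit a stand-in for f from (meal_id, observed_prep_minutes) pairs.

    The per-meal mean is a placeholder model chosen for illustration only.
    history must be non-empty, otherwise statistics.mean raises an error.
    """
    by_meal = defaultdict(list)
    for meal_id, prep in history:
        by_meal[meal_id].append(prep)
    per_meal = {meal_id: mean(values) for meal_id, values in by_meal.items()}
    fallback = mean(prep for _, prep in history)  # used for unseen meals

    def f(order_status):
        # order_status is a dict of features; only meal_id is used here.
        return per_meal.get(order_status["meal_id"], fallback)

    return f


# "Periodically" means calling learn_f again on fresh history and swapping in
# the new f; predictions for current orders then use the latest f.
f = learn_f([("pizza", 10), ("pizza", 14), ("salad", 5)])
predicted_prep = f({"meal_id": "pizza"})
```

In this sketch the same feature, the observed prep time per meal, feeds both training and prediction, which mirrors the point that one feature-extraction layer serves both paths.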
16. Architecture of the ETD service
[Architecture diagram. Components: order status (Kafka), AthenaX, data warehouse, feature / model store (Cassandra), machine learning, prediction service. Edge labels: online features, offline features.]
SELECT AVG(meal_prep_time)
FROM eats_order
GROUP BY meal_id,
    HOP(proctime(),
        INTERVAL '1' MINUTE,
22. Current status
• Pilot jobs in production
• Full-scale rollout in progress
• Based on Apache Flink 1.3-SNAPSHOT
• Projection, filtering, group windows, UDFs
• Streaming joins not yet supported
23. Embrace the community
Contributions to the upstream
• Group window support for streaming SQL
  • CALCITE-1603, CALCITE-1615
  • FLINK-5624, FLINK-5710, FLINK-6011, FLINK-6012
• Stability fixes
  • FLINK-3679, FLINK-5631
• Table abstractions for Cassandra / JDBC (WIP)
• Available in the upcoming 1.3 release
25. Conclusion
• AthenaX: write SQL to build streaming applications
  • Treat tables as a generic concept
  • Productivity: development → production in hours
• The AthenaX approach
  • SQL on streams as a platform
  • Self-service production support, end to end