Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
1. Abstractions of a Managed Stream Processing Platform
Arya Ketan
arya.ketan@flipkart.com
Flipkart Internet Pvt. Ltd.
2. About Me
● Senior Architect @ Flipkart with over 10 years of experience
● Love solving complex problems
● Conference traveler
● Board game enthusiast, sports follower
3. Agenda
➔ Stream Processing use-cases and examples from Flipkart.
➔ Why a stream platform?
➔ FStream - Managed Stateful Stream Processing Platform at Flipkart.
➔ FStream Components.
8. Some other use-cases
● Real-time search rank improvement
● Real-time reseller fraud detection
Flipkart currently runs 500+ stream compute jobs with a peak throughput of ~400k/sec
9. Sample topology from the above use-cases
[Diagram: three pipelines built from Map, GroupBy, Join and Time Windowed Aggregates operators]
● Order Event Stream: Join, Time Windowed Aggregates, Map, GroupBy (category, product) → Write to Report Database
● Browse Events: Time Windowed Aggregates, Map, GroupBy (category) → Update Trending Database
● Search Events: Map, GroupBy (user, category), Join, Time Windowed Aggregates → Write to Search Index
10. Common patterns observed
➔ Stateless operations like filter, map, flatmap etc.
➔ Stateful operations on time windows like stream join, aggregate, dedupe etc.
➔ Triggers and change notifications
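The difference between the two operator families can be illustrated with a minimal Python sketch over an in-memory list of events. The event shape (`type`, `id`, `ts`, `items`), the function names and the fixed-size tumbling window used for dedupe are assumptions for illustration, not FStream's API: filter, map and flatmap look only at the current record, while dedupe has to remember which keys it has already seen in the current window.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Set, Tuple

Event = Dict[str, object]

def filter_orders(events: Iterable[Event]) -> Iterator[Event]:
    # Stateless filter: the decision depends only on the current event.
    return (e for e in events if e["type"] == "order")

def explode_items(orders: Iterable[Event]) -> Iterator[Event]:
    # Stateless flatmap: one order fans out into one record per item.
    for order in orders:
        for item in order["items"]:
            yield {"order_id": order["id"], "item": item, "ts": order["ts"]}

def normalize(records: Iterable[Event]) -> Iterator[Event]:
    # Stateless map: one record in, one record out.
    for r in records:
        yield {**r, "item": str(r["item"]).lower()}

def dedupe(records: Iterable[Event], window_size: int) -> Iterator[Event]:
    # Stateful dedupe: remembers (order_id, item) keys per tumbling event-time window.
    seen: Dict[int, Set[Hashable]] = {}
    for r in records:
        ts = int(r["ts"])
        start = ts - ts % window_size
        keys = seen.setdefault(start, set())
        key = (r["order_id"], r["item"])
        if key in keys:
            continue  # duplicate delivery within the same window
        keys.add(key)
        # Bound the state: keep only the current and previous window (no late-data handling).
        for old in [s for s in seen if s < start - window_size]:
            del seen[old]
        yield r

def pipeline(events: Iterable[Event], window_size: int = 10) -> List[Tuple[object, object]]:
    records = dedupe(normalize(explode_items(filter_orders(events))), window_size)
    return [(r["order_id"], r["item"]) for r in records]

events = [
    {"type": "order", "id": "od1", "ts": 1, "items": ["Phone", "Case"]},
    {"type": "browse", "id": "b1", "ts": 2, "items": ["Phone"]},
    {"type": "order", "id": "od1", "ts": 3, "items": ["Phone", "Case"]},  # redelivered
    {"type": "order", "id": "od2", "ts": 12, "items": ["Laptop"]},
]
print(pipeline(events))  # [('od1', 'phone'), ('od1', 'case'), ('od2', 'laptop')]
```

The dedupe result depends on the window: with `window_size=2` the redelivered order at `ts=3` falls into a new window and is emitted again, which is why window choice is part of the operator's semantics.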
12. Why a stream platform?
➔ High Entry Bar
◆ Domain expertise and deep knowledge are needed.
➔ Complex Stateful Operators
◆ Need for windowed operators and state management.
◆ Managing state at high throughput.
13. Why a stream platform?
● Stateful operations to handle late data, triggers and data correction
[Diagram: Stream Join window. Stream tuples (G, od7), (F, od4) and (G, 7), (J, 3), (H, 11); window state [(G, od7, 7), (H, 11), (F, od4)] and [(od7, G), (od4, F)]]
Image source: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
14. Why a stream platform?
➔ Fault tolerance is hard
◆ Hardware failures, network issues and unexpected events.
➔ Infrastructure management is hard
◆ Managing multiple compute and storage clusters is challenging.
➔ Housekeeping is hard
◆ Job management, monitoring and alerting are often tedious and messy.
15. Why a stream platform?
[Diagram: the Ideal Streaming Platform combines Programming Model, Stateful Operations, Job Lifecycle Management, Low Entry Bar, Monitoring and Alerting, and Infrastructure Management]
27. Programming Model : Source
➔ Stream of Input data
➔ Pluggable interface
➔ Supports Checkpointing
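A pluggable, checkpointable source could look roughly like the Python sketch below. The `Source`/`ListSource` names and the `poll`/`checkpoint`/`restore` methods are hypothetical; an in-memory list stands in for a partitioned log. The point it shows: if the job restarts from the last checkpointed position, records read after that checkpoint are replayed.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

T = TypeVar("T")

class Source(ABC, Generic[T]):
    """Hypothetical pluggable source contract: read batches, report and restore a position."""

    @abstractmethod
    def poll(self, max_records: int) -> List[T]:
        """Return up to max_records records after the current position."""

    @abstractmethod
    def checkpoint(self) -> int:
        """Return a position that can later be passed to restore()."""

    @abstractmethod
    def restore(self, position: int) -> None:
        """Resume reading from a previously checkpointed position."""

class ListSource(Source[T]):
    """In-memory source standing in for a partitioned log such as a message queue."""

    def __init__(self, records: List[T]) -> None:
        self._records = records
        self._offset = 0

    def poll(self, max_records: int) -> List[T]:
        batch = self._records[self._offset:self._offset + max_records]
        self._offset += len(batch)
        return batch

    def checkpoint(self) -> int:
        return self._offset

    def restore(self, position: int) -> None:
        self._offset = position

data = ["e1", "e2", "e3", "e4"]
src = ListSource(data)
src.poll(2)                # ['e1', 'e2'] processed
saved = src.checkpoint()   # 2
src.poll(1)                # ['e3'] read, then the job crashes before the next checkpoint
recovered = ListSource(data)
recovered.restore(saved)
print(recovered.poll(10))  # ['e3', 'e4']: e3 is replayed, so delivery is at-least-once
```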
28. Programming Model : Sinks
➔ Terminal output
➔ Pluggable Interface and SDK to write custom sinks
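As a rough Python illustration of a custom sink written against a pluggable interface: the `Sink` base class, its `write`/`flush` methods and the dict standing in for a report table are all hypothetical. Upserting by key is one common way to make a sink tolerate the replays that checkpoint-based recovery produces.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Record = Tuple[str, int]

class Sink(ABC):
    """Hypothetical sink contract: the terminal operator that receives records from the pipeline."""

    @abstractmethod
    def write(self, records: List[Record]) -> None:
        """Accept a batch of records; implementations may buffer them."""

    def flush(self) -> None:
        """Make buffered records durable, e.g. before a checkpoint completes."""

class UpsertSink(Sink):
    """Custom sink that upserts (key, value) pairs into a dict standing in for a report table."""

    def __init__(self, batch_size: int = 2) -> None:
        self.table: Dict[str, int] = {}
        self._buffer: List[Record] = []
        self._batch_size = batch_size

    def write(self, records: List[Record]) -> None:
        self._buffer.extend(records)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        for key, value in self._buffer:
            self.table[key] = value  # upsert: a replayed record overwrites instead of duplicating
        self._buffer.clear()

sink = UpsertSink(batch_size=2)
sink.write([("mobiles", 5)])  # buffered, table still empty
sink.write([("books", 3)])    # batch full, flushed
sink.write([("mobiles", 5)])  # replay after a restart
sink.flush()
print(sink.table)             # {'mobiles': 5, 'books': 3}
```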
29. Programming Model : Operators
➔ Stateful operators
➔ Stateless operations
➔ Pluggable operator and state store interfaces
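The split between operators and a pluggable state store might be sketched in Python as follows. `StateStore`, `InMemoryStateStore`, `uppercase_key` and `CountByKey` are hypothetical names; a dict stands in for whatever embedded or remote key-value store a real platform would plug in. Because the stateful operator only talks to the store interface, the store can be swapped, and state survives the operator instance being recreated.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, Hashable, Iterable, Iterator, Optional, Tuple, TypeVar

V = TypeVar("V")

class StateStore(ABC, Generic[V]):
    """Hypothetical pluggable state store interface."""

    @abstractmethod
    def get(self, key: Hashable) -> Optional[V]: ...

    @abstractmethod
    def put(self, key: Hashable, value: V) -> None: ...

class InMemoryStateStore(StateStore[V]):
    def __init__(self) -> None:
        self._data: Dict[Hashable, V] = {}

    def get(self, key: Hashable) -> Optional[V]:
        return self._data.get(key)

    def put(self, key: Hashable, value: V) -> None:
        self._data[key] = value

def uppercase_key(records: Iterable[Tuple[str, object]]) -> Iterator[Tuple[str, object]]:
    # Stateless operator: needs no store at all.
    for key, value in records:
        yield key.upper(), value

class CountByKey:
    """Stateful operator: keeps a running count per key in whatever store is injected."""

    def __init__(self, store: StateStore[int]) -> None:
        self.store = store

    def process(self, records: Iterable[Tuple[str, object]]) -> Iterator[Tuple[str, int]]:
        for key, _ in records:
            count = (self.store.get(key) or 0) + 1
            self.store.put(key, count)
            yield key, count

store: StateStore[int] = InMemoryStateStore()
print(list(CountByKey(store).process(uppercase_key([("g", 1), ("h", 2), ("g", 3)]))))
# [('G', 1), ('H', 1), ('G', 2)]
print(list(CountByKey(store).process([("G", 4)])))  # [('G', 3)]: state outlives the operator instance
```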
30. Stateful Operation : Join
➔ Stream-Stream Join / Table-Stream Join
➔ Windowed (Indefinite) Join
➔ Late arriving data
➔ Types of Join
◆ Non-mandatory Join
◆ Mandatory Join
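One way to picture a windowed stream-stream join that copes with late-arriving data is the Python sketch below. It is a generic technique, not FStream's implementation: `WindowedJoin`, the tumbling event-time windows, the watermark defined as "max event time seen minus allowed lateness" and the `late` side output are all assumptions. Both sides are buffered per (window, key) in a state map, matches are emitted as soon as the second side arrives, and windows behind the watermark are evicted. The keys reuse G, H, F and J from the join-window example earlier in the deck.

```python
from typing import Dict, List, Tuple

Joined = Tuple[str, object, object]

class WindowedJoin:
    """Hypothetical stream-stream inner join over tumbling event-time windows."""

    def __init__(self, window_size: int, allowed_lateness: int) -> None:
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        # State: (window start, key) -> buffered values from each side.
        self.state: Dict[Tuple[int, str], Dict[str, List[object]]] = {}
        self.max_ts = float("-inf")
        self.late: List[Tuple[str, str, object, int]] = []  # side output for too-late events

    def on_event(self, side: str, key: str, value: object, ts: int) -> List[Joined]:
        if side not in ("left", "right"):
            raise ValueError(f"unknown side: {side}")
        self.max_ts = max(self.max_ts, ts)
        watermark = self.max_ts - self.allowed_lateness
        start = ts - ts % self.window_size
        if start + self.window_size <= watermark:
            # The event's window has already closed: route it aside instead of joining.
            self.late.append((side, key, value, ts))
            return []
        bucket = self.state.setdefault((start, key), {"left": [], "right": []})
        bucket[side].append(value)
        if side == "left":
            out = [(key, value, other) for other in bucket["right"]]
        else:
            out = [(key, other, value) for other in bucket["left"]]
        # Evict windows the watermark has passed so state stays bounded.
        for expired in [wk for wk in self.state if wk[0] + self.window_size <= watermark]:
            del self.state[expired]
        return out

join = WindowedJoin(window_size=10, allowed_lateness=5)
print(join.on_event("left", "G", "od7", 1))  # []
print(join.on_event("right", "G", 7, 2))     # [('G', 'od7', 7)]
print(join.on_event("right", "J", 3, 4))     # []: no order for J yet
print(join.on_event("left", "F", "od4", 3))  # []
print(join.on_event("right", "H", 11, 20))   # []: watermark moves to 15, window [0, 10) is evicted
print(join.on_event("right", "F", 4, 5))     # []: too late, goes to join.late
```

This sketch only emits matched pairs (an inner join); unmatched records such as J are simply dropped on eviction.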
31. Stateful Operation: Join
➔ Declarative syntax to define join scenarios.
➔ Dependent streams are joined via a state store.
➔ Reusable pipelines
➔ Handles delayed streams.
32. Concepts: Stateful Join
➔ Low Latency vs Correctness vs Eventual Correctness
◆ Mandatory Join (correctness)
◆ Delayed Join (low latency)
◆ Combination (eventual correctness)
● 15+ mandatory joins with less than 5 secs of latency at 50k QPS
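The three trade-offs can be made concrete with a small Python sketch of a primary stream (say, orders) enriched by a dependent stream (say, payments). `EnrichmentJoin`, its policy names and the timer-driven timeout are hypothetical: under "mandatory" a record is held until its match arrives, under "delayed" it is emitted without a match once the timeout expires, and under "combination" the partial record is emitted at the timeout and a correction follows if the match shows up later.

```python
from typing import Dict, List, Optional, Tuple

Output = Tuple[str, str, object, Optional[object]]  # (kind, key, primary value, dependent value)

class EnrichmentJoin:
    """Hypothetical join of a primary stream with a dependent stream under one of three policies."""

    def __init__(self, policy: str, timeout: int) -> None:
        if policy not in ("mandatory", "delayed", "combination"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.timeout = timeout
        self.pending: Dict[str, Tuple[object, int]] = {}  # primaries waiting for a match
        self.dependent: Dict[str, object] = {}            # dependent side (never evicted here)
        self.partials: Dict[str, object] = {}             # primaries emitted without a match

    def on_primary(self, key: str, value: object, ts: int) -> List[Output]:
        if key in self.dependent:
            return [("joined", key, value, self.dependent[key])]
        self.pending[key] = (value, ts)
        return []

    def on_dependent(self, key: str, value: object) -> List[Output]:
        self.dependent[key] = value
        if key in self.pending:
            primary, _ = self.pending.pop(key)
            return [("joined", key, primary, value)]
        if key in self.partials:  # only filled under the "combination" policy
            return [("correction", key, self.partials.pop(key), value)]
        return []

    def on_timer(self, now: int) -> List[Output]:
        if self.policy == "mandatory":
            return []  # correctness first: hold the record until its match arrives
        out: List[Output] = []
        for key, (primary, ts) in list(self.pending.items()):
            if now - ts >= self.timeout:
                del self.pending[key]
                out.append(("partial", key, primary, None))
                if self.policy == "combination":
                    self.partials[key] = primary  # remember it so a late match can correct it
        return out

def run(policy: str) -> List[Output]:
    j = EnrichmentJoin(policy, timeout=5)
    out = j.on_primary("od7", "order", ts=0)
    out += j.on_timer(now=5)                 # dependent event still missing at the deadline
    out += j.on_dependent("od7", "payment")  # and then it finally arrives
    return out

for p in ("mandatory", "delayed", "combination"):
    print(p, run(p))
```

Only "combination" ends with a joined view of the order while still producing an early result, at the cost of downstream consumers having to apply corrections.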
33. Stateful Aggregations
➔ Aggregates fields in the payload by a key.
➔ Supports time-based windowed aggregation.
➔ Greatly reduces the code users have to write.
● 1 Bn+ events aggregated daily over 50+ dimensions
● Powers 1,000-odd reports
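A keyed, time-windowed aggregation can be sketched in Python as below. `TumblingAggregator` and its count/sum outputs are hypothetical, and the policy of closing a window once the largest event time seen passes its end is one simple choice among several. The user only supplies the key, value and timestamp; window assignment, accumulation and emission are handled by the operator.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, int, float]  # (window start, key, count, sum)

class TumblingAggregator:
    """Hypothetical keyed tumbling-window count/sum, emitting a window once event time passes its end."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.acc: Dict[Tuple[int, str], Tuple[int, float]] = {}
        self.max_ts = float("-inf")

    def add(self, key: str, value: float, ts: int) -> List[Row]:
        start = ts - ts % self.window_size
        count, total = self.acc.get((start, key), (0, 0.0))
        self.acc[(start, key)] = (count + 1, total + value)
        self.max_ts = max(self.max_ts, ts)
        return self._emit_closed()

    def _emit_closed(self) -> List[Row]:
        # No allowed lateness: a late event re-opens its window and is emitted
        # straight away as an extra row, so downstream must merge updates.
        closed = sorted(k for k in self.acc if k[0] + self.window_size <= self.max_ts)
        return [(start, key, *self.acc.pop((start, key))) for start, key in closed]

agg = TumblingAggregator(window_size=60)
agg.add("mobiles", 100.0, ts=5)
agg.add("books", 20.0, ts=30)
agg.add("mobiles", 50.0, ts=59)
print(agg.add("mobiles", 10.0, ts=61))  # [(0, 'books', 1, 20.0), (0, 'mobiles', 2, 150.0)]
```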
41. Monitoring & Alerting
➔ Health monitoring and debuggability
➔ FStream provides:
◆ Automated metrics
◆ Automatic alerts creation
◆ Granular metrics
◆ Current and Historical Metrics
42. Common Metrics
➔ Source
◆ Source lag
◆ Time to read from source
◆ Number of unparseable records
➔ Compute
◆ Time to process
◆ Scheduling delay
◆ Number of records
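How some of these metrics might be derived is sketched below in Python, assuming a Kafka-like partitioned log in which each partition exposes its newest offset and the job commits the offset it has processed up to. Scheduling delay is taken here to mean the wait between when a batch was due and when its processing started, as in micro-batch engines; the deck does not give FStream's exact definitions. `BatchStats`, `source_lag`, `batch_metrics`, `default_alerts` and the thresholds are hypothetical, with the alert function standing in for automatically created alerts.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class BatchStats:
    scheduled_at: float  # when the batch was due to start (epoch seconds)
    started_at: float    # when processing actually began
    finished_at: float   # when processing completed
    records: int         # records read in the batch
    unparseable: int     # records that could not be deserialized

def source_lag(log_end: Dict[int, int], committed: Dict[int, int]) -> int:
    # Per partition: newest offset in the log minus the offset the job has committed.
    # A partition with no commit yet is treated as committed at offset 0.
    return sum(end - committed.get(p, 0) for p, end in log_end.items())

def batch_metrics(b: BatchStats) -> Dict[str, float]:
    return {
        "scheduling_delay_s": b.started_at - b.scheduled_at,
        "processing_time_s": b.finished_at - b.started_at,
        "records": b.records,
        "unparseable_records": b.unparseable,
    }

def default_alerts(lag: int, metrics: Dict[str, float], max_lag: int, max_delay_s: float) -> List[str]:
    # Hypothetical default alert rules attached to every job without user configuration.
    alerts = []
    if lag > max_lag:
        alerts.append(f"source lag {lag} exceeds {max_lag}")
    if metrics["scheduling_delay_s"] > max_delay_s:
        alerts.append(f"scheduling delay {metrics['scheduling_delay_s']}s exceeds {max_delay_s}s")
    return alerts

lag = source_lag({0: 1500, 1: 800}, {0: 1200, 1: 800})
m = batch_metrics(BatchStats(100.0, 102.5, 104.0, 5000, 3))
print(lag, m["scheduling_delay_s"], m["processing_time_s"])  # 300 2.5 1.5
print(default_alerts(lag, m, max_lag=100, max_delay_s=5.0))  # ['source lag 300 exceeds 100']
```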