This document discusses Mantis, a reactive stream processing system for operational insights. Mantis allows querying data on-demand, reusing data and results between jobs for efficiency. It enables job chaining through discovery of job outputs and auto-scales jobs and clusters based on workload. Mantis provides high throughput and low latency stream processing while maintaining data guarantees.
9. So, in order to manage complex environments, we need to rethink insights and shift the curve
11. An Insight system that can...
Auto-detect anomalies in high volume, high cardinality data
Identify titles that have an abnormal failure rate and highlight their common characteristics (only on certain devices, using certain CDNs, etc.)
34. Can we reuse Data?
[Diagram: EdgeServers, a Device Events Q, and three consumers that each receive All Device Events: the Realtime Data Aggregator Job (feeding the Device Health Dashboard), the Anomaly Detection Job (producing Alerts), and an Ad-hoc Query Job that searches for device1 events and has to discard every event where Device != "device1".]
35. Can we reuse Data?
[Diagram: same as slide 34, now annotated "3x fan out" on All Device Events.]
36. Can we reuse Data?
[Diagram: same as slide 35, with a Queryable Events Job added.]
37. Can we reuse Data?
[Diagram: same as slide 36; the query (Select status Where true) is issued to the Queryable Events Job, which returns only "projected" events.]
40. Can we reuse Data?
[Diagram: same as slide 36; the Ad-hoc Query Job (search for device1 events) no longer receives All Device Events and instead issues (select * where device == "device1") to the Queryable Events Job.]
41. Can we reuse Data?
[Diagram: same as slide 40; the Ad-hoc Query Job only gets "filtered" events from the Queryable Events Job.]
42. Can we reuse Data?
[Diagram: same as slide 41, with the All Device Events fan out reduced from 3x to 2x.]
43. What if we could...
Only stream what is needed, and only when it is needed?
Reuse the data already streamed?
Reuse the results?
44. Can we reuse Results?
[Diagram: EdgeServers, Device Events Q, Realtime Data Aggregator Job (feeding the Device Health Dashboard), Anomaly Detection Job (producing Alerts), Ad-hoc Query Job (search for device1 events) and Queryable Events Job, with a 2x fan out of All Device Events.]
54. Mantis
● Small but extremely fast shrimp
● A Reactive stream processing system
58. Mantis
Only stream what is needed & when it is needed: Query based On-demand streaming of data
Reuse the data & results: Built-in Job Discovery and Job Chaining
Auto-scale resources: Job and Cluster Auto-scaling
59. + Much More
● High throughput, low latency stream processing system focused on Operational Insights
● Configurable data guarantees
● Long running & Transient jobs
● Flexible Functional programming with RxJava
60. Mantis deep-dive
● Query based On-demand Streaming of data
● Job Discovery and Job Chaining
● Auto-scaling Jobs and Clusters
● End-to-end Reactive Stream Semantics
65. Query Based On Demand Streaming
● Stream data only when needed and only what is needed
● Filter data at the source
● Cleanup after use
[Diagram: the Mantis Job sends a Query to the Data Source, which streams back only the Requested Data.]
66. Mantis Query Language (MQL)
SELECT xid, errorCode (Projection)
WHERE device-type == SONY_PS3 (Filtering)
SAMPLE {"strategy": "RANDOM", "threshold": 200} (Sampling)
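The three clauses above can be read as a small per-event pipeline. The sketch below is plain Python, not the MQL engine: `apply_query`, its parameters, and the reading of `threshold` as "keep an event when a random draw out of a hypothetical `factor` of 10000 falls below it" are assumptions made for illustration only.

```python
import random


def apply_query(events, fields, predicate, threshold, factor=10_000, rng=None):
    """Filter, sample, then project a stream of event dicts.

    Assumption: the RANDOM strategy keeps an event when a draw in
    [0, factor) is below `threshold`. Real MQL semantics may differ.
    """
    rng = rng or random.Random()
    for event in events:
        if not predicate(event):                 # WHERE: filtering
            continue
        if rng.randrange(factor) >= threshold:   # SAMPLE: random sampling
            continue
        yield {k: event[k] for k in fields if k in event}  # SELECT: projection


events = [
    {"xid": 1, "errorCode": 500, "device-type": "SONY_PS3", "payload": "large blob"},
    {"xid": 2, "errorCode": 404, "device-type": "XBOX", "payload": "large blob"},
    {"xid": 3, "errorCode": 503, "device-type": "SONY_PS3", "payload": "large blob"},
]

# threshold == factor keeps every matching event, so this demo is deterministic.
result = list(apply_query(
    events,
    fields=["xid", "errorCode"],
    predicate=lambda e: e.get("device-type") == "SONY_PS3",
    threshold=10_000,
))
```

Filtering runs first so that sampling and projection only touch events the query asked for, and projection drops the unused `payload` field before anything is sent on.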
70. Query processing on Data producing app
[Diagram: the API app embeds MRE (Mantis Real-time Events library); each subscribed Mantis Job sends its query to MRE and receives only the matching events.]
QoE Analysis Mantis Job
SELECT xid
WHERE type = rebuffer
{ "xid": 1234 },
{ "xid": 4567 }
Device Analysis Mantis Job
SELECT * WHERE
device = XBox
{ "device": "XBox",
"IP": "1.1.1.1",
"xid": 1111 }
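Evaluating queries inside the data-producing app means an event leaves the app only if some active subscription wants it. Below is a minimal sketch of that idea; `EventPublisher` and its methods are hypothetical names, not the API of the Mantis Real-time Events library.

```python
class EventPublisher:
    """Hypothetical stand-in for a library embedded in the data-producing app."""

    def __init__(self):
        self._subscriptions = {}  # subscription id -> (predicate, fields, sink)

    def subscribe(self, sub_id, predicate, fields, sink):
        self._subscriptions[sub_id] = (predicate, fields, sink)

    def unsubscribe(self, sub_id):
        # Cleanup after use: a departed job's query no longer costs anything.
        self._subscriptions.pop(sub_id, None)

    def publish(self, event):
        """Evaluate every active query at the source; return the number of copies sent."""
        sent = 0
        for predicate, fields, sink in self._subscriptions.values():
            if predicate(event):
                if fields is None:  # SELECT *
                    projected = dict(event)
                else:               # SELECT field, field, ...
                    projected = {k: event[k] for k in fields if k in event}
                sink.append(projected)
                sent += 1
        return sent


publisher = EventPublisher()
qoe_sink, device_sink = [], []

# SELECT xid WHERE type = rebuffer
publisher.subscribe("qoe", lambda e: e.get("type") == "rebuffer", ["xid"], qoe_sink)
# SELECT * WHERE device = XBox
publisher.subscribe("device", lambda e: e.get("device") == "XBox", None, device_sink)

publisher.publish({"xid": 1234, "type": "rebuffer", "device": "PS3"})
publisher.publish({"xid": 1111, "type": "play", "device": "XBox", "IP": "1.1.1.1"})
publisher.publish({"xid": 9999, "type": "play", "device": "PS3"})  # matches no query

publisher.unsubscribe("qoe")
publisher.unsubscribe("device")
sent_after_cleanup = publisher.publish({"xid": 4567, "type": "rebuffer", "device": "XBox"})
```

With no subscriptions left, `publish` sends nothing, which is the "only when it is needed" half of on-demand streaming.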
89. Reuse Kafka Data Streams
[Diagram: a Mantis Kafka Consumer Job reads Kafka TopicPartitions 0 to N; three downstream jobs subscribe to its output, each with its own query.]
Device Analysis Job: SELECT * WHERE device = XBox
Adhoc Transaction Analysis Job: SELECT * WHERE xid = 0afcedbxe
QoE analysis Job: SELECT * WHERE type = re-buffer
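The point of chaining jobs off one consumer is that the Kafka topic is read once, however many downstream jobs attach. The sketch below illustrates that with an in-memory list standing in for Kafka; `CountingSource` and `run_consumer_job` are hypothetical names and no Kafka client is involved.

```python
class CountingSource:
    """Hypothetical Kafka-like source that counts how many records were read."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def __iter__(self):
        for record in self._records:
            self.reads += 1
            yield record


def run_consumer_job(source, downstream):
    """Read each record once and hand it to every downstream job whose query matches."""
    outputs = {name: [] for name in downstream}
    for record in source:
        for name, predicate in downstream.items():
            if predicate(record):
                outputs[name].append(record)
    return outputs


source = CountingSource([
    {"xid": "a1", "device": "XBox", "type": "play"},
    {"xid": "b2", "device": "PS3", "type": "re-buffer"},
    {"xid": "a1", "device": "XBox", "type": "re-buffer"},
])

outputs = run_consumer_job(source, {
    "device_analysis": lambda r: r["device"] == "XBox",
    "qoe_analysis": lambda r: r["type"] == "re-buffer",
    "adhoc_transaction": lambda r: r["xid"] == "b2",
})
```

Three jobs are served, yet `source.reads` stays at the number of records rather than three times that.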
96. Bin Packing
● Simple round robin scheduling causes fragmentation
● Smarter bin-packing of jobs frees up Mantis agents for scale down
[Diagram: jobs spread across Host 1 to Host 4 under round robin vs. the same jobs packed onto fewer of the four hosts.]
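The fragmentation argument can be made concrete with a toy comparison. The sketch below uses first-fit decreasing, a common bin-packing heuristic chosen here for illustration; the slides do not say which algorithm the Mantis scheduler uses, and the job sizes and host capacity are made-up numbers.

```python
def round_robin(jobs, hosts):
    """Assign jobs to hosts in turn, ignoring how full each host is."""
    placement = [[] for _ in range(hosts)]
    for i, job in enumerate(jobs):
        placement[i % hosts].append(job)
    return placement


def first_fit_decreasing(jobs, hosts, capacity):
    """Place the largest jobs first, each on the first host with enough room left."""
    placement = [[] for _ in range(hosts)]
    for job in sorted(jobs, reverse=True):
        for host in placement:
            if sum(host) + job <= capacity:
                host.append(job)
                break
        else:
            raise ValueError(f"no host has room for a job of size {job}")
    return placement


def idle_hosts(placement):
    """Count hosts with nothing scheduled, i.e. candidates for scale down."""
    return sum(1 for host in placement if not host)


jobs = [2, 2, 2, 2]  # CPU cores requested by each job (made-up numbers)
rr = round_robin(jobs, hosts=4)
ffd = first_fit_decreasing(jobs, hosts=4, capacity=4)
```

Round robin leaves every host half used and none idle, while packing fills two hosts and leaves two empty, which is what allows the cluster to shrink.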
99. End-to-end Reactive Streams
● RxJava operators compose backpressure within a single worker
● Reactive Socket for backpressure across network boundaries
100. Reactive Socket
● Application layer protocol for async non-blocking backpressure across network boundaries
● Rich interaction modes
● Pluggable transport protocol
[Diagram: Node A and Node B exchange Request N and Data.]
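The Request N / Data exchange is the core of demand-driven backpressure: the producer never sends more than the consumer has asked for. The sketch below shows only that contract, synchronously and in one process; `Responder` is a hypothetical class, and the real protocol is asynchronous and runs over a network transport.

```python
class Responder:
    """Hypothetical producer that only emits what the requester has asked for."""

    def __init__(self, items):
        self._items = iter(items)

    def request(self, n):
        """Return at most n further items (the Request N -> Data exchange)."""
        batch = []
        for _ in range(n):
            try:
                batch.append(next(self._items))
            except StopIteration:
                break  # stream completed before demand was exhausted
        return batch


responder = Responder(range(10))
first = responder.request(3)     # consumer can handle 3 items right now
second = responder.request(4)    # later it signals room for 4 more
rest = responder.request(100)    # demand larger than what is left
```

A slow consumer simply requests less often or in smaller batches, so the producer cannot overrun it.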
109. Mantis Today
● ~650 Jobs in production
● ~8 Million events / sec processed
● 80 Gb/s processed instead of 10 Tb/s thanks to filtering, i.e. about 99% less data
● The processed data gets reused by other jobs, further reducing costs
● Auto-scaling jobs use up to 75% fewer resources compared to peak
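As a quick check of the "99% less data" figure, the arithmetic can be worked out from the two throughput numbers above, assuming decimal units (1 Tb/s = 1000 Gb/s):

```python
unfiltered_gbps = 10 * 1000  # 10 Tb/s expressed in Gb/s (decimal units assumed)
processed_gbps = 80

reduction = 1 - processed_gbps / unfiltered_gbps
print(f"{reduction:.1%} less data")  # prints "99.2% less data"
```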