Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Low Latency Execution For Apache Spark
1. Low latency execution for apache spark
Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout
2. Who am I ?
PhD candidate, AMPLab UC Berkeley
Dissertation: System design for large scale machine learning
Apache Spark PMC Member. Contributions to Spark core, MLlib, SparkR
10. Execution models: parallel operators
Long-lived operators
Checkpointing based
fault tolerance
Naiad
S
H
U
F
F
L
E
S
H
U
F
F
L
E
Iteration
Task
Control Message
Driver
Network Transfer
Low Latency
15. DAG scheduling
Assign tasks to hosts using
(a) locality preferences
(b) straggler mitigation
(c) fair sharing etc.
Tasks
Host1
Host2
Driver
Host1
Host2
Serialize &
Launch
Host
Metadata
Scheduler
Same DAG structure
for many iterations
Can reuse scheduling decisions
16. Batch scheduling
Schedule a batch
of iterations at once
Fault tolerance, scheduling
at batch boundaries
1 stage in each iteration
b = 2
17. How much does this help ?
1
10
100
1000
4
8
16
32
64
128
Time/Iter(ms)
Machines
Apache Spark
Drizzle-10
Drizzle-50
Drizzle-100
Workload: Sum of 10k numbers per-core
Single Stage Job, 100 iterations – Varying Drizzle batch size
24. Shared variables
Fine-grained updates to shared data
Distributed shared variables
- MPI using MPI_AllReduce
- Bloom, CRDTs
- Parameter servers, key-value stores
- reduce followed by broadcast
25. Drizzle: Replicated shared variables
Custom merge strategies
1
Commutative updates
e.g. vector addition
Task
Shared Variable
M
E
R
G
E
1
1
1
1
2
2
2
2
30. b=1 à Batch processing
Choosing batch size
Higher overhead
Smaller window for fault tolerance
In practice: Largest batch such that overhead is below fixed threshold
e.g., For 128 machines, batch of few seconds is enough
b=N à Parallel operators
Lower overhead
Larger window for fault tolerance
31. Evaluation
Micro-benchmarks
- Single stage
- Multiple stages
- Shared variables
End-to-end experiments
- Streaming benchmarks
- Logistic Regression
Implemented on Apache Spark 1.6.1
Integrations with Spark Streaming, MLlib