Gearpump is an Akka-based real-time streaming engine that uses Actors to model everything. It offers high performance and flexibility: 18 million messages/second throughput with 8 ms latency on a cluster of 4 machines.
2. 2
What is Gearpump
• Akka-based, lightweight, real-time data processing platform
• Apache License, http://gearpump.io, version 0.7
• Built on Akka for communication, concurrency, isolation, and fault tolerance
• Simple and powerful: message-level streaming, long-running daemons
3. 3
What is Akka?
• Micro-service (Actor) oriented
• Message-driven
• Lock-free
• Location-transparent
It is like human society, which is driven by messages and scales to a population of 7 billion!
4. 4
Micro-service oriented: a higher abstraction
• Break your application into micro-services instead of objects.
• Throw away locks.
• Exchange information between micro-services with immutable, asynchronous messages instead of shared objects.
5. 5
Gearpump in Big Data Stack
[Diagram: the big-data stack: storage engine; analytics (batch and stream engines, SQL layers such as Catalyst, StreamSQL, and Impala, machine learning, GraphX, Storm, and Gearpump); visualization & management (Cloudera Manager, cluster manager, monitor/alert/notify, data exploration).]
6. 6
Why another streaming platform?
• Existing platforms do not fully meet our requirements.
• A higher abstraction, such as micro-service orientation, can largely simplify the problem.
7. 7
What we want
• Meet "The 8 Requirements of Real-Time Stream Processing" (Stonebraker et al., 2005)
• Flexible: anywhere, any size, any source, any use case; dynamic DAG; ② StreamSQL
• Volume: high throughput; ⑦ scale linearly
• Speed: ① in-stream processing with near-zero latency; ⑥ HA; ⑧ responsive
• Accuracy: exactly-once; ③ handle message loss, delay, and out-of-order arrival; ④ predictable
• Visual: easy to debug; WYSIWYG
9. 9
DAG representation and API
Low-level Graph API syntax:
Graph(A~>B~>C~>D, B~>E~>D)
[Diagram: the DAG of processors A -> B -> C -> D and B -> E -> D, with shuffle and field-grouping partitioning on the edges.]
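The chain syntax above is just a compact way to list edges. A minimal Python sketch (conceptual, not Gearpump's implementation) of turning such chains into an adjacency list:

```python
# Conceptual sketch: a Graph built from edge chains such as
# A~>B~>C~>D and B~>E~>D can be stored as an adjacency list.
def build_dag(*chains):
    """Each chain is a sequence of node names; consecutive pairs are edges."""
    adjacency = {}
    for chain in chains:
        for src, dst in zip(chain, chain[1:]):
            adjacency.setdefault(src, set()).add(dst)
            adjacency.setdefault(dst, set())
    return adjacency

dag = build_dag(["A", "B", "C", "D"], ["B", "E", "D"])
# B fans out to C and E; both paths rejoin at D.
assert dag["B"] == {"C", "E"}
assert dag["E"] == {"D"}
```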
10. 10
Architecture - Actor Hierarchy
• The client hooks in and queries state.
• Runs as a general service, optionally on YARN.
• Each application has one isolated AppMaster and uses an Actor supervision tree for error handling.
• The Master cluster has an HA design.
12. 12
Feature Highlights
• Akka / Akka Streams / Storm compatible
• Throughput: 14 million messages/s (*), 2 ms latency (*)
• Exactly-once delivery
• Dynamic DAG
• Out-of-order message handling
• Flexible DSL
• DAG visualization
• Internet of Things support
[*] Test environment: Intel® Xeon™ Processor E5-2690, 4 nodes; each node has 32 CPU cores, 64 GB memory, and a 10 GbE network. We used the default configuration of the Gearpump 0.7 release; see the backup page for more details. We used the SOL workload with a message size of 100 bytes (https://github.com/intel-hadoop/storm-benchmark); tests were carried out by the Intel team.
14. 14
Three steps to use it
1. Download binary from http://gearpump.io
2. Submit jar by UI
3. Monitor Status
15. 15
Application Submission Flow
[Diagram: four stages, each showing the Master, the Workers, and later the AppMaster and Executors.]
1. Submit a jar
2. Create the AppMaster
3. The AppMaster launches Executors on Workers
4. Executors report back to the AppMaster
Works with YARN or without YARN.
16. 16
Low level Graph API - WordCount
val context = new ClientContext()
val split = Processor[Split](splitParallelism)
val sum = Processor[Sum](sumParallelism)
val app = StreamApplication("wordCount", Graph(split ~> sum), UserConfig.empty)
val appId = context.submit(app)
context.close()

class Split(taskContext: TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
  override def onNext(msg: Message): Unit = { /* split the line into words */ }
}

class Sum(taskContext: TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
  var counts = Map.empty[String, Long] // count of words
  override def onNext(msg: Message): Unit = { /* aggregate the count per word */ }
}
17. 17
High Level DSL API - WordCount
val context = ClientContext()
val app = new StreamApp("dsl", context)
val data = "This is a good start, bingo!! bingo!!"
app.fromCollection(data.lines)
  // line => words
  .flatMap(line => line.split("\\s+"))
  // word => (word, count = 1)
  .map((_, 1))
  // (word, count1), (word, count2) => (word, count1 + count2)
  .groupByKey().sum.log
val appId = context.submit(app)
context.close()
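For readers unfamiliar with the DSL, the same word-count semantics can be mirrored in plain, single-process Python (a conceptual sketch; the real DSL runs distributed across tasks):

```python
# Plain-Python mirror of the DSL pipeline's semantics.
from collections import Counter

data = "This is a good start, bingo!! bingo!!"
# fromCollection + flatMap: lines => words
words = [w for line in data.splitlines() for w in line.split()]
# map: word => (word, 1)
pairs = [(w, 1) for w in words]
# groupByKey().sum: (word, c1), (word, c2) => (word, c1 + c2)
counts = Counter()
for word, n in pairs:
    counts[word] += n
assert counts["bingo!!"] == 2
```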
18. 18
Akka-stream API – WordCount
implicit val system = ActorSystem("akka-test")
implicit val materializer = new GearpumpMaterializer(system)
val echo = system.actorOf(Props(new Echo()))
val sink = Sink.actorRef(echo, "COMPLETE")
val source = GearSource.from[String](new CollectionDataSource(lines))
source.mapConcat { line => line.split(" ").toList }
  .groupBy2(x => x)
  .map(word => (word, 1))
  .reduce { (a, b) => (a._1, a._2 + b._2) }
  .log("word-count")
  .runWith(sink)

Available in the akkastream branch: https://github.com/gearpump/gearpump/tree/akkastream
19. 19
UI Portal – DAG Visualization (DAG page)
Tracks the global min-clock of all messages in the DAG:
• Node size reflects throughput
• Edge width represents flow rate
• A red node means something has gone wrong
20. 20
UI Portal – Processor Detail (Processor page)
Shows the data-skew distribution and per-task throughput and latency.
22. 22
Throughput and Latency
• Throughput: 11 million messages/second
• Latency: 17 ms under full load
• SOL shuffle test, 32 tasks -> 32 tasks
[*] Test environment: Intel® Xeon™ Processor E5-2680, 4 nodes; each node has 32 CPU cores, 64 GB memory, and a 10 GbE network. We used the default configuration of the Gearpump 0.2 release and the SOL workload with a message size of 100 bytes (https://github.com/intel-hadoop/storm-benchmark); tests were carried out by the Intel team.
23. 23
Scalability
• Test run on 100 nodes (*) and 3000 tasks
• Gearpump performance scales linearly up to 100 nodes
[*] We used 8 machines to simulate 100 worker nodes. Test environment: Intel® Xeon™ Processor E5-2680; each node has 32 CPU cores, 64 GB memory, and a 10 GbE network. We used the default configuration of the Gearpump 0.3.5 release and the SOL workload with a message size of 100 bytes (https://github.com/intel-hadoop/storm-benchmark); tests were carried out by the Intel team.
24. 24
Fault-Tolerance: Recovery Time
[*] Recovery time is the interval between a) the failure happening and b) all tasks in the topology resuming data processing.
91 worker nodes, 1000 tasks (we used 7 machines to simulate 91 worker nodes).
Test environment: Intel® Xeon™ Processor E5-2680; each node has 32 CPU cores, 64 GB memory, and a 10 GbE network. We used the default configuration of the Gearpump 0.3.5 release and the SOL workload (https://github.com/intel-hadoop/storm-benchmark); tests were carried out by the Intel team.
25. 25
High-Performance Messaging Layer
• An Akka remote message carries a large overhead (the full sender and receiver addresses).
• Gearpump reduces this overhead by ~95% (from ~400 bytes to ~20 bytes per message) by converting full addresses to short addresses, synced with the other executors, combined with effective batching.
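The short-address idea can be sketched as an interning table (conceptual Python with hypothetical names; Gearpump's actual wire format differs):

```python
# Sketch: intern full actor paths into small integer ids so each network
# message carries a short id (~20 bytes total) instead of a ~400-byte path.
class AddressTable:
    def __init__(self):
        self.path_to_id = {}
        self.id_to_path = []

    def to_short(self, path):
        # Assign a small id the first time a full path is seen; the table
        # itself would be synced with other executors out of band.
        if path not in self.path_to_id:
            self.path_to_id[path] = len(self.id_to_path)
            self.id_to_path.append(path)
        return self.path_to_id[path]

    def to_full(self, short_id):
        return self.id_to_path[short_id]

table = AddressTable()
sid = table.to_short("akka.tcp://app@host1:3000/user/task1")
assert table.to_full(sid) == "akka.tcp://app@host1:3000/user/task1"
```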
26. 26
Effective Batching
• Network idle: flush as fast as we can.
• Network busy: batch smartly until the network is open again.
• Effective network bandwidth is doubled for 100-byte messages.
This feature is ported from STORM-297; the test environment is the same as in STORM-297.
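The flush policy above can be modeled in a few lines (a simplified conceptual sketch; the real implementation also considers batch sizes and timers):

```python
# Sketch of the batching policy: flush immediately while the network is
# idle, and accumulate a batch while the network is busy.
class Batcher:
    def __init__(self):
        self.buffer = []
        self.sent_batches = []

    def send(self, msg, network_busy):
        self.buffer.append(msg)
        if not network_busy:
            self.flush()          # idle: flush as fast as we can

    def flush(self):
        if self.buffer:
            self.sent_batches.append(list(self.buffer))
            self.buffer.clear()

b = Batcher()
b.send("m1", network_busy=False)   # idle -> flushed alone
b.send("m2", network_busy=True)    # busy -> batched
b.send("m3", network_busy=True)
b.send("m4", network_busy=False)   # network open again -> flush the batch
assert b.sent_batches == [["m1"], ["m2", "m3", "m4"]]
```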
27. 27
High-Performance Flow Control
• Back-pressure is passed level by level through the DAG, using a sliding window between adjacent tasks.
• About ~1% throughput impact.
• No central ack nodes.
• Each level knows the network status, so it can optimize the network locally.
Another option (not used): big-loop-feedback flow control.
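A sliding-window scheme of this kind can be sketched as follows (conceptual Python, names hypothetical): each sender may only have a bounded number of unacknowledged messages in flight to a downstream task, so pressure propagates upstream level by level.

```python
# Sketch: per-channel sliding-window back-pressure between two tasks.
class Channel:
    def __init__(self, window):
        self.window = window
        self.in_flight = 0        # messages sent but not yet acknowledged

    def try_send(self, msg):
        if self.in_flight >= self.window:
            return False          # window full: back-pressure, caller waits
        self.in_flight += 1
        return True

    def on_ack(self, count=1):
        self.in_flight -= count   # acks slide the window forward

ch = Channel(window=2)
assert ch.try_send("m1") and ch.try_send("m2")
assert not ch.try_send("m3")      # pressure propagates to the upstream level
ch.on_ack()
assert ch.try_send("m3")
```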
29. 29
What is a good use case for Gearpump?
• When you want exactly-once message processing with millisecond latency.
• When you want to integrate with Akka and Akka Streams transparently.
• When you want dynamic modification of an online DAG.
• When you want to connect IoT edge devices with location transparency.
• When you want a simple scheduler to distribute custom applications, such as log collection or a distributed cron.
• When you want to use monoid state, such as a Count-Min Sketch.
Besides, it can integrate with YARN, Kafka, Storm, and more.
30. 30
IoT Transparent Cloud
Target problem: the large gap between edge devices and the data center.
• Location-transparent, with a unified programming model across the boundary.
• Part of the DAG runs on the device side, part in the data center.
Case: an intelligent traffic system with 3000 traffic lights (travel time, overspeed detection, and more).
31. 31
Exactly-once: Financial Use Cases
Target problem: both real-time processing and accuracy are important.
[Diagram: real-time data sources (stock index, crawlers, transactions, accounts, other sources) flow into processing governed by rules, producing alerts, actions, and reports, e.g. for program trading.]
32. 32
Transformers: Dynamic DAG
Target problem: there is no existing way to manipulate the DAG on the fly.
It can:
• Add, delete, or replace processors dynamically
• Change parallelism online to scale out without message loss
• Add and remove source and sink processors dynamically
Each processor can have its own independent jar.
33. 33
Eve: Online Machine Learning
Target problem: training and predicting online, for real-time decision support.
• Decide immediately based on online learning results.
[Diagram: sensor input flows through Learn, Predict, and Decide stages to the output.]
TAP analytics platform integration: http://trustedanalytics.org/
35. 35
General Ideas
• Every message in the system carries an application timestamp (its birth time): Message(timestamp).
• A min-clock service tracks the minimum timestamp of all pending messages in the system, now and in the future.
• The source is replayable, and state is kept alongside the DAG.
[Diagram: the normal flow through the replayable source, the DAG, state, and the min-clock service.]
37. 37
1. Detect message loss
2. Pause the clock at Tp when a message is lost
3. Replay from clock Tp
4. Exactly-once
38. 38
Detect Failures in Time
• AckRequest and Ack messages are used to detect message loss.
• Easy to troubleshoot: when an error happens, we know when, where, and why.
[Diagram: supervision hierarchy Master -> AppMaster -> Executor -> Task, with failures reported upward at each level.]
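The AckRequest/Ack check can be modeled roughly as a count comparison (a conceptual sketch; the real protocol is richer, with per-session sequence numbers):

```python
# Sketch: a sender counts sent messages and periodically emits an
# AckRequest; the receiver replies with how many it actually received.
# A mismatch means messages were lost in between.
class LossDetector:
    def __init__(self):
        self.sent = 0

    def record_send(self):
        self.sent += 1

    def on_ack(self, receiver_count):
        # Number of messages lost since the last reset.
        return self.sent - receiver_count

d = LossDetector()
for _ in range(5):
    d.record_send()
assert d.on_ack(receiver_count=5) == 0   # all delivered
assert d.on_ack(receiver_count=3) == 2   # two messages lost
```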
41. 41
1. Detect message loss
2. Pause the clock at Tp when a message is lost
3. Replay from clock Tp
4. Exactly-once
42. 42
Application's Clock Service
Definition: a task's min-clock is the minimum of
• the minimum timestamp of the pending messages in the current task, and
• the min-clocks of all upstream tasks.
Tasks report their min-clocks level by level, and the application clock is ever-incrementing.
[Diagram: DAG levels A through E with clocks 1000, 800, 600, 400, ordered from later to earlier.]
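The recursive definition above can be sketched directly (conceptual Python; the graph and timestamps are illustrative, not from the slides' chart):

```python
# Sketch: a task's min-clock is the minimum of its own pending-message
# timestamps and the min-clocks of all its upstream tasks.
def min_clock(task, upstreams, pending):
    own = min(pending[task], default=float("inf"))
    up = [min_clock(u, upstreams, pending) for u in upstreams.get(task, [])]
    return min([own] + up)

# Linear DAG A -> B -> C, with pending message timestamps per task.
upstreams = {"B": ["A"], "C": ["B"]}
pending = {"A": [1000], "B": [800], "C": [600]}
# C's min-clock is bounded by its own messages and everything upstream.
assert min_clock("C", upstreams, pending) == 600
pending["A"] = [400]   # an older message appears at the source
assert min_clock("C", upstreams, pending) == 400
```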
43. 43
1. Detect message loss
2. Pause the clock at Tp when a message is lost
3. Replay from clock Tp
4. Exactly-once
45. 45
Source-based Message Replay (recovery flow)
Replay from the very beginning of the source:
① The global clock service resolves the source offset for timestamp Tp.
② The source replays from that offset.
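Resolving a replay offset from Tp can be sketched as a search over a time-ordered log (conceptual; a real replayable source such as Kafka provides its own timestamp-to-offset lookup):

```python
# Sketch: given (offset, timestamp) pairs ordered by time, find the first
# offset whose timestamp is at or after the paused clock Tp.
import bisect

def resolve_offset(log, tp):
    """log: list of (offset, timestamp); return first offset with ts >= tp."""
    timestamps = [ts for _, ts in log]
    i = bisect.bisect_left(timestamps, tp)
    return log[i][0] if i < len(log) else None

log = [(0, 100), (1, 250), (2, 400), (3, 700)]
assert resolve_offset(log, 250) == 1   # replay exactly from Tp
assert resolve_offset(log, 300) == 2   # first message at or after Tp
```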
46. 46
1. Detect message loss
2. Pause the clock at Tp when a message is lost
3. Replay from clock Tp
4. Exactly-once
47. 47
Checkpoint Store
Key: ensure that State(t) contains only messages with timestamp <= t; this is what enables exactly-once message processing.
How? The DAG runtime writes to an append-only checkpoint store.
49. 49
Exactly-once Message Processing (recovery flow)
As messages flow through the streaming system, two states are kept:
• one state that accepts only messages with t < Tc, and
• one state that accepts all messages.
The first state is written to the checkpoint store, and checkpoints are recovered on failures.
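One way to model the two-state scheme (a loose conceptual sketch, not Gearpump's actual state API; names are hypothetical):

```python
# Sketch: the checkpointed state absorbs only messages with timestamp
# below the checkpoint time Tc; the live state accepts everything. On
# failure we restore the checkpoint and replay from Tc, so messages older
# than Tc are never double-counted.
class ExactlyOnceState:
    def __init__(self, tc):
        self.tc = tc
        self.checkpointed = 0     # State(t): only messages with t < Tc
        self.current = 0          # live state: accepts all messages

    def on_message(self, value, timestamp):
        self.current += value
        if timestamp < self.tc:
            self.checkpointed += value

    def recover(self):
        # On failure, restart from the checkpoint and replay from Tc.
        self.current = self.checkpointed

s = ExactlyOnceState(tc=500)
s.on_message(1, timestamp=400)
s.on_message(1, timestamp=600)
s.recover()
assert s.current == 1    # only the pre-Tc message survives recovery
```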
50. 50
How to do Dynamic DAG?
Target problem: replace processor B with B' at time Tc.
Multiple-version DAG: DAG(version = 0), A -> B -> C, transitions to DAG(version = 1), A -> B' -> C. Messages with time < Tc go to B; messages with time >= Tc go to B'. There is no message loss during the transition.
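The timestamp-based routing during the transition can be sketched as:

```python
# Sketch: during the version-0 -> version-1 transition at time Tc, each
# message is routed by its timestamp, so B and B' never both process the
# same message and none is dropped.
def route(message_time, tc):
    return "B" if message_time < tc else "B_prime"

tc = 1000
assert route(999, tc) == "B"         # old DAG (version 0)
assert route(1000, tc) == "B_prime"  # new DAG (version 1)
```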
55. 55
Latest Performance Evaluation on Gearpump 0.7
This is the latest performance test, run on version 0.7. Please see the embedded document for the configuration details.