The Hive Think Tank: Heron at Twitter

Twitter Heron In
Practice
KARTHIK RAMASAMY
@KARTHIKZ
#TwitterHeron
MAOSONG FU
@LOUIS_FUMAOSONG
BILL GRAHAM
@BILLGRAHAM

AGENDA
HERON OVERVIEW
HERON HANDS-ON

BEGIN
END
HERON
OVERVIEW
!
I
HERON IN
PRODUCTION
(II
CONCLUSION
K
V
HERON
LOAD
SHEDDING
ZIV
TALK OUTLINE
HERON
BACKPRESSURE
bIII

STORM/HERON TERMINOLOGY
TOPOLOGY
Directed acyclic graph
Vertices=computation, and edges=streams of data tuples
SPOUTS
Sources of data tuples for the topology
Examples - Kafka/Kestrel/MySQL/Postgres
BOLTS
Process incoming tuples and emit outgoing tuples
Examples - filtering/aggregation/join/arbitrary function
,
%

STORM/HERON TOPOLOGY
%
%
%
%
%
SPOUT 1
SPOUT 2
BOLT 1
BOLT 2
BOLT 3
BOLT 4
BOLT 5

WHY HERON?
PERFORMANCE PREDICTABILITY
EASE OF MANAGEABILITY
!
d
" IMPROVE DEVELOPER PRODUCTIVITY

HERON ARCHITECTURE
Topology 1
TOPOLOGY
SUBMISSION
Scheduler
Topology 2
Topology 3
Topology N

TOPOLOGY ARCHITECTURE
Topology
Master
ZK
CLUSTER
Stream
Manager
I1 I2 I3 I4
Stream
Manager
I1 I2 I3 I4
Logical Plan,
Physical Plan and
Execution State
Sync Physical Plan
CONTAINER CONTAINER
Metrics
Manager
Metrics
Manager

Large amount of data
produced every day
Large cluster Several hundred
topologies deployed
Several billion
messages every day
HERON @TWITTER
1 stage 10 stages
3x reduction in cores and memory
Heron has been in production for 2 years

HERON USE CASES
REALTIME
ETL
REAL TIME
BI
SPAM
DETECTION
REAL TIME
TRENDS
REALTIME
ML
REAL TIME
MEDIA
REAL TIME
OPS

STRAGGLERS
BAD HOST EXECUTION
SKEW
INADEQUATE
PROVISIONING
b Ñ
Stragglers are the norm in a multi-tenant distributed systems

APPROACHES TO HANDLE STRAGGLERS
SENDERS TO STRAGGLER DROP DATA
DETECT STRAGGLERS AND RESCHEDULE THEM
!
d
" SENDERS SLOW DOWN TO THE SPEED OF STRAGGLER

DROP DATA STRATEGY
UNPREDICTABLE AFFECTS
ACCURACY
POOR
VISIBILITY
b Ñ

SLOW DOWN SENDER STRATEGY
PROVIDES
PREDICTABILITY
PROCESSES
DATA AT
MAXIMUM
RATE
REDUCE
RECOVERY
TIMES
HANDLES
TEMPORARY
SPIKES
/b Ñ
back pressure

SLOW DOWN SENDER STRATEGY
% %
S1 B2 B3
%
B4
back pressure

S1 B2
B3
SLOW DOWN SENDERS STRATEGY
Stream
Manager
Stream
Manager
Stream
Manager
Stream
Manager
S1 B2
B3 B4
S1 B2
B3
S1 B2
B3 B4
B4
S1 S1
S1S1

BACK PRESSURE IN PRACTICE
Read pointer Write pointer
Manually restart the container

BACK PRESSURE IN PRACTICE
IN MOST SCENARIOS BACK PRESSURE RECOVERS
Without any manual intervention
SOMETIMES USER PREFER DROPPING OF DATA
Care about only latest data
!
d
"
SUSTAINED BACK PRESSURE
Irrecoverable GC cycles
Bad or faulty host

ENVIRONMENT'S SUPPORTED
STORM API
PRE- 1.0.0
POST 1.0.0
!
SUMMINGBIRD FOR HERON

CURIOUS TO LEARN MORE…
Twitter Heron: Stream Processing at Scale
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg,
Sailesh Mittal, Jignesh M. Patel*,1
, Karthik Ramasamy, Siddarth Taneja
@sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg,
@saileshmittal, @pateljm, @karthikz, @staneja
Twitter, Inc., *University of Wisconsin – Madison
ABSTRACT
Storm has long served as the main platform for real-time analytics
at Twitter. However, as the scale of data being processed in real-
time at Twitter has increased, along with an increase in the
diversity and the number of use cases, many limitations of Storm
have become apparent. We need a system that scales better, has
better debug-ability, has better performance, and is easier to
manage – all while working in a shared cluster infrastructure. We
considered various alternatives to meet these needs, and in the end
concluded that we needed to build a new real-time stream data
processing system. This paper presents the design and
implementation of this new system, called Heron. Heron is now
system process, which makes debugging very challenging. Thus, we
needed a cleaner mapping from the logical units of computation to
each physical process. The importance of such clean mapping for
debug-ability is really crucial when responding to pager alerts for a
failing topology, especially if it is a topology that is critical to the
underlying business model.
In addition, Storm needs dedicated cluster resources, which requires
special hardware allocation to run Storm topologies. This approach
leads to inefficiencies in using precious cluster resources, and also
limits the ability to scale on demand. We needed the ability to work
in a more flexible way with popular cluster scheduling software that
allows sharing the cluster resources across different types of data
Storm @Twitter
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni,
Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy
@ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk,
@jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog
Twitter, Inc., *University of Wisconsin – Madison
Streaming@Twitter
Maosong Fu, Sailesh Mittal, Vikas Kedigehalli, Karthik Ramasamy, Michael Barry,
Andrew Jorgensen, Christopher Kellogg, Neng Lu, Bill Graham, Jingwei Wu
Twitter, Inc.
Abstract
Twitter generates tens of billions of events per hour when users interact with it. Analyzing these
events to surface relevant content and to derive insights in real time is a challenge. To address this, we
developed Heron, a new real time distributed streaming engine. In this paper, we ﬁrst describe the design
goals of Heron and show how the Heron architecture achieves task isolation and resource reservation
to ease debugging, troubleshooting, and seamless use of shared cluster infrastructure with other critical
Twitter services. We subsequently explore how a topology self adjusts using back pressure so that the
pace of the topology goes as its slowest component. Finally, we outline how Heron implements at most
once and at least once semantics and we describe a few operational stories based on running Heron in
production.
1 Introduction

INTERESTED IN HERON?
CONTRIBUTIONS ARE WELCOME!
https://github.com/twitter/heron
http://heronstreaming.io
HERON IS OPEN SOURCED
FOLLOW US @HERONSTREAMING

QUESTIONS
and
ANSWERS
R
" Go ahead. Ask away.

Heron Hands On
BILL GRAHAM, MAOSONG FU
@BILLGRAHAM
@LOUIS_FUMAOSONG
#TwitterHeron

BEGIN
END
INSTALL
HERON
!
I
LAUNCH
TOPOLOGIES
(II
CONCLUSION
K
V
HERON
STARTER
ZIV
AGENDA OUTLINE
EXPLORE UI
bIII

HELP LINE
SLACK INVITE
heronusers.herokuapp.com
DOCUMENTATION
heronstreaming.io

HERON PREREQS
JAVA 7 OR ABOVE
PYTHON 2.7
LINUX PLATFORMS: INSTALL LIBUNWIND

INSTALL HERON CLIENT
STEP 1: DOWNLOAD CLIENT BINARIES
https://github.com/twitter/heron/releases/tag/0.14.2
heron-client-install-0.14.2-<platform>.sh
STEP 2: INSTALL CLIENT BINARIES
chmod +x heron-client-install-0.14.2-<platform>.sh
./heron-client-install-0.14.2-<platform>.sh —user

INSTALL HERON TOOLS
STEP 1: DOWNLOAD TOOLS BINARIES
https://github.com/twitter/heron/releases/tag/0.14.2
heron-tools-install-0.14.2-<platform>.sh
STEP 2: INSTALL TOOLS BINARIES
chmod +x heron-tools-install-0.14.2-<platform>.sh
./heron-tools-install-0.14.2-<platform>.sh —user

LAUNCH EXAMPLE TOPOLOGIES
STEP 1: SUBMIT TOPOLOGY
heron submit local ~/.heron/examples/heron-examples.jar
com.twitter.heron.examples.ExclamationTopology
ExclamationTopology
STEP 2: SUBMIT YET ANOTHER TOPOLOGY
com.twitter.heron.examples.AckingTopology
AckingTopology
STEP 3: SUBMIT YET ANOTHER ANOTHER TOPOLOGY
com.twitter.heron.examples.MultiStageAckingTopology
MultiStageAckingTopology

LAUNCH HERON TOOLS
STEP 1: LAUNCH HERON TRACKER
heron-tracker
STEP 2: LAUNCH HERON UI
heron-ui
http://localhost:8888
http://localhost:8889

EXPLORE UI
STEP 1: LOGICAL PLAN AND PHYSICAL PLAN
STEP 2: EXPLORE DASHBOARD
STEP 3: VIEW METRICS

EXPLORE UI
STEP 4: VIEW PROCESS LOGS
STEP 5: VIEW EXCEPTIONS
STEP 6: EXPLORE OTHER FEATURES

HERON STARTER
SEVERAL STARTER EXAMPLES
https://github.com/kramasamy/heron-starter
INTERESTING EXERCISES
Trending Twitter Words
Trending Twitter Hashtags
Find top retweeters for given hash tag

QUESTIONS?
https://groups.google.com/forum/#!forum/heron-users
http://heronstreaming.io
FOR UPDATES FOLLOW
@heronstreaming

The Hive Think Tank: Heron at Twitter

Recomendados

Recomendados

Más contenido relacionado

Destacado

Destacado (19)

Similar a The Hive Think Tank: Heron at Twitter

Similar a The Hive Think Tank: Heron at Twitter (20)

Más de The Hive

Más de The Hive (14)

Último

Último (20)

The Hive Think Tank: Heron at Twitter