1. © Hortonworks Inc. 2014
Accelerating Hadoop Data Pipelines
FifthElephant.in 2014
gopalv@apache.org
2. © Hortonworks Inc. 2014
Tez – Introduction

• Distributed execution framework targeted towards data-processing applications.
• Based on expressing a computation as a dataflow graph.
• Highly customizable to meet a broad spectrum of use cases.
• Built on top of YARN – the resource management framework for Hadoop.
• Open source Apache project and Apache licensed.
3. © Hortonworks Inc. 2014
Hadoop 1 -> Hadoop 2

HADOOP 1.0
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)
• On top: Pig (data flow), Hive (SQL), Others (Cascading)

HADOOP 2.0
• HDFS2 (redundant, reliable storage)
• YARN (cluster resource management)
• Tez (execution engine)
• On top: Pig (data flow), Hive (SQL), Others (Cascading), Batch (MapReduce), Real-Time Stream Processing (Storm), Online Data Processing (HBase, Accumulo)

Monolithic (Hadoop 1)
• Resource Management
• Execution Engine
• User API

Layered (Hadoop 2)
• Resource Management – YARN
• Execution Engine – Tez
• User API – Hive, Pig, Cascading, Your App!
4. © Hortonworks Inc. 2014
Tez – Design considerations

Don’t solve problems that have already been solved. Or you will have to solve them again!
• Leverage the discrete task-based compute model for elasticity, scalability and fault tolerance
• Leverage several man-years of work in Hadoop Map-Reduce data shuffling operations
• Leverage the proven resource sharing and multi-tenancy model for Hadoop and YARN
• Leverage built-in security mechanisms in Hadoop for privacy and isolation

Look to the Future with an eye on the Past
5. © Hortonworks Inc. 2014
Tez – Problems that it addresses

• Expressing the computation
  • Direct and elegant representation of the data processing flow
  • Interfacing with application code and new technologies
• Performance
  • Late Binding: make decisions as late as possible, using real data from runtime
  • Leverage the resources of the cluster efficiently
  • Just work out of the box!
  • Customizable engine to let applications tailor the job to meet their specific requirements
• Operational simplicity
  • Painless to operate, experiment and upgrade
6. © Hortonworks Inc. 2014
Tez – Simplifying Operations

• Tez is a pure YARN application. Easy and safe to try it out!
• No deployments to do, no servers to run
• Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
• Leverages YARN local resources.

[Diagram: two Client Machines, each with its own TezClient, submit jobs using Tez Lib 1 and Tez Lib 2 staged on HDFS; TezTasks run inside Node Managers.]
7. © Hortonworks Inc. 2014
Tez – Expressing the computation

Distributed data processing jobs typically look like DAGs (Directed Acyclic Graphs).
• Vertices in the graph represent data transformations
• Edges represent data movement from producers to consumers

[Diagram: Distributed Sort – a Preprocessor Stage feeds a Sampler (producing samples and ranges) and a Partition Stage, which feeds an Aggregate Stage; each stage runs Task-1 and Task-2.]
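The vertices-and-edges model above can be sketched with a toy graph class. This is illustrative code only, not the Tez API — `DagSketch` and its methods are hypothetical names; a topological sort simply shows one valid execution order for the stages of the Distributed Sort example.

```java
import java.util.*;

// Minimal sketch (not the Tez API): vertices are named transformations,
// edges are producer -> consumer data-movement links.
public class DagSketch {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public DagSketch addVertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    public DagSketch addEdge(String producer, String consumer) {
        addVertex(producer).addVertex(consumer);
        edges.get(producer).add(consumer);
        return this;
    }

    // Kahn's algorithm: repeatedly emit vertices with no pending producers.
    public List<String> topologicalOrder() {
        Map<String, Integer> indegree = new LinkedHashMap<>();
        edges.keySet().forEach(v -> indegree.put(v, 0));
        edges.values().forEach(cs -> cs.forEach(c -> indegree.merge(c, 1, Integer::sum)));
        Deque<String> ready = new ArrayDeque<>();
        indegree.forEach((v, d) -> { if (d == 0) ready.add(v); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String c : edges.get(v))
                if (indegree.merge(c, -1, Integer::sum) == 0) ready.add(c);
        }
        if (order.size() != edges.size())
            throw new IllegalStateException("cycle detected - not a DAG");
        return order;
    }

    public static void main(String[] args) {
        // The Distributed Sort pipeline from the slide, as a DAG.
        DagSketch dag = new DagSketch()
            .addEdge("Preprocessor", "Sampler")
            .addEdge("Preprocessor", "Partition")
            .addEdge("Sampler", "Partition")
            .addEdge("Partition", "Aggregate");
        System.out.println(dag.topologicalOrder());
        // -> [Preprocessor, Sampler, Partition, Aggregate]
    }
}
```

The acyclicity check matters: a cycle would mean a stage consumes its own (transitive) output, which the DAG model deliberately rules out.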
10. © Hortonworks Inc. 2014
Tez – Expressing the computation

Tez defines the following APIs to define the work:
• DAG API
  • Defines the structure of the data processing and the relationship between producers and consumers
  • Enables definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical DAG at runtime
  • This is how all the tasks in the job get specified
• Runtime API
  • Defines the interface using which the framework and app code interact with each other
  • App code transforms data and moves it between tasks
  • This is how we specify what actually executes in each task on the cluster nodes
11. © Hortonworks Inc. 2014
Tez – DAG API

// Define DAG
DAG dag = new DAG();
// Define Vertex
Vertex source = new Vertex(Processor.class);
// Define Edge
Edge edge = new Edge(source, destination,
    SCATTER_GATHER, PERSISTED, SEQUENTIAL,
    Output.class, Input.class);
// Connect them
dag.addVertex(source).addEdge(edge)…

Defines the global processing flow

[Diagram: map1 feeds reduce1 and map2 feeds reduce2 over Scatter_Gather (Bipartite, Sequential) edges; reduce1 and reduce2 feed join1.]
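The SCATTER_GATHER movement type on an edge can be illustrated with a toy example. This is a sketch of the data-movement idea only, not Tez internals: each producer task scatters its records into one partition per consumer task, and consumer j then gathers partition j from every producer — the classic shuffle.

```java
import java.util.*;

// Illustrative sketch of a SCATTER_GATHER edge (hypothetical code).
public class ScatterGather {
    // Scatter: split one producer's records into numConsumers partitions by key hash.
    public static List<List<String>> scatter(List<String> records, int numConsumers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numConsumers; i++) partitions.add(new ArrayList<>());
        for (String r : records)
            partitions.get(Math.floorMod(r.hashCode(), numConsumers)).add(r);
        return partitions;
    }

    // Gather: consumer j collects its partition from every producer's output.
    public static List<String> gather(List<List<List<String>>> producerOutputs, int consumer) {
        List<String> input = new ArrayList<>();
        for (List<List<String>> out : producerOutputs) input.addAll(out.get(consumer));
        return input;
    }

    public static void main(String[] args) {
        int consumers = 2;
        List<List<List<String>>> outputs = List.of(
            scatter(List.of("a", "b", "c"), consumers),
            scatter(List.of("c", "d"), consumers));
        // Every copy of "c" lands at the same consumer, whichever index that is,
        // because routing depends only on the key's hash.
        System.out.println(gather(outputs, 0));
        System.out.println(gather(outputs, 1));
    }
}
```

The invariant the test below checks is the one that makes grouping and joining correct: equal keys from different producers always converge on the same consumer task.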
12. © Hortonworks Inc. 2014
Tez – Logical DAG expansion at Runtime

[Diagram: the logical map1/map2 -> reduce1/reduce2 -> join1 DAG from the previous slide, expanded at runtime into multiple parallel tasks per vertex.]
13. © Hortonworks Inc. 2014
Tez – Library of Inputs and Outputs

Classical ‘Map’: Map Processor with HDFS Input and Sorted Output
Classical ‘Reduce’: Reduce Processor with Shuffle Input and HDFS Output
Intermediate ‘Reduce’ for Map-Reduce-Reduce: Reduce Processor with Shuffle Input and Sorted Output

• What is built in?
  – Hadoop InputFormat/OutputFormat
  – SortedGroupedPartitioned Key-Value Input/Output
  – UnsortedGroupedPartitioned Key-Value Input/Output
  – Key-Value Input/Output
14. © Hortonworks Inc. 2014
Tez – Broadcast Edge

SELECT ss.ss_item_sk, ss.ss_quantity, avg_price, inv.inv_quantity_on_hand
FROM (select avg(ss_sold_price) as avg_price, ss_item_sk, ss_quantity_sk
      from store_sales
      group by ss_item_sk) ss
JOIN inventory inv
ON (inv.inv_item_sk = ss.ss_item_sk);

Hive – MR:
• Store Sales scan. Group by and aggregation.
• Inventory and Store Sales (aggr.) output scan and shuffle join, with intermediate results materialized to HDFS between the jobs.

Hive – Tez:
• Store Sales scan. Group by and aggregation reduce the size of this input.
• Inventory scan and join, with the small aggregated side delivered over a broadcast edge.

Hive: Broadcast Join
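The broadcast-join idea can be shown with a few lines of toy Java. This is a hypothetical sketch, not Hive or Tez internals: the small aggregated side is hashed once and shipped whole to every join task, so the large side is streamed in place and never shuffled.

```java
import java.util.*;

// Illustrative sketch of a broadcast hash join (hypothetical code).
public class BroadcastJoin {
    // Build the hash table for the small (broadcast) side: item -> avg_price.
    // In the real engine this table is what the broadcast edge ships to each task.
    public static Map<Integer, Double> buildSmallSide(Map<Integer, Double> avgPriceByItem) {
        return new HashMap<>(avgPriceByItem);
    }

    // Each join task streams its slice of the big side against the broadcast table.
    public static List<String> joinTask(List<int[]> inventorySlice, Map<Integer, Double> smallSide) {
        List<String> out = new ArrayList<>();
        for (int[] row : inventorySlice) {           // row = {inv_item_sk, inv_quantity_on_hand}
            Double avgPrice = smallSide.get(row[0]); // probe the broadcast hash table
            if (avgPrice != null) out.add(row[0] + "," + avgPrice + "," + row[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Double> smallSide = buildSmallSide(Map.of(1, 9.99, 2, 4.50));
        List<String> result = joinTask(List.of(new int[]{1, 100}, new int[]{3, 7}), smallSide);
        System.out.println(result); // only item 1 matches the broadcast side
    }
}
```

This is why the group-by matters in the plan above: aggregation shrinks the store_sales side until broadcasting it is cheaper than shuffling the inventory scan.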
15. © Hortonworks Inc. 2014
Tez – Custom Edge

SELECT ss.ss_item_sk, ss.ss_quantity, inv.inv_quantity_on_hand
FROM store_sales ss
JOIN inventory inv
ON (inv.inv_item_sk = ss.ss_item_sk);

Hive – MR:
• Inventory scan (runs as a single local map task)
• Store Sales scan and join (inventory hash table read as a side file via HDFS)

Hive – Tez:
• Inventory scan (runs on the cluster, potentially more than 1 mapper)
• Store Sales scan and join (custom vertex reads both inputs – no side-file reads)
• Custom edge (routes outputs of the previous stage to the correct mappers of the next stage)

Hive: Dynamically Partitioned Hash Join
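The routing a custom edge performs here can be sketched as follows. This is hypothetical illustrative code, not the Hive/Tez implementation: both join inputs are partitioned by a hash of the join key, and partition i of the inventory side is routed to the same join task as partition i of the store-sales side — no broadcast and no side files.

```java
import java.util.*;

// Illustrative sketch of a dynamically partitioned hash join (hypothetical code).
public class PartitionedHashJoin {
    // The custom-edge routing rule: same join key -> same downstream task index.
    public static int route(int joinKey, int numTasks) {
        return Math.floorMod(Integer.hashCode(joinKey), numTasks);
    }

    // One join task: hash its inventory partition, stream its sales partition.
    public static List<String> joinTask(List<int[]> inventory, List<int[]> sales) {
        Map<Integer, Integer> hash = new HashMap<>();
        for (int[] inv : inventory) hash.put(inv[0], inv[1]); // item -> qty on hand
        List<String> out = new ArrayList<>();
        for (int[] s : sales) {                               // s = {item, quantity}
            Integer onHand = hash.get(s[0]);
            if (onHand != null) out.add(s[0] + "," + s[1] + "," + onHand);
        }
        return out;
    }

    public static void main(String[] args) {
        int tasks = 4;
        // Routing is deterministic, so matching rows always meet at one task.
        System.out.println(route(42, tasks) == route(42, tasks)); // true
        System.out.println(joinTask(
            List.of(new int[]{1, 10}), List.of(new int[]{1, 3}, new int[]{2, 5})));
    }
}
```

Because each task only hashes its own partition of the inventory, the build table can be far larger in total than anything a single broadcast copy could hold.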
16. © Hortonworks Inc. 2014
Tez – Multiple Outputs

FROM (SELECT * FROM store_sales, date_dim WHERE ss_sold_date_sk = d_date_sk and d_year = 2000)
INSERT INTO TABLE t1 SELECT distinct ss_item_sk
INSERT INTO TABLE t2 SELECT distinct ss_customer_sk;

Hive – MR:
• Map join date_dim/store_sales; materialize the join on HDFS
• Two MR jobs to do the distinct

Hive – Tez:
• Broadcast join (scan date_dim, join store sales)
• Distinct for customer + items in the same DAG

Hive: Multi-insert queries
17. © Hortonworks Inc. 2014
Tez – One to One Edge

l = LOAD ‘left’ AS (x, y);
r = LOAD ‘right’ AS (x, z);
j = JOIN l BY x, r BY x USING ‘skewed’;

Pig – MR:
• Load & Sample, then Aggregate; stage the sample map on the distributed cache via HDFS
• Partition L, then Join

Pig – Tez:
• Sample L, then Aggregate; broadcast the sample map
• Partition L and Partition R; pass through input via a 1-1 edge
• Join

Pig: Skewed Join
18. © Hortonworks Inc. 2014
Tez – Bringing it all together

TPCDS – Query-27 with Hive on Tez
• Tez Session populates the container pool
• Dimension table calculation and HDFS split generation in parallel
• Dimension tables broadcasted to Hive MapJoin tasks
• Final Reducer pre-launched and fetches completed inputs
19. © Hortonworks Inc. 2014
Tez – Performance

• Benefits of expressing the data processing as a DAG
  • Reducing overheads and queuing effects
  • Gives the system the global picture for better planning
• Efficient use of resources
  • Re-use resources to maximize utilization
  • Pre-launch, pre-warm and cache
  • Locality & resource aware scheduling
• Support for application-defined DAG modifications at runtime for optimized execution
  • Change task concurrency
  • Change task scheduling
  • Change DAG edges
  • Change DAG vertices
20. © Hortonworks Inc. 2014
Tez – Benefits of DAG execution

• Faster Execution and Higher Predictability
  – Eliminate the replicated write barrier between successive computations.
  – Eliminate the job-launch overhead of workflow jobs.
  – Eliminate the extra stage of map reads in every workflow job.
  – Eliminate the queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.
  – Better locality because the engine has the global picture.

[Chart: Pig/Hive on MR vs. Pig/Hive on Tez]
21. © Hortonworks Inc. 2014
Tez – Container Re-Use

• Reuse YARN containers/JVMs to launch new tasks
• Reduce scheduling and launching delays
• Share in-memory data across tasks
• JVM JIT-friendly execution

[Diagram: the Tez Application Master in one YARN container sends Start Task / Task Done messages to a TezTask Host (a YARN container/JVM) that runs TezTask1 and TezTask2 over SharedObjects.]
22. © Hortonworks Inc. 2014
Tez – Sessions

• Standard concepts of pre-launch and pre-warm applied
• Key for interactive queries
• Represents a connection between the user and the cluster
• Multiple DAGs/queries executed in the same AM
• Containers re-used across queries
• Takes care of data locality and releasing resources when idle

[Diagram: a Client starts a Session and submits DAGs to the Application Master, which holds a Task Scheduler, a Container Pool, a Shared Object Registry and pre-warmed JVMs.]
24. © Hortonworks Inc. 2014
Tez – Customizable Core Engine

• Vertex Manager
  • Determines task parallelism
  • Determines when tasks in a vertex can start
• DAG Scheduler
  • Determines priority of tasks
• Task Scheduler
  • Allocates containers from YARN and assigns them to tasks

[Diagram: the Vertex Manager starts vertices and tasks in Vertex-1 and Vertex-2; the DAG Scheduler answers Get Priority requests; the Task Scheduler answers Get Container requests.]
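One thing a vertex manager can do with its "determines task parallelism" role is sketched below. This is hypothetical code, not the real plugin API: instead of fixing a downstream vertex's task count when the DAG is submitted, pick it at runtime from the observed size of upstream output (the late-binding idea from slide 5).

```java
// Illustrative sketch of runtime parallelism choice (hypothetical code).
public class ParallelismSketch {
    // Choose a task count so each task handles roughly targetBytesPerTask,
    // clamped to [1, maxTasks].
    public static int chooseParallelism(long upstreamOutputBytes, long targetBytesPerTask, int maxTasks) {
        long tasks = (upstreamOutputBytes + targetBytesPerTask - 1) / targetBytesPerTask; // ceiling divide
        return (int) Math.max(1, Math.min(tasks, maxTasks));
    }

    public static void main(String[] args) {
        long observed = 3L * 1024 * 1024 * 1024; // 3 GB actually produced upstream
        long target = 256L * 1024 * 1024;        // aim for ~256 MB per downstream task
        System.out.println(chooseParallelism(observed, target, 999)); // 12
    }
}
```

A static plan would have had to guess this number before any data was read; deciding it once real output sizes are known is what keeps small jobs from launching hundreds of near-empty tasks.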
25. © Hortonworks Inc. 2014
Tez – Theory to Practice

• In theory, there is no difference between theory and practice.
• But, in practice, there is.
28. Tez – iterative algorithms

• Pig can do iterative algorithms on top of Tez
• This uses heavy-weight iteration (for-loop + map)
• Future work for faster loop-unrolled, out-of-order iteration
• 1-1 edges between loops allow building morsel-style parallelism

[Chart: k-means, time in secs at 10, 50 and 100 iterations, MR vs. Tez – speedups of 14.84x, 13.12x and 5.37x.]

* Source code at http://hortonworks.com/blog/new-apache-pig-features-part-2-embedding
29. © Hortonworks Inc. 2014
Tez – Designed for big, busy clusters

• Number of stages in the DAG
  • The higher the number of stages in the DAG, the better the performance of Tez (over MR) will be.
• Cluster/queue capacity
  • The more congested a queue is, the better the performance of Tez (over MR) will be, due to container reuse.
• Size of intermediate output
  • The larger the intermediate output, the better the performance of Tez (over MR) will be, due to reduced HDFS usage (cross-rack traffic).
• Size of data in the job
  • For smaller data and more stages, the performance of Tez (over MR) will be better, as the percentage of launch overhead in the total time is high for smaller jobs.
• Move workloads from gateway boxes to the cluster
  • Move as much work as possible to the cluster by modelling it via the job DAG. Exploit the parallelism and resources of the cluster.
30. © Hortonworks Inc. 2014
Tez – what if you can’t get enough containers?

• 78 vertices + 8374 tasks on 50 YARN containers
31. © Hortonworks Inc. 2014
Tez – Adoption

• Hive
  • Hadoop standard for declarative access via a SQL-like interface
• Pig
  • Hadoop standard for procedural scripting and pipeline processing
• Cascading
  • Developer-friendly Java API and SDK
  • Scalding (Scala API on Cascading)
• Commercial Vendors
  • ETL: use Tez instead of MR or custom pipelines
  • Analytics vendors: use Tez as a target platform for scaling parallel analytical tools to large data-sets
32. © Hortonworks Inc. 2014
Tez – Roadmap

• Richer DAG support
  – Addition of vertices at runtime
  – Shared edges for shared outputs
  – Enhanced Input/Output collections
• Performance optimizations
  – Improve throughput at high concurrency
  – Improve locality-aware scheduling (co-scheduling)
  – Add framework-level data statistics
  – HDFS memory storage integration
• Usability
  – Stability and testability
  – API ease of use
  – Tools for performance analysis and debugging
33. © Hortonworks Inc. 2014
Tez – Community

• Early adopters and code contributors welcome
  – Adopters to drive more scenarios. Contributors to make them happen.
• Technical blog series
  – http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing
• Useful links
  – Work tracking: https://issues.apache.org/jira/browse/TEZ
  – Code: https://github.com/apache/tez
  – Developer list: dev@tez.apache.org
  – User list: user@tez.apache.org
  – Issues list: issues@tez.apache.org