1. © Hortonworks Inc. 2014
Accelerating Hadoop Data Pipelines
FifthElephant.in 2014
gopalv@apache.org
2. © Hortonworks Inc. 2014
Tez – Introduction

• Distributed execution framework targeted towards data-processing applications.
• Based on expressing a computation as a dataflow graph.
• Highly customizable to meet a broad spectrum of use cases.
• Built on top of YARN – the resource management framework for Hadoop.
• Open source Apache project and Apache licensed.
3. © Hortonworks Inc. 2014
Hadoop 1 -> Hadoop 2

HADOOP 1.0
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)
• On top: Pig (data flow), Hive (SQL), Others (Cascading)

HADOOP 2.0
• HDFS2 (redundant, reliable storage)
• YARN (cluster resource management)
• Tez (execution engine)
• On top: Pig (data flow), Hive (SQL), Others (Cascading), Batch (MapReduce), Real-Time Stream Processing (Storm), Online Data Processing (HBase, Accumulo)

Monolithic (Hadoop 1)
• Resource Management
• Execution Engine
• User API

Layered (Hadoop 2)
• Resource Management – YARN
• Execution Engine – Tez
• User API – Hive, Pig, Cascading, Your App!
4. © Hortonworks Inc. 2014
Tez – Design considerations

Don’t solve problems that have already been solved. Or you will have to solve them again!
• Leverage the discrete task-based compute model for elasticity, scalability and fault tolerance
• Leverage several man-years of work in Hadoop Map-Reduce data shuffling operations
• Leverage the proven resource sharing and multi-tenancy model for Hadoop and YARN
• Leverage built-in security mechanisms in Hadoop for privacy and isolation

Look to the Future with an eye on the Past
5. © Hortonworks Inc. 2014
Tez – Problems that it addresses

• Expressing the computation
  • Direct and elegant representation of the data processing flow
  • Interfacing with application code and new technologies
• Performance
  • Late Binding: make decisions as late as possible, using real data from runtime
  • Leverage the resources of the cluster efficiently
  • Just work out of the box!
  • Customizable engine to let applications tailor the job to meet their specific requirements
• Operational simplicity
  • Painless to operate, experiment and upgrade
6. © Hortonworks Inc. 2014
Tez – Simplifying Operations

• Tez is a pure YARN application. Easy and safe to try it out!
• No deployments to do, no servers to run
• Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
• Leverages YARN local resources.

[Diagram: two Client Machines, each with its own TezClient, submit jobs using Tez Lib 1 and Tez Lib 2 staged on HDFS; TezTasks run inside Node Managers.]
7. © Hortonworks Inc. 2014
Tez – Expressing the computation

Distributed data processing jobs typically look like DAGs (Directed Acyclic Graphs).
• Vertices in the graph represent data transformations
• Edges represent data movement from producers to consumers

[Diagram: Distributed Sort – a Preprocessor Stage feeds a Sampler (producing samples and ranges) and a Partition Stage, which feeds an Aggregate Stage; each stage runs Task-1 and Task-2.]
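The vertices-and-edges model above can be sketched with a toy graph class. This is illustrative code only, not the Tez API — `DagSketch` and its methods are hypothetical names; a topological sort simply shows one valid execution order for the stages of the Distributed Sort example.

```java
import java.util.*;

// Minimal sketch (not the Tez API): vertices are named transformations,
// edges are producer -> consumer data-movement links.
public class DagSketch {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public DagSketch addVertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    public DagSketch addEdge(String producer, String consumer) {
        addVertex(producer).addVertex(consumer);
        edges.get(producer).add(consumer);
        return this;
    }

    // Kahn's algorithm: repeatedly emit vertices with no pending producers.
    public List<String> topologicalOrder() {
        Map<String, Integer> indegree = new LinkedHashMap<>();
        edges.keySet().forEach(v -> indegree.put(v, 0));
        edges.values().forEach(cs -> cs.forEach(c -> indegree.merge(c, 1, Integer::sum)));
        Deque<String> ready = new ArrayDeque<>();
        indegree.forEach((v, d) -> { if (d == 0) ready.add(v); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String c : edges.get(v))
                if (indegree.merge(c, -1, Integer::sum) == 0) ready.add(c);
        }
        if (order.size() != edges.size())
            throw new IllegalStateException("cycle detected - not a DAG");
        return order;
    }

    public static void main(String[] args) {
        // The Distributed Sort pipeline from the slide, as a DAG.
        DagSketch dag = new DagSketch()
            .addEdge("Preprocessor", "Sampler")
            .addEdge("Preprocessor", "Partition")
            .addEdge("Sampler", "Partition")
            .addEdge("Partition", "Aggregate");
        System.out.println(dag.topologicalOrder());
        // -> [Preprocessor, Sampler, Partition, Aggregate]
    }
}
```

The acyclicity check matters: a cycle would mean a stage consumes its own (transitive) output, which the DAG model deliberately rules out.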
10. © Hortonworks Inc. 2014
Tez – Expressing the computation

Tez defines the following APIs to define the work:
• DAG API
  • Defines the structure of the data processing and the relationship between producers and consumers
  • Enables definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical DAG at runtime
  • This is how all the tasks in the job get specified
• Runtime API
  • Defines the interface using which the framework and app code interact with each other
  • App code transforms data and moves it between tasks
  • This is how we specify what actually executes in each task on the cluster nodes
11. © Hortonworks Inc. 2014
Tez – DAG API

// Define DAG
DAG dag = new DAG();
// Define Vertex
Vertex source = new Vertex(Processor.class);
// Define Edge
Edge edge = new Edge(source, destination,
    SCATTER_GATHER, PERSISTED, SEQUENTIAL,
    Output.class, Input.class);
// Connect them
dag.addVertex(source).addEdge(edge)…

Defines the global processing flow

[Diagram: map1 feeds reduce1 and map2 feeds reduce2 over Scatter_Gather (Bipartite, Sequential) edges; reduce1 and reduce2 feed join1.]
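The SCATTER_GATHER movement type on an edge can be illustrated with a toy example. This is a sketch of the data-movement idea only, not Tez internals: each producer task scatters its records into one partition per consumer task, and consumer j then gathers partition j from every producer — the classic shuffle.

```java
import java.util.*;

// Illustrative sketch of a SCATTER_GATHER edge (hypothetical code).
public class ScatterGather {
    // Scatter: split one producer's records into numConsumers partitions by key hash.
    public static List<List<String>> scatter(List<String> records, int numConsumers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numConsumers; i++) partitions.add(new ArrayList<>());
        for (String r : records)
            partitions.get(Math.floorMod(r.hashCode(), numConsumers)).add(r);
        return partitions;
    }

    // Gather: consumer j collects its partition from every producer's output.
    public static List<String> gather(List<List<List<String>>> producerOutputs, int consumer) {
        List<String> input = new ArrayList<>();
        for (List<List<String>> out : producerOutputs) input.addAll(out.get(consumer));
        return input;
    }

    public static void main(String[] args) {
        int consumers = 2;
        List<List<List<String>>> outputs = List.of(
            scatter(List.of("a", "b", "c"), consumers),
            scatter(List.of("c", "d"), consumers));
        // Every copy of "c" lands at the same consumer, whichever index that is,
        // because routing depends only on the key's hash.
        System.out.println(gather(outputs, 0));
        System.out.println(gather(outputs, 1));
    }
}
```

The invariant the test below checks is the one that makes grouping and joining correct: equal keys from different producers always converge on the same consumer task.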
12. © Hortonworks Inc. 2014
Tez – Logical DAG expansion at Runtime

[Diagram: the logical map1/map2 -> reduce1/reduce2 -> join1 DAG from the previous slide, expanded at runtime into multiple parallel tasks per vertex.]
13. © Hortonworks Inc. 2014
Tez – Library of Inputs and Outputs

Classical ‘Map’: Map Processor with HDFS Input and Sorted Output
Classical ‘Reduce’: Reduce Processor with Shuffle Input and HDFS Output
Intermediate ‘Reduce’ for Map-Reduce-Reduce: Reduce Processor with Shuffle Input and Sorted Output

• What is built in?
  – Hadoop InputFormat/OutputFormat
  – SortedGroupedPartitioned Key-Value Input/Output
  – UnsortedGroupedPartitioned Key-Value Input/Output
  – Key-Value Input/Output
14. © Hortonworks Inc. 2014
Tez – Broadcast Edge

SELECT ss.ss_item_sk, ss.ss_quantity, avg_price, inv.inv_quantity_on_hand
FROM (select avg(ss_sold_price) as avg_price, ss_item_sk, ss_quantity_sk
      from store_sales
      group by ss_item_sk) ss
JOIN inventory inv
ON (inv.inv_item_sk = ss.ss_item_sk);

Hive – MR:
• Store Sales scan. Group by and aggregation.
• Inventory and Store Sales (aggr.) output scan and shuffle join, with intermediate results materialized to HDFS between the jobs.

Hive – Tez:
• Store Sales scan. Group by and aggregation reduce the size of this input.
• Inventory scan and join, with the small aggregated side delivered over a broadcast edge.

Hive: Broadcast Join
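The broadcast-join idea can be shown with a few lines of toy Java. This is a hypothetical sketch, not Hive or Tez internals: the small aggregated side is hashed once and shipped whole to every join task, so the large side is streamed in place and never shuffled.

```java
import java.util.*;

// Illustrative sketch of a broadcast hash join (hypothetical code).
public class BroadcastJoin {
    // Build the hash table for the small (broadcast) side: item -> avg_price.
    // In the real engine this table is what the broadcast edge ships to each task.
    public static Map<Integer, Double> buildSmallSide(Map<Integer, Double> avgPriceByItem) {
        return new HashMap<>(avgPriceByItem);
    }

    // Each join task streams its slice of the big side against the broadcast table.
    public static List<String> joinTask(List<int[]> inventorySlice, Map<Integer, Double> smallSide) {
        List<String> out = new ArrayList<>();
        for (int[] row : inventorySlice) {           // row = {inv_item_sk, inv_quantity_on_hand}
            Double avgPrice = smallSide.get(row[0]); // probe the broadcast hash table
            if (avgPrice != null) out.add(row[0] + "," + avgPrice + "," + row[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Double> smallSide = buildSmallSide(Map.of(1, 9.99, 2, 4.50));
        List<String> result = joinTask(List.of(new int[]{1, 100}, new int[]{3, 7}), smallSide);
        System.out.println(result); // only item 1 matches the broadcast side
    }
}
```

This is why the group-by matters in the plan above: aggregation shrinks the store_sales side until broadcasting it is cheaper than shuffling the inventory scan.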
15. © Hortonworks Inc. 2014
Tez – Custom Edge

SELECT ss.ss_item_sk, ss.ss_quantity, inv.inv_quantity_on_hand
FROM store_sales ss
JOIN inventory inv
ON (inv.inv_item_sk = ss.ss_item_sk);

Hive – MR:
• Inventory scan (runs as a single local map task)
• Store Sales scan and join (inventory hash table read as a side file via HDFS)

Hive – Tez:
• Inventory scan (runs on the cluster, potentially more than 1 mapper)
• Store Sales scan and join (custom vertex reads both inputs – no side-file reads)
• Custom edge (routes outputs of the previous stage to the correct mappers of the next stage)

Hive: Dynamically Partitioned Hash Join
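The routing a custom edge performs here can be sketched as follows. This is hypothetical illustrative code, not the Hive/Tez implementation: both join inputs are partitioned by a hash of the join key, and partition i of the inventory side is routed to the same join task as partition i of the store-sales side — no broadcast and no side files.

```java
import java.util.*;

// Illustrative sketch of a dynamically partitioned hash join (hypothetical code).
public class PartitionedHashJoin {
    // The custom-edge routing rule: same join key -> same downstream task index.
    public static int route(int joinKey, int numTasks) {
        return Math.floorMod(Integer.hashCode(joinKey), numTasks);
    }

    // One join task: hash its inventory partition, stream its sales partition.
    public static List<String> joinTask(List<int[]> inventory, List<int[]> sales) {
        Map<Integer, Integer> hash = new HashMap<>();
        for (int[] inv : inventory) hash.put(inv[0], inv[1]); // item -> qty on hand
        List<String> out = new ArrayList<>();
        for (int[] s : sales) {                               // s = {item, quantity}
            Integer onHand = hash.get(s[0]);
            if (onHand != null) out.add(s[0] + "," + s[1] + "," + onHand);
        }
        return out;
    }

    public static void main(String[] args) {
        int tasks = 4;
        // Routing is deterministic, so matching rows always meet at one task.
        System.out.println(route(42, tasks) == route(42, tasks)); // true
        System.out.println(joinTask(
            List.of(new int[]{1, 10}), List.of(new int[]{1, 3}, new int[]{2, 5})));
    }
}
```

Because each task only hashes its own partition of the inventory, the build table can be far larger in total than anything a single broadcast copy could hold.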
16. © Hortonworks Inc. 2014
Tez – Multiple Outputs

FROM (SELECT * FROM store_sales, date_dim WHERE ss_sold_date_sk = d_date_sk and d_year = 2000)
INSERT INTO TABLE t1 SELECT distinct ss_item_sk
INSERT INTO TABLE t2 SELECT distinct ss_customer_sk;

Hive – MR:
• Map join date_dim/store_sales; materialize the join on HDFS
• Two MR jobs to do the distinct

Hive – Tez:
• Broadcast join (scan date_dim, join store sales)
• Distinct for customer + items in the same DAG

Hive: Multi-insert queries
17. © Hortonworks Inc. 2014
Tez – One to One Edge

l = LOAD ‘left’ AS (x, y);
r = LOAD ‘right’ AS (x, z);
j = JOIN l BY x, r BY x USING ‘skewed’;

Pig – MR:
• Load & Sample, then Aggregate; stage the sample map on the distributed cache via HDFS
• Partition L, then Join

Pig – Tez:
• Sample L, then Aggregate; broadcast the sample map
• Partition L and Partition R; pass through input via a 1-1 edge
• Join

Pig: Skewed Join
18. © Hortonworks Inc. 2014
Tez – Bringing it all together

TPCDS – Query-27 with Hive on Tez
• Tez Session populates the container pool
• Dimension table calculation and HDFS split generation in parallel
• Dimension tables broadcasted to Hive MapJoin tasks
• Final Reducer pre-launched and fetches completed inputs
19. © Hortonworks Inc. 2014
Tez – Performance

• Benefits of expressing the data processing as a DAG
  • Reducing overheads and queuing effects
  • Gives the system the global picture for better planning
• Efficient use of resources
  • Re-use resources to maximize utilization
  • Pre-launch, pre-warm and cache
  • Locality & resource aware scheduling
• Support for application-defined DAG modifications at runtime for optimized execution
  • Change task concurrency
  • Change task scheduling
  • Change DAG edges
  • Change DAG vertices
20. © Hortonworks Inc. 2014
Tez – Benefits of DAG execution

• Faster Execution and Higher Predictability
  – Eliminate the replicated write barrier between successive computations.
  – Eliminate the job-launch overhead of workflow jobs.
  – Eliminate the extra stage of map reads in every workflow job.
  – Eliminate the queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.
  – Better locality because the engine has the global picture.

[Chart: Pig/Hive on MR vs. Pig/Hive on Tez]
21. © Hortonworks Inc. 2014
Tez – Container Re-Use

• Reuse YARN containers/JVMs to launch new tasks
• Reduce scheduling and launching delays
• Share in-memory data across tasks
• JVM JIT-friendly execution

[Diagram: the Tez Application Master in one YARN container sends Start Task / Task Done messages to a TezTask Host (a YARN container/JVM) that runs TezTask1 and TezTask2 over SharedObjects.]
22. © Hortonworks Inc. 2014
Tez – Sessions

• Standard concepts of pre-launch and pre-warm applied
• Key for interactive queries
• Represents a connection between the user and the cluster
• Multiple DAGs/queries executed in the same AM
• Containers re-used across queries
• Takes care of data locality and releasing resources when idle

[Diagram: a Client starts a Session and submits DAGs to the Application Master, which holds a Task Scheduler, a Container Pool, a Shared Object Registry and pre-warmed JVMs.]
24. © Hortonworks Inc. 2014
Tez – Customizable Core Engine

• Vertex Manager
  • Determines task parallelism
  • Determines when tasks in a vertex can start
• DAG Scheduler
  • Determines priority of tasks
• Task Scheduler
  • Allocates containers from YARN and assigns them to tasks

[Diagram: the Vertex Manager starts vertices and tasks in Vertex-1 and Vertex-2; the DAG Scheduler answers Get Priority requests; the Task Scheduler answers Get Container requests.]
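One thing a vertex manager can do with its "determines task parallelism" role is sketched below. This is hypothetical code, not the real plugin API: instead of fixing a downstream vertex's task count when the DAG is submitted, pick it at runtime from the observed size of upstream output (the late-binding idea from slide 5).

```java
// Illustrative sketch of runtime parallelism choice (hypothetical code).
public class ParallelismSketch {
    // Choose a task count so each task handles roughly targetBytesPerTask,
    // clamped to [1, maxTasks].
    public static int chooseParallelism(long upstreamOutputBytes, long targetBytesPerTask, int maxTasks) {
        long tasks = (upstreamOutputBytes + targetBytesPerTask - 1) / targetBytesPerTask; // ceiling divide
        return (int) Math.max(1, Math.min(tasks, maxTasks));
    }

    public static void main(String[] args) {
        long observed = 3L * 1024 * 1024 * 1024; // 3 GB actually produced upstream
        long target = 256L * 1024 * 1024;        // aim for ~256 MB per downstream task
        System.out.println(chooseParallelism(observed, target, 999)); // 12
    }
}
```

A static plan would have had to guess this number before any data was read; deciding it once real output sizes are known is what keeps small jobs from launching hundreds of near-empty tasks.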
25. © Hortonworks Inc. 2014
Tez – Theory to Practice

• In theory, there is no difference between theory and practice.
• But, in practice, there is.
28. Tez – iterative algorithms

• Pig can do iterative algorithms on top of Tez
• This uses heavy-weight iteration (for-loop + map)
• Future work for faster loop-unrolled, out-of-order iteration
• 1-1 edges between loops allow building morsel-style parallelism

[Chart: k-means, time in secs at 10, 50 and 100 iterations, MR vs. Tez – speedups of 14.84x, 13.12x and 5.37x.]

* Source code at http://hortonworks.com/blog/new-apache-pig-features-part-2-embedding
29. © Hortonworks Inc. 2014
Tez – Designed for big, busy clusters

• Number of stages in the DAG
  • The higher the number of stages in the DAG, the better the performance of Tez (over MR) will be.
• Cluster/queue capacity
  • The more congested a queue is, the better the performance of Tez (over MR) will be, due to container reuse.
• Size of intermediate output
  • The larger the intermediate output, the better the performance of Tez (over MR) will be, due to reduced HDFS usage (cross-rack traffic).
• Size of data in the job
  • For smaller data and more stages, the performance of Tez (over MR) will be better, as the percentage of launch overhead in the total time is high for smaller jobs.
• Move workloads from gateway boxes to the cluster
  • Move as much work as possible to the cluster by modelling it via the job DAG. Exploit the parallelism and resources of the cluster.
30. © Hortonworks Inc. 2014
Tez – what if you can’t get enough containers?

• 78 vertices + 8374 tasks on 50 YARN containers
31. © Hortonworks Inc. 2014
Tez – Adoption

• Hive
  • Hadoop standard for declarative access via a SQL-like interface
• Pig
  • Hadoop standard for procedural scripting and pipeline processing
• Cascading
  • Developer-friendly Java API and SDK
  • Scalding (Scala API on Cascading)
• Commercial Vendors
  • ETL: use Tez instead of MR or custom pipelines
  • Analytics vendors: use Tez as a target platform for scaling parallel analytical tools to large data-sets
32. © Hortonworks Inc. 2014
Tez – Roadmap

• Richer DAG support
  – Addition of vertices at runtime
  – Shared edges for shared outputs
  – Enhanced Input/Output collections
• Performance optimizations
  – Improve throughput at high concurrency
  – Improve locality-aware scheduling (co-scheduling)
  – Add framework-level data statistics
  – HDFS memory storage integration
• Usability
  – Stability and testability
  – API ease of use
  – Tools for performance analysis and debugging
33. © Hortonworks Inc. 2014
Tez – Community

• Early adopters and code contributors welcome
  – Adopters to drive more scenarios. Contributors to make them happen.
• Technical blog series
  – http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing
• Useful links
  – Work tracking: https://issues.apache.org/jira/browse/TEZ
  – Code: https://github.com/apache/tez
  – Developer list: dev@tez.apache.org
  – User list: user@tez.apache.org
  – Issues list: issues@tez.apache.org