Apache Tez : Accelerating Hadoop Query Processing

Jeff Markham
Technical Director, APAC
Hortonworks


Tez – Introduction
• Distributed execution framework targeted towards data-processing applications.
• Based on expressing a computation as a dataflow graph.
• Built on top of YARN – the resource management framework for Hadoop.
• Open source Apache Incubator project, Apache licensed.

© Hortonworks Inc. 2013


YARN: Taking Hadoop Beyond Batch
[Figure: Hadoop stack, before and after YARN]
• HADOOP 1.0 (MapReduce as Base): Pig (data flow), Hive (sql) and Others (cascading) run on MapReduce (cluster resource management & data processing), which runs on HDFS (redundant, reliable storage).
• HADOOP 2.0 (Apache Tez as Base): Batch (MapReduce); Data Flow (Pig), SQL (Hive) and Others (cascading) on Tez (execution engine); Online Data Processing (HBase, Accumulo); Real Time Stream Processing (Storm). All run on YARN (cluster resource management), which runs on HDFS2 (redundant, reliable storage).

Apache Tez (“Speed”)
• Replaces MapReduce as primitive for Pig, Hive, Cascading etc.
– Lower latency for interactive queries
– Higher throughput for batch queries
– 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft
• Task with pluggable Input, Processor and Output
– Tez Task - <Input, Processor, Output>
• YARN ApplicationMaster to run DAG of Tez Tasks
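
The <Input, Processor, Output> triple can be illustrated with a minimal Python sketch. Every name here (`Task`, `list_input`, `upper_processor`, `sorted_output`) is hypothetical and chosen for illustration; none of it is the Tez API, which is Java.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Task:
    """Illustrative task: three pluggable pieces wired together."""
    read_input: Callable[[], Iterable[str]]             # the Input
    process: Callable[[Iterable[str]], Iterable[str]]   # the Processor
    write_output: Callable[[Iterable[str]], None]       # the Output

    def run(self) -> None:
        self.write_output(self.process(self.read_input()))


def list_input(records: List[str]) -> Callable[[], Iterable[str]]:
    # Hypothetical input that reads from an in-memory list.
    return lambda: iter(records)


def upper_processor(records: Iterable[str]) -> Iterable[str]:
    # Hypothetical processor: upper-cases every record.
    return (r.upper() for r in records)


def sorted_output(sink: List[str]) -> Callable[[Iterable[str]], None]:
    # Hypothetical output that writes records in sorted order.
    def write(records: Iterable[str]) -> None:
        sink.extend(sorted(records))
    return write


sink: List[str] = []
task = Task(list_input(["b", "a"]), upper_processor, sorted_output(sink))
task.run()
```

Swapping any one of the three callables changes the task without touching the other two, which is the point of the pluggable model.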

Tez: Building blocks for scalable data processing
• Classical ‘Map’: HDFS Input → Map Processor → Sorted Output
• Classical ‘Reduce’: Shuffle Input → Reduce Processor → HDFS Output
• Intermediate ‘Reduce’ for Map-Reduce-Reduce: Shuffle Input → Reduce Processor → Sorted Output

Hive-on-MR vs. Hive-on-Tez
Tez avoids unneeded writes to HDFS

SELECT a.x, AVG(b.y) AS avg_y
FROM a JOIN b ON (a.id = b.id) GROUP BY a.x
UNION SELECT x, AVG(y) AS avg_y
FROM c GROUP BY x
ORDER BY avg_y;

[Figure: execution plans for Hive – MR and Hive – Tez, drawn as graphs of map (M) and reduce (R) stages. Stage labels: SELECT a.state; SELECT b.id; SELECT a.state, c.itemId; JOIN (a, c) SELECT c.price; JOIN(a, b) GROUP BY a.state COUNT(*) AVERAGE(c.price). HDFS boxes mark where intermediate results are written between stages.]

Tez Sessions
… because Map/Reduce query startup is expensive
• Tez Sessions
– Hot containers ready for immediate use
– Removes task and job launch overhead (~5s – 30s)

• Hive
– Session launch/shutdown in background (seamless, user not aware)
– Submits query plan directly to Tez Session

Native Hadoop service, not ad-hoc

Tez Delivers Interactive Query - Out of the Box!
Feature | Description | Benefit
Tez Session | Overcomes Map-Reduce job-launch latency by pre-launching Tez AppMaster | Latency
Tez Container Pre-Launch | Overcomes Map-Reduce latency by pre-launching hot containers ready to serve queries. | Latency
Tez Container Re-Use | Finished maps and reduces pick up more work rather than exiting. Reduces latency and eliminates difficult split-size tuning. Out of box performance! | Latency
Runtime re-configuration of DAG | Runtime query tuning by picking aggregation parallelism using online query statistics | Throughput
Tez In-Memory Cache | Hot data kept in RAM for fast access. | Latency
Complex DAGs | Tez Broadcast Edge and Map-Reduce-Reduce pattern improve query scale and throughput. | Throughput

Tez – Design Themes
• Empowering End Users
• Execution Performance



Tez – Empowering End Users
• Expressive dataflow definition APIs
• Flexible Input-Processor-Output runtime model
• Data type agnostic
• Simplifying deployment



• Expressive dataflow definition APIs
– Enable definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical plan at runtime.
– Targeted towards data processing applications like Hive/Pig but not limited to them. Hive/Pig query plans naturally map to Tez dataflow graphs with no translation impedance.
[Figure: a dataflow graph expanded at runtime into tasks TaskA-1, TaskA-2, TaskB-1, TaskB-2, TaskC-1, TaskC-2, TaskD-1, TaskD-2, TaskE-1, TaskE-2]
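
As a rough illustration of a graph connection API and of expanding a logical plan into tasks, here is a small Python sketch. The `Dag` class and its methods are hypothetical, and the edges between vertices A to E are assumed for the example because the slide text does not state them. The real Tez API is Java and differs in detail.

```python
from collections import defaultdict, deque


class Dag:
    """Hypothetical logical plan: named vertices with a task count, plus edges."""

    def __init__(self):
        self.parallelism = {}            # vertex name -> number of tasks
        self.edges = defaultdict(list)   # producer vertex -> consumer vertices

    def add_vertex(self, name, parallelism):
        self.parallelism[name] = parallelism
        return self

    def add_edge(self, producer, consumer):
        self.edges[producer].append(consumer)
        return self

    def topological_order(self):
        # Kahn's algorithm: repeatedly emit vertices with no pending producers.
        indegree = {v: 0 for v in self.parallelism}
        for consumers in self.edges.values():
            for c in consumers:
                indegree[c] += 1
        ready = deque(v for v in self.parallelism if indegree[v] == 0)
        order = []
        while ready:
            v = ready.popleft()
            order.append(v)
            for c in self.edges[v]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        if len(order) != len(self.parallelism):
            raise ValueError("graph has a cycle")
        return order

    def expand(self):
        # Expand each logical vertex into its physical tasks.
        return [f"Task{v}-{i}"
                for v in self.topological_order()
                for i in range(1, self.parallelism[v] + 1)]


dag = Dag()
for name in "ABCDE":
    dag.add_vertex(name, parallelism=2)
# Assumed edges, for illustration only.
dag.add_edge("A", "B").add_edge("A", "D").add_edge("B", "C").add_edge("D", "E")
tasks = dag.expand()
```

The user describes only five vertices and four edges; the ten physical tasks come from the expansion step.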

[Figure: Distributed Sort. The Preprocessor Stage (Task-1, Task-2) sends Samples to a Sampler, which produces Ranges for the Partition Stage (Task-1, Task-2), followed by the Aggregate Stage (Task-1, Task-2).]
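
The distributed sort on this slide follows a common sample-then-range-partition pattern. The sketch below runs it in a single process, assuming integer records; the function names and the quantile-based choice of split points are illustrative assumptions, not taken from the deck.

```python
import bisect
import random


def sample_stage(partitions, per_partition, rng):
    # Preprocessor stage: each task contributes a few sample records.
    samples = []
    for part in partitions:
        samples.extend(rng.sample(part, min(per_partition, len(part))))
    return samples


def choose_ranges(samples, num_reducers):
    # Sampler: pick num_reducers - 1 split points at evenly spaced quantiles.
    # Assumes at least one sample was collected.
    ordered = sorted(samples)
    return [ordered[len(ordered) * i // num_reducers]
            for i in range(1, num_reducers)]


def partition_stage(partitions, split_points):
    # Partition stage: route each record to the bucket owning its range.
    buckets = [[] for _ in range(len(split_points) + 1)]
    for part in partitions:
        for value in part:
            buckets[bisect.bisect_right(split_points, value)].append(value)
    return buckets


def aggregate_stage(buckets):
    # Aggregate stage: each task sorts its own range independently.
    return [sorted(b) for b in buckets]


rng = random.Random(0)
data = list(range(100))
rng.shuffle(data)
partitions = [data[:50], data[50:]]
samples = sample_stage(partitions, per_partition=10, rng=rng)
split_points = choose_ranges(samples, num_reducers=4)
buckets = partition_stage(partitions, split_points)
sorted_runs = aggregate_stage(buckets)
result = [v for run in sorted_runs for v in run]
```

Because the buckets cover disjoint, ordered ranges, concatenating the sorted runs gives a globally sorted result without a final merge.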

• Flexible Input-Processor-Output runtime model
– Construct physical runtime executors dynamically by connecting different inputs, processors and outputs.
– End goal is to have a library of inputs, outputs and processors that can be programmatically composed to generate useful tasks.

[Figure: tasks composed from a library of inputs, processors and outputs]
– Mapper: HDFSInput → MapProcessor → FileSortedOutput
– Reducer: ShuffleInput → ReduceProcessor → HDFSOutput
– PairwiseJoin: Input1, Input2 → JoinProcessor → FileSortedOutput

• Data type agnostic
– Tez is only concerned with the movement of data: files and streams of bytes.
– Does not impose any data format on the user application. MR applications can use Key-Value pairs on top of Tez. Hive and Pig can use tuple oriented formats that are natural and native to them.

[Figure: a Tez Task moves Bytes to and from a File or Stream; User Code interprets the bytes as Key Value pairs or as Tuples]
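
To make the "bytes only" idea concrete, the following Python sketch pushes two different user formats through the same byte-moving function. `move_bytes` is a hypothetical stand-in for the framework, and both encodings (length-prefixed key-value, JSON tuples) are assumptions picked for the example.

```python
import json
import struct


def move_bytes(payload: bytes) -> bytes:
    """Stand-in for the framework: it moves opaque bytes and nothing else."""
    return bytes(payload)


# User code, style 1: key-value pairs (as an MR-style application might use).
def encode_kv(key: str, value: int) -> bytes:
    k = key.encode("utf-8")
    return struct.pack(">I", len(k)) + k + struct.pack(">q", value)


def decode_kv(data: bytes):
    (klen,) = struct.unpack_from(">I", data, 0)
    key = data[4:4 + klen].decode("utf-8")
    (value,) = struct.unpack_from(">q", data, 4 + klen)
    return key, value


# User code, style 2: tuples (as a Hive- or Pig-style application might use).
def encode_tuple(row: tuple) -> bytes:
    return json.dumps(list(row)).encode("utf-8")


def decode_tuple(data: bytes) -> tuple:
    return tuple(json.loads(data.decode("utf-8")))


kv = decode_kv(move_bytes(encode_kv("state", 42)))
row = decode_tuple(move_bytes(encode_tuple(("CA", 7, 1.5))))
```

The transport function never inspects the payload, so adding a third format requires no change to it.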

• Simplifying deployment
– Tez is a completely client-side application.
– No deployments to do. Simply upload to any accessible FileSystem and change local Tez configuration to point to that.
– Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
– Leverages YARN local resources.
[Figure: Tez Lib 1 and Tez Lib 2 stored side by side on HDFS; each Client Machine runs a TezClient, and each Node Manager runs a TezTask]

• Simplifying usage
– With great API power comes great responsibility :)
– Tez is a framework on which end user applications can be built

Tez – Execution Performance
• Performance gains over Map Reduce
• Optimal resource management
• Plan reconfiguration at runtime
• Dynamic physical data flow decisions



• Performance gains over Map Reduce
– Eliminate replicated write barrier between successive computations.
– Eliminate job launch overhead of workflow jobs.
– Eliminate extra stage of map reads in every workflow job.
– Eliminate queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.

[Figure: Pig/Hive - MR plan compared with Pig/Hive - Tez plan]

• Plan reconfiguration at runtime
– Dynamic runtime concurrency control based on data size, user operator resources, available cluster resources and locality.
– Advanced changes in dataflow graph structure.
– Progressive graph construction in concert with user optimizer.

[Figure: Stage 1 reads HDFS Blocks with 50 maps and writes 100 partitions; Stage 2 is planned with 100 reducers. At runtime there is only 10 GB of data, so Stage 2 is reconfigured from 100 to 10 reducers, taking YARN Resources into account.]
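
The 100-to-10 reducer example can be worked through in a short Python sketch. The target of 1 GB per reducer is an assumption made so the arithmetic matches the slide's numbers; the deck does not say how Tez picks the value, and the function names are hypothetical.

```python
import math

GB = 1024 ** 3


def choose_reducers(total_bytes, bytes_per_reducer, max_reducers):
    # Size the stage from the observed data volume, capped by the planned maximum.
    wanted = math.ceil(total_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


def assign_partitions(num_partitions, num_reducers):
    # Give each reducer a contiguous run of the already-written partitions.
    assignment = [[] for _ in range(num_reducers)]
    for p in range(num_partitions):
        assignment[p * num_reducers // num_partitions].append(p)
    return assignment


# Numbers from the slide: 100 partitions, 100 planned reducers, 10 GB of data.
reducers = choose_reducers(10 * GB, bytes_per_reducer=1 * GB, max_reducers=100)
assignment = assign_partitions(100, reducers)
```

Stage 1 still writes 100 partitions; only the grouping of those partitions onto Stage 2 tasks changes, so the maps do not need to be re-run.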

• Optimal resource management
– Reuse YARN containers to launch new tasks.
– Reuse YARN containers to enable shared objects across tasks.

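[Figure: the Tez Application Master sends Start Task to a TezTask Host running inside a YARN Container; after Task Done it sends Start Task again to the same container. TezTask1 and TezTask2 run in turn and use the container's Shared Objects.]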
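
Container reuse with shared objects can be sketched as follows. `Container`, `load_dimension_table` and the registry keys are hypothetical names for this illustration; the sketch only shows why a second task in the same container can skip expensive setup.

```python
class Container:
    """Stand-in for a YARN container that outlives a single task."""

    launches = 0  # how many containers were started in total

    def __init__(self):
        Container.launches += 1
        self.shared_objects = {}  # survives across tasks in this container

    def run(self, task):
        return task(self.shared_objects)


def load_dimension_table(shared):
    # Build the table once per container; later tasks find it already there.
    if "dim_table" not in shared:
        shared["dim_table"] = {1: "CA", 2: "NY"}
        shared["loads"] = shared.get("loads", 0) + 1
    return shared["dim_table"]


def make_task(key):
    def task(shared):
        return load_dimension_table(shared)[key]
    return task


container = Container()
results = [container.run(make_task(k)) for k in (1, 2, 1)]
```

Three tasks ran, but only one container was launched and the table was built once.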

• Dynamic physical data flow decisions
– Decide the type of physical byte movement and storage on the fly.
– Store intermediate data on distributed store, local store or in-memory.
– Transfer bytes via blocking files or streaming and the spectrum in between.
[Figure: the choice is made At Runtime. A Producer with small output hands data to its Consumer In-Memory; otherwise the Producer writes a Local File that the Consumer reads.]
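
A minimal Python sketch of choosing the transport at runtime is shown below. The 1024-byte threshold, the `publish`/`consume` names and the temp-file layout are all assumptions for illustration; the deck does not describe how Tez makes this decision internally.

```python
import os
import tempfile

IN_MEMORY_LIMIT = 1024  # bytes; arbitrary threshold chosen for this sketch


def publish(data: bytes, limit: int = IN_MEMORY_LIMIT):
    """Producer side: pick the transport based on the actual output size."""
    if len(data) <= limit:
        return ("memory", data)
    fd, path = tempfile.mkstemp(prefix="intermediate-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return ("file", path)


def consume(handle) -> bytes:
    """Consumer side: read from whichever transport the producer chose."""
    kind, ref = handle
    if kind == "memory":
        return ref
    with open(ref, "rb") as f:
        data = f.read()
    os.remove(ref)  # intermediate file is no longer needed
    return data


small = publish(b"x" * 10)
large = publish(b"y" * 5000)
kinds = (small[0], large[0])
roundtrip_ok = (consume(small) == b"x" * 10 and consume(large) == b"y" * 5000)
```

The consumer code is the same in both cases; only the handle returned by the producer differs.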

Tez – Sessions
•  Key for interactive queries
•  Analogous to database sessions and represents a connection between the user and the cluster
•  Run multiple DAGs / queries in the same session
•  Maintains a pool of reusable containers for low latency execution of tasks within and across queries
•  Takes care of data locality and releasing resources when idle
•  Session cache in the Application Master and in the container pool reduces re-computation and re-initialization

[Figure: the Client sends Start Session and Submit DAG to the Application Master, which holds the Task Scheduler and a Container Pool of Pre-Warmed JVMs, each with a Shared Object Registry]

Tez – Benchmark Performance

Significant (but not all) speed-ups due to Tez:
•  DAG support and runtime graph reconfiguration enable utilizing the parallelism of the cluster
•  Tez Session and container re-use enable efficient and low latency execution

Tez – Performance Analysis
[Figure: TPC-DS – Query 27 with Hive on Tez, showing the AM and its tasks with these annotations]
• Tez Session populates container pool
• Dimension table calculation and HDFS split generation in parallel
• Dimension tables broadcast to Hive MapJoin tasks
• Final Reducer pre-launched and fetches completed inputs

Tez – Current status
• Apache Incubator Project
– Rapid development. Over 600 JIRAs opened. Over 400 resolved.
– Growing community of contributors and users.

• Focus on stability
– Testing and quality are highest priority.
– Code ready and deployed on multi-node environments.

• Support for a vast topology of DAGs
– Already functionally equivalent to MapReduce. Existing MapReduce jobs can be executed on Tez with few or no changes.
– Hive re-targeted to use Tez for execution of queries (HIVE-4660).
– Work started on Pig to use Tez for execution of scripts (PIG-3446).


Tez – Roadmap
• Richer DAG support
– Support for co-scheduling and streaming
– Better fault tolerance with checkpoints

• Performance optimizations
– More efficiencies in transfer of data
– Improve session performance

• Usability
– Stability and testability
– Recovery and history
– Tools for performance analysis and debugging



Tez – Key Takeaways
• Distributed execution framework that works on computations represented as dataflow graphs
• Naturally maps to execution plans produced by query optimizers
• Customizable execution architecture designed to enable dynamic performance optimizations at runtime
• Works out of the box with the platform figuring out the hard stuff
• Spans the spectrum of interactive latency to batch
• Open source Apache project – your use-cases and code are welcome
• It works and is already being used by Hive and Pig

Thank You !

