LLAP: Sub-Second Analytical Queries in Hive

© Hortonworks Inc. 2011 – 2015. All Rights Reserved
LLAP: Sub-Second Analytical Queries in Hive
Sergey Shelukhin, Siddharth Seth

LLAP: long-lived execution in Hive
What is LLAP?+
+ LLAP in Hive; performance primer+
+ LLAP as a unified data access layer+
+ Deploying and managing LLAP+
+ Future work+
+
+ How does LLAP make queries faster?+

What is LLAP?

What is LLAP?
• Hybrid model combining daemons and containers for
fast, concurrent execution of analytical workloads
(e.g. Hive SQL queries)
• Concurrent queries without specialized YARN queue setup
• Multi-threaded execution of vectorized operator pipelines
• Asynchronous IO and efficient in-memory caching
• Relational view of the data available thru the API
• High performance scans, execution code pushdown
• Centralized data security
Node
LLAP Process
Cache
Query Fragment
HDFS
Query Fragment

What LLAP isn't
• Not another "execution engine" (like Tez, MR, Spark…)
• Tez, Spark (work in progress) can work on top of LLAP
– Engines provide coordination and scheduling
– some work (e.g. large shuffles) can be scheduled in containers
• Not a storage substrate
• Daemons are stateless and read (and cache) data from HDFS/S3/Azure/..

LLAP
Queue
Technical overview – execution
• LLAP daemon has a number of executors (think
containers) that execute work "fragments"
• Fragments are parts of one, or multiple parallel
workloads (e.g. Hive SQL queries)
• Work queue with pluggable priority
• Geared towards low latency queries over long-
running queries (by default)
• I/O is similar to containers – read/write to HDFS,
shuffle, other storages and formats
• Streaming output for data API
Executor
Q1 Map 1
Executor
External read
Executor
Q3 Reducer 3
Q1 Map 1
Q1 Map 1
Q3 Map 19
HDFS
Waiting for
shuffle inputs
HBase
Container
(shuffle input)
Spark
executor

Executor
Technical overview – IO layer
• Optional when executing inside LLAP
• Wraps existing InputFormat
• Currently only ORC is supported
• Asynchronous IO for Hive
• Transparent, compressed in-memory cache
• Format-specific, extensible
• SSD cache is work in progress
RecordReade
r
Fragment
Cache
IO thread
Plan & decode
Read, decompress
Metadata cache
Actual data (HDFS, S3, …)
What to read Data buffers
Splits Vectorized data

LLAP in Hive; performance primer

Overview - Hive + LLAP
• Transparent to Hive users, BI tools, etc.
• Except the queries become faster :)
• Number of concurrent queries throttled
by Hive Server
• Hive decides where query fragments run
(LLAP, Container, AM) based on
configuration, data size, format, etc.
• Each Query coordinated independently by
a Tez AM
• Hive Operators used for processing
• Tez Runtime used for data transfer
HiveServer2
Query/AM
Controller
Client(s) YARN Cluster
AM1
llapd
llapd
Container AM1
Container AM1
llapd
Container AM2
AM2
AM3
llapd

How does LLAP speed up the queries?
• Eliminates container startup costs
• JIT optimizer has a chance to work
• Especially important for vectorized processing
• Asynchronous IO
• Caching
• Data sharing (hash join tables, etc.)
• Optimizations for parallel queries

Industry benchmark – 10Tb scale
0
5
10
15
20
25
30
35
40
45
50
query3 query12 query20 query21 query26 query27 query42 query52 query55 query73 query89 query91 query98
Time(seconds)
LLAP vs Hive 1.x 10TB Scale
Hive 1.x LLAP90% faster

Evaluation for a real use case
0
20000
1 3 5 7 9 11 13 15 17 19 21 23 25 27
• Aggregate daily statistics for a time interval:
SELECT yyyymmdd,
sum(total_1),
sum(total_2),
...
from table
where yyyymmdd >= xxx
and yyyymmdd < xxx
and userid = xxx
group by userid, yyyymmdd;
Max
Max
Max
Avg
Avg Avg
0
1
2
3
4
5
6
7
8
D W M Y D W M Y D W M Y
Tez Phoenix LLAP
Executiontime,s

Evaluation for a real use case
ExecutionTime,s.
• Display a large report
• Phoenix runtimes not displayed (30-250s
on large ranges)
Execution Time in seconds over time range
Max
Max
Avg
Avg
0
5
10
15
20
D W M Y D W M Y
Tez LLAP
Executiontime,s

How does LLAP make queries faster?

Persistent daemons (recap)
• Reduced startup time
• Better use of JIT optimizer
• Hash join hash tables, fragment plans are shared
• Multiple tasks do not all generate the table or deserialize the plans

Performance – first query
• Cold LLAP is nearly as fast as
shared pre-warmed containers
(impractical on real clusters)
• Realistic (long-running) LLAP
~3x faster than realistic (no
prewarm) Tez
No
prewarm,
20.94
Shared
prewarm
11.96
Cold LLAP
13.23
Realistic
LLAP
7.49
0
5
10
15
20
25
Firstqueryruntime,s

Performance – repeated query
• Cache disabled!
13.23
7.93 7.7
6.29
5.44 5.3 5.01 4.89 4.83
7.49
6.1
4.61 4.59
4.93
4.62 4.63 4.45 4.25
0
2
4
6
8
10
12
14
0 1 2 3 4 5 6 7 8
Queryruntime,s
Query run
Cold LLAP
LLAP after an
unrelated workload

Parallel queries – priorities, preemption
• Lower-priority fragments can be preempted
• For example, a fragment can start running before its inputs are
ready, for better pipelining; such fragments may be preempted
• LLAP work queue examines the DAG parameters to give
preference to interactive (BI) queries
LLAP
QueueExecutor
Executor
Interactive
query map 1/3
…
Interactive
query map 3/3
Executor
Interactive
query map 2/3
Wide query
reduce waiting
Time
ClusterUtilization
Long-Running Query
Short-Running Query

Parallel query execution – LLAP vs Hive 1.2
3131
2416
1741
1420
1508
531
291
147
94 75
100%
91% 90%
71%
44%
45%
13%
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
0
500
1000
1500
2000
2500
3000
3500
1 2 4 8 16
Scaling(comparedtolinear)
Totalruntimeforallqueries,s.
Number of concurrent users
Hive 1.2 linear scaling
Hive 1.2 total runtime
LLAP linear scaling
LLAP total runtime
Hive+LLAP efficiency
Hive 1.2 efficiency

Parallel query execution – 10Tb scale
531
291
147
94
75
100%
91% 90%
71%
44%
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
0
100
200
300
400
500
600
1 2 4 8 16
Scaling(comparedtolinear)
Totalruntimeforallqueries,s.
Number of concurrent users
LLAP linear scaling
LLAP total runtime
Hive+LLAP efficiency

IO elevator and caching
• Reading, decoding and processing are parallel
• Unlike in regular Hive, where processing can wait for e.g. an S3 read
• Logically-compresses (e.g. ORC-encoded) data is cached off-heap
• Metadata is cached (currently on-heap) with high priority
• Replacement policy is pluggable (LRFU is the default)
• Tez schedules work to preferred location to improve locality
• Tradeoff between HDFS reads and operator speed
• Cache takes memory away from operators (sort buffers, hash join tables, etc.)
• Depends on how much memory you have, workflow, dataset size, etc.
• Ex. 4Gb per executor x 16 CPUs – 64Gb for executors, rest for cache

I/OExecution
In-memory processing – present and future
Decoder
Distributed FS
Fragment
Hive
operator
Hive
operator
Vectorized
processin
g
col1 col2
Low-level pushdown
(e.g. filter)*
SSD cache*
Off-heap cache Compression
codec
Native data
vectors
- work in
progress
*
Compact encoded data
Compressed data
col1
col2

Fragment
I/OExecution
In-memory processing – future?
Distributed FSHive
operator
Hive
operator
col1 col2
SSD cache*
Off-heap cache Compression
codec
- work in
progress
*
Compact encoded data
Compressed data
Processing on
encoded
data, codegen
Zero copy
- next
steps?
*
Low-level operators
(e.g. filter) *

Performance – cache on slow FS
• Industry-standard cloud FS,
much slower than on-prem
HDFS
• Large scan on 1Tb scale
• Local HD/SDD cache (WIP)
• Massive improvement on slow
storage with little memory cost
0
50
100
150
200
250
300
350
400
Cold cache Hot cache
Runtime,s
Entire query
Scan portion

Performance – cache on HDFS, 1Tb scale
20.2
18.3
17.6
16.2 16.2
16.8
14.5
11.3
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
0
5
10
15
20
25
24 26 28 30 32 34 36 38 40 42
Cachehitrate
Queryruntime,s
Cache size, Gb
Query runtime, after
unrelated workload
Query runtime, after
related workload
Metadata hit rate
Data hit rate

Perf overview – a demo in a gif

LLAP as the unified data access layer

Overview
• LLAP can provide a "relational datanode" view of the data
• Security via endpoint ACLs, or fragment signing (for external engines)
• Granular, e.g. column-level, security is possible
• Anyone (with access) can push the (approved) code in, from complex query
fragments to simple data reads
• SparkSQL/LLAP integration is based on SparkSQL data source APIs
• DataFrame can be created with LlapInputFormat
• Hive execution engines and other DAG executors can use LLAP directly
• like Tez does now

Example - SparkSQL integration
• Allowing direct access to Hive
table data from Spark breaks
security model for warehouses
secured using Ranger or SQL
Standard Authorization
• Other features may not work
correctly unless support added in
Spark
• Row/Cell level security (when
implemented), ACID, Schema
Evolution, …
• SparkSQL has interfaces for external
sources; they can be implemented to
access Hive data via HS2/LLAP
• SparkSQL Catalyst optimizer hooks
can be used to push processing into
LLAP (e.g. filters on the tables)
• Some optimizations, like dynamic partition
pruning, can be used by Hive even if SparkSQL
doesn't support them; if execution pushdown
is advanced enough

Example - SparkSQL integration – execution flow
Page 30
HadoopRDD.compute()
LlapInputFormat.getRecordReader()
- Open socket for incoming data
- Send package/plan to LLAP
Check permissions
Generate splits w/LLAP locations
Return securely signed splits
HiveServer2
Spark Executor
HadoopRDD.getPartitions()
Get Hive splits
Cluster nodes
Verify splits
Run query plan
Send data back
LLAP
Partitions/
Splits
Request
splits
: :
var llapContext =
LlapContext.newInstance(
sparkContext, jdbcUrl)
var df: DataFrame =
llapContext.sql("select *
from tpch_text_5.region")
DataFrame for Hive/LLAP data

Example - Tez/LLAP integration
• Tez DAG coordination remains mostly unchanged
• Tez plugins (TEZ-2003)
• Scheduling – allows scheduling parts of a DAG as fragments on LLAP
• Task communication – custom execution specifications, protocols – allows
talking to LLAP, manages security tokens, etc.
• Hive operators used for processing
• Few or no changes necessary to support any Hive query complexity
• All new Hive features automatically supported

Deploying and managing LLAP

Availability
• First version shipped in Apache Hive 2.0
• Hive 2.0.1 contains important bugfixes to make it production ready
• Hive 2.1 would have new features and further perf improvements
• A solid platform for the basic scenario – run LLAP-only queries
on a secure cluster
• or a fraction thereof
• Work in progress to support additional features (ACID, UDFs,
hybrid deployments, etc.)
• Unified data access layer is the next step

General overview
• LLAP daemons run on YARN; deployed via Apache Slider
• Easy to bring up, tear down, flex clusters via slider commands
• Hive-on-Tez should be set up with a compatible version (0.8.2+)
• Zookeeper for service discovery and coordination
• On HS2, using doAs is not recommended; use SQL auth or Ranger
• However, you can keep using storage-based auth for other queries, as usual
• Java 8 strongly recommended
• April 2015 called and they want their Java 7 End-of-Lived

Starting the cluster
• Apache Ambari integration – WIP (will do all of this for you)
• Hive service (hive --service llap)
• Generates a slider package, and a script to deploy it and start the cluster
• Run as the correct user – slider paths are user-specific!
• HADOOP_HOME, JAVA_HOME should be set
• kinit on secure cluster
• Specify a name, e.g. --name llap0; number of instances (e.g. one per machine);
memory, cache size, number of executors (e.g. one per CPU); see --help
• G1 GC recommended (e.g. --args " -XX:+UseG1GC -XX:+ResizeTLAB -
XX:-ResizePLAB")

Monitoring
• LLAP exposes a UI for
monitoring
• Also has jmx endpoint with
much more data, logs and
jstack endpoints as usual
• Aggregate monitoring UI is
work in progress

Running queries
• Once the cluster is started, queries can be run from HS2 and CLI
• --hiveconf hive.execution.mode=llap --hiveconf hive.llap.execution.mode=all
--hiveconf hive.llap.io.enabled=true
--hiveconf hive.llap.daemon.service.hosts=@<cluster name>
• Set the usual perf settings (recommended, if not already set)
• Enable vectorization, PPD, map join; use ORC
• LLAP-specific: set hive.tez.input.generate.consistent.splits=true;
• For interactive queries, disable CBO
• Parallel compilation on HS2 – otherwise your parallel queries might bottleneck there!
• Good to go!

Watching queries – Tez UI integration

Security
• Again, Ambari will do this for you soon
• SQL or Ranger auth recommended, with hive user running queries
• LLAP uses keytabs and ACLs, similar to datanode, to secure endpoints
• Keytabs need to be set up; certs may need to be set up for Web UI
• HS2 (or client) obtains a token to access a particular LLAP cluster, and
gives it to Tez AM
• LLAP-s share tokens using secure ZK, similar to HA token sharing
• Signing fragments on HS2 and granular security checks are work in
progress

Summary and future work

Future work
• Data view for external services
• Column-level security
• Better integration and deployment
• Ambari, monitoring UI, fine-grained log aggregation
• Hybrid deployment
• Resizing LLAP to accommodate containers, YARN resource delegation, etc.
• Feature work
• ACID, text cache, better cache policies, UDFs, SSD cache, etc.
• More performance work

Summary
• LLAP is a new execution substrate for fast, concurrent
analytical workloads, harnessing Hive vectorized SQL
engine and efficient in-memory caching layer
• Provides secure relational view of the data thru a
standard InputFormat API
• Available in Hive 2, with better ecosystem integration in
the works

Questions?
?
Interested? Stop by the Hortonworks booth to learn more

Backup slides

Example execution: MR vs Tez vs Tez+LLAP
M M M
R R
M M
R
M M
R
M M
R
HDFS
HDFS
HDFS
T T T
T T
R
T T
T
R
M M M
R R
R
M M
R
R
HDFS
MapReduce Tez Tez with LLAP

IO Elevator
• Reading, decoding and
processing are parallel
• Unlike in regular Hive
• HDFS reads are expensive
(more so on S3, Azure)
• Data decompression and
decoding is expensive

LLAP: Sub-Second Analytical Queries in Hive

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a LLAP: Sub-Second Analytical Queries in Hive

Similar a LLAP: Sub-Second Analytical Queries in Hive (20)

Más de DataWorks Summit/Hadoop Summit

Más de DataWorks Summit/Hadoop Summit (20)

Último

Último (20)

LLAP: Sub-Second Analytical Queries in Hive

Notas del editor