Enviar búsqueda
Cargar
LLAP: Sub-Second Analytical Queries in Hive
•
Descargar como PPTX, PDF
•
20 recomendaciones
•
6,870 vistas
DataWorks Summit/Hadoop Summit
Seguir
LLAP: Sub-Second Analytical Queries in Hive
Leer menos
Leer más
Tecnología
Denunciar
Compartir
Denunciar
Compartir
1 de 46
Descargar ahora
Recomendados
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
gluent.
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
LLAP Nov Meetup
LLAP Nov Meetup
t3rmin4t0r
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
Llap: Locality is Dead
Llap: Locality is Dead
t3rmin4t0r
Recomendados
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
gluent.
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
LLAP Nov Meetup
LLAP Nov Meetup
t3rmin4t0r
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
Llap: Locality is Dead
Llap: Locality is Dead
t3rmin4t0r
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
DataWorks Summit/Hadoop Summit
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
The Apache Software Foundation
Tune up Yarn and Hive
Tune up Yarn and Hive
rxu
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
DataWorks Summit
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
Optimizing Hive Queries
Optimizing Hive Queries
DataWorks Summit
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
DataWorks Summit/Hadoop Summit
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
Chris Nauroth
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Mich Talebzadeh (Ph.D.)
What's new in hadoop 3.0
What's new in hadoop 3.0
Heiko Loewe
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Optimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit/Hadoop Summit
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
DataWorks Summit
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
DataWorks Summit
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
DataWorks Summit
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Más contenido relacionado
La actualidad más candente
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
DataWorks Summit/Hadoop Summit
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
The Apache Software Foundation
Tune up Yarn and Hive
Tune up Yarn and Hive
rxu
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
DataWorks Summit
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
Optimizing Hive Queries
Optimizing Hive Queries
DataWorks Summit
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
DataWorks Summit/Hadoop Summit
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
Chris Nauroth
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Mich Talebzadeh (Ph.D.)
What's new in hadoop 3.0
What's new in hadoop 3.0
Heiko Loewe
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Optimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit/Hadoop Summit
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
DataWorks Summit
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
DataWorks Summit
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
DataWorks Summit
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
La actualidad más candente
(20)
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
Tune up Yarn and Hive
Tune up Yarn and Hive
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Optimizing Hive Queries
Optimizing Hive Queries
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
What's new in hadoop 3.0
What's new in hadoop 3.0
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
Optimizing Hive Queries
Optimizing Hive Queries
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Similar a LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
LLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
Data Con LA
LLAP: Building Cloud First BI
LLAP: Building Cloud First BI
DataWorks Summit
Hive acid and_2.x new_features
Hive acid and_2.x new_features
Alberto Romero
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
Yifeng Jiang
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
alanfgates
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
DataWorks Summit
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
DataWorks Summit
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
DataWorks Summit
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
An Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
DataWorks Summit
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
alanfgates
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
DataWorks Summit
Similar a LLAP: Sub-Second Analytical Queries in Hive
(20)
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
LLAP: Building Cloud First BI
LLAP: Building Cloud First BI
Hive acid and_2.x new_features
Hive acid and_2.x new_features
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
An Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Más de DataWorks Summit/Hadoop Summit
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
Hadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
Data Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
Apache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
Dataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
Schema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
Más de DataWorks Summit/Hadoop Summit
(20)
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
Hadoop Crash Course
Hadoop Crash Course
Data Science Crash Course
Data Science Crash Course
Apache Spark Crash Course
Apache Spark Crash Course
Dataflow with Apache NiFi
Dataflow with Apache NiFi
Schema Registry - Set you Data Free
Schema Registry - Set you Data Free
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
HBase in Practice
HBase in Practice
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Último
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
SeasiaInfotech2
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
comworks
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
UiPathCommunity
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
Stephanie Beckett
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
Alfredo García Lavilla
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
charlottematthew16
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
Kalema Edgar
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
Hervé Boutemy
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
RankYa
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
Rizwan Syed
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
2toLead Limited
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
Lorenzo Miniero
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Fwdays
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
Manik S Magar
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
Enterprise Knowledge
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
gvaughan
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Patryk Bandurski
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Fwdays
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
BookNet Canada
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Zilliz
Último
(20)
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
LLAP: Sub-Second Analytical Queries in Hive
1.
Page1 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP: Sub-Second Analytical Queries in Hive Sergey Shelukhin, Siddharth Seth
2.
Page2 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP: long-lived execution in Hive What is LLAP?+ + LLAP in Hive; performance primer+ + LLAP as a unified data access layer+ + Deploying and managing LLAP+ + Future work+ + + How does LLAP make queries faster?+
3.
Page3 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved What is LLAP?
4.
Page4 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved What is LLAP? • Hybrid model combining daemons and containers for fast, concurrent execution of analytical workloads (e.g. Hive SQL queries) • Concurrent queries without specialized YARN queue setup • Multi-threaded execution of vectorized operator pipelines • Asynchronous IO and efficient in-memory caching • Relational view of the data available thru the API • High performance scans, execution code pushdown • Centralized data security Node LLAP Process Cache Query Fragment HDFS Query Fragment
5.
Page5 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved What LLAP isn't • Not another "execution engine" (like Tez, MR, Spark…) • Tez, Spark (work in progress) can work on top of LLAP – Engines provide coordination and scheduling – some work (e.g. large shuffles) can be scheduled in containers • Not a storage substrate • Daemons are stateless and read (and cache) data from HDFS/S3/Azure/..
6.
Page6 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP Queue Technical overview – execution • LLAP daemon has a number of executors (think containers) that execute work "fragments" • Fragments are parts of one, or multiple parallel workloads (e.g. Hive SQL queries) • Work queue with pluggable priority • Geared towards low latency queries over long- running queries (by default) • I/O is similar to containers – read/write to HDFS, shuffle, other storages and formats • Streaming output for data API Executor Q1 Map 1 Executor External read Executor Q3 Reducer 3 Q1 Map 1 Q1 Map 1 Q3 Map 19 HDFS Waiting for shuffle inputs HBase Container (shuffle input) Spark executor
7.
Page7 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Executor Technical overview – IO layer • Optional when executing inside LLAP • Wraps existing InputFormat • Currently only ORC is supported • Asynchronous IO for Hive • Transparent, compressed in-memory cache • Format-specific, extensible • SSD cache is work in progress RecordReade r Fragment Cache IO thread Plan & decode Read, decompress Metadata cache Actual data (HDFS, S3, …) What to read Data buffers Splits Vectorized data
8.
Page8 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP in Hive; performance primer
9.
Page9 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Overview - Hive + LLAP • Transparent to Hive users, BI tools, etc. • Except the queries become faster :) • Number of concurrent queries throttled by Hive Server • Hive decides where query fragments run (LLAP, Container, AM) based on configuration, data size, format, etc. • Each Query coordinated independently by a Tez AM • Hive Operators used for processing • Tez Runtime used for data transfer HiveServer2 Query/AM Controller Client(s) YARN Cluster AM1 llapd llapd Container AM1 Container AM1 llapd Container AM2 AM2 AM3 llapd
10.
Page10 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved How does LLAP speed up the queries? • Eliminates container startup costs • JIT optimizer has a chance to work • Especially important for vectorized processing • Asynchronous IO • Caching • Data sharing (hash join tables, etc.) • Optimizations for parallel queries
11.
Page11 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Industry benchmark – 10Tb scale 0 5 10 15 20 25 30 35 40 45 50 query3 query12 query20 query21 query26 query27 query42 query52 query55 query73 query89 query91 query98 Time(seconds) LLAP vs Hive 1.x 10TB Scale Hive 1.x LLAP90% faster
12.
Page12 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Evaluation for a real use case 0 20000 1 3 5 7 9 11 13 15 17 19 21 23 25 27 • Aggregate daily statistics for a time interval: SELECT yyyymmdd, sum(total_1), sum(total_2), ... from table where yyyymmdd >= xxx and yyyymmdd < xxx and userid = xxx group by userid, yyyymmdd; Max Max Max Avg Avg Avg 0 1 2 3 4 5 6 7 8 D W M Y D W M Y D W M Y Tez Phoenix LLAP Executiontime,s
13.
Page13 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Evaluation for a real use case ExecutionTime,s. • Display a large report • Phoenix runtimes not displayed (30-250s on large ranges) Execution Time in seconds over time range Max Max Avg Avg 0 5 10 15 20 D W M Y D W M Y Tez LLAP Executiontime,s
14.
Page14 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved How does LLAP make queries faster?
15.
Page15 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Persistent daemons (recap) • Reduced startup time • Better use of JIT optimizer • Hash join hash tables, fragment plans are shared • Multiple tasks do not all generate the table or deserialize the plans
16.
Page16 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Performance – first query • Cold LLAP is nearly as fast as shared pre-warmed containers (impractical on real clusters) • Realistic (long-running) LLAP ~3x faster than realistic (no prewarm) Tez No prewarm, 20.94 Shared prewarm 11.96 Cold LLAP 13.23 Realistic LLAP 7.49 0 5 10 15 20 25 Firstqueryruntime,s
17.
Page17 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Performance – repeated query • Cache disabled! 13.23 7.93 7.7 6.29 5.44 5.3 5.01 4.89 4.83 7.49 6.1 4.61 4.59 4.93 4.62 4.63 4.45 4.25 0 2 4 6 8 10 12 14 0 1 2 3 4 5 6 7 8 Queryruntime,s Query run Cold LLAP LLAP after an unrelated workload
18.
Page18 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Parallel queries – priorities, preemption • Lower-priority fragments can be preempted • For example, a fragment can start running before its inputs are ready, for better pipelining; such fragments may be preempted • LLAP work queue examines the DAG parameters to give preference to interactive (BI) queries LLAP QueueExecutor Executor Interactive query map 1/3 … Interactive query map 3/3 Executor Interactive query map 2/3 Wide query reduce waiting Time ClusterUtilization Long-Running Query Short-Running Query
19.
Page19 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Parallel query execution – LLAP vs Hive 1.2 3131 2416 1741 1420 1508 531 291 147 94 75 100% 91% 90% 71% 44% 45% 13% 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% 0 500 1000 1500 2000 2500 3000 3500 1 2 4 8 16 Scaling(comparedtolinear) Totalruntimeforallqueries,s. Number of concurrent users Hive 1.2 linear scaling Hive 1.2 total runtime LLAP linear scaling LLAP total runtime Hive+LLAP efficiency Hive 1.2 efficiency
20.
Page20 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Parallel query execution – 10Tb scale 531 291 147 94 75 100% 91% 90% 71% 44% 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% 0 100 200 300 400 500 600 1 2 4 8 16 Scaling(comparedtolinear) Totalruntimeforallqueries,s. Number of concurrent users LLAP linear scaling LLAP total runtime Hive+LLAP efficiency
21.
Page21 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved IO elevator and caching • Reading, decoding and processing are parallel • Unlike in regular Hive, where processing can wait for e.g. an S3 read • Logically-compresses (e.g. ORC-encoded) data is cached off-heap • Metadata is cached (currently on-heap) with high priority • Replacement policy is pluggable (LRFU is the default) • Tez schedules work to preferred location to improve locality • Tradeoff between HDFS reads and operator speed • Cache takes memory away from operators (sort buffers, hash join tables, etc.) • Depends on how much memory you have, workflow, dataset size, etc. • Ex. 4Gb per executor x 16 CPUs – 64Gb for executors, rest for cache
22.
Page22 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved I/OExecution In-memory processing – present and future Decoder Distributed FS Fragment Hive operator Hive operator Vectorized processin g col1 col2 Low-level pushdown (e.g. filter)* SSD cache* Off-heap cache Compression codec Native data vectors - work in progress * Compact encoded data Compressed data col1 col2
23.
Page23 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Fragment I/OExecution In-memory processing – future? Distributed FSHive operator Hive operator col1 col2 SSD cache* Off-heap cache Compression codec - work in progress * Compact encoded data Compressed data Processing on encoded data, codegen Zero copy - next steps? * Low-level operators (e.g. filter) *
24.
Page24 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Performance – cache on slow FS • Industry-standard cloud FS, much slower than on-prem HDFS • Large scan on 1Tb scale • Local HD/SDD cache (WIP) • Massive improvement on slow storage with little memory cost 0 50 100 150 200 250 300 350 400 Cold cache Hot cache Runtime,s Entire query Scan portion
25.
Page25 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Performance – cache on HDFS, 1Tb scale 20.2 18.3 17.6 16.2 16.2 16.8 14.5 11.3 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% 0 5 10 15 20 25 24 26 28 30 32 34 36 38 40 42 Cachehitrate Queryruntime,s Cache size, Gb Query runtime, after unrelated workload Query runtime, after related workload Metadata hit rate Data hit rate
26.
Page26 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Perf overview – a demo in a gif
27.
Page27 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP as the unified data access layer
28.
Page28 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Overview • LLAP can provide a "relational datanode" view of the data • Security via endpoint ACLs, or fragment signing (for external engines) • Granular, e.g. column-level, security is possible • Anyone (with access) can push the (approved) code in, from complex query fragments to simple data reads • SparkSQL/LLAP integration is based on SparkSQL data source APIs • DataFrame can be created with LlapInputFormat • Hive execution engines and other DAG executors can use LLAP directly • like Tez does now
29.
Page29 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Example - SparkSQL integration • Allowing direct access to Hive table data from Spark breaks security model for warehouses secured using Ranger or SQL Standard Authorization • Other features may not work correctly unless support added in Spark • Row/Cell level security (when implemented), ACID, Schema Evolution, … • SparkSQL has interfaces for external sources; they can be implemented to access Hive data via HS2/LLAP • SparkSQL Catalyst optimizer hooks can be used to push processing into LLAP (e.g. filters on the tables) • Some optimizations, like dynamic partition pruning, can be used by Hive even if SparkSQL doesn't support them; if execution pushdown is advanced enough
30.
Page30 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Example - SparkSQL integration – execution flow Page 30 HadoopRDD.compute() LlapInputFormat.getRecordReader() - Open socket for incoming data - Send package/plan to LLAP Check permissions Generate splits w/LLAP locations Return securely signed splits HiveServer2 Spark Executor HadoopRDD.getPartitions() Get Hive splits Cluster nodes Verify splits Run query plan Send data back LLAP Partitions/ Splits Request splits : : var llapContext = LlapContext.newInstance( sparkContext, jdbcUrl) var df: DataFrame = llapContext.sql("select * from tpch_text_5.region") DataFrame for Hive/LLAP data
31.
Page31 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Example - Tez/LLAP integration • Tez DAG coordination remains mostly unchanged • Tez plugins (TEZ-2003) • Scheduling – allows scheduling parts of a DAG as fragments on LLAP • Task communication – custom execution specifications, protocols – allows talking to LLAP, manages security tokens, etc. • Hive operators used for processing • Few or no changes necessary to support any Hive query complexity • All new Hive features automatically supported
32.
Page32 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Deploying and managing LLAP
33.
Page33 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Availability • First version shipped in Apache Hive 2.0 • Hive 2.0.1 contains important bugfixes to make it production ready • Hive 2.1 would have new features and further perf improvements • A solid platform for the basic scenario – run LLAP-only queries on a secure cluster • or a fraction thereof • Work in progress to support additional features (ACID, UDFs, hybrid deployments, etc.) • Unified data access layer is the next step
34.
Page34 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved General overview • LLAP daemons run on YARN; deployed via Apache Slider • Easy to bring up, tear down, flex clusters via slider commands • Hive-on-Tez should be set up with a compatible version (0.8.2+) • Zookeeper for service discovery and coordination • On HS2, using doAs is not recommended; use SQL auth or Ranger • However, you can keep using storage-based auth for other queries, as usual • Java 8 strongly recommended • April 2015 called and they want their Java 7 End-of-Lived
35.
Page35 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Starting the cluster • Apache Ambari integration – WIP (will do all of this for you) • Hive service (hive --service llap) • Generates a slider package, and a script to deploy it and start the cluster • Run as the correct user – slider paths are user-specific! • HADOOP_HOME, JAVA_HOME should be set • kinit on secure cluster • Specify a name, e.g. --name llap0; number of instances (e.g. one per machine); memory, cache size, number of executors (e.g. one per CPU); see --help • G1 GC recommended (e.g. --args " -XX:+UseG1GC -XX:+ResizeTLAB - XX:-ResizePLAB")
36.
Page36 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Monitoring • LLAP exposes a UI for monitoring • Also has jmx endpoint with much more data, logs and jstack endpoints as usual • Aggregate monitoring UI is work in progress
37.
Page37 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Running queries • Once the cluster is started, queries can be run from HS2 and CLI • --hiveconf hive.execution.mode=llap --hiveconf hive.llap.execution.mode=all --hiveconf hive.llap.io.enabled=true --hiveconf hive.llap.daemon.service.hosts=@<cluster name> • Set the usual perf settings (recommended, if not already set) • Enable vectorization, PPD, map join; use ORC • LLAP-specific: set hive.tez.input.generate.consistent.splits=true; • For interactive queries, disable CBO • Parallel compilation on HS2 – otherwise your parallel queries might bottleneck there! • Good to go!
38.
Page38 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Watching queries – Tez UI integration
39.
Page39 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Security • Again, Ambari will do this for you soon • SQL or Ranger auth recommended, with hive user running queries • LLAP uses keytabs and ACLs, similar to datanode, to secure endpoints • Keytabs need to be set up; certs may need to be set up for Web UI • HS2 (or client) obtains a token to access a particular LLAP cluster, and gives it to Tez AM • LLAP-s share tokens using secure ZK, similar to HA token sharing • Signing fragments on HS2 and granular security checks are work in progress
40.
Page40 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Summary and future work
41.
Page41 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Future work • Data view for external services • Column-level security • Better integration and deployment • Ambari, monitoring UI, fine-grained log aggregation • Hybrid deployment • Resizing LLAP to accommodate containers, YARN resource delegation, etc. • Feature work • ACID, text cache, better cache policies, UDFs, SSD cache, etc. • More performance work
42.
Page42 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Summary • LLAP is a new execution substrate for fast, concurrent analytical workloads, harnessing Hive vectorized SQL engine and efficient in-memory caching layer • Provides secure relational view of the data thru a standard InputFormat API • Available in Hive 2, with better ecosystem integration in the works
43.
Page43 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Questions? ? Interested? Stop by the Hortonworks booth to learn more
44.
Page44 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Backup slides
45.
Page45 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Example execution: MR vs Tez vs Tez+LLAP M M M R R M M R M M R M M R HDFS HDFS HDFS T T T T T R T T T R M M M R R R M M R R HDFS MapReduce Tez Tez with LLAP
46.
Page46 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved IO Elevator • Reading, decoding and processing are parallel • Unlike in regular Hive • HDFS reads are expensive (more so on S3, Azure) • Data decompression and decoding is expensive
Notas del editor
6 months note
Descargar ahora