Hive Optimizations and New Features in 0.11-0.13
alanfgates
1.
© Hortonworks Inc. 2013.
Hive for Analytic Workloads
Alan Gates (@alanfgates)
2.
Stinger Project (announced February 2013)
Batch AND Interactive SQL-IN-Hadoop
Stinger Initiative: A broad, community-based effort to drive the next generation of Hive
Goals:
• Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
• Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
• SQL: Support broadest range of SQL semantics for analytic applications running against Hadoop
…all IN Hadoop
Hive 0.11, May 2013:
• Base Optimizations
• SQL Analytic Functions
• ORCFile, Modern File Format
Hive 0.12, October 2013:
• VARCHAR, DATE Types
• ORCFile predicate pushdown
• Advanced Optimizations
• Performance Boosts via YARN
Hive 0.13, April 2014:
• Hive on Apache Tez
• SQL standard authorization
• Permanent UDFs
• Vectorized Processing
3.
Stinger Highlights
• 13 months
• 145 separate contributors – from 44 separate entities
• 3 Hive releases: 0.11, 0.12, and 0.13
• 392,000 lines of new Java code
4.
"Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."
- Winston Churchill
5.
Hive 0.13 Performance
• The TPC Benchmark™ DS is a decision support benchmark that models queries and data maintenance. It evaluates decision support systems that examine large volumes of data to answer real-world business questions.
• Test: 50 SQL queries on Hive 0.13
• Test Environment
– Driven by the Hive Testbench: https://github.com/cartershanklin/hive-testbench
– Nodes: 20 nodes, 256 GB per node (only 48 GB per node used for Hive)
– Drives: 6x 4TB WDC WD4000FYYZ-0 drives per node
– Interconnect: 10GB
– Processors: 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz for a total of 16 CPU cores per machine
– Scale: 30K (30 TB total data)
6.
Benchmark Results
Queries modified to have partition key that duplicates join key, making it easier for the optimizer to choose which partitions to scan.
7.
Benchmark Results
Queries modified to have partition key that duplicates join key, making it easier for the optimizer to choose which partitions to scan.
8.
SQL Semantics (by release)
Hive 0.10 & before:
• SELECT, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, UNION, ROLLUP/CUBE, subqueries in FROM
Hive 0.11:
• Windowing functions (RANK, ROW_NUMBER) and OVER clause
Hive 0.13:
• Subqueries with IN, EXISTS in WHERE and HAVING
• Common table expressions (WITH clause)
• Join condition in WHERE
• CREATE FUNCTION (stored on cluster)
Next Steps:
• Temporary tables
• Subqueries with equality and inequality operators
• Full UNION support
• Set operators, EXCEPT and INTERSECT
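To make the Hive 0.13 additions concrete, here is a minimal sketch of the three query shapes named on this slide: a subquery with IN, a subquery with EXISTS, and a common table expression. It runs against an in-memory SQLite database as a stand-in, which is an assumption made only so the example is runnable without a cluster; the tables `customers` and `orders` are hypothetical, and Hive's own dialect and planner may differ in details.

```python
import sqlite3

# SQLite is used here only as a convenient stand-in for a SQL engine;
# the table names and data are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (1, 50), (1, 70), (2, 10);
""")

# Subquery with IN in the WHERE clause.
in_rows = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders WHERE amount > 20)
    ORDER BY name
""").fetchall()

# Correlated subquery with EXISTS in the WHERE clause.
exists_rows = conn.execute("""
    SELECT name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY name
""").fetchall()

# Common table expression (WITH clause) feeding a join.
cte_rows = conn.execute("""
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.name, t.total
    FROM customers c JOIN totals t ON c.id = t.customer_id
    ORDER BY c.name
""").fetchall()

print(in_rows, exists_rows, cte_rows)
```

Before these forms were available, the same questions generally had to be rewritten as joins against subqueries in the FROM clause.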
9.
Security (by release)
Hive 0.12 & before:
• StorageBasedAuthorizationProvider, maps file level security
• secure, based on HDFS security
• coarse grained, no column or row level security
• default, all advisory
• everyone has grant permissions
Hive 0.13:
SQL standard security for tables, views, and databases
• GRANT/REVOKE
• ROLEs
• Column and row level permissions via views
Next Steps:
• Integration with XA Secure
• Extend to cover execution of functions
10.
Data Type Conformance (available data types by release)
Hive 0.10 & before:
• Integer types, floating types, string, array, map, struct, timestamp, binary
Hive 0.11:
• decimal (default precision and scale only)
Hive 0.12:
• date, varchar
Hive 0.13:
• char, user defined precision and scale for decimal
11.
Read and Write, ACID (write capabilities and ACID compliance by release)
Hive 0.12 & before:
• INSERT and INSERT OVERWRITE available
• Locking available, requires ZooKeeper for durability
• No ACID
Hive 0.13:
• ACID compliant ingestion of data from streaming sources such as Flume and Storm
• Snapshot isolation for readers
Next Steps:
• Addition of INSERT … VALUES, UPDATE, DELETE
• Multi-statement transactions: BEGIN, COMMIT, ROLLBACK
• Integration with HCatalog
Owen and I have a talk on this at 5:30 today.
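Snapshot isolation for readers can be illustrated with a toy model: every written row carries the id of the transaction that wrote it, and a reader only sees rows from transactions that had committed when it took its snapshot. The sketch below is not Hive's implementation; the class `SnapshotTable` and all of its methods are hypothetical names chosen for illustration.

```python
class SnapshotTable:
    """Toy append-only table; each row is tagged with its writer's transaction id."""

    def __init__(self):
        self._rows = []            # list of (txn_id, value)
        self._committed = set()    # ids of committed transactions
        self._next_txn = 1

    def begin(self):
        txn = self._next_txn
        self._next_txn += 1
        return txn

    def write(self, txn, value):
        self._rows.append((txn, value))

    def commit(self, txn):
        self._committed.add(txn)

    def snapshot(self):
        # Freeze the set of transactions that are committed right now.
        return frozenset(self._committed)

    def read(self, snapshot):
        # A reader sees only rows written by transactions in its snapshot.
        return [value for txn, value in self._rows if txn in snapshot]


table = SnapshotTable()
t1 = table.begin()
table.write(t1, "a")
table.commit(t1)

reader_snapshot = table.snapshot()   # reader starts here

t2 = table.begin()
table.write(t2, "b")                 # concurrent streaming write
before_commit = table.read(reader_snapshot)
table.commit(t2)
after_commit = table.read(reader_snapshot)
fresh_read = table.read(table.snapshot())
print(before_commit, after_commit, fresh_read)
```

The reader that started before `t2` keeps seeing the same data even after `t2` commits, while a reader that starts afterwards sees both rows.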
12.
Optimizer (by release)
Hive 0.11 & before:
Rules based optimizer
• Mostly simple rules such as push filter below join
Hive 0.12:
Correlation optimizer
• Where possible combine related execution into single job
Next Steps:
• Use Optiq for cost based optimization
• Join ordering and operator selection using statistics and cost estimates
• Expand statistics calculated and used in planning
Julian has a talk on this at 4:35 today.
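The "push filter below join" rule mentioned above can be sketched in a few lines. The example assumes an inner join and a filter that only touches columns of one input, which is the case where the rewrite is safe; the helper `hash_join`, the predicate `is_large`, and the sample rows are all hypothetical.

```python
def hash_join(left, right, key):
    """Inner equi-join of two lists of dicts on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            out.append({**lrow, **rrow})
    return out


def is_large(row):
    # Filter that references only the "orders" side of the join.
    return row["amount"] > 10


orders = [
    {"cust": 1, "amount": 50},
    {"cust": 2, "amount": 5},
    {"cust": 1, "amount": 8},
]
customers = [{"cust": 1, "name": "ann"}, {"cust": 2, "name": "bob"}]

# Plan A: join everything, then filter.
joined = hash_join(orders, customers, "cust")
plan_a = [row for row in joined if is_large(row)]

# Plan B: filter pushed below the join, so fewer rows reach the join.
filtered = [row for row in orders if is_large(row)]
plan_b = hash_join(filtered, customers, "cust")

print(plan_a == plan_b, len(joined), len(filtered))
```

Both plans return the same rows, but plan B feeds one row into the join instead of three. A cost-based optimizer of the kind listed under Next Steps would additionally use statistics to decide such questions as join order.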
13.
MapReduce is dead, Long live Hadoop
14.
MapReduce is dead, Long live Hadoop
Tez Talks:
• A New Chapter in Hadoop Data Processing, today 12:05
• Hive on Apache Tez: Benchmarked at Yahoo! Scale, today 12:05
• Hive + Tez: A Performance Deep Dive, today 2:35
15.
ORC File Format
• Columnar format for complex data types
• Built into Hive from 0.11
• Support for Pig via OrcLoader/OrcStorer
• Support for MapReduce via HCat
• Two levels of compression
– Lightweight type-specific and generic
• Built in indexes
– Every 10,000 rows with position information
– Min, Max, Sum, Count of each column
– Supports seek to row number
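The built-in index described above can be pictured as one small statistics record per 10,000 rows of a column. The sketch below builds such records for an in-memory list and uses them to find the entry that covers a given row number. It illustrates the idea only and does not reflect ORC's on-disk layout; `build_index`, `seek_entry`, and the dictionary keys are hypothetical names.

```python
ROWS_PER_INDEX_ENTRY = 10_000  # index stride taken from the slide


def build_index(values, stride=ROWS_PER_INDEX_ENTRY):
    """Return one statistics record per `stride` rows of a numeric column."""
    index = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        index.append({
            "first_row": start,    # position information
            "min": min(chunk),
            "max": max(chunk),
            "sum": sum(chunk),
            "count": len(chunk),
        })
    return index


def seek_entry(index, row_number, stride=ROWS_PER_INDEX_ENTRY):
    """Find the index entry covering `row_number` without scanning the data."""
    return index[row_number // stride]


column = list(range(25_000))
index = build_index(column)
entry = seek_entry(index, 12_345)
print(len(index), entry["first_row"], index[2]["count"])
```

With 25,000 rows the index has three entries, and a seek to row 12,345 lands on the entry that starts at row 10,000, so a reader would only need to decode from that position onward.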
16.
ORC File Format
• Hive 0.12
– Predicate Push Down
– Improved run length encoding
– Adaptive string dictionaries
– Padding stripes to HDFS block boundaries
• Hive 0.13
– Stripe-based Input Splits
– Input Split elimination
– Vectorized Reader
– Customized Pig Load and Store functions
– ACID support
• Next Steps
– Faster writes
– Integer dictionaries
– Better block buffering
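Predicate pushdown relies on stored min/max statistics: a reader can skip any group of rows whose value range cannot satisfy the filter. The following is a minimal sketch of that pruning step for a range predicate, assuming the data is already split into row groups; `row_groups_to_read` and `scan` are hypothetical helpers, and the statistics are computed on the fly here although a real file would store them.

```python
def row_groups_to_read(stats, low, high):
    """Indices of row groups whose [min, max] range overlaps [low, high]."""
    return [
        i for i, (group_min, group_max) in enumerate(stats)
        if group_max >= low and group_min <= high
    ]


def scan(groups, low, high):
    """Return matching values and the indices of the row groups actually read."""
    # Assumption: in a real file these statistics are read from the index,
    # not recomputed from the data.
    stats = [(min(group), max(group)) for group in groups]
    selected = row_groups_to_read(stats, low, high)
    rows = [
        value
        for i in selected
        for value in groups[i]
        if low <= value <= high
    ]
    return rows, selected


groups = [[1, 5, 9], [20, 25, 30], [40, 41, 45]]
rows, selected = scan(groups, 22, 41)
print(rows, selected)
```

For the predicate `22 <= value <= 41` the first group is never read, because its maximum of 9 is below the lower bound. The same overlap test, applied at a coarser granularity, is one plausible way to think about the input split elimination listed for Hive 0.13.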
17.
Vectorized Query Execution
• Designed for Modern Processor Architectures
– Avoid branching in the inner loop.
– Make the most use of L1 and L2 cache.
• How It Works
– Process records in batches of 1,000 rows
– Generate code from templates to minimize branching.
• What It Gives
– 30x improvement in rows processed per second.
– Initial prototype: 100M rows/sec on laptop
• In Hive 0.13, initial (map) tasks vectorized
• Current work: vectorize shuffle and reduce tasks
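The batching idea can be sketched as follows: instead of evaluating an expression one row at a time, the engine takes 1,000 values of each input column and applies one operation to the whole batch in a tight loop. This Python sketch only shows the control flow and will not reproduce the speedups quoted above, which depend on compiled code and CPU cache behaviour; `batches` and `add_columns_vectorized` are hypothetical names.

```python
BATCH_SIZE = 1_000  # batch size taken from the slide


def batches(column, size=BATCH_SIZE):
    """Yield consecutive slices of a column, each at most `size` values long."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def add_columns_vectorized(a, b):
    """Compute a + b column-wise, one batch at a time."""
    out = []
    for batch_a, batch_b in zip(batches(a), batches(b)):
        # One operation over one batch: the inner loop has no per-row
        # operator dispatch and no branching on the data.
        out.extend(x + y for x, y in zip(batch_a, batch_b))
    return out


col_a = list(range(2_500))
col_b = [1] * 2_500
result = add_columns_vectorized(col_a, col_b)
batch_count = len(list(batches(col_a)))
print(batch_count, result[:3], result[-1])
```

Here 2,500 rows are processed as three batches (1,000, 1,000, and 500 rows). Keeping each batch small enough to stay in cache is the design point the slide refers to with L1 and L2.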
18.
Try it Yourself
• Apache Hive 0.13
– http://hive.apache.org/downloads.html
• Download and play with HDP-2.1
– http://hortonworks.com/products/hortonworks-sandbox/ for use on your laptop
– http://hortonworks.com/hdp/ for use on your cluster
19.
Thank You!
@alanfgates
@hortonworks
Editor's notes
21 – 29 sec, scan one day of items table
93 – fact to fact left outer join over a year's data, finished in around an hour
13 – full year 6 way star join