Enviar búsqueda
Cargar
ORC: 2015 Faster, Better, Smaller
•
15 recomendaciones
•
7,702 vistas
DataWorks Summit
Seguir
ORC: 2015 Faster, Better, Smaller Gopal Vijayaraghavan Hortonworks
Leer menos
Leer más
Ciencias
Denunciar
Compartir
Denunciar
Compartir
1 de 31
Recomendados
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! Scale
DataWorks Summit
Hive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
Cassandra Introduction & Features
Cassandra Introduction & Features
DataStax Academy
LLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Owen O'Malley
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Cloudera, Inc.
Recomendados
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! Scale
DataWorks Summit
Hive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
Cassandra Introduction & Features
Cassandra Introduction & Features
DataStax Academy
LLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Owen O'Malley
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Cloudera, Inc.
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
Hive Data Modeling and Query Optimization
Hive Data Modeling and Query Optimization
Eyad Garelnabi
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
Query Compilation in Impala
Query Compilation in Impala
Cloudera, Inc.
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
Apache Spark overview
Apache Spark overview
DataArt
Spark shuffle introduction
Spark shuffle introduction
colorant
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Databricks
Optimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
Integrating NiFi and Flink
Integrating NiFi and Flink
Bryan Bende
Apache NiFi Record Processing
Apache NiFi Record Processing
Bryan Bende
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
Timothy Spann
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
DataWorks Summit
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Yoshiyasu SAEKI
Hive tuning
Hive tuning
Michael Zhang
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
t3rmin4t0r
zenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocol
Angelo Corsaro
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
HBase Low Latency
HBase Low Latency
DataWorks Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
Más contenido relacionado
La actualidad más candente
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
Hive Data Modeling and Query Optimization
Hive Data Modeling and Query Optimization
Eyad Garelnabi
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
Query Compilation in Impala
Query Compilation in Impala
Cloudera, Inc.
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
Apache Spark overview
Apache Spark overview
DataArt
Spark shuffle introduction
Spark shuffle introduction
colorant
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Databricks
Optimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
Integrating NiFi and Flink
Integrating NiFi and Flink
Bryan Bende
Apache NiFi Record Processing
Apache NiFi Record Processing
Bryan Bende
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
Timothy Spann
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
DataWorks Summit
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Yoshiyasu SAEKI
Hive tuning
Hive tuning
Michael Zhang
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
t3rmin4t0r
zenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocol
Angelo Corsaro
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
HBase Low Latency
HBase Low Latency
DataWorks Summit
La actualidad más candente
(20)
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Hive Data Modeling and Query Optimization
Hive Data Modeling and Query Optimization
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
Query Compilation in Impala
Query Compilation in Impala
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Spark overview
Apache Spark overview
Spark shuffle introduction
Spark shuffle introduction
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Optimizing Hive Queries
Optimizing Hive Queries
Integrating NiFi and Flink
Integrating NiFi and Flink
Apache NiFi Record Processing
Apache NiFi Record Processing
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Hive tuning
Hive tuning
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
zenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocol
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
HBase Low Latency
HBase Low Latency
Destacado
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
Redis edu 1
Redis edu 1
DaeMyung Kang
이것이 레디스다.
이것이 레디스다.
Kris Jeong
Redis edu 3
Redis edu 3
DaeMyung Kang
Redis edu 2
Redis edu 2
DaeMyung Kang
Destacado
(6)
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
Redis edu 1
Redis edu 1
이것이 레디스다.
이것이 레디스다.
Redis edu 3
Redis edu 3
Redis edu 2
Redis edu 2
Similar a ORC: 2015 Faster, Better, Smaller
ORC 2015
ORC 2015
t3rmin4t0r
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
DataWorks Summit
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
The Apache Software Foundation
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
Bob Ward
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Using Apache Hive with High Performance
Using Apache Hive with High Performance
Inderaj (Raj) Bains
Ceph
Ceph
Hien Nguyen Van
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Odinot Stanislas
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
Hortonworks
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
Oracle SPARC T7 a M7 servery
Oracle SPARC T7 a M7 servery
MarketingArrowECS_CZ
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Community
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
DataWorks Summit/Hadoop Summit
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
Jim St. Leger
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Community
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
alanfgates
Hive for Analytic Workloads
Hive for Analytic Workloads
DataWorks Summit
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Community
Similar a ORC: 2015 Faster, Better, Smaller
(20)
ORC 2015
ORC 2015
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
Using Apache Hive with High Performance
Using Apache Hive with High Performance
Ceph
Ceph
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Oracle SPARC T7 a M7 servery
Oracle SPARC T7 a M7 servery
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
Hive for Analytic Workloads
Hive for Analytic Workloads
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Más de DataWorks Summit
Data Science Crash Course
Data Science Crash Course
DataWorks Summit
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
Managing the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
Security Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
Más de DataWorks Summit
(20)
Data Science Crash Course
Data Science Crash Course
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Managing the Dewey Decimal System
Managing the Dewey Decimal System
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Security Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Último
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
PRADYUMMAURYA1
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
kantirani197
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
Damini Dixit
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
Areesha Ahmad
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
nishacall1
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
chandars293
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
ayushi9330
module for grade 9 for distance learning
module for grade 9 for distance learning
levieagacer
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
AkefAfaneh2
Site Acceptance Test .
Site Acceptance Test .
Poonam Aher Patil
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
Sumit Kumar yadav
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
Bhagirath Gogikar
IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicine
sherlingomez2
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
Areesha Ahmad
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
Suji236384
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Mohammad Khajehpour
Último
(20)
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
module for grade 9 for distance learning
module for grade 9 for distance learning
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
Site Acceptance Test .
Site Acceptance Test .
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicine
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
ORC: 2015 Faster, Better, Smaller
1.
Page1 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: 2015 Gopal Vijayaraghavan
2.
Page2 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC – Optimized Row-Columnar File Columnar Storage+ Row-groups & Fixed splits Protobuf Metadata Storage+ + Type-safe Vectorization+ Hive ACID transactions+ Single SerDe for Format+
3.
Page3 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Need for Speed: The Stinger Initiative Stinger: An Open Roadmap to improve Apache Hive’s performance 100x. Launched: February 2013; Delivered: April 2014. Delivered in 100% Apache Open Source. SQL Engine Vectorized SQL Engine Columnar Storage ORC = 100X+ + Distributed Execution Apache Tez
4.
Page4 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC at Facebook Saved more than 1,400 servers worth of storage. Compressioni Compression ratio increased from 5x to 8x globally. Compressioni [1]
5.
Page5 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC at Spotify 16x less HDFS read when using ORC versus Avro.(5) IOi 32x less CPU when using ORC versus Avro.(5) CPUi [2]
6.
Page 6 ©
Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: Today What is Optimized about ORC?
7.
Page7 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC – Optimized Row-Columnar File Columnar Storage+ Row-groups & Stripe splits Protobuf Metadata Storage+ + Type-safe Vectorization+ Hive ACID transactions+ Single SerDe for Format+
8.
Page8 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Columnar Storage Storage Performance ● Compress each column differently ● Detect & compress common sub-sequences ● Auto-increment ids ● String Enums ● Large Integers (uid scale) ● Unique strings (UUIDS) Read Performance ● Column projection ● Columnar deserializers ● Data locality Write Throughput ● Stats auto-gather
9.
Page9 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Row-groups & Stripe splits Split Parallelism ● Effective parallelism ● No seeks to find boundaries ● No splits with zero data ● Decompress fixed chunks Stripes ● Single unsplittable chunk ● Will reside in 1 HDFS block entirely ● Is self-contained for all read ops
10.
Page10 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved A Single SerDe for all ORC Files A Single Writer ● No mismatch of serialization ● Forward compatibility Readers ● Multiple reader implementations ● Allows for vector readers ● And row-mode readers ● Similar loop – good JIT hit-rate
11.
Page11 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Protobuf Metadata Storage Standardized Metadata ● Readers are easier to write ● Metadata readers are auto-generated Metadata Forward Compatibility ● Protobuf Optional fields Statistics Storage in Metadata ● Standard serialization for stats ● Allows for PPD into the IO layer
12.
Page12 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Type-safe Vectorization Schema on Write ● Write ORC Structs with types ● SerDe & Inputformat Read Performance ● Data is read with few copies ● Primitive types are fast ● Primitives are also unboxed ● Predicates are typed too
13.
Page 13 ©
Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: ETL Improvements Always more new data
14.
Page14 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC (Zlib): Compress Differently 674 389 433 ORC (old zlib) ORC SNAPPY ORC (new zlib) ETL for TPC-H LineItem (scale 1 Tb) Time Taken Different Zlib algorithms for encoding ● Z_FILTERED ● Z_DEFAULT ● Z_BEST_SPEED ● Z_DEFAULT_COMPRESSION In detail ● Compress IS_NULL bitsets lightly ● Compress Integers differently from Doubles ● Compress string dictionaries differently ● Allow for user choice
15.
Page15 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC (Zlib): Compress Differently Different Zlib algorithms for encoding ● Z_FILTERED ● Z_DEFAULT ● Z_BEST_SPEED ● Z_DEFAULT_COMPRESSION In detail ● Compress IS_NULL bitsets lightly ● Compress Integers differently from Doubles ● Compress string dictionaries differently ● Allow for user choice 178.5 225.1 172.2 ORC (old zlib) ORC SNAPPY ORC (new zlib) Data Sizes for TPC-H Lineitem (Scale 1 Tb) Size on Disk
16.
Page16 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Using JDK8 SIMD: Integer Writers Integer encodings ● Base + Delta ● Run-length ● Direct Trade-off for Size/Speed ● Use fixed bit-width loops ● Snap to nearest bit-width 0 200 400 600 800 1000 1200 1400 1600 1800 2000 1 2 4 8 16 24 32 40 48 56 64 MeanTime(ms) Bit Width ORC Write Integer Performance (smaller better) hive 0.13 bitpacking hive 1.0 bitpacking (new)
17.
Page17 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Double Writers 273.331 247.634 231.741 0 50 100 150 200 250 300 old buffered + BE buffered + LE MeanTime(ms) Double Write Modes ORC Write Double Performance (smaller is better) Double Writers ● JVM is big-endian ● X86 is little-endian ● Special handling of NaN
18.
Page18 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: Scale compression buffers 269.4 263.3 258.5 258.4 258.4 258.4 184.8 183.5 182.2 180.1 178.3 177.4 140 160 180 200 220 240 260 280 300 320 8 16 32 64 128 256 SizeinMB Compression Buffer Size in KB File Size ZLIB SNAPPY Large Columns vs More Columns ● Adjust when >1000 columns Trade offs ● Compression ● Low memory use More additions ● Dynamically partitioned insert
19.
Page19 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: Streaming Ingest + ACID Broken pattern: Partitions for Atomicity- - Isolation & Consistency on retries+ Transactions are pluggable (txn.manager)+ Cache/Replication friendly (base + deltas)+
20.
Page 20 ©
Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: LLAP and Sub-second ORC – Pushing for Sub-second
21.
Page21 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: Row Indexes Min-Max pruning ● Evaluate on statistics Bloom filters ● Better String filters ● Filter a random distribution LLAP Future ● Row-level vector SARGs 5999989709 540,000 10,000 No Indexes Min-Max Indexes Bloomfilter Indexes from tpch_1000.lineitem where l_orderkey = 1212000001; (log scale) Rows Read
22.
Page22 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: Row Indexes Min-Max pruning ● Evaluate on Statistics Bloom filters ● Better String filters ● Filter a random distribution LLAP Future ● Row-level vector SARGs 74 4.5 1.34 No Indexes Min-Max Indexes Bloomfilter Indexes * from tpch_1000.lineitem where l_orderkey=1212000001; (smaller better) Time Taken (seconds)
23.
Page23 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: JDK8 SIMD Readers Integer encodings ● Base + Delta ● Run-length ● Direct Trade-off for Size/Speed ● Use fixed bit-width loops ● Snap to nearest bit-width 0 200 400 600 800 1000 1200 1400 1600 1800 1 2 4 8 16 24 32 40 48 56 64 MeanTime(ms) Bit Width ORC Read Integer Performance hive 0.13 unpacking hive-1.0 unpacking (new)
24.
Page24 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: Vectorization + SIMD Advantage of a Single SerDe ● Primitive Types Allocation free tight inner loops ● JDK8 has auto-vectorization Vectorized Early Filter ● Vectors can be filtered early in ORC ● StringDictionary can be used to binary-search Vectorized SIMD Join ● Performance for single key joins 0x00007f13d2e6afb0: vmovdqu 0x10(%rsi,%rax,8),%ymm2 0x00007f13d2e6afb6: vaddpd %ymm1,%ymm2,%ymm2 0x00007f13d2e6afba: movslq %eax,%r10 0x00007f13d2e6afbd: vmovdqu 0x30(%rsi,%r10,8),%ymm3 ;*daload vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94) 0x00007f13d2e6afc4: vmovdqu %ymm2,0x10(%rdx,%rax,8) 0x00007f13d2e6afca: vaddpd %ymm1,%ymm3,%ymm2 0x00007f13d2e6afce: vmovdqu %ymm2,0x30(%rdx,%r10,8) ;*dastore vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
25.
Page25 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: Split Strategies + Tez Grouping Amdahl’s Law ● As fast as the slowest task ● Slice work thinly, but not too thin Split-generation vs Execution time ● ETL ● BI ● Hybrid Split-grouping & estimation ● ColumnarSplit size ● Group by estimate, not file size ● Bucket pruning Slow split
26.
Page26 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: LLAP - JIT Performance for short queries+ Row-group level caching+ Asynchronous IO Elevator+ + Multi-threaded Column Vector processing+
27.
Page27 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved ORC: LLAP (+ SIMD + Split Strategies + Row Indexes)
28.
Page28 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Questions? ? Interested? Stop by the Hortonworks booth to learn more
29.
Page29 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Endnotes (1) https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/ (2) http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014
30.
Page30 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved END
31.
Page31 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved But wait, there’s more 178.5 225.1 172.2 220.1 674 389 433 446 0 50 100 150 200 250 0 100 200 300 400 500 600 700 800 ORC (old zlib) ORC SNAPPY ORC (new zlib) Parquet (snappy) TPC-H Lineitem (1000 scale) Time Taken (seconds) Data Sizes (on disk)