Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Oct 17 2018, Ryu Kobayashi
PLAZMA TD Tech Talk 2018 at Shibuya
Hive2 as a new TD Hadoop core engine
Agenda
- PTD Hive
- Our storage is PlazmaDB
- Vectorization supported by default
- Test
- Next plan
Ryu Kobayashi
Software engineer on the Hadoop team
• Background: Backend team -> Hadoop team -> MPP (Presto) team -> Hadoop team
• Hadoop usage history: about 10 years
PTD Hive
• PTD = Patch set by Treasure Data
• Our Hadoop and Hive History
– CDH3 -> CDH4 -> HDP2 -> Apache Hadoop and Hive
• Why did we drop the distributions?
– We fix bugs ourselves
▪ But the fixes are not picked up quickly (Hive), e.g. HIVE-11353
– A distribution is tied to specific versions
▪ The range we have to test becomes wider
• The PTD project started in 2015
– Version at that time: Hive 2.1.0
– Currently supported version: Hive 2.3.2
• Why only from 2.1.0 to 2.3.2 between 2015 and 2018?
– See the self-introduction
– So the work restarted in 2018
• We have fixed many bugs in 2.3.2 as well
• Besides these, we apply internal patches:
– INSERT INTO/OVERWRITE
▪ Why?
– Our storage is PlazmaDB
– Our storage is not HDFS
– So output must be written to PlazmaDB
• Bugs of our own making can occur
– Investigation is hard
▪ Is the cause our patches or Hive itself?
Our storage is PlazmaDB
PlazmaDB
• What is PlazmaDB?
– Columnar, compressed storage
• PlazmaDB's components
– plazmadb
– plazmadb-mpcfile
▪ What is mpcfile?
– A proprietary format that compresses MessagePack data
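As a rough illustration of what "columnar, compressed storage" means, the sketch below pivots rows into columns and compresses each column on its own. It is not the mpcfile format, which is proprietary: JSON stands in for MessagePack and zlib for the real codec, and every function name and the sample rows are hypothetical.

```python
import json
import zlib


def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    names = sorted({key for row in rows for key in row})
    return {name: [row.get(name) for row in rows] for name in names}


def encode_columns(rows):
    """Serialize and compress each column separately.

    Assumption: JSON replaces MessagePack and zlib replaces the actual
    compression used by mpcfile, which is not public.
    """
    columns = rows_to_columns(rows)
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def decode_column(blobs, name):
    """Read back a single column without touching the others."""
    return json.loads(zlib.decompress(blobs[name]).decode("utf-8"))


# Hypothetical sample data: 1000 access-log style rows.
rows = [
    {"time": 1539734400 + i, "path": "/index.html", "status": 200}
    for i in range(1000)
]
blobs = encode_columns(rows)
statuses = decode_column(blobs, "status")
```

Because values of one column sit next to each other, repetitive columns such as `status` compress well, and a query that needs only one column can skip the rest.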
• We do not use HDFS for storage (we use it only for intermediate files)
– Advantage: upgrading the Hadoop version is easy
• The internal PlazmaDB libraries are upgraded starting with Hive2
– Old:
▪ plazmadb
▪ plazmadb-mpcfile
▪ td-storage
▪ msgpack(0.6)
– New:
▪ plazmadb
▪ partition-manager
▪ msgpack(0.8)
Vectorization supported by default
• Our current Hive 0.13 does not support vectorization
– Because it has many bugs
• Those bugs have been fixed in Hive2, so we support it by default
– Some internal problems remain
▪ Schema type problems on READ and WRITE
• Performance?
– About 2 times faster than our legacy Hive
▪ Vectorization
▪ New storage library
• Remaining challenges
– Vectorization support in our UDFs
▪ Mainly time-related UDFs
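Vectorized execution handles a batch of rows per operator call, working column by column, instead of evaluating one row at a time. The sketch below contrasts the two styles for a filter followed by an addition. It is a conceptual illustration in Python, not Hive's implementation: the function names are hypothetical and the batch size of 1024 is an assumption made for the example.

```python
BATCH_SIZE = 1024  # assumed batch size for this illustration


def filter_add_row_at_a_time(rows):
    """Row-oriented style: evaluate the filter and expression per row."""
    out = []
    for a, b in rows:
        if a > 0:
            out.append(a + b)
    return out


def batches(col_a, col_b, size=BATCH_SIZE):
    """Yield aligned slices of the two columns, one batch at a time."""
    for start in range(0, len(col_a), size):
        yield col_a[start:start + size], col_b[start:start + size]


def filter_add_vectorized(col_a, col_b):
    """Batch-oriented style: one pass per column per batch."""
    out = []
    for batch_a, batch_b in batches(col_a, col_b):
        # Step 1: the filter produces the indexes of surviving rows.
        selected = [i for i, a in enumerate(batch_a) if a > 0]
        # Step 2: the expression runs only over the selected indexes.
        out.extend(batch_a[i] + batch_b[i] for i in selected)
    return out


col_a = list(range(-5, 3000))
col_b = [2] * len(col_a)
vectorized = filter_add_vectorized(col_a, col_b)
row_wise = filter_add_row_at_a_time(list(zip(col_a, col_b)))
```

Both functions return the same values. In a real engine the batch style pays off because each step is a tight loop over a primitive array, which is also why a UDF has to be written for batches before it can take part in vectorized execution.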
Test
• How do we test?
– system-test
▪ scheduled run
– against both Hive 0.13 and Hive2
– elephant-testing
▪ scheduled run
– queries that have caused problems so far are registered
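A scheduled comparison between the legacy and the new engine could look roughly like the sketch below. It is not the actual system-test or elephant-testing code: the query registry, the table name, and the two `run_*` callables are hypothetical stand-ins, and canned results replace real Hive connections so the example runs on its own.

```python
from collections import Counter


def compare_engines(queries, run_legacy, run_new):
    """Run every registered query on both engines and collect mismatches.

    Rows are compared as multisets because row order is not guaranteed
    without ORDER BY.
    """
    mismatches = {}
    for name, sql in queries.items():
        legacy_rows = Counter(tuple(row) for row in run_legacy(sql))
        new_rows = Counter(tuple(row) for row in run_new(sql))
        if legacy_rows != new_rows:
            mismatches[name] = {
                "only_legacy": list((legacy_rows - new_rows).elements()),
                "only_new": list((new_rows - legacy_rows).elements()),
            }
    return mismatches


# Hypothetical registry of queries that caused problems in the past.
REGISTERED_QUERIES = {
    "count_all": "SELECT COUNT(1) FROM www_access",
    "avg_size": "SELECT AVG(size) FROM www_access",
}

# Canned results standing in for the two engines.
LEGACY_RESULTS = {
    "SELECT COUNT(1) FROM www_access": [(5000,)],
    "SELECT AVG(size) FROM www_access": [(1024.0,)],
}
NEW_RESULTS = {
    "SELECT COUNT(1) FROM www_access": [(5000,)],
    "SELECT AVG(size) FROM www_access": [(None,)],
}

report = compare_engines(
    REGISTERED_QUERIES, LEGACY_RESULTS.__getitem__, NEW_RESULTS.__getitem__
)
```

Here `report` contains only `avg_size`, since the count matches on both sides. Python treats `1` and `1.0` as equal, so a difference in type alone needs a stricter comparison than this one.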
• What kinds of problems occurred?
– Results differ
▪ Schema type problems
– Null
– Decimal point
▪ This also affects INSERT INTO/OVERWRITE
– Specific UDFs do not work
▪ Incompatibility between the jars Hive uses and the jars we use
– Cross joins are not allowed by default
▪ Because of the hive.strict.checks.cartesian.product property
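When two engines disagree on a value, it helps to sort the mismatch into the categories listed above before digging deeper. The sketch below is one possible way to do that; the category names and the `classify_mismatch` helper are hypothetical, and the rules are an assumption about what "Null" and "Decimal point" differences look like in practice.

```python
from decimal import Decimal, InvalidOperation


def classify_mismatch(legacy, new):
    """Label how a single value differs between two result sets."""
    if legacy == new and type(legacy) is type(new):
        return "same"
    if legacy is None or new is None:
        # One side is NULL while the other has a value.
        return "null"
    try:
        # Same number, different representation (1 vs 1.0, "1.50" vs "1.5").
        if Decimal(str(legacy)) == Decimal(str(new)):
            return "decimal"
    except InvalidOperation:
        pass  # at least one side is not numeric
    return "other"


examples = [
    classify_mismatch(1, 1.0),
    classify_mismatch("1.50", "1.5"),
    classify_mismatch(None, ""),
    classify_mismatch("a", "b"),
    classify_mismatch(7, 7),
]
```

Values are passed through `str` before building a `Decimal` so that floats are compared by their printed form rather than their binary expansion.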
Next plan
• Alpha release next month
• Beta and stable releases next year
• Our new PlazmaDB
– CBO support
• Tez support
– Last tried in 2015...
▪ 0.8.4 -> 0.9 (currently 0.9.1)
• Hive3 support
Thank You!
Danke!
Merci!
谢谢!
Gracias!
Kiitos!
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.

Más contenido relacionado

La actualidad más candente

Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018Taro L. Saito
 
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKVPresentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKVKevin Xu
 
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...InfluxData
 
Kapacitor Manager
Kapacitor ManagerKapacitor Manager
Kapacitor ManagerInfluxData
 
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...Spark Summit
 
Reading The Source Code of Presto
Reading The Source Code of PrestoReading The Source Code of Presto
Reading The Source Code of PrestoTaro L. Saito
 
Inside the InfluxDB storage engine
Inside the InfluxDB storage engineInside the InfluxDB storage engine
Inside the InfluxDB storage engineInfluxData
 
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020Taro L. Saito
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @ShanghaiLuke Han
 
Lessons and Observations Scaling a Time Series Database
Lessons and Observations Scaling a Time Series DatabaseLessons and Observations Scaling a Time Series Database
Lessons and Observations Scaling a Time Series DatabaseInfluxData
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKInfluxData
 
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxDataInfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxDataInfluxData
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
 
WRITING QUERIES (INFLUXQL AND TICK)
WRITING QUERIES (INFLUXQL AND TICK)WRITING QUERIES (INFLUXQL AND TICK)
WRITING QUERIES (INFLUXQL AND TICK)InfluxData
 
Introduction to InfluxDB
Introduction to InfluxDBIntroduction to InfluxDB
Introduction to InfluxDBJorn Jambers
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Sid Anand
 
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsThe HDF-EOS Tools and Information Center
 
Supporting the "Rapi" C-laguage API in an R-compatible engine
Supporting the "Rapi" C-laguage API in an R-compatible engineSupporting the "Rapi" C-laguage API in an R-compatible engine
Supporting the "Rapi" C-laguage API in an R-compatible engineadamwelc
 
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]Kevin Xu
 
IT Monitoring in the Era of Containers | Luca Deri Founder & Project Lead | ntop
IT Monitoring in the Era of Containers | Luca Deri Founder & Project Lead | ntopIT Monitoring in the Era of Containers | Luca Deri Founder & Project Lead | ntop
IT Monitoring in the Era of Containers | Luca Deri Founder & Project Lead | ntopInfluxData
 

La actualidad más candente (20)

Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
 
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKVPresentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
 
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
 
Kapacitor Manager
Kapacitor ManagerKapacitor Manager
Kapacitor Manager
 
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spa...
 
Reading The Source Code of Presto
Reading The Source Code of PrestoReading The Source Code of Presto
Reading The Source Code of Presto
 
Inside the InfluxDB storage engine
Inside the InfluxDB storage engineInside the InfluxDB storage engine
Inside the InfluxDB storage engine
 
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
 
Lessons and Observations Scaling a Time Series Database
Lessons and Observations Scaling a Time Series DatabaseLessons and Observations Scaling a Time Series Database
Lessons and Observations Scaling a Time Series Database
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
 
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxDataInfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 
WRITING QUERIES (INFLUXQL AND TICK)
WRITING QUERIES (INFLUXQL AND TICK)WRITING QUERIES (INFLUXQL AND TICK)
WRITING QUERIES (INFLUXQL AND TICK)
 
Introduction to InfluxDB
Introduction to InfluxDBIntroduction to InfluxDB
Introduction to InfluxDB
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1
 
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
 
Supporting the "Rapi" C-laguage API in an R-compatible engine
Supporting the "Rapi" C-laguage API in an R-compatible engineSupporting the "Rapi" C-laguage API in an R-compatible engine
Supporting the "Rapi" C-laguage API in an R-compatible engine
 
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
 
IT Monitoring in the Era of Containers | Luca Deri Founder & Project Lead | ntop
IT Monitoring in the Era of Containers | Luca Deri Founder & Project Lead | ntopIT Monitoring in the Era of Containers | Luca Deri Founder & Project Lead | ntop
IT Monitoring in the Era of Containers | Luca Deri Founder & Project Lead | ntop
 

Similar a PLAZMA TD Tech Talk 2018 at Shibuya: Hive2 as a new td hadoop core engine

How to Upgrade Major Version of Your Production PostgreSQL
How to Upgrade Major Version of Your Production PostgreSQLHow to Upgrade Major Version of Your Production PostgreSQL
How to Upgrade Major Version of Your Production PostgreSQLKeisuke Suzuki
 
Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...
Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...
Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...David Buck
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...Amazon Web Services
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...Amazon Web Services
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...Amazon Web Services
 
Oracle GoldenGate Performance Tuning
Oracle GoldenGate Performance TuningOracle GoldenGate Performance Tuning
Oracle GoldenGate Performance TuningBobby Curtis
 
Infrastructure for auto scaling distributed system
Infrastructure for auto scaling distributed systemInfrastructure for auto scaling distributed system
Infrastructure for auto scaling distributed systemKai Sasaki
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSDataWorks Summit
 
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...Amazon Web Services
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanStreamNative
 
Open Source Databases on the Cloud - Peter Dachnowicz
Open Source Databases on the Cloud - Peter DachnowiczOpen Source Databases on the Cloud - Peter Dachnowicz
Open Source Databases on the Cloud - Peter DachnowiczAmazon Web Services
 
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...Amazon Web Services
 
Open Source Managed Databases: Database Week San Francisco
Open Source Managed Databases: Database Week San FranciscoOpen Source Managed Databases: Database Week San Francisco
Open Source Managed Databases: Database Week San FranciscoAmazon Web Services
 
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupPresto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupWojciech Biela
 
CON5898 What Servlet 4.0 Means To You
CON5898 What Servlet 4.0 Means To YouCON5898 What Servlet 4.0 Means To You
CON5898 What Servlet 4.0 Means To YouEdward Burns
 
SteelFusion for Oil & Gas Industry
SteelFusion for Oil & Gas IndustrySteelFusion for Oil & Gas Industry
SteelFusion for Oil & Gas IndustryMena Migally
 

Similar a PLAZMA TD Tech Talk 2018 at Shibuya: Hive2 as a new td hadoop core engine (20)

How to Upgrade Major Version of Your Production PostgreSQL
How to Upgrade Major Version of Your Production PostgreSQLHow to Upgrade Major Version of Your Production PostgreSQL
How to Upgrade Major Version of Your Production PostgreSQL
 
Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...
Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...
Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
 
Oracle GoldenGate Performance Tuning
Oracle GoldenGate Performance TuningOracle GoldenGate Performance Tuning
Oracle GoldenGate Performance Tuning
 
Infrastructure for auto scaling distributed system
Infrastructure for auto scaling distributed systemInfrastructure for auto scaling distributed system
Infrastructure for auto scaling distributed system
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFS
 
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! Japan
 
Hive Now Sparks
Hive Now SparksHive Now Sparks
Hive Now Sparks
 
Open Source Databases on the Cloud - Peter Dachnowicz
Open Source Databases on the Cloud - Peter DachnowiczOpen Source Databases on the Cloud - Peter Dachnowicz
Open Source Databases on the Cloud - Peter Dachnowicz
 
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
 
2012 ah vegas top10 tips from aruba tac
2012 ah vegas   top10 tips from aruba tac2012 ah vegas   top10 tips from aruba tac
2012 ah vegas top10 tips from aruba tac
 
Open Source Managed Databases: Database Week San Francisco
Open Source Managed Databases: Database Week San FranciscoOpen Source Managed Databases: Database Week San Francisco
Open Source Managed Databases: Database Week San Francisco
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
MySQL and MariaDB
MySQL and MariaDBMySQL and MariaDB
MySQL and MariaDB
 
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupPresto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
 
CON5898 What Servlet 4.0 Means To You
CON5898 What Servlet 4.0 Means To YouCON5898 What Servlet 4.0 Means To You
CON5898 What Servlet 4.0 Means To You
 
SteelFusion for Oil & Gas Industry
SteelFusion for Oil & Gas IndustrySteelFusion for Oil & Gas Industry
SteelFusion for Oil & Gas Industry
 

Más de Ryu Kobayashi

Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Ryu Kobayashi
 
Huahin Framework for Hadoop, Hadoop Conference Japan 2013 Winter
Huahin Framework for Hadoop, Hadoop Conference Japan 2013 WinterHuahin Framework for Hadoop, Hadoop Conference Japan 2013 Winter
Huahin Framework for Hadoop, Hadoop Conference Japan 2013 WinterRyu Kobayashi
 
Hadoop Conference Japan 2011 Fall
Hadoop Conference Japan 2011 FallHadoop Conference Japan 2011 Fall
Hadoop Conference Japan 2011 FallRyu Kobayashi
 
Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLRyu Kobayashi
 
Hadoopソースコードリーディング第3回 Hadopo MR + Cassandra
Hadoopソースコードリーディング第3回 Hadopo MR + CassandraHadoopソースコードリーディング第3回 Hadopo MR + Cassandra
Hadoopソースコードリーディング第3回 Hadopo MR + CassandraRyu Kobayashi
 
AWSを使ったトラッキングログ収集
AWSを使ったトラッキングログ収集AWSを使ったトラッキングログ収集
AWSを使ったトラッキングログ収集Ryu Kobayashi
 
Hadoopソースコードリーディング MapReduce障害時のフロー
Hadoopソースコードリーディング MapReduce障害時のフローHadoopソースコードリーディング MapReduce障害時のフロー
Hadoopソースコードリーディング MapReduce障害時のフローRyu Kobayashi
 

Más de Ryu Kobayashi (7)

Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
 
Huahin Framework for Hadoop, Hadoop Conference Japan 2013 Winter
Huahin Framework for Hadoop, Hadoop Conference Japan 2013 WinterHuahin Framework for Hadoop, Hadoop Conference Japan 2013 Winter
Huahin Framework for Hadoop, Hadoop Conference Japan 2013 Winter
 
Hadoop Conference Japan 2011 Fall
Hadoop Conference Japan 2011 FallHadoop Conference Japan 2011 Fall
Hadoop Conference Japan 2011 Fall
 
Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQL
 
Hadoopソースコードリーディング第3回 Hadopo MR + Cassandra
Hadoopソースコードリーディング第3回 Hadopo MR + CassandraHadoopソースコードリーディング第3回 Hadopo MR + Cassandra
Hadoopソースコードリーディング第3回 Hadopo MR + Cassandra
 
AWSを使ったトラッキングログ収集
AWSを使ったトラッキングログ収集AWSを使ったトラッキングログ収集
AWSを使ったトラッキングログ収集
 
Hadoopソースコードリーディング MapReduce障害時のフロー
Hadoopソースコードリーディング MapReduce障害時のフローHadoopソースコードリーディング MapReduce障害時のフロー
Hadoopソースコードリーディング MapReduce障害時のフロー
 

Último

Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoordharasingh5698
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 

Último (20)

Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 

PLAZMA TD Tech Talk 2018 at Shibuya: Hive2 as a new TD Hadoop core engine

  • 1. PLAZMA TD Tech Talk 2018 at Shibuya: Hive2 as a new TD Hadoop core engine. Ryu Kobayashi, Oct 17 2018. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
  • 2. Agenda
  • 3. Agenda
      - PTD Hive
      - Our storage is PlazmaDB
      - Vectorization supported by default
      - Test
      - Next plan
  • 4. Ryu Kobayashi
      Software engineer on the Hadoop team
      • Backend team -> Hadoop team -> MPP (Presto) team -> Hadoop team
      • Hadoop usage history: about 10 years
        – Background:
  • 5. PTD Hive
  • 6. PTD Hive
      • PTD = Patch set by Treasure Data
      • Our Hadoop and Hive history
        – CDH3 -> CDH4 -> HDP2 -> Apache Hadoop and Hive
      • Why did we drop the distributions?
        – We fix bugs ourselves
          ▪ But fixes are not picked up quickly (Hive), e.g. HIVE-11353
        – A distribution depends on specific versions
          ▪ The range we have to test becomes wider
  • 7. PTD Hive
      • The PTD project started in 2015
        – Version at that time: Hive 2.1.0
        – Currently supported version: Hive 2.3.2
      • Why only from 2.1.0 to 2.3.2 between 2015 and 2018?
        – See the self-introduction
        – So, the project restarted in 2018
      • We have fixed many bugs in 2.3.2 as well
  • 8. PTD Hive
      • We also apply an internal patch on top of this:
        – INSERT INTO/OVERWRITE
          ▪ Why?
            – Our storage is PlazmaDB
            – The storage is not HDFS
            – So, output must be written to PlazmaDB
      • Bugs of our own may occur
        – Investigation is hard
          ▪ Is it our own bug or a bug in Hive itself?
  • 9. Our storage is PlazmaDB
  • 10. PlazmaDB
      • What is PlazmaDB?
        – Columnar compression storage
      • PlazmaDB’s contents
        – plazmadb
        – plazmadb-mpcfile
          ▪ What is mpcfile?
            – A proprietary format that compresses MessagePack
  • 11. PlazmaDB
      • We do not use HDFS as storage (but we do use it for intermediate files)
        – Advantage: upgrading the Hadoop version is easy
      • The internal PlazmaDB library is upgraded starting with Hive2
        – Old:
          ▪ plazmadb
          ▪ plazmadb-mpcfile
          ▪ td-storage
          ▪ msgpack (0.6)
        – New:
          ▪ plazmadb
          ▪ partition-manager
          ▪ msgpack (0.8)
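As a rough illustration of what "columnar compression storage" means, the sketch below pivots rows into columns, serializes each column, and compresses it on its own. It is not the mpcfile format, which is proprietary and not described in the slides: JSON from the Python standard library stands in for MessagePack, zlib stands in for whatever compressor PlazmaDB uses, and the function names and sample rows are hypothetical.

```python
import json
import zlib


def encode_columnar(rows, columns):
    """Pivot row dicts into columns and compress each column separately."""
    encoded = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        # JSON stands in for MessagePack here (assumption, stdlib only).
        encoded[name] = zlib.compress(json.dumps(values).encode("utf-8"))
    return encoded


def decode_column(encoded, name):
    """Read back a single column without touching the others."""
    return json.loads(zlib.decompress(encoded[name]).decode("utf-8"))


# Hypothetical sample rows.
rows = [
    {"time": 1539734400, "path": "/index", "status": 200},
    {"time": 1539734401, "path": "/index", "status": 200},
    {"time": 1539734402, "path": "/login", "status": None},
]
chunk = encode_columnar(rows, ["time", "path", "status"])
statuses = decode_column(chunk, "status")
```

A query that needs only `status` decompresses one column and leaves the rest untouched, which is the usual motivation for a columnar layout.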
  • 12. Vectorization supported by default
  • 13. Vectorization supported by default
      • Our current Hive 0.13 does not support vectorization
        – Because it has many bugs
      • Those bugs have been fixed as of Hive2, so it is supported by default
        – There are still some problems internally
          ▪ Schema type problem: READ and WRITE
  • 14. Vectorization supported by default
      • Performance?
        – About 2 times faster than our legacy Hive
          ▪ Vectorization
          ▪ New storage library
      • The remaining challenges
        – Vectorization support for our UDFs
          ▪ Mainly time-related ones
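For readers unfamiliar with the term: Hive's vectorized execution (the `hive.vectorized.execution.enabled` property) processes rows in blocks of 1024, held as one array per column, instead of one row at a time. The sketch below contrasts the two shapes of processing for a filter followed by a sum. The function names and sample data are hypothetical, and pure Python only shows the structure, not the speedup reported in the slides.

```python
BATCH_SIZE = 1024  # block size documented for Hive's vectorized execution


def row_at_a_time(rows):
    """Row mode: evaluate the filter and the aggregate once per row."""
    total = 0
    for row in rows:
        if row["status"] == 200:
            total += row["bytes"]
    return total


def to_batches(rows, batch_size=BATCH_SIZE):
    """Split rows into batches, each holding one list per column."""
    for start in range(0, len(rows), batch_size):
        part = rows[start:start + batch_size]
        yield {
            "status": [r["status"] for r in part],
            "bytes": [r["bytes"] for r in part],
        }


def batch_at_a_time(rows):
    """Vectorized mode: run each operator over a whole column batch."""
    total = 0
    for batch in to_batches(rows):
        # Filter operator: produce the indexes of the selected rows.
        selected = [i for i, s in enumerate(batch["status"]) if s == 200]
        # Aggregate operator: consume only the selected positions.
        total += sum(batch["bytes"][i] for i in selected)
    return total


# Hypothetical data: every third row has a non-200 status.
rows = [{"status": 200 if i % 3 else 500, "bytes": i} for i in range(3000)]
```

Both functions return the same total. The difference is that the batch version calls each operator once per block, so a UDF has to accept whole columns as well, which is why UDF support appears among the remaining challenges.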
  • 15. Test
  • 16. Test
      • How do we test?
        – system-test
          ▪ Scheduled run
            – Hive 0.13 and Hive2
        – elephant-testing
          ▪ Scheduled run
            – Registers queries that have been problematic so far
  • 17. Test
      • What kind of problems happened?
        – Results differ
          ▪ Schema type problem
            – Null
            – Decimal point
          ▪ This also affects INSERT INTO/OVERWRITE
        – Specific UDFs do not work
          ▪ Compatibility between the jars used by Hive and the jars used by us
        – Cross join is not allowed by default
          ▪ Because of the hive.strict.checks.cartesian.product property
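The slides do not show how system-test compares the output of Hive 0.13 and Hive2, so the following is only a minimal sketch of one way to triage such differences. It compares two result sets regardless of row order and reports whether they differ only in how NULL and decimal values are written. The NULL spellings, the function names, and the sample rows are all assumptions.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation

NULL_TOKENS = {"NULL", "\\N"}  # assumed spellings of NULL in result files


def normalize_cell(cell):
    """Map NULL spellings to None and numeric strings to Decimal."""
    if cell is None or cell in NULL_TOKENS:
        return None
    try:
        value = Decimal(cell)
    except InvalidOperation:
        return cell  # not a number: keep the raw string
    return value if value.is_finite() else cell


def compare_results(old_rows, new_rows):
    """Compare two result sets, ignoring row order.

    Returns "same", "representation" (equal only after normalization),
    or "different".
    """
    if Counter(map(tuple, old_rows)) == Counter(map(tuple, new_rows)):
        return "same"
    old_norm = Counter(tuple(normalize_cell(c) for c in r) for r in old_rows)
    new_norm = Counter(tuple(normalize_cell(c) for c in r) for r in new_rows)
    return "representation" if old_norm == new_norm else "different"


# Hypothetical outputs of the same query on the two engines.
hive013 = [["a", "1.0", "NULL"], ["b", "2.50", "x"]]
hive2 = [["b", "2.5", "x"], ["a", "1", "\\N"]]
verdict = compare_results(hive013, hive2)
```

Reporting "representation" as its own verdict, instead of folding it into "same", keeps Null and decimal point differences visible, since the slides note that they also affect INSERT INTO/OVERWRITE.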
  • 18. Next plan
  • 19. Next plan
      • Alpha release next month
      • Beta and stable releases next year
      • Our new PlazmaDB
        – CBO support
      • Tez support
        – Last time was in 2015...
          ▪ 0.8.4 -> 0.9 (currently 0.9.1)
      • Hive3 support
  • 20. Thank You! Danke! Merci! 谢谢! Gracias! Kiitos!