SlideShare una empresa de Scribd logo
1 de 36
1 © Hortonworks Inc. 2011–2018. All rights reserved
Transactional Operations in Apache Hive
DataWorks Summit, San Jose 2018
• Eugene Koifman
2 © Hortonworks Inc. 2011–2018. All rights reserved
Agenda
• A bit of history
• Current Functionality
• Design
• Future Plans
• Closing Remarks
3 © Hortonworks Inc. 2011–2018. All rights reserved
Early Hive
• Transactions
• ACID: Atomicity, Consistency, Isolation, Durability
• Atomicity - Rely on File System ‘rename’
• Insert into T partition(p=1) select …. - OK
• Dynamic Partition Write – not OK
• Multi-Insert statement – not OK
• FROM <expr> Insert into A select … Insert Into B select …
• Isolation - Lock Manager
• S/X locks – not good for long running analytics
4 © Hortonworks Inc. 2011–2018. All rights reserved
Early Hive – Changing Existing Data
• Drop <…>
• Insert Overwrite = Truncate + Insert
• Gets expensive if done often on small % of data
5 © Hortonworks Inc. 2011–2018. All rights reserved
Goals
• Support ACID properties
• Support SQL Update/Delete/Merge
• Low rate of transactions
• Not OLTP
• Not a replacement for MySql or HBase
6 © Hortonworks Inc. 2011–2018. All rights reserved
Features – Hive 3
7 © Hortonworks Inc. 2011–2018. All rights reserved
Transactional Tables
• Not all tables support transactional semantics
• Managed Tables
• No External tables or Storage Handler (Hbase, Druid, etc)
• Fully ACID compliant
• Single statement transactions
• Cross partition/cross table transactions
• Snapshot Isolation
• Between Serializable and Repeatable Read
8 © Hortonworks Inc. 2011–2018. All rights reserved
Transactional Tables – Full CRUD
 Supports Update/Delete/Merge
 CREATE TABLE T(a int, b int) STORED AS ORC TBLPROPERTIES ('transactional'='true');
• Restrictions
• Managed Table
• Table cannot be sorted
• Currently requires ORC File but anything implementing
• AcidInputFormat/AcidOutputFormat
• Bucketing is optional!
• If upgrading from Hive 2
• Requires Major Compaction before Upgrading
9 © Hortonworks Inc. 2011–2018. All rights reserved
Transactional Tables – Insert only
 CREATE TABLE T(a int, b int) TBLPROPERTIES ('transactional'='true’,
‘transactional_properties’=‘insert_only’);
• Managed Table
• Any storage format
10 © Hortonworks Inc. 2011–2018. All rights reserved
Transactional Tables – Convert from flat tables
 ALTER TABLE T SET TBLPROPERTIES ('transactional'='true')
 ALTER TABLE T(a int, b int) SET TBLPROPERTIES ('transactional'='true’,
‘transactional_properties’=‘true’);
• Metadata Only operation
• Compaction will eventually rewrite the table
11 © Hortonworks Inc. 2011–2018. All rights reserved
Transactional Tables - New In Hive 3
• Alter Table Add Partition…
• Alter Table T Concatenate
• Alter Table T Rename To….
• Export/Import Table
• Non-bucketed tables
• Load Data… Into Table …
• Insert Overwrite
• Fully Vectorized
• Create Table As …
• LLAP Cache
• Predicate Push Down
12 © Hortonworks Inc. 2011–2018. All rights reserved
Design – Hive 3
13 © Hortonworks Inc. 2011–2018. All rights reserved
Transactional Tables – Insert Only
• Transaction Manager
• Begin transaction and obtain a Transaction ID
• For each table, get a Write ID – determines location to write to
create table TM (a int, b int) TBLPROPERTIES
('transactional'='true',
'transactional_properties'='insert_only');
insert into TM values(1,1);
insert into TM values(2,2);
insert into TM values(3,3);
tm
── delta_0000001_0000001_0000
└── 000000_0
── delta_0000002_0000002_0000
└── 000000_0
── delta_0000003_0000003_0000
└── 000000_0
14 © Hortonworks Inc. 2011–2018. All rights reserved
Transaction Manager
• Transaction State
• Open, Committed, Aborted
• Reader at Snapshot Isolation
• A snapshot is the state of all transactions
• High Water Mark + List of Exceptions
tm
── delta_0000001_0000001_0000
└── 000000_0
── delta_0000002_0000002_0000
└── 000000_0
── delta_0000003_0000003_0000
└── 000000_0
 Atomicity & Isolation
15 © Hortonworks Inc. 2011–2018. All rights reserved
Full CRUD
• No in-place Delete - Append-only file system
• Isolate readers from writers
16 © Hortonworks Inc. 2011–2018. All rights reserved
ROW__ID
• CREATE TABLE acidtbl (a INT, b STRING) STORED AS ORC TBLPROPERTIES
('transactional'='true');
Metadata Columns original_write_id
bucket_id
row_id
current_write_id
User Columns col_1:
a : INT
col_2:
b : STRING
ROW__ID
17 © Hortonworks Inc. 2011–2018. All rights reserved
Create
• INSERT INTO acidtbl (a,b) VALUES (100, “foo”), (200, “xyz”), (300, “bee”);
ROW__ID a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
delta_00001_00001/bucket_0000
18 © Hortonworks Inc. 2011–2018. All rights reserved
Delete
• DELETE FROM acidTbl where a = 200;
ROW__ID a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ROW__ID a b
{ 1, 0, 1 } null null
delta_00001_00001/bucket_0000
delete_delta_00002_00002/bucket_0000
 Readers skip deleted rows
19 © Hortonworks Inc. 2011–2018. All rights reserved
Update
• Update = delete + insert
 UPDATE acidTbl SET b = “bar” where a = 300;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
delta_00001_00001/bucket_0000
ACID_PK a b
{ 2, 0, 0 } 300 “bar”
ACID_PK a b
{ 1, 0, 2 } null null
delta_00003_00003/bucket_0000 delete_delta_00003_00003/bucket_0000
20 © Hortonworks Inc. 2011–2018. All rights reserved
Read
• Ask Transaction Manager for Snapshot Information
• Decide which deltas are relevant
• Take all the files in delta_x_x/ and split them into chunks for each processing Task to
work with
• Localize all delete events from each delete_deleta_x_x/ to each task
• Highly Compressed with ORC
• Filter out all Insert events that have matching delete events
• Requires an Acid aware reader – thus AcidInputFormat
21 © Hortonworks Inc. 2011–2018. All rights reserved
Design - Compactor
• More Update operations = more delete events – make reads more expensive
• Insert operations don’t add read overhead
22 © Hortonworks Inc. 2011–2018. All rights reserved
Design - Compactor
• Compactor rewrites the table in the background
• Minor compaction - merges delta files into fewer deltas
• Major compactor merges deltas with base - more expensive
• This amortizes the cost of updates and self tunes the tables
• Makes ORC more efficient - larger stripes, better compression
• Compaction can be triggered automatically or on demand
• There are various configuration options to control when the process kicks in.
• Compaction itself is a Map-Reduce job
 Key design principle is that compactor does not affect readers/writers
• Cleaner process – removes obsolete files
• Requires Standalone metastore
23 © Hortonworks Inc. 2011–2018. All rights reserved
Merge Statement – SQL Standard 2011 (Hive 2.2)
ID State County Value
1 CA LA 19.0
2 MA Norfolk 15.0
7 MA Suffolk 50.15
16 CA Orange 9.1
ID State Value
1 20.0
7 80.0
100 NH 6.0
MERGE INTO TARGET T
USING SOURCE S ON T.ID=S.ID
WHEN MATCHED THEN
UPDATE SET T.Value=S.Value
WHEN NOT MATCHED
INSERT (ID,State,Value)
VALUES(S.ID, S.State, S.Value)
ID State County Value
1 CA LA 20.0
2 MA Norfolk 15.0
7 MA Suffolk 80.0
16 CA Orange 9.1
100 NH null 6.0
24 © Hortonworks Inc. 2011–2018. All rights reserved
SQL Merge
Target
Source
ACID_PK ID Stat
e
County Value
{ 2, 0, 1 } 1 CA LA 20.0
{ 2, 0, 2 } 7 MA Suffolk 80.0
ACID_PK ID State County Value
{ 2, 0, 1 } 100 NH 6.0
delta_00002_00002/bucket_0000
delta_00002_00002_001/bucket_0000
Right Outer Join
ON T.ID=S.ID
ACID_PK Data
{ 1, 0, 1 } null
{ 1, 0, 3 } null
delete_delta_00002_00002/bucket_0000
WHEN MATCHED
WHEN NOT MATCHED
25 © Hortonworks Inc. 2011–2018. All rights reserved
Merge Statement Optimizations
• Semi Join Reduction
• aka Dynamic Runtime Filtering
• On Tez only
T.ID=S.ID
Target Source
ID in (1,7,100)
T.ID=S.ID
Target Source
26 © Hortonworks Inc. 2011–2018. All rights reserved
Design - Concurrency
• Inserts are never in conflict since Hive does not enforce unique constraints
• Write Set tracking to prevent Write-Write conflicts in concurrent transactions
• Lock Manager
• DDL operations acquire eXclusive locks – metadata operations
• Read operations acquire Shared locks
27 © Hortonworks Inc. 2011–2018. All rights reserved
Tooling
• SHOW COMPACTIONS
• Hadoop Job ID
• SHOW TRANSACTIONS
• SHOW LOCKS
• What a lock is blocked on
• ABORT TRANSACTIONS txnid1, txnid2….
28 © Hortonworks Inc. 2011–2018. All rights reserved
Other Subsystems
• Result Set Caching
• Is it valid for current reader?
• Materialized Views
• Incremental View Manitenance
• Spark
• HiveWarehouseConnector: HS2 + LLAP
29 © Hortonworks Inc. 2011–2018. All rights reserved
Streaming Ingest API
• Connection – Hive Table
• Begin transaction
• Commit/Abort transaction
• org.apache.hive.streaming.StreamingConnection
• Writer
• Write records
• org.apache.hive.streaming.RecordWriter
• Append Only via this API
• Update/Delete via SQL
• Optimized for Write operations
• Requires more aggressive Compaction for efficient reads
• Supports dynamic partitioning in a single transaction
30 © Hortonworks Inc. 2011–2018. All rights reserved
Limitations
• Transaction Manager
• State is persisted in the metastore RDBMS
• Begin/Commit/Abort
• Metastore calls
31 © Hortonworks Inc. 2011–2018. All rights reserved
Future
32 © Hortonworks Inc. 2011–2018. All rights reserved
Future Work
• Multi statement transactions, i.e. BEGIN TRANSACTION/COMMIT/ROLLBACK
• Performance
• Smarter Compaction
• Finer grained concurrency management/conflict detection
• Read Committed w/Lock Based scheduling
• Better Monitoring/Alerting
• User define Primary Key
• Transactional Tables sorted on PK
33 © Hortonworks Inc. 2011–2018. All rights reserved
Further Reading
34 © Hortonworks Inc. 2011–2018. All rights reserved
Etc
• Documentation
• https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
• https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2
• Follow/Contribute
• https://issues.apache.org/jira/browse/HIVE-
14004?jql=project%20%3D%20HIVE%20AND%20component%20%3D%20Transactions
• user@hive.apache.org
• dev@hive.apache.org
35 © Hortonworks Inc. 2011–2018. All rights reserved
Credits
• Alan Gates
• Sankar Hariappan
• Prasanth Jayachandran
• Eugene Koifman
• Owen O’Malley
• Saket Saurabh
• Sergey Shelukhin
• Gopal Vijayaraghavan
• Wei Zheng
36 © Hortonworks Inc. 2011–2018. All rights reserved
Thank You

Más contenido relacionado

La actualidad más candente

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...StampedeCon
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides Altinity Ltd
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUponHBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUponCloudera, Inc.
 

La actualidad más candente (20)

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUponHBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
 

Similar a Transactional operations in Apache Hive: present and future

ACID Transactions in Hive
ACID Transactions in HiveACID Transactions in Hive
ACID Transactions in HiveEugene Koifman
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive DataWorks Summit
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?DataWorks Summit
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019alanfgates
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceDataWorks Summit/Hadoop Summit
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
Hive present-and-feature-shanghai
Hive present-and-feature-shanghaiHive present-and-feature-shanghai
Hive present-and-feature-shanghaiYifeng Jiang
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 
Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015alanfgates
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseenissoz
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Hortonworks
 

Similar a Transactional operations in Apache Hive: present and future (20)

Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Apache Hive ACID Project
Apache Hive ACID ProjectApache Hive ACID Project
Apache Hive ACID Project
 
ACID Transactions in Hive
ACID Transactions in HiveACID Transactions in Hive
ACID Transactions in Hive
 
HiveACIDPublic
HiveACIDPublicHiveACIDPublic
HiveACIDPublic
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
Hive present-and-feature-shanghai
Hive present-and-feature-shanghaiHive present-and-feature-shanghai
Hive present-and-feature-shanghai
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
 
April 2014 HUG : Apache Phoenix
April 2014 HUG : Apache PhoenixApril 2014 HUG : Apache Phoenix
April 2014 HUG : Apache Phoenix
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Transactional operations in Apache Hive: present and future

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved Transactional Operations in Apache Hive DataWorks Summit, San Jose 2018 • Eugene Koifman
  • 2. 2 © Hortonworks Inc. 2011–2018. All rights reserved Agenda • A bit of history • Current Functionality • Design • Future Plans • Closing Remarks
  • 3. 3 © Hortonworks Inc. 2011–2018. All rights reserved Early Hive • Transactions • ACID: Atomicity, Consistency, Isolation, Durability • Atomicity - Rely on File System ‘rename’ • Insert into T partition(p=1) select …. - OK • Dynamic Partition Write – not OK • Multi-Insert statement – not OK • FROM <expr> Insert into A select … Insert Into B select … • Isolation - Lock Manager • S/X locks – not good for long running analytics
  • 4. 4 © Hortonworks Inc. 2011–2018. All rights reserved Early Hive – Changing Existing Data • Drop <…> • Insert Overwrite = Truncate + Insert • Gets expensive if done often on small % of data
  • 5. 5 © Hortonworks Inc. 2011–2018. All rights reserved Goals • Support ACID properties • Support SQL Update/Delete/Merge • Low rate of transactions • Not OLTP • Not a replacement for MySql or HBase
  • 6. 6 © Hortonworks Inc. 2011–2018. All rights reserved Features – Hive 3
  • 7. 7 © Hortonworks Inc. 2011–2018. All rights reserved Transactional Tables • Not all tables support transactional semantics • Managed Tables • No External tables or Storage Handler (Hbase, Druid, etc) • Fully ACID compliant • Single statement transactions • Cross partition/cross table transactions • Snapshot Isolation • Between Serializable and Repeatable Read
  • 8. 8 © Hortonworks Inc. 2011–2018. All rights reserved Transactional Tables – Full CRUD  Supports Update/Delete/Merge  CREATE TABLE T(a int, b int) STORED AS ORC TBLPROPERTIES ('transactional'='true'); • Restrictions • Managed Table • Table cannot be sorted • Currently requires ORC File but anything implementing • AcidInputFormat/AcidOutputFormat • Bucketing is optional! • If upgrading from Hive 2 • Requires Major Compaction before Upgrading
  • 9. 9 © Hortonworks Inc. 2011–2018. All rights reserved Transactional Tables – Insert only  CREATE TABLE T(a int, b int) TBLPROPERTIES ('transactional'='true’, ‘transactional_properties’=‘insert_only’); • Managed Table • Any storage format
  • 10. 10 © Hortonworks Inc. 2011–2018. All rights reserved Transactional Tables – Convert from flat tables  ALTER TABLE T SET TBLPROPERTIES ('transactional'='true')  ALTER TABLE T(a int, b int) SET TBLPROPERTIES ('transactional'='true’, ‘transactional_properties’=‘true’); • Metadata Only operation • Compaction will eventually rewrite the table
  • 11. 11 © Hortonworks Inc. 2011–2018. All rights reserved Transactional Tables - New In Hive 3 • Alter Table Add Partition… • Alter Table T Concatenate • Alter Table T Rename To…. • Export/Import Table • Non-bucketed tables • Load Data… Into Table … • Insert Overwrite • Fully Vectorized • Create Table As … • LLAP Cache • Predicate Push Down
  • 12. 12 © Hortonworks Inc. 2011–2018. All rights reserved Design – Hive 3
  • 13. 13 © Hortonworks Inc. 2011–2018. All rights reserved Transactional Tables – Insert Only • Transaction Manager • Begin transaction and obtain a Transaction ID • For each table, get a Write ID – determines location to write to create table TM (a int, b int) TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only'); insert into TM values(1,1); insert into TM values(2,2); insert into TM values(3,3); tm ── delta_0000001_0000001_0000 └── 000000_0 ── delta_0000002_0000002_0000 └── 000000_0 ── delta_0000003_0000003_0000 └── 000000_0
  • 14. 14 © Hortonworks Inc. 2011–2018. All rights reserved Transaction Manager • Transaction State • Open, Committed, Aborted • Reader at Snapshot Isolation • A snapshot is the state of all transactions • High Water Mark + List of Exceptions tm ── delta_0000001_0000001_0000 └── 000000_0 ── delta_0000002_0000002_0000 └── 000000_0 ── delta_0000003_0000003_0000 └── 000000_0  Atomicity & Isolation
  • 15. 15 © Hortonworks Inc. 2011–2018. All rights reserved Full CRUD • No in-place Delete - Append-only file system • Isolate readers from writers
  • 16. 16 © Hortonworks Inc. 2011–2018. All rights reserved ROW__ID • CREATE TABLE acidtbl (a INT, b STRING) STORED AS ORC TBLPROPERTIES ('transactional'='true'); Metadata Columns original_write_id bucket_id row_id current_write_id User Columns col_1: a : INT col_2: b : STRING ROW__ID
  • 17. 17 © Hortonworks Inc. 2011–2018. All rights reserved Create • INSERT INTO acidtbl (a,b) VALUES (100, “foo”), (200, “xyz”), (300, “bee”); ROW__ID a b { 1, 0, 0 } 100 “foo” { 1, 0, 1 } 200 “xyz” { 1, 0, 2 } 300 “bee” delta_00001_00001/bucket_0000
  • 18. 18 © Hortonworks Inc. 2011–2018. All rights reserved Delete • DELETE FROM acidTbl where a = 200; ROW__ID a b { 1, 0, 0 } 100 “foo” { 1, 0, 1 } 200 “xyz” { 1, 0, 2 } 300 “bee” ROW__ID a b { 1, 0, 1 } null null delta_00001_00001/bucket_0000 delete_delta_00002_00002/bucket_0000  Readers skip deleted rows
  • 19. 19 © Hortonworks Inc. 2011–2018. All rights reserved Update • Update = delete + insert  UPDATE acidTbl SET b = “bar” where a = 300; ACID_PK a b { 1, 0, 0 } 100 “foo” { 1, 0, 1 } 200 “xyz” { 1, 0, 2 } 300 “bee” delta_00001_00001/bucket_0000 ACID_PK a b { 2, 0, 0 } 300 “bar” ACID_PK a b { 1, 0, 2 } null null delta_00003_00003/bucket_0000 delete_delta_00003_00003/bucket_0000
  • 20. 20 © Hortonworks Inc. 2011–2018. All rights reserved Read • Ask Transaction Manager for Snapshot Information • Decide which deltas are relevant • Take all the files in delta_x_x/ and split them into chunks for each processing Task to work with • Localize all delete events from each delete_deleta_x_x/ to each task • Highly Compressed with ORC • Filter out all Insert events that have matching delete events • Requires an Acid aware reader – thus AcidInputFormat
  • 21. 21 © Hortonworks Inc. 2011–2018. All rights reserved Design - Compactor • More Update operations = more delete events – make reads more expensive • Insert operations don’t add read overhead
  • 22. 22 © Hortonworks Inc. 2011–2018. All rights reserved Design - Compactor • Compactor rewrites the table in the background • Minor compaction - merges delta files into fewer deltas • Major compactor merges deltas with base - more expensive • This amortizes the cost of updates and self tunes the tables • Makes ORC more efficient - larger stripes, better compression • Compaction can be triggered automatically or on demand • There are various configuration options to control when the process kicks in. • Compaction itself is a Map-Reduce job  Key design principle is that compactor does not affect readers/writers • Cleaner process – removes obsolete files • Requires Standalone metastore
  • 23. 23 © Hortonworks Inc. 2011–2018. All rights reserved Merge Statement – SQL Standard 2011 (Hive 2.2) ID State County Value 1 CA LA 19.0 2 MA Norfolk 15.0 7 MA Suffolk 50.15 16 CA Orange 9.1 ID State Value 1 20.0 7 80.0 100 NH 6.0 MERGE INTO TARGET T USING SOURCE S ON T.ID=S.ID WHEN MATCHED THEN UPDATE SET T.Value=S.Value WHEN NOT MATCHED INSERT (ID,State,Value) VALUES(S.ID, S.State, S.Value) ID State County Value 1 CA LA 20.0 2 MA Norfolk 15.0 7 MA Suffolk 80.0 16 CA Orange 9.1 100 NH null 6.0
  • 24. 24 © Hortonworks Inc. 2011–2018. All rights reserved SQL Merge Target Source ACID_PK ID Stat e County Value { 2, 0, 1 } 1 CA LA 20.0 { 2, 0, 2 } 7 MA Suffolk 80.0 ACID_PK ID State County Value { 2, 0, 1 } 100 NH 6.0 delta_00002_00002/bucket_0000 delta_00002_00002_001/bucket_0000 Right Outer Join ON T.ID=S.ID ACID_PK Data { 1, 0, 1 } null { 1, 0, 3 } null delete_delta_00002_00002/bucket_0000 WHEN MATCHED WHEN NOT MATCHED
  • 25. 25 © Hortonworks Inc. 2011–2018. All rights reserved Merge Statement Optimizations • Semi Join Reduction • aka Dynamic Runtime Filtering • On Tez only T.ID=S.ID Target Source ID in (1,7,100) T.ID=S.ID Target Source
  • 26. 26 © Hortonworks Inc. 2011–2018. All rights reserved Design - Concurrency • Inserts are never in conflict since Hive does not enforce unique constraints • Write Set tracking to prevent Write-Write conflicts in concurrent transactions • Lock Manager • DDL operations acquire eXclusive locks – metadata operations • Read operations acquire Shared locks
  • 27. 27 © Hortonworks Inc. 2011–2018. All rights reserved Tooling • SHOW COMPACTIONS • Hadoop Job ID • SHOW TRANSACTIONS • SHOW LOCKS • What a lock is blocked on • ABORT TRANSACTIONS txnid1, txnid2….
  • 28. 28 © Hortonworks Inc. 2011–2018. All rights reserved Other Subsystems • Result Set Caching • Is it valid for current reader? • Materialized Views • Incremental View Manitenance • Spark • HiveWarehouseConnector: HS2 + LLAP
  • 29. 29 © Hortonworks Inc. 2011–2018. All rights reserved Streaming Ingest API • Connection – Hive Table • Begin transaction • Commit/Abort transaction • org.apache.hive.streaming.StreamingConnection • Writer • Write records • org.apache.hive.streaming.RecordWriter • Append Only via this API • Update/Delete via SQL • Optimized for Write operations • Requires more aggressive Compaction for efficient reads • Supports dynamic partitioning in a single transaction
  • 30. 30 © Hortonworks Inc. 2011–2018. All rights reserved Limitations • Transaction Manager • State is persisted in the metastore RDBMS • Begin/Commit/Abort • Metastore calls
  • 31. 31 © Hortonworks Inc. 2011–2018. All rights reserved Future
  • 32. 32 © Hortonworks Inc. 2011–2018. All rights reserved Future Work • Multi statement transactions, i.e. BEGIN TRANSACTION/COMMIT/ROLLBACK • Performance • Smarter Compaction • Finer grained concurrency management/conflict detection • Read Committed w/Lock Based scheduling • Better Monitoring/Alerting • User define Primary Key • Transactional Tables sorted on PK
  • 33. 33 © Hortonworks Inc. 2011–2018. All rights reserved Further Reading
  • 34. 34 © Hortonworks Inc. 2011–2018. All rights reserved Etc • Documentation • https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions • https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2 • Follow/Contribute • https://issues.apache.org/jira/browse/HIVE- 14004?jql=project%20%3D%20HIVE%20AND%20component%20%3D%20Transactions • user@hive.apache.org • dev@hive.apache.org
  • 35. 35 © Hortonworks Inc. 2011–2018. All rights reserved Credits • Alan Gates • Sankar Hariappan • Prasanth Jayachandran • Eugene Koifman • Owen O’Malley • Saket Saurabh • Sergey Shelukhin • Gopal Vijayaraghavan • Wei Zheng
  • 36. 36 © Hortonworks Inc. 2011–2018. All rights reserved Thank You

Notas del editor

  1. Similar to LSM
  2. Target is the table inside the Warehouse Source table contains the changes to apply
  3. Update this for split update Runtime filtering, etc