HiveServer2
Oct., 2013
Schubert Zhang
Hive Evolution
• Original
  • Let users express their queries in a high-level language without having to write MapReduce programs.
  • Mainly targeted at ad-hoc queries.
  • As a data tool, usually used in CLI mode.
• Now more …
  • A parallel SQL DBMS that happens to use Hadoop for its storage and execution layers.
  • Ad-hoc + regular
  • As a service …
Introduction
• Limitations of HiveServer1
  • Concurrency
  • Security
  • Client Interface
  • Stability
• Sessions/Concurrency
  • The old Thrift API and server implementation didn't support concurrency.
• xDBC
  • The old Thrift API didn't support common xDBC (JDBC/ODBC) interfaces.
• Authentication/Authorization
  • Incomplete implementations
• Auditing/Logging
HiveServer2:
• From hive-0.11 / CDH4.1
• Redesigned and re-implemented (HIVE-2935).
• HiveServer2 is a container for the Hive execution engine (Driver).
• For each client connection, it creates a new execution context (Connection and Session) that serves Hive SQL requests from the client.
• The new RPC interface enables the server to associate this Hive execution context with the thread serving the client's request.
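A minimal sketch of the last bullet, assuming a ThreadLocal-based design: the thread serving a request binds that client's context before running it and restores the previous binding afterwards. `SessionContext` and `ContextBinder` are hypothetical names for illustration, not HiveServer2 classes.

```java
import java.util.function.Supplier;

// Hypothetical per-client execution context; stands in for Session + HiveConf + SessionState.
final class SessionContext {
    final String sessionId;

    SessionContext(String sessionId) {
        this.sessionId = sessionId;
    }
}

// Hypothetical helper that binds a SessionContext to the thread serving a request.
final class ContextBinder {
    private static final ThreadLocal<SessionContext> CURRENT = new ThreadLocal<>();

    private ContextBinder() {}

    // Returns the context bound to the calling thread, failing loudly if none is bound.
    static SessionContext current() {
        SessionContext ctx = CURRENT.get();
        if (ctx == null) {
            throw new IllegalStateException("no session bound to this thread");
        }
        return ctx;
    }

    // Runs one request with ctx bound, then restores the previous binding so a pooled
    // worker thread never carries one client's context into the next request.
    static <T> T runWith(SessionContext ctx, Supplier<T> request) {
        SessionContext previous = CURRENT.get();
        CURRENT.set(ctx);
        try {
            return request.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }
}
```

Because the binding is per thread, code deep inside the execution engine can call `ContextBinder.current()` without the context being passed through every method, while concurrent clients on other threads see their own contexts.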
Architecture
[Figures: System Architecture; Authentication Architecture (not discussed here). Source: @Cloudera, http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/]
• In fact, the Driver lives in the Operation context.
Architecture: Internal
• Core Contexts
  • Connections
  • Sessions
  • Operations
• Operation Path …
[Figure: internal architecture; recoverable labels below]
  • hiveServer2 (main entry): starts thriftCLIService.
  • thriftCLIService (TThreadPoolServer, implements the client Thrift RPC Iface): listen() and accept() new client connections (Client-1, Client-2, …) and process each one in its own thread (Threads for Client Connections).
  • Each connection thread calls cliService through the ICLIService internal interface.
  • cliService (real implementations of the various operations): open/close sessions, run operations in existing sessions …
  • sessionManager (handleToSessionMap): holds the sessions; each session implements the HiveSession interface and owns its own HiveConf and SessionState. It also owns backgroundOperationPool (Threads for Async Operations), which runs runAsync operations.
  • operationManager (handleToOperationMap): creates and runs operations, sync/async. An SQLOp creates and runs a Hive Driver.
  • Operation types: SQLOp/SetOp/DfsOp/AddResourceOp/DeleteResourceOp ..; GetTypeInfoOp/GetCatalogsOp/GetSchemasOp/GetTablesOp/GetTableTypesOp/GetColumnsOp/GetFunctionsOp ...
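A heavily simplified sketch of the operation path for a runAsync statement, assuming handle maps keyed by UUID and a fixed-size background pool. `MiniSessionManager` and its methods are hypothetical; the lambda is a placeholder for creating and running a Hive Driver, and one class plays both the sessionManager and operationManager roles.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model: handleToSession -> handleToOperation -> backgroundOperationPool.
final class MiniSessionManager implements AutoCloseable {
    private final Map<UUID, String> handleToSession = new ConcurrentHashMap<>();
    private final Map<UUID, Future<String>> handleToOperation = new ConcurrentHashMap<>();
    private final ExecutorService backgroundOperationPool;

    MiniSessionManager(int asyncThreads) {
        // Fixed-size pool; its size plays the role of hive.server2.async.exec.threads.
        backgroundOperationPool = Executors.newFixedThreadPool(asyncThreads);
    }

    UUID openSession(String user) {
        UUID handle = UUID.randomUUID();
        handleToSession.put(handle, user);
        return handle;
    }

    // Submits the statement to a pool thread and returns an operation handle immediately.
    UUID executeStatementAsync(UUID sessionHandle, String statement) {
        if (!handleToSession.containsKey(sessionHandle)) {
            throw new IllegalArgumentException("unknown session " + sessionHandle);
        }
        // Placeholder for creating and running a Hive Driver.
        Future<String> result = backgroundOperationPool.submit(() -> "ran: " + statement);
        UUID opHandle = UUID.randomUUID();
        handleToOperation.put(opHandle, result);
        return opHandle;
    }

    // Non-blocking status check, the kind of thing a client polls for.
    boolean isDone(UUID opHandle) {
        return lookup(opHandle).isDone();
    }

    // Blocks until the operation finishes (or times out) and returns its result.
    String fetchResults(UUID opHandle, long timeoutSeconds) {
        try {
            return lookup(opHandle).get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }

    private Future<String> lookup(UUID opHandle) {
        Future<String> f = handleToOperation.get(opHandle);
        if (f == null) {
            throw new IllegalArgumentException("unknown operation " + opHandle);
        }
        return f;
    }

    // Stops accepting work and waits a bounded time, similar in spirit to a shutdown timeout.
    @Override
    public void close() {
        backgroundOperationPool.shutdown();
        try {
            if (!backgroundOperationPool.awaitTermination(10, TimeUnit.SECONDS)) {
                backgroundOperationPool.shutdownNow();
            }
        } catch (InterruptedException e) {
            backgroundOperationPool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

The connection thread only pays for a map insert and a queue submit, so it returns to the client at once; the bounded pool caps how many statements run concurrently, regardless of how many connections are open.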
Architecture: Server Context
• Client
• Connection (Thread)
• Session (-> HiveConf, SessionState)
• Operation (-> Driver)
• Usually, a client opens only one Session per Connection (see the JDBC HiveDriver: HiveConnection).
[Figure: Client-1 -> Connection-1 (Thread) -> Session-11 and Session-12; Session-12 runs Op-121 (SQL, with its own Driver), Op-122 and Op-123 (SQL, with its own Driver). Client-2 -> Connection-2 (Thread) -> Session.]
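A small single-threaded sketch of the containment shown in the figure, under stated assumptions: each SQL operation owns its own driver, and closing a session closes every operation it still owns. `MiniSession`, `MiniOperation` and `StubDriver` are hypothetical stand-ins, not the real HiveSession, Operation or Driver classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Hive Driver; each SQL operation gets its own instance.
final class StubDriver {
    boolean closed;

    void close() { closed = true; }
}

// Hypothetical operation: owns one driver and releases it when closed.
final class MiniOperation {
    final String id;
    final StubDriver driver = new StubDriver();
    boolean closed;

    MiniOperation(String id) { this.id = id; }

    void close() {
        driver.close(); // real cleanup would also delete temp output files
        closed = true;
    }
}

// Hypothetical session: tracks the operations it created (compare opHandleSet).
final class MiniSession {
    final String id;
    private final Map<String, MiniOperation> operations = new LinkedHashMap<>();
    private int nextOp = 1;

    MiniSession(String id) { this.id = id; }

    MiniOperation newSqlOperation() {
        MiniOperation op = new MiniOperation(id + "-op" + nextOp++);
        operations.put(op.id, op);
        return op;
    }

    int openOperations() { return operations.size(); }

    // Closing a session closes and removes every operation it still owns.
    List<MiniOperation> close() {
        List<MiniOperation> closed = new ArrayList<>(operations.values());
        for (MiniOperation op : closed) {
            op.close();
        }
        operations.clear();
        return closed;
    }
}
```

Giving each operation its own driver keeps per-query state (plan, temp files, result cursor) separate, so two statements in one session cannot clobber each other.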

New Client API
• TCLIService.thrift
  • Complete API
  • Complete database API
    • Think about JDBC/ODBC
    • To be compatible with existing DB software
  • Hive-specific API
  • Best practice
• Client API vs. internal API
  • Conversion and isolation
[Figure: API groups: SQL and Hive Operation; Hive Command Operation; DB Metadata Operation; Operation for Operation; Get Result]

• OpenSession: the client requests a new session. The server creates a new HiveSession and returns a unique SessionHandle (UUID). All other calls depend on this session.
• CloseSession: the client requests to close the session. This also closes and removes all operations in the session.
• ExecuteStatement: execute a HQL statement (SQLOp). Some SQL statements can be tagged "runAsync"; they are then executed in a dedicated thread and the call returns immediately. Hive commands are executed as SetOp, DfsOp, AddResourceOp, DeleteResourceOp.
• GetInfo *: get various global variables of Hive (Key-Type->Value).
• GetTypeInfo: get the detailed description and constraints of the data types.
• GetCatalogs: does nothing so far.
• GetSchemas: get schemas from the metastore.
• GetTables: get table schemas from the metastore.
• GetTableTypes: get the table types, e.g. MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW, INDEX_TABLE.
• GetColumns: get the columns of a table from the metastore.
• GetFunctions: get the UDF functions.
• GetOperationStatus: get the state of an operation by its OperationHandle: INITIALIZED/RUNNING/FINISHED/CANCELED/CLOSED/ERROR/UNKNOWN/PENDING.
• CancelOperation: cancel a RUNNING or PENDING operation by its OperationHandle. For an SQLOp, do cleanup: close and destroy the Hive Driver, delete temp output files, and cancel the task running in the background thread…
• CloseOperation: remove the operation and close it: for an SQLOp, do cleanup; for a HiveCommandOp, tearDownSessionIO.
• GetResultSetMetadata: get the result set's schema, such as the column titles.
• FetchResults: fetch result rows from the actual result set.
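The GetOperationStatus, CancelOperation and CloseOperation entries imply a small state machine. A sketch follows; the state names come from the list above, but the transition table is an assumption for illustration, not taken from Hive's source, and `OperationStatus` is a hypothetical class.

```java
import java.util.EnumSet;
import java.util.Set;

// Operation states as listed for GetOperationStatus.
enum OpState {
    INITIALIZED, PENDING, RUNNING, FINISHED, CANCELED, CLOSED, ERROR, UNKNOWN;

    // Assumed transition table for illustration only.
    Set<OpState> allowedNext() {
        switch (this) {
            case INITIALIZED: return EnumSet.of(PENDING, RUNNING, CANCELED, CLOSED);
            case PENDING:     return EnumSet.of(RUNNING, CANCELED, CLOSED);
            case RUNNING:     return EnumSet.of(FINISHED, CANCELED, ERROR, CLOSED);
            case FINISHED:
            case CANCELED:
            case ERROR:       return EnumSet.of(CLOSED);
            default:          return EnumSet.noneOf(OpState.class); // CLOSED, UNKNOWN
        }
    }
}

// Hypothetical holder for one operation's state; synchronized so RPC threads and
// background threads observe consistent transitions.
final class OperationStatus {
    private OpState state = OpState.INITIALIZED;

    synchronized OpState get() { return state; }

    synchronized void transition(OpState next) {
        if (!state.allowedNext().contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }

    // CancelOperation: only a RUNNING or PENDING operation can be cancelled.
    synchronized boolean cancel() {
        if (state != OpState.RUNNING && state != OpState.PENDING) {
            return false;
        }
        state = OpState.CANCELED; // a real SQLOp would also destroy its Driver and temp files
        return true;
    }

    // CloseOperation: move to the terminal CLOSED state.
    synchronized void close() { transition(OpState.CLOSED); }
}
```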
Code
• Packages
  • org.apache.hive.service …, top project of apache…
• Pros
  • Clear implementation
  • Decoupling of HiveServer2 and HiveCore
  • Decoupling of the Thrift client API and internal code
• Cons
  • Too many design patterns.
  • Inconsistent principles in places.
  • Still not a complete decoupling of HiveServer2 and HiveCore.
  • The JDBC driver package/jar still relies on a lot of other core code, such as Hive -> Hadoop and their libs… (maybe because of the support for Embedded Mode).
Code: Class Diagram
• Service: +state; +init(), +start(), +stop(), +register(): StateChangeListener
  • AbstractService: +HiveConf: global, set by init()
    • CompositeService: +serviceList; +addService(), +removeService()
      • HiveServer2: +main(): entry point
• ThriftCLIService (implements TCLIService.Iface): +cliService; +OpenSession(), +CloseSession(), +GetInfo(), +ExecuteStatement(), +...(), +FetchResults()
  • ThriftBinaryCLIService (runs a TThreadPoolServer)
• ICLIService: +openSession(), +closeSession(), +getInfo(), +executeStatement(), +...(), +fetchResults()
  • CLIService: +sessionManager
• SessionManager: +handleToSession: HashMap, +operationManager, +backgroundOperationPool (FixedThreadPool); +openSession(), +closeSession(), +getSession(), +...(), +submitBackgroundOperation()
• OperationManager: +handleToOperation: HashMap; +newExecuteStatementOperation(), +newGetTypeInfoOperation(), +...(), +addOperation(), +removeOperation(), +getOperation(), +getOperationState(), +cancelOperation(), +closeOperation(), +getOperationNextRowSet(), +...()
• HiveSession / HiveSessionImpl: +sessionHandle, +hiveConf: new for each session, +sessionState: new for each session, +opHandleSet; +getSessionHandle(), +getInfo(), +executeStatement(), +executeStatementAsync(), +...(), +fetchResults()
• Operation: +opHandle, +parentSession, +state; +getState(), +setState(), +run(), +getNextRowSet(), +close(), +cancel(), +...()
  • GetInfoOperation
  • ExecuteStatementOperation
    • SQLOperation
    • AddResourceOperation
    • DeleteResourceOperation
    • DfsOperation
    • SetOperation
  • GetSchemasOperation
  • XXXOperation

This is just a quick view: it may not be exact in some details, and it intentionally omits some less important things.
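A sketch of the Service / AbstractService / CompositeService lifecycle in the diagram, assuming the usual convention that a composite starts its children in insertion order and stops them in reverse. All class names here are hypothetical simplifications; the real classes also handle listeners and failures during start, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

enum ServiceState { NOTINITED, INITED, STARTED, STOPPED }

// Hypothetical base class in the spirit of AbstractService: tracks state and
// rejects out-of-order lifecycle calls.
abstract class BaseService {
    final String name;
    protected Map<String, String> conf; // global configuration, set once by init()
    private ServiceState state = ServiceState.NOTINITED;

    BaseService(String name) { this.name = name; }

    ServiceState getServiceState() { return state; }

    void init(Map<String, String> conf) {
        expect(ServiceState.NOTINITED);
        this.conf = conf;
        state = ServiceState.INITED;
    }

    void start() {
        expect(ServiceState.INITED);
        state = ServiceState.STARTED;
    }

    void stop() {
        if (state == ServiceState.STARTED) {
            state = ServiceState.STOPPED; // repeated stop() calls are ignored
        }
    }

    private void expect(ServiceState required) {
        if (state != required) {
            throw new IllegalStateException(name + " is " + state + ", expected " + required);
        }
    }
}

final class SimpleService extends BaseService {
    SimpleService(String name) { super(name); }
}

// Hypothetical CompositeService: children start in insertion order and stop in reverse.
final class CompositeSketch extends BaseService {
    private final List<BaseService> serviceList = new ArrayList<>();
    final List<String> events = new ArrayList<>(); // lifecycle log for the demo

    CompositeSketch(String name) { super(name); }

    void addService(BaseService service) { serviceList.add(service); }

    @Override
    void init(Map<String, String> conf) {
        for (BaseService s : serviceList) s.init(conf);
        super.init(conf);
    }

    @Override
    void start() {
        for (BaseService s : serviceList) {
            s.start();
            events.add("start " + s.name);
        }
        super.start();
    }

    @Override
    void stop() {
        List<BaseService> reversed = new ArrayList<>(serviceList);
        Collections.reverse(reversed);
        for (BaseService s : reversed) {
            s.stop();
            events.add("stop " + s.name);
        }
        super.stop();
    }
}
```

Stopping in reverse means a front-end service (for example the Thrift listener) stops accepting requests before the back-end service it calls into is torn down.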
HiveCore and the Dependent Hive Environment?
• HiveConf
  • Global instance
  • Instance for each session
    • The client can inject additional key-value configurations at OpenSession.
    • Set an explicit session name (id) to control the name of the download directory.
• Hive SessionState
  • Instance for each session
• Hive Driver
  • Instance for each SQL operation
• Global static variables?
  • ??

• SetOperation -> SetProcessor
  • set env: variables cannot be set.
  • set system: global, via System.getProperties().setProperty(..)
    • Should we forbid system settings, or allow only administrators to make them?
  • set hiveconf: per-session instance.
  • set hivevar: per-session instance.
  • set (no prefix): per-session instance.
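A sketch of the configuration scopes listed above, under these assumptions: each session copies the global configuration and overlays the key-values injected at OpenSession; `set` without a prefix or with `hiveconf:` and `hivevar:` touches only that session; `system:` writes a JVM-wide property table shared by all sessions; `env:` is rejected. `SessionConfSketch` is hypothetical, and it takes the text after the `set` keyword. The property table is injected so the demo does not modify the real `System.getProperties()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical per-session configuration with the scopes of "set" described above.
final class SessionConfSketch {
    private final Map<String, String> hiveconf;
    private final Map<String, String> hivevar = new HashMap<>();
    private final Properties system; // shared by every session, like System.getProperties()

    SessionConfSketch(Map<String, String> globalConf, Map<String, String> openSessionOverlay,
                      Properties system) {
        this.hiveconf = new HashMap<>(globalConf); // new copy for each session
        this.hiveconf.putAll(openSessionOverlay);  // key-values injected at OpenSession
        this.system = system;
    }

    // Handles the text after "set", e.g. "hivevar:d=2013-10-01" or "mapred.reduce.tasks=4".
    void set(String assignment) {
        int eq = assignment.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("expected key=value");
        }
        String key = assignment.substring(0, eq).trim();
        String value = assignment.substring(eq + 1).trim();
        if (key.startsWith("env:")) {
            throw new UnsupportedOperationException("env variables cannot be set");
        } else if (key.startsWith("system:")) {
            system.setProperty(key.substring("system:".length()), value); // global side effect
        } else if (key.startsWith("hivevar:")) {
            hivevar.put(key.substring("hivevar:".length()), value);
        } else if (key.startsWith("hiveconf:")) {
            hiveconf.put(key.substring("hiveconf:".length()), value);
        } else {
            hiveconf.put(key, value);
        }
    }

    String conf(String key) { return hiveconf.get(key); }

    String var(String key) { return hivevar.get(key); }
}
```

The `system:` branch is the only one whose effect leaks across sessions, which is why the question above about restricting it to administrators matters in a shared server.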

• AddResourceOperation and DeleteResourceOperation
  • SessionState.add_resource/delete_resource
  • DOWNLOADED_RESOURCES_DIR("hive.downloaded.resources.dir", System.getProperty("java.io.tmpdir") + File.separator + "${hive.session.id}_resources")
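A tiny sketch of how the default above resolves, doing the `${hive.session.id}` substitution by hand. `ResourceDirs` is a hypothetical helper; it shows why each session gets its own download directory and why an explicit session name controls that directory's name.

```java
import java.io.File;

// Hypothetical helper mirroring the default value of hive.downloaded.resources.dir:
// <java.io.tmpdir>/<hive.session.id>_resources
final class ResourceDirs {
    private ResourceDirs() {}

    static String downloadedResourcesDir(String tmpDir, String sessionId) {
        if (sessionId == null || sessionId.isEmpty()) {
            throw new IllegalArgumentException("session id required");
        }
        return tmpDir + File.separator + sessionId + "_resources";
    }

    // Uses the JVM's temp directory, as the default expression does.
    static String defaultFor(String sessionId) {
        return downloadedResourcesDir(System.getProperty("java.io.tmpdir"), sessionId);
    }
}
```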

• DfsOperation
  • Auth. with HDFS?
Handle (Identifier)
• SessionHandle
• OperationHandle
• Use UUID
Thrift IDL:

    struct THandleIdentifier {
      // 16 byte globally unique identifier
      // This is the public ID of the handle and
      // can be used for reporting.
      1: required binary guid,

      // 16 byte secret generated by the server
      // and used to verify that the handle is not
      // being hijacked by another user.
      2: required binary secret,
    }

• Currently only the public ID (guid) is used, which is acceptable for now.
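A sketch of how the two 16-byte fields of `THandleIdentifier` could be produced and checked on the server side. The notes above say only the public ID is currently used, so the verification step here is an assumption about how the secret is meant to work; `HandleIdentifier` is a hypothetical class.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical server-side handle: 16-byte public guid plus 16-byte secret.
final class HandleIdentifier {
    private static final SecureRandom RANDOM = new SecureRandom();

    final byte[] guid;   // public ID, safe to log and report
    final byte[] secret; // never reported; proves the caller received this handle from us

    private HandleIdentifier(byte[] guid, byte[] secret) {
        this.guid = guid;
        this.secret = secret;
    }

    static HandleIdentifier create() {
        byte[] secret = new byte[16];
        RANDOM.nextBytes(secret);
        return new HandleIdentifier(toBytes(UUID.randomUUID()), secret);
    }

    // Packs a UUID into the 16-byte binary form used by the IDL.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    UUID publicId() {
        ByteBuffer b = ByteBuffer.wrap(guid);
        return new UUID(b.getLong(), b.getLong());
    }

    // Constant-time comparison so a client cannot probe the secret byte by byte.
    boolean verify(byte[] presentedGuid, byte[] presentedSecret) {
        return MessageDigest.isEqual(guid, presentedGuid)
                && MessageDigest.isEqual(secret, presentedSecret);
    }
}
```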
Configurations and Run
Config:
• hive.server2.transport.mode = binary | http | https
• hive.server2.thrift.port = 10000
• hive.server2.thrift.bind.host
• hive.server2.thrift.min.worker.threads = 5
• hive.server2.thrift.max.worker.threads = 500
• hive.server2.async.exec.threads = 50
• hive.server2.async.exec.shutdown.timeout = 10 (seconds)
• hive.support.concurrency = true ???
• hive.zookeeper.quorum =
• …

Run:
• Start HiveServer2
  • bin/hiveserver2 &
• Start the CLI (uses standard JDBC)
  • bin/beeline
  • !connect jdbc:hive2://localhost:10000
  • show tables;
  • select * from tablename limit 10;
Interface and Clients
• RPC (TCLIService.thrift)
  • Binary protocol
  • HTTP/HTTPS protocol (to be researched)
• New JDBC driver
  • org.apache.hive.jdbc.HiveDriver
  • URL: jdbc:hive2://hostname:10000/dbname… (e.g. jdbc:hive2://localhost:10000/default)
  • Implements more of the JDBC API.
• Third-party clients over JDBC:
  • CLI
    • Beeline, based on SQLine
  • IDE: SQuirreL SQL Client
  • Web client (e.g. H2 Web, etc.)
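A minimal JDBC client doing what the Beeline session above does. This assumes the hive-jdbc jar and its dependencies are on the classpath and that HiveServer2 is listening on localhost:10000; the user name and empty password are placeholders whose meaning depends on the server's authentication setup, and `tablename` is the same placeholder used above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal JDBC client sketch against HiveServer2.
final class HiveJdbcClient {
    // Builds a URL of the form jdbc:hive2://hostname:port/dbname.
    static String url(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // Loading the class lets the driver register itself with DriverManager.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(url("localhost", 10000, "default"), "hive", "");
             Statement stmt = conn.createStatement()) {
            // One Connection usually maps to one server-side Session; each query
            // below becomes a separate operation in that session.
            try (ResultSet rs = stmt.executeQuery("show tables")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
            try (ResultSet rs = stmt.executeQuery("select * from tablename limit 10")) {
                int columns = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= columns; i++) {
                        if (i > 1) row.append('\t');
                        row.append(rs.getString(i));
                    }
                    System.out.println(row);
                }
            }
        }
    }
}
```

Only standard `java.sql` types are used, so the same code works with any tool that speaks plain JDBC, which is what lets the generic clients listed above connect without Hive-specific code.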
Client Tools: CLI (SQLine, Beeline) [screenshot]
Client Tools: IDE (SQuirreL SQL Client) [screenshot]
Client Tools: Web Client [screenshot]
Think More …
• Thinking of XX as a Platform
  • Standard JDBC/ODBC
  • RESTful API over HTTP, Web Service
  • AWS Redshift, SimpleDB …
• Hive as a Service?
  • http://www.qubole.com/
  • Request a cluster, run SQL ad hoc and regularly, with workflows and scheduling.
• Language
  • SQL, R, Pig
• Computing of Estimation, Probability …
Thank You!

Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Último (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

HiveServer2

  • 2. Hive Evolution
    • Original
      • Let users express their queries in a high-level language without having to write MapReduce programs.
      • Mainly targeted at ad-hoc queries.
      • As a data tool, usually used in CLI mode.
    • Now more …
      • A parallel SQL DBMS that happens to use Hadoop for its storage and execution layers.
      • Ad-hoc + regular queries.
      • As a service …
  • 3. Introduction
    • Limitations of HiveServer1: concurrency, security, client interface, stability
      • Sessions/Concurrency
        • The old Thrift API and server implementation didn’t support concurrency.
      • xDBC
        • The old Thrift API didn’t support common xDBC (JDBC/ODBC) clients.
      • Authentication/Authorization
        • Incomplete implementations.
      • Auditing/Logging
    • HiveServer2
      • Available from hive-0.11 / CDH4.1.
      • Redesigned and re-implemented (HIVE-2935).
      • HiveServer2 is a container for the Hive execution engine (Driver).
      • For each client connection, it creates a new execution context (Connection and Session) that serves Hive SQL requests from the client.
      • The new RPC interface enables the server to associate this Hive execution context with the thread serving the client’s request.
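The idea of binding an execution context to the thread that serves a client can be illustrated with a `ThreadLocal`. This is a minimal sketch of the general technique, not HiveServer2 code: `ExecutionContext`, `ConnectionThreads`, and the pool size of 4 are hypothetical, and the real server keeps richer per-session state (`HiveConf`, `SessionState`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical per-connection context (a stand-in, not a Hive class).
final class ExecutionContext {
    final String clientId;

    ExecutionContext(String clientId) {
        this.clientId = clientId;
    }
}

final class ConnectionThreads {
    // Each worker thread sees only the context of the connection it is currently serving.
    static final ThreadLocal<ExecutionContext> CURRENT = new ThreadLocal<>();

    // Serves one request on the calling thread with that connection's context bound to it.
    static String serve(String clientId, String statement) {
        CURRENT.set(new ExecutionContext(clientId));
        try {
            // Code deeper in the call stack can find "its" context without it being passed in.
            return CURRENT.get().clientId + " ran: " + statement;
        } finally {
            CURRENT.remove(); // pool threads are reused, so always clear the binding
        }
    }

    // Serves several clients concurrently, one task per connection (hypothetical pool size 4).
    static List<String> serveAll(List<String> clients, String statement) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String client : clients) {
                futures.add(pool.submit(() -> serve(client, statement)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Clearing the thread-local in `finally` matters because pooled threads are reused; otherwise one client's context could leak into the next connection served by the same thread.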
  • 4. Architecture
    • [Figure: System Arch. (in fact, the Driver runs in the Operation context)]
    • [Figure: Authentication Arch. (not covered here)]
    • Figures from Cloudera: http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/
  • 5. Architecture: Internal
    • Core contexts: Connections, Sessions, Operations
    • Operation path (Client-1, Client-2, … -> server):
      • hiveServer2 (main entry) starts thriftCLIService.
      • thriftCLIService (TThreadPoolServer, implements the client Thrift RPC Iface) calls listen() and accept() for new client connections and processes each connection in its own thread.
      • Each connection thread calls cliService through the ICLIService internal interface.
      • cliService holds the real implementations of the various operations: open/close sessions, run operations in existing sessions …
      • sessionManager (handleToSessionMap) holds the sessions (HiveSession interface); each session has its own HiveConf and SessionState.
      • backgroundOperationPool provides the threads for async operations (runAsync).
      • operationManager (handleToOperationMap) creates and runs operations; a SQLOp runs sync or async and creates and runs a Hive Driver.
      • Operation types: SQLOp/SetOp/DfsOp/AddResourceOp/DeleteResourceOp …, GetTypeInfoOp/GetCatalogsOp/GetSchemasOp/GetTablesOp/GetTableTypesOp/GetColumnsOp/GetFunctionsOp …
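The handle maps and the background pool in this path can be sketched as below. It is a heavily simplified, hypothetical model (`MiniSessionManager` is not a Hive class): handles are plain `UUID`s, an operation's "work" is a placeholder string instead of a Hive Driver run, and the pool size of 2 is arbitrary. It shows only the bookkeeping: sessions own operations, async statements return a handle immediately, and closing a session closes its operations.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for sessionManager + operationManager.
final class MiniSessionManager {
    // handleToSessionMap: session handle -> handles of that session's operations
    private final Map<UUID, Set<UUID>> handleToSession = new ConcurrentHashMap<>();
    // handleToOperationMap: operation handle -> pending or finished result
    private final Map<UUID, Future<String>> handleToOperation = new ConcurrentHashMap<>();
    // backgroundOperationPool: threads for async operations
    private final ExecutorService backgroundOperationPool = Executors.newFixedThreadPool(2);

    UUID openSession() {
        UUID handle = UUID.randomUUID();
        handleToSession.put(handle, ConcurrentHashMap.newKeySet());
        return handle;
    }

    // runAsync: queue the statement on the background pool and return a handle at once.
    UUID executeStatementAsync(UUID session, String sql) {
        Set<UUID> ops = handleToSession.get(session);
        if (ops == null) {
            throw new IllegalArgumentException("Invalid session: " + session);
        }
        UUID op = UUID.randomUUID();
        // Placeholder for "create and run a Hive Driver"
        handleToOperation.put(op, backgroundOperationPool.submit(() -> "result of " + sql));
        ops.add(op);
        return op;
    }

    // Blocks until the background operation has finished.
    String fetchResults(UUID op) {
        Future<String> f = handleToOperation.get(op);
        if (f == null) {
            throw new IllegalArgumentException("Invalid operation: " + op);
        }
        try {
            return f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Closing a session also closes and removes every operation of that session.
    void closeSession(UUID session) {
        Set<UUID> ops = handleToSession.remove(session);
        if (ops == null) {
            return;
        }
        for (UUID op : ops) {
            Future<String> f = handleToOperation.remove(op);
            if (f != null) {
                f.cancel(true);
            }
        }
    }

    int operationCount() {
        return handleToOperation.size();
    }

    void shutdown() {
        backgroundOperationPool.shutdownNow();
    }
}
```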
  • 6. Architecture: Server Context
    • Client Connection (Thread)
      • Session (-> HiveConf, SessionState)
        • Operation (-> Driver)
    • Usually, a client opens only one Session per Connection (see HiveConnection in the JDBC HiveDriver).
    • [Figure: Client-1 and Client-2, each on its own Connection thread (Connection-1, Connection-2); sessions Session-11 and Session-12; operations Op-121, Op-122, Op-123, where each SQL operation owns a Driver]
  • 7. New Client API
    • TCLIService.thrift
      • Complete API
      • Complete database API: think about JDBC/ODBC, to be compatible with existing DB software
      • Hive-specific API
      • Best practice
    • Client API vs. internal API: conversion and isolation
    • Session
      • OpenSession: the client requests a new session. The server creates a new HiveSession and returns a unique SessionHandle (UUID). All other calls depend on this session.
      • CloseSession: the client requests to close the session. This also closes and removes all operations in the session.
    • SQL and Hive Command Operations
      • ExecuteStatement: execute an HQL statement.
        • SQL statements -> SQLOp. Some SQL statements can be tagged “runAsync”; they are then executed in a dedicated thread and the call returns immediately.
        • Hive commands -> SetOp, DfsOp, AddResourceOp, DeleteResourceOp.
    • DB Metadata Operations
      • GetInfo *: get various global variables of Hive (Key-Type -> Value).
      • GetTypeInfo: get the detailed description and constraints of the data types.
      • GetCatalogs: does nothing so far.
      • GetSchemas: get schemas from the metastore.
      • GetTables: get table schemas from the metastore.
      • GetTableTypes: get the table types, e.g. MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW, INDEX_TABLE.
      • GetColumns: get the columns of a table from the metastore.
      • GetFunctions: get the UDFs.
    • Operations on Operations
      • GetOperationStatus: get the state of an operation by its OperationHandle: INITIALIZED/RUNNING/FINISHED/CANCELED/CLOSED/ERROR/UNKNOWN/PENDING.
      • CancelOperation: cancel a RUNNING or PENDING operation by its OperationHandle. For SQLOp, clean up: close and destroy the Hive Driver, delete temp output files, and cancel the task running in the background thread …
      • CloseOperation: remove the operation and close it: for SQLOp, clean up; for HiveCommandOp, tearDownSessionIO.
    • Get Result
      • GetResultSetMetadata: get the result set’s schema, such as the column titles.
      • FetchResults: fetch the result rows from the actual result set.
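The operation states listed for GetOperationStatus, together with the rule that only a RUNNING or PENDING operation can be canceled, can be modeled as a small state machine. This is a sketch under assumptions: `OpState` and `OpStateMachine` are hypothetical names, and the transition table is an illustrative guess, not HiveServer2's actual validation logic.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// The state names from the GetOperationStatus row.
enum OpState { INITIALIZED, PENDING, RUNNING, FINISHED, CANCELED, CLOSED, ERROR, UNKNOWN }

final class OpStateMachine {
    // Assumed transition table (illustrative only).
    private static final Map<OpState, EnumSet<OpState>> NEXT = new EnumMap<>(OpState.class);

    static {
        NEXT.put(OpState.INITIALIZED, EnumSet.of(OpState.PENDING, OpState.RUNNING, OpState.CANCELED, OpState.CLOSED));
        NEXT.put(OpState.PENDING, EnumSet.of(OpState.RUNNING, OpState.FINISHED, OpState.CANCELED, OpState.ERROR, OpState.CLOSED));
        NEXT.put(OpState.RUNNING, EnumSet.of(OpState.FINISHED, OpState.CANCELED, OpState.ERROR, OpState.CLOSED));
        NEXT.put(OpState.FINISHED, EnumSet.of(OpState.CLOSED));
        NEXT.put(OpState.CANCELED, EnumSet.of(OpState.CLOSED));
        NEXT.put(OpState.ERROR, EnumSet.of(OpState.CLOSED));
        NEXT.put(OpState.CLOSED, EnumSet.noneOf(OpState.class));
        NEXT.put(OpState.UNKNOWN, EnumSet.noneOf(OpState.class));
    }

    private OpState state = OpState.INITIALIZED;

    OpState getState() {
        return state;
    }

    // Rejects transitions that are not in the table above.
    void setState(OpState next) {
        if (!NEXT.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }

    // CancelOperation: only a RUNNING or PENDING operation can be canceled.
    boolean cancel() {
        if (state != OpState.RUNNING && state != OpState.PENDING) {
            return false;
        }
        setState(OpState.CANCELED);
        return true;
    }
}
```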
  • 8. Code
    • Packages
      • org.apache.hive.service …, top-level project of Apache…
    • Pros
      • Clear implementation
      • Decoupling of HiveServer2 and Hive core
      • Decoupling of the Thrift client API and the internal code
    • Cons
      • Too many design patterns.
      • Inconsistent principles in some places.
      • HiveServer2 and Hive core are still not completely decoupled.
      • The JDBC driver package/jar still relies on a lot of other core code, such as Hive -> Hadoop and their libs … (maybe because of the support for Embedded Mode).
  • 9. Code: Class Overview
    • Service (+state)
      • AbstractService: +HiveConf (global, set by init()), +init(), +start(), +stop(), +register(StateChangeListener)
      • CompositeService: +serviceList, +addService(), +removeService()
      • HiveServer2: +main() (entry point)
    • ThriftCLIService / ThriftBinaryCLIService (implement TCLIService.Iface; TThreadPoolServer)
      • +cliService
      • +OpenSession(), +CloseSession(), +GetInfo(), +ExecuteStatement(), +...(), +FetchResults()
    • ICLIService / CLIService
      • +sessionManager
      • +openSession(), +closeSession(), +getInfo(), +executeStatement(), +...(), +fetchResults()
    • SessionManager
      • +handleToSession: HashMap
      • +operationManager
      • +backgroundOperationPool (FixedThreadPool)
      • +openSession(), +closeSession(), +getSession(), +...(), +submitBackgroundOperation()
    • OperationManager
      • +handleToOperation: HashMap
      • +newExecuteStatementOperation(), +newGetTypeInfoOperation(), +...()
      • +addOperation(), +removeOperation(), +getOperation(), +getOperationState(), +cancelOperation(), +closeOperation(), +getOperationNextRowSet(), +...()
    • HiveSession / HiveSessionImpl
      • +sessionHandle
      • +hiveConf (new for each session), +sessionState (new for each session)
      • +opHandleSet
      • +getSessionHandle(), +getInfo(), +executeStatement(), +executeStatementAsync(), +...(), +fetchResults()
    • Operation
      • +opHandle, +parentSession, +state
      • +getState(), +setState(), +run(), +getNextRowSet(), +close(), +cancel(), +...()
      • Subclasses: GetInfoOperation, GetSchemasOperation, XXXOperation …
      • ExecuteStatementOperation: SQLOperation, AddResourceOperation, DeleteResourceOperation, DfsOperation, SetOperation
    • This is just a quick view; it may not be exact in some details and intentionally omits less important parts.
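The Service / AbstractService / CompositeService pattern can be reduced to a few lines. This is a hypothetical, minimal version (`MiniService`, `MiniCompositeService`, and the string-map configuration standing in for `HiveConf` are all illustrative). It assumes children are initialized and started in the order they were added and stopped in reverse order, the usual convention for composite services; it is not a copy of Hive's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

enum ServiceState { NOTINITED, INITED, STARTED, STOPPED }

// Hypothetical simplification of AbstractService: a named service with a lifecycle state.
class MiniService {
    final String name;
    final List<String> log; // shared event log, for demonstration only
    ServiceState state = ServiceState.NOTINITED;
    Map<String, String> conf; // stands in for the HiveConf set by init()

    MiniService(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    void init(Map<String, String> conf) {
        require(ServiceState.NOTINITED);
        this.conf = conf;
        log.add("init " + name);
        state = ServiceState.INITED;
    }

    void start() {
        require(ServiceState.INITED);
        log.add("start " + name);
        state = ServiceState.STARTED;
    }

    void stop() {
        if (state != ServiceState.STARTED) {
            return; // stopping twice, or before start, is a no-op
        }
        log.add("stop " + name);
        state = ServiceState.STOPPED;
    }

    private void require(ServiceState expected) {
        if (state != expected) {
            throw new IllegalStateException(name + " is " + state + ", expected " + expected);
        }
    }
}

// Hypothetical CompositeService: children first on init/start, reverse order on stop.
class MiniCompositeService extends MiniService {
    private final List<MiniService> serviceList = new ArrayList<>();

    MiniCompositeService(String name, List<String> log) {
        super(name, log);
    }

    void addService(MiniService service) {
        serviceList.add(service);
    }

    @Override
    void init(Map<String, String> conf) {
        for (MiniService s : serviceList) {
            s.init(conf);
        }
        super.init(conf);
    }

    @Override
    void start() {
        for (MiniService s : serviceList) {
            s.start();
        }
        super.start();
    }

    @Override
    void stop() {
        if (state != ServiceState.STARTED) {
            return;
        }
        for (int i = serviceList.size() - 1; i >= 0; i--) {
            serviceList.get(i).stop();
        }
        super.stop();
    }
}
```

Stopping in reverse order means the Thrift front end stops accepting requests before the CLIService it calls into is torn down.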
  • 10. Hive Core and the Hive Environment It Depends On?
    • HiveConf
      • A global instance.
      • An instance for each session.
        • The client can inject additional key-value configurations at OpenSession.
        • Set an explicit session name (id) to control the name of the download directory.
    • Hive SessionState
      • An instance for each session.
    • Hive Driver
      • An instance for each SQL operation.
    • Global static variables?
      • ??
    • SetOperation -> SetProcessor
      • set env: variables cannot be set.
      • set system: global, via System.getProperties().setProperty(..)
        • Should we forbid system settings? Or allow only administrators to make them?
      • set hiveconf: per-session instance.
      • set hivevar: per-session instance.
      • set: per-session instance.
    • AddResourceOperation and DeleteResourceOperation
      • SessionState.add_resource/delete_resource
      • DOWNLOADED_RESOURCES_DIR("hive.downloaded.resources.dir", System.getProperty("java.io.tmpdir") + File.separator + "${hive.session.id}_resources")
    • DfsOperation
      • Auth. with HDFS?
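The scoping of the `set` variants above (JVM-wide for `system:`, per session for `hiveconf:`, `hivevar:` and a bare `set`, read-only for `env:`) can be sketched as a small dispatcher. `MiniSetProcessor` and its `allowSystem` switch are hypothetical and only illustrate the open question of whether `system:` should be restricted; this is not Hive's SetProcessor code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of how `set` prefixes map to scopes.
final class MiniSetProcessor {
    final Map<String, String> hiveconf = new HashMap<>(); // per session
    final Map<String, String> hivevar = new HashMap<>();  // per session
    private final boolean allowSystem; // hypothetical admin switch for `system:`

    MiniSetProcessor(boolean allowSystem) {
        this.allowSystem = allowSystem;
    }

    // Handles "key=value" with an optional env:, system:, hiveconf: or hivevar: prefix.
    void set(String assignment) {
        int eq = assignment.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("expected key=value: " + assignment);
        }
        String key = assignment.substring(0, eq).trim();
        String value = assignment.substring(eq + 1).trim();
        if (key.startsWith("env:")) {
            throw new UnsupportedOperationException("env: variables are read-only");
        } else if (key.startsWith("system:")) {
            // JVM-wide: visible to every session in the same server process.
            if (!allowSystem) {
                throw new IllegalStateException("system: settings are disabled");
            }
            System.setProperty(key.substring("system:".length()), value);
        } else if (key.startsWith("hivevar:")) {
            hivevar.put(key.substring("hivevar:".length()), value);
        } else if (key.startsWith("hiveconf:")) {
            hiveconf.put(key.substring("hiveconf:".length()), value);
        } else {
            hiveconf.put(key, value); // a bare `set k=v` goes to this session's configuration
        }
    }
}
```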
  • 11. Handle (Identifier)
    • SessionHandle
    • OperationHandle
    • Uses UUID.
    • Thrift IDL:
        struct THandleIdentifier {
          // 16 byte globally unique identifier
          // This is the public ID of the handle and
          // can be used for reporting.
          1: required binary guid,

          // 16 byte secret generated by the server
          // and used to verify that the handle is not
          // being hijacked by another user.
          2: required binary secret,
        }
    • Currently only the public ID is used, which is OK.
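A UUID is exactly 16 bytes, so each binary field of `THandleIdentifier` can carry one UUID. The sketch below packs a public and a secret UUID into the two fields and shows the kind of check the IDL comment describes. `HandleIdSketch` is hypothetical, and the `verify` method illustrates the intended use of `secret` only; per the slide, the server at the time used only the public ID.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.UUID;

// Hypothetical sketch: one UUID per 16-byte binary field of THandleIdentifier.
final class HandleIdSketch {
    final UUID publicId = UUID.randomUUID();
    final UUID secretId = UUID.randomUUID();

    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    static UUID fromBytes(byte[] b) {
        if (b.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + b.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(b);
        return new UUID(buf.getLong(), buf.getLong());
    }

    byte[] guid() {
        return toBytes(publicId); // field 1: public, fine for reporting
    }

    byte[] secret() {
        return toBytes(secretId); // field 2: never log or expose
    }

    // A presented handle is accepted only if both the public ID and the secret match.
    boolean verify(byte[] presentedGuid, byte[] presentedSecret) {
        return fromBytes(presentedGuid).equals(publicId)
                && MessageDigest.isEqual(presentedSecret, secret()); // constant-time compare
    }
}
```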
  • 12. Configuration and Running
    • Config:
      • hive.server2.transport.mode = binary | http | https
      • hive.server2.thrift.port = 10000
      • hive.server2.thrift.bind.host
      • hive.server2.thrift.min.worker.threads = 5
      • hive.server2.thrift.max.worker.threads = 500
      • hive.server2.async.exec.threads = 50
      • hive.server2.async.exec.shutdown.timeout = 10 (seconds)
      • hive.support.concurrency = true ???
      • hive.zookeeper.quorum =
      • …
    • Run:
      • Start HiveServer2
        • bin/hiveserver2 &
      • Start the CLI (uses standard JDBC)
        • bin/beeline
        • !connect jdbc:hive2://localhost:10000
        • show tables;
        • select * from tablename limit 10;
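The two `hive.server2.async.exec.*` settings map naturally onto a fixed-size pool (slide 9 lists the background pool as a FixedThreadPool) and a graceful shutdown with a timeout. This is a sketch under assumptions: `AsyncPoolSketch` is hypothetical, the configuration is a plain string map, and the defaults simply repeat the values listed above; it is not HiveServer2's shutdown code.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: sizing and stopping a background pool from two config keys.
final class AsyncPoolSketch {
    static ExecutorService create(Map<String, String> conf) {
        int threads = Integer.parseInt(conf.getOrDefault("hive.server2.async.exec.threads", "50"));
        return Executors.newFixedThreadPool(threads);
    }

    // Returns true if all queued operations finished within the configured timeout.
    static boolean stop(ExecutorService pool, Map<String, String> conf) {
        long timeoutSeconds = Long.parseLong(
                conf.getOrDefault("hive.server2.async.exec.shutdown.timeout", "10"));
        pool.shutdown(); // stop accepting new operations; queued ones still run
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```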
  • 13. Interface and Clients
    • RPC (TCLIService.thrift)
      • Binary protocol
      • HTTP/HTTPS protocol (to be researched)
    • New JDBC driver
      • org.apache.hive.jdbc.HiveDriver
      • URL: jdbc:hive2://hostname:10000/dbname… (e.g. jdbc:hive2://localhost:10000/default)
      • Implements more JDBC API features.
    • Third-party clients over JDBC:
      • CLI: Beeline, based on SQLine
      • IDE: SQuirreL SQL Client
      • Web client (e.g. H2 Web, etc.)
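Any standard JDBC code can talk to HiveServer2 through the new driver. The sketch below repeats the Beeline steps (connect, `show tables`) with plain `java.sql` calls. `HiveJdbcSketch` is a hypothetical name; `showTables` assumes the Hive JDBC driver jar is on the classpath and a HiveServer2 instance is reachable. With older setups that do not auto-register JDBC drivers, `Class.forName("org.apache.hive.jdbc.HiveDriver")` may be needed first.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client sketch using only the standard JDBC API.
final class HiveJdbcSketch {
    // Builds a URL of the form jdbc:hive2://host:port/db
    static String url(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    // Same steps as the Beeline session: connect, then "show tables".
    // Requires the Hive JDBC driver on the classpath and a running HiveServer2.
    static List<String> showTables(String url, String user, String password) throws SQLException {
        List<String> tables = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show tables")) {
            while (rs.next()) {
                tables.add(rs.getString(1)); // first column holds the table name
            }
        }
        return tables;
    }
}
```

Try-with-resources closes the result set, statement, and connection in reverse order, which on the server side maps to closing the operation and then the session.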
  • 15. Client Tools: IDE
    • [Screenshot: SQuirreL SQL Client]
  • 17. Think More …
    • Thinking of XX as a platform
      • Standard JDBC/ODBC
      • RESTful API over HTTP, web service
      • AWS Redshift, SimpleDB …
    • Hive as a Service?
      • http://www.qubole.com/
      • Request a cluster; run SQL ad hoc and regularly; workflows and scheduling.
    • Language
      • SQL, R, Pig
      • Computing of estimation, probability …