HIVE
Manas Nayak
Contents
• Introduction
• Features of Hive
• Hive Architecture
• Job execution inside Hive
• Hive vs RDBMS
Introduction
The term ‘Big Data’ refers to collections of large datasets characterized by huge volume, high velocity, and a wide variety of data, all of which keep growing day by day. Big Data is difficult to process using traditional data management systems.
Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. It has two core modules: MapReduce, which processes the data, and the Hadoop Distributed File System (HDFS), which stores it.
• The Hadoop ecosystem contains sub-projects (tools) such as Sqoop, Pig, and Hive that complement the Hadoop modules.
• Sqoop: used to import and export data between HDFS and an RDBMS.
• Pig: a procedural language platform used to develop scripts for MapReduce operations.
• Hive: a platform used to develop SQL-type scripts that run as MapReduce operations.
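To make the link between an SQL-type script and MapReduce concrete, here is a minimal sketch in plain Python of how a grouping query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` could be broken into map, shuffle, and reduce phases. The table, the sample rows, and the function names are assumptions made for illustration; this is not how Hive itself is implemented.

```python
from collections import defaultdict

# Hypothetical sample rows: (employee, department).
ROWS = [
    ("asha", "sales"),
    ("ben", "engineering"),
    ("chen", "sales"),
    ("dina", "engineering"),
    ("eli", "sales"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by department."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    """Group all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get a per-department count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_department(rows):
    """Run the three phases in order, mimicking a GROUP BY ... COUNT(*)."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_department(ROWS))
```

In a real cluster the map and reduce functions would run in parallel on many nodes; the sketch only shows the shape of the computation.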
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easy.
Hive was initially developed by Facebook. The Apache Software Foundation later took it up and developed it further as open-source software under the name Apache Hive.
Features of Hive
• It stores the schema in a database (the metastore) and the processed data in HDFS.
• It is designed for OLAP (online analytical processing).
• It provides an SQL-type query language called HiveQL (HQL).
• It is familiar, fast, scalable, and extensible.
• It is designed for easy and effective data aggregation, ad-hoc querying, and analysis of huge volumes of data.
Hive is not
• A relational database
• A system designed for online transaction processing (OLTP)
• A language for real-time queries and row-level
updates
Architecture of Hive
Hive has three major components:
• Hive clients
• Hive services
• Metastore
Hive clients include:
• Thrift client
• ODBC driver
• JDBC driver
Thrift client: provides an easy way to execute Hive commands from a wide range of programming languages. Thrift client bindings for Hive are available for C++, Java, PHP, Python, and Ruby.
JDBC and ODBC drivers: let applications that speak JDBC or ODBC connect to the Hive server.
Hive services:
• CLI: the command-line interface to Hive; this is the default service.
• Hive server: runs Hive as a server exposing a Thrift service, enabling access from a range of clients written in different languages.
• HWI: the Hive web interface.
Metastore: the central repository of Hive metadata. By default, the metastore service runs in the same JVM as the Hive service. The metastore holds information about the tables and partitions in the warehouse, the data needed to read from and write to them, and the locations of the underlying HDFS files.
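As a rough illustration of the kind of information the metastore holds, the sketch below models table schemas, partitions, and file locations with an in-memory Python class. `ToyMetastore`, `TableMetadata`, and the `/warehouse/page_views` path are hypothetical names chosen for this example; they are not Hive's actual API or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """What a metastore might record for one table (illustrative fields only)."""
    name: str
    columns: dict                 # column name -> type name
    location: str                 # base directory of the table's files
    partition_keys: list = field(default_factory=list)
    partitions: dict = field(default_factory=dict)  # partition spec -> directory


class ToyMetastore:
    """In-memory stand-in for the metastore; not Hive's real interface."""

    def __init__(self):
        self._tables = {}

    def create_table(self, table):
        self._tables[table.name] = table

    def add_partition(self, table_name, spec):
        """Register a partition given as a tuple of (key, value) pairs."""
        table = self._tables[table_name]
        # Derive a key=value directory under the table's base location.
        path = table.location + "/" + "/".join(f"{k}={v}" for k, v in spec)
        table.partitions[spec] = path
        return path

    def get_table(self, table_name):
        return self._tables[table_name]


if __name__ == "__main__":
    metastore = ToyMetastore()
    metastore.create_table(TableMetadata(
        name="page_views",
        columns={"user_id": "bigint", "url": "string"},
        location="/warehouse/page_views",  # hypothetical path
        partition_keys=["dt"],
    ))
    print(metastore.add_partition("page_views", (("dt", "2024-01-01"),)))
    print(metastore.get_table("page_views").columns)
```

The point of the sketch is that the metastore stores descriptions of the data (schema, partitions, locations), while the data files themselves stay in HDFS.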
Job Execution Inside Hive
Steps:
1. Execute Query: A Hive interface such as the command line or web UI sends the query to the driver (any database driver such as JDBC or ODBC) for execution.
2. Get Plan: The driver asks the query compiler to parse the query, check its syntax, and work out the query plan (the requirements of the query).
3. Get Metadata: The compiler sends a metadata request to the metastore (which can be backed by any database).
4. Send Metadata: The metastore sends the metadata back to the compiler.
5. Send Plan: The compiler checks the requirements and sends the plan back to the driver. At this point, parsing and compiling of the query are complete.
6. Execute Plan: The driver sends the execution plan to the execution engine.
7. Execute Job: Internally, the plan is executed as a MapReduce job. The execution engine submits the job to the JobTracker, which runs on the master node (often alongside the NameNode), and the JobTracker assigns it to TaskTrackers, which run on the worker nodes alongside the DataNodes. There, the query runs as MapReduce tasks.
7.1 Metadata Ops: During execution, the execution engine can also perform metadata operations with the metastore.
8. Fetch Result: The execution engine receives the results from the DataNodes.
9. Send Results: The execution engine sends the result values to the driver.
10. Send Results: The driver sends the results to the Hive interface that issued the query.
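The ten steps above can be sketched as a small Python simulation in which a driver, a compiler, a metastore, and an execution engine pass a query between them and record each step. All class names, the single supported query shape, and the in-memory "storage" are assumptions made for illustration; a real execution engine would submit a MapReduce job rather than count rows locally.

```python
class Metastore:
    """Holds table metadata; here just a dict of table name -> info."""

    def __init__(self, tables):
        self._tables = tables

    def get_metadata(self, table):
        return self._tables[table]


class Compiler:
    """Parses the query and builds a plan, consulting the metastore."""

    def __init__(self, metastore):
        self._metastore = metastore

    def compile(self, query, trace):
        words = query.strip().rstrip(";").split()
        # Toy syntax check: only "SELECT COUNT(*) FROM <table>" is understood.
        head = [w.upper() for w in words[:3]]
        if len(words) != 4 or head != ["SELECT", "COUNT(*)", "FROM"]:
            raise ValueError(f"unsupported query: {query!r}")
        trace.append("3. get metadata")
        metadata = self._metastore.get_metadata(words[3])
        trace.append("4. send metadata")
        trace.append("5. send plan")
        return {"operation": "count", "location": metadata["location"]}


class ExecutionEngine:
    """Runs the plan; storage maps a location to rows, standing in for HDFS."""

    def __init__(self, storage):
        self._storage = storage

    def execute(self, plan, trace):
        trace.append("7. execute job")
        rows = self._storage[plan["location"]]
        result = len(rows)  # stand-in for the MapReduce job
        trace.append("8. fetch result")
        return result


class Driver:
    """Coordinates compilation and execution for one query."""

    def __init__(self, compiler, engine):
        self._compiler = compiler
        self._engine = engine

    def run(self, query):
        trace = ["1. execute query", "2. get plan"]
        plan = self._compiler.compile(query, trace)
        trace.append("6. execute plan")
        result = self._engine.execute(plan, trace)
        trace.append("9. send results to driver")
        trace.append("10. send results to interface")
        return result, trace


def build_demo_driver():
    """Wire the components together with a hypothetical 'logs' table."""
    metastore = Metastore({"logs": {"location": "/warehouse/logs"}})
    storage = {"/warehouse/logs": ["row-a", "row-b", "row-c"]}
    return Driver(Compiler(metastore), ExecutionEngine(storage))


if __name__ == "__main__":
    result, trace = build_demo_driver().run("SELECT COUNT(*) FROM logs;")
    print(result)
    print("\n".join(trace))
```

Step 7.1 (metadata operations during execution) is left out to keep the sketch short.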
Hive Vs Relational Databases
• Hive is not a full database; it is better described as a data warehouse.
• Relational databases are “schema on write”: a table is created first, and data is checked against its schema as it is inserted. Inserts, updates, and other modifications can be performed on a relational table.
• Hive is “schema on read”: the schema is applied when the data is queried rather than when it is loaded. Updates and modifications do not work in the same way, because the data sits in HDFS files spread across multiple DataNodes and a Hive query in a typical cluster runs across those nodes, so rows cannot simply be updated in place.
• Hive is based on the notion of write once, read many times, while an RDBMS is designed for reading and writing many times.
• In an RDBMS, record-level updates, inserts, and deletes are possible, whereas Hive as described here does not allow them.
• An RDBMS is best suited to dynamic data and to workloads where fast responses are expected. Hive is suited to data warehouse applications, where relatively static data is analyzed, fast responses are not required, and the data does not change rapidly.
• An RDBMS typically handles data volumes up to tens of terabytes, whereas Hive can handle hundreds of petabytes.
• Hive scales out easily and at low cost, whereas an RDBMS does not.
To overcome Hive's lack of record-level updates, HBase can be integrated with Hive.
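The "schema on read" idea from the comparison above can be illustrated with a short Python sketch: loading stores raw text lines without any validation, and the schema is applied only when the data is read. The sample lines, the `SCHEMA` list, and the choice to return `None` for fields that do not fit (much as a query engine might return NULL) are assumptions made for this example.

```python
# Hypothetical raw records, stored exactly as they arrived.
RAW_LINES = [
    "1,alice,34",
    "2,bob,not_a_number",  # malformed age
    "3,carol",             # missing field
]

# Column names and the Python type each column should be read as.
SCHEMA = [("id", int), ("name", str), ("age", int)]


def load(lines):
    """Loading just stores the raw text; nothing is validated at write time."""
    return list(lines)


def read_with_schema(stored, schema, delimiter=","):
    """Apply the schema at query time; fields that do not fit become None."""
    rows = []
    for line in stored:
        fields = line.split(delimiter)
        row = {}
        for index, (name, cast) in enumerate(schema):
            try:
                row[name] = cast(fields[index])
            except (IndexError, ValueError):
                row[name] = None
        rows.append(row)
    return rows


if __name__ == "__main__":
    stored = load(RAW_LINES)
    for row in read_with_schema(stored, SCHEMA):
        print(row)
```

A schema-on-write system would instead reject the second and third lines at load time, which is the trade-off the comparison describes.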
THANK YOU
