Hadoop Hive Talk At IIT-Delhi
Joydeep Sen Sarma
Talk at the CS department, IIT Delhi, 04/02/09.
1.
Hadoop and Hive
Large Scale Data Processing using Commodity HW/SW
Joydeep Sen Sarma
2.
3.
4.
5.
Looks like this ..
[Figure: a cluster of nodes, each with its own disks, connected by network links labeled 1 Gigabit and 4-8 Gigabit. Node = DataNode + Map-Reduce.]
6.
7.
In pictures ..
[Figure: HDFS architecture. A NameNode and a Secondary NameNode, each with local disks and 32GB RAM, and many DataNodes. A DFS Client sends getLocations to the NameNode and receives the block locations.]
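The getLocations exchange in the figure can be sketched as a toy, in-memory NameNode that maps each file to its blocks and each block to the DataNodes holding replicas. This is a minimal illustration, not the HDFS API: the class, the method name `get_locations`, the block ids and DataNode names are all hypothetical, and the 64 MB block size is an assumption.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size (64 MB)


@dataclass
class NameNode:
    # Hypothetical metadata: file path -> ordered list of block ids
    files: dict = field(default_factory=dict)
    # Hypothetical metadata: block id -> DataNodes holding a replica of that block
    block_map: dict = field(default_factory=dict)

    def get_locations(self, path, offset, length):
        """Return (block_id, datanodes) for every block overlapping [offset, offset + length)."""
        blocks = self.files[path]
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        return [(b, self.block_map[b]) for b in blocks[first:last + 1]]


nn = NameNode()
nn.files["/hive/clicks/ds=2008-03-25/0"] = ["blk_1", "blk_2"]
nn.block_map = {"blk_1": ["dn1", "dn4", "dn7"], "blk_2": ["dn2", "dn5", "dn8"]}

# A read of bytes 60 MB .. 70 MB spans both blocks; the client would then
# fetch the data directly from the listed DataNodes, not through the NameNode.
locs = nn.get_locations("/hive/clicks/ds=2008-03-25/0", 60 * 2**20, 10 * 2**20)
print([b for b, _ in locs])  # ['blk_1', 'blk_2']
```

The design point the sketch tries to show: the NameNode only answers metadata questions, which is why it needs a lot of RAM, while bulk data flows between clients and DataNodes.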
8.
9.
10.
Map/Reduce Dataflow
11.
12.
13.
HIVE: Components
[Figure: Hive architecture. Interfaces: Hive CLI (DDL, Queries, Browsing) and Mgmt. Web UI. Hive QL: Parser, Planner, Execution. MetaStore with a Thrift API. SerDe (Thrift, Jute, JSON, ..). Execution runs as Map Reduce jobs over HDFS.]
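To make the SerDe box concrete, here is a toy serializer/deserializer that turns a stored text line into a typed row using a table schema of the kind the MetaStore keeps. It is a hedged sketch, not Hive's SerDe interface: the class name and methods are hypothetical, the schema reuses the page_view columns from later slides, and the Ctrl-A field delimiter is assumed to match Hive's default for text tables.

```python
# Hypothetical schema for page_view, as the MetaStore might record it
PAGE_VIEW_SCHEMA = [("pageid", int), ("userid", int), ("time", str)]
FIELD_DELIM = "\x01"  # Ctrl-A, assumed default field delimiter for text tables


class DelimitedSerDe:
    """Toy SerDe: converts between stored text lines and typed rows."""

    def __init__(self, schema, delim=FIELD_DELIM):
        self.schema = schema
        self.delim = delim

    def deserialize(self, line):
        # Split the raw line and convert each field with the column's type
        fields = line.rstrip("\n").split(self.delim)
        return {name: typ(value) for (name, typ), value in zip(self.schema, fields)}

    def serialize(self, row):
        # Write columns in schema order, joined by the field delimiter
        return self.delim.join(str(row[name]) for name, _ in self.schema)


serde = DelimitedSerDe(PAGE_VIEW_SCHEMA)
line = serde.serialize({"pageid": 1, "userid": 111, "time": "9:08:01"})
print(repr(line))               # '1\x01111\x019:08:01'
print(serde.deserialize(line))  # {'pageid': 1, 'userid': 111, 'time': '9:08:01'}
```

Keeping the format logic behind a SerDe is what lets the same query engine read Thrift, Jute, JSON or delimited text; only the SerDe changes.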
14.
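Data Model
[Figure: Logical Partitioning and Hash Partitioning of a table. In the MetaStore, the Schema Library holds Tables (e.g. clicks) with their Partitioning Cols and Bucketing Info (#Buckets=32). In HDFS, table clicks is stored under /hive/clicks, a partition under /hive/clicks/ds=2008-03-25, and a bucket as the file /hive/clicks/ds=2008-03-25/0 …]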
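The mapping from table to partition directory to bucket file can be sketched as a few path functions. This follows the paths on the slide, but the helper names are hypothetical and the bucketing hash (an integer key modulo the bucket count) is an assumed stand-in, not necessarily Hive's own hash function.

```python
NUM_BUCKETS = 32          # #Buckets=32, as on the slide
WAREHOUSE_ROOT = "/hive"  # root directory used in the slide's paths


def table_dir(table):
    # Each table is a directory in HDFS
    return f"{WAREHOUSE_ROOT}/{table}"


def partition_dir(table, ds):
    # Logical partitioning: one sub-directory per value of partition column ds
    return f"{table_dir(table)}/ds={ds}"


def bucket_of(key, num_buckets=NUM_BUCKETS):
    # Hash partitioning (bucketing); hypothetical hash: the integer key itself
    return key % num_buckets


def bucket_file(table, ds, key):
    # Within a partition, each bucket is a separate file
    return f"{partition_dir(table, ds)}/{bucket_of(key)}"


print(partition_dir("clicks", "2008-03-25"))    # /hive/clicks/ds=2008-03-25
print(bucket_file("clicks", "2008-03-25", 64))  # /hive/clicks/ds=2008-03-25/0
```

Because partitions are directories, a query restricted to one ds value only has to read that one directory; buckets split a partition further into a fixed number of files.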
15.
16.
17.
18.
Hive QL – Join in Map Reduce
page_view:
  pageid  userid  time
  1       111     9:08:01
  2       111     9:08:13
  1       222     9:08:14
user:
  userid  age  gender
  111     25   female
  222     32   male
Map (key = userid; value = <1, pageid> for page_view rows, <2, age> for user rows):
  from page_view:  111 <1, 1>   111 <1, 2>   222 <1, 1>
  from user:       111 <2, 25>  222 <2, 32>
Shuffle/Sort (values grouped by key):
  111 <1, 1>   111 <1, 2>   111 <2, 25>
  222 <1, 1>   222 <2, 32>
Reduce output, pv_users:
  pageid  age
  1       25
  2       25
  1       32
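The join dataflow above can be simulated in a single process to check the pv_users result. This is a minimal sketch, not Hive's implementation: the function names are hypothetical, and the tags 1 and 2 mark which table a value came from, as in the slide's <1, pageid> and <2, age> values.

```python
from collections import defaultdict

page_view = [(1, 111, "9:08:01"), (2, 111, "9:08:13"), (1, 222, "9:08:14")]
users = [(111, 25, "female"), (222, 32, "male")]


def map_phase(page_view, users):
    # Emit (userid, (tag, column)): tag 1 = page_view row, tag 2 = user row
    for pageid, userid, _time in page_view:
        yield userid, (1, pageid)
    for userid, age, _gender in users:
        yield userid, (2, age)


def shuffle_sort(pairs):
    # Group values by key, then sort keys; the stable sort on the tag puts
    # all page_view values ahead of user values within each key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {k: sorted(v, key=lambda tv: tv[0]) for k, v in sorted(groups.items())}


def reduce_phase(groups):
    # For each userid, pair every page_view pageid with every user age
    for _userid, values in groups.items():
        pageids = [col for tag, col in values if tag == 1]
        ages = [col for tag, col in values if tag == 2]
        for pageid in pageids:
            for age in ages:
                yield pageid, age


pv_users = list(reduce_phase(shuffle_sort(map_phase(page_view, users))))
print(pv_users)  # [(1, 25), (2, 25), (1, 32)]
```

Tagging values with their source table is what lets a single reducer tell the two sides of the join apart after the shuffle.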
19.
20.
21.
22.
23.
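Hive QL – Group By in Map Reduce
pv_users (split 1):
  pageid  age
  1       25
  2       25
pv_users (split 2):
  pageid  age
  1       32
  2       25
Map (key = <pageid, age>, value = 1):
  split 1:  <1,25> 1   <2,25> 1
  split 2:  <1,32> 1   <2,25> 1
Shuffle/Sort (input to each reducer):
  reducer 1:  <1,25> 1   <1,32> 1
  reducer 2:  <2,25> 1   <2,25> 1
Reduce output (reducer 1):
  pageid  age  count
  1       25   1
  1       32   1
Reduce output (reducer 2):
  pageid  age  count
  2       25   2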
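A small simulation of the group-by dataflow, with two map splits and two reducers as on the slide. It is a sketch under stated assumptions: the partitioner (pageid modulo the reducer count) is hypothetical, chosen only so that equal keys land on the same reducer, and the function names are illustrative.

```python
from collections import defaultdict

# pv_users rows (pageid, age), split across two map tasks as on the slide
splits = [[(1, 25), (2, 25)], [(1, 32), (2, 25)]]


def map_task(rows):
    # Emit the group-by key <pageid, age> with a partial count of 1
    for pageid, age in rows:
        yield (pageid, age), 1


def partition(key, num_reducers):
    # Hypothetical partitioner: route by pageid so equal keys meet at one reducer
    return key[0] % num_reducers


def run_group_by(splits, num_reducers=2):
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for key, value in map_task(split):
            reducers[partition(key, num_reducers)][key].append(value)
    result = []
    for groups in reducers:
        for key in sorted(groups):  # sort phase: each reducer sees its keys in order
            pageid, age = key
            result.append((pageid, age, sum(groups[key])))  # reduce: add up the 1s
    return result


print(sorted(run_group_by(splits)))  # [(1, 25, 1), (1, 32, 1), (2, 25, 2)]
```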
24.
25.
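Hive QL – Group By with Distinct in Map Reduce
page_view (split 1):
  pageid  userid  time
  1       111     9:08:01
  2       111     9:08:13
page_view (split 2):
  pageid  userid  time
  1       222     9:08:14
  2       111     9:08:20
Shuffle and Sort (key = <pageid, userid>):
  reducer 1:  <1,111>  <2,111>  <2,111>
  reducer 2:  <1,222>
Reduce (partial counts of distinct users per pageid):
  reducer 1:  pageid  count
              1       1
              2       1
  reducer 2:  pageid  count
              1       1
Second Map Reduce job (sum the partial counts by pageid):
  pageid  count
  1       2
  2       1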
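The two-stage distinct count can be sketched the same way. Assumptions to flag: the partitioner (userid modulo the reducer count) is hypothetical, picked to reproduce the slide's reducer contents, and `stage1`/`stage2` are illustrative names. What matters is that duplicate <pageid, userid> pairs meet at one reducer, so each reducer can drop duplicates before counting.

```python
from collections import defaultdict

# page_view rows (pageid, userid, time), split across two map tasks
splits = [
    [(1, 111, "9:08:01"), (2, 111, "9:08:13")],
    [(1, 222, "9:08:14"), (2, 111, "9:08:20")],
]


def stage1(splits, num_reducers=2):
    # Map: emit key <pageid, userid>; hypothetical partitioner on userid sends
    # duplicate pairs to the same reducer, where a set removes the duplicates
    reducers = [set() for _ in range(num_reducers)]
    for split in splits:
        for pageid, userid, _time in split:
            reducers[userid % num_reducers].add((pageid, userid))
    # Reduce: count distinct users per pageid within each reducer (partial counts)
    partials = []
    for keys in reducers:
        counts = defaultdict(int)
        for pageid, _userid in sorted(keys):
            counts[pageid] += 1
        partials.extend(counts.items())
    return partials


def stage2(partials):
    # Second Map Reduce job: group partial counts by pageid and sum them
    totals = defaultdict(int)
    for pageid, count in partials:
        totals[pageid] += count
    return sorted(totals.items())


print(stage2(stage1(splits)))  # [(1, 2), (2, 1)]
```

The second job is needed because distinct users for one pageid are spread across reducers in the first stage; summing partial counts is only correct because the first stage never counts the same <pageid, userid> pair on two reducers.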
26.
27.
28.
29.
30.
31.
32.
Data Warehousing at Facebook Today
[Figure: Web Servers, Scribe Servers, Filers, Hive on Hadoop Cluster, Oracle RAC, Federated MySQL.]
33.
34.
In Pictures
35.
36.
37.
Editor's Notes
Offline and near-real-time data processing; not online.