SlideShare una empresa de Scribd logo
1 de 59
Descargar para leer sin conexión
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
Hive: SQL for Hadoop
An	
  Essen8al	
  Tool	
  for	
  
Hadoop-­‐based	
  
Data	
  Warehouses
Tuesday, January 10, 12
I’ll	
  argue	
  that	
  Hive	
  is	
  indispensable	
  to	
  people	
  crea8ng	
  “data	
  warehouses”	
  with	
  Hadoop,	
  because	
  it	
  gives	
  them	
  a	
  “similar”	
  SQL	
  interface	
  to	
  their	
  data,	
  
making	
  it	
  easier	
  to	
  migrate	
  skills	
  and	
  even	
  apps	
  from	
  exis8ng	
  rela8onal	
  tools	
  to	
  Hadoop.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
Adapted from our
Hadoop 3-Day Developer Course
info@thinkbiganalytics.com
thinkbiganalytics.com
Tuesday, January 10, 12
These	
  notes	
  are	
  (heavily)	
  adapted	
  from	
  our	
  Hadoop	
  3-­‐day	
  developer	
  course.	
  Of	
  course,	
  we	
  won’t	
  do	
  exercises	
  in	
  this	
  talk	
  ;)	
  The	
  second	
  day	
  focuses	
  on	
  
Hive.	
  We	
  also	
  offer	
  Hive	
  training	
  by	
  itself,	
  admin	
  courses,	
  etc.	
  
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
Programming Hive
Summer, 2012
O’Reilly
Tuesday, January 10, 12
I’m	
  collabora8ng	
  with	
  two	
  other	
  colleagues	
  to	
  write	
  “Programming	
  Hive”,	
  which	
  will	
  be	
  published	
  by	
  O’Reilly	
  this	
  summer.	
  Un8l	
  it’s	
  ready,	
  the	
  best	
  source	
  
of	
  documenta8on	
  is	
  on	
  the	
  Hive	
  site,	
  hYp://hive.apache.org.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved4
programmingscala.compolyglotprogramming.com/fpjava
Dean Wampler
Functional
Programming
for Java Developers
Dean Wampler
20+ years exp.
Think Big Academy
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved5
A True Story...
Tuesday, January 10, 12
Once upon an Internet
time, in the land of
Enterprise, the King’s
Warehouse of Data was
bursting at the seams.
Tuesday, January 10, 12
And lo, the scribes
decided they could cut
costs and store more
data by moving to
Hadoop.
Tuesday, January 10, 12
But then the scribes
complained amongst
themselves, “The
language of the land of
Java hurts our tongues!
Tuesday, January 10, 12
“It takes forever to say
anything!”
Tuesday, January 10, 12
There was a scribe who
walked through the
Kingdom-wide Labyrinth
(kwl), where he found
The Book of Faces.
Tuesday, January 10, 12
In that book, he found
the secret language the
Bee Keepers use to
query the god Hadoop!
Tuesday, January 10, 12
And there was much
rejoicing, for the scribes
could work with their
data again!
Tuesday, January 10, 12
The End.
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved14
Enter Hive
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved15
Since your team knows SQL
and all your DataWarehouse
apps are written in SQL,
Hive minimizes the effort of
migrating to Hadoop.
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Ideal for data warehousing.
• Ad-hoc queries of data.
• Familiar SQL dialect.
• Analysis of large data sets.
• Hadoop MapReduce jobs.
16
Hive
Tuesday, January 10, 12
Hive is a killer app, in our opinion, for data warehouse teams migrating to Hadoop, because it
gives them a familiar SQL language that hides the complexity of MR programming.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Invented at Facebook.
• Open sourced to Apache in
2008.
• http://hive.apache.org
17
Hive
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved18
A Scenario:
Mining Daily
Click Stream
Logs
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved19
Ingest & Transform:
• From: file://server1/var/log/clicks.log
Jan 9 09:02:17 server1 movies[18]:
1234: search for “vampires in love”.
…
Tuesday, January 10, 12
As we copy the daily click stream log over to a local staging location, we transform it into the
Hive table format we want.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved20
Ingest & Transform:
• From: file://server1/var/log/clicks.log
Jan 9 09:02:17 server1 movies[18]:
1234: search for “vampires in love”.
…
Timestamp
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved21
Ingest & Transform:
• From: file://server1/var/log/clicks.log
Jan 9 09:02:17 server1 movies[18]:
1234: search for “vampires in love”.
…
The server
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved22
Ingest & Transform:
• From: file://server1/var/log/clicks.log
Jan 9 09:02:17 server1 movies[18]:
1234: search for “vampires in love”.
…
The process
(“movies search”)
and the process id.
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved23
Ingest & Transform:
• From: file://server1/var/log/clicks.log
Jan 9 09:02:17 server1 movies[18]:
1234: search for “vampires in love”.
…
Customer id
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved24
Ingest & Transform:
• From: file://server1/var/log/clicks.log
Jan 9 09:02:17 server1 movies[18]:
1234: search for “vampires in love”.
…
The log “message”
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved25
Ingest & Transform:
• From: file://server1/var/log/clicks.log
09:02:17^Aserver1^Amovies^A18^A1234^Asea
rch for “vampires in love”.
…
• To: /staging/2012-01-09.log
Jan 9 09:02:17 server1 movies[18]:
1234: search for “vampires in love”.
…
Tuesday, January 10, 12
As we copy the daily click stream log over to a local staging location, we transform it into the
Hive table format we want.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved26
Ingest & Transform:
09:02:17^Aserver1^Amovies^A18^A1234^Asea
rch for “vampires in love”.
…
• To: /staging/2012-01-09.log
• Removed month (Jan) and day (09).
• Added ^A as field separators (Hive convention).
• Separated process id from process name.
Tuesday, January 10, 12
The transformations we made. (You could use many different Linux, scripting, code, or
Hadoop-related ingestion tools to do this.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved27
Ingest & Transform:
hadoop fs -put /staging/2012-01-09.log 
/clicks/2012/01/09/log.txt
• Put in HDFS:
• (The final file name doesn’t matter…)
Tuesday, January 10, 12
Here we use the hadoop shell command to put the file where we want it in the file system.
Note that the name of the target file doesn’t matter; we’ll just tell Hive to read all files in the
directory, so there could be many files there!
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved28
Back to Hive...
CREATE EXTERNAL TABLE clicks (
hms STRING,
hostname STRING,
process STRING,
pid INT,
uid INT,
message STRING)
PARTITIONED BY (
year INT,
month INT,
day INT);
• Create an external Hive table:
You don’t have to
use EXTERNAL
and PARTITIONED
together….
Tuesday, January 10, 12
Now let’s create an “external” table that will read those files as the “backing store”. Also, we
make it partitioned to accelerate queries that limit by year, month or day. (You don’t have to
use external and partitioned together…)
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved29
Back to Hive...
ALTER TABLE clicks ADD IF NOT EXISTS
PARTITION (
year=2012,
month=01,
day=09)
LOCATION '/clicks/2012/01/09';
• Add a partition for 2012-01-09:
• A directory in HDFS.
Tuesday, January 10, 12
We add a partition for each day. Note the LOCATION path, which is a the directory where we
wrote our file.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved30
Now,Analyze!!
SELECT hms, uid, message FROM clicks
WHERE message LIKE '%vampire%' AND
year = 2012 AND
month = 01 AND
day = 09;
• What’s with the kids and vampires??
• After some MapReduce crunching...
…
09:02:29 1234 search for “twilight of the vampires”
09:02:35 1234 add to cart “vampires want their genre back”
…
Tuesday, January 10, 12
And we can run SQL queries!!
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• SQL analysis with Hive.
• Other tools can use the data, too.
• Massive scalability with Hadoop.
31
Recap
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved32
Hadoop
Master
✲ Job Tracker Name Node
HDFS
Hive
Driver
(compiles, optimizes, executes)
CLI HWI Thrift Server
Metastore
JDBC ODBC
Karmasphere Others...
Tuesday, January 10, 12
Hive queries generate MR jobs. (Some operations don’t invoke Hadoop processes, e.g., some
very simple queries and commands that just write updates to the metastore.)
CLI = Command Line Interface.
HWI = Hive Web Interface.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• HDFS
• MapR
• S3
• HBase (new)
• Others...
33
Tables
Hadoop
Master
✲ Job Tracker Name Node
HDFS
Hive
Driver
(compiles, optimizes, executes)
CLI HWI Thrift Server
Metastore
JDBC ODBC
Karmasphere Others...
Tuesday, January 10, 12
There is “early” support for using Hive with HBase. Other databases and distributed file
systems will no doubt follow.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Table metadata
stored in a
relational DB.
34
Tables
Hadoop
Master
✲ Job Tracker Name Node
HDFS
Hive
Driver
(compiles, optimizes, executes)
CLI HWI Thrift Server
Metastore
JDBC ODBC
Karmasphere Others...
Tuesday, January 10, 12
For production, you need to set up a MySQL or PostgreSQL database for Hive’s metadata. Out
of the box, Hive uses a Derby DB, but it can only be used by a single user and a single process
at a time, so it’s fine for personal development only.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Most queries
use MapReduce
jobs.
35
Queries
Hadoop
Master
✲ Job Tracker Name Node
HDFS
Hive
Driver
(compiles, optimizes, executes)
CLI HWI Thrift Server
Metastore
JDBC ODBC
Karmasphere Others...
Tuesday, January 10, 12
Hive generates MapReduce jobs to implement all the but the simplest queries.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Benefits
• Horizontal scalability.
• Drawbacks
• Latency!
36
MapReduce Queries
Tuesday, January 10, 12
The high latency makes Hive unsuitable for “online” database use. (Hive also doesn’t support
transactions and has other limitations that are relevant here…) So, these limitations make
Hive best for offline (batch mode) use, such as data warehouse apps.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Benefits
• Horizontal scalability.
• Data redundancy.
• Drawbacks
• No insert, update, and
delete!
37
HDFS Storage
Tuesday, January 10, 12
You can generate new tables or write to local files. Forthcoming versions of HDFS will support
appending data.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Schema on Read
• Schema enforcement at
query time, not write
time.
38
HDFS Storage
Tuesday, January 10, 12
Especially for external tables, but even for internal ones since the files are HDFS files, Hive
can’t enforce that records written to table files have the specified schema, so it does these
checks at query time.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• No Transactions.
• Some SQL features not
implemented (yet).
39
Other Limitations
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved40
More on Tables
and Schemas
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved41
Data Types
• The usual scalar types:
•TINYINT, …, BIGNT.
•FLOAT, DOUBLE.
•BOOLEAN.
•STRING.
Tuesday, January 10, 12
Like most databases...
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved42
Data Types
• The unusual complex types:
•STRUCT.
•MAP.
•ARRAY.
Tuesday, January 10, 12
Structs are like “objects” or “c-style structs”. Maps are key-value pairs, and you know what
arrays are ;)
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved43
CREATE TABLE employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING,FLOAT>,
address STRUCT<
street:STRING,
city:STRING,
state:STRING,
zip:INT>
);
Tuesday, January 10, 12
subordinates references other records by the employee name. (Hive doesn’t have indexes, in
the usual sense, but an indexing feature was recently added.) Deductions is a key-value list
of the name of the deduction and a float indicating the amount (e.g., %). Address is like a
“class”, “object”, or “c-style struct”, whatever you prefer.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved44
File & Record Formats
CREATE TABLE employees (…)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '001'
COLLECTION ITEMS TERMINATED BY '002'
MAP KEYS TERMINATED BY '003'
LINES TERMINATED BY 'n'
STORED AS TEXTFILE;
All the
defaults for
text files!
Tuesday, January 10, 12
Suppose our employees table has a custom format and field delimiters. We can change them,
although here I’m showing all the default values used by Hive!
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved45
Select,Where,
Group By, Join,...
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved46
Common SQL...
• You get most of the usual
suspects for SELECT, WHERE,
GROUP BY and JOIN.
Tuesday, January 10, 12
We’ll just highlight a few unique features.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved47
ADD JAR MyUDFs.jar;
CREATE TEMPORARY FUNCTION
net_salary
AS 'com.example.NetCalcUDF';
SELECT name,
net_salary(salary, deductions)
FROM employees;
“User Defined
Functions”
Tuesday, January 10, 12
Following a Hive defined API, implement your own functions, build, put in a jar, and then use
them in your queries. Here we (pretend to) implement a function that takes the employee’s
salary and deductions, then computes the net salary.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved48
SELECT name, salary
FROM employees
ORDER BY salary ASC;
ORDER BY vs.
SORT BY
• A total ordering - one reducer.
SELECT name, salary
FROM employees
SORT BY salary ASC;
• A local ordering - sorts within each reducer.
Tuesday, January 10, 12
For a giant data set, piping everything through one reducer might take a very long time. A
compromise is to sort “locally”, so each reducer sorts it’s output. However, if you structure
your jobs right, you might achieve a total order depending on how data gets to the reducers.
(E.g., each reducer handles a year’s worth of data, so joining the files together would be
totally sorted.)
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Only equality (x = y).
49
Inner Joins
SELECT ...
FROM clicks a JOIN clicks b ON (
a.uid = b.uid, a.day = b.day)
WHERE a.process = 'movies'
AND b.process = 'books'
AND a.year > 2012;
Tuesday, January 10, 12
Note that the a.year > ‘…’ is in the WHERE clause, not the ON clause for the JOIN. (I’m doing a
correlation query; which users searched for movies and books on the same day?)
Some outer and semi join constructs supported, as well as some Hadoop-specific
optimization constructs.
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved50
A Final Example
of Controlling
MapReduce...
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Calling out to external
programs to perform map
and reduce operations.
51
Specify Map & Reduce
Processes
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved52
Example
FROM (
FROM clicks
MAP message
USING '/tmp/vampire_extractor'
AS item_title, count
CLUSTER BY item_title) it
INSERT OVERWRITE TABLE vampire_stuff
REDUCE it.item_title, it.count
USING '/tmp/thing_counter.py'
AS item_title, counts;
Tuesday, January 10, 12
Note the MAP … USING and REDUCE … USING. We’re also using CLUSTER BY (distributing and
sorting on “item_title”).
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved53
Example
FROM (
FROM clicks
MAP message
USING '/tmp/vampire_extractor'
AS item_title, count
CLUSTER BY item_title) it
INSERT OVERWRITE TABLE vampire_stuff
REDUCE it.item_title, it.count
USING '/tmp/thing_counter.py'
AS item_title, counts;
Call specific
map and
reduce
processes.
Tuesday, January 10, 12
Note the MAP … USING and REDUCE … USING. We’re also using CLUSTER BY (distributing and
sorting on “item_title”).
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved54
… And Also:
FROM (
FROM clicks
MAP message
USING '/tmp/vampire_extractor'
AS item_title, count
CLUSTER BY item_title) it
INSERT OVERWRITE TABLE vampire_stuff
REDUCE it.item_title, it.count
USING '/tmp/thing_counter.py'
AS item_title, counts;
Like GROUP
BY, but
directs
output to
specific
reducers.
Tuesday, January 10, 12
Note the MAP … USING and REDUCE … USING. We’re also using CLUSTER BY (distributing and
sorting on “item_title”).
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved55
… And Also:
FROM (
FROM clicks
MAP message
USING '/tmp/vampire_extractor'
AS item_title, count
CLUSTER BY item_title) it
INSERT OVERWRITE TABLE vampire_stuff
REDUCE it.item_title, it.count
USING '/tmp/thing_counter.py'
AS item_title, counts;
How to
populate an
“internal”
table.
Tuesday, January 10, 12
Note the MAP … USING and REDUCE … USING. We’re also using CLUSTER BY (distributing and
sorting on “item_title”).
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved56
Hive:
Conclusions
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Not a real SQL Database.
• Transactions, updates, etc.
• … but features will grow.
• High latency queries.
• Documentation poor.
57
Hive Disadvantages
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved
• Indispensable for SQL users.
• Easier than Java MR API.
• Makes porting data warehouse
apps to Hadoop much easier.
58
Hive Advantages
Tuesday, January 10, 12
Copyright	
  ©	
  2011-­‐2012,	
  Think	
  Big	
  Analy8cs,	
  All	
  Rights	
  Reserved59
Questions?
dean.wampler@thinkbiganalytics.com
thinkbiganalytics.com
This presentation: github.com/deanwampler/Presentations
Tuesday, January 10, 12

Más contenido relacionado

La actualidad más candente

Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - OverviewJay
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoopjeffturner
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoopjoelcrabb
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training Keylabs
 
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopSomeshwar Kale
 
Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Uwe Printz
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Uwe Printz
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentationArvind Kumar
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introductionXuan-Chao Huang
 

La actualidad más candente (20)

Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
מיכאל
מיכאלמיכאל
מיכאל
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
An Introduction to Hadoop
An Introduction to HadoopAn Introduction to Hadoop
An Introduction to Hadoop
 
Hadoop pig
Hadoop pigHadoop pig
Hadoop pig
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training
 
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
6.hive
6.hive6.hive
6.hive
 
Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 

Destacado

Big Data Timeline
Big Data TimelineBig Data Timeline
Big Data TimelineDeZyre
 
A Big Data Timeline
A Big Data TimelineA Big Data Timeline
A Big Data TimelineBig Cloud
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQLkristinferrier
 
SQL in the Hybrid World
SQL in the Hybrid WorldSQL in the Hybrid World
SQL in the Hybrid WorldTanel Poder
 
Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...
Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...
Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...Cloudera, Inc.
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Emilio Coppa
 
SQL Monitoring in Oracle Database 12c
SQL Monitoring in Oracle Database 12cSQL Monitoring in Oracle Database 12c
SQL Monitoring in Oracle Database 12cTanel Poder
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Edureka!
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 

Destacado (13)

20140708hcj
20140708hcj20140708hcj
20140708hcj
 
Apache hive
Apache hiveApache hive
Apache hive
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Big Data Timeline
Big Data TimelineBig Data Timeline
Big Data Timeline
 
A Big Data Timeline
A Big Data TimelineA Big Data Timeline
A Big Data Timeline
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
 
SQL in the Hybrid World
SQL in the Hybrid WorldSQL in the Hybrid World
SQL in the Hybrid World
 
Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...
Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...
Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
SQL Monitoring in Oracle Database 12c
SQL Monitoring in Oracle Database 12cSQL Monitoring in Oracle Database 12c
SQL Monitoring in Oracle Database 12c
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 

Similar a Hive sq lfor-hadoop

Dallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: HadoopDallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: Hadooplamont_lockwood
 
Why Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldWhy Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldDean Wampler
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoopHortonworks
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applicationsrussell_jurney
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Chris Baglieri
 
Hadoop, SQL & NoSQL: No Longer an Either-or Question
Hadoop, SQL & NoSQL: No Longer an Either-or QuestionHadoop, SQL & NoSQL: No Longer an Either-or Question
Hadoop, SQL & NoSQL: No Longer an Either-or QuestionTony Baer
 
Hadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionHadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionDataWorks Summit
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...DataWorks Summit
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitDataWorks Summit
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformInMobi Technology
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoopAdam Muise
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxAltafKhadim
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detikk4ndar
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopStefano Paluello
 
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data SlidesUtrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data SlidesHortonworks
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopHortonworks
 

Similar a Hive sq lfor-hadoop (20)

Dallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: HadoopDallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: Hadoop
 
Why Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldWhy Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data World
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applications
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
Hadoop, SQL & NoSQL: No Longer an Either-or Question
Hadoop, SQL & NoSQL: No Longer an Either-or QuestionHadoop, SQL & NoSQL: No Longer an Either-or Question
Hadoop, SQL & NoSQL: No Longer an Either-or Question
 
Hadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionHadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or question
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale Platform
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoop
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptx
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and Hadoop
 
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data SlidesUtrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on Hadoop
 

Más de Pragati Singh

Nessus Scanner: Network Scanning from Beginner to Advanced!
Nessus Scanner: Network Scanning from Beginner to Advanced! Nessus Scanner: Network Scanning from Beginner to Advanced!
Nessus Scanner: Network Scanning from Beginner to Advanced! Pragati Singh
 
Tenable Certified Sales Associate - CS.pdf
Tenable Certified Sales Associate - CS.pdfTenable Certified Sales Associate - CS.pdf
Tenable Certified Sales Associate - CS.pdfPragati Singh
 
Analyzing risk (pmbok® guide sixth edition)
Analyzing risk (pmbok® guide sixth edition)Analyzing risk (pmbok® guide sixth edition)
Analyzing risk (pmbok® guide sixth edition)Pragati Singh
 
Pragati Singh | Sap Badge
Pragati Singh | Sap BadgePragati Singh | Sap Badge
Pragati Singh | Sap BadgePragati Singh
 
Ios record of achievement
Ios  record of achievementIos  record of achievement
Ios record of achievementPragati Singh
 
Ios2 confirmation ofparticipation
Ios2 confirmation ofparticipationIos2 confirmation ofparticipation
Ios2 confirmation ofparticipationPragati Singh
 
Certificate of completion android studio essential training 2016
Certificate of completion android studio essential training 2016Certificate of completion android studio essential training 2016
Certificate of completion android studio essential training 2016Pragati Singh
 
Certificate of completion android development essential training create your ...
Certificate of completion android development essential training create your ...Certificate of completion android development essential training create your ...
Certificate of completion android development essential training create your ...Pragati Singh
 
Certificate of completion android development essential training design a use...
Certificate of completion android development essential training design a use...Certificate of completion android development essential training design a use...
Certificate of completion android development essential training design a use...Pragati Singh
 
Certificate of completion android development essential training support mult...
Certificate of completion android development essential training support mult...Certificate of completion android development essential training support mult...
Certificate of completion android development essential training support mult...Pragati Singh
 
Certificate of completion android development essential training manage navig...
Certificate of completion android development essential training manage navig...Certificate of completion android development essential training manage navig...
Certificate of completion android development essential training manage navig...Pragati Singh
 
Certificate of completion android development essential training local data s...
Certificate of completion android development essential training local data s...Certificate of completion android development essential training local data s...
Certificate of completion android development essential training local data s...Pragati Singh
 
Certificate of completion android development essential training distributing...
Certificate of completion android development essential training distributing...Certificate of completion android development essential training distributing...
Certificate of completion android development essential training distributing...Pragati Singh
 
Certificate of completion android app development communicating with the user
Certificate of completion android app development communicating with the userCertificate of completion android app development communicating with the user
Certificate of completion android app development communicating with the userPragati Singh
 
Certificate of completion building flexible android apps with the fragments api
Certificate of completion building flexible android apps with the fragments apiCertificate of completion building flexible android apps with the fragments api
Certificate of completion building flexible android apps with the fragments apiPragati Singh
 
Certificate of completion android app development design patterns for mobile ...
Certificate of completion android app development design patterns for mobile ...Certificate of completion android app development design patterns for mobile ...
Certificate of completion android app development design patterns for mobile ...Pragati Singh
 
Certificate of completion java design patterns and apis for android
Certificate of completion java design patterns and apis for androidCertificate of completion java design patterns and apis for android
Certificate of completion java design patterns and apis for androidPragati Singh
 
Certificate of completion android development concurrent programming
Certificate of completion android development concurrent programmingCertificate of completion android development concurrent programming
Certificate of completion android development concurrent programmingPragati Singh
 
Certificate of completion android app development data persistence libraries
Certificate of completion android app development data persistence librariesCertificate of completion android app development data persistence libraries
Certificate of completion android app development data persistence librariesPragati Singh
 
Certificate of completion android app development restful web services
Certificate of completion android app development restful web servicesCertificate of completion android app development restful web services
Certificate of completion android app development restful web servicesPragati Singh
 

Más de Pragati Singh (20)

Nessus Scanner: Network Scanning from Beginner to Advanced!
Nessus Scanner: Network Scanning from Beginner to Advanced! Nessus Scanner: Network Scanning from Beginner to Advanced!
Nessus Scanner: Network Scanning from Beginner to Advanced!
 
Tenable Certified Sales Associate - CS.pdf
Tenable Certified Sales Associate - CS.pdfTenable Certified Sales Associate - CS.pdf
Tenable Certified Sales Associate - CS.pdf
 
Analyzing risk (pmbok® guide sixth edition)
Analyzing risk (pmbok® guide sixth edition)Analyzing risk (pmbok® guide sixth edition)
Analyzing risk (pmbok® guide sixth edition)
 
Pragati Singh | Sap Badge
Pragati Singh | Sap BadgePragati Singh | Sap Badge
Pragati Singh | Sap Badge
 
Ios record of achievement
Ios  record of achievementIos  record of achievement
Ios record of achievement
 
Ios2 confirmation ofparticipation
Ios2 confirmation ofparticipationIos2 confirmation ofparticipation
Ios2 confirmation ofparticipation
 
Certificate of completion android studio essential training 2016
Certificate of completion android studio essential training 2016Certificate of completion android studio essential training 2016
Certificate of completion android studio essential training 2016
 
Certificate of completion android development essential training create your ...
Certificate of completion android development essential training create your ...Certificate of completion android development essential training create your ...
Certificate of completion android development essential training create your ...
 
Certificate of completion android development essential training design a use...
Certificate of completion android development essential training design a use...Certificate of completion android development essential training design a use...
Certificate of completion android development essential training design a use...
 
Certificate of completion android development essential training support mult...
Certificate of completion android development essential training support mult...Certificate of completion android development essential training support mult...
Certificate of completion android development essential training support mult...
 
Certificate of completion android development essential training manage navig...
Certificate of completion android development essential training manage navig...Certificate of completion android development essential training manage navig...
Certificate of completion android development essential training manage navig...
 
Certificate of completion android development essential training local data s...
Certificate of completion android development essential training local data s...Certificate of completion android development essential training local data s...
Certificate of completion android development essential training local data s...
 
Certificate of completion android development essential training distributing...
Certificate of completion android development essential training distributing...Certificate of completion android development essential training distributing...
Certificate of completion android development essential training distributing...
 
Certificate of completion android app development communicating with the user
Certificate of completion android app development communicating with the userCertificate of completion android app development communicating with the user
Certificate of completion android app development communicating with the user
 
Certificate of completion building flexible android apps with the fragments api
Certificate of completion building flexible android apps with the fragments apiCertificate of completion building flexible android apps with the fragments api
Certificate of completion building flexible android apps with the fragments api
 
Certificate of completion android app development design patterns for mobile ...
Certificate of completion android app development design patterns for mobile ...Certificate of completion android app development design patterns for mobile ...
Certificate of completion android app development design patterns for mobile ...
 
Certificate of completion java design patterns and apis for android
Certificate of completion java design patterns and apis for androidCertificate of completion java design patterns and apis for android
Certificate of completion java design patterns and apis for android
 
Certificate of completion android development concurrent programming
Certificate of completion android development concurrent programmingCertificate of completion android development concurrent programming
Certificate of completion android development concurrent programming
 
Certificate of completion android app development data persistence libraries
Certificate of completion android app development data persistence librariesCertificate of completion android app development data persistence libraries
Certificate of completion android app development data persistence libraries
 
Certificate of completion android app development restful web services
Certificate of completion android app development restful web servicesCertificate of completion android app development restful web services
Certificate of completion android app development restful web services
 

Último

Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 

Último (20)

Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 

Hive sq lfor-hadoop

  • 1. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved Hive: SQL for Hadoop An  Essen8al  Tool  for   Hadoop-­‐based   Data  Warehouses Tuesday, January 10, 12 I’ll  argue  that  Hive  is  indispensable  to  people  crea8ng  “data  warehouses”  with  Hadoop,  because  it  gives  them  a  “similar”  SQL  interface  to  their  data,   making  it  easier  to  migrate  skills  and  even  apps  from  exis8ng  rela8onal  tools  to  Hadoop.
  • 2. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved Adapted from our Hadoop 3-Day Developer Course info@thinkbiganalytics.com thinkbiganalytics.com Tuesday, January 10, 12 These  notes  are  (heavily)  adapted  from  our  Hadoop  3-­‐day  developer  course.  Of  course,  we  won’t  do  exercises  in  this  talk  ;)  The  second  day  focuses  on   Hive.  We  also  offer  Hive  training  by  itself,  admin  courses,  etc.  
  • 3. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved Programming Hive Summer, 2012 O’Reilly Tuesday, January 10, 12 I’m  collabora8ng  with  two  other  colleagues  to  write  “Programming  Hive”,  which  will  be  published  by  O’Reilly  this  summer.  Un8l  it’s  ready,  the  best  source   of  documenta8on  is  on  the  Hive  site,  hYp://hive.apache.org.
  • 4. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved4 programmingscala.compolyglotprogramming.com/fpjava Dean Wampler Functional Programming for Java Developers Dean Wampler 20+ years exp. Think Big Academy Tuesday, January 10, 12
  • 5. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved5 A True Story... Tuesday, January 10, 12
  • 6. Once upon an Internet time, in the land of Enterprise, the King’s Warehouse of Data was bursting at the seams. Tuesday, January 10, 12
  • 7. And lo, the scribes decided they could cut costs and store more data by moving to Hadoop. Tuesday, January 10, 12
  • 8. But then the scribes complained amongst themselves, “The language of the land of Java hurts our tongues! Tuesday, January 10, 12
  • 9. “It takes forever to say anything!” Tuesday, January 10, 12
  • 10. There was a scribe who walked through the Kingdom-wide Labyrinth (kwl), where he found The Book of Faces. Tuesday, January 10, 12
  • 11. In that book, he found the secret language the Bee Keepers use to query the god Hadoop! Tuesday, January 10, 12
  • 12. And there was much rejoicing, for the scribes could work with their data again! Tuesday, January 10, 12
  • 14. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved14 Enter Hive Tuesday, January 10, 12
  • 15. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved15 Since your team knows SQL and all your DataWarehouse apps are written in SQL, Hive minimizes the effort of migrating to Hadoop. Tuesday, January 10, 12
  • 16. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Ideal for data warehousing. • Ad-hoc queries of data. • Familiar SQL dialect. • Analysis of large data sets. • Hadoop MapReduce jobs. 16 Hive Tuesday, January 10, 12 Hive is a killer app, in our opinion, for data warehouse teams migrating to Hadoop, because it gives them a familiar SQL language that hides the complexity of MR programming.
  • 17. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Invented at Facebook. • Open sourced to Apache in 2008. • http://hive.apache.org 17 Hive Tuesday, January 10, 12
  • 18. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved18 A Scenario: Mining Daily Click Stream Logs Tuesday, January 10, 12
  • 19. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved19 Ingest & Transform: • From: file://server1/var/log/clicks.log Jan 9 09:02:17 server1 movies[18]: 1234: search for “vampires in love”. … Tuesday, January 10, 12 As we copy the daily click stream log over to a local staging location, we transform it into the Hive table format we want.
  • 20. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved20 Ingest & Transform: • From: file://server1/var/log/clicks.log Jan 9 09:02:17 server1 movies[18]: 1234: search for “vampires in love”. … Timestamp Tuesday, January 10, 12
  • 21. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved21 Ingest & Transform: • From: file://server1/var/log/clicks.log Jan 9 09:02:17 server1 movies[18]: 1234: search for “vampires in love”. … The server Tuesday, January 10, 12
  • 22. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved22 Ingest & Transform: • From: file://server1/var/log/clicks.log Jan 9 09:02:17 server1 movies[18]: 1234: search for “vampires in love”. … The process (“movies search”) and the process id. Tuesday, January 10, 12
  • 23. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved23 Ingest & Transform: • From: file://server1/var/log/clicks.log Jan 9 09:02:17 server1 movies[18]: 1234: search for “vampires in love”. … Customer id Tuesday, January 10, 12
  • 24. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved24 Ingest & Transform: • From: file://server1/var/log/clicks.log Jan 9 09:02:17 server1 movies[18]: 1234: search for “vampires in love”. … The log “message” Tuesday, January 10, 12
  • 25. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved25 Ingest & Transform: • From: file://server1/var/log/clicks.log 09:02:17^Aserver1^Amovies^A18^A1234^Asea rch for “vampires in love”. … • To: /staging/2012-01-09.log Jan 9 09:02:17 server1 movies[18]: 1234: search for “vampires in love”. … Tuesday, January 10, 12 As we copy the daily click stream log over to a local staging location, we transform it into the Hive table format we want.
  • 26. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved26 Ingest & Transform: 09:02:17^Aserver1^Amovies^A18^A1234^Asea rch for “vampires in love”. … • To: /staging/2012-01-09.log • Removed month (Jan) and day (09). • Added ^A as field separators (Hive convention). • Separated process id from process name. Tuesday, January 10, 12 The transformations we made. (You could use many different Linux, scripting, code, or Hadoop-related ingestion tools to do this.
  • 27. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved27 Ingest & Transform: hadoop fs -put /staging/2012-01-09.log /clicks/2012/01/09/log.txt • Put in HDFS: • (The final file name doesn’t matter…) Tuesday, January 10, 12 Here we use the hadoop shell command to put the file where we want it in the file system. Note that the name of the target file doesn’t matter; we’ll just tell Hive to read all files in the directory, so there could be many files there!
  • 28. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved28 Back to Hive... CREATE EXTERNAL TABLE clicks ( hms STRING, hostname STRING, process STRING, pid INT, uid INT, message STRING) PARTITIONED BY ( year INT, month INT, day INT); • Create an external Hive table: You don’t have to use EXTERNAL and PARTITIONED together…. Tuesday, January 10, 12 Now let’s create an “external” table that will read those files as the “backing store”. Also, we make it partitioned to accelerate queries that limit by year, month or day. (You don’t have to use external and partitioned together…)
  • 29. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved29 Back to Hive... ALTER TABLE clicks ADD IF NOT EXISTS PARTITION ( year=2012, month=01, day=09) LOCATION '/clicks/2012/01/09'; • Add a partition for 2012-01-09: • A directory in HDFS. Tuesday, January 10, 12 We add a partition for each day. Note the LOCATION path, which is a the directory where we wrote our file.
  • 30. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved30 Now,Analyze!! SELECT hms, uid, message FROM clicks WHERE message LIKE '%vampire%' AND year = 2012 AND month = 01 AND day = 09; • What’s with the kids and vampires?? • After some MapReduce crunching... … 09:02:29 1234 search for “twilight of the vampires” 09:02:35 1234 add to cart “vampires want their genre back” … Tuesday, January 10, 12 And we can run SQL queries!!
  • 31. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • SQL analysis with Hive. • Other tools can use the data, too. • Massive scalability with Hadoop. 31 Recap Tuesday, January 10, 12
  • 32. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved32 Hadoop Master ✲ Job Tracker Name Node HDFS Hive Driver (compiles, optimizes, executes) CLI HWI Thrift Server Metastore JDBC ODBC Karmasphere Others... Tuesday, January 10, 12 Hive queries generate MR jobs. (Some operations don’t invoke Hadoop processes, e.g., some very simple queries and commands that just write updates to the metastore.) CLI = Command Line Interface. HWI = Hive Web Interface.
  • 33. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • HDFS • MapR • S3 • HBase (new) • Others... 33 Tables Hadoop Master ✲ Job Tracker Name Node HDFS Hive Driver (compiles, optimizes, executes) CLI HWI Thrift Server Metastore JDBC ODBC Karmasphere Others... Tuesday, January 10, 12 There is “early” support for using Hive with HBase. Other databases and distributed file systems will no doubt follow.
  • 34. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Table metadata stored in a relational DB. 34 Tables Hadoop Master ✲ Job Tracker Name Node HDFS Hive Driver (compiles, optimizes, executes) CLI HWI Thrift Server Metastore JDBC ODBC Karmasphere Others... Tuesday, January 10, 12 For production, you need to set up a MySQL or PostgreSQL database for Hive’s metadata. Out of the box, Hive uses a Derby DB, but it can only be used by a single user and a single process at a time, so it’s fine for personal development only.
  • 35. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Most queries use MapReduce jobs. 35 Queries Hadoop Master ✲ Job Tracker Name Node HDFS Hive Driver (compiles, optimizes, executes) CLI HWI Thrift Server Metastore JDBC ODBC Karmasphere Others... Tuesday, January 10, 12 Hive generates MapReduce jobs to implement all the but the simplest queries.
  • 36. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Benefits • Horizontal scalability. • Drawbacks • Latency! 36 MapReduce Queries Tuesday, January 10, 12 The high latency makes Hive unsuitable for “online” database use. (Hive also doesn’t support transactions and has other limitations that are relevant here…) So, these limitations make Hive best for offline (batch mode) use, such as data warehouse apps.
  • 37. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Benefits • Horizontal scalability. • Data redundancy. • Drawbacks • No insert, update, and delete! 37 HDFS Storage Tuesday, January 10, 12 You can generate new tables or write to local files. Forthcoming versions of HDFS will support appending data.
  • 38. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Schema on Read • Schema enforcement at query time, not write time. 38 HDFS Storage Tuesday, January 10, 12 Especially for external tables, but even for internal ones since the files are HDFS files, Hive can’t enforce that records written to table files have the specified schema, so it does these checks at query time.
  • 39. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • No Transactions. • Some SQL features not implemented (yet). 39 Other Limitations Tuesday, January 10, 12
  • 40. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved40 More on Tables and Schemas Tuesday, January 10, 12
  • 41. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved41 Data Types • The usual scalar types: •TINYINT, …, BIGNT. •FLOAT, DOUBLE. •BOOLEAN. •STRING. Tuesday, January 10, 12 Like most databases...
  • 42. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved42 Data Types • The unusual complex types: •STRUCT. •MAP. •ARRAY. Tuesday, January 10, 12 Structs are like “objects” or “c-style structs”. Maps are key-value pairs, and you know what arrays are ;)
  • 43. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved43 CREATE TABLE employees ( name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING,FLOAT>, address STRUCT< street:STRING, city:STRING, state:STRING, zip:INT> ); Tuesday, January 10, 12 subordinates references other records by the employee name. (Hive doesn’t have indexes, in the usual sense, but an indexing feature was recently added.) Deductions is a key-value list of the name of the deduction and a float indicating the amount (e.g., %). Address is like a “class”, “object”, or “c-style struct”, whatever you prefer.
  • 44. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved44 File & Record Formats CREATE TABLE employees (…) ROW FORMAT DELIMITED FIELDS TERMINATED BY '001' COLLECTION ITEMS TERMINATED BY '002' MAP KEYS TERMINATED BY '003' LINES TERMINATED BY 'n' STORED AS TEXTFILE; All the defaults for text files! Tuesday, January 10, 12 Suppose our employees table has a custom format and field delimiters. We can change them, although here I’m showing all the default values used by Hive!
  • 45. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved45 Select,Where, Group By, Join,... Tuesday, January 10, 12
  • 46. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved46 Common SQL... • You get most of the usual suspects for SELECT, WHERE, GROUP BY and JOIN. Tuesday, January 10, 12 We’ll just highlight a few unique features.
  • 47. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved47 ADD JAR MyUDFs.jar; CREATE TEMPORARY FUNCTION net_salary AS 'com.example.NetCalcUDF'; SELECT name, net_salary(salary, deductions) FROM employees; “User Defined Functions” Tuesday, January 10, 12 Following a Hive defined API, implement your own functions, build, put in a jar, and then use them in your queries. Here we (pretend to) implement a function that takes the employee’s salary and deductions, then computes the net salary.
  • 48. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved48 SELECT name, salary FROM employees ORDER BY salary ASC; ORDER BY vs. SORT BY • A total ordering - one reducer. SELECT name, salary FROM employees SORT BY salary ASC; • A local ordering - sorts within each reducer. Tuesday, January 10, 12 For a giant data set, piping everything through one reducer might take a very long time. A compromise is to sort “locally”, so each reducer sorts it’s output. However, if you structure your jobs right, you might achieve a total order depending on how data gets to the reducers. (E.g., each reducer handles a year’s worth of data, so joining the files together would be totally sorted.)
  • 49. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Only equality (x = y). 49 Inner Joins SELECT ... FROM clicks a JOIN clicks b ON ( a.uid = b.uid, a.day = b.day) WHERE a.process = 'movies' AND b.process = 'books' AND a.year > 2012; Tuesday, January 10, 12 Note that the a.year > ‘…’ is in the WHERE clause, not the ON clause for the JOIN. (I’m doing a correlation query; which users searched for movies and books on the same day?) Some outer and semi join constructs supported, as well as some Hadoop-specific optimization constructs.
  • 50. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved50 A Final Example of Controlling MapReduce... Tuesday, January 10, 12
  • 51. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Calling out to external programs to perform map and reduce operations. 51 Specify Map & Reduce Processes Tuesday, January 10, 12
  • 52. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved52 Example FROM ( FROM clicks MAP message USING '/tmp/vampire_extractor' AS item_title, count CLUSTER BY item_title) it INSERT OVERWRITE TABLE vampire_stuff REDUCE it.item_title, it.count USING '/tmp/thing_counter.py' AS item_title, counts; Tuesday, January 10, 12 Note the MAP … USING and REDUCE … USING. We’re also using CLUSTER BY (distributing and sorting on “item_title”).
  • 53. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved53 Example FROM ( FROM clicks MAP message USING '/tmp/vampire_extractor' AS item_title, count CLUSTER BY item_title) it INSERT OVERWRITE TABLE vampire_stuff REDUCE it.item_title, it.count USING '/tmp/thing_counter.py' AS item_title, counts; Call specific map and reduce processes. Tuesday, January 10, 12 Note the MAP … USING and REDUCE … USING. We’re also using CLUSTER BY (distributing and sorting on “item_title”).
  • 54. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved54 … And Also: FROM ( FROM clicks MAP message USING '/tmp/vampire_extractor' AS item_title, count CLUSTER BY item_title) it INSERT OVERWRITE TABLE vampire_stuff REDUCE it.item_title, it.count USING '/tmp/thing_counter.py' AS item_title, counts; Like GROUP BY, but directs output to specific reducers. Tuesday, January 10, 12 Note the MAP … USING and REDUCE … USING. We’re also using CLUSTER BY (distributing and sorting on “item_title”).
  • 55. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved55 … And Also: FROM ( FROM clicks MAP message USING '/tmp/vampire_extractor' AS item_title, count CLUSTER BY item_title) it INSERT OVERWRITE TABLE vampire_stuff REDUCE it.item_title, it.count USING '/tmp/thing_counter.py' AS item_title, counts; How to populate an “internal” table. Tuesday, January 10, 12 Note the MAP … USING and REDUCE … USING. We’re also using CLUSTER BY (distributing and sorting on “item_title”).
  • 56. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved56 Hive: Conclusions Tuesday, January 10, 12
  • 57. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Not a real SQL Database. • Transactions, updates, etc. • … but features will grow. • High latency queries. • Documentation poor. 57 Hive Disadvantages Tuesday, January 10, 12
  • 58. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved • Indispensable for SQL users. • Easier than Java MR API. • Makes porting data warehouse apps to Hadoop much easier. 58 Hive Advantages Tuesday, January 10, 12
  • 59. Copyright  ©  2011-­‐2012,  Think  Big  Analy8cs,  All  Rights  Reserved59 Questions? dean.wampler@thinkbiganalytics.com thinkbiganalytics.com This presentation: github.com/deanwampler/Presentations Tuesday, January 10, 12