hadoop fs

$ hadoop fs

ls

$ hadoop fs -help ls

$ hadoop fs -ls <path>
$ hadoop fs -ls /

$ hadoop fs -ls
$ hadoop fs -ls /user/cloudera

$ hadoop fs -mkdir data
$ hadoop fs -ls

$ cd ~/bigdata/Exercises/hadoop/data
$ ls -l
$ hadoop fs -put mammograms.zip data

http://localhost:50070
fsck: an HDFS utility

$ hadoop fsck /user/cloudera/data/mammograms.zip -blocks -locations -files

$ head -n 100 ato_centenary.txt | hadoop fs -put - data/ato100.txt

$ head -n 1000 ato_centenary.txt | hadoop fs -put - data/ato100.txt

put: `data/ato100.txt': File exists

$ hadoop fs -rm data/ato100.txt
$ head -n 1000 ato_centenary.txt | hadoop fs -put - data/ato100.txt

$ hadoop fs -cat data/ato100.txt | less

$ hadoop fs -get data/ato100.txt ato100.txt

-mv, -cp, -rmdir, -stat ...

$ javac -classpath `hadoop classpath` *.java

$ jar cvf csiro.jar *.class

$ hadoop jar csiro.jar Csiro input_dir output_dir

map(in_key, in_value) -> (inter_key, inter_value) list

let map(key, value) =
    emit(key.toUpper(), value.toUpper())
('csiro', 'cci') -> ('CSIRO', 'CCI')
('csiro', 'cesre') -> ('CSIRO', 'CESRE')
('csiro', 'cmse') -> ('CSIRO', 'CMSE')
('toyota', 'yaris') -> ('TOYOTA', 'YARIS')

let map(key, value) =
    foreach char c in value:
        emit(key, c)
('cci', 'csiro') -> ('cci', 'c'), ('cci', 's'), ('cci', 'i'), ('cci', 'r'), ('cci', 'o')
('open', 'nasa') -> ('open', 'n'), ('open', 'a'), ('open', 's'), ('open', 'a')

let map(key, value) =
    emit(value.length(), value)
('csiro', 'cci') -> ('3', 'cci')
('csiro', 'cesre') -> ('5', 'cesre')
('csiro', 'cmse') -> ('4', 'cmse')
('toyota', 'yaris') -> ('5', 'yaris')
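
The three map examples above can be tried outside Hadoop. The following is a minimal Python sketch, not Hadoop code: the function names are made up for illustration, and each mapper is written as a generator so that "emit" becomes `yield`. Note that in Python the length key comes out as an integer rather than a quoted string.

```python
from typing import Iterator, Tuple


def map_upper(key: str, value: str) -> Iterator[Tuple[str, str]]:
    """Emit a single pair with key and value upper-cased."""
    yield key.upper(), value.upper()


def map_explode(key: str, value: str) -> Iterator[Tuple[str, str]]:
    """Emit one (key, character) pair per character of the value."""
    for c in value:
        yield key, c


def map_length(key: str, value: str) -> Iterator[Tuple[int, str]]:
    """Re-key the record by the length of its value (the input key is dropped)."""
    yield len(value), value


if __name__ == "__main__":
    print(list(map_upper("csiro", "cci")))
    print(list(map_explode("open", "nasa")))
    print(list(map_length("toyota", "yaris")))
```

Each function takes one input pair and returns a list of intermediate pairs, which matches the `map(in_key, in_value) -> (inter_key, inter_value) list` shape.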

map(String input_key, String input_value)
    foreach word w in input_value:
        emit(w, 1)

reduce(String output_key, Iterator<int> intermediate_values)
    set count = 0
    foreach v in intermediate_values:
        count += v
    emit(output_key, count)
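
The word count pseudocode above can be simulated in a few lines of Python. This is a sketch under assumptions, not the Java code from the exercise: `run_wordcount` is a hypothetical in-memory driver that stands in for Hadoop's shuffle phase by grouping values per key, and the line number is used as the input key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(input_key: str, input_value: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for w in input_value.split():
        yield w, 1


def reduce_counts(output_key: str, intermediate_values: Iterable[int]) -> Tuple[str, int]:
    """Sum all the 1s emitted for a single word."""
    count = 0
    for v in intermediate_values:
        count += v
    return output_key, count


def run_wordcount(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical driver: map every line, group by key, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line_no, line in enumerate(lines):
        # Assumption: the line number plays the role of the input key.
        for word, one in map_words(str(line_no), line):
            groups[word].append(one)
    # Hadoop hands keys to the reducer in sorted order; sorted() mimics that.
    return dict(reduce_counts(word, ones) for word, ones in sorted(groups.items()))


if __name__ == "__main__":
    print(run_wordcount(["to be or not to be"]))
```

The grouping step in the middle is the part Hadoop does for you between map and reduce.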

Wordcount
$ cd ~/bigdata/Exercises/hadoop/wordcount; ls
WordCount.java
WordMapper.java
SumReducer.java

$ javac -classpath `hadoop classpath` *.java

$ jar cvf wc.jar *.class

$ hadoop jar wc.jar WordCount data/ato100.txt ato_wc

$ hadoop fs -ls ato_wc
$ hadoop fs -cat ato_wc/part-r-00000 | less
$ hadoop fs -cat ato_wc/* | grep -E 'ATO|CSIRO'

$ hadoop fs -rm -r ato_wc

Average max temperature

$ cd ~/bigdata/Exercises/hadoop/data
$ less nsw_temp.csv
$ less bom_data_Note.txt

map(String input_key, String input_value):
    fields = input_value.split(',')
    emit(fields[3], fields[5])

('IDCJAC0010,061087,1965,01,02,32.2,1,Y') -> ('01', 32.2)
('IDCJAC0010,066062,1890,04,27,20.2,1,Y') -> ('04', 20.2)
('IDCJAC0010,066062,2012,02,03,21.0,1,Y') -> ('02', 21.0)

reduce(String month, Iterator<double> values)
    set count = 0
    set sum = 0
    foreach v in values:
        sum += v
        count++
    set mean = sum/count
    emit(month, mean)
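
The map and reduce steps for the average max temperature can be sketched in plain Python before writing the Java version. This is an illustration only: the function names and the `run_average` driver are hypothetical, and skipping rows whose temperature field is missing or non-numeric (for example a header row) is an assumption about the data, not something the exercise specifies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_month_temp(input_key: str, input_value: str) -> Iterator[Tuple[str, float]]:
    """Emit (month, max temperature) from one CSV row: field 3 is the month, field 5 the temperature."""
    fields = input_value.split(",")
    try:
        yield fields[3], float(fields[5])
    except (IndexError, ValueError):
        # Assumption: rows that are too short or have no numeric temperature are ignored.
        return


def reduce_mean(month: str, values: Iterable[float]) -> Tuple[str, float]:
    """Average all temperatures seen for one month."""
    count = 0
    total = 0.0
    for v in values:
        total += v
        count += 1
    return month, total / count


def run_average(lines: Iterable[str]) -> Dict[str, float]:
    """Hypothetical driver: map every row, group by month, reduce each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for line_no, line in enumerate(lines):
        for month, temp in map_month_temp(str(line_no), line):
            groups[month].append(temp)
    return dict(reduce_mean(m, temps) for m, temps in sorted(groups.items()))


if __name__ == "__main__":
    rows = [
        "IDCJAC0010,061087,1965,01,02,32.2,1,Y",
        "IDCJAC0010,066062,1890,04,27,20.2,1,Y",
        "IDCJAC0010,066062,2012,02,03,21.0,1,Y",
    ]
    print(run_average(rows))
```

The reducer is only ever called for months that received at least one value, so the division by `count` is safe in this driver.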

$ cd ../averagetemp
$ gedit *.java&
AverageTemp.java
AverageTempMapper.java
AverageReducer.java

$ cd ../wordcount
$ gedit *.java&

$ hadoop fs -put ../data/nsw_temp.csv data
$ javac -classpath `hadoop classpath` *.java
$ jar cvf avt.jar *.class
$ hadoop jar avt.jar AverageTemp data/nsw_temp.csv avt

$ hadoop fs -cat avt/part-r-00000

~/bigdata/Exercises/hadoop/averagetemp/sample_solution

https://github.com/tomaszbednarz/pig-abc-toilets

● We have a list of local ABC Radio stations in Australia
● We have a list of all public toilets across Australia
● We want to find the closest toilet to each radio station

Demonstration of:
● Data schemas
● Use of external libraries
● Google Maps API
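
The "closest toilet to a radio station" task is a nearest-neighbour search over coordinates. The sketch below is plain Python, not the Pig script from the repository, and it swaps in the haversine great-circle formula for distance instead of any external library or the Google Maps API. The `(name, latitude, longitude)` record layout is an assumed schema for illustration.

```python
import math
from typing import Iterable, Tuple

# Assumed schema: (name, latitude in degrees, longitude in degrees)
Place = Tuple[str, float, float]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest(station: Place, toilets: Iterable[Place]) -> Tuple[str, float]:
    """Return (toilet name, distance in km) of the toilet nearest to the station.

    Brute force: every toilet is compared against the station.
    Raises ValueError if the toilet list is empty.
    """
    _, slat, slon = station
    return min(
        ((name, haversine_km(slat, slon, lat, lon)) for name, lat, lon in toilets),
        key=lambda pair: pair[1],
    )
```

Comparing every station with every toilet is quadratic in the input size, which is the kind of join that a cluster tool such as Pig is meant to spread across machines.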

Hadoop, HDFS, MapReduce and Pig