WHAT STARTS HERE CHANGES THE WORLD




           Hadoop and MapReduce
Hemanth Kumar Mantri
  Graduate Student
     UT-Austin



   November 9th 2011

                 Agenda
•   What is Hadoop?
•   Where is MapReduce used?
•   HDFS and MapReduce
•   Amazon Web Services
•   MapReduce Demo on Hadoop

            What is Hadoop?
• Inspired by Google's papers on the Google File System (GFS)
  and MapReduce.
• Supports data-intensive distributed
  applications.
• Scales to thousands of nodes and petabytes of data.
• Open-source Apache project.
• Implemented in Java.
• Yahoo! is the largest contributor.

Typical Hadoop Cluster!

Who Uses Hadoop?

                    Who Uses Hadoop?
•   At Google (Google's own MapReduce implementation, not Hadoop):
     – Index construction for Google Search
     – Popular Passages in Google Books
     – Article clustering for Google News

•   At Yahoo!:
     – “Web map” powering Yahoo! Search
     – Spam detection for Yahoo! Mail
     – More than 100,000 CPUs in more than 36,000 computers

•   At Facebook:
     – Reporting/analytics and machine learning
          • Data mining, spam detection
     – Storage engine for logs
     – 1,100-machine cluster with 8,800 cores and about 12 PB of raw storage
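
Raw storage is not the same as the amount of unique data a cluster can hold. Assuming every block kept the HDFS default replication factor of 3 (the slide gives only the raw figure, so this is an assumption), a rough estimate of Facebook's unique-data capacity is:

```latex
\text{usable capacity} \approx \frac{\text{raw storage}}{\text{replication factor}}
  = \frac{12\ \text{PB}}{3} = 4\ \text{PB}
```

This ignores disk space taken by intermediate and temporary job data, so the practical figure would likely be somewhat lower.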

Facebook Lexicon

                           Yelp
• Stores daily logs and photos in Amazon S3
   – generates around 100 GB of logs per day.
• Uses Amazon Elastic MapReduce for:
   –   People Who Viewed This Also Viewed
   –   Review highlights
   –   Autocomplete as you type in search
   –   Search spelling suggestions
   –   Top searches
   –   Ads
• Runs approximately 200 Elastic MapReduce jobs per day,
  processing 3 TB of data daily.

          Hadoop Components
• Distributed file system (HDFS)
  – Single namespace for the entire cluster
  – Closely modeled on GFS
  – Replicates each data block 3x by default for fault tolerance

• MapReduce framework
  – Executes user jobs specified as “map” and
    “reduce” functions
  – Manages work distribution and fault tolerance
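
A word count is the usual first illustration of jobs written as "map" and "reduce" functions. The sketch below is plain Java with no Hadoop dependency, and the class and method names (WordCountSketch, map, reduce, run) are hypothetical. It runs map, shuffle (group by key), and reduce inside one process purely to show the data flow; on a real cluster the framework would run many map and reduce tasks on different nodes and handle failures and re-execution for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process imitation of the MapReduce word-count flow. */
public class WordCountSketch {

    // Map: emit (word, 1) for every whitespace-separated token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Reduce: sum every count emitted for the same word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group intermediate values by key. A TreeMap keeps keys sorted,
        // loosely mirroring the sort that happens before the reduce phase.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "the end");
        // Prints {brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
        System.out.println(run(input));
    }
}
```

Because map only looks at one line and reduce only at one key's values, each can run independently on any node, which is what lets the framework distribute the work.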

Hadoop Architecture

The Big Picture

                         Using HDFS
• hadoop dfs, followed by one of:
   –   [-ls <path>]
   –   [-du <path>]
   –   [-cp <src> <dst>]
   –   [-rm <path>]
   –   [-put <localsrc> <dst>]
   –   [-copyFromLocal <localsrc> <dst>]
   –   [-moveFromLocal <localsrc> <dst>]
   –   [-get [-crc] <src> <localdst>]
   –   [-cat <src>]
   –   [-copyToLocal [-crc] <src> <localdst>]
   –   [-moveToLocal [-crc] <src> <localdst>]
   –   [-mkdir <path>]
   –   [-touchz <path>]
   –   [-test -[ezd] <path>]
   –   [-stat [format] <path>]
   –   [-help [cmd]]

AWS and Cloud

           Amazon Web Services
• A collection of pay-as-you-go services
   – S3 (Simple Storage Service)
       Storage in the cloud ($0.14/GB/month, 2011 pricing)
       Key-value store (a big hash map!)
   – EC2 (Elastic Compute Cloud)
       Compute in the cloud ($0.085 to $2.60 per instance-hour, 2011 pricing)
   – Elastic MapReduce
       Runs Hadoop jobs on EC2 using data stored in S3
   – Simple Email Service (SES)
   – … and many more
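
Pay-as-you-go pricing makes back-of-the-envelope cost estimates easy. The sketch below uses only the flat 2011 prices quoted on this slide; real AWS pricing is tiered, adds request and data-transfer charges, and has changed since. The class name and the workload (a month of Yelp-sized logs at about 100 GB/day, plus a hypothetical 10-node cluster running for 2 hours on the cheapest instance) are illustrative assumptions.

```java
import java.util.Locale;

/** Hypothetical back-of-the-envelope AWS cost estimate using the slide's 2011 prices. */
public class AwsCostSketch {

    // Flat prices from the slide (assumption: no tiers, no request or transfer charges).
    static final double S3_USD_PER_GB_MONTH = 0.140;
    static final double EC2_LOW_USD_PER_HOUR = 0.085; // cheapest instance price on the slide

    // Monthly S3 storage bill for a given number of stored gigabytes.
    static double s3MonthlyCost(double storedGb) {
        return storedGb * S3_USD_PER_GB_MONTH;
    }

    // EC2 bill for a cluster of identical instances running for whole hours.
    static double ec2ClusterCost(int nodes, int hours, double usdPerHour) {
        return nodes * hours * usdPerHour;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 30 days of logs at about 100 GB/day.
        double storedGb = 100.0 * 30;
        // Hypothetical demo cluster: 10 of the cheapest nodes for 2 hours.
        double ec2 = ec2ClusterCost(10, 2, EC2_LOW_USD_PER_HOUR);
        System.out.println(String.format(Locale.ROOT, "S3 storage:  $%.2f per month", s3MonthlyCost(storedGb)));
        System.out.println(String.format(Locale.ROOT, "EC2 cluster: $%.2f", ec2));
    }
}
```

Run as-is, it prints $420.00 per month for storage and $1.70 for the cluster. The storage figure treats all 3,000 GB as stored for the whole month, so it overstates the first month's bill.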

       MapReduce on an EC2 Cluster
• Create an AWS account and obtain the keys for authentication
• Go to src/contrib/ec2 in the Hadoop directory
• Launch a cluster on EC2 (the steps below use test-cluster as <cluster-name>)
   – % bin/hadoop-ec2 launch-cluster <cluster-name> <#nodes>
• Log in to the cluster
   – % bin/hadoop-ec2 login test-cluster
• Start a computation (the bundled pi estimator: 10 maps, 10,000,000 samples each)
   – # cd /usr/local/hadoop-*
   – # bin/hadoop jar hadoop-*-examples.jar pi 10 10000000
• Terminate the cluster after use; running instances keep being billed!
   – % bin/hadoop-ec2 terminate-cluster test-cluster
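
The pi job above splits sampling across map tasks and combines their counts in a reduce. The Java sketch below mimics that structure in a single process without the Hadoop API; the class and method names are hypothetical. It swaps in plain pseudo-random sampling via java.util.Random, whereas Hadoop's bundled estimator uses a quasi-Monte Carlo (Halton sequence) method, so the estimates will not match the real job's output.

```java
import java.util.Random;

/** Hypothetical single-process sketch of a MapReduce-style Monte Carlo pi estimate. */
public class PiSketch {

    // "Map" task: draw `samples` random points in the unit square and count
    // how many fall inside the quarter circle x^2 + y^2 <= 1.
    static long mapTask(long samples, long seed) {
        Random rnd = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return inside;
    }

    // "Reduce" step: sum the per-map counts and scale the hit ratio by 4,
    // since the quarter circle covers pi/4 of the unit square.
    static double estimatePi(int nMaps, long samplesPerMap) {
        long totalInside = 0;
        for (int m = 0; m < nMaps; m++) {
            totalInside += mapTask(samplesPerMap, m); // each "map" gets its own seed
        }
        long total = (long) nMaps * samplesPerMap;
        return 4.0 * totalInside / total;
    }

    public static void main(String[] args) {
        // Defaults mirror the slide's arguments: pi 10 10000000.
        int nMaps = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        long samples = args.length > 1 ? Long.parseLong(args[1]) : 10_000_000L;
        System.out.println("Estimated pi = " + estimatePi(nMaps, samples));
    }
}
```

Adding maps or samples per map tightens the estimate; on a cluster, more maps also spread the sampling over more nodes.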

                References
• Hadoop Project Page:
  – http://hadoop.apache.org/
• Amazon Web Services:
  – http://aws.amazon.com/

Thank You!
