Hadoop Framework
Agenda
• Map-Reduce introduction
• Hadoop introduction
• Hadoop Application Architecture
• Developing a typical Hadoop Application
• Practice on Hadoop
Map-Reduce concept
• A programming model introduced by Google.
• Typically used to process terabytes (1 TB = 1024 GB) to petabytes (1 PB = 1024 TB) of data.
• Breaks large or complex processing into smaller, independent pieces, modeling the data as key-value pairs.
• Runs on clusters of commodity machines.
• Scales by adding more workers, not bigger workers.
• Consists of two phases:
– Map: written by the user, takes an input pair and produces a set of intermediate key/value pairs.
– Reduce: aggregates and collates the intermediate results.
– (input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
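A minimal single-JVM sketch of the (input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output) chain, using word count and plain Java collections instead of Hadoop classes. Assumptions: k1 is the character offset of each line (similar to the byte offset Hadoop's TextInputFormat supplies for ASCII text), each inner list stands for one mapper's input split, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-JVM word count following
// (input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output).
class WordCountFlow {

    // map: <k1 = offset of the line, v1 = line> -> list of <k2 = word, v2 = 1>
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {            // split() can yield "" for leading punctuation
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // combine: local aggregation of one mapper's output; input and output are both <k2, v2>
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            partial.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return partial;
    }

    // reduce: <k2, list of v2> -> <k3, v3>; here the total count for one word
    static int reduce(String word, List<Integer> partialCounts) {
        int total = 0;
        for (int count : partialCounts) {
            total += count;
        }
        return total;
    }

    // Drives the flow; each inner list plays the role of one mapper's input split.
    static Map<String, Integer> run(List<List<String>> splits) {
        // Stand-in for shuffle + sort: group every combiner output by key, in key order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<String> split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
            long offset = 0;
            for (String line : split) {
                mapOutput.addAll(map(offset, line));
                offset += line.length() + 1; // +1 for the newline that ended the line
            }
            for (Map.Entry<String, Integer> pair : combine(mapOutput).entrySet()) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            output.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("the quick brown fox", "the lazy dog"),
                List.of("the dog barks"));
        System.out.println(run(splits));
    }
}
```

In this run the combiner turns the first split's two <the, 1> pairs into one <the, 2>, so the reducer for "the" receives [2, 1] and emits 3. The combiner must keep the <k2, v2> types unchanged, because its output is fed to the reducer exactly as map output would be.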
Map-Reduce flow sample
Map-Reduce overall flow
• The user program splits the input file into M pieces and starts copies of the program on the cluster.
• One of the copies is the master; the rest are slaves.
• The master selects idle slaves and assigns each of them a map task or a reduce task.
• A map slave parses its input piece into key-value pairs and passes each pair to the user's map function.
• The intermediate key-value pairs are buffered in memory and written to the slave's local disk; the locations of these files are sent to the master.
• The master notifies the reduce slaves of these locations.
• Each reduce slave reads the intermediate pairs and sorts them by key.
• The reduce slave passes each intermediate key and its set of values to the user's reduce function.
• The output of the reduce function is written out as the final output for the user.
• When all tasks are done, the master returns the result and control to the user program.
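The flow leaves open how the master decides which reduce slave receives which intermediate key. In Google's design the default is hash(key) mod R, and Hadoop's default HashPartitioner uses the same idea. The in-memory sketch below (hypothetical names, plain Java) partitions the output of several map tasks into R reduce inputs and sorts each one by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: routing intermediate pairs from M map tasks to R reduce tasks.
class ShuffleSketch {

    // hash(key) mod R. Masking off the sign bit keeps the result in [0, R)
    // even when hashCode() is negative (the same expression Hadoop's HashPartitioner uses).
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // mapOutputs.get(m) is the intermediate output of map task m.
    // Returns one input per reduce task: each key with all of its values, sorted by key.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> reduceInputs = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            reduceInputs.add(new TreeMap<>()); // TreeMap keeps keys sorted, like the reduce-side sort
        }
        for (List<Map.Entry<String, Integer>> oneMapTask : mapOutputs) {
            for (Map.Entry<String, Integer> pair : oneMapTask) {
                TreeMap<String, List<Integer>> target =
                        reduceInputs.get(partition(pair.getKey(), numReduceTasks));
                target.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return reduceInputs;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("the", 1), Map.entry("dog", 1)),   // map task 0
                List.of(Map.entry("dog", 1), Map.entry("fox", 1)));  // map task 1
        List<TreeMap<String, List<Integer>>> reduceInputs = shuffle(mapOutputs, 3);
        for (int r = 0; r < reduceInputs.size(); r++) {
            System.out.println("reduce task " + r + " <- " + reduceInputs.get(r));
        }
    }
}
```

Because the partition depends only on the key, both <dog, 1> pairs end up in the same reduce task even though two different map tasks produced them.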
Hadoop framework
• An open-source Apache project that implements the Map-Reduce model in Java.
• Distributed processing for large or computationally complex problems.
• Core tenets:
– Scale out, not up
– Move processing to the data
– Expect and embrace failure
• Typically used for batch processing of massive data sets.
• Consists of two main parts:
– A storage layer for the data being processed (HDFS).
– A parallel processing engine (MapReduce APIs).
• Main players: Amazon Elastic MapReduce, Cloudera, MapR, Hortonworks
Hadoop Overall Architecture
Hadoop Distributed File System
• Stores the data used in Map-Reduce processing.
• A typical file in HDFS is gigabytes to terabytes in size.
• Splits large files into smaller blocks; the default block size is 64 MB (128 MB from Hadoop 2.x onward).
• Structured like a conventional file system: files, directories, permissions.
• Supports Linux-like shell commands for interaction: ls, rm, put…
• Communicates over TCP/IP.
• Provides a Java API for access.
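Splitting a file into fixed-size blocks is plain ceiling division. The hypothetical helper below computes how many blocks a file occupies and how large the last one is, for an assumed 200 MB file.

```java
// Hypothetical helper: block arithmetic for a file stored in HDFS.
class HdfsBlocks {
    static final long MB = 1024L * 1024L;

    // Number of blocks = ceil(fileSize / blockSize); an empty file has no blocks.
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize <= 0) {
            return 0;
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Every block is full except possibly the last, which only holds the remainder.
    static long lastBlockSize(long fileSize, long blockSize) {
        if (fileSize <= 0) {
            return 0;
        }
        long remainder = fileSize % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long file = 200 * MB;
        System.out.println(blockCount(file, 64 * MB) + " blocks, last one "
                + lastBlockSize(file, 64 * MB) / MB + " MB");
    }
}
```

With 64 MB blocks, the 200 MB file takes four blocks and the last holds only the remaining 8 MB (HDFS does not pad it to a full block on disk). A 1 GB file takes 16 blocks at 64 MB, or 8 at 128 MB.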
Hadoop working model
• A client submits a job to Hadoop:
– The job specifies a Mapper, a Reducer, and a list of inputs.
– It is a collection of Java classes packaged into a JAR file.
• The job is sent to the JobTracker process on the master node (Hadoop 1.x, MRv1).
• Each slave node runs a process called the TaskTracker.
• The JobTracker instructs the TaskTrackers and monitors them.
• A Map or Reduce over one piece of data is a single task.
• A task attempt is an instance of a task running on a slave node.
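To make the job / task / task-attempt vocabulary concrete, here is a toy model in plain Java. Everything in it is hypothetical: the class names, the slot counts, and the simple first-fit loop are illustrations, not the JobTracker's real scheduling policy (which, for example, also prefers nodes that already hold a task's data). It covers map tasks only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical names, not Hadoop's API) of job -> tasks -> task attempts -> TaskTrackers.
class ToyJobTracker {

    // A task: one Map over one input split.
    static final class Task {
        final int index;
        Task(int index) { this.index = index; }
    }

    // A task attempt: one run of a task on one TaskTracker (numbered from 0 here).
    static final class Attempt {
        final Task task;
        final int number;
        final String taskTracker;
        Attempt(Task task, int number, String taskTracker) {
            this.task = task;
            this.number = number;
            this.taskTracker = taskTracker;
        }
    }

    private final Map<String, Integer> freeSlots = new LinkedHashMap<>(); // TaskTracker -> free slots
    private final Deque<Task> pending = new ArrayDeque<>();
    final List<Attempt> running = new ArrayList<>();

    ToyJobTracker(Map<String, Integer> slotsPerTaskTracker) {
        freeSlots.putAll(slotsPerTaskTracker);
    }

    // Submitting a job with N input splits creates N map tasks.
    void submit(int numSplits) {
        for (int i = 0; i < numSplits; i++) {
            pending.add(new Task(i));
        }
    }

    // Start a first attempt of each pending task on a TaskTracker with a free slot;
    // tasks that do not fit keep waiting for a later call.
    void schedule() {
        for (Map.Entry<String, Integer> tracker : freeSlots.entrySet()) {
            while (tracker.getValue() > 0 && !pending.isEmpty()) {
                running.add(new Attempt(pending.poll(), 0, tracker.getKey()));
                tracker.setValue(tracker.getValue() - 1);
            }
        }
    }

    int pendingTasks() {
        return pending.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        slots.put("slave1", 2);
        slots.put("slave2", 2);
        ToyJobTracker jobTracker = new ToyJobTracker(slots);
        jobTracker.submit(5);
        jobTracker.schedule();
        for (Attempt a : jobTracker.running) {
            System.out.println("map task " + a.task.index + ", attempt " + a.number + " on " + a.taskTracker);
        }
        System.out.println(jobTracker.pendingTasks() + " task(s) waiting for a free slot");
    }
}
```

With two slaves of two slots each, four of the five map tasks start immediately and the fifth waits until a slot frees up.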
Hadoop Programming model
• The Map-Reduce framework relies on the InputFormat of the job to:
– Validate the input specification of the job.
– Split up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper.
– Provide the RecordReader implementation used to glean input records from the logical InputSplit for processing by the Mapper.
• Each Mapper task processes its records and emits intermediate key-value pairs with context.write(k, v) on its Mapper.Context; these pairs are sent on to the reducers.
• The Reducer reduces a set of intermediate values that share a key to a smaller set of values, and has 3 primary phases:
– Shuffle: copies the sorted output from each Mapper across the network.
– Sort: sorts the inputs by key (since different Mappers may output the same key).
– Reduce: calls the reduce method defined by the user.
• Hadoop defines “box” (Writable) classes, such as Text for strings and IntWritable for integers, to optimize serialization over the network.
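The benefit of the “box” classes comes from the Writable contract: each type writes only its raw fields to a DataOutput and reads them back from a DataInput, instead of using Java's built-in serialization, which also writes class metadata. The sketch below imitates that contract with a hypothetical IntBox class so it runs without Hadoop on the classpath; Hadoop's real IntWritable lives in org.apache.hadoop.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical "box" class with the same two methods as Hadoop's Writable interface
// (write(DataOutput) and readFields(DataInput)); it does not depend on Hadoop.
class IntBox {
    private int value;

    IntBox() { }                       // empty instance, filled in later by readFields()

    IntBox(int value) { this.value = value; }

    int get() { return value; }

    // Serialize only the field itself: exactly 4 bytes, no class name or other metadata.
    void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    // Overwrite this instance's state from the stream, so one object can be reused for many records.
    void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    static byte[] toBytes(IntBox box) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            box.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    static IntBox fromBytes(byte[] data) {
        IntBox box = new IntBox();
        try {
            box.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return box;
    }

    public static void main(String[] args) {
        byte[] wire = toBytes(new IntBox(42));
        System.out.println(wire.length + " bytes -> " + fromBytes(wire).get());
    }
}
```

The no-argument constructor matters for this pattern: a framework can create an empty instance reflectively and then call readFields on it.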
Hadoop Application Architecture
Typical Hadoop Application Architecture
• Use Sqoop or Flume to import/export data between various external data sources and HDFS for processing:
– Sqoop executes the transfer as Hadoop map tasks.
– Can work with RDBMS or NoSQL stores.
– Sample: sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password pass --table employees
• Use Apache Hive, data warehouse software that facilitates querying and managing large datasets:
– Organizes data as tables, rows, columns, and partitions.
– Supports data types such as integer, float, double, string, array, and struct.
– Supports joins, grouping, filtering, and other built-in operators and functions.
• Use Spring Data (Spring for Apache Hadoop) to simplify developing Apache Hadoop applications:
– Create and configure applications that use MapReduce, Streaming, Hive, Pig, or HBase.
– Integration with Spring Boot, using dependency injection…
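The note that Sqoop runs the import as map tasks can be made concrete. Sqoop reads the minimum and maximum of a split column (the primary key by default, or the column given with --split-by), divides that range among the map tasks (4 by default, set with -m / --num-mappers), and each map task imports one slice with its own query. The sketch below shows only the range arithmetic; the class, the id column, and the exact boundary rule are assumptions, and Sqoop's own splitter differs in details.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Sqoop-style parallel import: split [min, max] of a numeric
// key column into contiguous ranges, one per map task.
class RangeSplits {

    // Returns up to numMappers half-open ranges {lo, hi} (lo <= key < hi) that together cover [min, max].
    static List<long[]> split(long min, long max, int numMappers) {
        List<long[]> ranges = new ArrayList<>();
        long total = max - min + 1;         // how many key values the range contains
        long base = total / numMappers;     // every range gets at least this many values...
        long extra = total % numMappers;    // ...and the first `extra` ranges get one more
        long lo = min;
        for (int i = 0; i < numMappers && lo <= max; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new long[] {lo, lo + size});
            lo += size;
        }
        return ranges;
    }

    // The condition one map task would add to its SELECT statement.
    static String whereClause(String column, long[] range) {
        return column + " >= " + range[0] + " AND " + column + " < " + range[1];
    }

    public static void main(String[] args) {
        // Assume SELECT MIN(id), MAX(id) FROM employees returned 1 and 100, and 4 map tasks are used.
        for (long[] range : split(1, 100, 4)) {
            System.out.println(whereClause("id", range));
        }
    }
}
```

Even splits only balance the work if keys are spread evenly over the range; a skewed key column leaves some map tasks with far more rows than others.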
Concrete Hadoop Application Architecture
Developing a typical Hadoop Application
• Choose appropriate frameworks for each application:
– Hive or Pig for log or relational data.
– Sqoop for working with databases; Flume for collecting log data from web servers, because it is event-driven.
– HDFS or HBase for storing the data to be processed.
– Apache Crunch APIs for joins and aggregations rather than the raw Hadoop APIs.
• Apply best practices:
– Choose the number of mappers and reducers wisely: total mapper or reducer slots = number of nodes × maximum number of tasks per node.
– Set the number of reducers to zero if you do not need a reduce phase.
– Make each mapper process an optimal amount of data (by default, one input split per HDFS block).
– Always use a Combiner for local aggregation when possible.
– Minimize your mapper output.
– Always write unit tests and run them on a small data set first.
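The slide's formula gives the number of task slots, that is, how many tasks the cluster can run at once. The number of map tasks is driven by the input (about one per input split), while the number of reducers is chosen by the developer; the Hadoop MapReduce tutorial suggests 0.95 or 1.75 times (nodes × maximum containers per node). The hypothetical helper below does that arithmetic for an assumed 10-node cluster with 2 task slots per node and 10 GB of input in 128 MB splits.

```java
// Hypothetical back-of-the-envelope sizing for a job's map and reduce tasks.
class TaskSizing {

    // Tasks the cluster can run at the same time: nodes x maximum tasks per node.
    static int taskSlots(int nodes, int maxTasksPerNode) {
        return nodes * maxTasksPerNode;
    }

    // Map tasks follow the input: about one per input split, i.e. ceil(inputSize / splitSize).
    static long mapTasks(long inputBytes, long splitBytes) {
        return (inputBytes + splitBytes - 1) / splitBytes;
    }

    // Rule of thumb from the Hadoop MapReduce tutorial: reducers = 0.95 or 1.75 x task slots.
    // Integer percentages (95, 175) avoid floating-point rounding surprises.
    static int reducers(int slots, int percent) {
        return slots * percent / 100;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        int slots = taskSlots(10, 2);
        System.out.println("slots=" + slots
                + " maps=" + mapTasks(10 * 1024 * mb, 128 * mb)
                + " reducers(0.95)=" + reducers(slots, 95)
                + " reducers(1.75)=" + reducers(slots, 175));
    }
}
```

With 0.95, all reducers can launch at once and start fetching map output as the maps finish; with 1.75, faster nodes finish a first round and run a second wave of reducers, which balances load better.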
Developing a typical Hadoop Application
• Tune Hadoop using configuration parameters:
– Hadoop provides many parameters for tuning.
• What to do when a task fails:
– Failures happen routinely.
– Retry the task (retries are safe because tasks are idempotent).
– Report the failure if the retries are exhausted.
• Slow tasks:
– Run another copy of the same task in parallel (speculative execution); the first copy to finish wins.
• Apply Java coding best practices.
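The failure-handling policy can be sketched as a loop that re-runs a failed attempt until it succeeds or an attempt limit is reached, then reports the failure. The names are hypothetical; the limit of 4 mirrors Hadoop's default maximum number of attempts per task (mapreduce.map.maxattempts and mapreduce.reduce.maxattempts in Hadoop 2.x). Speculative execution for slow tasks is not modeled here.

```java
import java.util.function.IntPredicate;

// Hypothetical retry loop for one task. Re-running is safe only because the task is idempotent:
// the same input split always produces the same output, and a failed attempt's partial output is discarded.
class TaskRetry {

    static final class Outcome {
        final boolean succeeded;
        final int attemptsUsed;
        Outcome(boolean succeeded, int attemptsUsed) {
            this.succeeded = succeeded;
            this.attemptsUsed = attemptsUsed;
        }
    }

    // runAttempt.test(n) executes attempt n (numbered from 0) and returns true if it succeeded.
    static Outcome runWithRetries(IntPredicate runAttempt, int maxAttempts) {
        for (int n = 0; n < maxAttempts; n++) {
            if (runAttempt.test(n)) {
                return new Outcome(true, n + 1);
            }
        }
        return new Outcome(false, maxAttempts); // retries exhausted: report the failure
    }

    public static void main(String[] args) {
        // A flaky task that fails twice (e.g. its node goes down) and then succeeds.
        Outcome flaky = runWithRetries(n -> n >= 2, 4);
        System.out.println("flaky: succeeded=" + flaky.succeeded + " after " + flaky.attemptsUsed + " attempts");
        // A task that always fails (e.g. a bug in the map function) is reported after 4 attempts.
        Outcome broken = runWithRetries(n -> false, 4);
        System.out.println("broken: succeeded=" + broken.succeeded + " after " + broken.attemptsUsed + " attempts");
    }
}
```

Retrying fixes transient problems such as a lost node, but not deterministic bugs: the always-failing task burns all four attempts and still has to be reported.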
Setup environment and practice
• Hadoop supports standalone, pseudo-distributed, and fully distributed modes.
• Implement a word count program.
• Debug a Hadoop program:
– Using log files
– Using remote debugging
A sample demo
THANK YOU
