GANDHI INSTITUTE FOR TECHNOLOGICAL
ADVANCEMENT, BHUBANESWAR
TECHNICAL SEMINAR ON
HADOOP
GUIDED BY-
PROF. KUNDAN CHANDRA PATRA
PROF. SWOGAT KUMAR JENA
PROF. SAROJ KUMAR MOHANTY
PRESENTED BY-
NAME- ABHIJEET RAJ
BRANCH- CSE(1)
REG NO.- 1301287529
CONTENTS -
1. INTRODUCTION TO HADOOP
2. HADOOP-HISTORY AND ORIGIN
3. BIG DATA ANALYTICS AND CHALLENGES
4. HADOOP ECOSYSTEM
5. HDFS ARCHITECTURE
6. HADOOP VS RDBMS
7. MAP REDUCE
8. PIG AND HIVE
9. CONCLUSION
1Abhijeet raj,131001
INTRODUCTION-
• What is Hadoop-
• Apache Hadoop is an open-source software
framework for distributed storage and
processing of large data sets
• Written in Java
• Its storage layer (HDFS) is modeled on the
Google File System (GFS)
Continued...
• It is designed to scale up from single servers to
thousands of machines, each offering local
computation and storage.
• The Hadoop framework consists of two main layers:
• HDFS (storage)
• MapReduce (processing)
History and Origin
• Doug Cutting was working on an open-source
search engine, Nutch, in 2003
• Google published papers describing its
distributed systems, the Google File System
(GFS, 2003) and MapReduce (2004), which
powered the Google search engine
Continued...
• Doug Cutting took these ideas and started to
work on an open-source implementation
• In 2006 he joined Yahoo!, and the distributed
system was named Hadoop
• Hadoop was developed in the open as an Apache
Software Foundation project, with Yahoo! as a
major contributor
Organizations using Hadoop
• Amazon
• Adobe
• Cloudspace
• eBay
• Facebook
• Google
• IBM
• LinkedIn
• Yahoo!
Big data analytics and
challenges
• There is no fixed size threshold for Big Data;
data sets are often described as Big Data from
roughly 1 terabyte upward.
• The 4 V's of Big Data:
1. VOLUME- The scale of data
2. VARIETY- Different forms of data
3. VELOCITY- Analysis of streaming data
4. VERACITY- Uncertainty of data
Challenges for Big Data
processing
• Meeting the need for speed
• Scale
• Continuous Availability
• Displaying meaningful results
• Workload diversity
• Data security
• Cost
• Manageability
Hadoop vs traditional RDBMS
Factors                  Hadoop                                  RDBMS
Size of data             Petabytes                               Gigabytes
Integrity of data        Low                                     High
Data schema              Dynamic                                 Static
Access method            Batch                                   Interactive and batch
Scaling                  Linear                                  Non-linear
Data structure           Unstructured/structured                 Structured
Normalization of data    Not required                            Required
Query response time      Has latency (due to batch processing)   Can be near immediate
Hadoop Ecosystem
HDFS (Hadoop Distributed File System)
• A distributed file system designed to run on
commodity hardware
• It is suitable for distributed storage and
processing.
• The built-in servers of the namenode and
datanodes help users easily check the
status of the cluster.
• HDFS provides file permissions and
authentication.
Continued...
Namenode
• The namenode is the node that stores the filesystem
metadata, i.e. which file maps to which blocks
and which blocks are stored on which
datanodes.
Datanode
• The datanodes are where the actual data resides.
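The split between namenode metadata and datanode storage can be illustrated with a small sketch. This is a toy model only, not the HDFS implementation: the file paths, block IDs, datanode names and the locate_blocks helper are all made up for illustration.

```python
# Toy model of the metadata a namenode keeps (all names here are illustrative).

# File namespace: file path -> ordered list of block IDs.
file_to_blocks = {
    "/logs/2016/app.log": ["blk_1", "blk_2"],
    "/data/users.csv": ["blk_3"],
}

# Block map: block ID -> datanodes holding a replica of that block.
block_to_datanodes = {
    "blk_1": ["datanode-a", "datanode-b", "datanode-c"],
    "blk_2": ["datanode-b", "datanode-c", "datanode-d"],
    "blk_3": ["datanode-a", "datanode-c", "datanode-d"],
}


def locate_blocks(path):
    """Return [(block_id, [datanodes...]), ...] for a file, in block order."""
    return [(blk, block_to_datanodes[blk]) for blk in file_to_blocks[path]]


if __name__ == "__main__":
    for block_id, nodes in locate_blocks("/logs/2016/app.log"):
        print(block_id, "->", ", ".join(nodes))
```

In this model a client asks the namenode only where the blocks are; the block contents themselves would then be read from the listed datanodes.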
Continued...
Job tracker
• The primary functions of the job tracker are resource
management, tracking resource availability, and
task life-cycle management
Task tracker
• Follows the orders of the job tracker and
periodically updates the job tracker with its
progress status.
Goals of HDFS
• Fault detection and recovery
• Support for huge datasets
• Reduced network traffic (computation runs near the data)
• Increased throughput
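Fault recovery can be sketched as follows, assuming a replication factor of 3 (the HDFS default). This is a simplified illustration, not the actual HDFS algorithm: the re_replicate function and the node names are hypothetical, and picking targets alphabetically stands in for the real placement policy, which also takes rack location into account.

```python
REPLICATION_FACTOR = 3  # HDFS default; assumed here, configurable in practice


def re_replicate(block_to_datanodes, live_datanodes, failed):
    """Drop replicas on the failed datanode and copy blocks to other live nodes.

    Hypothetical helper: target nodes are chosen alphabetically for simplicity.
    """
    repaired = {}
    for block, holders in block_to_datanodes.items():
        survivors = [n for n in holders if n != failed]
        # Candidate targets: live nodes that do not already hold this block.
        candidates = sorted(
            n for n in live_datanodes if n not in survivors and n != failed
        )
        missing = max(0, REPLICATION_FACTOR - len(survivors))
        repaired[block] = survivors + candidates[:missing]
    return repaired


if __name__ == "__main__":
    blocks = {
        "blk_1": ["dn-a", "dn-b", "dn-c"],
        "blk_2": ["dn-b", "dn-c", "dn-d"],
    }
    live = {"dn-a", "dn-c", "dn-d", "dn-e"}
    for block, nodes in re_replicate(blocks, live, failed="dn-b").items():
        print(block, "->", nodes)
```

Each block that lost a replica on the failed node is copied to one more live node, so every block is back at three replicas.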
Map Reduce
• MapReduce is a processing technique and a
programming model for distributed computing;
Hadoop's implementation is written in Java
• Map- input data are broken into tuples
(key/value pairs)
• Reduce- combines those tuples into a smaller
set of tuples
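The map and reduce steps can be sketched with a word count, the usual introductory example. This is a single-process sketch in plain Python, not Hadoop's Java API: the function names map_phase, shuffle, reduce_phase and word_count are illustrative, and on a real cluster the framework distributes these phases across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: break one input line into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all tuples for one key into a single (key, total) tuple."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, both phases can run in parallel on different machines.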
Advantages of Map Reduce
• Easy to scale data processing over multiple
computing nodes.
• Parallel processing.
• Fast on large data sets.
• Simple programming model
HBASE
• Developed by the Apache Software Foundation
• Database for Hadoop.
• Open source
• Non-relational
Continued...
• Distributed
• Written in Java
• Accessed through its Java client API; JDBC
connectivity is available through Apache Phoenix
YARN
• Yet Another Resource Negotiator
• In YARN, the responsibilities of the job tracker
are split between a global ResourceManager and a
per-application ApplicationMaster; a NodeManager
on each machine replaces the task tracker
YARN ARCHITECTURE
PIG
• A platform for analyzing large data sets that
consists of a high-level language (Pig Latin) for
expressing data analysis programs
• The structure of Pig programs is amenable to
substantial parallelization
Continued...
• Ease of programming
• Optimization opportunities
• Extensibility
HIVE
• Data warehouse software that facilitates querying
and managing large datasets
• Allows traditional map/reduce programmers
to plug in their custom mappers and
reducers
PIG VS HIVE
                      PIG                        HIVE
TYPE OF FLOW          PROCEDURAL LANGUAGE        DECLARATIVE LANGUAGE
EASE OF USE           COMPLEX                    EASY
NATURE OF USAGE       EFFICIENCY IN COMPUTING    ANALYTICS AREA
TYPE OF DATA          VARIABLES                  TABLES
DEBUGGING FACILITY    DEBUGGED LOCALLY           COMPLEX
MAINTENANCE           MORE                       LESS
DEVELOPMENT TIME      MORE                       LESS
HANDLING BIG DATA     HANDLES MORE DATA          MEMORY OVERFLOW
Conclusion
• Hadoop has been a very effective solution for
companies dealing with big data, i.e. data at
petabyte scale.
• It has overcome the limitations of traditional
data storage.
• Being open source, it has been widely accepted
