Hadoop seminar
Krishnendu P
CONTENTS:
● Data and Big Data
● Problems with Big Data
● Hadoop
● Brief History of Hadoop
● What problems can Hadoop solve?
● Components of Hadoop - HDFS, MapReduce
● Hadoop Cluster
● High-Level Architecture of Hadoop
● Hadoop Core Components
● Features of Hadoop
● Limitations of Hadoop
● Users of Hadoop
● Conclusion
● References
Data:
➔ Any real-world symbol (character, number, or special character) or group of symbols is called data.
➔ It may be visual, audio, textual, etc.
Big Data
Big data refers to collections of datasets so large that they cannot be processed using on-hand database management tools or traditional computing techniques.
Big data combines huge volume, high velocity, and a wide variety of data. Its data falls into three types:
Structured data: relational data.
Semi-structured data: XML data.
Unstructured data: Word documents, PDFs, plain text.
Problems with Big Data:
➔ Every day, about 0.5 petabytes of updates are made to Facebook, including 40 million photos.
➔ Every day, enough video is uploaded to YouTube to watch continuously for a year.
➔ Large data sets impose limitations in many areas, including genomics, complex physics simulations, and biological and environmental research.
Cont...
➔ They also affect Internet search, finance, and business informatics.
➔ The challenges include capture, retrieval, storage, search, sharing, analysis, and visualization.
What could be the solution for Big Data?
➔ Hadoop
What is Hadoop?
➔ Hadoop is an open-source, Java-based programming framework developed by Doug Cutting and Mike Cafarella in 2005.
➔ It is part of the Apache project sponsored by the Apache Software Foundation.
➔ It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Cont...
➔ It is used for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
Brief History
➔ Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts.
➔ Any of these parts (also called fragments or blocks) can be run on any node in the cluster.
➔ Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.
➔ It started with building a web search engine:
- Nutch, in 2002.
- The aim was to index billions of pages.
- The architecture could not support billions of pages.
➔ Google's GFS (2003) solved the storage problem.
- The Nutch Distributed File System followed in 2004.
➔ Google's MapReduce (2004).
- MapReduce was implemented in 2005.
[Photos: Doug Cutting with Hadoop; Mike Cafarella]
2005: Doug Cutting and Mike Cafarella developed Hadoop to support distribution for the Nutch search engine project. The project was funded by Yahoo.
2006: Yahoo gave the project to the Apache Software Foundation.
Apache Hadoop is now a registered trademark of the Apache Software Foundation.
What problems can Hadoop solve?
The Hadoop platform was designed to solve problems where you have a lot of data (perhaps a mixture of complex and structured data) that doesn't fit well into tables.
Components of Hadoop
Hadoop consists of MapReduce, the Hadoop Distributed File System (HDFS), and a number of related projects such as Apache Hive, HBase, and ZooKeeper.
[Figure: Hadoop and its two core components, HDFS and MapReduce]
HDFS (Hadoop Distributed File System)
➔ The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
➔ It is a sub-project of the Apache Hadoop project.
➔ HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.
➔ HDFS provides high-throughput access to application data and is suitable for applications that have large data sets.
Cont...
➔ HDFS takes care of storing and managing the data within the Hadoop cluster.
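To make the storage model concrete, here is a minimal Python sketch of two ideas HDFS relies on: splitting a file into fixed-size blocks and keeping several copies of each block on different storage machines (DataNodes). It assumes a 128 MB block size (the default `dfs.blocksize` in Hadoop 2 and later) and a replication factor of 3 (the default `dfs.replication`). The function names and the round-robin placement are hypothetical simplifications, not the rack-aware placement policy HDFS actually uses.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB (dfs.blocksize default in Hadoop 2+)
REPLICATION = 3                 # assumed replication factor (dfs.replication default)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each block; only the last block may be shorter."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Put each block on `replication` distinct DataNodes (hypothetical round-robin policy)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB)
    print([length // MB for _, length in blocks])  # [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

In this layout, losing any single DataNode still leaves two copies of every block, which is the property that lets the file system keep working on unreliable commodity hardware.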
MapReduce
➔ MapReduce is a programming model used for processing large data sets.
➔ Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines.
➔ The name also refers to the associated implementation for processing and generating large data sets.
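The automatic parallelism mentioned above can be imitated on a single machine. In this hypothetical Python sketch, each input split is processed independently by a worker thread (standing in for a cluster node), and the partial results are merged at the end. Real Hadoop schedules such tasks across machines rather than threads; the function names here are illustrative only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_split(split: List[str]) -> Counter:
    """Work for one input split: count its words, independently of every other split."""
    return Counter(word for line in split for word in line.split())


def parallel_word_count(splits: List[List[str]], workers: int = 4) -> Counter:
    """Process all splits concurrently, then merge the partial counts into one result."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_split, splits):
            total.update(partial)
    return total


if __name__ == "__main__":
    splits = [["big data", "big cluster"], ["data node"]]
    print(parallel_word_count(splits))
```

Because no split depends on another, adding workers (or, in Hadoop, nodes) lets more splits be processed at the same time.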
A MapReduce program executes in two stages: the map stage and the reduce stage.
Map stage:
The mapper's job is to process the input data. Generally, the input data is a file or directory stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of intermediate data in the form of key-value pairs.
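Word count is the usual illustration. The Python sketch below plays the mapper's role under simplifying assumptions: it receives the input one line at a time and emits a (word, 1) key-value pair for every word. The name `mapper` is hypothetical; this is not Hadoop's Java `Mapper` API.

```python
from typing import Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map stage of a word count: emit (word, 1) for every word of every input line."""
    for line in lines:
        # Lower-case so that "Big" and "big" become the same key.
        for word in line.lower().split():
            yield (word, 1)


if __name__ == "__main__":
    for pair in mapper(["Big data", "big cluster"]):
        print(pair)
```

The mapper does no counting itself; it only emits small, independent pairs, which is what makes the stage easy to spread across many nodes.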
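Reduce stage:
The reducer's job is to process the data that comes from the mapper. Before the reduce stage, the framework sorts and groups the mapper output by key, so each reducer call receives one key together with all of its values. After processing, the reducer produces a new set of output, which is stored in HDFS.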
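Putting both stages together, this self-contained Python sketch runs a word count end to end in memory: map, a shuffle step that sorts and groups values by key, and reduce. It is a single-process simulation with our own hypothetical function names; instead of writing its output to HDFS, it simply returns a dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map stage: emit (word, 1) for each word."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group all values by key and sort by key, mimicking the shuffle-and-sort step."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce stage: combine all values for one key into a single total."""
    return (key, sum(values))


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce and collect the reducer output."""
    return dict(reduce_counts(k, vs) for k, vs in shuffle(map_words(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "big cluster", "data node"]))
    # prints {'big': 2, 'cluster': 1, 'data': 2, 'node': 1}
```

Keeping the reducer a pure function of one key and its values is what allows Hadoop to run many reducers in parallel, each handling a different subset of keys.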
Hadoop Core Components
● Master node: NameNode (storage node) and JobTracker (compute node)
● Slave node: DataNode (storage node) and TaskTracker (compute node)
Cont...
Node:
The technical term for a machine or computer that is part of a cluster.
Daemon:
The technical term for a background process running on a Linux machine.
Cont...
➔ The master node is responsible for running the NameNode and JobTracker daemons.
➔ The slave nodes are responsible for running the DataNode and TaskTracker daemons.
Cont...
➔ The NameNode and DataNodes are responsible for storing and managing the data, so they are commonly referred to as storage nodes.
➔ The JobTracker and TaskTrackers are responsible for processing and computing the data, so they are commonly referred to as compute nodes.
Cont...
➔ Usually, the NameNode and JobTracker are configured on a single machine.
➔ The DataNode and TaskTracker are configured on multiple machines, so instances of them run on many machines at the same time.
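How the master side might use storage metadata when handing out compute work can be sketched as follows. This is a hypothetical Python simplification of Hadoop 1.x-style scheduling, assuming a DataNode and a TaskTracker run on every slave machine under the same host name: a NameNode-style map records which hosts hold each block, and the JobTracker-style scheduler prefers to run a block's map task on a TaskTracker that already stores that block (data locality), falling back to any free slot. Host names and slot counts are illustrative only.

```python
from typing import Dict, List


def assign_map_tasks(
    block_locations: Dict[str, List[str]],
    tasktrackers: List[str],
    slots_per_tracker: int,
) -> Dict[str, str]:
    """Assign one map task per block, preferring a TaskTracker on a host that stores the block."""
    load = {tracker: 0 for tracker in tasktrackers}
    assignment: Dict[str, str] = {}
    for block, hosts in sorted(block_locations.items()):
        free = [t for t in tasktrackers if load[t] < slots_per_tracker]
        if not free:
            raise RuntimeError("no free task slots left in the cluster")
        # Data-local candidates: free TaskTrackers on hosts holding a replica of this block.
        local = [t for t in free if t in hosts]
        # Pick the least-loaded candidate, using data-local ones whenever any exist.
        chosen = min(local or free, key=lambda t: load[t])
        load[chosen] += 1
        assignment[block] = chosen
    return assignment


if __name__ == "__main__":
    # NameNode-style metadata: block id -> hosts holding a replica (made-up values).
    blocks = {
        "blk_1": ["node1", "node2"],
        "blk_2": ["node2", "node3"],
        "blk_3": ["node1", "node3"],
    }
    print(assign_map_tasks(blocks, ["node1", "node2", "node3"], slots_per_tracker=1))
    # prints {'blk_1': 'node1', 'blk_2': 'node2', 'blk_3': 'node3'}
```

Moving the computation to the machine that already holds the data avoids shipping large blocks over the network, which is the main reason storage and compute daemons share the slave machines.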
Hadoop Cluster
➔ In general, any set of loosely or tightly connected computers that work together as a single system is called a cluster.
➔ Simply put, a computer cluster used for Hadoop is called a Hadoop cluster.
A Hadoop cluster is a special type of computational cluster designed for storing and analyzing vast amounts of unstructured data in a distributed computing environment. These clusters run on low-cost commodity computers.
➔ Hadoop clusters are often referred to as "shared nothing" systems because the only thing shared between nodes is the network that connects them.
➔ Clustering improves the system's availability to users.
A Real-World Example:
[Figure: Yahoo's Hadoop cluster]
Yahoo has more than 10,000 machines running Hadoop and nearly 1 petabyte of user data.
Features of Hadoop
● Scalability:
Scalability refers to the ability to add or remove nodes without bringing down or affecting cluster operation.
● Cost effective:
Hadoop does not require expensive, specialized hardware. It can run on ordinary hardware, technically called commodity hardware.
● Large Cluster of Nodes:
A Hadoop cluster can be made up of hundreds or thousands of nodes. One of the main advantages of a large cluster is that it offers more computing power and a huge storage system to clients.
● Parallel Processing of Data:
The data can be processed simultaneously across all the nodes in the cluster, saving a lot of time.
● Automatic Failover Management:
If any node in the cluster fails, the Hadoop framework moves that machine's work to another machine.
● Flexible:
Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources.
● Fault-tolerant:
When a node is lost, the system redirects work to another location of the data and continues processing without missing a beat.
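The fault tolerance described in these features rests on replication: because each block has copies on several nodes, a reader (or a rescheduled task) can use another copy when a node is lost. A minimal Python sketch of that fallback follows; `fetch` is a hypothetical callback standing in for the actual network read, and the node names are made up.

```python
from typing import Callable, Dict, List, Set


def read_block(
    block_id: str,
    block_locations: Dict[str, List[str]],
    live_nodes: Set[str],
    fetch: Callable[[str, str], bytes],
) -> bytes:
    """Return the block from the first live node that holds a replica."""
    for node in block_locations.get(block_id, []):
        if node not in live_nodes:
            continue  # skip nodes already known to be down
        try:
            return fetch(node, block_id)
        except OSError:
            continue  # the node failed mid-read; try the next replica
    raise OSError(f"no live replica of {block_id}")


if __name__ == "__main__":
    locations = {"blk_1": ["dn1", "dn2", "dn3"]}
    fake_fetch = lambda node, blk: f"{blk} from {node}".encode()
    # dn1 has failed, so the read falls through to dn2.
    print(read_block("blk_1", locations, {"dn2", "dn3"}, fake_fetch))
```

The read only fails when every replica is unavailable, so with the usual three copies a single machine failure goes unnoticed by the job.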
Limitations of Hadoop
● Security concerns
● Vulnerable by nature
● Not fit for small data
● Potential stability issues
What is Hadoop used for?
● Search
– Yahoo, Amazon, Zvents
● Log processing
– Facebook, Yahoo, ContextWeb, Joost, Last.fm
● Recommendation systems
– Facebook
● Data warehouse
– Facebook, AOL (America Online)
● Video and image analysis
– New York Times, Eyealike
Conclusion
➔ Hadoop has been very effective for companies dealing with data in the petabytes.
➔ It has solved many industry problems related to large-scale data management and distributed systems.
➔ Because it is open source, it has been widely adopted by companies.
References
● www.dezyre.com/Big-Data-and-Hadoop
● www.cloudera.com/content/www/...hadoop/hdfs-mapreduce-yarn.html
● www.ufaber.com/hadoop/bigbata/free
● www.psgtech.edu/yrgcc/attach/haoop_architecture.ppt