Big Data and Hadoop Essentials
Agenda
• Brief History in time
• How Big is Big Data?
• Why Hadoop?
• Hadoop Architecture
• MapReduce Algorithm Exemplified
• Hadoop Ecosystem
• Demo
Brief History in time
In pioneer days they used oxen for heavy pulling, and when one ox couldn’t
budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for
bigger computers, but more systems of computers.
—Grace Hopper, American Computer Scientist
How Big is Big Data?
Why Hadoop?
The Problem
Challenges in managing Big Data
• Volume – Big Data arrives at large scale, in terabytes and even petabytes: records, transactions, tables, files.
• Veracity – The quality, consistency, reliability and provenance of data: good, bad, undefined, inconsistent, incomplete.
• Variety – Big Data extends beyond structured data to semi-structured and unstructured data of all kinds: text, logs, XML, audio, video, streams, flat files.
• Velocity – Data flows continuously and is time-sensitive: batch, real time, streams, historic.
Hadoop evolved to overcome Big Data challenges
• Cost effective – commodity hardware
• Big cluster (1000 nodes) – provides storage and processing
• Parallel processing – MapReduce
• Big storage – usable capacity ≈ storage per node × number of nodes / replication factor (RF)
• Failover mechanism – automatic failover
• Data distribution
• Moving code to data
• Heterogeneous hardware (IBM, HP, AIX, Oracle; machines of any memory and CPU configuration)
• Scalable
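The "Big Storage" bullet above can be read as a rough capacity formula. The sketch below is only an illustration: the function name and the sample figures are hypothetical, and the default replication factor of 3 is an assumption (it matches the usual HDFS default, but a cluster may be configured differently). It also ignores space reserved for the operating system and intermediate job output.

```python
def usable_capacity_tb(storage_per_node_tb: float, nodes: int,
                       replication_factor: int = 3) -> float:
    """Rough usable capacity: raw storage divided by the replication factor.

    All parameter names are illustrative; replication_factor=3 is an assumption.
    """
    if nodes < 0 or storage_per_node_tb < 0:
        raise ValueError("storage and node count must be non-negative")
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    raw_tb = storage_per_node_tb * nodes
    # Every block is stored replication_factor times, so only
    # 1/replication_factor of the raw space holds distinct data.
    return raw_tb / replication_factor


if __name__ == "__main__":
    # Hypothetical cluster: 1000 nodes with 4 TB each, RF = 3.
    print(round(usable_capacity_tb(4, 1000, 3), 1), "TB usable")
```

Raising the replication factor buys fault tolerance at the cost of usable space, which is why the formula divides by RF.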
What Exactly is Hadoop?
What’s in a name?
Hadoop Vendors
Who uses Hadoop?
What is Hadoop used for?
Stop and Ponder
• Is Hadoop an alternative to an RDBMS?
• Hadoop is not replacing the traditional data systems used for building analytic applications (the RDBMS, EDW and MPP systems); rather, it complements them and works well alongside an RDBMS.
• Hadoop is being used to distill large quantities of data into something more manageable.
Stop and Ponder
• But isn’t Coherence distributed too? Why Hadoop?
Coherence is a market-leading in-memory data grid. Hadoop works well for large processing operations, i.e. those over many terabytes of data that can be processed in batch. For use cases where the processing requirements are closer to real time and the data volumes are smaller, Coherence is a better choice than HDFS for storing the data.
Hadoop vs. RDBMS
            RDBMS                  MapReduce
Data size   Gigabytes              Petabytes
Access      Interactive and batch  Batch
Structure   Fixed schema           No fixed schema (unstructured data)
Language    SQL                    Procedural (Java, C++, Ruby, etc.)
Integrity   High                   Low
Scaling     Nonlinear              Linear
Updates     Read and write         Write once, read many times
Latency     Low                    High
Using Hadoop in the Enterprise
Hadoop Architecture
• Hadoop Distributed File System (HDFS™): A distributed file system that
provides high-throughput access to application data.
• Hadoop MapReduce: A software framework for distributed processing of
large data sets on compute clusters.
[Diagram: Hadoop comprises HDFS and MapReduce]
Hadoop Distributed File System (HDFS)
HDFS Architecture (Master-Slave)
[Diagram: Master (book keeper), Secondary Name Node (periodic checkpoint), Slave(s), Data Block]
The Core
[Diagram: Client; data analytics jobs (MapReduce); data storage jobs (HDFS); Master; Slave]
Hadoop Ecosystem
MapReduce Algorithm Exemplified
Calculate the yearly average temperature per state.
1. Group the city average temperatures by state.
2. We don’t really care about the city names, so we discard them and keep only the state names and the cities’ temperatures.
3. We get a list of average temperatures for each state.
4. All we have to do is calculate the average temperature for each state.
That was Map/Reduce!
Let’s do it again…
• Map/Reduce has 3 stages: Map, Shuffle and Reduce.
• The Shuffle part is done automatically by Hadoop; you only need to implement the Map and Reduce parts.
• You get input data as <Key,Value> pairs for the Map part.
• In this example, the Key is the city name, and the Value is the set of attributes: the state and the city’s yearly average temperature.
• Since you want to regroup your temperatures by state, you get rid of the city name: the state becomes the Key and the temperature becomes the Value.
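The Map step described above can be sketched in a few lines. This is plain Python rather than the Hadoop API, and the function name, city names and temperatures are hypothetical values chosen only for illustration.

```python
from typing import Iterator, List, Tuple

# Input records as <Key, Value>: city -> (state, yearly average temperature).
# The cities and numbers below are made up for the example.
RECORDS: List[Tuple[str, Tuple[str, float]]] = [
    ("CityA", ("TX", 20.0)),
    ("CityB", ("TX", 22.0)),
    ("CityC", ("CA", 18.0)),
]


def map_city_to_state(city: str, value: Tuple[str, float]) -> Iterator[Tuple[str, float]]:
    """Drop the city name; emit the state as the new key and the temperature as the value."""
    state, temperature = value
    yield state, temperature


# Run the mapper over every input record and collect the emitted pairs.
mapped = [pair for city, value in RECORDS for pair in map_city_to_state(city, value)]

if __name__ == "__main__":
    print(mapped)
```

The mapper is written as a generator because a map function may emit zero, one or many pairs per input record.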
Shuffle
• Now the shuffle task runs on the output of the Map task. It groups all the values by Key, so for each Key you get a List<Value>.
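Hadoop performs the shuffle for you, so the following is only a sketch of what the grouping amounts to, written in plain Python. The function name and the sample pairs are hypothetical, and a real shuffle also sorts keys and moves data between nodes, which is not modelled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group every value under its key, giving key -> list of values."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical mapper output: (state, temperature) pairs.
    mapper_output = [("TX", 20.0), ("CA", 18.0), ("TX", 22.0)]
    print(shuffle(mapper_output))
```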
Reduce
• The Reduce task is the one that applies the logic to the data; in our case, this is the calculation of each state’s yearly average temperature.
• And that is what we get as the final output.
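The Reduce step for this example could look like the sketch below. It is plain Python, not the Hadoop API; the function name and the grouped input are hypothetical and stand in for what the shuffle would hand to each reducer.

```python
from typing import Dict, List, Tuple


def reduce_average(state: str, temperatures: List[float]) -> Tuple[str, float]:
    """Collapse the list of temperatures for one state into its average."""
    if not temperatures:
        raise ValueError("a reducer is only called for keys that have values")
    return state, sum(temperatures) / len(temperatures)


# Hypothetical shuffle output: state -> list of city yearly averages.
grouped: Dict[str, List[float]] = {"TX": [20.0, 22.0], "CA": [18.0]}

# One reduce call per key produces the final <state, average> output.
averages = dict(reduce_average(state, temps) for state, temps in grouped.items())

if __name__ == "__main__":
    print(averages)
```

Each key is reduced independently of the others, which is what lets the reduce calls run in parallel on different nodes.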
Hadoop AppStore
Ecosystem Matrix
Pig and Hive in the Hadoop Ecosystem
Hadoop Ecosystem Development
Demo
Q & A
Thank You
