SlideShare una empresa de Scribd logo
1 de 18
1
BIG DATA ANALYSIS
Problem Statement..
2
3
Solution
• Open Source Apache Project
• Runs on
• Linux, Mac OS/X, Windows, and Solaris
• Commodity hardware
Who uses hadoop ?
4
Hadoop optimized content
What is Hadoop ?
• Hadoop is a software framework for distributed processing
of large datasets across large clusters of computers
• Large datasets  Terabytes or petabytes of data
• Large clusters  hundreds or thousands of nodes
• Hadoop is based on a simple programming model called
MapReduce
5
Hadoopclusters
One of the largest clusters in the world runs on around 2000 nodes at yahoo does
hundred's of jobs every month
6
Hadoop Architecture
7
Master node (single node)
Many slave nodes
• Distributed file system (HDFS)
• Execution engine (MapReduce)
Hadoop: How it Works??
8
Hadoop Distributed File System (HDFS)
9
Centralized namenode
- Maintains metadata info about files
Many datanode (1000s)
- Store the actual data
- Files are divided into blocks
- Each block is replicated N times
(Default = 3)
File F 1 2 3 4 5
Blocks (64 MB)
Main Properties of HDFS
• Large
• Replication
• Failure
• Fault Tolerance
10
Hadoop MapReduce
• Handles large-scale web search applications
• Popular programming module for data centres
• Effective programming model for data mining, machine
learning and search applications in data centers.
• Helps to abstract from the issues of parallelization, scheduling,
input partitioning, failover, replication.
11
MapReduce Phases
12
Properties of MapReduce Engine
• Job Tracker is the master node (runs with the namenode)
• Receives the user’s job
• Number of mappers
• Concept of locality
13
• This file has 5 Blocks  run 5 map tasks
• Where to run the task reading block “1”
• Try to run it on Node 1 or Node 3
Node 1 Node 2 Node 3
Properties of MapReduce Engine
(Cont’d)
• Task Tracker is the slave node (runs on each datanode)
• Receives the task from Job Tracker
• Runs the task until completion (either map or reduce task)
• Always in communication with the Job Tracker reporting progress
14
Reduce
Reduce
Reduce
Map
Map
Map
Map
Parse-hash
Parse-hash
Parse-hash
Parse-hash
In this example, 1 map-reduce
job consists of 4 map tasks and 3
reduce tasks
Key-Value Pairs
• Mappers and Reducers are users’ code (provided functions)
• Just need to obey the Key-Value pairs interface
• Mappers:
• Consume <key, value> pairs
• Produce <key, value> pairs
• Reducers:
• Consume <key, <list of values>>
• Produce <key, value>
• Shuffling and Sorting:
• Hidden phase between mappers and reducers
• Groups all similar keys from all mappers, sorts and passes them to a
certain reducer in the form of <key, <list of values>> 15
Example 1: Word Count
• Job: Count the occurrences of each word in a data set
16
Map
Tasks
Reduce
Tasks
17
THANK YOU..!!! 18

Más contenido relacionado

La actualidad más candente

Hadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MGHadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MGPradeep MG
 
Streaming API, Spark and Ruby
Streaming API, Spark and RubyStreaming API, Spark and Ruby
Streaming API, Spark and RubyManohar Amrutkar
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoopRon Sher
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyJay Nagar
 
Big data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareBig data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareAdvanto Software
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
Hadoop: The elephant in the room
Hadoop: The elephant in the roomHadoop: The elephant in the room
Hadoop: The elephant in the roomcacois
 
Hadoop course curriculm
Hadoop course curriculm Hadoop course curriculm
Hadoop course curriculm alogarg
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - HortonworksAvery Ching
 
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)Alex Rasmussen
 
Geek camp
Geek campGeek camp
Geek campjdhok
 

La actualidad más candente (20)

Hadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MGHadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MG
 
Streaming API, Spark and Ruby
Streaming API, Spark and RubyStreaming API, Spark and Ruby
Streaming API, Spark and Ruby
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoop
 
Future of HCatalog
Future of HCatalogFuture of HCatalog
Future of HCatalog
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
 
Big data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareBig data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto software
 
2012 apache hadoop_map_reduce_windows_azure
2012 apache hadoop_map_reduce_windows_azure2012 apache hadoop_map_reduce_windows_azure
2012 apache hadoop_map_reduce_windows_azure
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Hadoop: The elephant in the room
Hadoop: The elephant in the roomHadoop: The elephant in the room
Hadoop: The elephant in the room
 
Hadoop - Introduction to HDFS
Hadoop - Introduction to HDFSHadoop - Introduction to HDFS
Hadoop - Introduction to HDFS
 
Apache Giraph
Apache GiraphApache Giraph
Apache Giraph
 
Hadoop course curriculm
Hadoop course curriculm Hadoop course curriculm
Hadoop course curriculm
 
Resource scheduling
Resource schedulingResource scheduling
Resource scheduling
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-tolerance
 
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
 
Geek camp
Geek campGeek camp
Geek camp
 

Destacado

Resume of Vimal 4.1
Resume of Vimal 4.1Resume of Vimal 4.1
Resume of Vimal 4.1Vimal Suthar
 
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
Alphago vs Lee Se-Dol: Tweeter Analysis using Hadoop and SparkAlphago vs Lee Se-Dol: Tweeter Analysis using Hadoop and Spark
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and SparkJongwook Woo
 
Basic Sentiment Analysis using Hive
Basic Sentiment Analysis using HiveBasic Sentiment Analysis using Hive
Basic Sentiment Analysis using HiveQubole
 
Traffic data analysis using HADOOP
Traffic data analysis using HADOOPTraffic data analysis using HADOOP
Traffic data analysis using HADOOPKirthan S Holla
 
Hadoop - Stock Analysis
Hadoop - Stock AnalysisHadoop - Stock Analysis
Hadoop - Stock AnalysisVaibhav Jain
 
TRAFFIC DATA ANALYSIS USING HADOOP
TRAFFIC DATA ANALYSIS USING HADOOPTRAFFIC DATA ANALYSIS USING HADOOP
TRAFFIC DATA ANALYSIS USING HADOOPKirthan S Holla
 
Log analysis with Hadoop in livedoor 2013
Log analysis with Hadoop in livedoor 2013Log analysis with Hadoop in livedoor 2013
Log analysis with Hadoop in livedoor 2013SATOSHI TAGOMORI
 
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Jonathan Seidman
 
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Hortonworks
 
HW09 Social network analysis with Hadoop
HW09 Social network analysis with HadoopHW09 Social network analysis with Hadoop
HW09 Social network analysis with HadoopCloudera, Inc.
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & CassandraEscape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & CassandraPiotr Kolaczkowski
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsAsad Masood Qazi
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project reportBharat Khanna
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysisDiana Maynard
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Large-scale social media analysis with Hadoop
Large-scale social media analysis with HadoopLarge-scale social media analysis with Hadoop
Large-scale social media analysis with Hadoopjakehofman
 

Destacado (20)

Resume of Vimal 4.1
Resume of Vimal 4.1Resume of Vimal 4.1
Resume of Vimal 4.1
 
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
Alphago vs Lee Se-Dol: Tweeter Analysis using Hadoop and SparkAlphago vs Lee Se-Dol: Tweeter Analysis using Hadoop and Spark
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
 
Basic Sentiment Analysis using Hive
Basic Sentiment Analysis using HiveBasic Sentiment Analysis using Hive
Basic Sentiment Analysis using Hive
 
Traffic data analysis using HADOOP
Traffic data analysis using HADOOPTraffic data analysis using HADOOP
Traffic data analysis using HADOOP
 
Hadoop - Stock Analysis
Hadoop - Stock AnalysisHadoop - Stock Analysis
Hadoop - Stock Analysis
 
TRAFFIC DATA ANALYSIS USING HADOOP
TRAFFIC DATA ANALYSIS USING HADOOPTRAFFIC DATA ANALYSIS USING HADOOP
TRAFFIC DATA ANALYSIS USING HADOOP
 
Log analysis with Hadoop in livedoor 2013
Log analysis with Hadoop in livedoor 2013Log analysis with Hadoop in livedoor 2013
Log analysis with Hadoop in livedoor 2013
 
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
 
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
 
HW09 Social network analysis with Hadoop
HW09 Social network analysis with HadoopHW09 Social network analysis with Hadoop
HW09 Social network analysis with Hadoop
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & CassandraEscape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questions
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project report
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysis
 
Video Analysis in Hadoop
Video Analysis in HadoopVideo Analysis in Hadoop
Video Analysis in Hadoop
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Large-scale social media analysis with Hadoop
Large-scale social media analysis with HadoopLarge-scale social media analysis with Hadoop
Large-scale social media analysis with Hadoop
 

Similar a Hadoop data analysis

Hadoop Map-Reduce from the subject: Big Data Analytics
Hadoop Map-Reduce from the subject: Big Data AnalyticsHadoop Map-Reduce from the subject: Big Data Analytics
Hadoop Map-Reduce from the subject: Big Data AnalyticsRUHULAMINHAZARIKA
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesKelly Technologies
 
Big Data Technologies - Hadoop
Big Data Technologies - HadoopBig Data Technologies - Hadoop
Big Data Technologies - HadoopTalentica Software
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesKelly Technologies
 
Hadoop – Architecture.pptx
Hadoop – Architecture.pptxHadoop – Architecture.pptx
Hadoop – Architecture.pptxSakthiVinoth78
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupCsaba Toth
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreKelly Technologies
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreducehansen3032
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Big Data and Hadoop with MapReduce Paradigms
Big Data and Hadoop with MapReduce ParadigmsBig Data and Hadoop with MapReduce Paradigms
Big Data and Hadoop with MapReduce ParadigmsArundhati Kanungo
 

Similar a Hadoop data analysis (20)

Hadoop
HadoopHadoop
Hadoop
 
Hadoop Map-Reduce from the subject: Big Data Analytics
Hadoop Map-Reduce from the subject: Big Data AnalyticsHadoop Map-Reduce from the subject: Big Data Analytics
Hadoop Map-Reduce from the subject: Big Data Analytics
 
002 Introduction to hadoop v3
002   Introduction to hadoop v3002   Introduction to hadoop v3
002 Introduction to hadoop v3
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data Technologies - Hadoop
Big Data Technologies - HadoopBig Data Technologies - Hadoop
Big Data Technologies - Hadoop
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologies
 
Hadoop – Architecture.pptx
Hadoop – Architecture.pptxHadoop – Architecture.pptx
Hadoop – Architecture.pptx
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User Group
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangalore
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Big Data and Hadoop with MapReduce Paradigms
Big Data and Hadoop with MapReduce ParadigmsBig Data and Hadoop with MapReduce Paradigms
Big Data and Hadoop with MapReduce Paradigms
 

Último

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Último (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Hadoop data analysis

  • 3. 3 Solution • Open Source Apache Project • Runs on • Linux, Mac OS/X, Windows, and Solaris • Commodity hardware
  • 4. Who uses hadoop ? 4 Hadoop optimized content
  • 5. What is Hadoop ? • Hadoop is a software framework for distributed processing of large datasets across large clusters of computers • Large datasets  Terabytes or petabytes of data • Large clusters  hundreds or thousands of nodes • Hadoop is based on a simple programming model called MapReduce 5
  • 6. Hadoopclusters One of the largest clusters in the world runs on around 2000 nodes at yahoo does hundred's of jobs every month 6
  • 7. Hadoop Architecture 7 Master node (single node) Many slave nodes • Distributed file system (HDFS) • Execution engine (MapReduce)
  • 8. Hadoop: How it Works?? 8
  • 9. Hadoop Distributed File System (HDFS) 9 Centralized namenode - Maintains metadata info about files Many datanode (1000s) - Store the actual data - Files are divided into blocks - Each block is replicated N times (Default = 3) File F 1 2 3 4 5 Blocks (64 MB)
  • 10. Main Properties of HDFS • Large • Replication • Failure • Fault Tolerance 10
  • 11. Hadoop MapReduce • Handles large-scale web search applications • Popular programming module for data centres • Effective programming model for data mining, machine learning and search applications in data centers. • Helps to abstract from the issues of parallelization, scheduling, input partitioning, failover, replication. 11
  • 13. Properties of MapReduce Engine • Job Tracker is the master node (runs with the namenode) • Receives the user’s job • Number of mappers • Concept of locality 13 • This file has 5 Blocks  run 5 map tasks • Where to run the task reading block “1” • Try to run it on Node 1 or Node 3 Node 1 Node 2 Node 3
  • 14. Properties of MapReduce Engine (Cont’d) • Task Tracker is the slave node (runs on each datanode) • Receives the task from Job Tracker • Runs the task until completion (either map or reduce task) • Always in communication with the Job Tracker reporting progress 14 Reduce Reduce Reduce Map Map Map Map Parse-hash Parse-hash Parse-hash Parse-hash In this example, 1 map-reduce job consists of 4 map tasks and 3 reduce tasks
  • 15. Key-Value Pairs • Mappers and Reducers are users’ code (provided functions) • Just need to obey the Key-Value pairs interface • Mappers: • Consume <key, value> pairs • Produce <key, value> pairs • Reducers: • Consume <key, <list of values>> • Produce <key, value> • Shuffling and Sorting: • Hidden phase between mappers and reducers • Groups all similar keys from all mappers, sorts and passes them to a certain reducer in the form of <key, <list of values>> 15
  • 16. Example 1: Word Count • Job: Count the occurrences of each word in a data set 16 Map Tasks Reduce Tasks
  • 17. 17