SlideShare una empresa de Scribd logo
1 de 24
Simon Elliston Ball
Head of Big Data - Red Gate Ventures
@sireb
Riding the Elephant:
Hadoop 2.0
http://bit.ly/RidingElephants
Append only distributed file-system
In the beginning…
Map Reduce
Java.
JVM Based (scala, groovy, jython, clojure)
More languages
Streaming (python, whatever)
HDP for Windows and .NET SDK
Abstraction
Photo: https://www.flickr.com/photos/puroticorico/
Hive, Pig
Cascading
Scalding
SQL on Hadoop
Learning to share the toys
HBase
Solr on Hadoop
Sharing HDFS…
Map Reduce v1
JobTracker
Job
Head Node
TaskTracker
Task (Map / Reduce)
Data Node
m slot 1
m slot 2
…
m slot n
Task
r slot 1
r slot 2
…
r slot n
TaskTracker
Task (Map / Reduce)
Data Node
m slot 1
m slot 2
…
m slot n
r slot 1
r slot 2
…
r slot n
TaskTracker
Task (Map / Reduce)
Data Node
m slot 1
m slot 2
…
m slot n
r slot 1
r slot 2
…
r slot n
Map Reduce v1
JobTracker
Job
Head Node
TaskTracker
Task (Map / Reduce)
Data Node
m slot 1
m slot 2
…
m slot n
MR Status
r slot 1
r slot 2
…
r slot n
TaskTracker
Task (Map / Reduce)
Data Node
m slot 1
m slot 2
…
m slot n
r slot 1
r slot 2
…
r slot n
TaskTracker
Task (Map / Reduce)
Data Node
m slot 1
m slot 2
…
m slot n
r slot 1
r slot 2
…
r slot n
Typical Hadoop 1.x setup
HBaseProductionAdhoc
Typical Hadoop 1.x setup
HBaseProductionAdhoc
YARN architecture
Container
Application
Master
Container
Data Node
Node Manager
Container ContainerContainer
Data Node
Node Manager
Application
Master
Container Free Slot
Data Node
Node Manager
Resource
Manager
YARN Client
YARN architecture
Container
Application
Master
Container
Data Node
Node Manager
Container ContainerContainer
Data Node
Node Manager
Application
Master
Container Free Slot
Data Node
Node Manager
Resource
Manager
YARN Client
YARN architecture
Container
Application
Master
Container
Data Node
Node Manager
Container ContainerContainer
Data Node
Node Manager
Application
Master
Container Free Slot
Data Node
Node Manager
Resource
Manager
YARN Client
YARN architecture
Container
Application
Master
Container
Data Node
Node Manager
Container ContainerContainer
Data Node
Node Manager
Application
Master
Container Free Slot
Data Node
Node Manager
Resource
Manager
YARN Client
Removing the choke point
Advantages
60%-150% better usage
Long running applications
Not quite…
Operating system for Big Data?
Security
…but a framework for Big Data Apps
Data Access abstraction
Storm on YARN
A whole batch of new applications
HOYA
Tez (Stinger)MapReduce 2
Giraph
<Insert your application here>
Batch applications
Spinning YARNs with Spring
Services
Direct to YARN APIs
Spring Data Hadoop abstraction
Streaming
Why?
Machine Learning
Graphs
Services
Distributed Shell - Anything.
Spark
A higher abstraction
Hadoop based?
… but can run on YARN
In Memory
Distributed
Fault tolerant
Real-time
✓
✓
✓
✓
❌
RRDs
✓
Mesos
Wider sharing
HadoopSparkAurora
Mesos Framework
Hardware
YARN
MapReduce HBase etc
HDFS
Hadoop is more than MapReduce
The new world
YARN opens up new paradigms
Infrastructure maturing: better sharing
Hadoop and beyond!
Thank you
Questions?
Simon Elliston Ball
Head of Big Data - Red Gate Ventures
@sireb
simon@simonellistonball.com
http://bit.ly/RidingElephants

Más contenido relacionado

La actualidad más candente

Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Spark Summit
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Databricks
 
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskAUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
Víctor Zabalza
 
Lens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsLens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgets
Víctor Zabalza
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on Kubernetes
Databricks
 

La actualidad más candente (20)

Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Community
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easier
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
H2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi MehtaH2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi Mehta
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
 
London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015
 
Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou
Fast Data with Apache Ignite and Apache Spark with Christos ErotocritouFast Data with Apache Ignite and Apache Spark with Christos Erotocritou
Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
 
Dask: Scaling Python
Dask: Scaling PythonDask: Scaling Python
Dask: Scaling Python
 
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick WendellApache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
 
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskAUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
 
Lens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsLens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgets
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on Kubernetes
 
First impressions of SparkR: our own machine learning algorithm
First impressions of SparkR: our own machine learning algorithmFirst impressions of SparkR: our own machine learning algorithm
First impressions of SparkR: our own machine learning algorithm
 

Similar a Riding the Elephant - Hadoop 2.0

Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Andrey Vykhodtsev
 

Similar a Riding the Elephant - Hadoop 2.0 (20)

Riding the Elephant - Hadoop 2.0
Riding the Elephant - Hadoop 2.0Riding the Elephant - Hadoop 2.0
Riding the Elephant - Hadoop 2.0
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and future
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Huhadoop - v1.1
Huhadoop - v1.1Huhadoop - v1.1
Huhadoop - v1.1
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2
 
Hadoop Essential for Oracle Professionals
Hadoop Essential for Oracle ProfessionalsHadoop Essential for Oracle Professionals
Hadoop Essential for Oracle Professionals
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1
 
Handling not so big data
Handling not so big dataHandling not so big data
Handling not so big data
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An Introduction
 
2014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part12014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part1
 
BDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical Overview
 
RackN Physical Layer Automation Innovation
RackN Physical Layer Automation InnovationRackN Physical Layer Automation Innovation
RackN Physical Layer Automation Innovation
 
20140202 fosdem-nosql-devroom-hadoop-yarn
20140202 fosdem-nosql-devroom-hadoop-yarn20140202 fosdem-nosql-devroom-hadoop-yarn
20140202 fosdem-nosql-devroom-hadoop-yarn
 
Unleash your cluster with YARN
Unleash your cluster with YARNUnleash your cluster with YARN
Unleash your cluster with YARN
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
 

Más de Simon Elliston Ball

Más de Simon Elliston Ball (11)

A streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache MetronA streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache Metron
 
mcubed london - data science at the edge
mcubed london - data science at the edgemcubed london - data science at the edge
mcubed london - data science at the edge
 
When to no sql and when to know sql javaone
When to no sql and when to know sql   javaoneWhen to no sql and when to know sql   javaone
When to no sql and when to know sql javaone
 
Machine learning without the PhD - azure ml
Machine learning without the PhD - azure mlMachine learning without the PhD - azure ml
Machine learning without the PhD - azure ml
 
Why Hadoop and SQL just want to be friends - lightning talk NoSQL Matters Dub...
Why Hadoop and SQL just want to be friends - lightning talk NoSQL Matters Dub...Why Hadoop and SQL just want to be friends - lightning talk NoSQL Matters Dub...
Why Hadoop and SQL just want to be friends - lightning talk NoSQL Matters Dub...
 
Getting your Big Data on with HDInsight
Getting your Big Data on with HDInsightGetting your Big Data on with HDInsight
Getting your Big Data on with HDInsight
 
Finding and Using Big Data in your business
Finding and Using Big Data in your businessFinding and Using Big Data in your business
Finding and Using Big Data in your business
 
When to NoSQL and when to know SQL
When to NoSQL and when to know SQLWhen to NoSQL and when to know SQL
When to NoSQL and when to know SQL
 
Mongo db for c# developers
Mongo db for c# developersMongo db for c# developers
Mongo db for c# developers
 
NDC London 2013 - Mongo db for c# developers
NDC London 2013 - Mongo db for c# developersNDC London 2013 - Mongo db for c# developers
NDC London 2013 - Mongo db for c# developers
 
Mongo db for C# Developers
Mongo db for C# DevelopersMongo db for C# Developers
Mongo db for C# Developers
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Riding the Elephant - Hadoop 2.0