SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
© Hortonworks Inc. 2013
Deploying & Managing
distributed apps on YARN
© Hortonworks Inc. 2012
Hadoop as Next-Gen Platform
HADOOP 1.0
HDFS
(redundant, reliable storage)
MapReduce
(cluster resource management
& data processing)
HDFS2
(redundant, reliable storage)
YARN
(cluster resource management)
MapReduce
(data processing)
Others
(data processing)
HADOOP 2.0
Single Use System
Batch Apps
Multi Purpose Platform
Batch, Interactive, Online, Streaming, …
Page 2
© Hortonworks Inc.
Slider
Page 3
Applications Run Natively IN Hadoop
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH
(MapReduce)
INTERACTIVE
(Tez)
STREAMING
(Storm, S4,…)
GRAPH
(Giraph)
HPC MPI
(OpenMPI)
OTHER
(Search)
(Weave…)
Samza
Availability (alwayson)
Flexibility (dynamic scaling)
Resource Management (optimization)
IN-MEMORY
(Spark)
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH
(MapReduce)
INTERACTIVE
(Tez)
STREAMING
(Storm, S4,…)
GRAPH
(Giraph)
HPC MPI
(OpenMPI)
OTHER
(Search)
(Weave…)
HBase
IN-MEMORY
(Spark)
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH
(MapReduce)
INTERACTIVE
(Tez)
STREAMING
(Storm, S4,…)
GRAPH
(Giraph)
HPC MPI
(OpenMPI)
OTHER
(Search)
(Weave…)
HBase
IN-MEMORY
(Spark)
© Hortonworks Inc.
Initial work: Hoya –on-demand HBase
JSON config in HDFS drives cluster setup and config
1. Small HBase cluster in large YARN cluster
2. Dynamic, self-healing
3. Freeze & Thaw
4. Custom versions & configurations
5. More efficient utilization/sharing of cluster
Page 5
© Hortonworks Inc. 2012
YARN manages the cluster
Page 6
HDFS
YARN Node Manager
HDFS
YARN Node Manager
HDFS
YARN Resource Manager
HDFS
YARN Node Manager
• Servers run YARN Node Managers
• NM's heartbeat to Resource Manager
• RM schedules work over cluster
• RM allocates containers to apps
• NMs start containers
• NMs report container health
© Hortonworks Inc. 2012
Client creates App Master
Page 7
HDFS
YARN Node Manager
HDFS
YARN Node Manager
HDFS
YARN Resource Manager
HDFS
YARN Node Manager
CLI
Application Master
© Hortonworks Inc. 2012
AM deploys HBase with YARN
Page 8
HDFS
YARN Node Manager
HDFS
YARN Node Manager
HDFS
YARN Resource Manager
CLI
HDFS
YARN Node Manager
Application Master
HBase Region Server
HBase Region Server
HBase Master
© Hortonworks Inc. 2012
HBase & clients bind via Zookeeper
Page 9
HDFS
YARN Node Manager
HBase Region Server
HDFS
YARN Node Manager
HBase Region Server
HDFS
YARN Resource Manager
HBase Client HDFS
YARN Node Manager
Application Master
CLI
HBase Master
© Hortonworks Inc. 2012
YARN notifies AM of failures
Page 10
HDFS
YARN Node Manager
HDFS
YARN Node Manager
HBase Region Server
HDFS
YARN Resource Manager
CLI
HDFS
YARN Node Manager
Application Master
HBase Region Server
HBase Region Server
HBase Master
© Hortonworks Inc.
Flexing/failure handling is same code
boolean flexCluster(ConfTree updated) {
appState.updateResourceDefinitions(updated);
return reviewRequestAndReleaseNodes();
}
void onContainersCompleted(List<ContainerStatus>
completed) {
for (ContainerStatus status : completed) {
appState.onCompletedNode(status);
}
reviewRequestAndReleaseNodes();
}
Page 11
© Hortonworks Inc.
Limitations
• Needs app with built in discovery/binding protocol
-HBase, Accumulo
• Static configuration –no dynamic information
• Kill-without-warning sole shutdown mechanism
• Custom Java in client & App Master for each service
• Client code assumed CLI
–but embedded/PaaS use was popular.
Page 12
© Hortonworks Inc.
Slider
“Imagine starting a farm of tomcat servers hooked up
to an HBase cluster you just deployed –servers
processing requests forwarded by a load-balancing
service”
Page 13
© Hortonworks Inc.
Slider: evolution of & successor to Hoya
1. Packaging format for deployable applications
2. Service registration and discovery
3. Propagation of dynamic config information back to
clients
4. Client API for embedding –CLI only one use case.
Page 14
goal: no code changes to deploy applications in a YARN
cluster
© Hortonworks Inc.
Packaging
• Packaging format for deployable applications
• metadata, template configurations
• template logic to go from config to scripts
• simple .py scripts for starting different components in
different YARN containers
• future: snapshot, stop, decommission
Page 15
© Hortonworks Inc. 2012
Slider AppMaster
Page 16
AM Main
Slider Engine
YARN
Integration
Launcher
Provider
Provider
ProviderREST API
Web UI
REST API
view
controller
YARN RM
YARN NM
provider REST
services
events
builds containers
© Hortonworks Inc. 2012
Model
Page 17
NodeMap
model of YARN cluster
ComponentHistory
persistent history of
component placements
Specification
resources.json &c
Container Queues
requested, starting,
releasing
Component Map
container ID -> component
instance
Event History
application history
Persisted Rebuilt Transient
© Hortonworks Inc. 2012
Slider as a Service
Page 18
Slider AM
Components
YARN RM
YARN NM
Components
Components
Application
resources appconf
2. queues
1. builds
7. REST Commands
3.launches
4. reads
5. starts
containers
8. Events
6. heartbeats
 commands
© Hortonworks Inc. 2012
Bonded: app owns agents
Page 19
Slider AM
Components
YARN RM
YARN NM
Components
Components
Application
resources appconf
2. queues
1. builds
7. REST Commands
3.launches
4. reads
5. starts
containers
8. Events
6. heartbeats  commands
© Hortonworks Inc.
Configuration Management
• JSON specs with two-level (global, component) values
• Resolution: inherit cluster, global and component values
with override
no: x-refs, deep inheritance, undefinition, schemas, …
• For PaaS deployment: list needed options
• Python scripts in packages get componentK-Vs & choose
how to interpret: templates, actions, generate CM specs…
• leads to need: publish of dynamic configurations
Page 20
Goal: avoid grand-Hadoop-only CM infrastructures
© Hortonworks Inc. 2012
{
"schema" : "http://example.org/specification/v2.0.0",
"metadata" : {
},
"global" : {
"yarn.vcores" : "1",
"yarn.memory" : "256",
},
"components" : {
"rs" : {
"yarn.memory" : "512",
"yarn.priority" : "2",
"yarn.instances" : "4"
},
"slider-appmaster" : {
"yarn.instances" : "1"
},
"master" : {
"yarn.priority" : "1",
"yarn.instances" : "1"
}
}
}
JSON Specification
© Hortonworks Inc.
Service Registration and Discovery
• extends beyond YARN to whole cluster
-why cant I bootstrap -site.xml configs from ZK info
• LDAP story?
• curator: flat service list; REST view
• Helix: separate internal and external views
• …others
Page 22
recurrent need - YARN-913
Help Needed!
© Hortonworks Inc.
Service registry thoughts
hadoop/services/$user/$service
• $service/externalView
$service/internalView
$service/remoteView
• components underneath with same structure
hadoop/services/$user/$service/components/$id
• ports & URLs with metadata (protocol, description)
Page 23
For enumeration, UIs, Knox,
© Hortonworks Inc.
How to publish configuration docs?
• K-V pairs costly for ZK
• LDAP integration? ApacheDS™ & enterprise
• String Template generation of documents
• HTTP serving
Initial plan
• Serve content and servicein the slider AM
• ZK entries to list URLs
Page 24
Hard problem
© Hortonworks Inc.
YARN-896: long-lived services
1. Container reconnect on AM restart
2. YARN Token renewal on long-lived apps
3. Containers: signalling, >1 process sequence
4. AM/RM managed gang scheduling
5. Anti-affinity hint in container requests
6. Service Registry (YARN-913)
7. Logging
Page 25
✓
© Hortonworks Inc.
SLAs & co-existence with analytics
1. Make IO bandwidth/IOPs schedule resources
2. Need to monitor what's going on with IO & net load from
containers  apps  queues
3. Dynamic adaptation of cgroup HDD, Net, RAM limits
4. Could we throttle MR job File & HDFS IO bandwidth?
Page 26
© Hortonworks Inc.
YARN Proxy doesn’t like REST APIs
• Proxy service downgrades all actions to GET
• AmIpFilter (in AM Web filters) redirects all operations to
Proxy with 302 (GET) if origin != proxy IP
Workaround:
SliderAmIPFilter selective on redirects
Long Term:
–Proxy to forward
–Filter to 307 redirect,
–httpclient callers to enable 307 handling
Page 27
© Hortonworks Inc.
Status as of April 2014
• Initial agent, REST API, container launch
• Package-based deployments of HBase, Ambari, Storm
• Hierarchical configuration JSON evolving
• Service registry - work in progress
• Incubator: proposal submitted

Más contenido relacionado

La actualidad más candente

What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0Heiko Loewe
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Apache Hadoop YARN 3.x in Alibaba
Apache Hadoop YARN 3.x in AlibabaApache Hadoop YARN 3.x in Alibaba
Apache Hadoop YARN 3.x in AlibabaDataWorks Summit
 
Pig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataPig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataDataWorks Summit
 
a Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application Resourcesa Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application ResourcesDataWorks Summit
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersDataWorks Summit
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopHortonworks
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges DataWorks Summit
 
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Sumeet Singh
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hiverxu
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARNAdam Kawa
 
Getting started with HBase
Getting started with HBaseGetting started with HBase
Getting started with HBaseCarol McDonald
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 

La actualidad más candente (20)

What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Apache Hadoop YARN 3.x in Alibaba
Apache Hadoop YARN 3.x in AlibabaApache Hadoop YARN 3.x in Alibaba
Apache Hadoop YARN 3.x in Alibaba
 
Pig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataPig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big Data
 
a Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application Resourcesa Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application Resources
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo Hadoop
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges
 
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARN
 
HDFS tiered storage
HDFS tiered storageHDFS tiered storage
HDFS tiered storage
 
Getting started with HBase
Getting started with HBaseGetting started with HBase
Getting started with HBase
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 

Destacado

My other computer_is_a_datacentre
My other computer_is_a_datacentreMy other computer_is_a_datacentre
My other computer_is_a_datacentreSteve Loughran
 
Battle At Goliad
Battle At GoliadBattle At Goliad
Battle At Goliadcompd
 
Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart FrogSteve Loughran
 
Farms, Fabrics and Clouds
Farms, Fabrics and CloudsFarms, Fabrics and Clouds
Farms, Fabrics and CloudsSteve Loughran
 
Economic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop JobsEconomic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop JobsSteve Loughran
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceSteve Loughran
 
A New Approach To Organization
A New Approach To OrganizationA New Approach To Organization
A New Approach To Organizationcompd
 
Did you really want that data?
Did you really want that data?Did you really want that data?
Did you really want that data?Steve Loughran
 

Destacado (13)

My other computer_is_a_datacentre
My other computer_is_a_datacentreMy other computer_is_a_datacentre
My other computer_is_a_datacentre
 
Battle At Goliad
Battle At GoliadBattle At Goliad
Battle At Goliad
 
Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart Frog
 
Farms, Fabrics and Clouds
Farms, Fabrics and CloudsFarms, Fabrics and Clouds
Farms, Fabrics and Clouds
 
Economic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop JobsEconomic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop Jobs
 
Extended essay overview
Extended essay overviewExtended essay overview
Extended essay overview
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG France
 
H is for_hadoop
H is for_hadoopH is for_hadoop
H is for_hadoop
 
Graphs
GraphsGraphs
Graphs
 
A New Approach To Organization
A New Approach To OrganizationA New Approach To Organization
A New Approach To Organization
 
Scholarly articles
Scholarly articlesScholarly articles
Scholarly articles
 
Echolocation
EcholocationEcholocation
Echolocation
 
Did you really want that data?
Did you really want that data?Did you really want that data?
Did you really want that data?
 

Similar a Overview of slider project

How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNHortonworks
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Hortonworks
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARNSteve Loughran
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionWangda Tan
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing enginebigdatagurus_meetup
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoophitesh1892
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaData Con LA
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 

Similar a Overview of slider project (20)

How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
Hadoop YARN Services
Hadoop YARN ServicesHadoop YARN Services
Hadoop YARN Services
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
 
YARN Services
YARN ServicesYARN Services
YARN Services
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_saha
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 

Más de Steve Loughran

The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is overSteve Loughran
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)Steve Loughran
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionSteve Loughran
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!Steve Loughran
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()Steve Loughran
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming DeployedSteve Loughran
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?Steve Loughran
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveSteve Loughran
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupSteve Loughran
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSteve Loughran
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresSteve Loughran
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object StoresSteve Loughran
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateSteve Loughran
 
Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Steve Loughran
 

Más de Steve Loughran (20)

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
 
Testing
TestingTesting
Testing
 
I hate mocking
I hate mockingI hate mocking
I hate mocking
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Datacentre stack
Datacentre stackDatacentre stack
Datacentre stack
 
Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!
 

Último

UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 

Último (20)

20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 

Overview of slider project

  • 1. © Hortonworks Inc. 2013 Deploying & Managing distributed apps on YARN
  • 2. © Hortonworks Inc. 2012 Hadoop as Next-Gen Platform HADOOP 1.0 HDFS (redundant, reliable storage) MapReduce (cluster resource management & data processing) HDFS2 (redundant, reliable storage) YARN (cluster resource management) MapReduce (data processing) Others (data processing) HADOOP 2.0 Single Use System Batch Apps Multi Purpose Platform Batch, Interactive, Online, Streaming, … Page 2
  • 3. © Hortonworks Inc. Slider Page 3 Applications Run Natively IN Hadoop HDFS2 (Redundant, Reliable Storage) YARN (Cluster Resource Management) BATCH (MapReduce) INTERACTIVE (Tez) STREAMING (Storm, S4,…) GRAPH (Giraph) HPC MPI (OpenMPI) OTHER (Search) (Weave…) Samza Availability (alwayson) Flexibility (dynamic scaling) Resource Management (optimization) IN-MEMORY (Spark)
  • 4. HDFS2 (Redundant, Reliable Storage) YARN (Cluster Resource Management) BATCH (MapReduce) INTERACTIVE (Tez) STREAMING (Storm, S4,…) GRAPH (Giraph) HPC MPI (OpenMPI) OTHER (Search) (Weave…) HBase IN-MEMORY (Spark) HDFS2 (Redundant, Reliable Storage) YARN (Cluster Resource Management) BATCH (MapReduce) INTERACTIVE (Tez) STREAMING (Storm, S4,…) GRAPH (Giraph) HPC MPI (OpenMPI) OTHER (Search) (Weave…) HBase IN-MEMORY (Spark)
  • 5. © Hortonworks Inc. Initial work: Hoya –on-demand HBase JSON config in HDFS drives cluster setup and config 1. Small HBase cluster in large YARN cluster 2. Dynamic, self-healing 3. Freeze & Thaw 4. Custom versions & configurations 5. More efficient utilization/sharing of cluster Page 5
  • 6. © Hortonworks Inc. 2012 YARN manages the cluster Page 6 HDFS YARN Node Manager HDFS YARN Node Manager HDFS YARN Resource Manager HDFS YARN Node Manager • Servers run YARN Node Managers • NM's heartbeat to Resource Manager • RM schedules work over cluster • RM allocates containers to apps • NMs start containers • NMs report container health
  • 7. © Hortonworks Inc. 2012 Client creates App Master Page 7 HDFS YARN Node Manager HDFS YARN Node Manager HDFS YARN Resource Manager HDFS YARN Node Manager CLI Application Master
  • 8. © Hortonworks Inc. 2012 AM deploys HBase with YARN Page 8 HDFS YARN Node Manager HDFS YARN Node Manager HDFS YARN Resource Manager CLI HDFS YARN Node Manager Application Master HBase Region Server HBase Region Server HBase Master
  • 9. © Hortonworks Inc. 2012 HBase & clients bind via Zookeeper Page 9 HDFS YARN Node Manager HBase Region Server HDFS YARN Node Manager HBase Region Server HDFS YARN Resource Manager HBase Client HDFS YARN Node Manager Application Master CLI HBase Master
  • 10. © Hortonworks Inc. 2012 YARN notifies AM of failures Page 10 HDFS YARN Node Manager HDFS YARN Node Manager HBase Region Server HDFS YARN Resource Manager CLI HDFS YARN Node Manager Application Master HBase Region Server HBase Region Server HBase Master
  • 11. © Hortonworks Inc. Flexing/failure handling is same code boolean flexCluster(ConfTree updated) { appState.updateResourceDefinitions(updated); return reviewRequestAndReleaseNodes(); } void onContainersCompleted(List<ContainerStatus> completed) { for (ContainerStatus status : completed) { appState.onCompletedNode(status); } reviewRequestAndReleaseNodes(); } Page 11
  • 12. © Hortonworks Inc. Limitations • Needs app with built in discovery/binding protocol -HBase, Accumulo • Static configuration –no dynamic information • Kill-without-warning sole shutdown mechanism • Custom Java in client & App Master for each service • Client code assumed CLI –but embedded/PaaS use was popular. Page 12
  • 13. © Hortonworks Inc. Slider “Imagine starting a farm of tomcat servers hooked up to an HBase cluster you just deployed –servers processing requests forwarded by a load-balancing service” Page 13
  • 14. © Hortonworks Inc. Slider: evolution of & successor to Hoya 1. Packaging format for deployable applications 2. Service registration and discovery 3. Propagation of dynamic config information back to clients 4. Client API for embedding –CLI only one use case. Page 14 goal: no code changes to deploy applications in a YARN cluster
  • 15. © Hortonworks Inc. Packaging • Packaging format for deployable applications • metadata, template configurations • template logic to go from config to scripts • simple .py scripts for starting different components in different YARN containers • future: snapshot, stop, decommission Page 15
  • 16. © Hortonworks Inc. 2012 Slider AppMaster Page 16 AM Main Slider Engine YARN Integration Launcher Provider Provider ProviderREST API Web UI REST API view controller YARN RM YARN NM provider REST services events builds containers
  • 17. © Hortonworks Inc. 2012 Model Page 17 NodeMap model of YARN cluster ComponentHistory persistent history of component placements Specification resources.json &c Container Queues requested, starting, releasing Component Map container ID -> component instance Event History application history Persisted Rebuilt Transient
  • 18. © Hortonworks Inc. 2012 Slider as a Service Page 18 Slider AM Components YARN RM YARN NM Components Components Application resources appconf 2. queues 1. builds 7. REST Commands 3.launches 4. reads 5. starts containers 8. Events 6. heartbeats  commands
  • 19. © Hortonworks Inc. 2012 Bonded: app owns agents Page 19 Slider AM Components YARN RM YARN NM Components Components Application resources appconf 2. queues 1. builds 7. REST Commands 3.launches 4. reads 5. starts containers 8. Events 6. heartbeats  commands
  • 20. © Hortonworks Inc. Configuration Management • JSON specs with two-level (global, component) values • Resolution: inherit cluster, global and component values with override no: x-refs, deep inheritance, undefinition, schemas, … • For PaaS deployment: list needed options • Python scripts in packages get componentK-Vs & choose how to interpret: templates, actions, generate CM specs… • leads to need: publish of dynamic configurations Page 20 Goal: avoid grand-Hadoop-only CM infrastructures
  • 21. © Hortonworks Inc. 2012 { "schema" : "http://example.org/specification/v2.0.0", "metadata" : { }, "global" : { "yarn.vcores" : "1", "yarn.memory" : "256", }, "components" : { "rs" : { "yarn.memory" : "512", "yarn.priority" : "2", "yarn.instances" : "4" }, "slider-appmaster" : { "yarn.instances" : "1" }, "master" : { "yarn.priority" : "1", "yarn.instances" : "1" } } } JSON Specification
  • 22. © Hortonworks Inc. Service Registration and Discovery • extends beyond YARN to whole cluster -why cant I bootstrap -site.xml configs from ZK info • LDAP story? • curator: flat service list; REST view • Helix: separate internal and external views • …others Page 22 recurrent need - YARN-913 Help Needed!
  • 23. © Hortonworks Inc. Service registry thoughts hadoop/services/$user/$service • $service/externalView $service/internalView $service/remoteView • components underneath with same structure hadoop/services/$user/$service/components/$id • ports & URLs with metadata (protocol, description) Page 23 For enumeration, UIs, Knox,
  • 24. © Hortonworks Inc. How to publish configuration docs? • K-V pairs costly for ZK • LDAP integration? ApacheDS™ & enterprise • String Template generation of documents • HTTP serving Initial plan • Serve content and servicein the slider AM • ZK entries to list URLs Page 24 Hard problem
  • 25. © Hortonworks Inc. YARN-896: long-lived services 1. Container reconnect on AM restart 2. YARN Token renewal on long-lived apps 3. Containers: signalling, >1 process sequence 4. AM/RM managed gang scheduling 5. Anti-affinity hint in container requests 6. Service Registry (YARN-913) 7. Logging Page 25 ✓
  • 26. © Hortonworks Inc. SLAs & co-existence with analytics 1. Make IO bandwidth/IOPs schedule resources 2. Need to monitor what's going on with IO & net load from containers  apps  queues 3. Dynamic adaptation of cgroup HDD, Net, RAM limits 4. Could we throttle MR job File & HDFS IO bandwidth? Page 26
  • 27. © Hortonworks Inc. YARN Proxy doesn’t like REST APIs • Proxy service downgrades all actions to GET • AmIpFilter (in AM Web filters) redirects all operations to Proxy with 302 (GET) if origin != proxy IP Workaround: SliderAmIPFilter selective on redirects Long Term: –Proxy to forward –Filter to 307 redirect, –httpclient callers to enable 307 handling Page 27
  • 28. © Hortonworks Inc. Status as of April 2014 • Initial agent, REST API, container launch • Package-based deployments of HBase, Ambari, Storm • Hierarchical configuration JSON evolving • Service registry - work in progress • Incubator: proposal submitted

Notas del editor

  1. Here&apos;s the limitations of this approach, as well as some of the implementation itself
  2. This is the model behind the AM. Data is pre
  3. Slider as a Service. Owning app is only client allowed to talk to AM API (ensures it is in charge and nothing happens behind its back).Slider manages everything as if it was standalone.
  4. Bonded is something we are thinking of, but haven&apos;t started to implement yet. Here the components are told to talk directly to the owning appSlider&apos;s role reduced to one of managing the #of instances of each componentApp has to implement slider agent heartbeat operation, send commands back -plan: make callback code reusable lib.Allows: more commands to maybe go into to components, monitoring hookups, …
  5. The global section defines global values, individual components can override any -otherwise they inherit the global values
  6. JMX port binding &amp; publishing of portweb port rolling restart of NM/RMAM retry logicTesting: chaos monkey for YARNLogging: running Samza without HDFS -costs of ops &amp; latency. Are only running Samza clusters w/ YARN. YARN puts logs into HDFS by default, so without HDFS you are stuffed.-rollover of stdout &amp; stderr -have the NM implement the rolling. -Samza could log from log4j to append to Kafka; need a way to pull out and view. Adds 15 min YARN can use http: URLs to pull in local resourceSamza can handle outages of a few minutes, but for other services rolling restart/upgadeno classic scheduling; assume full code can run or buy more hardware-RM to tell AM when request cant be satisfied
  7. co-existenc