Describing YARN's architecture, resource localization model, security, and future work (such as RM restart and RM HA), plus how to contribute to open source and Hadoop.
1. YARN: a way to share a cluster beyond traditional Hadoop
Omkar Joshi
YARN team
Hortonworks, Inc.
2. About me..
› Software developer at Hortonworks Inc., Palo Alto
› Contributor to the Apache YARN and MapReduce projects
› Worked on resource localization (distributed cache) and security
› Currently working on resource manager restart
3. Agenda
› Classical Hadoop MapReduce framework
› YARN architecture
› Resource scheduling
› Resource localization (distributed cache)
› Security
› Future work
› How to write a custom application on YARN
› How to contribute to open source
› Q&A
5. Drawbacks
› Scalability
  › Limited to ~4,000 cluster nodes
  › Maximum ~40,000 concurrent tasks
  › Synchronization in the JobTracker becomes tricky
› If the JobTracker fails, everything fails; users have to resubmit all their jobs.
› Very poor cluster utilization
  › Fixed map and reduce slots
6. Drawbacks contd..
› Lacks support to run and share cluster resources with non-MapReduce applications.
› Lacks support for wire compatibility.
  › All clients need to have the same version.
7. So what do we need?
› Better scalability
  › 10K+ nodes, 10K+ jobs
› High availability
› Better resource utilization
› Support for multiple application frameworks
› Support for aggregating logs
› Wire compatibility
› Easy cluster upgrades
8. Thinktank!!
› Let's separate the logic of managing cluster resources from managing the application itself.
› All applications, including MapReduce, will run in user land.
  › Better isolation in a secure cluster
  › More fault tolerant
9. Architecture
› Application
  › Job submitted by the user
› Application Master (AM)
  › Just like the JobTracker
  › For MapReduce it manages all the map and reduce tasks (progress, restarts, etc.)
› Container
  › Unit of allocation (a simple process), replacing fixed map and reduce slots
  › E.g. Container 1 = 2 GB, 4 CPUs (see the sketch below)
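A minimal sketch, not from the original slides, of what asking for such a container looks like with the Hadoop 2.x AMRMClient API; memory is given in MB and CPU as virtual cores, and the class name and sizes here are illustrative only.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical Application Master code: connect to the RM and register.
    AMRMClient<ContainerRequest> amRMClient = AMRMClient.createAMRMClient();
    amRMClient.init(new YarnConfiguration());
    amRMClient.start();
    amRMClient.registerApplicationMaster("", 0, "");

    // "Container = 2 GB, 4 CPU": 2048 MB of memory and 4 virtual cores,
    // no node or rack constraint, default priority.
    Resource capability = Resource.newInstance(2048, 4);
    ContainerRequest request =
        new ContainerRequest(capability, null, null, Priority.newInstance(0));
    amRMClient.addContainerRequest(request);
  }
}
```

The actual containers arrive asynchronously through later allocate() calls, as described on slide 12.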
10. Architecture contd...
› Resource Manager (RM)
  › Single resource scheduler (pluggable)
  › Stores application state (no need to resubmit the application if the RM restarts)
› Node Manager (NM)
  › Per machine; think of it like the TaskTracker
  › Manages container life cycle
  › Aggregates application logs
12. How does a job get executed?
(Diagram: Client, Resource Manager (RM), Node Managers (NMs), App Master, and a container, connected by arrows numbered 1-8.)
1. The client submits an application, e.g. MapReduce (see the client sketch after this slide).
2. The RM asks an NM to start the Application Master (AM).
3. The NM starts the AM inside a container (process).
4. The AM first registers with the RM and then keeps requesting new resources. On the same AMRM protocol it also reports the application status to the RM.
5. When the RM allocates a new container to the AM, the AM goes to the specified NM and requests it to launch the container (e.g. a map task).
6. The newly started container then follows the application logic and keeps reporting its progress to the AM.
7. Once done, the AM informs the RM that the application succeeded.
8. The RM then informs the NMs about the finished application and asks them to start aggregating logs and cleaning up container-specific files.
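As a rough illustration of step 1, this is what a client-side submission can look like with the Hadoop 2.x YarnClient API; the application name, launch command, queue, and container size are made up for the example.

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitAppSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Step 1: ask the RM for a new application id and submission context.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("hello-yarn");

    // Describe how an NM should launch the AM container (steps 2-3).
    // Here it is just a shell command; a real AM would be a java command plus jars.
    ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
        Collections.<String, LocalResource>emptyMap(),  // resources to localize
        Collections.<String, String>emptyMap(),         // environment
        Collections.singletonList("sleep 60"),          // launch command
        null, null, null);
    appContext.setAMContainerSpec(amContainer);
    appContext.setResource(Resource.newInstance(1024, 1)); // AM container size
    appContext.setQueue("default");

    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted " + appId);
  }
}
```

From there the RM drives steps 2-3 on its own: it picks an NM and launches the AM container described by the ContainerLaunchContext.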
13. Resource Scheduler
› Pluggable (the default is the Capacity Scheduler)
› Capacity Scheduler
  › Hierarchical queues
  › Can think of it as a queue per organization
  › User limit (range of resources to use)
  › Elasticity
  › Black/white listing of resources
  › Supports resource priorities
  › Security: queue-level ACLs
  › Find out more about the Capacity Scheduler
16. Resource Localization
› When a node manager launches a container, it needs the executables and files to run
› Resources (files) to be downloaded are specified as part of the container launch context
› Resource types (see the sketch below)
  › PUBLIC: accessible to all
  › PRIVATE: accessible to all containers of a single user
  › APPLICATION: accessible only to a single application
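A small sketch, assuming the Hadoop 2.x records API, of how one such resource could be declared; the HDFS path and symlink name are hypothetical. The visibility field is where PUBLIC, PRIVATE, or APPLICATION is chosen.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class LocalResourceSketch {
  // Build the localResources map that goes into a ContainerLaunchContext.
  static Map<String, LocalResource> resources(Configuration conf) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    Path jar = new Path("/apps/demo/app.jar");    // hypothetical HDFS path
    FileStatus status = fs.getFileStatus(jar);

    LocalResource appJar = LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(jar),   // where to download from
        LocalResourceType.FILE,                   // FILE, ARCHIVE, or PATTERN
        LocalResourceVisibility.APPLICATION,      // PUBLIC / PRIVATE / APPLICATION
        status.getLen(), status.getModificationTime());

    Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
    localResources.put("app.jar", appJar);        // symlink name inside the container dir
    return localResources;
  }
}
```

The returned map is passed to ContainerLaunchContext.newInstance(...) when the container is started; the NM downloads each entry and exposes it under the given symlink name in the container's working directory.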
17. Resource Localization contd..
› The public localizer downloads public resources (owned by the NM).
› Private localizers download private and application resources (owned by the user).
› Per-user quotas are not supported yet.
› LRU cache with configurable size.
› As soon as a resource is localized, it loses any connection with its remote location.
› The public localizer supports parallel downloads, whereas a private localizer supports limited parallel downloads.
18. Resource Localization contd..
(Diagram: the AM requests two resources while starting a container, R1 (PUBLIC) and R2 (APPLICATION). The public localizer downloads R1 from HDFS into the NM-owned public cache; a private localizer downloads R2 from HDFS into the user's private cache and per-application app cache on the NM.)
19. Security
› Not all users can be trusted; confidential data and application data need to be protected.
› The Resource Manager and Node Managers are started as the "yarn" (super) user.
› All applications and containers run as the user who submitted the job.
› Use the LinuxContainerExecutor to launch user processes (see container-executor.c).
› Private localizers also run as the application user.
20. Security contd..
› Kerberos (TGT) while submitting the job
› AMRMToken: for the AM to talk to the RM
› NMToken: for the AM to talk to NMs for launching new containers
› ContainerToken: the way for the RM to pass container information (resources and user) from the RM to the NM via the AM
› LocalizerToken: used by the private localizer during resource localization
› RMDelegationToken: useful when Kerberos (TGT) is not available (see the sketch below)
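For the last point, a hedged sketch of how a client might fetch an RM delegation token through the Hadoop 2.x YarnClient API; the renewer principal shown is just an example value.

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.yarn.api.records.Token;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RMDelegationTokenSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Requires an existing Kerberos login; the renewer principal is made up.
    Token rmToken = yarnClient.getRMDelegationToken(new Text("rm/_HOST@EXAMPLE.COM"));
    System.out.println("Got RM delegation token of kind " + rmToken.getKind());
  }
}
```

Such a token can then be handed to a process that has no Kerberos credentials of its own so that it can still talk to the RM on the user's behalf.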
22. Resource Manager restart
› Saves application state
› Support for ZooKeeper- and HDFS-based state stores
› Can recover applications from the saved state; no need to resubmit the application
› Today only a non-work-preserving mode is supported
› Lays the foundation for RM HA
24. Future work
› RM restart
  › Non-work-preserving mode: almost done
  › Work-preserving mode: needs more effort
› RM HA: just started
› Task / container preemption
› Rolling upgrades
› Support for long-running services
25. Different applications already running on YARN
› Apache Giraph (graph processing)
› Spark (real-time processing)
› Apache Tez
› MapReduce (MRv2)
› Apache HBase (HOYA)
› Apache Helix (incubator project)
› Apache Samza (incubator project)
› Storm
26. Writing an application on YARN
› Take a look at Distributed Shell
› Write an Application Master which, once started, will (see the sketch below):
  › First register itself with the RM on the AMRM protocol
  › Keep heartbeating and requesting resources via "allocate"
  › Use the container management protocol to launch further containers on NMs
  › Once done, notify the RM via finishApplicationMaster
› Always use AMRMClient and NMClient while talking to the RM / NM.
› Use the distributed cache wisely.
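Putting the bullets above together, a simplified, illustrative skeleton of an Application Master using AMRMClient and NMClient, modeled loosely on Distributed Shell; error handling, container requests, and the real heartbeat loop are omitted or shortened.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppMasterSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();

    // 1. Register with the RM on the AMRM protocol.
    AMRMClient<AMRMClient.ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(conf);
    rm.start();
    rm.registerApplicationMaster("", 0, "");

    NMClient nm = NMClient.createNMClient();
    nm.init(conf);
    nm.start();

    // 2. Heartbeat and request resources via allocate().
    //    (Container requests would first be added with rm.addContainerRequest().)
    AllocateResponse response = rm.allocate(0.1f);   // report 10% progress
    List<Container> allocated = response.getAllocatedContainers();

    // 3. Launch the allocated containers on their NMs.
    Map<String, LocalResource> noResources = Collections.emptyMap();
    Map<String, String> noEnv = Collections.emptyMap();
    for (Container container : allocated) {
      ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
          noResources, noEnv, Collections.singletonList("echo hello from YARN"),
          null, null, null);
      nm.startContainer(container, ctx);
    }

    // 4. Tell the RM the application is done.
    rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
  }
}
```

A real AM would call allocate() in a loop, add container requests before expecting allocations, and track container completion before unregistering.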
27. Want to contribute to open source?
› Follow this post
› Subscribe to the Apache user and yarn-dev/issues mailing lists (link)
› Track YARN issues
› Post your questions on the user mailing list.
  › Try to be specific and add more information to get better and quicker replies.
  › Try to be patient.
› Start with simple tickets to get an idea about the underlying component.