November 2013 HUG: Compute Capacity Calculator
1. C3 – Compute Capacity Calculator
Hadoop User Group (HUG) – 20 Nov 2013
Viraj Bhat
viraj@yahoo-inc.com
2. Why do we need this tool?
o Capacity planning for a multi-tenant system like a Hadoop Grid is critical
o Project owners need to estimate their project capacity requirements for provisioning on Hadoop clusters
o BU-POCs need capacity estimates from projects to manage the demand-vs-supply equation within their business units
o SEO needs product owners to provide Grid capacity requirements quarterly (CAR)
3. Onboarding Projects - Challenge
o Application developers typically develop and test their Hadoop jobs or Oozie workflows on a limited-capacity, shared prototyping/research Hadoop cluster with partial data sets before onboarding to production Hadoop clusters
o Research and production Hadoop Grids have varying map-reduce slots, container sizes, and compute and communication costs
o Projects may need optimization before being onboarded
o SupportShop is the front-end portal for teams to onboard projects onto Yahoo! Grids
o The onboarding tool, known as Himiko, tracks user requests until the project is provisioned on the cluster
5. C3 Tool Requirements
o Self-Serve: deployed as a web-interface tool hosted within the end-user one-stop portal, SupportShop
o Rule-Based: uses a post-job-execution diagnostic rule engine to calculate the compute capacities
o SLA Focus: given a desired SLA, the tool calculates the optimal compute resources required on the cluster for the entire SLA range of [2x to 0.25x]
o Hide Complexity: takes into account the source and target clusters' map-reduce slot configuration, internal Hadoop scheduling and execution details, as well as hardware-specific "speedup", when calculating the compute capacities
o Pig Jobs Support: analyzes the DAG (Directed Acyclic Graph) of MapReduce jobs spawned by Pig to accurately compute the capacities
o Oozie Support: workflows running on our Grids use Oozie
6. C3 Architecture
o Browser – SupportShop frontend with C3 PHP input forms:
o MR form: Job Type [MR], Grid Name [..], Job ID [job_202030_1234], SLA [Mins], Submit
o Pig form: Job Type [Pig], Grid Name [..], Pig Console Output [Location], SLA [Mins], Submit
o SupportShop backend – web server with yphp backend: 1) parses Pig logs / Oozie job IDs, 2) copies Pig logs, 3) runs pending jobs from the C3 DB via the C3 cron job, which records completed jobs back to the DB
o C3 core logic – on the Yahoo! Grid: 1) fetches job history and configuration logs using HDFS Proxy, 2) executes the Hadoop Vaidya rules, 3) sends results back to the C3 cron job
o Compute Capacity Report – emailed to the user: Job Type [Pig/MR], Grid Name [..], SLA [Mins], Map Slot Capacity, Reduce Slot Capacity, Job DAG
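In code form, the backend portion of this diagram reduces to a polling loop. A minimal sketch in Java, where every type (Db, HdfsProxy, Vaidya, Mailer) is a hypothetical stand-in for the actual components, shown only to make the order of operations concrete:

  import java.util.List;

  public class C3CronJob {

      record Request(String jobType, String gridName, String jobId, int slaMins,
                     String userEmail) {}
      record Report(int mapSlotCapacity, int reduceSlotCapacity) {}

      public static void run(Db db, HdfsProxy proxy, Vaidya vaidya, Mailer mailer) {
          for (Request req : db.pendingRequests()) {                 // run pending jobs from db
              byte[] history = proxy.fetch(req.gridName(), req.jobId() + "/history");
              byte[] conf    = proxy.fetch(req.gridName(), req.jobId() + "/conf");
              Report report  = vaidya.evaluate(history, conf, req.slaMins());
              db.recordCompleted(req, report);                       // record completed jobs to db
              mailer.send(req.userEmail(), report);                  // report is emailed to user
          }
      }

      // Hypothetical collaborators, stubbed so the sketch compiles standalone.
      interface Db { List<Request> pendingRequests(); void recordCompleted(Request r, Report rep); }
      interface HdfsProxy { byte[] fetch(String grid, String path); }
      interface Vaidya { Report evaluate(byte[] history, byte[] conf, int slaMins); }
      interface Mailer { void send(String to, Report report); }
  }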
7. C3 – Compute Capacity Calculator
o Calculates the compute capacity needed for M/R jobs to meet the required processing-time Service Level Agreement (SLA)
o Compute capacity is calculated in terms of the number of Map and Reduce slots/containers
o The machines to be procured are estimated from the Map and Reduce slots/containers
o Projects normally run their jobs on the research cluster and are onboarded to the production cluster
o The tool automatically matches the map-reduce slot ratio from research to production (Hadoop 1.x)
o Capacities of M/R jobs launched in parallel are added
o Example: a fork in an Oozie workflow
o The maximum of the capacities of M/R jobs is taken when they are launched in sequence
o Example: a Pig DAG that produces sequential jobs (see the sketch below)
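These two combination rules can be applied recursively over a workflow's DAG. A minimal sketch in Java, with illustrative names only (nothing here is C3's actual code):

  import java.util.List;

  public class DagCapacity {

      // Capacity needed by jobs launched in parallel: sum of the children.
      static int parallel(List<Integer> capacities) {
          return capacities.stream().mapToInt(Integer::intValue).sum();
      }

      // Capacity needed by jobs launched in sequence: max of the children.
      static int sequential(List<Integer> capacities) {
          return capacities.stream().mapToInt(Integer::intValue).max().orElse(0);
      }

      public static void main(String[] args) {
          // An Oozie fork running two jobs needing 40 and 60 map slots must
          // reserve 100 slots; a Pig DAG running the same two jobs back to
          // back only ever needs 60 at once.
          System.out.println(parallel(List.of(40, 60)));   // 100
          System.out.println(sequential(List.of(40, 60))); // 60
      }
  }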
8. C3 Statistics
o C3 and Himiko have helped onboard more than 200 projects
o More than 2,300 requests have been submitted to C3
o C3 has analyzed a Pig DAG consisting of more than 200 individual M/R jobs
o C3 has helped detect performance issues with certain M/R jobs, such as a Pig script that was using excessive mappers
9. C3 Backend – Hadoop Vaidya
o Rule-based performance diagnosis of M/R jobs
o M/R performance-analysis expertise is captured and provided as input through a set of predefined diagnostic rules
o Detects performance problems through post-mortem analysis of a job, executing the diagnostic rules against the job execution counters
o Provides targeted advice for individual performance problems
o Extensible framework
o You can add your own rules based on a rule template and the published job-counter data structures
o Complex rules can be written using existing simpler rules
Vaidya: an expert (versed in his own profession, esp. in medical science), skilled in the art of healing; a physician
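For flavor, here is a minimal sketch of a diagnostic rule in the spirit of the template described above. It detects the "excessive mappers" problem mentioned in slide 8. The class shape, counter names, and 128 MB target split size are illustrative assumptions, not the actual org.apache.hadoop.vaidya API:

  import java.util.Map;

  public class ExcessiveMappersRule {

      // Returns an impact score in [0, 1]: how badly the job suffers from
      // too many short-lived mappers. Counter names are illustrative.
      static double evaluate(Map<String, Long> jobCounters) {
          long numMaps       = jobCounters.getOrDefault("TOTAL_LAUNCHED_MAPS", 0L);
          long mapInputBytes = jobCounters.getOrDefault("MAP_INPUT_BYTES", 0L);
          if (numMaps == 0) return 0.0;

          long bytesPerMap = mapInputBytes / numMaps;
          long targetSplit = 128L << 20; // assume ~128 MB is a healthy split size

          // Impact grows as the average input per mapper shrinks below target.
          return Math.max(0.0, 1.0 - (double) bytesPerMap / targetSplit);
      }

      static String prescription() {
          return "Increase the input split size (or combine small files) "
               + "so each mapper processes more data.";
      }

      public static void main(String[] args) {
          Map<String, Long> counters = Map.of(
              "TOTAL_LAUNCHED_MAPS", 4000L,    // 4000 mappers...
              "MAP_INPUT_BYTES", 40L << 30);   // ...for only 40 GB of input
          double impact = evaluate(counters);
          if (impact > 0.5) {                  // threshold would come from rule config
              System.out.printf("impact %.2f: %s%n", impact, prescription());
          }
      }
  }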
10. C3 Rule Logic at the Backend
o Reduce slot/container capacity is the same as the number of reduce slots/containers required for the number of reducers specified for the M/R job
o Shuffle time is calculated as the amount of data per reducer / 4 MBps (a conservative, configurable estimate of bandwidth)
o Reduce phase time =~ max(sort + reduce logic time) across reducers * speedup
o Map phase time = SLA - (shuffle time + reduce phase time)
o Map slot capacity = MAP_SLOT_MILLIS / map phase time (in millis)
o MAP_SLOT_MILLIS is derived from the median of the worst-performing 10% of mappers
o Once the initial Map and Reduce slot capacities are obtained from the above calculations, their ratio is iteratively brought close to the per-node slot configuration (Hadoop 1.0)
o Add 10% more slots for speculative execution (failed/killed task attempts)
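Put together, the rules above amount to a few lines of arithmetic. A minimal worked sketch in Java, assuming made-up input numbers and illustrative variable names; the real backend reads these values from job-history counters via Vaidya, and the iterative ratio-matching step is omitted:

  public class C3RuleLogic {

      // Conservative shuffle-bandwidth estimate from the slide (configurable).
      static final double SHUFFLE_MBPS = 4.0;

      public static void main(String[] args) {
          double slaMillis         = 30 * 60 * 1000.0; // desired SLA: 30 min
          int    numReducers       = 50;               // reducers specified for the job
          double mbPerReducer      = 1200.0;           // shuffle data per reducer
          double sortPlusReduceMax = 8 * 60 * 1000.0;  // max(sort + reduce) across reducers
          double speedup           = 1.2;              // hardware speedup factor
          double mapSlotMillis     = 9.0e8;            // total map work (MAP_SLOT_MILLIS)

          // Reduce slot capacity == number of reducers specified for the job.
          int reduceSlots = numReducers;

          // Shuffle time = data per reducer / bandwidth.
          double shuffleMillis = mbPerReducer / SHUFFLE_MBPS * 1000.0;

          // Reduce phase time =~ max(sort + reduce) across reducers * speedup.
          double reduceMillis = sortPlusReduceMax * speedup;

          // The map phase gets whatever is left of the SLA.
          double mapPhaseMillis = slaMillis - (shuffleMillis + reduceMillis);

          // Map slot capacity = total map work / available map-phase time,
          // plus ~10% headroom for speculative / failed task attempts.
          int mapSlots = (int) Math.ceil(mapSlotMillis / mapPhaseMillis * 1.10);

          System.out.printf("map slots: %d, reduce slots: %d%n", mapSlots, reduceSlots);
      }
  }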
16. Future Enhancements
o C3 should output the storage requirements for a job
o Display Map and Reduce runtimes
o Capacity planning for custom MapReduce jobs that can provide an XML description of their DAGs
o Introduce more granular estimation using a per-cluster speed-up factor based on the hardware node configuration (processors, memory, etc.)
o C3 should accept the percentage of data used as input, to estimate the capacities accurately
17. Links
o Hadoop Vaidya
o https://hadoop.apache.org/docs/r1.2.1/vaidya.html
o Hadoop Vaidya Job History Server Integration for Hadoop 2.0
o https://issues.apache.org/jira/browse/MAPREDUCE-3202