Dr. Elephant is a tool that helps Hadoop users understand, analyze, and tune their Hadoop and Spark applications, improving both their productivity and the cluster’s efficiency. It analyzes Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insight into how a job performed, and then uses the results to suggest how to tune the job so that it runs more efficiently.
4. Scale and Optimize Hardware
● More users, more jobs, more resources
● Large investment in hardware
● Can’t keep upgrading and adding machines to solve the problem forever
● Some tuning is needed to get things running
7. User Productivity
● Freedom to experiment and run jobs on the cluster
● Build tools to help developers (Hadoop DSL, Resolvers for Pig/Hive)
○ Improve developer lifecycle
○ Also reduce unnecessary resource wastage
11. Expert Intervention
● Not enough support resources available
● Poor coverage
● Difficult to prioritize efforts
● Delays user development
Random suggestions
12. Training is not at all easy
● Too many users
● Diverse backgrounds
● Scope is large and evolving
● Other responsibilities are more important
15. What does Dr. Elephant do?
● Automated performance monitoring and tuning tool
● Help every user get the best performance from their jobs
● Highlights common mistakes
● Indicates best practices and tuning tips
● Provides a platform for other performance related tools
● Analyzes a hundred thousand jobs every day
25. Simplified analysis of a flow’s historical executions
● Monitoring performance, resource usage, and more
● Comparing flows against previous executions
● Impact of tuning a specific parameter or changing a line of code
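Comparing a flow against a previous execution can be pictured as a per-metric relative change. The following is a minimal Python sketch, not Dr. Elephant's own code; the function name `compare_executions` and the metric names are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: relative change of each metric between two executions
# of the same flow. Metric names are illustrative assumptions.

def compare_executions(previous, current):
    """Return the relative change for every metric present in both runs.

    A negative value means the metric went down (for runtime or resource
    usage, an improvement); a positive value means it went up.
    """
    changes = {}
    for metric, old_value in previous.items():
        # Skip metrics missing from the new run or with a zero baseline,
        # since a relative change is undefined in those cases.
        if metric in current and old_value != 0:
            changes[metric] = (current[metric] - old_value) / old_value
    return changes


previous_run = {"runtime_sec": 1200, "memory_mb_sec": 4000}
current_run = {"runtime_sec": 900, "memory_mb_sec": 5000}
changes = compare_executions(previous_run, current_run)
```

In this example the runtime dropped by a quarter while memory usage rose by a quarter, which is the kind of trade-off a parameter change can introduce.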
29. How does a Heuristic work?
● Fetch Counters and Task Data
● Some logic to compute a value
● Compare value against threshold levels
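The three steps above can be sketched roughly as follows. This is a minimal Python illustration, not Dr. Elephant's actual implementation; the counter names, the spill-ratio formula, and the threshold values are all assumptions made for the example.

```python
# Hypothetical sketch of a rule-based heuristic. Counter names, the ratio
# formula, and the threshold values are illustrative assumptions.

SEVERITIES = ["NONE", "LOW", "MODERATE", "SEVERE", "CRITICAL"]

# Assumed ascending threshold levels: each limit the value reaches
# raises the severity by one step.
SPILL_RATIO_LIMITS = [2.01, 2.2, 2.5, 3.0]


def fetch_counters(tasks):
    """Step 1: aggregate per-task counters into job-level totals."""
    totals = {"spilled_records": 0, "output_records": 0}
    for task in tasks:
        for name in totals:
            totals[name] += task.get(name, 0)
    return totals


def compute_spill_ratio(totals):
    """Step 2: compute the value that the rule is based on."""
    if totals["output_records"] == 0:
        return 0.0
    return totals["spilled_records"] / totals["output_records"]


def severity_for(value, limits):
    """Step 3: compare the value against the threshold levels."""
    level = sum(1 for limit in limits if value >= limit)
    return SEVERITIES[level]


def mapper_spill_heuristic(tasks):
    """Run all three steps for a list of per-task counter dictionaries."""
    totals = fetch_counters(tasks)
    ratio = compute_spill_ratio(totals)
    return severity_for(ratio, SPILL_RATIO_LIMITS)
```

Keeping the thresholds in a separate list mirrors the idea that heuristics are configurable: the limits can change without touching the logic.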
30. Heuristic Severity
Severity | Description
CRITICAL | The job is in a critical state and must be tuned
SEVERE | There is scope for improvement
MODERATE | There is scope for further improvement
LOW | There is scope for a few minor improvements
NONE | The job is safe. No tuning necessary
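Because the severity levels are ordered, they can be modelled as an ordered enumeration. The Python sketch below is illustrative only; in particular, taking the worst heuristic severity as the job's overall severity is an assumption of this example, and `overall_severity` is a hypothetical helper.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table above, ordered from best to worst."""
    NONE = 0
    LOW = 1
    MODERATE = 2
    SEVERE = 3
    CRITICAL = 4


def overall_severity(heuristic_results):
    """Return the worst severity among a job's heuristic results.

    Assumption for this sketch: a job is only as healthy as its worst
    heuristic. A job with no results is treated as NONE.
    """
    return max(heuristic_results.values(), default=Severity.NONE)


results = {"Mapper GC": Severity.LOW, "Mapper Spill": Severity.SEVERE}
worst = overall_severity(results)
```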
37. Adding a New Heuristic
1. Create a new heuristic and test it.
2. Create a new view for the heuristic. For example, helpMapperSpill.scala.html
3. Add the details of the heuristic in the HeuristicConf.xml file.
   <heuristic>
     <applicationtype>mapreduce</applicationtype>
     <heuristicname>Mapper GC</heuristicname>
     <classname>com.linkedin.dre.mapreduce.heuristics.MapperGC</classname>
     <viewname>views.html.help.mapreduce.helpGC</viewname>
   </heuristic>
4. Run Dr. Elephant. It should now include the new heuristic.
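Before running Dr. Elephant, it can help to check that a new configuration entry is complete. The Python sketch below reads a `<heuristic>` entry like the one in step 3 with the standard-library XML parser. The helper names are hypothetical, and treating all four fields as required is an assumption based only on the example entry, not on Dr. Elephant's own loader.

```python
import xml.etree.ElementTree as ET

# The entry from step 3, reproduced as a string for the example.
ENTRY = """
<heuristic>
  <applicationtype>mapreduce</applicationtype>
  <heuristicname>Mapper GC</heuristicname>
  <classname>com.linkedin.dre.mapreduce.heuristics.MapperGC</classname>
  <viewname>views.html.help.mapreduce.helpGC</viewname>
</heuristic>
"""

# Assumption: every field shown in the example entry is required.
REQUIRED_FIELDS = ("applicationtype", "heuristicname", "classname", "viewname")


def read_heuristic_entry(xml_text):
    """Parse one <heuristic> element into a dict of tag name to text."""
    node = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in node}


def missing_fields(entry):
    """List the required fields that are absent or empty in the entry."""
    return [name for name in REQUIRED_FIELDS if not entry.get(name)]


entry = read_heuristic_entry(ENTRY)
problems = missing_fields(entry)  # empty when the entry is complete
```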
40. Workflow monitoring and reports
● Performance characteristics change
○ Data Growth
○ Data distribution change
○ Hardware change
○ Incremental software change
● Monitor performance on each execution
● Compare behaviour across revisions
● Cost to Serve analysis
41. Production Reviews | JIRA Bot
● Separate cluster for critical workloads
● Audit before deployment
● Improved accuracy
● Faster turnaround
● Higher throughput