SlideShare a Scribd company logo
1 of 24
Download to read offline
1© Cloudera, Inc. All rights reserved.
Troubleshooting Hadoop:
Distributed Debugging
Dustin Cote | Customer Operations Engineer
2© Cloudera, Inc. All rights reserved.
Roadmap
• The Hadoop Ecosystem
• What is Hadoop?
• What are some clear challenge areas?
• Debugging tools
• How do built-in Linux tools help?
• Where do we look for typical problems?
• Custom tooling to facilitate problem solving
• Deep dive example
• Application with intermittent failure
• Some data is bigger than others
3© Cloudera, Inc. All rights reserved.
The Hadoop Ecosystem
4© Cloudera, Inc. All rights reserved.
What is Hadoop?
• Top level Apache project for storing and processing large data sets
• Originally an implementation of Google’s Mapreduce and Google File System
papers
• Since evolved to be the general platform for working with petabyte scale
datasets
• Specifically relevant for this presentation
• Mostly implemented in Java
• Users generally expand to 20+ other components that work with Hadoop
• Master-Slave architecture
• Commonly used “in the cloud”
5© Cloudera, Inc. All rights reserved. 5© Cloudera, Inc. All rights reserved.
6© Cloudera, Inc. All rights reserved. 6© Cloudera, Inc. All rights reserved.
7© Cloudera, Inc. All rights reserved.
Challenge Areas
• Infrastructure
• Network sensitivity
• Disk contention
• JVM Scaling
• Garbage collection
• Memory sizing
• Configuration management
• Host inconsistencies
• Platform config inconsistencies
• Version tracking
8© Cloudera, Inc. All rights reserved.
Debugging tools
9© Cloudera, Inc. All rights reserved.
Linux-based utilities
• Hadoop runs on Linux
• Leverage existing skillsets
• Log parsing
• grep, sed, awk, perl, etc.
• Network health
• ifconfig, telnet, traceroute, tcpdump, etc.
• Process health
• top, ps, etc.
• System health
• dmesg, messages, etc.
10© Cloudera, Inc. All rights reserved.
Extending Linux-based utilities
• My application logs are 80GB!
• split, filter, slice, but how?
• ERROR is a good place to start
• zgrep when you have time
• Keywords for YARN applications
• ApplicationMaster, MRAppMAster
• FAIL, KILL, timed out
• Map those container IDs (container_XXXXX_XX)
11© Cloudera, Inc. All rights reserved.
JVM tools
• Mostly all Java means a mostly familiar toolkit
• jstack, jmap, jconsole, jps
• Careful with heap dumps, data processing JVMs can have 10+ GB heap sizes
• Garbage collection logging (-XX:+PrintGCDetails)
• Lots of different users, make sure you are running as the right user collecting JVM
metrics
• Do not just run as root everywhere
• Sudo to the JVM owner when collecting jstacks and jmaps
12© Cloudera, Inc. All rights reserved.
Source code!
• Most of the code base is open source!
• Found a NullPointerException? Hop on github and find the line.
• https://github.com/apache/hadoop
• Even better, JIRA is available to see known issues
• Hadoop Common
• HDFS
• Mapreduce
• YARN
13© Cloudera, Inc. All rights reserved.
Log analysis helps identify anomalies
• Word counts are simple but powerful
• Tracking service logging overtime shows patterns
• Master tracking helps drill into which slaves may be unhealthy
Custom tooling
14© Cloudera, Inc. All rights reserved.
Configuration management
is hard
• Validating configuration is
in lock-step across all
instances is ideal
• Keep configuration simple
and logical
• At Cloudera, we pull whole
cluster configurations for
validation
Custom tooling
15© Cloudera, Inc. All rights reserved.
Deep dive examples
16© Cloudera, Inc. All rights reserved.
Example
• Initial complaint
• Mapreduce job shows “SUCCESSFUL” but does not generate an output
• Job was known to produce output on smaller datasets
• User environment
• ~100 node cluster
• Running YARN with Mapreduce v2
• Job uses Kite SDK and Apache Crunch API (also open source)
• Job runs for several hours, reproducing is painful
17© Cloudera, Inc. All rights reserved.
Example
• Debugging the environment
• Searching on errors, first this was found
• 2015-04-20 15:40:04,938 WARN [Readahead Thread #1] org.apache.hadoop.io.
ReadaheadPool: Failed readahead on ifile EINVAL: Invalid argument
• Bad disk? Probably not -- this job runs if the data is batch smaller
• User mailing lists confirm this is a false positive!
• File a JIRA and move on
• Other node problems? Probably not -- no indication of other jobs failing
18© Cloudera, Inc. All rights reserved.
Example
• Debugging the application
• Logging obtained through hadoop commands
• yarn logs -applicationId APP_ID > out.file
• logs are huge, need a strategy
• first check if a write-out failure is ignored -- was not
• check if any output data is created at all -- yes!
• output data is then destroyed when moving to final location -- bad, but
why?
19© Cloudera, Inc. All rights reserved.
Example
• Debugging the application
• Need more information, let’s get DEBUG level logging
• Logs are already 80GB
• now we have even more data to sift through, let’s try to focus on the final
move stage
• org.kitesdk.data.mapreduce.
DatasetKeyOutputFormat$MergeOutputCommitter makes that happen, let’
s just raise that class to DEBUG
• success! we see this class toss aside the dataset, but why?
• code shows an int is being used to count records to output :(
20© Cloudera, Inc. All rights reserved.
Example 2
• Initial complaint
• Hive query that used to run in 10 minutes, now is not complete after 10 hours
• nothing has changed (!)
• User environment
• ~50 node cluster
• Hive tables on scale of several hundred GB
• Query with JOIN operations
21© Cloudera, Inc. All rights reserved.
Example 2
• User may not be aware of changes, but what do the logs say?
• Hive generates mapreduce jobs deterministically based on:
• Table structure
• optimization flags
• HQL (SQL-like) query structure
• User shows it’s the same query and no properties have changed
• Back to those challenge areas (Infrastructure, JVM, Config management)
22© Cloudera, Inc. All rights reserved.
Example 2
• Config management is easiest
•Running from another client machine?
•cluster side default changes? (upgrades, patches, etc.)
• JVM is next easiest
•let’s pull in the Mapreduce logs again
•yarn logs -applicationId APP_ID > out.log
•2015-11-24 17:56:46,324 INFO [main] org.apache.hadoop.hive.ql.exec.
CommonJoinOperator: table 0 has 3628000 rows for join key [00011]
23© Cloudera, Inc. All rights reserved.
Example 2
• Why so many rows for one join key?
•How many rows overall? ~1.8 billion!
•Just disk write throughput will take several hours (considering size)
• So, what changed?
•Configurations would not create more rows in the output
•JVM settings and memory management doesn’t seem likely
•Infrastructure was never going to be fast enough to do this in 10 minutes
• UAT testing was being used for performance baseline!
•Hadoop scales linearly only if you scale your data linearly :)
24© Cloudera, Inc. All rights reserved.
Thank you

More Related Content

What's hot

Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersAmal G Jose
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxAlex Moundalexis
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Managementrightsize
 
Welcome to Hadoop2Land!
Welcome to Hadoop2Land!Welcome to Hadoop2Land!
Welcome to Hadoop2Land!Uwe Printz
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelUwe Printz
 
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaHadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaCloudera, Inc.
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingImpetus Technologies
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceUwe Printz
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Hadoop interview quations1
Hadoop interview quations1Hadoop interview quations1
Hadoop interview quations1Vemula Ravi
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopDataWorks Summit
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuningVitthal Gogate
 

What's hot (20)

Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
February 2014 HUG : Pig On Tez
February 2014 HUG : Pig On TezFebruary 2014 HUG : Pig On Tez
February 2014 HUG : Pig On Tez
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Welcome to Hadoop2Land!
Welcome to Hadoop2Land!Welcome to Hadoop2Land!
Welcome to Hadoop2Land!
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaHadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
 
Intro To Hadoop
Intro To HadoopIntro To Hadoop
Intro To Hadoop
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduce
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Hadoop interview quations1
Hadoop interview quations1Hadoop interview quations1
Hadoop interview quations1
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for Hadoop
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 

Viewers also liked

70a monitoring & troubleshooting
70a monitoring & troubleshooting70a monitoring & troubleshooting
70a monitoring & troubleshootingmapr-academy
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsAsad Masood Qazi
 
10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems WebinarCloudera, Inc.
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniquesijsrd.com
 
Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & TroubleshootingApache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & TroubleshootingJayush Luniya
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!Edureka!
 
Hive Apachecon 2008
Hive Apachecon 2008Hive Apachecon 2008
Hive Apachecon 2008athusoo
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 HiveNamit Jain
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on HadoopPaco Nathan
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 
Apache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonApache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonVitthal Gogate
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101MongoDB
 
Key Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareKey Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareMapR Technologies
 
Proof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesProof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesDataWorks Summit
 
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...DataStax
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
Infinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container EnvironmentsInfinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container EnvironmentsDocker, Inc.
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigMilind Bhandarkar
 

Viewers also liked (20)

70a monitoring & troubleshooting
70a monitoring & troubleshooting70a monitoring & troubleshooting
70a monitoring & troubleshooting
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questions
 
10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
 
Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & TroubleshootingApache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!
 
Hive Apachecon 2008
Hive Apachecon 2008Hive Apachecon 2008
Hive Apachecon 2008
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 Hive
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-tolerance
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on Hadoop
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Apache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonApache Spark Introduction @ University College London
Apache Spark Introduction @ University College London
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101
 
Key Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareKey Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShare
 
Proof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesProof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-series
 
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Infinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container EnvironmentsInfinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container Environments
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 

Similar to Troubleshooting Hadoop: Distributed Debugging

Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2Aswini Ashu
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2aswini pilli
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilitycherryhillco
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
MySQL in the Hosted Cloud
MySQL in the Hosted CloudMySQL in the Hosted Cloud
MySQL in the Hosted CloudColin Charles
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015polo li
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 

Similar to Troubleshooting Hadoop: Distributed Debugging (20)

YARN
YARNYARN
YARN
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalability
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
MySQL in the Hosted Cloud
MySQL in the Hosted CloudMySQL in the Hosted Cloud
MySQL in the Hosted Cloud
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
Chicago spark meetup-april2017-public
Chicago spark meetup-april2017-publicChicago spark meetup-april2017-public
Chicago spark meetup-april2017-public
 

More from Great Wide Open

The Little Meetup That Could
The Little Meetup That CouldThe Little Meetup That Could
The Little Meetup That CouldGreat Wide Open
 
Lightning Talk - 5 Hacks to Getting the Job of Your Dreams
Lightning Talk - 5 Hacks to Getting the Job of Your DreamsLightning Talk - 5 Hacks to Getting the Job of Your Dreams
Lightning Talk - 5 Hacks to Getting the Job of Your DreamsGreat Wide Open
 
Breaking Free from Proprietary Gravitational Pull
Breaking Free from Proprietary Gravitational PullBreaking Free from Proprietary Gravitational Pull
Breaking Free from Proprietary Gravitational PullGreat Wide Open
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityGreat Wide Open
 
You Don't Know Node: Quick Intro to 6 Core Features
You Don't Know Node: Quick Intro to 6 Core FeaturesYou Don't Know Node: Quick Intro to 6 Core Features
You Don't Know Node: Quick Intro to 6 Core FeaturesGreat Wide Open
 
Using Cryptography Properly in Applications
Using Cryptography Properly in ApplicationsUsing Cryptography Properly in Applications
Using Cryptography Properly in ApplicationsGreat Wide Open
 
Lightning Talk - Getting Students Involved In Open Source
Lightning Talk - Getting Students Involved In Open SourceLightning Talk - Getting Students Involved In Open Source
Lightning Talk - Getting Students Involved In Open SourceGreat Wide Open
 
You have Selenium... Now what?
You have Selenium... Now what?You have Selenium... Now what?
You have Selenium... Now what?Great Wide Open
 
How Constraints Cultivate Growth
How Constraints Cultivate GrowthHow Constraints Cultivate Growth
How Constraints Cultivate GrowthGreat Wide Open
 
The Current Messaging Landscape
The Current Messaging LandscapeThe Current Messaging Landscape
The Current Messaging LandscapeGreat Wide Open
 
Understanding Open Source Class 101
Understanding Open Source Class 101Understanding Open Source Class 101
Understanding Open Source Class 101Great Wide Open
 
Elasticsearch for SQL Users
Elasticsearch for SQL UsersElasticsearch for SQL Users
Elasticsearch for SQL UsersGreat Wide Open
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataGreat Wide Open
 

More from Great Wide Open (20)

The Little Meetup That Could
The Little Meetup That CouldThe Little Meetup That Could
The Little Meetup That Could
 
Lightning Talk - 5 Hacks to Getting the Job of Your Dreams
Lightning Talk - 5 Hacks to Getting the Job of Your DreamsLightning Talk - 5 Hacks to Getting the Job of Your Dreams
Lightning Talk - 5 Hacks to Getting the Job of Your Dreams
 
Breaking Free from Proprietary Gravitational Pull
Breaking Free from Proprietary Gravitational PullBreaking Free from Proprietary Gravitational Pull
Breaking Free from Proprietary Gravitational Pull
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to Infinity
 
You Don't Know Node: Quick Intro to 6 Core Features
You Don't Know Node: Quick Intro to 6 Core FeaturesYou Don't Know Node: Quick Intro to 6 Core Features
You Don't Know Node: Quick Intro to 6 Core Features
 
Hidden Features in HTTP
Hidden Features in HTTPHidden Features in HTTP
Hidden Features in HTTP
 
Using Cryptography Properly in Applications
Using Cryptography Properly in ApplicationsUsing Cryptography Properly in Applications
Using Cryptography Properly in Applications
 
Lightning Talk - Getting Students Involved In Open Source
Lightning Talk - Getting Students Involved In Open SourceLightning Talk - Getting Students Involved In Open Source
Lightning Talk - Getting Students Involved In Open Source
 
You have Selenium... Now what?
You have Selenium... Now what?You have Selenium... Now what?
You have Selenium... Now what?
 
How Constraints Cultivate Growth
How Constraints Cultivate GrowthHow Constraints Cultivate Growth
How Constraints Cultivate Growth
 
Inner Source 101
Inner Source 101Inner Source 101
Inner Source 101
 
Running MySQL on Linux
Running MySQL on LinuxRunning MySQL on Linux
Running MySQL on Linux
 
Search is the new UI
Search is the new UISearch is the new UI
Search is the new UI
 
The Current Messaging Landscape
The Current Messaging LandscapeThe Current Messaging Landscape
The Current Messaging Landscape
 
Apache httpd v2.4
Apache httpd v2.4Apache httpd v2.4
Apache httpd v2.4
 
Understanding Open Source Class 101
Understanding Open Source Class 101Understanding Open Source Class 101
Understanding Open Source Class 101
 
Thinking in Git
Thinking in GitThinking in Git
Thinking in Git
 
Antifragile Design
Antifragile DesignAntifragile Design
Antifragile Design
 
Elasticsearch for SQL Users
Elasticsearch for SQL UsersElasticsearch for SQL Users
Elasticsearch for SQL Users
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

Troubleshooting Hadoop: Distributed Debugging

  • 1. 1© Cloudera, Inc. All rights reserved. Troubleshooting Hadoop: Distributed Debugging Dustin Cote | Customer Operations Engineer
  • 2. 2© Cloudera, Inc. All rights reserved. Roadmap • The Hadoop Ecosystem • What is Hadoop? • What are some clear challenge areas? • Debugging tools • How do built-in Linux tools help? • Where do we look for typical problems? • Custom tooling to facilitate problem solving • Deep dive example • Application with intermittent failure • Some data is bigger than others
  • 3. 3© Cloudera, Inc. All rights reserved. The Hadoop Ecosystem
  • 4. 4© Cloudera, Inc. All rights reserved. What is Hadoop? • Top level Apache project for storing and processing large data sets • Originally an implementation of Google’s Mapreduce and Google File System papers • Since evolved to be the general platform for working with petabyte scale datasets • Specifically relevant for this presentation • Mostly implemented in Java • Users generally expand to 20+ other components that work with Hadoop • Master-Slave architecture • Commonly used “in the cloud”
  • 5. 5© Cloudera, Inc. All rights reserved. 5© Cloudera, Inc. All rights reserved.
  • 6. 6© Cloudera, Inc. All rights reserved. 6© Cloudera, Inc. All rights reserved.
  • 7. 7© Cloudera, Inc. All rights reserved. Challenge Areas • Infrastructure • Network sensitivity • Disk contention • JVM Scaling • Garbage collection • Memory sizing • Configuration management • Host inconsistencies • Platform config inconsistencies • Version tracking
  • 8. 8© Cloudera, Inc. All rights reserved. Debugging tools
  • 9. 9© Cloudera, Inc. All rights reserved. Linux-based utilities • Hadoop runs on Linux • Leverage existing skillsets • Log parsing • grep, sed, awk, perl, etc. • Network health • ifconfig, telnet, traceroute, tcpdump, etc. • Process health • top, ps, etc. • System health • dmesg, messages, etc.
  • 10. 10© Cloudera, Inc. All rights reserved. Extending Linux-based utilities • My application logs are 80GB! • split, filter, slice, but how? • ERROR is a good place to start • zgrep when you have time • Keywords for YARN applications • ApplicationMaster, MRAppMAster • FAIL, KILL, timed out • Map those container IDs (container_XXXXX_XX)
  • 11. 11© Cloudera, Inc. All rights reserved. JVM tools • Mostly all Java means a mostly familiar toolkit • jstack, jmap, jconsole, jps • Careful with heap dumps, data processing JVMs can have 10+ GB heap sizes • Garbage collection logging (-XX:+PrintGCDetails) • Lots of different users, make sure you are running as the right user collecting JVM metrics • Do not just run as root everywhere • Sudo to the JVM owner when collecting jstacks and jmaps
  • 12. 12© Cloudera, Inc. All rights reserved. Source code! • Most of the code base is open source! • Found a NullPointerException? Hop on github and find the line. • https://github.com/apache/hadoop • Even better, JIRA is available to see known issues • Hadoop Common • HDFS • Mapreduce • YARN
  • 13. 13© Cloudera, Inc. All rights reserved. Log analysis helps identify anomalies • Word counts are simple but powerful • Tracking service logging overtime shows patterns • Master tracking helps drill into which slaves may be unhealthy Custom tooling
  • 14. 14© Cloudera, Inc. All rights reserved. Configuration management is hard • Validating configuration is in lock-step across all instances is ideal • Keep configuration simple and logical • At Cloudera, we pull whole cluster configurations for validation Custom tooling
  • 15. 15© Cloudera, Inc. All rights reserved. Deep dive examples
  • 16. 16© Cloudera, Inc. All rights reserved. Example • Initial complaint • Mapreduce job shows “SUCCESSFUL” but does not generate an output • Job was known to produce output on smaller datasets • User environment • ~100 node cluster • Running YARN with Mapreduce v2 • Job uses Kite SDK and Apache Crunch API (also open source) • Job runs for several hours, reproducing is painful
  • 17. 17© Cloudera, Inc. All rights reserved. Example • Debugging the environment • Searching on errors, first this was found • 2015-04-20 15:40:04,938 WARN [Readahead Thread #1] org.apache.hadoop.io. ReadaheadPool: Failed readahead on ifile EINVAL: Invalid argument • Bad disk? Probably not -- this job runs if the data is batch smaller • User mailing lists confirm this is a false positive! • File a JIRA and move on • Other node problems? Probably not -- no indication of other jobs failing
  • 18. 18© Cloudera, Inc. All rights reserved. Example • Debugging the application • Logging obtained through hadoop commands • yarn logs -applicationId APP_ID > out.file • logs are huge, need a strategy • first check if a write-out failure is ignored -- was not • check if any output data is created at all -- yes! • output data is then destroyed when moving to final location -- bad, but why?
  • 19. 19© Cloudera, Inc. All rights reserved. Example • Debugging the application • Need more information, let’s get DEBUG level logging • Logs are already 80GB • now we have even more data to sift through, let’s try to focus on the final move stage • org.kitesdk.data.mapreduce. DatasetKeyOutputFormat$MergeOutputCommitter makes that happen, let’ s just raise that class to DEBUG • success! we see this class toss aside the dataset, but why? • code shows an int is being used to count records to output :(
  • 20. 20© Cloudera, Inc. All rights reserved. Example 2 • Initial complaint • Hive query that used to run in 10 minutes, now is not complete after 10 hours • nothing has changed (!) • User environment • ~50 node cluster • Hive tables on scale of several hundred GB • Query with JOIN operations
  • 21. 21© Cloudera, Inc. All rights reserved. Example 2 • User may not be aware of changes, but what do the logs say? • Hive generates mapreduce jobs deterministically based on: • Table structure • optimization flags • HQL (SQL-like) query structure • User shows it’s the same query and no properties have changed • Back to those challenge areas (Infrastructure, JVM, Config management)
  • 22. 22© Cloudera, Inc. All rights reserved. Example 2 • Config management is easiest •Running from another client machine? •cluster side default changes? (upgrades, patches, etc.) • JVM is next easiest •let’s pull in the Mapreduce logs again •yarn logs -applicationId APP_ID > out.log •2015-11-24 17:56:46,324 INFO [main] org.apache.hadoop.hive.ql.exec. CommonJoinOperator: table 0 has 3628000 rows for join key [00011]
  • 23. 23© Cloudera, Inc. All rights reserved. Example 2 • Why so many rows for one join key? •How many rows overall? ~1.8 billion! •Just disk write throughput will take several hours (considering size) • So, what changed? •Configurations would not create more rows in the output •JVM settings and memory management doesn’t seem likely •Infrastructure was never going to be fast enough to do this in 10 minutes • UAT testing was being used for performance baseline! •Hadoop scales linearly only if you scale your data linearly :)
  • 24. 24© Cloudera, Inc. All rights reserved. Thank you