SlideShare una empresa de Scribd logo
1 de 42
Hadoop Troubleshooting 101
Kate Ting, Cloudera
Ariel Rabkin, Cloudera/UC Berkeley
How to Destroy Your Cluster

  with 7 Misconfigurations
Agenda
• Ticket Breakdown
• What are Misconfigurations?
  • Memory Mismanagement
     • TT OOME
     • JT OOME
     • Native Threads
  • Thread Mismanagement
     • Fetch Failures
     • Replicas
  • Disk Mismanagement
     • No File
     • Too Many Files
• Cloudera Manager
Agenda
• Ticket Breakdown
• What are Misconfigurations?
  • Memory Mismanagement
     • TT OOME
     • JT OOME
     • Native Threads
  • Thread Mismanagement
     • Fetch Failures
     • Replicas
  • Disk Mismanagement
     • No File
     • Too Many Files
• Cloudera Manager
Ticket Breakdown

• No one issue was more than 2% of tickets

• First symptom ≠ root cause

• Bad configuration goes undetected
Breakdown of Tickets by Component
Agenda
• Ticket Breakdown
• What are Misconfigurations?
  • Memory Mismanagement
     • TT OOME
     • JT OOME
     • Native Threads
  • Thread Mismanagement
     • Fetch Failures
     • Replicas
  • Disk Mismanagement
     • No File
     • Too Many Files
• Cloudera Manager
What are Misconfigurations?

• Any diagnostic ticket requiring a change to Hadoop or to
  OS config files

• Comprise 35% of tickets

• e.g. resource-allocation: memory, file-handles, disk-space
Problem Tickets by Category and Time
Why Care About Misconfigurations?
The life of an over-subscribed MR/HBase
   cluster is nasty, brutish, and short.
            (with apologies to Thomas Hobbes)
Oversubscription of MR heap caused swap.




      Swap caused RegionServer to time out and die.



Dead RegionServer caused MR tasks to fail until MR job died.
Whodunit?



  A Faulty MR Config Killed HBase
Who Are We?

• Kate Ting
  o Customer Operations Engineer, Cloudera
  o 9 months troubleshooting 100+ Hadoop clusters
  o kate@cloudera.com

• Ariel Rabkin
  o Customer Operations Tools Team, Intern, Cloudera
  o UC Berkeley CS PhD candidate
  o Dissertation on diagnosing configuration bugs
  o asrabkin@eecs.berkeley.edu
Agenda
• Ticket Breakdown
• What are Misconfigurations?
  • Memory Mismanagement
     • TT OOME
     • JT OOME
     • Native Threads
  • Thread Mismanagement
     • Fetch Failures
     • Replicas
  • Disk Mismanagement
     • No File
     • Too Many Files
• Cloudera Manager
1. Task Out Of Memory Error

FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError: Java heap space
     at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask
.java:781)
     at
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350
)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
     at org.apache.hadoop.mapred.Child.main(Child.java:170)
1. Task Out Of Memory Error
• What does it mean?
  o Memory leak in task code

• What causes this?
  o MR task heap sizes will not fit

• How can it be resolved?
   o Set io.sort.mb < mapred.child.java.opts
       o e.g. 512M for io.sort.mb and 1G for mapred.child.java.opts
       io.sort.mb = size of map buffer; smaller buffer => more spills
   o Use fewer mappers & reducers
       mappers = # of cores on node
           if not using HBase
           if # disks = # cores
       4:3 mapper:reducer ratio
(Mappers + Reducers)*
               Child Task Heap
                       +
                   DN heap
                       +
Total RAM          TT heap
                       +
                     3GB
                       +
                   RS heap
                       +
             Other Services' heap
2. JobTracker Out of Memory Error
ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed:
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProgress.java:122)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:653)
at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3965)
at
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskIniti
alizationListener.java:79)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:8
86)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2. JobTracker Out of Memory Error

• What does it mean?
  o Total JT memory usage > allocated RAM

• What causes this?
  o Tasks too small
  o Too much job history
2. JobTracker Out of Memory Error

• How can it be resolved?
  o Verify by summary output: sudo -u mapred jmap -J-d64
    -histo:live <PID of JT>
  o Increase JT's heap space
  o Separate JT and NN
  o Decrease
    mapred.jobtracker.completeuserjobs.maximum to 5
      JT cleans up mem more often, reducing needed
       RAM
3. Unable to Create New Native Thread

ERROR mapred.JvmManager: Caught Throwable in JVMRunner. Aborting
TaskTracker.
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:234)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
at
org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.j
ava:127)
at
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild
(JvmManager.java:472)
at
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(Jvm
Manager.java:446)
3. Unable to Create New Native Thread
• What does it mean?
  o DN show up as dead even though processes are still
    running on those machines

• What causes this?
  o nproc default of 4200 processes is too low

• How can it be resolved?
  o Adjust /etc/security/limits.conf:
      hbase soft/hard nproc 50000
      hdfs soft/hard nproc 50000
      mapred soft/hard nproc 50000
      Restart DN, TT, JT, NN
Agenda
• Ticket Breakdown
• What are Misconfigurations?
  • Memory Mismanagement
     • TT OOME
     • JT OOME
     • Native Threads
  • Thread Mismanagment
     • Fetch Failures
     • Replicas
  • Disk Mismanagement
     • No File
     • Too Many Files
• Cloudera Manager
4. Too Many Fetch-Failures



INFO org.apache.hadoop.mapred.JobInProgress: Too many
fetch-failures for output of task:
4. Too Many Fetch-Failures

• What does it mean?
  o Reducer fetch operations fail to retrieve mapper outputs
  o Too many fetch failures occur on a particular TT
     o => blacklisted TT

• What causes this?
  o DNS issues
  o Not enough http threads on the mapper side for the
    reducers
  o JVM bug
4. Too Many Fetch-Failures
• How can it be resolved?
   o mapred.reduce.slowstart.completed.maps = 0.80
       Allows reducers from other jobs to run while a big job waits on
        mappers
   o tasktracker.http.threads = 80
       Specifies # threads used by the TT to serve map output to
        reducers
   o mapred.reduce.parallel.copies = SQRT(NodeCount) with a floor
     of 10
       Specifies # parallel copies used by reducers to fetch map
        output
   o Stop using 6.1.26 Jetty, which is fetch-failure prone and upgrade
     to CDH3u2 (MAPREDUCE-2980), MAPREDUCE-3184
5. Not Able to Place Enough Replicas



WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not
able to place enough replicas
5. Not Able to Place Enough Replicas



• What does it mean?
  o NN is unable to choose some DNs
5. Not Able to Place Enough Replicas
•   What causes this?
    o dfs replication > # avail DNs
        o # avail DNs is low due to low disk space
        o mapred.submit.replication default of 10 is too high
    o NN is unable to satisfy block replacement policy
        o If # racks > 2, a block has to exist in at least 2 racks
    o DN being decommissioned
    o DN has too much load on it
    o Block-size unreasonably large
    o Not enough xcievers threads
        Default 256 threads that DN can manage is too low
        Note the accidental misspelling
5. Not Able to Place Enough Replicas
• How can it be resolved?
  o Set dfs.datanode.max.xcievers = 4096 then restart DN
  o Look for nodes down (or rack down)
  o Check disk space
     o Log dir may be filling up or a runaway task may fill
       up an entire disk
  o Rebalance
Agenda
• Ticket Breakdown
• What are Misconfigurations?
  • Memory Exhaustion
     • TT OOME
     • JT OOME
     • Native Threads
  • Thread Exhaustion
     • Fetch Failures
     • Replicas
  • Data Space Exhaustion
     • No File
     • Too Many Files
• Cloudera Manager
6. No Such File or Directory

ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because
ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
at
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:
496)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapred.TaskTracker.initializeDirectories(TaskTracker.java:666)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:734)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1431)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3521)
Total Storage




                MR space
                DFS space
6. No Such File or Directory
• What does it mean?
  o TT failing to start or jobs are failing
• What causes this?
  o Diskspace filling up on the TT disk
  o Permissions on the directories that TT writes to
• How can it be resolved?
   Job logging dir dfs.datanode.du.reserved = 10% total
    vol
   Permissions should be 755 and owner should be
    mapred for userlogs and mapred.local.dir
7. Too Many Open Files

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(<ip address>:50010,
storageID=<storage id>, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Too many open files
      at sun.nio.ch.IOUtil.initPipe(Native Method)
      at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
      at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
      at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
      at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
      at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
      at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
      at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
      at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
      at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
      at java.io.DataInputStream.readShort(DataInputStream.java:295)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:97)
7. Too Many Open Files
• What does it mean?
  o Hitting open file handles limit of user account running
    Hadoop

• What causes this?
  o Nofile default of 1024 files is too low

• How can it be resolved?
  o Adjust /etc/security/limits.conf:
      hdfs - nofile 32768
      mapred - nofile 32768
      hbase - nofile 32768
      Restart DN, TT, JT, NN
Agenda
• Ticket Breakdown
• What are Misconfigurations?
  • Memory Mismanagement
     • TT OOME
     • JT OOME
     • Native Threads
  • Thread Mismanagement
     • Fetch Failures
     • Replicas
  • Disk Mismanagement
     • No File
     • Too Many Files
• Cloudera Manager
Additional Resources
• Avoiding Common Hadoop Administration Issues:
  http://www.cloudera.com/blog/2010/08/avoiding-common-hadoop-
  administration-issues/

• Tips for Improving MapReduce Performance:
  http://www.cloudera.com/blog/2009/12/7-tips-for-improving-
  mapreduce-performance/

• Basic Hardware Recommendations:
  http://www.cloudera.com/blog/2010/03/clouderas-support-team-
  shares-some-basic-hardware-recommendations/

• Cloudera Knowledge
  Base: http://ccp.cloudera.com/display/KB/Knowledge+Base
Acknowledgements

Adam Warrington
Amr Awadallah
Angus Klein
Brock Noland
Clint Heath
Eric Sammer
Esteban Gutierrez
Jeff Bean
Joey Echeverria
Laura Gonzalez
Linden Hillenbrand
Omer Trajman
Patrick Angeles
Takeaways

• Configuration is up to you. Bugs are up to
  the community.

• Misconfigurations are hard to diagnose.

• Get it right the first time with Cloudera
  Manager.

Más contenido relacionado

La actualidad más candente

Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data StreamingOracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data StreamingMichael Rainey
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureSkillspeed
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVROairisData
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceDataWorks Summit
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 

La actualidad más candente (20)

Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data StreamingOracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVRO
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of Service
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 

Destacado

Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceCloudera, Inc.
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!Edureka!
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopCloudera, Inc.
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsAsad Masood Qazi
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop AdministrationEdureka!
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems WebinarCloudera, Inc.
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniquesijsrd.com
 
Why Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraWhy Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraDATAVERSITY
 
Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & TroubleshootingApache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & TroubleshootingJayush Luniya
 
70a monitoring & troubleshooting
70a monitoring & troubleshooting70a monitoring & troubleshooting
70a monitoring & troubleshootingmapr-academy
 
How to solve the mismanagement crisis
How to solve the mismanagement crisisHow to solve the mismanagement crisis
How to solve the mismanagement crisisBalamurugan Ka
 
How Hadoop Exploits Data Locality
How Hadoop Exploits Data LocalityHow Hadoop Exploits Data Locality
How Hadoop Exploits Data LocalityUday Vakalapudi
 
Hive Apachecon 2008
Hive Apachecon 2008Hive Apachecon 2008
Hive Apachecon 2008athusoo
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 HiveNamit Jain
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
A short introduction to Vertica
A short introduction to VerticaA short introduction to Vertica
A short introduction to VerticaTommi Siivola
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveTapan Avasthi
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012jbellis
 

Destacado (20)

Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questions
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
 
Why Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraWhy Migrate from MySQL to Cassandra
Why Migrate from MySQL to Cassandra
 
Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & TroubleshootingApache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
 
70a monitoring & troubleshooting
70a monitoring & troubleshooting70a monitoring & troubleshooting
70a monitoring & troubleshooting
 
How to solve the mismanagement crisis
How to solve the mismanagement crisisHow to solve the mismanagement crisis
How to solve the mismanagement crisis
 
How Hadoop Exploits Data Locality
How Hadoop Exploits Data LocalityHow Hadoop Exploits Data Locality
How Hadoop Exploits Data Locality
 
Hive Apachecon 2008
Hive Apachecon 2008Hive Apachecon 2008
Hive Apachecon 2008
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 Hive
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
A short introduction to Vertica
A short introduction to VerticaA short introduction to Vertica
A short introduction to Vertica
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 

Similar a Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera

Yarn cloudera-kathleenting061414 kate-ting
Yarn cloudera-kathleenting061414 kate-tingYarn cloudera-kathleenting061414 kate-ting
Yarn cloudera-kathleenting061414 kate-tingData Con LA
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Building an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen TingBuilding an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen Tingjaxconf
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaSpark Summit
 
HBase tales from the trenches
HBase tales from the trenchesHBase tales from the trenches
HBase tales from the trencheswchevreuil
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsmarkgrover
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACKristofferson A
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsmarkgrover
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxAlex Moundalexis
 
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRBig Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRVijay Rayapati
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)srisatish ambati
 
Monitor some of the things
Monitor some of the thingsMonitor some of the things
Monitor some of the thingsandertech
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanNarayana B
 

Similar a Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera (20)

Yarn cloudera-kathleenting061414 kate-ting
Yarn cloudera-kathleenting061414 kate-tingYarn cloudera-kathleenting061414 kate-ting
Yarn cloudera-kathleenting061414 kate-ting
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Building an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen TingBuilding an Impenetrable ZooKeeper - Kathleen Ting
Building an Impenetrable ZooKeeper - Kathleen Ting
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
 
HBase tales from the trenches
HBase tales from the trenchesHBase tales from the trenches
HBase tales from the trenches
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRBig Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
 
Top ten-list
Top ten-listTop ten-list
Top ten-list
 
Monitor some of the things
Monitor some of the thingsMonitor some of the things
Monitor some of the things
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_Plan
 

Más de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Más de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Último (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera

  • 1. Hadoop Troubleshooting 101 Kate Ting, Cloudera Ariel Rabkin, Cloudera/UC Berkeley
  • 2. How to Destroy Your Cluster with 7 Misconfigurations
  • 3. Agenda • Ticket Breakdown • What are Misconfigurations? • Memory Mismanagement • TT OOME • JT OOME • Native Threads • Thread Mismanagement • Fetch Failures • Replicas • Disk Mismanagement • No File • Too Many Files • Cloudera Manager
  • 4. Agenda • Ticket Breakdown • What are Misconfigurations? • Memory Mismanagement • TT OOME • JT OOME • Native Threads • Thread Mismanagement • Fetch Failures • Replicas • Disk Mismanagement • No File • Too Many Files • Cloudera Manager
  • 5. Ticket Breakdown • No one issue was more than 2% of tickets • First symptom ≠ root cause • Bad configuration goes undetected
  • 6. Breakdown of Tickets by Component
  • 7. Agenda • Ticket Breakdown • What are Misconfigurations? • Memory Mismanagement • TT OOME • JT OOME • Native Threads • Thread Mismanagement • Fetch Failures • Replicas • Disk Mismanagement • No File • Too Many Files • Cloudera Manager
  • 8. What are Misconfigurations? • Any diagnostic ticket requiring a change to Hadoop or to OS config files • Comprise 35% of tickets • e.g. resource-allocation: memory, file-handles, disk-space
  • 9. Problem Tickets by Category and Time
  • 10. Why Care About Misconfigurations?
  • 11. The life of an over-subscribed MR/HBase cluster is nasty, brutish, and short. (with apologies to Thomas Hobbes)
  • 12. Oversubscription of MR heap caused swap. Swap caused RegionServer to time out and die. Dead RegionServer caused MR tasks to fail until MR job died.
  • 13. Whodunit? A Faulty MR Config Killed HBase
  • 14. Who Are We? • Kate Ting o Customer Operations Engineer, Cloudera o 9 months troubleshooting 100+ Hadoop clusters o kate@cloudera.com • Ariel Rabkin o Customer Operations Tools Team, Intern, Cloudera o UC Berkeley CS PhD candidate o Dissertation on diagnosing configuration bugs o asrabkin@eecs.berkeley.edu
  • 15. Agenda • Ticket Breakdown • What are Misconfigurations? • Memory Mismanagement • TT OOME • JT OOME • Native Threads • Thread Mismanagement • Fetch Failures • Replicas • Disk Mismanagement • No File • Too Many Files • Cloudera Manager
  • 16. 1. Task Out Of Memory Error FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask .java:781) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350 ) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170)
  • 17. 1. Task Out Of Memory Error • What does it mean? o Memory leak in task code • What causes this? o MR task heap sizes will not fit • How can it be resolved? o Set io.sort.mb < mapred.child.java.opts o e.g. 512M for io.sort.mb and 1G for mapred.child.java.opts  io.sort.mb = size of map buffer; smaller buffer => more spills o Use fewer mappers & reducers  mappers = # of cores on node  if not using HBase  if # disks = # cores  4:3 mapper:reducer ratio
  • 18. (Mappers + Reducers)* Child Task Heap + DN heap + Total RAM TT heap + 3GB + RS heap + Other Services' heap
  • 19. 2. JobTracker Out of Memory Error ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProgress.java:122) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:653) at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3965) at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskIniti alizationListener.java:79) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:8 86) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662)
  • 20. 2. JobTracker Out of Memory Error • What does it mean? o Total JT memory usage > allocated RAM • What causes this? o Tasks too small o Too much job history
  • 21. 2. JobTracker Out of Memory Error • How can it be resolved? o Verify by summary output: sudo -u mapred jmap -J-d64 -histo:live <PID of JT> o Increase JT's heap space o Separate JT and NN o Decrease mapred.jobtracker.completeuserjobs.maximum to 5  JT cleans up mem more often, reducing needed RAM
  • 22. 3. Unable to Create New Native Thread ERROR mapred.JvmManager: Caught Throwable in JVMRunner. Aborting TaskTracker. java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:640) at org.apache.hadoop.util.Shell.runCommand(Shell.java:234) at org.apache.hadoop.util.Shell.run(Shell.java:182) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375) at org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.j ava:127) at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild (JvmManager.java:472) at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(Jvm Manager.java:446)
  • 23. 3. Unable to Create New Native Thread • What does it mean? o DN show up as dead even though processes are still running on those machines • What causes this? o nproc default of 4200 processes is too low • How can it be resolved? o Adjust /etc/security/limits.conf:  hbase soft/hard nproc 50000  hdfs soft/hard nproc 50000  mapred soft/hard nproc 50000  Restart DN, TT, JT, NN
  • 24. Agenda • Ticket Breakdown • What are Misconfigurations? • Memory Mismanagement • TT OOME • JT OOME • Native Threads • Thread Mismanagment • Fetch Failures • Replicas • Disk Mismanagement • No File • Too Many Files • Cloudera Manager
  • 25. 4. Too Many Fetch-Failures INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task:
  • 26. 4. Too Many Fetch-Failures • What does it mean? o Reducer fetch operations fail to retrieve mapper outputs o Too many fetch failures occur on a particular TT o => blacklisted TT • What causes this? o DNS issues o Not enough http threads on the mapper side for the reducers o JVM bug
  • 27. 4. Too Many Fetch-Failures • How can it be resolved? o mapred.reduce.slowstart.completed.maps = 0.80  Allows reducers from other jobs to run while a big job waits on mappers o tasktracker.http.threads = 80  Specifies # threads used by the TT to serve map output to reducers o mapred.reduce.parallel.copies = SQRT(NodeCount) with a floor of 10  Specifies # parallel copies used by reducers to fetch map output o Stop using 6.1.26 Jetty, which is fetch-failure prone and upgrade to CDH3u2 (MAPREDUCE-2980), MAPREDUCE-3184
  • 28. 5. Not Able to Place Enough Replicas WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas
  • 29. 5. Not Able to Place Enough Replicas • What does it mean? o NN is unable to choose some DNs
  • 30. 5. Not Able to Place Enough Replicas • What causes this? o dfs replication > # avail DNs o # avail DNs is low due to low disk space o mapred.submit.replication default of 10 is too high o NN is unable to satisfy block replacement policy o If # racks > 2, a block has to exist in at least 2 racks o DN being decommissioned o DN has too much load on it o Block-size unreasonably large o Not enough xcievers threads  Default 256 threads that DN can manage is too low  Note the accidental misspelling
  • 31. 5. Not Able to Place Enough Replicas • How can it be resolved? o Set dfs.datanode.max.xcievers = 4096 then restart DN o Look for nodes down (or rack down) o Check disk space o Log dir may be filling up or a runaway task may fill up an entire disk o Rebalance
  • 32. Agenda • Ticket Breakdown • What are Misconfigurations? • Memory Exhaustion • TT OOME • JT OOME • Native Threads • Thread Exhaustion • Fetch Failures • Replicas • Data Space Exhaustion • No File • Too Many Files • Cloudera Manager
  • 33. 6. No Such File or Directory ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because ENOENT: No such file or directory at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java: 496) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189) at org.apache.hadoop.mapred.TaskTracker.initializeDirectories(TaskTracker.java:666) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:734) at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1431) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3521)
  • 34. Total Storage MR space DFS space
  • 35. 6. No Such File or Directory • What does it mean? o TT failing to start or jobs are failing • What causes this? o Diskspace filling up on the TT disk o Permissions on the directories that TT writes to • How can it be resolved?  Job logging dir dfs.datanode.du.reserved = 10% total vol  Permissions should be 755 and owner should be mapred for userlogs and mapred.local.dir
  • 36. 7. Too Many Open Files ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(<ip address>:50010, storageID=<storage id>, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Too many open files at sun.nio.ch.IOUtil.initPipe(Native Method) at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49) at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readShort(DataInputStream.java:295) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:97)
  • 37. 7. Too Many Open Files • What does it mean? o Hitting open file handles limit of user account running Hadoop • What causes this? o Nofile default of 1024 files is too low • How can it be resolved? o Adjust /etc/security/limits.conf:  hdfs - nofile 32768  mapred - nofile 32768  hbase - nofile 32768  Restart DN, TT, JT, NN
  • 38. Agenda • Ticket Breakdown • What are Misconfigurations? • Memory Mismanagement • TT OOME • JT OOME • Native Threads • Thread Mismanagement • Fetch Failures • Replicas • Disk Mismanagement • No File • Too Many Files • Cloudera Manager
  • 39.
  • 40. Additional Resources • Avoiding Common Hadoop Administration Issues: http://www.cloudera.com/blog/2010/08/avoiding-common-hadoop- administration-issues/ • Tips for Improving MapReduce Performance: http://www.cloudera.com/blog/2009/12/7-tips-for-improving- mapreduce-performance/ • Basic Hardware Recommendations: http://www.cloudera.com/blog/2010/03/clouderas-support-team- shares-some-basic-hardware-recommendations/ • Cloudera Knowledge Base: http://ccp.cloudera.com/display/KB/Knowledge+Base
  • 41. Acknowledgements Adam Warrington Amr Awadallah Angus Klein Brock Noland Clint Heath Eric Sammer Esteban Gutierrez Jeff Bean Joey Echeverria Laura Gonzalez Linden Hillenbrand Omer Trajman Patrick Angeles
  • 42. Takeaways • Configuration is up to you. Bugs are up to the community. • Misconfigurations are hard to diagnose. • Get it right the first time with Cloudera Manager.

Notas del editor

  1. You are sitting here because you know that misconfigurations and bugs break the most clusters. Fixing bugs is up to the community. Fixing misconfigurations is up to you. I guarantee if you wait an hour before walking out that you will leave with the know-how to get your configuration right the first time.
  2. hadoop errors show up far from where you configured so don&apos;t know what log files to analyze
  3. if your cluster is broken, probably a misconfiguration. this is a hard prob b/c the err msgs are not tightly tied to the root cause
  4. @tlipcon if sum(max heap size) &gt; physical RAM - 3GB, go directly to jail. do not pass go. do not collect $200.
  5. why not fix the defaults? b/c they are not static defaults, they are not all hadoop defaults but also OS defaults
  6. resource allocation is hard because the place where you are misallocating is far from where you ran out e.g. fetch failures could be due to system, config, or bug but not sure what from the error msg and you don&apos;t know which misconfig from err msg also ff doesn&apos;t show up on the node where it&apos;s configured
  7. one large cluster doesn&apos;t necessarily show all the different dependencies;cloudera has seen a lot of diverse operation clusters.part of what cloudera does is provide tools to help diagnose and understand how hadoop operates
  8. Not biased or anything but you can get it right the first time with ClouderaMgr