Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonski, Dell & Vin Sharma, Intel

Proven Tools to
Simplify Hadoop Environments

Joey Jablonski & Vin Sharma

Intros and Bios
• Joey Jablonski
– Dell Principal Solution Architect
– http://www.linkedin.com/in/joeyjablonski
– http://mergingbusinessandit.com
• Vin Sharma
– Intel Open Source Enterprise Strategist
– http://www.linkedin.com/in/c1f3rt3xt
– http://www.intel.com/opensource

Agenda
• Why Hadoop is difficult for IT to operate
• How the right tools can make this easier
– Deployment & Configuration with Dell Crowbar
– Monitoring and Management with Dell Barclamps
– Performance Tuning with Intel HiTune

Hadoop Operational Models
Traditional Datacenter
• Assigned Servers
• Rigid Policies
• Tiered Software
Cloud Infrastructure
• Elastic Resources
• Services (APIs)
• Distributed Software

Operational Challenges
• Deployment
– Complex because of scale (60 nodes to 1000 nodes)
– Cumbersome because of high-touch processes
• Configuration & Tuning
– Error-prone configuration management
– State management
• Monitoring and Management
– Complex troubleshooting and diagnostics
– No proactive notification of problems
• Performance Optimization
– Limitations of traditional tools

Three aspects of revolutionary clouds
• Ecosystem + API
• Cloud = Operations
• Black Box
[Diagram: CloudOps at the intersection of Ops, HW, and SW]

Two Sides of Cloud
[Diagram labels: APIs, Cloud Ops, O/S, Physical]

Images vs. Layers: Overview
• Images: Single Unit
– Configuration + Integrations + Applications + Utilities + Operating System, bundled as one piece
• Layers: Stacked Pieces
– Configuration
– Integrations
– Application Foo
– Application Bar
– Utilities
– Operating System

Images vs. Layers: Lifecycle
• Images: Replacement
– The whole image (I+A+U+O/S), together with its Config, is swapped for a new one
• Layers: Upgrade
– A single layer changes (Bar v1 to Bar v2) while Config, I, Foo, U, and OS stay in place

Modular Design: Barclamps
• APIs, User Access, & Ecosystem Partners
– Ops Management: Nagios, Ganglia, Dashboard
• Cloud Infrastructure & Dell IP Extensions
– Hadoop, Dell “Crowbar”
• Core Components & Operating Systems
– Crowbar, DNS, Logging, Deployer, NTP, Provisioner
• Physical Resources
– BIOS, IPMI, Network, RAID

Crowbar = Install State Machine
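
The "install state machine" idea can be illustrated with a minimal sketch. This is not Crowbar's code: the state names, the `Node` class, and the `install` helper are all hypothetical, chosen only to show a node being walked through a fixed sequence of install states.

```python
from enum import Enum


class NodeState(Enum):
    # Hypothetical state names for illustration; not Crowbar's actual states.
    DISCOVERED = "discovered"
    HARDWARE_CONFIGURED = "hardware_configured"
    OS_INSTALLED = "os_installed"
    READY = "ready"


# Each state maps to the single state that may follow it.
TRANSITIONS = {
    NodeState.DISCOVERED: NodeState.HARDWARE_CONFIGURED,
    NodeState.HARDWARE_CONFIGURED: NodeState.OS_INSTALLED,
    NodeState.OS_INSTALLED: NodeState.READY,
}


class Node:
    def __init__(self, name):
        self.name = name
        self.state = NodeState.DISCOVERED
        self.history = [self.state]

    def advance(self):
        """Move to the next install state, or raise if there is none."""
        if self.state not in TRANSITIONS:
            raise RuntimeError(f"{self.name} is already in its final state")
        self.state = TRANSITIONS[self.state]
        self.history.append(self.state)
        return self.state


def install(node):
    """Drive a node through every remaining state until it is ready."""
    while node.state is not NodeState.READY:
        node.advance()
    return node.history
```

Encoding the allowed transitions as data keeps the install order in one place, so adding a step means adding one table entry rather than editing control flow.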

Cloud = Ops

We have capable hardware and software; the real question is: how are we going to operate it as a service?

• This is CloudOps
• Software mindset to infrastructure
• Software is constantly changing
• Fluid resources instead of servers
• Manual touch is unacceptable

Ultimately, all the rules for operating the data center become encoded as automation software.

Platform Selection
Dell PowerEdge C2100 for Hadoop based on Intel® Xeon®

Dell PowerEdge C2100
• Designed with big data in mind
• Compact 2U form factor
• 2-socket 6-core
• Intel® Xeon® 5620 processor
• High performance memory system
• Expansive disk storage

Recommended Configuration
• Intel Xeon Processor 5600 series
• 4-6 1TB or 2TB 7200 RPM SATA drives
• 12-24GB DDR3 R-ECC RAM
• 1-2 dual-port 1GigE
• Linux kernel 2.6.30 or later
• Sun Java 6u14 or later
• Hadoop version 0.20.x or later

Intel Whitepaper: “Optimizing Hadoop Deployments” (http://software.intel.com/file/31124)

So what seems to be the problem?
• Dataflow and high level abstraction make it difficult to understand runtime behaviors
• Large distributed system makes it difficult to correlate concurrent performance-related activities

HiTune: Hadoop Performance Analyzer
• Collects metrics from each node
• Aggregates data using Chukwa
• Analyzes results using Hadoop
• Generates reports for visualization

Data collected and reported
• System metrics (CPU, Disk I/O, Network I/O, Memory)
• Hadoop metrics (NameNode, DataNode, JobTracker, TaskTracker, JVM metrics)
• Dataflow-based statistics (Job, Map Tasks, Reduce Tasks, thread dumps for M/R)
• Summary view of a single job
• Summary view comparing multiple jobs

Apache 2.0 License
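
The collect-then-aggregate flow can be sketched in a few lines. This is not HiTune or Chukwa code: the `(node, metric, value)` sample shape and both function names are assumptions made for illustration, and plain averaging stands in for whatever aggregation the real tool performs.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples):
    """Average (node, metric, value) samples per node and metric."""
    grouped = defaultdict(list)
    for node, metric, value in samples:
        grouped[(node, metric)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def cluster_summary(per_node_means):
    """Average the per-node means to get one figure per metric."""
    by_metric = defaultdict(list)
    for (_node, metric), value in per_node_means.items():
        by_metric[metric].append(value)
    return {metric: mean(values) for metric, values in by_metric.items()}


# Hypothetical samples: CPU utilisation in percent from two nodes.
samples = [
    ("node-01", "cpu", 40),
    ("node-01", "cpu", 60),
    ("node-02", "cpu", 90),
]
per_node = aggregate(samples)
summary = cluster_summary(per_node)
```

Keeping the per-node means separate from the cluster summary preserves the ability to spot a single outlier node, which a cluster-wide average alone would hide.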

HiTune Architecture
• Tracker
– Lightweight agent running on each node in the Hadoop cluster
– Sysstat, Hadoop logs and metrics, Java instrumentation
• Aggregation engine
– Merges the results of all the trackers in a distributed fashion
• Analysis engine
– Generates reports based on data flow model

[Diagram: task samplers and a tracker on each node feed the aggregation engine and then the analysis engine; a specification file and a dataflow diagram are also shown]

Case Study
[Diagram: MapReduce dataflow. Partitioned input data goes to the map tasks (map, spill), then to the reduce tasks (shuffle copier, merge sort, reduce), which produce the aggregated output. Stages are labelled as streaming dataflow or sequential dataflow.]

Terasort with zlib
• Large gap between end of map and end of shuffle
• No CPU, I/O, or network bandwidth bottlenecks
• Adding copiers does not help: “shuffle fetchers busy percent” stays at 100
• Copier threads are idle 80% of the time, waiting for memory merge threads
• Memory merge threads are busy mostly due to compression
Terasort with LZO
• Changing compression codec to LZO closes the gap
• Improves job running time by 2.3x
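
A codec change like this is a configuration change. The sketch below generates the relevant properties, assuming the codec in question is the one applied to intermediate map output (the slides do not say so explicitly). The property names are those used by Hadoop 0.20.x; the LZO class comes from the separately distributed hadoop-lzo library, which must already be installed on the cluster. The helper name `build_compression_config` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Codec class from the separately installed hadoop-lzo library (assumed present).
LZO_CODEC = "com.hadoop.compression.lzo.LzoCodec"


def build_compression_config(codec_class):
    """Return a Hadoop <configuration> fragment enabling map output compression."""
    root = ET.Element("configuration")
    settings = {
        "mapred.compress.map.output": "true",
        "mapred.map.output.compression.codec": codec_class,
    }
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_compression_config(LZO_CODEC)
print(xml_text)
```

The printed properties would go into mapred-site.xml or a per-job configuration; whether the change helps on a given cluster is worth confirming with a before-and-after measurement.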

Have at it
• Pull Crowbar
– https://github.com/dellcloudedge/crowbar

• Pull HiTune
– https://github.com/HiTune/HiTune
