Hadoop 2.2.0
Hadoop grows up
Adam Muise

Rob Ford says…

…turn off your #*@!#%!!! Mobile Phones!
YARN
Yet Another Resource Negotiator

A new abstraction layer
Single Use System → Multi Purpose Platform
Batch Apps → Batch, Interactive, Online, Streaming, …

[Diagram: the Hadoop 1.0 and Hadoop 2.0 stacks side by side]
HADOOP 1.0: MapReduce (cluster resource management & data processing) on top of HDFS (redundant, reliable storage).
HADOOP 2.0: MapReduce and other frameworks (data processing) on top of YARN (cluster resource management) on top of HDFS2 (redundant, reliable storage).
Concepts
• Application
– An application is a job submitted to the framework
– Example: a MapReduce job

• Container
– Basic unit of allocation
– Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
–  container_0 = 2 GB, 1 CPU
–  container_1 = 1 GB, 6 CPU
– Replaces the fixed map/reduce slots (see the sketch below)
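As a concrete illustration of the container concept, here is a minimal sketch of an ApplicationMaster written against the Hadoop 2.2 YARN client API asking the ResourceManager for a 2 GB / 1 vcore container. The class name, host name, and single-request structure are illustrative only and are not part of the deck.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new YarnConfiguration());
    rmClient.start();

    // Register this ApplicationMaster with the ResourceManager (host/port/URL are illustrative).
    rmClient.registerApplicationMaster("appmaster-host", 0, "");

    // Ask for one container of 2048 MB and 1 virtual core -- the "container_0 = 2 GB, 1 CPU"
    // from the slide. No fixed map or reduce slots are involved.
    Resource capability = Resource.newInstance(2048, 1);
    ContainerRequest request =
        new ContainerRequest(capability, null, null, Priority.newInstance(0));
    rmClient.addContainerRequest(request);

    // A real AM would now loop on rmClient.allocate(progress) and launch the granted
    // containers through an NMClient; that part is omitted here.
  }
}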
YARN Architecture
• Resource Manager
– Global resource scheduler
– Hierarchical queues

• Node Manager
– Per-machine agent
– Manages the life-cycle of containers
– Container resource monitoring

• Application Master
– Per-application
– Manages application scheduling and task execution
– E.g. MapReduce Application Master
YARN Architecture - Walkthrough
[Diagram: the ResourceManager and its Scheduler coordinating two client applications across a cluster of NodeManagers. Each application gets its own ApplicationMaster (AM1, AM2) running in a container on a NodeManager; the AMs negotiate further containers (Container 1.1 to 1.3 for application 1, Container 2.1 to 2.4 for application 2), which the per-node NodeManagers launch and monitor.]
YARN as OS for Data Lake
[Diagram: one ResourceManager/Scheduler serving mixed workloads on the same NodeManagers. Batch MapReduce tasks (map 1.1, map 1.2, reduce 1.1), Interactive SQL Tez vertices (vertex 1.1.1, 1.1.2, 1.2.1, 1.2.2), and Real-Time Storm processes (nimbus0, nimbus1, nimbus2) all run side by side in YARN containers.]
Multi-Tenant YARN
[Diagram: the ResourceManager Scheduler with a hierarchical queue tree under root. Recoverable labels: Mrkting 30%, Dev 20%, Adhoc 10%, Prod 80%, DW 60%, a Dev/Reserved/Prod split of 10%/20%/70%, and priority queues P0 70% / P1 30%.]
Multi-Tenancy with New Capacity Scheduler
•  Queues
•  Economics as queue capacity
–  Hierarchical queues

•  SLAs
–  Preemption

•  Resource Isolation
–  Linux: cgroups
–  MS Windows: Job Control
–  Roadmap: virtualization (Xen, KVM)

•  Administration
–  Queue ACLs
–  Run-time re-configuration for queues
–  Charge-back

[Diagram: the ResourceManager Scheduler running the Capacity Scheduler with hierarchical queues: root split into Mrkting 20%, Dev 20%, Adhoc 10%, DW 70%, Prod 80%, with a Dev/Reserved/Prod split of 10%/20%/70% and priority queues P0 70% / P1 30%.]
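For illustration only, a queue tree like the one above boils down to yarn.scheduler.capacity.* properties that normally live in capacity-scheduler.xml on the ResourceManager. The sketch below borrows queue names from the slide (the exact nesting and percentages are assumptions) and simply uses the Hadoop Configuration API to print such a fragment; it is not how the scheduler is configured at runtime.

import org.apache.hadoop.conf.Configuration;

public class CapacitySchedulerSketch {
  public static void main(String[] args) throws Exception {
    // Start from an empty Configuration so only our queue definitions are emitted.
    Configuration conf = new Configuration(false);

    // Top-level queues under root, with capacities summing to 100%.
    conf.set("yarn.scheduler.capacity.root.queues", "adhoc,dw,mrkting");
    conf.set("yarn.scheduler.capacity.root.adhoc.capacity", "10");
    conf.set("yarn.scheduler.capacity.root.dw.capacity", "70");
    conf.set("yarn.scheduler.capacity.root.mrkting.capacity", "20");

    // Nested queues, e.g. dev/prod under dw, mirroring the hierarchical diagram.
    conf.set("yarn.scheduler.capacity.root.dw.queues", "dev,prod");
    conf.set("yarn.scheduler.capacity.root.dw.dev.capacity", "20");
    conf.set("yarn.scheduler.capacity.root.dw.prod.capacity", "80");

    // Queue ACLs, preemption and run-time reconfiguration are driven by further
    // yarn.scheduler.capacity.<queue-path>.* properties.
    conf.writeXml(System.out); // prints a capacity-scheduler.xml style fragment
  }
}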
MapReduce v2
Changes to MapReduce on YARN

MapReduce V2 is a library now…
•  MapReduce runs on YARN like all other Hadoop 2.x applications
–  Gone are the map and reduce slots, that’s up to containers in YARN now
–  Gone is the JobTracker, replaced by the YARN AppMaster library

•  Multiple versions of MapReduce
–  The older mapred APIs work without modification or recompilation
–  The newer mapreduce APIs may need to be recompiled

•  Still has one master server component: the Job History Server
–  The Job History Server stores the execution history of jobs
–  Used to audit prior execution of jobs
–  Will also be used by the YARN framework to store charge-back information at that level
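A minimal sketch of what "MapReduce as a library on YARN" means from the client side: the familiar mapreduce API, with the framework pointed at YARN rather than a JobTracker. The class name and paths are placeholders; with no mapper or reducer set, the default identity classes simply copy the input, which is enough to watch the MR ApplicationMaster and its containers appear in YARN.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitOnYarn {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "yarn" makes the client talk to the ResourceManager; there is no JobTracker anymore.
    conf.set("mapreduce.framework.name", "yarn");

    Job job = Job.getInstance(conf, "identity-copy-on-yarn");
    job.setJarByClass(SubmitOnYarn.class);
    // Default identity Mapper/Reducer: the job just copies its input to its output.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}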
Shuffle in MapReduce v2
•  Faster Shuffle
–  Better embedded server: Netty

•  Encrypted Shuffle
–  Secures the shuffle phase as data moves across the cluster
–  Requires 2-way HTTPS with certificates on both sides
–  Incurs significant CPU overhead; reserve 1 core for this work
–  Certs stored on each node (provisioned with the cluster), refreshed every 10 seconds

•  Pluggable Shuffle/Sort
–  Shuffle is the first phase in MapReduce that is guaranteed not to be data-local
–  Pluggable Shuffle/Sort allows intrepid application or hardware developers to intercept the network-heavy workload and optimize it
–  Typical implementations have hardware components like fast networks and software components like sorting algorithms
–  The API will change with future versions of Hadoop
Efficiency Gains of MRv2
•  Key Optimizations
–  No hard segmentation of resources into map and reduce slots
–  The YARN scheduler is more efficient
–  The MRv2 framework has become more efficient than MRv1; the shuffle phase in MRv2 is more performant thanks to Netty

•  Yahoo has over 30,000 nodes running YARN across over 365 PB of data.
•  They calculate running about 400,000 jobs per day for about 10 million hours of compute time.
•  They also estimate a 60% – 150% improvement in node usage per day.
•  Yahoo got rid of a whole colo (a 10,000-node datacenter) because of the increased utilization.
HDFS v2
In a Nutshell

HA (NameNode High Availability)
HDFS Snapshots: Feature Overview
•  Admin can create point in time snapshots of HDFS
–  Of the entire file system (/root)
–  Of a specific data-set (sub-tree directory of file system)

•  Restore state of entire file system or data-set to a snapshot (like Apple
Time Machine)
–  Protect against user errors

•  Snapshot diffs identify changes made to data set
–  Keep track of how raw or derived/analytical data changes over time
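A hedged sketch of the snapshot workflow through the Hadoop 2 FileSystem API, assuming an administrator has already made the directory snapshottable (hdfs dfsadmin -allowSnapshot /data/raw); the path and snapshot name are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dataset = new Path("/data/raw");

    // Point-in-time snapshot of the data set; readable under /data/raw/.snapshot/before-reload
    Path snapshot = fs.createSnapshot(dataset, "before-reload");
    System.out.println("Created snapshot at " + snapshot);

    // ... run a risky reload or cleanup job here; if it goes wrong, restore by copying
    // back from the .snapshot directory (protection against user error).

    fs.deleteSnapshot(dataset, "before-reload");
  }
}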

NFS Gateway: Feature Overview
•  NFS v3 standard
•  Supports all HDFS commands
–  List files
–  Copy, move files
–  Create and delete directories

•  Ingest for large scale analytical workloads
–  Load immutable files as source for analytical processing
–  No random writes

•  Stream files into HDFS
–  Log ingest by applications writing directly to HDFS client mount

Federation

Managing Namespaces

Performance

Other Features

Apache Tez
A New Hadoop Data Processing Framework

Moving Hadoop Beyond MapReduce
•  Low-level data-processing execution engine
•  Built on YARN
•  Enables pipelining of jobs
•  Removes task and job launch times
•  Does not write intermediate output to HDFS
–  Much lighter disk and network usage

•  New base for MapReduce, Hive, Pig, Cascading, etc.
•  Hive and Pig jobs no longer need to move to the end of the queue between steps in the pipeline
Apache Tez as the new Primitive
MapReduce as Base (HADOOP 1.0) vs. Apache Tez as Base (HADOOP 2.0)

[Diagram: the two stacks side by side]
HADOOP 1.0: Batch workloads, with Pig (data flow), Hive (SQL) and others (e.g. Cascading) all compiling to MapReduce (cluster resource management & data processing) on HDFS (redundant, reliable storage).
HADOOP 2.0: Pig (data flow), Hive (SQL) and others (Cascading) run on Tez (execution engine); alongside them sit online data processing (HBase, Accumulo) and real-time stream processing (Storm); everything runs on YARN (cluster resource management) over HDFS2 (redundant, reliable storage).
Hive-on-MR vs. Hive-on-Tez
Tez avoids unneeded writes to HDFS.

SELECT a.x, AVERAGE(b.y) AS avg
FROM a JOIN b ON (a.id = b.id) GROUP BY a
UNION SELECT x, AVERAGE(y) AS AVG
FROM c GROUP BY x
ORDER BY AVG;

[Diagram: query plans for the statement above]
Hive on MR: the query runs as a chain of separate MapReduce jobs (SELECT a.state / SELECT b.id / SELECT c.price, JOIN(a, b), JOIN(a, c), GROUP BY a.state with COUNT(*) and AVERAGE(c.price)), and each job writes its intermediate result to HDFS before the next one starts.
Hive on Tez: the same work runs as a single DAG of map and reduce vertices, and intermediate results flow between vertices without being written to HDFS.
Apache Tez (“Speed”)
•  Replaces MapReduce as the primitive for Pig, Hive, Cascading, etc.
– Lower latency for interactive queries
– Higher throughput for batch queries
– 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft

Tez Task: a task with pluggable Input, Processor and Output, i.e. Tez Task = <Input, Processor, Output>.
A YARN ApplicationMaster runs a DAG of Tez Tasks.
Tez: Building blocks for scalable data processing
[Diagram: classical MapReduce stages expressed as Tez <Input, Processor, Output> building blocks]
Classical 'Map': HDFS Input → Map Processor → Sorted Output.
Classical 'Reduce': Shuffle Input → Reduce Processor → HDFS Output.
Intermediate 'Reduce' (for Map-Reduce-Reduce chains): Shuffle Input → Reduce Processor → Sorted Output.
Hive

SQL: Enhancing SQL Semantics
Hive SQL Datatypes: INT; TINYINT / SMALLINT / BIGINT; BOOLEAN; FLOAT; DOUBLE; STRING; TIMESTAMP; BINARY; DECIMAL; ARRAY, MAP, STRUCT, UNION; DATE; VARCHAR; CHAR.

Hive SQL Semantics: SELECT, INSERT; GROUP BY, ORDER BY, SORT BY; JOIN on explicit join key; inner, outer, cross and semi joins; sub-queries in the FROM clause; ROLLUP and CUBE; UNION; windowing functions (OVER, RANK, etc.); custom Java UDFs; standard aggregation (SUM, AVG, etc.); advanced UDFs (ngram, XPath, URL); sub-queries in WHERE, HAVING; expanded JOIN syntax; SQL-compliant security (GRANT, etc.); INSERT/UPDATE/DELETE (ACID).

SQL Compliance: Hive 0.12 provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop.

(The slide color-codes each item as either available in Hive 0.12 or on the roadmap.)
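For example, the windowing semantics listed above are reachable from ordinary JDBC tools through HiveServer2. A hedged sketch using the Hive 0.12 JDBC driver follows; the host, table, and columns are made up for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveWindowingSketch {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");          // HiveServer2 driver
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://hiveserver2-host:10000/default", "hdfs", "");

    // RANK() OVER (...) exercises the windowing functions listed above.
    String sql = "SELECT state, city, population, "
               + "RANK() OVER (PARTITION BY state ORDER BY population DESC) AS rnk "
               + "FROM cities";
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(sql);
    while (rs.next()) {
      System.out.println(rs.getString("state") + "\t" + rs.getString("city")
          + "\t" + rs.getLong("rnk"));
    }
    conn.close();
  }
}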
  
SPEED: Increasing Hive Performance
Interactive Query Times across ALL use cases
•  Simple and advanced queries in seconds
•  Integrates seamlessly with existing tools
•  Currently a >100x improvement in just nine months
Performance improvements included in Hive 0.12:
–  Base & advanced query optimization
–  Startup time improvement
–  Join optimizations
Apache Tez as the new Primitive
(Repeat of the earlier "MapReduce as Base vs. Apache Tez as Base" diagram: Hadoop 1.0 runs Pig, Hive and others on MapReduce over HDFS; Hadoop 2.0 runs them on Tez over YARN and HDFS2, alongside HBase/Accumulo and Storm.)
Hive-on-MR vs. Hive-on-Tez
(Repeat of the earlier "Hive-on-MR vs. Hive-on-Tez" comparison: Hive on MapReduce writes intermediate results to HDFS between jobs, while Hive on Tez runs the query as a single DAG and avoids the unneeded HDFS writes.)
Tez on YARN
[Diagram: a single ResourceManager/Scheduler hosting mixed workloads on the same NodeManagers: Batch MapReduce tasks (map 1.1, map 1.2, reduce 1.1), Hive-on-Tez SQL vertices (vertex 1.1.1, 1.1.2, 1.2.1, 1.2.2), and Real-Time Storm processes (nimbus0, nimbus1, nimbus2), all running in YARN containers.]
Apache Falcon
Data Lifecycle Management for Hadoop

Data Lifecycle on Hadoop is Challenging

Data Management Needs: data processing, replication, retention, scheduling, reprocessing, multi-cluster management.
Tools: Oozie, Sqoop, DistCp, Flume, MapReduce, Hive and Pig jobs.

Problem: a patchwork of tools complicates data lifecycle management.
Result: long development cycles and quality challenges.
Falcon: One-stop Shop for Data Lifecycle
Apache Falcon provides the data management needs (data processing, replication, retention, scheduling, reprocessing, multi-cluster management) and orchestrates the underlying tools (Oozie, Sqoop, DistCp, Flume, MapReduce, Hive and Pig jobs).

Falcon provides a single interface to orchestrate the data lifecycle; sophisticated DLM is easily added to Hadoop applications.
Falcon Core Capabilities
•  Core Functionality
–  Pipeline processing
–  Replication
–  Retention
–  Late data handling

•  Automates
–  Scheduling and retry
–  Recording audit, lineage and metrics

•  Operations and Management
–  Monitoring, management, metering
–  Alerts and notifications
–  Multi Cluster Federation

•  CLI and REST API

Falcon At A Glance
[Diagram: data processing applications sit on top of the Falcon Data Management Framework, which provides Data Import and Replication, Scheduling and Coordination, Data Lifecycle Policies, Multi-Cluster Management, and SLA Management.]

>  Falcon offers a high-level abstraction of key services for Hadoop data management needs.
>  Complex data processing logic is handled by Falcon instead of hard-coded in data processing apps.
>  Falcon enables faster development of ETL, reporting and other data processing apps on Hadoop.
Falcon Example: Replication
[Diagram: a data pipeline of Staged, Cleansed, Conformed, Access and Processed data sets, with Falcon replication copying the staged and processed data to a second cluster.]

>  Falcon manages workflow and replication.
>  Enables business continuity without requiring full data representation.
>  Failover clusters can be smaller than primary clusters.
Falcon Example: Retention

[Diagram: retention policies applied per data set (Staged, Cleansed, Conformed, Access Data), ranging from "retain 20 years" and "retain 3 years" down to "retain last copy only".]

>  Sophisticated retention policies expressed in one place.
>  Simplify data retention for audit, compliance, or for data re-processing.
Falcon Example: Late Data Handling
[Diagram: online transaction data (via Sqoop) and web log data (via FTP) are staged and joined into a combined dataset; the process waits up to 4 hours for the FTP data to arrive.]

>  Processing waits until all required input data is available.
>  Checks for late data arrivals, and retriggers processing as necessary.
>  Eliminates writing complex late-data handling rules within applications.
Examples

Example: Cluster Specification
<?xml version="1.0"?>
<!--
My Local Cluster specification
-->
<cluster colo="my-local-cluster" description="" name="cluster-alpha">
  <interfaces>
    <interface type="readonly"  endpoint="hftp://nn:50070"            version="2.2.0" />
    <interface type="write"     endpoint="hdfs://nn:8020"             version="2.2.0" />
    <interface type="execute"   endpoint="rm:8050"                    version="2.2.0" />
    <interface type="workflow"  endpoint="http://os:11000/oozie/"     version="4.0.0" />
    <interface type="messaging" endpoint="tcp://mq:61616?daemon=true" version="5.1.6" />
  </interfaces>
  <locations>
    <location name="staging" path="/apps/falcon/cluster-alpha/staging" />
    <location name="temp"    path="/tmp" />
    <location name="working" path="/apps/falcon/cluster-alpha/working" />
  </locations>
</cluster>

Callouts on the slide: the readonly and write interfaces point at the NameNode (nn), the execute interface at the ResourceManager (rm), and the workflow interface at the Oozie server (os).
Example: Weblogs
Replication and Retention

Example 1: Weblogs
•  Weblogs land hourly in my primary cluster
•  HDFS location is /weblogs/{date}

•  I want to:
–  Evict weblogs from primary cluster after 1 day

Feed Specification 1: Weblogs
<feed description="" name="feed-weblogs1" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <cluster name="cluster-primary" type="source">
      <validity start="2013-10-24T00:00Z" end="2014-12-31T00:00Z"/>
      <retention limit="days(1)" action="delete"/>
    </cluster>
  </clusters>

  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}" />
  </locations>

  <ACL owner="hdfs" group="users" permission="0755" />
  <schema location="/none" provider="none"/>
</feed>

Callouts on the slide: the cluster element names the cluster where the data is located, the retention element sets the 1-day retention policy, and the location element gives the location of the data.
Example 2: Weblogs
•  Weblogs land hourly in my primary cluster
•  HDFS location is /weblogs/{date}

•  I want to:
–  Replicate weblogs to my secondary cluster
–  Evict weblogs from primary cluster after 2 days
–  Evict weblogs from secondary cluster after 1 week

Feed Specification 2: Weblogs
<feed description="" name="feed-weblogs2" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <cluster name="cluster-primary" type="source">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <cluster name="cluster-secondary" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(7)" action="delete"/>
    </cluster>
  </clusters>

  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>

  <ACL owner="hdfs" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>

Callouts on the slide: the source cluster is where the data is located (2-day retention), the target cluster is where the data will be replicated (1-week retention), and the location element gives the location of the data.
Example 3: Weblogs
•  Weblogs land hourly in my primary cluster
•  HDFS location is /weblogs/{date}

•  I want to:
–  Replicate weblogs to a discovery cluster
–  Replicate weblogs to a BCP cluster
–  Evict weblogs from primary cluster after 2 days
–  Evict weblogs from discovery cluster after 1 week
–  Evict weblogs from BCP cluster after 3 months

Feed Specification 3: Weblogs
<feed description="" name="feed-weblogs" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <cluster name="cluster-primary" type="source">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <cluster name="cluster-discovery" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(7)" action="delete"/>
      <locations>
        <location type="data" path="/projects/recommendations/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
      </locations>
    </cluster>
    <cluster name="cluster-bcp" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="months(3)" action="delete"/>
      <locations>
        <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
      </locations>
    </cluster>
  </clusters>

  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>

  <ACL owner="hdfs" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>

Callouts on the slide: each target cluster can override the feed with a cluster-specific location.
Apache Knox
Secure Access to Hadoop

Connecting to the Cluster: Edge Nodes
•  What is an Edge Node?
–  Nodes in a DMZ that have access to the cluster; the only way to reach the cluster
–  Hadoop client APIs and MR/Pig/Hive jobs are executed from these edge nodes
–  Users SSH to the edge node, upload all job artifacts, and then execute API calls and commands from a shell

[Diagram: User → SSH → Edge Node → Hadoop]

• Challenges
– SSH, edge node, and job maintenance nightmare
– Difficult to integrate with applications
Connecting to the Cluster: REST API

REST APIs exposed by the cluster:
–  WebHDFS: HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
–  WebHCat: job control for MapReduce, Pig and Hive jobs, plus HCatalog DDL commands.
–  Oozie: job submission and management, and Oozie administration.

•  Useful for connecting to Hadoop from outside the cluster
•  When more client language flexibility is required
–  i.e. a Java binding is not an option

•  Challenges
–  Client must have knowledge of cluster topology
–  Required to open ports (and in some cases, on every host) outside the cluster
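As a small illustration of the REST route (the NameNode host name is a placeholder), listing a directory over WebHDFS needs nothing more than a plain HTTP client. Note that the client talks directly to a cluster host on its service port, which is exactly the topology and port exposure problem described above.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsListSketch {
  public static void main(String[] args) throws Exception {
    // WebHDFS is served from the NameNode HTTP port (50070 by default in Hadoop 2.2).
    URL url = new URL(
        "http://namenode-host:50070/webhdfs/v1/weblogs?op=LISTSTATUS&user.name=hdfs");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");

    BufferedReader in =
        new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line); // JSON FileStatuses listing for /weblogs
    }
    in.close();
  }
}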
Apache Knox Gateway – Perimeter Security

Simplified Access / Centralized Security:
•  Single Hadoop access point
•  Rationalized REST API hierarchy
•  Eliminate the SSH "edge node"
•  LDAP and Active Directory authentication
•  Consolidated API calls
•  Multi-cluster support
•  Central API management + audit
•  Client DSL
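A hedged sketch of the same WebHDFS listing routed through a Knox gateway instead of the NameNode: one HTTPS endpoint, HTTP Basic credentials checked against LDAP/AD, and no cluster topology exposed to the client. The gateway host, topology name, and credentials below are made up, and a real client would also need to trust the gateway's SSL certificate.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.xml.bind.DatatypeConverter;

public class KnoxWebHdfsSketch {
  public static void main(String[] args) throws Exception {
    // Typical Knox layout: https://<gateway>:8443/gateway/<topology>/webhdfs/v1/<path>
    URL url = new URL(
        "https://knox-gateway:8443/gateway/sandbox/webhdfs/v1/weblogs?op=LISTSTATUS");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();

    // Knox authenticates the call (e.g. against LDAP/AD) before proxying it to the NameNode.
    String credentials =
        DatatypeConverter.printBase64Binary("guest:guest-password".getBytes("UTF-8"));
    conn.setRequestProperty("Authorization", "Basic " + credentials);

    BufferedReader in =
        new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
    for (String line; (line = in.readLine()) != null; ) {
      System.out.println(line); // same JSON listing, with URLs rewritten to point at the gateway
    }
    in.close();
  }
}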
Knox Gateway Network Architecture
[Diagram: a stateless cluster of Knox Gateway reverse-proxy instances deployed in a DMZ between two firewalls. Browser, REST, and JDBC clients authenticate against identity providers (Kerberos / enterprise identity provider, enterprise or cloud SSO) and reach one or more secure Hadoop clusters through the gateway. Behind the gateway sit the cluster masters and workers (NameNode, WebHCat, Oozie, JobTracker/YARN, TaskTrackers, DataNodes, HBase, Hive, Ambari Server / Hue Server). Requests are streamed through the gateway to Hadoop services after authentication, and URLs are rewritten to refer to the gateway.]
Wot no 2.2.0?
Where can I get the Hadoop 2.2.0 fix?

Like the Truth, Hadoop 2.2.0 is out there…
Component          HDP 2.0   CDH4         CDH5 Beta   Intel IDH3.0   MapR 3   IBM BigInsights 2.1
Hadoop Common      2.2.0     2.0.0        2.2.0       2.0.4          N/A      1.1.1
Hive + HCatalog    0.12      0.10 + 0.5   0.11        0.10 + 0.5     0.11     0.9 + 0.4
Pig                0.12      0.11         0.11        0.10           0.11     0.10
Mahout             0.8       0.7          0.8         0.8            0.8      N/A
Flume              1.4.0     1.4.0        1.4.0       1.3.0          1.4.0    1.3.0
Oozie              4.0.0     3.3.2        4.0.0       3.3.0          3.3.2    3.2.0
Sqoop              1.4.4     1.4.3        1.4.4       1.4.3          1.4.4    1.4.2
HBase              0.96.0    0.94.6       0.95.2      0.94.7         0.94.9   0.94.3
Thank You
THUG Life


Más contenido relacionado

La actualidad más candente

Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoophadooparchbook
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleDataWorks Summit/Hadoop Summit
 
Welcome to Hadoop2Land!
Welcome to Hadoop2Land!Welcome to Hadoop2Land!
Welcome to Hadoop2Land!Uwe Printz
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureDataWorks Summit
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive TuningAdam Muise
 
Architectural considerations for Hadoop Applications
Architectural considerations for Hadoop ApplicationsArchitectural considerations for Hadoop Applications
Architectural considerations for Hadoop Applicationshadooparchbook
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keownCisco Canada
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDataWorks Summit
 
Overview of stinger interactive query for hive
Overview of stinger   interactive query for hiveOverview of stinger   interactive query for hive
Overview of stinger interactive query for hiveDavid Kaiser
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAmazon Web Services
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldUwe Printz
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementDataWorks Summit/Hadoop Summit
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst TrainingCloudera, Inc.
 

La actualidad más candente (20)

Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Welcome to Hadoop2Land!
Welcome to Hadoop2Land!Welcome to Hadoop2Land!
Welcome to Hadoop2Land!
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
 
Architectural considerations for Hadoop Applications
Architectural considerations for Hadoop ApplicationsArchitectural considerations for Hadoop Applications
Architectural considerations for Hadoop Applications
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keown
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
 
Overview of stinger interactive query for hive
Overview of stinger   interactive query for hiveOverview of stinger   interactive query for hive
Overview of stinger interactive query for hive
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
 

Similar a 2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0

Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingTeddy Choi
 
Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Big Data Joe™ Rossi
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing enginebigdatagurus_meetup
 
Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1Big Data Joe™ Rossi
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaData Con LA
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over YarnInMobi Technology
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Modern Data Stack France
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider projectSteve Loughran
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and futureCodemotion
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 

Similar a 2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0 (20)

MHUG - YARN
MHUG - YARNMHUG - YARN
MHUG - YARN
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Huhadoop - v1.1
Huhadoop - v1.1Huhadoop - v1.1
Huhadoop - v1.1
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_saha
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over Yarn
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and future
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 

Más de Adam Muise

2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Adam Muise
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadamAdam Muise
 
Next Generation Hadoop Introduction
Next Generation Hadoop IntroductionNext Generation Hadoop Introduction
Next Generation Hadoop IntroductionAdam Muise
 
Hadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopHadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopAdam Muise
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_securityAdam Muise
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoopAdam Muise
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLAdam Muise
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101Adam Muise
 
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitectureAdam Muise
 
2014 feb 5_what_ishadoop_mda
2014 feb 5_what_ishadoop_mda2014 feb 5_what_ishadoop_mda
2014 feb 5_what_ishadoop_mdaAdam Muise
 
2013 Dec 9 Data Marketing 2013 - Hadoop
2013 Dec 9 Data Marketing 2013 - Hadoop2013 Dec 9 Data Marketing 2013 - Hadoop
2013 Dec 9 Data Marketing 2013 - HadoopAdam Muise
 
What is Hadoop? Nov 20 2013 - IRMAC
What is Hadoop? Nov 20 2013 - IRMACWhat is Hadoop? Nov 20 2013 - IRMAC
What is Hadoop? Nov 20 2013 - IRMACAdam Muise
 
What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013Adam Muise
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionAdam Muise
 
2013 march 26_thug_etl_cdc_talking_points
2013 march 26_thug_etl_cdc_talking_points2013 march 26_thug_etl_cdc_talking_points
2013 march 26_thug_etl_cdc_talking_pointsAdam Muise
 
2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalog2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalogAdam Muise
 
KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012Adam Muise
 
2012 sept 18_thug_biotech
2012 sept 18_thug_biotech2012 sept 18_thug_biotech
2012 sept 18_thug_biotechAdam Muise
 

Más de Adam Muise (20)

2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
 
Next Generation Hadoop Introduction
Next Generation Hadoop IntroductionNext Generation Hadoop Introduction
Next Generation Hadoop Introduction
 
Hadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopHadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of Hadoop
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoop
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETL
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
 
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
 
2014 feb 5_what_ishadoop_mda
2014 feb 5_what_ishadoop_mda2014 feb 5_what_ishadoop_mda
2014 feb 5_what_ishadoop_mda
 
2013 Dec 9 Data Marketing 2013 - Hadoop
2013 Dec 9 Data Marketing 2013 - Hadoop2013 Dec 9 Data Marketing 2013 - Hadoop
2013 Dec 9 Data Marketing 2013 - Hadoop
 
What is Hadoop? Nov 20 2013 - IRMAC
What is Hadoop? Nov 20 2013 - IRMACWhat is Hadoop? Nov 20 2013 - IRMAC
What is Hadoop? Nov 20 2013 - IRMAC
 
What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
 
2013 march 26_thug_etl_cdc_talking_points
2013 march 26_thug_etl_cdc_talking_points2013 march 26_thug_etl_cdc_talking_points
2013 march 26_thug_etl_cdc_talking_points
 
2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalog2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalog
 
KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012
 
2012 sept 18_thug_biotech
2012 sept 18_thug_biotech2012 sept 18_thug_biotech
2012 sept 18_thug_biotech
 

Último

Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Último (20)

Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0

  • 12. MapReduce V2 is a library now… •  MapReduce runs on YARN like all other Hadoop 2.x applications –  Gone are the map and reduce slots, that’s up to containers in YARN now –  Gone is the JobTracker, replaced by the YARN AppMaster library •  Multiple versions of MapReduce –  The older mapred APIs work without modification or recompilation –  The newer mapreduce APIs may need to be recompiled •  Still has one master server component: the Job History Server –  The Job History Server stores the execution of jobs –  Used to audit prior execution of jobs –  Will also be used by YARN framework to store charge backs at that level © Hortonworks Inc. 2013. Confidential and Proprietary. Page 12
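
For orientation, a minimal MRv2 driver written against the newer org.apache.hadoop.mapreduce API looks roughly like the sketch below; the class names, word-count logic and input/output paths are illustrative additions, not part of the original deck.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountV2 {
  public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(Object key, Text value, Context ctx) throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
      }
    }
  }

  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The job runs as a YARN application; there is no JobTracker or fixed slot model involved.
    Job job = Job.getInstance(conf, "wordcount-on-yarn");
    job.setJarByClass(WordCountV2.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Submitted with hadoop jar, the driver negotiates containers through the YARN ResourceManager and its ApplicationMaster rather than through map and reduce slots.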
  • 13. Shuffle in MapReduce v2 •  Faster Shuffle –  Better embedded server: Netty •  Encrypted Shuffle –  Secure the shuffle phase as data moves across the cluster –  Requires 2 way HTTPS, certificates on both sides –  Incurs significant CPU overhead, reserve 1 core for this work –  Certs stored on each node (provision with the cluster), refreshed every 10secs •  Pluggable Shuffle Sort –  Shuffle is the first phase in MapReduce that is guaranteed to not be data-local –  Pluggable Shuffle/Sort allows for intrepid application developers or hardware developers to intercept the network-heavy workload and optimize it –  Typical implementations have hardware components like fast networks and software components like sorting algorithms –  API will change with future versions of Hadoop © Hortonworks Inc. 2013. Confidential and Proprietary. Page 13
  • 14. Efficiency Gains of MRv2 •  Key Optimizations –  No hard segmentation of resources into map and reduce slots –  The YARN scheduler is more efficient –  The MRv2 framework is more efficient than MRv1; the shuffle phase performs better thanks to Netty •  Yahoo runs YARN on over 30,000 nodes holding more than 365 PB of data •  They report roughly 400,000 jobs per day, amounting to about 10 million hours of compute time •  They estimate a 60%–150% improvement in node utilization per day •  Yahoo retired an entire colo (a 10,000-node datacenter) because of the increased utilization
  • 15. HDFS v2 in a Nutshell © Hortonworks Inc. 2013. Confidential and Proprietary.
  • 16. HA © Hortonworks Inc. 2013. Confidential and Proprietary. Page 16
  • 17. HDFS Snapshots: Feature Overview •  Admin can create point in time snapshots of HDFS –  Of the entire file system (/root) –  Of a specific data-set (sub-tree directory of file system) •  Restore state of entire file system or data-set to a snapshot (like Apple Time Machine) –  Protect against user errors •  Snapshot diffs identify changes made to data set –  Keep track of how raw or derived/analytical data changes over time © Hortonworks Inc. 2013. Confidential and Proprietary. Page 17
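
As a rough illustration (not from the deck), the same snapshot operations are exposed through the Java FileSystem API. The /data/weblogs path and snapshot name below are hypothetical, and an administrator must first allow snapshots on the directory (hdfs dfsadmin -allowSnapshot).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    Path dataSet = new Path("/data/weblogs");      // hypothetical data-set directory

    // Take a point-in-time snapshot of the data set (the admin must have run
    // 'hdfs dfsadmin -allowSnapshot /data/weblogs' once beforehand).
    Path snapshot = fs.createSnapshot(dataSet, "before-cleanup");
    System.out.println("Created snapshot at " + snapshot);

    // Snapshots are read-only and addressable under the .snapshot directory.
    for (FileStatus stat : fs.listStatus(new Path(dataSet, ".snapshot/before-cleanup"))) {
      System.out.println(stat.getPath());
    }

    // Recover from a user error by copying files back out of the snapshot, or discard it when done.
    fs.deleteSnapshot(dataSet, "before-cleanup");
  }
}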
  • 18. NFS Gateway: Feature Overview •  NFS v3 standard •  Supports all HDFS commands –  List files –  Copy, move files –  Create and delete directories •  Ingest for large scale analytical workloads –  Load immutable files as source for analytical processing –  No random writes •  Stream files into HDFS –  Log ingest by applications writing directly to HDFS client mount © Hortonworks Inc. 2013. Confidential and Proprietary.
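
To illustrate the "stream files into HDFS" point: once the NFS gateway is mounted, an application can write with ordinary file I/O. The /hdfs mount point and file paths below are assumptions about the local setup, and writes must be sequential because the gateway does not support random writes.

import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NfsLogIngest {
  public static void main(String[] args) throws Exception {
    // Hypothetical mount point: the HDFS NFS gateway mounted at /hdfs on this host.
    Path target = Paths.get("/hdfs/landing/app-logs/app-" + System.currentTimeMillis() + ".log");
    Files.createDirectories(target.getParent());

    // Sequential, append-style writes only; no random writes through the gateway.
    try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
      out.write("2013-11-20T19:00:00Z INFO application started");
      out.newLine();
      out.write("2013-11-20T19:00:05Z INFO first batch processed");
      out.newLine();
    }
    System.out.println("Streamed log file into HDFS via the NFS mount: " + target);
  }
}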
  • 19. Federation © Hortonworks Inc. 2013. Confidential and Proprietary. Page 19
  • 20. Managing Namespaces © Hortonworks Inc. 2013. Confidential and Proprietary. Page 20
  • 21. Performance © Hortonworks Inc. 2013. Confidential and Proprietary. Page 21
  • 22. Other Features © Hortonworks Inc. 2013. Confidential and Proprietary. Page 22
  • 23. Apache Tez A New Hadoop Data Processing Framework © Hortonworks Inc. 2013. Confidential and Proprietary. Page 23
  • 24. Moving Hadoop Beyond MapReduce •  Low level data-processing execution engine •  Built on YARN •  Enables pipelining of jobs •  Removes task and job launch times •  Does not write intermediate output to HDFS –  Much lighter disk and network usage •  New base of MapReduce, Hive, Pig, Cascading etc. •  Hive and Pig jobs no longer need to move to the end of the queue between steps in the pipeline © Hortonworks Inc. 2013. Confidential and Proprietary.
  • 25. Apache Tez as the new Primitive: architecture diagram contrasting the two stacks. In Hadoop 1.0, MapReduce is the base: Pig (data flow), Hive (SQL) and others (Cascading) all compile down to MapReduce jobs running directly on HDFS. In Hadoop 2.0, Tez is the execution engine: Pig, Hive and Cascading run on Tez, alongside Storm for real-time stream processing and HBase/Accumulo for online data processing, all on YARN (cluster resource management) over HDFS2 (redundant, reliable storage).
  • 26. Hive-on-MR vs. Hive-on-Tez: Tez avoids unneeded writes to HDFS. Example query:
     SELECT a.x, AVERAGE(b.y) AS avg FROM a JOIN b ON (a.id = b.id) GROUP BY a
     UNION
     SELECT x, AVERAGE(y) AS AVG FROM c GROUP BY x
     ORDER BY AVG;
  The side-by-side DAGs on the slide show Hive-on-MR splitting the plan into a chain of MapReduce jobs (the component SELECTs, the JOINs, then the GROUP BY with COUNT and AVERAGE), writing intermediate results to HDFS between jobs, whereas Hive-on-Tez executes the same plan as a single pipelined DAG without the intermediate HDFS writes.
  • 27. Apache Tez ("Speed") •  Replaces MapReduce as the primitive for Pig, Hive, Cascading, etc. –  Lower latency for interactive queries –  Higher throughput for batch queries –  22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft •  A Tez task is built from pluggable Input, Processor and Output components (Tez Task = <Input, Processor, Output>), and a YARN ApplicationMaster runs a DAG of Tez tasks.
  • 28. Tez: Building blocks for scalable data processing. A classical "Map" is expressed as HDFS Input → Map Processor → Sorted Output; a classical "Reduce" as Shuffle Input → Reduce Processor → HDFS Output; and an intermediate "Reduce" for Map-Reduce-Reduce chains as Shuffle Input → Reduce Processor → Sorted Output.
  • 29. Hive © Hortonworks Inc. 2013. Confidential and Proprietary. 29
  • 30. SQL Compliance: Enhancing SQL Semantics. Hive 0.12 provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop.
  –  Hive SQL datatypes: INT, TINYINT/SMALLINT/BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, TIMESTAMP, BINARY, DECIMAL, ARRAY, MAP, STRUCT, UNION, DATE, VARCHAR, CHAR
  –  Hive SQL semantics: SELECT, INSERT; GROUP BY, ORDER BY, SORT BY; JOIN on explicit join key; inner, outer, cross and semi joins; sub-queries in the FROM clause; ROLLUP and CUBE; UNION; windowing functions (OVER, RANK, etc.); custom Java UDFs; standard aggregation (SUM, AVG, etc.); advanced UDFs (ngram, XPath, URL); sub-queries in WHERE and HAVING; expanded JOIN syntax; SQL-compliant security (GRANT, etc.); INSERT/UPDATE/DELETE (ACID)
  –  The slide marks each item as already available, new in Hive 0.12, or on the roadmap.
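
A hedged sketch of what that SQL surface looks like from a client: the snippet below runs a windowing query over JDBC against HiveServer2. The host, port, credentials and the sales table are assumptions; the org.apache.hive.jdbc.HiveDriver class and the jdbc:hive2:// URL scheme are the standard HiveServer2 JDBC coordinates.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveWindowingQuery {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver; hive-jdbc and its dependencies must be on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // Hypothetical HiveServer2 endpoint, user and table.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://hs2-host:10000/default", "hdfs", "");
         Statement stmt = conn.createStatement()) {

      // Windowing functions (OVER, RANK) are part of the SQL semantics listed above.
      ResultSet rs = stmt.executeQuery(
          "SELECT state, item_id, price, " +
          "       RANK() OVER (PARTITION BY state ORDER BY price DESC) AS price_rank " +
          "FROM sales");
      while (rs.next()) {
        System.out.printf("%s %s %.2f rank=%d%n",
            rs.getString("state"), rs.getString("item_id"),
            rs.getDouble("price"), rs.getInt("price_rank"));
      }
    }
  }
}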
  • 31. SPEED: Increasing Hive Performance Interactive Query Times across ALL use cases •  Simple and advanced queries in seconds •  Integrates seamlessly with existing tools •  Currently a >100x improvement in just nine months Performance Improvements included in Hive 12 –  Base & advanced query optimization –  Startup time improvement –  Join optimizations © Hortonworks Inc. 2013. Confidential and Proprietary.
  • 32. Apache Tez as the new Primitive: the MapReduce-as-base vs. Tez-as-base architecture diagram repeated from slide 25.
  • 33. Hive-on-MR vs. Hive-on-Tez: the query and DAG comparison repeated from slide 26.
  • 34. Tez on YARN: cluster diagram in which the ResourceManager's scheduler places containers across NodeManagers for several frameworks at once: Hive/Tez vertices (interactive SQL), MapReduce map and reduce tasks (batch), and Storm nimbus workers (real-time), all sharing the same YARN cluster.
  • 35. Apache Falcon Data Lifecycle Management for Hadoop © Hortonworks Inc. 2013. Confidential and Proprietary.
  • 36. Data Lifecycle on Hadoop is Challenging. Data management needs (data processing, replication, retention, scheduling, reprocessing, multi-cluster management) are spread across a set of separate tools (Oozie, Sqoop, DistCp, Flume, MapReduce, Hive and Pig jobs). Problem: a patchwork of tools complicates data lifecycle management. Result: long development cycles and quality challenges.
  • 37. Falcon: One-stop Shop for Data Lifecycle. Apache Falcon provides the data management layer (processing, replication, retention, scheduling, reprocessing, multi-cluster management) and orchestrates the same underlying tools (Oozie, Sqoop, DistCp, Flume, MapReduce, Hive and Pig jobs). Falcon provides a single interface to orchestrate the data lifecycle, so sophisticated DLM is easily added to Hadoop applications.
  • 38. Falcon Core Capabilities •  Core Functionality –  Pipeline processing –  Replication –  Retention –  Late data handling •  Automates –  Scheduling and retry –  Recording audit, lineage and metrics •  Operations and Management –  Monitoring, management, metering –  Alerts and notifications –  Multi Cluster Federation •  CLI and REST API © Hortonworks Inc. 2013. Confidential and Proprietary.
  • 39. Falcon At A Glance. Data processing applications sit on top of the Falcon data management framework, which supplies data import and replication, scheduling and coordination, data lifecycle policies, multi-cluster management and SLA management.
  >  Falcon offers a high-level abstraction of key services for Hadoop data management needs.
  >  Complex data processing logic is handled by Falcon instead of hard-coded in data processing apps.
  >  Falcon enables faster development of ETL, reporting and other data processing apps on Hadoop.
  • 40. Falcon Example: Replication. Diagram: the primary cluster's pipeline runs staged data through cleansed, conformed and access data sets, while Falcon replicates the staged and processed data to a failover cluster.
  >  Falcon manages workflow and replication.
  >  Enables business continuity without requiring full data representation.
  >  Failover clusters can be smaller than primary clusters.
  • 41. Falcon Example: Retention. Diagram: each data set in the pipeline (staged, cleansed, conformed, access) carries its own retention policy, ranging from 20 years and 3 years down to keeping only the last copy.
  >  Sophisticated retention policies expressed in one place.
  >  Simplify data retention for audit, compliance, or for data re-processing.
  • 42. Falcon Example: Late Data Handling. Diagram: online transaction data (via Sqoop) and web log data (via FTP) are staged and joined into a combined dataset, with the process waiting up to 4 hours for the FTP data to arrive.
  >  Processing waits until all required input data is available.
  >  Checks for late data arrivals and retriggers processing as necessary.
  >  Eliminates writing complex data handling rules within applications.
  • 43. Examples © Hortonworks Inc. 2013. Confidential and Proprietary. Page 43
  • 44. Example: Cluster Specification
  <?xml version="1.0"?>
  <!-- My Local Cluster specification -->
  <cluster colo="my-local-cluster" description="" name="cluster-alpha">
    <interfaces>
      <interface type="readonly" endpoint="hftp://nn:50070" version="2.2.0" />
      <interface type="write" endpoint="hdfs://nn:8020" version="2.2.0" />
      <interface type="execute" endpoint="rm:8050" version="2.2.0" />
      <interface type="workflow" endpoint="http://os:11000/oozie/" version="4.0.0" />
      <interface type="messaging" endpoint="tcp://mq:61616?daemon=true" version="5.1.6" />
    </interfaces>
    <locations>
      <location name="staging" path="/apps/falcon/cluster-alpha/staging" />
      <location name="temp" path="/tmp" />
      <location name="working" path="/apps/falcon/cluster-alpha/working" />
    </locations>
  </cluster>
  (The readonly and write interfaces point at the NameNode, the execute interface at the ResourceManager, and the workflow interface at the Oozie server.)
  • 45. Example: Weblogs Replication and Retention © Hortonworks Inc. 2013. Confidential and Proprietary. Page 45
  • 46. Example 1: Weblogs •  Weblogs land hourly in my primary cluster •  HDFS location is /weblogs/{date} •  I want to: –  Evict weblogs from primary cluster after 1 day © Hortonworks Inc. 2013. Confidential and Proprietary. Page 46
  • 47. Feed Specification 1: Weblogs
  <feed description="" name="feed-weblogs1" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <clusters>
      <cluster name="cluster-primary" type="source">   <!-- cluster where the data is located -->
        <validity start="2013-10-24T00:00Z" end="2014-12-31T00:00Z"/>
        <retention limit="days(1)" action="delete"/>   <!-- retention policy: 1 day -->
      </cluster>
    </clusters>
    <locations>
      <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}" />   <!-- location of the data -->
    </locations>
    <ACL owner="hdfs" group="users" permission="0755" />
    <schema location="/none" provider="none"/>
  </feed>
  • 48. Example 2: Weblogs •  Weblogs land hourly in my primary cluster •  HDFS location is /weblogs/{date} •  I want to: –  Replicate weblogs to my secondary cluster –  Evict weblogs from primary cluster after 2 days –  Evict weblogs from secondary cluster after 1 week © Hortonworks Inc. 2013. Confidential and Proprietary. Page 48
  • 49. Feed Specification 2: Weblogs
  <feed description="" name="feed-weblogs2" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <clusters>
      <cluster name="cluster-primary" type="source">     <!-- cluster where the data is located -->
        <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
        <retention limit="days(2)" action="delete"/>     <!-- retention policy: 2 days -->
      </cluster>
      <cluster name="cluster-secondary" type="target">   <!-- cluster where the data will be replicated -->
        <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
        <retention limit="days(7)" action="delete"/>     <!-- retention policy: 1 week -->
      </cluster>
    </clusters>
    <locations>
      <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}" />   <!-- location of the data -->
    </locations>
    <ACL owner="hdfs" group="users" permission="0755"/>
    <schema location="/none" provider="none"/>
  </feed>
  • 50. Example 3: Weblogs •  Weblogs land hourly in my primary cluster •  HDFS location is /weblogs/{date} •  I want to: –  Replicate weblogs to a discovery cluster –  Replicate weblogs to a BCP cluster –  Evict weblogs from primary cluster after 2 days –  Evict weblogs from discovery cluster after 1 week –  Evict weblogs from BCP cluster after 3 months © Hortonworks Inc. 2013. Confidential and Proprietary. Page 50
  • 51. Feed Specification 3: Weblogs
  <feed description="" name="feed-weblogs" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <clusters>
      <cluster name="cluster-primary" type="source">
        <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
        <retention limit="days(2)" action="delete"/>
      </cluster>
      <cluster name="cluster-discovery" type="target">
        <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
        <retention limit="days(7)" action="delete"/>
        <locations>   <!-- cluster-specific location -->
          <location type="data" path="/projects/recommendations/${YEAR}-${MONTH}-${DAY}-${HOUR}" />
        </locations>
      </cluster>
      <cluster name="cluster-bcp" type="target">
        <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
        <retention limit="months(3)" action="delete"/>
        <locations>   <!-- cluster-specific location -->
          <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}" />
        </locations>
      </cluster>
    </clusters>
    <locations>
      <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}" />
    </locations>
    <ACL owner="hdfs" group="users" permission="0755"/>
    <schema location="/none" provider="none"/>
  </feed>
  • 52. Apache Knox Secure Access to Hadoop © Hortonworks Inc. 2013. Confidential and Proprietary.
  • 53. Connecting to the Cluster: Edge Nodes •  What is an edge node? –  A node in a DMZ that has access to the cluster, and is often the only way to reach it –  Hadoop client APIs and MR/Pig/Hive jobs are executed from these edge nodes –  Users SSH to the edge node, upload their job artifacts, and then run API calls and commands from the shell (user → SSH → edge node → Hadoop) •  Challenges –  SSH, edge node, and job maintenance become a nightmare –  Difficult to integrate with applications
  • 54. Connecting to the Cluster: REST APIs
  –  WebHDFS: HDFS user operations, including reading and writing files, making directories, changing permissions and renaming
  –  WebHCat: job control for MapReduce, Pig and Hive jobs, plus HCatalog DDL commands
  –  Oozie: job submission and management, and Oozie administration
  •  Useful for connecting to Hadoop from outside the cluster •  When more client language flexibility is required –  e.g. when a Java binding is not an option •  Challenges –  Client must have knowledge of the cluster topology –  Requires opening ports (and in some cases, on every host) outside the cluster
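
For example, a direct WebHDFS call from outside the cluster might look like the sketch below; the NameNode hostname, port and user are assumptions. It also illustrates the challenge just mentioned: the client has to know which host runs the NameNode and its HTTP port has to be reachable from outside the cluster.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class WebHdfsListStatus {
  public static void main(String[] args) throws Exception {
    // Direct WebHDFS call against the NameNode's HTTP port (hostname and user are assumptions).
    URL url = new URL("http://namenode.example.com:50070/webhdfs/v1/weblogs?op=LISTSTATUS&user.name=hdfs");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");

    try (BufferedReader in = new BufferedReader(
             new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // JSON FileStatuses payload
      }
    }
  }
}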
  • 55. Apache Knox Gateway – Perimeter Security: simplified access and centralized security •  Single Hadoop access point •  Rationalized REST API hierarchy •  Eliminate SSH "edge node" •  LDAP and Active Directory auth •  Consolidated API calls •  Multi-cluster support •  Central API management + audit •  Client DSL
  • 56. Knox Gateway Network Architecture. Diagram: REST clients, JDBC clients and browsers outside the firewall reach a stateless cluster of Knox Gateway reverse-proxy instances deployed in the DMZ; the gateway authenticates users against identity providers (Kerberos or an enterprise identity provider, and enterprise/cloud SSO), then streams requests through to the services of one or more secure Hadoop clusters (WebHDFS, WebHCat, Oozie, HBase, Hive, YARN, Ambari/Hue), rewriting URLs so that clients only ever see the gateway.
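
The same directory listing routed through Knox might look like the sketch below; the gateway hostname, the "default" topology name and the LDAP credentials are deployment-specific assumptions. Only the gateway's HTTPS port is exposed, and in a test setup the gateway's (often self-signed) certificate typically has to be added to the client's trust store.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class KnoxWebHdfsListStatus {
  public static void main(String[] args) throws Exception {
    // Same LISTSTATUS call, but routed through the Knox gateway in the DMZ;
    // the URL refers only to the gateway, never to the NameNode itself.
    URL url = new URL("https://knox.example.com:8443/gateway/default/webhdfs/v1/weblogs?op=LISTSTATUS");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");

    // Knox authenticates the caller (e.g. against LDAP/AD) before proxying to the cluster.
    String credentials = Base64.getEncoder()
        .encodeToString("analyst:secret".getBytes(StandardCharsets.UTF_8));
    conn.setRequestProperty("Authorization", "Basic " + credentials);

    try (BufferedReader in = new BufferedReader(
             new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}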
  • 57. Wot no 2.2.0? Where can I get the Hadoop 2.2.0 fix? © Hortonworks Inc. 2013. Confidential and Proprietary. Page 57
  • 58. Like the Truth, Hadoop 2.2.0 is out there…
  Component        | HDP2.0 | CDH4       | CDH5 Beta | Intel IDH3.0 | MapR 3 | IBM Big Insights 2.1
  Hadoop Common    | 2.2.0  | 2.0.0      | 2.2.0     | 2.0.4        | N/A    | 1.1.1
  Hive + HCatalog  | 0.12   | 0.10 + 0.5 | 0.11      | 0.10 + 0.5   | 0.11   | 0.9 + 0.4
  Pig              | 0.12   | 0.11       | 0.11      | 0.10         | 0.11   | 0.10
  Mahout           | 0.8    | 0.7        | 0.8       | 0.8          | 0.8    | N/A
  Flume            | 1.4.0  | 1.4.0      | 1.4.0     | 1.3.0        | 1.4.0  | 1.3.0
  Oozie            | 4.0.0  | 3.3.2      | 4.0.0     | 3.3.0        | 3.3.2  | 3.2.0
  Sqoop            | 1.4.4  | 1.4.3      | 1.4.4     | 1.4.3        | 1.4.4  | 1.4.2
  HBase            | 0.96.0 | 0.94.6     | 95.2      | 0.94.7       | 94.9   | 0.94.3
  • 59. Thank You THUG Life © Hortonworks Inc. 2013. Confidential and Proprietary.