Apache Hadoop 3.0 Community Update
About
Sanjay Radia
Chief Architect, Founder, Hortonworks
Part of the original Hadoop team at Yahoo! since 2007
– Chief Architect of Hadoop Core at Yahoo!
– Apache Hadoop PMC and Committer
Prior
Data center automation, virtualization, Java, HA, OSs, File Systems
Startup, Sun Microsystems, Inria …
Ph.D., University of Waterloo
Why Hadoop 3.0
Driving reasons, and some features taking advantage of 3.0:
– A lot of content in trunk that never made it into the 2.x branch
– JDK upgrade (on its own, this does not truly require bumping the major version number)
– Hadoop command scripts rewrite (incompatible; see the sketch below)
– Big features that need a stabilizing major release: erasure codes
– YARN: long-running services
– Ephemeral ports (incompatible)
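As a sketch of what the script rewrite means day to day (assuming a stock Hadoop 3 install; the old per-daemon wrapper scripts are deprecated in favor of a --daemon flag on the entry-point commands):

  # Hadoop 2 style (deprecated in 3.0): hadoop-daemon.sh start namenode
  hdfs --daemon start namenode
  yarn --daemon start resourcemanager
  mapred --daemon start historyserver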
Apache Hadoop 3.0
Key Takeaways
– HDFS: erasure codes
– YARN: long-running services, scheduler enhancements, isolation & Docker, new UI
– Lots of trunk content
– JDK8 and newer dependent libraries
Release Timeline
– 3.0.0-alpha1 – Sep 3, 2016
– alpha2 – Jan 25, 2017
– alpha3 – May 16, 2017
– alpha4 – Jul 7, 2017
– beta/GA – Q4 2017 (estimated)
Agenda
Major changes you should know before upgrading to Hadoop 3.0
– JDK upgrade
– Dependency upgrades
– Changes to the default ports for daemons/services
– Shell script rewrite
Features
– Hadoop Common
• Client-side classpath isolation
• Shell script rewrite
– HDFS/Storage
• Erasure coding
• Multiple standby NameNodes
• Intra-DataNode balancer
• Cloud storage: support for Azure Data Lake, S3 consistency & performance
– YARN
• Support for long-running services
• Scheduling enhancements: app/queue priorities, global scheduling, placement strategies
• New UI
• ATS v2
– MapReduce
• Task-level native optimization
HADOOP-11264
Hadoop Operations – JDK Upgrade
Minimum JDK for Hadoop 3.0.x is JDK8 (HADOOP-11858)
– Oracle JDK 7 reached end of life in April 2015!
Moving forward to use new features of JDK8
– Lambda expressions (starting to use these)
– Stream API
– Security enhancements
– Performance enhancements for HashMap, IO/NIO, etc.
Hadoop's evolution with JDK upgrades
– Hadoop 2.6.x – JDK 6, 7, 8 or later
– Hadoop 2.7.x/2.8.x/2.9.x – JDK 7, 8 or later
– Hadoop 3.0.x – JDK 8 or later
Change of Default Ports for Hadoop Services
Previously, the default ports of multiple Hadoop services fell in the Linux ephemeral port range (32768–61000)
– They can conflict with other apps running on the same node
– They can cause problems during a rolling restart if another app grabs the port
New ports (old → new):
– NameNode: 50470 → 9871, 50070 → 9870, 8020 → 9820
– Secondary NameNode: 50091 → 9869, 50090 → 9868
– DataNode: 50020 → 9867, 50010 → 9866, 50475 → 9865, 50075 → 9864
– KMS: 16000 → 9600
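A quick way to verify which ports a given cluster actually uses before and after the upgrade (a sketch; hdfs getconf prints the effective configuration, and the keys shown are the standard HDFS ones):

  hdfs getconf -confKey dfs.namenode.http-address    # 0.0.0.0:9870 with 3.0 defaults
  hdfs getconf -confKey dfs.namenode.https-address   # 0.0.0.0:9871
  hdfs getconf -confKey dfs.datanode.http.address    # 0.0.0.0:9864

If clients depend on the old 2.x ports, these keys can be pinned explicitly in hdfs-site.xml before upgrading.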
Classpath Isolation (HADOOP-11656)
Hadoop leaks lots of dependencies onto the application's classpath
○ Known offenders: Guava, Protobuf, Jackson, Jetty, …
○ Potential conflicts with your app's dependencies (no shading)
No separate HDFS client jar means server jars are leaked
● NN and DN libraries are pulled in even though they are not needed
HDFS-6200: Split the HDFS client into a separate JAR
HADOOP-11804: Shaded hadoop-client dependency
YARN-6466: Shade the task umbilical for a clean YARN container environment (ongoing)
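A sketch of how an application avoids the leakage in Hadoop 3, using the shaded artifacts from HADOOP-11804 (the 3.0.0 version string is an assumption for illustration):

  # Build against the shaded client instead of the fat hadoop-client:
  #   org.apache.hadoop:hadoop-client-api:3.0.0      (compile-time API, shaded)
  #   org.apache.hadoop:hadoop-client-runtime:3.0.0  (runtime, third-party deps relocated)
  # To see everything the unshaded classpath drags in on a cluster node:
  hadoop classpath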
HDFS
Support for Three NameNodes for HA
Intra-DataNode disk balancer (see the sketch after this list)
Cloud storage improvements (see afternoon talk)
Erasure coding
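A sketch of the intra-DataNode disk balancer workflow (HDFS-1312); the hostname and plan file name are illustrative, and the plan JSON is written out by the -plan step:

  hdfs diskbalancer -plan dn1.example.com                # compute a plan to even out that node's disks
  hdfs diskbalancer -execute dn1.example.com.plan.json   # run the generated plan
  hdfs diskbalancer -query dn1.example.com               # check progress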
Current (2.x) HDFS Replication Strategy
Three replicas by default
– 1st replica on the local node, local rack, or a random node
– 2nd and 3rd replicas on the same remote rack
– Reliability: tolerates 2 failures
Good data locality, local short-circuit reads
Multiple copies => parallel IO for parallel compute
Very fast block recovery and node recovery
– Parallel recovery – the bigger the cluster, the faster
– 10 TB node recovery: 30 seconds to a few hours
3x storage overhead vs 1.4–1.6x for erasure coding
– Remember that Hadoop's JBOD is very cheap
• 1/10 – 1/20 the cost of SANs
• 1/10 – 1/5 the cost of NFS
(Diagram: replica r1 on a DataNode in Rack I; replicas r2 and r3 on DataNodes in Rack II.)
Erasure Coding
k data blocks + m parity blocks (k + m)
– Example: Reed-Solomon 6+3
Reliability: tolerate m failures
Save disk space
Save I/O bandwidth on the write path
(Diagram: 6 data blocks b1–b6 plus 3 parity blocks P1–P3; 1.5x storage overhead, tolerates any 3 failures.)

                                3-replication   (6,3) Reed-Solomon
  Maximum fault tolerance             2                 3
  Disk usage (N bytes of data)       3N               1.5N
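The table rows follow directly from the code parameters; for a general (k, m) code storing N bytes of data:

\[
\text{storage used} = \frac{k+m}{k}\,N, \qquad \text{tolerated failures} = m
\]

For (6,3) Reed-Solomon this gives (9/6)N = 1.5N with m = 3 tolerated failures, versus 3N and 2 failures for 3-way replication.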
Block Reconstruction
Block reconstruction overhead
– Higher network bandwidth cost
– Extra CPU overhead
• Mitigations from the literature: Local Reconstruction Codes (LRC), Hitchhiker
Huang et al. Erasure Coding in Windows Azure Storage. USENIX ATC'12.
Sathiamoorthy et al. XORing elephants: novel erasure codes for big data. VLDB 2013.
Rashmi et al. A "Hitchhiker's" Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers. SIGCOMM'14.
(Diagram: blocks b1–b6 and parity blocks P1–P3, each on a different rack; reconstructing a block reads from multiple remote racks.)
Erasure Coding on Contiguous vs. Striped Blocks
Two approaches
EC on contiguous blocks
– Pros: better for locality
– Cons: small files cannot be handled
EC on striped blocks
– Pros: leverages multiple disks in parallel
– Pros: works for small files
– Cons: no data locality for readers
(Diagrams: striped layout – each stripe spreads data cells C1–C6 plus parity cells PC1–PC3 across the block group; contiguous layout – data blocks b1–b6 of files f1, f2, f3 bundled into a coding group that shares parity blocks P1–P3.)
Erasure Coding Zone
Create a zone on an empty directory
– Shell command:
hdfs erasurecode -createZone [-s <schemaName>] <path>
All files under a zone directory are automatically erasure coded
– Renames across zones with different EC schemas are disallowed
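Note this is the alpha-era CLI shown on the slide; by the 3.0 GA line, zones were generalized into per-directory EC policies and the subcommand was renamed to hdfs ec (the policy name below is the built-in default in 3.0):

  hdfs ec -enablePolicy -policy RS-6-3-1024k
  hdfs ec -setPolicy -path /data/cold -policy RS-6-3-1024k
  hdfs ec -getPolicy -path /data/cold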
Write Pipeline for Replicated Files
Write pipeline to datanodes
Durability
– Uses 3 replicas to tolerate a maximum of 2 failures
Visibility
– Reads are supported on files that are being written
– Data can be made visible by hflush/hsync
Consistency
– A client can start reading from any replica and fail over to any other replica to read the same data
Appendable
– Files can be reopened for append
(Diagram: Writer → DN1 → DN2 → DN3 pipeline; data flows down the chain, acks flow back. DN = DataNode.)
Parallel Write for EC Files
Parallel write
– The client writes to a group of 9 datanodes at the same time
– Parity bits are calculated on the client side, at write time
Durability
– (6, 3)-Reed-Solomon can tolerate a maximum of 3 failures
Visibility (same as replicated files)
– Reads are supported on files that are being written
– Data can be made visible by hflush/hsync
Consistency
– A client can start reading from any 6 of the 9 replicas
– When reading from a datanode fails, the client can fail over to any other remaining replica to read the same data
Appendable (same as replicated files)
– Files can be reopened for append
(Diagram: Writer streams data in parallel to DN1–DN6 and parity to DN7–DN9, with acks from each; stripe size 1 MB.)
EC: Write Failure Handling
Datanode failure
– The client ignores the failed datanode and continues writing
– Able to tolerate 3 failures
– Requires at least 6 datanodes
– Missing blocks will be reconstructed later
(Diagram: same parallel-write layout as before; failed datanodes are simply skipped.)
Replication: Slow Writers & Replace-Datanode-on-Failure
Write pipeline for replicated files
– A datanode can be replaced in case of failure
Slow writers
– A write pipeline may last for a long time
– The probability of datanode failures increases over time
– Hence the need to replace datanodes on failure
EC files
– Do not support replace-datanode-on-failure
– Slow-writer handling is improved
(Diagram: replicated-write pipeline Writer → DN1 → DN2 → DN3, with DN4 substituted in after a failure.)
Reading with Parity Blocks
Parallel read
– Read from the 6 datanodes holding data blocks
– Supports both stateful read and pread
Block reconstruction
– Read parity blocks to reconstruct missing blocks
(Diagram: Reader fetches Block1–Block6 from DN1–DN6; when the datanode holding Block3 fails, Parity1 on DN7 is read to reconstruct it.)
EC Implications
File data is striped across multiple nodes and racks
Reads and writes are remote and cross-rack
Reconstruction is network-intensive: it reads k blocks cross-rack
– Need a fast network
• Requires high network bandwidth between client and server
• A dead DataNode implies high network traffic and long reconstruction times
Important to use the optimized ISA-L library for performance
– 1+ GB/s encode/decode speed, much faster than the Java implementation
– CPU is no longer the bottleneck
Need to combine data into larger files to avoid an explosion in replica count
– Bad: 1x1 GB file -> RS(10,4) -> 14x100 MB EC blocks (4.6x the # of replicas)
– Good: 10x1 GB file -> RS(10,4) -> 14x1 GB EC blocks (0.46x the # of replicas)
Works best for archival / cold data use cases
EC performance – write performance is faster with the right EC library (chart)
EC performance – TPC benchmark with no DataNode killed (chart)
EC performance – TPC benchmark with 2 DataNodes killed (chart)
Erasure coding status
Massive development effort by the Hadoop community
○ 20+ contributors from many companies (Hortonworks, Yahoo! Japan, Cloudera, Intel, Huawei, …)
○ 100s of commits over three years (started in 2014)
Erasure coding is feature complete!
Solidifying some user APIs in preparation for beta1
Current focus is on testing and integration efforts
○ Want the complete Hadoop stack to work with HDFS erasure coding enabled
○ Stress / endurance testing to ensure stability
Apache Hadoop 3.0 – YARN Enhancements
YARN Scheduling Enhancements
Support for Long Running Services
Re-architecture for YARN Timeline Service - ATS v2
Better elasticity and resource utilization
Better resource isolation and Docker!!
Better User Experiences
Other Enhancements
Scheduling Enhancements
Application priorities within a queue: YARN-1963
– In queue A, App1 > App2
Inter-queue priorities
– Q1 > Q2 irrespective of demand / capacity
– Previously based on unconsumed capacity
Affinity / anti-affinity: YARN-1042
– More constraints on placement
• Affinity to a rack (e.g. where a sibling container runs)
• Anti-affinity (e.g. HBase region servers)
Global scheduling: YARN-5139
– Gets rid of scheduling triggered on node heartbeats
– Replaced with a global scheduler that has parallel threads
• Globally optimal placement – expect evolution of the scheduler
• Critical for long-running services – they stick to the allocation, so it had better be a good one
• Enhanced container scheduling throughput (6x)
Scheduling Enhancements (Contd.)
CapacityScheduler improvements
– Queue Management Improvements
• More Dynamic Queue reconfiguration
• REST API support for queue management
– Absolute resource configuration support
– Priority Support in Application and Queue
– Preemption improvements
• Inter-Queue preemption support
Key Drivers for Long-Running Services
Consolidation of infrastructure
– Hadoop clusters have a lot of compute and storage resources (some unused)
• Can't I use Hadoop's resources for non-Hadoop load?
• OpenStack is hard to manage/operate – can I use YARN?
• VMs are expensive – can I use YARN?
• But does it support Docker? – Yes, we heard you
Hadoop-related data services that run outside a Hadoop cluster
– Why can't I run them in the Hadoop cluster?
Run Hadoop services (Hive, HBase) on YARN
– Run multiple instances
– Benefit from YARN's elasticity and resource management
Built-in Support for Long-Running Services in YARN
A native YARN framework: YARN-4692
• An abstract common framework (similar to Apache Slider) to support long-running services
• A more simplified API (to manage the service lifecycle)
• Better support for long-running services
Recognition of long-running services
• Affects the policies for preemption, container reservation, etc.
• Auto-restart of containers
• Containers for long-running services are restarted on the same node when there is local state
Service/application upgrade support – YARN-4726
• In general, services are expected to run long enough to cross versions
Dynamic container configuration
• Ask for just enough resources, and adjust them at runtime (memory is harder)
Service Discovery in YARN
Services can run on any YARN node; how do clients get their IP?
– A service can also move due to node failure
YARN service discovery via DNS: YARN-4757
– Exposes existing service information in the YARN registry via DNS
• The YARN service registry's records are converted into DNS entries
– Discovery of container IPs and service ports via standard DNS lookups
• Application:
– zkapp1.user1.yarncluster.com -> 192.168.10.11:8080
• Container:
– container-1454001598828-0001-01-00004.yarncluster.com -> 192.168.10.18
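Because discovery is plain DNS, any resolver works; a sketch assuming the YARN-4757 registry DNS server runs at registrydns.example.com (hostname and record names are illustrative):

  dig @registrydns.example.com zkapp1.user1.yarncluster.com A
  dig @registrydns.example.com container-1454001598828-0001-01-00004.yarncluster.com A

No YARN client libraries are needed on the lookup side, which is the point of using DNS.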
A More Powerful YARN
Elastic resource model
– Dynamic resource configuration (YARN-291)
• Allows tuning a NodeManager's resources down/up at runtime
– E.g. helps when Hadoop cluster nodes are shared with other workloads
– E.g. Hadoop-on-Hadoop allows flexible resource allocation
– Graceful decommissioning of NodeManagers (YARN-914)
• Drains a node that is being decommissioned so running containers can finish
• E.g. removing a node for maintenance, spot pricing on cloud, …
Efficient resource utilization
– Support for container resizing (YARN-1197)
• Allows applications to change the size of an existing container
• E.g. long-running services
More Powerful YARN (Contd.)
Resource isolation
– Resource isolation support for disk and network
• YARN-2619 (disk), YARN-2140 (network)
• Containers get a fair share of disk and network resources using cgroups
– Docker support in the LinuxContainerExecutor (YARN-3611)
• Docker containers can be launched alongside ordinary process containers
• Packaging and resource isolation
– Packaging made easier, e.g. TensorFlow
• Complements YARN's support for long-running services (see the sketch below)
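A sketch of launching a Docker container via YARN's bundled distributed-shell application, assuming Docker support has been enabled on the NodeManagers (the jar path and image are assumptions; the YARN_CONTAINER_RUNTIME_* environment variables select the Docker runtime):

  DSHELL_JAR=$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar
  yarn jar $DSHELL_JAR -jar $DSHELL_JAR \
    -num_containers 1 \
    -shell_command "cat /etc/os-release" \
    -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
    -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/centos:7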
Docker on YARN & YARN on YARN – YCloud
(Diagram: a YARN cluster running Hadoop apps – MR, Tez, Spark – plus TensorFlow and an inner YARN cluster that itself runs MR, Tez, and Spark.)
Can use YARN to test Hadoop!!
YARN New UI (YARN-3368) (screenshots)
Other YARN work planned in Hadoop 3.X
Resource profiles (YARN-3926)
– Users can specify resource profile name instead of individual resources
– Resource types read via a config file
YARN federation (YARN-2915)
– Allows YARN to scale out to tens of thousands of nodes
– Cluster of clusters which appear as a single cluster to an end user
Compatibility & Testing
Compatibility
Preserves wire-compatibility with Hadoop 2 clients
○ Impossible to coordinate upgrading off-cluster Hadoop clients
Will support rolling upgrade from Hadoop 2 to Hadoop 3
○ Can’t take downtime to upgrade a business-critical cluster
Not fully preserving API compatibility!
○ Dependency version bumps
○ Removal of deprecated APIs and tools
○ Shell script rewrite, rework of Hadoop tools scripts
○ Incompatible bug fixes
Testing and validation
Extended alpha → beta → GA plan designed for stabilization
EC already has some usage in production (700 nodes at Yahoo! Japan)
– Hortonworks has worked closely with this very large customer
Hortonworks is integrating and testing HDP 3
– Integrating with all components of HDP stack
– HDP2 ++ integration tests
Cloudera is also testing Hadoop 3 as part of their stack
Plans for extensive HDFS EC testing by Hortonworks and Cloudera
Happy synergy between the 2.8.x and 3.0.x lines
– They share much of the same code; fixes flow into both
– Yahoo! deployments are based on 2.8.0
Summary : What’s new in Apache Hadoop 3.0?
Storage Optimization
HDFS: Erasure codes
Improved Utilization
YARN: Long Running Services
YARN: Scheduler Enhancements
Additional Workloads
YARN: Docker & Isolation
Easier to Use
New User Interface
Refactor Base
Lots of Trunk content
JDK8 and newer dependent libraries
Thank you!
Reminder: BoFs on Thursday
Editor's notes
Data trends
– From characteristics of the data to data consumption & interaction
– According to IBM, every day we create 2.5 quintillion bytes of data – so much that 90% of the data in the world today has been created in the last two years
– Insight from data is a key competitive differentiator
– Open source is evolving and adapting with these trends the fastest
– Adopting Hadoop is not a destination but a journey
On striped EC: it enables online EC, which bypasses the conversion phase and immediately saves storage space; this is especially desirable in clusters with high-end networking. Second, it naturally distributes a small file to multiple DataNodes and eliminates the need to bundle multiple files into a single coding group.
On queue priorities: previously based on unconsumed capacity – if a queue at 70% capacity has lots of unconsumed capacity, it is scheduled first. Now you can say that the 30% queue is higher priority.
The original YARN design was not just for batch jobs – we started with that, but the design was general.
Graceful degradation: remove nodes gracefully – especially for cloud, if you are using spot pricing.
On the new UI: app-centric (top two left pictures); node-centric; resource-centric (load vs. capacity, overall and by queue); cluster-centric (node summary, heatmap of resource usage across nodes).