HDFS Federation allows HDFS to scale beyond the limitations of a single namenode by federating the namespace and block management across multiple independent namenodes. This simplifies the design and implementation compared to a distributed namenode approach. Existing single-namenode deployments are not impacted and can continue running as is. Federation preserves the robustness of individual namenodes while scaling to more of them. It generalizes the block storage layer to allow multiple namenodes to share the same block storage. This improves isolation and availability while providing a simpler way to scale HDFS in the near term.
2. Single Namenode Limitations
- Namespace: the NN process stores the entire metadata in memory, so the number of objects (files + blocks) is limited by the heap size. A 50GB heap holds about 200 million objects, supporting 4,000 DNs and 12 PB of storage at a 40 MB average file size.
- Storage growth: DN storage is growing from 4TB to 36TB and cluster sizes toward 8,000 DNs, pushing total storage from 12PB to more than 100PB.
- Performance: file system operations are limited by the throughput of a single NN, a bottleneck for the next generation of MapReduce.
- Isolation: experimental apps can affect production apps.
- Cluster availability: failure of the single namenode brings down the entire cluster.
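A rough back-of-the-envelope check of those numbers (the 3x replication factor and the assumption that a 40 MB file occupies a single block are inferences, not stated on the slide): at 40 MB per file each file is roughly one block, so 200 million objects is about 100 million files plus 100 million blocks.

\[
10^{8}\ \text{files} \times 40\,\mathrm{MB} \approx 4\,\mathrm{PB}, \qquad 4\,\mathrm{PB} \times 3\ \text{(replication)} = 12\,\mathrm{PB}
\]
\[
\frac{12\,\mathrm{PB}}{4000\ \mathrm{DNs}} = 3\,\mathrm{TB/DN}, \qquad \frac{50\,\mathrm{GB\ heap}}{2\times 10^{8}\ \text{objects}} \approx 250\ \text{bytes/object}
\]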
3. Scaling the Name Service: Separate Block Management from NN
[Chart, not to scale: # names (100M to 20B) on one axis vs. # clients (1x to 100x) on the other, comparing five approaches: all NS in memory; multiple namespace volumes; partial NS in memory with namespace volumes; partial NS (cache) in memory with archives; and a distributed namenode. Callouts: block reports for billions of blocks require rethinking the block layer; the distributed namenode has good isolation properties.]
4. Why Vertical Scaling Is Not Sufficient
- Why not use NNs with 512GB of memory?
- Startup time is huge: currently 30 minutes to 2 hours for a 50GB NN heap.
- Stop-the-world GC failures can bring down the cluster: all DNs could be declared dead.
- Debugging problems with a large JVM heap is harder.
- Optimizing NN memory usage is expensive: changes in trunk reduce used memory, but at the cost of development time and code complexity, with diminishing returns.
5. Why Federation? Simplicity
- Simpler, robust design: multiple independent namenodes. Core development took 3.5 months, with changes mostly in the Datanode, configuration, and tools, and very little change in the Namenode.
- Simpler implementation than a Distributed Namenode: less scalable, but it serves the immediate needs.
- Federation is an optional feature: the existing single-NN configuration is supported as is.
6. HDFS Background
HDFS has 2 main layers:
- Namespace management: manages the namespace consisting of directories, files, and blocks; supports file system operations such as create/modify/list files and directories.
- Block storage, which itself has two parts:
  - Block management: manages DN membership; supports add/delete/modify/get block location; manages replication and replica placement.
  - Physical storage: supports read/write access to blocks.
(Diagram: a Namenode holding the namespace and block management above a row of Datanodes providing the physical storage.)
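A minimal sketch of this layering as hypothetical Java interfaces; the real Hadoop code is organized differently, so treat this purely as a conceptual illustration:

```java
// Hypothetical interfaces illustrating HDFS's two-layer split.
// None of these names exist in Hadoop; they only mirror the slide's layering.

/** Namespace layer: directories, files, and the blocks that make up files. */
interface NamespaceManager {
  void create(String path);
  void rename(String src, String dst);
  String[] list(String dirPath);
}

/** Block management half of the block storage layer (lives in the NN). */
interface BlockManager {
  void registerDatanode(String datanodeId);   // DN membership
  long addBlock(String path);                 // allocate a new block
  String[] getBlockLocations(long blockId);   // DNs holding replicas
  void ensureReplication(long blockId);       // replication & replica placement
}

/** Physical storage half, implemented by each datanode. */
interface PhysicalStorage {
  byte[] readBlock(long blockId, long offset, int length);
  void writeBlock(long blockId, byte[] data);
}
```

Federation's key move is generalizing the block storage layer so that several independent NamespaceManager/BlockManager pairs can sit on top of the same pool of datanodes.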
12. Datanode Changes
- A thread per NN: the DN registers with all the NNs, sends a periodic heartbeat with a utilization summary to each NN, and sends a block report to each NN for its block pool. NNs can be added/removed/upgraded on the fly.
- Block pools: automatically created when a DN talks to an NN. A block is identified by ExtendedBlockID = BlockPoolID + BlockID; the Block Pool ID is unique across clusters, which enables merging clusters.
- DN data structures are "indexed" by BPID: the BlockMap, storage, etc. (see the sketch below).
- Upgrade/rollback happens per block pool / per NN.
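A simplified sketch of block-pool-aware identification and per-BPID indexing; HDFS's actual ExtendedBlock class and DN data structures are more involved, so the class names and fields here are illustrative only:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified: a block is globally identified by (block pool ID, block ID). */
final class ExtendedBlockId {
  final String blockPoolId;  // unique across clusters, enabling cluster merges
  final long blockId;

  ExtendedBlockId(String blockPoolId, long blockId) {
    this.blockPoolId = blockPoolId;
    this.blockId = blockId;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof ExtendedBlockId)) return false;
    ExtendedBlockId b = (ExtendedBlockId) o;
    return blockId == b.blockId && blockPoolId.equals(b.blockPoolId);
  }

  @Override public int hashCode() {
    return blockPoolId.hashCode() * 31 + Long.hashCode(blockId);
  }
}

/** Simplified DN-side view: everything is indexed by block pool ID first. */
class DatanodeBlockIndex {
  // BPID -> (blockId -> on-disk replica path). Keeping one map per block
  // pool is what lets each pool be upgraded or rolled back independently.
  private final Map<String, Map<Long, String>> blockMapByPool =
      new ConcurrentHashMap<>();

  void addBlockPool(String bpid) {  // created when the DN first talks to an NN
    blockMapByPool.putIfAbsent(bpid, new ConcurrentHashMap<>());
  }

  void addReplica(ExtendedBlockId id, String replicaPath) {
    blockMapByPool.get(id.blockPoolId).put(id.blockId, replicaPath);
  }
}
```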
13. Other Changes
- Decommissioning: tools to initiate and monitor decommissioning at all the NNs.
- Balancer: allows balancing at the datanode or block pool level.
- Datanode daemons: the disk scanner and directory scanner are adapted to federation.
- NN Web UI: additionally shows the NN's block pool storage utilization.
14. New Cluster Manager Web UI
- Cluster summary: shows overall cluster storage utilization.
- List of namenodes: for each NN, its BPID, storage utilization, number of missing blocks, number of live and dead DNs, and a link to that NN's Web UI.
- Decommissioning status of DNs.
15. Managing Namespaces: Client-side Mount Table
- Federation has multiple namespaces. Don't you need a single global namespace? The key is to share the data and the names used to access the shared data. A global namespace is one way to do that, but even there we talk of several large "global" namespaces.
- A client-side mount table is another way to share: a shared mount table gives a "global" shared view, while a personalized mount table gives a per-application view.
- Share the data that matters by mounting it, e.g. /tmp, /home, /project, /data (see the sketch below).
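Hadoop's ViewFs implements such a client-side mount table. A minimal sketch of wiring one up programmatically, assuming two federated namenodes; the nn1/nn2 host names and the "clusterX" mount-table name are made-up examples:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsMountTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Route clients through ViewFs; "clusterX" names the mount table.
    conf.set("fs.defaultFS", "viewfs://clusterX/");

    // Each link maps a path in the shared view to a namespace volume
    // owned by one of the federated namenodes (host names hypothetical).
    conf.set("fs.viewfs.mounttable.clusterX.link./tmp",
             "hdfs://nn1.example.com:8020/tmp");
    conf.set("fs.viewfs.mounttable.clusterX.link./home",
             "hdfs://nn1.example.com:8020/home");
    conf.set("fs.viewfs.mounttable.clusterX.link./project",
             "hdfs://nn2.example.com:8020/project");
    conf.set("fs.viewfs.mounttable.clusterX.link./data",
             "hdfs://nn2.example.com:8020/data");

    // Clients see one namespace; ViewFs resolves each path to the right NN.
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus st : fs.listStatus(new Path("/project"))) {
      System.out.println(st.getPath());
    }
  }
}
```

A personalized, per-application view is simply a different mount table name (or different link values) in that application's configuration.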
16. Impact On Existing Deployments
- Very little impact on clusters with a single NN: the old configuration runs as is.
- Only two commands change: NN format and the first upgrade take a new ClusterID option.
- During design and implementation, a lot of effort went into ensuring that single-NN deployments work as is, backed by a lot of testing effort to validate it.