2. GFS is a scalable distributed file system
for large, distributed, data-intensive
applications.
3. Google made key observations that led
them to build their own DFS.
Cost-Effectiveness:
› The system is built from inexpensive
commodity components, where component
failure is the norm rather than the exception.
› The system must therefore detect, tolerate,
and recover from failures on a routine basis.
4. File Size:
› Multi-GB files are the common case, so the
system must be optimized for managing large
files.
› Small files are also supported, but there is
no need to optimize for them.
5. Read Operations:
› Large Streaming Reads
An individual operation reads hundreds of KBs,
often 1 MB or more.
Successive operations from the same client
usually read through a contiguous region of the
same file.
› Small Random Reads
An operation reads a few KBs starting at an
arbitrary offset.
Performance-conscious applications usually
batch and sort their small reads to advance
steadily through the file instead of going back
and forth (see the sketch below).
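To make the batching idea concrete, here is a minimal Go sketch of sorting pending small reads by offset so access advances steadily through the file; readReq and sortReads are illustrative names, not part of GFS.

```go
package main

import (
	"fmt"
	"sort"
)

// readReq is a hypothetical small-read request: a byte offset and a length.
type readReq struct {
	offset int64
	length int
}

// sortReads orders pending small reads by offset so the application
// moves steadily forward through the file instead of seeking back and forth.
func sortReads(reqs []readReq) {
	sort.Slice(reqs, func(i, j int) bool { return reqs[i].offset < reqs[j].offset })
}

func main() {
	reqs := []readReq{{offset: 8192, length: 512}, {offset: 0, length: 1024}, {offset: 4096, length: 256}}
	sortReads(reqs)
	for _, r := range reqs {
		fmt.Printf("read %d bytes at offset %d\n", r.length, r.offset)
	}
}
```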
6. Write Operations:
› Writes are similar in size to the large
streaming reads.
› Once written, files are seldom modified.
› Writes typically take the form of large
sequential appends.
› Small random writes are supported but need
not be efficient.
7. Transaction Management:
› Applications typically use GFS in a
producer-consumer pattern.
› Many producers may append to the same
file concurrently.
› Atomicity with minimal synchronization
overhead between producers is essential,
as sketched below.
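A minimal Go sketch of that pattern under an append-style operation where the system, not the caller, chooses where the record lands, so producers need no coordination of their own. The recordLog type is a toy in-memory stand-in for a GFS file, not the real implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// recordLog is a toy stand-in for a GFS file opened for record append.
type recordLog struct {
	mu      sync.Mutex
	records [][]byte
}

// appendRecord mimics record-append semantics: the log, not the caller,
// picks where the record lands, so concurrent producers never clash.
func (l *recordLog) appendRecord(rec []byte) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.records = append(l.records, rec)
	return len(l.records) - 1 // position chosen by the system
}

func main() {
	var l recordLog
	var wg sync.WaitGroup
	for p := 0; p < 3; p++ { // three concurrent producers
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			at := l.appendRecord([]byte(fmt.Sprintf("record from producer %d", p)))
			fmt.Printf("producer %d appended at position %d\n", p, at)
		}(p)
	}
	wg.Wait()
}
```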
8. Latency vs. High Sustained Bandwidth:
› Clients do not have tight SLAs on read and
write response times; instead they care
about processing and moving bulk data at
a high rate.
9. GFS provides an interface with the
following operations (sketched below):
› Create
› Delete
› Open
› Close
› Read
› Write
› Snapshot (Copy)
› Record Append
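The operation names above come straight from the GFS paper. As a sketch of how they might be exposed to applications, here is a hypothetical Go client interface; the signatures and the Handle type are assumptions of this sketch, not a real GFS API.

```go
package gfs

// Handle is an opaque reference to an open file (an assumption of this sketch).
type Handle struct{ id uint64 }

// Client lists the eight operations from the slide above, with guessed signatures.
type Client interface {
	Create(path string) error
	Delete(path string) error
	Open(path string) (Handle, error)
	Close(h Handle) error
	Read(h Handle, offset int64, buf []byte) (int, error)
	Write(h Handle, offset int64, data []byte) (int, error)
	// Snapshot makes a low-cost copy of a file or directory tree.
	Snapshot(src, dst string) error
	// RecordAppend appends data atomically and returns the offset GFS chose.
	RecordAppend(h Handle, data []byte) (int64, error)
}
```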
10. The system is organized into clusters.
Each cluster has the following
components:
› A single cluster master
› Multiple chunk servers
› Multiple clients
Files are divided into fixed-size chunks.
Chunk size is 64 MB (see the offset
translation sketch below).
The master assigns a globally unique 64-bit
identifier called a chunk handle to each
chunk upon creation.
Chunk servers store chunks on their local disks.
For reliability, each chunk is replicated
across multiple chunk servers.
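Because the chunk size is a fixed 64 MB, a client can translate a file byte offset into a chunk index locally, then ask the master for that chunk's handle and replica locations. A minimal sketch of the translation (locate is an illustrative name):

```go
package main

import "fmt"

const chunkSize = 64 << 20 // 64 MB, as stated above

// locate translates a byte offset within a file into the chunk index
// and the offset inside that chunk.
func locate(offset int64) (chunkIndex, chunkOffset int64) {
	return offset / chunkSize, offset % chunkSize
}

func main() {
	idx, off := locate(200 << 20) // byte 200 MB into the file
	fmt.Printf("chunk index %d, offset %d within chunk\n", idx, off)
	// prints: chunk index 3, offset 8388608 within chunk
}
```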
12. The master maintains file system
metadata and coordinates system-wide
activities (sketched below):
› The file and chunk namespaces.
› The mapping from files to chunks.
› The current locations of chunks.
› Chunk lease management.
› Garbage collection.
› Chunk migration between chunk servers.
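A sketch of how the master's in-memory tables might be laid out; field names and types are illustrative, not GFS source code. One detail worth noting from the paper: chunk locations are not persisted, the master learns them from chunk servers at startup and through heartbeat messages.

```go
package main

import "fmt"

// masterState models the metadata described above.
type masterState struct {
	// File namespace: path -> ordered list of chunk handles.
	files map[string][]uint64
	// Chunk handle -> addresses of chunk servers holding a replica
	// (kept in memory only, refreshed via heartbeats).
	locations map[uint64][]string
	// Chunk handle -> chunk server currently holding the lease (the primary).
	leases map[uint64]string
}

func main() {
	m := masterState{
		files:     map[string][]uint64{"/logs/web.0": {0x1a2b, 0x1a2c}},
		locations: map[uint64][]string{0x1a2b: {"cs1:7077", "cs4:7077", "cs9:7077"}},
		leases:    map[uint64]string{0x1a2b: "cs1:7077"},
	}
	fmt.Println("chunks of /logs/web.0:", m.files["/logs/web.0"])
	fmt.Println("replicas of first chunk:", m.locations[0x1a2b])
}
```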
16. Master operations:
› Namespace management and locking.
› Replica placement.
› Replica creation, re-replication, and
rebalancing (sketched below).
› Garbage collection.
› Stale replica detection.
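As an illustration of the re-replication item above: when the number of live replicas of a chunk falls below its replication goal (three by default in GFS), the master schedules a new copy. A minimal Go sketch of that trigger; the names are assumptions, not GFS code.

```go
package main

import "fmt"

// needsReReplication reports whether a chunk has fallen below its
// replication goal and should be copied to another chunk server.
func needsReReplication(liveReplicas, goal int) bool {
	return liveReplicas < goal
}

func main() {
	const goal = 3 // GFS default replication level
	for handle, live := range map[uint64]int{0x1a2b: 3, 0x1a2c: 1} {
		if needsReReplication(live, goal) {
			fmt.Printf("chunk %#x: %d of %d replicas, schedule re-replication\n",
				handle, live, goal)
		}
	}
}
```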