11: Google Filesystem
Zubair Nabi
zubair.nabi@itu.edu.pk
April 20, 2013
Zubair Nabi 11: Google Filesystem April 20, 2013 1 / 29
Outline
1 Introduction
2 Google Filesystem
3 Hadoop Distributed Filesystem
Filesystem
The purpose of a filesystem is to:
1 Organize and store data
2 Support sharing of data among users and applications
3 Ensure persistence of data after a reboot
Examples include FAT, NTFS, ext3, ext4, etc.
Distributed filesystem
Self-explanatory: the filesystem is distributed across many machines
The DFS provides a common abstraction to the dispersed files
Each DFS has an associated API through which clients perform the
normal file operations, such as create, read, write, etc.
Maintains a namespace which maps logical names to physical names
Simplifies replication and migration
Examples include the Network File System (NFS), the Andrew File
System (AFS), etc.
Outline
1 Introduction
2 Google Filesystem
3 Hadoop Distributed Filesystem
Introduction
Designed by Google to meet its massive storage needs
Shares many goals with previous distributed filesystems such as
performance, scalability, reliability, and availability
At the same time, its design was driven by key observations of Google's
workloads and infrastructure, both current and anticipated
Design Goals
1 Failure is the norm rather than the exception: GFS must
constantly monitor itself and automatically recover from component failures
2 The system stores a modest number of large files: Optimize for large
files, on the order of GBs, but still support small files
3 Applications prefer to do large streaming reads of contiguous
regions: Optimize for this case
Design Goals (2)
4 Most applications perform large, sequential writes that are mostly
append operations: Support small writes but do not optimize for them
5 Most operations are producer-consumer queues or many-way
merges: Support hundreds of clients reading and writing concurrently
6 Applications process data in bulk at a high rate: Favour throughput
over latency
Interface
The interface is similar to that of traditional filesystems, but it does
not support a standard API such as POSIX
Files are organized hierarchically into directories with pathnames
Support for create, delete, open, close, read, and write operations
Architecture
Consists of a single master and multiple chunkservers
The system can be accessed by multiple clients
Both the master and chunkservers run as user-space server processes
on commodity Linux machines
Files
Files are sliced into fixed-size chunks
Each chunk is identifiable by an immutable and globally unique 64-bit
handle
Chunks are stored by chunkservers as local Linux files
Reads and writes to a chunk are specified by a handle and a byte
range
Each chunk is replicated on multiple chunkservers (3 by default)
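The chunk addressing above can be sketched in a few lines of Python. This is a toy illustration, not GFS's actual code: the 64 MB chunk size matches GFS, but `new_handle` and `chunk_requests` are hypothetical names, and real handles are assigned by the master.

```python
import uuid

CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses 64 MB chunks

def new_handle():
    # Stand-in for the immutable, globally unique 64-bit chunk handle
    return uuid.uuid4().int & (2**64 - 1)

def chunk_requests(chunk_handles, offset, length):
    """Translate a (file offset, length) read into per-chunk requests.

    chunk_handles: the file's chunks, in order.
    Returns (handle, chunk_offset, num_bytes) tuples, mirroring how a
    client turns a file byte range into handle + byte-range operations.
    """
    requests = []
    end = offset + length
    while offset < end:
        index = offset // CHUNK_SIZE
        chunk_offset = offset % CHUNK_SIZE
        n = min(CHUNK_SIZE - chunk_offset, end - offset)
        requests.append((chunk_handles[index], chunk_offset, n))
        offset += n
    return requests
```

A read that straddles a chunk boundary simply becomes two requests, one per chunk.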
Master
In charge of all filesystem metadata
Namespace, access control information, mapping between files and
chunks, and current locations of chunks
Holds this information in memory and regularly syncs it with a log file
Also in charge of chunk leasing, garbage collection, and chunk
migration
Periodically sends each chunkserver a heartbeat signal to check its
state and send it instructions
Clients interact with it to access metadata but all data-bearing
communication goes directly to the relevant chunkservers
As a result, the master does not become a performance bottleneck
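The master's metadata role can be pictured as three mappings. The sketch below is a deliberately simplified toy (the names and structure are illustrative; the real master also tracks leases, access control, and chunk versions), but it shows why the master stays off the data path: a client's request returns only a handle and locations, and the data itself flows directly between client and chunkservers.

```python
# Toy model of the master's in-memory metadata (illustrative names).
namespace = {"/logs/web.log": "file-001"}          # pathname -> file id
file_chunks = {"file-001": [0x1A2B, 0x3C4D]}       # file id -> chunk handles
chunk_locations = {0x1A2B: ["cs1", "cs7", "cs9"],  # handle -> chunkservers
                   0x3C4D: ["cs2", "cs7", "cs8"]}

def lookup(path, chunk_index):
    """A client's metadata request: which chunkservers hold chunk i of path?

    The client then reads or writes chunk data directly from those
    chunkservers; no file data ever passes through the master.
    """
    handle = file_chunks[namespace[path]][chunk_index]
    return handle, chunk_locations[handle]
```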
Consistency Model: Master
All namespace mutations (such as file creation) are atomic as they are
exclusively handled by the master
Namespace locking guarantees atomicity and correctness
The operation log maintained by the master defines a global total order
of these operations
Consistency Model: Data
The state after mutation depends on:
Mutation type: write or append
Whether it succeeds or fails
Whether there are other concurrent mutations
A file region is consistent if all clients see the same data, regardless
of the replica
A region is defined after a mutation if it is still consistent and clients
see the mutation in its entirety
Consistency Model: Data (2)
If there are no other concurrent writers, the region is defined and
consistent
Concurrent and successful mutations leave the region undefined but
consistent
Mingled fragments from multiple mutations
A failed mutation makes the region both inconsistent and undefined
Mutation Operations
Each chunk has many replicas
The primary replica holds a lease from the master
It decides the order of all mutations for all replicas
Write Operation
Client obtains the location of replicas and the identity of the primary
replica from the master
It then pushes the data to all replica nodes
The client then sends the write request to the primary
The primary forwards the request to all secondary replicas
It waits for a reply from all of them before returning to the client
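The steps above can be sketched as a toy simulation. This is not GFS's actual protocol code; replicas are modeled as plain dicts and the function name is hypothetical, but the control flow (push data first, then commit through the primary, ack only when every replica has applied the write) follows the slide.

```python
def gfs_write(data, primary, secondaries):
    """Toy model of the GFS write path; replicas are dicts standing in
    for chunkservers."""
    # 1. Client pushes the data to every replica, where it is buffered
    #    but not yet applied.
    for replica in [primary] + secondaries:
        replica["buffer"] = data
    # 2. Client sends the write request to the primary, which decides
    #    the mutation order and applies the write to its own copy.
    primary["chunk"] += primary.pop("buffer")
    # 3. The primary forwards the request to all secondary replicas,
    #    which apply the buffered data in the same order.
    for replica in secondaries:
        replica["chunk"] += replica.pop("buffer")
    # 4. Only after every replica has replied does the client get an ack.
    return "ok"
```

Separating the data push (step 1) from the commit (steps 2-4) lets the bulk data flow along a chosen network path while the primary imposes a single mutation order.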
Record Append Operation
Performed atomically
Append location is chosen by GFS and communicated to the client
Primary forwards the write request to all replicas
It waits for a reply from all replicas before returning to the client
1 If the record fits in the current chunk, it is written and its offset
communicated to the client
2 If it does not, the chunk is padded and the client is told to try the next
chunk
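The primary's fits-or-pad decision can be sketched as follows. This is a toy version (the function name is hypothetical, and the real primary also replicates the decision to the secondaries), but it captures the two cases on the slide.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS chunk size

def record_append(chunk, record, chunk_size=CHUNK_SIZE):
    """Toy model of the primary's decision for a record append.

    chunk: bytearray holding the current chunk's contents.
    Returns the offset GFS chose for the record, or None when the chunk
    was padded and the client must retry on the next chunk.
    """
    if len(chunk) + len(record) <= chunk_size:
        offset = len(chunk)          # append location chosen by GFS
        chunk.extend(record)
        return offset
    # Record does not fit: pad the chunk to its boundary so no later
    # write lands in it, and tell the client to try the next chunk.
    chunk.extend(b"\0" * (chunk_size - len(chunk)))
    return None
```

Padding rather than splitting keeps every record whole within one chunk, which is what makes the append atomic from the client's point of view.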
Application Safeguards
Use record append rather than write
Insert checksums in record headers to detect fragments
Insert sequence numbers to detect duplicates
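The checksum and sequence-number safeguards can be sketched as a small record-framing scheme. The header layout here (8-byte sequence number, 4-byte CRC32, 4-byte length, big-endian) is hypothetical, chosen only to illustrate the idea; GFS applications chose their own formats.

```python
import struct
import zlib

def frame(seq, payload):
    """Wrap a record so readers can detect duplicates and fragments:
    the sequence number exposes replays, the checksum exposes partial
    or corrupted records."""
    return struct.pack(">QII", seq, zlib.crc32(payload), len(payload)) + payload

def parse(buf):
    """Recover (seq, payload) from a framed record, rejecting fragments."""
    seq, crc, n = struct.unpack(">QII", buf[:16])
    payload = buf[16:16 + n]
    if zlib.crc32(payload) != crc:
        raise ValueError("corrupt or fragmentary record")  # reader skips it
    return seq, payload
```

A reader that sees the same sequence number twice drops the duplicate; a record whose checksum fails is treated as padding or a fragment and skipped.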
Chunk Placement
Put on chunkservers with below average disk space usage
Limit number of “recent” creations on a chunkserver, to ensure that it
does not experience any traffic spike due to its fresh data
For reliability, replicas spread across racks
Garbage Collection
Chunks become garbage when they are orphaned
A lazy reclamation strategy is used by not reclaiming chunks at delete
time
Each chunkserver communicates the subset of its current chunks to
the master in the heartbeat signal
Master pinpoints chunks which have been orphaned
The chunkserver finally reclaims that space
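The lazy reclamation cycle above can be modeled in a few lines. This is an illustrative toy (all names are made up; the real master renames deleted files to hidden names and reclaims them after a grace period), but the division of labour matches the slide: delete is cheap, and reclamation happens as a side effect of heartbeats.

```python
# Toy model of lazy garbage collection (illustrative names).
deleted_files = set()
file_chunks = {"f1": {101, 102}, "f2": {201}}  # file -> its chunk handles

def delete(path):
    deleted_files.add(path)  # lazy: no chunk is reclaimed at delete time

def live_chunks():
    """Chunks still reachable from some non-deleted file."""
    live = set()
    for f, chunks in file_chunks.items():
        if f not in deleted_files:
            live |= chunks
    return live

def heartbeat(reported_chunks):
    """Master's reply to a chunkserver's heartbeat: which of the chunks
    it reported are orphaned, so the chunkserver may reclaim them."""
    return [c for c in reported_chunks if c not in live_chunks()]
```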
Stale Replica Detection
Each chunk is assigned a version number
Each time a new lease is granted, the version number is incremented
Stale replicas will have outdated version numbers
They are simply garbage collected
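The version-number mechanism can be sketched as follows. Again a toy with hypothetical names, but the logic is the one on the slide: each lease grant bumps the version, replicas that are up record the new number, and any replica left behind is detectably stale.

```python
# Toy model of stale-replica detection via chunk version numbers.
chunk_version = {}    # master's view: handle -> current version
replica_version = {}  # (handle, chunkserver) -> version held there

def grant_lease(handle, live_servers):
    """Granting a new lease increments the chunk's version number;
    the replicas that are up learn it before any mutation is applied."""
    chunk_version[handle] = chunk_version.get(handle, 0) + 1
    for s in live_servers:
        replica_version[(handle, s)] = chunk_version[handle]

def stale_replicas(handle, servers):
    """A replica that was down during a lease grant keeps an outdated
    version number; the master simply garbage-collects it."""
    return [s for s in servers
            if replica_version.get((handle, s), 0) < chunk_version[handle]]
```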
Outline
1 Introduction
2 Google Filesystem
3 Hadoop Distributed Filesystem
Introduction
Open-source clone of GFS
Comes packaged with Hadoop
Master is called the NameNode and chunkservers are called
DataNodes
Chunks are known as blocks
Exposes a Java API and a command-line interface
Command-line API
Accessible through: bin/hdfs dfs -command args
Useful commands: cat, copyFromLocal, copyToLocal, cp,
ls, mkdir, moveFromLocal, moveToLocal, mv, rm, etc. [1]
[1] http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html
References
1 Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The
Google File System. In Proceedings of the Nineteenth ACM Symposium
on Operating Systems Principles (SOSP '03). ACM, New York, NY,
USA, 29-43.

Lab 5: Interconnecting a Datacenter using MininetLab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using Mininet
 
Lab 4: Interfacing with Cassandra
Lab 4: Interfacing with CassandraLab 4: Interfacing with Cassandra
Lab 4: Interfacing with Cassandra
 

Recently uploaded

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

Topic 11: Google Filesystem

  • 1. 11: Google Filesystem Zubair Nabi zubair.nabi@itu.edu.pk April 20, 2013 Zubair Nabi 11: Google Filesystem April 20, 2013 1 / 29
  • 2-3. Outline: 1 Introduction, 2 Google Filesystem, 3 Hadoop Distributed Filesystem
  • 4-7. Filesystem. The purpose of a filesystem is to: 1 Organize and store data; 2 Support sharing of data among users and applications; 3 Ensure persistence of data after a reboot. Examples include FAT, NTFS, ext3, ext4, etc.
  • 8-13. Distributed filesystem. Self-explanatory: the filesystem is distributed across many machines. The DFS provides a common abstraction over the dispersed files. Each DFS exposes an API of normal file operations, such as create, read, write, etc., to its clients. It maintains a namespace which maps logical names to physical locations, which simplifies replication and migration. Examples include the Network Filesystem (NFS), the Andrew Filesystem (AFS), etc.
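The logical-to-physical mapping that a DFS namespace maintains can be sketched as a lookup table. All names and the location format here are hypothetical, not taken from any real DFS:

```python
# Minimal sketch of a DFS namespace: clients resolve a logical pathname
# to the physical replicas without knowing where the bytes actually live.
namespace = {
    "/logs/web-0001": ["serverA:/data/f17", "serverB:/data/f17"],
}

def locate(path):
    """Resolve a logical pathname to its list of physical replicas."""
    return namespace[path]
```

Because clients only ever see the logical name, replication and migration reduce to editing the right-hand side of this table.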
  • 14. Outline: 1 Introduction, 2 Google Filesystem, 3 Hadoop Distributed Filesystem
  • 15-17. Introduction. Designed by Google to meet its massive storage needs. Shares many goals with previous distributed filesystems, such as performance, scalability, reliability, and availability. At the same time, the design is driven by key observations of Google's workload and infrastructure, both current and future.
  • 18-20. Design Goals. 1 Failure is the norm rather than the exception: GFS must constantly introspect and automatically recover from failure. 2 The system stores a fair number of large files: optimize for large files, on the order of GBs, but still support small files. 3 Applications prefer large streaming reads of contiguous regions: optimize for this case.
  • 21-23. Design Goals (2). 4 Most applications perform large, sequential writes that are mostly appends: support small writes but do not optimize for them. 5 Many workloads are producer-consumer queues or many-way merges: support hundreds of clients reading or writing concurrently. 6 Applications process data in bulk at a high rate: favour throughput over latency.
  • 24-26. Interface. The interface is similar to traditional filesystems, but there is no support for a standard POSIX-like API. Files are organized hierarchically into directories and identified by pathnames. Support for create, delete, open, close, read, and write operations.
  • 27-29. Architecture. Consists of a single master and multiple chunkservers, and the system can be accessed by multiple clients. Both the master and the chunkservers run as user-space server processes on commodity Linux machines.
  • 30-35. Files. Files are sliced into fixed-size chunks. Each chunk is identified by an immutable and globally unique 64-bit handle. Chunks are stored by chunkservers as local Linux files. Reads and writes to a chunk are specified by a handle and a byte range. Each chunk is replicated on multiple chunkservers (3 by default).
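The fixed-size chunk scheme makes addressing simple offset arithmetic, which can be sketched as follows (GFS's default chunk size is 64 MB):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS's default chunk size: 64 MB

def locate_in_file(byte_offset):
    """Map a byte offset in a file to (chunk index, offset within that chunk)."""
    return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE
```

The client translates a file offset into a chunk index this way, then asks the master which chunkservers hold that chunk's replicas.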
  • 36-42. Master. In charge of all filesystem metadata: the namespace, access control information, the mapping between files and chunks, and the current locations of chunks. Holds this information in memory and regularly syncs it with a log file. Also in charge of chunk leasing, garbage collection, and chunk migration. Periodically sends each chunkserver a heartbeat signal to check its state and send it instructions. Clients interact with it only for metadata; all data-bearing communication goes directly to the relevant chunkservers. As a result, the master does not become a performance bottleneck.
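The master's in-memory metadata can be sketched as three maps. The field names and values here are illustrative, not the actual GFS data structures:

```python
# Sketch of the master's metadata: namespace, file-to-chunk mapping,
# chunk locations, and access control information.
master = {
    "namespace": {"/logs/web-0001": ["c-1", "c-2"]},    # file -> ordered chunk handles
    "locations": {"c-1": ["cs-a", "cs-b", "cs-c"],      # handle -> chunkservers (3 replicas)
                  "c-2": ["cs-a", "cs-c", "cs-d"]},
    "acl":       {"/logs": "rw:crawler"},               # access control information
}

def chunkservers_for(path, chunk_no):
    """Metadata lookup a client performs before talking to chunkservers directly."""
    handle = master["namespace"][path][chunk_no]
    return master["locations"][handle]
```

A client performs one cheap lookup like this and then streams data to or from the chunkservers, which is why the master stays off the data path.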
  • 43. (Figure-only slide, 14 / 29.)
  • 44-46. Consistency Model: Master. All namespace mutations (such as file creation) are atomic, as they are handled exclusively by the master. Namespace locking guarantees atomicity and correctness. The operation log maintained by the master defines a global total order of these operations.
  • 47-51. Consistency Model: Data. The state of a file region after a mutation depends on: the mutation type (write or append); whether it succeeds or fails; and whether there are other concurrent mutations. A region is consistent if all clients see the same data, regardless of the replica. A region is defined after a mutation if it is consistent and clients see the mutation in its entirety.
  • 52-55. Consistency Model: Data (2). If there are no other concurrent writers, the region is defined and consistent. Concurrent successful mutations leave the region consistent but undefined: it may hold mingled fragments from multiple mutations. A failed mutation makes the region both inconsistent and undefined.
  • 56-58. Mutation Operations. Each chunk has many replicas. The primary replica holds a lease from the master and decides the order of all mutations for all replicas.
  • 59-63. Write Operation. The client obtains the locations of the replicas and the identity of the primary from the master. It then pushes the data to all replica nodes and issues an update request to the primary. The primary forwards the write request to all replicas and waits for a reply from all of them before returning to the client.
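The write steps above can be sketched as control flow. The classes and method names are stubs for illustration; the real system uses RPCs and pushes data along a pipelined chain of chunkservers rather than directly from the client:

```python
class Replica:
    """Stub chunkserver replica: buffers pushed data, applies ordered mutations."""
    def __init__(self):
        self.buf, self.log = None, []
    def buffer(self, data):            # data is pushed to every replica first
        self.buf = data
    def apply(self, serial):           # mutation applied in the primary's order
        self.log.append((serial, self.buf))
        return True                    # ack back to the primary

class Primary(Replica):
    """The lease holder: assigns a serial order to every mutation."""
    def __init__(self):
        super().__init__()
        self.serial = 0
    def assign_serial(self):
        self.serial += 1
        return self.serial

def write(data, primary, secondaries):
    replicas = [primary] + secondaries
    for r in replicas:                 # step 1: push data to all replicas
        r.buffer(data)
    s = primary.assign_serial()        # step 2: client asks primary to commit
    return all(r.apply(s) for r in replicas)  # step 3: return only when all ack
```

Separating the data push from the commit request is what lets GFS schedule the (large) data transfer independently of the (small) control messages.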
  • 64-69. Record Append Operation. Performed atomically, with the append location chosen by GFS and communicated to the client. The primary forwards the request to all replicas and waits for a reply from all of them before returning to the client. 1 If the record fits in the current chunk, it is written and communicated to the client. 2 If it does not, the chunk is padded and the client is told to try the next chunk.
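The fit-or-pad decision the primary makes can be sketched as follows (return values are illustrative, not the GFS wire protocol; 64 MB is GFS's default chunk size):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS's default chunk size

def try_append(chunk_used, record_len):
    """Decide whether a record fits in the current chunk.

    If it fits, it is written at an offset GFS (not the client) chooses.
    If not, the remainder of the chunk is padded and the client retries
    the append on the next chunk.
    """
    if chunk_used + record_len <= CHUNK_SIZE:
        return ("written", chunk_used)        # offset chosen by GFS
    return ("retry_next_chunk", None)         # chunk padded to its end
```

The padding is why readers can encounter gaps between records, which the application safeguards below are designed to detect.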
  • 70. (Figure-only slide, 21 / 29.)
  • 71-73. Application Safeguards. Use record append rather than write. Insert checksums in record headers to detect fragments. Insert sequence numbers to detect duplicates.
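A self-validating record combining both safeguards can be sketched like this. The 12-byte header layout is invented for illustration; it is not the format Google's applications use:

```python
import struct
import zlib

def pack_record(seq, payload):
    """Prepend a sequence number (detects duplicates from retried appends)
    and a CRC32 checksum (detects padding and mingled fragments)."""
    header = struct.pack(">QI", seq, zlib.crc32(payload))  # 8-byte seq + 4-byte crc
    return header + payload

def unpack_record(blob):
    """Validate and strip the header; raise if the payload is corrupt."""
    seq, crc = struct.unpack(">QI", blob[:12])
    payload = blob[12:]
    if zlib.crc32(payload) != crc:
        raise ValueError("fragment or padding detected")
    return seq, payload
```

A reader skips any region whose checksum fails (padding or a fragment) and drops any record whose sequence number it has already seen (a duplicate).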
Chunk Placement
Put on chunkservers with below-average disk space usage
Limit the number of “recent” creations on a chunkserver, to ensure that it does not experience a traffic spike due to its fresh data
For reliability, replicas are spread across racks
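The three placement heuristics above can be combined in a simple selection routine. This is a sketch under assumed data structures (the server dictionaries and thresholds are hypothetical), not the master's actual algorithm.

```python
def pick_chunkservers(servers, replicas=3, recent_limit=2):
    """Pick chunkservers for a new chunk's replicas (illustrative):
    prefer below-average disk usage, skip servers with too many recent
    chunk creations, and place at most one replica per rack."""
    avg = sum(s["disk_used"] for s in servers) / len(servers)
    candidates = [
        s for s in servers
        if s["disk_used"] <= avg and s["recent_creations"] < recent_limit
    ]
    chosen, racks = [], set()
    for s in sorted(candidates, key=lambda s: s["disk_used"]):
        if s["rack"] in racks:
            continue  # reliability: spread replicas across racks
        chosen.append(s["name"])
        racks.add(s["rack"])
        if len(chosen) == replicas:
            break
    return chosen
```

The "recent creations" cap matters because a freshly written chunk tends to attract reads immediately; without it, a nearly empty chunkserver would receive all new chunks and become a hotspot.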
Garbage Collection
Chunks become garbage when they are orphaned
A lazy reclamation strategy is used: chunks are not reclaimed at delete time
Each chunkserver communicates the subset of its current chunks to the master in the heartbeat signal
The master pinpoints chunks which have been orphaned
The chunkserver finally reclaims that space
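The heartbeat exchange above amounts to a set difference: anything a chunkserver holds that the namespace no longer references is garbage. A minimal sketch, with hypothetical class and method names:

```python
class Master:
    def __init__(self, live_chunks):
        # Chunk handles still referenced by some file in the namespace.
        self.live_chunks = set(live_chunks)

    def on_heartbeat(self, reported_chunks):
        """Reply with the chunks the chunkserver may reclaim: anything
        it reported that the namespace no longer references."""
        return set(reported_chunks) - self.live_chunks

class Chunkserver:
    def __init__(self, chunks):
        self.chunks = set(chunks)

    def heartbeat(self, master):
        # Report current chunks; delete whatever the master says is orphaned.
        orphans = master.on_heartbeat(self.chunks)
        self.chunks -= orphans  # lazily reclaim the space
        return orphans
```

Because reclamation piggybacks on regular heartbeats, deletion is cheap and uniform: a failed create, an old replica, or an explicitly deleted file all end up handled by the same path.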
Stale Replica Detection
Each chunk is assigned a version number
Each time a new lease is granted, the version number is incremented
Stale replicas will have outdated version numbers
They are simply garbage collected
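In code, the check is a straight comparison against the version the master recorded at the most recent lease grant. A sketch with hypothetical names:

```python
def stale_replicas(master_version: int, replica_versions: dict) -> list:
    """Return the chunkservers whose copy of a chunk lags the version
    number the master recorded at the last lease grant (illustrative).

    A replica that missed a mutation (e.g. while its server was down)
    keeps the old version number, so it is flagged here and later
    reclaimed by the regular garbage collector.
    """
    return [server for server, version in replica_versions.items()
            if version < master_version]
```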
Outline
1 Introduction
2 Google Filesystem
3 Hadoop Distributed Filesystem
Introduction
Open-source clone of GFS
Comes packaged with Hadoop
The master is called the NameNode and chunkservers are called DataNodes
Chunks are known as blocks
Exposes a Java API and a command-line interface
Command-line API
Accessible through: bin/hdfs dfs -command args
Useful commands: cat, copyFromLocal, copyToLocal, cp, ls, mkdir, moveFromLocal, moveToLocal, mv, rm, etc.1
1 http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html
References
1 Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google File system. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP ’03). ACM, New York, NY, USA, 29-43.