Case study: ext2 FS
The ext2 file system
• Second Extended Filesystem
  – The main Linux FS before ext3
  – Evolved from Minix filesystem (via “Extended Filesystem”)
• Features
  – Block size (1024, 2048, and 4096) configured at FS creation
  – inode-based FS
  – Performance optimisations to improve locality (from BSD
    FFS)
• Main problem: after an unclean unmount, e2fsck must check the whole FS
  – ext3fs avoids this by keeping a journal of (meta-data) updates
  – The journal is a file where updates are logged
  – ext3fs remains compatible with ext2fs
Recap: i-nodes
• Each file is represented by an inode on disk
• Inode contains all of a file’s metadata
   – Access rights, owner, accounting info
   – (partial) block index table of a file
• Each inode has a unique number
   – System oriented name
   – Try ‘ls -i’ on Unix (Linux)
• Directories map file names to inode numbers
   – Map human-oriented to system-oriented names




Ext2 i-nodes
[Figure: inode fields: mode, uid, gid, atime, ctime, mtime, size, block count, reference count, direct blocks (12), single indirect, double indirect, triple indirect]
• Mode
  – Type
    • Regular file or directory
  – Access mode
    • rwxrwxrwx
• Uid
  – User ID
• Gid
  – Group ID
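As a small illustration of the mode field, the sketch below splits a Unix mode word into its type and access-mode parts. It uses Python's standard stat module rather than any ext2 code, and the helper name describe_mode is an assumption made for this example.

```python
import stat

def describe_mode(mode: int) -> dict:
    """Split a Unix mode word into file type and access-mode bits."""
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "type": kind,
        # filemode() returns e.g. "-rw-r--r--"; drop the leading type character
        "access": stat.filemode(mode)[1:],
        "octal": oct(stat.S_IMODE(mode)),
    }

regular = describe_mode(0o100644)    # regular file, owner read/write
directory = describe_mode(0o040755)  # directory, rwxr-xr-x
```

The same function can be applied to os.stat(path).st_mode to inspect a real file.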
Inode Contents
• atime
  – Time of last access
• ctime
  – Time when the inode was last changed (not the file's creation time)
• mtime
  – Time when file was last modified
Inode Contents
• Size
  – Size of the file in bytes
• Block count
  – Number of disk blocks used by the file.
• Note that number of blocks can be much less than expected given the file size
  – Files can be sparsely populated
    • E.g. write(f, "hello", 5); lseek(f, 1000000, SEEK_SET); write(f, "world", 5);
    • Only needs to store the start and end of file, not all the empty blocks in between.
      – Size = 1000005
      – Blocks = 2 + overheads
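The sparse-file example can be tried directly. The following is a minimal sketch using Python's os module; the file name sparse.bin and the helper make_sparse_file are made up for illustration. The reported block count depends on the underlying file system, so only the size is predictable.

```python
import os
import tempfile

def make_sparse_file(path: str) -> os.stat_result:
    """Write "hello", seek to byte 1,000,000, write "world"; return the file's stat."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"hello")
        os.lseek(fd, 1_000_000, os.SEEK_SET)  # skip over a hole; nothing is written here
        os.write(fd, b"world")
    finally:
        os.close(fd)
    return os.stat(path)

with tempfile.TemporaryDirectory() as tmp:
    st = make_sparse_file(os.path.join(tmp, "sparse.bin"))
    size_bytes = st.st_size
    # st_blocks is counted in 512-byte units on Linux; its value is file-system dependent.
    allocated_bytes = st.st_blocks * 512
    print(size_bytes, allocated_bytes)
```

On a file system that supports holes, allocated_bytes is typically far smaller than size_bytes.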
Inode Contents
• Direct Blocks
  – Block numbers of first 12 blocks in the file
  – Most files are small
    • We can find blocks of file directly from the inode
[Figure: the 12 direct block entries (40, 58, 26, 8, 12, 44, 62, 30, 10, 42, 3, 21) locate file blocks 0 to 11 on the disk]
Problem
• How do we store files greater than 12
  blocks in size?
  – Adding significantly more direct entries in the
    inode results in many unused entries most of
    the time.




Inode Contents
• Single Indirect Block
  – Block number of a block containing block numbers
[Figure: the single indirect entry holds block number 32; that block on disk holds the block numbers of file blocks 12 to 17]
Single Indirection
• Requires two disk accesses to read
  – One for the indirect block; one for the target block
• Max File Size
  – Assume 1 Kbyte block size, 4 byte block numbers
  – 12 * 1K + 1K/4 * 1K = 12K + 256K = 268 Kbytes
• For the large majority of files (< 268 K), given the
  inode, only one or two further accesses are required
  to read any block in the file.


Inode Contents
• Double Indirect Block
  – Block number of a block containing block numbers of blocks containing block numbers
• Triple Indirect
  – Block number of a block containing block numbers of blocks containing block numbers of blocks containing block numbers
UNIX Inode Block Addressing Scheme
Max File Size
• Assume 4 byte block numbers and 1K blocks
• The number of addressable blocks
  –   Direct Blocks = 12
  –   Single Indirect Blocks = 256
  –   Double Indirect Blocks = 256 * 256 = 65536
  –   Triple Indirect Blocks = 256 * 256 * 256 = 16777216
• Max File Size
  – 12 + 256 + 65536 + 16777216 = 16843020 blocks ≈ 16 GB
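The arithmetic above can be checked with a few lines of code. This is only a sketch of the calculation; the function name addressable_blocks and its parameters are assumptions introduced here, not part of ext2.

```python
def addressable_blocks(block_size: int = 1024, ptr_size: int = 4,
                       direct: int = 12, levels: int = 3) -> int:
    """Blocks reachable from one inode using `levels` levels of indirection."""
    per_block = block_size // ptr_size  # block numbers held by one indirect block
    return direct + sum(per_block ** k for k in range(1, levels + 1))

single_only = addressable_blocks(levels=1)   # direct + single indirect
all_levels = addressable_blocks()            # direct + single + double + triple
size_gib = all_levels * 1024 / 2 ** 30       # 1K blocks, expressed in GiB
```

With 1K blocks, single_only corresponds to the 268 Kbyte limit of single indirection, and all_levels to the roughly 16 GB maximum.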
Where is the data block number stored?
• Assume 4K blocks, 4 byte block numbers, 12 direct blocks
• A file holding 1 byte of data, produced by
  lseek(fd, 1048576, SEEK_SET); /* 1 megabyte */
  write(fd, "x", 1);
• What if we add
  lseek(fd, 5242880, SEEK_SET); /* 5 megabytes */
  write(fd, "x", 1);
Solution
block #                              location
0 through 11                         Direct blocks
12 through (11 + 1024 = 1035)        Single-indirect block
1036 through (1035 + 1024*1024       Double-indirect blocks
= 1049611)

Address = 1048576 ==> block number = 1048576/4096 = 256
Block number = 256 ==> index in the single-indirect block = 256-12 = 244

Address = 5242880 ==> block number = 5242880/4096 = 1280
Block number = 1280 ==> index in the double-indirect block = (1280-1036)/1024 = 244/1024 = 0
Index in the second-level indirect block = (1280-1036) mod 1024 = 244
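The lookup worked through above can be written as a short function. This is an illustrative sketch under the slide's assumptions (4K blocks, 4 byte block numbers, 12 direct blocks); the name locate and the tuple format it returns are invented for this example.

```python
BLOCK_SIZE = 4096
PTRS = BLOCK_SIZE // 4   # 1024 block numbers per indirect block
DIRECT = 12

def locate(offset: int):
    """Return which inode pointer covers a byte offset, plus the index at each level."""
    block = offset // BLOCK_SIZE
    if block < DIRECT:
        return ("direct", block)
    block -= DIRECT
    if block < PTRS:
        return ("single", block)
    block -= PTRS
    if block < PTRS * PTRS:
        # index in the double-indirect block, then index in the second-level block
        return ("double", block // PTRS, block % PTRS)
    block -= PTRS * PTRS
    if block < PTRS ** 3:
        return ("triple", block // (PTRS * PTRS), (block // PTRS) % PTRS, block % PTRS)
    raise ValueError("offset beyond maximum file size")

at_1mb = locate(1048576)   # the byte written at 1 megabyte
at_5mb = locate(5242880)   # the byte written at 5 megabytes
```

The number of indices returned also gives the number of indirect blocks that must be read before the data block itself.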
Some Best and Worst Case Access Patterns
Assume inode already in memory
• To read 1 byte
   – Best:
      • 1 access via direct block
   – Worst:
      • 4 accesses via the triple indirect block
• To write 1 byte
   – Best:
      • 1 write via direct block (with no previous content)
   – Worst:
      • 4 reads (to get previous contents of block via triple indirect) + 1 write
        (to write modified block back)
Worst Case Access Patterns with Unallocated Indirect Blocks
• Worst to write 1 byte
   – 4 writes (3 indirect blocks; 1 data)
   – 1 read, 4 writes (read-write 1 indirect, write 2; write 1 data)
   – 2 reads, 3 writes (read 1 indirect, read-write 1 indirect, write 1;
     write 1 data)
   – 3 reads, 2 writes (read 2, read-write 1; write 1 data)
• Worst to read 1 byte
   – If reading writes a zero-filled block on disk
       • Worst case is same as write 1 byte
   – If not, worst case depends on how deep the current indirect
     block tree is.
Inode Summary
• The inode contains the on-disk data associated with a
  file
   –   Contains mode, owner, and other bookkeeping
   –   Efficient random and sequential access via indexed allocation
   –   Small files (the majority of files) require only a single access
   –   Larger files require progressively more disk accesses for
       random access
        • Sequential access is still efficient
   – Can support really large files via increasing levels of indirection




Where/How are Inodes Stored
    Boot Block | Super Block | Inode Array | Data Blocks

• System V Disk Layout (s5fs)
  – Boot Block
     • Contains code to bootstrap the OS
  – Super Block
     • Contains attributes of the file system itself
         – e.g. size, number of inodes, start block of inode array, start of data
           block area, free inode list, free data block list
  – Inode Array
  – Data blocks
Some problems with s5fs
• Inodes at start of disk; data blocks at end
  – Long seek times
     • Must read inode before reading data blocks
• Only one superblock
  – Corrupt the superblock and the entire file system is lost
• Block allocation was suboptimal
  – Consecutive free block list created at FS format time
     • Allocation and de-allocation eventually randomise the list,
       resulting in random allocation
• Inodes also allocated randomly
  – Directory listing resulted in random inode access patterns
Berkeley Fast Filesystem (FFS)
• Historically followed s5fs
  – Addressed many limitations with s5fs
  – ext2fs mostly similar




Layout of an Ext2 FS
     Boot Block | Block Group 0 | …. | Block Group n

• Partition:
  – Reserved boot block,
  – Collection of equally sized block groups
  – All block groups have the same structure
Layout of a Block Group
Super Block | Group Descriptors | Data Block Bitmap | Inode Bitmap | Inode Table | Data blocks
1 blk       | n blks            | 1 blk             | 1 blk        | m blks      | k blks
 • Replicated super block
     – For e2fsck
 • Group descriptors
 • Bitmaps identify used inodes/blocks
 • All block groups have the same number of data blocks
 • Advantages of this structure:
     – Replication simplifies recovery
     – Proximity of inode tables and data blocks (reduces seek time)
Superblocks
• Size of the file system, block size and similar
  parameters
• Overall free inode and block counters
• Data indicating whether file system check is
  needed:
   –   Uncleanly unmounted
   –   Inconsistency
   –   Certain number of mounts since last check
   –   Certain time expired since last check
• Replicated to provide redundancy to aid
  recoverability

Group Descriptors
• Location of the bitmaps
• Counter for free blocks and inodes in this
  group
• Number of directories in the group




Performance considerations
• EXT2 optimisations
  – Block groups cluster related inodes and data blocks
  – Read-ahead for directories
     • For directory searching
  – Pre-allocation of blocks on write (up to 8 blocks)
     • 8 bits in bit tables
     • Better contiguity when there are concurrent writes
• FFS optimisations
  – Aim to store files within a directory in the same group


Thus far…
• Inodes representing files laid out on disk.
• Inodes are referred to by number!!!
  – How do users name files? By number?




Ext2fs Directories
Entry fields: inode | rec_len | name_len | type | name…
• Directories are files of a special type
   – Consider it a file of special format, managed by the kernel, that uses
     most of the same machinery to implement it
       • Inodes, etc…
• Directories translate names to inode numbers
• Directory entries are of variable length
• Entries can be deleted in place
   – inode = 0
   – Add to length of previous entry
   – use null terminated strings for names


Ext2fs Directories
• “f1” = inode 7
• “file2” = inode 43
• “f3” = inode 85

Inode No   Rec Length   Name Length   Name
7          12           2             ‘f’ ‘1’ 0 0
43         16           5             ‘f’ ‘i’ ‘l’ ‘e’ ‘2’ 0 0 0
85         12           2             ‘f’ ‘3’ 0 0
0
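The directory layout above can be reproduced in a few lines. This is a simplified sketch, not ext2's implementation: it assumes little-endian fields of 4, 2, 1 and 1 bytes followed by a name padded to a multiple of 4 bytes, and it treats a file type of 1 as "regular file". The helper names pack_entry and walk are made up for this example.

```python
import struct

HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, type (8 bytes in total)

def pack_entry(inode: int, name: str, ftype: int = 1) -> bytes:
    """Build one directory entry, padding the name to a 4-byte boundary."""
    raw = name.encode()
    rec_len = (HEADER.size + len(raw) + 3) & ~3  # round up to a multiple of 4
    entry = HEADER.pack(inode, rec_len, len(raw), ftype) + raw
    return entry.ljust(rec_len, b"\0")

def walk(block: bytes):
    """Yield (name, inode, rec_len) for each live entry, hopping by rec_len."""
    pos = 0
    while pos + HEADER.size <= len(block):
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, pos)
        if rec_len == 0:
            break  # nothing further in this block
        if inode != 0:  # inode 0 marks an unused entry
            start = pos + HEADER.size
            yield block[start:start + name_len].decode(), inode, rec_len
        pos += rec_len

# The three entries from the slide, followed by a terminating zero inode number.
block = pack_entry(7, "f1") + pack_entry(43, "file2") + pack_entry(85, "f3") + bytes(4)
entries = list(walk(block))
```

The record lengths computed here (12, 16, 12) match the values shown in the table.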
Hard links
• Note that inodes can have more than one name
  – Called a Hard Link
  – Inode (file) 7 has three names
    • “f1” = inode 7
    • “file2” = inode 7
    • “f3” = inode 7

Inode No   Rec Length   Name Length   Name
7          12           2             ‘f’ ‘1’ 0 0
7          16           5             ‘f’ ‘i’ ‘l’ ‘e’ ‘2’ 0 0 0
7          12           2             ‘f’ ‘3’ 0 0
0
Inode Contents
• We can have many names for the same inode.
• When we delete a file by name, i.e. remove the directory entry (link), how does the file
  system know when to delete the underlying inode?
  – Keep a reference count in the inode
    • Adding a name (directory entry) increments the count
    • Removing a name decrements the count
    • If the reference count == 0, then we have no names for the inode (it is unreachable),
      so we can delete the inode (underlying file or directory)
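The reference count can be observed from user space as the link count reported by stat. The sketch below uses Python's os module on a temporary directory and assumes the underlying file system supports hard links; the file names mirror the slide's example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    f1, file2, f3 = (os.path.join(tmp, n) for n in ("f1", "file2", "f3"))
    with open(f1, "w") as fh:
        fh.write("data")

    counts = [os.stat(f1).st_nlink]        # one name so far
    os.link(f1, file2)                     # add a second name for the same inode
    os.link(f1, f3)                        # and a third
    counts.append(os.stat(f1).st_nlink)
    same_inode = os.stat(f1).st_ino == os.stat(f3).st_ino

    os.unlink(file2)                       # removing a name decrements the count
    counts.append(os.stat(f1).st_nlink)

    os.unlink(f1)                          # the file survives while any name remains
    with open(f3) as fh:
        still_readable = fh.read()
```

Only when the last name is removed (and no process holds the file open) can the inode and its blocks be released.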
Hard links




(a) Situation prior to linking
(b) After the link is created
(c) After the original owner removes the file
Symbolic links

• A symbolic link is a file that contains a
  reference to another file or directory
  – Has its own inode and data block, which
    contains a path to the target file
  – Marked by a special file attribute
  – Transparent for some operations
  – Can point across FS boundaries
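A short sketch of these properties, using Python's os module on a platform that supports symbolic links. The file names target.txt and link are made up for this example.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    link = os.path.join(tmp, "link")
    with open(target, "w") as fh:
        fh.write("payload")

    os.symlink(target, link)
    stored_path = os.readlink(link)                   # the link's content is a path
    is_link = stat.S_ISLNK(os.lstat(link).st_mode)    # marked by its file type
    different_inode = os.lstat(link).st_ino != os.stat(target).st_ino

    with open(link) as fh:                            # open() follows the link transparently
        via_link = fh.read()

    os.unlink(target)                                 # the link does not keep the target alive
    dangling = os.path.lexists(link) and not os.path.exists(link)
```

Unlike a hard link, the symbolic link has its own inode and is left dangling once the target is removed.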
Ext2fs Directories
• Deleting a filename
  – rm file2

Inode No   Rec Length   Name Length   Name
7          12           2             ‘f’ ‘1’ 0 0
7          16           5             ‘f’ ‘i’ ‘l’ ‘e’ ‘2’ 0 0 0
7          12           2             ‘f’ ‘3’ 0 0
0
Ext2fs Directories
• Deleting a filename
   – rm file2
• Adjust the record length of the previous entry to skip to the next
  valid entry (12 + 16 = 28)

Inode No   Rec Length   Name Length   Name
7          28           2             ‘f’ ‘1’ 0 0
7          12           2             ‘f’ ‘3’ 0 0
0
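In-place deletion can be sketched on a raw directory block. This is a simplified model, not ext2's code: entries are assumed to be little-endian (inode, rec_len, name_len, type) headers followed by a name padded to 4 bytes, and the helper names pack_entry, walk and delete are invented for illustration.

```python
import struct

HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, type

def pack_entry(inode: int, name: str, ftype: int = 1) -> bytes:
    raw = name.encode()
    rec_len = (HEADER.size + len(raw) + 3) & ~3  # round up to a multiple of 4
    return (HEADER.pack(inode, rec_len, len(raw), ftype) + raw).ljust(rec_len, b"\0")

def walk(block):
    """Yield (pos, name, rec_len) for live entries, hopping by rec_len."""
    pos = 0
    while pos + HEADER.size <= len(block):
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, pos)
        if rec_len == 0:
            break
        if inode != 0:  # inode 0 means the entry is unused
            start = pos + HEADER.size
            yield pos, bytes(block[start:start + name_len]).decode(), rec_len
        pos += rec_len

def delete(block: bytearray, victim: str) -> bool:
    """Remove `victim` by growing the previous entry, or zeroing the inode if first."""
    prev = None
    for pos, name, rec_len in walk(block):
        if name == victim:
            if prev is None:
                struct.pack_into("<I", block, pos, 0)            # inode = 0
            else:
                (prev_len,) = struct.unpack_from("<H", block, prev + 4)
                struct.pack_into("<H", block, prev + 4, prev_len + rec_len)
            return True
        prev = pos
    return False

block = bytearray(pack_entry(7, "f1") + pack_entry(7, "file2") + pack_entry(7, "f3") + bytes(4))
removed = delete(block, "file2")
names_after = [name for _pos, name, _len in walk(block)]
f1_rec_len = struct.unpack_from("<H", block, 4)[0]
```

The bytes of the deleted entry stay in the block; they are simply skipped because the previous entry's record length now spans them.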
FS reliability

• Disk writes are buffered in RAM
  – OS crash or power outage ==> lost data
  – Commit writes to disk periodically (e.g., every
    30 sec)
  – Use the sync command to force a FS flush
• FS operations are non-atomic
  – Incomplete transaction can leave the FS in an
    inconsistent state
FS reliability
[Figure: dir entries point to i-nodes, which point to data blocks]
• Example: deleting a file
  1. Remove the directory entry
  2. Mark the i-node as free
  3. Mark disk blocks as free
FS reliability
• Example: deleting a file
  1. Remove the directory entry --> crash
  2. Mark the i-node as free
  3. Mark disk blocks as free
• The i-node and data blocks are lost
FS reliability
• Example: deleting a file
  1. Mark the i-node as free --> crash
  2. Remove the directory entry
  3. Mark disk blocks as free
• The dir entry points to the wrong file
FS reliability
• Example: deleting a file
  1. Mark disk blocks as free --> crash
  2. Remove the directory entry
  3. Mark the i-node as free
• The file randomly shares disk blocks with other files
FS reliability
• e2fsck
  – Scans the disk after an unclean shutdown and
    attempts to restore FS invariants
• Journaling file systems
  – Keep a journal of FS updates
  – Before performing an atomic update sequence,
    write it to the journal
  – Replay the last journal entries upon an unclean
    shutdown
  – Example: ext3fs
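The journaling idea can be illustrated with a toy in-memory model. This is only a sketch of the logging and replay principle: ext3's real journal records disk blocks rather than the logical operations used here, and every name below (delete_ops, apply_op, run, recover, the fs dictionary) is invented for this example.

```python
def delete_ops(name, inode, blocks):
    """The update sequence for deleting a file, in the order used on the slides."""
    return ([("remove_dirent", name), ("free_inode", inode)]
            + [("free_block", b) for b in blocks])

def apply_op(fs, op):
    """Apply one operation; each is idempotent, so replaying it is harmless."""
    kind, arg = op
    if kind == "remove_dirent":
        fs["dir"].pop(arg, None)
    elif kind == "free_inode":
        fs["used_inodes"].discard(arg)
    elif kind == "free_block":
        fs["used_blocks"].discard(arg)

def run(fs, journal, ops, crash_after=None):
    journal.append(list(ops))            # 1. write the whole sequence to the journal
    for i, op in enumerate(ops):         # 2. then update the file system in place
        if crash_after is not None and i == crash_after:
            return False                 # simulated crash: the rest is never applied
        apply_op(fs, op)
    journal.pop()                        # 3. finished, so the entry is no longer needed
    return True

def recover(fs, journal):
    """After an unclean shutdown, replay whatever is still in the journal."""
    for ops in journal:
        for op in ops:
            apply_op(fs, op)
    journal.clear()

fs = {"dir": {"file2": 7}, "used_inodes": {7}, "used_blocks": {40, 58}}
journal = []
completed = run(fs, journal, delete_ops("file2", 7, [40, 58]), crash_after=1)
orphaned = "file2" not in fs["dir"] and 7 in fs["used_inodes"]  # inconsistent state
recover(fs, journal)
```

After recover, the directory entry, i-node and blocks are all released, without scanning the whole file system as e2fsck would.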

Tele3113 wk9tueTele3113 wk9tue
Tele3113 wk9tue
 
Tele3113 wk8wed
Tele3113 wk8wedTele3113 wk8wed
Tele3113 wk8wed
 
Tele3113 wk9wed
Tele3113 wk9wedTele3113 wk9wed
Tele3113 wk9wed
 
Tele3113 wk7wed
Tele3113 wk7wedTele3113 wk7wed
Tele3113 wk7wed
 
Tele3113 wk7wed
Tele3113 wk7wedTele3113 wk7wed
Tele3113 wk7wed
 
Tele3113 wk7tue
Tele3113 wk7tueTele3113 wk7tue
Tele3113 wk7tue
 
Tele3113 wk6wed
Tele3113 wk6wedTele3113 wk6wed
Tele3113 wk6wed
 
Tele3113 wk6tue
Tele3113 wk6tueTele3113 wk6tue
Tele3113 wk6tue
 
Tele3113 wk5tue
Tele3113 wk5tueTele3113 wk5tue
Tele3113 wk5tue
 

Último

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 

Último (20)

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 

Lect10

  • 2. The ext2 file system • Second Extended Filesystem – The main Linux FS before ext3 – Evolved from Minix filesystem (via “Extended Filesystem”) • Features – Block size (1024, 2048, or 4096 bytes) configured at FS creation – inode-based FS – Performance optimisations to improve locality (from BSD FFS) • Main problem: an unclean unmount requires a full e2fsck consistency check – Ext3fs keeps a journal of (meta-data) updates – Journal is a file where updates are logged – Compatible with ext2fs
  • 3. Recap: i-nodes • Each file is represented by an inode on disk • Inode contains all of a file’s metadata – Access rights, owner, accounting info – (partial) block index table of a file • Each inode has a unique number – System-oriented name – Try ‘ls -i’ on Unix (Linux) • Directories map file names to inode numbers – Map human-oriented to system-oriented names
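The name-to-number mapping can be observed from user space. The following is a minimal sketch, assuming a Unix-like system; the helper name `list_inodes` and the file names are made up for illustration. It reports roughly what ‘ls -i’ would show for a scratch directory.

```python
import os
import tempfile


def list_inodes(directory):
    """Return {file name: inode number} for every entry in a directory."""
    return {
        name: os.stat(os.path.join(directory, name)).st_ino
        for name in os.listdir(directory)
    }


with tempfile.TemporaryDirectory() as tmp:
    # Create three small files (hypothetical names).
    for name in ("f1", "file2", "f3"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("x")
    names_to_inodes = list_inodes(tmp)
    for name, inode in sorted(names_to_inodes.items()):
        print(inode, name)
```

The actual inode numbers depend on the file system, so only their distinctness is predictable.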
  • 5. Ext2 i-nodes [Diagram: inode fields: mode, uid, gid, atime, ctime, mtime, size, block count, reference count, direct blocks (12), single indirect, double indirect, triple indirect] • Mode – Type • Regular file or directory – Access mode • rwxrwxrwx • Uid – User ID • Gid – Group ID
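As an illustration of how the mode field combines a file type with rwxrwxrwx access bits, here is a small sketch using Python's standard `stat` module. The helper `describe_mode` is a hypothetical name, and the mode values are constructed by hand rather than read from a real ext2 inode.

```python
import stat


def describe_mode(mode):
    """Split a mode word into (file type, rwx permission string)."""
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    # stat.filemode() returns e.g. "-rw-r--r--"; drop the leading type character.
    return kind, stat.filemode(mode)[1:]


file_info = describe_mode(stat.S_IFREG | 0o644)
dir_info = describe_mode(stat.S_IFDIR | 0o755)
print(file_info, dir_info)
```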
  • 6. Inode Contents • atime – Time of last access • ctime – Time when the inode was last changed (e.g. permissions, owner, link count), not the creation time • mtime – Time when file was last modified
  • 7. Inode Contents • Size – Size of the file in bytes • Block count – Number of disk blocks used by the file. • Note that number of blocks can be much less than expected given the file size – Files can be sparsely populated • E.g. write(f, "hello", 5); lseek(f, 1000000, SEEK_SET); write(f, "world", 5); • Only needs to store the start and end of file, not all the empty blocks in between. – Size = 1000005 – Blocks = 2 + overheads
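The sparse-file example can be reproduced with a short script. This is a sketch under assumptions: it needs a Unix-like system (for `st_blocks`), the helper name `make_sparse_file` is invented here, and whether the hole is really left unallocated depends on the underlying file system.

```python
import os
import tempfile


def make_sparse_file(path):
    """Write "hello" at offset 0 and "world" at offset 1,000,000."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, b"hello")
        os.lseek(fd, 1_000_000, os.SEEK_SET)  # skip ahead, leaving a hole
        os.write(fd, b"world")
    finally:
        os.close(fd)
    return os.stat(path)


with tempfile.TemporaryDirectory() as tmp:
    st = make_sparse_file(os.path.join(tmp, "sparse.bin"))
    logical_size = st.st_size
    # st_blocks is reported in 512-byte units on Linux.
    allocated_bytes = st.st_blocks * 512
    print(logical_size, allocated_bytes)
```

On a file system that supports holes, the allocated byte count is typically far below the logical size.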
  • 8. Inode Contents • Direct Blocks – Block numbers of first 12 blocks in the file – Most files are small • We can find blocks of file directly from the inode [Diagram: the inode's 12 direct block entries (40, 58, 26, 8, 12, 44, 62, 30, 10, 42, 3, 21) map file blocks 0 to 11 to disk blocks]
  • 9. Problem • How do we store files greater than 12 blocks in size? – Adding significantly more direct entries in the inode results in many unused entries most of the time.
  • 10. Inode Contents • Single Indirect Block – Block number of a block containing block numbers [Diagram: inode with direct blocks (12) = 40, 58, 26, 8, 12, 44, 62, 30, 10, 42, 3, 21 and single indirect = 32; the single-indirect block (SI) at disk block 32 holds the block numbers for file blocks 12 to 17]
  • 11. Single Indirection • Requires two disk accesses to read – One for the indirect block; one for the target block • Max File Size – Assume 1 Kbyte block size, 4-byte block numbers – 12 * 1K + 1K/4 * 1K = 268 Kbytes • For large majority of files (< 268 K), given the inode, only one or two further accesses required to read any block in file.
  • 12. Inode Contents • Double Indirect Block – Block number of a block containing block numbers of blocks containing block numbers
  • 13. Inode Contents • Double Indirect Block – Block number of a block containing block numbers of blocks containing block numbers • Triple Indirect – Block number of a block containing block numbers of blocks containing block numbers of blocks containing block numbers
  • 14. UNIX Inode Block Addressing Scheme
  • 15. Max File Size • Assume 4-byte block numbers and 1K blocks • The number of addressable blocks – Direct Blocks = 12 – Single Indirect Blocks = 256 – Double Indirect Blocks = 256 * 256 = 65536 – Triple Indirect Blocks = 256 * 256 * 256 = 16777216 • Max File Size – 12 + 256 + 65536 + 16777216 = 16843020 blocks ≈ 16 GB
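The arithmetic above can be checked mechanically. The sketch below assumes the slide's parameters (12 direct entries, 4-byte block numbers); the function name `addressable_blocks` is illustrative only, and the calculation ignores any other limits a real ext2 implementation imposes on file size.

```python
def addressable_blocks(block_size, pointer_size=4, direct=12):
    """Count the data blocks reachable through each level of the inode's index."""
    per_block = block_size // pointer_size  # block numbers that fit in one block
    return {
        "direct": direct,
        "single": per_block,
        "double": per_block ** 2,
        "triple": per_block ** 3,
    }


counts = addressable_blocks(1024)
total_blocks = sum(counts.values())
max_bytes = total_blocks * 1024
# Limit when only direct and single-indirect blocks are used (slide 11).
single_limit_kib = (counts["direct"] + counts["single"]) * 1024 // 1024
print(counts, total_blocks, round(max_bytes / 2**30, 2), single_limit_kib)
```

Calling `addressable_blocks(4096)` shows how quickly the limits grow with a larger block size, since each indirect block then holds 1024 block numbers.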
  • 16. Where is the data block number stored? • Assume 4K blocks, 4-byte block numbers, 12 direct blocks • A 1-byte file produced by lseek(fd, 1048576, SEEK_SET); /* 1 megabyte */ write(fd, "x", 1); • What if we add lseek(fd, 5242880, SEEK_SET); /* 5 megabytes */ write(fd, "x", 1);
  • 24. Solution
      block #                                      location
      0 through 11                                 Direct block
      12 through (11 + 1024 = 1035)                Single-indirect block
      1036 through (1035 + 1024*1024 = 1049611)    Double-indirect blocks
      Address = 1048576 ==> block number = 1048576/4096 = 256
      Block number = 256 ==> index in the single-indirect block = 256 - 12 = 244
      Address = 5242880 ==> block number = 5242880/4096 = 1280
      Block number = 1280 ==> entry in the double-indirect block = (1280 - 1036)/1024 = 244/1024 = 0
      Index in the second-level indirect block = (1280 - 1036) mod 1024 = 244
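The lookup worked through above generalises to any byte offset. The following sketch assumes the same parameters as the exercise (4K blocks, 4-byte block numbers, 12 direct blocks); `locate_block` is a hypothetical helper, not a kernel function, and it only computes indices without touching a disk.

```python
def locate_block(offset, block_size=4096, pointer_size=4, direct=12):
    """Return the index level and the index path for the block holding `offset`."""
    per_block = block_size // pointer_size  # entries per indirect block
    block = offset // block_size            # logical block number in the file
    if block < direct:
        return ("direct", block)
    block -= direct
    if block < per_block:
        return ("single", block)
    block -= per_block
    if block < per_block ** 2:
        return ("double", block // per_block, block % per_block)
    block -= per_block ** 2
    if block < per_block ** 3:
        return (
            "triple",
            block // per_block ** 2,
            (block // per_block) % per_block,
            block % per_block,
        )
    raise ValueError("offset is beyond the maximum file size")


first_write = locate_block(1048576)   # the 1-megabyte write
second_write = locate_block(5242880)  # the 5-megabyte write
print(first_write, second_write)
```

The number of index levels in the result also gives the disk accesses needed beyond the inode: one per indirect block on the path plus one for the data block.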
  • 25. Some Best and Worst Case Access Patterns • Assume inode already in memory • To read 1 byte – Best: • 1 access via direct block – Worst: • 4 accesses via the triple indirect block • To write 1 byte – Best: • 1 write via direct block (with no previous content) – Worst: • 4 reads (to get previous contents of block via triple indirect) + 1 write (to write modified block back)
  • 26. Worst Case Access Patterns with Unallocated Indirect Blocks • Worst to write 1 byte – 4 writes (3 indirect blocks; 1 data) – 1 read, 4 writes (read-write 1 indirect, write 2; write 1 data) – 2 reads, 3 writes (read 1 indirect, read-write 1 indirect, write 1; write 1 data) – 3 reads, 2 writes (read 2, read-write 1; write 1 data) • Worst to read 1 byte – If reading writes a zero-filled block on disk • Worst case is same as write 1 byte – If not, worst case depends on how deep the current indirect block tree is.
  • 27. Inode Summary • The inode contains the on-disk data associated with a file – Contains mode, owner, and other bookkeeping – Efficient random and sequential access via indexed allocation – Small files (the majority of files) require only a single access – Larger files require progressively more disk accesses for random access • Sequential access is still efficient – Can support really large files via increasing levels of indirection
  • 28. Where/How are Inodes Stored [Layout: Boot Block | Super Block | Inode Array | Data Blocks] • System V Disk Layout (s5fs) – Boot Block • Contains code to bootstrap the OS – Super Block • Contains attributes of the file system itself – e.g. size, number of inodes, start block of inode array, start of data block area, free inode list, free data block list – Inode Array – Data blocks
  • 29. Some problems with s5fs • Inodes at start of disk; data blocks at end – Long seek times • Must read inode before reading data blocks • Only one superblock – Corrupt the superblock and entire file system is lost • Block allocation was suboptimal – Consecutive free block list created at FS format time • Allocation and de-allocation eventually randomises the list, resulting in random allocation • Inodes also allocated randomly – Directory listing resulted in random inode access patterns
  • 30. Berkeley Fast Filesystem (FFS) • Historically followed s5fs – Addressed many limitations with s5fs – ext2fs mostly similar
  • 31. Layout of an Ext2 FS [Layout: Boot Block | Block Group 0 | … | Block Group n] • Partition: – Reserved boot block – Collection of equally sized block groups – All block groups have the same structure
  • 32. Layout of a Block Group [Layout: Super Block (1 blk) | Group Descriptors (n blks) | Data Block Bitmap (1 blk) | Inode Bitmap (1 blk) | Inode Table (m blks) | Data blocks (k blks)] • Replicated super block – For e2fsck • Group descriptors • Bitmaps identify used inodes/blocks • All block groups have the same number of data blocks • Advantages of this structure: – Replication simplifies recovery – Proximity of inode tables and data blocks (reduces seek time)
  • 33. Superblocks • Size of the file system, block size and similar parameters • Overall free inode and block counters • Data indicating whether file system check is needed: – Uncleanly unmounted – Inconsistency – Certain number of mounts since last check – Certain time expired since last check • Replicated to provide redundancy to aid recoverability
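To make the superblock fields concrete, here is a sketch that decodes the first 60 bytes of an ext2 superblock. The field order, the 1024-byte starting offset, and the magic value 0xEF53 are not from the slides; they are stated here from the commonly documented ext2 on-disk layout and should be checked against the kernel documentation before relying on them. The function name and the sample values are made up, and the input is a synthetic buffer rather than a real disk image.

```python
import struct

SUPERBLOCK_OFFSET = 1024          # assumed: primary superblock starts 1024 bytes in
EXT2_MAGIC = 0xEF53               # assumed: ext2 magic number
HEADER = struct.Struct("<13I4H")  # 13 little-endian u32 fields, then 4 u16 fields


def parse_superblock(image):
    """Decode size, free counters, and check-related fields from a raw image."""
    (inodes, blocks, _reserved, free_blocks, free_inodes, first_data_block,
     log_block_size, _log_frag_size, blocks_per_group, _frags_per_group,
     inodes_per_group, _mtime, _wtime,
     mount_count, max_mount_count, magic, state) = HEADER.unpack_from(
        image, SUPERBLOCK_OFFSET)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 superblock")
    groups = -(-(blocks - first_data_block) // blocks_per_group)  # ceiling division
    return {
        "block_size": 1024 << log_block_size,
        "blocks": blocks,
        "inodes": inodes,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "block_groups": groups,
        "inodes_per_group": inodes_per_group,
        "mounts_since_check": mount_count,
        "max_mounts": max_mount_count,
        "clean": bool(state & 1),  # assumed: bit 0 set means cleanly unmounted
    }


# Synthetic image with hypothetical values, for illustration only.
image = bytearray(2048)
HEADER.pack_into(image, SUPERBLOCK_OFFSET,
                 1024, 8192, 0, 5000, 900, 1, 0, 0, 8192, 8192, 1024, 0, 0,
                 3, 20, EXT2_MAGIC, 1)
info = parse_superblock(bytes(image))
print(info)
```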
  • 34. Group Descriptors • Location of the bitmaps • Counter for free blocks and inodes in this group • Number of directories in the group
  • 35. Performance considerations • EXT2 optimisations – Block groups cluster related inodes and data blocks – Read-ahead for directories • For directory searching – Pre-allocation of blocks on write (up to 8 blocks) • 8 bits in bit tables • Better contiguity when there are concurrent writes • FFS optimisations – Aim to store files within a directory in the same group
  • 36. Thus far… • Inodes representing files laid out on disk. • Inodes are referred to by number!!! – How do users name files? By number?
  • 37. Ext2fs Directories [Entry layout: inode | rec_len | name_len | type | name…] • Directories are files of a special type – Consider it a file of special format, managed by the kernel, that uses most of the same machinery to implement it • Inodes, etc… • Directories translate names to inode numbers • Directory entries are of variable length • Entries can be deleted in place – inode = 0 – Add to length of previous entry – Use null-terminated strings for names
  • 38. Ext2fs Directories • “f1” = inode 7 • “file2” = inode 43 • “f3” = inode 85
      Inode No | Rec Length | Name Length | Name
      7        | 12         | 2           | ‘f’ ‘1’ 0 0
      43       | 16         | 5           | ‘f’ ‘i’ ‘l’ ‘e’ ‘2’ 0 0 0
      85       | 12         | 2           | ‘f’ ‘3’ 0 0
      0
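The three records above can be encoded and walked in a few lines. This sketch assumes little-endian fields of 4, 2, 1 and 1 bytes for inode, rec_len, name_len and type, matching the entry layout on the previous slide; the helper names `make_entry` and `parse_directory` are invented for this example, and the type byte is simply left as 0.

```python
import struct

ENTRY_HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, file type


def make_entry(inode, name, rec_len):
    """Encode one directory record, zero-padded out to rec_len bytes."""
    raw = name.encode("ascii")
    header = ENTRY_HEADER.pack(inode, rec_len, len(raw), 0)
    return (header + raw).ljust(rec_len, b"\0")


def parse_directory(block):
    """Return [(name, inode)] by hopping from record to record via rec_len."""
    entries, offset = [], 0
    while offset + ENTRY_HEADER.size <= len(block):
        inode, rec_len, name_len, _type = ENTRY_HEADER.unpack_from(block, offset)
        if rec_len < ENTRY_HEADER.size:
            break  # malformed record; stop rather than loop forever
        if inode != 0:  # inode 0 marks an unused record
            start = offset + ENTRY_HEADER.size
            entries.append((block[start:start + name_len].decode("ascii"), inode))
        offset += rec_len
    return entries


block = make_entry(7, "f1", 12) + make_entry(43, "file2", 16) + make_entry(85, "f3", 12)
entries = parse_directory(block)
print(entries)
```

Because the walk relies only on rec_len, it never needs the null padding to find the next record; name_len tells it where each name ends.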
  • 39. Hard links • Note that inodes can have more than one name – Called a Hard Link – Inode (file) 7 has three names • “f1” = inode 7 • “file2” = inode 7 • “f3” = inode 7
      Inode No | Rec Length | Name Length | Name
      7        | 12         | 2           | ‘f’ ‘1’ 0 0
      7        | 16         | 5           | ‘f’ ‘i’ ‘l’ ‘e’ ‘2’ 0 0 0
      7        | 12         | 2           | ‘f’ ‘3’ 0 0
      0
  • 40. Inode Contents • We can have many names for the same inode. • When we delete a file by name, i.e. remove the directory entry (link), how does the file system know when to delete the underlying inode? – Keep a reference count in the inode • Adding a name (directory entry) increments the count • Removing a name decrements the count • If the reference count == 0, then we have no names for the inode (it is unreachable), so we can delete the inode (underlying file or directory)
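The reference count is visible from user space as the link count returned by stat. The sketch below assumes a Unix-like system whose temporary directory supports hard links; the function name `link_counts` and the file names are illustrative.

```python
import os
import tempfile


def link_counts():
    """Record the inode's link count as names are added and removed."""
    counts = []
    with tempfile.TemporaryDirectory() as tmp:
        f1 = os.path.join(tmp, "f1")
        file2 = os.path.join(tmp, "file2")
        with open(f1, "w") as handle:
            handle.write("data")
        counts.append(os.stat(f1).st_nlink)      # one name
        os.link(f1, file2)                       # add a second name (hard link)
        counts.append(os.stat(f1).st_nlink)
        same_inode = os.stat(f1).st_ino == os.stat(file2).st_ino
        os.unlink(f1)                            # remove the original name
        counts.append(os.stat(file2).st_nlink)
        with open(file2) as handle:
            content = handle.read()              # data is still reachable
    return counts, same_inode, content


counts, same_inode, content = link_counts()
print(counts, same_inode, content)
```

The data survives removal of the original name because the count only drops to one; the inode would be freed once the last name is removed and no process still holds the file open.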
  • 41. Hard links [Figure] (a) Situation prior to linking (b) After the link is created (c) After the original owner removes the file
  • 42. Symbolic links • A symbolic link is a file that contains a reference to another file or directory – Has its own inode and data block, which contains a path to the target file – Marked by a special file attribute – Transparent for some operations – Can point across FS boundaries
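A short experiment shows the properties listed above. It is a sketch that assumes a Unix-like system where unprivileged symlink creation is allowed; `symlink_demo` and the file names are hypothetical.

```python
import os
import stat
import tempfile


def symlink_demo():
    """Create a symbolic link and report how it differs from its target."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link")
        with open(target, "w") as handle:
            handle.write("data")
        os.symlink(target, link)
        with open(link) as handle:               # open() follows the link
            via_link = handle.read()
        facts = {
            "is_symlink": stat.S_ISLNK(os.lstat(link).st_mode),
            "stores_path": os.readlink(link) == target,
            "own_inode": os.lstat(link).st_ino != os.lstat(target).st_ino,
            "transparent_open": via_link == "data",
        }
        os.unlink(target)
        # The link itself remains, but it now points at nothing.
        facts["dangling"] = os.path.lexists(link) and not os.path.exists(link)
        return facts


facts = symlink_demo()
print(facts)
```

Unlike a hard link, the symbolic link does not keep the target alive: removing the target leaves a dangling link.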
  • 43. Ext2fs Directories • Deleting a filename – rm file2
      Inode No | Rec Length | Name Length | Name
      7        | 12         | 2           | ‘f’ ‘1’ 0 0
      7        | 16         | 5           | ‘f’ ‘i’ ‘l’ ‘e’ ‘2’ 0 0 0
      7        | 12         | 2           | ‘f’ ‘3’ 0 0
      0
  • 44. Ext2fs Directories • Deleting a filename – rm file2 • Adjust the record length of the previous entry to skip to the next valid entry (12 + 16 = 28)
      Inode No | Rec Length | Name Length | Name
      7        | 28         | 2           | ‘f’ ‘1’ 0 0
      7        | 12         | 2           | ‘f’ ‘3’ 0 0
      0
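Deletion by record-length adjustment can be sketched as follows. The record encoding (little-endian inode, rec_len, name_len, type) and all helper names are assumptions made for illustration; the first-record case, where there is no previous entry to extend, is handled by zeroing the inode field as described on slide 37.

```python
import struct

HEADER = struct.Struct("<IHBB")  # inode (4 bytes), rec_len (2), name_len (1), type (1)


def make_entry(inode, name, rec_len):
    """Encode one directory record, zero-padded out to rec_len bytes."""
    raw = name.encode("ascii")
    return (HEADER.pack(inode, rec_len, len(raw), 0) + raw).ljust(rec_len, b"\0")


def walk(block):
    """Yield (offset, inode, rec_len, name) for each record reached via rec_len."""
    offset = 0
    while offset + HEADER.size <= len(block):
        inode, rec_len, name_len, _type = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            break  # malformed record; stop rather than loop forever
        start = offset + HEADER.size
        yield offset, inode, rec_len, bytes(block[start:start + name_len]).decode("ascii")
        offset += rec_len


def delete_entry(block, name):
    """Remove `name` by growing the previous record so it skips the deleted one."""
    data = bytearray(block)
    previous = None
    for offset, inode, rec_len, entry_name in walk(block):
        if inode != 0 and entry_name == name:
            if previous is None:
                struct.pack_into("<I", data, offset, 0)  # first record: clear inode
            else:
                prev_offset, prev_len = previous
                # rec_len sits 4 bytes into the record, after the inode field.
                struct.pack_into("<H", data, prev_offset + 4, prev_len + rec_len)
            return bytes(data)
        previous = (offset, rec_len)
    raise FileNotFoundError(name)


block = make_entry(7, "f1", 12) + make_entry(7, "file2", 16) + make_entry(7, "f3", 12)
after = delete_entry(block, "file2")
remaining = [(name, rec_len) for _, inode, rec_len, name in walk(after) if inode != 0]
print(remaining)
```

The bytes of the deleted record are left in place; they are simply no longer reachable when walking the block, and the space can be reused by a later insertion.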
  • 45. FS reliability • Disk writes are buffered in RAM – OS crash or power outage ==> lost data – Commit writes to disk periodically (e.g., every 30 sec) – Use the sync command to force a FS flush • FS operations are non-atomic – Incomplete transaction can leave the FS in an inconsistent state
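From a program, the per-file counterpart of the sync command is to flush and fsync the file before treating a write as committed. A minimal sketch, assuming a POSIX-like system; `durable_write` is a hypothetical helper, and reading the data back (as done here) only confirms the content, not that it reached the disk.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload and ask the OS to push it from RAM buffers to the device."""
    with open(path, "wb") as handle:
        handle.write(payload)
        handle.flush()              # move data from Python's buffer to the OS
        os.fsync(handle.fileno())   # ask the OS to commit its buffers to disk


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "log.bin")
    durable_write(target, b"committed")
    with open(target, "rb") as handle:
        read_back = handle.read()
    print(read_back)
```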
  • 46. FS reliability [Diagram: dir entries, i-nodes, data blocks] • Example: deleting a file 1. Remove the directory entry 2. Mark the i-node as free 3. Mark disk blocks as free
  • 47. FS reliability [Diagram: dir entries, i-nodes, data blocks] • Example: deleting a file 1. Remove the directory entry --> crash 2. Mark the i-node as free 3. Mark disk blocks as free • Result: the i-node and data blocks are lost
  • 48. FS reliability [Diagram: dir entries, i-nodes, data blocks] • Example: deleting a file 1. Mark the i-node as free --> crash 2. Remove the directory entry 3. Mark disk blocks as free • Result: the dir entry points to the wrong file
  • 49. FS reliability [Diagram: dir entries, i-nodes, data blocks] • Example: deleting a file 1. Mark disk blocks as free --> crash 2. Remove the directory entry 3. Mark the i-node as free • Result: the file randomly shares disk blocks with other files
  • 50. FS reliability • e2fsck – Scans the disk after an unclean shutdown and attempts to restore FS invariants • Journaling file systems – Keep a journal of FS updates – Before performing an atomic update sequence, write it to the journal – Replay the last journal entries upon an unclean shutdown – Example: ext3fs
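The journaling idea can be illustrated with a toy model of the file-deletion example. This is only a sketch: the in-memory dictionaries, the step names, and the functions below are invented for illustration and bear no relation to the real ext3 journal format. The point it shows is that logging the whole update sequence first, and making each step safe to repeat, lets recovery finish an interrupted deletion.

```python
def delete_steps(name, inode, blocks):
    """The three updates that together delete one file."""
    return [
        ("remove_dir_entry", name),
        ("free_inode", inode),
        ("free_blocks", tuple(blocks)),
    ]


def apply_step(fs, step):
    """Apply one update; every step is idempotent, so replaying it is harmless."""
    kind, arg = step
    if kind == "remove_dir_entry":
        fs["dir"].pop(arg, None)
    elif kind == "free_inode":
        fs["inodes"].discard(arg)
    elif kind == "free_blocks":
        fs["blocks"].difference_update(arg)


def run_with_crash(fs, journal, steps, crash_after):
    """Log the full sequence first, then apply only `crash_after` steps."""
    journal.append(list(steps))
    for step in steps[:crash_after]:
        apply_step(fs, step)  # the simulated crash prevents the remaining steps


def recover(fs, journal):
    """Replay every logged sequence after an unclean shutdown."""
    for transaction in journal:
        for step in transaction:
            apply_step(fs, step)
    journal.clear()


fs = {"dir": {"file2": 43}, "inodes": {43}, "blocks": {100, 101}}
journal = []
run_with_crash(fs, journal, delete_steps("file2", 43, [100, 101]), crash_after=1)
leaked = (set(fs["inodes"]), set(fs["blocks"]))  # state right after the crash
recover(fs, journal)
print(leaked, fs)
```

Right after the simulated crash the inode and blocks are still marked as used with no name pointing at them, which is the "lost" state from slide 47; replaying the journal completes the deletion.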