UNIX File Management (continued)
OS storage stack (recap)
The stack, from top to bottom:

   Application
   FD table
   OF table
   VFS
   FS
   Buffer cache
   Disk scheduler
   Device driver
Virtual File System (VFS)
(The storage stack again; this slide focuses on the VFS layer.)
Older systems only had a single file system

• They had file system specific open, close, read, write, … calls.
• However, modern systems need to support many file system types
  – ISO9660 (CDROM), MSDOS (floppy), ext2fs, tmpfs
Supporting Multiple File Systems

• Alternatives
  – Change the file system code to understand different file system
    types
    • Prone to code bloat, complex, a non-solution
  – Provide a framework that separates file system independent and
    file system dependent code.
    • Allows different file systems to be “plugged in”
Virtual File System (VFS)
(The storage stack again, now with two file systems, FS and FS2,
plugged in under the single VFS; each file system has its own disk
scheduler and device driver beneath the buffer cache.)
Virtual file system (VFS)
open(“/home/leonidr/file”, …);

Traversing the directory hierarchy may require the VFS to issue
requests to several underlying file systems: in this example, ext3 is
mounted at “/” and nfs at “/home/leonidr”.
Virtual File System (VFS)
• Provides single system call interface for many file
  systems
   – E.g., UFS, Ext2, XFS, DOS, ISO9660,…
• Transparent handling of network file systems
   – E.g., NFS, AFS, CODA
• File-based interface to arbitrary device drivers (/dev)
• File-based interface to kernel data structures (/proc)
• Provides an indirection layer for system calls
   – File operation table set up at file open time
   – Points to actual handling code for particular type
   – Further file operations redirected to those functions

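As a sketch of that indirection idea in C (all names here are
hypothetical, not any real kernel’s API): each open file records a
table of function pointers chosen at open time, and every later
operation simply dispatches through it.

struct file;                               /* forward declaration */

struct file_ops {                          /* per-file-system handler table */
    int (*read)(struct file *f, void *buf, int len);
    int (*write)(struct file *f, const void *buf, int len);
};

struct file {
    const struct file_ops *ops;            /* set once, when the file is opened */
    void *fs_private;                      /* FS-specific state (e.g. inode) */
};

/* A file system type supplies its own handlers... */
static int ext2_read(struct file *f, void *buf, int len)        { /* ... */ return len; }
static int ext2_write(struct file *f, const void *buf, int len) { /* ... */ return len; }
static const struct file_ops ext2_ops = { ext2_read, ext2_write };

/* ...and further file operations are redirected through the table: */
int file_read(struct file *f, void *buf, int len)
{
    return f->ops->read(f, buf, len);
}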
The file system independent code deals with vfs and vnodes

(Diagram: the file descriptor tables and the open file table belong to
the file system independent code; via the VFS interface, each vnode is
backed by an inode in the file system dependent code.)
VFS Interface

• Reference
  – S.R. Kleiman, "Vnodes: An Architecture for Multiple File System
    Types in Sun Unix," USENIX Association: Summer Conference
    Proceedings, Atlanta, 1986
  – Linux and OS/161 differ slightly, but the principles are the same
• Two major data types
  – vfs
    • Represents all file system types
    • Contains pointers to functions to manipulate each file system as
      a whole (e.g. mount, unmount)
      – These form a standard interface to the file system
  – vnode
    • Represents a file (inode) in the underlying filesystem
    • Points to the real inode
    • Contains pointers to functions to manipulate files/inodes
      (e.g. open, close, read, write, …)
Vfs and Vnode Structures

struct vnode:
• Generic (FS-independent) fields
• fs_data: points to the FS-specific fields, e.g. for ext2:
  – size; uid, gid; ctime, atime, mtime; …
  – block group number; data block list; …
• vnode ops: points to the FS-specific implementation of the vnode
  operations (ext2fs_read, ext2fs_write, …)
Vfs and Vnode Structures

struct vfs:
• Generic (FS-independent) fields
• fs_data: points to the FS-specific fields, e.g. for ext2:
  – block size; max file size; …
  – i-nodes per group; superblock address; …
• vfs ops: points to the FS-specific implementation of the FS
  operations (ext2_unmount, ext2_getroot, …)
A look at OS/161’s VFS

The OS/161 file system type represents the interface to a mounted
filesystem:

struct fs {
    int           (*fs_sync)(struct fs *);       /* force the filesystem to
                                                     flush its content to disk */
    const char   *(*fs_getvolname)(struct fs *);  /* retrieve the volume name */
    struct vnode *(*fs_getroot)(struct fs *);     /* retrieve the vnode associated
                                                     with the root of the filesystem */
    int           (*fs_unmount)(struct fs *);     /* unmount the filesystem */

    void *fs_data;                                /* private file system specific data */
};

Note: mount is called via a function pointer passed to vfs_mount.
Vnode

struct vnode {
    int vn_refcount;                  /* counts the “references” to this vnode */
    int vn_opencount;                 /* number of times the vnode is
                                         currently open */

    struct lock *vn_countlock;        /* lock for mutually exclusive access
                                         to the counts */
    struct fs *vn_fs;                 /* pointer to the FS containing the vnode */
    void *vn_data;                    /* pointer to FS-specific vnode data
                                         (e.g. inode) */

    const struct vnode_ops *vn_ops;   /* array of pointers to functions
                                         operating on vnodes */
};
Vnode Ops

struct vnode_ops {
    unsigned long vop_magic;        /* should always be VOP_MAGIC */

    int (*vop_open)(struct vnode *object, int flags_from_open);
    int (*vop_close)(struct vnode *object);
    int (*vop_reclaim)(struct vnode *vnode);

    int (*vop_read)(struct vnode *file, struct uio *uio);
    int (*vop_readlink)(struct vnode *link, struct uio *uio);
    int (*vop_getdirentry)(struct vnode *dir, struct uio *uio);
    int (*vop_write)(struct vnode *file, struct uio *uio);
    int (*vop_ioctl)(struct vnode *object, int op, userptr_t data);
    int (*vop_stat)(struct vnode *object, struct stat *statbuf);
    int (*vop_gettype)(struct vnode *object, int *result);
    int (*vop_tryseek)(struct vnode *object, off_t pos);
    int (*vop_fsync)(struct vnode *object);
    int (*vop_mmap)(struct vnode *file /* add stuff */);
    int (*vop_truncate)(struct vnode *file, off_t len);
    int (*vop_namefile)(struct vnode *file, struct uio *uio);
Vnode Ops (continued)

    int (*vop_creat)(struct vnode *dir,
                     const char *name, int excl,
                     struct vnode **result);
    int (*vop_symlink)(struct vnode *dir,
                       const char *contents, const char *name);
    int (*vop_mkdir)(struct vnode *parentdir,
                     const char *name);
    int (*vop_link)(struct vnode *dir,
                    const char *name, struct vnode *file);
    int (*vop_remove)(struct vnode *dir,
                      const char *name);
    int (*vop_rmdir)(struct vnode *dir,
                     const char *name);

    int (*vop_rename)(struct vnode *vn1, const char *name1,
                      struct vnode *vn2, const char *name2);

    int (*vop_lookup)(struct vnode *dir,
                      char *pathname, struct vnode **result);
    int (*vop_lookparent)(struct vnode *dir,
                          char *pathname, struct vnode **result,
                          char *buf, size_t len);
};
Vnode Ops

• Note that most operations are on vnodes. How do we operate on file
  names?
  – A higher-level API on names that uses the internal VOP_* functions
    (a usage sketch follows the list below)

int vfs_open(char *path, int openflags, struct vnode **ret);
void vfs_close(struct vnode *vn);
int vfs_readlink(char *path, struct uio *data);
int vfs_symlink(const char *contents, char *path);
int vfs_mkdir(char *path);
int vfs_link(char *oldpath, char *newpath);
int vfs_remove(char *path);
int vfs_rmdir(char *path);
int vfs_rename(char *oldpath, char *newpath);

int vfs_chdir(char *path);
int vfs_getcwd(struct uio *buf);
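A minimal sketch of how the two layers compose inside the OS/161
kernel, assuming OS/161’s uio_kinit helper and the VOP_READ dispatch
macro (details such as the exact vfs_open signature vary between
OS/161 versions; the file name is hypothetical):

struct vnode *vn;
struct iovec iov;
struct uio u;
char buf[128];
char path[] = "emu0:hello.txt";   /* mutable: vfs_open may modify the path */
int result;

result = vfs_open(path, O_RDONLY, &vn);    /* name-based layer resolves the path */
if (result == 0) {
        uio_kinit(&iov, &u, buf, sizeof(buf), 0, UIO_READ);
        result = VOP_READ(vn, &u);         /* dispatches via vn->vn_ops->vop_read */
        vfs_close(vn);
}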
Example: OS/161 emufs vnode ops

/*
 * Function table for emufs files.
 */
static const struct vnode_ops emufs_fileops = {
    VOP_MAGIC,   /* mark this a valid vnode ops table */

    emufs_open,
    emufs_close,
    emufs_reclaim,

    emufs_read,
    NOTDIR,      /* readlink */
    NOTDIR,      /* getdirentry */
    emufs_write,
    emufs_ioctl,
    emufs_stat,
    emufs_file_gettype,
    emufs_tryseek,
    emufs_fsync,
    UNIMP,       /* mmap */
    emufs_truncate,
    NOTDIR,      /* namefile */

    NOTDIR,      /* creat */
    NOTDIR,      /* symlink */
    NOTDIR,      /* mkdir */
    NOTDIR,      /* link */
    NOTDIR,      /* remove */
    NOTDIR,      /* rmdir */
    NOTDIR,      /* rename */

    NOTDIR,      /* lookup */
    NOTDIR,      /* lookparent */
};
File Descriptor & Open File Tables
(The storage stack again; this slide focuses on the FD table and OF
table layers.)
Motivation

System call interface:
    fd = open(“file”,…);
    read(fd,…); write(fd,…); lseek(fd,…);
    close(fd);

VFS interface:
    vnode = vfs_open(“file”,…);
    vop_read(vnode,uio);
    vop_write(vnode,uio);
    vop_close(vnode);

The FD table and OF table bridge the gap: system calls identify files
by file descriptor, while the VFS layer below identifies them by vnode.
File Descriptors

• File descriptors
  – Each open file has a file descriptor
  – Read/Write/lseek/… use them to specify which file to operate on.
• State associated with a file descriptor
  – File pointer
    • Determines where in the file the next read or write is performed
  – Mode
    • Was the file opened read-only, etc.
An Option?

• Use vnode numbers as file descriptors and add a file pointer to the
  vnode

• Problems
  – What happens when we concurrently open the same file twice?
    • We should get two separate file descriptors and file pointers…
An Option?

• Single global open file array
  – fd is an index into the array
  – Entries contain a file pointer (fp) and a pointer (i-ptr) to a
    vnode in the array of inodes in RAM
Issues

• File descriptor 1 is stdout
  – Stdout is
    • the console for some processes
    • a file for others
• Entry 1 needs to be different per process!
Per-process File Descriptor Array

• Each process has its own open file array
  – Contains fp, v-ptr etc.
  – Fd 1 can be any inode for each process (console, log file).
Issue

• Fork
  – Fork defines that the child shares the file pointer with the parent
• Dup2
  – Also defines that the two file descriptors share the file pointer
• With a per-process table, we can only have independent file pointers
  – Even when accessing the same file
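The shared file pointer is standard POSIX behaviour and is visible
from user space; a small illustration (the file name is hypothetical):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1 = open("log.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int fd2 = dup(fd1);     /* fd1 and fd2 now share one file pointer */

    write(fd1, "aa", 2);
    write(fd2, "bb", 2);    /* lands after "aa": the offset is shared, */
                            /* so the file ends up containing "aabb"  */
    close(fd1);
    close(fd2);
    return 0;
}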
Per-Process fd table with global open file table

• Per-process file descriptor array
  – Contains pointers to open file table entries
• Open file table array
  – Contains entries with a fp and a pointer to a vnode.
• Provides
  – Shared file pointers if required
  – Independent file pointers if required
• Example:
  – All three fds refer to the same file; two share a file pointer,
    one has an independent file pointer

(Diagram: P1’s and P2’s per-process file descriptor tables hold f-ptrs
into the open file table; each open file table entry holds a fp and a
v-ptr to a vnode.)
Per-Process fd table with global open file table

• Used by Linux and most other Unix operating systems
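A sketch in C of how the two tables might be declared; the names here
are hypothetical (Linux’s real equivalents are struct files_struct for
the per-process table and struct file for open file table entries):

#include <sys/types.h>   /* off_t */

#define OPEN_MAX 128     /* assumed per-process fd limit */

struct vnode;            /* provided by the VFS layer */

struct open_file {                    /* entry in the global open file table */
    off_t         of_offset;          /* the (possibly shared) file pointer */
    int           of_flags;           /* mode: read-only, etc. */
    int           of_refcount;        /* how many fds reference this entry */
    struct vnode *of_vnode;           /* the underlying file */
};

struct fd_table {                     /* one per process */
    struct open_file *fds[OPEN_MAX];  /* fd is an index; two slots (even in
                                         different processes) may point to
                                         the same open_file entry */
};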
Buffer Cache
(The storage stack again; this slide focuses on the buffer cache
layer.)
Buffer
• Buffer:
  – Temporary storage used when transferring
    data between two entities
     • Especially when the entities work at different rates
     • Or when the unit of transfer is incompatible
     • Example: between application program and disk




Buffering Disk Blocks

• Allow applications to work with arbitrarily sized regions of a file
  – However, apps can still optimise for a particular block size

(Diagram: the application program transfers arbitrarily sized regions
of a file to and from buffers in kernel RAM; whole blocks are
transferred between those buffers and the disk.)
Buffering Disk Blocks

• Writes can return immediately after copying to a kernel buffer
  – Avoids waiting until the write to disk is complete
  – The write is scheduled in the background

(Same diagram as above.)
Buffering Disk Blocks

• Can implement read-ahead by pre-loading the next block on disk into
  a kernel buffer
  – Avoids having to wait until the next read is issued

(Same diagram as above.)
Cache
• Cache:
  – Fast storage used to temporarily hold data to
    speed up repeated access to the data
    • Example: Main memory can cache disk blocks




Caching Disk Blocks

• On access
  – Before loading a block from disk, check if it is in the cache first
    • Avoids disk accesses
• Can optimise for repeated access by a single process or by several
  processes

(Diagram: cached blocks are held in kernel RAM; whole blocks move
between the cache and the disk, and arbitrarily sized regions of a
file move between the cache and the application.)
Buffering and caching are related

• Data is read into a buffer; an extra cache copy would be wasteful
• After use, the block should be put in a cache
• Future accesses may hit the cached copy
• The cache utilises unused kernel memory space; it may have to shrink
Unix Buffer Cache

On read:
  – Hash the device#, block#
  – Check for a match in the buffer cache
  – Yes: simply use the in-memory copy
  – No: follow the collision chain
  – If not found, load the block from disk into the cache
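A minimal sketch of that lookup in C, with hypothetical names and
structures (a real kernel adds reference counts, locking, and free
lists; the next and last_access fields are used by later sketches):

#define NBUCKETS 64
#define BUFHASH(dev, blk) ((unsigned)((dev) * 31u + (blk)) % NBUCKETS)

struct buf {
    int dev, blkno;           /* which disk block this buffer holds */
    struct buf *hash_next;    /* collision chain within a hash bucket */
    struct buf *next;         /* chain of all buffers in the cache */
    int dirty;                /* modified since last written to disk? */
    long last_access;         /* timestamp, for LRU */
    char *data;               /* the cached block contents */
};

static struct buf *hashtab[NBUCKETS];
static struct buf *all_bufs;  /* every buffer in the cache */

/* On read: hash (device#, block#) and walk the collision chain. */
struct buf *buffer_lookup(int dev, int blkno)
{
    struct buf *b;
    for (b = hashtab[BUFHASH(dev, blkno)]; b != NULL; b = b->hash_next)
        if (b->dev == dev && b->blkno == blkno)
            return b;         /* hit: use the in-memory copy */
    return NULL;              /* miss: caller loads the block from disk */
}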
Replacement

• What happens when the buffer cache is full and we need to read
  another block into memory?
  – We must choose an existing entry to replace
    • Need a policy to choose a victim
      – Can use First-in First-out
      – Least Recently Used, or others.
    • Timestamps are required for an LRU implementation
    • However, is strict LRU what we want?
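Continuing the buffer-cache sketch above, LRU victim selection reduces
to scanning for the oldest timestamp:

/* Choose the least-recently-used buffer as the victim. */
struct buf *choose_victim(void)
{
    struct buf *victim = NULL;
    struct buf *b;
    for (b = all_bufs; b != NULL; b = b->next)
        if (victim == NULL || b->last_access < victim->last_access)
            victim = b;
    return victim;   /* caller writes it back if dirty, then reuses it */
}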
File System Consistency
• File data is expected to survive
• Strict LRU could keep critical data in
  memory forever if it is frequently used.




File System Consistency

• Generally, cached disk blocks are prioritised in terms of how
  critical they are to file system consistency
  – Directory blocks and inode blocks, if lost, can corrupt the entire
    filesystem
    • E.g. imagine losing the root directory
    • These blocks are usually scheduled for immediate write to disk
  – Data blocks, if lost, corrupt only the file that they are
    associated with
    • These blocks are only scheduled for write back to disk
      periodically
    • In UNIX, flushd (the flush daemon) flushes all modified blocks
      to disk every 30 seconds (a sketch follows below)
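A hypothetical sketch of such a daemon’s main loop, reusing the
struct buf from the buffer-cache sketch earlier; write_block is an
assumed helper that writes one buffer to its disk block:

#include <unistd.h>   /* sleep() */

extern void write_block(struct buf *b);   /* assumed write-back helper */

void flushd_main(void)
{
    for (;;) {
        sleep(30);                         /* wake up every 30 seconds */
        for (struct buf *b = all_bufs; b != NULL; b = b->next)
            if (b->dirty) {
                write_block(b);            /* flush the modified block */
                b->dirty = 0;              /* buffer is clean again */
            }
    }
}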
File System Consistency

• Alternatively, use a write-through cache
  – All modified blocks are written immediately to disk
  – Generates much more disk traffic
    • Temporary files are written back
    • Multiple updates are not combined
  – Used by DOS
    • Gave okay consistency when
      – Floppies were removed from drives
      – Users were constantly resetting (or crashing) their machines
  – Still used, e.g. for USB storage devices
Disk scheduler
(The storage stack again; this slide focuses on the disk scheduler
layer.)
Disk Management

• Management and ordering of disk access requests is important:
  – Huge speed gap between memory and disk
  – Disk throughput is extremely sensitive to
    • Request order ⇒ disk scheduling
    • Placement of data on the disk ⇒ file system design
  – The disk scheduler must be aware of disk geometry
Disk Geometry

• Physical geometry of a disk with two zones
  – Outer tracks can store more sectors than inner tracks without
    exceeding the maximum information density
• A possible virtual geometry for this disk

(The slide’s figures of the physical and virtual geometries are not
included.)
Evolution of Disk Hardware

(Table not included: disk parameters for the original IBM PC floppy
disk and a Western Digital WD 18300 hard disk.)
Things to Note
• Average seek time is approx 12 times better
• Rotation time is 24 times faster
• Transfer time is 1300 times faster
  – Most of this gain is due to increase in density
• Represents a gradual engineering improvement




Storage Capacity is 50,000 times greater

(Figure not included.)
Estimating Access Time

(The slide’s table of access-time components is not included.)
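The standard textbook estimate (e.g. Stallings) for the average time
to transfer b bytes is:

    T_a = T_s + 1/(2r) + b/(r*N)

where T_s is the average seek time, r is the rotation rate in
revolutions per second (so 1/(2r) is the average rotational delay of
half a revolution), and N is the number of bytes per track (so b/(r*N)
is the transfer time).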
A Timing Comparison

(The slide’s worked timing comparison is not included.)
Disk Performance is Entirely Dominated by Seek and Rotational Delays

• Will only get worse as capacity increases much faster than seek time
  and rotation speed improve
  – Note it has been easier to spin the disk faster than to improve
    seek time
• The operating system should minimise mechanical delays as much as
  possible

Chart data (average access time components in ms, scaled to 100%):
    Disk 1: seek 77,  rotational delay 100,   transfer 22
    Disk 2: seek 6.9, rotational delay 4.165, transfer 0.017
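Summing those components shows the point: Disk 1 averages
77 + 100 + 22 = 199 ms per access, of which transfer is only about
11%; Disk 2 averages 6.9 + 4.165 + 0.017 ≈ 11.1 ms, of which transfer
is under 0.2%. Almost all of the access time is mechanical delay.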
Disk Arm Scheduling Algorithms

• Time required to read or write a disk block is determined by 3
  factors
  1. Seek time
  2. Rotational delay
  3. Actual transfer time
• Seek time dominates
• For a single disk, there will be a number of I/O requests
  – Processing them in random order leads to the worst possible
    performance
First-in, First-out (FIFO)
•   Process requests as they come
•   Fair (no starvation)
•   Good for a few processes with clustered requests
•   Deteriorates to random if there are many processes
Shortest Seek Time First
• Select request that minimises the seek time
• Generally performs much better than FIFO
• May lead to starvation
Elevator Algorithm (SCAN)
•   Move head in one direction
     – Services requests in track order until it reaches the last track,
       then reverses direction
•   Better than FIFO, usually worse than SSTF
•   Avoids starvation
•   Makes poor use of sequential reads (on the down-scan)
•   Less locality
Modified Elevator (Circular SCAN, C-SCAN)
• Like the elevator algorithm, but reads sectors in only one direction
   – When reaching the last track, go back to the first track non-stop
• Better locality on sequential reads
• Better use of the read-ahead cache on the controller
• Reduces the maximum delay to read a particular sector
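A minimal sketch of C-SCAN request selection, assuming pending
requests are kept in a simple array of track numbers (a real scheduler
would maintain a sorted queue):

/* Return the index of the next request to service under C-SCAN: the
 * closest track at or beyond the current head position; if none
 * remain ahead, wrap around to the lowest pending track.
 * Returns -1 if there are no pending requests. */
int cscan_next(const int *tracks, int n, int head)
{
    int best = -1;   /* closest request at or beyond the head */
    int wrap = -1;   /* lowest track overall, for the wrap-around */

    for (int i = 0; i < n; i++) {
        if (tracks[i] >= head && (best == -1 || tracks[i] < tracks[best]))
            best = i;
        if (wrap == -1 || tracks[i] < tracks[wrap])
            wrap = i;
    }
    return (best != -1) ? best : wrap;
}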
