3. File Concept
• Contiguous logical address space
• Types:
– Data
• numeric
• character
• binary
– Program
4. File Concept
• A file is a named collection of related
information that is recorded on secondary
storage.
• Data cannot be written to secondary storage
unless it is within a file.
5. File Structure
• A file has a certain defined structure which depends on its
types:
– A text file is a sequence of characters organized into
lines.
– A source file is a sequence of subroutines and functions.
– An object file is a sequence of bytes organized into
blocks understandable by the system’s linker.
– An executable file is a series of code sections that the
loader can bring into memory and execute.
6. File Structure
• None - sequence of words, bytes
• Simple record structure
– Lines
– Fixed length
– Variable length
• Complex Structures
– Formatted document
– Relocatable load file
• Can simulate last two with first method by inserting
appropriate control characters
• Who decides:
– Operating system
– Program
7. File Attributes
• Name – only information kept in human-readable form
• Identifier – unique tag (number) identifies file within file
system
• Type – needed for systems that support different types
• Location – pointer to file location on device
• Size – current file size
• Protection – controls who can do reading, writing,
executing
• Time, date, and user identification – data for protection,
security, and usage monitoring
• Information about files is kept in the directory structure,
which is maintained on the disk
8. File Operations
• File is an abstract data type
• Create
• Write
• Read
• Reposition within file
• Delete
• Truncate
• Open(Fi) – search the directory structure on disk for
entry Fi, and move the content of entry to memory
• Close (Fi) – move the content of entry Fi in memory to
directory structure on disk
9. Open Files
• Several pieces of data are needed to manage
open files:
– File pointer: pointer to last read/write location, per
process that has the file open
– File-open count: counter of the number of times a file is
open – to allow removal of data from the open-file table
when the last process closes it
– Disk location of the file: cache of data access
information
– Access rights: per-process access mode information
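The file-open count can be sketched as a small table. This is only an illustration: the class name `OpenFileTable`, keying entries by file name, and the method names are assumptions, not part of any real OS interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical system-wide open-file table that tracks only the open count.
class OpenFileTable {
    private final Map<String, Integer> openCounts = new HashMap<>();

    // Called on open(): create the entry or increment its count. Returns the new count.
    int open(String name) {
        return openCounts.merge(name, 1, Integer::sum);
    }

    // Called on close(): decrement the count and remove the entry
    // when the last process closes the file. Returns the remaining count.
    int close(String name) {
        Integer count = openCounts.get(name);
        if (count == null) {
            throw new IllegalStateException(name + " is not open");
        }
        if (count == 1) {
            openCounts.remove(name);
            return 0;
        }
        openCounts.put(name, count - 1);
        return count - 1;
    }

    boolean isOpen(String name) {
        return openCounts.containsKey(name);
    }

    public static void main(String[] args) {
        OpenFileTable table = new OpenFileTable();
        table.open("notes.txt");  // first process opens the file
        table.open("notes.txt");  // second process opens the same file
        table.close("notes.txt"); // entry stays: one process still has it open
        table.close("notes.txt"); // last close removes the entry
        System.out.println(table.isOpen("notes.txt"));
    }
}
```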
10. Open File Locking
• Provided by some operating systems and file
systems
• Mediates access to a file
• Mandatory or advisory:
– Mandatory – access is denied depending on locks
held and requested
– Advisory – processes can find status of locks and
decide what to do
11. File Locking Example – Java API
import java.io.*;
import java.nio.channels.*;
public class LockingExample {
    // values passed as the "shared" argument of FileChannel.lock()
    public static final boolean EXCLUSIVE = false;
    public static final boolean SHARED = true;
    public static void main(String[] args) throws IOException {
        FileLock sharedLock = null;
        FileLock exclusiveLock = null;
        try {
            RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");
            // get the channel for the file
            FileChannel ch = raf.getChannel();
            // this locks the first half of the file - exclusive
            exclusiveLock = ch.lock(0, raf.length()/2, EXCLUSIVE);
            /** Now modify the data . . . */
            // release the lock
            exclusiveLock.release();
12. File Locking Example – Java API (cont)
            // this locks the second half of the file - shared
            // (lock() takes a start position and a size, not an end position)
            sharedLock = ch.lock(raf.length()/2, raf.length() - raf.length()/2,
                    SHARED);
            /** Now read the data . . . */
            // release the lock
            sharedLock.release();
        } catch (java.io.IOException ioe) {
            System.err.println(ioe);
        } finally {
            // releasing a lock that is already released has no effect
            if (exclusiveLock != null)
                exclusiveLock.release();
            if (sharedLock != null)
                sharedLock.release();
        }
    }
}
18. Simulation of Sequential Access on Direct-access File
• Relative block number is an index relative to the beginning of the file
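A minimal sketch of the simulation, assuming a hypothetical `DirectFile` class whose blocks are strings: a current-position variable `cp` holds a relative block number, and the sequential operations are built from the direct-access `read(n)` and `write(n)`.

```java
// Hypothetical direct-access file; blocks are addressed by relative block number.
class DirectFile {
    private final String[] blocks;
    private int cp = 0; // current position (relative block number)

    DirectFile(int blockCount) {
        blocks = new String[blockCount];
    }

    // Direct-access primitives.
    String read(int n) { return blocks[n]; }
    void write(int n, String data) { blocks[n] = data; }

    // Sequential operations simulated on top of direct access.
    void reset() { cp = 0; }
    String readNext() { return read(cp++); }        // read cp, then advance
    void writeNext(String data) { write(cp++, data); } // write cp, then advance

    public static void main(String[] args) {
        DirectFile f = new DirectFile(4);
        f.writeNext("first");
        f.writeNext("second");
        f.reset();
        System.out.println(f.readNext());
        System.out.println(f.readNext());
    }
}
```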
19. Other Access Methods
• The index contains pointers to the various blocks. To
find a record in the file, we first search the index and
then use the pointer to access the file directly and find
the desired record.
• IBM's Indexed Sequential Access Method (ISAM) uses a
master index that points to disk blocks of a secondary index
• The secondary index points to the file blocks
• To find a particular item, a binary search is performed on
the master index
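The index lookup can be sketched as follows. This is a single-level simplification, not ISAM itself; the class `IndexedFile`, integer keys, and the convention that each index entry stores the smallest key of its block are assumptions made for the example. Blocks are assumed non-empty and sorted.

```java
import java.util.Arrays;

// Hypothetical indexed file: index[i] is the smallest key stored in block i.
class IndexedFile {
    private final int[] index;
    private final int[][] blocks;

    IndexedFile(int[][] sortedBlocks) {
        this.blocks = sortedBlocks;
        this.index = new int[sortedBlocks.length];
        for (int i = 0; i < sortedBlocks.length; i++) {
            index[i] = sortedBlocks[i][0];
        }
    }

    // Binary search of the index: returns the block that may hold key,
    // or -1 if key is smaller than every indexed key.
    int findBlock(int key) {
        int pos = Arrays.binarySearch(index, key);
        if (pos >= 0) {
            return pos;            // key is the first record of a block
        }
        int insertion = -pos - 1;  // first index entry greater than key
        return insertion - 1;      // the block just before it
    }

    // Search the index first, then scan only the selected block.
    boolean contains(int key) {
        int b = findBlock(key);
        if (b < 0) {
            return false;
        }
        for (int k : blocks[b]) {
            if (k == key) {
                return true;
            }
        }
        return false;
    }
}
```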
21. Information in a Device Directory
• Name
• Type
• Address
• Current length
• Maximum length
• Date last accessed (for archival)
• Date last updated (for dump)
• Owner ID (who pays)
• Protection information
22. Disk Structure
• Disk can be subdivided into partitions
• Disks or partitions can be RAID protected against failure
• Disk or partition can be used raw – without a file system, or
formatted with a file system
• Partitions also known as minidisks, slices
• Entity containing file system known as a volume
• Each volume containing file system also tracks that file system’s info
in device directory or volume table of contents
• As well as general-purpose file systems there are many special-
purpose file systems, frequently all within the same operating
system or computer
24. Directory Operations
• Search for a file – need to find a particular entry or be
able to find file names based on a pattern match.
• Create a file – and add its entry to the directory.
• Delete a file – and remove it from the directory.
• List a directory – list both the files in the directory and
the contents of the directory entry for each file.
• Rename a file – renaming may imply changing the
position of the file entry in the directory structure.
• Traverse the file system – the directory needs a logical
structure such that every directory and every file within
each directory can be accessed efficiently.
25. Directory Design Goal
To organize the logical structure to obtain:
• Efficiency – locating a file quickly.
• Naming – convenient to users.
– Two users can have same name for different files.
– The same file can have several different names.
• Grouping – logical grouping of files by properties,
(e.g., all Java programs, all games, …)
26. Single-Level Directory
• The simplest solution: a single-level directory with file
entries for all users contained in the same directory.
• Advantages:
– Easy to support and understand.
• Disadvantages:
– Requires unique file names {the naming problem}.
– No natural system for keeping track of file names {the grouping problem}.
27. Two-Level Directory
• Standard solution: a separate directory for each user.
• The system’s Master File Directory (MFD) has
pointers to individual User File Directories (UFD’s).
• File names default to localized UFD for all operations.
28. Two-Level Directory
• Advantages
– Solves the name-collision problem.
– Isolates users from one another, a form of protection.
– Efficient searching.
• Disadvantages
– Restricts user cooperation.
– No logical grouping capability (other than by user).
29. Path Name
• If a user can access another user’s files, the
concept of path name is needed.
• In a two-level directory, the tree structure has the MFD
as the root; a path runs through a UFD to the user file
name at a leaf.
• Path name: username + filename
• Standard syntax: /user/file.ext
Add Partitions
• Additional syntax needed to specify partition
– e.g. in MS-DOS C:\user\file.ext
30. Path Name
System File Issues
• Those programs provided as part of the system (e.g.
loaders, compilers, utility routines)
• e.g., Dotted files in Unix
• Another tradeoff issue
– Copy all system files into each UFD OR
– Create special user file directory that contains the
system files.
• Note: This complicates the file search procedure.
• Default is to search local UFD, and then special
UFD.
• To override this default search scheme, the user specifies
a specific sequence of directories to be searched when a
file is named – the search path.
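A search path can be sketched as an ordered list of directories tried in turn. The class `SearchPath`, the in-memory map standing in for the directory structure, and the directory names in the usage note are all illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class SearchPath {
    // Returns the full path of fileName in the first directory on the
    // search path that contains it, or empty if no directory does.
    static Optional<String> resolve(String fileName,
                                    List<String> searchPath,
                                    Map<String, Set<String>> directories) {
        for (String dir : searchPath) {
            Set<String> entries = directories.get(dir);
            if (entries != null && entries.contains(fileName)) {
                return Optional.of(dir + "/" + fileName);
            }
        }
        return Optional.empty();
    }
}
```

With a search path of the local UFD followed by a special system directory, a name present in both resolves to the local copy, which matches the default order described above.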
31. Tree-Structured Directories
• This generalization to a directory tree structure of arbitrary height
allows users to create their own subdirectories and organize their
files accordingly.
Directory
• Becomes simply another file.
• Contains a set of files or subdirectories.
• All directories have the same internal format.
• One bit in directory entry defines entry as file or directory.
• Special commands are used to create and delete directories.
33. Tree-Structured Directories
• Advantages
– Efficient searching
– Grouping Capability
• Each user has a current directory (working
directory)
– cd /spell/mail/prog
– type list
34. Tree-Structured Directories
• Absolute or relative path name
• Creating a new file is done in current directory.
• Delete a file
rm <file-name>
• Creating a new subdirectory is done in current directory.
mkdir <dir-name>
Example: if in current directory /mail
mkdir count
creates subdirectory count, so mail now contains: prog copy prt exp count
Deleting “mail” deletes the entire subtree rooted at “mail”.
35. Acyclic-Graph Directories
• A tree structure prohibits the sharing of files or directories.
• Acyclic graphs allow directories to have shared subdirectories and
files.
36. Acyclic-Graph Directories
Implementations of shared files or directories
• Links
– A new type of directory entry
– Effectively a pointer to another file or subdirectory
• Implemented as an absolute or relative path name.
– A link entry is resolved by using the path name to locate the
real file. {Note the inefficiency !}
– Problems are similar to aliasing because distinct file names
can refer to the same file.
• Duplicate all information in sharing directories
– Big problem is maintaining consistency when the file is
modified.
37. Acyclic-Graph Directories
Problems to consider with link implementation:
• Upon traversal of file system, do not want to traverse
shared structures more than once (e.g., doing backups or
accumulating file statistics).
• On deletion, which action to take?
– Option 1: remove the file when anyone issues delete;
this can leave dangling pointers to the non-existent file.
– Option 2: [UNIX] use symbolic links; links are left
when the file is deleted and the user has to “realize” that
the original file is gone.
– Option 3: maintain a file reference list containing one
entry for each reference to the file {disadvantages –
variable and large list}.
– Option 4: keep a count of the number of references.
When count=0, the file is deleted.
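The reference-count option can be sketched as below. The class `LinkCountedStore`, the integer file identifiers, and the method names are assumptions for illustration, not a real file-system API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store where several names may refer to one file.
class LinkCountedStore {
    private final Map<String, Integer> fileOfName = new HashMap<>();   // directory entries
    private final Map<Integer, Integer> referenceCount = new HashMap<>();
    private int nextFileId = 1;

    int create(String name) {
        int id = nextFileId++;
        fileOfName.put(name, id);
        referenceCount.put(id, 1);
        return id;
    }

    // Adds another name for an existing file and increments its count.
    void link(String existing, String newName) {
        Integer id = fileOfName.get(existing);
        if (id == null) {
            throw new IllegalArgumentException("no such file: " + existing);
        }
        if (fileOfName.containsKey(newName)) {
            throw new IllegalArgumentException("name in use: " + newName);
        }
        fileOfName.put(newName, id);
        referenceCount.merge(id, 1, Integer::sum);
    }

    // Removes one name. The file is deleted only when its count reaches 0.
    // Returns true if the file itself was deleted.
    boolean delete(String name) {
        Integer id = fileOfName.remove(name);
        if (id == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        int remaining = referenceCount.merge(id, -1, Integer::sum);
        if (remaining == 0) {
            referenceCount.remove(id);
            return true;
        }
        return false;
    }
}
```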
38. General Graph Directory
• When links are added to an existing tree-structured
directory, a general graph structure can be created.
39. General Graph Directory
• A general graph can have cycles and cycles cause
problems when searching or traversing file system.
• How do we guarantee no cycles?
– Allow only links to files not subdirectories.
– Use Garbage collection. {computationally expensive}
– Every time a new link is added, use a cycle detection
algorithm to determine whether a cycle now exists.
{computationally expensive}
• An alternative approach – to bypass links during directory
traversal.
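The cycle check on link creation can be sketched with a depth-first search: adding a link from `parent` to `child` closes a cycle exactly when `parent` is already reachable from `child`. The class `DirectoryGraph` and the string directory names are assumptions for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical directory graph: an edge parent -> child means parent contains child.
class DirectoryGraph {
    private final Map<String, Set<String>> children = new HashMap<>();

    // True if the edge parent -> child would close a cycle,
    // i.e. parent is reachable from child.
    boolean wouldCreateCycle(String parent, String child) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(child);
        while (!stack.isEmpty()) {
            String dir = stack.pop();
            if (dir.equals(parent)) {
                return true;
            }
            if (seen.add(dir)) {
                for (String next : children.getOrDefault(dir, Set.of())) {
                    stack.push(next);
                }
            }
        }
        return false;
    }

    // Adds the link only if the graph stays acyclic; returns whether it was added.
    boolean addLink(String parent, String child) {
        if (wouldCreateCycle(parent, child)) {
            return false;
        }
        children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
        return true;
    }
}
```

The search visits every directory reachable from the link target in the worst case, which is why the check is described as computationally expensive.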
40. File System Mounting
• A file system must be mounted before it can be available
to processes on the system.
• The mount procedure: the OS is given the device name
and the location within the file structure at which to
attach the file system. {the mount point}
• A mount point is typically an empty directory where the
mounted file system will be attached.
• The OS verifies that device has valid file system by asking
device driver to read the device directory and verify that
directory has the proper format.
43. File Sharing
• Sharing of files on multi-user systems is desirable.
• Sharing may be done through a protection
scheme.
• On distributed systems, files may be shared
across a network.
• Network File System (NFS) is a common
distributed file-sharing method.
44. File-System Structure
• File system
– Provides efficient and convenient access to disk
– Easy access to the data (store, locate and retrieve)
• Two aspects
– User’s view
• Define files/attributes, operations, directory
– Implementing file system
• Data structures and algorithms to map logical view to
physical one
45. File-System Structure
• File structure
– Logical storage unit
– Collection of related information
• File system organized into layers
• File system resides on secondary storage (disks)
– Provides efficient and convenient access to disk by
allowing data to be stored, located, and retrieved easily
• File control block – storage structure consisting of
information about a file
• Device driver controls the physical device
46. Layered File System
• Each level uses the features
of lower levels
• Creates new features for
higher levels
[Figure: layered file system. Annotations beside the layers: manages FCB; translates logical to physical blocks; issues commands, R/W physical block (cylinder, track, sector); device driver, transfers information between memory/disk; hardware-specific instructions]
47. Layers of the file system, top to bottom:
• Application Programs – the code that's making a file request.
• Logical File System – the highest level in the OS; it handles
protection and security, and uses the directory structure to do
name resolution.
• File-organization Module – reads the file control block maintained
in the directory, so we know about files and the logical blocks
where information about each file is located.
• Basic File System – knowing the specific blocks to access, makes
generic requests to the appropriate device driver.
• I/O Control – device drivers and interrupt handlers; they cause the
device to transfer information between that device and main memory.
• Devices – the disks, tapes, etc.
48. File System Layers
• I/O control layer – device drivers manage I/O devices at
this layer
– Given commands like “read drive1, cylinder 72, track 2,
sector 10, into memory location 1060”, outputs low-level
hardware-specific commands to the hardware controller
• Basic file system – issues commands with physical block addresses
(sector, track)
• File organization module – understands files, logical addresses, and
physical blocks
– Translates logical block # to physical block #
– Manages free space, disk allocation
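One way to picture the step from a linear block number to a (cylinder, track, sector) address is the arithmetic below. It is a sketch under loud assumptions: one block per sector, zero-based numbering, a fixed geometry, and blocks laid out sector by sector, then track by track, then cylinder by cylinder. The class `DiskGeometry` is hypothetical.

```java
// Hypothetical fixed disk geometry; one block is assumed to be one sector.
class DiskGeometry {
    private final int tracksPerCylinder;
    private final int sectorsPerTrack;

    DiskGeometry(int tracksPerCylinder, int sectorsPerTrack) {
        this.tracksPerCylinder = tracksPerCylinder;
        this.sectorsPerTrack = sectorsPerTrack;
    }

    // Returns {cylinder, track, sector} for a zero-based linear block number.
    int[] toPhysical(int block) {
        int sector = block % sectorsPerTrack;
        int track = (block / sectorsPerTrack) % tracksPerCylinder;
        int cylinder = block / (sectorsPerTrack * tracksPerCylinder);
        return new int[] {cylinder, track, sector};
    }

    public static void main(String[] args) {
        DiskGeometry disk = new DiskGeometry(4, 16);
        int[] address = disk.toPhysical(4650); // 72*64 + 2*16 + 10
        System.out.println(address[0] + ", " + address[1] + ", " + address[2]);
    }
}
```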
49. File System Layers (Cont.)
• Logical file system manages metadata information
– Translates file name into file number, file handle, location by
maintaining file control blocks (inodes in Unix)
– Directory management
• Layering is useful for reducing complexity and redundancy,
but adds overhead and can decrease performance
• Many file systems, sometimes many within an operating
system, share the I/O control and basic file system layers
– Each with its own format (CD-ROM is ISO 9660; Unix has UFS, FFS
(Berkeley Fast File System); Windows has FAT, FAT32, and NTFS as
well as floppy, CD, and DVD formats; Linux has more than 40 types, with
the extended file systems ext2 and ext3 leading; plus distributed file
systems, etc.)
50. File-System Implementation
• Boot control block contains info needed by system to
boot OS from that volume
• Volume control block contains volume details
• Directory structure organizes the files
• Per-file File Control Block (FCB) contains many details
about the file
51. • In-memory information used for file system management
and performance management
• Data loaded at mount time and discarded at dismount
• Data Structures
– In-memory mount table
• Mounted volume
– In-memory data structure
• Directory information of recently accessed directory
– System-wide open-file table
• Copy of FCB for each open file
– Per-process open-file table
• Pointer to appropriate entry in system-wide open-file table
53. In-Memory File System Structures
[Figure: the necessary file system structures provided by the OS; one panel refers to opening a file, the other to reading a file]
54. Virtual File Systems
• Virtual File Systems (VFS) provide an
object-oriented way of implementing file
systems.
• VFS allows the same system call interface
(the API) to be used for different types of
file systems.
• The API is to the VFS interface, rather than
any specific type of file system.
55. • Partitions and mounting
– Raw disk
• Holds information needed by RAID system
– Boot information can be stored in separate
partition
• Dual booted
– Root partition
– Mount table structure
57. • VFS architecture in Linux has 4 main object
types
– inode object
• Represents individual file
– file object
• Represents open file
– superblock object
• Represents entire file system
– dentry object
• Represents individual directory entry
58. • Abbreviated APIs for some file operations
– int open(, , ,) – open a file
– ssize_t read(, , ,) – read from a file
– ssize_t write(, , ,) – write to a file
– int mmap(, , ,) – memory-map a file
60. Directory Implementation
• Linear list of file names with pointer to the data blocks.
– simple to program
– time-consuming to execute
Disadvantages:
– Finding a file requires linear search
– Can be overcome by using
• Sorted list with binary search mechanism
• Tree data structure such as B-Tree
61. Directory Implementation
• Hash Table – linear list with hash data structure.
– decreases directory search time
– collisions – situations where two file names hash to the
same location, can be overcome by chained overflow hash
table
– Each hash entry considered as linked list instead of value
Difficulties with hash table:
– fixed size
– Depends on hash function
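A chained-overflow hash directory might look like the sketch below. The class `HashedDirectory`, the use of `String.hashCode()`, and storing only a first-block number per entry are assumptions for illustration. The bucket count is fixed at construction, which mirrors the fixed-size difficulty noted above.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical directory: a fixed-size hash table whose slots are linked lists.
class HashedDirectory {
    private static final class Entry {
        final String name;
        final int firstBlock;

        Entry(String name, int firstBlock) {
            this.name = name;
            this.firstBlock = firstBlock;
        }
    }

    private final List<LinkedList<Entry>> buckets = new ArrayList<>();

    HashedDirectory(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private int slot(String name) {
        return Math.floorMod(name.hashCode(), buckets.size());
    }

    // Colliding names simply share a chain.
    void add(String name, int firstBlock) {
        buckets.get(slot(name)).add(new Entry(name, firstBlock));
    }

    // Returns the first block of the named file, or -1 if it is not present.
    int lookup(String name) {
        for (Entry e : buckets.get(slot(name))) {
            if (e.name.equals(name)) {
                return e.firstBlock;
            }
        }
        return -1;
    }
}
```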
62. Allocation Methods
• An allocation method refers to how disk blocks are
allocated for files:
– Contiguous allocation
– Linked allocation
– Indexed allocation
63. Contiguous Allocation of Disk Space
• Each file occupies a set of contiguous blocks on the
disk
• Simple – only starting location (block #) and length
(number of blocks) are required
• Random access
• Wasteful of space (dynamic storage-allocation
problem)
64. Allocation Methods - Contiguous
• Contiguous allocation – each file occupies set of
contiguous blocks
• Blocks are allocated b, b+1, b+2, …
– Best performance in most cases
– Simple – only starting location (block #) and length
(number of blocks) are required (directory)
• Easy to implement
• Read performance is great. Only need one seek to locate the
first block in the file. The rest is easy.
• Accessing file is easy
– Minimum disk head movement
– Sequential and direct access
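Because the directory entry stores only a start block and a length, direct access reduces to one addition. A minimal sketch, with `ContiguousFile` as a hypothetical name:

```java
// Hypothetical directory entry for a contiguously allocated file.
class ContiguousFile {
    private final int start;   // first disk block, b
    private final int length;  // number of blocks

    ContiguousFile(int start, int length) {
        this.start = start;
        this.length = length;
    }

    // Logical block i of the file lives at disk block b + i.
    int physicalBlock(int logicalBlock) {
        if (logicalBlock < 0 || logicalBlock >= length) {
            throw new IndexOutOfBoundsException("block " + logicalBlock);
        }
        return start + logicalBlock;
    }
}
```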
65. • Problems
– Finding space for file
• Satisfy the request of size n from the list of holes
• External fragmentation
– Need for compaction routine
– off-line (downtime) or on-line
– Do not know the file size a priori
• Terminate and restart
• Overestimate
• Copy it in a larger hole
• Allocate new contiguous space (Extent)
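Satisfying a request of size n from a list of holes can be sketched with first fit, one of several possible strategies. The class `HoleList` and the `{start, length}` representation are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical free-hole list for contiguous allocation.
class HoleList {
    private final List<int[]> holes = new ArrayList<>(); // each hole is {start, length}

    void addHole(int start, int length) {
        holes.add(new int[] {start, length});
    }

    // First fit: take n blocks from the first hole that is large enough.
    // Returns the starting block, or -1 if no single hole can hold n blocks.
    int allocate(int n) {
        for (int i = 0; i < holes.size(); i++) {
            int[] hole = holes.get(i);
            if (hole[1] >= n) {
                int start = hole[0];
                if (hole[1] == n) {
                    holes.remove(i);   // hole used up completely
                } else {
                    hole[0] += n;      // shrink the hole from the front
                    hole[1] -= n;
                }
                return start;
            }
        }
        return -1;
    }
}
```

A request can fail even when the total free space is sufficient, because the free blocks are split across holes that are each too small. That is the external fragmentation listed above.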
66. Extent-Based Systems
• Many newer file systems (e.g., Veritas File
System) use a modified contiguous allocation
scheme
• Extent-based file systems allocate disk blocks
in extents
• An extent is a contiguous set of disk blocks
– Extents are allocated for file allocation
– A file consists of one or more extents
67. Contiguous Allocation
(a) Contiguous allocation of disk space for 7 files.
(b) The state of the disk after files D and F have been removed.
68. Linked Allocation
• Each file is a linked list of disk blocks: blocks may be scattered
anywhere on the disk.
[Figure: block = pointer to the next block + data]
69. Linked Allocation
• Free blocks are obtained from the free-space management system
• No external fragmentation
• Files can continue to grow
Disadvantages
1. Effective only for sequential access
– Random/direct access (i-th block) is difficult
2. Space wastage
– If the block size is 512 B and a disk address takes 4 B,
the effective size is 508 B
3. Reliability
– Lost/damaged pointer
– Bug in the OS software or disk hardware failure
4. Poor performance
Solution: Clusters
• Improves disk access time
(head movement)
• Decreases the link space
needed per block
• Increases internal fragmentation
70. Linked Allocation
FAT (File Allocation Table) variation
• A section of the disk at the beginning of the partition
contains the FAT, indexed by block number
• Unused blocks => 0
• Much like a linked list, but faster on disk and
cacheable
• New block allocation is simple
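The FAT variation can be sketched as an array indexed by block number, where each entry names the next block of the same file. The 0 marker for unused blocks follows the slide; the end-of-file marker -1, the reservation of block 0, and the class `FatTable` are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory FAT: next[b] is the block that follows b in its file.
class FatTable {
    static final int UNUSED = 0;        // per the slide: unused blocks => 0
    static final int END_OF_FILE = -1;  // assumed end-of-chain marker
    private final int[] next;

    FatTable(int blockCount) {
        next = new int[blockCount];     // every entry starts as UNUSED
    }

    // Allocates the first unused block and links it after lastBlock.
    // Pass 0 as lastBlock when the file has no blocks yet.
    // Block 0 is reserved in this sketch because 0 means "unused".
    int append(int lastBlock) {
        for (int b = 1; b < next.length; b++) {
            if (next[b] == UNUSED) {
                next[b] = END_OF_FILE;
                if (lastBlock != 0) {
                    next[lastBlock] = b;
                }
                return b;
            }
        }
        throw new IllegalStateException("disk full");
    }

    // Follows the chain through the table, with no access to the data blocks.
    List<Integer> chain(int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        for (int b = firstBlock; b != END_OF_FILE && b != UNUSED; b = next[b]) {
            blocks.add(b);
        }
        return blocks;
    }
}
```

Locating the i-th block of a file still means walking i links, but the walk happens inside the table, which can be cached in memory.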
71. Allocation Methods - Linked
• Linked allocation – each file a linked list of
blocks
– No compaction needed, no external fragmentation
– Free space management system called when new
block needed
– Reliability can be a problem
– Locating a block can take many I/Os and disk seeks
72. File-Allocation Table
Caching of FAT16
File-allocation table (FAT) – disk-space
allocation used by MS-DOS and OS/2.
Each block is indexed by block number.
75. Allocation Methods - Indexed
• Indexed allocation
– Each file has its own index block(s) of pointers to
its data blocks
• Directory contains address of the index block
• Logical view
index table
78. • If index block is too small it will not hold enough pointers
• Mechanisms to overcome this problem
– Linked scheme
– Multilevel index
– Combined scheme
79. • Linked scheme
– Index block is one disk block
– For large files several index blocks can be linked
– Contains small header with file name and addresses for first 100
blocks
• Multilevel index
– First-level index block points to second-level index blocks, which in turn
point to file blocks
• Combined scheme
– Used in UNIX: keeps 15 pointers in the file's inode
– The first 12 point to direct blocks and the next three point to indirect blocks
80. Multilevel Indexed Allocation
• Certain index entries point to index blocks, as
opposed to data blocks (e.g., Linux ext2)
[Figure: the file header holds 12 data block locations followed by index block locations; index blocks point to data blocks or to further index blocks]
81. Multilevel Indexed Allocation
• A single indirect block contains pointers to
data blocks
• A double indirect block contains pointers to
single indirect blocks
• A triple indirect block contains pointers to
double indirect blocks
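To see how a logical block number maps onto these levels, the sketch below assumes 12 direct pointers followed by one single, one double, and one triple indirect pointer, with `pointersPerBlock` pointers fitting in each index block. The class `InodeLayout` and its method names are hypothetical.

```java
// Hypothetical layout: 12 direct pointers, then single, double, triple indirect.
class InodeLayout {
    static final int DIRECT = 12;

    // Which pointer level is needed to reach the given zero-based logical block.
    static String levelFor(long logicalBlock, long pointersPerBlock) {
        long p = pointersPerBlock;
        if (logicalBlock < DIRECT) {
            return "direct";
        }
        long n = logicalBlock - DIRECT;
        if (n < p) {
            return "single indirect";
        }
        n -= p;
        if (n < p * p) {
            return "double indirect";
        }
        n -= p * p;
        if (n < p * p * p) {
            return "triple indirect";
        }
        throw new IllegalArgumentException("beyond the maximum file size");
    }

    // Largest number of data blocks one file can address.
    static long maxBlocks(long pointersPerBlock) {
        long p = pointersPerBlock;
        return DIRECT + p + p * p + p * p * p;
    }

    public static void main(String[] args) {
        // Example: 512-byte blocks with 4-byte addresses give 128 pointers per block.
        System.out.println(levelFor(140, 128));
        System.out.println(maxBlocks(128));
    }
}
```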
82. Pros and Cons of Multilevel Indexed
Allocation
+ Optimized for small and large files
– Small files accessed through the first 12 pointers
– Large files can grow incrementally
- Multiple disk accesses to fetch a data block
under triple indirect block
- Largest file size capped by the number of
pointers
- Arbitrary file size boundaries among levels
84. Free-Space Management
• Bit vector (n blocks)
blocks: 0 1 2 … n-1
bit[i] = 0 => block[i] free
bit[i] = 1 => block[i] occupied
Block number calculation (first free block):
(number of bits per word) *
(number of leading all-1 words) +
offset of first 0 bit
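A first-free-block search over a bit vector can be sketched as below, using the convention defined on this slide (bit = 0 means free, bit = 1 means occupied). The class `FreeBitmap`, the 64-bit word size, and numbering bits from the least significant end of each word are assumptions for the example.

```java
// Hypothetical free-space bit vector: bit i = 0 means block i is free.
class FreeBitmap {
    private final long[] words;
    private final int blockCount;

    FreeBitmap(int blockCount) {
        this.blockCount = blockCount;
        this.words = new long[(blockCount + 63) / 64]; // all bits 0: all blocks free
    }

    void markOccupied(int i) {
        words[i / 64] |= 1L << (i % 64);
    }

    void markFree(int i) {
        words[i / 64] &= ~(1L << (i % 64));
    }

    // Skips words whose bits are all 1 (fully occupied), then finds the
    // first 0 bit in the first word that has one. Returns -1 if nothing is free.
    int firstFree() {
        for (int w = 0; w < words.length; w++) {
            if (words[w] != -1L) {
                int block = 64 * w + Long.numberOfTrailingZeros(~words[w]);
                return block < blockCount ? block : -1;
            }
        }
        return -1;
    }
}
```

Whole words are compared at once, so long runs of occupied blocks are skipped with one comparison per word.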
85. Free-Space Management (Cont.)
• Bit map requires extra space
– Example:
block size = 2^12 bytes
disk size = 2^30 bytes (1 gigabyte)
n = 2^30/2^12 = 2^18 bits (or 32K bytes)
• Easy to get contiguous files
• Linked list (free list)
– Cannot get contiguous space easily
– No waste of space
86. Free-Space Management (Cont.)
• Need to protect:
– Pointer to free list
– Bit map
• Must be kept on disk
• Copy in memory and disk may differ
• Cannot allow for block[i] to have a situation
where bit[i] = 1 in memory and bit[i] = 0 on disk
– Solution:
• Set bit[i] = 1 on disk
• Allocate block[i]
• Set bit[i] = 1 in memory