Covers the three basic file organizations (sequential access, random access and indexed sequential access), with their advantages, disadvantages, applications and a comparison.
2. Logical vs. Physical Organization of Data
• logical organization
• the abstract way that the computer
program is able to access the data
• use of logical structures (e.g. linked lists)
• physical organization
• the actual physical structure of data in memory
• i.e. what the sequence of bits looks like in memory
3. Definitions
• database
– collection of related files
• file
– collection of related records
• record
– collection of related fields (e.g. Name, Age)
• key field
– uniquely identifies a record (e.g. UserID)
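As a minimal C++ sketch of these definitions, a record can be modelled as a struct whose key field identifies it. The field names `userId`, `name` and `age` and the helpers `makeRecord` and `sameKey` are hypothetical, chosen to match the examples above. The fixed-size `name` array (rather than `std::string`) is a deliberate assumption: it makes every record the same length, which is what lets a program compute where a record starts in a file.

```cpp
#include <cstdint>
#include <cstring>

// A record: a collection of related fields (hypothetical layout).
struct Record {
    std::int32_t userId;  // key field: uniquely identifies the record
    char name[32];        // fixed-size field, so every record has the same length
    std::int32_t age;
};

// Build a record; the name is truncated if needed so it always ends with '\0'.
inline Record makeRecord(std::int32_t id, const char* name, std::int32_t age) {
    Record r{};                                      // zero-initialize every field
    r.userId = id;
    std::strncpy(r.name, name, sizeof(r.name) - 1);  // last byte stays '\0'
    r.age = age;
    return r;
}

// Two records describe the same entity exactly when their key fields match.
inline bool sameKey(const Record& a, const Record& b) {
    return a.userId == b.userId;
}
```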
4. Taxonomy of file structures
• Sequential access: one record after another, from beginning to end
• Direct (random) access: access one specific record without having to retrieve all the records before it
5. Basics
• Records are stored at different places (different
indices or locations)
• The access method determines how records
can be retrieved: sequentially or randomly.
• To find a record, we need to know its location
• We can search for the record
OR
• Jump to its location directly (if location is known)
OR
• A combination of jumping and searching
6. Sequential File Organization
• Suitable for applications that
require sequential processing
of the entire file
• The records in the file are
ordered by a search-key
• Originally designed to
operate on magnetic tapes
• Records can only be accessed sequentially, one after another, from beginning to end.
7. Sequential File Organization
• Deletion – use pointer chains
• Insertion – locate the position where the record is to be inserted
– if there is free space insert there
– if no free space, insert the record
in an overflow block
– In either case, pointer chain
must be updated
• Need to reorganize the file
from time to time to restore
sequential order
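The insertion, deletion and reorganization rules above can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: the hypothetical `SeqFile` keeps integer keys in fixed slots, treats the first `primaryCapacity` slots as the main area and anything beyond as the overflow area, and stores the pointer chain as a `next` index in each slot. A real file would hold whole records in disk blocks, and would look for free space in the block where the record belongs rather than in any main-area slot.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical in-memory model of a sequential file. Slots [0, primaryCapacity)
// form the main area; slots beyond that form the overflow area. Each slot's
// `next` field is the pointer chain that keeps records in search-key order.
struct SeqFile {
    struct Slot {
        int key = 0;
        int next = -1;     // next record in key order, -1 = end of chain
        bool used = false;
    };

    std::vector<Slot> slots;
    int head = -1;         // first record in key order, -1 = empty file
    int primaryCapacity;

    explicit SeqFile(int capacity) : slots(capacity), primaryCapacity(capacity) {}

    void insert(int key) {
        // 1. Locate the position: the last record in the chain with a smaller key.
        int prev = -1;
        for (int i = head; i != -1 && slots[i].key < key; i = slots[i].next) prev = i;

        // 2. Use free space in the main area if there is any (simplified: any free slot) ...
        int pos = -1;
        for (int i = 0; i < primaryCapacity; ++i)
            if (!slots[i].used) { pos = i; break; }
        // ... otherwise place the record in the overflow area.
        if (pos == -1) {
            slots.push_back(Slot{});
            pos = static_cast<int>(slots.size()) - 1;
        }

        // 3. In either case, update the pointer chain.
        slots[pos].key = key;
        slots[pos].used = true;
        slots[pos].next = (prev == -1) ? head : slots[prev].next;
        if (prev == -1) head = pos; else slots[prev].next = pos;
    }

    // Deletion unlinks the record from the chain; its slot becomes free space.
    void remove(int key) {
        for (int prev = -1, i = head; i != -1; prev = i, i = slots[i].next) {
            if (slots[i].key == key) {
                if (prev == -1) head = slots[i].next; else slots[prev].next = slots[i].next;
                slots[i].used = false;
                return;
            }
        }
    }

    // Following the chain yields the records in search-key order.
    std::vector<int> inOrder() const {
        std::vector<int> keys;
        for (int i = head; i != -1; i = slots[i].next) keys.push_back(slots[i].key);
        return keys;
    }

    int overflowCount() const { return static_cast<int>(slots.size()) - primaryCapacity; }

    // Reorganization: rewrite the records physically in key order, with no overflow.
    void reorganize() {
        std::vector<int> keys = inOrder();
        const int n = static_cast<int>(keys.size());
        primaryCapacity = std::max(primaryCapacity, n);
        slots.assign(primaryCapacity, Slot{});
        for (int i = 0; i < n; ++i) {
            slots[i].key = keys[i];
            slots[i].used = true;
            slots[i].next = (i + 1 < n) ? i + 1 : -1;
        }
        head = (n > 0) ? 0 : -1;
    }
};
```

For example, with three main-area slots, inserting 20, 10, 30 and then 25 sends 25 to the overflow area, yet following the chain still gives 10, 20, 25, 30; `reorganize` then restores physical sequential order.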
8. Updating sequential files
• Sequential files must be updated periodically to reflect changes in information.
• The updating process – all of the records need to be checked and updated (if necessary) sequentially. The files involved are:
– Old Master File
– Transaction File – contains the changes to be applied to the master file:
• Add transaction
• Delete transaction
• Change transaction
– New Master File
– Error Report File
• A key is one or more fields that uniquely identify the data in the file.
10. Updating sequential files
• To make the updating process efficient, all files are sorted on the same key.
• The update process compares [transaction file key] with [old master file key]:
– < : only an add is valid (transaction code = A (add)); the transaction is written to the new master file, otherwise it goes to the error report file
– = :
• Change the content of the master file record (transaction code = R (revise))
• Remove the record from the master file (transaction code = D (delete))
– > : write the old master file record to the new master file
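The comparison rules above amount to a merge of two sorted sequences. Below is a minimal sketch, assuming both files have already been read into vectors sorted on the same key; the types `MasterRec`, `Transaction` and `UpdateResult` and the function `updateMaster` are hypothetical names, and the error messages are placeholders. A revise updates the current master record in a working copy, so the record is written to the new master only once the transaction key moves past it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct MasterRec { int key; std::string data; };
struct Transaction { char code; int key; std::string data; };  // code: 'A', 'R' or 'D'

struct UpdateResult {
    std::vector<MasterRec> newMaster;
    std::vector<std::string> errorReport;
};

// Merge an old master file with a transaction file; both must be sorted on the same key.
UpdateResult updateMaster(std::vector<MasterRec> master,  // working copy, revised in place
                          const std::vector<Transaction>& trans) {
    UpdateResult out;
    std::size_t m = 0, t = 0;
    while (m < master.size() || t < trans.size()) {
        const bool masterLeft = m < master.size();
        const bool transLeft = t < trans.size();
        if (transLeft && (!masterLeft || trans[t].key < master[m].key)) {
            // transaction key < master key: only an add is valid
            const Transaction& tr = trans[t++];
            if (tr.code == 'A') out.newMaster.push_back({tr.key, tr.data});
            else out.errorReport.push_back("cannot apply transaction for key " + std::to_string(tr.key));
        } else if (transLeft && trans[t].key == master[m].key) {
            // transaction key = master key: revise or delete
            const Transaction& tr = trans[t++];
            if (tr.code == 'R') master[m].data = tr.data;  // written out later
            else if (tr.code == 'D') ++m;                  // skipped: never written
            else out.errorReport.push_back("cannot apply transaction for key " + std::to_string(tr.key));
        } else {
            // transaction key > master key (or no transactions left):
            // write the (possibly revised) old master record to the new master
            out.newMaster.push_back(master[m++]);
        }
    }
    return out;
}
```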
11. Rename and Remove
• remove(filename)
– This function deletes a file from the disk. It takes one argument: the name of the file we want to delete.
• rename(oldname, newname)
– This function renames a file. It takes two arguments: the old file name and the new file name.
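These correspond to the C standard library functions `remove` and `rename`, available in C++ as `std::remove` and `std::rename` from `<cstdio>`; both return 0 on success and a nonzero value on failure. One common way to combine them, sketched below, is to replace a file safely: write the new contents under a temporary name, remove the old file, then rename the temporary file. The function `replaceFile` and the `.tmp` naming scheme are hypothetical. The explicit `std::remove` is there because the standard leaves it implementation-defined whether `std::rename` may overwrite an existing file.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Replace `fileName` with new contents: write a temporary file first, then
// remove the old file and rename the temporary one into its place.
// Returns true if the temporary file was written and renamed successfully.
bool replaceFile(const std::string& fileName, const std::string& contents) {
    const std::string tmpName = fileName + ".tmp";  // hypothetical naming scheme
    {
        std::ofstream out(tmpName);
        if (!out || !(out << contents)) return false;
    }  // the stream is closed here, before the file is renamed

    // std::remove fails harmlessly if the old file does not exist yet.
    std::remove(fileName.c_str());
    // std::rename returns 0 on success.
    return std::rename(tmpName.c_str(), fileName.c_str()) == 0;
}
```

This fits the sequential update described earlier, where a freshly written new master file takes the place of the old one.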
12. Advantages
• If the order in which you keep records in a file is not important, sequential organization is a good choice whether there are many records or only a few. Sequential output is also useful for printing reports.
• Reading records in order of the ordering key is extremely efficient.
• Finding the next record in order of the ordering key usually does not require an additional block access, because the next record is often in the same block.
• Searching on the ordering key is much faster, because binary search can be used. A binary search requires about log2(b) block accesses, where b is the total number of blocks in the file.
• It is simple to program and easy to design.
• A sequential file makes the best use of storage space.
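To illustrate the binary-search claim, the sketch below stores fixed-length records, sorted on their key, in a binary file and searches them with `seekg`, jumping straight to the middle record on each probe. It assumes one record per block, so each probe stands for one block access; the record type `Rec` and the functions `writeSorted` and `findByKey` are hypothetical, and writing raw structs ties the file format to one machine's memory layout. For b = 1024 blocks, log2(b) = 10, so a search needs about 10 to 11 probes instead of up to 1024 reads for a linear scan.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct Rec { std::int32_t key; std::int32_t value; };  // hypothetical fixed-length record

// Write records, already sorted on key, to a binary file.
void writeSorted(const std::string& path, const std::vector<Rec>& recs) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(recs.data()),
              static_cast<std::streamsize>(recs.size() * sizeof(Rec)));
}

// Binary search on the ordering key. Each probe seeks directly to the middle record,
// so `probes` counts record reads (one block access each, assuming one record per block).
bool findByKey(const std::string& path, std::int32_t key, Rec& found, int& probes) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(0, std::ios::end);
    const long long recSize = static_cast<long long>(sizeof(Rec));
    const long long n = static_cast<long long>(in.tellg()) / recSize;  // number of records
    long long lo = 0, hi = n - 1;
    probes = 0;
    while (lo <= hi) {
        const long long mid = lo + (hi - lo) / 2;
        Rec r{};
        in.seekg(static_cast<std::streamoff>(mid * recSize), std::ios::beg);
        in.read(reinterpret_cast<char*>(&r), sizeof(Rec));
        ++probes;
        if (r.key == key) { found = r; return true; }
        if (r.key < key) lo = mid + 1;
        else hi = mid - 1;
    }
    return false;
}
```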
13. Disadvantages
• A sequential file gives no advantage when the search is carried out on a non-ordering field.
• Inserting a record is an expensive operation. Inserting a new record requires finding the place of insertion, and then all the records after it must be moved to make space for the new record. This can be very expensive for large files.
• Deleting a record is also an expensive operation, because deletion too requires moving records.
• Modifying the value of the ordering key field can be time-consuming. Modifying the ordering field means the record can change its position, which requires deleting the old record and then inserting the modified record.
• Processing a sequential file is time-consuming.
• It has high data redundancy.
• Random searching is not possible.
15. Random Access File Organization
• A direct access file is also known as random access or relative file organization.
• In a direct access file, all records are stored on a direct access storage device (DASD), such as a hard disk. The records are randomly placed throughout the file.
• The records do not need to be in sequence, because they are updated directly and rewritten back in the same location.
• This file organization is useful for immediate access to large amounts of information. It is used in accessing large databases.
• It is commonly implemented with hashing.
16. Random Access File Organization
• A hashed file uses a hash function to map the key to the address.
• This eliminates the need for an extra file (an index) and all of the overhead associated with it.
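A minimal sketch of a hashed (relative) file in C++ follows. It assumes a file of `kSlots` fixed-size slots and the simple hash function key mod `kSlots`; because two keys can hash to the same address, it resolves collisions with linear probing, which is an assumption not covered above. `HashSlot`, `hashAddress`, `slotOffset`, `createHashedFile`, `readSlot`, `store` and `lookup` are hypothetical names. Note that there is no separate index: the slot address is recomputed from the key on every access.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical fixed-size slot of a hashed (relative) file.
struct HashSlot { std::int32_t used; std::int32_t key; std::int32_t value; };

const std::int32_t kSlots = 11;  // number of addresses in the file (assumed)

// Hash function: maps a key directly to a slot address, so no index is needed.
std::int32_t hashAddress(std::int32_t key) {
    return ((key % kSlots) + kSlots) % kSlots;  // also safe for negative keys
}

std::streamoff slotOffset(std::int32_t addr) {
    return static_cast<std::streamoff>(addr) * static_cast<std::streamoff>(sizeof(HashSlot));
}

// Create a file of kSlots empty slots.
void createHashedFile(const std::string& path) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    const HashSlot empty{0, 0, 0};
    for (std::int32_t i = 0; i < kSlots; ++i)
        out.write(reinterpret_cast<const char*>(&empty), sizeof(HashSlot));
}

HashSlot readSlot(std::fstream& f, std::int32_t addr) {
    HashSlot s{};
    f.seekg(slotOffset(addr), std::ios::beg);
    f.read(reinterpret_cast<char*>(&s), sizeof(HashSlot));
    return s;
}

// Store a record at its hashed address; on a collision, try the next slot (linear probing).
bool store(const std::string& path, std::int32_t key, std::int32_t value) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    for (std::int32_t i = 0; i < kSlots; ++i) {
        const std::int32_t addr = (hashAddress(key) + i) % kSlots;
        const HashSlot s = readSlot(f, addr);
        if (!s.used || s.key == key) {
            const HashSlot rec{1, key, value};
            f.seekp(slotOffset(addr), std::ios::beg);
            f.write(reinterpret_cast<const char*>(&rec), sizeof(HashSlot));
            return static_cast<bool>(f);
        }
    }
    return false;  // every slot is taken
}

// Look a key up by recomputing its address; an empty slot ends the probe sequence.
bool lookup(const std::string& path, std::int32_t key, std::int32_t& value) {
    std::fstream f(path, std::ios::in | std::ios::binary);
    for (std::int32_t i = 0; i < kSlots; ++i) {
        const HashSlot s = readSlot(f, (hashAddress(key) + i) % kSlots);
        if (!s.used) return false;
        if (s.key == key) { value = s.value; return true; }
    }
    return false;
}
```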
17. Functions
| Function | Syntax | Explanation |
|---|---|---|
| seekg() | Fileobj.seekg(longnum, origin) | Moves the input (get) file pointer to a specific location. Fileobj is the file stream object we want to access, longnum is the number of bytes we want to skip, and origin tells the stream where to begin skipping bytes. |
| seekp() | Fileobj.seekp(longnum, origin) | Moves the output (put) file pointer to a specific location. Same as seekg, but works for writing. |
| tellg() | Fileobj.tellg() | Returns the current position of the input pointer. |
| tellp() | Fileobj.tellp() | Returns the current position of the output pointer. |
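In standard C++ these are members of the stream classes (`std::fstream` has all four), and the offset argument has type `std::streamoff`. The sketch below uses them on a file of fixed-length records: record n starts at byte n * sizeof(record), so `seekp` and `seekg` can jump directly to it. The `Account` record and the helpers `offsetOf`, `writeAt` and `readAt` are hypothetical names.

```cpp
#include <cstdint>
#include <fstream>

struct Account { std::int32_t id; std::int32_t balance; };  // hypothetical record

// Byte offset of record number n (0-based) in a file of fixed-length records.
std::streamoff offsetOf(std::int32_t n) {
    return static_cast<std::streamoff>(n) * static_cast<std::streamoff>(sizeof(Account));
}

// Overwrite record n in place: seekp moves the output (put) pointer.
// Returns the put position reported by tellp after the write.
std::streamoff writeAt(std::fstream& f, std::int32_t n, const Account& a) {
    f.seekp(offsetOf(n), std::ios::beg);
    f.write(reinterpret_cast<const char*>(&a), sizeof(Account));
    return static_cast<std::streamoff>(f.tellp());
}

// Read record n directly: seekg moves the input (get) pointer.
// `posAfter` receives the get position reported by tellg after the read.
Account readAt(std::fstream& f, std::int32_t n, std::streamoff& posAfter) {
    Account a{};
    f.seekg(offsetOf(n), std::ios::beg);
    f.read(reinterpret_cast<char*>(&a), sizeof(Account));
    posAfter = static_cast<std::streamoff>(f.tellg());
    return a;
}
```

Because every access starts with an explicit seek, the same `std::fstream` can alternate between reading and writing safely.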
18. Flag Modes of Seek()
| Mode Flag | Description |
|---|---|
| ios::beg | The offset is calculated from the beginning of the file. |
| ios::end | The offset is calculated from the end of the file. |
| ios::cur | The offset is calculated from the current position. |
19. Flag Modes of Seek()
• Both istream and ostream provide member
functions for repositioning the file-position pointer.
These member functions are seekg ("seek get") for
istream and seekp ("seek put") for ostream.
• The argument to seekg and seekp normally is a
long integer. A second argument can be specified
to indicate the seek direction. The seek direction
can be ios::beg (the default) for positioning relative
to the beginning of a stream, ios::cur for positioning
relative to the current position in a stream or
ios::end for positioning relative to the end of a
stream.
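The sketch below exercises all three seek directions on a small file, and uses the common idiom of seeking to `ios::end` and calling `tellg` to find a file's size. It assumes the file contains exactly the ten characters `ABCDEFGHIJ`; `seekDemo` and `fileSize` are hypothetical helper names. Opening the file in binary mode keeps byte offsets exact.

```cpp
#include <fstream>
#include <string>

// Size of a file in bytes: seek to the end (ios::end), then ask where the get pointer is.
long long fileSize(std::ifstream& in) {
    in.seekg(0, std::ios::end);
    return static_cast<long long>(in.tellg());
}

// Reads one character after each kind of seek, for a file containing "ABCDEFGHIJ".
std::string seekDemo(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::string seen;
    in.seekg(2, std::ios::beg);           // offset 2 from the beginning -> 'C'
    seen += static_cast<char>(in.get());  // get() also advances the pointer to offset 3
    in.seekg(3, std::ios::cur);           // 3 bytes past the current position (offset 6) -> 'G'
    seen += static_cast<char>(in.get());
    in.seekg(-1, std::ios::end);          // 1 byte before the end (offset 9) -> 'J'
    seen += static_cast<char>(in.get());
    return seen;
}
```

For that file, `seekDemo` returns `"CGJ"` and `fileSize` returns 10.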
20. Advantages
• A direct access file helps in online transaction processing (OLTP) systems, such as an online railway reservation system.
• In a direct access file, the records do not need to be sorted.
• It accesses the desired records
immediately.
• It updates several files quickly.
• It has better control over record allocation.
21. Disadvantages
• A direct access file does not provide a backup facility.
• It is expensive.
• It uses storage space less efficiently than a sequential file.
23. Indexed sequential access file organization
• An indexed sequential access file combines both sequential and direct access file organization.
• In an indexed sequential access file, records are stored on a direct access device, such as a magnetic disk, in order of a primary key.
• The file may have multiple keys, which can be alphanumeric; the key on which the records are ordered is called the primary key.
• The data can be accessed either sequentially or randomly using the index. The index is stored in a file and read into memory when the file is opened.
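A minimal in-memory sketch of the idea, under several assumptions: records sorted on the primary key are grouped into fixed-size blocks, and the index holds only the lowest key in each block (a sparse index; real ISAM implementations differ in detail). Random access searches the small index to pick a block and then scans that one block sequentially, which is the combination of jumping and searching described under Basics; sequential access simply walks the blocks in order. `Emp`, `IndexedSeqFile`, `find` and `allKeys` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Emp { int key; std::string name; };  // hypothetical record; key = primary key

// Hypothetical in-memory model of an indexed sequential file.
struct IndexedSeqFile {
    std::vector<std::vector<Emp>> blocks;  // data area: blocks in primary-key order
    std::vector<int> index;                // lowest key stored in each block

    // Build from records already sorted on the primary key (perBlock must be > 0).
    IndexedSeqFile(const std::vector<Emp>& sorted, std::size_t perBlock) {
        for (std::size_t i = 0; i < sorted.size(); i += perBlock) {
            const std::size_t end = std::min(sorted.size(), i + perBlock);
            blocks.emplace_back(sorted.begin() + static_cast<std::ptrdiff_t>(i),
                                sorted.begin() + static_cast<std::ptrdiff_t>(end));
            index.push_back(sorted[i].key);
        }
    }

    // Random access: search the index to choose a block, then scan only that block.
    const Emp* find(int key, int& recordsScanned) const {
        recordsScanned = 0;
        auto it = std::upper_bound(index.begin(), index.end(), key);
        if (it == index.begin()) return nullptr;  // key is below every stored key
        const std::vector<Emp>& block = blocks[static_cast<std::size_t>(it - index.begin()) - 1];
        for (const Emp& e : block) {
            ++recordsScanned;
            if (e.key == key) return &e;
            if (e.key > key) break;  // passed the place where the key would be
        }
        return nullptr;
    }

    // Sequential access: visit every record in primary-key order.
    std::vector<int> allKeys() const {
        std::vector<int> keys;
        for (const auto& block : blocks)
            for (const Emp& e : block) keys.push_back(e.key);
        return keys;
    }
};
```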
24. Advantages
• In an indexed sequential access file, both sequential and random access are possible.
• It accesses the records very fast if the index
table is properly organized.
• The records can be inserted in the middle of
the file.
• It provides quick access for sequential and
direct processing.
• It reduces the degree of the sequential
search.
25. Disadvantages
• An indexed sequential access file requires unique keys and periodic reorganization.
• An indexed sequential access file takes longer to access or retrieve data, because the index must be searched first.
• It requires more storage space.
• It is expensive because it requires
special software.
• It is less efficient in the use of storage
space as compared to other file
organizations.
27. Fully Indexed Files
• Every record has an index entry (its address)
• The index is searched sequentially on the key field to find a specific record's address
• Records may be accessed directly OR in sequential order by address
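A fully indexed file keeps one index entry for every record. The sketch below assumes records are appended in arrival order, so a record's address is simply its position in the data area, and it searches the index's key field sequentially, as described above, before jumping to the address. `IndexEntry` and `FullyIndexedFile` are hypothetical names, and the in-memory vectors stand in for the index and data files.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct IndexEntry { int key; std::size_t address; };  // one entry per record

struct FullyIndexedFile {
    std::vector<std::string> data;  // records in arrival order; address = position
    std::vector<IndexEntry> index;  // dense index: every record has an entry

    void add(int key, const std::string& record) {
        index.push_back({key, data.size()});  // the new record's address is the next position
        data.push_back(record);
    }

    // Search the index's key field sequentially for the address, then access directly.
    const std::string* find(int key) const {
        for (const IndexEntry& entry : index)
            if (entry.key == key) return &data[entry.address];
        return nullptr;
    }
};
```

Iterating `data` from the start gives sequential access in order of address.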
29. Applications
• ISAM (Indexed Sequential Access Method) is a file
management system developed at IBM that allows
records to be accessed either sequentially (in the
order they were entered) or randomly (with an index).
Each index defines a different ordering of the records.
An employee database may have several indexes,
based on the information being sought. For example, a
name index may order employees alphabetically by
last name, while a department index may order
employees by their department. A key is specified in
each index. For an alphabetical index of employee
names, the last name field would be the key.
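The employee example can be sketched with one index per ordering. Below, each index is a `std::multimap` from its key field (last name or department) to record numbers in an unchanged data file, so iterating an index yields the records in that index's order. `Employee`, `EmployeeIndexes` and `orderBy` are hypothetical names, and a multimap is only one convenient in-memory stand-in for an index file.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Employee { std::string lastName; std::string department; };

// Each index maps its key field to record numbers in the data file.
struct EmployeeIndexes {
    std::multimap<std::string, std::size_t> byName;        // key: last name
    std::multimap<std::string, std::size_t> byDepartment;  // key: department

    explicit EmployeeIndexes(const std::vector<Employee>& file) {
        for (std::size_t i = 0; i < file.size(); ++i) {
            byName.emplace(file[i].lastName, i);
            byDepartment.emplace(file[i].department, i);
        }
    }
};

// Record numbers in the order defined by one index.
std::vector<std::size_t> orderBy(const std::multimap<std::string, std::size_t>& index) {
    std::vector<std::size_t> order;
    for (const auto& entry : index) order.push_back(entry.second);
    return order;
}
```

For a file holding Smith (Sales), Adams (IT) and Brown (Admin), the name index orders the records Adams, Brown, Smith, while the department index orders them Admin, IT, Sales.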
30. Application
• Indexed sequential files are used when it is
necessary to use
both indexed and sequential access. A
company might store an employee file as an
indexed sequential file, because...
• Sometimes only one record needs to be
accessed ...
... an employee changes their address...
... use indexed access.
• Sometimes all records need to be accessed...
... the end-of-month payroll is run...
... use sequential access.
31. Comparison
| Sequential File | Indexed File | Relative/Random File |
|---|---|---|
| Data is entered in entry-sequential order | Data is entered in key-sequential order | Data is entered by relative record number (RRN) |
| Duplicate data is allowed | Duplicate data is not allowed | Duplicate data is not allowed |
| Data is in sorted order | Data is in sorted order based on key | Data is in sorted order based on RRN |
| Delete is not applicable | Delete is applicable | Delete is applicable |
| Access is slow | Access is faster | Access is faster than indexed files |
| Key not available | Key is available. Key is user defined. It is part of the record. | Key is available. Key is system defined. It is outside the record. |
| Data is stored on tape/disk | Data is stored on disk only | Data is stored on disk only |
| Frequently used | Rarely used | Least used |