File organization 1

File Organization & Indexing
DBMS stores data on hard disks
• This means that data needs to be
– read from the hard disk into memory (RAM)
– written from memory onto the hard disk
• Because disk I/O operations are slow, query performance depends on how data is stored on the hard disk
• The lowest-level component of the DBMS performs storage management activities
• Other DBMS components need not know how these low-level activities are performed
Basics of Data Storage on Hard Disk
• A disk is organized into a number of blocks or pages
• A page is the unit of exchange between the disk and main memory
• A collection of pages is known as a file
• The DBMS stores data in one or more files on the hard disk
File Organization
• The physical arrangement of the data in a file into records and pages on the disk
• File organization determines the set of access methods for
– storing and retrieving records from a file
• We study three types of file organization
– Unordered or heap files
– Ordered or sequential files
– Hash files
• We examine each of them in terms of the operations we perform on the database
– Insert a new record
– Search for a record (or update a record)
– Delete a record
Organization of Records in Files
• Heap – a record can be placed anywhere in the file where there is space
• Sequential – records are stored in sequential order, based on the value of the search key of each record
• Hashing – a hash function is computed on some attribute of each record; the result determines the block of the file in which the record is placed
The term "hash" indicates splitting the key into pieces.
Records of each relation may be stored in a separate file.
Unordered or Heap File
• Records are stored in the order in which they are created
• Insert operation
– Fast – because the incoming record is written at the end of the last page of the file
• Search (or update) operation
– Slow – because a linear search is performed over the pages
• Delete operation
– Slow – because the record to be deleted must first be found
– Deleting the record leaves a hole in the page
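The heap-file behaviour described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how a real DBMS lays out pages: the class name `HeapFile`, the constant `PAGE_CAPACITY`, the `id` field, and the use of `None` to mark a hole are all assumptions made for the example.

```python
PAGE_CAPACITY = 3  # assumed number of record slots per page


class HeapFile:
    """Toy heap file: a list of pages, each page a list of record slots."""

    def __init__(self):
        self.pages = [[]]

    def insert(self, record):
        # Fast: append to the last page, allocating a new page when it is full.
        if len(self.pages[-1]) >= PAGE_CAPACITY:
            self.pages.append([])
        self.pages[-1].append(record)

    def search(self, key):
        # Slow: linear scan over the pages; returns (record, pages_read).
        for pages_read, page in enumerate(self.pages, start=1):
            for record in page:
                if record is not None and record["id"] == key:
                    return record, pages_read
        return None, len(self.pages)

    def delete(self, key):
        # Slow: the record must be found first; deletion leaves a hole (None).
        for page in self.pages:
            for slot, record in enumerate(page):
                if record is not None and record["id"] == key:
                    page[slot] = None
                    return True
        return False


heap = HeapFile()
for i in [7, 3, 9, 1, 5]:
    heap.insert({"id": i})
```

With five records and three slots per page, the record with `id` 5 sits on the second page, so finding it means reading both pages; deleting the record with `id` 3 leaves a `None` hole in the middle of the first page.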
Ordered or Sequential File
• Records are sorted on the values of one or more fields
– Ordering field – the field on which the records are sorted
• Search (or update) operation
– Fast – because binary search can be performed on the sorted records (when searching on the ordering field)
• Delete operation
– Fast – because locating the record is fast
• Insert operation
– Poor – because inserting the new record in its correct position means shifting, on average, half of the records in the file to make room
– Alternatively, an "overflow file" is created, which holds all the new records as a heap
– Periodically, the overflow file is merged with the main file
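The overflow-file approach above can be sketched as follows. This is an illustrative in-memory model under stated assumptions: `OrderedFile` and its method names are hypothetical, records are reduced to bare keys, and Python lists stand in for the main and overflow files.

```python
import bisect


class OrderedFile:
    """Toy ordered file: a sorted main list plus an unsorted overflow list."""

    def __init__(self, keys):
        self.main = sorted(keys)
        self.overflow = []

    def insert(self, key):
        # Cheap: new records go to the overflow heap instead of shifting main.
        self.overflow.append(key)

    def search(self, key):
        # Binary search on the sorted main file first.
        i = bisect.bisect_left(self.main, key)
        if i < len(self.main) and self.main[i] == key:
            return True
        # Then a linear scan of the overflow file.
        return key in self.overflow

    def merge(self):
        # Periodic reorganization: fold the overflow records into the main file.
        self.main = sorted(self.main + self.overflow)
        self.overflow = []


f = OrderedFile([10, 20, 30, 40])
f.insert(25)
```

Until `merge()` is called, key 25 is found only by the linear scan of the overflow list; after the merge it is found by binary search like every other key.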
Sequential Access vs. Random Access
• Sequential access means that a group of elements is accessed in a predetermined, ordered sequence
• Random access means that any element can be accessed directly, in any order; the pieces of a random-access file may be split up and stored wherever space is available
• Reading a file sequentially is usually faster; random access may take longer because each access can require a separate disk seek
Hash File
• Is an array of buckets
– Given a record k, a hash function h(k) computes the index of the bucket to which record k belongs
– h uses one or more fields of the record, called hash fields
– Hash key – the key of the file when it is used by the hash function
– A common choice is h(K) = K mod M, where M is the number of buckets
• Example hash function
– Assume that the staff last name is used as the hash field
– Assume also that the hash file has 26 buckets, one for each letter of the alphabet
– Then a hash function can be defined that computes the bucket address (index) from the first letter of the last name
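Both hash functions mentioned above can be written out directly. The sketch below is only an illustration: the function names `h_mod` and `h_first_letter` and the sample last names are assumptions, and the letter-based function assumes every name starts with an ASCII letter.

```python
M = 26  # number of buckets, one per letter of the alphabet


def h_mod(key: int, m: int) -> int:
    """h(K) = K mod M for an integer hash key."""
    return key % m


def h_first_letter(last_name: str) -> int:
    """Bucket index from the first letter of the last name: A is 0, Z is 25."""
    return ord(last_name[0].upper()) - ord("A")


# Place a few sample names (made up for the example) into their buckets.
buckets = [[] for _ in range(M)]
for name in ["Smith", "Stone", "Adams", "Zhou"]:
    buckets[h_first_letter(name)].append(name)
```

Here "Smith" and "Stone" land in the same bucket. Because last names are not spread evenly over the alphabet, a first-letter hash function tends to fill some buckets much faster than others.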
A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
The hash function is used to locate records for access, insertion, and deletion.
Hashing is an effective technique for computing the location of a data record on disk directly, without using an index structure.
Hash File
• Insert operation
– Fast – because the hash function computes the index of the bucket to which the record belongs
– If that bucket is full, the record goes into the next free bucket
• Search operation
– Fast – because the hash function computes the index of the bucket
• Delete operation
– Fast – again because the hash function locates the record quickly
Internal Hashing:
• Open addressing
– Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
• Chaining
– Various overflow locations are kept, usually by extending the array with a number of overflow positions.
– A pointer field is added to each record location.
• Multiple hashing
– The program applies a second hash function if the first results in a collision.
External Hashing:
– Hashing for disk files is called external hashing.
– The goal of a good hash function is to distribute the records uniformly over the address space so as to minimize collisions.
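The first two collision-resolution strategies for internal hashing can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, keys are plain integers hashed with `key % len(table)`, and in the chaining version a Python list per slot stands in for the overflow positions and pointer fields.

```python
def insert_open_addressing(table, key):
    """Linear probing: try h(key), then the following slots, wrapping around."""
    m = len(table)
    start = key % m
    for step in range(m):
        pos = (start + step) % m
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")


def insert_chaining(table, key):
    """Chaining: each slot keeps a list of all keys that hash to it."""
    pos = key % len(table)
    table[pos].append(key)
    return pos


# Four keys that all hash to slot 3 in a table of size 7.
probe_table = [None] * 7
positions = [insert_open_addressing(probe_table, k) for k in (10, 17, 24, 3)]

chain_table = [[] for _ in range(7)]
for k in (10, 17, 24, 3):
    insert_chaining(chain_table, k)
```

With open addressing the four colliding keys spill into slots 3, 4, 5, and 6, so later lookups must probe several positions. With chaining they all stay attached to slot 3.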
Static Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks.
Dynamic Hashing
Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand (for example, extendable hashing).
Overflow Chaining: When a bucket is full, a new bucket is allocated for the same hash result and is linked after the previous one.
This mechanism is called closed hashing.
Linear Probing: When the hash function generates an address at which data is already stored, the next free bucket is allocated to the record.
This mechanism is called open hashing.
(Some texts use the names open hashing and closed hashing the other way round.)
Figure: Hash file organization of the account file, using branch_name as the key.
For a string search key, the binary representations of all the characters in the string could be added, and the sum modulo the number of buckets could be returned.
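The string hash function just described can be written in one line. The sketch below assumes the "binary representation" of a character is its UTF-8 byte value; the function name `string_hash` and the bucket counts are illustrative choices.

```python
def string_hash(search_key: str, num_buckets: int) -> int:
    """Add up the byte values of the key and reduce modulo the bucket count."""
    return sum(search_key.encode("utf-8")) % num_buckets


bucket = string_hash("Perryridge", 10)
```

Because addition ignores order, any two keys made of the same characters (for example "abc" and "cba") always hash to the same bucket, which is one reason this simple function is mainly used for teaching.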
Use of Extendable Hash Structure: Example
Figure: Initial hash structure, bucket size = 2.
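Since the figures for this example are not reproduced here, the following sketch shows one way an extendable hash structure with bucket size 2 can grow. It is an illustration under several assumptions, not the structure from the slides: the class and method names are hypothetical, keys are distinct non-negative integers used directly as hash values, and the directory is indexed by the low-order bits of the key (textbook presentations often use the high-order bits instead).

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendableHash:
    def __init__(self, bucket_size=2):
        self.bucket_size = bucket_size
        self.global_depth = 0
        self.directory = [Bucket(0)]

    def _index(self, key):
        # Use the low-order global_depth bits of the key as the directory index.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.bucket_size:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Double the directory; both halves point at the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the full bucket on one more bit.
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the old keys plus the new one (may split again).
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys + [key]:
            self.insert(k)


eh = ExtendableHash(bucket_size=2)
for k in range(8):
    eh.insert(k)
```

Inserting the keys 0 to 7 doubles the directory twice, ending with four directory entries and four buckets of two keys each. Only the bucket that overflows is split, so the file grows one bucket at a time rather than being rebuilt.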
Indexing
• Index file (same idea as a textbook index): an auxiliary structure designed to speed up access to desired data.
• Indexing field: the field on which the index file is defined.
• The index file stores each value of the indexing field along with pointer(s) (e.g. page numbers) to the block(s) that contain record(s) with that field value, or a pointer to the record with that field value: <Indexing Field, Pointer>
• To find a record in the data file based on a selection criterion on an indexing field, we first access the index file, which then gives access to the record in the data file.
• The index file is much smaller than the data file, so searching it is fast.
• Indexing is important for both file systems and DBMSs.
Choosing Indexing Technique
• Five factors are involved when choosing the indexing technique:
– access type
– access time
– insertion time
– deletion time
– space overhead
Two Types of Indices
• Ordered index – search-key values are kept in sorted order; used to access data sorted by order of values. An ordered index may be a primary (clustering) index or a secondary (non-clustering) index.
• Hash index – search-key values are distributed uniformly across a range of buckets by a hash function.
Types of Indexes
• Indexes on ordered vs. unordered files
• Dense vs. non-dense (i.e. sparse) indexes
– Dense: an entry in the index file for each record of the data file.
– Sparse: only some of the data records are represented in the index, often one index entry per block of the data file.
• Primary indexes vs. secondary indexes
• Ordered indexes vs. hash indexes
– Ordered indexes: indexing field values are stored in sorted order.
– Hash indexes: indexing field values are located using a hash function.
• Single-level vs. multi-level
– A single-level index is an ordered file and is searched using binary search.
– Multi-level indexes are tree-structured; they speed up the search but require a more elaborate search algorithm.
• Index on a single indexing field vs. index on multiple indexing fields (i.e. composite index)
Primary Index: 
Index built on ordering key field of a file 
Clustering Index: 
Index built on ordering non-key field of a file 
Secondary Index: 
Index built on any non-ordering field of a file
Single-Level Ordered Index: Primary Index
A primary index is an index constructed on the sorting attribute (the ordering key field) of the main file.
• The physical records are kept ordered on the primary key.
• The index is ordered, but it has only one entry for each block of the data file.
• Each index entry holds the value of the primary key field of the first record (or the last record) in a block and a pointer to that block.
Procedure:
First perform a binary search on the primary index file to find the address of the block that holds the record, then read that block.
Performance: Very fast; a binary search over an index that occupies b blocks needs about log2(b) block accesses, plus one access for the data block.
Problem: The primary index works only if the main file is kept sorted.
Solution:
New records are inserted into an unordered (heap) overflow file for the table. Periodically, the ordered and overflow files are merged; at that time the main file is sorted again and the primary index file is updated accordingly.
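The lookup procedure for a primary index can be sketched as follows. This is a small in-memory illustration: the sample records, the block size of three records, and the names `blocks`, `index_keys`, and `lookup` are all assumptions made for the example, and the overflow file is left out.

```python
import bisect

# Data file sorted on the primary key, stored as blocks (assumed 3 records each).
blocks = [
    [(101, "Ann"), (105, "Bob"), (110, "Cy")],
    [(120, "Di"), (125, "Ed"), (130, "Flo")],
    [(140, "Gus"), (150, "Hal")],
]

# Primary index: one <first key in block, block number> entry per block.
# The list position of each anchor key doubles as the block number.
index_keys = [block[0][0] for block in blocks]


def lookup(key):
    """Binary-search the index, then scan the single block it points to."""
    block_no = bisect.bisect_right(index_keys, key) - 1
    if block_no < 0:
        return None  # key is smaller than every anchor key
    for k, value in blocks[block_no]:
        if k == key:
            return value
    return None
```

The index holds three entries for eight records, and a lookup touches exactly one data block regardless of how many blocks the file has.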
Dense and Sparse Indices
There are two types of ordered indices:
Dense Index:
• An index record appears for every search-key value in the file.
• This record contains the search-key value and a pointer to the actual record.
Sparse Index:
• Index records are created for only some of the records.
• To locate a record, we find the index record with the largest search-key value that is less than or equal to the value we are looking for. We start at the record pointed to by that index record and proceed along the pointers in the file (that is, sequentially) until we find the desired record.
Figures 1 and 2 show dense and sparse indices for the deposit file.
• Notice how we would find records for the Perryridge branch using both methods.
Figure 1: Dense index.
Figure 2: Sparse index.
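As the figures are not reproduced here, the two lookup methods can be compared with a short sketch. The branch names below are sample data assumed for illustration, list positions stand in for record pointers, and the choice to index every second record in the sparse index is arbitrary.

```python
import bisect

# Sample file sorted on the search key; list positions stand in for pointers.
records = ["Brighton", "Downtown", "Mianus", "Perryridge", "Redwood", "Round Hill"]

dense_index = [(key, pos) for pos, key in enumerate(records)]
sparse_index = [(records[pos], pos) for pos in range(0, len(records), 2)]


def dense_lookup(key):
    keys = [k for k, _ in dense_index]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return dense_index[i][1]  # pointer straight to the record
    return None


def sparse_lookup(key):
    keys = [k for k, _ in sparse_index]
    i = bisect.bisect_right(keys, key) - 1  # largest indexed key <= key
    if i < 0:
        return None
    pos = sparse_index[i][1]
    # Follow the file sequentially from the indexed record.
    while pos < len(records) and records[pos] <= key:
        if records[pos] == key:
            return pos
        pos += 1
    return None
```

For "Perryridge" the dense index points straight at the record, while the sparse index leads to "Mianus" and one sequential step finds the record. The sparse index needs half as many entries here, at the cost of that extra scan.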
Index Choice
• A dense index requires more space overhead and more memory.
• Data can generally be located in a shorter time using a dense index.
• A dense index is preferable when the index is a secondary index, or when the index file is small compared to the size of memory.
Single-Level Ordered Index: Clustering Index
• Records are physically ordered by a non-key field (the clustering field)
• Same general structure as an ordered-file index
– <Clustering field, Block pointer>
• One entry in the index for each distinct value of the clustering field, with a pointer to the first block in the data file that has a record with that value for its clustering field
– Possibly many records for one index entry (non-dense)
• Sometimes entire blocks are reserved for each distinct clustering field value
Secondary Indexes
• A secondary index must contain pointers to all the records.
• When the search key is not unique, an index entry does not point directly into the file but to a bucket that contains pointers to the records in the file.
• Secondary indices must be dense, with an index entry for every search-key value and a pointer to every record in the file.
• Secondary indices improve the performance of queries on non-primary keys.
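The bucket-of-pointers idea can be sketched as follows. The account numbers and balances are sample data made up for the example, list positions stand in for record pointers, and a Python dictionary stands in for the ordered index file; `find_by_balance` is a hypothetical helper name.

```python
from collections import defaultdict

# Data file ordered on account number; balance is a non-ordering field.
data_file = [
    ("A-101", 500), ("A-110", 600), ("A-215", 700),
    ("A-217", 750), ("A-218", 700), ("A-222", 700),
]

# Dense secondary index on balance: each entry leads to a bucket holding
# pointers (here: list positions) to every record with that balance.
buckets = defaultdict(list)
for position, (_, balance) in enumerate(data_file):
    buckets[balance].append(position)


def find_by_balance(balance):
    """Return the account numbers of all records with the given balance."""
    pointers = buckets.get(balance, [])
    return [data_file[p][0] for p in pointers]
```

The three accounts with balance 700 are scattered through the file, so the index entry for 700 has to carry a pointer to each of them; a sparse index would miss records here because the file is not sorted on balance.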
Choosing Multi-Level Index
• In some cases an index may be too large for efficient processing.
• In that case, use multi-level indexing.
• In multi-level indexing, the primary index is treated as a sequential file and a sparse index is created on it.
• The outer index is a sparse index of the primary index, whereas the inner index is the primary index itself.
Multi-Level Index
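A two-level index can be sketched as below. This is an illustration under stated assumptions: the anchor keys are synthetic multiples of 10, `FANOUT` (index entries per index block) is an arbitrary choice, and `find_data_block` is a hypothetical helper name.

```python
import bisect

FANOUT = 4  # assumed number of index entries per index block

# Inner (primary) index: sorted anchor keys, one per data block.
inner_index = list(range(0, 160, 10))  # 0, 10, ..., 150
# Split the inner index into blocks and build a sparse outer index on them.
inner_blocks = [inner_index[i:i + FANOUT] for i in range(0, len(inner_index), FANOUT)]
outer_index = [block[0] for block in inner_blocks]


def find_data_block(key):
    """Return the number of the data block that may hold `key`."""
    # Step 1: search the small outer index to pick one inner index block.
    outer_pos = bisect.bisect_right(outer_index, key) - 1
    if outer_pos < 0:
        return None  # key is smaller than every anchor key
    # Step 2: search only that inner index block.
    inner_block = inner_blocks[outer_pos]
    inner_pos = bisect.bisect_right(inner_block, key) - 1
    return outer_pos * FANOUT + inner_pos
```

A search for key 57 reads the four-entry outer index, then one inner index block, and arrives at data block 5 (anchor key 50) without touching the other inner index blocks.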
B-Tree Index
• The B-tree is the most commonly used data structure for indexing.
• It is fully dynamic; that is, it can grow and shrink.
Three Types of B-Tree Nodes
• Root node – contains node pointers to branch nodes.
• Branch node – contains pointers to leaf nodes or other branch nodes.
• Leaf node – contains index items and horizontal pointers to other leaf nodes.
Figure: Full B-tree structure.
Dynamic Multilevel Indexes
– Retain the benefits of multilevel indexing while reducing the cost of index insertion and deletion
– Dynamic multilevel indexes are implemented as B-trees and often as B+-trees
• B-tree:
– Allows an indexing field value to appear only once, at some level in the tree
– A pointer to the data is stored with each value, at whichever node holds it
• B+-tree:
– Pointers to data are stored only at the leaf nodes of the tree
– Leaf nodes have an entry for every indexing field value
– The leaf nodes are usually linked together to provide ordered access on the indexing field to the records
– All the leaf nodes of the tree are at the same depth, so retrieving any record requires the same number of node accesses
In a B-tree, search keys and data pointers are stored in both internal and leaf nodes; in a B+-tree, data is stored only in the leaf nodes.
Searching in a B+-tree is simple because all data is found in the leaf nodes, whereas in a B-tree the data may be found in a leaf or a non-leaf node.
Deleting an entry from a non-leaf node of a B-tree is complicated; in a B+-tree the data always sits in a leaf node, so deletion is easier.
Insertion into a B-tree is more complicated than insertion into a B+-tree.
A B+-tree stores some search keys redundantly (in internal nodes and again in the leaves), whereas a B-tree has no redundant values.
In a B+-tree the leaf nodes are linked in a sequential list; the leaf nodes of a B-tree are not linked in this way.
Many database system implementers prefer the structural simplicity of a B+-tree.
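The B+-tree properties listed above (data only in the leaves, linked leaves, repeated separator keys) can be illustrated with a small hand-built tree. This is a sketch of searching only, under stated assumptions: the class names, the string "records", and the tiny node sizes are made up for the example, and insertion, deletion, and node splitting are not shown.

```python
import bisect


class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None


class Internal:
    def __init__(self, keys, children):
        # children[0] holds keys below keys[0]; children[i] holds keys from
        # keys[i - 1] up to (but not including) keys[i]; the last child holds the rest.
        self.keys, self.children = keys, children


def find_leaf(node, key):
    # Descend from the root; internal nodes only guide the search.
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    i = bisect.bisect_left(leaf.keys, key)
    return leaf.values[i] if i < len(leaf.keys) and leaf.keys[i] == key else None


def range_scan(root, low, high):
    """Yield values with low <= key <= high by walking the linked leaves."""
    leaf = find_leaf(root, low)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return
            if k >= low:
                yield v
        leaf = leaf.next


# Hand-built tree: separator keys 20 and 40 are repeated in the leaves.
l1 = Leaf([5, 10], ["r5", "r10"])
l2 = Leaf([20, 30], ["r20", "r30"])
l3 = Leaf([40, 50], ["r40", "r50"])
l1.next, l2.next = l2, l3
root = Internal([20, 40], [l1, l2, l3])
```

Every search ends in a leaf after the same number of steps, and a range query such as `range_scan(root, 10, 40)` descends once and then follows the leaf links instead of returning to the root.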
Figures: B+-tree and B-tree.