SlideShare a Scribd company logo
1 of 11
Indexing
• A database index is a data structure that
improves the speed of data retrieval
operations on a database table at the cost of
additional writes and the use of more storage
space to maintain the extra copy of data.
Indexes are used to quickly locate data
without having to search every row in a
database table every time a database table is
accessed
• Indexing is a way of sorting a number of records on multiple fields.
Creating an index on a field in a table creates another data structure which
holds the field value, and pointer to the record it relates to. This index
structure is then sorted, allowing Binary Searches to be performed on it.
Why is Indexing needed?
• When data is stored on disk based storage devices, it is stored as blocks of
data. These blocks are accessed in their entirety, making them the atomic
disk access operation. Disk blocks are structured in much the same way as
linked lists; both contain a section for data, a pointer to the location of the
next node (or block), and both need not be stored contiguously.
• Due to the fact that a number of records can only be sorted on one field,
we can state that searching on a field that isn’t sorted requires a Linear
Search which requires N/2 block accesses (on average), where N is the
number of blocks that the table spans. If that field is a non-key field (i.e.
doesn’t contain unique entries) then the entire table space must be
searched at N block accesses.
• Whereas with a sorted field, a Binary Search may be used, this has log2 N
block accesses. Also since the data is sorted given a non-key field, the rest
of the table doesn’t need to be searched for duplicate values, once a
higher value is found. Thus the performance increase is substantial.
Creating index
• String query = "CREATE UNIQUE INDEX
CustNum " +"ON CustomerAddress
(CustomerNumber)";
• DataRequest = Database.createStatement();
• DataRequest.execute (query);
• DataRequest.close();
How does it work?
• Firstly, let’s outline a sample database table
schema;
Field name
Data type
id (Primary key) Unsigned
firstName
Char(50)
lastName
Char(50)
emailAddress
Char(100)

Size on disk
INT 4 bytes
50 bytes
50 bytes
100 bytes
• Note: char was used in place of varchar to allow for an accurate size on
disk value. This sample database contains five million rows, and is
unindexed. The performance of several queries will now be analyzed.
These are a query using the id (a sorted key field) and one using the
firstName (a non-key unsorted field).
•
•

•

•
•

Example 1
Given our sample database of r = 5,000,000 records of a fixed size giving a record length
of R = 204 bytes and they are stored in a table using the MyISAM engine which is using
the default block size B = 1,024 bytes. The blocking factor of the table would be bfr =
(B/R) = 1024/204 = 5 records per disk block. The total number of blocks required to
hold the table is N = (r/bfr) = 5000000/5 = 1,000,000 blocks.
A linear search on the id field would require an average of N/2 = 500,000 block accesses
to find a value given that the id field is a key field. But since the id field is also sorted a
binary search can be conducted requiring an average of log2 1000000 = 19.93 = 20
block accesses. Instantly we can see this is a drastic improvement.
Now the firstName field is neither sorted, so a binary search is impossible, nor are the
values unique, and thus the table will require searching to the end for an exact N =
1,000,000 block accesses. It is this situation that indexing aims to correct.
Given that an index record contains only the indexed field and a pointer to the original
record, it stands to reason that it will be smaller than the multi-field record that it
points to. So the index itself requires fewer disk blocks that the original table, which
therefore requires fewer block accesses to iterate through. The schema for an index on
the firstName field is outlined below;
• Field name
Data type Size on disk
• firstName
Char(50) 50 bytes
• (record pointer) Special 4 bytes
• Note: Pointers in MySQL are 2, 3, 4 or 5 bytes in length depending on the
size of the table.
• Example 2
• Given our sample database of r = 5,000,000 records with an
index record length of R = 54 bytes and using the default
block size B = 1,024 bytes. The blocking factor of the index
would be bfr = (B/R) = 1024/54 = 18 records per disk block.
The total number of blocks required to hold the table is N =
(r/bfr) = 5000000/18 = 277,778 blocks.
• Now a search using the firstName field can utilise the index
to increase performance. This allows for a binary search of
the index with an average of log2 277778 = 18.08 = 19
block accesses. To find the address of the actual record,
which requires a further block access to read, bringing the
total to 19 + 1 = 20 block accesses, a far cry from the
277,778 block accesses required by the non-indexed table.
When should it be used?
• Given that creating an index requires additional disk space (277,778
blocks extra from the above example), and that too many indexes
can cause issues arising from the file systems size limits, careful
thought must be used to select the correct fields to index.
• Since indexes are only used to speed up the searching for a
matching field within the records, it stands to reason that indexing
fields used only for output would be simply a waste of disk space
and processing time when doing an insert or delete operation, and
thus should be avoided. Also given the nature of a binary search,
the cardinality or uniqueness of the data is important. Indexing on a
field with a cardinality of 2 would split the data in half, whereas a
cardinality of 1,000 would return approximately 1,000 records. With
such a low cardinality the effectiveness is reduced to a linear sort,
and the query optimizer will avoid using the index if the cardinality
is less than 30% of the record number, effectively making the index
a waste of space.

More Related Content

What's hot

Data Structures : hashing (1)
Data Structures : hashing (1)Data Structures : hashing (1)
Data Structures : hashing (1)Home
 
Hashing In Data Structure
Hashing In Data Structure Hashing In Data Structure
Hashing In Data Structure Meghaj Mallick
 
Double Linked List (Algorithm)
Double Linked List (Algorithm)Double Linked List (Algorithm)
Double Linked List (Algorithm)Huba Akhtar
 
MySQL: Indexing for Better Performance
MySQL: Indexing for Better PerformanceMySQL: Indexing for Better Performance
MySQL: Indexing for Better Performancejkeriaki
 
SQL Joins.pptx
SQL Joins.pptxSQL Joins.pptx
SQL Joins.pptxAnkit Rai
 
SQL Joins and Query Optimization
SQL Joins and Query OptimizationSQL Joins and Query Optimization
SQL Joins and Query OptimizationBrian Gallagher
 
Polynomial reppresentation using Linkedlist-Application of LL.pptx
Polynomial reppresentation using Linkedlist-Application of LL.pptxPolynomial reppresentation using Linkedlist-Application of LL.pptx
Polynomial reppresentation using Linkedlist-Application of LL.pptxAlbin562191
 
3.3 shell sort
3.3 shell sort3.3 shell sort
3.3 shell sortKrish_ver2
 
Selection sort and insertion sort
Selection sort and insertion sortSelection sort and insertion sort
Selection sort and insertion sortMay Ann Mendoza
 
Conditional formatting - Excel
Conditional formatting - ExcelConditional formatting - Excel
Conditional formatting - ExcelYi Chiao Cheng
 
Introduction of sql server indexing
Introduction of sql server indexingIntroduction of sql server indexing
Introduction of sql server indexingMahabubur Rahaman
 
Java Linked List Tutorial | Edureka
Java Linked List Tutorial |  EdurekaJava Linked List Tutorial |  Edureka
Java Linked List Tutorial | EdurekaEdureka!
 

What's hot (20)

SQL Views
SQL ViewsSQL Views
SQL Views
 
Joins in SQL
Joins in SQLJoins in SQL
Joins in SQL
 
Lec 1 indexing and hashing
Lec 1 indexing and hashing Lec 1 indexing and hashing
Lec 1 indexing and hashing
 
Data Structures : hashing (1)
Data Structures : hashing (1)Data Structures : hashing (1)
Data Structures : hashing (1)
 
View & index in SQL
View & index in SQLView & index in SQL
View & index in SQL
 
Hashing In Data Structure
Hashing In Data Structure Hashing In Data Structure
Hashing In Data Structure
 
Double Linked List (Algorithm)
Double Linked List (Algorithm)Double Linked List (Algorithm)
Double Linked List (Algorithm)
 
MySQL: Indexing for Better Performance
MySQL: Indexing for Better PerformanceMySQL: Indexing for Better Performance
MySQL: Indexing for Better Performance
 
SQL Joins.pptx
SQL Joins.pptxSQL Joins.pptx
SQL Joins.pptx
 
SQL Joins and Query Optimization
SQL Joins and Query OptimizationSQL Joins and Query Optimization
SQL Joins and Query Optimization
 
DBMS Keys
DBMS KeysDBMS Keys
DBMS Keys
 
Database Keys
Database KeysDatabase Keys
Database Keys
 
Hashing
HashingHashing
Hashing
 
Polynomial reppresentation using Linkedlist-Application of LL.pptx
Polynomial reppresentation using Linkedlist-Application of LL.pptxPolynomial reppresentation using Linkedlist-Application of LL.pptx
Polynomial reppresentation using Linkedlist-Application of LL.pptx
 
3.3 shell sort
3.3 shell sort3.3 shell sort
3.3 shell sort
 
Indexing Data Structure
Indexing Data StructureIndexing Data Structure
Indexing Data Structure
 
Selection sort and insertion sort
Selection sort and insertion sortSelection sort and insertion sort
Selection sort and insertion sort
 
Conditional formatting - Excel
Conditional formatting - ExcelConditional formatting - Excel
Conditional formatting - Excel
 
Introduction of sql server indexing
Introduction of sql server indexingIntroduction of sql server indexing
Introduction of sql server indexing
 
Java Linked List Tutorial | Edureka
Java Linked List Tutorial |  EdurekaJava Linked List Tutorial |  Edureka
Java Linked List Tutorial | Edureka
 

Viewers also liked

5 data structures-hashtable
5 data structures-hashtable5 data structures-hashtable
5 data structures-hashtableirdginfo
 
Overview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented DatabaseOverview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented DatabaseEditor IJMTER
 
Database indexing framework
Database indexing frameworkDatabase indexing framework
Database indexing frameworkNitin Pande
 
Face Images Database Indexing for Person Identification Problem
Face Images Database Indexing for Person Identification ProblemFace Images Database Indexing for Person Identification Problem
Face Images Database Indexing for Person Identification ProblemCSCJournals
 
Database indexing techniques
Database indexing techniquesDatabase indexing techniques
Database indexing techniquesahmadmughal0312
 
Data indexing presentation
Data indexing presentationData indexing presentation
Data indexing presentationgmbmanikandan
 
Indexing and-hashing
Indexing and-hashingIndexing and-hashing
Indexing and-hashingAmi Ranjit
 
BTree, Data Structures
BTree, Data StructuresBTree, Data Structures
BTree, Data StructuresJibrael Jos
 
B trees in Data Structure
B trees in Data StructureB trees in Data Structure
B trees in Data StructureAnuj Modi
 

Viewers also liked (11)

5 data structures-hashtable
5 data structures-hashtable5 data structures-hashtable
5 data structures-hashtable
 
Overview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented DatabaseOverview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented Database
 
Mysql Indexing
Mysql IndexingMysql Indexing
Mysql Indexing
 
B tree short
B tree shortB tree short
B tree short
 
Database indexing framework
Database indexing frameworkDatabase indexing framework
Database indexing framework
 
Face Images Database Indexing for Person Identification Problem
Face Images Database Indexing for Person Identification ProblemFace Images Database Indexing for Person Identification Problem
Face Images Database Indexing for Person Identification Problem
 
Database indexing techniques
Database indexing techniquesDatabase indexing techniques
Database indexing techniques
 
Data indexing presentation
Data indexing presentationData indexing presentation
Data indexing presentation
 
Indexing and-hashing
Indexing and-hashingIndexing and-hashing
Indexing and-hashing
 
BTree, Data Structures
BTree, Data StructuresBTree, Data Structures
BTree, Data Structures
 
B trees in Data Structure
B trees in Data StructureB trees in Data Structure
B trees in Data Structure
 

Similar to Indexing

Database Performance
Database PerformanceDatabase Performance
Database PerformanceBoris Hristov
 
Юра Гуляев. Oracle tables
Юра Гуляев. Oracle tablesЮра Гуляев. Oracle tables
Юра Гуляев. Oracle tablesAleksandr Motsjonov
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL IndexingBADR
 
Unit 4 data storage and querying
Unit 4   data storage and queryingUnit 4   data storage and querying
Unit 4 data storage and queryingRavindran Kannan
 
Data Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptxData Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptxTusharAgarwal49094
 
CS 2212- UNIT -4.pptx
CS 2212-  UNIT -4.pptxCS 2212-  UNIT -4.pptx
CS 2212- UNIT -4.pptxLilyMkayula
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon RedshiftKel Graham
 
Sql query performance analysis
Sql query performance analysisSql query performance analysis
Sql query performance analysisRiteshkiit
 
Data Indexing Presentation-My.pptppt.ppt
Data Indexing Presentation-My.pptppt.pptData Indexing Presentation-My.pptppt.ppt
Data Indexing Presentation-My.pptppt.pptsdsm2
 
File organization 1
File organization 1File organization 1
File organization 1Rupali Rana
 
Memory Management Strategies - III.pdf
Memory Management Strategies - III.pdfMemory Management Strategies - III.pdf
Memory Management Strategies - III.pdfHarika Pudugosula
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Web Services
 

Similar to Indexing (20)

Indexing
IndexingIndexing
Indexing
 
Database Sizing
Database SizingDatabase Sizing
Database Sizing
 
Tunning overview
Tunning overviewTunning overview
Tunning overview
 
Database Performance
Database PerformanceDatabase Performance
Database Performance
 
Юра Гуляев. Oracle tables
Юра Гуляев. Oracle tablesЮра Гуляев. Oracle tables
Юра Гуляев. Oracle tables
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL Indexing
 
Unit 4 data storage and querying
Unit 4   data storage and queryingUnit 4   data storage and querying
Unit 4 data storage and querying
 
Data Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptxData Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptx
 
1650607.ppt
1650607.ppt1650607.ppt
1650607.ppt
 
CS 2212- UNIT -4.pptx
CS 2212-  UNIT -4.pptxCS 2212-  UNIT -4.pptx
CS 2212- UNIT -4.pptx
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 
Infos2014
Infos2014Infos2014
Infos2014
 
Sql query performance analysis
Sql query performance analysisSql query performance analysis
Sql query performance analysis
 
Data Indexing Presentation-My.pptppt.ppt
Data Indexing Presentation-My.pptppt.pptData Indexing Presentation-My.pptppt.ppt
Data Indexing Presentation-My.pptppt.ppt
 
File organization 1
File organization 1File organization 1
File organization 1
 
Memory Management Strategies - III.pdf
Memory Management Strategies - III.pdfMemory Management Strategies - III.pdf
Memory Management Strategies - III.pdf
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
 
Unit08 dbms
Unit08 dbmsUnit08 dbms
Unit08 dbms
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 

More from myrajendra (20)

Fundamentals
FundamentalsFundamentals
Fundamentals
 
Data type
Data typeData type
Data type
 
Hibernate example1
Hibernate example1Hibernate example1
Hibernate example1
 
Jdbc workflow
Jdbc workflowJdbc workflow
Jdbc workflow
 
2 jdbc drivers
2 jdbc drivers2 jdbc drivers
2 jdbc drivers
 
3 jdbc api
3 jdbc api3 jdbc api
3 jdbc api
 
4 jdbc step1
4 jdbc step14 jdbc step1
4 jdbc step1
 
Dao example
Dao exampleDao example
Dao example
 
Sessionex1
Sessionex1Sessionex1
Sessionex1
 
Internal
InternalInternal
Internal
 
3. elements
3. elements3. elements
3. elements
 
2. attributes
2. attributes2. attributes
2. attributes
 
1 introduction to html
1 introduction to html1 introduction to html
1 introduction to html
 
Headings
HeadingsHeadings
Headings
 
Forms
FormsForms
Forms
 
Css
CssCss
Css
 
Views
ViewsViews
Views
 
Views
ViewsViews
Views
 
Views
ViewsViews
Views
 
Starting jdbc
Starting jdbcStarting jdbc
Starting jdbc
 

Recently uploaded

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Indexing

  • 2. • A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and the use of more storage space to maintain the extra copy of data. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed
  • 3. • Indexing is a way of sorting a number of records on multiple fields. Creating an index on a field in a table creates another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing Binary Searches to be performed on it.
  • 4. Why is Indexing needed? • When data is stored on disk based storage devices, it is stored as blocks of data. These blocks are accessed in their entirety, making them the atomic disk access operation. Disk blocks are structured in much the same way as linked lists; both contain a section for data, a pointer to the location of the next node (or block), and both need not be stored contiguously. • Due to the fact that a number of records can only be sorted on one field, we can state that searching on a field that isn’t sorted requires a Linear Search which requires N/2 block accesses (on average), where N is the number of blocks that the table spans. If that field is a non-key field (i.e. doesn’t contain unique entries) then the entire table space must be searched at N block accesses. • Whereas with a sorted field, a Binary Search may be used, this has log2 N block accesses. Also since the data is sorted given a non-key field, the rest of the table doesn’t need to be searched for duplicate values, once a higher value is found. Thus the performance increase is substantial.
  • 5. Creating index • String query = "CREATE UNIQUE INDEX CustNum " +"ON CustomerAddress (CustomerNumber)"; • DataRequest = Database.createStatement(); • DataRequest.execute (query); • DataRequest.close();
  • 6. How does it work? • Firstly, let’s outline a sample database table schema; Field name Data type id (Primary key) Unsigned firstName Char(50) lastName Char(50) emailAddress Char(100) Size on disk INT 4 bytes 50 bytes 50 bytes 100 bytes
  • 7. • Note: char was used in place of varchar to allow for an accurate size on disk value. This sample database contains five million rows, and is unindexed. The performance of several queries will now be analyzed. These are a query using the id (a sorted key field) and one using the firstName (a non-key unsorted field).
  • 8. • • • • • Example 1 Given our sample database of r = 5,000,000 records of a fixed size giving a record length of R = 204 bytes and they are stored in a table using the MyISAM engine which is using the default block size B = 1,024 bytes. The blocking factor of the table would be bfr = (B/R) = 1024/204 = 5 records per disk block. The total number of blocks required to hold the table is N = (r/bfr) = 5000000/5 = 1,000,000 blocks. A linear search on the id field would require an average of N/2 = 500,000 block accesses to find a value given that the id field is a key field. But since the id field is also sorted a binary search can be conducted requiring an average of log2 1000000 = 19.93 = 20 block accesses. Instantly we can see this is a drastic improvement. Now the firstName field is neither sorted, so a binary search is impossible, nor are the values unique, and thus the table will require searching to the end for an exact N = 1,000,000 block accesses. It is this situation that indexing aims to correct. Given that an index record contains only the indexed field and a pointer to the original record, it stands to reason that it will be smaller than the multi-field record that it points to. So the index itself requires fewer disk blocks that the original table, which therefore requires fewer block accesses to iterate through. The schema for an index on the firstName field is outlined below;
  • 9. • Field name Data type Size on disk • firstName Char(50) 50 bytes • (record pointer) Special 4 bytes • Note: Pointers in MySQL are 2, 3, 4 or 5 bytes in length depending on the size of the table.
  • 10. • Example 2 • Given our sample database of r = 5,000,000 records with an index record length of R = 54 bytes and using the default block size B = 1,024 bytes. The blocking factor of the index would be bfr = (B/R) = 1024/54 = 18 records per disk block. The total number of blocks required to hold the table is N = (r/bfr) = 5000000/18 = 277,778 blocks. • Now a search using the firstName field can utilise the index to increase performance. This allows for a binary search of the index with an average of log2 277778 = 18.08 = 19 block accesses. To find the address of the actual record, which requires a further block access to read, bringing the total to 19 + 1 = 20 block accesses, a far cry from the 277,778 block accesses required by the non-indexed table.
  • 11. When should it be used? • Given that creating an index requires additional disk space (277,778 blocks extra from the above example), and that too many indexes can cause issues arising from the file systems size limits, careful thought must be used to select the correct fields to index. • Since indexes are only used to speed up the searching for a matching field within the records, it stands to reason that indexing fields used only for output would be simply a waste of disk space and processing time when doing an insert or delete operation, and thus should be avoided. Also given the nature of a binary search, the cardinality or uniqueness of the data is important. Indexing on a field with a cardinality of 2 would split the data in half, whereas a cardinality of 1,000 would return approximately 1,000 records. With such a low cardinality the effectiveness is reduced to a linear sort, and the query optimizer will avoid using the index if the cardinality is less than 30% of the record number, effectively making the index a waste of space.