CS437 Lecture 16-18
Lecture 16
Dr. Fawad Hussain
Primary and Secondary Indices
Multiple-Key Access
Use multiple indices for certain types of queries.
Example:
select account-number
from account
where branch-name = 'Perryridge' and balance = 1000
Possible strategies for processing the query using indices on single attributes:
1. Use the index on branch-name to find accounts with branch-name = 'Perryridge'; test balance = 1000.
2. Use the index on balance to find accounts with balances of $1000; test branch-name = 'Perryridge'.
3. Use the branch-name index to find pointers to all records pertaining to the Perryridge branch. Similarly, use the index on balance to find pointers to all records with a balance of $1000. Take the intersection of both sets of pointers obtained.
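Strategy 3 can be sketched in a few lines: each single-attribute index maps a value to a set of record pointers, and only the records in the intersection of the two sets are fetched. The sample rows, the use of list positions as pointers, and the `build_index` helper are all hypothetical.

```python
from collections import defaultdict

# Made-up sample rows: (pointer, account_number, branch_name, balance).
accounts = [
    (0, "A-101", "Downtown", 500),
    (1, "A-102", "Perryridge", 400),
    (2, "A-201", "Perryridge", 1000),
    (3, "A-215", "Mianus", 1000),
    (4, "A-217", "Brighton", 750),
    (5, "A-222", "Perryridge", 1000),
]


def build_index(rows, column):
    """Single-attribute index: value of `column` -> set of record pointers."""
    index = defaultdict(set)
    for row in rows:
        index[row[column]].add(row[0])
    return index


branch_index = build_index(accounts, 2)
balance_index = build_index(accounts, 3)

# Strategy 3: intersect the two pointer sets, then touch only the matching records.
pointers = branch_index["Perryridge"] & balance_index[1000]
result = sorted(accounts[p][1] for p in pointers)
print(result)  # ['A-201', 'A-222']
```

Strategies 1 and 2 would instead fetch every record reachable from one index (three Perryridge accounts, or three $1000 accounts here) and test the other condition on each.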
Indices on Multiple Attributes
With the where clause
where branch-name = 'Perryridge' and balance = 1000
an index on the combined search key (branch-name, balance) will fetch only records that satisfy both conditions.
Using separate indices is less efficient: we may fetch many records (or pointers) that satisfy only one of the conditions.
A combined index can also efficiently handle
where branch-name = 'Perryridge' and balance < 1000
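A minimal sketch of why the combined key handles both the equality and the range case: if index entries are kept sorted on (branch-name, balance), as in the leaf level of a B+-tree, both predicates select one contiguous slice. Here a sorted Python list searched with `bisect` stands in for the tree; the entries and function names are hypothetical.

```python
import bisect

# Hypothetical composite index: (branch_name, balance, pointer), sorted on the key.
entries = sorted([
    ("Perryridge", 400, 1), ("Perryridge", 1000, 2), ("Mianus", 1000, 3),
    ("Perryridge", 1000, 5), ("Downtown", 500, 0), ("Perryridge", 900, 6),
])


def equal_both(branch, balance):
    """Pointers with branch_name = branch AND balance = balance."""
    lo = bisect.bisect_left(entries, (branch, balance))
    hi = bisect.bisect_right(entries, (branch, balance, float("inf")))
    return [e[2] for e in entries[lo:hi]]


def below(branch, limit):
    """Pointers with branch_name = branch AND balance < limit."""
    lo = bisect.bisect_left(entries, (branch,))          # first entry for this branch
    hi = bisect.bisect_left(entries, (branch, limit))    # first entry with balance >= limit
    return [e[2] for e in entries[lo:hi]]


print(equal_both("Perryridge", 1000), below("Perryridge", 1000))  # [2, 5] [1, 6]
```

The same layout would not help a query that restricts only balance, because entries for one balance value are scattered across all branches.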
Figure: Sample account file
Figure: Hash function of branch-name
Figure: Bitmap indices on relation customer-info
What we studied
Primary Index
Secondary Index
Sparse Index vs Dense Index
Indexing Techniques
Primary versus secondary indexing.
Single index access versus scanning.
Combining multiple indexes.
Primary Index
In Teradata, the PI of a table specifies its partitioning column(s).
A PI may be defined as unique (UPI) or non-unique (NUPI).
Uniqueness is enforced automatically when a UPI is specified.
The PI provides an implicit access path to any row just by knowing its PI value.
Only one PI per table.
A PI can be on multiple columns, i.e., composite.
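The "implicit access path" can be pictured as a pure function from PI value to owning vAMP. The sketch below is only an illustration under stated assumptions: it uses an MD5-based stand-in hash and a modulo over a hypothetical vAMP count, whereas Teradata's actual row-hash algorithm and hash-map lookup are not reproduced.

```python
import hashlib

NUM_AMPS = 4  # hypothetical number of vAMPs


def row_hash(pi_value) -> int:
    """Stand-in row hash: first 4 bytes of an MD5 digest of the value's text form.
    Not Teradata's real row-hash algorithm."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(pi_value) -> int:
    """Knowing only the PI value is enough to find the vAMP that owns the row."""
    return row_hash(pi_value) % NUM_AMPS


# A composite PI is hashed as one value, here a tuple of its columns.
print(amp_for(12345), amp_for(("ACC-9", "2024-01-31")))
```

Because no lookup structure is consulted, there is nothing extra to store or build for the PI, which matches the later point that the primary index is "free".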
Primary Index
Primary index selection criteria:
Common join and retrieval key.
Distributes rows evenly across database partitions.
Fewer than 10,000 rows per PI value when non-unique.
WHY?
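The third criterion is easy to check against candidate data before choosing a NUPI. A minimal sketch, assuming the column's values are available as a Python list; the threshold is the slide's rule of thumb and the sample accounts are made up.

```python
from collections import Counter

MAX_ROWS_PER_NUPI_VALUE = 10_000  # rule of thumb from the slide


def nupi_violations(values, limit=MAX_ROWS_PER_NUPI_VALUE):
    """Return {value: row_count} for candidate NUPI values that reach `limit` rows."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n >= limit}


# Made-up column: many small retail accounts and one very large institutional one.
account_ids = [f"R{i}" for i in range(5_000)] + ["INST-1"] * 12_000
print(nupi_violations(account_ids))  # {'INST-1': 12000}
```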
Primary Index
Trick question: What should be the primary index of the transaction table for a large financial services firm?
create table tx
(tx_id decimal (15,0) NOT NULL
,account_id decimal (10,0) NOT NULL
,tx_amt decimal (15,2) NOT NULL
,tx_dt date NOT NULL
,tx_cd char (2) NOT NULL
....
) primary index (???);
Ans: It depends.
Primary Index
Almost all joins and retrievals will come in through the account_id foreign key.
Want account_id as NUPI.
If data is “lumpy” when distributed on account_id, or if accounts have very large numbers of transactions (e.g., an institutional account could easily have 10,000+ transactions), want tx_id as UPI for good data distribution.
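The trade-off can be made concrete by measuring skew: rows on the busiest vAMP divided by the average per vAMP. The sketch below assumes an MD5-based stand-in hash, a hypothetical 8-vAMP system, and made-up transactions in which one institutional account owns half of all rows.

```python
import hashlib
from collections import Counter

NUM_AMPS = 8  # hypothetical


def amp_for(value) -> int:
    """Stand-in MD5-based row hash, not Teradata's algorithm."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


def skew(pi_values) -> float:
    """Rows on the busiest vAMP divided by the average per vAMP (1.0 = perfectly even)."""
    per_amp = Counter(amp_for(v) for v in pi_values)
    average = len(pi_values) / NUM_AMPS
    return max(per_amp.values()) / average


# Made-up tx rows (tx_id, account_id): account 1 owns every even-numbered transaction.
tx = [(tx_id, 1 if tx_id % 2 == 0 else 2 + tx_id % 500) for tx_id in range(20_000)]

print(round(skew([acct for _, acct in tx]), 2))  # NUPI on account_id: lumpy
print(round(skew([tid for tid, _ in tx]), 2))    # UPI on tx_id: close to 1.0
```

With half the rows on one account, the account_id distribution is guaranteed a skew of at least 4 on 8 vAMPs, while the unique tx_id spreads rows almost evenly.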
Primary Index
Joins and access via the primary index are very efficient due to Teradata’s sophisticated row hashing algorithms, which allow going directly to the data block containing the desired row.
Single I/O operation for accessing a data row via UPI.
Single I/O operation for accessing a data row via NUPI whenever all rows with the same PI value fit into a single block.
Single-vAMP operation for indexed retrieval.
No spool space required.
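Whether a NUPI lookup stays a single I/O depends on how many blocks the rows sharing one PI value occupy. A best-case sketch, borrowing the 128-byte rows and 64K blocks assumed later in Lecture 18 and ignoring block overhead and rows straddling block boundaries:

```python
import math


def blocks_for_pi_value(rows_with_value, row_bytes=128, block_bytes=64 * 1024):
    """Best-case data blocks needed to hold all rows sharing one PI value.
    A result of 1 means a NUPI lookup of that value can be a single I/O."""
    return math.ceil(rows_with_value * row_bytes / block_bytes)


print(blocks_for_pi_value(512), blocks_for_pi_value(513), blocks_for_pi_value(10_000))  # 1 2 20
```

Under these assumptions a NUPI value with 10,000 rows already spans about 20 blocks, which is one way to read the "fewer than 10,000 rows per PI value" guideline.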
Primary Index
The primary index is free!
No storage cost.
No index build required.
This is a direct result of the underlying hash-based file system implementation.
OLTP databases use a page-based file system and therefore do not deliver this performance advantage.
Secondary Index (SI)
Any index that is NOT a PI.
SI structures are implemented using the same underlying structure as base tables (often referred to as sub-tables).
An SI may be defined as unique (USI) or non-unique (NUSI).
Uniqueness is enforced automatically when a USI is specified.
Up to thirty-two SIs per table in Teradata.
Unlike a primary index, SIs are not “free” in terms of storage.
An SI is NOT required, BUT it is desirable.
Lecture 17
Dr. Fawad Hussain
Primary and Secondary Indices - II
Recall - 1
Primary index selection criteria:
Common join and retrieval key.
Distributes rows evenly across database partitions.
Fewer than 10,000 rows per PI value when non-unique.
WHY?
Recall - 2
Secondary Index (SI)
Any index that is NOT a PI.
SI structures are implemented using the same underlying structure as base tables (often referred to as sub-tables).
An SI may be defined as unique (USI) or non-unique (NUSI).
Uniqueness is enforced automatically when a USI is specified.
Up to thirty-two SIs per table in Teradata.
Unlike a primary index, SIs are not “free” in terms of storage.
An SI is NOT required, BUT it is desirable.
Secondary Index (NUSI)
A non-unique secondary index (NUSI) is partitioned so that each index entry is co-located on the same vAMP (virtual Access Module Processor) as its corresponding row in the base table.
Each row access via a NUSI is a single-vAMP operation because the NUSI entry and the data row are co-located.
NUSI access is always performed in parallel across all vAMPs whenever it is appropriate to do so.
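Co-location means each vAMP can answer a NUSI probe from its own slice of data, so a NUSI query is a broadcast to all vAMPs with no cross-vAMP row fetches. A minimal sketch, assuming a hypothetical 4-vAMP system, made-up customers placed by `customer_id` modulo the vAMP count (a stand-in for PI hashing), a NUSI on `state_cd`, and threads standing in for parallel vAMPs:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 4  # hypothetical system size


class Amp:
    """One vAMP: its slice of the base table plus NUSI entries for those rows only."""

    def __init__(self):
        self.rows = {}                 # rowid -> row
        self.nusi = defaultdict(list)  # NUSI value -> rowids of local rows

    def insert(self, rowid, row, nusi_column):
        self.rows[rowid] = row
        self.nusi[row[nusi_column]].append(rowid)  # entry co-located with its row

    def probe(self, value):
        # Local index lookup plus local row fetches: no other vAMP is involved.
        return [self.rows[r] for r in self.nusi.get(value, [])]


amps = [Amp() for _ in range(NUM_AMPS)]

customers = [{"customer_id": i, "state_cd": "CA" if i % 3 == 0 else "MA"} for i in range(12)]
for c in customers:
    amps[c["customer_id"] % NUM_AMPS].insert(c["customer_id"], c, "state_cd")


def nusi_select(value):
    """All-vAMP NUSI access: every vAMP probes its own sub-table in parallel."""
    with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
        parts = list(pool.map(lambda amp: amp.probe(value), amps))
    return [row for part in parts for row in part]


print(sorted(r["customer_id"] for r in nusi_select("CA")))  # [0, 3, 6, 9]
```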
Secondary Index (NUSI)
Compressed ROWID index structure:
Hash on the index value to get the block location (ROWID for the sub-table).
Store the index value just once, followed by all ROWIDs in the base table corresponding to that index value.
ROWIDs are kept sorted to maximize efficiency when accessing the base table, performing updates and deletes, etc.
Additional blocks are allocated when the NUSI is non-selective and the compressed ROWID structure for the index value exceeds 64K.
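The layout above can be sketched as "value once, then a sorted ROWID list, split across blocks when it outgrows 64K". This is a simplified illustration, not Teradata's on-disk format: the 8-byte ROWID width is an assumption, and the bytes taken by the value itself are ignored.

```python
ROWID_BYTES = 8          # hypothetical ROWID width
BLOCK_BYTES = 64 * 1024  # the 64K limit from the slide


def build_nusi_entry(value, rowids):
    """Store `value` once, then all of its ROWIDs in sorted order; when one block's
    worth of ROWIDs is exceeded, spill the rest into additional blocks."""
    per_block = BLOCK_BYTES // ROWID_BYTES
    ordered = sorted(rowids)
    blocks = [ordered[i:i + per_block] for i in range(0, len(ordered), per_block)]
    return {"value": value, "blocks": blocks}


# A non-selective value: 10,000 qualifying rows need a second block.
entry = build_nusi_entry("G", reversed(range(10_000)))
print(len(entry["blocks"]), len(entry["blocks"][0]))  # 2 8192
```

Keeping the ROWIDs sorted means the base-table rows can be visited in storage order, which is the efficiency argument made on the slide.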
Secondary Index (NUSI)
Figure: NUSI access path (labels: Non-Unique Secondary Index Value, Hashing Algorithm, NUSI Sub-table, Base Table)
When to build NUSI?
Building a NUSI helps when the selectivity of the indexed column is very high (i.e., each value matches few rows).
The cost-based optimizer will determine when to access via a NUSI:
The number of rows selected by the NUSI must be less than the number of blocks in the table to justify access via the NUSI (assuming rows with the NUSI value are evenly distributed within the table).
The optimizer must also consider the cost of reading the NUSI sub-table and building the ROWID spool file.
Note that the extreme efficiency of table scanning in Teradata reduces the need for secondary indexing compared to other databases.
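The rule of thumb above reduces to a one-line comparison. This sketch is a rough approximation, not the optimizer's actual cost model: it charges one block read per selected row, adds the sub-table reads, and does not model the ROWID spool file. The second example's numbers are made up; the first uses the 800,000-row, 39,063-block figures from Lecture 18.

```python
def nusi_access_worthwhile(rows_selected, table_blocks, nusi_subtable_blocks=0):
    """With qualifying rows spread evenly, each selected row costs roughly one
    base-table block read, so NUSI access pays off only if those reads plus the
    NUSI sub-table reads stay below a full scan of `table_blocks`."""
    return rows_selected + nusi_subtable_blocks < table_blocks


print(nusi_access_worthwhile(800_000, 39_063))    # False: 4% of 20M rows
print(nusi_access_worthwhile(2_000, 39_063, 10))  # True
```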
Secondary Index (USI)
A unique secondary index (USI) is partitioned by the unique column upon which the index is built.
Row access via a USI is a two-vAMP operation.
The first I/O is initiated on the vAMP holding the USI entry.
The second I/O is initiated on the vAMP holding the data row.
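The two-step access path can be sketched with two hash placements: the base row lives where its PI hashes, while the USI entry lives where the USI column hashes and records the row's location. The MD5 stand-in hash, the 4-vAMP system, the per-vAMP dictionaries, and the sample orders are all hypothetical.

```python
import hashlib

NUM_AMPS = 4  # hypothetical


def amp_for(value):
    """Stand-in MD5-based row hash, not Teradata's algorithm."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


base = [{} for _ in range(NUM_AMPS)]  # per vAMP: rowid -> row (distributed by PI customer_id)
usi = [{} for _ in range(NUM_AMPS)]   # per vAMP: order_id -> (home vAMP, rowid)


def insert(order):
    home = amp_for(order["customer_id"])  # base row goes where its PI hashes
    rowid = len(base[home])               # simple local row id (no deletes modelled)
    base[home][rowid] = order
    usi[amp_for(order["order_id"])][order["order_id"]] = (home, rowid)  # USI entry hashed on order_id


def usi_lookup(order_id):
    """Return the row and the vAMPs touched: first the USI entry, then the data row."""
    first = amp_for(order_id)
    home, rowid = usi[first][order_id]
    return base[home][rowid], [first, home]


for oid, cust in [(1001, 7), (1002, 7), (1003, 42)]:
    insert({"order_id": oid, "customer_id": cust})

row, touched = usi_lookup(1003)
print(row["customer_id"], len(touched))  # 42 2
```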
When to build USI?
To allow data access without all-vAMP operations.
Increased efficiency for (very) high-selectivity retrievals.
Obtain co-location of the index with frequently joined tables.
When to build USI?
Example:
create table order_header
(order_id decimal(12, 0) NOT NULL
,customer_id decimal(9, 0) NOT NULL
,order_dt date NOT NULL
...
)
primary index( customer_id );
create unique index oh_order_idx (order_id) on order_header;
create table order_detail
(order_id decimal(12, 0) NOT NULL
,product_id integer NOT NULL
,extended_price_amt decimal(15,2) NOT NULL
,item_cnt integer NOT NULL
...
)
primary index( order_id );
When to build USI?
Example: How many customers ordered green socks in the last month? (Assume that the green-socks condition is quite selective.)
select count(distinct order_header.customer_id)
from order_header
,order_detail
,product
where order_header.order_id = order_detail.order_id
and order_header.order_dt > add_months(date, -1)
and order_detail.product_id = product.product_id
and product.product_subcategory_cd = 'SOCKS'
and product.color_cd = 'GREEN'
;
The order_id USI on the order_header table obviates the need for all-vAMP duplication of the spool result from the order_detail-to-product join when joining to the order_header table.
Lecture 18
Dr. Fawad Hussain
Primary and Secondary Indices - III
A Simple Query
Example: What is the average age (in years) of customers who live in California or Massachusetts, completed a graduate degree, are consultants, and have a hobby of volleyball or chess?
select avg( (days(date) - days(customer.birth_dt)) / 365.25 )
from customer
where customer.state_cd in ('CA', 'MA')
and customer.education_cd = 'G'
and customer.occupation_cd = 'CONSULTANT'
and customer.hobby_cd in ('VOLLEYBALL', 'CHESS')
;
Sample Query Structure
Assume:
20M customers.
128-byte rows.
64K data block size.
This results in 512 rows per block (65,536 / 128) and a total of 39,063 blocks (20,000,000 / 512 = 39,062.5, rounded up) in the customer table.
Note: We ignore block overhead to keep the calculations simple.
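The block arithmetic above, written out under the slide's own assumptions (64K = 65,536-byte blocks, no block overhead):

```python
import math

BLOCK_BYTES = 64 * 1024  # 64K data block, overhead ignored
ROW_BYTES = 128
CUSTOMERS = 20_000_000

rows_per_block = BLOCK_BYTES // ROW_BYTES             # 512
table_blocks = math.ceil(CUSTOMERS / rows_per_block)  # 39,062.5 rounded up
print(rows_per_block, table_blocks)  # 512 39063
```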
Data Demographics
Assume:
8% of customers live in California.
4% of customers live in Massachusetts.
4% of customers have completed a graduate degree.
6% of customers are consultants.
2% of customers have a primary hobby of chess.
3% of customers have a primary hobby of volleyball.
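These demographics also determine how few customers the whole where clause qualifies. A sketch under two loudly stated assumptions: the IN-list percentages add because each customer has one state and one primary hobby, and the four predicates are statistically independent, so their selectivities multiply.

```python
CUSTOMERS = 20_000_000

selectivity = {
    "state_cd in ('CA', 'MA')": 0.08 + 0.04,
    "education_cd = 'G'": 0.04,
    "occupation_cd = 'CONSULTANT'": 0.06,
    "hobby_cd in ('VOLLEYBALL', 'CHESS')": 0.02 + 0.03,
}

combined = 1.0
for s in selectivity.values():
    combined *= s  # valid only if the predicates are independent

expected_rows = combined * CUSTOMERS
print(round(expected_rows))  # 288
```

Roughly 288 qualifying customers out of 20M is far more selective than any single predicate, which is why combining indexes is worth studying.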
Full Table Scan
Must read every block in the table.
Apply the where-clause predicates to determine which customers to include in the average.
Adjust the numerator and denominator of the average as appropriate.
Total I/O count = 39,063
Note: Data demographics have little or no impact on query performance when using a full table scan.
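A minimal sketch of the scan: every block is read exactly once, each row is tested against the where clause, and a running numerator and denominator produce the average. The three-block table and its precomputed ages (instead of date arithmetic on birth_dt) are made up for illustration.

```python
def full_table_scan_avg(blocks, predicate, age_of):
    """Read every block once, apply the predicates to each row, and keep a running
    numerator/denominator for AVG. Returns (average or None, blocks read)."""
    total, count, blocks_read = 0.0, 0, 0
    for block in blocks:
        blocks_read += 1  # one I/O per block, whatever the data looks like
        for row in block:
            if predicate(row):
                total += age_of(row)
                count += 1
    return (total / count if count else None), blocks_read


def query_predicate(c):
    return (c["state_cd"] in ("CA", "MA") and c["education_cd"] == "G"
            and c["occupation_cd"] == "CONSULTANT" and c["hobby_cd"] in ("VOLLEYBALL", "CHESS"))


blocks = [
    [{"state_cd": "CA", "education_cd": "G", "occupation_cd": "CONSULTANT", "hobby_cd": "CHESS", "age": 40}],
    [{"state_cd": "TX", "education_cd": "G", "occupation_cd": "CONSULTANT", "hobby_cd": "CHESS", "age": 30}],
    [{"state_cd": "MA", "education_cd": "G", "occupation_cd": "CONSULTANT", "hobby_cd": "VOLLEYBALL", "age": 50}],
]
print(full_table_scan_avg(blocks, query_predicate, lambda c: c["age"]))  # (45.0, 3)
```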
Single Index Structure
B-tree or hash organization of column values:
Index entries store row IDs (RIDs), lists of RIDs, or pointers to lists of RIDs.
Originally designed for columns with many unique values (OLTP legacy).
Assuming an eight-byte RID, we will get 8096 RIDs per 64K block.
Single Index Access
The optimizer chooses the index with the best selectivity based on the values specified in the query.
1. Access the next (first) index entry corresponding to the specified column value(s).
2. Use the RID from the index entry to locate the row with the specified column value.
3. Validate the remaining predicates to qualify the row.
4. Adjust the average as appropriate.
5. Go to step 2 until there are no more matching index values.
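The five steps above map onto a single loop. In this sketch the index is a hypothetical dictionary from education_cd value to a RID list, the table is a dictionary keyed by RID, and ages are precomputed; none of these structures come from the lecture.

```python
def single_index_avg_age(index, table, value, remaining_predicate, age_of):
    """Single index access following the slide's steps.
    index: {indexed column value: [RID, ...]}; table: {RID: row}."""
    total, count = 0.0, 0
    for rid in index.get(value, []):   # steps 1 and 5: walk the matching index entries
        row = table[rid]               # step 2: use the RID to locate the row
        if remaining_predicate(row):   # step 3: validate the remaining predicates
            total += age_of(row)       # step 4: adjust the running average
            count += 1
    return total / count if count else None


table = {
    10: {"state_cd": "CA", "occupation_cd": "CONSULTANT", "hobby_cd": "CHESS", "age": 40},
    11: {"state_cd": "TX", "occupation_cd": "CONSULTANT", "hobby_cd": "CHESS", "age": 30},
    12: {"state_cd": "MA", "occupation_cd": "CONSULTANT", "hobby_cd": "VOLLEYBALL", "age": 50},
    13: {"state_cd": "CA", "occupation_cd": "CONSULTANT", "hobby_cd": "CHESS", "age": 60},
}
education_index = {"G": [10, 11, 12], "U": [13]}  # hypothetical index on education_cd


def rest(row):
    return (row["state_cd"] in ("CA", "MA") and row["occupation_cd"] == "CONSULTANT"
            and row["hobby_cd"] in ("VOLLEYBALL", "CHESS"))


print(single_index_avg_age(education_index, table, "G", rest, lambda r: r["age"]))  # 45.0
```

Each RID visited in step 2 is a potential base-table block read, which is what drives the I/O count in the performance analysis that follows.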
Single Index Access
What are my indexing choices?
state_cd (8% + 4% = 12% selectivity)
education_cd (4% selectivity)
occupation_cd (6% selectivity)
hobby_cd (2% + 3% = 5% selectivity)
Choose education_cd because it has the best selectivity.
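Picking the index is a minimum over the fractions listed above, since the "best" selectivity is the one that qualifies the smallest share of rows:

```python
selectivity = {
    "state_cd": 0.08 + 0.04,  # CA or MA
    "education_cd": 0.04,     # graduate degree
    "occupation_cd": 0.06,    # consultant
    "hobby_cd": 0.02 + 0.03,  # chess or volleyball
}

best = min(selectivity, key=selectivity.get)
print(best)  # education_cd
```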
Single Index Access Performance
Access via the index on education_cd:
800,000 RIDs (4% of 20M)
99 blocks of RIDs to read
But... 4% selectivity with 512 rows per block in the base table means that the 800,000 selected RIDs will cause access to every block in the base table!
Total I/O count = 39,063 + 99 = 39,162
Worse than a full table scan!
The calculations
How do we calculate that number?
The selectivity on education is 4% (800,000 rows). If these rows were stored consecutively, we would have to access 800,000 rows / 512 rows per block = 1,563 blocks (1,562.5 rounded up).
However, assuming the qualifying rows are evenly distributed, we can use probability: 4% selectivity means that, on average, 0.04 × 512 = 20.48 of the desired rows are found in each block.
Hence, the total I/O required is 800,000 / 20.48 = 39,062.5 blocks, i.e., every block in the table,
plus the additional 99 blocks for accessing the index (800,000 RIDs at 8096 RIDs per block = 99 blocks of RIDs to read).
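The same numbers can be reproduced in code, together with a slightly more careful estimate. The refinement assumes each row qualifies independently with probability 0.04, so a block is read unless none of its 512 rows qualifies; the 8,096 RIDs per index block is the figure used in the lecture.

```python
import math

TABLE_BLOCKS = 39_063
ROWS_PER_BLOCK = 512
SELECTIVITY = 0.04            # education_cd = 'G'
SELECTED_ROWS = 800_000       # 4% of 20M
RIDS_PER_INDEX_BLOCK = 8_096  # figure used in the lecture

index_blocks = math.ceil(SELECTED_ROWS / RIDS_PER_INDEX_BLOCK)  # 99

# Lecture's estimate: about 20.48 qualifying rows per block.
lecture_estimate = SELECTED_ROWS / (SELECTIVITY * ROWS_PER_BLOCK)  # 39,062.5

# Refinement: probability that a block contains at least one qualifying row.
p_block_read = 1 - (1 - SELECTIVITY) ** ROWS_PER_BLOCK
expected_blocks = TABLE_BLOCKS * p_block_read

total_io = math.ceil(expected_blocks) + index_blocks
print(index_blocks, round(lecture_estimate, 1), total_io)  # 99 39062.5 39162
```

Since (0.96)^512 is below 1e-9, essentially every block contains a qualifying row, which confirms the lecture's conclusion that the index path costs more than the 39,063-block full scan.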

Más contenido relacionado

Destacado

Tik allisya smpit rpi
Tik allisya smpit rpiTik allisya smpit rpi
Tik allisya smpit rpiichaa17
 
What is Bitcoin Currency
What is Bitcoin CurrencyWhat is Bitcoin Currency
What is Bitcoin Currencynasim12
 
behavior tips! for school kids !
behavior tips! for school kids !behavior tips! for school kids !
behavior tips! for school kids !Ramya Aggarwal
 
Com 303 1
Com 303 1Com 303 1
Com 303 1ChadH1
 
Developing for Windows 8 based devices
Developing for Windows 8 based devicesDeveloping for Windows 8 based devices
Developing for Windows 8 based devicesAneeb_Khawar
 
Fiqih icha
Fiqih ichaFiqih icha
Fiqih ichaichaa17
 
Programme on Strategic Management and Management of Change
Programme on Strategic Management and Management of ChangeProgramme on Strategic Management and Management of Change
Programme on Strategic Management and Management of Changevamnicom123
 
Epc slides part 2
Epc slides part 2Epc slides part 2
Epc slides part 2Jıa Yıı
 
Year 7 energy_resources_and_electrical_circuits_mark_scheme (1)
Year 7 energy_resources_and_electrical_circuits_mark_scheme (1)Year 7 energy_resources_and_electrical_circuits_mark_scheme (1)
Year 7 energy_resources_and_electrical_circuits_mark_scheme (1)Nurul Aron
 
Why I love the Rain and You Will too - Guarenteed
Why I love the Rain and You Will too - GuarenteedWhy I love the Rain and You Will too - Guarenteed
Why I love the Rain and You Will too - GuarenteedJane Coombs
 
Ici final project report
Ici final project reportIci final project report
Ici final project reportJıa Yıı
 
tik icha smpit rpi
tik icha smpit rpi tik icha smpit rpi
tik icha smpit rpi ichaa17
 
Programme on Governance and Reforms in Cooperatives for UCB and Credit Societies
Programme on Governance and Reforms in Cooperatives for UCB and Credit SocietiesProgramme on Governance and Reforms in Cooperatives for UCB and Credit Societies
Programme on Governance and Reforms in Cooperatives for UCB and Credit Societiesvamnicom123
 
Performance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memoryPerformance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memoryZongYing Lyu
 

Destacado (20)

Tik allisya smpit rpi
Tik allisya smpit rpiTik allisya smpit rpi
Tik allisya smpit rpi
 
What is Bitcoin Currency
What is Bitcoin CurrencyWhat is Bitcoin Currency
What is Bitcoin Currency
 
behavior tips! for school kids !
behavior tips! for school kids !behavior tips! for school kids !
behavior tips! for school kids !
 
Com 303 1
Com 303 1Com 303 1
Com 303 1
 
Developing for Windows 8 based devices
Developing for Windows 8 based devicesDeveloping for Windows 8 based devices
Developing for Windows 8 based devices
 
Pkn
PknPkn
Pkn
 
Cot safety
Cot safetyCot safety
Cot safety
 
Fiqih icha
Fiqih ichaFiqih icha
Fiqih icha
 
Apresentacao now ventures
Apresentacao now venturesApresentacao now ventures
Apresentacao now ventures
 
Engranajes fotos
Engranajes fotosEngranajes fotos
Engranajes fotos
 
Programme on Strategic Management and Management of Change
Programme on Strategic Management and Management of ChangeProgramme on Strategic Management and Management of Change
Programme on Strategic Management and Management of Change
 
Epc slides part 2
Epc slides part 2Epc slides part 2
Epc slides part 2
 
Year 7 energy_resources_and_electrical_circuits_mark_scheme (1)
Year 7 energy_resources_and_electrical_circuits_mark_scheme (1)Year 7 energy_resources_and_electrical_circuits_mark_scheme (1)
Year 7 energy_resources_and_electrical_circuits_mark_scheme (1)
 
Creative, Digital & Design Business Briefing - September 2015
Creative, Digital & Design Business Briefing - September 2015Creative, Digital & Design Business Briefing - September 2015
Creative, Digital & Design Business Briefing - September 2015
 
Why I love the Rain and You Will too - Guarenteed
Why I love the Rain and You Will too - GuarenteedWhy I love the Rain and You Will too - Guarenteed
Why I love the Rain and You Will too - Guarenteed
 
Ici final project report
Ici final project reportIci final project report
Ici final project report
 
tik icha smpit rpi
tik icha smpit rpi tik icha smpit rpi
tik icha smpit rpi
 
5G Info Briefing
5G Info Briefing 5G Info Briefing
5G Info Briefing
 
Programme on Governance and Reforms in Cooperatives for UCB and Credit Societies
Programme on Governance and Reforms in Cooperatives for UCB and Credit SocietiesProgramme on Governance and Reforms in Cooperatives for UCB and Credit Societies
Programme on Governance and Reforms in Cooperatives for UCB and Credit Societies
 
Performance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memoryPerformance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memory
 

Similar a Cs437 lecture 16-18

Sql server lesson6
Sql server lesson6Sql server lesson6
Sql server lesson6Ala Qunaibi
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Lviv Startup Club
 
Database Performance Tuning
Database Performance Tuning Database Performance Tuning
Database Performance Tuning Arno Huetter
 
Physical elements of data
Physical elements of dataPhysical elements of data
Physical elements of dataDimara Hakim
 
Advanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsAdvanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsDave Stokes
 
Database index
Database indexDatabase index
Database indexRiteshkiit
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007paulguerin
 
Database Performance
Database PerformanceDatabase Performance
Database PerformanceBoris Hristov
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra QUONTRASOLUTIONS
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)Michael Rys
 
How mysql choose the execution plan
How mysql choose the execution planHow mysql choose the execution plan
How mysql choose the execution plan辛鹤 李
 
SQL -Beginner To Intermediate Level.pdf
SQL -Beginner To Intermediate Level.pdfSQL -Beginner To Intermediate Level.pdf
SQL -Beginner To Intermediate Level.pdfDraguClaudiu
 
Steps towards of sql server developer
Steps towards of sql server developerSteps towards of sql server developer
Steps towards of sql server developerAhsan Kabir
 

Similar a Cs437 lecture 16-18 (20)

Sql server lesson6
Sql server lesson6Sql server lesson6
Sql server lesson6
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
 
Physical Design and Development
Physical Design and DevelopmentPhysical Design and Development
Physical Design and Development
 
Database Performance Tuning
Database Performance Tuning Database Performance Tuning
Database Performance Tuning
 
Physical elements of data
Physical elements of dataPhysical elements of data
Physical elements of data
 
Advanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsAdvanced MySQL Query Optimizations
Advanced MySQL Query Optimizations
 
Indexing
IndexingIndexing
Indexing
 
MySQL Indexes
MySQL IndexesMySQL Indexes
MySQL Indexes
 
Keerty rdbms sql
Keerty rdbms sqlKeerty rdbms sql
Keerty rdbms sql
 
Database index
Database indexDatabase index
Database index
 
Database Basics
Database BasicsDatabase Basics
Database Basics
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007
 
Database Performance
Database PerformanceDatabase Performance
Database Performance
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
 
unit 1 ppt.pptx
unit 1 ppt.pptxunit 1 ppt.pptx
unit 1 ppt.pptx
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)
 
Database concepts
Database conceptsDatabase concepts
Database concepts
 
How mysql choose the execution plan
How mysql choose the execution planHow mysql choose the execution plan
How mysql choose the execution plan
 
SQL -Beginner To Intermediate Level.pdf
SQL -Beginner To Intermediate Level.pdfSQL -Beginner To Intermediate Level.pdf
SQL -Beginner To Intermediate Level.pdf
 
Steps towards of sql server developer
Steps towards of sql server developerSteps towards of sql server developer
Steps towards of sql server developer
 

Más de Aneeb_Khawar

Más de Aneeb_Khawar (6)

Cs437 lecture 14_15
Cs437 lecture 14_15Cs437 lecture 14_15
Cs437 lecture 14_15
 
Cs437 lecture 13
Cs437 lecture 13Cs437 lecture 13
Cs437 lecture 13
 
Cs437 lecture 10-12
Cs437 lecture 10-12Cs437 lecture 10-12
Cs437 lecture 10-12
 
Cs437 lecture 09
Cs437 lecture 09Cs437 lecture 09
Cs437 lecture 09
 
Cs437 lecture 7-8
Cs437 lecture 7-8Cs437 lecture 7-8
Cs437 lecture 7-8
 
Cs437 lecture 1-6
Cs437 lecture 1-6Cs437 lecture 1-6
Cs437 lecture 1-6
 

Último

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 

Último (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 

Cs437 lecture 16-18

  • 1. Lecture 16 Dr. Fawad Hussain Primary and Secondary IndicesPrimary and Secondary IndicesPrimary and Secondary IndicesPrimary and Secondary Indices
  • 2. Use multiple indices for certain types of queries. Example: select account-number from account where branch-name = “Perryridge” and balance = 1000 Possible strategies for processing query using indices on single attributes: 1. Use index on branch-name to find accounts with balances of $1000; test branch-name = “Perryridge”. 2. Use index on balance to find accounts with balances of $1000; test branch-name = “Perryridge”. 3. Use branch-name index to find pointers to all records pertaining to the Perryridge branch. Similarly use index on balance. Take intersection of both sets of pointers obtained. Multiple-Key Access
  • 3. With the where clause where branch-name = “Perryridge” and balance = 1000 the index on the combined search-key will fetch only records that satisfy both conditions. Using separate indices in less efficient — we may fetch many records (or pointers) that satisfy only one of the conditions. Can also efficiently handle where branch-name - “Perryridge” and balance < 1000 Indices on Multiple Attributes
  • 5. Hash Function of branch-name
  • 6. Bitmap Indices on Relation customer-info
  • 7. Primary Index Secondary Index Sparse Index vs Dense Index IndexingTechniques Primary versus secondary indexing. Single index access versus scanning. Combining multiple indexes. What we studied
  • 8. PI for a table (inTeradata) is a specification of its partitioning column(s). PI may be defined as unique (UPI) or non-unique (NUPI). Automatic enforcement of uniqueness when UPI is specified. PI provides an implicit access path to any row just by knowing its value. Only one PI per table. PI can be on multiple columns i.e. composite. Primary Index
  • 9. Primary index selection criteria: Common join and retrieval key. Distributes rows evenly across database partitions. Less than 10,000 rows per PI value when non-unique. WHY? Primary Index
  • 10. Trick question: What should be the primary index of the transaction table for a large financial services firm? create table tx (tx_id decimal (15,0) NOT NULL ,account_it decimal (10,0) NOT NULL ,tx_amt decimal (15,2) NOT NULL ,tx_dt date NOT NULL ,tx_cd char (2) NOT NULL .... ) primary index (???); Ans: It depends Primary Index
  • 11. Almost all joins and retrievals will come in through the account_id foreign key, so we want account_id as a NUPI. However, if the data is “lumpy” when distributed on account_id, or if some accounts have very large numbers of transactions (e.g., an institutional account could easily have 10,000+ transactions), we want tx_id as a UPI for good data distribution. Primary Index
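The “lumpy” distribution problem can be sketched by hashing the same hypothetical transaction set on account_id versus tx_id. The demographics (1,000 retail accounts with 10 transactions each plus one institutional account with 20,000) and the MD5-based hash are assumptions for illustration, not Teradata's algorithm. Skew is reported as the busiest VAMP's row count divided by the average (1.0 means perfectly even).

```python
import hashlib
from collections import Counter

NUM_VAMPS = 8  # hypothetical system size

def vamp_for(value) -> int:
    # Stand-in hash (assumption), not Teradata's real row-hash algorithm.
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_VAMPS

def make_transactions():
    """Hypothetical rows (tx_id, account_id)."""
    txs, tx_id = [], 0
    for account_id in range(1, 1001):          # 1,000 retail accounts x 10 tx
        for _ in range(10):
            tx_id += 1
            txs.append((tx_id, account_id))
    for _ in range(20_000):                    # one institutional account
        tx_id += 1
        txs.append((tx_id, 9_999_999))
    return txs

def skew(txs, column):
    """Busiest VAMP's row count divided by the average row count per VAMP."""
    counts = Counter(vamp_for(tx[column]) for tx in txs)
    return max(counts.values()) / (len(txs) / NUM_VAMPS)

txs = make_transactions()
print(f"NUPI on account_id: skew {skew(txs, 1):.2f}")
print(f"UPI on tx_id:       skew {skew(txs, 0):.2f}")
```

All 20,000 rows of the institutional account hash to the same VAMP under account_id, so that VAMP holds over five times its fair share, while the unique tx_id spreads rows almost evenly.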
  • 12. Joins and access via the primary index are very efficient due to Teradata’s sophisticated row hashing algorithms, which allow going directly to the data block containing the desired row. Single I/O operation for accessing a data row via a UPI. Single I/O operation for accessing a data row via a NUPI whenever all rows with the same PI value fit into a single block. Single-VAMP operation for indexed retrieval. No spool space required. Primary Index
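The single-block condition for NUPI access can be checked with simple arithmetic. This sketch borrows the 128-byte rows and 64K blocks assumed in the Lecture 18 customer example, ignores block overhead, and assumes rows sharing one NUPI value are stored together; `nupi_block_reads` is a hypothetical helper name.

```python
import math

BLOCK_BYTES = 64 * 1024   # 64K data block
ROW_BYTES = 128           # assumed row size

def nupi_block_reads(rows_per_pi_value: int) -> int:
    """Data-block I/Os needed to fetch all rows sharing one NUPI value,
    assuming those rows are packed together and ignoring block overhead."""
    rows_per_block = BLOCK_BYTES // ROW_BYTES   # 512
    return math.ceil(rows_per_pi_value / rows_per_block)

for n in (1, 512, 513, 10_000):
    print(n, nupi_block_reads(n))
```

Under these assumptions up to 512 rows per PI value still cost a single I/O, while a value with 10,000 rows needs 20 block reads.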
  • 13. Primary index is free! No storage cost. No index build required. This is a direct result of the underlying hash-based file system implementation. OLTP databases use a page-based file system and therefore do not deliver this performance advantage. Primary Index
  • 14. Any index that is NOT a PI. SI structures are implemented using the same underlying structure as base tables (often referred to as sub-tables). An SI may be defined as unique (USI) or non-unique (NUSI). Uniqueness is enforced automatically when a USI is specified. Up to thirty-two SIs per table in Teradata. Unlike a primary index, SIs are not “free” in terms of storage. An SI is NOT required BUT desired. Secondary Index (SI)
  • 15. Lecture 17, Dr. Fawad Hussain: Primary and Secondary Indices - II
  • 16. Primary index selection criteria: Common join and retrieval key. Distributes rows evenly across database partitions. Fewer than 10,000 rows per PI value when non-unique. WHY? Recall - 1
  • 17. Secondary Index (SI): Any index that is NOT a PI. SI structures are implemented using the same underlying structure as base tables (often referred to as sub-tables). An SI may be defined as unique (USI) or non-unique (NUSI). Uniqueness is enforced automatically when a USI is specified. Up to thirty-two SIs per table in Teradata. Unlike a primary index, SIs are not “free” in terms of storage. An SI is NOT required BUT desired. Recall - 2
  • 18. A non-unique secondary index (NUSI) is partitioned so that each index entry is co-located on the same VAMP (Virtual Access Module Processor) as its corresponding row in the base table. Each row access via a NUSI is a single-VAMP operation because the NUSI entry and the data row are co-located. NUSI access is performed in parallel across all VAMPs whenever it is appropriate to do so. Secondary Index (NUSI)
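A minimal sketch of NUSI co-location, under simplifying assumptions: integer PI values, `pi % NUM_VAMPS` as a stand-in for the row-hash-to-VAMP mapping, and a global counter as a stand-in ROWID. Each VAMP keeps a local sub-table covering only its own rows, so a NUSI lookup visits every VAMP but never follows a pointer to another VAMP.

```python
from collections import defaultdict

NUM_VAMPS = 4  # hypothetical system size

class Vamp:
    def __init__(self):
        self.rows = {}                  # ROWID -> base-table row
        self.nusi = defaultdict(list)   # local NUSI sub-table: value -> local ROWIDs

class NusiTable:
    def __init__(self, pi_col, nusi_col):
        self.pi_col, self.nusi_col = pi_col, nusi_col
        self.vamps = [Vamp() for _ in range(NUM_VAMPS)]
        self.next_rowid = 0

    def insert(self, row):
        # The base row goes to the VAMP chosen by its (integer) PI value.
        vamp = self.vamps[row[self.pi_col] % NUM_VAMPS]
        rowid = self.next_rowid
        self.next_rowid += 1
        vamp.rows[rowid] = row
        # The NUSI entry is written on the SAME VAMP as the row (co-location).
        vamp.nusi[row[self.nusi_col]].append(rowid)

    def select(self, value):
        # Every VAMP probes its local sub-table (in parallel on a real system);
        # no row or pointer crosses VAMPs.
        result = []
        for vamp in self.vamps:
            result.extend(vamp.rows[rid] for rid in vamp.nusi.get(value, []))
        return result

customers = NusiTable(pi_col=0, nusi_col=1)   # rows: (customer_id, state_cd)
for row in [(1, "CA"), (2, "MA"), (3, "CA"), (4, "TX"), (5, "CA")]:
    customers.insert(row)
print(sorted(customers.select("CA")))  # [(1, 'CA'), (3, 'CA'), (5, 'CA')]
```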
  • 19. Compressed ROWID index structure: Hash on the index value to get the block location (ROWID for the sub-table). Store the index value just once, followed by all ROWIDs in the base table corresponding to that value. Sorted by ROWID to facilitate maximum efficiency when accessing the base table, performing updates and deletes, etc. Additional blocks are allocated when the NUSI is non-selective and the compressed ROWID structure for an index value exceeds 64K. Secondary Index (NUSI)
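The compressed ROWID layout can be sketched as a mapping from each index value (stored once) to its sorted ROWID list. ROWIDs are modeled here as hypothetical (row_hash, uniqueness) tuples and assumed to take 8 bytes each; the block-count helper shows when a non-selective value's entry spills past one 64K block.

```python
from collections import defaultdict

BLOCK_BYTES = 64 * 1024
ROWID_BYTES = 8  # assumed ROWID size

def build_nusi_subtable(rows, col):
    """Compressed ROWID structure: each index value stored once, followed by
    all base-table ROWIDs having that value, sorted by ROWID."""
    entries = defaultdict(list)
    for rowid, row in rows:
        entries[row[col]].append(rowid)
    return {value: sorted(rowids) for value, rowids in entries.items()}

def blocks_for_entry(value, rowids):
    """Blocks needed for one index value's entry (ignoring block overhead)."""
    size = len(str(value).encode()) + ROWID_BYTES * len(rowids)
    return -(-size // BLOCK_BYTES)  # ceiling division

# Hypothetical base rows: (ROWID, (customer_id, state_cd)).
rows = [((0x1A2B, 1), (1, "CA")), ((0x0F00, 1), (2, "MA")), ((0x0100, 2), (3, "CA"))]
subtable = build_nusi_subtable(rows, 1)
print(subtable["CA"])  # [(256, 2), (6699, 1)]
```

With 8-byte ROWIDs, a value shared by 800 rows fits in one block, while a value shared by 10,000 rows needs a second block.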
  • 20. Secondary Index (NUSI) [Figure: non-unique secondary index value → hashing algorithm → NUSI sub-table → base table]
  • 21. Building a NUSI helps when the indexed column is highly selective (i.e., few rows match a given value). The cost-based optimizer determines when to access via a NUSI: the number of rows selected by the NUSI must be less than the number of blocks in the table to justify NUSI access (assuming rows with the NUSI value are evenly distributed within the table). It must also consider the cost of reading the NUSI sub-table and building the ROWID spool file. Note that the extreme efficiency of table scanning in Teradata reduces the need for secondary indexing compared to other databases. When to build NUSI?
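The rule of thumb above can be sketched as a cost comparison. This is a simplification, not Teradata's optimizer: it assumes each qualifying row may sit in a different block (capped at the table size), counts sub-table reads, and ignores the cost of building the ROWID spool file. `nusi_vs_scan` is a hypothetical name.

```python
import math

def nusi_vs_scan(table_rows, rows_per_block, qualifying_rows, rowids_per_block):
    """Returns (use_nusi, estimated NUSI I/Os, full-scan I/Os)."""
    table_blocks = math.ceil(table_rows / rows_per_block)
    subtable_blocks = math.ceil(qualifying_rows / rowids_per_block)
    # Worst case: every qualifying row lives in a different block.
    base_blocks = min(qualifying_rows, table_blocks)
    nusi_cost = subtable_blocks + base_blocks
    use_nusi = qualifying_rows < table_blocks and nusi_cost < table_blocks
    return use_nusi, nusi_cost, table_blocks

# Lecture 18 numbers: 20M rows, 512 rows/block, 8,096 ROWIDs per index block.
print(nusi_vs_scan(20_000_000, 512, 800_000, 8096))  # (False, 39162, 39063)
print(nusi_vs_scan(20_000_000, 512, 1_000, 8096))    # (True, 1001, 39063)
```

With 4% selectivity the NUSI plan touches every block and loses to the scan; with only 1,000 qualifying rows it wins easily.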
  • 22. A unique secondary index (USI) is partitioned by the unique column upon which the index is built. Row access via a USI is a two-VAMP operation. The first I/O is initiated on the VAMP holding the USI entry. The second I/O is initiated on the VAMP holding the data row. Secondary Index (USI)
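A minimal sketch of the two-step USI access path, assuming integer keys, `value % NUM_VAMPS` as a stand-in for the row-hash-to-VAMP mapping, and a simplified ROWID of (PI value, sequence number). The USI entry lives on the VAMP its own value hashes to, so the lookup first reads that entry and then goes to the VAMP that owns the base row.

```python
NUM_VAMPS = 4  # hypothetical system size

def vamp_of(value: int) -> int:
    # Stand-in for the row-hash -> VAMP mapping (assumption: simple modulo).
    return value % NUM_VAMPS

class UsiTable:
    def __init__(self):
        self.base = [{} for _ in range(NUM_VAMPS)]  # per VAMP: ROWID -> row
        self.usi = [{} for _ in range(NUM_VAMPS)]   # per VAMP: USI value -> ROWID
        self.seq = 0

    def insert(self, pi_value: int, usi_value: int, row):
        usi_home = vamp_of(usi_value)
        if usi_value in self.usi[usi_home]:
            raise ValueError(f"duplicate USI value {usi_value}")  # USI uniqueness
        self.seq += 1
        rowid = (pi_value, self.seq)               # simplified ROWID
        self.base[vamp_of(pi_value)][rowid] = row  # row lives where its PI hashes
        self.usi[usi_home][usi_value] = rowid      # entry lives where the USI value hashes

    def get_by_usi(self, usi_value: int):
        first = vamp_of(usi_value)                 # I/O 1: USI sub-table entry
        rowid = self.usi[first][usi_value]
        second = vamp_of(rowid[0])                 # I/O 2: base row on its own VAMP
        return self.base[second][rowid], (first, second)

# order_header-style usage: PI = customer_id, USI = order_id (hypothetical values).
orders = UsiTable()
orders.insert(7, 1001, {"order_id": 1001, "customer_id": 7})
print(orders.get_by_usi(1001))  # ({'order_id': 1001, 'customer_id': 7}, (1, 3))
```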
  • 23. To allow data access without all-VAMP operations. Increased efficiency for (very) highly selective retrievals. Co-location of the index with frequently joined tables. When to build USI?
  • 24. Example: create table order_header (order_id decimal(12, 0) NOT NULL ,customer_id decimal(9, 0) NOT NULL ,order_dt date NOT NULL ... ) primary index( customer_id ); create unique index oh_order_idx (order_id) on order_header; create table order_detail (order_id decimal(12, 0) NOT NULL ,product_id integer NOT NULL ,extended_price_amt decimal(15,2) NOT NULL ,item_cnt integer NOT NULL ... ) primary index( order_id ); When to build USI?
  • 25. Example: How many customers ordered green socks in the last month? (Assume that green socks are quite selective.) select count(distinct order_header.customer_id) from order_header ,order_detail ,product where order_header.order_id = order_detail.order_id and order_header.order_dt > add_months(date, -1) and order_detail.product_id = product.product_id and product.product_subcategory_cd = 'SOCKS' and product.color_cd = 'GREEN' ; The order_id USI on the order_header table obviates the need for all-VAMP duplication of the spool result from the order_detail-to-product join when joining to the order_header table. When to build USI?
  • 26. Lecture 18, Dr. Fawad Hussain: Primary and Secondary Indices - III
  • 27. Example: What is the average age (in years) of customers who live in California or Massachusetts, completed a graduate degree, are consultants, and have a hobby of volleyball or chess? select avg( (days(date) - days(customer.birth_dt)) / 365.25 ) from customer where customer.state_cd in ('CA', 'MA') and customer.education_cd = 'G' and customer.occupation_cd = 'CONSULTANT' and customer.hobby_cd in ('VOLLEYBALL', 'CHESS') ; A Simple Query
  • 28. Assume: 20M customers. 128 byte rows. 64K data block size. Results in approximately 512 rows per block and a total of 39,063 blocks in the customer table. Note: We are ignoring block overhead for purposes of simplicity in calculations. Sample Query Structure
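The block arithmetic above, written out as a quick check (block overhead ignored, as on the slide):

```python
import math

customers = 20_000_000
row_bytes = 128
block_bytes = 64 * 1024

rows_per_block = block_bytes // row_bytes               # 65,536 / 128
table_blocks = math.ceil(customers / rows_per_block)    # 39,062.5 rounded up
print(rows_per_block, table_blocks)  # 512 39063
```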
  • 29. Assume: 8% of customers live in California. 4% of customers live in Massachusetts. 4% of customers have completed a graduate degree. 6% of customers are consultants. 2% of customers have a primary hobby of chess. 3% of customers have a primary hobby of volleyball. Data Demographics
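As a rough sense of scale, the demographics can be combined into an estimate of how many customers satisfy the whole where clause. This rests on an extra assumption the slide does not state: that the four attributes are statistically independent.

```python
TABLE_ROWS = 20_000_000

state = 0.08 + 0.04        # lives in CA or MA
education = 0.04           # completed a graduate degree
occupation = 0.06          # consultant
hobby = 0.02 + 0.03        # chess or volleyball

# Assumption: predicates are independent, so selectivities multiply.
combined = state * education * occupation * hobby
expected_rows = round(TABLE_ROWS * combined)
print(combined, expected_rows)
```

Under that assumption only about 288 of the 20M customers qualify, which is why reading the entire table for this query feels wasteful.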
  • 30. Must read every block in the table. Apply the where clause predicates to determine which customers to include in the average. Adjust the numerator and denominator of the average as appropriate. Total I/O count = 39,063. Note: Data demographics have little or no impact on query performance when using a full table scan operation. Full Table Scan
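The full table scan can be sketched as a loop that charges one I/O per block no matter how many rows qualify. Blocks are modeled as hypothetical lists of rows, and the predicate and measure are passed in as functions.

```python
def full_table_scan_avg(blocks, predicate, measure):
    """Read every block once, apply the where-clause predicate to each row, and
    maintain the average's numerator and denominator.
    Returns (average or None, I/O count)."""
    io_count = 0
    total, count = 0.0, 0
    for block in blocks:          # one I/O per block, regardless of selectivity
        io_count += 1
        for row in block:
            if predicate(row):
                total += measure(row)
                count += 1
    return (total / count if count else None), io_count

# Hypothetical blocks of (age, education_cd) rows.
blocks = [[(30, "G"), (40, "U")], [(50, "G")], [(20, "U")]]
print(full_table_scan_avg(blocks, lambda r: r[1] == "G", lambda r: r[0]))  # (40.0, 3)
```

Changing the predicate changes the average but never the I/O count, which is the point of the slide's note on data demographics.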
  • 31. B-tree or hash organization of column values: Index entries store row IDs (RIDs), lists of RIDs, or pointers to lists of RIDs. Originally designed for columns with many unique values (OLTP legacy). Assuming an eight byte RID, we will get 8096 RIDs per 64K block. Single Index Structure
  • 32. Optimizer chooses index with best selectivity based on values specified in query. 1. Access next (first) index entry corresponding to specified column value(s). 2. Use RID from index entry to locate row with specified column value. 3. Validate remaining predicates to qualify row. 4. Adjust average as appropriate. 5. Go to 2 until no more matching index values. Single Index Access
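The five steps can be sketched as a loop, with the step numbers in the comments. Simplifying assumptions: a RID is just the row's position in a list, the block is `rid // rows_per_block`, and I/O is approximated by counting distinct base-table blocks touched (as if each block were read once and kept in memory).

```python
def build_index(rows, col):
    """Single-column index: value -> list of RIDs (RID = row position, a simplification)."""
    index = {}
    for rid, row in enumerate(rows):
        index.setdefault(row[col], []).append(rid)
    return index

def single_index_access(index, key_values, rows, rows_per_block, residual, measure):
    """Returns (average or None, number of distinct base-table blocks touched)."""
    total, count, blocks = 0.0, 0, set()
    for key in key_values:                      # 1. next matching index entry
        for rid in index.get(key, []):
            blocks.add(rid // rows_per_block)   # 2. RID locates the row's block
            row = rows[rid]
            if residual(row):                   # 3. validate remaining predicates
                total += measure(row)           # 4. adjust the average
                count += 1
    # 5. the loops end when no matching index values remain
    return (total / count if count else None), len(blocks)

# Hypothetical rows: (age, education_cd, occupation_cd), 2 rows per block.
rows = [(30, "G", "C"), (40, "U", "C"), (50, "G", "X"),
        (20, "G", "C"), (60, "U", "C"), (40, "G", "C")]
edu_index = build_index(rows, 1)
print(single_index_access(edu_index, ["G"], rows, 2,
                          lambda r: r[2] == "C", lambda r: r[0]))  # (30.0, 3)
```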
  • 33. What are my indexing choices? state_cd (8% + 4% = 12% selectivity) education_cd (4% selectivity) occupation_cd (6% selectivity) hobby_cd (2% + 3% = 5% selectivity) Choose education_cd because it has the best selectivity. Single Index Access
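The optimizer's choice on this slide amounts to picking the column with the smallest fraction of qualifying rows:

```python
TABLE_ROWS = 20_000_000

SELECTIVITY = {
    "state_cd": 0.08 + 0.04,       # CA or MA
    "education_cd": 0.04,          # graduate degree
    "occupation_cd": 0.06,         # consultant
    "hobby_cd": 0.02 + 0.03,       # chess or volleyball
}

best = min(SELECTIVITY, key=SELECTIVITY.get)      # lowest fraction = best selectivity
rids = round(TABLE_ROWS * SELECTIVITY[best])
print(best, rids)  # education_cd 800000
```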
  • 34. Access via index on education_cd: 800,000 RIDs (4% of 20M) 99 blocks of RIDs to read But...4% selectivity with 512 rows per block in the base table means that 800,000 selected RIDs will cause access to every block in the base table! Total I/O count = 39,063 + 99 = 39,162 Worse than full table scan! Single Index Access Performance
  • 35. How do we calculate that number? The selectivity on education is 4% (800,000 rows). If these rows were stored contiguously, we would have to access 800,000 rows / 512 rows per block = 1,563 blocks. However, assuming the qualifying rows are evenly distributed, 4% selectivity means that 0.04 × 512 = 20.48 desired rows are found in each block, so every block contains qualifying rows: 800,000 / 20.48 = 39,062.5, i.e., all 39,063 blocks must be read. Add the 99 blocks for accessing the index (800,000 RIDs at 8,096 RIDs per block = 99 blocks of RIDs to read). The calculations
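The same calculation can be made a little more precise with a probability model. The assumption here, going beyond the slide's even-distribution argument, is that each row qualifies independently with probability 0.04. The chance that a 512-row block contains no qualifying row is then 0.96^512, which is vanishingly small, so essentially every block is read.

```python
import math

TABLE_ROWS = 20_000_000
ROWS_PER_BLOCK = 512
RIDS_PER_BLOCK = 8096       # figure used on the slides
SELECTIVITY = 0.04          # education_cd = 'G'

table_blocks = math.ceil(TABLE_ROWS / ROWS_PER_BLOCK)       # 39,063
selected = round(TABLE_ROWS * SELECTIVITY)                    # 800,000 RIDs
rid_blocks = math.ceil(selected / RIDS_PER_BLOCK)             # 99 index blocks
rows_hit_per_block = SELECTIVITY * ROWS_PER_BLOCK             # about 20.48

# Assumption: rows qualify independently, so a block is skipped only if
# none of its 512 rows qualifies.
p_untouched = (1 - SELECTIVITY) ** ROWS_PER_BLOCK
expected_blocks = table_blocks * (1 - p_untouched)

total_io = math.ceil(expected_blocks) + rid_blocks
print(table_blocks, rid_blocks, total_io)  # 39063 99 39162
```

The result matches the slide's total of 39,162 I/Os, worse than the 39,063 of a full table scan.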