3. Provide efficient access to relevant records
Based on values of particular attribute(s)
Same idea as index in back of a book
An index is a “thin” copy of a relation
Not all columns from the relation are included
The index is sorted in a particular way
Index supports efficient lookup
Useful when filters are selective
Avoid scanning rows that will be filtered out
4. Indexes organized based on some search key
Column (or set of columns) whose values are used to access the index
Organization can be sorting or hashing
Index is built for some relation
One index entry per record in the relation
Index consists of <Value, RID> pairs
Value = value of the search key for this record
RID = record identifier
▪ Tells the DBMS where the record is stored
▪ Usually (page number, offset in page)
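As a minimal sketch of the <Value, RID> idea above, the following Python builds a sorted index over a small in-memory relation and looks up matching RIDs by binary search. The relation, the attribute names A and B, and the helper names build_index and lookup are all hypothetical, and a sorted list merely stands in for whatever structure a real DBMS would use.

```python
from bisect import bisect_left

# Hypothetical relation: each record is stored at a RID = (page number, offset in page).
records = {
    (0, 0): {"id": 1, "A": 5, "B": 10},
    (0, 1): {"id": 2, "A": 3, "B": 20},
    (1, 0): {"id": 3, "A": 5, "B": 30},
    (1, 1): {"id": 4, "A": 8, "B": 40},
}

def build_index(records, key):
    """Return one (value, RID) entry per record, sorted by search-key value."""
    return sorted((rec[key], rid) for rid, rec in records.items())

def lookup(index, value):
    """Return the RIDs of all index entries whose search-key value equals `value`."""
    # (value,) sorts before every (value, rid) pair, so this finds the first match.
    pos = bisect_left(index, (value,))
    rids = []
    while pos < len(index) and index[pos][0] == value:
        rids.append(index[pos][1])
        pos += 1
    return rids

index_on_a = build_index(records, "A")
matching_rids = lookup(index_on_a, 5)
matching_records = [records[rid] for rid in matching_rids]
```

Only the records named by the returned RIDs are fetched; rows with other A values are never touched.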
5. Traditional Access Methods
B-trees, hash tables, R-trees, grids, …
Popular in Warehouses
Covering indexes
Multi-column indexes
Join indexes
Bitmap indexes
6. Idea behind fact index:
Thinner version of fact table
Index takes up less space than fact table
Fewer I/Os required to scan it
7. Index has 1 index entry per fact table row
Regardless of how many columns are in the index
8. Sometimes an index has all the data you need
Allows index-only query plan
Not necessary to access the actual tuples
Such an index is called a covering index
SELECT COUNT(*) FROM R WHERE A=5
Use index on A
Count number of <5,RID> entries
No need to look up records referenced by RIDs
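The index-only plan for the COUNT(*) query can be sketched as follows. The index contents and the name count_equal are assumptions made for illustration; the point is only that the count comes from the index entries and the RIDs are never dereferenced.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index on A, kept as two parallel lists sorted by search-key value.
a_values = [3, 5, 5, 5, 8]
a_rids = [(0, 1), (0, 0), (1, 0), (2, 1), (1, 1)]  # never read by the query below

def count_equal(values, target):
    """Count index entries with the given key, using two binary searches."""
    return bisect_right(values, target) - bisect_left(values, target)

# SELECT COUNT(*) FROM R WHERE A=5, answered from the index alone.
count = count_equal(a_values, 5)
```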
9. Multi-column indexes are very useful in data warehousing
We say such an index has a composite key
Example: B-Tree index on (A,B)
Search key is (A,B) combination
Index entries sorted by A value
Entries with same A value are sorted by B value
Called a lexicographic sort
SELECT SUM(B) FROM R WHERE A=5
Our (A,B) index covers this query!
Coverage vs. size trade-off
More attributes in search key → index covers more queries
More attributes in search key → index takes up more disk space
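A small sketch of the composite-key case, assuming a hypothetical relation R with columns A and B: the entries are sorted lexicographically on (A,B), and SUM(B) for A=5 is computed from the index entries only. A sorted Python list stands in for the B-Tree here, and the helper name sum_b_where_a is illustrative.

```python
from bisect import bisect_left

# Hypothetical rows of R as (A, B); the list position plays the role of the RID.
rows = [(5, 10), (3, 20), (5, 30), (8, 40), (5, 5)]

# Index entries are ((A, B), RID), sorted by A first and by B within equal A.
index_ab = sorted(((a, b), rid) for rid, (a, b) in enumerate(rows))

def sum_b_where_a(index, a):
    """Sum B over entries with the given A, reading index entries only."""
    # ((a,),) sorts before every entry whose key starts with a.
    pos = bisect_left(index, ((a,),))
    total = 0
    while pos < len(index) and index[pos][0][0] == a:
        total += index[pos][0][1]
        pos += 1
    return total

# SELECT SUM(B) FROM R WHERE A=5
total_b = sum_b_where_a(index_ab, 5)
```

Because A is the leading column, all entries with A=5 are contiguous; a predicate on B alone would not enjoy the same locality.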
11. Advantages
efficient computation of joins involving first index columns (or all columns)
Disadvantages
useful only for specific join combinations
▪ for general usage, it is necessary to store a high number of indices
required space may be significant
▪ joins always involve the fact table
12. Base table             Index on Region                  Index on Type
Cust  Region   Type        RecID  Asia  Europe  America     RecID  Retail  Dealer
C1    Asia     Retail      1      1     0       0           1      1       0
C2    Europe   Dealer      2      0     1       0           2      0       1
C3    Asia     Dealer      3      1     0       0           3      0       1
C4    America  Retail      4      0     0       1           4      1       0
C5    Europe   Dealer      5      0     1       0           5      0       1
Query:
Get customers with region = 'Asia' AND type = 'Dealer'
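The query can be answered by ANDing two bit vectors. The sketch below rebuilds the Region and Type bitmaps from the five customers in the table, using Python integers as bit vectors (bit position i corresponds to RecID i+1). The function name build_bitmap_index is illustrative, not taken from any particular system.

```python
# Base table from the slide: (Cust, Region, Type).
customers = [
    ("C1", "Asia", "Retail"),
    ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"),
    ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set
    when row i holds that value."""
    bitmaps = {}
    for pos, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << pos)
    return bitmaps

region_index = build_bitmap_index(customers, 1)
type_index = build_bitmap_index(customers, 2)

# region = 'Asia' AND type = 'Dealer' becomes a single bitwise AND.
matches = region_index["Asia"] & type_index["Dealer"]
result = [customers[pos][0] for pos in range(len(customers)) if (matches >> pos) & 1]
```

Only C3 has a 1 in both the Asia and the Dealer bit vectors, so it is the only row that survives the AND.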
13. Good if domain cardinality small
Most useful for attributes with low or medium cardinality
▪ Not good for something like LastName
14. Index intersection plans with bitmap indexes are fast
Just perform bitwise AND!
Index intersection with B-Trees requires a join
15. Save space for low-cardinality attributes
As compared to a B-Tree or Hash index
16. Bit vectors can be compressed
Compression Pros and Cons
Reduce storage space → reduce number of I/Os required
Need to compress/uncompress → increase CPU work required
Each compression scheme negotiates this trade-off differently
Operate directly on compressed bitmap → improved performance
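To illustrate the trade-off, here is a sketch using plain run-length encoding. The slides do not name a compression scheme, so run-length encoding is an assumption picked for simplicity, and the function names are hypothetical. Note that count_ones works on the compressed runs without decompressing them.

```python
from itertools import groupby

def rle_encode(bits):
    """Compress a bit vector into (bit, run length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def rle_decode(runs):
    """Expand (bit, run length) pairs back into the original bit vector."""
    return [bit for bit, length in runs for _ in range(length)]

def count_ones(runs):
    """Count set bits directly on the compressed form."""
    return sum(length for bit, length in runs if bit == 1)

# A sparse bit vector, as is typical for one value of a bitmap index.
bits = [0] * 6 + [1] * 2 + [0] * 4
runs = rle_encode(bits)
ones = count_ones(runs)
```

Twelve bits shrink to three runs here; a vector that alternates 0 and 1 would instead grow under this encoding, which is one reason schemes differ in how they handle the trade-off.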
17. Bit matrix which precomputes the join between a dimension and the fact table
one column for each dimension RID
one row for each fact table RID
cell (i,j) is 1 if fact table tuple i joins dimension tuple j, 0 otherwise
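The bit matrix can be sketched directly from this definition. The fact and dimension rows below are made-up sample data, and build_join_matrix is a hypothetical helper; list positions stand in for RIDs.

```python
# Hypothetical dimension table (customers) and fact table (sales).
dimension = [
    {"cust": "C1", "region": "Asia"},
    {"cust": "C2", "region": "Europe"},
    {"cust": "C3", "region": "Asia"},
]
fact = [
    {"cust": "C1", "amount": 100},
    {"cust": "C3", "amount": 50},
    {"cust": "C2", "amount": 70},
    {"cust": "C1", "amount": 20},
]

def build_join_matrix(fact, dimension, key):
    """matrix[i][j] is 1 if fact tuple i joins dimension tuple j on `key`, else 0."""
    return [[1 if f[key] == d[key] else 0 for d in dimension] for f in fact]

matrix = build_join_matrix(fact, dimension, "cust")

# Total sales for Asian customers: pick the dimension columns that satisfy the
# predicate, then read the precomputed bits instead of joining at query time.
asia_cols = [j for j, d in enumerate(dimension) if d["region"] == "Asia"]
asia_fact_rids = [i for i, row in enumerate(matrix) if any(row[j] for j in asia_cols)]
asia_total = sum(fact[i]["amount"] for i in asia_fact_rids)
```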
18. Indexing dimensions
attributes frequently involved in selection predicates
if domain cardinality is high, then B-tree index
if domain cardinality is low, then bitmap index
Indices for join
indexing only foreign keys in the fact table is rarely appropriate
star join index should be used with caution (column order issue)
bitmapped join index is suggested (if available)
Indices for group by
use materialized views