2. Database Tuning
◮ Make a database application run more quickly
  ◮ higher throughput
  ◮ lower response time
◮ Auto-tuning / self-tuning
  ◮ Better performance
  ◮ Easier manageability
◮ Query workload
  ◮ Online transaction processing (OLTP)
  ◮ Decision support systems (DSS) / Online analytical processing (OLAP) / Data warehousing
CS5226: Sem 2, 2012/13 Introduction 2
3. Anatomy of DBMS
(Hellerstein, Stonebraker, Hamilton, 2007)
4. Query Optimization
Query:
  select R.X, S.Y, T.Z
  from R, S, T
  where R.A = S.A
  and R.B = T.B

Internal Query Representation:
  π X,Y,Z
    ⋈ R.B=T.B
      ⋈ R.A=S.A
        R
        S
      T

Physical Query Plan:
  project X,Y,Z
    sort-merge join R.B = T.B
      hash join R.A = S.A
        scan table for R
        scan index on (A,Y) for S
      scan table for T
5. Performance Tuning Knobs
◮ Schema tuning
◮ Query tuning
◮ Index & materialized view selection
◮ Statistics tuning
◮ Concurrency control tuning
◮ Data partitioning
◮ Memory tuning
◮ Hardware tuning
6. Schema Tuning
CourseInfo
Module  Prof    Room  Building  Time
CS101   Turing  LT 1  CS        0800
CS400   Turing  LT 1  CS        1400
MU300   Bach    LT 2  Math      1400
MA200   Newton  LT 2  Math      1000
CS101   Turing  LT 2  Math      1200

Schedule
Room  Time  Module
LT 1  0800  CS101
LT 1  1400  CS400
LT 2  1400  MU300
LT 2  1000  MA200
LT 2  1200  CS101

Facility
Room  Building
LT 1  CS
LT 2  Math

Course
Module  Prof
CS101   Turing
CS400   Turing
MU300   Bach
MA200   Newton
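The decomposition can be checked mechanically. The following is a minimal sketch using Python's built-in sqlite3 module; the column types are assumptions, and the Schedule row for LT 2 at 1400 is taken to be MU300 so that it matches CourseInfo. Joining Schedule, Facility and Course should give back exactly the five CourseInfo rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Schedule (Room text, Time text, Module text);
    create table Facility (Room text, Building text);
    create table Course   (Module text, Prof text);
""")
conn.executemany("insert into Schedule values (?, ?, ?)", [
    ("LT 1", "0800", "CS101"), ("LT 1", "1400", "CS400"),
    ("LT 2", "1400", "MU300"), ("LT 2", "1000", "MA200"),
    ("LT 2", "1200", "CS101"),
])
conn.executemany("insert into Facility values (?, ?)",
                 [("LT 1", "CS"), ("LT 2", "Math")])
conn.executemany("insert into Course values (?, ?)", [
    ("CS101", "Turing"), ("CS400", "Turing"),
    ("MU300", "Bach"), ("MA200", "Newton"),
])

# Rebuild CourseInfo by joining the three smaller tables.
course_info = conn.execute("""
    select s.Module, c.Prof, s.Room, f.Building, s.Time
    from Schedule s
    join Facility f on f.Room = s.Room
    join Course c on c.Module = s.Module
    order by s.Room, s.Time
""").fetchall()

for row in course_info:
    print(row)
```

In the decomposed schema each fact (a room's building, a module's professor) is stored once, at the price of two joins whenever the full CourseInfo view is needed.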
7. Query Tuning
Q1: select c.cname
    from Customer c
    where 1000 < (select sum(o.totalprice)
                  from Order o
                  where o.cust# = c.cust#)

Q2: select c.cname
    from Customer c join Order o
         on c.cust# = o.cust#
    group by c.cust#, c.cname
    having 1000 < sum(o.totalprice)
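Q1 and Q2 can be run side by side to confirm that they return the same customers. This is a sketch on made-up sample rows using Python's sqlite3 module; the table Order is renamed Orders and the column cust# is renamed custno (both are assumptions of this example, chosen so that the identifiers need no quoting).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Customer (custno integer primary key, cname text);
    create table Orders (orderno integer primary key,
                         custno integer, totalprice real);
    -- Hypothetical sample data: Cy has no orders at all.
    insert into Customer values (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    insert into Orders values (10, 1, 700), (11, 1, 400), (12, 2, 900);
""")

# Q1: correlated subquery, conceptually evaluated once per customer.
q1 = conn.execute("""
    select c.cname
    from Customer c
    where 1000 < (select sum(o.totalprice)
                  from Orders o
                  where o.custno = c.custno)
""").fetchall()

# Q2: the same question as a join followed by grouping.
q2 = conn.execute("""
    select c.cname
    from Customer c join Orders o
         on c.custno = o.custno
    group by c.custno, c.cname
    having 1000 < sum(o.totalprice)
""").fetchall()

print(q1, q2)
```

A customer without orders is excluded by both forms: in Q1 the subquery yields NULL and the comparison is not true, and in Q2 the join produces no row for that customer.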
8. Query Tuning (cont.)
Q3: select c.cust#, c.cname, sum(o.totalprice) as T
    from Customer c join Order o
         on c.cust# = o.cust#
    group by c.cust#, c.cname

Q4: select c.cust#, c.cname, T
    from Customer c,
         (select cust#, sum(totalprice) as T
          from Order
          group by cust#) as o
    where c.cust# = o.cust#
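The difference between Q3 and Q4 is where the aggregation happens: Q3 joins first and then groups, while Q4 groups the orders first and joins the much smaller result. A minimal sqlite3 sketch on made-up rows (with Order renamed Orders and cust# renamed custno, both assumptions of this example) shows that the two forms agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Customer (custno integer primary key, cname text);
    create table Orders (orderno integer primary key,
                         custno integer, totalprice real);
    -- Hypothetical sample data.
    insert into Customer values (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    insert into Orders values (10, 1, 700), (11, 1, 400), (12, 2, 900);
""")

# Q3: join first, aggregate afterwards.
q3 = sorted(conn.execute("""
    select c.custno, c.cname, sum(o.totalprice) as T
    from Customer c join Orders o
         on c.custno = o.custno
    group by c.custno, c.cname
""").fetchall())

# Q4: aggregate the orders first, then join one row per customer.
q4 = sorted(conn.execute("""
    select c.custno, c.cname, T
    from Customer c,
         (select custno, sum(totalprice) as T
          from Orders
          group by custno) as o
    where c.custno = o.custno
""").fetchall())

print(q3)
print(q4)
```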
9. Query Tuning (cont.)
Q5: select distinct R.A, S.X
    from R, S
    where R.B = S.Y

Q6: select R.A, S.X
    from R, S
    where R.B = S.Y
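Q5 and Q6 return the same rows only when the join cannot produce duplicates, in which case the duplicate elimination in Q5 is wasted work. The sketch below uses Python's sqlite3 module and made-up rows. It relies on one sufficient condition that is an assumption of this example rather than something stated on the slide: A is a key of R and Y is a key of S.

```python
import sqlite3

def run(r_rows, s_rows):
    """Load R(A, B) and S(X, Y), then return the sorted results of Q5 and Q6."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table R (A integer, B integer)")
    conn.execute("create table S (X integer, Y integer)")
    conn.executemany("insert into R values (?, ?)", r_rows)
    conn.executemany("insert into S values (?, ?)", s_rows)
    q5 = conn.execute(
        "select distinct R.A, S.X from R, S where R.B = S.Y").fetchall()
    q6 = conn.execute(
        "select R.A, S.X from R, S where R.B = S.Y").fetchall()
    conn.close()
    return sorted(q5), sorted(q6)

# Y is not a key of S: every R row matches two identical S rows.
dup_q5, dup_q6 = run([(1, 10), (2, 10)], [(7, 10), (7, 10)])

# A is a key of R and Y is a key of S: no duplicates can arise.
key_q5, key_q6 = run([(1, 10), (2, 20)], [(7, 10), (8, 20)])

print(dup_q5, dup_q6)
print(key_q5, key_q6)
```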
10. Index Tuning
select A, B, C
from R
where 10 < A < 20
Access methods for selection queries:
◮ Table scan
◮ Use one or more indexes
11. B+-tree index
Clustered index on (age)
  Root: 18
  Internal nodes: [12 17] [22 24]
  Leaf pages (data records):
    [(Moe,10,55,180) (Curly,10,65,171)]
    [(Larry,12,70,175) (Bob,15,60,178)]
    [(Alice,17,48,175) (Lucy,17,45,170)]
    [(John,18,59,182) (Charlie,20,69,173)]
    [(Marcie,22,50,165) (Linus,23,60,166)]
    [(Sally,24,48,169) (Tom,25,56,176)]

Relation R
name     age  weight  height
Moe      10   55      180
Curly    10   65      171
Larry    12   70      175
Bob      15   60      178
Alice    17   48      175
Lucy     17   45      170
John     18   59      182
Charlie  20   69      173
Marcie   22   50      165
Linus    23   60      166
Sally    24   48      169
Tom      25   56      176

Unclustered index on (weight)
  Root: 59
  Internal nodes: [50] [65]
  Leaf pages (weight, RID):
    [(45, RID32) (48, RID31) (48, RID61)]
    [(50, RID51) (55, RID11) (56, RID62)]
    [(59, RID41) (60, RID22) (60, RID52)]
    [(65, RID12) (69, RID42) (70, RID21)]
12. Index access methods
◮ Index scan
◮ Index seek [+ RID lookup ]
◮ Index intersection [+ RID lookup ]
13. Index scan
select height
from Student

Index on (height,weight)
  Root: (175,48)
  Internal nodes: [(170,45)] [(178,60)]
  Leaf pages (height, weight, RID):
    [(165,50, RID51) (166,60, RID52) (169,48, RID61)]
    [(170,45, RID32) (171,65, RID12) (173,69, RID42)]
    [(175,48, RID31) (175,70, RID21) (176,56, RID62)]
    [(178,60, RID22) (180,55, RID11) (182,59, RID41)]
14. Index seek
select weight
from Student
where weight between 55 and 65

Index on (weight)
  Root: 59
  Internal nodes: [50] [65]
  Leaf pages (weight, RID):
    [(45, RID32) (48, RID31) (48, RID61)]
    [(50, RID51) (55, RID11) (56, RID62)]
    [(59, RID41) (60, RID22) (60, RID52)]
    [(65, RID12) (69, RID42) (70, RID21)]
15. Index seek + RID lookups
select weight
from Student
where weight between 55 and 59
and age ≥ 20

select name
from Student
where weight between 55 and 59

Index on (weight)
  Root: 59
  Internal nodes: [50] [65]
  Leaf pages (weight, RID):
    [(45, RID32) (48, RID31) (48, RID61)]
    [(50, RID51) (55, RID11) (56, RID62)]
    [(59, RID41) (60, RID22) (60, RID52)]
    [(65, RID12) (69, RID42) (70, RID21)]

Data Pages
  [(Moe,10,55,180) (Curly,10,65,171)]  [(Larry,12,70,175) (Bob,15,60,178)]  [(Alice,17,48,175) (Lucy,17,45,170)]
  [(Sally,24,48,169) (Tom,25,56,176)]  [(Marcie,22,50,165) (Linus,23,60,166)]  [(John,18,59,182) (Charlie,20,69,173)]
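The access path on this slide can be sketched in a few lines of Python. The sketch models only the leaf level of the index as a sorted list and uses binary search in place of the tree descent. It reads a RID such as RID62 as "page 6, slot 2", an interpretation inferred from the slide's data, and `seek_and_lookup` is a hypothetical helper name. It covers both reasons for a RID lookup: fetching a column that is not in the index (name) and checking a predicate on one (age).

```python
from bisect import bisect_left, bisect_right

# Data pages from the slide; records are (name, age, weight, height).
pages = {
    1: [("Moe", 10, 55, 180), ("Curly", 10, 65, 171)],
    2: [("Larry", 12, 70, 175), ("Bob", 15, 60, 178)],
    3: [("Alice", 17, 48, 175), ("Lucy", 17, 45, 170)],
    4: [("John", 18, 59, 182), ("Charlie", 20, 69, 173)],
    5: [("Marcie", 22, 50, 165), ("Linus", 23, 60, 166)],
    6: [("Sally", 24, 48, 169), ("Tom", 25, 56, 176)],
}

# Leaf level of the index on (weight): sorted (weight, (page, slot)) entries.
weight_index = sorted(
    (rec[2], (page, slot))
    for page, recs in pages.items()
    for slot, rec in enumerate(recs, start=1)
)
keys = [weight for weight, _ in weight_index]

def seek_and_lookup(lo, hi, predicate=lambda rec: True):
    """Index seek on lo <= weight <= hi, then one RID lookup per entry."""
    start, end = bisect_left(keys, lo), bisect_right(keys, hi)
    result = []
    for _, (page, slot) in weight_index[start:end]:
        rec = pages[page][slot - 1]   # RID lookup: fetch the full record
        if predicate(rec):            # residual predicate on non-indexed columns
            result.append(rec)
    return result

# select name ... where weight between 55 and 59
names = [rec[0] for rec in seek_and_lookup(55, 59)]

# select weight ... where weight between 55 and 59 and age >= 20
weights = [rec[2] for rec in seek_and_lookup(55, 59, lambda rec: rec[1] >= 20)]

print(names, weights)
```

Each qualifying index entry costs one page access in the worst case, which is why an unclustered index seek pays off only when the range is selective.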
16. Index intersection
select height, weight
from Student
where height between 164 and 170
and weight between 50 and 59

Index on (height)
  Root: 175
  Internal nodes: [170] [178]
  Leaf pages (height, RID):
    [(165, RID51) (166, RID52) (169, RID61)]
    [(170, RID32) (171, RID12) (173, RID42)]
    [(175, RID31) (175, RID21) (176, RID62)]
    [(178, RID22) (180, RID11) (182, RID41)]

Index on (weight)
  Root: 59
  Internal nodes: [50] [65]
  Leaf pages (weight, RID):
    [(45, RID32) (48, RID31) (48, RID61)]
    [(50, RID51) (55, RID11) (56, RID62)]
    [(59, RID41) (60, RID22) (60, RID52)]
    [(65, RID12) (69, RID42) (70, RID21)]
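Index intersection on this example can be sketched as follows: seek each index separately, then keep only the RIDs found in both results. The sketch models just the leaf entries of the two indexes as sorted Python lists; `seek` is a hypothetical helper name. Because the query selects only height and weight, both values come from the index entries and no RID lookup into the data pages is needed.

```python
from bisect import bisect_left, bisect_right

# Leaf entries of the two indexes from the slide, sorted by key.
height_index = [
    (165, "RID51"), (166, "RID52"), (169, "RID61"), (170, "RID32"),
    (171, "RID12"), (173, "RID42"), (175, "RID21"), (175, "RID31"),
    (176, "RID62"), (178, "RID22"), (180, "RID11"), (182, "RID41"),
]
weight_index = [
    (45, "RID32"), (48, "RID31"), (48, "RID61"), (50, "RID51"),
    (55, "RID11"), (56, "RID62"), (59, "RID41"), (60, "RID22"),
    (60, "RID52"), (65, "RID12"), (69, "RID42"), (70, "RID21"),
]

def seek(index, lo, hi):
    """Return {RID: key} for all entries with lo <= key <= hi."""
    keys = [key for key, _ in index]
    start, end = bisect_left(keys, lo), bisect_right(keys, hi)
    return {rid: key for key, rid in index[start:end]}

heights = seek(height_index, 164, 170)   # height between 164 and 170
weights = seek(weight_index, 50, 59)     # weight between 50 and 59

# Intersect the two RID sets; the output columns are the index keys themselves.
result = sorted((heights[rid], weights[rid])
                for rid in heights.keys() & weights.keys())
print(result)
```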
17. Index Tuning
Q1: select A, B, C
    from R
    where 10 < A < 20
    and 20 < B < 100

Q2: select B, C, D
    from R
    where 50 < B < 100
    and 60 < D < 80
18. Materialized View Tuning
Q1: select R.B
    from R, S
    where R.A = S.X
    and S.Y > 100

MV1: select R.A, R.B, S.X, S.Y
     from R, S
     where R.A = S.X

Q1’: select B
     from MV1
     where Y > 100
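The rewrite of Q1 into Q1’ can be tried out with Python's sqlite3 module. SQLite has no materialized views, so this sketch emulates MV1 with a table built by `create table ... as select` (an assumption of the example; such a table is not refreshed when R or S change). The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table R (A integer, B integer);
    create table S (X integer, Y integer);
    -- Hypothetical sample data.
    insert into R values (1, 11), (2, 22), (3, 33);
    insert into S values (1, 50), (2, 150), (3, 300);
    -- Emulated materialized view: the join result is stored as a table.
    create table MV1 as
        select R.A as A, R.B as B, S.X as X, S.Y as Y
        from R, S
        where R.A = S.X;
""")

# Q1: computes the join at query time.
q1 = sorted(conn.execute("""
    select R.B
    from R, S
    where R.A = S.X
    and S.Y > 100
""").fetchall())

# Q1': reads the precomputed join and only applies the selection.
q1_rewritten = sorted(conn.execute("""
    select B
    from MV1
    where Y > 100
""").fetchall())

print(q1, q1_rewritten)
```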
19. Tuning of indexes & materialized views
Given a query workload and a disk space constraint, which
configuration of indexes & materialized views gives the best
performance for the workload?
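As a toy illustration of this selection problem, the sketch below enumerates every subset of a few candidate structures and keeps the one with the largest total saving that fits the space budget. All names, sizes and savings are hypothetical. The sketch also assumes that savings simply add up, which ignores interactions between indexes, and exhaustive enumeration is practical only for a handful of candidates.

```python
from itertools import combinations

# Hypothetical candidates: name -> (size in MB, estimated workload cost saving).
candidates = {
    "idx_R_A": (40, 30),
    "idx_R_B": (60, 45),
    "idx_R_AB": (90, 60),
    "MV1": (120, 80),
}
budget_mb = 150

def best_configuration(candidates, budget):
    """Exhaustively search all subsets that fit within the space budget."""
    best, best_saving = (), 0
    for k in range(1, len(candidates) + 1):
        for config in combinations(sorted(candidates), k):
            size = sum(candidates[name][0] for name in config)
            saving = sum(candidates[name][1] for name in config)
            if size <= budget and saving > best_saving:
                best, best_saving = config, saving
    return best, best_saving

best, saving = best_configuration(candidates, budget_mb)
print(best, saving)
```

With these numbers the two indexes idx_R_B and idx_R_AB together fill the budget exactly and beat the single materialized view, even though MV1 has the largest individual saving.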
20. Tuning of statistics
Examples of statistics:
◮ table cardinality
◮ statistics for each column:
  ◮ number of distinct values
  ◮ highest & lowest values
  ◮ frequent values
  ◮ data distribution statistics
◮ multi-column statistics
Issues
◮ What statistics to collect?
◮ When to collect/refresh statistics?
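Several of the statistics listed above can be computed with plain SQL. The sketch below uses Python's sqlite3 module on the twelve-row relation from the B+-tree slide (named Student here) and gathers the table cardinality plus per-column statistics for weight. It shows what the numbers are, not how any particular DBMS collects or stores them.

```python
import sqlite3

# Records are (name, age, weight, height), as on the B+-tree slide.
rows = [
    ("Moe", 10, 55, 180), ("Curly", 10, 65, 171), ("Larry", 12, 70, 175),
    ("Bob", 15, 60, 178), ("Alice", 17, 48, 175), ("Lucy", 17, 45, 170),
    ("John", 18, 59, 182), ("Charlie", 20, 69, 173), ("Marcie", 22, 50, 165),
    ("Linus", 23, 60, 166), ("Sally", 24, 48, 169), ("Tom", 25, 56, 176),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table Student "
             "(name text, age integer, weight integer, height integer)")
conn.executemany("insert into Student values (?, ?, ?, ?)", rows)

# Table cardinality, number of distinct values, lowest and highest value.
cardinality, distinct, lowest, highest = conn.execute("""
    select count(*), count(distinct weight), min(weight), max(weight)
    from Student
""").fetchone()

# Frequent values: weights that occur more than once.
frequent = conn.execute("""
    select weight, count(*)
    from Student
    group by weight
    having count(*) > 1
    order by weight
""").fetchall()

print(cardinality, distinct, lowest, highest, frequent)
```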
21. Tuning of concurrency control
◮ Concurrency control protocols
  ◮ Two-phase locking
  ◮ Snapshot isolation
◮ Consistency vs concurrency tradeoff
  ◮ ANSI SQL isolation levels

Isolation Level    Dirty Read    Unrepeatable Read  Phantom Read
READ UNCOMMITTED   possible      possible           possible
READ COMMITTED     not possible  possible           possible
REPEATABLE READ    not possible  not possible       possible
SERIALIZABLE       not possible  not possible       not possible
22. Tuning of memory
◮ How to optimize memory allocation?
Oracle Memory Model (Dageville & Zait, VLDB 2002)
23. Data partitioning
◮ Increase data availability
◮ Decrease administrative cost
◮ Improve query performance
Shared-nothing parallel DBMS (Hellerstein, et al., 2007)
24. References
Additional Readings:
◮ J.M. Hellerstein, M. Stonebraker, J. Hamilton, Architecture of a
Database System, Foundations and Trends in Databases, 1(2),
2007, 141-259.