1. How to … design efficient SQL
Jonathan Lewis
jonathanlewis.wordpress.com
www.jlcomp.demon.co.uk
Who am I ?
Independent Consultant.
27+ years in IT
23+ using Oracle
Strategy, Design, Review,
Briefings, Educational,
Trouble-shooting
jonathanlewis.wordpress.com
www.jlcomp.demon.co.uk
Member of the Oak Table Network
Oracle ACE Director
Oracle author of the year 2006
Select Editor's choice 2007
O1 visa for USA
Many slides have a foot-note. This is a brief summary of the comments that I should have made whilst displaying the slide, and is there for later reference.
Jonathan Lewis, Efficient SQL, © 2006 - 2011, 2 / 32
2. Highlights
Know the data
Does a good execution path exist ?
Can the optimizer find that path ?
Knowing the data
How much data?
Where is it ?
3. Knowing the data - conflict
Your knowledge of the data
The optimizer's model of the data
Choice of Strategies
Lots of little jobs
How many
How little (how precise)
One big job
4. Common Outcomes
You think the task is
Small Big
Oracle thinks
Good Bad
Small
the task is
Plan Plan
Bad Good
Big
Plan Plan
Optimizer problems
Correlated columns
Uneven data distribution
Aggregate subqueries
Non-equality joins
Bind variables
5. Know the metadata
select
        table_owner,
        table_name,
        index_name,
        column_name
from
        dba_ind_columns
order by
        table_owner,
        table_name,
        index_name,
        column_position
;

AP_GRP_FK_I     GRP_ID
AP_GRP_ROLE_I   GRP_ID   (compress)
                ROLE_ID
AP_ORG_AP_I     ORG_ID   (compress 1)
                AP_ID
AP_ORG_FK_I     ORG_ID
AP_PER_AP_I     PER_ID   (compress 1)
                AP_ID
AP_PER_FK_I     PER_ID
AP_PK           AP_ID
                (compress)
AP_ROLE_FK_I    ROLE_ID
AP_UD_I         TRUNC(UPD_DATE)   (drop, compress, coalesce)
A simple query, and a little thought, can show us indexes which could be dropped or made more efficient.
Know the data (a)
select
        colX,
        count(*) ct
from t1
group by colX
order by colX
;

COLX          CT
CHK           12
COM      3252534
LDD          314
PD          1821
VAL          108
XRF           32
Look for odd data patterns, and think how you can take advantage of them.
6. Know the data (b)
select
        colX,
        count(*) ct
from t1
group by colX
order by colX
;

COLX      CT
1          9
2         12
3         12
4          8
5          7
6          9
...
9997       1
9998       1
9999       1
10000      1

sample (5)
sample block (5)
sample block (5, 2)           -- 9i
sample block (5, 2) seed(N)   -- 10g
A simple query may not help - but it is still a good starting point.
Know the data (c)
select
        ct, count(*)
from (
        select
                colX,
                count(*) ct
        from t1
        group by colX
)
group by ct
order by ct
;

CT    COUNT(*)
1         9001
...
6           37
7           78
8           94
9          117
10         112
11         126
12          99      <- there are 99 values that appear 12 times
13          97
14          86
15          49
16          32
...
22           1
With a little extra sophistication we get an interesting idea about the number of rows that might be returned for "column = constant".
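The "count of counts" idea above is portable. As a minimal sketch - using Python's built-in sqlite3 rather than Oracle, with the table name t1 and column colX taken from the slide's example, and deliberately skewed demo data invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t1 (colX integer)")

# Skewed demo data: value 0 appears 9001 times, values 1..99 appear 12 times each.
con.executemany("insert into t1 values (?)", [(0,)] * 9001)
con.executemany("insert into t1 values (?)",
                [(v,) for v in range(1, 100) for _ in range(12)])

# For each frequency ct, how many distinct values occur ct times?
rows = con.execute("""
    select ct, count(*) as num_values
    from (select colX, count(*) as ct from t1 group by colX)
    group by ct
    order by ct
""").fetchall()

for ct, num_values in rows:
    print(ct, num_values)
```

Running this prints `12 99` and `9001 1` - exactly the kind of summary that tells you what "column = constant" is likely to return for a popular versus an unpopular value.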
7. Know the data (d)
select
        blocks, count(*)
from (
        select
                /*+ index(t1(colX)) */
                colX,
                count(
                        distinct substr(rowid,1,15)
                ) blocks
        from t1
        group by colX
)
group by blocks
order by blocks
;

BLOCKS    COUNT(*)
1             9001
...
6               43
7               83
8              107
9              126
10             120
11             125
12             119
13              90      <- there are 90 values that are scattered across 13 blocks
14              69
15              42
16              28
...
19               2
We also need to know about the number of blocks accessed.
Know the data (e)
select /*+ index(t,"T1_I1") */
count(*) nrw,
count(distinct sys_op_lbid(49721,'L',t.rowid)) nlb,
count(distinct hextoraw(
sys_op_descend("DATE_ORD") ||
sys_op_descend("SEQ_ORD")
)) ndk,
sys_op_countchg(substrb(t.rowid,1,15),1) clf
from
"TEST_USER"."T1" t
where
"DATE_ORD" is not null
or "SEQ_ORD" is not null
;
This is a query used by the dbms_stats package to collect index stats.
8. Know the data (f)
select
keys_per_leaf, count(*) blocks
from (
select
sys_op_lbid(49721,'l',t.rowid) block_id,
count(*) keys_per_leaf
from
t1 t
where
{index_columns are not all null}
group by
sys_op_lbid(49721,'l',t.rowid)
)
group by keys_per_leaf
order by keys_per_leaf
;
We can take advantage of some of the functions to analyse the index quality.
Know the data (g)
Steady-state index:

KEYS_PER_LEAF   BLOCKS
17                 206
18                 373
19                 516
20                 678      <- 50% usage for splits
21                 830
22                 979
23               1,094
24               1,178
25               1,201
26               1,274
27               1,252      <- expect 70% usage
28               1,120
29               1,077
30                 980
31                 934
32                 893
33                 809
34                 751
35                 640
36                 738
37                 625
38                 570
39                 539      <- assume 100% full
40                 489

Smashed (FIFO) index:

KEYS_PER_LEAF   BLOCKS
3                  114
4                   39
6                   38
7                   39
13                  37
14                   1
21                   1
27                  15
28                   3
39                   1
54                   6
55                   3
244                  1
281                  1
326                  8
9. Draw the query - requirement
Orders in the last week where
        the customer is in London
        the supplier is from Leeds
        there is a supplier elsewhere

select {columns}
from
        customers cus,
        orders ord,
        order_lines orl,
        products prd1,
        suppliers sup1
where cus.location = 'LONDON'
and ord.id_customer = cus.id
and ord.date_placed between sysdate - 7 and sysdate
and orl.id_order = ord.id
and prd1.id = orl.id_product
and sup1.id = prd1.id_supplier
and sup1.location = 'LEEDS'
and exists (
select null
from product_match mch,
products prd2,
suppliers sup2
where mch.id_product = prd1.id
and prd2.id = mch.id_product_sub
and sup2.id = prd2.id_supplier
and sup2.location != 'LEEDS'
)
Draw the query - outline
Orders in the last week where
        the customer is in London
        the supplier is from Leeds
        there is a supplier elsewhere

[Diagram: join tree with Order_lines at the top, linked to recent Orders (joined to London Customers), to Products (joined to Leeds Suppliers), and via an "Exists" branch to Product_match and Products (joined to non-Leeds Suppliers).]
10. Draw the query - indexes
[Diagram: the same join tree annotated with the available indexes - primary key (PK) and foreign key (FK) indexes on each join, a Date index on Orders, and Location indexes on both Suppliers tables and on Customers.]
Draw the query - statistics
[Diagram: the join tree annotated with statistics - Order_lines is huge, 1:10 per order, good clustering; Orders is big, date 1:2,500, good clustering and good caching for recent data; joins of 1:10 / 1:150 are totally random; Customers is small.]
11. Sketch in paths - strategy
• Pick a starting point
        - How many rows will I start with?
        - How efficiently can I get them?
        - The first step may be inefficient (it only happens once)
• How do I get to the next table?
        - How many times do I make the step?
        - How precise is the access path?
        - How much data do I now have?
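The strategy above is essentially back-of-envelope arithmetic: start with an estimated row count, then multiply by a fan-out (or filter factor) at each join step. A minimal sketch, where every table name and number is invented purely for illustration:

```python
# Each step is (description, starting_rows or None, fan-out per row so far).
# A fan-out > 1 models a one-to-many join; < 1 models a filtering predicate.
steps = [
    ("orders (last week, London customers)", 400, 1.0),   # starting row estimate
    ("order_lines per order",                None, 5.0),  # 1:5 fan-out
    ("products per order_line",              None, 1.0),  # 1:1 lookup
    ("suppliers filter (Leeds only)",        None, 0.1),  # 90% filtered out
]

rows = 1.0
for name, start, fanout in steps:
    rows = start if start is not None else rows * fanout
    print(f"{name:45s} -> ~{rows:,.0f} rows")
```

Sketching the numbers like this (400 -> 2,000 -> 2,000 -> 200) shows where the volume explodes and which step therefore needs the most precise access path.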
Sketch in paths - analysis
[Diagram: the join tree from the previous slides with the tables numbered 1-8 in the chosen visit order.]
http://jonathanlewis.wordpress.com/2010/03/04/sql-server-2
http://www.embarcadero.com/master-sql-tuners-oracle-lewis-hailey
12. Case Study (a)
select "distinct" is always a bit little suspect,
distinct trx.id_contract suggesting an error in a join clause or
a rewrite with an existence subquery.
from (The latter is not viable in this case)
transactions trx,
contracts con,
transaction_types tty
where
trx.id_ttype = tty.id
and trx.id_contract = con.id
and con.id_office = :b1
and tty.qxt <> 'NONE'
and trx.created between :b2 and :b3
and trx.error = 0
;
The SQL is a little odd - it's creating a drop-down list for an OLTP system to show contracts that have had some work done on them in a given date range.
Case Study (b)
| Id| Operation | Name |Rows |Bytes | Cost |
| 0| SELECT STATEMENT | | | | 14976 |
| 1| SORT UNIQUE | | 7791 | 304K| 14976 |
| 2| FILTER | | | | |
| 3| HASH JOIN | | 7798 | 304K| 14974 |
| 4| VIEW | | 9819 |98190 | 1599 |
| 5| HASH JOIN | | | | |
| 6| INDEX RANGE SCAN | CON_OFF_FK | 9819 |98190 | 35 |
| 7| INDEX FAST FULL SCAN| CON_PK | 9819 |98190 | 1558 |
| 8| HASH JOIN | | 7798 | 228K| 13374 |
| 9| TABLE ACCESS FULL | TRANS_TYPES | 105 | 945 | 3 |
| 10| TABLE ACCESS FULL | TRANSACTIONS | 7856 | 161K| 13370 |
The AWR history showed 11 different plans in the previous week. The most resource-intensive bit was always the scan on transactions.
13. Case Study (c)
[Diagram: Transactions in the centre, joined to TX_types on tty(id) = trx(id_ttype) and to Contracts on con(id) = trx(id_contract).]

Transactions: 12,000 rows per day; almost all have error = 0; data for the same day is well packed; created between … (indexed).
TX_types: 100 rows; qxt != 'NONE' gives no exclusions.
Contracts: 240 offices; id_office = … (indexed); contracts per office: 100 to 18,000; histogram on Office ID.
This picture is simple, and we don't need all the details to see the problems and possible solutions. Bind variables and histograms don't go well together.
Case Study (d)
The variation in contracts per office is difficult.
We're allowed one plan (due to bind variables -- but see 11g).
Make the worst execution plan "good enough".
The date range is usually one day.
Start at table transactions for constant response times.
Data cluster:
        Date-based transactions are well clustered
        Contract-based transactions are scattered over time
The database is "young" and will be growing - a lot.
The best plan depends on the office that wants the data - a small office would want to start at contracts, a large office at transactions ... at present.
14. Case Study (e)
Get rid of the histogram on office_id
Use hints (if necessary) to force the only "constant volume" plan
Target plan:
hash join
        table access full transaction_types
        nested loop
                table access by rowid transactions
                        index range scan transactions_idx
                table access by rowid contracts
                        index range scan contracts_idx
We get rid of the histogram that is introducing instability. Future growth in contracts means we don't want to start at contracts. Hinting may be needed.
Case Study (f)
select
/*+
leading (trx con tty)
index(trx(created))
use_nl(con) index(con(id))
use_hash(tty) swap_join_inputs(tty) full(tty)
*/
distinct trx.id_contract
from
transactions trx,
contracts con,
transaction_types tty
where
...
If you have to hint, you need to be thorough - get the complete join order, every access method, and every join method. Note the 10g index hints.
15. Case Study (g)
Reduce work with better indexing
Modified Indexes (option a - fairly safe)
contracts(id, id_office)
Target plan:
hash join
        table access full transaction_types
        nested loop
                table access by rowid transactions
                        index range scan transactions_idx
                index range scan contracts_idx
Create a non-unique index with an extra column to support the PK to get rid of the table access. Being high precision this is probably a safe index change.
Case Study (h)
Reduce work with better indexing
Modified Indexes (option b - needs careful testing)
contracts(id, id_office)
transactions(created, id_con, error, id_ttype)
Target plan:
hash join
        table access full transaction_types
        nested loop
                index range scan transactions_idx
                index range scan contracts_idx
Adding columns to the TX index allows an index-only access. But the new index is much bigger with a worse clustering_factor, so may cause problems.
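The index-only access of option (b) is easy to demonstrate outside Oracle. As a hedged analogy - using SQLite's EXPLAIN QUERY PLAN via Python's sqlite3, with column names taken from the slide (SQLite's optimizer and plan format differ from Oracle's, so this only illustrates the covering-index idea, not the exact plan on the slide):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table transactions (
        created   text,
        id_con    integer,
        error     integer,
        id_ttype  integer,
        payload   text
    )
""")
# The wider index from option (b): every column the query touches.
con.execute(
    "create index trx_cov on transactions(created, id_con, error, id_ttype)"
)

# All referenced columns are in the index, so no table access is needed.
plan = con.execute("""
    explain query plan
    select id_con, id_ttype
    from transactions
    where created between '2011-01-01' and '2011-01-02'
      and error = 0
""").fetchall()

detail = " ".join(row[-1] for row in plan)
print(detail)
```

The plan detail reports the query being satisfied with a covering index - the SQLite counterpart of dropping the "table access by rowid transactions" line from the target plan.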
16. Case Study (i)
select con.id
from contracts con
where id_office = :b1
and exists (
select null
from
transactions trx,
transaction_types tty
where trx.id_contract = con.id
and trx.created between :b2 and :b3
and trx.error = 0
and tty.id = trx.id_ttype
and tty.qxt <> 'NONE'
)
;
An alternative style of query - but the workload increases with time, and the "best" indexing for transactions would be different.
Summary
Know the data
Draw the picture
Identify the problems
Bad indexing
Bad statistics
Optimizer deficiencies
Structure the query with hints