2. Karen Morton – Pro Oracle SQL
@karen_morton
Maria Colgan – Product Manager for Oracle
Optimizer @SQLMaria
Toon Koppelaars - @ToonKoppelaars
Tune My Query - @TuneMyQuery
Jeff Smith – Product Manager for SQL
Developer @thatjeffsmith
Kerry Osborne, Robyn Sands, Riyaj
Shamsudeen and Jared Still – Pro Oracle SQL
3. Under the covers
Parse
Query Transformation
Execution Plan
Oracle I/O
FTS vs Index
Joins
Set Theory
Logical Expressions
Subquery Factoring
4. What you will get -> the concepts behind Oracle,
as well as some key concepts that help you
formulate your queries for efficiency
What you won't get -> a silver bullet for
writing the most efficient SQL. That takes
practice and a great deal of understanding of
Oracle internals, as well as understanding the
data. There is no one way to write SQL
5. Results don't equal an efficient query
Your queries impact the overall performance
of the system
The database is not a black box
The more you know the more efficiently you
will write queries
7. Check Syntax
Validate Objects
Check permission
Does the query already exist in the shared pool?
If yes, go straight to execution using the existing
execution plan (a soft parse)
8. Converts the statement to a hash value for comparison
Must match exactly, including case
Select * from Employee;
Select * from employee;
9. Comments
Select /* not the same */ * from employee;
Select * from employee;
*** This trick is very useful when testing
queries
10. Values
Select * from employee where id = 1234;
Select * from employee where id = 5678;
11. Binds allow values to be different but the
statement to stay the same
Variable v_id number
Exec :v_id := 1234
Select * from employees where id = :v_id;
Variable v_id number
Exec :v_id := 5678
Select * from employees where id = :v_id;
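To see the difference in the shared pool, you can query V$SQL (a sketch; it assumes you have SELECT privilege on the V$ views). The two literal statements show up as two separate cursors, each hard parsed once, while the bind version is one shared cursor executed twice:

```sql
-- Each distinct literal text produces its own cursor;
-- the bind-variable form is shared and simply re-executed.
select sql_id, parse_calls, executions, sql_text
from   v$sql
where  lower(sql_text) like 'select * from employee%';
```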
12. A latch is a type of lock
Required when reading structures from
Oracle's memory
Serialized to protect the memory
"Can I have the latch?" -> No, spin
If not acquired after x attempts, the process is
temporarily put on hold until its turn on the
CPU comes again.
This eats CPU cycles
13. Re-writes the statement if a better plan can be
achieved
Doesn't change the result sets
Ideally, write your query so no transformation
is needed
15. Subquery turned into a join
Select id from employer where admn_id in
(select id from administrator
where id = 1000080);
Select er.id from employer er, administrator a
Where er.admn_id = a.id
And a.id = 1000080;
16. Plan hash value: 150543180

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |    12 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYER     |     1 |    12 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ER_ADMN_FK_I |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ADMN_ID"=1000080)
18. Update claim_batch set claim_type = null
where submitted_by = 'Conversion'
and claim_type = 'Conversion'
and id in
(select clmbt_id from claim
where status = 'Approved' and ee_id in
(select id from employee where pycl_er_id in
(select id from employer where admn_id in
(select id from administrator where id =
1000080 or parent_admn_id =
1000080))))
19. Update claim_batch set claim_type = null
where submitted_by = 'Conversion' and
claim_type = 'Conversion'
and exists (select 0 from claim c,
employee e, employer er, administrator a
where c.clmbt_id = claim_batch.id
and c.status = 'Approved' and
c.ee_id = e.id and e.pycl_er_id = er.id
and er.admn_id = a.id and
(er.admn_id = 1000080 or
a.parent_admn_id = 1000080))
20. Expands views (in-line or stored) into
separate query blocks
Analyzed separately or merged into the query
and evaluated together
Select * from vw_er_user;
Oracle re-writes the query against the view's
underlying query (including any view nested
inside the view) and then creates the execution plan
21. Query block predicate contains a column that
can be used in an index within another query
block
A column that can be used for partition
pruning within another query block
A condition that limits the rows returned
from one of the tables in a joined view
22. Select * from claim_batch cb1,
(select ee_id, id from claim_batch cb2) cb_view
Where cb1.ee_id = cb_view.ee_id(+)
And cb1.id = cb_view.id
And cb1.id > 20000;
25. Analytic or aggregate functions
Set operations
Order By clause
Rownum
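For example (a hypothetical sketch reusing the claim_batch table from the earlier slides), a ROWNUM inside an in-line view forces the view to stay a separate query block:

```sql
-- ROWNUM must be assigned before the outer predicates run,
-- so this in-line view cannot be merged into the outer query.
select cb.id, v.rn
from   claim_batch cb,
       (select id, rownum rn from claim_batch) v
where  cb.id = v.id
and    cb.id > 20000;
```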
26. Apply predicates from a containing query
block into a non-mergeable query block.
Use an index or other filtering of data earlier
in the query plan
Less data = Less work
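Conceptually, the optimizer turns the first form below into the second (a hedged sketch on the claim_batch table; the GROUP BY keeps the view non-mergeable, so pushing the predicate is what saves the work):

```sql
-- Written form: the whole view is aggregated, then filtered.
select v.ee_id, v.cnt
from   (select ee_id, count(*) cnt
        from   claim_batch
        group by ee_id) v
where  v.ee_id = 1234;

-- After predicate pushing (conceptually): the filter runs first,
-- so an index on claim_batch(ee_id) can be used and far fewer
-- rows are aggregated.
select v.ee_id, v.cnt
from   (select ee_id, count(*) cnt
        from   claim_batch
        where  ee_id = 1234
        group by ee_id) v;
```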
30. Statistics are used to determine the cost of
each execution plan the optimizer formulates
Cost is an internal value calculated by the
optimizer to select the most efficient plan
The lowest-cost plan may not always be the most
efficient plan
Cost should not be used by users to decide
which plan is better: evaluate, evaluate
31. Select * from claim_type
where clmc_category = 'Medical';
Total number of rows in table = 1000
Total number of distinct clmc_category = 10
Estimated number of rows returned =
(1/num_distinct) * num_rows =
(1/10) * 1000 = 100
Estimated cardinality = 100 rows (selectivity = 1/10)
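The inputs to that arithmetic come from the data dictionary; assuming statistics are current, you can check them with something like:

```sql
-- num_rows and num_distinct are the values the optimizer
-- plugs into (1/num_distinct) * num_rows.
select t.num_rows, c.column_name, c.num_distinct
from   user_tables t, user_tab_col_statistics c
where  t.table_name  = 'CLAIM_TYPE'
and    c.table_name  = t.table_name
and    c.column_name = 'CLMC_CATEGORY';
```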
32. Two predicates joined with AND condition
Selectivity is determined for each and then
multiplied together
Table claim_type has different categories and
subcategories. Subcategories are exclusive to
categories
33. Select * from claim_type where clmc_subcat =
'MED' and clmc_category = 'Medical';
Total number of rows = 1,000,000
Number of distinct categories = 4
Number of distinct subcategories = 1000
Each distinct subcategory can only exist in one
category: MED is only found in Medical
34. Selectivity for category = .25
Selectivity for subcategory = .001
Overall selectivity = .25 * .001 = .00025,
an estimate of 250 rows
In reality the number of rows is 1000
The estimate is only 25% of the actual cardinality
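One possible fix for correlated columns like these, available from Oracle 11g onward, is a column-group extended statistic; it lets the optimizer learn the combined number of distinct (category, subcategory) pairs instead of multiplying the individual selectivities:

```sql
-- Define a column group on the correlated pair...
select dbms_stats.create_extended_stats(
         ownname   => user,
         tabname   => 'CLAIM_TYPE',
         extension => '(CLMC_CATEGORY,CLMC_SUBCAT)')
from   dual;

-- ...then regather statistics so the group gets its own
-- number-of-distinct-values figure.
begin
  dbms_stats.gather_table_stats(user, 'CLAIM_TYPE');
end;
/
```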
35. Plan stored in library cache for reuse
Execute plan – the plan is a set of instructions:
access method for each object, join order, and
join methods
Fetch rows – retrieves blocks and returns rows
to the application
36. Blocks – not rows
Checks the buffer cache first – logical I/O (LIO)
Physical I/O (PIO) – the most common types:
1. db file sequential reads
2. db file scattered reads
3. direct I/O, which skips placing blocks in the buffer cache
37. LIO drives PIO – Oracle must check the buffer
cache before issuing a PIO request
Workload is characterized in terms of LIO
Aim for the least amount of LIO that satisfies
the results
Sometimes a Full Table Scan (FTS) is better
38. Cost depends on how data is stored as well as
how much is returned
Number of blocks to read and how many
rows will be in the final results
How many throwaway rows – rows checked against
the predicate that don't match (a CPU operation)
As blocks accessed and throwaway increase,
the cost of the FTS goes up
Multiple blocks are read in a single I/O operation
39. An index lists column values and rowids
Value matched to predicate
Table accessed via rowid
Each row retrieved costs at least 2 block reads
40. Index Range/Unique Scan – Predicate returns a
range of data
Root Block -> Branch Block -> Leaf Block
Once the first value is found -> table access by rowid
Repeat until all values are found
Selectivity is important
41. Index Full Scan – Scan every block, read all
rowids and retrieve table rows
No predicate but column list can be satisfied
through an index
Predicate on a non-leading column
Data can be retrieved in sorted order
42. Index skip scan
Predicate is on a non-leading column and the
leading column has relatively few distinct values
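A hypothetical sketch: given a concatenated index whose leading column has only a couple of distinct values, a predicate on the second column alone can still use the index via a skip scan:

```sql
-- Assumed index (not from the slides):
--   create index emp_gender_id_i on employee (gender, id);
-- gender has very few distinct values, so Oracle probes one
-- logical subindex per gender value looking for id = 1234.
select *
from   employee
where  id = 1234;   -- no predicate on the leading column gender
```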
43. Index Fast Full Scan
Multiblock read
All columns exist in the index to satisfy query
At least one column has a not null constraint
44. Joins operate between tables or row sources
Nested Loops – read each row of the outer source
and join to the inner source; best when the outer
row set is smaller
45. (diagram)
46. Sort-merge – tables are read independently,
rows that meet the conditions are sorted by join
key, and the sorted row sets are then merged
47. (diagram)
48. Hash joins – tables are read independently and
the conditions applied. The row source returning
the fewest rows is hashed into memory. The larger
table is then read, the hash function applied, and
matches looked up in the smaller row set
49. (diagram)
50. Cartesian – all rows from one table joined to
all rows in another table
A join condition may have been overlooked
in the where clause
Outer join – returns all rows from one table
and only those rows from the joined table
where the join condition is met.
51. (diagram)
52. www.setgame.com/puzzle.set.htm
Set theory – the study of sets, collections of objects
If you aren't writing queries based on sets
then you aren't using SQL the way it was
meant to be used
Move from procedural or process thinking to
set thinking
53. Procedural or process thinking ->
the flow of each step that needs to be taken
For each row I need to do "x"
if/then/else, while/do loops
Set thinking -> For all rows I need to do "x"
54. If "X" then "Y", where X and Y are both
conditions
Boolean logic (and, or, not) helps filter out rows
and reduce the code path executed
Where (:erID = 1) or (er.id = :nERid)
55. Where (:erID = 1) AND (er.id = :nERid)
Put the condition most likely to evaluate to
false first
Reduce the workload by reaching the answer
quicker
56. Usually AND will allow the optimizer to make
better choices, but not always
With an AND a single choice is more likely
With an OR, two different operations
57. Index scans are not always the most efficient
access method
Predicate selectivity matters
Traverse index for rowids -> fetch rows from
table blocks -> apply the remainder of the predicates
Data distribution matters
58. The index should match the predicate
In a concatenated index, the leading column
should be the one used most frequently as a predicate
Consider the cardinality of predicates and the
selectivity of columns
Cardinality -> the number of rows expected to
be fetched by a predicate or execution step
59. Not every column needs to be indexed
Base indexes on application access and queries
There is a cost associated with each index for
DML
Big columns can be more costly
Applying a function to an indexed column
negates the use of the index
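When the function really is required, a function-based index is the usual workaround (column names here are assumptions, not from the slides):

```sql
-- A plain index on last_name is useless once upper() is applied;
-- a function-based index stores the uppercased value instead.
create index emp_upper_name_i on employee (upper(last_name));

select *
from   employee
where  upper(last_name) = 'SMITH';   -- can now use the index
```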
60. NULLs are not stored in a single-column index
NULLs are stored in a multi-column index
Create an index on the single nullable column
and add a second dummy column:
(claim_type, 0)
This allows the index to be chosen
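A minimal sketch of the trick, using the claim_type column from the earlier update statement (the index name is an assumption):

```sql
-- NULL keys are not stored in a single-column B-tree index, so
-- "where claim_type is null" cannot use one. Adding a constant
-- second expression puts every row, NULLs included, into the index.
create index clmbt_claim_type_i on claim_batch (claim_type, 0);

select id
from   claim_batch
where  claim_type is null;   -- index can now be chosen
```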
61. With Clause
Move parts of a large query with many tables
from the main body to the With clause
Great for nested subqueries
Great for tables referenced repeatedly
Not a silver bullet
62. When you move a query block to the With
clause it is given a name and that name is
then referenced later in the query.
63. select tw.ee_id Ee_Id,
       cb.claim_type,
       decode(cl.substantiated_by, 'pre-substantiated', 'Y', 'N') "Auto-Substantiated",
       ca.action_on,
       count(cl.id) "NumberofClaimLines"
from   claim cl,
       claim_batch cb,
       claim_activity ca,
       (select ee.id as ee_id, el.id as el_id
        from   administrator ad, employer er, employee ee, employer_account ea, election el
        where  ad.id = 233
        and    er.admn_id = ad.id
        and    er.id = 44144
        and    ea.er_id = er.id
        and    ee.pycl_er_id = er.id
        and    el.ee_id = ee.id
        and    el.erac_er_id = er.id
        and    el.erac_actp_cd = ea.actp_cd
        and    el.erac_ends = ea.ends) tw
where  cb.ee_id = tw.ee_id
and    cl.ee_id = tw.ee_id
and    cl.ee_id = cb.ee_id
and    cl.clmbt_id = cb.id
and    cl.elct_id = tw.el_id
and    cl.service_begins between to_date('01/01/2013', 'mm/dd/yyyy')
                             and to_date('01/31/2013', 'mm/dd/yyyy')
and    ca.clm_id = cl.id
and    ca.action = 'Entered'
and    ca.action_on = (select min(ca1.action_on)
                       from   claim_activity ca1
                       where  ca1.clm_id = cl.id
                       and    ca1.action = ca.action)
group by tw.ee_id, cb.claim_type,
         decode(cl.substantiated_by, 'pre-substantiated', 'Y', 'N'), ca.action_on;
64. with EE as (
       select ee.id as ee_id, el.id as el_id
       from   administrator ad, employer er, employee ee, employer_account ea, election el
       where  ad.id = 233
       and    er.admn_id = ad.id
       and    er.id = 44144
       and    ea.er_id = er.id
       and    ee.pycl_er_id = er.id
       and    el.ee_id = ee.id
       and    el.erac_er_id = er.id
       and    el.erac_actp_cd = ea.actp_cd
       and    el.erac_ends = ea.ends)
select tw.ee_id Ee_Id,
       cb.claim_type,
       decode(cl.substantiated_by, 'pre-substantiated', 'Y', 'N') "Auto-Substantiated",
       ca.action_on,
       count(cl.id) "NumberofClaimLines"
from   claim cl, claim_batch cb, claim_activity ca, EE tw
where  cb.ee_id = tw.ee_id
and    cl.ee_id = tw.ee_id
and    cl.ee_id = cb.ee_id
and    cl.clmbt_id = cb.id
and    cl.elct_id = tw.el_id
and    cl.service_begins between to_date('01/01/2013', 'mm/dd/yyyy')
                             and to_date('01/31/2013', 'mm/dd/yyyy')
and    ca.clm_id = cl.id
and    ca.action = 'Entered'
and    ca.action_on = (select min(ca1.action_on)
                       from   claim_activity ca1
                       where  ca1.clm_id = cl.id
                       and    ca1.action = ca.action)
group by tw.ee_id, cb.claim_type,
         decode(cl.substantiated_by, 'pre-substantiated', 'Y', 'N'), ca.action_on
order by 1;
65. WHERE (AD.ID=:B4 OR AD.ID IN
(SELECT DM.ID
FROM ADMINISTRATOR DM
WHERE DM.PARENT_ADMN_ID=:B4 )) AND
ER.ADMN_ID= AD.ID
with ad as (select id from administrator where
id = 1000060 or parent_admn_id =
1000060)
66. Less is better with regard to I/O, physical as
well as logical
Don't do unnecessary work. Keep it simple
In a complex query, start with the most selective
condition: filter early
Select only the columns necessary
67. More is better
The more information provided to Oracle about
your data, the better the plan the optimizer will select
Test with different versions of a query – they need
to return the same results
Keep informed of developments within SQL,
especially when moving between different versions
68. A hint is a directive
Hints can and do alter the execution plan, though
not always, and not always for the good
Don't be afraid, just use them with caution and be
responsible – they should always be reviewed by a
DBA
When you see a hint in code, review it: is it
still relevant?
Sometimes hints are the only route to a
more efficient plan
69. Understand What and How Oracle is doing on
your behalf
Parse - Soft vs Hard
Query Transformation
Execution Plan
Fetch
FTS vs Index
Joins
Set Theory
Logical Expressions
Subquery Factoring
Editor's Notes
No silver bullets! No one way to write a query!
Just because you are receiving the correct results doesn't mean it's an efficient query. The database is a shared resource; queries can and do impact everyone. Ever get that IM from one of us asking what you are doing on a specific database?
Our focus is on the SGA, in particular the buffer cache, and shared pool
Every query or DML will perform a Parse, Bind, Execute, Fetch. Whether it performs a soft parse or hard parse is the question. A hard parse requires more work, so aim to soft parse.
Comments count
Binds are good, but sometimes it is better not to bind. If a statement is executed infrequently, literals will be the better choice, which is why in a DW environment you generally don't bind. If you execute a statement thousands of times per second, bind.
Latches are serial by their very nature; they are a necessary evil to protect the Oracle memory structures. The key point here is that when we hard parse we hold the latch longer, performing more work and preventing others from acquiring the latch. Resource intense, and one more reason to aim for soft parsing.
The query you submit may not be what Oracle actually executes. It would be nice if it could take any query and re-write it, but that doesn't happen. Under certain circumstances a re-write may occur.
We don't use query rewrite with MVs, because we don't use MVs; those are for DW environments.
A subquery is found in the where clause of a statement. Unnesting is removing the subquery and making it a join in the main query block.
Notice the index access on employer uses only er_admn_fk_i. All the information I need is there; I don't actually need to visit the administrator table, and Oracle knows this. A good example where constraints provide information to the optimizer.
Without the unnesting we see the word view in the execution plan, and both the administrator and the employer tables are used. Notice the number of rows is much larger, which means it's doing more work by visiting more blocks. Will discuss blocks later.
This is a very popular query that I see in several places, executed over and over. It is also very typical of what ORM tools create: lots of subqueries instead of joins.
What Oracle does with it is re-write it. The IN becomes an EXISTS, and all the little subqueries become one, with joins instead of nested subqueries.
View merging is very similar to subquery unnesting, except it applies to in-line or stored views which are part of the select statement. There is actually a view within the vw_er_user view; Oracle will re-write the query and submit both as one query when possible. Not always possible. As a side note, the view used within vw_er_user only exists for the conditions; no rows are returned to the main view. More than likely this is a case where the tables in the second view should just be folded into the main view.
If there is a column that I can use to reduce the amount of data within the view, then view merging will occur, provided all conditions are met. That is the goal: reduce the amount of data passed from one query block to the next. Filter early.
A made-up query to prove the concept, but I'm sure we could find real queries where this occurs.
Notice the access is one FTS of claim_batch. Notice the 7038K rows: a one-time hit, straight to the table.
When no merging occurs, the word view shows up in the execution plan and there are two accesses to the claim_batch table. Twice as much work: the table has to be accessed once for each query block.
Be aware of the conditions that can prevent view merging. In most cases we want view merging to happen, however, these conditions can stop it from occurring.
The idea is to filter earlier: the less data you return for the next operation, the more efficient the query will be.
Notice the predicate is part of the outer query
Notice the index range scan on clm_ee_fk_i that was our predicate and it is pushed into the view.
Without predicate pushing you see the count and a filter, plus additional indices accessed. The trick that stopped it from happening is the rownum. Notice how much more work has to be done when the predicate is not pushed into the inner query?
Now that the optimizer has determined whether it needs to do a query re-write or not, it's time to determine the execution plan. You don't really need to worry about statistics; the DBAs take care of ensuring the stats run nightly, depending on the data changes and whether the object requires it. But statistics are the basis for all the math that Oracle performs to determine the most efficient plan. What shows up in the math as the most efficient plan may not end up being the most efficient: there are details about the data that may not be known to Oracle, and the way the query is written may lead the optimizer astray. The cost is internal; don't use the cost to assume a plan is more efficient and will run faster.
Selectivity and cardinality are often used interchangeably. Selectivity = how many rows you can expect to be returned by the query. Cardinality here is the number of distinct values a column has.
The way a query is written can and does determine if the estimates of selectivity are calculated correctly. Sometimes providing too much information to the optimizer will cause the calculation to be flat out wrong.
Keep in mind that each subcategory can only exist in 1 category. In other words MED is only found in Medical category and no other category.
Oracle calculates the selectivity of both columns and multiplies them. Know the data and re-write queries to inform the optimizer about it. In this case adding the category was unnecessary, but Oracle had no way of knowing that.
Once the plan is determined, it's stored for future use. The execution plan is just a set of instructions. Then Oracle will fetch the rows, and how the rows are stored and distributed comes into play in whether the execution plan is efficient or not.
Logical I/O drives physical I/O: the buffer cache is always checked first, then the spinning disk. There are several different types of physical I/O. The execution plan chosen will determine the amount of I/O performed. The buffer cache is memory.
If you want efficiency, the query needs to do the minimal amount of I/O. Sometimes that does mean a FTS will be better.
It's the number of blocks, not the number of rows returned by a query, that drives whether the cost of a FTS is lower than the cost of an index scan. A FTS is a multiblock read. Index reads, with the exception of Fast Full, are single-block, and we have to read at least two blocks of an index.
Just like using the index in the back of a book to find the location of a certain topic within the book. The index points to the rowid in the table and then allows access straight to that rowid.
The size of the index determines the number of blocks. Indexes are stored sorted. Sometimes, with low selectivity, a FTS will actually be more efficient than an index scan. This is what we discovered with the Payment_register report in bank. The information being retrieved is actually in the pay_file and employee_payment; to get there we have to traverse a handful of other tables. However, the only way to pull the correct data out of the pay_file is with the ID of the pay_file, which is only in employee_payment. The ID is unique, but we could have several in the EP table. Very low selectivity; too many rows returned.
All the columns are in the index therefore just use the index instead of going back and forth between index and table
The distinctness of the leading column is the key here. The way this works is that Oracle effectively creates subindexes on the leading column; if it is too distinct, with too many subindexes being created, then it becomes less efficient.
Just like a FTS, but it's the index only. This is the most efficient index scan. It requires that all the columns in the where clause, the select list, and the order by or group by exist in the index.
Not going into great detail on the joins I just want you to be aware of the different joins. This will be covered later with the reading execution plans
Cartesians will crop up from time to time as necessary, but more often they seem to be due to missing join conditions. Sometimes needed, but mostly not; always be suspect.
Nothing more than a collection of objects. It differs from procedural or process thinking because it deals with all rows that meet a condition. When thinking procedurally you think "for each row I need to do X", then you set out in a careful step-by-step process to perform X row by row.
We are just discussing SQL here not PL/SQL, but the concept is the same. Look at the entire set to arrive at the results. It’s ok to start with a procedural thought when working out the details of what you are trying to achieve, but always work toward a set.
With an OR, if the first condition is met the second one isn't even evaluated. In an OR statement, when one condition evaluates to true the entire statement is true.
If the first one evaluates to false then the whole expression is false. Both conditions would need to evaluate to true
Strive to use AND condition when at all possible.
Construction of the SQL statement, application data, distribution of data, and the environment all matter. The greater the selectivity of the predicates, the more efficient the index access. Remember how we calculate selectivity? Adding additional columns can actually reduce the calculated selectivity and mislead the optimizer. The second step in an index access is costly if the selectivity isn't good or numerous rowids are returned.
Where clause: what columns exist there. Leading column: the one used frequently in queries. Cardinality: distinct values with uniform distribution are not a good candidate; non-uniform distribution is a good candidate.
For DML the index must be modified. Index cost is impacted by how big the index is: the larger, the more costly to maintain. Be careful using upper/lower/nvl/to_char; an index won't be used. That requires a function-based index, but do you really need that function? upper and lower are used on the login table. Instead of using NVL in code, would it not be more prudent to default the column to 0? Null and 0 are not the same thing, but if you are treating an unknown value as a 0 then maybe it should be a 0.
The with clause actually makes SQL more readable the more complex the SQL is, by breaking it into chunks. The optimizer will take the with clauses and either do an in-line view or create a temporary table. What it's not: it's not a silver bullet. It does not solve every performance issue, but it never hurts to see if it will in a given situation, even when you don't think that is the solution. Always have a few different options to test.
Here's one query that was sent to me to tune. What stood out to me is that this, like others, traverses the administrator, employer, employee, and election tables to get to the claim information, which is really what is wanted.
I don't have the execution plan of both of these queries, but basically what happened was that Oracle created a temporary table with the data from the with clause and reused it in the main query.
Here's an extract from the payment_register on enterprise, the piece that was changed. It creates a temp table with just those records from the administrator table. Huge gain in performance: 1 hour down to 13 seconds.
Less I/O should always be your goal.
Know your data! Provide this information via the query you write.
Hints should never be placed in code unless reviewed by a DBA, and then they should be reviewed periodically, whenever changes go into the code, to ensure they are still relevant. Remember, your queries have an impact on overall system performance. It is conceivable that a hint can and does cause performance problems which can make the database inoperable.