2. Karen Morton – Pro Oracle SQL
@karen_morton
Maria Colgan – Product Manager for Oracle
Optimizer @SQLMaria
Toon Koppelaars - @ToonKoppelaars
Tune My Query - @TuneMyQuery
Jeff Smith – Product Manager for SQL
Developer @thatjeffsmith
Kerry Osborne, Robyn Sands, Riyaj
Shamsudeen and Jared Still – Pro Oracle SQL
3. Under the covers
Parse
Query Transformation
Execution Plan
Oracle I/O
FTS vs Index
Joins
Set Theory
Logical Expressions
Subquery Factoring
4. What you will get -> the concepts behind Oracle,
as well as some key concepts that help you
formulate your queries for efficiency
What you won't get -> a silver bullet for
writing the most efficient SQL. That takes
practice and a great deal of understanding of
Oracle internals, as well as understanding the
data. There is no one way to write SQL
5. Results don't equal an efficient query
Your queries impact the overall performance
of the system
The database is not a black box
The more you know the more efficiently you
will write queries
7. Check Syntax
Validate Objects
Check permission
Does the query already exist in the shared pool?
If yes, go straight to execution using the existing
execution plan (a soft parse)
8. Converts the statement to a hash value for comparison
Must match exactly, including case
Select * from Employee;
Select * from employee;
9. Comments
Select /* not the same */ * from employee;
Select * from employee;
*** This trick is very useful when testing
queries
10. Values
Select * from employee where id = 1234;
Select * from employee where id = 5678;
11. Binds allow values to be different but the
statement to stay the same
Variable v_id number
Exec :v_id := 1234
Select * from employees where id = :v_id;
Variable v_id number
Exec :v_id := 5678
Select * from employees where id = :v_id;
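To see the difference in the shared pool, you can query V$SQL (a sketch; it assumes you have SELECT privilege on the V$ views). The two literal statements show up as two separate cursors, each hard parsed once, while the bind version is one shared cursor executed twice:

```sql
-- Each distinct literal text produces its own cursor;
-- the bind-variable form is shared and simply re-executed.
select sql_id, parse_calls, executions, sql_text
from   v$sql
where  lower(sql_text) like 'select * from employee%';
```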
12. A latch is a type of lock
Required when reading structures from
Oracle's memory
Serialized to protect the memory
"Can I have the latch?" -> No, spin
If not acquired after x attempts, the process is
temporarily put on hold until its turn on the
CPU comes again.
This eats CPU cycles
13. Re-writes the statement if a better plan can be
achieved
Doesn't change the result sets
Ideally, write your query so no transformation
is needed
15. Subquery turned into a join
Select id from employer where admn_id in
(select id from administrator
where id = 1000080);
Select er.id from employer er, administrator a
Where er.admn_id = a.id
And a.id = 1000080;
16. Plan hash value: 150543180

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |    12 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYER     |     1 |    12 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ER_ADMN_FK_I |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ADMN_ID"=1000080)
18. Update claim_batch set claim_type = null
where submitted_by = 'Conversion'
and claim_type = 'Conversion'
and id in
(select clmbt_id from claim
where status = 'Approved' and ee_id in
(select id from employee where pycl_er_id in
(select id from employer where admn_id in
(select id from administrator where id =
1000080 or parent_admn_id =
1000080))))
19. Update claim_batch set claim_type = null
where submitted_by = 'Conversion' and
claim_type = 'Conversion'
and exists (select 0 from claim c,
employee e, employer er, administrator a
where c.clmbt_id = claim_batch.id
and c.status = 'Approved' and
c.ee_id = e.id and e.pycl_er_id = er.id
and er.admn_id = a.id and
(er.admn_id = 1000080 or
a.parent_admn_id = 1000080))
20. Expands views (in-line or stored) into
separate query blocks
Analyzed separately or merged into the query
and evaluated together
Select * from vw_er_user;
Oracle re-writes the query against the view's
underlying query (including any view nested
inside the view) and then creates the execution plan
21. Query block predicate contains a column that
can be used in an index within another query
block
A column that can be used for partition
pruning within another query block
A condition that limits the rows returned
from one of the tables in a joined view
22. Select * from claim_batch cb1,
(select ee_id, id from claim_batch cb2) cb_view
Where cb1.ee_id = cb_view.ee_id(+)
And cb1.id = cb_view.id
And cb1.id > 20000;
25. Analytic or aggregate functions
Set operations
Order By clause
Rownum
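For example (a hypothetical sketch reusing the claim_batch table from the earlier slides), a ROWNUM inside an in-line view forces the view to stay a separate query block:

```sql
-- ROWNUM must be assigned before the outer predicates run,
-- so this in-line view cannot be merged into the outer query.
select cb.id, v.rn
from   claim_batch cb,
       (select id, rownum rn from claim_batch) v
where  cb.id = v.id
and    cb.id > 20000;
```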
26. Apply predicates from a containing query
block into a non-mergeable query block.
Use an index or other filtering of data earlier
in the query plan
Less data = Less work
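Conceptually, the optimizer turns the first form below into the second (a hedged sketch on the claim_batch table; the GROUP BY keeps the view non-mergeable, so pushing the predicate is what saves the work):

```sql
-- Written form: the whole view is aggregated, then filtered.
select v.ee_id, v.cnt
from   (select ee_id, count(*) cnt
        from   claim_batch
        group by ee_id) v
where  v.ee_id = 1234;

-- After predicate pushing (conceptually): the filter runs first,
-- so an index on claim_batch(ee_id) can be used and far fewer
-- rows are aggregated.
select v.ee_id, v.cnt
from   (select ee_id, count(*) cnt
        from   claim_batch
        where  ee_id = 1234
        group by ee_id) v;
```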
30. Statistics are used to determine the cost of
each execution plan the optimizer formulates
Cost is an internal value calculated by the
optimizer to select the most efficient plan
The lowest-cost plan may not always be the most
efficient plan
Cost should not be used by users to decide
which plan is better: evaluate, evaluate
31. Select * from claim_type
where clmc_category = 'Medical';
Total number of rows in table = 1000
Total number of distinct clmc_category = 10
Estimated number of rows returned =
(1/num_distinct) * num_rows =
(1/10) * 1000 = 100
Estimated cardinality = 100 rows (selectivity = 1/10)
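The inputs to that arithmetic come from the data dictionary; assuming statistics are current, you can check them with something like:

```sql
-- num_rows and num_distinct are the values the optimizer
-- plugs into (1/num_distinct) * num_rows.
select t.num_rows, c.column_name, c.num_distinct
from   user_tables t, user_tab_col_statistics c
where  t.table_name  = 'CLAIM_TYPE'
and    c.table_name  = t.table_name
and    c.column_name = 'CLMC_CATEGORY';
```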
32. Two predicates joined with AND condition
Selectivity is determined for each and then
multiplied together
Table claim_type has different categories and
subcategories. Subcategories are exclusive to
categories
33. Select * from claim_type where clmc_subcat =
'MED' and clmc_category = 'Medical';
Total number of rows = 1,000,000
Number of distinct categories = 4
Number of distinct subcategories = 1000
Each distinct subcategory can only exist in one
category: MED is only found in Medical
34. Selectivity for category = .25
Selectivity for subcategory = .001
Overall selectivity = .25 * .001 = .00025,
an estimate of 250 rows
In reality the number of rows is 1000
The estimate is only 25% of the actual cardinality
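One possible fix for correlated columns like these, available from Oracle 11g onward, is a column-group extended statistic; it lets the optimizer learn the combined number of distinct (category, subcategory) pairs instead of multiplying the individual selectivities:

```sql
-- Define a column group on the correlated pair...
select dbms_stats.create_extended_stats(
         ownname   => user,
         tabname   => 'CLAIM_TYPE',
         extension => '(CLMC_CATEGORY,CLMC_SUBCAT)')
from   dual;

-- ...then regather statistics so the group gets its own
-- number-of-distinct-values figure.
begin
  dbms_stats.gather_table_stats(user, 'CLAIM_TYPE');
end;
/
```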
35. Plan stored in library cache for reuse
Execute plan – the plan is a set of instructions:
access method for each object, join order, and
join methods
Fetch rows – retrieves blocks and returns rows
to the application
36. Blocks – not rows
Checks the buffer cache first – logical I/O (LIO)
Physical I/O (PIO) – the most common types:
1. db file sequential reads
2. db file scattered reads
3. direct I/O, which skips placing blocks in the buffer cache
37. LIO drives PIO – Oracle must check the buffer
cache before issuing a PIO request
Workload is characterized in terms of LIO
Aim for the least amount of LIO that satisfies
the results
Sometimes a Full Table Scan (FTS) is better
38. Cost depends on how data is stored as well as
how much is returned
Number of blocks to read and how many
rows will be in the final results
How many throwaway rows – rows checked against
the predicate that don't match (a CPU operation)
As blocks accessed and throwaway increase,
the cost of the FTS goes up
Multiple blocks are read in a single I/O operation
39. An index lists column values and rowids
Value matched to predicate
Table accessed via rowid
Each row retrieved costs at least 2 block reads
40. Index Range/Unique Scan – Predicate returns a
range of data
Root Block -> Branch Block -> Leaf Block
Once the first value is found -> table access by rowid
Repeat until all values are found
Selectivity is important
41. Index Full Scan – Scan every block, read all
rowids and retrieve table rows
No predicate but column list can be satisfied
through an index
Predicate on a non-leading column
Data can be retrieved in sorted order
42. Index skip scan
Predicate is on a non-leading column and the
leading column has relatively few distinct values
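A hypothetical sketch: given a concatenated index whose leading column has only a couple of distinct values, a predicate on the second column alone can still use the index via a skip scan:

```sql
-- Assumed index (not from the slides):
--   create index emp_gender_id_i on employee (gender, id);
-- gender has very few distinct values, so Oracle probes one
-- logical subindex per gender value looking for id = 1234.
select *
from   employee
where  id = 1234;   -- no predicate on the leading column gender
```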
43. Index Fast Full Scan
Multiblock read
All columns exist in the index to satisfy query
At least one column has a not null constraint
44. Joins operate between tables or row sources
Nested Loops – read each row of the outer source
and join to the inner source; best when the outer
row set is smaller
45. (diagram)
46. Sort-merge – tables are read independently,
rows that meet the conditions are sorted by join
key, and the sorted row sets are then merged
47. (diagram)
48. Hash joins – tables are read independently and
the conditions applied. The row source returning
the fewest rows is hashed into memory. The larger
table is then read, the hash function applied, and
matches looked up in the smaller row set
49. (diagram)
50. Cartesian – all rows from one table joined to
all rows in another table
A join condition may have been overlooked
in the where clause
Outer join – returns all rows from one table
and only those rows from the joined table
where the join condition is met.
51. (diagram)
52. www.setgame.com/puzzle.set.htm
Set theory – the study of sets, collections of objects
If you aren't writing queries based on sets
then you aren't using SQL the way it was
meant to be used
Move from procedural or process thinking to
set thinking
53. Procedural or process thinking ->
the flow of each step that needs to be taken
For each row I need to do "x"
if/then/else, while/do loops
Set thinking -> For all rows I need to do "x"
54. If "X" then "Y", where X and Y are both
conditions
Boolean logic (and, or, not) helps filter out rows
and reduce the code path executed
Where (:erID = 1) or (er.id = :nERid)
55. Where (:erID = 1) AND (er.id = :nERid)
Put the condition most likely to evaluate to
false first
Reduce the workload by reaching the answer
quicker
56. Usually AND will allow the optimizer to make
better choices, but not always
With an AND a single choice is more likely
With an OR, two different operations
57. Index scans are not always the most efficient
access method
Predicate selectivity matters
Traverse index for rowids -> fetch rows from
table blocks -> apply the remainder of the predicates
Data distribution matters
58. The index should match the predicate
In a concatenated index, the leading column
should be the one used most frequently as a predicate
Consider the cardinality of predicates and the
selectivity of columns
Cardinality -> the number of rows expected to
be fetched by a predicate or execution step
59. Not every column needs to be indexed
Base indexes on application access and queries
There is a cost associated with each index for
DML
Big columns can be more costly
Applying a function to an indexed column
negates the use of the index
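When the function really is required, a function-based index is the usual workaround (column names here are assumptions, not from the slides):

```sql
-- A plain index on last_name is useless once upper() is applied;
-- a function-based index stores the uppercased value instead.
create index emp_upper_name_i on employee (upper(last_name));

select *
from   employee
where  upper(last_name) = 'SMITH';   -- can now use the index
```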
60. NULLs are not stored in a single-column index
NULLs are stored in a multi-column index
Create an index on the single nullable column
and add a second dummy column:
(claim_type, 0)
This allows the index to be chosen
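A minimal sketch of the trick, using the claim_type column from the earlier update statement (the index name is an assumption):

```sql
-- NULL keys are not stored in a single-column B-tree index, so
-- "where claim_type is null" cannot use one. Adding a constant
-- second expression puts every row, NULLs included, into the index.
create index clmbt_claim_type_i on claim_batch (claim_type, 0);

select id
from   claim_batch
where  claim_type is null;   -- index can now be chosen
```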
61. With Clause
Move parts of a large query with many tables
from the main body to the With clause
Great for nested subqueries
Great for tables referenced repeatedly
Not a silver bullet
62. When you move a query block to the With
clause it is given a name and that name is
then referenced later in the query.
63. select tw.ee_id Ee_Id,
       cb.claim_type,
       decode(cl.substantiated_by, 'pre-substantiated', 'Y', 'N') "Auto-Substantiated",
       ca.action_on,
       count(cl.id) "NumberofClaimLines"
from   claim cl,
       claim_batch cb,
       claim_activity ca,
       (select ee.id as ee_id, el.id as el_id
        from   administrator ad, employer er, employee ee, employer_account ea, election el
        where  ad.id = 233
        and    er.admn_id = ad.id
        and    er.id = 44144
        and    ea.er_id = er.id
        and    ee.pycl_er_id = er.id
        and    el.ee_id = ee.id
        and    el.erac_er_id = er.id
        and    el.erac_actp_cd = ea.actp_cd
        and    el.erac_ends = ea.ends) tw
where  cb.ee_id = tw.ee_id
and    cl.ee_id = tw.ee_id
and    cl.ee_id = cb.ee_id
and    cl.clmbt_id = cb.id
and    cl.elct_id = tw.el_id
and    cl.service_begins between to_date('01/01/2013', 'mm/dd/yyyy')
                             and to_date('01/31/2013', 'mm/dd/yyyy')
and    ca.clm_id = cl.id
and    ca.action = 'Entered'
and    ca.action_on = (select min(ca1.action_on)
                       from   claim_activity ca1
                       where  ca1.clm_id = cl.id
                       and    ca1.action = ca.action)
group by tw.ee_id, cb.claim_type,
         decode(cl.substantiated_by, 'pre-substantiated', 'Y', 'N'), ca.action_on;
64. with EE as (
       select ee.id as ee_id, el.id as el_id
       from   administrator ad, employer er, employee ee, employer_account ea, election el
       where  ad.id = 233
       and    er.admn_id = ad.id
       and    er.id = 44144
       and    ea.er_id = er.id
       and    ee.pycl_er_id = er.id
       and    el.ee_id = ee.id
       and    el.erac_er_id = er.id
       and    el.erac_actp_cd = ea.actp_cd
       and    el.erac_ends = ea.ends)
select tw.ee_id Ee_Id,
       cb.claim_type,
       decode(cl.substantiated_by, 'pre-substantiated', 'Y', 'N') "Auto-Substantiated",
       ca.action_on,
       count(cl.id) "NumberofClaimLines"
from   claim cl, claim_batch cb, claim_activity ca, EE tw
where  cb.ee_id = tw.ee_id
and    cl.ee_id = tw.ee_id
and    cl.ee_id = cb.ee_id
and    cl.clmbt_id = cb.id
and    cl.elct_id = tw.el_id
and    cl.service_begins between to_date('01/01/2013', 'mm/dd/yyyy')
                             and to_date('01/31/2013', 'mm/dd/yyyy')
and    ca.clm_id = cl.id
and    ca.action = 'Entered'
and    ca.action_on = (select min(ca1.action_on)
                       from   claim_activity ca1
                       where  ca1.clm_id = cl.id
                       and    ca1.action = ca.action)
group by tw.ee_id, cb.claim_type,
         decode(cl.substantiated_by, 'pre-substantiated', 'Y', 'N'), ca.action_on
order by 1;
65. WHERE (AD.ID=:B4 OR AD.ID IN
(SELECT DM.ID
FROM ADMINISTRATOR DM
WHERE DM.PARENT_ADMN_ID=:B4 )) AND
ER.ADMN_ID= AD.ID
with ad as (select id from administrator where
id = 1000060 or parent_admn_id =
1000060)
66. Less is better with regard to I/O, physical as
well as logical
Don't do unnecessary work. Keep it simple
In a complex query, start with the most selective
condition: filter early
Select only the columns necessary
67. More is better
The more information provided to Oracle about
your data, the better the plan the optimizer will select
Test with different versions of a query – they need
to return the same results
Keep informed of developments within SQL,
especially when moving between different versions
68. A hint is a directive
Hints can and do alter the execution plan, though
not always, and not always for the good
Don't be afraid, just use them with caution and be
responsible – they should always be reviewed by a
DBA
When you see a hint in code, review it: is it
still relevant?
Sometimes hints are the only route to a
more efficient plan
69. Understand What and How Oracle is doing on
your behalf
Parse - Soft vs Hard
Query Transformation
Execution Plan
Fetch
FTS vs Index
Joins
Set Theory
Logical Expressions
Subquery Factoring
Editor's Notes
No silver bullets! No one way to write a query!
Just because you are receiving the correct results doesn't mean it's an efficient query. The database is a shared resource; queries can and do impact everyone. Ever get that IM from one of us asking what you are doing on a specific database?
Our focus is on the SGA, in particular the buffer cache, and shared pool
Every query or DML will perform a Parse, Bind, Execute, Fetch. Whether it performs a soft parse or hard parse is the question. A hard parse requires more work, so aim to soft parse.
Comments count
Binds are good, but sometimes it is better not to bind. If a statement is executed infrequently, literals will be the better choice, which is why in a DW environment you generally don't bind. If you execute a statement thousands of times per second, bind.
Latches are serial by their very nature; they are a necessary evil to protect the Oracle memory structures. The key point here is that when we hard parse we hold the latch longer, performing more work and preventing others from acquiring the latch. Resource intense, and one more reason to aim for soft parsing.
The query you submit may not be what Oracle actually executes. It would be nice if it could take any query and re-write it, but that doesn't happen. Under certain circumstances a re-write may occur.
We don't use query rewrite with MVs, because we don't use MVs; those are for DW environments.
A subquery is found in the where clause of a statement. Unnesting is removing the subquery and making it a join in the main query block.
Notice the index access on employer uses only er_admn_fk_i. All the information I need is there; I don't actually need to visit the administrator table, and Oracle knows this. A good example where constraints provide information to the optimizer.
Without the unnesting we see the word view in the execution plan, and both the administrator and the employer tables are used. Notice the number of rows is much larger, which means it's doing more work by visiting more blocks. Will discuss blocks later.
This is a very popular query that I see in several places, executed over and over. It is also very typical of what ORM tools create: lots of subqueries instead of joins.
What Oracle does with it is re-write it. The IN becomes an EXISTS, and all the little subqueries become one, with joins instead of nested subqueries.
View merging is very similar to subquery unnesting, except it applies to in-line or stored views which are part of the select statement. There is actually a view within the vw_er_user view; Oracle will re-write the query and submit both as one query when possible. Not always possible. As a side note, the view used within vw_er_user only exists for the conditions; no rows are returned to the main view. More than likely this is a case where the tables in the second view should just be folded into the main view.
If there is a column that I can use to reduce the amount of data within the view, then view merging will occur, provided all conditions are met. That is the goal: reduce the amount of data passed from one query block to the next. Filter early.
A made-up query to prove the concept, but I'm sure we could find real queries where this occurs.
Notice the access is one FTS of claim_batch. Notice the 7038K rows: a one-time hit, straight to the table.
When no merging occurs, the word view shows up in the execution plan and there are two accesses to the claim_batch table. Twice as much work: the table has to be accessed once for each query block.
Be aware of the conditions that can prevent view merging. In most cases we want view merging to happen, however, these conditions can stop it from occurring.
The idea is to filter earlier: the less data you return for the next operation, the more efficient the query will be.
Notice the predicate is part of the outer query
Notice the index range scan on clm_ee_fk_i that was our predicate and it is pushed into the view.
Without predicate pushing you see the count and a filter, plus additional indices accessed. The trick that stopped it from happening is the rownum. Notice how much more work has to be done when the predicate is not pushed into the inner query?
Now that the optimizer has determined whether it needs to do a query re-write or not, it's time to determine the execution plan. You don't really need to worry about statistics; the DBAs take care of ensuring the stats run nightly, depending on the data changes and whether the object requires it. But statistics are the basis for all the math that Oracle performs to determine the most efficient plan. What shows up in the math as the most efficient plan may not end up being the most efficient: there are details about the data that may not be known to Oracle, and the way the query is written may lead the optimizer astray. The cost is internal; don't use the cost to assume a plan is more efficient and will run faster.
Selectivity and cardinality are often used interchangeably. Selectivity = how many rows you can expect to be returned by the query. Cardinality here is the number of distinct values a column has.
The way a query is written can and does determine if the estimates of selectivity are calculated correctly. Sometimes providing too much information to the optimizer will cause the calculation to be flat out wrong.
Keep in mind that each subcategory can only exist in 1 category. In other words MED is only found in Medical category and no other category.
Oracle calculates the selectivity of both columns and multiplies them. Know the data and re-write queries to inform the optimizer about it. In this case adding the category was unnecessary, but Oracle had no way of knowing that.
Once the plan is determined, it's stored for future use. The execution plan is just a set of instructions. Then Oracle will fetch the rows, and how the rows are stored and distributed comes into play in whether the execution plan is efficient or not.
Logical I/O drives physical I/O: the buffer cache is always checked first, then the spinning disk. There are several different types of physical I/O. The execution plan chosen will determine the amount of I/O performed. The buffer cache is memory.
If you want efficiency, the query needs to do the minimal amount of I/O. Sometimes that does mean a FTS will be better.
It's the number of blocks, not the number of rows returned by a query, that drives whether the cost of a FTS is lower than the cost of an index scan. A FTS is a multiblock read. Index reads, with the exception of Fast Full, are single-block, and we have to read at least two blocks of an index.
Just like using the index in the back of a book to find the location of a certain topic within the book. The index points to the rowid in the table and then allows access straight to that rowid.
The size of the index determines the number of blocks. Indexes are stored sorted. Sometimes, with low selectivity, a FTS will actually be more efficient than an index scan. This is what we discovered with the Payment_register report in bank. The information being retrieved is actually in the pay_file and employee_payment; to get there we have to traverse a handful of other tables. However, the only way to pull the correct data out of the pay_file is with the ID of the pay_file, which is only in employee_payment. The ID is unique, but we could have several in the EP table. Very low selectivity; too many rows returned.
All the columns are in the index therefore just use the index instead of going back and forth between index and table
The distinctness of the leading column is the key here. The way this works is that Oracle effectively creates subindexes on the leading column; if it is too distinct, with too many subindexes being created, then it becomes less efficient.
Just like a FTS, but it's the index only. This is the most efficient index scan. It requires that all the columns in the where clause, the select list, and the order by or group by exist in the index.
Not going into great detail on the joins I just want you to be aware of the different joins. This will be covered later with the reading execution plans
Cartesians will crop up from time to time as necessary, but more often they seem to be due to missing join conditions. Sometimes needed, but mostly not; always be suspect.
Nothing more than a collection of objects. It differs from procedural or process thinking because it deals with all rows that meet a condition. When thinking procedurally you think "for each row I need to do X", then you set out in a careful step-by-step process to perform X row by row.
We are just discussing SQL here not PL/SQL, but the concept is the same. Look at the entire set to arrive at the results. It’s ok to start with a procedural thought when working out the details of what you are trying to achieve, but always work toward a set.
With an OR, if the first condition is met the second one isn't even evaluated. In an OR statement, when one condition evaluates to true the entire statement is true.
If the first one evaluates to false then the whole expression is false. Both conditions would need to evaluate to true
Strive to use AND condition when at all possible.
Construction of the SQL statement, application data, distribution of data, and the environment all matter. The greater the selectivity of the predicates, the more efficient the index access. Remember how we calculate selectivity? Adding additional columns can actually reduce the calculated selectivity and mislead the optimizer. The second step in an index access is costly if the selectivity isn't good or numerous rowids are returned.
Where clause: what columns exist there. Leading column: the one used frequently in queries. Cardinality: distinct values with uniform distribution are not a good candidate; non-uniform distribution is a good candidate.
For DML the index must be modified. Index cost is impacted by how big the index is: the larger, the more costly to maintain. Be careful using upper/lower/nvl/to_char; an index won't be used. That requires a function-based index, but do you really need that function? upper and lower are used on the login table. Instead of using NVL in code, would it not be more prudent to default the column to 0? Null and 0 are not the same thing, but if you are treating an unknown value as a 0 then maybe it should be a 0.
The with clause actually makes SQL more readable the more complex the SQL is, by breaking it into chunks. The optimizer will take the with clauses and either do an in-line view or create a temporary table. What it's not: it's not a silver bullet. It does not solve every performance issue, but it never hurts to see if it will in a given situation, even when you don't think that is the solution. Always have a few different options to test.
Here's one query that was sent to me to tune. What stood out to me is that this, like others, traverses the administrator, employer, employee, and election tables to get to the claim information, which is really what is wanted.
I don't have the execution plan of both of these queries, but basically what happened was that Oracle created a temporary table with the data from the with clause and reused it in the main query.
Here's an extract from the payment_register on enterprise, the piece that was changed. It creates a temp table with just those records from the administrator table. Huge gain in performance: 1 hour down to 13 seconds.
Less I/O should always be your goal.
Know your data! Provide this information via the query you write.
Hints should never be placed in code unless reviewed by a DBA, and then they should be reviewed periodically, whenever changes go into the code, to ensure they are still relevant. Remember, your queries have an impact on overall system performance. It is conceivable that a hint can and does cause performance problems which can make the database inoperable.