4. What is top-N
SELECT picture
FROM images
WHERE subject='lolcats'
/
[Figure: "The internet" narrowed down to "Lolcats", sorted by: funny, with a "view more: next >" link]
5. Setup
SQL> @desc cities
Name Null? Type
-------------------------- -------- ---------------
NAME NOT NULL VARCHAR2(100)
STATE NOT NULL VARCHAR2(100)
POPULATION NOT NULL NUMBER
PCTFREE 99 PCTUSED 1
http://www.census.gov
6. Naïve Top-N
Give me the top 5 cities by population
SELECT name, population
FROM cities
WHERE rownum <= 5
ORDER BY population DESC;

NAME                   Pop
---------------------- ------
Robertsdale city        5,276
Glen Allen town (pt.)     458
Boligee town              328
Riverview town            184
Altoona town (pt.)         30
Statistics
7 consistent gets
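The naive query returns the wrong cities because Oracle assigns ROWNUM as rows pass the WHERE clause, before the ORDER BY is applied. A small sketch against the same cities table makes this visible; the alias rn is only for illustration.

```sql
-- ROWNUM is assigned while rows are fetched, before the ORDER BY runs.
-- The filter keeps whichever five rows happen to be read first,
-- and only those five rows are then sorted by population.
SELECT rownum AS rn, name, population
FROM cities
WHERE rownum <= 5
ORDER BY population DESC;
```

The rn values come back as 1 to 5 in whatever order the sort leaves them, which shows that numbering happened before sorting.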
8. Correct top-N query
>= 12c:
SELECT name, population
FROM cities
ORDER BY population DESC
FETCH FIRST 5 ROWS ONLY

<= 11g:
SELECT * FROM (
SELECT name, population
FROM cities
ORDER BY population DESC
) WHERE rownum <= 5
9. Correct top-N query: Execution
SELECT * FROM (
SELECT name, population
FROM cities
ORDER BY population DESC
) WHERE rownum <= 5;

NAME                 Pop
-------------------- ----------
Los Angeles city      3,792,621
Chicago city (pt.)    2,695,598
Chicago city (pt.)    2,695,598
Chicago city          2,695,598
New York city (pt.)   2,504,700
Statistics
56024 consistent gets
12. Proper data structure
Ordered By: Population
CREATE INDEX i_pop ON cities(population);
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 00:00:01 |
|* 1 | COUNT STOPKEY | | | |
| 2 | VIEW | | 10 | 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 56072 | 00:00:01 |
| 4 | INDEX RANGE SCAN DESCENDING| I_POP | 10 | 00:00:01 |
--------------------------------------------------------------------
Statistics
12 consistent gets
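The slides do not say how the plan and the statistics were captured. One plausible way to reproduce them, assuming SQL*Plus and a session with the PLUSTRACE role (or equivalent privileges), is sketched here.

```sql
-- Show the optimizer's plan for the top-N query.
EXPLAIN PLAN FOR
SELECT * FROM (
  SELECT name, population
  FROM cities
  ORDER BY population DESC
) WHERE rownum <= 5;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- In SQL*Plus, AUTOTRACE runs the statement and reports run-time
-- statistics such as "consistent gets" without printing the rows.
SET AUTOTRACE TRACEONLY STATISTICS
SELECT * FROM (
  SELECT name, population
  FROM cities
  ORDER BY population DESC
) WHERE rownum <= 5;
SET AUTOTRACE OFF
```

The exact numbers depend on the data, block size, and storage settings, so they will likely differ from the figures on the slide.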
13. Why indexes work
Ordered By: Population
CREATE INDEX i_pop ON cities(population);
• Colocation
• Can stop after reading N rows
• No Sort
14. More elaborate top-N
Give me the top 5 cities by population in Florida
SELECT * FROM (
SELECT name, population
FROM cities
WHERE state='Florida'
ORDER BY population DESC
) WHERE rownum <= 5;

NAME                 Pop
-------------------- ----------
Jacksonville city       821,784
Miami city              399,457
Tampa city              335,709
St. Petersburg city     244,769
Orlando city            238,300
Statistics
264 consistent gets
15. Uncertain nature of filtering
Ordered By: Population
WHERE state='Florida'
ORDER BY population DESC
) WHERE rownum <= 5;
Statistics
264 consistent gets

WHERE state='Florida'
ORDER BY population DESC
) WHERE rownum <= 200;
Statistics
19747 consistent gets
16. Multi column indexes
CREATE INDEX i_state_pop ON cities(state, population);
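WHERE state='FL'
[Figure: index entries grouped by State (AL, AK, AZ, CO, FL, MA, WA) with Population as the second key. Across states: *NOT* ordered by Population. Within one state: ordered by Population.]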
20. Covering index
CREATE INDEX i_state_pop
ON cities
(state, population);
Statistics
12 consistent gets

CREATE INDEX i_state_pop_c
ON cities
(state, population, name);
Statistics
7 consistent gets
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 00:00:01 |
|* 1 | COUNT STOPKEY | | | |
| 2 | VIEW | | 10 | 00:00:01 |
|* 3 | INDEX RANGE SCAN DESCENDING| I_STATE_POP_C | 506 | 00:00:01 |
--------------------------------------------------------------------------
21. Ideal top-N
• Use the index
• Make the best index
• And read only from the index
22. Less than ideal top-N
• Effect of query conditions
• Effect of deletes and updates
• Technicalities
23. Condition better!
CREATE TABLE orders (
…
active char(1) NOT NULL CHECK (active IN ('Y', 'N'))
WHERE active != 'N'
ORDER BY order_date DESC
) WHERE rownum <= 10;
Statistics
12345 consistent gets

WHERE active = 'Y'
ORDER BY order_date DESC
) WHERE rownum <= 10;
Statistics
10 consistent gets
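The slide does not show which index backs these two queries. A minimal sketch, assuming a composite index on (active, order_date) with the hypothetical name i_orders_active_date, shows why the equality form is the one that can stop early.

```sql
-- Hypothetical index (not shown in the slides): the equality column
-- first, then the ORDER BY column.
CREATE INDEX i_orders_active_date ON orders(active, order_date);

-- active = 'Y' pins the leading index column to a single value, so the
-- matching entries can be read in order_date order and the scan can
-- stop after 10 rows.
SELECT * FROM (
  SELECT * FROM orders
  WHERE active = 'Y'
  ORDER BY order_date DESC
) WHERE rownum <= 10;

-- active != 'N' selects the same rows here because of the CHECK
-- constraint, but an inequality does not pin the leading column,
-- so the optimizer generally cannot use one ordered range scan for it.
SELECT * FROM (
  SELECT * FROM orders
  WHERE active != 'N'
  ORDER BY order_date DESC
) WHERE rownum <= 10;
```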
24. Trade WHERE for ORDER BY
CREATE INDEX t_idx ON t(a, b, c);
SELECT * FROM (SELECT * FROM t WHERE a=12 ORDER BY c)
WHERE rownum <= 10;

WHERE a=12 ORDER BY c
Statistics
1200 consistent gets

WHERE a=12 ORDER BY b, c
Statistics
12 consistent gets

WHERE a=12 AND b=0 ORDER BY c
Statistics
12 consistent gets
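Written out in full, the three variants look as follows. The comments give the likely reason for the difference in consistent gets, assuming t_idx is the only index used; note that the three statements are not equivalent and return different rows.

```sql
-- Index from the slide: t_idx ON t(a, b, c)

-- 1. b is neither fixed nor part of the ORDER BY. Entries for a=12 are
--    stored in (b, c) order, not c order, so all of them must be read
--    and sorted before the first 10 can be returned.
SELECT * FROM (SELECT * FROM t WHERE a=12 ORDER BY c)
WHERE rownum <= 10;

-- 2. The ORDER BY follows the index column order after the equality
--    on a, so the scan can stop after 10 entries.
SELECT * FROM (SELECT * FROM t WHERE a=12 ORDER BY b, c)
WHERE rownum <= 10;

-- 3. b is fixed by an equality, so within (a=12, b=0) the entries are
--    already ordered by c and the scan can stop after 10 entries.
SELECT * FROM (SELECT * FROM t WHERE a=12 AND b=0 ORDER BY c)
WHERE rownum <= 10;
```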
25. Tolerate filtering
SELECT * FROM (
SELECT name, population
FROM cities
WHERE state != 'Florida'
ORDER BY population DESC
) WHERE rownum <= 10;
Statistics
28 consistent gets
26. Tolerate filtering
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 00:00:01 |
|* 1 | COUNT STOPKEY | | | |
| 2 | VIEW | | 11 | 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 55566 | 00:00:01 |
| 4 | INDEX RANGE SCAN DESCENDING| I_POP | 12 | 00:00:01 |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=10)
3 - filter("STATE"<>'Florida')
27. Updates and Deletes
SQL> @desc cities2
Name Null? Type
---------------------- -------- ----------------
NAME NOT NULL VARCHAR2(100)
STATE NOT NULL VARCHAR2(100)
POPULATION NOT NULL NUMBER
BUDGET_SURPLUS NOT NULL VARCHAR2(1)
CREATE INDEX i2_pop
ON cities2(budget_surplus, population, name);
28. Updates and Deletes
SELECT * FROM (
SELECT name, population FROM cities2
WHERE budget_surplus='Y' ORDER BY population DESC
) WHERE rownum <= 5;
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 00:00:01 |
|* 1 | COUNT STOPKEY | | | |
| 2 | VIEW | | 12 | 00:00:01 |
|* 3 | INDEX RANGE SCAN DESCENDING| I2_POP | 56067 | 00:00:01 |
-------------------------------------------------------------------
Statistics
7 consistent gets
29. Updates and Deletes
UPDATE cities2 SET budget_surplus='N' WHERE rowid IN (
SELECT * FROM (
SELECT rowid FROM cities2 ORDER BY population DESC
) WHERE rownum <= 200);
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 00:00:01 |
|* 1 | COUNT STOPKEY | | | |
| 2 | VIEW | | 12 | 00:00:01 |
|* 3 | INDEX RANGE SCAN DESCENDING| I2_POP | 56067 | 00:00:01 |
-------------------------------------------------------------------
Statistics
207 consistent gets
31. Updates and Deletes
ALTER TABLE cities2 ADD (version number default 0 NOT NULL);
CREATE INDEX i2_vpop
ON cities2(budget_surplus, version, population);
UPDATE cities2 SET version=1
WHERE budget_surplus='Y' AND version=0;
[Figure: index entries keyed by Budget_surplus (Y), then Version (0, 1), then Population]
32. Updates and Deletes
SELECT * FROM (
SELECT name, population FROM cities2
WHERE budget_surplus='Y' AND version=1
ORDER BY population DESC
) WHERE rownum <= 5;
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 00:00:01 |
|* 1 | COUNT STOPKEY | | | |
| 2 | VIEW | | 1 | 00:00:01 |
|* 3 | INDEX RANGE SCAN DESCENDING| I2_VPOP | 1 | 00:00:01 |
--------------------------------------------------------------------
Statistics
9 consistent gets
33. Pagination
First page:
SELECT * FROM (
SELECT name, population
FROM cities
WHERE state='Florida'
ORDER BY population DESC
) WHERE rownum <= 10;

Second page (rn is assigned outside the sorted subquery, so it numbers rows after the ORDER BY):
SELECT * FROM (
SELECT name, population, rownum AS rn FROM (
SELECT name, population
FROM cities
WHERE state='Florida'
ORDER BY population DESC
) WHERE rownum <= 20
) WHERE rn > 10;
34. Dumb Pagination
) WHERE rownum <= 20
) WHERE rn > 10;
Statistics
22 consistent gets

) WHERE rownum <= 30
) WHERE rn > 20;
Statistics
32 consistent gets
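On Oracle 12c and later, the same offset-style pagination can be written with the row-limiting clause. This is only a sketch of the equivalent syntax; the skipped rows still have to be read, so the cost should be expected to grow with the page number just as in the ROWNUM version.

```sql
-- Oracle 12c+: second page of 10 rows using OFFSET / FETCH.
-- The 10 rows skipped by OFFSET are still read before being discarded.
SELECT name, population
FROM cities
WHERE state='Florida'
ORDER BY population DESC
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;
```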
35. Smart pagination
Offset by row number:
SELECT * FROM (
SELECT name, population, rownum AS rn FROM (
SELECT name, population
FROM cities
WHERE state='Florida'
ORDER BY population DESC
) WHERE rownum <= 20
) WHERE rn > 10;
Statistics
22 consistent gets

Continue after the last value seen:
SELECT * FROM (
SELECT name, population
FROM cities
WHERE state='Florida'
AND population < 154750
ORDER BY population DESC
) WHERE rownum <= 10;
Statistics
12 consistent gets
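In an application, the literal 154750 would normally be the population of the last row shown on the previous page. A minimal sketch, assuming a hypothetical bind variable named :last_population:

```sql
-- :last_population is a hypothetical bind variable holding the
-- population of the last row on the previous page
-- (154750 in the slide's example).
SELECT * FROM (
  SELECT name, population
  FROM cities
  WHERE state='Florida'
  AND population < :last_population
  ORDER BY population DESC
) WHERE rownum <= 10;
```

One caveat with this simple form: if several cities share the population of the last row shown, the strict < comparison skips the ones not yet displayed. Adding a unique tiebreaker column to both the ORDER BY and the continuation condition is a common way around this.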
36. Top-N with joins: Rules
• ORDER BY only the LEADING table
• Use NESTED LOOPS
• Build indexes for STREAMING
37. Top-N with joins
SELECT * FROM (
SELECT c.name as city,
c.population, s.capital
FROM cities c, states s
WHERE c.state_id = s.id
AND c.state='Florida'
ORDER BY c.population DESC
) WHERE rownum <= 5
/
Driving table: Filter state, Order By population, Join state_id, Select name
Joined to table: Join id, Select capital
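The plans on the following slides reference indexes named I_C and I_S, but their column lists are not shown. A sketch of what they might look like, assuming they follow the driving-table and joined-table roles described for this query:

```sql
-- Assumed column lists; the slides only show the index names.
-- Driving table: filter column, then ORDER BY column, then the join
-- and select columns so the table itself need not be visited.
CREATE INDEX i_c ON cities(state, population, state_id, name);

-- Joined table: join column, then the selected column.
CREATE INDEX i_s ON states(id, capital);
```

With indexes shaped like these, a nested loops join can stream rows from the driving index in population order and probe the second index once per row, which is what allows the query to stop after five rows.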
38. Top-N with joins: Good
-------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 00:00:13 |
|* 1 | COUNT STOPKEY | | | |
| 2 | VIEW | | 10 | 00:00:13 |
| 3 | NESTED LOOPS | | 10 | 00:00:13 |
|* 4 | INDEX RANGE SCAN| I_C | 506 | 00:00:07 |
|* 5 | INDEX RANGE SCAN| I_S | 1 | 00:00:01 |
-------------------------------------------------------
39. Top-N with joins: Bad
-----------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 00:00:07 |
|* 1 | COUNT STOPKEY | | | |
| 2 | VIEW | | 10 | 00:00:07 |
|* 3 | SORT ORDER BY STOPKEY| | 10 | 00:00:07 |
|* 4 | HASH JOIN | | 10 | 00:00:07 |
|* 5 | INDEX RANGE SCAN | I_C | 506 | 00:00:07 |
|* 6 | INDEX RANGE SCAN | I_S | 1 | 00:00:01 |
-----------------------------------------------------------
This feels a bit 1990s … Let’s see some examples that are more 21st century
Nowadays every website has a search button, and users can search for anything. A search returns results, but the result set is normally huge. You need to cut it down and display the few results that are most interesting and fit the page (aka top-N), and then let the user move on to less interesting results (aka pagination). "Most interesting" can be defined in several different ways.
Notice no SORT step
Likely: return a few rows, "freeze", then return a few rows again, etc.
Filtering in "plain English": reading junk.
Notice no TABLE ACCESS BY INDEX ROWID – all data is in the index
For pagination, equality filter and ORDER BY conditions are tradeable. You can either "fix" the condition (WHERE a=…) or include the column in the ORDER BY.
Tolerating the filter can work if the number of rows it discards is small.
… And then we run a select
Alternatively, the statement can extract the current version from the index itself:
SELECT * FROM (
SELECT name, population FROM cities2
WHERE budget_surplus='Y'
AND version=(SELECT max(version) FROM cities2 WHERE budget_surplus='Y')
ORDER BY population DESC
) WHERE rownum <= 5;