Are you an Oracle developer or a DBA?
Do you know the difference between aggregate and analytic functions?
Without complex sub-queries or self-joins, do you know how to:
Calculate running/cumulative totals and moving/centered averages?
List products with revenues above or below their peers or product groups?
Compute the ratio of one category’s sales to the total sales?
Select the Top-N or Top N % of the customers/products?
Classify advertisers into quartiles/n-tiles based on the revenue potential?
Compare period-over-period (year-over-year, month-over-month) growth and rank advancement?
Convert rows into columns (pivot), columns into rows (unpivot) or aggregate strings?
Perform what-if analysis and hypothetical ranking?
Analytic functions often perform better than the equivalent sub-queries or self-joins because the table typically needs to be scanned only once. They also make you more productive because there is no need to write procedural code. No wonder Tom Kyte, a well-respected Oracle guru, says analytic functions are the best thing since sliced bread.
In the first half, I will cover the basics of the various analytic functions:
Ranking: RANK, DENSE_RANK, ROW_NUMBER, NTILE, CUME_DIST, PERCENT_RANK
Windowing: SUM, AVG, MAX, MIN, FIRST_VALUE, LAST_VALUE
Reporting: RATIO_TO_REPORT
Others: FIRST/LAST, LEAD/LAG, hypothetical ranking
In the second half, I will show how powerful these functions are with a few examples.
If there is time, I will cover enhanced aggregation (the ROLLUP, CUBE and GROUPING SETS extensions to the GROUP BY clause).
This class is useful for developers and DBAs alike, especially those working in analytics, business intelligence, and data warehouse environments.
Are you already an expert in analytic functions? Then come and help me refine the content.
For more info, read
http://download.oracle.com/docs/cd/E11882_01/server.112/e16579/analysis.htm
http://download.oracle.com/docs/cd/E11882_01/server.112/e16579/aggreg.htm
Rollup and cross-tabulation across different dimensions using the ROLLUP, CUBE and GROUPING SETS extensions to the GROUP BY clause
Most active time periods (e.g. days when the most tickets are open in BZ, hours with the most take-offs and landings, months with the highest sales, 5-minute periods with the maximum number of calls made, etc.)
Data densification?
Rank last year, rank this year, rank growth, running/cumulative total (Year-To-Date/Month-To-Date summation), moving averages, Year-Over-Year comparison, sales projection, average/min/max time between one sale and the next sale, products with above and below average sales.
Overall average and sum, departmental average and sum, ranking, and job-wise ranking in one SQL statement.
2. Agenda
Difference between aggregate and analytic functions
Introduction to various analytic functions
Functions that are both aggregate and analytic
Break
More examples
Enhanced Aggregation (CUBE, ROLLUP)
3. Meeting Basics
Put your phones/pagers on vibrate/mute
Messenger: Change the status to offline or in-meeting
Remote attendees: Mute yourself (*6). Ask questions via Adobe Connect.
4. Aggregates vs. Analytics
Aggregate functions
Rows are collapsed. One row per group
Non-Group-By columns not allowed in SELECT list.
Analytic functions
Rows are not collapsed
As many rows in the output as in the input
No restrictions on the columns in the SELECT list
Evaluated after joins, WHERE, GROUP BY, HAVING clauses
Nesting not allowed
Can appear only in the SELECT or ORDER BY clause
analytic_aggr_diff.sql
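The contrast above can be sketched as follows. This is a minimal illustration, not the contents of analytic_aggr_diff.sql; the inline emp rows are hypothetical data modelled on the SCOTT demo schema.

```sql
-- Aggregate: rows are collapsed, one output row per department.
WITH emp AS (
  SELECT 10 AS deptno, 'CLARK' AS ename, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 'KING',  5000 FROM dual UNION ALL
  SELECT 20, 'JONES', 2975 FROM dual UNION ALL
  SELECT 20, 'SCOTT', 3000 FROM dual
)
SELECT deptno, SUM(sal) AS dept_sal
FROM   emp
GROUP  BY deptno;          -- 2 rows; ename cannot appear in the SELECT list

-- Analytic: rows are not collapsed, one output row per input row.
WITH emp AS (
  SELECT 10 AS deptno, 'CLARK' AS ename, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 'KING',  5000 FROM dual UNION ALL
  SELECT 20, 'JONES', 2975 FROM dual UNION ALL
  SELECT 20, 'SCOTT', 3000 FROM dual
)
SELECT deptno, ename, sal,
       SUM(sal) OVER (PARTITION BY deptno) AS dept_sal
FROM   emp;                -- 4 rows; any column may appear in the SELECT list
```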
5. Analytics vs. other methods
Show the dept, empno, sal and the sum of all salaries in their dept
Three possible ways
Using Joins
Using Scalar Sub-queries
Using Analytic Functions
analytics_vs_others.sql
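As a rough sketch of two of the three approaches (not the contents of analytics_vs_others.sql), the query below computes the departmental total once with a scalar sub-query and once with an analytic function, so the two columns can be compared side by side. The inline emp rows are hypothetical.

```sql
WITH emp AS (
  SELECT 10 AS deptno, 7782 AS empno, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 7839, 5000 FROM dual UNION ALL
  SELECT 20, 7566, 2975 FROM dual UNION ALL
  SELECT 20, 7788, 3000 FROM dual
)
SELECT e.deptno, e.empno, e.sal,
       -- Scalar sub-query: emp is visited again for every row
       (SELECT SUM(e2.sal)
        FROM   emp e2
        WHERE  e2.deptno = e.deptno)           AS dept_sal_subquery,
       -- Analytic function: computed over the rows already fetched
       SUM(e.sal) OVER (PARTITION BY e.deptno) AS dept_sal_analytic
FROM   emp e
ORDER  BY e.deptno, e.empno;
-- Both columns show 7450 for dept 10 and 5975 for dept 20.
```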
6. Anatomy of an analytic function
function (arg1, ..., argN) OVER ([partition_by_clause] [order_by_clause [windowing_clause]])
The OVER keyword marks the function as analytic
partition_by_clause: Optional. Not related to table/index partitions. Analogous to GROUP BY
order_by_clause: Mandatory for ranking and windowing functions. Optional or meaningless for others
windowing_clause: Optional. Must always be preceded by an ORDER BY clause
8. Ranking Functions
ROW_NUMBER()
RANK() – Skips ranks after duplicate ranks
DENSE_RANK() – Doesn't skip rank after duplicate ranks
NTILE(n) – Divides the ordered rows into n buckets of (nearly) equal size
CUME_DIST() – Fraction of rows with values lower than or equal to the current row's value
PERCENT_RANK() – (rank of row – 1)/(#rows – 1)
function() OVER ([PARTITION BY <c1, c2, ..>] ORDER BY <c3, ..>)
PARTITION BY clause: Optional
ORDER BY clause: Mandatory
rank_dense_rank.sql
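The ranking functions can be compared in one query. This is a minimal sketch with hypothetical inline data (not the contents of rank_dense_rank.sql); SCOTT and FORD are deliberately tied on salary.

```sql
WITH emp AS (
  SELECT 10 AS deptno, 'CLARK' AS ename, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 'KING',  5000 FROM dual UNION ALL
  SELECT 20, 'JONES', 2975 FROM dual UNION ALL
  SELECT 20, 'SCOTT', 3000 FROM dual UNION ALL
  SELECT 20, 'FORD',  3000 FROM dual
)
SELECT deptno, ename, sal,
       ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rn,
       RANK()       OVER (PARTITION BY deptno ORDER BY sal DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS dense_rnk,
       NTILE(2)     OVER (PARTITION BY deptno ORDER BY sal DESC) AS bucket
FROM   emp
ORDER  BY deptno, sal DESC;
-- In dept 20 the two 3000 salaries get RANK 1 and 1, and JONES gets RANK 3
-- but DENSE_RANK 2. ROW_NUMBER still numbers the tied rows 1 and 2, in an
-- order that is not guaranteed.
```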
9. FIRST_VALUE/LAST_VALUE/NTH_VALUE
Returns the first/last/nth value from an ordered set
FIRST_VALUE(expr [IGNORE NULLS]) OVER ([partition_by_clause] order_by_clause [windowing_clause])
The IGNORE NULLS option helps you "carry forward" a previous non-null value.
Often used in "Data Densification"
Operates on the default window (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) when a window is not explicitly specified.
NTH_VALUE introduced in 11gR2
flnth_value.sql
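The default window matters most for LAST_VALUE. The sketch below (hypothetical inline data, not the contents of flnth_value.sql) shows LAST_VALUE with and without an explicit window.

```sql
WITH emp AS (
  SELECT 10 AS deptno, 'CLARK' AS ename, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 'KING',  5000 FROM dual UNION ALL
  SELECT 20, 'JONES', 2975 FROM dual UNION ALL
  SELECT 20, 'SCOTT', 3000 FROM dual
)
SELECT deptno, ename, sal,
       FIRST_VALUE(ename) OVER (PARTITION BY deptno
                                ORDER BY sal, ename) AS lowest_paid,
       -- Default window ends at the current row, so this simply
       -- returns the current row's ename.
       LAST_VALUE(ename)  OVER (PARTITION BY deptno
                                ORDER BY sal, ename) AS last_default_window,
       -- Explicit window covering the whole partition.
       LAST_VALUE(ename)  OVER (PARTITION BY deptno
                                ORDER BY sal, ename
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                         AND UNBOUNDED FOLLOWING) AS highest_paid
FROM   emp
ORDER  BY deptno, sal;
```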
10. Window functions
Used for computing cumulative/running totals (YTD, MTD, QTD), moving/centered averages
function(args) OVER([partition_by_clause] order_by_clause [windowing_clause])
ORDER BY clause: Mandatory whenever a windowing clause is specified.
Windowing clause: Optional. Defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Windows can be anchored or sliding
Two ways to specify windows: ROWS, RANGE
{ROWS | RANGE} BETWEEN <start_exp> AND <end_exp>
window.sql
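A minimal sketch of an anchored window (running total) and a sliding window (centered average). The monthly sales rows are hypothetical and not taken from window.sql.

```sql
WITH sales AS (
  SELECT DATE '2010-01-01' AS mon, 100 AS amt FROM dual UNION ALL
  SELECT DATE '2010-02-01', 200 FROM dual UNION ALL
  SELECT DATE '2010-03-01', 300 FROM dual UNION ALL
  SELECT DATE '2010-04-01', 400 FROM dual
)
SELECT mon, amt,
       -- Anchored window: always starts at the first row
       SUM(amt) OVER (ORDER BY mon
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
       -- Sliding window: previous, current and next row
       AVG(amt) OVER (ORDER BY mon
                      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)        AS centered_avg
FROM   sales
ORDER  BY mon;
-- running_total: 100, 300, 600, 1000
-- centered_avg:  150, 200, 300, 350
```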
11. ROWS type windows
Physical offset. Number of rows before or after the current row
Non-deterministic results if rows are not sorted uniquely
Any number of columns in the ORDER BY clause
ORDER BY columns can be of any type
function(args) OVER ([partition_by_clause] ORDER BY c1, .., cN ROWS BETWEEN <start_exp> AND <end_exp>)
windows_rows.sql
12. RANGE type Windows
Logical offset
Rows with the same ORDER BY value are treated as one logical row
Only one column allowed in the ORDER BY clause when a numeric or interval offset is used
ORDER BY column should be numeric or date
function(args) OVER ([partition_by_clause] ORDER BY c1 RANGE BETWEEN <start_exp> AND <end_exp>)
windows_range.sql
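The difference between physical (ROWS) and logical (RANGE) offsets shows up when the ORDER BY value has duplicates. A small sketch with hypothetical data (not the contents of windows_rows.sql or windows_range.sql):

```sql
WITH t AS (
  SELECT 1 AS id, 10 AS val FROM dual UNION ALL
  SELECT 2, 20 FROM dual UNION ALL
  SELECT 3, 20 FROM dual UNION ALL
  SELECT 4, 30 FROM dual
)
SELECT id, val,
       -- Physical: counts rows, so the two val = 20 rows get different sums
       SUM(val) OVER (ORDER BY val
                      ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_rows,
       -- Logical: the two val = 20 rows are peers and get the same sum
       SUM(val) OVER (ORDER BY val
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_range,
       -- Logical offset: all rows whose val lies in [val - 10, val]
       SUM(val) OVER (ORDER BY val
                      RANGE BETWEEN 10 PRECEDING AND CURRENT ROW)        AS sum_range_10
FROM   t
ORDER  BY val, id;
-- sum_range:    10, 50, 50, 80
-- sum_range_10: 10, 50, 50, 70
-- sum_rows:     10, then 30 and 50 for the tied rows (which one gets
--               which is not deterministic), then 80
```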
13. Reporting function
Computes the ratio of a value to the sum of a set of values
RATIO_TO_REPORT(arg) OVER ([PARTITION BY <c1, .., cN>])
PARTITION BY clause: Optional.
ratio_to_report.sql
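A minimal sketch of RATIO_TO_REPORT with and without PARTITION BY, using hypothetical inline data (not the contents of ratio_to_report.sql):

```sql
WITH emp AS (
  SELECT 10 AS deptno, 'CLARK' AS ename, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 'KING',  5000 FROM dual UNION ALL
  SELECT 20, 'JONES', 2975 FROM dual UNION ALL
  SELECT 20, 'SCOTT', 3000 FROM dual
)
SELECT deptno, ename, sal,
       -- Share of the employee's salary within the department
       ROUND(RATIO_TO_REPORT(sal) OVER (PARTITION BY deptno), 3) AS share_of_dept,
       -- Share of the employee's salary within the whole result set
       ROUND(RATIO_TO_REPORT(sal) OVER (), 3)                    AS share_of_total
FROM   emp
ORDER  BY deptno, ename;
```

Each share column sums to 1 within its partition (apart from rounding).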
14. LAG/LEAD
Gives the ability to access other rows without self-join.
Allows you to treat cursor as an array
Useful for making inter-row calculations (year-over-year comparison, time between events)
LEAD (expr, <offset>, <default value>) [IGNORE NULLS] OVER ([partitioning_clause] orderby_clause)
Physical offset. Can be fixed or varying. Default offset is 1
Default value: value returned if the offset points to a non-existent row
IGNORE NULLS determines whether null values of expr are included in or eliminated from the calculation.
lead_lag.sql
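A year-over-year comparison can be sketched with LAG and LEAD as below. The yearly revenue figures are hypothetical and not taken from lead_lag.sql.

```sql
WITH yearly AS (
  SELECT 2008 AS yr, 1000 AS revenue FROM dual UNION ALL
  SELECT 2009, 1200 FROM dual UNION ALL
  SELECT 2010,  900 FROM dual
)
SELECT yr, revenue,
       LAG(revenue)            OVER (ORDER BY yr) AS prev_revenue,
       revenue - LAG(revenue)  OVER (ORDER BY yr) AS yoy_change,
       -- Offset 1, default 0 when there is no following row
       LEAD(revenue, 1, 0)     OVER (ORDER BY yr) AS next_revenue
FROM   yearly
ORDER  BY yr;
-- prev_revenue: NULL, 1000, 1200
-- yoy_change:   NULL,  200, -300
-- next_revenue: 1200,  900,    0
```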
15. FIRST/LAST
Very different from FIRST_VALUE/LAST_VALUE
Returns the result of an aggregate/analytic function applied to column B on the first- or last-ranked rows when sorted by column A
function (expr_with_colB) KEEP (DENSE_RANK FIRST/LAST ORDER BY colA) [OVER (<partitioning_clause>)]
Slightly different syntax. Note the word KEEP
The analytic clause is optional.
first_last.sql
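A minimal sketch of the aggregate form of FIRST/LAST, using hypothetical inline data (not the contents of first_last.sql). Here column A is sal and column B is ename.

```sql
WITH emp AS (
  SELECT 10 AS deptno, 'CLARK' AS ename, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 'KING',  5000 FROM dual UNION ALL
  SELECT 20, 'JONES', 2975 FROM dual UNION ALL
  SELECT 20, 'SCOTT', 3000 FROM dual UNION ALL
  SELECT 20, 'FORD',  3000 FROM dual
)
SELECT deptno,
       MIN(sal)                                       AS min_sal,
       -- Name of the employee on the lowest salary
       MIN(ename) KEEP (DENSE_RANK FIRST ORDER BY sal) AS lowest_paid,
       MAX(sal)                                       AS max_sal,
       -- Several rows can share the top salary; MAX picks one of their names
       MAX(ename) KEEP (DENSE_RANK LAST ORDER BY sal)  AS highest_paid
FROM   emp
GROUP  BY deptno
ORDER  BY deptno;
-- dept 10: CLARK / KING
-- dept 20: JONES / SCOTT (SCOTT and FORD tie on 3000; MAX(ename) returns SCOTT)
```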
16. Above/Below average calculation
Find the list of employees whose salary is higher than the department average.
above_average.sql
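Because analytic functions cannot appear in the WHERE clause, one possible approach is to compute the average in an inline view and filter outside it. A sketch with hypothetical data (not the contents of above_average.sql):

```sql
WITH emp AS (
  SELECT 10 AS deptno, 'CLARK' AS ename, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 'KING',  5000 FROM dual UNION ALL
  SELECT 20, 'JONES', 2975 FROM dual UNION ALL
  SELECT 20, 'SCOTT', 3000 FROM dual UNION ALL
  SELECT 20, 'FORD',  3000 FROM dual
)
SELECT deptno, ename, sal, ROUND(dept_avg, 2) AS dept_avg
FROM  (SELECT deptno, ename, sal,
              AVG(sal) OVER (PARTITION BY deptno) AS dept_avg
       FROM   emp)
WHERE  sal > dept_avg          -- filter on the analytic result
ORDER  BY deptno, ename;
-- Returns KING (dept 10) and FORD, SCOTT (dept 20).
```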
17. Top-N queries
Find the full details of the "set of" employees with the top-N salaries
Find the two most recent hires in each department
List the names and employee count of departments with the highest employee count
top_n.sql
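The first of these can be sketched with DENSE_RANK in an inline view, so that everyone sharing one of the top-N salaries is returned. The data is hypothetical and N = 2 is an arbitrary choice (this is not the contents of top_n.sql).

```sql
WITH emp AS (
  SELECT 10 AS deptno, 'CLARK' AS ename, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 'KING',  5000 FROM dual UNION ALL
  SELECT 20, 'JONES', 2975 FROM dual UNION ALL
  SELECT 20, 'SCOTT', 3000 FROM dual UNION ALL
  SELECT 20, 'FORD',  3000 FROM dual
)
SELECT deptno, ename, sal, sal_rank
FROM  (SELECT deptno, ename, sal,
              DENSE_RANK() OVER (ORDER BY sal DESC) AS sal_rank
       FROM   emp)
WHERE  sal_rank <= 2           -- top 2 distinct salaries
ORDER  BY sal_rank, ename;
-- Returns KING (rank 1) and FORD, SCOTT (both rank 2).
```

Adding PARTITION BY deptno to the OVER clause would turn this into a per-department Top-N; ROW_NUMBER instead of DENSE_RANK would return exactly N rows regardless of ties.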
19. Multi Top-N queries
For each customer, find out
the maximum sale in the last 7 days
the date of that sale
the maximum sale in the last 30 days
the date of that sale
multi_top.sql
22. Inverse Percentile functions
Return the value corresponding to a certain percentile (opposite of CUME_DIST)
PERCENTILE_CONT (continuous)
PERCENTILE_DISC (discrete)
PERCENTILE_CONT(0.5) is the same as MEDIAN
inverse_p.sql
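A minimal sketch comparing the two inverse percentile functions with MEDIAN, using hypothetical inline data (not the contents of inverse_p.sql):

```sql
WITH emp AS (
  SELECT 10 AS deptno, 2450 AS sal FROM dual UNION ALL
  SELECT 10, 5000 FROM dual UNION ALL
  SELECT 20, 2975 FROM dual UNION ALL
  SELECT 20, 3000 FROM dual UNION ALL
  SELECT 20, 3000 FROM dual
)
SELECT deptno,
       -- Continuous: interpolates between the two middle values
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sal) AS pct_cont,
       -- Discrete: always returns a value that exists in the data
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY sal) AS pct_disc,
       MEDIAN(sal)                                      AS median_sal
FROM   emp
GROUP  BY deptno
ORDER  BY deptno;
-- dept 10: pct_cont 3725, pct_disc 2450, median_sal 3725
-- dept 20: all three return 3000
```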
23. String Aggregation: LISTAGG, STRAGG
Concatenated string of values for a particular group (e.g. employees working in a dept)
Tom Kyte's STRAGG
11gR2 has LISTAGG
10g has COLLECT
listagg.sql
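A minimal LISTAGG sketch (requires 11gR2 or later), using hypothetical inline data rather than the contents of listagg.sql:

```sql
WITH emp AS (
  SELECT 10 AS deptno, 'CLARK' AS ename FROM dual UNION ALL
  SELECT 10, 'KING'  FROM dual UNION ALL
  SELECT 20, 'JONES' FROM dual UNION ALL
  SELECT 20, 'SCOTT' FROM dual UNION ALL
  SELECT 20, 'FORD'  FROM dual
)
SELECT deptno,
       LISTAGG(ename, ', ') WITHIN GROUP (ORDER BY ename) AS employees
FROM   emp
GROUP  BY deptno
ORDER  BY deptno;
-- dept 10: CLARK, KING
-- dept 20: FORD, JONES, SCOTT
```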
24. Pivoting/Unpivoting
Pivoting
transposes rows to columns
DECODE/CASE and GROUP BY used
Unpivoting
Converts columns to rows
Join the base table with a one column serial number table
11g introduced the PIVOT and UNPIVOT clauses of SELECT
pivot.sql
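Both pivoting styles can be sketched as below: first the CASE and GROUP BY technique, then the PIVOT clause. The inline data and the column aliases dept_10 and dept_20 are hypothetical (this is not the contents of pivot.sql).

```sql
-- Pivot with CASE and GROUP BY (works in older versions too)
WITH emp AS (
  SELECT 10 AS deptno, 'CLERK' AS job, 1300 AS sal FROM dual UNION ALL
  SELECT 10, 'MANAGER', 2450 FROM dual UNION ALL
  SELECT 20, 'CLERK',   1100 FROM dual UNION ALL
  SELECT 20, 'ANALYST', 3000 FROM dual
)
SELECT job,
       SUM(CASE WHEN deptno = 10 THEN sal END) AS dept_10,
       SUM(CASE WHEN deptno = 20 THEN sal END) AS dept_20
FROM   emp
GROUP  BY job
ORDER  BY job;

-- The same result with the PIVOT clause
WITH emp AS (
  SELECT 10 AS deptno, 'CLERK' AS job, 1300 AS sal FROM dual UNION ALL
  SELECT 10, 'MANAGER', 2450 FROM dual UNION ALL
  SELECT 20, 'CLERK',   1100 FROM dual UNION ALL
  SELECT 20, 'ANALYST', 3000 FROM dual
)
SELECT *
FROM  (SELECT job, deptno, sal FROM emp)
PIVOT (SUM(sal) FOR deptno IN (10 AS dept_10, 20 AS dept_20))
ORDER  BY job;
```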
25. Data Densification
Data is normally stored in sparse form (e.g. no rows if there are no sales for a particular period)
Missing data needed for comparison (e.g. month-over-month comparison)
Data Densification comes in handy
LAG (col IGNORE NULLS) and partitioned outer join (PARTITION BY with OUTER JOIN) are used.
http://hoopercharles.wordpress.com/2009/12/07/sql-filling-in-gaps-in-the-source-data/
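A rough sketch of densification with a partitioned outer join. Note that it carries values forward with LAST_VALUE ... IGNORE NULLS rather than the LAG variant named above; the months and sales rows are hypothetical.

```sql
WITH months AS (
  SELECT 1 AS mon FROM dual UNION ALL
  SELECT 2 FROM dual UNION ALL
  SELECT 3 FROM dual
),
sales AS (
  SELECT 'A' AS product, 1 AS mon, 100 AS amt FROM dual UNION ALL
  SELECT 'A', 3, 300 FROM dual UNION ALL
  SELECT 'B', 2,  50 FROM dual
)
SELECT s.product, m.mon, s.amt,
       -- Carry the last known amount forward into the gaps
       LAST_VALUE(s.amt IGNORE NULLS)
         OVER (PARTITION BY s.product ORDER BY m.mon) AS amt_carried
FROM   sales s
       PARTITION BY (s.product)          -- one outer join per product
       RIGHT OUTER JOIN months m ON (s.mon = m.mon)
ORDER  BY s.product, m.mon;
-- Every product now has a row for each of the 3 months.
-- Product A: amt 100, NULL, 300 and amt_carried 100, 100, 300
-- Product B: amt NULL, 50, NULL and amt_carried NULL, 50, 50
```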
26. When not to use analytics
When a simple group by would do the job
when_not_to_use_analytics.sq
27. Drawback of analytics
Lots of sorting.
Set PGA_AGGREGATE_TARGET/SORT_AREA_SIZE appropriately
Newer versions reduce the number of sorts (multiple analytic functions with the same partition_by and order_by clauses share a single sort)
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:1137250200346660664
http://jonathanlewis.wordpress.com/2009/09/07/analytic-agony/
28. Recap of Analytic Functions
Analytic Functions:
Were introduced in 8.1.6 (~1998)
Are supported within PL/SQL only from 10g. Use a "view" or "dynamic SQL" in older versions.
Compute the 'aggregates' while preserving the 'details'
Eliminate the need for self-joins or multiple passes on the same table
Reduce the amount of data transferred between DB and client.
Can be used only in SELECT and ORDER BY clauses. Use sub-queries if there is a need to filter.
Are computed at the end, after the join, WHERE, GROUP BY and HAVING clauses
29. Advanced Aggregation
GROUP BY col1, col2
GROUP BY ROLLUP(col1, col2)
GROUP BY CUBE(col1, col2)
GROUP BY GROUPING SETS ((col1, col2), col1)
30. ROLLUP
GROUP BY ROLLUP(col1, col2)
Generates subtotals automatically
Generally used in hierarchical dimensions (region, state, city), (year, quarter, month, day)
n + 1 different groupings, where n is the number of expressions in the ROLLUP operator in the GROUP BY clause.
The order of the columns in ROLLUP matters.
ROLLUP(col1, col2) and ROLLUP(col2, col1) produce different outputs
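A minimal ROLLUP sketch with two expressions, which therefore yields n + 1 = 3 grouping levels. The inline data is hypothetical.

```sql
WITH emp AS (
  SELECT 10 AS deptno, 'CLERK' AS job, 1300 AS sal FROM dual UNION ALL
  SELECT 10, 'MANAGER', 2450 FROM dual UNION ALL
  SELECT 20, 'CLERK',   1100 FROM dual UNION ALL
  SELECT 20, 'ANALYST', 3000 FROM dual
)
SELECT deptno, job, SUM(sal) AS total_sal
FROM   emp
GROUP  BY ROLLUP(deptno, job)
ORDER  BY deptno, job;
-- Grouping levels produced:
--   (deptno, job)  4 detail rows
--   (deptno)       2 subtotal rows (job is NULL): 3750 and 4100
--   ()             1 grand total row (deptno and job are NULL): 7850
```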
31. CUBE
GROUP BY CUBE(col1, col2)
Gives subtotals automatically for every possible combination
Used in cross-tabular reports.
Suitable when dimensions are independent of each other
2^n different groupings, where n is the number of expressions in the CUBE operator in the GROUP BY clause.
Have to be careful with higher values of n
Order of the columns in CUBE doesn't really matter.
CUBE(col1, col2) and CUBE(col2, col1) produce the same results, but in a different order.
32. Grouping Sets
GROUP BY GROUPING SETS (col1, (col1, col2))
Explicitly lists the needed groupings
GROUPING, GROUPING_ID, GROUP_ID functions help you differentiate one grouping from the other.
Advanced aggregation functions are more efficient than their UNION ALL equivalents (why?)
advanced_agg.sql
Grouping       Equivalent GROUPING SETS
CUBE(a,b)      GROUPING SETS((a,b), (a), (b), ())
ROLLUP(a,b)    GROUPING SETS((a,b), (a), ())
ROLLUP(b,a)    GROUPING SETS((a,b), (b), ())
ROLLUP(a)      GROUPING SETS((a), ())
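The CUBE equivalence can be sketched as below, with GROUPING_ID used to label each grouping. The inline data is hypothetical and the query is not taken from advanced_agg.sql.

```sql
WITH emp AS (
  SELECT 10 AS deptno, 'CLERK' AS job, 1300 AS sal FROM dual UNION ALL
  SELECT 10, 'MANAGER', 2450 FROM dual UNION ALL
  SELECT 20, 'CLERK',   1100 FROM dual UNION ALL
  SELECT 20, 'ANALYST', 3000 FROM dual
)
SELECT deptno, job, SUM(sal) AS total_sal,
       -- Bit vector: 0 = (deptno, job), 1 = (deptno), 2 = (job), 3 = ()
       GROUPING_ID(deptno, job) AS gid
FROM   emp
-- Same groupings as GROUP BY CUBE(deptno, job)
GROUP  BY GROUPING SETS ((deptno, job), (deptno), (job), ())
ORDER  BY gid, deptno, job;
```

Filtering on gid (for example in a HAVING clause) is one way to keep only the subtotal levels a report needs.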
34. Concatenated Groupings
GROUP BY GROUPING SETS (a,b), GROUPING SETS (c,d)
The above is the same as GROUP BY GROUPING SETS ((a,c), (a,d), (b,c), (b,d))
38. Predicate merging in views with analytics
create view v as select .. over(partition by ...) from t;
select ... from v where col1 = 'A'
In some cases predicates don't get merged.
Reasons:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:12864646978683#30266389821111
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:1137250200346660664
http://forums.oracle.com/forums/thread.jspa?messageID=4169151