Oracle analysis 101_v1.0_ext
1. Oracle Analysis 101
Simple techniques to help
analyze performance
• Glenn.Fawcett@Sun.com
> http://blogs.sun.com/~glennf
> Sr. Staff Engineer
> Performance Technologies Group
oracle_analysis_101 (12/9/08) Glenn.Fawcett@sun.com Page 1
2. Goal Statements
Introduce basic techniques that are
required to better collect and analyze
Oracle performance data.
3. Overview Collecting Data
• Developing a well defined problem statement.
• Define types of performance data and what is
important.
• Minimal set of data required for performance
engagements.
• Data quality – properly scoped and collected.
• Show techniques to gather various types of
performance data from Oracle.
> Basic STATSPACK and Automatic Workload
Repository (AWR) capabilities
> Gathering Oracle Trace Data
4. Developing a problem statement
• Be as specific as possible using business
metrics:
> Warehouse Inventory user response time increases
from 1 to 10 seconds during peak hours (10AM to
1PM PST).
> The Fulfillment batch job has increased from 1 hour to
2 hours over the past month.
• Avoid defining performance in terms of system
metrics.
> System cpu% has increased from 10% to 25% during
peak hours.
> This may be an indication of a potential or future problem,
but by itself it is NOT a problem, just a symptom.
5. CPU is not a workload metric!
• Consider the following:
> Upgrade from Older v880 server running Solaris 8.
> New server m4000 on Solaris 10.
• CPU% on new server at 60% during peak vs old
system at 50% during peak.
> Panic!!! The new server can't possibly handle any growth!!
> Escalations ensue, people flap their arms, executives get
involved... you get the picture :)
• Observations
> Need real metrics like “orders/hr”, etc...
> CPU% is not a workload metric or a measure of throughput.
> Solaris 8 often under-reports CPU% vs Solaris 10... use Tim
Cook's utility and blog:
http://blogs.sun.com/timc/entry/how_event_driven_utilization_measurement
6. Types of Performance Data
• Environmental
> Configuration (HW, OS, Network, IO, and DB)
> Event/Error logs (“messages” and “alert_xx.log”)
> System Run logs or ECOs.
• High Level statistics (Be sure to scope the data!)
> Business metrics: Orders/min, Shipments/sec, ...
> iostat, netstat, vmstat, mount, prstat, ps -ecf, ... (guds?)
> Oracle STATSPACK or AWR
• Low Level statistics
> mpstat, trapstat, cpustat, lockstat, DTrace
> Event 10046 tracing in Oracle.
7. Scoped and Correlated Data
• Focus on data around the event
> I once received a STATSPACK where the report
spanned 36 hours ☺
> Avoid data-overload... I recently received 2GB of trace files
> Averages have a funny way of distorting problems and
pointing you in the wrong direction.
> User response time and business metrics
• OS and Database statistics should be from
the SAME interval.
> Often I see an Explorer from midnight with some utilization
data paired up with a STATSPACK from the afternoon.
8. DTRACE - Cool, but not the best
place to start!
• Treats Oracle as a BLACK box.
• Can identify resource consumers, but can NOT tell
if this behavior is correct or not.
• STATSPACK or AWR can provide DB stats
overview
• Oracle Event Tracing is best for deep drill-down..
the “Dtrace” of Oracle.
9. Oracle Performance data
• STATSPACK introduced in 8.1.6
> Replaced tired bstat/estat
> Workload profiling with Persistent storage of perf data
> More detailed latch and shared pool data
> Finds HOT SQL statements to aid in SQL tuning.
• Automatic Workload Repository (AWR) in 10g
> HTML output!, Remote capabilities, sort by CPU and
Elapsed time.
• Trace Wait interface
> Enhanced in 10g
> Trace individual processes/sessions via “oradebug”
10. Overview Analysis
• Basic techniques
• Environmental (logfiles and configuration)
• STATSPACK / AWR overview
• Oracle Event Tracing (The “DTrace” of Oracle)
11. Basic techniques
● Start with a well defined problem
● Look for high-level signs of problems
– alert.log
– STATSPACK/AWR: (1st page stats)
  • Load profile
  • Top wait events
  • Hit rates
– Top SQL CPU consumers in AWR reports
Oracle Performance analysis
takes years to master... so be patient.
12. Alert.log analysis
• Startup time and messages.
> Restart frequency.
> init.ora hacking shows up “_underbar_params”
• Errors are reported to the alert.log file.
• Log file switch frequency.
Tue Aug 30 14:01:22 2005
Starting ORACLE instance (normal)    <-- Startup message.
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
....
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 10.1.0.2.0.
.....
Mon Nov 28 14:39:26 2005    <-- Log switches every 71 seconds!!
Private_strands 3 at log switch
Beginning log switch checkpoint up to RBA [0x19d.2.10], SCN: 0x0000.00478e91
Thread 1 advanced to log sequence 413
Current log# 1 seq# 413 mem# 0: /export/home/oracle/oradata/GLENNF/redo01.log
Mon Nov 28 14:40:37 2005
Private_strands 3 at log switch
Beginning log switch checkpoint up to RBA [0x19e.2.10], SCN: 0x0000.00478ead
Thread 1 advanced to log sequence 414
Current log# 2 seq# 414 mem# 0: /export/home/oracle/oradata/GLENNF/redo02.log
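The log switch interval can be read off the excerpt by hand, but on a long alert.log it is easier to compute. The following is a minimal sketch, assuming the timestamp format shown in the excerpt and that each "advanced to log sequence" line belongs to the most recent timestamp line above it. The function name and the sample list are hypothetical, and timestamp formats differ in other Oracle releases.

```python
from datetime import datetime

# Timestamp format seen in the excerpt, e.g. "Mon Nov 28 14:39:26 2005".
# Assumption: other releases may write timestamps differently.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def log_switch_intervals(lines):
    """Return the seconds between consecutive log switches."""
    last_ts = None      # most recent timestamp line seen so far
    switch_times = []   # timestamp in effect at each log switch
    for line in lines:
        text = line.strip()
        try:
            last_ts = datetime.strptime(text, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, fall through
        if "advanced to log sequence" in text and last_ts is not None:
            switch_times.append(last_ts)
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(switch_times, switch_times[1:])
    ]


# Lines copied from the excerpt above.
sample = [
    "Mon Nov 28 14:39:26 2005",
    "Private_strands 3 at log switch",
    "Thread 1 advanced to log sequence 413",
    "Mon Nov 28 14:40:37 2005",
    "Private_strands 3 at log switch",
    "Thread 1 advanced to log sequence 414",
]
intervals = log_switch_intervals(sample)
print(intervals)  # [71.0]
```

Run against a full alert.log, a list of consistently small intervals is the signal to look at redo log sizing and redo generation rate.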
13. AWR / Statspack Analysis 101
• GOAL:
> Give basic guidance when looking at an AWR or
STATSPACK report.
• Answer basic questions like:
> What is the scope of the data collected?
> Is this RAC or single instance?
> How many connections?
> What is the transaction rate?
> IO rate? Cache hit rate?
> How much CPU is being used?
> What SQL is using the most CPU, IO?
14. HEADER for Statspack/AWR
• A fair amount of information can be gleaned from just the
header.
> Callouts on the header screenshot: RAC cluster, sample interval,
650+ connections... (shadow processes)
15. Scoping issues
• Example #1. Can you find the issue?
WORKLOAD REPOSITORY report for
DB Name DB Id Instance Inst Num Release RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
PROD 4060419904 PROD2 2 10.2.0.3.0 YES thdtoltpr02
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 25738 15-May-08 09:00:12 828 15.6
End Snap: 25744 15-May-08 13:01:03 832 15.6
Elapsed: 240.86 (mins)
DB Time: 2405.07 (mins)
Notice Gap in Snap IDs? 4 hour window??
Oracle by default schedules
AWR by the hour.
16. Scoping issues cont...
• Example #2: What's wrong with this sample?
> 30min collection interval is good.
STATSPACK report for
DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
SWINGBCH 861079668 SWINGBCH 1 9.2.0.6.0 NO dc1-beta
Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 221 27-Apr-07 02:00:06 14 48.6
End Snap: 223 27-Apr-07 02:30:07 3,017 34.5
Elapsed: 30.02 (mins)
This is an application
startup phase.
3000 sessions were added
in the 30min interval!!
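Both scoping examples can be caught with a simple check on the report header before any deeper analysis. The sketch below is illustrative only: the function name, the 60-minute window limit, and the 50% session-swing limit are assumptions chosen for this example, not values from Oracle or from the slides.

```python
def scope_warnings(elapsed_min, begin_sessions, end_sessions,
                   max_window_min=60.0, max_session_swing=0.5):
    """Flag report headers that are poorly scoped.

    The thresholds are hypothetical defaults; tune them for your site.
    """
    warnings = []
    # A long window averages away the interesting peak.
    if elapsed_min > max_window_min:
        warnings.append("long_window")
    # A large change in session count suggests startup or shutdown,
    # not steady-state workload.
    swing = abs(end_sessions - begin_sessions) / max(begin_sessions, 1)
    if swing > max_session_swing:
        warnings.append("session_swing")
    return warnings


# Header values from Example #1 and Example #2.
example_1 = scope_warnings(240.86, 828, 832)
example_2 = scope_warnings(30.02, 14, 3017)
print(example_1)  # ['long_window']
print(example_2)  # ['session_swing']
```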
17. Oracle Cache Sizes
• Shows Default Buffer cache, shared pool, recycle, ..
• Caches use IPC shared memory.
> “ipcs -mb” shows segments from OS point of view
> “pmap -xs <orapid>” shows pages and sizes from the OS
point of view
With DISM, caches
can grow and shrink
Cache Sizes
~~~~~~~~~~~ Begin End
---------- ----------
Buffer Cache: 5,712M 5,712M Std Block Size: 8K
Shared Pool Size: 2,864M 2,864M Log Buffer: 14,376K
Oracle block size. 8K is the
safest by far. All development
and optimizer work is with 8K.
18. Load Profile
• How many transactions/sec?
• IO profile? Query profile?
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 14,529,454.45 506,509.90
Logical reads: 154,624.04 5,390.33
Block changes: 45,862.25 1,598.80
Physical reads: 196.92 6.86
Physical writes: 794.24 27.69
User calls: 148.29 5.17
Parses: 34.47 1.20
Hard parses: 0.00 0.00
Sorts: 15.67 0.55
Logons: 0.29 0.01
Executes: 98.55 3.44
Transactions: 28.69
% Blocks changed per Read: 29.66 Recursive Call %: 48.51
Rollback per transaction %: 0.02 Rows per Sort: 1137.18
19. Load Profile: Apples and Oranges!
• As “Joe the DBA” might say:
– “Nothing's changed”
– “It's the same application”
• Verify it is the same... The truth is in the DATA!
• Key metrics: Logical IO, Physical IO, Transaction profile.
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 14,529,454.45 506,509.90
Logical reads: 154,624.04 5,390.33
Block changes: 45,862.25 1,598.80
Physical reads: 196.92 6.86
Physical writes: 794.24 27.69
User calls: 148.29 5.17
Parses: 34.47 1.20
Hard parses: 0.00 0.00
Sorts: 15.67 0.55
Logons: 0.29 0.01
Executes: 98.55 3.44
Transactions: 28.69
% Blocks changed per Read: 29.66 Recursive Call %: 48.51
Rollback per transaction %: 0.02 Rows per Sort: 1137.18
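One way to test a "nothing's changed" claim is to compare per-transaction figures from two Load Profiles and flag anything that moved by a large factor. This is a rough sketch: the function name and the 2x threshold are assumptions. The first set of numbers comes from the Load Profile above. The second set is borrowed from the "warning signs" Load Profile elsewhere in this deck purely as stand-in data, since no true before/after pair is given.

```python
def changed_metrics(baseline, candidate, factor=2.0):
    """Return metrics whose per-transaction value moved by more than `factor`.

    `factor` is a hypothetical threshold; 2.0 means doubled or halved.
    """
    changed = {}
    for name, base in baseline.items():
        ratio = candidate[name] / base
        if ratio > factor or ratio < 1.0 / factor:
            changed[name] = round(ratio, 3)
    return changed


# Per-transaction values taken from two Load Profile sections.
report_a = {"logical_reads": 5390.33, "redo_size": 506509.90, "executes": 3.44}
report_b = {"logical_reads": 1888.74, "redo_size": 2192.82, "executes": 7.48}

flagged = changed_metrics(report_a, report_b)
print(sorted(flagged))  # ['executes', 'logical_reads', 'redo_size']
```

If every key metric is flagged, as here, the two reports do not describe the same workload, whatever anyone remembers changing.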
20. Load Profile... warning signs
• High physical IO rate.
• Hard parses... should primarily be soft parses.
• High “Logons/sec”... use persistent connections!
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 1,282,493.19 2,192.82
Logical reads: 1,104,645.30 1,888.74
Block changes: 9,286.08 15.88
Physical reads: 48,975.96 0.01
Physical writes: 11,983.33 0.37
User calls: 484.33 0.83
Parses: 79.70 0.14
Hard parses: 0.14 0.00
Sorts: 6.74 0.01
Logons: 1.56 0.00
Executes: 4,375.60 7.48
Transactions: 584.86
% Blocks changed per Read: 0.84 Recursive Call %: 97.13
Rollback per transaction %: 1.30 Rows per Sort: 527.7
21. Instance Efficiency Percentages
• Buffer Hit rate
> Values below 99% are suspect for OLTP.
• Shared Pool “% SQL with exec > 1”
> low values mean poor reuse of shared statements
> SQL without bind variables..
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.99 Redo NoWait %: 100.00
Buffer Hit %: 98.92 In-memory Sort %: 100.00
Library Hit %: 100.15 Soft Parse %: 99.87
Execute to Parse %: 98.15 Latch Hit %: 99.81
Parse CPU to Parse Elapsd %: 93.41 % Non-Parse CPU: 99.89
Shared Pool Statistics Begin End
------ ------
Memory Usage %: 68.00 68.06
% SQL with executions>1: 98.40 95.83
% Memory for SQL w/exec>1: 96.84 94.28
22. Top 5 Timed Events
• Wait events
> Shows where “Oracle” connections wait.
> Bad problems usually show up here first.
> This is an Average of all sessions, so treat it as such.
> This is a good sample of the TOP 5 events
> CPU and IO are the top events.
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time 3,641 60.6
db file sequential read 268,976 1,375 5 22.9 User I/O
gc cr grant 2-way 218,866 384 2 6.4 Cluster
log file sync 12,625 131 10 2.2 Commit
gc current block 2-way 61,056 130 2 2.2 Cluster
-------------------------------------------------------------
23. CPU time in Oracle
• Total amount of CPU seconds during the sample interval.
> CPU is typically one of the top stats... along with IO.
> Can calculate CPU utilization!
> Useful for consolidation since only the CPU time for this
instance is considered.
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 7380 07-Nov-08 20:00:43 1,375 72.0
End Snap: 7382 07-Nov-08 20:30:59 1,361 71.9
Elapsed: 30.27 (mins)
DB Time: 395.62 (mins)
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time 14,845 62.5
db file sequential read 1,146,234 8,873 8 37.4 User I/O
db file scattered read 21,784 545 25 2.3 User I/O
read by other session 53,589 244 5 1.0 User I/O
14845/(30.27*60) = 8.17 CPUs busy for “usr” time.
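The calculation above is simply CPU seconds divided by elapsed seconds. A small sketch of the same arithmetic follows. The function names are hypothetical, and the 16-CPU host used for the utilization figure is an assumed value, since the slide does not state the machine size.

```python
def busy_cpus(cpu_seconds, elapsed_minutes):
    """Average number of CPUs kept busy by this instance over the interval."""
    return cpu_seconds / (elapsed_minutes * 60.0)


def cpu_utilization(cpu_seconds, elapsed_minutes, n_cpus):
    """Fraction of the host consumed by this instance, given its CPU count."""
    return busy_cpus(cpu_seconds, elapsed_minutes) / n_cpus


# "CPU time" and "Elapsed" from the report above.
avg_busy = busy_cpus(14845, 30.27)
print(round(avg_busy, 2))  # 8.17

# Assumption: a 16-CPU host, used only to illustrate the second step.
print(cpu_utilization(14845, 30.27, n_cpus=16))
```

Because the input is this instance's own CPU time, the result can be summed across instances when sizing a consolidated server.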
24. Drill down on Expensive SQL
• Which SQL is using the most CPU?
> Allows you to quickly locate expensive SQL statements...
but beware, this might not be the problem :)
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
18,641,061 2,645 7,047.7 39.9 369.37 372.72 3894562395
Module: JDBC Thin Client
insert into PLANARRIV (item, source, dest, transmode, needarrivd
ate, schedarrivdate, needshipdate, schedshipdate, expdate, qty,
firmplansw, seqnum, substqty, departuredate, deliverydate, orde
rplacedate, sourcing ) values ( :1, :2, :3, :4, :5, :6, :7, :8,
:9, :10, :11, :12, :13, :14, :15, :16, :17 )
6,867,117 377 18,215.2 14.7 66.63 94.12 1924417985
Module: JDBC Thin Client
SELECT sku.item,sku.loc,item.perishablesw,loc.ohpost,sku.oh,sku
.ohpost,sku.replentype,skudemandparam.alloccal,skudemandparam.cc
psw,skudemandparam.custorderdur,skudemandparam.dmdredid,skudeman
dparam.dmdtodate,skudemandparam.fcstadjrule,skudemandparam.fcstc
onsumptionrule,skudemandparam.fcstprimconsdur,skudemandparam.fcs
25. Problem wait events
• “enq”, “buffer busy”, “latch free”.. Often a sign of too many
connections or application problems.
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
enq: TX - row lock contention 427,949 157,270 367 26.2 Applicatio
CPU time 113,999 19.0
gc buffer busy 3,642,184 95,627 26 16.0 Cluster
gc current block busy 2,264,273 76,874 34 12.8 Cluster
db file scattered read 351,146 30,238 86 5.0 User I/O
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
gc buffer busy 556,725 487,263 875 89.7 Cluster
db file sequential read 10,814 9,982 923 1.8 User I/O
enq: HW - contention 7,313 899 123 0.2 Configurat
CPU time 852 0.2
gc current multi block request 901 796 883 0.1 Cluster
% Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free 4,542,675 1,137,914 79.04
log file sync 242,359 164,671 11.44
buffer busy waits 102,540 61,887 4.30
enqueue 35,142 42,498 2.95
CPU time 25,310 1.76
26. Problem wait events... “log file sync”
• Too many connections lead to scheduling issues.
• Rarely an IO issue.... but check Log file io just in case.
• 2ms or less is desirable
• Many bugs... Use 10.2.0.4.. (Checksum bug #6814520 in 10.2.0.3)
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync 107,090 82,401 769 29.1 Commit
enq: HW - contention 78,617 29,060 370 10.3 Configurat
db file sequential read 25,928 24,612 949 8.7 User I/O
gc buffer busy 7,803 5,906 757 2.1 Cluster
Further down the AWR you see all wait events...
Avg
%Time Total Wait wait Waits
Event Waits -outs Time (s) (ms) /txn
---------------------------- -------------- ------ ----------- ------- ---------
log file sync 107,090 77.4 82,401 769 4.5
enq: HW - contention 78,617 73.7 29,060 370 3.3
...
...
log file sequential read 3,975 .0 86 22 0.2
log file parallel write 27,333 .0 86 3 1.1
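The full wait event list shows why this is rarely an IO issue: the average wait is total wait time divided by the number of waits, and comparing "log file sync" with "log file parallel write" separates the time sessions wait from the time the log write itself takes. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def avg_wait_ms(total_wait_s, waits):
    """Average wait in milliseconds from the Waits and Time (s) columns."""
    return total_wait_s * 1000.0 / waits


# Values from the wait event listing above.
sync_ms = avg_wait_ms(82401, 107090)    # what sessions experience on commit
write_ms = avg_wait_ms(86, 27333)       # what the log write itself costs

print(round(sync_ms))   # 769
print(round(write_ms))  # 3
```

With the log write averaging about 3 ms and sessions waiting about 769 ms, nearly all of the commit wait is spent somewhere other than the redo log IO, which points toward scheduling rather than storage.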
27. IO wait events
• You can get avg wait for IO from the Top 5 events.
> Oracle's statistic: “db file sequential read”
– Storage centric view: “Random single block IO”
> Oracle's statistic: “db file scattered read”
– Storage centric view : “Sequential IO”... HUH?
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time 17,186 75.3
db file sequential read 744,522 5,874 8 25.7 User I/O
db file scattered read 23,809 459 19 2.0 User I/O
28. More IO information...
• Reads by Tablespace, Datafile, SQL statement, ...
Tablespace IO Stats DB/Inst: ITMSCMP/itscr11p Snaps: 6785-6788
-> ordered by IOs (Reads + Writes) desc
Tablespace
------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
DATA_TS
477,801 180 8.1 2.6 99,555 37 7,016 5.5
INDEX_TS
186,082 70 8.3 1.0 64,924 24 30,214 0.9
Tablespace Filename
------------------------ ----------------------------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
AMG_ALBUM_IDX_TS /oradata/itmscmp/data2/amg_album_idx_ts01.dbf
392 0 7.0 1.0 5 0 0 0.0
AMG_ALBUM_TS /oradata/itmscmp/data3/amg_album_ts01.dbf
7,604 3 7.4 1.0 5 0 2 10.0
29. Even More IO information...
• Reads by SQL statement, Database object
SQL ordered by Reads DB/Inst: TTOPERF1/ttoperf15 Snaps: 5141-5142
-> Total Disk Reads: 318,079
-> Captured SQL account for 133.5% of Total
Reads CPU Elapsed
Physical Reads Executions per Exec %Total Time (s) Time (s) SQL Id
-------------- ----------- ------------- ------ -------- --------- -------------
811,212 1 811,212.0 51.4 538.11 1013.10 2j2g639a9s4kx
Module: sqlplus@itscontentrepdb05 (TNS V1-V3)
select /*+ parallel(ppc, 2) */ count(distinct p.adam_id) from mz_playlist p, mz
_playlist_price_cache ppc where p.first_production_release is not null and p.las
t_production_release is null and p.playlist_id=ppc.playlist_id and (ppc.start_da
te is NULL or ppc.start_date <= sysdate) and (ppc.end_date is NULL or ppc.end_da
Segments by Physical Reads DB/Inst: ITMSCMP/itscr11p Snaps: 6785-6788
-> Total Physical Reads: 1,577,615
-> Captured Segments account for 81.5% of Total
Tablespace Subobject Obj. Physical
Owner Name Object Name Name Type Reads %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
CONTENT_OW DATA_TS MZ_PLAYLIST_PRICE_CA TABLE 723,411 45.85
CONTENT_OW DATA_TS MZ_PLAYLIST__LS TABLE 87,947 5.57
CONTENT_OW DATA_TS MZ_USER_REVIEW TABLE 79,534 5.04
CONTENT_OW DATA_TS MZ_PRODUCT__LS TABLE 52,580 3.33
CONTENT_OW DATA_TS MZ_PODCAST_EPISODE_2 TABLE 43,243 2.74
30. “Who needs iostat?”
• IO rate information from the Load Profile
> physical reads/writes per second
• IO service time(s) from wait events
• IO broken down by Tablespace and Datafile, etc..
• Seriously, who needs it?
• Sorry, you still need “iostat”.
> Like the CPU wait events, IO events are only from this instance.
> Times aren't accurate on an over-processed system.
> iostat from the system point of view
> “storage level” analytics are useful as well!
> They often don't match due to
> IO configuration and layout
> Scheduling
31. Case Study: Oracle Applications BM
• Benchmark for DIT (India IRS :)
• Configuration
> E20K with 36 USIV @1200MHz
> Solaris 10 with Oracle 9iR2
• Oracle Statistics
> STATSPACK
> Event trace
• Problem Statement:
> Unable to support more than 2000 users within a 2 second average
response time. The goal is 4000 users. At 2000 users the system
is fully utilized at 100% CPU.
32. Case Study: STATSPACK Data
• STATSPACK data showed severe latch contention
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free 10,597,141 1,425,538 97.52
CPU time 25,842 1.77
row cache lock 105,066 4,235 .29
enqueue 7,065 2,438 .17
buffer busy waits 23,785 2,195 .15
• Drill down by CPU, IO, etc... didn't show the problem.
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
264,391,557 50,560 5,229.3 44.7 1799.33 3566.29 3184176672
Module: f90runm@sleepy (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE
_CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y
R,PAN,DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SEL
ECT a.SEQ_NO FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO
= :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and
75,312,641 113,269 664.9 12.7 1063.39 1451.77 3785480933
select max(nvl(option$,0)) from sysauth$ where privilege#=:1 con
nect by grantee#=prior privilege# and privilege#>0 start with (g
rantee#=:2 or grantee#=1) and privilege#>0 group by privilege#
33. Case Study: Oracle Trace Top-level
• Using Oracle event trace allowed us to narrow our focus and
concentrate on the true bottleneck.
• Gathered several *.trc files and used “orasrp” to analyze.
• Drilled down on “latch free” events as shown in profile below...
34. Case Study: Oracle Trace “latch free”
• Drilling down again on statements which contribute the most to
“latch free” shows an interesting pattern with the “dual” table... a
well known problem in Oracle 9i.
35. Case Study: Summary
• OS showed 100% CPU utilization, but no anomalies. DTRACE
was not helpful here either.
• STATSPACK provided starting point of problem.
• Oracle Trace interface and “response-time” profiling pinned
down the source of the problem.
• Researched “dual” table problem on-line (metalink)
> Problem is fixed in 10g
> Trick / workaround for 9i.
> Re-coding to avoid is Best!!
36. Oracle Resources
• http://metalink.oracle.com - Oracle's Metalink
> Need an account. Check oracle-interest@sun.com archives for latest.
> Research bugs, tech tips, download patches, ...
• http://technet.oracle.com – Oracle's Technet
> Documentation, white papers, ...
• http://asktom.oracle.com Misc questions mostly dba but some perf
• http://www.oraperf.com - Analyzer for STATSPACK files!!
• http://oracledba.ru/orasrp/ - Oracle Session profiler.
• http://method-r.com/ - Great papers and insight – Cary Millsap
• http://www.orapub.com - Papers, advice, ...
• Nasty bug for 10.2.0.3 : Checksum bug #6814520
37. More references and resources
• metalink.oracle.com documents on Trace
> 245981.1 – Trace wait functionality in 10g
> 21154.1 – Enabling Tracing (session level)
> 1058210.6 – Enabling Tracing ORADEBUG
> 39817.1 – Interpreting Raw trace data
• Oracle papers
> Avoiding Common Oracle Performance Problems
> http://www.sun.com/blueprints/0303/817-1781.pdf
• Sun Blogs
> Oracle performance on Sun
> http://blogs.sun.com/glennf
> Tim Cook's Solaris 8,9,10 CPU% blog and “old-new” utility.
> http://blogs.sun.com/timc/entry/how_event_driven_utilization_measurement
38. Summary
• Identify and define the problem
• Collect and identify Oracle performance data
> Alert.log
> STATSPACK
> Oracle Tracing and analysis
• Know when to say when
> Use experts to help guide analysis.
> Avoid Google performance hackers.
40. Extra slides...
41. Where is the Oracle Data?
• Alert logs & Trace Files
$ORACLE_HOME/rdbms/log ##Default
• Optimal Flexible Architecture (OFA) is common to manage
multiple instances
> Places files in a set location to ease administration.
> User Trace and Alert.log found in:
“USER_DUMP_DEST” init.ora over-rides Default.
“BACKGROUND_DUMP_DEST” for server files...
Including the “alert.log” file.
• Full OFA documentation
http://www.hotsos.com/e-library/abstract.php?id=19
42. Using STATSPACK
• Install package from $ORACLE_HOME/rdbms/admin
SQL> connect / as sysdba
SQL> @?/rdbms/admin/spcreate ## Usually not necessary
• Take snapshots throughout the day. Often an hourly job.
SQL> connect perfstat/perfstat
SQL> exec statspack.snap(i_snap_level=>7);
...
... run workload ...
...
SQL> exec statspack.snap(i_snap_level=>7);
• Run “spreport.sql” and select two intervals
SQL> @?/rdbms/admin/spreport ## Run report
• init.ora “statistics_level=ALL”
> Necessary to get details about Query plans and Segment
statistics.
43. Using Automatic Workload Repository
• AWR is installed automatically as part of 10g.
• Snapshot
SQL> connect / as sysdba
SQL> exec dbms_workload_repository.create_snapshot();
...run test....
SQL> exec dbms_workload_repository.create_snapshot();
• Run “@?/rdbms/admin/awrrpt” and select two snapshots.
44. Show Query plans
• Further drill down with ?/rdbms/admin/sprepsql.sql (STATSPACK)
or ?/rdbms/admin/awrsqrpt.sql (AWR)
> Get full stats and QEP given the hash value (SQL Id in AWR) of the statement
SQL Statistics
~~~~~~~~~~~~~~
-> CPU and Elapsed Time are in seconds (s) for Statement Total and in
milliseconds (ms) for Per Execute
% Snap
Statement Total Per Execute Total
--------------- --------------- ------
Buffer Gets: 6,867,117 18,215.2 14.71
Disk Reads: 3,887 10.3 6.54
Rows processed: 378,635 1,004.3
CPU Time(s/ms): 67 176.7
Elapsed Time(s/ms): 94 249.6
Sorts: 377 1.0
Parse Calls: 0 .0
Invalidations: 0
Version count: 1
Sharable Mem(K): 346
Executions: 377
46. Drill down on Object Statistics
• Object statistics...
> Which objects are doing the most IO?
> Which objects get the most Buffer Busy Waits?
Subobject Obj. Physical
Owner Tablespace Object Name Name Type Reads %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
STSC INDX DFUTOSKUFCST_PK INDEX 16,784 28.24
STSC DATA DFUTOSKUFCST TABLE 14,792 24.89
STSC DATA SKUPLANNINGPARAM TABLE 5,412 9.11
STSC DATA SOURCING TABLE 3,644 6.13
STSC DATA SKUSAFETYSTOCKPARAM TABLE 2,923 4.92
-------------------------------------------------------------
Buffer
Subobject Obj. Busy
Owner Tablespace Object Name Name Type Waits %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
STSC DATA PLANARRIV TABLE 95,897 94.22
STSC INDX PLANARRIV_PK INDEX 3,999 3.93
STSC DATA SKU TABLE 466 .46
STSC INDX XIF4RECSHIP INDEX 391 .38
STSC DATA RECSHIP TABLE 347 .34
-------------------------------------------------------------
47. Using the Trace Wait Interface
• Oracle tracing is a lot like truss or Dtrace for the database.
> What is a particular “shadow” process doing? (SQL statements, wait
events, ...)
> Trace produces *.trc file in udump directory. (Post process with HOTSOS
profiler or ORASRP)
SQL> connect / as sysdba
SQL> oradebug setospid 5544
SQL> oradebug event 10046 trace name context forever, level 12
...wait for a while...
SQL> oradebug event 10046 trace name context off
==ora_5544.trc file====
EXEC #2:c=0,e=324,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=677730703911
WAIT #2: nam='db file sequential read' ela= 5954 p1=1 p2=15356 p3=1
WAIT #2: nam='db file sequential read' ela= 7235 p1=1 p2=14168 p3=1
FETCH #2:c=10000,e=13869,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=4,tim=677730717849
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=6251 op='TABLE ACCESS FULL SQLPLUS_PRODUCT_PROFILE (cr=3 r=0 w=0 time=121 us)'
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ#(18) (cr=3 r=2 w=0 time=13829 us)'
STAT #2 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX UNIQUE SCAN OBJ#(36) (cr=2 r=1 w=0 time=6350 us)'
WAIT #1: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 256 p1=1650815232 p2=1 p3=0
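Profilers such as ORASRP do far more than this, but the core idea of a response-time profile can be sketched in a few lines: sum the "ela" value of every WAIT line by event name. The function name and regular expression below are assumptions based only on the line format in the excerpt above, and "ela" is taken to be in microseconds, as it is in 9i and later traces.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   WAIT #2: nam='db file sequential read' ela= 5954 p1=1 p2=15356 p3=1
WAIT_RE = re.compile(r"^WAIT #\d+: nam='([^']+)' ela= *(\d+)")


def wait_profile(lines):
    """Total elapsed wait (assumed microseconds) per event, largest first."""
    totals = defaultdict(int)
    for line in lines:
        match = WAIT_RE.match(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# WAIT and EXEC lines copied from the trace excerpt above.
sample = [
    "EXEC #2:c=0,e=324,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=677730703911",
    "WAIT #2: nam='db file sequential read' ela= 5954 p1=1 p2=15356 p3=1",
    "WAIT #2: nam='db file sequential read' ela= 7235 p1=1 p2=14168 p3=1",
    "WAIT #1: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0",
    "WAIT #1: nam='SQL*Net message from client' ela= 256 p1=1650815232 p2=1 p3=0",
]
profile = wait_profile(sample)
for event, micros in profile:
    print(event, micros)
```

For the excerpt, the two single-block reads dominate at 13189 in total, which is the kind of ranking a full profiler reports per statement.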
48. Response Time Profiling Trace Data
• Collect *.trc file as previously shown via oradebug or ???
• Analyze files with
> HOTSOS / Method-R profiler
> “orasrp” freeware which gives a similar profile to HOTSOS.