19 Troubleshooting Tips and Tricks
For Database 19c
Sandesh Rao
VP AIOps for the Autonomous Database
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4
The following is intended to outline our general product direction. It is intended for information
purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in making purchasing decisions. The
development, release, timing, and pricing of any features or functionality described for Oracle’s
products may change and remains at the sole discretion of Oracle Corporation.
Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and
prospects are “forward-looking statements” and are subject to material risks and uncertainties. A
detailed discussion of these factors and other risks that affect our business is contained in Oracle’s
Securities and Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and
Form 10-Q under the heading “Risk Factors.” These filings are available on the SEC’s website or on
Oracle’s website at http://www.oracle.com/investor. All information in this presentation is current as of
September 2019 and Oracle undertakes no duty to update any statement in light of new information or
future events.
Safe harbor statement
1. Systemstate dumps
2. ADDM in multitenant environment
3. Analyze logs
4. Connect to a hung database
5. Guide resolution with Oracle Support
6. SQLHC
7. Query trace files using SQL
8. Hanganalyze enhancements
9. Track file attribute changes
10. Event notification
11. Self analysis on MOS with TFA
12. AWR scripts
13. Sanitize sensitive information
14. Find if anything changed
15. Detect and collect using SRDCs
16. Manage logs
17. Monitor multiple logs
18. Monitor Database performance
19. Analyze OS metrics
20. Diagnose cluster health
Agenda
1: Systemstate dumps: what are
they and how do I read them?
A systemstate is made up of the processstate of each process found in the
instance at the time the systemstate was requested.
Each processstate is made up of SOs (State Objects), which hold details of the
state of the objects currently owned by each PROCESS.
To navigate a systemstate:
1. Find the process that most sessions are waiting for
2. Recursively navigate what each process is waiting for
3. When you find a process on the CPU, get an error stack to understand
why it is blocked
Systemstate Dumps
These are waits for locks held upon a particular object. In the example below, the process is waiting for
a TX enqueue as indicated by the "waiting for 'enq: TX - row lock contention'" message:
Enqueues
Systemstate Dumps
PROCESS 41
...
waiting for 'enq: TX - row lock contention' blocking sess=0x39b3a5c90 seq=152 wait_time=0 seconds since wait started=796
name|mode=54580006, usn
* 54580006 is hexadecimal and can be split up as follows to reveal the meaning:
* ASCII 0x54 (T) + ASCII 0x58 (X) => (TX) + Mode 0006 (X) ...
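The split described above can be reproduced with a few lines of Python. This is a minimal sketch, not an Oracle utility: the function name `decode_enqueue_name_mode` and the `LOCK_MODES` table are illustrative, and the mode names follow the commonly documented Oracle lock-mode numbering, of which the slide itself only confirms that 6 means eXclusive.

```python
# Mode numbers as commonly documented for Oracle enqueues (assumption;
# the slide only confirms that mode 6 is eXclusive).
LOCK_MODES = {0: "none", 1: "NULL", 2: "SS", 3: "SX", 4: "S", 5: "SSX", 6: "X"}


def decode_enqueue_name_mode(p1_hex):
    """Split a hex name|mode value such as '54580006' into (name, mode)."""
    value = int(p1_hex, 16)
    # The top two bytes are the ASCII codes of the two-letter enqueue name.
    name = chr((value >> 24) & 0xFF) + chr((value >> 16) & 0xFF)
    # The low two bytes carry the requested lock mode.
    mode = value & 0xFFFF
    return name, mode


name, mode = decode_enqueue_name_mode("54580006")
print(name, mode, LOCK_MODES.get(mode, "unknown"))
```

For the value in the dump this yields the TX enqueue requested in mode 6, matching the 'enq: TX - row lock contention' wait.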
To find more details on the enqueue, search for the string 'req:' (searching DOWN) within the
process. In this case we find a section with a "req: X" request.
"req:" in this case refers to the "request" for the TX lock that is being waited for by the 'enq: TX - row lock
contention' wait. The request is for an eXclusive TX lock.
This section also reveals the enqueue name as a string, (TX-00020009-0001FA04), that can be used to
search for the HOLDER. The holder of the resource is shown with the string "mode:" followed by the mode
in which the lock is held, in this case eXclusive.
The holder holds the enqueue (mode: X) in a mode that is incompatible with the req: X request...
Enqueues
Systemstate Dumps
SO: 39ad80d60, type: 5, owner: 393cb85e0, flag: INIT/-/-/0x00
(enqueue) TX-00020009-0001FA04 DID: 0001-0029-00000090
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
res: 39aef20c8, req: X, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0
(enqueue) TX-00020009-0001FA04 DID: 0001-002E-00000014
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
res: 39aef20c8, mode: X, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78
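The manual search for "req:" and "mode:" can be approximated in code. The sketch below is an illustration only: it assumes the line layout shown in this excerpt (an "(enqueue)" line followed by a "res:" line carrying either "req:" or "mode:"), and the names `SAMPLE` and `enqueue_holders_and_waiters` are hypothetical. Real systemstate dumps vary by version, so treat the patterns as a starting point.

```python
import re

# Excerpt copied from the dump shown above.
SAMPLE = """\
SO: 39ad80d60, type: 5, owner: 393cb85e0, flag: INIT/-/-/0x00
(enqueue) TX-00020009-0001FA04 DID: 0001-0029-00000090
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
res: 39aef20c8, req: X, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0
(enqueue) TX-00020009-0001FA04 DID: 0001-002E-00000014
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
res: 39aef20c8, mode: X, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78
"""

ENQUEUE_RE = re.compile(r"\(enqueue\)\s+(\S+)")
# "req:" marks a waiter, "mode:" marks a holder; "sess:" is the session address.
STATE_RE = re.compile(r"\b(req|mode):\s*(\w+).*?\bsess:\s*([0-9a-fA-F]+)")


def enqueue_holders_and_waiters(dump_text):
    """Group session addresses by enqueue name into holders and waiters."""
    result = {}
    current = None
    for line in dump_text.splitlines():
        name_match = ENQUEUE_RE.search(line)
        if name_match:
            current = name_match.group(1)
            continue
        state_match = STATE_RE.search(line)
        if state_match and current is not None:
            kind, lock_mode, sess = state_match.groups()
            role = "waiters" if kind == "req" else "holders"
            entry = result.setdefault(current, {"holders": [], "waiters": []})
            entry[role].append((sess, lock_mode))
            current = None
    return result


print(enqueue_holders_and_waiters(SAMPLE))
```

On this excerpt the holder's session address (39b3a5c90) is the same value reported as "blocking sess=0x39b3a5c90" in the waiter's wait line.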
Row cache waits are waits against the Row Cache (or Dictionary Cache). Processes will show "waiting
for 'row cache lock'"
• mode=0 shows the lock is not currently held
• request=3 shows we are requesting the lock in Shared mode (mode 3)
• object=7000000eedc13a0 shows the object we are requesting the lock on
• request=S shows the lock is requested in Shared (S) mode
• cid=7(dc_users) shows the cache type of dc_users with a cache ID of 7
• mode=X shows the lock is held in eXclusive mode
Rowcache locks
Systemstate Dumps
PROCESS 19:
...
waiting for 'row cache lock' blocking sess=0x0 seq=2174 wait_time=0
cache id=7, mode=0, request=3
--------------------------------------------------------------------------------
SO: 7000000c6de7678, type: 48, owner: 7000000a6c97cf8, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=S
savepoint=2148
row cache parent object: address=7000000eedc13a0 cid=7(dc_users) hash=2a057ebe typ=9 transaction=7000000c42297a0
flags=00000002
own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6deed98] mode=X
status=VALID/-/-/-/-/-/-/-/-
request=N release=TRUE flags=0
This process is waiting for 'row cache lock'. The waiter is waiting for "object=7000000eedc13a0" and is requesting a
Share mode lock ("request=S"). To find the HOLDER, search for the same object address in an entry that carries a mode= string, which indicates a holder
Rowcache locks
Systemstate Dumps
PROCESS 19:
...
waiting for 'row cache lock' blocking sess=0x0 seq=2174 wait_time=0
cache id=7, mode=0, request=3
--------------------------------------------------------------------------------
SO: 7000000c6de7678, type: 48, owner: 7000000a6c97cf8, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=S
savepoint=2148
row cache parent object: address=7000000eedc13a0 cid=7(dc_users) hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6deed98] mode=X
status=VALID/-/-/-/-/-/-/-/-
request=N release=TRUE flags=0
SO: 7000000c6de84e8, type: 48, owner: 7000000c42297a0, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=7000000a6702710 object=7000000eedc13a0, mode=X
savepoint=109
row cache parent object: address=7000000eedc13a0 cid=7(dc_users)
hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6df1b08] mode=X
status=VALID/-/-/-/-/-/-/-/-
request=N release=TRUE flags=0
instance lock id=QH 00000440 00000000
set=0, complete=FALSE
set=1, complete=FALSE
set=2, complete=FALSE
data=
In this case the holder's mode is eXclusive
(i.e. object=7000000eedc13a0, mode=X). Search back up to the top
of this process section to find which PROCESS is holding the resource.
Waits for library cache pins are of the form "waiting for 'cursor: pin S wait on X'"
To find more details, use the idn=XXXXXX value to search down in the systemstate (here idn=535d1a6c)
• SID 3094 holds the Mutex (3094,0)
• Request is for Shared (GET_SHRD) mode
Library Cache Pins
Systemstate Dumps
PROCESS 16:
waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=58849 wait_time=0 seconds since wait started=0
idn=535d1a6c, value=c1600000000, where|sleeps=5003f2428
KGX Atomic Operation Log 7000002e5b9d160
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD
Cursor Pin uid 2489 efd 0 whr 5 slp 58733
opr=2 pso=70000028c47def0 flg=0
pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
To find the HOLDER, search for "idn XXXXXXX oper" (here "idn 535d1a6c oper") until you find
an entry that is held, i.e. one whose operation is not GET_XXX:
• SID 3094 holds Mutex in Exclusive (EXCL)
Library Cache Pins
Systemstate Dumps
KGX Atomic Operation Log 7000002cd934270
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL
Cursor Pin uid 3094 efd 0 whr 7 slp 0
opr=3 pso=7000002a71c4180 flg=0
pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
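The same "search for idn ... oper and skip GET_XXX" rule can be sketched as a small filter. This is an illustration under assumptions: the regular expression matches only the "Mutex addr(sid, n) idn X oper Y" layout shown in this excerpt, the names `SAMPLE` and `mutex_operations` are hypothetical, and the meaning of the second number in the parentheses is not stated in the slides.

```python
import re

# The two Mutex lines from the waiter and holder dumps above.
SAMPLE = """\
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL
"""

MUTEX_RE = re.compile(
    r"Mutex\s+(?P<addr>[0-9a-fA-F]+)\((?P<sid>\d+),\s*(?P<second>\d+)\)\s+"
    r"idn\s+(?P<idn>[0-9a-fA-F]+)\s+oper\s+(?P<oper>\w+)"
)


def mutex_operations(dump_text, idn):
    """Return (held, requested) lists of (sid_in_mutex_value, oper) for one idn.

    Entries whose operation starts with GET_ are requests; anything else is
    treated as a held mutex, following the rule given on the slide.
    """
    held, requested = [], []
    for match in MUTEX_RE.finditer(dump_text):
        if match.group("idn").lower() != idn.lower():
            continue
        entry = (int(match.group("sid")), match.group("oper"))
        if entry[1].startswith("GET_"):
            requested.append(entry)
        else:
            held.append(entry)
    return held, requested


held, requested = mutex_operations(SAMPLE, "535d1a6c")
print("held:", held)
print("requested:", requested)
```

Both lines record SID 3094 in the mutex value, which is how the waiter's own log already points at the holding session.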
To find more details, use the handle address in the form handle=address to search down in the
systemstate (i.e. handle=70000030de975a8)
• Exclusive (X) Requested
• <USER_NAME>.<OBJECT_NAME> is the object we are trying to lock
Library Cache Lock
Systemstate Dumps
PROCESS 35:
waiting for 'library cache lock' blocking sess=0x0 seq=35844 wait_time=0 seconds since wait started=14615
handle address=70000030de975a8, lock address=70000026947e190, 100*mode+namespace=12d
SO: 70000026947e190, type: 53, owner: 700000308d726f0, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=70000026947e190 handle=70000030de975a8 request=X
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=70000026947e210[7000002b333ffe8,7000002b333ffe8] htb=7000002b333ffe8 ssga=7000002b333f2a0
user=700000307a7ca68 session=700000307a7ca68 count=0 flags=[0000] savepoint=0x23e411
LIBRARY OBJECT HANDLE: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0
name=<USER_NAME>.<OBJECT_NAME>
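The wait parameter 100*mode+namespace=12d packs two numbers into one hexadecimal value. A minimal sketch of unpacking it follows; the function name is illustrative, and reading mode 3 as eXclusive is an inference from the request=X shown in the same dump rather than something the slide states.

```python
def split_mode_namespace(p3_hex):
    """Unpack the hex '100*mode+namespace' wait parameter into (mode, namespace)."""
    value = int(p3_hex, 16)  # 0x12d is 301 in decimal
    mode, namespace = divmod(value, 100)
    return mode, namespace


print(split_mode_namespace("12d"))
```

For 12d the result is mode 3 and namespace 1, which lines up with the request=X entry in the LIBRARY OBJECT LOCK section.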
To find the HOLDER, search for 'handle=XXXXXXXXXX mode=' (here 'handle=70000030de975a8 mode=')
until you find an entry that is held in a mode other than NULL
• Held in Shared (S) mode
• name=<USER_NAME>.<OBJECT_NAME> confirms the object name
Library Cache Lock
Systemstate Dumps
SO: 700000288b03ae0, type: 53, owner: 7000002cc697468, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=700000288b03ae0 handle=70000030de975a8 mode=S
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=700000288b03b60[7000002a179a1a8,7000002b3800878] htb=7000002b3800878 ssga=7000002b37ffb30
user=70000030fafab00 session=70000030fafab00 count=1 flags=[0000] savepoint=0x417
LIBRARY OBJECT HANDLE: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0
name=<USER_NAME>.<OBJECT_NAME>
• number=9d is the latch# in hexadecimal (157 in decimal), as listed in v$latchname
Towards the top of the PROCESS dump you will see the exact latch being waited for and even who holds it:
• PROCESS 127 (ospid:23086) holds the latch; the dump for PROCESS 127 shows the "holding" entry:
Latch free
Systemstate Dumps
PROCESS 8:
waiting for 'latch free' blocking sess=0x0 seq=4577 wait_time=0
address=99ff60018, number=9d, tries=0
waiting for 99ff60018 Child library cache level=5 child#=3
Location from where latch is held: kglic: child
Context saved from call: 26
state=busy
possible holder pid = 127 ospid=23086
wtr=99ff60018, next waiter 9993858b8
holding 99ff60018 Child library cache level=5 child#=3
Location from where latch is held: kglic: child
Context saved from call: 26
state=busy
If you want to find which object a handle refers to, search for handle=XXXXXXXXXX until you come across
the LIBRARY OBJECT HANDLE entry, e.g. handle=c00000006c0f8490:
• name shows the name of the object the handle refers to
• namespace=CRSR shows that it is of type CURSOR
Other useful information
Systemstate Dumps
LIBRARY OBJECT HANDLE: handle=c00000006c0f8490
name=SELECT USER FROM DUAL
hash=cd1ceca0 timestamp=03-23-2007 09:00:00
namespace=CRSR flags=RON/TIM/PN0/SML/[12010000]
2: ADDM in a Multitenant
Environment
Starting with Oracle Database 12c, ADDM is enabled by default in the root
container of a multitenant container database (CDB)
You can also use ADDM in a pluggable database (PDB)
• In a CDB, ADDM works in the same way as it works in a non-CDB
• ADDM analysis is performed each time an AWR snapshot is taken on a CDB root or a
PDB
• ADDM does not work in a PDB by default, because automatic AWR snapshots are
disabled
ADDM in a multitenant environment
To enable ADDM in a PDB:
Set the AWR_PDB_AUTOFLUSH_ENABLED initialization parameter to TRUE in the
PDB using the following command:
SQL> ALTER SYSTEM SET AWR_PDB_AUTOFLUSH_ENABLED=TRUE;
Set the AWR snapshot interval to a value greater than 0 in the PDB, as
shown in the following example:
SQL> EXEC dbms_workload_repository.modify_snapshot_settings(interval=>60);
Results on a PDB provide only PDB-specific findings and recommendations
3: Analyze logs and
look for errors
Investigate logs and look for errors
tfactl analyze -since 1d
INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...
...
Unique error messages for last ~1 day(s)
Occurrences percent server name error
----------- ------- -------------------- -----
1 100.0% myserver1 Errors in file /u01/oracle/diag/rdbms/orcl2/orcl2/trace/orcl2_ora_12272.trc (incident=10151):
ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl2/orcl2/incident/incdir_10151/orcl2_ora_12272_i10151.trc
...
Investigate logs and look for errors
tfactl analyze -search "ORA-04031" -last 1d
INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...
...
Matching regex: ORA-04031
Case sensitive: false
Match count: 1
[Source: /u01/oracle/diag/rdbms/orcl2/orcl2/trace/alert_orcl2.log, Line: 1941]
Sep 15 12:09:05 2019
Errors in file /u01/oracle/diag/rdbms/orcl2/orcl2/trace/orcl2_ora_6982.trc (incident=7665):
ORA-04031: unable to allocate bytes of shared memory ("","","","")
Incident details in: /u01/app/oracle/diag/rdbms/orcl2/orcl2/incident/incdir_7665/orcl2_ora_6982_i7665.trc
...
Examples
tfactl analyze -since 5h
# Show summary of events from alert logs and system messages in the last 5 hours
tfactl analyze -comp os -since 1d
# Show summary of events from system messages in the last 1 day
tfactl analyze -search "ORA-" -since 2d
# Search for the string ORA- in alert and system logs in the past 2 days
tfactl analyze -search "/Starting/c" -since 2d
# Search for the case-sensitive string "Starting" in the past 2 days
tfactl analyze -comp os -for "Feb/24/2019 11" -search "."
# Show all system log messages at time Feb/24/2019 11
tfactl analyze -comp osw -since 6h
# Show OSWatcher Top summary for the last 6 hours
tfactl analyze -comp oswslabinfo -from "Feb/26/2019 05:00:01" -to "Feb/26/2019 06:00:01"
# Show OSWatcher slabinfo summary for the specified time period
tfactl analyze -since 1h -type generic
# Analyze all generic messages in the last hour
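When these commands are driven from a script, it can help to assemble the argument list in one place. The helper below is a hypothetical convenience wrapper, not part of TFA: it only combines the -comp, -search, -type and -since options that appear in the examples above, and it does not validate the values passed in.

```python
import shlex


def tfactl_analyze_args(since=None, comp=None, search=None, msg_type=None):
    """Build an argument list for 'tfactl analyze' from the options shown above."""
    args = ["tfactl", "analyze"]
    if comp:
        args += ["-comp", comp]
    if search:
        args += ["-search", search]
    if msg_type:
        args += ["-type", msg_type]
    if since:
        args += ["-since", since]
    return args


# Print a shell-safe rendering of the command instead of running it.
print(shlex.join(tfactl_analyze_args(search="ORA-", since="2d")))
```

The returned list can be handed to `subprocess.run` on a host where tfactl is installed; keeping it as a list avoids shell-quoting problems with patterns such as "/Starting/c".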
Investigate logs and look for errors
$ ./tfactl analyze -type generic -since 7d
INFO: analyzing all (Alert and Unix System Logs) logs for the last 10080 minutes...
...
Total message count: 54,807, from 28-Jan-2019 04:26:28 PM PST to 03-Mar-2019 02:41:34
Messages matching last ~7 day(s): 3,139, from 24-Feb-2019 02:46:23 PM PST to 03-Mar-2019 02:41:34
last ~7 day(s) generic count: 3,139, from 24-Feb-2019 02:46:23 PM PST to 03-Mar-2019 02:41:34
last ~7 day(s) unique generic count: 94
Message types for last ~7 day(s)
Occurrences percent server name type
----------- ------- -------------------- -----
3,139 100.0% myhost1 generic
...
Investigate logs and look for errors
Unique generic messages for last ~7 day(s)
Occurrences percent server name generic
----------- ------- -------------------- -----
1,504 47.9% myhost1 : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem...
487 15.5% myhost1 : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem...
336 10.7% myhost1 myhost1 smartd[13812]: Device: /dev/sdv, SMART Failure: FAILURE...
336 10.7% myhost1 myhost1 smartd[13812]: Device: /dev/sdag, SMART Failure: FAILURE ...
103 3.3% myhost1 myhost1 last message repeated 9 times
103 3.3% myhost1 myhost1 kernel: oracle: sending ioctl 2285 to a partition!
...snipping for brevity...
Pattern match search output
tfactl analyze -search "ORA-" -since 7d
...
[Source: /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/alert_RATODA1.log, Line: 9494]
Feb 25 22:00:02 2014
Errors in file /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/RATODA1_j003_10948.trc:
ORA-12012: error on auto execute of job "ORACLE_OCM"."MGMT_CONFIG_JOB_2_1"
ORA-29280: invalid directory path
ORA-06512: at "ORACLE_OCM.MGMT_DB_LL_METRICS", line 2436
ORA-06512: at line 1
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
...
OS Watcher top data
tfactl analyze -comp osw -since 6h
...
statistic: t first highest (time) lowest (time) average non zero 3rd last 2nd last last trend
top.cpu.util.id: % 98.0 99.7 @10:35AM 72.8 @03:11PM 97.3 2,059 95.2 96.8 96.0 -2%
top.cpu.util.st: % 0.1 0.1 @09:14AM 0.0 @09:14AM 0.0 889 0.0 0.0 0.0 -100%
top.cpu.util.us: % 0.1 8.8 @11:31AM 0.0 @09:14AM 0.6 1,966 4.3 0.8 3.4 3300%
top.cpu.util.wa: % 1.7 18.7 @03:11PM 0.1 @10:35AM 1.1 2,059 0.3 0.4 0.4 -76%
top.loadavg.last01min: 1.17 3.12 @09:44AM 0.07 @12:45PM 0.93 1,823 0.31 0.26 0.22 -81%
top.loadavg.last05min: 0.94 2.26 @09:44AM 0.27 @12:45PM 0.93 1,823 0.82 0.79 0.77 -18%
top.loadavg.last15min: 0.79 1.60 @09:46AM 0.44 @01:18PM 0.92 1,823 0.96 0.95 0.94 18%
top.mem.buffers: k 808232 808388 @09:41AM 785608 @02:57PM 796511 2,093 785744 785744 785744 -2%
top.mem.free: k 1130332 1291344 @10:02AM 927576 @09:43AM 1188576 2,093 1244020 1265248 1265188 11%
top.swap.used: k 47556 48088 @03:00PM 47556 @09:14AM 47828 2,097 48088 48088 48088 1%
top.tasks.running: 1 4 @12:04PM 1 @09:14AM 1 1,996 1 2 2 100%
top.tasks.total: 514 527 @02:57PM 509 @09:18AM 514 1,996 518 521 520 1%
top.tasks.zombie: 0 5 @11:04AM 0 @09:14AM 0 62 0 0 0 n/a
top.users: 5 6 @03:00PM 5 @09:14AM 5 1,823 6 6 6 20%
...
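The trend column in this sample is consistent with the percentage change from the first value to the last value, truncated toward zero, with n/a when the first value is 0. That reading is inferred from the rows shown here and is not documented on the slide, so the sketch below is an assumption; `trend_percent` is a hypothetical name.

```python
from decimal import Decimal


def trend_percent(first, last):
    """Percentage change from first to last, truncated toward zero.

    Returns None when first is 0, which the sample output shows as n/a.
    Decimal avoids binary floating-point surprises with values such as 0.1.
    """
    first_d = Decimal(str(first))
    last_d = Decimal(str(last))
    if first_d == 0:
        return None
    return int((last_d - first_d) / first_d * 100)


# (first, last) pairs taken from rows of the sample output above.
for label, first, last in [
    ("top.cpu.util.id", "98.0", "96.0"),
    ("top.cpu.util.us", "0.1", "3.4"),
    ("top.loadavg.last15min", "0.79", "0.94"),
    ("top.tasks.zombie", "0", "0"),
]:
    print(label, trend_percent(first, last))
```

Under that assumption the four rows give -2, 3300, 18 and None, matching the -2%, 3300%, 18% and n/a printed by tfactl.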
OS Watcher slabinfo data
tfactl analyze -comp oswslabinfo -from "Feb/26/2019 05:00:01" -to "Feb/26/2019 06:00:01"
...
statistic: t first highest (time) lowest (time) average non zero 3rd last 2nd last last trend
slabinfo.acfs_ccb_cache.active_objs: 4 38 @05:52AM 0 @05:01AM 10 294 3 1 8 100%
slabinfo.inet_peer_cache.active_objs: 23 39 @05:59AM 23 @05:00AM 23 351 23 23 39 69%
slabinfo.sigqueue.active_objs: 385 768 @05:28AM 285 @05:27AM 554 351 712 621 577 49%
slabinfo.skbuff_fclone_cache.active_objs: 55 133 @05:51AM 11 @05:20AM 69 351 56 77 70 27%
slabinfo.names_cache.active_objs: 126 180 @05:00AM 110 @05:23AM 146 351 171 166 156 23%
slabinfo.sgpool-8.active_objs: 135 228 @05:31AM 59 @05:11AM 152 351 180 165 157 16%
slabinfo.UDP.active_objs: 568 675 @05:28AM 492 @05:17AM 597 351 630 596 626 10%
slabinfo.size-8192.active_objs: 174 209 @05:36AM 160 @05:14AM 181 351 205 187 188 8%
slabinfo.task_delay_info.active_objs: 1477 1856 @05:28AM 1334 @05:57AM 1574 351 1529 1411 1579 6%
slabinfo.pid.active_objs: 1608 1980 @05:29AM 1452 @05:21AM 1678 351 1564 1487 1689 5%
slabinfo.blkdev_requests.active_objs: 720 880 @05:04AM 651 @05:54AM 745 351 707 736 761 5%
slabinfo.size-256.active_objs: 1116 1305 @05:06AM 846 @05:11AM 1091 351 1245 1143 1166 4%
slabinfo.ip_dst_cache.active_objs: 1497 1800 @05:28AM 1279 @05:36AM 1517 351 1594 1466 1560 4%
slabinfo.sock_inode_cache.active_objs: 2168 2329 @05:11AM 2106 @05:56AM 2225 351 2322 2278 2232 2%
slabinfo.size-512.active_objs: 3036 3152 @05:38AM 3007 @05:01AM 3088 351 3136 3112 3075 1%
...
4: How to connect to a hung
database for diagnostics
How do you connect to a database when connections are hanging?
• A sqlplus preliminary connection can connect to the database because no session is
created
- You will have limited access to the SGA
- This helps in capturing diagnostic information such as a systemstate dump
• Two ways to connect to sqlplus using a preliminary connection:
sqlplus -prelim
sqlplus -prelim / as sysdba
or
SQL> set _prelim on
SQL> connect / as sysdba
Prelim connection established
5: Guided resolution
with Oracle Support
Oracle Database ORA-00060 Errors on Single Instance (Non-RAC) Diagnosing
Using Deadlock Graphs in ORA-00060 Trace Files (Doc ID 1550091.2)
Troubleshooting Assistant
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1550091.2
Understand and Troubleshoot Startup/Shutdown Issues (Doc ID 1591095.2)
Troubleshooting Assistant
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1591095.2
Oracle Undo Management (ORA-01555, ORA-30036, ORA-01628,
ORA-01552, etc.) (Doc ID 1575667.2)
Troubleshooting Assistant
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1575667.2
Handling Block Corruptions in Oracle7 / 8 / 8i / 9i / 10g / 11g (Doc ID 1598103.2)
Troubleshooting Assistant
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1598103.2
6: SQLHC –
Healthcheck for a SQL
Health Check
SQL
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1366133.1
1. Log in to the database server and set the environment used by the Database Instance
2. Download the "sqlhc.zip" archive file and extract the contents to a suitable directory/folder
3. Connect to SQL*Plus as SYS, a DBA account, or a user with access to Data Dictionary views
and execute the "sqlhc.sql" script. It prompts for two parameters:
i. Oracle Pack License (Tuning, Diagnostics or None) [T|D|N] (required)
ii. A valid SQL_ID for the SQL to be analyzed.
If the site has both Tuning and Diagnostics licenses, specify T
(the Oracle Tuning pack includes Oracle Diagnostics)
For example:
Health Check
SQL
# sqlplus / as sysdba
SQL> START sqlhc.sql T djkbyr8vkc64h
7: Query trace files
using SQL
SQL> describe V$DIAG_TRACE_FILE
Name Null? Type
----------------------------------------- -------- ----------------------------
ADR_HOME VARCHAR2(444)
TRACE_FILENAME VARCHAR2(68)
CHANGE_TIME TIMESTAMP(3) WITH TIME ZONE
MODIFY_TIME TIMESTAMP(3) WITH TIME ZONE
CON_ID NUMBER
V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
SQL> describe V$DIAG_TRACE_FILE_CONTENTS
Name Null? Type
----------------------------------------- -------- ----------------------------
ADR_HOME VARCHAR2(444)
TRACE_FILENAME VARCHAR2(68)
RECORD_LEVEL NUMBER
PARENT_LEVEL NUMBER
RECORD_TYPE NUMBER
TIMESTAMP TIMESTAMP(3) WITH TIME ZONE
PAYLOAD VARCHAR2(4000)
SECTION_ID NUMBER
SECTION_NAME VARCHAR2(64)
COMPONENT_NAME VARCHAR2(64)
OPERATION_NAME VARCHAR2(64)
FILE_NAME VARCHAR2(64)
FUNCTION_NAME VARCHAR2(64)
LINE_NUMBER NUMBER
THREAD_ID VARCHAR2(64)
SESSION_ID NUMBER
SERIAL# NUMBER
CON_UID NUMBER
CONTAINER_NAME VARCHAR2(64)
CON_ID NUMBER
V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
SQL> select trace_filename from v$diag_trace_file;
TRACE_FILENAME
--------------------------------------------------------------------
ORCL1_mz00_21108.trc
ORCL1_gcr2_16504.trc
ORCL1_gcr3_12849.trc
ORCL1_gcr1_28159.trc
ORCL1_gcr1_27603.trc
ORCL1_gcr0_29971.trc
ORCL1_mz00_26487.trc
ORCL1_mz00_28329.trc
ORCL1_ora_19005.trc
ORCL1_gcr3_12879.trc
ORCL1_gcr1_11688.trc
V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
SQL> select payload from v$diag_trace_file_contents where trace_filename ='ORCL1_ora_19005.trc';
PAYLOAD
--------------------------------------------------------------------------------
Trace file /u01/app/oracle/diag/rdbms/orcl_unq/ORCL1/trace/ORCL1_ora_19005.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.2.0.0.0
Build label: RDBMS_19.2.0.0.0_LINUX.X64_190121
ORACLE_HOME: /u01/app/oracle/product/19c/dbhome_1
System name: Linux
Node name: myserver65
Release: 4.14.35-1844.1.3.el7uek.x86_64
Version: #2 SMP Wed Jan 2 21:18:29 PST 2019
Machine: x86_64
VM name: Xen Version: 4.1 (HVM)
...
V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
...
PAYLOAD
--------------------------------------------------------------------------------
Instance name: ORCL1
Redo thread mounted by this instance: 1
Oracle process number: 12
Unix process pid: 19005, image: oracle@myserver65 (TNS V1-V3)
*** 2019-11-20T01:22:10.770960+00:00
*** SESSION ID:(106.17196) 2019-11-20T01:22:10.771014+00:00
*** CLIENT ID:() 2019-11-20T01:22:10.771027+00:00
*** SERVICE NAME:(SYS$USERS) 2019-11-20T01:22:10.771039+00:00
...
V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
SQL> describe V$DIAG_SESS_SQL_TRACE_RECORDS
Name Null? Type
----------------------------------------- -------- ----------------------------
ADR_HOME VARCHAR2(444)
TRACE_FILENAME VARCHAR2(68)
RECORD_LEVEL NUMBER
PARENT_LEVEL NUMBER
RECORD_TYPE NUMBER
TIMESTAMP TIMESTAMP(3) WITH TIME ZONE
PAYLOAD VARCHAR2(4000)
SECTION_ID NUMBER
SECTION_NAME VARCHAR2(64)
COMPONENT_NAME VARCHAR2(64)
OPERATION_NAME VARCHAR2(64)
FILE_NAME VARCHAR2(64)
FUNCTION_NAME VARCHAR2(64)
LINE_NUMBER NUMBER
THREAD_ID VARCHAR2(64)
SESSION_ID NUMBER
SERIAL# NUMBER
CON_UID NUMBER
CONTAINER_NAME VARCHAR2(64)
CON_ID NUMBER
V$DIAG_SESS_SQL_TRACE_RECORDS
SQL> SELECT sid,serial# FROM v$session WHERE username = 'SYS';
SID SERIAL#
---------- ----------
33 45888
129 6051
SQL> EXECUTE DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(129,6051,TRUE);
PL/SQL procedure successfully completed.
V$DIAG_SESS_SQL_TRACE_RECORDS
Enable session tracing
SQL> select unique trace_filename from V$DIAG_SESS_SQL_TRACE_RECORDS;
TRACE_FILENAME
--------------------------------------------------------------------
ORCL1_ora_14151.trc
SQL> select payload from V$DIAG_SESS_SQL_TRACE_RECORDS where trace_filename = 'ORCL1_ora_14151.trc';
PAYLOAD
--------------------------------------------------------------------------------
CLOSE #140506358472544:c=19,e=18,dep=0,type=1,tim=7769230586778
=====================
PARSING IN CURSOR #140506358494608 len=97 dep=1 uid=0 oct=3 lid=0 tim=7769230600
163 hv=791757000 ad='7fa0c290' sqlid='87gaftwrm2h68'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname from obj$
o where o.obj#=:1
END OF STMT
EXEC #140506358494608:c=65,e=65,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=107238262
4,tim=7769230600159
...
V$DIAG_SESS_SQL_TRACE_RECORDS
...
PAYLOAD
--------------------------------------------------------------------------------
FETCH #140506358494608:c=38,e=37,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=10723826
24,tim=7769230600324
CLOSE #140506358494608:c=5,e=4,dep=1,type=3,tim=7769230600381
EXEC #140506358494608:c=23,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=107238262
4,tim=7769230600500
FETCH #140506358494608:c=11,e=12,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=10723826
24,tim=7769230600547
...
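Once the PAYLOAD rows are available as text, the per-call statistics can be pulled apart for quick totals. The sketch below is illustrative only: it assumes the 80-column wrapping seen in the SQL*Plus output above has been undone so each call sits on one line, it treats `e=` as elapsed time in microseconds (the usual meaning in SQL trace files), and the names `SAMPLE_LINES`, `parse_trace_call` and `elapsed_by_call` are hypothetical.

```python
import re

# Call lines from the output above, with the wrapped lines rejoined.
SAMPLE_LINES = [
    "CLOSE #140506358472544:c=19,e=18,dep=0,type=1,tim=7769230586778",
    "EXEC #140506358494608:c=65,e=65,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1072382624,tim=7769230600159",
    "FETCH #140506358494608:c=38,e=37,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=1072382624,tim=7769230600324",
    "CLOSE #140506358494608:c=5,e=4,dep=1,type=3,tim=7769230600381",
    "EXEC #140506358494608:c=23,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1072382624,tim=7769230600500",
    "FETCH #140506358494608:c=11,e=12,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=1072382624,tim=7769230600547",
]

CALL_RE = re.compile(r"^(PARSE|EXEC|FETCH|CLOSE) #(\d+):(.*)$")


def parse_trace_call(line):
    """Parse one database call line into a dict, or return None for other lines."""
    match = CALL_RE.match(line.strip())
    if not match:
        return None
    call, cursor, rest = match.groups()
    record = {"call": call, "cursor": int(cursor)}
    for pair in rest.split(","):
        key, _, value = pair.partition("=")
        record[key] = int(value)
    return record


def elapsed_by_call(lines):
    """Sum the e= (elapsed) values per call type."""
    totals = {}
    for line in lines:
        record = parse_trace_call(line)
        if record is not None:
            totals[record["call"]] = totals.get(record["call"], 0) + record["e"]
    return totals


print(elapsed_by_call(SAMPLE_LINES))
```

Lines that are not database calls, such as the PARSING IN CURSOR header or the SQL text, are skipped rather than parsed.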
V$DIAG_SESS_SQL_TRACE_RECORDS
SQL> EXECUTE DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(129,6051,FALSE);
PL/SQL procedure successfully completed.
8: Hanganalyze
enhancements
Always on - Enabled by default
Reliably detects database hangs and deadlocks
Autonomously resolves them
Logs all detections and resolutions
New SQL interface to configure sensitivity (Normal/High)
and trace file sizes
Oracle Hang Manager
[Diagram labels: Session, DIA0, EVALUATE, DETECT, ANALYZE, Hung?, VERIFY, Victim, Policy]
Monitors Session snapshots for progress
Evaluates potential hangs over time
based upon Wait Graphs
Analyzes hang chain of sessions to
identify blocker/victim
Discovers blocker is located in ASM
instance
Requests ASM terminate session or
instance relying on Flex ASM for recovery
Detection and resolution is bi-directional
Database Hang Management - Infrastructure
Database
ASM
Full Resolution Dump Trace File and DB Alert Log Audit Reports
Oracle 12c Hang Manager
Dump file …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
Oracle Database 12c Enterprise Edition Release 18/19c.0.0.0 - 64bit Beta
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
Build label: RDBMS_MAIN_LINUX.X64_151013
ORACLE_HOME: …/3775268204/oracle
System name: Linux
Node name: slc05kyr
Release: 2.6.39-400.211.1.el6uek.x86_64
Version: #1 SMP Fri Nov 15 13:39:16 PST 2013
Machine: x86_64
VM name: Xen Version: 3.4 (PVM)
Instance name: hm62
Redo thread mounted by this instance: 2
Oracle process number: 19
Unix process pid: 12656, image: oracle@slc05kyr (DIA0)
*** 2019-10-13T16:47:59.541509+17:00
*** SESSION ID:(96.41299) 2019-10-13T16:47:59.541519+17:00
*** CLIENT ID:() 2019-10-13T16:47:59.541529+17:00
*** SERVICE NAME:(SYS$BACKGROUND) 2019-10-13T16:47:59.541538+17:00
*** MODULE NAME:() 2019-10-13T16:47:59.541547+17:00
*** ACTION NAME:() 2019-10-13T16:47:59.541556+17:00
*** CLIENT DRIVER:() 2019-10-13T16:47:59.541565+17:00
Full Resolution Dump Trace File and DB Alert Log Audit Reports
Oracle 12c Hang Manager
2019-10-13T16:47:59.435039+17:00
Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc
2019-10-13T16:47:59.506775+17:00
DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
due to a GLOBAL, HIGH confidence hang with ID=1.
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1.
In the alert log on the instance local to the session (instance 2 in this case),
we see the following:
2019-10-13T16:47:59.538673+17:00
Errors in file …/diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
2019-10-13T16:48:04.222661+17:00
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
requested by master DIA0 process on instance 1
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
by terminating session sid:40 with serial # 43179 (ospid:13031)
9: Keep track of the attributes of
important files pre-post
patching
Start tracking using -fileattr start
Automatically discovers Grid Infrastructure and Database directories and files
• Prevent discovery using -excludediscovery
Further configure the list of monitored directories using -includedir
Track attribute changes on important files
tfactl <orachk|exachk> -fileattr start -includedir "/root/myapp/config"
...
List of directories(recursive) for checking file attributes:
/u01/app/oradb/product/11.2.0/dbhome_11203
/u01/app/oradb/product/11.2.0/dbhome_11204
/root/myapp/config
orachk has taken snapshot of file attributes for above directories at:
/orahome/oradb/orachk/orachk_mysrv21_20170504_041214
Compare current attributes against first snapshot using -fileattr check
When checking, use the same include/exclude arguments you started with
Track attribute changes on important files
tfactl <orachk|exachk> -fileattr check -includedir "/root/myapp/config"
...
List of directories(recursive) for checking file attributes:
/u01/app/oradb/product/11.2.0/dbhome_11203
/u01/app/oradb/product/11.2.0/dbhome_11204
/root/myapp/config
Checking file attribute changes...
"/root/myapp/config/myappconfig.xml" is different:
Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml
Current : 0644 root root /root/myapp/config/myappconfig.xml
...
Automatically proceeds to run compliance checks after file attribute checks
• Only run attribute checks by using -fileattronly
File Attribute Changes are shown in HTML report output
Track attribute changes on important files
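The baseline-versus-current comparison can be pictured with a short standalone sketch. This is not how ORAchk or EXAchk implement -fileattr; it is a simplified illustration that records permission bits, owner id and group id for every file under a directory and reports the files whose attributes differ. The function names `snapshot` and `changed` are hypothetical.

```python
import os
import stat
import tempfile


def snapshot(root):
    """Record (permission bits, uid, gid) for every file under root."""
    attrs = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            attrs[path] = (stat.S_IMODE(info.st_mode), info.st_uid, info.st_gid)
    return attrs


def changed(baseline, current):
    """Return {path: (baseline_attrs, current_attrs)} for files that differ."""
    return {
        path: (baseline[path], current[path])
        for path in baseline
        if path in current and baseline[path] != current[path]
    }


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    config = os.path.join(workdir, "myappconfig.xml")
    open(config, "w").close()
    os.chmod(config, 0o644)
    before = snapshot(workdir)
    os.chmod(config, 0o600)  # simulate an unexpected permission change
    for path, (old, new) in changed(before, snapshot(workdir)).items():
        print(path, "baseline", oct(old[0]), "current", oct(new[0]))
```

The sketch compares numeric ids rather than the owner and group names shown in the report, and it ignores files that were added or removed between snapshots.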
10: Event Notification
ORAchk and EXAchk automatically run critical checks every two hours and full checks once a day at 2am
• You only need to configure your email for notification
ORAchk | EXAchk email notification
tfactl <orachk|exachk> -set "NOTIFICATION_EMAIL=SOME.BODY@COMPANY.COM"
Critical event notification
TFA can send email notification when faults are detected
• Notification for all problems:
tfactl set notificationAddress=some.body@example.com
• Notification for all problems on databases owned by the oracle user:
tfactl set notificationAddress=oracle:another.person@example.com
• Optionally configure an SMTP server:
tfactl set smtp
• Confirm email notification works:
tfactl sendmail <email_address>
Copyright © 2019 Oracle and/or its affiliates.
Critical event notification
Event: ORA-29770
Event time: Fri Sep 13 07:13:09 PDT 2019
File containing event: /u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log
Logs will be collected at: /opt/oracle.ahf/data/repository/auto_srdc_ORA-29770_2019_07_18:09_myserver1.zip
Critical event notification
Symptom
LCK0 (ospid:NNNN) has not called a wait for <n_secs> secs.
Call stack:
ksedsts <- kjzdssdmp <- kjzduptcctx <- kjzdicrshnfy <- ksuitm <- kjgcr_KillInstance <- kjgcr_Main <- kjfmlmhb_Main <- ksbrdp
Critical event notification
Action
Apply the one-off patch 18795105 to resolve this issue
For further information see Doc 1998445.1 and Doc 18795105.8
Cause
Instance crash due to ORA-29770 LCK0 hung
Critical event notification
Evidence
Orcl_lmhb_23242.trc (15):
ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5570<-kjgcr_KillInstance()+125
alert_orcl.log(140):
ORA-29770: global enqueue process LMS0 (OSID 11912) is hung for more than 70 seconds
11: Self Analysis in MOS
using TFA uploads
tfactl diagcollect -srdc <srdc_type>
• Scans system to identify recent events
• Once the relevant event is chosen, proceeds with diagnostic collection
One command SRDC
tfactl diagcollect -srdc ORA-00600
Enter the time of the ORA-00600 [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] :
Enter the Database Name [<RETURN>=ALL] :
1. Sep/07/2019 05:29:58 : [orcl2] ORA-00600: internal error code,
arguments: [600], [], [], [], [], [], [], [], [], [], [], []
2. Aug/16/2019 06:55:08 : [orcl2] ORA-00600: internal error code,
arguments: [600], [], [], [], [], [], [], [], [], [], [], []
Please choose the event : 1-2 [1]
Selected value is : 1 ( Sep/07/2019 05:29:58 )
All required files are identified
• Trimmed where applicable
• Package in a zip ready to provide to support
One command SRDC
...
2019/09/07 06:14:24 EST : Getting List of Files to Collect
2019/09/07 06:14:27 EST : Trimming file : myserver1/rdbms/orcl2/orcl2/trace/orcl2_lmhb_3542.trc with original file size : 163MB
...
2019/09/07 06:14:58 EST : Total time taken : 39s
2019/09/07 06:14:58 EST : Completed collection of zip files.
...
/opt/oracle.ahf/data/repository/srdc_ora600_collection_Tue_Sep_7_06_14_17_EST_2017_node_local/myserver1.tfa_srdc_ora600_Tue_Sep_7_06_14_17_EST_2019.zip
12: AWR scripts
Collects, processes, and maintains performance statistics for problem detection and self-tuning purposes
Gathered data is stored both in memory and in the database, and is displayed in both reports and views
Automatic Workload Repository (AWR)
The statistics collected and processed by AWR include:
• Object statistics that determine both access and usage
statistics of database segments
• Time model statistics based on time usage for activities,
displayed in the V$SYS_TIME_MODEL and
V$SESS_TIME_MODEL views
• Some of the system and session statistics collected in
the V$SYSSTAT and V$SESSTAT views
• SQL statements that are producing the highest load on
the system, based on criteria such as elapsed time and
CPU time
• Active Session History (ASH) statistics, representing the
history of recent sessions activity
Generating an AWR Report
1. Create an AWR snapshot
SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT()
2. Run your workload
3. Create an AWR snapshot
SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT()
4. Generate report for the time period
SQL> @$ORACLE_HOME/rdbms/admin/awrrpt.sql
AWR Scripts
Generating an AWR Compare Periods Report for the Local Database
SQL> @$ORACLE_HOME/rdbms/admin/awrddrpt.sql
Generating an AWR Compare Periods Report for a Specific Database
SQL> @$ORACLE_HOME/rdbms/admin/awrddrpi.sql
To generate an AWR Compare Periods report for Oracle RAC on the local database instance
SQL> @$ORACLE_HOME/rdbms/admin/awrgdrpt.sql
To generate an AWR Compare Periods report for Oracle RAC on a specific database
SQL> @$ORACLE_HOME/rdbms/admin/awrgdrpi.sql
To generate a Global AWR report for RAC
SQL> @$ORACLE_HOME/rdbms/admin/awrgrpt.sql
To generate a SQL Statement report
SQL> @$ORACLE_HOME/rdbms/admin/awrsqrpt.sql
Information on the AWR Repository
SQL> @$ORACLE_HOME/rdbms/admin/awrinfo.sql
13: Sanitize sensitive
information
Sensitive information can be hidden from diagnostics
Machine learning algorithms determine sensitive data like:
• Host names
• IP addresses
• MAC addresses
• Oracle Database names
• Tablespace names
• Service names
• Ports
• Operating system user names
Sanitize or mask sensitive information
Add -sanitize or -mask to any command
• -sanitize replaces a sensitive value with random characters
- myhost123 >>>> JnsF3km9
• -mask replaces a sensitive value with a series of 'X' characters
- myhost123 >>>> XXXXXXXX
Sanitize or mask sensitive information
Sanitized hostname
tfactl orachk -preupgrade -sanitize
tfactl orachk -rmap qzh024703246tsa1
TFA using ORAchk : /opt/oracle.ahf/orachk/orachk
___________________________________________________________________________
| Entity Type | Substituted Entity Name | Original Entity Name |
___________________________________________________________________________
| hostname | qzh024703246tsa1 | myserver1 |
___________________________________________________________________________
Reverse map the sanitization
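The difference between masking and sanitizing, and why only sanitizing can be reverse mapped, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm used by AHF: the names mask and Sanitizer are hypothetical, the token length of 16 is arbitrary, and this mask preserves the original length, which the slide example does not necessarily do.

```python
import secrets
import string


def mask(value):
    """Replace every character with 'X'. The original cannot be recovered."""
    return "X" * len(value)


class Sanitizer:
    """Replace values with random tokens and remember the mapping."""

    def __init__(self):
        self._forward = {}  # original value -> token
        self._reverse = {}  # token -> original value

    def sanitize(self, value):
        # Reuse the same token so one host always maps to one substitute.
        if value in self._forward:
            return self._forward[value]
        alphabet = string.ascii_lowercase + string.digits
        while True:
            token = "".join(secrets.choice(alphabet) for _ in range(16))
            if token not in self._reverse:
                break
        self._forward[value] = token
        self._reverse[token] = value
        return token

    def reverse_map(self, token):
        """Look up the original value for a token (the idea behind -rmap)."""
        return self._reverse[token]
```

Because the sanitizer keeps a consistent mapping, a report stays internally coherent (the same substitute appears everywhere the host name did) and the owner of the mapping can translate findings back.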
14: Find if anything
has changed
tfactl changes
Output from host : myserver69
------------------------------
[Sep/17/2019 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
[Sep/17/2019 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744
131259
[Sep/17/2019 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1
[Sep/17/2019 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 =>
158
[Sep/17/2019 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9-
40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab
[Sep/17/2019 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880
[Sep/17/2019 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 =>
1562320 768555
[Sep/17/2019 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18
[Sep/17/2019 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31-
459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e
...
Has anything changed recently?
...
Output from host : myserver70
------------------------------
[Sep/17/2019 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
[Sep/17/2019 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744
131259
[Sep/17/2019 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1
[Sep/17/2019 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 =>
158
[Sep/17/2019 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9-
40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab
[Sep/17/2019 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880
[Sep/17/2019 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 =>
1562320 768555
[Sep/17/2019 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18
[Sep/17/2019 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31-
459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e
[Sep/17/2019 16:56:15.398]: Parameter: fs.aio-nr: Value: 97024 => 98560
Has anything changed recently?
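The change lines follow a regular shape, so they are easy to post-process. The Python sketch below parses them and drops parameters that change constantly; the NOISY set is an assumption chosen for illustration, and the sketch expects each change on a single line (the wrapped lines shown on the slides would need joining first).

```python
import re

CHANGE_LINE = re.compile(
    r"^\[(?P<when>[^\]]+)\]: Parameter: (?P<name>\S+): "
    r"Value: (?P<old>.*?) => (?P<new>.*)$"
)

# Parameters assumed to be noise for troubleshooting purposes.
NOISY = {"kernel.random.uuid", "kernel.random.entropy_avail"}


def parse_changes(lines, ignore=NOISY):
    """Return one dict (when, name, old, new) per change line not ignored."""
    changes = []
    for line in lines:
        match = CHANGE_LINE.match(line.strip())
        if match and match.group("name") not in ignore:
            changes.append(match.groupdict())
    return changes
```

Feeding it the saved output of tfactl changes leaves only entries such as fs.aio-nr and kernel.pty.nr, which are quicker to scan.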
15: Detect and Collect
using SRDCs
Other Server Technology
Enterprise Manager
Data Guard
GoldenGate
Exalogic
Database areas
Errors / Corruption
Performance
Install / patching / upgrade
RAC / Grid Infrastructure
Import / Export
RMAN
Transparent Data Encryption
Storage / partitioning
Undo / auditing
Listener / naming services
Spatial / XDB
Some problem areas covered in SRDCs
Full list in documentation
Around 100 problem types covered
tfactl diagcollect -srdc <srdc_type> [-sr <sr_number>]
Manual collection vs TFA SRDC for database performance
Manual method
1. Generate ADDM reviewing Document 1680075.1 (multiple steps)
2. Identify “good” and “problem” periods and gather AWR reviewing Document 1903158.1 (multiple steps)
3. Generate AWR compare report (awrddrpt.sql) using “good” and “problem” periods
4. Generate ASH report for “good” and “problem” periods reviewing Document 1903145.1 (multiple steps)
5. Collect OSWatcher data reviewing Document 301137.1 (multiple steps)
6. Collect Hang Analyze output at Level 4
7. Generate SQL Healthcheck for problem SQL id using Document 1366133.1 (multiple steps)
8. Run support provided sql scripts: Log File sync diagnostic output using Document 1064487.1 (multiple steps)
9. Check alert.log if there are any errors during the “problem” period
10. Find any trace files generated during the “problem” period
11. Collate and upload all the above files/outputs to SR
TFA SRDC
1. Run tfactl diagcollect -srdc dbperf [-sr <sr_number>]
16: Manage logs
Automatic Database Log Purge
TFA can automatically purge database logs
tfactl set manageLogsAutoPurge=ON
Purging automatically removes logs older than 30 days
• Configurable with:
tfactl set manageLogsAutoPurgePolicyAge=<n><d|h>
Purging runs every 60 minutes
• Configurable with:
tfactl set manageLogsAutoPurgeInterval=<minutes>
Manual Database Log Purge
TFA can manage ADR log and trace files
tfactl managelogs <options>
-show usage #Show disk space usage per diagnostic directory for both GI and database logs
-show variation -older <n><m|h|d> #Show disk space growth for specified period
-purge -older <n><m|h|d> #Remove ADR files older than the time specified
-gi #Restrict command to only files under the GI_BASE
-database [all | dbname] #Restrict command to only files under the database directory
-dryrun #Use with -purge to estimate how many files will be affected and how much disk space will be freed by a potential purge command
tfactl managelogs -show usage
...
.---------------------------------------------------------------------------------.
| Grid Infrastructure Usage |
+---------------------------------------------------------------------+-----------+
| Location | Size |
+---------------------------------------------------------------------+-----------+
| /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/alert | 28.00 KB |
| /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/incident | 4.00 KB |
| /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/trace | 8.00 KB |
...
+---------------------------------------------------------------------+-----------+
| Total | 739.06 MB |
'---------------------------------------------------------------------+-----------'
...
Understand Database log disk space usage
Use -gi to only show grid infrastructure
...
.---------------------------------------------------------------.
| Database Homes Usage |
+---------------------------------------------------+-----------+
| Location | Size |
+---------------------------------------------------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/alert | 1.06 MB |
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/incident | 4.00 KB |
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/trace | 146.19 MB |
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/cdump | 4.00 KB |
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/hm | 4.00 KB |
+---------------------------------------------------+-----------+
| Total | 147.26 MB |
'---------------------------------------------------+-----------'
Understand Database log disk space usage
Use -database to only show database
Understand Database log disk space usage variations
tfactl managelogs -show variation -older 30d
Output from host : myserver74
------------------------------
2019-09-20 12:30:42: INFO Checking space variation for 30 days
.---------------------------------------------------------------------------------------------.
| Grid Infrastructure Variation |
+---------------------------------------------------------------------+-----------+-----------+
| Directory | Old Size | New Size |
+---------------------------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/asm/user_root/host_309243680_96/alert | 22.00 KB | 28.00 KB |
+---------------------------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/clients/user_crsusr/host_309243680_96/cdump | 4.00 KB | 4.00 KB |
+---------------------------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/tnslsnr/myserver74/listener/alert | 15.06 MB | 244.10 MB |
+---------------------------------------------------------------------+-----------+-----------+
...
Understand Database log disk space usage variations
...
.---------------------------------------------------------------------------.
| Database Homes Variation |
+---------------------------------------------------+-----------+-----------+
| Directory | Old Size | New Size |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/hm | 4.00 KB | 4.00 KB |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/trace | 16.63 MB | 146.19 MB |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/cdump | 4.00 KB | 4.00 KB |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/incident | 4.00 KB | 4.00 KB |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/alert | 1.06 MB | 1.06 MB |
'---------------------------------------------------+-----------+-----------'
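To find which diagnostic directories grew the most, the variation tables can be parsed and ranked. The following Python sketch assumes rows of the form "| <path> | <old size> | <new size> |" and binary units (1 KB = 1024 bytes); both the unit interpretation and the function names are assumptions.

```python
import re

ROW = re.compile(
    r"^\|\s*(?P<path>/\S+)\s*\|\s*(?P<old>[\d.]+\s+\w+)\s*\|"
    r"\s*(?P<new>[\d.]+\s+\w+)\s*\|$"
)
UNITS = {"bytes": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(text):
    """Convert a size such as '15.06 MB' to a number of bytes."""
    number, unit = text.split()
    return float(number) * UNITS[unit]


def growth_by_directory(lines):
    """Return [(path, growth_in_bytes)] with the largest growth first."""
    rows = []
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            growth = to_bytes(match.group("new")) - to_bytes(match.group("old"))
            rows.append((match.group("path"), growth))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Applied to the output above, the listener alert directory (15.06 MB to 244.10 MB) and the database trace directory (16.63 MB to 146.19 MB) come out on top.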
Run a database log purge dryrun
tfactl managelogs -purge -older 30d -dryrun
Output from host : myserver74
------------------------------
Estimating files older than 30 days
Estimating purge for diagnostic destination "diag/afdboot/user_root/host_309243680_94" for files ~ 2 files deleted , 22.58 KB freed ]
Estimating purge for diagnostic destination "diag/afdboot/user_crsusr/host_309243680_94" for files ~ 2 files deleted , 11.72 KB freed ]
Estimating purge for diagnostic destination "diag/asmtool/user_root/host_309243680_96" for files ~ 2 files deleted , 21.36 KB freed ]
Estimating purge for diagnostic destination "diag/asmtool/user_crsusr/host_309243680_96" for files ~ 3 files deleted , 23.22 KB freed ]
Estimating purge for diagnostic destination "diag/tnslsnr/myserver74/listener" for files ~ 23 files deleted , 225.33 MB freed ]
Estimating purge for diagnostic destination "diag/diagtool/user_root/adrci_309243680_96" for files ~ 73 files deleted , 517.69 KB freed ]
Estimating purge for diagnostic destination "diag/clients/user_crsusr/host_309243680_96" for files ~ 38 files deleted , 17.15 KB freed ]
Estimating purge for diagnostic destination "diag/asm/+asm/+ASM" for files ~ 0 files deleted , 0 bytes freed ]
Estimating purge for diagnostic destination "diag/asm/user_root/host_309243680_96" for files ~ 1 files deleted , 19.52 KB freed ]
Estimating purge for diagnostic destination "diag/asm/user_crsusr/host_309243680_96" for files ~ 1 files deleted , 20.25 KB freed ]
Estimating purge for diagnostic destination "diag/crs/myserver74/crs" for files ~ 40 files deleted , 219.39 MB freed ]
Estimation for Grid Infrastructure [ Files to delete : ~ 185 files | Space to be freed : ~ 445.36 MB ]
Estimating purge for diagnostic destination "diag/rdbms/cdb674/CDB674" for files ~ 27760 files deleted , 66.57 MB freed ]
Estimation for Database Home [ Files to delete : ~ 27760 files | Space to be freed : ~ 66.57 MB ]
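The idea behind a dry run is to count what a purge would remove without removing anything. A minimal Python sketch of that idea follows; it is not how tfactl managelogs works internally, it selects files purely by modification time, and estimate_purge is a hypothetical name.

```python
import os
import time


def estimate_purge(root, older_than_days, now=None):
    """Count files under root older than the cutoff and total their size.

    Nothing is deleted; the function only reports what would be affected.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    count, total_bytes = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            if info.st_mtime < cutoff:
                count += 1
                total_bytes += info.st_size
    return count, total_bytes
```

Calling estimate_purge on a trace directory with older_than_days=30 returns a (files, bytes) pair comparable to one "Estimating purge" line.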
Run a database log purge
tfactl managelogs -purge -older 30d
Output from host : myserver74
------------------------------
Purging files older than 30 days
Cleaning Grid Infrastructure destinations
Purging diagnostic destination "diag/afdboot/user_root/host_309243680_94" for files - 0 files deleted , 0 bytes freed
Purging diagnostic destination "diag/afdboot/user_crsusr/host_309243680_94" for files - 1 files deleted , 10.16 KB freed
Purging diagnostic destination "diag/asmtool/user_root/host_309243680_96" for files - 1 files deleted , 10.16 KB freed
Purging diagnostic destination "diag/asmtool/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/tnslsnr/myserver74/listener" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/diagtool/user_root/adrci_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/clients/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/asm/+asm/+ASM" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/asm/user_root/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/asm/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/crs/myserver74/crs" for files - 2 files deleted , 29.18 KB freed
...
Run a database log purge
...
Grid Infrastructure [ Files deleted : 18 files | Space Freed : 253.75 KB ]
.-----------------------------------------------------------------------------------------------.
| File System Variation : /u01/app/crsusr/12.2.0/grid2 |
+--------+-----------------------------------+----------+----------+---------+----------+-------+
| State | Name | Size | Used | Free | Capacity | Mount |
+--------+-----------------------------------+----------+----------+---------+----------+-------+
| Before | /dev/mapper/vg_rws1270665-lv_root | 51475068 | 46597152 | 2256476 | 96% | / |
| After | /dev/mapper/vg_rws1270665-lv_root | 51475068 | 46597152 | 2256476 | 96% | / |
'--------+-----------------------------------+----------+----------+---------+----------+-------'
17: Monitor multiple
logs
tail files
tfactl tail alert
Output from host : myserver69
------------------------------
/scratch/app/11.2.0.4/grid/log/myserver69/alertmyserver69.log
2019-09-25 23:28:22.532:
[ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with
the mean cluster time. No action has been taken as the Cluster Time
Synchronization Service is running in observer mode.
2019-09-25 23:58:22.964:
[ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with
the mean cluster time. No action has been taken as the Cluster Time
Synchronization Service is running in observer mode.
...
tail files
...
/scratch/app/oradb/diag/rdbms/apxcmupg/apxcmupg_2/trace/alert_apxcmupg_2.log
Wed Sep 25 06:00:00 2018 VKRM started with pid=82, OS id=4903
Wed Sep 25 06:00:02 2018 Begin automatic SQL Tuning Advisor run for special
tuning task "SYS_AUTO_SQL_TUNING_TASK"
Wed Sep 25 06:00:37 2018 End automatic SQL Tuning Advisor run for special
tuning task "SYS_AUTO_SQL_TUNING_TASK"
Wed Sep 25 23:00:28 2018 Thread 2 advanced to log sequence 759 (LGWR switch)
Current log# 3 seq# 759 mem# 0:
+DATA/apxcmupg/onlinelog/group_3.289.917164707
Current log# 3 seq# 759 mem# 1:
+FRA/apxcmupg/onlinelog/group_3.289.917164707
...
tail files
...
/scratch/app/oradb/diag/rdbms/ogg11204/ogg112041/trace/alert_ogg112041.log
Clearing Resource Manager plan via parameter
Wed Sep 25 05:59:59 2018
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Wed Sep 25 05:59:59 2018
Starting background process VKRM
Wed Sep 25 05:59:59 2018
VKRM started with pid=36, OS id=4901
Wed Sep 25 22:00:31 2018
Thread 1 advanced to log sequence 305 (LGWR switch)
Current log# 1 seq# 305 mem# 0: +DATA/ogg11204/redo01.log
...
tail files
...
/scratch/app/oragrid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
Sun Sep 22 04:42:22 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 2323] opening OCR file
Mon Sep 23 01:05:39 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16591] opening OCR file
Mon Sep 23 01:05:41 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16603] opening OCR file
Mon Sep 23 01:21:12 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1803] opening OCR file
Mon Sep 23 01:21:12 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1816] opening OCR file
...
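For readers who want the same "last lines of several logs at once" view in their own tooling, a small Python sketch is shown here. It is only an illustration of the idea, not a replacement for tfactl tail; the function names are hypothetical and each file is read sequentially, which is acceptable for a sketch but slow on very large logs.

```python
from collections import deque


def tail(path, lines=10):
    """Return the last `lines` lines of one file."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


def tail_many(paths, lines=10):
    """Return {path: last lines} for several log files."""
    return {path: tail(path, lines) for path in paths}
```

Passing the alert log paths of the cluster, ASM and each database yields one dictionary that can be printed file by file, much like the output above.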
18: Monitor Database
performance
Near real-time Database monitoring
• Single instance & RAC
• Monitoring current database activities
• Database performance
• Identifying contention and bottlenecks
• Process & SQL Monitoring
• Real time wait events
• Active Data Guard support
• Multitenant Database (CDB) support
oratop (Support Tools Bundle)
Monitor Database performance
tfactl run oratop -database ogg19c
Section 1 DATABASE: Global database information
Section 2 INSTANCE: Database instance activity
Section 3 EVENT: AWR-like “Top 5 Timed Events”
Section 4 PROCESS | SQL: Processes or SQL mode information
Monitor Database performance
More information: Document 1500864.1
19: Analyze OS Metrics
Collect & Archive OS Metrics
Executes standard UNIX utilities (e.g. vmstat, iostat, ps, etc.) at regular intervals
Built-in Analyzer functionality to summarize, graph and report upon collected metrics
Output is required for diagnosing node reboot and performance issues
Simple to install, extremely lightweight
Runs on all platforms (except Windows)
OS Watcher (Support Tools Bundle)
Analyse OS Metrics
tfactl run oswbb
Starting OSW Analyzer V8.1.2
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c) 2017 by Oracle Corporation
Parsing Data. Please Wait...
Scanning file headers for version and platform info...
Parsing file rws1270069_iostat_18.11.24.0900.dat ...
Parsing file rws1270069_iostat_18.11.24.1000.dat ...
...
Analyse OS Metrics
...
Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs
Enter GC to Generate All CPU Gif Files
Enter GM to Generate All Memory Gif Files
Enter GD to Generate All Disk Gif Files
Enter GN to Generate All Network Gif Files
Enter L to Specify Alternate Location of Gif Directory
Enter Z to Zoom Graph Time Scale (Does not change analysis dataset)
...
Analyse OS Metrics
...
Enter B to Returns to Baseline Graph Time Scale (Does not change
analysis dataset)
Enter R to Remove Currently Displayed Graphs
Enter X to Export Parsed Data to Flat File
Enter S to Analyze Subset of Data(Changes analysis dataset including
graph time scale)
Enter A to Analyze Data
Enter D to Generate DashBoard
Enter Q to Quit Program
Please Select an Option:1
Analyse OS Metrics
[Two OSWatcher analyzer graphs for host myserver69; images not reproduced]
More information: Document 301137.1
20: Diagnose cluster
health
Generates view of Cluster and Database diagnostic metrics
• Always on - Enabled by default
• Provides Detailed OS Resource Metrics
• Assists Node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors (e.g. traceroute, netstat, ping, etc.)
• New CSV output for ease of analysis
Cluster Health Monitor (CHM)
[Diagram: an osysmond process on each node sends OS data to ologgerd (master), which stores it in the GIMR (12c Grid Infrastructure Management Repository)]
Cluster Health Monitor (CHM)
Oclumon CLI or full integration
with EM Cloud Control
Always on - Enabled by default
Detects node and database performance problems
Provides early-warning alerts and corrective action
Supports on-site calibration to improve sensitivity
Integrated into EMCC Incident Manager and
notifications
Standalone Interactive GUI Tool
Cluster Health Advisor (CHA)*
[Diagram labels: OS Data, DB Data, CHM, ochad, GIMR, Node Health Prognostics Engine, Database Health Prognostics Engine]
* Requires, and is included with, a RAC or RAC One Node (R1N) license
Choosing a Data Set for Calibration – Defining “normal”
Calibrating CHA to your RAC deployment
chactl query calibration -cluster -timeranges 'start=2019-09-28 07:00:00,end=2019-09-28 13:00:00'
Cluster name : mycluster
Start time : 2019-09-28 07:00:00
End time : 2019-09-28 13:00:00
Total Samples : 11524
Percentage of filtered data : 100%
1) Disk read (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.11 0.00 2.62 0.00 114.66
<25 <50 <75 <100 >=100
99.87% 0.08% 0.00% 0.02% 0.03%
...
Choosing a Data Set for Calibration – Defining “normal”
Calibrating CHA to your RAC deployment
...
2) Disk write (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.01 0.00 0.15 0.00 6.77
<50 <100 <150 <200 >=200
100.00% 0.00% 0.00% 0.00% 0.00%
...
Choosing a Data Set for Calibration – Defining “normal”
Calibrating CHA to your RAC deployment
...
3) Disk throughput (ASM) (IO/sec)
MEAN MEDIAN STDDEV MIN MAX
2.20 0.00 31.17 0.00 1100.00
<5000 <10000 <15000 <20000 >=20000
100.00% 0.00% 0.00% 0.00% 0.00%
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
9.62 9.30 7.95 1.80 77.90
<20 <40 <60 <80 >=80
92.67% 6.17% 1.11% 0.05% 0.00%
...
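The calibration report summarizes each metric with basic statistics and the share of samples falling into each range. The Python sketch below reproduces that kind of summary for a list of samples so the columns are easier to interpret. It is an assumption-laden illustration: the function name summarize is hypothetical, and the use of the population standard deviation is a guess, since the slides do not say which formula CHA applies.

```python
import statistics


def summarize(samples, edges):
    """Return basic statistics plus the share of samples in each bucket.

    edges are upper bounds: [20, 40, 60, 80] gives the buckets
    <20, <40, <60, <80 and >=80, each sample counted once.
    """
    counts = [0] * (len(edges) + 1)
    for value in samples:
        for index, edge in enumerate(edges):
            if value < edge:
                counts[index] += 1
                break
        else:
            counts[-1] += 1  # value is at or above the last edge
    total = len(samples)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stddev": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
        "buckets": [100.0 * count / total for count in counts],
    }
```

A time range whose buckets are concentrated in the lowest range, as in the CPU utilization example above, describes a quiet period; whether that is a good definition of "normal" depends on the workload you want CHA to treat as the baseline.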
Calibrating CHA to your RAC deployment
Create and store a new model
chactl calibrate cluster -model daytime -timeranges 'start=2018-10-28 07:00:00,end=2018-10-28 13:00:00'
Begin using the new model
chactl monitor cluster -model daytime
Confirm the new model is working
chactl status -verbose
monitoring nodes svr01, svr02 using model daytime
monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
Command line operations
Enable CHA monitoring on a RAC database, optionally with a model
chactl monitor database -db oltpacdb [-model model_name]
Check CHA monitoring status, optionally with verbose output
chactl status -verbose
monitoring nodes svr01, svr02 using model DEFAULT_CLUSTER
monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
Check for Health Issues and Corrective Actions with CHACTL QUERY DIAGNOSIS
Command line operations
chactl query diagnosis -db oltpacdb -start "2019-09-28 01:52:50" -end "2019-09-28 03:19:15"
2019-09-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2019-09-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2019-09-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2019-09-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
Problem: DB Control File IO Performance
Description: CHA has detected that reads or writes to the control files are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were
slow because of an increase in disk IO.
The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR)
performance.
Action: Separate the control files from other database files and move them to faster disks or Solid
State Devices.
Problem: DB Log File Switch
Description: CHA detected that database sessions are waiting longer than expected
for log switch completions.
Cause: The Cluster Health Advisor (CHA) detected high contention during log switches
because the redo log files were small and the redo logs switched frequently.
Action: Increase the size of the redo logs.
HTML diagnostic health output available (-html <file_name>)
Command line operations
Diagnose cluster health
chactl query diagnosis -db oltpacdb -start "2019-09-26 02:52:50.0" -end "2019-09-26 03:19:15.0"
2019-09-26 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2019-09-26 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2019-09-26 02:52:15.0 Database oltpacdb DB CPU Utilization (oltpacdb_2) [detected]
2019-09-26 02:52:50.0 Database oltpacdb DB CPU Utilization (oltpacdb_1) [detected]
2019-09-26 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2019-09-26 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
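When a diagnosis query returns many lines, grouping them by problem makes patterns across instances easier to see. The Python sketch below assumes the line layout shown above (timestamp, the word Database, database name, problem name, instance in parentheses, state in brackets); lines in any other layout are skipped, and group_by_problem is a hypothetical name.

```python
import re
from collections import defaultdict

DIAGNOSIS = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"Database\s+(?P<database>\S+)\s+"
    r"(?P<problem>.+?)\s+\((?P<instance>[^)]+)\)\s+\[(?P<state>[^\]]+)\]$"
)


def group_by_problem(lines):
    """Return {problem: [(timestamp, instance), ...]} for database lines."""
    grouped = defaultdict(list)
    for line in lines:
        match = DIAGNOSIS.match(line.strip())
        if match:
            grouped[match.group("problem")].append(
                (match.group("when"), match.group("instance"))
            )
    return dict(grouped)
```

For the output above this yields three problems, each detected on both oltpacdb_1 and oltpacdb_2, which suggests a database-wide cause rather than a single misbehaving instance.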
Thank You
Any Questions ?
Sandesh Rao
Vice President, Development
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4
Troubleshooting Tips and Tricks for Database 19c   ILOUG Feb 2020

 

Similar a Troubleshooting Tips and Tricks for Database 19c ILOUG Feb 2020

Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Jagadisha Maiya
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and Architecture
Sidney Chen
 
Wait Events 10g
Wait Events 10gWait Events 10g
Wait Events 10g
sagai
 

Similar a Troubleshooting Tips and Tricks for Database 19c ILOUG Feb 2020 (20)

Troubleshooting tips and tricks for Oracle Database Oct 2020
Troubleshooting tips and tricks for Oracle Database Oct 2020Troubleshooting tips and tricks for Oracle Database Oct 2020
Troubleshooting tips and tricks for Oracle Database Oct 2020
 
DEF CON 24 - Patrick Wardle - 99 problems little snitch
DEF CON 24 - Patrick Wardle - 99 problems little snitchDEF CON 24 - Patrick Wardle - 99 problems little snitch
DEF CON 24 - Patrick Wardle - 99 problems little snitch
 
Mach-O par Stéphane Sudre
Mach-O par Stéphane SudreMach-O par Stéphane Sudre
Mach-O par Stéphane Sudre
 
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
 
Troubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversTroubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device Drivers
 
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
 
Bsides
BsidesBsides
Bsides
 
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringOSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
 
The MySQL Performance Schema & New SYS Schema
The MySQL Performance Schema & New SYS SchemaThe MySQL Performance Schema & New SYS Schema
The MySQL Performance Schema & New SYS Schema
 
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
 
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
 
Java bytecode Malware Analysis
Java bytecode Malware AnalysisJava bytecode Malware Analysis
Java bytecode Malware Analysis
 
Percona Live UK 2014 Part III
Percona Live UK 2014  Part IIIPercona Live UK 2014  Part III
Percona Live UK 2014 Part III
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and Architecture
 
Windows Debugging with WinDbg
Windows Debugging with WinDbgWindows Debugging with WinDbg
Windows Debugging with WinDbg
 
APEX Connect 2019 - array/bulk processing in PLSQL
APEX Connect 2019 - array/bulk processing in PLSQLAPEX Connect 2019 - array/bulk processing in PLSQL
APEX Connect 2019 - array/bulk processing in PLSQL
 
Sandboxie process isolation with kernel hooks
Sandboxie process isolation with kernel hooksSandboxie process isolation with kernel hooks
Sandboxie process isolation with kernel hooks
 
Whose Stack Is It Anyway?
Whose Stack Is It Anyway?Whose Stack Is It Anyway?
Whose Stack Is It Anyway?
 
Reverse engineering of binary programs for custom virtual machines
Reverse engineering of binary programs for custom virtual machinesReverse engineering of binary programs for custom virtual machines
Reverse engineering of binary programs for custom virtual machines
 
Wait Events 10g
Wait Events 10gWait Events 10g
Wait Events 10g
 

Más de Sandesh Rao

Más de Sandesh Rao (20)

Whats new in Autonomous Database in 2022
Whats new in Autonomous Database in 2022Whats new in Autonomous Database in 2022
Whats new in Autonomous Database in 2022
 
Oracle Database performance tuning using oratop
Oracle Database performance tuning using oratopOracle Database performance tuning using oratop
Oracle Database performance tuning using oratop
 
Analysis of Database Issues using AHF and Machine Learning v2 - AOUG2022
Analysis of Database Issues using AHF and Machine Learning v2 -  AOUG2022Analysis of Database Issues using AHF and Machine Learning v2 -  AOUG2022
Analysis of Database Issues using AHF and Machine Learning v2 - AOUG2022
 
Analysis of Database Issues using AHF and Machine Learning v2 - SOUG
Analysis of Database Issues using AHF and Machine Learning v2 -  SOUGAnalysis of Database Issues using AHF and Machine Learning v2 -  SOUG
Analysis of Database Issues using AHF and Machine Learning v2 - SOUG
 
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
 
15 Troubleshooting tips and Tricks for Database 21c - KSAOUG
15 Troubleshooting tips and Tricks for Database 21c - KSAOUG15 Troubleshooting tips and Tricks for Database 21c - KSAOUG
15 Troubleshooting tips and Tricks for Database 21c - KSAOUG
 
Machine Learning and AI at Oracle
Machine Learning and AI at OracleMachine Learning and AI at Oracle
Machine Learning and AI at Oracle
 
Top 20 FAQs on the Autonomous Database
Top 20 FAQs on the Autonomous DatabaseTop 20 FAQs on the Autonomous Database
Top 20 FAQs on the Autonomous Database
 
TFA Collector - what can one do with it
TFA Collector - what can one do with it TFA Collector - what can one do with it
TFA Collector - what can one do with it
 
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmea
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmeaIntroduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmea
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmea
 
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEAIntroduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA
 
20 tips and tricks with the Autonomous Database
20 tips and tricks with the Autonomous Database20 tips and tricks with the Autonomous Database
20 tips and tricks with the Autonomous Database
 
TFA, ORAchk and EXAchk 20.2 - What's new
TFA, ORAchk and EXAchk 20.2 - What's new TFA, ORAchk and EXAchk 20.2 - What's new
TFA, ORAchk and EXAchk 20.2 - What's new
 
Machine Learning in Autonomous Data Warehouse
 Machine Learning in Autonomous Data Warehouse Machine Learning in Autonomous Data Warehouse
Machine Learning in Autonomous Data Warehouse
 
Introduction to AutoML and Data Science using the Oracle Autonomous Database ...
Introduction to AutoML and Data Science using the Oracle Autonomous Database ...Introduction to AutoML and Data Science using the Oracle Autonomous Database ...
Introduction to AutoML and Data Science using the Oracle Autonomous Database ...
 
Introduction to Machine Learning and Data Science using Autonomous Database ...
Introduction to Machine Learning and Data Science using Autonomous Database  ...Introduction to Machine Learning and Data Science using Autonomous Database  ...
Introduction to Machine Learning and Data Science using Autonomous Database ...
 
The Machine Learning behind the Autonomous Database ILOUG Feb 2020
The Machine Learning behind the Autonomous Database   ILOUG Feb 2020 The Machine Learning behind the Autonomous Database   ILOUG Feb 2020
The Machine Learning behind the Autonomous Database ILOUG Feb 2020
 
Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...Introduction to Machine Learning and Data Science using the Autonomous databa...
Introduction to Machine Learning and Data Science using the Autonomous databa...
 
The Machine Learning behind the Autonomous Database- EMEA Tour Oct 2019
The Machine Learning behind the Autonomous Database- EMEA Tour Oct 2019 The Machine Learning behind the Autonomous Database- EMEA Tour Oct 2019
The Machine Learning behind the Autonomous Database- EMEA Tour Oct 2019
 
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Troubleshooting Tips and Tricks for Database 19c ILOUG Feb 2020

  • 1. 19 Troubleshooting Tips and Tricks for Database 19c. Sandesh Rao, VP AIOps for the Autonomous Database. @sandeshr https://www.linkedin.com/in/raosandesh/ https://www.slideshare.net/SandeshRao4
  • 2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and prospects are “forward-looking statements” and are subject to material risks and uncertainties. A detailed discussion of these factors and other risks that affect our business is contained in Oracle’s Securities and Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q under the heading “Risk Factors.” These filings are available on the SEC’s website or on Oracle’s website at http://www.oracle.com/investor. All information in this presentation is current as of September 2019 and Oracle undertakes no duty to update any statement in light of new information or future events. Safe harbor statement
  • 3. Agenda
    1. Systemstate dumps
    2. ADDM in multitenant environment
    3. Analyze logs
    4. Connect to a hung database
    5. Guide resolution with Oracle Support
    6. SQLHC
    7. Query trace files using SQL
    8. Hanganalyze enhancements
    9. Track file attribute changes
    10. Event notification
    11. Self analysis on MOS with TFA
    12. AWR scripts
    13. Sanitize sensitive information
    14. Find if anything changed
    15. Detect and collect using SRDCs
    16. Manage logs
    17. Monitor multiple logs
    18. Monitor Database performance
    19. Analyze OS metrics
    20. Diagnose cluster health
  • 4. 1: Systemstate dumps: what are they and how do I read them?
  • 5. Systemstate Dumps
    A systemstate is made up of the processstate of each process in the instance at the time the systemstate was requested. Each processstate is made up of SOs (State Objects), which hold details of the state of the objects currently owned by each PROCESS.
    To navigate a systemstate:
    1. Find the process that most sessions are waiting for
    2. Recursively navigate what each process is waiting for
    3. When you find a process on the CPU, get an error stack to understand why it is blocked
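The three navigation steps can be sketched in a few lines of Python. This is only an illustration, assuming you have already extracted a waiter-to-blocker mapping from the dump by hand; the function names and the process numbers in the usage lines are hypothetical and are not part of any Oracle tool.

```python
from collections import Counter


def most_waited_on(waits_for):
    """Step 1: return the process that the most other processes wait on directly."""
    if not waits_for:
        return None
    return Counter(waits_for.values()).most_common(1)[0][0]


def follow_wait_chain(waits_for, start):
    """Steps 2 and 3: follow waiter -> blocker links from `start`.

    Returns the chain of processes visited. The last element is either a
    process that is not waiting on anything (the one to take an error stack
    from) or a process already seen, which indicates a cycle.
    """
    chain = [start]
    current = start
    while current in waits_for:
        current = waits_for[current]
        if current in chain:
            chain.append(current)  # cycle detected, stop here
            return chain
        chain.append(current)
    return chain


# Hypothetical mapping built by reading a dump: PROCESS 41 waits on 35, and so on.
example = {41: 35, 35: 19, 19: 127, 52: 19}
print(most_waited_on(example))
print(follow_wait_chain(example, 41))
```

The mapping itself still has to come from reading the "waiting for" and holder sections described on the following slides.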
  • 6. Systemstate Dumps: Enqueues
    These are waits for locks held on a particular object. In the example below, the process is waiting for a TX enqueue, as indicated by the "waiting for 'enq: TX - row lock contention'" message:
      PROCESS 41
      ...
      waiting for 'enq: TX - row lock contention' blocking sess=0x39b3a5c90 seq=152 wait_time=0 seconds since wait started=796
      name|mode=54580006, usn
    * 54580006 is hexadecimal; the first two bytes are ASCII codes, so it can be split up as follows to reveal the meaning:
    * ASCII 54 (T) + ASCII 58 (X) => (TX) + Mode 0006 (X)
      ...
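The split of the name|mode value can be checked with a short Python sketch. The function name and the mode table are illustrative additions; the table uses the commonly documented enqueue lock mode numbers (as in V$LOCK) and is an assumption added here, not something taken from the slide.

```python
# Commonly documented enqueue lock modes; added here for illustration only.
LOCK_MODES = {0: "none", 1: "NULL", 2: "SS", 3: "SX", 4: "S", 5: "SSX", 6: "X"}


def decode_name_mode(value_hex):
    """Split a name|mode value such as '54580006' into (enqueue name, mode)."""
    value = int(value_hex, 16)
    # The two high-order bytes are the ASCII codes of the enqueue name.
    name = chr((value >> 24) & 0xFF) + chr((value >> 16) & 0xFF)
    # The two low-order bytes are the requested lock mode.
    mode = value & 0xFFFF
    return name, mode


name, mode = decode_name_mode("54580006")
print(name, mode, LOCK_MODES.get(mode, "unknown"))
```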
  • 7. Systemstate Dumps: Enqueues
    To find more details on the enqueue, search for the string 'req:' (searching DOWN) within the process. In this case we find a section with a "req: X" request:
      SO: 39ad80d60, type: 5, owner: 393cb85e0, flag: INIT/-/-/0x00
      (enqueue) TX-00020009-0001FA04 DID: 0001-0029-00000090
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
      res: 39aef20c8, req: X, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0
    "req:" here refers to the request for the TX lock that the 'enq: TX - row lock contention' wait is waiting for. The request is for an eXclusive TX lock. This section also reveals the enqueue name as a string, (TX-00020009-0001FA04), which can be used to search for the HOLDER. The holder of the resource is shown with the string "mode:", followed by the mode in which the holder holds the lock, in this case eXclusive:
      (enqueue) TX-00020009-0001FA04 DID: 0001-002E-00000014
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
      res: 39aef20c8, mode: X, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78
    We can see that the holder holds the enqueue (mode: X) in a mode that is incompatible with the req: X request.
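The search for the waiter ("req:") and holder ("mode:") entries of one enqueue can be automated with a regular expression. The sketch below is an assumption-laden illustration: it only handles enqueue sections laid out like the two samples on this slide, and the function name and returned field names are made up for this example.

```python
import re

# Matches sections shaped like the samples on the slide: an "(enqueue) NAME"
# line, later "res: ..., req: X" or "res: ..., mode: X", then "sess:" and "proc:".
ENQUEUE_RE = re.compile(
    r"\(enqueue\)\s+(?P<name>\S+).*?res:\s*\S+,\s*(?P<kind>req|mode):\s*(?P<lock>\w+)"
    r".*?sess:\s*(?P<sess>\w+),\s*proc:\s*(?P<proc>\w+)",
    re.DOTALL,
)

SAMPLE = """
(enqueue) TX-00020009-0001FA04 DID: 0001-0029-00000090
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
res: 39aef20c8, req: X, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0
(enqueue) TX-00020009-0001FA04 DID: 0001-002E-00000014
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
res: 39aef20c8, mode: X, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78
"""


def find_enqueue_parties(text, enqueue_name=None):
    """Return one dict per enqueue section; kind 'req' is a waiter, 'mode' a holder."""
    parties = [m.groupdict() for m in ENQUEUE_RE.finditer(text)]
    if enqueue_name is not None:
        parties = [p for p in parties if p["name"] == enqueue_name]
    return parties


for party in find_enqueue_parties(SAMPLE, "TX-00020009-0001FA04"):
    role = "holder" if party["kind"] == "mode" else "waiter"
    print(role, party["lock"], "sess", party["sess"], "proc", party["proc"])
```

In the sample, the holder's sess value (39b3a5c90) is the same address reported as "blocking sess=0x39b3a5c90" in the waiting process on the previous slide.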
  • 8. Systemstate Dumps: Rowcache locks
    Row cache waits are waits against the Row Cache (or Dictionary Cache). Processes will show "waiting for 'row cache lock'":
      PROCESS 19:
      ...
      waiting for 'row cache lock' blocking sess=0x0 seq=2174 wait_time=0
      cache id=7, mode=0, request=3
      --------------------------------------------------------------------------------
      SO: 7000000c6de7678, type: 48, owner: 7000000a6c97cf8, flag: INIT/-/-/0x00
      row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=S
      savepoint=2148
      row cache parent object: address=7000000eedc13a0 cid=7(dc_users)
      hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
      own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6deed98] mode=X
      status=VALID/-/-/-/-/-/-/-/-
      request=N release=TRUE flags=0
    • mode=0 shows that this process does not currently hold the lock
    • request=3 shows that we are requesting the lock in Shared mode (mode 3)
    • object=7000000eedc13a0 shows the object we are requesting the lock on
    • request=S shows that the requested lock is Shared (S)
    • cid=7(dc_users) shows the cache type of dc_users with a cache ID of 7
    • mode=X shows that the lock on the parent object is held in eXclusive mode
  • 9. Systemstate Dumps: Rowcache locks
    This process is waiting for 'row cache lock'. The waiter is waiting for "object=7000000eedc13a0" and is requesting a Share mode lock ("request=S"):
      PROCESS 19:
      ...
      waiting for 'row cache lock' blocking sess=0x0 seq=2174 wait_time=0
      cache id=7, mode=0, request=3
      --------------------------------------------------------------------------------
      SO: 7000000c6de7678, type: 48, owner: 7000000a6c97cf8, flag: INIT/-/-/0x00
      row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=S
      savepoint=2148
      row cache parent object: address=7000000eedc13a0 cid=7(dc_users)
      hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
      own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6deed98] mode=X
      status=VALID/-/-/-/-/-/-/-/-
      request=N release=TRUE flags=0
    To find the HOLDER, search for the same object, but with the "mode=" string, which indicates a holder:
      SO: 7000000c6de84e8, type: 48, owner: 7000000c42297a0, flag: INIT/-/-/0x00
      row cache enqueue: count=1 session=7000000a6702710 object=7000000eedc13a0, mode=X
      savepoint=109
      row cache parent object: address=7000000eedc13a0 cid=7(dc_users)
      hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
      own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6df1b08] mode=X
      status=VALID/-/-/-/-/-/-/-/-
      request=N release=TRUE flags=0
      instance lock id=QH 00000440 00000000
      set=0, complete=FALSE
      set=1, complete=FALSE
      set=2, complete=FALSE
      data=
    In this case the mode of the holder is eXclusive (i.e. object=7000000eedc13a0, mode=X). Search back up to the top of this process section to find which process is holding the resource.
  • 10. Systemstate Dumps: Library Cache Pins
    Waits for library cache pins are of the form "waiting for 'cursor: pin S wait on X'". To find more details, use the idn=XXXXXX value to search down in the systemstate (idn=535d1a6c):
      PROCESS 16:
      waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=58849 wait_time=0 seconds since wait started=0
      idn=535d1a6c, value=c1600000000, where|sleeps=5003f2428
      KGX Atomic Operation Log 7000002e5b9d160
      Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD
      Cursor Pin uid 2489 efd 0 whr 5 slp 58733
      opr=2 pso=70000028c47def0 flg=0
      pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
      ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
      hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
    • SID 3094 holds the Mutex (3094, 0)
    • Request is for Shared (GET_SHRD) mode
  • 11. Systemstate Dumps: Library Cache Pins
    To find the HOLDER, search for 'idn XXXXXXX oper' until you find an entry that is held, i.e. one whose oper is not GET_XXX (here: idn 535d1a6c oper):
      KGX Atomic Operation Log 7000002cd934270
      Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL
      Cursor Pin uid 3094 efd 0 whr 7 slp 0
      opr=3 pso=7000002a71c4180 flg=0
      pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
      ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
      hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
    • SID 3094 holds the Mutex in Exclusive (EXCL) mode
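The "search for idn ... oper until it is not GET_XXX" rule can be expressed as a small filter. This is a sketch under the assumption that mutex entries look exactly like the "Mutex addr(sid, ref) idn ... oper ..." lines shown on the slides; the function names are hypothetical. The second helper relies on the observation that, in this sample, the upper 32 bits of the wait's value=c1600000000 equal the holder SID 3094; treat that as a cross-check for this example rather than a general rule.

```python
import re

MUTEX_RE = re.compile(
    r"Mutex\s+(?P<addr>\w+)\((?P<sid>\d+),\s*(?P<ref>\d+)\)\s+"
    r"idn\s+(?P<idn>\w+)\s+oper\s+(?P<oper>\w+)"
)

SAMPLE = """
KGX Atomic Operation Log 7000002e5b9d160
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD
Cursor Pin uid 2489 efd 0 whr 5 slp 58733
KGX Atomic Operation Log 7000002cd934270
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL
Cursor Pin uid 3094 efd 0 whr 7 slp 0
"""


def find_mutex_holders(text, idn):
    """Return (sid, oper) for entries on `idn` that are held, i.e. not GET_* requests."""
    return [
        (int(m.group("sid")), m.group("oper"))
        for m in MUTEX_RE.finditer(text)
        if m.group("idn") == idn and not m.group("oper").startswith("GET_")
    ]


def holder_sid_from_value(value_hex):
    """Upper 32 bits of the wait's value field (assumption: 64-bit layout as in the sample)."""
    return int(value_hex, 16) >> 32


print(find_mutex_holders(SAMPLE, "535d1a6c"))
print(holder_sid_from_value("c1600000000"))
```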
  • 12. Systemstate Dumps: Library Cache Lock
    To find more details, use the handle address in the form handle=address to search down in the systemstate (i.e. handle=70000030de975a8):
      PROCESS 35:
      waiting for 'library cache lock' blocking sess=0x0 seq=35844 wait_time=0 seconds since wait started=14615
      handle address=70000030de975a8, lock address=70000026947e190, 100*mode+namespace=12d
      SO: 70000026947e190, type: 53, owner: 700000308d726f0, flag: INIT/-/-/0x00
      LIBRARY OBJECT LOCK: lock=70000026947e190 handle=70000030de975a8 request=X
      call pin=0 session pin=0 hpc=0000 hlc=0000
      htl=70000026947e210[7000002b333ffe8,7000002b333ffe8] htb=7000002b333ffe8 ssga=7000002b333f2a0
      user=700000307a7ca68 session=700000307a7ca68 count=0 flags=[0000] savepoint=0x23e411
      LIBRARY OBJECT HANDLE: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0
      name=<USER_NAME>.<OBJECT_NAME>
    • request=X shows that an eXclusive (X) lock is requested
    • <USER_NAME>.<OBJECT_NAME> is the object we are trying to lock
  • 13. Systemstate Dumps: Library Cache Lock
    To find the HOLDER, search for 'handle=XXXXXXXXXX mode=' until you find an entry that is held in a mode other than NULL (here: handle=70000030de975a8 mode=):
      SO: 700000288b03ae0, type: 53, owner: 7000002cc697468, flag: INIT/-/-/0x00
      LIBRARY OBJECT LOCK: lock=700000288b03ae0 handle=70000030de975a8 mode=S
      call pin=0 session pin=0 hpc=0000 hlc=0000
      htl=700000288b03b60[7000002a179a1a8,7000002b3800878] htb=7000002b3800878 ssga=7000002b37ffb30
      user=70000030fafab00 session=70000030fafab00 count=1 flags=[0000] savepoint=0x417
      LIBRARY OBJECT HANDLE: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0
      name=<USER_NAME>.<OBJECT_NAME>
    • mode=S shows that the lock is held in Shared (S) mode
    • name=<USER_NAME>.<OBJECT_NAME> confirms the object name
  • 14. Systemstate Dumps: Latch free
      PROCESS 8:
      waiting for 'latch free' blocking sess=0x0 seq=4577 wait_time=0
      address=99ff60018, number=9d, tries=0
      waiting for 99ff60018 Child library cache level=5 child#=3
      Location from where latch is held: kglic: child
      Context saved from call: 26
      state=busy
      possible holder pid = 127 ospid=23086
      wtr=99ff60018, next waiter 9993858b8
    • 9d is the latch# from v$latchname, in hex (= 157 decimal)
    Towards the top of the PROCESS dump you will see the exact latch we are waiting for, and even who holds it:
    • PROCESS 127 (ospid: 23086) holds the latch. PROCESS 127 shows:
      holding 99ff60018 Child library cache level=5 child#=3
      Location from where latch is held: kglic: child
      Context saved from call: 26
      state=busy
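The hex latch number and the "possible holder" line can be pulled out of a 'latch free' wait with two regular expressions. This is a minimal sketch that assumes the wait section is formatted like the sample on this slide; the function name and the returned keys are illustrative only.

```python
import re


def parse_latch_wait(text):
    """Extract the latch address, decimal latch# and possible holder from a 'latch free' wait."""
    wait = re.search(r"address=(\w+),\s*number=([0-9a-fA-F]+)", text)
    if wait is None:
        raise ValueError("no 'address=..., number=...' line found")
    holder = re.search(r"possible holder pid\s*=\s*(\d+)\s+ospid=(\d+)", text)
    return {
        "address": wait.group(1),
        # number= is printed in hex; convert it so it can be looked up as latch# in v$latchname
        "latch_number": int(wait.group(2), 16),
        "holder_pid": int(holder.group(1)) if holder else None,
        "holder_ospid": int(holder.group(2)) if holder else None,
    }


SAMPLE = """
waiting for 'latch free' blocking sess=0x0 seq=4577 wait_time=0
address=99ff60018, number=9d, tries=0
waiting for 99ff60018 Child library cache level=5 child#=3
possible holder pid = 127 ospid=23086
"""

print(parse_latch_wait(SAMPLE))
```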
  • 15. Systemstate Dumps: Other useful information
    If you want to find which object a handle refers to, search for handle=XXXXXXXXXX until you come across the LIBRARY OBJECT HANDLE, e.g. handle=c00000006c0f8490:
      LIBRARY OBJECT HANDLE: handle=c00000006c0f8490
      name=SELECT USER FROM DUAL
      hash=cd1ceca0 timestamp=03-23-2007 09:00:00
      namespace=CRSR flags=RON/TIM/PN0/SML/[12010000]
    • name shows the name of the object the handle refers to
    • namespace=CRSR shows that it is of type CURSOR
  • 16. 2: ADDM in a Multitenant Environment
  • 17. ADDM in a multitenant environment
    Starting with Oracle Database 12c, ADDM is enabled by default in the root container of a multitenant container database (CDB). You can also use ADDM in a pluggable database (PDB).
    • In a CDB, ADDM works in the same way as it works in a non-CDB
    • ADDM analysis is performed each time an AWR snapshot is taken on a CDB root or a PDB
    • ADDM does not work in a PDB by default, because automatic AWR snapshots are disabled
  • 18. ADDM in a multitenant environment
    To enable ADDM in a PDB:
    • Set the AWR_PDB_AUTOFLUSH_ENABLED initialization parameter to TRUE in the PDB using the following command:
      SQL> ALTER SYSTEM SET AWR_PDB_AUTOFLUSH_ENABLED=TRUE;
    • Set the AWR snapshot interval to a value greater than 0 in the PDB, as shown in the following example:
      SQL> EXEC dbms_workload_repository.modify_snapshot_settings(interval=>60);
    Results on a PDB provide only PDB-specific findings and recommendations.
  • 19. 3: Analyze logs and look for errors
  • 20. Investigate logs and look for errors
      tfactl analyze -since 1d
      INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...
      ...
      Unique error messages for last ~1 day(s)
      Occurrences percent server name          error
      ----------- ------- -------------------- -----
                1  100.0% myserver1            Errors in file /u01/oracle/diag/rdbms/orcl2/orcl2/trace/orcl2_ora_12272.trc (incident=10151):
                                               ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], []
                                               Incident details in: /u01/oracle/diag/rdbms/orcl2/orcl2/incident/incdir_10151/orcl2_ora_12272_i10151.trc
      ...
  • 21. Investigate logs and look for errors
      tfactl analyze -search "ORA-04031" -last 1d
      INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...
      ...
      Matching regex: ORA-04031
      Case sensitive: false
      Match count: 1
      [Source: /u01/oracle/diag/rdbms/orcl2/orcl2/trace/alert_orcl2.log, Line: 1941]
      Sep 15 12:09:05 2019
      Errors in file /u01/oracle/diag/rdbms/orcl2/orcl2/trace/orcl2_ora_6982.trc (incident=7665):
      ORA-04031: unable to allocate bytes of shared memory ("","","","")
      Incident details in: /u01/app/oracle/diag/rdbms/orcl2/orcl2/incident/incdir_7665/orcl2_ora_6982_i7665.trc
      ...
  • 22. Examples
      tfactl analyze -since 5h
        # Show summary of events from alert logs and system messages in the last 5 hours
      tfactl analyze -comp os -since 1d
        # Show summary of events from system messages in the last 1 day
      tfactl analyze -search "ORA-" -since 2d
        # Search for the string ORA- in alert and system logs in the past 2 days
      tfactl analyze -search "/Starting/c" -since 2d
        # Search for the case-sensitive string "Starting" in the past 2 days
      tfactl analyze -comp os -for "Feb/24/2019 11" -search "."
        # Show all system log messages at time Feb/24/2019 11
      tfactl analyze -comp osw -since 6h
        # Show OSWatcher Top summary for the last 6 hours
      tfactl analyze -comp oswslabinfo -from "Feb/26/2019 05:00:01" -to "Feb/26/2019 06:00:01"
        # Show OSWatcher slabinfo summary for the specified time period
      tfactl analyze -since 1h -type generic
        # Analyze all generic messages in the last one hour
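When these commands are run from scripts, it can help to assemble the argument list in one place. The sketch below is hypothetical helper code, not part of TFA: it only uses the options that appear in the examples above (-comp, -search, -type, -since, -for, -from, -to), and the check on the -since value (digits followed by h or d) is an assumption based on those examples. It builds the command line without running it.

```python
import re
import shlex


def build_analyze_args(comp=None, search=None, type_=None,
                       since=None, for_=None, from_=None, to=None):
    """Build an argv list for 'tfactl analyze' from the options shown in the examples."""
    if since is not None and not re.fullmatch(r"\d+[hd]", since):
        raise ValueError("since is expected to look like '5h' or '2d'")
    if (from_ is None) != (to is None):
        raise ValueError("from_ and to are expected to be given together")
    args = ["tfactl", "analyze"]
    # Emit each option only when a value was supplied.
    for flag, value in (("-comp", comp), ("-search", search), ("-type", type_),
                        ("-since", since), ("-for", for_),
                        ("-from", from_), ("-to", to)):
        if value is not None:
            args += [flag, value]
    return args


args = build_analyze_args(search="ORA-", since="2d")
# shlex.join quotes arguments where needed so the line can be pasted into a shell.
print(shlex.join(args))
```

The resulting list could be passed to subprocess.run on a host where tfactl is installed; that step is left out here so the sketch runs anywhere.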
  • 23. Investigate logs and look for errors
      $ ./tfactl analyze -type generic -since 7d
      INFO: analyzing all (Alert and Unix System Logs) logs for the last 10080 minutes...
      ...
      Total message count: 54,807, from 28-Jan-2019 04:26:28 PM PST to 03-Mar-2019 02:41:34
      Messages matching last ~7 day(s): 3,139, from 24-Feb-2019 02:46:23 PM PST to 03-Mar-2019 02:41:34
      last ~7 day(s) generic count: 3,139, from 24-Feb-2019 02:46:23 PM PST to 03-Mar-2019 02:41:34
      last ~7 day(s) unique generic count: 94
      Message types for last ~7 day(s)
      Occurrences percent server name          type
      ----------- ------- -------------------- -----
            3,139  100.0% myhost1              generic
      ...
  • 24. Investigate logs and look for errors
      Unique generic messages for last ~7 day(s)
      Occurrences percent server name          generic
      ----------- ------- -------------------- -----
            1,504   47.9% myhost1              : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem...
              487   15.5% myhost1              : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem...
              336   10.7% myhost1              myhost1 smartd[13812]: Device: /dev/sdv, SMART Failure: FAILURE...
              336   10.7% myhost1              myhost1 smartd[13812]: Device: /dev/sdag, SMART Failure: FAILURE ...
              103    3.3% myhost1              myhost1 last message repeated 9 times
              103    3.3% myhost1              myhost1 kernel: oracle: sending ioctl 2285 to a partition!
      ...snipping for brevity...
  • 25. Pattern match search output
      tfactl analyze -search "ORA-" -since 7d
      ...
      [Source: /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/alert_RATODA1.log, Line: 9494]
      Feb 25 22:00:02 2014
      Errors in file /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/RATODA1_j003_10948.trc:
      ORA-12012: error on auto execute of job "ORACLE_OCM"."MGMT_CONFIG_JOB_2_1"
      ORA-29280: invalid directory path
      ORA-06512: at "ORACLE_OCM.MGMT_DB_LL_METRICS", line 2436
      ORA-06512: at line 1
      End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
      ...
  • 26. OS Watcher top data tfactl analyze -comp osw -since 6h ... statistic: t first highest (time) lowest (time) average non zero 3rd last 2nd last last trend top.cpu.util.id: % 98.0 99.7 @10:35AM 72.8 @03:11PM 97.3 2,059 95.2 96.8 96.0 -2% top.cpu.util.st: % 0.1 0.1 @09:14AM 0.0 @09:14AM 0.0 889 0.0 0.0 0.0 -100% top.cpu.util.us: % 0.1 8.8 @11:31AM 0.0 @09:14AM 0.6 1,966 4.3 0.8 3.4 3300% top.cpu.util.wa: % 1.7 18.7 @03:11PM 0.1 @10:35AM 1.1 2,059 0.3 0.4 0.4 -76% top.loadavg.last01min: 1.17 3.12 @09:44AM 0.07 @12:45PM 0.93 1,823 0.31 0.26 0.22 -81% top.loadavg.last05min: 0.94 2.26 @09:44AM 0.27 @12:45PM 0.93 1,823 0.82 0.79 0.77 -18% top.loadavg.last15min: 0.79 1.60 @09:46AM 0.44 @01:18PM 0.92 1,823 0.96 0.95 0.94 18% top.mem.buffers: k 808232 808388 @09:41AM 785608 @02:57PM 796511 2,093 785744 785744 785744 -2% top.mem.free: k 1130332 1291344 @10:02AM 927576 @09:43AM 1188576 2,093 1244020 1265248 1265188 11% top.swap.used: k 47556 48088 @03:00PM 47556 @09:14AM 47828 2,097 48088 48088 48088 1% top.tasks.running: 1 4 @12:04PM 1 @09:14AM 1 1,996 1 2 2 100% top.tasks.total: 514 527 @02:57PM 509 @09:18AM 514 1,996 518 521 520 1% top.tasks.zombie: 0 5 @11:04AM 0 @09:14AM 0 62 0 0 0 n/a top.users: 5 6 @03:00PM 5 @09:14AM 5 1,823 6 6 6 20% ...
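The summary rows above share one layout: statistic name, optional unit, first value, highest and lowest values with their times, average, non-zero sample count, the last three samples, and the trend. A minimal Python sketch for picking out statistics with a large trend follows. It assumes the row layout is exactly what the slide shows; `parse_stat` and `big_movers` are hypothetical helper names, not TFA functions.

```python
from typing import Dict, List


def _num(text: str) -> float:
    """Convert '2,059' or '18.7' to a float."""
    return float(text.replace(",", ""))


def parse_stat(line: str) -> Dict[str, object]:
    """Parse one OSWatcher summary row as laid out on the slide (assumed format)."""
    tokens = line.split()
    name = tokens[0].rstrip(":")
    rest = tokens[1:]
    unit = ""
    if not rest[0][0].isdigit():      # optional unit column such as '%' or 'k'
        unit = rest.pop(0)
    (first, highest, high_at, lowest, low_at,
     average, _non_zero, _third, _second, last, trend) = rest
    return {
        "name": name,
        "unit": unit,
        "first": _num(first),
        "highest": _num(highest),
        "high_at": high_at.lstrip("@"),
        "lowest": _num(lowest),
        "low_at": low_at.lstrip("@"),
        "average": _num(average),
        "last": _num(last),
        "trend_pct": None if trend == "n/a" else _num(trend.rstrip("%")),
    }


def big_movers(lines: List[str], threshold: float = 50.0) -> List[str]:
    """Return names of statistics whose trend moved by at least threshold percent."""
    names = []
    for line in lines:
        stat = parse_stat(line)
        if stat["trend_pct"] is not None and abs(stat["trend_pct"]) >= threshold:
            names.append(stat["name"])
    return names
```

A large trend is only a hint about where to look; the highest and lowest columns with their timestamps show when the excursion happened.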
  • 27. OS Watcher slabinfo data tfactl analyze -comp oswslabinfo -from "Feb/26/2019 05:00:01" -to "Feb/26/2019 06:00:01" ... statistic: t first highest (time) lowest (time) average non zero 3rd last 2nd last last trend slabinfo.acfs_ccb_cache.active_objs: 4 38 @05:52AM 0 @05:01AM 10 294 3 1 8 100% slabinfo.inet_peer_cache.active_objs: 23 39 @05:59AM 23 @05:00AM 23 351 23 23 39 69% slabinfo.sigqueue.active_objs: 385 768 @05:28AM 285 @05:27AM 554 351 712 621 577 49% slabinfo.skbuff_fclone_cache.active_objs: 55 133 @05:51AM 11 @05:20AM 69 351 56 77 70 27% slabinfo.names_cache.active_objs: 126 180 @05:00AM 110 @05:23AM 146 351 171 166 156 23% slabinfo.sgpool-8.active_objs: 135 228 @05:31AM 59 @05:11AM 152 351 180 165 157 16% slabinfo.UDP.active_objs: 568 675 @05:28AM 492 @05:17AM 597 351 630 596 626 10% slabinfo.size-8192.active_objs: 174 209 @05:36AM 160 @05:14AM 181 351 205 187 188 8% slabinfo.task_delay_info.active_objs: 1477 1856 @05:28AM 1334 @05:57AM 1574 351 1529 1411 1579 6% slabinfo.pid.active_objs: 1608 1980 @05:29AM 1452 @05:21AM 1678 351 1564 1487 1689 5% slabinfo.blkdev_requests.active_objs: 720 880 @05:04AM 651 @05:54AM 745 351 707 736 761 5% slabinfo.size-256.active_objs: 1116 1305 @05:06AM 846 @05:11AM 1091 351 1245 1143 1166 4% slabinfo.ip_dst_cache.active_objs: 1497 1800 @05:28AM 1279 @05:36AM 1517 351 1594 1466 1560 4% slabinfo.sock_inode_cache.active_objs: 2168 2329 @05:11AM 2106 @05:56AM 2225 351 2322 2278 2232 2% slabinfo.size-512.active_objs: 3036 3152 @05:38AM 3007 @05:01AM 3088 351 3136 3112 3075 1% ...
  • 28. 4: How to connect to a hung database for diagnostics
  • 29. How do you connect to a database when connections are hanging?
      • A sqlplus preliminary connection can still connect to the database because no session is created
        - You will have limited access to the SGA
        - This helps in capturing diagnostic information such as a systemstate dump
      • Two ways to connect to sqlplus using a preliminary connection:
          sqlplus -prelim / as sysdba
        or
          sqlplus -prelim
          SQL> set _prelim on
          SQL> connect / as sysdba
          Prelim connection established
  • 30. 5 – Guided resolution with Oracle Support
  • 31. Oracle Database ORA-00060 Errors on Single Instance (Non-RAC) Diagnosing Using Deadlock Graphs in ORA-00060 Trace Files (Doc ID 1550091.2) Troubleshooting Assistant https://support.oracle.com/epmos/faces/DocContentDisplay?id=1550091.2
  • 33. Understand and Troubleshoot Startup/Shutdown Issues (Doc ID 1591095.2) Troubleshooting Assistant https://support.oracle.com/epmos/faces/DocContentDisplay?id=1591095.2
  • 37. Oracle Undo Management (ORA-01555, ORA-30036, ORA-01628, ORA-01552, etc.) (Doc ID 1575667.2) Troubleshooting Assistant https://support.oracle.com/epmos/faces/DocContentDisplay?id=1575667.2
  • 40. Handling Block Corruptions in Oracle7 / 8 / 8i / 9i / 10g / 11g (Doc ID 1598103.2) Troubleshooting Assistant https://support.oracle.com/epmos/faces/DocContentDisplay?id=1598103.2
  • 45. Health Check SQL
      1. Log in to the database server and set the environment used by the database instance
      2. Download the "sqlhc.zip" archive file and extract the contents to a suitable directory/folder
      3. Connect to SQL*Plus as SYS, a DBA account, or a user with access to data dictionary views, and execute the "sqlhc.sql" script. It prompts for two parameters:
         i. Oracle Pack License (Tuning, Diagnostics or None) [T|D|N] (required)
         ii. A valid SQL_ID for the SQL to be analyzed
      If the site has both Tuning and Diagnostics licenses, specify T (Oracle Tuning Pack includes Oracle Diagnostics)
      For example:
      # sqlplus / as sysdba
      SQL> START sqlhc.sql T djkbyr8vkc64h
  • 47. 7: Query trace files using SQL
  • 48. SQL> describe V$DIAG_TRACE_FILE Name Null? Type ----------------------------------------- -------- ---------------------------- ADR_HOME VARCHAR2(444) TRACE_FILENAME VARCHAR2(68) CHANGE_TIME TIMESTAMP(3) WITH TIME ZONE MODIFY_TIME TIMESTAMP(3) WITH TIME ZONE CON_ID NUMBER V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 49. SQL> describe V$DIAG_TRACE_FILE_CONTENTS Name Null? Type ----------------------------------------- -------- ---------------------------- ADR_HOME VARCHAR2(444) TRACE_FILENAME VARCHAR2(68) RECORD_LEVEL NUMBER PARENT_LEVEL NUMBER RECORD_TYPE NUMBER TIMESTAMP TIMESTAMP(3) WITH TIME ZONE PAYLOAD VARCHAR2(4000) SECTION_ID NUMBER SECTION_NAME VARCHAR2(64) COMPONENT_NAME VARCHAR2(64) OPERATION_NAME VARCHAR2(64) FILE_NAME VARCHAR2(64) FUNCTION_NAME VARCHAR2(64) LINE_NUMBER NUMBER THREAD_ID VARCHAR2(64) SESSION_ID NUMBER SERIAL# NUMBER CON_UID NUMBER CONTAINER_NAME VARCHAR2(64) CON_ID NUMBER V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 50. SQL> select trace_filename from v$diag_trace_file; TRACE_FILENAME -------------------------------------------------------------------- ORCL1_mz00_21108.trc ORCL1_gcr2_16504.trc ORCL1_gcr3_12849.trc ORCL1_gcr1_28159.trc ORCL1_gcr1_27603.trc ORCL1_gcr0_29971.trc ORCL1_mz00_26487.trc ORCL1_mz00_28329.trc ORCL1_ora_19005.trc ORCL1_gcr3_12879.trc ORCL1_gcr1_11688.trc V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 51. SQL> select payload from v$diag_trace_file_contents where trace_filename ='ORCL1_ora_19005.trc'; PAYLOAD -------------------------------------------------------------------------------- Trace file /u01/app/oracle/diag/rdbms/orcl_unq/ORCL1/trace/ORCL1_ora_19005.trc Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.2.0.0.0 Build label: RDBMS_19.2.0.0.0_LINUX.X64_190121 ORACLE_HOME: /u01/app/oracle/product/19c/dbhome_1 System name: Linux Node name: myserver65 Release: 4.14.35-1844.1.3.el7uek.x86_64 Version: #2 SMP Wed Jan 2 21:18:29 PST 2019 Machine: x86_64 VM name: Xen Version: 4.1 (HVM) ... V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 52. ... PAYLOAD -------------------------------------------------------------------------------- Instance name: ORCL1 Redo thread mounted by this instance: 1 Oracle process number: 12 Unix process pid: 19005, image: oracle@myserver65 (TNS V1-V3) *** 2019-11-20T01:22:10.770960+00:00 *** SESSION ID:(106.17196) 2019-11-20T01:22:10.771014+00:00 *** CLIENT ID:() 2019-11-20T01:22:10.771027+00:00 *** SERVICE NAME:(SYS$USERS) 2019-11-20T01:22:10.771039+00:00 ... V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 53. SQL> describe V$DIAG_SESS_SQL_TRACE_RECORDS Name Null? Type ----------------------------------------- -------- ---------------------------- ADR_HOME VARCHAR2(444) TRACE_FILENAME VARCHAR2(68) RECORD_LEVEL NUMBER PARENT_LEVEL NUMBER RECORD_TYPE NUMBER TIMESTAMP TIMESTAMP(3) WITH TIME ZONE PAYLOAD VARCHAR2(4000) SECTION_ID NUMBER SECTION_NAME VARCHAR2(64) COMPONENT_NAME VARCHAR2(64) OPERATION_NAME VARCHAR2(64) FILE_NAME VARCHAR2(64) FUNCTION_NAME VARCHAR2(64) LINE_NUMBER NUMBER THREAD_ID VARCHAR2(64) SESSION_ID NUMBER SERIAL# NUMBER CON_UID NUMBER CONTAINER_NAME VARCHAR2(64) CON_ID NUMBER V$DIAG_SESS_SQL_TRACE_RECORDS
  • 54. V$DIAG_SESS_SQL_TRACE_RECORDS: enable session tracing
      SQL> SELECT sid,serial# FROM v$session WHERE username = 'SYS';
             SID    SERIAL#
      ---------- ----------
              33      45888
             129       6051
      SQL> EXECUTE DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(129,6051,TRUE);
      PL/SQL procedure successfully completed.
  • 55. SQL> select unique trace_filename from V$DIAG_SESS_SQL_TRACE_RECORDS; TRACE_FILENAME -------------------------------------------------------------------- ORCL1_ora_14151.trc SQL> select payload from V$DIAG_SESS_SQL_TRACE_RECORDS where trace_filename = 'ORCL1_ora_14151.trc'; PAYLOAD -------------------------------------------------------------------------------- CLOSE #140506358472544:c=19,e=18,dep=0,type=1,tim=7769230586778 ===================== PARSING IN CURSOR #140506358494608 len=97 dep=1 uid=0 oct=3 lid=0 tim=7769230600 163 hv=791757000 ad='7fa0c290' sqlid='87gaftwrm2h68' select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname from obj$ o where o.obj#=:1 END OF STMT EXEC #140506358494608:c=65,e=65,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=107238262 4,tim=7769230600159 ... V$DIAG_SESS_SQL_TRACE_RECORDS
  • 56. ... PAYLOAD -------------------------------------------------------------------------------- FETCH #140506358494608:c=38,e=37,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=10723826 24,tim=7769230600324 CLOSE #140506358494608:c=5,e=4,dep=1,type=3,tim=7769230600381 EXEC #140506358494608:c=23,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=107238262 4,tim=7769230600500 FETCH #140506358494608:c=11,e=12,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=10723826 24,tim=7769230600547 ... V$DIAG_SESS_SQL_TRACE_RECORDS SQL> EXECUTE DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(129,6051,FALSE); PL/SQL procedure successfully completed.
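Once the trace records are available as rows, the call lines (EXEC, FETCH, CLOSE) can be summarized outside the database. The following Python sketch totals the `e=` (elapsed, in microseconds) values per cursor number. It assumes each payload line has been rejoined into one string, since the listing above is wrapped at 80 columns; `parse_call` and `elapsed_by_cursor` are hypothetical helper names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Matches call lines such as 'FETCH #140506358494608:c=38,e=37,...'
CALL_RE = re.compile(r"^(PARSE|EXEC|FETCH|CLOSE) #(\d+):(.*)$")


def parse_call(line: str) -> Optional[Tuple[str, str, Dict[str, int]]]:
    """Return (operation, cursor number, numeric fields) or None for other lines."""
    match = CALL_RE.match(line.strip())
    if not match:
        return None
    operation, cursor, stats = match.groups()
    fields = {}
    for pair in stats.split(","):
        key, _, value = pair.partition("=")
        if value.isdigit():
            fields[key] = int(value)
    return operation, cursor, fields


def elapsed_by_cursor(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the e= (elapsed microseconds) field of every call line, per cursor."""
    totals = defaultdict(int)
    for line in lines:
        parsed = parse_call(line)
        if parsed:
            _, cursor, fields = parsed
            totals[cursor] += fields.get("e", 0)
    return dict(totals)
```

Lines that are not call lines, such as PARSING IN CURSOR headers or the SQL text itself, are skipped rather than treated as errors.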
  • 58. Always on - Enabled by default Reliably detects database hangs and deadlocks Autonomously resolves them Logs all detections and resolutions New SQL interface to configure sensitivity (Normal/High) and trace file sizes Oracle Hang Manager Session DIA0 EVALUATE DETECT ANALYZE Hung? VERIFY Victim Policy
  • 59. Database Hang Management - Infrastructure (Database, ASM)
      Monitors session snapshots for progress
      Evaluates potential hangs over time based upon wait graphs
      Analyzes the hang chain of sessions to identify blocker/victim
      Discovers when the blocker is located in an ASM instance
      Requests that ASM terminate the session or instance, relying on Flex ASM for recovery
      Detection and resolution is bi-directional
  • 60. Full Resolution Dump Trace File and DB Alert Log Audit Reports Oracle 12c Hang Manager Dump file …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc Oracle Database 12c Enterprise Edition Release 18/19c.0.0.0 - 64bit Beta With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options Build label: RDBMS_MAIN_LINUX.X64_151013 ORACLE_HOME: …/3775268204/oracle System name: Linux Node name: slc05kyr Release: 2.6.39-400.211.1.el6uek.x86_64 Version: #1 SMP Fri Nov 15 13:39:16 PST 2013 Machine: x86_64 VM name: Xen Version: 3.4 (PVM) Instance name: hm62 Redo thread mounted by this instance: 2 Oracle process number: 19 Unix process pid: 12656, image: oracle@slc05kyr (DIA0) *** 2019-10-13T16:47:59.541509+17:00 *** SESSION ID:(96.41299) 2019-10-13T16:47:59.541519+17:00 *** CLIENT ID:() 2019-10-13T16:47:59.541529+17:00 *** SERVICE NAME:(SYS$BACKGROUND) 2019-10-13T16:47:59.541538+17:00 *** MODULE NAME:() 2019-10-13T16:47:59.541547+17:00 *** ACTION NAME:() 2019-10-13T16:47:59.541556+17:00 *** CLIENT DRIVER:() 2019-10-13T16:47:59.541565+17:00
  • 61. Full Resolution Dump Trace File and DB Alert Log Audit Reports Oracle 12c Hang Manager 2019-10-13T16:47:59.435039+17:00 Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353): ORA-32701: Possible hangs up to hang ID=1 detected Incident details in: …/diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc 2019-10-13T16:47:59.506775+17:00 DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2 due to a GLOBAL, HIGH confidence hang with ID=1. Hang Resolution Reason: Automatic hang resolution was performed to free a significant number of affected sessions. DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1. In the alert log on the instance local to the session (instance 2 in this case), we see the following: 2019-10-13T16:47:59.538673+17:00 Errors in file …/diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753): ORA-32701: Possible hangs up to hang ID=1 detected Incident details in: …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc 2019-10-13T16:48:04.222661+17:00 DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1 requested by master DIA0 process on instance 1 Hang Resolution Reason: Automatic hang resolution was performed to free a significant number of affected sessions. by terminating session sid:40 with serial # 43179 (ospid:13031)
  • 62. 9: Keep track of the attributes of important files pre-post patching
  • 63. Track attribute changes on important files
      Start tracking using -fileattr start
      Automatically discovers Grid Infrastructure and Database directories and files
      • Prevent discovery using -excludediscovery
      Further configure the list of monitored directories using -includedir
      tfactl <orachk|exachk> -fileattr start -includedir "/root/myapp/config"
      ...
      List of directories(recursive) for checking file attributes:
      /u01/app/oradb/product/11.2.0/dbhome_11203
      /u01/app/oradb/product/11.2.0/dbhome_11204
      /root/myapp/config
      orachk has taken snapshot of file attributes for above directories at: /orahome/oradb/orachk/orachk_mysrv21_20170504_041214
  • 64. Track attribute changes on important files
      Compare current attributes against the first snapshot using -fileattr check
      When checking, use the same include/exclude arguments you started with
      tfactl <orachk|exachk> -fileattr check -includedir "/root/myapp/config"
      ...
      List of directories(recursive) for checking file attributes:
      /u01/app/oradb/product/11.2.0/dbhome_11203
      /u01/app/oradb/product/11.2.0/dbhome_11204
      /root/myapp/config
      Checking file attribute changes...
      "/root/myapp/config/myappconfig.xml" is different:
      Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml
      Current : 0644 root root /root/myapp/config/myappconfig.xml
      ...
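The Baseline and Current lines differ in only one column, which is easy to miss by eye. A small Python sketch can name the attribute that changed. It assumes the columns are permission mode, owner, group and path, in that order, as the example suggests; `parse_attr` and `diff_attrs` are hypothetical names and not part of ORAchk.

```python
from typing import Dict, Tuple


def parse_attr(line: str) -> Dict[str, str]:
    """Split 'Baseline : 0644 oracle root /path' into named attributes.

    The column order (mode, owner, group, path) is an assumption based on
    the sample output.
    """
    _label, _, rest = line.partition(":")
    mode, owner, group, path = rest.split(None, 3)
    return {"mode": mode, "owner": owner, "group": group, "path": path}


def diff_attrs(baseline: str, current: str) -> Dict[str, Tuple[str, str]]:
    """Return {attribute: (baseline value, current value)} for changed attributes."""
    before = parse_attr(baseline)
    after = parse_attr(current)
    return {
        key: (before[key], after[key])
        for key in ("mode", "owner", "group")
        if before[key] != after[key]
    }


if __name__ == "__main__":
    print(diff_attrs(
        "Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml",
        "Current : 0644 root root /root/myapp/config/myappconfig.xml",
    ))
```

For the sample pair, only the owner differs (oracle in the baseline, root now), which is the kind of change a patch run as the wrong user can leave behind.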
  • 65. Track attribute changes on important files
      Automatically proceeds to run compliance checks after file attribute checks
      • Only run attribute checks by using -fileattronly
      File attribute changes are shown in the HTML report output
  • 67. ORAchk | EXAchk email notification
      Critical checks run automatically every two hours, and full checks once a day at 2am
      • You only need to configure your email for notification:
        tfactl <orachk|exachk> -set "NOTIFICATION_EMAIL=SOME.BODY@COMPANY.COM"
  • 68. Critical event notification
      TFA can send email notification when faults are detected
      • Notification for all problems:
        tfactl set notificationAddress=some.body@example.com
      • Notification for all problems on databases owned by the oracle user:
        tfactl set notificationAddress=oracle:another.person@example.com
      • Optionally configure an SMTP server:
        tfactl set smtp
      • Confirm email notification works:
        tfactl sendmail <email_address>
  • 69. Copyright © 2019 Oracle and/or its affiliates. Critical event notification
      Event: ORA-29770
      Event time: Fri Sep 13 07:13:09 PDT 2019
      File containing event: /u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log
      Logs will be collected at: /opt/oracle.ahf/data/repository/auto_srdc_ORA-29770_2019_07_18:09_myserver1.zip
  • 70. Critical event notification
      Symptom: LCK0 (ospid:NNNN) has not called a wait for <n_secs> secs.
      Call stack: ksedsts <- kjzdssdmp <- kjzduptcctx <- kjzdicrshnfy <- ksuitm <- kjgcr_KillInstance <- kjgcr_Main <- kjfmlmhb_Main <- ksbrdp
  • 71. Critical event notification
      Action: Apply the one-off patch 18795105 to resolve this issue. For further information see Doc 1998445.1 and Doc 18795105.8
      Cause: Instance crash due to ORA-29770 LCK0 hung
  • 72. Critical event notification
      Evidence:
      Orcl_lmhb_23242.trc (15): ksedsts()+465<- kjzdssdmp()+267<- kjzduptcctx()+232<- kjzdicrshnfy()+63<- ksuitm()+5570<- kjgcr_KillInstance()+125
      alert_orcl.log(140): ORA-29770: global enqueue process LMS0 (OSID 11912) is hung for more than 70 seconds
  • 73. 11: Self Analysis in MOS using TFA uploads
  • 76. One command SRDC
      tfactl diagcollect -srdc <srdc_type>
      • Scans system to identify recent events
      • Once the relevant event is chosen, proceeds with diagnostic collection
      tfactl diagcollect -srdc ORA-00600
      Enter the time of the ORA-00600 [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] :
      Enter the Database Name [<RETURN>=ALL] :
      1. Sep/07/2019 05:29:58 : [orcl2] ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], []
      2. Aug/16/2019 06:55:08 : [orcl2] ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], []
      Please choose the event : 1-2 [1]
      Selected value is : 1 ( Sep/07/2019 05:29:58 )
  • 77. One command SRDC
      All required files are identified
      • Trimmed where applicable
      • Packaged in a zip ready to provide to support
      ...
      2019/09/07 06:14:24 EST : Getting List of Files to Collect
      2019/09/07 06:14:27 EST : Trimming file : myserver1/rdbms/orcl2/orcl2/trace/orcl2_lmhb_3542.trc with original file size : 163MB
      ...
      2019/09/07 06:14:58 EST : Total time taken : 39s
      2019/09/07 06:14:58 EST : Completed collection of zip files.
      ...
      /opt/oracle.ahf/data/repository/srdc_ora600_collection_Tue_Sep_7_06_14_17_EST_2017_node_local/myserver1.tfa_srdc_ora600_Tue_Sep_7_06_14_17_EST_2019.zip
  • 80. Automatic Workload Repository (AWR)
      Collects, processes, and maintains performance statistics for problem detection and self-tuning purposes
      Gathered data is stored both in memory and in the database, and is displayed in both reports and views
      The statistics collected and processed by AWR include:
      • Object statistics that determine both access and usage statistics of database segments
      • Time model statistics based on time usage for activities, displayed in the V$SYS_TIME_MODEL and V$SESS_TIME_MODEL views
      • Some of the system and session statistics collected in the V$SYSSTAT and V$SESSTAT views
      • SQL statements that are producing the highest load on the system, based on criteria such as elapsed time and CPU time
      • Active Session History (ASH) statistics, representing the history of recent session activity
  • 81. Generating an AWR Report
      1. Create an AWR snapshot:
         SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT()
      2. Run your workload
      3. Create an AWR snapshot:
         SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT()
      4. Generate a report for the time period:
         SQL> @$ORACLE_HOME/rdbms/admin/awrrpt.sql
  • 82. AWR Scripts
      Generating an AWR Compare Periods report for the local database:
        SQL> @$ORACLE_HOME/rdbms/admin/awrddrpt.sql
      Generating an AWR Compare Periods report for a specific database:
        SQL> @$ORACLE_HOME/rdbms/admin/awrddrpi.sql
      To generate an AWR Compare Periods report for Oracle RAC on the local database instance:
        SQL> @$ORACLE_HOME/rdbms/admin/awrgdrpt.sql
      To generate an AWR Compare Periods report for Oracle RAC on a specific database:
        SQL> @$ORACLE_HOME/rdbms/admin/awrgdrpi.sql
      To generate a Global AWR report for RAC:
        SQL> @$ORACLE_HOME/rdbms/admin/awrgrpt.sql
      To generate a SQL Statement report:
        SQL> @$ORACLE_HOME/rdbms/admin/awrsqrpt.sql
      Information on the AWR repository:
        SQL> @$ORACLE_HOME/rdbms/admin/awrinfo.sql
  • 84. Sanitize or mask sensitive information
      Sensitive information can be hidden from diagnostics
      Machine learning algorithms determine sensitive data like:
      • Host names
      • IP addresses
      • MAC addresses
      • Oracle Database names
      • Tablespace names
      • Service names
      • Ports
      • Operating system user names
  • 85. Sanitize or mask sensitive information
      Add -sanitize or -mask to any command
      • -sanitize replaces a sensitive value with random characters
        - myhost123 >>>> JnsF3km9
      • -mask replaces a sensitive value with a series of 'X'
        - myhost123 >>>> XXXXXXXX
  • 86. tfactl orachk -preupgrade -sanitize
      Sanitized hostname
  • 87. Reverse map the sanitization
      tfactl orachk -rmap qzh024703246tsa1
      TFA using ORAchk : /opt/oracle.ahf/orachk/orachk
      ___________________________________________________________________________
      | Entity Type | Substituted Entity Name | Original Entity Name |
      ___________________________________________________________________________
      | hostname    | qzh024703246tsa1        | myserver1            |
      ___________________________________________________________________________
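The practical difference between the two options is that a sanitized value can be mapped back to the original, while a masked value cannot. The Python sketch below illustrates that idea only. It is not how TFA or ORAchk implement sanitization: the class name, the fixed substitute length of 8 and the random alphabet are all assumptions made for the example.

```python
import secrets
import string
from typing import Dict

_ALPHABET = string.ascii_letters + string.digits


class Sanitizer:
    """Illustrative substitution table with a reverse map (not TFA's implementation)."""

    def __init__(self) -> None:
        self._forward: Dict[str, str] = {}
        self._reverse: Dict[str, str] = {}

    def sanitize(self, value: str, length: int = 8) -> str:
        """Return a stable random substitute for value."""
        if value not in self._forward:
            token = "".join(secrets.choice(_ALPHABET) for _ in range(length))
            while token in self._reverse:          # avoid reusing a substitute
                token = "".join(secrets.choice(_ALPHABET) for _ in range(length))
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def rmap(self, token: str) -> str:
        """Look up the original value for a substitute, like the -rmap lookup above."""
        return self._reverse[token]


def mask(value: str, length: int = 8) -> str:
    """Replace a value with a run of 'X'; nothing is stored, so it cannot be reversed."""
    return "X" * length


if __name__ == "__main__":
    table = Sanitizer()
    substitute = table.sanitize("myhost123")
    print(substitute, "->", table.rmap(substitute))
    print(mask("myhost123"))
```

Because the same input always yields the same substitute within one table, sanitized output stays internally consistent, so a host name appearing in several log lines can still be correlated.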
  • 88. 14: Find if anything has changed
  • 89. Has anything changed recently?
      tfactl changes
      Output from host : myserver69
      ------------------------------
      [Sep/17/2019 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
      [Sep/17/2019 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259
      [Sep/17/2019 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1
      [Sep/17/2019 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 => 158
      [Sep/17/2019 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9-40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab
      [Sep/17/2019 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880
      [Sep/17/2019 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 => 1562320 768555
      [Sep/17/2019 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18
      [Sep/17/2019 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31-459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e
      ...
  • 90. Has anything changed recently?
      ...
      Output from host : myserver70
      ------------------------------
      [Sep/17/2019 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
      [Sep/17/2019 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259
      [Sep/17/2019 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1
      [Sep/17/2019 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 => 158
      [Sep/17/2019 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9-40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab
      [Sep/17/2019 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880
      [Sep/17/2019 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 => 1562320 768555
      [Sep/17/2019 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18
      [Sep/17/2019 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31-459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e
      [Sep/17/2019 16:56:15.398]: Parameter: fs.aio-nr: Value: 97024 => 98560
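Much of this output is counters that change constantly, such as kernel.random.uuid, so filtering helps the meaningful changes stand out. The Python sketch below parses the change lines and drops a small ignore list. The line format is taken from the sample output, with each record on a single line; the ignore list and the names `parse_change` and `interesting_changes` are assumptions chosen for illustration.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Optional

CHANGE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]: Parameter: (?P<name>\S+): Value: (?P<old>.*?) => (?P<new>.*)$"
)

# Hypothetical ignore list: parameters that change on every sample.
NOISY = {"kernel.random.uuid", "kernel.random.entropy_avail"}


def parse_change(line: str) -> Optional[Dict[str, object]]:
    """Parse one change line from the sample output; return None for other lines."""
    match = CHANGE_RE.match(line.strip())
    if not match:
        return None
    return {
        "time": datetime.strptime(match.group("ts"), "%b/%d/%Y %H:%M:%S.%f"),
        "name": match.group("name"),
        "old": match.group("old"),
        "new": match.group("new"),
    }


def interesting_changes(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Keep parsed changes whose parameter is not in the NOISY set."""
    changes = []
    for line in lines:
        change = parse_change(line)
        if change and change["name"] not in NOISY:
            changes.append(change)
    return changes
```

Host header lines and the "..." markers do not match the pattern and are skipped, so the whole command output can be passed in line by line.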
  • 91. 15: Detect and Collect using SRDCs
  • 92. Some problem areas covered in SRDCs
      Around 100 problem types covered; full list in documentation
      Database areas: Errors / Corruption, Performance, Install / patching / upgrade, RAC / Grid Infrastructure, Import / Export, RMAN, Transparent Data Encryption, Storage / partitioning, Undo / auditing, Listener / naming services, Spatial / XDB
      Other Server Technology: Enterprise Manager, Data Guard, GoldenGate, Exalogic
      tfactl diagcollect -srdc <srdc_type> [-sr <sr_number>]
  • 93. Manual collection vs TFA SRDC for database performance
      Manual method:
      1. Generate ADDM reviewing Document 1680075.1 (multiple steps)
      2. Identify "good" and "problem" periods and gather AWR reviewing Document 1903158.1 (multiple steps)
      3. Generate AWR compare report (awrddrpt.sql) using "good" and "problem" periods
      4. Generate ASH report for "good" and "problem" periods reviewing Document 1903145.1 (multiple steps)
      5. Collect OSWatcher data reviewing Document 301137.1 (multiple steps)
      6. Collect Hang Analyze output at Level 4
      7. Generate SQL Healthcheck for problem SQL id using Document 1366133.1 (multiple steps)
      8. Run support provided sql scripts: Log File sync diagnostic output using Document 1064487.1 (multiple steps)
      9. Check alert.log for any errors during the "problem" period
      10. Find any trace files generated during the "problem" period
      11. Collate and upload all the above files/outputs to SR
      TFA SRDC:
      1. Run tfactl diagcollect -srdc dbperf [-sr <sr_number>]
  • 97. Automatic Database Log Purge
      TFA can automatically purge database logs:
        tfactl set manageLogsAutoPurge=ON
      Purging automatically removes logs older than 30 days
      • Configurable with:
        tfactl set manageLogsAutoPurgePolicyAge=<n><d|h>
      Purging runs every 60 minutes
      • Configurable with:
        tfactl set manageLogsAutoPurgeInterval=<minutes>
  • 98. Manual Database Log Purge
      TFA can manage ADR log and trace files
      tfactl managelogs <options>
        -show usage    # Show disk space usage per diagnostic directory for both GI and database logs
        -show variation -older <n><m|h|d>    # Show disk space growth for the specified period
        -purge -older <n><m|h|d>    # Remove ADR files older than the time specified
        -gi    # Restrict command to only files under the GI_BASE
        -database [all | dbname]    # Restrict command to only files under the database directory
        -dryrun    # Use with -purge to estimate how many files will be affected and how much disk space will be freed by a potential purge command
  • 99. tfactl managelogs -show usage ... .---------------------------------------------------------------------------------. | Grid Infrastructure Usage | +---------------------------------------------------------------------+-----------+ | Location | Size | +---------------------------------------------------------------------+-----------+ | /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/alert | 28.00 KB | | /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/incident | 4.00 KB | | /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/trace | 8.00 KB | ... +---------------------------------------------------------------------+-----------+ | Total | 739.06 MB | '---------------------------------------------------------------------+-----------’ ... Understand Database log disk space usage Use -gi to only show grid infrastructure
  • 100. ... .---------------------------------------------------------------. | Database Homes Usage | +---------------------------------------------------+-----------+ | Location | Size | +---------------------------------------------------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/alert | 1.06 MB | | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/incident | 4.00 KB | | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/trace | 146.19 MB | | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/cdump | 4.00 KB | | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/hm | 4.00 KB | +---------------------------------------------------+-----------+ | Total | 147.26 MB | '---------------------------------------------------+-----------' Understand Database log disk space usage Use -database to only show database
  • 101. Understand Database log disk space usage variations tfactl managelogs -show variation -older 30d Output from host : myserver74 ------------------------------ 2019-09-20 12:30:42: INFO Checking space variation for 30 days .---------------------------------------------------------------------------------------------. | Grid Infrastructure Variation | +---------------------------------------------------------------------+-----------+-----------+ | Directory | Old Size | New Size | +---------------------------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/asm/user_root/host_309243680_96/alert | 22.00 KB | 28.00 KB | +---------------------------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/clients/user_crsusr/host_309243680_96/cdump | 4.00 KB | 4.00 KB | +---------------------------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/tnslsnr/myserver74/listener/alert | 15.06 MB | 244.10 MB | +---------------------------------------------------------------------+-----------+-----------+ ...
  • 102. Understand Database log disk space usage variations ... .---------------------------------------------------------------------------. | Database Homes Variation | +---------------------------------------------------+-----------+-----------+ | Directory | Old Size | New Size | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/hm | 4.00 KB | 4.00 KB | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/trace | 16.63 MB | 146.19 MB | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/cdump | 4.00 KB | 4.00 KB | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/incident | 4.00 KB | 4.00 KB | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/alert | 1.06 MB | 1.06 MB | '------------------------------------------------------------+-------------+-------------'
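To rank directories by growth, the Old Size and New Size columns can be converted to bytes and compared. A minimal Python sketch follows. It assumes table rows of the form `| directory | old size | new size |` as in the variation output, and that KB, MB and GB are binary multiples (1024-based), which the output does not state; `to_bytes`, `parse_row` and `growth_report` are hypothetical names.

```python
from typing import Iterable, List, Optional, Tuple

# Assumption: sizes are 1024-based; the tool output does not say.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(text: str) -> float:
    """Convert a size such as '244.10 MB' to bytes."""
    number, unit = text.split()
    return float(number) * UNITS[unit]


def parse_row(row: str) -> Optional[Tuple[str, float, float]]:
    """Return (directory, old bytes, new bytes), or None for borders and headers."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    if len(cells) != 3:
        return None
    try:
        return cells[0], to_bytes(cells[1]), to_bytes(cells[2])
    except (ValueError, KeyError):
        return None


def growth_report(rows: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (directory, growth in bytes) pairs, largest growth first."""
    parsed = [p for p in (parse_row(r) for r in rows) if p]
    report = [(directory, new - old) for directory, old, new in parsed]
    return sorted(report, key=lambda item: item[1], reverse=True)
```

In the sample output, the listener alert directory and the database trace directory account for nearly all of the growth, which matches where the dry-run purge estimates the largest savings.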
  • 103. Run a database log purge dryrun
    tfactl managelogs -purge -older 30d -dryrun
    Output from host : myserver74
    ------------------------------
    Estimating files older than 30 days
    Estimating purge for diagnostic destination "diag/afdboot/user_root/host_309243680_94" for files ~ 2 files deleted , 22.58 KB freed ]
    Estimating purge for diagnostic destination "diag/afdboot/user_crsusr/host_309243680_94" for files ~ 2 files deleted , 11.72 KB freed ]
    Estimating purge for diagnostic destination "diag/asmtool/user_root/host_309243680_96" for files ~ 2 files deleted , 21.36 KB freed ]
    Estimating purge for diagnostic destination "diag/asmtool/user_crsusr/host_309243680_96" for files ~ 3 files deleted , 23.22 KB freed ]
    Estimating purge for diagnostic destination "diag/tnslsnr/myserver74/listener" for files ~ 23 files deleted , 225.33 MB freed ]
    Estimating purge for diagnostic destination "diag/diagtool/user_root/adrci_309243680_96" for files ~ 73 files deleted , 517.69 KB freed ]
    Estimating purge for diagnostic destination "diag/clients/user_crsusr/host_309243680_96" for files ~ 38 files deleted , 17.15 KB freed ]
    Estimating purge for diagnostic destination "diag/asm/+asm/+ASM" for files ~ 0 files deleted , 0 bytes freed ]
    Estimating purge for diagnostic destination "diag/asm/user_root/host_309243680_96" for files ~ 1 files deleted , 19.52 KB freed ]
    Estimating purge for diagnostic destination "diag/asm/user_crsusr/host_309243680_96" for files ~ 1 files deleted , 20.25 KB freed ]
    Estimating purge for diagnostic destination "diag/crs/myserver74/crs" for files ~ 40 files deleted , 219.39 MB freed ]
    Estimation for Grid Infrastructure [ Files to delete : ~ 185 files | Space to be freed : ~ 445.36 MB ]
    Estimating purge for diagnostic destination "diag/rdbms/cdb674/CDB674" for files ~ 27760 files deleted , 66.57 MB freed ]
    Estimation for Database Home [ Files to delete : ~ 27760 files | Space to be freed : ~ 66.57 MB ]
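Before running the real purge it can help to confirm which destinations account for most of the estimate. The sketch below parses the Grid Infrastructure lines of the dry-run output above and recomputes the totals. It is an illustration only: the regular expression is written against the line layout shown on this slide, binary units (1 KB = 1024 bytes) are an assumption, and the function name is hypothetical.

```python
import re

# Grid Infrastructure estimate lines, copied from the dry-run output above.
DRYRUN = """\
Estimating purge for diagnostic destination "diag/afdboot/user_root/host_309243680_94" for files ~ 2 files deleted , 22.58 KB freed ]
Estimating purge for diagnostic destination "diag/afdboot/user_crsusr/host_309243680_94" for files ~ 2 files deleted , 11.72 KB freed ]
Estimating purge for diagnostic destination "diag/asmtool/user_root/host_309243680_96" for files ~ 2 files deleted , 21.36 KB freed ]
Estimating purge for diagnostic destination "diag/asmtool/user_crsusr/host_309243680_96" for files ~ 3 files deleted , 23.22 KB freed ]
Estimating purge for diagnostic destination "diag/tnslsnr/myserver74/listener" for files ~ 23 files deleted , 225.33 MB freed ]
Estimating purge for diagnostic destination "diag/diagtool/user_root/adrci_309243680_96" for files ~ 73 files deleted , 517.69 KB freed ]
Estimating purge for diagnostic destination "diag/clients/user_crsusr/host_309243680_96" for files ~ 38 files deleted , 17.15 KB freed ]
Estimating purge for diagnostic destination "diag/asm/+asm/+ASM" for files ~ 0 files deleted , 0 bytes freed ]
Estimating purge for diagnostic destination "diag/asm/user_root/host_309243680_96" for files ~ 1 files deleted , 19.52 KB freed ]
Estimating purge for diagnostic destination "diag/asm/user_crsusr/host_309243680_96" for files ~ 1 files deleted , 20.25 KB freed ]
Estimating purge for diagnostic destination "diag/crs/myserver74/crs" for files ~ 40 files deleted , 219.39 MB freed ]
"""

# Assumption: binary units.
UNITS = {"bytes": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

ESTIMATE = re.compile(
    r'destination "([^"]+)" for files ~ (\d+) files deleted , '
    r"([\d.]+) (bytes|KB|MB|GB) freed"
)


def parse_estimates(text):
    """Return a list of (destination, file count, bytes freed) tuples."""
    results = []
    for match in ESTIMATE.finditer(text):
        dest, files, value, unit = match.groups()
        results.append((dest, int(files), float(value) * UNITS[unit]))
    return results


estimates = parse_estimates(DRYRUN)
total_files = sum(files for _, files, _ in estimates)
total_mb = sum(size for _, _, size in estimates) / 1024 ** 2

if __name__ == "__main__":
    print(f"{total_files} files, {total_mb:.2f} MB")
    # Largest contributors first.
    for dest, files, size in sorted(estimates, key=lambda e: e[2], reverse=True)[:3]:
        print(f"{size / 1024 ** 2:8.2f} MB  {files:5d} files  {dest}")
```

The recomputed totals agree with the "Estimation for Grid Infrastructure" summary line, and the listener and CRS destinations account for nearly all of the space.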
  • 104. Run a database log purge
    tfactl managelogs -purge -older 30d
    Output from host : myserver74
    ------------------------------
    Purging files older than 30 days
    Cleaning Grid Infrastructure destinations
    Purging diagnostic destination "diag/afdboot/user_root/host_309243680_94" for files - 0 files deleted , 0 bytes freed
    Purging diagnostic destination "diag/afdboot/user_crsusr/host_309243680_94" for files - 1 files deleted , 10.16 KB freed
    Purging diagnostic destination "diag/asmtool/user_root/host_309243680_96" for files - 1 files deleted , 10.16 KB freed
    Purging diagnostic destination "diag/asmtool/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
    Purging diagnostic destination "diag/tnslsnr/myserver74/listener" for files - 2 files deleted , 29.18 KB freed
    Purging diagnostic destination "diag/diagtool/user_root/adrci_309243680_96" for files - 2 files deleted , 29.18 KB freed
    Purging diagnostic destination "diag/clients/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
    Purging diagnostic destination "diag/asm/+asm/+ASM" for files - 2 files deleted , 29.18 KB freed
    Purging diagnostic destination "diag/asm/user_root/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
    Purging diagnostic destination "diag/asm/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
    Purging diagnostic destination "diag/crs/myserver74/crs" for files - 2 files deleted , 29.18 KB freed
    ...
  • 105. Run a database log purge
    ...
    Grid Infrastructure [ Files deleted : 18 files | Space Freed : 253.75 KB ]
    .-----------------------------------------------------------------------------------------------.
    | File System Variation : /u01/app/crsusr/12.2.0/grid2 |
    +--------+-----------------------------------+----------+----------+---------+----------+-------+
    | State | Name | Size | Used | Free | Capacity | Mount |
    +--------+-----------------------------------+----------+----------+---------+----------+-------+
    | Before | /dev/mapper/vg_rws1270665-lv_root | 51475068 | 46597152 | 2256476 | 96% | / |
    | After | /dev/mapper/vg_rws1270665-lv_root | 51475068 | 46597152 | 2256476 | 96% | / |
    '--------+-----------------------------------+----------+----------+---------+----------+-------'
  • 107. tail files tfactl tail alert Output from host : myserver69 ------------------------------ /scratch/app/11.2.0.4/grid/log/myserver69/alertmyserver69.log 2019-09-25 23:28:22.532: [ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode. 2019-09-25 23:58:22.964: [ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode. ...
  • 108. tail files ... /scratch/app/oradb/diag/rdbms/apxcmupg/apxcmupg_2/trace/alert_apxcmupg_2.log Wed Sep 25 06:00:00 2018 VKRM started with pid=82, OS id=4903 Wed Sep 25 06:00:02 2018 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Wed Sep 25 06:00:37 2018 End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Wed Sep 25 23:00:28 2018 Thread 2 advanced to log sequence 759 (LGWR switch) Current log# 3 seq# 759 mem# 0: +DATA/apxcmupg/onlinelog/group_3.289.917164707 Current log# 3 seq# 759 mem# 1: +FRA/apxcmupg/onlinelog/group_3.289.917164707 ...
  • 109. tail files ... /scratch/app/oradb/diag/rdbms/ogg11204/ogg112041/trace/alert_ogg112041.log Clearing Resource Manager plan via parameter Wed Sep 25 05:59:59 2018 Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter Wed Sep 25 05:59:59 2018 Starting background process VKRM Wed Sep 25 05:59:59 2018 VKRM started with pid=36, OS id=4901 Wed Sep 25 22:00:31 2018 Thread 1 advanced to log sequence 305 (LGWR switch) Current log# 1 seq# 305 mem# 0: +DATA/ogg11204/redo01.log ...
  • 110. tail files ... /scratch/app/oragrid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log <== Sun Sep 22 04:42:22 2018 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 2323] opening OCR file Mon Sep 23 01:05:39 2018 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16591] opening OCR file Mon Sep 23 01:05:41 2018 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16603] opening OCR file Mon Sep 23 01:21:12 2018 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1803] opening OCR file Mon Sep 23 01:21:12 2018 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1816] opening OCR file ...
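When tailing several alert logs at once, it is often useful to pull out the Clusterware message codes and count repeats. The following is a small sketch, not a TFA feature: the regular expression is written against the 11.2 Clusterware alert line layout shown in the first tail slide, it assumes each message arrives as a single line, and the `AlertEntry` type and function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime
from typing import NamedTuple, Optional


class AlertEntry(NamedTuple):
    when: datetime
    process: str
    pid: int
    code: str
    message: str


# Matches lines such as:
# 2019-09-25 23:28:22.532: [ctssd(5630)]CRS-2409:The clock on host ...
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[(\w+)\((\d+)\)\](CRS-\d+):(.*)$"
)


def parse_line(line: str) -> Optional[AlertEntry]:
    """Parse one Clusterware alert line; return None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return AlertEntry(when, match.group(2), int(match.group(3)),
                      match.group(4), match.group(5).strip())


SAMPLE = [
    "2019-09-25 23:28:22.532: [ctssd(5630)]CRS-2409:The clock on host myserver69 "
    "is not synchronous with the mean cluster time. No action has been taken as "
    "the Cluster Time Synchronization Service is running in observer mode.",
    "2019-09-25 23:58:22.964: [ctssd(5630)]CRS-2409:The clock on host myserver69 "
    "is not synchronous with the mean cluster time. No action has been taken as "
    "the Cluster Time Synchronization Service is running in observer mode.",
    "Wed Sep 25 06:00:00 2018",  # database alert log timestamp: not matched
]

entries = [e for e in map(parse_line, SAMPLE) if e is not None]
counts = Counter(e.code for e in entries)

if __name__ == "__main__":
    for code, seen in counts.most_common():
        print(code, seen)
```

Database and ASM alert logs use a different layout (a timestamp line followed by message lines), so they would need their own pattern.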
  • 112. Near real-time Database monitoring • Single instance & RAC • Monitoring current database activities • Database performance • Identifying contention and bottlenecks • Process & SQL Monitoring • Real time wait events • Active Data Guard support • Multitenant Database (CDB) support oratop (Support Tools Bundle)
  • 113. Monitor Database performance tfactl run oratop -database ogg19c
  • 114. Monitor Database performance Section 1 DATABASE: Global database information Section 2 INSTANCE: Database instance activity Section 3 EVENT: AWR-like “Top 5 Timed Events” Section 4 PROCESS | SQL: Processes or SQL mode information More info: My Oracle Support Doc ID 1500864.1
  • 115. 19: Analyze OS Metrics
  • 116. OS Watcher (Support Tools Bundle) Collect & Archive OS Metrics • Executes standard UNIX utilities (e.g. vmstat, iostat, ps, etc.) at regular intervals • Built-in Analyzer functionality to summarize, graph and report upon collected metrics • Output is required for node reboot and performance issues • Simple to install, extremely lightweight • Runs on all platforms (except Windows)
  • 117. Analyse OS Metrics tfactl run oswbb Starting OSW Analyzer V8.1.2 OSWatcher Analyzer Written by Oracle Center of Expertise Copyright (c) 2017 by Oracle Corporation Parsing Data. Please Wait... Scanning file headers for version and platform info... Parsing file rws1270069_iostat_18.11.24.0900.dat ... Parsing file rws1270069_iostat_18.11.24.1000.dat ... ...
  • 118. Analyse OS Metrics ... Enter 1 to Display CPU Process Queue Graphs Enter 2 to Display CPU Utilization Graphs Enter 3 to Display CPU Other Graphs Enter 4 to Display Memory Graphs Enter 5 to Display Disk IO Graphs Enter GC to Generate All CPU Gif Files Enter GM to Generate All Memory Gif Files Enter GD to Generate All Disk Gif Files Enter GN to Generate All Network Gif Files Enter L to Specify Alternate Location of Gif Directory Enter Z to Zoom Graph Time Scale (Does not change analysis dataset) ...
  • 119. Analyse OS Metrics ... Enter B to Returns to Baseline Graph Time Scale (Does not change analysis dataset) Enter R to Remove Currently Displayed Graphs Enter X to Export Parsed Data to Flat File Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale) Enter A to Analyze Data Enter D to Generate DashBoard Enter Q to Quit Program Please Select an Option:1
  • 123. Cluster Health Monitor (CHM) Generates view of Cluster and Database diagnostic metrics • Always on - Enabled by default • Provides Detailed OS Resource Metrics • Assists Node eviction analysis • Locally logs all process data • User can define pinned processes • Listens to CSS and GIPC events • Categorizes processes by type • Supports plug-in collectors (ex. traceroute, netstat, ping, etc.) • New CSV output for ease of analysis [Diagram labels: four osysmond processes, each with OS Data; ologgerd (master); GIMR, the 12c Grid Infrastructure Management Repository]
  • 124. Cluster Health Monitor (CHM) Oclumon CLI or full integration with EM Cloud Control
  • 125. Cluster Health Advisor (CHA)* • Always on - Enabled by default • Detects node and database performance problems • Provides early-warning alerts and corrective action • Supports on-site calibration to improve sensitivity • Integrated into EMCC Incident Manager and notifications • Standalone Interactive GUI Tool [Diagram labels: OS Data, DB Data, CHM, ochad, GIMR, Node Health Prognostics Engine, Database Health Prognostics Engine] * Requires, and is included with, a RAC or RAC One Node (R1N) license
  • 126. Calibrating CHA to your RAC deployment
    Choosing a Data Set for Calibration – Defining “normal”
    chactl query calibration -cluster -timeranges 'start=2016-10-28 07:00:00,end=2016-10-28 13:00:00'
    Cluster name : mycluster
    Start time : 2019-09-28 07:00:00
    End time : 2019-09-28 13:00:00
    Total Samples : 11524
    Percentage of filtered data : 100%
    1) Disk read (ASM) (Mbyte/sec)
    MEAN MEDIAN STDDEV MIN MAX
    0.11 0.00 2.62 0.00 114.66
    <25 <50 <75 <100 >=100
    99.87% 0.08% 0.00% 0.02% 0.03%
    ...
  • 127. Calibrating CHA to your RAC deployment
    Choosing a Data Set for Calibration – Defining “normal”
    ...
    2) Disk write (ASM) (Mbyte/sec)
    MEAN MEDIAN STDDEV MIN MAX
    0.01 0.00 0.15 0.00 6.77
    <50 <100 <150 <200 >=200
    100.00% 0.00% 0.00% 0.00% 0.00%
    ...
  • 128. Calibrating CHA to your RAC deployment
    Choosing a Data Set for Calibration – Defining “normal”
    ...
    3) Disk throughput (ASM) (IO/sec)
    MEAN MEDIAN STDDEV MIN MAX
    2.20 0.00 31.17 0.00 1100.00
    <5000 <10000 <15000 <20000 >=20000
    100.00% 0.00% 0.00% 0.00% 0.00%
    4) CPU utilization (total) (%)
    MEAN MEDIAN STDDEV MIN MAX
    9.62 9.30 7.95 1.80 77.90
    <20 <40 <60 <80 >=80
    92.67% 6.17% 1.11% 0.05% 0.00%
    ...
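Each metric in the calibration report is summarized by five statistics and a five-bucket distribution. The sketch below reproduces that style of summary for an arbitrary list of samples, to make it easier to judge whether a time range looks "normal". It is not how `chactl` computes its report: the sample values are invented for illustration, the function name is hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def summarize(samples, edges):
    """Summarize samples as in the calibration report.

    edges such as [20, 40, 60, 80] produce the buckets
    <20, <40, <60, <80 and >=80 (each bucket excludes lower ones).
    """
    stats = {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        # Assumption: population standard deviation.
        "stddev": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
    }
    counts = [0] * (len(edges) + 1)
    for value in samples:
        # Index of the first bucket whose upper edge is above the value.
        index = sum(1 for edge in edges if value >= edge)
        counts[index] += 1
    percentages = [100.0 * c / len(samples) for c in counts]
    return stats, percentages


# Hypothetical CPU utilization samples (%), not taken from the slides.
CPU_SAMPLES = [5.0, 9.3, 12.1, 18.4, 22.0, 41.5, 9.0, 7.7, 63.2, 10.8]
CPU_EDGES = [20, 40, 60, 80]

stats, percentages = summarize(CPU_SAMPLES, CPU_EDGES)

if __name__ == "__main__":
    print("  ".join(f"{name.upper()}={value:.2f}" for name, value in stats.items()))
    labels = [f"<{e}" for e in CPU_EDGES] + [f">={CPU_EDGES[-1]}"]
    print("  ".join(f"{label}: {pct:.2f}%" for label, pct in zip(labels, percentages)))
```

A range where nearly all samples fall in the lowest bucket, as in the disk metrics above, describes a lightly loaded period; calibrating on it would make the model sensitive to ordinary daytime load.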
  • 129. Calibrating CHA to your RAC deployment
    Create and store a new model
    chactl calibrate cluster -model daytime -timeranges 'start=2018-10-28 07:00:00,end=2018-10-28 13:00:00'
    Begin using the new model
    chactl monitor cluster -model daytime
    Confirm the new model is working
    chactl status -verbose
    monitoring nodes svr01, svr02 using model daytime
    monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
  • 130. Command line operations
    Enable CHA monitoring on RAC database with optional model
    chactl monitor database -db oltpacdb [-model model_name]
    Check CHA monitoring status with optional verbose output
    chactl status -verbose
    monitoring nodes svr01, svr02 using model DEFAULT_CLUSTER
    monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
  • 131. Command line operations
    Check for Health Issues and Corrective Actions with CHACTL QUERY DIAGNOSIS
    chactl query diagnosis -db oltpacdb -start "2016-10-28 01:52:50" -end "2016-10-28 03:19:15"
    2019-09-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
    2019-09-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
    2019-09-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
    2019-09-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
    Problem: DB Control File IO Performance
    Description: CHA has detected that reads or writes to the control files are slower than expected.
    Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were slow because of an increase in disk IO. The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance.
    Action: Separate the control files from other database files and move them to faster disks or Solid State Devices.
    Problem: DB Log File Switch
    Description: CHA detected that database sessions are waiting longer than expected for log switch completions.
    Cause: The Cluster Health Advisor (CHA) detected high contention during log switches because the redo log files were small and the redo logs switched frequently.
    Action: Increase the size of the redo logs.
  • 132. HTML diagnostic health output available (-html <file_name>) Command line operations
  • 133. Diagnose cluster health
    chactl query diagnosis -db oltpacdb -start "2019-09-26 02:52:50.0" -end "2019-09-26 03:19:15.0"
    2019-09-26 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
    2019-09-26 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
    2019-09-26 02:52:15.0 Database oltpacdb DB CPU Utilization (oltpacdb_2) [detected]
    2019-09-26 02:52:50.0 Database oltpacdb DB CPU Utilization (oltpacdb_1) [detected]
    2019-09-26 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
    2019-09-26 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
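The diagnosis output lists one line per instance per problem, so grouping by problem shows at a glance which issues affect the whole database and when each was first seen. The sketch below does that for the six lines above. It is an illustration rather than a CHA feature: the regular expression is written against the line layout shown on this slide, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Event lines copied from the diagnosis output above.
OUTPUT = """\
2019-09-26 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2019-09-26 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2019-09-26 02:52:15.0 Database oltpacdb DB CPU Utilization (oltpacdb_2) [detected]
2019-09-26 02:52:50.0 Database oltpacdb DB CPU Utilization (oltpacdb_1) [detected]
2019-09-26 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2019-09-26 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
"""

# timestamp, "Database <name>", problem text, "(instance)", "[state]"
EVENT = re.compile(r"^(\S+ \S+)\s+Database (\S+)\s+(.+) \((\S+)\) \[(\w+)\]$")


def group_by_problem(text):
    """Map each problem name to a list of (timestamp, instance) pairs."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = EVENT.match(line.strip())
        if match is None:
            continue  # skip anything that is not an event line
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        groups[match.group(3)].append((when, match.group(4)))
    return dict(groups)


groups = group_by_problem(OUTPUT)

if __name__ == "__main__":
    for problem, events in groups.items():
        first_seen = min(when for when, _ in events)
        instances = ", ".join(sorted(inst for _, inst in events))
        print(f"{first_seen}  {problem}: {instances}")
```

Here every problem appears on both instances, which points toward shared causes such as storage or redo log sizing rather than a single misbehaving node.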
  • 134. Thank You. Any Questions? Sandesh Rao Vice President, Development @sandeshr https://www.linkedin.com/in/raosandesh/ https://www.slideshare.net/SandeshRao4