15 Troubleshooting tips and tricks for the Oracle Database
Sandesh Rao
VP AIOps for the Autonomous Database
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4
Systemstate dumps: what they are and how to read them
A systemstate is made up of the process state of every process found in the instance at the time the systemstate was requested.
Each process state is made up of State Objects (SOs), which hold details of the state of the objects currently owned by that PROCESS.
To navigate a systemstate:
1. Find the process that most sessions are waiting for
2. Recursively follow what each of those processes is itself waiting for
3. When you reach a process that is on the CPU, get an error stack to understand what it is doing and why the chain behind it is blocked
Systemstate Dumps
Enqueues
Enqueue waits are waits for locks held on a particular object. In the example below, the process is waiting for a TX enqueue, as indicated by the "waiting for 'enq: TX - row lock contention'" message:
PROCESS 41
...
waiting for 'enq: TX - row lock contention' blocking sess=0x39b3a5c90 seq=152 wait_time=0 seconds since wait started=796
name|mode=54580006, usn
* 54580006 is hexadecimal-encoded ASCII and can be split up as follows to reveal the meaning:
* 0x54 is ASCII 'T' and 0x58 is ASCII 'X' => (TX), followed by mode 0006 (X) ...
To find more details on the enqueue, search DOWN within the process for the string 'req:'. Here it leads to an enqueue state object with "req: X", which is the request for the TX lock behind the 'enq: TX - row lock contention' wait. The request is for an eXclusive TX lock.
The same state object gives the enqueue name as a string, (TX-00020009-0001FA04), which can be used to search for the HOLDER. The holder of the resource is the entry carrying "mode:" followed by the mode in which the lock is held, in this case eXclusive.
In the dump below, the holder has the enqueue in mode X, which is incompatible with the waiter's req: X request:
SO: 39ad80d60, type: 5, owner: 393cb85e0, flag: INIT/-/-/0x00
(enqueue) TX-00020009-0001FA04 DID: 0001-0029-00000090
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
res: 39aef20c8, req: X, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0
(enqueue) TX-00020009-0001FA04 DID: 0001-002E-00000014
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
res: 39aef20c8, mode: X, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78
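The decoding and the req:/mode: search described above can be sketched in a few lines of Python. This is only an illustration that works on the two sample lines from this slide; the function names and the regular expression are assumptions made for the example, not part of any Oracle tool, and real dumps may lay the fields out differently.

```python
import re

def decode_name_mode(p1_hex: str) -> tuple[str, int]:
    """Split an enqueue wait's name|mode value into (lock type, mode)."""
    value = int(p1_hex, 16)
    # The two high-order bytes are the ASCII codes of the lock type.
    lock_type = chr((value >> 24) & 0xFF) + chr((value >> 16) & 0xFF)
    # The two low-order bytes are the requested mode.
    mode = value & 0xFFFF
    return lock_type, mode

# Hypothetical pattern for the "res: ..., req: X, ..." / "res: ..., mode: X, ..." lines.
ENQ_LINE = re.compile(r"res: (\w+), (req|mode): (\w+),.*?sess: (\w+)")

def classify_enqueue_lines(dump: str) -> list[dict]:
    """Tag each enqueue line as a waiter (req:) or a holder (mode:)."""
    records = []
    for res, kind, lock_mode, sess in ENQ_LINE.findall(dump):
        records.append({
            "resource": res,
            "role": "waiter" if kind == "req" else "holder",
            "mode": lock_mode,
            "session": sess,
        })
    return records

sample = """\
res: 39aef20c8, req: X, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0
res: 39aef20c8, mode: X, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78
"""

print(decode_name_mode("54580006"))
for rec in classify_enqueue_lines(sample):
    print(rec["role"], rec["session"], rec["mode"])
```

In this sample the holder's session address (39b3a5c90) is the same value the wait line reports as "blocking sess=0x39b3a5c90".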
Rowcache locks
Row cache waits are waits against the Row Cache (or Dictionary Cache). Waiting processes show "waiting for 'row cache lock'". In the dump below:
• mode=0 shows the lock is not currently held
• request=3 shows we are requesting the lock in Shared mode (mode 3)
• object=7000000eedc13a0 shows the object we are requesting the lock on
• request=S shows the requested lock is Shared (S)
• cid=7(dc_users) shows the cache type is dc_users, with a cache ID of 7
• mode=X shows the lock is held in eXclusive mode
PROCESS 19:
...
waiting for 'row cache lock' blocking sess=0x0 seq=2174 wait_time=0
cache id=7, mode=0, request=3
--------------------------------------------------------------------------------
SO: 7000000c6de7678, type: 48, owner: 7000000a6c97cf8, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=S
savepoint=2148
row cache parent object: address=7000000eedc13a0 cid=7(dc_users) hash=2a057ebe typ=9 transaction=7000000c42297a0
flags=00000002
own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6deed98] mode=X
status=VALID/-/-/-/-/-/-/-/-
request=N release=TRUE flags=0
This process is waiting for 'row cache lock'. The waiter is waiting for "object=7000000eedc13a0" and is requesting a Share mode lock ("request=S"). To find the HOLDER, search for the same object address and look for an entry that carries a mode= string rather than request=:
SO: 7000000c6de84e8, type: 48, owner: 7000000c42297a0, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=7000000a6702710 object=7000000eedc13a0, mode=X
savepoint=109
row cache parent object: address=7000000eedc13a0 cid=7(dc_users)
hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6df1b08] mode=X
status=VALID/-/-/-/-/-/-/-/-
request=N release=TRUE flags=0
instance lock id=QH 00000440 00000000
set=0, complete=FALSE
set=1, complete=FALSE
set=2, complete=FALSE
data=
In this case the mode of the holder is eXclusive (i.e. object=7000000eedc13a0, mode=X). Search back up to the top of the process that contains this entry to find which process is holding the resource.
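The search for a holder can be expressed as a small filter over the "row cache enqueue" lines. The sketch below assumes the line layout shown in the sample dumps above; the function name and pattern are illustrative only.

```python
import re

# Hypothetical pattern matching the "row cache enqueue" lines shown in the dumps.
ROW_CACHE_ENQ = re.compile(
    r"row cache enqueue: count=\d+ session=(\w+) object=(\w+), (request|mode)=(\w+)"
)

def row_cache_holders(dump: str, obj: str) -> list[tuple[str, str]]:
    """Return (session, mode) for enqueues on obj that are held, not requested."""
    return [
        (session, value)
        for session, target, kind, value in ROW_CACHE_ENQ.findall(dump)
        if target == obj and kind == "mode"
    ]

sample = """\
row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=S
row cache enqueue: count=1 session=7000000a6702710 object=7000000eedc13a0, mode=X
"""

print(row_cache_holders(sample, "7000000eedc13a0"))
```

Only the second line qualifies, because the first one is a request (request=S) rather than a held lock.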
Library Cache Pins
Waits for library cache pins are of the form "waiting for 'cursor: pin S wait on X'".
To find more details, search down in the systemstate for the idn value from the wait line (here idn=535d1a6c). In the dump below:
• SID 3094 holds the mutex, shown as (3094, 0)
• The request is for Shared (GET_SHRD) mode
PROCESS 16:
waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=58849 wait_time=0 seconds since wait started=0
idn=535d1a6c, value=c1600000000, where|sleeps=5003f2428
KGX Atomic Operation Log 7000002e5b9d160
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD
Cursor Pin uid 2489 efd 0 whr 5 slp 58733
opr=2 pso=70000028c47def0 flg=0
pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
To find the HOLDER, search for "idn 535d1a6c oper" until you find an entry whose operation is a held mode rather than a GET_XXX request:
• SID 3094 holds the mutex in Exclusive (EXCL) mode
KGX Atomic Operation Log 7000002cd934270
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL
Cursor Pin uid 3094 efd 0 whr 7 slp 0
opr=3 pso=7000002a71c4180 flg=0
pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
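The "search for the idn until the operation is not GET_XXX" rule can be sketched as follows. The pattern is an assumption based on the two Mutex lines shown on these slides, and the helper name is hypothetical.

```python
import re

# Hypothetical pattern for lines such as:
#   Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL
MUTEX_LINE = re.compile(r"Mutex (\w+)\((\d+), (\d+)\) idn (\w+) oper (\w+)")

def mutex_holders(dump: str, idn: str) -> list[tuple[int, str]]:
    """Return (sid, operation) for entries on idn that are held, not requested."""
    holders = []
    for _addr, sid, _second, found_idn, oper in MUTEX_LINE.findall(dump):
        # GET_xxx operations are requests; anything else is treated as held.
        if found_idn == idn and not oper.startswith("GET_"):
            holders.append((int(sid), oper))
    return holders

sample = """\
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL
"""

print(mutex_holders(sample, "535d1a6c"))
```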
Library Cache Lock
To find more details, search down in the systemstate for the handle address in the form handle=address (here handle=70000030de975a8). In the dump below:
• An eXclusive (X) lock is requested
• <USER_NAME>.<OBJECT_NAME> is the object we are trying to lock
PROCESS 35:
waiting for 'library cache lock' blocking sess=0x0 seq=35844 wait_time=0 seconds since wait started=14615
handle address=70000030de975a8, lock address=70000026947e190, 100*mode+namespace=12d
SO: 70000026947e190, type: 53, owner: 700000308d726f0, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=70000026947e190 handle=70000030de975a8 request=X
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=70000026947e210[7000002b333ffe8,7000002b333ffe8] htb=7000002b333ffe8 ssga=7000002b333f2a0
user=700000307a7ca68 session=700000307a7ca68 count=0 flags=[0000] savepoint=0x23e411
LIBRARY OBJECT HANDLE: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0
name=<USER_NAME>.<OBJECT_NAME>
To find the HOLDER, search for 'handle=70000030de975a8 mode=' until you find an entry that is held in a mode other than NULL:
• The lock is held in Shared (S) mode
• name=<USER_NAME>.<OBJECT_NAME> confirms the object name
SO: 700000288b03ae0, type: 53, owner: 7000002cc697468, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=700000288b03ae0 handle=70000030de975a8 mode=S
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=700000288b03b60[7000002a179a1a8,7000002b3800878] htb=7000002b3800878 ssga=7000002b37ffb30
user=70000030fafab00 session=70000030fafab00 count=1 flags=[0000] savepoint=0x417
LIBRARY OBJECT HANDLE: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0
name=<USER_NAME>.<OBJECT_NAME>
Latch free
• number=9d is the latch# in hexadecimal (157 in decimal), as listed in v$latchname
Towards the top of the PROCESS dump you will see the exact latch we are waiting for and even who holds it:
• PROCESS 127 (ospid:23086) holds the latch, as the dump of PROCESS 127 shows
PROCESS 8:
waiting for 'latch free' blocking sess=0x0 seq=4577 wait_time=0
address=99ff60018, number=9d, tries=0
waiting for 99ff60018 Child library cache level=5 child#=3
Location from where latch is held: kglic: child
Context saved from call: 26
state=busy
possible holder pid = 127 ospid=23086
wtr=99ff60018, next waiter 9993858b8
holding 99ff60018 Child library cache level=5 child#=3
Location from where latch is held: kglic: child
Context saved from call: 26
state=busy
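The hexadecimal-to-decimal step for the latch number is easy to get wrong by hand. The sketch below assumes the wait line has the "address=..., number=..., tries=..." layout shown above; the helper name is made up for the example.

```python
import re

# Hypothetical pattern for the parameters of a 'latch free' wait line.
LATCH_WAIT = re.compile(r"address=(\w+), number=(\w+), tries=(\d+)")

def latch_number(wait_line: str) -> int:
    """Return the latch# from a 'latch free' wait line as a decimal number."""
    match = LATCH_WAIT.search(wait_line)
    if match is None:
        raise ValueError("not a latch free parameter line")
    # The dump prints the latch number in hexadecimal.
    return int(match.group(2), 16)

print(latch_number("address=99ff60018, number=9d, tries=0"))
```

The decimal value is the one to look up as latch# in v$latchname.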
Other useful information
To find which object a handle refers to, search for handle=XXXXXXXXXX until you come across the LIBRARY OBJECT HANDLE entry, e.g. handle=c00000006c0f8490:
• name shows the name of the object behind the handle
• namespace=CRSR shows that it is of type CURSOR
LIBRARY OBJECT HANDLE: handle=c00000006c0f8490
name=SELECT USER FROM DUAL
hash=cd1ceca0 timestamp=03-23-2007 09:00:00
namespace=CRSR flags=RON/TIM/PN0/SML/[12010000]
ADDM in a multitenant environment
Starting with Oracle Database 12c, ADDM is enabled by default in the root container of a multitenant container database (CDB)
You can also use ADDM in a pluggable database (PDB)
• In a CDB, ADDM works in the same way as it works in a non-CDB
• ADDM analysis is performed each time an AWR snapshot is taken on a CDB root or a PDB
• ADDM does not work in a PDB by default, because automatic AWR snapshots are disabled in a PDB
To enable ADDM in a PDB:
Set the AWR_PDB_AUTOFLUSH_ENABLED initialization parameter to TRUE in the PDB using the following command:
SQL> ALTER SYSTEM SET AWR_PDB_AUTOFLUSH_ENABLED=TRUE;
Set the AWR snapshot interval to a value greater than 0 in the PDB, as shown in the following example:
SQL> EXEC dbms_workload_repository.modify_snapshot_settings(interval=>60);
Results on a PDB provide only PDB-specific findings and recommendations
Analyze logs and look for errors
Investigate logs and look for errors
tfactl analyze -since 1d
INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...
...
Unique error messages for last ~1 day(s)
Occurrences percent server name error
----------- ------- -------------------- -----
1 100.0% myserver1 Errors in file /u01/oracle/diag/rdbms/orcl2/orcl2/trace/orcl2_ora_12272.trc (incident=10151):
ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl2/orcl2/incident/incdir_10151/orcl2_ora_12272_i10151.trc
...
tfactl analyze -search "ORA-04031" -last 1d
INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...
...
Matching regex: ORA-04031
Case sensitive: false
Match count: 1
[Source: /u01/oracle/diag/rdbms/orcl2/orcl2/trace/alert_orcl2.log, Line: 1941]
Oct 01 12:09:05 2020
Errors in file /u01/oracle/diag/rdbms/orcl2/orcl2/trace/orcl2_ora_6982.trc (incident=7665):
ORA-04031: unable to allocate bytes of shared memory ("","","","")
Incident details in: /u01/app/oracle/diag/rdbms/orcl2/orcl2/incident/incdir_7665/orcl2_ora_6982_i7665.trc
...
Examples
tfactl analyze -since 5h
#Show summary of events from alert logs, system messages in last 5 hours
tfactl analyze -comp os -since 1d
#Show summary of events from system messages in last 1 day
tfactl analyze -search "ORA-" -since 2d
#Search string ORA- in alert and system logs in past 2 days
tfactl analyze -search "/Starting/c" -since 2d
#Search case sensitive string "Starting" in past 2 days
tfactl analyze -comp os -for "Oct/01/2020 11" -search "."
#Show all system log messages at time Oct/01/2020 11
tfactl analyze -comp osw -since 6h
#Show OSWatcher Top summary in last 6 hours
tfactl analyze -comp oswslabinfo -from "Oct/01/2020 05:00:01" -to "Oct/01/2020 06:00:01"
#Show OSWatcher slabinfo summary for specified time period
tfactl analyze -since 1h -type generic
#Analyze all generic messages in last one hour
$ ./tfactl analyze -type generic -since 7d
INFO: analyzing all (Alert and Unix System Logs) logs for the last 10080 minutes...
...
Total message count: 54,807, from 01-Oct-2020 02:41:34 PM PST to 08-Oct-2020 02:41:34
Messages matching last ~7 day(s): 3,139, from 02-Oct-2020 02:46:23 PM PST to 08-Oct-2020 02:41:34
last ~7 day(s) generic count: 3,139, from 06-Oct-2020 02:46:23 PM PST to 08-Oct-2020 02:41:34
last ~7 day(s) unique generic count: 94
Message types for last ~7 day(s)
Occurrences percent server name type
----------- ------- -------------------- -----
3,139 100.0% myhost1 generic
...
Unique generic messages for last ~7 day(s)
Occurrences percent server name generic
----------- ------- -------------------- -----
1,504 47.9% myhost1 : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem...
487 15.5% myhost1 : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem...
336 10.7% myhost1 myhost1 smartd[13812]: Device: /dev/sdv, SMART Failure: FAILURE...
336 10.7% myhost1 myhost1 smartd[13812]: Device: /dev/sdag, SMART Failure: FAILURE ...
103 3.3% myhost1 myhost1 last message repeated 9 times
103 3.3% myhost1 myhost1 kernel: oracle: sending ioctl 2285 to a partition!
...snipping for brevity...
Pattern match search output
tfactl analyze -search "ORA-" -since 7d
...
[Source: /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/alert_RATODA1.log, Line: 9494]
Feb 25 22:00:02 2014
Errors in file /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/RATODA1_j003_10948.trc:
ORA-12012: error on auto execute of job "ORACLE_OCM"."MGMT_CONFIG_JOB_2_1"
ORA-29280: invalid directory path
ORA-06512: at "ORACLE_OCM.MGMT_DB_LL_METRICS", line 2436
ORA-06512: at line 1
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
...
OS Watcher top data
tfactl analyze -comp osw -since 6h
...
statistic: t first highest (time) lowest (time) average non zero 3rd last 2nd last last trend
top.cpu.util.id: % 98.0 99.7 @10:35AM 72.8 @03:11PM 97.3 2,059 95.2 96.8 96.0 -2%
top.cpu.util.st: % 0.1 0.1 @09:14AM 0.0 @09:14AM 0.0 889 0.0 0.0 0.0 -100%
top.cpu.util.us: % 0.1 8.8 @11:31AM 0.0 @09:14AM 0.6 1,966 4.3 0.8 3.4 3300%
top.cpu.util.wa: % 1.7 18.7 @03:11PM 0.1 @10:35AM 1.1 2,059 0.3 0.4 0.4 -76%
top.loadavg.last01min: 1.17 3.12 @09:44AM 0.07 @12:45PM 0.93 1,823 0.31 0.26 0.22 -81%
top.loadavg.last05min: 0.94 2.26 @09:44AM 0.27 @12:45PM 0.93 1,823 0.82 0.79 0.77 -18%
top.loadavg.last15min: 0.79 1.60 @09:46AM 0.44 @01:18PM 0.92 1,823 0.96 0.95 0.94 18%
top.mem.buffers: k 808232 808388 @09:41AM 785608 @02:57PM 796511 2,093 785744 785744 785744 -2%
top.mem.free: k 1130332 1291344 @10:02AM 927576 @09:43AM 1188576 2,093 1244020 1265248 1265188 11%
top.swap.used: k 47556 48088 @03:00PM 47556 @09:14AM 47828 2,097 48088 48088 48088 1%
top.tasks.running: 1 4 @12:04PM 1 @09:14AM 1 1,996 1 2 2 100%
top.tasks.total: 514 527 @02:57PM 509 @09:18AM 514 1,996 518 521 520 1%
top.tasks.zombie: 0 5 @11:04AM 0 @09:14AM 0 62 0 0 0 n/a
top.users: 5 6 @03:00PM 5 @09:14AM 5 1,823 6 6 6 20%
...
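The slide does not say how the trend column is computed. In the sample rows it appears consistent with the relative change from the first to the last value, truncated to a whole percent, with n/a when the first value is zero. The sketch below reproduces that reading; it is an inference from this output, not documented tfactl behaviour, and the function name is made up.

```python
from decimal import Decimal

def trend(first: str, last: str) -> str:
    """Relative change from first to last, truncated to a whole percent."""
    start, end = Decimal(first), Decimal(last)
    if start == 0:
        # A zero starting value has no defined relative change.
        return "n/a"
    change = (end - start) / start * 100
    # int() on a Decimal truncates toward zero, e.g. -2.04 becomes -2.
    return f"{int(change)}%"

# Values taken from the top.cpu.util.id, top.cpu.util.us and top.tasks.zombie rows.
print(trend("98.0", "96.0"))
print(trend("0.1", "3.4"))
print(trend("0", "0"))
```

Decimal is used instead of float so that small values such as 0.1 do not pick up binary rounding noise before truncation.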
OS Watcher slabinfo data
tfactl analyze -comp oswslabinfo -from "Oct/01/2020 05:00:01" -to "Oct/01/2020 06:00:01"
...
statistic: t first highest (time) lowest (time) average non zero 3rd last 2nd last last trend
slabinfo.acfs_ccb_cache.active_objs: 4 38 @05:52AM 0 @05:01AM 10 294 3 1 8 100%
slabinfo.inet_peer_cache.active_objs: 23 39 @05:59AM 23 @05:00AM 23 351 23 23 39 69%
slabinfo.sigqueue.active_objs: 385 768 @05:28AM 285 @05:27AM 554 351 712 621 577 49%
slabinfo.skbuff_fclone_cache.active_objs: 55 133 @05:51AM 11 @05:20AM 69 351 56 77 70 27%
slabinfo.names_cache.active_objs: 126 180 @05:00AM 110 @05:23AM 146 351 171 166 156 23%
slabinfo.sgpool-8.active_objs: 135 228 @05:31AM 59 @05:11AM 152 351 180 165 157 16%
slabinfo.UDP.active_objs: 568 675 @05:28AM 492 @05:17AM 597 351 630 596 626 10%
slabinfo.size-8192.active_objs: 174 209 @05:36AM 160 @05:14AM 181 351 205 187 188 8%
slabinfo.task_delay_info.active_objs: 1477 1856 @05:28AM 1334 @05:57AM 1574 351 1529 1411 1579 6%
slabinfo.pid.active_objs: 1608 1980 @05:29AM 1452 @05:21AM 1678 351 1564 1487 1689 5%
slabinfo.blkdev_requests.active_objs: 720 880 @05:04AM 651 @05:54AM 745 351 707 736 761 5%
slabinfo.size-256.active_objs: 1116 1305 @05:06AM 846 @05:11AM 1091 351 1245 1143 1166 4%
slabinfo.ip_dst_cache.active_objs: 1497 1800 @05:28AM 1279 @05:36AM 1517 351 1594 1466 1560 4%
slabinfo.sock_inode_cache.active_objs: 2168 2329 @05:11AM 2106 @05:56AM 2225 351 2322 2278 2232 2%
slabinfo.size-512.active_objs: 3036 3152 @05:38AM 3007 @05:01AM 3088 351 3136 3112 3075 1%
...
How to connect to a hung database for diagnostics
How do you connect to a database when connections are hanging?
• A sqlplus preliminary connection can still connect to the database, because no session is created
• You will have limited access to the SGA
• This helps in capturing diagnostic information such as a systemstate dump
• Two ways to connect to sqlplus using a preliminary connection:
sqlplus -prelim
sqlplus -prelim / as sysdba
or
SQL> set _prelim on
SQL> connect / as sysdba
Prelim connection established
Analyze a hung database
Oracle Hang Manager
Always on - Enabled by default
Reliably detects database hangs and deadlocks
Autonomously resolves them
Logs all detections and resolutions
New SQL interface to configure sensitivity (Normal/High) and trace file sizes
[Figure: Hang Manager flow. Labels: Session, DIA0, EVALUATE, DETECT, ANALYZE, Hung?, VERIFY, Victim, Policy]
Monitors session snapshots for progress
Evaluates potential hangs over time, based upon Wait Graphs
Analyzes the hang chain of sessions to identify blocker/victim
Database Hang Management - Infrastructure
Discovers that the blocker is located in the ASM instance
Requests that ASM terminate the session or instance, relying on Flex ASM for recovery
Detection and resolution is bi-directional
[Figure labels: Database, ASM]
Full Resolution Dump Trace File and DB Alert Log Audit Reports
Oracle 12c Hang Manager
Dump file …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
Oracle Database 12c Enterprise Edition Release 18/19c.0.0.0 - 64bit Beta
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
Build label: RDBMS_MAIN_LINUX.X64_151013
ORACLE_HOME: …/3775268204/oracle
System name: Linux
Node name: slc05kyr
Release: 2.6.39-400.211.1.el6uek.x86_64
Version: #1 SMP Fri Nov 15 13:39:16 PST 2013
Machine: x86_64
VM name: Xen Version: 3.4 (PVM)
Instance name: hm62
Redo thread mounted by this instance: 2
Oracle process number: 19
Unix process pid: 12656, image: oracle@slc05kyr (DIA0)
*** 2020-10-01T16:47:59.541509+17:00
*** SESSION ID:(96.41299) 2020-10-01T16:47:59.541519+17:00
*** CLIENT ID:() 2020-10-01T16:47:59.541529+17:00
*** SERVICE NAME:(SYS$BACKGROUND) 2020-10-01T16:47:59.541538+17:00
*** MODULE NAME:() 2020-10-01T16:47:59.541547+17:00
*** ACTION NAME:() 2020-10-01T16:47:59.541556+17:00
*** CLIENT DRIVER:() 2020-10-01T16:47:59.541565+17:00
2020-10-01T16:47:59.435039+17:00
Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc
2020-10-01T16:47:59.506775+17:00
DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
due to a GLOBAL, HIGH confidence hang with ID=1.
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1.
In the alert log on the instance local to the session (instance 2 in this case),
we see the following:
2020-10-01T16:47:59.538673+17:00
Errors in file …/diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
2020-10-01T16:48:04.222661+17:00
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
requested by master DIA0 process on instance 1
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
by terminating session sid:40 with serial # 43179 (ospid:13031)
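When many hangs are logged, it can help to pull the Hang Manager actions out of the alert logs and match the request on one instance with the termination on another. The sketch below assumes the two message formats shown above; the patterns and the function name are illustrative, and other releases may word these messages differently.

```python
import re

# Hypothetical patterns for the two DIA0 messages shown in the alert log excerpts.
REQUESTED = re.compile(
    r"DIA0 requesting termination of session sid:(\d+) with serial # (\d+) "
    r"\(ospid:(\d+)\) on instance (\d+)"
)
TERMINATED = re.compile(
    r"DIA0 terminating blocker \(ospid: (\d+) sid: (\d+) ser#: (\d+)\) "
    r"of hang with ID = (\d+)"
)

def hang_events(alert_text: str) -> list[dict]:
    """Extract Hang Manager termination requests and terminations, in log order."""
    events = []
    for line in alert_text.splitlines():
        match = REQUESTED.search(line)
        if match:
            sid, serial, ospid, instance = map(int, match.groups())
            events.append({"action": "requested", "sid": sid, "serial": serial,
                           "ospid": ospid, "instance": instance})
            continue
        match = TERMINATED.search(line)
        if match:
            ospid, sid, serial, hang_id = map(int, match.groups())
            events.append({"action": "terminated", "sid": sid, "serial": serial,
                           "ospid": ospid, "hang_id": hang_id})
    return events

sample = """\
DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
"""

for event in hang_events(sample):
    print(event)
```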
Guided resolution with Oracle Support
Oracle Database ORA-00060 Errors on Single Instance (Non-RAC) Diagnosing Using Deadlock Graphs in ORA-00060 Trace Files (Doc ID 1550091.2)
Troubleshooting Assistant
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1550091.2
Understand and Troubleshoot Startup/Shutdown Issues (Doc ID 1591095.2)
Troubleshooting Assistant
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1591095.2
Oracle Undo Management (ORA-01555, ORA-30036, ORA-01628, ORA-01552, etc.) (Doc ID 1575667.2)
Troubleshooting Assistant
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1575667.2
Handling Block Corruptions in Oracle7 / 8 / 8i / 9i / 10g / 11g (Doc ID 1598103.2)
Troubleshooting Assistant
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1598103.2
SQLHC – Healthcheck for SQL
SQL Health Check
https://support.oracle.com/epmos/faces/DocContentDisplay?id=1366133.1
1. Log in to the database server and set the environment used by the database instance
2. Download the "sqlhc.zip" archive file and extract the contents to a suitable directory/folder
3. Connect to SQL*Plus as SYS, a DBA account, or a user with access to the Data Dictionary views, and execute the "sqlhc.sql" script. It will prompt for two parameters:
i. Oracle Pack License (Tuning, Diagnostics or None) [T|D|N] (required)
ii. A valid SQL_ID for the SQL to be analyzed
If the site has both Tuning and Diagnostics licenses, specify T (the Oracle Tuning Pack includes Oracle Diagnostics)
For example:
# sqlplus / as sysdba
SQL> START sqlhc.sql T djkbyr8vkc64h
Query trace files using SQL
SQL> describe V$DIAG_TRACE_FILE
Name Null? Type
----------------------------------------- -------- ----------------------------
ADR_HOME VARCHAR2(444)
TRACE_FILENAME VARCHAR2(68)
CHANGE_TIME TIMESTAMP(3) WITH TIME ZONE
MODIFY_TIME TIMESTAMP(3) WITH TIME ZONE
CON_ID NUMBER
V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
SQL> describe V$DIAG_TRACE_FILE_CONTENTS
Name Null? Type
----------------------------------------- -------- ----------------------------
ADR_HOME VARCHAR2(444)
TRACE_FILENAME VARCHAR2(68)
RECORD_LEVEL NUMBER
PARENT_LEVEL NUMBER
RECORD_TYPE NUMBER
TIMESTAMP TIMESTAMP(3) WITH TIME ZONE
PAYLOAD VARCHAR2(4000)
SECTION_ID NUMBER
SECTION_NAME VARCHAR2(64)
COMPONENT_NAME VARCHAR2(64)
OPERATION_NAME VARCHAR2(64)
FILE_NAME VARCHAR2(64)
FUNCTION_NAME VARCHAR2(64)
LINE_NUMBER NUMBER
THREAD_ID VARCHAR2(64)
SESSION_ID NUMBER
SERIAL# NUMBER
CON_UID NUMBER
CONTAINER_NAME VARCHAR2(64)
CON_ID NUMBER
SQL> select trace_filename from v$diag_trace_file;
TRACE_FILENAME
--------------------------------------------------------------------
ORCL1_mz00_21108.trc
ORCL1_gcr2_16504.trc
ORCL1_gcr3_12849.trc
ORCL1_gcr1_28159.trc
ORCL1_gcr1_27603.trc
ORCL1_gcr0_29971.trc
ORCL1_mz00_26487.trc
ORCL1_mz00_28329.trc
ORCL1_ora_19005.trc
ORCL1_gcr3_12879.trc
ORCL1_gcr1_11688.trc
SQL> select payload from v$diag_trace_file_contents where trace_filename ='ORCL1_ora_19005.trc';
PAYLOAD
--------------------------------------------------------------------------------
Trace file /u01/app/oracle/diag/rdbms/orcl_unq/ORCL1/trace/ORCL1_ora_19005.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.2.0.0.0
Build label: RDBMS_19.2.0.0.0_LINUX.X64_190121
ORACLE_HOME: /u01/app/oracle/product/19c/dbhome_1
System name: Linux
Node name: myserver65
Release: 4.14.35-1844.1.3.el7uek.x86_64
Version: #2 SMP Wed Jan 2 21:18:29 PST 2019
Machine: x86_64
VM name: Xen Version: 4.1 (HVM)
...
...
PAYLOAD
--------------------------------------------------------------------------------
Instance name: ORCL1
Redo thread mounted by this instance: 1
Oracle process number: 12
Unix process pid: 19005, image: oracle@myserver65 (TNS V1-V3)
*** 2020-10-01T01:22:10.770960+00:00
*** SESSION ID:(106.17196) 2020-10-01T01:22:10.771014+00:00
*** CLIENT ID:() 2020-10-01T01:22:10.771027+00:00
*** SERVICE NAME:(SYS$USERS) 2020-10-01T01:22:10.771039+00:00
...
SQL> describe V$DIAG_SESS_SQL_TRACE_RECORDS
Name Null? Type
----------------------------------------- -------- ----------------------------
ADR_HOME VARCHAR2(444)
TRACE_FILENAME VARCHAR2(68)
RECORD_LEVEL NUMBER
PARENT_LEVEL NUMBER
RECORD_TYPE NUMBER
TIMESTAMP TIMESTAMP(3) WITH TIME ZONE
PAYLOAD VARCHAR2(4000)
SECTION_ID NUMBER
SECTION_NAME VARCHAR2(64)
COMPONENT_NAME VARCHAR2(64)
OPERATION_NAME VARCHAR2(64)
FILE_NAME VARCHAR2(64)
FUNCTION_NAME VARCHAR2(64)
LINE_NUMBER NUMBER
THREAD_ID VARCHAR2(64)
SESSION_ID NUMBER
SERIAL# NUMBER
CON_UID NUMBER
CONTAINER_NAME VARCHAR2(64)
CON_ID NUMBER
V$DIAG_SESS_SQL_TRACE_RECORDS
SQL> SELECT sid,serial# FROM v$session WHERE username = 'SYS';
SID SERIAL#
---------- ----------
33 45888
129 6051
SQL> EXECUTE DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(129,6051,TRUE);
PL/SQL procedure successfully completed.
Enable session tracing
SQL> select unique trace_filename from V$DIAG_SESS_SQL_TRACE_RECORDS;
TRACE_FILENAME
--------------------------------------------------------------------
ORCL1_ora_14151.trc
SQL> select payload from V$DIAG_SESS_SQL_TRACE_RECORDS where trace_filename = 'ORCL1_ora_14151.trc';
PAYLOAD
--------------------------------------------------------------------------------
CLOSE #140506358472544:c=19,e=18,dep=0,type=1,tim=7769230586778
=====================
PARSING IN CURSOR #140506358494608 len=97 dep=1 uid=0 oct=3 lid=0 tim=7769230600
163 hv=791757000 ad='7fa0c290' sqlid='87gaftwrm2h68'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname from obj$
o where o.obj#=:1
END OF STMT
EXEC #140506358494608:c=65,e=65,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=107238262
4,tim=7769230600159
...
...
PAYLOAD
--------------------------------------------------------------------------------
FETCH #140506358494608:c=38,e=37,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=10723826
24,tim=7769230600324
CLOSE #140506358494608:c=5,e=4,dep=1,type=3,tim=7769230600381
EXEC #140506358494608:c=23,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=107238262
4,tim=7769230600500
FETCH #140506358494608:c=11,e=12,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=10723826
24,tim=7769230600547
...
SQL> EXECUTE DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(129,6051,FALSE);
PL/SQL procedure successfully completed.
Keep track of the attributes of important files pre and post patching
Track attribute changes on important files
Start tracking using -fileattr start
Automatically discovers Grid Infrastructure and Database directories and files
• Prevent discovery using -excludediscovery
Further configure the list of monitored directories using -includedir
tfactl <orachk|exachk> -fileattr start -includedir "/root/myapp/config"
...
List of directories(recursive) for checking file attributes:
/u01/app/oradb/product/11.2.0/dbhome_11203
/u01/app/oradb/product/11.2.0/dbhome_11204
/root/myapp/config
orachk has taken snapshot of file attributes for above directories at:
/orahome/oradb/orachk/orachk_mysrv21_20201001_041214
Compare current attributes against first snapshot using -fileattr check
When checking, use the same include/exclude arguments you started with
tfactl <orachk|exachk> -fileattr check -includedir "/root/myapp/config"
...
List of directories(recursive) for checking file attributes:
/u01/app/oradb/product/11.2.0/dbhome_11203
/u01/app/oradb/product/11.2.0/dbhome_11204
/root/myapp/config
Checking file attribute changes...
"/root/myapp/config/myappconfig.xml" is different:
Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml
Current : 0644 root root /root/myapp/config/myappconfig.xml
...
Automatically proceeds to run compliance checks after file attribute checks
• Only run attribute checks by using -fileattronly
File Attribute Changes are shown in HTML report output
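The idea behind the snapshot-and-check workflow can be illustrated with a few lines of standard-library Python. This is a conceptual sketch, not how orachk or exachk is implemented: it records the permission bits, owner id and group id of every file under a directory and reports the files whose attributes differ between two snapshots. It assumes a POSIX system, and the file name is only a placeholder.

```python
import os
import stat
import tempfile

def snapshot(root: str) -> dict:
    """Record (permission bits, uid, gid) for every file below root."""
    attrs = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            attrs[path] = (stat.S_IMODE(info.st_mode), info.st_uid, info.st_gid)
    return attrs

def changed(baseline: dict, current: dict) -> dict:
    """Return {path: (old, new)} for files whose attributes differ."""
    return {path: (baseline[path], current[path])
            for path in baseline
            if path in current and baseline[path] != current[path]}

# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    config = os.path.join(root, "myappconfig.xml")
    with open(config, "w") as handle:
        handle.write("<config/>")
    os.chmod(config, 0o644)
    baseline = snapshot(root)      # comparable to "-fileattr start"
    os.chmod(config, 0o600)        # simulate a change made during patching
    diffs = changed(baseline, snapshot(root))  # comparable to "-fileattr check"
    for path, (old, new) in diffs.items():
        print(f"{path} is different: baseline {old[0]:04o}, current {new[0]:04o}")
```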
Event notification
ORAchk | EXAchk email notification
Critical checks run automatically every two hours, and full checks run once a day at 2am
• You only need to configure your email for notification
tfactl <orachk|exachk> -set "NOTIFICATION_EMAIL=SOME.BODY@COMPANY.COM"
Critical event notification
TFA can send email notification when faults are detected
• Notification for all problems:
tfactl set notificationAddress=some.body@example.com
• Notification for all problems on databases owned by the oracle user:
tfactl set notificationAddress=oracle:another.person@example.com
• Optionally configure an SMTP server:
tfactl set smtp
• Confirm email notification works:
tfactl sendmail <email_address>
Event: ORA-29770
Event time: Thu Oct 01 07:13:09 PDT 2020
File containing event: /u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log
Logs will be collected at: /opt/oracle.ahf/data/repository/auto_srdc_ORA-29770_2020_10_01:09_myserver1.zip
Symptom
LCK0 (ospid:NNNN) has not called a wait for <n_secs> secs.
Call stack: ksedsts <- kjzdssdmp <- kjzduptcctx <- kjzdicrshnfy <- ksuitm <- kjgcr_KillInstance <- kjgcr_Main <- kjfmlmhb_Main <- ksbrdp
Action
Apply the one-off patch 18795105 to resolve this issue
For further information see Doc ID 1998445.1 and Doc ID 18795105.8
Cause
Instance crash due to ORA-29770 LCK0 hung
Evidence
Orcl_lmhb_23242.trc(15): ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5570<-kjgcr_KillInstance()+125
alert_orcl.log(140): ORA-29770: global enqueue process LMS0 (OSID 11912) is hung for more than 70 seconds
Self analysis in MOS using TFA collections
One command SRDC
tfactl diagcollect -srdc <srdc_type>
• Scans system to identify recent events
• Once the relevant event is chosen, proceeds with diagnostic collection
tfactl diagcollect -srdc ORA-00600
Enter the time of the ORA-00600 [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] :
Enter the Database Name [<RETURN>=ALL] :
1. Oct/01/2020 05:29:58 : [orcl2] ORA-00600: internal error code,
arguments: [600], [], [], [], [], [], [], [], [], [], [], []
2. Oct/01/2020 06:55:08 : [orcl2] ORA-00600: internal error code,
arguments: [600], [], [], [], [], [], [], [], [], [], [], []
Please choose the event : 1-2 [1]
Selected value is : 1 (Oct/01/2020 05:29:58 )
All required files are identified
• Trimmed where applicable
• Package in a zip ready to provide to support
One command SRDC
...
2020/10/01 06:14:24 EST : Getting List of Files to Collect
2020/10/01 06:14:27 EST : Trimming file :
myserver1/rdbms/orcl2/orcl2/trace/orcl2_lmhb_3542.trc with original file
size : 163MB
...
2020/10/01 06:14:58 EST : Total time taken : 39s
2020/10/01 06:14:58 EST : Completed collection of zip files.
...
/opt/oracle.ahf/data/repository/srdc_ora600_collection_Thu_Oct_1_06_14_17_EST_2020_node_local/myserver1.tfa_srdc_ora600_Thu_Oct_1_06_14_17_EST_2020.zip
AWR scripts
Collects, processes, and maintains performance statistics for problem detection and self-tuning purposes
Gathered data is stored both in memory and in the database, and is displayed in both reports and views
Automatic Workload Repository (AWR)
The statistics collected and processed by AWR include:
• Object statistics that determine both access and usage statistics of database segments
• Time model statistics based on time usage for activities, displayed in the V$SYS_TIME_MODEL and V$SESS_TIME_MODEL views
• Some of the system and session statistics collected in the V$SYSSTAT and V$SESSTAT views
• SQL statements that are producing the highest load on the system, based on criteria such as elapsed time and CPU time
• Active Session History (ASH) statistics, representing the history of recent session activity
Generating an AWR Report
1. Create an AWR snapshot
SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT()
2. Run your workload
3. Create an AWR snapshot
SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT()
4. Generate report for the time period
SQL> @$ORACLE_HOME/rdbms/admin/awrrpt.sql
AWR Scripts
Generating an AWR Compare Periods Report for the Local Database
SQL> @$ORACLE_HOME/rdbms/admin/awrddrpt.sql
Generating an AWR Compare Periods Report for a Specific Database
SQL> @$ORACLE_HOME/rdbms/admin/awrddrpi.sql
To generate an AWR Compare Periods report for Oracle RAC on the local database instance
SQL> @$ORACLE_HOME/rdbms/admin/awrgdrpt.sql
To generate an AWR Compare Periods report for Oracle RAC on a specific database
SQL> @$ORACLE_HOME/rdbms/admin/awrgdrpi.sql
To generate a Global AWR report for RAC
SQL> @$ORACLE_HOME/rdbms/admin/awrgrpt.sql
To generate a SQL Statement report
SQL> @$ORACLE_HOME/rdbms/admin/awrsqrpt.sql
Information on the AWR Repository
SQL> @$ORACLE_HOME/rdbms/admin/awrinfo.sql
Sanitize sensitive information
Sensitive information can be hidden from diagnostics
Machine learning algorithms determine sensitive data like:
• Host names
• IP addresses
• MAC addresses
• Oracle Database names
• Tablespace names
• Service names
• Ports
• Operating system user names
Sanitize or mask sensitive information
Add -sanitize or -mask to any command
• -sanitize replaces a sensitive value with random characters
• myhost123 >>>> JnsF3km9
• -mask replaces a sensitive value with a series of 'X'
• myhost123 >>>> XXXXXXXX
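The difference between the two substitutions can be illustrated with a small Python sketch. This is not how AHF implements the feature: the class and function names are hypothetical, the sensitive value is passed in rather than detected by machine learning, and the substitute is assumed to keep the length of the original (the slide example shows an 8-character substitute for a 9-character host name, so the real tool evidently does not). The point is only that a random substitute needs a stored mapping to be reversible, while a mask discards the original.

```python
import random
import string


def mask(value: str) -> str:
    """Mask-style substitution: every character becomes 'X'."""
    return "X" * len(value)


class Sanitizer:
    """Sanitize-style substitution: random characters plus a reverse map."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._forward = {}  # original value -> substitute
        self._reverse = {}  # substitute -> original value

    def sanitize(self, value: str) -> str:
        """Return a stable random substitute for `value`."""
        if not value:
            return value
        if value not in self._forward:
            alphabet = string.ascii_letters + string.digits
            while True:
                substitute = "".join(self._rng.choice(alphabet) for _ in value)
                # Retry on the unlikely collision with an existing entry.
                if substitute != value and substitute not in self._reverse:
                    break
            self._forward[value] = substitute
            self._reverse[substitute] = value
        return self._forward[value]

    def reverse_map(self, substitute: str) -> str:
        """Look up the original value for a substitute."""
        return self._reverse[substitute]


if __name__ == "__main__":
    sanitizer = Sanitizer()
    hidden = sanitizer.sanitize("myhost123")
    print(hidden, "->", sanitizer.reverse_map(hidden))
    print(mask("myhost123"))
```

Because the same input always yields the same substitute, a sanitized report stays internally consistent, which is what makes a reverse lookup possible later.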
Sanitize or mask sensitive information
tfactl orachk -preupgrade -sanitize
Reverse map the sanitization
tfactl orachk -rmap qzh024703246tsa1
TFA using ORAchk : /opt/oracle.ahf/orachk/orachk
___________________________________________________________________________
| Entity Type | Substituted Entity Name | Original Entity Name |
___________________________________________________________________________
| hostname | qzh024703246tsa1 | myserver1 |
___________________________________________________________________________
Repair compliance drift
[Screenshot: report excerpt with callouts for the Check ID and the Repair command]
Understand what the repair command does
Understand what the repair command will do with:
tfactl orachk -showrepair 8300E0A2FFE48253E053D298EB0A76CC
TFA using ORAchk : /opt/oracle.ahf/orachk/orachk
Repair Command:
currentUserName=$(whoami)
if [ "$currentUserName" = "root" ]
then
repair_report=$(rpm -e stix-fonts 2>&1)
else
repair_report="$currentUserName does not have priviedges to run $CRS_HOME/bin/crsctl set resource use 1"
fi
echo -e "$repair_report"
Run the repair command
Run the checks again and repair everything that fails
tfactl orachk -repaircheck all
Run the checks again and repair only the specified checks
tfactl orachk -repaircheck <check_id_1>,<check_id_2>
Run the checks again and repair all checks listed in the file
tfactl orachk -repaircheck <file>
Find if anything has changed
tfactl changes
Output from host : myserver69
------------------------------
[Oct/01/2020 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
[Oct/01/2020 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259
[Oct/01/2020 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1
[Oct/01/2020 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 => 158
[Oct/01/2020 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9-40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab
[Oct/01/2020 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880
[Oct/01/2020 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 => 1562320 768555
[Oct/01/2020 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18
[Oct/01/2020 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31-459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e
...
Has anything changed recently?
...
Output from host : myserver70
------------------------------
[Oct/01/2020 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
[Oct/01/2020 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259
[Oct/01/2020 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1
[Oct/01/2020 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 => 158
[Oct/01/2020 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9-40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab
[Oct/01/2020 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880
[Oct/01/2020 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 => 1562320 768555
[Oct/01/2020 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18
[Oct/01/2020 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31-459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e
[Oct/01/2020 16:56:15.398]: Parameter: fs.aio-nr: Value: 97024 => 98560
Has anything changed recently?
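When the change list is long, it can help to group the entries by parameter so that noisy values (such as kernel.random.uuid) are easy to tell apart from meaningful ones. The following Python sketch parses lines shaped like the sample output above. The line layout is inferred from that sample only, it assumes each entry sits on a single line, and the function names are illustrative rather than part of tfactl.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line layout, taken from the sample output above:
# [Mon/DD/YYYY HH:MM:SS.mmm]: Parameter: <name>: Value: <old> => <new>
CHANGE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]: Parameter: (?P<name>\S+): "
    r"Value: (?P<old>.*?) => (?P<new>.*)$"
)


def parse_changes(text):
    """Return one dict per change line; any other line is skipped."""
    changes = []
    for line in text.splitlines():
        match = CHANGE_RE.match(line.strip())
        if match is None:
            continue
        changes.append(
            {
                "time": datetime.strptime(match["ts"], "%b/%d/%Y %H:%M:%S.%f"),
                "parameter": match["name"],
                "old": match["old"],
                "new": match["new"],
            }
        )
    return changes


def history_by_parameter(changes):
    """Group (time, old, new) tuples under each parameter name."""
    history = defaultdict(list)
    for change in changes:
        history[change["parameter"]].append(
            (change["time"], change["old"], change["new"])
        )
    return dict(history)


SAMPLE = """\
Output from host : myserver69
------------------------------
[Oct/01/2020 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
[Oct/01/2020 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259
[Oct/01/2020 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880
"""

if __name__ == "__main__":
    for name, entries in history_by_parameter(parse_changes(SAMPLE)).items():
        print(name, len(entries))
```

The month abbreviation is parsed with `%b`, which assumes an English locale.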
Pre and post upgrade compliance checking
ORAchk/EXAchk provides a single source for all upgrade checks
[Diagram: ORAchk checks, EXAchk checks, Database AutoUpgrade checks and Cluster Verification Utility (CVU) checks are compared, contrasted, combined and consolidated into the resulting ORAchk / EXAchk checks]
To check an environment before upgrading run:
tfactl <orachk|exachk> -preupgrade
To check an environment after upgrade run:
tfactl <orachk|exachk> -postupgrade
Detect and collect using SRDC
Other Server Technology
Enterprise Manager
Data Guard
GoldenGate
Exalogic
Database areas
Errors / Corruption
Performance
Install / patching / upgrade
RAC / Grid Infrastructure
Import / Export
RMAN
Transparent Data Encryption
Storage / partitioning
Undo / auditing
Listener / naming services
Spatial / XDB
Some problem areas covered in SRDCs
Full list in documentation
Around 100 problem types covered
tfactl diagcollect -srdc <srdc_type> [-sr <sr_number>]
Manual collection vs TFA SRDC for database performance
Manual method
1. Generate ADDM reviewing Document 1680075.1 (multiple steps)
2. Identify “good” and “problem” periods and gather AWR reviewing Document 1903158.1 (multiple steps)
3. Generate AWR compare report (awrddrpt.sql) using “good” and “problem” periods
4. Generate ASH report for “good” and “problem” periods reviewing Document 1903145.1 (multiple steps)
5. Collect OSWatcher data reviewing Document 301137.1 (multiple steps)
6. Collect Hang Analyze output at Level 4
7. Generate SQL Healthcheck for problem SQL id using Document 1366133.1 (multiple steps)
8. Run support-provided SQL scripts: Log File sync diagnostic output using Document 1064487.1 (multiple steps)
9. Check alert.log if there are any errors during the “problem” period
10. Find any trace files generated during the “problem” period
11. Collate and upload all the above files/outputs to SR
TFA SRDC
1. Run
tfactl diagcollect -srdc dbperf [-sr <sr_number>]
Manage logs
Automatic Database Log Purge
TFA can automatically purge database logs
tfactl set manageLogsAutoPurge=ON
Purging automatically removes logs older than 30 days
• Configurable with:
tfactl set manageLogsAutoPurgePolicyAge=<n><d|h>
Purging runs every 60 minutes
• Configurable with:
tfactl set manageLogsAutoPurgeInterval=<minutes>
Manual Database Log Purge
TFA can manage ADR log and trace files
tfactl managelogs <options>
-show usage #Show disk space usage per diagnostic directory for both GI and database logs
-show variation -older <n><m|h|d> #Show disk space growth for specified period
-purge -older <n><m|h|d> #Remove ADR files older than the time specified
-gi #Restrict command to only files under the GI_BASE
-database [all | dbname] #Restrict command to only files under the database directory
-dryrun #Use with -purge to estimate how many files will be affected and how much disk space will be freed by a potential purge command
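The idea behind -purge with -older and -dryrun can be sketched in Python as an age-based sweep that either reports or deletes. This is an illustration only and not what tfactl does internally: it selects files by modification time, it assumes that m, h and d in the age specification mean minutes, hours and days, and the function names are hypothetical.

```python
import re
import time
from pathlib import Path

# Assumption: <n><m|h|d> means n minutes, hours or days.
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def parse_age(spec):
    """Convert an age such as '30d' or '12h' into seconds."""
    match = re.fullmatch(r"(\d+)([mhd])", spec)
    if match is None:
        raise ValueError(f"unsupported age specification: {spec!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def purge_older(root, spec, dryrun=True, now=None):
    """Count (and, unless dryrun, delete) files older than `spec`.

    Returns a (file_count, bytes_freed) tuple. With dryrun=True nothing
    is deleted and the tuple is only an estimate.
    """
    reference = time.time() if now is None else now
    cutoff = reference - parse_age(spec)
    # Collect candidates first so deletion does not disturb the walk.
    candidates = [p for p in Path(root).rglob("*") if p.is_file()]
    count = 0
    freed = 0
    for path in candidates:
        info = path.stat()
        if info.st_mtime < cutoff:
            count += 1
            freed += info.st_size
            if not dryrun:
                path.unlink()
    return count, freed


if __name__ == "__main__":
    files, size = purge_older(".", "30d", dryrun=True)
    print(f"~ {files} files, {size} bytes would be freed")
```

Running the estimate first and the real purge second mirrors the order suggested by the -dryrun option.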
tfactl managelogs -show usage
...
.---------------------------------------------------------------------------------.
| Grid Infrastructure Usage |
+---------------------------------------------------------------------+-----------+
| Location | Size |
+---------------------------------------------------------------------+-----------+
| /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/alert | 28.00 KB |
| /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/incident | 4.00 KB |
| /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/trace | 8.00 KB |
...
+---------------------------------------------------------------------+-----------+
| Total | 739.06 MB |
'---------------------------------------------------------------------+-----------'
...
Understand Database log disk space usage
Use -gi to only show grid infrastructure
...
.---------------------------------------------------------------.
| Database Homes Usage |
+---------------------------------------------------+-----------+
| Location | Size |
+---------------------------------------------------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/alert | 1.06 MB |
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/incident | 4.00 KB |
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/trace | 146.19 MB |
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/cdump | 4.00 KB |
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/hm | 4.00 KB |
+---------------------------------------------------+-----------+
| Total | 147.26 MB |
'---------------------------------------------------+-----------'
Understand Database log disk space usage
Use -database to only show database
Understand Database log disk space usage variations
tfactl managelogs -show variation -older 30d
Output from host : myserver74
------------------------------
2020-10-01 12:30:42: INFO Checking space variation for 30 days
.---------------------------------------------------------------------------------------------.
| Grid Infrastructure Variation |
+---------------------------------------------------------------------+-----------+-----------+
| Directory | Old Size | New Size |
+---------------------------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/asm/user_root/host_309243680_96/alert | 22.00 KB | 28.00 KB |
+---------------------------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/clients/user_crsusr/host_309243680_96/cdump | 4.00 KB | 4.00 KB |
+---------------------------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/tnslsnr/myserver74/listener/alert | 15.06 MB | 244.10 MB |
+---------------------------------------------------------------------+-----------+-----------+
...
Understand Database log disk space usage variations
...
.---------------------------------------------------------------------------.
| Database Homes Variation |
+---------------------------------------------------+-----------+-----------+
| Directory | Old Size | New Size |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/hm | 4.00 KB | 4.00 KB |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/trace | 16.63 MB | 146.19 MB |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/cdump | 4.00 KB | 4.00 KB |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/incident | 4.00 KB | 4.00 KB |
+---------------------------------------------------+-----------+-----------+
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/alert | 1.06 MB | 1.06 MB |
'---------------------------------------------------+-----------+-----------'
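To see which directory grew the most, the Old Size and New Size columns can be converted to bytes and subtracted. The Python sketch below does this for rows shaped like the variation tables above. It assumes binary units (1 KB = 1024 bytes), which the output does not state, and the function name is illustrative.

```python
import re

# Assumption: sizes are binary multiples (1 KB = 1024 bytes).
UNIT_BYTES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

# Matches rows such as: | /path/to/dir | 15.06 MB | 244.10 MB |
ROW_RE = re.compile(
    r"^\|\s*(?P<directory>/\S+)\s*"
    r"\|\s*(?P<old>[\d.]+) (?P<old_unit>KB|MB|GB)\s*"
    r"\|\s*(?P<new>[\d.]+) (?P<new_unit>KB|MB|GB)\s*\|$"
)


def growth_by_directory(text):
    """Return (directory, growth_in_bytes) pairs, largest growth first."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue  # borders, headers and anything else
        old = float(match["old"]) * UNIT_BYTES[match["old_unit"]]
        new = float(match["new"]) * UNIT_BYTES[match["new_unit"]]
        rows.append((match["directory"], new - old))
    return sorted(rows, key=lambda row: row[1], reverse=True)


SAMPLE = """\
| /u01/app/crsusr/diag/asm/user_root/host_309243680_96/alert | 22.00 KB | 28.00 KB |
| /u01/app/crsusr/diag/tnslsnr/myserver74/listener/alert | 15.06 MB | 244.10 MB |
| /u01/app/crsusr/diag/rdbms/cdb674/CDB674/trace | 16.63 MB | 146.19 MB |
"""

if __name__ == "__main__":
    for directory, delta in growth_by_directory(SAMPLE):
        print(f"{delta / UNIT_BYTES['MB']:10.2f} MB  {directory}")
```

On the sample rows the listener alert directory comes out on top, which matches the large listener entry in the purge dry run.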
Run a database log purge dryrun
tfactl managelogs -purge -older 30d -dryrun
Output from host : myserver74
------------------------------
Estimating files older than 30 days
Estimating purge for diagnostic destination "diag/afdboot/user_root/host_309243680_94" for files ~ 2 files deleted , 22.58 KB freed ]
Estimating purge for diagnostic destination "diag/afdboot/user_crsusr/host_309243680_94" for files ~ 2 files deleted , 11.72 KB freed ]
Estimating purge for diagnostic destination "diag/asmtool/user_root/host_309243680_96" for files ~ 2 files deleted , 21.36 KB freed ]
Estimating purge for diagnostic destination "diag/asmtool/user_crsusr/host_309243680_96" for files ~ 3 files deleted , 23.22 KB freed ]
Estimating purge for diagnostic destination "diag/tnslsnr/myserver74/listener" for files ~ 23 files deleted , 225.33 MB freed ]
Estimating purge for diagnostic destination "diag/diagtool/user_root/adrci_309243680_96" for files ~ 73 files deleted , 517.69 KB freed ]
Estimating purge for diagnostic destination "diag/clients/user_crsusr/host_309243680_96" for files ~ 38 files deleted , 17.15 KB freed ]
Estimating purge for diagnostic destination "diag/asm/+asm/+ASM" for files ~ 0 files deleted , 0 bytes freed ]
Estimating purge for diagnostic destination "diag/asm/user_root/host_309243680_96" for files ~ 1 files deleted , 19.52 KB freed ]
Estimating purge for diagnostic destination "diag/asm/user_crsusr/host_309243680_96" for files ~ 1 files deleted , 20.25 KB freed ]
Estimating purge for diagnostic destination "diag/crs/myserver74/crs" for files ~ 40 files deleted , 219.39 MB freed ]
Estimation for Grid Infrastructure [ Files to delete : ~ 185 files | Space to be freed : ~ 445.36 MB ]
Estimating purge for diagnostic destination "diag/rdbms/cdb674/CDB674" for files ~ 27760 files deleted , 66.57 MB freed ]
Estimation for Database Home [ Files to delete : ~ 27760 files | Space to be freed : ~ 66.57 MB ]
Run a database log purge
tfactl managelogs -purge -older 30d
Output from host : myserver74
------------------------------
Purging files older than 30 days
Cleaning Grid Infrastructure destinations
Purging diagnostic destination "diag/afdboot/user_root/host_309243680_94" for files - 0 files deleted , 0 bytes freed
Purging diagnostic destination "diag/afdboot/user_crsusr/host_309243680_94" for files - 1 files deleted , 10.16 KB freed
Purging diagnostic destination "diag/asmtool/user_root/host_309243680_96" for files - 1 files deleted , 10.16 KB freed
Purging diagnostic destination "diag/asmtool/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/tnslsnr/myserver74/listener" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/diagtool/user_root/adrci_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/clients/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/asm/+asm/+ASM" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/asm/user_root/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/asm/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed
Purging diagnostic destination "diag/crs/myserver74/crs" for files - 2 files deleted , 29.18 KB freed
...
Run a database log purge
...
Grid Infrastructure [ Files deleted : 18 files | Space Freed : 253.75 KB ]
.-----------------------------------------------------------------------------------------------.
| File System Variation : /u01/app/crsusr/12.2.0/grid2 |
+--------+-----------------------------------+----------+----------+---------+----------+-------+
| State | Name | Size | Used | Free | Capacity | Mount |
+--------+-----------------------------------+----------+----------+---------+----------+-------+
| Before | /dev/mapper/vg_rws1270665-lv_root | 51475068 | 46597152 | 2256476 | 96% | / |
| After | /dev/mapper/vg_rws1270665-lv_root | 51475068 | 46597152 | 2256476 | 96% | / |
'--------+-----------------------------------+----------+----------+---------+----------+-------'
Monitor multiple logs
tail files
tfactl tail alert
Output from host : myserver69
------------------------------
/scratch/app/11.2.0.4/grid/log/myserver69/alertmyserver69.log
2020-10-01 23:28:22.532:
[ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with
the mean cluster time. No action has been taken as the Cluster Time
Synchronization Service is running in observer mode.
2020-10-01 23:58:22.964:
[ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with
the mean cluster time. No action has been taken as the Cluster Time
Synchronization Service is running in observer mode.
...
tail files
...
/scratch/app/oradb/diag/rdbms/apxcmupg/apxcmupg_2/trace/alert_apxcmupg_2.log
Thu Oct 01 06:00:00 2020 VKRM started with pid=82, OS id=4903
Thu Oct 01 06:00:02 2020 Begin automatic SQL Tuning Advisor run for special
tuning task "SYS_AUTO_SQL_TUNING_TASK"
Thu Oct 01 06:00:37 2020 End automatic SQL Tuning Advisor run for special
tuning task "SYS_AUTO_SQL_TUNING_TASK"
Thu Oct 01 23:00:28 2020 Thread 2 advanced to log sequence 759 (LGWR switch)
Current log# 3 seq# 759 mem# 0:
+DATA/apxcmupg/onlinelog/group_3.289.917164707
Current log# 3 seq# 759 mem# 1:
+FRA/apxcmupg/onlinelog/group_3.289.917164707
...
tail files
...
/scratch/app/oradb/diag/rdbms/ogg11204/ogg112041/trace/alert_ogg112041.log
Clearing Resource Manager plan via parameter
Thu Oct 01 05:59:59 2020
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Thu Oct 01 05:59:59 2020
Starting background process VKRM
Thu Oct 01 05:59:59 2020
VKRM started with pid=36, OS id=4901
Thu Oct 01 22:00:31 2020
Thread 1 advanced to log sequence 305 (LGWR switch)
Current log# 1 seq# 305 mem# 0: +DATA/ogg11204/redo01.log
...
tail files
...
/scratch/app/oragrid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log <==
Thu Oct 01 04:42:22 2020
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 2323] opening OCR file
Thu Oct 01 01:05:39 2020
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16591] opening OCR file
Thu Oct 01 01:05:41 2020
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16603] opening OCR file
Thu Oct 01 01:21:12 2020
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1803] opening OCR file
Thu Oct 01 01:21:12 2020
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1816] opening OCR file
...
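The output above is essentially the last few lines of each alert log, printed under the path of the file they came from. A minimal Python sketch of that idea follows. It is not the tfactl implementation: the file list is supplied by the caller rather than discovered, and the function names are illustrative.

```python
from collections import deque


def tail(path, lines=10):
    """Return the last `lines` lines of a text file, without newlines."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


def tail_many(paths, lines=10):
    """Concatenate the tails of several files, each under its own path."""
    report = []
    for path in paths:
        report.append(str(path))
        report.extend(tail(path, lines))
    return "\n".join(report)


if __name__ == "__main__":
    import sys

    print(tail_many(sys.argv[1:], lines=10))
```

Reading through a bounded deque avoids loading a large alert log into memory just to show its end.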
Monitor database performance
Near real-time Database monitoring
• Single instance & RAC
• Monitoring current database activities
• Database performance
• Identifying contention and bottlenecks
• Process & SQL Monitoring
• Real time wait events
• Active Data Guard support
• Multitenant Database (CDB) support
oratop (Support Tools Bundle)
Monitor Database performance
tfactl run oratop -database ogg19c
Section 1 DATABASE: Global database information
Section 2 INSTANCE: Database instance activity
Section 3 EVENT: AWR-like "Top 5 Timed Events"
Section 4 PROCESS | SQL: Processes or SQL mode information
More info: Document 1500864.1
Analyze OS metrics
Collect & Archive OS Metrics
Executes standard UNIX utilities (e.g. vmstat, iostat, ps, etc.) at regular intervals
Built-in Analyzer functionality to summarize, graph and report upon collected metrics
Output is required for node reboot and performance issues
Simple to install, extremely lightweight
Runs on ALL platforms (Except Windows)
OS Watcher (Support Tools Bundle)
Analyse OS Metrics
tfactl run oswbb
Starting OSW Analyzer V8.4.0
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c) 2020 by Oracle Corporation
Parsing Data. Please Wait...
Scanning file headers for version and platform info...
Parsing file rws1270069_iostat_18.11.24.0900.dat ...
Parsing file rws1270069_iostat_18.11.24.1000.dat ...
...
Analyse OS Metrics
...
Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs
Enter GC to Generate All CPU Gif Files
Enter GM to Generate All Memory Gif Files
Enter GD to Generate All Disk Gif Files
Enter GN to Generate All Network Gif Files
Enter L to Specify Alternate Location of Gif Directory
Enter Z to Zoom Graph Time Scale (Does not change analysis dataset)
...
Analyse OS Metrics
...
Enter B to Returns to Baseline Graph Time Scale (Does not change
analysis dataset)
Enter R to Remove Currently Displayed Graphs
Enter X to Export Parsed Data to Flat File
Enter S to Analyze Subset of Data(Changes analysis dataset including
graph time scale)
Enter A to Analyze Data
Enter D to Generate DashBoard
Enter Q to Quit Program
Please Select an Option:1
[Figures: OSWatcher Analyzer graphs for myserver69]
More info: Document 301137.1
Diagnose cluster health
Generates view of Cluster and Database diagnostic metrics
• Always on - Enabled by default
• Provides Detailed OS Resource Metrics
• Assists Node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors (e.g. traceroute, netstat, ping, etc.)
• New CSV output for ease of analysis
Cluster Health Monitor (CHM)
[Diagram: an osysmond daemon on each node collects OS data and sends it to ologgerd (master), which stores it in the GIMR (12c Grid Infrastructure Management Repository)]
Oclumon CLI or full integration with EM Cloud Control
Always on - Enabled by default
Detects node and database performance problems
Provides early-warning alerts and corrective action
Supports on-site calibration to improve sensitivity
Integrated into EMCC Incident Manager and
notifications
Standalone Interactive GUI Tool
Cluster Health Advisor (CHA)*
[Diagram components: OS Data (from CHM), DB Data, ochad with the Node Health Prognostics Engine and the Database Health Prognostics Engine, GIMR]
* Requires, and is included with, a RAC or RAC One Node (R1N) license
Choosing a Data Set for Calibration – Defining “normal”
Calibrating CHA to your RAC deployment
chactl query calibration -cluster -timeranges 'start=2020-10-01 07:00:00,end=2020-10-01 13:00:00'
Cluster name : mycluster
Start time : 2020-10-01 07:00:00
End time : 2020-10-01 13:00:00
Total Samples : 11524
Percentage of filtered data : 100%
1) Disk read (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.11 0.00 2.62 0.00 114.66
<25 <50 <75 <100 >=100
99.87% 0.08% 0.00% 0.02% 0.03%
...
Choosing a Data Set for Calibration – Defining “normal”
Calibrating CHA to your RAC deployment
...
2) Disk write (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.01 0.00 0.15 0.00 6.77
<50 <100 <150 <200 >=200
100.00% 0.00% 0.00% 0.00% 0.00%
...
Choosing a Data Set for Calibration – Defining “normal”
Calibrating CHA to your RAC deployment
...
3) Disk throughput (ASM) (IO/sec)
MEAN MEDIAN STDDEV MIN MAX
2.20 0.00 31.17 0.00 1100.00
<5000 <10000 <15000 <20000 >=20000
100.00% 0.00% 0.00% 0.00% 0.00%
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
9.62 9.30 7.95 1.80 77.90
<20 <40 <60 <80 >=80
92.67% 6.17% 1.11% 0.05% 0.00%
...
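Each metric in the calibration output is summarized by five statistics and a row of bucket percentages. The Python sketch below shows how such a summary could be computed from raw samples, to make the output easier to read. It is not CHA's implementation: whether CHA uses the population or the sample standard deviation is not stated, so the population form is assumed here, and the function name and sample values are made up for illustration.

```python
import statistics


def summarize(samples, edges):
    """Summarize samples as basic statistics plus bucket percentages.

    `edges` are ascending upper bounds: a value lands in the first
    bucket whose edge it is below, or in the final '>= last edge' bucket.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    stats = {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        # Assumption: population standard deviation.
        "stddev": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
    }
    counts = [0] * (len(edges) + 1)
    for value in samples:
        for index, edge in enumerate(edges):
            if value < edge:
                counts[index] += 1
                break
        else:
            counts[-1] += 1
    shares = [100.0 * count / len(samples) for count in counts]
    return stats, shares


if __name__ == "__main__":
    # Hypothetical CPU utilization samples with the <20 ... >=80 buckets.
    stats, shares = summarize([5, 10, 30, 90], edges=[20, 40, 60, 80])
    print(stats)
    print(shares)
```

A time range is a reasonable definition of "normal" when most samples fall in the lower buckets and the maximum is not dominated by a one-off spike.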
Calibrating CHA to your RAC deployment
Create and store a new model
chactl calibrate cluster -model daytime -timeranges 'start=2020-10-01 07:00:00,end=2020-10-01 13:00:00'
Begin using the new model
chactl monitor cluster -model daytime
Confirm the new model is working
chactl status -verbose
monitoring nodes svr01, svr02 using model daytime
monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
Command line operations
Enable CHA monitoring on RAC database with optional model
chactl monitor database -db oltpacdb [-model model_name]
Check CHA monitoring status with optional verbose
chactl status -verbose
monitoring nodes svr01, svr02 using model DEFAULT_CLUSTER
monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
Check for Health Issues and Corrective Actions with CHACTL QUERY DIAGNOSIS
Command line operations
chactl query diagnosis -db oltpacdb -start "2020-10-01 01:42:50" -end "2020-10-01 03:19:15"
2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2020-10-01 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2020-10-01 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
Problem: DB Control File IO Performance
Description: CHA has detected that reads or writes to the control files are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were
slow because of an increase in disk IO.
The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR)
performance.
Action: Separate the control files from other database files and move them to faster disks or Solid
State Devices.
Problem: DB Log File Switch
Description: CHA detected that database sessions are waiting longer than expected
for log switch completions.
Cause: The Cluster Health Advisor (CHA) detected high contention during log switches
because the redo log files were small and the redo logs switched frequently.
Action: Increase the size of the redo logs.
HTML diagnostic health output available (-html <file_name>)
Command line operations
Diagnose cluster health
chactl query diagnosis -db oltpacdb -start "2020-10-01 01:42:50.0" -end "2020-10-01 03:19:15.0"
2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2020-10-01 02:52:15.0 Database oltpacdb DB CPU Utilization (oltpacdb_2) [detected]
2020-10-01 02:52:50.0 Database oltpacdb DB CPU Utilization (oltpacdb_1) [detected]
2020-10-01 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2020-10-01 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
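The detection lines can be grouped by problem to see which instances were affected and when each problem was first detected. The Python sketch below assumes the line layout of the sample output above (timestamp, "Database", database name, problem name, instance in parentheses, state in brackets). The function names are illustrative and not part of chactl.

```python
import re
from datetime import datetime

# Assumed layout, taken from the sample output above:
# <timestamp> Database <db> <problem> (<instance>) [<state>]
DIAG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"Database\s+(?P<db>\S+)\s+(?P<problem>.+?)\s+"
    r"\((?P<instance>[^)]+)\)\s+\[(?P<state>[^\]]+)\]$"
)


def parse_diagnosis(text):
    """Return one dict per detection line; other lines are skipped."""
    events = []
    for line in text.splitlines():
        match = DIAG_RE.match(line.strip())
        if match is None:
            continue
        event = match.groupdict()
        event["time"] = datetime.strptime(event.pop("ts"), "%Y-%m-%d %H:%M:%S.%f")
        events.append(event)
    return events


def first_seen_by_problem(events):
    """Map each problem to (first detection time, sorted instances)."""
    summary = {}
    for event in events:
        first, instances = summary.get(event["problem"], (event["time"], set()))
        instances.add(event["instance"])
        summary[event["problem"]] = (min(first, event["time"]), instances)
    return {name: (first, sorted(inst)) for name, (first, inst) in summary.items()}


SAMPLE = """\
2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2020-10-01 02:52:15.0 Database oltpacdb DB CPU Utilization (oltpacdb_2) [detected]
2020-10-01 02:52:50.0 Database oltpacdb DB CPU Utilization (oltpacdb_1) [detected]
"""

if __name__ == "__main__":
    for problem, (first, instances) in first_seen_by_problem(parse_diagnosis(SAMPLE)).items():
        print(first, problem, instances)
```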
Thank you
Troubleshooting tips and tricks for Oracle Database Oct 2020

Más contenido relacionado

La actualidad más candente

Performance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And WhatPerformance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
udaymoogala
 
Oracle db performance tuning
Oracle db performance tuningOracle db performance tuning
Oracle db performance tuning
Simon Huang
 

La actualidad más candente (20)

AWR and ASH Deep Dive
AWR and ASH Deep DiveAWR and ASH Deep Dive
AWR and ASH Deep Dive
 
Oracle ASM Training
Oracle ASM TrainingOracle ASM Training
Oracle ASM Training
 
Achieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAAAchieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAA
 




Troubleshooting tips and tricks for Oracle Database Oct 2020

  • 1. VP AIOps for the Autonomous Database Sandesh Rao 15 Troubleshooting tips and tricks for the Oracle Database @sandeshr https://www.linkedin.com/in/raosandesh/ https://www.slideshare.net/SandeshRao4
  • 2. Systemstate dumps: what they are and how to read them
  • 3. Systemstate Dumps. A systemstate is made up of the process state of each process in the instance at the time the systemstate was requested. Each process state is made up of SOs (State Objects), which hold details of the state of the objects currently owned by each PROCESS.
    To navigate a systemstate:
    1. Find the process that most sessions are waiting for
    2. Recursively navigate what each process is waiting for
    3. When you find a process on the CPU, get an error stack to understand why it is blocked
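The navigation steps above amount to following a chain of blockers until a process is reached that is not itself waiting. A minimal sketch of that walk, assuming the "who blocks whom" mapping has already been extracted from the dump by hand or by a parser (the function name and the sample mapping are hypothetical, not part of any Oracle tool):

```python
from typing import Dict, List, Optional


def follow_wait_chain(blocked_by: Dict[int, Optional[int]], start: int) -> List[int]:
    """Return the process numbers from `start` to the final blocker.

    `blocked_by` maps a PROCESS number to the PROCESS number it waits on,
    or to None when the process is not waiting (for example, it is on CPU).
    """
    chain = [start]
    seen = {start}
    current = start
    while blocked_by.get(current) is not None:
        current = blocked_by[current]
        chain.append(current)
        if current in seen:
            # The chain loops back on itself: the sessions are deadlocked.
            break
        seen.add(current)
    return chain


# Hypothetical mapping: 41 waits on 19, 19 waits on 127, 127 is not waiting.
waits = {41: 19, 19: 127, 127: None}
print(follow_wait_chain(waits, 41))  # [41, 19, 127]
```

The last element of the returned chain is the process to take an error stack from.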
  • 4. Systemstate Dumps: Enqueues. These are waits for locks held on a particular object. In the example below, the process is waiting for a TX enqueue, as indicated by the "waiting for 'enq: TX - row lock contention'" message:
      PROCESS 41 ...
      waiting for 'enq: TX - row lock contention' blocking sess=0x39b3a5c90 seq=152 wait_time=0 seconds since wait started=796
      name|mode=54580006, usn
    • 54580006 is hexadecimal and can be split up as follows to reveal the meaning:
    • ASCII 0x54 (T) + ASCII 0x58 (X) => (TX) + Mode 0006 (X) ...
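The name|mode split described above can be checked mechanically. A minimal sketch, assuming the value is the 8-hex-digit form shown in the wait line (the function name is hypothetical; the mode numbering follows the LMODE values documented for V$LOCK):

```python
from typing import Tuple

# Lock mode numbers as documented for V$LOCK.LMODE.
LOCK_MODES = {
    0: "none",
    1: "NULL",
    2: "SS (row share)",
    3: "SX (row exclusive)",
    4: "S (share)",
    5: "SSX (share row exclusive)",
    6: "X (exclusive)",
}


def decode_name_mode(name_mode_hex: str) -> Tuple[str, int]:
    """Split a name|mode value such as '54580006' into (lock type, mode)."""
    value = int(name_mode_hex, 16)
    # The top two bytes are the ASCII codes of the two-letter enqueue type.
    lock_type = chr((value >> 24) & 0xFF) + chr((value >> 16) & 0xFF)
    # The low two bytes are the requested mode.
    mode = value & 0xFFFF
    return lock_type, mode


lock_type, mode = decode_name_mode("54580006")
print(lock_type, mode, LOCK_MODES[mode])  # TX 6 X (exclusive)
```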
  • 5. Systemstate Dumps: Enqueues. To find more details on the enqueue, search for the string 'req:' (searching DOWN) within the process. In this case we find a section with a "req: X" request:
      SO: 39ad80d60, type: 5, owner: 393cb85e0, flag: INIT/-/-/0x00
      (enqueue) TX-00020009-0001FA04 DID: 0001-0029-00000090
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
      res: 39aef20c8, req: X, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0
    "req:" refers to the request for the TX lock that the 'enq: TX - row lock contention' wait is waiting on. The request is for an eXclusive TX lock. This section also reveals the enqueue name as a string (TX-00020009-0001FA04), which can be used to search for the HOLDER. The holder of the resource is shown with the string "mode:" followed by the mode in which the lock is held, in this case eXclusive:
      (enqueue) TX-00020009-0001FA04 DID: 0001-002E-00000014
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
      res: 39aef20c8, mode: X, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78
    The holder holds the enqueue (mode: X) in a mode incompatible with the req: X request...
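The req:/mode: search can be scripted when a dump is too large to page through. A minimal sketch, assuming enqueue state objects look like the excerpt on this slide (the regular expression and function name are illustrative assumptions; real dumps vary by version and may need a looser pattern):

```python
import re
from typing import Dict, List, Tuple

# "(enqueue) <name> ... req: X ... sess: <addr>" marks a waiter;
# "(enqueue) <name> ... mode: X ... sess: <addr>" marks a holder.
ENQUEUE_RE = re.compile(
    r"\(enqueue\)\s+(?P<name>\S+)"
    r".*?\b(?P<kind>req|mode):\s*(?P<lock>\w+)"
    r".*?\bsess:\s*(?P<sess>[0-9a-fA-F]+)",
    re.DOTALL,
)


def enqueue_holders_and_waiters(dump_text: str, name: str) -> Dict[str, List[Tuple[str, str]]]:
    """Return (lock mode, session address) pairs for one enqueue name."""
    result: Dict[str, List[Tuple[str, str]]] = {"holders": [], "waiters": []}
    for match in ENQUEUE_RE.finditer(dump_text):
        if match.group("name") != name:
            continue
        key = "waiters" if match.group("kind") == "req" else "holders"
        result[key].append((match.group("lock"), match.group("sess")))
    return result


# The two state objects from the slide, used as sample input.
sample = """
(enqueue) TX-00020009-0001FA04 DID: 0001-0029-00000090
lv: 00 00 00 00 flag: 0x6
res: 39aef20c8, req: X, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0
(enqueue) TX-00020009-0001FA04 DID: 0001-002E-00000014
lv: 00 00 00 00 flag: 0x6
res: 39aef20c8, mode: X, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78
"""
print(enqueue_holders_and_waiters(sample, "TX-00020009-0001FA04"))
```

In this sample the holder's session address, 39b3a5c90, matches the "blocking sess=0x39b3a5c90" value reported in the waiter's wait line.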
  • 6. Systemstate Dumps: Rowcache locks. Row cache waits are waits against the Row Cache (or Dictionary Cache). Processes will show "waiting for 'row cache lock'":
      PROCESS 19: ...
      waiting for 'row cache lock' blocking sess=0x0 seq=2174 wait_time=0
      cache id=7, mode=0, request=3
      --------------------------------------------------------------------------------
      SO: 7000000c6de7678, type: 48, owner: 7000000a6c97cf8, flag: INIT/-/-/0x00
      row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=S savepoint=2148
      row cache parent object: address=7000000eedc13a0 cid=7(dc_users) hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
      own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6deed98] mode=X
      status=VALID/-/-/-/-/-/-/-/- request=N release=TRUE flags=0
    • mode=0 shows the lock is not currently held by the waiter
    • request=3 shows we are requesting the lock in Shared mode (mode 3)
    • object=7000000eedc13a0 shows the object we are requesting the lock on
    • request=S shows the requested lock is Shared (S)
    • cid=7(dc_users) shows the cache type is dc_users, with a cache ID of 7
    • mode=X shows the lock is currently held in eXclusive mode
  • 7. Systemstate Dumps: Rowcache locks. This process is waiting for 'row cache lock'. The waiter is waiting for "object=7000000eedc13a0" and is requesting a Share mode lock ("request=S"). To find the HOLDER, search for the same object address, but with the "mode=" string, which indicates a holder:
      PROCESS 19: ...
      waiting for 'row cache lock' blocking sess=0x0 seq=2174 wait_time=0
      cache id=7, mode=0, request=3
      --------------------------------------------------------------------------------
      SO: 7000000c6de7678, type: 48, owner: 7000000a6c97cf8, flag: INIT/-/-/0x00
      row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=S savepoint=2148
      row cache parent object: address=7000000eedc13a0 cid=7(dc_users) hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
      own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6deed98] mode=X
      status=VALID/-/-/-/-/-/-/-/- request=N release=TRUE flags=0

      SO: 7000000c6de84e8, type: 48, owner: 7000000c42297a0, flag: INIT/-/-/0x00
      row cache enqueue: count=1 session=7000000a6702710 object=7000000eedc13a0, mode=X savepoint=109
      row cache parent object: address=7000000eedc13a0 cid=7(dc_users) hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
      own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6df1b08] mode=X
      status=VALID/-/-/-/-/-/-/-/- request=N release=TRUE flags=0
      instance lock id=QH 00000440 00000000
      set=0, complete=FALSE set=1, complete=FALSE set=2, complete=FALSE
      data=
    In this case the mode of the holder is eXclusive (i.e. object=7000000eedc13a0, mode=X). Search back up to the top of this process state to find which process is holding the resource.
  • 8. Systemstate Dumps: Library Cache Pins. Waits for library cache pins are of the form "waiting for 'cursor: pin S wait on X'". To find more details, use the idn=XXXXXX value to search down in the systemstate (here, idn=535d1a6c):
      PROCESS 16:
      waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=58849 wait_time=0 seconds since wait started=0
      idn=535d1a6c, value=c1600000000, where|sleeps=5003f2428
      KGX Atomic Operation Log 7000002e5b9d160
      Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD
      Cursor Pin uid 2489 efd 0 whr 5 slp 58733
      opr=2 pso=70000028c47def0 flg=0
      pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
      ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
      hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
    • SID 3094 holds the mutex: Mutex ...(3094, 0)
    • The request is for Shared (GET_SHRD) mode
  • 9. To find the HOLDER, search for idn XXXXXXX oper until you find one which is held (ie not GET_XXX) ( idn 535d1a6c oper): • SID 3094 holds Mutex in Exclusive (EXCL) Library Cache Pins Systemstate Dumps KGX Atomic Operation Log 7000002cd934270 Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL Cursor Pin uid 3094 efd 0 whr 7 slp 0 opr=3 pso=7000002a71c4180 flg=0 pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0 ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600 hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
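The "search for idn ... oper until you find one that is not GET_XXX" rule from the two Library Cache Pins slides can be sketched as a small filter. This assumes the "Mutex addr(sid, n) idn ... oper ..." line layout shown in the excerpts; the function name and the `ref` group name are hypothetical:

```python
import re
from typing import List, Tuple

MUTEX_RE = re.compile(
    r"Mutex\s+(?P<addr>[0-9a-fA-F]+)\((?P<sid>\d+),\s*(?P<ref>\d+)\)"
    r"\s+idn\s+(?P<idn>[0-9a-fA-F]+)\s+oper\s+(?P<oper>\w+)"
)


def classify_mutex_entries(dump_text: str, idn: str) -> Tuple[List[Tuple[int, str]], List[Tuple[int, str]]]:
    """Split KGX log entries for one idn into (holders, requesters).

    Each entry is (SID shown in the Mutex(...) tuple, operation).
    Operations starting with GET_ are requests; anything else is treated as held.
    """
    holders: List[Tuple[int, str]] = []
    requesters: List[Tuple[int, str]] = []
    for match in MUTEX_RE.finditer(dump_text):
        if match.group("idn").lower() != idn.lower():
            continue
        entry = (int(match.group("sid")), match.group("oper"))
        if match.group("oper").startswith("GET_"):
            requesters.append(entry)
        else:
            holders.append(entry)
    return holders, requesters


# The two Mutex lines from the slides, used as sample input.
sample = """
KGX Atomic Operation Log 7000002e5b9d160
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD
KGX Atomic Operation Log 7000002cd934270
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL
"""
holders, requesters = classify_mutex_entries(sample, "535d1a6c")
print(holders)
print(requesters)
```

Both entries report SID 3094 in the Mutex tuple because that tuple names the session holding the mutex, not the session making the request.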
  • 10. To find more details use the handle address in the form handle=address to search down in the systemstate (ie handle=70000030de975a8) • Exclusive (X) Requested • <USER_NAME>.<OBJECT_NAME> is the object we are trying to lock Library Cache Lock Systemstate Dumps PROCESS 35: waiting for 'library cache lock' blocking sess=0x0 seq=35844 wait_time=0 seconds since wait started=14615 handle address=70000030de975a8, lock address=70000026947e190, 100*mode+namespace=12d SO: 70000026947e190, type: 53, owner: 700000308d726f0, flag: INIT/-/-/0x00 LIBRARY OBJECT LOCK: lock=70000026947e190 handle=70000030de975a8 request=X call pin=0 session pin=0 hpc=0000 hlc=0000 htl=70000026947e210[7000002b333ffe8,7000002b333ffe8] htb=7000002b333ffe8 ssga=7000002b333f2a0 user=700000307a7ca68 session=700000307a7ca68 count=0 flags=[0000] savepoint=0x23e411 LIBRARY OBJECT HANDLE: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0 name=<USER_NAME>.<OBJECT_NAME>
  • 11. To find the HOLDER, search for 'handle=XXXXXXXXXX mode=' until you find one which is held (but not in NULL mode) (here, handle=70000030de975a8 mode=) • Held in Shared (S) • name=<USER_NAME>.<OBJECT_NAME> confirms the object name Library Cache Lock Systemstate Dumps SO: 700000288b03ae0, type: 53, owner: 7000002cc697468, flag: INIT/-/-/0x00 LIBRARY OBJECT LOCK: lock=700000288b03ae0 handle=70000030de975a8 mode=S call pin=0 session pin=0 hpc=0000 hlc=0000 htl=700000288b03b60[7000002a179a1a8,7000002b3800878] htb=7000002b3800878 ssga=7000002b37ffb30 user=70000030fafab00 session=70000030fafab00 count=1 flags=[0000] savepoint=0x417 LIBRARY OBJECT HANDLE: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0 name=<USER_NAME>.<OBJECT_NAME>
  • 12. • 9d is the latch# (in HEX = 157) from v$latchname Towards the top of the PROCESS dump you will see the exact latch we are waiting for and even who holds it: • PROCESS 127 (ospid:23086) holds the latch, PROCESS 127 shows: Latch free Systemstate Dumps PROCESS 8: waiting for 'latch free' blocking sess=0x0 seq=4577 wait_time=0 address=99ff60018, number=9d, tries=0 waiting for 99ff60018 Child library cache level=5 child#=3 Location from where latch is held: kglic: child Context saved from call: 26 state=busy possible holder pid = 127 ospid=23086 wtr=99ff60018, next waiter 9993858b8 holding 99ff60018 Child library cache level=5 child#=3 Location from where latch is held: kglic: child Context saved from call: 26 state=busy
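The latch number in the wait line is hexadecimal, so it has to be converted before it is compared with the decimal latch# in v$latchname. A minimal sketch that pulls the number and the possible holder out of a 'latch free' excerpt, assuming the field layout shown on this slide (the function name and returned keys are hypothetical):

```python
import re
from typing import Dict, Optional


def parse_latch_wait(process_text: str) -> Dict[str, Optional[int]]:
    """Extract the decimal latch number and the possible holder from a PROCESS excerpt."""
    number = re.search(r"\bnumber=([0-9a-fA-F]+)", process_text)
    holder = re.search(r"possible holder pid\s*=\s*(\d+)\s+ospid=(\d+)", process_text)
    return {
        # number= is printed in hex; v$latchname lists latch# in decimal.
        "latch_number": int(number.group(1), 16) if number else None,
        "holder_pid": int(holder.group(1)) if holder else None,
        "holder_ospid": int(holder.group(2)) if holder else None,
    }


sample = """
PROCESS 8:
waiting for 'latch free' blocking sess=0x0 seq=4577 wait_time=0
address=99ff60018, number=9d, tries=0
waiting for 99ff60018 Child library cache level=5 child#=3
possible holder pid = 127 ospid=23086
"""
print(parse_latch_wait(sample))
# {'latch_number': 157, 'holder_pid': 127, 'holder_ospid': 23086}
```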
  • 13. If you want to find which object a handle refers to, search for handle=XXXXXXXXXX until you come across the LIBRARY OBJECT HANDLE, e.g. handle=c00000006c0f8490: • name shows the name of the handle • namespace=CRSR shows that it is of type CURSOR Other useful information Systemstate Dumps LIBRARY OBJECT HANDLE: handle=c00000006c0f8490 name=SELECT USER FROM DUAL hash=cd1ceca0 timestamp=03-23-2007 09:00:00 namespace=CRSR flags=RON/TIM/PN0/SML/[12010000]
  • 14. ADDM in a multitenant environment
  • 15. ADDM in a multitenant environment. Starting with Oracle Database 12c, ADDM is enabled by default in the root container of a multitenant container database (CDB). You can also use ADDM in a pluggable database (PDB). • In a CDB, ADDM works in the same way as it works in a non-CDB • ADDM analysis is performed each time an AWR snapshot is taken on a CDB root or a PDB • ADDM does not work in a PDB by default, because automatic AWR snapshots are disabled in PDBs
  • 16. To enable ADDM in a PDB: Set the AWR_PDB_AUTOFLUSH_ENABLED initialization parameter to TRUE in the PDB using the following command: Set the AWR snapshot interval greater than 0 in the PDB using the command as shown in the following example: Results on a PDB provide only PDB-specific findings and recommendations ADDM in a multitenant environment SQL> ALTER SYSTEM SET AWR_PDB_AUTOFLUSH_ENABLED=TRUE; SQL> EXEC dbms_workload_repository.modify_snapshot_settings(interval=>60);
  • 17. Analyze logs and look for errors
  • 18. Investigate logs and look for errors
      tfactl analyze -since 1d
      INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...
      ...
      Unique error messages for last ~1 day(s)
      Occurrences percent server name          error
      ----------- ------- -------------------- -----
                1  100.0% myserver1            Errors in file /u01/oracle/diag/rdbms/orcl2/orcl2/trace/orcl2_ora_12272.trc (incident=10151):
                                               ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], []
                                               Incident details in: /u01/oracle/diag/rdbms/orcl2/orcl2/incident/incdir_10151/orcl2_ora_12272_i10151.trc
      ...
  • 19. Investigate logs and look for errors
      tfactl analyze -search "ORA-04031" -last 1d
      INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...
      ...
      Matching regex: ORA-04031
      Case sensitive: false
      Match count: 1
      [Source: /u01/oracle/diag/rdbms/orcl2/orcl2/trace/alert_orcl2.log, Line: 1941]
      Oct 01 12:09:05 2020
      Errors in file /u01/oracle/diag/rdbms/orcl2/orcl2/trace/orcl2_ora_6982.trc (incident=7665):
      ORA-04031: unable to allocate bytes of shared memory ("","","","")
      Incident details in: /u01/app/oracle/diag/rdbms/orcl2/orcl2/incident/incdir_7665/orcl2_ora_6982_i7665.trc
      ...
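Once the analyze output or an alert log excerpt has been saved to a file, a quick tally of the distinct ORA- codes shows which errors dominate. A minimal sketch, assuming the text is already in memory (the function name is hypothetical, and this only post-processes text; it does not call tfactl):

```python
import re
from collections import Counter


def count_ora_errors(log_text: str) -> Counter:
    """Count occurrences of each five-digit ORA- error code in a block of log text."""
    return Counter(re.findall(r"\bORA-\d{5}\b", log_text))


# Lines taken from the two output excerpts above, used as sample input.
sample = """
ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], []
Matching regex: ORA-04031
ORA-04031: unable to allocate bytes of shared memory ("","","","")
"""
for code, occurrences in count_ora_errors(sample).most_common():
    print(code, occurrences)
```

Note that the "Matching regex:" header line is counted as well, so ORA-04031 is reported twice for this sample; strip tfactl's header lines first if only log entries should be counted.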
  • 20. Examples
      tfactl analyze -since 5h
        # Show a summary of events from alert logs and system messages in the last 5 hours
      tfactl analyze -comp os -since 1d
        # Show a summary of events from system messages in the last 1 day
      tfactl analyze -search "ORA-" -since 2d
        # Search for the string ORA- in alert and system logs in the past 2 days
      tfactl analyze -search "/Starting/c" -since 2d
        # Search for the case-sensitive string "Starting" in the past 2 days
      tfactl analyze -comp os -for "Oct/01/2020 11" -search "."
        # Show all system log messages at time Oct/01/2020 11
      tfactl analyze -comp osw -since 6h
        # Show the OSWatcher top summary for the last 6 hours
      tfactl analyze -comp oswslabinfo -from "Oct/01/2020 05:00:01" -to "Oct/01/2020 06:00:01"
        # Show the OSWatcher slabinfo summary for the specified time period
      tfactl analyze -since 1h -type generic
        # Analyze all generic messages in the last hour
  • 21. Investigate logs and look for errors $ ./tfactl analyze -type generic -since 7d INFO: analyzing all (Alert and Unix System Logs) logs for the last 10080 minutes... ... Total message count: 54,807, from 01-Oct-2020 02:41:34 PM PST to 08-Oct-2020 02:41:34 Messages matching last ~7 day(s): 3,139, from 02-Oct-2020 02:46:23 PM PST to 08-Oct-2020 02:41:34 last ~7 day(s) generic count: 3,139, from 06-Oct-2020 02:46:23 PM PST to 08-Oct-2020 02:41:34 last ~7 day(s) unique generic count: 94 Message types for last ~7 day(s) Occurrences percent server name type ----------- ------- -------------------- ----- 3,139 100.0% myhost1 generic ...
  • 22. Investigate logs and look for errors Unique generic messages for last ~7 day(s) Occurrences percent server name generic ----------- ------- -------------------- ----- 1,504 47.9% myhost1 : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem... 487 15.5% myhost1 : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem... 336 10.7% myhost1 myhost1 smartd[13812]: Device: /dev/sdv, SMART Failure: FAILURE... 336 10.7% myhost1 myhost1 smartd[13812]: Device: /dev/sdag, SMART Failure: FAILURE ... 103 3.3% myhost1 myhost1 last message repeated 9 times 103 3.3% myhost1 myhost1 kernel: oracle: sending ioctl 2285 to a partition! ...snipping for brevity...
  • 23. Pattern match search output
      tfactl analyze -search "ORA-" -since 7d
      ...
      [Source: /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/alert_RATODA1.log, Line: 9494]
      Feb 25 22:00:02 2014
      Errors in file /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/RATODA1_j003_10948.trc:
      ORA-12012: error on auto execute of job "ORACLE_OCM"."MGMT_CONFIG_JOB_2_1"
      ORA-29280: invalid directory path
      ORA-06512: at "ORACLE_OCM.MGMT_DB_LL_METRICS", line 2436
      ORA-06512: at line 1
      End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
      ...
  • 24. OS Watcher top data
      tfactl analyze -comp osw -since 6h
      ...
      statistic:              t    first  highest   (time)   lowest   (time)  average non zero 3rd last 2nd last     last    trend
      top.cpu.util.id:        %     98.0     99.7 @10:35AM     72.8 @03:11PM     97.3    2,059     95.2     96.8     96.0      -2%
      top.cpu.util.st:        %      0.1      0.1 @09:14AM      0.0 @09:14AM      0.0      889      0.0      0.0      0.0    -100%
      top.cpu.util.us:        %      0.1      8.8 @11:31AM      0.0 @09:14AM      0.6    1,966      4.3      0.8      3.4    3300%
      top.cpu.util.wa:        %      1.7     18.7 @03:11PM      0.1 @10:35AM      1.1    2,059      0.3      0.4      0.4     -76%
      top.loadavg.last01min:        1.17     3.12 @09:44AM     0.07 @12:45PM     0.93    1,823     0.31     0.26     0.22     -81%
      top.loadavg.last05min:        0.94     2.26 @09:44AM     0.27 @12:45PM     0.93    1,823     0.82     0.79     0.77     -18%
      top.loadavg.last15min:        0.79     1.60 @09:46AM     0.44 @01:18PM     0.92    1,823     0.96     0.95     0.94      18%
      top.mem.buffers:        k   808232   808388 @09:41AM   785608 @02:57PM   796511    2,093   785744   785744   785744      -2%
      top.mem.free:           k  1130332  1291344 @10:02AM   927576 @09:43AM  1188576    2,093  1244020  1265248  1265188      11%
      top.swap.used:          k    47556    48088 @03:00PM    47556 @09:14AM    47828    2,097    48088    48088    48088       1%
      top.tasks.running:               1        4 @12:04PM        1 @09:14AM        1    1,996        1        2        2     100%
      top.tasks.total:               514      527 @02:57PM      509 @09:18AM      514    1,996      518      521      520       1%
      top.tasks.zombie:                0        5 @11:04AM        0 @09:14AM        0       62        0        0        0      n/a
      top.users:                       5        6 @03:00PM        5 @09:14AM        5    1,823        6        6        6      20%
      ...
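With many statistics in the summary, the trend column is a quick way to spot what changed most over the window. A minimal sketch that flags rows whose trend exceeds a threshold, assuming each row starts with a "name:" token and ends with a trend such as "-76%" or "n/a" as in the output above (the function name and the 50 percent default are arbitrary choices, not tfactl features):

```python
from typing import Iterable, List, Tuple


def large_trends(rows: Iterable[str], threshold_pct: int = 50) -> List[Tuple[str, int]]:
    """Return (statistic, trend %) for rows whose absolute trend is at least the threshold."""
    flagged: List[Tuple[str, int]] = []
    for row in rows:
        tokens = row.split()
        if len(tokens) < 2 or not tokens[0].endswith(":"):
            continue  # not a statistic row
        trend = tokens[-1]
        if not trend.endswith("%"):
            continue  # header row or "n/a"
        value = int(trend.rstrip("%").replace(",", ""))
        if abs(value) >= threshold_pct:
            flagged.append((tokens[0].rstrip(":"), value))
    return flagged


# Three rows copied from the output above, plus the header.
sample_rows = [
    "statistic: t first highest (time) lowest (time) average non zero 3rd last 2nd last last trend",
    "top.cpu.util.id: % 98.0 99.7 @10:35AM 72.8 @03:11PM 97.3 2,059 95.2 96.8 96.0 -2%",
    "top.cpu.util.us: % 0.1 8.8 @11:31AM 0.0 @09:14AM 0.6 1,966 4.3 0.8 3.4 3300%",
    "top.tasks.zombie: 0 5 @11:04AM 0 @09:14AM 0 62 0 0 0 n/a",
]
print(large_trends(sample_rows))  # [('top.cpu.util.us', 3300)]
```

A very large percentage can simply reflect a tiny starting value (0.1% user CPU here), so the flagged rows are a starting point for inspection rather than a diagnosis.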
  • 25. OS Watcher slabinfo data
      tfactl analyze -comp oswslabinfo -from "Oct/01/2020 05:00:01" -to "Oct/01/2020 06:00:01"
      ...
      statistic:                                 t    first  highest   (time)   lowest   (time)  average non zero 3rd last 2nd last     last    trend
      slabinfo.acfs_ccb_cache.active_objs:                4       38 @05:52AM        0 @05:01AM       10      294        3        1        8     100%
      slabinfo.inet_peer_cache.active_objs:              23       39 @05:59AM       23 @05:00AM       23      351       23       23       39      69%
      slabinfo.sigqueue.active_objs:                    385      768 @05:28AM      285 @05:27AM      554      351      712      621      577      49%
      slabinfo.skbuff_fclone_cache.active_objs:          55      133 @05:51AM       11 @05:20AM       69      351       56       77       70      27%
      slabinfo.names_cache.active_objs:                 126      180 @05:00AM      110 @05:23AM      146      351      171      166      156      23%
      slabinfo.sgpool-8.active_objs:                    135      228 @05:31AM       59 @05:11AM      152      351      180      165      157      16%
      slabinfo.UDP.active_objs:                         568      675 @05:28AM      492 @05:17AM      597      351      630      596      626      10%
      slabinfo.size-8192.active_objs:                   174      209 @05:36AM      160 @05:14AM      181      351      205      187      188       8%
      slabinfo.task_delay_info.active_objs:            1477     1856 @05:28AM     1334 @05:57AM     1574      351     1529     1411     1579       6%
      slabinfo.pid.active_objs:                        1608     1980 @05:29AM     1452 @05:21AM     1678      351     1564     1487     1689       5%
      slabinfo.blkdev_requests.active_objs:             720      880 @05:04AM      651 @05:54AM      745      351      707      736      761       5%
      slabinfo.size-256.active_objs:                   1116     1305 @05:06AM      846 @05:11AM     1091      351     1245     1143     1166       4%
      slabinfo.ip_dst_cache.active_objs:               1497     1800 @05:28AM     1279 @05:36AM     1517      351     1594     1466     1560       4%
      slabinfo.sock_inode_cache.active_objs:           2168     2329 @05:11AM     2106 @05:56AM     2225      351     2322     2278     2232       2%
      slabinfo.size-512.active_objs:                   3036     3152 @05:38AM     3007 @05:01AM     3088      351     3136     3112     3075       1%
      ...
  • 26. How to connect to a hung database for diagnostics
  • 27. How do you connect to a database when connections are hanging? • A sqlplus preliminary connection can connect to the database because no session is created • You will have limited access to the SGA • This helps in capturing diagnostic information such as a systemstate dump • Two ways to connect with sqlplus using a preliminary connection: sqlplus -prelim / as sysdba or SQL> set _prelim on SQL> connect / as sysdba Prelim connection established
  • 28. Analyze a hung database
  • 29. Always on - Enabled by default Reliably detects database hangs and deadlocks Autonomously resolves them Logs all detections and resolutions New SQL interface to configure sensitivity (Normal/High) and trace file sizes Oracle Hang Manager Session DIA0 EVALUATE DETECT ANALYZE Hung? VERIFY Victim Policy
  • 30. Monitors Session snapshots for progress Evaluates potential hangs over time based upon Wait Graphs Analyzes hang chain of sessions to identify blocker/victim Discovers blocker is located in ASM instance Requests ASM terminate session or instance relying on Flex ASM for recovery Detection and resolution is bi-directional Database Hang Management - Infrastructure Database ASM
  • 31. Full Resolution Dump Trace File and DB Alert Log Audit Reports Oracle 12c Hang Manager Dump file …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc Oracle Database 12c Enterprise Edition Release 18/19c.0.0.0 - 64bit Beta With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options Build label: RDBMS_MAIN_LINUX.X64_151013 ORACLE_HOME: …/3775268204/oracle System name: Linux Node name: slc05kyr Release: 2.6.39-400.211.1.el6uek.x86_64 Version: #1 SMP Fri Nov 15 13:39:16 PST 2013 Machine: x86_64 VM name: Xen Version: 3.4 (PVM) Instance name: hm62 Redo thread mounted by this instance: 2 Oracle process number: 19 Unix process pid: 12656, image: oracle@slc05kyr (DIA0) *** 2020-10-01T16:47:59.541509+17:00 *** SESSION ID:(96.41299) 2020-10-01T16:47:59.541519+17:00 *** CLIENT ID:() 2020-10-01T16:47:59.541529+17:00 *** SERVICE NAME:(SYS$BACKGROUND) 2020-10-01T16:47:59.541538+17:00 *** MODULE NAME:() 2020-10-01T16:47:59.541547+17:00 *** ACTION NAME:() 2020-10-01T16:47:59.541556+17:00 *** CLIENT DRIVER:() 2020-10-01T16:47:59.541565+17:00
  • 32. Full Resolution Dump Trace File and DB Alert Log Audit Reports Oracle 12c Hang Manager 2020-10-01T16:47:59.435039+17:00 Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353): ORA-32701: Possible hangs up to hang ID=1 detected Incident details in: …/diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc 2020-10-01T16:47:59.506775+17:00 DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2 due to a GLOBAL, HIGH confidence hang with ID=1. Hang Resolution Reason: Automatic hang resolution was performed to free a significant number of affected sessions. DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1. In the alert log on the instance local to the session (instance 2 in this case), we see the following: 2020-10-01T16:47:59.538673+17:00 Errors in file …/diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753): ORA-32701: Possible hangs up to hang ID=1 detected Incident details in: …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc 2020-10-01T16:48:04.222661+17:00 DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1 requested by master DIA0 process on instance 1 Hang Resolution Reason: Automatic hang resolution was performed to free a significant number of affected sessions. by terminating session sid:40 with serial # 43179 (ospid:13031)
  • 33. Guided resolution with Oracle Support
  • 34. Oracle Database ORA-00060 Errors on Single Instance (Non-RAC) Diagnosing Using Deadlock Graphs in ORA-00060 Trace Files (Doc ID 1550091.2) Troubleshooting Assistant https://support.oracle.com/epmos/faces/DocContentDisplay?id=1550091.2
  • 36. Understand and Troubleshoot Startup/Shutdown Issues (Doc ID 1591095.2) Troubleshooting Assistant https://support.oracle.com/epmos/faces/DocContentDisplay?id=1591095.2
  • 40. Oracle Undo Management (ORA-01555, ORA-30036, ORA-01628, ORA-01552, etc.) (Doc ID 1575667.2) Troubleshooting Assistant https://support.oracle.com/epmos/faces/DocContentDisplay?id=1575667.2
  • 43. Handling Block Corruptions in Oracle7 / 8 / 8i / 9i / 10g / 11g (Doc ID 1598103.2) Troubleshooting Assistant https://support.oracle.com/epmos/faces/DocContentDisplay?id=1598103.2
  • 48. Health Check SQL 1. Log in to the database server and set the environment used by the Database Instance 2. Download the "sqlhc.zip" archive file and extract the contents to a suitable directory/folder 3. Connect to SQL*Plus as SYS, a DBA account, or a user with access to Data Dictionary views and execute the "sqlhc.sql" script. It prompts for two parameters: i. Oracle Pack License (Tuning, Diagnostics or None) [T|D|N] (required) ii. A valid SQL_ID for the SQL to be analyzed. If the site has both Tuning and Diagnostics licenses then specify T (Oracle Tuning pack includes Oracle Diagnostics) For example: # sqlplus / as sysdba SQL> START sqlhc.sql T djkbyr8vkc64h
  • 50. Query trace files using SQL
  • 51. SQL> describe V$DIAG_TRACE_FILE Name Null? Type ----------------------------------------- -------- ---------------------------- ADR_HOME VARCHAR2(444) TRACE_FILENAME VARCHAR2(68) CHANGE_TIME TIMESTAMP(3) WITH TIME ZONE MODIFY_TIME TIMESTAMP(3) WITH TIME ZONE CON_ID NUMBER V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 52. SQL> describe V$DIAG_TRACE_FILE_CONTENTS Name Null? Type ----------------------------------------- -------- ---------------------------- ADR_HOME VARCHAR2(444) TRACE_FILENAME VARCHAR2(68) RECORD_LEVEL NUMBER PARENT_LEVEL NUMBER RECORD_TYPE NUMBER TIMESTAMP TIMESTAMP(3) WITH TIME ZONE PAYLOAD VARCHAR2(4000) SECTION_ID NUMBER SECTION_NAME VARCHAR2(64) COMPONENT_NAME VARCHAR2(64) OPERATION_NAME VARCHAR2(64) FILE_NAME VARCHAR2(64) FUNCTION_NAME VARCHAR2(64) LINE_NUMBER NUMBER THREAD_ID VARCHAR2(64) SESSION_ID NUMBER SERIAL# NUMBER CON_UID NUMBER CONTAINER_NAME VARCHAR2(64) CON_ID NUMBER V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 53. SQL> select trace_filename from v$diag_trace_file; TRACE_FILENAME -------------------------------------------------------------------- ORCL1_mz00_21108.trc ORCL1_gcr2_16504.trc ORCL1_gcr3_12849.trc ORCL1_gcr1_28159.trc ORCL1_gcr1_27603.trc ORCL1_gcr0_29971.trc ORCL1_mz00_26487.trc ORCL1_mz00_28329.trc ORCL1_ora_19005.trc ORCL1_gcr3_12879.trc ORCL1_gcr1_11688.trc V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 54. SQL> select payload from v$diag_trace_file_contents where trace_filename ='ORCL1_ora_19005.trc'; PAYLOAD -------------------------------------------------------------------------------- Trace file /u01/app/oracle/diag/rdbms/orcl_unq/ORCL1/trace/ORCL1_ora_19005.trc Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.2.0.0.0 Build label: RDBMS_19.2.0.0.0_LINUX.X64_190121 ORACLE_HOME: /u01/app/oracle/product/19c/dbhome_1 System name: Linux Node name: myserver65 Release: 4.14.35-1844.1.3.el7uek.x86_64 Version: #2 SMP Wed Jan 2 21:18:29 PST 2019 Machine: x86_64 VM name: Xen Version: 4.1 (HVM) ... V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 55. ... PAYLOAD -------------------------------------------------------------------------------- Instance name: ORCL1 Redo thread mounted by this instance: 1 Oracle process number: 12 Unix process pid: 19005, image: oracle@myserver65 (TNS V1-V3) *** 2020-10-01T01:22:10.770960+00:00 *** SESSION ID:(106.17196) 2020-10-01T01:22:10.771014+00:00 *** CLIENT ID:() 2020-10-01T01:22:10.771027+00:00 *** SERVICE NAME:(SYS$USERS) 2020-10-01T01:22:10.771039+00:00 ... V$DIAG_TRACE_FILE and V$DIAG_TRACE_FILE_CONTENTS
  • 56. SQL> describe V$DIAG_SESS_SQL_TRACE_RECORDS Name Null? Type ----------------------------------------- -------- ---------------------------- ADR_HOME VARCHAR2(444) TRACE_FILENAME VARCHAR2(68) RECORD_LEVEL NUMBER PARENT_LEVEL NUMBER RECORD_TYPE NUMBER TIMESTAMP TIMESTAMP(3) WITH TIME ZONE PAYLOAD VARCHAR2(4000) SECTION_ID NUMBER SECTION_NAME VARCHAR2(64) COMPONENT_NAME VARCHAR2(64) OPERATION_NAME VARCHAR2(64) FILE_NAME VARCHAR2(64) FUNCTION_NAME VARCHAR2(64) LINE_NUMBER NUMBER THREAD_ID VARCHAR2(64) SESSION_ID NUMBER SERIAL# NUMBER CON_UID NUMBER CONTAINER_NAME VARCHAR2(64) CON_ID NUMBER V$DIAG_SESS_SQL_TRACE_RECORDS
  • 57. SQL> SELECT sid,serial# FROM v$session WHERE username = 'SYS'; SID SERIAL# ---------- ---------- 33 45888 129 6051 SQL> EXECUTE DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(129,6051,TRUE); PL/SQL procedure successfully completed. V$DIAG_SESS_SQL_TRACE_RECORDS Enable session tracing
  • 58. SQL> select unique trace_filename from V$DIAG_SESS_SQL_TRACE_RECORDS; TRACE_FILENAME -------------------------------------------------------------------- ORCL1_ora_14151.trc SQL> select payload from V$DIAG_SESS_SQL_TRACE_RECORDS where trace_filename = 'ORCL1_ora_14151.trc'; PAYLOAD -------------------------------------------------------------------------------- CLOSE #140506358472544:c=19,e=18,dep=0,type=1,tim=7769230586778 ===================== PARSING IN CURSOR #140506358494608 len=97 dep=1 uid=0 oct=3 lid=0 tim=7769230600 163 hv=791757000 ad='7fa0c290' sqlid='87gaftwrm2h68' select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname from obj$ o where o.obj#=:1 END OF STMT EXEC #140506358494608:c=65,e=65,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=107238262 4,tim=7769230600159 ... V$DIAG_SESS_SQL_TRACE_RECORDS
  • 59. ... PAYLOAD -------------------------------------------------------------------------------- FETCH #140506358494608:c=38,e=37,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=10723826 24,tim=7769230600324 CLOSE #140506358494608:c=5,e=4,dep=1,type=3,tim=7769230600381 EXEC #140506358494608:c=23,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=107238262 4,tim=7769230600500 FETCH #140506358494608:c=11,e=12,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=10723826 24,tim=7769230600547 ... V$DIAG_SESS_SQL_TRACE_RECORDS SQL> EXECUTE DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(129,6051,FALSE); PL/SQL procedure successfully completed.
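The PAYLOAD rows returned by V$DIAG_SESS_SQL_TRACE_RECORDS are raw SQL trace lines, where `c` is CPU time and `e` is elapsed time in microseconds for each database call. As a rough illustration, the Python sketch below totals elapsed time per call type. The sample lines are copied from the output above with the slide's wrapped values (for example `plh=107238262 4`) rejoined, which is an assumption about the original formatting; `parse_call` and `elapsed_by_call` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Matches database call lines such as "EXEC #<cursor>:c=65,e=65,...".
CALL_LINE = re.compile(r"^(PARSE|EXEC|FETCH|CLOSE) #(\d+):(.*)$")

SAMPLE_PAYLOAD = [
    "CLOSE #140506358472544:c=19,e=18,dep=0,type=1,tim=7769230586778",
    "=====================",
    "EXEC #140506358494608:c=65,e=65,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1072382624,tim=7769230600159",
    "FETCH #140506358494608:c=38,e=37,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=1072382624,tim=7769230600324",
    "CLOSE #140506358494608:c=5,e=4,dep=1,type=3,tim=7769230600381",
    "EXEC #140506358494608:c=23,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1072382624,tim=7769230600500",
    "FETCH #140506358494608:c=11,e=12,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,plh=1072382624,tim=7769230600547",
]


def parse_call(line):
    """Return a dict for a database call line, or None for any other line."""
    match = CALL_LINE.match(line.strip())
    if match is None:
        return None
    call, cursor, stats_text = match.groups()
    stats = {}
    for pair in stats_text.split(","):
        key, _, value = pair.partition("=")
        stats[key] = int(value)
    return {"call": call, "cursor": int(cursor), **stats}


def elapsed_by_call(lines):
    """Sum the elapsed microseconds (the e= field) per call type."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_call(line)
        if record is not None:
            totals[record["call"]] += record["e"]
    return dict(totals)


print(elapsed_by_call(SAMPLE_PAYLOAD))
```

For anything beyond a quick look, a dedicated trace profiler such as tkprof is the more complete option; this sketch only shows how the payload fields are laid out.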
  • 60. Keep track of the attributes of important files pre and post patching
  • 61. Start tracking using -fileattr start Automatically discovers Grid Infrastructure and Database directories and files • Prevent discovery using -excludediscovery Further configure the list of monitored directories using -includedir Track attribute changes on important files tfactl <orachk|exachk> -fileattr start -includedir "/root/myapp/config" ... List of directories(recursive) for checking file attributes: /u01/app/oradb/product/11.2.0/dbhome_11203 /u01/app/oradb/product/11.2.0/dbhome_11204 /root/myapp/config orachk has taken snapshot of file attributes for above directories at: /orahome/oradb/orachk/orachk_mysrv21_20201001_041214
  • 62. Compare current attributes against first snapshot using -fileattr check When checking, use the same include/exclude arguments you started with Track attribute changes on important files tfactl <orachk|exachk> -fileattr check -includedir "/root/myapp/config" ... List of directories(recursive) for checking file attributes: /u01/app/oradb/product/11.2.0/dbhome_11203 /u01/app/oradb/product/11.2.0/dbhome_11204 /root/myapp/config Checking file attribute changes... "/root/myapp/config/myappconfig.xml" is different: Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml Current : 0644 root root /root/myapp/config/myappconfig.xml ...
  • 63. Automatically proceeds to run compliance checks after file attribute checks • Only run attribute checks by using -fileattronly File Attribute Changes are shown in HTML report output Track attribute changes on important files
  • 65. Automatically runs critical checks every two hours and full checks once a day at 2am • You only need to configure your email for notification ORAchk | EXAchk email notification tfactl <orachk|exachk> -set "NOTIFICATION_EMAIL=SOME.BODY@COMPANY.COM"
  • 66. Critical event notification TFA can send email notification when faults are detected • Notification for all problems: tfactl set notificationAddress=some.body@example.com • Notification for all problems on database owned by oracle user: tfactl set notificationAddress=oracle:another.person@example.com • Optionally configure an SMTP server: tfactl set smtp • Confirm email notification works: tfactl sendmail <email_address>
  • 67. Critical event notification Event: ORA-29770 Event time: Thu Oct 01 07:13:09 PDT 2020 File containing event: /u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log Logs will be collected at: /opt/oracle.ahf/data/repository/auto_srdc_ORA-29770_2020_10_01:09_myserver1.zip
  • 68. Critical event notification Symptom LCK0 (ospid:NNNN) has not called a wait for <n_secs> secs. Call stack: ksedsts <- kjzdssdmp <- kjzduptcctx <- kjzdicrshnfy <- ksuitm <- kjgcr_KillInstance <- kjgcr_Main <- kjfmlmhb_Main <- ksbrdp
  • 69. Critical event notification Action Apply the one-off patch 18795105 to resolve this issue For further information see Doc :1998445.1 and Doc :18795105.8 Cause Instance crash due to ORA-29770 LCK0 hung
  • 71. Self analysis in MOS using TFA collections
  • 74. tfactl diagcollect -srdc <srdc_type> • Scans system to identify recent events • Once the relevant event is chosen, proceeds with diagnostic collection One command SRDC tfactl diagcollect -srdc ORA-00600 Enter the time of the ORA-00600 [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] : Enter the Database Name [<RETURN>=ALL] : 1. Oct/01/2020 05:29:58 : [orcl2] ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], [] 2. Oct/01/2020 06:55:08 : [orcl2] ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], [] Please choose the event : 1-2 [1] Selected value is : 1 (Oct/01/2020 05:29:58 )
  • 75. All required files are identified • Trimmed where applicable • Packaged in a zip ready to provide to support One command SRDC ... 2020/10/01 06:14:24 EST : Getting List of Files to Collect 2020/10/01 06:14:27 EST : Trimming file : myserver1/rdbms/orcl2/orcl2/trace/orcl2_lmhb_3542.trc with original file size : 163MB ... 2020/10/01 06:14:58 EST : Total time taken : 39s 2020/10/01 06:14:58 EST : Completed collection of zip files. ... /opt/oracle.ahf/data/repository/srdc_ora600_collection_Thu_Oct_1_06_14_17_EST_2020_node_local/myserver1.tfa_srdc_ora600_Thu_Oct_1_06_14_17_EST_2020.zip
  • 78. Collects, processes, and maintains performance statistics for problem detection and self-tuning purposes Gathered data is stored both in memory and in the database, and is displayed in both reports and views Automatic Workload Repository (AWR) The statistics collected and processed by AWR include: • Object statistics that determine both access and usage statistics of database segments • Time model statistics based on time usage for activities, displayed in the V$SYS_TIME_MODEL and V$SESS_TIME_MODEL views • Some of the system and session statistics collected in the V$SYSSTAT and V$SESSTAT views • SQL statements that are producing the highest load on the system, based on criteria such as elapsed time and CPU time • Active Session History (ASH) statistics, representing the history of recent sessions activity
  • 79. Generating an AWR Report 1. Create an AWR snapshot: SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT() 2. Run your workload 3. Create another AWR snapshot: SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT() 4. Generate a report for the time period: SQL> @$ORACLE_HOME/rdbms/admin/awrrpt.sql
  • 80. AWR Scripts • Generating an AWR Compare Periods Report for the Local Database: SQL> @$ORACLE_HOME/rdbms/admin/awrddrpt.sql • Generating an AWR Compare Periods Report for a Specific Database: SQL> @$ORACLE_HOME/rdbms/admin/awrddrpi.sql • To generate an AWR Compare Periods report for Oracle RAC on the local database instance: SQL> @$ORACLE_HOME/rdbms/admin/awrgdrpt.sql • To generate an AWR Compare Periods report for Oracle RAC on a specific database: SQL> @$ORACLE_HOME/rdbms/admin/awrgdrpi.sql • To generate a Global AWR report for RAC: SQL> @$ORACLE_HOME/rdbms/admin/awrgrpt.sql • To generate a SQL Statement report: SQL> @$ORACLE_HOME/rdbms/admin/awrsqrpt.sql • Information on the AWR Repository: SQL> @$ORACLE_HOME/rdbms/admin/awrinfo.sql
  • 82. Sensitive information can be hidden from diagnostics Machine learning algorithms determine sensitive data like: • Host names • IP addresses • MAC addresses • Oracle Database names • Tablespace names • Service names • Ports • Operating system user names Sanitize or mask sensitive information
  • 83. Add -sanitize or -mask to any command • -sanitize replaces a sensitive value with random characters • myhost123 >>>> JnsF3km9 • -mask replaces a sensitive value with a series of 'X' • myhost123 >>>> XXXXXXXX Sanitize or mask sensitive information
  • 84. Sanitized hostname tfactl orachk -preupgrade -sanitize
  • 85. tfactl orachk -rmap qzh024703246tsa1 TFA using ORAchk : /opt/oracle.ahf/orachk/orachk ___________________________________________________________________________ | Entity Type | Substituted Entity Name | Original Entity Name | ___________________________________________________________________________ | hostname | qzh024703246tsa1 | myserver1 | ___________________________________________________________________________ Reverse map the sanitization
  • 88. Repair command Check IDCheck ID Repair command
  • 89. Understand what the repair command does Understand what the repair command will do with: tfactl orachk -showrepair 8300E0A2FFE48253E053D298EB0A76CC TFA using ORAchk : /opt/oracle.ahf/orachk/orachk Repair Command: currentUserName=$(whoami) if [ "$currentUserName" = "root" ] then repair_report=$(rpm -e stix-fonts 2>&1) else repair_report="$currentUserName does not have priviedges to run $CRS_HOME/bin/crsctl set resource use 1" fi echo -e "$repair_report"
  • 90. Run the repair command Run the checks again and repair everything that fails Run the checks again and repair only the specified checks Run the checks again and repair all checks listed in the file tfactl orachk -repaircheck all tfactl orachk -repaircheck <check_id_1>,<check_id_2> tfactl orachk -repaircheck <file>
  • 91. Find if anything has changed
  • 92. tfactl changes Output from host : myserver69 ------------------------------ [Oct/01/2020 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024 [Oct/01/2020 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259 [Oct/01/2020 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1 [Oct/01/2020 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 => 158 [Oct/01/2020 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9- 40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab [Oct/01/2020 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880 [Oct/01/2020 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 => 1562320 768555 [Oct/01/2020 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18 [Oct/01/2020 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31- 459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e ... Has anything changed recently?
  • 93. ... Output from host : myserver70 ------------------------------ [Oct/01/2020 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024 [Oct/01/2020 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259 [Oct/01/2020 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1 [Oct/01/2020 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 => 158 [Oct/01/2020 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9- 40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab [Oct/01/2020 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880 [Oct/01/2020 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 => 1562320 768555 [Oct/01/2020 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18 [Oct/01/2020 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31- 459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e [Oct/01/2020 16:56:15.398]: Parameter: fs.aio-nr: Value: 97024 => 98560 Has anything changed recently?
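Much of the `tfactl changes` output above is routine kernel counter churn (for example kernel.random.uuid changes on every read), which can bury a meaningful change. The Python sketch below parses lines in the format shown and drops parameters listed in an ignore set. The choice of which parameters count as noise is an assumption made for illustration, as are the names `parse_changes` and `NOISY_PARAMETERS`; the sample lines are copied from the output above with the wrapped UUID rejoined.

```python
import re

# Format as shown in the sample output:
# [<timestamp>]: Parameter: <name>: Value: <old> => <new>
CHANGE_LINE = re.compile(
    r"\[(?P<when>[^\]]+)\]: Parameter: (?P<name>\S+): "
    r"Value: (?P<old>.+?) => (?P<new>.+)$"
)

# Assumption: these parameters change constantly and are rarely interesting.
NOISY_PARAMETERS = {"kernel.random.uuid", "kernel.random.entropy_avail"}

SAMPLE = """\
[Oct/01/2020 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
[Oct/01/2020 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259
[Oct/01/2020 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1
[Oct/01/2020 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 => 158
[Oct/01/2020 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9-40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab
"""


def parse_changes(text, ignore=NOISY_PARAMETERS):
    """Return (timestamp, parameter, old, new) tuples, skipping ignored names."""
    changes = []
    for line in text.splitlines():
        match = CHANGE_LINE.match(line.strip())
        if match is None:
            continue  # host banners, separators, "..." lines
        if match["name"] in ignore:
            continue
        changes.append((match["when"], match["name"], match["old"], match["new"]))
    return changes


for when, name, old, new in parse_changes(SAMPLE):
    print(f"{when}  {name}: {old} -> {new}")
```

Passing `ignore=set()` keeps every parameter, which is useful when checking whether the ignore list itself hides something relevant.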
  • 94. Pre and post upgrade compliance checking
  • 95. ORAchk/EXAchk provides a single source for all upgrade checks ORAchk checks EXAchk checks Database AutoUpgrade checks Cluster Verification Utility (CVU) checks Compare Contrast Combine Consolidate Resulting ORAchk / EXAchk checks
  • 96. ORAchk/EXAchk provides a single source for all upgrade checks To check an environment before upgrading run: tfactl <orachk|exachk> -preupgrade To check an environment after upgrade run: tfactl <orachk|exachk> -postupgrade
  • 97. Detect and collect using SRDC
  • 98. Other Server Technology Enterprise Manager Data Guard GoldenGate Exalogic Database areas Errors / Corruption Performance Install / patching / upgrade RAC / Grid Infrastructure Import / Export RMAN Transparent Data Encryption Storage / partitioning Undo / auditing Listener / naming services Spatial / XDB Some problem areas covered in SRDCs Full list in documentation Around 100 problem types covered tfactl diagcollect -srdc <srdc_type> [-sr <sr_number>]
  • 99. Manual collection vs TFA SRDC for database performance Manual method: 1. Generate ADDM reviewing Document 1680075.1 (multiple steps) 2. Identify “good” and “problem” periods and gather AWR reviewing Document 1903158.1 (multiple steps) 3. Generate AWR compare report (awrddrpt.sql) using “good” and “problem” periods 4. Generate ASH report for “good” and “problem” periods reviewing Document 1903145.1 (multiple steps) 5. Collect OSWatcher data reviewing Document 301137.1 (multiple steps) 6. Collect Hang Analyze output at Level 4 7. Generate SQL Healthcheck for problem SQL id using Document 1366133.1 (multiple steps) 8. Run support provided sql scripts – Log File sync diagnostic output using Document 1064487.1 (multiple steps) 9. Check alert.log if there are any errors during the “problem” period 10. Find any trace files generated during the “problem” period 11. Collate and upload all the above files/outputs to SR TFA SRDC: 1. Run tfactl diagcollect -srdc dbperf [-sr <sr_number>]
  • 100. tfactl diagcollect -srdc <srdc_type> • Scans system to identify recent events • Once the relevant event is chosen, proceeds with diagnostic collection One command SRDC tfactl diagcollect -srdc ORA-00600 Enter the time of the ORA-00600 [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] : Enter the Database Name [<RETURN>=ALL] : 1. Oct/01/2020 05:29:58 : [orcl2] ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], [] 2. Oct/01/2020 06:55:08 : [orcl2] ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], [], [], [], [], [] Please choose the event : 1-2 [1] Selected value is : 1 (Oct/01/2020 05:29:58 )
  • 101. All required files are identified • Trimmed where applicable • Packaged in a zip ready to provide to support One command SRDC ... 2020/10/01 06:14:24 EST : Getting List of Files to Collect 2020/10/01 06:14:27 EST : Trimming file : myserver1/rdbms/orcl2/orcl2/trace/orcl2_lmhb_3542.trc with original file size : 163MB ... 2020/10/01 06:14:58 EST : Total time taken : 39s 2020/10/01 06:14:58 EST : Completed collection of zip files. ... /opt/oracle.ahf/data/repository/srdc_ora600_collection_Thu_Oct_1_06_14_17_EST_2020_node_local/myserver1.tfa_srdc_ora600_Thu_Oct_1_06_14_17_EST_2020.zip
  • 103. Automatic Database Log Purge TFA can automatically purge database logs: tfactl set manageLogsAutoPurge=ON • Purging automatically removes logs older than 30 days; configurable with: tfactl set manageLogsAutoPurgePolicyAge=<n><d|h> • Purging runs every 60 minutes; configurable with: tfactl set manageLogsAutoPurgeInterval=<minutes>
  • 104. Manual Database Log Purge TFA can manage ADR log and trace files tfactl managelogs <options> -show usage #Show disk space usage per diagnostic directory for both GI and database logs -show variation -older <n><m|h|d> #Show disk space growth for specified period -purge -older <n><m|h|d> #Remove ADR files older than the time specified -gi #Restrict command to only files under the GI_BASE -database [all | dbname] #Restrict command to only files under the database directory -dryrun #Use with -purge to estimate how many files will be affected and how much disk space will be freed by a potential purge command
  • 105. tfactl managelogs -show usage ... .---------------------------------------------------------------------------------. | Grid Infrastructure Usage | +---------------------------------------------------------------------+-----------+ | Location | Size | +---------------------------------------------------------------------+-----------+ | /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/alert | 28.00 KB | | /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/incident | 4.00 KB | | /u01/app/crsusr/diag/afdboot/user_root/host_309243680_94/trace | 8.00 KB | ... +---------------------------------------------------------------------+-----------+ | Total | 739.06 MB | '---------------------------------------------------------------------+-----------' ... Understand Database log disk space usage Use -gi to only show grid infrastructure
  • 106. ... .---------------------------------------------------------------. | Database Homes Usage | +---------------------------------------------------+-----------+ | Location | Size | +---------------------------------------------------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/alert | 1.06 MB | | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/incident | 4.00 KB | | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/trace | 146.19 MB | | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/cdump | 4.00 KB | | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/hm | 4.00 KB | +---------------------------------------------------+-----------+ | Total | 147.26 MB | '---------------------------------------------------+-----------' Understand Database log disk space usage Use -database to only show database
  • 107. Understand Database log disk space usage variations tfactl managelogs -show variation -older 30d Output from host : myserver74 ------------------------------ 2020-10-01 12:30:42: INFO Checking space variation for 30 days .---------------------------------------------------------------------------------------------. | Grid Infrastructure Variation | +---------------------------------------------------------------------+-----------+-----------+ | Directory | Old Size | New Size | +---------------------------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/asm/user_root/host_309243680_96/alert | 22.00 KB | 28.00 KB | +---------------------------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/clients/user_crsusr/host_309243680_96/cdump | 4.00 KB | 4.00 KB | +---------------------------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/tnslsnr/myserver74/listener/alert | 15.06 MB | 244.10 MB | +---------------------------------------------------------------------+-----------+-----------+ ...
  • 108. Understand Database log disk space usage variations ... .---------------------------------------------------------------------------. | Database Homes Variation | +---------------------------------------------------+-----------+-----------+ | Directory | Old Size | New Size | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/hm | 4.00 KB | 4.00 KB | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/trace | 16.63 MB | 146.19 MB | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/cdump | 4.00 KB | 4.00 KB | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/incident | 4.00 KB | 4.00 KB | +---------------------------------------------------+-----------+-----------+ | /u01/app/crsusr/diag/rdbms/cdb674/CDB674/alert | 1.06 MB | 1.06 MB | '------------------------------------------------------------+-------------+-------------'
  • 109. Run a database log purge dryrun tfactl managelogs -purge -older 30d -dryrun Output from host : myserver74 ------------------------------ Estimating files older than 30 days Estimating purge for diagnostic destination "diag/afdboot/user_root/host_309243680_94" for files ~ 2 files deleted , 22.58 KB freed ] Estimating purge for diagnostic destination "diag/afdboot/user_crsusr/host_309243680_94" for files ~ 2 files deleted , 11.72 KB freed ] Estimating purge for diagnostic destination "diag/asmtool/user_root/host_309243680_96" for files ~ 2 files deleted , 21.36 KB freed ] Estimating purge for diagnostic destination "diag/asmtool/user_crsusr/host_309243680_96" for files ~ 3 files deleted , 23.22 KB freed ] Estimating purge for diagnostic destination "diag/tnslsnr/myserver74/listener" for files ~ 23 files deleted , 225.33 MB freed ] Estimating purge for diagnostic destination "diag/diagtool/user_root/adrci_309243680_96" for files ~ 73 files deleted , 517.69 KB freed ] Estimating purge for diagnostic destination "diag/clients/user_crsusr/host_309243680_96" for files ~ 38 files deleted , 17.15 KB freed ] Estimating purge for diagnostic destination "diag/asm/+asm/+ASM" for files ~ 0 files deleted , 0 bytes freed ] Estimating purge for diagnostic destination "diag/asm/user_root/host_309243680_96" for files ~ 1 files deleted , 19.52 KB freed ] Estimating purge for diagnostic destination "diag/asm/user_crsusr/host_309243680_96" for files ~ 1 files deleted , 20.25 KB freed ] Estimating purge for diagnostic destination "diag/crs/myserver74/crs" for files ~ 40 files deleted , 219.39 MB freed ] Estimation for Grid Infrastructure [ Files to delete : ~ 185 files | Space to be freed : ~ 445.36 MB ] Estimating purge for diagnostic destination "diag/rdbms/cdb674/CDB674" for files ~ 27760 files deleted , 66.57 MB freed ] Estimation for Database Home [ Files to delete : ~ 27760 files | Space to be freed : ~ 66.57 MB ]
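Before running a real purge it can help to cross-check the dry-run estimate. The Python sketch below parses the Grid Infrastructure estimate lines shown above, totals the file counts and sizes, and compares them with the reported summary of about 185 files and 445.36 MB. It assumes KB and MB are binary units (1024-based), which is what makes the totals agree for this sample; `summarize_dryrun` is a hypothetical helper name.

```python
import re

ESTIMATE = re.compile(
    r'destination "([^"]+)" for files ~ (\d+) files deleted , '
    r"([\d.]+) (bytes|KB|MB|GB) freed"
)

# Assumption: sizes are reported in binary units.
UNIT_BYTES = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

PREFIX = "Estimating purge for diagnostic destination "
SAMPLE = "\n".join(
    f'{PREFIX}"{dest}" for files ~ {files} files deleted , {size} freed ]'
    for dest, files, size in [
        ("diag/afdboot/user_root/host_309243680_94", 2, "22.58 KB"),
        ("diag/afdboot/user_crsusr/host_309243680_94", 2, "11.72 KB"),
        ("diag/asmtool/user_root/host_309243680_96", 2, "21.36 KB"),
        ("diag/asmtool/user_crsusr/host_309243680_96", 3, "23.22 KB"),
        ("diag/tnslsnr/myserver74/listener", 23, "225.33 MB"),
        ("diag/diagtool/user_root/adrci_309243680_96", 73, "517.69 KB"),
        ("diag/clients/user_crsusr/host_309243680_96", 38, "17.15 KB"),
        ("diag/asm/+asm/+ASM", 0, "0 bytes"),
        ("diag/asm/user_root/host_309243680_96", 1, "19.52 KB"),
        ("diag/asm/user_crsusr/host_309243680_96", 1, "20.25 KB"),
        ("diag/crs/myserver74/crs", 40, "219.39 MB"),
    ]
)


def summarize_dryrun(text):
    """Return (total_files, total_bytes) across all estimate lines."""
    total_files = 0
    total_bytes = 0.0
    for _dest, files, size, unit in ESTIMATE.findall(text):
        total_files += int(files)
        total_bytes += float(size) * UNIT_BYTES[unit]
    return total_files, total_bytes


files, size_bytes = summarize_dryrun(SAMPLE)
print(f"{files} files, {size_bytes / 1024**2:.2f} MB")
```

In this sample the listener and crs destinations account for nearly all of the space, so sorting destinations by size is a natural next step when deciding what to purge.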
  • 110. Run a database log purge tfactl managelogs -purge -older 30d Output from host : myserver74 ------------------------------ Purging files older than 30 days Cleaning Grid Infrastructure destinations Purging diagnostic destination "diag/afdboot/user_root/host_309243680_94" for files - 0 files deleted , 0 bytes freed Purging diagnostic destination "diag/afdboot/user_crsusr/host_309243680_94" for files - 1 files deleted , 10.16 KB freed Purging diagnostic destination "diag/asmtool/user_root/host_309243680_96" for files - 1 files deleted , 10.16 KB freed Purging diagnostic destination "diag/asmtool/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed Purging diagnostic destination "diag/tnslsnr/myserver74/listener" for files - 2 files deleted , 29.18 KB freed Purging diagnostic destination "diag/diagtool/user_root/adrci_309243680_96" for files - 2 files deleted , 29.18 KB freed Purging diagnostic destination "diag/clients/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed Purging diagnostic destination "diag/asm/+asm/+ASM" for files - 2 files deleted , 29.18 KB freed Purging diagnostic destination "diag/asm/user_root/host_309243680_96" for files - 2 files deleted , 29.18 KB freed Purging diagnostic destination "diag/asm/user_crsusr/host_309243680_96" for files - 2 files deleted , 29.18 KB freed Purging diagnostic destination "diag/crs/myserver74/crs" for files - 2 files deleted , 29.18 KB freed ...
  • 111. Run a database log purge ... Grid Infrastructure [ Files deleted : 18 files | Space Freed : 253.75 KB ] .-----------------------------------------------------------------------------------------------. | File System Variation : /u01/app/crsusr/12.2.0/grid2 | +--------+-----------------------------------+----------+----------+---------+----------+-------+ | State | Name | Size | Used | Free | Capacity | Mount | +--------+-----------------------------------+----------+----------+---------+----------+-------+ | Before | /dev/mapper/vg_rws1270665-lv_root | 51475068 | 46597152 | 2256476 | 96% | / | | After | /dev/mapper/vg_rws1270665-lv_root | 51475068 | 46597152 | 2256476 | 96% | / | '--------+-----------------------------------+----------+----------+---------+----------+-------'
  • 113. tail files tfactl tail alert Output from host : myserver69 ------------------------------ /scratch/app/11.2.0.4/grid/log/myserver69/alertmyserver69.log 2020-10-01 23:28:22.532: [ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode. 2020-10-01 23:58:22.964: [ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode. ...
  • 114. tail files ... /scratch/app/oradb/diag/rdbms/apxcmupg/apxcmupg_2/trace/alert_apxcmupg_2.log Thu Oct 01 06:00:00 2020 VKRM started with pid=82, OS id=4903 Thu Oct 01 06:00:02 2020 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Thu Oct 01 06:00:37 2020 End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Thu Oct 01 23:00:28 2020 Thread 2 advanced to log sequence 759 (LGWR switch) Current log# 3 seq# 759 mem# 0: +DATA/apxcmupg/onlinelog/group_3.289.917164707 Current log# 3 seq# 759 mem# 1: +FRA/apxcmupg/onlinelog/group_3.289.917164707 ...
  • 115. tail files ... /scratch/app/oradb/diag/rdbms/ogg11204/ogg112041/trace/alert_ogg112041.log Clearing Resource Manager plan via parameter Thu Oct 01 05:59:59 2020 Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter Thu Oct 01 05:59:59 2020 Starting background process VKRM Thu Oct 01 05:59:59 2020 VKRM started with pid=36, OS id=4901 Thu Oct 01 22:00:31 2020 Thread 1 advanced to log sequence 305 (LGWR switch) Current log# 1 seq# 305 mem# 0: +DATA/ogg11204/redo01.log ...
  • 116. tail files ... /scratch/app/oragrid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log <== Thu Oct 01 04:42:22 2020 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 2323] opening OCR file Thu Oct 01 01:05:39 2020 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16591] opening OCR file Thu Oct 01 01:05:41 2020 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16603] opening OCR file Thu Oct 01 01:21:12 2020 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1803] opening OCR file Thu Oct 01 01:21:12 2020 NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1816] opening OCR file ...
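The tail output above mixes Clusterware, database and ASM alert logs, so a quick tally of message codes can help decide where to look first. This is a minimal Python sketch under stated assumptions: it counts `CRS-` codes, which appear in the sample, and `ORA-` codes, which do not appear in the sample and are included only as an assumption about what a database alert log may contain. `count_codes` is a hypothetical helper, not a TFA feature.

```python
import re
from collections import Counter

# CRS- codes appear in the sample output; ORA- is an added assumption.
CODE_RE = re.compile(r"\b(?:CRS|ORA)-\d{4,5}\b")


def count_codes(lines):
    """Count occurrences of each message code across the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(CODE_RE.findall(line))
    return counts


# Lines copied (and shortened) from the sample tail output.
sample = """\
2020-10-01 23:28:22.532: [ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time.
2020-10-01 23:58:22.964: [ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time.
Thu Oct 01 23:00:28 2020 Thread 2 advanced to log sequence 759 (LGWR switch)
"""

counts = count_codes(sample.splitlines())
for code, n in counts.most_common():
    print(code, n)
```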
  • 118. Near real-time Database monitoring • Single instance & RAC • Monitoring current database activities • Database performance • Identifying contention and bottlenecks • Process & SQL Monitoring • Real time wait events • Active Data Guard support • Multitenant Database (CDB) support oratop (Support Tools Bundle)
  • 119. Monitor Database performance tfactl run oratop -database ogg19c
  • 120. Monitor Database performance Section 1 DATABASE: Global database information Section 2 INSTANCE: Database instance activity Section 3 EVENT: AWR-like “Top 5 Timed Events” Section 4 PROCESS | SQL: Processes or SQL mode information More info: My Oracle Support Doc ID 1500864.1
  • 122. Collect & Archive OS Metrics • Executes standard UNIX utilities (e.g. vmstat, iostat, ps) at regular intervals • Built-in Analyzer functionality to summarize, graph and report upon collected metrics • Output is required for node reboot and performance issues • Simple to install, extremely lightweight • Runs on all platforms (except Windows) OS Watcher (Support Tools Bundle)
  • 123. Analyse OS Metrics tfactl run oswbb Starting OSW Analyzer V8.4.0 OSWatcher Analyzer Written by Oracle Center of Expertise Copyright (c) 2020 by Oracle Corporation Parsing Data. Please Wait... Scanning file headers for version and platform info... Parsing file rws1270069_iostat_18.11.24.0900.dat ... Parsing file rws1270069_iostat_18.11.24.1000.dat ... ...
  • 124. Analyse OS Metrics ... Enter 1 to Display CPU Process Queue Graphs Enter 2 to Display CPU Utilization Graphs Enter 3 to Display CPU Other Graphs Enter 4 to Display Memory Graphs Enter 5 to Display Disk IO Graphs Enter GC to Generate All CPU Gif Files Enter GM to Generate All Memory Gif Files Enter GD to Generate All Disk Gif Files Enter GN to Generate All Network Gif Files Enter L to Specify Alternate Location of Gif Directory Enter Z to Zoom Graph Time Scale (Does not change analysis dataset) ...
  • 125. Analyse OS Metrics ... Enter B to Returns to Baseline Graph Time Scale (Does not change analysis dataset) Enter R to Remove Currently Displayed Graphs Enter X to Export Parsed Data to Flat File Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale) Enter A to Analyze Data Enter D to Generate DashBoard Enter Q to Quit Program Please Select an Option:1
  • 129. Generates view of Cluster and Database diagnostic metrics • Always on - Enabled by default • Provides Detailed OS Resource Metrics • Assists Node eviction analysis • Locally logs all process data • User can define pinned processes • Listens to CSS and GIPC events • Categorizes processes by type • Supports plug-in collectors (e.g. traceroute, netstat, ping) • New CSV output for ease of analysis Cluster Health Monitor (CHM) [Diagram: osysmond on each node sends OS data to ologgerd (master), which stores it in the 12c Grid Infrastructure Management Repository (GIMR)]
  • 130. Cluster Health Monitor (CHM) Oclumon CLI or full integration with EM Cloud Control
  • 131. Always on - Enabled by default • Detects node and database performance problems • Provides early-warning alerts and corrective action • Supports on-site calibration to improve sensitivity • Integrated into EMCC Incident Manager and notifications • Standalone Interactive GUI Tool Cluster Health Advisor (CHA)* [Diagram labels: OS Data, DB Data, CHM, ochad, GIMR, Node Health Prognostics Engine, Database Health Prognostics Engine] * Requires and included with RAC or RAC One Node (R1N) license
  • 132. Choosing a Data Set for Calibration – Defining “normal” Calibrating CHA to your RAC deployment chactl query calibration -cluster -timeranges 'start=2020-10-01 07:00:00,end=2020-10-01 13:00:00' Cluster name : mycluster Start time : 2020-10-01 07:00:00 End time : 2020-10-01 13:00:00 Total Samples : 11524 Percentage of filtered data : 100% 1) Disk read (ASM) (Mbyte/sec) MEAN MEDIAN STDDEV MIN MAX 0.11 0.00 2.62 0.00 114.66 <25 <50 <75 <100 >=100 99.87% 0.08% 0.00% 0.02% 0.03% ...
  • 133. Choosing a Data Set for Calibration – Defining “normal” Calibrating CHA to your RAC deployment ... 2) Disk write (ASM) (Mbyte/sec) MEAN MEDIAN STDDEV MIN MAX 0.01 0.00 0.15 0.00 6.77 <50 <100 <150 <200 >=200 100.00% 0.00% 0.00% 0.00% 0.00% ...
  • 134. Choosing a Data Set for Calibration – Defining “normal” Calibrating CHA to your RAC deployment ... 3) Disk throughput (ASM) (IO/sec) MEAN MEDIAN STDDEV MIN MAX 2.20 0.00 31.17 0.00 1100.00 <5000 <10000 <15000 <20000 >=20000 100.00% 0.00% 0.00% 0.00% 0.00% 4) CPU utilization (total) (%) MEAN MEDIAN STDDEV MIN MAX 9.62 9.30 7.95 1.80 77.90 <20 <40 <60 <80 >=80 92.67% 6.17% 1.11% 0.05% 0.00% ...
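Each metric in the calibration report above is summarized the same way: five summary statistics followed by the share of samples falling into disjoint threshold bins (the percentages on each row add up to 100%). The Python sketch below reproduces that shape on made-up numbers to make the report easier to read. It is an illustration only, not CHA's implementation: `summarize` is a hypothetical helper, the sample values are invented, and the use of the population standard deviation is an assumption because the report does not say which variant it uses.

```python
import statistics
from bisect import bisect_right


def summarize(samples, edges):
    """Summarize samples the way the calibration report lays them out.

    edges such as [25, 50, 75, 100] produce the bins
    <25, <50, <75, <100 and >=100, each bin excluding the lower ones.
    """
    counts = [0] * (len(edges) + 1)
    for x in samples:
        # bisect_right puts a value equal to an edge into the next bin up.
        counts[bisect_right(edges, x)] += 1
    n = len(samples)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        # Assumption: population standard deviation.
        "stddev": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
        "bins_pct": [100.0 * c / n for c in counts],
    }


# Invented sample values, for illustration only.
summary = summarize([0, 10, 30, 60, 120], [25, 50, 75, 100])
print(summary)
```

A time range is a reasonable definition of "normal" when most samples sit in the lowest bins and the maximum is not far outside typical peaks; a range dominated by outliers is a poor calibration set.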
  • 135. Calibrating CHA to your RAC deployment • Create and store a new model: chactl calibrate cluster -model daytime -timeranges 'start=2020-10-01 07:00:00,end=2020-10-01 13:00:00' • Begin using the new model: chactl monitor cluster -model daytime • Confirm the new model is working: chactl status -verbose monitoring nodes svr01, svr02 using model daytime monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
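The `-timeranges` value is easy to get wrong by hand (quoting, spacing, start after end). Below is a small Python sketch that builds the argument list for a single time range without running anything. It assumes the `chactl calibrate cluster -model ... -timeranges 'start=...,end=...'` form; the helper names are hypothetical and only one range is supported, since the slide shows only one.

```python
import shlex
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp layout used on the slide


def timerange_arg(start, end):
    """Format one start/end pair as a -timeranges value."""
    if end <= start:
        raise ValueError("end must be later than start")
    return f"start={start.strftime(FMT)},end={end.strftime(FMT)}"


def calibrate_cluster_argv(model, start, end):
    """Build (but do not run) the calibrate command as an argument list."""
    return [
        "chactl", "calibrate", "cluster",
        "-model", model,
        "-timeranges", timerange_arg(start, end),
    ]


argv = calibrate_cluster_argv(
    "daytime",
    datetime(2020, 10, 1, 7, 0, 0),
    datetime(2020, 10, 1, 13, 0, 0),
)
# shlex.join quotes the time range so it can be pasted into a shell.
print(shlex.join(argv))
```

Passing the list to `subprocess.run` instead of a shell string would avoid quoting issues entirely.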
  • 136. Command line operations • Enable CHA monitoring on RAC database with optional model: chactl monitor database -db oltpacdb [-model model_name] • Check monitoring status with optional verbose output: chactl status -verbose monitoring nodes svr01, svr02 using model DEFAULT_CLUSTER monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
  • 137. Check for Health Issues and Corrective Actions with CHACTL QUERY DIAGNOSIS Command line operations chactl query diagnosis -db oltpacdb -start "2020-10-01 01:42:50" -end "2020-10-01 03:19:15" 2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected] 2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected] 2020-10-01 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected] 2020-10-01 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected] Problem: DB Control File IO Performance Description: CHA has detected that reads or writes to the control files are slower than expected. Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were slow because of an increase in disk IO. The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance. Action: Separate the control files from other database files and move them to faster disks or Solid State Devices. Problem: DB Log File Switch Description: CHA detected that database sessions are waiting longer than expected for log switch completions. Cause: The Cluster Health Advisor (CHA) detected high contention during log switches because the redo log files were small and the redo logs switched frequently. Action: Increase the size of the redo logs.
  • 138. HTML diagnostic health output available (-html <file_name>) Command line operations
  • 139. Diagnose cluster health chactl query diagnosis -db oltpacdb -start "2020-10-01 01:42:50.0" -end "2020-10-01 03:19:15.0" 2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected] 2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected] 2020-10-01 02:52:15.0 Database oltpacdb DB CPU Utilization (oltpacdb_2) [detected] 2020-10-01 02:52:50.0 Database oltpacdb DB CPU Utilization (oltpacdb_1) [detected] 2020-10-01 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected] 2020-10-01 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
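The diagnosis output lists one event per instance, so the same problem shows up several times. Grouping events by problem makes it clear which issues affect every instance and in what order they appeared. The Python sketch below parses event lines in the shape shown above. It is an illustration rather than an Oracle-provided parser, the function names are hypothetical, and the regular expression is derived only from the sample lines.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern derived from the sample event lines; other layouts are not handled.
EVENT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"Database (?P<db>\S+) (?P<problem>.+?) "
    r"\((?P<instance>[^)]+)\) \[(?P<state>\w+)\]"
)


def parse_diagnosis(text):
    """Return one dict per event line found in the text."""
    events = []
    for m in EVENT_RE.finditer(text):
        events.append({
            "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"),
            "db": m.group("db"),
            "problem": m.group("problem"),
            "instance": m.group("instance"),
            "state": m.group("state"),
        })
    return events


def instances_by_problem(events):
    """Map each problem name to the affected instances, in time order."""
    grouped = defaultdict(list)
    for e in sorted(events, key=lambda e: e["time"]):
        grouped[e["problem"]].append(e["instance"])
    return dict(grouped)


# Event lines copied from the sample output.
sample = """\
2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2020-10-01 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2020-10-01 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2020-10-01 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
"""

events = parse_diagnosis(sample)
by_problem = instances_by_problem(events)
for problem, instances in by_problem.items():
    print(problem, "->", ", ".join(instances))
```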