blog.tanelpoder.com 1
Troubleshooting Yet Another Complex
Performance Issue
Tanel Põder
https://blog.tanelpoder.com
@tanelpoder
blog.tanelpoder.com 2
Let's get started!
This is a reconstruction using the real problem
screenshots and AWR/ASH reports, plus follow-up
experiments on my test environment.
blog.tanelpoder.com 3
Symptoms
• Some queries are intermittently extremely slow
• Waits in the Concurrency wait class go through the roof
• Performance returns to normal by itself
blog.tanelpoder.com 4
The AWR approach
AWR isn't that
great for short
spikes
blog.tanelpoder.com 5
The AWR approach
Average wait of
28ms for a latch
is very long!
Could it be CPU
starvation
(scheduling
latency)?
blog.tanelpoder.com 6
The AWR approach
Plenty of idle CPU
time (remember this
is an average over
a 30-minute AWR
period!)
Warning, on Linux
the LOAD also
includes processes
waiting for disk IO!
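To see why a 30-minute average can hide a short spike, here is a small Python sketch. All numbers are made up for illustration (a host idling at 20% with a 2-minute burst at 100%); none of them come from the AWR report.

```python
# Hypothetical per-minute CPU utilization (%) over one 30-minute AWR interval.
# Assumption: the host sits at 20% except for a 2-minute spike at 100%.
per_minute_util = [20.0] * 30
per_minute_util[12] = 100.0
per_minute_util[13] = 100.0

# What an interval-level report effectively shows: one averaged number.
average = sum(per_minute_util) / len(per_minute_util)
# What the sessions actually experienced during the worst minute.
peak = max(per_minute_util)

print(f"30-minute average: {average:.1f}%")
print(f"worst minute:      {peak:.1f}%")
```

The averaged figure looks harmless even though the host was saturated for two minutes, which is why "plenty of idle CPU" in AWR does not rule out a brief CPU shortage.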
blog.tanelpoder.com 7
The AWR approach
Plenty of Idle
CPU cycles
blog.tanelpoder.com 8
The AWR approach
The IOWAIT is
IDLE as far as
CPU utilization is
concerned!!!
IOWAIT is a
flawed metric (I
don't use it)
blog.tanelpoder.com 9
The AWR approach
Can't really systematically
figure out which SQL_IDs
are responsible for the
waits
blog.tanelpoder.com 10
The AWR approach
Could lots of LIOs be the
cause for the extreme
latch contention?
blog.tanelpoder.com 11
The AWR approach
Again, we do not know
who is causing the
contention, or why...
blog.tanelpoder.com 12
The AWR approach
Nothing special here
either. kcbgtcr is the
Consistent Read LIO
function
blog.tanelpoder.com 13
The ASH approach
ASH links together many
facts about what a session
is doing (wait event, sql_id,
username, block# etc.)
ASH also has 1-second
measurement granularity,
and we can query it
manually
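Because ASH is a sampled history, time is estimated by counting samples: each sample stands for roughly one sampling interval of session time. A minimal Python sketch of that arithmetic follows; the sample rows are invented, and the function name `seconds_by_event` is hypothetical.

```python
from collections import Counter

def seconds_by_event(samples, sample_interval_s=1):
    """Estimate seconds per (session_state, event) as sample count x interval."""
    counts = Counter((s["session_state"], s["event"]) for s in samples)
    # most_common() orders the profile from the biggest consumer down.
    return {key: n * sample_interval_s for key, n in counts.most_common()}

# Hypothetical samples for illustration only.
samples = (
    [{"session_state": "WAITING", "event": "latch: cache buffers chains"}] * 6
    + [{"session_state": "ON CPU", "event": None}] * 3
    + [{"session_state": "WAITING", "event": "db file sequential read"}] * 1
)

for (state, event), secs in seconds_by_event(samples).items():
    print(f"{secs:4d}s  {state:8s} {event or ''}")
```

With `sample_interval_s=1` this matches the in-memory ASH granularity. DBA_HIST_ACTIVE_SESS_HISTORY keeps roughly every tenth sample, so the same count is multiplied by 10 there, as in the `COUNT(*) * 10` expression on slide 19.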
blog.tanelpoder.com 14
Drill down into Concurrency wait class with ASH:
99% of
Concurrency
waits are due to a
CBC (cache buffers chains) latch!
blog.tanelpoder.com 15
All due to SELECT
statements (73
different SQL_IDs
spotted)
blog.tanelpoder.com 16
One latch or many?
The P1 is the
latch address!
Now we know that
almost all of the
latch contention
was against a
single child latch!
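The "one latch or many?" check amounts to grouping the latch wait samples by P1. A rough Python sketch, assuming P1 arrives as a decimal number and that addresses are 32-bit as in the listings of this deck; only the address 385BA5C4 appears in the slides, the other two addresses and all counts are invented.

```python
from collections import Counter

def waits_per_latch_address(p1_values):
    """Count wait samples per latch address, rendered as zero-padded hex."""
    return Counter(f"{p1:08X}" for p1 in p1_values)

# Hypothetical P1 values from "latch: cache buffers chains" wait samples.
hot_child = int("385BA5C4", 16)          # child address shown on slide 20
p1_values = (
    [hot_child] * 97
    + [int("385BB210", 16)] * 2          # invented address
    + [int("3861F0A8", 16)]              # invented address
)

for addr, n in waits_per_latch_address(p1_values).most_common():
    print(f"{addr}  {n:3d} samples  {100 * n / len(p1_values):.0f}%")
```

If one address dominates the profile, the contention is against a single child latch (a hot hash chain or a long latch holder) rather than a generally overloaded buffer cache.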
blog.tanelpoder.com 17
So, who's blocking us?
Bummer, we
know the blocker
for only 1% of
the latch waits!
So, we don't know
who blocked us for
99% of the waits...
This is a wait
interface
shortcoming...
blog.tanelpoder.com 18
The ASH approach
ASH report incorrectly
labels "unknown"
blocker as "Held
Shared"
Sometimes the latch
waits are extremely
long!
Let's pick the one
blocker reported and
hope it's relevant
blog.tanelpoder.com 19
What was the blocker doing?
SQL> SELECT session_state, event, sql_id, COUNT(*) * 10 seconds
FROM dba_hist_active_sess_history
WHERE session_id = 5914
AND sample_time BETWEEN <spike_start> AND <spike_end>
GROUP BY session_state, event, sql_id
ORDER BY seconds DESC;
%This  SESSION EVENT                                    SQL_ID
------ ------- ---------------------------------------- -------------
   67% ON CPU                                           2bdg4ygkpyxc9
   14% WAITING log file sequential read                 2bdg4ygkpyxc9
   10% ON CPU                                           fmdctt76kf3mb
    5% WAITING library cache: mutex X
    5% WAITING log buffer space                         2bdg4ygkpyxc9
It was a user
session by "JDBC
Thin Client"
Why would a
user session
read a redo log
file?!
blog.tanelpoder.com 20
Alternative: LatchProf Collector ("ASH" of Latch Holders)
SQL> SELECT latch_name, hold_mode, sid, COUNT(*)
FROM latchprof_view
WHERE latch_name = 'cache buffers chains'
AND child_address = '385BA5C4'
GROUP BY latch_name, hold_mode, sid
ORDER BY COUNT(*) DESC;
LATCH_NAME                     HOLD_MODE             SID   COUNT(*)
------------------------------ -------------- ---------- ----------
cache buffers chains           MAYBE-SHARED           19          3
cache buffers chains           SHARED                201          2
cache buffers chains           MAYBE-SHARED           78          2
cache buffers chains           EXCLUSIVE             201          2
cache buffers chains           MAYBE-SHARED          132          1
cache buffers chains           MAYBE-SHARED           79          1
Report holders
for only the child
latch involved in
the waits
• http://tech.e2sn.com/oracle/troubleshooting/latch-contention-troubleshooting
• http://blog.tanelpoder.com/files/scripts/tools/collectors/latchprof_install.sql
blog.tanelpoder.com 21
What are the reasons for reading a redo log file?
1. LGWR doing a log switch (but ours was a user session!)
2. A Streams/GoldenGate/LogMiner log mining operation?
• But this was a regular application SELECT query against normal tables
3. Manual dumping of redo log contents
• ALTER SYSTEM DUMP LOGFILE …
4. Automatic Block Media Recovery?
• http://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmblock.htm#BRADV118
• v$database_block_corruption was empty (IIRC)
• I thought to check the alert log for any corruption warnings…
blog.tanelpoder.com 22
alert.log
Mon Feb 24 15:54:18 2014
Dumping diagnostic data in directory=[cdmp_20140224155418], requested by (instance=1, osid=25519), summary=[incident=74688].
Mon Feb 24 15:56:58 2014
Errors in file /u01/app/oracle/diag/rdbms/lin112/LIN112/trace/LIN112_ora_25519.trc (incident=74689):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/lin112/LIN112/incident/incdir_74689/LIN112_ora_25519_i74689.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Feb 24 15:56:59 2014
Sweep [inc][74689]: completed
Sweep [inc2][74689]: completed
Wow, ORA-600s!
kdsgrp = Kernel
Data Get Row Piece
blog.tanelpoder.com 23
Process trace file – errorstack
Dump continued from file: /u01/app/oracle/diag/rdbms/lin112/.../LIN112_ora_25519.trc
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
----- Current SQL Statement for this session (sql_id=6wvgqn05s855u) -----
SELECT /*+ INDEX_RS_ASC(t) */ COUNT(LENGTH(owner)) FROM t_c_hotsos t WHERE object_id > 1
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
kgerinv()+41         call     kgerinv_internal()   11B60460 ? F6970020 ?
                                                   10809D5C ? 258 ? 0 ? 0 ?
kgeasnmierr()+47     call     kgerinv()            11B60460 ? F6970020 ?
                                                   10809D5C ? 0 ? FFB53640 ?
kdsgrp1_dump()+1138  call     kgeasnmierr()        11B60460 ? F6970020 ?
                                                   10809D5C ? 0 ?
kdsgrp1()+32         call     kdsgrp1_dump()       F663C9D0 ? F663C9D0 ? 0 ?
qetlbr()+247         call     kdsgrp1()            F663C9D0 ? F663C9D0 ? 0 ?
qertbFetchByRowID()  call     qetlbr()             F663C9D0 ? F6639F08 ?
+5617                                              2233A6E0 ? 0 ? F663C8FC ? 0 ?
                                                   0 ?
qergsFetch()+497     call     00000000             2233A6E0 ? F663C8D0 ?
                                                   1024AA74 ? FFB53A08 ? 7FFF ?
opifch2()+2659       call     00000000             2233A5E4 ? F663CB38 ?
Apparently the
failure in the kdsgrp1
function causes
a special dump
function to be called
This is a reproduced
test query
blog.tanelpoder.com 24
Process trace file – pinned buffer history
...
END OF PROCESS STATE
----- Pinned Buffer History -----
---------------------
PINNED BUFFER HISTORY (oldest pin first)
---------------------
BH (0x283f2628) file#: 1 rdba: 0x00430445 (1/197701) class: 1 ba: 0x28264000
set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25
dbwrid: 0 obj: 132533 objn: 132533 tsn: 0 afn: 1 hint: f
hash: [0x38563cf4,0x38563cf4] lru: [0x23be8998,0x323fa1c0]
ckptq: [NULL] fileq: [NULL] objq: [0x23be89b0,0x323fa1d8] objaq: [0x23be89b8,0x323fa1e0]
st: XCURRENT md: NULL fpin: 'kdiwh100: kdircys' tch: 3
flags: only_sequential_access
LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 0 rdba: 0x00430445 (1/197701)
scn: 0x0000.02919937 seq: 0x02 flg: 0x04 tail: 0x99370602
frmt: 0x02 chkval: 0x7eb5 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x28264000 to 0x28266000
28264000 0000A206 00430445 02919937 04020000 [....E.C.7.......]
28264010 00007EB5 00000002 000205B5 02919929 [.~..........)...]
28264020 00000000 00020002 00000000 00000000 [................]
28264030 00000000 00000000 00000000 00000000 [................]
Also a number of
recently accessed
buffers/blocks will
be dumped (from
memory)
You can issue
"dump_pinned_buffer_history"
manually too
blog.tanelpoder.com 25
Process trace file – buffer change history (REDO)
END OF PINNED BUFFER HISTORY
*** timestamp before redo dump: 02/24/2014 15:52:16
***********************************************
* Dump Online Redo for Buffers in Pin History *
***********************************************
$$$$$$$ Dump Online Redo for DBA list (tsn.rdba in hex):
0x0.00430445 0x9.020001f9 0x9.020001fa 0x9.020001fb 0x9.020001fc 0x9.020001fd
0x9.020001fe 0x9.020001ff 0x9.02000202 0x9.02000203 0x9.02000204:
DUMP REDO
Opcodes *.*
DBAs (file#, block#):
(1, 197701) (8, 505) (8, 506) (8, 507) (8, 508) (8, 509) (8, 510) (8, 511) (8, 514) .
SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
Times: 02/24/2014 14:52:16 thru eternity
REDO RECORD - Thread:1 RBA: 0x000a13.0000b070.0010 LEN: 0x10e0 VLD: 0x0d
SCN: 0x0000.0291d462 SUBSCN: 1 02/24/2014 15:13:56
(LWN RBA: 0x000a13.0000b070.0010 LEN: 0115 NST: 0001 SCN: 0x0000.0291d462)
CHANGE #1 TYP:2 CLS:1 AFN:2 DBA:0x00826444 OBJ:5823 SCN:0x0000.0291d2a7 SEQ:2 OP:11.2 ENC:0 RBL:0
KTB Redo
op: 0x01 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F xid: 0x0023.017.0000028a uba: 0x01c0156a.01c6.03
KDO Op code: IRP row dependencies Disabled
...
Oracle also
dumps the
online REDO
against the
recently pinned
blocks!!!
This is why the
blocking session
was waiting for the
log file sequential
read!
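The DBA list in this dump is printed twice: once as tsn.rdba in hex and once as (file#, block#). The two forms can be cross-checked by decoding the relative DBA. A small Python sketch, assuming smallfile tablespaces where the upper 10 bits of the 32-bit rdba hold the relative file number and the lower 22 bits hold the block number; `decode_rdba` is a hypothetical helper name.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative DBA into (relative file#, block#).

    Assumes a smallfile tablespace: upper 10 bits = file#, lower 22 = block#.
    """
    return rdba >> 22, rdba & 0x3FFFFF

# First three entries of the "tsn.rdba in hex" list from the trace above.
for text in ("0x0.00430445", "0x9.020001f9", "0x9.020001fa"):
    tsn_hex, rdba_hex = text.split(".")
    file_no, block_no = decode_rdba(int(rdba_hex, 16))
    print(f"tsn={int(tsn_hex, 16)} rdba=0x{rdba_hex} -> ({file_no}, {block_no})")
```

The decoded pairs line up with the (1, 197701), (8, 505), (8, 506) entries in the dump, and the first one is the same block whose buffer header appears in the pinned buffer history.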
blog.tanelpoder.com 26
A Corruption?
• So, do we have a corruption?
• No block checksum / checking errors
• dbverify, RMAN, DBMS_REPAIR tools didn't report a problem
• As this was an ORA-600 error, it's time to search My Oracle Support (MOS)
• "kdsgrp1 ora 600"
blog.tanelpoder.com 27
A bug!!!
blog.tanelpoder.com 28
Causal Chain
1. A session took a cache buffers chains latch and accessed a
buffer using the data layer function kdsgrp1
2. The session hit an ORA-600 due to a bug (not corruption)
3. As the data access function failed, an errorstack dump with
the recently accessed buffer dump was invoked
4. The recent buffer dump also read and dumped relevant
changes from the online redo logs (log file sequential read!)
5. The cache buffers chains latch was held until the end of the
dump!
blog.tanelpoder.com 29
Shouldn't we have seen buffer busy waits instead?
• With regular logical IOs the buffer contents are not read while
holding the CBC latch:
1. Take CBC latch in shared mode
2. Walk the buffer hash chain until you find the relevant buffer header
3. Upgrade the CBC latch to exclusive mode
4. Pin the buffer header
5. Release the CBC latch
6. Now access the buffer (call transaction, data layer etc)
7. Take the CBC latch again (in shared mode)
8. Unpin the buffer header
9. Release the CBC latch
If someone else
wants to pin the
buffer now, they'd
wait for buffer busy
waits
blog.tanelpoder.com 30
Sometimes "short" logical IOs can skip a few steps
• With "short" LIOs like unique index lookup LIOs (etc) Oracle
can avoid the buffer pinning codepath:
1. Take CBC latch in shared mode
2. Walk the buffer hash chain until you find the relevant buffer header
3. (skipped) Upgrade the CBC latch to exclusive mode
4. (skipped) Pin the buffer header
5. (skipped) Release the CBC latch
6. Now access the buffer (call transaction, data layer etc) while still holding the CBC latch
7. (skipped) Take the CBC latch again (in shared mode)
8. (skipped) Unpin the buffer header
9. Release the CBC latch
This shows up as
the consistent gets -
examination
statistic in
v$sesstat
If someone wants
to get the CBC latch
in exclusive mode
now, they'd wait for
the latch
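The difference between the two codepaths can be captured in a toy Python model. It is only a sketch of the step lists on slides 29 and 30, not Oracle code. It assumes that the "short" LIO keeps the shared CBC latch for the whole buffer access and skips the pin/unpin steps and the latch gets that go with them.

```python
# Each step is (description, CBC latch mode held after the step).
# None means the latch is not held at that point.
PINNED_LIO = [
    ("take CBC latch in shared mode", "shared"),
    ("walk the buffer hash chain", "shared"),
    ("upgrade the CBC latch to exclusive mode", "exclusive"),
    ("pin the buffer header", "exclusive"),
    ("release the CBC latch", None),
    ("access the buffer", None),
    ("take the CBC latch again", "shared"),
    ("unpin the buffer header", "shared"),
    ("release the CBC latch", None),
]

# Assumption: no pin, so the shared latch itself protects the buffer access.
EXAMINATION_LIO = [
    ("take CBC latch in shared mode", "shared"),
    ("walk the buffer hash chain", "shared"),
    ("access the buffer", "shared"),
    ("release the CBC latch", None),
]

def latch_mode_during_access(steps):
    """Return the latch mode held while the buffer contents are accessed."""
    for description, mode in steps:
        if description == "access the buffer":
            return mode
    raise ValueError("path has no buffer access step")

def contender_wait(steps):
    """What another session is likely to wait on during the buffer access."""
    if latch_mode_during_access(steps) is None:
        return "buffer busy waits"
    return "latch: cache buffers chains"

print("pinned LIO      ->", contender_wait(PINNED_LIO))
print("examination LIO ->", contender_wait(EXAMINATION_LIO))
```

In this model, anything that makes the "access the buffer" step slow on the examination path, such as the errorstack and redo dump triggered by the ORA-600, stretches the latch hold time by the same amount.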
blog.tanelpoder.com 31
Conclusion
1. Troubleshoot by following the causal chain of events
2. Don't try to jump to the "solution" or "root cause"
immediately
• There are many possible root causes
3. Sometimes you need to bridge a gap in the chain with your
own reasoning (and later verify)
• "Why would a user session need to read from a redo log?"
4. Sometimes you need to selectively ignore/postpone
evidence
• Latch contention is not always a "too heavy usage" issue
blog.tanelpoder.com 32
Thanks!!!
Oracle Troubleshooting Training by Tanel Poder
blog: https://blog.tanelpoder.com
github: https://github.com/tanelpoder
twitter: @tanelpoder

Troubleshooting Another Complex Oracle Performance Issue (vol. 3)

  • 1. blog.tanelpoder.com 1 Troubleshooting a Yet Another Complex Performance Issue Tanel Põder https://blog.tanelpoder.com @tanelpoder
  • 2. blog.tanelpoder.com 2 Let's get started! This is a reconstruction using the real problem screenshots and AWR/ASH reports, plus follow- up experiments on my test environment.
  • 3. blog.tanelpoder.com 3 Symptoms • Some queries intermittently extremely slow • Concurrency wait class waits going through the roof • Performance gets back to normal by itself
  • 4. blog.tanelpoder.com 4 The AWR approach AWR isn't that great for short spikes
  • 5. blog.tanelpoder.com 5 The AWR approach Average wait of 28ms for a latch is very long! Could it be CPU starvation (scheduling latency?)
  • 6. blog.tanelpoder.com 6 The AWR approach Plenty of idle CPU time (remember this is an average over 30 minute AWR period!) Warning, on Linux the LOAD also includes processes waiting for disk IO!
  • 7. blog.tanelpoder.com 7 The AWR approach Plenty of Idle CPU cycles
  • 8. blog.tanelpoder.com 8 The AWR approach The IOWAIT is IDLE as far as CPU utilization is concerned!!! IOWAIT is a flawed metric (I don't use it)
  • 9. blog.tanelpoder.com 9 The AWR approach Can't really systematically figure out which SQL_IDs are responsible for the waits
  • 10. blog.tanelpoder.com 10 The AWR approach Could lots of LIOs be the cause for the extreme latch contention?
  • 11. blog.tanelpoder.com 11 The AWR approach Again we do not know who and why does cause the contention...
  • 12. blog.tanelpoder.com 12 The AWR approach Nothing special here either. kcbgtcr is the Consistent Read LIO function
  • 13. blog.tanelpoder.com 13 The ASH approach ASH links together many facts about a session's doings (wait event, sql_id, username, block# etc) And ASH has 1-second measurement granularity, we can manually query that
  • 14. blog.tanelpoder.com 14 Drill down into Concurrency wait class with ASH: 99% of Concurrency waits is due to a CBC latch!
  • 15. blog.tanelpoder.com 15 All due to a SELECT statement (73 different SQLIDs spotted)
  • 16. blog.tanelpoder.com 16 One latch or many? The P1 is the latch address! Now we know that almost all of the latch contention was against a single child latch!
  • 17. blog.tanelpoder.com 17 So, who's blocking us? Bummer, we know the blocker for only 1% of the latch waits! So, we don't know who blocked us for 99% of the waits... This is a wait interface shortcoming...
  • 18. blog.tanelpoder.com 18 The ASH approach ASH report incorrectly labels "unknown" blocker as "Held Shared" Sometimes the latch waits are extremely long! Let's pick the one blocker reported and hope it's relevant  *
  • 19. blog.tanelpoder.com 19 What was the blocker doing? SQL> SELECT session_state,event,sql_id, COUNT(*) * 10 seconds FROM dba_hist_active_sess_history WHERE session_id = 5914 AND sample_time BETWEEN <spike_start> AND <spike_end>; %This SESSION EVENT SQL_ID ------ ------- ---------------------------------------- ------------- 67% ON CPU 2bdg4ygkpyxc9 14% WAITING log file sequential read 2bdg4ygkpyxc9 10% ON CPU fmdctt76kf3mb 5% WAITING library cache: mutex X 5% WAITING log buffer space 2bdg4ygkpyxc9 It was a user session by "JDBC Thin Client" Why would a user session read a redo log file?!
  • 20. blog.tanelpoder.com 20 Alternative: LatchProf Collector ("ASH" of Latch Holders) SQL> SELECT latch_name, hold_mode, sid, COUNT(*) FROM latchprof_view WHERE latch_name = 'cache buffers chains' AND child_address = '385BA5C4' GROUP BY latch_name, hold_mode, sid ORDER BY COUNT(*) DESC; LATCH_NAME HOLD_MODE SID COUNT(*) ------------------------------ -------------- ---------- ---------- cache buffers chains MAYBE-SHARED 19 3 cache buffers chains SHARED 201 2 cache buffers chains MAYBE-SHARED 78 2 cache buffers chains EXCLUSIVE 201 2 cache buffers chains MAYBE-SHARED 132 1 cache buffers chains MAYBE-SHARED 79 1 Report holders for only the child latch involved in the waits • http://tech.e2sn.com/oracle/troubleshooting/latch-contention- troubleshooting • http://blog.tanelpoder.com/files/scripts/tools/collectors/latchprof_install.sql
  • 21. blog.tanelpoder.com 21 What are the reasons for reading a redo log file? 1. LGWR doing a log switch (but ours was a user session!) 2. A Streams/GoldenGate/LogMiner log mining operation? • But this was a regular application SELECT query against normal tables 3. Manual dumping of redo log contents • ALTER SYSTEM DUMP LOGFILE … 4. Automatic Block Media Recovery? • http://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmblock .htm#BRADV118 • v$database_block_corruption was empty (IIRC) • I thought to check the alert log for any corruption warnings…
  • 22. blog.tanelpoder.com 22 alert.log Mon Feb 24 15:54:18 2014 Dumping diagnostic data in directory=[cdmp_20140224155418], requested by (instance=1, osid=25519), summary=[incident=74688]. Mon Feb 24 15:56:58 2014 Errors in file /u01/app/oracle/diag/rdbms/lin112/LIN112/trace/LIN112_ora_25519.trc (incident=74689): ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/lin112/LIN112/incident/incdir_74689/LIN112_ora_2551 9_i74689.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Mon Feb 24 15:56:59 2014 Sweep [inc][74689]: completed Sweep [inc2][74689]: completed Wow, ORA-600s! kdsgrp = Kernel Data Get Row Piece
  • 23. blog.tanelpoder.com 23 Process trace file – errorstack Dump continued from file: /u01/app/oracle/diag/rdbms/lin112/.../LIN112_ora_25519.trc ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [] ----- Current SQL Statement for this session (sql_id=6wvgqn05s855u) ----- SELECT /*+ INDEX_RS_ASC(t) */ COUNT(LENGTH(owner)) FROM t_c_hotsos t WHERE object_id > 1 ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- kgerinv()+41 call kgerinv_internal() 11B60460 ? F6970020 ? 10809D5C ? 258 ? 0 ? 0 ? kgeasnmierr()+47 call kgerinv() 11B60460 ? F6970020 ? 10809D5C ? 0 ? FFB53640 ? kdsgrp1_dump()+1138 call kgeasnmierr() 11B60460 ? F6970020 ? 10809D5C ? 0 ? kdsgrp1()+32 call kdsgrp1_dump() F663C9D0 ? F663C9D0 ? 0 ? qetlbr()+247 call kdsgrp1() F663C9D0 ? F663C9D0 ? 0 ? qertbFetchByRowID() call qetlbr() F663C9D0 ? F6639F08 ? +5617 2233A6E0 ? 0 ? F663C8FC ? 0 ? 0 ? qergsFetch()+497 call 00000000 2233A6E0 ? F663C8D0 ? 1024AA74 ? FFB53A08 ? 7FFF ? opifch2()+2659 call 00000000 2233A5E4 ? F663CB38 ? Apparently the failure in kdsgrp1 function causes some special dump function to be called This is a reproduced test query
  • 24. blog.tanelpoder.com 24 Process trace file – pinned buffer history ... END OF PROCESS STATE ----- Pinned Buffer History ----- --------------------- PINNED BUFFER HISTORY (oldest pin first) --------------------- BH (0x283f2628) file#: 1 rdba: 0x00430445 (1/197701) class: 1 ba: 0x28264000 set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25 dbwrid: 0 obj: 132533 objn: 132533 tsn: 0 afn: 1 hint: f hash: [0x38563cf4,0x38563cf4] lru: [0x23be8998,0x323fa1c0] ckptq: [NULL] fileq: [NULL] objq: [0x23be89b0,0x323fa1d8] objaq: [0x23be89b8,0x323fa1e0] st: XCURRENT md: NULL fpin: 'kdiwh100: kdircys' tch: 3 flags: only_sequential_access LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535] buffer tsn: 0 rdba: 0x00430445 (1/197701) scn: 0x0000.02919937 seq: 0x02 flg: 0x04 tail: 0x99370602 frmt: 0x02 chkval: 0x7eb5 type: 0x06=trans data Hex dump of block: st=0, typ_found=1 Dump of memory from 0x28264000 to 0x28266000 28264000 0000A206 00430445 02919937 04020000 [....E.C.7.......] 28264010 00007EB5 00000002 000205B5 02919929 [.~..........)...] 28264020 00000000 00020002 00000000 00000000 [................] 28264030 00000000 00000000 00000000 00000000 [................] Also a number of recently accessed buffers/blocks will be dumped (from memory) You can issue "dump_pinned_ buffer_history" manually too
  • 25. Process trace file – buffer change history (REDO)
    END OF PINNED BUFFER HISTORY
    *** timestamp before redo dump: 02/24/2014 15:52:16
    ***********************************************
    * Dump Online Redo for Buffers in Pin History *
    ***********************************************
    $$$$$$$ Dump Online Redo for DBA list (tsn.rdba in hex): 0x0.00430445 0x9.020001f9 0x9.020001fa 0x9.020001fb 0x9.020001fc 0x9.020001fd 0x9.020001fe 0x9.020001ff 0x9.02000202 0x9.02000203 0x9.02000204:
    DUMP REDO
     Opcodes *.*
     DBAs (file#, block#): (1, 197701) (8, 505) (8, 506) (8, 507) (8, 508) (8, 509) (8, 510) (8, 511) (8, 514) .
     SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
     Times: 02/24/2014 14:52:16 thru eternity
    REDO RECORD - Thread:1 RBA: 0x000a13.0000b070.0010 LEN: 0x10e0 VLD: 0x0d
    SCN: 0x0000.0291d462 SUBSCN: 1 02/24/2014 15:13:56
    (LWN RBA: 0x000a13.0000b070.0010 LEN: 0115 NST: 0001 SCN: 0x0000.0291d462)
    CHANGE #1 TYP:2 CLS:1 AFN:2 DBA:0x00826444 OBJ:5823 SCN:0x0000.0291d2a7 SEQ:2 OP:11.2 ENC:0 RBL:0
    KTB Redo
    op: 0x01 ver: 0x01
    compat bit: 4 (post-11) padding: 1
    op: F xid: 0x0023.017.0000028a uba: 0x01c0156a.01c6.03
    KDO Op code: IRP row dependencies Disabled
    ...
    Oracle also dumps the online REDO against the recently pinned blocks!!!
    This is why the blocking session was waiting for the log file sequential read!
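The DBA list in the redo dump above pairs each hex `tsn.rdba` token with a `(file#, block#)` tuple. A minimal sketch of that decoding follows, assuming the usual smallfile-tablespace layout where the top 10 bits of the rdba are the relative file number and the low 22 bits are the block number. The function names are hypothetical and only for illustration; bigfile tablespaces use a different layout and are not handled.

```python
from typing import List, Tuple

# Assumption: smallfile tablespace, 10-bit relative file# + 22-bit block#.
BLOCK_BITS = 22
BLOCK_MASK = (1 << BLOCK_BITS) - 1


def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a relative data block address into (relative file#, block#)."""
    return rdba >> BLOCK_BITS, rdba & BLOCK_MASK


def parse_dba_list(text: str) -> List[Tuple[int, int, int]]:
    """Parse 'tsn.rdba' hex tokens (e.g. '0x9.020001f9') into (tsn, file#, block#)."""
    decoded = []
    for token in text.split():
        # The last token in the trace carries a trailing colon; drop it.
        tsn_hex, rdba_hex = token.rstrip(":").split(".")
        file_no, block_no = decode_rdba(int(rdba_hex, 16))
        decoded.append((int(tsn_hex, 16), file_no, block_no))
    return decoded


if __name__ == "__main__":
    # First tokens of the DBA list copied from the trace above.
    dba_list = "0x0.00430445 0x9.020001f9 0x9.020001fa"
    for tsn, file_no, block_no in parse_dba_list(dba_list):
        print(f"tsn={tsn} file#={file_no} block#={block_no}")
```

Decoding `0x00430445` this way gives file 1, block 197701, which matches the `(1/197701)` shown next to the rdba in the pinned buffer history.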
  • 26. A Corruption?
    • So, do we have a corruption?
      • No block checksum / checking errors
      • dbverify, RMAN, DBMS_REPAIR tools didn't report a problem
    • As this was an ORA-600 error, it's time to search MOS (My Oracle Support)
      • "kdsgrp1 ora 600"
  • 28. Causal Chain
    1. A session took a cache buffers chains latch and accessed a buffer using the data layer function kdsgrp1
    2. The session hit an ORA-600 due to a bug (not corruption)
    3. As the data access function failed, an errorstack dump with the recently accessed buffer dump was invoked
    4. The recent buffer dump also read and dumped relevant changes from the online redo logs (log file sequential read!)
    5. The cache buffers chains latch was held until the end of the dump!
  • 29. Shouldn't we have waited for buffer busy waits?
    • With regular logical IOs the buffer contents are not read while holding the CBC latch:
      1. Take CBC latch in shared mode
      2. Walk the buffer hash chain until you find the relevant buffer header
      3. Upgrade the CBC latch to exclusive mode
      4. Pin the buffer header
      5. Release the CBC latch
      6. Now access the buffer (call transaction, data layer etc)
      7. Take the CBC latch again (in shared mode)
      8. Unpin the buffer header
      9. Release the CBC latch
    • If someone else wants to pin the buffer while it is being accessed (step 6), they'd wait for buffer busy waits
  • 30. Sometimes "short" logical IOs can skip a few steps
    • With "short" LIOs like unique index lookup LIOs (etc) Oracle can avoid the buffer pinning codepath, so the buffer is accessed while the CBC latch is still held in shared mode:
      1. Take CBC latch in shared mode
      2. Walk the buffer hash chain until you find the relevant buffer header
      3. (skipped) Upgrade the CBC latch to exclusive mode
      4. (skipped) Pin the buffer header
      5. (skipped) Release the CBC latch
      6. Now access the buffer (call transaction, data layer etc)
      7. (skipped) Take the CBC latch again (in shared mode)
      8. (skipped) Unpin the buffer header
      9. Release the CBC latch
    • This shows up as the "consistent gets - examination" statistic in v$sesstat
    • If someone wants to get the CBC latch in exclusive mode while the buffer is being accessed (step 6), they'd wait for the latch
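The difference between the two codepaths can be sketched as a toy timeline model. This is not Oracle code: the phase durations, the `Phase` class and the function names are all hypothetical, and the model only tracks whether the CBC latch or a buffer pin is held while the buffer is being accessed. The wait event names are the real ones a conflicting session would report.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phase:
    name: str
    duration_us: float   # illustrative number, not a measurement
    holds_latch: bool
    holds_pin: bool


def regular_lio(access_us: float) -> List[Phase]:
    """Regular LIO: the latch is held only briefly, buffer access runs under a pin.
    Simplification: the short latch phases are modelled as latch-only."""
    return [
        Phase("steps 1-5: get latch, find header, pin, release latch", 1.0, True, False),
        Phase("step 6: access buffer under pin", access_us, False, True),
        Phase("steps 7-9: get latch, unpin, release latch", 1.0, True, False),
    ]


def short_lio(access_us: float) -> List[Phase]:
    """'Short' LIO (examination): buffer access happens while the latch is held."""
    return [Phase("get latch, find header, examine buffer, release latch",
                  1.0 + access_us, True, False)]


def latch_hold_us(phases: List[Phase]) -> float:
    """Total time the CBC latch is held across all phases."""
    return sum(p.duration_us for p in phases if p.holds_latch)


def blocker_wait_event(phases: List[Phase], at_us: float) -> Optional[str]:
    """Wait event a conflicting session would likely report at time at_us."""
    elapsed = 0.0
    for p in phases:
        if elapsed <= at_us < elapsed + p.duration_us:
            if p.holds_latch:
                return "latch: cache buffers chains"
            if p.holds_pin:
                return "buffer busy waits"
            return None
        elapsed += p.duration_us
    return None  # the LIO has already finished


if __name__ == "__main__":
    stall_us = 10_000_000.0  # hypothetical 10 s stall, e.g. a dump during buffer access
    for label, phases in (("regular", regular_lio(stall_us)), ("short", short_lio(stall_us))):
        print(label, latch_hold_us(phases), blocker_wait_event(phases, stall_us / 2))
```

With a long stall during buffer access, the regular path keeps the latch hold time tiny and other sessions see buffer busy waits, while the short path holds the latch for the whole stall, which is consistent with the CBC latch waits seen in this case.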
  • 31. Conclusion
    1. Troubleshoot by following the causal chain of events
    2. Don't try to jump to the "solution" or "root cause" immediately
      • There are many possible root causes
    3. Sometimes you need to bridge a gap in the chain with your own reasoning (and later verify)
      • "Why would a user session need to read from a redo log?"
    4. Sometimes you need to selectively ignore/postpone evidence
      • Latch contention is not always a "too heavy usage" issue
  • 32. Thanks!!!
    Oracle Troubleshooting Training by Tanel Poder
    blog: https://blog.tanelpoder.com
    github: https://github.com/tanelpoder
    twitter: @tanelpoder