This version of the presentation differs from the previous uploads because SLOB was used for the testing.
Oracle Database 12c Multitenant provides the highest level of Oracle Database resource efficiency, driven by an improved Resource Manager. The 12c Resource Manager effectively allocates resources both within a single database and between multiple pluggable databases in a container. This presentation reviews the new features of the 12c Resource Manager, provides guidelines for migrating your current resource management plan to 12c, and looks at how much overhead the Resource Manager introduces.
2. Maris Elsins
Lead Database Consultant
At Pythian since 2011
Located in Riga, Latvia
Oracle [Apps] DBA since 2005
Speaker at conferences since 2007
@MarisElsins elsins@pythian.com
http://bit.ly/getMOSPatch
3. ABOUT PYTHIAN
11,400: Pythian currently manages more than 11,400 systems.
400+: Pythian currently employs more than 400 people in 200 cities in 35 countries.
1997: Pythian was founded in 1997.
Global Leader In IT Transformation And Operational Excellence
Unparalleled Expertise
• Top 5% in databases, applications, infrastructure, Big Data, Cloud, Data Science, and DevOps
Unmatched Certifications
• 9 Oracle ACEs, 4 Oracle ACE Directors, 1 Oracle ACE Associate
• 6 Microsoft MVPs, 1 Microsoft Certified Master
• 5 Google Platform Qualified Developers
• 1 Cloudera Champion of Big Data
• 1 MongoDB Certified DBA Associate Level
• 1 DataStax Certified Partner, 1 MVP
Broad Technical Experience
• Oracle, Microsoft, MySQL, Oracle EBS, Hadoop, Cassandra, MongoDB, virtualization, configuration management, monitoring, trending, and more.
4. AGENDA
• Features of the Resource Manager
• The new 12c-stuff
• Consolidations using Oracle Multitenant
• Overhead of the RM
6. THE PROBLEM
• Problems start when there’s not enough CPU for
everyone
• CPU starvation can be hard to recover from
(the snowball effect)
• Troubleshooting an ongoing problem is difficult to do
• The OS scheduler is unaware of DB-specific resources
– Undo
– Locks
– Parallelism
7. PROBLEM SCENARIOS
• Running reports causes too much load on the OLTP system.
• One session allocates all parallel query slaves, so other sessions don’t get any
• The application support team runs heavy queries to analyze the data, leaving fewer resources for online transactions
• Wide search criteria cause “hangs” in the search form
• 3 of 8 CPU cores are idle and my query runs without parallel execution; I could use the idle CPUs to get results faster
• Users don’t log out and leave idle sessions
• My batch process requires DOP=8 to complete in time, but it’s downgraded to a smaller DOP if not enough parallel slaves are available
• My query is very important. Its IO requests have to be prioritized!
• Sessions with incomplete transactions have locked some rows, and other sessions are stuck.
8. THE SOLUTION
• Resource Manager
– Included in Oracle EE license
– Prioritization of sessions based on defined rules
– Guaranteed amount of resources for each type of session (consumer group)
– (Optional) upper bound on resources for each type of session
• Prioritization is achieved by switching processes between the running and sleeping states
– DBRM (resource plan management) / VKRM (CPU scheduling)
– Uses semaphores to wake up sleeping processes
– CPU quantum (_dbrm_quantum)
• The Resource Manager does not solve the «lack of CPU resources» problem; it only controls the execution queue
• The Resource Manager uses some resources too; the last part of the presentation estimates this overhead
9. BASIC FEATURES
Availability by release (9.2, 10.2, 11.1, 11.2, 12.1):
• CPU resource allocation: 9.2, 10.2, 11.1, 11.2, 12.1
• Limit of the degree of parallelism: 9.2, 10.2, 11.1, 11.2, 12.1
• Active session pool: 9.2, 10.2, 11.1, 11.2, 12.1
• Automated change of consumer group if a session has used, or is estimated to use, the defined amount of resources:
– 9.2: CPU, Est CPU
– 10.2: CPU, Est CPU
– 11.1: CPU, Est CPU, IO_MB, IO_REQ
– 11.2: CPU, Est CPU, IO_MB, IO_REQ
– 12.1: CPU, IO_MB, IO_REQ, Est CPU, LIO, Ela, Est Ela
• Limit of estimated execution time: 9.2, 10.2, 11.1, 11.2, 12.1
• Limit on the size of undo used by uncommitted sessions: 9.2, 10.2, 11.1, 11.2, 12.1
• Termination of idle sessions: 10.2, 11.1, 11.2, 12.1
• Termination of idle blocking sessions: 10.2, 11.1, 11.2, 12.1
• _ORACLE_BACKGROUND_GROUP_ hidden consumer group for background processes (L0, 70% CPU): 10.2, 11.1, 11.2; at 90% in 12.1
• Instance caging (CPU_COUNT + resource plan): 11.2, 12.1
• Max CPU utilization limit: 11.2, 12.1
• Parallel statement queue: 11.2, 12.1
• LOG_ONLY “switch group” for real-time SQL monitoring: 12.1
• Simplified automated consumer group switching: 12.1
11. THE BASIC CONCEPTS
• Consumer group
– Set of sessions having similar requirements
for server resources
– Resources are allocated to the consumer
group, not individual sessions
– DBA_RSRC_CONSUMER_GROUPS
• Directives
– Rules that define resource allocation to the
consumer group
– DBA_RSRC_PLAN_DIRECTIVES
• Resource plan
– Set of directives defining the distribution of
resources among consumer groups
– DBA_RSRC_PLANS
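The three concepts fit together as a small data model: a plan holds directives, and each directive gives a consumer group a CPU percentage at a level. The Python sketch below is a hypothetical model for illustration only, not Oracle's DBMS_RESOURCE_MANAGER API (class and field names are made up). It also checks two rules Oracle enforces when a plan is validated: a top-level plan must include OTHER_GROUP, and the percentages on one level may not exceed 100.

```python
from dataclasses import dataclass, field


@dataclass
class Directive:
    """Hypothetical model of one plan directive: CPU % for a consumer group at a level."""
    group: str
    cpu_pct: int
    level: int = 1


@dataclass
class ResourcePlan:
    """Hypothetical model of a resource plan: a named set of directives."""
    name: str
    directives: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of problems; an empty list means both checks pass."""
        problems = []
        # A top-level plan must cover OTHER_GROUP (sessions not mapped anywhere else)
        if not any(d.group == "OTHER_GROUP" for d in self.directives):
            problems.append("no directive for OTHER_GROUP")
        # The CPU percentages on any single level may not add up to more than 100
        per_level = {}
        for d in self.directives:
            per_level[d.level] = per_level.get(d.level, 0) + d.cpu_pct
        for level, total in sorted(per_level.items()):
            if total > 100:
                problems.append(f"level {level} allocates {total}% (> 100%)")
        return problems


# The 80%-15% plan used in TEST3 (the plan name is made up)
test3 = ResourcePlan("TEST3_PLAN", [
    Directive("SYS_GROUP", 5),
    Directive("OTHER_GROUP", 0),
    Directive("L2_GROUP1", 80),
    Directive("L2_GROUP2", 15),
])
print(test3.validate())  # []
```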
12. RESMGR:CPU QUANTUM
WHY IS MY SESSION NOT RUNNING?
SQL> select event, count(*) from v$session group by event order by 2 desc;
EVENT COUNT(*)
---------------------------------------------------------------- ----------
resmgr:cpu quantum 25
rdbms ipc message 23
Space Manager: slave idle wait 16
SQL*Net message from client 9
EMON slave idle wait 5
DIAG idle wait 2
LGWR worker group idle 2
GCR sleep 2
Streams AQ: waiting for time management or cleanup tasks 1
VKTM Logical Idle Wait 1
AQPC idle 1
Streams AQ: qmn coordinator idle wait 1
VKRM Idle 1
PING 1
...
23 rows selected.
13. RESMGR:CPU QUANTUM
WHY IS MY SESSION NOT RUNNING?
SQL> select event, status, count(*) from v$session
where event='resmgr:cpu quantum'
group by event, status order by 1,2;
EVENT STATUS COUNT(*)
------------------ -------- ----------
resmgr:cpu quantum ACTIVE 25
14. RESMGR:CPU QUANTUM
WHY IS MY SESSION NOT RUNNING?
SQL> select event, status, state, count(*)
from v$session where event='resmgr:cpu quantum'
group by event, status, state order by 1,2,3;
EVENT STATUS STATE COUNT(*)
------------------ -------- ------------------- ----------
resmgr:cpu quantum ACTIVE WAITED KNOWN TIME 7
resmgr:cpu quantum ACTIVE WAITED SHORT TIME 16
resmgr:cpu quantum ACTIVE WAITING 2
15. RESMGR:CPU QUANTUM
WHY IS MY SESSION NOT RUNNING?
• EVENT values are often misinterpreted in:
– V$SESSION
– V$SESSION_WAIT
• A common mistake is to forget about V$SESSION.STATE
• Only if STATE = 'WAITING' is the session actually waiting
– EVENT shows what the session is waiting for
– STATUS can be ACTIVE or INACTIVE
• If STATE LIKE 'WAITED%' (WAITED KNOWN TIME, WAITED SHORT TIME, WAITED UNKNOWN TIME), EVENT is only the last completed wait ...
– and STATUS = 'ACTIVE': the session is ON CPU
– and STATUS != 'ACTIVE': the session is not running
THIS IS TRUE FOR ALL WAIT EVENTS
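These rules fit in a few lines of code. The Python sketch below is only an illustration (the function name is made up). Applied to the three rows returned by the STATE query on the previous slide, it shows that only 2 of the 25 sessions reporting resmgr:cpu quantum were actually waiting on it; the other 23 were on CPU.

```python
def session_activity(state: str, status: str) -> str:
    """Classify a V$SESSION row using its STATE and STATUS columns."""
    if state == "WAITING":
        # Only now does EVENT describe a wait that is in progress
        return "waiting"
    if state.startswith("WAITED"):
        # WAITED KNOWN TIME / WAITED SHORT TIME / WAITED UNKNOWN TIME:
        # EVENT is just the last completed wait
        return "on CPU" if status == "ACTIVE" else "not running"
    raise ValueError(f"unexpected STATE value: {state!r}")


# (STATUS, STATE, COUNT(*)) rows from the resmgr:cpu quantum query
rows = [
    ("ACTIVE", "WAITED KNOWN TIME", 7),
    ("ACTIVE", "WAITED SHORT TIME", 16),
    ("ACTIVE", "WAITING", 2),
]
summary = {}
for status, state, count in rows:
    kind = session_activity(state, status)
    summary[kind] = summary.get(kind, 0) + count
print(summary)  # {'on CPU': 23, 'waiting': 2}
```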
17. THE NEW 12C-STUFF
• Improvements to non-CDB RM
– Mostly to improve automated consumer group switching
• RM in 12c CDB
– CDB resource plans
– PDB resource plans
18. AUTOMATED CONSUMER GROUP SWITCHING
12C: MORE OPTIONS!
• Logical IO
• Elapsed time
• Estimated elapsed time
• Estimated CPU time
– The new algorithm replaces cost-based estimation
• Real-time SQL monitoring
– LOG_ONLY
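Conceptually, each of these criteria is a threshold on a plan directive. Exceeding any of them moves the session to the directive's switch group. With the reserved LOG_ONLY name, the event is only recorded for real-time SQL monitoring and the session stays where it is. The Python sketch below models only that decision and is not Oracle's implementation. It ignores the estimate-based criteria. The class, field and group names are hypothetical, and treating "exceeded" as strictly greater than is an assumption. In DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE, the 12c thresholds are set through parameters such as SWITCH_IO_LOGICAL and SWITCH_ELAPSED_TIME.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SwitchRule:
    """Hypothetical model of the switching part of a plan directive (None = not set)."""
    switch_group: str
    cpu_seconds: Optional[float] = None
    logical_ios: Optional[int] = None
    elapsed_seconds: Optional[float] = None


def apply_switch(current_group: str, usage: dict, rule: SwitchRule) -> Tuple[str, Optional[str]]:
    """Return (consumer group after the check, name of the threshold that fired or None)."""
    thresholds = {
        "cpu_seconds": rule.cpu_seconds,
        "logical_ios": rule.logical_ios,
        "elapsed_seconds": rule.elapsed_seconds,
    }
    for metric, limit in thresholds.items():
        if limit is not None and usage.get(metric, 0) > limit:
            if rule.switch_group == "LOG_ONLY":
                # Reserved name: the event is only recorded, the session keeps its group
                return current_group, metric
            return rule.switch_group, metric
    return current_group, None


usage = {"cpu_seconds": 12.0, "logical_ios": 40_000, "elapsed_seconds": 75.0}
print(apply_switch("OLTP_GROUP", usage, SwitchRule("BATCH_GROUP", elapsed_seconds=60)))
# ('BATCH_GROUP', 'elapsed_seconds')
print(apply_switch("OLTP_GROUP", usage, SwitchRule("LOG_ONLY", elapsed_seconds=60)))
# ('OLTP_GROUP', 'elapsed_seconds')
```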
19. AUTOMATED CONSUMER GROUP SWITCHING
ESTIMATED ELAPSED/CPU TIME – RECURSIVE STATEMENT
SELECT executions,
end_of_fetch_count,
elapsed_time / px_servers elapsed_time,
cpu_time / px_servers cpu_time,
buffer_gets / executions buffer_gets
FROM
(SELECT SUM(executions) AS executions,
sum (
CASE
WHEN px_servers_executions > 0
THEN px_servers_executions
ELSE executions
END) AS px_servers,
SUM(end_of_fetch_count) AS end_of_fetch_count,
SUM(elapsed_time) AS elapsed_time,
SUM(cpu_time) AS cpu_time,
SUM(buffer_gets) AS buffer_gets
FROM gv$sql
WHERE executions > 0
AND sql_id = :1
AND parsing_schema_name = :2
)
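The arithmetic of this recursive statement can be restated outside the database. The Python sketch below assumes one dict per GV$SQL row, with keys named after the columns used above (the dict layout and the sample values are hypothetical). As in the SQL, time is divided by the number of PX server executions when the cursor ran in parallel and by the number of executions otherwise, while buffer gets are averaged per execution. END_OF_FETCH_COUNT is left out.

```python
def estimate_per_execution(rows):
    """Mimic the recursive statement: per-execution elapsed/CPU time and buffer gets.

    rows: one dict per GV$SQL child cursor (key names follow the SQL columns).
    Returns None when no row has executions > 0.
    """
    rows = [r for r in rows if r["executions"] > 0]  # WHERE executions > 0
    executions = sum(r["executions"] for r in rows)
    if executions == 0:
        return None
    # CASE WHEN px_servers_executions > 0 THEN px_servers_executions ELSE executions END
    px_servers = sum(
        r["px_servers_executions"] if r["px_servers_executions"] > 0 else r["executions"]
        for r in rows
    )
    return {
        "executions": executions,
        "elapsed_time": sum(r["elapsed_time"] for r in rows) / px_servers,  # microseconds
        "cpu_time": sum(r["cpu_time"] for r in rows) / px_servers,          # microseconds
        "buffer_gets": sum(r["buffer_gets"] for r in rows) / executions,
    }


# Two hypothetical child cursors: one serial, one run twice with 4 PX servers
children = [
    {"executions": 4, "px_servers_executions": 0, "elapsed_time": 8_000_000,
     "cpu_time": 6_000_000, "buffer_gets": 400},
    {"executions": 2, "px_servers_executions": 8, "elapsed_time": 16_000_000,
     "cpu_time": 12_000_000, "buffer_gets": 200},
]
print(estimate_per_execution(children))
# {'executions': 6, 'elapsed_time': 2000000.0, 'cpu_time': 1500000.0, 'buffer_gets': 100.0}
```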
20. AUTOMATED CONSUMER GROUP SWITCHING
ESTIMATED ELAPSED/CPU TIME – RECURSIVE STATEMENT
SELECT executions,
end_of_fetch_count,
elapsed_time / px_servers elapsed_time,
cpu_time / px_servers cpu_time,
buffer_gets / executions buffer_gets
FROM
(SELECT SUM(executions_delta) AS EXECUTIONS,
SUM(
CASE WHEN px_servers_execs_delta > 0 THEN px_servers_execs_delta ELSE
executions_delta
END) AS px_servers,
SUM(end_of_fetch_count_delta) AS end_of_fetch_count,
SUM(elapsed_time_delta) AS ELAPSED_TIME,
SUM(cpu_time_delta) AS CPU_TIME,
SUM(buffer_gets_delta) AS BUFFER_GETS
FROM DBA_HIST_SQLSTAT s,
V$DATABASE d,
DBA_HIST_SNAPSHOT sn
WHERE s.dbid = d.dbid
AND bitand(NVL(s.flag, 0), 1) = 0
AND sn.end_interval_time > (SELECT SYSTIMESTAMP at TIME ZONE dbtimezone FROM dual) - 7
AND s.sql_id = :1
AND s.snap_id = sn.snap_id
AND s.instance_number = sn.instance_number
AND s.dbid = sn.dbid
AND parsing_schema_name = :2)
21. REAL-TIME SQL MONITORING IMPROVEMENTS
LOG_ONLY – RESERVED CONSUMER GROUP NAME
• Does it simplify the analysis of consumer group switching? Not much ☹
• V$SQL_MONITOR
– RM_LAST_ACTION (e.g. LOG_ONLY)
– RM_LAST_ACTION_REASON (e.g. SWITCH_ELAPSED_TIME)
– RM_LAST_ACTION_TIME (e.g. 2015.11.26)
– RM_CONSUMER_GROUP (e.g. BATCH_GROUP)
• The RM_* columns appear only in V$SQL_MONITOR, not in the SQL Monitor reports
• Historical SQL Monitor reports don’t include the RM_* information either
– DBA_HIST_REPORTS / DBA_HIST_REPORTS_DETAILS
– http://mauro-pagano.com/2015/05/04/historical-sql-monitor-reports-in-12c
– But at least you have the reports!
23. CONSUMER GROUP SWITCHING
SIMPLIFIED MANAGEMENT OF PRIVILEGES
• Before 12c, any kind of switching required an explicit privilege
– DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP
• In 12.1 the privilege is implied for:
– Consumer group mappings
– Switching based on SWITCH_GROUP conditions
• What does it mean for DBAs?
– Removes redundant work
– Simplicity
– More flexibility, as explicit grants can be avoided
25. CDB RESOURCE PLAN
• CDB resource plan
– Defines how resources are distributed between PDBs
– Shares – Minimum portion of resources allocated to the PDB
– Additional Limits
• Utilization_limit
• Parallel_server_limit (%)
• CDB Plan Directives (in DEFAULT_CDB_PLAN)
– ORA$DEFAULT_PDB_DIRECTIVE – default
• Shares=1, utilization_limit=100, parallel_server_limit=100
– ORA$AUTOTASK – for autotasks in root container
• Shares=1, utilization_limit=90, parallel_server_limit=100
• User-defined directives for exceptional PDBs
• *_limit parameters allow setting up “PDB caging”
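Shares are relative. A PDB's guaranteed minimum is its share count divided by the sum of all PDBs' shares, so every PDB covered by ORA$DEFAULT_PDB_DIRECTIVE (1 share each) dilutes everyone else's guarantee. The Python sketch below shows that arithmetic under simplifying assumptions. It assumes all PDBs are open and busy, treats utilization_limit as a percentage of CPU_COUNT, and ignores ORA$AUTOTASK. The PDB names are hypothetical.

```python
def pdb_guarantees(shares: dict) -> dict:
    """Guaranteed minimum CPU (%) of each PDB = its shares / sum of all shares."""
    total = sum(shares.values())
    return {pdb: 100.0 * s / total for pdb, s in shares.items()}


def pdb_cap_cpus(utilization_limit_pct: float, cpu_count: int) -> float:
    """Upper bound, in CPUs, implied by a directive's utilization_limit ("PDB caging")."""
    return cpu_count * utilization_limit_pct / 100.0


# Hypothetical CDB: SALES has a user-defined directive with 2 shares,
# HR and DEV fall under ORA$DEFAULT_PDB_DIRECTIVE (shares=1)
shares = {"SALES": 2, "HR": 1, "DEV": 1}
print(pdb_guarantees(shares))  # {'SALES': 50.0, 'HR': 25.0, 'DEV': 25.0}
print(pdb_cap_cpus(50, 24))    # 12.0 CPUs for a PDB with utilization_limit=50 on 24 CPUs
```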
26. PDB RESOURCE PLAN
• Divides the resources the PDB receives (according to its shares) among the PDB’s consumer groups
• Works just like a resource plan for a non-CDB
• A few restrictions
– A PDB resource plan can't have sub-plans.
– A PDB resource plan can have a maximum of eight consumer groups.
– A PDB resource plan cannot have a multi-level scheduling policy.
• So do we need to re-implement the resource plans when we switch from a non-CDB to a CDB?
– Not always! It happens automatically, but how?
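The three restrictions above can be expressed as a simple pre-migration check. The Python sketch below is a hypothetical model, not an Oracle API. Whether OTHER_GROUP and SYS_GROUP count toward the limit of eight is an assumption; here every consumer group counts.

```python
from dataclasses import dataclass


@dataclass
class PlanEntry:
    """Hypothetical model of one directive in a PDB resource plan."""
    name: str             # consumer group or sub-plan name
    level: int = 1        # scheduling level the CPU % is given at
    is_subplan: bool = False


def pdb_plan_problems(entries) -> list:
    """Check the three PDB resource plan restrictions."""
    problems = []
    if any(e.is_subplan for e in entries):
        problems.append("sub-plans are not allowed")
    groups = {e.name for e in entries if not e.is_subplan}
    if len(groups) > 8:
        # Assumption: every consumer group counts, OTHER_GROUP included
        problems.append(f"{len(groups)} consumer groups (maximum is 8)")
    if len({e.level for e in entries}) > 1:
        problems.append("multi-level scheduling is not allowed")
    return problems


legacy = [PlanEntry("SYS_GROUP", 1), PlanEntry("OLTP_GROUP", 2), PlanEntry("OTHER_GROUP", 3)]
print(pdb_plan_problems(legacy))  # ['multi-level scheduling is not allowed']
```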
27. CONVERTING NON-CDB PLANS TO PDB PLANS
MULTI-LEVEL SCHEDULING POLICIES ARE NOT ALLOWED
• Automatically when the non-CDB is converted into PDB
– $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
– The original plan and plan directives are saved with STATUS=LEGACY
– A new plan is added with the same name and STATUS={null}
• Multilevel plan is converted into a single-level plan
• The algorithm is not documented, but it appears to be simple enough
– Adjust the allocated CPU% on each level
• Scale each level down to 75% proportionally
• Leave it as is if it’s already lower than 75%
– The “free portion” is passed to the next level and split according to the adjusted percentages; whatever remains is passed further down
– The last level gets all remaining resources
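The observed rules can be written down as a short function. The Python sketch below implements the algorithm as described on this slide, which Oracle does not document, so treat it as a model rather than the conversion script's actual logic. It assumes each level is passed as a non-empty group-to-percentage map, level 1 first. Real conversions store integer percentages, so expect rounding differences. The plan and group names in the example are hypothetical.

```python
def flatten_plan(levels, cap=75.0):
    """Convert a multi-level plan into single-level percentages (model of the observed rules).

    levels: list of {consumer_group: cpu_pct}, level 1 first; every level must be non-empty.
    Returns {consumer_group: pct_of_total}, summing to 100.
    """
    result = {}
    remaining = 100.0  # portion not yet handed out to higher levels
    for i, level in enumerate(levels):
        total = sum(level.values())
        if i == len(levels) - 1:
            # Last level: gets everything that is left, split by its own percentages
            for group, pct in level.items():
                result[group] = remaining * pct / total if total else remaining / len(level)
            break
        # Scale the level down to `cap` % proportionally, or keep it if already below
        scale = cap / total if total > cap else 1.0
        handed_out = 0.0
        for group, pct in level.items():
            result[group] = remaining * pct * scale / 100.0
            handed_out += result[group]
        remaining -= handed_out  # the "free portion" flows to the next level
    return result


multi_level = [
    {"SYS_GROUP": 90},                       # L1
    {"OLTP_GROUP": 60, "BATCH_GROUP": 40},   # L2
    {"OTHER_GROUP": 100},                    # L3
]
for group, pct in flatten_plan(multi_level).items():
    print(f"{group:12} {pct:6.2f}%")
```

For this hypothetical plan, the model gives SYS_GROUP 75%, OLTP_GROUP 11.25%, BATCH_GROUP 7.5% and OTHER_GROUP 6.25%. L1 is cut from 90% to 75%, L2 is scaled to 75% of the remaining 25%, and the last level receives the final 6.25%.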
32. NOTHING IS FOR FREE
• RM requires resources
– I’ve heard rumors: 1%, 5%, 10% of CPU?
• Testing needed!
33. MEASURING THE OVERHEAD
HOW AND WHAT DO WE TEST?
• HW – ODA V1 (12 Cores With HT => 24 Logical CPUs)
– Two 6-core 3.06 GHz Intel Xeon® X5675 processors
• DB versions
– 12.1.0.2 non-CDB
– 12.1.0.2 CDB (tests executed in 1 PDB)
– 11.2.0.4
• Checking:
– TEST1: Max Performance without RM
– TEST2: Max Performance with RM
– TEST3: Is the guaranteed resource allocation working?
– TEST4: Accuracy of the resource allocation
– TEST5: Overhead
34. MEASURING THE OVERHEAD
HOW AND WHAT DO WE TEST?
• SLOB in LIO testing mode
– 60 schemas, each 10000 blocks (80MB)
– Read-only (UPDATE_PCT=0)
– No think time (THINK_TM_FREQUENCY=0)
• A few custom scripts
– Warm_cache.sql
– Wrapper to initiate SLOB (total of 441 runs)
– Modified runit.sh
• Switches consumer groups
• Triggers the status check
• Kills sessions
– Status check
– Response time of a non-DB script
35. TESTING SCRIPTS
STATUS.SQL
...
SELECT CURRENT_TIMESTAMP ts ,
NVL(RESOURCE_CONSUMER_GROUP,'{null}'),
COUNT(*) sessions,
SUM(ss.value) WORK_DONE
FROM v$session s,
v$sesstat ss
WHERE s.username LIKE 'USER%'
AND s.sid =ss.sid
AND ss.statistic#=(SELECT statistic# FROM v$statname WHERE name='consistent gets')
GROUP BY CURRENT_TIMESTAMP,
NVL(RESOURCE_CONSUMER_GROUP,'{null}')
ORDER BY 2
...
36. TESTING SCRIPTS
STATUS.SQL
SET SERVEROUTPUT ON
DECLARE
  -- Counters keyed by '<consumer group> / <action>' (group up to 32 + action up to 64 chars)
  TYPE t_progr IS TABLE OF NUMBER INDEX BY VARCHAR2(100);
  pre_work  t_progr;
  pre_sess  t_progr;
  post_work t_progr;
  post_sess t_progr;
  pre_ts    timestamp;
  post_ts   timestamp;
  l_secs    number;
  -- 'consistent gets' and session count per consumer group / action for the SLOB users
  cursor c is
    select current_timestamp ts,
           nvl(RESOURCE_CONSUMER_GROUP,'{null}')||' / '||action RESOURCE_CONSUMER_GROUP,
           count(*) sessions,
           sum(ss.value) WORK_DONE
      from v$session s, v$sesstat ss
     where s.username like 'USER%'
       and s.sid = ss.sid
       and ss.statistic# = (select statistic# from v$statname where name = 'consistent gets')
     group by current_timestamp, nvl(RESOURCE_CONSUMER_GROUP,'{null}')||' / '||action
     order by 2;
  l_key     varchar2(100);
  work_done number;
begin
  -- first snapshot
  for c1 in c loop
    pre_ts := c1.ts;
    pre_work(c1.RESOURCE_CONSUMER_GROUP) := c1.WORK_DONE;
    pre_sess(c1.RESOURCE_CONSUMER_GROUP) := c1.sessions;
  end loop;
  dbms_lock.sleep(30);
  -- second snapshot
  for c2 in c loop
    post_ts := c2.ts;
    post_work(c2.RESOURCE_CONSUMER_GROUP) := c2.WORK_DONE;
    post_sess(c2.RESOURCE_CONSUMER_GROUP) := c2.sessions;
  end loop;
  -- elapsed seconds between the snapshots (intervals shorter than one hour)
  l_secs := extract(minute from (post_ts-pre_ts))*60 + extract(second from (post_ts-pre_ts));
  l_key := pre_work.first;
  LOOP
    EXIT WHEN l_key IS NULL;
    -- skip groups that disappeared between the two snapshots (avoids NO_DATA_FOUND)
    IF post_work.EXISTS(l_key) THEN
      work_done := round((post_work(l_key)-pre_work(l_key))/l_secs, 3);
      dbms_output.put_line(rpad(l_key,60,' ')||': '||rpad(post_work(l_key),16,' ')||' - '||rpad(pre_work(l_key),16,' ')
        ||' = '||rpad(post_work(l_key)-pre_work(l_key)||' / '||l_secs||'s',40,' ')
        ||' ==> '||work_done||' w/s (with '||post_sess(l_key)||' sessions) '
        ||round((work_done/post_sess(l_key)),3)||' w/s per session');
    END IF;
    l_key := pre_work.next(l_key);
  END LOOP;
end;
/
L2_GROUP1: 15582619 - 681053 = 14901566 / 180.46772s ==> 82571.919 w/s (with 12 sessions) 6880.993 w/s per session
L2_GROUP2: 129517874 - 6005013 = 123512861 / 180.46772s ==> 684404.175 w/s (with 12 sessions) 57033.681 w/s per session
L2_GROUP3: 260275057 - 9074727 = 251200330 / 180.46772s ==> 1391940.509 w/s (with 12 sessions) 115995.042 w/s per session
L2_GROUP4: 390102520 - 10238916 = 379863604 / 180.46772s ==> 2104883.932 w/s (with 12 sessions) 175406.994 w/s per session
L2_GROUP5: 457499395 - 9980217 = 447519178 / 180.46772s ==> 2479774.1 w/s (with 12 sessions) 206647.842 w/s per session
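The per-group figures above come from simple arithmetic on two snapshots: the difference in consistent gets divided by the elapsed seconds, then divided by the session count. The Python sketch below reproduces that arithmetic outside the database. The function name and the snapshot dict layout are hypothetical. As input it uses the L2_GROUP1 and L2_GROUP5 lines of the output above.

```python
from datetime import timedelta


def work_rates(pre, post, elapsed):
    """Per-group work rate between two STATUS.SQL-style snapshots.

    pre / post: {group: (consistent_gets, sessions)}; elapsed: timedelta between them.
    Returns {group: (gets_per_second, gets_per_second_per_session)}.
    """
    secs = elapsed.total_seconds()
    rates = {}
    for group, (pre_work, _) in pre.items():
        if group not in post:  # group disappeared between the snapshots
            continue
        post_work, post_sessions = post[group]
        rate = (post_work - pre_work) / secs
        rates[group] = (round(rate, 3), round(rate / post_sessions, 3))
    return rates


pre = {"L2_GROUP1": (681053, 12), "L2_GROUP5": (9980217, 12)}
post = {"L2_GROUP1": (15582619, 12), "L2_GROUP5": (457499395, 12)}
print(work_rates(pre, post, timedelta(seconds=180.46772)))
```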
37. TESTING SCRIPTS
! RESPONSE.SH
$ cat ../response.sh
for i in {1..5000}
do
echo "sqrt($i)" | bc > /dev/null
done
$ time response.sh
real 0m4.886s
user 0m0.291s
sys 0m1.096s
38. TEST1
NO RESOURCE MANAGER
• Init parameters:
– CPU_COUNT=24
– RESOURCE_MANAGER_PLAN='FORCE:'
• CDB
– RESOURCE_MANAGER_PLAN='FORCE:' was set in all PDBs and in the root container.
– ! Having an RM plan enabled in one PDB caused the whole CDB to be managed by the Resource Manager (even if no CDB plan was set)
39. TEST1
NO RESOURCE MANAGER – TOTAL WORK
• Almost linear scaling up to 12 cores; HT adds ~25-30% per core.
• Performance: 11gR2 > 12c CDB > 12c non-CDB
40. TEST2
NO RESOURCE MANAGER – BURN_CPU.SQL V2
• OS script response time:
– 4-7 s with 1-23 sessions
– ~70-90 s with 24-48 sessions
41. OFFTOPIC – TEST1 (PURE PL/SQL TEST)
NO RESOURCE MANAGER – BURN_CPU.SQL V2
• PL/SQL on 11gR2 performs worse than on 12c ☺
42. TEST2
SIMPLE RESOURCE PLAN
• The resource plan
– SYS_GROUP = 1% at L1
– OTHER_GROUP = 1% at L1
– L2_GROUP1 = 1% at L1
• All sessions will be in L2_GROUP1
44. TEST2
SIMPLE RESOURCE PLAN
What is that spike?
• Even a very simple RM plan throttles sessions instead of letting them saturate the server
• The spike at exactly 24 active sessions occurs because RM is not yet throttling sessions and all logical CPUs are used
45. TEST3
80%-15% RESOURCE PLAN
• The resource plan
– SYS_GROUP = 5%
– OTHER_GROUP = 0%
– L2_GROUP1 = 80%
– L2_GROUP2 = 15%
• 24 sessions will be started in L2_GROUP1
• 0-36 sessions will be started in L2_GROUP2
• The Goal
– Check if both consumer groups get the allocated resources
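What "getting the allocated resources" should look like can be estimated with an idealised model. The Python sketch below is an approximation under stated assumptions, not the Resource Manager's actual scheduler. It treats each running SLOB session as wanting one logical CPU and assumes SYS_GROUP is idle. It splits the 24 logical CPUs in proportion to the plan percentages among the busy groups and gives any share a group cannot use to the others. Groups with a 0% allocation get nothing in this model.

```python
def allocate_cpu(capacity, weights, demand):
    """Idealised split of `capacity` CPUs between consumer groups.

    weights: {group: allocation %}; demand: {group: CPUs the group could use}.
    Busy groups get CPU in proportion to their allocation; whatever a group
    cannot use (demand below its share) is redistributed to the others.
    """
    alloc = {g: 0.0 for g in weights}
    active = {g for g in weights if weights[g] > 0 and demand.get(g, 0) > 0}
    remaining = float(capacity)
    while active and remaining > 0:
        total_w = sum(weights[g] for g in active)
        # Groups whose proportional share already covers their whole demand
        satisfied = {g for g in active if remaining * weights[g] / total_w >= demand[g]}
        if not satisfied:
            for g in active:
                alloc[g] = remaining * weights[g] / total_w
            break
        for g in satisfied:
            alloc[g] = float(demand[g])
            remaining -= demand[g]
        active -= satisfied
    return alloc


weights = {"SYS_GROUP": 5, "OTHER_GROUP": 0, "L2_GROUP1": 80, "L2_GROUP2": 15}
for n in (0, 2, 12):
    demand = {"L2_GROUP1": 24, "L2_GROUP2": n}  # 24 logical CPUs, 1 CPU per busy session
    a = allocate_cpu(24, weights, demand)
    print(n, round(a["L2_GROUP1"], 2), round(a["L2_GROUP2"], 2))
# 0 24.0 0.0
# 2 22.0 2.0
# 12 20.21 3.79
```

In this model, once L2_GROUP2 has more than about 4 busy sessions, it is held at roughly 15/95 of the machine. That is the behaviour TEST3 is meant to verify.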
49. TEST4
ALLOCATION ACCURACY
• The resource plan
– SYS_GROUP = 1% at L1
– L2_GROUP1 = 0% at L1
– L2_GROUP2 = 10% at L1
– L2_GROUP3 = 20% at L1
– L2_GROUP4 = 30% at L1
– L2_GROUP5 = 39% at L1
– OTHER_GROUP = 0% at L1
• 12 sessions will be started in each L2_GROUP% group
• The Goal
– Check if all percentages are met
– 3 runs of 3 minutes each, averaged
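One way to check whether all percentages are met is to compare each group's share of the total work with its planned percentage. The Python sketch below does that. As input, it reuses the work rates from the sample STATUS.SQL output shown earlier, which has the same five L2_GROUP% groups with 12 sessions each. Whether those numbers come from a TEST4 run is not stated, so treat them as sample data. The sketch also assumes that work done (consistent gets) is a good proxy for the CPU each group received. Note that the planned L2 percentages add up to 99, because SYS_GROUP holds 1%.

```python
# Planned single-level allocations for the L2_GROUP% groups in TEST4
planned = {"L2_GROUP1": 0, "L2_GROUP2": 10, "L2_GROUP3": 20, "L2_GROUP4": 30, "L2_GROUP5": 39}

# Work rates (w/s) taken from the sample STATUS.SQL output
rates = {
    "L2_GROUP1": 82571.919,
    "L2_GROUP2": 684404.175,
    "L2_GROUP3": 1391940.509,
    "L2_GROUP4": 2104883.932,
    "L2_GROUP5": 2479774.1,
}


def observed_shares(rates):
    """Share (%) of the total work done by each group."""
    total = sum(rates.values())
    return {group: 100.0 * r / total for group, r in rates.items()}


shares = observed_shares(rates)
for group, pct in planned.items():
    diff = shares[group] - pct
    print(f"{group}: planned {pct:2d}%  observed {shares[group]:5.2f}%  diff {diff:+.2f} pp")
```

With these numbers, L2_GROUP2 to L2_GROUP4 are within about 1.2 percentage points of their allocations. L2_GROUP5 gets about 2.2 points less than its 39%, and L2_GROUP1 does about 1.2% of the work despite its 0% allocation.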
56. FINDINGS
• The basic overhead of RM is negligible (<2%)
– Outlier cases are possible (but rare)
• A session holding a “latch” is sent off-CPU
• A session holding a lock is sent off-CPU
– ... but only if the system is already out of resources
• Preserving OS responsiveness is valuable; this alone is a good enough reason to use RM
– For troubleshooting
– For keeping RAC alive
• Don’t create “fancy” RM plans: RM does not guarantee an exact resource distribution
• Be careful with RM on CDB/PDBs!
– Enabling it in one PDB enables it for the whole CDB
– Remember the scheduler windows: set RESOURCE_MANAGER_PLAN='FORCE:' so they can’t switch in a plan