1. BIM BIG extractions and performance monitoring
or how to prevent extractions from BIM to BIG when BIM is overloaded
June 11th, 2007
2. Summary
The same tool collects the KPIs of BIM (free memory, free
WP, CPU usage...) and calculates how many new
extractions on BIM can be started from BIG.
The GC can influence the calculated number of
extractions with the following parameters in table
YEDW_BWADMIN, maintained in BIM:
BIG_EXTRACT_BTC_MAX_ALLOW_PARA
BIG_EXTRACT_BTC_FREE_REQ_PARA
BIG_EXTRACT_DIA_FREE_REQ%
BIG_EXTRACT_DIA_FREE_REQ_PARA
BIG_EXTRACT_MAX_CPULOAD%
BIG_EXTRACT_MAX_IO_WAIT%
BIG_EXTRACT_MAX_TIME_RFC_MS
BIG_EXTRACT_MAX_TIME_CHECK_MS
3. Limiting factors A & B
YEDW_BWADMIN Parameters
A - BIG_EXTRACT_BTC_MAX_ALLOW_PARA
is the main parameter. It is the number of extractions that can
run in parallel (we count the number of BTC). The number of
extractions that can be started is the value of this parameter
minus the number of BTC used by BIG (i.e. the number of
extractions already running).
Set it to 0 to prevent any extraction. When you reduce
this parameter, immediately inform the Performance team and
the BIG project leader.
Default is 4.
B - BIG_EXTRACT_BTC_FREE_REQ_PARA
Indicates how many batch work processes (BTC WP) must be
free to kick off one additional extraction. Default is 5. It
means that if 15 BTC are free, at most 3 additional extractions
can be started (depending also on the other parameters).
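The arithmetic behind factors A and B can be sketched in Python as follows. This is a minimal illustration, not the code of the actual tool: the function and argument names are hypothetical, rounding down when the number of free BTC is not a multiple of the parameter is an assumption, and so is taking the lower of the two results.

```python
def btc_limit(max_allow_para, btc_used_by_big, btc_free, btc_free_req_para=5):
    """Extractions allowed by factors A and B (hypothetical helper)."""
    if btc_free_req_para <= 0:
        raise ValueError("BIG_EXTRACT_BTC_FREE_REQ_PARA must be positive")
    # Factor A: parameter value minus the BTC already used by BIG.
    limit_a = max(max_allow_para - btc_used_by_big, 0)
    # Factor B: one extraction per block of free BTC (assumed rounded down).
    limit_b = btc_free // btc_free_req_para
    # Assumption: the more restrictive of the two factors wins.
    return min(limit_a, limit_b)
```

With the defaults (4 and 5), 15 free BTC and 1 extraction already running, both factors allow 3 new extractions. Setting BIG_EXTRACT_BTC_MAX_ALLOW_PARA to 0 always yields 0.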
4. Limiting factors C, D, E
YEDW_BWADMIN Parameters
C - BIG_EXTRACT_DIA_FREE_REQ%
Indicates how many DIA must be free, as a percentage of the total
number of DIA, to kick off additional extractions. Default is
20. It means that if you have 100 DIA and fewer than 20 are
free, no new extraction will be started.
D - BIG_EXTRACT_DIA_FREE_REQ_PARA
Indicates how many DIA WP must be free to kick off one
additional extraction. Default is 20. It means that if 60 DIA
are free, at most 3 additional extractions can be started
(depending also on the other parameters).
E - BIG_EXTRACT_MAX_CPULOAD%
The load average tells you how many processes are trying to
use the available CPUs. This figure is averaged over a
5 min period. If the load is greater than the number of CPUs,
then jobs are queuing.
If the load, as a percentage of the total number of CPUs, is
greater than this parameter, no new extraction will be started.
Default is 80%.
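As an illustration of factors C, D and E, here is a minimal Python sketch. The function names are hypothetical and the rounding down in factor D is an assumption; the thresholds are the defaults quoted above.

```python
def dia_limit(dia_total, dia_free, free_req_pct=20, free_req_para=20):
    """Extractions allowed by factors C and D (hypothetical helper)."""
    # Factor C: block everything when the free share is below the threshold.
    if dia_total <= 0 or dia_free * 100 < free_req_pct * dia_total:
        return 0
    # Factor D: one extraction per block of free DIA (assumed rounded down).
    return dia_free // free_req_para


def cpu_allows_extraction(load_avg, nbr_cpu, max_cpuload_pct=80):
    """Factor E: True when the load, as a percentage of CPUs, is within the limit."""
    # Same test as load_avg / nbr_cpu * 100 <= max_cpuload_pct, without division.
    return load_avg * 100 <= max_cpuload_pct * nbr_cpu
```

For example, with 8 CPUs and the default of 80%, a load average above 6.4 blocks new extractions.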
5. Limiting factors F, G, H & I
YEDW_BWADMIN Parameters
F - BIG_EXTRACT_MAX_IO_WAIT%
If IO wait on the DB is higher than this limit, no new extraction is
started. Default is 50%.
G - BIG_EXTRACT_MAX_PACK_DELAY_MIN
If the tRFC queue is too long, no new extraction is started. The program
checks the age of the oldest packet in the SM58 queue with target BIG.
Default is 180 min.
This indicator was removed on Aug 14th: with the "non standard" tRFC
method, the age of the queue is counted from the beginning of the IP,
and with tRFC issues a packet sometimes remains in the queue, so the
age of the queue is meaningless.
H - BIG_EXTRACT_MAX_TIME_RFC_MS
This is the duration of a single RFC call. If the RFC call to an AS is
too slow, no new extraction is started. Default is 1000 ms.
I - BIG_EXTRACT_MAX_TIME_CHECK_MS
This is the total duration of the check program: RFC calls and the check
of all KPIs (memory, free WP...). It can be slow if BIG is slow or if BIM
is slow. Default is 6000 ms. If the duration is higher, no new extraction
is started.
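The threshold checks F, H and I can be illustrated with a short Python sketch. Factor G is left out because it was removed. The dictionary keys and the function name are hypothetical; only the parameter names in the comments and the default values come from the slides.

```python
# Factor letter -> (measurement key, default limit). Keys are hypothetical.
DEFAULT_LIMITS = {
    "F": ("io_wait_pct", 50),      # BIG_EXTRACT_MAX_IO_WAIT%
    "H": ("rfc_time_ms", 1000),    # BIG_EXTRACT_MAX_TIME_RFC_MS
    "I": ("check_time_ms", 6000),  # BIG_EXTRACT_MAX_TIME_CHECK_MS
}


def blocking_factors(measured, limits=DEFAULT_LIMITS):
    """Return the letters of the factors whose limit is exceeded."""
    return [
        letter
        for letter, (key, limit) in limits.items()
        if measured[key] > limit
    ]
```

An empty list means none of these three factors prevents a new extraction.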
6. Calculate the number of extractions to start
Use function module YGTTC_PERF_MONITOR.
2 cases: the target contains the new FM, or not. With the July 2007
import all, a FM will be delivered in BIM, and the test will be done
in less than 1 sec.
If the source does not have this FM, no extraction is started if the
check runs for more than 60 sec (no parameter for this temporary solution).
[Screenshot: how many new extractions can be started, and the limiting factor]
[Screenshot: the details: free WP...]
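The slides do not spell out how the individual factors are combined into one result. A plausible reading, sketched below in Python, is that the most restrictive factor decides and is reported as the limiting factor. This combination rule, the function name, its arguments and the "TIMEOUT" label are all assumptions; only the 60 sec fallback comes from the text above.

```python
FALLBACK_TIMEOUT_S = 60  # fixed value of the temporary solution, not a parameter


def extractions_to_start(limits, fm_in_source=True, check_seconds=0.0):
    """Return (number of extractions, limiting factor).

    `limits` maps a factor label to the number of extractions it allows.
    """
    if not limits:
        raise ValueError("at least one factor is required")
    # Without the FM in the source, a check longer than 60 sec blocks everything.
    if not fm_in_source and check_seconds > FALLBACK_TIMEOUT_S:
        return 0, "TIMEOUT"  # hypothetical label
    # Assumption: the most restrictive factor decides.
    factor = min(limits, key=limits.get)
    return max(limits[factor], 0), factor
```

For instance, if factor A allows 3 extractions, B allows 2 and D allows 3, the sketch returns 2 with B as the limiting factor.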
7. Monitoring
Report YGTTC_PERF_MONITOR must be scheduled
on BIG to capture the KPIs in the following tables:
table YGTTC_PERF_MONIT contains the history
table YGTTC_PERF_MLAST contains only the last snapshot
table /BIC/AYGTTCPM00 (ODS YGTTCPM) contains data formatted for charts
8. Fields of YGTTC_PERF_MONIT and YGTTC_PERF_MLAST
DATS & TIMS Day and time of the snapshot
SID SID
HOST Host
DB DB (Y/N)
PHYS_MEM Total Memory
FREE_MEM Memory free
SWAP_FREE Swap free
WP_DIA No of DIA (for DB, sum of all AS and DB)
WP_DIA_FREE No of free DIA (for DB, sum of all AS and DB)
WP_DIA_ITSELF No of DIA used by ITSELF (for DB, sum of all AS and DB)
WP_BTC No of BTC (for DB, sum of all AS and DB)
WP_BTC_FREE No of free BTC (for DB, sum of all AS and DB)
WP_BTC_ITSELF No of BTC used by ITSELF (for DB, sum of all AS and DB)
USR_TOTAL CPU used by user
SYS_TOTAL CPU used by system
IDLE_TRUE True Idle
WAIT_TRUE IO WAIT
NBR_CPU No of CPU on this box
LOAD_AVG CPU load average over 5 min (ST06) * 100
(divide the value by 100 and compare with the number of CPUs)
RESPTIME_MS Response time to collect this information
TIMESTAMP UTC time stamp in long form (YYYYMMDDhhmmss,mmmuuun) of the snapshot
PACKET_RECORDED Number of packets in status recorded (to be sent)
PACKET_DELAY_MIN Age of the oldest packet in status recorded = delay in RFC queue
MAX_BTC_ALLOWED Max number of BTC allowed in YEDW_BWADMIN in source
CALC_EXTRACT_PAR Calculated number of extractions that can be started in parallel
CALC_LIMIT Limiting factor in the calculated number of parallel extractions
DUMPS Number of dumps in the last 10 minutes
RESPTIME_DIA Average response time of DIA processes
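Because LOAD_AVG is stored multiplied by 100, it has to be decoded before it is compared with NBR_CPU. A minimal Python sketch of that conversion, with a hypothetical function name and result keys:

```python
def cpu_load_from_snapshot(load_avg_stored, nbr_cpu):
    """Decode LOAD_AVG (stored * 100) and compare it with NBR_CPU."""
    load = load_avg_stored / 100
    return {
        "load": load,
        # Equivalent to load / nbr_cpu * 100.
        "load_pct_of_cpu": load_avg_stored / nbr_cpu,
        # Jobs are queuing when the load exceeds the number of CPUs.
        "queuing": load > nbr_cpu,
    }
```

A stored value of 1600 on an 8 CPU box decodes to a load of 16, i.e. 200% of the CPUs, so jobs are queuing.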
9. YGTTC_PERF_M*
For each instance you have figures per AS and for the DB. On
the DB line, you have the total of all AS for the number of WP, the
detail of the tRFC queue, and the calculation of the number of new
extractions that can be started.
DB = total for all AS
read 15.44
DB = time for the complete check; result only on the DB line
AS = time of a simple tRFC
10. Report YGTTC_SM66_READ
If a performance issue due to an overloaded system has
to be investigated, report YGTTC_SM66_READ can
capture all the running jobs (SM66) in tables:
YKPISM66MASTER : header info
YKPISM66RECORD : details of the running jobs
This report runs remotely on any system and uses the
FM YS_KPI_RFC_GET_SM66 developed by FTS.
YKPISM66RECORD contains the userid, report,
memory used... for all jobs running on the remote
system. For example, if the issue is due to all memory
being used, you can see which job is using most of the
memory.
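Finding the biggest memory consumers in a captured snapshot could look like the Python sketch below. The record layout is an assumption: the actual field names of YKPISM66RECORD are not given here, so the key "memory" and the function name are hypothetical.

```python
def top_memory_jobs(records, n=3):
    """Return the n records using the most memory, largest first.

    Each record is a dict; the "memory" key is a hypothetical field name.
    """
    return sorted(records, key=lambda r: r["memory"], reverse=True)[:n]
```

Sorting the whole snapshot is enough here, since SM66 lists only the work processes running at capture time.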
11. Table /BIC/AYGTTCPM00
Contains mainly KPIs in percentage, to represent all systems on the same
baseline:
0CALDAY
0TCTTIMSTMP
0TCTSYSID
YAVLOAD - Load average of CPU of DB in percentage of total CPU. Trigger
sm66 if >80%.
YIOWAIT - IO Wait % of DB. Trigger sm66 if >30%.
YFREEDIA - Free Dia in percentage of all DIA (total all AS)
YFREESWAP - Free Swap in percentage of total swap. Minimum value of all AS
& DB of one instance
YRESPPERC - Average response time of the check report, as a percentage of
its average over one week
YRESPDIA - Average response time of DIA processes
YRESPBIB - MAX or ?? Average response time of subjob ODS activation (jobs
BIB*); they always insert the same number of rows (except the last packet)
and should always be below 200 sec. Only available since v7.
YDUMPS - Number of dumps (total all AS)
http://hqaap012.ctr.nestle.com:3401/sap/bw/BEx?cmd=ldoc&sap-language=EN&template_
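The two sm66 triggers listed above (YAVLOAD above 80%, YIOWAIT above 30%) can be expressed as a small Python check. This is only a sketch: the function name and the dict-based row are hypothetical, and how the real capture is launched is not shown.

```python
SM66_TRIGGERS = {"YAVLOAD": 80, "YIOWAIT": 30}  # thresholds in percent


def sm66_trigger_reasons(kpi_row):
    """Return the KPI names whose value exceeds its sm66 trigger threshold."""
    return [
        name
        for name, limit in SM66_TRIGGERS.items()
        if kpi_row.get(name) is not None and kpi_row[name] > limit
    ]
```

A non-empty result would be the signal to capture SM66 for that system.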
12. High level view shows OB8 CPU overloaded
24 hours : 22 hours with hourly average,
2 hours with snapshot every 10 min
15. Next indicators to capture – to discuss with FTS
Average response time of BIB (running or finished in last 10min)
locks and locks wait ?
Row scan for top 10
IP with too many packets
tablespace fill rate
BIG locks
Capture sm66 automatically if one indicator is too high
Who will be doing the monitoring? PCI BW team.
Alert to PCI team. Plasma screen (Kirkan>Shaun)? Analysis to
send the ticket to the right team. Direct link between PCI team and
SME BW or perf person, i.e. no first-level involvement? Michael?
Front page to list known issues and IM in progress, to avoid
investigation of the same issue by several persons?
Philippe to distribute names of PCI team and SME? tbc