Current situation: You’ve gone live on Oracle JD Edwards EnterpriseOne, having recently completed an upgrade or a new implementation, and now you’re noticing signs that something isn’t quite right. How do you determine what is going on with your system, and where do you start troubleshooting?
2. April 2-6, 2017 in Las Vegas, NV USA #C17LV
Topics
• 100% CPU on JAS Servers
• Network Causing JDE to Crash
• Backup Strategy Crashing JD Edwards
• Database locking and blocking causing sporadic performance issues?
• Largest Table in Database
• SQL Maintenance
• Can an index on a table improve performance?
• Add a second Web server to Production?
Symptoms
• Web server CPU is running at or near 100%.
• Users are experiencing slow and unresponsive behavior.
• Some users might not be able to log on.
• Other instances on the same server are slow as well.
Multiple Causes = Same Resolution
Cause
• User pulls a large amount of data (Data Browser, application)
• Custom code with heavy data calculations in ER
• Standard code issues – P30200, P90CG501, P4310, P04015, P40215…. (21033380, 2174901, 19995782)
• Tools release bugs (18787865, 18921015, 19711614)
• WebLogic bugs (13836819, 23094342, 16857433…)
Resolution
• Restart the JAS server
Under the Covers – Garbage Collection
CPU is running at 100% because Java garbage collection is running non-stop.
Garbage collections result in “Stop the World” events.
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
Under the Covers – Infinite Loops
CPU is running at 100% because of an infinite loop in code.
Infinite loops use as much CPU as possible, but typically do not get in the way of other requests.
[Diagram: Set condition to true → If condition true → Run code → Check condition → Unchanged → repeat]
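The loop pattern on this slide can be sketched in Python. This is an illustration only, not JDE event rules code: the function name `run_until_done` and the `max_iterations` guard are hypothetical, and the guard is just one possible way for custom code to defend against a condition that never changes.

```python
def run_until_done(step, max_iterations=1_000_000):
    """Call step() until it returns False; abort if the condition never changes."""
    condition = True              # "Set condition to true"
    iterations = 0
    while condition:              # "If condition true"
        condition = step()        # "Run code", then "check condition"
        iterations += 1
        if condition and iterations >= max_iterations:
            # Without this guard, an unchanged condition spins the CPU forever.
            raise RuntimeError("loop guard tripped: condition never changed")
    return iterations


# A step that reports "done" on its third call.
remaining = [True, True, False]
print(run_until_done(lambda: remaining.pop(0)))   # 3
```

A step that always returns True trips the guard instead of holding a CPU core at 100% until the server is restarted.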
Troubleshooting
• Read the log files
  • JAS
  • WebLogic
  • Potentially Java logging
• Identify the situation
  • What are users doing?
• Investigate known bugs and issues
  • Apps, Tools, WebLogic and Java
Resolutions
• Fix Custom Code
• Apply ESUs
• Apply Tools release
• Update/Patch WebLogic
• Update Java
Physical Network
VMware
https://pubs.vmware.com/vsphere-51/topic/com.vmware.vsphere.avail.doc/GUID-52F1BC6A-CC0D-4B1A-BDD7-5063B3AED1CE.html
Resolution - Recommendations
• Find the Metadata Kernel and verify that the disconnect is in its logs. Kill it; a new one will be started.
• Find the Security Server Kernel with the disconnects and kill it.
• All JDE servers should be on the same VLAN, subnet, and switch(es).
• Traffic between servers should never be routed.
• If the network has an issue, test JDE; do not assume it is OK.
• Simply put, the network can’t go down.
Symptoms
• JDE is running on virtual hardware.
• Users can’t log into JDE first thing in the morning (especially Monday morning), but not every Monday.
• Log files look like a network issue with lost connections.
• In Windows, the Event Viewer looks clean except for a time adjustment.
• The problem appears to happen at the end of the backups, not at the beginning or during.
Snapshot Backup Initialized
[Diagram: SAN storing C_Drive.File and D_Drive.File for Server-1, Server-2, and Server-3; two physical servers running Server-1, Server-2, and Server-3; Backup; Delta; Network]
Snapshot Removed
[Diagram: SAN storing C_Drive.File and D_Drive.File for Server-1, Server-2, and Server-3; two physical servers running Server-1, Server-2, and Server-3; Backup; Delta; Network]
Back to Normal Operations
[Diagram: SAN storing C_Drive.File and D_Drive.File for Server-1, Server-2, and Server-3; two physical servers running Server-1, Server-2, and Server-3; Backup; Network]
Resolution
• Only do backups during times of slow or no activity
• Restart JD Edwards after backups
• Use database tools to do database backups
• Web servers/BSSV/AIS rarely change – limit backups
• Enterprise Servers – PrintQueue and packages change regularly
• The Deployment Server does not affect the stability of production.
Symptoms
• Endless Spinning Wheel
• Slow responses in some applications, but fine in others.
• Batch jobs never finishing
• Batches getting backed up in queues
SQL Server – Activity Monitor
Oracle Server
Script to Identify Blockers
• select * from dba_blockers
• will show the Oracle SID(s) of the blocking sessions
You can then take that to v$session, as in:
• select username, sid, serial#, program, machine, event, seconds_in_wait from v$session where sid = NNN
• (NNN from the first query)
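Blocking often forms a chain, where the session you want is the one at the head that is not itself waiting on anyone. A minimal Python sketch of that idea follows, assuming the rows have already been fetched as (sid, blocking_session) pairs from v$session. The function name `root_blockers` is hypothetical and the sample SIDs are made up.

```python
def root_blockers(rows):
    """Return the SIDs that block others but are not blocked themselves.

    rows: iterable of (sid, blocking_session) pairs; blocking_session is
    None when the session is not waiting on anyone.
    """
    blocked_by = {sid: blocker for sid, blocker in rows if blocker is not None}
    blockers = set(blocked_by.values())
    return sorted(b for b in blockers if b not in blocked_by)


# Made-up sample: 101 blocks 202, which in turn blocks 303; 404 is idle.
sample = [(101, None), (202, 101), (303, 202), (404, None)]
print(root_blockers(sample))   # [101]
```

Dealing with the root blocker first usually releases the sessions queued behind it, so it is the natural SID to look up in v$session.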
Resolutions
• All – Determine the code that is causing the block and fix it.
• All – Stop the offending process, typically a JDE Call Object or batch job kernel
• Oracle – Kill the session (alter system kill session 'sid,serial#')
• SQL Server – Turn on RCSI (Read Committed Snapshot Isolation) and Snapshot Isolation, so readers see the last committed version of the data instead of waiting on writers' locks
• SQL Server – Use the NOLOCK option in third-party database access (this allows dirty reads)
• Select * from PRODDTA.F0101 with (NOLOCK)
Finding the Largest Tables
SQL Server
SELECT
t.NAME AS TableName,
s.Name AS SchemaName,
p.rows AS RowCounts,
SUM(a.total_pages) * 8 AS TotalSpaceKB,
SUM(a.used_pages) * 8 AS UsedSpaceKB,
(SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM
sys.tables t
INNER JOIN
sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id =
p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN
sys.schemas s ON t.schema_id = s.schema_id
WHERE
t.NAME NOT LIKE 'dt%'
AND t.is_ms_shipped = 0
AND i.OBJECT_ID > 255
GROUP BY
t.Name, s.Name, p.Rows
ORDER BY
SUM(a.used_pages) desc
From <http://stackoverflow.com/questions/7892334/get-size-of-all-tables-in-database>
Oracle
select * from (
select owner, segment_name, bytes/1024/1024 Size_Mb from
dba_segments order by bytes/1024/1024 DESC )
where rownum <= 20
From <http://www.freelists.org/post/oracle-l/Finding-top-20-large-objectstables-in-database,2>
Resolutions
• Run your JDE purge programs (F42199, F986110, F0101Z1, etc.)
• Implement archiving solutions like Essentio and Archtools
• Schedule SQL scripts to clear outdated data.
SQL Fragmentation
SELECT dbschemas.[name] as 'Schema',
dbtables.[name] as 'Table',
dbindexes.[name] as 'Index',
indexstats.avg_fragmentation_in_percent,
indexstats.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]
AND indexstats.index_id = dbindexes.index_id
WHERE indexstats.database_id = DB_ID()
ORDER BY indexstats.avg_fragmentation_in_percent desc
Rebuild vs Reorganize
• Rebuild is an offline operation unless an online rebuild is available (Enterprise Edition), and it has to finish in order to fix indexing problems
• Reorganize is always an online operation, but it only reorganizes the leaf level and may not work well with highly fragmented tables
avg_fragmentation_in_percent value    Corrective statement
> 5% and <= 30%                       ALTER INDEX REORGANIZE
> 30%                                 ALTER INDEX REBUILD WITH (ONLINE = ON)*
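The thresholds on this slide can be written as a small decision function, for example to post-process the output of the fragmentation query. This is a sketch only: the function name `corrective_statement` is hypothetical, and it returns the statement text exactly as abbreviated on the slide, so a real statement would still need the index and table names filled in.

```python
def corrective_statement(avg_fragmentation_in_percent):
    """Map a fragmentation percentage to the action listed on the slide."""
    if avg_fragmentation_in_percent > 30:
        return "ALTER INDEX REBUILD WITH (ONLINE = ON)"
    if avg_fragmentation_in_percent > 5:
        return "ALTER INDEX REORGANIZE"
    return None  # 5% or less: the slide lists no action


for pct in (3.0, 12.5, 30.0, 47.0):
    print(pct, corrective_statement(pct))
```

Note that exactly 30% falls in the reorganize band, matching the "<= 30%" boundary.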
Index added to F03B11Z1, F0411Z1, F0911Z1
“I just tested several batches and the run time improved significantly on the large batches. Batch 600000000819 and 600000000831 went from over one hour+ to less than 5 minute”
- Happy Customer
Resolution
• Identify the indexes needed using database tools and developer expertise (in other words, understand the code).
• Build the index in JD Edwards (you can do it at the database level, but you will lose it during an upgrade).
• Don’t overdo it! Indexes often help, but it is possible to create an index that has a negative impact on performance.
Symptoms
• Best practice says 50-100 users per JAS instance.
• Need to be able to bring a web server down without bringing down all of JDE.
• Want to leave the web servers up for testing but keep users out.
• Want to use an alias URL, e.g. jde.company.net
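The 50-100 users per JAS instance guideline turns into simple arithmetic. A small sketch, assuming you know your peak concurrent user count; the function name `jas_instances_needed` and the sample figure of 240 users are hypothetical.

```python
import math


def jas_instances_needed(concurrent_users, users_per_instance=100):
    """Smallest number of JAS instances that keeps each at or under the limit."""
    if users_per_instance <= 0:
        raise ValueError("users_per_instance must be positive")
    return max(1, math.ceil(concurrent_users / users_per_instance))


print(jas_instances_needed(240))        # 3
print(jas_instances_needed(240, 50))    # 5
```

This gives only the minimum for load. Being able to take one web server down without affecting users argues for at least one instance beyond it.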
Resolution
• Option 1 – Give different users different URLs and effectively load balance by picking who goes where
Web 1 Web 2
Resolution
• Option 2 – Use DNS Round Robin and have users hit different servers.
Web 1 Web 2
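The rotation behind round robin can be illustrated in a few lines of Python. This is a toy model, not a DNS implementation: the function name `round_robin` and the server names are hypothetical, and real behavior also depends on client and resolver caching.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that hands out servers in rotation."""
    return cycle(servers)


rotation = round_robin(["web1", "web2"])
assignments = [next(rotation) for _ in range(5)]
print(assignments)   # ['web1', 'web2', 'web1', 'web2', 'web1']
```

The rotation has no idea whether a server is actually up, so on its own it does not cover the case of taking one web server down.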
Resolution
• Option 3 – Use a Load Balancer
Web 1 Web 2
Load Balancer
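What a load balancer adds over plain rotation is awareness of which servers are available. A toy sketch of that idea in Python follows; the class `LoadBalancer` and its methods are hypothetical and do not represent any particular product's behavior or API.

```python
class LoadBalancer:
    """Toy balancer: rotates over servers, skipping any marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._next = 0

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def pick(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.down:
                return server
        raise RuntimeError("no web server available")


lb = LoadBalancer(["web1", "web2"])
lb.mark_down("web1")                  # e.g. taken out for patching
print([lb.pick() for _ in range(3)])  # ['web2', 'web2', 'web2']
```

Marking a server down keeps users off it while it stays up for testing, and users keep reaching JDE through the single alias URL.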
Contact us: weknowjde@terillium.com
Questions?
Editor's notes
Windows is more likely to see WebLogic 100% CPU issues.
Automatic garbage collection is the process of looking at heap memory, identifying which objects are in use and which are not, and deleting the unused objects. An in use object, or a referenced object, means that some part of your program still maintains a pointer to that object. An unused object, or unreferenced object, is no longer referenced by any part of your program. So the memory used by an unreferenced object can be reclaimed.
In a programming language like C, allocating and deallocating memory is a manual process. In Java, process of deallocating memory is handled automatically by the garbage collector. The basic process can be described as follows. - (http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html )
Infinite loops seem to have less impact on performance.
A variety of messages.
Reporting servers cause locks and blocking.
To Rebuild or Reorganize: That is the Question
First off: ‘Reorganize’ and ‘Rebuild’ are two different operations that each reduce fragmentation in an index. They work differently toward the same end. You don’t need to run both against the same index. (I sometimes find that people are doing both against every index in a maintenance plan. That’s just double the work and NOT double the fun.)
Rebuild: An index ‘rebuild’ creates a fresh, sparkling new structure for the index. If the index is disabled, rebuilding brings it back to life. You can apply a new fillfactor when you rebuild an index. If you cancel a rebuild operation midway, it must roll back (and if it’s being done offline, that can take a while).
Reorganize: This option is more lightweight. It runs through the leaf level of the index, and as it goes it fixes physical ordering of pages and also compacts pages to apply any previously set fillfactor settings. This operation is always online, and if you cancel it then it’s able to just stop where it is (it doesn’t have a giant operation to rollback).
Factors to consider:
Standard Edition rebuilds ain’t awesome. If you’ve got SQL Server Standard Edition, index rebuilds are always an offline operation. Bad news: they’re also single-threaded. (Ouch!)
Enterprise Edition rebuilds have gotchas. With SQL Server Enterprise Edition, you can specify an online rebuild — unless the index contains large object types. (This restriction is relaxed somewhat in SQL Server 2012). You can also use parallelism when creating or rebuilding an index— and that can save a whole lot of time. Even with an online rebuild, a schema modification lock (SCH-M) is needed at the time the fresh new index is put in place. This is an exclusive lock and in highly concurrent environments, getting it can be a big (blocking) problem.
There’s a bug in SQL Server 2012 Enterprise Edition Rebuilds that can cause corruption. If you’re running SQL Server 2012 SP1 – SP2, parallel online index rebuilds can cause corruption. Read about your options here.
Rebuilding partitioned tables is especially tricky. You can rebuild an entire partitioned index online– but nobody really wants to do that because they’re huge! The whole idea behind horizontal partitioning is to break data into more manageable chunks, right? Unfortunately, partition level rebuilds are offline until SQL Server 2014.
Reorganizing can be pretty cool. ‘Reorganizing’ an index is always an online op, no matter what edition of SQL Server you’re using. It doesn’t require a schema mod lock, so it can provide better concurrency. Reorganizing only defragments the leaf level of the index. On large tables it can take longer than a rebuild would take, too. But as I said above, it’s nice that you can reorganize for a while and then stop without facing a massive rollback.