Open Mic: crash, hang, monitoring
1. Open Mic on Overview of Domino server crash, hang and health check
8th August, 2013
2. Open Mic Team
Ranjit Rai – Lotus Technical Advisor
Focussing on Entire Notes Domino
Hansraj Mali – Lotus Technical Advisor
Focussing on Entire Notes Domino
Jayaval Rajendran – Lotus Technical Advisor
Focussing on Entire Notes Domino
Vinayak Tavargeri – Lotus Support Manager
Open Mic Facilitator
Sukanya Deepthi – Lotus Technical Support Engineer
Presenter
3. Agenda
➢ What is Domino server crash?
➢ Causes of Domino server crash.
➢ What is NSD? Causes of incomplete NSD
➢ What is Domino server hang/performance?
➢ Causes of Domino server hang/performance
➢ Data collection for crash
➢ Data collection for hang/performance
➢ How to monitor server health (SAI, DCT, DDM and server commands)
➢ What Domino Diagnostic Probe is and how to use it
➢ Best Practices
➢ Resources
➢ Q&A
4. Definition of Domino Server crash
➢ A crash is a controlled shutdown of the Domino processes.
➢ In most cases the shutdown handler generates an NSD so that the cause of the
crash can be identified and prevented.
➢ All processes that load the Notes runtime (nnotes.dll) are terminated by
the shutdown handler.
➢ In most cases, a descriptive message pointing to the cause of the crash can be
found in console.log.
For example:
Thread=[00E3:012T] PANIC: Insufficient Memory
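A minimal sketch of how such messages could be pulled out of a console.log, assuming the lines follow the shape of the sample above. The function name find_panics and the non-PANIC sample line are illustrative only and are not part of Domino.

```python
import re

# Pattern modelled on the sample PANIC line in the slide. Real console.log
# lines may carry extra prefixes, so search() is used rather than match().
PANIC_RE = re.compile(r"Thread=\[(?P<thread>[^\]]+)\]\s+PANIC:\s*(?P<message>.+)")


def find_panics(lines):
    """Return (thread_id, message) tuples for every PANIC line found."""
    hits = []
    for line in lines:
        m = PANIC_RE.search(line)
        if m:
            hits.append((m.group("thread"), m.group("message").strip()))
    return hits


if __name__ == "__main__":
    # The first line is made up for illustration; the second is the slide's sample.
    sample = [
        "some unrelated console output",
        "Thread=[00E3:012T] PANIC: Insufficient Memory",
    ]
    for thread, message in find_panics(sample):
        print(thread, "->", message)
```

To scan a real file, pass an open file object instead of the sample list, since the function only iterates over lines.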
5. What could cause a server to crash?
● In most cases, a defect in the code.
● Processing an uninitialized piece of data, which can cause an 'Access violation'.
● Accessing a corrupt data structure. A crash in this case may be desirable,
since continuing could cause corrupt data to be written to disk.
● Unhandled/unknown errors.
● Low resources, e.g., memory.
● The OS can simply shut down the processes (an uncontrolled shutdown) because they use
too many resources; this may not produce an NSD.
6. What is NSD?
➢ NSD - Notes System Diagnostic
➢ It is the most important diagnostic data used in troubleshooting issues such as
crashes, hangs, performance problems, memory-related problems and errors on the console.
➢ By default it is enabled in the Server document --> Basics tab --> Fault
Recovery section --> “Run NSD to Collect Diagnostic Information”.
➢ The NSD log file is created by default in the IBM_TECHNICAL_SUPPORT
directory; its name includes the date and time when the file was created.
➢ The format is as below:
nsd_<Platform>_<ServerName>_YYYY_MM_DD@HH_MM_SS.log
Note: It is recommended to leave the default settings on.
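The naming format above can be parsed to recover when an NSD was taken, which helps when matching NSDs to the time of an incident. This is a rough sketch only: the function name parse_nsd_name and the file name in the usage example are hypothetical, and it assumes the platform token itself contains no underscore.

```python
import re
from datetime import datetime

# nsd_<Platform>_<ServerName>_YYYY_MM_DD@HH_MM_SS.log
# The server name may itself contain underscores, so the timestamp is
# anchored at the end of the name and the server part takes what is left.
NSD_RE = re.compile(
    r"^nsd_(?P<platform>[^_]+)_(?P<server>.+)_"
    r"(?P<stamp>\d{4}_\d{2}_\d{2}@\d{2}_\d{2}_\d{2})\.log$",
    re.IGNORECASE,
)


def parse_nsd_name(filename):
    """Return (platform, server, created) or None if the name does not match."""
    m = NSD_RE.match(filename)
    if m is None:
        return None
    created = datetime.strptime(m.group("stamp"), "%Y_%m_%d@%H_%M_%S")
    return m.group("platform"), m.group("server"), created


if __name__ == "__main__":
    # Hypothetical file name, used only to exercise the parser.
    print(parse_nsd_name("nsd_W32I_MAIL01_2013_08_08@10_15_30.log"))
```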
7. Reasons for Incomplete NSD
- On Windows, do not click the Cancel button while NSD is running at the time of
the crash.
- On Unix, do not kill any or all processes with the nsd -kill command.
8. My server crashed, now what?
1. Send the NSD & console.log files from the IBM_TECHNICAL_SUPPORT folder to the
Support team.
2. Support matches the fatal stack against the known defects for the
release that the customer is using.
2.1 If there is a match and a coded fix exists:
→ A hotfix is provided, OR if the next MR already contains the fix, an upgrade
may be suggested and is preferred in most cases; running more than a handful of hotfixes
can be bad.
→ If no hotfix is available, a new fix request is submitted. It usually
takes a day or two to build.
2.2 If the crash is new, an extensive review/collaboration is usually carried
out to find the cause of the crash, and a means to alleviate it can be
suggested,
e.g., a crash due to corrupt data or a low-memory condition can easily be
alleviated by running maintenance on the database or by reducing memory
usage.
9. Continued..
2.3 If there is not enough data to move forward, more data is needed with some debug
ini's turned on, e.g., for memory overwrite/corruption.
This may require a few iterations of data collection with some special ini's turned on.
Support would usually be explicit about this.
3. Once a fix is provided, monitor the server after applying the fix. IMPORTANT:
Be sure to turn off any ini's that were suggested during the course of
troubleshooting.
10. Best Practices in minimizing crashes/downtimes
● Always run the latest MR.
● Run only what's needed.
● Perform periodic maintenance of the key databases.
● Enable transaction logging for large servers; this ensures the server comes
back online quickly in the event of a crash.
● If NSD takes a long time to finish, report this to Support. Support may
suggest a few parameters to speed it up based on the conditions, e.g., nsd
-nomemcheck and -nodirlist can speed up NSD generation.
11. What is hang/performance
A hang is a situation where the Domino server is still running and the Domino
console is still visible, but one or more tasks on the server are not responding to requests.
These tasks may still be active, but they do not respond. This
is also a state that sometimes occurs when computer programs do not run as
designed. Most of the time, a hang occurs due to a low-level loop or the
permanent unavailability of a resource, causing serious performance issues.
In this case the NSD is not generated automatically; it must be run manually
at the time of the issue.
12. Causes of Domino Server Hang/Performance
➢ Resource problems (insufficient resources)
➢ Third-party application conflicts
➢ Hardware problems such as:
High CPU usage
High memory usage
Slow disk I/O
Network-related issues
➢ In general, server hangs are more difficult to analyze than server crashes.
13. Data collection for Domino hang/performance
➢ Enable the debug parameters below:
debug_threadid=1
Console_log_enabled=1
debug_show_timeout=1
debug_capture_timeout=1
Server_show_performance=1
➢ Run back-to-back manual NSDs.
➢ Collect the NSDs, console log and semdebug files.
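Before collecting data it can help to confirm that the parameters listed above are actually set. The sketch below checks the text of a notes.ini for them. The helper names are hypothetical, and it assumes that setting names can be compared case-insensitively and that lines starting with a semicolon are comments.

```python
# Debug parameters from the slide, keyed in lower case for comparison.
REQUIRED = {
    "debug_threadid": "1",
    "console_log_enabled": "1",
    "debug_show_timeout": "1",
    "debug_capture_timeout": "1",
    "server_show_performance": "1",
}


def parse_ini(text):
    """Collect key=value pairs, skipping blank lines and ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def missing_debug_settings(text):
    """Return the required parameters that are absent or not set to 1."""
    current = parse_ini(text)
    return sorted(k for k, v in REQUIRED.items() if current.get(k) != v)


if __name__ == "__main__":
    sample = "debug_threadid=1\nConsole_log_enabled=1\n;debug_show_timeout=1\n"
    print(missing_debug_settings(sample))
```

Remember that these parameters should be turned off again once troubleshooting is finished.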
14. Difference between crash and hang
Crash:-
1. All Domino processes end.
2. NSD runs automatically.
Hang:-
1. Domino processes keep running, but users do not receive any response
from Domino to their requests.
2. NSD must be run manually.
15. What to do when Domino server is completely down
- When the Domino server is continuously crashing or throwing semaphores at startup,
first try the following to bring it up:
Recreate log.nsf
Recreate the mail boxes
Recreate transaction logging (if it is enabled)
- If the Domino server still does not come up, comment out the ServerTasks line in
notes.ini by putting a semicolon (;) in front of it, then load the tasks one after the other
to see which task has the issue.
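The last step can be sketched as a small helper that comments out the ServerTasks line and returns the task names, so they can then be loaded one at a time from the console. This is an illustration only: the function name is hypothetical, the sample ini content is made up, and a real notes.ini should be backed up before any change.

```python
def comment_out_server_tasks(ini_text):
    """Prefix the ServerTasks line with ';' and return (new_text, task_list)."""
    out = []
    tasks = []
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "servertasks":
            # Remember the tasks so they can be loaded manually afterwards.
            tasks = [t.strip() for t in value.split(",") if t.strip()]
            out.append(";" + line)
        else:
            out.append(line)
    return "\n".join(out), tasks


if __name__ == "__main__":
    # Made-up ini content for illustration.
    sample = "Other_Setting=1\nServerTasks=Update,Replica,Router"
    new_text, tasks = comment_out_server_tasks(sample)
    print(new_text)
    for task in tasks:
        # Console command to type after the server is up without its tasks.
        print("load " + task.lower())
```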
16. Monitoring Server health
- Server health can be checked in a few ways, as below:
Here we can see the CPU utilization of each task (the same can be seen in
Task Manager on Windows or with the topas command on Unix).
17. Continued..
- Using Domino Configuration Tuner:-
DCT pulls information from the notes.ini file, Server documents, and
Configuration documents. DCT looks at the configuration settings to see if
something's out of line. It can be downloaded for free from the link below:
http://www-01.ibm.com/support/docview.wss?uid=swg24019358&rs=0&cs=utf-8&context=SWA00&dc=D400&q1=dct
- Open it and choose the server --> name the scan --> click Run
19. Continued..
- Using Domino Domain Monitoring:-
Here we can see the status of all processes. The main thing to
check here is the availability index. If this value is very low, we may face performance
issues. We can check this value with the command "show ai" and use the
recommended value for Server_Transinfo_Range.
20. Continued..
- Each server in a cluster periodically determines its own workload based on the
response time of the requests the server has processed recently. The workload
is expressed as a number from 0 to 100, where 0 indicates a heavily loaded
server and 100 indicates a lightly loaded server. This number is called the server
availability index. As response times increase, the server availability index
decreases. The server availability index is based on the expansion factor, which
indicates the current workload on a server.
You can gauge the average Expansion Factor during busy times and use this
chart to determine a value of Server_Transinfo_Range that will yield the
approximate desired SAI. The expansion factor can be obtained from Domino
statistics.
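The chart itself is not reproduced here. As a rough illustration only, the sketch below assumes a logarithmic mapping in which an expansion factor of 1 gives an SAI of 100 and an expansion factor of 2^Server_Transinfo_Range gives an SAI of 0. Both that mapping and the default range of 6 are assumptions to be verified against the chart and the "show ai" output before being relied on; the function names are hypothetical.

```python
import math


def estimate_sai(expansion_factor, transinfo_range=6):
    """Estimate the availability index under the ASSUMED log2 mapping."""
    if expansion_factor <= 1:
        return 100.0
    sai = 100.0 * (1.0 - math.log2(expansion_factor) / transinfo_range)
    # The index is expressed on a 0..100 scale, so clamp at 0.
    return max(0.0, sai)


def range_for_target(expansion_factor, target_sai):
    """Solve the same assumed mapping for the range that yields target_sai."""
    if expansion_factor <= 1 or not 0 <= target_sai < 100:
        raise ValueError("need expansion_factor > 1 and 0 <= target_sai < 100")
    return math.log2(expansion_factor) / (1.0 - target_sai / 100.0)


if __name__ == "__main__":
    # Busy-hour expansion factor of 8, evaluated with two different ranges.
    for rng in (6, 8):
        print(rng, estimate_sai(8, rng))
```

Under this assumption, raising Server_Transinfo_Range makes the same expansion factor map to a higher availability index.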
21. Continued..
- Below we can check the statistics of a particular server. The same can be
obtained with the “sh stat” command from the Domino server console:
22. Console commands
- “Sh Server”
Here we can see how long the server has been up and running, transactions per minute,
the peak number of transactions and when it occurred, the availability index, pending/dead
mail, etc.
24. Continued..
- “Sh stat”
This returns all statistics. Here we can see the average queue length. If the
average queue length is greater than 2, there might be disk I/O
issues.
We can also see each task's CPU and memory utilization.
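The queue-length rule of thumb can be applied to saved statistics output with a few lines of script. The sketch below assumes the output consists of "Name = Value" lines and that the relevant statistic names end in AvgQueueLen. The sample names are illustrative and should be checked against the server's real output.

```python
def disks_over_queue_limit(stat_lines, limit=2.0):
    """Return {stat_name: value} for queue-length stats above the limit."""
    flagged = {}
    for line in stat_lines:
        name, sep, value = line.partition("=")
        name = name.strip()
        if not sep or not name.lower().endswith("avgqueuelen"):
            continue
        try:
            length = float(value.strip().replace(",", ""))
        except ValueError:
            continue  # Skip values that are not plain numbers.
        if length > limit:
            flagged[name] = length
    return flagged


if __name__ == "__main__":
    # Illustrative lines only; the names are assumptions.
    sample = [
        "  Platform.LogicalDisk.1.AvgQueueLen = 3.5",
        "  Platform.LogicalDisk.2.AvgQueueLen = 0.4",
        "  Server.Users = 120",
    ]
    print(disks_over_queue_limit(sample))
```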
25. Continued..
- “Sh user debug”
Here we can see how many users are connected, their idle time, network address, etc. If sessions have been idle
for a long time, we can use the parameter “server_session_timeout=xx”, where
xx is a value in minutes. This forces the server to close a session which has
been idle for the "xx" period of time and frees up the session memory used by
an otherwise idle session. 30-45 minutes is the minimum recommended setting
for this parameter.
26. Domino Diagnostic Probe
● DDP is a small Java utility provided by Lotus Support to monitor Lotus
Domino servers (Domino 8.x and above). It is intended to be used for servers
that have been intermittently slow or unresponsive. The probe is a Java
process (dbopen.jar) that runs in the background. The utility probes a
specific Domino server over time by invoking database open transactions. If
it detects a slow response it will invoke NSD to gather diagnostic data.
● How the probe works
The probe is started from a command prompt and runs as a standalone
process, not as a Domino server process. It uses the identity of the Domino
server and attempts to open a session and a database as specified by the
-database [-d] parameter every n seconds as specified by the -polling [-p]
parameter. If the time it takes to open the database exceeds the time specified
by the -threshold [-t] parameter, then the NSD program is launched to collect
diagnostic data.
Note: You must use the IBM Java that comes with Lotus Domino. The probe
has not been tested with SUN Java and is therefore not supported.
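The polling logic described above can be sketched as follows. This is not the code of dbopen.jar. It only illustrates the poll, time and compare-to-threshold cycle, with open_database and collect_diagnostics as hypothetical stand-ins for the database open and the NSD launch.

```python
import time


def run_probe(open_database, collect_diagnostics, polling, threshold,
              max_polls, sleep=time.sleep, clock=time.monotonic):
    """Time each open attempt and trigger diagnostics when it is too slow.

    polling and threshold are in seconds, mirroring the -p and -t options.
    Returns how many times diagnostics were triggered.
    """
    triggered = 0
    for _ in range(max_polls):
        started = clock()
        open_database()
        elapsed = clock() - started
        if elapsed > threshold:
            collect_diagnostics(elapsed)
            triggered += 1
        sleep(polling)
    return triggered


if __name__ == "__main__":
    # Stand-in callables: a fast "open" and a diagnostics hook that just prints.
    count = run_probe(
        open_database=lambda: None,
        collect_diagnostics=lambda secs: print("slow open:", secs),
        polling=0.01,
        threshold=3,
        max_polls=3,
    )
    print("diagnostics triggered", count, "times")
```

The clock and sleep functions are passed in as parameters so the loop can be exercised without waiting in real time.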
27. Continued..
● Options for the probe:-
The table below describes the options for the Domino Diagnostic Probe utility.
To stop the probe, you must use the quit command at the command prompt from
which you started the probe. Always use the quit command to stop the probe.
Do not use Ctrl-C or close the window without first issuing a quit command. If
you do, the java.exe process will terminate abnormally and you will see
messages on the Domino server console. In this event, the computer will need
to be restarted.
28. Continued..
● Setting up to run the probe:-
1. Create a new database of any type on the Domino server to be monitored; for
example, maildomprobe.nsf. Once created, open the database and go to File
-> Application -> Access Control. In the Access Control List, add and then
highlight the Domino server name and change the User type field to Unspecified
as shown below. Save changes.
29. Continued..
2. Copy the dbopen.jar to the Domino program directory. From a command
prompt, switch to the directory where the server’s notes.ini is located (typically
the data directory on UNIX and the program directory on Windows). Start the
probe from a command prompt using the syntax example provided below:
Windows:
jvm\bin\java -jar dbopen.jar -d maildomprobe.nsf -t 3 -p 30 -nsdoptions "-nomemcheck" -outfile C:\Domino\data\IBM_TECHNICAL_SUPPORT\DomPerfMon.txt
● For Unix and iSeries, see the link below:
http://www.ibm.com/support/docview.wss?uid=swg21429892
Note: If the Domino server becomes unresponsive for an extended period of
time, the probe will execute the NSD three times. Once the Domino server is
restarted, the probe will resume normal operation.
30. Best Practices
- If enabled, make sure transaction logging and DAOS are on different/dedicated drives
(other than the Domino data directory drive).
- Maintain sufficient free disk space on the drive where the data directory is located.
- Schedule monthly server restarts.
- Do not schedule maintenance tasks (like compact, updall, fixup) during business hours, and make
sure they finish before business hours start.
- Slow mail delivery can occur when a large group is listed in the bcc field of a
mail message. Use the notes.ini variable Disable_BCC_group_expansion=1 to
disable bcc group expansion.
31. Determining if OS resources are causing the performance issue
● AIX
- nmon
- http://www.ibm.com/developerworks/aix/library/au-analyze_aix/
● Windows
- perfmon
- http://technet.microsoft.com/en-us/library/bb490957.aspx
● Linux
- linperf
- http://sourceforge.net/projects/iperf/
● Solaris
- sarmon
- http://www.geckotechnology.com/sarmon
32. Resources
➢ Title: How to run a manual NSD for Notes/Domino on Windows
URL: http://www.ibm.com/support/docview.wss?rs=899&uid=swg21204263
➢ Title: How to run NSD manually on a Domino server for UNIX platforms
URL: http://www.ibm.com/support/docview.wss?rs=899&uid=swg21214298
➢ Title: How to automate the collection of memory dumps
URL: http://www.ibm.com/support/docview.wss?rs=899&uid=swg21104943
➢ Title: Overview of HTTP Request Logs for Domino Web server
URL: http://www.ibm.com/support/docview.wss?uid=swg27003598
➢ Title: What is a basic definition of a Domino server hang and crash?
URL: http://www.ibm.com/support/docview.wss?uid=swg21164958