3. Skill Sets of Kernel Debugging
• Key elements for kernel debugging
> Kernel source code
– http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/
> Kernel debugging tools
> System Architecture
– x86/x64/SPARC
> Programming skills
– C/Assembly/D/Shell/Awk/Sed/Perl
4. Kernel Debugging Tools
• Debug In Code
> cmn_err(9F) - Kernel version of printf(3C)
> ASSERT(9F) - Only effective in a DEBUG kernel
• In-situ kernel debuggers
> kmdb, SPARC OBP
• Run-time tracing
> DTrace, lockstat, kmem allocator, etc.
• Post-mortem debuggers
> mdb, ACT, SCAT
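The debug-only nature of ASSERT can be pictured with a user-space stand-in. This is a minimal sketch, not kernel code: SKETCH_DEBUG, SKETCH_ASSERT, sketch_assfail, sketch_is_valid_len and sketch_copy_len are all hypothetical names, and SKETCH_DEBUG merely plays the role of a DEBUG kernel build.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * User-space model of a debug-only assertion. SKETCH_DEBUG is a
 * hypothetical flag standing in for a DEBUG kernel build.
 */
#ifdef SKETCH_DEBUG
static void
sketch_assfail(const char *expr, const char *file, int line)
{
	fprintf(stderr, "assertion failed: %s, file: %s, line: %d\n",
	    expr, file, line);
	abort();	/* a debug kernel would panic at this point */
}
#define	SKETCH_ASSERT(ex) \
	((ex) ? (void)0 : sketch_assfail(#ex, __FILE__, __LINE__))
#else
/* Non-debug build: the expression is not even evaluated. */
#define	SKETCH_ASSERT(ex)	((void)0)
#endif

int validations_run = 0;	/* counts how often the check really ran */

int
sketch_is_valid_len(int len)
{
	validations_run++;
	return (len >= 0);
}

int
sketch_copy_len(int len)
{
	SKETCH_ASSERT(sketch_is_valid_len(len));
	/* Production code must still cope with bad input on its own. */
	return (len < 0 ? 0 : len);
}
```

Built without -DSKETCH_DEBUG, sketch_copy_len(-5) simply returns 0 and validations_run stays at 0, which is why an assertion should never carry side effects that the code depends on.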
5. Difficulties of Kernel Debugging...
• The problems you may encounter
> System Panic
> System hang
> Memory leaks & corruption
> Performance issues
> Any other functionality issues
• Some hot bugs found at customer sites...
> Cannot be debugged with a non-production kernel
> Cannot be debugged on mission-critical machines
> May not be deterministically reproducible
> May leave only crash dumps to work with
7. Mdb - The Modular Debugger
• Mdb targets
> User processes
> User process core files
> Live kernel, read-only, via /dev/kmem and /dev/ksyms
> Live kernel with execution control, via kmdb
> System crash dumps
> User process images inside system crash dumps
> ELF object files
> Raw data files
8. Live Kernel Debug – Read Only
• How to run it?
> mdb -k
• What can you do?
> Inspect kernel data structures and kernel pages
> /dev/kmem
– Access the kernel virtual address space, excluding memory
that is associated with an I/O device
> /dev/ksyms
– Access kernel symbols as an ELF-format image
9. Live Kernel Debug - Execution Control
• How to run it?
> mdb -K
> Boot system with kmdb loaded
– x86: "-k" option in the GRUB menu
– SPARC: "-k" or "kmdb" option in OBP
• What can you do?
> Instruction-level control of kernel threads
executing on each CPU
> Set breakpoints, single-step the kernel,
and inspect data structures in real time
11. Post-mortem Debug - Crash Dumps
• How to use it
> mdb unix.<n> vmcore.<n>
• What can you do?
> Access kernel memory pages and user process
images inside a system crash dump
> Inspect kernel/user process data structures
and kernel/user process pages
12. Post-mortem Debug - Crash Dumps
• You can get a crash dump by...
> A real panic
> Reboot with -d
> Enter kmdb, run $<systemdump
> Deadman timer
– Set snooping to 1 in /etc/system, then reboot
– Set deadman_enabled to 1 via mdb -kw
• savecore(1M) & dumpadm(1M)
Dump content: kernel pages
Dump device: /dev/dsk/c0d0s1 (swap)
Savecore directory: /var/crash/<hostname>
Savecore enabled: yes
14. Modular Debugger Basics
• Inspect memory and data structures
> addr[,b]::dump [-g sz] [-e]
> addr::dis
> addr::print type field
> ::sizeof type
> ::offsetof type field
> ::enum enumname
> addr::array [type count] [var]
> addr::list type field [var]
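In C terms, ::sizeof and ::offsetof report what sizeof and offsetof() compute for a type, and addr::list follows a link field at that offset from element to element. The following is a minimal user-space sketch of that walk, not mdb's implementation; sketch_node_t and sketch_list_walk are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure used only for illustration; not a kernel type. */
typedef struct sketch_node {
	int			sn_id;
	struct sketch_node	*sn_next;
} sketch_node_t;

/*
 * Model of "addr::list type field": record each element's address,
 * then follow the pointer stored next_off bytes into the element.
 * Stops at NULL, when the list loops back to its head, or after max
 * elements have been recorded.
 */
size_t
sketch_list_walk(const void *head, size_t next_off, const void **out,
    size_t max)
{
	const void *addr = head;
	size_t n = 0;

	while (addr != NULL && n < max) {
		out[n++] = addr;
		/* Read the link field by offset, as a debugger would. */
		memcpy(&addr, (const char *)addr + next_off, sizeof (addr));
		if (addr == head)
			break;
	}
	return (n);
}
```

Calling sketch_list_walk(&first, offsetof(sketch_node_t, sn_next), out, max) plays the role of first::list sketch_node_t sn_next: the caller supplies only a start address, a type and a field name, and the offset does the rest.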
15. Crash Dumps Analysis - Panic
• Panic procedure
> Panic messages
– Panic thread
– Trap number
– Pointer to trap frame
– CPU registers
– Backtrace
> Dump memory to dump device
> Dump CPU registers to dump device
> Reboot
> Savecore (from dump device to file system)
17. Crash Dumps Analysis – Hang
• What conditions cause hangs?
> Deadlock
> Resource exhaustion
> Hardware problems
• Debugging system hangs
> Live debugging with kmdb
> Forcing a crash dump and analyzing it with mdb
21. Dynamic Tracing Framework
• DTrace framework includes...
> Consumer programs running in userland
– dtrace(1M)/intrstat(1M)/lockstat(1M)...
> Kernel modules that provide probes to gather
tracing data
– dtrace(7D) and providers: syscall/fbt/sdt/vminfo...
> A library interface that consumer programs use
to access the DTrace facility through the dtrace(7D) driver
23. Provider
• How a provider works
> A provider represents a methodology for
instrumenting the system
> A provider covers a certain aspect of the system
> A provider makes probes available to the DTrace
framework
> DTrace informs providers when a probe is to be
enabled; the provider transfers control to DTrace
when an enabled probe is hit
• Using providers in different ways
> Watch code paths
– fbt/sdt/syscall/pid/fsinfo/io/vminfo/proc/sched, etc.
> Get statistical data
– mib/lockstat/profile/sysinfo, etc.
24. Providers
Provider   Description
lockstat   lock contention statistics and locking behavior
profile    time-based interrupt firing at a fixed, specified interval
fbt        entry to and return from most functions in the Solaris kernel
syscall    entry to and return from every system call in the system
sdt        locations that a programmer has formally designated
sysinfo    kernel statistics classified under the name sys
vminfo     kernel statistics classified under the name vm
proc       process creation and termination; sending and handling signals
sched      CPU scheduling
io         disk input and output
mib        counters in the management information bases (MIBs)
pid        entry to and return from any function in a user process
25. Running DTrace
• D scripts
> Run *.d scripts
    #!/usr/sbin/dtrace -s
    probe
    /predicate/
    {
        actions
    }
• Command line
> Run the dtrace command, see dtrace(1M)
    dtrace -n 'probe /predicate/ { actions }'
26. Probe
• provider:module:function:name
> Provider
– The instrumentation method to be used. For example,
the syscall provider is used to monitor system calls
while the io provider is used to monitor disk I/O.
> Module
– The kernel module you want to observe
> Function
– The kernel function you want to observe
> Name
– Represents the location in the function. For example,
use entry as the name to instrument the point where
the function is entered.
27. Probe
• A probe...
> is defined as a 4-attribute tuple
> can be listed with dtrace -l [-f|-i|-m|-n|-P]
> supports wildcard matching
Probe Description     Explanation
fbt::bge_intr:entry   entry into the bge_intr function
fbt::bge_*:entry      entry into any kernel function whose name starts with bge_
fbt:bge::entry        entry into any function in the bge driver
fbt:::entry           entry into any kernel function
fbt:::                all probes published by the fbt provider
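The matching rules in the table above can be modelled in user space. This is a minimal sketch under stated assumptions: POSIX fnmatch(3) stands in for DTrace's own pattern matcher, only the full four-field form of a description is handled, and sketch_probe_t, field_matches and sketch_probe_matches are hypothetical names.

```c
#include <fnmatch.h>
#include <string.h>

/* One published probe, as a provider:module:function:name tuple. */
typedef struct {
	const char *provider, *module, *function, *name;
} sketch_probe_t;

/* An empty description field matches anything; otherwise glob-match. */
static int
field_matches(const char *pattern, const char *value)
{
	return (pattern[0] == '\0' || fnmatch(pattern, value, 0) == 0);
}

/* Returns 1 if the four-field description selects the probe, else 0. */
int
sketch_probe_matches(const char *desc, const sketch_probe_t *p)
{
	char buf[256];
	char *field[4];
	char *s;
	int n = 0;

	if (strlen(desc) >= sizeof (buf))
		return (0);
	strcpy(buf, desc);

	/* Split the copy in place on ':' into exactly four fields. */
	field[n++] = buf;
	for (s = buf; *s != '\0'; s++) {
		if (*s == ':') {
			if (n == 4)
				return (0);	/* too many fields */
			*s = '\0';
			field[n++] = s + 1;
		}
	}
	if (n != 4)
		return (0);

	return (field_matches(field[0], p->provider) &&
	    field_matches(field[1], p->module) &&
	    field_matches(field[2], p->function) &&
	    field_matches(field[3], p->name));
}
```

With a probe whose tuple is fbt, bge, bge_intr, entry, every description in the table selects it, while fbt::bge_*:entry no longer does once the name field is return.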
28. Predicate
• A predicate...
> can be any D expression; the result is treated as a boolean
> when true, the actions are executed
Predicate               Explanation
cpu == 0                true if the probe fires on CPU 0
pid == 1029             true if the pid of the process that caused
                        the probe to fire is 1029
execname != "sched"     true if the process is not the scheduler
ppid != 0 && arg0 == 0  true if the parent process ID is not 0 and
                        the first argument is 0
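How a predicate gates the actions of a clause can be pictured with a small user-space model. This is only a sketch of the control flow, not of DTrace internals; sketch_ctx_t, sketch_clause_t, sketch_fire, sketch_not_sched and sketch_count are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* State visible to a clause when a probe fires (a small subset). */
typedef struct {
	int		pid;
	int		cpu;
	const char	*execname;
} sketch_ctx_t;

typedef int (*sketch_pred_fn)(const sketch_ctx_t *);
typedef void (*sketch_action_fn)(const sketch_ctx_t *, void *);

/* One clause: an optional predicate plus the action it guards. */
typedef struct {
	sketch_pred_fn		pred;	/* NULL means "always true" */
	sketch_action_fn	action;
	void			*arg;
} sketch_clause_t;

/* Called when the probe fires; returns 1 if the action ran, else 0. */
int
sketch_fire(const sketch_clause_t *c, const sketch_ctx_t *ctx)
{
	if (c->pred != NULL && !c->pred(ctx))
		return (0);	/* predicate false: skip the action */
	c->action(ctx, c->arg);
	return (1);
}

/* Models the predicate: execname != "sched" */
int
sketch_not_sched(const sketch_ctx_t *ctx)
{
	return (strcmp(ctx->execname, "sched") != 0);
}

/* Models a counting action by bumping a caller-supplied counter. */
void
sketch_count(const sketch_ctx_t *ctx, void *arg)
{
	(void) ctx;
	(*(unsigned long *)arg)++;
}
```

Firing the clause once with execname set to sched and once with any other name leaves the counter at 1, mirroring a D clause whose predicate is false for the scheduler.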
29. Action
• An Action...
> is executed when a probe fires
> falls into one of two categories
– Data recording actions / destructive actions
Action        Explanation
trace()       trace the result of a D expression
printf()      print formatted output, as with C printf()
printa()      print aggregations
ustack()      print the user stack trace
stack()       print the kernel stack trace
tracemem()    copy data from an address in memory to a buffer
breakpoint()  induce a kernel breakpoint, dropping the system into kmdb
panic()       cause a kernel panic
chill()       spin for the specified number of nanoseconds
30. Aggregation
• Aggregation syntax
> @name[ keys ] = aggfunc( args );
Function      Explanation
count()       number of times the count function is called
sum()         total value of the specified expressions
avg()         arithmetic average of the specified expressions
min()         smallest value among the specified expressions
max()         largest value among the specified expressions
lquantize()   a linear frequency distribution of the values of the
              specified expressions, sized by the specified range
quantize()    a power-of-two frequency distribution of the values
              of the specified expressions
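What these aggregating functions accumulate can be sketched in user-space C. This is an illustration of the arithmetic only, assuming an integer average and non-negative values for the power-of-two buckets; sketch_agg_t, sketch_agg_add, sketch_agg_avg and sketch_quantize_bucket are hypothetical names, not DTrace interfaces.

```c
#include <stdint.h>

/* Running state behind count(), sum(), min(), max() and avg(). */
typedef struct {
	uint64_t	count;
	int64_t		sum;
	int64_t		min;
	int64_t		max;
} sketch_agg_t;

/* Fold one more value into the aggregate. */
void
sketch_agg_add(sketch_agg_t *a, int64_t v)
{
	if (a->count == 0 || v < a->min)
		a->min = v;
	if (a->count == 0 || v > a->max)
		a->max = v;
	a->sum += v;
	a->count++;
}

/* Integer average; 0 for an empty aggregate. */
int64_t
sketch_agg_avg(const sketch_agg_t *a)
{
	return (a->count == 0 ? 0 : a->sum / (int64_t)a->count);
}

/*
 * Lower bound of the power-of-two bucket that holds v:
 * 0 for 0, otherwise the largest power of two not exceeding v.
 */
uint64_t
sketch_quantize_bucket(uint64_t v)
{
	uint64_t b = 1;

	if (v == 0)
		return (0);
	while (b <= v / 2)
		b *= 2;
	return (b);
}
```

Adding 3, 5 and 10 to a zeroed sketch_agg_t gives a count of 3, a sum of 18 and an average of 6, and the three values land in the power-of-two buckets 2, 4 and 8 respectively.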
31. Variables
> Scalar Variables
– Represent individual fixed-size data objects
> Associative Arrays
– name [ key ] = expression ;
> Thread-Local Variables
– self->[variable name]
> Clause-Local Variables
– this->[variable name]
> Built-in Variables
– pre-defined scalar global variables
> External Variables
– the backquote (`) is a scoping operator for accessing variables
that are defined in the OS, e.g. `kmem_flags
32. Built-in Variables
Type and Name          Explanation
int64_t arg0...arg9    the first 10 input arguments
cpuinfo_t *curcpu      the CPU information for the current CPU
processorid_t cpu      the CPU identifier for the current CPU
kthread_t *curthread   kthread_t address for the current kernel thread
pid_t pid              the process ID of the current process
pid_t ppid             the parent process ID of the current process
uint_t ipl             IPL on the current CPU at probe firing time
int errno              error value returned by the last system call
string execname        name passed to exec(2) to execute the process
uint64_t timestamp     a nanosecond timestamp counter; it increments
                       from an arbitrary point in the past and should only
                       be used for relative computations
uint64_t vtimestamp    a nanosecond counter of the time the current
                       thread has been running on a CPU, minus the time
                       spent in predicates and actions
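Because timestamp is only meaningful as a difference, a common pattern is to save it in a thread-local variable on entry and subtract on return. The sketch below models that pattern in user-space C11, with the counter readings passed in by the caller; sketch_self_ts, sketch_on_entry and sketch_on_return are hypothetical names.

```c
#include <stdint.h>

/* Per-thread slot, playing the role of a self-> variable in D. */
static _Thread_local uint64_t sketch_self_ts;

/* Call at function entry with a nanosecond counter reading. */
void
sketch_on_entry(uint64_t now_ns)
{
	sketch_self_ts = now_ns;
}

/*
 * Call at function return with a later reading. Yields the elapsed
 * nanoseconds, or 0 if this thread recorded no matching entry.
 */
uint64_t
sketch_on_return(uint64_t now_ns)
{
	uint64_t start = sketch_self_ts;

	if (start == 0)
		return (0);
	sketch_self_ts = 0;	/* release the slot for the next call */
	return (now_ns - start);
}
```

Only the difference between the two readings is used, so the arbitrary origin of the counter never matters, and each thread measures its own call because the slot is thread-local.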