3. Andrea Righi - andrea@betterlinux.com
What's a kernel?
● The kernel provides an abstraction layer that lets applications use the physical hardware resources
● Basic kernel facilities
● Process management
● Memory management
● Device management
● System call interface
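To make the system call interface concrete, here is a small user-space sketch (assuming Linux and glibc) that asks the kernel for the current PID through the raw syscall(2) entry point with the SYS_getpid number, instead of going through the libc wrapper getpid(). The helper name raw_getpid() is hypothetical.

```c
#define _GNU_SOURCE          /* expose syscall() in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: enter the kernel directly through the
 * system call interface, bypassing the libc getpid() wrapper. */
static pid_t raw_getpid(void)
{
    long ret = syscall(SYS_getpid);

    assert(ret > 0);         /* getpid cannot fail */
    return (pid_t)ret;
}
```

Both paths end up in the same kernel service, so raw_getpid() and getpid() return the same value in the same process.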
User space
● Good for debugging (gdb)
● Lots of user-space libraries available
● Unpredictable latency (context switch, scheduler, syscall, ...)
● Overhead
● Cannot fully interact with interrupt routines
● Cannot access certain memory addresses
● Harder to share certain features with other drivers
● Reliability: user processes can be killed upon critical system events (OOM, filesystem errors, etc.)
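The overhead mentioned above comes partly from crossing the user/kernel boundary on every system call. A rough user-space sketch (assuming Linux, glibc and CLOCK_MONOTONIC; syscall_cost_ns() is a hypothetical name) that measures the average cost of a raw getpid system call could look like this. The numbers vary widely between machines and kernel versions; the point is only that each crossing has a cost.

```c
#define _GNU_SOURCE          /* expose syscall() and clock_gettime() */
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average cost, in nanoseconds, of one raw
 * getpid system call (a user -> kernel -> user round trip). */
static double syscall_cost_ns(long iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid);          /* always enters the kernel */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                     (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / (double)iterations;
}
```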
Kernel space
● Written in C and assembly
● No ordinary debugging tools (only special ones: kgdb, UML, ...)
● Bugs can hang the entire system
● User memory is swappable, kernel memory can't be swapped out
● Kernel stack size is small (8K or 4K, depending on THREAD_SIZE_ORDER)
● Floating point is forbidden
● Userspace libraries are not available
● The Linux kernel must be portable (this is important if you plan to contribute to mainline)
● Closed-source kernel modules taint the kernel
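The stack sizes above follow from a simple formula: on x86 the kernel defines THREAD_SIZE as PAGE_SIZE << THREAD_SIZE_ORDER, so order 1 with 4K pages gives the 8K (2-page) stack and order 0 gives 4K. The user-space sketch below only reproduces that arithmetic; the function names are hypothetical, and other architectures (and newer x86_64 kernels, which use 16K stacks) may use a different order.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Mirrors the x86 kernel definition:
 *   THREAD_SIZE = PAGE_SIZE << THREAD_SIZE_ORDER                */
static size_t thread_size(size_t page_size, unsigned int order)
{
    assert(page_size > 0);
    return page_size << order;
}

/* Same computation using the page size of the running system. */
static size_t thread_size_here(unsigned int order)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0);
    return thread_size((size_t)page, order);
}
```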
Kernel problems
● Kernel panic (fatal error for the system)
● Kernel oops (non-fatal error)
● Wrong result (fatal from the user's perspective)
Kernel panic
● No recovery is possible
● Example: exception in an atomic context (e.g., an interrupt handler)
● Typically results in a system reboot (with panic=N), a blinking LED, or just a hang
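The panic=N boot parameter sets the panic timeout: with N > 0 the system reboots after N seconds, with N = 0 it waits forever, and with N < 0 it reboots immediately (the value is also visible at runtime in /proc/sys/kernel/panic). A hypothetical user-space helper that extracts it from a kernel command line, such as the contents of /proc/cmdline, might look like this; it returns the first match and deliberately ignores look-alike options such as panic_on_oops=.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look for a "panic=N" token in a kernel
 * command line. Returns 1 and stores N in *timeout if found,
 * 0 otherwise. Tokens are separated by whitespace. */
static int find_panic_timeout(const char *cmdline, long *timeout)
{
    const char *p = cmdline;

    assert(cmdline != NULL && timeout != NULL);
    for (;;) {
        while (isspace((unsigned char)*p))      /* skip separators */
            p++;
        if (*p == '\0')
            return 0;                           /* not found */
        if (strncmp(p, "panic=", 6) == 0) {
            char *end;
            long v = strtol(p + 6, &end, 10);
            if (end != p + 6 &&
                (*end == '\0' || isspace((unsigned char)*end))) {
                *timeout = v;
                return 1;
            }
        }
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;                                /* jump to next token */
    }
}
```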
Kernel oops
● A message is displayed in the log when a
recoverable error has occurred in kernel space
● Example: access to a bad address (e.g., NULL pointer dereference)
● An oops does not mean the system has crashed
● The current process is killed
● The oops message is displayed along with a register dump and a stack trace
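With CONFIG_KALLSYMS enabled, the stack-trace lines of an oops identify code as symbol+offset/size; for example, schedule_timeout+0x1a/0x50 would mean 0x1a bytes into a function that is 0x50 bytes long (a made-up entry). A hypothetical user-space helper that splits such an entry, e.g. before handing the offset to gdb or addr2line, could look like this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "symbol+0xOFF/0xSIZE" into its parts.
 * Returns 1 on success, 0 if the entry has an unexpected format. */
static int parse_trace_entry(const char *entry, char *sym, size_t symlen,
                             unsigned long *off, unsigned long *size)
{
    const char *plus;
    size_t n;

    assert(entry != NULL && sym != NULL && symlen > 0);
    plus = strchr(entry, '+');           /* end of the symbol name */
    if (plus == NULL || plus == entry)
        return 0;
    n = (size_t)(plus - entry);
    if (n >= symlen)
        return 0;                        /* name does not fit */
    memcpy(sym, entry, n);
    sym[n] = '\0';
    /* %lx also accepts the "0x" prefix printed by the kernel */
    return sscanf(plus + 1, "%lx/%lx", off, size) == 2;
}
```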
Taxonomy of kernel faults
● panic("have a nice day... ;-)")
● BUG() / BUG_ON(condition)
● exception (e.g., invalid opcode, division by zero, ...)
● memory corruption
● stack overflow/underflow
– NOTE: in kernel space the stack size is limited to 2 pages (8K on almost all architectures)
● write after free
● write to a bad address
● concurrent access without protection (locks, etc.)
● soft lockup: a CPU is locked without giving other tasks a chance to run
● hard lockup: a CPU is locked without giving other tasks or interrupts a chance to run
● hung task: a task in uninterruptible sleep (D state) doesn't get a chance to run for more than N seconds
● scheduling while atomic
● deadlock
● use of FPU registers in kernel space
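Unprotected concurrent access is easy to picture with a user-space analogue. The sketch below uses POSIX threads and a pthread mutex as a stand-in for kernel locking primitives (it is not kernel code, and the names are hypothetical): two threads increment a shared counter, and the lock makes the read-modify-write of counter++ atomic with respect to the other thread.

```c
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 100000

static long counter;                              /* shared state */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Without the lock/unlock pair, counter++ from two threads can
 * interleave and lose updates (a classic data race). */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Run two workers concurrently and return the final counter value. */
static long run_two_workers(void)
{
    pthread_t a, b;
    int rc;

    counter = 0;
    rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

With the lock in place the result is always 2 * ITERATIONS; removing it typically yields a smaller, run-dependent value.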
Useful debugging kernel options
● In the "Kernel hacking" section of the kernel configuration:
● CONFIG_KALLSYMS_ALL: print symbol names instead of addresses in kernel messages (for all symbols, not only functions)
● CONFIG_FRAME_POINTER: get more useful stack traces in case of kernel bugs
● CONFIG_DEBUG_ATOMIC_SLEEP: check for sleeping inside atomic sections (e.g., sleeping in an interrupt handler, sleeping while a spinlock is held, etc.)
● CONFIG_LOCKUP_DETECTOR: detect hard and soft lockups
● CONFIG_LOCKDEP: lock dependency engine (deadlock detection)
● CONFIG_DYNAMIC_FTRACE: enable tracing of individual functions dynamically (via debugfs, /sys/kernel/debug/tracing)
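To check whether a running distribution kernel already has these options, you can look at its .config (often installed as /boot/config-<version>, or available as /proc/config.gz when CONFIG_IKCONFIG_PROC is set). A hypothetical helper that scans the text of such a file is sketched below; it treats only "=y" as enabled and ignores "# CONFIG_FOO is not set" lines.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return 1 if `option` is set to y in the text
 * of a kernel .config file, 0 otherwise. */
static int config_is_enabled(const char *config, const char *option)
{
    const char *line = config;
    size_t len;

    assert(config != NULL && option != NULL);
    len = strlen(option);
    while (line != NULL && *line != '\0') {
        /* match "OPTION=y" exactly, not a longer option name */
        if (strncmp(line, option, len) == 0 &&
            line[len] == '=' && line[len + 1] == 'y' &&
            (line[len + 2] == '\n' || line[len + 2] == '\0'))
            return 1;
        line = strchr(line, '\n');       /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return 0;
}
```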
Debugging techniques
● blinking LED
● printk()
● procfs
● SysRq key (Documentation/sysrq.txt)
● function instrumentation (kprobes)
● dynamic ftrace (CONFIG_DYNAMIC_FTRACE)
● debugger (kgdb)
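SysRq commands can also be triggered without a keyboard by writing the command key to /proc/sysrq-trigger (root only; per Documentation/sysrq.txt this path works regardless of the /proc/sys/kernel/sysrq setting, which affects only keyboard invocation). The sketch below whitelists a few debugging-oriented keys and refuses the rest, so destructive keys such as 'c' (crash) or 'b' (reboot) cannot be sent by accident; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* A few SysRq keys that are useful for debugging
 * (see Documentation/sysrq.txt for the full list). */
static const char *sysrq_action(char key)
{
    switch (key) {
    case 'l': return "show a stack backtrace for all active CPUs";
    case 'm': return "dump current memory info";
    case 'p': return "dump current registers and flags";
    case 't': return "dump the list of current tasks and their info";
    case 'w': return "dump tasks in uninterruptible (blocked) state";
    default:  return NULL;
    }
}

/* Trigger one of the whitelisted SysRq commands.
 * Returns 0 on success, -1 on error or if the key is not allowed. */
static int sysrq_trigger(char key)
{
    FILE *f;

    assert(key != '\0');
    if (sysrq_action(key) == NULL)
        return -1;                  /* not in the whitelist above */
    f = fopen("/proc/sysrq-trigger", "w");
    if (f == NULL)
        return -1;                  /* e.g. not running as root */
    fputc(key, f);
    return fclose(f) == 0 ? 0 : -1;
}
```

The output of the triggered command goes to the kernel log, so it can be read back with dmesg.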
printk()
● Advantages
● easy to use
● needs no other system support
● Disadvantages
● requires modifying and rebuilding the kernel/modules
● no interactive debugging
printk(): levels
● printk levels
● KERN_EMERG: system is unusable
● KERN_ALERT: action must be taken immediately
● KERN_CRIT: critical condition
● KERN_ERR: error condition
● KERN_WARNING: warning condition
● KERN_NOTICE: normal but significant condition
● KERN_INFO: informational
● KERN_DEBUG: debug message
● Show kernel messages:
# dmesg
● Send all kernel messages to the console (console loglevel 8)
# echo 8 > /proc/sys/kernel/printk
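On newer kernels dmesg reads structured records from /dev/kmsg, where each record starts with a header "prefix,sequence,timestamp_usec,flags;" followed by the message text; the low 3 bits of the prefix are the printk level (0 = KERN_EMERG ... 7 = KERN_DEBUG, in the order listed above) and the remaining bits the syslog facility. The first field of /proc/sys/kernel/printk is the console loglevel, and only messages with a numerically lower level reach the console, which is why writing 8 lets every level through. The parser below is a hypothetical sketch, and the sample record in the test is illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One record as read from /dev/kmsg. */
struct kmsg_record {
    int level;                     /* 0 (KERN_EMERG) .. 7 (KERN_DEBUG) */
    int facility;                  /* 0 for kernel messages */
    unsigned long long seq;        /* record sequence number */
    unsigned long long ts_usec;    /* monotonic timestamp, microseconds */
    const char *text;              /* points into the input string */
};

/* Returns 1 if `line` looks like a /dev/kmsg record, 0 otherwise. */
static int parse_kmsg(const char *line, struct kmsg_record *r)
{
    unsigned int prefix;
    const char *semi;

    assert(line != NULL && r != NULL);
    semi = strchr(line, ';');      /* header ends at the first ';' */
    if (semi == NULL ||
        sscanf(line, "%u,%llu,%llu", &prefix, &r->seq, &r->ts_usec) != 3)
        return 0;
    r->level = (int)(prefix & 7);  /* low 3 bits: printk level */
    r->facility = (int)(prefix >> 3);
    r->text = semi + 1;
    return 1;
}

/* A message is printed on the console only if its level is lower
 * than the console loglevel. */
static int printed_on_console(int level, int console_loglevel)
{
    return level < console_loglevel;
}
```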
Kprobes (Kernel probes)
● Kprobes let you dynamically break into any kernel routine and collect debugging and performance information (CONFIG_KPROBES=y)
● You can trap almost any kernel code address, specifying a handler routine to be invoked when the breakpoint is hit
● How does it work?
● Kprobes makes a copy of the probed instruction and replaces the original instruction with a breakpoint instruction (int3 on x86)
● When the breakpoint is hit, a trap occurs, the CPU's registers are saved and control passes to the Kprobes pre-handler
● The saved instruction is executed in single-step mode
● The Kprobes post-handler is executed
● The rest of the original function is executed
Kprobes (example)
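#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>

/* Pre-handler: runs just before the probed instruction executes */
static int my_handler(struct kprobe *p, struct pt_regs *regs)
{
	/* Do something here... */
	return 0;	/* 0: continue normally (single-step the original instruction) */
}

static struct kprobe my_kp = {
	.pre_handler = my_handler,
	.symbol_name = "schedule_timeout",
};

static int __init my_kprobe_init(void)
{
	int ret;

	ret = register_kprobe(&my_kp);
	if (ret < 0) {
		printk(KERN_ERR "%s: error %d\n", __func__, ret);
		return ret;
	}
	return 0;
}

static void __exit my_kprobe_exit(void)
{
	unregister_kprobe(&my_kp);
}

module_init(my_kprobe_init);
module_exit(my_kprobe_exit);
MODULE_LICENSE("GPL");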
Dump a stack trace
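#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/sched.h>	/* current */

static const char function_name[] = "schedule_timeout";

/* NOTE: schedule_timeout() is called very often, expect a noisy log */
static int my_handler(struct kprobe *p, struct pt_regs *regs)
{
	dump_stack();
	/* x86_64 only: the first function argument is passed in %rdi */
	printk(KERN_INFO "%s called %s(%ld)\n",
	       current->comm, function_name, (long)regs->di);
	return 0;
}

static struct kprobe my_kp = {
	.pre_handler = my_handler,
	.symbol_name = function_name,
};

static int __init my_kprobe_init(void)
{
	int ret;

	ret = register_kprobe(&my_kp);
	if (ret < 0) {
		printk(KERN_ERR "%s: error %d\n", __func__, ret);
		return ret;
	}
	return 0;
}

static void __exit my_kprobe_exit(void)
{
	unregister_kprobe(&my_kp);
}

module_init(my_kprobe_init);
module_exit(my_kprobe_exit);
MODULE_LICENSE("GPL");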
Debugging workqueues
● workqueue: asynchronous process execution context
● Are kworkers going crazy (using too much CPU)?
● Work items being queued in rapid succession
● A single work item consuming a lot of CPU cycles
● How to debug?
● kernel tracepoints:
– echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
– cat /sys/kernel/debug/tracing/trace_pipe
● kworker stack trace:
– cat /proc/<PID of the offending kworker>/stack
– kworker PIDs can be found with ps aux, e.g.:
root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1]
root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2]
root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0]
root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0]
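The per-CPU kworkers in the ps output above are named kworker/<cpu>:<id>; high-priority pools add an "H" suffix and unbound pools use kworker/u<pool>:<id>. Newer kernels may also append "-<workqueue>" or "+<workqueue>" to the name shown by ps. To map an offending kworker to its CPU before reading its stack, a small parser like the hypothetical parse_kworker() below can help; it ignores any workqueue-name suffix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parsed kworker thread name. */
struct kworker_name {
    int unbound;     /* 1 for "kworker/uN:M" (unbound pool N) */
    int pool;        /* CPU number, or pool id if unbound */
    int id;          /* worker id inside the pool */
    int highpri;     /* 1 if the name carries the "H" suffix */
};

/* Returns 1 if `comm` is a kworker name, 0 otherwise. */
static int parse_kworker(const char *comm, struct kworker_name *k)
{
    int n = 0;

    assert(comm != NULL && k != NULL);
    if (strncmp(comm, "kworker/", 8) != 0)
        return 0;
    comm += 8;
    k->unbound = (*comm == 'u');
    if (k->unbound)
        comm++;
    /* %n records how many characters were consumed */
    if (sscanf(comm, "%d:%d%n", &k->pool, &k->id, &n) != 2)
        return 0;
    k->highpri = (comm[n] == 'H');
    return 1;
}
```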
References
● J. Corbet, A. Rubini, G. Kroah-Hartman: Linux Device Drivers, 3rd Edition
● Linux documentation
● http://lxr.linux.no/linux/Documentation/trace
● http://lxr.linux.no/linux/Documentation/kprobes.txt
● Linux Weekly News: http://lwn.net
Q/A
● You're very welcome!
● Twitter
● @arighi
● #bem2013