Debugging linux

Andrea Righi - andrea@betterlinux.com
Debugging techniques in the Linux kernel

Agenda
● Overview (kernel programming)
● Kernel crash taxonomy
● Debugging techniques
● Example(s)
● Q/A

What's a kernel?
● The kernel provides an abstraction layer for the
applications to use the physical hardware
resources
● Kernel basic facilities
● Process management
● Memory management
● Device management
● System call interface

User space
● Good for debugging (gdb)
● Lots of user-space libraries available
● Unpredictable latency (context switch, scheduler, syscall, ...)
● Overhead
● Cannot fully interact with interrupt routines
● Cannot access certain memory addresses
● Harder to share certain features with other drivers
● Reliability: user processes can be terminated upon critical
system events (OOM, filesystem errors, etc.)

Kernel space
● Written in C and assembly
● No conventional debugger (but see kgdb, UML, ...)
● Bugs can hang the entire system
● User memory is swappable, kernel memory can't be swapped out
● Kernel stack size is small (8K / 4K - THREAD_SIZE_ORDER)
● Floating point is forbidden
● Userspace libraries are not available
● Kernel code must be portable (important if you plan to contribute to mainline)
● Closed source kernel modules taint the kernel
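To make the stack-size bullet concrete: on x86 the kernel stack is PAGE_SIZE << THREAD_SIZE_ORDER bytes. The user-space sketch below just does that arithmetic; `thread_size()` and `stack_hog()` are hypothetical helper names, not kernel functions, and the percentage threshold is an arbitrary illustration.

```c
#include <stddef.h>

/* Kernel stack size as computed on x86: PAGE_SIZE << THREAD_SIZE_ORDER. */
size_t thread_size(size_t page_size, unsigned int order)
{
    return page_size << order;
}

/*
 * Hypothetical helper: returns 1 if a local buffer of `bytes` would take
 * more than `pct` percent of the kernel stack, 0 otherwise.
 */
int stack_hog(size_t bytes, size_t page_size, unsigned int order,
              unsigned int pct)
{
    return bytes * 100 > thread_size(page_size, order) * pct;
}
```

With 4K pages, order 1 gives the 8K stack and order 0 the 4K stack mentioned above, so a single 4K on-stack buffer would already eat half (or all) of it.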

Example kernel module
#include <linux/init.h>
#include <linux/module.h>

/* Module constructor */
static int __init hello_init(void)
{
	printk(KERN_INFO "Hello, world!\n");
	return 0;
}

/* Module destructor */
static void __exit hello_exit(void)
{
	printk(KERN_INFO "Goodbye\n");
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Andrea Righi <andrea@betterlinux.com>");
MODULE_DESCRIPTION("BetterEmbedded hello world example");

Kernel problems
● Kernel panic (fatal error for the system)
● Kernel oops (non-fatal error)
● Wrong result (fatal from user's perspective)

Kernel panic
● No recovery is possible
● Example: exception in an atomic context (e.g., an interrupt handler)
● Typically results in a system reboot (panic=N), a blinking LED,
or just a hang

[ 165.552280] general protection fault: 0000 [#1] PREEMPT SMP
[ 165.553055] Modules linked in: crashtest(O) [last unloaded: crashtest]
[ 165.553092] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 3.10.0-rc7+ #535
[ 165.553092] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 165.553092] task: ffff88003d90a2c0 ti: ffff88003d92e000 task.ti: ffff88003d92e000
[ 165.553092] RIP: 0010:[<ffffffff811ab0e5>] [<ffffffff811ab0e5>] __kmalloc_track_caller+0xd5/0x2b0
[ 165.553092] RSP: 0018:ffff88003e003988 EFLAGS: 00010206
[ 165.553092] RAX: 0000000000000000 RBX: ffff88003e1d6a20 RCX: 00000000000be841
[ 165.553092] RDX: 00000000000be801 RSI: 0000000000000000 RDI: 0000000000000001
[ 165.553092] RBP: ffff88003e0039c8 R08: 00000000001d6a20 R09: 0000000000000000
[ 165.553092] R10: 0000000000000000 R11: 0000000000000001 R12: 7878787878787878
[ 165.553092] R13: 0000000000010220 R14: 0000000000000240 R15: ffff88003d801780
[ 165.553092] FS: 0000000000000000(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
[ 165.553092] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 165.553092] CR2: 00000000081ab008 CR3: 0000000037dc8000 CR4: 00000000000006e0
[ 165.553092] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 165.553092] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 165.553092] Stack:
[ 165.553092] 00000000000be801 ffff88003d92ffd8 ffffffff8161683d ffff880034e3f300
[ 165.553092] ffff88003e003a17 0000000000000020 0000000000000240 0000000000000000
[ 165.553092] ffff88003e003a00 ffffffff8161433c ffff880034e3f300 0000000000000020
...
[ 165.553092] Call Trace:
[ 165.553092] <IRQ>
[ 165.553092] [<ffffffff8161683d>] ? __alloc_skb+0x7d/0x290
[ 165.553092] [<ffffffff8161433c>] __kmalloc_reserve.isra.52+0x3c/0xa0
[ 165.553092] [<ffffffff8161683d>] __alloc_skb+0x7d/0x290
[ 165.553092] [<ffffffff81677e5b>] tcp_send_ack+0x3b/0xf0
[ 165.553092] [<ffffffff8166ab1e>] __tcp_ack_snd_check+0x5e/0xa0
[ 165.553092] [<ffffffff81671c64>] tcp_rcv_established+0x204/0x6f0
[ 165.553092] [<ffffffff810e678e>] ? put_lock_stats.isra.26+0xe/0x40
[ 165.553092] [<ffffffff8167c681>] tcp_v4_do_rcv+0x161/0x360
[ 165.553092] [<ffffffff816fea39>] ? _raw_spin_lock_nested+0x79/0x90
[ 165.553092] [<ffffffff8167dc91>] tcp_v4_rcv+0x731/0x980
[ 165.553092] [<ffffffff810e706f>] ? __lock_is_held+0x5f/0x80
[ 165.553092] [<ffffffff816563d8>] ip_local_deliver_finish+0xc8/0x2f0
[ 165.553092] [<ffffffff8165635a>] ? ip_local_deliver_finish+0x4a/0x2f0
[ 165.553092] [<ffffffff81656e77>] ip_local_deliver+0x47/0x80
[ 165.553092] [<ffffffff81656740>] ip_rcv_finish+0x140/0x5e0
[ 165.553092] [<ffffffff816570e3>] ip_rcv+0x233/0x380
[ 165.553092] [<ffffffff81626062>] __netif_receive_skb_core+0x6a2/0x970
[ 165.553092] [<ffffffff81625a10>] ? __netif_receive_skb_core+0x50/0x970
[ 165.553092] [<ffffffff81626351>] __netif_receive_skb+0x21/0x70
[ 165.553092] [<ffffffff81626563>] netif_receive_skb+0x23/0x1f0
[ 165.553092] [<ffffffff81627448>] napi_gro_receive+0x98/0xd0
[ 165.553092] [<ffffffff81565c5a>] e1000_clean_rx_irq+0x18a/0x520
[ 165.553092] [<ffffffff81567451>] e1000_clean+0x251/0x910
[ 165.553092] [<ffffffff810e678e>] ? put_lock_stats.isra.26+0xe/0x40
[ 165.553092] [<ffffffff810e6df4>] ? lock_release_holdtime.part.27+0xd4/0x160
[ 165.553092] [<ffffffff81627015>] net_rx_action+0xd5/0x2e0
[ 165.553092] [<ffffffff81088d17>] __do_softirq+0xf7/0x420
[ 165.553092] [<ffffffff810891d5>] irq_exit+0xb5/0xc0
[ 165.553092] [<ffffffff81709303>] do_IRQ+0x63/0xd0
[ 165.553092] Code: c8 48 8b 55 c0 48 8b 81 38 e0 ff ff a8 08 0f 85 5f 01 00 00 4c 8b 23 4d 85 e4 0f 84 15 01 00 00 49 63 47 20 48 8d 4a 40 4d 8b 07 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 97 49 63
[ 165.553092] RIP [<ffffffff811ab0e5>] __kmalloc_track_caller+0xd5/0x2b0
[ 165.553092] RSP <ffff88003e003988>
[ 165.553092] ---[ end trace baac76a23c6da73c ]---
[ 165.553092] Kernel panic - not syncing: Fatal exception in interrupt
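Each frame in the trace above has the form symbol+offset/size, for example `__kmalloc_track_caller+0xd5/0x2b0`: the faulting address is 0xd5 bytes into a function that is 0x2b0 bytes long. A minimal user-space sketch for splitting such a string follows; `struct frame_sym` and `parse_frame_sym()` are illustrative names, not part of any kernel tool.

```c
#include <stdio.h>

struct frame_sym {
    char name[64];          /* function name, e.g. "tcp_send_ack" */
    unsigned long offset;   /* offset of the address inside the function */
    unsigned long size;     /* total size of the function in bytes */
};

/* Parse "symbol+0xOFF/0xSIZE"; returns 0 on success, -1 on malformed input. */
int parse_frame_sym(const char *s, struct frame_sym *out)
{
    /* %lx accepts an optional 0x prefix, so the hex fields parse directly. */
    if (sscanf(s, "%63[^+]+%lx/%lx", out->name, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}
```

The offset is what you would feed to a disassembler or to gdb (for example `list *(symbol+0xOFF)` against a vmlinux built with debug info) to find the source line.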

Kernel oops
● A message is displayed in the log when a
recoverable error has occurred in kernel space
● Example: access to a bad address (e.g., NULL pointer
dereference)
● An oops does not mean the system has crashed
● Current process is killed
● Oops message is displayed along with a registers
dump and a stack trace

[ 75.962412] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 75.963046] IP: [<ffffffffa00003c6>] procfs_write+0x2d6/0x320 [crashtest]
[ 75.963046] PGD 3a78d067 PUD 362be067 PMD 0
[ 75.963046] Oops: 0002 [#1] PREEMPT SMP
[ 75.963046] Modules linked in: crashtest(O)
[ 75.963046] CPU: 0 PID: 1587 Comm: bash Tainted: G O 3.10.0-rc7+ #535
[ 75.963046] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 75.963046] task: ffff88003a7ec580 ti: ffff8800362f6000 task.ti: ffff8800362f6000
[ 75.963046] RIP: 0010:[<ffffffffa00003c6>] [<ffffffffa00003c6>] procfs_write+0x2d6/0x320 [crashtest]
[ 75.963046] RSP: 0018:ffff8800362f7e78 EFLAGS: 00010297
[ 75.963046] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 000000000000004e
[ 75.963046] RDX: 0000000000000000 RSI: ffffffffa0000469 RDI: ffff8800362f7eaa
[ 75.963046] RBP: ffff8800362f7ee0 R08: 0000000000000000 R09: 0000000000000000
[ 75.963046] R10: ffff88003a7ec580 R11: 0000000000000000 R12: 0000000000000003
[ 75.963046] R13: 000000000000000a R14: ffff8800362f7f50 R15: 0000000000000000
[ 75.963046] FS: 0000000000000000(0000) GS:ffff88003de00000(0063) knlGS:00000000f75f76c0
[ 75.963046] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 75.963046] CR2: 0000000000000000 CR3: 0000000036209000 CR4: 00000000000006f0
[ 75.963046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 75.963046] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 75.963046] Stack:
[ 75.963046] ffffffff811b66cb 0000000000000000 0000000000000000 ffff88003a7ec580
[ 75.963046] ffff8800362f7ec8 4f49545045435845 000000000000004e 0000000000000000
[ 75.963046] 0000000000000000 00000000463b9fa0 ffff8800362fd300 000000000000000a
[ 75.963046] Call Trace:
[ 75.963046] [<ffffffff811b66cb>] ? vfs_write+0x1bb/0x1f0
[ 75.963046] [<ffffffff8121a86d>] proc_reg_write+0x3d/0x80
[ 75.963046] [<ffffffff811b65d8>] vfs_write+0xc8/0x1f0
[ 75.963046] [<ffffffff811b6ad5>] SyS_write+0x55/0xa0
[ 75.963046] [<ffffffff81708ce5>] sysenter_dispatch+0x7/0x1f
[ 75.963046] [<ffffffff813c50ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 75.963046] Code: e1 f3 6f e1 48 c7 c7 60 09 00 a0 e8 d5 f3 6f e1 e9 e2 fd ff ff c7 45 d0 78 56 34 12 e9 d6 fd ff ff e8 bf fc ff ff e9 cc fd ff ff <c7> 04 25 00 00 00 00 00 00 00 00 e9 bc fd ff ff eb fe 66 c7 07
[ 75.963046] RIP [<ffffffffa00003c6>] procfs_write+0x2d6/0x320 [crashtest]
[ 75.963046] RSP <ffff8800362f7e78>
[ 75.963046] CR2: 0000000000000000
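For a page-fault oops on x86, the number after "Oops:" is the hardware page-fault error code, a small bit field. The sketch below decodes it in user space; the `PF_*` macro names and `describe_pf_code()` are chosen for this example. It applies only to page faults, not to other exceptions such as the general protection fault shown earlier.

```c
#include <stdio.h>

/* x86 page-fault error code bits, as printed after "Oops:" */
#define PF_PROT  (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1)  /* 0: read access, 1: write access */
#define PF_USER  (1u << 2)  /* 0: kernel mode, 1: user mode */
#define PF_INSTR (1u << 4)  /* fault was an instruction fetch */

/* Write a short description of `code` into buf; returns snprintf's result. */
int describe_pf_code(unsigned int code, char *buf, size_t len)
{
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write" : "read";
    const char *cause  = (code & PF_PROT) ? "protection violation"
                                          : "page not present";
    const char *mode   = (code & PF_USER) ? "user" : "kernel";

    return snprintf(buf, len, "%s, %s, %s mode", access, cause, mode);
}
```

For the oops above, code 0002 decodes as a write to a not-present page in kernel mode, which matches a store through a NULL pointer (CR2 is 0).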

Taxonomy of kernel faults
● panic("have a nice day... ;-)")
● BUG() / BUG_ON(condition)
● exception (e.g., invalid opcode, division by zero, ...)
● memory corruption
– stack overflow/underflow (NOTE: in kernel space the stack size is limited to 2 pages, 8K on almost all architectures)
– write after free
– write to a bad address
– concurrent access without protections (locks, etc.)
● soft lockup: a CPU is locked without giving other tasks a chance to run
● hard lockup: a CPU is locked without giving other tasks or interrupts a chance to run
● hung task: task doesn't get a chance to run for more than N seconds
● scheduling while atomic
● deadlock
● use of FPU registers in kernel space

Useful debugging kernel options
● Kernel Hacking section ->
● CONFIG_KALLSYMS_ALL: print function names instead of addresses in kernel
messages
● CONFIG_FRAME_POINTER: get useful stack info in case of kernel bugs
● CONFIG_DEBUG_ATOMIC_SLEEP: enable checks for sleeping inside atomic sections
(e.g., sleeping in an interrupt handler, sleeping while holding a spinlock, etc.)
● CONFIG_LOCKUP_DETECTOR: detect hard and soft lockups
● CONFIG_LOCKDEP: lock dependency engine (deadlock detection)
● CONFIG_DYNAMIC_FTRACE: enable individual function tracing dynamically
(via debugfs, /sys/kernel/debug/tracing)
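To give an intuition for what a lock dependency engine checks, here is a toy user-space sketch of lock-order tracking: it records which lock was held when another was taken and flags an acquisition that inverts a previously seen order (the classic AB/BA pattern). This is not how CONFIG_LOCKDEP is implemented; every name below (`toy_lockdep`, `toy_acquire`, `toy_release`) is hypothetical, and lock ids are assumed to be in the range 0..MAX_LOCKS-1.

```c
#define MAX_LOCKS 8

struct toy_lockdep {
    /* before[a][b] != 0 means lock a was held while lock b was acquired */
    unsigned char before[MAX_LOCKS][MAX_LOCKS];
    int held[MAX_LOCKS];    /* ids of the locks currently held */
    int nheld;
};

/* Is there a recorded ordering chain from -> ... -> to? (depth-first search) */
static int toy_reaches(const struct toy_lockdep *ld, int from, int to,
                       unsigned char *seen)
{
    if (from == to)
        return 1;
    seen[from] = 1;
    for (int i = 0; i < MAX_LOCKS; i++)
        if (ld->before[from][i] && !seen[i] && toy_reaches(ld, i, to, seen))
            return 1;
    return 0;
}

/* Record acquisition of lock `id`; returns -1 if it inverts a known order. */
int toy_acquire(struct toy_lockdep *ld, int id)
{
    int bad = 0;

    for (int i = 0; i < ld->nheld; i++) {
        unsigned char seen[MAX_LOCKS] = {0};

        if (toy_reaches(ld, id, ld->held[i], seen))
            bad = 1;    /* `id` was already ordered before a held lock */
        else
            ld->before[ld->held[i]][id] = 1;
    }
    if (ld->nheld < MAX_LOCKS)
        ld->held[ld->nheld++] = id;
    return bad ? -1 : 0;
}

/* Forget that lock `id` is held (recorded orderings are kept). */
void toy_release(struct toy_lockdep *ld, int id)
{
    for (int i = 0; i < ld->nheld; i++)
        if (ld->held[i] == id) {
            ld->held[i] = ld->held[--ld->nheld];
            return;
        }
}
```

The point of the design is that the inversion is reported the first time the second ordering is observed, even if the two code paths never actually race and deadlock during the test run.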

Debugging techniques
● blinking LED
● printk()
● procfs
● SysRq key (Documentation/sysrq.txt)
● function instrumentation (kprobes)
● dynamic ftrace (CONFIG_DYNAMIC_FTRACE)
● debugger (kgdb)

printk()
● Advantages
● easy to use
● no need for any other system support
● Disadvantages
● requires modifying and rebuilding the kernel/modules
● no interactive debugging

printk(): levels
● printk levels
● KERN_EMERG: system is unusable
● KERN_ALERT: action must be taken immediately
● KERN_CRIT: critical condition
● KERN_ERR: error condition
● KERN_WARNING: warning condition
● KERN_NOTICE: normal but significant condition
● KERN_INFO: informational
● KERN_DEBUG: debug message
● Show kernel messages:
# dmesg
● Redirect all kernel messages to the console
# echo 8 > /proc/sys/kernel/printk
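The rule behind the `echo 8` trick: a message is shown on the console only if its level is numerically lower than the console loglevel (the first field of /proc/sys/kernel/printk). A minimal user-space sketch of that comparison follows; the `LVL_*` names are stand-ins for the numeric values behind the KERN_* prefixes, and `reaches_console()` is a hypothetical helper.

```c
/* Numeric values behind the KERN_* prefixes (0 is the most severe). */
enum log_level {
    LVL_EMERG,      /* 0 */
    LVL_ALERT,      /* 1 */
    LVL_CRIT,       /* 2 */
    LVL_ERR,        /* 3 */
    LVL_WARNING,    /* 4 */
    LVL_NOTICE,     /* 5 */
    LVL_INFO,       /* 6 */
    LVL_DEBUG       /* 7 */
};

/* Returns 1 if a message at level `msg` is printed on the console. */
int reaches_console(enum log_level msg, int console_loglevel)
{
    return (int)msg < console_loglevel;
}
```

With a console loglevel of 8 every level from 0 to 7 passes the test, which is why writing 8 sends all kernel messages, including KERN_DEBUG, to the console.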

procfs
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

/* Note: kernels >= 5.6 pass a struct proc_ops to proc_create() instead of file_operations */

/* Called when userspace reads /proc/myproc */
static int procfs_read(struct seq_file *m, void *v)
{
	...
}

/* Called when userspace writes to /proc/myproc */
static ssize_t procfs_write(struct file *file,
		const char __user *ubuf, size_t count, loff_t *pos)
{
	...
}

static int procfs_open(struct inode *inode, struct file *file)
{
	return single_open(file, procfs_read, NULL);
}

static int procfs_release(struct inode *inode, struct file *file)
{
	return 0;
}

static const struct file_operations procfs_fops = {
	.open = procfs_open,
	.read = seq_read,
	.write = procfs_write,
	.llseek = seq_lseek,
	.release = procfs_release,
};

static int __init myproc_init(void)
{
	if (!proc_create("myproc", 0666, NULL, &procfs_fops))
		return -ENOMEM;
	return 0;
}

static void __exit myproc_exit(void)
{
	remove_proc_entry("myproc", NULL);
}

Kprobes (Kernel probes)
● Kprobes let you dynamically break into any kernel routine and collect
debugging and performance information (CONFIG_KPROBES=y)
● A probe can trap at almost any kernel code address, specifying a handler
routine to be invoked when the breakpoint is hit
● How does it work?
● Make a copy of the probed instruction and replace the original instruction with a
breakpoint instruction (int3 on x86)
● When the breakpoint is hit, a trap occurs, the CPU's registers are saved and
control passes to the Kprobes pre-handler
● The saved instruction is executed in single-step mode
● The Kprobes post-handler is executed
● The rest of the original function is executed
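The steps above can be mimicked in user space with a toy model: a byte array stands in for kernel text, each byte counts as one "instruction", and a probe saves the original byte, patches in the int3 opcode (0xCC), and runs the handlers around the saved instruction when the patched byte is reached. Everything here (`toy_probe`, `toy_arm`, `toy_step`, `toy_disarm`) is hypothetical and only illustrates the sequence; real Kprobes deal with variable-length instructions, traps and single-stepping in hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3 0xCC   /* x86 breakpoint opcode */

struct toy_probe {
    uint8_t *addr;                      /* probed location in the toy text */
    uint8_t  saved;                     /* copy of the original byte */
    int      armed;
    int      hits;                      /* how many times the probe fired */
    void   (*pre)(struct toy_probe *);  /* optional pre-handler */
    void   (*post)(struct toy_probe *); /* optional post-handler */
};

/* Save the original byte and patch in the breakpoint. */
void toy_arm(struct toy_probe *p, uint8_t *addr)
{
    p->addr = addr;
    p->saved = *addr;
    *addr = INT3;
    p->armed = 1;
}

/* Restore the original byte. */
void toy_disarm(struct toy_probe *p)
{
    if (p->armed) {
        *p->addr = p->saved;
        p->armed = 0;
    }
}

/* "Execute" the byte at pc and return the opcode that effectively ran. */
uint8_t toy_step(struct toy_probe *p, const uint8_t *pc)
{
    uint8_t op = *pc;

    if (p->armed && pc == p->addr && op == INT3) {
        if (p->pre)
            p->pre(p);      /* pre-handler runs first */
        op = p->saved;      /* then the saved instruction is stepped */
        if (p->post)
            p->post(p);     /* then the post-handler */
        p->hits++;
    }
    return op;              /* execution continues with the next byte */
}
```

Stepping over the probed byte yields the original opcode, so the probed "function" behaves as before while the handlers observe every pass.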

Kprobes (example)
#include <linux/kprobes.h>
#include <linux/module.h>

/* Pre-handler: runs just before the probed instruction executes */
static int my_handler(struct kprobe *p, struct pt_regs *regs)
{
	/* Do something here... */
	return 0;
}

static struct kprobe my_kp = {
	.pre_handler = my_handler,
	.symbol_name = "schedule_timeout",
};

static int __init my_kprobe_init(void)
{
	int ret;

	ret = register_kprobe(&my_kp);
	if (ret < 0) {
		printk(KERN_INFO "%s: error %d\n", __func__, ret);
		return ret;
	}
	return 0;
}

static void __exit my_kprobe_exit(void)
{
	unregister_kprobe(&my_kp);
}

module_init(my_kprobe_init);
module_exit(my_kprobe_exit);
MODULE_LICENSE("GPL");

Dump a stack trace
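#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/sched.h>

static const char function_name[] = "schedule_timeout";

/* Pre-handler: dump the stack and log who called the probed function */
static int my_handler(struct kprobe *p, struct pt_regs *regs)
{
	dump_stack();
	/* x86-64 only: the first argument is passed in %rdi */
	printk(KERN_INFO "%s called %s(%ld)\n",
	       current->comm, function_name, (long)regs->di);
	return 0;
}

static struct kprobe my_kp = {
	.pre_handler = my_handler,
	.symbol_name = function_name,
};

static int __init my_kprobe_init(void)
{
	int ret;

	ret = register_kprobe(&my_kp);
	if (ret < 0) {
		printk(KERN_INFO "%s: error %d\n", __func__, ret);
		return ret;
	}
	return 0;
}

static void __exit my_kprobe_exit(void)
{
	unregister_kprobe(&my_kp);
}

module_init(my_kprobe_init);
module_exit(my_kprobe_exit);
MODULE_LICENSE("GPL");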

Dynamic ftrace
# mount -t debugfs none /sys/kernel/debug
# cd /sys/kernel/debug/tracing
# echo sys_nanosleep hrtimer_interrupt > set_ftrace_filter
# echo function > current_tracer
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_on
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 5/5   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          usleep-2665  [001] ....  4186.475355: sys_nanosleep <-system_call_fastpath
          <idle>-0     [001] d.h1  4186.475409: hrtimer_interrupt <-smp_apic_timer_interrupt
          usleep-2665  [001] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
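Trace entries like the ones above are easy to post-process. The following user-space sketch parses one line of the function tracer output and computes the gap between two entries in microseconds; `struct trace_entry`, `parse_trace_line()` and `delta_us()` are names made up for this example, and the parser assumes the task name itself contains no '-' character.

```c
#include <stdio.h>

struct trace_entry {
    char   task[32];    /* task name, e.g. "usleep" */
    int    pid;
    int    cpu;
    char   flags[8];    /* irqs-off / need-resched / irq / preempt-depth */
    double timestamp;   /* seconds */
    char   func[64];    /* traced function */
};

/* Parse one trace line; returns 0 on success, -1 for headers or bad input. */
int parse_trace_line(const char *line, struct trace_entry *e)
{
    int n = sscanf(line, " %31[^-]-%d [%d] %7s %lf: %63s",
                   e->task, &e->pid, &e->cpu, e->flags,
                   &e->timestamp, e->func);

    return n == 6 ? 0 : -1;
}

/* Time from entry a to a later entry b, rounded to microseconds. */
long delta_us(const struct trace_entry *a, const struct trace_entry *b)
{
    return (long)((b->timestamp - a->timestamp) * 1e6 + 0.5);
}
```

Applied to the first and last entries of the trace above, this gives the time between the `sys_nanosleep` call and the `hrtimer_interrupt` recorded for the same task.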

KGDB + QEMU
● Setting up kgdb using kvm/qemu
$ kvm -m 1024 -smp 4 -drive file=debian-6-i386.img -vnc :1 -redir tcp:5190:10.0.2.15:22 \
    -kernel /src/linux/arch/x86/boot/bzImage \
    -append "root=/dev/sda1 kgdboc=ttyS0 kgdbwait" \
    -serial pty
char device redirected to /dev/pts/3 (label serial0)
$ gdb vmlinux
(gdb) target remote /dev/pts/3

Debugging workqueues
● workqueue: asynchronous process execution context
● kworkers are going crazy (using too much CPU)?
● Something is being scheduled in rapid succession
● A single work item consumes a lot of CPU cycles
● How to debug?
● kernel tracepoints:
– echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
● kworker stack trace:
– cat /proc/THE_OFFENDING_KWORKER/stack
root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1]
root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2]
root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0]
root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0]

References
● J. Corbet, A. Rubini, G. Kroah-Hartman:
Linux Device Drivers 3rd Edition
● Linux documentation
● http://lxr.linux.no/linux/Documentation/trace
● http://lxr.linux.no/linux/Documentation/kprobes.txt
● Linux Weekly News: http://lwn.net

Q/A
● You're very welcome!
● Twitter
● @arighi
● #bem2013
