This presentation by Anton Bondarenko (Senior Software Engineer/Architect, Bosch Sensortec, Sweden) was delivered at the GlobalLogic Kharkiv Embedded Conference 2019 on July 7, 2019.
Live debugging in Linux is a good method during development, but it is not always possible. The alternative is post-mortem debugging, where the investigation is performed on a snapshot of the system. Different tools support this approach. The ‘crash’ tool is one of them, and Anton reviews it in detail in his talk. The talk covers several aspects of post-mortem analysis: data collection, processing, and comparison with other methods.
Conference materials: https://www.globallogic.com/ua/events/kharkiv-embedded-conference-2019/
1. Post-mortem debugging in Embedded Linux Systems
Anton Bondarenko
Senior Software Engineer/Architect
Bosch Sensortec
2. Topics
● Introduction
● What is post-mortem analysis?
● Why do we need post-mortem data?
● How can it be retrieved? Problems and solutions
● How can it be analyzed?
○ Crash tool
● Examples
3. Introduction
● 10+ years of Embedded Linux experience
● 4 years as a system engineer at Sony Mobile, working on the Xperia Z to Z3 generations with a focus on stability
○ Main activity was post-mortem analysis using different methods and approaches
4. Post-mortem analysis
Post-mortem analysis consists of different methods for investigating data collected at the moment the system state becomes unstable
Well-known solutions
● GDB with a core dump
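Here is a minimal sketch of the user-space side of the GDB-with-core-dump workflow, assuming a Linux/glibc target. A process can raise its own core-file size limit before it crashes. The kernel still decides where the core file is written, through /proc/sys/kernel/core_pattern. The function name enable_core_dumps is illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit (RLIMIT_CORE) to the hard limit so
 * that a fatal signal in this process can produce a core dump.
 * Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* An unprivileged process may always raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Call this early in main(). If the hard limit is non-zero, a fatal signal should then leave a core file. One example is the SIGABRT raised by abort(). You can open the result with `gdb ./app core`, where app and core stand for the real program and dump file names.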
6. Why do we want post-mortem data
Live debugging
● Focused on flow control
● Single instance
● Online on target
● System continues to evolve
● Limited scope
Post-mortem debugging
● State analysis
● Multiple processing (the same snapshot can be analyzed repeatedly)
● Offline or semi-offline
● System state is atomic
● Global scope
7. How can it be retrieved
Important rules to follow:
● Keep critical state information unmodified
● Collect as much as possible
Collection may happen:
● With a system reset, for example in the bootloader
● Without a system reset, for example the kdump approach
● In a hypervisor, as a VM dump
8. Bootloader dumper
Advantages:
● Small footprint
● Handles hardware failure cases
Disadvantages:
● Separate drivers & tools
● Requires special handling of RAM initialization
● Intermediate boot stages
[Figure: first kernel → unexpected system reset → ROM bootloader → RAM bootloader → dump saved to disk or network]
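The bootloader dumper can be sketched as follows, assuming a hypothetical dump format. Each RAM region is written to an output buffer, which stands in for the disk or network target. A small header goes in front of each region and records its physical address, its size, and a CRC-32, so a host-side tool can find and validate it. The region is only read, never written, which follows the rule of keeping critical state unmodified. The names dump_header, DUMP_MAGIC, and dump_region are illustrative and do not come from any existing bootloader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_MAGIC 0x504D5544u /* hypothetical marker: bytes "DUMP" in little-endian order */

/* Hypothetical header written in front of each saved RAM region. */
struct dump_header {
    uint32_t magic;     /* DUMP_MAGIC, lets the host tool find valid records */
    uint32_t crc32;     /* CRC-32 of the region contents */
    uint64_t phys_addr; /* physical start address of the region */
    uint64_t size;      /* region length in bytes */
};

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Table-free, which keeps the bootloader footprint small. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Copy one RAM region into out, preceded by its header. The region is only
 * read. Returns the number of bytes written, or 0 if out_cap is too small. */
static size_t dump_region(uint64_t phys_addr, const uint8_t *region, size_t size,
                          uint8_t *out, size_t out_cap)
{
    struct dump_header hdr;
    size_t total = sizeof(hdr) + size;

    assert(region != NULL && out != NULL);
    if (out_cap < total)
        return 0;

    hdr.magic = DUMP_MAGIC;
    hdr.crc32 = crc32_ieee(region, size);
    hdr.phys_addr = phys_addr;
    hdr.size = size;

    memcpy(out, &hdr, sizeof(hdr));
    memcpy(out + sizeof(hdr), region, size);
    return total;
}
```

A bitwise CRC is used instead of a lookup table. It trades speed for a smaller footprint, which matches the "small footprint" advantage listed above.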
9. KDump
Advantages:
● “Same” kernel
● Same utils
● Direct jump
Disadvantages:
● Requires more memory
● Memory reservation
● Might not work after hardware failures
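The kdump memory reservation is normally requested on the kernel command line with the crashkernel= parameter. The sketch below extracts the simple crashkernel=size[@offset] form from a command-line string, such as the contents of /proc/cmdline. It handles K/M/G suffixes in the style of the kernel's memparse(), but it is an illustration, not the kernel's own parser. The range form (for example crashkernel=512M-2G:64M,2G-:128M) is deliberately rejected. parse_size and parse_crashkernel are hypothetical helper names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse a number with an optional K/M/G suffix, e.g. "256M" -> 268435456.
 * *end is set to the first character after the parsed text. */
static uint64_t parse_size(const char *s, const char **end)
{
    char *p;
    uint64_t v = strtoull(s, &p, 0);

    switch (toupper((unsigned char)*p)) {
    case 'G': v <<= 10; /* fall through */
    case 'M': v <<= 10; /* fall through */
    case 'K': v <<= 10; p++; break;
    default: break;
    }
    *end = p;
    return v;
}

/* Find "crashkernel=<size>[@<offset>]" in a kernel command line.
 * Returns 0 and fills size/offset on success, -1 if absent or unsupported. */
static int parse_crashkernel(const char *cmdline, uint64_t *size, uint64_t *offset)
{
    const char *key = "crashkernel=";
    const char *p = cmdline;

    assert(cmdline && size && offset);
    /* Match only a whole parameter, not e.g. "foo_crashkernel=". */
    while ((p = strstr(p, key)) != NULL) {
        if (p == cmdline || p[-1] == ' ')
            break;
        p += strlen(key);
    }
    if (p == NULL)
        return -1;

    p += strlen(key);
    *size = parse_size(p, &p);
    *offset = 0;
    if (*p == '@')
        *offset = parse_size(p + 1, &p);
    /* Anything but end-of-parameter (e.g. '-' or ':') means the range form. */
    if (*size == 0 || (*p != '\0' && *p != ' '))
        return -1;
    return 0;
}
```

On a running system, you can compare the parsed value with the amount actually reserved, which is reported in /sys/kernel/kexec_crash_size.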
11. How can it be analyzed
Main requirement: OS and CPU architecture awareness
Tool examples
● Lauterbach TRACE32
● Red Hat Crash
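A kdump vmcore is an ELF core file, and its header records the machine type the dump came from. That is one place where CPU architecture awareness starts. The minimal sketch below reads that header and reports the architecture, so the matching analysis setup can be chosen. It assumes glibc's <elf.h> and a dump in the host's byte order. core_arch is a hypothetical helper name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* e_type and e_machine sit at the same offsets in both ELF classes,
 * so one code path can handle 32-bit and 64-bit dumps. */
_Static_assert(offsetof(Elf32_Ehdr, e_type) == offsetof(Elf64_Ehdr, e_type), "e_type offset");
_Static_assert(offsetof(Elf32_Ehdr, e_machine) == offsetof(Elf64_Ehdr, e_machine), "e_machine offset");

/* Return a short architecture name for an ELF core file header, or NULL
 * if buf does not start with an ELF32/ELF64 core header.
 * Assumption: the dump uses the host's byte order. */
static const char *core_arch(const unsigned char *buf, size_t len)
{
    uint16_t type, machine;

    assert(buf != NULL);
    if (len < offsetof(Elf64_Ehdr, e_machine) + sizeof(machine))
        return NULL;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return NULL;
    if (buf[EI_CLASS] != ELFCLASS32 && buf[EI_CLASS] != ELFCLASS64)
        return NULL;

    /* memcpy avoids unaligned reads from the raw buffer. */
    memcpy(&type, buf + offsetof(Elf64_Ehdr, e_type), sizeof(type));
    memcpy(&machine, buf + offsetof(Elf64_Ehdr, e_machine), sizeof(machine));
    if (type != ET_CORE)
        return NULL;

    switch (machine) {
    case EM_386:     return "x86";
    case EM_X86_64:  return "x86_64";
    case EM_ARM:     return "arm";
    case EM_AARCH64: return "arm64";
    case EM_MIPS:    return "mips";
    default:         return "unknown";
    }
}
```

The header only tells you which architecture the dump came from. Interpreting the dump still requires tools built for that architecture and a vmlinux with debug information that matches the dumped kernel.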
12. Lauterbach TRACE32
● Many supported architectures
● Requires a Linux kernel OS awareness library
● Supports scripting with its own script language
● Active maintenance
● License: Proprietary
13. Red Hat Crash Utility
● Many supported architectures (x86, ARM, ARM64, MIPS)
● Uses GDB as the core library
● Native support for the Linux kernel OS
● Active maintenance
● License: GPL
14. Crash extensions
● Native support for the plugin concept
● A few extensions are available, including a very promising one:
○ Python scripts in the Crash environment (PyScript)
● Supports symbols for the whole system: kernel + modules + user space
● Full access to OS memory
○ User-space analysis directly in the tool
○ JVM stack and state analysis
15. Linux Kernel crash
● Possible causes
○ Many different ones
● Important information
○ Access to OS memory
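Access to OS memory in a kernel dump usually starts with VMCOREINFO. This is a text note embedded in the vmcore whose KEY=value lines (such as OSRELEASE, PAGESIZE, or SYMBOL(init_uts_ns)) describe the crashed kernel. The sketch below looks up one key in that text. It assumes the note has already been extracted into a NUL-terminated string. vmcoreinfo_get is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Look up "KEY=value" in VMCOREINFO text (newline-separated lines) and copy
 * the value into out. Returns 0 on success, -1 if the key is absent or
 * out_len is too small for the value. */
static int vmcoreinfo_get(const char *info, const char *key,
                          char *out, size_t out_len)
{
    size_t key_len = strlen(key);
    const char *line = info;

    assert(info && key && out && out_len > 0);
    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        /* Require an exact key followed by '=', so "PAGE" does not match "PAGESIZE". */
        if (line_len > key_len && strncmp(line, key, key_len) == 0 &&
            line[key_len] == '=') {
            size_t val_len = line_len - key_len - 1;

            if (val_len >= out_len)
                return -1;
            memcpy(out, line + key_len + 1, val_len);
            out[val_len] = '\0';
            return 0;
        }
        if (!eol)
            break;
        line = eol + 1;
    }
    return -1;
}
```

Values such as SYMBOL(...) addresses come back as hex strings. A caller would convert them, for example with strtoull(value, NULL, 16), before using them to locate kernel data in the dump.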
20. IPC issues
● Possible causes
○ Unexpected state in a complex system
● Important information
○ All involved parts of memory (both kernel and user space)
[Figure: Android App 1 → Android Framework Manager → Android Framework Service ← Android Framework Manager ← Android App 2; failure point marked "?"]
21. Linux kernel (LK) deadlock
● Possible causes
○ Wrong handling of locks
● Important information
○ Access to lock memory
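Deadlock analysis on a dump often comes down to reading which lock each blocked task waits on and who owns that lock, then looking for a cycle. The sketch below assumes this data has already been extracted into a hypothetical waits_for array, mapping each task index to the owner of the lock it waits on. It also assumes only single-owner locks such as mutexes are involved. Each task then has at most one outgoing edge, so following the wait chain is enough to find a cycle. find_deadlock, NO_TASK, and MAX_TASKS are illustrative names.

```c
#include <assert.h>

#define MAX_TASKS 64
#define NO_TASK   (-1)

/* waits_for[t] is the task owning the lock task t is blocked on, or NO_TASK.
 * Returns a task that lies on a wait cycle (a deadlock), or NO_TASK. */
static int find_deadlock(const int *waits_for, int ntasks)
{
    int state[MAX_TASKS] = {0}; /* 0 = unseen, 1 = on current walk, 2 = finished */

    assert(ntasks >= 0 && ntasks <= MAX_TASKS);
    for (int start = 0; start < ntasks; start++) {
        int t = start;

        /* Follow the wait chain until it ends or reaches an already-seen task. */
        while (t != NO_TASK && state[t] == 0) {
            state[t] = 1;
            t = waits_for[t];
        }
        if (t != NO_TASK && state[t] == 1)
            return t; /* returned to a task on this walk: cycle found */

        /* No cycle on this walk: mark its tasks finished so later walks stop early. */
        for (int u = start; u != NO_TASK && state[u] == 1; u = waits_for[u])
            state[u] = 2;
    }
    return NO_TASK;
}
```

A task waiting on a lock it already holds shows up as a cycle of length one. The same check therefore catches recursive locking as well as the classic ABBA lock-ordering problem.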
22. Watchdog
● Possible causes
○ Interrupt handling
○ Hardware errors
○ Memory corruption
● Important information
○ CPU register state
○ Special traces and logging
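One common approach to special traces and logging is a log buffer in a RAM area that survives the watchdog reset. The messages written just before the hang can then be recovered afterwards. The Linux pstore/ramoops facility is built on the same idea. The sketch below uses a hypothetical layout (trace_area, TRACE_MAGIC, a 256-byte ring) and simulates the reserved RAM with an ordinary object. It is not the ramoops format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRACE_MAGIC 0x54524345u /* hypothetical "contents valid" marker */
#define TRACE_SIZE  256u        /* power of two, so head wraps consistently */

/* Hypothetical log area placed in RAM that is not cleared across a reset. */
struct trace_area {
    uint32_t magic;         /* TRACE_MAGIC when the contents are valid */
    uint32_t head;          /* total bytes written so far */
    char     buf[TRACE_SIZE];
};

/* Attach after boot. Returns 1 if the area already held valid data from
 * before the reset, 0 if it had to be initialised. */
static int trace_attach(struct trace_area *t)
{
    assert(t != NULL);
    if (t->magic == TRACE_MAGIC)
        return 1;
    memset(t, 0, sizeof(*t));
    t->magic = TRACE_MAGIC;
    return 0;
}

/* Append a message; the oldest bytes are overwritten once the ring is full. */
static void trace_write(struct trace_area *t, const char *msg)
{
    for (const char *p = msg; *p; p++)
        t->buf[t->head++ % TRACE_SIZE] = *p;
}

/* Copy the retained log, oldest byte first, into out and NUL-terminate it.
 * out must hold at least TRACE_SIZE + 1 bytes. Returns the length copied. */
static size_t trace_read(const struct trace_area *t, char *out)
{
    size_t len = t->head < TRACE_SIZE ? t->head : TRACE_SIZE;
    size_t start = t->head - len;

    for (size_t i = 0; i < len; i++)
        out[i] = t->buf[(start + i) % TRACE_SIZE];
    out[len] = '\0';
    return len;
}
```

In a real system, the structure would sit at a fixed address that neither the bootloader nor the kernel clears during boot. Caches would also need to be flushed, or the area mapped uncached, so the data actually reaches RAM before the reset.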