1. Post-mortem debugging in Embedded Linux Systems
Anton Bondarenko
Senior Software Engineer/Architect
Bosch Sensortec
2. Topics
● Introduction
● What is post-mortem analysis?
● Why do we need post-mortem data?
● How can it be retrieved? Problems and solutions
● How can it be analyzed?
○ Crash tool
● Examples
3. Introduction
● 10+ years of Embedded Linux experience
● 4 years as a system engineer at Sony Mobile, working on the Xperia Z to Z3 generations with a focus on stability
○ Main activity was post-mortem analysis using different methods and approaches
4. Post-mortem analysis
Post-mortem analysis consists of different methods for investigating data collected at the moment the system state becomes unstable
Well-known solutions
● GDB with coredump
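A core dump on Linux is an ELF file whose header type is ET_CORE, which is how tools recognize it before loading. The following is a minimal sketch, not part of the talk, of that check; the function name `is_elf_core` is hypothetical, and only the first 18 bytes of the file are inspected.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4  # e_type value for core files in the ELF specification


def is_elf_core(header: bytes) -> bool:
    """Return True if the bytes start with an ELF header of type ET_CORE."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return False
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        return False
    # e_type is a 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return e_type == ET_CORE
```

A typical use would be to read the first 18 bytes of a candidate dump file in binary mode and pass them to `is_elf_core` before handing the file to GDB.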
6. Why do we want post-mortem data
Live debugging
● Focused on flow control
● Single instance
● Online on target
● System continues to evolve
● Limited scope
Post-mortem debugging
● State analysis
● Multiple processing
● Offline or semi-offline
● System state is atomic
● Global scope
7. How can it be retrieved
Important rules to follow:
● Keep critical state information unmodified
● Collect as much as possible
Collection may happen:
● With a system reset, for example in the bootloader
● Without a system reset, for example the kdump approach
● In the hypervisor, as a VM dump
8. Bootloader dumper
Advantages:
● Small footprint
● Handles hardware failure cases
Disadvantages:
● Separate drivers & tools
● Requires special handling for RAM initialization
● Intermediate boot stages
[Diagram: First kernel → Unexpected system reset → ROM bootloader → RAM bootloader → Disk / Network]
9. KDump
Advantages:
● “Same” kernel
● Same utils
● Direct jump
Disadvantages:
● Requires more memory
● Memory reservation
● Might not work after hardware failures
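The memory reservation mentioned above is normally requested with the `crashkernel=` kernel boot parameter. As a hedged illustration, not taken from the talk, the sketch below parses only the simple `crashkernel=size[@offset]` form with decimal numbers and K/M/G suffixes; the function name `parse_crashkernel` is hypothetical, and range-based or hexadecimal forms are deliberately reported as unparsed.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Matches only the simple form: crashkernel=<size>[KMG][@<offset>[KMG]]
_SIMPLE = re.compile(
    r"(?:^|\s)crashkernel=(\d+)([KMG]?)(?:@(\d+)([KMG]?))?(?=\s|$)",
    re.IGNORECASE,
)


def parse_crashkernel(cmdline: str):
    """Return (size_bytes, offset_bytes or None), or None if not found."""
    match = _SIMPLE.search(cmdline)
    if match is None:
        return None
    size = int(match.group(1)) * _UNITS[match.group(2).upper()]
    offset = None
    if match.group(3) is not None:
        offset = int(match.group(3)) * _UNITS[match.group(4).upper()]
    return size, offset
```

On a running target the input string would usually come from `/proc/cmdline`; a result of `None` suggests that no reservation in the simple form was requested.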
11. How can it be analyzed
Main requirement: OS and CPU architecture awareness
Tool Examples
● Lauterbach TRACE32
● Red Hat Crash
12. Lauterbach TRACE32
● Many supported architectures
● Requires a Linux kernel OS awareness library
● Supports scripting with its own script language
● Active maintenance
● License: Proprietary
13. Red Hat Crash Utility
● Many supported architectures (x86, ARM, ARM64, MIPS)
● Uses GDB as its core library
● Native support for the Linux kernel OS
● Active maintenance
● License: GPL
14. Crash extensions
● Native support for plugins
● A few are available, including a very promising one
○ Python scripts in the Crash environment (PyScript)
● Supports symbols for the whole system: kernel + modules + userspace
● Full access to OS memory
○ User-space analysis directly in the tool
○ JVM stack and state analysis
15. Linux Kernel crash
● Possible causes
○ Many different ones
● Important information
○ Access to OS memory
20. IPC issues
● Possible causes
○ Unexpected state in complex system
● Important information
○ All involved parts of memory (both kernel and userspace)
[Diagram: Android App 1, Android Framework Manager, Android Framework Service, Android Framework Manager, Android App 2, with a "?" marker]
21. Linux kernel deadlock
● Possible causes
○ Wrong handling of locks
● Important information
○ Access to lock memory
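Once lock memory is accessible, a deadlock can be confirmed by checking whether the blocked tasks wait on each other in a cycle. The following is a minimal sketch under stated assumptions: the two dictionaries stand in for data that would be read from lock structures in the dump, the function name `find_deadlock_cycle` is hypothetical, and no specific kernel lock type is modeled.

```python
def find_deadlock_cycle(lock_owner, waiting_for):
    """Return a list of tasks that form a wait cycle, or None.

    lock_owner:  dict mapping a lock name to the task holding it
    waiting_for: dict mapping a blocked task to the lock it waits on
    """
    for start in waiting_for:
        path = []
        seen = set()
        task = start
        # Follow the chain: task -> lock it waits on -> owner of that lock.
        while task in waiting_for and task not in seen:
            seen.add(task)
            path.append(task)
            task = lock_owner.get(waiting_for[task])
        if task in seen:
            # The chain returned to a task already visited: a cycle.
            return path[path.index(task):]
    return None
```

For example, with `lock_owner = {"A": "t1", "B": "t2"}` and `waiting_for = {"t1": "B", "t2": "A"}`, the two tasks hold one lock each while waiting for the other, so the function reports both as a cycle.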
22. Watchdog
● Possible causes
○ Interrupt handling
○ Hardware errors
○ Memory corruption
● Important information
○ CPU register state
○ Special traces and logging