2. AGENDA
• Embedded Software Development
Compilation Tools
Linking and Libraries
Target Platforms
AAETC5v00
Developing Code for ARM 2
Debug
Invasive Debug
Non-Invasive Debug
Performance Monitoring
3. BUILDING EMBEDDED SOFTWARE
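[Figure: build flow. .c files are compiled and .s files are assembled into .o objects; the librarian optionally collects objects into a .lib; the linker produces an .axf image, which goes to the target system, optionally after binary conversion]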
4. AGENDA
Embedded Software Development
• Compilation Tools
Linking and Libraries
Target Platforms
Debug
Invasive Debug
Non-Invasive Debug
Performance Monitoring
5. THE COMPILER
• Set optimization level appropriately
– In general, increasing optimization level reduces debug visibility
• For the ARM compiler…
-O0 : best debug view, restricted optimization
-O1 : most optimizations, good debug view
-O2 : full optimization (the default), limited debug view
-O3 : maximum optimization, more aggressive than -O2
• Most compilers allow you to optimize for either code size or execution speed
– For the ARM compiler…
-Otime / -Ospace
• It is vital to specify the target processor or architecture
• For the ARM compiler…
• Specify architecture: --cpu 5TE
• Specify processor: --cpu Cortex-A9
– Be as specific as possible to enable maximum optimization
– Make sure you specify other features of the target platform, e.g.
• Unaligned access: --no_unaligned_access
• Floating point support: --fpu=vfpv3_d16
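When unaligned accesses are disabled (for example with --no_unaligned_access), code that reads multi-byte values from arbitrary byte offsets has to build them from single bytes. The following is a minimal sketch in portable C; the helper name load_u32_le is illustrative and not part of any ARM library.

```c
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit value from a buffer
 * that may start at any byte offset. Only byte loads are used, so the
 * code does not depend on hardware support for unaligned access. */
uint32_t load_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Calling load_u32_le(buf + 1) is safe even though buf + 1 is not word-aligned.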
6. VARIABLE TYPES
• An ABI-compliant ARM compiler supports these basic types:
int / long 32-bit (word) integer
short 16-bit (half-word) integer
char 8-bit byte, unsigned by default
long long 64-bit integer
float 32-bit single-precision IEEE floating point
double 64-bit double-precision IEEE floating point
bool 8-bit Boolean (C++ only)
wchar_t 16-bit “wide character” type (C++ only)
Pointers 32-bit integer addresses
• Take care when porting legacy code from other vendors’ architectures
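Two common porting pitfalls follow from the table above: plain char is unsigned by default, and int, long and pointer sizes may differ from the original platform. The sketch below shows one way to write code that does not depend on either; the function and struct names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 when plain char is unsigned, as it is by default with the
 * ARM compiler, and 0 when it is signed, as on many other toolchains. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Hypothetical helper: interpret a raw byte as a signed value in the
 * range -128..127 without relying on the signedness of plain char. */
int byte_to_signed(uint8_t raw)
{
    return (raw >= 128) ? (int)raw - 256 : (int)raw;
}

/* Fixed-width types keep field sizes the same on every architecture,
 * whereas int, long and pointers can change size between platforms. */
struct sample_record {
    int32_t  reading;   /* always 32 bits */
    uint16_t channel;   /* always 16 bits */
    uint8_t  flags;     /* always 8 bits  */
};
```

Legacy code that stores negative values in a plain char is the usual place where behavior changes after porting.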
7. INSTRUCTION SET SELECTION
• ARMv7-AR processors support two instruction sets
• ARM
– Use for critical functions which perform better with access to the whole register
set and all instruction features
– For the ARM compiler…
--arm or --arm_only
• Thumb with Thumb-2 extensions
– Use for the majority of compiled code
– For the ARM compiler…
--thumb
• Some compilers support #pragmas for selecting instruction set on a per-function basis
– For the ARM compiler…
#pragma arm or #pragma thumb
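As a sketch of per-function selection with the pragmas named above: the guard on __CC_ARM (a macro predefined by the ARM compiler) is an addition so that other compilers simply skip the pragmas, and both function names are made up for illustration.

```c
/* The pragmas are only seen by the ARM compiler (armcc defines __CC_ARM);
 * other compilers build both functions for their default instruction set. */
#if defined(__CC_ARM)
#pragma arm      /* functions that follow are compiled as ARM code */
#endif
int sum_of_squares(const int *v, unsigned int n)
{
    int acc = 0;
    unsigned int i;
    for (i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}

#if defined(__CC_ARM)
#pragma thumb    /* functions that follow are compiled as Thumb code */
#endif
int clamp_to_byte(int x)
{
    if (x < 0)   return 0;
    if (x > 255) return 255;
    return x;
}
```

Here the inner loop is assumed to be the performance-critical part, so it is built as ARM code, while the rest stays in Thumb.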
8. INTRINSICS
• C/C++ are suited to a wide variety of tasks but do not provide built-in
support for specific areas of application, e.g. DSP operations
• Most compilers support various families of intrinsics
– Instruction intrinsics for realizing ARM instructions from your C/C++ code
• Generic intrinsics: __current_pc, __current_sp,
__return_address, ...
• IRQ/FIQ intrinsics: __disable_irq, __enable_irq, ...
• Optimization barriers: __schedule_barrier, __force_stores, ...
• Native instructions: __pld, __ldrex, __isb, __dsb,...
• DSP intrinsics: __clz, __fabs, __sqrt, ...
– Named register variables (e.g., register int cpsr __asm("CPSR"))
– NEON intrinsics for use with the NEON instruction set to access NEON features (in
arm_neon.h)
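An intrinsic is typically wrapped so that the same source still builds with other compilers. The sketch below uses the __clz intrinsic listed above when built with the ARM compiler and falls back to a plain loop elsewhere; the wrapper name count_leading_zeros is an assumption for illustration.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value; returns 32 for an input of 0. */
unsigned int count_leading_zeros(uint32_t x)
{
#if defined(__CC_ARM)
    return __clz(x);             /* armcc intrinsic, maps to the CLZ instruction */
#else
    unsigned int n = 0;          /* portable fallback for other compilers */
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```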
9. AUTOMATIC VECTORIZATION
void add_int(int * restrict pa, int * restrict pb,
unsigned int n, int x)
{
unsigned int i;
for(i = 0; i < (n & ~3); i++)
pa[i] = pb[i] + x;
}
Compiled with: armcc --cpu=Cortex-A8 -O3 -Otime
add_int PROC
BICS r12,r2,#3
BEQ |L0.36|
VDUP.32 q1,r3
LSR r2,r2,#2
|L0.16|
VLD1.32 {d0,d1},[r1]!
SUBS r2,r2,#1
VADD.I32 q0,q0,q1
VST1.32 {d0,d1},[r0]!
BNE |L0.16|
|L0.36|
BX lr
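To make the generated code easier to follow, here is a plain C model of what the vectorized loop does. It is a reading aid written for this purpose, not compiler output, and the name add_int_by4 is hypothetical.

```c
/* Plain C model of the vectorized loop: like the listing, it handles four
 * elements per iteration (one quadword register holds four 32-bit ints). */
void add_int_by4(int *pa, const int *pb, unsigned int n, int x)
{
    unsigned int blocks = n >> 2;   /* LSR r2,r2,#2: number of 4-element blocks */
    while (blocks-- != 0) {
        /* VLD1.32 / VADD.I32 / VST1.32 operate on all four lanes at once */
        pa[0] = pb[0] + x;
        pa[1] = pb[1] + x;
        pa[2] = pb[2] + x;
        pa[3] = pb[3] + x;
        pa += 4;                    /* post-increment, as in [r0]! */
        pb += 4;                    /* post-increment, as in [r1]! */
    }
}
```

The (n & ~3) loop bound in the source tells the compiler that the trip count is a multiple of four, which is why no scalar clean-up loop appears in the listing.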
10. CROSS VS. NATIVE COMPILATION
• In traditional (native) software development the compilation tools
are executed on the same platform which runs the output code
• In embedded software development, this is not usually the case
[Figure: on the host system, .c sources are compiled to .o and linked to .axf; the image is downloaded to the target system, with a debugger connected to the target]
• In the above example, code is compiled and linked on the host system and then downloaded to the target system for execution
• The debugger may run on the same host or on a different one
11. AGENDA
Embedded Software Development
Compilation Tools
• Linking and Libraries
Target Platforms
Debug
Invasive Debug
Non-Invasive Debug
Performance Monitoring
12. THE LINKER
• It links object files produced by a compiler or assembler into an executable image
[Figure: object files, libraries and a memory description are the inputs to armlink, which produces image.axf]
• To do this it must:
– Ensure that all the required functions and data are present in the image
– Place the contents of the object files to suit the specified memory map
– Fill in any required addresses
13. HOW DOES THE LINKER KNOW WHAT TO DO?
• The linker uses several inputs to decide what to do
– Command line
• List of object files and user library files
• Output file name
• Other options, for example diagnostic information
– Description of the memory map
• Command line options for simple images, Scatterfile for complex images
– Object files
• Symbol table – contains information on what variables/functions are in the
object file (definitions) and required (references) by the object file
• Relocation information – informs the linker where it needs to fill in address
information
• For a link step to succeed it must match a single symbol definition to every
reference
• Example command line
armlink object1.o object2.o lib1.a --scatter memory.scat -o image.axf
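The rule that every reference must match a single definition can be illustrated with a toy model. This is not how armlink is implemented: it ignores weak symbols, library member selection and relocation, and all type and function names are invented for the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of one object file's symbol table (illustrative only). */
struct toy_object {
    const char *defs[4];   /* symbols this object defines, NULL-terminated */
    const char *refs[4];   /* symbols this object references, NULL-terminated */
};

/* Count how many objects define the named symbol. */
static int count_definitions(const struct toy_object *objs, size_t n, const char *name)
{
    int count = 0;
    size_t i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < 4 && objs[i].defs[j] != NULL; j++)
            if (strcmp(objs[i].defs[j], name) == 0)
                count++;
    return count;
}

/* Returns 1 if every reference matches exactly one definition, else 0. */
int toy_link_succeeds(const struct toy_object *objs, size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < 4 && objs[i].refs[j] != NULL; j++)
            if (count_definitions(objs, n, objs[i].refs[j]) != 1)
                return 0;
    return 1;
}
```

A count of zero corresponds to an undefined symbol error and a count above one to a multiply defined symbol error.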
14. OBJECT FILE STRUCTURE
• Object files (and images) are ELF format
• Contents are split into a number of sections
• Program sections
– Program code
– Initialized (RW) data
– Zero-Initialized (ZI) data
• Non-program sections
– Symbol table
– Relocation information
– Debug data (DWARF2/3)
• The linker works with whole sections
– Cannot split sections or add to sections
– A section can be moved independently of other sections
[Figure: ELF file layout: ELF header, Code, RW Data, ZI Data, Symbol table, Relocation information, …]
15. LIBRARY STRUCTURE
• A library is a collection of object files gathered together into a single “ar” format file
• Symbol table
– Symbol names
– Object file(s) that contain the symbol
– File offset to object file
[Figure: library file containing a library header, a symbol table, and Object1.o to Object4.o]
16. SCATTER-LOADING
• “Scatter-loading” is the ARM tools mechanism to describe the memory
layout for the program
• The memory description can be specified with
– Command line options for simple images (--ro-base, --rw-base)
– Text description file (scatterfile) for more complex images (--scatter)
• This describes the placement of code and data
• The syntax of the scatter-loading file is not discussed in detail in this
training course
17. ENTRY POINTS
• An application usually has to have at least one entry point
– This is where the application starts executing
– When running with a debugger, this is the initial program counter value
– When executing stand-alone on target hardware, the entry point is
usually the reset vector
• Entry points are used by the linker to identify which modules are
required by an application
– Unused modules will be automatically eliminated
– Modules which are not called or referenced must be marked as entry
points to prevent their removal
• Examples include the vector table
• Linkers vary in how entry points are defined
18. STATIC AND DYNAMIC LIBRARIES
[Figure: creating and using static and dynamic libraries]
• Static linking at build-time: Program X and Program Y each contain their own copy of the static library
• Dynamic linking at run-time: Program X and Program Y use one shared library
• A dynamic, shared library may be loaded automatically by the Operating System or on demand by the application
20. RETARGETING THE C LIBRARY
• You should replace the C library’s device driver level functionality
with an implementation that is tailored to your target hardware
– For example: printf() should go to LCD screen, not debugger console
• You must also tailor the C library memory map to your target
– e.g. setting the initial value of the stack pointer
21. REMOVING SEMIHOSTING
• The standard ARM C library makes use of a technique called “semihosting” to
access hardware-specific features
– In the absence of drivers, these are intercepted by the debugger and routed to the host
system
– For more detail on semihosting, see the Software Debug section
• To ‘Retarget’ the C library, simply replace those C library functions which use
semihosting with your own implementations, to suit your system
– For example, the printf() family of functions (except sprintf()) all ultimately call
fputc()
– The default implementation of fputc() uses semihosting
– Replace this with:
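#include <stdio.h>

extern void sendchar(char *ch);   /* user-supplied output routine */

int fputc(int ch, FILE *f)
{ /* e.g. write a character to an LCD */
    char tempch = (char)ch;
    (void)f;                      /* stream argument is not used */
    sendchar(&tempch);
    return ch;
}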
22. RUN-TIME MEMORY MODELS
• You must decide whether to place your stack and heap in a single region of
memory (one-region model) or in separate regions (two-region model)
[Figure: one-region model and two-region model, marking stack base (SB), heap base (HB) and heap limit (HL)]
• One-region model: the heap is checked against the stack pointer
• Two-region model: the heap is checked against the heap limit
• One-region model is the default
• To implement a two-region model, import __use_two_region_memory
• The initial value of the stack pointer must be doubleword-aligned
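A short sketch of both points for the ARM compiler: importing the symbol from C, and rounding a candidate stack address down to a doubleword boundary. The __CC_ARM guard and the helper name align_down_doubleword are additions for illustration.

```c
#include <stdint.h>

/* Selecting the two-region model with the ARM compiler, as named above.
 * The guard keeps the pragma away from compilers that do not know it. */
#if defined(__CC_ARM)
#pragma import(__use_two_region_memory)
#endif

/* Hypothetical helper: round an address down to the previous 8-byte
 * (doubleword) boundary, e.g. when choosing an initial stack pointer. */
uint32_t align_down_doubleword(uint32_t address)
{
    return address & ~(uint32_t)7u;
}
```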
23. ABI
• The standard C library will conform to
the ARM ABI
– Application Binary Interface
• The most important part of this is the
calling convention
– Otherwise known as the Procedure Call Standard for the ARM Architecture, or “AAPCS”
– This governs register usage across function
calls
– It also specifies stack alignment
requirements
• Floating point linkage…
…is tricky!
24. SOFT OR HARD FLOATING POINT
• Soft FP does not require hardware capability
– Entirely software solution using run-time library
– Slower than hardware solutions
• Hard FP requires coprocessor (e.g. VFP/NEON)
– Later versions do not require library support
– Faster than software emulation
• Code compiled for hard FP will not run on systems
which do not have floating point hardware support
• Code can be compiled with a variety of linkage options
to maximize flexibility
25. FLOATING POINT LINKAGE
• How floating point parameters and return values are passed into and
returned from functions is called the “floating point linkage”
• Hardware floating point linkage
– Floating point arguments are passed to (and returned from) functions in VFP
Coprocessor registers
– Requires VFP Coprocessor to be present
– Can only be used with ARM and Thumb-2 code
• Software floating point linkage
– Floating point arguments are passed to (and returned from) functions in ARM
registers
– Compatible with all ARM cores, with or without VFP
– Can still have code that uses VFP instructions
• Cannot mix functions that use different floating point linkage
– Arguments will not be in the correct registers
27. AGENDA
Embedded Software Development
Compilation Tools
Linking and Libraries
• Target Platforms
Debug
Invasive Debug
Non-Invasive Debug
Performance Monitoring
28. TARGET
• Models
– Programmer's View model (PV)
– Cycle Accurate Model (CA)
• Development Boards
• Final Hardware
29. AGENDA
Embedded Software Development
Compilation Tools
Linking and Libraries
Target Platforms
• Debug
Invasive Debug
Non-Invasive Debug
Performance Monitoring
30. WHY DEBUG?
• Debugging can be a useful way to determine why events are
occurring on your system
• For example:
– Why is an abort occurring when the core executes a particular
function?
– Why is an interrupt not being taken as expected?
– Why am I not seeing the expected result for a set of
computations?
– Why does my application crash when X occurs?
– What was happening when my application crashed?
• ARM debug falls into two categories: invasive and non-invasive
31. TYPES OF DEBUG
• Invasive
– Any debug method that affects the state of the system
– For example:
• Stopping execution
• Modifying registers
• Reading from and writing to memory via the core
• Non-invasive
– Any debug method that does not affect the state of the system
– For example:
• Performance Monitoring Unit (without interrupts)
• Trace
32. DEBUG INFRASTRUCTURE
[Figure: a debugger (e.g. DS-5) connects over USB/Ethernet to debug hardware (e.g. DStream), which connects over JTAG to the CoreSight infrastructure on the chip, linking the ARM debug logic and third-party IP]
• ARM processors have integrated debug logic, which contains the
necessary registers and comparators to perform debug operations
– The Debug Status and Control Register (DSCR) in the Debug Logic controls the debug mode
and state of the core
• CoreSight is the standard for connecting together multiple debug
components in a system
– This course does not cover low level debug information or CoreSight
33. ARM DEBUG LOGIC COMPONENTS
[Figure: single-core system on chip showing the ARM core, ETM, memory (control, address and data buses), debug logic, debug port, trace port and ETB]
34. AGENDA
Embedded Software Development
Compilation Tools
Linking and Libraries
Target Platforms
Debug
• Invasive Debug
Non-Invasive Debug
Performance Monitoring
35. HALTING MODE DEBUG
• Debug State
– Core is halted and isolated from rest of system
– Processor and system state can be viewed/modified
– No interrupts will be handled until execution restarted by debugger
• Entry into debug state is caused by
– Request from external debug agent, or
– Core hitting a breakpoint
• In debug state, the core is isolated from the clock
• The external debugger may read the status of core signals
• Under the control of the debugger, the core may be made to
execute instructions
– This allows the debugger to read and modify system state
36. MONITOR MODE DEBUG
• Used when it is not possible or desirable to halt the target CPU
– e.g. Hard disk controller, Engine Management system
• External debugger communicates with the system via a resident
software monitor
– This is downloaded to the target by the debugger
• Monitor program is entered via an exception
– Caused when a BKPT instruction is executed
– This instruction is placed in instruction memory by the debugger in order
to set breakpoints
• The debugger communicates with the monitor via a reserved
channel called the “Debug Communications Channel” (DCC)
• Breakpoints and Watchpoints can be set when in Monitor Mode Debug
– Using MRC, MCR instructions from a privileged mode
37. BREAKPOINTS AND WATCHPOINTS
[Figure: comparators with mask and control logic watch the instruction, address and data buses between the ARM core and memory]
• Separate comparators on instruction and data buses
• Breakpoints for Instruction, Watchpoints for Data
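• A breakpoint match ‘tags’ the instruction, so the break will only occur if the instruction reaches the execute stage
[Figure: instruction address comparator tagging an instruction as it passes through the FETCH, DECODE and EXECUTE pipeline stages]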
38. VECTOR CATCH
• Dedicated logic for trapping exceptions
– Sensitive only to hard exceptions
– A branch into the vector table will not be trapped
• Useful during early stages of development
when software handlers may not be implemented
• Allows the core to be reset and execution stopped at the reset vector
– Prevents any code from being executed out of reset
• Useful to trap data aborts
Vector table:
0x1C  FIQ
0x18  IRQ
0x14  (Reserved)
0x10  Data Abort
0x0C  Prefetch Abort
0x08  Software Interrupt
0x04  Undefined Instruction
0x00  Reset
39. SINGLE STEPPING
• Stepping can occur at high or low levels in a debugger
• High level step
– Step over a single line of C/C++ code
• Can involve execution of many instructions
• Usually implemented by setting an instruction breakpoint at destination address
and running the core
• Low level step
– Step a single machine instruction
• Usually implemented with a breakpoint configured to halt the core at any address except the current one, when the core is run
• Execution halted when next instruction about to be executed
• Interrupts may or may not be respected when stepping (depending on
debug logic configuration)
40. SOFTWARE INSTRUCTION BREAKPOINTS
• Software Instruction Breakpoints rely on modifying the contents of memory
– Can therefore only be used in RAM
– In theory, an unlimited number of software breakpoints can be set
• Debug tools use BKPT instruction
• Original instruction needs to be restored to ‘step off’ the breakpoint
– The original instruction is stepped first, and the breakpoint is written back when target execution is restarted
[Figure: setting a software breakpoint in memory: 1. read and store the original opcode; 2. write the BKPT opcode]
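The save, patch and restore sequence can be sketched as a host-side simulation, with an ordinary array standing in for target RAM. A real debugger writes through the debug hardware and may also need cache maintenance; the struct and function names here are hypothetical.

```c
#include <stdint.h>

#define BKPT_ARM_OPCODE 0xE1200070u   /* ARM-state encoding of BKPT #0 */

/* One software breakpoint: where it is and what it overwrote. */
struct sw_breakpoint {
    uint32_t *address;
    uint32_t  saved_opcode;
};

/* Save the original opcode, then write BKPT in its place. */
void sw_breakpoint_set(struct sw_breakpoint *bp, uint32_t *address)
{
    bp->address = address;
    bp->saved_opcode = *address;
    *address = BKPT_ARM_OPCODE;
}

/* Restore the original instruction so that it can be stepped. */
void sw_breakpoint_clear(const struct sw_breakpoint *bp)
{
    *bp->address = bp->saved_opcode;
}
```

Because each breakpoint only needs one saved opcode on the debugger side, the number of software breakpoints is limited by debugger memory rather than by hardware comparators.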
41. VIEWING MEMORY
• When in debug state, most debuggers access memory through the
processor
– This means the debugger displays memory as seen by the processor
• Will see the effects of memory management, caches, etc.
• If a debugger reads or writes target memory it may need to perform cache
maintenance operations
– For instance, to write a software breakpoint, the debugger may need to clean the
data cache and invalidate the instruction cache
• Beware of possible side-effects when accessing memory via the debugger
[Figure: on the chip, the debugger reaches memory through the DAP (Debug Access Port) and the ARM core]
42. SEMIHOSTING
• Semihosting: Library code runs on ARM target, low-level I/O
support provided by debug tools
[Figure: the application code calls printf("Hello World!\n"); the library code then issues SVC 0x123456]
– Commonly used to provide file and string I/O before hardware-specific device
drivers available
• Uses reserved SVC numbers (0x123456 in ARM state, 0xAB in Thumb state)
– ARM compilation tools use semihosting implementations for many default C
library I/O functions (e.g. printf, scanf, fopen)
• Full details in compiler documentation
– Semihosting is supported by all of ARM's debug tools
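For illustration, a direct semihosting call might look like the sketch below. It assumes the ARM compiler's __svc function qualifier and the SYS_WRITE0 operation (0x04); the names semihost_call and debug_puts are invented, and the stdout fallback exists only so the sketch also builds on a host.

```c
#include <stdio.h>

#define SYS_WRITE0 0x04   /* semihosting operation: write a null-terminated string */

#if defined(__CC_ARM)
/* armcc: declare a function that is called through SVC 0x123456 (ARM state).
 * r0 carries the operation number and r1 the argument. */
int __svc(0x123456) semihost_call(int operation, const void *argument);
#endif

/* Write a string to the debugger console, or to stdout when not built
 * with armcc (hypothetical fallback so the sketch runs on a host). */
int debug_puts(const char *text)
{
#if defined(__CC_ARM)
    semihost_call(SYS_WRITE0, text);
    return 0;
#else
    return (fputs(text, stdout) == EOF) ? -1 : 0;
#endif
}
```

In normal use the C library makes these calls for you; calling the SVC directly is mostly useful before the library is initialized.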
43. AGENDA
Embedded Software Development
Compilation Tools
Linking and Libraries
Target Platforms
Debug
Invasive Debug
• Non-Invasive Debug
Performance Monitoring
44. ARM TRACE LOGIC COMPONENTS
[Figure: single-core system on chip showing the ARM core, ETM, memory (control, address and data buses), debug logic, debug port, trace port and ETB]
• What is trace?
• Trace is non-invasive debug
45. ON-CHIP TRACE CAPTURE VS. OFF-CHIP
• High speed, high bandwidth trace
– Small on-chip embedded trace buffer (ETB)
– Limited execution history and data capture
– Useful for in field failure analysis
• Lower speed, lower bandwidth trace
– Larger off-chip trace buffer
• Trace port analyzer (e.g. RVT unit)
– Increased execution history
– Better for profiling and code coverage
• More trace port pins, higher bandwidth
46. STANDARD DEBUG TECHNIQUES
• Call Stack
– A call stack trace will show you the history of function calls up to the point
the program halted
• Single Step/Start/Stop
– Allows you to execute
• Single instructions
• Single high-level source statements
• Functions
• Etc.
• Printf
– Simple text output (often generated via printf) is a very
common way of tracking program execution
printf("Hello World!\n");
47. DEBUG SERVER VS. BARE METAL
• A debug server (e.g. GDBserver) is a control program which runs on the
target platform alongside the application you wish to debug
– Requires that the application to be debugged is already resident on the target
– Operates under Unix-based OSes (like Linux or Android)
[Figure: the debugger connects over TCP or a serial port to GDBserver, which runs on the ARM target alongside the application]
• Bare metal debug
– Used to debug non-OS based images, kernels and device drivers
– Images can be dynamically downloaded to target memory
[Figure: the debugger connects directly to the ARM target over JTAG]
48. AGENDA
Embedded Software Development
Compilation Tools
Linking and Libraries
Target Platforms
Debug
Invasive Debug
Non-Invasive Debug
• Performance Monitoring
49. PERFORMANCE MONITORING HARDWARE
• ARMv7-A cores include a performance monitoring unit (PMU)
• A PMU provides a non-intrusive method of collecting execution information
from the core
– Enabling the PMU does not change the timing of the core
• The PMU provides:
– Cycle counter – counts execution cycles (optional 1/64 divider)
– Programmable event counters
• The number of counters and available events vary between cores
– The PMU can be configured to generate interrupts if a counter overflows
• Counting the interrupts allows data to be collected over an arbitrarily long time
period
• Some examples common to most cores:
– Cache hits or misses, TLB misses (on MMU cores), correct/incorrect branch predictions, number of instructions executed, etc.
• Some events are architecturally defined while others are core-dependent
– Check the ARM ARM and your core’s TRM for a full list
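As a sketch of how the cycle counter and overflow interrupts fit together: the first function reads the cycle counter register (PMCCNTR) with GCC-style inline assembly and is only compiled for 32-bit ARM targets; the second combines a software count of overflow interrupts with the current counter value. It assumes the PMU is already enabled and accessible, and both function names are hypothetical.

```c
#include <stdint.h>

/* Read the PMU cycle counter (PMCCNTR). Assumes an ARMv7-A core, a
 * GCC-compatible compiler, and that the PMU has already been enabled
 * and made accessible from the current privilege level. */
#if defined(__GNUC__) && defined(__arm__)
static inline uint32_t pmu_read_cycle_counter(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
}
#endif

/* Combine a 32-bit counter with a software count of overflow interrupts
 * to get a total that does not wrap after 2^32 counts. With the 1/64
 * divider enabled, each count stands for 64 cycles. */
uint64_t pmu_total_cycles(uint32_t overflow_count, uint32_t counter, int divider_enabled)
{
    uint64_t counts = ((uint64_t)overflow_count << 32) | counter;
    return divider_enabled ? counts * 64u : counts;
}
```

Taking the overflow interrupt is itself intrusive, so for short measurements it is usually simpler to keep the run within a single counter period.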
50. USING THE PMU IN LINUX
• In an OS environment you may not have direct access to the PMU
• Most OSes will provide some other method to access the PMU
– Typically an API, e.g. Linux provides PerfEvents
armv7_pmnc_enable_counter(ARMV7_CCNT);
armv7pmu_start(void);
armv7pmu_stop(void);
armv7pmu_read_counter(ARMV7_CCNT);
51. HOW CLOSE TO REALITY?
• When debugging you need to consider how close the development
platform is to your final target hardware (listed from most to least realistic):
• Custom board (final hardware)
• May not be available until late in the development cycle
• Development board based on the same part
• Available earlier and similar base peripheral set, may not
include custom IP
• Development platform based on the same core
• Available very early, but could have a very different peripheral
set/memory characteristics to final design
• Cycle Accurate Model
• Limited availability
• Programmer's View model
• Available early, may not show errors due to timing or access
ordering
52. ARE MY NUMBERS MEANINGFUL?
• It is easy to get a set of numbers, but how can you ensure that they are
meaningful?
• System Configuration
– Are you configuring the core and board features (MMU/MPU, caches, branch
prediction…) as they will be in the final design?
• Semihosting
– An “out of the box” build with DS-5 will use semihosting for many operations
• For example input/output and calls to time()
– Semihosted operations can take hundreds or thousands of extra cycles
– Useful for getting something working, but will not be included in the final design
• Code Fragments: Caches & Interrupts
– When testing small code sections (e.g. an algorithm) in isolation you might get
different performance to running the same code under an OS
• Code may fit entirely within cache, without risk of being evicted
• Might not have interrupts enabled