1. Program Structure in GNU/Linux
Author: Varun Mahajan <varunmahajan06@gmail.com>
2. Contents
$gcc *.c -o Program
– Processing of a User Program
• Preprocessing
• Compilation
• Assembly
• Linking
– ELF Format
The content is specific to a GNU/Linux system running on Intel Architecture
3. Processing of a User Program
.c, .h (C code)
  |  cpp (C preprocessor):
  |    cpp main.c main.i
  |    OR gcc -E main.c -o main.i
  v
.i (Preprocessed C code)
  |  cc1 (C compiler):
  |    /usr/lib/gcc/i486-linux-gnu/4.3.2/cc1 -fpreprocessed main.i -o main.s -quiet
  |    OR gcc -S main.i -o main.s
  v
.s (Assembly code)
  |  as (Assembler):
  |    as main.s -o main.o
  |    OR gcc -c main.s -o main.o
  v
.o (Object code)
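The stages above can also be driven from a script. The following is a minimal sketch in Python, assuming gcc is on the PATH and that the source file is named after a common stem such as main; the helper names build_stages and run_stages are hypothetical. Note that -c makes gcc stop after assembly, so that no linking is attempted:

```python
import subprocess


def build_stages(stem):
    """Return the gcc command for each stage, from <stem>.c up to <stem>.o."""
    return [
        ["gcc", "-E", f"{stem}.c", "-o", f"{stem}.i"],  # preprocess only
        ["gcc", "-S", f"{stem}.i", "-o", f"{stem}.s"],  # compile to assembly
        ["gcc", "-c", f"{stem}.s", "-o", f"{stem}.o"],  # assemble, do not link
    ]


def run_stages(stem):
    """Run the stages in order, stopping at the first one that fails."""
    for command in build_stages(stem):
        subprocess.run(command, check=True)
```

Calling run_stages("main") in a directory that contains main.c would leave main.i, main.s and main.o next to it, one file per stage.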
5. ELF Format: Object Files
ELF Header
Program Header Table
(optional)
Section Header Table
Section 1
...
Section n
Except for the ELF Header, which is at the beginning of the file, the remaining components may appear in any order
6. ELF Header (.o)
$readelf -h main.o
Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution
Relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image
An ELF header resides at the beginning and holds a 'road map' describing the file's organization
● ELF Identification: (16 bytes)
    ● Magic no: Identifies the file as an ELF object file [0x7f, 'E', 'L', 'F']
    ● Class: Identifies the file's class or capacity. ELF32 supports machines with files and virtual address spaces up to 4 gigabytes
    ● Data: Data encoding for processor-specific data in the object file
    ● Version: ELF header version number
    ● OS/ABI: Operating system
    ● ABI Version: Application Binary Interface version (low-level interface between an application program and the OS)
● Type: Type of the object file (Relocatable, Executable, Shared object, etc)
● Machine: The required architecture for the file
● Entry point address: The virtual address to which the system first transfers control, thus starting the process. If the file has no associated entry point then it holds 0
● Start of program headers: Program header table's file offset in bytes. If the file has no program header table then it holds 0
● Start of section headers: Section header table's file offset in bytes. If the file has no section header table then it holds 0
● Flags: Processor specific flags
● Section header string table index: The section header table index of the entry associated with the section name string table (this section holds section names)
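The fields listed above occupy a fixed 52-byte layout in an ELF32 file. The following is a minimal sketch in Python that decodes them, assuming a little-endian ELF32 file such as a main.o built on x86; the function name parse_elf32_header and the dictionary keys are illustrative and not part of any tool:

```python
import struct

# ELF32 header: 16-byte identification followed by fixed-width fields (52 bytes).
# "<" selects little-endian, an assumption that holds for Intel x86.
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")

# Object file types (e_type) from the ELF specification.
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN"}


def parse_elf32_header(data):
    """Decode the ELF header from the first 52 bytes of an ELF32 file."""
    (ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
     e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = ELF32_EHDR.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("magic number mismatch: not an ELF file")
    if ident[4] != 1 or ident[5] != 1:
        raise ValueError("this sketch only handles ELF32, little-endian")
    return {
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": e_machine,    # 3 means Intel 80386
        "entry": e_entry,        # 0 when there is no entry point
        "phoff": e_phoff,        # 0 when there is no program header table
        "shoff": e_shoff,        # 0 when there is no section header table
        "flags": e_flags,
        "shnum": e_shnum,
        "shstrndx": e_shstrndx,
    }
```

For a relocatable file, passing the first 52 bytes of main.o should report type REL with entry and phoff both 0, in line with the readelf -h output.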
7. Section Header Table (.o)
See also: Section Header Table (executable)
$readelf -S main.o
A Section Header Table is an array of Section Headers
$readelf -p '.shstrtab' main.o
● Name: Name of the section
● Type: Type of the section
    ● PROGBITS: Holds information whose format and meaning are determined solely by the program
    ● REL: Holds relocation entries without explicit addends
    ● NOBITS: Occupies no space in the file but otherwise resembles PROGBITS
    ● STRTAB: Holds a string table
    ● SYMTAB: Holds a symbol table
● Addr: If this section will appear in the memory image of a process, this member gives the address at which the section's first byte should reside. Otherwise it contains 0
● Off (Offset): The byte offset from the beginning of the file to the first byte in the section
● Size: Section's size in bytes
● ES (Entry Size): Size in bytes of each entry (for sections which hold a table of fixed-size entries)
● Flg (Flags): Miscellaneous attributes
    ● W: Contains data that should be writable during process execution
    ● X: Contains executable machine instructions
    ● A: Occupies memory during process execution
● Lk (Link), Inf (Info): Interpretation depends on section type
● AL (Address Align): Some sections have address alignment constraints (0, 1: no constraints)
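Each entry of the table is a 40-byte record in ELF32, and the Name column is an offset into the .shstrtab string table rather than the text itself. The following is a minimal sketch in Python of decoding one entry, assuming little-endian ELF32; the helper names string_at, flag_letters and parse_section_header are hypothetical:

```python
import struct

# Elf32_Shdr: ten 32-bit words (name, type, flags, addr, offset, size,
# link, info, addralign, entsize), 40 bytes in total.
ELF32_SHDR = struct.Struct("<10I")

# Section types and flag bits from the ELF specification.
SECTION_TYPES = {0: "NULL", 1: "PROGBITS", 2: "SYMTAB", 3: "STRTAB",
                 8: "NOBITS", 9: "REL"}
SECTION_FLAGS = ((0x1, "W"), (0x2, "A"), (0x4, "X"))


def string_at(strtab, offset):
    """Return the null-terminated string that starts at offset in a string table."""
    end = strtab.index(b"\0", offset)
    return strtab[offset:end].decode("ascii")


def flag_letters(sh_flags):
    """Render the flag bits as letters, e.g. 0x6 becomes 'AX'."""
    return "".join(letter for bit, letter in SECTION_FLAGS if sh_flags & bit)


def parse_section_header(raw, shstrtab):
    """Decode one section header, resolving its name through shstrtab."""
    (name, sh_type, flags, addr, offset, size,
     link, info, addralign, entsize) = ELF32_SHDR.unpack_from(raw)
    return {
        "Name": string_at(shstrtab, name),
        "Type": SECTION_TYPES.get(sh_type, hex(sh_type)),
        "Addr": addr, "Off": offset, "Size": size, "ES": entsize,
        "Flg": flag_letters(flags), "Lk": link, "Inf": info, "Al": addralign,
    }
```

The dictionary keys mirror the column headings printed by readelf -S, so the result can be compared against that output row by row.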
8. .symtab Section: Symbol Table (.o)
See also: .symtab & .dynsym Sections: Symbol Tables (executable)
$readelf -s main.o
$readelf -p '.strtab' main.o
Symbol Table holds the information needed to locate and relocate a program's symbolic definitions and references
● Name: Symbol name
● Type: Symbol type
    ● NOTYPE: Type not specified
    ● OBJECT: Symbol is associated with a data object
    ● FUNC: Symbol is associated with a function or other executable code
    ● SECTION: Symbol is associated with a section
    ● FILE: File symbol
● Bind:
    ● LOCAL: Symbol is not visible outside the object file in which it is defined
    ● GLOBAL: Symbol is visible to all object files being combined
● Size: Size in bytes (for symbols which have an associated size, e.g. for data objects). 0 if the symbol has no size or an unknown size
● Ndx (Index):
    ● Relevant section header table's index
    ● UND: Undefined, missing, irrelevant or otherwise meaningless section reference
    ● COM: Unallocated C external variables
    ● ABS: Specifies an absolute value for the corresponding reference
● Value: For relocatable files:
    ● Alignment constraints for a symbol whose Ndx is COM
    ● Section offset for a defined symbol
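The Type and Bind columns are packed into a single info byte of each 16-byte symbol table entry: the upper four bits hold the binding and the lower four bits hold the type. The following is a minimal sketch in Python of unpacking one entry, assuming little-endian ELF32; the function name parse_symbol is hypothetical, and the Name field is left as a raw .strtab offset:

```python
import struct

# Elf32_Sym: st_name, st_value, st_size (32-bit each), st_info, st_other
# (one byte each), st_shndx (16-bit); 16 bytes in total.
ELF32_SYM = struct.Struct("<IIIBBH")

# Values from the ELF specification.
SYMBOL_BINDS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
SYMBOL_TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
SPECIAL_INDEXES = {0: "UND", 0xFFF1: "ABS", 0xFFF2: "COM"}


def parse_symbol(raw):
    """Decode one symbol table entry; Name stays an offset into .strtab."""
    st_name, st_value, st_size, st_info, st_other, st_shndx = ELF32_SYM.unpack_from(raw)
    return {
        "NameOffset": st_name,
        "Value": st_value,
        "Size": st_size,
        "Bind": SYMBOL_BINDS.get(st_info >> 4, str(st_info >> 4)),
        "Type": SYMBOL_TYPES.get(st_info & 0xF, str(st_info & 0xF)),
        "Ndx": SPECIAL_INDEXES.get(st_shndx, st_shndx),
    }
```

An undefined external function referenced from main.o would therefore be expected to decode with Ndx UND and Bind GLOBAL, which is how readelf -s lists such symbols.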
9. .data & .bss Sections (.o)
See also: .data & .bss Sections (executable)
$objdump -DxtT main.o
● .data: Holds initialized data that contribute to the program's memory image
● .bss: Holds uninitialized data that contribute to the program's memory image. By definition the system initializes the data with zeros when the program begins to run. The section occupies no file space
10. .rodata Section (.o)
$objdump -s main.o
$readelf -p '.rodata' main.o
.rodata Section holds read-only data that typically contribute to a non-writable segment in the process image
11. .text Section (.o)
See also: .text Section (executable)
$objdump -DxtT main.o
.text Section holds the executable instructions of the program
12. .rel.text Section (.o)
.rel.text holds the Relocation Entries for the .text section
$readelf -r main.o
Relocation entries serve two functions. When a section of code is relocated to a different base address, relocation entries mark the places in the code that have to be modified. In a linkable file, there are also relocation entries that mark references to undefined symbols, so the linker knows where to patch in the symbol's value when the symbol is finally defined
Section header table:
● Lk (Link): Section header index of the associated symbol table
● Inf (Info): Section header index of the section to which the relocation applies
Relocation section:
● Offset: The location at which to apply the relocation action. For a relocatable file: the byte offset from the beginning of the section to the storage unit affected by the relocation
● Info:
    ● ((info) >> 8) is the symbol table index w.r.t. which the relocation should be made. E.g. a call instruction's entry would hold the symbol table index of the function being called
        efunc: ((0x1302) >> 8) = 0x13 = 19
    ● ((info) & 0xff) is the Relocation Type (processor specific)
        efunc: ((0x1302) & 0xff) = 0x02 (R_386_PC32)
        gei: ((0xf01) & 0xff) = 0x01 (R_386_32)
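The Info arithmetic above can be checked mechanically. The following is a minimal sketch in Python, assuming the i386 relocation types, for which the processor supplement defines R_386_32 as S + A and R_386_PC32 as S + A - P (S is the symbol value, A the addend stored at the patched location, P the address of that location). The function names and the addresses in the usage note are made up for illustration:

```python
# Relocation types from the i386 processor supplement (only the two used here).
R_386_32 = 1
R_386_PC32 = 2
RELOCATION_NAMES = {R_386_32: "R_386_32", R_386_PC32: "R_386_PC32"}


def r_sym(info):
    """Symbol table index held in the upper 24 bits of the Info field."""
    return info >> 8


def r_type(info):
    """Relocation type held in the low 8 bits of the Info field."""
    return info & 0xFF


def relocated_value(rel_type, symbol, addend, place):
    """Compute the 32-bit value the link editor stores at the patched location."""
    if rel_type == R_386_32:
        value = symbol + addend           # absolute address
    elif rel_type == R_386_PC32:
        value = symbol + addend - place   # PC-relative displacement
    else:
        raise ValueError("relocation type not covered by this sketch")
    return value & 0xFFFFFFFF
```

With the values from the slide, r_sym(0x1302) gives 19 and r_type(0x1302) gives 2 (R_386_PC32). For a call whose operand sits at the made-up address 0x80483d5, targeting a function at the made-up address 0x8048400 with the usual addend of -4, relocated_value returns 0x27, the displacement the call instruction needs.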
The Link Editor merges one or more relocatable files to form the output (executable or shared object file). It first decides how to combine and locate the input files, then updates the symbol values, and finally performs relocation
13. Linking with External Libraries
A Library is a collection of precompiled object files which can be linked into programs
E.g. C Math library, etc
Two types:
● Static Library: Archive file (.a). A collection of ordinary object files created using the GNU archiver (ar)
    When a program is linked against a static library, the machine code from the object files for any external functions used by the program is copied from the library into the final executable (Static Linking)
● Shared Library: Shared Object (.so). It is created from the object files using the -shared option of gcc
    An executable file linked against a shared library contains only a small table of the functions it requires, instead of the complete machine code from the object files for the external functions. Before the executable file starts running, the machine code for the external functions is copied into memory from the shared library file on disk by the operating system (Dynamic Linking)
The standard system libraries are usually found in the directories ‘/usr/lib’ and ‘/lib’
14. Types of Object Files
● Relocatable File: Holds code and data suitable for linking with other object files to create an executable or shared object file
● Executable File: Holds a program suitable for execution
● Shared Object File: Holds code and data suitable for linking in two contexts:
    ● The Link Editor may process it with other relocatable and shared object files to create another object file
    ● The Dynamic Linker combines it with an executable file and other shared objects to create a process image
15. Processing of a User Program contd...
main.o, edf.o (Relocatable)
*.o (Relocatable)
*.a (Static Libraries)
*.so (Shared Libraries)
  |  ld (Link Editor)
  v
Program (Executable)
ld -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/gcc/i486-linux-gnu/4.3.2/crtbegin.o -L/usr/lib/gcc/i486-linux-gnu/4.3.2/ main.o edf.o -lgcc -lgcc_eh -lc -lgcc_eh /usr/lib/gcc/i486-linux-gnu/4.3.2/crtend.o /usr/lib/crtn.o -o Program
17. Section Header Table (executable)
See also: Section Header Table (.o)
$readelf -S Program
● Type:
    ● NOTE: Holds information that marks the file in some way
    ● HASH: Holds a symbol hash table
    ● DYNSYM: Holds a symbol table
    ● DYNAMIC: Holds information for dynamic linking
18. .symtab & .dynsym Sections: Symbol Tables (executable)
See also: .symtab Section: Symbol Table (.o)
$readelf -s Program
21. Program Header Table (executable)
$readelf -l Program
An Object File Segment contains one or more Sections
Program Header Table is an array of structures, each describing a Segment or other information the system needs to prepare the program for execution
● Offset: Offset from the beginning of the file at which the first byte of the segment resides
● VirtAddr: The virtual address at which the first byte of the segment resides in memory
● FileSiz: Number of bytes in the file image of the segment
● MemSiz: Number of bytes in the memory image of the segment
● Flg: Permissions (R W E)
● Type:
    ● PHDR: Specifies the location and size of the program header table itself, both in the file and in the memory image of the program
    ● INTERP: Specifies the location and size of a null-terminated path name to invoke as an interpreter
    ● LOAD: Loadable segment
    ● DYNAMIC: Specifies dynamic linking information
● Align: Gives the value to which the segments are aligned in memory and in the file
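Each program header is a 32-byte record in ELF32. The following is a minimal sketch in Python of decoding one entry, assuming little-endian ELF32; the function names are hypothetical. It also reports two properties that the ELF specification attaches to these fields: bytes beyond FileSiz up to MemSiz are zero-filled when the segment is loaded, and VirtAddr and Offset are expected to be congruent modulo Align:

```python
import struct

# Elf32_Phdr: p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz,
# p_flags, p_align; eight 32-bit words, 32 bytes in total.
ELF32_PHDR = struct.Struct("<8I")

# Segment types and permission bits from the ELF specification.
SEGMENT_TYPES = {1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR"}
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def permission_string(p_flags):
    """Render permissions in R, W, E order with blanks, e.g. 'R E' or 'RW '."""
    return (("R" if p_flags & PF_R else " ")
            + ("W" if p_flags & PF_W else " ")
            + ("E" if p_flags & PF_X else " "))


def parse_program_header(raw):
    """Decode one program header table entry."""
    (p_type, offset, vaddr, paddr, filesz, memsz,
     flags, align) = ELF32_PHDR.unpack_from(raw)
    return {
        "Type": SEGMENT_TYPES.get(p_type, hex(p_type)),
        "Offset": offset, "VirtAddr": vaddr,
        "FileSiz": filesz, "MemSiz": memsz,
        "Flg": permission_string(flags), "Align": align,
        # Bytes present in memory but not in the file (typically .bss).
        "ZeroFill": memsz - filesz,
        # True when the offset and the address agree modulo the alignment.
        "Congruent": align <= 1 or vaddr % align == offset % align,
    }
```

A writable LOAD segment whose MemSiz exceeds its FileSiz is the usual place to see a non-zero ZeroFill, since .bss occupies memory but no file space.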
22. Brief description of some Sections
● Following sections provide information for dynamic linking:
    ● .dynsym: Holds the dynamic linking symbol table
    ● .dynstr: Holds strings needed for dynamic linking, most commonly the strings that represent the names associated with symbol table entries
    ● .interp: Holds the pathname of the program interpreter
    ● .hash: Holds a symbol hash table
    ● .dynamic: Holds dynamic linking information
    ● .relname & .relaname: Hold relocation information; by convention, name is the section to which the relocations apply (e.g. .rel.text)
    ● .got & .plt: Global offset table, Procedure linkage table (content is processor specific)
● Initialization and termination:
    ● .init: Holds executable instructions that contribute to the process initialization code. When a program starts to run, the system executes the code in this section before calling the main program entry point
    ● .fini: Holds executable instructions that contribute to the process termination code. When a program exits normally, the system executes the code in this section
23. Segment Loading
● Executable File Segments typically contain absolute code. To let the process execute correctly, the segments must reside at the virtual addresses used to build the executable
● Shared Object Segments typically contain position-independent code. This lets a segment's virtual address change from one process to another, without invalidating the execution behavior