The common recipe for performance improvement is to profile an application, identify the most time-consuming routines, and then select them for optimization. Sometimes that is not enough. Developers may have to look inside the OS for performance improvement opportunities, or they might need to optimize code inside a third-party library whose source they cannot access. In those cases, other strategies must be used. This presentation reports the experiences of Motorola's Brazilian developers in reducing the startup time of an application on Motorola's MOTOMAGX embedded Linux platform. Most of the optimization was performed in the binary loading stage, prior to the execution of the entry point function. This endeavor required use of the Linux ABI and the Linux loader, going beyond typical bottleneck searching. The presentation will cover prelink, dynamic library loading, tuning of shared objects, and enhancing user experience. A live demo will show the use of prelink and other tools to improve performance of general Linux platforms when libraries are used.
8. Loading a dynamically linked program
[Flowchart: load dynamic linker (.interp) → load dependency libraries (.dynamic) → relocation (.rel.text, .rel.data, symbol tables) → libraries' .init → program's entry point]
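9. A closer look at relocation
[Flowchart: the relocation type is either relative or symbol-based. Relative: add the load address. Symbol-based: compute the symbol's hash, then look up the hash bucket in the next object in scope. If the bucket is empty, move on to the next object. Otherwise walk the chain element by element until one matches, then adjust the address. If the chain is empty, move on to the next object; when no objects remain in scope, the lookup has failed.]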
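The symbol-based branch of the flowchart above can be sketched roughly as follows. This is a toy model, not the dynamic linker's real code: elf_hash follows the classic System V ELF hash, while ToyObject, make_object and lookup are hypothetical names and layouts invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Classic System V ELF hash of a symbol name.
std::uint32_t elf_hash(const char* name) {
    std::uint32_t h = 0;
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(name); *p; ++p) {
        h = (h << 4) + *p;
        std::uint32_t g = h & 0xf0000000u;
        if (g) h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

// Hypothetical stand-in for one loaded object (not the real ELF structures).
struct ToyObject {
    std::uintptr_t load_address;
    std::vector<std::string> names;       // index 0 is reserved as "no symbol"
    std::vector<std::uintptr_t> offsets;  // symbol offset inside the object
    std::vector<std::uint32_t> buckets;   // hash bucket -> first symbol index (0 = empty)
    std::vector<std::uint32_t> chains;    // symbol index -> next index in chain (0 = end)
};

// Builds the bucket and chain arrays for a list of (name, offset) pairs.
ToyObject make_object(std::uintptr_t load_address,
                      const std::vector<std::pair<std::string, std::uintptr_t>>& symbols,
                      std::uint32_t nbuckets) {
    ToyObject obj{load_address, {""}, {0}, std::vector<std::uint32_t>(nbuckets, 0), {0}};
    for (const auto& sym : symbols) {
        std::uint32_t index = static_cast<std::uint32_t>(obj.names.size());
        std::uint32_t bucket = elf_hash(sym.first.c_str()) % nbuckets;
        obj.names.push_back(sym.first);
        obj.offsets.push_back(sym.second);
        obj.chains.push_back(obj.buckets[bucket]);  // link in front of the old chain head
        obj.buckets[bucket] = index;
    }
    return obj;
}

// Walks the lookup scope object by object; returns false when the lookup fails.
bool lookup(const std::vector<ToyObject>& scope, const std::string& name,
            std::uintptr_t* address) {
    const std::uint32_t hash = elf_hash(name.c_str());  // computed once per symbol
    for (const ToyObject& obj : scope) {
        if (obj.buckets.empty()) continue;
        std::uint32_t index = obj.buckets[hash % obj.buckets.size()];
        while (index != 0) {                             // bucket or chain not empty
            if (obj.names[index] == name) {              // string comparison: the costly part
                *address = obj.load_address + obj.offsets[index];  // adjust address
                return true;
            }
            index = obj.chains[index];                   // next element in the chain
        }
    }
    return false;                                        // scope exhausted
}
```

The sketch makes the cost drivers visible: every symbol-based relocation may touch every object in the scope, and every chain element visited costs a string comparison.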
12. How does prelink work? I
• Collects ELF binaries which should be prelinked and all the ELF
shared libraries they depend on
• Assigns a unique virtual address space slot for each library and
relinks the shared library to that base address
• Resolves all relocations in the binary or library against its
dependent libraries and stores the relocations into the ELF object
• Stores a list of all dependent libraries together with their
checksums into the binary or library
• For binaries, it also computes a list of conflicts and stores it into a
special ELF section
Note: Libraries shall be compiled with the GCC option -fPIC
13. How does prelink work? II
• At runtime, the dynamic linker first checks if it is prelinked itself
• Just before starting an application, the dynamic linker checks if:
• There is a library list section created by prelink
• The libraries in that list are present in the symbol search scope in the same order
• None have been modified since prelinking
• There aren’t any new shared libraries loaded either
• If all conditions are satisfied, prelinking is used:
• Dynamic linker processes the fixup section and skips all normal
relocation handling
• If at least one condition fails:
• Dynamic linker continues with normal relocation processing in the
executable and all shared libraries
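The runtime decision described above can be modelled with a few lines of code. This is only a toy illustration of the listed conditions. LibraryRecord and can_use_prelink are hypothetical names, and the real dynamic linker works on ELF sections rather than on vectors of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one dependency as stored at prelink time.
struct LibraryRecord {
    std::string name;
    std::uint32_t checksum;
};

// Returns true when the prelinked relocations may be used as they are,
// false when the linker must fall back to normal relocation processing.
bool can_use_prelink(bool has_library_list,
                     const std::vector<LibraryRecord>& recorded,
                     const std::vector<LibraryRecord>& search_scope) {
    if (!has_library_list) return false;                        // never prelinked
    if (recorded.size() != search_scope.size()) return false;   // libraries added or removed
    for (std::size_t i = 0; i < recorded.size(); ++i) {
        if (recorded[i].name != search_scope[i].name) return false;          // order changed
        if (recorded[i].checksum != search_scope[i].checksum) return false;  // library modified
    }
    return true;
}
```

A single failed condition is enough to lose the benefit, which is why upgrading one library usually calls for running prelink again.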
16. How to use prelink?
• prelink -avf --ld-library-path=PATH --dynamic-linker=LDSO
• -a --all
• Prelink all binaries and dependent libraries found in directory hierarchies
specified in /etc/prelink.conf
• -v --verbose
• Verbose mode. Print the virtual address slot assignment to libraries
• -f --force
• Force re-prelinking even for already prelinked objects for which no
dependencies changed
• --ld-library-path=PATH
• Specify special LD_LIBRARY_PATH to be used when prelink queries
dynamic linker about symbol resolution details
• --dynamic-linker=LDSO
• Specify alternate dynamic linker instead of the default
19. Motivation II
If there are any libraries you are going to use
only on special occasions, it is better to load
them when they are really needed.
20. The Basics
#include <dlfcn.h>
void* dlopen ( const char* filename, int flags);
void* dlsym ( void* handle, const char* symbol);
char* dlerror (void);
int dlclose (void* handle);
#echo Although you don’t have to link against the library
#echo you still have to link against libdl
#
#gcc main.cpp -ldl -o program
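As a minimal sketch of how the four calls fit together, the helper below loads a library only when it is needed, looks up a function of type double(double), calls it, and unloads the library again. The function name call_on_demand is hypothetical, and the library name used in the usage note ("libm.so.6") is an assumption about a typical glibc system.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <iostream>

// Loads `library`, resolves `symbol` as a double(double) function and calls it.
// Returns false if the library or the symbol cannot be found.
bool call_on_demand(const char* library, const char* symbol, double x, double* result) {
    void* handle = dlopen(library, RTLD_LAZY);
    if (handle == nullptr) {
        std::cerr << "dlopen failed: " << dlerror() << '\n';
        return false;
    }

    dlerror();  // clear any stale error before calling dlsym
    using unary_fn = double (*)(double);
    unary_fn fn = reinterpret_cast<unary_fn>(dlsym(handle, symbol));
    const char* error = dlerror();
    if (error != nullptr || fn == nullptr) {
        std::cerr << "dlsym failed: " << (error ? error : "symbol is null") << '\n';
        dlclose(handle);
        return false;
    }

    *result = fn(x);   // the library is only needed from here ...
    dlclose(handle);   // ... to here
    return true;
}
```

A call such as call_on_demand("libm.so.6", "cos", 0.0, &value) pays the loading and relocation cost at the point of use instead of at program startup.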
22. Loading C++ Libraries
C++ uses mangling!
int sum (int a , int b); _Z3sumii
float sum (float a, float b); _Z3sumff
[Diagram: math.cpp is compiled into math.o, which contains the mangled names]
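Mangling matters here because dlsym looks symbols up by their exact name in the object file. As a small sketch, the GCC/Clang runtime function abi::__cxa_demangle can turn the names shown above back into readable signatures. The wrapper name demangle is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the human-readable form of an Itanium-ABI mangled name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the buffer is allocated with malloc by __cxa_demangle
    return result;
}
```

With this helper, demangle("_Z3sumii") yields "sum(int, int)". Functions declared extern "C" keep their plain names, which is why they are convenient entry points for dlsym.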
23. The example
class Foo
{
public:
Foo(){}
~Foo(){}
void bar(const char * msg)
{
std::cout<<"Msg:"<<msg<<std::endl;
}
};
24. The solution
Step 1 Define an interface for your class.
Foo
+ Foo()
+ ~Foo()
+ void bar(const char*)
25. The solution
Step 1 Define an interface for your class.
<<interface>> Foo
+ virtual void bar(const char*) = 0
FooImpl (implements Foo)
+ FooImpl()
+ ~FooImpl()
+ void bar(const char*)
26. The solution - Lib’s Header file
Step 1 Define an interface for your class
#ifndef FOO_H__
#define FOO_H__
class Foo
{
public:
// Virtual destructor so an implementation can be deleted through a Foo*
virtual ~Foo() {}
virtual void bar (const char*) = 0;
};
27. The solution - Lib’s Header file
Step 2 Create “C functions” to create and destroy instances
of your class
Step 3 You might want to create typedefs
extern "C" Foo* createFoo();
extern "C" void destroyFoo(Foo*);
typedef Foo* (*createFoo_t) ();
typedef void (*destroyFoo_t)(Foo*);
#endif
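The three steps can be put together as in the single-file sketch below. In a real project the interface would live in the header, FooImpl and the two factory functions in the shared object, and useFooFrom in the application. FooImpl's body, the helper name useFooFrom and the message text are assumptions made for illustration.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <iostream>

// Header part: interface plus C entry points and typedefs.
class Foo {
public:
    virtual ~Foo() {}
    virtual void bar(const char* msg) = 0;
};
extern "C" Foo* createFoo();
extern "C" void destroyFoo(Foo*);
typedef Foo* (*createFoo_t)();
typedef void (*destroyFoo_t)(Foo*);

// Library part: the implementation never appears in the client's code.
class FooImpl : public Foo {
public:
    void bar(const char* msg) override {
        std::cout << "Msg:" << msg << std::endl;
    }
};
extern "C" Foo* createFoo() { return new FooImpl; }
extern "C" void destroyFoo(Foo* foo) { delete foo; }

// Client part: loads the library on demand and uses only the interface.
bool useFooFrom(const char* libraryPath) {
    void* handle = dlopen(libraryPath, RTLD_LAZY);
    if (handle == nullptr) {
        return false;
    }
    // The factory functions have unmangled names thanks to extern "C".
    createFoo_t create = reinterpret_cast<createFoo_t>(dlsym(handle, "createFoo"));
    destroyFoo_t destroy = reinterpret_cast<destroyFoo_t>(dlsym(handle, "destroyFoo"));
    const bool ok = (create != nullptr) && (destroy != nullptr);
    if (ok) {
        Foo* foo = create();
        foo->bar("loaded on demand");  // virtual call, no mangled symbol lookup needed
        destroy(foo);                  // destroy inside the library that created it
    }
    dlclose(handle);
    return ok;
}
```

Only two unmangled symbols have to be resolved by name. Everything else goes through the virtual table, so the client never needs the mangled names of the class's member functions.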
31. Inspiration
“How To Write Shared Libraries”
Ulrich Drepper, Red Hat
http://people.redhat.com/drepper/dsohowto.pdf
32. Less is always better
Keep to a minimum…
• The number of libraries you directly or indirectly depend on
• The size of the libraries you link against
• The number of search directories for libraries, ideally one directory
• The number of exported symbols
• The length of symbol strings
• The number of relocations
34. Reducing search space
Step 1 Set LD_LIBRARY_PATH to empty
Step 2 When linking use the options:
-rpath-link <dir> to specify your system’s directory for
libraries
-z nodeflib to avoid searching in /lib, /usr/lib and other
places specified by /etc/ld.so.conf and /etc/ld.so.cache
#export LD_LIBRARY_PATH=""
#gcc main.cpp -Wl,-z,nodeflib -Wl,-rpath-link,/lib -lfoo -o program
35. Reducing exported symbols
Using GCC’s attribute feature
int localVar __attribute__((visibility("hidden")));
int localFunction() __attribute__((visibility("hidden")));
class Someclass
{
private:
static int a __attribute__((visibility("hidden")));
int b;
int doSomething(int d) __attribute__((visibility("hidden")));
public:
Someclass(int c);
int doSomethingImportant();
};
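A common way to keep such annotations readable is to hide the attribute behind macros and, optionally, build with GCC's -fvisibility=hidden so that only explicitly marked symbols are exported. The sketch below assumes that approach. The macro and function names (EXAMPLE_API, EXAMPLE_LOCAL, clampToByte, scaleToByte) are hypothetical.

```cpp
#include <cassert>

// Hypothetical export macros; on compilers without the attribute they expand to nothing.
#if defined(__GNUC__)
#  define EXAMPLE_API   __attribute__((visibility("default")))
#  define EXAMPLE_LOCAL __attribute__((visibility("hidden")))
#else
#  define EXAMPLE_API
#  define EXAMPLE_LOCAL
#endif

// Internal helper: not exported from the shared object, so it adds no entry
// to the dynamic symbol table and calls to it can bind directly.
EXAMPLE_LOCAL int clampToByte(int value) {
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}

// Public entry point: part of the library's API.
EXAMPLE_API int scaleToByte(int value, int percent) {
    return clampToByte(value * percent / 100);
}
```

When such a file is built into a shared object, inspecting the dynamic symbol table (for example with nm -D) should list scaleToByte but not clampToByte.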
36. Reducing exported symbols II
You can tell the linker which symbols shall be exported using export maps
{
global:
cFunction*;
extern "C++"
{
cppFunction*;
*Someclass;
Someclass::Someclass*;
Someclass::?Someclass*;
Someclass::method*
};
local: *;
};
#g++ -shared example.cpp -o libexample.so.1 -Wl,-soname=libexample.so.1 -Wl,--version-script=example.map
37. Pros and Cons
Visibility attribute
• Pro: Compiler can generate optimal code;
• Con: GCC-specific feature;
• Con: Code becomes less readable;
Export Maps
• Pro: More practical;
• Pro: Centralizes the definition of the library’s API;
• Con: No optimization can be done by the compiler because any symbol may be exported
38. Restricting symbol string length
namespace java
{
namespace lang
{
class Math
{
static const int PI;
static double sin(double d);
static double cos(double d);
static double FastFourierTransform
(double a, int b,const int** const c);
};
}
}
Resulting symbol names:
_ZN4java4lang4Math2PIE
_ZN4java4lang4Math3sinEd
_ZN4java4lang4Math3cosEd
_ZN4java4lang4Math20FastFourierTransformEdiPPKi
39. Avoiding relocations
char* a = "ABC";          [diagram: .data section of the ELF file, string "ABC\0"]
const char a[] = "ABC";   [diagram: .rodata section of the ELF file, string "ABC\0"]
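The same idea extends to tables of strings. In position-independent code, an array of pointers typically needs one relocation per element, while a single character array with integer offsets needs none. The sketch below follows the technique described in "How To Write Shared Libraries". The names kNamesByPointer, kNames, kNameOffsets and colorName are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Relocation-heavy form: each element is a pointer that the dynamic linker
// usually has to adjust when the shared object is loaded.
const char* const kNamesByPointer[] = {"red", "green", "blue"};

// Relocation-free form: all strings in one array, addressed by offsets.
// "red" starts at 0, "green" at 4 and "blue" at 10 (each string ends with '\0').
const char kNames[] = "red\0green\0blue";
const unsigned char kNameOffsets[] = {0, 4, 10};

// Returns the name stored at position `index` of the offset table.
const char* colorName(unsigned index) {
    return kNames + kNameOffsets[index];
}
```

Both tables give the same strings, but the second one can stay in read-only memory shared between processes, at the price of keeping the offsets in sync by hand.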
43. Improving responsiveness
It is not always possible to optimize code because:
• You might not have access to problematic code;
• It demands too much effort or it is too risky to change it.
• There is nothing you can do (I/O latency, etc…).
• Other reasons ...
49. In conclusion …
• You learned that libraries may play an important role in the startup
performance of your application;
• You saw how dynamic linking works on Linux;
• You were introduced to prelink and became aware of its potential
to boost the startup;
• You learned how to load a shared object on demand, preventing
some of them from being a burden at startup;
• You got some tips on how to write libraries to get the best
performance;
• You understood that a UI that provides quick user feedback is more
important than performance.