1. Samsung Open Source Group
Clang: Much more than just a C/C++ Compiler
Tilmann Scheller
Principal Compiler Engineer
t.scheller@samsung.com
Samsung Open Source Group
Samsung Research UK
LinuxCon Europe 2016
Berlin, Germany, October 4 – 6, 2016
2. Overview
● Introduction
● LLVM Overview
● Clang
● Summary
4. What is LLVM?
● Mature, production-quality compiler framework
● Modular architecture
● Heavily optimizing static and dynamic compiler
● Supports all major architectures (x86, ARM, MIPS,
PowerPC, …)
● Powerful link-time optimizations (LTO)
● Permissive license (BSD-like)
5. LLVM sub-projects
● Clang
C/C++/Objective-C frontend and static analyzer
● LLDB
Next generation debugger leveraging the LLVM libraries, e.g. the Clang expression
parser
● lld
Framework for creating linkers; intended to make Clang independent of the system
linker in the future
● Polly
Polyhedral optimizer for LLVM, e.g. high-level loop optimizations and data-locality
optimizations
7. Who is using LLVM?
● Rust
● Android (NDK, RenderScript)
● Portable Native Client (PNaCl)
● Majority of OpenCL implementations based on
Clang/LLVM
● CUDA
● LLVM on Linux: LLVMLinux, LLVMpipe (software
rasterizer in Mesa), AMDGPU drivers in Mesa
8. Clang users
● Default compiler on macOS
● Default compiler on FreeBSD
● Default compiler for native applications on Tizen
● Default compiler on OpenMandriva Lx 3.0
● Debian is experimenting with Clang as an additional
compiler (94.4% of ~24.5k packages built successfully with Clang 3.8.1)
● Android NDK defaults to Clang
10. LLVM
● LLVM IR (Intermediate Representation)
● Scalar optimizations
● Interprocedural optimizations
● Auto-vectorizer (BB, Loop and SLP)
● Profile-guided optimizations
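As an illustration of the kind of loop the loop vectorizer targets, here is a minimal sketch. The function name `saxpy` and its signature are invented for this example and do not come from the talk. Whether the loop is actually vectorized depends on the target and the optimization level; Clang's `-Rpass=loop-vectorize` remark flag can be used to check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example function: y[i] = a * x[i] + y[i].
// Each iteration is independent of the others, which is the pattern a loop
// vectorizer looks for. Because x and y could overlap, the compiler may need
// to guard the vectorized loop with a runtime overlap check.
void saxpy(float a, const float *x, float *y, std::size_t n) {
  // Precondition: non-empty ranges must point at valid arrays.
  assert(n == 0 || (x != nullptr && y != nullptr));
  for (std::size_t i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```

Compiling a file containing this function with something like `clang++ -O2 -Rpass=loop-vectorize -c saxpy.cc` (the file name is an assumption) should print a remark if the loop was vectorized.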
11. Compiler architecture
C Frontend
C++ Frontend
Fortran Frontend
Optimizer
x86 Backend
ARM Backend
MIPS Backend
12. Compilation steps
● Many steps involved in the translation from C source code to machine code:
– Frontend:
  ● Lexing, Parsing, AST (Abstract Syntax Tree) construction
  ● Translation to LLVM IR
– Middle-end:
  ● Target-independent optimizations (Analyses & Transformations)
– Backend:
  ● Translation into a DAG (Directed Acyclic Graph)
  ● Instruction selection: Pattern matching on the DAG
  ● Instruction scheduling: Assigning an order of execution
  ● Register allocation: Trying to reduce memory traffic
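To make the first frontend step more concrete, the sketch below shows a toy lexer that turns a string into a token stream. It is a deliberately simplified illustration and not how Clang's lexer is implemented; the names `TokenKind`, `Token` and `lex` are invented for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy token kinds for a tiny expression language (hypothetical, far simpler
// than what a real C/C++ lexer has to distinguish).
enum class TokenKind { Identifier, Number, Punctuator };

struct Token {
  TokenKind kind;
  std::string text;
};

// Splits the input into identifiers, decimal numbers and single-character
// punctuators, skipping whitespace between them.
std::vector<Token> lex(const std::string &source) {
  std::vector<Token> tokens;
  std::size_t i = 0;
  while (i < source.size()) {
    const unsigned char c = static_cast<unsigned char>(source[i]);
    if (std::isspace(c)) {
      ++i;
      continue;
    }
    const std::size_t start = i;
    TokenKind kind;
    if (std::isalpha(c) || c == '_') {
      kind = TokenKind::Identifier;
      while (i < source.size() &&
             (std::isalnum(static_cast<unsigned char>(source[i])) ||
              source[i] == '_'))
        ++i;
    } else if (std::isdigit(c)) {
      kind = TokenKind::Number;
      while (i < source.size() &&
             std::isdigit(static_cast<unsigned char>(source[i])))
        ++i;
    } else {
      kind = TokenKind::Punctuator;
      ++i;
    }
    tokens.push_back({kind, source.substr(start, i - start)});
  }
  return tokens;
}
```

A parser would then consume this token stream to build the AST, which is in turn translated to LLVM IR.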
13. LLVM Intermediate Representation
● The representation of the middle-end
● The majority of optimizations are done at the LLVM IR level
● Low-level representation which carries type information
● RISC-like three-address code in static single assignment
form (SSA) with an infinite number of virtual registers
● Three different formats: bitcode (compact on-disk format),
in-memory representation and textual representation
(LLVM assembly language)
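To give a feel for "three-address code in SSA form", the sketch below evaluates a small expression the way such code is structured: each operation takes at most two operands and defines one fresh name that is never reassigned. This is plain C++ written for illustration, not actual LLVM IR, and the names `mul_add`, `t0` and `t1` are invented. The IR Clang really produces for a function can be inspected with `clang -S -emit-llvm`.

```cpp
// Hypothetical illustration: computes a * b + c.
int mul_add(int a, int b, int c) {
  const int t0 = a * b;   // one operation, result bound to a fresh name
  const int t1 = t0 + c;  // uses earlier values, defines another fresh name
  return t1;
}
```

In real LLVM IR the temporaries would be virtual registers, of which there is no fixed limit; mapping them onto physical registers is left to the backend's register allocator.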
15. Target-independent code generator
● Part of the backend
● Domain specific language to describe the instruction set,
register file, calling conventions (TableGen)
● Pattern matcher is generated automatically
● Backend is a mix of C++ and TableGen
● Usually generates assembly code; direct machine code
emission is also possible
17. Clang
● Goals:
– Fast compile time
– Low memory usage
– GCC compatibility
– Expressive diagnostics
● Several tools built on top of Clang:
– Clang static analyzer
– clang-format, clang-tidy
18. Clang Static Analyzer
● Part of Clang
● Tries to find bugs without executing the program
● Slower than compilation
● Can report false positives
● Can be guided by source annotations
● Works best on C code
● Runs from the command line (scan-build), with a web interface
for results
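As a rough illustration of the kind of defect this analysis targets, consider the sketch below. The function `average` is a made-up example; the comments describe what an analyzer could be expected to look for if the guard were missing, not the output of an actual run. A project is typically analyzed by prefixing its build command with `scan-build`, for example `scan-build make`.

```cpp
// Hypothetical example: returns the integer average of `count` values.
int average(const int *values, int count) {
  // Without this guard there would be paths on which `values` is null or
  // `count` is zero. Those are the kinds of paths (null dereference,
  // division by zero) that the analyzer tries to find without running
  // the program.
  if (values == nullptr || count <= 0)
    return 0;

  long long sum = 0;
  for (int i = 0; i < count; ++i)
    sum += values[i];
  return static_cast<int>(sum / count);
}
```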
19. Clang Static Analyzer
● Core Checkers
● C++ Checkers
● Dead Code Checkers
● Security Checkers
● Unix Checkers
25. clang-format
● Automatic code formatting
● Consistent coding style is important
● Developers spend a lot of time on code formatting (e.g.
requesting trivial formatting changes in reviews)
● Supports different coding conventions (~80 settings)
● Includes configurations for LLVM, Google, Chromium,
Mozilla and WebKit coding conventions
26. clang-format
● Once the codebase is "clang-format clean" the coding
conventions can be enforced automatically
● Simplifies reformatting after automated refactorings
● Uses the Clang lexer
● Supports the following programming languages: C/C++,
Java, JavaScript, Objective-C and Protobuf
27. clang-tidy
● Detect bug-prone coding patterns
● Enforce coding conventions
● Advocate modern and maintainable code
● Checks can be more expensive than compilation
● Currently 136 different checks
● Can run static analyzer checks as well
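For a flavour of what the "modern and maintainable code" checks aim at, the sketch below shows code written in the style that checks from the `modernize-` group (for instance `modernize-use-nullptr`, `modernize-use-override` and `modernize-loop-convert`) push towards. The types `Shape` and `Square` and the function `total_area` are invented for this example, and the "before" forms in the comments are illustrative rather than actual clang-tidy output.

```cpp
#include <vector>

struct Shape {
  virtual ~Shape() = default;
  virtual int area() const = 0;
};

struct Square : Shape {
  explicit Square(int side) : side_(side) {}
  // Before: `virtual int area() const` without `override`.
  int area() const override { return side_ * side_; }

private:
  int side_;
};

// Before: an index-based loop over `shapes` and a null check written
// with `NULL` instead of `nullptr`.
int total_area(const std::vector<const Shape *> &shapes) {
  int total = 0;
  for (const Shape *shape : shapes) {
    if (shape != nullptr)
      total += shape->area();
  }
  return total;
}
```

Assuming the code lives in a file called `shapes.cc`, an invocation along the lines of `clang-tidy -checks='-*,modernize-*' shapes.cc --` would run only the modernize checks on it.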
28. Sanitizers
● LLVM/Clang-based Sanitizer projects:
– AddressSanitizer – Fast memory error detector
– ThreadSanitizer – Detects data races
– LeakSanitizer – Memory leak detector
– MemorySanitizer – Detects reads of uninitialized variables
– UBSanitizer – Detects undefined behavior
29. AddressSanitizer: Stack Buffer Overflow
int main(int argc, char **argv) {
  int stack_array[100];
  stack_array[1] = 0;
  return stack_array[argc + 100];  // out-of-bounds read past the array
}
$ clang++ -O1 -fsanitize=address a.cc; ./a.out
==10589== ERROR: AddressSanitizer stack-buffer-overflow
READ of size 4 at 0x7f5620d981b4 thread T0
    #0 0x4024e8 in main a.cc:4
Address 0x7f5620d981b4 is located at offset 436 in frame <main> of T0's stack:
  This frame has 1 object(s):
    [32, 432) 'stack_array'
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html
30. AddressSanitizer: Use-After-Free
int main(int argc, char **argv) {
  int *array = new int[100];
  delete [] array;
  return array[argc];  // read from memory that has already been freed
}
$ clang++ -O1 -fsanitize=address a.cc && ./a.out
==30226== ERROR: AddressSanitizer heap-use-after-free
READ of size 4 at 0x7faa07fce084 thread T0
    #0 0x40433c in main a.cc:4
0x7faa07fce084 is located 4 bytes inside of 400-byte region
freed by thread T0 here:
    #0 0x4058fd in operator delete[](void*) _asan_rtl_
    #1 0x404303 in main a.cc:3
previously allocated by thread T0 here:
    #0 0x405579 in operator new[](unsigned long) _asan_rtl_
    #1 0x4042f3 in main a.cc:2
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html
31. AddressSanitizer: Stack-Use-After-Return
int *g;
void LeakLocal() {
  int local;
  g = &local;  // address of a local escapes through the global g
}
int main() {
  LeakLocal();
  return *g;  // reads the stack slot after LeakLocal() has returned
}
$ clang++ -g -fsanitize=address a.cc
$ ASAN_OPTIONS=detect_stack_use_after_return=1 ./a.out
==19177==ERROR: AddressSanitizer: stack-use-after-return
READ of size 4 at 0x7f473d0000a0 thread T0
    #0 0x461ccf in main a.cc:8
Address is located in stack of thread T0 at offset 32 in frame
    #0 0x461a5f in LeakLocal() a.cc:2
  This frame has 1 object(s):
    [32, 36) 'local' <== Memory access at offset 32
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html
32. MemorySanitizer: Uninitialized Data
int main(int argc, char **argv) {
  int x[10];
  x[0] = 1;
  return x[argc];  // x[1] and above were never initialized
}
$ clang -fsanitize=memory a.cc -g; ./a.out
WARNING: Use of uninitialized value
    #0 0x7f1c31f16d10 in main a.cc:4
  Uninitialized value was created by an
  allocation of 'x' in the stack frame of
  function 'main'
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html
33. UBSanitizer: Integer Overflow
int main(int argc, char **argv) {
  int t = argc << 16;
  return t * t;
}
$ clang -fsanitize=undefined a.cc -g; ./a.out
a.cc:3:12: runtime error:
signed integer overflow: 65536 * 65536
cannot be represented in type 'int'
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html
34. UBSanitizer: Invalid Shift
int main(int argc, char **argv) {
  return (1 << (32 * argc)) == 0;
}
$ clang -fsanitize=undefined a.cc -g; ./a.out
a.cc:2:13: runtime error: shift exponent 32 is
too large for 32-bit type 'int'
Example from https://linuxplumbersconf.org/2015/ocw/proposals/3261.html
35. LibFuzzer
● Coverage-guided fuzz testing
● Coverage data provided by SanitizerCoverage (very low
overhead, tracking of function-level coverage causes no
measurable overhead)
● Best used in combination with the different Sanitizers
● LLVM project has bots which are fuzzing clang-format
and Clang continuously
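A fuzz target for libFuzzer is a single function with a fixed signature that the fuzzing engine calls repeatedly with generated inputs. The sketch below is a minimal example: `LLVMFuzzerTestOneInput` is libFuzzer's entry point, whereas `parse_length_prefixed` is a hypothetical function under test invented here. With recent Clang releases such a target is usually built with `-fsanitize=fuzzer,address`; the build steps in use at the time of the talk differed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical function under test: the first byte gives the payload length.
// Returns the sum of the payload bytes, or -1 if the input is malformed.
int parse_length_prefixed(const std::uint8_t *data, std::size_t size) {
  if (size == 0)
    return -1;
  const std::size_t length = data[0];
  if (length > size - 1)  // the bounds check a fuzzer plus ASan would stress
    return -1;
  int sum = 0;
  for (std::size_t i = 0; i < length; ++i)
    sum += data[1 + i];
  return sum;
}

// libFuzzer entry point: called once per generated input. It should return 0
// and must tolerate arbitrary bytes without crashing.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data,
                                      std::size_t size) {
  parse_length_prefixed(data, size);
  return 0;
}
```

No `main` is needed in the target itself, since the fuzzing runtime supplies it. Combining the target with a sanitizer means that a missing bounds check in the function under test would surface as a sanitizer report rather than as silent memory corruption.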
38. Summary
● Great compiler infrastructure
● Fast C/C++ compiler with expressive diagnostics
● Bug detection at compile time
● Automated formatting of code
● Detect bugs early with Sanitizers
● Highly accurate source code browsing, code completion
39. Give it a try!
● Visit llvm.org
● Distributions with Clang/LLVM packages:
– Fedora
– Debian/Ubuntu
– openSUSE
– Arch Linux
– ...and many more