Instrumentation of Unsafe C Methods to Safe Methods (Samuel Bret Collie)
1. BY SAM COLLIE
UNIVERSITY OF ALABAMA
AT BIRMINGHAM
IPROGRESS LAB
Instrumentation of Unsafe C
Functions to Safe Functions Using
the RTC Tool
Slide: 1 of 21
2. Outline
Background on RTC and ROSE
My Extension
Comparison to other tools
Current Drawbacks and Future Goals
Slide: 2 of 21
3. What is RTC?
RTC is a runtime checking tool for the C
programming language co-developed at
UAB.
It’s built on the ROSE compiler to read in
source code, make changes to it, and output
new code.
Slide: 3 of 21
4. Background
C is the second most popular programming
language (IEEE Spectrum ranking 2015).
Thanks to manual management of memory via
pointers, C programs can achieve a staggering level
of efficiency.
Slide: 4 of 21
5. Safe Languages
Slide: 5 of 21
Java:
int[] numbers = new int[10];
numbers[11] = 5;
Exception in thread "main"
java.lang.ArrayIndexOutOfBoundsException: 11
6. Unsafe Languages
Slide: 6 of 21
C Language:
int numbers[10];
numbers[11] = 5;
What will C do?
Undefined Behaviour.
8. Pointer Metadata
Slide: 7 of 21
Records Lower and Upper bounds of allocation.
Records scope in which pointer is valid.
Records whether the memory allocated to the
pointer has been freed.
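The three pieces of metadata listed above can be pictured as one record per allocation. The sketch below is only an illustration under assumptions: the `PtrMetadata` type, its field names, and the helpers `metadata_create` and `metadata_mark_freed` are hypothetical and do not reproduce RTC's generated code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-allocation metadata record (names are illustrative). */
typedef struct {
    uintptr_t lower;   /* address of the first valid byte */
    uintptr_t upper;   /* address one past the last valid byte */
    int scope_depth;   /* lexical depth of the scope where the pointer is valid */
    bool freed;        /* set once the allocation has been freed */
} PtrMetadata;

/* Build a record for a block of `size` bytes starting at `base`. */
PtrMetadata metadata_create(const void *base, size_t size, int scope_depth) {
    PtrMetadata md;
    md.lower = (uintptr_t)base;
    md.upper = (uintptr_t)base + size;
    md.scope_depth = scope_depth;
    md.freed = false;
    return md;
}

/* Record that the memory behind this pointer has been released. */
void metadata_mark_freed(PtrMetadata *md) {
    md->freed = true;
}
```

Recording scope as an integer depth is just one simple choice for the sketch; a real tool may represent scope differently.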
9. Checks Made by RTC
Slide: 8 of 21
Uses the metadata to make spatial and temporal
checks
Spatial: No out-of-bounds accesses
Temporal: No access to memory after it has been
freed or outside the pointer’s valid scope
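A minimal sketch of how such metadata could drive the two kinds of checks, assuming a hypothetical `Meta` record holding a byte range and two flags. The names `spatial_ok`, `temporal_ok`, and `access_ok` are illustrative, not RTC's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t lower, upper;  /* [lower, upper) is the valid byte range */
    bool freed;              /* true after the allocation has been freed */
    bool in_scope;           /* false once the owning scope has exited */
} Meta;

/* Spatial check: an access of `len` bytes at `p` must lie inside the range.
   Written as `len <= upper - a` to avoid overflowing `a + len`. */
bool spatial_ok(const Meta *m, const void *p, size_t len) {
    uintptr_t a = (uintptr_t)p;
    return a >= m->lower && a <= m->upper && len <= m->upper - a;
}

/* Temporal check: the memory must not be freed or out of scope. */
bool temporal_ok(const Meta *m) {
    return !m->freed && m->in_scope;
}

/* An instrumented access would be allowed only if both checks pass. */
bool access_ok(const Meta *m, const void *p, size_t len) {
    return spatial_ok(m, p, len) && temporal_ok(m);
}
```

For the `numbers[11] = 5;` example, a check of this kind would fail before the store, so the program could report an error instead of exhibiting undefined behaviour.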
11. Metadata Stack
Globally accessible stack
Handles passing pointers from one function to
another
Only handles situations where both functions can
access the same global scope.
Slide: 10 of 21
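The stack idea can be sketched as a caller pushing metadata for a pointer argument and the callee popping it on entry. Everything here (`meta_push`, `meta_pop`, the fixed capacity `META_STACK_MAX`, and the example callee `instrumented_len`) is hypothetical. The sketch handles a single pointer argument and assumes both functions were instrumented and share the same global stack, which is the limitation noted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t lower, upper;  /* valid byte range of the pointed-to memory */
    bool freed;
} Meta;

#define META_STACK_MAX 256

/* Hypothetical global stack visible to every instrumented function. */
static Meta meta_stack[META_STACK_MAX];
static size_t meta_top = 0;

/* Caller side: push metadata for a pointer argument before the call. */
bool meta_push(Meta m) {
    if (meta_top == META_STACK_MAX) return false;  /* stack full */
    meta_stack[meta_top++] = m;
    return true;
}

/* Callee side: pop the metadata for its pointer parameter on entry. */
bool meta_pop(Meta *out) {
    if (meta_top == 0) return false;  /* nothing was pushed */
    *out = meta_stack[--meta_top];
    return true;
}

/* Example instrumented callee: recovers the bounds of its argument. */
size_t instrumented_len(const char *p) {
    Meta m;
    if (!meta_pop(&m)) return 0;               /* no metadata: cannot check */
    return (size_t)(m.upper - (uintptr_t)p);   /* bytes available from p */
}
```

A callee that was never instrumented would not pop anything, which leaves the stack out of sync; this is one way the metadata can go stale.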
13. Specific Aims
To instrument function calls to C
Standard Library functions that are
unsafe.
By changing the function calls to call a
library whose source code will be visible to
RTC during the instrumentation process.
Slide: 12 of 21
14. Take the following code:
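BEFORE
#include <string.h>
int main(void){
    char src[16] = "hello", dest[16];
    memcpy(dest, src, sizeof src);
}
************************************************
AFTER
#include <stddef.h>
void *rtcMemCpy(void *dest, const void *src, size_t size);
int main(void){
    char src[16] = "hello", dest[16];
    rtcMemCpy(dest, src, sizeof src);   // call renamed by the tool
}
void *rtcMemCpy(void *dest, const void *src, size_t size){ // return type mirrors memcpy
    // rtcMemCpy source code
}
************************************************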
Slide: 14 of 21
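One possible shape for the elided `rtcMemCpy` body, assuming it mirrors `memcpy`'s signature and return value: a plain byte-by-byte loop in ordinary C, so each pointer access is visible to RTC when it instruments the added definition. This is a sketch, not the library's actual source.

```c
#include <stddef.h>

/* Hypothetical rtcMemCpy body: copies `size` bytes from src to dest.
   Because the accesses are ordinary C, RTC can insert checks around them. */
void *rtcMemCpy(void *dest, const void *src, size_t size) {
    unsigned char *d = (unsigned char *)dest;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t i = 0; i < size; i++) {
        d[i] = s[i];  /* spatial and temporal checks would go around this access */
    }
    return dest;      /* same return value as memcpy */
}
```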
15. Implementation
Takes place before RTC instrumentation
Calls to unsafe functions are renamed, and the
definitions of their safe replacements are added.
Finally, RTC itself is run to insert the checks into
the added function definitions
Slide: 15 of 21
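The renaming step can be illustrated on a toy syntax tree. This is not ROSE's API: the `Node` type, the `RENAMES` table, and `rename_unsafe_calls` are hypothetical. Only the `memcpy` to `rtcMemCpy` mapping comes from the slides; the `strcpy` to `rtcStrCpy` entry is an invented example.

```c
#include <stddef.h>
#include <string.h>

/* Toy AST: just enough structure to show a renaming pass. */
typedef enum { NODE_BLOCK, NODE_CALL } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *callee;      /* function name, for NODE_CALL */
    struct Node **children;  /* sub-statements, for NODE_BLOCK */
    size_t n_children;
} Node;

/* Hypothetical table mapping unsafe functions to safe replacements. */
static const char *const RENAMES[][2] = {
    {"memcpy", "rtcMemCpy"},
    {"strcpy", "rtcStrCpy"},  /* illustrative entry */
};

/* Return the safe replacement for `name`, or NULL if there is none. */
static const char *safe_name(const char *name) {
    for (size_t i = 0; i < sizeof RENAMES / sizeof RENAMES[0]; i++)
        if (strcmp(name, RENAMES[i][0]) == 0) return RENAMES[i][1];
    return NULL;
}

/* Walk the tree, rename calls in place, and count how many were renamed. */
size_t rename_unsafe_calls(Node *n) {
    size_t count = 0;
    if (n->kind == NODE_CALL && n->callee != NULL) {
        const char *s = safe_name(n->callee);
        if (s != NULL) { n->callee = s; count++; }
    }
    for (size_t i = 0; i < n->n_children; i++)
        count += rename_unsafe_calls(n->children[i]);
    return count;
}
```

After this pass, the matching definitions would be added to the program so that RTC can instrument them.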
16. How is this helpful?
There are several commonly used standard
library functions that operate on pointers.
Some of these functions don't guarantee
spatial and temporal safety within the
function (unsafe).
Slide: 16 of 21
18. Comparison to Other Techniques
ManagedC (Grimmer et al., 2015)
All or nothing instrumentation
Valgrind
Can make checks without source code because it works
on binaries, but has a much larger footprint
AddressSanitizer (Serebryany et al., 2012)
Handles “some” library functions but excludes third-party
libraries
Slide: 18 of 21
19. Current Drawbacks
Some standard library functions are dependent on
other standard library functions that are also unsafe.
Can’t keep pointer metadata up to date when
pointers are passed to functions not in our provided
library.
Slide: 19 of 21
20. Future Goals
Fix interdependency problems
Optimize
Add more functions to the safe library
Allow easy addition of functions to safe library by
users
Slide: 20 of 21
Editor's Notes
Hi, my name is Sam Collie. I’m an undergraduate researcher at the University of Alabama at Birmingham in the IPROGRESS Lab. My topic is The Instrumentation of Unsafe C Methods to Safe Methods Using the RTC Tool.
So first, we’ll be looking at background information concerning RTC and the compiler it’s built on, ROSE. Next we’ll discuss the extension I’m currently working on for RTC. Then we’ll compare the features I’ve added with those of other tools. Finally, we’ll discuss the current drawbacks of my extension and my future goals for improving it.
That’s where RTC comes in. RTC is a runtime checking tool for the C programming language co-developed at UAB. It’s built on the ROSE source-to-source compiler to instrument code written in C. The ROSE compiler reads in source code and breaks it down into a data structure (an abstract syntax tree) that can be traversed and manipulated by other code.
According to the IEEE Spectrum ranking, C is the second most popular programming language. This will probably not come as a shock to most of you, since C is an old and very efficient programming language. Unlike Java, C lets the programmer manage memory manually via pointers. Memory can be allocated to a pointer for general use, which makes C very flexible. This lets it operate at a lower level than most high-level languages, without safety checks or a runtime environment. However, the speed C gains from its lack of a runtime environment is also one of its greatest weaknesses. C has no built-in exceptions that are thrown if you access memory outside of a pointer’s allocation. If your program crashes, C will likely only tell you that it crashed and nothing else. Even more fun, C might simply give you a random value, which can be even trickier to troubleshoot.
First, we run the source code through the ROSE parser, which gives us our AST. Then preprocessing normalizes the AST (e.g., converting all arrow expressions to dot expressions, moving the termination conditions of for and while loops into their bodies, and moving structs defined in functions out into the global scope). We then instrument the source code to include the code for the metadata and checks we want to impose.
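Two of the normalizations listed above can be shown as before/after pairs in plain C. These are hand-written illustrations of the idea, not ROSE output, and all names are hypothetical. In each pair both versions compute the same result; one plausible benefit (an assumption, not stated in the talk) is that the loop condition becomes an ordinary statement inside the body where checks can be inserted.

```c
/* Arrow expression and its equivalent dot expression. */
struct Point { int x; };

int get_x_arrow(const struct Point *p) { return p->x; }
int get_x_dot(const struct Point *p)   { return (*p).x; }

/* Original loop: the termination condition lives in the header. */
int sum_original(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Normalized loop: the condition is moved into the body as test-and-break. */
int sum_normalized(const int *a, int n) {
    int sum = 0;
    for (int i = 0; ; i++) {
        if (!(i < n)) break;
        sum += a[i];
    }
    return sum;
}
```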
RTC implements three kinds of safety checks: arithmetic overflow/underflow, memory safety checks to find memory bugs on the stack and heap, and run-time type-safety violations. For every type of pointer in the input program, RTC declares and defines a struct to hold pointers of that type, as well as functions that handle the creation of those structs.
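As an illustration of the first kind of check, a checked signed addition can detect overflow and underflow before they happen, which matters because signed integer overflow is undefined behaviour in C. `checked_add` is a hypothetical helper, not the code RTC generates.

```c
#include <limits.h>
#include <stdbool.h>

/* Checked signed addition: stores a + b in *result and returns true,
   or returns false (leaving *result untouched) if the sum would
   overflow or underflow an int. */
bool checked_add(int a, int b, int *result) {
    if ((b > 0 && a > INT_MAX - b) ||   /* would overflow */
        (b < 0 && a < INT_MIN - b))     /* would underflow */
        return false;
    *result = a + b;
    return true;
}
```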
Average Execution overhead?
The memcpy function recently gained notoriety due to its role in the Heartbleed bug. memcpy takes two pointers and a byte count, and copies that many bytes from the source pointer to the destination pointer. What happens if I specify more bytes than are available from the source pointer?
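In Heartbleed, a length supplied by the remote peer was passed to memcpy without being checked against the size of the data actually received, so the copy read past the end of the source buffer. A minimal sketch of the missing check, assuming the caller knows both buffer sizes (in an instrumented program they would come from pointer metadata); `checked_memcpy` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy `size` bytes only if they fit in both buffers; otherwise refuse. */
bool checked_memcpy(void *dest, size_t dest_size,
                    const void *src, size_t src_size, size_t size) {
    if (size > src_size || size > dest_size)
        return false;  /* would read or write out of bounds */
    memcpy(dest, src, size);
    return true;
}
```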
ASAN: The current implementation of AddressSanitizer is based on compile-time instrumentation and thus does not handle system libraries (it does, however, handle some C library functions such as memset). For the open source libraries the best approach might be to create special instrumented builds.
ManagedC: Managed allocations cannot be shared with precompiled native code. Therefore, ManagedC requires that the source code of the entire C program is available and is executed under ManagedC.