1. Operating Systems
CMPSCI 377
Dynamic Memory Management
Emery Berger
University of Massachusetts Amherst
UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science
2. Dynamic Memory Management
How the heap manager is implemented
malloc, free
new, delete
3. Memory Management
Ideal memory manager:
Fast
Raw time, asymptotic runtime, locality
Memory efficient
Low fragmentation
With multicore & multiprocessors:
Scalable to multiple processors
New issues:
Secure from attack
Reliable in face of errors
4. Memory Manager Functions
Not just malloc/free
realloc
Change size of object, copying old contents
ptr = realloc (ptr, 10);
But: realloc(ptr, 0) = ?
How about: realloc (NULL, 16) ?
Other fun
calloc
memalign
Needs ability to locate size & object start
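The calls above can be exercised with a short sketch. It assumes a standard C library reached through C++; the helper name demo_realloc_calloc is made up for illustration. realloc(NULL, n) is specified to behave like malloc(n). realloc(ptr, 0) is not portable (implementation-defined in C17, undefined behavior as of C23), so the sketch avoids it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Returns true if every step behaved as the C standard specifies.
bool demo_realloc_calloc() {
    // realloc(NULL, n) behaves like malloc(n).
    char* p = static_cast<char*>(std::realloc(nullptr, 16));
    if (p == nullptr) return false;
    std::memcpy(p, "0123456789abcde", 16);   // 15 characters plus '\0'

    // Growing may move the object; the old contents are copied over.
    char* q = static_cast<char*>(std::realloc(p, 64));
    if (q == nullptr) { std::free(p); return false; }
    bool preserved = std::strcmp(q, "0123456789abcde") == 0;

    // calloc returns zero-initialized memory.
    int* z = static_cast<int*>(std::calloc(4, sizeof(int)));
    if (z == nullptr) { std::free(q); return false; }
    bool zeroed = z[0] == 0 && z[1] == 0 && z[2] == 0 && z[3] == 0;

    std::free(q);
    std::free(z);
    return preserved && zeroed;
}
```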
5. Fragmentation
Intuitively, fragmentation stems from “breaking” up heap into unusable spaces
More fragmentation = worse utilization of memory
External fragmentation
Wasted space outside allocated objects
Internal fragmentation
Wasted space inside an object
6. Classical Algorithms
First-fit
find first free chunk large enough for the request
7. Classical Algorithms
Best-fit
find chunk that fits best
Minimizes wasted space
8. Classical Algorithms
Worst-fit
find chunk that fits worst (the largest free chunk)
then split object
Reclaim space: coalesce adjacent free objects into one big object
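The three classical policies differ only in which free chunk they pick. A minimal sketch, assuming the free chunks are given as a list of sizes in address order; the names Policy and choose_chunk are hypothetical, and splitting and coalescing are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Policy { FirstFit, BestFit, WorstFit };

// Returns the index of the chosen free chunk, or -1 if none is large enough.
// free_sizes[i] is the size in bytes of the i-th free chunk.
int choose_chunk(const std::vector<std::size_t>& free_sizes,
                 std::size_t request, Policy policy) {
    int chosen = -1;
    for (std::size_t i = 0; i < free_sizes.size(); ++i) {
        if (free_sizes[i] < request) continue;        // too small to use
        if (chosen == -1) {
            chosen = static_cast<int>(i);
            if (policy == Policy::FirstFit) break;    // first fit stops here
            continue;
        }
        std::size_t current = free_sizes[static_cast<std::size_t>(chosen)];
        if (policy == Policy::BestFit && free_sizes[i] < current)
            chosen = static_cast<int>(i);             // tighter fit
        if (policy == Policy::WorstFit && free_sizes[i] > current)
            chosen = static_cast<int>(i);             // larger leftover
    }
    return chosen;
}
```

First-fit can stop at the first match, while best-fit and worst-fit scan the whole list in this naive form.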
9. Implementation Techniques
Freelists
Linked lists of objects in same size class
Range of object sizes
First-fit, best-fit in this context
10. Implementation Techniques
Segregated size classes
Use free lists, but never coalesce or split
Choice of size classes
Exact
Powers-of-two
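Power-of-two size classes trade internal fragmentation for simplicity. The sketch below shows the rounding and the resulting waste; the 8-byte minimum class and the function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power-of-two class that holds sz bytes (assumed minimum: 8 bytes).
// Assumes sz is small enough that the class size does not overflow.
std::size_t round_up_pow2(std::size_t sz) {
    std::size_t cls = 8;
    while (cls < sz) cls <<= 1;
    return cls;
}

// Bytes wasted inside the object (internal fragmentation) for a request.
std::size_t internal_waste(std::size_t sz) {
    return round_up_pow2(sz) - sz;
}
```

A request just above a power of two wastes nearly half of its slot, which is the worst case for this choice of classes.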
11. Implementation Techniques
Big Bag of Pages (BiBOP)
Page or pages (multiples of 4K)
Usually segregated size classes
Header contains metadata
Locate with bitmasking
Limits external fragmentation
Can be very fast
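Locating the header with bitmasking can be sketched as follows, assuming 4K pages and a header at the start of every page. PageHeader, page_start and object_start are hypothetical names, and addresses are handled as integers so the arithmetic is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kPageSize = 4096;   // assumed page size (4K)

// Hypothetical metadata stored at the start of each page.
struct PageHeader {
    std::size_t object_size;   // all objects on the page share this size
};

// Start of the page containing addr: clear the low 12 bits.
std::uintptr_t page_start(std::uintptr_t addr) {
    return addr & ~(kPageSize - 1);
}

// Start of the object containing addr, for a page of object_size-byte slots.
// Assumes addr lies past the page header.
std::uintptr_t object_start(std::uintptr_t addr, std::size_t object_size) {
    std::uintptr_t first = page_start(addr) + sizeof(PageHeader);
    return first + (addr - first) / object_size * object_size;
}
```

Both lookups take constant time, which is what makes size lookup for free and realloc cheap in this layout.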
12. Runtime Analysis
Key components
Cost of malloc (best, worst, average)
Cost of free
Cost of size lookup (for realloc & free)
Examine for first-fit, best-fit, segregated (with BiBOP)
13. Space Bounds
Fragmentation worst-case for “optimal”:
O(log(M/m))
M = largest object size
m = smallest object size
Best-fit = O(M * m) !
Goal: perform well for typical programs
Considerations:
Internal fragmentation
External fragmentation
Headers (metadata)
14. Performance Issues
We’ll talk about scalability later
Reliability, too
But: general-purpose allocator often seen as too slow
15. Custom Memory Allocation
Programmers replace new/delete, bypassing system allocator
Reduce runtime – often
Expand functionality – sometimes
Reduce space – rarely
Very common practice
Apache, gcc, lcc, STL, database servers…
Language-level support in C++
Widely recommended
“Use custom allocators”
16. Drawbacks of Custom Allocators
Avoiding system allocator:
More code to maintain & debug
Can’t use memory debuggers
Not modular or robust:
Mix memory from custom and general-purpose allocators → crash!
Increased burden on programmers
Are custom allocators really a win?
17. (1) Per-Class Allocators
Recycle freed objects from a free list
[Diagram: Class1 free list holding a, b, c]
a = new Class1;
b = new Class1;
c = new Class1;
delete a;
delete b;
delete c;
a = new Class1;
b = new Class1;
c = new Class1;
+ Fast
+ Linked list operations
+ Simple
+ Identical semantics
+ C++ language support
- Possibly space-inefficient
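A per-class allocator of this kind can be sketched with class-specific operator new and operator delete. This is an illustrative version, not code from any of the benchmarked programs; FreeNode and free_list are hypothetical names, and the sketch assumes Class1 is not used as a base class.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Class1 {
public:
    // Reuse a recycled object if one is available, else ask the system.
    static void* operator new(std::size_t sz) {
        if (sz == sizeof(Class1) && free_list != nullptr) {
            FreeNode* node = free_list;     // pop the head of the free list
            free_list = node->next;
            return node;
        }
        return ::operator new(sz);
    }

    // Push the freed object onto the free list instead of releasing it.
    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        free_list = new (p) FreeNode{free_list};
    }

    int payload[4] = {};

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* free_list;
};

Class1::FreeNode* Class1::free_list = nullptr;

// A freed object must be large enough to hold the list link.
static_assert(sizeof(Class1) >= sizeof(void*), "object too small for link");
```

Memory on the free list is never returned to the system allocator, which is where the possible space inefficiency comes from.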
18. (II) Custom Patterns
Tailor-made to fit allocation patterns
Example: 197.parser (natural language parser)
[Diagram: objects a, b, c, d placed in char[MEMORY_LIMIT]; end_of_array marks the top of the used space after each call]
a = xalloc(8);
b = xalloc(16);
c = xalloc(8);
xfree(b);
xfree(c);
d = xalloc(8);
+ Fast
+ Pointer-bumping allocation
- Brittle
- Fixed memory size
- Requires stack-like lifetimes
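The call sequence above can be reproduced with a small pointer-bumping allocator. This is a reconstruction under assumptions, not the actual 197.parser code: the Header layout, the alignment rule and the MEMORY_LIMIT value are invented for illustration. Freed objects are only marked, and space is reclaimed when the freed objects sit at the top of the array.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t MEMORY_LIMIT = 1 << 16;              // assumed capacity
constexpr std::size_t kAlign = alignof(std::max_align_t);

struct Header {       // hypothetical per-object header
    Header* prev;     // object allocated just before this one
    bool freed;       // set by xfree
};
constexpr std::size_t kHeaderSize =
    (sizeof(Header) + kAlign - 1) / kAlign * kAlign;

alignas(std::max_align_t) static char memory[MEMORY_LIMIT];
static std::size_t end_of_array = 0;   // bump pointer, as an offset
static Header* top = nullptr;          // most recently allocated object

void* xalloc(std::size_t sz) {
    if (sz > MEMORY_LIMIT) return nullptr;
    std::size_t payload = (sz + kAlign - 1) / kAlign * kAlign;
    if (kHeaderSize + payload > MEMORY_LIMIT - end_of_array) return nullptr;
    Header* h = new (memory + end_of_array) Header{top, false};
    top = h;
    end_of_array += kHeaderSize + payload;                  // bump
    return reinterpret_cast<char*>(h) + kHeaderSize;
}

void xfree(void* p) {
    if (p == nullptr) return;
    Header* h = reinterpret_cast<Header*>(static_cast<char*>(p) - kHeaderSize);
    h->freed = true;
    // Pop every freed object that is now at the top of the stack.
    while (top != nullptr && top->freed) {
        end_of_array =
            static_cast<std::size_t>(reinterpret_cast<char*>(top) - memory);
        top = top->prev;
    }
}
```

With this sketch, xfree(b) reclaims nothing because c is still on top; xfree(c) then pops both c and b, so d reuses the space that b occupied.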
19. (III) Regions
Separate areas, deletion only en masse
regioncreate(r)
regionmalloc(r, sz)
regiondelete(r)
+ Fast
+ Pointer-bumping allocation
+ Deletion of chunks
+ Convenient
+ One call frees all memory
- Risky
- Dangling references
- Too much space
Increasingly popular custom allocator
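A region allocator along these lines can be sketched in a few functions. The function names follow the slide, but the signatures are assumptions (here regioncreate returns the region instead of taking it as an argument), as are the Region layout and the 4K chunk size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr std::size_t kChunkSize = 4096;                   // assumed chunk size
constexpr std::size_t kRegionAlign = alignof(std::max_align_t);

struct Region {
    std::vector<char*> chunks;   // every chunk obtained from malloc
    std::size_t used = 0;        // bytes used in the newest chunk
    std::size_t capacity = 0;    // size of the newest chunk
};

Region* regioncreate() { return new Region; }

// Pointer-bumping allocation; individual objects are never freed.
// Assumes sz is far below SIZE_MAX, so the rounding cannot overflow.
void* regionmalloc(Region* r, std::size_t sz) {
    sz = (sz + kRegionAlign - 1) / kRegionAlign * kRegionAlign;
    if (r->chunks.empty() || sz > r->capacity - r->used) {
        std::size_t cap = sz > kChunkSize ? sz : kChunkSize;
        char* chunk = static_cast<char*>(std::malloc(cap));
        if (chunk == nullptr) return nullptr;
        r->chunks.push_back(chunk);
        r->used = 0;
        r->capacity = cap;
    }
    void* p = r->chunks.back() + r->used;
    r->used += sz;                                          // bump
    return p;
}

// Frees all memory in one call; returns the number of chunks released.
std::size_t regiondelete(Region* r) {
    std::size_t released = r->chunks.size();
    for (char* chunk : r->chunks) std::free(chunk);
    delete r;
    return released;
}
```

Any pointer into the region dangles after regiondelete, and a region holding one long-lived object keeps all of its chunks alive; both drawbacks follow directly from deleting only en masse.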
20. Custom Allocators Are Faster…
[Figure: Runtime, custom allocator benchmarks. Normalized runtime (0 to 1.75) of Custom vs. Win32 for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc and mudlle, grouped into non-regions and regions]
As good as and sometimes much faster than Win32
21. Not So Fast…
[Figure: Runtime, custom allocator benchmarks. Normalized runtime (0 to 1.75) of Custom vs. Win32 vs. DLmalloc for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc and mudlle, grouped into non-regions and regions]
DLmalloc: as fast or faster for most benchmarks
22. The Lea Allocator (DLmalloc 2.7.0)
Mature public-domain general-purpose allocator
Optimized for common allocation patterns
Per-size quicklists ≈ per-class allocation
Deferred coalescing (combining adjacent free objects)
Highly-optimized fastpath
Space-efficient
23. Space Consumption: Mixed Results
[Figure: Space, custom allocator benchmarks. Normalized space (0 to 1.75) of Custom vs. DLmalloc for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc and mudlle, grouped into non-regions and regions]
24. Custom Allocators?
Generally not worth the trouble: use a good general-purpose allocator
Avoids risky software engineering errors
25. Problems with Unsafe Languages
C, C++: pervasive applications, but the languages are memory-unsafe
Numerous opportunities for security vulnerabilities, errors
Double free
Invalid free
Uninitialized reads
Dangling pointers
Buffer overflows (stack & heap)
26. Soundness for “Erroneous” Programs
Normally: memory errors ⇒ undefined behavior…
Consider infinite-heap allocator:
Every new returns fresh memory; delete is ignored
No dangling pointers, invalid frees, double frees
Every object infinitely large
No buffer overflows, data overwrites
Transparent to correct program
“Erroneous” programs sound
27. Probabilistic Memory Safety
Approximate with M-heaps (e.g., M=2)
DieHard: fully-randomized M-heap
Increases odds of benign errors
Probabilistic memory safety
i.e., P(no error) ≥ n
Errors independent across heaps
E(users with no error) ≥ n * |users|
Efficient implementation…
28. Implementation Choices
Conventional, freelist-based heaps
Hard to randomize, protect from errors
Double frees, heap corruption
What about bitmaps? [Wilson90]
– Catastrophic fragmentation
Each small object likely to occupy one page
[Diagram: four small objects (obj), each occupying its own page]
29. Randomized Heap Layout
[Diagram: metadata bitmaps 00000001, 1010, 10 above heap regions with object sizes 2^(i+3), 2^(i+4), 2^(i+5)]
Bitmap-based, segregated size classes
Bit represents one object of given size
i.e., one bit = 2^(i+3) bytes, etc.
Prevents fragmentation
30. Randomized Allocation
[Diagram: metadata bitmaps 00000001, 1010, 10 above heap regions with object sizes 2^(i+3), 2^(i+4), 2^(i+5)]
malloc(8):
compute size class = ceil(log2(sz)) - 3
randomly probe bitmap for zero-bit (free)
Fast: expected runtime O(1)
M = 2 ⇒ E[# of probes] ≤ 2
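Random probing over a bitmap can be sketched as below. This is an illustration of the idea, not DieHard's implementation: the class name RandomizedClass, the slot-index interface and the use of std::mt19937 are assumptions. The class refuses to go above half full (M = 2), so each probe finds a free slot with probability at least 1/2 and the expected number of probes is at most 2.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Class 0 holds 8-byte objects, class 1 holds 16-byte objects, and so on.
// Matches ceil(log2(sz)) - 3 for sz >= 8; smaller requests use class 0.
int size_class(std::size_t sz) {
    int cls = 0;
    for (std::size_t size = 8; size < sz; size <<= 1) ++cls;
    return cls;
}

// One size class: a bitmap with one bit per object slot.
class RandomizedClass {
public:
    RandomizedClass(std::size_t slots, unsigned seed)
        : used_(slots, false), live_(0), rng_(seed) {}

    // Returns a random free slot index, or -1 if the class is half full.
    long allocate() {
        if (2 * (live_ + 1) > used_.size()) return -1;     // keep <= 1/M full
        std::uniform_int_distribution<std::size_t> pick(0, used_.size() - 1);
        for (;;) {
            std::size_t i = pick(rng_);                    // random probe
            if (!used_[i]) {
                used_[i] = true;                           // set the bit
                ++live_;
                return static_cast<long>(i);
            }
        }
    }

private:
    std::vector<bool> used_;
    std::size_t live_;
    std::mt19937 rng_;
};
```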
31. Randomized Allocation
[Diagram: metadata bitmaps 00010001, 1010, 10 above heap regions with object sizes 2^(i+3), 2^(i+4), 2^(i+5)]
malloc(8):
compute size class = ceil(log2(sz)) - 3
randomly probe bitmap for zero-bit (free)
Fast: expected runtime O(1)
M = 2 ⇒ E[# of probes] ≤ 2
32. Randomized Deallocation
[Diagram: metadata bitmaps 00010001, 1010, 10 above heap regions with object sizes 2^(i+3), 2^(i+4), 2^(i+5)]
free(ptr):
Ensure object valid – aligned to right address
Ensure allocated – bit set
Resets bit
Prevents invalid frees, double frees
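The checks in free(ptr) can be sketched as follows. The structure SizeClassRegion and the result codes are hypothetical, and addresses are plain integers to keep the example self-contained; a real allocator would typically just ignore a free that fails either check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class FreeResult { Ok, InvalidPointer, NotAllocated };

// One size class: a heap region of equal-size slots plus its bitmap.
struct SizeClassRegion {
    std::uintptr_t base;           // start address of the region
    std::size_t object_size;       // size of every slot, in bytes
    std::vector<bool> allocated;   // one bit per slot
};

FreeResult checked_free(SizeClassRegion& r, std::uintptr_t ptr) {
    std::uintptr_t end = r.base + r.object_size * r.allocated.size();
    if (ptr < r.base || ptr >= end) return FreeResult::InvalidPointer;

    // Valid object: the pointer must sit exactly on a slot boundary.
    std::uintptr_t offset = ptr - r.base;
    if (offset % r.object_size != 0) return FreeResult::InvalidPointer;

    // Allocated: the bit must be set, otherwise this is a double free
    // or a free of something that was never allocated.
    std::size_t slot = static_cast<std::size_t>(offset / r.object_size);
    if (!r.allocated[slot]) return FreeResult::NotAllocated;

    r.allocated[slot] = false;     // reset the bit
    return FreeResult::Ok;
}
```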
34. Randomized Deallocation
[Diagram: metadata bitmaps 00000001, 1010, 10 above heap regions with object sizes 2^(i+3), 2^(i+4), 2^(i+5)]
free(ptr):
Ensure object valid – aligned to right address
Ensure allocated – bit set
Resets bit
Prevents invalid frees, double frees
35. Randomized Heaps & Reliability
[Diagram: two randomized heap layouts with size classes 2^(i+3) and 2^(i+4); the same numbered objects land in different slots in each run]
My Mozilla: “malignant” overflow
Your Mozilla: “benign” overflow
Objects randomly spread across heap
Different run = different heap
Errors across heaps independent
36. DieHard software architecture
[Diagram: input is broadcast to replica1 (seed1), replica2 (seed2) and replica3 (seed3), which execute as separate processes; a vote over their results produces the output]
Replication-based fault-tolerance
Requires randomization: errors independent
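The voting step can be sketched as a comparison of replica outputs. The rule used here (accept an output that at least two replicas agree on, preferring the one with the most votes) is an assumption for illustration, as is the function name vote.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of a replica whose output is shared by at least two
// replicas (the most common output wins), or -1 if no two replicas agree.
int vote(const std::vector<std::string>& outputs) {
    int winner = -1;
    std::size_t best = 1;                    // need at least two votes
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        std::size_t count = 0;
        for (const std::string& other : outputs) {
            if (other == outputs[i]) ++count;
        }
        if (count > best) {
            best = count;
            winner = static_cast<int>(i);
        }
    }
    return winner;
}
```

Voting only helps if the replicas fail differently, which is why each replica runs with its own random seed.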
37. DieHard Results
Analytical results (pictures!)
Buffer overflows
Uninitialized reads
Dangling pointer errors (the best)
Empirical results
Runtime overhead
Error avoidance
Injected faults & actual applications
38. Analytical Results: Buffer Overflows
Model overflow as write of live data
Heap half full (max occupancy)
40. Analytical Results: Buffer Overflows
Model overflow: random write of live data
Heap half full (max occupancy)
41. Analytical Results: Buffer Overflows
Replicas: Increase odds of avoiding
overflow in at least one replica
43. Analytical Results: Buffer Overflows
Replicas: Increase odds of avoiding
overflow in at least one replica
P(Overflow in all replicas) = (1/2)^3 = 1/8
P(No overflow in ≥ 1 replica) = 1 - (1/2)^3 = 7/8
44. Analytical Results: Buffer Overflows
F = free space
H = heap size
N = # objects' worth of overflow
k = replicas
Overflow one object
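The formula itself did not survive on this slide. One form consistent with these definitions and with the three-replica example (heap half full, one object overflowed, 7/8) is P = 1 - (1 - (F/H)^N)^k. It assumes objects are placed uniformly at random and independently across replicas. The sketch below evaluates that assumed form; the function name is made up.

```cpp
#include <cassert>
#include <cmath>

// Probability that at least one of k replicas is unaffected by an overflow
// of N objects' worth of data, under the assumed model:
//   P = 1 - (1 - (F/H)^N)^k
// (F/H)^N is the chance that all N overwritten slots were free in one replica.
double p_overflow_masked(double F, double H, double N, int k) {
    double one_replica_ok = std::pow(F / H, N);
    return 1.0 - std::pow(1.0 - one_replica_ok, k);
}
```

With F/H = 1/2, N = 1 and k = 3 this gives 1 - (1/2)^3 = 7/8.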