Operating Systems - Dynamic Memory Management

Operating Systems
CMPSCI 377
Dynamic Memory Management
Emery Berger
University of Massachusetts Amherst

UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science

Dynamic Memory Management
How the heap manager is implemented


malloc, free


new, delete



Memory Management
Ideal memory manager:


Fast


Raw time, asymptotic runtime, locality


Memory efficient


Low fragmentation


With multicore & multiprocessors:


Scalable to multiple processors


New issues:


Secure from attack


Reliable in face of errors



Memory Manager Functions
Not just malloc/free


realloc


Change size of object, copying old contents


ptr = realloc (ptr, 10);


But: realloc(ptr, 0) = ?


How about: realloc (NULL, 16) ?


Other fun


calloc


memalign


Needs ability to locate size & object start
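On the two realloc questions above: in ISO C, realloc(NULL, n) behaves like malloc(n), while the result of realloc(ptr, 0) is implementation-defined, so portable code avoids it. Note also that ptr = realloc(ptr, 10) loses the old block if realloc fails. A minimal sketch of a safer wrapper follows; resize_buffer is a hypothetical helper name, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: resize *buf to new_size bytes without leaking on failure.
// Returns true on success; on failure *buf is untouched and still valid.
bool resize_buffer(char** buf, std::size_t new_size) {
    assert(buf != nullptr);
    if (new_size == 0) {
        // Avoid realloc(ptr, 0): what it returns is implementation-defined.
        std::free(*buf);
        *buf = nullptr;
        return true;
    }
    // realloc(nullptr, n) behaves like malloc(n), so a null *buf is fine here.
    void* grown = std::realloc(*buf, new_size);
    if (grown == nullptr) return false;  // the old block is still owned by *buf
    *buf = static_cast<char*>(grown);
    return true;
}
```

Keeping the old pointer until realloc succeeds is the design point: the caller can still free or keep using the original block after a failure.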



Fragmentation
Intuitively, fragmentation stems from “breaking” up heap into unusable spaces

More fragmentation = worse utilization of memory
External fragmentation


Wasted space outside allocated objects


Internal fragmentation


Wasted space inside an object
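To make internal fragmentation concrete, here is a small sketch that rounds each request up to a power-of-two size and reports the bytes wasted inside the object. The 8-byte minimum and the function names are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Round a request up to the next power of two (8-byte minimum, an assumed floor).
std::size_t rounded_size(std::size_t request) {
    assert(request > 0);
    std::size_t size = 8;
    while (size < request) size *= 2;
    return size;
}

// Internal fragmentation: bytes handed out but never requested.
std::size_t internal_waste(std::size_t request) {
    return rounded_size(request) - request;
}
```

A 65-byte request lands in a 128-byte slot under this scheme, so almost half of the slot is wasted; external fragmentation, by contrast, is free space that sits between objects.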



Classical Algorithms
First-fit


find first chunk of desired size



Best-fit


find chunk that fits best


Minimizes wasted space



Worst-fit


find chunk that fits worst


then split object


Reclaim space: coalesce free adjacent objects into one big object
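The three placement policies differ only in which free chunk they pick. The sketch below shows that choice over a plain list of free chunk sizes; the vector is a hypothetical stand-in for a real free list, and splitting and coalescing are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each function returns the index of the chosen free chunk, or -1 if none fits.
int first_fit(const std::vector<std::size_t>& free_sizes, std::size_t request) {
    assert(request > 0);
    for (std::size_t i = 0; i < free_sizes.size(); ++i)
        if (free_sizes[i] >= request) return static_cast<int>(i);
    return -1;
}

// Best-fit: the smallest chunk that is still large enough.
int best_fit(const std::vector<std::size_t>& free_sizes, std::size_t request) {
    assert(request > 0);
    int best = -1;
    for (std::size_t i = 0; i < free_sizes.size(); ++i)
        if (free_sizes[i] >= request &&
            (best < 0 || free_sizes[i] < free_sizes[best]))
            best = static_cast<int>(i);
    return best;
}

// Worst-fit: the largest chunk, leaving the biggest remainder after a split.
int worst_fit(const std::vector<std::size_t>& free_sizes, std::size_t request) {
    assert(request > 0);
    int worst = -1;
    for (std::size_t i = 0; i < free_sizes.size(); ++i)
        if (free_sizes[i] >= request &&
            (worst < 0 || free_sizes[i] > free_sizes[worst]))
            worst = static_cast<int>(i);
    return worst;
}
```

First-fit can stop at the first match, while best-fit and worst-fit scan the whole list unless it is kept sorted by size.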


Implementation Techniques
Freelists


Linked lists of objects in same size class


Range of object sizes


First-fit, best-fit in this context



Segregated size classes


Use free lists, but never coalesce or split


Choice of size classes


Exact


Powers-of-two



Big Bag of Pages (BiBOP)


Page or pages (multiples of 4K)


Usually segregated size classes


Header contains metadata


Locate with bitmasking


Limits external fragmentation


Can be very fast
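"Locate with bitmasking" can be sketched as follows: because every page starts at a multiple of the page size, clearing the low bits of an object's address yields the page header. PageHeader and its single field are hypothetical; a real BiBOP header would hold more metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kPageSize = 4096;  // 4K pages, as on the slide
static_assert((kPageSize & (kPageSize - 1)) == 0, "page size must be a power of two");

// Hypothetical per-page header stored at the start of every page.
struct PageHeader {
    std::size_t object_size;  // size class shared by all objects in this page
};

// Clear the low bits of an address to find the start of its page.
inline std::uintptr_t page_start(std::uintptr_t address) {
    return address & ~(kPageSize - 1);
}

// Size lookup for free/realloc: one mask plus one load, no per-object header.
inline std::size_t size_of_object(const void* object) {
    assert(object != nullptr);
    auto address = reinterpret_cast<std::uintptr_t>(object);
    return reinterpret_cast<const PageHeader*>(page_start(address))->object_size;
}
```

This only works if pages are allocated page-aligned, which is why BiBOP heaps obtain memory in whole pages.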



Runtime Analysis
Key components


Cost of malloc (best, worst, average)


Cost of free


Cost of size lookup (for realloc & free)


Examine for first-fit, best-fit, segregated


(with BiBOP)


Space Bounds
Fragmentation worst-case for “optimal”: O(log(M/m))

M = largest object size

m = smallest object size

Best-fit = O(M * m) !


Goal: perform well for typical programs


Considerations:


Internal fragmentation


External fragmentation


Headers (metadata)



Performance Issues
We’ll talk about scalability later


Reliability, too


But: general-purpose allocator often seen as too slow


Custom Memory Allocation
Programmers replace new/delete, bypassing system allocator

Reduce runtime – often

Expand functionality – sometimes

Reduce space – rarely

Very common practice

Apache, gcc, lcc, STL, database servers…

Language-level support in C++

Widely recommended: “Use custom allocators”


Drawbacks of Custom Allocators

Avoiding system allocator:


More code to maintain & debug


Can’t use memory debuggers


Not modular or robust:


Mix memory from custom and general-purpose allocators → crash!
Increased burden on programmers


Are custom allocators really a win?


(I) Per-Class Allocators

Recycle freed objects from a free list

a = new Class1;
b = new Class1;
c = new Class1;
delete a;
delete b;
delete c;
a = new Class1;
b = new Class1;
c = new Class1;

[Figure: Class1 free list holding a, b, c]

+ Fast
+ Linked list operations
+ Simple
+ Identical semantics
+ C++ language support
- Possibly space-inefficient
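A per-class allocator can be sketched with C++'s class-specific operator new and operator delete. This is a minimal, single-threaded illustration; the payload field and the FreeNode type are assumptions, and freed blocks are never returned to the system, which is the space inefficiency noted above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Class1 {
public:
    double payload[2] = {0.0, 0.0};  // assumed contents, large enough to hold a link

    // Pop a recycled block from the free list, else fall back to the system allocator.
    static void* operator new(std::size_t size) {
        static_assert(sizeof(Class1) >= sizeof(FreeNode), "block must fit a list link");
        assert(size == sizeof(Class1));
        if (free_list_ != nullptr) {
            FreeNode* node = free_list_;
            free_list_ = node->next;
            return node;
        }
        return ::operator new(size);
    }

    // Push the freed block onto the free list instead of releasing it.
    static void operator delete(void* ptr) noexcept {
        if (ptr == nullptr) return;
        free_list_ = new (ptr) FreeNode{free_list_};
    }

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* free_list_;
};

Class1::FreeNode* Class1::free_list_ = nullptr;
```

Client code keeps writing new Class1 and delete p unchanged, which is the "identical semantics" advantage; the most recently deleted block is the first one reused.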


(II) Custom Patterns
Tailor-made to fit allocation patterns

Example: 197.parser (natural language parser)

[Figure: char[MEMORY_LIMIT] array holding a, b, c, and later d; end_of_array marks the next free byte after each call]

a = xalloc(8);
b = xalloc(16);
c = xalloc(8);
xfree(b);
xfree(c);
d = xalloc(8);

+ Fast
+ Pointer-bumping allocation
- Brittle
- Fixed memory size
- Requires stack-like lifetimes
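The xalloc/xfree pattern above can be sketched as a stack over a fixed array. This is an illustration of the idea, not the parser's actual code: the header layout, the MEMORY_LIMIT value, and the helper names are assumptions. A freed block is only reclaimed once everything allocated after it has also been freed.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t MEMORY_LIMIT = 1024;  // assumed value; the slide gives none
constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

struct Header { std::size_t prev; bool freed; };  // prev = offset of previous header

constexpr std::size_t round_up(std::size_t n) { return (n + kAlign - 1) / kAlign * kAlign; }
constexpr std::size_t kHeaderSize = round_up(sizeof(Header));

alignas(std::max_align_t) static char memory[MEMORY_LIMIT];
static std::size_t end_of_array = 0;    // offset of the first unused byte
static std::size_t last_block = kNone;  // offset of the topmost block's header

// Pointer-bumping allocation; returns nullptr once the fixed array is exhausted.
void* xalloc(std::size_t size) {
    std::size_t total = kHeaderSize + round_up(size);
    if (total > MEMORY_LIMIT - end_of_array) return nullptr;
    Header* h = new (memory + end_of_array) Header{last_block, false};
    last_block = end_of_array;
    end_of_array += total;
    return reinterpret_cast<char*>(h) + kHeaderSize;
}

// Mark the block freed, then pop every freed block sitting on top of the stack.
void xfree(void* block) {
    assert(block != nullptr);
    auto* h = reinterpret_cast<Header*>(static_cast<char*>(block) - kHeaderSize);
    h->freed = true;
    while (last_block != kNone) {
        auto* top = reinterpret_cast<Header*>(memory + last_block);
        if (!top->freed) break;
        end_of_array = last_block;
        last_block = top->prev;
    }
}
```

In the slide's sequence, xfree(b) reclaims nothing because c is still live above it; xfree(c) then pops both c and b, so d reuses b's space. Objects freed out of stack order tie up memory, which is why the scheme is brittle.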

(III) Regions

Separate areas, deletion only en masse

regioncreate(r)
regionmalloc(r, sz)
regiondelete(r)

+ Fast
+ Pointer-bumping allocation
+ Deletion of chunks
+ Convenient
+ One call frees all memory
- Risky
- Dangling references
- Too much space

Increasingly popular custom allocator
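A region allocator along these lines might look like the sketch below. The function names follow the slide, but the signatures, the Region struct, and the 4K chunk size are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr std::size_t kChunkSize = 4096;  // assumed chunk size

// Hypothetical region: a list of chunks plus a bump pointer into the newest one.
struct Region {
    std::vector<void*> chunks;   // every chunk obtained from the system allocator
    char* cursor = nullptr;      // next free byte in the current chunk
    std::size_t remaining = 0;   // bytes left in the current chunk
};

void regioncreate(Region& r) { r = Region{}; }

// Pointer-bumping allocation; grabs a fresh chunk when the current one is full.
void* regionmalloc(Region& r, std::size_t sz) {
    assert(sz > 0);
    constexpr std::size_t align = alignof(std::max_align_t);
    sz = (sz + align - 1) / align * align;
    if (sz > r.remaining) {
        std::size_t chunk = sz > kChunkSize ? sz : kChunkSize;
        void* mem = std::malloc(chunk);
        if (mem == nullptr) return nullptr;
        r.chunks.push_back(mem);
        r.cursor = static_cast<char*>(mem);
        r.remaining = chunk;
    }
    void* result = r.cursor;
    r.cursor += sz;
    r.remaining -= sz;
    return result;
}

// One call frees all memory; any pointer into the region now dangles.
void regiondelete(Region& r) {
    for (void* chunk : r.chunks) std::free(chunk);
    r = Region{};
}
```

There is no per-object free, so a single long-lived object keeps its whole region alive; that is the "too much space" drawback listed above.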


Custom Allocators Are Faster…

[Figure: Runtime - Custom Allocator Benchmarks. Bar chart of normalized runtime (0 to 1.75), Custom vs. Win32, grouped into non-regions and regions. Benchmarks: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle]

As good as and sometimes much faster than Win32



Not So Fast…

[Figure: Runtime - Custom Allocator Benchmarks. Bar chart of normalized runtime (0 to 1.75), Custom vs. Win32 vs. DLmalloc, grouped into non-regions and regions. Benchmarks: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle]

DLmalloc: as fast or faster for most benchmarks


The Lea Allocator (DLmalloc 2.7.0)
Mature public-domain general-purpose allocator

Optimized for common allocation patterns

Per-size quicklists ≈ per-class allocation

Deferred coalescing (combining adjacent free objects)
Highly-optimized fastpath


Space-efficient



Space Consumption: Mixed Results

[Figure: Space - Custom Allocator Benchmarks. Bar chart of normalized space (0 to 1.75), Custom vs. DLmalloc, grouped into non-regions and regions. Benchmarks: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle]


Custom Allocators?
Generally not worth the trouble: use good general-purpose allocator

Avoids risky software engineering errors



Problems with Unsafe Languages
C, C++: pervasive apps, but the languages are memory unsafe

Numerous opportunities for security vulnerabilities, errors
Double free


Invalid free


Uninitialized reads


Dangling pointers


Buffer overflows (stack & heap)



Soundness for “Erroneous” Programs

Normally: memory errors ⇒ ⊥ (undefined) …


Consider infinite-heap allocator:


All news fresh; ignore delete

No dangling pointers, invalid frees, double frees
Every object infinitely large


No buffer overflows, data overwrites


Transparent to correct program


“Erroneous” programs sound



Probabilistic Memory Safety

Approximate with M-heaps (e.g., M=2)

DieHard: fully-randomized M-heap


Increases odds of benign errors


Probabilistic memory safety


i.e., P(no error) ≥ n

Errors independent across heaps

E(users with no error) ≥ n * |users|


? Efficient implementation…


Implementation Choices

Conventional, freelist-based heaps


Hard to randomize, protect from errors


Double frees, heap corruption


What about bitmaps? [Wilson90]


– Catastrophic fragmentation
Each small object likely to occupy one page


obj obj obj obj

pages


Randomized Heap Layout
00000001 1010 10 metadata
size = 2^{i+3}, 2^{i+4}, 2^{i+5}

heap

Bitmap-based, segregated size classes

Bit represents one object of given size

i.e., one bit = 2^{i+3} bytes, etc.


Prevents fragmentation



Randomized Allocation
00000001 1010 10 metadata
size = 2^{i+3}, 2^{i+4}, 2^{i+5}

heap

malloc(8):
compute size class = ceil(log2 sz) – 3


randomly probe bitmap for zero-bit (free)


Fast: runtime O(1)


M=2 ⇒ E[# of probes] ≤ 2
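The allocation steps above can be sketched as follows. This is a simplified illustration rather than DieHard's code: Partition, random_alloc, and the use of std::mt19937 as the random source are assumptions. The occupancy check keeps each partition at most half full (M = 2), which is what bounds the expected number of probes by 2.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Size class i holds objects of 2^(i+3) bytes: 8 -> 0, 16 -> 1, 32 -> 2, ...
int size_class(std::size_t sz) {
    assert(sz > 0);
    int cls = 0;
    for (std::size_t cap = 8; cap < sz; cap *= 2) ++cls;
    return cls;
}

// One size-class partition: a bitmap with one flag per object slot.
struct Partition {
    std::vector<bool> in_use;
    std::size_t live = 0;  // number of slots currently allocated
};

// Probe random slots until a free one turns up. Returns the slot index, or -1
// if the allocation would push the partition past half full.
long random_alloc(Partition& p, std::mt19937& rng) {
    if (2 * (p.live + 1) > p.in_use.size()) return -1;
    std::uniform_int_distribution<std::size_t> pick(0, p.in_use.size() - 1);
    for (;;) {
        std::size_t slot = pick(rng);
        if (!p.in_use[slot]) {
            p.in_use[slot] = true;  // set the bit: slot is now allocated
            ++p.live;
            return static_cast<long>(slot);
        }
    }
}
```

Since at least half of the slots are free at every call, each probe succeeds with probability at least 1/2, so the loop finishes after 2 probes or fewer on average.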



Randomized Allocation (after malloc(8))
00010001 1010 10 metadata
size = 2^{i+3}, 2^{i+4}, 2^{i+5}

heap

The randomly probed zero-bit is now set: the object is allocated

Randomized Deallocation
00010001 1010 10 metadata
size = 2^{i+3}, 2^{i+4}, 2^{i+5}

heap

free(ptr):


Ensure object valid – aligned to right address


Ensure allocated – bit set


Resets bit


Prevents invalid frees, double frees
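The checks performed by free(ptr) can be sketched over a single size-class partition. The Partition layout and the checked_free name are hypothetical; addresses are modeled as integers so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one size-class partition.
struct Partition {
    std::uintptr_t base;        // address of slot 0
    std::size_t object_size;    // bytes per slot (a power of two)
    std::vector<bool> in_use;   // one flag per slot
};

// Returns true if the free was honored, false if it was ignored as invalid.
bool checked_free(Partition& p, std::uintptr_t ptr) {
    assert(p.object_size > 0);
    if (ptr < p.base) return false;                  // below the partition
    std::uintptr_t offset = ptr - p.base;
    if (offset % p.object_size != 0) return false;   // not at an object start
    std::size_t slot = offset / p.object_size;
    if (slot >= p.in_use.size()) return false;       // past the end
    if (!p.in_use[slot]) return false;               // not allocated: double free
    p.in_use[slot] = false;                          // reset the bit
    return true;
}
```

Because the metadata lives in a bitmap rather than in headers next to the objects, an invalid or repeated free is simply ignored instead of corrupting the heap.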



Randomized Deallocation (after free(ptr))
00000001 1010 10 metadata
size = 2^{i+3}, 2^{i+4}, 2^{i+5}

heap

The object's bit is reset: the slot is free again

Randomized Heaps & Reliability
[Figure: two randomized heap layouts with object sizes 2^{i+3} and 2^{i+4}; the same numbered objects land in different slots in each run]

My Mozilla: “malignant” overflow

Your Mozilla: “benign” overflow

Objects randomly spread across heap

Different run = different heap

Errors across heaps independent

DieHard software architecture

[Figure: input is broadcast to replica1 (seed1), replica2 (seed2), and replica3 (seed3); the replicas execute as separate processes, and a vote over their results produces the output]

Replication-based fault-tolerance

Requires randomization: errors independent
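The voting step can be sketched as below, under the assumption that an output is accepted once at least two replicas agree on it. The vote function and its string-based interface are hypothetical simplifications; real replicas would exchange output buffers between processes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns true and sets `winner` as soon as two replicas produced identical output.
bool vote(const std::vector<std::string>& outputs, std::string& winner) {
    assert(!outputs.empty());
    std::map<std::string, int> tally;
    for (const std::string& out : outputs) {
        if (++tally[out] >= 2) {
            winner = out;
            return true;
        }
    }
    return false;  // no agreement among the replicas
}
```

Voting only helps if the replicas fail differently, which is why each one runs with its own random seed.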



DieHard Results

Analytical results (pictures!)


Buffer overflows


Uninitialized reads


Dangling pointer errors (the best)


Empirical results


Runtime overhead


Error avoidance


Injected faults & actual applications



Analytical Results: Buffer Overflows

Model overflow: random write of live data

Heap half full (max occupancy)

Replicas: Increase odds of avoiding overflow in at least one replica

P(Overflow in all replicas) = (½)^3 = 1/8

P(No overflow in ≥ 1 replica) = 1 - (½)^3 = 7/8

F = free space

H = heap size

N = # objects worth of overflow

k = replicas

Overflow one object
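The formula that used these symbols did not survive on the slide. A reconstruction consistent with the three-replica example, assuming objects are placed uniformly at random and replicas fail independently, would be:

```latex
P(\text{overflow masked in at least one replica})
  = 1 - \left[\, 1 - \left(\frac{F}{H}\right)^{N} \right]^{k}
```

With a half-full heap (F/H = 1/2), an overflow of one object (N = 1), and k = 3 replicas, this gives 1 - (1/2)^3 = 7/8, matching the figure above.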



Empirical Results: Runtime


Empirical Results: Error Avoidance
Injected faults:


Dangling pointers (@50%, 10 allocations)


glibc: crashes; DieHard: 9/10 correct


Overflows (@1%, 4 bytes over)


glibc: crashes 9/10, inf loop; DieHard: 10/10 correct


Real faults:


Avoids Squid web cache overflow


Crashes BDW & glibc


Avoids dangling pointer error in Mozilla


DoS in glibc & Windows



The End

