- The document discusses direct mapped caches including cache hit/miss terminology and how direct mapped caches work by mapping each memory word to a single cache block based on the memory address.
- It provides an example of a direct mapped cache with 1024KB capacity and 32-bit addresses, showing the cache block format and how an example address would map to a cache block and tag field.
- The document also discusses cache block size being larger than one word to improve cache performance and provides an example with a 4-word cache block.
2. Memory Hierarchy
Registers: very few, very fast
cache memory: small, fast
main memory: large, slow
secondary memory (disk): huge, very slow
Example: http://www.dell.com/content/products/category.aspx/vostrodt?c=us&cs=04&l=en&s=bsd
Comp Sci 251 -- mem hierarchy 2
3. Memory Hierarchy
Registers: very small, very fast
cache memory: small, fast
Up to Intel® Core™ 2 Duo E6700 (2.60GHz, 4MB L2 cache, 1066MHz FSB)
main memory: large, slow
Up to 4GB Dual-Channel DDR2 SDRAM (667MHz)
secondary memory (disk): huge, very slow
Up to 2 hard drives and 1 TB of data
4. Goal: Illusion of Large Fast Memory
Exploit two characteristics of programs:
Temporal locality: programs tend to re-access recently accessed locations
Spatial locality: programs tend to access neighbors of recently accessed locations
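As an illustrative sketch (not from the slides), a simple array-summing loop shows both kinds of locality at once:

```python
# Illustrative only: one loop, two kinds of locality.
data = list(range(1000))

total = 0
for i in range(len(data)):
    # Spatial locality: data[i] is a neighbor of the just-accessed data[i-1].
    # Temporal locality: `total` and `i` are re-accessed on every iteration.
    total += data[i]

print(total)
```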
5. Exploiting Locality
Keep most recently accessed items and their neighbors in faster memory
We will focus on the cache/main memory interface
6. Terminology
Cache hit
word we want to access is already in cache
transaction is fast
Cache miss
word we want is not in cache
main memory access is necessary
transaction is slow
7. Hit or Miss?
Two questions on each memory access:
Is the requested word in cache?
Where in cache is it?
Cache block contains:
copy of main memory word(s)
info about where word(s) came from
(Note: cache block can contain >1 memory word)
8. Direct Mapped Cache
Each memory word maps to a single cache block
“round robin” assignment of words to blocks
(assume one-word blocks for now, byte-addressable memory)
How do we tell which block a word maps to?
number of cache blocks is a power of 2
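The "round robin" assignment above can be sketched as a modulo operation. A minimal sketch, assuming one-word (4-byte) blocks and a power-of-two block count; the name `cache_index` is illustrative:

```python
NUM_BLOCKS = 8  # must be a power of 2

def cache_index(byte_address):
    word_address = byte_address // 4   # drop the 2 byte-offset bits
    return word_address % NUM_BLOCKS   # "round robin" assignment to blocks

# Words 0, 8, 16, ... all share block 0; words 1, 9, 17, ... share block 1.
print(cache_index(0x00), cache_index(0x20), cache_index(0x04))  # 0 0 1
```

Because NUM_BLOCKS is a power of two, the modulo is just the low bits of the word address, which is why the index can be read straight out of the address.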
9. Direct Mapped Cache
(Figure: a table of main memory addresses and data next to the cache's own address/data table)
10. Direct Mapped Cache
(Figure: a 4-block cache (indexes 0–3) beside a 10-block main memory (indexes 0–9); memory indexes map round-robin onto the cache indexes)
11. Direct Mapped Cache
If cache has 2^n blocks, cache block index = n bits
main memory address: | Tag | cache index | 00 |
leftover bits are stored in cache as a tag
Cache block format: | Cache index | V | Tag | Data |
12. Cache Read Operation
Look up cache block indexed by low n bits of
address
Compare Tag in cache with Tag bits of
address
if valid and match: HIT
read from cache
mismatch or invalid: MISS
new word + Tag is brought into cache
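The read steps above can be sketched in code. This is a minimal model, assuming one-word blocks and a 2^n-block cache; the class and method names are illustrative, not from the slides:

```python
NUM_BLOCKS = 256  # 2^8 blocks -> n = 8 index bits

class Cache:
    def __init__(self):
        # Each block holds (valid, tag, data).
        self.blocks = [(False, 0, 0)] * NUM_BLOCKS

    def read(self, address, memory):
        word_addr = address >> 2            # strip the 2 byte-offset bits
        index = word_addr % NUM_BLOCKS      # low n bits of the word address
        tag = word_addr // NUM_BLOCKS       # leftover high bits
        valid, stored_tag, data = self.blocks[index]
        if valid and stored_tag == tag:     # valid and match: HIT
            return data, True
        data = memory[address]              # mismatch or invalid: MISS
        self.blocks[index] = (True, tag, data)  # bring word + tag into cache
        return data, False

memory = {0x40: 7}
c = Cache()
print(c.read(0x40, memory))  # (7, False)  -- first access misses
print(c.read(0x40, memory))  # (7, True)   -- second access hits
```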
13. Cache Write Operation
(Problem: main memory and cache can become
inconsistent)
Look up cache block indexed by low n bits of
address
Compare Tag with high bits of address
if valid and match: HIT
write cache and main memory (write-through policy)
invalid or mismatch: MISS
write main memory
overwrite cache with new word + Tag
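The write-through policy above can be sketched the same way: main memory is always written, so cache and memory never diverge, and on a miss the block is simply overwritten with the new word and tag. A minimal sketch with illustrative names, assuming one-word blocks:

```python
NUM_BLOCKS = 256

def write_through(cache_blocks, memory, address, value):
    word_addr = address >> 2
    index = word_addr % NUM_BLOCKS
    tag = word_addr // NUM_BLOCKS
    memory[address] = value                   # always write main memory
    cache_blocks[index] = (True, tag, value)  # hit: update; miss: overwrite

blocks = [(False, 0, 0)] * NUM_BLOCKS
memory = {}
write_through(blocks, memory, 0x40, 99)
print(memory[0x40], blocks[16])  # cache and memory agree after the write
```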
14. Exercise
Byte-addressable main memory
32-bit main memory addresses
1024KB-capacity cache; one word per cache block
256K = 2^18 cache blocks
Show cache block format & address decomposition
Access memory address 0x0040006c
Which cache block?
What tag bits?
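The exercise can be checked with a few shifts and masks. With 2^18 one-word blocks the address splits into a 12-bit tag, an 18-bit index, and a 2-bit byte offset:

```python
address = 0x0040006C

index = (address >> 2) & ((1 << 18) - 1)  # bits [19:2] = cache index
tag = address >> 20                       # top 12 bits = tag

print(hex(tag), hex(index))  # 0x4 0x1b  -> tag 0x004, cache block 27
```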
15. Cache Block Size
A cache block may contain > 1 main memory
word (why is this a good idea?)
Example:
byte-addressable memory
4-word cache block
Address: | block address | offset | 00 |
       = | tag | cache index | offset | 00 |
16. Exercise
Byte-addressable main memory
32-bit main memory addresses
1024KB-capacity cache; four words per cache block
64K = 2^16 cache blocks
Show cache block format & address decomposition
Access memory address 0x0040006c
Which cache block?
What tag bits?
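With four-word blocks the same address now splits four ways: a 12-bit tag, a 16-bit index, a 2-bit word offset, and a 2-bit byte offset. A worked check:

```python
address = 0x0040006C

byte_offset = address & 0x3          # byte within the word (2 bits)
word_offset = (address >> 2) & 0x3   # which of the 4 words in the block
index = (address >> 4) & 0xFFFF      # 16 index bits
tag = address >> 20                  # 12 tag bits

print(hex(tag), index, word_offset, byte_offset)  # 0x4 6 3 0
```

So the access lands on word 3 of cache block 6, with tag 0x004.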
17. Given main memory of 256 bytes, a memory block is one word, cache size (number of blocks) is 8
What is the total size of the cache in bits?
Memory address is m bits long: 256 bytes = 2^m bytes → m = ?
You need n bits to address 8 blocks in cache. n = ?
You need b bits to address 4 bytes in a word. b = ?
Size of tag = t = m - n - b
Size of cache = (1 + t + 32) × 8

Address (binary)   Contents (hex)
000000             aa bb cc dd
000100             00 11 00 33
001000             ff ee 01 23
001100             45 67 89 0a
010000             bc de f0 1a
010100             2a 3a 4a 5a
011000             6a 7a 8a 9a
011100             1b 2b 3b 4b
100000             b2 b3 b4 b5
100100             c1 c2 c3 c4
101000             d1 d2 d3 d4
101100             e1 e2 e3 e4
110000             f1 f2 f3 f4
110100             a1 a2 a3 a4
111000             2c 3c 4c 5c
111100             2d 3d 4d 5d
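The size formula from this exercise generalizes to (valid + tag + data) bits per block, times the number of blocks. A sketch with an illustrative helper name:

```python
def cache_size_bits(addr_bits, index_bits, offset_bits, data_bits):
    tag_bits = addr_bits - index_bits - offset_bits   # t = m - n - b
    # (valid + tag + data) per block, times 2^index_bits blocks
    return (1 + tag_bits + data_bits) * (1 << index_bits)

# This exercise: 8-bit addresses, 8 blocks (n=3), 1-word blocks (b=2), 32 data bits
print(cache_size_bits(8, 3, 2, 32))  # 288
# Two-word blocks (b=3, 64 data bits) for comparison
print(cache_size_bits(8, 3, 3, 64))  # 536
```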
18. Given main memory of 256 bytes, a memory block is 2 words, cache size is 8
What is the total size of the cache in bits?
Memory address is m bits long: 256 bytes = 2^m bytes → m = ?
You need n bits to address 8 blocks in cache. n = ?
You need b bits to address the bytes in 2 words. b = ?
Size of tag = t = m - n - b
Size of cache = (1 + t + 64) × 8

Address (binary)   Contents (hex)
000000             aa bb cc dd
000100             00 11 00 33
001000             ff ee 01 23
001100             45 67 89 0a
010000             bc de f0 1a
010100             2a 3a 4a 5a
011000             6a 7a 8a 9a
011100             1b 2b 3b 4b
100000             b2 b3 b4 b5
100100             c1 c2 c3 c4
101000             d1 d2 d3 d4
101100             e1 e2 e3 e4
110000             f1 f2 f3 f4
110100             a1 a2 a3 a4
111000             2c 3c 4c 5c
111100             2d 3d 4d 5d
19. Suppose a memory address is 32 bits, a memory block is 8 words and the
cache size is 16K blocks. What is the total size of the cache in bytes?
32-bit memory address
Cache: 16K blocks
Block format: | V | Tag | Data (8-word block) |
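This last exercise works out as follows (a worked check of the arithmetic, matching the answer in the notes):

```python
index_bits = 14                              # 16K = 2^14 blocks
tag_bits = 32 - index_bits - 3 - 2           # 3 word-offset + 2 byte-offset bits
bits_per_block = 1 + tag_bits + 8 * 32       # valid + tag + 8-word data = 270

total_bits = bits_per_block * (1 << index_bits)
total_bytes = total_bits // 8

print(tag_bits, total_bytes, total_bytes // 1024)  # 13 552960 540 -> 540 KB
```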
Editor's notes
In personal computers, the front-side bus (FSB) is the bus that carries data between the CPU and the northbridge.
Multi-level caches: another issue is the fundamental tradeoff between cache latency and hit rate. Larger caches have better hit rates but longer latency. To address this tradeoff, many computers use multiple levels of cache, with small fast caches backed up by larger slower caches. Multi-level caches generally operate by checking the smallest level 1 (L1) cache first; if it hits, the processor proceeds at high speed. If the smaller cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked. As the latency difference between main memory and the fastest cache has become larger, some processors have begun to utilize as many as three levels of on-chip cache. For example, in 2003, Itanium 2 began shipping with a 6 MiB unified level 3 (L3) cache on-chip. The IBM Power 4 series has a 256 MiB L3 cache off chip, shared among several processors. The AMD Phenom series of chips carries a 2MB on-die L3 cache.
Exclusive versus inclusive: multi-level caches introduce new design decisions. For instance, in some processors, all data in the L1 cache must also be somewhere in the L2 cache; these caches are called strictly inclusive. Other processors (like the AMD Athlon) have exclusive caches: data is guaranteed to be in at most one of the L1 and L2 caches, never in both. Still other processors (like the Intel Pentium II, III, and 4) do not require that data in the L1 cache also reside in the L2 cache, although it may often do so. There is no universally accepted name for this intermediate policy, although the term "mainly inclusive" has been used.
One word per cache block → 4 bytes/block. 1024K bytes / 4 bytes/block = 256K blocks = 2^8 × 2^10 blocks = 2^18 blocks. 18 bits are needed to index 2^18 blocks, plus 2 bits to address a byte within a one-word block. Cache index: 18 bits; tag size: 32 - 18 - 2 = 12 bits. 0x0040006c = 0000 0000 0100 (tag) | 00 0000 0000 0001 1011 (block address) | 00 (byte offset).
4 words per cache block → 4 × 4 = 16 bytes/block. 1024K bytes / 16 bytes/block = 64K blocks = 2^6 × 2^10 blocks = 2^16 blocks. 16 bits are needed to index 2^16 blocks. Need 2 bits to select a word in the block plus 2 bits for the byte within the word → 4 offset bits. Cache index: 16 bits; tag size: 32 - 16 - 4 = 12 bits. 0x0040006c = 0000 0000 0100 (tag) | 0000 0000 0000 0110 (block address) | 11 (word) | 00 (byte).
What is the total size of the cache in bits? Memory address is m bits long: 256 bytes = 2^m bytes → m = 8. You need n bits to address 8 blocks in cache: n = 3. You need b bits to address 4 bytes in a word: b = 2. Size of tag = t = m - n - b = 8 - 3 - 2 = 3. Size of cache = (1 + t + 32) × 8 = (1 + 3 + 32) × 8 = 288 bits.
What is the total size of the cache in bits? Memory address is m bits long: 256 bytes = 2^m bytes → m = 8. You need n bits to address 8 blocks in cache: n = 3. You need b bits to address the bytes in 2 words: b = 3. Size of tag = t = m - n - b = 8 - 3 - 3 = 2. Size of cache = (1 + t + 64) × 8 = (1 + 2 + 64) × 8 = 536 bits.
16K blocks = 2^4 × 2^10 blocks = 2^14 blocks → need 14 bits for the cache index. Word-offset bits = 3; byte-offset-in-word bits = 2. Size of tag = 32 - 14 - 3 - 2 = 13 bits. Size of cache memory = (valid + tag + memoryBlock) × cacheSize = (1 + 13 + 8 × 32) × 2^14 = 4320 Kbits = 540 KB.