CPU Caches

See Memory Hierarchy.

A CPU cache holds copies of data from main memory so that repeated accesses are served faster. There are also dedicated instruction caches for memory that contains instructions. See also: Cache Optimizations

Reasons for caching #

Temporal locality: Addresses that were accessed recently will probably be accessed again in the near future
Spatial locality: Addresses near recently accessed addresses will probably also be accessed in the near future
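
To make spatial locality concrete, the following minimal sketch counts how often consecutive accesses move to a different cache line when a row-major 2D array is traversed row by row versus column by column. The line size, element size, and array shape are assumptions picked for illustration, and the helper names are hypothetical.

```python
LINE_SIZE = 32   # bytes per cache line (assumed)
ELEM_SIZE = 4    # bytes per array element (assumed)
ROWS, COLS = 8, 8

def line_changes(addresses):
    """Count accesses that touch a different cache line than the previous access."""
    changes, previous = 0, None
    for addr in addresses:
        line = addr // LINE_SIZE
        if line != previous:
            changes += 1
        previous = line
    return changes

def address(row, col):
    """Byte address of element (row, col) in a row-major array starting at 0."""
    return (row * COLS + col) * ELEM_SIZE

row_wise = [address(r, c) for r in range(ROWS) for c in range(COLS)]
col_wise = [address(r, c) for c in range(COLS) for r in range(ROWS)]

print(line_changes(row_wise))  # 8: one line per row, good spatial locality
print(line_changes(col_wise))  # 64: every access lands in another line
```

With these assumed sizes one row fits exactly into one cache line, so the row-wise traversal reuses each fetched line eight times.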

Implementations of caches #

Fully associative cache #

The address is split into a tag plus word- and byte-select bits
Any cache line can hold any memory line (subject to line alignment)
The tag of every cache line has to be compared in parallel -> expensive in hardware and hard to make fast
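
A rough sketch of the lookup described above, assuming 16-byte lines, 4-byte words, and four cache lines. The class and function names are hypothetical, and the round-robin victim choice is only a placeholder, not something the notes prescribe.

```python
LINE_SIZE = 16   # bytes per cache line (assumed)
WORD_SIZE = 4    # bytes per word (assumed)
NUM_LINES = 4    # number of cache lines (assumed)

def split_address(addr):
    """Return (tag, word_select, byte_select) for a fully associative cache."""
    tag = addr // LINE_SIZE
    offset = addr % LINE_SIZE
    return tag, offset // WORD_SIZE, offset % WORD_SIZE

class FullyAssociativeCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.next_victim = 0  # placeholder round-robin replacement

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        tag, _, _ = split_address(addr)
        # Hardware compares all stored tags in parallel; here it is a loop.
        if any(stored == tag for stored in self.tags):
            return True
        self.tags[self.next_victim] = tag
        self.next_victim = (self.next_victim + 1) % NUM_LINES
        return False

cache = FullyAssociativeCache()
print(split_address(0x1236))                                # (291, 1, 2)
print([cache.access(a) for a in (0x00, 0x04, 0x40, 0x00)])  # [False, True, False, True]
```

Because there is no index field, the whole line number acts as the tag, which is why every stored tag has to be checked.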

Direct mapped cache #

The address is split into tag, index, and word- and byte-select bits
A decoder selects the single cache line that belongs to the index -> very fast access

-> If unlucky, frequently accessed data keeps mapping to the same cache line, so the blocks evict each other and cannot both stay in the cache
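
The collision problem can be sketched with a tiny model, assuming four lines of 16 bytes each (names are hypothetical). Addresses 0x00 and 0x40 share index 0 but differ in their tag, so alternating between them never hits.

```python
LINE_SIZE = 16   # bytes per cache line (assumed)
NUM_LINES = 4    # number of cache lines (assumed)

def split_address(addr):
    """Return (tag, index, offset) for a direct-mapped cache."""
    line_number = addr // LINE_SIZE
    return line_number // NUM_LINES, line_number % NUM_LINES, addr % LINE_SIZE

class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then replaced)."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:   # only one line to check
            return True
        self.tags[index] = tag
        return False

thrash = DirectMappedCache()
print([thrash.access(a) for a in (0x00, 0x40, 0x00, 0x40)])  # all False

friendly = DirectMappedCache()
print([friendly.access(a) for a in (0x00, 0x10, 0x00, 0x10)])  # [False, False, True, True]
```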

Set-associative cache #

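Several direct-mapped caches (ways) are combined: the index selects a set that contains one line per way, and a memory line may be placed in any line of its set. Only this limited number of cache lines has to be checked on a lookup.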
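
As an illustration, here is a minimal 2-way set-associative model with two sets of 16-byte lines (all sizes and names are assumptions, and LRU within a set is chosen only for the example). Two addresses that map to the same set can now stay cached together, while a third one still forces an eviction.

```python
LINE_SIZE = 16   # bytes per cache line (assumed)
NUM_SETS = 2     # number of sets (assumed)
WAYS = 2         # lines per set (assumed)

class SetAssociativeCache:
    def __init__(self):
        # One tag list per set, ordered from least to most recently used.
        self.sets = [[] for _ in range(NUM_SETS)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line_number = addr // LINE_SIZE
        index, tag = line_number % NUM_SETS, line_number // NUM_SETS
        ways = self.sets[index]
        if tag in ways:               # only WAYS tags are compared
            ways.remove(tag)
            ways.append(tag)          # mark as most recently used
            return True
        if len(ways) == WAYS:
            ways.pop(0)               # evict the least recently used way
        ways.append(tag)
        return False

two = SetAssociativeCache()
print([two.access(a) for a in (0x00, 0x40, 0x00, 0x40)])  # [False, False, True, True]

three = SetAssociativeCache()
print([three.access(a) for a in (0x00, 0x40, 0x80, 0x00)])  # all False
```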

Cache update strategies #

Write through cache #

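Every write to the cache is immediately propagated to the next level of the memory hierarchy
Requires higher bus bandwidth
On a write miss:
- Write allocation: fetch the line into the cache first, then write through
- No write allocation: no action in the cache; the write only goes to the next level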
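
The effect of the two write-miss policies can be sketched as follows. This is a simplified model under stated assumptions: 16-byte lines, unlimited cache capacity, and a hypothetical `simulate` helper. It counts read hits and the writes that reach the next level.

```python
LINE_SIZE = 16   # bytes per cache line (assumed)

def simulate(ops, write_allocate):
    """ops: list of ("r" | "w", address). Returns (read_hits, next_level_writes)."""
    cached = set()            # line numbers present; capacity is ignored for brevity
    read_hits = 0
    next_level_writes = 0
    for op, addr in ops:
        line = addr // LINE_SIZE
        if op == "w":
            if line not in cached and write_allocate:
                cached.add(line)      # write allocation: fetch the line first
            next_level_writes += 1    # write through in every case
        elif line in cached:
            read_hits += 1
        else:
            cached.add(line)          # a read miss fills the line

    return read_hits, next_level_writes

ops = [("w", 0x00), ("r", 0x04), ("r", 0x08)]
print(simulate(ops, write_allocate=True))    # (2, 1)
print(simulate(ops, write_allocate=False))   # (1, 1)
```

With write allocation, the reads that follow the write already find the line in the cache; without it, the first read still misses.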

Write back cache #

Modifications are stored in the cache and only propagated to memory when the cache line is evicted
Uses a dirty bit per cache line
Reduces bus traffic
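
To illustrate the difference in bus traffic, this small sketch counts the writes that reach memory under both policies. It assumes a direct-mapped cache with four 16-byte lines and write allocation; the function name is hypothetical.

```python
LINE_SIZE = 16   # bytes per cache line (assumed)
NUM_LINES = 4    # number of cache lines (assumed)

def count_memory_writes(write_addresses, write_back):
    """Count writes that reach memory for a sequence of store addresses."""
    tags = [None] * NUM_LINES
    dirty = [False] * NUM_LINES
    memory_writes = 0
    for addr in write_addresses:
        line_number = addr // LINE_SIZE
        index, tag = line_number % NUM_LINES, line_number // NUM_LINES
        if tags[index] != tag:                 # write miss: replace the line
            if write_back and dirty[index]:
                memory_writes += 1             # evicted dirty line is written back
            tags[index] = tag
            dirty[index] = False
        if write_back:
            dirty[index] = True                # remember the modification
        else:
            memory_writes += 1                 # write through immediately
    return memory_writes

stores = [0x00] * 10 + [0x40]   # ten stores to one line, then a conflicting store
print(count_memory_writes(stores, write_back=False))  # 11
print(count_memory_writes(stores, write_back=True))   # 1
```

In the write-back case the last line is still dirty at the end of the run; it would cause one more memory write once it is evicted.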

Cache replacement strategies #

Which cache line to evict?
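- Random: evict a randomly chosen line; trivial to implement
- LRU (least recently used): theoretically good, but hard to implement in hardware
- FIFO: evict the line that has been in the cache the longest; only loosely based on temporal locality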
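
The three strategies can be compared with a small simulation of a fully associative cache. This is a sketch under assumptions: the trace, the capacity of two lines, and the `count_misses` helper are made up for illustration.

```python
import random
from collections import OrderedDict

def count_misses(trace, capacity, policy, seed=0):
    """Count misses for policy in {"lru", "fifo", "random"}."""
    rng = random.Random(seed)
    lines = OrderedDict()   # oldest entry first; for LRU the order tracks recency
    misses = 0
    for block in trace:
        if block in lines:
            if policy == "lru":
                lines.move_to_end(block)   # a hit refreshes recency only for LRU
            continue
        misses += 1
        if len(lines) >= capacity:
            if policy == "random":
                del lines[rng.choice(list(lines))]
            else:
                lines.popitem(last=False)  # evict the oldest entry
        lines[block] = True
    return misses

trace = ["A", "B", "A", "C", "A", "D"]
print(count_misses(trace, 2, "lru"))    # 4
print(count_misses(trace, 2, "fifo"))   # 5
print(count_misses(trace, 2, "random"))
```

FIFO evicts A although it was just used, which is where it loses against LRU on this trace. The result for random replacement depends on the seed.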

Reasons for cache misses #

Cold miss: Access to data that was never in the cache before (cannot be avoided by a larger or more associative cache)
Capacity miss: Memory block was in the cache before but had to be evicted due to limited cache size (can be improved with bigger caches)
Conflict miss: Memory block was in the cache before but had to be evicted because another block mapped to the same cache line (as in a direct-mapped cache)
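
One common way to tell the three kinds apart is to run a fully associative LRU cache of the same size next to the real cache: a repeated miss that the fully associative cache would have avoided is counted as a conflict miss. The sketch below applies this to a direct-mapped cache. It works on block numbers rather than byte addresses, and the function name and trace are assumptions for illustration.

```python
from collections import OrderedDict

def classify_misses(blocks, num_lines):
    """Classify accesses of a direct-mapped cache as hit/cold/capacity/conflict."""
    direct = [None] * num_lines   # the cache under test
    full = OrderedDict()          # reference: fully associative LRU, same size
    seen = set()
    counts = {"hit": 0, "cold": 0, "capacity": 0, "conflict": 0}
    for block in blocks:
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) >= num_lines:
                full.popitem(last=False)
            full[block] = True
        index = block % num_lines
        if direct[index] == block:
            counts["hit"] += 1
        else:
            if block not in seen:
                counts["cold"] += 1
            elif full_hit:
                counts["conflict"] += 1   # only the mapping caused the eviction
            else:
                counts["capacity"] += 1   # even full associativity would miss
            direct[index] = block
        seen.add(block)
    return counts

print(classify_misses([0, 4, 0, 1, 2, 3, 4, 0, 0], num_lines=4))
# {'hit': 1, 'cold': 5, 'capacity': 2, 'conflict': 1}
```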

Blocking vs Non-Blocking Cache #

A blocking cache stalls on a miss until the data arrives from the next cache level.

A non-blocking cache can still serve other memory accesses while an earlier miss is waiting for its data.

For non-blocking caches, additional components are necessary:

Miss Status/Information Holding Registers (MSHRs)
- Store information about pending misses.
Fill buffer
- Holds fetched data before it is written to the data array

Categories of cache misses (for non-blocking caches) #

Primary miss: the first miss to a memory block
Secondary miss: a subsequent miss to the same memory block while its data is still being fetched
Structural-stall miss: a secondary miss that the available hardware resources cannot handle
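
The categories above can be sketched with a simplified MSHR model. It assumes that every access in the list misses while all fetches are still outstanding, that there are enough MSHRs for all blocks, and that each MSHR can record a fixed number of waiting accesses. The names `classify` and `targets_per_mshr` are hypothetical.

```python
def classify(missing_blocks, targets_per_mshr):
    """Label each miss as primary, secondary, or structural-stall."""
    pending = {}   # block number -> number of accesses recorded in its MSHR
    labels = []
    for block in missing_blocks:
        if block not in pending:
            pending[block] = 1                    # allocate a new MSHR
            labels.append("primary")
        elif pending[block] < targets_per_mshr:
            pending[block] += 1                   # merge into the existing MSHR
            labels.append("secondary")
        else:
            labels.append("structural-stall")     # no room left in the MSHR
    return labels

print(classify([7, 7, 7, 9], targets_per_mshr=2))
# ['primary', 'secondary', 'structural-stall', 'primary']
```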

MSHR: implicit #

Each MSHR has one field per word of the cache line; the position of the field implies the offset
Multiple outstanding misses to the same word cannot be stored -> only one entry per word address is possible

MSHR: explicit #

Each MSHR field explicitly stores the offset within the cache line
Multiple outstanding misses to the same word can be stored
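
A minimal sketch of the difference between the two MSHR organizations, assuming that an implicitly addressed MSHR has one field per word and that an explicitly addressed MSHR has a fixed number of generic fields. Class names and the destination register names are hypothetical.

```python
class ImplicitMSHR:
    """One field per word; the field position is the word offset."""
    def __init__(self, words_per_line):
        self.fields = [None] * words_per_line

    def add(self, word, dest):
        """Record a pending miss; return False if the access has to stall."""
        if self.fields[word] is not None:
            return False              # a miss to this word is already pending
        self.fields[word] = dest
        return True

class ExplicitMSHR:
    """Generic fields that each store (word offset, destination)."""
    def __init__(self, num_fields):
        self.num_fields = num_fields
        self.fields = []

    def add(self, word, dest):
        """Record a pending miss; return False if the access has to stall."""
        if len(self.fields) >= self.num_fields:
            return False              # all fields are in use
        self.fields.append((word, dest))
        return True

implicit = ImplicitMSHR(words_per_line=4)
print(implicit.add(1, "r1"), implicit.add(1, "r2"))   # True False

explicit = ExplicitMSHR(num_fields=4)
print(explicit.add(1, "r1"), explicit.add(1, "r2"))   # True True
```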

MSHR: in-cache #

Reuses the cache line itself to store information about pending misses
Extra bit per line: a “transient bit” distinguishes this miss information from regular cached data
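
A loose sketch of the idea, assuming a cache line whose storage holds either data words or a list of pending accesses, depending on the transient bit. The field and function names are hypothetical, and writing back the previous contents of the line is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CacheLine:
    tag: Optional[int] = None
    transient: bool = False                       # True: payload is miss info
    payload: list = field(default_factory=list)   # data words or pending accesses

def start_miss(line, tag, word, dest):
    """Record a miss in the line itself instead of in a separate MSHR."""
    if not (line.transient and line.tag == tag):
        line.tag, line.transient, line.payload = tag, True, []
    line.payload.append((word, dest))

def fill(line, words):
    """Fetched data arrives: serve the waiting accesses, then store the data."""
    served = [(dest, words[word]) for word, dest in line.payload]
    line.transient, line.payload = False, list(words)
    return served

line = CacheLine()
start_miss(line, tag=5, word=1, dest="r1")
start_miss(line, tag=5, word=3, dest="r2")
print(line.transient)                   # True
print(fill(line, [10, 11, 12, 13]))     # [('r1', 11), ('r2', 13)]
print(line.transient, line.payload)     # False [10, 11, 12, 13]
```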