CPU Caches

CPU Caches

See Memory Hierarchy .

Caches data from main memory. There are also special Instruction Caches to cache memory containing instructions. See also: Cache Optimizations

Reasons to caching #

Temporal locality
(The same) addresses recenctly accessed will probably be accessed again in the future
Spatial locality
Addresses near recently accessed addresses will probably also be accessed in the near future

Implementations of caches #

Fully assiciative cache #

  • The address is split up in a “Tag” and word- and byte-select
  • Each cacheline is able to store all memory lines (alignment applies of course)
  • Each cacheline has to be checked for the tag in parallel -> expensive / hard / slow

Direct mapped cache #

  • The address is split into Tag, index, word- and byte-select

  • A decoder determines the unique cacheline for that index -> Super fast access

    -> if unlucky, accesses to frequent data keep colliding in the same cache line and cannot be held both in the cache

Set-associative cache #

  • Have multiple direct mapped chaches that are associative to another, only limited number of cache lines have to be checked.

Cache update stategies #

Write through cache #

  • Every write to the cache is directly propagated to higher levels

  • higher bus bandwidth

  • On write miss:

    • Write Allocation: Fetch first then write trough
    • No Write allocation: no action in the cache

Write back cache #

  • Modifications are stroed in the cache and only propagated to memory if cache line is evicted
  • Uses dirty bit per cache line
  • Reduce bus traffic

Cache Replacement stategies #

  • Which cache line to evict?
    • Random: .. yeah
    • LRU: theoretically goof, but hard to implement in hardware
    • FIFO: somehow based on temporal locality

Reasons of Cache Misses #

Cold miss
Access of data that was never in cache before (nothing can be done to prevent this)
Capacity miss
Memory block was in cache before but had to be evicted due to limited cache size (can be improved with bigger caches)
Conflict miss
Memory block was in cache before but had to be evicted due to conflict (like in a direct mapped cache)

Blocking vs Non-Blocking Cache #

A blocking cache stalls on a miss until the data arrives from higher level cache.

Non blocking caches can still serve memory accesses while another load is waiting for the data.

For non-blocking caches, additional components are necessary:

  1. Miss Status/information Holding Registers (MSHRs)
    • Store information about pending misses.
  2. Fill Buffer
    • Holds fetched data before they are written to the data array

Categories of Cache misses (for non blocking caches) #

Primary miss
the first time a miss occurrs to a memory block
Secondary miss
subsequent misses to the same memory block while the data is still being fetched
Structural-stall miss
a secondary miss that the availiable hardware resources cannot handle

MSHR: implicit #

  • Each MSHR field stores the offset in the cache line
  • multiple outstanding misses for the same word cannot be stored -> only one entry per address possible

MSHR: explicit #

  • Each MSHR field stores the offset in the cache line
  • multiple outstanding misses for the same word can be stored

MSHR: in-cache #

  • Reuse the cache line to store info about pending misses
  • Extra bit: “Transient bit” to distinguish from data in cache
Calendar October 22, 2023