Tomasulo Algorithm

Tomasulo Algorithm

Way to implement Out-Of-Order Execution . But not speculative execution.

  • Proposed by R. Tomasulo for IBM 360 (influential) in 1967

  • Each functional unit has a private buffer called reservation station.

  • Reservation stations implement the issue queue

    • Instructions and operands
  • As instructions are comming they are forwareded in the allocation stage to a free reservation station

    • Independent if the operands are available yet or not
  • reservation station takes over control to start the execution once all operands are available

Architecture #

  1. Start with instruction queue
    • comes from decode (rename?)
    • Instructions are In-Order in the queue
  2. Send instructions from the queue (in order) to a fitting reservation station (as soon as one is available)
    • Here 3 reservation stations in front of the floating point adder
    • 2 reservation stations in front of the FB multiplier
  3. Instructions in the reservation station await their operands
    • either from the FB registers
    • or info about which reservation station is computing it (forwarding)
  4. Once a computation was done, the result is put on the “Common data bus”
    • From here go to the FP registers or to an reservation station waiting for the operand (forwarding)

Instruction Cycle #

  • Allocation Stage:

    • Select free reservation station
    • Copy over available operands form the registers into the reservation sation or store the reservation station that is responsible for computing the operand (resolving RAW and WAR Data Hazards )
      • RAW is prevented because reservation station will wait until opoerand is available
      • WAR is prevented because reservation station will hold a snapshot of the original value, even if register file will be changed again by other instruction
    • Registers store the info which reservation station will generate the next value for them (resolving WAW Data Hazards )
      • It is known which instruction is the last one writing to a register.
      • Other instructions writing to that register will actually not be propagated to the register (still land on the bus for forwarding if somebody is waiting)
  • Issue Stage:

    • If operands are missing: wait
    • Insert new operands in the reservation station
    • If all operands are available and functional unit is available start operation
  • Write result:

    • Results are broadcasted on the result bus (common data bus)

Example #

  • Reservation Table that stores 3 reservation stations

  • For each reservation station store:

    • Empty

    • InFU (being executed right now)

    • Which operation the FU should execute

    • “Tag” of the destination

    • For each operand

      • operand value (if available)
      • if operand value is valid
      • tag of the reservation station we wait for the operand
  • Register file:

    • For each register store:
      • value
      • if value is valid
      • the reservation station producing the next value

Cycle 0 #

  • We have fetched and decoded instructions (instruction queue)
  • We have some register file

Cycle 1 #

  • The first instruction from the instruction queue is inserted into a reservation station
    • All operands are available so insert them also in the reservation station
    • Mark RS 3 in the register file for register 1, since RS 3 will produce the next value for register 1 and make register 1 invalid.

Cycle 2 #

  • The next instruction (sub) from the instruction queue is inserted into a reservation station
    • All operands are available so they are also inserted
    • Register 2 is marked as invalid and next value will come from tag 1
  • Execution is started in RS 3 (inFU -> true)

Cycle 3 #

  • The next instruction (div) from the instruction queue is inserted into a reservation station
    • Not all operands are ready (R1) so we put the tag for R1 (RS 3) in the reservation station to wait for it
    • Register 6 is marked as invalid and next value will come from tag 4
  • Execution is started in RS 1 (inFU -> true)

Cycle 4 #

  • RS 1 finishes execution, boradcast own token and result on bus
  • The next instruction (add) from the instruction queue is inserted into a reservation station
    • R5 is ready and the operand can be fetched directly, the other operand R2 was finished in RS 1 this cycle so result can be fetched from the bus directly
Calendar October 22, 2023