Tomasulo Algorithm
Way to implement Out-Of-Order Execution . But not speculative execution.
-
Proposed by R. Tomasulo for IBM 360 (influential) in 1967
-
Each functional unit has a private buffer called reservation station.
-
Reservation stations implement the issue queue
- Instructions and operands
-
As instructions are comming they are forwareded in the allocation stage to a free reservation station
- Independent if the operands are available yet or not
-
reservation station takes over control to start the execution once all operands are available
Architecture #
- Start with instruction queue
- comes from decode (rename?)
- Instructions are In-Order in the queue
- Send instructions from the queue (in order) to a fitting reservation
station (as soon as one is available)
- Here 3 reservation stations in front of the floating point adder
- 2 reservation stations in front of the FB multiplier
- Instructions in the reservation station await their operands
- either from the FB registers
- or info about which reservation station is computing it (forwarding)
- Once a computation was done, the result is put on the “Common data bus”
- From here go to the FP registers or to an reservation station waiting for the operand (forwarding)
Instruction Cycle #
-
Allocation Stage:
- Select free reservation station
- Copy over available operands form the registers into the reservation
sation or store the reservation station that is responsible for
computing the operand (resolving RAW and WAR Data Hazards
)
- RAW is prevented because reservation station will wait until opoerand is available
- WAR is prevented because reservation station will hold a snapshot of the original value, even if register file will be changed again by other instruction
- Registers store the info which reservation station will generate the
next value for them (resolving WAW Data Hazards
)
- It is known which instruction is the last one writing to a register.
- Other instructions writing to that register will actually not be propagated to the register (still land on the bus for forwarding if somebody is waiting)
-
Issue Stage:
- If operands are missing: wait
- Insert new operands in the reservation station
- If all operands are available and functional unit is available start operation
-
Write result:
- Results are broadcasted on the result bus (common data bus)
Example #
-
Reservation Table that stores 3 reservation stations
-
For each reservation station store:
-
Empty
-
InFU (being executed right now)
-
Which operation the FU should execute
-
“Tag” of the destination
-
For each operand
- operand value (if available)
- if operand value is valid
- tag of the reservation station we wait for the operand
-
-
Register file:
- For each register store:
- value
- if value is valid
- the reservation station producing the next value
- For each register store:
Cycle 0 #
- We have fetched and decoded instructions (instruction queue)
- We have some register file
Cycle 1 #
- The first instruction from the instruction queue is inserted into a
reservation station
- All operands are available so insert them also in the reservation station
- Mark
RS 3in the register file for register 1, since RS 3 will produce the next value for register 1 and make register 1 invalid.
Cycle 2 #
- The next instruction (sub) from the instruction queue is inserted into a
reservation station
- All operands are available so they are also inserted
- Register 2 is marked as invalid and next value will come from tag 1
- Execution is started in RS 3 (inFU -> true)
Cycle 3 #
- The next instruction (div) from the instruction queue is inserted into a
reservation station
- Not all operands are ready (R1) so we put the tag for R1 (
RS 3) in the reservation station to wait for it - Register 6 is marked as invalid and next value will come from tag 4
- Not all operands are ready (R1) so we put the tag for R1 (
- Execution is started in RS 1 (inFU -> true)
Cycle 4 #
- RS 1 finishes execution, boradcast own token and result on bus
- The next instruction (add) from the instruction queue is inserted into a
reservation station
- R5 is ready and the operand can be fetched directly, the other operand
R2was finished in RS 1 this cycle so result can be fetched from the bus directly
- R5 is ready and the operand can be fetched directly, the other operand