Compiler support for preventing stall cycles
See also: Register Allocation (in compilers)
Assumption:
| instructions | stall cycles |
|---|---|
| FP-ALU -> Store | 2 cycles |
| FP-ALU -> FP-ALU | 3 cycles |
| Load -> FP-ALU | 1 cycle |
| Load -> Store | 0 cycles |
| Jumps | 1 cycle |
Rescheduling instructions #
for (int i = 1000; i > 0; --i) {
x[i] = x[i]+s;
}
loop:
load f0, 0(r1) ;; f0 = x[i]
add f4, f0, f2 ;; x[i] + s
store f4, 0(r1) ;; x[I] = _
addi r1, r1, -8 ;;
bne r1, r2, loop ;; branch r1!=r2
loop: ;; cycle
load f0, 0(r1) ;; 1
stall ;; 2
add f4, f0, f2 ;; 3
stall ;; 4
stall ;; 5
store f4, 0(r1) ;; 6
addi r1, r1, -8 ;; 7
stall ;; 8
stall ;; 8
bne r1, r2, loop ;; 9
stall ;; 10
A better compilation using better instruction scheduling #
loop:
load f0, 0(r1) ;; 1
addi r1, r1, -8 ;; 2
add f4, f0, f2 ;; 3
stall ;; 4
bne r1, r2, loop ;; 5
store f4, 8(r1) ;; 6
Loop unrolling #
for (int i = 1000; i > 0; --i) {
x[i] = x[i]+s;
}
for (int i = 1000; i > 0; i-=4) {
x[i-0] = x[i-0]+s;
x[i-1] = x[i-1]+s;
x[i-2] = x[i-2]+s;
x[i-3] = x[i-3]+s;
}
Colors are given by Register Allocation :

Figure 1: before reordering

Figure 2: after reordering: no stall after loads: the first add can start after the load block
Software pipelining #

Figure 3: (a) loop unrolling (b) software pipelining
Loops are restuctured that in each loop iteration the different stages of different iterations are handled (less dependencies between instructions) -> only works if no loop dependencies

Figure 4: Horizontal yellow line: an interation of the software pipelined loop

Figure 5: Software pipelined code transformation, startup and winddown clode have been omitted
Software pipelining vs Loop unrolling #
Advantage of Loop unrolling:
- Reduce loop overhead
Advantage of Software pipelining:
- shorter code
- Reduces area of low overlap
Advantage of both:
- Use independent operations from different loop iterations
Loop fusion #
Combine loops to only have to iterate once. Not always possible. Only possible if data dependencies are the same before and after the program transformation.