Consider the following code sequence, executed on a processor that does not support stalls and supports only ALU-ALU forwarding:
    I1: lw  $1, 40($6)
    I2: add $6, $2, $2
    I3: sw  $6, 50($1)
    I4: lw  $5, -16($5)
    I5: sw  $5, -16($5)
    I6: add $5, $5, $5
The only way to run this code correctly on this processor is to insert nops. The solution is:
    I1:  lw  $1, 40($6)
    I2:  add $6, $2, $2
    I22: nop
    I3:  sw  $6, 50($1)
    I4:  lw  $5, -16($5)
    I44: nop
    I45: nop
    I5:  sw  $5, -16($5)
    I6:  add $5, $5, $5
My question: why is only one nop inserted between I2 and I3 (an ALU-to-store hazard)? Why is one nop enough here? As I understand it, ALU-ALU forwarding cannot resolve this hazard, so the add must first write its result to the register file in its WB stage, and only then can the sw read it. That would require two nops: with two nops, the sw is in its ID stage in the same cycle that the add is in its WB stage, so the sw can read the register it needs through register-file forwarding.
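To make the timing in my reasoning concrete, here is a minimal sketch. It assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB), one instruction issued per cycle, and no stalls. The function names are made up for illustration and are not part of the original exercise. It only reports which stage the consumer (the sw) occupies in the cycle where the producer (the add) is in WB; it does not model any forwarding paths.

```python
# Assumed classic five-stage pipeline; one instruction enters IF per cycle.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(issue_cycle, cycle):
    """Return the stage an instruction occupies in `cycle`, or None if it is
    not in the pipeline during that cycle."""
    idx = cycle - issue_cycle
    return STAGES[idx] if 0 <= idx < len(STAGES) else None


def consumer_stage_during_producer_wb(nops_between):
    """Stage of the consumer in the cycle where the producer is in WB,
    given `nops_between` nops separating the two instructions."""
    producer_issue = 1
    consumer_issue = producer_issue + 1 + nops_between
    wb_cycle = producer_issue + STAGES.index("WB")
    return stage_at(consumer_issue, wb_cycle)


if __name__ == "__main__":
    for nops in range(3):
        print(nops, "nop(s):", consumer_stage_during_producer_wb(nops))
```

Under these assumptions, with one nop the sw is already in EX when the add reaches WB, and only with two nops is the sw still in ID at that point. That is exactly why I expected two nops to be needed.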