Data hazards and stalls

I am studying for my exam tomorrow and I am having difficulty with the code below:

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

Because of the ALU-ALU dependency on register $2, the sub instruction does not write its result until the fifth stage, meaning that we would have to waste three clock cycles in the pipeline. My question is: why three clock cycles? The dependency can be resolved by inserting two nops, so aren't we wasting only two clock cycles? Please clarify, as I am trying to relate the number of nops to the number of wasted cycles, and I suspect I have a fundamental misunderstanding here.
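To make the counting concrete, here is a minimal sketch of how both numbers could arise. It assumes a classic five-stage pipeline (IF, ID, EX, MEM, WB) with no forwarding. The function name `stalls_needed` and the `split_cycle_regfile` flag are hypothetical, introduced only for illustration. The flag models whether a register written in WB can be read by ID in the same clock cycle, which may or may not match the pipeline the exam material has in mind.

```python
def stalls_needed(distance, split_cycle_regfile):
    """Bubbles (nops) needed between a producer and a dependent instruction.

    distance: how many instructions after the producer the consumer sits
              (1 = the very next instruction).
    split_cycle_regfile: assumption flag. True means the register file is
              written in the first half of a cycle and read in the second
              half, so ID can share a cycle with the producer's WB.
    """
    # Producer enters IF in cycle 1: IF=1, ID=2, EX=3, MEM=4, WB=5.
    wb_cycle = 5
    # Without stalls, the consumer enters IF in cycle 1 + distance,
    # so it reads its registers (ID) in cycle 2 + distance.
    id_cycle = 2 + distance
    # Earliest cycle in which ID can see the new value of the register.
    earliest_id = wb_cycle if split_cycle_regfile else wb_cycle + 1
    return max(0, earliest_id - id_cycle)


# "and" is 1 instruction after "sub", "add" is 3 after, "sw" is 4 after.
for name, distance in [("and", 1), ("or", 2), ("add", 3), ("sw", 4)]:
    print(name,
          stalls_needed(distance, split_cycle_regfile=True),
          stalls_needed(distance, split_cycle_regfile=False))
```

Under these assumptions the instruction right after sub needs 2 bubbles if the register file can be written and read in the same cycle, and 3 if it cannot. Once those bubbles are inserted after sub, the later instructions are delayed by the same amount and need no further nops of their own.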