Data Hazards

A data hazard occurs when the pipeline must be stalled because one step has to wait for another one to complete. This happens when a planned instruction cannot execute in its planned clock cycle because the data it needs is not yet available.

Returning to the laundry example, a data hazard could look like this: while folding the dry laundry, you find a sock without a match. One strategy is to run to your room and search for the matching sock. While you are doing that, the load in the dryer is ready to be folded, the washed load is ready to go into the dryer, and a new load is waiting to be put into the washer.

In a processor pipeline, data hazards arise when an instruction depends on data produced by an earlier instruction that is still in the pipeline.

As an example, suppose we execute an add instruction and then store its result in data memory:

add x3, x1, x2 // a + b
sw x3, 8(x31) // store (a + b)

We can observe that the sw instruction needs the value of x3 when it reads its operands from the register file in CC 3, but the add instruction does not write its result to x3 until CC 5.

[Pipeline diagram, CC 1 to CC 6: add x3, x1, x2 followed by sw x3, 8(x31), without stalls]
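The timing in this diagram can be reproduced with a small Python sketch. It assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB) with one instruction fetched per clock cycle and no stalls; `stage_cycle` and `pipeline_diagram` are hypothetical helpers written only for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycle(index: int, stage: str) -> int:
    """Clock cycle in which instruction `index` (0-based) is in `stage`, assuming no stalls."""
    return index + 1 + STAGES.index(stage)


def pipeline_diagram(program):
    """Render a plain-text pipeline diagram for a stall-free instruction sequence."""
    n_cycles = len(program) + len(STAGES) - 1
    width = max(len(text) for text in program) + 2
    header = " " * width + "".join(f"CC {c}".ljust(7) for c in range(1, n_cycles + 1))
    lines = [header]
    for i, text in enumerate(program):
        row = text.ljust(width)
        for cycle in range(1, n_cycles + 1):
            k = cycle - stage_cycle(i, "IF")   # stages instruction i has advanced so far
            label = STAGES[k] if 0 <= k < len(STAGES) else ""
            row += label.ljust(7)
        lines.append(row)
    return "\n".join(line.rstrip() for line in lines)


program = ["add x3, x1, x2", "sw x3, 8(x31)"]
print(pipeline_diagram(program))
# sw reads x3 in its ID stage, add writes x3 in its WB stage:
print(stage_cycle(1, "ID"), stage_cycle(0, "WB"))   # 3 5
```

The gap between these two cycles is exactly what the stall (or forwarding) has to bridge.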
We already introduced stalling as a workaround for structural hazards. Without another solution for data hazards, we also have to stall the pipeline here for 2 clock cycles, i.e., delay the execution of sw until the result of add has been written to the register. We can depict these delays as bubbles, which indicate idle slots in the pipeline.
[Pipeline diagram, CC 1 to CC 8: add followed by sw, with two bubbles delaying sw]
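The stall count can be checked with a minimal Python sketch of an in-order five-stage pipeline without forwarding. It assumes, as the diagram does, that the register file is written in the first half of a clock cycle and read in the second half, so an instruction may read a register in the same cycle in which it is written back; this is why 2 bubbles suffice rather than 3. The `Instr` type and `schedule_without_forwarding` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str                # assembly text, for display only
    dest: Optional[str]      # register written by the instruction (None for sw)
    srcs: Tuple[str, ...]    # registers read by the instruction


def schedule_without_forwarding(program: List[Instr]) -> Tuple[List[int], int]:
    """Return the ID cycle of every instruction and the number of bubbles."""
    written_in = {}          # register -> cycle in which its new value is written back
    id_cycles = []
    bubbles = 0
    prev_id = 0
    for i, ins in enumerate(program):
        # Without hazards, instruction i is fetched in cycle i + 1 and decoded in
        # cycle i + 2; it can never be decoded before its predecessor has left ID.
        earliest = max(i + 2, prev_id + 1)
        # Assumption: write in the first half, read in the second half of a cycle,
        # so an operand can be read in the very cycle of the producer's WB.
        ready = max((written_in.get(r, 0) for r in ins.srcs), default=0)
        id_cycle = max(earliest, ready)
        bubbles += id_cycle - earliest
        id_cycles.append(id_cycle)
        prev_id = id_cycle
        if ins.dest is not None:
            written_in[ins.dest] = id_cycle + 3   # ID -> EX -> MEM -> WB
    return id_cycles, bubbles


program = [
    Instr("add x3, x1, x2", "x3", ("x1", "x2")),
    Instr("sw x3, 8(x31)", None, ("x3", "x31")),
]
id_cycles, bubbles = schedule_without_forwarding(program)
print(id_cycles, bubbles, "total cycles:", id_cycles[-1] + 3)
# [2, 5] 2 total cycles: 8
```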

Exercise

To visualize stalls caused by data hazards, suppose you want to compute (a + b) - c on a RISC-V processor and store the result in data memory. Assuming the values of a, b, and c are available in registers x1, x2, and x3, this can be done with the following instruction sequence:

add x4, x1, x2 // a + b
sub x5, x4, x3 // (a + b) - c
sw x5, 8(x31) // store (a + b) - c

Insert stalls ("bubbles") where needed by dragging bubbles into the pipeline diagram.
[Exercise pipeline diagram, CC 1 to CC 7: add, sub, sw]



Another possible way of solving data hazards is called forwarding.
It is a method that retrieves the missing data from the pipeline registers rather than waiting for it to arrive from (programmer-visible) registers or memory.
We already introduced pipeline registers as buffers between stages. They are written at the end of every clock cycle and read during the following clock cycle, at the end of which they are overwritten with new data.
Pipeline registers can therefore also help resolve hazards: a result that an earlier instruction has just placed in a pipeline register can be sent directly to the ALU inputs of a later instruction through additional datapath connections. These connections act as shortcuts that bypass the register file.
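The selection logic behind these shortcuts can be sketched in Python. The sketch follows the forwarding unit as commonly described for the classic five-stage RISC-V pipeline: multiplexers in front of the ALU choose between the value read from the register file and a newer value held in the EX/MEM or MEM/WB pipeline register. The class, field, and function names are hypothetical. The newer EX/MEM value wins when both match, and register x0 is excluded because it always reads as zero. For sw, the value selected for rs2 is the data to be stored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PipelineState:
    # Hypothetical snapshot of the pipeline-register fields the logic inspects.
    ex_mem_reg_write: bool   # does the instruction now in MEM write a register?
    ex_mem_rd: int           # ... and which one
    mem_wb_reg_write: bool   # does the instruction now in WB write a register?
    mem_wb_rd: int
    id_ex_rs1: int           # source registers of the instruction now in EX
    id_ex_rs2: int


def select_source(state: PipelineState, rs: int) -> str:
    """Decide where the EX-stage value of source register rs comes from."""
    # Most recent result first: produced by the instruction one step ahead.
    if state.ex_mem_reg_write and state.ex_mem_rd != 0 and state.ex_mem_rd == rs:
        return "EX/MEM"
    # Otherwise an older result (ALU result or loaded value) two steps ahead.
    if state.mem_wb_reg_write and state.mem_wb_rd != 0 and state.mem_wb_rd == rs:
        return "MEM/WB"
    # No pending write: the value read from the register file is up to date.
    return "REGFILE"


def forwarding_unit(state: PipelineState) -> Tuple[str, str]:
    return select_source(state, state.id_ex_rs1), select_source(state, state.id_ex_rs2)


# CC 4 of the add/sw example: add x3 is in MEM, sw x3, 8(x31) is in EX.
# For sw, rs1 is the base register x31 and rs2 is the data register x3.
state = PipelineState(ex_mem_reg_write=True, ex_mem_rd=3,
                      mem_wb_reg_write=False, mem_wb_rd=0,
                      id_ex_rs1=31, id_ex_rs2=3)
print(forwarding_unit(state))   # ('REGFILE', 'EX/MEM')
```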

Reconsider our example from above: an add instruction followed by a sw instruction. As we have seen, stalling delays the sw instruction until the sum is written to its destination register.
With forwarding, the needed value is available right after CC 3, in which the ALU computes the sum and temporarily stores it in the pipeline register between the EX and MEM stages (EX/MEM). In CC 4, the sw instruction can take the value directly from this pipeline register as an input to its EX stage instead of waiting until add writes x3 in CC 5.
The line in the diagram illustrates the forwarding connection from pipeline register to ALU.
[Pipeline diagram, CC 1 to CC 6: add followed by sw, with a forwarding line from the pipeline register to the ALU]
As we can see, forwarding is more efficient than stalling because it eliminates both stall cycles. It requires additional control logic to manage the shortcuts between pipeline registers and datapath components, but it yields a significant performance increase.
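The cycle count with forwarding can be estimated with another minimal Python sketch. It assumes that every source operand is needed at the start of the consumer's EX stage, that an ALU result can be forwarded to an EX stage in the following cycle, and that a loaded value only exists after MEM and therefore arrives one cycle later. `Instr` and `schedule_with_forwarding` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[str]          # register written (None for sw)
    srcs: Tuple[str, ...]        # registers read
    is_load: bool = False        # True for lw: result only exists after MEM


def schedule_with_forwarding(program: List[Instr]) -> Tuple[List[int], int]:
    """Return the EX cycle of every instruction and the number of bubbles."""
    usable_from = {}   # register -> first cycle in which EX can receive its value
    ex_cycles = []
    bubbles = 0
    prev_ex = 0
    for i, ins in enumerate(program):
        # IF in cycle i + 1, ID in i + 2, EX in i + 3, unless earlier stalls push it back.
        earliest = max(i + 3, prev_ex + 1)
        ready = max((usable_from.get(r, 0) for r in ins.srcs), default=0)
        ex = max(earliest, ready)
        bubbles += ex - earliest
        ex_cycles.append(ex)
        prev_ex = ex
        if ins.dest is not None:
            # ALU result: forwarded from EX/MEM in the next cycle.
            # Loaded value: forwarded from MEM/WB one cycle later.
            usable_from[ins.dest] = ex + (2 if ins.is_load else 1)
    return ex_cycles, bubbles


program = [
    Instr("add x3, x1, x2", "x3", ("x1", "x2")),
    Instr("sw x3, 8(x31)", None, ("x3", "x31")),
]
ex_cycles, bubbles = schedule_with_forwarding(program)
print(ex_cycles, bubbles, "total cycles:", ex_cycles[-1] + 2)
# [3, 4] 0 total cycles: 6
```

Compared with the 8 cycles of the stalled version, the two instructions now finish after 6 cycles.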

Exercise

Reconsider the instruction sequence from the exercise above. Which forwarding connections would make stalls unnecessary? Draw the lines directly into the pipeline diagram.

[Exercise pipeline diagram, CC 1 to CC 7: add, sub, sw]



For certain data hazards, the so-called load-use hazards, forwarding alone cannot remove all stalls. For example, the value that a lw instruction reads from data memory only becomes available at the end of its MEM stage. If the next instruction needs this value, it has to wait until it is available.
Let us again consider the first example from above, but this time we do not assume that the values are already in registers. We therefore first have to load them from data memory.

lw x1, 0(x31) // load a
lw x2, 8(x31) // load b
add x3, x1, x2 // a + b
sw x3, 8(x31) // store (a + b)

The value of x2 is read from data memory in CC 5, so the earliest it can be taken from a pipeline register is CC 6. Since the ALU operation of the following add instruction would already take place in CC 5, it has to be delayed until CC 6, which leaves one bubble even with forwarding.
[Pipeline diagram, CC 1 to CC 9: lw, lw, add, sw with forwarding and one bubble before the ALU operation of add]
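Because forwarding cannot resolve this case, the pipeline needs logic that detects it and inserts the bubble. Below is a minimal Python sketch of such a check, assuming the common formulation for the five-stage pipeline: stall when the instruction in EX is a load whose destination register is a source of the instruction currently in ID. With forwarding in place, this adjacent load/consumer pair is the only case that needs a bubble. The names are hypothetical, and registers are given as numbers so that x0 can be excluded.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[int]          # destination register number, None if none
    srcs: Tuple[int, ...]        # source register numbers
    mem_read: bool = False       # True for loads such as lw


def must_stall(in_ex: Instr, in_id: Instr) -> bool:
    """Load-use check made while in_id is decoded and in_ex is in the EX stage."""
    return (
        in_ex.mem_read
        and in_ex.dest not in (None, 0)   # x0 never carries a real dependency
        and in_ex.dest in in_id.srcs
    )


def load_use_stalls(program: List[Instr]) -> List[Tuple[str, str]]:
    # A load and its consumer meet as (EX, ID) only when they are adjacent;
    # a one-instruction gap is already covered by forwarding from MEM/WB.
    return [
        (program[i].text, program[i + 1].text)
        for i in range(len(program) - 1)
        if must_stall(program[i], program[i + 1])
    ]


program = [
    Instr("lw x1, 0(x31)", 1, (31,), mem_read=True),
    Instr("lw x2, 8(x31)", 2, (31,), mem_read=True),
    Instr("add x3, x1, x2", 3, (1, 2)),
    Instr("sw x3, 8(x31)", None, (31, 3)),
]
print(load_use_stalls(program))
# [('lw x2, 8(x31)', 'add x3, x1, x2')]
```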

This example shows the limitations of forwarding. Fortunately, another approach can reduce the number of bubbles further in some cases. This approach is called reordering and, as the name suggests, tries to fill potential bubbles caused by hazards by reordering instructions. The reordering logic has to decide whether instructions can be moved without changing the outcome of the instruction sequence. Especially in long instruction sequences, there are often ways to reschedule instructions so that the "distance" between dependent instructions in the pipeline increases.
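The decision the reordering logic has to make can also be sketched in Python. The sketch assumes that two instructions must keep their relative order if one writes a register that the other reads or writes (read-after-write, write-after-read, write-after-write) and, conservatively, if a store is paired with another memory access, since their addresses might overlap. It also counts the remaining bubbles, assuming forwarding, so that only a load immediately followed by a consumer of its result costs a bubble. All names are hypothetical; the checker can also be used to verify an answer to the following exercise.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()
    load: bool = False
    store: bool = False


def conflicts(a: Instr, b: Instr) -> bool:
    """True if a (earlier) and b (later) must keep their relative order."""
    raw = a.dest is not None and a.dest in b.srcs
    war = b.dest is not None and b.dest in a.srcs
    waw = a.dest is not None and a.dest == b.dest
    # Conservative memory rule: addresses may alias, so never swap a store
    # with another load or store.
    mem = (a.store and (b.load or b.store)) or (b.store and a.load)
    return raw or war or waw or mem


def is_legal_reordering(program: List[Instr], order: List[int]) -> bool:
    if sorted(order) != list(range(len(program))):
        return False                     # every instruction exactly once
    pos = {idx: p for p, idx in enumerate(order)}
    for i in range(len(program)):
        for j in range(i + 1, len(program)):
            if conflicts(program[i], program[j]) and pos[i] > pos[j]:
                return False
    return True


def load_use_bubbles(seq: List[Instr]) -> int:
    return sum(1 for a, b in zip(seq, seq[1:]) if a.load and a.dest in b.srcs)


program = [
    Instr("lw x1, 0(x31)", "x1", ("x31",), load=True),
    Instr("lw x2, 8(x31)", "x2", ("x31",), load=True),
    Instr("add x3, x1, x2", "x3", ("x1", "x2")),
    Instr("sw x3, 8(x31)", None, ("x3", "x31"), store=True),
]
for order in ([0, 1, 2, 3], [1, 0, 2, 3], [0, 2, 1, 3]):
    legal = is_legal_reordering(program, order)
    bubbles = load_use_bubbles([program[k] for k in order]) if legal else None
    print(order, legal, bubbles)
# [0, 1, 2, 3] True 1
# [1, 0, 2, 3] True 1
# [0, 2, 1, 3] False None
```

For this short sequence no legal order removes the bubble: add depends on both loads and sw depends on add, so add always directly follows a load it needs.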

Exercise

Consider the example from above that computes (a + b) - c. This time, the values of a, b, and c are not available in registers, so each of them has to be loaded from data memory before it is used:

lw x1, 0(x31) // load a
lw x2, 8(x31) // load b
add x4, x1, x2 // a + b
lw x3, 16(x31) // load c
sub x5, x4, x3 // (a + b) - c
sw x5, 24(x31) // store (a + b) - c

In this case, reordering can eliminate all potential bubbles. Reorder the sequence without creating hazards or changing the desired outcome.

[Exercise pipeline diagram, CC 1 to CC 10]



However, reordering is not always possible and needs a rather complex control unit that detects dependencies and reschedules instructions efficiently.
Handling control hazards, the last hazard type covered in this course, likewise requires additional overhead.