A data hazard occurs when the pipeline must be stalled because one
step has to wait for another to complete. This happens when a planned
instruction cannot execute in its planned clock cycle because the data
it needs is not yet available.
If we look at an example from doing the laundry again, a data hazard
could be the following: You find a sock without a match while you are
folding the dry laundry. One strategy could be running to your room to
search for the matching sock. While you are doing that, the laundry in
the dryer will be ready to be folded, the washed laundry will be ready
to be put into the dryer, and a new load waits to be put into the
washer, so every later step has to wait for you.
In a processor pipeline, data hazards arise from the dependencies of
one instruction on data from an earlier one that is still in the
pipeline.
As an example, we want to execute an add instruction and need to store
the value in the data memory:
add x3, x1, x2 // a + b
sw x3, 8(x31) // store (a + b)
We can observe that the sw instruction wants to read the value of x3
as an operand in CC 3, but the add instruction does not write its
result to the register until CC 5.
[Pipeline diagram: add followed by sw without stalls, CC 1 to CC 6]
We already introduced stalling as a workaround for structural
hazards; without a better solution to data hazards, we also have to
stall the pipeline in this case for 2 clock cycles, i.e., delay the
execution of sw until the result of add is written into the register.
We can depict these delays as bubbles, which indicate the idle state
of the pipeline.
[Pipeline diagram: add followed by sw with two bubbles, CC 1 to CC 8]
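The two stall cycles in the add/sw example can be reproduced with a small timing model. This is only a sketch under assumptions that are not stated in the text: a classic five-stage pipeline (IF, ID, EX, MEM, WB) and a register file that is written in the first half of a clock cycle and read in the second half. The function name and the tuple layout are made up for illustration.

```python
# Assumptions (not from the text): five stages IF, ID, EX, MEM, WB, and a
# register file that is written in the first half of a cycle and read in
# the second half, so a value can be read in the cycle of the producer's WB.

def stalls_without_forwarding(program):
    """Return the number of bubbles inserted before each instruction.

    program is a list of (text, dest, sources) tuples: dest is the register
    that is written (or None), sources are the registers that are read.
    """
    written_in = {}   # register -> clock cycle of the producer's WB stage
    next_id = 2       # ID cycle of the next instruction if nothing stalls
    stalls = []
    for text, dest, sources in program:
        id_cycle = next_id
        for reg in sources:
            # ID must not happen before the producer has written the register.
            id_cycle = max(id_cycle, written_in.get(reg, 0))
        stalls.append(id_cycle - next_id)
        if dest is not None:
            written_in[dest] = id_cycle + 3   # ID -> EX -> MEM -> WB
        next_id = id_cycle + 1                # in-order: later instructions follow
    return stalls


program = [
    ("add x3, x1, x2", "x3", ["x1", "x2"]),
    ("sw x3, 8(x31)", None, ["x3", "x31"]),
]
for (text, _, _), bubbles in zip(program, stalls_without_forwarding(program)):
    print(f"{text:16} bubbles before ID: {bubbles}")
```

Under these assumptions the model reports no bubble for add and two bubbles for sw, which matches the two stall cycles described above.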
Exercise
To visualize stalls on data hazards, suppose you want to perform the
operation (a + b) - c on a RISC-V processor and store the value
in the data memory. This can be achieved by the following instruction
sequence, assuming the values of a, b, c are available in registers
x1, x2, x3:
add x4, x1, x2 // a + b
sub x5, x4, x3 // (a + b) - c
sw x5, 8(x31) // store (a + b) - c
Insert stalls ("bubbles") where needed by dragging bubbles into the
pipeline diagram.
[Pipeline diagram for the exercise: add, sub, sw, CC 1 to CC 7]
Another way of resolving data hazards is called forwarding. It
retrieves the missing data from the pipeline registers rather than
waiting for it to arrive in the (programmer-visible) registers or
memory.
We already introduced pipeline registers as buffers between stages.
They are written at the end of every clock cycle and read in the
following clock cycle. At the end of that cycle, they are overwritten
with new data.
Pipeline registers also help to resolve hazards: since the datapath
components can access them, they can serve as data shortcuts when a
value is read directly from a pipeline register!
Reconsider our example from above: an add instruction followed by a sw
instruction. As we have seen, stalling delays the sw instruction until
the sum is written to its destination register.
Through forwarding from pipeline registers, the needed value is
available right after CC 3, in which the ALU calculates the sum and
buffers it temporarily in the pipeline register. Since the datapath
components can access the pipeline registers, the sw instruction can
read the needed value as an input for its operation instead of waiting
until CC 5.
The line in the diagram illustrates the forwarding connection from
pipeline register to ALU.
[Pipeline diagram: add followed by sw with forwarding from pipeline register to ALU, CC 1 to CC 6]
As we can see, this method is more efficient than stalling, since it
makes the 2 stall cycles unnecessary. It requires additional control
logic for managing the shortcuts between pipeline registers and
datapath components, but leads to a significant performance increase.
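The additional control logic mentioned above can be sketched as a selection function that decides where an operand should come from. The sketch below is not taken from this course: the pipeline register names EX/MEM and MEM/WB follow the common textbook convention for a five-stage pipeline, and Latch and forward_source are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Optional


class Latch(NamedTuple):
    """Hypothetical view of a pipeline register: does the instruction held
    there write a register, and if so, which one (register number)?"""
    reg_write: bool
    rd: Optional[int]


def forward_source(rs, ex_mem, mem_wb):
    """Pick the source of operand register rs for the instruction in EX."""
    if rs == 0:
        return "REGFILE"      # x0 is hard-wired to zero in RISC-V
    if ex_mem.reg_write and ex_mem.rd == rs:
        return "EX/MEM"       # newest value: produced one instruction earlier
    if mem_wb.reg_write and mem_wb.rd == rs:
        return "MEM/WB"       # value produced two instructions earlier
    return "REGFILE"          # no hazard: the register file is up to date


# CC 4 of the add/sw example: sw is in EX while add has just left EX,
# so the sum of add sits in the EX/MEM pipeline register.
add_result = Latch(reg_write=True, rd=3)
empty = Latch(reg_write=False, rd=None)
print(forward_source(3, add_result, empty))    # operand x3
print(forward_source(31, add_result, empty))   # operand x31
```

Checking EX/MEM before MEM/WB is a deliberate choice in this sketch: if two earlier instructions write the same register, the more recent result has to win.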
Exercise
Reconsider the instruction sequence from the exercise above. Which
forwarding connections would make stalls unnecessary? Draw the lines
directly into the pipeline diagram.
[Pipeline diagram for the exercise: add, sub, sw, CC 1 to CC 7]
For certain data hazards, the so-called load-use hazards, forwarding
cannot remove every stall. For example, the value that a lw
instruction reads from data memory is only available after its MEM
stage. If the next instruction wants to operate on this value, it has
to wait until the value is available.
Let us again consider the first example from above, but now we do not
assume the values to be already available in registers. We therefore
have to load the values from the data memory first.
lw x1, 0(x31) // load a
lw x2, 8(x31) // load b
add x3, x1, x2 // a + b
sw x3, 8(x31) // store (a + b)
The value of x2 is available only after the read from data memory in
CC 5, so the first opportunity to retrieve it from the pipeline
register is after CC 5. Since the ALU operation of the following add
instruction would already take place in CC 5, it has to be delayed
until CC 6, which leads to a bubble even with forwarding.
[Pipeline diagram: lw, lw, add, sw with forwarding and one bubble, CC 1 to CC 9]
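The remaining bubble in the lw/lw/add/sw sequence can be checked with a small timing model that includes forwarding. As before, this is a sketch under assumptions: a five-stage pipeline, results of ALU instructions can be forwarded after their EX stage, loaded values after their MEM stage, and every source operand is needed at the start of EX. The function name and tuple layout are hypothetical.

```python
# Assumptions (not from the text): five stages IF, ID, EX, MEM, WB; full
# forwarding; every source operand is needed at the start of the EX stage.
# Real designs may forward store data later, which this sketch ignores.

def stalls_with_forwarding(program):
    """Return the number of bubbles inserted before each instruction.

    program is a list of (text, is_load, dest, sources) tuples.
    """
    available_after = {}   # register -> cycle after which it can be forwarded
    next_ex = 3            # EX cycle of the next instruction if nothing stalls
    stalls = []
    for text, is_load, dest, sources in program:
        ex_cycle = next_ex
        for reg in sources:
            # EX has to start in a later cycle than the one producing the value.
            ex_cycle = max(ex_cycle, available_after.get(reg, 0) + 1)
        stalls.append(ex_cycle - next_ex)
        if dest is not None:
            # Loads deliver their value one stage later (MEM instead of EX).
            available_after[dest] = ex_cycle + 1 if is_load else ex_cycle
        next_ex = ex_cycle + 1
    return stalls


program = [
    ("lw x1, 0(x31)", True, "x1", ["x31"]),
    ("lw x2, 8(x31)", True, "x2", ["x31"]),
    ("add x3, x1, x2", False, "x3", ["x1", "x2"]),
    ("sw x3, 8(x31)", False, None, ["x3", "x31"]),
]
for (text, *_), bubbles in zip(program, stalls_with_forwarding(program)):
    print(f"{text:16} bubbles before EX: {bubbles}")
```

In this model only the add instruction is delayed, by one cycle, because x2 is loaded by the instruction directly in front of it.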
This example shows the limitations of forwarding. Fortunately, another
approach can reduce the number of bubbles further in some cases. The
approach is called reordering and, as the name says, tries to
fill potential bubbles caused by hazards by reordering instructions.
Some logic has to decide whether instructions can be moved around in
the pipeline without changing the outcome of the instruction sequence.
Especially in long instruction sequences, there are often ways to
reschedule instructions so that the "distance" between dependent
instructions in the pipeline increases.
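The decision whether two neighbouring instructions may change places can be sketched as a dependency check. This is a simplified illustration, not the logic of a real scheduler: Instr and can_swap are hypothetical names, and memory accesses are handled conservatively by never moving a store past another memory access.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]          # register written, or None
    sources: Tuple[str, ...]     # registers read
    mem: Optional[str] = None    # "load", "store", or None


def can_swap(first, second):
    """True if two adjacent instructions can be exchanged safely."""
    # second reads what first writes (read after write)
    if first.dest is not None and first.dest in second.sources:
        return False
    # second overwrites a register that first still has to read
    if second.dest is not None and second.dest in first.sources:
        return False
    # both write the same register: the final value would change
    if first.dest is not None and first.dest == second.dest:
        return False
    # conservative memory rule: never move a store past a load or store
    if first.mem and second.mem and "store" in (first.mem, second.mem):
        return False
    return True


lw_a = Instr("lw x1, 0(x31)", "x1", ("x31",), "load")
lw_b = Instr("lw x2, 8(x31)", "x2", ("x31",), "load")
add = Instr("add x3, x1, x2", "x3", ("x1", "x2"))
sw = Instr("sw x3, 8(x31)", None, ("x3", "x31"), "store")

print(can_swap(lw_a, lw_b))   # two independent loads
print(can_swap(lw_b, add))    # add needs x2 from the load
print(can_swap(add, sw))      # sw needs x3 from the add
```

A reordering step would apply such a check repeatedly to move independent instructions between a producer and its consumer; how far instructions can be moved depends on how many independent ones the sequence contains.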
Exercise
Consider the example from above which calculates (a + b) - c.
This time, the values for a, b and c are not available in the registers,
so we need to read all values from the data memory when needed:
lw x1, 0(x31) // load a
lw x2, 8(x31) // load b
add x4, x1, x2 // a + b
lw x3, 16(x31) // load c
sub x5, x4, x3 // (a + b) - c
sw x5, 24(x31) // store (a + b) - c
In this case, reordering can eliminate all potential bubbles. Reorder
the sequence without creating hazards or changing the desired outcome.
[Pipeline diagram for the exercise: reordered instruction sequence, CC 1 to CC 10]
However, reordering is not always possible and needs a rather complex
control unit that detects dependencies and reschedules instructions
efficiently.
Likewise, overhead is needed for handling control hazards, the last
hazard type covered by this course.