In the pipelining introduction we viewed the pipeline stages
abstractly as steps in the instruction execution. We now take a closer
look at those pipeline stages. The diagram not only depicts the sequence
of stages but is also an abstraction of the processor datapath and its
components. It indicates the order in which an instruction passes
through the processor. Every instruction passes (or bypasses) the same
components in the same order.
In detail, the 5 pipeline stages align with the 5 main components of the
datapath in the following ways:
Instruction fetch (IF)
This step fetches the next instruction to be executed from the
instruction memory (InstrMem).
Instruction decode and register fetch (ID)
The instruction is decoded and, if needed, values are read from the
registers (Reg). Two registers can be read at once, which is why
two output wires leave the registers.
Execution, memory address computation or branch completion
(EX)
The ALU operates on up to two operands prepared by the previous
stage (hence its two input wires). It performs one of three
operations, depending on the instruction type.
Memory reference
Adding operands to calculate a data memory address.
Arithmetic-logical instruction
Performing an operation specified by the instruction code on the
two values prepared by the previous stage.
Branch
Equality comparison between the two register values prepared by
the previous stage. The output signal determines whether or not
to branch.
Memory access (MEM)
Instructions that read or write data access the data memory
(DataMem) and read a value from, or write a value to, the address
calculated by the ALU. Other instructions bypass the data memory.
Write back to register (WB)
The instruction writes a value, e.g. the result of an addition, to
the destination register (Reg), if needed.
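The three cases of the EX stage listed above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware: the function name ex_stage, the kind labels and the small operation table are assumptions made for this example, and integer widths are ignored.

```python
def ex_stage(kind, a, b, op="add"):
    """Return the ALU output for one instruction in the EX stage.

    kind: "mem" (memory reference), "alu" (arithmetic-logical) or "branch".
    a, b: the two operands prepared by the ID stage.
    op:   hypothetical operation name, only used for kind == "alu".
    """
    if kind == "mem":
        # Memory reference: add the operands to form a data memory address.
        return a + b
    if kind == "alu":
        # Arithmetic-logical: apply the operation named by the instruction.
        operations = {"add": a + b, "sub": a - b, "and": a & b, "or": a | b}
        return operations[op]
    if kind == "branch":
        # Branch: the equality comparison decides whether to branch.
        return a == b
    raise ValueError(f"unknown instruction kind: {kind}")
```

For example, ex_stage("mem", 100, 8) yields the address 108, while ex_stage("branch", 3, 3) yields True, meaning the branch is taken.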
Further, between the stage components there are additional
components, so-called pipeline registers. For now, we view these
as buffers that pass data from one stage to the next without
occupying a datapath component used within a stage.
With the 5 pipeline stages above, we can have up to 5 instructions
executing in the same datapath hardware, each in different parts of
the datapath at the same time, moving forward through components every
clock cycle. This resembles the principle of the laundry pipeline: we
need just one component of every type (one washer, one dryer, one
table), we split the tasks between those separable components, and we
keep every component busy.
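To make the overlap concrete, the following sketch builds a stage-by-cycle table for an ideal pipeline in which instruction i enters IF in cycle i and nothing ever stalls. The helper names schedule and in_flight are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(num_instructions):
    """Return one row per instruction: its stage in each cycle, or None."""
    total_cycles = num_instructions + len(STAGES) - 1
    table = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_index = cycle - i  # instruction i enters IF in cycle i
            if 0 <= stage_index < len(STAGES):
                row.append(STAGES[stage_index])
            else:
                row.append(None)  # not yet fetched, or already finished
        table.append(row)
    return table

def in_flight(table, cycle):
    """Count the instructions occupying some stage in the given cycle."""
    return sum(1 for row in table if row[cycle] is not None)

table = schedule(6)
for i, row in enumerate(table):
    print(f"i{i}: " + " ".join(f"{stage or '.':>3}" for stage in row))
```

Each printed row is one instruction and each column one clock cycle. Under these assumptions, no column ever contains more than five instructions, and no stage name appears twice in the same column.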
As mentioned in the introduction, each component should be used by
only one stage for optimal parallelization, since we do not want
conflicts between different stages trying to access the same
component simultaneously. Indeed, every RISC-V pipeline stage above
addresses and uses only one datapath element, and every datapath
element, except for the registers, is used by only one stage.
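This property can be checked mechanically. The sketch below records which datapath element each stage uses, following the stage descriptions in this chapter, and reports every element shared by more than one stage. The names STAGE_COMPONENTS and shared_components are chosen for this example only.

```python
# Datapath element used by each stage, as described in this chapter.
STAGE_COMPONENTS = {
    "IF": {"InstrMem"},
    "ID": {"Reg"},
    "EX": {"ALU"},
    "MEM": {"DataMem"},
    "WB": {"Reg"},
}

def shared_components(stage_components):
    """Return a dict of each element used by more than one stage."""
    users = {}
    for stage, components in stage_components.items():
        for component in components:
            users.setdefault(component, []).append(stage)
    return {c: stages for c, stages in users.items() if len(stages) > 1}

print(shared_components(STAGE_COMPONENTS))
```

The only entry reported is Reg, shared by ID and WB, which is exactly the exception discussed next.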
Why can registers be accessed in two stages?
The register read has to occur before the execution stage, since we
want to use the values read from registers as operands for the ALU
operation. Analogously, the register write has to take place after
execution, since we often want to store results at the end of an
instruction.
But how can the register read of one stage and the register write of
another stage take place at the same time?
The answer is simple: a register write or read needs only half a
clock cycle. Instead of leaving the registers idle for the
remainder of the cycle, we allow them to be both written and read
in the same cycle: the write takes place in the first half of
the cycle, the read in the second. This not only lets the two
register accesses proceed in parallel; a value written in the first
half of a clock cycle can even be read in the second half of that
same cycle without a conflict.
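A minimal software model of this behaviour, assuming a 32-entry register file and ignoring all timing detail, simply performs the write before the reads within one simulated cycle. The class RegisterFile and its method clock_cycle are invented for this sketch.

```python
class RegisterFile:
    """Toy register file: write in the first half-cycle, read in the second."""

    def __init__(self, count=32):
        self.regs = [0] * count

    def clock_cycle(self, write=None, reads=()):
        """Simulate one clock cycle.

        write: (index, value) coming from the WB stage, or None.
        reads: register indices requested by the ID stage.
        """
        # First half of the cycle: the WB stage writes its result.
        if write is not None:
            index, value = write
            if index != 0:  # x0 is hardwired to zero in RISC-V
                self.regs[index] = value
        # Second half of the cycle: the ID stage reads its operands.
        return tuple(self.regs[r] for r in reads)

rf = RegisterFile()
print(rf.clock_cycle(write=(3, 42), reads=(3, 1)))
```

Because the write happens first, the read of x3 in the same simulated cycle already returns the newly written value.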
For example, we can look at the add instruction from the RISC-V
instruction set.
The following instruction adds the values stored in registers x1 and x2
and stores the result in x3:
    add x3, x1, x2
It uses the components and wires of the datapath as specified in the
instruction code:
First, the instruction is loaded from the instruction memory (IF). Then
the instruction is decoded into parts and two values are read from the
registers x1 and x2 (ID). These values are operands for the following
addition in the ALU (EX). Afterwards, the result bypasses the data
memory (MEM) and is written to register x3 (WB).
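The walk through the five stages can also be written down as a small trace. The sketch below is a simplification that models the registers as a Python dictionary and records one log entry per stage. The function name trace_add and the example register values are assumptions for illustration.

```python
def trace_add(regs, rd, rs1, rs2):
    """Walk an add through the five stages, update regs, return a log."""
    log = []
    # IF: fetch the instruction from the instruction memory.
    log.append(("IF", f"fetch 'add x{rd}, x{rs1}, x{rs2}' from InstrMem"))
    # ID: decode and read both source registers.
    a, b = regs[rs1], regs[rs2]
    log.append(("ID", f"read x{rs1}={a} and x{rs2}={b} from Reg"))
    # EX: the ALU adds the two operands.
    result = a + b
    log.append(("EX", f"ALU computes {a} + {b} = {result}"))
    # MEM: an add does not touch the data memory.
    log.append(("MEM", "bypass DataMem"))
    # WB: write the result to the destination register.
    regs[rd] = result
    log.append(("WB", f"write {result} to x{rd} in Reg"))
    return log

regs = {1: 5, 2: 7, 3: 0}  # example values, chosen arbitrarily
for stage, action in trace_add(regs, rd=3, rs1=1, rs2=2):
    print(f"{stage:>3}: {action}")
```

With the example values, x3 holds 12 after the WB step.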
Color code
Every instruction uses certain components of the processor hardware,
highlighted by color. If a wire is colored, it is used to pass data
between components. Analogously, if a datapath component is colored,
it is used:
If the left half is colored, it indicates a
write interaction within the stage.
If the right half is colored, it indicates a read interaction
within the stage.
Pipeline registers between the components are always colored, since
they are used regardless of the instruction type.
Here you can look at some more instructions and which path they take
through the datapath.
(The lw instruction is added after solving the exercise below.)
Exercise
Which components and connections of the datapath are used by a lw (load
word) instruction?
Click on the processor components and pipeline registers to toggle them
as active (colored) or inactive (not colored). You can click on the MEM
stage repeatedly to toggle between load, write, bypass.
(If you need help, look at some of the above datapath diagrams again.)
In this chapter we have seen that the RISC-V instruction set was
designed with pipelining in mind. A datapath closely aligned with a
5-stage pipeline that separates the datapath components allows parts
of multiple instructions to execute simultaneously.
In the next chapter, we take a closer look at constraints that can
nonetheless hinder perfectly pipelined execution, even if datapath
and instruction set are closely aligned.