Datapath

To go into detail on pipelining, from here on we use a subset of the RISC-V instruction set containing the following 7 instructions:

If you are not familiar with these RISC-V instructions and what they do, look them up before continuing.

Pipeline Stages

In the pipelining introduction, we viewed the pipeline stages abstractly as steps in the execution of an instruction. We now take a closer look at those stages. The diagram not only depicts the sequence of stages, but is also an abstraction of the processor datapath and its components. It indicates the order in which an instruction passes through the processor. Every instruction passes through (or bypasses) the same components in the same order.
In detail, the 5 pipeline stages align with the 5 main components of the datapath as follows:

  1. Instruction fetch (IF)
    This step fetches the next instruction to be executed from the instruction memory (InstrMem).

  2. Instruction decode and register fetch (ID)
    The instruction is decoded and, if needed, values are read from the registers (Reg). Two registers can be read at once, which is why two output wires leave the registers.

  3. Execution, memory address computation or branch completion (EX)
    The ALU operates on up to two operands (hence its two input wires) prepared by the previous stage. It performs one of three operations, depending on the instruction type.
    1. Memory reference
      Adding operands to calculate a data memory address.
    2. Arithmetic-logical instruction
      Performing an operation specified by the instruction code on the two values prepared by the previous stage.
    3. Branch
      Comparing the two register values prepared by the previous stage for equality. The output signal determines whether or not the branch is taken.

  4. Memory access (MEM)
    Instructions that access the data memory (DataMem) read a value from, or write a value to, the address calculated by the ALU. All other instructions bypass the data memory.

  5. Write back to register (WB)
    If needed, the instruction writes a value, e.g. the result of an addition, to the destination register (Reg).
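
The three cases of the EX stage can be sketched in a few lines of Python. This is only an illustrative model, not a description of real hardware: the function name alu_execute, the kind labels, the 32-bit word size, and the arithmetic operations other than add are assumptions made for this example.

```python
MASK_32 = 0xFFFFFFFF  # assumption: 32-bit datapath, results wrap around


def alu_execute(kind, a, b, op="add"):
    """Hypothetical model of the EX stage.

    kind selects one of the three cases described above:
    'mem'    -> add the operands to form a data memory address
    'arith'  -> apply the operation named by op to the two operands
    'branch' -> compare the two operands for equality
    """
    if kind == "mem":
        return (a + b) & MASK_32
    if kind == "arith":
        # Only add appears in the text; the other operations are assumed.
        operations = {
            "add": lambda x, y: x + y,
            "sub": lambda x, y: x - y,
            "and": lambda x, y: x & y,
            "or": lambda x, y: x | y,
        }
        return operations[op](a, b) & MASK_32
    if kind == "branch":
        return a == b  # True means the branch condition holds
    raise ValueError(f"unknown instruction kind: {kind}")
```

In all three cases the same ALU with the same two inputs is used; only the operation selected by the decoded instruction differs.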

Furthermore, between the stage components there are additional components, the so-called pipeline registers. For now, we regard them as buffers that pass data from one stage to the next without occupying a datapath component used within a stage.
With the 5 pipeline stages above, up to 5 instructions can execute in the same datapath hardware at the same time, each in a different part of the datapath, and each moving forward by one component every clock cycle. This resembles the principle of the laundry pipeline: we need just one component of every type (one washer, one dryer, one table), we split tasks between those separable components, and we keep every component busy.
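
To make the overlap concrete, the following sketch computes which instruction occupies which stage in every clock cycle of an ideal pipeline. It assumes that no instruction ever has to wait; the function name pipeline_schedule and the instruction labels are placeholders chosen for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(instructions):
    """Return one dict per clock cycle, mapping stage name -> instruction.

    Assumes an ideal pipeline: instruction i enters IF in cycle i and
    advances by exactly one stage per clock cycle.
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    schedule = []
    for cycle in range(n_cycles):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            instr_index = cycle - stage_index
            if 0 <= instr_index < len(instructions):
                row[stage] = instructions[instr_index]
        schedule.append(row)
    return schedule


# Print a small timing table for six placeholder instructions.
for cycle, row in enumerate(pipeline_schedule(["i1", "i2", "i3", "i4", "i5", "i6"])):
    cells = [row.get(stage, "--").ljust(3) for stage in STAGES]
    print(f"cycle {cycle + 1:2d}: " + " ".join(cells))
```

From the fifth cycle on, every stage holds a different instruction until the pipeline drains, so never more than 5 instructions are in flight at once.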

As mentioned in the introduction, for optimal parallelization every component should be used in only one stage, since we do not want different stages to conflict by trying to access the same component simultaneously. Indeed, every RISC-V pipeline stage above addresses and uses only one datapath element, and every datapath element, except for the registers, is used by only one stage.

Why can registers be accessed in two stages?
The register read has to occur before the execution stage, since we want to use the values read from registers as operands for the ALU operation. Analogously, the register write has to take place after execution, since we often want to store results at the end of an instruction.
But how can the registers be written and read by two different stages, possibly in the same clock cycle?
The answer is simple: a register write or read needs only half a clock cycle to finish. Instead of leaving the registers idle for the remainder of the clock cycle, we allow them to be both written and read in the same cycle: the write takes place in the first half of the cycle, the read in the second. This not only lets two stages access the registers in parallel; a value written in the first half of a clock cycle can even be read in the second half of that cycle without a conflict.
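
The write-then-read ordering within one clock cycle can be modelled as follows. This is a minimal sketch: the class name RegisterFile and the method clock_cycle are made up for this example, and the two halves of the cycle are simply modelled as two consecutive steps.

```python
class RegisterFile:
    """Hypothetical model of the registers (Reg) with split-cycle access."""

    def __init__(self, count=32):
        # RISC-V has 32 integer registers, x0 to x31.
        self.values = [0] * count

    def clock_cycle(self, write=None, reads=()):
        """Simulate one clock cycle.

        write: optional (register number, value) pair from the WB stage
        reads: register numbers requested by the ID stage
        """
        # First half of the cycle: the WB stage writes.
        if write is not None:
            number, value = write
            if number != 0:  # x0 is hard-wired to zero in RISC-V
                self.values[number] = value
        # Second half of the cycle: the ID stage reads.
        return tuple(self.values[number] for number in reads)


regs = RegisterFile()
# One instruction writes x3 while a later one reads x3 in the same cycle.
operands = regs.clock_cycle(write=(3, 42), reads=(3, 1))
```

Because the write is completed before the read starts, operands holds the freshly written 42 for x3 rather than the old value.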

For example, we can look at the add instruction from the RISC-V instruction set.
The instruction add x3, x1, x2 adds the values stored in registers x1 and x2 and stores the sum in x3.
It uses the components and wires of the datapath as specified in the instruction code:
First, the instruction is loaded from the instruction memory (IF). Then the instruction is decoded into parts and two values are read from the registers x1 and x2 (ID). These values are operands for the following addition in the ALU (EX). Afterwards, the result bypasses the data memory (MEM) and is stored to the register x3 (WB).
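
The same walk through the five stages can be written down as a small trace. This is a sketch under simplifying assumptions: the function name run_add is hypothetical, the registers are a plain Python list, values are 32 bits wide, and pipeline registers are left out.

```python
def run_add(registers, rd, rs1, rs2):
    """Trace add rd, rs1, rs2 through the five stages.

    Returns a list of (stage, component use) pairs and updates
    registers in place in the WB step.
    """
    trace = []

    # IF: the instruction is fetched from the instruction memory.
    trace.append(("IF", "InstrMem read"))

    # ID: the instruction is decoded and both source registers are read.
    operand_a = registers[rs1]
    operand_b = registers[rs2]
    trace.append(("ID", "Reg read"))

    # EX: the ALU adds the two operands (assumption: 32-bit wrap-around).
    result = (operand_a + operand_b) & 0xFFFFFFFF
    trace.append(("EX", "ALU add"))

    # MEM: an add does not touch the data memory, so the result bypasses it.
    trace.append(("MEM", "DataMem bypass"))

    # WB: the result is written to the destination register.
    registers[rd] = result
    trace.append(("WB", "Reg write"))

    return trace


registers = [0] * 32
registers[1] = 5
registers[2] = 7
add_trace = run_add(registers, rd=3, rs1=1, rs2=2)
```

After the call, registers[3] holds 12, and add_trace lists the stages in the order IF, ID, EX, MEM, WB, with the data memory marked as bypassed.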
Color code
Every instruction uses certain components of the processor hardware, which are highlighted by color. If a wire is colored, it is used to pass data between components. Analogously, if a datapath component is colored, it is used:
If the left half is colored, it indicates a write interaction within the stage.
If the right half is colored, it indicates a read interaction within the stage.
Pipeline registers between the components are always colored, since they are used regardless of the instruction type.
Here you can look at some more instructions and which path they take through the datapath.
(The lw instruction is added after solving the exercise below.)

Exercise

Which components and connections of the datapath are used by an lw (load word) instruction?
Click on the processor components and pipeline registers to toggle them between active (colored) and inactive (not colored). You can click on the MEM stage repeatedly to cycle through load, write, and bypass.




(If you need help, look at some of the above datapath diagrams again.)
In this chapter, we have seen that the RISC-V instruction set was designed with pipelining in mind. A datapath closely aligned with a 5-stage pipeline that separates the datapath components allows parts of multiple instructions to execute simultaneously.
In the next chapter, we take a closer look at constraints that can nonetheless hinder perfectly pipelined execution, even if the datapath and the instruction set are closely aligned.