Pipelining is a method of implementing and arranging data processing
elements so that multiple instructions execute concurrently
by overlapping their execution. This requires careful thought in the
design of the hardware components and in how they are connected to
enable pipelining efficiently.
If implemented successfully, pipelining improves performance and
reduces overall execution time by breaking tasks down into smaller,
independent stages.
To show the motivation for pipelining, imagine a routine like doing laundry. We can abstract 4 steps for processing a load of dirty laundry.
A simple approach would be to run through all steps, then repeat with
a new load of clothes. This routine takes a fixed amount of time,
because each stage can only start once the one before it has
completed.
However, if you have multiple loads of laundry at the same time, this
can take a long time. And in fact, there is room for improvement in
the strategy above. If we look at the steps again, every component
(washer, dryer, table) is used in only one step and sits idle during
the other three.
If we put a new load of clothes into the washer as soon as we have
unloaded it, and move every load one step further, we can fill the
idle time of every component and speed up the overall routine. And all
this without needing more machines or people!
What we achieve with this scheduling is improved throughput of the
routine: we start the routine again as soon as its first step can be
initiated again.
There are, however, some aspects of our pipeline that need to be
considered.
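The scheduling idea can be sketched with a short calculation. The following Python snippet is a minimal sketch; the four stage names and the 30-minute stage time are illustrative assumptions, not values from the text:

```python
# Sketch: sequential vs. pipelined laundry for several loads.
# Stage names and the 30-minute stage time are assumed for illustration.
STAGES = ["wash", "dry", "fold", "put away"]
STAGE_TIME = 30  # minutes per stage

def sequential_time(n_loads: int) -> int:
    # Each load runs through all stages before the next load starts.
    return n_loads * len(STAGES) * STAGE_TIME

def pipelined_time(n_loads: int) -> int:
    # A new load enters the washer as soon as it is free: the first
    # load takes all stages, every further load adds one stage time.
    return (len(STAGES) + n_loads - 1) * STAGE_TIME

print(sequential_time(4))  # 480 minutes
print(pipelined_time(4))   # 210 minutes
```

For a single load both routines take the same time; the gain comes entirely from overlapping multiple loads.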
We go into detail on what is happening in each of the stages in a later chapter.
By having those 5 stages, we can now take a closer look at how the pipeline works and show that this method speeds up instruction execution just like it speeds up doing laundry.
(Table: three instructions executed sequentially, each taking 800 ps.)
Now we take a closer look at the same 3 instructions in a
pipelined processor design. We want to overlap instruction execution
by starting the next instruction as soon as the previous one has
finished the first stage, i.e., as soon as it "leaves" the first
resource, which would otherwise sit idle in a sequential execution.
Any split into stages has to allow the slowest instruction part to
always finish. The most convenient way is to define a uniform stage
execution time that corresponds to the longest a stage can take. This
is then called a clock cycle (CC) and serves as the unit of
time in which we can safely assume one stage to be finished and the
resource to be freed for the next instruction. In our case, we assume
the slowest stage to take 200 ps, therefore a clock cycle takes 200
ps. One instruction takes 5 clock cycles or 1000 ps in total to
finish, since it is composed of 5 stages. Note that this is longer
than the sequential instruction execution time of 800 ps.
With these assumptions made, we can now take a closer look at the
pipeline performance. Here, the time between the start of the first
and the end of the third instruction is just 7 × 200 = 1400 ps.
This is nearly half of the time needed by the sequential approach
(2400 ps), despite the longer execution time for a single
instruction!
For general estimations, the pipelining speed-up can be turned into a
formula. If all stages take the same amount of time, the formula is
roughly equal to:
Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of pipeline stages
This implies that, when pipelining a large number of instructions, we can achieve a speed-up of nearly 4x, as an instruction is completed every 200 ps instead of every 800 ps!
(Table: pipelined execution of the three instructions across seven 200 ps clock cycles.)
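The behavior of this speed-up for growing instruction counts can be checked numerically. A sketch using the same assumed numbers as above:

```python
# How the pipelining speed-up approaches the stage count
# (here 800 ps / 200 ps = 4) as more instructions are executed.
STAGES = 5
CYCLE_PS = 200
SEQUENTIAL_PS = 800

def speedup(n: int) -> float:
    pipelined = (STAGES + n - 1) * CYCLE_PS
    return (n * SEQUENTIAL_PS) / pipelined

for n in (3, 100, 1_000_000):
    print(n, round(speedup(n), 2))
```

For only 3 instructions the speed-up is about 1.7x, because the pipeline first has to fill; for very long instruction streams it approaches the full 4x.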
Overall, pipelining enhances performance by increasing instruction
throughput, rather than by decreasing the execution time of a single
instruction (which, as we saw, can even become longer). As we have
shown, increasing instruction throughput matters more in the long run
because of the sheer number of instructions programs execute.
However, there are often constraints, e.g., dependencies between
instructions, that prevent perfect "stacking" of instructions.
Pipeline design decisions and strategies for handling these
constraints are described in the following chapters.