Instruction-Level Parallelism

GowriSankar Penubothu
Dec 6, 2022

Since 1985, almost all processors have used pipelining to overlap the execution of instructions and boost performance. This potential overlap among instructions is called instruction-level parallelism (ILP): the processing of several instructions concurrently.

Instruction-Level Parallelism: Concepts and Challenges

Instruction-level parallelism (ILP) builds on the pipelining idea, potentially overlapping the execution of instructions to increase system performance. Techniques for increasing the amount of parallelism include reducing the impact of data and control hazards and improving the processor's ability to exploit the parallelism that is available.

There are two main approaches to exploiting ILP:

1. Static techniques — fixed schedules produced at compile time by the compiler (software-based).

2. Dynamic techniques — the hardware discovers the parallelism at run time.

Data and control hazards impose limits on both, and the two approaches are not entirely disjoint.

A pipelined processor's CPI (cycles per instruction) is the sum of the base CPI and the contributions from stalls:

Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls

The ideal pipeline CPI is a measure of the maximum performance attainable by the implementation. By reducing each of the terms on the right-hand side, we decrease the overall pipeline CPI and thus raise the IPC (instructions per clock).
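As a sketch of how these components combine, the following computes pipeline CPI and the resulting IPC; the stall counts are hypothetical numbers chosen for illustration, not from the article:

```python
def pipeline_cpi(ideal_cpi, structural, data, control):
    """Total CPI = ideal CPI plus the per-instruction stall contributions."""
    return ideal_cpi + structural + data + control

# Hypothetical example: ideal CPI of 1.0 with stall contributions of
# 0.0 (structural), 0.3 (data hazards), and 0.2 (control hazards).
cpi = pipeline_cpi(1.0, 0.0, 0.3, 0.2)
ipc = 1.0 / cpi  # IPC is the reciprocal of CPI

print(cpi)  # 1.5
print(ipc)  # ~0.667
```

Reducing any stall term lowers CPI and raises IPC, which is exactly the goal of the techniques discussed below.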

To benefit from instruction-level parallelism, we must determine which instructions can be executed in parallel. If two instructions are parallel, they can execute simultaneously in a pipeline without stalling. Dependent instructions are not parallel and must be executed in order.

There are three main categories of dependences: data dependences (also called true data dependences), name dependences, and control dependences.

Data Dependencies and Hazards

How much parallelism does a program contain, and how can it be exploited?

If two instructions are parallel, they can execute simultaneously in a pipeline without stalling (assuming no structural hazards exist).

Parallel instructions have no dependences between them.

If two instructions are not parallel, they must be executed in order, though they may often still partially overlap.

  • An instruction j is data dependent on instruction i if instruction i produces a result that may be used by instruction j.
  • Instructions are also dependent on one another if a chain of dependences of the first type connects them.
  • For example, instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.

Data Dependences

[Example code omitted: in it, the dependences involve floating-point data for the first two instructions and integer data for the last two.]

Pipeline Hazards

>Hazards make it necessary to stall the pipeline.

  • Some pipelined instructions are allowed to proceed while others are delayed.
  • In our example pipeline approach, when an instruction stalls, all instructions behind it in the pipeline also stall.
  • No new instructions are fetched during the stall.
  • Instructions ahead of the stalled instruction must continue to proceed.

>Overcoming Dependences

Two approaches:

  • Maintain the dependence but avoid the hazard (e.g., schedule the code dynamically).
  • Eliminate the dependence by transforming the code.

>Difficulty in Identifying Dependences

A data value may flow from one instruction to another through registers or through memory, so detection is not always simple.

Dependences that flow through registers are easy to identify, but dependences between instructions that reference memory are harder to detect, because two apparently different addresses may refer to the same location.

For example, 100(R4) and 20(R6) refer to the same address when R4 = 20 and R6 = 100.

Conversely, the same operand may refer to different addresses: suppose R4 is incremented by an instruction that falls between two references to 20(R4).
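The effective-address arithmetic behind these two cases can be checked directly. The register values and offsets come from the text above; the helper function is mine:

```python
def effective_address(offset, reg_value):
    # Displacement addressing: address = offset + register contents.
    return offset + reg_value

R4, R6 = 20, 100

# Case 1: different-looking operands, same memory location.
# 100(R4) and 20(R6) both name address 120.
assert effective_address(100, R4) == effective_address(20, R6) == 120

# Case 2: the same operand 20(R4) names different addresses
# if R4 is incremented between the two references.
before = effective_address(20, R4)
R4 += 4  # some instruction in between increments R4
after = effective_address(20, R4)
assert before != after
print(before, after)  # 40 44
```

This is why memory disambiguation is hard for both compilers and hardware: equality of addresses depends on run-time register values.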

Name Dependences: Two Categories

  • A name dependence occurs when two instructions use the same name — a register or memory location — but no data actually flows between the instructions associated with that name. With i preceding j:
  1. An antidependence between instruction i and instruction j occurs when instruction j writes a register or memory location that instruction i reads. The original ordering must be preserved.
  2. An output dependence occurs when instruction i and instruction j write the same register or memory location; again, the ordering must be preserved.
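Because no data flows through the shared name, both kinds of name dependence can be removed by renaming: giving each write a fresh register. A minimal sketch — the instruction sequence and register names are illustrative, not from the article:

```python
# Instructions as (destination, sources). The second write to R1 below
# carries no data from the earlier uses of R1, so the WAR and WAW
# conflicts on R1 are name dependences only.
code = [
    ("R1", ["R2", "R3"]),  # R1 <- R2 + R3
    ("R4", ["R1"]),        # R4 <- R1 * 2   (true RAW on R1)
    ("R1", ["R5", "R6"]),  # R1 <- R5 - R6  (WAW with #0, WAR with #1)
]

def rename(code):
    """Give every write a fresh physical register; rewrite later reads."""
    mapping, fresh, out = {}, 0, []
    for dest, srcs in code:
        srcs = [mapping.get(s, s) for s in srcs]  # reads use latest mapping
        fresh += 1
        mapping[dest] = f"P{fresh}"               # fresh name per write
        out.append((mapping[dest], srcs))
    return out

print(rename(code))
# [('P1', ['R2', 'R3']), ('P2', ['P1']), ('P3', ['R5', 'R6'])]
```

After renaming, only the true RAW dependence (P2 reading P1) remains; the two writes now target different physical registers and can be reordered freely.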

Data Hazards

  • A hazard arises whenever there is a dependence between instructions that are close enough together that pipelining, or another reordering of the instructions, would change the order of access to the operand involved in the dependence.
  • We must preserve program order: the order in which the instructions would execute on an unpipelined machine.
  • Program order needs to be preserved only where it affects the outcome of the program.

Data Hazards — Three Types

Consider two instructions i and j, with i occurring before j in program order. The possible hazards are:

>RAW (read after write) — j tries to read a source before i writes it, so j incorrectly gets the old value.

  • The most common type.
  • Program order must be preserved.
  • In a simple, common static pipeline, a load instruction followed by an integer ALU instruction that directly uses the load result will lead to a RAW hazard.

>WAW (write after write) — j tries to write an operand before it is written by i. The writes end up being performed in the wrong order, leaving the value written by i rather than the value written by j in the destination.

  • Corresponds to an output dependence.
  • Present only in pipelines that write in more than one pipe stage, or that allow an instruction to proceed while an earlier instruction is stalled.
  • In the classic example pipeline, these hazards are avoided because all writes happen in the WB stage.
  • WAW hazards become possible if instructions are allowed to reorder.
  • For example, imagine a floating-point instruction and a later integer instruction that both write the same register.

>WAR (write after read) — j tries to write an operand before i reads it, so i incorrectly gets the new value.

  • Corresponds to an antidependence.
  • Cannot occur in most static pipelines, since reads happen early (in ID) and writes happen late (in WB).
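Given the register read and write sets of two instructions i and j (with i earlier in program order), the three cases can be classified mechanically. A sketch, with the instruction encoding as read/write sets being my own simplification:

```python
def hazards(i_reads, i_writes, j_reads, j_writes):
    """Classify hazards between earlier instruction i and later instruction j."""
    found = set()
    if set(i_writes) & set(j_reads):
        found.add("RAW")  # j reads something i writes: true dependence
    if set(i_writes) & set(j_writes):
        found.add("WAW")  # both write the same location: output dependence
    if set(i_reads) & set(j_writes):
        found.add("WAR")  # j writes something i reads: antidependence
    return found

# i: R1 <- R2 + R3 ; j: R4 <- R1 + R5  -> RAW on R1
print(hazards(["R2", "R3"], ["R1"], ["R1", "R5"], ["R4"]))  # {'RAW'}
```

Note that a single pair of instructions can exhibit more than one hazard at once, e.g. RAW and WAW if j both reads and writes i's destination.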

Control Dependencies

  • A control dependence determines the ordering of an instruction i with respect to a branch instruction, so that i is executed only when it should be, and in correct program order.
  • In the standard example, S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.
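The statement above refers to the standard fragment in which each statement is guarded by its own branch. Sketched here in runnable Python, with p1, p2 standing for the branch predicates and the strings "S1", "S2" standing for the guarded statements:

```python
def run(p1, p2):
    executed = []
    if p1:
        executed.append("S1")  # S1 executes only when p1 holds
    if p2:
        executed.append("S2")  # S2 executes only when p2 holds
    return executed

print(run(True, False))  # ['S1'] -- whether S2 runs is independent of p1
```

S1 cannot be hoisted above the test of p1 without changing the program's behavior, which is exactly the constraint described next.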

Control dependences impose two constraints:

  • An instruction that is control dependent on a branch cannot be moved before the branch, so that its execution is no longer controlled by the branch. For instance, we cannot move a statement from the then portion of an if statement to before the if statement.
  • An instruction that is not control dependent on a branch cannot be moved after the branch, so that its execution becomes controlled by the branch. For instance, we cannot move a statement from before an if statement into its then portion.

Limits to ILP

The fundamental question is whether a technique is energy efficient: does it increase power consumption faster than it increases performance? Most performance-enhancing techniques raise power usage.

Multiple-issue processor techniques tend to be energy inefficient because:

  • Issuing multiple instructions incurs some logic overhead that grows faster than the issue rate.
  • There is a widening gap between peak issue rate and sustained performance.
  • Since performance is a function of the sustained rate while the number of transistors switching is a function of the peak issue rate, a growing gap between peak and sustained performance translates into increasing energy per unit of performance.

These limitations might be overcome by advances in compiler technology combined with significantly new and different hardware techniques, but such advances are unlikely to be coupled with realistic hardware any time soon.

ILP Summary

  • Exploit instruction-level parallelism to gain performance from the implicit parallelism in programs.
  • Compilers unroll loops to increase ILP.
  • Dynamic hardware exploits ILP with the help of branch prediction.
  • The dynamic approach works even when dependences cannot be known at compile time.
  • It is able to hide L1 cache misses.
  • Code compiled for one machine runs well on another.
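As a sketch of the compiler loop-unrolling idea mentioned above, here is a fourfold-unrolled sum; the loop body and the unroll factor are illustrative choices, not from the article:

```python
def sum_unrolled(xs):
    """Sum a list with the loop body unrolled four times.

    Unrolling amortizes loop overhead (the index test and branch) over
    four elements per iteration and exposes independent adds that a
    scheduler can overlap.
    """
    n = len(xs)
    total = 0
    i = 0
    # Main unrolled loop: four adds per iteration.
    while i + 4 <= n:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    # Epilogue: handle the leftover elements when n is not a multiple of 4.
    while i < n:
        total += xs[i]
        i += 1
    return total

print(sum_unrolled(list(range(10))))  # 45
```

A real compiler performs the same transformation on the machine-code loop, then reschedules the unrolled body to fill pipeline stalls.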
