Pipelining

Question 1
Instruction execution in a processor is divided into 5 stages, Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Execute (EX), and Write Back (WB). These stages take 5, 4, 20, 10 and 3 nanoseconds (ns) respectively. A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementations of the processor are contemplated: (i) A naive pipeline implementation (NP) with 5 stages and (ii) An efficient pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns and 8 ns respectively. The speedup (correct to two decimal places) achieved by EP over NP in executing 20 independent instructions with no hazards is _________.
A
1.51
B
1.52
C
1.53
D
1.54
       Computer-Organization       Pipelining       GATE 2017 [Set-1]
Question 1 Explanation: 
Naive Pipeline implementation:
The stage delays are 5, 4, 20, 10 and 3. And buffer delay = 2ns
So clock cycle time = max of stage delays + buffer delay
= max(5, 4, 20, 10,3)+2
= 20+2
= 22ns
Execution time for n-instructions in a pipeline with k-stages = (k+n-1) clock cycles
= (k+n-1)* clock cycle time
In this case execution time for 20 instructions in the pipeline with 5-stages
= (5+20-1)*22ns
= 24*22
= 528ns
Efficient Pipeline implementation:
OF phase is split into two stages OF1, OF2 with execution times of 12ns, 8ns
New stage delays in this case = 5, 4, 12, 8, 10, 3
Buffer delay is the same 2ns.
So clock cycle time = max of stage delays + buffer delay
= max(5, 4, 12, 8, 10,3) + 2
= 12+2
= 14ns
Execution time = (k+n-1) clock cycles
= (k+n-1)* clock cycle time
In this case no. of pipeline stages, k = 6
No. of instructions = 20
Execution time = (6+20-1)*14 = 25*14 = 350ns
Speedup of the efficient pipeline over the naive pipeline
= Naive pipeline execution time / efficient pipeline execution time
= 528 / 350
≈ 1.51
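The arithmetic above can be checked with a short Python sketch. The function name `pipeline_time_ns` and its parameters are illustrative choices, not part of the original question; the sketch assumes a hazard-free pipeline whose cycle time is the slowest stage plus the buffer delay.

```python
def pipeline_time_ns(stage_delays_ns, buffer_delay_ns, num_instructions):
    """Total time for a hazard-free pipeline: (k + n - 1) cycles."""
    # Cycle time is set by the slowest stage plus the inter-stage buffer.
    cycle_ns = max(stage_delays_ns) + buffer_delay_ns
    k = len(stage_delays_ns)
    return (k + num_instructions - 1) * cycle_ns


# Naive pipeline (NP): 5 stages; efficient pipeline (EP): OF split into 12 ns + 8 ns.
np_time = pipeline_time_ns([5, 4, 20, 10, 3], 2, 20)
ep_time = pipeline_time_ns([5, 4, 12, 8, 10, 3], 2, 20)
speedup = np_time / ep_time

print(np_time, ep_time, round(speedup, 2))  # 528 350 1.51
```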
Question 2

The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage (with delay 800 picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. The throughput increase of the pipeline is ________ percent.

A
33.33%
B
33.34%
C
33.35%
D
33.36%
       Computer-Organization       Pipelining       GATE 2016 [Set-1]
Question 2 Explanation: 
In a pipelined processor the throughput is 1/clock cycle time.
Cycle time = max of all stage delays.
In the first case max stage delay = 800.
So throughput = 1/800 initially.
After replacing this stage with two stages of delays 600, 350... the cycle time = maximum stage delay = 600.
So the new throughput = 1/600.
The new throughput > old throughput.
And the increase in throughput = 1/600 - 1/800.
We calculate the percentage increase in throughput w.r.t initial throughput, so the % increase in throughput
= (1/600 - 1/800) / (1/800) * 100
= ((800 / 600) - 1) * 100
= ((8/6) -1) * 100
= 33.33%
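As a quick check of the percentage above, here is a minimal Python sketch. The helper name `throughput_gain_percent` is hypothetical, and register delays are ignored, as in the explanation.

```python
def throughput_gain_percent(old_stage_delays, new_stage_delays):
    """Percentage increase in throughput when the stage delays change."""
    old_cycle = max(old_stage_delays)
    new_cycle = max(new_stage_delays)
    # Throughput is 1 / cycle time, so new/old throughput = old_cycle / new_cycle.
    return (old_cycle / new_cycle - 1) * 100


# The 800 ps stage is replaced by two stages of 600 ps and 350 ps.
gain = throughput_gain_percent([800, 500, 400, 300], [600, 350, 500, 400, 300])
print(f"{gain:.2f}%")  # 33.33%
```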
Question 3

Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies τ1, τ2 and τ3 such that τ1 = 3τ2/4 = 2τ3. If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is _________ GHz, ignoring delays in the pipeline registers.

A
4
B
5
C
6
D
7
       Computer-Organization       Pipelining       GATE 2016 [Set-2]
Question 3 Explanation: 
Given 3 stage pipeline, with 3 GHz processor.
Given, τ1 = 3 τ2/4 = 2 τ3
Put τ1 = 6t, we get τ2 = 8t, τ3 = 3t
Now largest stage time is 8t.
So, frequency is 1/(8t)
⇒ 1/(8t) = 3 GHz
⇒ 1/t = 24 GHz
From the given 3 stages, τ1 = 6t, τ2 = 8t and τ3 = 3t
So, τ2 > τ1 > τ3.
The longest stage is τ2 = 8t and we will split that into two stages of 4t & 4t.
New processor has 4 stages - 6t, 4t, 4t, 3t.
Now largest stage time is 6t.
So, new frequency is = 1/(6t)
We can substitute 24 in place of 1/t, which gives the new frequency as 24/6 = 4 GHz
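The same scaling argument can be sketched in Python. The function `frequency_after_split` is a hypothetical helper; it assumes the clock frequency is inversely proportional to the longest stage latency and takes the latencies in units of t (6, 8, 3).

```python
from fractions import Fraction


def frequency_after_split(old_freq_ghz, stage_latencies):
    """Split the longest stage into two equal halves and rescale the frequency."""
    longest = max(stage_latencies)
    i = stage_latencies.index(longest)
    half = Fraction(longest, 2)
    new_stages = stage_latencies[:i] + [half, half] + stage_latencies[i + 1:]
    # Frequency is inversely proportional to the longest stage latency.
    return old_freq_ghz * Fraction(longest) / max(new_stages)


new_freq = frequency_after_split(3, [6, 8, 3])
print(new_freq)  # 4
```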
Question 4

Consider the sequence of machine instructions given below:

  MUL R5, R0, R1
  DIV R6, R2, R3
  ADD R7, R5, R6
  SUB R8, R7, R4 

In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back the Result (WB). The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is

A
11
B
12
C
13
D
14
       Computer-Organization       Pipelining       GATE 2015 [Set-2]
Question 4 Explanation: 
I ⇒ Instruction Fetch and Decode
O ⇒ Operand Fetch
P ⇒ Perform the operation
W ⇒ write back the result
Question 5

Consider the following reservation table for a pipeline having three stages S1, S2 and S3.

     Time -->
-----------------------------
      1    2   3    4     5
-----------------------------
S1  | X  |   |   |    |  X |    
S2  |    | X |   | X  |    |
S3  |    |   | X |    |    |

The minimum average latency (MAL) is ________.

A
3
B
5
C
6
D
7
       Computer-Organization       Pipelining       GATE 2015 [Set-3]
Question 5 Explanation: 
Minimum average latency is based on an advanced concept in pipelining.
S1 is needed at time 1 and 5, so its forbidden latency is 5-1 = 4.
S2 is needed at time 2 and 4, so its forbidden latency is 4-2 = 2.
So, forbidden latency = (2,4,0) (0 by default is forbidden)
Allowed latency = (1,3,5) (any value more than 5 also).
Collision vector (4,3,2,1,0) = 10101 which is the initial state as well.
From the initial state we can have a transition after "1" or "3" cycles, and we reach new states with collision vectors
((10101 >> 1) OR 10101 = 11111) and ((10101 >> 3) OR 10101 = 10111) respectively.
These two become states 2 and 3 respectively.
For "5" cycles we come back to state 1 itself.
From state 2 (11111), the new collision vector is 11111.
We can have a transition only when we see first 0 from right.
So, here it happens on 5th cycle only which goes to initial state. (Any transition after 5 or more cycles goes to initial state as we have 5 time slices).
From state 3 (10111), the new collision vector is 10111.
So, we can have a transition on 3, which will give (10111 >> 3 + 10101 = 10111) third state itself. For 5, we get the initial state.
Thus all the transitions are complete.

State \ Latency    1    3    5
1 (10101)          2    3    1
2 (11111)          -    -    1
3 (10111)          -    3    1

So, the minimum length cycle is of length 3, either from 3-3 or from 1-3. So the minimum average latency is also 3.
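The state-diagram search can also be sketched in Python. This is an illustrative brute-force version, not a standard library routine: the function name, the bitmask encoding (bit p set means latency p is forbidden) and the enumeration of simple cycles are assumptions made for this sketch.

```python
from fractions import Fraction


def minimum_average_latency(forbidden_latencies, table_width):
    # Bit p of a state is 1 when latency p is forbidden; latency 0 is always forbidden.
    initial = 1
    for latency in forbidden_latencies:
        initial |= 1 << latency

    def successors(state):
        # Latencies above table_width also return to the initial state, but they
        # can never beat latency == table_width, so they are skipped here.
        for p in range(1, table_width + 1):
            if not (state >> p) & 1:
                yield p, (state >> p) | initial

    # Collect every state reachable from the initial collision vector.
    states, stack = {initial}, [initial]
    while stack:
        for _, nxt in successors(stack.pop()):
            if nxt not in states:
                states.add(nxt)
                stack.append(nxt)

    best = None

    def walk(start, state, visited, total, hops):
        # Enumerate simple cycles through `start` and track the best average.
        nonlocal best
        for p, nxt in successors(state):
            if nxt == start:
                average = Fraction(total + p, hops + 1)
                if best is None or average < best:
                    best = average
            elif nxt not in visited:
                walk(start, nxt, visited | {nxt}, total + p, hops + 1)

    for s in states:
        walk(s, s, {s}, 0, 0)
    return best


# Forbidden latencies 2 and 4, reservation table 5 columns wide.
mal = minimum_average_latency({2, 4}, table_width=5)
print(mal)  # 3
```

Enumerating simple cycles is enough here because a cycle with the minimum average weight can always be chosen to be simple; for larger state diagrams a dedicated minimum-mean-cycle algorithm would scale better.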
Question 6

Consider the following code sequence having five instructions I1 to I5. Each of these instructions has the following format.

     
          OP Ri, Rj, Rk 

where operation OP is performed on contents of registers Rj and Rk and the result is stored in register Ri.

     
          I1 : ADD R1, R2, R3
          I2 : MUL R7, R1, R3
          I3 : SUB R4, R1, R5
          I4 : ADD R3, R2, R4
          I5 : MUL R7, R8, R9 

Consider the following three statements:

     
          S1: There is an anti-dependence between instructions I2 and I5.
          S2: There is an anti-dependence between instructions I2 and I4.
          S3: Within an instruction pipeline an anti-dependence always creates one or more stalls. 

Which one of above statements is/are correct?

A
Only S1 is true
B
Only S2 is true
C
Only S1 and S3 are true
D
Only S2 and S3 are true
       Computer-Organization       Pipelining       GATE 2015 [Set-3]
Question 6 Explanation: 
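S1: False. Anti-dependence means a WAR (write after read) dependency. There is no WAR dependency between I2 and I5; both write R7, which is a WAW (output) dependency.
S2: True. There is a WAR dependency between I2 and I4: I2 reads R3 and the later instruction I4 writes R3.
S3: False. A WAR dependency (anti-dependence) can be resolved by register renaming, so it does not always cause a stall.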
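The pairwise check behind S1 and S2 can be sketched in Python. The function `classify_dependences` and the tuple layout are hypothetical; the sketch compares every ordered pair of instructions by register name only and, as a simplifying assumption, ignores whether an intervening write would hide a dependence.

```python
def classify_dependences(instructions):
    """instructions: list of (name, dest, src1, src2) in program order."""
    found = []
    for i, (name_i, dest_i, *srcs_i) in enumerate(instructions):
        for name_j, dest_j, *srcs_j in instructions[i + 1:]:
            if dest_i in srcs_j:   # later instruction reads what the earlier one wrote
                found.append(("RAW", name_i, name_j, dest_i))
            if dest_j in srcs_i:   # later instruction overwrites a source of the earlier one
                found.append(("WAR", name_i, name_j, dest_j))
            if dest_i == dest_j:   # both instructions write the same register
                found.append(("WAW", name_i, name_j, dest_i))
    return found


program = [
    ("I1", "R1", "R2", "R3"),
    ("I2", "R7", "R1", "R3"),
    ("I3", "R4", "R1", "R5"),
    ("I4", "R3", "R2", "R4"),
    ("I5", "R7", "R8", "R9"),
]
deps = classify_dependences(program)
for dep in deps:
    print(dep)
```

For this program the list contains a WAR entry for I2 and I4 on R3, and only a WAW entry for I2 and I5 on R7.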
Question 7

Consider a 6-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles is _________.

A
4
B
5
C
6
D
7
       Data-Structures       Pipelining       GATE 2014 [Set-1]
Question 7 Explanation: 
With 6 balanced stages and no pipelining overhead, non-pipelined execution takes 6 cycles per instruction.
In the pipeline, 25% of the instructions incur 2 stall cycles each.
So pipelined time per instruction = 1 + (25/100) × 2 = 3/2 = 1.5 cycles
Speedup = Non-pipelined time / Pipelined time = 6 / 1.5 = 4
Question 8

A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

     Instruction           Meaning of instruction
  I0 : MUL R2, R0, R1        R2 ← R0 * R1
  I1 : DIV R5, R3, R4        R5 ← R3 / R4
  I2 : ADD R2, R5, R2        R2 ← R5 + R2
  I3 : SUB R5, R2, R6        R5 ← R2 - R6
A
13
B
15
C
17
D
19
       Computer-Organization       Pipelining       GATE 2010
Question 8 Explanation: 
It is given that there is operand forwarding. In the case of operand forwarding the updated value from previous instruction’s PO stage is forwarded to the present instruction’s PO stage. Here there’s RAW dependency between I1-I2 for R5 and between I2-I3 for R2. These dependencies are resolved by using operand forwarding as shown in the below timeline diagram. The total number of clock cycles needed is 15.
Question 9

Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:

What is the number of cycles needed to execute the following loop?

           for (i=1 to 2) {I1; I2; I3; I4;} 
A
16
B
23
C
28
D
30
       Computer-Organization       Pipelining       GATE 2009
Question 9 Explanation: 
Question 10

Which of the following are NOT true in a pipelined processor?

    I. Bypassing can handle all RAW hazards
    II. Register renaming can eliminate all register carried WAR hazards
    III. Control hazard penalties can be eliminated by dynamic branch prediction
A
I and II only
B
I and III only
C
II and III only
D
I, II and III
       Computer-Organization       Pipelining       GATE 2008
Question 10 Explanation: 
I. False. Bypassing cannot handle all RAW hazards; for example, a load followed immediately by an instruction that uses the loaded value still needs a stall.
II. True. Register renaming can eliminate all WAR hazards as well as WAW hazards.
III. True as worded. Had the statement said "Control hazard penalties can be completely eliminated by dynamic branch prediction", it would be false. But it only says that "Control hazard penalties can be eliminated by dynamic branch prediction".
Hence only statement I is not true, and none of the given options is correct.
Question 11

Delayed branching can help in the handling of control hazards

For all delayed conditional branch instructions, irrespective of whether the condition evaluates to true or false

A
The instruction following the conditional branch instruction in memory is executed.
B
The first instruction in the fall through path is executed.
C
The first instruction in the taken path is executed.
D
The branch takes longer to execute than any other instruction.
       Computer-Organization       Pipelining       GATE 2008
Question 11 Explanation: 
In order to avoid the pipeline delay due to conditional branch instruction, a suitable instruction is placed below the conditional branch instruction such that the instruction will be executed irrespective of whether branch is taken or not and won't affect the program behaviour. Hence option A is the answer.
Question 12

Delayed branching can help in the handling of control hazards

The following code is to run on a pipelined processor with one branch delay slot:

I1: ADD R2 ← R7+R8
I2 : SUB R4 ← R5-R6
I3 : ADD R1 ← R2+R3
I4 : STORE Memory [R4] ← [R1]
     BRANCH to Label if R1 == 0 

Which of the instructions I1, I2, I3 or I4 can legitimately occupy the delay slot without any other program modification?

A
I1
B
I2
C
I3
D
I4
       Computer-Organization       Pipelining       GATE 2008
Question 12 Explanation: 
It is the method to maximize the use of the pipeline by finding and executing an instruction that can be safely executed whether the branch is taken or not. So, when a branch instruction is encountered, the hardware puts the instruction following the branch into the pipe and begins executing it. Here we do not need to worry about whether the branch is taken or not, as we do not need to clear the pipe because no matter whether the branch is taken or not, we know the instruction is safe to execute.
From the given set of instructions I3 is updating R1, and the branch condition is based on the value of R1 so I3 can’t be executed in the delay slot.
Instruction I1 is updating the value of R2 and R2 is used in I3. So I1 also can’t be executed in the delay slot.
Instruction I2 is updating R4, and at the memory location represented by R4 the value of R1 is stored. So if I2 is executed in the delay slot then the memory location where R1 is to be stored as part of I4 will be in a wrong place. Hence between I2 and I4, I2 can’t be executed after I4. Hence I2 can’t be executed in the delay slot.
Instruction I4 can be executed in the delay slot as this is storing the value of R1 in a memory location and executing this in the delay slot will have no effect. Hence option D is the answer.
Question 13

Consider a pipelined processor with the following four stages:

  IF: Instruction Fetch
  ID: Instruction Decode and Operand Fetch
  EX: Execute
  WB: Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

  ADD R2, R1, R0       R2 <- R0 + R1
  MUL R4, R3, R2       R4 <- R3 * R2
  SUB R6, R5, R4       R6 <- R5 - R4
A
7
B
8
C
10
D
14
       Computer-Organization       Pipelining       GATE 2007
Question 13 Explanation: 
Since operand forwarding is there, by default we consider the operand forwarding from EX stage to EX stage.

So, total no. of clock cycles needed to execute the given 3 instructions is 8.
Question 14

A CPU has a five-stage pipeline and runs at 1 GHz frequency. Instruction fetch happens in the first stage of the pipeline. A conditional branch instruction computes the target address and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instructions following a conditional branch until the branch outcome is known. A program executes 10^9 instructions out of which 20% are conditional branches. If each instruction takes one cycle to complete on average, the total execution time of the program is:

A
1.0 second
B
1.2 seconds
C
1.4 seconds
D
1.6 seconds
       Computer-Organization       Pipelining       GATE 2006
Question 14 Explanation: 
Total number of instructions = 10^9
20% of the 10^9 instructions are conditional branches
⇒ 20/100 × 10^9
⇒ 2 × 10^8
The branch outcome is known only in the third stage, so each conditional branch stalls instruction fetch for 2 cycles.
Total cycle penalty = 2 × 2 × 10^8 = 4 × 10^8
Clock speed = 1 GHz, so one cycle takes 10^-9 seconds.
Each instruction takes 1 cycle, i.e., 10^9 cycles for 10^9 instructions.
Total execution time of the program is
= (10^9 / 10^9) + ((4 × 10^8) / 10^9) = 1 + 0.4 = 1.4 seconds
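The calculation can be reproduced with a small Python sketch. The function name and parameters are illustrative; the sketch assumes fetch happens in stage 1, so a branch resolved in stage s costs s - 1 stall cycles.

```python
def execution_time_seconds(num_instructions, branch_fraction,
                           resolve_stage, clock_hz, base_cpi=1):
    """Execution time when fetch stalls until each conditional branch resolves."""
    # Fetch is stage 1, so a branch resolved in stage s stalls s - 1 cycles.
    stall_cycles_per_branch = resolve_stage - 1
    cycles_per_instruction = base_cpi + branch_fraction * stall_cycles_per_branch
    return num_instructions * cycles_per_instruction / clock_hz


t = execution_time_seconds(10**9, 0.20, resolve_stage=3, clock_hz=10**9)
print(round(t, 2))  # 1.4
```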
Question 15

A 5 stage pipelined CPU has the following sequence of stages:

IF — Instruction fetch from instruction memory.
RD — Instruction decode and register read.
EX — Execute: ALU operation for data and address computation.
MA — Data memory access - for write access, the register read at RD stage is used.
WB — Register write back. 

Consider the following sequence of instructions:

I1 : L R0, loc1; R0 <= M[loc1]
I2 : A R0, R0; R0 <= R0 + R0
I3 : S R2, R0; R2 <= R2 - R0
Let each stage take one clock cycle.

What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of I1?

A
8
B
10
C
12
D
15
       Computer-Organization       Pipelining       GATE 2005
Question 15 Explanation: 
With operand forwarding from the memory stage:

Without operand forwarding:

Total clock cycles = 8 with forwarding, or 11 without.
There is no '11' among the options.
So the number of cycles = 8
Question 16

A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 nanoseconds respectively. Registers that are used between the stages have a delay of 5 nanoseconds each. Assuming constant clocking rate, the total time taken to process 1000 data items on this pipeline will be

A
120.4 microseconds
B
160.5 microseconds
C
165.5 microseconds
D
590.0 microseconds
       Computer-Organization       Pipelining       GATE 2004
Question 16 Explanation: 
Clock time = maximum stage delay + register delay = 160 + 5 = 165 ns.
The first data item takes all four cycles to complete. After that, each of the remaining 999 items completes in 1 more cycle. So the time required to process 1000 data items is
1 × 4 × clock time + 999 × 1 × clock time
= 1 × 4 × 165 ns + 999 × 1 × 165 ns
= 165495 ns
≈ 165.5 μs
Question 17

For a pipelined CPU with a single ALU, consider the following situations

I. The j + 1-st instruction uses the result of the j-th instruction as an operand
II. The execution of a conditional jump instruction
III. The j-th and j + 1-st instructions require the ALU at the same time 

Which of the above can cause a hazard?

A
I and II only
B
II and III only
C
III only
D
All the three
       Computer-Organization       Pipelining       GATE 2003
Question 17 Explanation: 
I is a data hazard.
II is a control hazard.
III is a structural hazard.
→ Hazards are problems that arise in the instruction pipeline of CPU microarchitectures.
Question 18

The performance of a pipelined processor suffers if

A
the pipeline stages have different delays
B
consecutive instructions are dependent on each other
C
the pipeline stages share hardware resources
D
All of the above
       Computer-Organization       Pipelining       GATE 2002
Question 18 Explanation: 
Ideally, the speedup from pipelining equals the number of pipeline stages. Usually, however, the stages will not be perfectly balanced; besides, the pipelining itself involves some overhead.
For best performance, the pipeline stages should have equal delays, consecutive instructions should not depend on each other, and the stages should not share hardware resources. Each of the listed conditions therefore hurts performance.
Question 19

Consider a 5-stage pipeline – IF (Instruction Fetch), ID (Instruction Decode and register read), EX (Execute), MEM (memory), and WB (Write Back). All (memory or register) reads take place in the second phase of a clock cycle and writes occur in the first phase of the clock cycle. Consider the execution of the following instruction sequence:

      I1:      sub r2, r3, r4;         /*   r2 ← r3 – r4    */
      I2:      sub r4, r2, r3;         /*   r4 ← r2 – r3    */
      I3:      sw r2, 100(r1)          /*   M[r1+100] ← r2  */
      I4:      sub r3, r4, r2;         /*   r3 ← r4 – r2    */  

(a) Show all data dependencies between the four instructions.
(b) Identify the data hazards.
(c) Can all hazards be avoided by forwarding in this case?

A
Theory Explanation is given below.
       Computer-Organization       Pipelining       GATE 2001
Question 20

Comparing the time T1 taken for a single instruction on a pipelined CPU with time T2 taken on a non-pipelined but identical CPU, we can say that

A
T1 ≤ T2
B
T1 ≥ T2
C
T1 < T2
D
T1 is T2 plus the time taken for one instruction fetch cycle
       Computer-Organization       Pipelining       GATE 2000
Question 20 Explanation: 
PIPELINING SYSTEM:
Pipelining is an implementation technique where multiple instructions are overlapped in execution. It has a high throughput (amount of instructions executed per unit time). In pipelining, many instructions are executed at the same time and execution is completed in fewer cycles. The pipeline is filled by the CPU scheduler from a pool of work which is waiting to occur. Each execution unit has a pipeline associated with it, so as to have work pre-planned. The efficiency of pipelining system depends upon the effectiveness of CPU scheduler.
NON- PIPELINING SYSTEM:
All the actions (fetching, decoding, executing of instructions and writing the results into the memory) are grouped into a single step. It has a low throughput.
Only one instruction is executed per unit time and execution process requires more number of cycles. The CPU scheduler in the case of non-pipelining system merely chooses from the pool of waiting work when an execution unit gives a signal that it is free. It is not dependent on CPU scheduler.
Question 21

An instruction pipeline consists of 4 stages: Fetch(F), Decode operand field (D), Execute (E), and Result-Write (W). The five instructions in a certain instruction sequence need these stages for the different number of clock cycles as shown by the table below.

Find the number of clock cycles needed to perform the 5 instructions.

A
Theory Explanation.
       Computer-Organization       Pipelining       GATE 1999
Question 22

Consider a non-pipelined processor operating at 2.5 GHz. It takes 5 clock cycles to complete an instruction. You are going to make a 5-stage pipeline out of this processor. Overheads associated with pipelining force you to operate the pipelined processor at 2 GHz. In a given program, assume that 30% are memory instructions, 60% are ALU instructions and the rest are branch instructions. 5% of the memory instructions cause stalls of 50 clock cycles each due to cache misses and 50% of the branch instructions cause stalls of 2 cycles each. Assume that there are no stalls associated with the execution of ALU instructions. For this program, the speedup achieved by the pipelined processor over the non-pipelined processor (round off to 2 decimal places) is _____.

A
2.16
       Computer-Organization       Pipelining       GATE 2020
Question 22 Explanation: 
In the non-pipelined architecture the clock cycle time = 1/(2.5)G = 0.4 ns
It is given that each instruction takes 5 clock cycles to execute in the non-pipelined architecture, so time taken to execute each instruction = 5 * 0.4 = 2ns
In the pipelined architecture the clock cycle time = 1/2G = 0.5 ns
In the pipelined architecture there are stalls due to memory instructions and branch instructions.
In the pipeline, the updated clocks per instruction CPI = (1 + stall frequency due to memory operations * stalls of memory instructions + stall frequency due to branch operations * stalls due to branch instructions)
Out of the total instructions , 30% are memory instructions. Out of those 30%, only 5% cause stalls of 50 cycles each.
Stalls per instruction due to memory operations = 0.3*0.05*50 = 0.75
Out of the total instructions 10% are branch instructions. Out of those 10% of instructions 50% of them cause stalls of 2 cycles each.
Stalls per instruction due to branch operations = 0.1*0.5*2 = 0.1
The updated CPI in pipeline = 1 + 0.75 + 0.1 = 1.85
The execution time in the pipeline = 1.85 * 0.5 = 0.925 ns
The speed up = Time in non-pipelined architecture / Time in pipelined architecture = 2 / 0.925 = 2.16
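The CPI-with-stalls calculation generalizes to any mix of stall sources. Here is a minimal Python sketch; the function `pipelined_speedup` and the tuple format are assumptions made for illustration, with an ideal pipelined CPI of 1.

```python
def pipelined_speedup(nonpipe_freq_ghz, nonpipe_cpi, pipe_freq_ghz, stall_sources):
    """stall_sources: (fraction of instructions, fraction that stall, stall cycles)."""
    # Time per instruction in ns = CPI / frequency in GHz.
    nonpipe_time_ns = nonpipe_cpi / nonpipe_freq_ghz
    # Ideal pipelined CPI is 1; every stall source adds its expected stall cycles.
    pipe_cpi = 1 + sum(share * stall_rate * cycles
                       for share, stall_rate, cycles in stall_sources)
    pipe_time_ns = pipe_cpi / pipe_freq_ghz
    return nonpipe_time_ns / pipe_time_ns


speedup = pipelined_speedup(
    2.5, 5, 2.0,
    [(0.30, 0.05, 50),   # memory instructions: 5% stall for 50 cycles
     (0.10, 0.50, 2)],   # branch instructions: 50% stall for 2 cycles
)
print(round(speedup, 2))  # 2.16
```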
Question 23

A non-pipelined single cycle processor operating at 100 MHz is converted into a synchronous pipelined processor with five stages requiring 2.5 nsec, 1.5 nsec, 2 nsec, 1.5 nsec and 2.5 nsec, respectively. The delay of the latches is 0.5 nsec. The speedup of the pipeline processor for a large number of instructions is

A
4.5
B
4.0
C
3.33
D
3.0
       Computer-Organization       Pipelining       GATE 2008-IT
Question 23 Explanation: 
For the non-pipelined system, time per instruction = 2.5 + 1.5 + 2.0 + 1.5 + 2.5 = 10 ns (one cycle at 100 MHz)
For the pipelined system, cycle time = max(stage delay) + latch delay = 2.5 + 0.5 = 3.0 ns
Speedup = Time in non-pipelined system / Time in pipelined system = 10/3 = 3.33
Question 24

A processor takes 12 cycles to complete an instruction I. The corresponding pipelined processor uses 6 stages with the execution times of 3, 2, 5, 4, 6 and 2 cycles respectively. What is the asymptotic speedup assuming that a very large number of instructions are to be executed?

A
1.83
B
2
C
3
D
6
       Computer-Organization       Pipelining       GATE 2007-IT
Question 24 Explanation: 
Let there be n instructions.
For a non-pipelined processor each instruction takes 12 cycles.
So for n instructions total execution time be 12 × n = 12n
For a pipelined processor each instruction takes
max (3, 2, 5, 4, 6, 2) = 6
So for n instructions total execution time be,
(1×6 + (n-1) × 1) × 6
= (6 + n - 1) × 6
= (5 + n) × 6
= 30 + 6n
∴ Speedup = time without pipeline / time with pipeline = 12n / (30 + 6n)
So, if n is very large, the constant 30 becomes negligible and the speedup approaches 12n / 6n = 2.
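The behaviour of 12n / (30 + 6n) for growing n can be checked numerically. The sketch below is illustrative only; the function name `speedup` and its default arguments are taken from this question's numbers.

```python
def speedup(n, nonpipe_cycles=12, stage_cycles=(3, 2, 5, 4, 6, 2)):
    """Speedup of the pipelined processor over the non-pipelined one for n instructions."""
    slot = max(stage_cycles)          # every pipeline slot lasts as long as the slowest stage
    k = len(stage_cycles)
    pipe_time = (k + n - 1) * slot    # equals 30 + 6n for this pipeline
    return nonpipe_cycles * n / pipe_time


# The ratio climbs toward 2 as n grows.
for n in (1, 10, 1000, 10**6):
    print(n, round(speedup(n), 4))
```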
Question 25

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement X = (S - R * (P + Q))/T is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

The number of Read-After-Write (RAW) dependencies, Write-After-Read( WAR) dependencies, and Write-After-Write (WAW) dependencies in the sequence of instructions are, respectively,

A
2, 2, 4
B
3, 2, 3
C
4, 2, 2
D
3, 3, 2
       Computer-Organization       Pipelining       GATE 2006-IT
Question 25 Explanation: 
RAW:
I1 - I2 (R5)
I2 - I3 (R6)
I3 - I4 (R5)
I4 - I5 (R6)
WAR:
I2 - I3 (R5)
I3 - I4 (R6)
WAW:
I1 - I3 (R5)
I3 - I4 (R6)
Question 26

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement X = (S - R * (P + Q))/T is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction

The IF, ID and WB stages take 1 clock cycle each. The EX stage takes 1 clock cycle each for the ADD, SUB and STORE operations, and 3 clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is

A
10
B
12
C
14
D
16
       Computer-Organization       Pipelining       GATE 2006-IT
Question 26 Explanation: 
Question 27
A five-stage pipeline has stage delays of 150, 120, 150, 160 and 140 nanoseconds. The registers that are used between the pipeline stages have a delay of 5 nanoseconds each. The total time to execute 100 independent instructions on this pipeline, assuming there are no pipeline stalls, is ______ nanoseconds.
A
17160
       Computer-Organization       Pipelining       GATE 2021 CS-Set-1
Question 27 Explanation: 

In a pipeline with k-stages, number of cycles to execute n instructions = (k+n-1) cycles

Here k = 5, n = 100

So we need a total of 5+100-1 = 104 cycles.

Clock cycle time = maximum of all stage delays + register delay

                           = max(150, 120, 150, 160, 140) + 5 = 160+5 = 165 ns

 

Time in ns = 104*165 = 17160ns

Question 28
Consider a pipelined processor with 5 stages, Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). Each stage of the pipeline, except the EX stage, takes one cycle. Assume that the ID stage merely decodes the instruction and the register read is performed in the EX stage. The EX stage takes one cycle for ADD instruction and two cycles for MUL instruction.
Consider the following sequence of 8 instructions:
              ADD, MUL, ADD, MUL, ADD, MUL, ADD, MUL
Assume that every MUL instruction is data-dependent on the ADD instruction just before it and every ADD instruction (except the first ADD) is data-dependent on the MUL instruction just before it. The Speedup is defined as follows:

The Speedup achieved in executing the given instruction sequence on the pipelined processor (rounded to 2 decimal places) is ________.
A
1.875
       Computer-Organization       Pipelining       GATE 2021 CS-Set-2
Question 28 Explanation: 

Question 29
The performance of a pipelined processor suffers if
A
the pipeline stages have different delays
B
consecutive instructions are dependent on each other
C
the pipeline stages share hardware resources
D
All of the above
       Computer-Organization       Pipelining       ISRO CS 2008
Question 29 Explanation: 
1. Pipelining is one way of improving the overall processing performance of a processor.
2. This architectural approach allows the simultaneous execution of several instructions.
3. Pipelining is transparent to the programmer; it exploits parallelism at the instruction level by overlapping the execution process of instructions.
4. It is analogous to an assembly line where workers perform a specific task and pass the partially completed product to the next worker.
Question 30
Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speedup achieved in this pipelined processor is
A
3.2
B
3.0
C
2.2
D
2.0
       Computer-Organization       Pipelining       ISRO-2016
Question 30 Explanation: 
→ Given that the processor clock rate = 2.5 GHz, the processor takes 2.5 G cycles in one second.
→ Time taken to complete one cycle = (1 / 2.5 G) seconds
→ Since it is given that average number of cycles per instruction = 4, the time taken for completing one instruction=(4/2.5 G) = 1.6 ns
→ In the pipelined case we know in the ideal case CPI = 1, and the clock speed = 2 GHz.
→ Time taken for one instruction in the pipelined case = (1 / 2 G) = 0.5 ns
→ Speedup = 1.6/0.5 = 3.2
Question 31
Register renaming is done in pipelined processors
A
as an alternative to register allocation at compile time
B
for efficient access to function parameters and local variables
C
to handle certain kinds of hazards
D
as part of address translation
       Computer-Organization       Pipelining       ISRO-2016
Question 31 Explanation: 
→ Register renaming is used to eliminate hazards that arise due to WAR (Write After Read) and WAW(Write After Write) dependencies.
Question 32
Consider a pipelined processor with the following four stages:
IF: Instruction Fetch
ID: Instruction Decode and Operand Fetch
EX: Execute
WB: Write Back
The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?
ADD   R2, R1, R0    R2 ← R1 + R0
MUL   R4, R3, R2    R4 ← R3 * R2
SUB   R6, R5, R4    R6 ← R5 - R4
A
7
B
8
C
10
D
14
       Computer-Organization       Pipelining       ISRO CS 2009
Question 32 Explanation: 
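Since operand forwarding is used, we assume forwarding from the EX stage of one instruction to the EX stage of the next.
ADD: IF in cycle 1, ID in cycle 2, EX in cycle 3, WB in cycle 4.
MUL: IF in cycle 2, ID in cycle 3, EX in cycles 4 to 6, WB in cycle 7.
SUB: IF in cycle 3, ID in cycle 4, EX in cycle 7 (it waits for R4 from MUL), WB in cycle 8.
So the sequence completes in 8 clock cycles.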
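The timing for this question can be sketched as a small simulation. This is a simplified model, not a general pipeline simulator: it assumes a single in-order EX unit, one cycle each for IF, ID and WB, EX-to-EX forwarding, and that instruction i is fetched in cycle i + 1. The function name `cycles_with_forwarding` is hypothetical.

```python
def cycles_with_forwarding(ex_latencies):
    """Total cycles for an in-order IF-ID-EX-WB pipeline with one EX unit.

    ex_latencies[i] is the number of EX cycles of instruction i.
    With EX-to-EX forwarding, a dependent instruction can start EX in the
    cycle right after its producer finishes EX.
    """
    prev_ex_end = 0
    last_wb = 0
    for i, ex_latency in enumerate(ex_latencies):
        id_cycle = i + 2  # fetched in cycle i + 1, decoded one cycle later
        # EX starts after ID and after the previous instruction leaves EX.
        ex_start = max(id_cycle + 1, prev_ex_end + 1)
        prev_ex_end = ex_start + ex_latency - 1
        last_wb = prev_ex_end + 1  # WB takes the cycle after EX ends
    return last_wb


# ADD (1 EX cycle), MUL (3 EX cycles), SUB (1 EX cycle)
print(cycles_with_forwarding([1, 3, 1]))  # 8
```

Under these assumptions the ADD, MUL, SUB sequence finishes in 8 cycles, while three single-cycle instructions would finish in k + n − 1 = 6 cycles.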
Question 33
The use of multiple register windows with overlap causes a reduction in the number of memory accesses for
  I. Function locals and parameters
 II. Register saves and restores
III. Instruction fetches
A
I only
B
II only
C
III only
D
I, II and III
       Computer-Organization       Pipelining       ISRO CS 2009
Question 33 Explanation: 
→ I is true because when we make a function call there are some input registers and some output registers. If function F() calls function G(), the caller F()'s output registers can be made the same as the called procedure G()'s input registers. This is done using overlapping register windows. It reduces memory accesses, because F()'s output need not be written to memory for G() to read back from memory.
→ II is false as register saves and restores would still be required for each and every variable.
→ III is also false as instruction fetch is not affected by memory access using multiple register windows.
Question 34
A pipeline P operating at 400 MHz has a speedup factor of 6 and operates at 70% efficiency. How many stages are there in the pipeline?
A
5
B
6
C
8
D
9
       Computer-Organization       Pipelining       ISRO CS 2013
Question 34 Explanation: 
Given data:
Speedup factor = 6
Efficiency = 70% = 0.7
Step-1: We have to find the number of stages.
Efficiency = Speedup factor / Number of stages
0.7 = 6 / Number of stages
Step-2: Number of stages = 6 / 0.7 ≈ 8.57
Rounding up to a whole number of stages gives 9.
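The relation Efficiency = Speedup / Number of stages can be turned into a small helper. This is a sketch only; `pipeline_stages` is a hypothetical name, and rounding up to the next whole stage is the convention used in this explanation.

```python
import math


def pipeline_stages(speedup: float, efficiency: float) -> int:
    """Number of stages k from efficiency = speedup / k, rounded up."""
    # round() guards against tiny floating-point overshoot before ceil().
    return math.ceil(round(speedup / efficiency, 9))


print(pipeline_stages(6, 0.7))   # 9
print(pipeline_stages(10, 0.8))  # 13
```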
Question 35
Comparing the time T1 taken for a single instruction on a pipelined CPU with the time T2 taken on a non-pipelined but identical CPU, we can say that ___?
A
T1=T2
B
T1>T2
C
T1<T2
D
T1 is T2 plus time taken for one instruction fetch cycle
       Computer-Organization       Pipelining       Nielit Scientist-B CS 22-07-2017
Question 35 Explanation: 
PIPELINING SYSTEM:
Pipelining is an implementation technique where multiple instructions are overlapped in execution. It has a high throughput (number of instructions executed per unit time). In pipelining, many instructions are in execution at the same time, so a sequence of instructions completes in fewer cycles. The pipeline is filled by the CPU scheduler from a pool of work which is waiting to occur. Each execution unit has a pipeline associated with it, so as to have work pre-planned. The efficiency of a pipelining system depends upon the effectiveness of the CPU scheduler.
NON-PIPELINING SYSTEM:
All the actions (fetching, decoding, execution of instructions and writing the results into the memory) are grouped into a single step. It has a low throughput. Only one instruction is executed at a time, and executing a sequence of instructions requires more cycles. The CPU scheduler in the case of a non-pipelining system merely chooses from the pool of waiting work when an execution unit signals that it is free. Its efficiency does not depend on the CPU scheduler.
Question 36
A non-pipelined system takes 50ns to process a task. The same task can be processed in a six-segment pipeline with a clock cycle of 10ns. Determine the speedup ratio of the pipeline for 100 tasks. What is the maximum speedup that can be achieved?
A
4.90,5
B
4.76,5
C
3.90,5
D
4.30,5
       Computer-Organization       Pipelining       Nielit Scientist-B CS 4-12-2016
Question 36 Explanation: 
Speedup ratio (S):
It is defined as the speedup of pipeline processing with respect to the equivalent non-pipeline processing.
S = n*tn / ((k + n − 1)*tp)
Number of tasks n = 100
For non-pipeline:
Time taken by non-pipeline to process a task, tn = 50ns
Total time taken by non-pipeline to process 100 tasks = n*tn
= 100 × 50
= 5000ns
For pipeline:
Number of pipeline segments k = 6
Time period of 1 clock cycle, tp = 10ns
Total time required to complete n tasks in a k-segment pipeline with clock cycle time tp:
= (k + n − 1)*tp
= (6 + 100 − 1) × 10
= 1050ns
Speedup ratio:
The speedup ratio is the total time taken by the non-pipelined system to process 100 tasks divided by the total time taken by the k-segment pipeline.
S = 5000/1050
= 4.76
Maximum speedup:
As n grows large, S approaches tn/tp = 50/10 = 5.
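The speedup formula used above, S = n*tn / ((k + n − 1)*tp), is easy to check numerically. The sketch below uses hypothetical function names (`pipeline_speedup`, `max_speedup`) and assumes no stalls; the maximum speedup is taken as the limit tn/tp for a very large number of tasks.

```python
def pipeline_speedup(n: int, k: int, tn: float, tp: float) -> float:
    """Speedup of a k-segment pipeline over non-pipelined execution.

    n  = number of tasks
    tn = time per task without pipelining
    tp = pipeline clock cycle time
    """
    return (n * tn) / ((k + n - 1) * tp)


def max_speedup(tn: float, tp: float) -> float:
    """Limit of the speedup as n grows without bound."""
    return tn / tp


print(round(pipeline_speedup(n=100, k=6, tn=50, tp=10), 2))  # 4.76
print(max_speedup(tn=50, tp=10))  # 5.0
```

The same function reproduces the other speedup questions on this page that use this formula, for example n = 100, k = 4, tn = 30, tp = 10 gives about 2.91.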
Question 37
A pipeline has a speedup factor of 10 and operates with an efficiency of 80%. What will be the number of stages in the pipeline?
A
10
B
8
C
13
D
None
       Computer-Organization       Pipelining       Nielit Scientific Assistance IT 15-10-2017
Question 37 Explanation: 
Efficiency of a pipeline (Ek) = Speedup / No. of stages
No. of stages = Speedup / Ek = 10/0.8 = 12.5, which rounds up to 13
Question 38
Pipelining improves performance by:
A
decreasing instruction latency
B
eliminating data hazards
C
exploiting instruction level parallelism
D
decreasing the cache miss rate
       Computer-Organization       Pipelining       UGC NET CS 2016 July- paper-2
Question 38 Explanation: 
→ Pipelining improves performance by exploiting instruction level parallelism.
→ Instruction pipelining is a technique for implementing instruction-level parallelism within a single processor.
→ Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps (the eponymous "pipeline") performed by different processor units with different parts of instructions processed in parallel.
→ It allows faster CPU throughput than would otherwise be possible at a given clock rate, but may increase latency due to the added overhead of the pipelining process itself.
Question 39
In the case of parallelization, Amdahl’s law states that if P is the proportion of a program that can be made parallel and (1 -P) is the proportion that cannot be parallelized, then the maximum speed-up that can be achieved by using N processors is:
A
1/((1−p)+ N .P)
B
1/((N −1)P +P)
C
1/((1−P )+ P /N)
D
1/((P)+(1-P)/N)
       Computer-Organization       Pipelining       UGC NET CS 2015 Jun- paper-2
Question 39 Explanation: 
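Amdahl’s law can be formulated in the following way:
Slatency(s) = 1 / ((1 − p) + p/s)
where
● Slatency is the theoretical speedup of the execution of the whole task;
● s is the speedup of the part of the task that benefits from improved system resources;
● p is the proportion of execution time that the part benefiting from improved resources originally occupied.
Here the parallel part runs on N processors, so s = N and p = P, which gives 1/((1 − P) + P/N), i.e. option C.
Furthermore,
Slatency(s) ≤ 1/(1 − p), and Slatency(s) → 1/(1 − p) as s → ∞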
shows that the theoretical speedup of the execution of the whole task increases with the improvement of the resources of the system and that regardless of the magnitude of the improvement, the theoretical speedup is always limited by the part of the task that cannot benefit from the improvement.
→ Amdahl's law applies only to the cases where the problem size is fixed. In practice, as more computing resources become available, they tend to get used on larger problems (larger datasets), and the time spent in the parallelizable part often grows much faster than the inherently serial work. In this case, Gustafson's law gives a less pessimistic and more realistic assessment of the parallel performance.
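As a quick numerical illustration of Amdahl's law in the form 1/((1 − P) + P/N), the sketch below evaluates the speedup for a few processor counts. The function name `amdahl_speedup` and the example value P = 0.9 are assumptions chosen only for illustration.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup with n processors when a fraction p is parallelizable."""
    return 1.0 / ((1.0 - p) + p / n)


# With 90% of the program parallelizable, the speedup can never exceed
# 1 / (1 - 0.9) = 10, no matter how many processors are used.
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```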
Question 40
The processing speeds of pipeline segments are usually :
A
Equal
B
Unequal
C
Greater
D
None of these
       Computer-Organization       Pipelining       UGC NET CS 2004 Dec-Paper-2
Question 40 Explanation: 
→ The processing speeds of pipeline segments are usually unequal.
→ Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps (the eponymous "pipeline") performed by different processor units with different parts of instructions processed in parallel.
Question 41
The speed up of a pipeline processing over an equivalent non-pipeline processing is defined by the ratio:
where n → no. of tasks, tn → time of completion of each task, k → no. of segments of pipeline, tp → clock cycle time, S → speed up ratio
A
S =n tn/(k + n – 1)tp
B
S =n tn/(k + n + 1)tp
C
S =n tn/(k – n + 1)tp
D
S =(k + n – 1)tp/n tn
       Computer-Organization       Pipelining       UGC NET CS 2013 Sep-paper-2
Question 41 Explanation: 
Without pipeline, one task needs tn time. So, n tasks need n*tn time.
T without pipeline = n*tn
With pipeline: the first task needs k cycles to finish, so the time for the 1st task is k*tp.
Each of the other n-1 tasks needs only tp more time to finish.
So, T pipeline = (k+n-1)*tp
Speed up = T without pipeline / T pipeline = n*tn / ((k+n-1)*tp)
Question 42
The branch logic that provides decision-making capabilities in the control unit is known as
A
Controlled transfer
B
Conditional transfer
C
Unconditional transfer
D
None of the above
       Computer-Organization       Pipelining       UGC NET CS 2012 June-Paper2
Question 42 Explanation: 
The branch logic that provides decision-making capabilities in the control unit is known as controlled transfer.
Question 43
Pipelining strategy is called implement
A
instruction execution
B
instruction prefetch
C
instruction decoding
D
instruction manipulation
       Computer-Organization       Pipelining       UGC NET CS 2012 June-Paper2
Question 43 Explanation: 
→ Instruction prefetch is a technique used in central processor units to speed up the execution of a program by reducing wait states.
→ This pipelining strategy is therefore called instruction prefetch.
Question 44
Amdahl’s law states that the maximum speedup S achievable by a parallel computer with ‘p’ processors is given by :
A
S≤f+(1-f)/p
B
S≤f/p+(1-f)
C
S≤1/[f+(1-f)/p]
D
S≤1/[1-f+f/p]
       Computer-Organization       Pipelining       UGC NET CS 2008-june-Paper-2
Question 44 Explanation: 
Amdahl’s law can be formulated in the following way:
Slatency(s) = 1 / ((1 − p) + p/s)
where
→ Slatency is the theoretical speedup of the execution of the whole task;
→ s is the speedup of the part of the task that benefits from improved system resources;
→ p is the proportion of execution time that the part benefiting from improved resources originally occupied.
Furthermore,
Slatency(s) ≤ 1/(1 − p), and Slatency(s) → 1/(1 − p) as s → ∞
shows that the theoretical speedup of the execution of the whole task increases with the improvement of the resources of the system and that regardless of the magnitude of the improvement, the theoretical speedup is always limited by the part of the task that cannot benefit from the improvement.
→ Amdahl's law applies only to the cases where the problem size is fixed. In practice, as more computing resources become available, they tend to get used on larger problems (larger datasets), and the time spent in the parallelizable part often grows much faster than the inherently serial work. In this case, Gustafson's law gives a less pessimistic and more realistic assessment of the parallel performance.
Question 45
A Non-pipelined system takes 30ns to process a task. The same task can be processed in a four-segment pipeline with a clock cycle of 10ns. Determine the speed up of the pipeline for 100 tasks
A
3
B
4
C
3.91
D
2.91
       Computer-Organization       Pipelining       UGC-NET DEC-2019 Part-2
Question 45 Explanation: 
S = n*tn / ((n+k-1)*tp)
= (100*30) / ((100+4-1)*10)
= 3000 / 1030
= 2.91
Question 46
A non-pipelined CPU has 12 general purpose registers(R0, R1, R2,........R12). Following operations are supported
ADD Ra, Rb, Rr Add Ra to Rb and store the result in Rr
MUL Ra, Rb, Rr Multiply Ra to Rb and store the result in Rr
MUL operations takes two clock cycles, ADD takes one clock cycle.
Calculate the minimum number of clock cycles required to compute the value of the expression XY+XYZ+YZ. The variables X, Y, Z are initially available in registers R0, R1 and R2 contents of these registers must not be modified.
A
5
B
6
C
7
D
8
       Computer-Organization       Pipelining       ISRO CS 2020
Question 46 Explanation: 
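Computing XY+XYZ+YZ term by term needs 3 multiplication operations and 2 ADD operations, i.e. 3*2+2 = 8 cycles, but this is not the minimum.
The expression can be factored as XY+XYZ+YZ = Y(X+Z+XZ), which needs only 2 multiplication operations and 2 ADD operations.
Assuming R0 = X, R1 = Y, R2 = Z, with the result register written last as in the instruction format above:
MUL R0, R2, R3    (R3 = XZ)
ADD R0, R2, R4    (R4 = X+Z)
ADD R4, R3, R5    (R5 = X+Z+XZ)
MUL R1, R5, R6    (R6 = Y(X+Z+XZ))
R0, R1 and R2 are not modified.
Total cycles needed = 2*2+2*1 = 6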
Question 47
One instruction tries to write an operand before it is written by previous instruction. This may lead to a dependency called
A
True dependency
B
Anti dependency
C
Output dependency
D
Control hazard
       Computer-Organization       Pipelining       ISRO CS 2020
Question 47 Explanation: 
When two instructions update the same operand and changing the order of the instructions changes the final value, this is a Write After Write (WAW) dependency, which is also called an output dependency.
Question 48
Consider a 5-segment pipeline with a clock cycle time of 20ns in each sub-operation. Find the approximate speed-up ratio between the pipelined and non-pipelined systems to execute 100 instructions. (Assume that, on average, every five cycles a bubble due to a data hazard has to be introduced in the pipeline.)
A
5
B
4.03
C
4.81
D
4.17
       Computer-Organization       Pipelining       ISRO CS 2020
Question 48 Explanation: 
A 5 segment pipeline with clock cycle time of 20ns.
Time without pipeline for one instruction = 5*20 = 100ns.
For 100 instructions time without pipeline = 100*100ns = 10^4 ns.
With pipeline, it is given that on average for every 5 cycles there is one bubble or stall cycle which is introduced. So stall frequency = ⅕.
In general in a pipeline we have clocks per instruction CPI = 1. But when there are stall cycles, the updated CPI = (1 + stall frequency * no. of stall cycles) = (1+⅕*1) = 1.2
Execution time with the pipeline = no. of instructions * CPI * clock cycle time.
= 100 * 1.2 * 20 = 2400ns
Speed up = Time without pipeline / Time with pipeline
= 10^4 / 2400 = 100/24 = 4.17
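The stall-adjusted CPI calculation above can be written as a short sketch. The function name `speedup_with_stalls` is hypothetical; like the explanation, it ignores the pipeline fill time and assumes each stall costs a fixed number of cycles.

```python
def speedup_with_stalls(n, stages, cycle_ns, stall_frequency, stall_cycles=1):
    """Approximate speedup of a pipeline with stalls over non-pipelined execution."""
    # Without pipelining, each instruction passes through every stage in turn.
    non_pipelined_ns = n * stages * cycle_ns
    # With pipelining, the ideal CPI of 1 grows by the average stall cycles.
    cpi = 1 + stall_frequency * stall_cycles
    pipelined_ns = n * cpi * cycle_ns
    return non_pipelined_ns / pipelined_ns


print(round(speedup_with_stalls(100, stages=5, cycle_ns=20, stall_frequency=1 / 5), 2))  # 4.17
```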
Question 49

Answer the following two questions based on the following information:

Suppose a processor executes instructions in the following 4 stages (no pipeline): IF&ID, EX, MEM and WB. The IF&ID stage takes 10ns, the EX stage 5ns, the MEM stage 20ns and the WB stage 5ns to finish.

What is the average time per instruction for the unpipelined implementation
A
20ns
B
10ns
C
40ns
D
5ns
       Computer-Organization       Pipelining       HCU PHD CS 2018 December
Question 49 Explanation: 
Unpipelined = 10+5+20+5 = 40ns/instruction
Question 50

Answer the following two questions based on the following information:

Suppose a processor executes instructions in the following 4 stages (no pipeline): IF&ID, EX, MEM and WB. The IF&ID stage takes 10ns, the EX stage 5ns, the MEM stage 20ns and the WB stage 5ns to finish.

What is approximately the average time per instruction for the 4 stage pipelined implementation
A
20ns
B
10ns
C
40ns
D
5ns
       Computer-Organization       Pipelining       HCU PHD CS 2018 December
Question 50 Explanation: 
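Pipelined = 20ns/instruction (assuming no stalls). The clock cycle is set by the slowest stage (MEM, 20ns), so every stage is given 20ns and one instruction completes every 20ns.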
Question 51

What is approximately the average time per instruction for the 4 stage pipelined implementation
A
20ns
B
10ns
C
40ns
D
5ns
       Computer-Organization       Pipelining       HCU PHD CS 2018 December
Question 51 Explanation: 
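Pipelined = 20ns/instruction (assuming no stalls). The clock cycle is set by the slowest stage (MEM, 20ns), so every stage is given 20ns and one instruction completes every 20ns.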
Question 52
A non-pipeline system takes 40ns to process a task. The same task can be processed in a 6-segment pipeline with a clock cycle of 10ns. Determine the speed up ratio of the pipeline for 100 tasks.
A
4.8
B
5.1
C
4.76
D
4.92
E
None of the given option is correct.
       Computer-Organization       Pipelining       HCU PHD CS 2018 June
Question 52 Explanation: 
Time taken without pipeline to complete 100 tasks,
100 × 40ns = 4000 ns
Time taken with pipeline to complete 100 tasks,
1 × 6 × 10ns + 99 × 1 × 10ns
= 60 + 990
= 1050 ns
∴ Speedup ratio = 4000ns/1050ns ≈ 3.81, which matches none of options A to D, so the answer is E.
Question 53
Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, ..., I12 is executed in this instruction pipeline. The time (in ns) needed to complete the program is
A
132
B
165
C
176
D
32
       Computer-Organization       Pipelining       HCU PHD CS 2018 June
Question 53 Explanation: 
Cycle time = maximum stage delay out of given five stage delay + buffer delay
= 10ns + 1ns
= 11ns
The first instruction takes five cycles, and each of the remaining 11 instructions takes 1 more cycle to complete.
So the total number of cycles required to execute all 12 instructions is
1 × 5 + 11 × 1 = 16
Hence total execution time
= No. of cycles × cycle time
= 16 × 11 ns
= 176 ns
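The two steps above (cycle time from the slowest stage plus buffer delay, then k + n − 1 cycles) can be combined in a small sketch. The function name `pipeline_time_ns` is hypothetical, and the model assumes no stalls or branches.

```python
def pipeline_time_ns(stage_delays, buffer_delay, n):
    """Time to run n instructions through a pipeline with no stalls.

    Cycle time = slowest stage delay + buffer delay.
    Cycles     = k + n - 1, where k is the number of stages.
    """
    cycle_time = max(stage_delays) + buffer_delay
    cycles = len(stage_delays) + n - 1
    return cycles * cycle_time


print(pipeline_time_ns([5, 7, 10, 8, 6], buffer_delay=1, n=12))  # 176
```

The same helper reproduces Question 1 on this page: 528 ns for the naive pipeline and 350 ns for the efficient one, for a speedup of about 1.51.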
Question 54

Two processors, M-5 and M-7, implement the same instruction set. Processor M-5 uses a 5-stage pipeline and a clock cycle of 10 nanoseconds. Processor M-7 uses a 7-stage pipeline and a clock cycle of 7.5 nanoseconds. Which of the following are true?

I. M-7’s pipeline has better maximum throughput than M-5's pipeline

II. The latency of a single instruction is shorter on M-7's pipeline than on M-5's pipeline.

III. Programs executing on M-7 will always run faster than programs executing on M-5.
A
I only
B
II only
C
I and III only
D
II and III only
       Computer-Organization       Pipelining       HCU PHD CS MAY 2016
Question 54 Explanation: 
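→ S1 is True. Throughput is the number of instructions executed per second.
Assuming an ideal pipeline with CPI = 1, one instruction completes every clock cycle:
Throughput of M-5 = 1 / 10 ns = 100 million instructions per second
Throughput of M-7 = 1 / 7.5 ns ≈ 133.3 million instructions per second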
→ S2 is False.
The latency of single instruction in M-7 is,
7.5 × 7 = 52.5 ns
The latency of single instruction in M-5 is,
10 × 5 = 50 ns
So, latency of single instruction is longer on M-7’s pipeline than on M-5’s pipeline.

→ S3 is False. Because if the program contains only single instruction then M-5 will run faster than M-7 as we have seen in S2.
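The throughput and latency comparison can be sketched as follows. The helper names `throughput_mips` and `latency_ns` are hypothetical; the sketch assumes an ideal pipeline with CPI = 1 and no stalls.

```python
def throughput_mips(cycle_ns: float) -> float:
    """Peak throughput in millions of instructions per second, assuming CPI = 1."""
    return 1000.0 / cycle_ns


def latency_ns(stages: int, cycle_ns: float) -> float:
    """Time for a single instruction to pass through every stage."""
    return stages * cycle_ns


for name, stages, cycle_ns in [("M-5", 5, 10.0), ("M-7", 7, 7.5)]:
    print(name, round(throughput_mips(cycle_ns), 1), "MIPS,",
          latency_ns(stages, cycle_ns), "ns latency")
```

M-7 comes out ahead on throughput but behind on single-instruction latency, which is why statement I holds while II and III do not.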
Question 55

In a pipelined RISC computer where all arithmetic instructions have the same cycles per instruction which of the following actions would improve the execution time of a typical program ?

I. increasing the clock cycle rate

II. Disallowing any forwarding in the pipeline

III. Doubling the sizes of the instruction cache and the data cache without changing the clock cycle time
A
I only
B
III only
C
I and II only
D
I and III only
       Computer-Organization       Pipelining       HCU PHD CS MAY 2016
Question 55 Explanation: 
(I) is true because a higher clock rate shortens every clock cycle.
(II) is false because forwarding in the pipeline reduces stalls due to data hazards, so disallowing it can only increase the execution time.
(III) is true because larger caches can hold more instructions and data, which lowers the cache miss rate.
Question 56
A non-pipeline system takes 50ns to process a task. The same task can be processed in a six-segment pipeline with a clock cycle of 10ns. Determine approximately the speedup ratio of the pipeline for 500 tasks.
A
6
B
4.95
C
5.7
D
5.5
       Computer-Organization       Pipelining       UGC NET JRF November 2020 Paper-2
Question 56 Explanation: 
Time required to execute 500 tasks without pipelining = 500*50 ns = 25000 ns
Time required to execute 500 tasks with 6-stage pipelining = (1*6*10)+(499*1*10)
= 60+4990
= 5050 ns
Speedup ratio = 25000/5050 ≈ 4.95
Question 57
Which of the following statements with respect to k-segment pipelining are true?
A) Maximum speedup that a pipeline can provide is k theoretically.
B) It is impossible to achieve maximum speedup k in the k-segment pipeline.
C) All segments in the pipeline take the same time in computation.
Choose the correct answer from the options given below:
A
(A) and (B) only
B
(B) and (C) only
C
(A) and (C) only
D
(A), (B) and (C)
       Computer-Organization       Pipelining       UGC NET JRF November 2020 Paper-2
Question 57 Explanation: 
Pipelining : overlapping execution – Parallelism improves performance.
The aim of pipelining is to approach an average of one clock cycle per instruction (CPI = 1), which improves throughput.
Statement A is true because
S = n*tn / ((k + n − 1)*tp); with tn = k*tp, S = n*k / (k + n − 1), which approaches k as n becomes large.
This shows that the theoretical maximum speedup that a pipeline can provide is k, where k is the number of segments in the pipeline.

Statement B is true because
k*efficiency= speedup.
Speedup cannot equal k unless efficiency becomes 1, and in a real environment efficiency cannot reach 1 for many reasons, such as delay at intermediate buffers and different times taken by different segments to perform their sub-operations, which causes the other segments to waste time while waiting for the next clock.
Statement C is wrong because different segments can take different times to complete their sub operations.
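Statements A and B can be illustrated numerically. The sketch below assumes the idealized case tn = k*tp (a non-pipelined task takes exactly k pipeline cycles), so the speedup reduces to n*k / (k + n − 1); the function name `ideal_speedup` is hypothetical.

```python
def ideal_speedup(n: int, k: int) -> float:
    """Speedup n*tn / ((k + n - 1)*tp) under the assumption tn = k*tp."""
    return (n * k) / (k + n - 1)


# The speedup climbs towards k = 5 as n grows, but never reaches it
# for any finite number of tasks.
for n in (1, 10, 100, 1_000_000):
    print(n, round(ideal_speedup(n, k=5), 4))
```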