Computer-Organization

Question 1

The chip select logic for a certain DRAM chip in a memory system design is shown below. Assume that the memory system has 16 address lines denoted by A15 to A0. What is the range of addresses (in hexadecimal) of the memory system that can get enabled by the chip select (CS) signal?

A
C800 to C8FF
B
C800 to CFFF
C
DA00 to DFFF
D
CA00 to CAFF
       Computer-Organization       DRAM       GATE 2019       Video-Explanation
Question 1 Explanation: 
With 16 address lines (A0 – A15), the total address space = 2^16 locations.
From the given figure, the chip select decodes the five high-order address lines (A11 – A15), i.e. 2^5 combinations.
So, total addressable locations enabled by CS = 2^16/2^5 = 2^11
2^11 = 2048, i.e. locations 0 to 2047 within the selected range.
∴ CFFF – C800 = 7FF (hex) = 2047
Answer is C800 to CFFF.
Question 2

A certain processor uses a fully associative cache of size 16 kB. The cache block size is 16 bytes. Assume that the main memory is byte addressable and uses a 32-bit address. How many bits are required for the Tag and the Index fields respectively in the addresses generated by the processor?

A
28 bits and 0 bits
B
24 bits and 4 bits
C
24 bits and 0 bits
D
28 bits and 4 bits
       Computer-Organization       Cache       GATE 2019       Video-Explanation
Question 2 Explanation: 
Since it is a fully associative cache, any block can be placed in any cache line, so there is no index field.
The block size is 16 bytes, so the block offset needs log2(16) = 4 bits.
Tag bits = Total address bits – Offset bits = 32 – 4 = 28
So, tag bits = 28, Index bits = 0
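The same field split can be checked with a quick Python sketch (the helper name and the hard-coded cache parameters are just illustrative; a fully associative cache is treated as a single set):

    import math

    def addr_split(addr_bits, cache_bytes, block_bytes, associativity=None):
        # associativity=None means fully associative: the whole cache is one set
        offset = int(math.log2(block_bytes))
        blocks = cache_bytes // block_bytes
        sets = 1 if associativity is None else blocks // associativity
        index = int(math.log2(sets))
        tag = addr_bits - index - offset
        return tag, index, offset

    print(addr_split(32, 16 * 1024, 16))   # (28, 0, 4) -> Tag = 28, Index = 0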
Question 3

A certain processor deploys a single-level cache. The cache block size is 8 words and the word size is 4 bytes. The memory system uses a 60-MHz clock. To service a cache miss, the memory controller first takes 1 cycle to accept the starting address of the block, it then takes 3 cycles to fetch all the eight words of the block, and finally transmits the words of the requested block at the rate of 1 word per cycle. The maximum bandwidth for the memory system when the program running on the processor issues a series of read operations is ______ × 10^6 bytes/sec.

A
160
B
145
C
172
D
124
       Computer-Organization       Cache       GATE 2019       Video-Explanation
Question 3 Explanation: 

Cache block = 8 words
Word size = 4 bytes
Cache block size = 32 bytes
Clock = 60 MHz
⇒ T = 1/clock = 1/(60×10^6) seconds
Cache miss
= 1 cycle (address) + 3 cycles (fetch the block) + 8 cycles (transfer, 1 word/cycle)
= 12 cycles
= 12/(60×10^6) seconds
Total bandwidth = total data/total time = 32 bytes/(12/(60×10^6)) = 160 × 10^6 bytes/second
Answer: 160
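The same arithmetic as a short Python check (the 1 + 3 + 8 cycle breakdown comes from the question; the variable names are just illustrative):

    clock_hz = 60e6
    block_bytes = 8 * 4                      # 8 words x 4 bytes
    cycles = 1 + 3 + 8                       # address + block fetch + 8 word transfers
    miss_time = cycles / clock_hz            # seconds per block
    bandwidth = block_bytes / miss_time      # bytes per second
    print(bandwidth / 1e6)                   # 160.0 (x 10^6 bytes/sec)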
Question 4

A 32-bit wide main memory unit with a capacity of 1 GB is built using 256M × 4-bit DRAM chips. The number of rows of memory cells in the DRAM chip is 2^14. The time taken to perform one refresh operation is 50 nanoseconds. The refresh period is 2 milliseconds. The percentage (rounded to the closest integer) of the time available for performing the memory read/write operations in the main memory unit is _________.

A
59%
B
60%
C
61%
D
62%
       Computer-Organization       Memory-Interfacing       GATE 2018       Video-Explanation
Question 4 Explanation: 
Time taken to refresh one row = 50 ns
There are 2^14 rows, so time taken to refresh all the rows = 2^14 × 50 ns = 819.2 µs ≈ 0.82 milliseconds
It is given that total refresh period is 2ms. The refresh period contains the time to refresh all the rows and also the time to perform read/write operation.
So % time spent in refresh = (Time taken to refresh all rows / refresh period)*100
= (0.82 ms / 2ms)*100
= 41%
So the % of time for read/write operation = 100 – 41 = 59%
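A minimal Python sketch of the same percentage calculation, using the numbers given in the question:

    rows = 2 ** 14
    refresh_per_row_ns = 50
    refresh_period_ms = 2.0

    refresh_time_ms = rows * refresh_per_row_ns / 1e6    # 0.8192 ms
    refresh_pct = refresh_time_ms / refresh_period_ms * 100
    print(round(100 - refresh_pct))                      # 59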
Question 5
The following are some events that occur after a device controller issues an interrupt while process L is under execution.
(P) The processor pushes the process status of L onto the control stack.
(Q) The processor finishes the execution of the current instruction.
(R) The processor executes the interrupt service routine.
(S) The processor pops the process status of L from the control stack.
(T) The processor loads the new PC value based on the interrupt.
A
QPTRS
B
PTRSQ
C
TRPQS
D
QTPRS
       Computer-Organization       Interrupt-Service-Routine       GATE 2018       Video-Explanation
Question 5 Explanation: 
The sequence of execution in the case of an interrupt while executing a process L is:
(Q) The processor finishes the execution of the current instruction
(P) The processor pushes the process status of L onto the control stack
(T) The processor loads the new PC value based on the interrupt
(R) The processor executes the interrupt service routine
(S) The processor pops the process status of L from the control stack
This is the sequence because, when an interrupt arrives while a process is executing, the current instruction is first completed, the process status is saved, the new PC value is loaded, and only then is the interrupt service routine executed.
Question 6
Consider the following processor design characteristics.
I. Register-to-register arithmetic operations only
II. Fixed-length instruction format
III. Hardwired control unit
Which of the characteristics above are used in the design of a RISC processor?
A
I and II only
B
II and III only
C
I and III only
D
I, II and III
       Computer-Organization       RISC-and-CISC       GATE 2018       Video-Explanation
Question 6 Explanation: 
RISC processor has all the three given characteristics.
So option D is the correct answer.
Question 7

The size of the physical address space of a processor is 2^P bytes. The word length is 2^W bytes. The capacity of cache memory is 2^N bytes. The size of each cache block is 2^M words. For a K-way set-associative cache memory, the length (in number of bits) of the tag field is

A
P – N – log₂K
B
P – N + log₂K
C
P – N – M – W – log₂K
D
P – N – M – W + log₂K
       Computer-Organization       Cache       GATE 2018       Video-Explanation
Question 7 Explanation: 
Physical memory is of size 2^P bytes.
Each word is of size 2^W bytes.
Number of words in physical memory = 2^(P-W)
So the physical (word) address is P-W bits
Cache size is 2^N bytes.
Number of words in the cache = 2^(N-W)
Block size is 2^M words
No. of blocks in the cache = 2^(N-W-M)
Since it is a K-way set associative cache, each set in the cache will have K blocks.
No. of sets = 2^(N-W-M) / K
SET bits = N-W-M-log₂K
Block offset = M
TAG bits = (P-W) - (N-W-M-log₂K) - M = P-W-N+W+M+log₂K-M = P – N + log₂K
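To see that the symbolic result behaves sensibly, here is a quick Python check with made-up values of P, W, N, M and K (these concrete numbers are purely illustrative, not from the question):

    from math import log2

    def tag_bits(P, W, N, M, K):
        word_addr_bits = P - W                 # word-addressed physical address
        sets = (2 ** (N - W - M)) // K         # blocks in cache / associativity
        return word_addr_bits - int(log2(sets)) - M

    # e.g. P=32, W=2 (4-byte words), N=16 (64 KB cache), M=3 (8-word blocks), K=4
    print(tag_bits(32, 2, 16, 3, 4))           # 18
    print(32 - 16 + int(log2(4)))              # P - N + log2(K) = 18, same value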
Question 8
A processor has 16 integer registers (R0, R1, …, R15) and 64 floating point registers (F0, F1, …, F63). It uses a 2-byte instruction format. There are four categories of instructions:
Type-1, Type-2, Type-3 and Type-4. Type-1 category consists of four instructions, each with 3 integer register operands (3Rs).
Type-2 category consists of eight instructions, each with 2 floating point register operands (2Fs).
Type-3 category consists of fourteen instructions, each with one integer register operand and one floating point register operand (1R+1F).
Type-4 category consists of N instructions, each with a floating point register operand (1F).
The maximum value of N is ________.
A
32
B
33
C
34
D
35
       Computer-Organization       Machine-Instructions       GATE 2018       Video-Explanation
Question 8 Explanation: 
Instruction size = 2 bytes.
So, total number of instruction encodings = 2^16
There are 16 possible integer registers, so no. of bits required for an integer operand = 4
There are 64 possible floating point registers, so no. of bits required for a floating point operand = 6
Type-1 instructions:
There are 4 type-1 instructions and each takes 3 integer operands.
No. of encodings consumed by type-1 = 4 × 2^4 × 2^4 × 2^4 = 2^14.
Type-2 instructions:
There are 8 type-2 instructions and each takes 2 floating point operands.
No. of encodings consumed by Type-2 instructions = 8 × 2^6 × 2^6 = 2^15.
Type-3 instructions:
There are 14 type-3 instructions and each takes one integer operand and one floating point operand.
No. of encodings consumed by Type-3 instructions = 14 × 2^4 × 2^6 = 14336.
So, no. of encodings left for Type-4 = 2^16 − (2^14 + 2^15 + 14336) = 2048.
Since type-4 instructions take one floating point register, no. of different instructions of Type-4 = 2048 / 64 = 32.
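The encoding-space bookkeeping can be reproduced directly with a quick Python sketch using the counts from the question:

    total = 2 ** 16                       # 2-byte instruction word
    type1 = 4 * (2 ** 4) ** 3             # 4 opcodes, three 4-bit integer registers
    type2 = 8 * (2 ** 6) ** 2             # 8 opcodes, two 6-bit FP registers
    type3 = 14 * (2 ** 4) * (2 ** 6)      # 14 opcodes, one integer + one FP register
    left = total - (type1 + type2 + type3)
    print(left, left // (2 ** 6))         # 2048 encodings left, so N = 32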
Question 9

The instruction pipeline of a RISC processor has the following stages: Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Writeback (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions. In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards.

The number of clock cycles required for completion of execution of the sequence of instructions is ___________.

A
219
B
220
C
221
D
222
       Computer-Organization       Pipelining       GATE 2019       Video-Explanation
Question 9 Explanation: 
Total Instruction = 100
Number of stages = 5
We know in a normal pipeline with k-stages time taken for n-instructions = k + n – 1 clock cycles.
So, in normal case total cycles = 100 + 5 – 1 = 104 cycles
But in this question it is given that PO stage of 40 instructions takes 3 cycles, 35 instructions takes 2 cycles and 25 instructions takes 1 cycle.
It is also given that all other stages take one clock cycle in all the 100 instructions.
The PO stage of 40 instructions takes 3 cycles, so each of these instructions causes 2 stall cycles; the PO stage of 35 instructions takes 2 cycles, so each causes 1 stall cycle; the 25 instructions whose PO stage takes 1 cycle cause no stall cycles.
So, extra stall cycles = 40*2 + 35*1 = 80+35 = 115 cycles. So, total clock cycles = 104 + 115 = 219
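The count can be reproduced with a short Python sketch that follows the same stall reasoning:

    k, n = 5, 100                         # pipeline stages, instructions
    base = k + n - 1                      # ideal pipeline cycles
    stalls = 40 * (3 - 1) + 35 * (2 - 1) + 25 * (1 - 1)
    print(base + stalls)                  # 219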
Question 10

Consider the C struct defined below:

    struct data   {
        int marks [100];
        char grade;
        int cnumber;
    };
    struct data student;

The base address of student is available in register R1. The field student.grade can be accessed efficiently using

A
Post-increment addressing mode, (R1)+
B
Pre-decrement addressing mode, -(R1)
C
Register direct addressing mode, R1
D
Index addressing mode, X(R1), where X is an offset represented in 2’s complement 16-bit representation
       Computer-Organization       Addressing-Modes       GATE 2017 [Set-1]       Video-Explanation
Question 10 Explanation: 
struct data
{
    int marks[100];
    char grade;
    int cnumber;
};
struct data student;
Base Address of student is available in R1.
So student.grade can be accessed efficiently using the index addressing mode.
It is clearly mentioned that X is the offset to be added to the base address held in R1.

Hence Index Addressing mode X(R1), where X is an offset represented in 2’s complement 16-bit representation.
⇾ Relative, Base Indexed & all subtypes of Indirect addressing modes are used with Arrays.
Question 11

Consider a two-level cache hierarchy with L1 and L2 caches. An application incurs 1.4 memory accesses per instruction on average. For this application, the miss rate of L1 cache is 0.1; the L2 cache experiences, on average, 7 misses per 1000 instructions. The miss rate of L2 expressed correct to two decimal places is __________.

A
0.05
B
0.06
C
0.07
D
0.08
       Computer-Organization       Cache       GATE 2017 [Set-1]       Video-Explanation
Question 11 Explanation: 
It is given that average references per instruction = 1.4
For 1000 instructions total number of memory references = 1000 * 1.4 = 1400
These 1400 memory references are first accessed in the L1.
Since the miss rate of L1 is 0.1, for 1400 L1 references the number of misses = 0.1 * 1400 = 140
We know when there is a miss in L1 we next access the L2 cache.
So number of memory references to L2 = 140
It is given that there are 7 misses in L2 cache. Out of 140 memory references to L2 cache there are 7 misses.
Hence the miss rate in L2 cache = 7/140 = 0.05
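The same ratio as a quick Python check (per 1000 instructions, as in the question):

    refs = 1000 * 1.4          # memory references per 1000 instructions
    l1_misses = refs * 0.1     # references that go on to L2
    l2_misses = 7.0            # given: L2 misses per 1000 instructions
    print(round(l2_misses / l1_misses, 2))   # 0.05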
Question 12

Consider a RISC machine where each instruction is exactly 4 bytes long. Conditional and unconditional branch instructions use PC-relative addressing mode with Offset specified in bytes to the target location of the branch instruction. Further the Offset is always with respect to the address of the next instruction in the program sequence. Consider the following instruction sequence

If the target of the branch instruction is i, then the decimal value of the Offset is ___________.

A
-16
B
-17
C
-18
D
-19
       Computer-Organization       Instruction-Cycle       GATE 2017 [Set-1]       Video-Explanation
Question 12 Explanation: 
Size of each instruction = 4 Bytes
Program counter Relative Addressing Mode
⇾ Assuming the first instruction starts at address zero

Offset should go from Address 16 to Address 0
⇒ Offset = 0 – 16 = (-16) ⇾ Final answer
Question 13
Instruction execution in a processor is divided into 5 stages, Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Execute (EX), and Write Back (WB). These stages take 5, 4, 20, 10 and 3 nanoseconds (ns) respectively. A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementations of the processor are contemplated: (i) A naive pipeline implementation (NP) with 5 stages and (ii) An efficient pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns and 8 ns respectively. The speedup (correct to two decimal places) achieved by EP over NP in executing 20 independent instructions with no hazards is _________.
A
1.51
B
1.52
C
1.53
D
1.54
       Computer-Organization       Pipelining       GATE 2017 [Set-1]       Video-Explanation
Question 13 Explanation: 
Naive Pipeline implementation:
The stage delays are 5, 4, 20, 10 and 3. And buffer delay = 2ns
So clock cycle time = max of stage delays + buffer delay
= max(5, 4, 20, 10,3)+2
= 20+2
= 22ns
Execution time for n-instructions in a pipeline with k-stages = (k+n-1) clock cycles
= (k+n-1)* clock cycle time
In this case execution time for 20 instructions in the pipeline with 5-stages
= (5+20-1)*22ns
= 24*22
= 528ns
Efficient Pipeline implementation:
OF phase is split into two stages OF1, OF2 with execution times of 12ns, 8ns
New stage delays in this case = 5, 4, 12, 8, 10, 3
Buffer delay is the same 2ns.
So clock cycle time = max of stage delays + buffer delay
= max(5, 4, 12, 8, 10,3) + 2
= 12+2
= 14ns
Execution time = (k+n-1) clock cycles
= (k+n-1)* clock cycle time
In this case no. of pipeline stages, k = 6
No. of instructions = 20
Execution time = (6+20-1)*14 = 25*14 = 350ns
Speedup of the efficient pipeline over the naive pipeline
= Naive pipeline execution time / efficient pipeline execution time
= 528 / 350
≈ 1.51
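The two execution times and the speedup, as a quick Python sketch (the helper name is illustrative):

    def exec_time(stage_delays, buffer_delay, n):
        cycle = max(stage_delays) + buffer_delay
        return (len(stage_delays) + n - 1) * cycle

    np_time = exec_time([5, 4, 20, 10, 3], 2, 20)          # 528 ns
    ep_time = exec_time([5, 4, 12, 8, 10, 3], 2, 20)       # 350 ns
    print(np_time, ep_time, round(np_time / ep_time, 2))   # 528 350 1.51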
Question 14

Consider a 2-way set associative cache with 256 blocks and uses LRU replacement. Initially the cache is empty. Conflict misses are those misses which occur due to contention of multiple blocks for the same cache set. Compulsory misses occur due to first time access to the block. The following sequence of accesses to memory blocks

     (0, 128, 256, 128, 0, 128, 256, 128, 1, 129, 257, 129, 1, 129, 257, 129)

is repeated 10 times. The number of conflict misses experienced by the cache is __________.

A
76
B
79
C
80
D
81
       Computer-Organization       Cache       GATE 2017 [Set-1]       Video-Explanation
Question 14 Explanation: 
When a block is first time accessed and it will not be present in the cache, so it will be a compulsory miss.
If a block is accessed once and, before its second access, there are k unique block accesses while the cache capacity is less than k, then a miss on the second access is a capacity miss: the cache simply cannot hold all the k unique blocks, so when the initial block is accessed again it is no longer in the cache.
If a block is accessed once and, before its second access, there are k unique block accesses while the cache capacity is greater than k, then a miss on the second access is a conflict miss: the cache could have held all the k unique blocks, yet the initial block was still replaced because of contention for its set.
Number of cache blocks = 256
It is a 2-way set associative cache,
so the number of cache sets = 256/2 = 128
Each memory block x maps to set (x mod 128), and LRU is used within each set.
Blue → Compulsory miss
Red → Conflict miss
At the end of the first round we have 6 compulsory misses & 4 conflict misses.
Similarly, if we continue from Round-2 to last round, in every round we will get 8 conflict misses.
Total conflict misses = 4+9(8) = 4+72 = 76 (conflict misses)
Question 15

A cache memory unit with capacity of N words and block size of B words is to be designed. If it is designed as a direct mapped cache, the length of the TAG field is 10 bits. If the cache unit is now designed as a 16-way set-associative cache, the length of the TAG field is ___________ bits.

A
14
B
15
C
16
D
17
       Computer-Organization       Cache       GATE 2017 [Set-1]       Video-Explanation
Question 15 Explanation: 
When it is a direct mapped cache, the physical address can be divided as
(Tag bits + bits for block number + bits for block offset)
With block size being B words no. of bits for block offset = log (B)
Because the cache capacity is N words and each block is B words, number of blocks in cache = N / B
No. of bits for block number = log (N/B)
So, the physical address in direct mapping case
= 10 + log (N/B) + log (B)
= 10 + log (N) – log B + log B
= 10 + log (N)
If the same cache unit is designed as 16-way set associative, then the physical address becomes
(Tag bits + bits for set no. + Bits for block offset)
There are N/B blocks in the cache and in 16-way set associative cache each set contains 16 blocks.
So no. of sets = (N/B) / 16 = N / (16*B)
Then bits for set no = log (N/16B)
Bits for block offset remain the same in this case also. That is log (B).
So physical address in the set associative case
= tag bits + log (N/16*B) + log B
= tag bits + log (N) – log (16*B) + log B
= tag bits + log (N) – log 16 – log B + log B
= tag bits + log N – 4
The physical address is the same in both the cases.
So, 10 + log N = tag bits + log N – 4
Tag bits = 14
So, no. of tag bits in the case 16-way set associative mapping for the same cache = 14.
Question 16

In a two-level cache system, the access times of L1 and L2 caches are 1 and 8 clock cycles, respectively. The miss penalty from the L2 cache to main memory is 18 clock cycles. The miss rate of L1 cache is twice that of L2. The average memory access time (AMAT) of this cache system is 2 cycles. The miss rates of L1 and L2 respectively are:

A
0.111 and 0.056
B
0.056 and 0.111
C
0.0892 and 0.1784
D
0.1784 and 0.0892
       Computer-Organization       Cache       GATE 2017 [Set-2]       Video-Explanation
Question 16 Explanation: 
The Average memory access time is,
AMAT = (L1 hit rate)*(L1 access time) + (L1 miss rate)*(L1 access time + L1 miss penalty)
= L1 hit rate * L1 access time + L1 miss rate * L1 access time + L1 miss rate * L1 miss penalty
We can write,
L1 miss rate = 1 – L1 hit rate
AMAT = L1 hit rate * L1 access time + (1 – L1 hit rate) * L1 access time + L1 miss rate * L1 miss penalty
By taking L1 access time common,
= (L1 hit rate + 1 – L1 hit rate)* L1 access time + L1 miss rate * L1 miss penalty
AMAT = L1 access time + L1 miss rate * L1 miss penalty
We access L2 only when there is a miss in L1.
So, L1 miss penalty is nothing but the effective time taken to access L2.
L1_miss_penalty = Hit_rate_of_L2* Access time of L2 + MissRate of L2 *(Access time of L2+ miss penalty L2)
= Hit_rate_of_L2* Access time of L2 + MissRate of L2 *Access time of L2 + MissRate of L2 * miss penalty L2
By taking Access time of L2 common we get,
= Access time of L2 * (Hit_rate_of_L2 + MissRate of L2 ) + MissRate of L2 * miss penalty L2
We know, MissRate of L2 = 1 – Hit_rate_of_L2 → Hit_rate_of_L2 + MissRate of L2 = 1
So, the above formula becomes,
L1_miss_penalty = Access time of L2 + (MissRate of L2 * miss penalty L2)
It is given,
access time of L1 = 1,
access time of L2 = 8,
miss penalty of L2 = 18,
AMAT = 2.
Let, miss rate of L2 = x.
Since it is given that the L1 miss rate is twice the L2 miss rate, L1 miss rate = 2 * x.
Substituting the above values,
L1_miss_penalty = Access time of L2 + (MissRate of L2 * miss penalty L2)
L1_miss_penalty = 8 + (x*18)
AMAT = L1 access time + L1 miss rate * L1 miss penalty
2 = 1 + (2x)(8 + 18x)
36x^2 + 16x - 1 = 0
By solving the above quadratic equation we get,
x = Miss rate of L2 = 0.056
Miss rate of L1 = 2*x = 0.111
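Solving the quadratic numerically (a sketch; the coefficients come from the derivation above):

    import math

    # 36x^2 + 16x - 1 = 0, where x is the L2 miss rate
    a, b, c = 36.0, 16.0, -1.0
    x = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    print(round(x, 3), round(2 * x, 3))     # 0.056 0.111

    # check AMAT: 1 + (2x) * (8 + 18x) should be 2 cycles
    print(1 + (2 * x) * (8 + 18 * x))       # 2.0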
Question 17

The read access times and the hit ratios for different caches in a memory hierarchy are as given below.

The read access time of main memory is 90 nanoseconds. Assume that the caches use the referred-word-first read policy and the write back policy. Assume that all the caches are direct mapped caches. Assume that the dirty bit is always 0 for all the blocks in the caches. In execution of a program, 60% of memory reads are for instruction fetch and 40% are for memory operand fetch. The average read access time in nanoseconds (up to 2 decimal places) is ___________.

A
4.72
B
4.73
C
4.74
D
4.75
       Computer-Organization       Cache       GATE 2017 [Set-2]       Video-Explanation
Question 17 Explanation: 
Since nothing is given whether it is hierarchical memory or simultaneous memory, by default we consider it as hierarchical memory.
Hierarchical memory (Default case):
For 2-level memory:
The formula for average memory access time = h1 t1 + (1-h1)(t1 + t2)
This can be simplified as
t1 + (1-h1)t2
For 3-level memory:
h1 t1 + (1-h1)(t1 + h2 t2 + (1-h2)(t2 + t3))
This can be simplified as
t1 + (1-h1)t2 + (1-h1)(1-h2)t3
Instruction fetch happens from I-cache whereas operand fetch happens from D-cache.
Using that we need to calculate the instruction fetch time (Using I-cache and L2-cache) and operand fetch time (Using D-cache and L2-cache) separately.
Then calculate 0.6 (instruction fetch time) + 0.4(operand fetch time) to find the average read access time.
The equation for instruction fetch time = t1 + (1-h1 ) t2 + (1-h1 )(1-h2 ) t3
= 2 + 0.2*8 + 0.2*0.1*90 = 5.4ns
Operand fetch time = t1 + (1-h1)t2 + (1-h1)(1-h2)t3 = 2 + 0.1*8 + 0.1*0.1*90 = 3.7ns
The average read access time = 0.6*5.4 + 0.4*3.7 = 4.72ns
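The weighted average can be reproduced as follows; the hit ratios (0.8 for the I-cache, 0.9 for the D-cache, 0.9 for L2) and the 2 ns / 8 ns access times are read off from the worked numbers above, since the question's table is not reproduced here, so treat them as assumptions of this sketch:

    def read_time(t1, h1, t2, h2, t3):
        # hierarchical access: L1, then L2 on an L1 miss, then main memory
        return t1 + (1 - h1) * t2 + (1 - h1) * (1 - h2) * t3

    ifetch = read_time(2, 0.8, 8, 0.9, 90)    # 5.4 ns
    ofetch = read_time(2, 0.9, 8, 0.9, 90)    # 3.7 ns
    print(0.6 * ifetch + 0.4 * ofetch)        # 4.72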
Question 18

Consider a machine with a byte addressable main memory of 2^32 bytes divided into blocks of size 32 bytes. Assume that a direct mapped cache having 512 cache lines is used with this machine. The size of the tag field in bits is _____________.

A
18
B
19
C
20
D
21
       Computer-Organization       Cache       GATE 2017 [Set-2]       Video-Explanation
Question 18 Explanation: 
Given that it is a byte addressable memory and the main memory is of size 2^32 bytes.
So the physical address is 32 bits long.
Each block is of size 32 (= 2^5) bytes, so the block offset is 5 bits.
Also given that there are 512 (= 2^9) cache lines; since it is a direct mapped cache we need 9 bits for the LINE number.
When it is a direct mapped cache, the physical address can be divided as
(Tag bits + bits for block/LINE number + bits for block offset)
So, tag bits + 9 + 5 = 32
Tag bits = 32 – 14 = 18
Question 19

A processor can support a maximum memory of 4GB, where the memory is word-addressable (a word consists of two bytes). The size of the address bus of the processor is at least _________ bits.

A
32
B
34
C
31
D
33
       Computer-Organization       Memory-Interfacing       GATE 2016 [Set-1]       Video-Explanation
Question 19 Explanation: 
Maximum Memory = 4 GB = 2^32 bytes
Size of a word = 2 bytes
Therefore, Number of words = 2^32 / 2 = 2^31
So, we require 31 bits for the address bus of the processor.
Question 20

The size of the data count register of a DMA controller is 16 bits. The processor needs to transfer a file of 29,154 kilobytes from disk to main memory. The memory is byte addressable. The minimum number of times the DMA controller needs to get the control of the system bus from the processor to transfer the file from the disk to main memory is ________.

A
456
B
457
C
458
D
459
       Computer-Organization       DMA       GATE 2016 [Set-1]       Video-Explanation
Question 20 Explanation: 
Since nothing is mentioned about the mode of DMA working whether it is cycle stealing mode or burst mode, we consider it as burst mode by default.
As the data count register of the DMA is 16 bits long, in burst mode the DMA transfers 2^16 bytes (= 64 KB) once it gets control of the bus.
To transfer 29,154 KB, no. of times DMA needs to take control
= (29,154 KB / 64KB)
= 29,154/64
= 455.53, means 456 times.
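The ceiling division can be written out explicitly as a quick Python check (64 KB per bus grant follows from the 16-bit count register, as argued above):

    file_kb = 29_154
    burst_kb = (2 ** 16) // 1024          # 64 KB transferred per bus grant in burst mode
    grants = -(-file_kb // burst_kb)      # ceiling division
    print(grants)                         # 456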
Question 21

The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage (with delay 800 picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. The throughput increase of the pipeline is ________ percent.

A
33.33%
B
33.34%
C
33.35%
D
33.36%
       Computer-Organization       Pipelining       GATE 2016 [Set-1]       Video-Explanation
Question 21 Explanation: 
In a pipelined processor the throughput is 1/clock cycle time.
Cycle time = max of all stage delays.
In the first case max stage delay = 800.
So throughput = 1/800 initially.
After replacing this stage with two stages of delays 600, 350… the cycle time = maximum stage delay = 600.
So the new throughput = 1/600.
The new throughput > old throughput.
And the increase in throughput = 1/600 – 1/800.
We calculate the percentage increase in throughput w.r.t initial throughput, so the % increase in throughput
= (1/600 – 1/800) / (1/800) * 100
= ((800 / 600) – 1) * 100
= ((8/6) -1) * 100
= 33.33%
Question 22

A processor has 40 distinct instructions and 24 general purpose registers. A 32-bit instruction word has an opcode, two register operands and an immediate operand. The number of bits available for the immediate operand field is __________.

A
16 bits
B
17 bits
C
18 bits
D
19 bits
       Computer-Organization       Machine-Instructions       GATE 2016 [Set-2]       Video-Explanation
Question 22 Explanation: 
6 bits are needed for 40 distinct instructions (because 2^5 < 40 < 2^6)
5 bits are needed for 24 general purpose registers (because 2^4 < 24 < 2^5)
32-bit instruction word has an opcode (6 bit), two register operands (total 10 bits) and an immediate operand (x bits).
The number of bits available for the immediate operand field
⇒ x = 32 – (6 + 10) = 16 bits
Question 23

Suppose the functions F and G can be computed in 5 and 3 nanoseconds by functional units UF and UG, respectively. Given two instances of UF and two instances of UG, it is required to implement the computation F(G(Xi)) for 1 ≤ i ≤ 10. Ignoring all other delays, the minimum time required to complete this computation is _________ nanoseconds.

A
28
B
29
C
30
D
31
       Computer-Organization       Datapath       GATE 2016 [Set-2]       Video-Explanation
Question 23 Explanation: 
We have to do 10 computations each of G and F, as i runs from 1 to 10.
Since we have 2 instances of UF and UG, each unit needs to run 5 times to complete the execution.
Suppose computation starts at time 0: G starts at 0, and F starts at 3 ns, since F depends on G and G finishes its first result at 3 ns.
Computation of F ten times using two UF units can be done in 5*10/2 = 25ns.
For the start UF needs to wait for UG output for 3ns and rest all are pipelined and hence no more wait.
So, answer is 3 + 25 = 28ns
We can see the timing diagram below:
Question 24

Consider a processor with 64 registers and an instruction set of size twelve. Each instruction has five distinct fields, namely, opcode, two source register identifiers, one destination register r identifier, and a twelve-bit immediate value. Each instruction must be stored in memory in a byte-aligned fashion. If a program has 100 instructions, the amount of memory (in bytes) consumed by the program text is _________.

A
500 bytes
B
501 bytes
C
502 bytes
D
503 bytes
       Computer-Organization       Machine-Instructions       GATE 2016 [Set-2]       Video-Explanation
Question 24 Explanation: 
One instruction is divided into five parts,
(i) The opcode- As we have an instruction set of size 12, an instruction opcode can be identified by 4 bits, as 2^4 = 16 and we cannot go any less.
(ii) & (iii) Two source register identifiers- As there are total 64 registers, they can be identified by 6 bits. As they are two i.e. 6 bit + 6 bit.
iv) One destination register identifier- Again it will be 6 bits.
v) A twelve bit immediate value- 12 bit.
Adding them all we get,
4 + 6 + 6 + 6 + 12 = 34 bit = 34/8 byte = 4.25 bytes.
Due to byte alignment total bytes per instruction = 5 bytes.
As there are 100 instructions, total size = 5*100 = 500 Bytes.
Question 25

The width of the physical address on a machine is 40 bits. The width of the tag field in a 512 KB 8-way set associative cache is _________ bits.

A
24
B
25
C
26
D
27
       Computer-Organization       Cache       GATE 2016 [Set-2]       Video-Explanation
Question 25 Explanation: 
Given that it is a set associative cache, so the physical address is divided as following:
(Tag bits + bits for set no. + Bits for block offset)
The block size has not been given in the question, so we can assume the block size is 2^x bytes.
The cache is of size 512 KB, so the number of blocks in the cache = 2^19/2^x = 2^(19-x)
It is an 8-way set associative cache, so there will be 8 blocks in each set.
So the number of sets = 2^(19-x)/2^3 = 2^(16-x)
So the number of bits for the set index = 16 - x
Let the number of bits for the Tag = T
Since we assumed the block size is 2^x bytes, the number of bits for the block offset is x.
So, T + (16-x) + x = 40
T + 16 = 40
T = 24
Question 26

Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies τ1, τ2 and τ3 such that τ1 = 3τ2/4 = 2τ3. If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is _________ GHz, ignoring delays in the pipeline registers.

A
4
B
5
C
6
D
7
       Computer-Organization       Pipelining       GATE 2016 [Set-2]       Video-Explanation
Question 26 Explanation: 
Given 3 stage pipeline, with 3 GHz processor.
Given, τ1 = 3 τ2/4 = 2 τ3
Put τ1 = 6t, we get τ2 = 8t, τ3 = 3t
Now largest stage time is 8t.
So, frequency is 1/8t
⇒ 1/8t = 3 GHz
⇒ 1/t = 24 GHz
From the given 3 stages, τ1 = 6t, τ2 = 8t and τ3 = 3t.
So, τ2 > τ1 > τ3.
The longest stage is τ2 = 8t and we will split that into two stages of 4t & 4t.
New processor has 4 stages – 6t, 4t, 4t, 3t.
Now largest stage time is 6t.
So, new frequency is = 1/6t
We can substitute 24 in place of 1/t, which gives the new frequency as 24/6 = 4 GHz
Question 27

A file system uses an in-memory cache to cache disk blocks. The miss rate of the cache is shown in the figure. The latency to read a block from the cache is 1 ms and to read a block from the disk is 10 ms. Assume that the cost of checking whether a block exists in the cache is negligible. Available cache sizes are in multiples of 10 MB.

The smallest cache size required to ensure an average read latency of less than 6 ms is _______ MB.

A
30
B
31
C
32
D
33
       Computer-Organization       Cache       GATE 2016 [Set-2]       Video-Explanation
Question 27 Explanation: 
Here it is not given whether the memory is hierarchical or simultaneous.
So we consider it as hierarchical memory.
But it is given that “assume that the cost of checking whether a block exists in the cache is negligible”, which means don’t consider the checking time in the cache when there is a miss.
So formula for average access time becomes h1t1 + (1-h1)(t2) which is same as for simultaneous access.
Though the memory is hierarchical because of the statement given in the question we ignored the cache access time when there is a miss and effectively the calculation became like simultaneous access.
The average access time or read latency = h1t1 + (1-h1)t2.
It is given that the average read latency has to be less than 6ms.
So, h1t1 + (1-h1)t2 < 6
From the given information t1 = 1ms, t2 = 10ms
h1×1 + (1-h1)×10 < 6
10 - 9h1 < 6
9h1 > 4
h1 > 4/9 ≈ 0.444
Since the given graph plots the miss rate and the miss rate is (1-h1),
1 - h1 < 1 - 0.444
1 - h1 < 0.556
So for the average read latency to be less than 6 ms, the miss rate has to be less than 55.6%.
From the given graph the closest miss rate below 55.6% is 40%.
For 40% miss rate the corresponding cache size is 30MB.
Hence the answer is 30MB.
Question 28

For computers based on three-address instruction formats, each address field can be used to specify which of the following:

    (S1) A memory operand
    (S2) A processor register
    (S3) An implied accumulator register
A
Either S1 or S2
B
Either S2 or S3
C
Only S2 and S3
D
All of S1, S2 and S3
       Computer-Organization       Machine-Instructions       GATE 2015 [Set-1]
Question 28 Explanation: 
Any implied register is never explicitly mentioned as an operand in an operation.
So as the question asks what can be specified using the address fields, implied accumulator register can’t be represented in address field.
So, S3 is wrong.
Hence Option A is the correct answer.
Question 29

Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speed up achieved in this pipelined processor is __________.

A
3.2
B
3.3
C
3.4
D
3.5
       Computer-Organization       Pipelining       GATE 2015 [Set-1]
Question 29 Explanation: 
Given that the processor clock rate = 2.5 GHz, the processor takes 2.5 G cycles in one second.
Time taken to complete one cycle = (1 / 2.5 G) seconds
Since it is given that average number of cycles per instruction = 4, the time taken for completing one instruction = (4 / 2.5 G) = 1.6 ns
In the pipelined case we know in the ideal case CPI = 1, and the clock speed = 2 GHz.
Time taken for one instruction in the pipelined case = (1 / 2 G) = 0.5 ns
Speedup = 1.6/0.5 = 3.2
Question 30

Consider a disk pack with a seek time of 4 milliseconds and rotational speed of 10000 rotations per minute (RPM). It has 600 sectors per track and each sector can store 512 bytes of data. Consider a file stored in the disk. The file contains 2000 sectors. Assume that every sector access necessitates a seek, and the average rotational latency for accessing each sector is half of the time for one complete rotation. The total time (in milliseconds) needed to read the entire file is _________.

A
14020
B
14021
C
14022
D
14023
       Computer-Organization       Secondary-Storage       GATE 2015 [Set-1]
Question 30 Explanation: 
Given
Seek time = 4ms
10000 rotations take 60 s, so one rotation takes 6 ms.
∴ Rotational latency = 1/2 × 6 ms = 3 ms
1 track → 600 sectors
⇒ 600 sectors are read in 6 ms (1 rotation reads 1 track = 600 sectors)
1 sector → 6 ms/600 = 0.01 ms
2000 sectors → 2000 × 0.01 = 20 ms
∴total time needed to read the entire file is
= 2000(4+3) + 20
= 8000 + 6000+20
= 14020ms
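A short Python version of the same timing (per-sector seek plus average rotational latency, plus the transfer time of all 2000 sectors):

    seek_ms = 4.0
    rotation_ms = 60_000 / 10_000          # one rotation = 6 ms at 10000 RPM
    rot_latency_ms = rotation_ms / 2       # 3 ms average rotational latency
    transfer_ms = rotation_ms / 600        # 0.01 ms per sector (600 sectors/track)

    sectors = 2000
    total = sectors * (seek_ms + rot_latency_ms + transfer_ms)
    print(total)                           # 14020.0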
Question 31

Assume that for a certain processor, a read request takes 50 nanoseconds on a cache miss and 5 nanoseconds on a cache hit. Suppose while running a program, it was observed that 80% of the processor's read requests result in a cache hit. The average read access time in nanoseconds is _______.

A
14
B
15
C
16
D
17
       Computer-Organization       Cache       GATE 2015 [Set-2]
Question 31 Explanation: 
Average read access time = [(0.8)(5) + (0.2)(50)]ns = 4 + 10 = 14ns
Question 32

Consider a processor with byte-addressable memory. Assume that all registers, including Program Counter (PC) and Program Status Word (PSW), are of size 2 bytes. A stack in the main memory is implemented from memory location (0100)₁₆ and it grows upward. The stack pointer (SP) points to the top element of the stack. The current value of SP is (016E)₁₆. The CALL instruction is of two words, the first word is the op-code and the second word is the starting address of the subroutine (one word = 2 bytes). The CALL instruction is implemented as follows:

    • Store the current value of PC in the stack.
    • Store the value of PSW register in the stack.
    • Load the starting address of the subroutine in PC.

The content of PC just before the fetch of a CALL instruction is (5FA0)₁₆. After execution of the CALL instruction, the value of the stack pointer is

A
(016A)₁₆
B
(016C)₁₆
C
(0170)₁₆
D
(0172)₁₆
       Computer-Organization       Machine-Instructions       GATE 2015 [Set-2]
Question 32 Explanation: 
Here the memory is byte-addressable.
The CALL instruction is implemented as follows:
• Store the current value of PC in the stack:
PC is 2 bytes, so when we store PC in the stack, SP is increased by 2; SP becomes (016E)₁₆ + 2.
• Store the value of the PSW register in the stack:
PSW is also 2 bytes, so SP is increased by 2 again.
So the final value of SP is (016E)₁₆ + 2 + 2 = (0172)₁₆
Question 33

Consider the sequence of machine instructions given below:

  MUL R5, R0, R1
  DIV R6, R2, R3
  ADD R7, R5, R6
  SUB R8, R7, R4 

In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back the Result (WB). The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is

A
11
B
12
C
13
D
14
       Computer-Organization       Pipelining       GATE 2015 [Set-2]
Question 33 Explanation: 
I ⇒ Instruction Fetch and Decode
O ⇒ Operand Fetch
P ⇒ Perform the operation
W ⇒ write back the result
Question 34

Consider the following reservation table for a pipeline having three stages S1, S2 and S3.

     Time -->
-----------------------------
      1    2   3    4     5
-----------------------------
S1  | X  |   |   |    |  X |    
S2  |    | X |   | X  |    |
S3  |    |   | X |    |    |

The minimum average latency (MAL) is ________.

A
3
B
5
C
6
D
7
       Computer-Organization       Pipelining       GATE 2015 [Set-3]
Question 34 Explanation: 
Minimum average latency is based on an advanced concept in pipelining.
S1 is needed at time 1 and 5, so its forbidden latency is 5-1 = 4.
S2 is needed at time 2 and 4, so its forbidden latency is 4-2 = 2.
So, forbidden latency = (2,4,0) (0 by default is forbidden)
Allowed latency = (1,3,5) (any value more than 5 also).
Collision vector (4,3,2,1,0) = 10101 which is the initial state as well.
From initial state we can have a transition after “1” or “3” cycles and we reach new states with collision vectors
(10101 >> 1 + 10101 = 11111) and (10101 >> 3 + 10101 = 10111) respectively.
These 2 becomes states 2 and 3 respectively.
For “5” cycles we come back to state 1 itself.
From state 2 (11111), the new collision vector is 11111.
We can have a transition only when we see first 0 from right.
So, here it happens on 5th cycle only which goes to initial state. (Any transition after 5 or more cycles goes to initial state as we have 5 time slices).
From state 3 (10111), the new collision vector is 10111.
So, we can have a transition on 3, which will give (10111 >> 3 + 10101 = 10111) third state itself. For 5, we get the initial state.
Thus all the transitions are complete.

State\Latency    1    3    5
1 (10101)        2    3    1
2 (11111)        –    –    1
3 (10111)        –    3    1

So, the minimum length cycle is of length 3, either from state 3 to 3 or from state 1 to 3. So the minimum average latency is also 3.
Question 35

Consider the following code sequence having five instructions I1 to I5. Each of these instructions has the following format.

     
          OP Ri, Rj, Rk 

where operation OP is performed on contents of registers Rj and Rk and the result is stored in register Ri.

     
          I1 : ADD R1, R2, R3
          I2 : MUL R7, R1, R3
          I3 : SUB R4, R1, R5
          I4 : ADD R3, R2, R4
          I5 : MUL R7, R8, R9 

Consider the following three statements:

     
          S1: There is an anti-dependence between instructions I2 and I5.
          S2: There is an anti-dependence between instructions I2 and I4.
          S3: Within an instruction pipeline an anti-dependence always creates one or more stalls. 

Which one of above statements is/are correct?

A
Only S1 is true
B
Only S2 is true
C
Only S1 and S3 are true
D
Only S2 and S3 are true
       Computer-Organization       Pipelining       GATE 2015 [Set-3]
Question 35 Explanation: 
S1: False. Antidependency means WAR dependency. There is no WAR dependency between I2 and I5.
S2: True. There is WAR dependency between I2 and I4.
S3: False. Because WAR or antidependency can be resolved by register renaming.
Question 36

A machine has a 32-bit architecture, with 1-word long instructions. It has 64 registers, each of which is 32 bits long. It needs to support 45 instructions, which have an immediate operand in addition to two register operands. Assuming that the immediate operand is an unsigned integer, the maximum value of the immediate operand is ________.

A
16383
B
16384
C
16385
D
16386
       Computer-Organization       Machine-Instructions       GATE 2014 [Set-1]
Question 36 Explanation: 
1 Word = 32 bits
Each instruction has 32 bits.
To support 45 instructions, opcode must contain 6-bits.
Register operand1 requires 6 bits, since the total registers are 64.
Register operand 2 also requires 6 bits

14 bits are left over for the immediate operand. Using 14 bits, the maximum unsigned value that can be represented is 16383, since 2^14 = 16384 (values 0 to 16383).
Question 37

Consider two processors P1 and P2 executing the same instruction set. Assume that under identical conditions, for the same input, a program running on P2 takes 25% less time but incurs 20% more CPI (clock cycles per instruction) as compared to the program running on P1. If the clock frequency of P1 is 1GHz, then the clock frequency of P2 (in GHz) is _________.

A
1.6
B
1.7
C
1.8
D
1.9
       Computer-Organization       Speed-Up       GATE 2014 [Set-1]
Question 37 Explanation: 
Cycle time for P1 = 1/(1 GHz) = 1 ns
Assume P1 takes 5 cycles for a program; then P2 takes 20% more, i.e. 6 cycles.
P2 takes 25% less time, so if P1 takes 5 ns, then P2 takes 3.75 ns.
Assume P2's clock frequency is x GHz, so its cycle time is 1/x ns.
P2 takes 6 cycles, so 6/x ns = 3.75 ns, which gives x = 1.6.
Question 38

A 4-way set-associative cache memory unit with a capacity of 16 KB is built using a block size of 8 words. The word length is 32 bits. The size of the physical address space is 4 GB. The number of bits for the TAG field is __________.

A
20
B
21
C
22
D
23
       Computer-Organization       Cache       GATE 2014 [Set-2]
Question 38 Explanation: 
Physical address size = 32 bits
Cache size = 16 KB = 2^14 bytes
Block size = 8 words = 8 × 4 bytes = 32 bytes = 2^5 bytes
(where each word = 4 bytes)
Block offset = 5 bits
No. of blocks in the cache = 2^14/2^5 = 2^9
Because it is a 4-way set associative cache, no. of sets = 2^9/4 = 2^7
Set index = 7 bits
TAG = 32 – (7 + 5) = 20 bits
Question 39

Suppose a stack implementation supports an instruction REVERSE, which reverses the order of elements on the stack, in addition to the PUSH and POP instructions. Which one of the following statements is TRUE with respect to this modified stack?

A
A queue cannot be implemented using this stack.
B
A queue can be implemented where ENQUEUE takes a single instruction and DEQUEUE takes a sequence of two instructions.
C
A queue can be implemented where ENQUEUE takes a sequence of three instructions and DEQUEUE takes a single instruction.
D
A queue can be implemented where both ENQUEUE and DEQUEUE take a single instruction each.
       Computer-Organization       Cache       GATE 2014 [Set-2]
Question 39 Explanation: 
A queue can be implemented where ENQUEUE takes a sequence of three instructions and DEQUEUE takes a single instruction.
Keep the elements on the stack so that the front of the queue is always at the top.
ENQUEUE(x): REVERSE (the rear of the queue comes to the top), PUSH x, REVERSE again (the front returns to the top) – three instructions.
DEQUEUE: POP the top element – a single instruction.
Hence option C is correct.
Question 40

In designing a computer’s cache system, the cache block (or cache line) size is an important parameter. Which one of the following statements is correct in this context?

A
A smaller block size implies better spatial locality
B
A smaller block size implies a smaller cache tag and hence lower cache tag overhead
C
A smaller block size implies a larger cache tag and hence lower cache hit time
D
A smaller block size incurs a lower cache miss penalty
       Computer-Organization       Cache        GATE 2014 [Set-2]
Question 40 Explanation: 
With a smaller block size, fewer bytes have to be fetched from the next level of the memory hierarchy on a miss, so the miss penalty is lower. (A smaller block size gives worse, not better, spatial locality, and it increases the number of blocks and hence the total tag overhead, so the other options are wrong.)
Question 41

If the associativity of a processor cache is doubled while keeping the capacity and block size unchanged, which one of the following is guaranteed to be NOT affected?

A
Width of tag comparator
B
Width of set index decoder
C
Width of way selection multiplexor
D
Width of processor to main memory data bus
       Computer-Organization       Cache       GATE 2014 [Set-2]
Question 41 Explanation: 
When the associativity is doubled with the capacity and block size unchanged, the number of sets is halved, so the set index loses one bit and the tag gains one bit; hence the width of the tag comparator is affected.
The width of the set index decoder is also affected, since the number of set index bits changes.
A k-way set associative cache needs a k-to-1 way selection multiplexer; if the associativity is doubled, the width of the way selection multiplexer also doubles.
The width of the processor to main memory data bus is guaranteed to be NOT affected, as it does not depend on the cache associativity.
Question 42

The value of a float type variable is represented using the single-precision 32-bit floating point format of IEEE-754 standard that uses 1 bit for sign, 8 bits for biased exponent and 23 bits for mantissa. A float type variable X is assigned the decimal value of −14.25. The representation of X in hexadecimal notation is

A
C1640000H
B
416C0000H
C
41640000H
D
C16C0000H
       Computer-Organization       Number-Systems       GATE 2014 [Set-2]
Question 42 Explanation: 
The given number is negative, so the sign bit = 1
(14.25)₁₀ = (1110.01)₂
= 1.11001 × 2^3
23-bit mantissa = 11001000000000000000000
Biased exponent = exponent + bias
= 3 + 127 = 130 = 1000 0010
(-14.25) in 32-bit IEEE-754 single-precision representation is
1 10000010 11001000000000000000000
= 1100 0001 0110 0100 0000 0000 0000 0000
= (C1640000)₁₆
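The encoding can be verified with Python's struct module, which packs a float in IEEE-754 single precision:

    import struct

    x = -14.25
    bits = struct.unpack('>I', struct.pack('>f', x))[0]   # reinterpret the 4 bytes as an unsigned int
    print(format(bits, '08X'))                            # C1640000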
Question 43

Consider a main memory system that consists of 8 memory modules attached to the system bus, which is one word wide. When a write request is made, the bus is occupied for 100 nanoseconds (ns) by the data, address, and control signals. During the same 100 ns, and for 500 ns thereafter, the addressed memory module executes one cycle accepting and storing the data. The (internal) operation of different memory modules may overlap in time, but only one request can be on the bus at any time. The maximum number of stores (of one word each) that can be initiated in 1 millisecond is ____________.

A
10000
B
10001
C
10002
D
10003
       Computer-Organization       Instruction-Cycle       GATE 2014 [Set-2]
Question 43 Explanation: 
For each write request, the bus is occupied for 100 ns. Since the internal operations of the 8 memory modules can overlap, a new store can be initiated as soon as the bus is free, i.e. every 100 ns.
In 100 ns → 1 store
1 ms = 10^6 ns
Number of stores in 1 ms = 10^6/100 = 10000 stores
Question 44

Consider the following processors (ns stands for nanoseconds). Assume that the pipeline registers have zero latency.

    P1: Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns.
    P2: Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns.
    P3: Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns.
    P4: Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns.

Which processor has the highest peak clock frequency?

A
P1
B
P2
C
P3
D
P4
       Computer-Organization       Pipelining-and-addressing-modes       GATE 2014 [Set-3]
Question 44 Explanation: 
Clock period (CP) = max stage delay + overhead
CP(P1) = max(1, 2, 2, 1) = 2 ns
CP(P2) = max(1, 1.5, 1.5, 1.5) = 1.5 ns
CP(P3) = max(0.5, 1, 1, 0.6, 1) = 1 ns
CP(P4) = max(0.5, 0.5, 1, 1, 1.1) = 1.1 ns
As frequency ∝ 1/CP, the smallest clock period gives the highest peak clock frequency.
So f(P3) = 1/1 ns = 1 GHz, which is the highest.
Question 45

An instruction pipeline has five stages, namely, instruction fetch (IF), instruction decode and register fetch (ID/RF), instruction execution (EX), memory access (MEM), and register writeback (WB) with stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns, respectively (ns stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the ID/RF stage into three stages (ID, RF1, RF2) each of latency 2.2/3 ns. Also, the EX stage is split into two stages (EX1, EX2) each of latency 1 ns. The new design has a total of eight pipeline stages. A program has 20% branch instructions which execute in the EX stage and produce the next instruction pointer at the end of the EX stage in the old design and at the end of the EX2 stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average CPI of one in both the designs. The execution times of this program on the old and the new design are P and Q nanoseconds, respectively. The value of P/Q is __________.

A
1.54
B
1.55
C
1.56
D
1.57
       Computer-Organization       Pipelining-and-addressing-modes       GATE 2014 [Set-3]
Question 45 Explanation: 
There are five stages:
Instruction Fetch (IF),
instruction decode and register fetch (ID/RF),
Instruction execution (EX),
Memory access (MEM),
and register writeback (WB)
P old design:
With stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns
Cycle time = MAX(1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns) = 2.2nsec
Branch penalty = 3 - 1 = 2, because the next instruction pointer is produced at the end of the EX stage (the 3rd stage) in the old design.
Average instruction execution time is
P = Tavg = (1 + branch fraction × branch penalty) × cycle time
= (1 + 0.20 × 2) × 2.2
P = 3.08 nsec
Q new design:
ID/RF stage is split into three stages (ID, RF1, RF2) each of latency (2.2)/3 ns = 0.73ns.
The EX stage is split into two stages (EX1, EX2) each of latency 1 ns.
The new design has a total of eight pipeline stages.
Time of stages in new design = {1ns, 0.73ns, 0.73ns, 0.73ns , 1ns, 1ns, 1ns, and 0.75ns}
Cycle time = MAX(1ns, 0.73ns, 0.73ns, 0.73ns , 1ns, 1ns, 1ns, and 0.75ns) = 1nsec
Branch penalty = 6 - 1 = 5, because the next instruction pointer is produced at the end of the EX2 stage (the 6th stage) in the new design.
Average instruction execution time is
Q = Tavg = (1 + branch fraction × branch penalty) × cycle time
= (1 + 0.20 × 5) × 1
Q = 2 nsec
Therefore, P/Q = 3.08/2 = 1.54
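The same P/Q computation as a small Python sketch (branch fraction and penalties as derived above; the helper name is illustrative):

    def avg_time(cycle_ns, branch_frac, branch_penalty):
        return (1 + branch_frac * branch_penalty) * cycle_ns

    P = avg_time(2.2, 0.20, 2)     # old design: 3.08 ns per instruction
    Q = avg_time(1.0, 0.20, 5)     # new design: 2.0 ns per instruction
    print(round(P / Q, 2))         # 1.54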
Question 46

In a k-way set associative cache, the cache is divided into v sets, each of which consists of k lines. The lines of a set are placed in sequence one after another. The lines in set s are sequenced before the lines in set (s+1). The main memory blocks are numbered 0 onwards. The main memory block numbered j must be mapped to any one of the cache lines from

A
(j mod v) * k to (j mod v) * k + (k-1)
B
(j mod v) to (j mod v) + (k-1)
C
(j mod k) to (j mod k) + (v-1)
D
(j mod k) * v to (j mod k) * v + (v-1)
       Computer-Organization       Cache       GATE 2013
Question 46 Explanation: 
Number of sets in cache = v.
A block numbered j will be mapped to set number (j mod v). Since it is k-way set associative cache, there are k blocks in each set. It is given in the question that the blocks in consecutive sets are sequenced. It means for set-0 the cache lines are numbered 0, 1, .., k-1 and for set-1, the cache lines are numbered k, k+1,… k+k-1 and so on. As the main memory block j will be mapped to set (j mod v), it will be any one of the cache lines from (j mod v) * k to (j mod v) * k + (k-1).
Question 47

Consider the following sequence of micro-operations.

     MBR ← PC 
     MAR ← X  
     PC ← Y  
     Memory ← MBR

Which one of the following is a possible operation performed by this sequence?

A
Instruction fetch
B
Operand fetch
C
Conditional branch
D
Initiation of interrupt service
       Computer-Organization       CPU-Control-Design-and-Interfaces       GATE 2013
Question 47 Explanation: 
We know PC holds the value of the next instruction to be executed. Here the value of PC is stored in MBR( MBR ← PC ) and then to the memory (Memory ← MBR). We are saving the value of PC in memory and a new address is loaded into the PC (PC ← Y), so we start executing from the new address Y which is loaded into PC. So this behaviour matches with the interrupt service execution as we have already saved the address of current running instruction into memory and then we are loading new address in to the PC and start executing from there. Hence option D is the answer.
Question 48

A RAM chip has a capacity of 1024 words of 8 bits each (1K×8). The number of 2×4 decoders with enable line needed to construct a 16K×16 RAM from 1K×8 RAM is

A
4
B
5
C
6
D
7
       Computer-Organization       Cache       GATE 2013
Question 48 Explanation: 
The required RAM capacity = 16K words; the available chip capacity = 1K words.
So the number of chip rows to be selected = 16K/1K = 16.
Hence a 4 × 16 decoder would do the selection, but we are only given 2 × 4 decoders.
Four 2 × 4 decoders are required at the inner level, since one 2 × 4 decoder gives only 4 output lines whereas we need 16.
To select among these 4 decoders, another 2 × 4 decoder is required at the outer level.
Hence the number of 2 × 4 decoders needed to realize this RAM = 1 + 4 = 5.
Question 49

Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, …, I12 is executed in this pipelined processor. Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

A
132
B
165
C
176
D
328
       Computer-Organization       Pipelining-and-addressing-modes       GATE 2013
Question 49 Explanation: 

Cycle time = max of all stage delays + buffer delay = max (5 ns, 7 ns, 10 ns, 8 ns, 6 ns) + 1 = 10+1 = 11ns
Out of all the instructions I1, I2, I3….I12 it is given that only I4 is a branch instruction and when I4 takes branch the control will jump to instruction I9 as I9 is the target instruction.
As can be seen from the timing diagram there is a gap of only 3 stall cycles between I4 and I9 because after I4 enters Decode Instruction (DI) whether there is a branch or not will be known at the end of Execute Instruction (EI) phase. So there are total 3 phases here namely DI, FO, EI. After 3 stall cycles I9 will start executing as that is the branch target.
As per the timing diagram total no. of clock cycles to complete the program = 15
Since 1 clock cycle = 11ns, time to complete the program = 15*11 = 165ns
Question 50

The following code segment is executed on a processor which allows only register operands in its instructions. Each instruction can have atmost two source operands and one destination operand. Assume that all variables are dead after this code segment.

   c = a + b;
   d = c * a;
   e = c + a;
   x = c * c;
   if (x > a) {
      y = a * a;
   }
   else {
     d = d * d;
     e = e * e;
  }

What is the minimum number of registers needed in the instruction set architecture of the processor to compile this code segment without any spill to memory? Do not apply any optimization other than optimizing register allocation.

A
3
B
4
C
5
D
6
       Computer-Organization       Code-Segment       GATE 2013
Question 50 Explanation: 
Let R1← a, R2← b
The 1st statement c = a + b; can be written as R2 = R1+R2, here R2 is used to store the value of ‘c’ as we don’t need ‘b’ any further.
Let R3 be used to store ‘d’, so we can write the 2nd statement d = c * a; as R3 = R2*R1
Let R4 be used to store ‘e’, so we can write the 3rd statement e = c + a; as R4 = R2+R1
We can reuse R2 to store ‘x’ as the value of ‘c’ is not needed after statement-4.
We can write the 4th statement x = c*c; as R2 = R2*R2;
Using the four registers R1, R2, R3 and R4 we can write the remaining statements of the program without needing any new registers…
if (x > a) {
y = a * a;
}
else {
d = d * d;
e = e * e;
}
So the minimum no. of registers needed is 4.