Consider a system with 2 level caches. Access times of Level 1 cache, Level 2 cache and main memory are 1 ns, 10ns, and 500 ns, respectively. The hit rates of Level 1 and Level 2 caches are 0.8 and 0.9, respectively. What is the average access time of the system ignoring the search time within the cache?
H1 = 0.8, (1 – H1) = 0.2
H2 = 0.9, (1 – H2) = 0.1
T1 = Access time for level 1 cache = 1ns
T2 = Access time for level 2 cache = 10ns
Hm = Hit rate of main memory = 1
Tm = Access time for main memory = 500ns
Average access time = [(0.8 * 1) + (0.2 * 0.9 * 10) + (0.2)(0.1) * 1 * 500]
= 0.8 + 1.8 + 10
If we use internal data forwarding to speed up the performance of a CPU (R1, R2 and R3 are registers and M is a memory reference), then the sequence of operations
R1 → M M → R2 M → R3can be replaced by
R1 → R3
R2 → M
M → R2
R1 → R2
R1 → R3
R1 → M
R2 → R3
R1 → R2
R1 → R3
R1 → M
Consider the following statements.
- I. Daisy chaining is used to assign priorities in attending interrupts.
II. When a device raises a vectored interrupt, the CPU does polling to identify the source of the interrupt.
III. In polling, the CPU periodically checks the status bits to know if any device needs its attention.
IV. During DMA, both the CPU and DMA controller can be bus masters at the same time.
Which of the above statements is/are TRUE?
I and IV only
I and II only
I and III only
Statement-II is false as vectored interrupt doesn’t involve polling but non-vectored interrupt involves polling.
Statement-III is true as polling means that CPU periodically checks the status bits to know if any device needs attention.
Statement-IV is false as during DMA only one of the CPU or DMA can be bus master at a time.
A direct mapped cache memory of 1 MB has a block ize of 256 bytes. The cache has an access time of 3 ns and a hit rate of 94%. During a cache miss, it takes 20 ns to bring the first word of a block from the main memory, while each subsequent word takes 5 ns. The word size is 64 bits. The average memory access time in ns (round off to 1 decimal place) is _____.
Hit ratio of cache = 0.94
Word size is 64 bits = 8 bytes.
Cache line size = 256 bytes = 32 words
Main memory access time = 20ns(time for first word) + 155ns(time for remaining 31 words, 31*5 = 155ns) = 175 ns
Average access time = h1*t1 + (1-h1)(t1+t2) = t1 +(1-h1)t2
⇒ 3 + (0.06)(175) = 13.5 ns
Consider the following data path diagram.
Consider an instruction: R0 ← R1 + R2. The following steps are used to execute it over the given data path. Assume that PC is incremented appropriately. The subscripts r and w indicate read and write operations, respectively.
1. R2r, TEMP1r, ALUadd, TEMP2w 2. R1r, TEMP1w 3. PCr, MARw, MEMr 4. TEMP2r, ROw 5. MDRr, IRw
Which one of the following is the correct order of execution of the above steps?
3, 5, 1, 2, 4
3, 5, 2, 1, 4
1, 2, 4, 3, 5
2, 1, 4, 5, 3
First the PC value has to be moved into MAR (step-3 from the given sequence), then the instruction has to be fetched(step-5 from the given sequence). Then Temp1 is loaded with the value of R1 (step-2 from the given sequence), then the addition operation is performed by accessing the R2 value directly and adding it to Temp1 value and storing the result in Temp2 (step-1 from the given sequence).
Finally the result from Temp2 is stored in R0 (step-4 from the given sequence).
Hence the correct sequence is (3, 5, 2, 1, 4).
Consider a non-pipelined processor operating at 2.5 GHz. It takes 5 clock cycles to complete an instruction. You are going to make a 5-stage pipeline out of this processor. Overheads associated with pipelining force you to operate the pipelined processor at 2 GHz. In a given program, assume that 30% are memory instructions, 60% are ALU instructions and the rest are branch instructions. 5% of the memory instructions cause stalls of 50 clock cycles each due to cache misses and 50% of the branch instructions cause stalls of 2 cycles each. Assume that there are no stalls associated with the execution of ALU instructions. For this program, the speedup achieved by the pipelined processor over the non-pipelined processor (round off to 2 decimal places) is _____.
It is given that each instruction takes 5 clock cycles to execute in the non-pipelined architecture, so time taken to execute each instruction = 5 * 0.4 = 2ns
In the pipelined architecture the clock cycle time = 1/2G = 0.5 ns
In the pipelined architecture there are stalls due to memory instructions and branch instructions.
In the pipeline, the updated clocks per instruction CPI = (1 + stall frequency due to memory operations * stalls of memory instructions + stall frequency due to branch operations * stalls due to branch instructions)
Out of the total instructions , 30% are memory instructions. Out of those 30%, only 5% cause stalls of 50 cycles each.
Stalls per instruction due to memory operations = 0.3*0.05*50 = 0.75
Out of the total instructions 10% are branch instructions. Out of those 10% of instructions 50% of them cause stalls of 2 cycles each.
Stalls per instruction due to branch operations = 0.1*0.5*2 = 0.1
The updated CPI in pipeline = 1 + 0.75 + 0.1 = 1.85
The execution time in the pipeline = 1.85 * 0.5 = 0.925 ns
The speed up = Time in non-pipelined architecture / Time in pipelined architecture = 2 / 0.925 = 2.16
A computer system with a word length of 32 bits has a 16 MB byte-addressable main memory and a 64 KB, 4-way set associative cache memory with a block size of 256 bytes. Consider the following four physical addresses represented in hexadecimal notation.
A1 = 0x42C8A4, A2 = 0x546888, A3 = 0x6A289C, A4 = 0x5E4880
Which one of the following is TRUE?
A1 and A4 are mapped to different cache sets.
A1 and A3 are mapped to the same cache set.
A3 and A4 are mapped to the same cache set.
A2 and A3 are mapped to the same cache set.
The word length is given as 32 bits and the physical addresses mentioned are all contain 6 hexadecimal digits, so the the physical address is 32 bits long.
Block size is 256 bytes, block offset = 8 bits as it is a byte addressable memory.
Cache size = 64KB
Number of blocks in the cache = 64KB/256B = 256
It is a 4-way set associative cache, so no. of sets in the cache = 256/4 = 64 = 26
In the physical address we need 6 bits for the SET number.
TAG bits = 32 – 6 – 8 = 18
So the 32 bits physical address is divided as (18 TAG bits + 6 SET number bits + 8 OFFSET bits)
Since in all the options we are asked about SET numbers of the given addresses, we need to find the SET number of each of the addresses.
A1 = 0x42C8A4, here SET number is (00 1000) which includes the last 2 bits of C(1100) and binary representation of 8 (1000).
A2 = 0x546888, here SET number is (10 1000) which includes the last 2 bits of 6(0110) and binary representation of 8 (1000).
A3 = 0x6A289C here SET number is (10 1000) which includes the last 2 bits of 2(0010) and binary representation of 8 (1000).
A4 = 0x5E4880 here SET number is (00 1000) which includes the last 2 bits of 4 (0100) and binary representation of 8 (1000).
From the given options option-4 is TRUE as A2, A3 are mapped to the same cache SET.
A processor has 64 registers and uses 16-bit instruction format. It has two types of instructions: I-type and R-type. Each I-type instruction contains an opcode, a register name, and a 4-bit immediate value. Each R-type instruction contains an opcode and two register names. If there are 8 distinct I-type opcodes, then the maximum number of distinct R-type opcodes is _____.
All possible binary combinations = 216
There are 64 registers, so no. of bits needed to identify a register = 6
I-type instruction has (Opcode+Register+4-bit immediate value). There are 8 distinct I-type instructions.
All the binary combinations possible with the I-type instructions are = 8*26*24 = 213
R-type instructions have 2 register operands.
Let x be the number of R-type instructions.
All the possible binary combinations of R-type instructions = x*26*26 = x*212
The sum of I-type and R-type binary combinations should be equal to 216.
x*212 + 213 = 216
212 (x+2) = 216
x+2 = 24
x = 16 – 2 = 14
DMA based I/O transfer
Interrupt driven I/O transfer
Polling based I/O transfer
Programmed I/O transfer
DMA or direct memory access allows the peripherals to directly communicate with each other using the memory buses, removing the intervention of the CPU. Hence it is the fast mode of IO transfer compared to all other methods. In polling, the processor wastes countless processor cycles by repeatedly checking each IO device if it needs attention.
Each cache block in WB and WT has a dirty bit.
Every write hit in WB leads to a data transfer from cache to main memory.
Eviction of a block from WT will not lead to data transfer from cache to main memory
A read miss in WB will never lead to eviction of a dirty block from WB.
Option B: Every write hit in WB need not lead to data transfer from cache to main memory, rather only when that particular block is being evicted there will be transfer from cache to main memory. There can be several write hits in WB before the block is evicted and all those writes will be propagated to the main memory at once. Hence the given statement is false.
Option C: The given statement is true because in WT the writes happen in parallel both in cache and main memory so at the eviction of a block in WT it will not lead to data transfer from cache to main memory.
Option D: A read miss in WB will fetch a new block from main memory and can lead to eviction of a dirty block. Hence the given statement is false.
h1 = 0.8
t1 = 10 ns
t2 = 100ns
After the optimization the average memory access time remains the same,
Let the new hit rate be h2. The new cache access time is 15ns.
Applying the average access time formula h1t1+(1-h1)(t1+t2)
0.8*10+0.2*(10+100) = h2*15+(1-h2)(15+100)
30 = 15 + (1-h2)*100
1-h2 = 0.15
h2 = 0.85
The TLB performs an associative search in parallel on all its valid entries using page number of incoming virtual address.
If the virtual address of a word given by CPU has a TLB hit, but the subsequent search for the word results in a cache miss, then the word will always be present in the main memory.
The memory access time using a given inverted page table is always same for all incoming virtual addresses.
In a system that uses hashed page tables, if two distinct virtual addresses V1 and V2 map to the same value while hashing, then the memory access time of these addresses will not be the same.
(B) True: TLB hit implies no page fault. Hence, the page-frame will always be in main memory.
(C) False: Inverted page table depends on the size of the process. Hence, the memory access time also varies.
(D) True: Collision resolution techniques are applied. Hence, the each address may take different time.
Every access to S is a hit.
Once P is brought to the cache it is never evicted.
At the end of the execution only R and S reside in the cache.
Every access to R evicts Q from the cache.
Block size = 64 bytes = 2^6 bytes, we need 6 bits OFFSET to identify each byte in the block.
No. of blocks in the cache = 2^11 / 2^6 = 2^5, since it is a direct mapped cache we need 5 bits in the physical address to identify the LINE number.
Each word is 16 bits long = 2 bytes.
Number of words in each block = 64/2 = 32 words.
The physical memory is 64KB = 2^16 bytes, so the physical address is 16 bits.
In direct mapped cache the physical address is divided into (TAG, LINE, OFFSET)
In the 16 bits physical address OFFSET is 6 bits and LINE number is 5 bits, so there are 5 TAG bits.
The given physical addresses of P, Q, R, S can be written in binary as below (each hexadecimal digit represents 4 binary digits):
P: 0xA248 (10100 01001 001000 )
Q: 0xC28A (11000 01010 001010)
R: 0xCA8A (11001 01010 001010)
S: 0xA262 (10100 01001 100010)
Here P is mapped to line number 9 in the cache. S also is mapped to line number 9 and not only that they both are from the same block as even the TAG bits are matching. So, even the first access of S will be a hit as it gets prefetched along with P. Once P is fetched it is never evicted.
Both Q and R are mapped to line number 10 in the cache. But they are from different blocks as the TAG bits are different. Since it is a direct mapped cache, every access of R evicts Q from the cache.
Hence options A, B, D are true. Option C is false as at the end of execution P, R, S are in the cache not just R, S.
It is given that there are 30% branch instructions and each branch instruction causes 2 stall cycles.
= 1+0.3*2 = 1.6
In X2 there is a branch predictor which predicts correctly 80% of the time. Only in 20% cases the prediction is wrong. So, out of the 30% of branch instructions only 20% of them now lead to stalls.
The effective CPI in X2 : (1+Stall frequency * No. of stall cycles) = (1+0.3*0.2*2) = 1.12
Since the processor speed remains the same, the clock cycle time also doesn’t change. Hence speedup of X2 over X1 : 1.6/1.12 = 1.43
Relative mode of addressing is most relevant to writing
position – independent code
Number of machine cycles required for RET instruction in 8085 microprocessor is
2 for stack operation.
Total no. of cycles = 2+1 = 3
For the daisy chain scheme of connecting I/O devices, which of the following statements is true?
It gives non-uniform priority to various devices.
It gives uniform priority to all devices
It is only useful for connecting slow devices to a processor device.
It requires a separate interrupt pin on the processor for each device.
→ In this all devices connected serially.
→ High priority devices placed first, followed by low priority devices.
A micro program control unit is required to generate a total of 25 control signals. Assume that during any microinstruction, at most two control signals are active. Minimum number of bits required in the control word to generate the required control signals will be
= 10 bits
An 8052 based system has an output port with address 00H. Consider the following assembly language program.
ORG 0100H MVI A, 00H LXI H, 0105H OUT 00H INR A PCHL HLT
(a) What does the program do with respect to the output port 00H?
(b) Show the wave forms at the three least significant bits of the port 00H.
Consider the following program in pseudo-pascal syntax. What is printed by the program if parameter a in procedure test 1 is passed as
(i) call-by-reference parameter
(ii) call-by-value-result parameter
program Example (input, output) var b: integer; procedure test2: begin b:=10; end procedure test1 (a:integer): begin a:=5; writeln ('point 1: ', a, b); test2; writeln ('point 2: ', a, b); end begin(*Example*) b:=3; test1(b); writeln('point3:', b); end
A hard disk is connected to a 50 MHz processor through a DMA controller. Assume that the initial set-up of a DMA transfer takes 1000 clock cycles for the processor, and assume that the handling of the interrupt at DMA completion requires 500 clock cycles for the processor. The hard disk has a transfer rate of 2000 Kbytes/sec and average block transferred is 4 K bytes. What fraction of the processor time is consumed by the disk, if the disk is actively transferring 100% of the time?
A computer system has a three level memory hierarchy, with access time and hit ratios as shown below:
(a) What should be the minimum sizes of level 1 and 2 memories to achieve an average access time of less than 100 nsec?
(b) What is the average access time achieved using the chosen sizes of level 1 and level 2 memories?
RST 7.5 interrupt in 8085 microprocessor executes the interrupt service routine from interrupt vector location
→ 60 in hexa decimal is 003CH.
Purpose of a start bit in RS 232 serial communication protocol is
to synchronize receiver for receiving every byte
to synchronize receiver for receiving a sequence of bytes
a parity bit
to synchronize receiver for receiving the last byte
The correct matching for the following pairs is
(A) DMA I/O (1) High speed RAM (B) Cache (2) Disk (C) Interrupt I/O (3) Printer (D) Condition Code Register (4) ALU
A – 4 B – 3 C – 1 D – 2
A – 2 B – 1 C – 3 D – 4
A – 4 B – 3 C – 2 D – 1
A – 2 B – 3 C – 4 D – 1
Cache → High speed RAM
Interrupt I/O → Printer
Condition code register → ALU
When an interrupt occurs, an operating system
ignores the interrupt
always changes state of interrupted process after processing the interrupt
always resumes execution of interrupted process after processing the interrupt
may change state of interrupted process to ‘blocked’ and schedule another process
Option B: Not always.
Option C: Not always. If some high priority interrupt comes during execution of current interrupt then it fails.
Option D: It is True always.
The expression (a*b)* c op….
where ‘op’ is one of ‘+’, ‘*’ and ‘↑’ (exponentiation) can be evaluated on a CPU with a single register without storing the value of (a * b) if
‘op’ is ’+’ or ‘*’
‘op’ is ’↑’ or ‘*’
‘op’ is ’↑’ or ‘+’
not possible to evaluate without storing
(a*b)*c + d
(a*b)*c * d
(a*b)*c ∧ d
In any case, brackets has the highest priority always. So I have to compute brackets first. Now, for + and *, I can do the rest of the operation and save results in the same register. But for exponentiation, I have to store the result of a*b, then do the computation of c∧d, then multiply these two results.
Hence, (A) is correct.
Contents of A register after the execution of the following 8085 microprocessor program is
MVI A, 55 H MVI C, 25 H ADD C DAA
A micro instruction into be designed to specify
- (a) none or one of the three micro operations of one kind and
(b) none or upto six micro operations of another kind
The minimum number of bits in the micro-instruction is
None of the above
So, no. of bits required
⌈log2 (3)⌉ = 2
(b) This is a horizontal micro-programming because at any given time atmost six micro-operations will be activated.
So, no. of bits required = 6
So, total minimum no. of bits required = 6+2 = 8
In serial communication employing 8 data bits, a parity bit and 2 stop bits, the minimum band rate required to sustain a transfer rate of 300 characters per second is
(8+2+1+1) * 300
Minimum band rate required is 4800 band.
Which of the following devices should get higher priority in assigning interrupts?
Which of the following addressing modes permits relocation without any change whatsoever in the code?
Base register addressing
PC relative addressing
Which of the following is true?
Unless enabled, a CPU will not be able to process interrupts.
Loop instructions cannot be interrupted till they complete.
A processor checks for interrupts before executing a new instruction.
Only level triggered interrupts are possible on microprocessors.
Option ‘C’ also false. A processor checks for the interrupt before fetching an instruction.
Formatting for a floppy disk refers to
arranging the data on the disk in contiguous fashion
writing the directory
erasing the system area
writing identification information on all tracks and sectors
The address space of 8086 CPU is
1 K Megabytes
(a) Draw the schematic of an 8085 based system that can be used to measure the width of a pulse. Assume that the pulse is given as a TTL compatible signal by the source which generates it.
(b) Write the 8085 Assembly Language program to measure the width of the pulse. State all your assumption clearly.
Calculate the total time required to read 35 sectors on a 2-sided floppy disk. Assume that each track has 8 sectors and the track-to-track step time is 8 milliseconds. The first sector to be read is sector 3 on track 10. Assume that the diskette is soft stored and the controller has a 1-sector buffer. The diskette spins at 300 RPM and initially, the head is on track 10.
For a set-associative Cache Organization, the parameters are as follows:
- tc — Cache access tine
tm — Main memory access time
l — number of sets
b — block size
k*b — set size
Calculate the hit ratio for a loop executed 100 times where the size of the loop is n * b and n= k * m is a non-zero integer and 1 < m ≤ l.
Given the value of the hit ratio for l = 1.
The main memory of a computer has 2 cm blocks while the cache has 2c blocks. If the cache uses the set associative mapping scheme with 2 blocks per set, then block k of the main memory maps to the set
(k mod m) of the cache
(k mod c) of the cache
(k mod 2c) of the cache
(k mod 2 cm) of the cache
∴ Cache set no. to which block k of main memory maps to
= (Main memory block no.) mod (Total set in cache)
= k mod c
Raid configurations of the disks are used to provide
High data density
None of the above
So, we can say that RAID is used to provide fault tolerance.
Arrange the following configuration for CPU in decreasing order of operating speeds: Hard wired control, vertical microprogramming, horizontal microprogramming.
Hard wired control, vertical micro-programming, horizontal micro- programming.
Hard wired control, horizontal micro-programming, vertical micro- programming.
Horizontal micro-programming, vertical micro-programming, Hard wired control.
Vertical micro-programming, horizontal micro-programming, hard wired control.
Now between horizontal and vertical microprogramming, the signals in vertical microprogramming are in encoded form which has to be decoded first using decoder. So, horizontal microprogramming is faster than vertical microprogramming.
The main differences(s) between a CISC and A RISC processor is/are that a RISC processor typically
has fewer instructions
has fewer addressing modes
has more registers
is easier to implement using hard-wired control logic
All the above
A certain processor supports only the immediate and the direct addressing modes. Which of the following programming language features cannot be implemented on this processor?
Recursive procedures with local variable
All the above
B) Cannot be implemented because arrays need Register indexing.
C) Records also needs pointers which needs indirect addressing modes, so this also cannot be implemented.
D) Recursive procedures needs stack, and so it needs stack pointers which in turn needs indirect addressing. So this also cannot be implemented.
An instruction pipeline consists of 4 stages: Fetch(F), Decode operand field (D), Execute (E), and Result-Write (W). The five instructions in a certain instruction sequence need these stages for the different number of clock cycles as shown by the table below.
Find the number of clock cycles needed to perform the 5 instructions.
Consider the following program fragment in the assembly language of a certain hypothetical processor. The processor has three general purpose registers R1, R2 and R3. The meanings of the instructions are shown by comments (starting with 😉 after the instructions.
X: CMP R1, 0 ; Compare R1 and 0, set flags appropriately in status register JZ Z ; Jump if zero to target Z MOV R2, R1 ; Copy contents of R1 to R2 SHR R1 ; Shift right R1 by 1 bit SHL R1 ; Shift left R1 by 1 bit CMP R2, R1 ; Compare R2 and R1 and set flag in status register JZ Y ; Jump if zero to target Y INC R3 ; Increment R3 by 1; Y: SHR R1 ; Shift right R1 by 1 bit JMP X ; Jump to target X Z:...
(a) Initially R1, R2 and R3 contain the value 5, 0 and 0 respectively. What are the final values of R1 and R3 when control reaches Z?
(b) In general, if R1, R2 and R3 initially contain the values n, 0 and 0 respectively. What is the final value of R3 when control reaches Z?
Design a 2K x 8 (2048 locations, each 8 bit wide) memory system mapped at addresses (1000)16 to (17FF)16 for the 8085 processor using four 1K x 4 memory chips. Each of these chips has the following signal pins:
- (i) (Chip select, data lines are in high impedance state when it is 1)
(ii) (0 for read operation)
(iii) (0 for write operation)
(iv) A0, A1, …A9(input address lines. A0 is the lest significant)
(v) D0, D1, D2, D3(bi-directional data lines. D0 is the least significant)
To put the 8085 microprocessor in the wait state
lower the HOLD input
lower the READY input
raise the HOLD input
raise the READY input
Comparing the time T1 taken for a single instruction on a pipelined CPU with time T2 taken on a non-pipelined but identical CPU, we can say that
T1 ≤ T2
T1 ≥ T2
T1 < T2
T1 is T2 plus the time taken for one instruction fetch cycle
Pipelining is an implementation technique where multiple instructions are overlapped in execution. It has a high throughput (amount of instructions executed per unit time). In pipelining, many instructions are executed at the same time and execution is completed in fewer cycles. The pipeline is filled by the CPU scheduler from a pool of work which is waiting to occur. Each execution unit has a pipeline associated with it, so as to have work pre-planned. The efficiency of pipelining system depends upon the effectiveness of CPU scheduler.
NON- PIPELINING SYSTEM:
All the actions (fetching, decoding, executing of instructions and writing the results into the memory) are grouped into a single step. It has a low throughput.
Only one instruction is executed per unit time and execution process requires more number of cycles. The CPU scheduler in the case of non-pipelining system merely chooses from the pool of waiting work when an execution unit gives a signal that it is free. It is not dependent on CPU scheduler.
The 8085 microprocessor responds to the present of an interrupt
as soon as the TRAP pin becomes ‘high’
by checking the TRAP pin for ‘high’ status at the end of each instruction each
by checking the TRAP pin for ‘high’ status at the end of the execution of each instruction
by checking the TRAP pin for ‘high’ status at regular intervals
The most appropriate matching for the following pairs
X: Indirect addressing 1 : Loops Y: Immediate addressing 2 : Pointers Z: Auto decrement addressing 3: Constants
X – 3 Y – 2 Z – 1
X – 1 Y – 3 Z – 2
X – 2 Y – 3 Z – 1
X – 3 Y – 1 Z – 2
Indirect addressing means that the address of the data is held in an intermediate location so that the address is first ‘looked up’ and then used to locate the data itself.
Immediate Addressing. An immediate operand has a constant value or an expression. When an instruction with two operands uses immediate addressing, the first operand may be a register or memory location, and the second operand is an immediate constant.
Auto increment or decrements can be one by using loops.