CHENNAI-MATHEMATICAL INSTITUTE-(M.Sc. / Ph.D. Programme-in-Computer -Science)-15-May-2013

CHENNAI MATHEMATICAL INSTITUTE (M.Sc. / Ph.D. Programme in Computer Science)15 May 2013

Please wait while the activity loads.
If this activity does not load, try refreshing your browser. Also, this page requires javascript. Please visit using a browser with javascript enabled.

If loading fails, click here to try again

Question 1

Ball Mart has 10⁷ different items in stock across all its stores worldwide. The company has collected billing data for 10 ¹⁰ customer transactions. Each individual bill has at most 10 distinct items in it. Ball Mart’s CEO wants to optimize the company’s inventory and has asked for a list of those items that appear in at least 2% of the billed transactions. Which of the following is the most precise upper bound one can compute for the number of such items, given the data?

A	500
B	1000
C	5000
D	20000

AlgorithmsTime-Complexity

Question 1 Explanation:

An item that is in 2% of the bills must appear in 2 × 10⁸ bills. Across all bills, there are at most (10 ¹⁰ ) × 10 = 10 ¹¹ items mentioned. So at most (10 ¹¹ )/(2 × 10⁸ ) = 500 items can appear in 2% of the bills. The number of items in stock is irrelevant.

Question 2

10% of all email you receive is spam. Your spam filter is 90% reliable: that is, 90% of the mails it marks as spam are indeed spam and 90% of spam mails are correctly labelled as spam. If you see a mail marked spam by your filter, what is the probability that it really is spam?

A	10%
B	50%
C	70%
D	90%

Engineering-MathematicsProbability

Question 2 Explanation:

Out of 100 mails, 10 are spam. The filter will label 9 of 10 spam as spam and 9 of 90 non-spam as spam. So 18 are labelled spam, of which 9 are actually spam. You can compute the same result more formally using conditional probabilities.

Question 3

When a user submits a query, a search engine does the following. For every webpage that has been visited by the search engine, it computes a score indicating how relevant that page is to the query. Finally, it reports the pages with the top k scores on the screen, for a number k specified by the user. A good data structure for accumulating the scores and ranking them is:

A	a queue
B	a heap
C	a stack
D	a binary search tree

Data-StructuresHeap-Tree

Question 3 Explanation:

Let n be the number of pages visited by the search engine at the time a query is submitted. Assume that it takes constant time to compute the relevance score for each page w.r.t. a query. Then it takes O(n) time to compute the relevance scores, a further O(n) time to build a heap of n relevance scores, and O(k · log n) time for k delete-max operations to return the top k scores.

Question 4

Consider the set of all words over the alphabet {x, y, z} where the number of y’s is not divisible by 2 or 7 and no x appears after a z. This language is:

A	regular
B	not known to be regular
C	context-free but not regular
D	recursively enumerable but not context-free

Theory-of-ComputationRegular-Language

Question 4 Explanation:

Let Σ = {x, y, z}. The language in question is (Σ ∗ \ (L ₁ ∪ L ₂ )) ∩ L ₃ where
L ₁ = {w ∈ Σ ∗ | the number of y’s is divisible by 2},
L ₂ = {w ∈ Σ ∗ | the number of y’s is divisible by 7}, and
L ₃ = {w ∈ Σ ∗ | no x appears after a z}.
It suffices to show that all these three languages are regular, and appeal to the fact that regular languages are closed under boolean operations.
They are described by the following regular expressions, respectively.
r ₁ = ((x + z) ∗ y(x + z) ∗ y(x + z) ∗ ) ∗
r ₂ = ((x + z) ∗ y(x + z) ∗ y(x + z) ∗ y(x + z) ∗ y(x + z) ∗ y(x + z) ∗ y(x + z) ∗ y(x + z) ∗ ) ∗
r ₃ = (x + y) ∗ (y + z) ∗

There are 4 questions to complete.

Access quiz wise question and answers by becoming as a solutions adda PRO SUBSCRIBER with Ad-Free content

If you have registered and made your payment please contact solutionsadda.in@gmail.com to get access