Exercise 1.2 Consider these documents:

Doc 1     breakthrough drug for schizophrenia

Doc 2     new schizophrenia drug

Doc 3     new approach for treatment of schizophrenia

Doc 4     new hopes for schizophrenia patients

a. Draw the term-document incidence matrix for this document collection.

Term/Doc        Doc 1   Doc 2   Doc 3   Doc 4
approach        0       0       1       0
breakthrough    1       0       0       0
drug            1       1       0       0
for             1       0       1       1
hopes           0       0       0       1
new             0       1       1       1
of              0       0       1       0
patients        0       0       0       1
schizophrenia   1       1       1       1
treatment       0       0       1       0
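The matrix above can be reproduced mechanically. The following is a minimal Python sketch, assuming simple whitespace tokenization with no stemming or stop-word removal; the names `docs` and `incidence_matrix` are illustrative and not part of the exercise.

```python
# The four documents from the exercise, keyed by docID.
docs = {
    1: "breakthrough drug for schizophrenia",
    2: "new schizophrenia drug",
    3: "new approach for treatment of schizophrenia",
    4: "new hopes for schizophrenia patients",
}


def incidence_matrix(docs):
    """Return (sorted terms, sorted docIDs, {term: [0/1 per document]})."""
    doc_ids = sorted(docs)
    # Assumption: tokens are split on whitespace only.
    tokens = {d: set(text.split()) for d, text in docs.items()}
    terms = sorted(set().union(*tokens.values()))
    matrix = {t: [1 if t in tokens[d] else 0 for d in doc_ids] for t in terms}
    return terms, doc_ids, matrix


terms, doc_ids, matrix = incidence_matrix(docs)

# Print one row per term, one column per document.
print(f"{'Term/Doc':<16}" + "".join(f"Doc {d:<4}" for d in doc_ids))
for t in terms:
    print(f"{t:<16}" + "".join(f"{bit:<8}" for bit in matrix[t]))
```

Each row is the bit vector for one term, so a Boolean AND query corresponds to a bitwise AND of the relevant rows.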

b. Draw the inverted index representation for this collection, as in Figure 1.3 (page 7).

TERM            docID
breakthrough    1
drug            1
for             1
schizophrenia   1
new             2
schizophrenia   2
drug            2
new             3
approach        3
for             3
treatment       3
of              3
schizophrenia   3
new             4
hopes           4
for             4
schizophrenia   4
patients        4

TERM            docID
approach        3
breakthrough    1
drug            1
drug            2
for             1
for             3
for             4
hopes           4
new             2
new             3
new             4
of              3
patients        4
schizophrenia   1
schizophrenia   2
schizophrenia   3
schizophrenia   4
treatment       3

TERM            doc freq   Postings list
approach        1          3
breakthrough    1          1
drug            2          1 → 2
for             3          1 → 3 → 4
hopes           1          4
new             3          2 → 3 → 4
of              1          3
patients        1          4
schizophrenia   4          1 → 2 → 3 → 4
treatment       1          3
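The three tables above correspond to the usual construction steps: collect (term, docID) pairs, sort them, then group them into a dictionary with postings lists. A minimal Python sketch of those steps follows, assuming whitespace tokenization; the function name `build_inverted_index` is illustrative.

```python
from collections import defaultdict

docs = {
    1: "breakthrough drug for schizophrenia",
    2: "new schizophrenia drug",
    3: "new approach for treatment of schizophrenia",
    4: "new hopes for schizophrenia patients",
}


def build_inverted_index(docs):
    """Return {term: sorted list of docIDs}, with terms in sorted order."""
    # Step 1: (term, docID) pairs in document order.
    pairs = [(term, doc_id)
             for doc_id, text in sorted(docs.items())
             for term in text.split()]
    # Step 2: sort by term, then by docID.
    pairs.sort()
    # Step 3: group into postings lists, skipping duplicate docIDs.
    index = defaultdict(list)
    for term, doc_id in pairs:
        if not index[term] or index[term][-1] != doc_id:
            index[term].append(doc_id)
    return dict(index)


index = build_inverted_index(docs)
for term, postings in index.items():
    # The document frequency is the length of the postings list.
    print(f"{term:<16}{len(postings):<11}{' -> '.join(map(str, postings))}")
```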

Exercise 1.10 Write out a postings merge algorithm, in the style of Figure 1.6 (page 11), for an x OR y query.
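A linear-time merge for x OR y can also be sketched in Python. This is only an illustration under stated assumptions: postings are represented as sorted Python lists of docIDs rather than linked lists, and the function name `union` is hypothetical.

```python
def union(x, y):
    """Merge two sorted postings lists into their sorted union."""
    answer = []
    i = j = 0
    # Walk both lists in step, always emitting the smaller docID.
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            answer.append(x[i])  # present in both lists: emit once
            i += 1
            j += 1
        elif x[i] < y[j]:
            answer.append(x[i])
            i += 1
        else:
            answer.append(y[j])
            j += 1
    # One list is exhausted; the rest of the other belongs to the union.
    answer.extend(x[i:])
    answer.extend(y[j:])
    return answer


# Postings for "drug" and "new" from Exercise 1.2.
print(union([1, 2], [2, 3, 4]))  # [1, 2, 3, 4]
```

Unlike an intersection, the union must emit the smaller docID on every step and must also copy whatever remains once one list runs out.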

UNION(X, Y)
  answer ← ( )
  while X ≠ NIL and Y ≠ NIL
  do if docID(X) = docID(Y)
       then ADD(answer, docID(X))
            X ← next(X)
            Y ← next(Y)
       else if docID(X) < docID(Y)
              then ADD(answer, docID(X))
                   X ← next(X)
              else ADD(answer, docID(Y))
                   Y ← next(Y)
  while X ≠ NIL
  do ADD(answer, docID(X))
     X ← next(X)
  while Y ≠ NIL
  do ADD(answer, docID(Y))
     Y ← next(Y)
  return answer

Exercise 1.7 Recommend a query processing order for d. (tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes) given the following postings list sizes:

TERM            doc freq
eyes            213312
kaleidoscope    87009
marmalade       107913
skies           271653
tangerine       46653
trees           316812

(tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes)

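Estimating the size of each OR by the sum of its terms' document frequencies gives tangerine OR trees = 46653 + 316812 = 363465, marmalade OR skies = 107913 + 271653 = 379566, and kaleidoscope OR eyes = 87009 + 213312 = 300321. Processing the conjuncts in increasing order of estimated size, the recommended order is (kaleidoscope OR eyes) AND (tangerine OR trees) AND (marmalade OR skies).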
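One common heuristic for a query of this shape is to estimate the size of each OR conservatively as the sum of its terms' document frequencies, then process the conjuncts in increasing order of that estimate. A small Python sketch of that calculation follows; the names `doc_freq`, `query`, and `estimated_size` are illustrative.

```python
# Document frequencies from the exercise.
doc_freq = {
    "eyes": 213312,
    "kaleidoscope": 87009,
    "marmalade": 107913,
    "skies": 271653,
    "tangerine": 46653,
    "trees": 316812,
}

# The query as a conjunction of disjunctions.
query = [
    ("tangerine", "trees"),
    ("marmalade", "skies"),
    ("kaleidoscope", "eyes"),
]


def estimated_size(disjunction, doc_freq):
    """Upper bound on the size of an OR: the sum of its terms' frequencies."""
    return sum(doc_freq[term] for term in disjunction)


# Process the smallest estimated intermediate result first.
order = sorted(query, key=lambda d: estimated_size(d, doc_freq))
for disjunction in order:
    print(f"({' OR '.join(disjunction)})", estimated_size(disjunction, doc_freq))
```

The sum is only an upper bound, since documents containing both terms are counted twice, but it is cheap to compute from the dictionary alone.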

Boolean query results on Google and Yahoo: the query submitted to Google was spiderman and catwomen, and the same query, spiderman and catwomen, was submitted to Yahoo. In terms of indexing and ranking, Google.com gave the better results for this query, because its comparison of term similarity is more detailed.