Mutual Information and Choice of AND and OR


Dayu

18 Nov 2005

An Example
• Query No.605 Great Britain health care
• We choose it because it consists of 4 terms
• Performance in MAP
Using Two terms
• Based on the performance (in MAP) of the “AND” query and the “OR” query built from two terms, we guess how the two terms jointly affect relevance
• Great Britain health care → G B H C
What does “Yes” mean?
• If “Yes” (i.e. MAP_AND > MAP_OR), it means that these two terms complement or disambiguate each other, so that together they retrieve more relevant information.
• Denoted by term1-term2
• If “No” (i.e. MAP_AND < MAP_OR), it means that these two terms
• (1) seldom co-occur, or
• (2) are more or less synonyms
• Denoted by (term1,term2)
• If MAP_AND ≈ MAP_OR, it means that these two terms always co-occur
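The three cases above can be sketched as a small decision rule. This is only an illustration: the function name `classify_pair` and the tolerance `tol` used for "approximately equal" are assumptions, since the slides do not say how close the two MAP values must be to count as equal.

```python
def classify_pair(map_and: float, map_or: float, tol: float = 0.005) -> str:
    """Label a term pair from the MAP of its AND query and of its OR query.

    tol is a hypothetical threshold for "MAP_AND is approximately MAP_OR";
    the slides do not specify one.
    """
    if abs(map_and - map_or) <= tol:
        # MAP_AND ≈ MAP_OR: the two terms always co-occur
        return "always co-occur"
    if map_and > map_or:
        # "Yes": the terms complement or disambiguate each other
        return "term1-term2"
    # "No": the terms seldom co-occur or are more or less synonyms
    return "(term1,term2)"
```

With made-up MAP values, `classify_pair(0.10, 0.05)` falls in the "Yes" case and `classify_pair(0.05, 0.10)` in the "No" case.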
Overall Relationships

In conclusion, the relationships of each pair of the four terms are consistent. The overall structure is (G,B)-(H,C).

• (G,B)-(H,C)
• Could we use (G or B) and (H or C)?
• Performance MAP=0.0762
• Compared with:
A Method to estimate the relationship using MI
• By mutual information.
• MI = P(A,B) / (P(A) P(B))
• P(A,B) = (# of IUs containing both A and B) / (total # of IUs)
• P(A) = (# of IUs containing A) / (total # of IUs)
• P(B) = (# of IUs containing B) / (total # of IUs)
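A minimal sketch of this computation, assuming each IU can be represented as a set of terms (the slides do not say how IUs are stored, and the function name `mutual_information` and the toy data are hypothetical). It computes the plain ratio defined above, without a logarithm.

```python
def mutual_information(ius, term_a, term_b):
    """Return P(A,B) / (P(A) * P(B)) over a collection of IUs.

    ius: list of sets of terms, one set per IU (an assumed representation).
    """
    total = len(ius)
    n_a = sum(1 for iu in ius if term_a in iu)
    n_b = sum(1 for iu in ius if term_b in iu)
    n_ab = sum(1 for iu in ius if term_a in iu and term_b in iu)
    if total == 0 or n_a == 0 or n_b == 0:
        raise ValueError("MI is undefined when a term never occurs")
    p_a = n_a / total
    p_b = n_b / total
    p_ab = n_ab / total
    return p_ab / (p_a * p_b)


# Toy collection of four IUs, invented for illustration only.
toy_ius = [
    {"great", "britain"},
    {"great", "health"},
    {"britain"},
    {"health", "care"},
]
mi_gb = mutual_information(toy_ius, "great", "britain")
mi_hc = mutual_information(toy_ius, "health", "care")
```

Under this definition a value of 1 means the two terms co-occur exactly as often as independence would predict; values above 1 mean they co-occur more often than that.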

Hypothesis: The larger the MI of two terms, the more confidence we have in combining them with OR

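Terms: Social, Tax, Securities
• [Figure: diagram of pairwise MI, labels a, b, c]
• MI values: a = 1.07, b = 0.78, c = 0.79
• Variance of MI = 0.019

Query: SDI Star Wars
• [Figure: diagram of pairwise MI, labels a, b, c]
• MI values: a = 1.1, b = 0.8, c = 0.4
• Variance of MI = 0.076

• [Figure: diagram of pairwise MI, labels a, b, c]
• MI values: a = 0.41, b = 0.56, c = 0.23
• Variance of MI = 0.017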
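The variance of the three pairwise MI values for a query can be computed as below. This sketch assumes the population variance (dividing by the number of pairs), which the slides do not state, and the function name `mi_variance` is hypothetical. Applied to the rounded values 0.78, 1.07, 0.79 it gives about 0.018, close to but not identical to the reported 0.019, presumably because the reported figures were computed from unrounded MI values.

```python
from statistics import pvariance


def mi_variance(mi_values):
    """Population variance of a query's pairwise MI values.

    Using the population (not sample) variance is an assumption.
    """
    return pvariance(mi_values)


# Rounded MI values as they appear in the first example.
var_first = mi_variance([0.78, 1.07, 0.79])
```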

Future Work
• Investigate a wider range of queries.
• Does the variance of the MI across term pairs affect whether to use AND or OR?
• Should we additionally bring the MI of two terms into the computation of the allo-T edge?