This study examines how the mutual information (MI) between query terms relates to the choice between the "AND" and "OR" operators, using the query "Great Britain health care" as its main example. By comparing Mean Average Precision (MAP) for AND and OR queries, it infers how each pair of terms affects relevance and arrives at the relationship (G,B)-(H,C). It then discusses an advanced Boolean query that combines the terms according to this relationship, and examines how MI relates to the relative difference between MAP for OR and MAP for AND. Future work includes testing a wider range of queries and investigating whether the variance of MI across term pairs should influence the choice between AND and OR in retrieval algorithms.
Mutual Information and Choice of AND and OR
Dayu, 18 Nov 2005
An Example • Query No. 605: Great Britain health care • We chose it because it consists of four terms • Performance in MAP
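Every comparison in these slides is made in terms of MAP. As a reminder of how that metric is usually computed, here is a minimal Python sketch of non-interpolated average precision (AP) for one query and MAP as the mean of AP over queries. The function names and the input format (a ranked list of document IDs plus a set of relevant IDs) are illustrative assumptions, not the evaluation tool used in the study.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: sum of precision@k at each relevant hit, over |relevant|."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: mean of AP over (ranking, relevant set) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy run: relevant hits at ranks 1 and 3, one relevant doc missed.
print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"}))
```

With a single query such as No. 605, MAP reduces to that query's AP.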
Using Two Terms • Based on the MAP of two-term “AND” and “OR” queries, we infer how each pair of terms affects relevance • Great Britain health care: G = Great, B = Britain, H = health, C = care
What does “Yes” mean? • If “Yes” (i.e., MAP_AND > MAP_OR), the two terms complement or disambiguate each other, so requiring both retrieves more relevant information. • Denoted by term1-term2 • If “No” (i.e., MAP_AND < MAP_OR), the two terms either • (1) seldom co-occur, or • (2) are more or less synonyms • Denoted by (term1,term2) • If MAP_AND ≈ MAP_OR, the two terms always co-occur
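The Yes/No rule for a term pair can be written as a small classifier. This is a sketch under one stated assumption: the slides do not define how close the two MAP values must be to count as "≈", so the relative tolerance `rel_tol` (default 5%) is a hypothetical parameter. The names `Relation`, `classify_pair`, and `notation` are likewise illustrative.

```python
import math
from enum import Enum
from typing import Optional


class Relation(Enum):
    COMPLEMENT = "complement or disambiguate each other"  # MAP_AND > MAP_OR
    ALTERNATIVE = "seldom co-occur or near-synonyms"      # MAP_AND < MAP_OR
    CO_OCCUR = "always co-occur"                          # MAP_AND ≈ MAP_OR


def classify_pair(map_and: float, map_or: float, rel_tol: float = 0.05) -> Relation:
    """Apply the Yes/No rule; rel_tol is an ASSUMED threshold for '≈'."""
    if math.isclose(map_and, map_or, rel_tol=rel_tol):
        return Relation.CO_OCCUR
    return Relation.COMPLEMENT if map_and > map_or else Relation.ALTERNATIVE


def notation(term1: str, term2: str, relation: Relation) -> Optional[str]:
    """Slide notation: term1-term2 for 'Yes', (term1,term2) for 'No'."""
    if relation is Relation.COMPLEMENT:
        return f"{term1}-{term2}"
    if relation is Relation.ALTERNATIVE:
        return f"({term1},{term2})"
    return None  # the slides give no notation for the '≈' case
```

For example, with hypothetical values MAP_AND = 0.02 and MAP_OR = 0.05, `classify_pair` returns `Relation.ALTERNATIVE`, and `notation("G", "B", ...)` yields `(G,B)`.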
Overall Relationships In conclusion, the pairwise relationships among the four terms are consistent, giving (G,B)-(H,C): G and B favor OR, as do H and C, while the two groups favor AND.
Advanced Boolean Operation • (G,B)-(H,C) • Could we use (G OR B) AND (H OR C)? • Performance: MAP = 0.0762 • Compared with:
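The query (G OR B) AND (H OR C) is a conjunction of disjunctions: OR within each group, AND across groups. The following minimal Python sketch shows its matching semantics over a toy inverted index. The index contents and function names are hypothetical, and the sketch only returns the unranked set of matching documents, since the slides do not say how the Boolean results were ranked to obtain a MAP score.

```python
from typing import Dict, List, Set

# Hypothetical inverted index: term -> set of IDs of documents containing it.
Index = Dict[str, Set[int]]


def or_group(index: Index, terms: List[str]) -> Set[int]:
    """Documents that contain at least one term of the group (OR)."""
    matched: Set[int] = set()
    for term in terms:
        matched |= index.get(term, set())  # union; does not mutate the index
    return matched


def and_of_ors(index: Index, groups: List[List[str]]) -> Set[int]:
    """Documents that match every group (AND), each group being an OR of terms."""
    if not groups:
        return set()
    result = or_group(index, groups[0])
    for group in groups[1:]:
        result &= or_group(index, group)  # intersection across groups
    return result


toy_index: Index = {"great": {1, 2}, "britain": {2, 3}, "health": {3, 4}, "care": {1}}
print(and_of_ors(toy_index, [["great", "britain"], ["health", "care"]]))
```

Here the first group matches documents {1, 2, 3} and the second matches {1, 3, 4}, so only documents 1 and 3 satisfy the whole query.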
A Method to Estimate the Relationship Using Mutual Information • MI = P(A,B) / (P(A)·P(B)) • P(A,B) = # of IUs containing both A and B / total # of IUs • P(A) = # of IUs containing A / total # of IUs • P(B) = # of IUs containing B / total # of IUs • Hypothesis: the larger the MI, the more confidence we have in using OR
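The MI estimate above can be computed directly from IU counts. This minimal Python sketch assumes each IU is represented as a set of terms (an assumption; the slides do not specify a representation) and follows the slide's formula, which is the co-occurrence ratio itself. Note that the conventional pointwise mutual information is the logarithm of this ratio; the ratio equals 1 when the two terms occur independently and exceeds 1 when they co-occur more often than chance.

```python
from typing import List, Set


def mi_ratio(ius: List[Set[str]], a: str, b: str) -> float:
    """MI = P(A,B) / (P(A) * P(B)), with probabilities estimated from IU counts."""
    total = len(ius)
    n_a = sum(1 for iu in ius if a in iu)                # IUs containing A
    n_b = sum(1 for iu in ius if b in iu)                # IUs containing B
    n_ab = sum(1 for iu in ius if a in iu and b in iu)   # IUs containing both
    if total == 0 or n_a == 0 or n_b == 0:
        raise ValueError("MI is undefined when a term never occurs")
    p_ab = n_ab / total
    p_a = n_a / total
    p_b = n_b / total
    return p_ab / (p_a * p_b)


# Hypothetical toy collection of four IUs.
toy_ius = [{"great", "britain"}, {"great"}, {"health"}, {"great", "britain", "health"}]
print(mi_ratio(toy_ius, "great", "britain"))
```

In this toy collection P(A,B) = 2/4, P(A) = 3/4, and P(B) = 2/4, so the ratio is 0.5 / 0.375 = 4/3.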
Relationship between MI and (MAP_OR - MAP_AND) / min(MAP_AND, MAP_OR)
[Diagram: term pairs among Social, Tax, Securities; edges a = 1.07, b = 0.78, c = 0.79] Variance of MI = 0.019
Query: SDI Star Wars [Diagram: term pairs; edges a = 1.1, b = 0.8, c = 0.4] Variance of MI = 0.076
Query: college education advantage [Diagram: term pairs; edges a = 0.41, b = 0.56, c = 0.23] Variance of MI = 0.017
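The two quantities compared on this slide can be sketched in Python as follows. Two assumptions apply: the values on the diagram edges are taken to be the pairwise MI values, and "variance" is taken to mean population variance, because the slides specify neither. The function names are hypothetical. Under these assumptions, the displayed values give roughly 0.018, 0.082, and 0.018 for the three queries, close to but not identical with the reported 0.019, 0.076, and 0.017, possibly because the displayed values are rounded.

```python
import statistics
from typing import List


def relative_map_difference(map_and: float, map_or: float) -> float:
    """(MAP_OR - MAP_AND) / min(MAP_AND, MAP_OR); > 0 favors OR, < 0 favors AND."""
    smaller = min(map_and, map_or)
    if smaller <= 0:
        raise ValueError("both MAP values must be positive")
    return (map_or - map_and) / smaller


def mi_variance(pair_mi: List[float]) -> float:
    """Population variance of the MI values of all term pairs (ASSUMED definition)."""
    return statistics.pvariance(pair_mi)


# Edge values as displayed in the three diagrams (order does not affect variance).
for name, values in [
    ("Social / Tax / Securities", [0.78, 1.07, 0.79]),
    ("SDI Star Wars", [0.8, 1.1, 0.4]),
    ("college education advantage", [0.56, 0.41, 0.23]),
]:
    print(f"{name}: variance = {mi_variance(values):.3f}")
```

Dividing by the smaller MAP value makes the measure antisymmetric: swapping the two values flips only the sign. For example, (0.05, 0.10) gives 1.0 and (0.10, 0.05) gives -1.0.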
Future Work • Investigate a wider range of queries. • Does the variance of MI across term pairs affect the choice between AND and OR? • Should we also bring the MI of two terms into the computation of the allo-T edge?