This paper explores personalized query reformulation techniques that utilize a user's Personal Information Repository (PIR), specifically the content of the user's desktop. By relying on keyword-, expression-, and summary-based methods, the study addresses the privacy concerns of traditional user-profile systems. The proposed algorithms enhance the search experience through automatic query expansion and relevance feedback using local and global document analyses. The effectiveness of various techniques, including term co-occurrence and lexical analysis, is tested against standard search algorithms, with query clarity and scope used to adapt the amount of expansion.
Personalized Query Expansion for the Web. P. Chirita, C. S. Firan, & W. Nejdl. Published in SIGIR '07.
Introduction • Web query reformulation by exploiting the user's Personal Information Repository (PIR) • The Desktop (as a PIR) is a rich repository of information about the user's interests • Keyword-, expression-, and summary-based techniques are proposed.
Previous Work • Personalized Search • User profiles: • e.g., user profiling based on browsing history • Requires server-side storage of all personal information, raising privacy concerns • The actual search algorithm • Builds the personalization aspect directly into PageRank (biased toward a target set of pages)
Previous Work • Automatic Query Expansion • Exploiting various social or collection-specific characteristics to generate additional terms • Relevance Feedback Techniques • TF, DF, summarization • Co-occurrence Based Techniques • Highly co-occurring terms and terms in lexical affinity relationships are added • Thesaurus Based Techniques: WordNet • Terms closely related in meaning are added
Expanding with Local Desktop Analysis • TF • DF • Given the set of top-K relevant Desktop documents: • Generate their snippets, focused on the original search request • Identify the set of candidate terms • Order them according to their associated DF scores (see the sketch below). Term scores are adjusted by position, where nrWords is the total number of terms in the document and pos is the position of the term's first appearance.
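A minimal Python sketch of the local Desktop expansion step, assuming a position-adjusted TF weighting built from nrWords and pos and a simple DF ordering of candidate terms; the function names and the exact weighting are illustrative, not the paper's formulas.

```python
import math
from collections import Counter

def term_score(tf, pos, nr_words):
    # Assumed position-adjusted TF weighting: terms that occur often (high tf)
    # and appear early in the document (small pos) score higher.
    return (0.5 + 0.5 * (nr_words - pos) / nr_words) * math.log(1 + tf)

def expansion_candidates(top_k_docs, stopwords, n_terms=5):
    # Order candidate terms from the top-K relevant Desktop documents by DF.
    df = Counter()
    for tokens in top_k_docs:  # each document is given as a list of tokens
        df.update({t for t in tokens if t not in stopwords})
    return [term for term, _ in df.most_common(n_terms)]
```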
Lexical Compounds • Uses simple noun analysis • Sentence Selection • Identify the set of relevant Desktop documents • Generate a summary containing their most important sentences • Sentences are kept only above a threshold score (sketch below)
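A rough sentence-selection sketch, assuming sentences are scored by their overlap with the original query terms and kept only when the score exceeds a threshold; the paper's actual importance scoring may differ.

```python
import re

def select_sentences(desktop_docs, query_terms, threshold=0.2):
    # Keep sentences whose query-term overlap exceeds the threshold,
    # ordered by score, as a crude document summary.
    query = {t.lower() for t in query_terms}
    scored = []
    for doc in desktop_docs:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            tokens = {w.lower() for w in re.findall(r"\w+", sentence)}
            if not tokens:
                continue
            score = len(tokens & query) / len(query)
            if score >= threshold:
                scored.append((score, sentence))
    return [s for _, s in sorted(scored, key=lambda x: x[0], reverse=True)]
```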
Term Co-occurrence Statistics (similarity coefficients) • Cosine Similarity • Mutual Information • Likelihood Ratio
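The three coefficients, sketched here from document-frequency counts over the Desktop collection in their common textbook forms; normalization details in the paper may differ.

```python
import math

def cosine_similarity(df_a, df_b, df_ab):
    # CS(a, b) = DF(a, b) / sqrt(DF(a) * DF(b))
    return df_ab / math.sqrt(df_a * df_b)

def mutual_information(df_a, df_b, df_ab, n_docs):
    # Pointwise MI, assuming document-level probabilities P(x) = DF(x) / N.
    return math.log((df_ab * n_docs) / (df_a * df_b))

def likelihood_ratio(df_a, df_b, df_ab, n_docs):
    # Dunning-style log-likelihood ratio over the 2x2 co-occurrence table.
    k = [[df_ab, df_a - df_ab],
         [df_b - df_ab, n_docs - df_a - df_b + df_ab]]
    rows = [sum(k[0]), sum(k[1])]
    cols = [k[0][0] + k[1][0], k[0][1] + k[1][1]]
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n_docs
            if k[i][j] > 0:
                g += k[i][j] * math.log(k[i][j] / expected)
    return 2.0 * g
```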
Experiments • 4 queries were chosen: • One very frequent AltaVista query • One randomly selected log query • One self-selected specific query • One self-selected ambiguous query • The top-5 URLs generated by 20 versions of the algorithms were collected and shuffled; each subject assessed about 325 documents for the 4 queries • Each document received a rating ranging from 0 to 2 • Results were assessed with NDCG (Normalized Discounted Cumulative Gain); see the sketch below • A T-test was performed for statistical significance
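A small sketch of the NDCG assessment, assuming the common log2 rank discount over the 0-2 relevance ratings described above; the exact gain function used in the paper is not reproduced here.

```python
import math

def dcg(gains):
    # Discounted cumulative gain with the usual log2 rank discount.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(ratings):
    # ratings: 0-2 relevance judgments for the returned URLs, in rank order.
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0

# Example: judgments for the top-5 URLs of one algorithm on one query.
print(ndcg([2, 0, 1, 2, 0]))
```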
Algorithms Tested • Baseline: Google • TF, DF • LC, LC(O): Lexical Compounds, regular and optimized (considering only the top compound) • SS: Sentence Selection • TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics using Cosine Similarity, Mutual Information, and Likelihood Ratio, respectively, as similarity coefficients • WN[SYN], WN[SUB], WN[SUP]: WordNet with synonyms, sub-concepts, and super-concepts
Adaptivity • Query Scope • Query Clarity: ClarityScore(Q) = Σ_{w ∈ Q} P(w|Q) · log( P(w|Q) / P_coll(w) ), where P(w|Q) is the probability of the word w within the submitted query and P_coll(w) is the probability of w within the entire collection of documents (sketch below)
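A sketch of the clarity computation, assuming unigram language models for the query and for the whole document collection; the smoothing of unseen terms is an arbitrary choice here.

```python
import math
from collections import Counter

def query_clarity(query_terms, collection_counts, collection_size):
    # ClarityScore(Q) = sum_w P(w|Q) * log( P(w|Q) / P_coll(w) )
    q = Counter(t.lower() for t in query_terms)
    clarity = 0.0
    for w, count in q.items():
        p_wq = count / len(query_terms)
        p_wc = collection_counts.get(w, 1) / collection_size  # crude smoothing
        clarity += p_wq * math.log2(p_wq / p_wc)
    return clarity
```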
Query Formulation Process • The newly added terms are more likely to convey information about the user's search goals • Therefore, more weight is given to the new keywords (illustrative sketch below)
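Purely illustrative: one way to give extra weight to the new keywords when assembling the expanded query. The '^' boost syntax (Lucene-style) and the boost value are assumptions, not the paper's mechanism.

```python
def formulate_query(original_terms, expansion_terms, boost=1.5):
    # Keep the original keywords as-is and boost the newly added ones.
    parts = list(original_terms)
    parts += [f"{term}^{boost}" for term in expansion_terms]
    return " ".join(parts)

# formulate_query(["jaguar"], ["car", "dealer"]) -> "jaguar car^1.5 dealer^1.5"
```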
Application to the Project • News articles collected by the user can be treated as the user's desktop, so we can apply these algorithms to our system.