
Personalized Query Expansion for the Web

P. Chirita, C. S. Firan, & W. Nejdl. Published in SIGIR '07. The paper proposes Web query reformulation by exploiting the user's Personal Information Repository (PIR): the Desktop, used as a PIR, is a rich repository of information about the user's interests.


Presentation Transcript


  1. Personalized Query Expansion for the Web P. Chirita, C. S. Firan, & W. Nejdl Published in SIGIR 07

  2. Introduction • Web query reformulation by exploiting the user's Personal Information Repository (PIR) • The Desktop (as a PIR) is a rich repository of information about the user's interests. • Keyword-, expression-, and summary-based expansion techniques are proposed.

  3. Previous Work • Personalized Search • User profiles: • e.g., user profiling based on browsing history • Requires server-side storage of all personal information, raising privacy concerns. • The actual search algorithm • Builds the personalization aspect directly into PageRank (biased toward a target set of pages)

  4. Previous Work • Automatic Query Expansion • Exploiting various social or collection-specific characteristics to generate additional terms • Relevance Feedback Techniques • TF, DF, summarization • Co-occurrence Based Techniques • Highly co-occurring terms and terms in lexical affinity relationships are added. • Thesaurus Based Techniques: WordNet • Terms closely related in meaning are added.

  5. Expanding with Local Desktop Analysis • TF • DF • Given the set of Top-K relevant Desktop documents: • Generate their snippets, focused on the original search request • Identify the set of candidate terms • Order them according to their associated DF scores • nrWords: the total number of terms in the document • pos: the position of the first appearance of the term
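The candidate-term ranking above can be sketched in code. This is a minimal illustration, not the paper's exact formula: it assumes the DF score is adjusted by a position boost derived from nrWords and pos, so terms appearing earlier in a document score higher.

```python
from collections import Counter

def score_candidate_terms(documents, query_terms):
    """Rank expansion candidates from the Top-K relevant Desktop documents.

    Sketch: each candidate term gets a DF score adjusted by how early it
    first appears in a document (earlier = higher boost). The paper's
    actual weighting may combine these signals differently.
    """
    df = Counter()       # document frequency of each candidate term
    pos_boost = {}       # best (earliest) relative position seen across docs
    for doc in documents:
        words = doc.lower().split()
        nr_words = len(words)
        seen = set()
        for pos, term in enumerate(words):
            if term in query_terms or term in seen:
                continue
            seen.add(term)
            df[term] += 1
            # relative earliness in (0, 1]; keep the best across documents
            boost = (nr_words - pos) / nr_words
            pos_boost[term] = max(pos_boost.get(term, 0.0), boost)
    # final score: DF weighted by the position boost, best first
    return sorted(((df[t] * pos_boost[t], t) for t in df), reverse=True)
```

Terms that occur in many Desktop documents and appear near the beginning of those documents end up at the top of the candidate list.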

  6. Lexical Compounds • Use simple noun analysis • Sentence Selection • Identify the set of relevant Desktop documents • Generate a summary containing their most important sentences • Threshold

  7. PS (the sentence score) is calculated for the first 10 sentences of each document.
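The Sentence Selection step can be sketched as follows. This is an illustrative stand-in: it scores each of the first 10 sentences by its query-term overlap and keeps those above a threshold, whereas the paper's PS score combines additional signals.

```python
def summarize(document, query_terms, threshold=0.1, max_sentences=10):
    """Sentence Selection sketch: score each of the first 10 sentences
    and keep those above a threshold. The overlap ratio used here is an
    illustrative stand-in for the paper's PS score.
    """
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    summary = []
    for sentence in sentences[:max_sentences]:
        words = sentence.lower().split()
        if not words:
            continue
        # fraction of the sentence's words that are query terms
        overlap = sum(1 for w in words if w in query_terms) / len(words)
        if overlap >= threshold:
            summary.append(sentence)
    return summary
```

The retained sentences form the per-document summary from which expansion terms are then drawn.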

  8. Expanding with Global Desktop Analysis

  9. Cosine Similarity • Mutual Information • Likelihood Ratio
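The three similarity coefficients listed above can be computed from simple co-occurrence counts. A minimal sketch, assuming c1 and c2 are the occurrence counts of two terms, c12 their co-occurrence count, and n the total number of text windows; the Likelihood Ratio follows Dunning's log-likelihood formulation, which the paper's TC[LR] variant may parameterize differently.

```python
import math

def cosine_similarity(c1, c2, c12):
    """Cosine similarity of two terms from co-occurrence counts."""
    return c12 / math.sqrt(c1 * c2) if c1 and c2 else 0.0

def mutual_information(c1, c2, c12, n):
    """Pointwise mutual information over n text windows."""
    if not (c1 and c2 and c12):
        return 0.0
    return math.log((n * c12) / (c1 * c2))

def likelihood_ratio(c1, c2, c12, n):
    """Dunning's log-likelihood ratio for the association of two terms."""
    def ll(k, m, p):
        # binomial log-likelihood, guarding the degenerate p = 0 / p = 1 cases
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return k * math.log(p) + (m - k) * math.log(1 - p)
    k1, n1 = c12, c1            # t2 occurrences inside windows containing t1
    k2, n2 = c2 - c12, n - c1   # t2 occurrences outside those windows
    p1, p2, p = k1 / n1, k2 / n2, c2 / n
    return 2 * (ll(k1, n1, p1) + ll(k2, n2, p2)
                - ll(k1, n1, p) - ll(k2, n2, p))
```

All three grow with the strength of association between the two terms, so any of them can rank co-occurrence-based expansion candidates.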

  10. Thesaurus based expansion
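Thesaurus-based expansion can be sketched with a toy lookup table. The dictionary below is purely illustrative and stands in for WordNet; a real implementation would query WordNet synsets for synonyms (SYN), hyponyms (SUB), and hypernyms (SUP).

```python
# Toy thesaurus standing in for WordNet; all entries are illustrative.
THESAURUS = {
    "car": {"SYN": ["automobile"], "SUB": ["hatchback", "coupe"], "SUP": ["vehicle"]},
    "dog": {"SYN": ["canine"], "SUB": ["terrier", "poodle"], "SUP": ["animal"]},
}

def expand_query(query, relation="SYN"):
    """WN[SYN] / WN[SUB] / WN[SUP]: append related terms for each query word."""
    terms = query.lower().split()
    expansion = []
    for t in terms:
        expansion.extend(THESAURUS.get(t, {}).get(relation, []))
    return terms + expansion
```

Switching the relation argument reproduces the three WordNet variants evaluated in the experiments.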

  11. Experiments • 4 queries were chosen: • One very frequent AltaVista query • One randomly selected log query • One self-selected specific query • One self-selected ambiguous query • Collect the top-5 URLs generated by 20 versions of the algorithms and shuffle them; each subject assessed about 325 documents across the 4 queries • Give a rating ranging from 0 to 2 • Assessed with NDCG (Normalized Discounted Cumulative Gain) • A t-test was performed.
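The NDCG metric used for assessment can be computed directly from the 0-2 ratings. A sketch using a common formulation (exponential gain, logarithmic discount); the paper may use a slightly different gain or discount function.

```python
import math

def ndcg(ratings, k=5):
    """Normalized Discounted Cumulative Gain over 0-2 relevance ratings.

    DCG@k = sum over ranks i of (2^rel_i - 1) / log2(i + 1), normalized
    by the DCG of the ideal (descending) ordering of the same ratings.
    """
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0
```

A ranking that places the highly rated documents first scores 1.0; pushing them down the list lowers the score.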

  12. Algorithms Tested • Baseline: Google • TF, DF • LC, LC(O): Lexical Compounds, regular and optimized (considering only the top compound) • SS: Sentence Selection • TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics using Cosine Similarity, Mutual Information, and Likelihood Ratio, respectively, as similarity coefficients • WN[SYN], WN[SUB], WN[SUP]: WordNet expansion with synonyms, sub-concepts, and super-concepts

  13. Results

  14. Adaptivity • Query Scope • Query Clarity: Clarity(Q) = Σ_w P_q(w) · log( P_q(w) / P_coll(w) ), where P_q(w) is the probability of the word w within the submitted query and P_coll(w) is the probability of w within the entire collection of documents
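The query clarity score, i.e. the divergence between the query's language model and the collection's, can be sketched as follows. The function names and the dictionary-based interface are assumptions for illustration.

```python
import math

def query_clarity(query_lang_model, collection_counts, collection_size):
    """Query Clarity sketch: KL divergence between the query language
    model and the collection model. A high score indicates a clear,
    specific query; a low score indicates an ambiguous one.

    query_lang_model: word -> probability of the word within the query
    collection_counts: word -> occurrence count in the whole collection
    """
    clarity = 0.0
    for w, p_q in query_lang_model.items():
        p_coll = collection_counts.get(w, 0) / collection_size
        if p_q > 0 and p_coll > 0:
            clarity += p_q * math.log2(p_q / p_coll)
    return clarity
```

A rare, specific word yields a much higher clarity than a common one, which is what lets the system adapt how aggressively it expands.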

  15. Query Formulation Process • The newly added terms are more likely to convey information about the user's search goals • Therefore, give more weight to the new keywords
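The reweighting step can be sketched as query-string construction. The caret weighting syntax and the boost value are illustrative (Lucene-style) assumptions, not the paper's notation.

```python
def formulate_query(original_terms, expansion_terms, boost=1.5):
    """Build the final query, giving more weight to the newly added
    keywords since they are more likely to convey the user's search
    goals. Caret syntax and boost value are illustrative.
    """
    parts = list(original_terms)
    parts += [f"{t}^{boost}" for t in expansion_terms]
    return " ".join(parts)
```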

  16. Application to the Project • News articles collected by the user can be treated as the user's Desktop, so we can apply their algorithms to our system.
