
Compact Query Term Selection Using Topically Related Text


Presentation Transcript


  1. Compact Query Term Selection Using Topically Related Text. K. Tamsin Maxwell, W. Bruce Croft. SIGIR 2013

  2. Outline • Introduction • Related Work • Principle for Term Selection • PhRank Algorithm • Evaluation Framework • Experiments • Conclusion

  3. Introduction • Recent query reformulation techniques usually use pseudo-relevance feedback (PRF). However, because they consider words that are not in the original query, the expansion may include peripheral words and cause query drift • PhRank also uses PRF, but uses it for in-query term selection. Each candidate term contains 1-3 words and is ranked with a score derived from a word co-occurrence graph • Advantages of PhRank: • It is the first method to use PRF for in-query term selection • Only a small number of terms is selected, retaining the flexibility to use more or longer terms if required • The affinity graph captures aspects of both syntactic and non-syntactic word associations

  4. Related Work • Markov chain framework • The Markov chain framework uses the stationary distribution of a random walk over an affinity graph $G$ to estimate the importance of vertices in the graph • A random walk describes a succession of random or semi-random steps between vertices $v_i$ and $v_j$ in $G$ • If we define the transition probability from $v_j$ to $v_i$ as $p_{ji}$, and $s_t(v_i)$ as the affinity score of $v_i$ at time $t$, then $s_{t+1}(v_i)$ is the sum of scores for each $v_j$ connected to $v_i$: $s_{t+1}(v_i) = \sum_j p_{ji}\, s_t(v_j)$

  5. Related Work • A walk may sometimes step to a vertex $v_j$ that is unconnected, so we often define a minimum probability $1/|V|$, where $|V|$ is the number of vertices in $G$. A factor $\lambda$ then controls the balance between the transition probability and the minimum probability: $s_{t+1}(v_i) = \lambda \sum_j p_{ji}\, s_t(v_j) + (1-\lambda)\frac{1}{|V|}$
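  To make the update rule concrete, the following is a minimal sketch of the damped random walk described on slides 4-5. The toy transition matrix, the damping value lam = 0.85, and the function name random_walk_scores are illustrative assumptions; the convergence threshold of 0.0001 is the one given on slide 10.

    import numpy as np

    def random_walk_scores(P, lam=0.85, tol=1e-4, max_iter=100):
        """Stationary affinity scores of a damped random walk.

        P   -- column-stochastic transition matrix, P[i, j] = p(j -> i)
        lam -- balance between transitions and the minimum probability 1/|V|
        tol -- stop when no vertex score changes by more than tol
        """
        n = P.shape[0]
        s = np.full(n, 1.0 / n)                  # uniform initial scores
        for _ in range(max_iter):
            s_next = lam * (P @ s) + (1 - lam) / n
            if np.max(np.abs(s_next - s)) <= tol:
                break
            s = s_next
        return s_next

    # Toy 3-vertex graph with normalized edge weights (columns sum to 1).
    P = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])
    print(random_walk_scores(P))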

  6. Principle for Term Selection • An informative word • Is informative relative to a query: a word should represent the meaning of the query, but a query usually does not carry enough information on its own, so PRF is used to enhance the query representation • Is related to other informative words: the Association Hypothesis states that "if one index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this". With an affinity graph, we can capture this by estimating how many words connect to a target word and how strong those connections are

  7. Principle for Term Selection • An informative term • Contains informative words: we deduce that all informative terms must contain informative words, so we consider individual words when ranking terms • Is discriminative in the retrieval collection: a term that occurs many times within a small number of documents gives a pronounced relevance signal, so we weight terms with a normalized tf.idf-inspired weight

  8. The PhRank Algorithm • Graph construction • For a query, we first retrieve the top $k$ documents, then define the set $\mathcal{R}$ as the query itself together with its pseudo-relevant documents • Documents in $\mathcal{R}$ are stemmed, and each unique stem becomes a vertex in graph $G$ • Vertices $v_i$ and $v_j$ are connected by an edge if words $i$ and $j$ are adjacent in $\mathcal{R}$ • Edge weights • The transition probability is based on a linear combination of the counts of words $i$ and $j$ co-occurring in windows of size 2 and 10

  9. The PhRank Algorithm • Edge weights are defined by $e_{ij} = \sum_{d \in \mathcal{R}} p(d \mid \mathcal{R})\, g(d)\, [\lambda\, c_2(i,j,d) + (1-\lambda)\, c_{10}(i,j,d)]$, where $p(d \mid \mathcal{R})$ is the probability of a document $d$ in which words $i$ and $j$ co-occur given $\mathcal{R}$, and $c_2$ and $c_{10}$ are the counts of co-occurrence in windows of size 2 and 10 • $g(d)$ is a style weight that reflects the importance of the association between $i$ and $j$ in $d$
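  The sketch below illustrates the graph construction and edge weighting from slides 8-9 under simplifying assumptions: documents are given as pre-stemmed token lists, the style weight $g(d)$ and document probability are fixed to 1, and the mixing weight lam and the helper names window_pairs and affinity_edges are invented for illustration.

    from collections import defaultdict

    def window_pairs(tokens, w):
        """Count unordered stem pairs that co-occur within a window of size w."""
        counts = defaultdict(int)
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + w, len(tokens))):
                if tokens[i] != tokens[j]:
                    counts[frozenset((tokens[i], tokens[j]))] += 1
        return counts

    def affinity_edges(docs, lam=0.5):
        """Edge weight: a linear combination of window-2 and window-10
        co-occurrence counts, summed over the query and its PRF documents."""
        edges = defaultdict(float)
        for tokens in docs:                      # pre-stemmed token lists
            c2, c10 = window_pairs(tokens, 2), window_pairs(tokens, 10)
            for pair in set(c2) | set(c10):
                edges[pair] += lam * c2.get(pair, 0) + (1 - lam) * c10.get(pair, 0)
        return edges

    docs = [["compact", "query", "term", "selection"],
            ["query", "term", "rank", "graph", "term"]]
    print(dict(affinity_edges(docs)))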

  10. The PhRank Algorithm • Random walk • A random walk on $G$ proceeds as described in Related Work • The edge weights are normalized to sum to one • The iteration stops when the score change at every vertex does not exceed 0.0001 • Vertex weights • Words are also weighted so that the selected terms exhaustively represent the query. A word like "make" can score highly in the affinity graph even though it is not informative

  11. The PhRank Algorithm • We define a vertex weight $w_i$ as a factor that balances exhaustiveness with global saliency, to identify stems that are poor discriminators between relevant and non-relevant documents • For a word $i$, the weight combines $tf_{i,\mathcal{R}}$, the frequency of $i$ in $\mathcal{R}$, with $idf_{i,C}$, the inverse document frequency of $i$ in the collection $C$
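  A small sketch of the tf.idf-style vertex weight described above. The exact normalization is not recoverable from this transcript, so the log form, the smoothing, and the function name vertex_weight are assumptions.

    import math

    def vertex_weight(stem, tf_r, df_c, n_docs):
        """Frequency of a stem in the PRF set R, scaled by its inverse
        document frequency in the collection C (reconstruction, not the
        paper's exact formula)."""
        return tf_r.get(stem, 0) * math.log(n_docs / (1 + df_c.get(stem, 0)))

    tf_r = {"query": 7, "make": 9}         # frequencies in R (illustrative)
    df_c = {"query": 1200, "make": 90000}  # document frequencies in C
    print(vertex_weight("query", tf_r, df_c, 10**6))  # rarer stem: high weight
    print(vertex_weight("make", tf_r, df_c, 10**6))   # frequent but low weight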

  12. The PhRank Algorithm • Term ranking • For a term $t$, a factor $c_t$ represents the degree to which the term is discriminative in the collection. $c_t$ is a tf.idf-style weight in which the frequency counts how often the words in $t$ co-occur within a window of $4 \times |t|$ in the collection, with the document frequency defined analogously • Finally, the rank score of a term $t$ for the query combines $c_t$ with the affinity score $s(v_i)$ and vertex weight $w_i$ of each word $i \in t$
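  A sketch of the term ranking step as reconstructed above: the discriminativeness factor is multiplied by each word's affinity score and vertex weight. The placeholder c_factor and the product form are assumptions made for illustration.

    def term_score(term, affinity, vweight, c_factor):
        """Score a candidate term (a tuple of 1-3 stems): collection
        discriminativeness times the product of each word's random-walk
        affinity score and vertex weight (reconstruction)."""
        score = c_factor(term)
        for stem in term:
            score *= affinity[stem] * vweight[stem]
        return score

    def c_factor(term):
        return 1.0 / len(term)             # placeholder, not the paper's c_t

    affinity = {"query": 0.12, "term": 0.09, "selection": 0.05}
    vweight = {"query": 4.2, "term": 3.8, "selection": 5.1}
    candidates = [("query", "term"), ("term",), ("query", "term", "selection")]
    print(sorted(candidates, reverse=True,
                 key=lambda t: term_score(t, affinity, vweight, c_factor)))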

  13. The PhRank Algorithm • After ranking, some terms still include uninformative words. Because terms are ranked by their overall score, several terms may contain similar words, which decreases diversity • We apply a simple filter with top-down constraints (see the sketch below) • For a term $t$: if a higher-ranked term contains all the words in $t$, or $t$ contains all the words in a higher-ranked term, we discard $t$
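  A minimal sketch of this top-down filter: walk the ranked list best-first and discard any term whose word set subsumes, or is subsumed by, an already-kept term. The helper name filter_terms is illustrative.

    def filter_terms(ranked_terms):
        """Keep a term only if no higher-ranked kept term contains all of
        its words and it does not contain all the words of a kept term."""
        kept = []
        for term in ranked_terms:            # sorted best-first
            words = set(term)
            if any(set(k) <= words or words <= set(k) for k in kept):
                continue                     # subsumed by / subsumes a kept term
            kept.append(term)
        return kept

    ranked = [("query", "term"), ("query",), ("term", "selection"), ("compact",)]
    print(filter_terms(ranked))
    # -> [('query', 'term'), ('term', 'selection'), ('compact',)]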

  14. Evaluation Framework • Robustness • We compare with the sequential dependence variant of the Markov random field model, which linearly combines query likelihood with bigram features in windows of size 2 and 8 • Precision • The subset distribution model achieves high mean average precision • Succinctness • We use Key Concepts as the succinctness baseline; this approach linearly combines a bag-of-words query representation with a weighted bag-of-words representation

  15. Evaluation Framework • Word dependence • We refer to four models of phrase belief, shown in the figure

  16. Experiments • We use Indri on Robust04, WT10G, and GOV2 for evaluation • Feature analysis • Here we list the results of using each of the features in PhRank

  17. Experiments

  18. Experiments • Comparison with other models

  19. Conclusion • PhRank is a novel method for selecting succinct terms from within a query, built on the Markov chain framework • Although the selected terms are succinct, the strategy is risky and decreases MAP compared with sequential dependence
