
Query Rewriting Using Monolingual Statistical Machine Translation

Query Rewriting Using Monolingual Statistical Machine Translation. Stefan Riezler, Yi Liu (Google). 2010 Association for Computational Linguistics. Introduction. Create a system that learns to generate query rewrites from a large amount of user query logs.


Presentation Transcript


  1. Query Rewriting Using Monolingual Statistical Machine Translation • Stefan Riezler, Yi Liu • Google • 2010 Association for Computational Linguistics

  2. Introduction • Create a system that learns to generate query rewrites from a large amount of user query logs. • Use query expansion in Web search to evaluate the rewritten queries. • For a given set of randomly selected queries, n-best rewrites are produced. • From the changes introduced by the rewrites, expansion terms are extracted and added to the query as alternative forms.

  3. Example • For a query like "herbs for chronic constipation", the query terms are combined with the AND operator, and expansion terms are added with the OR operator. • For this query, "remedies", "medicine", or "supplements" are appropriate expansions of "herbs", but in this context "spices" is not. • For "herbs for mexican cooking", only "spices" is a good alternative.
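
The AND/OR combination described on this slide can be sketched as follows. This is a minimal illustration, not the authors' system: the function name `build_expanded_query` and the expansion dictionary are hypothetical, and the Boolean syntax is a generic one rather than that of any particular search engine.

```python
def build_expanded_query(terms, expansions):
    """Join query terms with AND; attach alternatives to a term with OR.

    `expansions` maps a query term to a list of alternative terms
    (hypothetical data, chosen only to mirror the slide's example).
    """
    clauses = []
    for term in terms:
        alternatives = expansions.get(term, [])
        if alternatives:
            # An expanded term becomes a parenthesised OR group.
            clauses.append("(" + " OR ".join([term] + alternatives) + ")")
        else:
            clauses.append(term)
    return " AND ".join(clauses)


query_terms = "herbs for chronic constipation".split()
expanded = build_expanded_query(
    query_terms, {"herbs": ["remedies", "medicine", "supplements"]}
)
print(expanded)
```

Only "herbs" is expanded here; the remaining terms are still required, so the added alternatives can broaden a match without replacing the original query.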

  4. Goal • Use the translation model and the language model to expand query terms in context. • The translation model proposes expansion candidates. • The query language model selects among them in the context of the surrounding query terms. • SMT is readily applicable to this task: it is trained on large parallel data with queries on the source side and snippets of clicked search results on the target side. • Snippets introduce noise since they are not complete sentences. • TREC Data.
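
The division of labour between the two models can be sketched with toy numbers. Everything below is an assumption made for illustration: the probability tables are invented, a bigram model stands in for the query language model, and the example queries are reordered variants of the slide's example ("chronic constipation herbs") so that a bigram can see the relevant context word.

```python
import math

# Hypothetical translation table: P(candidate | source term).
TRANSLATION_PROB = {
    "herbs": {"herbs": 0.4, "remedies": 0.2, "spices": 0.2, "medicine": 0.2},
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM_PROB = {
    ("constipation", "herbs"): 0.3,
    ("constipation", "remedies"): 0.4,
    ("constipation", "medicine"): 0.25,
    ("constipation", "spices"): 0.001,
    ("cooking", "herbs"): 0.3,
    ("cooking", "spices"): 0.5,
    ("cooking", "remedies"): 0.001,
    ("cooking", "medicine"): 0.001,
}
UNSEEN_PROB = 1e-4  # floor for bigrams missing from the toy table


def rank_candidates(query_terms, position):
    """Rank expansion candidates for the term at `position`, best first."""
    source = query_terms[position]
    previous_word = query_terms[position - 1] if position > 0 else "<s>"
    # The translation model proposes the candidates ...
    candidates = TRANSLATION_PROB.get(source, {source: 1.0})
    scored = []
    for candidate, tm_prob in candidates.items():
        # ... and the language model scores each one in context.
        lm_prob = BIGRAM_PROB.get((previous_word, candidate), UNSEEN_PROB)
        scored.append((math.log(tm_prob) + math.log(lm_prob), candidate))
    return [candidate for _, candidate in sorted(scored, reverse=True)]


medical = rank_candidates("chronic constipation herbs".split(), 2)
culinary = rank_candidates("mexican cooking herbs".split(), 2)
print(medical)
print(culinary)
```

With these toy numbers "spices" ranks last in the medical context and directly after "herbs" in the cooking context, even though the translation table gives it the same probability in both.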

  5. Review: Query Expansion by Q-D Term Correlation • A session links query terms with a document: • Aggregating clicks over sessions reflects the preferences of multiple users (a probability distribution of document words given query words, estimated from counts over clicked documents D across sessions): • This formula considers the query as a cohesive unit:
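
The formulas for this slide are not preserved in the transcript, so the following is only a sketch of the general idea under stated assumptions: P(document word | query word) is estimated from co-occurrence counts over (query, clicked document) sessions, and a query-level weight sums log(P + 1) over all query terms so that the query acts as one unit. Both estimators, the function names, and the session data are assumptions, not the slide's exact definitions.

```python
import math
from collections import Counter, defaultdict


def estimate_term_correlations(sessions):
    """Estimate P(doc word | query word) from (query, clicked doc) sessions."""
    pair_counts = defaultdict(Counter)
    for query_terms, doc_terms in sessions:
        for query_word in set(query_terms):
            pair_counts[query_word].update(doc_terms)
    probs = {}
    for query_word, counts in pair_counts.items():
        total = sum(counts.values())
        probs[query_word] = {w: c / total for w, c in counts.items()}
    return probs


def query_level_weight(probs, query_terms, doc_word):
    """Combine per-term correlations so the whole query is treated as a unit."""
    return sum(
        math.log(probs.get(query_word, {}).get(doc_word, 0.0) + 1.0)
        for query_word in query_terms
    )


# Hypothetical sessions: (query terms, terms of the clicked document).
sessions = [
    (["herbs", "constipation"], ["natural", "remedies", "constipation"]),
    (["herbs", "cooking"], ["spices", "cooking", "recipes"]),
    (["constipation", "relief"], ["remedies", "relief"]),
]
probs = estimate_term_correlations(sessions)
query = ["herbs", "constipation"]
remedies_weight = query_level_weight(probs, query, "remedies")
spices_weight = query_level_weight(probs, query, "spices")
print(remedies_weight, spices_weight)
```

Because every query term contributes to the weight, "remedies" outscores "spices" for this query: both correlate equally with "herbs", but only "remedies" also correlates with "constipation".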

  6. Review: Machine Translation 1/2 • Linear Model for SMT: • Find the English string e that is a translation of the foreign string f using a linear combination of feature functions h_m(e, f) and weights λ_m: • Word Alignment: • The translation model and the alignment model for source language string f and target string e are related via a hidden variable describing an alignment mapping from source position j to target position a_j:
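
The equations for this slide were lost in extraction. The standard textbook forms that match the descriptions above are given below as a reconstruction; the index ranges (M feature functions, source length J, target length I) are assumed notation and may differ from the original slide.

```latex
% Linear model: choose the target string with the highest weighted feature score.
\hat{e} = \operatorname*{arg\,max}_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)

% Word alignment: the translation model marginalises over hidden alignments
% a_1^J, where a_j is the target position aligned to source position j.
\Pr(f_1^J \mid e_1^I) = \sum_{a_1^J} \Pr(f_1^J, a_1^J \mid e_1^I)
```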

  7. Review: Machine Translation 2/2 • "Sentence-aligned" parallel training data are prepared by pairing user queries with snippets of clicked search results for the respective queries. • Phrase Extraction: • Maximum-likelihood estimation over sentence-aligned strings: • Alignment with the highest probability:
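
The two equations referred to here are also missing from the transcript. A common way to write them for statistical alignment models is shown below as an assumed reconstruction: θ denotes the alignment model parameters, and (f_s, e_s) for s = 1..S are the sentence-aligned training pairs, here query-snippet pairs.

```latex
% Maximum-likelihood estimation over S sentence-aligned pairs (f_s, e_s).
\hat{\theta} = \operatorname*{arg\,max}_{\theta}
  \prod_{s=1}^{S} \sum_{a} p_{\theta}(f_s, a \mid e_s)

% Best (Viterbi) alignment for one pair under the estimated parameters.
\hat{a}_1^J = \operatorname*{arg\,max}_{a_1^J}
  p_{\hat{\theta}}(f_1^J, a_1^J \mid e_1^I)
```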

  8. Language Model • n-gram language modeling, with smoothing to handle sparse data. • The ultimate task is to pick appropriate phrase translations in the context of the original query for query expansion.
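
A smoothed n-gram model can be sketched in a few lines. The slide does not say which smoothing method was used, so add-k smoothing is swapped in here purely because it is the simplest to show; the class name `TrigramModel` and the training queries are hypothetical.

```python
import math
from collections import Counter


class TrigramModel:
    """Trigram language model with add-k smoothing (an illustrative choice)."""

    def __init__(self, queries, k=1.0):
        self.k = k
        self.trigrams = Counter()
        self.contexts = Counter()
        self.vocab = set()
        for query in queries:
            tokens = self._pad(query)
            self.vocab.update(tokens)
            for i in range(2, len(tokens)):
                self.trigrams[tuple(tokens[i - 2:i + 1])] += 1
                self.contexts[tuple(tokens[i - 2:i])] += 1

    @staticmethod
    def _pad(query):
        # Two start symbols so the first word also has a trigram context.
        return ["<s>", "<s>"] + query.split() + ["</s>"]

    def log_prob(self, query):
        tokens = self._pad(query)
        total = 0.0
        for i in range(2, len(tokens)):
            trigram = tuple(tokens[i - 2:i + 1])
            numerator = self.trigrams[trigram] + self.k
            denominator = self.contexts[trigram[:2]] + self.k * len(self.vocab)
            total += math.log(numerator / denominator)
        return total


model = TrigramModel(
    ["herbs for cooking", "spices for cooking", "remedies for constipation"]
)
seen = model.log_prob("spices for cooking")
unseen = model.log_prob("spices for constipation")
print(seen, unseen)
```

Smoothing keeps the unseen rewrite "spices for constipation" from getting zero probability, while the model still prefers the rewrite whose word sequence was observed in the query data.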

  9. Data • Training data for the translation model and the correlation-based model consist of pairs of queries and snippets for clicked results, taken from query logs. • 3 billion query-snippet pairs, from which a phrase table of 700 million query-snippet phrase translations is extracted. • Trigram language model trained on English queries in user logs. • N-gram cutoffs at a minimum frequency of 4. • Queries had an average length of 2.6 words. • Snippets had an average length of 8.3 words.
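
The n-gram cutoff mentioned above amounts to discarding rare n-grams before the language model is estimated. A minimal sketch, assuming whitespace tokenisation and hypothetical function names and data:

```python
from collections import Counter


def count_ngrams(queries, n=3):
    """Count all n-grams of length `n` over whitespace-tokenised queries."""
    counts = Counter()
    for query in queries:
        tokens = query.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def apply_cutoff(counts, min_frequency=4):
    """Keep only n-grams seen at least `min_frequency` times."""
    return Counter({ng: c for ng, c in counts.items() if c >= min_frequency})


# Hypothetical query log: one trigram occurs 4 times, the other 3 times.
queries = ["herbs for cooking"] * 4 + ["herbs for constipation"] * 3
kept = apply_cutoff(count_ngrams(queries))
print(kept)
```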

  10. Query Expansion • Use Google, the SMT-based system, the correlation-based system, and the correlation-based system with a language model as a filter. • Expansion terms: • 150,000 randomly extracted queries of three or more words are rewritten by each of the systems. • For each system, expansion terms are extracted from the 5-best rewrites and stored in a table that maps source phrases to target phrases in the context of the full query.
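
One way to picture the extraction step is to diff each rewrite against the original query and record the replaced phrases. The sketch below uses Python's standard `difflib.SequenceMatcher` for the diff; this is an assumption for illustration, not the alignment the authors used, and the rewrites are invented.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def extract_expansions(query, rewrites):
    """Map (full query, source phrase) to the phrases that replaced it."""
    source_tokens = query.split()
    table = defaultdict(set)
    for rewrite in rewrites:
        target_tokens = rewrite.split()
        matcher = SequenceMatcher(a=source_tokens, b=target_tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only substitutions yield a source-to-target phrase mapping.
            if tag == "replace":
                source_phrase = " ".join(source_tokens[i1:i2])
                target_phrase = " ".join(target_tokens[j1:j2])
                table[(query, source_phrase)].add(target_phrase)
    return table


query = "herbs for chronic constipation"
# Hypothetical n-best rewrites; one is identical to the query.
rewrites = [
    "remedies for chronic constipation",
    "herbs for chronic constipation",
    "medicine for chronic constipation",
]
table = extract_expansions(query, rewrites)
print(dict(table))
```

Keying the table on the full query as well as the source phrase keeps the expansion tied to its context, so "herbs" can map to different targets in a cooking query.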

  11. Evaluation 1/2 • Three independent raters were presented with queries and the 10-best search results from two systems. • Ratings used a 7-point Likert scale.

  12. Evaluation 2/2

  13. Conclusion • The SMT model is flexible enough to capture the peculiarities of query-snippet translation. • The authors hope to apply SMT to query suggestion.
