Cross-Lingual Query Suggestion Using Query Logs of Different Languages

SIGIR 07



  1. Cross-Lingual Query Suggestion Using Query Logs of Different Languages • SIGIR 07

  2. Abstract • Query suggestion • To suggest relevant queries for a given query • To help users better specify their information needs • Cross-Lingual Query Suggestion (CLQS): • For a query in one language, we suggest similar or relevant queries in other languages. • Applications: • cross-lingual keyword bidding (search engines) • cross-language information retrieval (CLIR)

  3. Introduction • CLQS vs. Cross-Lingual Query Expansion • CLQS suggests full queries formulated by users in another language. • Search engine users • have similar interests in the same period of time • issue queries on similar topics in different languages • Key point • How to learn a similarity measure between two queries • MLQS: term co-occurrence based mutual information (MI) and χ²

  4. Estimating Cross-Lingual Query Similarity • Discriminative Model for Estimating Cross-Lingual Query Similarity • Monolingual Query Similarity Measure Based on Click-through Information • Features Used for Learning Cross-Lingual Query Similarity Measure: (1) Bilingual Dictionary, (2) Parallel Corpora, (3) Online Mining for Related Queries, (4) Monolingual Query Suggestion • Estimating Cross-Lingual Query Similarity

  5. Discriminative Model for Estimating Cross-Lingual Query Similarity – 1/2 • qf : a source language query • qe : a target language query • simML : Monolingual query similarity • simCL : Cross-lingual query similarity • Tqf : translation of qf in the target language

  6. Discriminative Model for Estimating Cross-Lingual Query Similarity – 2/2 • Learning: LIBSVM regression algorithm • f : feature functions • φ : mapping from feature space onto kernel space • w : weight vector in the kernel space • Relevance labels: relevant vs. irrelevant • or: strongly relevant, weakly relevant or irrelevant
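
The equations on slides 5 and 6 did not survive in the transcript. The following is a reconstruction from the symbols defined on those slides, offered as an assumption rather than a quotation. The first line states the training target: the cross-lingual similarity of (qf, qe) is taken to be the monolingual similarity between the translation Tqf and qe. The second line is the usual form of a kernel regression score, with f the feature functions, φ the mapping onto kernel space, and w the weight vector.

```latex
\begin{align}
\mathrm{sim}_{CL}(q_f, q_e) &= \mathrm{sim}_{ML}(T_{q_f}, q_e) \\
\mathrm{sim}_{CL}(q_f, q_e) &= w \cdot \phi\bigl(f(q_f, q_e)\bigr)
\end{align}
```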

  8. Monolingual Query Similarity Measure Based on Click-through Information • Uses click-through information in query logs [26] • KN(x) : number of keywords in a query x • RD(x) : number of clicked URLs for a query x • α = 0.4, β = 0.6
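
The formula itself is missing from the transcript. A minimal sketch of how such a measure could be computed, assuming it is a weighted sum of keyword overlap and clicked-URL overlap, each normalised by the larger of the two counts (KN and RD). The normalisation, the function name, and the argument names are assumptions made for illustration; only the weights 0.4 and 0.6 come from the slide.

```python
def monolingual_similarity(keywords_p, keywords_q, clicks_p, clicks_q,
                           alpha=0.4, beta=0.6):
    """Weighted sum of keyword overlap and clicked-URL overlap (assumed form)."""
    kw_p, kw_q = set(keywords_p), set(keywords_q)
    url_p, url_q = set(clicks_p), set(clicks_q)

    def overlap(a, b):
        # Shared items divided by the size of the larger set; 0.0 if both are empty.
        largest = max(len(a), len(b))
        return len(a & b) / largest if largest else 0.0

    return alpha * overlap(kw_p, kw_q) + beta * overlap(url_p, url_q)
```

With β larger than α, two queries that lead to the same clicked URLs score higher than two queries that merely share words, which fits the slide's emphasis on click-through information.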

  10. 1. Bilingual Dictionary – 1/2 • 120,000 unique entries (built in-house) • Given an input query qf={wf1,wf2,…,wfn} (in the source language) • The bilingual dictionary D gives the translations of each word: D(wfi)={ti1,ti2,…,tim} • C(x,y) is the number of queries in the log containing both x and y. • C(x) is the number of queries in the log containing x. • N is the total number of queries in the log.
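
The counts C(x,y), C(x) and N are the inputs to a co-occurrence statistic whose formula is not in the transcript. A minimal sketch, assuming the statistic is mutual information with probabilities estimated as C/N, and assuming a candidate translation is scored by summing MI over all pairs of its terms. The function names and the dictionary layout are hypothetical.

```python
import math
from itertools import combinations


def mutual_information(c_xy, c_x, c_y, n):
    """MI(x, y) = P(x, y) * log(P(x, y) / (P(x) * P(y))), with P estimated as C / N."""
    if c_xy == 0 or c_x == 0 or c_y == 0:
        return 0.0  # terms that never co-occur contribute nothing
    p_xy, p_x, p_y = c_xy / n, c_x / n, c_y / n
    return p_xy * math.log(p_xy / (p_x * p_y))


def translation_score(terms, term_count, pair_count, n):
    """Sum MI over every unordered pair of terms in one candidate translation."""
    total = 0.0
    for x, y in combinations(terms, 2):
        c_xy = pair_count.get(frozenset((x, y)), 0)
        total += mutual_information(c_xy, term_count.get(x, 0),
                                    term_count.get(y, 0), n)
    return total
```

Candidate translations whose terms frequently appear together in target-language queries score higher, which is how the query log helps choose among the dictionary alternatives for each source word.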

  11. 1. Bilingual Dictionary – 2/2 • The set of top-4 query translations is denoted as S(Tqf) • For each T ∈ S(Tqf): • Retrieve all queries containing T in the target language and assign Sdict(T) as their value

  12. 2. Parallel Corpora • Given a pair of queries • qf : in the source language • qe : in the target language • Bi-Directional Translation Score: • IBM Model 1, trained with the GIZA++ tool • P(yj|xi) is the word-to-word translation probability • The top 10 queries {qe} scored against qf are taken from the query log
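
The score formula is not in the transcript. A minimal sketch of a bi-directional score, assuming each direction is the standard IBM Model 1 sentence probability (without the constant length factor ε) and assuming the two directions are combined by a geometric mean. The combination rule, the `NULL` token, and the table layout `{(target_word, source_word): probability}` are assumptions for illustration, not GIZA++ output format.

```python
import math

NULL = "<null>"  # hypothetical token for the empty source word


def ibm1_probability(target, source, t_prob):
    """P(target | source) under IBM Model 1, using word table t_prob[(y, x)] = P(y | x)."""
    source_with_null = [NULL] + list(source)
    prob = 1.0
    for y in target:
        # Each target word may be generated by any source word or by NULL.
        prob *= sum(t_prob.get((y, x), 0.0) for x in source_with_null)
    return prob / (len(source_with_null) ** len(target))


def bidirectional_score(q_f, q_e, t_e_given_f, t_f_given_e):
    """Geometric mean of P(q_e | q_f) and P(q_f | q_e) (assumed combination)."""
    forward = ibm1_probability(q_e, q_f, t_e_given_f)
    backward = ibm1_probability(q_f, q_e, t_f_given_e)
    return math.sqrt(forward * backward)
```

Scoring in both directions penalises pairs where only one query explains the other, for example a short target query that matches a single word of a longer source query.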

  13. 3. Online Mining for Related Queries – 1/3 • Out-of-vocabulary (OOV) terms are a major knowledge bottleneck for query translation and CLIR • Assumption: • If a query in the target language co-occurs with the source query in many web pages, • the two are probably semantically related • However, the mined data contain a considerable amount of noise

  14. 3. Online Mining for Related Queries – 2/3 • Frequency in the Snippets • For example: • Given a query q=abc in the source language • By dictionary: a={a1,a2,a3}, b={b1,b2} and c={c1} • Web query: q ∧ (a1 ∨ a2 ∨ a3) ∧ (b1 ∨ b2) ∧ (c1), with results restricted to the target language • 700 snippets are retrieved; the 10 most frequent target queries are kept
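
The example on this slide can be sketched as two small steps: build the Boolean web query from the dictionary translations, then count which known target-language queries appear most often in the returned snippets. This is a hedged illustration only. The `AND`/`OR` operator syntax, the substring matching, and both function names are assumptions, and no real search API is called.

```python
from collections import Counter


def build_web_query(source_query, term_translations):
    """Join the quoted source query with one OR-group per translated source term."""
    groups = []
    for translations in term_translations.values():
        if translations:  # skip terms that have no dictionary entry
            groups.append("(" + " OR ".join(translations) + ")")
    return " AND ".join([f'"{source_query}"'] + groups)


def most_frequent_queries(snippets, log_queries, k=10):
    """Return the k log queries that occur in the largest number of snippets."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for query in log_queries:
            if query.lower() in text:
                counts[query] += 1
    return [query for query, _ in counts.most_common(k)]
```

For the slide's example, `build_web_query("abc", {"a": ["a1", "a2", "a3"], "b": ["b1", "b2"], "c": ["c1"]})` produces the conjunction of the source query with the three OR-groups.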

  15. 3. Online Mining for Related Queries – 3/3 • Any query qe mined from the web is associated with one feature: the Co-Occurrence Double-Check (CODC) measure, SCODC(qf,qe)

  16. 4. Monolingual Query Suggestion • Q0 : candidate queries (in target language) • For each target query qe, • SQML(qe) : monolingual source query

  18. Estimating Cross-Lingual Query Similarity • Four categories of features are used to learn the cross-lingual query similarity. • The model outputs a cross-lingual query similarity score • Learning: LIBSVM regression algorithm • f : feature functions • φ : mapping from feature space onto kernel space • w : weight vector in the kernel space

  19. Performance Evaluation – Log Data • Data Resources: • MSN Search Engine • French (source language) vs. English (target language) • A one-month English query log • 7 million unique English queries • each with an occurrence frequency of more than 5 • 5,000 French queries • 4,171 of them have translations among the English queries • 70% for training the LIBSVM weights • 10% development data • 20% testing

  20. Performance Evaluation – CLIR • [Diagram: source-language query qf → CLQS → target-language queries {qe} → BM25 retrieval (CLIR)] • Data Resources: • TREC6 CLIR data (AP88-90 newswire, 750MB) • 25 short French-English query pairs (CL1-CL25) • average length: 3.3 • matched against the web query logs for training CLQS

  21. CLQS

  22. CLIR

  23. Conclusion • Cross-lingual query suggestion • Uses query logs • French to English • Evaluated on the TREC6 French-to-English CLIR task • CLQS demonstrates high quality
