
Enrich Query Representation by Query Understanding



Presentation Transcript


  1. Enrich Query Representation by Query Understanding. Gu Xu, Microsoft Research Asia

  2. Mismatching Problem
     • Mismatching is a fundamental problem in search
       • Examples: NY ↔ New York, game cheats ↔ game cheatcodes
     • Search engine challenges
       • Head (frequent) queries: rich information is available (clicks, query sessions, anchor texts, etc.)
       • Tail (infrequent) queries: information becomes sparse and limited
     • Our proposal
       • Enrich both queries and documents, and conduct matching on the enriched representation.

  3. Matching at Different Semantic Levels (listed from the highest level of semantics to the lowest)
     • Structure: match intent with answers (structures of query and document)
       • “Microsoft Office home”: find the homepage of Microsoft Office
       • “21 movie”: find the movie named 21
       • “buy laptop less than 1000”: find online dealers to buy a laptop for less than 1000 dollars
     • Topic: match topics of query and documents
       • Query “Microsoft Office” (Topic: PC Software) vs. a page reading “… working for Microsoft … my office is in …” (Topic: Personal Homepage)
     • Sense: match terms with the same meanings
       • utube ↔ youtube, NY ↔ New York, motherboard ↔ mainboard
     • Term: match exactly the same terms
       • Examples: NY, New York, disk, disc
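The difference between the term level and the sense level can be illustrated with a minimal Python sketch. It is not the system described in the talk: `SENSE_MAP`, `term_match`, `normalize`, and `sense_match` are hypothetical names, and the dictionary only holds the examples from the slide.

```python
# Hypothetical sense dictionary built from the slide's examples (an assumption,
# not a real resource): each variant maps to one canonical form.
SENSE_MAP = {
    "ny": "new york",
    "utube": "youtube",
    "mainboard": "motherboard",
}


def term_match(query: str, doc: str) -> bool:
    """Term level: every query term must appear verbatim in the document."""
    doc_terms = set(doc.lower().split())
    return all(term in doc_terms for term in query.lower().split())


def normalize(text: str) -> str:
    """Rewrite each known variant to its canonical form."""
    return " ".join(SENSE_MAP.get(term, term) for term in text.lower().split())


def sense_match(query: str, doc: str) -> bool:
    """Sense level: match after both sides are mapped to canonical forms."""
    return term_match(normalize(query), normalize(doc))


if __name__ == "__main__":
    print(term_match("NY hotels", "hotels in New York"))   # exact terms only
    print(sense_match("NY hotels", "hotels in New York"))  # after normalization
```

With these toy entries, the query "NY hotels" fails the term-level test against "hotels in New York" but passes the sense-level one, which is the mismatch the slide describes.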

  4. Enrich Query Representation
     Example query: michaeljordanberkele. Each level pairs an understanding task with the representation it produces.
     • Structure Level
       • Understanding: Query Parsing (named entity segmentation and disambiguation; large-scale knowledge base)
       • Representation: <person-name>michael jordan</person-name> <location>berkeley</location>
     • Topic Level
       • Understanding: Query Classification (definition of classes; accuracy & efficiency)
       • Representation: <query-topics>academic</query-topics>
     • Sense Level
       • Understanding: Query Refinement and Alternative Query Finding, ill-formed → well-formed (ambiguity: msil or mail; equivalence or dependency: department or dept, login or sign on)
       • Representation: <correction token="berkele">berkeley</correction> <similar-queries>michael I. jordan berkeley</similar-queries>
     • Term Level
       • Understanding: Tokenization (C# vs. C, 1,000 vs. 1 000, MAX_PATH vs. MAX PATH)
       • Representation: <token>michael</token> <token>jordan</token> <token>berkele</token>
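The term-level pitfalls named on this slide (C#, 1,000, MAX_PATH) can be illustrated with a small regex tokenizer. This is only a sketch under assumptions: the pattern and the `tokenize` helper are hypothetical, and it does not attempt the word segmentation that an unspaced query such as "michaeljordanberkele" would need.

```python
import re

# Assumed token pattern (not from the talk):
#  - identifiers may contain digits and underscores and end in '#' or '+'
#    so that "C#" and "MAX_PATH" stay whole;
#  - numbers may carry thousands separators so that "1,000" stays whole.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*[#+]*|\d+(?:,\d{3})*")


def tokenize(query: str) -> list:
    """Split a query into tokens without breaking the special forms above."""
    return TOKEN_PATTERN.findall(query)


if __name__ == "__main__":
    print(tokenize("C# MAX_PATH 1,000 files"))
    print(tokenize("michael jordan berkele"))
```

A naive split on punctuation would instead turn "C#" into "C" and "1,000" into "1" and "000", which changes what the query means.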

  5. Query Refinement Using CRF-QR (SIGIR’08)

  6. Query Refinement
     • Ill-formed query: Papers on Machin Learn
     • Refined query: Papers on “Machine Learning”
     • Operations involved:
       • Spelling error correction (Machin → Machine)
       • Inflection (Learn → Learning)
       • Phrase segmentation (“Machine Learning”)
     • The operations are mutually dependent.

  7. Conventional CRF
     [Figure: a conventional CRF applied to the query “papers on machin learn”. The input X = (x0, x1, x2, x3) is the sequence of query words. The output Y must be chosen from an open set of candidate words at every position (e.g. paper, papers, in, on, upon, machine, machines, learn, learns, learning, …), which makes the model intractable.]

  8. CRF for Query Refinement
     [Figure: graphical model relating X, O, and Y.]

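  9. CRF for Query Refinement
     [Figure: for the query words x2 = “machin” and x3 = “learn”, the operations o2 and o3 select the outputs y2 and y3 from a vocabulary that also contains unrelated words (walk, super, soccer, data, the, paper, mp3, book, think, lyrics, new, pc, com, harry, journal, university, net, course, …). Candidates for “machin”: machined, machining, macin, machina, machi, machine. Candidates for “learn”: lean, learning, clearn, learned, lear, blearn.]
     1. O constrains the mapping from X to Y (reduces the space).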

  10. CRF for Query Refinement
      [Figure: the same vocabulary, with the candidates for x2 = “machin” and x3 = “learn” grouped by operation.
       • machin: Deletion → machi, macin; Insertion → machine, machina; +ed → machined; +ing → machining
       • learn: Deletion → lear, lean; Insertion → clearn, blearn; +ed → learned; +ing → learning]
      1. O constrains the mapping from X to Y (reduces the space).
      2. O indexes the mapping from X to Y (sharing parameters).
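How an operation narrows the output space can be sketched in Python by generating, for one query word, only the strings reachable through each operation. This illustrates the idea of operation-constrained candidates; it is not the CRF-QR model itself, and `candidates_by_operation` and its operation labels are hypothetical names taken loosely from the slide.

```python
import string
from typing import Dict, Set


def candidates_by_operation(word: str) -> Dict[str, Set[str]]:
    """Group candidate outputs for `word` by the operation that produces them."""
    # Deletion: drop exactly one character.
    deletion = {word[:i] + word[i + 1:] for i in range(len(word))}
    # Insertion: add exactly one lowercase letter at any position.
    insertion = {
        word[:i] + ch + word[i:]
        for i in range(len(word) + 1)
        for ch in string.ascii_lowercase
    }
    return {
        "deletion": deletion,
        "insertion": insertion,
        "+ed": {word + "ed"},
        "+ing": {word + "ing"},
    }


if __name__ == "__main__":
    for op, words in candidates_by_operation("machin").items():
        print(op, len(words))
```

A real system would additionally keep only candidates that occur in a lexicon or query log, so that "machine" survives while most of the single-letter insertions do not. Because the same operation labels apply to every word, any parameters attached to an operation can be shared across words, which is the second point on the slide.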

  11. Named Entity Recognition in Query (SIGIR’09, SIGKDD’09)

  12. Named Entity Recognition in Query
      • harry potter → Movie (0.5), Book (0.4), Game (0.1)
      • harry potter film → Movie (0.95)
      • harry potter author → Book (0.95)

  13. Challenges
      • Named Entity Recognition in Document
      • Challenges
        • Queries are short (2-3 words on average)
          • Fewer context features
        • Queries are not well-formed (typos, lowercase, …)
          • Fewer content features
      • Knowledge Database
        • Coverage and freshness
        • Ambiguity

  14. Our Approach to NERQ
      • Example: query q = “Harry Potter Walkthrough” is split into the named entity e = “Harry Potter” and the context t = “# Walkthrough”, with class c = “Game”.
      • The goal of NERQ becomes finding the best triple (e, t, c)* for query q satisfying [formula not preserved in the transcript]
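The search for the best triple can be sketched in Python. The slide's formula is missing from this transcript, so the scoring rule below, Pr(e) · Pr(c | e) · Pr(t | c), is an assumed factorization used only for illustration. All probability tables are made-up toy values, and `candidate_splits` and `best_triple` are hypothetical helper names.

```python
# Toy probability tables (invented numbers, for illustration only).
P_E = {"harry potter": 1.0}
P_C_GIVEN_E = {"harry potter": {"Movie": 0.5, "Book": 0.4, "Game": 0.1}}
P_T_GIVEN_C = {
    "Movie": {"#": 0.4, "# film": 0.3, "# walkthrough": 0.01},
    "Book": {"#": 0.2, "# author": 0.3},
    "Game": {"#": 0.1, "# walkthrough": 0.4},
}


def candidate_splits(query: str):
    """Yield every (entity, context) split; '#' marks the entity's position."""
    words = query.lower().split()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            entity = " ".join(words[i:j])
            context = " ".join(words[:i] + ["#"] + words[j:])
            yield entity, context


def best_triple(query: str):
    """Return the (e, t, c) with the highest assumed score, or None."""
    best, best_score = None, 0.0
    for entity, context in candidate_splits(query):
        if entity not in P_E:
            continue  # only entities known to the toy tables are considered
        for cls, p_class in P_C_GIVEN_E[entity].items():
            p_context = P_T_GIVEN_C.get(cls, {}).get(context, 0.0)
            score = P_E[entity] * p_class * p_context
            if score > best_score:
                best, best_score = (entity, context, cls), score
    return best


if __name__ == "__main__":
    print(best_triple("Harry Potter Walkthrough"))
    print(best_triple("harry potter author"))
```

With these toy numbers the context "# walkthrough" outweighs the weak prior for the Game class, which mirrors the slide's example; real tables would have to be estimated from data.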

  15. Training With Topic Model
      • Ideal training data: T = {(ei, ti, ci)}
      • Real training data: T = {(ei, ti, *)}
        • Queries are ambiguous (harry potter, harry potter review)
        • Training data are relatively few

  16. Training With Topic Model (cont.)
      [Figure: named entities e (harry potter, kung fu panda, iron man, …) and contexts t (# wallpapers, # movies, # walkthrough, # book price, …) are connected through classes c (Movie, Game, Book, …), which act as the topics.]
      # is a placeholder for the named entity. Here # means “harry potter”.

  17. Weakly Supervised Topic Model
      • Introducing supervision
        • Supervision is always better
        • Alignment between implicit topics and explicit classes
      • Weak supervision
        • Label named entities rather than queries (doc. class labels)
        • Multiple class labels (binary indicator)
      [Figure: distribution over the classes Movie, Game, and Book for “Kung Fu Panda”, with some proportions marked “?”.]

  18. WS-LDA
      • LDA + soft constraints (w.r.t. supervision)
      • Objective: LDA probability + soft constraints
      • The soft-constraint term relates, for each class i, the document probability on the i-th class to the document binary label on the i-th class (example labels: 1 1 0).
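The ingredients named on this slide can be combined in a small Python sketch. The exact formula is not preserved in the transcript, so the form used here is an assumption: the soft constraint is taken to be the sum over classes of (binary label × document probability on that class), added to the LDA log-likelihood with a weight `lam`. The function names and the weight are hypothetical.

```python
from typing import Sequence


def soft_constraint(class_probs: Sequence[float], binary_labels: Sequence[int]) -> float:
    """Assumed form: probability mass the document puts on its labeled classes."""
    if len(class_probs) != len(binary_labels):
        raise ValueError("one label per class is required")
    return sum(p * y for p, y in zip(class_probs, binary_labels))


def ws_lda_objective(
    lda_log_likelihood: float,
    class_probs: Sequence[float],
    binary_labels: Sequence[int],
    lam: float = 1.0,  # assumed trade-off weight, not given on the slide
) -> float:
    """Assumed objective: LDA term plus the weighted soft-constraint term."""
    return lda_log_likelihood + lam * soft_constraint(class_probs, binary_labels)


if __name__ == "__main__":
    # A document (named entity) labeled Movie and Game but not Book: labels 1 1 0.
    probs = [0.5, 0.3, 0.2]
    labels = [1, 1, 0]
    print(soft_constraint(probs, labels))
    print(ws_lda_objective(-10.0, probs, labels, lam=2.0))
```

Under this assumed form, the constraint is largest when the learned topic distribution concentrates on the labeled classes, which is one way to align implicit topics with explicit classes without forcing a hard assignment.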

  19. Extension: Leveraging Clicks
      [Figure: the classes (Game, Movie, Book) are linked both to the context t (# wallpapers, # movies, # walkthrough, # book price, …) and to the clicked host name t' (www.imdb.com, www.wikipedia.com, www.gamespot.com, www.sparknotes.com, cheats.ign.com, …).]
      • Candidate click features: URL words, title words, snippet words, content words, other features

  20. Summary The goal of query understanding is to enrich query representation and essentially solve the problem of term mismatching.

  21. Thanks!
