
Learning Joint Query Interpretation and Response Ranking

Learning Joint Query Interpretation and Response Ranking. Uma Sawant, Soumen Chakrabarti, IIT Bombay. Searching the "Web of things": at least 14% of Web search queries mention a target type or category (Lin et al., WWW 2012). Telegraphic entity search queries.



Presentation Transcript


  1. Learning Joint Query Interpretation and Response Ranking. Uma Sawant, Soumen Chakrabarti (IIT Bombay)

  2. Searching the "Web of things" • At least 14% of Web search queries mention the target type or category (Lin et al., WWW 2012)

  3. Telegraphic entity search queries • No reliable syntax clues for the search engine • Free word order • No or rare capitalization • Rare to find quoted phrases • Few function or relational words

  4. How to answer entity queries? (simplified view of related work) • Two-stage process: a telegraphic or natural-language query is first interpreted, using templates and a knowledge base, into an execution-ready query, which is then executed and ranked to produce entities e1, e2, e3

  5. Our proposal: joint query interpretation and ranking • A telegraphic query yields multiple interpretations over an annotated corpus • Generative and discriminative models jointly score the interpretations and the responses e1, e2, e3

  6. The annotated Web • Type hierarchy: Type: Major_league_baseball_teams is subTypeOf Type: All • Entity: San_Diego_Padres is instanceOf Major_league_baseball_teams • Annotated document (mentionOf San_Diego_Padres): "… By comparison, the Padres have been to two World Series, losing in 1984 and 1998. …"
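The annotated-Web structure above can be sketched as a tiny in-memory catalog. This is a hypothetical illustration of the subTypeOf / instanceOf / mentionOf relations, not the authors' data structures; the helper `entities_of_type` is an assumed name.

```python
# Minimal sketch (hypothetical, not the paper's code): a type catalog with
# subTypeOf and instanceOf edges, plus mention-level corpus annotations.
sub_type_of = {"Major_league_baseball_teams": "All"}
instance_of = {"San_Diego_Padres": "Major_league_baseball_teams"}
mentions = {
    "San_Diego_Padres": [
        "By comparison, the Padres have been to two World Series, "
        "losing in 1984 and 1998."
    ],
}

def entities_of_type(t):
    """All entities whose type is t or a subtype of t."""
    out = []
    for e, et in instance_of.items():
        cur = et
        while cur is not None:          # walk up the subTypeOf chain
            if cur == t:
                out.append(e)
                break
            cur = sub_type_of.get(cur)
    return out
```

Walking the subTypeOf chain is what lets a query restricted to a generic type (e.g. All) still reach entities annotated with a specific leaf type.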

  7. Query = type hints + word matchers • Large type catalog • Most query words match some type • Padres rarely co-occurs with hockey • Can know this only from corpus stats • Query: losing team baseball world series 1998 • Incorrect type: World_Series_Hockey_teams

  8. Query = type hints + word matchers • Large type catalog • Most query words match some type • Padres rarely co-occurs with hockey • Can know this only from corpus stats • Need joint type inference and snippet scoring • Query: losing team baseball world series 1998 • Correct type: Major_league_baseball_teams, with instanceOf Entity: San Diego Padres • Word matches in evidence snippet (mentionOf): "By comparison, the Padres have been to two World Series, losing in 1984 and 1998."

  9. Generative model: generate query from entity • Entity E: San Diego Padres, with context "Padres have been to two World Series, losing in 1984 and 1998" and type T: Major league baseball team • Type hints generated from the type model: baseball, team • Context matchers generated from the context model: losing, 1998, world series • A switch variable Z decides, for each word of q = "losing team baseball world series 1998", which model generates it

  10. Generative approach: plate diagram • Choose entity E (entity context language model) • Choose type T to describe the entity (type description language model) • For each query word W: a "switch" variable Z decides whether the word hints at the type or is a matcher, then generates the word
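The generative story above can be sketched as a query-likelihood score: each query word is generated either by the type-description language model (a hint) or by the entity-context language model (a matcher), with the switch marginalized out. The probabilities, smoothing constant and switch prior `alpha` below are illustrative assumptions, not the paper's estimates.

```python
import math

def query_log_prob(query_words, p_type, p_ctx, alpha=0.5, eps=1e-9):
    """Log-probability of the query, marginalizing the per-word switch:
    with prob. alpha the word comes from the type LM, else from the
    entity-context LM.  eps is a crude smoothing floor (assumption)."""
    logp = 0.0
    for w in query_words:
        p = alpha * p_type.get(w, eps) + (1 - alpha) * p_ctx.get(w, eps)
        logp += math.log(p)
    return logp

# Toy language models for (entity = San Diego Padres, type = baseball team)
p_type = {"baseball": 0.3, "team": 0.3}
p_ctx = {"losing": 0.1, "world": 0.1, "series": 0.1, "1998": 0.1}
score = query_log_prob("losing team baseball world series 1998".split(),
                       p_type, p_ctx)
```

An entity whose context never mentions the matcher words gets a much lower score, which is how corpus statistics rule out World_Series_Hockey_teams.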

  11. Discriminative model: separate correct from incorrect entities • q: losing team baseball world series 1998 • Correct: San_Diego_Padres with interpretation (t = baseball team) • Incorrect: 1998_World_Series with interpretation (t = series)

  12. Feature vector design inspired by generative model • Feature vector given query, entity, type, switches (hints and matchers) • Models entity prior • Models type prior Pr(t|e) • Compatibility between hint words and type • Compatibility between matchers and snippets that mention e
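A minimal sketch of that feature-vector design: the score of a response entity is a linear function of features computed from (query, entity, type, switches), and the entity is ranked by its best-scoring interpretation. Feature names, weights and the data layout here are hypothetical illustrations, not the authors' implementation.

```python
def phi(q, e, t, z):
    """Feature vector for query q, entity record e, type t, switches z.
    z labels each query word as 'hint' or 'match' (assumed encoding)."""
    hints = [w for w, zi in zip(q, z) if zi == "hint"]
    matchers = [w for w, zi in zip(q, z) if zi == "match"]
    return {
        "entity_prior": e["prior"],
        "type_prior": e["p_type"].get(t, 0.0),      # stands in for Pr(t|e)
        "hint_type_compat": sum(w in t for w in hints),
        "matcher_snippet_compat": sum(
            any(w in s for s in e["snippets"]) for w in matchers),
    }

def score(weights, q, e, interpretations):
    """Score e by its best interpretation (t, z) under a linear model."""
    return max(sum(weights[k] * v for k, v in phi(q, e, t, z).items())
               for t, z in interpretations)
```

Scoring by the maximum over interpretations is what makes the training objective non-convex, which the later "annealing algorithms" bullet addresses.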

  13. Discriminative framework • Constraints are formulated using the best scoring interpretation • Non-convex formulation • Annealing algorithms

  14. Testbed • YAGO entity and type catalog: ~0.2 million types and ~1.9 million entities • Annotated corpus: Web corpus of 500 million pages, ~16 annotations per page • ~700 entity search queries from TREC + INEX, converted to telegraphic form, with the most probable type and answer entities

  15. Experiment 1: entity ranking using joint inference • Target to reach: human-recommended type • Target to surpass: most generic type in catalog (no type inference) • Entity-level NDCG measure (MAP and MRR follow the same trend; details in paper)
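The entity-level NDCG used throughout the experiments can be sketched as follows; `gains` are graded relevance labels of the returned entities in rank order (the data here is illustrative).

```python
import math

def dcg(gains, k):
    """Discounted cumulative gain at cutoff k (log2 position discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg(gains, k):
    """DCG normalized by the ideal (descending-sorted) ranking's DCG."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

A ranking that places all relevant entities first scores 1.0; pushing a relevant entity down the list reduces the score, which is what the rank-1-to-10 comparisons in the charts measure.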

  16. Human > Discriminative > Generative > Generic • [Chart: NDCG at ranks 1 to 10 for human, discriminative, generative, and generic rankers] • Generative significantly better than generic (lower bound) • Generative fills 28% of the gap to human (upper bound) • Discriminative significantly better than generic • Discriminative fills 43% of the gap to human • Discriminative significantly better than generative: easier to balance diverse scales of probabilities

  17. Generic vs. discriminative • Correct hint match & type choice: "cathedral claude monet painting" • Incorrect hint match & type choice: "amazing grace hymn writer"

  18. Discriminative better than human • Correct entity unreachable from the human-recommended type • Discriminative recovers using corpus feedback • Example: query "patsy cline producer"; the hint "producer" maps to the type manufacturer, yet discriminative still reaches the correct entity Owen Bradley

  19. Experiment 2: target type inference • Aggregate ranks of the top-k interpretations to rank types • Compare type-level NDCG with B&N 2012 • Example: query "hermitage museum bank river" with possible target types river, museum, building
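The "aggregate ranks of top-k interpretations" step can be sketched as a simple rank-aggregation vote over the types of the top-scoring interpretations. The reciprocal-rank voting rule below is an illustrative assumption; the paper's exact aggregation scheme may differ.

```python
from collections import defaultdict

def rank_types(interpretations, k):
    """interpretations: (type, entity) pairs already sorted by joint score.
    Each of the top-k interpretations casts a reciprocal-rank vote for
    its type; types are returned in decreasing vote order."""
    votes = defaultdict(float)
    for rank, (t, _entity) in enumerate(interpretations[:k]):
        votes[t] += 1.0 / (rank + 1)
    return sorted(votes, key=votes.get, reverse=True)
```

With the "hermitage museum bank river" example, several museum-typed interpretations lower in the list can still outvote a single river-typed one, which is how joint inference feeds back into type prediction.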

  20. Joint prediction improves type inference • Data: [B&N 2012], DBpedia catalog • Joint prediction improves type inference too!

  21. Experiment 3: joint vs. two-stage • Two-stage baseline: take the best type prediction from experiment 2, then launch a type-restricted query on the annotated corpus • Use top-m types to improve recall • Measure entity-level NDCG • Stage 1 (type inference): e.g. "river museum building" → type river • Stage 2 (ranking): form a type-restricted query, e.g. (river) + matchers or (river OR museum) + matchers, then rank
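The two-stage baseline described above can be sketched as a pipeline; `predict_types` and `search` are hypothetical stand-ins for the stage-1 type predictor and the type-restricted corpus search.

```python
def two_stage(query, predict_types, search, m=1):
    """Stage 1: pick the top-m predicted types.  Stage 2: run a
    type-restricted search per type, merge, and rank by score."""
    types = predict_types(query)[:m]
    results = []
    for t in types:
        results.extend(search(query, restrict_type=t))   # (entity, score)
    seen, ranked = set(), []
    for e, s in sorted(results, key=lambda es: -es[1]):  # best score first
        if e not in seen:
            seen.add(e)
            ranked.append(e)
    return ranked
```

The hard commitment in stage 1 is the weakness the joint model avoids: if the predicted type is wrong, no amount of stage-2 ranking can recover the answer entity.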

  22. Joint entity ranking better than two-stage • [Chart: NDCG at ranks 1 to 10 for joint vs. 2-stage with m = 1, 5, 10] • Giving 2-stage more types (larger m) makes little difference • Joint type prediction and ranking significantly better than 2-stage

  23. Conclusion • A large percentage of Web search queries contain a mention of the target type • Identifying target-type hint words and the type itself is rewarding but non-trivial • The joint query interpretation and ranking approach is significantly better than two-stage • Joint prediction improves type inference • Datasets available at bit.ly/WSpxvr

  24. Questions?

  25. References • T. Lin, P. Pantel, M. Gamon, A. Kannan, A. Fuxman: Active Objects: Actions for Entity-Centric Search. WWW 2012 • K. Balog, R. Neumayer: Hierarchical Target Type Identification for Entity-Oriented Queries. CIKM 2012 • P. Pantel, T. Lin, M. Gamon: Mining Entity Types from Query Logs via User Intent Modeling. ACL 2012: 563-571

  26. Extra slides

  27. Components of the model • Entity prior: (weighted) fraction of snippets attached to an entity in the corpus • Type prior: generality or specificity of types • Hint-type compatibility: probability of generating the hint words from a language model built from the type description; whether a hint sub-sequence matches some type name exactly • Matcher-entity compatibility: weighted fraction of snippets attached to an entity, retrieved using matchers; rarity of matchers + number of supporting snippets

  28. Implementation details • Additive features • One generic query executed on index, rest in memory • Pruned large search space using easy heuristics • Continuous hint words

  29. Not entity disambiguation in query • Does "ymca" in a query refer to the song or the organization? • Query "ymca lyrics" → Entity: YMCA_(song), instanceOf Type: Music; query "ymca address" → Entity: YMCA_(org), instanceOf Type: Organization (learn a topic model for each) • Similar to entity disambiguation in documents: uses accompanying words • Misinterpreting the target type is usually disastrous • Avoid early or hard commitment

  30. Future work • Better type description model • More generic query than “hint+matchers” • Entities as literals • Different models • Explore non-linear models (boosting) • List-wise loss • Use click data

  31. Generative framework • For each query: choose an entity to describe (entity context language model), then choose a type to describe the entity (type description language model) • For each query word W: a "switch" variable Z decides whether the word hints at the type or is a matcher, then generates the word

  32. Discriminative framework • Feature vector given query, entity, type, switches (hints and matchers) • Models entity prior and type prior Pr(t|e) • Compatibility between hint words and type; compatibility between matchers and snippets that mention e • Given q, the score of response e is computed from the feature vector • Ranking model trained by distant supervision

  33. Joint entity ranking better than two-stage • State-of-the-art target type predictor • Does not use corpus information • Pick top-k types to improve type recall • Launch type-restricted query on annotated corpus • Significantly worse than joint type prediction and ranking

  34. How to answer entity queries? (simplified view of related work) • Two-stage process: a telegraphic or NL query is interpreted, using templates over knowledge sources (annotated corpus, RDF tuples, tables), into an execution-ready query, which is then executed and ranked to produce e1, e2, e3
