Learning to Rank from Distant Supervision: Exploiting Noisy Redundancy for Relational Entity Search

Mianwei Zhou, Hongning Wang, Kevin Chen-Chuan Chang. University of Illinois Urbana-Champaign.

Presentation Transcript


  1. Learning to Rank from Distant Supervision: Exploiting Noisy Redundancy for Relational Entity Search

    Mianwei Zhou, Hongning Wang, Kevin Chen-Chuan Chang. University of Illinois Urbana-Champaign
  2. Limitation of Traditional Entity Search. Entity search: in most cases, what we want are not pages, but entities (Cheng 2007). Example: to answer “Thalmic Lab founded by ?”, the user issues the keyword query “Thalmic Lab founded by #person”. Limitations: (1) it fails to model the relation implied by the query; (2) it is difficult for the user to enumerate the different representations of a relation.
  3. Our Task: Relational Entity Search. Relational entity search uses relation-specific ranking functions. Input: relational queries such as FounderOf(“ThalmicLab”) and GraduateFrom(“Barack Obama”). The Relational Entity Searcher applies the matching ranker (Founder-Of Ranker, Grad-From Ranker, …). Output: entities such as Stephen Lake, Aaron Grant, … for the first query and Columbia Univ., Harvard Law School for the second. Advantages: (1) rankers are trained from relational data to capture the relation semantics; (2) users are relieved of the burden of specifying keywords.
  4. Proposal: Relational Entity Search Framework
  5. Relational Entity Search Framework. Components: the Relational Entity Searcher with its relation rankers (Founder-Of Ranker, …), and the Entity-aware Searcher with its Keyword Indexes and Entity Index. Example snippets: s1: “Microsoft was founded by Bill Gates”; s2: “Steven Ballmer is CEO of Microsoft”.
  6. Relational Entity Search Framework. Relational query: FounderOf(“ThalmicLab”). Translated query: Thalmic Lab; founded by, founder, started, …; #Person. Entity-snippet pairs from the Entity-aware Searcher (Keyword Indexes, Entity Index): Stephen Lake: “Stephen Lake, co-founder of Thalmic Lab”; Daniel Debow: “Daniel Debow investigates Thalmic Lab …”. Result from the Founder-Of Ranker: Stephen Lake, Aaron Grant, …
  7. Training Relational Entity Ranker (offline). Entity-snippet pairs from the Entity-aware Searcher: Bill Gates: “Microsoft was founded by Bill Gates …”; Paul Allen; Steven Ballmer: “Steven Ballmer is CEO of Microsoft”. These pairs and the FounderOf relation are used to train the Founder-Of Ranker.
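
The offline labeling step on slide 7 can be sketched as follows. This is a minimal illustration, not the authors' code: the function `label_entities`, the `founder_of` dictionary and the `retrieved` results are hypothetical names and toy data chosen to mirror the slide's example. The point it shows is that distant supervision labels entities only, never individual snippets.

```python
from typing import Dict, List, Set, Tuple


def label_entities(
    subject: str,
    retrieved: Dict[str, List[str]],
    known_facts: Dict[str, Set[str]],
) -> List[Tuple[str, List[str], int]]:
    """Attach an entity-level label (1 or 0) to every retrieved entity.

    Only entities are labeled; the snippets themselves stay unlabeled.
    """
    positives = known_facts.get(subject, set())
    return [
        (entity, snippets, 1 if entity in positives else 0)
        for entity, snippets in retrieved.items()
    ]


# Hypothetical FounderOf facts and search results (toy data).
founder_of = {"Microsoft": {"Bill Gates", "Paul Allen"}}
retrieved = {
    "Bill Gates": ["Microsoft was founded by Bill Gates"],
    "Steven Ballmer": ["Steven Ballmer is CEO of Microsoft"],
}
training = label_entities("Microsoft", retrieved, founder_of)
```

Here `training` pairs Bill Gates with label 1 and Steven Ballmer with label 0, while every snippet of a positive entity inherits no label of its own, which is the source of the noise problem discussed in the following slides.
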
  8. Challenges on Accuracy and Efficiency
  9. Challenges on Accuracy and Efficiency: Distantly Supervised Ranking. Challenge 1 (Accuracy): how to avoid the negative effect of noisy snippets of positive entities? Example snippets for the positive entity Bill Gates: s11: “Microsoft was founded by Bill Gates and …”; s12 (noise): “Bill Gates met Microsoft CEO at his home …”; s13: “Bill Gates dropped out of college and started Microsoft.”
  10. Challenges on Accuracy and Efficiency: Distantly Supervised Ranking. Challenge 2 (Efficiency): how to limit the number of keyword features without sacrificing too much accuracy? Each keyword feature (“founded by”, #person; “founder”, #person; “started”, #person; …) is sent to the Entity-aware Searcher to retrieve snippets, which requires expensive index checking.
  11. Challenges on Accuracy and Efficiency: Distantly Supervised Ranking. Distantly supervised: only entity labels, no snippet labels (Challenge 1). Ranking: efficiency is required (Challenge 2).
  12. Insight: Redundancy Ranking Principle
  13. Learn indicative patterns based on redundancy (Challenge 1: Accuracy). Indicative patterns are patterns that are indicative of the relation, e.g., founded by, started, created, … The same pattern recurs across training pairs: Microsoft => Bill Gates: “Microsoft was founded by Bill Gates.”; IBM => Thomas Watson: “Founded by Thomas Watson, IBM is …”; Facebook => Mark Zuckerberg: “Facebook was founded by Mark Zuckerberg …”. Learned pattern: BeforeEntity[“founded by”].
  14. Filter Noisy Snippets by Indicative Patterns (Challenge 1: Accuracy). Evidence snippet: a snippet that contains at least one indicative pattern. Given the indicative patterns P, the snippets of entity e (Bill Gates) are split into evidence (“Microsoft was founded by Bill Gates”, “Bill Gates started Microsoft”) and noise (“Bill Gates met Microsoft CEO at home”).
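
The filtering rule on slide 14 can be illustrated with a short sketch. The slides do not define the exact matching semantics of `BeforeEntity[...]` and `Around[...]`, so the interpretations below (phrase immediately before the entity mention; phrase anywhere in the snippet) are assumptions, and `Pattern`, `matches` and `evidence_snippets` are hypothetical names.

```python
from typing import List, NamedTuple


class Pattern(NamedTuple):
    kind: str    # "BeforeEntity" or "Around"
    phrase: str


def matches(pattern: Pattern, snippet: str, entity: str) -> bool:
    """Return True if the snippet contains the pattern for this entity."""
    text, ent, phrase = snippet.lower(), entity.lower(), pattern.phrase.lower()
    pos = text.find(ent)
    if pos < 0:
        return False  # the entity must be mentioned in the snippet
    if pattern.kind == "BeforeEntity":
        # Assumed meaning: the phrase ends right before the entity mention.
        return text[:pos].rstrip().endswith(phrase)
    if pattern.kind == "Around":
        # Assumed meaning: the phrase occurs anywhere in the snippet.
        return phrase in text
    raise ValueError(f"unknown pattern kind: {pattern.kind}")


def evidence_snippets(
    snippets: List[str], entity: str, patterns: List[Pattern]
) -> List[str]:
    """Keep snippets that contain at least one indicative pattern."""
    return [s for s in snippets if any(matches(p, s, entity) for p in patterns)]


patterns = [Pattern("BeforeEntity", "founded by"), Pattern("Around", "started")]
snippets = [
    "Microsoft was founded by Bill Gates",
    "Bill Gates met Microsoft CEO at home",
    "Bill Gates started Microsoft",
]
evidence = evidence_snippets(snippets, "Bill Gates", patterns)
```

With these two patterns the first and third snippets are kept as evidence and the second is discarded as noise. Plain substring matching is used for brevity; a real system would presumably match on tokens.
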
  15. A small number of indicative patterns are sufficient (Challenge 2: Efficiency). Snippets: Microsoft => Bill Gates: “Microsoft was founded by Bill Gates.”, “Bill Gates created Microsoft …”; IBM => Thomas Watson: “The founder of IBM is Thomas …”; Facebook => Mark Zuckerberg: “Facebook was founded by Mark Zuckerberg”, “Mark Zuckerberg created Facebook in 2006”. Indicative patterns: BeforeEntity[“founded by”], Around[“founder”], BeforeEntity[“started”], BeforeEntity[“created”], …
  16. Redundancy Ranking Principle. Web redundancy supplies several snippets for entity e (Bill Gates): “Microsoft was founded by Bill Gates”, “Bill Gates met Microsoft CEO at home” and “Bill Gates started Microsoft”. Evidence snippets are chosen by the indicative patterns P; the contribution of each evidence snippet is determined by its snippet features; the noise snippet (“Bill Gates met Microsoft CEO at home”) contributes 0.
  17. Redundancy Ranking Principle. The contribution of each snippet s depends on its snippet feature f(s): the evidence snippets “Microsoft was founded by Bill Gates” and “Bill Gates started Microsoft” contribute according to f(s), while the noise snippet “Bill Gates met Microsoft CEO at home” contributes 0.
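
A minimal sketch of the redundancy ranking principle: an entity's score sums the contributions of its evidence snippets, and noise snippets add nothing. The slides do not give the exact form of the contribution, so the linear form (weights times features) is an assumption, and all names and weight values below are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def snippet_contribution(features: Features, weights: Dict[str, float]) -> float:
    """Assumed linear contribution: dot product of weights and f(s)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def entity_score(
    snippets: List[Tuple[Features, bool]], weights: Dict[str, float]
) -> float:
    """Sum contributions over evidence snippets only; noise contributes 0."""
    return sum(
        snippet_contribution(features, weights)
        for features, is_evidence in snippets
        if is_evidence
    )


# Toy weights and snippet features for the entity Bill Gates.
weights = {"BeforeEntity[founded by]": 2.0, "Around[started]": 1.5, "Around[ceo]": 4.0}
bill_gates = [
    ({"BeforeEntity[founded by]": 1.0}, True),   # evidence
    ({"Around[ceo]": 1.0}, False),               # noise: filtered out
    ({"Around[started]": 1.0}, True),            # evidence
]
score = entity_score(bill_gates, weights)
```

In this toy setting the score is 2.0 + 1.5; without the filter, the noisy “CEO” snippet would add its own weight and distort the ranking, which is the effect the filtering is meant to prevent.
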
  18. Solution: Pattern-based Filter Network
  19. Objective Function. Defined over the entities, entity labels, snippets, indicative patterns and feature weightings, subject to a constraint: for efficiency, the number of indicative patterns should be small.
  20. Model the Redundancy Ranking Principle: Pattern-based Filter Network (PFNet). 1. Noise Filtering Layer: filter noisy snippets by indicative patterns. 2. Evidence Aggregation Layer: aggregate the contributions from evidence snippets.
  21. Noise Filtering Layer in PFNet. One indicator per snippet (“Microsoft was founded by Bill Gates”, “Bill Gates met Microsoft CEO at home”, “Steve Ballmer becomes CEO of Microsoft”, “Bill Gates started Microsoft”) marks whether it is an evidence snippet. One indicator per candidate pattern (Around[“started”], Around[“Microsoft”], BeforeEntity[“founded by”], …) marks whether it should be included in the chosen indicative patterns.
  22. Evidence Aggregation Layer in PFNet. Entities: “Steve Ballmer”, “Bill Gates”, … Evidence is aggregated for each entity over its snippets: “Microsoft was founded by Bill Gates”, “Bill Gates met Microsoft CEO at home”, “Steve Ballmer becomes CEO of Microsoft”, “Bill Gates started Microsoft”.
  23. Likelihood for PFNet, subject to constraints.
  24. Factor Design. A snippet is evidence if and only if it contains at least one of the indicative patterns.
  25. Factor Design. Aggregate the contributions from evidence snippets.
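
The two factors can be written in symbols as below. The notation is assumed here, since the slide formulas are not part of the transcript: $z_s$ is the evidence indicator of snippet $s$, $m_p(s) \in \{0,1\}$ says whether pattern $p$ occurs in $s$, $P$ is the set of chosen indicative patterns, $S_e$ is the set of snippets of entity $e$, and $w$ and $f(s)$ are the feature weightings and snippet features. The first line restates the "if and only if" rule of slide 24 exactly; the second line is only one plausible aggregation form (a logistic link over summed contributions), not necessarily the one used in PFNet.

```latex
% Slide 24: s is evidence iff it contains at least one indicative pattern.
z_s \;=\; \mathbb{1}\!\left[\exists\, p \in P : m_p(s) = 1\right]
    \;=\; 1 - \prod_{p \in P} \bigl(1 - m_p(s)\bigr)

% Slide 25 (assumed form): aggregate contributions of evidence snippets only.
\Pr(y_e = 1) \;=\; \sigma\!\Bigl(\sum_{s \in S_e} z_s \, w^{\top} f(s)\Bigr),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```
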
  26. Optimization: Maximizing Likelihood by Greedily Adding Indicative Patterns. Indicative patterns P: (empty). Candidate snippet features and log likelihood: BeforeEntity[“founded by”]: -100.20. Given the current P, calculate the maximized likelihood by gradient ascent on w.
  27. Optimization: Maximizing Likelihood by Greedily Adding Indicative Patterns. Candidate snippet features and log likelihood: BeforeEntity[“founded by”]: -100.20; Around[“microsoft”]: -5600.21; Around[“founder”]: -76.13; BeforeEntity[“started”]: -200.43. Indicative patterns P after this step: Around[“founder”].
  28. Optimization: Maximizing Likelihood by Greedily Adding Indicative Patterns. Candidate snippet features and log likelihood: BeforeEntity[“founded by”]: -60.20; Around[“microsoft”]: -3450.21; BeforeEntity[“started”]: -103.43 (Around[“founder”] is already in P). Indicative patterns P after this step: Around[“founder”], BeforeEntity[“founded by”].
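
The greedy procedure on slides 26 to 28 can be sketched as forward selection: for each candidate pattern, refit the weights by gradient ascent, record the log likelihood, and add the best candidate to P. The model below is a simplified stand-in, not PFNet itself: it assumes a logistic model over per-pattern counts of matching snippets, and the toy data, function names and hyperparameters are hypothetical.

```python
import math
from typing import List, Set, Tuple

Entity = Tuple[List[Set[str]], int]  # (patterns found in each snippet, entity label)


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def entity_features(snippets: List[Set[str]], patterns: List[str]) -> List[float]:
    # Count snippets matching each chosen pattern; snippets that match no
    # chosen pattern (noise) add nothing to any count.
    return [float(sum(p in matched for matched in snippets)) for p in patterns]


def fit_log_likelihood(data: List[Entity], patterns: List[str],
                       steps: int = 200, lr: float = 0.1) -> float:
    """Gradient ascent on the weights w (and a bias); returns the log likelihood."""
    rows = [(entity_features(snips, patterns), y) for snips, y in data]
    w, b = [0.0] * len(patterns), 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in rows:
            err = y - sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            grad_b += err
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        b += lr * grad_b
        w = [wi + lr * g for wi, g in zip(w, grad_w)]
    ll = 0.0
    for x, y in rows:
        p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll


def greedy_select(data: List[Entity], candidates: List[str],
                  max_patterns: int = 10) -> Tuple[List[str], float]:
    chosen: List[str] = []
    best_ll = fit_log_likelihood(data, chosen)
    while len(chosen) < max_patterns:
        scored = [(fit_log_likelihood(data, chosen + [c]), c)
                  for c in candidates if c not in chosen]
        if not scored:
            break
        ll, best = max(scored)
        if ll <= best_ll + 1e-6:   # stop when no candidate improves the likelihood
            break
        chosen.append(best)
        best_ll = ll
    return chosen, best_ll


# Toy training set: two positive founders, two negative entities.
data = [
    ([{"BeforeEntity[founded by]", "Around[microsoft]"},
      {"Around[started]", "Around[microsoft]"}], 1),       # Bill Gates
    ([{"Around[microsoft]"}], 0),                           # Steve Ballmer
    ([{"BeforeEntity[founded by]", "Around[ibm]"}], 1),     # Thomas Watson
    ([{"Around[ibm]"}], 0),                                 # a non-founder for IBM
]
candidates = ["BeforeEntity[founded by]", "Around[microsoft]", "Around[started]"]
chosen, best_ll = greedy_select(data, candidates, max_patterns=2)
```

On this toy data the first pattern selected is `BeforeEntity[founded by]`, because it separates the positive from the negative entities, while the query-specific `Around[microsoft]` does not generalize across queries. The `max_patterns` cap reflects the efficiency constraint that the number of indicative patterns should stay small.
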
  29. Experiment Setting. 6 sets of different relations.
  30. Experiment Setting. Baselines: EntityRank (Cheng 2007), Multi-Instance Learning (MIL, Riedel 2010), SVMRank (Joachims 2003).
  31. Ranking Performance on 6 Different Relations.
  32. Findings: (1) relation-specific ranking functions perform better; (2) it is important to leverage redundancy; (3) it is necessary to filter noisy snippets.
  33. Larger Improvement on More Noisy Relations
  34. Around 10 indicative patterns are sufficient
  35. Findings: (1) higher redundancy can achieve better results; (2) filtering noise is helpful for queries of different redundancy.
  36. Findings: (1) performance increases with more training examples; (2) around 90 training examples are sufficient for most relations.
  37. Thanks. Q & A