Slot Filling based on Knowledge Graph and Truth Finding

Slot Filling based on Knowledge Graph and Truth Finding. Dian Yu, Haibo Li, Hongzhao Huang and Heng Ji, Computer Science Department, Rensselaer Polytechnic Institute, {yud2, jih}@rpi.edu, November 18, 2013.

Presentation Transcript


  1. Slot Filling based on Knowledge Graph and Truth Finding Dian Yu, Haibo Li, Hongzhao Huang and Heng Ji Computer Science Department Rensselaer Polytechnic Institute {yud2, jih}@rpi.edu November 18, 2013

  2. Our Starting Point = 0 • BLENDER SF2010 System • BLENDER SFV2012 System • Why? Because Heng wants everything new in her new place.

  3. Outline • Limitations of state-of-the-art • Our Vision and Approach Overview • Knowledge Graph Construction • Knowledge Path Extraction • Knowledge Path Selection • Knowledge Graph Clustering • Truth Finding • 2-Layer Mutual Enhancement Truth Finding • Credibility Initialization • Credibility Propagation • Experimental Results • Remaining Challenges • Conclusions and Future Work

  5. Unpleasant Situation of Slot Filling 2009-2012 • The most challenging task in KBP • Most previous systems hit the 30% “performance ceiling” • No significant publications on this task at major venues • What are the bottlenecks? • Limited amount of labeled data → supervised learning is infeasible • Low coverage of patterns → construct a knowledge graph with enriched semantic IE annotations and coreference • Knowledge gap → linguistic constraint mining, path selection and graph clustering • Conflicting results → truth finding

  6. Inspiration from Gravitational Theory [Figure; labels: Query, Slot Filler?, Slot Type?]

  7. Approach Overview [Figure: system pipeline. Components shown: Query, Alternative Name Slot Fill Extraction, Query Expansion, Information Retrieval, Source Corpus, Semantic Annotation (Dependency Parsing, Information Extraction), Knowledge Graph Construction, Path Extraction, Path Selection, Graph Clustering, Wikipedia Mining, Merged KBs, Truth Finding, Redundancy Removal & Filler Normalization, Slot Fills]

  10. Bottleneck: Low Coverage of Patterns • Manually crafted/edited patterns: low coverage; expensive • Bootstrapping: hard to generalize; long-tail distribution • Typical dependency patterns for per:place_of_birth • <Query_PER> nsubjpass-1 born prep_in <Filler_LOC> • <Query_PER> partmod born prep_in <Filler_LOC> • <Query_PER> nsubjpass-1 born prep_on <Filler_LOC> • <Query_PER> rcmod born prep_in <Filler_LOC> • Missing some simple cases • Charles Gwathmey [1] was born on June 19 , 1938 , in Charlotte [2], N.C. • Dependency path between [1] and [2]: ['nsubjpass', 'born', 'prep_on', 'June', 'prep_in', 'N.C', 'nn']
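The coverage gap can be made concrete with a small sketch. It treats a dependency path as a flat list of edge labels and words (the format printed on the slide), encodes the four per:place_of_birth patterns with the query/filler placeholders and the `-1` inverse markers dropped, and contrasts exact matching with a relaxed in-order (subsequence) match. The encoding, the helper names and the relaxation are illustrative assumptions, not the system's actual pattern matcher.

```python
from typing import List, Sequence

# per:place_of_birth patterns from the slide, simplified (hypothetical encoding):
# query/filler placeholders removed, "-1" inverse-edge markers dropped.
PLACE_OF_BIRTH_PATTERNS: List[List[str]] = [
    ["nsubjpass", "born", "prep_in"],
    ["partmod", "born", "prep_in"],
    ["nsubjpass", "born", "prep_on"],
    ["rcmod", "born", "prep_in"],
]


def exact_match(path: Sequence[str], pattern: Sequence[str]) -> bool:
    """The whole path must equal the pattern."""
    return list(path) == list(pattern)


def subsequence_match(path: Sequence[str], pattern: Sequence[str]) -> bool:
    """Pattern tokens must appear in the path in the same order; gaps are allowed."""
    it = iter(path)
    # `tok in it` consumes the iterator up to the first match, preserving order.
    return all(tok in it for tok in pattern)


def matching_patterns(path, patterns, matcher):
    return [p for p in patterns if matcher(path, p)]


# Path between "Charles Gwathmey" and "Charlotte" as given on the slide.
gwathmey_path = ["nsubjpass", "born", "prep_on", "June", "prep_in", "N.C", "nn"]

print(matching_patterns(gwathmey_path, PLACE_OF_BIRTH_PATTERNS, exact_match))
# []
print(matching_patterns(gwathmey_path, PLACE_OF_BIRTH_PATTERNS, subsequence_match))
# [['nsubjpass', 'born', 'prep_in'], ['nsubjpass', 'born', 'prep_on']]
```

The relaxed matcher recovers the Gwathmey case, but it also fires the `prep_on` pattern, which here attaches to the date June rather than to a location: relaxing patterns raises coverage at the cost of precision, which is why extracted paths still need selection.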

  11. Bottleneck: Low Coverage of Patterns • Typical Dependency Patterns for per:place_of_death • <Q_PER> nsubj-1 dies prep_in <A_LOC> • <Q_PER> nsubj-1 died prep_in <A_LOC> • <Q_PER> nsubj-1 died prep_on <A_LOC> • <Q_PER> nsubj-1 died prep_in hospital nn <A_LOC> • Missing some simple cases • ``60 Minutes'' was the brainchild of Don Hewitt [1], the show 's longtime executive producer who died Wednesday of pancreatic cancer at his home in Bridgehampton, N.Y. [2], at age 86 . • Dependency path between [1] and [2]: ['appos', 'producer', 'nsubj', 'died', 'who', 'rcmod', 'died', 'prep_at', 'home', 'prep_in']

  12. Knowledge Gap 1 • Deep Knowledge Acquisition: Nominal Coreference • Almost overnight, he became fabulously rich, with a $3-million book deal, a $100,000 speech making fee, and a lucrative multifaceted consulting business, Giuliani Partners. As a celebrity rainmaker and lawyer, his income last year exceeded $17 million. His consulting partners included seven of those who were with him on 9/11, and in 2002 Alan Placa, his boyhood pal, went to work at the firm. • After successful karting career in Europe, Perera became part of the Toyota F1 Young Drivers Development Program and was a Formula One test driver for the Japanese company in 2006. • “Alexandra Burke is out with the video for her second single … taken from the British artist’s debut album” • “a woman charged with running a prostitution ring … her business, Pamela Martin and Associates” • Our Solution: Online knowledge graph construction; enrich paths with semantic annotations and Information Extraction (coreference/relation/event)

  13. Knowledge Path Extraction [Figure: ① Relevant Document Set; ② Sentence Set (tree representation); ③ Extracted Paths]

  14. Knowledge Path Extraction • Sentence: Mays, 50, had died in his sleep at his Tampa home the morning of June 28. • Node annotations: {PER, NAM, Billy Mays} {PER, PRO, Mays} {NUM} {Death-Trigger} {GPE, NAM, FL-USA} {06/28/2009, TIME-WITHIN} {FAC, NOM} {Located} • Extracted Knowledge Paths: 1) Mays…amod…50 2) Mays…nsubj…died…prep_at…home…Tampa 3) Mays…nsubj…died…prep_at…June, 28

  15. Knowledge Path Extraction • Each node is an entity/time/value mention extent or a word, enriched by • Entity type/subtype • Time normalization, role • Mention head • Full name of the entity that a mention node refers to • Slot type of trigger phrases mined from Gigaword, Wikipedia articles and KBs • Each edge is • a derivation path from syntactic parsing, or • a type-labeled dependency path, or • an event/semantic relation extracted by IE, labeled with argument roles
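A minimal sketch of this node/edge representation, assuming Python dataclasses: each node carries the IE attributes listed above, and each edge records whether it comes from dependency parsing or from an IE relation/event, plus its label or argument role. The field names, the `kind` values and the hand-built edges for the Mays sentence (slide 14) are hypothetical, chosen only to illustrate the structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A mention extent or word; attribute names are illustrative."""
    text: str
    entity_type: Optional[str] = None      # e.g. "PER", "GPE", "FAC"
    mention_type: Optional[str] = None     # e.g. "NAM", "NOM", "PRO"
    head: Optional[str] = None             # mention head
    full_name: Optional[str] = None        # full name of the entity the mention refers to
    normalized_time: Optional[str] = None  # e.g. "06/28/2009"
    slot_trigger: Optional[str] = None     # slot type of a mined trigger phrase


@dataclass
class Edge:
    source: int
    target: int
    kind: str   # "dependency", "relation" or "event" (hypothetical values)
    label: str  # dependency label or argument role


@dataclass
class KnowledgeGraph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> int:
        self.nodes.append(node)
        return len(self.nodes) - 1

    def add_edge(self, source: int, target: int, kind: str, label: str) -> None:
        self.edges.append(Edge(source, target, kind, label))

    def neighbors(self, idx: int) -> List[Tuple[int, Edge]]:
        """Neighbors of a node, ignoring edge direction, with the connecting edge."""
        result = []
        for e in self.edges:
            if e.source == idx:
                result.append((e.target, e))
            elif e.target == idx:
                result.append((e.source, e))
        return result


# Part of the Mays sentence, built by hand for illustration.
g = KnowledgeGraph()
mays = g.add_node(Node("Mays", "PER", "PRO", head="Mays", full_name="Billy Mays"))
died = g.add_node(Node("died", slot_trigger="Death"))
home = g.add_node(Node("home", "FAC", "NOM", head="home"))
tampa = g.add_node(Node("Tampa", "GPE", "NAM", head="Tampa"))
g.add_edge(died, mays, "dependency", "nsubj")
g.add_edge(died, home, "dependency", "prep_at")
g.add_edge(home, tampa, "dependency", "nn")

print([(g.nodes[n].text, e.label) for n, e in g.neighbors(died)])
# [('Mays', 'nsubj'), ('home', 'prep_at')]
```

Keeping the pronoun node "Mays" linked to its full name "Billy Mays" is what lets a path that starts at a pronoun or short mention still be attributed to the query entity.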

  17. Knowledge Gap 2 • Deep Knowledge Acquisition: Implicit paraphrases & long-tail distribution • “employee/member”: • Sutil, a trained pianist, tested for Midland in 2006 and raced for Spyker in 2007 where he scored one point in the Japanese Grand Prix. • Daimler Chrysler reports 2004 profits of $3.3 billion; Chrysler earns $1.9 billion. • In her second term, she received a seat on the powerful Ways and Means Committee • Jennifer Dunn was the face of the Washington state Republican Party for more than two decades • State of Residence: Davis became Virginia's first Republican woman elected to Congress in 2000, and she was a member of the House Armed Services Committee and the Foreign Affairs Committee • Buchwald lied about his age and escaped into the Marine Corps. • By 1942, Peterson was performing with one of Canada's leading big bands, the Johnny Holmes Orchestra. • Even more: “would join”, “would be appointed”, “will start at”, “went to work”, “was transferred to”, “was recruited by”, “took over as”, “succeeded PERSON”, “began to teach piano”, … • “spouse”: • Buchwald 's 1952 wedding -- Lena Horne arranged for it to be held in London 's Westminster Cathedral -- was attended by Gene Kelly , John Huston , Jose Ferrer , Perle Mesta and Rosemary Clooney , to name a few

  18. Too Rich is not Always a Good Thing • Need to filter out noisy contexts: 97% of paths are irrelevant • Our Solution: Multi-Layer Path Selection • Encode slot type-specific linguistic constraints for deep understanding • Constraint Examples • Candidate/context node attributes (entity type, mention type, time, number, URL…) • Stop words, upper case/lower case • Match name gazetteers and existing KBs (YAGO, Wikipedia infoboxes, Freebase, DBPedia) and the KB mined from Wikipedia • Path length • His most noticeable moment in the public eye came in 1979 , when Muslim militants in Iran seized the U.S. Embassy and took the Americans stationed there hostage . • path = ('poss', 'moment', 'nsubj', 'came', 'advcl', 'seized', 'nsubj', 'Muslim militants', 'amod') • Coreference links/relation argument roles/event argument roles • Position of a particular node/edge type in the path • Semantic categories of context nodes from IE annotations • Entity node’s role in the entire sentence (e.g. remove commenters/reporters) • Filter “origin” if the person is a commenter: “Canada and Russia , they have unbelievable rosters , '' Forsberg said .”
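One way to read multi-layer path selection is as a conjunction of cheap predicates over each candidate. The sketch below encodes a few of the constraint families listed above (expected filler entity type, upper case, path length, commenter role). The `Candidate` fields, the type table, the Forsberg path and the `MAX_PATH_LEN = 8` threshold are all hypothetical, chosen for illustration rather than taken from the system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set


@dataclass
class Candidate:
    slot: str                       # e.g. "per:origin"
    filler: str
    filler_type: str                # entity type from IE, e.g. "GPE"
    path: Sequence[str]             # dependency path from query to filler
    query_is_speaker: bool = False  # query appears only as the speaker of a quote


# Hypothetical slot -> allowed filler types (illustrative subset).
EXPECTED_TYPES: Dict[str, Set[str]] = {
    "per:origin": {"GPE", "NORP"},
    "per:place_of_birth": {"GPE", "LOC"},
}
MAX_PATH_LEN = 8  # assumed threshold, not the system's actual value


def type_ok(c: Candidate) -> bool:
    # Slots missing from the table are not constrained by type.
    return c.filler_type in EXPECTED_TYPES.get(c.slot, {c.filler_type})


def case_ok(c: Candidate) -> bool:
    # Name fillers should start with an upper-case letter.
    return c.filler[:1].isupper()


def length_ok(c: Candidate) -> bool:
    return len(c.path) <= MAX_PATH_LEN


def not_commenter(c: Candidate) -> bool:
    # A speaker quoted about a country is not evidence of origin.
    return not (c.slot == "per:origin" and c.query_is_speaker)


CONSTRAINTS: List[Callable[[Candidate], bool]] = [type_ok, case_ok, length_ok, not_commenter]


def select(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that satisfy every constraint."""
    return [c for c in candidates if all(rule(c) for rule in CONSTRAINTS)]


candidates = [
    Candidate("per:place_of_birth", "Charlotte", "GPE",
              ["nsubjpass", "born", "prep_on", "June", "prep_in"]),
    # Forsberg quote; this path is made up for illustration.
    Candidate("per:origin", "Canada", "GPE",
              ["nsubj", "said", "ccomp", "have", "nsubj"], query_is_speaker=True),
    # Path from the slide; the slot assignment is hypothetical.
    Candidate("per:origin", "Iran", "GPE",
              ["poss", "moment", "nsubj", "came", "advcl", "seized",
               "nsubj", "Muslim militants", "amod"]),
]
print([c.filler for c in select(candidates)])  # ['Charlotte']
```

Because each constraint is an independent predicate, new slot-specific rules can be appended to `CONSTRAINTS` without touching the others.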

  19. More Constraint Examples • Edge type • place_of_death and place_of_birth paths should include prep_in or prep_at edges • Filter “Employee” if the dependency path includes “prep_on” • Manh called on the Asian Development Bank to play a greater role in helping improve national infrastructure. (path: ['nsubj', 'called', 'prep_on']) • Lexical constraints based on trigger phrases/words [Heng’s several days of paper-and-pen work] • Mining from Gigaword, Wikipedia articles • Mining from KBs (Wikipedia infoboxes, Freebase, YAGO, DBPedia) • CMU NELL knowledge base (e.g. religion list from is-a relation) • Examples: • “top-employees”: chief executive officer, chief financial officer, chief operating officer, chief strategy and development officer, chief information officer, e-commerce and security officer,… • “headquarters”: based, headquarter, headquarters, 's • Disease list from medical ontology • Comparison with competing context nodes
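The edge-type and lexical constraints on this slide can be sketched the same way. Below, required and forbidden edge labels are looked up per slot, and a trigger lexicon is checked by naive substring matching. The slot keys, the lexicon excerpts (taken from the examples above) and the matching strategy are illustrative assumptions, not the mined resources themselves; the possessive `'s` trigger is left out because a plain substring test would match almost any text.

```python
from typing import Dict, Sequence, Set

# Edge-type constraints paraphrased from the slide (slot keys are illustrative).
REQUIRED_EDGES: Dict[str, Set[str]] = {
    "per:place_of_birth": {"prep_in", "prep_at"},
    "per:place_of_death": {"prep_in", "prep_at"},
}
FORBIDDEN_EDGES: Dict[str, Set[str]] = {
    "per:employee_of": {"prep_on"},
}
# Small excerpts of two trigger lexicons listed on the slide.
TRIGGERS: Dict[str, Set[str]] = {
    "top-employees": {"chief executive officer", "chief financial officer",
                      "chief operating officer", "chief information officer"},
    "headquarters": {"based", "headquarter", "headquarters"},
}


def edges_ok(slot: str, path: Sequence[str]) -> bool:
    labels = set(path)
    required = REQUIRED_EDGES.get(slot)
    if required and not (required & labels):
        return False  # none of the required edge types occurs on the path
    # Reject the path if it contains any forbidden edge type for this slot.
    return not (FORBIDDEN_EDGES.get(slot, set()) & labels)


def has_trigger(lexicon: str, context: str) -> bool:
    """Naive case-insensitive substring check against a trigger lexicon."""
    text = context.lower()
    return any(t in text for t in TRIGGERS.get(lexicon, set()))


print(edges_ok("per:employee_of", ["nsubj", "called", "prep_on"]))       # False
print(edges_ok("per:place_of_birth", ["nsubjpass", "born", "prep_in"]))  # True
print(has_trigger("headquarters", "The firm is based in New York."))    # True
```

A real system would match triggers on tokens or lemmas rather than raw substrings; the substring test only keeps the sketch short.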

  22. Slot Filling != Binary Relation Extraction • A sentence is usually anchored by a predicate instead of a pair of entities • Slot fillers need to be extracted from multiple documents instead of a local context involving two entities (main difference from ACE relation extraction) • Capture the interactions among query, candidate slot filler and all other (competing) entities from global contexts, instead of only the path between query and candidate slot filler • Cross-slot, cross-entity reasoning is required • Generalization of similar specific graphs • Model a candidate mention or context word’s latent semantic role based on its local context knowledge graph

  23. Small Universe of a Mention/Word • Filler competing with popular entities involved in centroid events/topics • “Hewitt was born Dec. 14 , 1922 , in New York City , but his family soon moved to Boston , where his father worked as the classified advertising manager for the Boston Herald American.” • Query: Hewitt • Candidate Filler 1: New York City • Path: 'nsubjpass', 'born', 'prep_in' • Candidate Filler 2: Boston • Path: 'nsubjpass', 'born', 'conj_but', 'moved', 'prep_to' • [Figure: small universe of “Boston”; nodes: Boston, his family, family, his father, Hewitt, Herald American; edge labels: coref, movement, employee, mod]
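The small universe of a mention can be approximated as the subgraph induced by all nodes within k hops of it. The sketch below runs a breadth-first search over a hand-built, approximate dependency graph of the Hewitt sentence. The edge labels are Stanford-style guesses, IE edges such as coref or movement are omitted, and the function name and radius are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

LabeledEdge = Tuple[str, str, str]  # (head, dependent, label)

# Approximate dependency edges for the Hewitt sentence (hand-built, illustrative).
HEWITT_EDGES: List[LabeledEdge] = [
    ("born", "Hewitt", "nsubjpass"),
    ("born", "New York City", "prep_in"),
    ("born", "moved", "conj_but"),
    ("moved", "his family", "nsubj"),
    ("moved", "Boston", "prep_to"),
    ("Boston", "worked", "rcmod"),
    ("worked", "his father", "nsubj"),
    ("worked", "manager", "prep_as"),
    ("manager", "Boston Herald American", "prep_for"),
]


def small_universe(edges: List[LabeledEdge], center: str, radius: int) -> List[LabeledEdge]:
    """Edges of the subgraph induced by nodes within `radius` hops of `center`."""
    adj: Dict[str, Set[str]] = {}
    for head, dep, _ in edges:
        adj.setdefault(head, set()).add(dep)
        adj.setdefault(dep, set()).add(head)
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for nb in adj.get(node, set()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return [e for e in edges if e[0] in dist and e[1] in dist]


for filler in ("New York City", "Boston"):
    sub = small_universe(HEWITT_EDGES, filler, radius=2)
    print(filler, [label for _, _, label in sub])
# New York City ['nsubjpass', 'prep_in', 'conj_but']
# Boston ['conj_but', 'nsubj', 'prep_to', 'rcmod', 'nsubj', 'prep_as']
```

With radius 2, the universe of Boston pulls in the movement (moved, his family) and employment (worked, his father, manager) context, while that of New York City holds only the birth predicate and the query: the competing context a filler-aware model has to weigh.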

  24. Knowledge Graph Clustering • Hypothesis: Entity mentions/words that share similar local graph structures and labels are likely to play similar roles • Local graphs for correct spouse fillers: [figure] • Local graphs for incorrect spouse fillers: [figure]

  25. Knowledge Graph Clustering • Similarity Measures • Structure similarity • number of nodes • number of edges • radius • degree assortativity of the graph • the maximum degree centrality over nodes in the graph (density) • Similarity between attributes of nodes and edges (PageRank) • More powerful for persons than for organizations
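The structure-similarity features listed above can be computed in plain Python on a small undirected local graph. The sketch follows the usual definitions: degree centrality normalized by n - 1, radius as the minimum eccentricity, and degree assortativity as the Pearson correlation of degrees at the two ends of each edge, the quantity networkx reports as `degree_assortativity_coefficient` for simple undirected graphs. Turning feature vectors into a similarity via 1 / (1 + Euclidean distance) is our own assumption, since the slide does not say how the features are combined. The code assumes a connected simple graph with at least two nodes.

```python
import math
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def adjacency(edges: List[Edge]) -> Dict[Hashable, Set[Hashable]]:
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eccentricity(adj, source) -> int:
    """Longest shortest-path distance from `source` (BFS; graph assumed connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())


def degree_assortativity(adj, edges: List[Edge]) -> float:
    """Pearson correlation of degrees at both ends of each edge (both directions)."""
    xs: List[int] = []
    ys: List[int] = []
    for u, v in edges:
        du, dv = len(adj[u]), len(adj[v])
        xs += [du, dv]
        ys += [dv, du]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for regular graphs; 0.0 is our convention
    return cov / math.sqrt(vx * vy)


def structure_features(edges: List[Edge]) -> Dict[str, float]:
    adj = adjacency(edges)
    n = len(adj)
    return {
        "nodes": float(n),
        "edges": float(len(edges)),
        "radius": float(min(eccentricity(adj, v) for v in adj)),
        "max_degree_centrality": max(len(nb) for nb in adj.values()) / (n - 1),
        "assortativity": degree_assortativity(adj, edges),
    }


def structure_similarity(g1: List[Edge], g2: List[Edge]) -> float:
    """Hypothetical similarity: 1 / (1 + Euclidean distance between feature vectors)."""
    f1, f2 = structure_features(g1), structure_features(g2)
    return 1.0 / (1.0 + math.sqrt(sum((f1[k] - f2[k]) ** 2 for k in f1)))


star = [("filler", "a"), ("filler", "b"), ("filler", "c")]
print(structure_features(star))
# {'nodes': 4.0, 'edges': 3.0, 'radius': 1.0, 'max_degree_centrality': 1.0, 'assortativity': -1.0}
```

In practice the raw features have very different scales, so they would normally be standardized before computing distances; the sketch skips that step for brevity.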

  27. Truth Finding • Negative Statement • Steinmeier, who became Chancellor Angela Merkel's foreign minister in 2005, has denied the U.S. planned to send Kurnaz to Germany. • Conflicting Evidence from Multiple Sources • Yolanda King , daughter of Martin Luther King Jr. , dies | ATLANTA | She was 51 • King died late Tuesday in Santa Monica , California , at age 51 , said Steve Klein , a spokesman for the King Center • A unique challenge that does not exist in traditional single-document information extraction • Our Solution • Develop a new validation approach based on a “truth-finding” framework • Propagate credibility among systems, evidence and claims

  28. Hypotheses • Hypothesis 1: A claim is likely to be true if it is supported by many pieces of trustworthy evidence. A piece of evidence is more likely to be trustworthy if many of the claims it supports are true • Hypothesis 2: A claim or piece of evidence is more likely to be true if it is extracted by many trustworthy systems, and a system is more likely to be credible if it extracts many trustworthy claims or pieces of evidence
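Hypothesis 1 is a mutual-reinforcement (HITS-style) definition over a bipartite claim-evidence graph, and it can be sketched as alternating updates. Claim scores sum the credibility of their supporting evidence, evidence scores sum the credibility of the claims they support, and both are max-normalized each round. The claim and evidence identifiers, the support links, the uniform initialization and the normalization choice are illustrative assumptions, not the framework's actual formulation.

```python
from typing import Dict, List, Tuple


def mutual_reinforcement(links: List[Tuple[str, str]], iterations: int = 20
                         ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """links: (claim, evidence) pairs, meaning the evidence supports the claim."""
    claims = {c for c, _ in links}
    evidence = {e for _, e in links}
    claim_score = {c: 1.0 for c in claims}
    ev_score = {e: 1.0 for e in evidence}
    for _ in range(iterations):
        # A claim is credible if it is supported by trustworthy evidence.
        new_claim = dict.fromkeys(claims, 0.0)
        for c, e in links:
            new_claim[c] += ev_score[e]
        # Evidence is trustworthy if the claims it supports are credible.
        new_ev = dict.fromkeys(evidence, 0.0)
        for c, e in links:
            new_ev[e] += new_claim[c]
        # Max-normalize so that scores stay in [0, 1].
        top_c, top_e = max(new_claim.values()), max(new_ev.values())
        claim_score = {c: s / top_c for c, s in new_claim.items()}
        ev_score = {e: s / top_e for e, s in new_ev.items()}
    return claim_score, ev_score


# Hypothetical claims loosely based on the Yolanda King example; evidence ids are made up.
links = [
    ("death_place=Santa Monica", "e1"), ("death_place=Santa Monica", "e2"),
    ("death_place=Atlanta", "e3"),
    ("age=51", "e1"), ("age=51", "e3"),
]
claims, _ = mutual_reinforcement(links)
for claim, score in sorted(claims.items(), key=lambda kv: -kv[1]):
    print(f"{claim}: {score:.2f}")
```

As in HITS, the claim scores converge (up to scaling) to the principal eigenvector of A·Aᵀ, where A is the claim-evidence support matrix.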

  29. 2-Layer Mutual Enhancement Truth Finding Evidence Claim System Claim-System Networks Claim-Evidence Networks

  30. What’s New • There might be multiple true claims: some redundant, some distinct but of the same type • Most previous truth finding methods relied on the wisdom of the crowd (“great minds think alike”), but majority voting may not always work because certain implicit truths might only be discovered by a few good systems/sources • The performance of a system may vary over time • Systems may share similar resources and be dependent on each other • Not only provide confidence scores, but also detailed evidence and aspects

  31. Credibility Initialization [Figure; labels: System 1, System 2]

  32. Credibility Initialization • Initializing scores for claims: • Evaluate each claim based on the evidence from its dependency path [Figure legend: Filler, Query, Entity, dependency tags] • Example: sentence: “US actress Patricia Neal dies at 84.”; query: Patricia Neal; slot: per:origin; filler: US; dependency path from query to filler: nn

  33. Credibility Propagation based on Tri-HITS • Propagating credibility scores from claim to system • Update system credibility • Propagate credibility from system to claim • Update claim credibility (Huang, 2012)
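A much-simplified sketch of the claim-system layer, inspired by Tri-HITS-style propagation (Huang, 2012) rather than reproducing it. System credibility is the mean score of the claims a system extracted, and each claim score interpolates between its initial score and the mean credibility of the systems that extracted it. The averaging, the interpolation weight `lam`, the system/claim names and the initial scores are assumptions; the evidence layer is omitted, and every claim is assumed to come from at least one system.

```python
from typing import Dict, List, Tuple


def propagate(init: Dict[str, float], extracted: Dict[str, List[str]],
              lam: float = 0.5, iterations: int = 10
              ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """init: initial claim scores in [0, 1]; extracted: system -> claims it produced."""
    claim = dict(init)
    producers: Dict[str, List[str]] = {c: [] for c in init}
    for system, claims in extracted.items():
        for c in claims:
            producers[c].append(system)
    system_cred: Dict[str, float] = {}
    for _ in range(iterations):
        # Claim -> system: a system is credible if the claims it extracts are credible.
        system_cred = {s: sum(claim[c] for c in cs) / len(cs)
                       for s, cs in extracted.items()}
        # System -> claim: blend the initial score with the producers' credibility.
        claim = {
            c: (1 - lam) * init[c]
            + lam * sum(system_cred[s] for s in producers[c]) / len(producers[c])
            for c in init
        }
    return claim, system_cred


init = {"c1": 0.9, "c2": 0.2, "c3": 0.6}  # hypothetical initial claim scores
extracted = {"System 1": ["c1", "c3"], "System 2": ["c1", "c2"]}
claim_scores, system_scores = propagate(init, extracted)
print({c: round(s, 3) for c, s in claim_scores.items()})
print({s: round(v, 3) for s, v in system_scores.items()})
```

Keeping a share `(1 - lam)` of each claim's initial score stops a claim backed by a single credible system from being washed out entirely, which matters when an implicit truth is found by only a few good systems.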

  35. Overall Performance • Our fresh system significantly outperforms our old system (18.6%) • The pool is much better this year (11.7%) • Top 3 among all teams; Top 1 among all DEFT teams

  36. Impact of Knowledge Path Extraction • Alternative-name feedback-based query expansion: 1.1% gain in F-Measure • Entity coreference resolution to enrich knowledge paths: 1.2% gain in F-Measure • Relaxed path length constraints: 0.5% gain in F-Measure

  37. Impact of Knowledge Path Selection and Truth Finding [Chart: F-Measure of KBP2013 SF systems]

  39. Remaining Challenges • Name Tagging Errors • Coreference Resolution Errors • He worked his way up the organization under founder Ted Arison and his son Micky , who now leads Carnival Corp. and called Dickinson, `` one of the most influential people in the development of the modern-day cruise industry. • Indiana Muslim running for Congress wants to combat ignorance about his [Andre Carson] faith INDIANAPOLIS -- A convert to Islam stands an election victory away from becoming the second Muslim elected to Congress and a role model for a faith community seeking to make its mark in national politics. • Vague Justification • It was in December 1970 that Anderson criticized Hoover 's pretrial attack on two Roman Catholic priests , Daniel J. and Philip F. Berrigan , who were later convicted of destroying draft board records. → religion filler? • Fuzzy Definition • She and Russell Simmons, 50, have two daughters: 8-year-old Ming Lee and 5-year-old Aoki Lee.

  40. Remaining Challenges • Distinguish Slot Directions • Organization parent/subsidiary; members/member_of • Implicit Relations • He [Pascal Yoadimnadji] has been evacuated to France on Wednesday after falling ill and slipping into a coma in Chad, Ambassador Moukhtar Wawa Dahab told The Associated Press. His wife, who accompanied Yoadimnadji to Paris, will repatriate his body to Chad, the amba. → is he dead? in Paris? • Until last week, Palin was relatively unknown outside Alaska, and as facts have dribbled out about her, the McCain campaign has insisted that its examination of her background was thorough and that nothing that has come out about her was a surprise. → does she live in Alaska? • The list says that the state is owed $2,665,305 in personal income taxes by singer Dionne Warwick of South Orange, N.J., with the tax lien dating back to 1997. → does she live in NJ? • Vernon Bellecourt -- whose Ojibwe name, WaBun-Inini, means "Man of Dawn" or "Daybreak" -- was born on the White Earth Indian Reservation in Minnesota. He left home at 15 after finding work in a carnival. → did he live in Minnesota?

  42. Conclusions and Future Work • Mined and incorporated rich knowledge from multiple lexical, syntactic and semantic levels for slot filling • Proposed a new knowledge graph representation • Developed a new truth-finding framework for answer validation • Married low-level IE with high-level Data Mining • Future Work • Incorporate more knowledge resources such as NELL into path selection • Hierarchical knowledge graph clustering • Collective joint extraction across queries and slot types • Truth Finding • Source: publication agency, reporter’s profile, social network and his/her role in the event, reporting time and location • System: add history, profile and confidence values (this year’s data is not very discriminative) • Claim: compute similarity based on coreference resolution, entity/event clustering and equivalence, modeling complexity; distance from the answers of the top-tier systems • Evidence Dimensions: soft constraints in path selection

  43. Resource Sharing Plans • January 2014 • Heng’s hand-crafted (paper-and-pen) constraints & dictionaries • BLENDER KB (merged and cleaned from Wikipedia infoboxes, Freebase, YAGO and DBPedia) • March 2014 • Slot Filling system to share with the KBP community; integrated into the BBN DEFT platform
