Unsupervised Graph-based Relation Extraction and Validation for Knowledge Base Population

Ph.D. Thesis Defense • Dian Yu • Advisor: Dr. Heng Ji • Computer Science Department, Rensselaer Polytechnic Institute • Doctoral Committee: Dr. Heng Ji (RPI), Dr. Deborah L. McGuinness (RPI), Dr. Peter Fox (RPI), Dr. Ralph Grishman (NYU) • September 28, 2017

Knowledge Base (KB) • Stores millions of facts about general human knowledge. • Entities • Relations between these entities • Represented as triples (head entity, relation, tail entity), e.g., (RPI, location, Troy) • A useful resource for applications such as document summarization, information retrieval, semantic search, question answering, and automated translation. • Large-scale KB examples: [figure] 1.9 billion
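As a rough illustration of the triple representation, the sketch below stores the slide's example fact and indexes facts by head entity. The `Triple` type and the `index_by_head` helper are hypothetical names introduced here for illustration; they are not taken from any particular KB's API.

```python
from collections import defaultdict, namedtuple

# A KB fact as a (head entity, relation, tail entity) triple.
Triple = namedtuple("Triple", ["head", "relation", "tail"])

kb = [Triple("RPI", "location", "Troy")]  # example fact from the slide


def index_by_head(triples):
    """Group (relation, tail) pairs under their head entity."""
    index = defaultdict(list)
    for t in triples:
        index[t.head].append((t.relation, t.tail))
    return dict(index)


facts = index_by_head(kb)
print(facts["RPI"])
```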

Knowledge Base (KB) • Important input for many large-scale KBs: Wikipedia Infobox

Challenges • Incompleteness • Famous entities • Important attributes • Require huge human effort • Language specific • Insufficient KBs for other languages • RPI + 建立/설립/establecimiento (founded) • Conflicting results from different information sources. • Goal of this thesis: we focus on language-independent relation extraction and validation from large-scale unstructured textual corpora to populate multilingual knowledge bases. • [Figure labels: sibling, otherFamily, spouse; Knowledge Base Completion, Slot Filling, Slot Filling Validation]

Limitations of state-of-the-art • Traditional Relation Extraction • Supervised approaches rely on human-annotated data and resources • Limited to a small set (30-40) of pre-defined types • Not portable to new types or new languages • Open Relation Extraction • Suffers from low recall (<20%) • No satisfying type naming standard • Relation Validation • Ignores multi-source, multi-system evidence • Contributions • Combine the merits of traditional RE (high quality) and open RE (high portability) • Performance comparable to supervised models • No need for manual annotations • Extract 2K+ instead of 40+ types • Automatic naming of interpretable types • We develop a multi-dimensional truth-finding model for relation validation.

Outline • Background • Case Studies • I: Importance-Based Trigger Identification for Traditional Relation Extraction • II: Importance-Based Open Relation Extraction and Grounding • III: Unsupervised Relation Validation: Multi-dimensional Truth-Finding Model • Conclusion • Related Publications

Case Study 1: Slot Filling • Slot Filling (SF): extract the values (fillers) of specific attributes (slot types) for a given entity (query) from a large-scale corpus and provide justification sentences. • Query: Person (PER) or Organization (ORG) • Slot Types: • PER: 25 slot types (per:spouse, per:schools_attended, …) • ORG: 16 slot types (org:founded_by, org:city_of_headquarters, …) • Slot Filler: list-value; single-value • Example query: Judy Hopps • per:alternate_names: Carrots • per:parents: Bonnie and Stu Hopps • per:place_of_residence: Bunnyburrow • per:title: Police officer • "Judy Hopps becomes the first bunny police officer in a new trailer for Disney's Zootopia"

Case Study 2: Slot Filling Validation • Slot Filling Validation (SFV): refinement of the output from Slot Filling systems, either by combining information from multiple SF systems or by applying more intensive linguistic processing to validate individual candidate SF responses. • SF response/claim • query: Judy Hopps • slot type: per:title • slot filler: Police Officer • evidence: “Judy Hopps became the first bunny police officer.” • Objective: classify each SF response as correct or wrong. • [Figure labels: SF claim, SF response]
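To make the response/claim distinction concrete, here is a minimal sketch that assumes a claim is the (query, slot type, filler) part of a response and that each response also records the system that produced it. The `SFResponse` class, the `system` field, and the second (wrong) filler are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SFResponse:
    query: str
    slot_type: str
    filler: str
    evidence: str
    system: str  # hypothetical identifier of the SF system that produced it


def group_by_claim(responses):
    """Map each (query, slot type, filler) claim to the systems asserting it."""
    claims = defaultdict(set)
    for r in responses:
        claims[(r.query, r.slot_type, r.filler)].add(r.system)
    return dict(claims)


evidence = "Judy Hopps became the first bunny police officer."
responses = [
    SFResponse("Judy Hopps", "per:title", "Police Officer", evidence, "system_A"),
    SFResponse("Judy Hopps", "per:title", "Police Officer", evidence, "system_B"),
    # Made-up conflicting response, only to show two competing claims.
    SFResponse("Judy Hopps", "per:title", "Mayor", evidence, "system_B"),
]
claims = group_by_claim(responses)
for claim, systems in claims.items():
    print(claim, sorted(systems))
```

Validation then amounts to assigning each response a correct/wrong label, using the evidence and the pattern of agreement across systems.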

Concepts and Hypotheses • Trigger (T): the smallest extent of text that most clearly indicates a slot type. • Triggers for per:spouse: divorce, wife, husband, etc. • Trigger-driven slots: the slot types indicated by triggers, e.g., per:date_of_birth, per:city_of_death, per:schools_attended • Hypothesis: A trigger is usually an important node relative to the query node and the filler node in the dependency graph of a context sentence. • flat representation -> whole dependency tree structure

Extended Dependency Tree Dominick Dunne(query) – nsubjpass – divorced – acl:relcl – Ellen Griffin Dunne(filler)

Extended Dependency Tree • Objective: build an extended dependency graph for each evidence sentence and generate query and filler candidate mentions. • Extended Tree Construction: • Nodes: all the words in a sentence • Edges: the dependency relations between pairs of words. • We annotate each entity, time, or value mention node with its type. In E1, “1965” is annotated as a year. • [optional] Co-reference resolution. For example, “he” in E1 is replaced by “Dominick Dunne”.

1. Candidate Relation Identification • For each sentence that contains at least one query mention, we regard all other entities, values, and time expressions as candidate fillers and generate a set of entity pairs: (Dominick, 1965), (Dominick, Ellen), (Dominick, 1997).
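The pairing step can be sketched as follows. This is a minimal illustration that assumes mentions have already been detected and typed; the `candidate_pairs` function and the type labels attached to each mention are assumptions made here, not the thesis's actual data format.

```python
def candidate_pairs(mentions, query):
    """Pair the query with every other typed mention in the sentence.

    `mentions` is a list of (text, type) tuples; returns an empty list
    when the sentence does not mention the query at all.
    """
    if query not in [text for text, _ in mentions]:
        return []
    return [(query, text, mtype) for text, mtype in mentions if text != query]


# Mentions loosely following the slide's E1 example; the type labels are assumed.
mentions = [
    ("Dominick Dunne", "PER"),
    ("1965", "YEAR"),
    ("Ellen Griffin Dunne", "PER"),
    ("1997", "YEAR"),
]
pairs = candidate_pairs(mentions, "Dominick Dunne")
print(pairs)
```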

2. Trigger Identification • Compute salience scores of trigger candidates relative to the query and filler in E1.

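2. Trigger Identification • PageRank with Priors: find important nodes relative to the set of preferred (root) nodes $R$ (the query or candidate slot fillers) • $p_v$ denotes the relative importance attached to node $v$ • $p_v = \frac{1}{|R|}$ for $v \in R$, and $p_v = 0$ otherwise. • Back probability $\beta \in [0, 1]$, which determines how often we jump back to the set of root nodes in $R$. • Iterative stationary probability equations: $\pi^{(i+1)}(v) = (1 - \beta) \left( \sum_{u=1}^{d_{in}(v)} p(v \mid u) \, \pi^{(i)}(u) \right) + \beta \, p_v$ • The probability of transitioning to node $v$ from node $u$ is defined as $p(v \mid u) = \frac{1}{d_{out}(u)}$ for each node $v$ that has an edge from $u$ to $v$, and 0 otherwise. • After convergence, $I(v \mid R) = \pi(v)$, since the ranks are biased towards the set $R$.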
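The iteration can be sketched in a few lines of plain Python. This is a minimal sketch, not the thesis implementation: it assumes an undirected dependency graph stored as adjacency lists, uniform transition probabilities, and a back probability called `beta` here with an arbitrary value of 0.3. The toy graph reuses the Dominick Dunne path from the earlier slide; the "in" and "1965" attachment is an assumed parse.

```python
def pagerank_with_priors(adj, roots, beta=0.3, iters=100):
    """Rank nodes by importance relative to the root set `roots`."""
    prior = {v: (1.0 / len(roots) if v in roots else 0.0) for v in adj}
    rank = dict(prior)
    for _ in range(iters):
        new_rank = {}
        for v in adj:
            # Mass arriving from each neighbor u, split evenly over u's edges.
            incoming = sum(rank[u] / len(adj[u]) for u in adj if v in adj[u])
            new_rank[v] = (1 - beta) * incoming + beta * prior[v]
        rank = new_rank
    return rank


# Toy undirected dependency graph (assumed parse, for illustration only).
graph = {
    "Dominick Dunne": ["divorced"],
    "divorced": ["Dominick Dunne", "Ellen Griffin Dunne", "in"],
    "Ellen Griffin Dunne": ["divorced"],
    "in": ["divorced", "1965"],
    "1965": ["in"],
}
roots = {"Dominick Dunne", "Ellen Griffin Dunne"}  # query and candidate filler

ranks = pagerank_with_priors(graph, roots)
# Trigger candidates are the non-root nodes; pick the most salient one.
trigger = max((v for v in graph if v not in roots), key=ranks.get)
print(trigger, round(ranks[trigger], 3))
```

Because every jump back lands on the query or the filler, nodes lying between them accumulate more rank than nodes hanging off to the side.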

3. Trigger Candidate Selection • Cluster triggers into tiers based on their salience scores using Affinity Propagation (Frey and Dueck, 2007) • Choose the cluster {divorced} with the highest average relative importance score (0.128) as the trigger set

4. Slot Type Labeling • Objective: label the slot type for an identified relation triple (Q, T, F). • Two methods: • Match T against existing trigger gazetteers (Yu et al., 2015) for certain types of slots • Extract trigger seeds from the slot filling annotation guideline and then expand them using the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013; Pavlick et al., 2015) • [Table: PPDB-based trigger expansion examples]
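Gazetteer matching can be pictured as a dictionary lookup. The sketch below is a toy under stated assumptions: the per:spouse entries come from the earlier slide, while the per:date_of_birth entry, the tiny `LEMMAS` table (standing in for a real lemmatizer), and the `label_slot_type` function are hypothetical.

```python
# Toy trigger gazetteer; per:spouse triggers are from the slides,
# the per:date_of_birth entry is an assumed example.
TRIGGER_GAZETTEER = {
    "per:spouse": {"divorce", "wife", "husband"},
    "per:date_of_birth": {"born"},
}

# Stand-in for a lemmatizer (hypothetical, covers this example only).
LEMMAS = {"divorced": "divorce"}


def label_slot_type(trigger, gazetteer=TRIGGER_GAZETTEER):
    """Return every slot type whose gazetteer contains the trigger's lemma."""
    lemma = LEMMAS.get(trigger.lower(), trigger.lower())
    return sorted(slot for slot, triggers in gazetteer.items() if lemma in triggers)


print(label_slot_type("divorced"))
print(label_slot_type("became"))
```

A trigger that matches no gazetteer entry yields an empty list, which is where seed expansion with a paraphrase resource would be expected to help.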

Experiments • Datasets: • Knowledge Base Population (KBP) 2013 English Slot Filling dataset • KBP 2015 Chinese Slot Filling dataset • Evaluation Metrics: • Precision • Recall • F-score
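For reference, the three metrics can be computed as below. This sketch assumes exact-match scoring over sets of (slot type, filler) pairs and the balanced F1 variant of the F-score; the example responses are made up, and official KBP scoring has additional rules not modeled here.

```python
def precision_recall_f1(system, gold):
    """Score a set of system responses against a set of gold answers."""
    system, gold = set(system), set(gold)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up example: 3 responses, 2 of them correct, 4 gold answers.
gold = {("per:title", "Police officer"), ("per:parents", "Bonnie Hopps"),
        ("per:parents", "Stu Hopps"), ("per:alternate_names", "Carrots")}
system = {("per:title", "Police officer"), ("per:alternate_names", "Carrots"),
          ("per:title", "Mayor")}

p, r, f = precision_recall_f1(system, gold)
print(round(p, 3), round(r, 3), round(f, 3))
```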

English/Chinese Slot Filling

Error in Chinese Dependency Parsing • Coordinated sentences share one subject. • 罗京 (Luo Jing) 1983年毕业 (graduated) 于后即分配 (appointed) 来电视台，出生 (born) 于1961年，同年到 (went to) 中央电视台工作，担任 (became) 《新闻联播》播音员 • [Figure: dependency parse with node labels Query, graduated, appointed, born, went to] (H. Ji, 2015, NIST)

Open Relation Extraction • What is Open Relation Extraction (Open RE)? • Open RE is not limited to any pre-defined relation types • Extract two arguments as well as the relation type name (usually a word/phrase) from the context -> domain independent • Example: Lucille Clifton, whom he married in 1958, was born in 1936. • Existing system output (Angeli et al., 2015): (he; married in; 1958) • Expected output: (Lucille Clifton; born in; 1936), (Lucille Clifton; married in; 1958), (he; married in; 1958), (he; married; Lucille Clifton)

Slot Filling vs. Open RE

Hypotheses • [Relation Extraction] A candidate relational triple is likely to be salient if its two arguments are strongly connected in a dependency tree. • [Relation Grounding] Large-scale KBs can be leveraged to type open RE relational triples.

Overview • Our Goals • Extend relation types from 41 slot types in Slot Filling to 2,060 relation types in DBpedia • Achieve high-quality extraction • Implement an unsupervised method to name relation types by grounding relational triples to a large-scale KB.

Compute connection strength • Step 1: Construct an extended dependency graph

Compute connection strength • Step 2: Compute the random-walk based distance between two nodes. A shorter distance indicates a stronger relation. • Mean first-passage time $m(j \mid i)$: the average number of steps needed by a random walker to reach state $j$ for the first time starting from state $i$. We use the average commute time $n(i, j) = m(j \mid i) + m(i \mid j)$. • Compared to the shortest path, the value of $n(i, j)$ decreases when the number of paths connecting the two nodes increases and when the length of any path decreases. • $M = (I - Z + E Z_{dg}) D$, $Z = (I - P + e \pi^{T})^{-1}$ (Li and Zhang, 2010), where $M$ is the matrix of mean first-passage times, $P$ is the transition matrix, $e$ is a column vector of ones, and $Z_{dg}$ keeps only the diagonal of $Z$. • $Z$: fundamental matrix • $I$: identity matrix • $D$: diagonal matrix with elements $d_{vv} = 1 / \pi(v)$ • $E$: matrix containing all ones • $\pi$: a column vector of the stationary probabilities for the Markov chain
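The behavior of commute time can be checked on toy graphs. The sketch below is an illustration under assumptions: it takes an unweighted undirected graph with uniform transition probabilities, and it obtains mean first-passage times by solving the first-step linear system directly with Gauss-Jordan elimination rather than through a fundamental-matrix formula. Function names are hypothetical.

```python
def mean_first_passage(adj, target):
    """Expected steps to first reach `target` from every other node.

    Solves m(i) = 1 + sum over neighbors k != target of P[i][k] * m(k).
    """
    nodes = [v for v in adj if v != target]
    idx = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    # Augmented matrix [I - Q | 1], where Q is P restricted to non-target nodes.
    a = [[0.0] * (n + 1) for _ in range(n)]
    for v in nodes:
        row = idx[v]
        a[row][row] += 1.0
        a[row][n] = 1.0
        for u in adj[v]:
            if u != target:
                a[row][idx[u]] -= 1.0 / len(adj[v])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return {v: a[idx[v]][n] for v in nodes}


def commute_time(adj, i, j):
    """Average commute time: steps from i to j plus steps from j back to i."""
    return mean_first_passage(adj, j)[i] + mean_first_passage(adj, i)[j]


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}                # a - b - c
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}  # extra edge a - c

path_commute = commute_time(path, "a", "c")
triangle_commute = commute_time(triangle, "a", "c")
print(path_commute, triangle_commute)
```

Adding the direct edge between "a" and "c" creates a second, shorter path, and the commute time drops accordingly, which is the property the slide contrasts with plain shortest-path distance.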

Compute connection strength • Step 3: Compute the importance of nodes. • Intuition: A candidate relation triple is more likely to be salient if it involves important units of the sentence. • We apply TextRank (Mihalcea and Tarau, 2004) to compute the importance score $I(v)$ of each node $v$. We assign a higher preference to entities. • Example: Lucille Clifton, whom he married in 1958, was born in 1936.

Compute connection strength • Step 4: Combine Step 2 and Step 3 • Intuition: a relation is stronger when the distance between two informative entities is shorter. • Then we apply the maximum spanning tree algorithm to select candidate relation tuples.
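The selection step can be illustrated with Kruskal's algorithm run on descending weights. This is a sketch only: the slide does not specify which spanning tree algorithm or scoring formula is used, and the connection-strength scores below are made-up numbers attached to mentions from the Lucille Clifton example, not system output.

```python
def maximum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm over edges sorted by descending weight."""
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(weighted_edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # keep the edge only if it joins two components
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


nodes = ["Lucille Clifton", "he", "1958", "1936"]
# (score, argument 1, argument 2); scores are hypothetical.
edges = [
    (0.9, "Lucille Clifton", "1936"),
    (0.8, "he", "Lucille Clifton"),
    (0.7, "he", "1958"),
    (0.6, "Lucille Clifton", "1958"),
    (0.2, "he", "1936"),
    (0.1, "1958", "1936"),
]
tree = maximum_spanning_tree(nodes, edges)
for u, v, w in tree:
    print(u, "<->", v, w)
```

With these numbers the weakly connected pairs, such as ("he", "1936"), are left out of the tree.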

Relation grounding • Step 1: learn relation type representations based on DBpedia triples (Auer et al., 2007) and pre-trained word embeddings (GloVe; Pennington et al., 2014) • Use the relation type names themselves (2,060) • Use all KB triples (30,024,093) composed of two entities, assuming relation patterns can be represented as linear translations between two entities. • A representation is computed for each KB relation given its associated KB triples • Step 2: compare our candidate relational triple with all DBpedia relations • Step 3: ground to the most similar KB relation based on entity type constraints.
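One way to read the "linear translation" assumption is that a relation vector approximates tail minus head. The sketch below follows that reading with made-up 3-dimensional embeddings: each relation is represented by the mean difference vector over its triples, and a candidate pair is grounded to the relation with the highest cosine similarity. The averaging, the cosine comparison, the toy vectors, and all function names are assumptions, and the entity type constraints of Step 3 are omitted.

```python
import math
from collections import defaultdict


def subtract(a, b):
    return [x - y for x, y in zip(a, b)]


def mean(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up toy embeddings (real ones would be pre-trained word vectors).
EMB = {
    "Paris": [0.2, 0.1, 0.9], "France": [0.2, 1.1, 0.9],
    "Berlin": [0.5, 0.0, 0.3], "Germany": [0.5, 0.9, 0.3],
    "Alice": [0.0, 0.5, 0.5], "Bob": [1.0, 0.5, 0.5],
    "Troy": [0.3, 0.2, 0.4], "United States": [0.35, 1.0, 0.45],
}
KB_TRIPLES = [
    ("Paris", "country", "France"),
    ("Berlin", "country", "Germany"),
    ("Alice", "spouse", "Bob"),
]


def relation_vectors(triples, emb):
    """Represent each relation as the mean of (tail - head) over its triples."""
    diffs = defaultdict(list)
    for head, relation, tail in triples:
        diffs[relation].append(subtract(emb[tail], emb[head]))
    return {relation: mean(vs) for relation, vs in diffs.items()}


def ground(head, tail, rel_vecs, emb):
    """Pick the KB relation whose vector best matches tail - head."""
    candidate = subtract(emb[tail], emb[head])
    return max(rel_vecs, key=lambda rel: cosine(candidate, rel_vecs[rel]))


rel_vecs = relation_vectors(KB_TRIPLES, EMB)
grounded = ground("Troy", "United States", rel_vecs, EMB)
print(grounded)
```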

Evaluation • Data: • KB: DBpedia • Pretrained word embedding: GloVe • Mapping Open RE to Slot Filling • Data: KBP SF 2013 • We compare with state-of-the-art Open RE systems built by University of Washington (Soderland et al., 2013) and the approach (Riedel et al., 2013) which extracts relations with matrix factorization and universal schemas.

Extracting true claims from multiple sources • Problems: • different information sources may generate claims with varied trustability • various relation extraction systems may generate erroneous, conflicting, redundant, complementary, ambiguously worded, or inter-dependent claims from the same set of documents

Truth Finding Problem • Truth Finding: Determine the veracity of multiple conflicting claims from various sources and providers (i.e., systems or humans) • We require not only high-confidence claims but also trustworthy evidence to verify them -> deep understanding is needed. • Previous truth finding work assumed most claims are likely to be true. Most of it relied on the “wisdom of the crowd”. • In SF, 72.02% of responses are false. • Certain truths might only be discovered by a minority of systems or from a few sources (62% from 1 or 2 systems)

Multi-dimensional truth-finding model (MTM) • Heuristic 1: A response is more likely to be true if derived from many trustworthy sources. A source is more likely to be trustworthy if many responses derived from it are true. • Heuristic 2: A response is more likely to be true if it is extracted by many trustworthy systems. A system is more likely to be trustworthy if many responses generated by it are true.

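Credibility Initialization • Source ($s$): • a combination of publication venue and genre • initialized uniformly as $\frac{1}{n}$ ($n$ is the number of sources) • System ($t$): • Each system $t_i$ generates a set of responses $R_{t_i}$. • Similarity between systems $t_i$ and $t_j$ is $\frac{|R_{t_i} \cap R_{t_j}|}{\log |R_{t_i}| + \log |R_{t_j}|}$ (Mihalcea, 2004). • Construct a weighted undirected graph $G = \langle T, E \rangle$, where $T(G) = \{t_1, \dots, t_l\}$ and each edge $\langle t_i, t_j \rangle \in E(G)$ is weighted by the similarity of $t_i$ and $t_j$ • Apply TextRank to obtain the initial score. • Response ($r$): • Rely on deep linguistic analysis of the evidence sentences and semantic clues (introduced later).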

Credibility Propagation • Extension of Co-HITS (Deng et al., 2009) • Given the initial credibility scores of sources, systems, and responses, we aim to obtain refined credibility scores • Propagation: • Source: Consider both the initial score for the source and the propagation from connected responses. • System: Consider both the initial score for the system and the propagation from responses to systems. • Response: Each response’s score is influenced by both linked sources and systems.
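The mutual reinforcement described above can be sketched as a simple iterative update. This is a simplified illustration, not the MTM update equations: it assumes each response links to exactly one source and one system, mixes initial and propagated scores with a single weight `lam`, initializes both sources and systems uniformly, and renormalizes after every step. All names and numbers are hypothetical.

```python
def normalize(scores):
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def propagate(resp_init, source_of, system_of, lam=0.5, iters=50):
    """Refine source, system, and response credibility by mutual reinforcement."""
    sources = sorted(set(source_of.values()))
    systems = sorted(set(system_of.values()))
    src_init = {s: 1.0 / len(sources) for s in sources}  # uniform (assumed)
    sys_init = {t: 1.0 / len(systems) for t in systems}  # uniform (assumed)
    resp_init = normalize(resp_init)
    src, sys_scores, resp = dict(src_init), dict(sys_init), dict(resp_init)
    for _ in range(iters):
        # Responses push credibility to their source and system.
        src_in = {s: 0.0 for s in sources}
        sys_in = {t: 0.0 for t in systems}
        for r, score in resp.items():
            src_in[source_of[r]] += score
            sys_in[system_of[r]] += score
        src = normalize({s: (1 - lam) * src_init[s] + lam * src_in[s] for s in sources})
        sys_scores = normalize({t: (1 - lam) * sys_init[t] + lam * sys_in[t] for t in systems})
        # Sources and systems push credibility back to their responses.
        resp = normalize({
            r: (1 - lam) * resp_init[r]
            + lam * 0.5 * (src[source_of[r]] + sys_scores[system_of[r]])
            for r in resp
        })
    return src, sys_scores, resp


# Made-up initial response scores; r3 and r4 start out equal.
resp_init = {"r1": 0.9, "r2": 0.8, "r3": 0.3, "r4": 0.3}
source_of = {"r1": "s1", "r2": "s1", "r3": "s1", "r4": "s2"}
system_of = {"r1": "t1", "r2": "t1", "r3": "t1", "r4": "t2"}

src, sys_scores, resp = propagate(resp_init, source_of, system_of)
print(round(resp["r3"], 3), round(resp["r4"], 3))
```

Responses r3 and r4 start with the same score, but r3 ends up higher because it shares a source and a system with responses that look credible.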

MTM Algorithm • The following algorithm summarizes MTM. • The propagation algorithm converges; the proof is similar to that for HITS (Peserico and Pretto, 2009).

Truth Finding Overall Performance • * MAP: Mean Average Precision
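For readers unfamiliar with the metric, the sketch below computes MAP over ranked lists of correct/wrong labels. It assumes average precision is normalized by the number of correct items that appear in the ranking, which is one common convention; the rankings are made up.

```python
def average_precision(ranked_labels):
    """Mean of precision@k taken at each rank k that holds a correct item."""
    hits, total = 0, 0.0
    for rank, is_correct in enumerate(ranked_labels, start=1):
        if is_correct:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Made-up rankings: True marks a correct response, listed best-first.
rankings = [[True, False, True], [False, True]]
map_score = mean_average_precision(rankings)
print(round(map_score, 4))
```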

Truth Finding Efficiency

Enhance Individual SF Systems

Summary • The first unsupervised language-independent graph mining approach for slot filling which does not require any annotated data or external knowledge bases for supervision. • The first open relation extraction method which exploits the global structure of a dependency tree to extract relatively salient relational triples. • The first unsupervised relation grounding method to name relation types for open relation extraction based on KBs and intra-sentence context information. • Developed a novel unsupervised multi-dimensional truth-finding model incorporating signals from multiple sources, multiple systems, and multiple levels of linguistic evidence.

Related Publications • Dian Yu, Lifu Huang, and Heng Ji. 2017. Open Relation Extraction and Grounding. IJCNLP 2017. • Dian Yu and Heng Ji. 2016. Unsupervised Person Slot Filling based on Graph Mining. ACL 2016. • Dian Yu, Yulia Tyshchuk, Heng Ji, and William Wallace. 2015. Detecting Deceptive Groups Using Conversations and Network Analysis. ACL-IJCNLP 2015. • Dian Yu, Heng Ji, Sujian Li, and Chin-Yew Lin. 2015. Why Read if You can Scan: Scoping Strategy for Biographical Fact Extraction. NAACL-HLT 2015 (short). • Dian Yu, Hongzhao Huang, Taylor Cassidy, Heng Ji, Chi Wang, Shi Zhi, Jiawei Han, Clare Voss, and Malik Magdon-Ismail. 2014. The Wisdom of Minority: Unsupervised Slot Filling Validation based on Multi-dimensional Truth-Finding. COLING 2014.

Thank You! Questions?

Unsatisfying naming schema Patricia later described her relation with Gary Cooper as one of the most beautiful things that ever happened to her in her life. (Patricia __________ Gary Cooper)

Grounding Example • Patricia later described her relation with Gary Cooper as one of the most beautiful things that ever happened to her in her life. • (Patricia __________ Gary Cooper) -> influencedBy

Appendix: Hypotheses • Inter-dependencies among relation components can be leveraged to enhance both the quality and the portability of relation extraction based on unsupervised graph mining. • Evaluating the relative importance of candidate relational triples enables the discovery of rich, diverse, and qualified domain-specific relational facts. • KBs and distributed representations can be leveraged for relation typing and filtering. • Relational triples can be validated and consolidated through multi-source, multi-claim, multi-system credibility propagation and truth finding.

Appendix: Linguistic Indicators: Extended Dependency Graph Construction • [Figure: extended dependency graph. Annotated nodes: Mays {PER.Individual, NAM, Billy Mays} 【Query】; 50 {NUM} 【Per:age】; died {Death-Trigger}; home {FAC.Building-Grounds.NOM}; his {PER.Individual.PRO, Mays}. Other nodes: had, Tampa, sleep, June, 28. Edge labels: amod, nsubj, aux, prep_in, prep_at, prep_of, nn, poss, located_in]
