An Integrated Approach for Relation Extraction from Wikipedia Texts

Yulan Yan, Yutaka Matsuo, Mitsuru Ishizuka. The University of Tokyo. WWW 2009.


Presentation Transcript


  1. An Integrated Approach for Relation Extraction from Wikipedia Texts. Yulan Yan, Yutaka Matsuo, Mitsuru Ishizuka. The University of Tokyo. WWW 2009.

  2. Abstract
     • Relation extraction from Wikipedia texts
       • A novel distance function
       • A linear clustering algorithm
     • Wikipedia texts
       • High-quality texts
       • Heavily cross-linked articles
       • Sentence -> dependency tree
     • Web texts
       • Frequency information
       • Relation terms
       • Sentence -> surface pattern
     • Experiments on two different domains
       • American chief executives
       • Companies

  3. Problem definition
     • Relation extraction between
       • the article's entitled concept (ec) and
       • one of its related concepts (rc)
     • There is a salient semantic relation r between p and p’ ∈ l(p)

  4. Problem definition
     • Concept pairs
       • (Eric E. Schmidt, Google)
       • (Eric E. Schmidt, Compiler)
       • (Eric E. Schmidt, Atherton, California)
       • …
       • (Bill Gates, Microsoft)
       • …
     • Clustering
     • Evaluation

  5. Overview of the Approach
     • Text preprocessor
       • Concept pair collection
       • Sentence filtering
     • Web context collector
       • A set of ranked relational terms
       • A set of surface patterns
     • Dependency pattern modeling
       • Linguistic information
     • Linear clustering algorithm
       • Local clustering
       • Global clustering

  6. 1. Text Preprocessor - Relation Candidate Generation
     • Process Wikipedia article texts to get
       • relation candidates
       • the corresponding sentences
     • All hyperlinked concepts in the article are treated as related concepts, which may share a semantic relationship with the entitled concept
       • Concept pairs
     • Apply a linguistic parser to split the article text into sentences
       • for the dependency pattern modeling module
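
The candidate-generation step above can be sketched in a few lines. This is a minimal illustration, not the authors' preprocessor: it assumes the article is available as raw wikitext, and the names `WIKILINK` and `concept_pairs` are hypothetical. It pairs the entitled concept with every distinct hyperlinked concept and skips namespace links such as `File:`.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# only the target part before '#' or '|' is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def concept_pairs(title, wikitext):
    """Pair the entitled concept (ec) with each distinct linked concept (rc)."""
    related = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # Skip namespace links (File:, Category:, ...), self-links, and repeats.
        if ":" in target or target == title or target in related:
            continue
        related.append(target)
    return [(title, rc) for rc in related]


if __name__ == "__main__":
    text = (
        "He joined [[Google]] and lives in [[Atherton, California|Atherton]]. "
        "[[File:portrait.png]] See also [[Google#History|company history]]."
    )
    for pair in concept_pairs("Eric E. Schmidt", text):
        print(pair)
```

Sentence splitting is left out on purpose: a naive punctuation-based splitter would break on names such as "Eric E. Schmidt", which is presumably why the slides call for a linguistic parser.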

  7. 2. Web Context Collection
     • Querying the Web with a concept pair
     • Hypothesis
       • The Web contains key terms and patterns that provide clues to the relation a concept pair holds
     • Two kinds of relational information
       • a set of ranked relational terms as keywords
       • a set of surface patterns

  8. 2. Web Context Collection - Relational Term Ranking (1/2)
     • Collect relational terms as indicators for each concept pair
       • Verbs, nouns
       • Such as “CEO”, “founder”
     • Entropy-based feature ranking algorithm
       • Chen et al., 2005 (IJCNLP)
     • After the ranking
       • A relational term list Tcp is obtained, ordered by rank
       • A keyword kcp is selected: a term that appears both in the term list Tcp and in the corresponding Wikipedia sentence

  9. Entropy-based Feature Ranking
     • J. Chen, D. Ji, C.L. Tan, and Z. Niu. 2005. Unsupervised Feature Selection for Relation Extraction. In Proceedings of IJCNLP-2005.
     • Local context vectors of co-occurrences of entity pair E1 and E2
       • P = {p1, p2, …, pN}
     • The words occurring in P
       • W = {w1, w2, …, wM}
     • Goal: select a subset of important features from W
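
The slides do not spell out the entropy measure, so the following is a sketch of one common formulation of entropy-based feature ranking (a similarity-based entropy over all pairs of context vectors), offered as an assumption rather than as the exact variant used here. Each context vector in P is a tuple of M feature values; a feature from W is scored by the entropy of the data once that feature is removed, on the assumption that removing an important feature destroys cluster structure and therefore raises entropy. The function names are hypothetical.

```python
import math


def similarity_entropy(points):
    """Entropy over pairwise similarities; lower means clearer cluster structure."""
    n = len(points)
    dists = [math.dist(points[i], points[j])
             for i in range(n) for j in range(i + 1, n)]
    mean = sum(dists) / len(dists) if dists else 0.0
    if mean == 0.0:
        return 0.0  # all points coincide, nothing to measure
    alpha = -math.log(0.5) / mean  # similarity is 0.5 at the mean distance
    total = 0.0
    for d in dists:
        s = math.exp(-alpha * d)
        for p in (s, 1.0 - s):
            if 0.0 < p < 1.0:  # treat 0 * log(0) as 0
                total -= p * math.log(p)
    return total


def rank_features(vectors):
    """Return feature indices, most important first (assumed scoring rule)."""
    m = len(vectors[0])
    score = {}
    for k in range(m):
        # Drop feature k from every vector and measure the remaining disorder.
        reduced = [v[:k] + v[k + 1:] for v in vectors]
        score[k] = similarity_entropy(reduced)
    return sorted(range(m), key=lambda k: score[k], reverse=True)


if __name__ == "__main__":
    # Feature 0 separates two groups; feature 1 is small-scale noise.
    data = [(0.0, 0.3), (0.0, 0.7), (10.0, 0.4), (10.0, 0.6)]
    print(rank_features(data))
```

The pairwise computation is quadratic in N for each removed feature, so this form is only practical for small samples of context vectors.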

  10. 2. Web Context Collection - Surface Pattern Generation (2/2)
     • Content words (CWs)
       • ec (entitled concept), rc (related concept), keyword kcp
     • Function words
       • A bag of words is used to look for verbs, nouns, and coordinating conjunctions
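
As a rough illustration of how a surface pattern could be assembled from the content words and function words listed above, the sketch below replaces the two concept mentions with `ec` and `rc` slots and keeps only the keyword and a small set of function words. The word list, the function name, and the example sentence are all hypothetical; the slides do not give the actual pattern grammar, and a real system would presumably rely on part-of-speech tags rather than a fixed list.

```python
import re

# Hypothetical stand-in for the function-word inventory.
FUNCTION_WORDS = {"is", "was", "the", "a", "an", "of", "and", "in", "at", "by"}


def surface_pattern(sentence, ec, rc, keyword):
    """Reduce a sentence to ec/rc slots, the keyword, and function words."""
    # Mark both concept mentions so multi-word names become single tokens.
    text = sentence.replace(ec, " EC_SLOT ").replace(rc, " RC_SLOT ")
    kept = []
    for token in re.findall(r"[A-Za-z_]+", text):
        if token == "EC_SLOT":
            kept.append("ec")
        elif token == "RC_SLOT":
            kept.append("rc")
        elif token.lower() == keyword.lower() or token.lower() in FUNCTION_WORDS:
            kept.append(token.lower())
    return " ".join(kept)


if __name__ == "__main__":
    print(surface_pattern(
        "Eric E. Schmidt is the CEO of Google.",
        ec="Eric E. Schmidt", rc="Google", keyword="CEO",
    ))
```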

  11. 3. Dependency Pattern Modeling
     • Dependency patterns for relation clustering
       • selected sentences
       • each containing the entitled concept and one of the related concepts
       • parsed into dependency structures
     • R. Bunescu and R. Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proceedings of HLT/EMNLP-2005.
     • M. Zhang, J. Zhang, J. Su and G. Zhou. 2006. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. In Proceedings of ACL-2006.
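
The shortest-path idea from the cited Bunescu and Mooney paper can be illustrated with a breadth-first search over an undirected view of the dependency tree. This is only a sketch: the edges below are hand-written for an illustrative sentence rather than produced by a parser, the function name is hypothetical, and the slides do not say whether the system uses exactly this path as its dependency pattern.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the node sequence linking source to target, or None."""
    # Treat head -> dependent arcs as undirected so the path can go up and down.
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)

    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    # Hand-built (head, dependent) arcs for "Schmidt is the CEO of Google".
    arcs = [("CEO", "Schmidt"), ("CEO", "is"), ("CEO", "the"),
            ("CEO", "Google"), ("Google", "of")]
    print(shortest_dependency_path(arcs, "Schmidt", "Google"))
```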

  12. 4. Linear Clustering Algorithm - Distance Function & Centroid Selection (1/2)
     • All concept pairs are grouped by their keywords tcp
     • Let G = {G1, G2, …, Gn}
       • each Gi = {cpi1, cpi2, …} contains the concept pairs that share the same keyword tcp
     • A centroid ci is selected for each group Gi
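
The grouping step can be sketched as a dictionary keyed by keyword. The slides do not state how the centroid ci is chosen, so the second function below assumes a medoid: the member with the smallest total distance to the rest of its group. Both function names and the `distance` argument are hypothetical placeholders for the distance function described in the talk.

```python
from collections import defaultdict


def group_by_keyword(keyword_of):
    """Map each keyword to the list of concept pairs that carry it."""
    groups = defaultdict(list)
    for pair, keyword in keyword_of.items():
        groups[keyword].append(pair)
    return dict(groups)


def select_centroid(group, distance):
    """Pick the medoid of a group (an assumed criterion, not from the slides)."""
    return min(group, key=lambda c: sum(distance(c, other) for other in group))


if __name__ == "__main__":
    # Illustrative keywords only; they are not taken from the experiments.
    keywords = {
        ("Eric E. Schmidt", "Google"): "ceo",
        ("Bill Gates", "Microsoft"): "founder",
        ("Eric E. Schmidt", "Atherton, California"): "resident",
    }
    print(group_by_keyword(keywords))

    # Toy distance on a number line to show the medoid choice.
    position = {"a": 0, "b": 1, "c": 5}
    print(select_centroid(["a", "b", "c"],
                          lambda x, y: abs(position[x] - position[y])))
```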

  13. 4. Linear Clustering Algorithm - Distance Function & Centroid Selection (2/2)
     • Cost function cost(sp1i, sp2j)
     • B. Rosenfeld and R. Feldman. 2006. URES: an Unsupervised Web Relation Extraction System. In Proceedings of COLING/ACL-2006.

  14. 4. Linear Clustering Algorithm - Local Dependency Pattern Clustering

  15. 4. Linear Clustering Algorithm - Local Dependency Pattern Clustering

  16. 4. Linear Clustering Algorithm - Global Surface Pattern Clustering

  17. Experiments
     • Wikipedia dump on 03/12/2008
     • Two categories
       • American chief executives
         • 526 articles, 7310 concept pairs
         • 1/3, 1/3 for Dl and Dg, 18 groups
       • Companies
         • 434 articles, 4935 concept pairs
         • 1/3, 1/3 for Dl and Dg, 28 groups
     • Compared with
       • B. Rosenfeld and R. Feldman. 2007. Clustering for Unsupervised Relation Identification. In Proceedings of CIKM-2007.
       • surface feature

  18. Experiments

  19. Experiments

  20. Conclusions
     • A novel distance function
     • A linear clustering algorithm
     • Combination of two kinds of patterns
       • Dependency patterns
       • Surface patterns
     • J. Chen, D. Ji, C.L. Tan, and Z. Niu. 2005. Unsupervised Feature Selection for Relation Extraction. In Proceedings of IJCNLP-2005.
     • R. Bunescu and R. Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proceedings of HLT/EMNLP-2005.
     • M. Zhang, J. Zhang, J. Su and G. Zhou. 2006. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. In Proceedings of ACL-2006.
     • B. Rosenfeld and R. Feldman. 2006. URES: an Unsupervised Web Relation Extraction System. In Proceedings of COLING/ACL-2006.
