
A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch

Roberto Navigli, Paola Velardi and Stefano Faralli, {navigli,velardi,faralli}@di.uniroma1.it, http://lcl.uniroma1.it. ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI).


Presentation Transcript


  1. A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch. Roberto Navigli, Paola Velardi and Stefano Faralli, {navigli,velardi,faralli}@di.uniroma1.it, http://lcl.uniroma1.it. ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI).

  2. Motivations. We present a graph-based approach for learning a lexical taxonomy automatically from a domain corpus and the Web. Unlike other approaches, we learn both concepts and relations entirely from scratch, in 3 steps: 1) term extraction; 2) definition and hypernym extraction; 3) graph pruning.

  3. Taxonomy Learning Workflow. [Diagram] Inputs: Domain Corpus, Web glossaries & documents, Upper terms. Stages: Terminology extraction, Definition & hypernym extraction, Domain filtering, Graph pruning. Intermediate and final outputs: Domain terms, Hypernym graph, Induced taxonomy.

  4. Taxonomy Learning Workflow (diagram repeated).

  5. Terminology Extraction. Domain Corpus → Domain terms (e.g., maximum likelihood, flow network, mesh generation, hash function, pattern recognition, information processing). Tool: http://lcl.uniroma1.it/termextractor

  6. Taxonomy Learning Workflow (diagram repeated).

  7. Definition & Hypernym Extraction + Domain Filtering. Inputs: Domain Corpus, Web glossaries & documents, Domain terms. For the term "flow network", definition extraction (WCL) returns candidate definitions, which are then filtered by domain:
     • "In graph theory, a flow network is a directed graph." (domain)
     • "Global Cash Flow Network is a business opportunity to make money online." (non domain)
     • "A flow network is a network with two distinguished vertices." (domain)

  8. Definition & Hypernym Extraction + Domain Filtering. For the term "flow network", the definitions kept after domain filtering are:
     • "In graph theory, a flow network is a directed graph."
     • "A flow network is a network with two distinguished vertices."
     Hypernym extraction yields "directed graph" and "network", so the graph gains the edges flow network → directed graph and flow network → network.

  9. Definition & Hypernym Extraction + Domain Filtering. The process is repeated on the terms from the previous iteration. For "directed graph", definition extraction (WCL) returns:
     • "A directed graph is a graph where ..."
     • "A directed graph is a data structure ..."
     Hypernym extraction yields "graph" and "data structure", which are added above "directed graph", alongside the existing nodes "network" and "flow network".
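The iteration shown on slides 7 to 9 can be sketched as a loop that feeds newly found hypernyms back in as terms. This is a minimal sketch, not the authors' implementation: `TOY_HYPERNYMS` is a hand-filled stand-in for the WCL-based extractor, and both `grow_hypernym_graph` and the rule "do not expand upper terms" are assumptions made for illustration.

```python
# Hypothetical stand-in for definition + hypernym extraction, filled with
# the examples from slides 8 and 9.
TOY_HYPERNYMS = {
    "flow network": ["directed graph", "network"],
    "directed graph": ["graph", "data structure"],
}

def grow_hypernym_graph(initial_terms, upper_terms, extract, max_iterations=5):
    """Return a set of (term, hypernym) edges grown iteratively."""
    edges = set()
    seen = set(initial_terms)
    frontier = set(initial_terms)
    for _ in range(max_iterations):
        next_frontier = set()
        for term in sorted(frontier):
            for hypernym in extract(term):
                edges.add((term, hypernym))
                # Assumption: upper terms are treated as roots and not expanded.
                if hypernym not in seen and hypernym not in upper_terms:
                    next_frontier.add(hypernym)
                seen.add(hypernym)
        if not next_frontier:
            break  # nothing new to look up
        frontier = next_frontier
    return edges

edges = grow_hypernym_graph(
    initial_terms=["flow network"],
    upper_terms={"data structure"},  # hypothetical upper term
    extract=lambda term: TOY_HYPERNYMS.get(term, []),
)
for term, hypernym in sorted(edges):
    print(f"{term} -> {hypernym}")
```

Edges here point from a term to its hypernym; a real run would replace the lookup table with definition extraction over Web glossaries and documents.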

  10. Hypernym Extraction Algorithm (1)
     • Large training set with many uncommon patterns, e.g., “X is a ADJ term that refers to a kind of Y”.
     • Each sentence is annotated with 4 fields: definiendum (D); definitor (V), containing the verbal pattern; definiens (H), containing the hypernym; and the rest of the sentence (R).
     • Example: [An acid (often represented by the generic formula HA)]D [is traditionally considered]V [any chemical compound]H [that, when dissolved in water, gives a solution with a hydrogen ion activity greater than in pure water]R
     • The algorithm builds a set of word lattices from the training set. Independent lattices are created for each of the 3 basic fields (D, V, H).

  11. Hypernym Extraction Algorithm (2). Lattice learning consists of three steps:
     1. Star patterns: each sentence in the training set is pre-processed and each field is generalized to a star pattern. Example: “[In arts, a chiaroscuro]D [is]V [a monochrome picture]H.” gives D=“In *, a <TARGET>”, V=“is”, H=“a * <HYPER>”.
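The star-pattern generalization can be sketched as follows. This is a rough sketch under stated assumptions: the `FREQUENT` word list is hypothetical, the fields are assumed to arrive already tokenized with the target term and hypernym replaced by `<TARGET>` and `<HYPER>`, and collapsing a run of infrequent words into a single `*` is an assumption chosen so that multi-word contexts such as "computer science" map to the same pattern as "arts".

```python
# Hypothetical list of frequent words that are kept as-is (an assumption).
FREQUENT = {"in", "a", "an", "the", "is", "of", ","}
PLACEHOLDERS = {"<TARGET>", "<HYPER>"}

def star_pattern(tokens, frequent=FREQUENT):
    """Replace infrequent tokens with '*', merging consecutive stars."""
    pattern = []
    for token in tokens:
        keep = token in PLACEHOLDERS or token.lower() in frequent
        symbol = token if keep else "*"
        if symbol == "*" and pattern and pattern[-1] == "*":
            continue  # assumption: a run of infrequent words becomes one star
        pattern.append(symbol)
    return " ".join(pattern)

# Fields of "In arts, a chiaroscuro is a monochrome picture."
fields = {
    "D": "In arts , a <TARGET>".split(),
    "V": ["is"],
    "H": "a monochrome <HYPER>".split(),
}
patterns = {name: star_pattern(tokens) for name, tokens in fields.items()}
print(patterns)
```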

  12. Hypernym Extraction Algorithm (3)
     2. Clustering: for each field, the training sentences are then clustered according to the star patterns they belong to. For example, the following sentences share the patterns D: “In * , a <TARGET>”, V: “is”, H: “a * <HYPER>”:
     • In arts, a chiaroscuro is a monochrome picture.
     • In mathematics, a graph is a data structure that consists of . . .
     • In computer science, a pixel is a dot that is part of a computer image.
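The clustering step amounts to grouping sentences by their star pattern. The sketch below assumes the definiendum (D) pattern of each sentence has already been computed; `cluster_by_pattern` is a hypothetical helper name, and the fourth sentence and its pattern are added only to show a second cluster.

```python
from collections import defaultdict

def cluster_by_pattern(tagged_sentences):
    """Group sentences by star pattern; input is (pattern, sentence) pairs."""
    clusters = defaultdict(list)
    for pattern, sentence in tagged_sentences:
        clusters[pattern].append(sentence)
    return dict(clusters)

# (D-field star pattern, sentence) pairs; patterns are assumed precomputed.
tagged = [
    ("In * , a <TARGET>", "In arts, a chiaroscuro is a monochrome picture."),
    ("In * , a <TARGET>", "In mathematics, a graph is a data structure that consists of ..."),
    ("In * , a <TARGET>", "In computer science, a pixel is a dot that is part of a computer image."),
    ("A <TARGET>", "A flow network is a network with two distinguished vertices."),
]
clusters = cluster_by_pattern(tagged)
for pattern, sentences in clusters.items():
    print(pattern, len(sentences))
```

Each resulting cluster would then be the input to lattice construction for that field.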

  13. Hypernym Extraction Algorithm (4)
     3. Word-Class Lattice construction: for each sentence cluster, a WCL is created by means of a greedy alignment algorithm.

  14. Performance in definition extraction. [Results for Wikipedia and the UKWac corpus; the tables are not preserved in the transcript.] The approach outperforms existing methods for definition extraction.

  15. Precision in hypernym extraction. [Results for Wikipedia and UKWac; the tables are not preserved in the transcript.] Pattern-based methods achieve much lower recall: 62 vs. 383 hypernyms extracted from UKWac.

  16. The iterative growth of the hypernym graph

  17. The iterative growth of the hypernym graph

  18. Taxonomy Learning Workflow (diagram repeated).

  19. Graph Pruning. Given the hypernym graph: 1) We disconnect false roots and false leaves. 2) We weight edges and nodes with a novel weighting algorithm.

  20. Graph Pruning. Given the hypernym graph: 1) We disconnect false roots and false leaves. 2) We weight edges and nodes with a novel weighting algorithm. [Figure: example graph with node and edge weights.]

  21. Graph Pruning. Given the hypernym graph: 1) We disconnect false roots and false leaves. 2) We weight edges and nodes with a novel weighting algorithm. 3) We apply the Chu-Liu/Edmonds algorithm to obtain an optimal branching. [Figure: example graph with node and edge weights.]

  22. Graph Pruning. Given the hypernym graph: 1) We disconnect false roots and false leaves. 2) We weight edges and nodes with a novel weighting algorithm. 3) We apply the Chu-Liu/Edmonds algorithm to obtain an optimal branching. As a result we obtain a tree-like taxonomy.
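Step 3 can be illustrated with a compact recursive version of the Chu-Liu/Edmonds algorithm. This is a simplified sketch, not the authors' code: it computes a maximum-weight spanning arborescence from a single root rather than a general optimal branching, it assumes every node is reachable from that root, and the edge weights are invented for illustration (the slides do not give the weighting formula). Edges point from hypernym to hyponym.

```python
def find_cycle(parent, root):
    """Return the nodes of a cycle in a child -> parent map, or None."""
    for start in parent:
        path, node = [], start
        while node != root and node in parent and node not in path:
            path.append(node)
            node = parent[node]
        if node != root and node in path:
            return path[path.index(node):]
    return None

def max_arborescence(edges, root):
    """edges: {(u, v): weight} for u -> v. Returns {child: parent}."""
    # Pick the heaviest incoming edge for every node except the root.
    best = {}
    for (u, v), w in edges.items():
        if v == root or u == v:
            continue
        if v not in best or w > edges[(best[v], v)]:
            best[v] = u
    cycle = find_cycle(best, root)
    if cycle is None:
        return best
    # Contract the cycle into one fresh node and re-weight entering edges.
    in_cycle, c = set(cycle), ("cycle", tuple(cycle))
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            key, nw = (u, c), w - edges[(best[v], v)]
        elif u in in_cycle:
            key, nw = (c, v), w
        else:
            key, nw = (u, v), w
        if key not in new_edges or nw > new_edges[key]:
            new_edges[key], origin[key] = nw, (u, v)
    # Solve the smaller problem, then map its edges back to the originals.
    result = {}
    for child, par in max_arborescence(new_edges, root).items():
        u, v = origin[(par, child)]
        result[v] = u
    for v in cycle:
        result.setdefault(v, best[v])  # keep the cycle edges that were not broken
    return result

# Hypothetical weights on the example graph from slides 8 and 9.
weighted = {
    ("data structure", "graph"): 5,
    ("graph", "directed graph"): 8,
    ("graph", "network"): 3,
    ("directed graph", "flow network"): 7,
    ("network", "flow network"): 2,
}
taxonomy = max_arborescence(weighted, root="data structure")
print(taxonomy)
```

Each node ends up with exactly one parent, which is what turns the noisy hypernym graph into a tree-like structure; with several upper terms, one option is to attach them to a single artificial root before running the procedure.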

  23. From the Noisy Hypernym Graph...

  24. Application to ACL Taxonomy
     • ACL Anthology from 1979 to 2010 (4,176 papers).
     • 29 upper terms from WordNet’s abstraction.
     • 10,000 terms extracted, first 2,000 inspected, 1,006 selected (eliminated, e.g., word pair, input sentence, human judgement).
     • 5 iterations, 1,329 definitions, 1,031 nodes, 1,274 edges.
     • After pruning: 936 nodes, 935 edges.

  25. Evaluation (5 annotators). [Results table not preserved in the transcript.] A second application of the algorithm, starting from the IJCAI collection, gave similar results.

  26. Evaluation: WordNet reconstruction
     • Same evaluation strategy as in Kozareva & Hovy (EMNLP 2010).
     • Only nodes that appear both in WordNet and in the acquired taxonomy are considered in the evaluation (as in K&H).

  27. Future work
     • From “strict” taxonomy to lattice.
     • An in-house implementation of Google “define” to overcome search limitations (there is no API for Google define).
     • Extension to other languages.

  28. http://lcl.uniroma1.it April 2011

  29. Legend: Initial terminology; Upper terms; Hypernyms from iteration I; Hypernyms from iteration II; Hypernyms from iteration III; Hypernyms from iteration IV; Hypernyms from iteration V.
