
The Topology of WordNet: some metrics

Ann Devitt and Carl Vogel, Computational Linguistics Group, Trinity College Dublin, Ireland. Outline: Introduction; Measures; WordNet “sub-hierarchies”; Multiple inheritance; Branching Factor; Depth versus Height; Cluster coefficients; Specificity pilot study.


Presentation Transcript


  1. The Topology of WordNet: some metrics. Ann Devitt and Carl Vogel, Computational Linguistics Group, Trinity College Dublin, Ireland

  2. Introduction • Measures • WordNet “sub-hierarchies” • Multiple inheritance • Branching Factor • Depth versus Height • Cluster coefficients • Specificity pilot study Ann Devitt, TCD

  3. Terminology • WordNet as a directed acyclic graph • “Node” and “synset” are used interchangeably

  4. Dimensional distribution

  5. Overlap between hierarchies • 2072 synsets belong to more than 1 top hierarchy • 35 synsets belong to more than 2 top hierarchies

  6. Some overlap examples • Abstraction and Event: 948 synsets, e.g. group action • Entity and Group: 250 nodes, e.g. weaponry

  7. Multiple inheritance • 2.6% of nodes • Normal distribution throughout depth • Significantly different in different taxonomies: χ²(8, N = 75180) = 324.27, p ≤ 0.001
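
A minimal sketch of how a χ² statistic of this kind is computed from a contingency table. The table layout is an assumption: the slide does not say how the counts were arranged, although 8 degrees of freedom would be consistent with nine taxonomies crossed with a two-way split (one parent versus several), since (9 − 1)(2 − 1) = 8. The counts in `toy` are invented for illustration and are not WordNet figures.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: rows = taxonomies, columns = (one parent, several parents).
toy = [[90, 10], [80, 20], [95, 5]]
stat, dof = chi_square(toy)
print(f"chi2({dof}) = {stat:.2f}")
```

The statistic is then compared against the χ² distribution with `dof` degrees of freedom to obtain the p-value.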

  8. Specificity examples • Parents = 1, depth < 3: damnation, office • Parents = 1, depth > 8: beagle, palomino • Parents > 1, depth < 3: person, artefact • Parents > 1, depth > 8: sea bass, self-condemnation, bombardon

  9. Branching Factor • Number of children + 1 • Including leaf nodes: range 1–573, average 2.023 • Excluding leaf nodes: average 5.793 • 97% less than 20
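
The definition above (number of children + 1) can be sketched on a toy hierarchy. The graph below is hypothetical and far smaller than WordNet, and applying the "+ 1" to the average that excludes leaf nodes is an assumption, since the slide does not say so explicitly.

```python
from statistics import mean

# Hypothetical toy hierarchy (not real WordNet data): node -> list of children.
children = {
    "entity": ["animal", "artefact"],
    "animal": ["dog", "horse"],
    "artefact": [],
    "dog": ["beagle"],
    "horse": ["palomino"],
    "beagle": [],
    "palomino": [],
}

def branching_factor(node):
    """Branching factor as defined on the slide: number of children + 1."""
    return len(children[node]) + 1

# Average over every node, leaves included (a leaf has factor 1).
all_factors = [branching_factor(n) for n in children]
# Average over internal nodes only (nodes with at least one child).
internal_factors = [branching_factor(n) for n in children if children[n]]

avg_all = mean(all_factors)
avg_internal = mean(internal_factors)
print(avg_all, avg_internal)
```

Because leaves contribute the minimum value of 1, a hierarchy with many leaf nodes pulls the overall average down well below the average for internal nodes, which is the pattern the slide reports.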

  10. Branching factor • Overall low branching factor • Same distribution in all sub-hierarchies • Large number of nodes in total • Greater overall depth in paths • Not a shallow structure, despite 55,000 leaf nodes

  11. Depth vs Height • Depth: maximum = 18, normal distribution • Height: maximum = 5, Zipfian distribution; 93.6% of nodes are 1 or 2 nodes from a leaf node
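
A minimal sketch of the two distance measures on a hypothetical toy graph. Two choices here are assumptions rather than the authors' stated method: depth is taken as the shortest path from the root, and height as the shortest path down to the nearest leaf, both counted in edges so that the root has depth 0 and a leaf has height 0. The slide's own figures may use a different counting convention.

```python
from collections import deque

# Hypothetical toy DAG: node -> list of children. "pet_dog" has two parents.
children = {
    "entity": ["animal", "artefact"],
    "animal": ["dog", "pet"],
    "artefact": [],
    "dog": ["pet_dog", "beagle"],
    "pet": ["pet_dog"],
    "pet_dog": [],
    "beagle": [],
}

def depths(children, root):
    """Shortest distance in edges from the root to every reachable node (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in dist:
                dist[child] = dist[node] + 1
                queue.append(child)
    return dist

def heights(children):
    """Shortest distance in edges from each node down to its nearest leaf."""
    memo = {}

    def height(node):
        if node not in memo:
            kids = children[node]
            memo[node] = 0 if not kids else 1 + min(height(k) for k in kids)
        return memo[node]

    return {node: height(node) for node in children}

depth = depths(children, "entity")
height = heights(children)
print(depth)
print(height)
```

In this toy graph the root has a leaf child, so its height is small even though the deepest node is three edges away. The same effect would explain how a hierarchy can have a large maximum depth and a small maximum height.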

  12. Depth vs Height • Reported distributions are the same across the different sub-hierarchies • Depth is a more informative measure

  13. Clustering coefficient • Measure of graph connectivity • Ratio of the number of connections between nodes to the possible number of connections: $\dfrac{2 \times \text{number of connections between nodes}}{\sum_i k_i (k_i - 1)}$
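
A minimal sketch of a per-node clustering coefficient in the sense of Watts and Strogatz: for a node with k neighbours, the number of links among those neighbours divided by the k(k − 1)/2 links that could exist. Treating the hierarchy's edges as undirected is an assumption made here, and the toy edge list is hypothetical rather than WordNet data.

```python
from itertools import combinations

# Hypothetical toy graph given as parent -> child edges (not WordNet data).
edges = [
    ("entity", "animal"),
    ("entity", "dog"),      # shortcut edge: "dog" also attaches directly to "entity"
    ("animal", "dog"),
    ("animal", "horse"),
    ("dog", "beagle"),
]

def undirected_neighbours(edges):
    """Map each node to the set of nodes it shares an edge with."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs

def clustering_coefficient(nbrs, node):
    """Links among the node's neighbours divided by the possible k(k-1)/2 links."""
    k = len(nbrs[node])
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to connect
    links = sum(1 for a, b in combinations(sorted(nbrs[node]), 2) if b in nbrs[a])
    return 2 * links / (k * (k - 1))

nbrs = undirected_neighbours(edges)
coefficients = {node: clustering_coefficient(nbrs, node) for node in nbrs}
print(coefficients)
```

In a strict tree every node scores 0, because a node's parent and children are never directly linked to each other. Only the shortcut edge creates a triangle in this example, so non-zero values arise only around it.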

  14. Cluster coefficients • First-order measure • Not useful for WordNet • Only 62 nodes have a coefficient > 0 • WordNet does not form clusters readily

  15. Cluster coefficients • Second-order measure • Average 0.337 • Normal distribution • May form clusters of wider diameter

  16. Pilot Study Aims • Do people have a notion of generality/specificity for concepts? • Do people agree on what is more/less general/specific? • What features of WordNet do these judgments correlate with?

  17. Sample ranking task I • Axis, axis of rotation – (the center around which something rotates) • River boat – (a boat used on rivers or to ply a river) • Remains – (any object that is left unused or still extant; “I threw out the remains of my dinner”)

  18. Sample ranking task II • rational motive - (a motive that can be defended by reasoning or logical argument) • disapproval - (the act of disapproving or condemning) • harmony, concord, concordance - (agreement of opinions)

  19. Do people agree on what is more/less general/specific? YES • Cochran's Q statistic (Cochran 1950) • H0: any agreement between respondents is due to chance • Overall, for 11 respondents: Cochran's Q = 165.859, 44 degrees of freedom, Asymp. Sig. = .000
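
A minimal sketch of how Cochran's Q is computed from a table of 0/1 outcomes, using the standard formula Q = (k − 1)(k ΣC² − N²) / (kN − ΣR²), where C are column totals, R are row totals, N is the grand total and k is the number of columns. How the ranking judgments were coded into binary outcomes is not stated on the slide, so the table below is invented for illustration. Under this formula the degrees of freedom are k − 1, so 44 degrees of freedom would correspond to k = 45.

```python
def cochrans_q(rows):
    """Cochran's Q for a blocks x treatments table of 0/1 outcomes.

    rows[i][j] is 1 if block i scored a "success" on treatment j, else 0.
    Returns (Q, degrees_of_freedom). Q is undefined (division by zero)
    when every row is all zeros or all ones.
    """
    k = len(rows[0])                                  # number of treatments
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(row) for row in rows]
    n = sum(row_totals)                               # total number of successes
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1

# Hypothetical 0/1 responses, invented for illustration only.
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
q, dof = cochrans_q(toy)
print(f"Q = {q:.3f}, df = {dof}")
```

Q is referred to a χ² distribution with k − 1 degrees of freedom, which is where a reported asymptotic significance comes from.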

  20. What WN features correlate? • Depth: less deep = more general • Children: inconclusive • Sisters: fewer sisters = more general • Sub-hierarchy: did not seem to affect judgments, but did increase the difficulty of the task

  21. Conclusion • WordNet metrics: inheritance (sub-hierarchy and parentage), branching factor, distance (depth and height), clustering • Pilot study: suggests where to go with a larger study

  22. Bibliography • W. G. Cochran: The comparison of percentages in matched samples. Biometrika 37, 256–266 (1950) • David S. Touretzky: The Mathematics of Inheritance Systems. Los Altos, CA: Morgan Kaufmann (1986) • D. J. Watts and S. H. Strogatz: Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998)

  23. Multiple Inheritance vs Depth
