The Topology of WordNet: some metrics. Ann Devitt and Carl Vogel, Computational Linguistics Group, Trinity College Dublin, Ireland
Introduction • Measures • WordNet “sub-hierarchies” • Multiple inheritance • Branching Factor • Depth versus Height • Cluster coefficients • Specificity pilot study Ann Devitt, TCD
Terminology • WordNet is treated as a directed acyclic graph • "Node" and "synset" are used interchangeably
Dimensional distribution
Overlap between hierarchies • 2072 synsets belong to more than 1 top hierarchy • 35 synsets belong to more than 2 top hierarchies
Some overlap examples • Abstraction and Event: 948 synsets, e.g. group action • Entity and Group: 250 nodes, e.g. weaponry
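A minimal sketch of how such overlap counts could be obtained: for each node, collect the root nodes reachable by following parent links, then keep the nodes that reach more than one root. The graph below is hypothetical toy data, not real WordNet content, and `top_hierarchies` is an illustrative name.

```python
# Hypothetical toy hypernym graph: node -> list of parents (not real WordNet data).
PARENTS = {
    "entity": [],
    "group": [],
    "artefact": ["entity"],
    "collection": ["group"],
    "weaponry": ["artefact", "collection"],
    "beagle": ["entity"],
}


def top_hierarchies(node, parents):
    """Return the set of roots (nodes without parents) reachable upward from node."""
    if not parents[node]:
        return {node}
    roots = set()
    for parent in parents[node]:
        roots |= top_hierarchies(parent, parents)
    return roots


# Map every node to the top hierarchies it belongs to.
overlap = {node: top_hierarchies(node, PARENTS) for node in PARENTS}

# Nodes that sit in more than one top hierarchy.
multi = sorted(node for node, roots in overlap.items() if len(roots) > 1)
print(multi)
```

In this toy graph only "weaponry" reaches two roots, which mirrors the Entity and Group example.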
Multiple inheritance • 2.6% of nodes have more than one parent • Normally distributed across depth • Significantly different in different taxonomies: χ²(8, N = 75180) = 324.27, p ≤ 0.001
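The χ² figure above is a test of independence between taxonomy and multiple inheritance. The sketch below computes Pearson's χ² for a contingency table in plain Python. The counts are invented for illustration, and the table layout (one row per taxonomy, columns for single-parent and multi-parent nodes) is an assumption; 8 degrees of freedom would be consistent with a 9 × 2 table, but the slides do not show the table itself.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: rows are taxonomies, columns are (one parent, several parents).
toy_table = [
    [950, 50],
    [480, 20],
    [290, 10],
]
stat, df = chi_square(toy_table)
print(round(stat, 3), df)
```

The statistic is then compared against the χ² distribution with `df` degrees of freedom to obtain the p-value.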
Specificity examples • Parents = 1, depth < 3: damnation, office • Parents = 1, depth > 8: beagle, palomino • Parents > 1, depth < 3: person, artefact • Parents > 1, depth > 8: sea bass, self-condemnation, bombardon
Branching Factor • Defined as number of children + 1 • Including leaf nodes: range 1 to 573, average 2.023 • Excluding leaf nodes: average 5.793, 97% less than 20
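As a rough illustration of the definition above (children + 1), the sketch below computes the branching factor for every node of a small hypothetical hierarchy and averages it with and without leaf nodes. The data and variable names are illustrative only.

```python
# Hypothetical toy hierarchy: node -> list of children (not real WordNet data).
CHILDREN = {
    "entity": ["animal", "artefact"],
    "animal": ["dog", "horse", "fish"],
    "artefact": [],
    "dog": [],
    "horse": [],
    "fish": [],
}

# Branching factor as defined on the slide: number of children + 1.
branching = {node: len(kids) + 1 for node, kids in CHILDREN.items()}

# Average over all nodes, leaves included (a leaf has branching factor 1).
avg_all = sum(branching.values()) / len(branching)

# Average over internal nodes only (nodes with at least one child).
internal = [bf for node, bf in branching.items() if CHILDREN[node]]
avg_internal = sum(internal) / len(internal)

print(avg_all, avg_internal)
```

Because leaves dominate the node count, the average including leaves is pulled towards 1, which matches the gap between the two averages reported on the slide.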
Branching factor • Overall low branching factor • Same distribution in all sub-hierarchies • Large number of nodes in total • Greater overall depth in paths • Not a shallow structure, despite 55,000 leaf nodes
Depth vs Height • Depth: maximum = 18, normal distribution • Height: maximum = 5, 93.6% of nodes are 1 or 2 nodes from a leaf node, Zipfian distribution
Depth vs Height • The reported distributions are the same across the different sub-hierarchies • Depth is a more informative measure
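The two distance measures can be sketched as follows. The slides do not say how depth and height are resolved when a node has several parents or children, so this example assumes the shortest path in both directions, counts edges, and gives roots depth 0 and leaves height 0. The graph is hypothetical toy data.

```python
from functools import lru_cache

# Hypothetical toy hypernym graph: node -> list of parents (not real WordNet data).
PARENTS = {
    "entity": [],
    "animal": ["entity"],
    "dog": ["animal"],
    "beagle": ["dog"],
    "artefact": ["entity"],
}

# Invert the parent map to get node -> list of children.
CHILDREN = {node: [] for node in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


@lru_cache(maxsize=None)
def depth(node):
    """Edges on the shortest path up to a root (assumption: shortest, not longest)."""
    parents = PARENTS[node]
    return 0 if not parents else 1 + min(depth(p) for p in parents)


@lru_cache(maxsize=None)
def height(node):
    """Edges on the shortest path down to a leaf (assumption: nearest leaf)."""
    children = CHILDREN[node]
    return 0 if not children else 1 + min(height(c) for c in children)


for node in PARENTS:
    print(node, depth(node), height(node))
```

Under these assumptions a node with even one leaf child has height 1, which suggests why height values bunch up near the bottom of the scale while depth spreads out.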
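Clustering coefficient • Measure of graph connectivity • Ratio of the number of connections between a node's neighbours to the possible number of such connections: $\frac{2 \Sigma_i}{k_i (k_i - 1)}$, where $k_i$ is the number of neighbours of node $i$ and $\Sigma_i$ is the number of connections between those neighbours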
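A minimal sketch of the first-order (local) clustering coefficient described above, treating the graph as undirected. The adjacency data is a hypothetical toy graph, and returning 0 for nodes with fewer than two neighbours is a convention assumed here, not something the slides state.

```python
from itertools import combinations

# Hypothetical undirected toy graph as adjacency sets (not real WordNet data).
NEIGHBOURS = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}


def clustering_coefficient(node, neighbours):
    """Links among the node's neighbours divided by the k(k-1)/2 possible links."""
    k = len(neighbours[node])
    if k < 2:
        return 0.0  # assumed convention: undefined cases count as 0
    links = sum(
        1
        for u, v in combinations(sorted(neighbours[node]), 2)
        if v in neighbours[u]
    )
    return 2 * links / (k * (k - 1))


coefficients = {n: clustering_coefficient(n, NEIGHBOURS) for n in NEIGHBOURS}
print(coefficients)
```

In a strict tree a node's parent and children are never linked to each other, so every coefficient would be 0; only extra links such as those created by multiple inheritance can raise it.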
Cluster coefficients • First-order measure • Not useful for WordNet • Only 62 nodes have a coefficient > 0 • Does not form clusters readily
Cluster coefficients • Second-order measure • Average 0.337 • Normal distribution • May form clusters of wider diameter
Pilot Study Aims • Do people have a notion of generality/specificity for concepts? • Do people agree on what is more/less general/specific? • What features of WordNet do these judgments correlate with?
Sample ranking task I • Axis, axis of rotation - (the center around which something rotates) • River boat - (a boat used on rivers or to ply a river) • Remains - (any object that is left unused or still extant; “I threw out the remains of my dinner”)
Sample ranking task II • rational motive - (a motive that can be defended by reasoning or logical argument) • disapproval - (the act of disapproving or condemning) • harmony, concord, concordance - (agreement of opinions)
Do people agree on what is more/less general/specific? YES • Cochran's Q statistic (Cochran 1950) • H0: any agreement between respondents is due to chance • Overall, for 11 respondents: Cochran's Q = 165.859, 44 degrees of freedom, asymptotic significance p < 0.001
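For readers unfamiliar with the test, the sketch below computes Cochran's Q for a matrix of binary responses using the standard formula Q = (k - 1)(k ΣC_j² - N²) / (kN - ΣR_i²), where k is the number of columns, C_j the column totals, R_i the row totals and N the grand total. The data is invented, and how the pilot study arranged its responses into rows and columns is not stated in the slides.

```python
def cochran_q(matrix):
    """Cochran's Q and degrees of freedom for a rows-by-columns matrix of 0/1 values."""
    k = len(matrix[0])                           # number of related samples (columns)
    col_totals = [sum(col) for col in zip(*matrix)]
    row_totals = [sum(row) for row in matrix]
    n = sum(row_totals)                          # grand total of 1s
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 0s or all 1s")
    return numerator / denominator, k - 1


# Hypothetical binary judgments: 4 rows (blocks) by 3 columns (related samples).
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
q, df = cochran_q(toy)
print(round(q, 3), df)
```

Q is compared against the χ² distribution with k - 1 degrees of freedom, which is how a value such as 165.859 on 44 degrees of freedom yields a very small p-value.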
What WordNet features correlate? • Depth: less deep = more general • Children: inconclusive • Sisters: fewer sisters = more general • Sub-hierarchy: did not seem to affect judgments, but did increase the difficulty of the task
Conclusion • WordNet metrics • Inheritance: Sub-hierarchy and parentage • Branching Factor • Distance: depth and height • Clustering • Pilot study • Suggests where to go with a larger study
Bibliography • W. G. Cochran: The comparison of percentages in matched samples. Biometrika, 37:256-266, 1950 • David S. Touretzky: The Mathematics of Inheritance Systems. Los Altos, CA: Morgan Kaufmann, 1986 • D. J. Watts and S. H. Strogatz: Collective dynamics of 'small-world' networks. Nature, 393:440-442, 1998
Multiple Inheritance vs Depth