The Large-Scale Structure of Semantic Networks

A. Tuba Baykara, Cognitive Science, 2002700187

Overview
1) Introduction
2) Analysis of 3 semantic networks and their statistical properties
   - Associative Network
   - WordNet
   - Roget’s Thesaurus
3) The Growing Network Model proposed by the authors
   - Undirected Growing Network Model
   - Directed Growing Network Model
4) Psychological implications of the findings
5) General discussion and conclusions

1) Introduction
• Semantic network: a network in which concepts are represented as hierarchies of interconnected nodes, which are linked to characteristic attributes.
• Understanding the structure of such networks is important because it reflects the organization of meaning and language.
• Statistical similarities between networks are important because of their implications for language evolution and/or acquisition.
• Would a model network have the same statistical properties? → Growing Network Model

1) Introduction: Predictions related to the model
1. It would have the same characteristics:
   * The degree distribution would follow a power law → some concepts would have far more connections than others
   * Adding new concepts would not change this structure → scale-free (vs. small-world!)
2. Previously added (early acquired) concepts would have higher connectivity than later added (later acquired) concepts.

1) Introduction: Terminology
• Graph, network
• Node, edge (undirected link), arc (directed link), degree
• Avg. shortest path (L), diameter (D), clustering coefficient (C), degree distribution P(k)
• Small-world network, random graph
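The terms above can be made concrete on a toy graph. The following is a minimal sketch, assuming an undirected, connected graph stored as an adjacency dict; the four-node graph and the helper names are hypothetical and not taken from the presentation.

```python
from collections import deque
from itertools import combinations

# Toy undirected graph (hypothetical data): a triangle a-b-c with d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(graph):
    """Return (L, D): mean and maximum shortest path over all connected pairs."""
    lengths = []
    for s in graph:
        d = bfs_distances(graph, s)
        lengths.extend(x for t, x in d.items() if t != s)
    return sum(lengths) / len(lengths), max(lengths)

def clustering(graph):
    """Mean over nodes of the fraction of neighbor pairs that are linked."""
    coeffs = []
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # convention used here for nodes with < 2 neighbors
            continue
        linked = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        coeffs.append(linked / (k * (k - 1) / 2))
    return sum(coeffs) / len(coeffs)

L, D = path_stats(graph)
C = clustering(graph)
degrees = {v: len(nbrs) for v, nbrs in graph.items()}
```

For this toy graph, L is 4/3, D is 2 and C is 7/12. Treating nodes with fewer than two neighbors as having clustering 0 is one common convention, chosen here for simplicity.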

2) Analysis of 3 Semantic Networks: a. Associative Network
• “The University of South Florida Word Association, Rhyme and Word Fragment Norms”
• More than 6,000 participants; 750,000 responses to 5,019 cues (stimulus words)
• The great majority of these words are nouns (76%), but adjectives (13%), verbs (7%), and other parts of speech are also represented. In addition, 16% are identified as homographs.

2) Analysis of 3 Semantic Networks: a. Associative Network
Examples (cue with a blank, then the cue with a response filled in):
• BOOK _______ → BOOK READ
• SUPPER _______ → SUPPER LUNCH

2) Analysis of 3 Semantic Networks: a. Associative Network
• When SUPPER was normed, it produced LUNCH as a target with a forward strength of .03.
• Note: for simplicity, the networks were constructed with all arcs and edges unlabeled and equally weighted. Forward and backward strengths imply directions.

2) Analysis of 3 Semantic Networks: a. Associative Network
I) Undirected network
• Word nodes were joined by an edge if they were associatively related, regardless of the direction of the association.
• Figure: the shortest path from VOLCANO to ACHE is highlighted.

2) Analysis of 3 Semantic Networks: a. Associative Network
II) Directed network
• Words x and y were joined by an arc from x to y if cue x evoked y as an associative response.
• Figure: all shortest directed paths from VOLCANO to ACHE are shown.
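The two constructions can be sketched from a list of (cue, response) pairs. This is only an illustration: the first two pairs are the examples from the slides, the third pair and the helper functions are hypothetical.

```python
from collections import defaultdict

# (cue, response) pairs. The first two come from the slides;
# the third is a hypothetical pair added for illustration.
pairs = [("BOOK", "READ"), ("SUPPER", "LUNCH"), ("LUNCH", "SUPPER")]

directed = defaultdict(set)    # arc x -> y if cue x evoked response y
undirected = defaultdict(set)  # edge x - y regardless of direction

for cue, response in pairs:
    directed[cue].add(response)
    undirected[cue].add(response)
    undirected[response].add(cue)

def in_degree(node):
    """Number of cues that evoked `node` as a response."""
    return sum(1 for targets in directed.values() if node in targets)

def out_degree(node):
    """Number of distinct responses evoked by `node` as a cue."""
    return len(directed.get(node, ()))
```

Note that SUPPER and LUNCH are linked by two arcs in the directed network but by a single edge in the undirected one.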

2) Analysis of 3 Semantic Networks: b. Roget’s Thesaurus
• 1911 edition with 29,000 words from 1,000 categories
• A connection is made only between a word and a semantic category, and only if the word falls within that category. → bipartite graph

2) Analysis of 3 Semantic Networks: b. Roget’s Thesaurus
Figure: bipartite graph and the corresponding unipartite graph.
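One way to go from the bipartite word-category graph to a unipartite word graph is to link two words whenever they share at least one category. The slides do not spell out the conversion rule, so the rule below is an assumption, and the words and categories are made up for illustration.

```python
from itertools import combinations

# Hypothetical word -> categories mapping (not taken from the thesaurus).
word_categories = {
    "warm": {"HEAT", "FRIENDSHIP"},
    "hot": {"HEAT"},
    "kind": {"FRIENDSHIP", "BENEVOLENCE"},
    "cold": {"COLD"},
}

def project_to_words(word_categories):
    """Link two words whenever they share at least one category (assumed rule)."""
    # Invert the bipartite graph: category -> words that belong to it.
    members = {}
    for word, cats in word_categories.items():
        for cat in cats:
            members.setdefault(cat, set()).add(word)
    # Connect every pair of words that appear in the same category.
    graph = {word: set() for word in word_categories}
    for words in members.values():
        for u, v in combinations(sorted(words), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph

unipartite = project_to_words(word_categories)
```

Here "warm" ends up linked to both "hot" and "kind", while "hot" and "kind" stay unlinked because they share no category.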

2) Analysis of 3 Semantic Networks: c. WordNet
• Developed by George Miller at the Cognitive Science Laboratory of Princeton University: http://wordnet.princeton.edu
• Based on the relations between synsets; contained more than 120k word forms and 99k meanings
• Example: the noun "computer" has 2 senses in WordNet.
  1. computer, computing machine, computing device, data processor, electronic computer, information processing system -- (a machine for performing calculations automatically)
  2. calculator, reckoner, figurer, estimator, computer -- (an expert at calculation (or at operating calculating machines))

2) Analysis of 3 Semantic Networks: c. WordNet
• Links are between word forms and their meanings, according to relationships between word forms such as:
  • SYNONYMY
  • POLYSEMY
  • ANTONYMY
  • HYPERNYMY (Computer is a kind of machine/device/object.)
  • HYPONYMY (Digital computer/Turing machine… is a kind of computer.)
  • HOLONYMY (Computer is a part of a platform.)
  • MERONYMY (CPU/chip/keyboard… is a part of a computer.)
• Links can be established in any desired way, so WordNet is treated as an undirected graph.

2) Analysis of 3 Semantic Networks: Statistical Properties
(A/N = Associative Network)
I) How sparse are the 3 networks?
• <k>: avg. number of connections
• → In all 3, a node is connected to only a small percentage of the other nodes.
II) How connected are the networks?
• Undirected A/N: completely connected
• Directed A/N: the largest connected component contains 96% of all words
• WordNet & Thesaurus: 99%
All further analyses use these components!

2) Analysis of 3 Semantic Networks: Statistical Properties

2) Analysis of 3 Semantic Networks: Statistical Properties
III) Short path length (L) and diameter (D)
• In WordNet & Thesaurus, L and D are based on a sample of 10,000 words. In the A/N, all words are considered.
• → L and D are close to those of random graphs of equivalent size, as expected.
IV) Local clustering (C)
• To measure its C, the directed A/N is treated as undirected.
• To calculate C for the Thesaurus, the bipartite graph is converted into a unipartite graph.
• C of all 4 networks is much higher than in random graphs.

2) Analysis of 3 Semantic Networks: Statistical Properties
V) Power-law degree distribution, P(k) ~ k^(-γ)
• All distributions are plotted in log-log coordinates, with the line showing the best-fitting power-law distribution.
• γ_in (the in-degree exponent) of the Directed A/N is lower than the rest.
These semantic networks are scale-free!
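A power law is a straight line in log-log coordinates, so its exponent can be read off as the negative slope of that line. The sketch below uses an ordinary least-squares fit on synthetic data with a known exponent; the slides do not say which fitting procedure the authors used, so least squares is an assumption here, and the data are not from the networks analyzed.

```python
import math

def fit_power_law(dist):
    """Least-squares line through (log k, log P(k)); returns the exponent gamma."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # P(k) ~ k^(-gamma), so gamma is minus the slope

# Synthetic, unnormalized distribution with exponent 3 (illustration only).
# Normalizing P(k) would shift the line but leave its slope unchanged.
dist = {k: k ** -3.0 for k in range(1, 51)}
gamma = fit_power_law(dist)
```

On real degree counts the tail is noisy, so a plain least-squares fit is only a rough estimate of the exponent.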

2) Analysis of 3 Semantic Networks: Statistical Properties / Summary
• Sparsity & high connectivity: on average, words are related to only a few other words.
• Local clustering: connections between words are coherent and transitive: if x–y and y–z, then x–z.
• Short path length and diameter: language is expressive and flexible (through polysemy, homonymy, ...).
• Power-law degree distribution: language hosts hubs as well as many words connected to few others.

3) The Growing Network Model
• Inspired by Barabási & Albert (1999)
• Incorporates both growth and preferential attachment
• Aim: to see whether the same mechanisms are at work in real-life semantic networks and in artificial ones
• Might be applied to lexical development in children, to the growth of semantic structures across languages, or even to language evolution

3) The Growing Network Model: Assumptions
• Children learn concepts through semantic differentiation: a new concept differentiates an already existing one, acquiring a similar but not identical meaning, with a different pattern of connectivity.
• More complex concepts get differentiated more.
• More frequent concepts are more involved in differentiation.

3) The Growing Network Model: Structure
• Nodes are words, and connections are semantic associations/relations.
• Nodes differ in their utility (frequency of use).
• Over time, new nodes are added and attached to existing nodes probabilistically according to:
  • Locality principle: new links are added only within a local neighborhood → a set of nodes with a common neighbor
  • Size principle: new connections go to neighborhoods that already have a large number of connections
  • Utility principle: new connections within a neighborhood go to nodes with high utility (rich-get-richer phenomenon)

3) The Growing Network Model: a. Undirected GN Model
• Aim: to grow a network with n nodes
• The number of nodes at time t is n(t)
• Start with a fully connected network of M nodes (M << n)
• At each time step t, add a new node with M links (M chosen for a desired avg. density of connections) into the local neighborhood H_i of an existing node i → H_i is the set of neighbors of i, including i itself.
• Choose the neighborhood according to the size principle:
  P_i(t) = k_i(t) / Σ_l k_l(t)
  where k_i(t) is the degree of node i at time t, and the sum over l ranges over all n(t) nodes currently in the network.

3) The Growing Network Model: a. Undirected GN Model
• Connect to a node j in the neighborhood of node i according to the utility principle:
  P_ij(t) = U_j / Σ_l U_l
  where the sum over l ranges over all nodes in H_i, and U_j = log(f_j + 1), with f_j taken from the Kučera & Francis (1967) frequency count.
• If all utilities are equal, make the connection at random among the nodes in H_i.
• Stop when n nodes are reached.
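The growth procedure can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name and the `utility` callback are hypothetical, utilities are assumed to be strictly positive, and the M targets are drawn without repetition by simple rejection, a detail the slides do not specify.

```python
import random

def grow_undirected(n, M, utility=None, seed=0):
    """Grow an undirected network of n nodes; each new node gets M links."""
    if M < 2:
        raise ValueError("this sketch needs M >= 2 so every seed node has a link")
    rng = random.Random(seed)
    # Start from a fully connected network of M nodes.
    nbrs = {i: set(range(M)) - {i} for i in range(M)}
    for new in range(M, n):
        nodes = list(nbrs)
        # Size principle: pick node i with probability proportional to its degree.
        i = rng.choices(nodes, weights=[len(nbrs[v]) for v in nodes])[0]
        # Locality principle: candidates are H_i, i.e. i and its neighbors.
        candidates = sorted(nbrs[i] | {i})
        # Utility principle: weight candidates by utility (equal if none given).
        weights = [utility(j) if utility else 1.0 for j in candidates]
        targets = set()
        while len(targets) < M:  # rejection sampling for M distinct targets
            targets.add(rng.choices(candidates, weights=weights)[0])
        nbrs[new] = set(targets)
        for j in targets:
            nbrs[j].add(new)
    return nbrs

# Same size as the small example network on the slides: n = 150, M = 2.
net = grow_undirected(150, 2)
max_degree = max(len(v) for v in net.values())
```

Every H_i contains at least M nodes (seed nodes have M - 1 neighbors plus themselves), so the rejection loop always terminates when all utilities are positive. To use word frequencies, one could pass something like `utility=lambda j: math.log(freq[j] + 1)` with a hypothetical `freq` table whose entries are all positive.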

3) The Growing Network Model: a. Undirected GN Model
• Figure: the growth process and a small resulting network with n = 150, M = 2.

3) The Growing Network Model: b. Directed GN Model
• Very similar to the Undirected GN Model: insert nodes with M arcs instead of links
• The same equations apply the locality, size and utility principles, since k_i = k_i^in + k_i^out
• Difference: the direction principle: the majority of arcs point from new nodes to existing nodes → the probability that an arc points away from the new node is α, where α > 0.5 is assumed, so most arcs point towards existing nodes.
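The direction principle only adds one random choice per new arc. A minimal sketch, assuming the targets of the new node have already been chosen as in the undirected model; the function name and arguments are hypothetical.

```python
import random

def orient_arcs(new_node, targets, alpha=0.95, rng=None):
    """Orient each new link: with probability alpha it points away from new_node."""
    rng = rng or random.Random(0)
    arcs = []
    for j in targets:
        if rng.random() < alpha:
            arcs.append((new_node, j))  # new node -> existing node
        else:
            arcs.append((j, new_node))  # existing node -> new node
    return arcs

arcs = orient_arcs("new", ["x", "y", "z"], alpha=0.95)
```

With α close to 1, new nodes start with almost no incoming arcs and collect them only later, when subsequently added nodes point to them.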

3) The Growing Network Model: Model Results
• Due to computational constraints, the GN model was compared only with the A/N.
• n = 5018; M = 11 and M = 12 in the undirected and directed GN models, respectively.
• The only free parameter in the Directed GN model, α, was set to 0.95.
• The networks produced by the model are similar to the A/N in terms of L, D and C. The directed model shows the same low γ_in as the Directed A/N.

3) The Growing Network Model: Model Results
• Also checked whether the same results would be produced when the Directed GN Model was converted into an undirected one (why?).
• All arcs were converted into links, with M = 11 and α = 0.95.
• Results were similar to the Undirected GN model.
• → The degree distribution follows a power law.

3) The Growing Network Model: Argument
• L, C and γ from the artificial networks were expected to be comparable to those of real-life networks because of:
  • incorporation of growth
  • incorporation of preferential attachment (locality, size & utility principles)
• Do models without growth fail to produce such power laws?
• Analyze the co-occurrence of words within a large corpus → Latent Semantic Analysis (LSA): the meaning of words can be represented by vectors in a high-dimensional space
• Landauer & Dumais (1997) have already shown that local neighborhoods in semantic space capture semantic relations between words.

3) The Growing Network Model: LSA Results
• Higher L, D and C than in real-life semantic networks
• Very different degree distribution: the distributions do not follow a power law, so the slope of the best-fitting line is difficult to interpret.

3) The Growing Network Model: LSA Results
• Analysis of the TASA corpus (>10 million words) using the LSA vector representation, for three sets of words:
  • All words from LSA (>92k), represented as vectors
  • All words from the A/N that occur in TASA
  • The most frequent words in TASA

3) The Growing Network Model: LSA Results
• The absence of a power-law degree distribution implies that LSA does not produce hubs.
• In contrast, a growing model provides a principled explanation for the origin of the power law: words with high connectivity acquire even more connections over time.

4) Psychological Implications
• The number of connections a node has is related to the time at which the node is introduced into the network.
• Predictions:
  • Concepts that are learned early in life will have more connections than concepts learned later.
  • Concepts with high utility (frequency) will receive more links than concepts with lower utility.

4) Psychological Implications: Analysis of AoA-related data
To test the predictions, two data sets were analyzed:
I) Age of Acquisition (AoA) ratings (Gilhooly & Logie, 1980)
• AoA effect: early acquired words are retrieved from memory more rapidly than late acquired words
• An experiment with 1,944 words
• Adults estimated the age at which they thought they had first learned a word, on a rating scale from 100 to 700 (700 indicating a very late-learned concept)
II) Picture-naming norms (Morrison, Chappell & Ellis, 1997)
• Estimates of the age at which 75% of children could successfully name the object depicted in a picture

4) Psychological Implications: Analysis of AoA-related data
• Predictions are confirmed!
• Figure: standard error bars are shown around the means.

4) Psychological Implications: Discussion
• Important consequences for psychological research on AoA and word frequency
• Weakens:
  • AoA affects mainly the speech output system
  • AoA and word frequency affect behavioral tasks independently
• Confirms:
  • early acquired words show short naming latencies and lexical-decision latencies
  • AoA affects semantic tasks
  • AoA is mere cumulative frequency

4) Psychological Implications: Correlational Analysis of Findings
• Early acquired words have more semantic connections (they are more central in an underlying semantic network) → early acquired words have higher degree centrality
• Centrality can also be measured by computing the eigenvector of the adjacency matrix with the largest eigenvalue.
• Analysis of how degree centrality, word frequency and AoA from the previous rating and naming studies correlate with latencies in 2 databases:
  • Naming-latency database of 796 words
  • Lexical-decision-latency database of 2,905 words
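The eigenvector with the largest eigenvalue can be approximated by power iteration: repeatedly replace each node's score by the sum of its neighbors' scores and rescale. The sketch below is an illustration on a hypothetical four-node graph, not the computation used in the study.

```python
# Toy undirected graph (hypothetical data): a triangle a-b-c with d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def eigenvector_centrality(graph, iterations=200):
    """Power iteration on the adjacency matrix, rescaled so the top score is 1."""
    score = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Multiply by the adjacency matrix: sum the scores of each node's neighbors.
        nxt = {v: sum(score[u] for u in graph[v]) for v in graph}
        top = max(nxt.values())
        score = {v: val / top for v, val in nxt.items()}
    return score

centrality = eigenvector_centrality(graph)
ranking = sorted(centrality, key=centrality.get, reverse=True)
```

Plain power iteration assumes a connected, non-bipartite graph; on a bipartite graph (for example a star) the scores oscillate instead of converging. In this example the hub "c" gets the top score, and "d", whose only link is to "c", the lowest.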

4) Psychological Implications: Correlational Analysis of Findings
• Centrality correlates negatively with latencies
• AoA correlates positively with latencies
• Word frequency correlates negatively with latencies
• When the effects of word frequency and AoA are partialled out, the centrality-latency correlation remains significant → there must be other variables

5) General Discussion and Conclusions
• Weakness of correlational analysis: the direction of causation is unknown:
  • Because it is acquired early, a word has more connections, vs.
  • Because it has more connections, a word is acquired early
• A connectionist model can produce similar results: early acquired words are learnt better.

5) General Discussion and Conclusions
• Power-law degree distributions in semantic networks can be understood through semantic growth processes → hubs
• Non-growing semantic representations such as LSA do not produce such a distribution per se.
• Early acquired concepts have richer connections → confirmed by AoA norms.

References
• Barabási, A.L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509-512.
• Gilhooly, K.J., & Logie, R.H. (1980). Age of acquisition, imagery, concreteness, familiarity and ambiguity measures for 1944 words. Behavior Research Methods and Instrumentation, 12, 395-427.
• Kučera, H., & Francis, W.N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
• Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.
• Morrison, C.M., Chappell, T.D., & Ellis, A.W. (1997). Age of acquisition norms for a large set of object names and their relation to adult estimates and other variables. Quarterly Journal of Experimental Psychology, 50A, 528-559.

Thanks for your attention! Questions / comments are appreciated.

2) Analysis of 3 Semantic Networks: c. WordNet
Number of words, synsets, and senses
POS         Unique Strings   Synsets   Total Word-Sense Pairs
Noun               114,648    79,689                  141,690
Verb                11,306    13,508                   24,632
Adjective           21,436    18,563                   31,015
Adverb               4,669     3,664                    5,808
Totals             152,059   115,424                  203,145

2) Analysis of 3 Semantic Networks: Statistical Properties
For a random graph with N nodes, connection probability p and avg. degree <k>:
• If <k> = pN < 1, the graph is composed of isolated trees
• If <k> > 1, a giant cluster appears
• If <k> ≥ ln(N), the graph is totally connected
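These thresholds are easy to check for a given network size. The sketch below simply encodes the three regimes; the function name is hypothetical, the thresholds are asymptotic results for large random graphs, and the example values of <k> are illustrative (22 is roughly 2M for a grown network with M = 11, since each new node adds M links).

```python
import math

def random_graph_regime(n_nodes, avg_degree):
    """Classify a large random graph by its average degree <k>."""
    if avg_degree >= math.log(n_nodes):
        return "totally connected"
    if avg_degree > 1:
        return "giant cluster"
    if avg_degree < 1:
        return "isolated trees"
    return "critical point"  # <k> exactly 1 is the transition itself

# For N = 5018 the connectivity threshold ln(N) is about 8.5.
threshold = math.log(5018)
regimes = [random_graph_regime(5018, k) for k in (0.5, 2.0, 22.0)]
```

A network with N = 5018 and <k> around 22 is therefore far above the threshold at which a random graph of that size becomes connected.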

Roget’s Thesaurus
• WORDS EXPRESSING ABSTRACT RELATIONS
• WORDS RELATING TO SPACE
• WORDS RELATING TO MATTER
• WORDS RELATING TO THE INTELLECTUAL FACULTIES
• WORDS RELATING TO THE VOLUNTARY POWERS
• WORDS RELATING TO THE SENTIENT AND MORAL POWERS
