A Comparison of Graphical Techniques for the Display of Co-Occurrence Data

Jan W. Buzydlowski, Xia Lin, Howard D. White

College of Information Science and Technology

Drexel University

Philadelphia, PA 19104

USA

- (Data) Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way. [Cleveland, 1993]
- (Information) Visualization has two aspects, structural modeling and graphic representation. [C. Chen, 1999]
- data - model - display

- Model - Display
- Co-Occurrence Model
- 3 Graphical Displays

- Data
- Co-citation counts from the Institute for Scientific Information, Philadelphia, PA
- Obtained from a 10-year Arts & Humanities Citation Index database given to Drexel by ISI for research purposes

- Examples
- Derivation
- Metrics

- Market Basket Analysis
- a shopping cart holds items purchased
- e.g., milk, bread, razor blades, newspaper

- Over all the sales for one day
- what items are purchased together
- how can we arrange the items in the store
- Pampers and beer on Thursdays...

- Author Co-citation Analysis (ACA)
- Bibliographic data on a given article holds, e.g.,
- title, keywords, abstract, citations to other documents

- An article might cite, e.g.:
- Plato, Aristotle, Smith, Brown

- Over a given set of many citing articles
- Count how many times each pair of authors was cited together
- Resulting co-citation count shows common intellectual interest

- For a given data set (N = 4 unique terms)
- Article 1: Plato, Aristotle, Smith
- Article 2: Plato, Smith
- Article 3: Plato, Aristotle, Smith, Brown

- The following co-citations (C(4,2) = 6) are found

| Combination | Count | Articles |
|---|---|---|
| Plato and Smith | 3 | 1, 2, 3 |
| Plato and Aristotle | 2 | 1, 3 |
| Plato and Brown | 1 | 3 |
| Aristotle and Smith | 2 | 1, 3 |
| Aristotle and Brown | 1 | 3 |
| Smith and Brown | 1 | 3 |
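
The counting step can be sketched in a few lines of Python. This is only an illustration on the three-article example; the function name `cocitation_counts` and the dictionary layout are assumptions, not part of the presentation.

```python
from collections import Counter
from itertools import combinations

# The three citing articles from the example above.
articles = {
    1: ["Plato", "Aristotle", "Smith"],
    2: ["Plato", "Smith"],
    3: ["Plato", "Aristotle", "Smith", "Brown"],
}

def cocitation_counts(articles):
    """Count, for every unordered author pair, the articles citing both."""
    counts = Counter()
    for cited in articles.values():
        # sorted(set(...)) gives each pair one canonical key and ignores repeats
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(articles)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs are stored in alphabetical order, so Plato and Aristotle appear under the key `("Aristotle", "Plato")`.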

- Raw counts
- Additional information
- Correlations
- Replace each cell by a correlation measure computed between the corresponding pair of columns

- Conditional Probability
- Compute each cell by dividing the count for each unique combination by the total number of occurrences
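
The conditional-probability metric admits more than one reading. The sketch below assumes one of them: the co-citation count of a pair divided by the number of articles that cite the conditioning author. The helper name `conditional` is hypothetical.

```python
from collections import Counter
from itertools import combinations

articles = [
    {"Plato", "Aristotle", "Smith"},
    {"Plato", "Smith"},
    {"Plato", "Aristotle", "Smith", "Brown"},
]

occurs = Counter()    # number of articles citing each author
together = Counter()  # number of articles citing both authors of a pair
for cited in articles:
    occurs.update(cited)
    together.update(combinations(sorted(cited), 2))

def conditional(b, given):
    """Estimate P(b is cited | given is cited) under the assumed reading."""
    pair = tuple(sorted((given, b)))
    return together[pair] / occurs[given]

print(conditional("Aristotle", given="Plato"))  # 2 of Plato's 3 articles
print(conditional("Plato", given="Aristotle"))  # 2 of Aristotle's 2 articles
```

Unlike the raw counts, this measure is not symmetric: the two calls above give different values for the same pair.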

- Three Methodologies
- Multi-dimensional scaling
- Self-organizing maps
- Pathfinder networks

- Given original distances (or similarities), estimate coordinates that could reproduce those distances
- The computed distances should correspond to the original distances
- Stress: measures the mismatch between computed and original distances
- Added dimensions: lower the stress
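
To make stress and the effect of added dimensions concrete, here is a small sketch that only evaluates stress for two hand-picked configurations; it does not perform the scaling itself. The formula is a Kruskal-style stress-1 with the raw target distances used directly, which is an assumption, since the slides do not say which stress variant was used.

```python
import math

def stress(target, coords):
    """Kruskal-style stress-1 between target distances and a configuration.

    target: {(i, j): distance}; coords: {i: tuple of coordinates}.
    """
    num = den = 0.0
    for (i, j), d in target.items():
        fitted = math.dist(coords[i], coords[j])
        num += (d - fitted) ** 2
        den += fitted ** 2
    return math.sqrt(num / den)

# Three points that should all be 1 unit apart (an equilateral triangle).
target = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0}

one_d = {"a": (0.0,), "b": (1.0,), "c": (0.5,)}
two_d = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.5, math.sqrt(3) / 2)}

print(stress(target, one_d))  # clearly above zero: a line cannot fit all three
print(stress(target, two_d))  # essentially zero: the plane fits exactly
```

On a line the three distances cannot all be honored, so the stress stays well above zero; with a second dimension it drops to zero up to rounding.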

- Also known as Kohonen Maps
- Based on Neural Networks
- Related to wetware
- robust techniques

- If categories are known
- supervised technique
- backpropagation learning

- If categories are sought
- unsupervised technique
- competitive learning

- Given a 2-D grid of nodes
- each node has N weights
- each vector (row) has N terms
- map each input vector to a node

- Similar to vector quantization (VQ)

- nodes initially given random weights
- randomly sample an input vector
- row of co-occurrence matrix
- with replacement

- find a node closest to vector
- Euclidean distance

- update node weights
- node weight = node weight + gain term * (input vector - node weight)
- update “neighborhood”

- “cool” gain term and neighborhood
- repeat…
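
The training loop above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear cooling schedule, the 3 x 3 grid, the zero diagonal in the toy matrix, and the names `train_som` and `best_node` are all assumptions.

```python
import math
import random

def best_node(weights, vec):
    """Grid position of the node whose weights are closest (Euclidean) to vec."""
    return min(weights, key=lambda node: math.dist(weights[node], vec))

def train_som(rows, width, height, steps=2000, gain0=0.5, seed=0):
    rng = random.Random(seed)
    n = len(rows[0])
    # nodes initially given random weights
    weights = {(x, y): [rng.random() for _ in range(n)]
               for x in range(width) for y in range(height)}
    radius0 = max(width, height) / 2.0
    for t in range(steps):
        frac = 1.0 - t / steps             # linear "cooling" (assumed schedule)
        gain = gain0 * frac
        radius = max(radius0 * frac, 0.5)
        vec = rng.choice(rows)             # sample a row, with replacement
        winner = best_node(weights, vec)
        # update the winner and every node in its current neighborhood
        for node, w in weights.items():
            if math.dist(node, winner) <= radius:
                for k in range(n):
                    w[k] += gain * (vec[k] - w[k])
    return weights

# Toy co-occurrence matrix from the four-author example (diagonal set to 0).
authors = ["Plato", "Aristotle", "Smith", "Brown"]
matrix = [
    [0, 2, 3, 1],
    [2, 0, 2, 1],
    [3, 2, 0, 1],
    [1, 1, 1, 0],
]
som = train_som(matrix, width=3, height=3)
placement = {a: best_node(som, row) for a, row in zip(authors, matrix)}
print(placement)
```

Each author label is then plotted at the grid position returned for its row.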

- Based on graph notation
- nodes = authors
- edges = co-citation counts

- Co-occurrence is a complete network (weighted, undirected)

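[Figure: complete weighted graph on Plato, Smith, and Aristotle, with edge weights Plato-Smith 3, Plato-Aristotle 2, Aristotle-Smith 2]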

- Pathfinder Network is generated by varying the parameters:
- distance (r)
- triangle inequality (q)

- Uses Minkowski metric:
$d = \left( \sum_i e_i^{r} \right)^{1/r}$

- Example
- $e_1 = 3$, $e_2 = 4$
- $r = 1$ => $d = 3 + 4 = 7$
- Driving distance / ratio data

- $r = 2$ => $d = (9 + 16)^{1/2} = 5$
- Euclidean distance

- $r \to \infty$ => $d = \max(3, 4) = 4$
- ordinal data
- rank rather than value
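
The three cases of the example can be checked with a short function. The name `minkowski` and the use of `math.inf` for the limiting case are illustrative choices.

```python
import math

def minkowski(edges, r):
    """Length of a path with the given edge weights under the r-metric."""
    if math.isinf(r):
        return max(edges)  # limiting case as r grows without bound
    return sum(e ** r for e in edges) ** (1.0 / r)

for r in (1, 2, math.inf):
    print(r, minkowski([3, 4], r))
```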

- A required property of a metric definition
$d(i,j) \le d(i,k) + d(k,j)$

- But may not be justified
- in personal judgments
- If a is similar to b, and b is similar to c, there may be no transitive judgment of similarity from a to c

- in set intersections
- Even though Smith and Jones appear together 12 times, and Jones and Brown appear together 5 times, the overlap between Smith and Brown cannot be predicted

- Defines q-triangular
- check paths of length q to determine if inequality is met
- minimum is 2
- maximum is n - 1
- full compliance

- the longer the length, the fewer the connections

- PFNet (r, q)
- Examine all paths of length q or less.
- Use Minkowski Metric with parameter r to compute path length.
- If a path of less weight is found, then remove the edge.

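[Figure: triangle network on Smith, Jones, and Brown with q = 2; the edge Smith-Jones has weight 5, and the two edges through Brown have weights 4 and 3]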

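r = 1 => path via Brown: 4 + 3 = 7 > 5, so Smith - Jones is kept

r = 2 => path via Brown: (16 + 9)^{1/2} = 5, not less than 5, so Smith - Jones is kept

r = infinity => path via Brown: max(4, 3) = 4 < 5, so Smith - Jones is removed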
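
The pruning rule can be sketched directly from the three steps listed for PFNet (r, q). This is a straightforward, unoptimized illustration that treats edge weights as distances, as the Smith-Jones-Brown example does. The names `combine` and `pfnet`, and the assignment of 4 and 3 to the two Brown edges, are assumptions; swapping those two weights does not change the outcome for Smith - Jones.

```python
import math

def combine(a, b, r):
    """Join two path segments under the Minkowski r-metric."""
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def pfnet(weights, r, q):
    """Return the edges of `weights` that survive PFNet(r, q).

    weights: {(u, v): distance} for an undirected graph.
    """
    nodes = sorted({n for edge in weights for n in edge})

    def w(u, v):
        if u == v:
            return 0.0
        return weights.get((u, v), weights.get((v, u), math.inf))

    # best[u][v]: shortest r-metric length over paths of at most k edges
    best = {u: {v: w(u, v) for v in nodes} for u in nodes}
    for _ in range(q - 1):
        best = {
            u: {
                v: min(
                    [best[u][v]]
                    + [combine(w(u, m), best[m][v], r)
                       for m in nodes if m not in (u, v)]
                )
                for v in nodes
            }
            for u in nodes
        }
    # keep an edge unless some path is strictly shorter (small float tolerance)
    return {e: d for e, d in weights.items() if d <= best[e[0]][e[1]] + 1e-9}

example = {("Smith", "Jones"): 5, ("Smith", "Brown"): 4, ("Brown", "Jones"): 3}
for r in (1, 2, math.inf):
    kept = pfnet(example, r=r, q=2)
    print(r, ("Smith", "Jones") in kept)
```

Co-citation counts are similarities, so in practice they would first need to be converted to distances; the slides do not specify that conversion and the sketch leaves it out.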

- MDS
- Reduces dimensions / reveals clusters
- 2D may be insufficient
- measurement may not be Euclidean

- SOM
- robust
- no guarantee of convergence/unique solution

- Pathfinder
- does not assume ratio data/triangle inequality
- connections rather than positions are important
- additional methodology needed for display

- Similarities
- Spatial models

- Differences
- use of visual space
- semantic meaning
- as related to data
- research in progress

- MDS
- assume that 2 dimensions are sufficient
- x, y for each point already defined

- SOM
- grid defines the 2D surface
- plot each label with the appropriate node

- Pathfinder
- only defines the nodes and links
- need additional methodologies
- Spring-embedder models
- Kamada and Kawai (1989)
- Fruchterman and Reingold (1991)
- Davidson and Harel (1996)
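
As a rough idea of what a spring-embedder does with the nodes and links that Pathfinder produces, here is a simplified force-directed loop in the spirit of Fruchterman and Reingold (1991). It is a sketch, not a faithful reimplementation of any of the cited algorithms: the force constants, the cooling factor, the function name `spring_layout`, and the link set in the usage lines are all hypothetical.

```python
import math
import random

def spring_layout(nodes, edges, size=1.0, iterations=100, seed=0):
    """Place nodes in a size x size frame with a simple force-directed loop."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size / math.sqrt(len(nodes))   # preferred distance between nodes
    temp = size / 10.0                 # largest allowed move, cooled each pass
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # every pair of nodes repels with strength k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                push = k * k / d
                disp[u][0] += dx / d * push
                disp[u][1] += dy / d * push
                disp[v][0] -= dx / d * push
                disp[v][1] -= dy / d * push
        # linked nodes attract with strength d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            pull = d * d / k
            disp[u][0] -= dx / d * pull
            disp[u][1] -= dy / d * pull
            disp[v][0] += dx / d * pull
            disp[v][1] += dy / d * pull
        # move each node, capped by the temperature and clamped to the frame
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] = min(size, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(size, max(0.0, pos[v][1] + dy / d * step))
        temp *= 0.95
    return {v: tuple(p) for v, p in pos.items()}

# Hypothetical link set, for illustration only.
nodes = ["Plato", "Aristotle", "Smith", "Brown"]
edges = [("Plato", "Smith"), ("Plato", "Aristotle"), ("Plato", "Brown")]
layout = spring_layout(nodes, edges)
print(layout)
```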

- Data
- Institute for Scientific Information
- Arts and Humanities Database (AHCI)
- 1988 - 1997
- 1.26 million records

- Example:
- Given Plato, find related authors
- Interface described in IV 2000 Paper
- CSNA 2000 Paper
- (Lin, Buzydlowski, White)

PLATO (4928)

ARISTOTLE (1861)

PLUTARCH (838)

CICERO (699)

HOMER (627)

BIBLE (552)

EURIPIDES (515)

ARISTOPHANES (474)

XENOPHON (459)

AUGUSTINE (432)

HERODOTUS (425)

KANT-I (385)

AESCHYLUS (374)

SOPHOCLES (363)

THUCYDIDES (363)

OVID (334)

HESIOD (325)

DIOGENES-LAERTIUS (317)

HEIDEGGER-M (312)

DERRIDA-J (304)

PINDAR (292)

NIETZSCHE-F (278)

HEGEL-GWF (264)

VERGIL (259)

AQUINAS-T (255)

- 1: PLATO AND ARISTOTLE - 1940 docs
- 2: PLATO AND PLUTARCH - 872 docs
- ...
- 300: VERGIL AND AQUINAS-T - 38 docs

Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way...

2D MDS map of 25 authors co-cited with Plato

[Figure: PFNet of 25 authors co-cited with Plato; nodes are labeled with the 25 author names listed above]

- Slides available at:
- faculty.cis.drexel.edu/~jbuzydlo/

- Chen, C., Information Visualization and Virtual Environments, 1999.
- Cleveland, W. S., Visualizing Data, Hobart Press, 1993.
- Davidson, R., Harel, D., Drawing Graphs Nicely Using Simulated Annealing, ACM Transactions on Graphics, 15(4): 301-31 (1996).
- Fruchterman, T. M. J., Reingold, E. M., Graph Drawing by Force-Directed Placement, Software Practice and Experience, 21: 1129-64 (1991).
- Kamada, T., Kawai, S., An Algorithm for Drawing General Undirected Graphs, Information Processing Letters, 31(1): 7-15 (1989).