### A Comparison of Graphical Techniques for the Display of Co-Occurrence Data

Jan W. Buzydlowski, Xia Lin, Howard D. White

College of Information Science and Technology

Drexel University

Philadelphia, PA 19104

USA

Information Visualization

- (Data) Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way. [Cleveland, 1993]
- (Information) Visualization has two aspects: structural modeling and graphic representation. [C. Chen, 1999]
- data - model - display

Visualization Overview

- Model - Display
- Co-Occurrence Model
- 3 Graphical Displays
- Data
- Co-citation counts from the Institute for Scientific Information, Philadelphia, PA
- Obtained from a 10-year Arts & Humanities Citation Index database given Drexel by ISI for research purposes

Co-Occurrence Model

- Examples
- Derivation
- Metrics

Co-Occurrence Data - Example 1

- Market Basket Analysis
- a shopping cart holds items purchased
- e.g., milk, bread, razor blades, newspaper
- Over all the sales for one day
- what items are purchased together
- how can we arrange the items in the store
- Pampers and beer on Thursdays...

Co-Occurrence Data - Example 2

- Author Co-citation Analysis (ACA)
- Bibliographic data on a given article holds, e.g.,
- title, keywords, abstract, citations to other documents
- An article might cite, e.g.:
- Plato, Aristotle, Smith, Brown
- Over a given set of many citing articles
- Count how many times each pair of authors was cited together
- Resulting co-citation count shows common intellectual interest

Co-Occurrence Derivation

- For a given data set (N = 4 unique terms)
- Article 1: Plato, Aristotle, Smith
- Article 2: Plato, Smith
- Article 3: Plato, Aristotle, Smith, Brown
- The following co-citations (C(4,2) = 6) are found:

| Combination | Count | Articles |
| --- | --- | --- |
| Plato and Smith | 3 | 1, 2, 3 |
| Plato and Aristotle | 2 | 1, 3 |
| Plato and Brown | 1 | 3 |
| Aristotle and Smith | 2 | 1, 3 |
| Aristotle and Brown | 1 | 3 |
| Smith and Brown | 1 | 3 |
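
The derivation above can be sketched in a few lines of Python. This is a minimal illustration using only the three example articles from the slide; the function name `cocitation_counts` and the dictionary layout are assumptions made here, not part of the original presentation.

```python
from collections import Counter
from itertools import combinations

# The three citing articles from the slide, keyed by article number.
articles = {
    1: ["Plato", "Aristotle", "Smith"],
    2: ["Plato", "Smith"],
    3: ["Plato", "Aristotle", "Smith", "Brown"],
}

def cocitation_counts(articles):
    """Count, for every unordered author pair, the articles citing both."""
    counts = Counter()
    for cited in articles.values():
        # sorted(set(...)) gives each pair one canonical order and
        # ignores an author cited more than once in the same article.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(articles)
for (a, b), n in counts.most_common():
    print(f"{a} and {b}: {n}")
```

With N = 4 unique authors the counter holds at most C(4,2) = 6 pairs, matching the table.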

Co-Occurrence Measures

- Raw counts
- Additional information
- Correlations
- Replace each cell by correlation measure of each pair-wise column
- Conditional Probability
- Compute each cell by dividing the count for each unique combination by the total number of occurrences
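
The two derived measures can be sketched as follows, again on the three example articles. Several choices here are assumptions rather than anything the slides specify: the conditional probability is read as "co-citation count divided by the number of articles citing the given author", the diagonal of the matrix is filled with each author's own citation count, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt

authors = ["Plato", "Aristotle", "Smith", "Brown"]
articles = [
    {"Plato", "Aristotle", "Smith"},
    {"Plato", "Smith"},
    {"Plato", "Aristotle", "Smith", "Brown"},
]

# occurrences[a]: number of articles citing a.
occurrences = {a: sum(a in art for art in articles) for a in authors}

# cooc[a][b]: number of articles citing both a and b (raw counts).
cooc = {a: {b: 0 for b in authors} for a in authors}
for art in articles:
    for a, b in combinations(sorted(art), 2):
        cooc[a][b] += 1
        cooc[b][a] += 1
for a in authors:
    cooc[a][a] = occurrences[a]  # ASSUMPTION: diagonal = author's own count

def conditional_probability(b, given):
    """Estimate P(b is cited | `given` is cited) from the counts."""
    return cooc[given][b] / occurrences[given]

def pearson(xs, ys):
    """Pearson correlation of two equal-length columns; None if undefined."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation
    return sxy / sqrt(sxx * syy)

def column(a):
    """Column of the co-occurrence matrix belonging to author a."""
    return [cooc[row][a] for row in authors]

print(conditional_probability("Aristotle", given="Plato"))
print(pearson(column("Plato"), column("Smith")))
```

Note that the conditional probability is not symmetric: every article citing Aristotle also cites Plato, but not the reverse. How the diagonal is treated changes the correlations, so other conventions may give different values.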

Graphical Techniques

- Three Methodologies
- Multi-dimensional scaling
- Self-organizing maps
- Pathfinder networks

MDS Methodology

- Given the original distances (or similarities), estimate coordinates that would reproduce those distances
- The computed distances should correspond to the original distances
- Stress
- Added dimensions
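
As an illustration of stress and of why added dimensions can help, the sketch below scores two candidate layouts against a set of target distances. The slides do not give a stress formula, so a Kruskal-style stress-1 is assumed here, and the three points with distances 3, 4 and 5 are hypothetical data chosen because they fit exactly in two dimensions but not in one.

```python
import math
from itertools import combinations

def stress(coords, target):
    """Kruskal-style stress-1 between layout distances and target distances.

    coords: dict mapping label -> coordinate tuple (any dimension).
    target: dict mapping frozenset({a, b}) -> original distance.
    """
    num = den = 0.0
    for a, b in combinations(sorted(coords), 2):
        d = math.dist(coords[a], coords[b])  # distance in the layout
        num += (d - target[frozenset((a, b))]) ** 2
        den += d ** 2
    return math.sqrt(num / den)

# Hypothetical original distances between three items.
target = {frozenset("AB"): 3.0, frozenset("BC"): 4.0, frozenset("AC"): 5.0}

# One dimension cannot satisfy all three distances at once.
one_d = {"A": (0.0,), "B": (3.0,), "C": (7.0,)}
# Two dimensions can: a 3-4-5 right triangle.
two_d = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (3.0, 4.0)}

print("stress in 1-D:", stress(one_d, target))
print("stress in 2-D:", stress(two_d, target))
```

A full MDS procedure would search for the coordinates that minimize this quantity; the sketch only evaluates layouts that are already given.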

Self-Organizing Maps (SOMs)

- Also known as Kohonen Maps
- Based on Neural Networks
- Related to wetware
- robust techniques
- If categories are known
- supervised technique
- backpropagation learning
- If categories are sought
- unsupervised technique
- competitive learning

SOMs

- Given a 2-D grid of nodes
- each node has N weights
- each vector (row) has N terms
- map each input vector to a node
- Similar to vector quantization (VQ)

SOMs Generation

- nodes initially given random weights
- randomly sample an input vector
- row of co-occurrence matrix
- with replacement
- find a node closest to vector
- Euclidean distance
- update node weights
- node weight = node weight + gain term * (input vector - node weight)
- update “neighborhood”
- “cool” gain term and neighborhood
- repeat…
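
The generation steps above can be sketched as a short training loop. This is a simplified illustration, not the authors' implementation: the grid size, the number of steps, the linear cooling schedule, the hard-edged neighborhood and the function names are all assumptions, and the input rows are the example co-occurrence counts with each author's own citation count on the diagonal (also an assumption).

```python
import math
import random

def best_node(weights, vec):
    """Grid position of the node whose weights are closest (Euclidean) to vec."""
    return min(weights, key=lambda node: math.dist(weights[node], vec))

def train_som(rows, grid_w=3, grid_h=3, steps=500, gain0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(rows[0])
    # Nodes are initially given random weights, one weight per term.
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(grid_w) for y in range(grid_h)}
    radius0 = max(grid_w, grid_h) / 2
    for t in range(steps):
        frac = 1 - t / steps            # ASSUMPTION: linear "cooling"
        gain, radius = gain0 * frac, radius0 * frac
        vec = rng.choice(rows)          # sample a row, with replacement
        winner = best_node(weights, vec)
        for node, w in weights.items():
            # Update the winner and every node within the neighborhood radius.
            if math.dist(node, winner) <= radius:
                for k in range(dim):
                    w[k] += gain * (vec[k] - w[k])
    return weights

# Rows of the example co-occurrence matrix (order: Plato, Aristotle, Smith, Brown).
rows = {
    "Plato": [3, 2, 3, 1],
    "Aristotle": [2, 2, 2, 1],
    "Smith": [3, 2, 3, 1],
    "Brown": [1, 1, 1, 1],
}
weights = train_som(list(rows.values()))
placement = {author: best_node(weights, vec) for author, vec in rows.items()}
print(placement)
```

Each author label is then plotted at its grid node. Authors with identical rows, such as Plato and Smith in this toy data, necessarily land on the same node.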

Pathfinder Networks

- Based on graph notation
- nodes = authors
- edges = co-citation counts
- Co-occurrence is a complete network (weighted, undirected)

[Figure: example network of Plato, Smith and Aristotle with edge weights Plato-Smith 3, Plato-Aristotle 2, Aristotle-Smith 2]

Pathfinder Networks Generation

- Pathfinder Network is generated by varying the parameters:
- distance (r)
- triangle inequality (q)

Pathfinder Distance

- Uses Minkowski metric:

$$d = \left( \sum_i e_i^{\,r} \right)^{1/r}$$

- Example
- e1 = 3, e2 = 4
- r = 1 => $d = 3 + 4 = 7$
- Driving distance / ratio data
- r = 2 => $d = (9 + 16)^{1/2} = 5$
- Euclidean distance
- r approaches infinity => $d = \max(3, 4) = 4$
- ordinal data
- rank rather than value
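
The Minkowski metric and the three cases in the example can be checked with a small function. The name `minkowski` and its argument layout are choices made for this sketch.

```python
import math

def minkowski(edge_weights, r):
    """Length of a path with the given edge weights under the Minkowski r-metric."""
    if math.isinf(r):
        # In the limit r -> infinity only the largest edge weight matters.
        return max(edge_weights)
    return sum(e ** r for e in edge_weights) ** (1 / r)

for r in (1, 2, math.inf):
    print(f"r = {r}: d = {minkowski([3, 4], r)}")
```

For e1 = 3 and e2 = 4 this should reproduce the values 7, 5 and 4 from the example, up to floating-point rounding.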

Pathfinder Triangle Inequality

- A required property of a metric definition

$$d(i,j) \le d(i,k) + d(k,j)$$

- But may not be justified
- in personal judgments
- If a is similar to b, and b is similar to c, there may be no transitive judgment of similarity from a to c
- in set intersections
- Even though Smith and Jones appear together 12 times, and Jones and Brown appear together 5 times, the overlap between Smith and Brown cannot be predicted

Pathfinder Triangle Inequality

- Defines q-triangular
- check paths of length q to determine if inequality is met
- minimum is 2
- maximum is n - 1
- full compliance
- the longer the length, the fewer the connections

Pathfinder Network Creation

- PFNet (r, q)
- Examine all paths of length q or less.
- Use Minkowski Metric with parameter r to compute path length.
- If a path of less weight is found, then remove the edge.

Pathfinder - Example

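[Figure: triangle of Smith, Jones and Brown; the Smith - Jones edge has weight 5 and the two edges through Brown have weights 4 and 3; q = 2]

r = 1 => path through Brown is 4 + 3 = 7, longer than 5, so Smith - Jones is kept

r = 2 => path through Brown is (16 + 9)^(1/2) = 5, not shorter than 5, so Smith - Jones is kept

r = infinity => path through Brown is max(4, 3) = 4, shorter than 5, so Smith - Jones is removed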
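
The PFNet(r, q) rule can be sketched as below and run on the Smith, Jones, Brown example. This is a minimal illustration under stated assumptions: edge weights are treated as distances, the transcript does not say which of the two edges through Brown carries weight 4 and which carries 3 (the assignment here is arbitrary and does not affect the outcome), and the function names are hypothetical.

```python
import math

def combine(a, b, r):
    """Minkowski-r length of a path made of two pieces of length a and b."""
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1 / r)

def pfnet(weights, r, q):
    """Return the set of edges kept in PFNet(r, q).

    weights: dict mapping frozenset({u, v}) -> distance (undirected graph).
    """
    nodes = sorted({n for edge in weights for n in edge})

    def w(u, v):
        return weights.get(frozenset((u, v)), math.inf)

    # best[u][v]: shortest Minkowski-r length over paths of at most k edges.
    best = {u: {v: w(u, v) for v in nodes if v != u} for u in nodes}
    for _ in range(q - 1):  # extend the allowed path length from k to k + 1
        nxt = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    continue
                candidates = [best[u][v]]
                for m in nodes:
                    if m != u and m != v:
                        candidates.append(combine(best[u][m], w(m, v), r))
                nxt[u][v] = min(candidates)
        best = nxt

    kept = set()
    for edge, d in weights.items():
        u, v = tuple(edge)
        # Remove the edge only if some path is strictly shorter than it.
        if d <= best[u][v] + 1e-9:
            kept.add(edge)
    return kept

example = {
    frozenset(("Smith", "Jones")): 5,
    frozenset(("Smith", "Brown")): 4,  # ASSUMPTION: which Brown edge is 4
    frozenset(("Brown", "Jones")): 3,
}
for r in (1, 2, math.inf):
    kept = pfnet(example, r, q=2)
    print(f"r = {r}: Smith - Jones kept? {frozenset(('Smith', 'Jones')) in kept}")
```

The small tolerance in the comparison guards against floating-point noise in the r = 2 case, where the path through Brown has exactly the same length as the direct edge.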

Comparison of Techniques

- MDS
- Reduces dimensions / reveals clusters
- 2D may be insufficient
- measurement may not be Euclidean
- SOM
- robust
- no guarantee of convergence/unique solution
- Pathfinder
- does not assume ratio data/triangle inequality
- connections rather than positions are important
- additional methodology needed for display

Comparison of Techniques

- Similarities
- Spatial models
- Differences
- use of visual space
- semantic meaning
- as related to data
- research in progress

Graphical Display of Methodologies

- MDS
- assume that 2 dimensions are sufficient
- x, y for each point already defined
- SOM
- grid defines the 2D surface
- plot each label with the appropriate node
- Pathfinder
- only defines the nodes and links
- need additional methodologies
- Spring-embedder models
- Kamada and Kawai (1989)
- Fruchterman and Reingold (1991)
- Davidson and Harel (1996)
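
To show what such an additional layout step might look like, here is a simplified force-directed sketch in the general spirit of the spring-embedder models cited above. It is not a faithful implementation of any of the three published algorithms: the force laws, the cooling factor, the frame clamping and the function name `spring_layout` are assumptions chosen for brevity, and the two-edge network is a hypothetical input.

```python
import math
import random

def spring_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Place nodes in a size x size frame with a simple force-directed scheme."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # preferred distance between nodes
    temp = size / 10                  # caps how far a node moves per step

    def delta(u, v):
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        return dx, dy, math.hypot(dx, dy) or 1e-9  # avoid division by zero

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy, d = delta(u, v)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Linked nodes attract with force d^2 / k.
        for u, v in edges:
            dx, dy, d = delta(u, v)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node, limited by the temperature and clamped to the frame.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temp)
            for axis in (0, 1):
                moved = pos[n][axis] + disp[n][axis] / length * step
                pos[n][axis] = min(size, max(0.0, moved))
        temp *= 0.95  # cool the system so the layout settles
    return pos

nodes = ["Smith", "Brown", "Jones"]
edges = [("Smith", "Brown"), ("Brown", "Jones")]  # hypothetical pruned network
layout = spring_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Only the links drive the attraction, which fits the point that in a Pathfinder display the connections, not the absolute positions, carry the meaning.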

Graphical Comparison of Three Methods

- Data
- Institute for Scientific Information
- Arts and Humanities Database (AHCI)
- 1988 - 1997
- 1.26 million records
- Example:
- Given Plato, find related authors
- Interface described in IV 2000 Paper
- CSNA 2000 Paper
- (Lin, Buzydlowski, White)

PLATO (4928)

ARISTOTLE (1861)

PLUTARCH (838)

CICERO (699)

HOMER (627)

BIBLE (552)

EURIPIDES (515)

ARISTOPHANES (474)

XENOPHON (459)

AUGUSTINE (432)

HERODOTUS (425)

KANT-I (385)

AESCHYLUS (374)

SOPHOCLES (363)

THUCYDIDES (363)

OVID (334)

HESIOD (325)

DIOGENES-LAERTIUS (317)

HEIDEGGER-M (312)

DERRIDA-J (304)

PINDAR (292)

NIETZSCHE-F (278)

HEGEL-GWF (264)

VERGIL (259)

AQUINAS-T (255)

25 Authors Co-cited with Plato

300 Pair-wise Co-citations

- 1: PLATO AND ARISTOTLE - 1940 docs
- 2: PLATO AND PLUTARCH - 872 docs
- ...
- 300: VERGIL AND AQUINAS-T - 38 docs

Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way...

[Figure: graphical display of the 25 authors co-cited with Plato; only the author labels survive in this transcript]

Conclusion

- Slides available at:
- faculty.cis.drexel.edu/~jbuzydlo/
- [email protected]

Bibliography

- Chen, Chaomei, Information Visualization and Virtual Environments, 1999.
- Cleveland, William S., Visualizing Data, Hobart Press, 1993.
- Davidson, R, Harel, D, Drawing Graphs Nicely Using Simulated Annealing, ACM Transactions on Graphics, 15(4): 301-31 (1996).
- Fruchterman, TMJ, Reingold, EM, Graph Drawing by Force-Directed Placement, Software Practice and Experience, 21: 1129-64 (1991).
- Kamada, T, Kawai, S, An Algorithm for Drawing General Undirected Graphs, Information Processing Letters, 31(1): 7-15 (1989).
