A Comparison of Graphical Techniques for the Display of Co-Occurrence Data

Jan W. Buzydlowski, Xia Lin, Howard D. White

College of Information Science and Technology

Drexel University

Philadelphia, PA 19104

USA


Information Visualization

  • (Data) Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way. [Cleveland, 1993]

  • (Information) Visualization has two aspects, structural modeling and graphic representation. [C. Chen, 1999]

    • data - model - display


Visualization Overview

  • Model - Display

    • Co-Occurrence Model

    • 3 Graphical Displays

  • Data

    • Co-citation counts from the Institute for Scientific Information, Philadelphia, PA

      • Obtained from a 10-year Arts & Humanities Citation Index database given to Drexel by ISI for research purposes


Co-Occurrence Model

  • Examples

  • Derivation

  • Metrics


Co-Occurrence Data - Example 1

  • Market Basket Analysis

    • a shopping cart holds items purchased

      • e.g., milk, bread, razor blades, newspaper

  • Over all the sales for one day

    • which items are purchased together?

      • how should the items be arranged in the store?

        • Pampers and beer on Thursdays...


Co-Occurrence Data - Example 2

  • Author Co-citation Analysis (ACA)

    • Bibliographic data on a given article holds, e.g.,

      • title, keywords, abstract, citations to other documents

    • An article might cite, e.g.:

      • Plato, Aristotle, Smith, Brown

  • Over a given set of many citing articles

    • Count how many times each pair of authors was cited together

    • A high co-citation count suggests common intellectual interest


Co-Occurrence Derivation

  • For a given data set (N = 4 unique terms)

    • Article 1: Plato, Aristotle, Smith

    • Article 2: Plato, Smith

    • Article 3: Plato, Aristotle, Smith, Brown

  • The following C(4,2) = 6 co-cited pairs are found:

      Combination            Count   Articles
      Plato and Smith          3     1, 2, 3
      Plato and Aristotle      2     1, 3
      Plato and Brown          1     3
      Aristotle and Smith      2     1, 3
      Aristotle and Brown      1     3
      Smith and Brown          1     3
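A minimal Python sketch of the derivation above, assuming each article is represented simply as a list of cited author names. The `articles` dictionary and the `cocitation_counts` helper are hypothetical illustrations, not the authors' code. Each unordered pair of distinct authors counts once per citing article:

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: article id -> authors it cites (the three sample articles).
articles = {
    1: ["Plato", "Aristotle", "Smith"],
    2: ["Plato", "Smith"],
    3: ["Plato", "Aristotle", "Smith", "Brown"],
}


def cocitation_counts(articles):
    """Count, for every unordered author pair, the articles citing both."""
    counts = Counter()
    for cited in articles.values():
        # set() ignores repeated citations within one article; sorting gives
        # each pair a single canonical (alphabetical) key.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


for (a, b), n in sorted(cocitation_counts(articles).items()):
    print(f"{a} and {b}: {n}")
```

On the three sample articles this reproduces the six counts in the table. The same loop works unchanged for market-basket data, with baskets in place of articles.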


Co-Occurrence Measures

  • Raw counts

  • Additional information

    • Correlations

      • Replace each cell with the correlation between the corresponding pair of columns

    • Conditional Probability

      • Compute each cell by dividing the pair's co-occurrence count by the number of occurrences of the row term
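The two derived measures can be sketched as follows, under stated assumptions: `M` is the symmetric co-citation matrix for the derivation example (order Plato, Aristotle, Smith, Brown), its diagonal holds the number of articles citing each author (only one of several possible diagonal conventions), correlation is Pearson's r between columns, and conditional probability is read as P(j | i) = count(i, j) / count(i). The helper names are illustrative only.

```python
import math

authors = ["Plato", "Aristotle", "Smith", "Brown"]
# Assumed convention: diagonal = number of articles citing that author.
M = [
    [3, 2, 3, 1],
    [2, 2, 2, 1],
    [3, 2, 3, 1],
    [1, 1, 1, 1],
]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_matrix(m):
    """Replace each cell (i, j) with the correlation of columns i and j."""
    cols = list(zip(*m))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def conditional_probability(m):
    """Cell (i, j) = share of the articles citing author i that also cite j."""
    n = len(m)
    return [[m[i][j] / m[i][i] if m[i][i] else 0.0 for j in range(n)]
            for i in range(n)]


corr = correlation_matrix(M)
cond = conditional_probability(M)
print(f"r(Plato, Smith) = {corr[0][2]:.2f}")
print(f"P(Brown | Plato) = {cond[0][3]:.2f}")
```

Plato and Smith have identical columns, so their correlation is 1, while Brown's constant column gets 0 from the zero-variance guard. Note that the conditional-probability matrix, unlike the raw counts, is not symmetric.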



Graphical Techniques

  • Three Methodologies

    • Multi-dimensional scaling

    • Self-organizing maps

    • Pathfinder networks



MDS Methodology

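  • Given the original distances (or similarities), estimate coordinates whose inter-point distances reproduce them

  • The computed distances should correspond to the original distances

    • Stress measures how far the computed distances depart from the original ones

      • Adding dimensions can lower stress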
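A small Python sketch of the stress criterion, using a metric variant of Kruskal's stress-1 (original distances compared directly with map distances, with no monotone regression). The three-item distance matrix is hypothetical. It illustrates the "added dimensions" point: three mutually equidistant items cannot be placed on a line with zero stress, but a 2-D triangle fits them exactly.

```python
import math


def stress(original, coords):
    """Metric variant of Kruskal's stress-1.

    original: symmetric matrix of target distances
    coords:   list of points (tuples) in any number of dimensions
    """
    n = len(coords)
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])   # distance in the map
            num += (original[i][j] - d) ** 2      # misfit against the data
            den += d ** 2                         # normalization
    return math.sqrt(num / den)


# Three mutually equidistant items (hypothetical data).
tri = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
line_1d = [(0.0,), (1.0,), (2.0,)]
plane_2d = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(round(stress(tri, line_1d), 3), round(stress(tri, plane_2d), 3))  # 0.408 0.0
```

An MDS program searches for the coordinates that minimize this kind of quantity. The sketch only scores configurations supplied by hand.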



Self-Organizing Maps (SOMs)

  • Also known as Kohonen Maps

  • Based on Neural Networks

    • Inspired by biological neurons (“wetware”)

      • robust techniques

    • If categories are known

      • supervised technique

        • backpropagation learning

    • If categories are sought

      • unsupervised technique

        • competitive learning


SOMs

  • Given a 2-D grid of nodes

    • each node has N weights

    • each vector (row) has N terms

    • map each input vector to a node

  • Similar to vector quantization (VQ)


SOMs Generation

  • nodes initially given random weights

  • randomly sample an input vector

    • row of co-occurrence matrix

    • with replacement

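  • find the node closest to the vector (the best-matching node)

    • Euclidean distance

  • update node weights

    • node weight = node weight + gain term * (input vector - node weight)

    • also update the nodes in its “neighborhood”

  • “cool” (gradually reduce) the gain term and the neighborhood size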

  • repeat…
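The generation steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. The bubble neighborhood (every node within `radius` grid steps gets the same update), the linear cooling of gain and radius, the 3x3 grid, and the helper names `train_som`, `best_matching_unit`, and `map_labels` are all assumptions. The last helper mirrors the display step of plotting each label at its node.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Grid position whose weight vector is closest (Euclidean) to x."""
    return min(nodes, key=lambda pos: math.dist(nodes[pos], x))


def train_som(data, rows, cols, iterations=500, gain0=0.5, seed=0):
    """Kohonen-style training on the rows of a co-occurrence matrix (sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    # Each grid node starts with random weights, one per input term.
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        frac = t / iterations
        gain = gain0 * (1.0 - frac)                 # "cool" the gain term
        radius = max(radius0 * (1.0 - frac), 0.5)   # shrink the neighborhood
        x = rng.choice(data)                        # sample a row, with replacement
        bmu = best_matching_unit(nodes, x)
        for pos, w in nodes.items():
            if math.dist(pos, bmu) <= radius:       # bubble neighborhood
                for i in range(dim):
                    w[i] += gain * (x[i] - w[i])    # move toward the input
    return nodes


def map_labels(nodes, data, labels):
    """Place each label on its best-matching node."""
    placement = {}
    for label, x in zip(labels, data):
        placement.setdefault(best_matching_unit(nodes, x), []).append(label)
    return placement


labels = ["Plato", "Aristotle", "Smith", "Brown"]
rows = [[3, 2, 3, 1], [2, 2, 2, 1], [3, 2, 3, 1], [1, 1, 1, 1]]
som = train_som(rows, 3, 3, iterations=300, seed=1)
print(map_labels(som, rows, labels))
```

Because Plato and Smith have identical rows in the toy matrix, they always land on the same node. Where the others land depends on the random seed, which reflects the slide's point that SOMs carry no guarantee of a unique solution.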



Pathfinder Networks

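  • Uses graph notation

    • nodes = authors

    • edges = weighted by co-citation counts

  • Co-occurrence data form a complete network (weighted, undirected)

[Figure: network for Plato, Smith, and Aristotle; edge weights Plato - Smith = 3, Plato - Aristotle = 2, Aristotle - Smith = 2]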


Pathfinder Networks Generation

  • A Pathfinder network is generated by varying two parameters:

    • distance (r)

    • triangle inequality (q)


Pathfinder Distance

  • Uses the Minkowski metric for the length of a path whose links have weights $e_i$:

    $d = \left( \sum_i e_i^{\,r} \right)^{1/r}$

  • Example

    • $e_1 = 3$, $e_2 = 4$

    • $r = 1 \Rightarrow d = 3 + 4 = 7$

      • Driving distance / ratio data

    • $r = 2 \Rightarrow d = (9 + 16)^{1/2} = 5$

      • Euclidean distance

    • $r \to \infty \Rightarrow d = \max(3, 4) = 4$

      • ordinal data

      • rank rather than value
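The three cases of the Minkowski path length can be checked with a few lines of Python. The function name `minkowski` is illustrative. Passing `math.inf` selects the limiting case, where the path length is the largest link weight.

```python
import math


def minkowski(weights, r):
    """Length of a path with the given link weights under Minkowski parameter r."""
    if math.isinf(r):
        return max(weights)          # r -> infinity: only the largest link counts
    return sum(w ** r for w in weights) ** (1.0 / r)


print(minkowski([3, 4], 1), minkowski([3, 4], 2), minkowski([3, 4], math.inf))  # 7.0 5.0 4
```

As r grows, the value approaches the maximum. For example, `minkowski([3, 4], 50)` is already within 0.001 of 4. This is why large r makes only the rank order of the weights matter.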


Pathfinder Triangle Inequality

  • A required property of a metric definition

    $d(i,j) \le d(i,k) + d(k,j)$

  • But may not be justified

    • in personal judgments

      • If a is similar to b, and b is similar to c, there may be no transitive judgment of similarity from a to c

    • in set intersections

      • Even though Smith and Jones appear together 12 times, and Jones and Brown appear together 5 times, the overlap between Smith and Brown cannot be predicted
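The set-intersection point can be made concrete with two hypothetical corpora, each given as the sets of document IDs in which each author is cited. Both corpora produce the same Smith - Jones and Jones - Brown counts as the bullet above, yet the Smith - Brown overlap ranges from 0 to 5:

```python
def together(a, b):
    """Number of documents citing both authors."""
    return len(a & b)


# Hypothetical document-ID sets shared by both corpora.
jones = set(range(20))
smith = set(range(12))

# Corpus A: Brown is cited only in documents without Smith.
brown_a = set(range(12, 17))
# Corpus B: Brown is cited only in documents with Smith.
brown_b = set(range(5))

print(together(smith, jones), together(jones, brown_a), together(jones, brown_b))  # 12 5 5
print(together(smith, brown_a), together(smith, brown_b))  # 0 5
```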


Pathfinder Triangle Inequality

  • Defines the q-triangular inequality

    • check paths of up to q links to determine whether the inequality is met

      • minimum q is 2

      • maximum q is n - 1

        • full compliance with the triangle inequality

    • the larger q is, the fewer links survive



Pathfinder Network Creation

  • PFNet (r, q)

    • Examine all paths of q or fewer links.

    • Use the Minkowski metric with parameter r to compute each path's length.

    • If an indirect path has a lower weight than the direct edge, remove the edge.
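A minimal Python sketch of PFNet(r, q). It assumes the link weights are distances (smaller means closer) and that `math.inf` marks a missing link. `combine` and `pfnet` are hypothetical helper names. The sketch grows the shortest r-metric path lengths one link at a time, up to q links. It then keeps a direct link only when no indirect path is strictly shorter.

```python
import math

INF = math.inf


def combine(a, b, r):
    """Minkowski length of a path made by joining segments of lengths a and b."""
    if math.isinf(a) or math.isinf(b):
        return INF
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)


def pfnet(weights, r, q):
    """Edges (i, j), i < j, kept by PFNet(r, q) for a symmetric distance matrix."""
    n = len(weights)
    q = min(q, n - 1)
    best = [row[:] for row in weights]       # shortest lengths using at most 1 link
    for _ in range(q - 1):                   # extend to at most 2, 3, ..., q links
        nxt = [row[:] for row in best]
        for i in range(n):
            for j in range(n):
                for m in range(n):
                    if m == i or m == j:
                        continue
                    cand = combine(best[i][m], weights[m][j], r)
                    if cand < nxt[i][j]:
                        nxt[i][j] = cand
        best = nxt
    # Keep a direct link only if no path of at most q links is shorter.
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if not math.isinf(weights[i][j]) and weights[i][j] <= best[i][j]}


# Smith = 0, Jones = 1, Brown = 2 (assumed: Smith-Brown = 4, Jones-Brown = 3).
w = [[0, 5, 4], [5, 0, 3], [4, 3, 0]]
for r in (1, 2, INF):
    print(r, sorted(pfnet(w, r, q=2)))
```

This reproduces the Smith / Jones / Brown example: the Smith - Jones link (0, 1) survives for r = 1 and r = 2 and is dropped for r = infinity. Co-citation counts are similarities, so applying the sketch to them would first require converting counts into distances (for example, by taking reciprocals). That conversion is an assumption here, not necessarily the authors' procedure.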


Pathfinder - Example

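[Figure: three-node network with q = 2; the Smith - Jones link has weight 5, and the two links through Brown have weights 4 and 3]

r = 1 => path through Brown = 4 + 3 = 7 > 5, so Smith - Jones is kept

r = 2 => path through Brown = $(16 + 9)^{1/2} = 5$, which is not less than 5, so Smith - Jones is kept

r = infinity => path through Brown = max(4, 3) = 4 < 5, so Smith - Jones is removed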


Comparison of Techniques

  • MDS

    • Reduces dimensions / reveals clusters

      • 2D may be insufficient

      • measurement may not be Euclidean

  • SOM

    • robust

      • no guarantee of convergence/unique solution

  • Pathfinder

    • does not assume ratio data/triangle inequality

      • connections, rather than positions, are what matter

      • additional methodology needed for display


Comparison of Techniques

  • Similarities

    • Spatial models

  • Differences

    • use of visual space

    • semantic meaning

      • as related to data

        • research in progress


Graphical Display of Methodologies

  • MDS

    • assumes that 2 dimensions are sufficient

      • x, y coordinates for each point are already defined

  • SOM

    • grid defines the 2D surface

      • plot each label at its best-matching node

  • Pathfinder

    • only defines the nodes and links

      • need additional methodologies

        • Spring-embedder models

          • Kamada and Kawai (1989)

          • Fruchterman and Reingold (1991)

          • Davidson and Harel (1996)
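A compact sketch of a spring-embedder layout that could position the nodes and links a Pathfinder network produces. It follows the force model commonly attributed to Fruchterman and Reingold (1991): repulsion k²/d between all node pairs, attraction d²/k along links, and a cooling "temperature" that caps each move. The model is simplified here. The frame size, k = sqrt(area / n), the linear cooling schedule, and the iteration count are assumptions, and this is not the layout code behind the figures in this presentation.

```python
import math
import random


def fruchterman_reingold(n, edges, width=1.0, height=1.0, iterations=200, seed=0):
    """Force-directed 2-D layout for n nodes and a list of (i, j) links (sketch)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-width / 2, width / 2), rng.uniform(-height / 2, height / 2)]
           for _ in range(n)]
    k = math.sqrt(width * height / n)      # ideal distance between nodes
    t0 = width / 10                        # initial temperature: max move per step
    for step in range(iterations):
        temp = t0 * (1 - step / iterations)          # cool linearly
        disp = [[0.0, 0.0] for _ in range(n)]
        # Every pair of nodes repels with force k^2 / d.
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[i][0] += dx / d * f
                disp[i][1] += dy / d * f
                disp[j][0] -= dx / d * f
                disp[j][1] -= dy / d * f
        # Linked nodes attract with force d^2 / k.
        for i, j in edges:
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[i][0] -= dx / d * f
            disp[i][1] -= dy / d * f
            disp[j][0] += dx / d * f
            disp[j][1] += dy / d * f
        # Move along the net force, capped by the temperature, staying in the frame.
        for i in range(n):
            length = math.hypot(disp[i][0], disp[i][1]) or 1e-9
            move = min(length, temp)
            pos[i][0] = min(width / 2, max(-width / 2, pos[i][0] + disp[i][0] / length * move))
            pos[i][1] = min(height / 2, max(-height / 2, pos[i][1] + disp[i][1] / length * move))
    return [tuple(p) for p in pos]


# Hypothetical 4-node chain, e.g. the links kept by a PFNet.
for x, y in fruchterman_reingold(4, [(0, 1), (1, 2), (2, 3)]):
    print(f"{x:+.3f} {y:+.3f}")
```

The edge list could come straight from a PFNet computation. The layout only fixes positions for display, in line with the slide's point that Pathfinder itself defines just the nodes and links.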


Graphical Comparison of Three Methods

  • Data

    • Institute for Scientific Information

    • Arts and Humanities Database (AHCI)

      • 1988 - 1997

      • 1.26 million records

  • Example:

    • Given Plato, find related authors

      • Interface described in IV 2000 Paper

      • CSNA 2000 Paper

        • (Lin, Buzydlowski, White)


25 Authors Co-cited with Plato

  • PLATO (4928)

  • ARISTOTLE (1861)

  • PLUTARCH (838)

  • CICERO (699)

  • HOMER (627)

  • BIBLE (552)

  • EURIPIDES (515)

  • ARISTOPHANES (474)

  • XENOPHON (459)

  • AUGUSTINE (432)

  • HERODOTUS (425)

  • KANT-I (385)

  • AESCHYLUS (374)

  • SOPHOCLES (363)

  • THUCYDIDES (363)

  • OVID (334)

  • HESIOD (325)

  • DIOGENES-LAERTIUS (317)

  • HEIDEGGER-M (312)

  • DERRIDA-J (304)

  • PINDAR (292)

  • NIETZSCHE-F (278)

  • HEGEL-GWF (264)

  • VERGIL (259)

  • AQUINAS-T (255)


300 Pair-wise co-citations

  • 1: PLATO AND ARISTOTLE - 1940 docs

  • 2: PLATO AND PLUTARCH - 872 docs

    ⋮

  • 300: VERGIL AND AQUINAS-T - 38 docs
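The figure of 300 follows from choosing 2 of the 25 authors, the same C(n, 2) count used in the derivation slide:

```latex
\binom{25}{2} = \frac{25 \times 24}{2} = 300
```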


Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way...


[Figure: 2D MDS map of 25 authors co-cited with Plato]


[Figure: PFNet of 25 authors co-cited with Plato]


Conclusion

  • Slides available at:

    • faculty.cis.drexel.edu/~jbuzydlo/



Bibliography

  • Chen, C., Information Visualization and Virtual Environments, 1999.

  • Cleveland, W. S., Visualizing Data, Hobart Press, 1993.

  • Davidson, R., Harel, D., Drawing Graphs Nicely Using Simulated Annealing, ACM Transactions on Graphics, 15(4): 301-331, 1996.

  • Fruchterman, T. M. J., Reingold, E. M., Graph Drawing by Force-Directed Placement, Software: Practice and Experience, 21: 1129-1164, 1991.

  • Kamada, T., Kawai, S., An Algorithm for Drawing General Undirected Graphs, Information Processing Letters, 31(1): 7-15, 1989.

