

A Comparison of Graphical Techniques for the Display of Co-Occurrence Data

Jan W. Buzydlowski, Xia Lin, Howard D. White

College of Information Science and Technology

Drexel University

Philadelphia, PA 19104

USA


Information Visualization

  • (Data) Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way. [Cleveland, 1993]

  • (Information) Visualization has two aspects: structural modeling and graphic representation. [C. Chen, 1999]

    • data - model - display


Visualization Overview

  • Model - Display

    • Co-Occurrence Model

    • 3 Graphical Displays

  • Data

    • Co-citation counts from the Institute for Scientific Information, Philadelphia, PA

      • Obtained from a 10-year Arts & Humanities Citation Index database given to Drexel by ISI for research purposes


Co-Occurrence Model

  • Examples

  • Derivation

  • Metrics


Co-Occurrence Data - Example 1

  • Market Basket Analysis

    • a shopping cart holds items purchased

      • e.g., milk, bread, razor blades, newspaper

  • Over all the sales for one day

    • what items are purchased together

      • how can we arrange the items in the store

        • Pampers and beer on Thursdays...


Co-Occurrence Data - Example 2

  • Author Co-citation Analysis (ACA)

    • Bibliographic data on a given article holds, e.g.,

      • title, keywords, abstract, citations to other documents

    • An article might cite, e.g.:

      • Plato, Aristotle, Smith, Brown

  • Over a given set of many citing articles

    • Count how many times each pair of authors was cited together

    • Resulting co-citation count shows common intellectual interest


Co-Occurrence Derivation

  • For a given data set (N = 4 unique terms)

    • Article 1: Plato, Aristotle, Smith

    • Article 2: Plato, Smith

    • Article 3: Plato, Aristotle, Smith, Brown

  • The following co-citations (C(4,2) = 6) are found

    • COMBINATION            COUNT   ARTICLES
    • Plato and Smith        3       1, 2, 3
    • Plato and Aristotle    2       1, 3
    • Plato and Brown        1       3
    • Aristotle and Smith    2       1, 3
    • Aristotle and Brown    1       3
    • Smith and Brown        1       3
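
The derivation above can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code: the names `articles` and `cocitation_counts` are illustrative, and pairs are stored as alphabetically sorted tuples purely for convenience.

```python
from collections import Counter
from itertools import combinations

# The three citing articles from the slide, keyed by article number.
articles = {
    1: ["Plato", "Aristotle", "Smith"],
    2: ["Plato", "Smith"],
    3: ["Plato", "Aristotle", "Smith", "Brown"],
}

def cocitation_counts(articles):
    """Count, for every pair of cited authors, the articles citing both."""
    counts = Counter()
    for cited in articles.values():
        # set() guards against an author listed twice in one article;
        # sorted() gives every pair one canonical ordering.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(articles)
for (a, b), n in counts.most_common():
    print(f"{a} and {b}: {n}")
```

With N = 4 unique authors there are at most C(4,2) = 6 pairs, and all six occur in this small data set.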


Co-Occurrence Measures

  • Raw counts

  • Additional information

    • Correlations

      • Replace each cell with the correlation between the corresponding pair of columns

    • Conditional Probability

      • Compute each cell by dividing the pair's co-occurrence count by the total number of occurrences of one member of the pair
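
As a rough illustration of the two derived measures, the sketch below builds the co-occurrence matrix for the Plato/Aristotle/Smith/Brown example and computes a conditional probability and a column correlation. Two points are assumptions rather than anything stated in the slides: the diagonal is filled with each author's own citation count, and Pearson's coefficient is used as the correlation measure. The function names are illustrative.

```python
import math

authors = ["Plato", "Aristotle", "Smith", "Brown"]

# Symmetric co-citation matrix for the three-article example.
# ASSUMPTION: the diagonal holds the number of articles citing that author.
M = [
    [3, 2, 3, 1],  # Plato
    [2, 2, 2, 1],  # Aristotle
    [3, 2, 3, 1],  # Smith
    [1, 1, 1, 1],  # Brown
]

def conditional(i, j):
    """P(author i is cited | author j is cited) = count(i, j) / count(j)."""
    return M[i][j] / M[j][j]

def column(j):
    return [row[j] for row in M]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant column
    return sxy / math.sqrt(sxx * syy)

print(conditional(1, 0))               # P(Aristotle | Plato)
print(conditional(0, 1))               # P(Plato | Aristotle)
print(pearson(column(0), column(2)))   # Plato vs. Smith columns
```

Note that the conditional probability is not symmetric, and that the choice of diagonal value changes the correlations, so it should be made deliberately.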


Co-Occurrence Structure - Example


Graphical Techniques

  • Three Methodologies

    • Multi-dimensional scaling

    • Self-organizing maps

    • Pathfinder networks


MDS


MDS Methodology

  • Given original distances (or similarities), estimate coordinates that could reproduce those distances

  • The computed distances should correspond to the original distances

    • Stress measures the mismatch between the two

      • Adding dimensions can reduce stress
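
To make the role of stress concrete, here is a small sketch that scores two candidate layouts against the same target distances. The slides do not say which stress formula was used; Kruskal's stress-1 is assumed here, and the target distances and both layouts are hypothetical values chosen so the arithmetic is easy to follow.

```python
import math
from itertools import combinations

def stress(coords, target):
    """Kruskal-style stress-1 between layout distances and target distances.

    coords: list of coordinate tuples, one per item.
    target: dict mapping (i, j) with i < j to the original distance.
    """
    num = den = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])   # distance in the layout
        num += (d - target[(i, j)]) ** 2
        den += d ** 2
    return math.sqrt(num / den)

# Hypothetical original distances between three items.
target = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}

line = [(0.0,), (3.0,), (4.0,)]                 # a 1-D layout
plane = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]    # a 2-D layout

print("1-D stress:", stress(line, target))
print("2-D stress:", stress(plane, target))
```

The one-dimensional layout cannot satisfy all three distances at once, while the added second dimension lets the layout reproduce them exactly, which is the trade-off the slide points to.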


SOM


Self-Organizing Maps (SOMs)

  • Also known as Kohonen Maps

  • Based on Neural Networks

    • Related to wetware

      • robust techniques

    • If categories are known

      • supervised technique

        • backpropagation learning

    • If categories are sought

      • unsupervised technique

        • competitive learning


SOMs

  • Given a 2-D grid of nodes

    • each node has N weights

    • each vector (row) has N terms

    • map each input vector to a node

  • Similar to vector quantization (VQ)


SOMs Generation

  • nodes initially given random weights

  • randomly sample an input vector

    • row of co-occurrence matrix

    • with replacement

  • find a node closest to vector

    • Euclidean distance

  • update node weights

    • node weight = node weight + gain term * (input vector - node weight)

    • update “neighborhood”

  • “cool” gain term and neighborhood

  • repeat…
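
The generation steps above can be sketched as a short training loop. This is an illustrative sketch under several assumptions that are not in the slides: a linear cooling schedule for both gain and neighborhood radius, a circular neighborhood measured on the grid, and hypothetical parameter names and defaults (`steps`, `gain0`, `radius0`, `seed`). The input rows are the small co-occurrence matrix from the Plato example, with own-citation counts assumed on the diagonal.

```python
import math
import random

def best_node(weights, vec):
    """Grid position of the node whose weights are closest (Euclidean) to vec."""
    return min(weights, key=lambda node: math.dist(weights[node], vec))

def train_som(rows, grid_w, grid_h, steps=500, gain0=0.5, radius0=None, seed=0):
    rng = random.Random(seed)
    n = len(rows[0])
    radius0 = radius0 or max(grid_w, grid_h) / 2
    # Nodes are initially given random weights.
    weights = {(x, y): [rng.random() for _ in range(n)]
               for x in range(grid_w) for y in range(grid_h)}
    for t in range(steps):
        frac = 1.0 - t / steps            # ASSUMPTION: linear "cooling"
        gain, radius = gain0 * frac, radius0 * frac
        vec = rng.choice(rows)            # sample a row, with replacement
        winner = best_node(weights, vec)
        for node, w in weights.items():
            if math.dist(node, winner) <= radius:   # winner and its neighborhood
                for k in range(n):
                    w[k] += gain * (vec[k] - w[k])  # move weights toward input
    return weights

# Rows of the example co-occurrence matrix (Plato, Aristotle, Smith, Brown).
rows = [[3, 2, 3, 1], [2, 2, 2, 1], [3, 2, 3, 1], [1, 1, 1, 1]]
som = train_som(rows, grid_w=3, grid_h=3)
for name, row in zip(["Plato", "Aristotle", "Smith", "Brown"], rows):
    print(name, "->", best_node(som, row))
```

Because the result depends on the random initialization and sampling order, different seeds can give different maps, which matches the later remark that SOMs do not guarantee a unique solution.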


PF Nets


Pathfinder Networks

  • Uses graph notation

    • nodes = authors

    • edges = co-citation counts

  • Co-occurrence is a complete network (weighted, undirected)

[Figure: example network with edges Plato - Smith (3), Plato - Aristotle (2), and Aristotle - Smith (2)]


Pathfinder Networks Generation

  • Pathfinder Network is generated by varying the parameters:

    • distance (r)

    • triangle inequality (q)


Pathfinder Distance

  • Uses Minkowski metric:

    d = ( eir )1/r

  • Example

    • e1 = 3, e2 = 4

    • r = 1 => d = 3 + 4 = 7

      • Driving distance / ratio data

    • r = 2 => d = (9 + 16)^(1/2) = 5

      • Euclidean Distance

    • r (approaches) infinity => d = max(3, 4) = 4

      • ordinal data

      • rank rather than value
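
The three cases in this example can be checked with a small helper. This is a minimal sketch; the function name `minkowski` is illustrative, and the limiting case is handled by passing `math.inf` for r.

```python
import math

def minkowski(edges, r):
    """Minkowski length of a path whose edge weights are given in `edges`."""
    if math.isinf(r):
        return max(edges)                 # limit as r approaches infinity
    return sum(e ** r for e in edges) ** (1.0 / r)

path = [3, 4]
for r in (1, 2, math.inf):
    print(f"r = {r}: d = {minkowski(path, r)}")
```

For r = infinity only the largest edge matters, which is why that setting suits ordinal data: the result depends on the rank order of the weights, not on their magnitudes.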


Pathfinder Triangle Inequality

  • A required property of a metric definition

    d(i,j) ≤ d(i,k) + d(k,j)

  • But may not be justified

    • in personal judgments

      • If a is similar to b, and b is similar to c, there may be no transitive judgment of similarity from a to c

    • in set intersections

      • Even though Smith and Jones appear together 12 times, and Jones and Brown appear together 5 times, the overlap between Smith and Brown cannot be predicted
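
Whether a given set of dissimilarities actually satisfies the triangle inequality can be tested directly. The sketch below is only an illustration: the items `a`, `b`, `c` and their dissimilarities are hypothetical, chosen so that `a` is close to `b` and `b` is close to `c` while `a` and `c` are far apart, as in the personal-judgment case above.

```python
from itertools import permutations

def triangle_violations(dist):
    """Return (i, j, k) triples where d(i, j) > d(i, k) + d(k, j).

    dist maps frozenset({u, v}) to the dissimilarity of every pair.
    """
    nodes = sorted({n for pair in dist for n in pair})

    def d(u, v):
        return dist[frozenset((u, v))]

    return [(i, j, k) for i, j, k in permutations(nodes, 3)
            if i < j and d(i, j) > d(i, k) + d(k, j)]

# Hypothetical dissimilarities: a~b and b~c are close, a and c are not.
dist = {
    frozenset(("a", "b")): 1.0,
    frozenset(("b", "c")): 1.0,
    frozenset(("a", "c")): 5.0,
}
violations = triangle_violations(dist)
print(violations)
```

Each reported triple names a direct link (i, j) that is longer than the detour through k, which is exactly the situation Pathfinder inspects when deciding whether to keep an edge.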


Pathfinder Triangle Inequality

  • Defines q-triangular

    • check paths of length q to determine if inequality is met

      • minimum is 2

      • maximum is n - 1

        • full compliance

    • the longer the length, the fewer the connections


Pathfinder Example


Pathfinder Network Creation

  • PFNet (r, q)

    • Examine all paths of length q or less.

    • Use Minkowski Metric with parameter r to compute path length.

    • If a path of less weight is found, then remove the edge.


Pathfinder - Example

[Figure: three-node network; the Smith - Jones edge has weight 5 and the two edges to Brown have weights 4 and 3; q = 2]

r = 1 => Smith - Jones is kept

r = 2 => Smith - Jones is kept

r = infinity => Smith - Jones is removed
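
The three outcomes above can be reproduced with a compact PFNet(r, q) sketch. It is a simplified illustration, not the authors' implementation. Assumptions: edge weights are treated as distances (co-citation counts, being similarities, would first need converting), the network is complete, and the weights 4 and 3 are assigned to Smith - Brown and Jones - Brown respectively, an assignment that does not affect the result for the Smith - Jones edge.

```python
import math
from itertools import combinations

def combine(a, b, r):
    """Minkowski length of a path made of two parts with lengths a and b."""
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def pfnet(dist, r, q):
    """Edges kept by PFNet(r, q); dist maps frozenset({u, v}) to a distance."""
    nodes = sorted({n for pair in dist for n in pair})

    def w(u, v):
        return dist[frozenset((u, v))]

    # best[(u, v)]: minimum path length over paths of at most k edges (k = 1).
    best = {(u, v): w(u, v) for u in nodes for v in nodes if u != v}
    for _ in range(q - 1):                # extend the limit to k + 1 edges
        nxt = dict(best)
        for (u, v) in best:
            for m in nodes:
                if m != u and m != v:
                    cand = combine(w(u, m), best[(m, v)], r)
                    nxt[(u, v)] = min(nxt[(u, v)], cand)
        best = nxt
    # Keep an edge only if no path of q or fewer edges is shorter.
    return {frozenset((u, v)) for u, v in combinations(nodes, 2)
            if w(u, v) <= best[(u, v)] + 1e-9}

dist = {
    frozenset(("Smith", "Jones")): 5,
    frozenset(("Smith", "Brown")): 4,   # ASSUMED assignment of 4 and 3
    frozenset(("Jones", "Brown")): 3,
}
for r in (1, 2, math.inf):
    kept = frozenset(("Smith", "Jones")) in pfnet(dist, r, q=2)
    print(f"r = {r}: Smith - Jones {'kept' if kept else 'removed'}")
```

For r = 1 the detour through Brown has length 7 and for r = 2 it has length 5, neither of which is shorter than the direct edge of 5; for r = infinity the detour has length max(4, 3) = 4, so the direct edge is removed.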


Comparison of Techniques

  • MDS

    • Reduces dimensions / reveals clusters

      • 2D may be insufficient

      • measurement may not be Euclidean

  • SOM

    • robust

      • no guarantee of convergence/unique solution

  • Pathfinder

    • does not assume ratio data/triangle inequality

      • connections rather than positions are important

      • additional methodology needed for display


Comparison of Techniques

  • Similarities

    • Spatial models

  • Differences

    • use of visual space

    • semantic meaning

      • as related to data

        • research in progress


Graphical Display of Methodologies

  • MDS

    • assume that 2 dimensions are sufficient

      • x, y for each point already defined

  • SOM

    • grid defines the 2D surface

      • plot each label with the appropriate node

  • Pathfinder

    • only defines the nodes and links

      • need additional methodologies

        • Spring-embedder models

          • Kamada and Kawai (1989)

          • Fruchterman and Reingold (1991)

          • Davidson and Harel (1996)


Graphical Comparison of Three Methods

  • Data

    • Institute for Scientific Information

    • Arts and Humanities Database (AHCI)

      • 1988 - 1997

      • 1.26 million records

  • Example:

    • Given Plato, find related authors

      • Interface described in IV 2000 Paper

      • CSNA 2000 Paper

        • (Lin, Buzydlowski, White)


25 Authors Co-cited with Plato

  • PLATO (4928)
  • ARISTOTLE (1861)
  • PLUTARCH (838)
  • CICERO (699)
  • HOMER (627)
  • BIBLE (552)
  • EURIPIDES (515)
  • ARISTOPHANES (474)
  • XENOPHON (459)
  • AUGUSTINE (432)
  • HERODOTUS (425)
  • KANT-I (385)
  • AESCHYLUS (374)
  • SOPHOCLES (363)
  • THUCYDIDES (363)
  • OVID (334)
  • HESIOD (325)
  • DIOGENES-LAERTIUS (317)
  • HEIDEGGER-M (312)
  • DERRIDA-J (304)
  • PINDAR (292)
  • NIETZSCHE-F (278)
  • HEGEL-GWF (264)
  • VERGIL (259)
  • AQUINAS-T (255)


300 Pair-wise co-citations

  • 1: PLATO AND ARISTOTLE - 1940 docs

  • 2: PLATO AND PLUTARCH - 872 docs

  • ...

  • 300: VERGIL AND AQUINAS-T - 38 docs


Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way...


2D MDS map of 25 authors co-cited with Plato


PFNet of 25 authors co-cited with Plato


Conclusion

  • Slides available at:

    • faculty.cis.drexel.edu/~jbuzydlo/

    • [email protected]


Bibliography

  • Chen, Chaomei, Information Visualization and Virtual Environments, 1999.

  • Cleveland, William S., Visualizing Data, Hobart Press, 1993.

  • Davidson, R, Harel, D, Drawing Graphs Nicely Using Simulated Annealing, ACM Transactions on Graphics, 15(4): 301-31 (1996).

  • Fruchterman, TMJ, Reingold, EM, Graph Drawing by Force-Directed Placement, Software Practice and Experience, 21: 1129-64 (1991).

  • Kamada, T, Kawai, S, An Algorithm for Drawing General Undirected Graphs, Information Processing Letters, 31(1): 7-15 (1989).

