Term Co-occurrence Analysis as an Interface to Digital Libraries

Jan W. Buzydlowski

Howard D. White

Xia Lin

College of Information Science and Technology

Drexel University, Philadelphia, Pennsylvania, USA


Digital Library Research

  • First Wave

    • How to store it

  • Next Wave

    • How to retrieve it (IR)

      • Text Mining

      • Visual Information Retrieval Interface (VIRI)

  • Term Co-occurrence Analysis (TCA)

    • Co-occurrence vs. lexical associations

    • Maps vs. lists


Term Definition

  • Unit of Analysis

    • Words

    • Documents

    • Authors

    • Journals

  • Section of Focus

    • Abstract/Text

    • Title

    • Bibliography

    • Keywords


Example

  • Words in Title: Term, Co-occurrence, Analysis, Interface, Digital, Library

  • Authors in Bibliography: Salton-G, Chen-C, White-HD, Ding-Y, Cleveland-W, McCain-K, Lin-X, Schvaneveldt-R, Kamada-T, Fruchterman-T


Term Co-occurrence Methodology

  • User determines which terms are of interest

    • Via a seed term

    • From a pre-defined list

  • The system returns the pair-wise co-occurrence counts of the terms over the collection of records


Example

  • Unit: Author; Section: Bibliography

  • User Supplied List: Plato, Aristotle, Smith, Brown

  • For a given data set (N = 4 unique terms)

    • Article 1: Plato, Aristotle, Smith, …

    • Article 2: Plato, Smith, …

    • Article 3: Plato, Aristotle, Smith, Brown, …

  • The following co-citations (C(4,2) = 6) are found

    COMBINATION            COUNT   ARTICLES
    Plato and Smith        3       1, 2, 3
    Plato and Aristotle    2       1, 3
    Plato and Brown        1       3
    Aristotle and Smith    2       1, 3
    Aristotle and Brown    1       3
    Smith and Brown        1       3
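
The counting in this example can be reproduced with a short sketch. It assumes each article's bibliography is held in memory as a set of author names; the class name `PairCounts` and the method `count` are hypothetical and not part of the system described in the slides.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PairCounts {
    /** For every pair of terms of interest, counts the records that contain both. */
    static Map<String, Integer> count(List<String> terms, List<Set<String>> records) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                int n = 0;
                for (Set<String> record : records) {
                    if (record.contains(terms.get(i)) && record.contains(terms.get(j))) {
                        n++;
                    }
                }
                counts.put(terms.get(i) + " and " + terms.get(j), n);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The user-supplied list and the three articles from the example.
        List<String> terms = List.of("Plato", "Aristotle", "Smith", "Brown");
        List<Set<String>> records = List.of(
            Set.of("Plato", "Aristotle", "Smith"),
            Set.of("Plato", "Smith"),
            Set.of("Plato", "Aristotle", "Smith", "Brown"));
        count(terms, records).forEach((pair, n) -> System.out.println(pair + ": " + n));
    }
}
```

For N terms the nested loops visit N(N-1)/2 pairs, which is the C(4,2) = 6 combinations listed for the four authors.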


Term Co-occurrence Significance

  • Frequent co-occurrence of a term pair within a set of documents indicates a strong association between those terms, whereas infrequent co-occurrence indicates a weak one

    • The association you would expect is borne out by the frequency

    • The frequency you compute suggests a level of association

  • Pain and Management vs. Pain and Obtainment

  • Plato and Aristotle vs. Plato and Cher

  • Science and Nature vs. Science and National Tattler

  • A and B vs. C and D


Term Co-occurrence Uses

  • Allows a user to get a “foothold” with just one term

    • One seed term returns many other related terms

  • Allows a user to get an “overview” with user-supplied/system-supplied terms

    • Co-occurrence counts with visualization


Seeding

  • User types in

    • One term, e.g., Plato

    • Boolean expression, e.g., Plato AND Brown

  • System supplies top n terms, in ranked order of frequency of co-occurrence with the initial term
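
Seeding can be illustrated with a minimal sketch. It assumes in-memory records and treats a multi-term seed as a Boolean AND (other Boolean operators are not handled); the class `Seeding` and the method `topTerms` are hypothetical names, and the alphabetical tie-break is an assumption the slides do not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Seeding {
    /**
     * Returns up to n terms ranked by how often they co-occur with the seed.
     * A multi-term seed is treated as a Boolean AND: only records that
     * contain every seed term are counted.
     */
    static List<String> topTerms(Set<String> seed, int n, List<Set<String>> records) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> record : records) {
            if (!record.containsAll(seed)) {
                continue; // record does not match the seed expression
            }
            for (String term : record) {
                if (!seed.contains(term)) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Highest count first; ties are broken alphabetically (an assumption).
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ranked.subList(0, Math.min(n, ranked.size()));
    }

    public static void main(String[] args) {
        List<Set<String>> records = List.of(
            Set.of("Plato", "Aristotle", "Smith"),
            Set.of("Plato", "Smith"),
            Set.of("Plato", "Aristotle", "Smith", "Brown"));
        System.out.println(topTerms(Set.of("Plato"), 2, records));
        System.out.println(topTerms(Set.of("Plato", "Brown"), 5, records));
    }
}
```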


Example

  • For the seed term Plato, the system returns:

  • ARISTOTLE

  • PLUTARCH

  • CICERO

  • HOMER

  • BIBLE

  • EURIPIDES

  • ARISTOPHANES

  • XENOPHON

  • AUGUSTINE

  • HERODOTUS

  • KANT-I

  • AESCHYLUS

  • SOPHOCLES

  • THUCYDIDES

  • OVID

  • HESIOD

  • DIOGENES-LAERTI

  • HEIDEGGER-M

  • DERRIDA-J

  • PINDAR

  • NIETZSCHE-F

  • HEGEL-GWF

  • VERGIL

  • AQUINAS-T


Need for Visualization

  • Given a list of user- / system-supplied terms

    • Find the frequency of co-occurrence of each pair-wise combination of terms

      • Plato AND Aristotle = 1,920

      • Plato AND Plutarch = 380

    • Too many numbers to take in at once

      • C(25, 2) = (25 * 24)/ 2 = 300 pairs

  • Three major visualization techniques

    • Multidimensional Scaling (MDS)

    • Self-Organizing (Kohonen) Maps (SOMs)

    • PathFinder Networks (PFNETs)
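
As one illustration of how such a technique reduces the pair-wise counts to something readable, the sketch below derives a Pathfinder network with the common parameters r = infinity and q = n - 1: a direct link is kept only if no indirect path has a smaller largest step. The co-occurrence counts are made up for illustration, and the conversion from counts to distances (`max + 1 - count`) is an assumption; the slides do not say how the described system does either step.

```java
import java.util.ArrayList;
import java.util.List;

final class Pathfinder {
    /** Hypothetical conversion: a higher count becomes a smaller distance. */
    static double[][] toDistances(int[][] counts) {
        int n = counts.length;
        int max = 0;
        for (int[] row : counts) {
            for (int c : row) {
                max = Math.max(max, c);
            }
        }
        double[][] dist = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                dist[i][j] = (i == j) ? 0 : max + 1 - counts[i][j];
            }
        }
        return dist;
    }

    /** Returns the links [i, j] (i < j) kept by PFNET(r = infinity, q = n - 1). */
    static List<int[]> links(double[][] dist) {
        int n = dist.length;
        double[][] best = new double[n][];
        for (int i = 0; i < n; i++) {
            best[i] = dist[i].clone();
        }
        // Floyd-Warshall variant: a path's length is its largest single step.
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    best[i][j] = Math.min(best[i][j], Math.max(best[i][k], best[k][j]));
                }
            }
        }
        List<int[]> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (dist[i][j] <= best[i][j]) {
                    kept.add(new int[] {i, j}); // no indirect path beats the direct link
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] terms = {"A", "B", "C", "D"};
        // Made-up symmetric co-occurrence counts for four terms.
        int[][] counts = {
            {0, 5, 1, 1},
            {5, 0, 4, 1},
            {1, 4, 0, 3},
            {1, 1, 3, 0}};
        for (int[] link : links(toDistances(counts))) {
            System.out.println(terms[link[0]] + " - " + terms[link[1]]);
        }
    }
}
```

With these counts, the six possible links reduce to the chain A - B, B - C, C - D, which is the kind of pruning that makes a map easier to read than a list of pair counts.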


Figure: White’s MDS map of 15 co-cited classificationists, ca. 1990. Point labels: P Arabie, JH Ward, JC Gower, M Wish, RN Shepard, RR Sokal, JB Kruskal, SC Johnson, PHA Sneath, JD Carroll, PE Green, JA Hartigan, HA Skinner, VE McGee, RK Blashfield.



Our System

  • Three-tiered

    • User interface

    • Server

    • Database

  • Real-time and interactive

  • Significant data sources

    • ISI AHCI

    • MedLine

  • Live interface for retrieval


User Interface - Seed


User Interface - SOM


Interface - PFNET


Interface - Visual Information Retrieval Interface (VIRI)


User Interface IV


Database Interface

  • API

    • String[] findRel(String, int)

    • int[] findOcc(String[])

  • Implemented on:

    • BRS

      • API via a wrapper

    • Oracle

      • API via JDBC

    • Noah

      • Specialized co-occurrence database

      • API via JNI
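
The slide lists only the two signatures, so what each call returns is not stated. The sketch below is one possible reading, not the system's actual code: it assumes `findRel` returns the top n terms related to a seed and `findOcc` returns the pair-wise counts for a term list in the order (0,1), (0,2), ..., (n-2,n-1). The `CoOccurrenceSource` interface, the `InMemorySource` class, and the parameter names are hypothetical; the in-memory class merely stands in for a BRS, Oracle, or Noah back end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ApiSketch {
    /** The two calls from the slide; the semantics in the comments are assumptions. */
    interface CoOccurrenceSource {
        String[] findRel(String seed, int n); // assumed: top n terms related to the seed
        int[] findOcc(String[] terms);        // assumed: counts for pairs (0,1), (0,2), ...
    }

    /** Hypothetical in-memory back end used only for illustration. */
    static final class InMemorySource implements CoOccurrenceSource {
        private final List<Set<String>> records;

        InMemorySource(List<Set<String>> records) {
            this.records = records;
        }

        private int countBoth(String a, String b) {
            int n = 0;
            for (Set<String> record : records) {
                if (record.contains(a) && record.contains(b)) {
                    n++;
                }
            }
            return n;
        }

        @Override
        public String[] findRel(String seed, int n) {
            Set<String> candidates = new TreeSet<>(); // alphabetical order
            for (Set<String> record : records) {
                if (record.contains(seed)) {
                    candidates.addAll(record);
                }
            }
            candidates.remove(seed);
            List<String> ranked = new ArrayList<>(candidates);
            // Stable sort: highest count first, ties stay alphabetical.
            ranked.sort(Comparator.comparingInt((String t) -> -countBoth(seed, t)));
            return ranked.subList(0, Math.min(n, ranked.size())).toArray(new String[0]);
        }

        @Override
        public int[] findOcc(String[] terms) {
            int n = terms.length;
            int[] counts = new int[n * (n - 1) / 2];
            int k = 0;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    counts[k++] = countBoth(terms[i], terms[j]);
                }
            }
            return counts;
        }
    }

    public static void main(String[] args) {
        CoOccurrenceSource source = new InMemorySource(List.of(
            Set.of("Plato", "Aristotle", "Smith"),
            Set.of("Plato", "Smith"),
            Set.of("Plato", "Aristotle", "Smith", "Brown")));
        String[] related = source.findRel("Plato", 3);
        System.out.println(Arrays.toString(related));
        System.out.println(Arrays.toString(source.findOcc(related)));
    }
}
```

Keeping the user interface and server behind a two-call interface of this kind is what would let the same front end run against different back ends.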


Future Plans

  • User Study

    • Preference

      • Type of map, etc.

    • Cognitive map

      • How well does the map match experts’ mental models

  • Larger datasets

  • Additional data sources

