
Term Co-occurrence Analysis as an Interface to Digital Libraries

Jan W. Buzydlowski, Howard D. White, Xia Lin. College of Information Science and Technology, Drexel University, Philadelphia, Pennsylvania, USA.


Presentation Transcript


  1. Term Co-occurrence Analysis as an Interface to Digital Libraries Jan W. Buzydlowski Howard D. White Xia Lin College of Information Science and Technology Drexel University, Philadelphia, Pennsylvania, USA

  2. Digital Library Research • First Wave • How to store it • Next Wave • How to retrieve it (IR) • Text Mining • Visual Information Retrieval Interface (VIRI) • Term Co-occurrence Analysis (TCA) • Co-occurrence vs. lexical associations • Maps vs. lists

  3. Term Definition • Unit of Analysis • Words • Documents • Authors • Journals • Section of Focus • Abstract/Text • Title • Bibliography • Keywords

  4. Example • Words in Title: Term, Co-occurrence, Analysis, Interface, Digital, Library • Authors in Bibliography: Salton-G, Chen-C, White-HD, Ding-Y, Cleveland-W, McCain-K, Lin-X, Schvaneveldt-R, Kamada-T, Fruchterman-T

  5. Term Co-occurrence Methodology • User determines which terms are of interest • Via a seed term • From a pre-defined list • The system returns the pair-wise co-occurrence counts of the terms over the collection of records

  6. Example • Unit: Author; Section: Bibliography • User Supplied List: Plato, Aristotle, Smith, Brown • For a given data set (N = 4 unique terms) • Article 1: Plato, Aristotle, Smith, … • Article 2: Plato, Smith, … • Article 3: Plato, Aristotle, Smith, Brown, … • The following co-citations (C(4,2) = 6) are found:

     COMBINATION            COUNT   ARTICLES
     Plato and Smith        3       1, 2, 3
     Plato and Aristotle    2       1, 3
     Plato and Brown        1       3
     Aristotle and Smith    2       1, 3
     Aristotle and Brown    1       3
     Smith and Brown        1       3
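The counting step in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cooccurrence_counts` and the in-memory dictionary of articles are hypothetical, and the elided entries ("…") in each bibliography are simply left out.

```python
from collections import Counter
from itertools import combinations

# Authors listed in each article's bibliography, taken from the slide.
# The slide's trailing "…" entries are unknown and therefore omitted.
articles = {
    1: ["Plato", "Aristotle", "Smith"],
    2: ["Plato", "Smith"],
    3: ["Plato", "Aristotle", "Smith", "Brown"],
}
terms_of_interest = ["Plato", "Aristotle", "Smith", "Brown"]


def cooccurrence_counts(articles, terms):
    """Count, for each unordered pair of terms, the articles containing both."""
    wanted = set(terms)
    # Start every one of the C(N, 2) pairs at zero so absent pairs still appear.
    counts = Counter({frozenset(pair): 0 for pair in combinations(terms, 2)})
    for cited in articles.values():
        present = sorted(wanted & set(cited))
        for pair in combinations(present, 2):
            counts[frozenset(pair)] += 1
    return counts


counts = cooccurrence_counts(articles, terms_of_interest)
for pair, count in counts.most_common():
    print(" and ".join(sorted(pair)), count)
```

Using `frozenset` keys makes the pair unordered, so "Plato and Smith" and "Smith and Plato" share one count.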

  7. Term Co-occurrence Significance • Frequent co-occurrence of a term pair within a set of documents indicates a strong association between the terms, whereas infrequent co-occurrence indicates a weak one • The association you would expect is borne out by the frequency • The frequency you compute suggests a level of association • Contrasting pairs (strong vs. weak association): • Pain and Management vs. Pain and Obtainment • Plato and Aristotle vs. Plato and Cher • Science and Nature vs. Science and National Tattler • A and B vs. C and D

  8. Term Co-occurrence Uses • Allows a user to get a “foothold” with just one term • One seed term returns many other related terms • Allows a user to get an “overview” with user-supplied/system-supplied terms • Co-occurrence counts with visualization

  9. Seeding • User types in • One term, e.g., Plato • Boolean expression, e.g., Plato AND Brown • System supplies top n terms, in ranked order of frequency of co-occurrence with the initial term

  10. Example • For Plato seed: • ARISTOTLE • PLUTARCH • CICERO • HOMER • BIBLE • EURIPIDES • ARISTOPHANES • XENOPHON • AUGUSTINE • HERODOTUS • KANT-I • AESCHYLUS • SOPHOCLES • THUCYDIDES • OVID • HESIOD • DIOGENES-LAERTI • HEIDEGGER-M • DERRIDA-J • PINDAR • NIETZSCHE-F • HEGEL-GWF • VERGIL • AQUINAS-T
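The seeding step can be sketched as follows. This is a minimal illustration, not the system's actual implementation: `related_terms` is a hypothetical name, the records are the toy bibliographies from the earlier example, and a Boolean AND seed is modeled simply as a set of terms that must all be present in a record.

```python
from collections import Counter


def related_terms(records, seeds, n):
    """Return the top-n terms ranked by co-occurrence count with the seed(s).

    `seeds` is a collection of terms; a record qualifies only if it contains
    all of them, which models a Boolean AND of the seed terms.
    """
    seeds = set(seeds)
    counts = Counter()
    for terms in records:
        unique = set(terms)
        if seeds <= unique:
            counts.update(unique - seeds)
    # Rank by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


# Toy records (assumed data, reused from the co-citation example).
records = [
    ["Plato", "Aristotle", "Smith"],
    ["Plato", "Smith"],
    ["Plato", "Aristotle", "Smith", "Brown"],
]

single_seed = related_terms(records, ["Plato"], 2)
and_seed = related_terms(records, ["Plato", "Brown"], 2)
print(single_seed)
print(and_seed)
```

On a real collection the same ranking would produce a list like the one for the Plato seed above, with the seed's most frequent co-cited authors first.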

  11. Need for Visualization • Given a list of user- / system-supplied terms • Find the frequency of co-occurrence of each pair-wise combination of terms • Plato AND Aristotle = 1,920 • Plato AND Plutarch = 380 • … • Too many numbers to take in at once • C(25, 2) = (25 * 24) / 2 = 300 pairs • Three major visualization techniques • Multidimensional Scaling (MDS) • Self-Organizing (Kohonen) Maps (SOMs) • Pathfinder Networks (PFNETs)
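Of the three techniques, Pathfinder pruning is compact enough to sketch. The code below is an assumption-laden illustration rather than the system's method: it builds a PFNET with parameters r = ∞ and q = n − 1, converts counts to distances as 1 / count (one common choice, not stated in the slides), and uses invented counts for the placeholder terms A, B, and C. The name `pathfinder_edges` is hypothetical.

```python
import math


def pathfinder_edges(terms, counts):
    """Keep an edge only if no indirect path has a smaller maximum link distance.

    This is the PFNET(r = infinity, q = n - 1) rule. Distances are taken as
    1 / count, which is an assumption made for this sketch.
    """
    n = len(terms)
    direct = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        direct[i][i] = 0.0
        for j in range(n):
            count = counts.get(frozenset((terms[i], terms[j])), 0)
            if i != j and count > 0:
                direct[i][j] = 1.0 / count

    # Floyd-Warshall variant: a path's cost is its largest single link.
    minimax = [row[:] for row in direct]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = max(minimax[i][k], minimax[k][j])
                if via < minimax[i][j]:
                    minimax[i][j] = via

    return [
        (terms[i], terms[j])
        for i in range(n)
        for j in range(i + 1, n)
        if direct[i][j] < math.inf and direct[i][j] <= minimax[i][j]
    ]


# Invented counts for illustration only.
toy_counts = {
    frozenset(("A", "B")): 10,
    frozenset(("B", "C")): 5,
    frozenset(("A", "C")): 2,
}
edges = pathfinder_edges(["A", "B", "C"], toy_counts)
print(edges)
```

Here the weak A and C link is dropped because the path through B never uses a link as weak as the direct one, which is how a PFNET reduces hundreds of pairs to a readable network.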

  12. White’s MDS map of 15 co-cited classificationists, ca. 1990 • Authors shown: P Arabie, JH Ward, JC Gower, M Wish, RN Shepard, RR Sokal, JB Kruskal, SC Johnson, PHA Sneath, JD Carroll, PE Green, JA Hartigan, HA Skinner, VE McGee, RK Blashfield

  13. White’s PFNET of co-cited authors in Biblical and literary hermeneutics, 1988-1997

  14. Our System • Three tiered • User interface • Server • Database • Real-time and interactive • Significant data sources • ISI AHCI • MedLine • Live interface for retrieval

  15. User Interface - Seed

  16. User Interface – SOM

  17. Interface - PFNET

  18. Interface - Visual Information Retrieval Interface (VIRI)

  19. User Interface IV

  20. Database Interface • API • String[] findRel(String, int) • int[] findOcc(String[]) • Implemented on: • BRS • API via a wrapper • Oracle • API via JDBC • Noah • Specialized co-occurrence database • API via JNI
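To make the two API calls concrete, here is a hedged sketch of how a relational backend might answer them, using Python's built-in `sqlite3` in place of Oracle and JDBC. The table name `citation`, its columns, and the Python names `find_rel` and `find_occ` are assumptions for illustration. The slides also do not say in what order `findOcc` returns its counts; this sketch assumes the order of `itertools.combinations` over the input terms.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (article, term) occurrence.
conn.execute(
    "CREATE TABLE citation ("
    "article_id INTEGER, term TEXT, PRIMARY KEY (article_id, term))"
)
conn.executemany(
    "INSERT INTO citation VALUES (?, ?)",
    [
        (1, "Plato"), (1, "Aristotle"), (1, "Smith"),
        (2, "Plato"), (2, "Smith"),
        (3, "Plato"), (3, "Aristotle"), (3, "Smith"), (3, "Brown"),
    ],
)


def find_rel(conn, seed, n):
    """Top-n terms co-occurring with the seed, most frequent first."""
    sql = (
        "SELECT b.term, COUNT(*) AS c "
        "FROM citation AS a JOIN citation AS b ON a.article_id = b.article_id "
        "WHERE a.term = ? AND b.term <> ? "
        "GROUP BY b.term ORDER BY c DESC, b.term ASC LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, (seed, seed, n))]


def find_occ(conn, terms):
    """Pairwise co-occurrence counts, in itertools.combinations order."""
    sql = (
        "SELECT COUNT(*) "
        "FROM citation AS a JOIN citation AS b ON a.article_id = b.article_id "
        "WHERE a.term = ? AND b.term = ?"
    )
    return [conn.execute(sql, pair).fetchone()[0] for pair in combinations(terms, 2)]


related = find_rel(conn, "Plato", 2)
occurrences = find_occ(conn, ["Plato", "Aristotle", "Smith", "Brown"])
print(related)
print(occurrences)
```

The self-join on `article_id` is what turns per-article term lists into pair counts; the primary key keeps a term from being counted twice within one article.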

  21. Future Plans • User Study • Preference • Type of map, etc. • Cognitive map • How well does the map match experts’ mental models • Larger datasets • Additional data sources
