
Project Update



Presentation Transcript


  1. Project Update: XML Document Visualization and Retrieval • Matt Williams

  2. Background • Can we take advantage of this structure when searching for documents? • XML vs Web Doc • Added Structure
     <book>
       <title>My First XML</title>
       <prod id="33-657" media="paper"> </prod>
       <chapter>Introduction to XML
         <para>What is HTML</para>
         <para>What is XML</para>
       </chapter>
       <chapter>XML Syntax
         <para>Elements must have a closing tag</para>
         <para>Elements must be properly nested</para>
       </chapter>
     </book>

  3. Information Retrieval • Standard Information Retrieval (IR) • tf*idf • tf – frequency of a term in a doc • idf – inverse document frequency, derived from the number of documents containing the term
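A minimal sketch of tf*idf weighting as described on the slide above. The logarithmic form `log(N / df)` is one common choice and is an assumption here, since the slides do not say which variant is used. The function name `tf_idf` and the toy documents are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The weight is tf * log(N / df), where
    tf is the term's count in the document, N is the number of documents,
    and df is the number of documents containing the term.
    """
    n_docs = len(docs)
    # Count each term once per document to get its document frequency.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy collection (hypothetical data, not from the project).
docs = [
    "xml syntax xml elements".split(),
    "html elements".split(),
    "xml retrieval".split(),
]
weights = tf_idf(docs)
```

In this toy collection "syntax" appears in only one document, so in the first document it outweighs "elements", which appears in two.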

  4. Information Retrieval • A fair bit of previous work on adding structure to IR queries. • Examples • XIRQL – Fuhr and Großjohann • //book/chapter[heading $cw$ "InfoVis"] • XXL – Theobald and Weikum • Select Z From Index Where zoos.~animal.~cougar as Z • But… • What if we are unsure of the structure? • What if we have variability in the structure?
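To illustrate why structural queries depend on knowing the structure, here is a rough analogue of the XIRQL example written with Python's standard `xml.etree.ElementTree`. It is not XIRQL or XXL. It assumes `book` is the root element and treats the content condition as a simple case-insensitive word match. The function name and the two sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def chapters_with_heading_word(xml_text, word):
    """Return the headings of book/chapter elements whose <heading> contains word."""
    root = ET.fromstring(xml_text)
    if root.tag != "book":
        return []
    hits = []
    for chapter in root.findall("chapter"):
        # findtext returns the default when the chapter has no <heading> child.
        heading = chapter.findtext("heading", default="")
        if word.lower() in heading.lower().split():
            hits.append(heading)
    return hits


# Same content, two different structures (hypothetical samples).
with_heading = "<book><chapter><heading>InfoVis basics</heading></chapter></book>"
with_title = "<book><chapter><title>InfoVis basics</title></chapter></book>"

found = chapters_with_heading_word(with_heading, "InfoVis")
missed = chapters_with_heading_word(with_title, "InfoVis")
```

The second document holds the same text under `title` instead of `heading`, so the structure-bound query finds nothing in it. That is the variability problem the slide raises.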

  5. Information Retrieval • My goal is to provide an interface to explore the XML collection with limited information • Meta-Schema Information – Element Index • Visual Clustering – Multidimensional Scaling • Visual Queries – Element Selection
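One way to picture the element index is as a map from element paths to the documents that contain them. The sketch below builds such a map with Python's standard `xml.etree.ElementTree`. It is an assumption about what an element index could look like, not the index eXist provides. The names `element_index` and `SAMPLE` are hypothetical, and the sample document is the one from the Background slide.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample document from the Background slide.
SAMPLE = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"> </prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""


def element_index(documents):
    """Map each element path (e.g. 'book/chapter/para') to the ids of documents containing it."""
    index = defaultdict(set)

    def walk(node, prefix, doc_id):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        index[path].add(doc_id)
        for child in node:
            walk(child, path, doc_id)

    for doc_id, xml_text in documents.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return dict(index)


index = element_index({"doc1": SAMPLE})
paths = sorted(index)
```

A selectable list of these paths would let a user pick `book/chapter/para` without knowing the schema in advance.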

  6. Related Work • Visual Information Seeking • HomeFinder / Periodic Table – Ahlberg and Shneiderman

  7. Related Work • Galaxies – Wise et al. • Visual Web Retrieval • Lighthouse – Leuski

  8. Related Work • ZUI – Pad, Jazz, and Piccolo • Ben Bederson • SpaceTree • Jesse Grosjean et al. • TreeMaps ?? • Ben Shneiderman

  9. Multidimensional Scaling • Document Similarity • Dimensionality Reduction • From full-dimensional distance measure → 2-dimensional distance measure • Problems – Speed?
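A minimal sketch of multidimensional scaling, assuming a plain gradient descent on raw stress (the sum of squared differences between target and layout distances). The slides do not say which MDS algorithm the project uses, so this stands in for the general idea only. The function names, step size, and toy distance matrix are all hypothetical.

```python
import math
import random


def stress(points, dist):
    """Sum of squared differences between layout distances and target distances."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            total += (d - dist[i][j]) ** 2
    return total


def mds_2d(dist, iterations=1000, learning_rate=0.02, seed=0):
    """Place n items in 2-D so that pairwise distances approximate dist."""
    rng = random.Random(seed)
    n = len(dist)
    points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in range(n)]
        # Every pair is visited on each pass: O(n^2) work per iteration.
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                coeff = 2 * (d - dist[i][j]) / d
                grads[i][0] += coeff * dx
                grads[i][1] += coeff * dy
        for i in range(n):
            points[i][0] -= learning_rate * grads[i][0]
            points[i][1] -= learning_rate * grads[i][1]
    return points


# Toy document-to-document distances (hypothetical): corners of a 3 x 4 rectangle.
distances = [
    [0, 3, 5, 4],
    [3, 0, 4, 5],
    [5, 4, 0, 3],
    [4, 5, 3, 0],
]
layout = mds_2d(distances)
```

The nested loop over all pairs on every iteration is where the speed concern on the slide comes from: cost grows quadratically with the number of documents.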

  10. Test Environment • eXist – Open Source Native XML Database • Wolfgang M. Meier • http://exist-db.org/ • I am working on providing a front end to the Database that provides: • A Selectable Element Index • Interactive Results That Dynamically Cluster and Zoom

  11. Thus Far • Lots of Learning!! • XML Databases • Multidimensional Scaling • XML Queries • XML Information Retrieval • Zoomable Interfaces • Treemaps • Added basic GUI to eXist • Added a Service to offer the element Index as part of the API
