Archon - A Digital Library that Federates Physics Collections

Archon - A Digital Library that Federates Physics Collections with Varying Degrees of Metadata Richness

Department of Computer Science

Old Dominion University, Norfolk, VA 23529

K. Maly, M. Zubair, M. Nelson

In Collaboration With

Los Alamos National Laboratory (R. Luce)

&

American Physical Society (M. Doyle)

JISC/NSF PI Meeting, June 24-25


Motivation

Lack of a federation service that provides a unified interface to diverse collections in the physics domain whose metadata differ in richness, syntax, and semantics


Motivation

  • Dissemination and discovery of Physics resources

  • Contributors

    • LANL, APS, AIP, CERN

    • researchers, teachers

  • Users

    • Students, teachers, researchers


Arc: The Basic Federation Engine



Challenges

  • Resource Discovery

    • Diversity in metadata richness

    • Lack of controlled vocabulary

    • Ease of discovery (formula-based discovery)

    • Cross linking support

    • Classification

  • Creation and Maintenance

    • Freshness of metadata

    • Dynamic nature of collections

    • Filtering

  • Economic Sustainability

    • Rights management

    • Who pays? For what?


Issues – No controlled vocabulary

  • Different subject classifications

  • Same authors but different rendering

  • Same affiliation but different form


Interactive resource discovery approach components


Issues - Equation based search

  • Representing search query

  • Rendering of equations and embedding them into the HTML display

  • Integrating into search interface

  • Identifying equations inside the metadata

  • Filtering equations

  • Equation storage


Filtering Equations

  • Errors in equation encoding, some examples:

    • missing "$" in LaTeX representation

    • illegal LaTeX symbols

  • Simple equations like "n=3"
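The two encoding errors listed above can be checked mechanically. The following is a minimal sketch, not the Archon implementation: the function names and the small `KNOWN_COMMANDS` whitelist are assumptions made for illustration, and a real system would need a much larger command list.

```python
import re

# Illustrative whitelist only (an assumption); real LaTeX has far more commands.
KNOWN_COMMANDS = {"alpha", "beta", "gamma", "int", "sum", "frac", "sqrt", "lim", "infty"}


def has_balanced_dollars(text):
    """Return True if unescaped '$' delimiters occur an even number of times."""
    return len(re.findall(r"(?<!\\)\$", text)) % 2 == 0


def unknown_commands(text):
    """Return the LaTeX command names in text that are not in KNOWN_COMMANDS."""
    found = set(re.findall(r"\\([A-Za-z]+)", text))
    return sorted(found - KNOWN_COMMANDS)
```

A record whose abstract fails `has_balanced_dollars` or yields a non-empty `unknown_commands` list could be set aside for manual review rather than indexed.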


Filtering/categorizing Equations

Approach:

Use of "Stop Equation File" similar to "Stop Word File" used for indexing.

In the equation filtering context, the stop equation file consists of rules in the form of regular expressions that describe the LaTeX strings to be dropped. The regular expression approach gives us the flexibility to easily describe a variety of strings to be filtered.
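A stop equation file of this kind might look like the sketch below. The three rules, the one-rule-per-line file layout with "#" comments, and the function names are all assumptions made for illustration; the slides do not show the actual rule set.

```python
import re

# Hypothetical stop equation file: one regular expression per line.
STOP_EQUATION_FILE = r"""
# single symbol set to an integer, e.g. n=3
^\s*[A-Za-z]\s*=\s*\d+\s*$
# bare number
^\s*\d+(\.\d+)?\s*$
# lone symbol or command, e.g. x or \alpha
^\s*\\?[A-Za-z]+\s*$
"""


def load_rules(text):
    """Compile every non-empty, non-comment line into a regular expression."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(re.compile(line))
    return rules


def filter_equations(equations, rules):
    """Keep only the equations that match none of the stop rules."""
    return [eq for eq in equations if not any(rule.search(eq) for rule in rules)]
```

With these rules, "n=3" and a lone "\alpha" are dropped, while "E = mc^2" is kept for indexing.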


How to search for records using equations?

  • Three search alternatives (or any combination of these) for the user:

    • Search for docs containing all formulae found in a) abstracts b) subject fields of documents containing user input ‘keywords’

    • Search for docs containing formulae defined by category (e.g. integrals, moments, limits)

    • Browse formulae by various categorizations and search for docs containing selected formulae
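The category-based alternative can be sketched as a small inverted index from formula category to document identifiers. The category names, the patterns, and the record layout below are assumptions for illustration; the slides do not say how Archon assigns categories.

```python
import re
from collections import defaultdict

# Hypothetical category rules keyed on LaTeX commands.
# (?![A-Za-z]) stops \int from matching inside longer commands such as \intercal.
CATEGORY_PATTERNS = {
    "integral": re.compile(r"\\(?:int|iint|oint)(?![A-Za-z])"),
    "limit": re.compile(r"\\lim(?![A-Za-z])"),
    "sum": re.compile(r"\\sum(?![A-Za-z])"),
}


def categorize(latex):
    """Return the set of category names whose pattern occurs in the formula."""
    return {name for name, pattern in CATEGORY_PATTERNS.items() if pattern.search(latex)}


def build_index(records):
    """Map each category to the ids of documents containing such a formula.

    records is an iterable of (doc_id, list_of_latex_strings) pairs.
    """
    index = defaultdict(set)
    for doc_id, formulas in records:
        for formula in formulas:
            for category in categorize(formula):
                index[category].add(doc_id)
    return index
```

A lookup such as `build_index(records)["integral"]` then returns the documents to show when the user picks the integrals category.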


Issues - Cross Linking References

  • Obtaining references from full-text documents or parallel metadata sets

  • Bad format of such references when obtained from full text

  • Need a standard way to represent references across collections


Issues – Name similarity

  • Authors use different names for themselves and their affiliation

  • Could use authority files, but these are difficult to create and maintain across different collections


Similarity approach

  • Clustering

  • Iterative refinement approach:

    • Coarse level clusters based on approximate string matching (edit-distance, soundex, n-gram)

    • Refining clusters based on affiliation where available

  • Presentation

  • Allow the user to follow a search by clicking on authors and then selecting the appropriate one, i.e., no authority files
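The two-stage refinement can be sketched as follows. This is an illustration under stated assumptions, not the Archon code: it uses character bigram overlap (Jaccard similarity) as the approximate string match, whereas the slide also lists edit distance and soundex; the 0.5 threshold, the record layout, and the function names are hypothetical.

```python
def ngrams(text, n=2):
    """Return the set of character n-grams of text, ignoring case and punctuation."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum())
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def similarity(a, b):
    """Jaccard similarity of the bigram sets of two names (0.0 to 1.0)."""
    ga, gb = ngrams(a), ngrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def coarse_clusters(records, threshold=0.5):
    """Greedily group (name, affiliation) records whose names look alike."""
    clusters = []
    for record in records:
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            if similarity(record[0], cluster[0][0]) >= threshold:
                cluster.append(record)
                break
        else:
            clusters.append([record])
    return clusters


def refine_by_affiliation(cluster):
    """Split one coarse cluster by affiliation; unknown affiliations share a group."""
    groups = {}
    for name, affiliation in cluster:
        key = affiliation.strip().lower() if affiliation else None
        groups.setdefault(key, []).append(name)
    return list(groups.values())
```

The refined groups would then be presented to the user as clickable alternatives, which is what lets the approach avoid authority files.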


Homogenizing User Space

  • Enabling Web users to discover information in OAI collections (DP-9 Service)

    • http://arc.cs.odu.edu:8080/dp9/

  • Enabling OAI users to discover information in Web enabled non-OAI compliant collections/databases/web sites
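For context, an OAI collection is queried through HTTP requests that carry a `verb` argument, as defined by OAI-PMH. The sketch below only shows how such a request URL is formed; the base URL and record identifier are placeholders, and the helper name is an assumption, not part of DP-9 or Arc.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL from a base URL, a verb, and its arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Placeholder repository address, used for illustration only.
BASE_URL = "http://example.org/oai"

list_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
record_url = oai_request_url(
    BASE_URL, "GetRecord", identifier="oai:example.org:1234", metadataPrefix="oai_dc"
)
```

A gateway in either direction has to translate between URLs of this form and the plain pages that Web users and crawlers expect.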


DP-9 Service for Exposing OAI Collections to Web


Vac: Gateway Service for Harvesting Non-OAI Collections

[Figure labels: Web-enabled non-OAI-compliant collections/databases/Web sites (three boxes); WIDL description, XML-based language (three boxes); Gateway to Non-OAI Collections; OAI Service Provider]


Sample Description in WIDL of a Web enabled Non-OAI Collection

<WIDL NAME="NonOAIGateway" Template="TRcollector" BASEURL="http://www.princeton.edu" VERSION="2.0">

<SERVICE NAME="getURL" METHOD="GET" URL="" INPUT="" OUTPUT="urlOutput" />

<BINDING NAME="urlOutput" TYPE="OUTPUT">

<VARIABLE NAME="link" TYPE="String" REFERENCE="doc.p[1].text" />

<VARIABLE NAME="title" TYPE="String" REFERENCE="title" />

<VARIABLE NAME="author" TYPE="String" REFERENCE="author" />

<VARIABLE NAME="descriptionr" TYPE="String" REFERENCE="abstract" />

</BINDING>

</WIDL>
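To show how a gateway might read such a description, the sketch below parses a cleaned-up copy of the sample and lists the output variables bound to each service. It is an assumption-laden illustration: the copy uses straight quotes and drops the stray closing BINDING tag so that it is well-formed XML, the function name is hypothetical, and nothing here reflects how Vac actually processes WIDL.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the sample (straight quotes, well-formed); an assumption.
WIDL_TEXT = """
<WIDL NAME="NonOAIGateway" Template="TRcollector" BASEURL="http://www.princeton.edu" VERSION="2.0">
  <SERVICE NAME="getURL" METHOD="GET" URL="" INPUT="" OUTPUT="urlOutput" />
  <BINDING NAME="urlOutput" TYPE="OUTPUT">
    <VARIABLE NAME="link" TYPE="String" REFERENCE="doc.p[1].text" />
    <VARIABLE NAME="title" TYPE="String" REFERENCE="title" />
    <VARIABLE NAME="author" TYPE="String" REFERENCE="author" />
    <VARIABLE NAME="descriptionr" TYPE="String" REFERENCE="abstract" />
  </BINDING>
</WIDL>
"""


def output_variables(widl_text):
    """Map each service name to {variable name: reference} from its output binding."""
    root = ET.fromstring(widl_text.strip())
    bindings = {b.get("NAME"): b for b in root.findall("BINDING")}
    result = {}
    for service in root.findall("SERVICE"):
        binding = bindings.get(service.get("OUTPUT"))
        if binding is None:
            continue  # service declares an output binding that is not defined
        result[service.get("NAME")] = {
            var.get("NAME"): var.get("REFERENCE") for var in binding.findall("VARIABLE")
        }
    return result
```

Each variable name would become a metadata field of the harvested record, and each reference tells the gateway where in the fetched page to find its value.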


Federation/archives Consistency


Future Tasks

  • Post processing of search results for easier navigation

  • Exploiting richer metadata and handling diversity in metadata across all participating collections

  • Concentrate on interactive search interface for resource discovery

  • Data normalization, authority files, filtering

  • Investigating different schemes for maintaining federation/archives consistency

  • More high level services beyond formula based search and cross-linking

  • User testing!!!!


Links

  • ODU DL research group:

    • http://dlib.cs.odu.edu/

  • Main federation engine:

    • http://arc.cs.odu.edu/

  • NSDL research:

    • http://archon.cs.odu.edu/

  • ITR/IM research

    • http://kepler.cs.odu.edu/


Not used


Automated metadata mapping approach

