Challenges in Building Federation Services over Harvested Metadata

Kurt Maly, Michael Nelson, Mohammad Zubair

Digital Library Group

Old Dominion University

Norfolk, VA 23529

NSDL 2003


Outline

  • Motivation

  • Overview

  • Process Automation

  • Web Services and Applications

  • Performance

  • Conclusions and Future Work



Motivation

  • OAI-PMH harvesting provides only the basic service of retrieving metadata from repositories.

  • Processing this data or retrieving related metadata is not part of OAI-PMH.

  • Dynamic harvesting makes it challenging to keep specialized services consistent as new metadata records are ingested.



Motivation (continued)

  • Use of the Web Services standards is growing, so providing services that comply with these standards will increase the usability of our digital library.

  • Web services enable third parties to build services on top of our federated collection that enhance our native services.



Overview

Archon is a federation of physics digital libraries. Its architecture provides services to both humans and machines:

  • Basic Services (for humans)

    • a search and discovery service;

    • a service that allows searching on equations embedded in the metadata;

    • a cross-archive citation service

  • OAI Services (for machines)

    • a storage service for the metadata of collected archives;

    • a harvester service to collect data from digital libraries using OAI-PMH;

    • a data provider service to expose metadata to OAI-PMH harvesters

  • Web Services (for machines)

    • a focus library for personal use
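The harvester service collects metadata with OAI-PMH. A minimal sketch of the requests an incremental harvester repeats, assuming Dublin Core (`oai_dc`) records and a hypothetical base URL: a `ListRecords` request restricted by `from` to records changed since the last harvest, then follow-up requests that pass back the `resumptionToken` until the repository returns an empty one. The function names are illustrative, not Archon's code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, from_date=None, resumption_token=None,
                     metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective harvesting: records added or changed on or after from_date.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def next_resumption_token(response_xml):
    """Return the resumptionToken of a ListRecords response, or None when done."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty token means the complete list has been returned.
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

For example, `list_records_url("http://example.org/oai", from_date="2003-06-01")` yields `http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2003-06-01`.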



Archon Architecture

[Figure: Archon architecture diagram]


Process Automation

  • At the core of Archon are high-level services that require post-processing of harvested metadata.

  • We implemented Archon's post-harvesting processes as tasks that can run incrementally and automatically.

  • The Archon post-processing consists of tasks for citation processing, equation processing, normalization, and subject resolution.
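One way to make post-processing tasks incremental is to have each task remember which version of each record it has already handled. The sketch below assumes each record carries an OAI identifier and an ISO 8601 datestamp; the `Record`, `Task`, and `run_pipeline` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    identifier: str   # OAI identifier
    datestamp: str    # ISO 8601, e.g. "2003-06-01"; compares correctly as a string
    metadata: dict = field(default_factory=dict)


@dataclass
class Task:
    name: str
    process: Callable[[Record], None]
    # identifier -> datestamp of the version this task last processed
    seen: dict = field(default_factory=dict)

    def run(self, records):
        """Process records that are new or updated since the last run."""
        count = 0
        for rec in records:
            if self.seen.get(rec.identifier, "") < rec.datestamp:
                self.process(rec)
                self.seen[rec.identifier] = rec.datestamp
                count += 1
        return count


def run_pipeline(tasks, records):
    """Run tasks in order; a later task may depend on an earlier one's output."""
    return {task.name: task.run(records) for task in tasks}
```

Keeping a per-record datestamp, rather than a single "last run" timestamp, avoids skipping records that share a datestamp with the previous run and reprocesses records that the source archive has updated.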



Harvest Post-Processing: Citation Processing

  • The reference-linking service provides the user with a list of the references for each metadata record.

  • Where possible, the service links to the referenced documents at external source archives and within Archon.
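As an illustration of reference linking, the sketch below assumes references are plain citation strings and resolves only old-style arXiv identifiers (such as `hep-th/9711200`): a reference held in Archon links internally, any other identified reference links to the external arXiv abstract page, and the rest stay unlinked. Matching journal citations (e.g., APS references) would need an additional citation parser; `link_reference` is a hypothetical name.

```python
import re

# Old-style arXiv identifier, e.g. "hep-th/9711200" or "math.AG/0309136",
# optionally followed by a version suffix such as "v2".
ARXIV_ID = re.compile(r"\b([a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?\b")


def link_reference(ref_text, archon_ids):
    """Return ("archon", id), ("external", url), or None if unresolved."""
    match = ARXIV_ID.search(ref_text)
    if match is None:
        return None  # show the citation text without a link
    arxiv_id = match.group(1)
    if arxiv_id in archon_ids:
        return ("archon", arxiv_id)  # link to the record within Archon
    return ("external", f"https://arxiv.org/abs/{arxiv_id}")
```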



Harvest Post-Processing: Citation Processing (continued)

[Figure]




Harvest Post-Processing: Citation Processing, Data for Resolved References

[Figure: data for resolved references]


Harvest Post-Processing: Equation Processing

  • We represent the equations as images and display these images when the metadata records are displayed. This requires the following tasks to be performed after harvesting new metadata records:

    • Identifying equations

    • Filtering equations

    • Storing equations
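The three tasks can be sketched as follows, assuming equations appear as TeX between `$` delimiters in the harvested text. The filter rule (skip single-character expressions, keep anything with a TeX command or an operator such as `^`, `_`, or `=`) is a hypothetical heuristic, and the dictionary stands in for whatever store holds the rendered images.

```python
import hashlib
import re

INLINE_MATH = re.compile(r"\$([^$]+)\$")  # TeX between single $ delimiters


def identify_equations(text):
    """Identifying: extract the TeX source of each $...$ span."""
    return [m.group(1).strip() for m in INLINE_MATH.finditer(text)]


def worth_rendering(tex):
    """Filtering (hypothetical rule): skip one-character expressions such as
    "N"; keep anything with a TeX command or a ^, _, or = operator."""
    return len(tex) > 1 and any(c in tex for c in "\\^_=")


def store_equations(equations, store):
    """Storage: key each equation by a hash of its TeX, so an equation that
    recurs across records is stored (and would be rendered) only once."""
    keys = []
    for tex in equations:
        key = hashlib.sha1(tex.encode("utf-8")).hexdigest()
        store.setdefault(key, tex)
        keys.append(key)
    return keys
```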




Harvest Post-Processing: Subject Resolvers

  • Our subject resolver tries to fill in the subject field for APS and arXiv Dublin Core (DC) records.
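A minimal sketch of a subject resolver, assuming a "parallel" metadata record (a richer, non-DC format harvested alongside the DC record) can be looked up by the same identifier and carries a list of subject codes. The field names and `resolve_subject` are hypothetical.

```python
def resolve_subject(dc_record, parallel_index):
    """Fill an empty DC subject from the matching parallel metadata record.

    dc_record: dict with an "identifier" and an optional "subject" list.
    parallel_index: identifier -> parsed parallel record with a "subjects" list.
    Returns True if the record has a subject afterwards.
    """
    if dc_record.get("subject"):
        return True  # already present in the harvested DC record
    parallel = parallel_index.get(dc_record["identifier"])
    if parallel is None or not parallel.get("subjects"):
        # No parallel record, or it could not be parsed: leave unresolved,
        # so later tasks (such as equation processing) can skip this record.
        return False
    dc_record["subject"] = list(parallel["subjects"])
    return True
```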



Harvest Post-Processing: Statistics

Historical

Archive     #records       #refs
APS           39,064     686,521
arXiv        229,076   4,838,158
CERN          17,055      58,105
NASA          38,688         N/A
Emilio         3,480         N/A

Archon collection

Unique authors: 346,315

Unique subjects: 9,889

Equations (all): 330,503

Incremental

#records

#refs

#Equation #subject resolved

APS

66,096

37 581

4,052

arXiv

49

0*

25 48

CERN

607

594 12

NASA

* Due to missing parallel metadata or errors in parsing the parallel metadata. Equations are not processed for records whose subject is not resolved.



Web Services and Applications

  • We created Web services that allow students and teachers to create personal collections.

  • These services use Web Services standards, including SOAP requests and responses for communication between the clients and the services.

  • Examples of these services include:

    • Search Service

    • Book Shelf Service
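The services exchange SOAP messages. A sketch of building a SOAP 1.1 request envelope for a search call with Python's standard library; the operation name `search`, its `query` and `maxResults` parameters, and the namespace `urn:example:archon-search` are hypothetical, since the actual service interface is not given here.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:archon-search"                 # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)


def build_search_request(query, max_results=10):
    """Return a SOAP 1.1 envelope (as a string) for a hypothetical search call."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}search")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}query").text = query
    ET.SubElement(operation, f"{{{SERVICE_NS}}}maxResults").text = str(max_results)
    return ET.tostring(envelope, encoding="unicode")
```

A client would POST this envelope to the service endpoint; because the request is plain XML over a standard protocol, a teacher's custom client can be written in any language with a SOAP or XML library.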



Web Services and Applications (continued)

  • Book Shelf Service

    • allows each user to have a personalized collection, a subset of the federation

    • enables teachers to collect course materials and package them in a personalized collection

    • enables students researching a topic to build a special collection containing all the related documents

  • Search Service

    • provides access to all search functionality without requiring the Archon interface

    • allows each user (e.g., a teacher) to provide a customized client for the collections, with special features suited to a course's needs
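The Book Shelf Service's data model can be sketched as named shelves per user, each an ordered set of record identifiers drawn from the federation. This in-memory `BookShelf` class is a hypothetical illustration; a production service would persist shelves and expose these operations through SOAP.

```python
from collections import defaultdict


class BookShelf:
    """In-memory sketch: each user owns named shelves of federation records."""

    def __init__(self, federation_ids):
        self._federation = set(federation_ids)
        # user -> shelf name -> record ids; a dict keeps insertion order
        # and ignores duplicates, acting as an ordered set.
        self._shelves = defaultdict(lambda: defaultdict(dict))

    def add(self, user, shelf, record_id):
        if record_id not in self._federation:
            raise KeyError(f"{record_id} is not in the federation")
        self._shelves[user][shelf][record_id] = None

    def remove(self, user, shelf, record_id):
        self._shelves[user][shelf].pop(record_id, None)

    def contents(self, user, shelf):
        return list(self._shelves[user][shelf])
```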



Web Services and Applications (continued)

[Figure]


Web Services and Applications (continued)

[Figure]


Conclusions and Future Work

  • We collected about 300K Dublin Core (DC) metadata records for documents from APS, CERN, arXiv, Emilio, and NASA.

  • We also collected 30K parallel metadata records from APS.

  • We have also resolved the data for 5.5M references cited by these documents.

  • Our performance analysis shows that we can comfortably schedule the OAI harvester to run about once a day, leaving a safety margin for human intervention should the automatic process break down.
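The rounded totals above can be checked against the per-archive counts in the historical statistics table (NASA and Emilio have no reference counts):

```python
# Historical harvest counts from the statistics slide: (records, references).
historical = {
    "APS": (39_064, 686_521),
    "arXiv": (229_076, 4_838_158),
    "CERN": (17_055, 58_105),
    "NASA": (38_688, None),   # N/A on the slide
    "Emilio": (3_480, None),  # N/A on the slide
}

total_records = sum(records for records, _ in historical.values())
total_refs = sum(refs for _, refs in historical.values() if refs is not None)
print(total_records, total_refs)  # 327363 5582784
```

That is about 327K records and 5.58M references, consistent with the rounded figures.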



Conclusions and Future Work (continued)

  • We have developed Web Services that can be used for search and discovery of our collections.

  • Other developers can use these Web services to provide customized or enhanced services, or to build services beyond those currently provided.

  • We have also developed sample client applications, such as a bookshelf client that stores a collection of documents and can export them as references (in user-defined formats) to help authors write research papers.
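Exporting references "in user-defined formats" could work by letting the user supply a format string. This sketch uses Python's `str.format_map`, rendering missing fields as empty strings; `export_references` and the field names are hypothetical.

```python
class _BlankMissing(dict):
    """Mapping that renders absent fields as empty strings."""

    def __missing__(self, key):
        return ""


def export_references(records, template):
    """Format each bookshelf record with a user-defined str.format template."""
    return [template.format_map(_BlankMissing(rec)) for rec in records]
```

Because templates use `str.format` syntax, literal braces (as in BibTeX) must be doubled: `"@article{{{key}, title={{{title}}}}}"` renders a record with key `k1` and title `T` as `@article{k1, title={T}}`.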



Conclusions and Future Work (continued)

  • We have nearly completed adding production services that federate CERN, arXiv, and APS. Adding NASA is partially complete, and we plan to collaborate with AIP (American Institute of Physics) to include their collections as well. Once all of these are federated and operating at a high service level on a dynamic basis, the Web services should prove attractive, particularly to authors of papers, who can use them to maintain their own bibliographies.



Future Work

  • Collections have overlapping holdings, so a strong de-duplication service is needed

  • Expand the personalization effort to allow students and researchers to integrate DL information into their writing of reports and papers

  • Test a role-based access system that allows each contributing collection to have different policies for different organizations
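Because the collections overlap, a de-duplication service needs a matching key that survives differences in punctuation, case, and accents between archives. The sketch below is one hypothetical approach, not a planned design: it groups records by normalized title plus first-author surname, assuming authors are stored as `Surname, Given`. Real matching would also need to compare identifiers such as DOIs and tolerate near-duplicate titles.

```python
import re
import unicodedata
from collections import defaultdict


def _norm(text):
    """Lowercase, strip accents, and collapse punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()


def dedup_key(record):
    authors = record.get("authors") or []
    surname = authors[0].split(",")[0] if authors else ""  # assumes "Surname, Given"
    return (_norm(record["title"]), _norm(surname))


def find_duplicates(records):
    """Return groups of record ids that share a de-duplication key."""
    groups = defaultdict(list)
    for rec in records:
        groups[dedup_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```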
