CS 502 Computing Methods for Digital Libraries
Cornell University – Computer Science
Herbert Van de Sompel
herbertv@cs.cornell.edu


Lecture 5: A research perspective on Digital Libraries



DL Ancestry



URLs to some of these DLs

ADS: http://adswww.harvard.edu/

NCSTRL: http://www.ncstrl.org

UCSTRI: http://www.cs.indiana.edu:800/cstr/cover.html

arXiv: http://arXiv.org

LTRS: http://techreports.larc.nasa.gov/ltrs/

NTRS: http://techreports.larc.nasa.gov/cgi-bin/NTRS



DL Architectural Review

Assumptions made in this perspective

  • things start with TCP/IP connectivity

  • distribute full content (reports, software, etc.)

    • not only metadata


DL Architecture History: approach 1

1. Build special client and server (generally using Motif/X11, Tcl/Tk, etc.), and use TCP/IP as the transport protocol only

  • pros: rich functionality

  • cons: high development cost, client distribution problem

  • observation: many of these projects spent more time building the interfaces, protocols, searching, etc. than populating their DL!


DL Architecture History: approach 2

2. Use standard protocols built upon TCP/IP: SMTP, FTP, Gopher, WAIS, HTTP, etc.

  • con: less functionality (restricted by protocol)

  • pros: less development cost, uses commonly available clients

  • observation: this approach is now the most common

  • The ones listed on slide 2 fit into this category



Early TCP/IP DLs

  • a very old one: IETF: http://www.ietf.org/

  • Internet RFCs

  • Very first TCP/IP DL?



Early TCP/IP DLs

  • Netlib

    • http://www.netlib.org/

    • begun in 1985, distributing mathematical software via e-mail (SMTP)

    • other access methods and protocols added (ftp, X11 client, http)



Netlib 1995



Netlib 2001



Los Alamos arXiv

  • Physics pre-print server

    • http://xxx.lanl.gov/ == http://arXiv.org

    • begun in 1991 as an e-mail service to exchange TeX source of pre-prints in high energy physics

    • ftp and http access added shortly afterwards

    • Now THE communication channel in Physics

    • Paul Ginsparg



Characteristics of early TCP/IP, non-HTTP DLs

  • Useful

    • could get the “thing” that you were looking for

  • Constrained by transport protocol

    • SMTP, FTP, etc. interface inherently “clunky”

    • Higher level services such as searching, sophisticated browsing, etc. difficult to implement

  • Small scale

    • would the same systems work well if the holdings went from 100’s or 1000’s to millions?



Characteristics of early TCP/IP, HTTP DLs

  • Initial HTTP implementations / conversions pretty much provided incremental steps in DL improvement

    • a “nice” ftp interface, maybe with better searching and browsing

    • but the nature of the DLs changed little

      • LTRS is an example of an HTTP DL that is really: FTP + Searching (WAIS) + Browsing

      • http://techreports.larc.nasa.gov/ltrs/

      • Also check out user interface of http://arXiv.org



Early TCP/IP, HTTP DLs

  • But http is a very general transport protocol, and it is possible to build even higher level protocols on top of it

  • Combine this with the expressive HTTP client (web browser), and there is a lot of potential

  • Dienst

    • (http://www.ncstrl.org/Dienst/htdocs/Info/protocol4.html)

    • builds an actual DL protocol on top of HTTP

      • 1994 -- the first to do so?

  • Open Archives Initiative: metadata harvesting protocol on top of HTTP
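The idea of layering a DL protocol on top of HTTP can be sketched as follows: a protocol-level request (a verb plus its arguments) is encoded as an ordinary HTTP GET URL. This is only an illustration, modelled on the verb-style query parameters used by the Open Archives Initiative metadata harvesting protocol. The base URL and the helper name `build_request_url` are assumptions made for the example, and no network call is made.

```python
from urllib.parse import urlencode


def build_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Encode a protocol-level request as an ordinary HTTP GET URL."""
    # The protocol verb and its arguments travel as query parameters,
    # so any standard HTTP client can issue the request.
    params = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(params)}"


url = build_request_url(
    "http://repository.example.org/oai",  # hypothetical base URL
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
```

Because the request is just a URL, the same repository can be queried from a web browser or from a harvesting program without a special client.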


Sophistication increases, tracks meet

[Figure: sophistication against time for two tracks, the library automation track and the research track. Labels in the figure: e-mail; ftp / gopher; LTRS, e-print, Netlib, etc.; http; Dienst.]



A Framework for Distributed Digital Object Services

Kahn/Wilensky Framework [Kahn 1995]

  • 1995

  • A high level document

  • Almost a definition of key concepts, terminologies, … for next generation DLs

  • Foundation for a research discipline?

  • Not detailed enough to be a real architecture.

  • Architecture is independent of the type of data stored in the DL



KWF: key terms

  • digital object (do)

    • A do is a data structure that contains

      • Digital data; data is typed (cf MIME)

      • Persistent Key Metadata; especially handle

      • Other metadata (for instance Terms and Conditions)

  • handle

    • a handle is a unique, persistent name for a do

  • repository

    • The place where do’s live

    • Has unique global name

  • Repository Access Protocol (RAP)

    • To deposit/access do’s in repositories
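The key terms above can be read as a small data model. The following is a minimal sketch, not something the KWF itself specifies: the class names, field names, handle string, and repository name are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DigitalObject:
    """A do: typed data plus key metadata (the handle) and other metadata."""
    handle: str        # key metadata: unique, persistent name (hypothetical format)
    data: bytes        # the digital data itself
    data_type: str     # type label, used here in the spirit of a MIME type
    other_metadata: dict = field(default_factory=dict)


@dataclass
class Repository:
    """The place where do's live; it has a globally unique name."""
    name: str
    _store: dict = field(default_factory=dict)  # handle -> DigitalObject

    def deposit(self, do: DigitalObject) -> None:
        self._store[do.handle] = do

    def access(self, handle: str) -> DigitalObject:
        return self._store[handle]


repo = Repository(name="repo.example.org")  # hypothetical repository name
report = DigitalObject(
    handle="example/tr-1",
    data=b"report contents",
    data_type="application/pdf",
    other_metadata={"terms_and_conditions": "free to read"},
)
repo.deposit(report)
print(repo.access("example/tr-1").data_type)
```

The `deposit` and `access` methods stand in for the Repository Access Protocol only in the loosest sense; the sketch says nothing about how requests would travel over a network.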


KWF: flow

  • The originator makes a digital object, which consists of:

    • Data

    • Key-Metadata: handle (the handle comes from a handle generator)

  • The do can go in a repository, at which point the do becomes a stored do

  • The repository keeps, per do:

    • a properties record (key metadata: handle; other metadata: terms and conditions)

    • a transaction record

  • The repository registers the do’s handle with a handle server, at which point the do becomes a registered do

  • The client accesses/deposits the do in repositories by means of the Repository Access Protocol

  • What the client receives as a result of an access to a do is a dissemination.



Digital objects

  • do = data + key-metadata

    • data is typed; core types include:

      • bit-sequence / set-of-bit-sequences

      • digital-object / set-of-digital-objects

      • handle / set-of-handles

    • other types can be defined, and registered with a global type registry

      • definition and registration left undefined

      • ~ similar to MIME

    • key-metadata includes handle

    • possibly other metadata (left undefined in KWF)



Digital objects

  • Composite do’s:

    • a do with data of type digital-object

    • non-composite do’s are elemental do’s

    • composite do’s can, for instance, be used to collect similar works together

      • a composite do then contains a do for each work of Shakespeare...
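The distinction between composite and elemental do's can be sketched as a recursive structure. This is an illustrative sketch only: representing data of type digital-object as a Python tuple, and the handle strings used, are assumptions rather than anything the KWF prescribes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DigitalObject:
    handle: str
    # Either a bit-sequence (bytes) or a tuple of DigitalObject instances.
    data: Union[bytes, tuple]

    def is_composite(self) -> bool:
        """A composite do holds other do's as its data."""
        return isinstance(self.data, tuple)


def elemental_handles(do: DigitalObject) -> list:
    """Return the handles of all elemental do's reachable from `do`."""
    if not do.is_composite():
        return [do.handle]
    handles = []
    for member in do.data:
        handles.extend(elemental_handles(member))
    return handles


hamlet = DigitalObject("example/hamlet", b"To be, or not to be")
macbeth = DigitalObject("example/macbeth", b"Tomorrow, and tomorrow")
works = DigitalObject("example/shakespeare", (hamlet, macbeth))
print(elemental_handles(works))
```

Because a composite do is itself a do with its own handle, collections can nest to any depth, which the recursive walk handles without special cases.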



Changing digital objects

  • Mutable do’s can be changed once placed in a repository

    • key-metadata cannot be changed

    • the do’s handle never changes!

  • Immutable do’s cannot be changed once placed in a repository

    • however, they can be deleted
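The mutability rules above can be sketched as checks inside a repository. This is a minimal illustration under stated assumptions: the class, method, and exception names are hypothetical, and the KWF does not say how a repository should implement these rules.

```python
class ImmutableObjectError(Exception):
    """Raised when a caller tries to change an immutable do."""


class Repository:
    def __init__(self):
        self._data = {}     # handle -> bytes
        self._mutable = {}  # handle -> bool, fixed at deposit time

    def deposit(self, handle, data, mutable):
        if handle in self._data:
            raise ValueError(f"handle already in use: {handle}")
        self._data[handle] = data
        self._mutable[handle] = mutable

    def access(self, handle):
        return self._data[handle]

    def replace_data(self, handle, new_data):
        # Only the data changes; the handle (key metadata) is never touched.
        if not self._mutable[handle]:
            raise ImmutableObjectError(handle)
        self._data[handle] = new_data

    def delete(self, handle):
        # Deletion is allowed for mutable and immutable do's alike.
        del self._data[handle]
        del self._mutable[handle]


repo = Repository()
repo.deposit("example/draft", b"v1", mutable=True)
repo.deposit("example/final", b"v1", mutable=False)
repo.replace_data("example/draft", b"v2")
```

Note that there is deliberately no method for changing a handle: the only way a handle leaves the repository in this sketch is through `delete`.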



Handles

  • Guest lecture by Professor Arms 02/19



Repositories

  • A network accessible storage system in which digital objects may be stored for possible subsequent access or retrieval

  • A stored do is a do that resides in a repository

  • A registered do is a do that the repository has registered with a handle server

    • storing and registering can be the same or different processes



Repositories

  • A repository keeps a properties record for each do

    • contains key-metadata and any other metadata the repository chooses to keep

  • A do may have a transaction record associated with it in a repository
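The difference between a stored do and a registered do, together with the per-do records a repository keeps, can be sketched as below. All class, method, and field names are assumptions for illustration, and the sketch treats storing and registering as separate steps, which is one of the two options the KWF allows.

```python
from dataclasses import dataclass, field


@dataclass
class HandleServer:
    locations: dict = field(default_factory=dict)  # handle -> repository name

    def register(self, handle: str, repository_name: str) -> None:
        self.locations[handle] = repository_name


@dataclass
class Repository:
    name: str
    data: dict = field(default_factory=dict)          # handle -> bytes
    properties: dict = field(default_factory=dict)    # handle -> properties record
    transactions: dict = field(default_factory=dict)  # handle -> list of events
    registered: set = field(default_factory=set)

    def store(self, handle: str, data: bytes, **other_metadata) -> None:
        """After this call the do is a stored do."""
        self.data[handle] = data
        # Properties record: key metadata plus whatever else is kept.
        self.properties[handle] = {"handle": handle, **other_metadata}
        self.transactions[handle] = ["stored"]

    def register(self, handle: str, server: HandleServer) -> None:
        """After this call the do is also a registered do."""
        if handle not in self.data:
            raise KeyError(handle)
        server.register(handle, self.name)
        self.registered.add(handle)
        self.transactions[handle].append("registered")


server = HandleServer()
repo = Repository(name="repo.example.org")  # hypothetical repository name
repo.store("example/tr-1", b"report contents", terms="free to read")
stored_only = "example/tr-1" not in repo.registered
repo.register("example/tr-1", server)
```

With the handle registered, a client that knows only the handle could ask the handle server which repository holds the do.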



Repository Access Protocol

  • “Protocol” may be misleading; it’s really just the concept for a protocol

  • RAP is designed to be simple; higher level services should come from other protocols

  • KWF defines 3 basic operation classes:

    • ACCESS_DO [metadata; key-metadata, digital object]

      • A dissemination of a do is the result of a request to access a do

    • DEPOSIT_DO [metadata; key-metadata, digital object]

    • ACCESS_REF

      • this is a means to tell the world about other ways (protocols) to access do’s in the repository.
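The three operation classes can be sketched as methods on a repository object. Since RAP is only the concept for a protocol, everything concrete here is an assumption: the method signatures, the `Dissemination` class, and the example access URL are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dissemination:
    """What a client receives as the result of ACCESS_DO."""
    handle: str
    data: bytes


@dataclass
class Repository:
    name: str
    other_access_methods: list = field(default_factory=list)
    _objects: dict = field(default_factory=dict)  # handle -> bytes

    def deposit_do(self, handle: str, data: bytes) -> None:
        """DEPOSIT_DO: place a do in the repository."""
        self._objects[handle] = data

    def access_do(self, handle: str) -> Dissemination:
        """ACCESS_DO: return a dissemination of the requested do."""
        return Dissemination(handle=handle, data=self._objects[handle])

    def access_ref(self) -> list:
        """ACCESS_REF: advertise other ways to reach do's held here."""
        return list(self.other_access_methods)


repo = Repository(
    name="repo.example.org",
    other_access_methods=["ftp://repo.example.org/pub/"],  # hypothetical
)
repo.deposit_do("example/tr-1", b"report contents")
result = repo.access_do("example/tr-1")
print(result.handle, repo.access_ref())
```

Keeping the interface this small reflects the point that higher-level services such as searching are expected to come from other protocols.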



Terms and Conditions

  • TC are attached to:

    • each do

    • each dissemination

    • each repository

  • TC are a precondition for any operation on the above

  • Repositories responsible for enforcing TC



Terms and Conditions

[Figure 1 from 95 TR-1593: diagram relating repository, digital object, dissemination, data, and terms and conditions, with 1 and N cardinality labels on the links.]



Digital Objects: Terms and Conditions

  • Set by originator and/or repository

  • Can be arbitrarily complex, but generally consist of:

    • permissions: read, write, etc.

    • authentication - person, group, etc.

    • payment

    • 3rd party intervention (possibly in support of the above)
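Terms and conditions acting as a precondition for an operation can be sketched as a check the repository runs before it returns anything. This is a simplified illustration covering only permissions and a notion of who is authenticated; payment and third-party intervention are left out, and all names are hypothetical.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when the terms and conditions do not allow the operation."""


@dataclass(frozen=True)
class TermsAndConditions:
    permissions: frozenset    # operations allowed, e.g. {"read", "write"}
    allowed_users: frozenset  # users assumed to be already authenticated

    def permits(self, user: str, operation: str) -> bool:
        return user in self.allowed_users and operation in self.permissions


def access(objects: dict, terms: dict, handle: str, user: str) -> bytes:
    """Return the data for `handle` only if its terms permit a read by `user`."""
    # The repository is responsible for enforcing the check before any access.
    if not terms[handle].permits(user, "read"):
        raise AccessDenied(f"{user} may not read {handle}")
    return objects[handle]


objects = {"example/tr-1": b"report contents"}
terms = {
    "example/tr-1": TermsAndConditions(
        permissions=frozenset({"read"}),
        allowed_users=frozenset({"alice"}),
    )
}
print(access(objects, terms, "example/tr-1", "alice"))
```

A fuller model would attach separate terms to the repository and to each dissemination as well, since the slides note that terms and conditions apply at all three levels.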



Readings

  • Kahn, R. & Wilensky, R. 1995. A Framework for Distributed Digital Object Services. http://WWW.CNRI.Reston.VA.US/home/cstr/arch/k-w.html

  • Arms, W.Y. 1995. Key Concepts in the Architecture of the Digital Library. In: D-Lib Magazine. http://www.dlib.org/dlib/July95/07arms.html

  • Marc VanHeyningen. 1994. The Unified Computer Science Technical Report Index: Lessons in indexing diverse resources. http://www.cs.indiana.edu/ucstri/paper/paper.html

