CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel [email protected]


Lecture 5 A research perspective on Digital Libraries

URLs to some of these DLs

ADS: http://adswww.harvard.edu/

NCSTRL: http://www.ncstrl.org

UCSTRI: http://www.cs.indiana.edu:800/cstr/cover.html

arXiv: http://arXiv.org

LTRS: http://techreports.larc.nasa.gov/ltrs/

NTRS: http://techreports.larc.nasa.gov/cgi-bin/NTRS

DL Architectural Review

Assumptions made in this perspective

  • things start with TCP/IP connectivity
  • distribute full content (reports, software, etc.)
    • not only metadata
DL Architecture History: approach 1

1. Build special client and server (generally using Motif/X11, Tcl/Tk, etc.), and use TCP/IP as the transport protocol only

  • pros: rich functionality
  • cons: high development cost, client distribution problem
  • observation: many of these projects spent more time building the interfaces, protocols, searching, etc. than populating their DL!
DL Architecture History: approach 2

2. Use standard protocols built upon TCP/IP: SMTP, FTP, Gopher, WAIS, HTTP, etc.

  • con: less functionality (restricted by protocol)
  • pros: less development cost, uses commonly available clients
  • observation: this approach is now the most common
  • The ones listed on slide 2 fit into this category
Early TCP/IP DLs
  • a very old one: IETF: http://www.ietf.org/
  • Internet RFCs
  • Very first TCP/IP DL?
Early TCP/IP DLs
  • Netlib
    • http://www.netlib.org/
    • begun in 1985, distributing mathematical software via e-mail (SMTP)
    • other access methods and protocols added (ftp, X11 client, http)
Los Alamos arXiv
  • Physics pre-print server
    • http://xxx.lanl.gov/ == http://arXiv.org
    • begun in 1991 as an e-mail service to exchange TeX source of pre-prints in high energy physics
    • ftp and http access added shortly afterwards
    • Now THE communication channel in Physics
    • Paul Ginsparg
Characteristics of early TCP/IP, non-HTTP DLs
  • Useful
    • could get the “thing” that you were looking for
  • Constrained by transport protocol
    • SMTP, FTP, etc. interface inherently “clunky”
    • Higher level services such as searching, sophisticated browsing, etc. difficult to implement
  • Small scale
    • would the same systems work well if the holdings went from 100’s or 1000’s to millions?
Characteristics of early TCP/IP, HTTP DLs
  • Initial HTTP implementations / conversions pretty much provided incremental steps in DL improvement
    • a “nice” ftp interface, maybe with better searching and browsing
    • but the nature of the DLs changed little
      • LTRS is an example of an HTTP DL that is really: FTP + Searching (WAIS) + Browsing
      • http://techreports.larc.nasa.gov/ltrs/
      • Also check out user interface of http://arXiv.org
Early TCP/IP, HTTP DLs
  • But http is a very general transport protocol, and it is possible to build even higher level protocols on top of it
  • Combine this with the expressive HTTP client (web browser), and there is a lot of potential
  • Dienst
    • (http://www.ncstrl.org/Dienst/htdocs/Info/protocol4.html)
    • builds an actual DL protocol on top of HTTP
      • 1994 -- the first to do so?
  • Open Archives Initiative: metadata harvesting protocol on top of HTTP
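As a rough illustration of what "a protocol on top of HTTP" means in practice, the sketch below builds request URLs in which the protocol-level operation travels as ordinary HTTP query parameters. The base URL is hypothetical, and the `verb` and `metadataPrefix` names follow the later OAI-PMH 2.0 specification, which may differ from the version current when these slides were written.

```python
from urllib.parse import urlencode


def harvest_request_url(base_url, verb, **arguments):
    """Build a GET URL for a protocol request layered on plain HTTP."""
    # The protocol operation and its arguments become query parameters.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

list_records = harvest_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
identify = harvest_request_url(BASE_URL, "Identify")

print(list_records)
print(identify)
```

Any standard HTTP client can issue these requests, which is the point of the approach: the DL protocol adds meaning to the request without needing a special transport or client.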
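Sophistication increases, tracks meet

[Figure: sophistication (vertical axis) against time (horizontal axis) for two tracks, the library automation track and the research track. Labels recoverable from the figure, from least to most sophisticated: e-mail; ftp / gopher; http (LTRS, e-print, Netlib, etc.); http with Dienst.]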

A Framework for Distributed Digital Object Services

Kahn/Wilensky Framework [Kahn 1995]

  • 1995
  • A high level document
  • Almost a definition of key concepts, terminologies, … for next generation DLs
  • Foundation for a research discipline?
  • Not detailed enough to be a real architecture.
  • Architecture is independent of the type of data stored in the DL
KWF: key terms
  • digital object (do)
    • A do is a data structure that contains
      • Digital data; data is typed (cf. MIME)
      • Persistent Key Metadata; especially handle
      • Other metadata (for instance Terms and Conditions)
  • handle
    • a handle is a unique, persistent name for a do
  • repository
    • The place where do’s live
    • Has unique global name
  • Repository Access Protocol (RAP)
    • To deposit/access do’s in repositories
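The key terms above can be read as a small data structure. The following is a minimal sketch, not the framework's own definition: the class names, field names, and the example handle are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KeyMetadata:
    # The handle is the unique, persistent name of the digital object.
    handle: str


@dataclass
class DigitalObject:
    data: bytes                 # the digital data itself
    data_type: str              # data is typed, in the spirit of a MIME type
    key_metadata: KeyMetadata   # persistent key metadata, especially the handle
    other_metadata: dict = field(default_factory=dict)  # e.g. terms and conditions


# Hypothetical technical report stored as a digital object.
report = DigitalObject(
    data=b"%PDF-1.4 ...",
    data_type="application/pdf",
    key_metadata=KeyMetadata(handle="example.repo/tr-0001"),
    other_metadata={"terms_and_conditions": "read: anyone"},
)

print(report.key_metadata.handle)
```

Making `KeyMetadata` frozen is a design choice in this sketch that mirrors the idea of persistent key metadata; the framework itself does not prescribe an implementation.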
KWF: flow
  • An originator makes a digital object (do), which consists of
    • Data
    • Key-Metadata: handle
  • The handle comes from a handle generator
  • The do can go in a repository, at which point the do becomes a stored do
  • The repository keeps, per do,
    • a properties record
      • Key metadata: handle
      • Other metadata: Terms and conditions
    • a transaction record
  • The repository registers the do’s handle with a handle server, at which point the do becomes a registered do
  • A client accesses/deposits the do in repositories by means of the Repository Access Protocol
  • What the client receives as a result of an access to a do is a dissemination.

Digital objects
  • do = data + key-metadata
    • data is typed; core types include:
      • bit-sequence / set-of-bit-sequences
      • digital-object / set-of-digital-objects
      • handle / set-of-handles
    • other types can be defined, and registered with a global type registry
      • definition and registration left undefined
      • ~ similar to MIME
    • key-metadata includes handle
    • possibly other metadata (left undefined in KWF)
Digital objects
  • Composite do’s:
    • a do with data of type digital-object
    • non-composite do’s are elemental do’s
    • composite do’s can, for instance, be used to collect similar works together
      • a composite do then contains a do for each work of Shakespeare...
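A composite do can be pictured as a tree whose leaves are elemental do's. The sketch below assumes a simplified `DigitalObject` class and treats both the `digital-object` and `set-of-digital-objects` data types as composite; the handles and the traversal helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union

COMPOSITE_TYPES = {"digital-object", "set-of-digital-objects"}


@dataclass
class DigitalObject:
    handle: str
    data_type: str
    data: Union[bytes, "DigitalObject", List["DigitalObject"]]

    def is_composite(self) -> bool:
        # A do is composite when its data is itself one or more do's.
        return self.data_type in COMPOSITE_TYPES


def elemental_objects(do: DigitalObject) -> Iterator[DigitalObject]:
    """Yield the elemental do's reachable from `do`, depth first."""
    if not do.is_composite():
        yield do
        return
    members = do.data if isinstance(do.data, list) else [do.data]
    for member in members:
        yield from elemental_objects(member)


# Hypothetical collection: one composite do holding a do per work.
hamlet = DigitalObject("example/hamlet", "bit-sequence", b"To be, or not to be")
macbeth = DigitalObject("example/macbeth", "bit-sequence", b"Tomorrow, and tomorrow")
works = DigitalObject("example/shakespeare", "set-of-digital-objects", [hamlet, macbeth])

print([do.handle for do in elemental_objects(works)])
```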
Changing digital objects
  • Mutable do’s can be changed once placed in a repository
    • key-metadata cannot be changed
    • the do’s handle never changes!
  • Immutable do’s cannot be changed once placed in a repository
    • however, they can be deleted
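One way to picture these rules is an object whose handle is read-only in every case and whose data can be replaced only when the object was deposited as mutable. This is a sketch under assumed names (`StoredDigitalObject`, `replace_data`), not an interface defined by the framework.

```python
class StoredDigitalObject:
    """A do as held by a repository, with a mutability flag fixed at deposit."""

    def __init__(self, handle, data, mutable):
        self._handle = handle
        self._data = data
        self._mutable = mutable

    @property
    def handle(self):
        # No setter is defined: the handle never changes, mutable or not.
        return self._handle

    @property
    def data(self):
        return self._data

    def replace_data(self, new_data):
        if not self._mutable:
            raise PermissionError("immutable digital object cannot be changed")
        self._data = new_data


draft = StoredDigitalObject("example/draft", b"v1", mutable=True)
draft.replace_data(b"v2")  # allowed: the do is mutable

final = StoredDigitalObject("example/final", b"v1", mutable=False)
try:
    final.replace_data(b"v2")
except PermissionError as error:
    print(error)
```

Deletion is left out of the class on purpose: removing a do is an operation on the repository that holds it, so it remains possible for immutable do's as well.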
Handles
  • Guest lecture by Professor Arms 02/19
Repositories
  • A network accessible storage system in which digital objects may be stored for possible subsequent access or retrieval
  • A stored do is a do that resides in a repository
  • A registered do is a do that the repository has registered with a handle server
    • storing and registering can be the same or different processes
Repositories
  • A repository keeps a properties record for each do
    • contains key-metadata and any other metadata the repository chooses to keep
  • A do may have a transaction record associated with it in a repository
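The difference between a stored do and a registered do, and the two per-do records, can be sketched as follows. The `HandleServer` and `Repository` classes, their method names, and the format of the transaction entries are assumptions for illustration; storing and registering are kept as separate steps here, although the slides note they may be one process.

```python
from dataclasses import dataclass, field


class HandleServer:
    """Maps handles to the name of the repository that registered them."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, repository_name):
        self._locations[handle] = repository_name

    def resolve(self, handle):
        return self._locations.get(handle)


@dataclass
class PropertiesRecord:
    handle: str                                         # key-metadata
    other_metadata: dict = field(default_factory=dict)  # whatever the repository keeps


class Repository:
    def __init__(self, name, handle_server):
        self.name = name
        self.handle_server = handle_server
        self.properties = {}    # handle -> PropertiesRecord
        self.transactions = {}  # handle -> list of transaction entries
        self.registered = set()

    def store(self, handle, **other_metadata):
        # After this step the do is a stored do.
        self.properties[handle] = PropertiesRecord(handle, other_metadata)
        self.transactions[handle] = ["stored"]

    def register(self, handle):
        # After this step the do is a registered do.
        if handle not in self.properties:
            raise KeyError(f"{handle} is not stored in {self.name}")
        self.handle_server.register(handle, self.name)
        self.registered.add(handle)
        self.transactions[handle].append("registered")


server = HandleServer()
repo = Repository("example-repo", server)
repo.store("example/tr-0001", terms_and_conditions="read: anyone")
resolved_before = server.resolve("example/tr-0001")
repo.register("example/tr-0001")
resolved_after = server.resolve("example/tr-0001")
print(resolved_before, resolved_after)
```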
Repository Access Protocol
  • “Protocol” may be misleading; it’s really just the concept for a protocol
  • RAP is designed to be simple; higher level services should come from other protocols
  • KWF defines 3 basic operation classes:
    • ACCESS_DO [metadata; key-metadata, digital object]
      • A dissemination of a do is the result of a request to access a do
    • DEPOSIT_DO [metadata; key-metadata, digital object]
    • ACCESS_REF
      • this is a means to tell the world about other ways (protocols) to access do’s in the repository.
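Since RAP is a concept rather than a wire protocol, the three operation classes can be sketched as methods on an in-memory repository. The class name, the argument names, the `part` selector, and the shape of the returned dissemination are all assumptions here; KWF does not fix any of them.

```python
class RapRepository:
    """In-memory stand-in for a repository offering the three RAP operation classes."""

    def __init__(self, name, other_access_methods=()):
        self.name = name
        self._objects = {}
        self._other_access_methods = list(other_access_methods)

    def deposit_do(self, handle, data, metadata=None):
        # DEPOSIT_DO: place a digital object and its metadata in the repository.
        self._objects[handle] = {
            "key-metadata": {"handle": handle},
            "metadata": dict(metadata or {}),
            "data": data,
        }

    def access_do(self, handle, part="digital object"):
        # ACCESS_DO: request metadata, key-metadata, or the digital object itself.
        stored = self._objects[handle]
        if part in ("metadata", "key-metadata"):
            return stored[part]
        if part == "digital object":
            # What the client receives is a dissemination of the do.
            return {"handle": handle, "data": stored["data"]}
        raise ValueError(f"unknown part: {part!r}")

    def access_ref(self):
        # ACCESS_REF: advertise other ways (protocols) to reach the do's.
        return list(self._other_access_methods)


repo = RapRepository(
    "example-repo",
    other_access_methods=[("http", "http://repository.example.org/")],  # hypothetical
)
repo.deposit_do("example/tr-0001", b"report body", metadata={"title": "A report"})
dissemination = repo.access_do("example/tr-0001")
print(dissemination["handle"], repo.access_ref())
```

Keeping the interface this small reflects the slide's point that RAP is meant to be simple, with searching and browsing left to other protocols.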
Terms and Conditions
  • TC are attached to:
    • each do
    • each dissemination
    • each repository
  • TC are a precondition for any operation on the above
  • Repositories responsible for enforcing TC
Terms and Conditions

[Figure 1 from 95 TR-1593: diagram relating a repository, its digital objects (1 to N), and their disseminations. The repository, each digital object, and each dissemination carry their own terms and conditions; each digital object and each dissemination also carries data.]

Digital Objects: Terms and Conditions
  • Set by originator and/or repository
  • Can be arbitrarily complex, but generally consist of:
    • permissions: read, write, etc.
    • authentication - person, group, etc.
    • payment
    • 3rd party intervention (possibly in support of the above)
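A repository enforcing terms and conditions has to combine these elements into a yes/no decision before an operation proceeds. The sketch below covers only permissions, group-based authentication, and payment; the class, the function, and the group names are hypothetical, and real terms and conditions can be arbitrarily more complex.

```python
from dataclasses import dataclass, field


@dataclass
class TermsAndConditions:
    # operation name -> set of groups allowed to perform it
    permissions: dict = field(default_factory=dict)
    # amount that must be paid before any operation is allowed
    fee: float = 0.0


def operation_permitted(tc, operation, requester_groups, amount_paid=0.0):
    """Return True when the requester satisfies the terms for this operation."""
    allowed_groups = tc.permissions.get(operation, set())
    authorised = bool(allowed_groups & set(requester_groups))
    paid = amount_paid >= tc.fee
    return authorised and paid


report_tc = TermsAndConditions(
    permissions={"read": {"public", "staff"}, "write": {"staff"}},
    fee=0.0,
)
paid_tc = TermsAndConditions(permissions={"read": {"public"}}, fee=5.0)

print(operation_permitted(report_tc, "read", ["public"]))
print(operation_permitted(report_tc, "write", ["public"]))
print(operation_permitted(paid_tc, "read", ["public"], amount_paid=5.0))
```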
Readings
  • Kahn, R. & Wilensky, R. 1995. A Framework for Distributed Digital Object Services. http://WWW.CNRI.Reston.VA.US/home/cstr/arch/k-w.html
  • Arms, W.Y. 1995. Key Concepts in the Architecture of the Digital Library. In: D-Lib Magazine. http://www.dlib.org/dlib/July95/07arms.html
  • VanHeyningen, M. 1994. The Unified Computer Science Technical Report Index: Lessons in indexing diverse resources. http://www.cs.indiana.edu/ucstri/paper/paper.html