University Library Experience CDL Case Study

30 June 2005 • John Kunze, California Digital Library

California Digital Library • A university library with no books, students, or faculty • Central services for 10 campus libraries • Content hosting: electronic texts, web-based material, datasets, finding aids • Linked: California museums & archives • Plus a Digital Preservation Program

What’s digital preservation? • Safeguarding electronic information • Viability (intact bit streams) • Renderability (by machines) • Understandability (by humans) • There’s no preservation if we don’t know what it’s called • CDL core need for persistent identifiers

What’s a persistent identifier? • An identifier that is valid for long enough • “Valid” and “long enough” are service- and user-dependent • What’s an identifier? It’s an association between a string and a thing. It follows that: • An id is not a string of data (good) • An id is a matter of opinion, not fact; there will be at least one other provider, in series if not in parallel, or your objects die with you (inconvenient) • Same thing, two strings; or same string, two things • Often: same string, different metadata • Often: same string, parallel things diverging over time due to different preservation practices (e.g., migrations)

Accepting some disorder • Long term preservation won’t happen unless objects can change residence and diverge • Campus snapshot to CDL; subsequent snapshots • Publisher to dim CDL archive; later CDL to SS? • Better if object lives in several places at once • Eventually, Producer loses control of copies • Multiple opinions and practices will flourish • Static, id-based persistence claims soon irrelevant • “urn:…”, “hdl:…”, etc. reflect hopes of people long gone • Not pretty, but the alternative (loss) is worse

Agreeing to disagree • What we say, but shouldn’t (not loudly): • Don’t re-assign a persistent id to something else • Or don’t replace a persistent object with another • What we do: • Knowingly replace our persistent objects (typos, drafts, format conversions, home page redesign) • Honestly provide a real kind of persistence, but with very different replacement policies • Won’t have one way within CDL, let alone without

Diverse persistence practice • How dissimilar must two objects be before they get different ids? • CDL’s home-grown Digital Preservation Repository (open source) is self-service: • Lets the Submitter decide • Makes preservation a joint responsibility • Requirement: need to be able to tell users what flavor of permanence is in effect

CDL Persistent Ids Must… • Identify, whether or not the object is at hand • It may not be convenient, helpful, or permitted for you to inspect the object itself -- metadata needed • Convey different flavors of permanence • Lead to access (if authorized) • Not strictly an “identification” problem, but it is the “404 not found” that we need to fix • Be valid for some longish period • Be carried on, in, or with the object

How to choose an id scheme • All CDL requirements are purely about service • Candidate schemes: URL, PURL, URN, ARK, Handle, DOI, MD5, GUID, ISxx, … • CDL gets no direct service help from any scheme; no scheme or syntax confers persistence of any kind • We then ask: which schemes are lowest cost and lowest risk?

Myths to fight against • Harmful Fallacy 1. A URL is a location, and is therefore inherently unstable. (ridiculous) • Harmful Fallacy 2. Explicit server/resolver names make URLs inherently unstable. • So “loc.gov” is less stable than “handle.net” and the implicit global resolvers that it depends on? • Harmful Fallacy 3. HTTP-based resolvers will not scale for persistent access. (Google) • Harmful Fallacy 4. URLs are the problem. • “Cool URLs don’t break” -- Tim Berners-Lee

Impersistence - big factors • Bankruptcy - no successor found • Loss of funding - no successor found • Loss of political support • War, social upheaval, natural disaster • Scheme impact: zero

Impersistence - lesser factors • Deliberately or accidentally, objects are • Removed • Replaced • Moved without setting up a redirect • Everyone has an indirection mechanism, though most don’t use it • Scheme impact: zero

Impersistence - small factors • Your org likes persistent ids in principle, but • It does not know that vanilla web servers trivially support 500,000 redirect directives • It lacks the expertise or staff to maintain a web server, a two-column database table, and a nightly report writer that generates the server config file • Scheme impact: zero
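The two-column table and nightly report writer mentioned on this slide can be quite small. The following is a minimal sketch, assuming an Apache-style web server whose mod_alias `Redirect` directive takes an old path and a target URL. The table contents, host names, and function name are hypothetical; in practice the rows would come from a database query rather than a literal list.

```python
# Hypothetical two-column table: old local path -> current URL.
# In a real deployment these rows would be read from a database.
REDIRECTS = [
    ("/ark:/12345/x1", "https://repo.example.org/objects/x1"),
    ("/ark:/12345/x2", "https://mirror.example.org/store/x2"),
]


def render_redirects(rows):
    """Return one Apache mod_alias 'Redirect' line per table row."""
    lines = []
    for old_path, target in rows:
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        if any(ch.isspace() for ch in old_path + target):
            raise ValueError(f"whitespace not allowed: {old_path!r} -> {target!r}")
        lines.append(f"Redirect permanent {old_path} {target}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A nightly job could write this text to a file that the
    # server configuration includes, then reload the server.
    print(render_redirects(REDIRECTS), end="")
```

Regenerating the whole file from the table each night keeps the table as the single source of truth; the server config is only a report derived from it.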

Scheme costs and risks • Every modern service must indefinitely support, and find or be given replacements for, at least: • Web server, web browser, and DNS • In addition, URN, Handle, and DOI resolution need a global proxy or a plugin for every access • ARK could use a plugin, but doesn’t need it • Handle and DOI also require • You to maintain an extra local server • The community to maintain a set of global servers • For the CDL • Handle and DOI come with highest risk • ARK comes with lowest risk

Persistence - indirect factors • CDL’s persistence requirements call for an id scheme (not service) connecting users to • metadata • whether and what kind of persistence • sub-object and variant inferences • core ids on proxy failure (gracefully) • Scheme impact: ARK provides these • A scheme is not a service (DOI is not CrossRef) • When choosing a scheme, we wanted to remain independent of extra external service providers
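One way to picture "core ids on proxy failure" is that the identifier survives when the host name in front of it stops working. The sketch below assumes an ARK of the shape `ark:/NAAN/Name`, with a five-digit authority number, embedded in an ordinary URL. It also assumes that trailing `?` characters are a request for metadata or a persistence statement rather than part of the name; the slides do not spell out that syntax. All host names and function names are hypothetical.

```python
import re

# Assumed shape: "ark:", an optional "/", a 5-digit name assigning
# authority number (NAAN), "/", then the assigned name.
ARK_RE = re.compile(r"ark:/?(\d{5})/(\S+)")


def core_ark(url_or_id):
    """Return the host-independent ARK found in the input, or None."""
    match = ARK_RE.search(url_or_id)
    if match is None:
        return None
    naan, name = match.groups()
    # Assumption: trailing '?' or '??' asks for metadata or a
    # persistence statement and is not part of the identifier.
    name = name.rstrip("?")
    return f"ark:/{naan}/{name}"


def reattach(core, resolver):
    """Put a recovered core ARK behind a different (hypothetical) resolver."""
    return resolver.rstrip("/") + "/" + core


if __name__ == "__main__":
    dead = "http://old-proxy.example.org/ark:/12345/x6np1wh8k"
    core = core_ark(dead)
    print(core)
    print(reattach(core, "http://new-resolver.example.net/"))
```

Because the core id is recoverable from the string alone, a user holding a dead URL can still try the same identifier at another provider.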

Our Stuff vs Their Stuff • Persistence can be split into • the Our Stuff Problem • the Their Stuff Problem • It makes no sense for CDL to assign persistent ids to Their Stuff • Their Stuff can be hugely important to our users, but we don’t control it and cannot vouch for it • Where we can afford it, we track Their Stuff with PURLs • CDL does assign persistent ids to Our Stuff

Distribution of Id Assignment • Objects ingested in flows from other libraries per submission agreements • Each object has an ARK after ingest • Either it has it already • Or we give it one upon entry • Campuses can mint their own ARKs or rely on our minting service • Their own campus ARK namespace is theirs to divide up as they wish
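Minting within a campus namespace can be sketched as follows. This is an illustration only, not CDL's minting service: the authority number, the campus prefix, the name length, and the vowel-free alphabet are all assumptions chosen so that minted names are opaque and unlikely to spell words.

```python
import secrets

# Assumed alphabet: digits plus a vowel-free set of letters
# (also omitting 'l' and 'y') so names stay opaque.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def mint(naan, prefix, issued, length=8):
    """Mint an ARK under naan/prefix that is not already in `issued`.

    `issued` is a set of previously minted ARKs; it is updated in place.
    """
    while True:
        name = "".join(secrets.choice(ALPHABET) for _ in range(length))
        ark = f"ark:/{naan}/{prefix}{name}"
        if ark not in issued:
            issued.add(ark)
            return ark


if __name__ == "__main__":
    issued = set()
    # Hypothetical authority number 12345 and campus prefix "c1".
    print(mint("12345", "c1", issued))
    print(mint("12345", "c1", issued))
```

A campus that divides its namespace further would simply use more than one prefix, each with its own record of issued names.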

Opaque ids with semantic extensions • CDL dilemma: • opaque ids are needed for names that age and travel well • Semantically laden ids are helpful in providing many id services • Hybrid: • opaque ids are used to name abstract preservation objects • Semantic and sometimes transient extensions address components inside of objects (the set of components evolves over time anyway)
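The hybrid can be illustrated by splitting an identifier into its opaque object name and whatever extension follows it. The sketch assumes that an extension begins at the first `/` or `.` after the object name; that convention and the sample identifier are assumptions for illustration, not taken from the slides.

```python
def split_ark(ark):
    """Split an ARK into (opaque object id, semantic extension).

    Assumption: the extension starts at the first "/" or "." that
    follows the object name; it is "" when there is none.
    """
    label = "ark:/"
    if not ark.startswith(label):
        raise ValueError(f"not an ARK: {ark!r}")
    naan, sep, tail = ark[len(label):].partition("/")
    if not sep or not naan or not tail:
        raise ValueError(f"missing authority number or name: {ark!r}")
    # Positions of the first "/" and "." in the name part, if present.
    marks = [i for i in (tail.find("/"), tail.find(".")) if i != -1]
    cut = min(marks, default=len(tail))
    return f"ark:/{naan}/{tail[:cut]}", tail[cut:]


if __name__ == "__main__":
    obj, ext = split_ark("ark:/12345/x54xz321/s3/f8.05v.tiff")
    print(obj)  # the opaque part, which names the preservation object
    print(ext)  # the semantic, possibly transient, component address
```

Only the first element of the pair needs to age well; the extension can change as the set of components inside the object evolves.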
