Persistent identifiers: the 7 levels of identification

Juha Hakala, Helsinki University Library
ELAG 2005, 1-3 June 2005, CERN

Persistence?
• Persistence depends not on the identifier itself, but on the legal, organisational and technical infrastructure behind it
• ISSN would collapse without the ISSN standard, a community using it according to generally accepted principles, the ISSN International Centre governing the system, and the ISSN database linking the non-semantic (that is, dumb) identifiers to serials
• Even a technically brilliant system may be discontinued if its mission breaks apart

“Normal” identifiers and resolution services
• Resolution services are a new kind of identifier which makes traditional identifier systems actionable in the Internet (Web) environment
• Resolve: provide a link from a reference to the resource
• Prime examples: DOI and URN
• Both may encompass, at least in principle, any existing identifier (URN namespaces have been defined for e.g. ISSN and ISBN)
• Both are useless without an existing identifier adding flesh to the DOI/URN bones
• From now on, only “normal” identifiers will be discussed
• Complex enough topic for 35 minutes…
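A minimal sketch of how an existing identifier can be wrapped in a URN, assuming the general `urn:<namespace>:<identifier>` layout and the `issn` namespace mentioned above. The function names are illustrative only, and the resolution step itself (finding the resource behind the URN) is not shown.

```python
from typing import Tuple


def to_urn(namespace: str, identifier: str) -> str:
    """Wrap an existing identifier in a URN of the given namespace."""
    return f"urn:{namespace.lower()}:{identifier}"


def from_urn(urn: str) -> Tuple[str, str]:
    """Split a URN back into (namespace, original identifier)."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN: {urn!r}")
    return parts[1].lower(), parts[2]


if __name__ == "__main__":
    urn = to_urn("ISSN", "0095-4403")
    print(urn)
    print(from_urn(urn))
```

The URN adds nothing but a wrapper: without the ISSN inside it, there is nothing to resolve.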

Seven levels of identifiers
• After the collapse of the integrated library system paradigm and the implementation of IR portals, digital asset management systems, digital archives and e-resource management systems, what do we need to identify?
• This can be analysed from top to bottom, from organisations to search attributes
• Such an analysis may reveal gaps and help in the design of identifier systems

Top level: libraries
• The identifier system must cover not only libraries but at least the other memory organisations as well
• National-level systems (union catalogue codes) exist; the Internet / Web made it necessary to develop an international system
• ISIL, International Standard Identifier for Libraries and Related Organisations; ISO 15511
• Consists of an ISO country code, a hyphen and the union catalogue (UC) code
  • FI-H (Helsinki University Library)
• The Danish Library Authority hosts the ISIL IC; national centres have been established in some countries, but the system needs wider acceptance
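The ISIL structure described above (country code, hyphen, union catalogue code) can be sketched as follows. This is an illustration under the slide's simplified description, not a full ISO 15511 validator. The `Isil` type and function names are assumptions made for this example.

```python
from typing import NamedTuple


class Isil(NamedTuple):
    country: str     # ISO country code, e.g. "FI"
    local_code: str  # national union catalogue code, e.g. "H"


def parse_isil(value: str) -> Isil:
    """Split an ISIL at its first hyphen into country code and local code."""
    country, sep, local_code = value.partition("-")
    if not sep or len(country) != 2 or not country.isalpha() or not local_code:
        raise ValueError(f"expected <country code>-<local code>, got {value!r}")
    return Isil(country.upper(), local_code)


def format_isil(isil: Isil) -> str:
    """Rebuild the textual form from its two parts."""
    return f"{isil.country}-{isil.local_code}"


if __name__ == "__main__":
    hul = parse_isil("FI-H")
    print(hul, format_isil(hul))
```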

2nd level: collections and services
• These identifiers are important for IR portals; international exchange of collection and service (e.g. a Z39.50 server) metadata is cumbersome unless there is an efficient means of duplicate control
• These identifiers do not exist yet
• Helsinki University Library is writing a New Work Item proposal for ISO TC 46 on ISCI, the International Standard Collection Identifier
• There are no ongoing efforts to develop a service identifier

ISCI: design principles
• Will be based on ISIL in order to allow efficient decentralization of ISCI assignment and the creation of an Internet-wide resolution service without a global ISCI database
• Will consist of three parts: the ISIL, a delimiting character (colon) and the actual (colon-less) collection identifier
  • FI-H:Slavica (Slavic collection in HUL)
• Is there a need for an international support center?
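A rough sketch of how the proposed three-part syntax could support decentralised resolution: the ISIL part selects the owning organisation, which then resolves its own collection identifiers. Everything here is hypothetical, since ISCI is only a proposal: the `RESOLVERS` table, the `example.org` URL and the function names are invented for illustration.

```python
from typing import Dict, Tuple
from urllib.parse import quote

# Hypothetical table mapping an ISIL to that organisation's own resolver.
# In a real deployment each organisation would maintain its own entry,
# so no global ISCI database would be needed.
RESOLVERS: Dict[str, str] = {
    "FI-H": "https://example.org/collections/",
}


def split_isci(isci: str) -> Tuple[str, str]:
    """Split an ISCI into (ISIL, collection identifier).

    The collection part is colon-less by design, so splitting at the LAST
    colon stays unambiguous even if the ISIL part contained a colon.
    """
    isil, sep, collection = isci.rpartition(":")
    if not sep or not isil or not collection:
        raise ValueError(f"expected <ISIL>:<collection>, got {isci!r}")
    return isil, collection


def resolve_isci(isci: str, resolvers: Dict[str, str] = RESOLVERS) -> str:
    """Build the address of a collection from the owner's resolver base."""
    isil, collection = split_isci(isci)
    if isil not in resolvers:
        raise LookupError(f"no resolver known for ISIL {isil!r}")
    return resolvers[isil] + quote(collection)


if __name__ == "__main__":
    print(split_isci("FI-H:Slavica"))
    print(resolve_isci("FI-H:Slavica"))
```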

3rd level: authors
• International exchange of authority records can be made more efficient with persistent and unique identification
• ISADN, International Standard Authority Data Number, has been discussed for quite a few years, but it is not yet formally under development
• Retrospective assignment may create interesting “ownership” problems, especially if the future ISADN contains the country of origin
• Is Franz Liszt German or Hungarian?

4th level: identifiers for works
• ISWC: International Standard Musical Work Code
  • T-345246800-1
  • Letter T, 9-digit unique number and check digit
• ISAN: International Standard Audiovisual Number
  • ISAN 006A-15FA-002B-C95F-A
  • 12-digit (hexadecimal) root segment + 4-digit segment for episode identification and check digit
• ISTC: International Standard Text Code
  • ISTC OA9 2005 12B4A105 6
  • agency code, year, work element and check digit
• These systems were developed at the same time, but their syntax and terminology vary
• This should not complicate usage too much

ISTC/ISWC/ISAN issues
• Many library system vendors are investigating the possibility of implementing FRBR, but few have managed to do it (VTLS, OCLC)
• Once an ILMS is FRBRized, implementing work identifiers is essential, but there is more than technology to consider here:
  • Do we need to pay for these identifiers, even when generating them retrospectively for old works?
  • Who will establish the national centers and create the identifiers (and the work-level records they require)?

5th level: manifestations
• This used to be familiar terrain for us
• ISBN, ISSN, NBN belong here
• E-publishing has destroyed the old status quo:
  • Systems that worked well for decades have adaptation problems, each for different reasons
  • It is not yet entirely clear whether the revisions done (or planned) are sufficient

E-problems with manifestations
• It is increasingly difficult to define valid “targets”
• ISSN could be assigned to any Web site out there
• Publishers want to give ISBNs to anything that can in principle be sold separately (e-book chapters, images within a book, teddy bears on sale in book stores)
• The number of things to be identified is growing fast; this will cause syntax problems (the ISBN revision was done to make more room) and staffing problems in ISSN/ISBN national centers
• There is no point in giving a persistent identifier to a non-persistent resource; therefore resources must be identified, described and archived, which is a labour-intensive process

Case ISBN
• The old ISBN was running out of number space
• Several extension options were discussed:
  • 13, 16, even 32-digit ISBNs
• The idea of making ISBN a “dumb” number like ISSN was voted down (for this the librarians in the WG are to blame)
• The new ISBN will be compliant with the EAN system
  • 13 digits, starting with 978 or 979, or in the future with something else to extend the scope of the system further
  • New check digit calculation algorithm adopted from EAN
• It is possible to convert an old ISBN to the new one (starting with 978) and back
• Publishers retroconvert to new ISBNs; libraries will keep the old ones
• ILMS need to do sophisticated things with old/new ISBNs
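The conversion between old and new ISBNs mentioned above can be sketched as follows: prefix 978, drop the old check digit and recompute it with the EAN algorithm (alternating weights 1 and 3, modulo 10); going back, strip 978 and recompute the old modulo 11 check digit. The function names are illustrative, and the sample ISBN is a commonly used textbook example, not one taken from the slides.

```python
def _compact(isbn: str) -> str:
    """Remove hyphens and spaces; upper-case a possible 'x' check digit."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn13_check_digit(first12: str) -> str:
    """EAN check digit: weights 1,3,1,3,... over the first 12 digits, mod 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_digit(first9: str) -> str:
    """Old check digit: weights 10 down to 2 over the first 9 digits, mod 11."""
    total = sum(int(d) * (10 - i) for i, d in enumerate(first9))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn10_to_isbn13(isbn10: str) -> str:
    core = _compact(isbn10)
    if len(core) != 10 or not core[:9].isdigit() or isbn10_check_digit(core[:9]) != core[9]:
        raise ValueError(f"invalid 10-digit ISBN: {isbn10!r}")
    body = "978" + core[:9]
    return body + isbn13_check_digit(body)


def isbn13_to_isbn10(isbn13: str) -> str:
    core = _compact(isbn13)
    if len(core) != 13 or not core.isdigit() or isbn13_check_digit(core[:12]) != core[12]:
        raise ValueError(f"invalid 13-digit ISBN: {isbn13!r}")
    if not core.startswith("978"):
        # Only 978-prefixed numbers have an old 10-digit equivalent.
        raise ValueError("only ISBNs starting with 978 can be converted back")
    body = core[3:12]
    return body + isbn10_check_digit(body)


if __name__ == "__main__":
    new = isbn10_to_isbn13("0-306-40615-2")
    print(new, isbn13_to_isbn10(new))
```

A library system that keeps old ISBNs while publishers send new ones would need this kind of normalisation on both indexing and search, which is one of the "sophisticated things" referred to above. Numbers starting with 979 have no old form, so the reverse conversion must be allowed to fail.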

6th level: component parts
• Libraries have not done too well in this area in the past due to staff limitations
• We catalogue serials but not the articles
• E-publishing may force us to change tactics, since even component parts are now separate items that can be accessed directly
• Manual processing must be partially or fully replaced by automated processes; this will also have an impact on identifiers
• Automated ID generation solves the staff bottleneck

SICI: still alive, but not kicking
• Serial Item and Contribution Identifier, 1991-
• NISO standard; has never really taken off
• Can be generated programmatically provided that the article is structured enough
• 0095-4403(199502/03)21:3<>1.0.TX;2-Y
• Complex; consists of the ISSN and additional segments identifying the issue and the article within it
• Publishers have their own systems such as PII, which have been easier (for them) to create and maintain
• Still not clear how popular SICI will eventually be
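To illustrate how such a string decomposes, here is a loose sketch that splits the example above into its visible parts and verifies the embedded ISSN's check digit (weights 8 down to 2, modulo 11). The regular expression is derived from this one example rather than from the full standard, the field names are assumptions, and the SICI's own final check character is extracted but not verified.

```python
import re
from typing import NamedTuple


class Sici(NamedTuple):
    issn: str
    chronology: str    # date of the issue, e.g. "199502/03"
    enumeration: str   # volume:issue, e.g. "21:3"
    contribution: str  # text between "<" and ">", empty for a whole issue
    control: str       # segment before ";", e.g. "1.0.TX"
    version: str
    check: str         # final check character (not verified here)


_SICI = re.compile(
    r"^(\d{4}-\d{3}[\dX])\(([^)]*)\)([^<]*)<([^>]*)>([^;]*);(\d+)-(.)$"
)


def parse_sici(text: str) -> Sici:
    """Split a SICI string into its segments (structure only)."""
    match = _SICI.match(text)
    if match is None:
        raise ValueError(f"unrecognised SICI layout: {text!r}")
    return Sici(*match.groups())


def issn_is_valid(issn: str) -> bool:
    """Check an ISSN's final character against its first seven digits."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return chars[7] == ("X" if value == 10 else str(value))


if __name__ == "__main__":
    sici = parse_sici("0095-4403(199502/03)21:3<>1.0.TX;2-Y")
    print(sici)
    print(issn_is_valid(sici.issn))
```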

BICI: Dead On Arrival, or conflict between theory and practice
• Book Item and Component Identifier
• NISO draft standard, never completed
• Consists of the ISBN and additional segments identifying the relevant section within the book; may be automatically generated
• Publishers and book stores prefer to rely solely on ISBN in their systems
• Using ISBN only is not a neat solution (it uses a lot of ISBNs, and giving an ISBN both to the thing as a whole and to its component parts is messy)

7th level: search attributes etc.
• Within Z39.50, sets (e.g. attribute and diagnostic sets), record syntaxes etc. are identified by ISO Object Identifiers (OIDs)
• MARC21: 1.2.840.10003.5.10
• Bib-1: 1.2.840.10003.3.1; term examples:
  • Author: 1.2.840.10003.3.1.1.1003
  • Name: 1.2.840.10003.3.1.1.1002
  • Author-name personal: 1.2.840.10003.3.1.1.1004
  • Personal name: 1.2.840.10003.3.1.1.1
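Because OIDs are hierarchical, a program can tell which attribute set a term belongs to by comparing arcs. The sketch below uses only the OIDs listed above. The helper names are illustrative, and the comparison is done on integer arcs rather than on string prefixes, since a plain string prefix test would wrongly treat 1.2.840.10003.3.10 as part of 1.2.840.10003.3.1.

```python
from typing import Optional, Tuple

BIB1 = "1.2.840.10003.3.1"

# Term examples copied from the slide above.
BIB1_TERMS = {
    "1.2.840.10003.3.1.1.1003": "Author",
    "1.2.840.10003.3.1.1.1002": "Name",
    "1.2.840.10003.3.1.1.1004": "Author-name personal",
    "1.2.840.10003.3.1.1.1": "Personal name",
}


def arcs(oid: str) -> Tuple[int, ...]:
    """Turn a dotted OID string into a tuple of integer arcs."""
    return tuple(int(part) for part in oid.split("."))


def is_under(oid: str, parent: str) -> bool:
    """True if `oid` lies strictly below `parent` in the OID tree."""
    child, root = arcs(oid), arcs(parent)
    return len(child) > len(root) and child[: len(root)] == root


def bib1_term(oid: str) -> Optional[str]:
    """Look up a Bib-1 term name, or None if it is not in the table."""
    return BIB1_TERMS.get(oid) if is_under(oid, BIB1) else None


if __name__ == "__main__":
    print(bib1_term("1.2.840.10003.3.1.1.1003"))
    print(is_under("1.2.840.10003.5.10", BIB1))
```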

OID problems
• The Bib-1 attribute set is not quite as coherent as it should be: many (domestic) search attributes are missing from it, and sometimes there are too many alternatives
• The attempt to develop Bib-2 failed, and even if we succeed in the future, the co-existence of Bib-1 and Bib-2 may cause trouble
• ISO OIDs can be applied to anything
• It is not clear how to use them in a “bibliographic context”, e.g. to identify government publications or parts of them; this is currently being investigated in Finland

Conclusion
• E-publishing and new applications (and their novel metadata) have expanded both the scope of identifiers needed and the requirements placed on existing systems, especially at the manifestation and component part levels
• Standards developers have reacted to these needs, but progress has been slow; still, in some areas system builders have been even slower

Conclusion (2)
• An identifier is more than just a string of characters
• There must be an agent which assigns the identifier to a resource and (usually) describes it
• As long as all parts in this picture are stable, identification is a routine process
• Agent breakdowns have been the most common reason for problems in the past
• A number of national ISSN agencies are inactive
• E-resources have destroyed the balance, and it may take a while before the identification system works again in “business as usual” style
