To name: persistently: ay, there’s the rub
Andy Powell, UKOLN, University of Bath, a.powell@ukoln.ac.uk
DCC Persistent Identifiers Workshop, University of Glasgow – June 2005
UKOLN: a centre of expertise in digital information management (www.ukoln.ac.uk, www.bath.ac.uk)


  1. To name: persistently: ay, there’s the rub
     Andy Powell, UKOLN, University of Bath, a.powell@ukoln.ac.uk
     DCC Persistent Identifiers Workshop, University of Glasgow – June 2005
     UKOLN: a centre of expertise in digital information management (www.ukoln.ac.uk, www.bath.ac.uk)

  2. Contents
     • beginning
     • middle
     • end
     • chance for discussion
     • note: middle section will focus on technical functional requirements
     DCC Persistent Identifiers

  3. Overall theme
     the only useful form of identifier is the URI,
     the only useful form of URI is one that conforms to a registered scheme,
     despite appearances to the contrary… the most useful registered scheme is the ‘http’ URI scheme
     [slide annotation: “digital”]
     note use of “is” in the first line… “can be mapped to a URI” is not good enough!

  4. Introduction
     • PI meetings often try to focus on functional requirements…
     • uniqueness, persistence, resolvability, usability, transportability, simplicity of assignment, applicability to digital and non-digital resources, cost, blah, blah, blah…
     • difficult, because requirements are abstract… not clear how to meet them in practice - difficult to move forward
     • forget abstract functional requirements… let’s get technical

  5. Space/time continuum
     “Internet space” represents some combination of geographic / network distance and domain / administration / application distance… “time” represents time…
     [Diagram labels: Internet space, my application, time]

  6. Space/time continuum
     applications that are closely related in terms of space or time are likely to share understanding about identifiers – often by hardwiring knowledge into code
     [Diagram labels: Internet space, other application, other application, my application, time]

  7. Space/time continuum
     applications that are “distant” are less likely to share understanding about identifiers
     knowledge is locked within a domain or lost over time or, worse, both
     [Diagram labels: other application, Internet space, other application, my application, time]

  8. Pushing the boundaries
     • how do we push the boundaries of identifier understanding further out across the space/time continuum?
     • standards, standards, standards
     • go with the crowd
     • stop telling people existing stuff is broken… or at least, we stop pretending that we can do better!
     • use what already works and is widely deployed
     • focus on existing technical standards
     • stop inventing, start doing

  9. W3C Web Architecture
     • Global Identifiers - Global naming leads to global network effects. (Principle)
     • Identify with URIs - To benefit from and increase the value of the World Wide Web, agents should provide URIs as identifiers for resources. (Good practice)
     • URIs Identify a Single Resource - Assign distinct URIs to distinct resources. (Constraint)
     • Avoiding URI aliases - A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource. (Good practice)
     • Consistent URI usage - An agent that receives a URI SHOULD refer to the associated resource using the same URI, character-by-character. (Good practice)
     • Reuse URI schemes - A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources. (Good practice)
     • URI opacity - Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource. (Good practice)
     http://www.w3.org/TR/webarch/

  10. URIs and XML
     • in order for identifiers to work across the space/time continuum we need
       • global and unambiguous identifiers
       • global and unambiguous ways of exchanging identifiers between software applications
     • the Uniform Resource Identifier is the only option for the former
     • XML is the “best” option for the latter
       • and in particular the XML Schema anyURI datatype
     “global” means “very widely deployed technology” – e.g. in my mum’s house!

  11. Use URIs
     the only useful form of identifier is the URI,
     the only useful form of URI is one that conforms to a registered scheme,
     despite appearances to the contrary… the most useful registered scheme is the ‘http’ URI scheme
     [slide annotation: “digital”]

  12. URI scheme registration
     • registration of URI schemes is important
     • registration helps to ensure uniqueness
     • without registration the same scheme can be used in ignorance by someone, somewhere else in the space/time continuum
     • registration doesn’t guarantee that every URI with a scheme will be unique – but it helps!
     • without registration there are no guarantees of uniqueness or persistence

  13. Use registered URI schemes
     the only useful form of identifier is the URI,
     the only useful form of URI is one that conforms to a registered scheme,
     despite appearances to the contrary… the most useful registered scheme is the ‘http’ URI scheme
     [slide annotation: “digital”]

  14. Semantic Web
     • the Semantic Web relies on URIs to identify resources
     • resources == stuff (digital/physical/conceptual things)
     • the Semantic Web is built on a global, shared body of metadata (RDF)
     • terms in the metadata language are identified using URIs
     • those URIs must be “resolvable”… in order that “reasoning” can be performed

  15. Note: dereferencing URIs
     • the Web Architecture talks about “dereferencing” URIs rather than “resolving” them
     • in many cases “dereferencing” a URI results in obtaining a “representation” of the resource
     • several representations may be available
     • the Web Architecture says:
       Available representation - A URI owner SHOULD provide representations of the resource it identifies (Good practice)
     • only ‘http’ URIs offer a simple, widely deployed dereferencing mechanism
     http://www.w3.org/TR/webarch/
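The dereferencing step described on this slide can be sketched in a few lines. This is only an illustration, assuming Python's standard urllib; the helper name build_dereference_request and the choice of media types are hypothetical, and whether a server actually offers a given representation is up to the URI owner.

```python
from urllib.request import Request

def build_dereference_request(uri, media_type="text/html"):
    """Build (but do not send) an HTTP GET that asks for one representation.

    Hypothetical helper for illustration: the Accept header is how a client
    says which of the available representations it would prefer.
    """
    return Request(uri, headers={"Accept": media_type}, method="GET")

# Two requests for the same resource, each asking for a different representation.
html_req = build_dereference_request("http://purl.org/dc/terms/audience")
rdf_req = build_dereference_request(
    "http://purl.org/dc/terms/audience", "application/rdf+xml"
)

# Actually dereferencing needs network access, so it is left commented out:
# from urllib.request import urlopen
# body = urlopen(rdf_req).read()
```

The same URI is used in both requests; only the requested representation differs.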

  16. Quick quiz…
     • what kind of identifier is this?
     • 1361-3200
     1361-3200 is an ISSN; it identifies UKOLN’s Ariadne magazine

  17. Quick quiz…
     • what kind of identifier is this?
     • 1361-3200
     • info:lccn/n78890351
     info:lccn/n78890351 is an ‘info’ URI; it identifies a Library of Congress metadata record (an authority file), but I don’t know which

  18. Quick quiz…
     • what kind of identifier is this?
     • 1361-3200
     • info:lccn/n78890351
     • 10.1000/182
     10.1000/182 is a DOI; it is also a Handle; it identifies the “DOI Handbook”

  19. Quick quiz…
     • what kind of identifier is this?
     • 1361-3200
     • info:lccn/n78890351
     • 10.1000/182
     • 79 Ceti
     79 Ceti is a Flamsteed Designation; it identifies a 7th magnitude star in the constellation of Cetus

  20. Quick quiz…
     • what kind of identifier is this?
     • 1361-3200
     • info:lccn/n78890351
     • 10.1000/182
     • 79 Ceti
     • http://purl.org/dc/terms/audience
     http://purl.org/dc/terms/audience is an ‘http’ URI, a.k.a. a URL; it is also a PURL; it identifies a DCMI metadata term – i.e. a conceptual resource

  21. Quick quiz…
     • what kind of identifier is this?
     • 1361-3200
     • info:lccn/n78890351
     • 10.1000/182
     • 79 Ceti
     • http://purl.org/dc/terms/audience
     • only one of these can be understood and dereferenced by every single bit of currently deployed Internet software…
     Question: why would we want to use anything else?
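The quiz can be replayed in code. The following is a minimal sketch, assuming Python's standard urllib.parse as a stand-in for generic, identifier-agnostic software; the function names are hypothetical, and treating ‘https’ alongside ‘http’ is an assumption not made on the slides.

```python
from urllib.parse import urlsplit

# The five identifiers from the quiz slides.
QUIZ = [
    "1361-3200",
    "info:lccn/n78890351",
    "10.1000/182",
    "79 Ceti",
    "http://purl.org/dc/terms/audience",
]

def uri_scheme(identifier):
    """Return the URI scheme found in the string, or None if there is none."""
    return urlsplit(identifier).scheme or None

def generically_dereferenceable(identifier):
    """True if generic Web software would know how to fetch this identifier.

    Assumption: 'https' is accepted as well as the 'http' scheme from the talk.
    """
    return uri_scheme(identifier) in ("http", "https")

for identifier in QUIZ:
    print(identifier, uri_scheme(identifier), generically_dereferenceable(identifier))
```

Without extra, hard-wired knowledge, the ISSN, the DOI and the Flamsteed designation are just strings with no scheme at all; the ‘info’ URI is recognisably a URI but offers nothing to fetch.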

  22. But…
     But, ‘http’ URIs are just locators aren’t they?
     • ‘http’ URIs are identifiers, just like any other
     But, ‘http’ URIs can only be used for Web resources, accessed over HTTP, can’t they?
     • ‘http’ URIs can identify any resource – digital, physical or conceptual
     But, ‘http’ URIs break every 30 days or something, don’t they?
     • ‘http’ URIs don’t have to break, they just need to be assigned/managed carefully

  23. Use ‘http’ URIs
     the only useful form of identifier is the URI,
     the only useful form of URI is one that conforms to a registered scheme,
     despite appearances to the contrary… the most useful registered scheme is the ‘http’ URI scheme

  24. Case study 1 - LOM
     • XML example from IEEE LOM…
     • typical of many XML / identifier encodings
     …
     <general>
       <identifier>
         <catalog>DOI</catalog>
         <entry>10.1000/182</entry>
       </identifier>
       …
     </general>
     …

  25. Case study 1 - LOM
     • the “catalogue” indicates what kind of identifier is being used
     …
     <general>
       <identifier>
         <catalog>URI</catalog>
         <entry>http://purl.org/poi/rdn.ac.uk/12-34</entry>
       </identifier>
       …
     </general>
     …
     the <catalog> value is just a string; the <entry> value is just a string
     nothing in the XML schema indicates that this is a URI – some applications will be blind

  26. Case study 1 - LOM
     • an improved version might be…
     …
     <general>
       <identifier>http://purl.org/poi/rdn.ac.uk/12-34</identifier>
       …
     </general>
     …
     where the XML schema indicates that this is of datatype anyURI
     therefore all XML-aware applications will know this is a URI

  27. Case study 1 - LOM
     • an improved version might be…
     …
     <general>
       <identifier>doi:10.1000/182</identifier>
       …
     </general>
     …
     the URI syntax provides the “catalogue” from the original example
     all XML-aware applications will know this is a URI, some will know it is a DOI
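The difference between the two LOM encodings shows up as soon as an application tries to read them. The sketch below is an illustration only: the fragments are simplified (no namespaces, unlike real IEEE LOM instances), and the CATALOG_TO_PREFIX table and the function name identifier_uris are hypothetical stand-ins for knowledge an application would have to hard-wire.

```python
import xml.etree.ElementTree as ET

# Simplified fragments modelled on the slides (real LOM XML uses namespaces).
CATALOG_FORM = (
    "<general><identifier>"
    "<catalog>DOI</catalog><entry>10.1000/182</entry>"
    "</identifier></general>"
)
DIRECT_FORM = "<general><identifier>doi:10.1000/182</identifier></general>"

# Hypothetical application-specific table: each catalogue name the
# application happens to know about, and how to turn its entries into URIs.
CATALOG_TO_PREFIX = {"DOI": "doi:", "URI": ""}

def identifier_uris(xml_text):
    """Collect identifier URIs from either encoding of <identifier>."""
    uris = []
    for ident in ET.fromstring(xml_text).iter("identifier"):
        catalog = ident.findtext("catalog")
        entry = ident.findtext("entry")
        if catalog is not None and entry is not None:
            # Catalogue/entry form: only usable with hard-wired knowledge.
            prefix = CATALOG_TO_PREFIX.get(catalog.strip())
            if prefix is None:
                continue  # unknown catalogue: the application is "blind"
            uris.append(prefix + entry.strip())
        elif ident.text and ident.text.strip():
            # Direct form: the element content already is the URI.
            uris.append(ident.text.strip())
    return uris

print(identifier_uris(CATALOG_FORM))
print(identifier_uris(DIRECT_FORM))
```

Both calls yield the same URI, but only the direct form gets there without a lookup table that every consuming application would need to share.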

  28. Case study 2 - DOI
     http://dx.doi.org/10.1000/182
     • the DOI “10.1000/182” can be encoded as a URI in several ways:
       • http://dx.doi.org/10.1000/182
       • doi:10.1000/182
       • urn:doi:10.1000/182
     • however…
       • DOI-aware applications have to have knowledge of these encodings hard-coded into them (since the DOI itself is just a string)
       • nothing in the URI specification indicates that these URIs are equivalent
       • note that the 2nd and 3rd forms are not registered
     Question: which of these forms is most persistent and why?
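What “hard-coded knowledge” means here can be shown with a small sketch. It assumes only the three encodings listed on the slide; the names KNOWN_DOI_PREFIXES and bare_doi are hypothetical.

```python
# The three URI encodings of a DOI listed on the slide. This tuple is exactly
# the kind of knowledge a DOI-aware application has to carry around itself.
KNOWN_DOI_PREFIXES = ("http://dx.doi.org/", "urn:doi:", "doi:")

def bare_doi(uri):
    """Return the bare DOI string if the URI uses a known encoding, else None."""
    for prefix in KNOWN_DOI_PREFIXES:
        if uri.startswith(prefix):
            return uri[len(prefix):]
    return None

FORMS = [
    "http://dx.doi.org/10.1000/182",
    "doi:10.1000/182",
    "urn:doi:10.1000/182",
]

# As plain URIs the three forms are simply three different strings...
print(len(set(FORMS)))
# ...and only the hard-coded prefix table reveals that they name the same DOI.
print({bare_doi(form) for form in FORMS})
```

An application without this table, or one written before a new encoding appears, has no way to tell that the three URIs are meant to be equivalent.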

  29. Case study 3 – ‘info’ URI
     http://info-uri.info/registry/
     • consider the following ‘info’ URI:
       • info:lccn/n78890351
     • ‘info’ URIs are explicitly defined to be non-dereferenceable
     • therefore, there is no documented way of finding out what this URI identifies
     • there is no documented way of getting a representation of the resource it identifies
     • and there is no documented way of finding out any more about it
     Question: how is this useful?
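A short sketch makes the limitation concrete. It assumes the info:namespace/identifier layout of ‘info’ URIs and uses Python's standard urllib.parse; the function name split_info_uri is hypothetical.

```python
from urllib.parse import urlsplit

def split_info_uri(uri):
    """Split an 'info' URI into (namespace, identifier).

    This is all that generic code can extract; nothing in the URI says
    where, or whether, a representation of the resource can be obtained.
    """
    parts = urlsplit(uri)
    if parts.scheme != "info":
        raise ValueError("not an 'info' URI: " + uri)
    namespace, _, identifier = parts.path.partition("/")
    return namespace, identifier

print(split_info_uri("info:lccn/n78890351"))
```

Anything beyond the namespace and the local identifier has to come from knowledge held outside the URI itself.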

  30. But, what happens when…
     • …the Internet disappears?
     • who cares!
     • we’ll deal with it
     • we’ll be with the crowd
     • there’ll be a global transition
     • everyone will need to deal with it
     • every software component on the whole Internet will need fixing
     • the people left behind will be the people who invented their own solutions

  31. Conclusion
     the only useful form of identifier is the URI,
     the only useful form of URI is one that conforms to a registered scheme,
     the most useful registered scheme is the ‘http’ URI scheme
     URIs are only really useful if encoding syntaxes explicitly indicate “this is a URI”
     [slide annotation: “digital”]

  32. Questions and discussion?

  33. Discussion
     • What do we need to do to make ‘http’ URIs more persistent? (Are ‘http’ URIs the answer after all?)
     • Do we have functional requirements that aren’t met by ‘http’ URIs? (When and why should we create new URI schemes?)
