slide1 l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Electronic Publishing, Digital Archiving and Licensing workshop Frankfurt October 20 2005 Norman Paskin, International D PowerPoint Presentation
Download Presentation
Electronic Publishing, Digital Archiving and Licensing workshop Frankfurt October 20 2005 Norman Paskin, International D

Loading in 2 Seconds...

play fullscreen
1 / 37

Electronic Publishing, Digital Archiving and Licensing workshop Frankfurt October 20 2005 Norman Paskin, International D - PowerPoint PPT Presentation


  • 93 Views
  • Uploaded on

Structured Management of Digital Content and Licenses. Electronic Publishing, Digital Archiving and Licensing workshop Frankfurt October 20 2005 Norman Paskin, International DOI Foundation n.paskin@doi.org. Structured Management of Digital Content and Licenses. Outline:

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Electronic Publishing, Digital Archiving and Licensing workshop Frankfurt October 20 2005 Norman Paskin, International D' - nyoko


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

Structured Management of Digital Content and Licenses

Electronic Publishing, Digital Archiving and Licensing workshop

Frankfurt October 20 2005

Norman Paskin, International DOI Foundation

n.paskin@doi.org

slide2

Structured Management of Digital Content and Licenses

Outline:

  • Define terms in the title
  • Two principles: identification and description.
    • Identification: resolution, persistence, interoperability
      • Internet identifiers; URI, URN, is DNS enough?
      • What do we need to identify?
    • Description: what is it we are identifying?
      • Metadata: taxonomies, ontologies, folksonomies
  • Summary of key issues
slide3

Structured Management of Digital Content and Licenses

Management:

  • know what it is you are managing – label it
  • Require a unique label for an entity involved in a DRM transaction
  • An identifier string, which can do something

Digital Content and Licenses:

  • Enties in transactions: stuff, people, deals (= content, users, licences)
    • indecs: “people make stuff, people do deals about stuff; stuff is used by people”
  • Same system for all these entities, using internet standards

Structured:

  • Objective: capable of being used in distributed systems
  • someone else can come along at another time/place, and may need to link to another system, etc
  • So must be persistent and interoperable (which means: description)
slide4

ID

Two principles for persistent identification

resource

1. Obvious:IDENTIFICATION Assign ID to resource

Once assigned the number must identify the same resource

  • Beyond the lifetime of the resource, or the assigner
  • Less obvious: DESCRIPTION Assign Resource to ID
      • The resource must be described
    • If the Resource is not alwayssecurely and exclusively bound to the ID – then:
    • Describe the resource “content” [with precision]
    • Failure to do this will ultimately break interoperability
  • How far do we go in each? Depends on what is “good enough”
    • Technologists have focussed on (1) [and “bags of bits/data structures”]
    • The content/rights world on (2) [and focus on “intellectual content”]: ISBN etc
    • Both viewpoints valid
    • (2) is now becoming more relevant – because more open/distributed systems
slide5

Structured Management of Digital Content and Licenses

Outline:

  • Explaining the terms in the title
  • Two principles: identification and description
    • Identification: resolution, persistence, interoperability
      • Internet identifiers; URI, URN, is DNS enough?
      • What do we need to identify?
    • Description: what is it we are identifying?
      • Metadata: taxonomies, ontologies, folksonomies
  • Summary of key issues
slide6

Identifiers do something

  • Identifier: A unique label for an entity involved in a transaction
  • Note the ambiguity of the word “identifier”:
    • Label (e.g. ISBN)
    • Specification (e.g. URN) scheme for making actionable

+ = Implemented system (e.g. DOI, Bar code) “actionable identifier”

  • But pure versus actionable identifier is not a clear distinction – any pure identifier may become actionable in the future through new specifications being applied
  • Resolution: The process in which an identifier is the input (a request) to a network service to receive in return a specific output.
  • Both concepts are in principle neutral as to technology implementation
  • Abstract concepts, but implementations typically at least “internet” TCP/IP (the more general the better, e.g. not just “Web”)
slide7

Technical and social infrastructure issues

Persistence

  • "It is intended that the lifetime of a [persistent identifier] be permanent. That is, the [persistent identifier] will be globally unique forever, and may well be used as a reference to a resource well beyond the lifetime of the resource it identifies or of any naming authority involved in the assignment of its name.“
  • [Persistent Identifier] = URN in IETF RFC 1737: Functional Requirements for Uniform Resource Names. (http://www.ietf.org/rfc/rfc1737.txt)
slide8

Interoperability

  • Persistence can be seen as just one aspect of this wider concept
  • “persistence is interoperability with the future”
  • We know what we mean, but others may not.
    • Identifiers assigned in one context may be encountered, and may be re-used, in another place or time [= persistence] - without consulting the assigner. You can’t assume that your assumptions made on assignment will be known to someone else. Interoperability = the possibility of use in services outside the direct control of the issuing assigner
  • This will be key for publishing, archiving and licensing – all assume distributed access
slide9

Persistent identifiers on the Internet: DNS

  • Domain Name System: DNS
    • designed primarily as a level of indirection for IP addresses: 132.157.24.3 is a machine. Move server.acme.com to another machine, you don't have to tell everyone but just change your DNS records so it now points to 132.157.24.6 instead.
  • A number of assumptions that were valid at that time now pose problems :
    • All the data is public: difficult for use in applications like voice over IP.
    • The data can be implicitly trusted: you need some way to trust that you are talking to who you think you are talking to.
    • The names can all be in ASCII – but Chinese etc is important after all.
    • Administration will be done by sys admins sitting at consoles: no need for an administrative protocol. Ownership is then naturally at the level of whoever owns the servers and pays the sys admins.
    • Control of the naming authority will not be a problem: ICANN, Root zone file is a very active UN row now going on (WSIS)
  • DNS designed for servers:
    • When Tim B- L came out with a plan for linking documents it seemed natural to build on DNS: tack file paths on the end of the server names in order to identify the business ends of the links: URLs (now URIs).
    • But now the documents are identified starting with the names of the organizations that own the servers they sit on. A problem.
slide10

Persistent identifiers on the Internet: Handle

  • DNS is not essential to the underlying TCP/IP network, but just to the current use of that network. One proposed solution to DNS problems; Handle system (1995+)
    • identify objects, not servers.
    • objects can be anything identified: accounts, names, ids, phone #s, content…
    • explicit improvements for identifying verylarge number of digital objects.
    • not all the data is public: individual values within a handle can be private.
    • all transactions can be certified.
    • any Unicode character set can be used.
    • separation between who owns and controls the handle versus who happens to run the servers (distributed administration, ownership at the handle level)
    • gets rid of semantics in the identifier: makes it easy to move ownership across organizations without your objects having someone else's name.
    • Freely available to be used as engine underneath other named identifiers. Does not need DNS, but can work with DNS.
  • Basis of DOI system – advantages as above, proven for publishers. Used in Grid computing, US govt applications, DOI, etc though most DOIs are used in translated http proxy form
  • “The governance of the DNS will not completely encompass future Internet addressing and navigation…The system…is not static but a technology capable of evolving into a better form. As such, the current system should not be treated as sacrosanct, but amenable to innovation”. Kenneth Neil Cukier (Technology Correspondent, The Economist)
  • However, most identifier methodologies still use the DNS basis: URI, URN
slide11

URI : observations

  • Web based (W3C led). Still much wider uptake than DOI etc. Takes DNS as basis. Problems:
    • URLs, as currently understood, are demonstrably not persistent: calling them URIs doesn’t fix that
    • Inherits DNS problems (last slide) especially the name/place confusion
    • Many important recent developments are not based on URIs in any way e.g. VoIP (Skype), Peer-to-peer
    • Some are URI based but with different registration requirements (MPEG-21)
    • The Web is not the end point of evolution: grid computing, mobile computing
    • The IETF RFC consensus process, and the separate existence of W3C, leads to ongoing debate and standards with a vague existence (Cf. ISO standards: W3C web site on naming and addressing is “incomplete”)
  • Persistence = organisation is now becoming recognised, and technical solution should follow
    • e.g. “commitment statement” in archiving is seen as important (ARK)
    • e.g. IDF has established rules for social network support of DOIs
    • Importance ofsocial infrastructure
    • URN mechanism (>10 years old) meant to be solution:
    • But still not implemented – recent renewed interest may help
slide12

URN: observations

  • URN (Uniform Resource Name): using DNS to add names to locations
    • Part of mid90s IETF design concept: URL/URN/URC
    • Still inherits problems of DNS, but better than URL
    • But not widely used
  • A single point re-direction to URLs using an http: proxy server
  • Any existing identifier can add the URN spec:
    • isbn:12345678 as a URN = urn:isbn:123456789.
  • Assumes a DNS-based Resolution Discovery Service (RDS)
    • No such widely deployed RDS schemes currently exist: Browsers cannot action URN strings without some additional programming “plug-in”.
  • Some have been built for individual communities
    • Example: Life Science identifier LSID
    • fine but also needs a social infrastructure
  • functionally gives nothing beyond the functionality achieved by coherent management of the corresponding URLs –
    • but they work for that community, by adding that coherent management .
  • URN code or plug-in promised for CENDI (US government users). Some movement to “re-define URN”. If that happens and is taken up, it could be significant.
slide13

Identifier systems

  • Each community tends to arrive at its own “good enough for us” solution
    • less focus now on “what is a persistent identifier?” More on “how do we build a system… ”
  • Whatever mechanism, resolvable identifiers must provide:
    • Agreed numbering syntax
    • Resolution mechanism
    • Data model to define “what it is we are identifying”
    • Technical and social infrastructure to implement
  • (compare physical world bar codes, etc)
  • could be assembled ad hoc, or offered as a packaged system (e.g.DOI)
slide14

Identifying entities of all types

  • Resources: most commonly content (Stuff)
  • Licences (some music industry applications now looking at this (Deals)
  • Parties (see earlier InterParty project) including Institutions(people):
  • e.g. exploratory stakeholders' meeting took place Washington DC October 7 to examine the feasibility of an Institution Registry
    • Problem: libraries deliver contact names and numbers, IP address ranges, etc to publishers,
    • Publishers manage this in their access and subscription systems in order to be able to authenticate library users
    • This exchange of information is usually done individually between publishers and libraries; much duplication of effort, no possibility of synergy
    • Institution Registry could at minimum provide a central space to hold this information once only

.

slide15

Structured Management of Digital Content and Licenses

Outline:

  • Explaining the terms in the title
  • Two principles: identification and description
    • Identification: resolution, persistence, interoperability
      • Internet identifiers; URI, URN, is DNS enough?
      • What do we need to identify?
    • Description: what is it we are identifying?
      • Metadata: taxonomies, ontologies, folksonomies
  • Summary of key issues
slide16

Resolution and “What are we identifying?”

  • Resolution: The process in which an identifier is the input (a request) to a network service to receive in return a specific output
  • Identifier identifies an entity.
  • “what I point to” (resolve to and get) is not always “what is identified”,
    • Can identify but not “get” directly things that are intangible (works), or fugitive (performances) or that change: (“Todays NY Times”) or people and concepts….
    • Pointing and clicking can return different things in different contexts, or give multiple options
  • Entities can be physical, abstract, tangible, intangible, things, people, concepts, colours…
  • Resolution provides a mechanism to describe the resource “content” through a service which delivers a description
slide17

What are we identifying?

“what I point to” (resolve to and get) is not always obvious

Document on screen

Abstract work?

Manifestation of abstract work?

Version?

This HTML file?

All/some of these?

slide18

Describing what we are managing

Whatpreciselyare we identifying by this identifier?

How are these things related to other things?

Common approaches:

  • Taxonomies
  • Ontologies
  • Folksonomies
slide19

Taxonomy

  • (Greek) taxis, arrangement; + -nomie, method
  • Division into ordered groups or categories
  • Hierarchical, parent/child relationships
  • Defined area of interest
  • Gives a good way of being unambiguous within a controlled, defined area
  • Best example is Linnean taxonomy of life: the classification of organisms in an ordered system that indicates natural relationships
  • And that illustrates a key point…
slide22

Taxonomy

  • “It’s a Robin”
  • Id = Robin
  • ..and we all know what a Robin looks like…
  • “we know what we mean but others may not”
slide23

Chordata

|

Aves

|

Passeriform

|

Turdidae

|

Erithacus

|

Rubecula

European Robin

slide24

Chordata

|

Aves

|

Passeriform

|

Turdidae

|

Turdus

|

Migratorius

American Robin (different genus)

slide25

Chordata

|

Aves

|

Passeriform

|

Eopsaltridae

|

Petroica

|

Multicolor

Scarlet Robin (Australasia) (different family)

slide26

?

|

?

|

?

|

?

|

?

|

?

Robin (red) (and Batman)

slide27

?

|

?

|

?

|

?

|

?

|

?

Robin Reliant (red)

slide28

Ontologies

  • differ from taxonomic approach:
    • Not just “stamp collecting” but extensible
    • do not follow a rigid/parent child hierarchical structure: terms may inherit meaning from more than one parent
    • a more complex relationship is maintained.
    • Can build on / are more complex than taxonomies
    • Show how taxonomies map to each other
    • May add inference engines etc
  • the proposed third (missing) component of the semantic web:
    • XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean.
    • RDF enables expression of meaning (sets of triples, each triple being rather like the subject, verb and object)
    • Ontologies “will enable machines to comprehend semantic documents and data"
slide29

Ontologies

  • Use underlying data model – a “context model” - to express an events-based structure
    • the accepted ontology approach [context based= events and states]
  • We often think of metadata as “about” things, people, etc
    • static views e.g. about “person A” ; “creation B”
  • Events link things (e.g. to describe rights activities) by relating things and people in the context which generated/used them
    • dynamic views e.g. “A created B”
  • Events description is the key to “rights metadata”
    • all such transactions are contextual (events)
    • describing the event in context, using formal dictionary terms, enables semantic interoperability
  • The common methodology with most uptake and promise is the <indecs> one
    • developed in more detail by CONTECS and by RightsCom
    • MPEG21 RDD the first result of the extended methodology
slide30

2005

Int DOI

Foundation

indecs

(2000)

CONTECS

(2001+)

ISO

MPEG21

RDD

IDF +

ONIX

indecsDD

EU project

-> indecs

Framework

Ltd

IFPI/RIAA, MPA,

IDF, DentsuMMG,

Rightscom

OntologyX

Mi3p etc

1998-2005: Defining what is identified through metadata

Development of indecs 1998-2005

Black = what

Red = who

slide31

Folksonomies

  • Current hot web topic: individuals assign their own keywords to content
  • Examples:
    • www.flickr.com (photo-sharing);
    • http://del.icio.us/ (social bookmarking)
slide34

Folksonomies

  • Rough and ready alternative to traditional information organisation
  • Most people use tags first and foremost to organise their own information in a way that makes sense to them
    • Sharing this creates a side-effect of “vast democratically structured frameworks of organisation”
  • Not much good for managed structured searching/management:
    • e.g. “recipe” “cooking” “barbecue”
    • the Robin problem
  • But don’t write them off:
    • cf Wikipedia (people said it would never work…)
    • imagine some automated organisation/rules/dictionary being added in certain communities
    • imagine links to Autonomy type searching
slide35

Structured Management of Digital Content and Licenses

Outline:

  • Explaining the terms in the title
  • Two principles: identification and description
  • Identification: resolution, persistence, interoperability
    • Internet identifiers; URI, URN, is DNS enough?
    • What do we need to identify?
  • Description: what is it we are identifying?
    • Metadata: taxonomies, ontologies, folksonomies
  • Summary of key issues
slide36

Summary: key issues

  • What are we identifying? [content not just bits]
  • What are we resolving to from this identifier?
  • What, if any, explicit metadata are we making available?
  • How will the social infrastructure be provided?

The mechanisms must allow:

  • Identification of entities of all forms
    • To be used in variety of contexts
  • Appropriate use of metadata at appropriate level
    • Development of ontology tools to describe entity relationships

The logic chain:

Identification  Persistent  Interoperable  Automation  Precision  Logic

slide37

Structured Management of Digital Content and Licenses

Electronic Publishing, Digital Archiving and Licensing workshop

Frankfurt October 20 2005

Norman Paskin, International DOI Foundation

n.paskin@doi.org