
Dublin Core and Emerging Conventions for a Semantic Web

Thomas Baker, Fraunhofer-Gesellschaft, Bonn. ELPUB 2003, Guimaraes, Portugal, 26 June 2003.


Presentation Transcript


  1. Dublin Core and Emerging Conventions for a Semantic Web • Thomas Baker, Fraunhofer-Gesellschaft, Bonn • ELPUB 2003, Guimaraes, Portugal, 26 June 2003

  2. A particular set of metadata terms • Dublin Core as a simple and semantically generic lingua franca • Fifteen “core” elements: Subject, Description, Title… • A metadata "pidgin" for "digital tourists" on a culturally diverse global Web • Limited grammar, easy to learn and use • Enough "as is" for many needs • 33 "element refinements" and 17 "encoding schemes" to qualify the elements for specialized purposes • A small set of 12 resource types for use with dc:type
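To make the "limited grammar" concrete, here is a minimal sketch that checks whether a record uses only the fifteen core element names. The record layout (a dict from element name to a list of string values) and the function name are assumptions for illustration, not a DCMI-defined format.

```python
# The fifteen Dublin Core 1.1 elements (local names).
CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record: dict[str, list[str]]) -> list[str]:
    """Return the keys of `record` that are not core DC element names.

    Hypothetical record layout: element name -> list of string values.
    Every core element is optional and repeatable, so only names are checked.
    """
    return sorted(name for name in record if name not in CORE_ELEMENTS)


record = {
    "title": ["Dublin Core and Emerging Conventions for a Semantic Web"],
    "creator": ["Thomas Baker"],
    "date": ["2003-06-26"],
    "shoesize": ["47"],  # not a DC element, so it is reported
}
print(unknown_elements(record))  # ['shoesize']
```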

  3. A simple data model (resource with properties) • 1996-1998: Collective realization that machine-processability requires a coherent data model • 1996: “Warwick Framework” proposed at DC-2 workshop: DC as one specialized module (“resource discovery”) • 1997: “Qualifiers” proposed for specifying meanings • Some early adopters took this to unintended extremes: “DC.Creator.telephone-number” • 1998: DCMI involvement in emerging Resource Description Framework, clarification of simple data model • 2000: First set of qualifiers approved

  4. A typology of metadata terms ("grammar") • Elements • (core) properties of resources • Element Refinements • properties that semantically refine elements • Encoding Schemes • give context to a metadata value • Vocabulary Terms • constitute controlled lists of possible values
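One practical consequence of this typology is that a consumer which does not understand an element refinement can fall back to the element it refines (often called "dumbing down"). The sketch below assumes a hand-written table of a few dcterms refinements; the table, the `Statement` type, and `dumb_down` are illustrative names, not a DCMI API.

```python
from dataclasses import dataclass
from typing import Optional

# A few element refinements and the elements they refine (illustrative subset).
REFINES = {
    "dcterms:alternative": "dc:title",
    "dcterms:created": "dc:date",
    "dcterms:abstract": "dc:description",
    "dcterms:isPartOf": "dc:relation",
}


@dataclass(frozen=True)
class Statement:
    prop: str                     # an element or an element refinement
    value: str
    scheme: Optional[str] = None  # encoding scheme giving context to the value


def dumb_down(stmt: Statement) -> Statement:
    """Replace a refinement with the element it refines; keep value and scheme."""
    return Statement(REFINES.get(stmt.prop, stmt.prop), stmt.value, stmt.scheme)


created = Statement("dcterms:created", "2003-06-26", scheme="dcterms:W3CDTF")
print(dumb_down(created))
# Statement(prop='dc:date', value='2003-06-26', scheme='dcterms:W3CDTF')
```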

  5. An emergent approach to "structured values" • Implementers sometimes "shoehorn" complex sets of information into a single value • Creator: "name=Tom, affiliation=FHG, shoesize=47" • In practice, a large variety of "structured values" • Labelled strings • Unlabelled strings • Marked-up strings (e.g., LaTeX, HTML) • Secondary resource descriptions (as above) • Post-processing ad-hoc constructs is messy and does not scale • Andy Powell's model: • Elements can have string values (Simple DC) • A further requirement to point to linked metadata?
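Why post-processing ad-hoc constructs does not scale becomes obvious as soon as one tries to parse them. Below is a hypothetical parser for the labelled-string convention in the Creator example: it handles that example, but fails once a value itself contains a comma. Keeping the creator as a separate, secondary description avoids the problem.

```python
def parse_labelled_string(value: str) -> dict[str, str]:
    """Parse an ad-hoc 'key=value, key=value' string (hypothetical convention)."""
    fields: dict[str, str] = {}
    for part in value.split(","):
        key, sep, val = part.strip().partition("=")
        if not sep:
            # A fragment without '=' means the convention was broken (or a
            # value contained a comma); there is no principled way to recover.
            raise ValueError(f"cannot parse fragment {part.strip()!r}")
        fields[key] = val
    return fields


print(parse_labelled_string("name=Tom, affiliation=FHG, shoesize=47"))
# {'name': 'Tom', 'affiliation': 'FHG', 'shoesize': '47'}

try:
    parse_labelled_string("name=Tom, affiliation=Fraunhofer-Gesellschaft, Bonn")
except ValueError as err:
    print(err)  # cannot parse fragment 'Bonn'

# Alternative: a secondary description where each piece of information has its
# own label, so no string ever has to be re-parsed.
creator = {"name": "Tom", "affiliation": "Fraunhofer-Gesellschaft, Bonn"}
```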

  6. A process for community standardization • 1995-1999: open workshops, unruly but stimulating meetings of minds, rough consensus • 2000: qualifier vote: circa 25 voting members of an ad-hoc "Usage Committee" • 2001: smaller Usage Board • Codification of formal process for editorial control • Two two-day face-to-face meetings per year • Mandate and responsibility to maintain standard, approve extensions and clarifications

  7. ...based editorial review by a Usage Board • Term set must evolve as implementors coin new terms and usage patterns emerge • Working groups propose new terms or clarifications • Evaluate in light of grammatical principle, usefulness, clarity of definition, overlap with existing terms • Review application profiles based on Dublin Core • Tiered model of approval status: conforming, recommended, obsolete, registered • Meeting materials, mailing lists, and decisions archived and accessible on the open Web • DCMI as maintenance agency for ISO 15836

  8. A bias towards simple and generic • DCMI Usage Board bias • Strength and value of DC lies in simplicity and generic applicability • Keep the core standard small, generic, and lightweight • Resist temptation to "complexify"– people want and need distinctions, but not in a "small standard" • DCMI Type Vocabulary has just 12 terms: user communities should invent or re-use their own more specific sub-types

  9. A bias towards cooperation and re-use • Help user communities define and use their own extensions • Cooperate with maintainers of specialized vocabularies on forms of mutual recognition • Provide a model for re-use

  10. "Good neighbor" policies • MARC Relators (roles such as "adapter", "artist") • DCMI: "use MARC Relators to refine dc:contributor" • LoC's RDF schema: "MARC Relators (identified with URIs) are sub-properties of dc:contributor" • Encoding Schemes • DCMI term designates Library of Congress Subject Headings (http://purl.org/dc/terms/LCSH) • If LoC coins own term, DCMI should promote its use
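Declaring MARC Relators as sub-properties of dc:contributor means that an application asking for contributors should also find statements made with a relator property. The sketch below does that inference by hand over (subject, property, value) triples. The `marcrel:` prefix and the relator codes `art` (Artist) and `adp` (Adapter) are illustrative; the slide does not give the exact URIs LoC assigned.

```python
# Property -> its direct super-property (rdfs:subPropertyOf); illustrative subset.
SUB_PROPERTY_OF = {
    "marcrel:art": "dc:contributor",  # Artist
    "marcrel:adp": "dc:contributor",  # Adapter
}


def property_and_ancestors(prop: str) -> list[str]:
    """Follow subPropertyOf links upward, stopping if a cycle is found."""
    chain = [prop]
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        if prop in chain:
            break
        chain.append(prop)
    return chain


def values_for(triples: list[tuple[str, str, str]], subject: str, prop: str) -> list[str]:
    """Values of `prop` for `subject`, including those stated via sub-properties."""
    return [o for s, p, o in triples
            if s == subject and prop in property_and_ancestors(p)]


triples = [
    ("ex:book", "dc:title", "An Example"),
    ("ex:book", "marcrel:art", "Ann Artist"),
    ("ex:book", "dc:contributor", "Bob Editor"),
]
print(values_for(triples, "ex:book", "dc:contributor"))  # ['Ann Artist', 'Bob Editor']
```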

  11. A "namespace policy" • All DCMI metadata terms are given unique identity within three namespaces: • http://purl.org/dc/elements/1.1/ - the core elements • http://purl.org/dc/terms/ - all other elements/qualifiers • http://purl.org/dc/dcmitype/ - a Type vocabulary • Example: http://purl.org/dc/elements/1.1/title • Policy on long-term stability of namespace URIs • Changes not substantially “semantic” (i.e., corrections) will not result in change of namespace URIs • “Semantic” changes must trigger a change of name • Version turnover of a “document management” nature will have no effect on namespace URIs
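Because each DCMI term lives in one of the three namespaces, its full URI is the namespace URI followed by the term name. This sketch expands and compacts prefixed names. The prefixes `dc`, `dcterms`, and `dcmitype` are common conventions, assumed here, and are not themselves part of the namespace policy.

```python
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",    # the core elements
    "dcterms": "http://purl.org/dc/terms/",      # all other elements/qualifiers
    "dcmitype": "http://purl.org/dc/dcmitype/",  # the Type vocabulary
}


def expand(qname: str) -> str:
    """Turn a prefixed name such as 'dc:title' into its full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown prefix in {qname!r}")
    return NAMESPACES[prefix] + local


def compact(uri: str) -> str:
    """Turn a full DCMI term URI back into a prefixed name."""
    for prefix, ns in NAMESPACES.items():
        if uri.startswith(ns):
            return f"{prefix}:{uri[len(ns):]}"
    raise ValueError(f"not in a known namespace: {uri!r}")


print(expand("dc:title"))                            # http://purl.org/dc/elements/1.1/title
print(compact("http://purl.org/dc/dcmitype/Text"))  # dcmitype:Text
```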

  12. A typology of metadata vocabularies • Term declarations • Declare a unique set of elements and definitions • Each DCMI term is identified with a URI • Documented in HTML pages, formally declared as RDF schemas • Application profiles • Declare how an application uses which terms in its metadata • May mix-and-match from multiple namespaces

  13. Why application profiles? • People want them! • Most standards have them: IEEE/LOM, MARC, DOI... • As focus of dialogue and semantic negotiation • Deep human need to resist total standardization? • To identify emerging semantics "at the edges" of a standard • To know how colleagues and peers are designing metadata – and avoid "reinventing the wheel" • To harmonize metadata usage within domains: • User communities (DC-Libraries, DC-Government) • Subject gateways (Renardus)

  14. Dublin Core application profiles • Declaration specifying which metadata terms an information provider uses in metadata • Identifies source of terms used • May provide additional documentation • Designed to promote interoperability within the constraints of the Dublin Core model • Draft guidelines sponsored by the European Committee for Standardization (CEN) to be progressed through the DCMI process • http://www.cenorm.be/isss/Workshop/MMI-DC/application-profile-for-comment.pdf • Caution: a documentary format cannot itself guarantee interoperability
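At its simplest, an application profile lists which terms, drawn from which namespaces, an application uses, plus local rules such as which terms are required. The sketch below encodes a hypothetical profile as a dict and reports deviations in a record. The profile contents, the `ex:` namespace, and the required/optional flag are assumptions; as the slide cautions, such a check documents usage but cannot by itself guarantee interoperability.

```python
# Hypothetical profile: term -> True if required, False if optional.
# Terms are mixed and matched from several namespaces.
PROFILE = {
    "dc:title": True,
    "dc:creator": False,
    "dc:subject": False,
    "dcterms:audience": False,
    "ex:courseLevel": False,  # hypothetical locally defined term
}


def check_against_profile(record: dict[str, list[str]],
                          profile: dict[str, bool]) -> list[str]:
    """Return human-readable problems; an empty list means the record conforms."""
    problems = [f"term not in profile: {term}"
                for term in record if term not in profile]
    problems += [f"missing required term: {term}"
                 for term, required in profile.items()
                 if required and not record.get(term)]
    return problems


record = {"dc:creator": ["Thomas Baker"], "dc:rights": ["(c) 2003"]}
for problem in check_against_profile(record, PROFILE):
    print(problem)
# term not in profile: dc:rights
# missing required term: dc:title
```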

  15. A set of encoding practices • Guidelines for encoding metadata records (or embedded metadata) in HTML, XML, RDF • Use of rdfs:label and rdf:value allows nesting of secondary resource descriptions • A model for declaring terms "machine-processably" in RDF • Namespace Policy mandates this, though not specifically RDF • Work item: a model for declaring application profiles machine-processably
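Nesting a secondary description can be sketched in N-Triples: the creator becomes its own resource (here a blank node) whose main value is given with rdf:value (http://www.w3.org/1999/02/22-rdf-syntax-ns#value) and whose display string is given with rdfs:label (http://www.w3.org/2000/01/rdf-schema#label). The resource URI, blank-node label, and the `affiliation` property under example.org are assumptions for illustration.

```python
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
EX_AFFILIATION = "http://example.org/terms/affiliation"  # hypothetical property


def literal(text: str) -> str:
    """Quote a plain N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def nested_creator(resource: str, name: str, label: str, affiliation: str) -> list[str]:
    """Describe the creator of `resource` as a secondary (blank-node) resource."""
    node = "_:creator1"  # blank-node label chosen for illustration
    return [
        f"<{resource}> <{DC_CREATOR}> {node} .",
        f"{node} <{RDF_VALUE}> {literal(name)} .",
        f"{node} <{RDFS_LABEL}> {literal(label)} .",
        f"{node} <{EX_AFFILIATION}> {literal(affiliation)} .",
    ]


for line in nested_creator("http://example.org/talk", "Baker, Thomas",
                           "Thomas Baker", "Fraunhofer-Gesellschaft"):
    print(line)
```

As we read the slide, the point of rdf:value is that a consumer expecting a plain string for dc:creator still has one obvious value to fall back to.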

  16. CORES Resolution

  17. Shared conventions for declaring namespaces? • Cross-community consensus-building • W3C metadata standards and URIs as a basis for interoperability among different standards? • EU CORES Project (2002-2003) • Identify and explore areas of possible agreement among major standards initiatives • Interoperability Forum meeting in Brussels, November 2002

  18. CORES Resolution on Identifying Metadata Elements • http://www.cores-eu.net/interoperability/cores-resolution/ • Whereas • Our metadata standards have “elements” – units of meaning comparable and mappable to elements of other standards, • We agree: • To assign Uniform Resource Identifiers to our elements; • To articulate and publish specific policies regarding the stability, persistence, and maintenance of the URIs assigned to the elements.

  19. Clarifications to the CORES Resolution • URIs not necessarily used in applications "as is" • In metadata records, maybe dc:contributor instead of http://purl.org/dc/elements/1.1/contributor • Signatories decide what to identify with URIs • An individual element? An entire set of elements? A specific historical version of an element? • No implication that URIs will "resolve" to anything • URIs may "get" something with HTTP on Web – or not! • E.g., resolve to a database query? • Resolve to an RDF schema? • Or even resolve to nothing at all ("file not found")!!

  20. Signatories • Eliot Christian, USGS, for GILS • Brian Green, EDItEUR, for ONIX • Rebecca Guenther, Library of Congress, for MARC21 • Keith Jeffery, EuroCRIS, for CERIF • Norman Paskin, Int’l DOI Foundation, for DOI • Robby Robson, IEEE LTSC, for IEEE/LOM • Stuart Weibel, DCMI, for Dublin Core

  21. Signatories’ Action Plan • Action plan, November 2002 – May 2003: • Define and publish URI assignment mechanisms • Assign URIs to elements • Publish URI persistence policies • Article on follow-up scheduled for D-Lib Magazine in July 2003 issue • Taken as a whole, corpus of good-practice policies for others to discuss and emulate

  22. Beyond the CORES Resolution • Benefits for signatories: • Important first step towards future interoperability applications (e.g., mapping, conversion) • Improve "citability" of elements between standards • Potential areas of further work: • Provide persistent URIs for terms in taxonomies and ontologies • Shared conventions on declaring URIs in machine-processable forms • Shared conventions for application profiles and mapping constructs • Shared ontologies as targets for mapping

  23. What exactly is being identified? • Is a particular term the same when used in different contexts? • A single term in a flat namespace? • http://ltsc.ieee.org/LOM/Identifier • Or two terms in a flat namespace? • http://ltsc.ieee.org/LOM/GeneralIdentifier • http://ltsc.ieee.org/LOM/MetadataIdentifier • Or two terms in a hierarchical namespace? • http://ltsc.ieee.org/LOM/General/Identifier • http://ltsc.ieee.org/LOM/Metadata/Identifier
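The three options on this slide differ only in how a term's position in the LOM structure is turned into a URI. The sketch below mints all three from a path such as ('General', 'Identifier'); the base URI is the one the slide uses as an example, and the function names are hypothetical. The single-term style collapses General/Identifier and Metadata/Identifier onto one URI, which is exactly the question of whether they are "the same" term.

```python
BASE = "http://ltsc.ieee.org/LOM/"  # illustrative base URI from the slide


def single_term_uri(path: tuple[str, ...]) -> str:
    """One term per leaf name, regardless of context."""
    return BASE + path[-1]


def flat_uri(path: tuple[str, ...]) -> str:
    """Context folded into one flat name, e.g. GeneralIdentifier."""
    return BASE + "".join(path)


def hierarchical_uri(path: tuple[str, ...]) -> str:
    """Context kept as URI path segments, e.g. General/Identifier."""
    return BASE + "/".join(path)


paths = [("General", "Identifier"), ("Metadata", "Identifier")]
for minter in (single_term_uri, flat_uri, hierarchical_uri):
    uris = {minter(p) for p in paths}
    print(f"{minter.__name__}: {len(uris)} distinct URI(s)")
# single_term_uri: 1 distinct URI(s)
# flat_uri: 2 distinct URI(s)
# hierarchical_uri: 2 distinct URI(s)
```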

  24. What exactly is being identified? • For purposes of identification, is a term "the same" through successive versions? • At first, DC reflected the version in the URI: • http://purl.org/dc/elements/1.1/title • Then DCMI decided to keep URIs stable and define the limits of change in the Namespace Policy • http://purl.org/dc/terms/audience • URIs for DC 1.1 kept for legacy reasons • URIs for successive versions of a term used "behind the scenes" for tracking changes

  25. Publishing and documenting a vocabulary

  26. A method for maintaining (and versioning) a vocabulary • Assume that vocabularies must evolve: • Anticipate need to understand discrete states of the standard • All documents, decisions, and term declarations must evolve • Versioning to support future automated methods for processing legacy metadata • Numbered decisions linked to: • A specific historical version of a term • Supporting documentation for the decision • Historical record of the Usage Board meeting
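One way to link numbered decisions to specific historical versions of a term, while the term's own URI stays stable, is to record each version with a "behind the scenes" identifier, the date it took effect, and the decision that produced it. The record layout, term URI, version identifiers, decision numbers, and dates below are all hypothetical, not DCMI's actual records.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TermVersion:
    term_uri: str     # stable URI used in metadata records
    version_id: str   # "behind the scenes" identifier for this state of the term
    effective: date   # date on which this version took effect
    decision: str     # numbered decision that produced it
    definition: str


TERM = "http://example.org/terms/audience"  # hypothetical term URI
HISTORY = [  # hypothetical history, oldest first
    TermVersion(TERM, "audience-001", date(2001, 5, 1), "decision-01", "First wording."),
    TermVersion(TERM, "audience-002", date(2002, 10, 1), "decision-07", "Corrected wording."),
]


def version_as_of(history: list[TermVersion], term_uri: str, when: date) -> TermVersion:
    """Return the version of `term_uri` in force on `when`."""
    candidates = [v for v in history if v.term_uri == term_uri and v.effective <= when]
    if not candidates:
        raise LookupError(f"{term_uri} was not defined on {when}")
    return max(candidates, key=lambda v: v.effective)


# Interpreting a legacy record created in early 2002:
legacy = version_as_of(HISTORY, TERM, date(2002, 1, 15))
print(legacy.version_id, legacy.decision)  # audience-001 decision-01
```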

  27. Modes for publishing a vocabulary • Multiple publication formats needed • Web pages for human use • RDF schemas for expressing relationships between terms in machine-processable form • OWL ontologies and rules languages will improve expressivity of these constructs • Future schemas may need to express versioning machine-processably • Workflow • Web pages and schemas from a common source • XML data + XSLT scripts – simple, effective
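The workflow on this slide generates Web pages and schemas from a common source using XML data and XSLT. The sketch below shows the same single-source idea in Python instead of XSLT (a swapped-in technique, not DCMI's actual scripts): one list of term records is rendered both as HTML table rows and as an RDF/XML schema. The term record and its definition text are placeholders.

```python
from xml.sax.saxutils import escape, quoteattr

# Single source (hypothetical term record; definition is a placeholder).
TERMS = [
    {"uri": "http://example.org/terms/audience", "label": "Audience",
     "definition": "Placeholder definition text & notes."},
]


def to_html_rows(terms: list[dict[str, str]]) -> str:
    """Render each term as a table row for the human-readable page."""
    return "\n".join(
        f"<tr><td>{escape(t['label'])}</td><td>{escape(t['uri'])}</td>"
        f"<td>{escape(t['definition'])}</td></tr>"
        for t in terms)


def to_rdf_xml(terms: list[dict[str, str]]) -> str:
    """Render the same terms as an RDF schema in RDF/XML."""
    body = "\n".join(
        f"  <rdf:Property rdf:about={quoteattr(t['uri'])}>\n"
        f"    <rdfs:label>{escape(t['label'])}</rdfs:label>\n"
        f"    <rdfs:comment>{escape(t['definition'])}</rdfs:comment>\n"
        f"  </rdf:Property>"
        for t in terms)
    return ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
            '         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">\n'
            f"{body}\n</rdf:RDF>")


print(to_html_rows(TERMS))
print(to_rdf_xml(TERMS))
```

Keeping both outputs as functions of one list means a corrected definition reaches the page and the schema together.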

  28. A searchable "registry" of terms • DCMI Registry • Searchable database of metadata terms • Terms translated into various languages • Goal: application interface for Web services • Goal: harvest schemas directly from their maintainers • An ecology of registries? • Harvest and merge element sets, vocabularies, profiles • For general overviews: SCHEMAS, CORES • Specific domains: MEG, GEM (education), FAO (agriculture) • Publication environment for information models • Tool for harmonization, mapping, conversion, merging
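At its core, a registry of this kind stores term records with labels in several languages and searches over them. The sketch below is a toy in-memory version; the class name, the use of language codes as keys, and substring search are assumptions for illustration, not the DCMI Registry's interface.

```python
from collections import defaultdict
from typing import Optional


class TermRegistry:
    """Toy registry mapping term URI -> {language code -> label}."""

    def __init__(self) -> None:
        self._labels: dict[str, dict[str, str]] = defaultdict(dict)

    def add_label(self, term_uri: str, lang: str, label: str) -> None:
        self._labels[term_uri][lang] = label

    def search(self, text: str, lang: Optional[str] = None) -> list[str]:
        """Sorted term URIs whose label (in `lang`, or in any language) contains `text`."""
        needle = text.casefold()
        hits = []
        for uri, labels in self._labels.items():
            if lang is None:
                candidates = list(labels.values())
            else:
                candidates = [labels[lang]] if lang in labels else []
            if any(needle in label.casefold() for label in candidates):
                hits.append(uri)
        return sorted(hits)


DC = "http://purl.org/dc/elements/1.1/"
reg = TermRegistry()
for lang, label in [("en", "Title"), ("de", "Titel"), ("fr", "Titre")]:
    reg.add_label(DC + "title", lang, label)
reg.add_label(DC + "creator", "en", "Creator")

print(reg.search("titel"))             # ['http://purl.org/dc/elements/1.1/title']
print(reg.search("titel", lang="en"))  # []
```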

  29. The evolving Web context

  30. The Web as a new social context • Something new in history • Not just an historical set of technologies (HTTP, URLs, HTML) • Platform for historically unprecedented forms of social and intellectual interaction • Metadata as language for the Web • A language for statements about Web resources • Statements created and used both by humans and by machines • "Semantic Web" is about describing how resources relate to each other

  31. Scale and automation • The Web is too big to control • Metadata statements are expensive to make and maintain • Shift away from the metaphor of "library"? • NSF workshop on "Post Digital Library Futures" • http://www.sis.pitt.edu/~dlwkshop/ • Automated resource discovery (e.g. Google) • Using contextual information (e.g., URL structures) to infer "aboutness" • Natural-language technology, e.g. summarization

  32. An evolving role for metadata • Balance between human and machine • Automated methods to generate metadata • "Let Google do it" versus expert intervention • Granularity of metadata • Describe each item or entire collections? • How much metadata is "enough" to improve discovery? • Semantic precision or tolerance of fuzziness?

  33. Which aspects of Dublin Core will prove most useful over time? • The elements and related sets of terms • Open processes for community standardization • Editorial review by a Usage Board • A bias toward simple and generic metadata • A bias toward cooperative re-use of vocabularies • The etiquette of mutual recognition • A namespace policy for using URIs • A typology of vocabularies (e.g. application profiles) • A set of encoding practices (HTML, XML, RDF) • Methods for maintaining and versioning a vocabulary • Publishing a vocabulary for humans and machines • Searchable registries of metadata terms

  34. thomas.baker@bi.fhg.de
