Metadata for the web from discovery to description
This presentation is the property of its rightful owner.
Sponsored Links
1 / 35

Metadata for the Web From Discovery to Description PowerPoint PPT Presentation


  • 81 Views
  • Uploaded on
  • Presentation posted in: General

Metadata for the Web From Discovery to Description. CS 502 – 20020224 Carl Lagoze – Cornell University. The fifteen Dublin Core Elements. http://dublincore.org/usage/terms/dc/current-elements/ http://dublincore.org. A Pidgin for Digital Tourists. Metadata is language

Download Presentation

Metadata for the Web From Discovery to Description

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Metadata for the web from discovery to description

Metadata for the WebFrom Discovery to Description

CS 502 – 20020224

Carl Lagoze – Cornell University


The fifteen dublin core elements

The fifteen Dublin Core Elements

http://dublincore.org/usage/terms/dc/current-elements/

http://dublincore.org


A pidgin for digital tourists

A Pidgin for Digital Tourists

  • Metadata is language

  • Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains.

  • Speakers of different languages naturally "pidginize" to communicate

    • E.g., tourists using simple phrases to order beer ("zwei Bier bitte" "dva pivo" "biru o san bai"...)

  • We are all "tourists" on the global Internet.


What is the dublin core 1

Domain

Independent

view

What is the Dublin Core (1)

  • A simple set of properties to support resource discovery on the web (fuzzy search buckets)?


What is dublin core 2

What is Dublin Core (2)?

  • An extensible ontology for resource desciption?

Greater Functionality & Cost


What is the dublin core 3

MARC

Dublin

Core

IMS

INDECS

What is the Dublin Core (3)?

  • A cross-domain switchboard for interoperable metadata?

- projections to application-specific metadata vocabularies

Switchboard


Dublin core qualifiers

Dublin Core Qualifiers

  • From fuzzy buckets to more specific description

  • Model of “graceful degradation”

    • Support both simplicity and specificity

    • Intra-domain and inter-domain semantics


Varieties of qualifiers element refinements

Varieties of qualifiers: Element Refinements

  • Make the meaning of an element narrower or more specific.

  • Narrowing implies an is a relationship

    • a "date created“ is a "date“

    • an "is part of relation“ is a "relation“

  • If your software does not understand the qualifier, you can safely ignore it.


Varieties of qualifiers value encoding schemes

Varieties of Qualifiers: Value Encoding Schemes

  • Says that the value is

    • a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)

    • a string formatted in a standard way (e.g., "2001-05-02" means May 3, not February 5)

  • Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.


A grammar of dublin core

A Grammar of Dublin Core

  • http://www.dlib.org/dlib/october00/baker/10baker.html

  • By design not as subtle as mother tongues, but easy to learn and extremely useful in practice

  • Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)

  • Simple grammars: sentences (statements) follow a simple fixed pattern...


Example dublin core statements

Example Dublin Core statements

  • Resource has Title 'Grammar of Dublin Core'.

  • Resource has Creator 'Tom Baker'.

  • Resource has Subject 'Metadata'.

  • Resource has Relation http://foo.org/file.htm.


Metadata for the web from discovery to description

implied

verb

one of 15

properties

property value

(an appropriate

literal)

DC:Creator

DC:Title

DC:Subject

DC:Date...

implied

subject

Resource

has

property

X


Metadata for the web from discovery to description

implied

verb

one of 15

properties

property value

(an appropriate

literal)

DC:Creator

DC:Title

DC:Subject

DC:Date...

implied

subject

Resource

has

property

X

qualifiers

(adjectives)

[optional qualifier]

[optional qualifier]


Metadata for the web from discovery to description

Resource

has

Subject

"Languages -- Grammar"

LCSH

Resource

has

Date

"2000-06-13"

ISO8601

Revised


Dumb down principle for qualifiers

Dumb-Down Principle for Qualifiers

  • The fifteen elements should be usable and understandable with or without the qualifiers

  • Qualifiers refine meaning (but may be harder to understand)

  • Nouns can stand on their own without adjectives

  • If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!

  • "has a“ relations break the model

    • E.g., a creator has ahair color


Metadata for the web from discovery to description

Test for “good““ qualifiers:

cover and ask:

-- Does the statement still make sense?

-- Is it still correct?

Resource

has

Subject

"Languages -- Grammar"

LCSH

Resource

has

Date

"2000-06-13"

ISO8601

Revised


Incorrect qualification

“Incorrect” Qualification

Resource

has

creator

“Cornell University”

affiliation

Resource

has

subject

“pre-schoolers”

audience


Open questions in this model

Open questions in this model

  • Are uncontrolled and unconstrained values really useful for discovery?

  • Is it possible for an organization (DCMI) to control the evolution of a language?

  • How can "simple discovery metadata" be combined with complex descriptions? Is there a notion of graceful degradation?

  • Can DC serve as a lingua franca (mapping template) among more complex models


Models for deploying metadata

Models for Deploying Metadata

  • Embedded in the resource

    • low deployment threshold

    • Limited flexibility, limited model

  • Linked to from resource

    • Using xlink

    • Is there only one source of metadata?

  • Independent resource referencing resource

    • Model of accessing the object through its surrogate

    • Resource doesn’t ‘have’ metadata, metadata is just one resource annotating another


Syntax alternatives html

Syntax Alternatives:HTML

  • Advantages:

    • Simple Mechanism – META tags embedded in content

    • Widely deployed tools and knowledge

  • Disadvantages

    • Limited structural richness (won’t support hierarchical,tree-structured data or entity distinctions).


Dublin core in html

Dublin Core in HTML

  • http://www.dublincore.org/documents/2000/08/15/dcq-html/

  • HTML constructs

    • <link> to establish pseudo-namespace

    • <meta> for metadata statements

      • name attribute for DC element (DC.element.ER)

      • content attribute for element value

      • scheme attribute for encoding scheme or controlled vocabulary

      • lang attribute for language of element value


Dublin core in html example

Dublin Core in HTML example

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1"> <meta name="DC.Title" content="Business Unusual”><meta name=“DC.Title” lang=“es” content=“negocio inusual”> <meta name="DC.Creator" content="Carl Lagoze"> <meta name="DC.Subject" content="bibliographic control web cataloging "> <meta name="DC.Date.Created" scheme="W3CDTF"

content="2000-10-23"> <meta name="DC.Format" content="text/html"> <meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">


Unqualified dublin core in xml

Unqualified Dublin Core in XML

http://dublincore.org/documents/2002/09/09/dc-xml-guidelines/


Multi entity nature of object description

Photographer

Computer artist

Camera type

Software

Multi-entity nature of object description


Attribute value approaches to metadata

subject

implied verb

metadata noun

literal

metadata adjective

Playwright

“Shakespeare”

dc:creator.playwright

R1

dc:title

“Hamlet”

Attribute/Value approaches to metadata…

The playwright of Hamlet was Shakespeare

Hamlet has a creator Shakespeare


Run into problems for richer descriptions

“Shakespeare”

dc:creator.playwright

R1

dc:creator.birthplace

“Stratford”

…run into problems for richer descriptions…

The playwright of Hamlet was Shakespeare,who was born in Stratford

Hamlet has a creator Stratford

birthplace


Because of their failure to model entity distinctions

…because of their failure to model entity distinctions

“Shakespeare”

name

R1

R2

creator

birthplace

title

“Stratford”

“Hamlet”


Applying a model centric approach

Applying a Model-Centric Approach

  • Formally define common entities and relationships underlying multiple metadata vocabularies

  • Describe them (and their inter-relationships) in a simple logical model

  • Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.


Events are key to understanding resource complexity

Events are key to understanding resource complexity?

  • Events are implicit in most metadata formats (e.g., ‘date published’, ‘translator’)

  • Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.

  • Clarifying attachment points facilitates understanding and querying “who was responsible for what when”.


Abc harmony event aware metadata ontology

ABC/Harmony Event-aware metadata ontology

  • Recognizing inherent lifecycle aspects of description (esp. of digital content)

  • Modeling incorporates time (events and situations) as first-class objects

    • Supplies clear attachment points for agents, roles, existential properties

  • Resource description as a “story-telling” activity


Resource centric metadata

?

Resource-centric Metadata


Metadata for the web from discovery to description

“Orest Vereisky”

“Leo Tolstoy”

“Margaret Wettlin”

"Moscow"

“illustrator”

“author”

“translator”

“1828”

“1877”

“1978”

“creation”

“translation”

“Russian”

“English”

“Tragic adultery andthe search for meaningfullove”

“Anna Karenina”


Breaking the metadata bottleneck human vs machine generation

Breaking the metadata bottleneck Human vs. machine generation

  • Simple text scraping

    • HTML tags as hint

    • Other structural methods

  • Natural language methods and machine learning

  • Contextual methods

    • Google (text and image search)


Putting metadata in its place

Putting metadata in its place


Query engine architecture space

Query engine architecture space


  • Login