
Said the spider to the fly: identity and authority in the Semantic Web

Presented to the annual conference of the Cataloguing and Indexing Group, 2008, Glasgow, by Gordon Dunsire.



  1. Said the spider to the fly: identity and authority in the Semantic Web
     • Presented to the annual conference of the Cataloguing and Indexing Group, 2008, Glasgow
     • Gordon Dunsire

  2. How does the spider know it’s a fly?
     • Labels shown on the slide: Musca domestica, undone, flies, Zoom, fly, mouche, bluebottle, mosca?, Goons

  3. Overview
     • Introduction to the Semantic Web
     • Importance of identifiers
     • Initiatives from library communities
     • Subject classification and indexing
     • Tag clouds
     • Learning from the users
     • Authorising

  4. Semantic Web
     • “… an evolving extension of the [WWW] in which the semantics of information and services on the web is defined.”
     • Source: Wikipedia, English, 19.50, 30 Aug 2008

  5. Semantic Web foundations
     • Resource Description Framework (RDF)
       • Statements about Web resources in the form of subject-predicate-object expressions, called triples
       • E.g. “This presentation” – “has creator” – “Gordon Dunsire”
       • Each component of an RDF statement (triple) is a “resource”
     • RDF Schema (RDFS)
       • Vocabulary description language of RDF
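
The subject-predicate-object pattern on this slide can be illustrated with a few lines of Python. This is only a sketch using plain labels instead of URIs; the `Triple` type, the `match` helper and the second statement are hypothetical and are not part of RDF or of the presentation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


statements = [
    # The statement from the slide, using plain labels rather than URIs.
    Triple("This presentation", "has creator", "Gordon Dunsire"),
    # Hypothetical second statement, added only to make the query meaningful.
    Triple("This presentation", "has format", "slides"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every component that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


creators = [t.obj for t in match(statements, predicate="has creator")]
print(creators)
```

Leaving a component as `None` acts as a wildcard, which is the basic idea behind querying a set of triples.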

  6. Semantic Web building blocks
     • RDF is about making machine-processable statements, requiring:
       • A machine-processable language for representing RDF statements
         • Extensible Markup Language (XML)
       • A system of machine-processable identifiers for resources (subjects, predicates, objects)
         • Uniform Resource Identifier (URI)
     • For full machine-processing, an RDF statement is a set of three URIs

  7. Identifiers
     • Things requiring identification:
       • Subject “This presentation”
         • e.g. its electronic location (URL): http://cdlr.strath.ac.uk/pubs/dunsireg/CIG2008.pps
       • Predicate “has creator”
         • e.g. http://purl.org/dc/terms/creator
       • Object “Gordon Dunsire”
         • e.g. URI of entry in Library of Congress Name Authority File (real soon now?)
     • Declaring vocabularies/values as “namespaces” in RDF applications provides URIs
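
To make the "three URIs" idea concrete, the sketch below writes the example statement as one line of N-Triples, a plain-text RDF serialisation. The subject and predicate URIs come from the slide. The object URI is a hypothetical placeholder under example.org, because the slide notes that an authority-file URI was not yet available. The function name `to_ntriples` is also an assumption.

```python
# Subject and predicate URIs are taken from the slide.
SUBJECT = "http://cdlr.strath.ac.uk/pubs/dunsireg/CIG2008.pps"
PREDICATE = "http://purl.org/dc/terms/creator"
# Hypothetical placeholder: no real authority-file URI is given in the source.
OBJECT = "http://example.org/authorities/names/gordon-dunsire"


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialise one all-URI triple as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(SUBJECT, PREDICATE, OBJECT)
print(line)
```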

  8. Semantic Web applications
     • Simple Knowledge Organization System (SKOS)
       • Expresses the basic structure and content of concept schemes such as thesauri and other types of controlled vocabularies
       • An RDF application
     • Web Ontology Language (OWL)
       • Explicitly represents the meaning of terms in vocabularies and the relationships between them

  9. Library catalogues and RDF (1)
     • Resource Description and Access (RDA)
       • Successor to AACR
       • Dublin Core Metadata Initiative RDA Task Group is developing namespaces
       • Working with two types of RDA vocabularies
         • RDA metadata entities (elements, attributes)
           • E.g. “Title”, “Content type”
           • Represented as an RDF Schema
         • RDA value vocabularies (terms)
           • E.g. “spoken word”, “microform” (media type)
           • Represented in SKOS
       • Using NSDL Metadata Registry tools

  10. Library catalogues and RDF (2)
     • IFLA bibliographic control standards
       • Discussions during WLIC 2008, Québec City
       • XML namespaces for entities and relationships from Functional Requirements for Bibliographic Records (FRBR)
         • E.g. “Work”, “has Expression” / “is Expression of”
       • Others are likely to follow:
         • Functional Requirements for Authority Data (FRAD)
         • International Standard Bibliographic Description (ISBD)
         • Functional Requirements for Subject Authority Records (FRSAR)
         • UNIMARC
     • Library of Congress taking a similar approach with MARC21

  11. RDA RDF vocabulary example (fake)

      <?xml version="1.0" encoding="UTF-8"?>
      <rdf:RDF xmlns="http://www.w3.org/2004/02/skos/core#"
               xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
               xmlns:skos="http://www.w3.org/2004/02/skos/core#"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
        <!-- WARNING: This is a single-concept fragment -->
        <!-- Scheme: RDA Content Type -->
        <skos:ConceptScheme rdf:about="http://RDVocab.info/termList/RDAContentType">
          <dc:title>RDA Content Type</dc:title>
        </skos:ConceptScheme>
        <!-- Concept: spoken word -->
        <skos:Concept rdf:about="http://RDVocab.info/termList/RDAContentType/1001">
          <skos:inScheme rdf:resource="http://RDVocab.info/termList/RDAContentType"/>
          <skos:prefLabel>spoken word</skos:prefLabel>
          <skos:definition>Content expressed through language in an audible form. Includes recorded readings, recitations, speeches, etc., computer-generated speech, etc.</skos:definition>
        </skos:Concept>
      </rdf:RDF>

     Callouts on the slide:
     • Namespaces used to declare the RDA namespace: everything must be defined explicitly to the machine!
     • Overall base domain: http://RDVocab.info
     • Vocabulary URI: http://RDVocab.info/termList/RDAContentType
     • Term URI: http://RDVocab.info/termList/RDAContentType/1001
     • Term: spoken word
     • Term definition: the content of skos:definition
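
As a rough illustration of what "machine-processable" means for this fragment, the sketch below reads a trimmed copy of it with Python's standard `xml.etree.ElementTree` module and maps each concept URI to its preferred label. The XML declaration, comments, unused namespaces and the definition are omitted for brevity, and the helper name `concept_labels` is an assumption. A real application would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Trimmed copy of the slide's (fake) fragment.
FRAGMENT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://RDVocab.info/termList/RDAContentType/1001">
    <skos:inScheme rdf:resource="http://RDVocab.info/termList/RDAContentType"/>
    <skos:prefLabel>spoken word</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>
""".strip()


def concept_labels(rdf_xml: str) -> dict:
    """Map each skos:Concept URI to the text of its skos:prefLabel."""
    root = ET.fromstring(rdf_xml)
    labels = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        label = concept.find(f"{{{SKOS}}}prefLabel")
        if uri is not None and label is not None:
            labels[uri] = label.text
    return labels


labels = concept_labels(FRAGMENT)
print(labels)
```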

  12. RDA content type “spoken word”
     • The term “spoken word” can be referenced as the value of the field “content type” in any metadata record using RDF/XML (Semantic Web):
       … xmlns:rdvct="http://RDVocab.info/termList/RDAContentType#" … <… rdvct:1001 …> …
     • The field/attribute/element “content type” can be referenced in a similar way, through the RDF Schema for RDA elements being developed by DCMI/RDA
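
The prefix mechanism on this slide amounts to string substitution: a declared prefix stands for a namespace URI, and the part after the colon is appended to it. The sketch below shows that expansion under stated assumptions: the `expand` helper is hypothetical, and the namespace is spelled "RDAContentType" as on the previous slide.

```python
# Prefix table as declared on the slide (namespace spelled as on slide 11).
PREFIXES = {"rdvct": "http://RDVocab.info/termList/RDAContentType#"}


def expand(prefixed_name: str, prefixes: dict) -> str:
    """Expand 'prefix:local' into a full URI using the declared namespaces."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no namespace declared for {prefixed_name!r}")
    return prefixes[prefix] + local


uri = expand("rdvct:1001", PREFIXES)
print(uri)
```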

  13. The distributed catalogue record (diagram)
     • Record types shown: Work record, Expression record, Manifestation record, Item record, Name authority record, Subject authority record, RDA element record, RDA content type record, RDA carrier type record
     • Field values shown:
       • Author: Lee, T. B.
       • Title: Cataloguing has a future
       • Content type: Spoken word
       • Carrier type: Audio disc
       • Subject: Metadata
       • Provenance: Donated by the author
       • Name: / Biography: …
       • Label: / Definition: …
       • Label: Spoken word / Definition: …

  14. Subject indexing and RDF (1)
     • Library of Congress authority files
       • LC Subject Headings (LCSH)
       • LC Name Authority File (LCNAF)
     • SKOS representation planned
       • LCSH thesaurus relationships included
     • Namespaces will be freely available
     • Potential for development of third-party web services
       • E.g. Disambiguation of names
       • E.g. Structured browsing of subject headings

  15. Subject indexing and RDF (2)
     • Dewey Decimal Classification (DDC)
       • RDFS/SKOS representation of WebDewey in development
       • OCLC Terminology Services pilot for web services
     • High-Level Thesaurus project (HILT)
       • Using DDC as a spine for interoperating multiple monolingual subject vocabularies
       • Web service based on SKOS
     • European DDC Users Group (EDUG)
       • Technical Issues Working Group monitors projects and develops use cases for the online environment for subject access and translation
     • MelvilSearch
       • Subject search interface of the Deutsche Nationalbibliothek and other German libraries
       • Developing web services based on RDFS/SKOS
     • DeweyBrowser
       • Experimental subject access service using tag clouds

  16. Tag clouds: a Web 2.0 technique
     • Tag: keyword applied by a user
       • Usually subject-based
       • No controls on vocabulary or application
       • Single “words” only (no phrases)
         • E.g. “flyingInsect”
     • Cloud: set of tags applied across a collection of documents
       • Size/colour indicates relatively how many documents the tag will retrieve
         • The bigger the tag, the more it has been applied
       • Displayed horizontally, not a vertical listing
     • Cloud display is now being used with controlled tags (vocabularies)
       • E.g. DeweyBrowser
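
The "bigger tag, more documents" rule can be sketched as a simple linear scaling from document counts to font sizes. Everything below is illustrative: the sample documents, the `tag_cloud` helper and the 10 to 30 point size range are assumptions, not a description of how DeweyBrowser or any other service works.

```python
from collections import Counter

# Hypothetical collection: document id -> tags applied by users.
tagged_documents = {
    "doc1": ["fly", "flyingInsect"],
    "doc2": ["fly", "bluebottle"],
    "doc3": ["fly", "flyingInsect", "mouche"],
}


def tag_cloud(documents, min_size=10, max_size=30):
    """Scale each tag's font size linearly by the number of documents it retrieves."""
    # set(tags) counts a tag once per document, even if it was applied twice.
    counts = Counter(tag for tags in documents.values() for tag in set(tags))
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for tag, n in counts.items():
        scale = (n - low) / span if span else 1.0
        sizes[tag] = round(min_size + scale * (max_size - min_size))
    return sizes


cloud = tag_cloud(tagged_documents)
# Display horizontally, as a cloud would, with the size in parentheses.
print("  ".join(f"{tag}({size})" for tag, size in sorted(cloud.items())))
```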

  17. Authority
     • How “authoritative” is a subject heading?
     • Metadata created by 3 categories of “agent”:
       • Professionals (us)
       • “Authors” (mmm …)
       • Users (them)
     • Quality and quantity
       • High quality, controlled terms
         • Expensive to maintain (?); legacy issues; slow to respond
       • Self-interested, uncontrolled terms
         • Dependent on “publishing” context (i.e. sex sells)
       • High quantity, uncontrolled terms
         • Cheap to maintain (?); user-oriented; fast to respond

  18. Authority in the Semantic Web
     • SKOS makes controlled terms available to authors and users
       • And gives more flexibility to professionals
     • Statistical analysis of folksonomies should sort the wheat from the chaff
       • Law of large numbers; regression to the mean
     • Full-text indexing and associative analyses (proximity and citation) complement traditional library approaches
       • aka Googlizing
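
One very simple form of "sorting the wheat from the chaff" is a frequency threshold: keep only the tags that a minimum share of taggers agree on. The sketch below is a stand-in for the statistical analysis the slide alludes to, not a description of any particular method. The sample data, the `consensus_tags` helper and the 50% threshold are all assumptions.

```python
from collections import Counter

# Hypothetical data: each inner list holds one user's tags for the same resource.
user_tags = [
    ["fly", "insect"],
    ["fly", "pest"],
    ["fly", "insect"],
    ["fly", "todo"],
    ["insect", "fly"],
]


def consensus_tags(assignments, min_share=0.5):
    """Return the tags applied by at least min_share of users, sorted by name."""
    users = len(assignments)
    if users == 0:
        return []
    # set(tags) counts each tag once per user.
    counts = Counter(tag for tags in assignments for tag in set(tags))
    return sorted(tag for tag, n in counts.items() if n / users >= min_share)


consensus = consensus_tags(user_tags)
print(consensus)
```

With many users, idiosyncratic tags such as "todo" fall below the threshold while widely shared ones remain, which is the intuition behind relying on large numbers.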

  19. Conclusion
     • Labels shown on the slide: professionals, users, authors, machines
     • Yes. It’s a fly!

  20. Thank you
     • g.dunsire@strath.ac.uk
     • See handout for acronyms and links
