WHAT I HAVE FOUND OUT FROM AN ATTEMPT TO BUILD AN RDF MODEL OF FRBR-IZED CATALOGING RULES

BY • Martha M. Yee • Cataloging Supervisor • UCLA Film & Television Archive • myee@ucla.edu • http://myee.bol.ucla.edu

INTRODUCTION • 1. Some definitions • 2. The vision • 3. The experiment • 4. Some problems?

SOME DEFINITIONS • The semantic web: a way to represent knowledge; a knowledge representation language that provides ways of expressing meaning that are amenable to computation; a means of constructing maps of domains of knowledge consisting of class and property axioms with a formal semantics

SOME DEFINITIONS • The semantic web • The web as huge shared database • Hyperdata replacing hypertext

SOME DEFINITIONS • RDF (Resource Description Framework): a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats

SOME DEFINITIONS • RDF (Resource Description Framework) • Data encoded as: • the subject of a triple (New York) • the predicate of a triple (has the postal abbreviation) • the object of a triple (NY)
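
The subject/predicate/object shape of the New York example can be sketched in a few lines of Python. This is only an illustration of the data shape, not an RDF library: the `ex:` identifiers, the `Triple` type and the `objects_of` helper are all assumptions made up for this sketch.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical identifiers, invented for illustration only.
triples = [
    Triple("ex:NewYork", "ex:hasPostalAbbreviation", "NY"),
    Triple("ex:California", "ex:hasPostalAbbreviation", "CA"),
]


def objects_of(store, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [t.object for t in store
            if t.subject == subject and t.predicate == predicate]


abbreviations = objects_of(triples, "ex:NewYork", "ex:hasPostalAbbreviation")
print(abbreviations)
```

Each statement stands alone, so adding a new fact about New York means appending another triple rather than editing a record.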

SOME DEFINITIONS • RDF (Resource Description Framework) • XML is commonly used to express RDF, but is not a necessity

SOME DEFINITIONS • RDF (Resource Description Framework) • RDFS or RDF Schema is an extensible knowledge representation language providing basic elements for the description of ontologies, AKA RDF vocabularies

SOME DEFINITIONS • RDF (Resource Description Framework) • RDFS data encoded as: • Class (= Entity); the subject of a triple • Class relationship (semantic linkage); the predicate of a triple • Class property (= Attribute); the object of a triple

SOME DEFINITIONS • RDF (Resource Description Framework) • OWL (Web Ontology Language): a family of knowledge representation languages for authoring ontologies compatible with RDF

SOME DEFINITIONS • RDF (Resource Description Framework) • SKOS (Simple Knowledge Organization System): a family of formal languages built upon RDF and designed for representation of thesauri, classification schemes, taxonomies or subject-heading systems

THE VISION • The Web as shared database instead of shared document store

THE VISION • Instead of records, URIs (Uniform Resource Identifiers) for entities: • URI for work containing all work attributes, including preferred name, variant names, but also much more data about the work than our current authority records hold

THE VISION • URI for expression, containing all expression attributes, and linked back to work

THE VISION • URI for manifestation, containing all manifestation attributes, and linked back to expression

THE VISION • URIs for persons, corporate bodies, places, subjects, etc., including preferred name, variant names, but also much more data about the person, corporate body, place or subject (concept or object) than our current authority records hold

THE VISION • If any data about a particular entity needed to be changed, it would be changed once at the URI and immediately accessible to all users, libraries and library staff by means of links down to local data such as circulation, acquisitions, and binding data
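
The "change it once at the URI" idea can be sketched as follows, assuming a plain dictionary stands in for the shared web of entity descriptions. Every URI, attribute name and library name here is hypothetical; the point is only that local holdings store a link to the manifestation rather than a copy of its data.

```python
# Shared entity descriptions keyed by URI (all values hypothetical).
entities = {
    "ex:work/1": {"preferred_name": "Example work"},
    "ex:expression/1": {"language": "English",
                        "expression_of": "ex:work/1"},
    "ex:manifestation/1": {"publisher": "Example Press",
                           "manifestation_of": "ex:expression/1"},
}

# Local data (circulation, acquisitions, binding) keeps only a link.
local_holdings = [
    {"library": "Library A", "manifestation": "ex:manifestation/1"},
    {"library": "Library B", "manifestation": "ex:manifestation/1"},
]


def work_name_for(holding):
    """Follow manifestation -> expression -> work and return the work name."""
    manifestation = entities[holding["manifestation"]]
    expression = entities[manifestation["manifestation_of"]]
    work = entities[expression["expression_of"]]
    return work["preferred_name"]


before = [work_name_for(h) for h in local_holdings]

# One change, made once at the work URI...
entities["ex:work/1"]["preferred_name"] = "Example work (revised)"

# ...is what every library sees the next time it follows its link.
after = [work_name_for(h) for h in local_holdings]
print(before, after)
```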

THE EXPERIMENT • A set of cataloging rules that are more FRBR-ized than RDA in that they more clearly differentiate between: • data applying to the expression • vs. • data applying to the manifestation

THE EXPERIMENT • You can find these rules at: • http://myee.bol.ucla.edu

THE EXPERIMENT • I am now in the process of trying to model my cataloging rules in the form of an RDF/RDFS/OWL/SKOS model

THE EXPERIMENT • I don’t seriously expect anyone to adopt these rules!

THE EXPERIMENT • My research questions: • 1. Is it possible for catalogers to tell in all cases whether a piece of data pertains to the expression or the manifestation?

THE EXPERIMENT • My research questions: • 2. Is it possible to fit our data into RDF/RDFS/OWL/SKOS?

THE EXPERIMENT • My research questions: • 3. If it is, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?

THE EXPERIMENT • You can find my RDF/RDFS/OWL/SKOS model at: • http://myee.bol.ucla.edu

SOME PROBLEMS? • Can we do what we need to do within the context of the semantic web?

SOME PROBLEMS? • More granularity, or data parsing by catalogers • Those familiar with RDA, FRBR, and FRAD development will recognize that much of that development is directed at increasing granularity in cataloger-produced data

SOME PROBLEMS? • Granularity issues: • More structure and more granularity make possible more powerful indexing and more sophisticated display, • but are more complex and expensive to apply and less likely to be adopted in a standard fashion across all communities, i.e. less likely to produce interoperable data.

SOME PROBLEMS? • Granularity issues: • Currently, we demarcate a surname from a forename by putting the surname first, followed by a comma and then the forename. • Even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is "surname" and which part is "forename" in a culture unfamiliar to the cataloger.

SOME PROBLEMS? • Granularity issues: • Currently we do not collect information about gender. • If we were to increase the granularity of our data in order to gather that information, we would encounter situations in which the cataloger would not necessarily know whether a given creator was female, male, or of some other gender.

SOME PROBLEMS? • Granularity issues: • Currently, if we are adding a birth and/or death date, whatever dates we use are all together in a $d subfield, without any separate coding to indicate which date is the birth date and which is the death date (although an occasional b. or d. will tell us this kind of information). • We could certainly provide more granularity for dates, but that would make the MARC format just that much more complex and difficult to learn.
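
To show what separately coded dates would mean in practice, here is a minimal sketch of software guessing birth and death dates from an undifferentiated $d string. The three patterns handled (a "YYYY-YYYY" range with either side optional, "b. YYYY", and "d. YYYY") are assumptions chosen for illustration and do not cover everything catalogers actually record; `split_dates` is a hypothetical helper.

```python
import re


def split_dates(subfield_d):
    """Guess (birth, death) from a $d-style string; either may be None."""
    text = subfield_d.strip()

    match = re.fullmatch(r"b\.\s*(\d{4})", text)
    if match:
        return (match.group(1), None)

    match = re.fullmatch(r"d\.\s*(\d{4})", text)
    if match:
        return (None, match.group(1))

    # A hyphenated range; an open side such as "1950-" yields None.
    match = re.fullmatch(r"(\d{4})?-(\d{4})?", text)
    if match:
        return (match.group(1), match.group(2))

    # Anything else is left unparsed rather than guessed at.
    return (None, None)


print(split_dates("1920-1995"))
print(split_dates("b. 1920"))
print(split_dates("1950-"))
```

Anything the patterns do not recognize comes back as `(None, None)`, which is exactly the kind of gap that explicit coding by the cataloger would close, at the cost described above.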

SOME PROBLEMS? • Granularity issues: • People who dislike the MARC format already argue that it is too granular and therefore imposes too steep a learning curve on the people who encode data with it. • How much of the granularity already in MARC is actually used in existing records, and, where it is present, how much of it is actually exploited by indexing and display software?

SOME PROBLEMS? • Granularity issues: • Granularity costs money and libraries and archives are already starving for resources. • Granularity can only be provided by people, and people are expensive. • One frightening thing about the Internet is that it seems to be based on an economy of free intellectual labor. Only the programmers get paid. Everyone else is a volunteer.

SOME PROBLEMS? • Other issues: • Potentially every piece of data describing a particular entity could be represented by a URI leading out to a SKOS list of data values. Is the Internet really fast enough to assemble a record from hundreds of URIs in a reasonable amount of time?

SOME PROBLEMS? • If the work is represented by a URI and the author of the work is represented by a linked URI, • how would it be possible to guarantee success for a user that searched on • a variant of the author name • in combination with a variant of the title?
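
The search described above amounts to a join across two linked entities. A minimal sketch, assuming the work and the person are held as separate descriptions connected by a URI: the `ex:` URIs, field names and `find_works` helper are hypothetical, and exact case-insensitive matching stands in for real index normalization.

```python
# Hypothetical entity descriptions linked by URI.
persons = {
    "ex:person/1": {
        "preferred": "Twain, Mark, 1835-1910",
        "variants": ["Clemens, Samuel Langhorne"],
    },
}
works = {
    "ex:work/1": {
        "preferred": "Adventures of Huckleberry Finn",
        "variants": ["Huckleberry Finn"],
        "author": "ex:person/1",
    },
}


def names(entity):
    """All names of an entity, preferred and variant, case-folded."""
    return {n.casefold() for n in [entity["preferred"], *entity["variants"]]}


def find_works(author_query, title_query):
    """Match a title against the work and an author name against its linked person."""
    author, title = author_query.casefold(), title_query.casefold()
    return [uri for uri, work in works.items()
            if title in names(work)
            and author in names(persons[work["author"]])]


print(find_works("Clemens, Samuel Langhorne", "Huckleberry Finn"))
```

The search succeeds only because the software follows the link from work to author and consults the variants on both sides; whether that can be guaranteed at web scale is the open question.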

SOME PROBLEMS? • There is a cross reference from FBI to United States. Federal Bureau of Investigation, but not from FBI Counterterrorism Division to United States. Federal Bureau of Investigation. Counterterrorism Division. For that reason, a search in any OPAC name index for FBI Counterterrorism Division will fail.

SOME PROBLEMS? • The solution to this problem is to define a transitive or inheritance relationship between a corporate body and its corporate subdivisions.
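
One way to picture such an inheritance relationship, sketched outside RDF in plain Python: a subdivision inherits every variant name of its parent body, combined with its own name. The `ex:` URIs, the data layout and the rule for combining names are assumptions made for this sketch, not a statement of how any OPAC or RDF vocabulary does it.

```python
# Hypothetical authority data: each body points to its parent body.
bodies = {
    "ex:us": {"name": "United States", "variants": [], "parent": None},
    "ex:fbi": {"name": "Federal Bureau of Investigation",
               "variants": ["FBI"], "parent": "ex:us"},
    "ex:ctd": {"name": "Counterterrorism Division",
               "variants": [], "parent": "ex:fbi"},
}


def preferred_heading(uri):
    """Build the hierarchical heading by walking up the parent links."""
    body = bodies[uri]
    if body["parent"] is None:
        return body["name"]
    return preferred_heading(body["parent"]) + ". " + body["name"]


def variant_headings(uri):
    """Own variants plus every parent variant combined with this body's name."""
    body = bodies[uri]
    variants = set(body["variants"])
    if body["parent"] is not None:
        for parent_variant in variant_headings(body["parent"]):
            variants.add(parent_variant + " " + body["name"])
    return variants


def search(query):
    """Return URIs whose preferred or variant heading matches the query."""
    q = query.casefold()
    return [uri for uri in bodies
            if q == preferred_heading(uri).casefold()
            or q in {v.casefold() for v in variant_headings(uri)}]


print(search("FBI Counterterrorism Division"))
```

Because the inheritance is recursive, a variant recorded once for the parent body reaches every level of subdivision beneath it without a cataloger adding a cross reference to each one.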

SOME PROBLEMS? • Unfortunately, RDF seems to resist hierarchical relationships. • It seems to assume that you just need to connect everything to everything else without needing to express any hierarchy.

SOME PROBLEMS? • This is bad news for bibliographic data, which is rife with hierarchical relationships. • Hierarchy is one of our major tools for expressing meaning to our users.

SOME PROBLEMS? • Can all bibliographic data be reduced to either a class or a property with a finite list of values? Another way to put this is to ask if all that catalogers do could be reduced to a set of pull-down menus?

SOME PROBLEMS? • Is there an assumption on the part of semantic web developers that a given type of data, such as publisher name, would be EITHER “literal” (i.e. transcribed or composed) OR represented by a URI (controlled)?

SOME PROBLEMS? • Cataloging is rooted in humanistic practices that require careful recording of evidence. There will always be a value in distinguishing (and labelling as such) the following types of data: • copied as is from an artifact (transcribed) • supplied by a cataloger • categorized by a cataloger (controlled)

SOME PROBLEMS? • I notice that Tim Berners-Lee himself, the inventor of the World Wide Web and originator of the Semantic Web, emphasizes the importance of recording not just data, but where the data came from, for the sake of authenticity (see February 7, 2008 interview of Sir Tim Berners-Lee by Talis http://talis-podcasts.s3.amazonaws.com/twt20080207_TimBL.html)

SOME PROBLEMS? • For many data elements, therefore, it will be important to be able to record BOTH a literal (transcribed and/or composed form) AND a URI (controlled form) • Is this a problem in RDF?
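
The data shape being asked for can at least be sketched: the same manifestation carries one statement whose value is a transcribed literal and another whose value is a controlled URI, each labelled with how it was arrived at. This does not settle the RDF question; the `Value` type, the `ex:` predicates and the sample strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Value:
    provenance: str                # "transcribed", "supplied" or "controlled"
    literal: Optional[str] = None  # text as copied or composed
    uri: Optional[str] = None      # link to a controlled entity


# Hypothetical statements about one manifestation.
statements = [
    ("ex:manifestation/1", "ex:publisherAsTranscribed",
     Value("transcribed", literal="Example Press, Inc.")),
    ("ex:manifestation/1", "ex:publisher",
     Value("controlled", uri="ex:corporateBody/42")),
]


def values_for(subject, provenance):
    """Return the values recorded for a subject with the given provenance."""
    return [value for s, _, value in statements
            if s == subject and value.provenance == provenance]


transcribed = values_for("ex:manifestation/1", "transcribed")
controlled = values_for("ex:manifestation/1", "controlled")
print(transcribed[0].literal, controlled[0].uri)
```

Keeping the two forms as separate statements means a display can show the publisher's name as it appears on the item while an index collocates on the controlled URI.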
