WHAT I HAVE FOUND OUT FROM AN ATTEMPT TO BUILD AN RDF MODEL OF FRBR-IZED CATALOGING RULES

Presentation Transcript


  1. WHAT I HAVE FOUND OUT FROM AN ATTEMPT TO BUILD AN RDF MODEL OF FRBR-IZED CATALOGING RULES

  2. BY • Martha M. Yee • Cataloging Supervisor • UCLA Film & Television Archive • myee@ucla.edu • http://myee.bol.ucla.edu

  3. INTRODUCTION • 1. Some definitions • 2. The vision • 3. The experiment • 4. Some problems?

  4. SOME DEFINITIONS • The semantic web: a way to represent knowledge; a knowledge representation language that provides ways of expressing meaning that are amenable to computation; a means of constructing maps of domains of knowledge consisting of class and property axioms with a formal semantics

  5. SOME DEFINITIONS • The semantic web • The web as huge shared database • Hyperdata replacing hypertext

  6. SOME DEFINITIONS • RDF (Resource Description Framework): a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats

  7. SOME DEFINITIONS • RDF (Resource Description Framework) • Data encoded as: • the subject of a triple (New York) • the predicate of a triple (has the postal abbreviation) • the object of a triple (NY)
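
As a minimal sketch of the triple described on this slide, the statement can be held as a three-part tuple. The `Triple` type and the `describe` helper are illustrative assumptions, not part of any RDF library, and plain strings stand in for the URIs that real RDF would normally use for the subject and predicate.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object (illustrative only)."""
    subject: str
    predicate: str
    object: str


# The example from the slide, kept as plain strings for readability.
statement = Triple("New York", "has the postal abbreviation", "NY")


def describe(t: Triple) -> str:
    """Render a triple as a readable sentence."""
    return f"{t.subject} {t.predicate} {t.object}"


print(describe(statement))
```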

  8. SOME DEFINITIONS • RDF (Resource Description Framework) • XML is commonly used to express RDF, but is not a necessity

  9. SOME DEFINITIONS • RDF (Resource Description Framework) • RDFS or RDF Schema is an extensible knowledge representation language providing basic elements for the description of ontologies, AKA RDF vocabularies

  10. SOME DEFINITIONS • RDF (Resource Description Framework) • RDFS data encoded as: • Class (= Entity); the subject of a triple • Class relationship (semantic linkage); the predicate of a triple • Class property (= Attribute); the object of a triple

  11. SOME DEFINITIONS • RDF (Resource Description Framework) • OWL (Web Ontology Language): a family of knowledge representation languages for authoring ontologies compatible with RDF

  12. SOME DEFINITIONS • RDF (Resource Description Framework) • SKOS (Simple Knowledge Organization System): a family of formal languages built upon RDF and designed for representation of thesauri, classification schemes, taxonomies or subject-heading systems

  13. THE VISION • The Web as shared database instead of shared document store

  14. THE VISION • Instead of records, URIs (Uniform Resource Identifiers) for entities: • URI for work containing all work attributes, including preferred name, variant names, but also much more data about work than our current authority records do

  15. THE VISION • URI for expression, containing all expression attributes, and linked back to work

  16. THE VISION • URI for manifestation, containing all manifestation attributes, and linked back to expression

  17. THE VISION • URIs for persons, corporate bodies, places, subjects, etc., including preferred name, variant names, but also much more data about person, corporate body, place or subject (concept or object) than our current authority records do

  18. THE VISION • If any data about a particular entity needed to be changed, it would be changed once at the URI and immediately accessible to all users, libraries and library staff by means of links down to local data such as circulation, acquisitions, and binding data
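
The vision on slides 14 to 18 can be pictured with a small sketch: shared entity descriptions keyed by URI, and local data that only links to them. Everything here is hypothetical, including the `example.org` URIs, the attribute names, and the sample values; it is meant only to show why a change made once at the URI would be visible through every local link.

```python
# Hypothetical URIs; none of these identifiers exist.
WORK = "http://example.org/work/1"
EXPRESSION = "http://example.org/expression/1"
MANIFESTATION = "http://example.org/manifestation/1"

# Shared store: each URI maps to the attributes recorded once for that entity.
entities = {
    WORK: {"preferred_name": "Hamlet", "variant_names": ["Tragedy of Hamlet"]},
    EXPRESSION: {"language": "English", "expression_of": WORK},
    MANIFESTATION: {"publisher": "Example Press", "manifestation_of": EXPRESSION},
}

# Local data holds only a link to the shared URI, never a copy of its attributes.
local_holdings = [
    {"library": "Library A", "barcode": "A001", "item_of": MANIFESTATION},
    {"library": "Library B", "barcode": "B001", "item_of": MANIFESTATION},
]


def work_name_for(holding: dict) -> str:
    """Follow item -> manifestation -> expression -> work; return the work's preferred name."""
    manifestation = entities[holding["item_of"]]
    expression = entities[manifestation["manifestation_of"]]
    work = entities[expression["expression_of"]]
    return work["preferred_name"]


# One change at the work URI is seen through every local holding.
entities[WORK]["preferred_name"] = "Hamlet (Shakespeare)"
print([work_name_for(h) for h in local_holdings])
```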

  19. THE EXPERIMENT • A set of cataloging rules that are more FRBR-ized than RDA in that they more clearly differentiate between: • data applying to the expression • vs. • data applying to the manifestation

  20. THE EXPERIMENT • You can find these rules at: • http://myee.bol.ucla.edu

  21. THE EXPERIMENT • I am now in the process of trying to model my cataloging rules in the form of an RDF/RDFS/OWL/SKOS model

  22. THE EXPERIMENT • I don’t seriously expect anyone to adopt these rules!

  23. THE EXPERIMENT • My research questions: • 1. Is it possible for catalogers to tell in all cases whether a piece of data pertains to the expression or the manifestation?

  24. THE EXPERIMENT • My research questions: • 2. Is it possible to fit our data into RDF/RDFS/OWL/SKOS?

  25. THE EXPERIMENT • My research questions: • 3. If it is, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?

  26. THE EXPERIMENT • You can find my RDF/RDFS/OWL/SKOS model at: • http://myee.bol.ucla.edu

  27. SOME PROBLEMS? • Can we do what we need to do within the context of the semantic web?

  28. SOME PROBLEMS? • More granularity, or data parsing by catalogers • Those familiar with RDA, FRBR, and FRAD development will recognize that much of that development is directed at increasing granularity in cataloger-produced data

  29. SOME PROBLEMS? • Granularity issues: • More structure and more granularity make possible more powerful indexing and more sophisticated display, • but are more complex and expensive to apply and less likely to be adopted in a standard fashion across all communities, i.e. less likely to produce interoperable data.

  30. SOME PROBLEMS? • Granularity issues: • Currently, we demarcate a surname from a forename by putting the surname first, followed by a comma and then the forename. • Even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is "surname" and which part is "forename" in a culture unfamiliar to the cataloger.

  31. SOME PROBLEMS? • Granularity issues: • Currently we do not collect information about gender. • If we were to increase the granularity of our data in order to gather that information, we would encounter situations in which the cataloger would not necessarily know whether a given creator was female, male, or of some other gender.

  32. SOME PROBLEMS? • Granularity issues: • Currently, if we are adding a birth and/or death date, whatever dates we use are all together in a $d subfield, without any separate coding to indicate which date is birthdate and which is death date (although an occasional b. or d. will tell us this kind of information). • We could certainly provide more granularity for dates, but that would make the MARC format just that much more complex and difficult to learn.
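
To make the date example concrete, here is a rough sketch of what separating birth and death dates out of a single $d-style string might involve. The function name `split_dates` and the handful of patterns it accepts ("1900-1980", "1950-", "b. 1950", "d. 1850") are assumptions chosen for illustration; real $d subfields contain many more forms, and anything unrecognized is simply left unsplit.

```python
import re
from typing import Optional, Tuple


def split_dates(subfield_d: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a $d-style date string into (birth, death).

    Handles only a few simple forms; anything else returns (None, None).
    """
    # Drop surrounding whitespace and trailing punctuation such as "1850."
    text = subfield_d.strip().rstrip(".,")

    # "b. 1950" -> birth date only
    m = re.fullmatch(r"b\.\s*(\d{4})", text)
    if m:
        return m.group(1), None

    # "d. 1850" -> death date only
    m = re.fullmatch(r"d\.\s*(\d{4})", text)
    if m:
        return None, m.group(1)

    # "1900-1980" or open-ended "1950-"
    m = re.fullmatch(r"(\d{4})-(\d{4})?", text)
    if m:
        return m.group(1), m.group(2)

    return None, None


for sample in ["1900-1980", "1950-", "b. 1950", "d. 1850.", "ca. 1500"]:
    print(sample, "->", split_dates(sample))
```

Even this small sketch has to guess at punctuation and abbreviations, which is the kind of cost the slide attributes to finer granularity.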

  33. SOME PROBLEMS? • Granularity issues: • People who dislike the MARC format already argue that it is too granular and therefore demands too steep a learning curve from the people who encode data with it. • How much of the granularity already in MARC is actually used in existing records, and, where it is present, how much of it is used by indexing and display software?

  34. SOME PROBLEMS? • Granularity issues: • Granularity costs money and libraries and archives are already starving for resources. • Granularity can only be provided by people, and people are expensive. • One frightening thing about the Internet is that it seems to be based on an economy of free intellectual labor. Only the programmers get paid. Everyone else is a volunteer.

  35. SOME PROBLEMS? • Other issues: • Potentially every piece of data describing a particular entity could be represented by a URI leading out to a SKOS list of data values. Is the Internet really fast enough to assemble a record from hundreds of URIs in a reasonable amount of time?

  36. SOME PROBLEMS? • If the work is represented by a URI and the author of the work is represented by a linked URI, • how would it be possible to guarantee success for a user that searched on • a variant of the author name • in combination with a variant of the title?
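
One way to picture the question on this slide is a search that has to match a title variant at the work URI and an author variant at the linked author URI at the same time. The sketch below uses hypothetical `example.org` URIs and a hypothetical `find_works` function over a small in-memory store; it does not show that such a join can be guaranteed across distributed URIs, which is the open question.

```python
# Hypothetical URIs, for illustration only. Each URI carries its preferred and variant names.
names = {
    "http://example.org/person/1": {"Twain, Mark", "Clemens, Samuel Langhorne"},
    "http://example.org/work/1": {"Adventures of Huckleberry Finn", "Huckleberry Finn"},
}

# Link from each work URI to its author URI.
author_of = {"http://example.org/work/1": "http://example.org/person/1"}


def find_works(author_query: str, title_query: str) -> list:
    """Return work URIs where some title form AND some linked author form match."""
    hits = []
    for work_uri, author_uri in author_of.items():
        title_match = any(title_query.lower() in n.lower() for n in names[work_uri])
        author_match = any(author_query.lower() in n.lower() for n in names[author_uri])
        if title_match and author_match:
            hits.append(work_uri)
    return hits


# A variant author name combined with a variant title still finds the work.
print(find_works("Clemens", "Huckleberry Finn"))
```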

  37. SOME PROBLEMS? • There is a cross reference from FBI to United States. Federal Bureau of Investigation, but not from FBI Counterterrorism Division to United States. Federal Bureau of Investigation. Counterterrorism Division. For that reason, a search in any OPAC name index for FBI Counterterrorism Division will fail.

  38. SOME PROBLEMS? • The solution to this problem is to define a transitive or inheritance relationship between a corporate body and its corporate subdivisions.
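
The inheritance idea can be sketched as follows: a subdivision inherits every variant of its parent body, with its own name appended, so the cross reference for the subdivision is derived instead of being entered by hand. The data structure, the keys `fbi` and `ctd`, and the functions `all_variants` and `build_index` are assumptions made for illustration, not a description of how any existing OPAC or RDF vocabulary works.

```python
# Illustrative authority data: each body has an established heading, the name of
# its own level, its own variants, and an optional parent body.
bodies = {
    "fbi": {
        "heading": "United States. Federal Bureau of Investigation",
        "own_name": "Federal Bureau of Investigation",
        "variants": ["FBI"],
        "parent": None,
    },
    "ctd": {
        "heading": "United States. Federal Bureau of Investigation. Counterterrorism Division",
        "own_name": "Counterterrorism Division",
        "variants": [],
        "parent": "fbi",
    },
}


def all_variants(key: str) -> list:
    """Own variants plus variants inherited (recursively) from parent bodies."""
    body = bodies[key]
    found = list(body["variants"])
    if body["parent"] is not None:
        for parent_variant in all_variants(body["parent"]):
            found.append(f"{parent_variant} {body['own_name']}")
    return found


def build_index() -> dict:
    """Map every variant form (lowercased) to the established heading."""
    index = {}
    for key, body in bodies.items():
        for variant in all_variants(key):
            index[variant.lower()] = body["heading"]
    return index


index = build_index()
print(index.get("fbi counterterrorism division"))
```

Because `all_variants` recurses through the parent chain, the same derivation would apply to a subdivision of a subdivision.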

  39. SOME PROBLEMS? • Unfortunately, RDF seems to resist hierarchical relationships. • It assumes that you just need to connect everything to everything else without needing to express any hierarchy.

  40. SOME PROBLEMS? • This is bad news for bibliographic data, which is rife with hierarchical relationships. • Hierarchy is one of our major tools for expressing meaning to our users.

  41. SOME PROBLEMS? • Can all bibliographic data be reduced to either a class or a property with a finite list of values? Another way to put this is to ask if all that catalogers do could be reduced to a set of pull-down menus?

  42. SOME PROBLEMS? • Is there an assumption on the part of semantic web developers that a given type of data, such as publisher name, would be EITHER “literal” (i.e. transcribed or composed) OR represented by a URI (controlled)?

  43. SOME PROBLEMS? • Cataloging is rooted in humanistic practices that require careful recording of evidence. There will always be a value in distinguishing (and labelling as such) the following types of data: • copied as is from an artifact (transcribed) • supplied by a cataloger • categorized by a cataloger (controlled)

  44. SOME PROBLEMS? • I notice that Tim Berners-Lee, the inventor of the World Wide Web and originator of the Semantic Web himself, emphasizes the importance of recording not just data, but where the data came from, for the sake of authenticity (see February 7, 2008 interview of Sir Tim Berners-Lee by Talis http://talis-podcasts.s3.amazonaws.com/twt20080207_TimBL.html)

  45. SOME PROBLEMS? • For many data elements, therefore, it will be important to be able to record BOTH a literal (transcribed and/or composed form) AND a URI (controlled form) • Is this a problem in RDF?
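
The requirement on slides 43 and 45 can be pictured as a data element that keeps each kind of evidence in its own labelled slot. This is only a sketch of the requirement, with a hypothetical `DataElement` class, hypothetical field names, and a made-up `example.org` URI; it does not settle whether or how RDF itself would express the same thing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataElement:
    """One descriptive element, with each kind of evidence kept separate."""
    name: str
    transcribed: Optional[str] = None     # copied as is from the artifact
    supplied: Optional[str] = None        # composed by the cataloger
    controlled_uri: Optional[str] = None  # link to the controlled form


def provenance(element: DataElement) -> list:
    """List which kinds of evidence are recorded for this element."""
    kinds = []
    if element.transcribed is not None:
        kinds.append("transcribed")
    if element.supplied is not None:
        kinds.append("supplied")
    if element.controlled_uri is not None:
        kinds.append("controlled")
    return kinds


# A publisher name recorded BOTH as a literal and as a link (hypothetical values).
publisher = DataElement(
    name="publisher",
    transcribed="Example Pub. Co.",
    controlled_uri="http://example.org/corporate-body/42",
)
print(provenance(publisher))
```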
