Implementing FRBR on Large Databases Thomas Hickey Diane Vizine-Goetz OCLC Research
What is FRBR • IFLA study group report: Functional Requirements for Bibliographic Records • Bibliographic model independent of cataloging rules • Clusters bibliographic items into a four-level structure • Work • Expression • Manifestation • Item
Control of Entities in FRBR (diagram) • Entities: Work, Expression, Manifestation, Item, Person, Corporate Body, Concept, Object, Event, Place • Surrogates: uniform titles, citations, names, subjects
Why FRBR? • Potential to improve: • Cataloging • Discovery • Delivery • By • Bringing versions of works together • Showing relationships of various kinds • Enabling users to navigate to their level of interest
Research on FRBR & WorldCat • Subsets • By library, region • Example/problem sets • Shakespeare, the Bible • Humphry Clinker • 1,000 random works • By genre • Dissertations • Fiction • Whole file, 47 million bibliographic records
Our Approach • Concentrating on the work level • Problems with expression-level clusters • Efficient, maintainable, understandable • Few, if any, false matches when cataloging is correct • Err on the side of missed matches • Some accommodation of frequent variants • Compare with manually clustered records
The Algorithm • A key is generated for each record • Extract author, title • Look up in NACO authority file • Added entry information as needed • Form a key from the bibliographic record • Author, title, added entry information • Keys can be sorted and compared
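The key-building step can be sketched in Python. This is a minimal illustration under stated assumptions, not OCLC's implementation: the record is assumed to be already parsed into author and title subfields, the NACO authority lookup and added-entry handling are omitted, and `normalize` is a simplified approximation of NACO normalization rather than the full rule set. The function names are hypothetical. The key layout (subfields joined by a backslash, author and title joined by a slash) is inferred from the keys shown in the cluster tables.

```python
import re
import unicodedata
from typing import List, Optional


def _squash(text: str) -> str:
    """Replace each run of non-alphanumeric characters with one space, then trim."""
    return " ".join(re.sub(r"[^0-9a-z]+", " ", text).split())


def normalize(text: str, keep_first_comma: bool = False) -> str:
    """Simplified NACO-style normalization (an approximation, not the full rules)."""
    # Decompose accented letters and drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and delete apostrophes so "Alice's" becomes "alices".
    plain = plain.lower().replace("'", "").replace("\u2019", "")
    if keep_first_comma and "," in plain:
        # Keep the comma between surname and forenames ("twain, mark").
        surname, rest = plain.split(",", 1)
        rest = _squash(rest)
        return f"{_squash(surname)}, {rest}" if rest else _squash(surname)
    return _squash(plain)


def make_key(author: Optional[List[str]], title: List[str]) -> str:
    """Join subfields with a backslash and the author and title parts with a slash."""
    title_part = "\\".join(normalize(sf) for sf in title)
    if not author:
        # Title main entries (e.g. the Bible) are keyed on the title alone.
        return title_part
    author_part = "\\".join(
        normalize(sf, keep_first_comma=(i == 0)) for i, sf in enumerate(author)
    )
    return f"{author_part}/{title_part}"


print(make_key(["Twain, Mark,", "1835-1910."], ["Adventures of Huckleberry Finn"]))
# prints: twain, mark\1835 1910/adventures of huckleberry finn
print(make_key(None, ["Bible.", "O.T.", "Psalms."]))
# prints: bible\o t\psalms
```

Because every variant spelling of punctuation, case, and diacritics collapses to the same string, records cataloged with the same heading produce identical keys, which is what makes simple sorting and comparison sufficient for clustering.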
Problems • Many records (17%) lack: • Author main entry • Uniform title • In general these cannot be matched • Look at added entries • Information at the expression and manifestation levels • Handled separately • 180,000 clusters involving ~400,000 records
Top 10 WorldCat Clusters
# Recs | Author/Title Key
8,383 | bible\n t
8,055 | bible
6,174 | bible\authorized
4,033 | bible\o t\psalms
3,964 | haggadah
3,477 | great britain/treaties etc
2,402 | bible\o t
2,248 | koran
2,153 | arabian nights
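Because records with identical keys sort next to each other, a ranking like this one can be produced by sorting (key, record) pairs and counting each run of equal keys. The sketch below is an illustration under that assumption; `top_clusters` is a hypothetical name, and the sample pairs and record ids are invented toy data, not WorldCat records.

```python
from itertools import groupby
from typing import List, Tuple


def top_clusters(keyed: List[Tuple[str, str]], n: int = 10) -> List[Tuple[int, str]]:
    """Return the n largest clusters as (record count, key), largest first.

    keyed holds one (work key, record id) pair per bibliographic record.
    """
    # Sorting places records that share a key next to each other...
    ordered = sorted(keyed)
    # ...so each run of equal keys is one work cluster.
    sizes = [
        (len(list(group)), key)
        for key, group in groupby(ordered, key=lambda pair: pair[0])
    ]
    # Largest clusters first; ties broken alphabetically by key.
    sizes.sort(key=lambda item: (-item[0], item[1]))
    return sizes[:n]


# Invented sample data: (work key, hypothetical record id).
sample = [
    ("bible\\n t", "r1"), ("koran", "r2"), ("bible\\n t", "r3"),
    ("haggadah", "r4"), ("koran", "r5"), ("bible\\n t", "r6"),
]
for count, key in top_clusters(sample, n=2):
    print(count, key)
```

Sorting scales to tens of millions of keys with external sort tools, which is one reason a sortable string key is attractive compared with pairwise record comparison.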
Top 10 from a Public Library
# Recs | Author/Title Key
89 | bible\authorized
85 | mother goose
84 | chopin, frederic\1810 1849/piano music
81 | schulz, charles m/peanuts
63 | davis, jim/garfield
61 | moore, clement clarke\1779 1863/night before christmas
60 | mozart, wolfgang amadeus\1756 1791/instrumental music
58 | bach, johann sebastian\1685 1750/cantatas
57 | beethoven, ludwig van\1770 1827/sonatas
56 | twain, mark\1835 1910/adventures of huckleberry finn
Results • Manual estimate: 1.5 manifestations/work in WorldCat • Algorithm: ~1.3 • 25,844 clusters have 20 or more records • 401,659 clusters have 5 or more records
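Summary figures of this kind (records per cluster, number of clusters with 20 or more and with 5 or more records) can all be derived from the list of per-record keys. A minimal sketch follows; `cluster_summary` is a hypothetical name, the sample keys are invented, and treating records per cluster as a proxy for manifestations per work assumes roughly one record per manifestation, which duplicate records would violate.

```python
from collections import Counter
from typing import Dict, Iterable


def cluster_summary(keys: Iterable[str]) -> Dict[str, float]:
    """Summarize work clusters given one work key per record."""
    sizes = Counter(keys)  # work key -> number of records in that cluster
    records = sum(sizes.values())
    clusters = len(sizes)
    return {
        "records": records,
        "clusters": clusters,
        "records_per_cluster": records / clusters if clusters else 0.0,
        "clusters_20_plus": sum(1 for size in sizes.values() if size >= 20),
        "clusters_5_plus": sum(1 for size in sizes.values() if size >= 5),
    }


# Invented sample: one cluster of 20 records, one of 5, one of 2, one singleton.
sample_keys = ["a/x"] * 20 + ["b/y"] * 5 + ["c/z"] * 2 + ["d/w"]
print(cluster_summary(sample_keys))
```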
Preliminary Plans • Build structures for FRBR into the new catalog • Expose FRBR clustering for searching • Make visible in cataloging • As consensus on implementation develops • As cataloging rules accommodate FRBR
Spin-offs • NACO normalization code • Testbed • Server • Authority work • ePrints UK • FRBR in other projects • FictionFinder • NDLTD union catalog
Fiction Subset • 2,665,662 WorldCat records • 1,758,479 work clusters • 1.5 records/cluster • 3,866 clusters have 20 or more records • 50,540 clusters have 5 or more records
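As a quick check, the records-per-cluster figure on this slide follows from its first two counts:

```latex
\frac{2{,}665{,}662\ \text{records}}{1{,}758{,}479\ \text{clusters}} \approx 1.516 \approx 1.5\ \text{records per cluster}
```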
Top 10 clusters for fiction
# Recs | Author/Title Key
1,296 | defoe, daniel\1661 1731/robinson crusoe
1,248 | carroll, lewis\1832 1898/alices adventures in wonderland
971 | cervantes saavedra, miguel de\1547 1616/don quixote
828 | stevenson, robert louis\1850 1894/treasure island
689 | twain, mark\1835 1910/adventures of huckleberry finn
624 | twain, mark\1835 1910/adventures of tom sawyer
618 | swift, jonathan\1667 1745/gullivers travels
600 | andersen, h c\hans christian\1805 1875/tales
581 | stowe, harriet beecher\1811 1896/uncle toms cabin
570 | arabian nights
FictionFinder • Employs work clusters in a prototype system for searching and browsing bibliographic records for fiction • Indexes records at the work level and organizes displays by work and expression (primarily language) • Includes records for textual items; additional modes of expression (moving image, sound) to be added later
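A display organized by work and then by expression can be modeled as a two-level grouping: work key first, then language as a stand-in for expression. The sketch below assumes each record already carries a work key and a language code; the `Record` type, `build_display`, and the sample records are hypothetical illustrations, not FictionFinder's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    record_id: str  # hypothetical record identifier
    work_key: str   # author/title key shared by all records of one work
    language: str   # language code, used here as a proxy for the expression


def build_display(records: List[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Group record ids by work, then by language within each work."""
    display = defaultdict(lambda: defaultdict(list))
    for rec in records:
        display[rec.work_key][rec.language].append(rec.record_id)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {work: dict(langs) for work, langs in display.items()}


# Invented sample records for illustration only.
sample_records = [
    Record("r1", "crichton, michael\\1942/jurassic park", "eng"),
    Record("r2", "crichton, michael\\1942/jurassic park", "ger"),
    Record("r3", "crichton, michael\\1942/jurassic park", "eng"),
    Record("r4", "crichton, michael\\1942/sphere", "eng"),
]
for work, languages in build_display(sample_records).items():
    print(work, languages)
```

Each top-level entry corresponds to one line in a work-level result list, so many records for one author collapse into far fewer entries.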
395 records for author “crichton, michael\1942” clustered into 17 entries
Benefits • Aggregated displays for works and expressions • Enhancement of (fiction) records at work level • with elements from records within the work cluster (e.g., summaries, genre terms, subject headings, class numbers) • with external data (e.g., literary prizes, prequels/sequels, evaluative content)
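Enhancing a work-level record with elements drawn from its cluster can be sketched as pooling selected fields across every record in the cluster and ranking values by how many records share them. The field names below are hypothetical labels rather than MARC tags, `enrich_work` is an illustrative name, and the sample cluster is invented.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical field names for the elements listed on the slide.
FIELDS = ("summaries", "genre_terms", "subject_headings", "class_numbers")


def enrich_work(cluster: List[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Pool selected fields across all records in one work cluster."""
    work: Dict[str, List[str]] = {}
    for field in FIELDS:
        # Count each distinct value once per record that carries it.
        counts = Counter(value for rec in cluster for value in set(rec.get(field, [])))
        # Most widely shared values first; ties broken alphabetically.
        work[field] = sorted(counts, key=lambda value: (-counts[value], value))
    return work


# Invented sample cluster of two records.
sample_cluster = [
    {"subject_headings": ["Pirates", "Buried treasure"], "class_numbers": ["823.8"]},
    {"subject_headings": ["Buried treasure"], "genre_terms": ["Adventure stories"]},
]
print(enrich_work(sample_cluster))
```

Ranking by frequency lets a display prefer headings most catalogers agreed on, while still surfacing elements that appear on only one record in the cluster.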
Challenges • Identifying appropriate bibliographic data for systematically grouping or differentiating works and expressions • Works • Genre (graphic novel vs. novel) • Genre + mode of expression (audiobook vs. radio play) • Degree of modification (abridgement of a juvenile work vs. an adaptation for young children) • Expressions • Translators, illustrators, editors
Next Steps • FRBR algorithm • Explore applications • Refine algorithm as needed • FictionFinder • Add records for sound and image • Conduct user studies
Links • Functional Requirements for Bibliographic Records - Final Report • http://www.ifla.org/VII/s13/frbr/frbr.htm • Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR) • http://www.dlib.org/dlib/september02/hickey/09hickey.html • OCLC Research Activities and IFLA's Functional Requirements for Bibliographic Records • http://www.oclc.org/research/projects/frbr/index.shtm • Implementing FRBR on Large Databases • http://staff.oclc.org/~vizine/CNI/OCLCFRBR.htm