
Automating Name Authority Record Updates and Bibliographic File Maintenance

Automating Name Authority Record Updates and Bibliographic File Maintenance: A Proof of Concept. Lucas Mak, Michigan State University Libraries. Catalog Management Interest Group, ALA Annual, Chicago, IL, June 29, 2013.


Presentation Transcript


  1. Automating Name Authority Record Updates and Bibliographic File Maintenance A Proof of Concept Lucas Mak Michigan State University Libraries Catalog Management Interest Group, ALA Annual, Chicago, IL, June 29, 2013

  2. Authority Control at MSU • 1.5 million authority records (1.1 million NARs) • In-house • NACO institution • Database maintenance • Post-cataloging authority control • New Headings Report • Download NARs from SkyRiver • Updates to NARs not necessarily caught • 1XX (no item cataloged under changed 1XX → not in New Headings Report) • Elements other than 1XX (e.g. 4XX, 670)

  3. LC/NACO NAF RDA Transition • PCC Day 1 for RDA NAR: Mar. 31, 2013 • Phased reissuance of NARs • Phase 1 • Scope • NARs with characteristics known to be at variance with RDA practice • Not candidates for any of the mechanical changes to be made during phase 2 • Adding a 667 note “THIS 1XX FIELD CANNOT BE USED UNDER RDA UNTIL THIS RECORD HAS BEEN REVIEWED AND/OR UPDATED” • Completed Aug. 20, 2012 (436,943 records processed) • Phase 2 • Programmatic changes to 1XX headings that are not acceptable under RDA (e.g., changes to Bible headings, spelling out Dept. and months, etc., abbreviations in the subfield $d for personal names) • Completed March 27, 2013 (371,942 records changed)

  4. Updates of NARs by NACO institutions • Reviewing, upgrading, and recoding Phase 1 records to RDA • Adding any of the 17 new MARC fields (e.g. 046, 372, etc.) • Routine NAR maintenance • PCC post-RDA test guidelines “strongly encourage” evaluating and recoding “RDA-acceptable AACR2 NARs” to RDA whenever possible

  5. Objectives • To catch changes to NARs • Changes in 1XX • Addition, deletion, or updates of elements other than 1XX • To perform related BFM if 1XX in a NAR is changed

  6. Tasks • To download NARs one-by-one/in bulk • To detect updates to NARs already existing in ILS • To overlay existing NARs with updated ones • To update authorized access points (AAPs) in bib records if 1XX in NAR is updated • To automate and link up the above tasks

  7. Task #1: Download NARs • OCLC LCNAF SRU Service • Can be searched by LCCN • Available in multiple schemas including MARCXML • SRU-based service (HTTP request) • FREE!! • But: • Updated every Monday night • Bulk download – by search term (e.g. after certain date) • Implementation • Search LCCNs one-by-one by AutoIt script • Around 10 records/sec. retrieved • Download XML files into one folder (files named by LCCN)
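The talk drives these searches with AutoIt. As a rough illustration of what one such HTTP request could look like, here is a minimal Python sketch that builds an SRU searchRetrieve URL for a single LCCN and derives the output file name. The base URL and the index name `local.LCCN` are placeholders invented for this example, not taken from the slides. The `recordSchema` value is the usual SRU identifier for MARCXML, but the service may expect a different name.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the actual SRU base URL of the service.
SRU_BASE_URL = "https://sru.example.org/lcnaf"


def build_sru_url(lccn, base_url=SRU_BASE_URL, index="local.LCCN"):
    """Build an SRU searchRetrieve URL asking for one record in MARCXML.

    The index name is an assumption; check the service's explain record.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'{index} = "{lccn}"',
        "recordSchema": "info:srw/schema/1/marcxml-v1.1",
        "maximumRecords": "1",
    }
    return f"{base_url}?{urlencode(params)}"


def output_filename(lccn):
    """Name the downloaded file after the LCCN, with spaces removed."""
    return lccn.replace(" ", "") + ".xml"
```

Fetching the URL (for example with `urllib.request.urlopen`) and writing the response under `output_filename(lccn)` would reproduce the "one file per LCCN" layout the slide describes.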

  8. Task #2: NAR Update Detection • To compare NARs from ILS and NARs from LC/NACO NAF by XSLT • MARC 005 (timestamp) • If timestamp is more current on the NAR from NAF → overlay the NAR in ILS
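The comparison in the talk is done in XSLT. The same rule can be sketched in Python as below, assuming both records are available as MARCXML strings in the standard MARC21 slim namespace. The function names are illustrative only. A local record with no 005 is treated as outdated, which matches the handling described later under Data Integrity Issue #3.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def read_005(marcxml):
    """Return the 005 timestamp of a MARCXML record, or None if absent."""
    root = ET.fromstring(marcxml)
    field = root.find(".//marc:controlfield[@tag='005']", MARC_NS)
    if field is None or not (field.text or "").strip():
        return None
    # 005 is yyyymmddhhmmss.f; the fractional part is ignored here.
    return datetime.strptime(field.text.strip()[:14], "%Y%m%d%H%M%S")


def needs_overlay(ils_marcxml, naf_marcxml):
    """True when the NAF copy should replace the local copy."""
    ils_time = read_005(ils_marcxml)
    naf_time = read_005(naf_marcxml)
    if naf_time is None:
        return False  # nothing reliable to compare against
    if ils_time is None:
        return True   # local record has no timestamp: bring in the new NAR
    return naf_time > ils_time
```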

  9. Task #3: Export/Overlay of NARs • MarcEdit • Export updated NARs into ILS • Through TCP/IP (Host address, Port, .mrc file) • One-by-one (though .mrc file can contain multiple NARs)

  10. Task #4: Updates of Bib AAPs • XSLT • To detect changes in 1XX between old and new NARs • To build AAP conversion table (a TXT file) when 1XX is changed • AutoIt • Automate bib AAP updates by “Global Update” module in ILS • Read old and new AAPs from the TXT file and fill out info required in “Global Update” process
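To make the conversion table concrete, here is a small Python sketch of the idea. The talk builds the table with XSLT, and the slides do not specify the layout of the TXT file. The tab-delimited "LCCN, old AAP, new AAP" format and the dictionary inputs below are assumptions for illustration.

```python
def build_conversion_table(old_headings, new_headings):
    """Pair old and new 1XX headings for every LCCN whose heading changed.

    Both arguments map LCCN -> 1XX heading string.
    """
    rows = []
    for lccn, old in sorted(old_headings.items()):
        new = new_headings.get(lccn)
        if new is not None and new != old:
            rows.append((lccn, old, new))
    return rows


def to_tsv(rows):
    """Serialize the table as tab-delimited text, one changed heading per line."""
    return "".join(f"{lccn}\t{old}\t{new}\n" for lccn, old, new in rows)
```

A script driving the "Global Update" module would then read this file line by line and supply the old and new AAPs for each replacement.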

  11. Task #5: Automation • Use AutoIt to: • Link up various steps in the workflow • Automate searching against OCLC LCNAF SRU Service by compiling and sending HTTP requests • Execute various XSLTs in a predetermined sequence • e.g. NAR comparison → AAP comparison • Read TXT files (LCCN list, AAP conversion table) created by XSLT processes • Run MarcEdit to overlay obsolete NARs • Execute “Global Update” process

  12. Basic Workflow [Workflow diagram. Labels: ILS NARs; Extract by Create Lists; Extract by XSLT; LCCNs; Search by AutoIt; Retrieve; LC/NACO NARs; Compare by XSLT; Updated NARs; Updated Headings; Overlay by MarcEdit; Global Update; ILS]

  13. Data Integrity Issue #1 • No ILS ARN in extracted NARs • Needed for 949 overlay command • Solution • Extract “LCCN” & “ILS ARN” pair through Create Lists • Merge ARN into extracted NARs (907$a) by XSLT/MarcEdit
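The merge step can be pictured with the following Python sketch, which adds a 907 field carrying the ARN in $a to each MARCXML record that has a matching LCCN. The talk does this with XSLT/MarcEdit. The in-memory dictionaries and function names here are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MARC = "http://www.loc.gov/MARC21/slim"


def add_arn(record, arn):
    """Append a 907 field with the ARN in $a to a MARCXML record element."""
    field = ET.SubElement(
        record, f"{{{MARC}}}datafield", {"tag": "907", "ind1": " ", "ind2": " "}
    )
    subfield = ET.SubElement(field, f"{{{MARC}}}subfield", {"code": "a"})
    subfield.text = arn
    return record


def merge_arns(records_by_lccn, arn_by_lccn):
    """Add the ARN to every record with a known pair; return unmatched LCCNs."""
    unmatched = []
    for lccn, record in records_by_lccn.items():
        arn = arn_by_lccn.get(lccn)
        if arn is None:
            unmatched.append(lccn)
        else:
            add_arn(record, arn)
    return unmatched
```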

  14. Data Integrity Issue #2 • NARs without 010 • 010 contains LCCN • Some LCCNs transposed into 035 • Original prefix (n, no, nb, nr) removed • Prepended with prefix (OCoLC) • Possibly done during system migration • Solution • Search string in 035 (excl. prefix) as keyword in SkyRiver • Retrieve complete LCCN from matched record • Search retrieved LCCN against OCLC Service and download the record

  15. Data Integrity Issue #3 • Existing NARs without 005 • No timestamp • Bring in the new NAR whenever the old NAR lacks 005

  16. Data Integrity Issue #4 • Local data in NAR • Local call no. (e.g. 050, 090, 053$5) • Institution code & initials (shared catalog) • Copy local data into new NAR before overlay

  17. Search and Retrieval Issue #1 • “Blank” XML File from OCLC LCNAF SRU Service

  18. Search and Retrieval Issue #1 (Cont’d) • No hit for some LCCNs • XML file size: < 2KB • LCCNs in places other than 010$a → not indexed • Cancelled LCCNs (010$z) • Solution • Compile a list of LCCNs with file size < 2KB • Search LCCNs in SkyRiver by Keyword • Get new LCCNs from 010$a • Search OCLC LCNAF SRU Service using new LCCNs • But …
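Compiling the list of "blank" results amounts to a file-size check over the download folder. A minimal Python sketch follows, assuming files are named `<LCCN>.xml` as in Task #1 and reading "2KB" as 2048 bytes. Both points are assumptions.

```python
from pathlib import Path

# Assumed reading of the "< 2KB" threshold from the slide.
BLANK_THRESHOLD_BYTES = 2048


def find_blank_results(folder, threshold=BLANK_THRESHOLD_BYTES):
    """Return the LCCNs (file stems) whose downloaded XML is under the threshold."""
    folder = Path(folder)
    return sorted(
        path.stem
        for path in folder.glob("*.xml")
        if path.stat().st_size < threshold
    )
```

The returned LCCNs are the ones that would be re-searched by keyword in SkyRiver.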

  19. Search and Retrieval Issue #2 • Keyword search in SkyRiver returns multiple hits • Undifferentiated & related NARs • Write LCCNs with multiple hits to a log file for manual review [Screenshot labels: Person broken out from undifferentiated NAR; Original undifferentiated NAR cancelled]

  20. Search and Retrieval Issue #2 (Cont’d) • Keyword search in SkyRiver returns multiple hits • Same numeral part of LCCN with different prefixes • Write LCCNs with multiple hits to a log file for manual review [Screenshot labels: NAR contributed via RLIN; NAR contributed via OCLC]

  21. Search and Retrieval Issue #2 (Cont’d) • Keyword search in SkyRiver returns no hit • The LCCN in question no longer exists in NAF • NAR containing cancelled LCCN was cancelled again • Loss of 010$z • Write no-hit LCCNs into log file for manual review

  22. Search and Retrieval Issue #2 (Cont’d) • Keyword search in SkyRiver returns no hit • False negative • Space between prefix and number removed • Hyphen within number removed (e.g. n 85-342238 → n 85342238) • Search normalized LCCNs • Delay in returning search results due to a slow or unstable Internet connection • Set a longer wait time before trying to copy new LCCN • Run keyword search in SkyRiver in a loop until • Number of entries in the log file equals that of the immediately preceding round, or • File size of the no-hit log file equals zero
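The retry logic on this slide can be sketched in Python as follows. The normalization shown (dropping both spaces and hyphens) is one reading of the slide. The `search` callable stands in for the SkyRiver keyword search, which the talk drives through AutoIt, and `max_rounds` is an extra safety cap that is not in the original.

```python
import re


def normalize_lccn(lccn):
    """Remove spaces and hyphens, e.g. 'n 85-342238' becomes 'n85342238'."""
    return re.sub(r"[\s-]+", "", lccn)


def retry_until_stable(no_hits, search, max_rounds=20):
    """Re-run the search until the no-hit list is empty or stops shrinking.

    `search` is a hypothetical callable returning a new LCCN or None.
    Returns (found, still_missing).
    """
    pending = list(no_hits)
    found = {}
    for _ in range(max_rounds):
        if not pending:
            break  # no-hit log is empty
        still_missing = []
        for lccn in pending:
            result = search(normalize_lccn(lccn))
            if result is None:
                still_missing.append(lccn)
            else:
                found[lccn] = result
        stalled = len(still_missing) == len(pending)
        pending = still_missing
        if stalled:
            break  # same number of no-hits as the preceding round
    return found, pending
```

Whatever remains in the second return value corresponds to the no-hit log kept for manual review.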

  23. Global Update Issues • ILS interface navigation • AAPs with diacritics • Found by search in Global Update module but couldn’t be replaced • Code points & exact match in Global Update • Old AAPs not found • Corresponding bib records deleted → “Orphan” NARs • Write LCCN to log file for manual review
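The slide does not say exactly why headings with diacritics fail the exact match. One common cause, assumed here purely for illustration, is that the same heading is stored with precomposed characters in one place and with base letters plus combining marks in another. The Python sketch below shows how two visually identical strings can differ by code point and how Unicode normalization makes them comparable.

```python
import unicodedata


def same_heading(a, b):
    """Compare two headings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def code_points(text):
    """List the code points of a string, useful when logging a mismatch."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)
```

Logging `code_points()` for the old AAP and for the heading found in the bib record would show whether a failed replacement comes from this kind of difference.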

  24. Revised Workflow [Workflow diagram. Labels: ILS NARs; Extract; ARN-LCCN; Search LCCNs; Found & Retrieve; Not Found & Search; Retrieve New LCCN; Search; Not Found/Multiple Hits; Log File; LC/NACO NARs; Merge; Compare; Updated NARs; Updated AAPs; Fishy NARs; AAPs Not Found; Overlay by MarcEdit; Global Update; ILS]

  25. Test Results • 82,398 NARs tested • 81,362 NARs needed to be overlaid* • 4,584 AAPs became obsolete • 10,900 bib records had at least one heading flipped * Many NARs exported from ILS do not contain field 005

  26. Limitations • Identities broken out from undifferentiated NARs can’t be detected • Partially taken care of by “New Headings Report” • AAPs have no corresponding NARs • Non-Latin script parallel APs in Field 880 • Scalability issues • Slow export using MarcEdit • Slow “Global Update” process • Memory intensive XSLT process • “Java heap space” out of memory error

  27. Possible Enhancements • “Data Exchange” module for NAR overlay • Data Exchange module – record load function • Manual intervention needed • SQL backend of Sierra (Sierra DNA) • Write SQL commands to batch changes • But, EDIT function not yet available through SQL command • AACP (Automatic Authority Control Processing) • Flip AAPs matching 4XX in NARs to corresponding 1XX in an overnight process • Replace “Global Update” with AACP • “Rig” updated NARs by inserting obsolete AAP as 4XX • Export “rigged” NARs to ILS to trigger the overnight process • Overlay exported “rigged” NARs in ILS with original updated NARs

  28. Questions? • Lucas Mak (makw@msu.edu)
