Union Catalog Architecture

Union Catalog Architecture. Tsach Moshkovits, Development Team Leader. Olybris, Ex Libris Seminar 2005 Kos, April 2005. Overview. The Union Catalog is a sophisticated mechanism that supports the integration of disparate libraries into a single environment.

Presentation Transcript


  1. Union Catalog Architecture Tsach Moshkovits, Development Team Leader Olybris, Ex Libris Seminar 2005 Kos, April 2005

  2. Overview • The Union Catalog is a sophisticated mechanism that supports the integration of disparate libraries into a single environment. • By environment, we mean a unified User view, rather than a single database or a merged index.

  3. Overview • The following will be discussed in this session: • Union catalog structure • Union catalog vs. Unified catalog • Equivalency • Merge

  4. A Unified Catalog • Usually, a Union catalog involves a catalog where all Equivalent records are merged into one new record. • In this scenario, the original records are not saved, and the index is built on the merged version of the records. • Obviously, the merged record must include information about its different parts to allow navigation from the record to remote resources.

  5. Unified Catalog Drawbacks • Match and Merge is performed at load time, record by record. This is a slow process when additional resources are added. • A new resource may not be available until the slow load process is completely finished. • Updating a record is complex, since it may require more than just updating its merged record. This is true because the equivalence relation is not necessarily transitive.

  6. Unified Catalog Drawbacks • Merging becomes even more problematic if the merge algorithm suggests that not all data is preserved for every source record. In such a case, any match and merge process must re-access all remote resources to retrieve all original records. • It is also impossible to update the unified catalog with a standard Cataloging GUI.

  7. Unified Catalog Structure – Virtual Approach [Diagram: the ALEPH Union Catalog in three levels. A: records from Contributors arrive via Import Load / Catalog (New/Update/Delete) and are kept as Original Records with Indices. B: Create Equivalence builds the Equivalence Table (Z120). C: Merge, performed “Just in Time.”]

  8. Union Structure – Level A • Records are stored as distinct entities in the database. • Records can be loaded from an external resource or cataloged with the ALEPH Cataloging module. • Records from an external resource can hold an identifier to the external resource to allow simple updating or navigation to an external resource. • Indices are created using the standard ALEPH indexing scheme.

  9. Union Structure – Level B • An Equivalence table is created by mapping each record to its equivalent records. • The equivalence relation is not necessarily transitive. • This table can be recreated any time, leaving the records intact.
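
A minimal sketch of Level B, assuming the equivalence table can be modelled as a mapping from each record number to the set of record numbers it matches. The `build_equivalence_table` helper, the `close_years` rule, and the sample years are hypothetical and only serve to show why the relation need not be transitive; the pairwise loop is for illustration and ignores the pool selection described later.

```python
from typing import Callable, Dict, List, Set


def build_equivalence_table(
    record_ids: List[int],
    is_equivalent: Callable[[int, int], bool],
) -> Dict[int, Set[int]]:
    """Map every record to the records it matches, always including itself."""
    table: Dict[int, Set[int]] = {rid: {rid} for rid in record_ids}
    for i, a in enumerate(record_ids):
        for b in record_ids[i + 1:]:
            if is_equivalent(a, b):
                table[a].add(b)
                table[b].add(a)
    return table


# Hypothetical publication years; two records "match" when the years differ by at most 1.
years = {1: 2000, 2: 2001, 3: 2002}


def close_years(a: int, b: int) -> bool:
    return abs(years[a] - years[b]) <= 1


table = build_equivalence_table([1, 2, 3], close_years)
```

Here record 1 matches record 2 and record 2 matches record 3, yet records 1 and 3 do not match each other. Because the original records are untouched, the table can be thrown away and rebuilt at any time.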

  10. Union Structure – Level C • Result sets will be de-duplicated to contain only one record per group of equivalents. • Browse lists will de-duplicate their counters to count only one record per group of equivalents. • User View uses on-the-fly Merge to present a single record that is built from a group of equivalents. • The Merge algorithm can vary from user to user.
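
The de-duplication of result sets could look roughly like the following sketch. It assumes the equivalence table is available as a dictionary of sets; the `deduplicate` function and the sample record numbers are hypothetical.

```python
from typing import Dict, List, Set


def deduplicate(result_set: List[int], equivalence: Dict[int, Set[int]]) -> List[int]:
    """Keep the first hit of each group of equivalents, preserving result order."""
    seen: Set[int] = set()
    kept: List[int] = []
    for rid in result_set:
        if rid in seen:
            continue
        kept.append(rid)
        # The record and all of its equivalents are now represented in the output.
        seen.add(rid)
        seen.update(equivalence.get(rid, set()))
    return kept


equivalence = {10: {10, 11}, 11: {10, 11}, 12: {12}, 13: {13, 14}, 14: {13, 14}}
hits = [11, 10, 12, 14, 13]
unique_hits = deduplicate(hits, equivalence)
```

The same grouping would let a browse list count one record per group instead of one per original record.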

  11. Virtual Approach Advantages • It is simple to update a record by unlinking it from the Equivalence table and marking it as “New.” This action breaks all existing connections in the group. • A new record is simply inserted as equivalent only to itself. • In all cases, the data of each record stays intact in the database.
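
One possible reading of the update and insert steps, sketched with two in-memory dictionaries: `z120` for the equivalence entries and `status` for the “New” marker. Both names and both functions are hypothetical, and the sketch assumes that “breaks all existing connections in the group” means every member of the group falls back to being equivalent only to itself.

```python
from typing import Dict, Set


def mark_updated(record_id: int, z120: Dict[int, Set[int]], status: Dict[int, str]) -> None:
    """Dissolve the group of an updated record and flag its members for re-matching."""
    group = set(z120.get(record_id, set())) | {record_id}
    for member in group:
        z120[member] = {member}
        status[member] = "New"  # picked up later by the separate equivalence job
    # Drop dangling references to the dissolved group from all other entries.
    for other, equivalents in z120.items():
        if other not in group:
            equivalents -= group


def insert_record(record_id: int, z120: Dict[int, Set[int]], status: Dict[int, str]) -> None:
    """A new record starts out equivalent only to itself."""
    z120[record_id] = {record_id}
    status[record_id] = "New"


z120 = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}, 4: {4}}
status: Dict[int, str] = {}
mark_updated(1, z120, status)
insert_record(5, z120, status)
```

Neither function touches the bibliographic data itself; only the equivalence entries change.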

  12. Virtual Approach Advantages • A separate job runs on all equivalency tables marked as “New.” The job ensures that records in a group are evaluated for their real equivalency. • It takes no longer to load external resources here than it does to load and index in ALEPH.

  13. Virtual Approach Advantages • In the worst case, an update, insert, or delete causes a group of equivalent records to appear as non-equivalent from the time the record is updated until its equivalency entries are (re)created. • There is 100% uptime.

  14. Virtual Approach Advantages • The same uptime considerations apply if the match algorithm is to be changed. • Changing the merge algorithm has absolutely no effect, since it is executed “just in time.”

  15. Equivalency Table Creation • An equivalency table is created for each record in the database, and points to itself. • Pool selection: • The equivalency search is minimized to a certain number of candidates. • This is usually done on a direct index, such as ISBN, ISSN, or LCCN, and is therefore relatively fast. • If the number of candidates exceeds a certain limit, the record itself will be considered as the only candidate.
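
Pool selection can be sketched as a lookup in a direct index followed by a size check. The `select_pool` function, the `MAX_CANDIDATES` limit, and the index keys below are hypothetical; the real limit and the real index are defined in ALEPH tables that the slides do not show.

```python
from typing import Dict, List, Set

MAX_CANDIDATES = 3  # hypothetical limit


def select_pool(
    record_id: int,
    keys_of: Dict[int, List[str]],
    direct_index: Dict[str, List[int]],
    limit: int = MAX_CANDIDATES,
) -> Set[int]:
    """Return the record plus the candidates that share one of its direct-index keys."""
    candidates: Set[int] = set()
    for key in keys_of.get(record_id, []):
        candidates.update(direct_index.get(key, []))
    candidates.discard(record_id)
    if len(candidates) > limit:
        # Too many candidates: the record is treated as its own only candidate.
        return {record_id}
    return candidates | {record_id}


direct_index = {"ISBN:111": [1, 2], "ISSN:222": [7, 8, 9, 10, 11]}
keys_of = {1: ["ISBN:111"], 7: ["ISSN:222"]}

small_pool = select_pool(1, keys_of, direct_index)
capped_pool = select_pool(7, keys_of, direct_index)
```

The cap keeps a very common key from turning the final match into a comparison against a large part of the database.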

  16. Equivalency Table Creation • Final match: • The equivalent records from the pool are found. • Matching and conflicting fields are searched. • Matching adds a positive weight, while conflicts add a negative weight. • The total weight is checked against a threshold.

  17. Equivalency Table Creation • When both stages are complete, each record has a Z120 record, holding the numbers of all equivalent records. • Z120 is never empty. It holds the record’s own number if no equivalencies are found. • Both the pool selection program and the match program are table-defined, not hard-coded.

  18. Merge • When a user wants to view a record, a merge is done on all the records in its equivalency table, combining them into a single display. • No merged record actually exists in the database. This is a virtual display created on request.

  19. Merge • A merged record display is built by taking the “basic” fields from the preferred record and adding other fields from each of its equivalent records. • The preferred record is selected by assigning weights to all the equivalent records based on table-defined criteria, and the top weight wins. • The merge program is also table-defined.
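
A rough sketch of the on-the-fly merge, assuming each record is a mapping from field tag to a list of field values. The `BASIC_TAGS` set and the `merge_for_display` function are hypothetical; which fields count as “basic” is defined by the table-driven merge program, not by this example.

```python
from typing import Dict, List

Record = Dict[str, List[str]]

BASIC_TAGS = {"100", "245", "260"}  # hypothetical choice of "basic" fields


def merge_for_display(preferred: Record, others: List[Record]) -> Record:
    """Build a display-only record; the stored records are left unchanged."""
    merged: Record = {tag: list(values) for tag, values in preferred.items()}
    for record in others:
        for tag, values in record.items():
            if tag in BASIC_TAGS:
                continue  # basic fields come from the preferred record only
            target = merged.setdefault(tag, [])
            for value in values:
                if value not in target:
                    target.append(value)
    return merged


preferred = {"245": ["Union catalogs"], "650": ["Libraries"]}
other = {
    "245": ["Union catalogues"],
    "650": ["Libraries", "Cataloging"],
    "856": ["http://example.org/b"],
}
display = merge_for_display(preferred, [other])
```

Since the result is built per request and never written back, a different merge rule could be applied for a different user without reloading anything.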

  20. Implementation • The union_global_param table defines the programs (algorithms) used for the different Union Catalog tasks:

      ! 1   2 3                    4
      !!!!!-!-!!!!!!!!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!
      USM90 B candidate_prog       union_candidate_cdl
      USM90 B match_prog           union_match_cdl
      USM90 B preferred_prog       union_preferred_cdl
      USM90 B merge_prog           union_merge_aleph
      USM90 B normalize_prog       union_normalize_cdl
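
To illustrate what “table-defined” means in practice, the sketch below reads the rows shown on the slide into a lookup of task to program name. It assumes that lines starting with `!` are header or comment lines and that the four columns are separated by whitespace; the `parse_global_params` function is hypothetical, and the meaning of the second column is not explained in the slides.

```python
from typing import Dict, Tuple

TABLE_TEXT = """\
! 1   2 3                    4
!!!!!-!-!!!!!!!!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!
USM90 B candidate_prog       union_candidate_cdl
USM90 B match_prog           union_match_cdl
USM90 B preferred_prog       union_preferred_cdl
USM90 B merge_prog           union_merge_aleph
USM90 B normalize_prog       union_normalize_cdl
"""


def parse_global_params(text: str) -> Dict[Tuple[str, str], str]:
    """Map (library, task key) to the name of the program configured for that task."""
    programs: Dict[Tuple[str, str], str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        library, _flag, key, program = line.split()
        programs[(library, key)] = program
    return programs


programs = parse_global_params(TABLE_TEXT)
```

Swapping an algorithm then amounts to editing one row of the table rather than changing code.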

  21. Preferred Table – An Example

      !!!!!-!!!!!!-!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!!!!!!-!!!
      LDR   F05-01 EQUAL      d                         -10
      LDR   F17-01 NOT-EQUAL  1,2,3,4,5,7,8,u,z         001
      100##        PRESENT                              001
      110##        PRESENT                              001
      111##        PRESENT                              001
      130##        PRESENT                              001

      • The table assigns a weight to each condition. For every record, the weights of the conditions that hold, as specified in the middle columns, are added up. • The record with the highest total is selected as the preferred record.
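
The scoring described above can be sketched as follows, using the first three rows of the slide's table. The sketch assumes that `F05-01` means “offset 5, length 1” counted from zero (as MARC leader positions usually are) and that a record is a mapping from tag to field content; the `score` and `leader` helpers and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple

# (tag, position, operator, values, weight), taken from the slide's example rows.
RULES: List[Tuple[str, str, str, str, int]] = [
    ("LDR", "F05-01", "EQUAL", "d", -10),
    ("LDR", "F17-01", "NOT-EQUAL", "1,2,3,4,5,7,8,u,z", 1),
    ("100", "", "PRESENT", "", 1),
]


def score(record: Dict[str, str]) -> int:
    """Add up the weights of all rules that hold for the record."""
    total = 0
    for tag, position, operator, values, weight in RULES:
        content = record.get(tag)
        if operator == "PRESENT":
            hit = content is not None
        elif content is None:
            hit = False
        else:
            start, length = (int(part) for part in position[1:].split("-"))
            piece = content[start:start + length]
            in_list = piece in values.split(",")
            hit = in_list if operator == "EQUAL" else not in_list
        if hit:
            total += weight
    return total


def leader(status: str, level: str) -> str:
    """Build a 24-character leader with the given status (pos. 5) and level (pos. 17)."""
    return "00000" + status + "am a2200000" + level + "a 4500"


records = {
    "A": {"LDR": leader("n", " "), "100": "Author"},
    "B": {"LDR": leader("n", "5"), "100": "Author"},
    "C": {"LDR": leader("d", " "), "100": "Author"},
}
scores = {name: score(rec) for name, rec in records.items()}
preferred = max(scores, key=scores.get)
```

In this example the record marked as deleted (status `d`) is pushed to the bottom, so it would only be preferred if it were the sole member of its group.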

  22. Match Table – An Example

      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!-!-!!!>
      date exact match                               + 200
      date within 2                                  - 025
      date mismatch                                  - 250
      short title match                              + 450
      full title match                               + 600
      full title occur within                        + 350
      full title mismatch                            - 600
      full title keywords                            + 450
      full title keywords order                      + 050
      260b exact match                               + 100
      260b occur within                              + 100
      260b mismatch                                  - 025

      The cumulative sum is compared against a defined threshold.

  23. Match Table – An Example • Different fields are compared to determine whether two records match. • For each field, if a match is found, the plus value is added to the total match weight. Otherwise, the minus value is subtracted from the total matched weight. • The threshold in the first line defines the weight above which two records are considered a match.
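
A worked sketch of the weighting, using a subset of the weights from the match table example. The `THRESHOLD` value is hypothetical, since the slides do not show the threshold line; the `Bib` class, the comparison rules (case-insensitive equality), and the sample records are also assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bib:
    year: int
    title: str
    publisher: str  # corresponds to 260 $b


WEIGHTS = {
    "date exact match": 200,
    "date within 2": -25,
    "date mismatch": -250,
    "full title match": 600,
    "full title mismatch": -600,
    "260b exact match": 100,
    "260b mismatch": -25,
}
THRESHOLD = 700  # hypothetical; the real value comes from the first line of the table


def match_weight(a: Bib, b: Bib) -> int:
    """Sum positive weights for matching fields and negative weights for conflicts."""
    total = 0
    gap = abs(a.year - b.year)
    if gap == 0:
        total += WEIGHTS["date exact match"]
    elif gap <= 2:
        total += WEIGHTS["date within 2"]
    else:
        total += WEIGHTS["date mismatch"]
    same_title = a.title.casefold() == b.title.casefold()
    total += WEIGHTS["full title match" if same_title else "full title mismatch"]
    same_publisher = a.publisher.casefold() == b.publisher.casefold()
    total += WEIGHTS["260b exact match" if same_publisher else "260b mismatch"]
    return total


def is_match(a: Bib, b: Bib) -> bool:
    return match_weight(a, b) > THRESHOLD


a = Bib(2001, "Union Catalogs", "Ex Libris")
b = Bib(2001, "union catalogs", "Ex Libris")
c = Bib(2002, "Union Catalogs", "Other Press")
```

With these assumptions, `a` and `b` total 200 + 600 + 100 = 900 and are treated as equivalent, while `a` and `c` total -25 + 600 - 25 = 550 and are not.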

  24. Workflow Illustration [Diagram: Resources / Contributors → queue of new/updated records → single BIB record → BIB’s pool of candidates → BIB’s pool of matched records (= equivalence table)]

  25. Two Types of Union Catalogs • “Union Catalog” - On top of Bibliographic + Holdings database • “Union View” - On top of ALEPH 500 administrative database

  26. Bibliographic and Holdings Database [Diagram labels: Union Catalog; Normalize records; Jump; Source 1, Source 2, Source 3]

  27. Bibliographic and Holdings Database • When records are loaded from various resources, fixes are done to normalize their structure and data. • Checks could be performed prior to the load so that incompatible records are rejected.

  28. Bibliographic and Holdings Database [Screenshot labels: “Jump to original”; “View in union holdings”]

  29. ALEPH 500 Database [Diagram labels: Union Catalog – User View; BIB 1, BIB 2, BIB 3; ADM 1, ADM 2, ADM 3; Librarian View]

  30. ALEPH 500 Database • Records are managed in standard ALEPH 500 in a single BIB and ADM library, but separately per sub-library or administrative unit. • The Staff User view does not change from an administrative GUI perspective. • A user (patron) has a unified view on the PAC.

  31. ALEPH 500 Database

  32. ALEPH 500 Database

  33. ALEPH 500 Database
