
A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery

Stacy Kowalczyk and James Halliday. April 28, 2008.





Presentation Transcript


  1. A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery
     Stacy Kowalczyk and James Halliday
     April 28, 2008

  2. Project Overview
     IN Harmony is:
     • An IMLS-funded grant
     • Awarded in Fall 2004
     • To be completed in Fall 2008
     • A partnership of
        • Indiana University Digital Library Program
        • Indiana University Lilly Library
        • Indiana State Library
        • Indiana State Museum
        • Indiana Historical Society
     IN Harmony – DLP Spring Forum 2008

  3. Project Goals
     • To provide a model for fostering collaborative digital library development by partnering with institutions with complementary collections;
     • To digitize a portion of the sheet music from these collections and offer access to these materials free of charge on the web;
     • To bring these materials and their attendant metadata together on a single web site, offering both federated searching of the entire collection and searching of one or more selected collections.

  4. Deliverables
     • Tools to
        • Process the images
        • Capture metadata
        • Provide search and display functions
     • 10,000 pieces of sheet music scanned and cataloged
        • 4,000 Indiana University Lilly Library
        • 2,000 Indiana State Library
        • 2,000 Indiana State Museum
        • 2,000 Indiana Historical Society

  5. Cataloging and Imaging Workflow Goals
     • Data integrity
     • Quality of the scans
     • Quality of the metadata
     • Accuracy of the links between page images
     • Accuracy of the links between metadata and images
     • Simplicity of use
     • Balance of flexibility and constraints

  6. Cataloging and Imaging Use Cases
     • Catalog first
     • Scanning first
     • Metadata created in another system and imported into IN Harmony

  7. Digitizing Quality Control
     • Two-phase quality control process
     • Automated QC process verifies:
        • All TIFF tags of every digital file
        • TIFF must be uncompressed
        • File names
        • Embedded profile appropriate to its bit depth
        • Consistency of pixel dimensions within a score
        • Appropriate resolution
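
To make the automated phase concrete, here is a minimal Python sketch of three of the checks listed on this slide (compression, resolution, and pixel-dimension consistency). It is not the project's QC software. The `PageScan` type, the `automated_qc` function, and the 300 dpi threshold are assumptions made for illustration, and "consistency" is read here as "all pages share the same dimensions". Reading the tags from the TIFF files is left out, as are the file-name and colour-profile checks, since the slides do not state those rules.

```python
from dataclasses import dataclass

# In the TIFF specification, tag 259 (Compression) has value 1 for
# uncompressed image data.
UNCOMPRESSED = 1


@dataclass
class PageScan:
    """Tag values already read from one TIFF file (hypothetical type)."""
    filename: str
    compression: int
    width: int
    height: int
    dpi: int


def automated_qc(pages, min_dpi=300):
    """Return a list of problems found; an empty list means the score passes.

    min_dpi is an assumed threshold, not a value from the presentation.
    """
    problems = []
    for page in pages:
        if page.compression != UNCOMPRESSED:
            problems.append(f"{page.filename}: TIFF is compressed")
        if page.dpi < min_dpi:
            problems.append(
                f"{page.filename}: resolution {page.dpi} is below {min_dpi}"
            )
    # Assumed reading of "consistency": every page has the same dimensions.
    sizes = {(p.width, p.height) for p in pages}
    if len(sizes) > 1:
        problems.append(f"inconsistent pixel dimensions: {sorted(sizes)}")
    return problems
```

Returning a list of messages rather than a single pass/fail flag lets the person who uploaded the scans see every problem in one run.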

  8. Digitizing Quality Control (2)
     • Manual QC – at 100% pixel display, verify:
        • Correct page orientation and order
        • Correct color balance
        • Sharp and in-focus scan
        • No digital artifacts
     • When all QC is passed, derivative files are created
        • Large and small JPGs for screen delivery
        • PDF sized for 8.5 x 11 printing

  9. Digitizing Quality Control Software

  10. Designing the metadata model
     • User studies
     • Work with the partners
     • Define fields
     • Write cataloging guidelines with partner input
     • Representation in MODS

  11. Types of fields
     • Title elements
     • Name elements
     • Publication elements
     • Subject elements
     • Identification elements
     • Note elements
     • Cover information

  12. Metadata Collection Tool

  13. Public Search and Discovery System Demo

  14. Architecture Overview (Jim Halliday)

  15. IN Harmony Technical Overview
     [Diagram. Labels: Scanner; FTP; Quality Control; Perl Web Application; Cataloging Client; Java Swing; Oracle; MODS Export; Fedora; Mass Storage System; Authentication Service; Web Browser; SRU and HTTP]

  16. Getting Data Into IN Harmony
     2 primary data sources
     • Cataloging client
     • Image QC/upload application
     Other data sources
     • XML data exported from other cataloging systems
     • Score images exported from older systems

  17. Image QC/upload application
     • User scans scores and uploads to IN Harmony server
     • User accesses Perl-based web application to initiate automated quality control
     • A second user proceeds with manual QC, then uses web application to signal that manual QC is finished
     • The application moves and backs up the files, creates derivatives, and alerts both Fedora and the internal database that the process is complete
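
The four steps on this slide can be read as a small state machine. The sketch below is one way to model it, assuming that each score moves through the stages strictly in order and that "a second user" means the manual reviewer must differ from the uploader. The real application is Perl-based; Python is used here only for illustration, and every name (`Stage`, `Score`, `run_automated_qc`, `finish_manual_qc`, `finalize`) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    UPLOADED = "uploaded"
    AUTO_QC_PASSED = "auto_qc_passed"
    MANUAL_QC_PASSED = "manual_qc_passed"
    COMPLETE = "complete"


@dataclass
class Score:
    identifier: str
    uploaded_by: str
    stage: Stage = Stage.UPLOADED


def run_automated_qc(score, passed):
    """Record the automated QC result; a failed score stays at UPLOADED."""
    if score.stage is not Stage.UPLOADED:
        raise ValueError("automated QC only applies to newly uploaded scores")
    if passed:
        score.stage = Stage.AUTO_QC_PASSED
    return score.stage


def finish_manual_qc(score, reviewer):
    """Signal that manual QC is finished (assumed to need a different user)."""
    if score.stage is not Stage.AUTO_QC_PASSED:
        raise ValueError("automated QC has not passed yet")
    if reviewer == score.uploaded_by:
        raise ValueError("manual QC must be done by a second user")
    score.stage = Stage.MANUAL_QC_PASSED
    return score.stage


def finalize(score):
    """Stand-in for the last step: moving and backing up files, creating
    derivatives, and notifying Fedora and the internal database."""
    if score.stage is not Stage.MANUAL_QC_PASSED:
        raise ValueError("manual QC has not passed yet")
    score.stage = Stage.COMPLETE
    return score.stage
```

Keeping the stage on the record itself means both the web application and any batch script can refuse to run a step out of order.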

  18. IN Harmony Derivatives
     • Three sizes of JPGs produced per page
        • Full (1200px high)
        • Screen (600px high)
        • Thumb (200px high)
     • Multi-page, playable PDF
     • Approx. 1MB for an average score
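
As a worked example of the three derivative sizes, the sketch below computes the pixel dimensions of each JPG from a master image. The target heights come from the slide; preserving the aspect ratio is an assumption, since the slide gives heights only, and the function name `derivative_sizes` is hypothetical. The actual resizing tool used by the project is not named in the presentation.

```python
# Target heights in pixels, as listed on the slide.
DERIVATIVE_HEIGHTS = {"full": 1200, "screen": 600, "thumb": 200}


def derivative_sizes(width, height):
    """Return {name: (width, height)} for each derivative.

    Assumes the aspect ratio of the master image is preserved.
    """
    sizes = {}
    for name, target_height in DERIVATIVE_HEIGHTS.items():
        scaled_width = max(1, round(width * target_height / height))
        sizes[name] = (scaled_width, target_height)
    return sizes
```

For a master scan of 3000 x 4000 pixels this gives 900 x 1200, 450 x 600, and 150 x 200.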

  19. IN Harmony cataloging client
     • Standalone Java Swing based client
     • Connects to Oracle database and outputs MODS for Fedora ingestion
     • Implemented as a client-server application via web services using Axis
     • Specialized UI components (such as ‘smart’ combo boxes) assist with quick, correct data entry

  20. Internal IN Harmony database
     • Oracle database stores record and user data in our own internal format
     • Communicates with upload/QC application and cataloging client
     • Cataloging client and internal scripts can output to MODS format for ingestion into Fedora
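
The conversion from an internal record to MODS might look roughly like the Python sketch below. The internal format is not described in the presentation, so the dictionary keys (`id`, `title`, `names`, `publisher`) and the function name `record_to_mods` are assumptions. The element names (`titleInfo`, `name`, `namePart`, `originInfo`, `publisher`, `identifier`) and the namespace are standard MODS version 3, but this is only a small subset of a full record.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def _tag(name):
    """Qualify an element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"


def record_to_mods(record):
    """Serialize a record dict (hypothetical internal shape) as MODS XML."""
    mods = ET.Element(_tag("mods"))

    title_info = ET.SubElement(mods, _tag("titleInfo"))
    ET.SubElement(title_info, _tag("title")).text = record["title"]

    for person in record.get("names", []):
        name = ET.SubElement(mods, _tag("name"), {"type": "personal"})
        ET.SubElement(name, _tag("namePart")).text = person

    origin = ET.SubElement(mods, _tag("originInfo"))
    ET.SubElement(origin, _tag("publisher")).text = record["publisher"]

    identifier = ET.SubElement(mods, _tag("identifier"), {"type": "local"})
    identifier.text = record["id"]

    return ET.tostring(mods, encoding="unicode")


# Placeholder values, not a real IN Harmony record.
example = record_to_mods({
    "id": "demo-0001",
    "title": "Example Waltz",
    "names": ["Doe, Jane"],
    "publisher": "Example Press",
})
```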

  21. IN Harmony authentication
     • CAS (IU’s Central Authentication Service) is used to authenticate all users
     • Non-IU users must create IU Guest Accounts to authenticate
     • All account/password maintenance in user’s control

  22. Fedora and IN Harmony
     • Fedora used as a single storage and infrastructure solution for Digital Library Program projects at IU
     • Data (score images and metadata) ingested into Fedora and referenced as METS objects
     • Master images sent to IU’s mass storage system
     • Derivatives stored internally
     • Objects indexed using Lucene for SRU-based searching

  23. Fedora Object Model
     [Diagram. Labels: Collection; Sheet music; Copy; Page]

  24. IN Harmony end-user interface
     • Java Struts-based web application
     • Offers searching, browsing, and record display
     • Each partner institution is offered a personalized view of their data only
     Interaction with Fedora
     • Application sends CQL queries to Fedora and retrieves MODS data which is transformed via XSLT
     • PURLs (persistent URLs) are used to access image derivatives
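
To illustrate the search side, here is a hedged Python sketch of building a CQL query and the SRU `searchRetrieve` URL that carries it. The parameter names (`operation`, `version`, `query`, `maximumRecords`, `recordSchema`) are standard SRU. Everything else is assumed for illustration: the base URL, the index names `dc.title` and `collection`, the schema short name `mods`, SRU version 1.1, and the idea that a partner's view is produced by adding a collection clause. The real application is written in Java.

```python
from urllib.parse import urlencode


def cql_quote(term):
    """Quote a CQL term, escaping backslashes and embedded double quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_cql(terms, collection=None):
    """Build a CQL query; the index names used here are hypothetical."""
    query = f"dc.title all {cql_quote(terms)}"
    if collection is not None:
        # Assumed mechanism for limiting results to one partner's data.
        query += f" and collection = {cql_quote(collection)}"
    return query


def build_sru_url(base_url, terms, collection=None, max_records=20):
    """Return a searchRetrieve URL for an SRU endpoint at base_url."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed protocol version
        "query": build_cql(terms, collection),
        "maximumRecords": str(max_records),
        "recordSchema": "mods",  # assumed schema short name
    }
    return base_url + "?" + urlencode(params)
```

The MODS records returned by such a request would then be passed through XSLT for display, as the slide describes.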

  25. METS Navigator
     • METS Navigator is used to page through scores online
     • Uses METS structMap to facilitate navigation
     • Allows views of multiple sizes of images
     • Released by IU as open source – see http://metsnavigator.sourceforge.net
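
The role of the structMap in paging can be shown with a short Python sketch. This is not METS Navigator's code (the tool itself is a Java application). It only demonstrates how the `ORDER` attribute on `div` elements and the `FILEID` on `fptr` elements give a page sequence. The sample document, its labels, and its file IDs are invented for illustration.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

# Invented sample; pages are deliberately listed out of order.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="physical">
    <div TYPE="score" LABEL="Example score">
      <div TYPE="page" ORDER="2" LABEL="Page 2"><fptr FILEID="p2-screen"/></div>
      <div TYPE="page" ORDER="1" LABEL="Cover"><fptr FILEID="p1-screen"/></div>
    </div>
  </structMap>
</mets>"""


def page_sequence(mets_xml):
    """Return (order, label, file_id) tuples sorted by the ORDER attribute."""
    root = ET.fromstring(mets_xml)
    pages = []
    for div in root.iter(METS + "div"):
        fptr = div.find(METS + "fptr")  # direct children only
        if fptr is None:
            continue  # a container div, not a page
        pages.append((int(div.get("ORDER")), div.get("LABEL"), fptr.get("FILEID")))
    return sorted(pages, key=lambda page: page[0])
```

A viewer can then step through the returned list to implement next and previous page links, and resolve each file ID to the image size the user has chosen.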

  26. IN Harmony Technical Overview
     [Diagram. Labels: Scanner; FTP; Quality Control; Perl Web Application; Cataloging Client; Java Swing; Oracle; MODS Export; Fedora; Mass Storage System; Authentication Service; Web Browser; SRU and HTTP]

  27. IN Harmony Links
     • IN Harmony Public Interface
     • IN Harmony Project Information
     • Cataloging Tool Release date – June 2008

  28. Questions?
