
FEDORA Project McGill University May 17 2004



Presentation Transcript


  1. FEDORA Project McGill University May 17 2004 Bill Parod Academic Technologies Northwestern University bill-parod@northwestern.edu

  2. Priorities for digital libraries • Managing digital resources as if they are all the same • Delivering digital resources as if they are all unique and free to participate in any number of contexts • Supporting digital scholarship wherever it may lead Slide courtesy of Sandy Payette and Thornton Staples

  3. Shortcomings of commercial digital library products • Narrow focus on specific media formats (e.g. image databases, document management) • Fail to effectively address interrelationships among digital entities • Fail to address interoperability. • Fail to provide facilities for managing programs and tools that deliver digital content. • Not extensible; do not enable easy integration of new tools and services Slide courtesy of Sandy Payette and Thornton Staples

  4. The Flexible Extensible Digital Object Repository Architecture (FEDORA) • Developed as a DARPA and NSF-funded research project at Cornell (1997-present) • Interpreted and re-implemented at University of Virginia (1999) • Virginia prototype supported a testbed of 10,000,000 digital objects with very good results (1999-2001) • Andrew W. Mellon Foundation granted Virginia and Cornell $1,000,000 to develop a full-featured production FEDORA system that is web-based (2002+) Slide courtesy of Sandy Payette and Thornton Staples

  5. Digital Object Model Slide courtesy of Sandy Payette and Thornton Staples

  6. Digital Object Model: Architectural View • Persistent ID (PID): globally unique persistent id • Disseminators (public view): access methods for obtaining “disseminations” of digital object content • System Metadata (internal view): metadata necessary to manage the object • Datastreams (protected view): content that makes up the “basis” of the object Slide courtesy of Sandy Payette and Thornton Staples

  7. Digital Object Model: Example • Persistent ID (PID) • Disseminators • Default: Get Profile, List Items, Get Item, List Methods, Get DC Record • Simple Image: Get Thumbnail, Get Medium, Get High, Get VeryHigh • System Metadata • Datastreams Slide courtesy of Sandy Payette and Thornton Staples
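The four parts of the object model on slides 6 and 7 can be pictured as a small data structure. The sketch below is illustrative only, not Fedora's storage format: the class names, the `demo:1` PID, and the datastream names are assumptions, and it assumes the method names on slide 7 group under the Default and Simple Image disseminators.

```python
from dataclasses import dataclass, field


@dataclass
class Disseminator:
    """Public view: a named group of access methods (illustrative)."""
    name: str
    methods: list[str]


@dataclass
class DigitalObject:
    pid: str  # globally unique persistent id
    system_metadata: dict[str, str] = field(default_factory=dict)    # internal view
    datastreams: dict[str, bytes] = field(default_factory=dict)      # protected view
    disseminators: list[Disseminator] = field(default_factory=list)  # public view

    def list_methods(self) -> list[str]:
        """Return every access method as 'Disseminator/method'."""
        return [f"{d.name}/{m}" for d in self.disseminators for m in d.methods]


# Hypothetical PID and datastream names, for illustration only.
image = DigitalObject(
    pid="demo:1",
    system_metadata={"state": "active"},
    datastreams={"DC": b"<dc/>", "MASTER": b"...tiff bytes..."},
    disseminators=[
        Disseminator("Default", ["getProfile", "listItems", "getItem",
                                 "listMethods", "getDCRecord"]),
        Disseminator("SimpleImage", ["getThumbnail", "getMedium",
                                     "getHigh", "getVeryHigh"]),
    ],
)
```

Calling `image.list_methods()` enumerates the public view without exposing the datastreams behind it, which is the separation the architectural view describes.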

  8. Object Behavior Contracts [Diagram labels: Data Object (Persistent ID (PID), System Metadata, Datastreams, Disseminators); Behavior Definition Object (PID, System Metadata, Datastreams: Behavior Definition Metadata); Behavior Mechanism Object (PID, System Metadata, Datastreams: Service Binding Metadata (WSDL)); Web Service; relationships: behavior subscription, behavior contract, data contract] Slide courtesy of Sandy Payette and Thornton Staples
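One way to read the contracts named on slide 8: a behavior definition lists abstract methods, a behavior mechanism promises to implement all of them (the behavior contract) and states which datastreams it needs (the data contract). The following is a minimal sketch under that reading; every class, PID, and datastream name is hypothetical, and a lambda stands in for the web service a real mechanism would bind to via WSDL.

```python
from dataclasses import dataclass
from typing import Callable

Datastreams = dict[str, bytes]


@dataclass(frozen=True)
class BehaviorDefinition:
    pid: str
    methods: frozenset[str]  # abstract method names only


@dataclass
class BehaviorMechanism:
    pid: str
    implements: BehaviorDefinition
    required_datastreams: frozenset[str]  # data contract
    handlers: dict[str, Callable[[Datastreams], bytes]]

    def __post_init__(self) -> None:
        # Behavior contract: every defined method needs a handler.
        missing = self.implements.methods - set(self.handlers)
        if missing:
            raise ValueError(f"behavior contract not met: {sorted(missing)}")


def disseminate(datastreams: Datastreams, mechanism: BehaviorMechanism,
                method: str) -> bytes:
    """Run one method of a mechanism against a data object's datastreams."""
    if method not in mechanism.implements.methods:
        raise KeyError(method)
    missing = mechanism.required_datastreams - set(datastreams)
    if missing:
        raise ValueError(f"data contract not met: {sorted(missing)}")
    return mechanism.handlers[method](datastreams)


# Hypothetical definition and mechanism.
simple_image = BehaviorDefinition("demo:bdef-image", frozenset({"getThumbnail"}))
local_files = BehaviorMechanism(
    pid="demo:bmech-image",
    implements=simple_image,
    required_datastreams=frozenset({"THUMB"}),
    handlers={"getThumbnail": lambda ds: ds["THUMB"]},
)
```

A data object with a `THUMB` datastream can be disseminated through `local_files`; one without it fails the data contract before any service is called.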

  9. Shared Image Behavior Definitions Slide courtesy of Sandy Payette and Thornton Staples

  10. Client and Web Service Interactions [Diagram labels: users; web browser; client applications; server application; Fedora Service APIs; Fedora Repository System; Service Dispatch; Content Transform Services; External Service; API] Slide courtesy of Sandy Payette and Thornton Staples

  11. Fedora 1.2 Software Feature Set • Open Fedora APIs: repository as web services (REST and SOAP bindings); WSDL interface definitions • Flexible Digital Object Model: Content View (objects as a bundle of items, both content and metadata); Service View (objects as a set of service methods, or “behaviors”); extensible functionality by associating services with objects • Repository System: Core Services (Management, Access/Search, OAI-PMH); Storage (XML object store, relational db object cache, relational db object registry); Mediation (auto-dispatching to distributed web services for content transformation); Auto-Indexing (system metadata and DC record of each object); HTTP Basic Authentication and Access Control; built-in disseminator services (XSLT transformation, image manipulation, XML-to-PDF) • Content Versioning: automatic version control (saves a version of content/metadata when modified); enables date-time stamped API requests (see an object as it looked at a point in time) • Clients: Fedora Administrator (GUI client to create/maintain objects); default web browser interface (search; access objects via default disseminator); command line utilities (batch load, ingest, purge, others); Migration Utility (mass export/ingest)
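The content versioning feature (a version saved on every modification, plus date-time stamped requests) can be illustrated with a small in-memory model. This is a sketch of the idea only, not Fedora's implementation; the class and method names are assumptions.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


class VersionedDatastream:
    """Keeps every saved version and answers 'as of' requests."""

    def __init__(self) -> None:
        self._stamps: list[datetime] = []
        self._contents: list[bytes] = []

    def modify(self, when: datetime, content: bytes) -> None:
        # Each modification appends a new version instead of overwriting.
        if self._stamps and when <= self._stamps[-1]:
            raise ValueError("versions must be added in chronological order")
        self._stamps.append(when)
        self._contents.append(content)

    def as_of(self, when: Optional[datetime] = None) -> bytes:
        """Return the version current at `when` (latest if omitted)."""
        if not self._stamps:
            raise LookupError("datastream has no versions")
        if when is None:
            return self._contents[-1]
        # Index of the first version stamped strictly after `when`.
        i = bisect_right(self._stamps, when)
        if i == 0:
            raise LookupError("no version existed at that time")
        return self._contents[i - 1]
```

A request stamped between two modifications returns the earlier version, which is what lets a client see an object as it looked at a point in time.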

  12. Fedora Repository Service Interfaces • Management Service (API-M): Ingest (XML-encoded object submission); Create (interactive object creation via API requests); Maintain (interactive object modification via API requests); Validate (application of integrity rules to objects); Identify (generate unique object identifiers); Security (authentication and access control); Preserve (automatic content versioning and audit trail); Export (XML-encoded object formats) • Access Service (API-A and API-A-LITE): Search (search repository for objects); Object Reflection (what disseminations can the object provide?); Object Dissemination (request a view of the object’s content) • OAI-PMH Provider Service: OAI-DC records Slide courtesy of Sandy Payette and Thornton Staples
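An Object Dissemination request over the REST-style API-A-LITE binding is an HTTP GET. The helper below sketches how a client might build such a URL. The path layout (`/get/PID/bDefPID/method`, with an optional date-time segment and query parameters) is an assumption based on the Fedora documentation of this period and should be checked against the release in use; the function name, host, and PIDs are hypothetical.

```python
from typing import Mapping, Optional
from urllib.parse import quote, urlencode


def dissemination_url(base: str, pid: str, bdef_pid: str, method: str,
                      as_of: Optional[str] = None,
                      params: Optional[Mapping[str, str]] = None) -> str:
    """Build a dissemination URL (assumed API-A-LITE layout)."""
    parts = [pid, bdef_pid, method]
    if as_of is not None:
        parts.append(as_of)  # date-time stamped request
    # Keep ':' unescaped because PIDs use it as a namespace separator.
    path = "/".join(quote(p, safe=":") for p in parts)
    url = f"{base.rstrip('/')}/get/{path}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical repository location and PIDs.
url = dissemination_url("http://localhost:8080/fedora",
                        "demo:5", "demo:1", "getThumbnail")
```

Because the request is a plain URL, a web browser, an XSLT stylesheet, or another server can all act as clients without a SOAP toolkit.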

  13. Fedora Software Distribution Package • Open Source (Mozilla Public License) • 100% Java (Sun Java J2SDK1.4) • Supporting Technologies • Apache Tomcat 4.1 and Apache Axis (SOAP) • Xerces 2-2.0.2 for XML parsing and validation • Saxon 6.5 for XSLT transformation • Schematron 1.5 for validation • MySQL and Mckoi relational database • Oracle 9i support • Deployment Platforms • Windows 2000, NT, XP • Solaris • Linux Slide courtesy of Sandy Payette and Thornton Staples

  14. FEDORA at Northwestern University

  15. General Background • Academic Technologies unit of IT • Develop and support faculty projects • Library partnerships • Institutional partnerships • Diverse clientele • Diverse content

  16. Current FEDORA Projects • Block Museum of Art • The Last Expression Art Collection • Introduction to Asian Art History • BBC Spoken Word Archive • Encyclopedia of Chicago • WordHoard Text Analysis Project • Various image collections

  17. General Goals • Efficient production - code reuse • Efficient access – content reuse • Content flexibility • Implementation flexibility • Content management • Implementation management

  18. Diversity of Content • Art collections • Wall murals • Photographs • Historical maps • GIS maps • Newspapers • Book page images • Digital video • Spoken word • Literary works • Encyclopedias • Lexical data • Census data • Event data

  19. Diversity of Systems • RDBMS • XML Databases • XSLT Processors • GIS • Wavelet Image Servers • Vector Image Processors • Streaming Media Servers • Custom Servlets

  20. Abstract Image Models • Art collections • Wall murals • Photographs • Historical maps • GIS maps • Newspapers • Book page images • Digital video • Spoken word • Literary works • Encyclopedias • Lexical data • Census data • Event data

  21. Abstract Text Model • Art collections • Wall murals • Photographs • Historical maps • GIS maps • Newspapers • Book page images • Digital video • Spoken word • Literary works • Encyclopedias • Lexical data • Census data • Event data

  22. Time-based Media Model • Digital video • Spoken word • Literary works • Encyclopedias • Lexical data • Census data • Event data • Art collections • Wall murals • Photographs • Historical maps • GIS maps • Newspapers • Book page images

  23. Behaviors by Type

  24. Simple Image • Simple Image model used for art collections • Collection-specific page style is achieved by bundling an XSLT style sheet as a datastream with the collection object • The same image model can be used for different collections

  25. Zoomable image • Zoomable image XSLT includes zooming controls • Collection-specific style is also achieved with XSLT • “Zoomable” image also provides simple image behavior • Can participate in basic image applications in this way
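The point that a zoomable image also provides simple image behavior, and can therefore take part in basic image applications, is a substitutability argument. A minimal sketch using a structural interface follows; the class and method names are hypothetical, and strings stand in for real disseminations.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SimpleImageBehavior(Protocol):
    """What a basic image application needs from any image object."""
    def get_thumbnail(self) -> str: ...
    def get_medium(self) -> str: ...


class SimpleImage:
    def __init__(self, pid: str) -> None:
        self.pid = pid

    def get_thumbnail(self) -> str:
        return f"{self.pid}/thumbnail"

    def get_medium(self) -> str:
        return f"{self.pid}/medium"


class ZoomableImage:
    """Adds zooming but still offers the simple image methods."""
    def __init__(self, pid: str) -> None:
        self.pid = pid

    def get_thumbnail(self) -> str:
        return f"{self.pid}/thumbnail"

    def get_medium(self) -> str:
        return f"{self.pid}/medium"

    def get_zoom_view(self, x: int, y: int, scale: int) -> str:
        return f"{self.pid}/zoom?x={x}&y={y}&scale={scale}"


def gallery(images: list[SimpleImageBehavior]) -> list[str]:
    # A basic application that only knows the simple image behavior.
    return [img.get_thumbnail() for img in images]
```

`gallery` accepts a mix of both object types because it depends only on the shared behavior, not on how each object is implemented.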

  26. Collection behavior • Collection Object methods: getSearchForm, performSearch, getItem, getItems, addItem, deleteItem, reindex, displayItem

  27. Customizing collection objects • Collection objects all leverage common search functionality • Each provides its own XSLT for search results • So new collections can be brought up easily • This is true regardless of the collection type: image, audio, …
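The split between shared search logic and per-collection presentation can be sketched as follows. In the talk the presentation layer is an XSLT stylesheet carried by the collection object; here a plain Python callable stands in for that stylesheet, and all names and sample items are hypothetical.

```python
from typing import Callable

Hit = tuple[str, str]  # (pid, title)


class Collection:
    """Shared search logic plus a collection-specific result renderer."""

    def __init__(self, name: str, render_hit: Callable[[Hit], str]) -> None:
        self.name = name
        self.render_hit = render_hit  # stand-in for the collection's XSLT
        self._titles: dict[str, str] = {}

    def add_item(self, pid: str, title: str) -> None:
        self._titles[pid] = title

    def delete_item(self, pid: str) -> None:
        self._titles.pop(pid, None)

    def perform_search(self, query: str) -> list[Hit]:
        # Common to every collection: case-insensitive title match.
        q = query.lower()
        return sorted((pid, t) for pid, t in self._titles.items()
                      if q in t.lower())

    def search_results_page(self, query: str) -> str:
        # Only this step differs between collections.
        return "\n".join(self.render_hit(h) for h in self.perform_search(query))


murals = Collection("Wall murals", lambda hit: f"<li class='mural'>{hit[1]}</li>")
audio = Collection("Spoken word", lambda hit: f"<p>{hit[1]} [{hit[0]}]</p>")
```

Bringing up a new collection then means supplying one renderer, while `perform_search` is reused unchanged.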

  28. Search Implementation • FEDORA METS files currently indexed offline • Plan to integrate update notification and indexing • Search Engine • Have 3 implementations: • FEDORA native search • Sgrep • OpenText • Investigating SRW/CQL • Search results passed through XSLT • Easy to provide search capability to collections

  29. TEI Text

  30. Bound Volume • TEI Book object • For transcribed and/or page image scans • Table of Contents tree viewer • Zoomable image object for page scans

  31. Content Re-use • Contextualization • Collection maintenance • Topical galleries • Ad-hoc or dynamic collections • For classes... • personal collections… • special exhibits…

  32. Specialized clients • “Project Pad” software • Group/Private network folders • Image annotation • Audio annotation • Client for FEDORA image and audio objects

  33. Image workspace

  34. Implementation flexibility • Development vs production environment • Avoid product “lock-in” • Technology migration • Services are external • Image server • Tomcat servlets • Search engines • Table of contents service • Xquery • RDBMS

  35. FEDORA – External Services [Diagram labels: External Services (Image Server, Search Engine, RDBMS, TOC Server); FEDORA; BMECH; Dissemination Requests; Data Request; Dissemination; Data stream; Cache data]

  36. Next Steps • Implement more object types • Event, video, tabular data • Authoring tools • Work flow support • Security management • Content management tools • Wider interoperability

  37. Image Workflow: FEDORA – TrueSpectra – Xythos [Diagram labels: Department; Academic Technologies; Users; Metadata in Excel; METS; FEDORA; Tiffs in Xythos; TrueSpectra Image Server; Dissemination Requests; legend: data flow, requests] • Catalog in Excel converted to METS for FEDORA ingest • TIFF masters deposited in collection’s Xythos directory • Access to Xythos directory enabled for TrueSpectra virtual paths • METS/FEDORA record includes link to TrueSpectra image • Access to image is through FEDORA image behaviors
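The first and fourth steps of this workflow (catalog rows become METS records that link to the served image) might look like the sketch below. It assumes the Excel catalog has been exported to CSV with `id`, `title`, and `filename` columns; those column names, the sample row, and the image server URL are hypothetical. The output is a bare METS skeleton, and Fedora's own METS profile requires more than is shown here.

```python
import csv
import io
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"


def row_to_mets(row: dict[str, str], image_base: str) -> ET.Element:
    """Turn one catalog row into a minimal METS element tree."""
    mets = ET.Element(f"{{{METS}}}mets", {"OBJID": row["id"]})
    # Descriptive metadata: a Dublin Core title wrapped in a dmdSec.
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", {"ID": "DC"})
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC}}}title").text = row["title"]
    # File section: a link to the image server rather than embedded bytes.
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp")
    file_el = ET.SubElement(grp, f"{{{METS}}}file", {"ID": "IMAGE"})
    ET.SubElement(file_el, f"{{{METS}}}FLocat", {
        "LOCTYPE": "URL",
        f"{{{XLINK}}}href": image_base + row["filename"],
    })
    return mets


def catalog_to_mets(csv_text: str, image_base: str) -> list[ET.Element]:
    return [row_to_mets(row, image_base)
            for row in csv.DictReader(io.StringIO(csv_text))]


# Hypothetical catalog export and image server path.
CATALOG_CSV = "id,title,filename\ndemo:100,Untitled print,print01.tif\n"
records = catalog_to_mets(CATALOG_CSV, "http://images.example.edu/block/")
```

Keeping only a link in the record is what lets the TIFF masters stay in the Xythos directory while access goes through the image behaviors.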

  38. Physical Collection Management Scenario: FEDORA – Content Service – Xythos Integration [Diagram labels: Faculty or Support; Academic Technologies; Users; Files in Xythos; Auto-ingester; FEDORA; TrueSpectra; Streaming Server; Search; Metadata update; Dissemination Requests; legend: data flow, requests] • FEDORA collection object attached to Xythos directory • Xythos notifies collection object of changes in the directory • File added – collection creates new member item • File updated – item accepts new version for file stream • File removed – item is set dormant in FEDORA • Metadata added/updated online or batch
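The three notification rules in this scenario (added, updated, removed) map naturally onto an event handler. The sketch below models them in memory; the event names, class names, and state strings are assumptions, and no real Xythos or Fedora API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    path: str
    versions: list[bytes] = field(default_factory=list)
    state: str = "active"


class CollectionObject:
    """Collection attached to a directory; reacts to change notices."""

    def __init__(self) -> None:
        self.items: dict[str, Item] = {}

    def on_change(self, event: str, path: str, content: bytes = b"") -> None:
        if event == "added":
            # File added: the collection creates a new member item.
            self.items[path] = Item(path, [content])
        elif event == "updated":
            # File updated: the item accepts a new version of its file stream.
            self.items[path].versions.append(content)
        elif event == "removed":
            # File removed: the item is set dormant, not deleted.
            self.items[path].state = "dormant"
        else:
            raise ValueError(f"unknown event: {event}")
```

Marking removed files dormant rather than purging them keeps earlier versions and any references to the item intact.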

  39. Summary • Code reuse through object abstraction • Content reuse through clear object models • Flexible implementation binding • Flexible content modeling

  40. Future Software Releases (December 2003 – December 2004) • Fedora Object XML (FOXML): internal storage format; direct expression of Fedora object model • Better support for relationships (“kinship” metadata) • Better support for audit trail (event history) • Format identifiers for dynamic service binding • Shibboleth authentication • Policy Enforcement: XACML expression language; Fedora policy enforcement module • Web interface for easy content submission • Batch object modification utility • Administrative Reporting • Object Event History (ABC/RDF disseminations) • Better support for “collections” • New ingest and export formats (METS 1.3, DIDL) Slide courtesy of Sandy Payette and Thornton Staples

  41. Future Development Proposals • Digital Library in a Box • Full-featured DL application with “Fedora inside” • Optimized for common set of content types • Fedora Power Server • Integrity Management Tools • Service and link liveness checker • Fault Tolerance • Mirroring and Replication • Peer-to-peer interoperability features • Repository clustering • Load balancing • Object Creation Tools • Workflow applications based on content models • Web interface for document/content submission Slide courtesy of Sandy Payette and Thornton Staples

  42. Questions http://www.fedora.net Bill-parod@northwestern.edu
