Cross Collection Discovery in the Yale Digital Commons

Youn Noh, November 19, 2010

Outline
• Introduction
  • Project background
  • Related work
• Project context
  • Office of Digital Assets and Infrastructure
  • Yale Digital Commons
• Current phase
  • Goals and deliverables
  • Campus partners and collections
  • Architecture and metadata
• Demo
• Future work

Digital Collections at Yale
• Yale faculty and students access Yale’s extensive collections for teaching and research.
• Some Yale faculty would also like to archive their personal collections.
• Yale’s Information Technology Services supports web site development for particular classes. Web sites are typically developed as one-offs.
• Yale’s Office of Digital Dissemination, in the Office of the Secretary, promotes the internationalization of Yale and the dissemination of Yale’s collections to the world.

The Problem of Silos
• Thematically related content is separated.
• User interfaces have to be built for each collection.
• Users may not know where to look.
• The information architecture for Yale’s search environment is still largely based on organizational structure.
• There is no easy way to drill down to content based on interests or information need.
• Resources may not be organized or described in a consistent manner across collections.

Cross Collection Search (2007)
• Mellon-funded Collections Collaborative re-grant project to enhance discovery, search, and access to Yale’s special collections.
• Partnership led by Yale University Library and Yale’s Information Technology Services.
• Proof-of-concept metadata aggregation using OAI-PMH.
• Challenges and lessons learned
  • Reusable infrastructure requires upfront investment.
  • Payoffs are not immediate.
  • Sustainability is always an issue.

Single Search for Library, Archive and Museum Collections
• Project sponsored by OCLC to create guidelines for the implementation of single search for local aggregations of LAM collections.
• Working Group
  • Getty Research Institute
  • Minnesota Historical Society
  • Smithsonian Institution
  • Wellcome Trust
  • UC Berkeley
  • University of Calgary
  • Victoria and Albert Museum
  • Yale Center for British Art
  • Yale University
• Final deliverable will be a white paper based on an internal survey that addresses issues identified by the Working Group.

ARTstor Shared Shelf
• Project to develop a cataloging and image management system that integrates with the ARTstor Digital Library.
• Target audience
  • Library visual resources collections
  • Instructional technology (and faculty)
• Fills a gap.
  • No single image cataloging system has market dominance.
• Leverages strengths.
  • The ARTstor Digital Library has a broad user base.
• Cataloging interface is being developed iteratively based on requirements gathering and user testing at partner institutions.
• Business model is being developed in consultation with the Shared Shelf Steering Committee, which includes Cornell and Yale.

Office of Digital Assets and Infrastructure
• Provides strategic and operational leadership for the development of Yale’s digital assets and infrastructure.
• Leads and coordinates collaboration among campus units.
  • Galleries, libraries, archives, and museums
  • Office of Digital Dissemination
  • Office of Public Affairs and Communications
  • Yale University Press
• Identifies overlaps and gaps in infrastructure for teaching and research.
  • Arts Area Advisory Committee
  • Collections and Educational Technology
  • Provost’s Committee on Scholarly Publishing
  • Mass Storage Working Group

Yale Digital Commons
• Provides a collaborative framework for developing services to support Yale’s digital assets throughout their lifecycle.
• Supports digital production, collaboration, dissemination, and stewardship functions.
• Improves sustainability of programs through larger-scale adoptions.
• Services
  • Digital asset management
  • Digital preservation
  • Persistent linking
  • Cross collection discovery

Yale Digital Commons Components
[Diagram. Labeled components: Collection Management Systems (Orbis, TMS, EMu); Metadata Messaging; Content Export; Aggregate; DAM; Digital Preservation; Persistent Linking; Isilon Mass Storage; Data Warehouse / Reporting; Cross Collection Discovery; Search; OAI; Drupal Web; Kaltura; CDN; iTunesU; YouTube.]

Cross Collection Discovery: Goals and Deliverables
• Goals
  • Develop shared practices and infrastructure.
  • Provide broader access to Yale’s collections.
• Deliverables
  • Metadata aggregation service (built on OAICat)
    • Central OAI service provider harvests metadata from campus partners.
    • File transfer option for partners that do not implement providers.
    • Central OAI data provider provides aggregated metadata to external harvesters.
  • User interface and search service (built on VuFind)
    • Customized record displays based on metadata format.
    • Crosswalk for indexing, advanced search, and facets.
    • Normalized local controlled vocabularies for key fields.
    • Programmatic access provided via APIs.
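The harvesting step in the aggregation service can be illustrated with a short Python sketch. It is not the project's code (the slides name OAICat as the basis of the service); it only shows the shape of an OAI-PMH ListRecords exchange. The base URL, set name, and sample response are made up for illustration, and the parser reads only the header identifier and Dublin Core titles.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix and set on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        identifier = rec.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        titles = [t.text for t in rec.iter(DC_NS + "title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    # An empty or missing token means the list is complete.
    return records, ((token or "").strip() or None)


# Hypothetical response from a partner's data provider (illustration only).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample map</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.edu/oai", resumption_token=token)
```

A harvester would keep requesting `next_url` until the token comes back empty. Passing `set_spec` restricts the harvest to one OAI set.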

Cross Collection Discovery: Campus Partners and Collections
• Yale Center for British Art
  • Paintings and Sculpture
  • Prints and Drawings
  • Rare Books and Manuscripts
• Yale Peabody Museum
  • All departments
• Yale University Art Gallery
  • All departments
• Yale University Library
  • Map Collection
  • Lewis Walpole Library Prints and Drawings
• Office of Digital Dissemination
  • Yale University on iTunes U

Cross Collection Discovery: Architecture

Cross Collection Discovery: Metadata
• Crosswalks and mappings
  • Local database schemas
    • EMu
    • TMS
    • Yale NetCast tool
  • Standard metadata formats
    • CDWA-Lite
    • Darwin Core
    • Dublin Core
    • MARC
  • VuFind / Solr index fields
    • Based roughly on MARC.
    • New fields added as needed.
  • XSL used for transformations.
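A crosswalk maps fields from each source format onto one shared set of index fields. The slides say XSL was used for the actual transformations; the Python sketch below swaps in plain dictionaries only to make the mapping idea concrete. The index field names (`title`, `author`, `topic`, `geographic`) and the specific mappings are assumptions, not the project's schema. The `dc:` and `dwc:` source names are real Dublin Core and Darwin Core terms.

```python
# Hypothetical crosswalks: source field -> shared index field.
CROSSWALKS = {
    "oai_dc": {
        "dc:title": "title",
        "dc:creator": "author",
        "dc:subject": "topic",
        "dc:coverage": "geographic",
    },
    "dwc": {
        "dwc:scientificName": "title",
        "dwc:scientificNameAuthorship": "author",
        "dwc:country": "geographic",
    },
}


def apply_crosswalk(metadata_format, record):
    """Map a record (source field -> list of values) onto index fields."""
    mapping = CROSSWALKS[metadata_format]
    doc = {}
    for source_field, values in record.items():
        target = mapping.get(source_field)
        if target is None:
            continue  # unmapped source fields are dropped in this sketch
        doc.setdefault(target, []).extend(values)
    return doc


dc_record = {
    "dc:title": ["A map of Connecticut"],
    "dc:creator": ["Doe, Jane"],
    "dc:rights": ["not indexed"],
}
dwc_record = {
    "dwc:scientificName": ["Quercus alba"],
    "dwc:scientificNameAuthorship": ["L."],
    "dwc:country": ["United States"],
}

dc_doc = apply_crosswalk("oai_dc", dc_record)
dwc_doc = apply_crosswalk("dwc", dwc_record)
```

Because both records land in the same index fields, a single search or facet can span a library map and a museum specimen.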

Cross Collection Discovery: Metadata
• Facets and local controlled vocabularies
  • Access
    • Metadata must be in the public domain.
    • Assets may be restricted.
    • Important distinction to make in user interface for non-Yale users.
    • Local controlled vocabulary of integer values (0 for public domain and 1 for restricted access) used to designate type of access.
    • Providers host assets and handle user authentication.
  • Institution
    • Important for campus partners.
  • Collection
    • Corresponds to museum departments, library collections, and categories in iTunes U.
    • Provided as OAI sets.
    • Means of bringing together similar resources held by different units.
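The access facet described above uses a two-value integer vocabulary. A minimal Python sketch of normalizing partner-supplied values into that vocabulary follows. Only the codes 0 and 1 come from the slides. The partner terms, the display labels, and the choice to treat unknown values as restricted are assumptions made for illustration.

```python
ACCESS_PUBLIC = 0      # public domain (from the slides)
ACCESS_RESTRICTED = 1  # restricted access (from the slides)

# Display labels for the facet; wording is an assumption.
ACCESS_LABELS = {
    ACCESS_PUBLIC: "Public domain",
    ACCESS_RESTRICTED: "Restricted access",
}

# Hypothetical strings a partner system might supply.
_PARTNER_TERMS = {
    "public domain": ACCESS_PUBLIC,
    "open": ACCESS_PUBLIC,
    "restricted": ACCESS_RESTRICTED,
    "yale only": ACCESS_RESTRICTED,
}


def normalize_access(raw):
    """Map a partner-supplied access value to the 0/1 vocabulary."""
    key = str(raw).strip().lower()
    if key in ("0", "1"):
        return int(key)
    # Assumption: anything unrecognized is treated as restricted,
    # so an asset is never advertised as open by mistake.
    return _PARTNER_TERMS.get(key, ACCESS_RESTRICTED)


label = ACCESS_LABELS[normalize_access(" Public Domain ")]
```

Storing the integer in the index and resolving the label at display time lets the interface wording change without reindexing.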

Cross Collection Discovery: Metadata
• Facets and controlled vocabularies
  • Creator (1XX)
    • For specimens? Scientific name author.
  • Type (LDR/06)
    • Museums use normalized local controlled vocabulary for classification developed for digital asset management system object models.
  • Topic (6XX)
    • Topical or iconographic description is important for access.
    • Museums are exploring social tagging to broaden access.
  • Genre (655)
  • Region (651, 650z, 690z)
    • Museums use culture.
  • Era (648, 650y, 690y)
    • For specimens? Periods, epochs, ages, groups, and formations.
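The MARC tags listed for each facet can be read as a routing table. The Python sketch below routes (tag, subfield code, value) triples to facets using those tags. The precedence is an assumption: the slides list Topic as 6XX, which overlaps with the Genre, Region, and Era tags, so this sketch lets the more specific rule win. The sample values are made up.

```python
def facet_for(tag, code):
    """Return the facet name for a MARC tag and subfield code, or None."""
    if tag == "651" or (tag in ("650", "690") and code == "z"):
        return "region"
    if tag == "648" or (tag in ("650", "690") and code == "y"):
        return "era"
    if tag == "655":
        return "genre"
    if tag.startswith("6"):
        return "topic"    # remaining 6XX fields
    if tag.startswith("1"):
        return "creator"  # 1XX main entry fields
    return None


def build_facets(subfields):
    """Group values by facet from (tag, code, value) triples."""
    facets = {}
    for tag, code, value in subfields:
        facet = facet_for(tag, code)
        if facet is not None:
            facets.setdefault(facet, []).append(value)
    return facets


def record_type(leader):
    """Type facet source: position 06 of the MARC leader."""
    return leader[6]


sample = [
    ("100", "a", "Hogarth, William"),
    ("245", "a", "A sample title"),
    ("650", "a", "Satire"),
    ("650", "z", "England"),
    ("650", "y", "18th century"),
    ("655", "a", "Etchings"),
]
facets = build_facets(sample)
type_code = record_type("00000nkm a2200000 a 4500")
```

Here the 245 title is ignored because it feeds no facet, and the 650 subdivisions `z` and `y` are split out into Region and Era rather than Topic.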

Cross Collection Discovery: Demo
• Search
• Item record display
• Resource dissemination
• Refine search

Cross Collection Discovery: Future Work
• Usability
  • Stakeholder survey
  • User testing
  • Search analytics
• Controlled vocabulary services
  • Use ARTstor vocabulary services.
• Search optimization
  • Tweak Lucene / Solr to boost fields and records in search.
• Topic modeling
  • Apply a probabilistic text mining technique for learning topics across collections.
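Boosting fields and records in Solr is usually done through query parameters. The Python sketch below builds them for the DisMax family of query parsers, where `qf` weights fields and `bq` adds a boost query for matching records. The field names, weights, and institution value are hypothetical, and the choice of `edismax` is an assumption; the slides do not say which parser or fields the project would use.

```python
from urllib.parse import urlencode


def boosted_query_params(user_query, field_boosts, record_boost=None,
                         def_type="edismax"):
    """Build Solr request parameters with field and record boosts."""
    # qf takes space-separated "field^weight" entries.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = {"defType": def_type, "q": user_query, "qf": qf}
    if record_boost:
        # bq raises the score of records matching this query.
        params["bq"] = record_boost
    return params


params = boosted_query_params(
    "hogarth prints",
    {"title": 5, "topic": 2, "allfields": 1},  # hypothetical fields and weights
    record_boost="institution:ycba^1.5",        # hypothetical field and value
)
query_string = urlencode(params)
```

The resulting `query_string` would be appended to a Solr select URL. Weights like these are normally tuned against search analytics rather than set once.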
