Optimizing Repositories and Identifiers in Architecture: Centralize, Store, and Share

Theme 3: Architecture

Q1: Who houses stuff, both records and identifiers?
• Centralizing all useful services and repositories has advantages (latency, etc.) … but centralizing content will be costly, require agreements, and create liabilities (versioning, etc.), so it is problematic as a short-term goal
• Overall, specialized repositories are proliferating, not converging
• If the content stays only in the subject-specific repositories (SSRs):
  • Provide opt-in storage services (funding model?)
  • Provide an audit function for repository compliance with standards (e.g., RLG/OCLC trusted repository guidelines)
  • Provide information/guidance on formats (risk, migration)
  • Extend JHOVE for key formats important to the community
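
Format guidance starts with knowing which formats a collection actually holds. The minimal Python sketch below profiles a set of files by their leading bytes ("magic numbers") and counts anything it does not recognize. It is not JHOVE, which goes much further by validating and characterizing files. The signature table and the `identify`/`format_profile` helpers are hypothetical and cover only a few well-known formats.

```python
from collections import Counter

# A few well-known file signatures ("magic numbers"); illustrative, not exhaustive.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"PK\x03\x04": "ZIP container",  # also covers ZIP-based formats such as DOCX/XLSX
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}


def identify(header: bytes) -> str:
    """Return a format name for the first bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def format_profile(headers: list[bytes]) -> Counter:
    """Count formats across a collection; 'unknown' marks the long tail."""
    return Counter(identify(h) for h in headers)


if __name__ == "__main__":
    sample = [b"%PDF-1.4 ...", b"\x89PNG\r\n\x1a\n...", b"\x00\x01custom"]
    print(format_profile(sample))
```

A large "unknown" count indicates where the long-tail formats sit, and those formats are the candidates for registry work or tool extensions.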

Q1: Who houses stuff, both records and identifiers? (cont.)
• Many formats in the field … most data is in a small number of formats, but data in the long tail is very important (engage GDFR?)
• Metadata may be more widely replicated than data
• External resources (SSRs): use OpenURL to facilitate (and distinguish between) access to data, services, metadata, etc. for a single item; link journal-hosted data with additional/ancillary data hosted by DRIADE?
• Service level agreements
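
One way to read the OpenURL bullet is as follows. A single item gets one referent identifier (`rft_id`), and a separate ServiceType entry (`svc_id`) tells the link resolver whether the user wants the data, the metadata, or a service. The Python sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value link. The resolver URL and the `info:example/...` service-type URIs are hypothetical placeholders, and a real resolver would define its own.

```python
from urllib.parse import urlencode

# Hypothetical link-resolver endpoint.
RESOLVER = "https://resolver.example.org/openurl"

# Hypothetical ServiceType identifiers distinguishing what the user wants.
SERVICE_TYPES = {
    "data": "info:example/svc/data",
    "metadata": "info:example/svc/metadata",
    "service": "info:example/svc/service",
}


def build_openurl(doi: str, want: str) -> str:
    """Build an OpenURL 1.0 KEV link: one referent, one requested service type."""
    if want not in SERVICE_TYPES:
        raise ValueError(f"unknown service type: {want!r}")
    params = {
        "url_ver": "Z39.88-2004",       # OpenURL 1.0 version marker
        "rft_id": f"info:doi/{doi}",    # the referent: the single item
        "svc_id": SERVICE_TYPES[want],  # the requested ServiceType
    }
    return f"{RESOLVER}?{urlencode(params)}"
```

Calling `build_openurl("10.1234/example", "data")` and `build_openurl("10.1234/example", "metadata")` yields two links that share the same referent but ask for different things. This is the distinction the bullet describes. The DOI is also a placeholder.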

Q2: Is it productive to process full text for automated generation of context metadata?
• Yes, but …
• There are a variety of ways to do this … quantitative analysis is less costly; natural language processing requires more investment
• More can be done if access to full text is allowed (combing full text for linkages, etc.)
• Portal searches can also be contextualized using a ‘bag of words’ approach, describing subfields as indexes
• A combination of statistical processing, natural language processing, and the rise of XML-based metadata can help
• Administrative/technical metadata can be captured in data flows
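
The ‘bag of words’ idea can be sketched in a few lines. Each subfield is described by word counts drawn from representative text. A portal query is then matched to the closest subfield index by cosine similarity. The subfield names and profile text below are hypothetical, and real profiles would be built from abstracts or keywords rather than a handful of words.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; word order is ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_subfields(query: str, profiles: dict[str, Counter]) -> list[str]:
    """Order subfield indexes by similarity to the query, best first."""
    q = bag_of_words(query)
    return sorted(profiles, key=lambda name: cosine(q, profiles[name]), reverse=True)


# Hypothetical subfield profiles (assumed, for illustration only).
PROFILES = {
    "phylogenetics": bag_of_words("sequence alignment phylogeny tree clade"),
    "ecology": bag_of_words("species abundance habitat population survey"),
}
```

This is the "quantitative analysis" end of the spectrum: it is cheap to run and needs no linguistic processing. The trade-off is that it ignores word order and meaning.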

Q3: Does storing a local copy make sense for SSR handshaking?
• Helps to assure persistent access to content (as with CiteSeer) … but comes with burden and responsibility
• Data vs. application: need to secure access to the underlying data … replicating AJAX-style services is very, very hard
• Versioning is a key issue here
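
The versioning concern can be made concrete with a minimal sketch of a local mirror. It keys each stored copy by a SHA-256 digest, so a re-harvest of unchanged content does not create a spurious new version. The `LocalMirror` class is hypothetical. It compares only against the latest copy, so content that reverts to an earlier state is recorded as a new version.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LocalMirror:
    """Hypothetical local copy of SSR items that keeps every distinct version."""
    # item ID -> list of (sha256 hex digest, content), oldest first
    versions: dict[str, list[tuple[str, bytes]]] = field(default_factory=dict)

    def store(self, item_id: str, content: bytes) -> int:
        """Store fetched content; return the current version number (1-based)."""
        digest = hashlib.sha256(content).hexdigest()
        history = self.versions.setdefault(item_id, [])
        if not history or history[-1][0] != digest:
            history.append((digest, content))  # content changed: new version
        return len(history)

    def latest(self, item_id: str) -> bytes:
        """Return the most recently stored content for an item."""
        return self.versions[item_id][-1][1]
```

This covers only the data. It does nothing to replicate the services built on top of the data, which the slide flags as the hard part.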

Q4: Is everyone in agreement with the ‘don’t compete with Google’ conclusion?
• Yes and no: develop community-specific discovery environments
• … but also expose content to Google (expose, contextualize, refer to domain-specific systems); leverage commonly used interfaces
• Google, Microsoft, etc. now highly value well-curated collections and are actively engaging them
• Google’s current interface is the big thing now … be prepared to interface with the next big thing
• WorldCat.org as an advanced discovery environment for scholarly material, including (increasingly) data
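
One common, low-cost way to expose repository content to Google and other web search engines is an XML sitemap that lists landing pages. The Python sketch below emits a minimal sitemap using the sitemaps.org 0.9 schema. The landing-page URL and date used in the example call are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """entries: (landing-page URL, last-modified date as YYYY-MM-DD) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([("https://repo.example.org/ds/1", "2007-01-15")]))
```

Pointing the sitemap at dataset landing pages, rather than at raw files, lets search engines refer users back into the domain-specific system.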

Q5: What are the pros and cons of DOIs, handles, and other identifiers?
• One of the most important issues DRIADE will face
• Persistent, actionable identifiers vs. unique identifiers in various sub-domains and individual institutions (an item will have many IDs)
• Question of DOI expense and connection to publishers
• Need a community understanding of a ‘canonical identifier’
• Need a community discussion of what is important about identifiers
• Who controls/changes them? What software is used? Locally hosted?
• What cost? Branding? Is resolution data needed?
• Third-party assignment of persistent identifiers?

Q5: What are the pros and cons of DOIs, handles, and other identifiers? (cont.)
• Need to promote datasets to primary resources (not just subordinate to the article) in references and discovery
• For multi-file datasets: need to link to a surrogate or package
• Identifiers as “micro-billboards”, and as generators of data about the contextual use of data (resolution data)
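
Two points from these slides can be illustrated together: "an item will have many IDs" but needs one canonical identifier, and resolution doubles as a source of usage data. The sketch below is a hypothetical in-memory resolver. It maps any known identifier to its canonical ID and landing URL, and it counts each resolution by the identifier actually used. The identifiers and URL in the example are placeholders, not real DOIs.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Hypothetical resolver: many local IDs -> one canonical ID -> one landing URL."""
    targets: dict[str, str] = field(default_factory=dict)  # canonical ID -> URL
    aliases: dict[str, str] = field(default_factory=dict)  # any known ID -> canonical ID
    # Resolution data: count per (canonical ID, identifier actually used).
    hits: Counter = field(default_factory=Counter)

    def register(self, canonical: str, url: str, *other_ids: str) -> None:
        self.targets[canonical] = url
        for identifier in (canonical, *other_ids):
            self.aliases[identifier] = canonical

    def resolve(self, identifier: str) -> str:
        canonical = self.aliases[identifier]  # KeyError if the ID is unknown
        self.hits[(canonical, identifier)] += 1
        return self.targets[canonical]


r = Resolver()
r.register("doi:10.1234/example", "https://repo.example.org/ds/1", "lab:ds-42")
r.resolve("lab:ds-42")
```

Recording which alias was used shows the contexts (an institution's local system, a journal citation) through which people reach the data.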

Q6: Data and applications: where does the complexity live?
• Leave it up to the community to develop best practices over time
• Over-engineering here will make it harder to respond to change
• Facilitate, and let practice develop within sub-communities (testbeds for innovation)
• Content packaging plays a role here: bundling data with services, documentation, etc.
• Use (and cultivate) web services and lightweight APIs to facilitate access across and between systems
• Some opportunities to ‘desiccate’ replications from complex applications
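
Content packaging can be illustrated with a BagIt-style payload manifest. BagIt is one packaging convention among several; the discussion above does not specify it. The sketch computes `manifest-sha256.txt` contents for an in-memory set of files placed under `data/`. A complete bag would also need a `bagit.txt` declaration file, which is omitted here, and the file names in the example are hypothetical.

```python
import hashlib


def bagit_manifest(files: dict[str, bytes]) -> str:
    """Return manifest-sha256.txt contents for an in-memory set of payload files.

    Each line is '<sha256 hex digest>  data/<relative path>'; in a BagIt bag
    the payload lives under the data/ directory.
    """
    lines = []
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        lines.append(f"{digest}  data/{path}")
    return "\n".join(lines) + "\n"


# Hypothetical bundle: the dataset plus its documentation and a service description.
print(bagit_manifest({
    "table.csv": b"a,b\n1,2\n",
    "README.txt": b"How the table was produced.\n",
    "service.json": b"{}\n",
}))
```

A lightweight convention like this fits the "don't over-engineer" stance. It fixes only the file list and fixity, and it leaves the contents of the bundle up to each sub-community.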

Q7: How does death fit into the metadata lifecycle?
• ‘Tombstoning’ for dead data
• Data euthanasia?
• Shifts in contact info (author, data custodian)
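
‘Tombstoning’ usually means that when data is withdrawn, its identifier keeps resolving to a page that states what the item was and why it is gone, rather than returning an error. A minimal sketch, assuming a hypothetical `Record` type that keeps citation metadata after the data URL is removed:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Record:
    identifier: str
    title: str
    url: Optional[str]                      # None once the data itself is gone
    withdrawn_reason: Optional[str] = None  # set when the record is tombstoned


def tombstone(record: Record, reason: str) -> Record:
    """Retire the data but keep the identifier and citation metadata."""
    return replace(record, url=None, withdrawn_reason=reason)


def landing_page(record: Record) -> str:
    """What the identifier resolves to: the data, or a tombstone notice."""
    if record.withdrawn_reason is not None:
        return f"TOMBSTONE {record.identifier}: '{record.title}' withdrawn ({record.withdrawn_reason})"
    return f"{record.identifier} -> {record.url}"
```

Because `tombstone` returns a new record instead of deleting anything, existing citations still resolve to an explanation.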

Q8: How to nurture bottom-up growth of data standards?
• Help foster individual sub-communities and the cultivation of best practices at the sub-community level, which can inform other efforts or the broader infrastructure
• Sharing and re-use encourage consolidation of standards/best practice; cultivating mechanisms for sharing/re-use may help achieve data consistency
• Start from existing baseline standards: perhaps offer broad, generalized standards as a starting point?
