Theme 3: Architecture
PowerPoint slideshow about 'Theme 3: Architecture' - adelie

Q1: Who houses stuff, both records and identifiers
  • Centralizing all useful services and repositories has benefits (latency, etc.), but centralizing content will be costly, require agreements, and create liabilities (e.g., versioning); problematic as a short-term goal
  • Overall, specialized repositories are proliferating, not converging
  • If the content stays only in the subject-specific repositories (SSRs):
    • Provide opt-in storage services (funding model?)
    • Provide audit function re: repository compliance with standards (e.g. RLG/OCLC trusted repository guidelines)
    • Provide information/guidance on formats (risk, migration)
    • Extend JHOVE for key formats important to the community
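Format guidance and JHOVE-style tooling both start from identifying what a file actually is. JHOVE itself does much more (validation and characterization), so the following is only a toy illustration of the signature-matching idea. It assumes a small hand-picked table of magic numbers. The helper names `identify_format` and `identify_file` are hypothetical, and anything that matches no signature is treated as a "long tail" candidate.

```python
from typing import Optional

# Illustrative signature table: leading bytes -> format label.
# HDF5 and NetCDF are included as common scientific data formats.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"\x89HDF\r\n\x1a\n", "HDF5"),
    (b"CDF\x01", "NetCDF classic"),
    (b"CDF\x02", "NetCDF 64-bit offset"),
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(header: bytes) -> Optional[str]:
    """Return the first format whose signature prefixes `header`, else None."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None  # unknown: a candidate for the "long tail" of formats


def identify_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file and identify it by signature."""
    with open(path, "rb") as fh:
        return identify_format(fh.read(16))
```

Signature matching is cheap and catches the common formats. Deciding whether a file is *valid* for its format is the harder part, and that is where extending a real tool like JHOVE pays off.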
Q1: Who houses stuff, both records and identifiers (cont.)
  • Many formats in the field: most data is in a small number of formats, but data in the long tail is very important (engage GDFR?)
  • Metadata may be more widely replicated than data
  • External resources (SSRs): use OpenURL to facilitate (and distinguish between) access to data, services, metadata, etc. for a single item; link journal-hosted data with additional/ancillary data hosted by DRIADE?
  • Service level agreements
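The OpenURL idea can be made concrete with OpenURL 1.0 (Z39.88-2004) key/encoded-value links. These carry one referent (the item) and, optionally, a service identifier saying what kind of access is wanted. The following is a minimal sketch under stated assumptions: the resolver base URL, the `svc_id` values, the referrer ID, and the example DOI are all hypothetical placeholders, not registered services.

```python
from urllib.parse import urlencode

# Hypothetical link resolver; a real deployment would use its own base URL.
RESOLVER_BASE = "https://resolver.example.org/openurl"

# Hypothetical service identifiers distinguishing what the user wants for
# the same item: the data files, the metadata record, or a derived service.
SERVICES = {
    "data": "https://example.org/svc/data",
    "metadata": "https://example.org/svc/metadata",
    "visualize": "https://example.org/svc/visualize",
}


def openurl_for(doi: str, service: str,
                referrer: str = "info:sid/example.org:journal") -> str:
    """Build an OpenURL 1.0 KEV link for one item and one requested service."""
    params = {
        "url_ver": "Z39.88-2004",
        "url_ctx_fmt": "info:ofi/fmt:kev:mtx:ctx",
        "rft_id": f"info:doi/{doi}",   # the single referent (the item)
        "rfr_id": referrer,            # who generated the link (hypothetical ID)
        "svc_id": SERVICES[service],   # which kind of access is requested
    }
    return f"{RESOLVER_BASE}?{urlencode(params)}"
```

For example, `openurl_for("10.1234/example", "data")` and `openurl_for("10.1234/example", "metadata")` point at the same item but ask the resolver for different things. A journal page could then link to DRIADE-hosted ancillary data this way.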
Q2: Is it productive to process full text for automated generation of context metadata?
  • Yes, but …
  • There are a variety of ways to do this: quantitative analysis is less costly; natural language processing requires more investment
  • More can be done if access to full text is allowed (e.g., combing full text for linkages)
  • Portal searches can also be contextualized using a ‘bag of words’ approach to describing subfields as indexes
  • A combination of statistical processing, natural language processing, and the rise of XML-based metadata can help
  • Can capture administrative/technical metadata in data flows
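The ‘bag of words’ approach is one of the cheaper quantitative options mentioned. It reduces both a subfield description and a document to word counts and compares them. The following is a minimal sketch that assumes cosine similarity over raw counts and a crude regex tokenizer. The subfield profiles in the usage note are made-up placeholders, and `best_subfield` is a hypothetical helper name.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase word counts; a deliberately crude tokenizer."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_subfield(text: str, subfield_profiles: dict[str, str]) -> str:
    """Return the subfield whose word profile is most similar to `text`."""
    doc = bag_of_words(text)
    return max(subfield_profiles,
               key=lambda name: cosine(doc, bag_of_words(subfield_profiles[name])))
```

For instance, with profiles `{"phylogenetics": "tree phylogeny clade alignment sequence", "ecology": "population habitat species abundance survey"}`, a dataset description mentioning "sequence" and "phylogeny" is routed to the phylogenetics index. Weighting schemes such as TF-IDF, stemming, or actual NLP would improve on this, at higher cost.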
Q3: Does storing a local copy make sense for SSR handshaking?
  • Helps to assure persistent access to content (as with CiteSeer), but comes with burden and responsibility
  • Data vs. application: need to secure access to the underlying data; replicating AJAX-y services is very, very hard
  • Versioning is a key issue here
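Versioning is what makes a local copy trustworthy. The mirror needs to know whether the SSR's item has changed since the last harvest, and it should keep earlier states rather than silently overwrite them. The following is a minimal in-memory sketch that assumes content identity is judged by a SHA-256 digest. It records a little technical metadata (size, retrieval time) per version. The `LocalCopy` and `Version` classes are hypothetical, and persistent storage and retrieval from the SSR are left out.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    sha256: str
    size: int
    retrieved_at: str


@dataclass
class LocalCopy:
    """Hypothetical local mirror of one SSR item, keeping every distinct version."""
    source_id: str
    versions: list[Version] = field(default_factory=list)
    blobs: dict[str, bytes] = field(default_factory=dict)  # digest -> content

    def harvest(self, content: bytes) -> bool:
        """Record `content` as a new version if it differs from the latest one.

        Returns True if a new version was recorded, False if unchanged.
        """
        digest = hashlib.sha256(content).hexdigest()
        if self.versions and self.versions[-1].sha256 == digest:
            return False  # unchanged since the last harvest
        self.blobs[digest] = content
        self.versions.append(
            Version(digest, len(content), datetime.now(timezone.utc).isoformat())
        )
        return True

    def latest(self) -> bytes:
        """Return the content of the most recently recorded version."""
        return self.blobs[self.versions[-1].sha256]
```

Keying stored content by digest means a revert to an earlier state reuses the existing blob while still appearing as a new entry in the version history.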
Q4: Is everyone in agreement with the ‘don’t compete with Google’ conclusion?
  • Yes and no: develop community-specific discovery environments
  • … but also expose content to Google (expose, contextualize, refer to domain-specific systems); leverage commonly used interfaces
  • Google, Microsoft, etc. now place high value on highly curated collections and are actively engaging with them
  • Google’s current interface is the big thing now; be prepared to interface with the next big thing
  • WorldCat.org as an advanced discovery environment for scholarly material, including (increasingly) data
Q5: What are the pros and cons of DOIs, handles, and other identifiers?
  • One of the most important issues DRIADE will face
  • Persistent, actionable identifiers vs. unique identifiers in various sub-domains and individual institutions (an item will have many IDs)
  • Question of DOI expense, connection to publishers
  • Need community understanding of a ‘canonical identifier’
  • Need a community discussion of what is important about identifiers:
    • Who controls/changes, software used, locally-hosted?
    • What cost? Branding? Need resolution data?
    • 3rd party assignment of persistent identifiers?
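A ‘canonical identifier’ in a world where "an item will have many IDs" amounts to a mapping from every known alias to one designated persistent ID. The mapping refuses to let an alias point at two different items. The following is a minimal sketch of that bookkeeping. The `IdentifierRegistry` class and the example IDs (`doi:10.1234/example`, `hdl:1234/5678`, `lab-0042`) are hypothetical, and the sketch takes no position on whether the canonical scheme should be DOI, handle, or something else.

```python
class IdentifierRegistry:
    """Hypothetical registry mapping the many IDs an item accumulates
    (local accession numbers, repository handles, DOIs) to one canonical ID."""

    def __init__(self) -> None:
        self._canonical: dict[str, str] = {}

    def register(self, canonical_id: str, *aliases: str) -> None:
        """Map `canonical_id` and each alias to `canonical_id`.

        Raises ValueError (and changes nothing) if any of these IDs is already
        bound to a different canonical ID.
        """
        ids = (canonical_id, *aliases)
        for any_id in ids:  # validate everything first: no partial updates
            existing = self._canonical.get(any_id)
            if existing is not None and existing != canonical_id:
                raise ValueError(f"{any_id!r} already maps to {existing!r}")
        for any_id in ids:
            self._canonical[any_id] = canonical_id

    def canonical(self, any_id: str) -> str:
        """Return the canonical ID for any known ID (KeyError if unknown)."""
        return self._canonical[any_id]
```

The open questions on the slide (who controls changes, hosting, cost) are about who operates such a mapping and guarantees it persists. The data structure itself is the easy part.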
Q5: What are the pros and cons of DOIs, handles, and other identifiers? (cont.)
  • Need to promote datasets to primary resources (not just subordinated to article) in references and discovery
  • For multi-file datasets: need to link to a surrogate or package
  • Identifiers as “micro-billboards” and as generators of data about the contextual use of data (resolution data)
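Two of these points meet at the resolver. A multi-file dataset's identifier should land on one surrogate (a landing page or package), not on an arbitrary file. Each resolution is also a chance to record the context the request came from. The following is a minimal sketch that assumes the requesting context arrives as a simple referrer label. The `Resolver` and `Dataset` classes, the URLs, and the referrer labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Dataset:
    landing_page: str  # surrogate for the whole (possibly multi-file) dataset
    files: list[str] = field(default_factory=list)


class Resolver:
    """Hypothetical resolver that sends every request for a dataset ID to its
    surrogate landing page and counts where each request came from."""

    def __init__(self, datasets: dict[str, Dataset]) -> None:
        self.datasets = datasets
        self.resolutions: Counter = Counter()  # (identifier, referrer) -> count

    def resolve(self, identifier: str, referrer: str = "unknown") -> str:
        """Return the landing page; unknown IDs raise KeyError and are not counted."""
        target = self.datasets[identifier].landing_page
        self.resolutions[(identifier, referrer)] += 1
        return target

    def usage_by_referrer(self, identifier: str) -> dict[str, int]:
        """Aggregate resolution counts for one identifier by referrer."""
        return {ref: n for (ident, ref), n in self.resolutions.items()
                if ident == identifier}
```

Aggregates like `usage_by_referrer` are the raw material for the "micro-billboard" idea: evidence of where and how a dataset is being cited and reached, independent of any single article.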
Q6: Data and applications: where does the complexity live?
  • Leave it up to the community to develop best practices over time
  • Over-engineering here will make it harder to be responsive to change
  • Facilitate and let practice develop within sub-communities (testbeds for innovation)
  • Content packaging plays a role here: bundling data with services, documentation, etc.
  • Utilize (and cultivate) web services and lightweight APIs to facilitate access across and between systems
  • Some opportunities to ‘desiccate’ replications from complex applications
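Content packaging can be kept lightweight by borrowing an existing convention. One option is the BagIt layout (RFC 8493): payload files under `data/`, a checksum manifest, and a `bagit.txt` declaration. Documentation or service descriptions could travel in the same package. The following is a minimal sketch that writes only the required pieces, with SHA-256 as the manifest algorithm. The function name `write_bag` is hypothetical, optional tag files such as `bag-info.txt` are omitted, and an established BagIt implementation is preferable for real use.

```python
import hashlib
from pathlib import Path


def write_bag(bag_dir: str, payload: dict[str, bytes]) -> Path:
    """Write a minimal BagIt-style package.

    Creates payload files under data/, a SHA-256 payload manifest, and the
    bagit.txt declaration. Optional tag files are not written.
    """
    root = Path(bag_dir)
    (root / "data").mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_name, content in sorted(payload.items()):
        target = root / "data" / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest line format: "<checksum> <path relative to the bag root>"
        manifest_lines.append(f"{digest} data/{rel_name}\n")
    (root / "manifest-sha256.txt").write_text("".join(manifest_lines), encoding="utf-8")
    (root / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    return root
```

Because the package is just files plus checksums, any sub-community can adopt it without committing to a heavier repository platform. This fits the "don't over-engineer" stance above.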
Q7: How does death fit into the metadata lifecycle?
  • ‘Tombstoning’ for dead data
  • Data euthanasia?
  • Shifts in contact info (author, data custodian)
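Tombstoning can be read as "the data goes away, the record and identifier do not". A resolver then answers requests for withdrawn items with an explanation instead of a broken link. The following is a minimal sketch of that lifecycle. It assumes HTTP-style status codes, with 410 Gone for tombstoned items as one reasonable choice among several. The `Record` class, `respond` function, and example values are hypothetical. Contact details are left mutable because custodians change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    identifier: str
    title: str
    custodian_contact: str
    location: Optional[str]               # None once the data itself is gone
    tombstone_reason: Optional[str] = None

    def withdraw(self, reason: str) -> None:
        """Tombstone the record: drop access to the data, keep the metadata."""
        self.location = None
        self.tombstone_reason = reason

    def update_contact(self, new_contact: str) -> None:
        """Record a change of author or data custodian contact."""
        self.custodian_contact = new_contact


def respond(record: Record) -> tuple[int, str]:
    """Return an HTTP-style (status, body) pair for a resolution request."""
    if record.tombstone_reason is not None:
        return 410, (f"{record.identifier} ({record.title}) was withdrawn: "
                     f"{record.tombstone_reason}")
    if record.location is None:
        return 404, f"{record.identifier} has no known location"
    return 302, record.location  # redirect to the live copy
```

The key design choice is that `withdraw` never deletes the record. The identifier keeps resolving to something meaningful, which is what distinguishes a tombstone from simple deletion.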
Q8: How to nurture bottom-up growth of data standards?
  • Help to foster individual sub-communities, and cultivation of best practices at the sub-community level that can be used to inform other efforts or the broader infrastructure
  • Sharing and re-use encourage consolidation of standards/best practice; cultivating mechanisms for sharing/re-use may help with achieving data consistency
  • Start from existing baseline standards; perhaps offer broad, generalized standards as a starting point?