
Measurement Data Archive – Integration Effort (mda.doregistry.org), GEC11, July 2011

Measurement Data Archive – Integration Effort (http://mda.doregistry.org/), GEC11, July 2011. Giridhar Manepalli, Corporation for National Research Initiatives (http://www.cnri.reston.va.us/).



  1. Measurement Data Archive – Integration Effort
     http://mda.doregistry.org/
     GEC11, July 2011
     Giridhar Manepalli
     Corporation for National Research Initiatives
     http://www.cnri.reston.va.us/

  2. Measurement Data Archive: Status
     • Deployed a prototype of measurement data archive that includes a temporary storage space, aka workspace
       • A hierarchical storage system that allows making collections of objects
       • Mints a persistent identifier that resolves to data
       • Indexes metadata to support queries and data discovery
       • Supports SFTP, SCP, SMB, REST, and Web-based Interface into the system
     • Early adopters in GENI:
       • OnTimeMeasure - Ohio State University
       • INSTOOLS - University of Kentucky
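The features listed on this slide (collections, minted identifiers that resolve to data, and a metadata index for discovery) can be illustrated with a small in-memory sketch. This is not the MDA implementation or its API. The class name `ToyArchive`, the method names, the `example.prefix` identifier prefix, and the metadata field names are all assumptions made for illustration.

```python
import uuid
from collections import defaultdict


class ToyArchive:
    """Hypothetical in-memory stand-in for an archive, for illustration only."""

    def __init__(self, prefix="example.prefix"):
        self.prefix = prefix                # assumed identifier namespace
        self.objects = {}                   # identifier -> (collection path, bytes)
        self.index = defaultdict(set)       # (field, value) -> set of identifiers

    def deposit(self, collection, data, metadata):
        """Store bytes in a collection, mint an identifier, index the metadata."""
        identifier = f"{self.prefix}/{uuid.uuid4().hex}"
        self.objects[identifier] = (collection, data)
        for field, value in metadata.items():
            self.index[(field, value)].add(identifier)
        return identifier

    def resolve(self, identifier):
        """Return the bytes that an identifier points to."""
        return self.objects[identifier][1]

    def query(self, **criteria):
        """Return identifiers whose metadata matches every given field=value pair."""
        matches = [self.index.get(pair, set()) for pair in criteria.items()]
        return set.intersection(*matches) if matches else set()


archive = ToyArchive()
pid = archive.deposit("/project-a/run-1", b"rtt,12.5\n", {"project": "a", "kind": "latency"})
print(archive.resolve(pid))
print(archive.query(project="a", kind="latency") == {pid})
```

A real deployment would persist the objects and delegate identifier minting to a resolution service; the sketch only shows how the three features relate to each other.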

  3. Success Criteria for an Archive
     • An archive cannot be just a store-and-retrieve service. An eco-system surrounding the archive is needed to motivate communities into using it.
     • Visualization, policy enforcement, dissemination, etc. are examples of services an archive could provide.
     • To build such an eco-system, a basic understanding of what we store is necessary:
       • #1: Data Model. How do you define a data object? (Not how it is serialized, e.g., databases, file-systems, etc.) Do we need a data-agnostic archive? Do we manage relationships across data objects?
         • Too many storage systems failed because of the lack of a proper data model.
       • #2: Metadata. What constitutes a metadata record? How is it associated with a data object?
         • Lack of metadata results in a pile of bytes in an archive. Building an eco-system of services with a pile of bytes is impossible.
       • #3: API. How is data (and metadata) pushed into an archive? What are the end-point definitions and data structures?
     • #1 and #2 are more important than #3.
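One possible answer to questions #1 and #2 can be sketched as a pair of plain data structures: a data object that carries opaque bytes, an associated metadata record, and named relationships to other objects. The slide leaves these questions open, so everything here is an assumption: the class names `DataObject` and `MetadataRecord`, the `derivedFrom` relationship, and the element names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    # Element name -> value; which elements exist is left to the projects.
    elements: dict


@dataclass
class DataObject:
    identifier: str                  # persistent identifier (hypothetical format)
    payload: bytes                   # the archive stays agnostic about the contents
    metadata: MetadataRecord         # association: the record travels with the object
    related: dict = field(default_factory=dict)  # relationship name -> identifiers

    def relate(self, relation, other):
        """Record a named, one-directional relationship to another object."""
        self.related.setdefault(relation, []).append(other.identifier)


raw = DataObject("id-1", b"raw samples", MetadataRecord({"project": "example"}))
summary = DataObject("id-2", b"hourly means", MetadataRecord({"project": "example"}))
summary.relate("derivedFrom", raw)
print(summary.related)
```

Keeping the payload opaque while making metadata and relationships explicit is what would let services such as visualization or policy enforcement work without parsing every data format.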

  4. Integration: Next Steps
     • Step #1: Define a data object.
       • Is data just a series of bytes? Or do we pack X, Y, & Z into it?
       • Are relationships across objects required or not? (Not nice-to-have, but are they required?)
       • Do we have data visibility criteria? Permissions, etc.
     • Step #2: Validate metadata recommendation.
       • Projects should generate a few metadata records with these goals:
         • To identify which elements are required, which are optional, and which are unnecessary.
         • To capture different profiles of data. Perhaps some elements are needed for one class of data, and other elements are needed for another class of data.
           • This may result in a few profiles. An unlimited number of profiles is hard to manage, but a limited number will result in fewer optional fields.
         • To validate the suggested controlled vocabulary for some of the elements, and to identify vocabulary where it is missing. Controlled vocabulary brings some order into metadata and discovery.
     • Step #3: Identify API.
       • What end-points and data structures are reasonable for a given project? REST+XML, XML-RPC, etc.
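Step #2 can be made concrete with a minimal sketch of a metadata profile that lists required and optional elements plus a controlled vocabulary, and reports what is wrong with a candidate record. The `Profile` class, the profile name, the element names, and the vocabulary values are hypothetical and are not taken from the metadata recommendation itself.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    required: set                                     # elements every record must carry
    optional: set = field(default_factory=set)        # elements a record may carry
    vocabulary: dict = field(default_factory=dict)    # element -> allowed values

    def validate(self, record):
        """Return a list of problems; an empty list means the record fits the profile."""
        problems = []
        present = set(record)
        for element in sorted(self.required - present):
            problems.append(f"missing required element: {element}")
        for element in sorted(present - self.required - self.optional):
            problems.append(f"unknown element: {element}")
        for element, allowed in self.vocabulary.items():
            if element in record and record[element] not in allowed:
                problems.append(
                    f"value {record[element]!r} not in vocabulary for {element}"
                )
        return problems


# Hypothetical profile for one class of measurement data.
active = Profile(
    name="active-measurement",
    required={"project", "data_type"},
    optional={"description"},
    vocabulary={"data_type": {"latency", "throughput"}},
)
for problem in active.validate({"data_type": "jitter", "owner": "someone"}):
    print(problem)
```

Running a few real records from each project through a check like this is one way to find out which elements are never filled in, which unknown elements keep appearing, and where the vocabulary has gaps.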
