
Measurement Data Archive – Project Highlights GEC12 Nov 2011

Measurement Data Archive – Project Highlights GEC12 Nov 2011. Giridhar Manepalli, Corporation for National Research Initiatives, http://www.cnri.reston.va.us/. Why Archive? The obvious: for use by others or by yourself in the future. The Fourth Paradigm. Data-intensive science.





Presentation Transcript


  1. Measurement Data Archive – Project Highlights, GEC12, Nov 2011. Giridhar Manepalli, Corporation for National Research Initiatives, http://www.cnri.reston.va.us/

  2. Why Archive? • The obvious: for use by others or by yourself in the future • The Fourth Paradigm • Data-intensive science • Emergent phenomena • Funding bodies increasingly asking for data plans • Citations from journal articles to data sets on the rise • Consistent archiving standards enhance the use of data over time and within a domain

  3. Measurement Data Archive [Diagram. Labels: Experimenter X, Experimenter Y, Slice, Measurement Data Template, Workspace, Archive, CNRI, Internet, Public Journals, Object A, Run 1 Logs, Run 2 Logs, DO, Metadata, 10510.0.1/0-L2NucmlnZW5p. Legend: Prototype, Digital Object, Data Model TBD.] Key: 1. Experiment Initiated 2. Measurement Data Collected 3. Measurement Data Archived 4. Archived Data Referenced 5. Archived Data Retrieved

  4. Current Usage • Early adopters in GENI: • OnTimeMeasure - Ohio State University • INSTOOLS - University of Kentucky • Possible usage in other projects: • DARPA Transformative Apps program for managing mobile apps related data • Internal to CNRI for sharing documents and presentations across groups

  5. Next Steps – I&M Standpoint • Revisit the protocols for pushing data into workspace • Associate metadata with data effectively • Where does the metadata live? • How is it associated with data? At what level of granularity is it specified? • Support GENI and I&M schemes of authentication, authorization, metadata enforcement, etc. • Allow multiple workspace deployments • Identify the process to push data from workspace into the archive • Should metadata be enforced before data is pushed into the archive? • How is the data serialized in the archive? • How is data visibility managed in the archive?

  6. Next Steps – GENI-wide • Extend services offered by the archive beyond data storage • Developed a visualization service prototype to demonstrate automatic visualization of data for DataCite • Designed a theoretical model for enforcing terms & conditions, licenses, etc. prior to disseminating data • Goal: Expand archive into an eco-system to entice communities into using it • Use archive for experiments, not just for I&M

  7. Archive Services • Suite of extensible services end users can leverage by following the ID. • 1. User follows Data ID into the Archive. • 2. User is redirected to requested Archive Service. • Archive stores & retrieves data. [Diagram. Labels: Science Times, Article Title, Data ID; Suite of Services: License Enforcement (Terms, I Agree), Visualization, Data Set Dissemination, Data Processing; Ohio University VDC Experiment, Experimenter, Other Experiments, Other Experimenters.]

  8. Measurement Data Archive – Project Highlights, GEC12, Nov 2011. Giridhar Manepalli, Corporation for National Research Initiatives, http://www.cnri.reston.va.us/

  9. Related Slides

  10. Prototype Limitations • Only one workspace service is deployed • Multiple workspaces, within and outside GENI networks, can be hosted that push data to the archive • Authentication and authorization model is simple and redundant • Should conform and use one scheme across GENI (or at least across I&M) • No metadata standard applied • I&M metadata requirements must be applied once identified

  11. What is Metadata and Why Do I Need It? • Lots of miscommunication because • Metadata is not a type of data • Metadata is a type of relationship between two pieces of data • Needed for Understanding and Finding • Understanding (sometimes called Descriptive MD) • How do I parse this? • How do I interpret this? • Finding (sometimes called Subject MD) • Finding one item in a population of 10 is easy • Finding one item in a population of 1M is impossible without some way to distinguish them • Generally requires a human in the loop at some level • Sometimes the object is self-describing (journal article) • Automatic indexing/classification works for some domains
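The point that metadata is a relationship between two pieces of data, rather than a special kind of data, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `purpose` labels, and the `example/...` identifiers are hypothetical and are not part of the archive prototype or its data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """Any piece of data: a log file, a format description, a keyword record."""
    identifier: str   # hypothetical identifier, not a real handle
    content: str


@dataclass(frozen=True)
class MetadataLink:
    """States that `describer` serves as metadata for `described`."""
    describer: DataItem
    described: DataItem
    purpose: str      # assumed labels: "understanding" or "finding"


def metadata_for(item, links, purpose=None):
    """Return the items that act as metadata for `item`, optionally by purpose."""
    return [
        link.describer
        for link in links
        if link.described == item and (purpose is None or link.purpose == purpose)
    ]


# Three ordinary pieces of data (all values are made up for illustration).
run_logs = DataItem("example/run-1-logs", "t=0 rtt=12ms\nt=1 rtt=15ms")
log_format = DataItem("example/log-format", "one sample per line: t=<seconds> rtt=<milliseconds>")
keywords = DataItem("example/run-1-keywords", "round-trip time; active measurement")

# Only the links make two of them "metadata" for the third.
links = [
    MetadataLink(log_format, run_logs, "understanding"),
    MetadataLink(keywords, run_logs, "finding"),
]

for item in metadata_for(run_logs, links, purpose="understanding"):
    print(item.identifier)   # prints "example/log-format"
```

Nothing in `DataItem` marks an item as metadata; the same format description could itself be described by another item through a further link.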

  12. Why is Metadata Hard? • To be effective it must be consistent, and consistently applied, within a given domain • What is the scope of the domain? • What aspects of the object need to be described? • What is the vocabulary, and is it open or closed? • Even within a defined domain, there are many points of view • Especially true for any sort of subject description • May have to allow for multiple metadata records for a single described object • Spending time on creating good metadata is Good For You • The best sources for good metadata are the creators/owners of the described object, but they may lack interest and training • Some types of metadata are difficult to automate, e.g., a good title • Keep it simple – trade consistency and coverage for depth

  13. Misc Points • Precision and Recall are useful concepts in searching • Precision: % of the search results that are on target • Recall: % of the correct result set that my search retrieved • Desirable tradeoff is situational • Consider University Libraries as reliable archive holders • Variety of approaches to managing a useful vocabulary of terms • Controlled vocabulary: set of terms – use these instead of slight variations • Taxonomy: parent-child relationships • Ontologies: introduce other types of relationships
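As a worked example of the two search measures on this slide, the following Python sketch computes precision and recall for a single query. The function name and the data set IDs are hypothetical and chosen only for illustration; results are returned as fractions rather than percentages.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, as fractions in [0, 1]."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant          # results that are on target
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data set IDs: what the search returned vs. what it should have found.
retrieved = ["ds1", "ds2", "ds3", "ds4"]
relevant = ["ds2", "ds4", "ds7", "ds8", "ds9"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")   # prints "precision=0.50 recall=0.40"
```

Here two of the four results are on target (precision 0.50), but only two of the five correct data sets were found (recall 0.40). A broader query would tend to raise recall at the cost of precision, which is why the desirable tradeoff depends on the situation.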
