
Grappling with Data Management Plans

Note: this presentation was ‘partially prepared’ to support the panel discussion, but was not used. Please contact me if anything is unclear or you’d like to know what I intended to say. Diane Oerly.


Presentation Transcript


  1. Note: this presentation was ‘partially prepared’ to support the panel discussion, but was not used. Please contact me if anything is unclear or you’d like to know what I intended to say. Diane Oerly
     Grappling with Data Management Plans
     Panel Presentation for GPN Annual Meeting, June 2, 2011
     Diane Oerly, Division of IT and Office of Research, University of Missouri
     OerlyD@missouri.edu

  2. Background and Culture of the University of Missouri
     • Enrollment (FS2010): 24,900 undergraduate, 32,415 total (7,515 out of state, 1,699 international)
     • Land-grant as well as major research university complexity
     • 345 buildings on 1,250 acres on main campus (19,524 acres statewide)
     • 311 degrees and certificates offered (93 bachelor's, 72 doctoral)
     • Schools of Medicine and Nursing and a large health sciences complex on campus
     • History of investigators helping fund centrally shared resources, with exceptions
     • Significant emphasis on involving undergraduate students in research

  3. Significant security and “effective use” implications of health care data.

  4. Selected Areas of Research (from MRI Proposal, January 2011)
     • High-Throughput Sequence Assembly and Analysis
       • The instrument generates ~186 GB of sequence per week and is committed 4 weeks in advance. The total need is difficult to assess; however, each experiment generates at least ~1 TB of data and requires ~280 CPU hours of processing time every 2.5 days.
       • Third-tier analyses significantly increase these processing and storage requirements. For example, on the current aging clusters, a linkage analysis with 50,000 SNPs scored in 32,000 individuals consumed over 1.9 million CPU hours of processing.
     • Structural Bioinformatics: Prediction, Retrievals, and Interactions
       • The large-scale annotation of the structures and interactions of the proteins in plant genomes will require about 20 TB of disk space for both regular storage and backup.
       • Will use computational and data storage equipment on a daily basis.
     • Large-Scale and High-Throughput Plant Phenotype Analysis
       • Tens of thousands of images. Archiving these high-resolution images (avg. 8 MB) requires > 80 GB per day. Each image analysis takes ~1 Tflop.

  5. Selected Areas of Research (continued)
     • Molecular Mimicry in Inter-Species Interactions
       • Processing one host-pathogen pair is expected to take up to 100 Petaflops and 10-20 GB of storage. An estimated >100 host-pathogen pairs must be analyzed.
     • Visualization and Parallelism of Informatics Data
       • Clustering a typical genome of about 37,000 genes requires ~20 GB of RAM and ~4 Tflops for processing. High-resolution MRI-based brain structure analysis takes ~1 Tflop for processing and about 40 GB of storage.
     • Geospatial Informatics for Biology, Ecology, and Environmental Research
       • Typical imagery data at 0.5 m GSD (ground sample distance) requires ~10 MB of storage per square km of image coverage. Intermediate processing increases the data stored by roughly 800%. Multi-date change detection requires ~1.7 Tflop per square km. Area and object processing for content-based retrieval from single-date imagery requires ~1.6 Tflops per square km.

  6. • Implementation of open-source DSpace; went live August 2008, recently upgraded to version 1.7.1
     • URL: mospace.umsystem.edu
     • Content includes summits and UM System-wide endeavors
     • Student content as well as faculty and staff content, including theses and dissertations
     • Emphasis on content not available elsewhere
     • Collection currently holds 9,425 unique items

  7. • Collaborative effort between the Division of IT, MU Libraries and the University of Missouri Library Systems.
     • Libraries play a crucial role in preserving research and scholarship in all forms and making knowledge accessible for future scholars.
     • Part of the international open access movement.
     • Helping authors avoid signing away their intellectual property and fulfill their public dissemination obligations.

  8. MOspace is, of course, not the ultimate solution, but we are working to collect and store research data sets that accompany publications in MOspace.
     See the relevant library guides for additional information:
     http://libraryguides.missouri.edu (search for Data Management and for MOspace)
     Local Resources at MU: http://research.missouri.edu/funding/files/nsf_data_management_local_resources.pdf
     Recommended Elements of NSF Data Management Plan: http://research.missouri.edu/funding/files/nsf_data_management_recommended_elements.pdf
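
The storage figures quoted on slides 4 and 5 can be sanity-checked with a little arithmetic. The sketch below is only an illustration and was not part of the presentation. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB), a sequencer that runs all 52 weeks of the year, that "roughly 800%" means the stored data grows to about 9 times the raw imagery, and a hypothetical 1,000 square km survey area. All function names are invented for this example.

```python
# Back-of-envelope checks on the storage figures from slides 4 and 5.
# Assumptions (not from the slides): decimal units, 52 operating weeks
# per year, "800% increase" read as 9x the raw size, and a hypothetical
# 1,000 square km survey area.

MB_PER_GB = 1000
GB_PER_TB = 1000


def images_per_day(daily_gb: float, avg_image_mb: float) -> float:
    """Number of images implied by a daily archive volume."""
    return daily_gb * MB_PER_GB / avg_image_mb


def yearly_tb(weekly_gb: float, weeks: int = 52) -> float:
    """Yearly volume in TB for a steady weekly output in GB."""
    return weekly_gb * weeks / GB_PER_TB


def pair_storage_tb(pairs: int, gb_per_pair: float) -> float:
    """Total storage in TB for a number of host-pathogen pairs."""
    return pairs * gb_per_pair / GB_PER_TB


def imagery_storage_gb(area_km2: float,
                       raw_mb_per_km2: float = 10,
                       increase_pct: float = 800) -> float:
    """Raw imagery plus intermediate products, in GB."""
    raw_mb = area_km2 * raw_mb_per_km2
    total_mb = raw_mb * (1 + increase_pct / 100)
    return total_mb / MB_PER_GB


# Slide 4: > 80 GB per day at an average of 8 MB per image.
phenotype_images = images_per_day(80, 8)          # 10000.0

# Slide 4: ~186 GB of sequence per week.
sequence_tb_per_year = yearly_tb(186)             # about 9.7 TB

# Slide 5: >100 host-pathogen pairs at 10-20 GB each (lower bounds).
mimicry_low_tb = pair_storage_tb(100, 10)         # 1.0
mimicry_high_tb = pair_storage_tb(100, 20)        # 2.0

# Slide 5: 10 MB per square km, increased by roughly 800%.
survey_gb = imagery_storage_gb(1000)              # 90.0

if __name__ == "__main__":
    print(f"Phenotype images per day: {phenotype_images:,.0f}")
    print(f"Raw sequence per year:    {sequence_tb_per_year:.1f} TB")
    print(f"Mimicry storage:          {mimicry_low_tb:.0f}-{mimicry_high_tb:.0f} TB")
    print(f"Imagery for 1,000 sq km:  {survey_gb:.0f} GB")
```

Under these assumptions, the phenotype archive works out to at least 10,000 images per day, and the host-pathogen work needs at least 1-2 TB in total. Both are lower bounds, since the slides give "> 80 GB" and ">100 pairs".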
