Scientific Data: A View from the US

Scientific Data: A View from the US. George O. Strawn, nitrd.gov. Caveat auditor: the opinions expressed in this talk are those of the speaker, not the U.S. government. Three faces of data: Big Data research initiative; open access becomes the default for U.S. government data.


Presentation Transcript


  1. Scientific Data: A View from the US • George O. Strawn • nitrd.gov

  2. Caveat auditor • The opinions expressed in this talk are those of the speaker, not the U.S. government

  3. Three faces of data • Big Data research initiative • Open access becomes the default for U.S. government data • Public access mandated for "scientific results" supported by the U.S. government

  4. Big Data • White House, multi-agency research initiative • Basic research, Disciplinary data, Education and training, Prizes and competitions • Joint solicitation by NIH and NSF • NIH: BD2K program, Associate director for Data Science

  5. Data.gov • Open access to U.S. government data • "Voluntary" data.gov participation has yielded ~100,000 data sets to date • A new version of data.gov to be unveiled soon utilizes CKAN, "an open source data portal"

  6. Public Access to Scientific Results • Both journal articles and data • Public access to journal articles pioneered by Harold Varmus at NIH • Semantic access to Medline abstracts pioneered by Tom Rindflesch at NLM

  7. Public Access to Scientific Data • Federal agencies have submitted their "initial plans" for public access to scientific data to OSTP • NITRD may host a series of talks by the agencies on their data access plans • Plans for articulating USG scientific data and USG-supported scientific results still in process

  8. Some issues regarding data access • Disciplinary versus multi-disciplinary, agency versus multi-agency repositories • Plain (human) access versus semantic (machine) access • A general digital object architecture? • Degrees of openness

  9. Digital Object Architecture • An "hour glass" for data? (As the Internet was an hour glass for networks: TCP/IP at the narrow point; many applications above, many implementations below)

  10. Digital Object Architecture • Digital Object Data Model & Protocol: logical interface to heterogeneous information management and storage systems; built-in strong authentication and encryption • Digital Object Repository: implements the digital object data model and protocol; portal into multiple information and storage systems; security is at the object level and objects can be securely shared; current version successfully used by industry and government • Handle System: highly scalable identifier resolution system for digital objects; provides referential integrity as objects move and environments change; proven and in wide use • Digital Object Registry: manages metadata records about resources; assigns handles to metadata records and resources; normalizes organizational boundaries through commonly agreed APIs and metadata models

  11. Measuring openness • Ease of discovery (googleable, etc.) • Ease of use • Extent of reusability • Legal matters (e.g., CC license, derived-works friendly, etc.)

  12. Sustainability • Could we duplicate the Internet story? • Public investments create a new activity • The new activity leads to a new industry • The new industry leads to novel use cases

  13. In conclusion • Data Intensive Science aspirations are here • Data Intensive Science is slowly emerging • One result will be to make the scientific record into a first class scientific object
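Slide 5 mentions that the new data.gov is built on CKAN. As a rough illustration of what machine access to such a portal could look like, the sketch below builds a query URL for CKAN's Action API (`package_search`) and pulls dataset titles out of a response. The base URL `https://catalog.data.gov` and the sample response are assumptions for illustration; the slides do not name an endpoint, and no network call is made here.

```python
import json
from urllib.parse import urlencode

# Assumption: the catalog's base URL. The slides do not specify one.
BASE_URL = "https://catalog.data.gov"


def package_search_url(query, rows=5, base_url=BASE_URL):
    """Build a CKAN Action API URL for a full-text dataset search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Made-up response that follows the shape of CKAN's response envelope.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "example-dataset-a", "title": "Example Dataset A"},
            {"name": "example-dataset-b", "title": "Example Dataset B"},
        ],
    },
})

if __name__ == "__main__":
    print(package_search_url("climate data", rows=2))
    print(dataset_titles(SAMPLE_RESPONSE))
```

Fetching the URL with any HTTP client and passing the body to `dataset_titles` would complete the round trip; that step is left out so the sketch stays runnable offline.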
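Slide 10 describes the Handle System as providing "referential integrity as objects move and environments change", with a registry that assigns handles to metadata records and resources. The toy sketch below illustrates only that one idea: a persistent identifier stays fixed while the location it resolves to is updated. The class names, the `example` prefix, and the `repo-a://` style locations are all hypothetical; this is not the Handle System protocol or any real Digital Object Architecture API.

```python
from dataclasses import dataclass, field


@dataclass
class HandleRecord:
    """One registry entry: a fixed handle, a mutable location, and metadata."""
    handle: str
    location: str
    metadata: dict = field(default_factory=dict)


class HandleRegistry:
    """In-memory stand-in for an identifier registry (hypothetical)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._records = {}
        self._next_suffix = 1

    def register(self, location, metadata=None):
        """Assign a new handle to a resource and store its metadata."""
        handle = f"{self.prefix}/{self._next_suffix}"
        self._next_suffix += 1
        self._records[handle] = HandleRecord(handle, location, dict(metadata or {}))
        return handle

    def resolve(self, handle):
        """Return the current location for a handle."""
        return self._records[handle].location

    def move(self, handle, new_location):
        """Update where a handle points; the handle itself never changes."""
        self._records[handle].location = new_location


if __name__ == "__main__":
    registry = HandleRegistry("example")
    handle = registry.register("repo-a://datasets/42", {"title": "Sample data set"})
    registry.move(handle, "repo-b://archive/42")
    print(handle, "->", registry.resolve(handle))
```

A citation that stores only the handle keeps working after the `move` call, which is the property the slide attributes to the Handle System.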
