Data data everywhere
This presentation is the property of its rightful owner.
Sponsored Links
1 / 33

Data, Data Everywhere…. PowerPoint PPT Presentation


  • 82 Views
  • Uploaded on
  • Presentation posted in: General

Data, Data Everywhere…. September 8, 2011 The Coalition for Academic Scientific Computation José-Marie Griffiths, PhD Vice President for Academic Affairs Bryant University, Smithfield, Rhode Island. Concerns of Research Administrators.

Download Presentation

Data, Data Everywhere….

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Data data everywhere

Data, Data Everywhere….

September 8, 2011

The Coalition for Academic Scientific Computation

José-Marie Griffiths, PhD Vice President for Academic Affairs

Bryant University, Smithfield, Rhode Island


Concerns of research administrators

Concerns of Research Administrators

  • Strong advocates of research and its dissemination to as wide a set of audiences as possible.

  • Most concerns today relate to current economic trends and uncertainties.

  • Long been concerned about overhead costs (which are increasing) and the cap on administrative costs.


Concerns of research administrators 2

Concerns of Research Administrators - 2

  • Concerns about policies translating into “unfunded mandates” (like recently proposed financial reporting requirements to track all federal funding).

  • Increasingly concerned about roles, responsibilities, and liabilities.

  • Size matters!


Data data everywhere

Taking AIMat Data Lifecycle Management: Access,

Integrity,

Mediation


Data policy task force

Data Policy Task Force

  • Established at the February 3-4, 2010 NSB meeting

  • Charge: further defining the issues and outlining possible options to make the use of data more effective in meeting NSF's mission.


Data policy task force strategies

Data Policy Task Force Strategies

  • Monitor the impact of NSF updated implementation of the Data Management Plan requirement to inform a review of NSF policy

  • Considering issues of data policy, Open Data movements, and related issues, the Task Force will then develop a "Statement of Principles.”

  • Provide guidance to subsequent Board efforts to develop specific actionable policy recommendations focused, initially, on NSF, but that could potentially promulgate through other Federal agencies in a national and international context.


Nsb task force on data policy statement of principles

NSB Task Force on Data PolicyStatement of Principles

  • Openness and transparency are critical to continued scientific and engineering progress and to building public trust in the nation’s scientific enterprise.

    • This applies to all materials necessary for verification, replication and interpretation of results and claims, associated with scientific and engineering research.

  • Open Data sharing is closely linked to Open Access publishing and they should be considered in concert.

  • The nation’s science and engineering enterprise consists of a broad array of stakeholders, all of which should participate in the development and adoption of policies and guidelines.


Nsb task force on data policy statement of principles 2

NSB Task Force on Data PolicyStatement of Principles - 2

  • It is recognized that standards and norms vary considerably across scientific and engineering fields and such variation needs to be accommodated in the development and implementation of policies.

  • Policies and guidelines are needed for open data sharing which in turn requires active data management.

  • All data and data management policies must include clear identification of roles, responsibilities and resourcing.


Nsb task force on data policy statement of principles 3

NSB Task Force on Data PolicyStatement of Principles - 3

  • The rights and responsibilities of investigators are recognized. Investigators should have the opportunity to analyze their data and publish their results within a reasonable time.


Nsb expert panel discussion on data policies

NSB Expert Panel Discussion on Data Policies

  • March 28-29, 2011

  • Arlington, VA

  • Participants included:

    • Over 30 experts/research administrators

    • 7 NSB members

    • 4 NSF Directors/Staff


A ccess i ntegrity m ediation

Access, Integrity, Mediation

  • Access – “what goes in must beable to come out!”

  • Integrity – “what goes in must be the same thing that comes out!”

  • Mediation – “what goes in is going to need help coming out!”


Key areas emerging from the expert panel discussion on data policies march 2011

Key Areas Emerging from theExpert Panel Discussion on Data PoliciesMarch, 2011

  • ACCESS

    • Standards and interoperability enable data-intensive science.

    • Data sharing is an identified priority.

  • INTEGRITY

    • Recognize and support computational and data-intensive science as a discipline.

  • MEDIATION

    • Storage, preservation, and curation of data are critical to data sharing and management (data stewardship).

    • Cyberinfrastructure is necessary to support data-intensive science.


Access

Access

Integrity

Mediation

What goes in must be able to come out!

Access


Key areas national science board expert panel discussion on data policies march 2011

Key Areas - National Science BoardExpert Panel Discussion on Data PoliciesMarch, 2011

  • ACCESS

    • Standards and interoperability enable data-intensive science.

    • Data sharing is an identified priority.


Standards and interoperability enable data intensive science

Standards and interoperability enable data-intensive science.

  • Citation and attribution norms

    • Need new norms and practices

    • Data producers, software & tool developers, data curators get credit for their work

  • Interoperability standards

    • To enable sharing & interoperability across disciplines and internationally

  • Development of persistent identifiers

    • To enable tracking of provenance

    • Ensure data integrity (see next section)

    • Facilitate citation & attribution


Interoperability sooner rather than later

Interoperability - sooner rather than later


Data sharing is an identified priority

Data sharing is an identified priority.

  • Must balance privacy concerns and data access for sharing and re-use.

  • Acknowledge disciplinary cultures while establishing a culture of sharing across all research communities.

  • Must promote & reward exemplary data management projects & plans.

  • Data availability must be timely – issues of embargoes and restricted use durations.


Integrity

Access

Integrity

Mediation

What goes in must be the same thing that comes out!

Integrity


Recognize and support computational and data intensive science as a discipline

Recognize and support computational and data-intensive science as a discipline.

  • Recognize & reward computational & data scientists & curators: funding, tenure, etc.

  • Support training in computational science

  • Reward international collaborations to develop cyberinfrastructure, data stewardship, interoperability, international sharing

  • New funding/economic models to support processing, storing, archiving, maintaining data sets.

  • Need to define who is responsible for what – funding agencies/publishers versus research communities


Data data everywhere

Office of Research Integrity, U.S. Department of Health and Human Services: Key Components of Data Lifecycle Management

Guidelines for Responsible Data Management in Scientific Research, ori.hhs.gov/education/products/clinicaltools/data.pdf


Planning for preservation over the data life cycle

Planning for Preservation over the Data Life Cycle

1

2

3

4

5

6

7

Proposal Planning and Writing

Project

Start-up and Data Management

Data Collection and File Creation

Data Analysis

Preparing Data for Sharing

Depositing Data

After-Deposit Archival Activities

  • Anticipate archiving costs and challenges

  • Create a data management plan

  • Follow best practices for data and documentation

  • Manage master datasets and work files

  • Determine file formats to deposit

  • Comply with dissemination standards and formats

  • Set up support for data users

Courtesy of Cole Whiteman, ICPSR


Integrity concerns for research institutions

Integrity Concerns for Research Institutions

  • What to share - raw, processed, analyzed datasets, instruments, calibration and environmental records, analytical tools, etc.

  • Processes for and costs of long-term curation of data


Mediation

Access

Integrity

Mediation

What goes in is going to need help coming out!

Mediation


Data data everywhere

Storage, preservation, and curation of data are critical to data sharing and management (data stewardship)

  • Funding agencies must commit to ongoing financial support for repositories (no “orphans”)

  • Standardized curatorial mechanisms

  • Strategic partnerships between stakeholder communities and data repositories, supported by funders

  • Define roles of different types of digital repositories

  • Possibly independent auditing of data repositories to ensure data quality, access, interoperability


Cyberinfrastructure is necessary to support data intensive science

Cyberinfrastructure is necessary to support data-intensive science

  • Geographic distribution of research teams, computing resources and datasets requires robust cyberinfrastructure

  • Must include shared applications for analysis, visualization and simulation

  • Standardization for interoperability & accessibility

  • Need capital investment in cyberinfrastructure

  • Need to define appropriate ratio of infrastructure to research funding


Mediation is needed at data collection analysis and use

Mediation is Needed at Data Collection, Analysis and Use

  • GioWeiderhold, Stanford: When there is high intensity of interaction with any of these elements, it makes sense to have multiple mediators (e.g. replicate repositories)

Use

Use

Use

Use

Analysis

Analysis

Analysis

Analysis

Repository 4

Repository 2

Repository 3

Repository 1

Collected Data Set B

Collected Research Data Set A


Informal and formal mediation

Informal and Formal Mediation

  • Mediation at “Use” level is informal and pragmatic

  • Mediation at “Repository” and “Analysis” level needs to be formal with domain/expert control*

Use

Use

Use

Use

Informal, pragmatic mediation

Analysis

Analysis

Analysis

Analysis

Formal mediationwith domain/expert control

Repository 2

Repository 4

Repository 3

Repository 1

Collected Data Set B

Collected Research Data Set A

*GioWeiderhold, Stanford, 1995


Data data everywhere

Stakeholders – Multiple Players, Inter-relationships


Data data everywhere

  • For data to be discoverable, must have a shared overlay of interdisciplinary and technological connections


Data data everywhere

This….or….This?


Data data everywhere

This….or….This?


Jos marie griffiths ph d

José-Marie Griffiths, Ph.D.

  • Vice President for Academic AffairsBryant University1150 Douglas PikeSmithfield, RI 02917(401) [email protected]@gmail.com


  • Login