Databrary

Databrary. David Millman, NYU • Rick Gilmore, PSU • Dylan Simon, NYU. Coalition for Networked Information • CNI Fall 13, December 10, 2013. databrary.org.

Databrary

David Millman, NYU • Rick Gilmore, PSU • Dylan Simon, NYU

Coalition for Networked Information • CNI Fall 13

December 10, 2013

databrary.org


Key Aims of Databrary project

  • Build a repository for sharing video

  • Provide tools for scoring video

  • Provide data management tools

  • Create policies that enable sharing

  • Transform the culture of developmental science!



Current Funding

  • NIH

    • National Institute of Child Health and Human Development

  • NSF

    • Development & Learning Sciences Program

    • Research and Evaluation on Education in Science and Engineering (REESE)


What Users Can Do with Databrary


Use cases: Education, teaching

  • I need video clips for teaching

  • I want to illustrate an idea

  • Show the range of behaviors and exceptions

  • Show an excerpt in a talk


Use cases: Pre-research

  • I want to browse the work in my field

  • I want to know whether a study is worth doing

  • I need preliminary data for a grant proposal

  • I need ideas and inspiration

  • I want to replicate, expand on, or review previous work


Use cases: Research

  • I want to repurpose videos for new uses

  • Replicate existing work by recoding videos

  • I want to grow my sample size

  • I want to include participants from other contexts and populations

  • I want to conduct integrative analyses


Opportunities / Challenges

Raw data re-use

  • The data is video of people participating in experiments.

  • Can be immediately re-used in different domains without mapping or data dictionaries


Opportunities / Challenges

Video contains identifiable data

  • Faces, voices, possibly names & locations

  • De-identified data linked to video becomes identifiable

  • Enabling sharing while protecting privacy


Opportunities / Challenges

Structural consistency

  • No two labs organize material in the same way

  • What data structure works for both contributors and “consumers”?


Opportunities / Challenges

How “open” is it?

  • Identifiable data

  • Inter-institutional permission clearance

  • Permissions structure / delegation

  • New IRB, sponsored programs standards?


Opportunities / Challenges

Using significant university infrastructure

  • IT

  • Library

  • IRB

  • OSP

  • Counsel


Enabling sharing of identifiable data


Data-sharing model

How it works today


Data-sharing model

Enter Databrary


Data-sharing model

Sharing with Databrary


Data-sharing model

New Investigator wants access to Databrary


Data-sharing model

Browsing, non-research


Data-sharing model

Conduct Research


Innovations / Insights

  • Seek permission to share from people depicted in recordings

    • Extends informed consent

  • Restrict access to

    • Recordings “permissioned” for sharing

    • Authorized researchers with ethics training

    • Researchers who agree to maintain privacy


Databrary Release Template

  • Sharing ≠ research participation

  • Data privacy

  • Who has access?

  • How long?

  • No compensation

  • Minor assent

  • Levels of sharing


Levels of sharing

  • Private: No sharing

  • Shared: Sharing only with authorized researchers

  • Excerptable: Sharing + excerpts may be created and shown by authorized researchers to the public

  • Open: Sharing with the public
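
The four levels form an ordered scale, so one way to picture them is as an ordered enumeration plus a single access rule. The sketch below is illustrative only: `ReleaseLevel` and `can_view` are assumed names, not part of Databrary's actual implementation, and the rule covers viewers other than the data owner.

```python
from enum import IntEnum


class ReleaseLevel(IntEnum):
    """Sharing levels from the slide, ordered from most to least restrictive."""
    PRIVATE = 0      # no sharing
    SHARED = 1       # authorized researchers only
    EXCERPTABLE = 2  # shared, plus excerpts may be shown to the public
    OPEN = 3         # shared with the public


def can_view(level: ReleaseLevel, is_authorized: bool, is_excerpt: bool = False) -> bool:
    """Hypothetical rule: may a non-owner view a recording (or an excerpt of it)?"""
    if level == ReleaseLevel.PRIVATE:
        return False
    if is_authorized:
        # SHARED, EXCERPTABLE, and OPEN are all visible to authorized researchers.
        return True
    if level == ReleaseLevel.OPEN:
        return True
    # The public may see excerpts of EXCERPTABLE recordings, not the full recording.
    return level == ReleaseLevel.EXCERPTABLE and is_excerpt
```

For example, `can_view(ReleaseLevel.EXCERPTABLE, is_authorized=False, is_excerpt=True)` is true, while the same call without `is_excerpt=True` is false.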


Recording sharing permission

  • All depicted individuals

  • Explicit yes/no boxes

  • Adults and minors


Getting permissions right

  • Electronically recorded permissions

  • Linked to session- and participant-level metadata

    • Avoid data entry errors

    • Honor participants’ desired release level

  • Spreadsheet template

  • Web-based permission system
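
A spreadsheet of recorded permissions can be checked mechanically before upload, which is one way to catch data entry errors. The following is a minimal sketch under assumed conventions: the column names `session`, `participant`, and `release`, and the function `check_permissions`, are hypothetical and do not describe the actual Databrary template.

```python
import csv
import io

VALID_LEVELS = {"private", "shared", "excerptable", "open"}


def check_permissions(csv_text: str) -> list[str]:
    """Return one message per row whose release level is missing or unrecognized."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        level = (row.get("release") or "").strip().lower()
        if level not in VALID_LEVELS:
            problems.append(f"row {row_number}: invalid release level {level!r}")
    return problems


# Assumed sample data: one valid row, one blank level, one unknown level.
sample = "session,participant,release\ns01,p01,Shared\ns02,p02,\ns03,p03,public\n"
```

Calling `check_permissions(sample)` flags the second and third data rows, so a blank or misspelled level is rejected rather than silently treated as permission to share.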


A better way...

  • Why is the Databrary model better?

    • Clear and unambiguous

    • Consent to participate ≠ permission to share data

    • Easier for participants

    • More realistic conceptualization of risk

    • Standardization across contributors via templates


Building a user community

  • Users must become Authorized Investigators

    • Designing registration process

  • Investigator Agreement

    • Covers data contributions, non-research, research use/re-use

    • 1.0 will be a web form

  • Institutional sign-off by Authorizing Official


Data-sharing model

Conduct Research


Who promises what


Policy documents

  • Databrary Release Template

  • Investigator Agreement

  • Definitions of terms

  • Data Sharing Manifesto

  • Bill of Rights

  • Best Practices in Data Security

  • http://github.com/databrary/policies/


A data model for diverse data sets


A data model for Databrary

  • Started by organizing around study

    • Different meanings for study: paper, analysis, etc.

    • Tremendous range in size of studies

    • Meaning can change over time

  • Raw data themselves are fixed, constant

    • Begin by collecting raw, session data into datasets

    • Layer analyses, research products on datasets


Organizational unit: Session

  • Data collected at the same time, often single visit

  • Defined by:

    • Date of test

    • Participant release level

  • Contains raw data files (videos, etc.)

  • Associated with participant(s), other metadata
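
As a rough illustration of the session as an organizational unit, the fields listed above map naturally onto a small record type. This is a sketch only: the class name `Session` and its field names are assumptions, not Databrary's schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Session:
    """Hypothetical record for one session: data collected at the same time."""
    test_date: date      # date of test
    release_level: str   # participant release level, e.g. "shared"
    files: list[str] = field(default_factory=list)            # raw data files (videos, etc.)
    participant_ids: list[str] = field(default_factory=list)  # associated participant(s)
    metadata: dict[str, str] = field(default_factory=dict)    # other metadata


visit = Session(date(2013, 12, 10), "shared", files=["visit.mp4"], participant_ids=["p01"])
```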


What’s in a session?

  • Like a folder

  • A set of files

  • Collected at a specific time

  • Often a single visit or participant

  • Data files and coding spreadsheets are layered on later


Each file within a session

  • Name/description

    • Home visit, interview, eye-tracking video, motion-tracking, EEG, ...

  • File format

    • .pdf, .doc, .csv, .mp4, .opf, .mat, ...

  • For video or other time series data

    • Start point in time and length

  • Identifiable (video) or de-identified?
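
The per-file attributes above could be captured in a record like the following. It is a hedged sketch: `SessionFile` and its field names are invented for illustration, and times are assumed to be in seconds from the start of the session.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class SessionFile:
    """Hypothetical description of one file within a session."""
    name: str                               # e.g. "Home visit"
    path: str                               # location of the file within the session
    identifiable: bool                      # True for video showing faces or voices
    start_seconds: Optional[float] = None   # only for video or other time series data
    length_seconds: Optional[float] = None

    @property
    def file_format(self) -> str:
        """File format taken from the extension, lower-cased (e.g. '.mp4')."""
        return PurePosixPath(self.path).suffix.lower()

    def end_seconds(self) -> Optional[float]:
        """End point in time, or None when the file has no timing information."""
        if self.start_seconds is None or self.length_seconds is None:
            return None
        return self.start_seconds + self.length_seconds


video = SessionFile("Home visit", "s01/visit.MP4", True, 30.0, 600.0)
consent_form = SessionFile("Blank consent form", "s01/consent.pdf", False)
```

Storing a start point and a length, rather than only a file, is what lets several time series from the same session be lined up against one another.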


What’s in a dataset?

  • Top-level, binding information (optional)

    • Title and short description

    • Data owners and other users with access

    • Excerpts

    • Procedures, stimuli, blank forms, IRB approvals, and other files

    • Funding information

  • Set of sessions and metadata


How is a dataset organized?

  • Many ways to organize a dataset

  • User-defined groups (labels, tags, annotations)

    • By participants, conditions, visits, tasks, etc.

    • Associated with metadata “measures”

  • Session assigned to arbitrarily many groups

  • Groups specific to a single dataset
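
One way to read these rules is as a many-to-many mapping between sessions and labelled groups that is kept inside each dataset. The sketch below assumes that reading; the class `Dataset` and its methods are hypothetical names, not Databrary's data model.

```python
from collections import defaultdict


class Dataset:
    """Hypothetical dataset holding user-defined groups of sessions."""

    def __init__(self) -> None:
        # group kind (e.g. "participant") -> group label -> set of session ids
        self.groups: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def assign(self, session_id: str, kind: str, label: str) -> None:
        """Add a session to a group; a session may belong to arbitrarily many groups."""
        self.groups[kind][label].add(session_id)

    def sessions_in(self, kind: str, label: str) -> set[str]:
        """Return the sessions in one group (empty if the group does not exist)."""
        return set(self.groups.get(kind, {}).get(label, set()))

    def groups_of(self, session_id: str) -> list[tuple[str, str]]:
        """Return every (kind, label) group that contains the session, sorted."""
        return sorted(
            (kind, label)
            for kind, labels in self.groups.items()
            for label, ids in labels.items()
            if session_id in ids
        )


lab_data = Dataset()
lab_data.assign("s01", "participant", "p01")
lab_data.assign("s01", "condition", "control")
lab_data.assign("s02", "participant", "p01")
```

Because the mapping lives on the `Dataset` object, the groups stay specific to that one dataset, as the last bullet requires.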


Main grouping: Participants

  • Each group represents a participant

  • Includes any number of user-defined “measures”

    • Participant ID

    • Birthdate, gender, race/ethnicity

    • Geographic location, language, school grade, motor experience, disability, IQ, ...

    • Any other text, dates, numbers, ...
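
A participant group with free-form measures might look like the sketch below. The names `Participant`, `Measure`, and `age_in_days` are assumptions made for illustration, and deriving age from a birthdate and a date of test is shown only as one possible use of such measures.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Union

Measure = Union[str, int, float, date]  # "any other text, dates, numbers"


@dataclass
class Participant:
    """Hypothetical participant group: an ID plus any user-defined measures."""
    participant_id: str
    measures: dict[str, Measure] = field(default_factory=dict)

    def age_in_days(self, test_date: date) -> int:
        """Age at a session's date of test; requires a 'Birthdate' measure."""
        birthdate = self.measures["Birthdate"]
        if not isinstance(birthdate, date):
            raise TypeError("Birthdate measure must be a date")
        return (test_date - birthdate).days


infant = Participant("p01", {"Birthdate": date(2012, 6, 1), "Language": "English"})
```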


Grouping sessions


Representing datasets as files

  • People organize their own datasets in different ways

  • Using groupings for this organization lets datasets be exported and imported dynamically in many forms
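
To make the export idea concrete, the same sessions can be laid out as different folder trees depending on which grouping is chosen. This is a minimal sketch with an assumed input shape; `export_paths` and the dictionary keys `groups` and `files` are hypothetical.

```python
from pathlib import PurePosixPath


def export_paths(sessions: dict[str, dict], group_by: str) -> list[str]:
    """Build 'group label/session id/file name' paths for the chosen grouping.

    `sessions` maps a session id to {"groups": {kind: label}, "files": [names]}.
    Sessions without the requested grouping go under "ungrouped".
    """
    paths = []
    for session_id, info in sorted(sessions.items()):
        label = info["groups"].get(group_by, "ungrouped")
        for name in info["files"]:
            paths.append(str(PurePosixPath(label) / session_id / name))
    return paths


# Assumed sample data: two sessions from one participant in different conditions.
sessions = {
    "s01": {"groups": {"participant": "p01", "condition": "control"},
            "files": ["visit.mp4"]},
    "s02": {"groups": {"participant": "p01", "condition": "treatment"},
            "files": ["visit.mp4", "codes.csv"]},
}
```

Here `export_paths(sessions, "participant")` puts both sessions under `p01/`, while `export_paths(sessions, "condition")` splits them into `control/` and `treatment/`, without any change to the stored data.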


From datasets to studies

  • Datasets provide organization for labs

    • Session storage for researchers, labs, and collaborators

    • Like a lab server, only better

  • Studies present research data to others

    • Pull from datasets, organize sessions

    • Full control over how research is represented

    • Add additional analyses, coding manuals, spreadsheets, scripts, figures, research products, ...


Data ingest: contributor role

  • Identify data to contribute

  • Determine organizational structure

  • Verify participant sharing permissions

  • Provide additional top-level metadata and files

    • description/abstract

    • resulting publications, funding sources

    • images/figures, procedure documents, stimuli

  • Set and maintain access restrictions


Data ingest

  • Organization, upload, and import

    • Enumerate sessions, groupings (participants, etc.), files (in CSV)

    • Collect original videos, best quality available

  • Transcode to standard video formats

    • MPEG-4, H.264, AAC, ffmpeg

  • Gradual transition from hand-curation to self-curation
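
The ingest steps above can be sketched as reading a CSV manifest and planning one ffmpeg transcode per video. Everything specific here is an assumption: the manifest columns, the `transcoded/` output folder, the list of video suffixes, and the function names are hypothetical, and the ffmpeg options shown (`-c:v libx264`, `-c:a aac`) are just one common way to produce H.264 video with AAC audio in an MP4 container, not necessarily the settings the project uses. The code only builds the commands; it does not run ffmpeg.

```python
import csv
import io
from pathlib import PurePosixPath

VIDEO_SUFFIXES = (".mov", ".avi", ".mpg", ".mp4")  # assumed set of input formats


def transcode_command(source: str) -> list[str]:
    """Build an ffmpeg argument list: H.264 video and AAC audio in an MP4 file."""
    target = PurePosixPath("transcoded") / (PurePosixPath(source).stem + ".mp4")
    return ["ffmpeg", "-i", source, "-c:v", "libx264", "-c:a", "aac", str(target)]


def plan_ingest(manifest_csv: str) -> list[list[str]]:
    """Return one transcode command per video file listed in the manifest."""
    reader = csv.DictReader(io.StringIO(manifest_csv))
    return [
        transcode_command(row["file"])
        for row in reader
        if row["file"].lower().endswith(VIDEO_SUFFIXES)
    ]


# Assumed manifest: one original video and one coding spreadsheet for session s01.
manifest = "session,participant,file\ns01,p01,raw/s01.mov\ns01,p01,raw/s01_codes.csv\n"
commands = plan_ingest(manifest)
```

Each entry in `commands` could then be passed to `subprocess.run`, assuming an ffmpeg build with libx264 is installed; non-video files such as the coding spreadsheet are left untouched.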


System Architecture


Looking to Databrary 1.0

  • Features

    • Study views and data re-use

    • Search

    • Policy-driven form for user registration

    • Self-curation features

    • Automatic upload and transcoding

  • Timeline

    • Private beta early 2014, public release mid 2014


Building a Community

Creating a community of researchers who share and self-curate

More interesting data

More contributors

More users


Key Aims of Databrary project

  • Build a repository for sharing video

  • Provide tools for scoring video

  • Provide data management tools

  • Create policies that enable sharing

  • Transform the culture of developmental science!

