Databrary. David Millman, NYU • Rick Gilmore, PSU • Dylan Simon, NYU. Coalition for Networked Information • CNI Fall 13, December 10, 2013. databrary.org. Key aims of the Databrary project: build a repository for sharing video; provide tools for scoring video; provide data management tools.

Presentation Transcript
Databrary

David Millman, NYU • Rick Gilmore, PSU • Dylan Simon, NYU

Coalition for Networked Information • CNI Fall 13

December 10, 2013

databrary.org

Key Aims of Databrary project
  • Build a repository for sharing video
  • Provide tools for scoring video
  • Provide data management tools
  • Create policies that enable sharing
  • Transform the culture of developmental science!
Current Funding
  • NIH
    • National Institute of Child Health and Human Development
  • NSF
    • Development & Learning Sciences Program
    • Research and Evaluation on Education in Science and Engineering (REESE)
Use cases: Education, teaching
  • I need video clips for teaching
  • I want to illustrate an idea
  • Show the range of behaviors and exceptions
  • Show an excerpt in a talk
Use cases: Pre-research
  • I want to browse the work in my field
  • I want to know whether a study is worth doing
  • I need preliminary data for a grant proposal
  • I need ideas and inspiration
  • I want to replicate, expand on, or review previous work
Use cases: Research
  • I want to repurpose videos for new uses
  • Replicate existing work by recoding videos
  • I want to grow my sample size
  • I want to include participants from other contexts and populations
  • I want to conduct integrative analyses
Opportunities / Challenges

Raw data re-use

  • The data is video of people participating in experiments.
  • Can be immediately re-used in different domains without mapping or data dictionaries
Opportunities / Challenges

Video contains identifiable data

  • Faces, voices, possibly names & locations
  • De-identified data linked to video becomes identifiable
  • Enabling sharing while protecting privacy
Opportunities / Challenges

Structural consistency

  • No two labs organize material in the same way
  • What data structure works for both contributors and “consumers”?
Opportunities / Challenges

How “open” is it?

  • Identifiable data
  • Inter-institutional permission clearance
  • Permissions structure / delegation
  • New IRB, sponsored programs standards?
Opportunities / Challenges

Using significant university infrastructure

  • IT
  • Library
  • IRB
  • OSP
  • Counsel
Data-sharing model

How it works today

Data-sharing model

Enter Databrary

Data-sharing model

Sharing with Databrary

Data-sharing model

New Investigator wants access to Databrary

Data-sharing model

Browsing, non-research

Data-sharing model

Conduct Research

Innovations / Insights
  • Seek permission to share from people depicted in recordings
    • Extends informed consent
  • Restrict access to
    • Recordings “permissioned” for sharing
    • Authorized researchers with ethics training
    • Researchers who agree to maintain privacy
Databrary Release Template
  • Sharing ≠ research participation
  • Data privacy
  • Who has access?
  • How long?
  • No compensation
  • Minor assent
  • Levels of sharing
Levels of sharing
  • Private: No sharing
  • Shared: Sharing only with authorized researchers
  • Excerptable: Sharing + excerpts may be created and shown by authorized researchers to the public
  • Open: Sharing with the public
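
The four levels form an ordered scale from most to least restrictive, and a short Python sketch can make the access rule concrete. The names `ReleaseLevel`, `Viewer`, `can_view`, and the `is_excerpt` flag are assumptions made for illustration and are not taken from Databrary's software. Treating an excerpt of an "Excerptable" recording as visible to the public is a simplification of the slide's wording.

```python
from enum import IntEnum


class ReleaseLevel(IntEnum):
    """Sharing levels from the slide, ordered from most to least restrictive."""
    PRIVATE = 0      # no sharing
    SHARED = 1       # authorized researchers only
    EXCERPTABLE = 2  # shared, and excerpts may be shown to the public
    OPEN = 3         # shared with the public


class Viewer(IntEnum):
    """Hypothetical viewer categories (an assumption for this sketch)."""
    PUBLIC = 0
    AUTHORIZED = 1
    OWNER = 2


def can_view(level: ReleaseLevel, viewer: Viewer, is_excerpt: bool = False) -> bool:
    """Return True if the viewer may see a recording (or an excerpt of it)."""
    if viewer is Viewer.OWNER:
        return True
    if viewer is Viewer.AUTHORIZED:
        return level >= ReleaseLevel.SHARED
    # Public viewers: everything that is open, plus excerpts of excerptable data.
    if level is ReleaseLevel.OPEN:
        return True
    return is_excerpt and level is ReleaseLevel.EXCERPTABLE
```

Using an ordered enum keeps the "at least this level" comparison explicit, so a new level could be added without rewriting every check.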
Recording sharing permission
  • All depicted individuals
  • Explicit yes/no boxes
  • Adults and minors
Getting permissions right
  • Electronically recorded permissions
  • Linked to session- and participant-level metadata
    • Avoid data entry errors
    • Honor participants’ desired release level
  • Spreadsheet template
  • Web-based permission system
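
As a rough illustration of recording permissions electronically and linking them to sessions, the Python sketch below reads a small CSV and derives one release level per session. Two things here are assumptions rather than Databrary's actual design: the column names (`session_id`, `person`, `release`) and the rule that a session takes the most restrictive level given by anyone depicted in it.

```python
import csv
import io

# Ordered from most to least restrictive, as on the "Levels of sharing" slide.
LEVELS = ["private", "shared", "excerptable", "open"]


def session_release_levels(csv_text: str) -> dict[str, str]:
    """Return the most restrictive release level recorded for each session."""
    result: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        level = row["release"].strip().lower()
        if level not in LEVELS:
            # Reject typos instead of silently guessing a level.
            raise ValueError(f"unknown release level: {row['release']!r}")
        session = row["session_id"].strip()
        current = result.get(session, "open")
        result[session] = min(current, level, key=LEVELS.index)
    return result


# Hypothetical spreadsheet contents: one row per depicted individual.
EXAMPLE = """session_id,person,release
s01,child,excerptable
s01,parent,shared
s02,child,open
"""
```

Validating each value against a fixed list is one way to catch data entry errors before they affect what gets shared.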
A better way...
  • Why is the Databrary model better?
    • Clear and unambiguous
    • Consent to participate ≠ permission to share data
    • Easier for participants
    • More realistic conceptualization of risk
    • Standardization across contributors via templates
Building a user community
  • Users must become Authorized Investigators
    • Designing registration process
  • Investigator Agreement
    • Covers data contributions, non-research, research use/re-use
    • 1.0 will be a web form
  • Institutional sign-off by Authorizing Official
Data-sharing model

Conduct Research

Policy documents
  • Databrary Release Template
  • Investigator Agreement
  • Definitions of terms
  • Data Sharing Manifesto
  • Bill of Rights
  • Best Practices in Data Security
  • http://github.com/databrary/policies/
A data model for Databrary
  • Started by organizing around study
    • Different meanings for study: paper, analysis, etc.
    • Tremendous range in size of studies
    • Meaning can change over time
  • Raw data themselves are fixed, constant
    • Begin by collecting raw, session data into datasets
    • Layer analyses, research products on datasets
Organizational unit: Session
  • Data collected at the same time, often single visit
  • Defined by:
    • Date of test
    • Participant release level
  • Contains raw data files (videos, etc.)
  • Associated with participant(s), other metadata
What’s in a Session?
  • Like a folder
  • A set of files
  • Collected at a specific time
  • Often a single visit or participant
  • Datafiles, coding spreadsheets layered on later
Each file within a session
  • Name/description
    • Home visit, interview, eye-tracking video, motion-tracking, EEG, ...
  • File format
    • .pdf, .doc, .csv, .mp4, .opf, .mat, ...
  • For video or other time series data
    • Start point in time and length
  • Identifiable (video) or de-identified?
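
The per-file attributes listed above map naturally onto a small record type. The Python sketch below is one possible shape, not Databrary's schema: the class name `SessionFile`, its field names, and the `overlaps` helper are all hypothetical. Start point and length are stored in seconds from the start of the session, which is also an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class SessionFile:
    name: str                         # description, e.g. "Home visit"
    filename: str                     # e.g. "visit1.mp4"
    identifiable: bool                # True for raw video, False if de-identified
    start_s: Optional[float] = None   # offset from session start, in seconds
    length_s: Optional[float] = None  # duration, in seconds

    @property
    def file_format(self) -> str:
        """File extension in lower case, e.g. '.mp4'."""
        return PurePosixPath(self.filename).suffix.lower()

    @property
    def is_timeseries(self) -> bool:
        """True only when both a start point and a length are recorded."""
        return self.start_s is not None and self.length_s is not None

    def overlaps(self, other: "SessionFile") -> bool:
        """True if both files cover a common stretch of the session timeline."""
        if not (self.is_timeseries and other.is_timeseries):
            return False
        return (self.start_s < other.start_s + other.length_s
                and other.start_s < self.start_s + self.length_s)
```

Recording a start point and length for each time series is what would let a video and, say, an EEG trace from the same visit be lined up on one timeline.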
What’s in a dataset?
  • Top-level, binding information (optional)
    • Title and short description
    • Data owners and other users with access
    • Excerpts
    • Procedures, stimuli, blank forms, IRB approvals, and other files
    • Funding information
  • Set of sessions and metadata
How is a dataset organized?
  • Many ways to organize a dataset
  • User-defined groups (labels, tags, annotations)
    • By participants, conditions, visits, tasks, etc.
    • Associated with metadata “measures”
  • Session assigned to arbitrarily many groups
  • Groups specific to a single dataset
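
The grouping scheme above is essentially a many-to-many mapping between sessions and labeled groups, kept separately for each dataset. A minimal Python sketch follows; the class `Dataset`, its method names, and the `(kind, label)` key are illustrative assumptions, not Databrary's implementation.

```python
from collections import defaultdict


class Dataset:
    """Sessions tagged with user-defined groups that are local to this dataset."""

    def __init__(self) -> None:
        # Maps (kind, label), e.g. ("condition", "control"), to session ids.
        self._groups = defaultdict(set)

    def assign(self, session_id: str, kind: str, label: str) -> None:
        """Add a session to a group; a session may belong to any number of groups."""
        self._groups[(kind, label)].add(session_id)

    def sessions_in(self, kind: str, label: str) -> set:
        """Return a copy of the session ids in one group (empty if none)."""
        return set(self._groups.get((kind, label), set()))

    def groups_of(self, session_id: str) -> set:
        """Return every (kind, label) group that contains the session."""
        return {g for g, members in self._groups.items() if session_id in members}
```

Because each `Dataset` instance owns its own mapping, a label such as "control" in one dataset never collides with the same label in another.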
Main grouping: Participants
  • Each group represents a participant
  • Includes any number of user-defined “measures”
    • Participant ID
    • Birthdate, gender, race/ethnicity
    • Geographic location, language, school grade, motor experience, disability, IQ, ...
    • Any other text, dates, numbers, ...
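
Storing a birthdate as a participant measure and a test date on each session makes derived values such as age at test easy to compute. The Python sketch below shows the idea; the dictionary layout and the function name `age_in_days` are assumptions for illustration only.

```python
from datetime import date


def age_in_days(birthdate: date, test_date: date) -> int:
    """Age at test in whole days, derived from two stored dates."""
    if test_date < birthdate:
        raise ValueError("test date precedes birthdate")
    return (test_date - birthdate).days


# Hypothetical participant group holding free-form, user-defined measures.
participant = {
    "id": "P1",
    "birthdate": date(2012, 1, 15),
    "language": "English",
}

age = age_in_days(participant["birthdate"], date(2013, 1, 15))
```

Deriving age from the two dates avoids storing it separately, which removes one opportunity for the values to disagree.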
Representing datasets as files
  • People organize their own datasets in different ways
  • Using groupings for this organization allows datasets to be dynamically exported/imported in many forms
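
To illustrate how groupings could drive export in several layouts, the Python sketch below builds a folder path for each session from whichever groupings the user chooses, in the order given. The function `export_paths`, the session dictionaries, and the "ungrouped" fallback are hypothetical. For simplicity each session carries at most one label per grouping kind.

```python
from pathlib import PurePosixPath


def export_paths(sessions: list[dict], order: list[str]) -> dict[str, str]:
    """Map each session id to a folder path built from the chosen groupings."""
    paths = {}
    for s in sessions:
        # One path component per requested grouping, in the requested order.
        parts = [str(s["groups"].get(kind, "ungrouped")) for kind in order]
        paths[s["id"]] = str(PurePosixPath(*parts, s["id"]))
    return paths


# Hypothetical sessions, each labeled by participant and condition.
SESSIONS = [
    {"id": "s01", "groups": {"participant": "P1", "condition": "control"}},
    {"id": "s02", "groups": {"participant": "P2", "condition": "treatment"}},
]
```

Calling `export_paths(SESSIONS, ["participant"])` lays sessions out by participant, while `["condition", "participant"]` nests participants under conditions, all from the same stored data.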
From datasets to studies
  • Datasets provide organization for labs
    • Session storage for researchers, labs, and collaborators
    • Like a lab server, only better
  • Studies present research data to others
    • Pull from datasets, organize sessions
    • Full control over how research is represented
    • Add additional analyses, coding manuals, spreadsheets, scripts, figures, research products, ...
Data ingest: contributor role
  • Identify data to contribute
  • Determine organizational structure
  • Verify participant sharing permissions
  • Provide additional top-level metadata and files
    • description/abstract
    • resulting publications, funding sources
    • images/figures, procedure documents, stimuli
  • Set and maintain access restrictions
Data ingest
  • Organization, upload, and import
    • Enumerate sessions, groupings (participants, etc.), files (in CSV)
    • Collect original videos, best quality available
  • Transcode to standard video formats
    • MPEG-4, H.264, AAC, ffmpeg
  • Gradual transition from hand-curation to self-curation
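
The transcoding step names MPEG-4, H.264, AAC, and ffmpeg. The Python sketch below assembles one plausible ffmpeg invocation for that combination. It is not the project's actual pipeline: the function name and the choice of options are assumptions, and the `libx264` encoder is only available in ffmpeg builds that include it.

```python
def ffmpeg_command(source: str, target: str) -> list[str]:
    """Build an ffmpeg argument list for H.264 video and AAC audio in MP4."""
    return [
        "ffmpeg",
        "-i", source,               # original video, best quality available
        "-c:v", "libx264",          # encode video as H.264
        "-c:a", "aac",              # encode audio as AAC
        "-movflags", "+faststart",  # put the index first so playback can start early
        target,                     # a .mp4 name selects the MPEG-4 container
    ]


cmd = ffmpeg_command("raw/visit 1.avi", "out/visit1.mp4")
```

Building the command as a list rather than a single string means file names with spaces need no quoting when the list is passed to `subprocess.run(cmd, check=True)`.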
Looking to Databrary 1.0
  • Features
    • Study views and data re-use
    • Search
    • Policy-driven form for user registration
    • Self curation features
    • Automatic upload and transcoding
  • Timeline
    • Private beta early 2014, public release mid 2014
Building a Community

Creating a community of researchers who share and self-curate

More interesting data

More contributors

More users

Key Aims of Databrary project
  • Build a repository for sharing video
  • Provide tools for scoring video
  • Provide data management tools
  • Create policies that enable sharing
  • Transform the culture of developmental science!