Databrary

Presentation Transcript

  1. Databrary • David Millman, NYU • Rick Gilmore, PSU • Dylan Simon, NYU • Coalition for Networked Information • CNI Fall 2013 • December 10, 2013 • databrary.org

  2. Key Aims of Databrary project • Build a repository for sharing video • Provide tools for scoring video • Provide data management tools • Create policies that enable sharing • Transform the culture of developmental science!

  4. Current Funding • NIH • National Institute of Child Health and Human Development • NSF • Development & Learning Sciences Program • Research and Evaluation on Education in Science and Engineering (REESE)

  5. What Users Can Do with Databrary

  6. Use cases: Education, teaching • I need video clips for teaching • I want to illustrate an idea • Show the range of behaviors and exceptions • Show an excerpt in a talk

  7. Use cases: Pre-research • I want to browse the work in my field • I want to know whether a study is worth doing • I need preliminary data for grant proposal • I need ideas and inspiration • I want to replicate, expand on, or review previous work

  8. Use cases: Research • I want to repurpose videos for new uses • Replicate existing work by recoding videos • I want to grow my sample size • I want to include participants from other contexts and populations • I want to conduct integrative analyses

  9. Opportunities / Challenges: Raw data re-use • The data is video of people participating in experiments • Can be immediately re-used in different domains without mapping or data dictionaries

  10. Opportunities / Challenges: Video contains identifiable data • Faces, voices, possibly names & locations • De-identified data linked to video becomes identifiable • Enabling sharing while protecting privacy

  11. Opportunities / Challenges: Structural consistency • No two labs organize material in the same way • What data structure works for both contributors and “consumers”?

  12. Opportunities / Challenges: How “open” is it? • Identifiable data • Inter-institutional permission clearance • Permissions structure / delegation • New IRB, sponsored programs standards?

  13. Opportunities / Challenges: Using significant university infrastructure • IT • Library • IRB • OSP • Counsel

  14. Enabling sharing of identifiable data

  15. Data-sharing model: How it works today

  16. Data-sharing model: Enter Databrary

  17. Data-sharing model: Sharing with Databrary

  18. Data-sharing model: New Investigator wants access to Databrary

  19. Data-sharing model: Browsing, non-research

  20. Data-sharing model: Conduct Research

  21. Innovations / Insights • Seek permission to share from people depicted in recordings • Extends informed consent • Restrict access to: recordings “permissioned” for sharing; authorized researchers with ethics training; researchers who agree to maintain privacy

  22. Databrary Release Template • Sharing ≠ research participation • Data privacy • Who has access? • How long? • No compensation • Minor assent • Levels of sharing

  23. Levels of sharing • Private: No sharing • Shared: Sharing only with authorized researchers • Excerptable: Sharing + excerpts may be created and shown by authorized researchers to the public • Open: Sharing with the public
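The four levels on this slide form an ordered scale, so an access check can be written as a comparison. The following is a minimal sketch under stated assumptions, not Databrary's implementation: the `Viewer` categories, the `can_view` function, and the `is_excerpt` flag are hypothetical names introduced here for illustration.

```python
from enum import IntEnum


class Release(IntEnum):
    """Sharing levels from the slide, ordered from most to least restrictive."""
    PRIVATE = 0      # no sharing
    SHARED = 1       # authorized researchers only
    EXCERPTABLE = 2  # shared, and excerpts may be shown to the public
    OPEN = 3         # shared with the public


class Viewer(IntEnum):
    """Hypothetical viewer categories (assumed, not named in the slides)."""
    PUBLIC = 0
    AUTHORIZED = 1   # an authorized researcher
    OWNER = 2        # the contributor of the recording


def can_view(release, viewer, is_excerpt=False):
    """Return True if `viewer` may see material released at `release`."""
    if viewer == Viewer.OWNER:
        return True
    if viewer == Viewer.AUTHORIZED:
        return release >= Release.SHARED
    # Public viewers see open material, or excerpts of excerptable material.
    if release == Release.OPEN:
        return True
    return is_excerpt and release == Release.EXCERPTABLE
```

For example, `can_view(Release.EXCERPTABLE, Viewer.PUBLIC)` is false for a full recording but true when `is_excerpt=True`. The sketch simplifies the slide: it does not model who creates an excerpt or who shows it.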

  24. Recording sharing permission • All depicted individuals • Explicit yes/no boxes • Adults and minors

  25. Getting permissions right • Electronically recorded permissions • Linked to session- and participant-level metadata • Avoid data entry errors • Honor participants’ desired release level • Spreadsheet template • Web-based permission system
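One way to honor every depicted individual's choice is to release a session at the most restrictive level anyone selected. The slides do not state this rule, so treat it as an assumption; the function name, the person IDs, and the default for a session with no recorded permissions are also hypothetical.

```python
# Sharing levels from most to least restrictive, as named in the slides.
LEVELS = ("private", "shared", "excerptable", "open")


def session_release(permissions):
    """Return the most restrictive level chosen by anyone depicted.

    `permissions` maps a (hypothetical) person ID to that person's level.
    An empty mapping yields "private": with no recorded permission,
    this sketch assumes no sharing.
    """
    if not permissions:
        return "private"
    for level in permissions.values():
        if level not in LEVELS:
            # Reject typos instead of silently guessing a level.
            raise ValueError(f"unknown release level: {level!r}")
    return min(permissions.values(), key=LEVELS.index)
```

With `{"child": "open", "parent": "shared"}` the session would be released as `"shared"`. Validating each value against a fixed list is one way to catch the data entry errors the slide warns about.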

  26. A better way... • Why is the Databrary model better? • Clear and unambiguous • Consent to participate ≠ permission to share data • Easier for participants • More realistic conceptualization of risk • Standardization across contributors via templates

  27. Building a user community • Users must become Authorized Investigators • Designing registration process • Investigator Agreement • Covers data contributions, non-research, research use/re-use • 1.0 will be a web form • Institutional sign-off by Authorizing Official

  28. Data-sharing model: Conduct Research

  29. Who promises what

  30. Who promises what

  31. Policy documents • Databrary Release Template • Investigator Agreement • Definitions of terms • Data Sharing Manifesto • Bill of Rights • Best Practices in Data Security • http://github.com/databrary/policies/

  32. A data model for diverse data sets

  33. A data model for Databrary • Started by organizing around study • Different meanings for study: paper, analysis, etc. • Tremendous range in size of studies • Meaning can change over time • Raw data themselves are fixed, constant • Begin by collecting raw, session data into datasets • Layer analyses, research products on datasets

  34. Organizational unit: Session • Data collected at the same time, often single visit • Defined by: date of test; participant release level • Contains raw data files (videos, etc.) • Associated with participant(s), other metadata

  35. What’s in a session? • Like a folder • A set of files • Collected at a specific time • Often a single visit or participant • Datafiles, coding spreadsheets layered on later

  36. Each file within a session • Name/description: home visit, interview, eye-tracking video, motion-tracking, EEG, ... • File format: .pdf, .doc, .csv, .mp4, .opf, .mat, ... • For video or other time series data: start point in time and length • Identifiable (video) or de-identified?
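The session and file attributes listed on slides 34 to 36 can be sketched as two record types. This is an illustrative reading of the slides, not the project's schema: the class names, field names, and the `deidentified_files` helper are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SessionFile:
    """One file within a session (field names are hypothetical)."""
    name: str                               # e.g. "home visit", "EEG"
    file_format: str                        # e.g. "mp4", "csv", "opf"
    identifiable: bool                      # video is typically identifiable
    start_seconds: Optional[float] = None   # time series data only
    length_seconds: Optional[float] = None  # time series data only


@dataclass
class Session:
    """Data collected at the same time, often a single visit."""
    test_date: date
    release_level: str                      # "private", "shared", ...
    files: List[SessionFile] = field(default_factory=list)
    participant_ids: List[str] = field(default_factory=list)

    def deidentified_files(self):
        """Return the files that carry no identifiable data."""
        return [f for f in self.files if not f.identifiable]
```

A session defined by its test date and release level would then hold, say, one identifiable `.mp4` and one de-identified coding spreadsheet, with later analysis files appended to `files`.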

  37. What’s in a dataset?

  38. What’s in a dataset? • Top-level, binding information (optional) • Title and short description • Data owners and other users with access • Excerpts • Procedures, stimuli, blank forms, IRB approvals, and other files • Funding information • Set of sessions and metadata

  39. How is a dataset organized? • Many ways to organize a dataset • User-defined groups (labels, tags, annotations) • By participants, conditions, visits, tasks, etc. • Associated with metadata “measures” • Session assigned to arbitrarily many groups • Groups specific to a single dataset

  40. Main grouping: Participants • Each group represents a participant • Includes any number of user-defined “measures” • Participant ID • Birthdate, gender, race/ethnicity • Geographic location, language, school grade, motor experience, disability, IQ, ... • Any other text, dates, numbers, ...

  41. Grouping sessions

  42. Grouping sessions

  43. Representing datasets as files • People organize their own datasets in different ways • By using groupings for this organization, can dynamically export/import in many forms
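Because a session can belong to arbitrarily many groups, the same dataset can be laid out as folders in several ways simply by choosing which kind of group to export by. The sketch below assumes a hypothetical representation in which each group has a kind and a label; it is not the actual export format.

```python
from collections import defaultdict


def layout_by(kind, groups, memberships):
    """Build a folder layout from the groups of one kind.

    groups:      {group_id: (kind, label)}, e.g. ("participant", "P01")
    memberships: iterable of (session_id, group_id) pairs; a session may
                 appear in any number of groups.
    Returns {label: sorted list of session IDs}.
    """
    tree = defaultdict(list)
    for session_id, group_id in memberships:
        group_kind, label = groups[group_id]
        if group_kind == kind:
            tree[label].append(session_id)
    return {label: sorted(ids) for label, ids in sorted(tree.items())}
```

Calling `layout_by("participant", ...)` and `layout_by("condition", ...)` on the same inputs gives two different folder views of one dataset without moving any files.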

  44. From datasets to studies • Datasets provide organization for labs • Session storage for researchers, labs, and collaborators • Like a lab server, only better • Studies present research data to others • Pull from datasets, organize sessions • Full control over how research is represented • Add additional analyses, coding manuals, spreadsheets, scripts, figures, research products, ...

  45. From datasets to studies

  46. Data ingest: contributor role • Identify data to contribute • Determine organizational structure • Verify participant sharing permissions • Provide additional top-level metadata and files • description/abstract • resulting publications, funding sources • images/figures, procedure documents, stimuli • Set and maintain access restrictions

  47. Data ingest • Organization, upload, and import • Enumerate sessions, groupings (participants, etc.), files (in CSV) • Collect original videos, best quality available • Transcode to standard video formats • MPEG-4, H.264, AAC, ffmpeg • Gradual transition from hand-curation to self-curation
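The ingest steps above can be sketched as reading a CSV manifest and building an ffmpeg command for each video. The manifest column names and the output naming are assumptions made for illustration; the ffmpeg options shown (`-i`, `-c:v libx264`, `-c:a aac`) are standard ones for H.264 video and AAC audio in an MPEG-4 container, but the project's actual settings are not given in the slides.

```python
import csv
import io
from pathlib import Path


def read_manifest(text):
    """Parse a CSV manifest into one dict per row.

    The column names (session, participant, file) are hypothetical.
    """
    return list(csv.DictReader(io.StringIO(text)))


def transcode_command(source, out_dir):
    """Build (but do not run) an ffmpeg command for one source video."""
    target = Path(out_dir) / (Path(source).stem + ".mp4")
    return [
        "ffmpeg",
        "-i", str(source),   # original video, best quality available
        "-c:v", "libx264",   # H.264 video
        "-c:a", "aac",       # AAC audio
        str(target),         # .mp4 extension selects the MPEG-4 container
    ]
```

The returned list can be passed to `subprocess.run` once ffmpeg is installed. Keeping command construction separate from execution makes the step easy to review during hand-curation and to automate later.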

  48. System Architecture

  49. Looking to Databrary 1.0 • Features • Study views and data re-use • Search • Policy-driven form for user registration • Self curation features • Automatic upload and transcoding • Timeline • Private beta early 2014, public release mid 2014

  50. Building a Community • Creating a community of researchers who share and self-curate • More interesting data • More contributors • More users