
Time line and procedures for datasets

Time line and procedures for datasets. BCBC Pre-retreat Workshop Tyson’s Corner, VA May 11, 2011. Topics to cover. Timeline for a dataset from contact to web site Policies to follow and documents to use Ten questions about your dataset Creating a MAGE-TAB document with us


Presentation Transcript


  1. Time line and procedures for datasets BCBC Pre-retreat Workshop Tyson’s Corner, VA May 11, 2011

  2. Topics to cover
     • Timeline for a dataset from contact to web site
     • Policies to follow and documents to use
     • Ten questions about your dataset
     • Creating a MAGE-TAB document with us
     • Seeing your dataset on the Beta Cell web site
     • A tool you can use for MAGE-TAB: Annotare

  3. Datasets to contact us about
     • Your deliverables
       • Microarray experiments
       • High-throughput sequencing experiments (RNA-seq, ChIP-seq, FAIRE-seq, etc.)
       • RT-PCR screens
       • Other deliverables – we can discuss how to integrate
     • Other key datasets
       • From your lab but from different funding
       • From the literature

  4. Steps to get a study into Beta Cell
     • Contact us. Let us know what is coming and when, so we can schedule working with you.
     • Fill out the Ten Questions. When we get this from you, we can generate an initial spreadsheet (MAGE-TAB) for you to complete.
     • Fill out the highlighted areas of the MAGE-TAB. We will go back and forth with you on details to get it right.
     • Send us your data. We will set up an FTP account for you. Send us the raw data (e.g., Affymetrix CEL files, FASTQ sequence reads) and the processed data that the conclusions are based upon.
     • Set a release schedule. We will load the dataset and incorporate it into queries and web pages as appropriate. We need to set when to release to the BCBC and to the general public.
     • We can also submit your data to ArrayExpress or, if desired, GEO.
     • View/Query your dataset. Beta Cell has releases every 3 to 4 months.

  5. Timeline
     • Completion of MAGE-TAB:
       • Requires back and forth between the CC and the contact person in the investigator’s lab
       • Time to completion depends on the responsiveness of that contact person
       • Until the MAGE-TAB is completed, data loading cannot occur
     • Data loading:
       • Once the MAGE-TAB is completed and all necessary files have been delivered, time to load the data depends on the size of your study
       • For a typical study, data loading takes a few weeks
       • Missing files will delay the process
     • Keep in mind that when you contact us to submit a study, you will be put in a queue, and the process of getting your study into Beta Cell Genomics will start once you reach the top of the queue
     • Studies that are meant to be viewable on the BCBC website (either by the general public or by BCBC investigators only) have priority over private studies, i.e., a study which is to be kept private will be placed lower in the queue

  6. Policies to follow and documents to use
     • Ten Questions about your dataset
       • Available as a BCBC miscellaneous resource
       • http://www.betacell.org/resources/data/miscellaneous/
     • Bioinformatics/Epigenomics Working group
       • RNA-seq and ChIP-seq recommendations
       • Includes checklists for data and information to provide
       • Mike Snyder will provide overview and discuss

  7. Meeting Deliverables
     • For a study to be considered fully “delivered”, the following is required on the investigator’s part:
       • Provide answers to the initial 10 questions and all necessary data files
       • Respond to all inquiries needed to generate an accurate MAGE-TAB
       • Allow your study to be visible (at least by other BCBC investigators) on the Beta Cell website

  8. Topics to cover
     • Timeline for a dataset from contact to web site
     • Policies to follow and documents to use
     • Ten questions about your dataset
     • Creating a MAGE-TAB document with us
     • Seeing your dataset on the Beta Cell web site
     • A tool you can use for MAGE-TAB: Annotare

  9. MGED Standards
     • What information is needed for a microarray experiment?
       • MIAME: Minimum Information About a Microarray Experiment. Brazma et al., Nature Genetics 2001
     • How do you “code up” microarray data?
       • MAGE-OM: MicroArray Gene Expression Object Model. Spellman et al., Genome Biology 2002
       • MAGE-TAB. Rayner et al., BMC Bioinformatics 2006
     • What words do you use to describe a microarray experiment?
       • MO: MGED Ontology. Whetzel et al., Bioinformatics 2006

  10. MIAME in a nutshell (à la Alvis Brazma)
      [Figure labels: Sample, RNA extract, labelled nucleic acid, hybridisation, array (repeated per sample); Protocol (at each step); Array design; Microarray; genes; normalization; Gene expression data matrix; Experiment; integration]
      Stoeckert et al., Drug Discovery Today: TARGETS 2004

  11. Sequencing is replacing array technology
      [Figure labels: the MIAME diagram (Sample, RNA extract, labelled nucleic acid, hybridisation, array, Protocol, Array design, Microarray, genes, normalization, Gene expression data matrix, Experiment, integration), with one column showing unlabelled "nucleic acid" and raw sequence reads in FASTQ format:]
      @HWI-EAS266_0011:8:1:6:969#0/1
      GTTTGCCNGTGTGTACGCTACCCCCTTCTTGTGTGTGTGTGTCT
      +HWI-EAS266_0011:8:1:6:969#0/1
      _abb`a[DZ`aabaa_a`b]___^^aa_`aa_a^a[\\aZTZVY
      @HWI-EAS266_0011:8:1:7:1688#0/1
      AAGATGANGGCAGGGTGCAAGATGGCAGGATGCAAGATGGCAGG
      +HWI-EAS266_0011:8:1:7:1688#0/1
      a`^ab`^D\a]a`b``b_bbbaabb^abaa``^a_^_aa\]_VR
      @HWI-EAS266_0011:8:1:7:593#0/1
      CAGTTCANTTCTCAGCACCACACTGGGATGCTCACACATGCCTG
      +HWI-EAS266_0011:8:1:7:593#0/1
      abbbb_VD[bbbba_`bbbbbbbbbbbaa_`bbaabaabb_aa_
      @HWI-EAS266_0011:8:1:7:139#0/1
      CATGGGGNATAATTGCAATCCCCGATCCCCATCACGAATGGGGT
      +HWI-EAS266_0011:8:1:7:139#0/1
      aab`[^YDY]Z\baa`aabaaaa`aa`a]aa```\aY]^\]ZVX
      @HWI-EAS266_0011:8:1:7:1390#0/1
      GAATAATNGAATAGGACCGCGGTTCTATTTTGTTGGTTTTCGGA
      +HWI-EAS266_0011:8:1:7:1390#0/1
      _U^b_`]D\__a_a`S```Y[a__]a\aa_`]`aTVZ__\HYVX
      @HWI-EAS266_0011:8:1:7:1663#0/1
      TGATGTTNGTGGCAATAATGGGGGTAGCGGCAATGGTGGCGGGG
      +HWI-EAS266_0011:8:1:7:1663#0/1
      a`[_X]\DQTZ[^YYa[[aXV[PZUUYSYBBBBBBBBBBBBBBB
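The reads on this slide are in FASTQ format: four lines per read (an `@` header, the bases, a `+` separator, and one quality character per base). As a minimal sketch of how such a file can be read, the Python below parses four-line records and converts quality characters to numeric scores. The function and class names are hypothetical, and the quality offset of 64 is an assumption based on the Illumina-style characters in these reads; the slide itself does not state the encoding.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (sequence lines not wrapped)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record at non-empty line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 64) -> List[int]:
    """Convert quality characters to integers.

    offset=64 is an assumption (older Illumina encoding); many current
    files use offset=33 instead.
    """
    return [ord(ch) - offset for ch in quality]


# One read copied from the slide.
EXAMPLE = """@HWI-EAS266_0011:8:1:7:593#0/1
CAGTTCANTTCTCAGCACCACACTGGGATGCTCACACATGCCTG
+HWI-EAS266_0011:8:1:7:593#0/1
abbbb_VD[bbbba_`bbbbbbbbbbbaa_`bbaabaabb_aa_
"""

for record in parse_fastq(EXAMPLE):
    scores = phred_scores(record.quality)
    print(record.read_id, len(record.sequence), min(scores), max(scores))
```

Under the assumed offset, the uncalled base `N` at position 8 of this read carries the lowest score in the read, which is the kind of per-base information that is lost if only processed data are submitted.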

  12. Sequencing is replacing array technology: ChIP-Seq, MeDIP-Seq, etc.
      [Figure labels: the same diagram and the same six FASTQ reads as on slide 11, with "Chromatin, DNA extract" replacing "RNA extract" in the sequencing column]

  13. From MGED to FGED
      • What information is needed for an HTS experiment?
        • MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
      • How do you “code up” functional genomics data?
        • MAGE-TAB can still be utilized
      • What words do you use to describe a functional genomics experiment?
        • OBI: Ontology for Biomedical Investigations, incorporates MO

  14. MAGE-TAB Format
      • What is MAGE-TAB?
        • A simple spreadsheet view consisting of 2 files:
          • IDF: describing the experiment design, contact details, variables, and protocols
          • SDRF: a spreadsheet with columns that describe samples, annotations, protocol references, assays, and data
        • Linked data files (e.g. CEL files) are referenced by the SDRF
      • Where can I get MAGE-TAB from?
        • ~10,000 MAGE-TAB files are available from ArrayExpress (includes GEO-derived and ArrayExpress data)
        • caArray also provides MAGE-TAB files for download
      • Who is using MAGE-TAB?
        • Bioconductor
        • GenePattern
        • MeV
        • and Beta Cell Genomics!
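Because both MAGE-TAB files are plain tab-delimited text, they can be inspected with a few lines of code. The sketch below, in Python, reads a row-oriented IDF and a column-oriented SDRF and lists the data files the SDRF references. The field and column names used (`Investigation Title`, `SDRF File`, `Protocol Name`, `Protocol REF`, `Array Data File`, `Derived Array Data File`) come from the MAGE-TAB format; the function names and all example values are hypothetical, and this is an illustration rather than the loader used by Beta Cell Genomics.

```python
import csv
import io
from typing import Dict, List, Tuple

# SDRF columns that point at linked data files (a subset, for illustration).
DATA_FILE_COLUMNS = {"Array Data File", "Derived Array Data File"}


def parse_idf(text: str) -> Dict[str, List[str]]:
    """IDF is row-oriented: the first cell names the field, the rest are values."""
    fields: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue
        fields[row[0].strip()] = [cell for cell in row[1:] if cell.strip()]
    return fields


def parse_sdrf(text: str) -> List[List[Tuple[str, str]]]:
    """SDRF is column-oriented: a header row, then one row per sample path.

    Column names such as 'Protocol REF' may repeat, so each row is kept as
    an ordered list of (column, value) pairs instead of a dict.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return [list(zip(header, row)) for row in reader if row]


def data_files(rows: List[List[Tuple[str, str]]]) -> List[str]:
    """Collect every file name found in a data-file column."""
    return [
        value
        for row in rows
        for column, value in row
        if column.strip() in DATA_FILE_COLUMNS and value.strip()
    ]


# Hypothetical example content, not taken from a real submission.
IDF_TEXT = (
    "Investigation Title\tExample islet expression study\n"
    "Protocol Name\tP-EXTRACT\tP-HYB\n"
    "SDRF File\texample.sdrf.txt\n"
)
SDRF_TEXT = (
    "Source Name\tCharacteristics[OrganismPart]\tProtocol REF\tExtract Name\t"
    "Protocol REF\tHybridization Name\tArray Data File\tDerived Array Data File\n"
    "sample1\tpancreas\tP-EXTRACT\textract1\tP-HYB\thyb1\tsample1.CEL\tsample1_norm.txt\n"
    "sample2\tpancreas\tP-EXTRACT\textract2\tP-HYB\thyb2\tsample2.CEL\tsample2_norm.txt\n"
)

idf = parse_idf(IDF_TEXT)
rows = parse_sdrf(SDRF_TEXT)
print(idf["SDRF File"], data_files(rows))
```

A check like `data_files` is one way to confirm, before sending data, that every file named in the SDRF is actually among the files being delivered.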

  15. IDF file for E-TABM-34 IDF = Investigation Description Format

  16. SDRF file for E-TABM-34 SDRF = Sample and Data Relationship Format

  17. IDF A microarray expression study

  18. Experimental Design

  19. Following 1 sample: bench component
      [Figure legend: black border = biomaterials; red border = treatments. Figure label: OrganismPart]

  20. Following 1 sample: in-silico component
      • image acquisition
      • feature extraction
      • summarization (feature extraction II) and quantile normalization
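Quantile normalization, the last step named on this slide, forces every array to share the same distribution of values: each value is replaced by the mean of the values holding the same rank across all arrays. The Python below is a minimal sketch of that idea, assuming equal-length arrays and no special handling of ties; the function name is hypothetical, and it is not the software actually used for the study shown.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize a list of arrays (one inner list per array/sample).

    Ties are broken by original position, which is a simplification.
    """
    if not arrays:
        return []
    n = len(arrays[0])
    if any(len(values) != n for values in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean of the k-th smallest value across arrays, for each rank k.
    sorted_arrays = [sorted(values) for values in arrays]
    rank_means = [
        sum(values[k] for values in sorted_arrays) / len(arrays)
        for k in range(n)
    ]

    normalized = []
    for values in arrays:
        # Probe indices ordered from smallest to largest intensity.
        order = sorted(range(n), key=lambda i: values[i])
        result = [0.0] * n
        for rank, probe_index in enumerate(order):
            result[probe_index] = rank_means[rank]
        normalized.append(result)
    return normalized


# Two hypothetical arrays with three probes each.
array_a = [5.0, 2.0, 3.0]
array_b = [4.0, 1.0, 6.0]
norm_a, norm_b = quantile_normalize([array_a, array_b])
print(norm_a)  # [5.5, 1.5, 3.5]
print(norm_b)  # [3.5, 1.5, 5.5]
```

After normalization both arrays contain exactly the same set of values, and only the assignment of those values to probes differs, which preserves each array's ranking.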

  21. SDRF Let’s focus on the highlighted row

  22. From design to MAGE-TAB

  23. From design to MAGE-TAB

  24. Viewing the Annotation

  25. Querying the Annotation

  26. Loading and Analyzing the Data
      • Image and .CEL files are archived and their location stored in the database
      • Raw and processed data loaded into the database
      • Downstream analyses (e.g. differential expression) are performed, generating gene lists
      • Analysis results loaded into the database

  27. Querying the Data

  28. IDF A ChIP-Seq study

  29. Experimental Design

  30. Bench Component

  31. In-silico Component
      • Processing steps: cluster generation, image acquisition, sequencing, alignment, peak calling
      • Per-lane files (sequence file, ELAND file, resulting sample):
        • Ptf1a_s5_seq.txt, s5_eland.txt, Ptf1a_s5
        • Ptf1a_s4_seq.txt, s4_eland.txt, Ptf1a_s4
        • Input_s8_seq.txt, s8_eland.txt, Input_s8
        • Rbpjl_s6_seq.txt, s6_eland.txt, Rbpjl_s6
        • Input_s2_seq.txt, s2_eland.txt, Input_s2
        • Rbpjl_s4_seq.txt, s4_eland.txt, Rbpjl_s4
      • Peak sets: Ptf1a_peaks, Rbpjl_peaks

  32. SDRF

  33. Viewing the Annotation

  34. Querying the Annotation

  35. Viewing the Data

  36. Querying the Data

  37. Topics to cover
      • Time line for a dataset from contact to web site
      • Policies to follow and documents to use
      • Ten questions about your dataset
      • Creating a MAGE-TAB document with us
      • Seeing your dataset on the Beta Cell web site
      • A tool you can use for MAGE-TAB: Annotare

  38. Annotare - An open source standalone MAGE-TAB editor Shankar R, Parkinson H, Burdett T, Hastings E, Liu J, Miller M, Srinivasa R, White J, Brazma A, Sherlock G, Stoeckert CJ Jr, Ball CA. Annotare - a tool for annotating high-throughput biomedical investigations and resulting data. Bioinformatics. 2010 Aug 23.

  39. Annotare - an open source MAGE-TAB Editor Annotare is an annotation tool for high throughput gene expression experiments in MAGE-TAB format. Researchers can describe their investigations with the investigators’ contact details, experimental design, protocols that were employed, references to publications, details of biological samples, arrays, and experimental data produced in the investigation.

  40. Annotare Features
      • Intuitive graphical user interface forms for editing
      • Ontology support: a built-in ontology and web services connectivity to BioPortal
      • Searchable standard templates
      • Design wizard
      • Validation module
      • Mac and Windows support
      http://code.google.com/p/annotare/

  41. Annotare Demo
      • File Gallery: three different ways to get started
        • Looking at an existing MAGE-TAB
          • Form versus sheet view
        • Using a template
        • Using the wizard
