
Data Analysis Project



Presentation Transcript


  1. Data Analysis Project Advanced Bioinformatics BIF-30806 2013

  2. Set Up
     • Basic and Advanced Project
     • Available data sets
     • Deliverables
     • Literature
     • Groups
     • Schedule week 3 & 4

  3. Purpose
     • Build a software pipeline to perform a transcriptome analysis
     • Write code to connect tools and do input/output conversions
     • The code is developed on a certain data set, but should be able to run on different input (e.g. a different species)
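
One way to read the pipeline requirement above is as a small driver script that builds the command line for each tool from a few parameters, so that switching species only changes the inputs. The sketch below is a minimal illustration in Python rather than Perl. The function names `build_pipeline_commands` and `run_pipeline`, the output directory layout, and the example paths are assumptions made for illustration, not part of the course material. It uses only the `-o` (output directory) and `-p` (threads) options of TopHat and Cufflinks, and relies on TopHat writing its alignments to `accepted_hits.bam` in its output directory.

```python
import os
import subprocess


def build_pipeline_commands(bowtie_index, reads_fastq, out_dir, threads=1):
    """Return the TopHat and Cufflinks command lines for one sample.

    All paths come from the caller, so the same code can be pointed at a
    different species or data set (the directory layout is hypothetical).
    """
    tophat_dir = os.path.join(out_dir, "tophat")
    cufflinks_dir = os.path.join(out_dir, "cufflinks")
    # TopHat writes its alignments to accepted_hits.bam in its output directory.
    alignments = os.path.join(tophat_dir, "accepted_hits.bam")
    return [
        ["tophat", "-p", str(threads), "-o", tophat_dir, bowtie_index, reads_fastq],
        ["cufflinks", "-p", str(threads), "-o", cufflinks_dir, alignments],
    ]


def run_pipeline(commands, dry_run=True):
    """Run the commands in order; with dry_run=True only print them."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline as soon as one tool fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder paths, not files from the course server.
    run_pipeline(build_pipeline_commands("index/genome", "reads.fastq", "results"))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact commands before anything is run.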

  4. Basic Project
     • Which are the most highly expressed genes (top 100) in your species of interest under a single condition (or in a single tissue)?
     • Can you find a correlation between gene expression and transcript properties, such as GC content, transcript length, intron length, codon usage, or others?
     • [Optional] Can you visualize the highly expressed genes in an interaction network?
     TOOLS: TopHat, Cufflinks, Perl scripts, and possibly others.
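
The first two questions of the basic project can be sketched in a few lines of code. The example below is a hedged illustration in Python, not the course's prescribed method: it assumes expression values come from a Cufflinks `genes.fpkm_tracking` file (tab-delimited, with `gene_id` and `FPKM` columns), and the helper names `top_expressed`, `gc_content`, `ranks`, `pearson`, and `spearman` are made up for this sketch. Spearman rank correlation is one reasonable choice for relating expression to a property such as GC content; other statistics may suit your data better.

```python
import csv
import io


def top_expressed(fpkm_text, n=100):
    """Return the n (gene_id, FPKM) pairs with the highest FPKM."""
    reader = csv.DictReader(io.StringIO(fpkm_text), delimiter="\t")
    genes = [(row["gene_id"], float(row["FPKM"])) for row in reader]
    return sorted(genes, key=lambda pair: pair[1], reverse=True)[:n]


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = mean_rank
        start = end + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

In use, you would read the tracking file into a string, call `top_expressed` on it, compute `gc_content` for each transcript sequence, and pass the paired lists of FPKM and GC values to `spearman`.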

  5. Why?

  6. Advanced Project
     • Which transcripts/genes show differential expression between the two conditions?
     • Can you find out what the functions of these genes are?
     • Can you give a biological explanation of why these genes are differentially expressed under the conditions in your experiment?
     • [Optional] In your data set, can you find modules of co-expressed genes? Try to use the WGCNA package.
     • [Optional] Can you find a functional description and explanation for the identified modules?
     • [Optional] To what extent are the modules conserved in a closely related species?
     TOOLS: TopHat, Cufflinks, Cuffdiff, WGCNA, Perl scripts, and possibly others.
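
For the first question of the advanced project, a natural starting point is to filter the Cuffdiff results for genes it flags as significant. The sketch below is a minimal Python illustration under stated assumptions: it expects the tab-delimited layout of Cuffdiff's `gene_exp.diff` file with `gene_id`, `log2(fold_change)`, `q_value`, and `significant` columns, and the function name `significant_genes` and the default cut-off of 0.05 are choices made for this example only.

```python
import csv
import io


def significant_genes(diff_text, max_q=0.05):
    """Return (gene_id, log2 fold change, q-value) for significant genes.

    Rows are kept when Cuffdiff marked them "yes" in the significant
    column and the q-value is at most max_q; results are sorted by q-value.
    """
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    hits = []
    for row in reader:
        q_value = float(row["q_value"])
        if row["significant"] == "yes" and q_value <= max_q:
            # float() also accepts "inf" and "-inf", which can occur when
            # a gene has zero expression in one of the two conditions.
            fold_change = float(row["log2(fold_change)"])
            hits.append((row["gene_id"], fold_change, q_value))
    return sorted(hits, key=lambda hit: hit[2])
```

The resulting gene identifiers can then be looked up in an annotation source of your choice to address the follow-up questions about gene function.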

  7. Why?

  8. You have a choice
     • Start on the basic or the advanced project
     • Of course, the basic project can be extended with elements of the advanced project
     • Group members should talk to each other and discuss their choice with Harm/Sandra

  9. Deliverables per group
     • Pipeline code; all input/output has to be stored in the “group directory” on the server
     • Final presentation (20 minutes)
     • Each group member must prepare and present some slides (5 min per person)

  10. Deliverables per person
     • Project report
        • All the work done in the project (intro, M&M, results, discussion/conclusion)
        • Appendix A: your contribution to the group effort
        • Appendix B: personal reflection on the project
     • Contribution to group presentation
        • Prepare and present some slides (5 min per person)
     • The code that you have written

  11. Data
     • On server: /course/project/
        • Arabidopsis
        • Yeast
     • Other data/species of your choice
        • Use for example the NCBI Sequence Read Archive (SRA, formerly the Short Read Archive)

  12. Literature • See course website

  13. Groups • See course website

  14. Schedule week 3 & 4
     • Presentations
        • Tue (26-2) afternoon: presenting project plan
        • Fri (1-3) afternoon: presenting progress
        • Fri (8-3) all day: final presentation
     • Deadline report & code
        • Sunday March 10, 23:59
        • So, your report has to be in before Monday!
        • Email your report to “project@bioinformatics.nl”
