1 / 20

Identification and analysis of differentially expressed genes in Saccharomyces cerevisiae .

Identification and analysis of differentially expressed genes in Saccharomyces cerevisiae . Group Populus : Petra van Berkel Casper Gerritsen Astri Herlino Brian Lavrijssen. Dataset of S. cerevisiae. Data generated by Nookaew et al (2012) Two conditions:

carrington
Download Presentation

Identification and analysis of differentially expressed genes in Saccharomyces cerevisiae .

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Identification and analysis of differentially expressed genes in Saccharomyces cerevisiae. Group Populus: Petra van Berkel Casper Gerritsen AstriHerlino Brian Lavrijssen

  2. Dataset of S. cerevisiae • Data generated by Nookaewet al (2012) • Two conditions: • Glucose excess (Batch) & Glucose limited (Chemostat) • 3 Biological replicates per condition • RNA-seq data: • 12 Files • 3 Sets of Paired-end reads per condition • Pipeline for differential gene expression analysis

  3. TopHat – Cufflinks analysis • Protocols based on Trapnellet al (2012) • 75% of reads mapped • Plots based on Cuffdiff gene expression output

  4. Cuffdiff output 5800 genes with FPKM values Q-value threshold based on Nookaewet al (2012)

  5. Validation of TopHat - Cufflinks • Validation of selection • Using Excel • Literature study • Boer et al (2003) • Influence of C, N, P and S limitation • Microarray analysis • > 68 out of 151 significantly upregulated • > 9 out of 33 significantly downregulated • More or less same genes found in other papers

  6. Expression network up • Up regulated genes • mrnet method in R • Number of Nodes = 57 • Number of Edges = 1560

  7. Expression network down • Down regulated genes • mrnet method in R • Number of Nodes = 33 • Number of Edges = 513

  8. GO Terms and GO Enrichment R version 2.15.0 (2012-03-30) • Packages: • biomaRt: Ensembl gene 69, S. cerevisiaeEF3 • org.Sc.sgd.db • GOstats • Rgraphviz • GO enrichment: • 8419 genes in the universe (org.Sc.sgdPMID2ORF) • Threshold: p-value < 10-4

  9. GO Terms • Down regulated 32 genes  29 genes with 208 GO terms (3 genes are not annotated) • Up regulated 133 genes  113 genes with 855 GO terms (20 genes are not annotated)

  10. GO Enrichment • Down regulated • Biological process: not found • Up regulated

  11. Biological process of up regulated genes

  12. Validation: Yeast genome database • Problem: • Not well annotated because the biomaRt was not updated to Ensembl gene 70, S. cerevisiaeEF4

  13. Top 100 • gffread: make the transcripts fasta file • Determine the top 100 highest and lowest expressed genes for the two conditions • R: order cuffdiff output on FPKM value (4 files) • Take out the genes with FPKM = 0

  14. Top 100 • Top genes: • G3P dehydrogenase, • F16P aldolase, • Ribosomal subunit protein • Bottom genes: • dubious transcript, • retro transposon, • etc..

  15. GC-content & transcript length • Determine GC-content and transcript length • Import top 100 genes files • For each file check the genes in top 100 file in transcripts.fa and count GC content and the transcript length

  16. GC-content & transcript length • Highly expressed in batch: • Length: 515.19 GC: 0.43 • Lowly expressed in batch: • Length: 831.46 GC: 0.41 • Highly expressed in chemostat: • Length: 556.65 GC: 0.43 • Lowly expressed in chemostat: • Length: 727.29 GC: 0.41

  17. GC-content & transcript length • Short sequence length! • mainly in highly expressed genes, gives unrealistic view of codon usage and intron length • These are often ribosomal subunit proteins

  18. Intron length • Genes.gtf as input • Create an indexfile • Look for the interesting genes • Print them to an outputfile • Calculate average

  19. Codon usage • Method (perl script): • Input are top high and low expressed genes • Build gene ID list and codons list and retrieve sequences • Count codon usage and calculate RSCU and average RSCU

  20. Conclusion • The up and down regulated genes are involved in carbon metabolism • Highly expressed genes are involved in carbon metabolism or are ribosomal subunit proteins

More Related