Oryza

Oryza. Arjan van Zeijl, Claire Lessa Alvim Kamei, Robert van Loo, Ruud Heshof. BIF-30806, 8-3-2013. Goal: generate a platform to analyze gene expression of Saccharomyces cerevisiae using RNA-seq data.

Presentation Transcript


  1. Oryza • Arjan van Zeijl, Claire Lessa Alvim Kamei, Robert van Loo, Ruud Heshof • BIF-30806, 8-3-2013

  2. Goal • Generate a platform to analyze gene expression of Saccharomyces cerevisiae using RNA-seq data. • Compare highly expressed genes with lowly expressed genes on exon-intron length, GC content and codon usage.

  3. MoSCoW • Must: TopHat, Cufflinks • Should: exon-intron length, GC content • Could: GO annotation, codon usage, palindromes • Would: chemostat analysis, Cytoscape

  4. Pipeline • RNA-seq data (trimmed and untrimmed) • TopHat • Cufflinks • Sequence retrieval (NCBI data) • Analyses: exon-intron length, GC content, GO terms, palindromes, codon usage • Validation
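
The two mapping and quantification steps of the pipeline can be sketched as command lines. This is a minimal sketch, not the group's actual scripts: the file names (`sacCer_index`, `sample_1.fq`, `genes.gtf`) and output directories are hypothetical, and only commonly documented TopHat/Cufflinks options (`-p`, `-o`, `-G`) are used. The functions only build the commands; nothing is executed.

```python
from pathlib import Path
from typing import Optional


def tophat_cmd(index: str, reads_1: str, reads_2: str,
               out_dir: str, threads: int = 4) -> list[str]:
    """Map paired-end reads with TopHat (index = Bowtie index basename)."""
    return ["tophat", "-p", str(threads), "-o", out_dir, index, reads_1, reads_2]


def cufflinks_cmd(bam: str, out_dir: str, annotation_gtf: Optional[str] = None,
                  threads: int = 4) -> list[str]:
    """Estimate transcript abundances (FPKM) from the alignments with Cufflinks."""
    cmd = ["cufflinks", "-p", str(threads), "-o", out_dir]
    if annotation_gtf is not None:
        # -G: quantify against the reference annotation instead of assembling new transcripts
        cmd += ["-G", annotation_gtf]
    cmd.append(bam)
    return cmd


def build_pipeline(index: str, reads_1: str, reads_2: str,
                   annotation_gtf: Optional[str] = None) -> list[list[str]]:
    """Return both commands in order; TopHat writes accepted_hits.bam to its output dir."""
    tophat_out = "tophat_out"  # hypothetical directory name
    bam = str(Path(tophat_out) / "accepted_hits.bam")
    return [
        tophat_cmd(index, reads_1, reads_2, tophat_out),
        cufflinks_cmd(bam, "cufflinks_out", annotation_gtf),
    ]


if __name__ == "__main__":
    # Hypothetical inputs; to run a step, pass it to subprocess.run(cmd, check=True).
    for cmd in build_pipeline("sacCer_index", "sample_1.fq", "sample_2.fq", "genes.gtf"):
        print(" ".join(cmd))
```

Running the same pair of commands once on the trimmed and once on the untrimmed reads would reproduce the two branches on the slide; Cufflinks writes per-gene FPKM values to `genes.fpkm_tracking` in its output directory.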

  5. Data output • Input: RNA-seq data • Genes ranked by FPKM value and split into five batches of 20% of all genes • Top 100 genes selected per batch: Perc 1 (0-20%), Perc 2 (20-40%), Perc 3 (40-60%), Perc 4 (60-80%), Perc 5 (80-100%), 100 genes each
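
The selection step can be sketched as follows. Two readings are assumptions, since the slide does not spell them out: Perc 1 is taken to be the lowest-FPKM 20%, and "top 100" is read as the 100 highest-FPKM genes inside each batch. The input is a plain dictionary of gene ID to FPKM, for example parsed from Cufflinks output.

```python
def percentile_classes(fpkm: dict[str, float], n_classes: int = 5,
                       top_n: int = 100) -> list[list[str]]:
    """Split genes into equal-size FPKM classes and keep the top_n genes of each.

    Assumption: class 1 (Perc 1) holds the lowest-FPKM 20% of genes.
    """
    # Sort ascending by FPKM; ties are broken by gene ID so the result is deterministic.
    ranked = sorted(fpkm, key=lambda gene: (fpkm[gene], gene))
    size = len(ranked)
    classes = []
    for c in range(n_classes):
        batch = ranked[c * size // n_classes:(c + 1) * size // n_classes]
        # Assumption: "top" = highest-FPKM genes within the batch, best first.
        classes.append(batch[::-1][:top_n])
    return classes
```

Integer division of the boundaries keeps the five batches within one gene of each other in size when the gene count is not a multiple of five.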

  6. NCBI data
     LOCUS       NP_014825                 63 aa            linear   PLN 25-FEB-2013
     DEFINITION  ribosomal 40S subunit protein S30B [Saccharomyces cerevisiae S288c].
     ACCESSION   NP_014825
     VERSION     NP_014825.3  GI:398365605
     DBSOURCE    REFSEQ: accession NM_001183601.3
     KEYWORDS    .
     SOURCE      Saccharomyces cerevisiae S288c
       ORGANISM  Saccharomyces cerevisiae S288c
                 Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
                 Saccharomycetes; Saccharomycetales; Saccharomycetaceae; Saccharomyces.
     ...
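
A record like this can be retrieved from NCBI's E-utilities EFetch service and its header read with a few lines of standard-library Python. This is a sketch under assumptions: the slides do not say how the group retrieved records, the URL builder below only covers protein records (`rettype=gp`), and the parser is a deliberately simple reading of the flat-file layout (keywords start in column 1; indented lines, including sub-keywords such as ORGANISM, are folded into the preceding keyword; a repeated keyword keeps its last value).

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_protein_url(accession: str) -> str:
    """URL returning the GenPept flat file of a protein record as plain text."""
    query = urlencode({"db": "protein", "id": accession,
                       "rettype": "gp", "retmode": "text"})
    return f"{EFETCH}?{query}"


def parse_header(record: str) -> dict[str, str]:
    """Collect the top-level keywords (LOCUS, DEFINITION, ...) of a flat-file record."""
    fields: dict[str, str] = {}
    key = None
    for line in record.splitlines():
        if line.startswith(("FEATURES", "ORIGIN")):
            break  # only the header section is parsed
        if line[:1].strip():  # a keyword starts in column 1
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
        elif key is not None and line.strip():
            # Indented continuation or sub-keyword line: append to the current keyword.
            fields[key] += " " + line.strip()
    return fields
```

The text behind the URL could be downloaded with `urllib.request.urlopen`; NCBI's usage guidelines for E-utilities apply when fetching many records.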

  7. Exon-intron length
     Ribosomal 40S subunit protein S30B
     ID       SHORT   EXON  INTRON  FPKM     CDS  GC_CDS  L_PALIN  GC_PALIN
     YOR182C  RPS30B  192   412     15623.7  189  41.27   0        -
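
Summed exon and intron lengths such as the EXON and INTRON columns above can be derived from exon coordinates. This sketch assumes 1-based, inclusive coordinates for one gene (as in GFF/GTF annotations); the slides do not say which annotation source or convention the group used, and the coordinates in the test are made up.

```python
def exon_intron_lengths(exons: list[tuple[int, int]]) -> tuple[int, int]:
    """Total exon and total intron length (bp) from 1-based, inclusive exon coordinates."""
    exons = sorted(exons)
    exon_len = sum(end - start + 1 for start, end in exons)
    # Each intron is the gap between the end of one exon and the start of the next.
    intron_len = sum(next_start - prev_end - 1
                     for (_, prev_end), (next_start, _) in zip(exons, exons[1:]))
    return exon_len, intron_len
```

Sorting first means the exons may be listed in any order; a single-exon gene gets an intron length of 0.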

  8. GC content • Does more GC mean more mRNA? (Claire)

  9. GC content & CDS length
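
One way to test whether GC content (or CDS length) tracks expression is to compute GC percentages and correlate them with FPKM. The slides do not name the statistic used; this sketch picks a Spearman rank correlation as an assumption, because FPKM values are strongly skewed and a rank-based measure is less dominated by a few very highly expressed genes.

```python
import math


def gc_content(seq: str) -> float:
    """Percentage of G+C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

The same `spearman` call works for CDS length versus FPKM by passing the lengths instead of the GC percentages.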

  10. Palindrome • Inverted repeat (IR): at least 6 bp long, spacer at most 10 bp • Conservation: the IR must be identical, the spacer need not be • Humphrey-Dixon EL, Sharp R, Schuckers M, Lock R (2011). Comparative genome analysis suggests characteristics of yeast inverted repeats that are important for transcriptional activity. Genome 54(11):934-942.
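
The IR criteria can be turned into a simple scanner. This is a sketch under an assumption: "at least 6 bp" is read as each arm being at least 6 bp and perfectly reverse-complementary to the other, with 0 to 10 bp of spacer in between. It searches a single sequence only; the cross-species conservation filter from the cited study is not implemented, and overlapping or nested hits are all reported.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_inverted_repeats(seq: str, min_arm: int = 6,
                          max_spacer: int = 10) -> list[tuple[int, int, int]]:
    """Return (left_arm_start, arm_length, spacer_length), 0-based, for each IR found."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for p in range(1, n):  # p = first position after the left arm
        for spacer in range(max_spacer + 1):
            q = p + spacer  # first position of the right arm
            if q >= n:
                break
            # Extend both arms outward while the bases are complementary.
            k = 0
            while (p - 1 - k >= 0 and q + k < n
                   and COMPLEMENT.get(seq[p - 1 - k]) == seq[q + k]):
                k += 1
            if k >= min_arm:
                hits.append((p - k, k, spacer))
    return hits
```

Ambiguous bases such as N never match, because `COMPLEMENT.get` returns `None` for them.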

  11. Palindrome • Comparative analysis of 4 Saccharomyces genomes: S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus • IRs in S. cerevisiae conserved in the 4 species • Crossed the top 100 gene lists with the palindrome list to create 3 hash tables, with the gene ID as key: %gene_palin, %gene_palinseq, %GC_palin
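
The three Perl hashes map directly onto Python dictionaries. What each hash held is an assumption here: %gene_palin as the palindrome length (the L_PALIN column on slide 7), %gene_palinseq as the palindrome sequence, and %GC_palin as its GC percentage (GC_PALIN). The input format, one palindrome sequence per gene ID, is also hypothetical.

```python
def gc_percent(seq: str) -> float:
    """GC percentage of a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cross_with_palindromes(top_genes: list[str], palindromes: dict[str, str]):
    """Keep palindromes of genes in the top list; return three dicts keyed by gene ID."""
    wanted = set(top_genes)
    gene_palin, gene_palinseq, gc_palin = {}, {}, {}
    for gene_id, palin_seq in palindromes.items():
        if gene_id in wanted:
            gene_palin[gene_id] = len(palin_seq)   # like %gene_palin
            gene_palinseq[gene_id] = palin_seq     # like %gene_palinseq
            gc_palin[gene_id] = gc_percent(palin_seq)  # like %GC_palin
    return gene_palin, gene_palinseq, gc_palin
```

Genes from the top list without a palindrome are simply absent from all three dictionaries, matching the "0" and "-" entries in the slide 7 table.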

  12. Palindrome length

  13. Palindrome length Percentiles 1

  14. Codon usage • Previous studies indicated a more extreme codon usage preference in highly expressed genes (Sharp, 1986; Plotkin, 2011) • Codon usage bias was shown to correlate with tRNA abundance (Sharp, 1986) • Non-optimal codons might slow down translation to allow correct protein folding (Pechmann, 2013) • HOT TOPIC: 2 papers in Nature this week • Non-optimal codon usage is important for circadian clock rhythms

  15. Codon usage • Measure: Relative Synonymous Codon Usage (RSCU) • Took the mean RSCU over the top 100 genes of each class • Annotation problem: CDS length not always divisible by three
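
RSCU divides the observed count of a codon by the count expected if all synonymous codons of its amino acid were used equally: RSCU = count(codon) / (count(amino acid) / number of synonymous codons). The sketch below uses the standard genetic code and treats stop codons as one more synonymous family. How the group handled CDSs whose length is not divisible by three is not stated; skipping them is an assumption made here.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def rscu(cds: str):
    """RSCU per codon for one CDS, or None if its length is not a multiple of 3."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        return None  # assumption: such CDSs are skipped
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    result = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined for its codons
        expected = total / len(codons)
        for c in codons:
            result[c] = counts[c] / expected
    return result


def mean_rscu(cds_list: list[str]) -> dict[str, float]:
    """Mean RSCU per codon over the genes of one class where it is defined."""
    sums: Counter = Counter()
    n: Counter = Counter()
    for cds in cds_list:
        values = rscu(cds)
        if values is None:
            continue
        for codon, value in values.items():
            sums[codon] += value
            n[codon] += 1
    return {codon: sums[codon] / n[codon] for codon in sums}
```

An RSCU of 1 means no preference; values well above 1 mark preferred codons, so comparing `mean_rscu` across Perc 1 to Perc 5 shows whether the preference sharpens with expression.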

  16. Codon usage

  17. GO term enrichment • Long list for the top 100 • Basically two groups of processes, components and functions: • Ribosome and translation related • Glycolysis/gluconeogenesis related • Zoom in on part of the table
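
Enrichment of a GO term in a top 100 list is commonly scored with a one-sided hypergeometric test. This is a generic sketch, not a reproduction of the Blast2GO analysis shown on the following slides: it asks how likely it is to see at least k annotated genes among n selected genes when K of all N genes carry the term.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k), X ~ Hypergeometric(N total genes, K annotated, n selected)."""
    upper = min(K, n)
    # Exact integer numerator; comb returns 0 when more genes are drawn than exist.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

With many GO terms tested at once, the resulting p-values would still need a multiple-testing correction such as Benjamini-Hochberg before calling a term enriched.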

  18. Top 100 GO-terms

  19. Blast2GO

  20. Blast2GO

  21. GO terms top 100

  22. KEGG pathways

  23. Validation • Technical validation using 4 paired-end RNA-seq reads • Created multiple copies (200 in total, 25% each) • Ran the pipeline: 5 hits found (one read maps to two homologous genes on two chromosomes) • FPKM values are not equal because of large differences in length, which is as expected
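
Why equal read counts give unequal FPKM values follows from the definition: fragments per kilobase of transcript per million mapped fragments. This sketch uses the plain transcript length; Cufflinks itself works with an effective length, so its numbers will differ somewhat. The transcript lengths in the test are hypothetical.

```python
def fpkm(fragments: int, length_bp: int, total_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)
```

With 200 copies split 25% per read, each read contributes 50 fragments; a 1000 bp transcript then gets twice the FPKM of a 2000 bp one, even though both received the same number of fragments.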

  24. Conclusion • Highly expressed genes are more likely to contain introns. • Palindrome length correlates with gene expression. • Highly expressed genes show a codon usage preference. • Highly expressed genes are richer in GC and shorter. • GC content, intron/exon length, palindromes and GO terms differ markedly between the top 100 and the rest.

  25. Questions
