The Transcriptome - PowerPoint PPT Presentation

dacia
Presentation Transcript

  1. The Transcriptome • Gene Discovery • Quantitation of Gene Expression • Reading: Ch 15.1 • BIO520 Bioinformatics • Jim Lund

  2. WHY? • The genes (proteins) expressed determine the state of the cell. • Signaling. • Metabolic capabilities. • Differentiation state (cell type). • Response to changes in environment. • Verifies gene predictions. • Transcriptional regulation • Normal vs. abnormal • Conditional expression

  3. Transcriptome Analysis • Gene (transcript) discovery • transcripts • alternative splicing/processing • Transcript assays • Promoter analysis • Transcription Factors • Cellular control networks

  4. Gene Discovery • Inference from genomic DNA • Prokaryotes & fungi OK • cDNA characterization • EST • SAGE

  5. EST (Expressed Sequence Tag) • Sequence cDNA libraries • proportional libraries • subtracted or normalized libraries • Which end? • 5’ or 3’ or Whole

  6. Library Type • “Regular” or proportional • Subtracted • Miss alternate transcripts • Normalized • Tissue • Primer: dT vs. random

  7. Ideal cDNAs

  8. “Real” cDNAs

  9. Which end? • Whole cDNA • BEST & HARDEST (Long) • 3’-end • Technically consistent, limited information • 5’-end • Coding “identity” highest • 5’ AND 3’ • Good, but a technical & informatic challenge

  10. EST Data Analyses • Clustering Analysis • Assemble ESTs into genes. • Alternative splicing forms • Find coding SNPs. • Truncated, unspliced, and junk ESTs can be misleading • Project: Unigene • Program: stackPACK • Frequency analysis • Digital Differential Display • DDD is a computational method for comparing sequence-based gene representation profiles among individual cDNA libraries or pools of libraries.
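The frequency analysis above can be sketched as a comparison of EST counts for one gene between two cDNA libraries. This is a minimal sketch, assuming a two-sided Fisher exact test on a 2x2 table of counts; the function name `fisher_exact_two_sided` and the example counts are hypothetical and are not taken from the Digital Differential Display tool itself.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    a: ESTs for the gene in library 1, b: all other ESTs in library 1
    c: ESTs for the gene in library 2, d: all other ESTs in library 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the gene's ESTs fall in library 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: 12 of 5000 ESTs in library 1 vs 2 of 5000 in library 2.
p_value = fisher_exact_two_sided(12, 4988, 2, 4998)
print(f"p = {p_value:.4f}")
```

A small p-value suggests the gene is represented differently in the two libraries, subject to the caveats about normalized and subtracted libraries noted earlier.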

  11. EST Results (old) • Known genes (30%) • Similarities to other ORFs, ESTs (30%) • Infer Function? • Novel Class (30%,  w/ time)

  12. Typical Progress/Results • Humans • 6,694,833 ESTs • 124,179 clusters (“sets”) • 29,000 sets contain EST and mRNA seqs. • CGAP EST library “plateau” broken by: • different tissues, different states • normalized libraries

  13. Data Quality Considerations • 99% correct data (1% errors!). • Frameshifts: effects depend on tools • BLASTX tool to “find” frameshifts • How sensitive? • TBLASTX, TBLASTN to “use” in other projects • How sensitive?
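To make the 1% error rate concrete, the arithmetic can be worked through for a single read. This is a rough sketch, assuming the 99% figure is a per-base accuracy and that errors are independent; the 400-base read length and both function names are hypothetical.

```python
def expected_errors(read_length, per_base_accuracy=0.99):
    """Expected number of miscalled bases in one read."""
    return read_length * (1.0 - per_base_accuracy)


def prob_error_free(read_length, per_base_accuracy=0.99):
    """Probability that every base in the read is correct (independent errors)."""
    return per_base_accuracy ** read_length


# Hypothetical 400-base EST read.
print(expected_errors(400))   # about 4 errors expected
print(prob_error_free(400))   # under 2% of such reads would be error-free
```

Under these assumptions almost every EST carries at least one error, which is why frameshift-tolerant tools matter.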

  14. Gene Expression Assays • EST (Poor method) • SAGE • Microarray Hybridization • Next Gen Sequencing. • Transcriptional Fusions • GFP, LacZ fusions

  15. Serial Analysis of Gene Expression (SAGE) • Collect mRNA • Isolate short oligomers from each transcript. • Ligate together the oligomers and clone them. • Sequence thousands of clones. • Map the 1x10^4 – 1x10^5 oligomers to their genes. • Find which genes are transcribed and their relative expression levels. • http://www.sagenet.org (Vogelstein at JHU)
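The last two steps above (mapping oligomers to genes and deriving relative expression) can be sketched as a counting exercise. This is a minimal illustration, assuming a tag-to-gene lookup table already exists; the function name `tag_counts_to_expression`, the tag sequences, and the gene names are hypothetical.

```python
from collections import Counter


def tag_counts_to_expression(tags, tag_to_gene):
    """Count SAGE tags per gene and convert counts to relative expression.

    tags: iterable of tag sequences read from the sequenced clones
    tag_to_gene: dict mapping a tag sequence to a gene name (hypothetical lookup)
    Returns (fractions, unmapped): each gene's share of the mapped tags,
    and the number of tags with no gene assignment.
    """
    gene_counts = Counter()
    unmapped = 0
    for tag in tags:
        gene = tag_to_gene.get(tag)
        if gene is None:
            unmapped += 1
        else:
            gene_counts[gene] += 1
    mapped = sum(gene_counts.values())
    fractions = {g: n / mapped for g, n in gene_counts.items()} if mapped else {}
    return fractions, unmapped


# Toy data: three known tags and one unknown tag.
lookup = {"ACGTACGTAC": "geneA", "TTTTGGGGCC": "geneB"}
observed = ["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC", "GGGGGGGGGG"]
fractions, unmapped = tag_counts_to_expression(observed, lookup)
print(fractions, unmapped)
```

In practice a tag can match more than one gene or none at all, so the unmapped count is worth tracking alongside the expression estimates.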

  16. SAGE technique • Prepare biotin labeled cDNA • Cleave with anchoring enzyme (NlaIII)

  17. SAGE technique • Ligate on linkers • Cleave with tagging enzyme (BsmFI)

  18. SAGE technique • Ligate, PCR, and gel purify ditags (102bp). • Recleave with anchoring enzyme (NlaIII), ligate to form concatemers. • Size select, clone and sequence concatemers.
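The effect of the anchoring and tagging enzymes can be imitated in silico to predict which tag a known transcript would yield. This is a sketch under stated assumptions: the tag is taken as the bases immediately 3' of the 3'-most NlaIII site (CATG), and the 10-base tag length is an assumption not given in the slides. The function name `sage_tag` and the example sequence are hypothetical.

```python
def sage_tag(transcript, anchor_site="CATG", tag_length=10):
    """Predict the SAGE tag for a transcript sequence (5'->3', poly-A removed).

    Returns the tag_length bases following the 3'-most anchoring-enzyme site,
    or None if the site is absent or too close to the 3' end.
    """
    seq = transcript.upper()
    pos = seq.rfind(anchor_site)  # 3'-most occurrence of the anchoring site
    if pos == -1:
        return None
    start = pos + len(anchor_site)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


# Toy transcript with two CATG sites; only the 3'-most one defines the tag.
print(sage_tag("AAACATGCCCCATGACGTACGTACGGAAAA"))
```

Running this over a set of reference transcripts would give the kind of tag-to-gene table needed to interpret sequenced concatemers.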

  19. Colon cancer vs. normal colon epithelium (SAGE)

  20. Microarray Hybridization • Determine gene expression by parallel hybridization of labeled cDNA to DNA attached to a fixed support. • http://cmgm.stanford.edu/pbrown/

  21. Microarray Hybridization • Producing chips • Producing probes / reading arrays • Analyzing and interpreting data

  22. Transcriptional Array [Figure: grid of spotted ORFs (orf 1, orf 2, orf 3, ...; spots 1-9 shown), 200 spots along a 3 cm side, 40,000 dots per 9 cm^2, > all human genes; probed with mRNA from Condition 1 or Condition 2]

  23. Transcriptional Array-1 [Figure: the same array layout with a subset of spots marked (1, 2, 6, 8); mRNA from Condition 1 and Condition 2]

  24. Transcriptional Array-2 [Figure: the same array layout with a different subset of spots marked; mRNA from Condition 1 and Condition 2]

  25. Microarray Technologies • Spotted arrays (Brown et al.) • Spot arrays on glass slides • PCR fragments • Long (50-70bp) oligo arrays • Synthesis • Affymetrix (www.affymetrix.com) • High density array of 25 bp oligos • Made using light directed oligonucleotide synthesis and photolithography • Agilent, CombiMatrix • Made using light directed oligonucleotide synthesis and mirrors.

  26. Spotted Arrays

  27. Print Quill

  28. Spotted microarray image

  29. Affymetrix photolithographic technology • Lithographic masks are used to either block or transmit light onto specific locations of the array. • The surface is then flooded with a solution containing either adenine, thymine, cytosine, or guanine, and coupling occurs only in those regions on the glass that have been deprotected through illumination. • The coupled nucleotide also bears a light-sensitive protecting group, so the cycle can be repeated. • The microarray is built as the probes are synthesized through repeated cycles of deprotection and coupling. • Synthesis typically ends at 25 bp. • Current arrays have 1.3 million unique features per array.
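The deprotection and coupling cycle described above can be simulated on a toy array. This is a simplified sketch, not Affymetrix's actual process: it assumes bases are offered in a fixed A, C, G, T order and that each "mask" is simply the set of features whose next required base matches the base being offered. The function name `synthesize` and the probe sequences are hypothetical.

```python
def synthesize(probes, base_order="ACGT"):
    """Simulate light-directed synthesis of probes, one array feature each.

    Returns (built, masks): the synthesized sequences and the list of
    (base, feature_indices) steps, i.e. one mask per coupling step.
    """
    for p in probes:
        if set(p) - set(base_order):
            raise ValueError(f"probe contains a base outside {base_order}: {p}")
    built = ["" for _ in probes]
    masks = []
    while any(len(b) < len(p) for b, p in zip(built, probes)):
        for base in base_order:
            # "Illuminate" only features whose next required base is this one.
            mask = [i for i, (b, p) in enumerate(zip(built, probes))
                    if len(b) < len(p) and p[len(b)] == base]
            if mask:
                masks.append((base, mask))
                for i in mask:
                    built[i] += base  # coupling happens only where deprotected
    return built, masks


built, masks = synthesize(["ACGT", "AAAA", "TTGA"])
print(built)
print(len(masks), "masked coupling steps")
```

With four bases offered per round, a probe of length n needs at most 4n masked steps in this scheme, so a 25-mer array needs at most 100 regardless of how many features it carries.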

  30. GeneChip Expression Assay Design

  31. Affymetrix GeneChips: Expression Analysis • Available for humans and model organisms. • Made only by Affymetrix. • Chip designs change slowly. • GeneChips: • Human: 50,000 RefSeq genes and ESTs • C. elegans: 22,500 genes (12/00 genome annotation) • Rat 230: 30,000 genes, ESTs • Yeast: 6100 gene set • Tiling arrays for model organisms • http://affymetrix.com

  32. Quantitation of fluorescence signals (Image to data) • Hybridization, scan in chip image. • Gridding • Determine where the spots are. • Spot intensity and local background determination. • Normalization • Adjust to make the red and green total signal intensities the same. • Gene expression ratio. • Red channel/green channel. • Programs: • ScanAlyze, http://rana.lbl.gov/EisenSoftware.htm • GenePix, http://www.moleculardevices.com/pages/instruments/microarray_main.html
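The background, normalization, and ratio steps above can be sketched for a handful of spots. This is a minimal illustration, assuming local background is subtracted per spot and that normalization is a single global scale factor equalizing the total red and green signal; the function name `normalized_ratios` and the intensity values are hypothetical, and this is not the method of ScanAlyze or GenePix specifically.

```python
def normalized_ratios(red, green, red_bg, green_bg):
    """Return red/green expression ratios after background subtraction
    and total-intensity normalization.

    red, green: raw spot intensities per channel
    red_bg, green_bg: local background estimates for the same spots
    """
    # Subtract local background, flooring at zero.
    r = [max(x - b, 0.0) for x, b in zip(red, red_bg)]
    g = [max(x - b, 0.0) for x, b in zip(green, green_bg)]
    if sum(g) == 0:
        raise ValueError("green channel has no signal above background")
    # Scale green so that both channels have the same total intensity.
    scale = sum(r) / sum(g)
    g_norm = [x * scale for x in g]
    # Ratio is undefined (None) where the normalized green signal is zero.
    return [ri / gi if gi > 0 else None for ri, gi in zip(r, g_norm)]


ratios = normalized_ratios(
    red=[110.0, 410.0, 60.0], green=[60.0, 110.0, 35.0],
    red_bg=[10.0, 10.0, 10.0], green_bg=[10.0, 10.0, 10.0],
)
print(ratios)
```

A global scale factor assumes most genes do not change between the two samples; intensity-dependent methods relax that assumption.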

  33. Microarray data • Big tables of numbers!

  34. Viewing microarray data • Clustergram • Scatter plot: log(ch1) vs log(ch2) • M vs A: expression change (M) vs expression level (A) • Volcano plot: log(expr) vs p-value
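The quantities behind the M vs A view can be computed directly from the two channel intensities. This is a small sketch using the common definitions M = log2(ch1/ch2) and A = the mean of log2(ch1) and log2(ch2), which are assumed here rather than stated in the slides; the function name `ma_values` is hypothetical.

```python
import math


def ma_values(ch1, ch2):
    """Return (M, A) for one spot from two positive channel intensities.

    M: log2 expression ratio (change between channels)
    A: average log2 intensity (overall expression level)
    """
    if ch1 <= 0 or ch2 <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    m = math.log2(ch1 / ch2)
    a = 0.5 * (math.log2(ch1) + math.log2(ch2))
    return m, a


# A spot four times brighter in channel 1 than in channel 2.
print(ma_values(8.0, 2.0))
```

Plotting M against A for every spot rotates the log(ch1) vs log(ch2) scatter plot by 45 degrees, which makes intensity-dependent bias in the ratios easier to see.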