
Microarray analysis

Microarray analysis. Quantitation of Gene Expression Expression Data to Networks. Reading: Ch 16. BIO520 Bioinformatics Jim Lund. Microarray data. Image quantitation. Normalization Find genes with significant expression differences Annotation

fharkins



Presentation Transcript


  1. Microarray analysis: Quantitation of Gene Expression; Expression Data to Networks. Reading: Ch 16. BIO520 Bioinformatics, Jim Lund

  2. Microarray data • Image quantitation. • Normalization • Find genes with significant expression differences • Annotation • Clustering, pattern analysis, network analysis

  3. Sources of Non-Biological Variation • Dye bias: differences in heat and light sensitivity, efficiency of dye incorporation • Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment (Channel is used to refer to a combination of a dye and a slide.) • Variation across replicate slides • Variation across hybridization conditions • Variation in scanning conditions • Variation among technicians doing the lab work.

  4. Factors that affect the signal level • Amount of mRNA • Labeling efficiencies • Quality of the RNA • Laser/dye combination • Detection efficiency of photomultiplier or CCD

  5. HeLa HepG2

  6. HeLa HepG2

  7. M vs. A Plot • M = log(Red) - log(Green) • A = (log(Green) + log(Red)) / 2
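
The M and A definitions on this slide can be computed per spot as in the minimal sketch below. The slide does not state the base of the logarithm; base 2 is an assumption here, and the intensity values are made up for illustration.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities.

    Base-2 logs are an assumption; the slide only says "log".
    """
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g            # log ratio of the two channels
    a = (log_g + log_r) / 2.0    # average log intensity
    return m, a

# Hypothetical (red, green) intensities for three spots.
spots = [(1200.0, 300.0), (500.0, 500.0), (80.0, 640.0)]
for red, green in spots:
    m, a = ma_values(red, green)
    print(f"red={red:.0f} green={green:.0f} M={m:+.2f} A={a:.2f}")
```

A spot with equal red and green signal has M = 0, so on a well-normalized chip the cloud of points should be centered on the M = 0 line across the whole range of A.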

  8. M v A plots of chip pairs: before normalization

  9. M v A plots of chip pairs: after quantile normalization
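
Quantile normalization forces every chip to share the same intensity distribution. The sketch below is one simple way to do it, assuming each chip is a list of intensities for the same genes in the same order; ties are broken by position, which dedicated packages handle more carefully.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of chips (equal-length lists of intensities)."""
    n = len(chips[0])
    # Sort each chip, then average the values that share a rank across chips.
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [sum(s[i] for s in sorted_chips) / len(chips) for i in range(n)]

    normalized = []
    for chip in chips:
        # Gene indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace value by the mean for its rank
        normalized.append(out)
    return normalized

# Two hypothetical chips measuring the same three genes.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
for row in quantile_normalize(chips):
    print(row)
```

After normalization each chip contains exactly the same set of values, and only the assignment of values to genes differs between chips.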

  10. Types of normalization • To total signal (linear normalization) • LOESS (LOcally WEighted polynomial regreSSion) • To "housekeeping genes" • To genomic DNA spots (Research Genetics) or mixed cDNAs • To internal spikes
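
The simplest option in this list, normalization to total signal, can be sketched as a single scale factor. This assumes two channels measured on the same spots and that most genes are unchanged, so the channel totals should match; the function name and values are illustrative only.

```python
def scale_to_total(red, green):
    """Linearly rescale the red channel so its total signal matches green."""
    factor = sum(green) / sum(red)
    return [r * factor for r in red]

# Hypothetical intensities: the red channel is uniformly half as bright.
red = [100.0, 200.0, 300.0]
green = [200.0, 400.0, 600.0]
print(scale_to_total(red, green))
```

A single factor cannot correct intensity-dependent dye bias, which is the usual argument for LOESS on the M vs. A plot instead.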

  11. Microarray analysis • Data exploration: expression of gene X? • Statistical analysis: which genes show large, reproducible changes? • Clustering: grouping genes by expression pattern. • Knowledge-based analysis: Are amine synthesis genes involved in this experiment?

  12. Fold change: the crudest method of finding differentially expressed genes [Figure: HeLa vs. HepG2 scatter plot with the regions of >2-fold expression change marked]
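
A fold-change filter of the kind shown on this slide can be sketched as follows. The gene names and expression values are hypothetical, and the cutoff applies in both directions (more than 2-fold up or more than 2-fold down).

```python
import math

def fold_change_hits(expr_a, expr_b, threshold=2.0):
    """Return {gene: log2 ratio} for genes changing more than `threshold`-fold."""
    hits = {}
    for gene in expr_a:
        log_ratio = math.log2(expr_b[gene] / expr_a[gene])
        if abs(log_ratio) > math.log2(threshold):
            hits[gene] = log_ratio
    return hits

# Hypothetical normalized expression values in two cell lines.
hela = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
hepg2 = {"geneA": 450.0, "geneB": 120.0, "geneC": 20.0}
print(fold_change_hits(hela, hepg2))
```

The filter uses no replicate information at all, which is why it is crude: a noisy low-intensity spot passes as easily as a reproducible change.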

  13. What do we mean by differentially expressed? • Statistically, our gene is different from the other genes. [Figure: distribution of measurements for the gene of interest (probability of a given value of the ratio) compared with the distribution of average ratios for all genes (number of genes vs. log ratio)]

  14. Finding differentially expressed genes: What affects our certainty that a gene is up or down-regulated? • Number of sample points • Difference in means • Standard deviations of sample [Figure: probe signal in Sample A vs. Sample B]
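
The three factors on this slide are exactly the ingredients of a t statistic. The sketch below computes the equal-variance (Student's) form with the standard library; the replicate values are invented, and turning t into a p-value would need a t distribution from a statistics package.

```python
import math
import statistics

def student_t(sample_a, sample_b):
    """Equal-variance two-sample t statistic for one gene."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = statistics.mean(sample_b) - statistics.mean(sample_a)  # difference in means
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / (n_a + n_b - 2)                                           # spread of the samples
    return diff / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))  # number of points

# Hypothetical log signals for one probe, three replicates per sample.
sample_a = [1.0, 2.0, 3.0]
sample_b = [4.0, 5.0, 6.0]
print(round(student_t(sample_a, sample_b), 3))
# Same means and spread, twice the replicates: the statistic grows.
print(round(student_t(sample_a * 2, sample_b * 2), 3))
```

A larger difference in means, a smaller standard deviation, or more sample points each push the statistic further from zero.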

  15. Practical views on statistics • With appropriate biological replicates, it is possible to select statistically meaningful genes/patterns. • Sensitivity and selectivity are inversely related, e.g. selecting more true positives WILL bring in more false positives along with fewer false negatives. • False negatives are lost opportunities; false positives cost money and waste time. • A typical set of experiments, even when treated with conservative statistics, yields more genes/pathways/patterns than one can sensibly follow, so use conservative statistics to protect against false positives when designing follow-on experiments.

  16. Statistical Tests • Student’s t-test • Correct for multiple testing! (Holm-Bonferroni) • False discovery rate. • Significance Analysis of Microarrays (SAM) • http://www-stat.stanford.edu/~tibs/SAM/ • ANOVA • Principal components analysis • Special methods for periodic patterns in data.
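
Two of the corrections named on this slide can be sketched directly from a list of per-gene p-values. The Holm-Bonferroni step-down procedure controls the family-wise error rate; the false discovery rate version shown here is the Benjamini-Hochberg step-up procedure, which is an assumption since the slide does not say which FDR method it means. The p-values are made up.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject flag per p-value using the Holm step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a reject flag per p-value using the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * k / m:
            cutoff = k  # remember the largest rank that passes
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject

# Hypothetical p-values for five genes.
p = [0.001, 0.015, 0.028, 0.039, 0.5]
print(holm_bonferroni(p))
print(benjamini_hochberg(p))
```

On these values Holm keeps only the strongest gene while the FDR procedure keeps four, which illustrates the trade-off between false positives and false negatives.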

  17. Volcano plot: log(expr) vs p-value [Figure axes: log(fold change) against p-value]
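
The coordinates of a volcano plot can be computed as in this sketch. Plotting -log10(p) on the vertical axis and log2(fold change) on the horizontal axis is the common convention and is an assumption here, as are the gene names, values, and cutoffs.

```python
import math

def volcano_points(fold_changes, p_values, fc_cutoff=2.0, p_cutoff=0.05):
    """Return (gene, x, y, flagged) tuples for a volcano plot."""
    points = []
    for gene, fc in fold_changes.items():
        x = math.log2(fc)                  # horizontal axis: log fold change
        y = -math.log10(p_values[gene])    # vertical axis: significance
        flagged = abs(x) >= math.log2(fc_cutoff) and p_values[gene] <= p_cutoff
        points.append((gene, x, y, flagged))
    return points

# Hypothetical fold changes and p-values.
fold_changes = {"geneA": 4.0, "geneB": 1.1, "geneC": 0.25}
p_values = {"geneA": 0.001, "geneB": 0.6, "geneC": 0.2}
for point in volcano_points(fold_changes, p_values):
    print(point)
```

Only genes that are both large in effect and statistically significant end up in the upper corners of the plot; geneC changes 4-fold but is not flagged because its p-value is too high.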

  18. Scatter plot showing genes with significant p-values

  19. Pattern finding • In many cases, the patterns of differential expression are the target (as opposed to specific genes). • Clustering or other approaches for pattern identification: find genes which behave similarly across all experiments, or experiments which behave similarly across all genes. • Classification: identify genes which best distinguish 2 or more classes. • The statistical reliability of the pattern or classifier is still an issue and similar considerations apply, e.g. cluster analysis of random noise will still produce clusters, and they will be meaningless.

  20. What is clustering? • Group similar objects together. • Genes with similar expression patterns. • Objects in the same cluster (group) are more similar to each other than objects in different clusters.

  21. Clustering • What is clustering? • Similarity/distance metrics • Hierarchical clustering algorithms • Made popular by Stanford, i.e. [Eisen et al. 1998] • K-means • Made popular by many groups, e.g. [Tavazoie et al. 1999] • Self-organizing map (SOM) • Made popular by Whitehead, i.e. [Tamayo et al. 1999]
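
Of the algorithms listed, K-means is the shortest to write down. The sketch below is a bare-bones version using Euclidean distance and a naive initialization (the first k profiles as starting centers); real analyses usually use random restarts. The expression profiles are invented.

```python
import math

def kmeans(profiles, k, iterations=20):
    """Cluster expression profiles into k groups; returns (labels, centers)."""
    centers = [list(p) for p in profiles[:k]]  # naive start: first k profiles
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in profiles
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical profiles: two genes rising over time, two falling.
profiles = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [3.0, 2.0, 1.0],
    [2.9, 2.1, 1.2],
]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

The number of clusters k has to be chosen in advance, which is one practical difference from hierarchical clustering.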

  22. Typical Tools • SAM (Significance Analysis of Microarrays), Stanford • GeneSpring • Affymetrix GeneChip Operating System (GCOS) • Cluster/Treeview • R statistics package microarray analysis libraries.

  23. How to define similarity? • Similarity metric: • A measure of pairwise similarity or dissimilarity • Examples: • Correlation coefficient • Euclidean distance [Figure: raw matrix (n genes x p experiments) converted into a similarity matrix (n genes x n genes)]

  24. Similarity metrics • Euclidean distance • Correlation coefficient • Euclidean clustering = magnitude & direction • Correlation clustering = direction
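
The difference between the two metrics shows up clearly on two genes with the same shape but different magnitude. This is a small sketch with made-up profiles; `pearson` is written out by hand to keep it dependency-free.

```python
import math

def euclidean(x, y):
    """Euclidean distance: sensitive to both magnitude and direction."""
    return math.dist(x, y)

def pearson(x, y):
    """Pearson correlation coefficient: sensitive to direction only."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical profiles: same trend across experiments, 10x different scale.
gene_low = [1.0, 2.0, 3.0, 4.0]
gene_high = [10.0, 20.0, 30.0, 40.0]
print("Euclidean distance:", round(euclidean(gene_low, gene_high), 2))
print("Correlation:", round(pearson(gene_low, gene_high), 2))
```

The two genes are far apart by Euclidean distance but perfectly correlated, so a correlation-based clustering would group them while a Euclidean one would not.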

  25. Sporulation-example

  26. Sporulation-example

  27. Self-organizing maps (SOM) [Kohonen 1995] • Basic idea: • map high dimensional data onto a 2D grid of nodes • Neighboring nodes are more similar than points far away
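
The basic idea can be sketched in a few lines. This is a simplified SOM under several assumptions (rectangular grid, Gaussian neighborhood, linearly decaying learning rate and radius, nodes initialized from random data points), not the exact procedure of any particular package; the profiles are invented.

```python
import math
import random

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, started at a randomly chosen data point.
    nodes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 toward 0
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in rng.sample(data, len(data)):  # visit the data in random order
            # Best matching unit: the node whose weights are closest to x.
            bmu = min(nodes, key=lambda n: math.dist(x, nodes[n]))
            for node, w in nodes.items():
                grid_dist = math.dist(node, bmu)  # distance on the 2D grid
                if grid_dist <= radius:
                    influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for i in range(dim):
                        w[i] += lr * influence * (x[i] - w[i])
    return nodes

def best_node(nodes, x):
    """Grid position of the node closest to profile x."""
    return min(nodes, key=lambda n: math.dist(x, nodes[n]))

# Hypothetical expression profiles.
data = [[0.0, 0.1, 0.9], [0.1, 0.0, 1.0], [1.0, 0.9, 0.0], [0.9, 1.0, 0.1]]
som = train_som(data, rows=2, cols=2)
for profile in data:
    print(profile, "->", best_node(som, profile))
```

Because neighbors of the best matching unit are pulled along with it, nearby grid nodes end up with similar weight vectors, which is what makes the 2D layout interpretable.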

  28. Self-organizing maps (SOM)

  29. SOM Clusters

  30. Things learned from microarray gene expression experiments • Pathways not known to be involved • Ontology? • Novel genes involved in a known pathway • "like" and "unlike" tissues

  31. Transcription Factors / Regulatory Networks • Identify co-regulated genes • Search for common motifs (transcription factor binding sites) • Evaluate known motifs/factors • Search for new ones. • Programs: MEME, etc.
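
As a toy illustration of searching for common motifs, the sketch below counts exact k-mers shared across the promoters of a co-regulated cluster. This is far simpler than MEME, which fits probabilistic motif models, and the gene names and sequences are invented.

```python
from collections import Counter

def shared_kmers(promoters, k=7):
    """Return (k-mer, number of promoters containing it), most common first."""
    counts = Counter()
    for seq in promoters.values():
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)  # the set ensures one count per promoter
    return counts.most_common()

# Hypothetical promoter fragments from three co-regulated genes.
promoters = {
    "gene1": "AATGACTCATT",
    "gene2": "CCTGACTCAGG",
    "gene3": "GGGTGACTCAC",
}
top_kmer, n_promoters = shared_kmers(promoters)[0]
print(top_kmer, n_promoters)
```

Exact matching misses degenerate binding sites, so a real search would allow mismatches and compare against background sequence.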

  32. mRNA-protein Correlation • YPD: should have relevant data • Will yeast be typical? • Electrophoresis 18:533 • 23 proteins on 2D gels • r = 0.48 for mRNA vs. protein • Post-transcriptional and post-translational regulation important!

  33. Other microarray formats • Single nucleotide polymorphism (SNP) chips • Oligos with each of 4 nt at each SNP. • Chromatin IP chips (ChIP:chip) • Determine transcription factor binding sites • Promoter DNA on the chip. • Alternative splicing chips • Long oligos, covering alternatively spliced exons, or all exons. • Genome tiling chips

  34. ChIP:chip: Identification of Transcription Factor Binding Sites • Cross-link transcription factors to DNA with formaldehyde. • Pull out the transcription factor of interest via immunoprecipitation with an antibody, or by tagging the factor of interest with an isolatable epitope (e.g. GST fusion). • Fractionate the DNA associated with the transcription factor, reverse the cross-links, label, and hybridize to an array of promoter DNA. • Brown et al. (2001) Nature 409:533-8

  35. ChIP:chip: Analysis of TF Binding Sites

  36. On to Proteomics: DNA -> RNA -> Protein
