Introduction to Microarray

Dr G. P. S. Raghava

Molecular Biology Overview

[Figure labels: Nucleus, Cell, Chromosome, Gene (DNA), Gene (mRNA, single strand), Protein]


Measuring Gene Expression

Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein would be more direct, but is currently harder.

The Goals
  • Basic Understanding
    • Arrays can take a snapshot of which subset of genes in a cell is actively making proteins
    • Heat shock experiments
  • Medical diagnosis
    • Microarrays can indicate where mutations lie that might be linked to a disease. Still others are used to determine whether a person’s genetic profile would make him or her more or less susceptible to drug side effects
    • 1999 – A genechip containing 6800 human genes was used to distinguish between myeloid leukemia and lymphoblastic leukemia using a set of 50 genes that have different activity levels
  • Drug design
    • Pharmaceutical firms are in a rush to translate the human genome results into new products
      • Potential profits are huge
      • First, though, they must figure out what the genes do, how they interact, and how they relate to diseases.
    • Evaluation, Specificity, Response
Microarray Potential Applications
  • Biological discovery
    • new and better molecular diagnostics
    • new molecular targets for therapy
    • finding and refining biological pathways
  • Recent examples
    • molecular diagnosis of leukemia, breast cancer, ...
    • appropriate treatment for genetic signature
    • potential new drug targets


1980s: antibody-based assay (protein chip?)

~1991: high-density DNA-synthetic chemistry (Affymetrix/oligo chips)

~1995: microspotting (Stanford Univ/cDNA chips)

replacing porous surface with solid surface

replacing radioactive label with fluorescent label

improvement on sensitivity

What is a DNA Microarray?

Genes or gene fragments attached to a substrate (glass)

Tens of thousands of spots/genes

= entire genome in 1 experiment

A Revolution in Biology

Hybridized slide

Two dyes

Image analyzed

Gene Expression Microarrays

The main types of gene expression microarrays:

  • Short oligonucleotide arrays (Affymetrix)
  • cDNA or spotted arrays (Brown/Botstein)
  • Long oligonucleotide arrays (Agilent Inkjet)
Terms/Jargon

Stanford/cDNA chip
  • one slide/experiment
  • one spot
  • 1 gene => one spot or a few spots (replicates)
  • control: control spots
  • control: two fluorescent dyes (Cy3/Cy5)

Affymetrix/oligo chip
  • one chip/experiment
  • one probe/feature/cell
  • 1 gene => many probes (20~25 mers)
  • control: match and mismatch cells

Affymetrix Microarrays

Raw image


~10^7 oligonucleotides,

half perfectly match mRNA (PM),

half have one mismatch (MM)

Raw gene expression is the intensity difference: PM - MM
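As a rough illustration of the PM - MM idea, the sketch below averages the PM - MM differences over the probe pairs of one gene. The intensities and the function name `average_difference` are made up for illustration, and averaging across pairs is an assumption: the slide only states that the raw signal is the difference.

```python
def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one gene (illustrative only)."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equally sized, non-empty PM and MM lists")
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)


# Hypothetical intensities for four probe pairs of a single gene.
pm = [1200, 950, 1430, 1100]   # perfect-match cells
mm = [300, 410, 380, 290]      # mismatch cells
signal = average_difference(pm, mm)
print(signal)
```

A negative result is possible when the mismatch cells are brighter than the perfect-match cells; how to treat that case is left open here.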

DNA Microarrays
  • Each probe consists of thousands of strands of identical oligonucleotides
    • The DNA sequences at each probe represent important genes (or parts of genes)
  • Printing Systems
    • Ex: HP, Corning Inc.
    • Printing systems can build lengths of DNA up to 60 nucleotides long
    • 1.28 x 1.28+ cm glass wafer
      • Each “print head” has a ~100 µm diameter, and the heads are separated by ~100 µm (5,000 – 20,000 probes)
  • Photolithographic Chips
    • Ex: Affymetrix
    • 1.28 x 1.28 cm glass/silicon wafer
      • 24 x 24 µm probe site (500,000 probes)
    • Lengths of DNA up to 25 nucleotides long
    • Requires a new set of masks for each new array type
The Process

[Figure labels: 10% biotin-labeled uracil; antisense cRNA; fragment (heat, Mg2+)]



Hybridization and Staining

[Figure labels: labeled cRNA; hybridized array]

Microarray Data
  • First, the Problems:
    • The fabrication process is not error free
    • Probes have a maximum length of 25-60 nucleotides
    • Biological processes such as hybridization are stochastic
    • Background light may skew the fluorescence
    • How do we decide if/how strongly a particular gene is being expressed?
  • Solutions to these problems are still in their infancy
Affymetrix “Gene chip” system
  • Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
  • RNA labeled and scanned in a single “color”
    • one sample per chip
  • Can have as many as 20,000 genes on a chip
  • Arrays get smaller every year (more genes)
  • Chips are expensive
  • Proprietary system: “black box” software, can only use their chips
cDNA Microarray Technologies
  • Spot cloned cDNAs onto a glass microscope slide
    • usually PCR amplified segments of plasmids
  • Label 2 RNA samples with 2 different colors of fluorescent dye - control vs. experimental
  • Mix two labeled RNAs and hybridize to the chip
  • Make two scans - one for each color
  • Combine the images to calculate ratios of amounts of each RNA that bind to each spot
cDNA microarrays


cDNA from one gene on each spot


cDNA labelled red/green

Compare the genetic expression in two samples of cells

e.g. treatment/control

normal / tumor tissue



Add equal amounts of labelled cDNA samples to microarray.




“Long Oligos”
  • Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene
  • Relies on genome sequence database and bioinformatics
  • Reduces cross hybridization
  • Cheaper and possibly more sensitive than Affy. system
Images from scanner
  • Resolution
    • standard 10m [currently, max 5m]
    • 100m spot on chip = 10 pixels in diameter
  • Image format
    • TIFF (tagged image file format) 16 bit (65’536 levels of grey)
    • 1cm x 1cm image at 16 bit = 2Mb (uncompressed)
    • other formats exist e.g.. SCN (used at Stanford University)
  • Separate image for each fluorescent sample
    • channel 1, channel 2, etc.
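The scanner figures above can be checked with a little arithmetic. This is a minimal sketch; the helper names `image_size_bytes` and `spot_diameter_pixels` are hypothetical, and it assumes square pixels and no file header overhead.

```python
def image_size_bytes(width_cm, height_cm, pixel_um, bits_per_pixel):
    """Uncompressed size in bytes of one scan at the given pixel size."""
    pixels_x = round(width_cm * 10_000 / pixel_um)   # 1 cm = 10,000 µm
    pixels_y = round(height_cm * 10_000 / pixel_um)
    return pixels_x * pixels_y * bits_per_pixel // 8


def spot_diameter_pixels(spot_um, pixel_um):
    """Diameter of a spot measured in pixels."""
    return spot_um / pixel_um


grey_levels = 2 ** 16                        # 16-bit image
size = image_size_bytes(1, 1, 10, 16)        # 1 cm x 1 cm at 10 µm pixels
diameter = spot_diameter_pixels(100, 10)     # 100 µm spot at 10 µm pixels
print(grey_levels, size, diameter)
```

Halving the pixel size to 5 µm quadruples the pixel count, so the same 1 cm x 1 cm scan would take 8 MB per channel.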
Processing of images
  • Addressing or gridding
    • Assigning coordinates to each of the spots
  • Segmentation
    • Classification of pixels either as foreground or as background
  • Intensity determination for each spot
    • Foreground fluorescence intensity pairs (R, G)
    • Background intensities
    • Quality measures
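The segmentation and intensity steps can be sketched for a single spot. This assumes gridding has already cut out a small patch of pixels around the spot, and it uses a fixed threshold plus the median as the summary; both choices, the patch values, and the function names are illustrative assumptions, not the method of any particular image-analysis package.

```python
from statistics import median


def segment_spot(patch, threshold):
    """Classify each pixel of one gridded patch as foreground or background."""
    foreground, background = [], []
    for row in patch:
        for value in row:
            (foreground if value >= threshold else background).append(value)
    return foreground, background


def spot_intensities(patch, threshold):
    """Return (foreground, background) intensity for one spot, using medians."""
    fg, bg = segment_spot(patch, threshold)
    # median() raises StatisticsError if either group is empty.
    return median(fg), median(bg)


# Hypothetical 4 x 4 patch from one channel: a bright spot on a dim background.
patch = [
    [100, 110, 120, 105],
    [115, 900, 950, 100],
    [108, 920, 980, 112],
    [102, 109, 111, 104],
]
fg, bg = spot_intensities(patch, threshold=500)
print(fg, bg)
```

Running the same routine on the red and green scans of a spot gives the foreground pair (R, G) and the matching background intensities.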
Images in analysis software
  • The two 16-bit images (Cy3, Cy5) are compressed into 8-bit images
  • Display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image
  • RGB image :
    • Blue values (B) are set to 0
    • Red values (R) are used for Cy5 intensities
    • Green values (G) are used for Cy3 intensities
  • Qualitative representation of results
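A minimal sketch of the overlay described above follows. The slide does not say how the 16-bit values are compressed to 8 bits; dropping the low byte is an assumption made here for simplicity, and the function names and pixel values are hypothetical.

```python
def to_8bit(value16):
    """Compress a 16-bit intensity to 8 bits by dropping the low byte (assumed scheme)."""
    return value16 >> 8


def overlay_pixel(cy5_16, cy3_16):
    """Return an (R, G, B) tuple: R from Cy5, G from Cy3, B fixed at 0."""
    return (to_8bit(cy5_16), to_8bit(cy3_16), 0)


def overlay_image(cy5, cy3):
    """Combine two equally sized 16-bit images into one 24-bit RGB image."""
    return [
        [overlay_pixel(r, g) for r, g in zip(row5, row3)]
        for row5, row3 in zip(cy5, cy3)
    ]


# Hypothetical 2 x 2 scans of the same area in each channel.
cy5 = [[65535, 0], [32768, 256]]
cy3 = [[0, 65535], [32768, 512]]
rgb = overlay_image(cy5, cy3)
print(rgb)
```

Pixels with similar Cy5 and Cy3 intensities come out yellow, which is why the overlay gives only a qualitative picture of the results.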
Quantification of expression

For each spot on the slide we calculate

Red intensity = Rfg - Rbg

(fg = foreground, bg = background) and

Green intensity = Gfg - Gbg

and combine them in the log (base 2) ratio

Log2(Red intensity / Green intensity)
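The calculation above can be written out for one spot. The input values are made up, and returning `None` when a background-corrected intensity is not positive is an assumption of this sketch; the slide does not say how such spots are handled.

```python
import math


def log2_ratio(r_fg, r_bg, g_fg, g_bg):
    """Log (base 2) ratio of background-corrected red and green intensities."""
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None   # ratio undefined; this handling is an assumption
    return math.log2(red / green)


# Hypothetical spot: red 900 over background 100, green 300 over background 100.
value = log2_ratio(900, 100, 300, 100)
print(value)
```

On the log scale a fourfold increase gives +2 and a fourfold decrease gives -2, so up- and down-regulation are treated symmetrically.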

Gene Expression Data


On p genes for n slides: p is O(10,000), n is O(10-100), but growing.

Gene   slide 1   slide 2   slide 3   slide 4   slide 5   …
1       0.46      0.30      0.80      1.51      0.90     ...
2      -0.10      0.49      0.24      0.06      0.46     ...
3       0.15      0.74      0.04      0.10      0.20     ...
4      -0.45     -1.03     -0.79     -0.56     -0.32     ...
5      -0.06      1.06      1.35      1.09     -1.09     ...

Example: the gene expression level of gene 5 in slide 4 is 1.09.

Each value is Log2(Red intensity / Green intensity).

These values are conventionally displayed on a red (> 0), yellow (0), green (< 0) scale.
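The data matrix above can be held as a list of rows (genes) by columns (slides). The sketch below looks up one entry and maps it to the display convention; the function names are hypothetical and the three-way colour label is a simplification of what is in practice a continuous colour scale.

```python
# Rows are genes 1-5, columns are slides 1-5, values are log2(red / green).
log_ratios = [
    [0.46, 0.30, 0.80, 1.51, 0.90],
    [-0.10, 0.49, 0.24, 0.06, 0.46],
    [0.15, 0.74, 0.04, 0.10, 0.20],
    [-0.45, -1.03, -0.79, -0.56, -0.32],
    [-0.06, 1.06, 1.35, 1.09, -1.09],
]


def expression(gene, slide):
    """Look up a value using the 1-based gene and slide numbers from the table."""
    return log_ratios[gene - 1][slide - 1]


def display_colour(value):
    """Map a log ratio to the conventional colour: red > 0, yellow = 0, green < 0."""
    if value > 0:
        return "red"
    if value < 0:
        return "green"
    return "yellow"


value = expression(5, 4)
print(value, display_colour(value))
```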


Biological question

Differentially expressed genes

Sample class prediction etc.

Experimental design

Microarray experiment

16-bit TIFF files

Image analysis

(Rfg, Rbg), (Gfg, Gbg)


R, G





Biological verification and interpretation

Quality control (-> Flag)
  • How good are foreground and background measurements ?
    • Variability measures in pixel values within each spot mask
    • Spot size
    • Circularity measure
    • Relative signal to background intensity
    • Dapple:
      • b-value : fraction of background intensities less than the median foreground intensity
      • p-score : extent to which the position of a spot deviates from a rigid rectangular grid
  • Flag spots based on these criteria
  • Why?
    • To reduce variability
    • To increase generalizability
  • What is it?
    • Duplicate spots
    • Duplicate slides
      • Technical replicates
      • Biological replicates
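Returning to the spot-quality measures listed under quality control, the b-value can be sketched directly from its definition on the slide. The pixel values, the `flag_spot` helper, and the 0.9 cutoff are hypothetical; the slide does not give a threshold.

```python
from statistics import median


def b_value(foreground, background):
    """Fraction of background intensities below the median foreground intensity."""
    fg_median = median(foreground)
    below = sum(1 for b in background if b < fg_median)
    return below / len(background)


def flag_spot(foreground, background, min_b=0.9):
    """Flag a spot whose b-value falls below a cutoff (cutoff is an assumption)."""
    return b_value(foreground, background) < min_b


# Hypothetical pixel intensities for one spot and its local background.
foreground = [900, 950, 920, 980]
background = [100, 110, 1000, 120]   # one bright background pixel
b = b_value(foreground, background)
print(b, flag_spot(foreground, background))
```

A b-value near 1 means the spot stands clearly above its background; lower values suggest a weak spot or a contaminated background region.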
Practical Application of DNA Microarrays
  • DNA Microarrays are used to study gene activity (expression)
    • What proteins are being actively produced by a group of cells?
      • “Which genes are being expressed?”
  • How?
    • When a cell is making a protein, it transcribes the genes (made of DNA) that code for the protein into RNA used in its production
    • The RNA present in a cell can be extracted
    • If a gene has been expressed in a cell
      • RNA will bind to “a copy of itself” on the array
      • RNA with no complementary site will wash off the array
    • The RNA can be “tagged” with a fluorescent dye to determine its presence
  • DNA microarrays provide a high-throughput technique for quantifying the presence of specific RNA sequences
Analysis and Management of Microarray Data
  • Magnitude of Data
    • Experiments
      • 50,000 genes in human
      • 320 cell types
      • 2,000 compounds
      • 3 time points
      • 2 concentrations
      • 2 replicates
    • Data Volume
      • 4 x 10^11 data points
      • 10^15 bytes = 1 petabyte of data
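The data-point count is simply the product of the experimental factors listed above. A quick check, with the dictionary keys chosen here for readability:

```python
import math

# Experimental factors as listed on the slide.
factors = {
    "genes": 50_000,
    "cell_types": 320,
    "compounds": 2_000,
    "time_points": 3,
    "concentrations": 2,
    "replicates": 2,
}

data_points = math.prod(factors.values())
print(data_points)            # exact product
print(f"{data_points:.1e}")   # same number in scientific notation
```

The exact product is 3.84 x 10^11, which the slide rounds to 4 x 10^11.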