
Genome-wide Copy Number Analysis

Qunyuan Zhang, Ph.D.

Division of Statistical Genomics

Department of Genetics & Center for Genome Sciences

Washington University School of Medicine

02-08-2006

Course: M 21-621 Computational Statistical Genetics



Four Questions

  • What is Copy Number?

  • What can Copy Number tell us?

  • How to measure/quantify Copy Number?

  • How to analyze Copy Number?



What is Copy Number?

  • Gene Copy Number

    The gene copy number (also "copy number variants" or CNVs) is the number of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells. For instance, the EGFR copy number can be higher than normal in non-small cell lung cancer. … Elevating the copy number of a particular gene can increase the expression of the protein that it encodes.

    From Wikipedia www.wikipedia.org



  • DNA Copy Number

    A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase or larger.

    From Nature Reviews Genetics, Feuk et al. 2006

  • DNA Copy Number ≠ DNA Tandem Repeat Number (e.g., microsatellites, < 10 bases)

  • DNA Copy Number ≠ RNA Copy Number

  • RNA Copy Number = Gene Expression Level

    DNA → (transcription) → mRNA

  • Copy Number is the number of copies of a particular fragment of a nucleic acid molecule. It refers to DNA Copy Number in most publications.



What can Copy Number tell us?

  • Genetic Diversity/Polymorphisms

    - restriction fragment length polymorphism (RFLP)

    - amplified fragment length polymorphism (AFLP)

    - random amplification of polymorphic DNA (RAPD)

    - variable number of tandem repeats (VNTR; e.g., mini- and microsatellites)

    - single nucleotide polymorphism (SNP)

    - presence/absence of transposable elements

    - structural alterations (e.g., deletions, duplications, inversions … )

    - DNA copy number variant (CNV)

    Association with phenotypes/diseases, genes/genetic factors


Genetic Alterations in Tumor Cells (DNA Copy Number Changes)

Normal cell: CN=2

Contributing mechanisms:

  • Homologous repeats

  • Segmental duplications

  • Chromosomal rearrangements

  • Duplicative transpositions

  • Non-allelic recombinations

  • ……

Tumor cells: deletion (CN=0, CN=1), no change (CN=2), amplification (CN=3, CN=4)


How to measure/quantify Copy Number?

  • Quantitative Polymerase Chain Reaction (Q-PCR): DNA Amplification

    PCR reagents: dNTPs, primers, Taq polymerase, fluorescent dye

    less CN → (PCR amplification) → less DNA → low fluorescent intensity

    more CN → (PCR amplification) → more DNA → high fluorescent intensity

    (one fragment each time)

  • Microarray: DNA Hybridization

    PCR reagents: dNTPs, primers, Taq polymerase, fluorescent dye

    less CN → (PCR amplification) → less DNA → (hybridization to arrayed probes) → low intensities

    more CN → (PCR amplification) → more DNA → (hybridization to arrayed probes) → high intensities

    (multiple/different fragments, mixed pool)


Microarray: From Image to Copy Number

Affymetrix Mapping 250K Sty-I chip: ~250K probe sets (~250K SNPs); probe set (24 probes)

[Figure: tumor and normal chip images. Labels: CN=2 (no change); deletion (CN=1, CN=0); amplification (CN>2)]

more DNA copy number → more DNA hybridization → higher intensity


How to Analyze Copy Number?

  • A Real Example

    ~400 cancer patients

    → Normal tissue & tumor tissue (~400 pairs, ~800 DNA samples)

    → Affymetrix 250K Sty-I Human Mapping SNP Array

    → DNA hybridization signals (intensities on chip images)

    → Genotype calling → SNP genotypes → LOH analysis (genotypic changes)

    → ? → DNA copy number analysis (DNA copy number changes)


  • General Procedures for Copy Number Analysis

    Finished chips → (scanner) → Raw image data [.DAT files]

    Raw image data + experiment info [.EXP] → (image processing software) → Probe level raw intensity data [.CEL files]

    Preprocessing (with chip description file [.CDF]): Background adjustment, Normalization, Summarization

    → Summarized intensity data

    → Raw copy number (CN) data [log ratio of tumor/normal intensities]

    → Significance test of CN changes

    → Estimation of CN

    → Smoothing and boundary determination

    → Concurrent regions among population

    → Amplification and deletion frequencies among populations

    → Association analysis



Background Adjustment/Correction

Reduces unevenness of a single chip

Makes intensities of different positions on a chip comparable

[Figure: chip image before adjustment and after adjustment]

Corrected Intensity (S’) = Observed Intensity (S) – Background Intensity (B)

For each region i, B(i) = Mean of the lowest 2% intensities in region i (Affymetrix MAS 5.0)
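The region-wise formula above can be sketched in a few lines. This is only an illustration of the slide's formula, not the full MAS 5.0 procedure, which involves further steps. The function names, the list-of-regions input layout, and the rule of using at least one value for the 2% tail are assumptions made for this example.

```python
import math

def region_background(intensities, fraction=0.02):
    """B(i): mean of the lowest `fraction` of intensities in one region."""
    ordered = sorted(intensities)
    # Assumption: always use at least one value, even in tiny regions.
    k = max(1, math.ceil(len(ordered) * fraction))
    return sum(ordered[:k]) / k

def background_correct(regions, fraction=0.02):
    """Apply S' = S - B(i) to every intensity in each region i."""
    corrected = []
    for intensities in regions:
        b = region_background(intensities, fraction)
        corrected.append([s - b for s in intensities])
    return corrected
```

Because B(i) is computed per region, a uniformly brighter corner of the chip is pulled back toward the level of the other regions.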



  • Background Adjustment/Correction

Eliminates non-specific hybridization signal

Obtains accurate intensity values for specific hybridization

[Figure labels: sense or antisense strands; 25 oligonucleotide probes; quartet; probe set]

PM only, PM-MM, Ideal MM, etc. (PM: perfect-match probe; MM: mismatch probe)


Normalization

Reduces technical variation between chips

Makes intensities from different chips comparable

S’ = (S – Mean of S) / (STD of S)

S’ ~ N(0, 1)

Base Line Array (linear); Quantile Normalization; Contrast Normalization; etc.

[Figure: intensities before normalization and after normalization]
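As a rough sketch of two of the approaches named on this slide, the code below standardizes one chip to mean 0 and SD 1 and, separately, applies a basic quantile normalization across chips. The function names are made up for illustration, the sample SD is an assumption (the slide does not say which SD is meant), and ties are handled naively.

```python
import statistics

def standardize_chip(intensities):
    """S' = (S - mean of S) / (SD of S) for one chip."""
    mean = statistics.mean(intensities)
    sd = statistics.stdev(intensities)  # sample SD; needs >= 2 values
    if sd == 0:
        raise ValueError("all intensities are identical")
    return [(s - mean) / sd for s in intensities]

def quantile_normalize(chips):
    """Give every chip the same distribution: the mean of the sorted chips."""
    n = len(chips[0])
    if any(len(chip) != n for chip in chips):
        raise ValueError("all chips must have the same number of probes")
    sorted_chips = [sorted(chip) for chip in chips]
    # Mean intensity at each rank, taken across chips.
    rank_means = [sum(sc[r] for sc in sorted_chips) / len(chips)
                  for r in range(n)]
    result = []
    for chip in chips:
        order = sorted(range(n), key=lambda idx: chip[idx])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result
```

Standardization only matches the first two moments between chips, whereas quantile normalization forces the whole intensity distribution to agree.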



  • Summarization

Combines the multiple probe intensities for each probe set to produce a summarized value for subsequent analyses.

Average methods: PM only or PM-MM, allele specific or non-specific

Model-based method: Li & Wong, 2001 (Gene Expression Index)
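A minimal sketch of the average-based summarization described above, assuming each probe set arrives as a list of PM intensities with an optional matching list of MM intensities. The function name and input layout are hypothetical; the model-based method of Li & Wong is not shown.

```python
def summarize_probe_set(pm, mm=None):
    """Average PM (if mm is None) or PM-MM over the probes of one probe set."""
    if mm is None:
        values = list(pm)
    else:
        if len(pm) != len(mm):
            raise ValueError("pm and mm must have the same length")
        values = [p - m for p, m in zip(pm, mm)]
    if not values:
        raise ValueError("empty probe set")
    return sum(values) / len(values)
```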


Raw Copy Number Data

S: Summarized raw intensity

S’: Log transformation, S’ = log2(S)

Raw CN: Log ratio of tumor/normal intensities

CN = S’tumor - S’normal = log2(Stumor/Snormal)

Pair design: Snormal = S of the paired normal sample

Group design: Snormal = average S of the group of normal samples

[Figures: S before log transformation; Log(S) after log transformation; raw CN]
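The two designs differ only in what is used as the normal reference. The sketch below assumes summarized intensities are given as lists ordered by SNP and strictly positive; the function names are hypothetical.

```python
import math

def raw_cn_paired(tumor, normal):
    """Pair design: raw CN = log2(S_tumor / S_normal), SNP by SNP."""
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must cover the same SNPs")
    return [math.log2(t / n) for t, n in zip(tumor, normal)]

def raw_cn_group(tumor, normals):
    """Group design: the reference is the average S over all normal samples."""
    n_snps = len(tumor)
    reference = [sum(sample[j] for sample in normals) / len(normals)
                 for j in range(n_snps)]
    return raw_cn_paired(tumor, reference)
```

With this definition a raw CN of 0 means no change relative to the reference, positive values suggest amplification, and negative values suggest deletion.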



Individual Level Analysis

Analysis for each individual sample (or each sample pair)

  • Significance test of CN amplification and deletion

  • Boundary finding (smoothing and segmentation)

  • CN estimation


Intensities and Raw CNs, Chr. 1 (Pair #101)

[Figure: Black: Normal; Red: Tumor; Green: Tumor - Normal]


Significance Test for Copy Number Changes: -log(p) values, Chr. 1, Pair #101

Window-based t test

Window size = 0.5 Mbp (~30 SNPs); N = SNP number in window

t = (Mean CN of window / SD of window) × √N ~ t(df = N - 1)

[Figure: -log(p) plotted against window position (Mbp)]
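A small sketch of the window statistic, assuming the one-sample form t = mean / (SD / √N) with N - 1 degrees of freedom, and assuming non-overlapping 0.5 Mbp windows defined by base-pair position. The function names are hypothetical. Converting t to a p-value needs the t distribution, which the Python standard library does not provide, so that step is left out.

```python
import math
import statistics

def window_t(raw_cn):
    """Return (t, df) for the raw CN values of one window."""
    n = len(raw_cn)
    if n < 2:
        raise ValueError("need at least two SNPs in a window")
    sd = statistics.stdev(raw_cn)
    if sd == 0:
        raise ValueError("zero variance in window")
    return statistics.mean(raw_cn) / sd * math.sqrt(n), n - 1

def window_t_by_position(positions, raw_cn, window_bp=500_000):
    """Group SNPs into fixed-size windows and test each one.

    Returns (window_start_bp, t, df) tuples; windows with fewer than
    two SNPs or with zero variance are skipped.
    """
    windows = {}
    for pos, cn in zip(positions, raw_cn):
        windows.setdefault(pos // window_bp, []).append(cn)
    results = []
    for index in sorted(windows):
        values = windows[index]
        if len(values) >= 2 and statistics.stdev(values) > 0:
            t, df = window_t(values)
            results.append((index * window_bp, t, df))
    return results
```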


Genome-wide Raw CN Changes (Pair #105)


Genome-wide Window-based Test of CN Changes (Pair #105)

[Figure: -log(p) along the genome]


Segmentation

Bioconductor R Packages (www.bioconductor.org)

  • GLAD package, adaptive weights smoothing (AWS) method

  • DNAcopy package, circular binary segmentation method


CN Estimation: Hidden Markov Model (HMM)

CNAT (www.affymetrix.com); dChip (www.dchip.org); CNAG (www.genome.umin.jp)

[Diagram: along positions … SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4 …, the hidden status at each SNP is the unknown CN (CN=?) and the observed status is the raw CN (log ratio of intensities)]

CN estimation: finding a sequence of CN values which maximizes the likelihood of observed raw CN.

Algorithm: Viterbi algorithm (can be iterative)

Information/assumptions below are needed:

Background probabilities: Overall probabilities of possible CN values.

P(CN = x); x = -2, -1, 0, 1, 2, 3, …, n (usually, n < 10)

Transition probabilities: Probabilities of CN values of each SNP conditional on the previous one.

P(CN_(i+1) = x | CN_i = y); x = -2, -1, 0, 1, 2, 3, …, or n; y = -2, -1, 0, 1, 2, 3, …, or n

Emission probabilities: Probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status.

P(log ratio < x | CN = y) = f(x | CN = y); x = one of real numbers; y = -2, -1, 0, 1, 2, 3, …, or n
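To make the three ingredients concrete, here is a minimal Viterbi sketch in log space. Everything numeric in it is an assumption chosen for illustration, not taken from CNAT, dChip or CNAG: the state set CN = 1..4, uniform background probabilities, a single "stay" probability for transitions, and Gaussian emissions centred at log2(CN/2) with a common SD.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi_cn(log_ratios, states=(1, 2, 3, 4), stay=0.99, sd=0.25):
    """Most probable CN path for a sequence of raw CN (log ratio) values."""
    if not log_ratios:
        return []
    k = len(states)
    means = [math.log2(cn / 2) for cn in states]   # assumed emission means
    switch = (1.0 - stay) / (k - 1)                # assumed transition model
    log_trans = [[math.log(stay if i == j else switch) for j in range(k)]
                 for i in range(k)]
    # Initialization with uniform background probabilities (assumption).
    score = [math.log(1.0 / k) + gaussian_logpdf(log_ratios[0], means[j], sd)
             for j in range(k)]
    back = []
    for x in log_ratios[1:]:
        new_score, pointers = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + gaussian_logpdf(x, means[j], sd))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]
```

A high "stay" probability is what smooths the estimate: a single noisy SNP is not enough to pay for two state switches, but a sustained run of shifted log ratios is.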


HMM Estimation of CN for Chr. 1 (Pair #101)

[Figure: Black: Normal Intensities; Red: Tumor Intensities; Green: Tumor - Normal; Blue: HMM estimated CNs in Tumor Tissue (levels CN=1, CN=2, CN=3, CN=4)]



Population Level Analysis

Analysis for the whole group (or sub-group) of samples

  • Overall significance test

  • Amplification and deletion frequencies summarization

  • Common/concurrent region finding

  • Associations (with mutations, LOHs, clinical variables …)


Genome-wide Raw CN Changes (average over ~400 pairs)


Raw CN Changes of Chr. 14 (average over ~400 pairs)


Sliding Window Analysis

[Figure: SNPs along the genome covered by overlapping windows: Window 1, Window 2, Window 3, …, Window k, …, Window N]

Each window (k) contains 30 consecutive SNPs (k, k+1, k+2, k+3, …, k+29)
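A minimal sketch of the sliding window, assuming a step of one SNP so that window k covers SNPs k through k+29, and assuming the quantity of interest is the mean raw CN per window. The function name is hypothetical.

```python
def sliding_window_means(values, width=30):
    """Mean of each run of `width` consecutive values, stepping by one."""
    if width < 1:
        raise ValueError("width must be positive")
    if len(values) < width:
        return []
    total = sum(values[:width])
    means = [total / width]
    for k in range(width, len(values)):
        # Slide by one SNP: add the new value, drop the oldest one.
        total += values[k] - values[k - width]
        means.append(total / width)
    return means
```

For a chromosome with M SNPs this yields M - 29 windows at the default width.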


Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs)



Sliding Window Test of Significance of CN Changes

-log(p) values, based on ~ 400 pairs


CN Change Frequencies in Population (Chr. 14, ~400 pairs)

[Figure: Black: Freq.(CN>0); Red: Freq.(CN>0, significant amplification at 0.01 level); Green: Freq.(CN<0, significant deletion at 0.01 level)]
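The three frequencies in the legend can be tabulated per SNP (or per window) as below. The sketch assumes raw CN values and p-values are available as sample-by-locus nested lists from an earlier per-sample test; the function name, the input layout, and the output keys are hypothetical.

```python
def cn_change_frequencies(raw_cn, p_values, alpha=0.01):
    """Per-locus fractions of samples with CN>0, significant gain, significant loss.

    raw_cn[s][j] and p_values[s][j] refer to sample s and locus j.
    """
    n_samples = len(raw_cn)
    n_loci = len(raw_cn[0])
    summary = []
    for j in range(n_loci):
        gain = sig_amp = sig_del = 0
        for s in range(n_samples):
            cn, p = raw_cn[s][j], p_values[s][j]
            if cn > 0:
                gain += 1
                if p < alpha:
                    sig_amp += 1
            elif cn < 0 and p < alpha:
                sig_del += 1
        summary.append({
            "freq_gain": gain / n_samples,        # Freq.(CN>0)
            "freq_sig_amp": sig_amp / n_samples,  # CN>0 and p < alpha
            "freq_sig_del": sig_del / n_samples,  # CN<0 and p < alpha
        })
    return summary
```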


Population Level Segmentation Analysis (~400 pairs)

Circular Binary Segmentation approach, Bioconductor Package DNAcopy


Segmentation of Chr. 14 (average result of ~400 pairs)


Visualization of Concurrent Regions of Chr. 14 (~400 pairs)

[Figure: samples plotted against positions]


Group-specific Analysis

[Figure: Black: non-smokers; Red: non-smokers]


Separate Tumor Samples from Normal Samples Using Six Chromosomal Peaks with Significant CN Changes

(Classification Based on Raw CN)

[Figure: samples grouped into Tumor and Normal]



Mapping Known Cancer-related Genes onto the Copy Number Map



Software

Affymetrix Chips (www.affymetrix.com)

Illumina Chips (www.illumina.com)

CNAT (www.affymetrix.com)

dChip (www.dchip.org)

CNAG (www.genome.umin.jp)

GenePattern (www.broad.mit.edu/cancer/software/genepattern/)

Bioconductor R Packages (www.bioconductor.org)

  • GLAD package, adaptive weights smoothing (AWS) method

  • DNAcopy package, circular binary segmentation method

Windows?

Unix?

Parallel Computation?



References

  • R Gentleman et al. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005

  • JL Freeman et al. Genome Research 2006; 16:949-961

  • J Huang et al. Hum Genomics. 2004;1(4):287-99

  • X Zhao et al. Cancer Research 2004; 64:3060-3071

  • Y Nannya et al. Cancer Research 2005, 65: 6071-6079

  • … see google …



Acknowledgements

Division of Statistical Genomics: Aldi Kraja, Ingrid Borecki, Michael Province

Medical Sequencing Group: Li Ding, John Osborne, Ken Chen

Center for Genome Sciences

Washington University School of Medicine

