
Genome-wide Copy Number Analysis

Qunyuan Zhang, Ph.D.

Division of Statistical Genomics

Department of Genetics & Center for Genome Sciences

Washington University School of Medicine

02-08-2006

Course: M 21-621 Computational Statistical Genetics

Four Questions

  • What is Copy Number?

  • What can Copy Number tell us?

  • How to measure/quantify Copy Number?

  • How to analyze Copy Number?

What is Copy Number?

  • Gene Copy Number

    The gene copy number (also "copy number variants" or CNVs) is the amount of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells. For instance, the EGFR copy number can be higher than normal in Non-small cell lung cancer. …Elevating the gene copy number of a particular gene can increase the expression of the protein that it encodes.

    From Wikipedia


  • DNA Copy Number

    A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase or larger.

    From Nature Reviews Genetics, Feuk et al. 2006

  • DNA Copy Number ≠ DNA Tandem Repeat Number (e.g., microsatellites, with repeat units of <10 bases)

  • DNA Copy Number ≠ RNA Copy Number

  • RNA Copy Number = Gene Expression Level

    DNA → (transcription) → mRNA

  • Copy Number is the number of copies of a particular fragment of a nucleic acid molecule. In most publications it refers to DNA Copy Number.

What can Copy Number tell us?

  • Genetic Diversity/Polymorphisms

    - restriction fragment length polymorphism (RFLP)

    - amplified fragment length polymorphism (AFLP)

    - random amplification of polymorphic DNA (RAPD)

    - variable number of tandem repeat (VNTR; e.g., mini- and microsatellite)

    - single nucleotide polymorphism (SNP)

    - presence/absence of transposable elements

    - structural alterations (e.g., deletions, duplications, inversions … )

    - DNA copy number variant (CNV)

    Association with phenotypes/diseases → genes/genetic factors

Genetic Alterations in Tumor Cells (DNA Copy Number Changes)

Mechanisms leading from a normal cell to tumor cells with altered copy number:

    - Homologous repeats

    - Segmental duplications

    - Chromosomal rearrangements

    - Duplicative transpositions

    - Non-allelic recombinations

[Figure: tumor cells showing deletion and amplification, with copy numbers CN=0, CN=1, CN=2, CN=3, CN=4]

How to measure/quantify Copy Number?

  • Quantitative Polymerase Chain Reaction (Q-PCR): DNA Amplification

    (dNTPs, primers, Taq polymerase, fluorescent dye)

    less CN → amplification → less DNA → low fluorescent intensity

    more CN → amplification → more DNA → high fluorescent intensity

    (one fragment each time)

  • Microarray: DNA Hybridization

    (dNTPs, primers, Taq polymerase, fluorescent dye)

    less CN → amplification → less DNA → arrayed probes → low intensities

    more CN → amplification → more DNA → arrayed probes → high intensities

    (multiple/different fragments, mixed pool)

Microarray: From Image to Copy Number

Affymetrix Mapping 250K Sty-I chip: ~250K probe sets, ~250K SNPs; probe set (24 probes)

[Figure: chip image, with a zoom onto one probe set]

more DNA copy number → more DNA hybridization → higher intensity

How to Analyze Copy Number?

  • A Real Example

    - ~400 cancer patients

    - Normal tissue & tumor tissue (~400 pairs, ~800 DNA samples)

    - Affymetrix 250K Sty-I Human Mapping SNP Array

    - DNA hybridization signals (intensities on chip images)

    - Genotype calling → SNP genotypes → LOH analysis (genotypic changes)

    - DNA copy number analysis (DNA copy number changes)

  • General Procedures for Copy Number Analysis

    - Finished chips → (scanner) → Raw image data [.DAT files]

    - Raw image data, with experiment info [.EXP] → (image processing software) → Probe-level raw intensity data [.CEL files]

    - Preprocessing, with chip description file [.CDF]: Background adjustment, Normalization, Summarization → Summarized intensity data

    - Raw copy number (CN) data [log ratio of tumor/normal intensities]

    - Significance test of CN changes

    - Estimation of CN

    - Smoothing and boundary determination

    - Concurrent regions among population

    - Amplification and deletion frequencies among populations

    - Association analysis

Background Adjustment/Correction

Reduces unevenness of a single chip

Makes intensities of different positions on a chip comparable

[Figure: chip image before adjustment vs. after adjustment]

Corrected Intensity (S') = Observed Intensity (S) - Background Intensity (B)

For each region i, B(i) = mean of the lowest 2% of intensities in region i

Affymetrix MAS 5.0
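The region-wise subtraction above can be sketched in a few lines of Python. This is a simplified illustration, not the MAS 5.0 implementation: the function name `background_correct`, the square grid of `n_zones` x `n_zones` regions, and the `floor` that keeps corrected values positive are all assumptions made for the example, and the smoothing between neighbouring regions that a production algorithm would apply is omitted.

```python
import math

def background_correct(chip, n_zones=4, low_frac=0.02, floor=1.0):
    """Subtract a per-region background from a chip given as a list of rows.

    For each region, B = mean of the lowest `low_frac` of its intensities,
    and S' = max(S - B, floor). Grid layout and floor are illustrative choices.
    """
    n_rows, n_cols = len(chip), len(chip[0])
    # Region boundaries along each axis (integer split into n_zones parts).
    row_edges = [i * n_rows // n_zones for i in range(n_zones + 1)]
    col_edges = [j * n_cols // n_zones for j in range(n_zones + 1)]
    corrected = [[0.0] * n_cols for _ in range(n_rows)]
    for r0, r1 in zip(row_edges, row_edges[1:]):
        for c0, c1 in zip(col_edges, col_edges[1:]):
            values = sorted(chip[r][c] for r in range(r0, r1) for c in range(c0, c1))
            if not values:
                continue  # empty region when the chip is smaller than the grid
            k = max(1, math.ceil(low_frac * len(values)))
            background = sum(values[:k]) / k
            for r in range(r0, r1):
                for c in range(c0, c1):
                    corrected[r][c] = max(chip[r][c] - background, floor)
    return corrected
```

With this scheme, two equally strong signals sitting on different local backgrounds end up with the same corrected intensity, which is the "comparable positions" goal stated on the slide.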

Eliminates non-specific hybridization signal

Obtains accurate intensity values for specific hybridization

[Figure: probe set of 25 oligonucleotide probes, on sense or antisense strands]

Methods: PM only, PM-MM, Ideal MM, etc. (PM = perfect-match probe, MM = mismatch probe)

Normalization

Reduces technical variation between chips

Makes intensities from different chips comparable

[Figure: intensities before normalization vs. after normalization]

$S' = \dfrac{S - \text{Mean of } S}{\text{STD of } S}$, so that $S' \sim N(0, 1)$

Base Line Array (linear); Quantile Normalization; Contrast Normalization; etc.
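As a rough illustration of two of the approaches named on this slide, the sketch below standardizes one chip to mean 0 and SD 1, and quantile-normalizes a set of chips so that they share one intensity distribution. The function names and the list-of-lists data layout are assumptions for the example; ties are broken by position, which a production implementation would handle more carefully.

```python
import statistics

def standardize(chip):
    """S' = (S - mean of S) / (STD of S) for one chip (list of intensities)."""
    mean = statistics.fmean(chip)
    sd = statistics.stdev(chip)  # sample standard deviation
    return [(s - mean) / sd for s in chip]

def quantile_normalize(chips):
    """Give every chip the same distribution: the mean of the sorted chips.

    `chips` is a list of chips, each a list of intensities for the same probes.
    """
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: average of the i-th smallest value across chips.
    reference = [statistics.fmean([sc[i] for sc in sorted_chips])
                 for i in range(n_probes)]
    normalized = []
    for chip in chips:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n_probes), key=chip.__getitem__)
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

Standardization only matches the first two moments between chips, while quantile normalization forces the whole distribution to agree; which is appropriate depends on how strongly the chips differ.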

Summarization

Combines the multiple probe intensities for each probe set to produce a summarized value for subsequent analyses.

    - Average methods: PM only or PM-MM, allele-specific or non-specific

    - Model-based method: Li & Wong, 2001 (Gene Expression Index)
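The simplest of the average methods can be sketched as follows. This is a minimal example, not the Li & Wong model: the function name `summarize_probe_set` and the `floor` applied to non-positive PM-MM differences (so that a later log transformation stays defined) are assumptions of the sketch.

```python
def summarize_probe_set(pm, mm=None, floor=1.0):
    """Average the probes of one probe set into a single intensity.

    pm: perfect-match intensities; mm: optional mismatch intensities.
    With mm given, the PM-MM difference is averaged; values below
    `floor` are raised to `floor` (an illustrative choice).
    """
    if mm is None:
        signal = list(pm)                        # "PM only"
    else:
        signal = [p - m for p, m in zip(pm, mm)]  # "PM-MM"
    signal = [max(s, floor) for s in signal]
    return sum(signal) / len(signal)
```

For instance, `summarize_probe_set([100, 200, 300])` averages the three PM probes directly, while passing `mm` as well subtracts each mismatch intensity first.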

Raw Copy Number Data

[Figure: intensities before log transformation vs. after log transformation, and the resulting raw CN]

S: summarized raw intensity

S': log-transformed intensity, S' = log2(S)

Raw CN: log ratio of tumor/normal intensities

CN = S'tumor - S'normal = log2(Stumor/Snormal)

    - Pair design: Snormal = S of the paired normal sample

    - Group design: Snormal = average S of the group of normal samples
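The two designs can be written out directly from the formula CN = log2(Stumor/Snormal). The sketch below is illustrative; the function names and the plain-list data layout (one summarized intensity per SNP) are assumptions.

```python
import math

def raw_cn_paired(s_tumor, s_normal):
    """Pair design: log2 ratio against the paired normal sample, per SNP."""
    return [math.log2(t / n) for t, n in zip(s_tumor, s_normal)]

def raw_cn_group(s_tumor, normal_samples):
    """Group design: log2 ratio against the per-SNP average of all normals.

    `normal_samples` is a list of normal samples, each a list of intensities.
    """
    n = len(normal_samples)
    reference = [sum(col) / n for col in zip(*normal_samples)]
    return [math.log2(t / r) for t, r in zip(s_tumor, reference)]
```

If intensity were exactly proportional to copy number (an idealization that real arrays only approximate), a one-copy gain from 2 to 3 copies would give log2(3/2) ≈ 0.58 and a one-copy loss from 2 to 1 would give log2(1/2) = -1.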

Individual Level Analysis

Analysis for each individual sample (or each sample pair)

  • Significance test of CN amplification and deletion

  • Boundary finding (smoothing and segmentation)

  • CN estimation

Intensities and Raw CNs, Chr. 1 (Pair #101)

[Figure legend] Black: Normal, Red: Tumor, Green: Tumor - Normal

Significance Test for Copy Number Changes: -log(p) values, Chr. 1, Pair #101

Window-based t test

Window size = 0.5 Mbp (~30 SNPs); N = number of SNPs in the window

$t = \dfrac{\text{Mean CN of window}}{\text{SD of window}} \times \sqrt{N} \sim t(\mathrm{df} = N - 1)$

[Figure: -log(p) values plotted against window position (Mbp)]
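The window statistic is a one-sample t test of the window's mean raw CN against zero. The sketch below computes only the t statistic and its degrees of freedom; it defines windows by SNP count rather than by physical size in Mbp, and the function name and the `step` parameter are assumptions made for the example.

```python
import math
import statistics

def window_t_statistics(raw_cn, window=30, step=30):
    """Return (start_index, t, df) for each window of `window` consecutive SNPs."""
    results = []
    for start in range(0, len(raw_cn) - window + 1, step):
        values = raw_cn[start:start + window]
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample SD (N - 1 denominator)
        # t = mean / SD * sqrt(N); undefined when the window has no variation
        t = mean / sd * math.sqrt(window) if sd > 0 else math.nan
        results.append((start, t, window - 1))
    return results
```

A p-value for each window would then come from the Student t distribution with N - 1 degrees of freedom, for example via SciPy's `scipy.stats.t.sf` if that library is available; plotting -log(p) against window position gives a picture like the one on this slide.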

Segmentation

Bioconductor R Packages

    - GLAD package, adaptive weights smoothing (AWS) method

    - DNAcopy package, circular binary segmentation method
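To give a feel for what segmentation does, here is a deliberately simplified sketch in Python. It is plain recursive binary segmentation on the residual sum of squares, not the circular binary segmentation of DNAcopy and not the AWS method of GLAD; the function name and the `min_size` and `min_gain` stopping parameters are assumptions chosen for illustration, whereas the real packages use statistical tests to decide on each split.

```python
def segment(values, min_size=5, min_gain=1.0):
    """Split `values` into (start, end, mean) segments with end exclusive.

    At each step, the single cut that most reduces the residual sum of
    squares is taken, provided the reduction is at least `min_gain`.
    """
    def sse(total, total_sq, n):
        # Sum of squared deviations from the mean, from running sums.
        return total_sq - total * total / n

    def split(start, end):
        seg = values[start:end]
        n = len(seg)
        total, total_sq = sum(seg), sum(v * v for v in seg)
        best_gain, best_cut = 0.0, None
        left, left_sq = 0.0, 0.0
        for i in range(1, n):
            left += seg[i - 1]
            left_sq += seg[i - 1] ** 2
            if i < min_size or n - i < min_size:
                continue  # keep both pieces at least min_size long
            gain = (sse(total, total_sq, n) - sse(left, left_sq, i)
                    - sse(total - left, total_sq - left_sq, n - i))
            if gain > best_gain:
                best_gain, best_cut = gain, start + i
        if best_cut is None or best_gain < min_gain:
            return [(start, end, total / n)]
        return split(start, best_cut) + split(best_cut, end)

    return split(0, len(values))
```

Single-cut splitting can miss a short aberration in the middle of a noisy chromosome, which is one motivation for the circular variant that looks for a pair of change points at once.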

CN Estimation: Hidden Markov Model (HMM)

Software: CNAT; dChip; CNAG

[Figure: HMM diagram. Hidden status (unknown CN) at SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4, …; observed status (raw CN = log ratio of intensities) at each SNP]

CN estimation: finding the sequence of CN values that maximizes the likelihood of the observed raw CN.

Algorithm: Viterbi algorithm (can be iterative)

The following information/assumptions are needed:

    - Background probabilities: overall probabilities of the possible CN values.

      P(CN=x); x = -2, -1, 0, 1, 2, 3, …, n (usually n < 10)

    - Transition probabilities: probabilities of the CN value of each SNP conditional on the previous one.

      P(CN_i+1 = x | CN_i = y); x = -2, -1, 0, 1, 2, 3, …, or n; y = -2, -1, 0, 1, 2, 3, …, or n

    - Emission probabilities: probabilities of the observed raw CN value of each SNP conditional on the hidden/unknown/true CN status.

      P(log ratio < x | CN = y) = f(x | CN = y); x = any real number; y = -2, -1, 0, 1, 2, 3, …, or n
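The three ingredients listed above (background, transition and emission probabilities) are all the Viterbi algorithm needs. The sketch below is a minimal illustration with made-up parameters, not the model used by CNAT, dChip or CNAG: it assumes CN states 1 to 4, a uniform background probability, a single `stay` probability for keeping the same CN between adjacent SNPs, and Gaussian emissions centred at log2(CN/2) with a common standard deviation `sd`.

```python
import math

def viterbi_cn(log_ratios, states=(1, 2, 3, 4), sd=0.2, stay=0.99):
    """Most likely CN sequence for a non-empty list of log2 ratios."""
    n = len(states)
    means = [math.log2(cn / 2) for cn in states]   # expected log ratio per state
    log_prior = [math.log(1.0 / n)] * n            # uniform background (assumed)
    log_trans = [[math.log(stay if a == b else (1.0 - stay) / (n - 1))
                  for b in range(n)] for a in range(n)]

    def log_emit(x, k):
        # Log density of a Gaussian with mean means[k] and SD sd.
        z = (x - means[k]) / sd
        return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

    # score[k]: best log probability of any path ending in state k.
    score = [log_prior[k] + log_emit(log_ratios[0], k) for k in range(n)]
    back = []
    for x in log_ratios[1:]:
        pointers, new_score = [], []
        for b in range(n):
            best_a = max(range(n), key=lambda a: score[a] + log_trans[a][b])
            pointers.append(best_a)
            new_score.append(score[best_a] + log_trans[best_a][b] + log_emit(x, b))
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    k = max(range(n), key=lambda a: score[a])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return [states[k] for k in reversed(path)]
```

Because changing state is penalized, a run of elevated log ratios is called as a gain while an isolated noisy SNP is absorbed into the surrounding state, which is the smoothing behaviour that makes HMMs attractive here.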

HMM Estimation of CN for Chr. 1 (Pair #101)

[Figure legend] Black: Normal Intensities, Red: Tumor Intensities, Green: Tumor - Normal, Blue: HMM-estimated CNs in Tumor Tissue

Population Level Analysis

Analysis for the whole group (or sub-group) of samples

  • Overall significance test

  • Amplification and deletion frequencies summarization

  • Common/concurrent region finding

  • Associations (with mutations, LOHs, clinical variables …)

Genome-wide Raw CN Changes (average over ~400 pairs)

Raw CN Changes of Chr. 14 (average over ~400 pairs)

Sliding Window Analysis

[Figure: SNPs along a chromosome covered by overlapping windows: Window 1, Window 2, …, Window k, …, Window N]

Each window (k) contains 30 consecutive SNPs (k, k+1, k+2, k+3, …, k+29)
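A sliding window of 30 consecutive SNPs, advanced one SNP at a time, can be computed with a running sum. The sketch below is illustrative; the function name is an assumption, and the input would typically be the raw CN values averaged over the sample pairs.

```python
def sliding_window_means(values, window=30):
    """Mean of each window of `window` consecutive values.

    Window k covers values[k] .. values[k + window - 1].
    """
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for k in range(window, len(values)):
        # Slide by one SNP: add the new value, drop the oldest one.
        total += values[k] - values[k - window]
        means.append(total / window)
    return means
```

A sequence of M SNPs yields M - 29 windows of size 30, so the smoothed profile is almost as dense as the original data.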

Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs)

Sliding Window Test of Significance of CN Changes

-log(p) values, based on ~400 pairs

CN Change Frequencies in Population (Chr. 14, ~400 pairs)

[Figure legend] Black: Freq.(CN>0); Red: Freq.(CN>0, significant amplification at 0.01 level); Green: Freq.(CN<0, significant deletion at 0.01 level)
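The frequencies plotted here are per-SNP proportions of samples showing a gain or a loss. The sketch below is a simplified version that uses a fixed cutoff on the raw CN instead of the per-sample significance test at the 0.01 level; the function name and the `threshold` parameter are assumptions for the example.

```python
def change_frequencies(raw_cn_by_sample, threshold=0.0):
    """Per-SNP fraction of samples with raw CN above / below a cutoff.

    `raw_cn_by_sample` is a list of samples, each a list of raw CN values
    for the same SNPs. Returns (gain_freq, loss_freq).
    """
    n_samples = len(raw_cn_by_sample)
    n_snps = len(raw_cn_by_sample[0])
    gain = [sum(s[j] > threshold for s in raw_cn_by_sample) / n_samples
            for j in range(n_snps)]
    loss = [sum(s[j] < -threshold for s in raw_cn_by_sample) / n_samples
            for j in range(n_snps)]
    return gain, loss
```

With `threshold=0.0` the gain frequency corresponds to Freq.(CN>0); raising the threshold gives a crude stand-in for counting only the clearer changes.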

Population Level Segmentation Analysis (~400 pairs)

Circular Binary Segmentation approach, Bioconductor Package DNAcopy

Segmentation of Chr. 14 (average result of ~400 pairs)

Visualization of Concurrent Regions of Chr. 14 (~400 pairs)



Group-specific Analysis

[Figure legend] Black: non-smokers, Red: non-smokers


Separate Tumor Samples from Normal Samples Using Six Chromosomal Peaks with Significant CN Changes

(Classification Based on RAW CN)



Mapping Known Cancer-related Genes onto the Copy Number Map

Software

Affymetrix Chips

Illumina Chips

dChip

Bioconductor R Packages

    - GLAD package, adaptive weights smoothing (AWS) method

    - DNAcopy package, circular binary segmentation method

Windows?

Unix?

Parallel Computation?

References

  • R Gentleman et al. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005

  • JL Freeman et al. Genome Research 2006; 16:949-961

  • J Huang et al. Hum Genomics 2004; 1(4):287-299

  • X Zhao et al. Cancer Research 2004; 64:3060-3071

  • Y Nannya et al. Cancer Research 2005; 65:6071-6079

  • … see google …

Acknowledgements

Division of Statistical Genomics: Aldi Kraja, Ingrid Borecki, Michael Province

Medical Sequencing Group: Li Ding, John Osborne, Ken Chen

Center for Genome Sciences

Washington University School of Medicine