Array based comparative genomic hybridization

Array-based Comparative Genomic Hybridization. Bastien JOB 2010-10-19. Structural Genomics Sequence variations (CGHa, SNPa, DNAseq, mutations…). Functional Genomics Gene expression / splicing… (GEa, Q-PCR, RNAseq… ). Proteomics (Antibody arrays, 2D EP +MS/MS, HPLC+MS / MS, … ). Genome.




Array-based Comparative Genomic Hybridization

Bastien JOB

2010-10-19


Structural Genomics: sequence variations (CGHa, SNPa, DNAseq, mutations…)

Functional Genomics: gene expression / splicing… (GEa, Q-PCR, RNAseq…)

Proteomics (antibody arrays, 2D EP + MS/MS, HPLC + MS/MS, …)

[Diagram: Genome → Transcriptome → Proteome. In the nucleus, DNA (gene: promoter and regulating sequences, exons, introns) is transcribed into RNA, which is spliced and edited into mRNA (transcript). The mRNA is translated into protein, which undergoes post-translational modification. Also labelled: miRNA, cell membrane.]



History and context

Technical principle, classical designs

Description of oligo CGH arrays

Data preprocessing

Bioinformatic analysis

Cross-technology correlation



History and context

CGH array is a method aiming at the identification of variations in the copy number of the genomic content of a test sample, by comparison to a reference sample, using an array of (at least) thousands of measurement points on the genome.

A bit of history of cytogenomics

  • [196x] : Karyotyping

  • [1993] : Spectral karyotyping (SKY)

  • [199x] : CGH (comparative genomic hybridization) on chromosomes

  • [200x] : cDNA-based and BAC-based CGH array

  • [2005] : oligo-based CGH array

    In cancer :

  • Profiling the patterns defined by these alterations for a patient or a pathology.

  • Exploring the association between some of these patterns and clinical annotations.

    Other uses :

  • Developmental abnormalities, autism, diabetes, inter-individual CNVs (HapMap project), ...

    It’s an established method in the cancer research field, and is becoming established in the diagnostic field.


[Figure: illustrated timeline of the techniques, from karyotyping (196x), SKY (1993) and CGH on chromosomes (199x) to cDNA/BAC-based CGH arrays (200x) and oligo-based CGH arrays (2005).]



Rearrangements in tumors creating fusion genes



Rearrangements in tumors altering gene regulation

MYC – IgH translocation in Burkitt lymphoma

IMAGE CREDIT: Gregory Schuler, NCBI, NIH, Bethesda, MD, USA

Also a common fusion in prostate cancer (Tomlins et al., Science 2005)



Chromosomal amplifications

EGFR amplification in lung cancer as HSR (homogeneously stained region)

EGFR amplification in lung cancer as several double minutes

Varella-Garcia et al, J Clin Pathol 2009


Common alterations across tumors and pathologies

  • Mutations activating / repressing pathways

  • Breakpoints creating duplications / amplifications / deletions / fusions

  • Known « master genes » like TP53, PTEN, CDKN2A/B, MYC, EGFR, FGF, …

  • Some are tissue-specific, others more widespread

[Figure labels: duplicated genes, deleted genes, activation, repression]





Technical principle (dual color)



Designs (dual color)

  • For dual-channel CGH arrays, most of the time :

    Test sample DNA (tumor) Cy5 -vs- Reference DNA (normal) Cy3

  • Mainly use of a sex-matched commercial normal DNA as reference

    • Sex-matched: anomalies on gonosomes (sex chromosomes)

    • « outside » reference : polymorphisms (CNV, « copy number variations »)

  • More rarely (cancer field) : using the same person’s normal DNA

    • No polymorphism

    • Same origin ≈ same preparation

    • Some difficulties for blood DNA extraction

  • Use of a « stable » cell-line with a complete ploidy as a reference (ex: Coriell NA10851)

  • More complex designs can be performed (circular, …)


CGH array simplified process on the platform : from sample to analysis

  • Samples

  • DNA extraction

  • Qualification & quantitation

  • Fragmentation & labelling of test (T) and reference (R) DNA

  • Hybridization on oligo microarray

  • Scan, signals acquisition & normalization

  • Segmentation & visualization

  • Bioinformatic analyses




Long oligo Agilent CGH arrays

G2 : 244K Agilent oligo array. Spots : 60 µm (@ 5 µm/px)

G3 : 4 x 180K Agilent oligo array. Spots : 30 µm (@ 2 µm/px)


Available formats (for Human)

  • 2nd generation

    • 4 x 44K

    • 2 x 105K

    • 1 x 244K

  • 3rd generation (current)

    • 8 x 60K

    • 4 x 180K

    • 2 x 400K

    • 1 x 1M

  • Most formats also available for mouse and rat

  • Possibility to design one’s own custom array for any format



Long oligo NimbleGen arrays


Short oligo Affymetrix SNP 6.0 array

  • 4x 906,600 SNP probes

  • 945,826 CN probes *

  • 25-mer oligos

  • ~700 b average interval

  • ~2 Kb real CN interval

* ~200,000 CNVs



Illumina Infinium BeadChips




Simplified bioinformatics analysis pipeline

  • Signals acquisition (Feature Extraction v10.x)

  • Quality controls

  • Normalization

  • Genomic profile segmentation (CBS)

  • Description of the population (R, aCGH)

  • Identification of genomic regions of interest (STAC)

  • Describing genomic contents (public databases + clinical annotations)



SIGNALS ACQUISITION



Spot position identification

  • By 2D intensity histograms

  • By a circle (fixed / variable diameter)

  • Adaptive segmentation by random seed propagation

Credits : Pierre NEUVIAL (ENSAE)

Current oligo generation : perfect disc-shaped spots.



Spot extraction

  • Two methods :

  • Intensity segmentation

    • Isolation of real signal from a local background

    • Needed for both signals

    • Needs a background correction method

    • Then a ratio can be computed

  • Linear regression (Novikov, 2004)

    • (1) First linear regression on all intensities

    • (2) Identification of outliers

    • (3) Sequential removal of outlier pixels

    • (4) Unbiased linear regression on the remaining pixels

    • Can only be used when the background is fairly low and homogeneous.

    • The ratio is directly extracted as the slope.

[Figure: pixel intensity scatter plots illustrating regression steps (1), (2, 3) and (4).]

Credits : Pierre NEUVIAL
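The four steps of the linear regression method can be sketched as follows. This is a simplified illustration, not the published algorithm of Novikov (2004): the outlier rule (drop the pixel with the largest residual while it exceeds k residual standard deviations), the names fit_line, spot_ratio, k and min_pixels, and the pixel values are all assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("reference intensities are all identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def spot_ratio(ref_pixels, test_pixels, k=3.0, min_pixels=5):
    """Test/reference ratio of one spot, taken as the slope of a trimmed regression."""
    xs, ys = list(ref_pixels), list(test_pixels)
    slope, intercept = fit_line(xs, ys)                  # (1) fit on all pixels
    while len(xs) > min_pixels:
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
        worst = max(range(len(xs)), key=lambda i: abs(residuals[i]))
        if sd == 0 or abs(residuals[worst]) <= k * sd:   # (2) no outlier left
            break
        del xs[worst], ys[worst]                         # (3) remove one outlier pixel
        slope, intercept = fit_line(xs, ys)              # (4) refit on kept pixels
    return slope


# Made-up spot: 20 pixels with a true ratio of 2 and one aberrant pixel.
ref = [100 + 50 * i for i in range(20)]
test = [2 * x + 50 for x in ref]
test[10] += 5000
print(round(fit_line(ref, test)[0], 3), round(spot_ratio(ref, test), 3))
```

The intercept absorbs a constant background offset, which is one way to read the requirement that the background be low and homogeneous.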



ARRAY QUALITY CONTROLS



Scans visualization



Array quality controls (from Agilent)

General information and some parameters

Grid positioning check

Control of channels (signal, background, …)

Control of outliers (number and position)

Control of intensity distributions

Control of the randomness of signals



QC : Spatial homogeneity controls

Spatial representation of signals, background, log2(ratio), p-value, errors (…)

Distribution of signals and log2(ratio)



Spatial Homogeneity (the bubbly one)



NORMALIZATION

Why ?

Some biases can be removed by specific algorithms


Spatial biases

  • Intensity gradients

  • Block effects

  • Print-tip bias

  • Local bias

Most of these biases are linked to spotted arrays



Spatial biases correction (example)

Credits : Pierre NEUVIAL



Dye biases : intensities



Dye biases : impact on log2(ratio)



GC% biases





Dye + GC (step 1)



Dye + GC (step 2)



Dye + GC (step 3)



CENTRALIZATION

Why ?

Data generated by this method are relative values (ratio of a test versus a reference) : we lack information about the « real » normality level.



Centralization : an obvious example

Identifying the most probable normal genomic level is easy here, as we have a main central peak.

[Figure: histogram of ratios (frequency vs. ratio) and genomic profile (log2(ratio) along the chromosomes).]
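For the easy case with a single dominant peak, centralization can be sketched as locating the most populated bin of the log2(ratio) histogram and shifting the whole profile so that this bin sits at 0. The function name centralize, the bin width and the example values are assumptions for illustration. Treating the tallest peak as the normal level is itself an assumption, which may not hold for complex cancer profiles.

```python
from collections import Counter


def centralize(log2_ratios, bin_width=0.05):
    """Shift a profile so that its most populated histogram bin sits at 0."""
    bins = Counter(round(v / bin_width) for v in log2_ratios)
    peak_bin, _ = bins.most_common(1)[0]
    center = peak_bin * bin_width
    return [v - center for v in log2_ratios], center


# Made-up profile: main peak at 0.2, a gained region and a lost region.
profile = [0.2] * 60 + [0.78] * 25 + [-0.8] * 15
centered, center = centralize(profile)
print(round(center, 2), round(centered[60], 2))
```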


Centralization : a cancer example

It’s much more difficult here, due to the higher complexity of the distribution / profile…

[Figure: histogram of ratios (frequency vs. ratio) and genomic profile (log2(ratio) along the chromosomes).]



Centralization



Centralization : simplification of the distribution



Centralization : Comparing to the center of the distribution



Centralization : Comparing peaks height


GENOMIC PROFILE VISUALIZATION & DATA SEGMENTATION

Why segment ?

Data reduction : The data obtained are a list of hundreds of thousands of values. However, a genomic profile can be simplified to a limited list of segments considered as abnormal.



A normalized, centered, segmented genomic profile with called aberrations

Example taken from a breast cancer profile



Challenge : identifying breakpoints

  • Data consist of a continuous log2(ratio) distribution

  • Two main difficulties :

    • The location of breakpoints is unknown a priori

    • So is their number

  • Two general models :

    • Homoscedastic (m) : segments differ by their mean only

    • Heteroscedastic (m, V) : segments differ by their mean and variance
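To make the breakpoint problem concrete, here is a minimal sketch of a recursive binary segmentation under the homoscedastic model (piecewise-constant mean, least squares). It is not CBS nor any of the published methods: the stopping rule (a fixed min_gain threshold on the reduction of the squared error), the function names and the example profile are assumptions for illustration.

```python
def best_split(values):
    """Return (gain, index) of the split that most reduces the squared error."""
    n = len(values)
    total = sum(values)
    best_gain, best_idx = 0.0, None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        right_sum = total - left_sum
        # Drop in the sum of squares when values[:i] and values[i:] get their own mean.
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i) - total ** 2 / n
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx


def segment(values, min_gain=1.0, offset=0):
    """Recursively split a profile; returns (start, end, mean) with end exclusive."""
    gain, idx = best_split(values)
    if idx is None or gain < min_gain:
        return [(offset, offset + len(values), sum(values) / len(values))]
    return (segment(values[:idx], min_gain, offset)
            + segment(values[idx:], min_gain, offset + idx))


# Made-up profile: a single-copy gain (log2 ratio 0.58) with small alternating noise.
levels = [0.0] * 50 + [0.58] * 30 + [0.0] * 40
profile = [lvl + 0.02 * (-1) ** i for i, lvl in enumerate(levels)]
segments = segment(profile)
for start, end, mean in segments:
    print(start, end, round(mean, 2))
```

Neither the number nor the position of breakpoints is given as input: both come out of the recursion, and the threshold controls how many segments are kept.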



Several segmentation methods available

  • Initial methods

    • Median smoothing

    • EM mixture

    • clustering

  • « Newer », well-known methods

    • HMM/EM

    • CBS



Several segmentation methods available

Lai, 2005



Complexity





Segments along the genome for 4 osteosarcoma samples



Inter-array CGH segmentation

=> Interesting idea but too stringent !

Rueda et al., BMC Bioinformatics 2009



LOH-assisted Copy Number Markov 5-states Model



ABERRATIONS CALLING



Theory : in a pure, diploid cell

[Figure: theoretical log2(ratio) levels along the chromosomal position: 0.58 (three copies), 0 (two copies), -1 (one copy), -∞ (no copy).]
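These theoretical levels follow from log2(test copies / reference copies) with a diploid reference. A minimal sketch, assuming a pure sample with no contamination by normal cells (the function name is hypothetical):

```python
import math


def expected_log2_ratio(test_copies, ref_copies=2):
    """Expected log2(test/reference) for a pure sample against a diploid reference."""
    if test_copies == 0:
        return float("-inf")  # no test DNA at all at this locus
    return math.log2(test_copies / ref_copies)


for copies in (0, 1, 2, 3, 4):
    print(copies, f"{expected_log2_ratio(copies):.2f}")
```

In practice, measured values are compressed towards 0 compared with these levels, so they are best read as upper bounds.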




UNSUPERVISED POPULATION ANALYSIS



Hierarchical clustering, heatmap, frequency of aberrations
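The frequency-of-aberrations track can be sketched as follows, assuming each profile has already been reduced to one call per probe (+1 gain, 0 normal, -1 loss) on a common probe set. The function name and the toy calls are assumptions for illustration.

```python
def aberration_frequency(calls):
    """calls: one list per sample of per-probe states (+1 gain, 0 normal, -1 loss)."""
    n_samples = len(calls)
    n_probes = len(calls[0])
    gain = [sum(1 for s in calls if s[p] > 0) / n_samples for p in range(n_probes)]
    loss = [sum(1 for s in calls if s[p] < 0) / n_samples for p in range(n_probes)]
    return gain, loss


# Made-up calls for 4 samples over 4 probes.
calls = [
    [1, 1, 0, -1],
    [1, 0, 0, -1],
    [0, 0, 0, -1],
    [1, 0, 0, 0],
]
gain_freq, loss_freq = aberration_frequency(calls)
print(gain_freq, loss_freq)
```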



K-means clustering

Example on breast cancer data for K=2 and K=3



NMF clustering (Brunet et al, 2004)

Example for a population of 103 breast cancers optimally clustered into 3 groups



Looking for minimal common regions (MCR)

In the multiclonal model of tumoral evolution, genes of interest (oncogenes, tumor suppressors, …) are more likely than others to lie in the overlap of aberrations shared by a sufficient number of genomic profiles.

Regions statistically found as potential MCR

The tool used for this purpose is STAC v1.2 (Diskin et al, 2006)

Common problem : CNVs as a contamination…
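The overlap idea behind MCRs can be illustrated with a naive frequency threshold: keep the runs of consecutive probes where at least a given fraction of samples carry the same aberration. This is not the STAC algorithm, which assesses significance statistically. The function name, the min_fraction parameter and the toy calls are assumptions for illustration.

```python
def minimal_common_regions(calls, state=1, min_fraction=0.75):
    """Runs of probes where >= min_fraction of samples share `state`.

    calls: one list per sample of per-probe states (+1 gain, 0 normal, -1 loss).
    Returns half-open probe index intervals (start, end).
    """
    n_samples = len(calls)
    n_probes = len(calls[0])
    regions, start = [], None
    for p in range(n_probes + 1):
        shared = (p < n_probes and
                  sum(1 for s in calls if s[p] == state) / n_samples >= min_fraction)
        if shared and start is None:
            start = p
        elif not shared and start is not None:
            regions.append((start, p))
            start = None
    return regions


# Made-up gains: four samples with overlapping but shifted aberrations.
gains = [
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
print(minimal_common_regions(gains))
```

A germline CNV shared by many samples would pass such a threshold just as well as a somatic event, which is the contamination problem noted above.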


STAC : Two different methods

  • Overall frequency of the aberration at any location (includes samples not shown).

  • Computed frequency confidence [0-1] (+ : sensitive / - : Inefficient at edges)

  • Computed “footprint” confidence [0-1] (+ : equally efficient along the genome / - : less sensitive)

  • The bold black line at the bottom shows an MCR detected by either of the two methods, when its confidence reaches the confidence cut-off (0.95 by default).



Examples of MCRs found for a population of glioma samples (hybridized on 44K arrays)

=> MYCN found in a 663 Kb window.

=> HOX genes cluster found in a 143 Kb window.

=> HRAS found in a 351 Kb window.

=> Loss of CDKN2A and CDKN2B in a 1.2 Mb window.



Genomic annotation of aberrant regions

Partial example of a neuroblastoma cell-line



COMPARISON OF PAIRED SAMPLES



Direct profiles of the same patient at D0 & D21



Fitted profiles of the same patient at D0 & D21



Measured differences


SUPERVISED ANALYSIS : USE OF (CLINICAL) ANNOTATIONS



Comparing clustered samples and clinical annotations



Comparing population anomalies to clinical annotations

Trivial example of the difference found on the ERBB2 locus when comparing ERBB2-amplified and non-amplified breast cancer populations.



Comparing population anomalies to clinical annotations

Another example showing a characteristic gain of the BRAF locus in a BRAF-mutated population of melanoma.





CROSS-TECHNOLOGY CORRELATION

Why?

Detecting genes that simultaneously undergo genomic copy number variation and RNA expression variation can yield stronger candidates for the characterization of a pathology.

Due to molecular cascades in human pathways, gene expression analysis may preferentially reveal genes located downstream in a pathway. Correlating CGH and gene expression results from the same population may make it easier to focus on “upstream” genes.
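A minimal sketch of such a correlation, assuming copy number (log2 ratio) and expression values are available per gene for the same samples in the same order. The Pearson coefficient, the min_r threshold, the function names and the gene labels and values are all assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def correlated_genes(copy_number, expression, min_r=0.8):
    """Genes whose expression follows their copy number across samples."""
    return sorted(g for g in copy_number
                  if g in expression
                  and pearson(copy_number[g], expression[g]) >= min_r)


# Made-up values for two hypothetical genes measured in five samples.
copy_number = {"GENE_A": [0.0, 0.1, 1.5, 2.0, -0.1],
               "GENE_B": [0.0, 0.6, -0.5, 0.1, 0.0]}
expression = {"GENE_A": [0.2, 0.1, 1.2, 1.9, 0.0],
              "GENE_B": [1.0, -0.2, 0.8, -0.9, 0.3]}
print(correlated_genes(copy_number, expression))
```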




CGH / GE correlation along the genome



Selected correlated genes

"Cheese plots" for the probe-specific simultaneous visualization of cross-technology correlation and differential expression.



Copy Number from NGS

  • Attempts to infer variations in copy number from the local read depth.

  • A strong GC% debiasing is required

  • An efficient alignment algorithm is also required (MAQ)

Yoon, 2009
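A sketch of the read-depth idea, assuming reads have already been aligned and counted in fixed-size windows, each annotated with its rounded GC percentage. The GC correction rescales every window by the ratio of the overall median count to the median count of windows with the same GC content. The function names and the counts are assumptions for illustration, not a reproduction of the cited method.

```python
import math
from statistics import median


def gc_correct(counts, gc_percent):
    """Rescale window counts so that every GC stratum has the same median depth."""
    overall = median(counts)
    by_gc = {}
    for count, gc in zip(counts, gc_percent):
        by_gc.setdefault(gc, []).append(count)
    gc_median = {gc: median(values) for gc, values in by_gc.items()}
    return [count * overall / gc_median[gc] if gc_median[gc] > 0 else 0.0
            for count, gc in zip(counts, gc_percent)]


def depth_log2_ratio(counts, gc_percent):
    """log2 of GC-corrected depth relative to the median corrected depth."""
    corrected = gc_correct(counts, gc_percent)
    baseline = median(corrected)
    if baseline <= 0:
        raise ValueError("median depth is zero; cannot compute ratios")
    return [math.log2(c / baseline) if c > 0 else float("-inf")
            for c in corrected]


# Made-up counts: GC-rich windows are covered half as deeply, and one window in
# each stratum (index 3 and index 7) carries three copies instead of two.
counts = [100, 100, 100, 150, 100, 50, 50, 75, 50, 50]
gc = [40] * 5 + [60] * 5
ratios = depth_log2_ratio(counts, gc)
print([round(r, 2) for r in ratios])
```

Without the correction, the GC-rich windows would all look like losses, which is why a strong GC% debiasing is listed as a requirement.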

