Array-based Comparative Genomic Hybridization

Bastien JOB, 2010-10-19


Structural Genomics

Sequence variations

(CGHa, SNPa, DNAseq, mutations…)

Functional Genomics

Gene expression / splicing…

(GEa, Q-PCR, RNAseq…)

Proteomics

(Antibody arrays, 2D EP +MS/MS, HPLC+MS / MS, …)

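[Diagram labels, from genome to proteome. Genome: DNA (gene, with promoter and regulatory sequences, exons, introns). Transcriptome: RNA, mRNA (transcript), miRNA; splicing, editing. Proteome: protein, post-translational modification. Processes: transcription, translation. Compartments: nucleus, cell membrane.]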


History and context

Technical principle, classical designs

Description of oligo CGH arrays

Data preprocessing

Bioinformatic analysis

Cross-technology correlation


History and context

A CGH array is a method for identifying copy number variations in the genomic content of a test sample, by comparison to a reference sample, using an array of (at least) thousands of measurement points along the genome.

A bit of history of cytogenomics

  • [196x] : Karyotyping

  • [1993] : Spectral karyotyping (SKY)

  • [199x] : CGH (comparative genomic hybridization) on chromosomes

  • [200x] : cDNA-based and BAC-based CGH array

  • [2005] : oligo-based CGH array

    In cancer :

  • Profiling the patterns defined by these alterations for a patient or a pathology.

  • Exploring associations between some of these patterns and clinical annotations.

    Other uses :

  • Developmental abnormalities, autism, diabetes, inter-individual CNVs (HapMap project), ...

    It’s an established method in the cancer research field, and is becoming established in the diagnostic field.





Rearrangements in tumors altering gene regulation

MYC – IgH translocation in Burkitt lymphoma

IMAGE CREDIT: Gregory Schuler, NCBI, NIH, Bethesda, MD, USA

Recurrent gene fusions are also common in prostate cancer (Tomlins et al., Science 2005)


Chromosomal amplifications

EGFR amplification in lung cancer as HSR (homogeneously staining region)

EGFR amplification in lung cancer as several double minutes

Varella-Garcia et al, J Clin Pathol 2009


Common alterations across tumors and pathologies

  • Mutations activating / repressing pathways

  • Breakpoints creating duplications / amplifications / deletions / fusions

  • Known « master genes » like TP53, PTEN, CDKN2A/B, MYC, EGFR, FGF, …

  • Some are tissue-specific, others more widespread

[Figure labels: duplicated genes, deleted genes; activation, repression]


Technical principle, classical designs



Designs (dual color)

  • For dual-channel CGH arrays, most of the time :

    Test sample DNA (tumor) Cy5 -vs- Reference DNA (normal) Cy3

  • Most often, a sex-matched commercial normal DNA is used as reference

    • Sex-matched: anomalies on gonosomes (sex chromosomes) remain interpretable

    • « Outside » reference : polymorphisms (CNVs, « copy number variations ») appear in the profile

  • More rarely (cancer field) : the same person’s normal DNA is used

    • No polymorphism

    • Same origin ≈ same preparation

    • Some difficulties for blood DNA extraction

  • Use of a « stable » cell line with a complete ploidy as a reference (e.g. Coriell NA10851)

  • More complex designs can be performed (circular, …)


CGH array simplified process on the platform, from sample to analysis :

Samples → DNA extraction → Qualification & quantitation → Fragmentation & labelling → Hybridization (oligo microarray) → Scan, signals acquisition & normalization → Segmentation & visualization → Bioinformatic analyses


Description of oligo CGH arrays


Long oligo Agilent CGH arrays

G2 : 244K Agilent oligo array, spots : 60 µm (@ 5 µm/px)

G3 : 4 x 180K Agilent oligo array, spots : 30 µm (@ 2 µm/px)


Available formats (for Human)

  • 2nd generation

    • 4 x 44K

    • 2 x 105K

    • 1 x 244K

  • 3rd generation (current)

    • 8 x 60K

    • 4 x 180K

    • 2 x 400K

    • 1 x 1M

  • Most formats also available for mouse and rat

  • Possibility to design one’s own custom array for any format



Short oligo Affymetrix SNP 6.0 array

4x

906,600 SNP probes

945,826 CN probes *

  • 25-mer oligos

  • ~700 b average interval

  • ~2 Kb real CN interval

* ~200,000 CNVs



Data preprocessing


Simplified bioinformatics analysis pipeline

[Figure: pipeline diagram. Steps: signals acquisition, quality controls, normalization, genomic profile segmentation, description of the population, identification of genomic regions of interest, describing genomic contents (public databases + clinical annotations). Tools named: Feature Extraction v10.x, CBS, R, aCGH, STAC.]



Spot position identification

  • By 2D intensity histograms

  • By a circle (fixed / variable diameter)

  • Adaptive segmentation by random seed propagation

Credits : Pierre NEUVIAL (ENSAE)

Current oligo generation : perfect disc-shaped spots.


Spot extraction

Two methods :

  • Intensity segmentation

    • Isolation of real signal from a local background

    • Needed for both signals

    • Needs a background correction method

    • Then a ratio can be computed

  • Linear regression (Novikov, 2004)

    • (1) First linear regression on all intensities

    • (2) Identification of outliers

    • (3) Sequential removal of outlier pixels

    • (4) Unbiased linear regression on kept pixels

    • Can only be used when background is fairly low and homogeneous.

    • The ratio is directly extracted as the slope.
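
The four regression steps listed above can be sketched as follows. This is only a minimal illustration, not the implementation of Novikov (2004): the function name `spot_ratio_by_regression`, the 3-sigma outlier rule and the iteration cap are assumptions made for the example.

```python
import numpy as np

def spot_ratio_by_regression(test_px, ref_px, n_sigma=3.0, max_iter=20):
    """Estimate a spot's test/reference ratio as a regression slope.

    test_px, ref_px: pixel intensities of one spot in the two channels.
    n_sigma and max_iter are illustrative choices, not published values.
    """
    test_px = np.asarray(test_px, dtype=float)
    ref_px = np.asarray(ref_px, dtype=float)
    keep = np.ones(test_px.size, dtype=bool)
    slope = np.nan
    for _ in range(max_iter):
        # (1) / (4): linear regression on the pixels currently kept
        slope, intercept = np.polyfit(ref_px[keep], test_px[keep], 1)
        # (2): outliers are pixels far from the fitted line
        resid = test_px - (slope * ref_px + intercept)
        sd = resid[keep].std()
        new_keep = keep & (np.abs(resid) <= n_sigma * sd)
        # Stop when no pixel is removed, or when too few would remain
        if new_keep.sum() == keep.sum() or new_keep.sum() < 3:
            break
        keep = new_keep  # (3): drop the outliers, then refit
    return slope, keep

# Synthetic spot: true ratio of 2, with three saturated-looking pixels
ref = np.linspace(100, 1000, 50)
test = 2 * ref + 5 * np.sin(np.arange(50))
test[[3, 20, 41]] += 3000
slope, kept = spot_ratio_by_regression(test, ref)
print(round(slope, 2), int(kept.sum()))
```

As the slide notes, the slope is only a fair ratio estimate when the background is low and homogeneous, since the intercept absorbs a constant offset but not a spatially varying one.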


Credits : Pierre NEUVIAL




Array quality controls (from Agilent)

General information and some parameters

Grid positioning check

Control of channels (signal, background, …)

Control of outliers (number and position)

Control of intensity distributions

Control of the randomness of signals


QC : Spatial homogeneity controls

Spatial representation of signals, background, log2(ratio), p-value, errors (…)

Distribution of signals and log2(ratio)



NORMALIZATION

Why ?

Some biases can be removed by specific algorithms.


Spatial biases

  • Intensity gradients

  • Block effects

  • Print-tip bias

  • Local bias

Most of these biases are linked to spotted arrays.


Spatial biases correction (example)

Credits : Pierre NEUVIAL









CENTRALIZATION

Why ?

Data generated by this method are relative values (ratio of a test versus a reference) : we lack information about the « real » normality level.


Centralization : an obvious example

Identifying the most probable normal genomic level is easy here, as we have a main central peak.

[Figure: distribution (frequency vs. ratio) and genomic profile (log2(ratio) along the chromosomes)]
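
When the distribution has one dominant peak, as in this example, centralization can be sketched as locating that peak and shifting the profile so the peak sits at zero. The code below is an illustration under stated assumptions (histogram mode, hypothetical `bin_width` parameter, invented function names), not the method used in the presentation, and it would be unreliable on multi-peak cancer profiles.

```python
import numpy as np

def main_peak(log2_ratios, bin_width=0.02):
    """Return the center of the most populated histogram bin."""
    x = np.asarray(log2_ratios, dtype=float)
    x = x[np.isfinite(x)]
    # Extra bin_width so that at least one bin always exists
    edges = np.arange(x.min(), x.max() + 2 * bin_width, bin_width)
    counts, edges = np.histogram(x, bins=edges)
    i = int(counts.argmax())
    return 0.5 * (edges[i] + edges[i + 1])

def centralize(log2_ratios, bin_width=0.02):
    """Shift the profile so that its main peak is at log2(ratio) = 0."""
    x = np.asarray(log2_ratios, dtype=float)
    return x - main_peak(x, bin_width)

# Synthetic profile: mostly "normal" probes offset by 0.3, plus a gain and a loss
rng = np.random.default_rng(0)
profile = np.concatenate([
    rng.normal(0.30, 0.05, 700),   # normal level, miscentered
    rng.normal(0.88, 0.05, 200),   # gained region
    rng.normal(-0.70, 0.05, 100),  # lost region
])
centered = centralize(profile)
```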


Centralization : a cancer example

It’s much more difficult here, due to the higher complexity of the distribution / profile…

[Figure: distribution (frequency vs. ratio) and genomic profile (log2(ratio) along the chromosomes)]






GENOMIC PROFILE VISUALIZATION & DATA SEGMENTATION

Why segment ?

Data reduction : The data obtained are a list of hundreds of thousands of values. However, a genomic profile can be simplified to a limited list of segments considered as abnormal.


A normalized, centered, segmented genomic profile with called aberrations

Example taken from a breast cancer profile


Challenge : identifying breakpoints

  • Data consist of a continuous log2(ratio) distribution

  • Two main difficulties :

    • The location of breakpoints is unknown by default

    • So is their number

  • Two general models :

    • Homoscedastic : segments differ by their mean (m)

    • Heteroscedastic : segments differ by their mean and variance (m, V)
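
To make the breakpoint problem concrete, here is a minimal sketch of recursive binary segmentation under the homoscedastic model (piecewise-constant mean, constant variance). It is not CBS nor any of the published methods named in these slides; the function names and the `min_gain` stopping threshold are assumptions chosen for the example.

```python
import numpy as np

def best_split(x):
    """Return (index, gain) of the split that most reduces squared error."""
    n = x.size
    csum = np.cumsum(x)
    k = np.arange(1, n)                       # size of the left part
    left_mean = csum[:-1] / k
    right_mean = (csum[-1] - csum[:-1]) / (n - k)
    # Reduction in sum of squared errors when splitting after k points
    gain = k * (n - k) / n * (left_mean - right_mean) ** 2
    i = int(gain.argmax())
    return i + 1, float(gain[i])

def segment(x, min_gain, start=0):
    """Return sorted breakpoint indices (each is the first probe of a new segment)."""
    x = np.asarray(x, dtype=float)
    if x.size < 2:
        return []
    idx, gain = best_split(x)
    if gain < min_gain:                       # hypothetical stopping rule
        return []
    left = segment(x[:idx], min_gain, start)
    right = segment(x[idx:], min_gain, start + idx)
    return left + [start + idx] + right

# Synthetic profile: a single-copy gain between probes 50 and 80
profile = np.concatenate([np.zeros(50), np.full(30, 0.58), np.zeros(40)])
profile = profile + 0.01 * np.sin(np.arange(profile.size))  # small deterministic noise
print(segment(profile, min_gain=1.0))
```

Both the location and the number of breakpoints come out of the recursion, which addresses the two difficulties above, but the result depends heavily on the threshold; published methods replace it with a statistical test or a penalty.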


Several segmentation methods available

  • Initial methods

    • Median smoothing

    • EM mixture

    • Clustering

  • « Newer », well-known methods

    • HMM/EM

    • CBS


Several segmentation methods available

Lai, 2005


Complexity



Inter-array CGH segmentation

=> Interesting idea but too stringent!

Rueda et al., BMC Bioinformatics 2009



ABERRATIONS CALLING


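Theory : in a pure, diploid cell

Expected log2(ratio) levels along the chromosomal position :

  • 0.58 : gain of one copy (3 copies vs. 2)

  • 0 : normal (2 copies)

  • -1 : loss of one copy (1 copy vs. 2)

  • -∞ : loss of both copies (0 copies)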
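
These theoretical levels follow directly from log2(test copies / reference copies). A small sketch, with a hypothetical function name, assuming a pure sample and a diploid reference:

```python
import math

def expected_log2_ratio(test_copies, ref_copies=2):
    """Theoretical log2(ratio) for a pure sample against a diploid reference."""
    if test_copies == 0:
        return -math.inf   # homozygous deletion: no test signal at all
    return math.log2(test_copies / ref_copies)

for copies in (3, 2, 1, 0):
    print(copies, expected_log2_ratio(copies))
```

In real tumor samples, contamination by normal cells and non-diploid genomes pull the observed values toward zero, which is why calling thresholds are usually less extreme than these theoretical levels.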


Bioinformatic analysis




K-means clustering

Example on breast cancer data for K=2 and K=3
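
As a rough illustration of how samples can be grouped by their genomic profiles, here is a minimal K-means sketch on a samples × probes matrix of log2(ratio) values. The function name, the deterministic farthest-point initialization and the toy data are assumptions for the example; the slides do not say which implementation or settings were used.

```python
import numpy as np

def kmeans(profiles, k, n_iter=100):
    """Cluster rows of `profiles` (samples x probes) into k groups."""
    x = np.asarray(profiles, dtype=float)
    # Deterministic farthest-point initialization (an illustrative choice)
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[int(d.argmax())])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each sample to its nearest center
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (skip empty clusters)
        for j in range(k):
            members = x[new_labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centers

# Toy population: 4 samples with a gain on probes 0-4, 4 with a loss on probes 5-9
gain = np.zeros(10)
gain[:5] = 0.58
loss = np.zeros(10)
loss[5:] = -1.0
population = np.array([gain + 0.01 * i for i in range(4)] +
                      [loss - 0.01 * i for i in range(4)])
labels, _ = kmeans(population, k=2)
print(labels)
```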


NMF clustering (Brunet et al., 2004)

Example for a population of 103 breast cancers optimally clustered into 3 groups


Looking for minimal common regions (MCR)

In the multiclonal model of tumoral evolution, genes of interest (oncogenes, tumor suppressors, …) are more likely than others to lie in the overlap of aberrations shared by a sufficient number of genomic profiles.

Regions statistically found as potential MCR

The tool used for this purpose is STAC v1.2 (Diskin et al, 2006)

Common problem : CNVs as a contamination…


STAC : two different methods

  • Overall frequency of the aberration at any location (includes samples not shown).

  • Computed frequency confidence [0-1] (+ : sensitive / - : inefficient at edges)

  • Computed “footprint” confidence [0-1] (+ : equally efficient along the genome / - : less sensitive)

  • The bold black line at the bottom shows an MCR detected by either of the two methods, when its confidence reaches the confidence cut-off (0.95 by default).
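
The first track, the overall frequency of an aberration at each location, is simple to compute from per-sample calls. The sketch below covers only that raw frequency and a naive thresholding into candidate regions; it does not reproduce STAC's permutation-based confidences, and the function names and the `min_freq` threshold are assumptions for the example.

```python
import numpy as np

def aberration_frequency(calls):
    """Fraction of samples called aberrant at each location.

    calls: samples x locations array of 0/1 (or booleans).
    """
    return np.asarray(calls, dtype=bool).mean(axis=0)

def common_regions(freq, min_freq):
    """Return half-open (start, end) index pairs where freq >= min_freq."""
    above = np.concatenate(([False], np.asarray(freq) >= min_freq, [False]))
    steps = np.diff(above.astype(int))
    starts = np.flatnonzero(steps == 1)
    ends = np.flatnonzero(steps == -1)
    return list(zip(starts.tolist(), ends.tolist()))

# Four samples, six locations; 1 = gain called at that location
calls = [
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
]
freq = aberration_frequency(calls)
print(freq.tolist())
print(common_regions(freq, min_freq=0.75))
```

A fixed frequency threshold ignores how likely an overlap is by chance given the number and size of aberrations in each sample, which is the question the confidence scores above are meant to answer.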


Examples of MCRs found for a population of glioma samples (hybridized on 44K arrays)

=> MYCN found in a 663 Kb window.

=> HOX genes cluster found in a 143 Kb window.

=> HRAS found in a 351 Kb window.

=> Loss of CDKN2A and CDKN2B in a 1.2 Mb window.


Genomic annotation of aberrant regions

Partial example of a neuroblastoma cell-line


COMPARISON OF PAIRED SAMPLES


Direct profiles of the same patient at D0 & D21


Fitted profiles of the same patient at D0 & D21


Measured differences


SUPERVISED ANALYSIS : USE OF (CLINICAL) ANNOTATIONS



Comparing population anomalies to clinical annotations

Trivial example of the difference found on the ERBB2 locus when comparing ERBB2-amplified and non-amplified breast cancer populations.



Another example showing a characteristic gain of the BRAF locus in a BRAF-mutated population of melanoma.




CROSS-TECHNOLOGY CORRELATION

Why?

Detecting genes that simultaneously undergo genomic copy number variation and RNA expression variation can yield stronger candidates for the characterization of a pathology.

Because of molecular cascades in human pathways, gene expression analysis may preferentially reveal the downstream genes of a pathway. Correlating CGH and gene expression results from the same population may make it easier to focus on “upstream” genes.
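
A simple way to picture this correlation is a per-gene Pearson coefficient between copy number and expression across the same samples. The sketch below assumes both data sets have already been mapped to the same genes and samples (genes × samples matrices); the function name and the toy values are hypothetical, and the slides do not specify which statistic was actually used.

```python
import numpy as np

def per_gene_correlation(copy_number, expression):
    """Pearson correlation per gene (row) across matched samples (columns)."""
    cn = np.asarray(copy_number, dtype=float)
    ge = np.asarray(expression, dtype=float)
    cn_c = cn - cn.mean(axis=1, keepdims=True)
    ge_c = ge - ge.mean(axis=1, keepdims=True)
    num = (cn_c * ge_c).sum(axis=1)
    den = np.sqrt((cn_c ** 2).sum(axis=1) * (ge_c ** 2).sum(axis=1))
    # Genes with no variation in either data set get NaN instead of a division error
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)

# Three genes measured on four samples
cn = [[0.0, 0.5, 1.0, 1.5],   # gene A: copy number rises across samples
      [0.0, 0.5, 1.0, 1.5],   # gene B: same copy number pattern
      [0.5, 0.5, 0.5, 0.5]]   # gene C: no copy number change
ge = [[1.0, 2.0, 3.0, 4.0],   # gene A: expression follows copy number
      [4.0, 3.0, 2.0, 1.0],   # gene B: expression goes the other way
      [1.0, 2.0, 3.0, 4.0]]   # gene C: expression varies on its own
print(per_gene_correlation(cn, ge))
```

Genes like A, whose expression tracks their copy number, are the kind of candidates described above; gene C shows expression changes that cannot be attributed to its own copy number.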



CGH / GE correlation along the genome


Selected correlated genes

"Cheese plots" for the probe-specific simultaneous visualization of cross-technology correlation and differential expression.


Copy Number from NGS

  • Attempts to infer variations in copy number from the local read depth.

  • A strong GC% debiasing is required

  • An efficient alignment algorithm is also required (MAQ)
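
The read-depth idea and the GC correction can be sketched on pre-binned counts. This is a minimal illustration assuming reads are already aligned and counted per genomic bin, with the GC fraction of each bin known; the stratified-median correction, the function name and the `n_strata` parameter are assumptions for the example, not the procedure of the cited work.

```python
import numpy as np

def gc_corrected_log2(counts, gc_fraction, n_strata=10):
    """Return per-bin log2 ratios of GC-corrected read depth to the median depth."""
    counts = np.asarray(counts, dtype=float)
    gc = np.asarray(gc_fraction, dtype=float)
    # Group bins into GC strata (0-10%, 10-20%, ...)
    strata = np.minimum((gc * n_strata).astype(int), n_strata - 1)
    overall = np.median(counts)
    corrected = counts.copy()
    for s in np.unique(strata):
        in_s = strata == s
        stratum_median = np.median(counts[in_s])
        if stratum_median > 0:
            # Rescale so every GC stratum has the same median depth
            corrected[in_s] = counts[in_s] * overall / stratum_median
    with np.errstate(divide="ignore"):
        return np.log2(corrected / np.median(corrected))

# Eight bins: GC-rich bins get half the reads; one bin per group is duplicated
gc = [0.35, 0.35, 0.35, 0.35, 0.55, 0.55, 0.55, 0.55]
counts = [100, 100, 100, 200, 50, 50, 50, 100]
print(gc_corrected_log2(counts, gc).tolist())
```

Without the correction, the GC-rich bins in this toy example would all look like losses; after it, only the two genuinely duplicated bins stand out.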

Yoon, 2009

