Array-based Comparative Genomic Hybridization


Bastien JOB, 2010-10-19

slide2

Structural Genomics : sequence variations (CGHa, SNPa, DNAseq, mutations…)

Functional Genomics : gene expression / splicing… (GEa, Q-PCR, RNAseq…)

Proteomics (antibody arrays, 2D EP + MS/MS, HPLC + MS/MS, …)

[Figure: from genome to transcriptome to proteome. Labels: nucleus, cell membrane; DNA: gene (promoter and regulating sequences, exons, introns); transcription; RNA; splicing, editing; mRNA: transcript; miRNA; translation; protein; post-translational modification.]

slide3
History and context

Technical principle, classical designs

Description of oligo CGH arrays

Data preprocessing

Bioinformatic analysis

Cross-technology correlation

History and context

A CGH array is a method that aims to identify copy-number variations in the genomic content of a test sample, by comparison with a reference sample, using an array of (at least) thousands of measurement points along the genome.

A bit of history of cytogenomics

  • [196x] : Karyotyping
  • [1993] : Spectral karyotyping (SKY)
  • [199x] : CGH (comparative genomic hybridization) on chromosomes
  • [200x] : cDNA-based and BAC-based CGH array
  • [2005] : oligo-based CGH array

In cancer :

  • Profiling the patterns defined by these alterations for a patient or a pathology.
  • Exploring associations between some of these patterns and clinical annotations.

Other uses :

  • Developmental abnormalities, autism, diabetes, inter-individual CNVs (HapMap project), ...

It is an established method in the cancer research field, and is becoming established in the diagnostic field.

slide5

196x : Karyotype

1993 : SKY

199x : CGH on chr

200x : cDNA/BAC-based CGH array

2005 : Oligo-based CGH array

Rearrangements in tumors altering gene regulation

MYC – IgH translocation in Burkitt lymphoma

IMAGE CREDIT: Gregory Schuler, NCBI, NIH, Bethesda, MD, USA

Gene fusions are also common in prostate cancer (TMPRSS2-ETS fusions; Tomlins et al., Science 2005)

Chromosomal amplifications

EGFR amplification in lung cancer as HSR (homogeneously staining region)

EGFR amplification in lung cancer as several double minutes

Varella-Garcia et al, J Clin Pathol 2009

Common alterations across tumors and pathologies
  • Mutations activating / repressing pathways
  • Breakpoints creating duplications / amplifications / deletions / fusions
  • Known « master genes » like TP53, PTEN, CDKN2A/B, MYC, EGFR, FGF, …
  • Some are tissue-specific, others more widespread

Duplicated genes

Deleted genes

activation

repression


slide12

Designs (dual color)

  • For dual-channel CGH arrays, most of the time :

Test sample DNA (tumor) Cy5 -vs- Reference DNA (normal) Cy3

  • Most often, a sex-matched commercial normal DNA is used as the reference
    • Sex-matched : so that anomalies on gonosomes (sex chromosomes) can be read
    • « Outside » reference : polymorphisms (CNVs, « copy number variations ») show up in the profile
  • More rarely (cancer field) : the same person’s normal DNA is used
    • No polymorphism
    • Same origin ≈ same preparation
    • Some difficulties for blood DNA extraction
  • A « stable » cell line with a complete ploidy can also serve as the reference (e.g. Coriell NA10851)
  • More complex designs can be performed (circular, …)
slide13

CGH array simplified process on the platform : from sample to analysis

  • Samples : T (test), (R) (reference)
  • DNA extraction
  • Qualification & quantitation
  • Fragmentation & labelling
  • Hybridization on the oligo microarray
  • Scan, signals acquisition & normalization
  • Segmentation & visualization
  • Bioinformatic analyses


Long oligo Agilent CGH arrays

G2 : 244K Agilent oligo array. Spots : 60 µm (@ 5 µm/px)

G3 : 4 x 180K Agilent oligo array. Spots : 30 µm (@ 2 µm/px)

Available formats (for Human)

  • 2nd generation
    • 4 x 44K
    • 2 x 105K
    • 1 x 244K
  • 3rd generation (current)
    • 8 x 60K
    • 4 x 180K
    • 2 x 400K
    • 1 x 1M
  • Most formats also available for mouse and rat
  • Possibility to design one’s own custom array for any format
Short oligo Affymetrix SNP 6.0 array

  • 4x 906,600 SNP probes
  • 945,826 CN probes *
  • 25-mer oligos
  • ~700 b average interval
  • ~2 Kb real CN interval

* ~200,000 CNVs


Simplified bioinformatics analysis pipeline

[Figure: pipeline flowchart. Steps: signals acquisition, quality controls, normalization; genomic profile segmentation; description of the population; identification of genomic regions of interest; describing genomic contents (public databases + clinical annotations). Tools named on the figure: Feature Extraction v10.x, CBS, R, aCGH, STAC.]

Spot position identification
  • By 2D intensity histograms
  • By a circle (fixed / variable diameter)
  • Adaptive segmentation by random seed propagation

Credits : Pierre NEUVIAL (ENSAE)

Current oligo generation : perfect disc-shaped spots.

Spot extraction
Two methods :

  • Intensity segmentation
    • Isolation of the real signal from a local background
    • Needed for both signals
    • Needs a background correction method
    • Then a ratio can be computed
  • Linear regression (Novikov, 2004)
    • (1) First linear regression on all intensities
    • (2) Identification of outliers
    • (3) Sequential removal of outlier pixels
    • (4) Unbiased linear regression on the kept pixels
    • Can only be used when the background is fairly low and homogeneous.
    • The ratio is directly extracted as the slope.

Credits : Pierre NEUVIAL
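
The two spot extraction approaches listed above can be sketched as follows. This is a minimal illustration, not the published method: the function names, the use of medians, the 2-sigma outlier rule and the fit with an intercept are all assumptions made for the example.

```python
from statistics import median


def ratio_by_segmentation(test_fg, test_bg, ref_fg, ref_bg):
    """Ratio after subtracting a local background from each channel.

    Medians are an assumed (hypothetical) choice of summary statistic.
    """
    test_signal = median(test_fg) - median(test_bg)
    ref_signal = median(ref_fg) - median(ref_bg)
    if test_signal <= 0 or ref_signal <= 0:
        return None  # no usable signal above background
    return test_signal / ref_signal


def fit_line(pairs):
    """Ordinary least squares fit of test = slope * ref + intercept."""
    n = len(pairs)
    mean_ref = sum(r for r, _ in pairs) / n
    mean_test = sum(t for _, t in pairs) / n
    sxx = sum((r - mean_ref) ** 2 for r, _ in pairs)
    sxy = sum((r - mean_ref) * (t - mean_test) for r, t in pairs)
    slope = sxy / sxx
    return slope, mean_test - slope * mean_ref


def ratio_by_regression(ref_pixels, test_pixels, n_sigma=2.0, max_rounds=10):
    """Pixel-wise regression with iterative outlier removal; returns the slope."""
    pairs = list(zip(ref_pixels, test_pixels))
    slope, intercept = fit_line(pairs)                 # (1) fit on all pixels
    for _ in range(max_rounds):
        residuals = [t - (slope * r + intercept) for r, t in pairs]
        sd = (sum(e * e for e in residuals) / len(residuals)) ** 0.5
        limit = n_sigma * sd + 1e-9                    # small tolerance for rounding
        kept = [p for p, e in zip(pairs, residuals) if abs(e) <= limit]  # (2)
        if len(kept) == len(pairs) or len(kept) < 3:
            break
        pairs = kept                                   # (3) drop outlier pixels
        slope, intercept = fit_line(pairs)             # (4) refit on kept pixels
    return slope
```

With ten pixels lying on a line of slope 2 and one saturated pixel, the regression sketch discards the outlier and returns a ratio close to 2.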

slide27

Array quality controls (from Agilent)

General information and some parameters

Grid positioning check

Control of channels (signal, background, …)

Control of outliers (number and position)

Control of intensity distributions

Control of the randomness of signals

slide28

QC : Spatial homogeneity controls

Spatial representation of signals, background, log2(ratio), p-value, errors (…)

Distribution of signals and log2(ratio)

NORMALIZATION

Why ?

Some biases can be removed by specific algorithms

Spatial biases

  • Intensity gradients
  • Block effects
  • Print-tip bias
  • Local bias

Most of these biases are linked to spotted arrays

Spatial biases correction (example)

Credits : Pierre NEUVIAL

CENTRALIZATION

Why ?

Data generated by this method are relative values (ratio of a test versus a reference) : we lack information about the « real » normality level.

Centralization : an obvious example

Identifying the most probable normal genomic level is easy here, as we have a main central peak.

[Figure: distribution of the ratios (axes: Ratio, Frequency) and genomic profile (axes: Chromosomes, Log2(ratio)).]
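
A minimal sketch of centralization on the main peak, assuming the most populated bin of the log2(ratio) histogram is the normal level. The function name `centralize` and the `bin_width` parameter are hypothetical, and the assumption itself can fail for complex tumor profiles.

```python
from collections import Counter


def centralize(log2_ratios, bin_width=0.05):
    """Shift a profile so that its most populated log2(ratio) bin sits at 0.

    Returns (centered_values, estimated_center).
    """
    bins = Counter(round(v / bin_width) for v in log2_ratios)
    # Most populated bin; ties are broken in favour of the bin closest to 0.
    top_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -abs(kv[0])))
    members = [v for v in log2_ratios if round(v / bin_width) == top_bin]
    center = sum(members) / len(members)
    return [v - center for v in log2_ratios], center
```

For a profile whose main peak sits at 0.2, the whole profile is shifted down by 0.2, so a segment at 0.78 ends up at 0.58.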

Centralization : a cancer example

It is much more difficult here, due to the higher complexity of the distribution / profile…

[Figure: distribution of the ratios (axes: Ratio, Frequency) and genomic profile (axes: Chromosomes, Log2(ratio)).]

GENOMIC PROFILE VISUALIZATION & DATA SEGMENTATION

Why segment ?

Data reduction : the data obtained are a list of hundreds of thousands of values. However, a genomic profile can be simplified to a limited list of segments considered as abnormal.

A normalized, centered, segmented genomic profile with called aberrations

Example taken from a breast cancer profile

slide49

Challenge : identifying breakpoints

  • Data consist of a continuous log2(ratio) distribution
  • Two main difficulties :
    • The location of breakpoints is unknown by default
    • So is their number
  • Two general models :
    • Homoscedastic (m) : segments differ by their mean only
    • Heteroscedastic (m, V) : segments differ by their mean and variance
slide50

Several segmentation methods available

  • Initial methods

Median smoothing

EM mixture

clustering

  • « Newer », well-known methods

HMM/EM

CBS
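
To make the idea of segmentation concrete, here is a simplified recursive binary segmentation under the homoscedastic model (piecewise-constant mean, common variance). It is a stand-in for illustration only, not CBS itself: CBS uses a circular statistic and permutation-based significance. The names `estimate_noise_sd`, `best_split` and `segment`, and the `min_size` and `min_z` thresholds, are hypothetical.

```python
from statistics import mean, median


def estimate_noise_sd(values):
    """Robust noise estimate from differences between neighbouring probes."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return median(diffs) / (0.6745 * 2 ** 0.5)


def best_split(values, min_size):
    """Return (score, index) of the split with the largest between-segment
    sum of squares; index is None when no split is possible."""
    n = len(values)
    total = sum(values)
    best_score, best_idx = 0.0, None
    left = sum(values[:min_size - 1])
    for i in range(min_size, n - min_size + 1):
        left += values[i - 1]  # left now holds sum(values[:i])
        diff = left / i - (total - left) / (n - i)
        score = diff * diff * i * (n - i) / n
        if score > best_score:
            best_score, best_idx = score, i
    return best_score, best_idx


def segment(values, sd, min_size=3, min_z=5.0, offset=0):
    """Recursively split a profile; returns (start, end, mean), end exclusive."""
    score, idx = best_split(values, min_size)
    if idx is None or score < (min_z * sd) ** 2:
        return [(offset, offset + len(values), mean(values))]
    return (segment(values[:idx], sd, min_size, min_z, offset)
            + segment(values[idx:], sd, min_size, min_z, offset + idx))
```

Called as `segment(profile, estimate_noise_sd(profile))`, the sketch returns one tuple per segment, which is the "limited list of segments" that later steps work on.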

slide56

Inter-array CGH segmentation

=> Interesting idea but too stringent !

Rueda et al., BMC Bioinformatics 2009

slide59

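Theory : in a pure, diploid cell

[Figure: expected log2(ratio) levels along the chromosomal position : 0.58 (gain of one copy, 3 copies), 0 (normal, 2 copies), -1 (loss of one copy, 1 copy), -∞ (homozygous deletion, 0 copies).]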
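
The theoretical levels follow directly from log2(copies / 2). The sketch below reproduces them; the `purity` parameter is an added assumption (a simple mixture of tumor cells with normal diploid cells) that shows why the levels only hold for a pure sample.

```python
import math


def expected_log2_ratio(copies, purity=1.0, normal_copies=2):
    """Expected log2(ratio) for a region present in `copies` copies.

    purity: assumed fraction of cells carrying the alteration; the
    remaining cells are taken to carry `normal_copies` copies.
    """
    mixed = purity * copies + (1.0 - purity) * normal_copies
    if mixed <= 0:
        return -math.inf
    return math.log2(mixed / normal_copies)
```

For a pure diploid sample this gives about 0.58, 0, -1 and -∞ for 3, 2, 1 and 0 copies. With 50 % normal cells, a homozygous deletion shows up at -1 instead of -∞.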


slide64

K-means clustering

Example on breast cancer data for K=2 and K=3
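
For reference, a bare-bones K-means on segmented profiles could look like the sketch below. The farthest-first initialization, the squared Euclidean distance and the function names are assumptions for the example; the analysis shown on the slide may have used different settings.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, n_iter=100):
    """Cluster profiles (lists of log2 ratios) into k groups.

    Returns (labels, centers). Initialization is deterministic:
    the first profile, then the profile farthest from chosen centers.
    """
    centers = [list(profiles[0])]
    while len(centers) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```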

slide65

NMF clustering (Brunet et al, 2004)

Example for a population of 103 breast cancers optimally clustered into 3 groups

Looking for minimal common regions (MCR)

In the multiclonal model of tumoral evolution, genes of interest (oncogenes, tumor suppressors, …) have a higher probability to be found more frequently than others in the overlap of aberrations defined by a sufficient number of genomic profiles.

Regions statistically found as potential MCR

The tool used for this purpose is STAC v1.2 (Diskin et al, 2006)

Common problem : CNVs as a contamination…

STAC : Two different methods
  • Overall frequency of the aberration at any location (includes samples not shown).
  • Computed frequency confidence [0-1] (+ : sensitive / - : Inefficient at edges)
  • Computed “footprint” confidence [0-1] (+ : equally efficient along the genome / - : less sensitive)
  • The bold black line at the bottom shows a detected MCR found for any of the two methods, when its confidence reaches a confidence cut-off (0.95 by default).
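
Only the first track, the overall frequency of an aberration at each location, is simple enough to sketch here. The code below computes it from per-sample binary calls and lists the locations above a fixed frequency cut-off. It does not reproduce STAC's frequency or footprint confidences; the function names and the `min_freq` parameter are hypothetical.

```python
def aberration_frequency(calls):
    """calls: one list per sample, 1 where the location is aberrant, else 0.

    Returns the fraction of samples carrying the aberration at each location.
    """
    n_samples = len(calls)
    return [sum(column) / n_samples for column in zip(*calls)]


def candidate_regions(freq, min_freq):
    """Return (start, end) index ranges, end exclusive, where freq >= min_freq."""
    regions, start = [], None
    for i, f in enumerate(freq):
        if f >= min_freq and start is None:
            start = i
        elif f < min_freq and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(freq)))
    return regions
```

A region shared by three of four samples shows a frequency of 0.75 and is reported as a candidate with `min_freq=0.75`.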
Examples of MCRs found for a population of glioma samples (hybridized on 44K arrays)

=> MYCN found in a 663 Kb window.

=> HOX genes cluster found in a 143 Kb window.

=> HRAS found in a 351 Kb window.

=> Loss of CDKN2A and CDKN2B in a 1.2 Mb window.

Genomic annotation of aberrant regions

Partial example of a neuroblastoma cell-line

Comparing population anomalies to clinical annotations

Trivial example of the difference found on the ERBB2 locus when comparing ERBB2-amplified and non-amplified breast cancer populations.

Comparing population anomalies to clinical annotations

Another example showing a characteristic gain of the BRAF locus in a BRAF-mutated population of melanoma.


CROSS-TECHNOLOGY CORRELATION

Why?

Detecting genes that simultaneously undergo genomic copy number variation and RNA expression variation can yield stronger candidates for characterizing a pathology.

Because of molecular cascades in human pathways, gene expression analysis may preferentially highlight genes that lie downstream in a pathway. Correlating CGH and gene expression results from the same population may make it easier to focus on “upstream” genes.
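
One simple way to look for such genes is a per-gene Pearson correlation between copy number and expression across the same samples. The sketch below assumes both data sets are already summarized per gene and ordered by sample; the function names, the `min_r` cut-off and the gene labels in the test are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / (sxx * syy) ** 0.5


def correlated_genes(copy_number, expression, min_r=0.8):
    """Genes whose log2 copy-number ratio tracks their expression.

    copy_number, expression: dicts mapping gene name -> per-sample values,
    with samples in the same order in both dicts.
    """
    hits = {}
    for gene in copy_number.keys() & expression.keys():
        r = pearson(copy_number[gene], expression[gene])
        if r >= min_r:
            hits[gene] = r
    return hits
```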


Selected correlated genes

"Cheese plots" for the probe-specific simultaneous visualization of cross-technology correlation and differential expression.

Copy Number from NGS
  • Attempts to infer variations in copy number from the local read depth.
  • A strong GC% debiasing is required
  • An efficient alignment algorithm is also required (MAQ)

Yoon, 2009
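
A rough sketch of read-depth copy number with GC debiasing, assuming reads have already been aligned and counted in fixed genomic bins. Each bin is compared with the median count of bins of similar GC content. This stratified-median correction, the function name and the `gc_step` parameter are assumptions for the example and are not taken from the cited paper.

```python
import math
from collections import defaultdict
from statistics import median


def gc_corrected_log2(read_counts, gc_fractions, gc_step=0.05):
    """log2 of each bin's read count relative to bins of similar GC content.

    read_counts: reads per genomic bin; gc_fractions: GC fraction (0..1) per bin.
    """
    by_gc = defaultdict(list)
    for count, gc in zip(read_counts, gc_fractions):
        by_gc[round(gc / gc_step)].append(count)
    expected = {stratum: median(counts) for stratum, counts in by_gc.items()}

    result = []
    for count, gc in zip(read_counts, gc_fractions):
        reference = expected[round(gc / gc_step)]
        if reference == 0:
            result.append(math.nan)       # no usable depth in this GC stratum
        elif count == 0:
            result.append(-math.inf)      # no reads in this bin
        else:
            result.append(math.log2(count / reference))
    return result
```

In this toy setting, a bin with 150 reads among low-GC bins at 100 and a bin with 75 reads among high-GC bins at 50 both come out near 0.58, although their raw depths differ twofold.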
