Introduction

Detection of structural variants and copy number alterations in cancer: from computational strategies to the discovery of chromothripsis in neuroblastoma

Valentina Boeva (1,2,3), Bruno Zeitouni (1,2,3), Tatiana Popova (1,2,3), Kevin Bleakley (1,2,3), Andrei Zinovyev (1,2,3), Jean-Philippe Vert (1,2,3), Isabelle Janoueix-Lerosey (3,4), Olivier Delattre (3,4) and Emmanuel Barillot (1,2,3)

E-mail: valentina.boeva@curie.fr

(1) Inserm U900, 75248 Paris, France
(2) Mines ParisTech, Fontainebleau, F-77300 France
(3) Institut Curie, 26, rue d’Ulm, 75248 Paris, France
(4) Inserm U830, 75248 Paris, France

Introduction

In many studies that apply deep sequencing to cancer genomes, one has to calculate copy number profiles (CNPs) and predict regions of gain and loss. Two obstacles are frequent in the analysis of cancer genomes: the absence of an appropriate control sample from normal tissue, and possible polyploidy. We therefore developed Control-FREEC [1,2], which automatically detects Copy Number Alterations (CNAs), with or without a control dataset, and Loss of Heterozygosity (LOH) regions. For mate-pair/paired-end mapping (PEM) data, one can complement the information about CNAs (i.e., the output of Control-FREEC) with the predictions of Structural Variants (SVs) made by another tool that we developed, SVDetect [3]. Here we used a combination of Control-FREEC and SVDetect (http://bioinfo-out.curie.fr/projects/freec/sv.html) on neuroblastoma samples to (1) refine the coordinates of CNAs using PEM data and (2) improve confidence in calling true positive rearrangements (particularly in ambiguous satellite/repetitive regions).

CNA & LOH detection (FREEC)

• Window size selection
• Calculation of the dependency function “RC vs GC-content” or “RC sample vs RC control” (RC: read count)

The window size is $W = L / (T \cdot CV^2)$, where L = genome length, T = total number of reads, and CV = user-defined Coefficient of Variation.
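The window-size formula above can be sketched in a few lines. This is only an illustration of the arithmetic, not Control-FREEC code: the function name `window_size`, the rounding to whole base pairs, and the example numbers are our assumptions.

```python
def window_size(genome_length: int, total_reads: int, cv: float) -> int:
    """Evaluate W = L / (T * CV^2), rounded to a whole number of base pairs.

    genome_length (L), total_reads (T) and cv (CV) follow the symbols in the
    text; the rounding is an assumption made for this sketch.
    """
    if genome_length <= 0 or total_reads <= 0 or cv <= 0:
        raise ValueError("L, T and CV must all be positive")
    return round(genome_length / (total_reads * cv ** 2))


# Illustrative numbers only (not taken from the poster):
# a 3 Gb genome, 60 million reads, CV = 0.05.
w = window_size(3_000_000_000, 60_000_000, 0.05)
reads_per_window = 60_000_000 * w / 3_000_000_000
print(w, reads_per_window)
```

With these illustrative inputs the mean number of reads per window, T·W/L, equals 1/CV². Halving the CV therefore quadruples the window size for the same sequencing depth.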
Control-FREEC features:
• Detection of CNA regions
• Detection of LOH regions
• Possibility to work without a control sample
• Possibility to set tumor ploidy
• Automatic window selection
• Use of mappability information
• Evaluation of, and adjustment for, contamination of tumor samples by normal cells
• Possibility to work with exome data
• Possibility to cross the output with the output of SVDetect

Fitting the dependency function

To find the best polynomial fit (shown in black in panels A-D), we first initialize the polynomial's parameters (median value of RC for GC-content). We then optimize the polynomial's parameters by iteratively selecting the data points related to P-copy regions (P being the ploidy) and making a least-squares fit on them.

Calculation of BAF profiles

[Figure: normalized copy number and B allele frequency profiles.] B allele frequency (BAF) profiles are annotated using a Gaussian mixture model fit.

Segmentation

Segmentation is done by a LASSO-based algorithm suggested by Harchaoui and Lévy-Leduc (2008).

Adjustment for a possible contamination by normal cells

Control-FREEC uses the following formula to evaluate the fraction of contaminating normal cells p, and then corrects the copy number profiles: $NRC_i \approx E_i + (1 - E_i)\,p$, where $NRC_i$ is the normalized read count in window i and $E_i$ is the expected ratio in window i.

Results and graphical visualization

• List of gains and losses with assigned copy numbers
• Visualization in R
• Creation of different file format outputs for graphical visualization: Circos, UCSC Genome Browser (BedGraph)

Detection of SVs (SVDetect) and discovery of chromothripsis in neuroblastoma

For mate-pair/paired-end mapping (PEM) data, one can complement the information about copy number changes (i.e., the output of FREEC) with the predictions of structural variants (SVs) made by SVDetect [3].
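One way to picture how SV predictions can complement CNA calls is to snap each CNA boundary to the nearest predicted SV breakpoint on the same chromosome. The sketch below is a hypothetical illustration, not the procedure implemented in Control-FREEC or SVDetect: the tuple layout, the function names, and the 10 kb tolerance are all assumptions.

```python
from bisect import bisect_left


def nearest_breakpoint(sorted_positions, pos):
    """Return the position in sorted_positions closest to pos, or None if empty."""
    if not sorted_positions:
        return None
    i = bisect_left(sorted_positions, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs(p - pos))


def refine_cna_boundaries(cnas, sv_breakpoints, tolerance=10_000):
    """Snap CNA boundaries to nearby SV breakpoints (hypothetical data layout).

    cnas: list of (chrom, start, end, copy_number) tuples.
    sv_breakpoints: dict mapping chrom -> list of breakpoint positions.
    tolerance: maximum distance (bp) for a boundary to be moved; assumed value.
    """
    sorted_bps = {chrom: sorted(ps) for chrom, ps in sv_breakpoints.items()}
    refined = []
    for chrom, start, end, copy_number in cnas:
        bps = sorted_bps.get(chrom, [])
        new_bounds = []
        for boundary in (start, end):
            hit = nearest_breakpoint(bps, boundary)
            if hit is not None and abs(hit - boundary) <= tolerance:
                new_bounds.append(hit)       # supported by a PEM breakpoint
            else:
                new_bounds.append(boundary)  # no nearby breakpoint: keep as is
        refined.append((chrom, new_bounds[0], new_bounds[1], copy_number))
    return refined


# Made-up coordinates for illustration only.
cnas = [("chr1", 100_000, 250_000, 3), ("chr2", 500_000, 800_000, 1)]
svs = {"chr1": [104_200, 251_500, 900_000]}
print(refine_cna_boundaries(cnas, svs))
```

In this toy input the chr1 gain is moved onto the two nearby breakpoints, while the chr2 loss has no SV support and is left unchanged. A tolerance on the order of one window size is a natural choice, since CNA boundaries are only known at window resolution.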
Automatic intersection of Control-FREEC and SVDetect outputs allows one to:
• Refine coordinates of CNAs using PEMs
• Filter out false predictions of SVDetect (often in ambiguous satellite/repetitive regions)

SVDetect [3] is a tool that allows the user to:
• Identify candidate SVs using the clustering of discordant PEMs
• Predict the type of an SV using the PEM signature
• Filter out PEMs inconsistent with the main signature of the predicted SV
• Compare SVs predicted for different samples
• Create different file format outputs for graphical visualization of predicted SVs

[Figure: illustrations of read signatures for SV type prediction (implemented in SVDetect [3]).]

We investigated somatic rearrangements in two neuroblastoma cell lines and two primary tumors using paired-end sequencing of mate-pair libraries.

Neuroblastoma cell lines

[Figure: intra-chromosomal and inter-chromosomal SVs in the CLB-GA and CLB-RE cell lines.]

Primary neuroblastoma tumors with chromothripsis

[Figure: Circos representation of SVs predicted by SVDetect and confirmed by the CNAs identified by Control-FREEC. (A-C) NB1141, (D-F) NB1142. (A, D) whole genome view, (B, E) zoom on chromothripsis, (C, F) copy number profile for chr1 of NB1141 and chr6 of NB1142.]

1. Boeva V, et al. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics, 2011; 27(2):268-9. http://bioinfo-out.curie.fr/projects/freec/
2. Boeva V, et al. Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data. Bioinformatics, 2012; 28(3):423-5.
3. Zeitouni B, et al. SVDetect - a bioinformatic tool to identify genomic structural variations from paired-end next-generation sequencing data. Bioinformatics, 2010; 26:1895-1896. http://svdetect.sourceforge.net
