Tumor Genome Sequencing Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520
Cancer • Cancer will affect 1 in 2 men and 1 in 3 women in the United States, and the number of new cases of cancer is set to nearly double by the year 2050. • Cancer is a genetic disease caused by mutations in the DNA • Clinically, tumors can look the same, but most differ genetically.
Mutations in the Tumor Genome • Help us identify important genes for tumorigenesis and cancer progression • Drivers – a.k.a. gatekeepers, mutations that cause and accelerate cancers • Passengers – accidental by-products of thwarted DNA-repair mechanisms • Recurrent mutations in genes or pathways are likely drivers
High Throughput Driver Detection • Differential gene expression • Copy number aberration (CNA) or variation (CNV) using CGH, tiling or SNP arrays
GISTIC • G-score: combines the frequency of occurrence and the amplitude of the aberration • Statistical significance evaluated by permutation • FDR adjustment for multiple hypothesis testing
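The three bullets above can be sketched in a few lines of Python. This is a simplified illustration, not the GISTIC implementation: it assumes a samples x markers matrix of log2 copy ratios, defines the G-score of a marker as the summed amplitude above a threshold divided by the number of samples (frequency times mean amplitude), builds the null by shuffling marker positions within each sample, and applies Benjamini-Hochberg FDR. The threshold `0.3`, the toy data, and all function names are assumptions made for the example.

```python
import random


def g_scores(cn, threshold=0.3):
    """G-score per marker: frequency of amplification x mean amplitude,
    i.e. the sum of amplitudes above `threshold` divided by the sample count."""
    n_samples = len(cn)
    n_markers = len(cn[0])
    return [
        sum(row[j] for row in cn if row[j] > threshold) / n_samples
        for j in range(n_markers)
    ]


def permutation_pvalues(cn, threshold=0.3, n_perm=200, seed=0):
    """Null distribution: shuffle marker positions within each sample,
    recompute G-scores, and pool them across markers and permutations."""
    rng = random.Random(seed)
    observed = g_scores(cn, threshold)
    null = []
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in cn]
        null.extend(g_scores(shuffled, threshold))
    # Add-one correction so that no p-value is exactly zero.
    pvals = [
        (1 + sum(1 for g in null if g >= obs)) / (1 + len(null))
        for obs in observed
    ]
    return observed, pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


# Toy data (made up): 20 samples x 50 markers of noise,
# with marker 10 amplified in the first 12 samples.
data_rng = random.Random(1)
cn = [[data_rng.gauss(0.0, 0.1) for _ in range(50)] for _ in range(20)]
for row in cn[:12]:
    row[10] += 1.0

observed, pvals = permutation_pvalues(cn)
qvals = benjamini_hochberg(pvals)
top = max(range(len(observed)), key=lambda j: observed[j])
print("top marker:", top, "G-score:", round(observed[top], 3), "q:", qvals[top])
```

Deletions can be scored the same way by flipping the sign of the matrix before calling `g_scores`.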
Two Major Cancer Genome Projects • TCGA: The Cancer Genome Atlas • US funded • ~20 cancer types × a few hundred tumor samples each • Genome, transcriptome, DNA methylome, proteomics • Rigorous tumor sample QC, consistent profiling platform • ICGC: International Cancer Genome Consortium • 11 countries • 20 cancer types × 500 tumor samples each
Different Sequencing Approaches • Capture-seq ($400-600) • Can focus on well-known mutations • Exome-seq ($700-2K) • All the exons in genes; promoters and lncRNA genes? • RNA-seq ($500-2K) • Expression and mutations together; miss anything? • Whole genome sequencing ($3-4K) • Majority of mutations are non-coding, function unknown • Better at detecting structural changes (translocations, fusions) • Cost-vs-benefit balance
MAF and VCF Formats • VCF (Variant Call Format, GWAS format) and MAF (Mutation Annotation Format, TCGA format) • Both can annotate somatic mutations and germline variants • Tab-delimited text files • VCF columns: CHROM, POS, ID (SNP id, gene symbol, or ENTREZ gene id), REF (reference sequence), ALT (altered sequence), QUAL (quality score), FILTER (PASS, or the failed filters, e.g. “q10;s50”: quality below 10; fewer than 50% of samples have data here), INFO (allele counts, total counts, number of samples with data, somatic or not, validated, etc.)
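To make the column layout concrete, here is a minimal sketch that splits one VCF data line into the eight fixed columns listed above. The two example records are made up for illustration, and `parse_vcf_line` and `passed_filters` are hypothetical helper names; for real files a dedicated VCF library is the safer choice.

```python
def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into its eight fixed columns."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # Keys without "=" are flags (e.g. SOMATIC); store them as True.
            info_dict[key] = value if sep else True
    return {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),           # several ALT alleles are comma-separated
        "QUAL": None if qual == "." else float(qual),
        "FILTER": flt.split(";"),        # ["PASS"] or the list of failed filters
        "INFO": info_dict,
    }


def passed_filters(record):
    """True when the FILTER column is exactly PASS."""
    return record["FILTER"] == ["PASS"]


# Illustrative records only (not taken from a real call set).
passing = "\t".join(["20", "14370", "rs6054257", "G", "A", "29", "PASS",
                     "NS=3;DP=14;AF=0.5;SOMATIC"])
failing = "\t".join(["20", "17330", ".", "T", "A", "3", "q10;s50",
                     "NS=1;DP=11;AF=0.017"])

good = parse_vcf_line(passing)
bad = parse_vcf_line(failing)
print(good["CHROM"], good["POS"], good["REF"], good["ALT"], passed_filters(good))
print(bad["FILTER"], passed_filters(bad))
```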
GATK • https://www.broadinstitute.org/gatk/guide/best-practices • Workflow: FASTQ -> BAM (alignment), BAM -> VCF (variant calling), annotate
Example of a Cancer Genome Mutation Profile • Circos plot: how messed up a cancer genome is
Total alterations affecting protein-coding genes in selected tumors Vogelstein et al, Science 2013
Somatic Mutation Frequency in 3K Tumor-Normal Pairs • Typical tumors: median 45 mutations / tumor • More mutations in tumors of tissues exposed to the outside environment (e.g. lung, skin)
Mutation Rate Heterogeneity • Mutation rate correlated with replication timing, gene expression, and gene length • Tumor evolution and selection
TS vs Oncogenes, GoF vs LoF • Tumor suppressors vs oncogenes • Gain of Function (GoF) or Loss of Function (LoF) mutations • Phenotypes • How to tell? • From mutation patterns • From expression patterns • Functional studies • Some genes can be both TS and oncogenes
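One published heuristic for the "from mutation patterns" bullet is the 20/20 rule of Vogelstein et al. (Science 2013): a gene looks like an oncogene if more than 20% of its recorded mutations are missense changes at recurrent positions, and like a tumor suppressor if more than 20% are inactivating. The sketch below is an illustration under assumptions: the slide does not say which rule it has in mind, and the input format, the set of "inactivating" labels, the definition of "recurrent" as a position hit at least twice, and the toy genes are all made up here.

```python
from collections import Counter

# Mutation types counted as inactivating (an assumption for this sketch).
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


def classify_gene(mutations, cutoff=0.20):
    """Apply a 20/20-style rule to a list of (protein_position, type) tuples."""
    total = len(mutations)
    if total == 0:
        return {"oncogene_like": False, "suppressor_like": False}
    # Count missense mutations per position; "recurrent" means seen >= 2 times.
    missense_by_pos = Counter(pos for pos, kind in mutations if kind == "missense")
    recurrent_missense = sum(n for n in missense_by_pos.values() if n >= 2)
    inactivating = sum(1 for _, kind in mutations if kind in INACTIVATING)
    return {
        "oncogene_like": recurrent_missense / total > cutoff,
        "suppressor_like": inactivating / total > cutoff,
    }


# Toy examples: a hotspot-dominated gene and a gene with scattered truncations.
hotspot_gene = [(600, "missense")] * 8 + [(12, "missense"), (301, "missense")]
truncated_gene = (
    [(p, "nonsense") for p in (40, 95, 130, 212, 388)]
    + [(p, "frameshift") for p in (17, 150, 402)]
    + [(66, "missense"), (290, "missense")]
)

print(classify_gene(hotspot_gene))
print(classify_gene(truncated_gene))
```

Returning two independent flags rather than one label leaves room for genes that satisfy both criteria, in line with the last bullet of the slide.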
Mutual Exclusivity and Co-occurrence • Most cancers have >=2 sequential mutations developed over many years. • Mutations in different pathways can co-occur in the same cancer, whereas genes in the same pathway are rarely mutated together in the same sample.
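A common way to test a gene pair for mutual exclusivity or co-occurrence is a one-sided Fisher's exact test on the 2x2 table of mutated vs. unmutated samples. The slide does not name a test, so treat this as one reasonable choice; the function name and the counts in the example are invented for illustration.

```python
from math import comb


def fisher_tails(both, a_only, b_only, neither):
    """One-sided Fisher's exact p-values for a 2x2 co-mutation table.

    Returns (p_exclusive, p_cooccur): the probability of seeing at most /
    at least `both` co-mutated samples if the two genes were independent.
    """
    n = both + a_only + b_only + neither
    a_total = both + a_only          # samples with gene A mutated
    b_total = both + b_only          # samples with gene B mutated

    def pmf(k):
        # Hypergeometric probability of exactly k co-mutated samples.
        # math.comb returns 0 when the lower argument exceeds the upper one.
        return comb(a_total, k) * comb(n - a_total, b_total - k) / comb(n, b_total)

    k_max = min(a_total, b_total)
    p_exclusive = sum(pmf(k) for k in range(0, both + 1))
    p_cooccur = sum(pmf(k) for k in range(both, k_max + 1))
    return p_exclusive, p_cooccur


# Toy cohort of 100 tumors: A and B are each mutated in 30, but overlap in only 1.
p_excl, p_cooc = fisher_tails(both=1, a_only=29, b_only=29, neither=41)
print("mutual exclusivity p =", p_excl)
print("co-occurrence p =", p_cooc)
```

With many gene pairs the resulting p-values need a multiple-testing correction before any pair is called significant.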
How Much Should We Sequence? • Need ~200 patients for genes mutated in 20% of patients, ~550 patients for 10%, ~1200 patients for 5%. • Most driver mutations have been found; the pressing need in basic cancer research is to study their function • Biggest surprise: mutations in chromatin regulators • > 50% new and strong cancer driver genes • Oncogenes: DNMT3A, IDH1 • Tumor suppressors: MLL, ATRX, ARID1A, SNF5 • Both: EZH2
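The direction of the sample-size bullet (rarer drivers need larger cohorts) can be illustrated with a simple binomial calculation. This sketch reads the percentages as the fraction of patients carrying a mutation in a gene, which is an assumption, and it asks only how many patients are needed to observe at least `k` mutated tumors with a given probability. It ignores the background mutation rate and genome-wide significance, so it is not expected to reproduce the numbers on the slide; `k=5` and `power=0.9` are arbitrary illustrative settings.

```python
from math import comb


def prob_at_least(n, f, k):
    """P(X >= k) for X ~ Binomial(n, f): at least k of n tumors carry the mutation."""
    return 1.0 - sum(comb(n, i) * f**i * (1.0 - f) ** (n - i) for i in range(k))


def min_cohort(f, k, power=0.9, n_max=5000):
    """Smallest cohort size with P(at least k mutated tumors) >= power, else None."""
    for n in range(k, n_max + 1):
        if prob_at_least(n, f, k) >= power:
            return n
    return None


for freq in (0.20, 0.10, 0.05):
    print(f"frequency {freq:.0%}: need about {min_cohort(freq, k=5)} patients")
```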
Resources • MSKCC cBioPortal • GUI for experimental biologists • Broad Firehose • API for accessing processed TCGA data • UCSC CGHub • API for accessing raw and processed cancer data • Sanger COSMIC • Catalogue of Somatic Mutations in Cancer • Many also provide software tools
Summary • Different sequencing approaches • Different mutation types and distributions • Gain or loss of function mutations • Tumor suppressor vs oncogenes • Cancer pathways or hallmarks • Mutation co-occurrence and mutual exclusivity • How to study the functions of the mutations?
Acknowledgement • Aleksandar Milosavljevic • John Pack • Cheng Li • Xujun Wang