
Analyzing Copy Number Variation in the Human Genome

Jeff Bailey, S5-432. Analyzing Copy Number Variation in the Human Genome. Continuum of genomic variation: forms of genetic variation at the nucleotide end include single base-pair changes (point mutations, 1 per 800 bp), small insertions/deletions (frameshift, microsatellite, minisatellite) and mobile elements.


Presentation Transcript


  1. Jeff Bailey, S5-432: Analyzing Copy Number Variation in the Human Genome

  2. Continuum of Genomic Variation: forms of genetic variation
     • Single base-pair changes: point mutations (1 per 800 bp)
     • Small insertions/deletions: frameshift, microsatellite, minisatellite
     • Mobile elements: retroelement insertions (300 bp to 10 kb)
     • Large-scale genomic copy number variation (>10 kb): large-scale deletions, segmental duplications, local rearrangements
     • Chromosomal variation: translocation, inversion, fusion
     [Figure labels along the continuum: Nucleotide, Copy Number Variation, Structural Variants (SV), Cytogenetics]

  3. Method 1: Copy Number Variation by Array Comparative Genomic Hybridization (array CGH)
     • Two genomic surveys of normal individuals identified 76 and 255 CNV regions by array CGH (Sebat et al. Science 2004; Iafrate et al. Nat Genet 2004)
     • 30% of CNVs overlap duplicated regions (variant SD = CNV; blue line in figure) (Sebat et al. Science 2004)
     [Figure labels: Gain, >green, >red, Loss. Modified from Feuk et al. Nat Rev Genet 2006]
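
Array CGH compares the hybridization signal of a test genome against a reference genome at each probe, and gains and losses are usually read off the log2 ratio of the two. The sketch below is a minimal illustration of that idea, not the calling procedure used in the cited surveys; the function name `call_probe` and the 0.3 threshold are assumptions made for the example.

```python
from math import log2


def call_probe(test_intensity, reference_intensity, threshold=0.3):
    """Classify one probe as gain, loss, or neutral from its log2 ratio.

    The threshold is a hypothetical value chosen for illustration only.
    """
    ratio = log2(test_intensity / reference_intensity)
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "neutral"


# With ideal signal proportional to copy number, 3 copies against 2
# gives log2(1.5) and 1 copy against 2 gives log2(0.5) = -1.
for test_copies in (1, 2, 3):
    print(test_copies, call_probe(test_copies, 2))
```

Real arrays compress these ratios and add noise, so calls are normally made on runs of adjacent probes rather than on single probes.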

  4. Segmental Duplications (SD)
     • 5.4% of the genome (>90% identity and >1 kb)
     • Properties: clustered; complex regions
     • chr22 figure annotation: 99.1% identical over 180 kb (VCF/DiGeorge syndrome, 1 in 3000 births)
     (Bailey and Eichler 2006 Nat Rev Genet)

  5. SDs predispose to copy number variation
     • Non-allelic homologous recombination (Lupski, 1999)
     • Change in dosage-sensitive genes → phenotype or disease
     • Dynamic regions: predisposed to further rearrangements
     [Figure: duplications D and D' and region I drawn between centromere (Cen) and telomere (Tel), with the resulting gametes labeled D'-D and D-D']

  6. Complex disease associations
     1) Recurrent germline rearrangements causing congenital disease
     2) Rare CNVs causing disease in a small proportion of affected individuals in a Mendelian fashion
     3) Common CNVs that are responsible for a proportion of complex genetic risk in many individuals

  7. Method 2: End-Sequence Pair (ESP) Analysis
     • ~1.1 million fosmid end-sequence pairs derived from a single donor (sequenced by MIT to help close gaps in the reference genome)
     • Fosmid insert size tightly distributed around the mean (40 kb)
     • Compare optimal fosmid placements on the reference genome to detect deviations from the expected spacing and orientation
     • Placement classes: concordant, insertion, deletion, inversion
     • Ends placed >48 kb apart: putative deletion within fosmid; ends placed <32 kb apart: putative insertion within fosmid
     • Dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage); 639,204 BEST pairs (8.8X genome coverage)
     • Results: Tuzun*, Bailey*, Sharp* et al. Nat Genet 2005
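
The ESP logic on this slide can be sketched in a few lines: a pair is judged by the distance and relative orientation of its two end placements on the reference. The 32 kb and 48 kb cutoffs and the 40 kb mean come from the slide. The function names, the strand convention (ends on opposite strands are treated as properly oriented) and the 2.9 Gb genome size are assumptions; the genome size was chosen because it reproduces the coverage figures quoted above.

```python
MEAN_INSERT = 40_000  # mean fosmid insert size from the slide (bp)
MIN_SPAN = 32_000     # below this: putative insertion (slide threshold)
MAX_SPAN = 48_000     # above this: putative deletion (slide threshold)


def classify_pair(left_pos, right_pos, left_strand, right_strand):
    """Classify one fosmid end pair placed on the reference genome.

    Assumption: properly oriented ends map to opposite strands, so two
    ends on the same strand are reported as a putative inversion.
    """
    if left_strand == right_strand:
        return "inversion"
    span = abs(right_pos - left_pos)
    if span > MAX_SPAN:
        return "deletion"   # donor lacks sequence present in the reference
    if span < MIN_SPAN:
        return "insertion"  # donor carries sequence absent from the reference
    return "concordant"


def physical_coverage(n_pairs, mean_insert=MEAN_INSERT, genome_size=2.9e9):
    """Clone (physical) coverage: total insert length over genome size.

    genome_size is an assumed value, not stated on the slide.
    """
    return n_pairs * mean_insert / genome_size


print(classify_pair(1_000_000, 1_040_000, "+", "-"))
print(classify_pair(1_000_000, 1_060_000, "+", "-"))
print(round(physical_coverage(1_122_408), 1))
print(round(physical_coverage(639_204), 1))
```

A span longer than expected signals a deletion because the donor's 40 kb insert covers a stretch that is longer in the reference; a shorter span signals extra sequence in the donor.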

  8. Fosmid SV Project
     • Fosmid end sequencing of 8 HapMap individuals
     • 1695 structural variants
     • 525 novel insertion sequences (Kidd et al. 2008 Nature 453:56)
     Mechanisms:
     • NAHR: non-allelic homologous recombination
     • NHEJ (non-homologous end joining): repair of double-strand breaks
     • VNTR: strand slippage
     • Retrotransposition: insertion of L1, SVA or Alu element

  9. Method 3: Whole Genome Sequencing
     • Genome resequencing studies (Levy et al. PLoS Biol 2007:e266)
     • SNPs: 3.2 M bases
     • Non-SNP variants: 9.1 M bases (22% of events, 74% of variant bases)
     • Read depth, mismapping pairs
     • Future: perfect whole genome assembly
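
The 74% figure can be checked directly from the two base counts on the slide, assuming that SNP and non-SNP bases together make up all variant bases:

```latex
\[
\frac{\text{non-SNP bases}}{\text{SNP bases} + \text{non-SNP bases}}
  = \frac{9.1\ \text{M}}{3.2\ \text{M} + 9.1\ \text{M}}
  = \frac{9.1}{12.3} \approx 0.74
\]
```

Non-SNP variants are therefore a minority of events (22%) but account for most of the variant bases.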

  10. Summary of Human Genome Copy Number Variation (12/2006)
     • 20% of the human genome is CNV?
     • 3000+ genes with exons in these regions are CNV?
     (Currently 30% of genome and 9473 genes)

  11. How many genes are truly CNV?
     • Lack of breakpoint precision? BACs are 150-250 kb clones of which only a part of the sequence may be CNV
     • False positives? Combining multiple studies increases the proportion of false positives, since true positives tend to overlap
     [Figure labels: BAC, gene, CNV; FP and TP calls across Study #1, #2, #3]
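
The false-positive argument can be made concrete with a toy calculation. True CNVs tend to be rediscovered by each study, so their union grows slowly, while each study's false positives are mostly distinct and simply add up. The region identifiers and counts below are invented for illustration and are not data from any study.

```python
def false_positive_fraction(studies):
    """Fraction of false positives in the union of calls across studies.

    studies: list of (true_positive_ids, false_positive_ids) pairs of sets.
    """
    tp_union = set().union(*(tp for tp, _ in studies))
    fp_union = set().union(*(fp for _, fp in studies))
    return len(fp_union) / (len(tp_union) + len(fp_union))


# Hypothetical toy data: three studies find largely the same true CNVs,
# but each contributes its own distinct false positive.
study1 = ({"cnv1", "cnv2", "cnv3", "cnv4"}, {"fp_a"})
study2 = ({"cnv1", "cnv2", "cnv3", "cnv5"}, {"fp_b"})
study3 = ({"cnv2", "cnv3", "cnv4", "cnv5"}, {"fp_c"})

for k in (1, 2, 3):
    combined = [study1, study2, study3][:k]
    print(k, false_positive_fraction(combined))
```

In this toy case the false-positive fraction rises from 1/5 for one study to 2/7 for two and 3/8 for three, even though every individual study has the same error rate.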

  12. Design of Custom Oligonucleotide aCGH
     1) Select genomic regions to target for probe design
     2) Merge overlapping regions
     3) Select oligonucleotide probe sequences (average 12/exon) and place on microarray
     • Equal number of probes per exon (exon size 3 bp to 10 kb)
     • Limitation: NimbleGen algorithm creates equally spaced probes across a region
     (Bailey et al. Cytogenet Genome Res 2008)
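
The "merge overlapping regions" step is a standard interval merge. A minimal sketch follows, assuming regions are `(chrom, start, end)` tuples with half-open coordinates; the function name and the example coordinates are illustrative and are not taken from the published design pipeline.

```python
def merge_regions(regions):
    """Merge overlapping or touching regions on the same chromosome.

    regions: iterable of (chrom, start, end) with half-open coordinates
    (an assumption for this sketch).
    """
    merged = []
    for chrom, start, end in sorted(regions):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps (or abuts) the previous region: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(r) for r in merged]


targets = [
    ("chr1", 150, 300),
    ("chr1", 100, 200),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
print(merge_regions(targets))
```

Merging before probe selection avoids designing duplicate probes for exons shared by overlapping gene models.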

  13. Partial-gene CNV Detection Method
     • Step #1: Seed
     • Step #2: Extension
     • Mean intensity difference per probe region, as listed on the slide: -0.2 SD, +1.1 SD, +1.4 SD, +0.6 SD, +1.2 SD (call labeled "4-exon")
     [Figure tracks: exon structure (Exon 1 to Exon 5), probe regions, hybridization (log2 probe intensity)]
     (Bailey et al. Cytogenet Genome Res 2008)
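
The slide names the two steps but not their parameters, so the following is only one plausible reading of a seed-and-extend caller: a seed is an exon whose mean intensity difference passes a strict threshold, and the call is then extended to neighboring exons that pass a looser one. Both thresholds (1.0 SD and 0.5 SD), the function name, and the restriction to gains are assumptions, not values from Bailey et al.

```python
def seed_and_extend(diffs, seed_threshold=1.0, extend_threshold=0.5):
    """Call gain segments over per-exon mean intensity differences (in SD).

    Returns a list of (first_index, last_index) pairs, inclusive.
    Thresholds are hypothetical; only gains are handled in this sketch.
    """
    calls = []
    last_called = -1
    i = 0
    while i < len(diffs):
        if diffs[i] >= seed_threshold:
            lo = hi = i
            # Extend left, without running into the previous call.
            while lo - 1 > last_called and diffs[lo - 1] >= extend_threshold:
                lo -= 1
            # Extend right while neighbors clear the looser threshold.
            while hi + 1 < len(diffs) and diffs[hi + 1] >= extend_threshold:
                hi += 1
            calls.append((lo, hi))
            last_called = hi
            i = hi + 1
        else:
            i += 1
    return calls


# Values from the slide, read in order as Exon 1 to Exon 5 (an assumption).
print(seed_and_extend([-0.2, 1.1, 1.4, 0.6, 1.2]))
```

Under these assumed thresholds the +0.6 SD exon is too weak to seed a call on its own but is absorbed by extension, giving a single call over four exons, which is consistent with the "4-exon" label on the slide.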

  14. CNV in RHD
     [Figure: Chr 1 coordinates (kb) 25,350 to 25,390; tracks: gene model, exons, probe regions, samples GM12878, GM18517, GM18507, GM18956, GM19129, GM12156, GM18502, GM19240, GM18555, segmental duplications]
     (Bailey et al. Cytogenet Genome Res 2008)

  15. Detecting CNVs >500 bp and >5% frequency
     • 8,599 CNV regions: 3.7% of genome (112.7 Mb)
     • 2 genomes: 1,098 CNVs, 0.78% (24 Mb)
     (Conrad et al. 2009 Nature)

  16. Causal CNVs (Conrad et al. 2009 Nature)

  17. Infectious Disease Genetics
     • Complex interplay that results in the infectious disease phenotype
     • Potential host defense responses and pathogen virulence are encoded in their respective genomes
     • SD and CNV represent key mechanisms for adaptation and diversification of responses for both host and pathogen
     • The study of SD and CNV is necessary to fully understand the genetics and biology of infectious disease pathogenesis
     [Figure labels: Human Genome, Pathogen Genome, Vector Genome, Environment]

  18. Human CNV typing and association studies
     • Comprehensive CNV Typing Chip (1st generation)
     • Collaboration with the Eichler Lab
     • Preferentially targeting gene CNVs (5,000 CNVs → 1000 genic regions → 30% host defense)
     • Agilent and NimbleGen oligoarray platforms
     • Defining copy number responsive probes
     • Defining copy specific probes to remove cross-hybridization
     • Case-control studies to examine infectious disease and immune phenotypes for association with CNVs
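
The slide does not say which statistic the case-control studies use. As one common choice for a single CNV, a 2x2 table of carriers and non-carriers among cases and controls can be tested with Fisher's exact test. The sketch below implements the two-sided test from the hypergeometric distribution using only the standard library; the function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows: cases, controls. Columns: CNV carriers, non-carriers.
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )


# Hypothetical counts: 3 of 4 cases and 1 of 4 controls carry the CNV.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

With thousands of CNVs typed on one chip, p-values from any per-CNV test would also need correction for multiple testing.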

  19. Human Malaria
     • Malaria: 2-3 million deaths per year
     • "strongest known force for evolutionary selection in the recent history of the human genome" (Kwiatkowski 2005 Am J Hum Genet)
     • HbS, HbC, HbE, thalassemia, ABO, Duffy null, SE Asian ovalocytosis, IL-4, CR1, HLA-DRB ...
     • Hypothesis: strong selection will have impacted CNVs
     • Testing case-control samples for CNV associations with resistance to infection and cerebral malaria
