Advancing Personal Genetics with Second Generation Sequencing. 28-Apr, 8:15AM – 8:45AM, Next-Gen Seq Data Management. Context: Personal Genomics Landscape (direct-to-consumer -- hybrid -- research only). REVEAL. 23andme.
Advancing Personal Genetics with Second Generation Sequencing
28-Apr, 8:15AM – 8:45AM: Next-Gen Seq Data Management
Thanks to:
Context: Personal Genomics Landscape
direct-to-consumer -- hybrid -- research only
REVEAL
Over 600 alleles of BRCA1 (Myriad/DNAdirect: sequencing, not chips)
PersonalGenomes.org Project Goals
1) Low cost: <$1K: 98% exome (or more)
2) Active subject participation, informed redaction
3) Avoid over-promising de-identification
4) Entrance exam to ensure highly informed consent
5) Multiple samples to ensure consistent IDs
6) Open access (not just researcher subset)
7) Trait questionnaire, stem cell RNA, m-biome
8) Cells available for personal functional genomics
9) Scalable to 100,000 diverse research subjects
Coriell GM2: 1660 1846 1833 1687 1677 1070 1731 1781 0431
• Employers/Insurers > Non-Discrimination Act
• Actionable alleles are rare > all at risk
• Non-actionable alleles > activism
3 Exponential technologies: 3 to 18 month doubling times
[Figure: Computation & Communication; Synthetic chemistry; Analytic. Labels: Gbp, chips, human, tRNA, urea, B12, telegraph, tRNA]
Shendure J, Mitra R, Varma C, Church GM, 2004 Nature Reviews Genetics. Kurzweil 2002; Moore 1965
Chips vs. Gen-2 Sequencing
Chips (Illumina, Affymetrix bead-array): 0.02% of the genome; assumes common DNA variants stay associated with deleterious variants over 50,000 years
Sequencing: 98% of the genome; accesses the deleterious variants directly
Platforms: ABI-SOLiD, Harvard-Danaher Polonator-G007, Helicos, Illumina, Roche-454
Multiplex Cyclic Sequencing by Synthesis
Single instrument, multiple chemistries: polonies on slides or beads; polymerase or ligase
[Figure: base labels A, G, C, T]
Shendure, Porreca, et al. 2005 Science; Mitra, et al. 2003 Analyt. Biochem.; 1999 NAR
AB-SOLiD, CGI; Illumina, IBS
36 to 64 flowcells (+ DNA barcodes); 2 to 4 billion beads; 8.5 m thick sequence image
Open-source hardware, software, wetware: Polonator G.007 (12TB image > 120 Gbp/run)
Enzyme/oligo kits; polymerase or ligase chemistries
$150K including computer & 1 yr service, software, support
Danaher Inc.
Effect of improvements on cost
Polonator instrument, 3 yr amortization:
$150k / 300 runs = $500/run = $50/Gb
$150k / 81 runs = $1850/run = $4.2/Gb
($10 vs. $2000 / Gb for other 2nd gen)
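The amortization arithmetic on this slide can be checked with a short sketch. Only the $150k price, the run counts, and the quoted $/Gb figures come from the slide. The function names are illustrative, and the per-run yield is back-calculated from the quoted figures rather than stated in the source, so treat it as an inference.

```python
def cost_per_run(instrument_cost, runs):
    """Instrument cost amortized evenly over a number of runs (dollars/run)."""
    return instrument_cost / runs


def implied_gb_per_run(dollars_per_run, dollars_per_gb):
    """Per-run yield (Gb) implied by a quoted $/run and $/Gb pair.

    This is an inference from the slide's numbers, not a stated spec.
    """
    return dollars_per_run / dollars_per_gb


INSTRUMENT_COST = 150_000  # dollars, from the slide

# (runs over the 3-year amortization period, $/Gb quoted on the slide)
scenarios = {"300 runs": (300, 50.0), "81 runs": (81, 4.2)}

for name, (runs, quoted_per_gb) in scenarios.items():
    per_run = cost_per_run(INSTRUMENT_COST, runs)
    gb = implied_gb_per_run(per_run, quoted_per_gb)
    print(f"{name}: ${per_run:,.0f}/run, implied yield about {gb:.0f} Gb/run")
```

The per-Gb figures here cover instrument amortization only, as the slide title indicates; reagent and labor costs are not modeled.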
Personal genome sequencing options/goals
Technology          Genome   Cost    Raw bp
AB3730              98%      $30M    7x = 42 Gb (3.5x each)
Knome               98%      $350K   15x = 84 Gb
SNP-chip            0.02%    $1K     2 Mbp
PGP coding          1%       $90     30x = 1 Gb
PGP RNA             99%      $20     30x 20K*n = 60 Mb (n=100 cell types for RNA)
m-path/resistome    -        $20     rRNA + 20K genes
VDJ-Immunome        -        $20     ?
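Several "Raw bp" entries in the table follow from coverage multiplied by target size. The sketch below reproduces three of them under stated assumptions: a haploid human genome of about 3 Gb (6 Gb diploid) is assumed rather than taken from the slide, and reading the RNA row as 30 x 20K genes x n cell types is an interpretation of the slide's shorthand. The Knome row is left out because its target size is not stated.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # assumption: ~3 Gb haploid human genome


def raw_bases(coverage, target_bp):
    """Raw sequenced bases needed to cover target_bp at the given depth."""
    return coverage * target_bp


# AB3730 row: 7x over a diploid genome (3.5x per haploid copy)
ab3730 = raw_bases(7, 2 * HAPLOID_GENOME_BP)

# PGP coding row: 30x over 1% of the haploid genome
pgp_coding = raw_bases(30, HAPLOID_GENOME_BP // 100)

# PGP RNA row: 30 x 20K genes x n cell types, with n = 100
pgp_rna = 30 * 20_000 * 100

print(f"AB3730:     {ab3730 / 1e9:.0f} Gb")      # slide quotes 42 Gb
print(f"PGP coding: {pgp_coding / 1e9:.1f} Gb")  # slide rounds to 1 Gb
print(f"PGP RNA:    {pgp_rna / 1e6:.0f} Mb")     # slide quotes 60 Mb
```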
Selective genome sequencing: 8 ways to capture alleles from genomic DNA or cDNA
Methods 1 to 3: In vitro Paired-end-tags (PET), Gap fill, Cleave & ligate
4. Hybr-select-chip
5. Hybr-select-solution
6. m-fluidic PCR
7. Multiplex PCR
Figure label: For rearrangements
Color key: Red = synthetic; Yellow = genome/cDNA
How do we optimize >100K 100mers?
Shendure, et al. Science 309(5741):1728-32. Nilsson et al. (2006) Trends Biotechnol 24:83.
Zhang, Chou, Shendure, Li, Leproust, Dahl, Davis, Nilsson, Church
Circle-capture (gap fill), 1% genome
Aug 2007: R = .53
Jan 2008: R = .986
Zhang, Li et al. unpublished
Genome to Phenome: Population Variation
[Figure labels: Genome (T, G, C, A); cis; Trans; Gene Expression; Gene products; Environment; Traits]
Zhang & Church unpublished
Combine all cis element variants
Enhancer, promoter, splicing, polyA, termination, transport, decay.
Eliminate environmental & trans-acting variation among individuals.
Allele-specific expression (ASE); allele-specific transcription factor binding
Digital RNA allelotyping; ChIP-Seq
[Figure: allele letters (G/A, T/C), TF, and poly-A tails]
Zhang, Li, Church unpublished. Forton et al. Genome Res. 2007
Tissue-specific & allele-specific gene expression: confirmatory assays
T/C = 3.73; T/C = 0.51; T/C = 3.47
Samples: cDNA Keratinocyte; Genomic DNA Lymphocyte; cDNA Lymphocyte; cDNA Fibroblast
Kun Zhang & Alice Li
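A T/C ratio like those above can be computed from per-allele read counts. The following is a minimal sketch, not the authors' pipeline: the function names and all counts are hypothetical, and dividing the cDNA ratio by the genomic DNA ratio is an assumed normalization, suggested only by the presence of a genomic DNA sample on the slide.

```python
def allele_ratio(t_count, c_count):
    """Ratio of reads carrying the T allele to reads carrying the C allele."""
    if c_count == 0:
        raise ValueError("C-allele count is zero; ratio is undefined")
    return t_count / c_count


def normalized_ratio(cdna_t, cdna_c, gdna_t, gdna_c):
    """cDNA T/C ratio divided by the genomic DNA T/C ratio.

    Assumption: a heterozygous genomic sample should sit near 1:1, so
    dividing by its ratio corrects for allele-specific assay bias.
    """
    return allele_ratio(cdna_t, cdna_c) / allele_ratio(gdna_t, gdna_c)


# Hypothetical read counts, for illustration only
print(allele_ratio(373, 100))                # raw cDNA T/C
print(normalized_ratio(300, 100, 110, 100))  # bias-corrected T/C
```

A corrected ratio well above or below 1 in one tissue but not another would be the kind of tissue-specific, allele-specific signal the slide describes.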
25X probe * 72X time = 1800X better efficiency.
Panels: Genomic DNA Aug 2007; Genomic DNA Jan 2008; cDNA Jan 2008
Kun Zhang & Billy Li
3 mm skin sample
Challenge: multiple cell types from healthy adults
Complex Traits via Allele-Specific Gene Expression
[Figure labels: Volunteers; PGP Physicians Network; Primary fibroblasts; Multiplexed Reprogramming; Induced Stem Cells; Multiplexed Differentiation; Induction of Multiple Gene Sets (not necessarily functional tissues); mRNA; Sequence tag quantitation]
Jay Lee et al. unpublished
Induced Pluripotent Stem Cell Generation & Transdifferentiation (Oct4/Sox2/Myc/Klf4)
Adenoviral Infection; Retroviral Infection
Tissue Culture on a Mouse Feeder Layer
ES Cell Colony Identification
Clonal Isolation and Propagation
Embryoid Body Induction & Guided Differentiation
Mixture of differentiated cell types & Guided Differentiation
2 months; multiple integration sites
1 week; no genomic integration
Yamanaka, Daley, Thomson, Hochedlinger, Jaenisch labs; Lee & Church
Multiple cell-types with transdifferentiation
Retroviral Infection; Adenoviral Infection
Labels: Collagen, MyoD, CD34
Haplotyping by amplification of single chromosomes or fragments
Nunc or UCSD
Green: phase contrast image; Red: Cy5-labelled Alu probe
Kun Zhang & Fan Liang
Single-cell or Single DNA-fragment (haplotype) sequencing: 5 Mbp
• Ultra-clean conditions for reduction of background amplification + Real-Time monitoring
• Post-amplification chip hybridization distinguishes alleles
• Amplification variation random & easily filled by PCR
• Error rate < 1.7 × 10^-5
Zhang et al. Nature Biotech 2006
Environments of Genomes
One-in-a-lifetime genome + yearly (to daily) tests
Bio-weather map: Allergens, Microbes, Viruses
[Figure labels: PERSONAL GENOME; RNAome; VDJ-ome; m-biome; TRAITS]
PGP m-Resistome: 18 Antibiotics Dantas, Sommer, Church unpublished
Bacteria Subsisting on 18 Antibiotics Dantas Sommer Church Science 2008