Bioinformatics for next-generation DNA sequencing

Bioinformatics for next-generation DNA sequencing. Gabor T. Marth, Boston College Biology Department. BC Biology new graduate student orientation, September 2, 2008.

Presentation Transcript


  1. Bioinformatics for next-generation DNA sequencing Gabor T. Marth Boston College Biology Department BC Biology new graduate student orientation September 2, 2008

  2. Genetic code (DNA) AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT AGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGT GCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGT AGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAG TCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTG GGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCT CGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTAT ATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCT GATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCT AGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGA AGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT

  3. The genome

  4. Genome sequencing ~1 Mb ~100 Mb >100 Mb ~3,000 Mb

  5. Next-generation sequencing machines [Figure: bases per machine run (1 Mb to 1 Gb) plotted against read length (10 bp to 1,000 bp)] • Illumina, AB/SOLiD short-read sequencers (1 Gb in 25-50 bp reads) • 454 pyrosequencer (20-100 Mb in 100-250 bp reads) • ABI capillary sequencer

  6. Individual human resequencing

  7. Variations at every scale of genome organization • Single-base substitutions (SNPs) • Insertion-deletion polymorphisms • Structural variations including large-scale chromosomal rearrangements • Epigenetic variations (e.g. changes in methylation / chromatin structure)

  8. We care about genetic variations because… … they underlie phenotypic differences … cause heritable diseases and determine responses to drugs … allow tracking ancestral human history

  9. Individual resequencing / SNP discovery (i) base calling (ii) micro-repeat analysis (iii) read mapping (iv) read assembly (v) SNP and short INDEL calling (vii) data validation, hypothesis generation [Figure labels: REF, IND]

  10. Tools

  11. The variation discovery “toolbox” • base callers • read mappers • SNP callers • SV callers • assembly viewers

  12. Base calling Quinlan et al. Nature Methods 2008

  13. Read mapping Read mapping is like doing a jigsaw puzzle… …you get the pieces… … and they give you the picture on the box. Problem is, some pieces are easier to place than others…
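
The jigsaw analogy can be made concrete with a toy example. The sketch below is not the lab's read mapper; it is a minimal exact-match lookup written only to show why some reads ("pieces") are easier to place than others. The reference string, the read sequences, and the function names `build_index` and `map_read` are all hypothetical. A read that occurs once in the reference has a single placement, while a read drawn from a repeated segment fits in more than one place.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, index):
    """Return all exact-match start positions of a read of length k."""
    # .get() avoids inserting empty entries into the defaultdict.
    return index.get(read, [])


# Hypothetical toy reference; the segment "ACGTACGT" occurs twice (a repeat).
reference = "TTGACGTACGTCCAGGACGTACGTAAC"
k = 8
index = build_index(reference, k)

unique_hits = map_read("GTCCAGGA", index)   # read from unique sequence
repeat_hits = map_read("ACGTACGT", index)   # read from the repeated segment
missing_hits = map_read("GGGGGGGG", index)  # read absent from the reference

print("unique read:", unique_hits)
print("repeat read:", repeat_hits)
print("unmapped read:", missing_hits)
```

Real read mappers also have to tolerate sequencing errors and true variation, which exact matching ignores; the sketch only illustrates the placement ambiguity caused by repeats.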

  14. Read mapping Michael Stromberg in prep.

  15. SNP discovery Marth et al. Nature Genetics 1999 Quinlan et al. in prep.

  16. Structural variation discovery [Figure panels: navigation bar; fragment lengths in selected region; depth of coverage in selected region] Stewart et al. in prep.

  17. Assembly viewers Huang and Marth Genome Research 2008

  18. Data mining

  19. SNP calling in single-read 454 coverage DNA courtesy of Chuck Langley, UC Davis • collaborative project with Andy Clark (Cornell) and Elaine Mardis (Wash. U.) • goal was to assess polymorphism rates between 10 different African and American Drosophila melanogaster isolates • 10 runs of 454 reads (~300,000 reads per isolate) were collected

  20. Mutational profiling in deep 454 data [Figure: Pichia stipitis reference sequence; image from JGI web site] • collaboration with Doug Smith at Agencourt • Pichia stipitis is a yeast that efficiently converts xylose to ethanol (bio-fuel production) • one specific mutagenized strain had especially high conversion efficiency • goal was to determine where the mutations were that caused this phenotype • we analyzed 10 runs of 454 reads (~3 million reads, ~20x coverage of the 15 Mb genome) • processed the sequences with our 454 pipeline • found 39 mutations (in as many reads as those in which we found 650K SNPs in melanogaster) • informatics analysis in < 24 hours (including manual checking of all candidates) Smith et al. Genome Research 2008
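
The coverage figure on this slide can be checked with simple arithmetic: fold coverage is the number of reads times the mean read length, divided by the genome size. The sketch below assumes a mean read length of 100 bp, which is not stated on the slide; it is the low end of the 100-250 bp range quoted earlier for 454 reads, and it happens to reproduce the ~20x figure. The function name `fold_coverage` is illustrative only.

```python
def fold_coverage(num_reads, mean_read_length_bp, genome_size_bp):
    """Average number of times each genome base is covered by a read."""
    return num_reads * mean_read_length_bp / genome_size_bp


num_reads = 3_000_000          # ~3 million reads (from the slide)
genome_size_bp = 15_000_000    # 15 Mb genome (from the slide)
assumed_read_length_bp = 100   # ASSUMPTION: not given on the slide

coverage = fold_coverage(num_reads, assumed_read_length_bp, genome_size_bp)

# Conversely, the read length implied by the slide's ~20x coverage figure.
implied_read_length_bp = 20 * genome_size_bp / num_reads

print(f"coverage: {coverage:.0f}x")
print(f"implied mean read length: {implied_read_length_bp:.0f} bp")
```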

  21. SNP calling in short-read coverage [Figure labels: C. elegans reference genome (Bristol, N2 strain); Bristol, N2 strain (3 ½ machine runs); Pasadena, CB4858 (1 ½ machine runs)] • goal was to evaluate the Solexa/Illumina technology for the complete resequencing of large model-organism genomes • 5 runs (~120 million Illumina reads) from the Wash. U. Genome Center, as part of a collaborative project led by Elaine Mardis at Washington University • we found 45,000 SNPs with a very high validation rate Hillier et al. Nature Methods 2008

  22. Current focus

  23. 1000 Genomes Project • data quality assessment • project design (# samples, depth of read coverage) • read mapping • SNP calling • structural variation discovery

  24. SV discovery in autism [Figure labels: deletion, amplification]

  25. Transcriptome sequencing (from: Mortazavi et al. Nature Methods 2008)

  26. Lab

  27. The team Michael Stromberg Chip Stewart Michele Busby Aaron Quinlan Damien Croteau-Chonka Eric Tsung Derek Barnett Weichun Huang

  28. Resources • computer cluster • 128 GB RAM server • 20TB disk space • 2 large R01 grants from the NIH • a BC RIG grant

  29. Collaborations Genome Canada Baylor HGSC Wash. U. GSC UBC GSC UCSF UCLA UC Davis Cornell Pfizer NCBI @ NIH NCI @ NIH Marshfield Clinic

  30. Graduate student rotations • Looking for new graduate students • Spots are available for all three rotations • Lots of projects • Caveat: you need to be able to program… • Check us out at: http://bioinformatics.bc.edu/marthlab/ • If you are interested, please talk to me
