
Computational assembly for prokaryotic sequencing projects

Lee Katz, Ph.D., Bioinformatician, Enteric Diseases Laboratory Branch. January 15, 2014.



Presentation Transcript


  1. Computational assembly for prokaryotic sequencing projects. Lee Katz, Ph.D., Bioinformatician, Enteric Diseases Laboratory Branch. January 15, 2014. Disclaimers: The findings and conclusions in this presentation have not been formally disseminated by the Centers for Disease Control and Prevention and should not be construed to represent any agency determination or policy. The findings and conclusions in this presentation are those of the author(s) and do not necessarily represent the official position of CDC.

  2. Partners in Public Health

  3. Graduated Oct 2010

  4. CDC 2010 - present

  5. Lee Katz, Present • Currently in the National Enteric Reference Laboratory • Vibrio, Campylobacter, Escherichia, Shigella, Yersinia, Salmonella • Focusing on Listeria and Vibrio

  6. One of my projects is #2 on CDC’s list of accomplishments for 2013! http://www.cdc.gov/features/endofyear/

  7. Outline • Sequencing: 1st gen, 2nd gen, 3rd gen • Reads: Quality control (Q/C), Read metrics, Read-cleaning • Assembly: Algorithms, Assembly metrics

  8. Prokaryotic Sequencing Projects • Stages: Sequencing, Assembly, Feature prediction, Functional annotation, …analysis…, Display (Genome Browser) • Examples: Haemophilus influenzae, Neisseria meningitidis, Bordetella bronchiseptica, Vibrio cholerae, Listeria monocytogenes • References: Fleischmann et al. (1995) “Whole-Genome Random Sequencing and Assembly of Haemophilus influenzae Rd” Science 269:5223; Kislyuk et al. (2010) “A computational genomics pipeline for prokaryotic sequencing projects” Bioinformatics 26:15

  9. Out with the old; in with the new: Two new technologies to the compgenomics class! • 454 • Illumina single end reads • Illumina paired end reads • PacBio

  10. Sanger Sequencing (1st gen)

  11. Sequencing: first generation. Margulies et al. (2005) “Genome sequencing in open microfabricated high density picoliter reactors” Nature 437:7057

  12. Sanger sequencing output • Usually .ab1/.scf file format

  13. 454 Sequencing (2nd Gen)

  14. 454 Pyrosequencing • Mix DNA library & capture beads (limited dilution) • Add PCR reagents + emulsion oil and create “water-in-oil” emulsion • Perform emulsion PCR • “Break micro-reactors” • Isolate DNA-containing beads

  15. 454 Pyrosequencing • Load enzyme beads • Load beads into PicoTiter™ Plate • PicoTiter™ Plate wells: diameter = 44 μm, depth = 55 μm, well size = 75 pl, well density = 480 wells mm⁻², 1.6 million wells per slide

  16. 454 Pyrosequencing • Sequencing by synthesis • Reagent flow • Photons generated are captured by CCD camera • Margulies et al., 2005

  17. 454 sequencing output • Flowgram (.sff file format) • Measures the presence or absence of each nucleotide at any given position • Flow order: TACG • Key: TCAG • [Figure: flowgram with signal levels labeled 1-mer, 2-mer, 3-mer, 4-mer]
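To make the flowgram idea concrete, here is a minimal sketch of turning flow signals into bases, assuming each flow value is a float whose nearest integer is the homopolymer length for the nucleotide flowed in that cycle. The function name `flowgram_to_bases` and the example signal values are hypothetical. Only the flow order (TACG) and the key (TCAG) come from the slide, and this is not how any particular base caller is implemented.

```python
def flowgram_to_bases(flow_values, flow_order="TACG"):
    """Call bases from flow signals by rounding each signal to the nearest integer.

    Assumption: a signal near 0 means the flowed nucleotide was not incorporated,
    near 1 means one base, near 2 a 2-mer homopolymer, and so on.
    """
    bases = []
    for i, signal in enumerate(flow_values):
        nucleotide = flow_order[i % len(flow_order)]
        count = int(signal + 0.5)  # nearest non-negative integer
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical signals: the first eight flows spell the key TCAG,
# followed by a 2-mer of T and a 3-mer of G.
flows = [1.02, 0.05, 0.98, 0.03, 0.04, 1.01, 0.02, 1.10, 2.04, 0.10, 0.00, 3.10]
called = flowgram_to_bases(flows)
key = "TCAG"
read = called[len(key):] if called.startswith(key) else called
print(called, read)
```

The longer the homopolymer, the closer neighboring signal levels sit relative to the noise, which is why simple rounding like this becomes unreliable for long runs of the same base.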

  18. Illumina sequencing (2nd Gen)

  19. The following animations are courtesy of Illumina, Inc. [Figure: library fragment on the flow cell surface; labels: P5 grafting primer, P7 grafting primer, region complementary to P5 grafting primer, P5 primer, Index 2, DNA insert, Index 1, P7 primer]

  20. SBS Sequencing Primer Hybridization

  21. Sequence (Cycle 1)

  22. Sequence (Cycle 1)

  23. Index 1 Seq Primer Hybridization

  24. Index 1 read – 8 cycles

  25. Unblock

  26. P5 grafting primer

  27. 7 dark cycles P5 grafting primer

  28. Index 2 index read 8 cycles 7 dark cycles P5 grafting primer

  29. Index 2 index read 8 cycles 7 dark cycles P5 grafting primer

  30. Linearization Original strand New strand

  31. Illumina sequencing video http://www.youtube.com/watch?v=womKfikWlxM

  32. PacBio (Pacific Biosciences) sequencing (3rd Gen)

  33. SMRT Bell • Zero-mode waveguide (ZMW), a very fancy and very small well • http://www.youtube.com/watch?v=NHCJ8PtYCFc • Thanks to PacBio for donating some slide materials in this section • Eid et al., Science, January 2009, doi:10.1126/science.1162986

  34. http://www.youtube.com/watch?v=NHCJ8PtYCFc • Eid et al., Science, January 2009, doi:10.1126/science.1162986

  35. Eid et al., Science, January 2009, doi:10.1126/science.1162986

  36. PacBio video http://www.youtube.com/watch?v=NHCJ8PtYCFc

  37. Reads: Q/C + cleaning + metrics

  38. Q/C: You need to know if your data are good! Example software: • FastQC • Computational Genomics Pipeline (CG-Pipeline)

  39. Quality Control FastQC output

  40. Quality Control bioinformatics FastQC output

  41. The CG-Pipeline way: run_assembly_readMetrics.pl
      File       avgReadLength  totalBases  minReadLength  maxReadLength  avgQuality
      tmp.fastq  80.00          177777760   80             80             35.39
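The columns in that report can be illustrated with a short sketch. It assumes four-line FASTQ records and Phred+33 quality encoding, and it defines avgQuality as the mean Phred score over all bases. That definition and the function name `read_metrics` are assumptions made for illustration. This is not the code of run_assembly_readMetrics.pl, which may compute its averages differently.

```python
import io


def read_metrics(handle, phred_offset=33):
    """Summarize a FASTQ stream (assumes 4-line records, Phred+33 by default)."""
    lengths = []
    qual_sum = 0
    while True:
        header = handle.readline()
        if not header.strip():  # end of file (or a trailing blank line)
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        lengths.append(len(seq))
        qual_sum += sum(ord(c) - phred_offset for c in qual)
    total = sum(lengths)
    return {
        "avgReadLength": total / len(lengths) if lengths else 0.0,
        "totalBases": total,
        "minReadLength": min(lengths) if lengths else 0,
        "maxReadLength": max(lengths) if lengths else 0,
        "avgQuality": qual_sum / total if total else 0.0,
    }


# Two made-up reads: 'I' encodes Phred 40 and '5' encodes Phred 20.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\n5555II\n")
metrics = read_metrics(example)
print(metrics)
```

For a real file, the same function could be called on `open("tmp.fastq")`. In the slide's report, totalBases divided by avgReadLength (177777760 / 80) gives 2,222,222 reads.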

  42. Read cleaning

  43. Read cleaning with CG-Pipeline (not validated; please use with caution) • [Figure: %ACGT and Phred plots by read position for forward (F.) and reverse (R.) reads] • http://sourceforge.net/projects/cg-pipeline/ • Graphs made with FastqQC (AMOS)

  44. 1. Trimming low-qual ends: run_assembly_trimLowQualEnds.pl • [Figure: 1A. %ACGT and 1B. Phred plots by read position for forward (F.) and reverse (R.) reads] • http://sourceforge.net/projects/cg-pipeline/ • Graphs made with FastqQC (AMOS)
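One simple way to trim low-quality ends is to drop bases from each end of a read until a base at or above a quality threshold is reached. The sketch below does that under stated assumptions: Phred+33 encoding, a hypothetical threshold `min_qual=20`, and a hypothetical function name. The actual rule used by run_assembly_trimLowQualEnds.pl is not given on the slide and may differ.

```python
def trim_low_qual_ends(seq, qual, min_qual=20, phred_offset=33):
    """Strip bases below min_qual from both ends; interior bases are untouched."""
    scores = [ord(c) - phred_offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_qual:
        start += 1
    while end > start and scores[end - 1] < min_qual:
        end -= 1
    return seq[start:end], qual[start:end]


# '#' encodes Phred 2, '5' encodes Phred 20, 'I' encodes Phred 40.
trimmed_seq, trimmed_qual = trim_low_qual_ends("NACGTACGTN", "#5IIIIII5#")
print(trimmed_seq, trimmed_qual)
```

A read whose bases all fall below the threshold comes back empty, so a length filter is normally applied afterwards.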

  45. 2a. Removing duplicate reads; 2b. Sometimes: downsampling • run_assembly_removeDuplicateReads.pl • [Figure label: Trimmed reads] • http://sourceforge.net/projects/cg-pipeline/
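Both steps can be sketched in a few lines. The assumptions here are that a duplicate means an exact sequence match between single reads and that downsampling keeps each read independently with a fixed probability. The function names, the tuple layout, and the seed are hypothetical. Paired-end data would need the pair treated as a unit, and run_assembly_removeDuplicateReads.pl may define duplicates differently.

```python
import random


def remove_duplicate_reads(reads):
    """Keep the first read seen for each distinct sequence (exact match only)."""
    seen = set()
    unique = []
    for name, seq, qual in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((name, seq, qual))
    return unique


def downsample(reads, fraction, seed=0):
    """Keep each read with probability `fraction`; seeded for reproducibility."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


reads = [
    ("r1", "ACGTACGT", "IIIIIIII"),
    ("r2", "ACGTACGT", "55555555"),  # same sequence as r1, so it is dropped
    ("r3", "TTTTACGT", "IIIIIIII"),
]
unique_reads = remove_duplicate_reads(reads)
print([name for name, _, _ in unique_reads])
print(len(downsample(unique_reads, 0.5)))
```

Keeping the first copy is an arbitrary choice in this sketch; keeping the copy with the highest quality would be an equally reasonable variant.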

  46. 3. Trimming and filtering: run_assembly_trimClean.pl • 3A. trimming: Min avg. quality, Min length • 3B. filtering: Min avg. quality, Min length • http://sourceforge.net/projects/cg-pipeline/
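The filtering half of this step can be sketched as a test on two per-read values, minimum average quality and minimum length, which are the two parameters named on the slide. Everything else in the sketch is an assumption: Phred+33 encoding, the function names, and the threshold values used in the example. It does not reproduce run_assembly_trimClean.pl.

```python
def mean_quality(qual, phred_offset=33):
    """Average Phred score of a quality string (0.0 for an empty read)."""
    if not qual:
        return 0.0
    return sum(ord(c) - phred_offset for c in qual) / len(qual)


def filter_reads(reads, min_avg_qual, min_length):
    """Keep reads that meet both the length and the average-quality thresholds."""
    return [
        (name, seq, qual)
        for name, seq, qual in reads
        if len(seq) >= min_length and mean_quality(qual) >= min_avg_qual
    ]


# 'I' encodes Phred 40 and '#' encodes Phred 2; the thresholds are illustrative.
candidates = [
    ("good", "ACGTACGT", "IIIIIIII"),
    ("short", "ACG", "III"),
    ("lowq", "ACGTACGT", "########"),
]
kept = filter_reads(candidates, min_avg_qual=20, min_length=5)
print([name for name, _, _ in kept])
```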

  47. More • Software: Fastx toolkit http://hannonlab.cshl.edu/fastx_toolkit/ • EA-utils https://code.google.com/p/ea-utils/ • AMOS (SourceForge.net) • … and more is out there! • Evaluation: Del Fabbro et al. (2013) “An extensive evaluation of read trimming effects on Illumina NGS data analysis”
