September 26, 2008

Next-Gen Sequencing Bioinformatics Support, GPCL-BAC. Rick Jordan, Programmer/Analyst; J. Lyons-Weiler, Sci. Director. September 26, 2008.

Presentation Transcript


  1. Next-Gen Sequencing Bioinformatics Support, GPCL-BAC. Rick Jordan, Programmer/Analyst; J. Lyons-Weiler, Sci. Director. September 26, 2008

  2. Process
     • GPCL-BAC Director & Analyst meet w/PI
     • Discuss Data Analysis Needs & Study Design
     • PI Decides on Use of BAC or “Go It Alone”
       • “Go It Alone” -> data (.sff files)
       • “Use the BAC”
         • data analysis $ estimate
         • annotation, assembly, & analysis + data
     • PI reviews Preliminary Research Report w/Analyst
     • After final analysis, PI receives Report & Data
     • Often the Analysis will be tailored to the application

  3. de novo Analysis Flowchart
     [Flowchart figure; box labels as extracted] 454 GS FLX; GS FLX System; Image files; Image processing; Signal processing; .sff files; Data/Reads exported to data rig; Sequences; Sequence processing; Assembler (GS or Lasergene Assembler); Analysis & Annotation
     Files shown: analysisParams.parse, dataRunParams.parse, 454BaseCallerMetrics.csv, 454QualityFilterMetrics.csv, 454RuntimeMetrics.csv

  4. Image processing

  5. Lasergene SeqBuilder • Reference sequence: E. coli K12

  6. Signal Processing

  7. de novo Genome Assembly
     • Two software packages currently used:
       • GS FLX Assembler (Newbler algorithm): can be used for all experiments
       • Lasergene (SeqMan Pro): single-end experiments only

  8. GS de novo Assembler
     • Input: .sff files and per-base quality scores
     • Output: Consensus sequence, assembled de novo
     • Main processing steps:
       • Identify pairwise overlaps between reads
       • Construct multiple alignments of contigs
       • Generate consensus basecalls of contigs
       • Output contig consensus sequences and quality scores, along with ACE file of multiple alignments and assembly metrics files
     Source: 454 Sequencing GS-FLX Data Analysis Software Manual, Dec 2007
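The first three processing steps on this slide can be illustrated with a toy example. This is only a minimal sketch under strong assumptions (exact suffix/prefix matches, reads already placed at known offsets, majority-vote consensus); it is not the Newbler algorithm, and the function names `suffix_prefix_overlap` and `column_consensus` are hypothetical.

```python
from collections import Counter


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Only exact matches are considered; real assemblers tolerate sequencing errors.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def column_consensus(aligned_reads: list[str]) -> str:
    """Majority-vote consensus over equal-length reads padded with '-' where a read has no coverage."""
    consensus = []
    for column in zip(*aligned_reads):
        bases = [base for base in column if base != "-"]
        # Columns with no coverage at all are reported as 'N'.
        consensus.append(Counter(bases).most_common(1)[0][0] if bases else "N")
    return "".join(consensus)


# Toy reads (made up for illustration, not real 454 data).
reads = ["ACGTACGT", "TACGTTGA", "GTTGACCA"]

# Step 1: pairwise overlaps between consecutive reads.
overlaps = [suffix_prefix_overlap(reads[i], reads[i + 1]) for i in range(len(reads) - 1)]

# Step 2: a hand-built multiple alignment that places each read at its offset.
aligned = [
    "ACGTACGT------",
    "---TACGTTGA---",
    "------GTTGACCA",
]

# Step 3: consensus basecalls for the resulting contig.
contig = column_consensus(aligned)
print(overlaps, contig)
```

In a real run the assembler also emits per-base consensus quality scores and an ACE file of the alignments, which this sketch does not model.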

  9. e.g. Graphic Figure of the Assembly (Lasergene 7.2)

  10. GS Reference Mapper
      • Generates the consensus DNA sequence by mapping, or alignment, of the reads to a reference sequence
      • Provides a list of high-confidence mutations (individual bases or blocks of bases that differ between the consensus DNA sequence of the sample and the reference sequence)
      Source: 454 Sequencing GS-FLX Data Analysis Software Manual, Dec 2007
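The idea of listing "individual bases or blocks of bases that differ" between a sample consensus and the reference can be sketched as follows. This is a simplified illustration, not the GS Reference Mapper's method: it assumes the two sequences are already aligned to the same length, it ignores the confidence scoring the mapper applies, and the function name `find_differences` and the example sequences are hypothetical.

```python
def find_differences(reference: str, consensus: str) -> list[tuple[int, str, str]]:
    """Return (1-based start position, reference bases, consensus bases) for each run of mismatches.

    Assumes `reference` and `consensus` are pre-aligned and of equal length.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")

    blocks = []
    start = None  # index where the current run of mismatches began
    for i, (ref_base, con_base) in enumerate(zip(reference, consensus)):
        if ref_base != con_base:
            if start is None:
                start = i
        elif start is not None:
            blocks.append((start + 1, reference[start:i], consensus[start:i]))
            start = None
    if start is not None:
        # A run of mismatches that extends to the end of the sequence.
        blocks.append((start + 1, reference[start:], consensus[start:]))
    return blocks


# Made-up sequences: one single-base difference and one two-base block.
differences = find_differences("ACGTACGTAC", "ACCTACGGGC")
for position, ref_bases, sample_bases in differences:
    print(f"pos {position}: {ref_bases} -> {sample_bases}")
```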

  11. Genome Annotation (sequence functional classes) Zuber et al. (2007)

  12. Gene annotation with SeqMan Pro Project

  13. e.g. Diagrams Smith et al. (2007)

  14. Impacted Pathways

  15. e.g. Pathway view

  16. e.g. COGs table Smith et al. (2007)

  17. e.g. Sequencing statistics table Marcy et al. (2007)

  18. Base Caller Metrics

  19. Quality Filter Metrics

  20. Runtime Metrics

  21. Quality measures by region TCA ATG

  22. Read lengths by region TCA ATG

  23. e.g. Blast results

  24. e.g. Predicted nucleotide and protein alignment Raymond et al. (2007)

  25. e.g. Predicted protein alignment Raymond et al. (2007)

  26. Grant Text

      Next Generation Sequence Bioinformatics Analysis. The Bioinformatics Analysis Core is sufficiently endowed with software and human resources to conduct the analysis of data from resequencing and de novo sequencing studies. Software acquisitions include the default Genome Sequencer modules and the recently acquired specialized Lasergene 7.2 software by DNASTAR. One BAC staff member is dedicated to the analysis of long-read NextGen sequencing data and is responsible for generating research reports for each project.

      Genome Sequencer FLX System Software. The FLX System Software includes modules for each stage in the analysis. All raw data are accessible, and the system also offers a variety of third-party software packages for niche applications.

      Data QA/QC. The Core uses a variety of data quality control measures, including consensus accuracy and quality scores, both per base (Q20+) and per genome (%Bases Q20+; the proportion of an assembled genome with base call accuracy of at least 99%).

      The Core has also acquired the licenses required to run the full suite of Lasergene applications, rounding out the Core’s Genome Annotation capabilities. In addition to the sequence assembler/SNP discovery algorithms in SeqMan Pro and the visualization and sequence editing modules (SeqBuilder), the Lasergene suite adds the capacity for gene finding (GeneQuest) and protein structure analysis & prediction (Protean). Handling the variety of file types the Core is expected to receive is greatly aided by Lasergene’s EditSeq and by the much-improved interoperability of SeqMan Pro (which can import .sff, .fna, .fas and .qual files).
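The Q20+ measures in the QA/QC text follow from the standard Phred relationship, error probability = 10^(-Q/10), so Q20 corresponds to 99% base call accuracy. A minimal sketch of both measures, assuming the per-base quality scores are already available as a list of integers (the function names and the example scores are hypothetical):

```python
def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score to base call accuracy (1 - error probability)."""
    return 1.0 - 10 ** (-q / 10)


def percent_bases_at_or_above(quals: list[int], threshold: int = 20) -> float:
    """Percentage of bases whose quality score is at least `threshold` (e.g. %Bases Q20+)."""
    if not quals:
        return 0.0
    return 100.0 * sum(q >= threshold for q in quals) / len(quals)


# Made-up consensus quality scores for illustration only.
consensus_quals = [40, 35, 19, 20, 12, 28, 40, 31]

print(f"Q20 accuracy: {phred_to_accuracy(20):.2%}")
print(f"%Bases Q20+: {percent_bases_at_or_above(consensus_quals):.1f}")
```

The same function works for a stricter cutoff by passing a different `threshold`.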

  27. Research Report Components
      • Tables
        • Base Call Metrics
        • Quality Filter Metrics
        • Run Time Metric Tables
        • Quality Score
          • per base (Q40+)
          • per genome (%Bases Q40+; the proportion of an assembled genome with base call accuracy of >99.99%)
        • Quality Measure Distributions (By region)
        • Read Length Measure Distributions
        • Overall Sequence Statistics Tables
        • Blast tables
        • COGs Table
      • Figures
        • Assembly Figures
        • Alignment Diagrams
        • Gene Functional Categories Diagrams
        • Genome View Diagrams
        • Nucleotide Alignment Diagrams
        • Predicted Protein Alignment Diagrams
        • Gene Ontology Functional Class Diagrams/Charts
        • Pathway Views
        • COGs Figures
      • Methods Text
        • Manuscripts
        • Proposals
        • Letter of Support
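As an illustration of how a per-region read length table for such a report might be produced, here is a minimal sketch. It assumes the read lengths have already been extracted from the .sff files into a dictionary keyed by region; the function name `summarize_read_lengths`, the region labels, and the numbers are all hypothetical.

```python
from statistics import mean, median


def summarize_read_lengths(lengths_by_region: dict[str, list[int]]) -> dict[str, dict[str, float]]:
    """Compute simple read length statistics for each region; empty regions are skipped."""
    summary = {}
    for region, lengths in lengths_by_region.items():
        if not lengths:
            continue
        summary[region] = {
            "reads": len(lengths),
            "min": min(lengths),
            "max": max(lengths),
            "mean": mean(lengths),
            "median": median(lengths),
        }
    return summary


# Made-up read lengths (in bases) for two regions.
read_lengths = {
    "region 1": [230, 250, 245, 198],
    "region 2": [260, 255, 240],
}

length_summary = summarize_read_lengths(read_lengths)
for region, stats in length_summary.items():
    print(region, stats)
```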

  28. Application Areas
      • Ancient DNA
      • ChIP-seq/Methylation/Epigenetics
      • Eukaryotic Whole Genome Sequencing
      • Expression tags
      • Genetic variation detection
      • HIV sequencing
      • Metagenomics and Microbial Diversity
      • Mitochondria/viruses/plastids/plasmids
      • Prokaryotic Whole Genome Sequencing
      • Sequence Capture/Target Region Resequencing
      • Small RNAs
      • Somatic variation detection
      • Transcriptome Sequencing
      Source: Roche 454/GS-FLX Web Site

  29. de novo Analysis Flowchart
      [Flowchart figure; box labels as extracted] 454 GS FLX; GS FLX System; Image files; Image processing; Signal processing; .sff files; Data/Reads exported to data rig; Sequences; Sequence processing; Assembler (GS or Lasergene Assembler); Analysis & Annotation
      Files shown: analysisParams.parse, dataRunParams.parse, 454BaseCallerMetrics.csv, 454QualityFilterMetrics.csv, 454RuntimeMetrics.csv

  30. Final Service Product
      • Pre-analysis output files
        • dataRunParams.parse
        • 454 BaseCallerMetrics.csv
        • 454 QualityFilterMetrics.csv
        • 454 RuntimeMetricsAll.csv
      • Post-analysis output files
        • .sff files (for each region)
        • Research report (.ppt)
        • Additional text editing
