Mining hidden information from your 454 data using modular and database oriented methods

Joachim De Schrijver

Overview
  • Short introduction on 454 sequencing
  • Variant Identification pipeline
  • Possibilities of a DB oriented pipeline
  • Examples
    • Coverage
    • Improving PCR
    • Fast Q assessment
    • Homopolymers
Introduction (i)
  • Roche/454 GS-FLX sequencing:
    • Pyrosequencing
    • ± 400,000 reads/run
    • Average length: 200-250 bp
  • Applications:
    • Resequencing: Variant identification
    • De novo (genome) sequencing: Assembly of new regions, plasmids or entire genomes
  • Standard Software:
    • Variants: Amplicon Variant Analyzer (AVA)
    • Assembly: Standard 454 assembler
Introduction (ii)
  • Standard software
    • + Easy to use
    • + Reproducible results on similar datasets
    • + GUI (graphical user interface)
    • - No answer for ‘non-standard’ questions
      • Methylation experiments
      • Different types of experiments grouped together
    • - What about ‘hidden’ information?
      • Homopolymer error rates
      • Quality score as a function of the length of the sequenced read
      • ‘Multirun’ information
Variant Identification Pipeline (i)
  • Modular and database oriented pipeline
  • Modular:
    • Efficient planning
    • Scalable
  • Database (DB):
    • No loss of data
    • Grouping several runs together
Variant Identification Pipeline (ii)
  • Basic idea: Data is processed once and stored in the DB. Results (reports) are calculated ‘on the fly’ from the DB data.
    • Fast & efficient
    • Calculations only happen once
    • Everybody can access the database without risk of data modification
    • Reporting is independent of the data processing
  • Paper: De Schrijver et al. 2009. Analysing 454 sequences with a modular and database oriented Variant Identification Pipeline
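The split between processing and reporting described above can be illustrated with a minimal sketch. It uses an in-memory SQLite database purely for demonstration; the table name, column names, and sample rows are hypothetical and are not taken from the actual VIP schema.

```python
import sqlite3

# Hypothetical schema: one row per mapped read. Table and column names are
# illustrative assumptions, not the actual VIP schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mapped_read (
           read_id TEXT PRIMARY KEY,
           run TEXT, lane INTEGER, mid TEXT, amplicon TEXT,
           read_length INTEGER
       )"""
)

# Processing step: runs once per sequencing run and only writes to the DB.
processed = [
    ("r1", "run01", 1, "MID1", "ampA", 240),
    ("r2", "run01", 1, "MID2", "ampA", 231),
    ("r3", "run02", 2, "MID1", "ampB", 198),
]
conn.executemany("INSERT INTO mapped_read VALUES (?, ?, ?, ?, ?, ?)", processed)
conn.commit()


def reads_per_run(connection):
    """Reporting step: a read-only query evaluated on demand."""
    rows = connection.execute(
        "SELECT run, COUNT(*) FROM mapped_read GROUP BY run ORDER BY run"
    )
    return dict(rows.fetchall())


print(reads_per_run(conn))
```

Because the report is only a query, adding a second run to the table changes the report without reprocessing the first run.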
Possibilities of a DB oriented pipeline
  • VIP originally developed for variant identification
  • Now being used in:
    • Amplicon resequencing
    • De novo shotgun
    • Methylation
    • ~ Solexa experiments
  • ‘Hidden’ data can be extracted using intelligent querying strategies
  • Results per lane, multiplex identifier (MID), run, …
Example: Detailed coverage
  • Coverage can be calculated per
    • Lane
    • MID
    • Amplicon
    • Base position
  • Assessment of errors (PCR dropouts vs. human errors)
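The coverage levels listed above can be sketched in a few lines. The record layout (lane, MID, amplicon, start, end) and the sample alignments are assumptions made for illustration; in a DB oriented setup the same grouping would more likely be expressed as a query.

```python
from collections import Counter

# Hypothetical alignment records: (lane, MID, amplicon, start, end), with a
# 0-based start and an exclusive end on the amplicon reference.
alignments = [
    (1, "MID1", "ampA", 0, 5),
    (1, "MID1", "ampA", 2, 8),
    (1, "MID2", "ampA", 0, 8),
    (2, "MID1", "ampB", 0, 4),
]


def read_counts(records, *fields):
    """Count reads grouped by any combination of lane, mid and amplicon."""
    index = {"lane": 0, "mid": 1, "amplicon": 2}
    return Counter(tuple(rec[index[f]] for f in fields) for rec in records)


def per_base_coverage(records, amplicon):
    """Number of reads covering each base position of one amplicon."""
    depth = Counter()
    for _lane, _mid, amp, start, end in records:
        if amp == amplicon:
            depth.update(range(start, end))
    return [depth[pos] for pos in range(max(depth) + 1)] if depth else []


print(read_counts(alignments, "lane", "mid"))
print(per_base_coverage(alignments, "ampA"))
```

Passing different field names to `read_counts` gives the per-lane, per-MID, or per-amplicon view from the same records.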
Example: Improving PCR
  • Amplicon Resequencing experiment
  • Goal: Variant identification
  • Length distributions
    • Mapped
    • Unmapped
    • ‘Short’ mapped
  • Additional length separation + Improved PCR
  • Result: Improved efficiency
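A length distribution per read category, as listed above, might be computed along the following lines. The 100 bp cutoff for ‘short’ mapped reads, the 50 bp bin size, the sample reads, and the choice to treat the three categories as disjoint are all assumptions; the slides do not specify them.

```python
from collections import Counter


def length_histogram(lengths, bin_size=50):
    """Histogram of read lengths, keyed by the lower edge of each bin."""
    counts = Counter((n // bin_size) * bin_size for n in lengths)
    return dict(sorted(counts.items()))


def split_by_category(reads, short_cutoff=100):
    """Group read lengths into mapped, 'short' mapped and unmapped.

    The cutoff is an assumed value chosen for illustration only.
    """
    groups = {"mapped": [], "short_mapped": [], "unmapped": []}
    for length, is_mapped in reads:
        if not is_mapped:
            groups["unmapped"].append(length)
        elif length < short_cutoff:
            groups["short_mapped"].append(length)
        else:
            groups["mapped"].append(length)
    return groups


# Toy data: (read length in bp, mapped flag).
reads = [(240, True), (235, True), (60, True), (45, False), (52, False), (210, True)]
for name, lengths in split_by_category(reads).items():
    print(name, length_histogram(lengths))
```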
Example: Homopolymers
  • Can the length of a homopolymer be assessed using the Q score?
  • Yes, when homopolymer length < 6 bp
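One way to start exploring the relation between Q scores and homopolymer length is to collect the quality values of every homopolymer run and summarise them per run length. The sketch below does only that; it is not the method used in the presentation, which the slides do not describe, and the read and its Q values are toy data.

```python
from collections import defaultdict
from itertools import groupby


def homopolymer_runs(sequence, qualities, min_length=2):
    """Yield (base, run_length, run_qualities) for each homopolymer run."""
    pos = 0
    for base, group in groupby(sequence):
        length = len(list(group))
        if length >= min_length:
            yield base, length, qualities[pos:pos + length]
        pos += length


def mean_quality_by_length(reads):
    """Mean Q score of homopolymer bases, grouped by run length."""
    scores = defaultdict(list)
    for sequence, qualities in reads:
        for _base, length, run_q in homopolymer_runs(sequence, qualities):
            scores[length].extend(run_q)
    return {length: sum(q) / len(q) for length, q in sorted(scores.items())}


# Toy read: sequence plus one (made-up) Q score per base.
reads = [("ACGGGTTA", [40, 38, 30, 24, 18, 36, 28, 40])]
print(mean_quality_by_length(reads))
```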
Example: Q assessment
  • Fast assessment of the quality of a run

    • Figure panels: ‘Lab work OK’ vs. ‘Errors in lab work’

Acknowledgements
  • Biobix – UGent
    • Wim Van Criekinge
    • Tim De Meyer
    • Geert Trooskens
    • Tom Vandekerkhove
    • Leander Van Neste
    • Gerben Mensschaert
  • CMG – UZ Gent
    • Jo Vandesompele
    • Jan Hellemans
    • Filip Pattyn
    • Steve Lefever
    • Kim Deleeneer
    • Jean-Pierre Renard

  • NXT-GNT
    • Paul Coucke
    • Sofie Bekaert
    • Filip Van Nieuwerburgh
    • Dieter Deforce
    • Wim Van Criekinge
    • Jo Vandesompele

Questions?

Joachim.deschrijver@ugent.be