A Fast Hybrid Short Read Fragment Assembly Algorithm


Presentation Transcript


  1. A Fast Hybrid Short Read Fragment Assembly Algorithm

  2. Introduction • Second-generation DNA sequencing technologies • Traditional: Sanger shotgun sequencing • New techniques (2007 & 2008): • SSAKE, VCAKE and SHARCGS, based on greedy extension • Edena, Velvet and Euler-SR, based on graphs

  3. Taipan Method: Two steps • 1. Greedy extension • contigs are iteratively extended one base at a time in both the 3’ and 5’ directions • 2. Graph-based method • to assemble the contigs constructed in the previous step
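The greedy-extension step can be illustrated with a minimal sketch. This is not Taipan's implementation: it assumes error-free reads from a single strand, a seed contig, and a minimum overlap `k` (similar in spirit to the minimal-overlap option), and the function names `extend_right`, `extend_left` and `greedy_contig` are hypothetical. At each step, every read whose prefix (at least `k` bases) matches a suffix of the contig votes for the next base; the majority base is appended. The 5' direction is handled by running the same procedure on reversed strings. The graph-based second step is not shown.

```python
from collections import Counter


def extend_right(contig, reads, k):
    """Greedily extend `contig` one base at a time in the 3' direction."""
    seen = {contig[-k:]}  # k-base suffixes already visited (cycle guard)
    while True:
        votes = Counter()
        for read in reads:
            # Longest overlap first: a prefix of `read` (>= k bases) that
            # matches a suffix of the contig and leaves one base to add.
            for olen in range(min(len(read) - 1, len(contig)), k - 1, -1):
                if contig.endswith(read[:olen]):
                    votes[read[olen]] += 1
                    break
        if not votes:
            return contig  # no overlapping read: stop
        ranked = votes.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            return contig  # tie between bases: stop instead of guessing
        contig += ranked[0][0]
        if contig[-k:] in seen:
            return contig[:-1]  # suffix repeats (likely a repeat): stop
        seen.add(contig[-k:])


def extend_left(contig, reads, k):
    """5' extension = 3' extension on reversed contig and reads."""
    rev = extend_right(contig[::-1], [r[::-1] for r in reads], k)
    return rev[::-1]


def greedy_contig(seed, reads, k):
    """Extend a seed in both directions (hypothetical helper)."""
    return extend_left(extend_right(seed, reads, k), reads, k)
```

For example, with the 5-base reads tiling `AGCTTGACCGTA` and seed `TTGAC`, `greedy_contig(seed, reads, 3)` rebuilds the whole 12-base sequence. Real data adds sequencing errors and repeats, which is where the vote thresholds and the later graph-based step matter.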

  4. Example • Usage: taipan -f {inputfilename} -k {minimal_overlap} [-t {threshold}] [-o {seed_occ}] [-v {verbose}] [-c {min_contig_length}] • Result:

  5. Optimal spliced alignments of short sequence reads • Fabio De Bona • Bioinformatics, 2008

  6. Genome vs. Transcriptome • Analyzing sequence reads from genomic DNA: • sequence assembly • alignment to the genome • Transcriptome analysis: first align the single reads to the genome, then merge the alignments to infer gene structures
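The merging step of transcriptome analysis can be illustrated with a small sketch. Assume each spliced read alignment is given as a sorted list of half-open aligned genomic blocks `(start, end)`; the gap between two consecutive blocks is then a candidate intron, and counting how many reads support each gap gives a crude intron set. The block representation, `infer_introns` and `min_support` are hypothetical choices for this illustration, not part of the paper.

```python
from collections import Counter


def infer_introns(alignments, min_support=1):
    """Collect candidate introns from spliced read alignments.

    Each alignment is a list of half-open (start, end) genomic blocks,
    sorted by start. The gap between consecutive blocks is treated as a
    candidate intron; introns supported by at least `min_support` reads
    are returned together with their read counts.
    """
    counts = Counter()
    for blocks in alignments:
        for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
            if next_start > prev_end:  # a real gap, not adjacent blocks
                counts[(prev_end, next_start)] += 1
    return {intron: n for intron, n in counts.items() if n >= min_support}
```

Requiring `min_support` above 1 is one simple way to discard introns implied by a single, possibly misaligned read.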

  7. Genome vs. Transcriptome • Reconstruct the whole genome from cDNA data • Reconstruct the transcriptome from EST data (from reverse-transcribed cDNA)

  8. Problem Formulation • Limitations: 1. the read length of next-generation (NG) sequencing is relatively short; 2. reads contain errors (an error rate of 5% is assumed)
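A quick calculation shows why the 5% error rate matters even for short reads. Assuming errors occur independently with the same probability at every base (a simplification), a read of length L has 0.05 × L expected errors and is error-free with probability 0.95^L. The read length of 36 bp used below is an illustrative assumption, not a figure from the slides.

```python
def expected_errors(read_length, error_rate=0.05):
    """Expected number of erroneous bases per read (independent errors)."""
    return read_length * error_rate


def prob_error_free(read_length, error_rate=0.05):
    """Probability that every base of the read is correct."""
    return (1 - error_rate) ** read_length


print(expected_errors(36), round(prob_error_free(36), 3))
```

For an assumed 36 bp read this gives about 1.8 expected errors and only about a 16% chance that the read is error-free, so the alignment has to tolerate mismatches.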

  9. General Description • Smith-Waterman • Quality Score • Splice Site Info • Intron Length
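The baseline listed first, Smith-Waterman local alignment, can be sketched in a few lines. This minimal version assumes a linear gap penalty and illustrative scores (match +2, mismatch -1, gap -2) that are not taken from the paper, and returns only the best local alignment score, not the alignment itself.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row/column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman("TTACGTTT", "ACGT")` scores 8, the exact 4-base local match; a traceback from the best cell would recover the aligned region.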

  10. Method • 1. Original • 2. With Quality Score • 3. With Splicing Info • 4. With Intron Length
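One simple way a quality score could enter the alignment (variant 2) is to scale each substitution score by the probability that the read base is correct, derived from its Phred score as p = 1 - 10^(-Q/10). This weighting, `base_confidence` and `quality_sw` are hypothetical and hand-picked for illustration, not necessarily how the paper incorporates quality; the scores match +2, mismatch -1, gap -2 are likewise assumptions.

```python
def base_confidence(q):
    """Probability that a base call is correct, from its Phred quality q."""
    return 1.0 - 10 ** (-q / 10)


def quality_sw(read, quals, ref, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score where each substitution score is scaled by the
    confidence of the read base (hypothetical weighting scheme)."""
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        w = base_confidence(quals[i - 1])  # low quality -> weaker evidence
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + w * s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

With this scheme a mismatch at a low-quality base costs less than one at a high-quality base, so `quality_sw("ACGT", [40, 3, 40, 40], "AGGT")` scores higher than the same alignment with all qualities at 40.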

  11. Test Data • 10,000 sequences with known alignments • three different scorings: • quality information • splice site predictions • intron length
