Performance Profiling of NGS Genome Assembly Algorithms

Alex Ropelewski

Pittsburgh Supercomputing Center

[email protected]

412-268-4960


NGS: Assembly Algorithm

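de Bruijn Graph

Aligned 3-mers: ATG, TGG, GGC, GCG, CGT, GTG, TGC, GCA, CAA, AAT

[Figure: de Bruijn graph. Nodes are the 2-mers AT, TG, GG, GC, CG, GT, CA, AA. Edges are the numbered 3-mers, each running from its 2-mer prefix to its 2-mer suffix: 1. ATG, 2. TGG, 3. GGC, 4. GCG, 5. CGT, 6. GTG, 7. TGC, 8. GCA, 9. CAA, 10. AAT]

Genome: ATGGCGTGCAAT

Assembled Genome via Eulerian Cycle (reads represented as edges)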
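
The construction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used by either assembler discussed later: each 3-mer becomes an edge from its 2-mer prefix to its 2-mer suffix, and a walk that uses every edge exactly once spells the genome. The function names (build_de_bruijn, eulerian_cycle, spell) are made up for this example, and the walk uses Hierholzer's algorithm, which the slide does not name.

```python
from collections import defaultdict, deque


def build_de_bruijn(kmers):
    """Map each (k-1)-mer prefix to a queue of the (k-1)-mer suffixes it points to."""
    graph = defaultdict(deque)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_cycle(graph, start):
    """Hierholzer's algorithm: return a node sequence that uses every edge once."""
    remaining = {node: deque(targets) for node, targets in graph.items()}
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            # Follow the oldest unused edge leaving this node.
            stack.append(remaining[node].popleft())
        else:
            # Dead end: this node is finished, so add it to the walk.
            walk.append(stack.pop())
    return walk[::-1]


def spell(nodes):
    """Join overlapping nodes: the first node plus the last base of each later node."""
    return nodes[0] + "".join(node[-1] for node in nodes[1:])


kmers = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA", "CAA", "AAT"]
graph = build_de_bruijn(kmers)
cycle = eulerian_cycle(graph, "AT")
genome = spell(cycle)
print(genome)  # ATGGCGTGCAAT
```

Edges are tried in the order the 3-mers were listed, which reproduces the slide's genome here. The graph branches at TG and GC, so a different edge order can yield another valid cycle (for example ATGCGTGGCAAT), which is one reason repeats make assembly ambiguous.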


Program characteristics

  • Two codes of interest:

    • ALLPATHS-LG: designed for assembling large genomes (mostly C++; the pipeline is driven by make)

    • Velvet: used frequently for small genomes (written in C; uses some OpenMP)

  • Both codes:

    • are memory intensive

    • are time intensive

    • have some parallelization


Desired Profile Information

  • For each program/step in the assembly pipeline:

    • Time and Memory consumption

    • Identification of serial and parallel steps

    • Quantify I/O characteristics

    • Quantify how many times each step is run

  • For the most time consuming and most called programs/steps:

    • Time consumed by each function

    • How many times each function is called

    • Quantify I/O characteristics

    • Identify parallel steps and examine scaling

    • Describe the main memory consumers
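
As a rough illustration of the per-step measurements listed above, the sketch below runs one command and reports its wall time, CPU time, peak resident memory, and block I/O counts. It is an assumption-laden example, not the tooling used in the presentation: profile_step is a hypothetical helper, the command shown is a placeholder rather than a real assembly step, and it relies on the Unix-only os.wait4 call and on os.waitstatus_to_exitcode (Python 3.9 or later).

```python
import os
import subprocess
import sys
import time


def profile_step(argv):
    """Run one pipeline step and return coarse resource figures for it (Unix only)."""
    start = time.perf_counter()
    proc = subprocess.Popen(argv)
    # os.wait4 reaps this specific child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    wall = time.perf_counter() - start
    # The child was reaped outside Popen, so record its exit code by hand.
    proc.returncode = os.waitstatus_to_exitcode(status)
    cpu = usage.ru_utime + usage.ru_stime
    return {
        "exit_code": proc.returncode,
        "wall_s": wall,
        "cpu_s": cpu,
        # A ratio well above 1 hints that the step used several cores.
        "cpu_to_wall": cpu / wall if wall > 0 else 0.0,
        "max_rss": usage.ru_maxrss,  # kilobytes on Linux, bytes on macOS
        "blocks_in": usage.ru_inblock,
        "blocks_out": usage.ru_oublock,
    }


# Placeholder command; substitute an actual assembly step here.
report = profile_step([sys.executable, "-c", "print(sum(range(10**6)))"])
for name, value in report.items():
    print(f"{name}: {value}")
```

These figures are coarse. The usage counters cover the child and any descendants it waited for, and the block counts miss reads served from the page cache, so a function-level profiler or an I/O tracing tool is still needed for the finer-grained items in the list.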


General Outcome

  • Where should the optimization effort be focused?

    • Are there serial optimizations?

    • Additional candidates for parallelization?

    • Can the existing parallelization be improved?

    • Can the I/O be improved?

    • Memory performance issues to address?

    • Something else?

