Performance Profiling of NGS Genome Assembly Algorithms


Alex Ropelewski

Pittsburgh Supercomputing Center

[email protected]

412-268-4960


NGS: Assembly Algorithm

de Bruijn Graph

  • Aligned 3-mers (edges, numbered in genome order): 1. ATG, 2. TGG, 3. GGC, 4. GCG, 5. CGT, 6. GTG, 7. TGC, 8. GCA, 9. CAA, 10. AAT

  • Nodes (2-mers): AT, TG, GG, GC, CG, GT, CA, AA

  • Genome: ATGGCGTGCAAT

[Figure: de Bruijn graph. Assembled genome via Eulerian cycle (reads represented as edges)]
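The slide's example can be reproduced with a short sketch. It is only an illustration of the idea, not how Allpaths-LG or Velvet are implemented. The function names are made up for this example, and the cycle is found with Hierholzer's algorithm, which the slide does not name. Each 3-mer becomes an edge from its 2-mer prefix to its 2-mer suffix, and edges leaving a node are followed in input order.

```python
from collections import defaultdict


def build_de_bruijn(kmers):
    """Map each (k-1)-mer prefix to the suffixes it points to (one edge per k-mer)."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_cycle(graph, start):
    """Hierholzer's algorithm: return a node path that uses every edge exactly once."""
    # Reverse each edge list so that pop() takes edges in their input order.
    remaining = {node: targets[::-1] for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Spell a sequence: the first node plus the last base of every later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


# The ten 3-mers from the slide, in the order they are numbered there.
kmers = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA", "CAA", "AAT"]
graph = build_de_bruijn(kmers)
genome = spell(eulerian_cycle(graph, start="AT"))
print(genome)  # ATGGCGTGCAAT
```

The cycle is not unique for this graph: nodes TG and GC each have two outgoing edges, and following them in the other order spells ATGCGTGGCAAT, which contains the same ten 3-mers. Real assemblers need extra information, such as paired reads, to resolve repeats like this.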


Program characteristics

  • 2 codes of interest:

    • ALLPATHS-LG: designed for assembling large genomes (mostly C++; pipeline uses make)

    • Velvet: used frequently for small genomes (written in C; uses some OpenMP)

  • Both codes:

    • are memory intensive

    • are time intensive

    • have some parallelization


Desired Profile Information

  • For each program/step in the assembly pipeline:

    • Time and Memory consumption

    • Identification of serial and parallel steps

    • Quantify I/O characteristics

    • Quantify how many times each step is run

  • For the most time consuming and most called programs/steps:

    • Time consumed by each function

    • How many times each function is called

    • Quantify I/O characteristics

    • Identify parallel steps and examine scaling

    • Describe the main memory consumers
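As a rough illustration of the per-step measurements listed above, the sketch below wraps one pipeline command and reports wall time, CPU time, peak memory, and block I/O counts. It is an assumption-laden example, not the tooling used in the study: `profile_step` is a hypothetical helper, it relies on `os.wait4` and is therefore Unix only, and the command it runs here is just a stand-in for a real assembler step. A CPU-to-wall ratio near 1 suggests a serial step, while a ratio well above 1 suggests several cores were busy. Function-level timings and call counts need a dedicated profiler and are not covered here.

```python
import os
import subprocess
import sys
import time


def profile_step(cmd):
    """Run one pipeline step (Unix only) and return basic resource figures."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd)
    # os.wait4 waits for this child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    wall = time.perf_counter() - start
    # Record the exit code ourselves, since we reaped the child directly.
    if os.WIFEXITED(status):
        proc.returncode = os.WEXITSTATUS(status)
    else:
        proc.returncode = -os.WTERMSIG(status)
    cpu = usage.ru_utime + usage.ru_stime
    return {
        "exit_code": proc.returncode,
        "wall_s": wall,
        "cpu_s": cpu,
        "cpu_per_wall": cpu / wall if wall > 0 else 0.0,
        "peak_rss": usage.ru_maxrss,   # kilobytes on Linux, bytes on macOS
        "blocks_in": usage.ru_inblock,   # block input operations
        "blocks_out": usage.ru_oublock,  # block output operations
    }


if __name__ == "__main__":
    # Hypothetical stand-in for an assembler step; replace with a real command.
    print(profile_step([sys.executable, "-c", "sum(range(10**6))"]))
```

Calling the helper once per step, and counting the calls, gives a first-pass table of where time and memory go before drilling into individual functions.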


General Outcome

  • Where should the optimization effort be focused?

    • Are there serial optimizations?

    • Additional candidates for parallelization?

    • Can the existing parallelization be improved?

    • Can the I/O be improved?

    • Memory performance issues to address?

    • Something else?
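One way to approach the questions about existing and additional parallelization is to turn measured run times into speedup, parallel efficiency, and an estimated serial fraction. The sketch below uses standard definitions (Amdahl's law and the Karp-Flatt metric), which the slides do not mention; the function names are made up for this example and the timings in the usage lines are hypothetical numbers, not measurements of Allpaths-LG or Velvet.

```python
def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to the single-thread run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_threads):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup(t_serial, t_parallel) / n_threads


def karp_flatt(t_serial, t_parallel, n_threads):
    """Experimentally determined serial fraction (Karp-Flatt metric), n_threads > 1."""
    s = speedup(t_serial, t_parallel)
    return (1.0 / s - 1.0 / n_threads) / (1.0 - 1.0 / n_threads)


def amdahl_speedup(serial_fraction, n_threads):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)


# Hypothetical timings: 100 s on 1 thread, 25 s on 8 threads.
t1, t8 = 100.0, 25.0
f = karp_flatt(t1, t8, 8)
print(speedup(t1, t8))        # 4.0
print(efficiency(t1, t8, 8))  # 0.5
print(round(f, 3))            # 0.143
```

A large estimated serial fraction points toward serial optimization or new candidates for parallelization, while a small fraction with poor efficiency points toward overheads in the existing parallel code, such as I/O or memory contention.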

