Instrumenting Genomic Sequence Analysis Pipeline Mothur on Shared Memory Architecture

Junqi Yin, Bhanu Rekepalli, Pragneshkumar Patel, Chanda Drennen, and Annette Engel

XSEDE 14, Atlanta, GA, July 15, 2014

Outline

  • Introduction

    • Motivation --- ECSS

    • Bioinformatics tool --- Mothur

    • SGI UV1000 --- Nautilus

  • Porting OTU analysis pipeline

    • Pre-clustering denoise

    • Distance matrix calculation

    • Sequence clustering

  • Performance results on Nautilus

  • Summary

Porting Mothur to Nautilus

ECSS community code project: The effect of the Macondo oil spill on coastal ecosystems

The ultimate goal is to improve society’s ability to understand how to respond to and mitigate the effects of petroleum pollution and related stressors on marine and coastal ecosystems of the Gulf of Mexico.

The challenge is analyzing rapidly growing pyrosequencing data (millions of sequences), which are beyond the capability of a typical workstation.

The solution is to develop a downstream analysis pipeline capable of running on HPC systems.

Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41. Cited by 2453.

Mothur is an extensible C++ code base that re-implements many algorithms popular in the community in a single, standalone executable for multiple platforms.

However, it is not HPC-ready.

  • One important goal is to categorize sequences

  • Three ways to bin sequences in Mothur:

    • Operational Taxonomic Units (OTUs): sensitive to sequencing errors, but independent of any prior knowledge.

    • Taxonomic: bins sequences based on what they’re named

    • Phylogenetic: builds trees and uses the branching structure to bin sequences

  • For more information

    • Wiki:

    • User forum:

Single system image:

  • 1024 cores

    • Intel Nehalem EX processors

  • 4TB of global shared memory

  • 8 NVIDIA Tesla GPUs

  • NUMA

A single-node system with a large global shared memory

Offloads thread synchronization, data sharing, and message passing overhead from the CPUs

Scalable interconnect between blades via NUMAlink5

For more information, see

NICS and Nautilus:


11,968 physical cores


OTU Analysis Pipeline

  • Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is a critical step in the bioinformatic analysis of microbial diversity

    • Pre-clustering denoise

    • Distance matrix calculation

    • Sequence clustering

Pre-clustering denoise (pre.cluster)

Removes sequences likely caused by sequencing errors: sequences that differ by only 1 bp from a large group are assumed to contain sequencing errors and are merged into that group.

Time complexity O(N^2); the two nested loops are not independent, so the OpenMP directive is applied to the inner loop.
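The nested-loop structure can be sketched as follows. This is a simplified illustration, not mothur's pre.cluster source: it assumes aligned sequences of equal length, already sorted by decreasing abundance, and a mismatch threshold maxDiffs; the names preCluster and countDiffs are hypothetical. The outer loop stays sequential because whether sequence i is still active depends on earlier merges, while each inner-loop iteration j reads and writes only its own entries, so the inner loop can be shared among OpenMP threads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Count mismatching columns between two aligned sequences of equal length.
inline int countDiffs(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    int diffs = 0;
    for (std::size_t k = 0; k < a.size(); ++k)
        if (a[k] != b[k]) ++diffs;
    return diffs;
}

// Simplified pre-clustering sketch (hypothetical, not mothur's code).
// seqs must be sorted by decreasing abundance. A rarer sequence j that differs
// from a more abundant, still-active sequence i by at most maxDiffs positions
// is merged into i. Returns the active flags; abundance is updated in place.
std::vector<char> preCluster(const std::vector<std::string>& seqs,
                             std::vector<int>& abundance, int maxDiffs) {
    const long n = static_cast<long>(seqs.size());
    // vector<char>, not vector<bool>: threads write distinct elements concurrently.
    std::vector<char> active(n, 1);
    // Outer loop stays sequential: whether i is active depends on earlier merges.
    for (long i = 0; i < n; ++i) {
        if (!active[i]) continue;
        int merged = 0;
        // Inner loop in parallel: iteration j is the only one touching active[j]
        // and abundance[j]; seqs[i] is only read. The sum uses a reduction.
        #pragma omp parallel for schedule(static) reduction(+ : merged)
        for (long j = i + 1; j < n; ++j) {
            if (active[j] && countDiffs(seqs[i], seqs[j]) <= maxDiffs) {
                active[j] = 0;
                merged += abundance[j];
            }
        }
        abundance[i] += merged;
    }
    return active;
}
```

Compiled without OpenMP, the pragma is ignored and the function runs serially with the same result. The flags use std::vector<char> rather than std::vector<bool>, because bit-packed elements would turn concurrent writes to neighboring flags into a data race.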

Distance matrix calculation (dist.seqs)

Calculate pairwise distances between sequences (O(N^2))
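A minimal sketch of the O(N^2) pairwise loop follows. The distance used here (mismatches divided by compared columns, skipping columns where both sequences have a gap) is a simplified stand-in assumed for illustration, not one of mothur's distance calculators; the cutoff parameter and the names seqDistance and pairwiseDistances are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Simplified distance for two aligned sequences (an assumption): mismatches
// divided by columns compared, skipping columns where both have a gap ('-' or '.').
double seqDistance(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    auto isGap = [](char c) { return c == '-' || c == '.'; };
    std::size_t compared = 0, mismatches = 0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        if (isGap(a[k]) && isGap(b[k])) continue;
        ++compared;
        if (a[k] != b[k]) ++mismatches;
    }
    return compared == 0 ? 0.0 : static_cast<double>(mismatches) / compared;
}

// Visit all N(N-1)/2 pairs of the lower triangle: O(N^2) distance evaluations.
// Only pairs at or below the cutoff are kept, as sparse (i, j, distance) triples.
std::vector<std::tuple<std::size_t, std::size_t, double>>
pairwiseDistances(const std::vector<std::string>& seqs, double cutoff) {
    std::vector<std::tuple<std::size_t, std::size_t, double>> out;
    for (std::size_t i = 1; i < seqs.size(); ++i)
        for (std::size_t j = 0; j < i; ++j) {
            double d = seqDistance(seqs[i], seqs[j]);
            if (d <= cutoff) out.emplace_back(i, j, d);
        }
    return out;
}
```

Every pair is independent of every other pair, which is what makes the embarrassingly parallel MPI scheme possible.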

Mothur uses MPI with an embarrassingly parallel scheme

A shared MPI-IO file pointer is employed, and every MPI process writes to a single file line by line, which causes write contention.

Solution: one file per process, which scales up to the number of Object Storage Targets (OSTs) of the parallel file system
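The file-per-process fix can be sketched without MPI by passing the rank and process count explicitly; in an MPI build they would come from MPI_Comm_rank and MPI_Comm_size, and each rank would open its own std::ofstream. The cyclic row assignment, the "prefix.rank.dist" naming scheme, and all function names below are assumptions for illustration, not Mothur's actual partitioning.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>  // std::ostringstream can stand in for a per-rank std::ofstream when testing
#include <string>
#include <vector>

// Rows of the lower-triangular distance matrix owned by one process.
// Row i holds i pairs, so a cyclic (round-robin) assignment keeps the
// per-process work roughly balanced.
std::vector<std::size_t> rowsForRank(std::size_t nSeqs, int rank, int nprocs) {
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    std::vector<std::size_t> rows;
    for (std::size_t i = static_cast<std::size_t>(rank); i < nSeqs;
         i += static_cast<std::size_t>(nprocs))
        rows.push_back(i);
    return rows;
}

// Hypothetical naming scheme: one output file per process, so there is no
// shared file pointer and no write contention between processes.
std::string rankFileName(const std::string& prefix, int rank) {
    return prefix + "." + std::to_string(rank) + ".dist";
}

// Write this process's pairs to its own stream as "i j distance" lines.
// distFn is any callable returning the distance between sequences i and j.
template <typename DistFn>
void writeRankRows(std::ostream& out, std::size_t nSeqs, int rank, int nprocs,
                   DistFn distFn) {
    for (std::size_t i : rowsForRank(nSeqs, rank, nprocs))
        for (std::size_t j = 0; j < i; ++j)
            out << i << ' ' << j << ' ' << distFn(i, j) << '\n';
}
```

Because no two ranks share a file or a file pointer, their writes proceed independently, and the per-process files can be spread across different OSTs, which is consistent with throughput scaling up to the OST count.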

Sequence clustering (cluster)

  • Unweighted Pair Group Method with Arithmetic mean (UPGMA)

    • Search the distance matrix and find the minimum cell (O(N^2))

    • Treat the found cell as a new node (merging its two clusters) and update its distances to the other cells (O(N))

    • Repeat the first two steps N times, or until the minimum distance found is larger than a predefined cutoff value

  • Time complexity O(N^3); memory complexity O(N^2); sequential implementation in Mothur
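The UPGMA steps above can be sketched on a dense matrix to make the O(N^3) time and O(N^2) memory cost concrete. This is not Mothur's implementation, which works on a SparseDistanceMatrix; the function name upgma is hypothetical. Distances from a merged cluster are updated with a size-weighted average, so every original sequence counts equally (the "unweighted" arithmetic mean in UPGMA).

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Dense-matrix UPGMA sketch. d is a symmetric N x N distance matrix.
// Each iteration: O(N^2) minimum search + O(N) update; at most N-1
// iterations, hence O(N^3) time and O(N^2) memory.
// Returns a cluster label (representative index) for every sequence.
std::vector<std::size_t> upgma(std::vector<std::vector<double>> d, double cutoff) {
    const std::size_t n = d.size();
    std::vector<std::size_t> label(n), members(n, 1);
    std::vector<char> active(n, 1);
    for (std::size_t i = 0; i < n; ++i) { assert(d[i].size() == n); label[i] = i; }

    for (std::size_t iter = 0; iter + 1 < n; ++iter) {
        // Step 1: find the smallest cell among active clusters.
        double best = std::numeric_limits<double>::infinity();
        std::size_t a = 0, b = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (!active[i]) continue;
            for (std::size_t j = 0; j < i; ++j)
                if (active[j] && d[i][j] < best) { best = d[i][j]; a = j; b = i; }
        }
        // Stop once the minimum distance exceeds the cutoff.
        if (best > cutoff) break;
        // Step 2: merge cluster b into a and update a's distance to every
        // other active cluster with the size-weighted average.
        for (std::size_t k = 0; k < n; ++k) {
            if (!active[k] || k == a || k == b) continue;
            double v = (members[a] * d[a][k] + members[b] * d[b][k]) /
                       static_cast<double>(members[a] + members[b]);
            d[a][k] = d[k][a] = v;
        }
        members[a] += members[b];
        active[b] = 0;
        for (std::size_t k = 0; k < n; ++k)
            if (label[k] == b) label[k] = a;
    }
    return label;
}
```

Step 1 dominates the cost, which is why the minimum search is the natural target for parallelization.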

Sequence clustering (cluster)

To use more than one socket, each thread should work on the part of the matrix allocated in its local memory

The distance matrix is represented by an STL vector

Under NUMA, the “first touch” policy applies: a memory page is placed on the NUMA node of the thread that first touches it

Solution: customize memory allocation by supplying a custom allocator, as in std::vector<Type, Allocator<Type> > numa_seqVec
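The slides do not show the allocator body. One common way to make first touch work with std::vector, sketched here under that assumption, is an allocator whose no-argument construct() default-initializes elements, so that constructing the vector does not write to its memory; a parallel loop with a static schedule then performs the first write. FirstTouchAllocator, NumaVector, and firstTouchFill are hypothetical names, and the authors' numa_seqVec allocator may well work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical allocator: storage comes from std::allocator, but the
// no-argument construct() default-initializes instead of value-initializing.
// For trivial types (e.g., float), NumaVector<T>(n) therefore does not write
// to the new memory, leaving the first write (and, under the first-touch
// policy, the NUMA page placement) to later code.
template <typename T>
struct FirstTouchAllocator : std::allocator<T> {
    using value_type = T;
    template <typename U>
    struct rebind { using other = FirstTouchAllocator<U>; };

    FirstTouchAllocator() = default;
    template <typename U>
    FirstTouchAllocator(const FirstTouchAllocator<U>&) noexcept {}

    // Used for value-less construction such as vector(n) or resize(n).
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;  // default-init: no write for trivial U
    }
    // Used when an initial value is supplied: forward it unchanged.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
    }
};

template <typename T>
using NumaVector = std::vector<T, FirstTouchAllocator<T>>;

// Parallel first touch: with schedule(static), each thread writes the same
// contiguous block that it will later process under the same schedule.
template <typename T>
void firstTouchFill(NumaVector<T>& v, const T& value) {
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) v[i] = value;
}
```

The trick only avoids the serial write for trivially constructible element types such as float; types with non-trivial constructors still write during construction.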

Sequence clustering (cluster)

The most important method in the custom allocator class is allocate()

Objects with dynamic data are problematic; e.g., the vector::erase method cannot be used
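One plausible reading of the vector::erase restriction is that erasing shifts every later element to a new address, so the elements a thread processes drift away from the pages it first touched. A hypothetical workaround, sketched below, marks cells as removed in place instead of erasing them; the Cell layout and the function names are assumptions, not Mothur's SparseDistanceMatrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sparse-matrix cell with an in-place "removed" flag.
struct Cell {
    std::size_t row, col;
    float dist;
    bool removed;
};

// O(1) logical erase: nothing moves, so each element stays in the page
// (and on the NUMA node) where it was first touched.
inline void removeCell(std::vector<Cell>& cells, std::size_t idx) {
    assert(idx < cells.size());
    cells[idx].removed = true;
}

// Number of cells still live; every scan must skip removed entries.
inline std::size_t liveCount(const std::vector<Cell>& cells) {
    std::size_t n = 0;
    for (const Cell& c : cells)
        if (!c.removed) ++n;
    return n;
}
```

The trade-off is that removed cells keep occupying memory and every scan pays to skip them.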

Sequence clustering (cluster)

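The hot spot (over 90% of the run time) in cluster is the SparseDistanceMatrix::getSmallestCell method

Use the same static scheduling as in the first-touch initialization, so each thread searches the part of the matrix in its local memory

Pin threads to cores by setting OMP_PROC_BIND or by using dplace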
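The getSmallestCell hot spot is a minimum search, which parallelizes as per-thread partial minima merged at the end. The sketch below searches a flat array of distances rather than Mothur's SparseDistanceMatrix, and smallestCell is a hypothetical name; it uses schedule(static) so that, if the array was initialized with the same static schedule, each thread reads the block it first touched.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Parallel minimum search (hypothetical sketch, not mothur's code).
// Assumes at least one finite distance. Ties go to the lower index, so the
// result does not depend on the number of threads.
std::size_t smallestCell(const std::vector<float>& dist) {
    assert(!dist.empty());
    const long n = static_cast<long>(dist.size());
    float bestVal = std::numeric_limits<float>::infinity();
    long bestIdx = -1;
    #pragma omp parallel
    {
        // Each thread scans its static block and keeps a private minimum.
        float localVal = std::numeric_limits<float>::infinity();
        long localIdx = -1;
        #pragma omp for schedule(static) nowait
        for (long i = 0; i < n; ++i)
            if (dist[i] < localVal) { localVal = dist[i]; localIdx = i; }
        // Merge the private minima one thread at a time.
        #pragma omp critical
        if (localIdx >= 0 &&
            (localVal < bestVal || (localVal == bestVal && localIdx < bestIdx))) {
            bestVal = localVal;
            bestIdx = localIdx;
        }
    }
    return static_cast<std::size_t>(bestIdx);
}
```

Thread placement should stay fixed for the whole run, e.g., by exporting OMP_PROC_BIND=true before launching or by launching under SGI's dplace; otherwise the operating system may migrate a thread away from the memory it touched.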

Performance results on Nautilus

Scaling of distance matrix calculation: 10000 sequences on Nautilus

Performance results on Nautilus

Run time and speedups for 5000 sequences on up to 128 cores

Ratio of run time on 16 cores to run time on up to 160 cores for 5000, 10000, and 2000 sequences


Pre-clustering and matrix loading have seen over 4x speedup on Nautilus

Distance calculation shows linear scaling up to the number of OSTs

Sequence clustering shows 7x speedup when number of cores increased by 10x
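The clustering result can be read as a parallel efficiency, assuming the 7x speedup is measured relative to the smaller core count:

$$E = \frac{S}{p_{\text{large}} / p_{\text{small}}} = \frac{7}{10} = 0.7$$

that is, about 70% of ideal linear scaling for a 10x increase in cores.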

Overall, the OTU pipeline is accelerated by orders of magnitude on Nautilus, and the optimizations are generally applicable to other shared memory machines.
