
SPRINT - PowerPoint PPT Presentation



SPRINT: A Simple Parallel R INTerface (www.r-sprint.org). Overview: what SPRINT is, how SPRINT differs from other parallel R packages, a biological example (post-genomic data analysis), and a code comparison.

Presentation Transcript
SPRINT

A Simple Parallel R INTerface


Overview

  • What is SPRINT?

  • How does SPRINT differ from other parallel R packages?

  • Biological example: Post-genomic data analysis

  • Code comparison

SPRINT

Simple Parallel R INTerface

(www.r-sprint.org)

“SPRINT: A new parallel framework for R”, J. Hill et al., BMC Bioinformatics, Dec 2008.

Issues of existing parallel R packages

  • Difficult to program

    • Require the scientist to also be a parallel programmer

    • Require substantial changes to existing scripts

  • Cannot be used to solve some problems

    • No data dependencies are allowed between parallel tasks
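The data-dependency restriction can be illustrated with a small base-R sketch. It is a generic example rather than anything taken from SPRINT or from the packages the slide refers to, and the variable names and numbers are assumptions chosen for illustration.

```r
# Independent tasks: each result depends only on its own input, so the
# calls could be handed to different processes in any order.
squares <- vapply(1:5, function(i) i^2, numeric(1))

# Dependent tasks: step i needs the result of step i - 1, so the
# iterations cannot simply be farmed out to separate workers.
running <- numeric(5)
running[1] <- 1
for (i in 2:5) {
  running[i] <- running[i - 1] * 0.5 + i
}

print(squares)
print(running)
```

Only the first pattern maps directly onto a task-farm style of parallelism. The second needs communication between steps, which is the kind of problem the slide says existing packages cannot handle.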

Biological example

  • Data: A matrix of expression measurements with genes in rows and samples in columns

Biological example

  • Problem: Using all or many genes will either crash R or be very slow (R memory allocation limits, number of computations)

    • Data limitations (correlations)

    • Work load limitations (permutations)
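A rough sense of both limits comes from simple arithmetic. The sketch below assumes pairwise correlations are wanted between genes and stored as 8-byte doubles in a dense matrix. The helper names and the gene and permutation counts are hypothetical, chosen only for illustration.

```r
# Memory needed to hold a dense n x n correlation matrix of doubles
# (8 bytes per entry), expressed in GiB.
cor_matrix_gib <- function(n_genes) {
  n_genes^2 * 8 / 2^30
}

# Number of test statistics a permutation test must compute:
# one per gene for each permutation of the class labels.
permutation_work <- function(n_genes, n_permutations) {
  n_genes * n_permutations
}

# Hypothetical problem sizes.
for (n in c(1000, 10000, 32768)) {
  cat(sprintf("%.0f genes -> %.3f GiB for the correlation matrix\n",
              n, cor_matrix_gib(n)))
}
cat(sprintf("%.0f test statistics for 10000 genes and 100000 permutations\n",
            permutation_work(10000, 100000)))
```

The correlation matrix grows with the square of the number of genes, which is the data limitation. The permutation count multiplies the per-gene work, which is the work load limitation.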

Workarounds and solution

  • Workaround:

    • Remove as many genes as possible before applying the algorithm. This can be an arbitrary process and may remove relevant data.

    • Perform multiple executions and post-process the data. This can become a very painful procedure.

  • Solution: Parallelisation of R code can be made accessible to bioinformaticians and statisticians through a library in which experts code the solutions once, followed by easy end-point use by all.
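The second workaround, multiple executions followed by post-processing, might look like the following base-R sketch. It is not SPRINT code: the function name blockwise_row_cor, the callback argument, and the synthetic data are assumptions made for illustration.

```r
# Correlate the rows (genes) of x block by block, so that only a
# block_size x n_genes slice of the result is produced at a time.
blockwise_row_cor <- function(x, block_size, handle_block) {
  n <- nrow(x)
  for (s in seq(1, n, by = block_size)) {
    rows <- s:min(s + block_size - 1, n)
    # cor() correlates columns, so transpose to correlate genes.
    block <- cor(t(x[rows, , drop = FALSE]), t(x))
    handle_block(rows, block)
  }
  invisible(NULL)
}

# Usage on small synthetic data: the blocks are reassembled here only to
# check them against a single call to cor(). A real run would instead
# write each block to disk and post-process the files afterwards.
set.seed(42)
x <- matrix(rnorm(20 * 6), nrow = 20)   # 20 hypothetical genes, 6 samples
full <- matrix(NA_real_, 20, 20)
blockwise_row_cor(x, block_size = 7, function(rows, block) {
  full[rows, ] <<- block
})
print(isTRUE(all.equal(full, cor(t(x)))))
```

Even in this small form the bookkeeping (block boundaries, output handling, reassembly) falls on the user, which is the burden the slide describes.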

[Diagram labels: Big Post-Genomic Data; SPRINT; HPC; R; Biological Results]

Benchmarks (256 processes)

[Charts not reproduced: data limitations (correlations) and work load limitations (permutations)]

Correlation code comparison

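# Serial R
edata <- read.table("largedata.dat")
pearsonpairwise <- cor(edata)
write.table(pearsonpairwise, "Correlations.txt")
quit(save="no")

# Parallel R with SPRINT
library("sprint")
edata <- read.table("largedata.dat")
# pcor() is the SPRINT parallel counterpart of cor()
ff_handle <- pcor(edata)
# Shut down SPRINT before leaving R
pterminate()
quit(save="no")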

Permutation testing code comparison

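# Serial R
# The golub data set and mt.maxT() come from the multtest package
library("multtest")
data(golub)
smallgd <- golub[1:100,]
classlabel <- golub.cl
resT <- mt.maxT(smallgd, classlabel, test="t", side="abs")
quit(save="no")

# Parallel R with SPRINT
library("sprint")
# multtest is still needed for the golub data set
library("multtest")
data(golub)
smallgd <- golub[1:100,]
classlabel <- golub.cl
# pmaxT() is the SPRINT parallel counterpart of mt.maxT()
resT <- pmaxT(smallgd, classlabel, test="t", side="abs")
# Shut down SPRINT before leaving R
pterminate()
quit(save="no")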
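To make the work load concrete, here is a minimal base-R sketch of a permutation-based max-T adjustment. It is a simplified single-step variant written for illustration, not the implementation of mt.maxT or pmaxT. The function names row_welch_t and single_step_maxT, the synthetic data, and the permutation count are all assumptions.

```r
# Welch t-statistic for every row (gene) of x, given 0/1 class labels.
row_welch_t <- function(x, labels) {
  a <- x[, labels == 0, drop = FALSE]
  b <- x[, labels == 1, drop = FALSE]
  va <- apply(a, 1, var) / ncol(a)
  vb <- apply(b, 1, var) / ncol(b)
  (rowMeans(b) - rowMeans(a)) / sqrt(va + vb)
}

# Single-step max-T: compare each observed |t| with the distribution of
# the maximum |t| over all genes under B random label permutations.
single_step_maxT <- function(x, labels, B = 1000) {
  observed <- abs(row_welch_t(x, labels))
  max_null <- vapply(seq_len(B), function(b) {
    max(abs(row_welch_t(x, sample(labels))))
  }, numeric(1))
  adjp <- vapply(observed, function(t) mean(max_null >= t), numeric(1))
  list(teststat = observed, adjp = adjp)
}

# Usage on synthetic data (hypothetical sizes).
set.seed(7)
x <- matrix(rnorm(50 * 20), nrow = 50)       # 50 genes, 20 samples
labels <- rep(c(0, 1), each = 10)
x[1, labels == 1] <- x[1, labels == 1] + 5   # make gene 1 stand out
res <- single_step_maxT(x, labels, B = 200)
print(head(order(res$adjp)))
```

Each permutation is computed independently of the others, so the B iterations can in principle be divided among processes. The total cost grows with the number of genes times the number of permutations.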

Acknowledgements

DPM Team:

  • Peter Ghazal

  • Thorsten Forster

  • Muriel Mewissen

Numerical Algorithms Group

EPCC Team:

  • Terry Sloan

  • Michal Piotrowski

  • Savvas Petrou

  • Bartek Dobrzelecki

  • Jon Hill

  • Florian Scharinger

This work is supported by the Wellcome Trust and the NAG dCSE Support service.
