Sprint
SPRINT

A Simple Parallel R INTerface


Overview

  • What is SPRINT

  • How is SPRINT different from other parallel R packages

  • Biological example: Post-genomic data analysis

  • Code comparison



SPRINT

Simple Parallel R INTerface

(www.r-sprint.org)

“SPRINT: A new parallel framework for R”, J. Hill et al., BMC Bioinformatics, Dec 2008.



Issues with existing parallel R packages

  • Difficult to program

    • Require the scientist to also be a parallel programmer!

    • Require substantial changes to existing scripts

  • Can’t be used to solve some problems

    • No data dependencies allowed



Biological example

  • Data: A matrix of expression measurements with genes in rows and samples in columns
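A minimal sketch of this data layout, using made-up random numbers rather than real expression measurements (the object names and sizes are hypothetical). It also shows a detail of base R that matters for this layout: `cor()` correlates columns, so with genes in rows a gene-by-gene correlation needs the matrix transposed first.

```r
# Hypothetical toy data: 5 genes (rows) x 8 samples (columns).
set.seed(1)
expr <- matrix(rnorm(5 * 8), nrow = 5, ncol = 8,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:8)))

# cor() correlates columns, so this is an 8 x 8 sample-by-sample matrix.
sample_cor <- cor(expr)

# Transposing first gives the 5 x 5 gene-by-gene matrix.
gene_cor <- cor(t(expr))

dim(sample_cor)
dim(gene_cor)
```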



Biological example

  • Problem: Using all or many genes will either crash R or be very slow (R memory allocation limits, number of computations)

    • Data limitations (correlations)

    • Work load limitations (permutations)
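The data limitation can be made concrete with a little arithmetic: a dense correlation matrix over n genes has n × n entries of 8 bytes each, so memory grows quadratically with the number of genes. The sketch below assumes double-precision storage and uses gene counts that are purely illustrative, not figures from the presentation; `cor_matrix_gib` is a hypothetical helper name.

```r
# Size in GiB of a dense n x n matrix of doubles (8 bytes per entry).
cor_matrix_gib <- function(n_genes) {
  n_genes^2 * 8 / 1024^3
}

# Hypothetical gene counts, chosen only to show the quadratic growth.
n_genes <- c(1000, 10000, 50000)
data.frame(genes = n_genes, GiB = round(cor_matrix_gib(n_genes), 2))
```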



Workarounds and solution

  • Workaround:

    • Remove as many genes as possible before applying the algorithm. This can be an arbitrary process and may remove relevant data.

    • Perform multiple executions and post-process the data. This can become a very painful procedure.

  • Solution: Parallelisation of R code can be made accessible to bioinformaticians and statisticians through a library of solutions coded once by experts, then easy end-point use by all.
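To illustrate why the "multiple executions plus post-processing" workaround gets painful, here is a rough base-R sketch that computes a correlation matrix in blocks and stitches the pieces back together. The toy data, `block_size`, and the variable names are assumptions made for illustration; in practice each block would be a separate run writing to its own file, which adds even more bookkeeping.

```r
# Hypothetical toy data: 6 observations (rows) x 10 variables (columns).
set.seed(42)
x <- matrix(rnorm(6 * 10), nrow = 6, ncol = 10)

block_size <- 4                       # columns handled per "execution"
starts <- seq(1, ncol(x), by = block_size)

# Each block yields a horizontal strip of the full correlation matrix.
strips <- lapply(starts, function(s) {
  cols <- s:min(s + block_size - 1, ncol(x))
  cor(x[, cols, drop = FALSE], x)
})

# Post-processing step: stitch the strips back together.
blockwise <- do.call(rbind, strips)

# Compare against the single-call result.
all.equal(blockwise, cor(x))
```

Even in this small form, the user has to manage block boundaries and reassembly by hand, which is the kind of work the slide argues should be coded once in a library.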

[Figure: diagram with the labels "Big Post-Genomic Data", "SPRINT", "HPC", "R" and "Biological Results"]



Benchmarks (256 processes)

Data limitations (correlations)

Work load limitations (permutations)



Correlation code comparison

# Serial version: cor() from base R
edata <- read.table("largedata.dat")
pearsonpairwise <- cor(edata)
write.table(pearsonpairwise, "Correlations.txt")
quit(save="no")

# Parallel version: pcor() from SPRINT
library("sprint")
edata <- read.table("largedata.dat")
ff_handle <- pcor(edata)
pterminate()    # end the SPRINT session before quitting R
quit(save="no")



Permutation testing code comparison

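# Serial version: mt.maxT() from multtest
library("multtest")    # provides mt.maxT() and the golub data set
data(golub)
smallgd <- golub[1:100,]
classlabel <- golub.cl
resT <- mt.maxT(smallgd, classlabel, test="t", side="abs")
quit(save="no")

# Parallel version: pmaxT() from SPRINT
library("sprint")
library("multtest")    # the golub data set ships with multtest
data(golub)
smallgd <- golub[1:100,]
classlabel <- golub.cl
resT <- pmaxT(smallgd, classlabel, test="t", side="abs")
pterminate()    # end the SPRINT session before quitting R
quit(save="no")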
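For readers unfamiliar with permutation testing, the following base-R sketch shows the general idea behind a max-statistic permutation test and why its workload grows with the number of permutations. It is a simplified single-step variant written for illustration only; it is not the implementation used by multtest or SPRINT. The toy data, the helper `row_t`, and the permutation count `B` are all assumptions.

```r
# Hypothetical toy data: 20 genes (rows) x 10 samples (columns), two classes.
set.seed(7)
expr <- matrix(rnorm(20 * 10), nrow = 20)
classlabel <- rep(c(0, 1), each = 5)

# Welch t-statistic for every gene (row) at once.
row_t <- function(x, labels) {
  a <- x[, labels == 0, drop = FALSE]
  b <- x[, labels == 1, drop = FALSE]
  va <- apply(a, 1, var) / ncol(a)
  vb <- apply(b, 1, var) / ncol(b)
  (rowMeans(b) - rowMeans(a)) / sqrt(va + vb)
}

observed <- abs(row_t(expr, classlabel))

# Each permutation is independent of the others, which is what makes
# the workload easy to split across processes.
B <- 200
max_null <- replicate(B, max(abs(row_t(expr, sample(classlabel)))))

# Single-step adjusted p-value: share of permutations whose largest
# statistic reaches the observed statistic of the gene.
adj_p <- sapply(observed, function(t_obs) mean(max_null >= t_obs))
```

Every permutation recomputes a statistic for every gene, so the cost scales with genes × permutations, which corresponds to the work load limitation named earlier in the slides.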



SPRINT

  • Website: http://www.r-sprint.org/

  • Source code can be downloaded from website

  • Soon also in the CRAN repository

  • Mailing list: [email protected]

  • Contact email: [email protected]



Acknowledgements

DPM Team:

  • Peter Ghazal

  • Thorsten Forster

  • Muriel Mewissen

EPCC Team:

  • Terry Sloan

  • Michal Piotrowski

  • Savvas Petrou

  • Bartek Dobrzelecki

  • Jon Hill

  • Florian Scharinger

Numerical Algorithms Group

This work is supported by the Wellcome Trust and the NAG dCSE Support service.
