
SPRINT

A Simple Parallel R INTerface



Overview

  • What is SPRINT

  • How is SPRINT different from other parallel R packages

  • Biological example: Post-genomic data analysis

  • Code comparison

SPRINT



SPRINT

Simple Parallel R INTerface

(www.r-sprint.org)

“SPRINT: A new parallel framework for R”, J. Hill et al., BMC Bioinformatics, December 2008.


Issues of existing parallel R packages

  • Difficult to program

    • Require the scientist to also be a parallel programmer!

  • Require substantial changes to existing scripts

  • Can’t be used to solve some problems

    • No data dependencies allowed


Biological example

  • Data: A matrix of expression measurements with genes in rows and samples in columns
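A minimal base-R illustration of this layout, using a small random matrix as hypothetical stand-in data (not the study data). It also shows a detail worth keeping in mind: base R's cor() correlates the columns of its argument, so with genes in rows the matrix has to be transposed to obtain gene-by-gene correlations.

```r
# Hypothetical toy data: 6 genes (rows) x 4 samples (columns), random values
set.seed(1)
expr <- matrix(rnorm(6 * 4), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# cor() works column-wise, so this is a 4 x 4 sample-by-sample matrix
sample_cor <- cor(expr)

# Transposing first gives a 6 x 6 gene-by-gene matrix
gene_cor <- cor(t(expr))

print(dim(sample_cor))
print(dim(gene_cor))
```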


Biological example

  • Problem: using all or many genes will either crash R or be very slow (R memory allocation limits, number of computations)

Data limitations (correlations)

Workload limitations (permutations)
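A rough back-of-the-envelope check of the data limitation, assuming the full gene-by-gene correlation matrix is held in memory as R doubles (8 bytes each). The gene counts below are illustrative assumptions, not figures from the slides. R releases before 3.0.0 also capped any single vector, and hence any matrix, at 2^31 - 1 elements, which bounds the largest square matrix R could hold.

```r
# Bytes needed for an n x n matrix of doubles (8 bytes per element)
cor_matrix_bytes <- function(n_genes) 8 * n_genes^2

# Illustrative gene counts (hypothetical, not taken from the slides)
n_genes <- c(5000, 10000, 30000)
gb <- cor_matrix_bytes(n_genes) / 1e9
print(data.frame(n_genes, gigabytes = gb))

# Largest square matrix under the pre-R-3.0.0 limit of 2^31 - 1 elements
max_square_dim <- floor(sqrt(2^31 - 1))
print(max_square_dim)
```

Because the size grows with the square of the number of genes, tripling the gene count multiplies the memory needed by nine.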


Workarounds and solution

  • Workarounds:

    • Remove as many genes as possible before applying the algorithm. This filtering can be arbitrary and may remove relevant data.

    • Perform multiple executions on subsets of the data and post-process the results. This can become a very painful procedure.

  • Solution: parallelisation of R code can be made accessible to bioinformaticians and statisticians. Experts code the parallel solutions once in a library; everyone can then use them easily.
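The second workaround can be sketched in base R as follows: the gene-by-gene correlation matrix is computed one block of rows at a time and the slices are stacked back together. This is a hypothetical helper (blockwise_gene_cor and block_size are illustrative names), not part of SPRINT. In a real workaround each block would typically be a separate run whose output is saved to disk and merged afterwards; the sketch keeps everything in memory only to show that the pieces reassemble into the same result.

```r
# Hypothetical helper: gene-by-gene Pearson correlations computed in row blocks.
# x has genes in rows and samples in columns.
blockwise_gene_cor <- function(x, block_size = 1000) {
  n <- nrow(x)
  starts <- seq(1, n, by = block_size)
  pieces <- lapply(starts, function(s) {
    idx <- s:min(s + block_size - 1, n)
    # Correlations between the genes in this block and all genes:
    # a length(idx) x n slice of the full n x n matrix
    cor(t(x[idx, , drop = FALSE]), t(x))
  })
  # "Post-processing": stack the slices back into the full matrix
  do.call(rbind, pieces)
}

# Usage on random toy data (50 genes x 8 samples)
set.seed(2)
x <- matrix(rnorm(50 * 8), nrow = 50)
full <- blockwise_gene_cor(x, block_size = 7)
print(all.equal(full, cor(t(x))))
```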

[Diagram: Big post-genomic data → SPRINT (HPC + R) → Biological results]


Benchmarks (256 processes)

Data limitations (correlations)

Workload limitations (permutations)


Correlation code comparison

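Serial R:

edata <- read.table("largedata.dat")              # read the expression matrix
pearsonpairwise <- cor(edata)                     # Pearson correlations between the columns of edata
write.table(pearsonpairwise, "Correlations.txt")  # save the result as text
quit(save="no")

SPRINT (parallel):

library("sprint")
edata <- read.table("largedata.dat")
ff_handle <- pcor(edata)   # SPRINT's parallel counterpart of cor(); returns a handle to a file-backed (ff) result
pterminate()               # shut down SPRINT's parallel processes before quitting
quit(save="no")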


Permutation testing code comparison

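Serial R (multtest):

library("multtest")        # provides mt.maxT() and the golub example data
data(golub)                # loads golub (expression matrix) and golub.cl (class labels)
smallgd <- golub[1:100,]   # first 100 genes only
classlabel <- golub.cl
resT <- mt.maxT(smallgd, classlabel, test="t", side="abs")   # maxT permutation test
quit(save="no")

SPRINT (parallel):

library("sprint")
library("multtest")        # still needed for the golub data
data(golub)
smallgd <- golub[1:100,]
classlabel <- golub.cl
resT <- pmaxT(smallgd, classlabel, test="t", side="abs")     # parallel counterpart of mt.maxT()
pterminate()
quit(save="no")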
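For readers unfamiliar with what mt.maxT() and pmaxT() compute, the sketch below implements the Westfall and Young step-down maxT procedure in base R, using Welch t-statistics and absolute values as the analogue of test="t", side="abs". It is an illustration under simplifying assumptions, not the multtest or SPRINT code: permutations are drawn at random (so adjusted p-values will not match mt.maxT() exactly), the data are synthetic, and maxT_stepdown and row_welch_t are hypothetical names. Each pass of the permutation loop is independent of the others, which is what makes this workload a natural candidate for spreading across many processes.

```r
# Welch two-sample t-statistic for every row (gene) of x; cl holds 0/1 labels
row_welch_t <- function(x, cl) {
  a <- x[, cl == 0, drop = FALSE]
  b <- x[, cl == 1, drop = FALSE]
  va <- apply(a, 1, var)
  vb <- apply(b, 1, var)
  (rowMeans(b) - rowMeans(a)) / sqrt(va / ncol(a) + vb / ncol(b))
}

# Step-down maxT adjusted p-values (Westfall & Young) from B random permutations
maxT_stepdown <- function(x, cl, B = 1000, seed = 123) {
  set.seed(seed)
  obs <- abs(row_welch_t(x, cl))
  ord <- order(obs, decreasing = TRUE)   # most significant gene first
  obs_sorted <- obs[ord]
  hits <- numeric(length(obs))
  for (b in seq_len(B)) {
    perm_t <- abs(row_welch_t(x, sample(cl)))[ord]
    # u[i] = max of the permuted statistics from position i down to the last gene
    u <- rev(cummax(rev(perm_t)))
    hits <- hits + (u >= obs_sorted)
  }
  # Enforce monotonicity down the ordered list, then map back to gene order
  p_sorted <- cummax(hits / B)
  p_adj <- numeric(length(obs))
  p_adj[ord] <- p_sorted
  p_adj
}

# Usage on synthetic data: 20 genes x 10 samples, gene 1 strongly shifted in class 1
set.seed(42)
x <- matrix(rnorm(20 * 10), nrow = 20)
cl <- rep(c(0, 1), each = 5)
x[1, cl == 1] <- x[1, cl == 1] + 10
p_adj <- maxT_stepdown(x, cl, B = 200)
print(round(p_adj, 3))
```

The step-down form compares each gene only against the maximum over genes that are no more significant than it, which makes it less conservative than comparing every gene against the single overall maximum.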


SPRINT

  • Website: http://www.r-sprint.org/

  • Source code can be downloaded from the website

  • Soon also available from the CRAN repository

  • Mailing list: [email protected]

  • Contact email: [email protected]


Acknowledgements

DPM Team:

  • Peter Ghazal

  • Thorsten Forster

  • Muriel Mewissen

Numerical Algorithms Group


EPCC Team:

  • Terry Sloan

  • Michal Piotrowski

  • Savvas Petrou

  • Bartek Dobrzelecki

  • Jon Hill

  • Florian Scharinger

This work is supported by the Wellcome Trust and the NAG dCSE Support service.
