Running Batch Jobs in R: How to deal with coarsely parallel problems

Malcolm Haddon. May 2014. Wealth from Oceans National Research Flagship.

Computer Intensive

  • Many, many, many iterations:

    • Management Strategy Evaluation

    • Markov chain Monte Carlo (MCMC)

    • Lots of replicates of any analyses

  • Large scale simulations:

    • multi-species,

    • multi-populations,

    • multi-’etc’

  • Any computing job that takes a long time or uses a lot of computing resources

| Batch Jobs in R | Haddon


Why the Fuss?

  • Solving BIG computing problems has its own strategies.

  • If a job:

    • takes a very long time, or

    • uses very large amounts of RAM

    • Then how can it be split up most effectively?

  • Depends on the scale at which processes are independent.

  • May need trials to find best compromise.

Coarsely Parallel Processes

  • Not talking about finely parallel processes such as cellular models in Oceanography or visualization.

    • The use of GPUs containing thousands of small processors is ideally suited to such analyses.

    • There is some emphasis on this with the CSIRO clusters (Bragg, etc.) and the Advanced Scientific Computing program.

  • Instead: focussed on serial and sequential problems where analysis order is important.

    • Population processes

    • Many biological processes

  • Cannot split up time-series trajectories – but can treat each trajectory as a different process (coarsely parallel)
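
A minimal sketch of this distinction, not taken from the presentation: within one trajectory each year depends on the year before, so the loop over years has to run in order, but replicate trajectories share nothing and can be run in any order or in separate processes. The function `simulate_trajectory` and its logistic-growth parameters are hypothetical and only stand in for a real population model.

```r
# Hypothetical population model: logistic growth with lognormal process error.
# The loop over years is inherently sequential because N[yr] depends on N[yr - 1].
simulate_trajectory <- function(seed, nyrs = 40, r = 0.3, K = 1000, N0 = 500) {
  set.seed(seed)                      # one seed per replicate keeps runs reproducible
  N <- numeric(nyrs)
  N[1] <- N0
  for (yr in 2:nyrs) {
    growth <- r * N[yr - 1] * (1 - N[yr - 1] / K)
    N[yr] <- max((N[yr - 1] + growth) * exp(rnorm(1, 0, 0.1)), 0)
  }
  N
}

# Replicates are independent of one another, so each call could be handed
# to a separate R process (the coarsely parallel unit) with the same results.
seeds <- 1:10
trajectories <- lapply(seeds, simulate_trajectory)
result <- do.call(cbind, trajectories)   # rows are years, columns are replicates
```

Because each replicate sets its own seed, re-running replicate 3 on its own gives the same trajectory as column 3 of `result`, which is what makes splitting the work safe.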

Alternative Approaches to Simulation

Task: apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000).

  • As one job:

    • for (HS in 1:8) { for (iter in 1:1000) { } }

    • plot and tabulate results

  • Split the job into 8 parts:

    • for (iter in 1:1000) { } in each part

    • Store Results from each part

    • Combine

    • plot and tabulate results

  • Next Steps
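
The split, store and combine steps can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: `run_strategy` is a hypothetical stand-in that draws random catches instead of running the abalone model, and `resultdir` is an assumed temporary output directory. In a real batch run each of the 8 parts would be its own R process; here they run in turn, but each writes only its own file, as a separate job would.

```r
# Hypothetical stand-in for one harvest-strategy run: returns the
# final-year catch of every replicate for harvest strategy HS.
run_strategy <- function(HS, reps = 1000, nyrs = 40) {
  set.seed(HS)                               # separate seed per part
  catch <- matrix(rlnorm(reps * nyrs, meanlog = log(100 + HS)), reps, nyrs)
  data.frame(HS = HS, rep = seq_len(reps), finalcatch = catch[, nyrs])
}

resultdir <- file.path(tempdir(), "results") # assumed output directory
dir.create(resultdir, showWarnings = FALSE)

# Store Results: one file per part, so parts never write to the same file.
for (HS in 1:8) {
  saveRDS(run_strategy(HS), file.path(resultdir, sprintf("HS%02d.rds", HS)))
}

# Combine: read every stored part back in and stack them.
files <- list.files(resultdir, pattern = "^HS[0-9]+\\.rds$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, readRDS))

# Tabulate: median final-year catch for each harvest strategy.
summary_tab <- aggregate(finalcatch ~ HS, data = combined, FUN = median)
```

Keeping one output file per part means the combine step only needs the directory contents, so it can be run later and separately from the simulations.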

The R program

batchsimab.r

  • setwd, resultdir

  • read in Data

  • source("Constants")

  • source("run_specification")

  • source("Lots of Functions")

  • write to csv file(s)

  • write to Rdata files

  • plots to tiff/pdf/etc

Top Level: runbatch.R – contains:

## SET PARAMETERS AS DESIRED IN
## runspecification.R and constants.R
wkdir <- "C:/A_CSIRO/Rcode/abalone/SimAb"
setwd(wkdir) ## points to directory containing batchsimab.r
command <- "R.exe --vanilla < batchsimab.R"
shell(command, wait=FALSE) ## returns at once; the job runs in the background
## (R.exe must be on the path).
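
`shell()` is only available in R for Windows. A sketch of a more portable launch, assuming it is acceptable to call the `Rscript` executable that belongs to the running R installation (which also avoids needing R.exe on the path). The script `demo_batch.R` and its output file are hypothetical stand-ins for batchsimab.R and its results.

```r
# Locate the Rscript executable that belongs to the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Hypothetical stand-in for batchsimab.R: a tiny script that writes one file.
# It writes to a temporary name and then renames, so a half-written file
# is never mistaken for a finished result.
workdir <- tempdir()
script  <- file.path(workdir, "demo_batch.R")
outfile <- file.path(workdir, "demo_result.csv")
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "tmp <- paste0(args[1], '.part')",
  "write.csv(data.frame(x = 1:3, y = (1:3)^2), tmp, row.names = FALSE)",
  "file.rename(tmp, args[1])"
), script)
if (file.exists(outfile)) file.remove(outfile)

# wait = FALSE returns immediately, so several jobs can be started in a row.
system2(rscript, args = c("--vanilla", shQuote(script), shQuote(outfile)),
        wait = FALSE)

# The parent session has to poll (or check later) for the job's output.
waited <- 0
while (!file.exists(outfile) && waited < 60) {
  Sys.sleep(0.5)
  waited <- waited + 0.5
}
```

Passing the output file name as a command-line argument is a design choice of this sketch; the original approach instead has the batch script read its settings from a specification file.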

Top Level: runbatch.R – contains:

## SET PARAMETERS AS DESIRED IN
## RunSpecification.R and constants.R
primaryloop <- c(val1, val2, val3,..)
for (toplevel in 1:length(primaryloop)) {
   sink("RunSpecification.R")
   ## cat() statements here re-write the values in RunSpecification.R
   sink()
   command <- "R.exe --vanilla < batchsimab.R"
   shell(command, wait=FALSE)
}

pickLML <- c(127,132,138,145)
for (pick in 1:length(pickLML)) {
   filename <- "alt_runspecification.r"
   sink(filename)
   cat("##Select the HCR \n")
   cat("StepH <- FALSE \n")
   cat("ConstH <- TRUE \n")
   cat("## Define the Scenarios \n")
   cat("initDepl_L <- c(0.7) \n")
   cat("inH_L <- c(0.1) \n")
   cat("origTAC <- 150.0 \n")
   cat(paste("LML <- ",pickLML[pick],sep="") ," \n")
   cat("reps <- 100 \n")
   sink()
   command <- "R.exe --vanilla < batchsimab.R"
   shell(command, wait=FALSE)
   Sys.sleep(5.0)
}
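
In the loop above every job reads the same file, alt_runspecification.r, and the `Sys.sleep(5.0)` presumably gives each new R process time to read it before it is overwritten for the next job. A possible variation, sketched here as an assumption and not as the presenter's method, writes one specification file per LML value so that no job depends on timing. The helper `write_spec`, the directory `specdir` and the file naming are hypothetical, and batchsimab.R would need to be changed to accept a file name.

```r
pickLML <- c(127, 132, 138, 145)
specdir <- file.path(tempdir(), "specs")      # assumed location for spec files
dir.create(specdir, showWarnings = FALSE)

# Write one specification file per LML value, so no job can read a file
# that is being overwritten for the next job.
write_spec <- function(LML, dir) {
  filename <- file.path(dir, sprintf("runspecification_LML%d.r", LML))
  writeLines(c(
    "##Select the HCR",
    "StepH <- FALSE",
    "ConstH <- TRUE",
    "## Define the Scenarios",
    "initDepl_L <- c(0.7)",
    "inH_L <- c(0.1)",
    "origTAC <- 150.0",
    paste0("LML <- ", LML),
    "reps <- 100"
  ), filename)
  filename
}
specfiles <- vapply(pickLML, write_spec, character(1), dir = specdir)

# Hypothetical launch commands (built but not run here): batchsimab.R would
# have to pick up the file name with commandArgs(trailingOnly = TRUE).
commands <- paste("Rscript --vanilla batchsimab.R", shQuote(specfiles))

# Check one file by sourcing it into its own environment.
spec <- new.env()
sys.source(specfiles[3], envir = spec)
```

After sourcing, `spec$LML` holds the value written for the third scenario, and the specification files remain on disk as a record of what each job was asked to do.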

alt_runspecification.r - contents

batch <- TRUE
##Select the HCR
StepH <- FALSE
ConstH <- TRUE
## Define the Scenarios
initDepl_L <- c(0.7)
inH_L <- c(0.1)
origTAC <- 150.0
LML <- 138
reps <- 100

Alternative Approach

Not that useful for coarsely parallel problems,

but excellent for finely parallel processes.

Alternative Approaches

  • Can use one’s own desktop or laptop.

  • Can use a secondary machine (remote login)

  • Can use a CSIRO cluster machine (bragg for Linux or bragg-w for Windows, plus others).

  • Clusters are very effective for finely parallel work but less so for coarsely parallel jobs.

  • Can use Condor – harvests CPU time on remote machines on network automatically.

  • wiki.csiro.au/display/ASC/Scientific+Computing+Homepage

Conclusion

  • The use of batch jobs provides a solution for completing certain types of task.

  • If you are using computer intensive methods then you might gain greatly from using coarsely parallel methods.

  • The trade-off between the benefits and the set-up time and post-run processing determines when it becomes sensible to use coarsely parallel methods.

  • There is invariably more than one way to do the same thing:

  • https://wiki.csiro.au/display/ASC/Scientific+Computing+Homepage

CSIRO Marine and Atmospheric Research

Malcolm Haddon

tel. 61 3 6232 5097

email. [email protected]

web. www.csiro.au

Thank you

Wealth from Oceans National Research Flagship


Adding in R.exe to Path

  • Control Panel

    • System

      • Advanced System Settings

        • Environment Variables

          • PATH - edit

  • Paste ";C:/Program Files/R/R-3.1.0/bin/x64" onto the end of the present PATH and exit (this is the default install location for R 3.1.0; adjust it to match the installed version).
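
Whether the edit is needed, and which directory to add, can be checked from inside R. A small sketch using only base R functions; the messages are illustrative.

```r
# Full path of the R executable found on the PATH ("" if none is found).
on_path <- Sys.which("R")

# Directory holding the executables of the R session that is running now;
# this is the directory that would be appended to PATH.
bindir <- normalizePath(R.home("bin"))

if (nzchar(on_path)) {
  message("R found on PATH at: ", on_path)
} else {
  message("R is not on the PATH; consider adding: ", bindir)
}
```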
