Eden parallel functional programming with haskell
Download
1 / 106

Eden Parallel Functional Programming with Haskell - PowerPoint PPT Presentation


  • 123 Views
  • Uploaded on
  • Presentation posted in: General

Eden Parallel Functional Programming with Haskell. Rita Loogen Philipps-Universität Marburg, Germany

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha

Download Presentation

Eden Parallel Functional Programming with Haskell

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Eden Parallel FunctionalProgrammingwithHaskell

Rita Loogen

Philipps-Universität Marburg, Germany

Joint workwithYolanda Ortega Mallén, Ricardo Peña Alberto de la Encina, Mercedes Hildalgo Herrero, ChristóbalPareja, Fernando Rubio, Lidia Sánchez-Gil, Clara Segura, Pablo Roldan Gomez (UniversidadComplutense de Madrid)

Jost Berthold, Silvia Breitinger, Mischa Dieterle, Thomas Horstmeyer, Ulrike Klusik, Oleg Lobachev, Bernhard Pickenbrock, Steffen Priebe, Björn Struckmeier(Philipps-Universität Marburg)

CEFP Budapest 2011


Marburg /Lahn

Rita Loogen: Eden – CEFP 2011


Overview

  • Lectures I & II (Thursday)

    • Motivation

    • Basic Constructs

    • Case Study: Mergesort

    • Eden TV – The Eden Trace Viewer

    • Reducingcommunicationcosts

    • Parallel mapimplementations

    • Explicit Channel Management

    • The Remote Data Concept

    • Algorithmic Skeletons

      • NestedWorkpools

      • DivideandConquer

  • Lecture III: Lab Session (Friday Morning)

  • Lecture IV: Implementation

    • LayeredStructure

    • Primitive Operations

    • The Eden Module

      • The Trans class

      • The PA monad

      • Process Handling

      • Remote Data


Materials

Materials

  • Lecture Notes

  • Slides

  • Example Programs (Case studies)

  • Exercises

    areprovided via the Eden web page

    www.informatik.uni-marburg.de/~eden

    Navigateto CEFP!

Rita Loogen: Eden – CEFP 2011


Motivation

Rita Loogen: Eden – CEFP 2011


Our Goal

Parallel programmingat a highlevelofabstraction

inherent parallelism

  • functional language (e.g. Haskell)

  • => concise programs

  • => high programming efficiency

automatic parallelisation

or annotations


Our Approach

Parallel programmingat a highlevelofabstraction

l

l

l

l

+

  • parallelismcontrol

    • explicit processes

    • implicitcommunication

    • distributedmemory

  • functional language (e.g. Haskell)

  • => concise programs

  • => high programming efficiency

Eden = Haskell + Parallelism

www.informatik.uni-marburg.de/~eden


Basic Constructs

Rita Loogen: Eden – CEFP 2011


Eden

parallel

programming

at a high level of

abstraction

  • = Haskell + Coordination

    • processdefinition

    • processinstantiation

process:: (Trans a, Trans b) => (a -> b) -> Process a b

gridProcess = process (\ (fromLeft,fromTop) ->

let ... in (toRight, toBottom))

processoutputs

computedby

concurrentthreads,

listssentasstreams

( # ) :: (Trans a, Trans b) => Process a b -> a ->b

(outEast, outSouth) = gridProcess# (inWest,inNorth)


Derivedoperatorsandfunctions

  • Parallel functionapplication

    • Often, processabstractionandinstantiationareused in thefollowingcombination

  • Eagerprocesscreation

    • Eagercreationof a seriesofprocesses

($#) :: (Trans a, Trans b) => (a -> b) -> a -> b

f $# x = process f # x -- ($#) = (#) . process

spawn :: (Trans a, Trans b) =>

[Processa b] -> [a] -> [b]

spawn= zipWith (#) -- ignoringdemandcontrol

spawnF ::(Trans a, Trans b) =>

[a -> b] -> [a] -> [b]

spawnF =spawn . (mapprocess)

Rita Loogen: Eden – CEFP 2011


Evaluating f $# e

graphofprocessabstraction

process f

graphofargumentexpressione

#

will beevaluated

in parentprocess

bynewconcurrentthread

andsenttochildprocess

will beevaluated

bynewchildprocess

on remote PE

11

resultof f $ e

mainprocess

creates

childprocess

resultof e

Rita Loogen: Eden – CEFP 2011


*2

*3

*5

sm

sm

DefiningprocessnetsExample: Computing Hammingnumbers

importControl.Parallel.Eden

hamming:: [Int]

hamming

= 1: sm ((uncurrysm) $#

(map (*2)$#hamming,

map (*3)$#hamming))

(map (*5)$# hamming)

sm:: [Int] -> [Int] -> [Int]

sm [] ys = ys

smxs [] = xs

sm (x:xs) (y:ys)

| x < y = x : smxs (y:ys)

| x == y = x : smxsys

| otherwise= y : sm (x:xs) ys

1:

hamming


QuestionsaboutSemantics

  • simple denotationalsemantics

    • processabstraction-> lambdaabstraction

    • processinstantiation-> application

    • value/resultofprogram, but noinformationaboutexecution, parallelismdegree, speedups /slowdowns

  • operational

    • When will a processbecreated? When will a processinstantiationbeevaluated?

    • Towhichdegree will process in-/outputsbeevaluated? Weakhead normal form or normal form or ...?

    • When will process in-/outputsbecommunicated?


Answers

Eden

onlyifandwhenitsresult

isdemanded

normal form

eager (push) communication:

valuesarecommunicated

assoonasavailable

Lazy Evaluation (Haskell)

onlyifandwhenitsresult

isdemanded

WHNF

(weakhead normal form )

onlyifdemanded:

requestandanswer

messagesnecessary

  • When will a processbecreated? When will a processinstantiationbeevaluated?

  • Towhichdegree will process in-/outputsbeevaluated? Weakhead normal form or normal form or ...?

  • When will process in-/outputsbecommunicated?


Lazyevaluation vs. Parallelism

  • Problem:Lazyevaluation==>distributedsequentiality

  • Eden‘sapproach:

    • eagerprocesscreationwithspawn

    • eagercommunication:

      • normal form evaluationof all processoutputs(byindependentthreads)

      • push communication, i.e. valuesarecommunicatedassoonasavailable

    • explicit demandcontrolbysequentialstrategies(Module Control.Seq):

      • rnf, rwhnf... :: Strategy a

      • using :: a -> Strategy a -> a

      • pseq :: a -> b -> b (Module Control.Parallel)


Case Study: MergeSort

Rita Loogen: Eden – CEFP 2011


Case Study: MergeSort

Unsorted

sublist 1

Sorted

sublist 1

Haskell Code:

mergeSort :: (Ord a, Show a) => [a] -> [a]

mergeSort [] = []

mergeSort [x] = [x]

mergeSortxs = sortMerge (mergeSort xs1) (mergeSort xs2)

where [xs1,xs2] = unshuffle 2xs

sorted

list

Unsorted

list

split

merge

Unsorted

sublist 2

sorted

Sublist 2


Example: MergeSortparallel

Unsorted

sublist 1

Sorted

sublist 1

Eden Code (simplestversion):

parMergeSort :: (Ord a, Show a, Trans a) => [a] -> [a]

parMergeSort [] = []

parMergeSort [x] = [x]

parMergeSortxs = sortMerge (parMergeSort$# xs1) (parMergeSort$# xs2)

where [xs1,xs2] = unshuffle 2xs

sorted

list

Unsorted

list

split

merge

Unsorted

sublist 2

sorted

Sublist 2


Example: MergeSortProcessnet

childprocess

childprocess

Eden Code (simplestversion):

parMergeSort :: (Ord a, Show a, Trans a) => [a] -> [a]

parMergeSort [] = []

parMergeSort [x] = [x]

parMergeSortxs = sortMerge (parMergeSort$# xs1) (parMergeSort$# xs2)

where [xs1,xs2] = unshuffle 2xs

childprocess

mainprocess

childprocess

childprocess

childprocess


EdenTV: The Eden Trace Viewer Tool

Rita Loogen: Eden – CEFP 2011


The Eden-System

Eden

Parallel runtimesystem

(Management ofprocesses

andcommunication)

EdenTV

parallel system


Compiling, Running, Analysing Eden Programs

Set upenvironmentfor Eden on Lab computersbycalling

edenenv

Compile Eden programswith

ghc –parmpi --make –O2 –eventlogmyprogram.hsor

ghc –parpvm --make –O2 –eventlogmyprogram.hs

Ifyouusepvm, youfirsthavetostart it.

Providepvmhostsormpihostsfile

Runcompiledprogramswith

myprogram <parameters> +RTS –ls -N<noPe> -RTS

Viewactivityprofile (tracefile) with

edentvmyprogram_..._-N4_-RTS.parevents

Rita Loogen: Eden – CEFP 2011


deblock thread

new thread

runnable

suspend thread

block thread

kill thread

run thread

running

blocked

kill thread

finished

kill thread

Eden Threads andProcesses

  • An Eden processcomprisesseveralthreads(one per outputchannel).

  • Thread State Transition Diagram:


EdenTV

-Diagrams:

Machines (PEs)

Processes

Threads

- Message

Overlays

Machines

Processes

- zooming

- messagestreams

- additional infos

- ...


EdenTV Demo

Rita Loogen: Eden – CEFP 2011


Case Study: MergeSortcontinued

Rita Loogen: Eden – CEFP 2011


Example: Activityprofileof parallel mergesort

  • Program run, lengthofinputlist: 1.000

  • Observation:

  • SLOWDOWN

  • Seq. runtime: 0,0037 s

  • Par. runtime: 0,9472 s

  • Reasons:

  • 1999 processes, mostlyblocked

  • 31940 messages

  • delayedprocesscreation

  • processplacement


Howcanweimproveour parallel mergesort? Herearesomerulesofthumb.

  • Adaptthe total numberofprocessestothenumberofavailableprocessorelements (PEs), in Eden: noPe :: Int

  • UseeagerprocesscreationfunctionsspawnorspawnF.

  • Bydefault, Eden placesprocessesroundrobin on theavailable PEs. Try todistributeprocessesevenlyoverthe PEs.

  • Avoidelement-wisestreamingif not necessary, e.g. byputtingthelistintosome „box“ orbychunkingitintobiggerpieces.

THINK PARALLEL!

Rita Loogen: Eden – CEFP 2011


Parallel Mergesortrevisited

unsorted

sublist 1

sorted

sublist 1

mergesort

unsorted

sublist 2

sorted

sublist 2

mergesort

sorted

list

unsorted

list

mergemanylists

unshuffle (noPe-1)

unsorted

sublist noPe-1

sorted

sublistnoPe-1

mergesort

unsorted

sublistnoPe

sorted

sublistnoPe

mergesort

Rita Loogen: Eden – CEFP 2011


...

x1

x2

x3

x4

...

f

f

f

f

...

y1

y2

y3

y4

A Simple Parallelisationofmap

map :: (a -> b) -> [a] -> [b]

map f xs = [ f x | x <- xs ]

parMap :: (Trans a, Trans b) =>

(a -> b) -> [a] -> [b]

parMap f = spawn (repeat(process f))

1 process

per list element


Alternative Parallelisationofmergesort - 1st try

Eden Code:

par_ms :: (Ord a, Show a, Trans a) => [a] -> [a]

par_msxs

= head $ sms $parMapmergeSort

(unshuffle (noPe-1) xs))

sms :: (NFData a, Ord a) => [[a]] -> [[a]]

sms [] = []

smsxss@[xs] = xss

sms (xs1:xs2:xss) = sms (sortMerge xs1 xs2) (smsxss)

 Total numberofprocesses = noPe

 eagerlycreatedprocesses

 roundrobinplacementleadsto 1 process per PE

but maybe still toomanymessages


ResultingActivity Profile (Processes/Machine View)

Previousresultsforinputsize 1000

Seq. runtime: 0,0037 s

Par. runtime: 0,9472 s

  • Input size 1.000

  • seq. runtime: 0,0037

  • par. runtime: 0,0427 s

  • 8 Pes, 8 processes, 15 threads

  • 2042 messages

  • Much better, but still

  • SLOWDOWN

  • Reason: Indeedtoomanymessages

Rita Loogen: Eden – CEFP 2011


Reducing Communication Costs

Rita Loogen: Eden – CEFP 2011


ReducingNumberof Messages byChunking Streams

Split a list (stream) intochunks:

chunk :: Int -> [a] -> [[a]]

chunksize [] = []

chunksizexs = ys : chunksizezs

where (ys,zs) = splitAtsizexs

Combine with parallel map-implementationofmergesort:

par_ms_c :: (Ord a, Show a, Trans a) =>

Int -> -- chunksize

[a] -> [a]

par_ms_csizexs

= head $ sms $mapconcat$

parMap ((chunksize) . mergeSort . concat)

(map (chunksize)(unshuffle (noPe-1) xs)))

Rita Loogen: Eden – CEFP 2011


ResultingActivity Profile (Processes/Machine View)

Previousresultsforinputsize 1000

Seq. runtime: 0,0037 s

Par. runtime I: 0,9472 s

Par. runtime II: 0,0427 s

  • Input size 1.000, chunksize 200

  • seq. runtime: 0,0037

  • par. runtime: 0,0133 s

  • 8 Pes, 8 processes, 15 threads

  • 56 messages

  • Much better, but still

  • SLOWDOWN

  • parallel runtime w/o Startup and Finish of parallel system:

  • 0,0125-0,009 = 0,0035

  •  increaseinputsize

Rita Loogen: Eden – CEFP 2011


Activity Profile for Input Size 1.000.000

  • Input size 1.000.000

  • Chunksize 1000

  • seq. runtime: 7,287 s

  • par. runtime: 2,795 s

  • 8 Pes, 8 processes, 15 threads

  • 2044 messages

  •  speedupof 2.6 on 8 PE

unshuffle

mapmergesort

merge

Rita Loogen: Eden – CEFP 2011


Further improvement

Idea: Remove inputlistdistributionbylocalsublistselection:

par_ms_c :: (Ord a, Show a, Trans a) =>

Int -> [a] -> [a]

par_ms_csizexs

= head $ sms $mapconcat$

parMap ((chunksize) . mergeSort . concat)

(map (chunksize)(unshuffle (noPe-1) xs)))

par_ms:: (Ord a, Show a, Trans a) =>

Int -> [a] -> [a]

par_ms_bsizexs

= head $ sms $mapconcat$

parMap(\ i -> (chunksize(mergeSort

((unshuffle (noPe-1) xs)!!i))))

[0..noPe-2]

Rita Loogen: Eden – CEFP 2011


CorrespondingActivityProfiles

  • Input size 1.000.000

  • Chunksize 1000

  • seq. runtime: 7,287 s

  • par. runtime: 2,795 s

  • new par. runtime: 2.074 s

  • 8 Pes, 8 processes, 15 threads

  • 1036 messages

  •  speedupof 3.5 on 8 PEs

Rita Loogen: Eden – CEFP 2011


Parallel mapimplementations

Rita Loogen: Eden – CEFP 2011


T

T

T

T

T

T

T

T

...

...

...

...

P

P

P

P

P

P

...

...

PE

PE

PE

PE

Parallel mapimplementations: parMapvsfarm

parMap

farm

farm:: (Trans a, Trans b) =>

([a] -> [[a]]) -> ([[b]] -> [b]) ->

(a -> b)-> [a] -> [b]

farmdistributecombinef xs

= combine (parMap(map f)

(distributexs))

parMap::(Trans a, Trans b) => (a -> b) -> [a] -> [b]

parMapf xs

=spawn(repeat(process f))xs


Processfarms

1 process

per sub-tasklist

withstatic

taskdistribution

farm:: (Trans a, Trans b) =>

([a] -> [[a]]) -> -- distribute

([[b]] -> [b]) -> -- combine

(a->b) -> [a] -> [b]

farmdistributecombine f xs

= combine . (parMap (map f)) . distribute

Choose e.g.

  • distribute = unshufflenoPe / combine = shuffle

  • distribute = splitIntoNnoPe / combine = concat

1 process

per PE

withstatic

taskdistribution

Rita Loogen: Eden – CEFP 2011


Example: Functional Program for Mandelbrot Sets

ul

dimx

Idea: parallel computation of lines

lr

image :: Double -> Complex Double -> Complex Double -> Integer -> String

imagethresholdullrdimx

= header ++ (concat$ map xy2col lines)

where

xy2col ::[Complex Double] -> String

xy2col line = concatMap (rgb.(iterthreshold (0.0 :+ 0.0) 0)) line

(dimy, lines) = coordullrdimx


Example: ParallelFunctional Program for Mandelbrot Sets

ul

dimx

Idea: parallel computation of lines

lr

image :: Double -> Complex Double -> Complex Double -> Integer -> String

imagethresholdullrdimx

= header ++ (concat$ map xy2col lines)

where

xy2col ::[Complex Double] -> String

xy2col line = concatMap (rgb.(iterthreshold (0.0 :+ 0.0) 0)) line

(dimy, lines) = coordullrdimx

Replacemapby

farm(unshufflenoPe) shuffle

orfarmB (splitIntoNnoPe) concat


Mandelbrot

Traces

Problem size: 2000 x 2000

Platform: Beowulf cluster

Heriot-Watt-University,

Edinburgh

(32 Intel P4-SMP nodes @ 3 GHz

512MB RAM, Fast Ethernet)

farm (unshufflenoPe) shuffle

roundrobinstatic

taskdistribution

farm (splitIntoNnoPe) concat

roundrobinstatic

taskdistribution


Camera

2D Image

3D Scene

Example: Ray Tracing

rayTrace:: Size -> CamPos -> [Object] -> [Impact]

rayTracesizecameraPosscene = findImpactsallRaysscene

whereallRays = generateRayssizecameraPos

findImpacts:: [Ray] -> [Object] -> [Impact]

findImpactsraysobjs= map (firstImpactobjs) rays


Reducing Communication CostsbyChunking

Combine chunkingwith parallel map-implementation:

chunkMap :: Int -> (([a] -> [b]) -> ([[a]] -> [[b]]))

-> (a -> b) -> [a] -> [b]

chunkMapsizemapscheme f xs

= concat (mapscheme (map f) (chunksizexs))

Rita Loogen: Eden – CEFP 2011


RaytracerExample:Element-wise Streaming vsChunking

Input size 250

Chunksize 500

Runtime: 0,235 s

8 PEs

9 processes

17 threads

48 conversations

548 messages

Input size 250

Runtime: 6,311 s

8 PEs

9 processes

17 threads

48 conversations

125048 messages

Rita Loogen: Eden – CEFP 2011


Communication vs Parameter Passing

Processinputs- canbecommunicated: f $# inp

- canbepassedasparameter(\ () -> f inp) $# () () isdummyprocessinput

graphofprocessabstraction

graphofinputexpression

#

will beevaluated in parentprocess

byconcurrentthread

andthensenttochildprocess

will bepacked (serialised)

andsentto remote PE

wherechildprocessiscreated

toevaluatethisexpression

Rita Loogen: Eden – CEFP 2011


T

T

T

T

...

...

P

P

P

P

T...T

T...T

...

...

PE

PE

PE

PE

Farm vs Offline Farm

Farm

Offline Farm

offlineFarm:: (Trans a, Trans b) =>

([a] -> [[a]]) -> ([[b]] -> [b]) ->

(a -> b) -> [a] -> [b]

offlineFarmdistributecombinef xs

= combine$

spawn

(map (rfi (map f)) (distributexs) )(repeat ())

rfi :: (a -> b) -> a -> Process () b

rfi h x = process (\ () -> h x)

farm:: (Trans a, Trans b) =>

([a] -> [[a]]) -> ([[b]] -> [b]) ->

(a -> b)-> [a] -> [b]

farmdistributecombinef xs

= combine (parMap(map f)

(distributexs))


RaytracerExample: Farm vs Offline Farm

Input size 250

Chunksize 500

Runtime: 0,235 s

8 PEs

9 processes

17 threads

48 conversations

548 messages

Input size 250

Chunksize 500

Runtime: 0,119 s

8 PEs

9 processes

17 threads

40 conversations

290 messages

Rita Loogen: Eden – CEFP 2011


Eden extendsHaskellwithparallelism

explicit processdefinitionsandimplicitcommunication

controlofprocessgranularity, distributionofwork, andcommunicationtopology

implementedbyextendingthe Glasgow Haskell Compiler (GHC)

toolEdenTVtoanalyse parallel programbehaviour

rulesofthumbforproducingefficient parallel programs

numberofprocesses ~ noPe

reducingcommunication

chunking

offline processes: parameterpassinginsteadofcommunication

parallel mapimplementations

Schematataskdecompositiontaskdistribution

parMapregularstatic:process per task

farmregularstatic:process per processor

offlineFarmregularstatic:taskselectionin processes

workpoolirregulardynamic

Eden: Whatwehaveseen so far


Overview Eden Lectures

  • Lectures I & II (Thursday)

    • Motivation

    • Basic Constructs

    • Case Study: Mergesort

    • Eden TV – The Eden Trace Viewer

    • Reducingcommunicationcosts

    • Parallel mapimplementations

    • Explicit Channel Management

    • The Remote Data Concept

    • Algorithmic Skeletons

      • NestedWorkpools

      • DivideandConquer

  • Lecture III: Lab Session (Friday Morning)

  • Lecture IV: Implementation (FridayAfternoon)

    • LayeredStructure

    • Primitive Operations

    • The Eden Module

      • The Trans class

      • The PA monad

      • Process Handling

      • Remote Data


workpool

merge

worker

nw-1

worker

1

worker

0

Many-to-one Communication: merge

Using non-deterministicmergefunction: merge:: [[a]] -> [a]

Workpoolor

Master/Worker Scheme

masterWorker :: (Trans a, Trans b) => Int -> Int -> (a->b) -> [a] -> [b]

masterWorkernwprefetchf tasks = orderByfromWsreqs

wherefromWs = parMap (map f) toWs

toWs = distributenptasksreqs

reqs = initReqs ++ newReqs

initReqs = concat (replicateprefetch [0..nw-1])

newReqs = merge [[i | r <- rs] | (i,rs) <- zip [0..nw-1] fromWs]


Example: Mandelbrot revisited!

offlineFarm

splitIntoN (noPe-1)

Input size 2000

Runtime: 13,09 s

8 PEs

8 processes

15 threads

35 conversations

1536 messages

Input size 2000

Runtime: 13,91 s

8 PEs

8 processes

22 threads

42 conversations

3044 messages

masterWorker

prefetch 50

Rita Loogen: Eden – CEFP 2011


T

T

T

T

T

T

T

T

...

...

...

...

P

P

P

P

T...T

T...T

P

P

P

P

...

...

PE

PE

...

PE

PE

PE

PE

workpool

T

T

T

merge

w np-1

w 1

w 0

PE

PE

PE

Parallel mapimplementations

parMapfarmofflineFarm

  • statictaskdistribution:

  • dynamictaskdistribution:

increasinggranularity


Explicit Channel Management

Rita Loogen: Eden – CEFP 2011


Explicit Channel Management in Eden

Example: Definition of a process ring

ring :: (Trans i,Transo,Trans r) =>

((i,r) -> (o,r)) -> -- ring processfct

[i] -> [o] -- input-outputfct

ring f is = os

where

(os, ringOuts)

= unzip[process f #inp |

inp<- lazyzipisringIns]

ringIns = rightRotateringOuts

rightRotatexs = last xs : initxs

rightrotate

Problem: onlyindirect ring connections via parentprocess


Explicit Channels in Eden

parfill

nextChan

new

prevchan

p

  • Channel generation

  • Channel usage

plink ::

(Trans i,Trans o, Trans r) =>

((i,r) -> (o,r)) ->

Process (i,ChanName r)

(o,ChanName r)

plink f = processfun_link

where

fun_link (fromP, nextChan)

= new (\ prevChanprev ->

let

(toP, next)

= f (fromP, prev)

in

parfillnextChannext

(toP, prevChan)

)

new :: Trans a =>

(ChanName a -> a -> b) -> b

parfill :: Trans a =>

ChanName a -> a -> b -> b


Ring Definition

with explicit channels

ring :: (Trans i,Transo,Trans r) =>

((i,r) -> (o,r)) -> -- ring processfct

[i] -> [o] -- input-outputfct

ring f is = os

where

(os, ringOuts)

= unzip [f # inp |

inp<- lazyzipisringIns]

ringIns = rightRotateringOuts

rightRotatexs = last xs : initxs

leftrotate

rightrotate

process

plink

Problem: onlyindirect ring connections via parentprocess


Traceprofile RingImplicitvs explicit channels

ring withimplicitchannels -

All communications

gothroughgenerator

process (number1).

Ring with explicit channels –

Ring processescommunicatedirectly.


The Remote Data Concept

Rita Loogen: Eden – CEFP 2011


The „Remote Data“-Concept

  • Functions:

    • Release localdatawithrelease:: a -> RD a

    • Fetchreleaseddatawithfetch :: RD a -> a

  • Replace

    • (process g # (process f # inp))

      with

    • process (g o fetch) # (process (release o f) # inp)

inp

f

g

inp

f

g


Ring Definition with Remote Data

ring :: (Trans i,Transo,Trans r) =>

((i,r) -> (o,r)) -> -- ring processfct

[i] -> [o] -- input-outputfct

ring f is = os

where

(os, ringOuts)

= unzip [processf_RD#inp |

inp<- lazyzipisringIns]

f_RD (i, ringIn) = (o, releaseringOut)

where (o, ringOut) = f (i, fetchringIn)

ringIns = rightRotateringOuts

rightRotatexs = last xs : initxs

type [RD r]


Implementationof Remote Data withdynamicchannels

-- remote data

type RD a = ChanName (ChanName a)

-- convertlocaldataintocorresponding remote data

release :: Trans a ⇒ a → RD a

release x = new (\ cc c → parfill c x cc)

-- convert remote dataintocorrespondinglocaldata

fetch :: Trans a ⇒ RD a → a

fetch cc = new (\ c x → parfill cc c x)


Main Station

1

200

  • 0 200300∞ ∞

  • 0 150 ∞ 400

  • 150 0 50 125

  • ∞ ∞50 0 100

  • ∞4001251000

Elisabeth

church

2

300

150

400

Town Hall

3

125

50

4

Mensa

Computetheshortestway

fromA toB für arbitrarynodes

A and B!

100

5

Old University

Example: Computing ShortestPaths

Map -> Graph -> Adjacencymatrix/Distancematrix

1

2

3

4

5


Warshall‘salgorithmin process ring

  • Ring

rowi

rowk

ring_iterate :: Int -> Int -> Int ->

[Int] -> [[Int]] -> ( [Int],[[Int]])

ring_iteratesize k i rowk (rowi:xs)

| i > size = (rowk, []) -- End ofiterations

| i == k = (rowR, rowk:restoutput) –- send ownrow

| otherwise = (rowR, rowi:restoutput) –- update row

where

(rowR, restoutput) = ring_iteratesize k (i+1) nextrowkxs

nextrowk | i == k = rowk -- no update, ifownrow

| otherwise = updaterowrowkrowi(rowk!!(i-1))

Force evaluationofnextrowkbyinserting

rnfnextrowk `pseq`

beforecallofring_iterate


Tracesof parallel Warshall

sequentialstartupphase

With additional demand on

nextrowk

End of

fast

version


(Advanced) Algorithmic Skeletons

Rita Loogen: Eden – CEFP 2011


Algorithmic Skeletons

  • patternsof parallel computations=> in Eden: parallel higher-order functions

  • typicalpatterns:

    • parallel mapsandmaster-workersystems: parMap, farm, offline_farm, mw (workpoolSorted)

    • map-reduce

    • topologyskeletons: pipeline, ring, torus, grid, trees...

    • divideandconquer

  • in thefollowing:

    • nestedmaster-workersystems

    • divideandconquerschemes

See Eden‘s

Skeleton Library


workpool

merge

worker

np-1

tasks

worker

1

worker

0

results

sub-wp

sub-wp

sub-wp

sub-wp

merge

merge

w{np-1}

w1

w0

merge

merge

w{np-1}

w1

w0

w{np-1}

w{np-1}

w1

w1

w0

w0

Nesting Workpools

sub-wp

sub-wp

sub-wp

sub-wp

sub-wp


workpool

merge

worker

np-1

tasks

worker

1

worker

0

results

sub-wp

sub-wp

sub-wp

sub-wp

merge

merge

w{np-1}

w1

w0

merge

merge

w{np-1}

w1

w0

w{np-1}

w{np-1}

w1

w1

w0

w0

Nesting Workpools

wpNested :: (Trans a, Trans b) =>

[Int] -> [Int] -> -- branchingdegrees/prefetches

-- per level

([a] -> [b]) -> -- workerfunction

[a] -> [b] -- tasks, results

wpNestednspfswf = foldrfldwf (zipnspfs)

where

fld :: (Trans a, Trans b) =>

(Int,Int) -> ([a] -> [b]) -> ([a] -> [b])

fld (n,pf) wf = workpool' n pfwf

sub-wp

sub-wp

sub-wp

sub-wp

sub-wp

wpnested [4,5] [64,8] yields


HierarchicalWorkpool

1 master

4 submasters

20 workers

fasterresultcollection

via hierarchy

-> betteroverallruntime

Mandelbrot Trace

Problem size: 2000 x 2000

Platform: Beowulf clusterHeriot-Watt-University, Edinburgh

(32 Intel P4-SMP nodes @ 3 GHz, 512MB RAM, Fast Ethernet)


Experimental Results

  • Mandelbrot set visualisation

  • . . . for 5000 5000 pixels, calculated line-wise (5000 tasks)

  • Platform: Beowulf cluster Heriot-Watt-University(32 Intel P4-SMP nodes @ 3 GHz, 512MB RAM, Fast Ethernet)

[31] [2,15] [4,7] [6,5] [2,2,4]

logical PEs 32 33 33 37 35


7W

7W

7W

7W

1-Level-Nesting Trace

Garbage

Collection

Prefetch

branching

[4,7]

=> 33 log.PEs

prefetch 40


4W

4W

4W

4W

4W

4W

3-Level-Nesting Trace

branching

[3,2,4]

=> 34 log.PEs

prefetch 60


1

2

1

2

1

3

5

1

4

3

6

2

7

5

8

1

4

3

6

2

7

5

8

Divide-and-conquer

dc :: (a->Bool) -> (a->b) -> (a->[a]) -> ([b]->b) -> a->b

dc trivial solvesplitcombinetask

= if trivial taskthensolvetask

elsecombine (maprec_dc (splittask))

whererec_dc = dc trivial solvesplitcombine

regularbinaryscheme

withdefaultplacing:

1

2

1

2

3

1

3

3

1

4

3

4

2

4

5

3

1

4

3

4

2

4

5


1

2

1

2

1

3

5

1

4

3

6

2

7

5

8

1

4

3

6

2

7

5

8

Explicit Placement via Ticket List

2,

3, 4, 5, 6, 7, 8

1

unshuffle

5, 7

4,

6, 8

3,

2

1

6

8

5

2

7

4

1

3

4

1

5

3

7

2

6

8

4

1

5

3

7

2

6

8


Regular DC-Skeleton with Ticket Placement

dcNTickets :: (Trans a, Trans b) =>

Int -> [Int] -> ... -- branch degree / tickets / ...

dcNTickets k [] trivial solve split combine

= dc trivial solve split combine

dcNTickets k tickets trivial solve split combine x

= if trivial x then solve x

else childRes `pseq` rnf myRes `pseq` -- demand control

combine (myRes:childRes ++ localRess )

wherechildRes = spawnAtchildTicketschildProcsprocIns

childProcs = map (process . rec_dcN) theirTs

rec_dcNts = dcNTickets kts trivial solve split combine

-- ticket distribution

(childTickets, restTickets) = splitAt (k-1) tickets

(myTs: theirTs) = unshuffle k restTickets

-- input splitting

(myIn:theirIn) = split x

(procIns, localIns) = splitAt (length childTickets) theirIn

-- local computations

myRes = dcNTicketsk myTstrivial solve split combine myIn

localRess = map (dc trivial solve split combine) localIns


Regular DC-Skeleton with Ticket Placement

dcNTickets:: (Trans a, Trans b) =>

Int -> [Int] -> ... -- branchdegree / tickets / ...

dcNTickets k [] trivial solvesplitcombine

= dc trivial solvesplitcombine

dcNTickets k tickets trivial solvesplitcombine x

= if trivial x thensolve x

elsechildRes`pseq` rnfmyRes`pseq` -- demandcontrol

combine (myRes:childRes ++ localRess )

wherechildRes = spawnAtchildTicketschildProcsprocIns

childProcs = map (process . rec_dcN) theirTs

rec_dcNts = dcNTickets kts trivial solvesplitcombine

-- ticket distribution

(childTickets, restTickets) = splitAt (k-1) tickets

(myTs: theirTs) = unshuffle k restTickets

-- inputsplitting

(myIn:theirIn) = split x

(procIns,localIns) = splitAt (lengthchildTickets) theirIn

-- localcomputations

myRes= dcNTicketsk myTstrivial solve split combine myIn

localRess = map (dc trivial solvesplitcombine) localIns

  • arbitrary, but fixedbranchingdegree

  • flexible, workswith

    • toofewtickets

    • double tickets

  • parallel unfoldingcontrolledby ticket list


Case Study: Karatsuba

  • multiplicationof large integers

  • fixedbranchingdegree 3

  • complexity O(nlog2 3), combinecomplexity O(n)

  • Platform: LAN (Fast Ethernet),7 dual-corelinuxworkstations, 2 GB RAM

  • inputsize: 2 integerswith 32768 digitseach


Divide-and-ConquerSchemes

  • Distributed expansion

  • Flat expansion


Divide-and-ConquerUsing Master-Worker

divConFlat :: (Trans a,Trans b, Show b, Show a, NFData b) =>

((a->b) -> [a] -> [b])

-> Int -> (a->Bool) -> (a->b) -> (a->[a]) -> ([b]->b) -> a -> b

divConFlatparallelMapSkeldepth trivial solvesplitcombine x

= combineTopMaster (\_ -> combine) levelsresults

where

(tasks,levels) = generateTasksdepth trivial split x

results = parallelMapSkeldcSeqtasks

dcSeq = dc trivial solvesplitcombine


Case Study: Parallel FFT

  • frequency distribution in a signal, decimation in time

  • 4-radix FFT, input size: 410 complex numbers

  • Platform: Beowulf Cluster Edinburgh

Problem:

Communicating

huge data amounts


Using Master-Worker-DC


Intermediate Conclusions

  • Eden enableshigh-level parallel programming

  • Usepredefinedor design ownskeletons

  • Eden‘sskeletonlibraryprovides a large collectionofsophisticatedskeletons:

    • parallel maps: parMap, farm, offlineFarm …

    • master-worker: flat, hierarchical, distributed …

    • divide-and-conquer: ticket placement, via master-worker …

    • topologicalskeletons: ring, torus, all-to-all, parallel transpose …

Rita Loogen: Eden – CEFP 2011


Eden Lab Session

  • Download the exercise sheet from http://www.mathematik.uni-marburg.de/~eden/?content=cefp

  • Chooseoneofthethreeassignmentsanddownloadthecorrespondingsequentialprogram: sumEuler.hs (easy) - juliaSets.hs (medium) - gentleman.hs (advanced)

  • Download the sample mpihostsfileandmodifyittorandomlychosen lab computersnylxywithxychosenfrom 01 upto 64

  • Call edenenvtosetuptheenvironmentfor Eden

  • Compile Eden programswithghc –parmpi --make –O2 –eventlogmyprogram.hs

  • Runcompiledprogramswithmyprogram <parameters> +RTS –ls -Nx-RTS with x=noPe

  • Viewactivityprofile (tracefile) withedentvmyprogram_..._-N…_-RTS.parevents

Rita Loogen: Eden – CEFP 2011


Eden‘sImplementation

Rita Loogen: Eden – CEFP 2011


HaskellEden

core syntax

simplify

Lex/Yacc parser

core to STG

simplify

STG

Front end

code generation

(parallel Eden

runtime system)

abs. syntax

prefix form

C

abstract C

abs. syntax

abs. syntax

renamer

type checker

reader

flattening

C compiler

desugarer

Message

Passing

Library

MPI or PVM

Glasgow Haskell Compiler& Eden Extensions


Eden‘s parallel runtimesystem (PRTS)

Modificationof GUM, the PRTS ofGpH (Glasgow Parallel Haskell):

  • Recycled

    • Thread management: heapobjects, threadscheduler

    • Memory management: localgarbagecollection

    • Communication: graphpackingandunpackingroutines

  • Newlydeveloped

    • Processmanagement:runtimetables, generationandtermination

    • Channel management:channelrepresentation, connection, etc.

  • Simplifications

    • no „virtualsharedmemory“ (global addressspace) necessary

    • noglobalisationofunevaluateddata

    • no global garbagecollectionofdata


BH

.

.

.

.

.

.

.

.

.

DREAM: DistRibuted Eden Abstract Machine

  • abstractviewofEden‘s parallel runtimesystem

  • abstractviewofprocess:

HEAP

BH

out-

ports

in-

ports

BH

Thread representedby TSO (threadstateobject) in theheap

black hole closure, on accessthreadsaresuspendeduntilthisclosureisoverwritten

BH

Rita Loogen: Eden – CEFP 2011


no global addressspace

localheap

inports/outports

noneedfor global garbagecollection

localgarbagecollection

outportsas additional roots

inportscanberecognisedasgarbage

.

.

.

.

.

.

.

.

.

Garbage Collection and Termination

close

BH

BH

out-

ports

BH

BH

in-

ports

.

.

.

BH

BH


Implementationof Eden

Eden

  • Parallel programming on a highlevelofabstraction

    • explicit processdefinitions

    • implicitcommunication

  • Automaticprocessandchannelmanagement

    • Distributed graphreduction

    • Management ofprocessesandtheirinterconnectingchannels

    • Message passing

?

Eden

Runtime System (RTS)


Eden Module

Implementationof Eden

Eden

  • Parallel programming on a highlevelofabstraction

    • explicit processdefinitions

    • implicitcommunication

  • Automaticprocessandchannelmanagement

    • Distributed graphreduction

    • Management ofprocessesandtheirinterconnectingchannels

    • Message passing

Eden

Runtime System (RTS)


Eden programs

Skeleton Library

Eden Module

Primitive Operations

Parallel GHC Runtime System

Layer Structure


Parprim – The Interface to the Parallel RTS

Primitive operationsprovide the basic functionality :

  • channel administration

    • primitive channels (= inports)data ChanName' a = Chan Int# Int# Int#

    • create communication channel(s)createC :: IO ( ChanName' a, a )

    • connect communication channelconnectToPort :: ChanName' a -> IO ()

  • communication

    • send datasendData :: Mode -> a -> IO ()

    • modidata Mode = Connect | Stream | Data | Instantiate Int

  • thread creationfork :: IO () -> IO ()

  • generalnoPE, selfPE :: Int

  • PE ProcessInport


    The Eden Module

    • Type class Trans

    • transmissible data types

    • overloaded communication functions for lists (-> streams): write :: a -> IO () and tuples (-> concurrency): createComm :: IO (ChanName a, a)

    process:: (Trans a, Trans b) => (a -> b)-> Process a b

    ( # ) :: (Trans a, Trans b) => Processa b -> a -> b

    spawn :: (Trans a, Trans b) => [Process a b] -> [a]->[b]

    explicit definitions

    of process, ( # ) and Process

    as well as spawn

    • explicit channels

    • newtype ChanName a

    • = Comm (a -> IO ())


    Type class Trans

    class NFData a => Transa where

    write :: a -> IO ( )

    write x = rnf x `pseq` sendDataData x

    createComm :: (ChanName a, a) createComm = do(cx, x) <- createC

    return (Comm (sendVia cx), x)

    sendVia :: ChanName’ a -> a -> IO ()

    sendViach d = do connectToPortch

    write d


    Tupletransmissionbyconcurrentthreads

    instance (Trans a, Trans b) => Trans (a,b) where

    createComm = do (cx,x) <- createC

    (cy,y) <- createC

    return (Comm (write2 (cx,cy)),

    (x,y))

    write2 :: (Trans a, Trans b) =>

    (ChanName' a, ChanName' b) -> (a,b) -> IO ()

    write2 (c1,c2) (x1,x2) = do fork (sendVia c1 x1)

    sendVia c2 x2

    Rita Loogen: Eden – CEFP 2011


    Stream Transmission of Lists

    instance Trans a => Trans [a] where

    write l@[] = sendData Data l

    write (x:xs) = do (rnf x `pseq` sendData Stream x)

    write xs

    Rita Loogen: Eden – CEFP 2011


    The PA Monad

    Improvingcontrolover parallel activities:

    newtypePA a = PA { fromPA :: IO a }

    instanceMonad PA where

    return b = PA $ return b

    (PA ioX) >>= f = PA $ do x <- ioX

    fromPA $ f x

    runPA :: PA a -> a

    runPA = unsafeperformIO . fromPA

    Rita Loogen: Eden – CEFP 2011


    Remote Process Creation

    data (Trans a, Trans b) => Processa b

    = Proc (ChanName b -> ChanName‘(ChanName a) -> IO ( ) )

    process :: (a -> b) -> Process a b

    process f= Procf_remotewheref_remote(CommsendResult) inCC =do (sendInput, invals) = createComm

    connectToPortinCCsendDataDatasendInputsendResult (f invals)

    channel for

    returning input channel handle(s)

    output channel(s)

    Process Output


    ProcessInstantiation

    ( # ) :: (Trans a, Trans b) => Process a b -> a -> b

    pabs#inps

    = unsafePerformIO $ instantiateAt0pabsinps

    instantiateAt:: (Trans a, Trans b) =>

    Int -> Process a b -> a -> IO b

    instantiateAtpe(Procf_remote) inps

    = do(sendresult, result) <- createComm

    (inCC, CommsendInput) <- createC

    sendData(Instantiatepe)

    (f_remotesendresultinCC)

    fork (sendInputinps)

    returnresult

    output channel(s)

    channel for

    returning input channel handle(s)


    PE 1

    PE 2

    System

    Parent Process

    Child Process

    createinputchannels (inports)

    forchildprocess‘ results

    createinputchannels

    send inputchannels

    toparentprocess

    createinputchannels

    forreceivingchildprocess‘

    inputchannels

    createoutportsandforkthreads

    forevaluatingoutputcomponents

    send instantiatemessage

    withcreatedinputchannels

    H

    E

    A

    P

    evaluateoutputcomponent n

    evaluateoutputcomponent 2

    evaluateoutputcomponent 1

    createoutportsandforkthreads

    forsendinginputstochildprocess

    send resultstoparent

    andterminatethread

    send resultstoparent

    andterminatethread

    send resultstoparent

    andterminatethread

    H

    E

    A

    P

    evaluateinputcomponentsto NF

    continueevaluation

    evaluateinputcomponentsto NF

    evaluateinputcomponent n to NF

    terminateprocessand

    closeinports

    send resultstochild

    andterminatethread

    send resultstochild

    andterminatethread

    send resultstochild

    andterminatethread


    Eden programs

    Skeleton Library

    Eden Module

    Primitive Operations

    Parallel GHC Runtime System

    ConclusionsofLecture 3

    Layeredimplementationof Eden

    • More flexibility

    • Complexityhidden

    • BetterMaintainability

    • Lean interfaceto GHC RTS


    Conclusions

    www.informatik.uni-marburg.de/~eden

    • Eden = Haskell + Coordination

    • Explicit ProcessDefinitions

    • ImplicitCommunication (Message Transfer)

    • Explicit Channel Management -> arbitraryprocesstopologies

    • NondeterministicMerge-> masterworkersystemswithdynamicloadbalancing

    • Remote Data -> pass datadirectlyfromproducertoconsumerprocesses

    • ProgrammingMethodology: UseAlgorithmic Skeletons

    • EdenTVtoanalyse parallel programbehaviour

    • Available on severalplatforms

    Rita Loogen: Eden – CEFP 2011


    More on Eden

    PhD Workshop tomorrow 16:40-17:00

    Bernhard Pickenbrock: Development of a multi-core implementation of Eden

    Rita Loogen: Eden – CEFP 2011


    ad
  • Login