Eden Parallel Functional Programming with Haskell

Rita Loogen

Philipps-Universität Marburg, Germany

Joint work with Yolanda Ortega Mallén, Ricardo Peña, Alberto de la Encina, Mercedes Hidalgo Herrero, Cristóbal Pareja, Fernando Rubio, Lidia Sánchez-Gil, Clara Segura, Pablo Roldán Gómez (Universidad Complutense de Madrid)

Jost Berthold, Silvia Breitinger, Mischa Dieterle, Thomas Horstmeyer, Ulrike Klusik, Oleg Lobachev, Bernhard Pickenbrock, Steffen Priebe, Björn Struckmeier (Philipps-Universität Marburg)

CEFP Budapest 2011


Marburg / Lahn

Rita Loogen: Eden – CEFP 2011


Overview

  • Lectures I & II (Thursday)
    • Motivation
    • Basic Constructs
    • Case Study: Mergesort
    • Eden TV – The Eden Trace Viewer
    • Reducing communication costs
    • Parallel map implementations
    • Explicit Channel Management
    • The Remote Data Concept
    • Algorithmic Skeletons
      • Nested Workpools
      • Divide and Conquer
  • Lecture III: Lab Session (Friday Morning)
  • Lecture IV: Implementation
    • Layered Structure
    • Primitive Operations
    • The Eden Module
      • The Trans class
      • The PA monad
      • Process Handling
      • Remote Data


Materials

  • Lecture Notes
  • Slides
  • Example Programs (Case studies)
  • Exercises

    are provided via the Eden web page
    www.informatik.uni-marburg.de/~eden
    Navigate to CEFP!

Rita Loogen: Eden – CEFP 2011



Motivation

Rita Loogen: Eden – CEFP 2011


Our Goal

Parallel programming at a high level of abstraction

inherent parallelism

  • functional language (e.g. Haskell)

  • => concise programs

  • => high programming efficiency

automatic parallelisation

or annotations


Our Approach

Parallel programming at a high level of abstraction

  • parallelism control
    • explicit processes
    • implicit communication
    • distributed memory

  • functional language (e.g. Haskell)

  • => concise programs

  • => high programming efficiency

Eden = Haskell + Parallelism

www.informatik.uni-marburg.de/~eden



Basic Constructs

Rita Loogen: Eden – CEFP 2011


Eden: parallel programming at a high level of abstraction

  • Eden = Haskell + Coordination
    • process definition
    • process instantiation

process :: (Trans a, Trans b) => (a -> b) -> Process a b

gridProcess = process (\ (fromLeft, fromTop) ->
                  let ... in (toRight, toBottom))

Process outputs are computed by concurrent threads; lists are sent as streams.

( # ) :: (Trans a, Trans b) => Process a b -> a -> b

(outEast, outSouth) = gridProcess # (inWest, inNorth)
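As a minimal usage sketch (my own illustration, not from the slides), a process that doubles every list element can be defined and instantiated like this:

  import Control.Parallel.Eden

  -- a process abstraction mapping (*2) over its input list
  doubler :: Process [Int] [Int]
  doubler = process (map (*2))

  main :: IO ()
  main = print (doubler # [1..10])   -- instantiation: runs in a child process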


Derived Operators and Functions

  • Parallel function application
    • Often, process abstraction and instantiation are used in the following combination
  • Eager process creation
    • Eager creation of a series of processes

($#) :: (Trans a, Trans b) => (a -> b) -> a -> b
f $# x = process f # x       -- ($#) = (#) . process

spawn :: (Trans a, Trans b) => [Process a b] -> [a] -> [b]
spawn = zipWith (#)          -- ignoring demand control

spawnF :: (Trans a, Trans b) => [a -> b] -> [a] -> [b]
spawnF = spawn . (map process)
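A small usage sketch (my own, assuming the definitions above): spawnF eagerly creates one process per function, here three processes working on three input lists:

  results :: [Int]
  results = spawnF [sum, product, maximum] [[1..10], [1..5], [3,1,2]]
  -- the three applications are evaluated in three parallel processes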

Rita Loogen: Eden – CEFP 2011


Evaluating f $# e

[Figure: the graph of the argument expression e is evaluated in the parent process by a new concurrent thread and sent to the child process; the graph of the process abstraction (process f) is evaluated by a new child process on a remote PE. The main process creates the child process and receives the result of f $# e; the child process receives the result of e.]

Rita Loogen: Eden – CEFP 2011


Defining Process Nets – Example: Computing Hamming Numbers

[Figure: process net with processes for map (*2), map (*3), map (*5) and two stream-merge (sm) processes feeding back into hamming, prefixed by 1.]

import Control.Parallel.Eden

hamming :: [Int]
hamming = 1 : sm ((uncurry sm) $# (map (*2) $# hamming,
                                   map (*3) $# hamming))
                 (map (*5) $# hamming)

sm :: [Int] -> [Int] -> [Int]
sm [] ys = ys
sm xs [] = xs
sm (x:xs) (y:ys)
  | x < y     = x : sm xs (y:ys)
  | x == y    = x : sm xs ys
  | otherwise = y : sm (x:xs) ys


Questions about Semantics

  • simple denotational semantics
    • process abstraction -> lambda abstraction
    • process instantiation -> application
    • value/result of program, but no information about execution, parallelism degree, speedups/slowdowns
  • operational
    • When will a process be created? When will a process instantiation be evaluated?
    • To which degree will process in-/outputs be evaluated? Weak head normal form or normal form or ...?
    • When will process in-/outputs be communicated?


Answers

  • When will a process be created? When will a process instantiation be evaluated?
    • Eden: only if and when its result is demanded
    • Lazy evaluation (Haskell): only if and when its result is demanded
  • To which degree will process in-/outputs be evaluated?
    • Eden: normal form
    • Lazy evaluation (Haskell): WHNF (weak head normal form)
  • When will process in-/outputs be communicated?
    • Eden: eager (push) communication – values are communicated as soon as available
    • Lazy evaluation (Haskell): only if demanded – request and answer messages necessary


Lazy Evaluation vs. Parallelism

  • Problem: lazy evaluation ==> distributed sequentiality
  • Eden's approach:
    • eager process creation with spawn
    • eager communication:
      • normal form evaluation of all process outputs (by independent threads)
      • push communication, i.e. values are communicated as soon as available
    • explicit demand control by sequential strategies (module Control.Seq):
      • rnf, rwhnf ... :: Strategy a
      • using :: a -> Strategy a -> a
      • pseq :: a -> b -> b   (module Control.Parallel)
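As a small illustration of such demand control (my own sketch, using the deepseq-style rnf rather than the Control.Seq strategy of the same name): force a value to normal form before passing it on.

  import Control.DeepSeq (rnf)
  import Control.Parallel (pseq)

  -- force the whole list before computing and returning its sum
  forcedSum :: [Int] -> Int
  forcedSum xs = rnf xs `pseq` sum xs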



Case Study: MergeSort

Rita Loogen: Eden – CEFP 2011


Case Study: MergeSort

[Figure: the unsorted list is split into two unsorted sublists, each sublist is sorted recursively, and the sorted sublists are merged into the sorted list.]

Haskell Code:

mergeSort :: (Ord a, Show a) => [a] -> [a]
mergeSort []  = []
mergeSort [x] = [x]
mergeSort xs  = sortMerge (mergeSort xs1) (mergeSort xs2)
  where [xs1,xs2] = unshuffle 2 xs
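The helpers unshuffle and sortMerge are used but not shown here. A plausible sketch consistent with the calls above (my own definitions, not necessarily those of the Eden auxiliary library):

  -- distribute a list round robin into n sublists
  unshuffle :: Int -> [a] -> [[a]]
  unshuffle n xs = [ takeEach n (drop i xs) | i <- [0..n-1] ]
    where takeEach m ys = case ys of
            []     -> []
            (z:zs) -> z : takeEach m (drop (m-1) zs)

  -- merge two sorted lists into one sorted list
  sortMerge :: Ord a => [a] -> [a] -> [a]
  sortMerge [] ys = ys
  sortMerge xs [] = xs
  sortMerge (x:xs) (y:ys)
    | x <= y    = x : sortMerge xs (y:ys)
    | otherwise = y : sortMerge (x:xs) ys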


Example: MergeSort in Parallel

[Figure: as before, but the two recursive sorts run in child processes.]

Eden Code (simplest version):

parMergeSort :: (Ord a, Show a, Trans a) => [a] -> [a]
parMergeSort []  = []
parMergeSort [x] = [x]
parMergeSort xs  = sortMerge (parMergeSort $# xs1) (parMergeSort $# xs2)
  where [xs1,xs2] = unshuffle 2 xs


Example: MergeSort Process Net

[Figure: the main process spawns a binary tree of child processes, one per recursive parMergeSort call.]

Eden Code (simplest version):

parMergeSort :: (Ord a, Show a, Trans a) => [a] -> [a]
parMergeSort []  = []
parMergeSort [x] = [x]
parMergeSort xs  = sortMerge (parMergeSort $# xs1) (parMergeSort $# xs2)
  where [xs1,xs2] = unshuffle 2 xs



EdenTV: The Eden Trace Viewer Tool

Rita Loogen: Eden – CEFP 2011


The Eden System

[Figure: Eden programs run on a parallel runtime system (management of processes and communication); EdenTV visualises the behaviour of the parallel system.]


Compiling, Running, Analysing Eden Programs

Set up the environment for Eden on the lab computers by calling
  edenenv

Compile Eden programs with
  ghc -parmpi --make -O2 -eventlog myprogram.hs    or
  ghc -parpvm --make -O2 -eventlog myprogram.hs

If you use PVM, you first have to start it.
Provide a pvmhosts or mpihosts file.

Run compiled programs with
  myprogram <parameters> +RTS -ls -N<noPe> -RTS

View the activity profile (trace file) with
  edentv myprogram_..._-N4_-RTS.parevents

Rita Loogen: Eden – CEFP 2011


Eden Threads and Processes

  • An Eden process comprises several threads (one per output channel).
  • Thread State Transition Diagram:

[Figure: thread states runnable, running, blocked and finished, with transitions new thread, run thread, suspend thread, block thread, deblock thread, and kill thread leading from each state to finished.]


EdenTV

  - Diagrams: Machines (PEs), Processes, Threads
  - Message overlays: Machines, Processes
  - zooming
  - message streams
  - additional infos
  - ...



EdenTV Demo

Rita Loogen: Eden – CEFP 2011


Case Study: MergeSort continued

Rita Loogen: Eden – CEFP 2011


Example: Activity Profile of Parallel Mergesort

  • Program run, length of input list: 1000
  • Observation: SLOWDOWN
    • Seq. runtime: 0.0037 s
    • Par. runtime: 0.9472 s
  • Reasons:
    • 1999 processes, mostly blocked
    • 31940 messages
    • delayed process creation
    • process placement


How can we improve our parallel mergesort? Here are some rules of thumb.

  • Adapt the total number of processes to the number of available processor elements (PEs), in Eden: noPe :: Int
  • Use the eager process creation functions spawn or spawnF.
  • By default, Eden places processes round robin on the available PEs. Try to distribute processes evenly over the PEs.
  • Avoid element-wise streaming if not necessary, e.g. by putting the list into some "box" or by chunking it into bigger pieces.

THINK PARALLEL!

Rita Loogen: Eden – CEFP 2011


Parallel Mergesort Revisited

[Figure: the unsorted list is split with unshuffle (noPe-1) into unsorted sublists; each sublist is sorted by its own mergesort process, and the sorted sublists are combined by a merge-many-lists step into the sorted list.]

Rita Loogen: Eden – CEFP 2011


A Simple Parallelisation of map

[Figure: each list element xi is mapped to yi = f xi by its own process.]

map :: (a -> b) -> [a] -> [b]
map f xs = [ f x | x <- xs ]

parMap :: (Trans a, Trans b) => (a -> b) -> [a] -> [b]
parMap f = spawn (repeat (process f))

1 process per list element


Alternative Parallelisation of Mergesort – 1st Try

Eden Code:

par_ms :: (Ord a, Show a, Trans a) => [a] -> [a]
par_ms xs = head $ sms $ parMap mergeSort (unshuffle (noPe-1) xs)

sms :: (NFData a, Ord a) => [[a]] -> [[a]]
sms []            = []
sms xss@[xs]      = xss
sms (xs1:xs2:xss) = sms (sortMerge xs1 xs2 : sms xss)

  + total number of processes = noPe
  + eagerly created processes
  + round robin placement leads to 1 process per PE
  - but maybe still too many messages


Resulting Activity Profile (Processes/Machine View)

Previous results for input size 1000:
  Seq. runtime: 0.0037 s
  Par. runtime: 0.9472 s

  • Input size 1000
  • seq. runtime: 0.0037 s
  • par. runtime: 0.0427 s
  • 8 PEs, 8 processes, 15 threads
  • 2042 messages
  • Much better, but still SLOWDOWN
  • Reason: indeed too many messages

Rita Loogen: Eden – CEFP 2011



Reducing Communication Costs

Rita Loogen: Eden – CEFP 2011


Reducing the Number of Messages by Chunking Streams

Split a list (stream) into chunks:

chunk :: Int -> [a] -> [[a]]
chunk size [] = []
chunk size xs = ys : chunk size zs
  where (ys,zs) = splitAt size xs

Combine with the parallel map implementation of mergesort:

par_ms_c :: (Ord a, Show a, Trans a) =>
            Int ->            -- chunk size
            [a] -> [a]
par_ms_c size xs
  = head $ sms $ map concat $
    parMap ((chunk size) . mergeSort . concat)
           (map (chunk size) (unshuffle (noPe-1) xs))
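As a quick sanity check (my own note, not from the slides): chunking is inverse to concatenation, which is what makes the wrapping with chunk/concat above value-preserving.

  -- for any size > 0:  concat (chunk size xs) == xs
  prop_chunk :: Int -> [Int] -> Bool
  prop_chunk size xs = size <= 0 || concat (chunk size xs) == xs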

Rita Loogen: Eden – CEFP 2011


Resulting Activity Profile (Processes/Machine View)

Previous results for input size 1000:
  Seq. runtime: 0.0037 s
  Par. runtime I: 0.9472 s
  Par. runtime II: 0.0427 s

  • Input size 1000, chunk size 200
  • seq. runtime: 0.0037 s
  • par. runtime: 0.0133 s
  • 8 PEs, 8 processes, 15 threads
  • 56 messages
  • Much better, but still SLOWDOWN
  • parallel runtime without startup and finish of the parallel system:
    0.0125 - 0.009 = 0.0035 s
  • => increase the input size

Rita Loogen: Eden – CEFP 2011


Activity Profile for Input Size 1,000,000

  • Input size 1,000,000
  • chunk size 1000
  • seq. runtime: 7.287 s
  • par. runtime: 2.795 s
  • 8 PEs, 8 processes, 15 threads
  • 2044 messages
  • => speedup of 2.6 on 8 PEs

[Trace phases: unshuffle, map mergesort, merge.]

Rita Loogen: Eden – CEFP 2011


Further Improvement

Idea: remove the input list distribution by local sublist selection:

par_ms_c :: (Ord a, Show a, Trans a) => Int -> [a] -> [a]
par_ms_c size xs
  = head $ sms $ map concat $
    parMap ((chunk size) . mergeSort . concat)
           (map (chunk size) (unshuffle (noPe-1) xs))

par_ms_b :: (Ord a, Show a, Trans a) => Int -> [a] -> [a]
par_ms_b size xs
  = head $ sms $ map concat $
    parMap (\ i -> chunk size (mergeSort ((unshuffle (noPe-1) xs) !! i)))
           [0..noPe-2]

Rita Loogen: Eden – CEFP 2011


Corresponding Activity Profiles

  • Input size 1,000,000
  • chunk size 1000
  • seq. runtime: 7.287 s
  • par. runtime: 2.795 s
  • new par. runtime: 2.074 s
  • 8 PEs, 8 processes, 15 threads
  • 1036 messages
  • => speedup of 3.5 on 8 PEs

Rita Loogen: Eden – CEFP 2011


Parallel map Implementations

Rita Loogen: Eden – CEFP 2011


Parallel map Implementations: parMap vs. farm

[Figure: parMap creates one process (P) per task (T); farm creates one process per PE, each handling a sublist of tasks.]

parMap :: (Trans a, Trans b) => (a -> b) -> [a] -> [b]
parMap f xs = spawn (repeat (process f)) xs

farm :: (Trans a, Trans b) =>
        ([a] -> [[a]]) -> ([[b]] -> [b]) ->
        (a -> b) -> [a] -> [b]
farm distribute combine f xs
  = combine (parMap (map f) (distribute xs))


Process Farms

farm :: (Trans a, Trans b) =>
        ([a] -> [[a]]) ->   -- distribute
        ([[b]] -> [b]) ->   -- combine
        (a -> b) -> [a] -> [b]
farm distribute combine f
  = combine . (parMap (map f)) . distribute

Choose e.g.
  • distribute = unshuffle noPe / combine = shuffle
  • distribute = splitIntoN noPe / combine = concat

This gives 1 process per sub-task list, i.e. 1 process per PE, with static task distribution.
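A usage sketch (my own, assuming unshuffle from the case study and a matching shuffle that interleaves the sublists back; shuffle is a hypothetical helper here, the skeleton library may provide its own):

  import Data.List (transpose)

  -- one farm process per PE, round robin task distribution
  farmMap :: (Trans a, Trans b) => (a -> b) -> [a] -> [b]
  farmMap = farm (unshuffle noPe) shuffle

  -- a plausible shuffle, inverse to the round robin unshuffle
  shuffle :: [[a]] -> [a]
  shuffle = concat . transpose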

Rita Loogen: Eden – CEFP 2011


Example: Functional Program for Mandelbrot Sets

Idea: parallel computation of lines. The image area is given by the upper-left corner ul, the lower-right corner lr, and the horizontal dimension dimx.

image :: Double -> Complex Double -> Complex Double -> Integer -> String
image threshold ul lr dimx
  = header ++ (concat $ map xy2col lines)
  where
    xy2col :: [Complex Double] -> String
    xy2col line = concatMap (rgb . (iter threshold (0.0 :+ 0.0) 0)) line
    (dimy, lines) = coord ul lr dimx


Example: Parallel Functional Program for Mandelbrot Sets

Idea: parallel computation of lines. In the program above, replace map by

  farm (unshuffle noPe) shuffle     or
  farm (splitIntoN noPe) concat


Mandelbrot Traces

Problem size: 2000 x 2000
Platform: Beowulf cluster, Heriot-Watt University, Edinburgh
(32 Intel P4-SMP nodes @ 3 GHz, 512 MB RAM, Fast Ethernet)

[Traces: farm (unshuffle noPe) shuffle and farm (splitIntoN noPe) concat, both with static task distribution.]


Example: Ray Tracing

[Figure: a camera projects a 3D scene onto a 2D image.]

rayTrace :: Size -> CamPos -> [Object] -> [Impact]
rayTrace size cameraPos scene = findImpacts allRays scene
  where allRays = generateRays size cameraPos

findImpacts :: [Ray] -> [Object] -> [Impact]
findImpacts rays objs = map (firstImpact objs) rays


Reducing Communication Costs by Chunking

Combine chunking with a parallel map implementation:

chunkMap :: Int -> (([a] -> [b]) -> ([[a]] -> [[b]]))
         -> (a -> b) -> [a] -> [b]
chunkMap size mapscheme f xs
  = concat (mapscheme (map f) (chunk size xs))
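A usage sketch (my own, built from the farm and helpers introduced above): instantiating the map scheme with a farm gives a chunked farm in which each PE receives its tasks in chunks of the given size.

  -- chunked farm: combine chunking with the farm skeleton
  chunkedFarmMap :: (Trans a, Trans b) => Int -> (a -> b) -> [a] -> [b]
  chunkedFarmMap size = chunkMap size (farm (unshuffle noPe) shuffle)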

Rita Loogen: Eden – CEFP 2011


Raytracer Example: Element-wise Streaming vs. Chunking

With chunking (input size 250, chunk size 500):
  Runtime: 0.235 s
  8 PEs, 9 processes, 17 threads, 48 conversations, 548 messages

With element-wise streaming (input size 250):
  Runtime: 6.311 s
  8 PEs, 9 processes, 17 threads, 48 conversations, 125048 messages

Rita Loogen: Eden – CEFP 2011


Communication vs. Parameter Passing

Process inputs
  - can be communicated:          f $# inp
  - can be passed as parameter:   (\ () -> f inp) $# ()      -- () is a dummy process input

[Figure: in the first case, the graph of the input expression is evaluated in the parent process by a concurrent thread and then sent to the child process; in the second case, it is packed (serialised) and sent to the remote PE, where the child process is created to evaluate this expression.]

Rita Loogen: Eden – CEFP 2011


Farm vs. Offline Farm

[Figure: in a farm, the tasks are sent to the processes; in an offline farm, each process receives only a dummy input () and selects its tasks itself.]

farm :: (Trans a, Trans b) =>
        ([a] -> [[a]]) -> ([[b]] -> [b]) ->
        (a -> b) -> [a] -> [b]
farm distribute combine f xs
  = combine (parMap (map f) (distribute xs))

offlineFarm :: (Trans a, Trans b) =>
               ([a] -> [[a]]) -> ([[b]] -> [b]) ->
               (a -> b) -> [a] -> [b]
offlineFarm distribute combine f xs
  = combine $
    spawn (map (rfi (map f)) (distribute xs)) (repeat ())

rfi :: (a -> b) -> a -> Process () b
rfi h x = process (\ () -> h x)


Raytracer Example: Farm vs. Offline Farm

Farm (input size 250, chunk size 500):
  Runtime: 0.235 s
  8 PEs, 9 processes, 17 threads, 48 conversations, 548 messages

Offline farm (input size 250, chunk size 500):
  Runtime: 0.119 s
  8 PEs, 9 processes, 17 threads, 40 conversations, 290 messages

Rita Loogen: Eden – CEFP 2011


Eden: What We Have Seen So Far

  • Eden extends Haskell with parallelism
  • explicit process definitions and implicit communication
  • control of process granularity, distribution of work, and communication topology
  • implemented by extending the Glasgow Haskell Compiler (GHC)
  • tool EdenTV to analyse parallel program behaviour
  • rules of thumb for producing efficient parallel programs
    • number of processes ~ noPe
    • reducing communication: chunking
    • offline processes: parameter passing instead of communication
  • parallel map implementations

    Schema       | Task decomposition | Task distribution
    parMap       | regular            | static: process per task
    farm         | regular            | static: process per processor
    offlineFarm  | regular            | static: task selection in processes
    workpool     | irregular          | dynamic


Overview Eden Lectures

  • Lectures I & II (Thursday)
    • Motivation
    • Basic Constructs
    • Case Study: Mergesort
    • Eden TV – The Eden Trace Viewer
    • Reducing communication costs
    • Parallel map implementations
    • Explicit Channel Management
    • The Remote Data Concept
    • Algorithmic Skeletons
      • Nested Workpools
      • Divide and Conquer
  • Lecture III: Lab Session (Friday Morning)
  • Lecture IV: Implementation (Friday Afternoon)
    • Layered Structure
    • Primitive Operations
    • The Eden Module
      • The Trans class
      • The PA monad
      • Process Handling
      • Remote Data


Many-to-one Communication: merge

[Figure: a workpool/master-worker scheme; the master merges the result and request streams of workers 0 .. nw-1.]

Using the non-deterministic merge function: merge :: [[a]] -> [a]

Workpool or Master/Worker Scheme:

masterWorker :: (Trans a, Trans b) => Int -> Int -> (a -> b) -> [a] -> [b]
masterWorker nw prefetch f tasks = orderBy fromWs reqs
  where fromWs   = parMap (map f) toWs
        toWs     = distribute nw tasks reqs
        reqs     = initReqs ++ newReqs
        initReqs = concat (replicate prefetch [0..nw-1])
        newReqs  = merge [ [ i | r <- rs ] | (i,rs) <- zip [0..nw-1] fromWs ]
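A usage sketch (my own, not from the slides): a map with dynamic load balancing for irregular tasks, using 8 workers and a prefetch of 2 tasks per worker.

  -- 8 workers, prefetch 2: tasks are handed out on demand
  irregularMap :: (Trans a, Trans b) => (a -> b) -> [a] -> [b]
  irregularMap = masterWorker 8 2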


Example: Mandelbrot Revisited

offlineFarm with splitIntoN (noPe-1):
  Input size 2000
  Runtime: 13.09 s
  8 PEs, 8 processes, 15 threads, 35 conversations, 1536 messages

masterWorker with prefetch 50:
  Input size 2000
  Runtime: 13.91 s
  8 PEs, 8 processes, 22 threads, 42 conversations, 3044 messages

Rita Loogen: Eden – CEFP 2011


Parallel map Implementations

[Figure: parMap (one process per task), farm and offlineFarm (one process per PE), and the workpool/master-worker scheme.]

  • static task distribution: parMap, farm, offlineFarm (with increasing granularity)
  • dynamic task distribution: workpool (master-worker)



Explicit Channel Management

Rita Loogen: Eden – CEFP 2011


Explicit Channel Management in Eden

Example: Definition of a process ring

ring :: (Trans i, Trans o, Trans r) =>
        ((i,r) -> (o,r)) ->   -- ring process function
        [i] -> [o]            -- input/output function
ring f is = os
  where
    (os, ringOuts) = unzip [ process f # inp | inp <- lazyzip is ringIns ]
    ringIns        = rightRotate ringOuts
    rightRotate xs = last xs : init xs

[Figure: the ring outputs are rotated to the right to form the ring inputs.]

Problem: only indirect ring connections via the parent process


Explicit Channels in Eden

  • Channel generation:   new :: Trans a => (ChanName a -> a -> b) -> b
  • Channel usage:        parfill :: Trans a => ChanName a -> a -> b -> b

[Figure: a ring process p receives the previous value via a freshly created channel prevChan and writes its next value into nextChan with parfill.]

plink :: (Trans i, Trans o, Trans r) =>
         ((i,r) -> (o,r)) ->
         Process (i, ChanName r) (o, ChanName r)
plink f = process fun_link
  where
    fun_link (fromP, nextChan)
      = new (\ prevChan prev ->
          let (toP, next) = f (fromP, prev)
          in  parfill nextChan next (toP, prevChan))


Ring Definition with Explicit Channels

ring :: (Trans i, Trans o, Trans r) =>
        ((i,r) -> (o,r)) ->   -- ring process function
        [i] -> [o]            -- input/output function
ring f is = os
  where
    (os, ringOuts) = unzip [ plink f # inp | inp <- lazyzip is ringIns ]
    ringIns        = rightRotate ringOuts
    rightRotate xs = last xs : init xs

[Figure: process f is replaced by plink f; labels left rotate / right rotate mark the rotation of the channel list.]

With plink, ring processes communicate directly instead of only indirectly via the parent process.


Trace Profile: Ring with Implicit vs. Explicit Channels

Ring with implicit channels: all communications go through the generator process (number 1).
Ring with explicit channels: ring processes communicate directly.



The Remote Data Concept

Rita Loogen: Eden – CEFP 2011


The "Remote Data" Concept

  • Functions:
    • Release local data with    release :: a -> RD a
    • Fetch released data with   fetch :: RD a -> a
  • Replace
      (process g # (process f # inp))
    with
      process (g . fetch) # (process (release . f) # inp)

[Figure: with remote data, the result of f flows directly from the producer process to the consumer process g instead of through the parent.]


Ring Definition with Remote Data

ring :: (Trans i, Trans o, Trans r) =>
        ((i,r) -> (o,r)) ->   -- ring process function
        [i] -> [o]            -- input/output function
ring f is = os
  where
    (os, ringOuts) = unzip [ process f_RD # inp | inp <- lazyzip is ringIns ]
    f_RD (i, ringIn) = (o, release ringOut)
      where (o, ringOut) = f (i, fetch ringIn)
    ringIns        = rightRotate ringOuts
    rightRotate xs = last xs : init xs

(The ring connections now have type [RD r].)


Implementation of Remote Data with Dynamic Channels

-- remote data
type RD a = ChanName (ChanName a)

-- convert local data into corresponding remote data
release :: Trans a => a -> RD a
release x = new (\ cc c -> parfill c x cc)

-- convert remote data into corresponding local data
fetch :: Trans a => RD a -> a
fetch cc = new (\ c x -> parfill cc c x)
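A minimal usage sketch (my own): chaining two processes so that the intermediate result travels directly between them, mirroring the replacement shown above but written with the $# shorthand; f and g are placeholder functions.

  -- producer releases its result, consumer fetches it directly
  pipeRD f g inp = (g . fetch) $# ((release . f) $# inp)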


Example: Computing Shortest Paths

Compute the shortest way from A to B for arbitrary nodes A and B!

Map -> Graph -> Adjacency matrix / Distance matrix

[Figure: a map with five nodes: 1 Main Station, 2 Elisabeth Church, 3 Town Hall, 4 Mensa, 5 Old University, and edge weights 1-2: 200, 1-3: 300, 2-3: 150, 2-5: 400, 3-4: 50, 3-5: 125, 4-5: 100.]

Distance matrix:
    0   200  300    ∞    ∞
  200     0  150    ∞  400
  300   150    0   50  125
    ∞     ∞   50    0  100
    ∞   400  125  100    0


Warshall's Algorithm in a Process Ring

[Figure: a ring of processes; process k owns row k and the rows i are passed around the ring.]

ring_iterate :: Int -> Int -> Int ->
                [Int] -> [[Int]] -> ([Int], [[Int]])
ring_iterate size k i rowk (rowi:xs)
  | i > size  = (rowk, [])                  -- end of iterations
  | i == k    = (rowR, rowk : restoutput)   -- send own row
  | otherwise = (rowR, rowi : restoutput)   -- update row
  where
    (rowR, restoutput) = ring_iterate size k (i+1) nextrowk xs
    nextrowk | i == k    = rowk             -- no update, if own row
             | otherwise = updaterow rowk rowi (rowk !! (i-1))

Force the evaluation of nextrowk by inserting
  rnf nextrowk `pseq`
before the call of ring_iterate.


Traces of Parallel Warshall

[Trace annotations: sequential startup phase; with additional demand on nextrowk; end of the fast version.]



(Advanced) Algorithmic Skeletons

Rita Loogen: Eden – CEFP 2011


Algorithmic Skeletons

  • patterns of parallel computations => in Eden: parallel higher-order functions
  • typical patterns:
    • parallel maps and master-worker systems: parMap, farm, offline_farm, mw (workpoolSorted)
    • map-reduce
    • topology skeletons: pipeline, ring, torus, grid, trees ...
    • divide and conquer
  • in the following:
    • nested master-worker systems
    • divide-and-conquer schemes

See Eden's Skeleton Library.


Nesting Workpools

[Figure: a top-level workpool whose workers are themselves sub-workpools (sub-wp), each again merging the results of its workers w0 .. w{np-1}.]


Nesting Workpools

wpNested :: (Trans a, Trans b) =>
            [Int] -> [Int] ->   -- branching degrees / prefetches per level
            ([a] -> [b]) ->     -- worker function
            [a] -> [b]          -- tasks, results
wpNested ns pfs wf = foldr fld wf (zip ns pfs)
  where
    fld :: (Trans a, Trans b) =>
           (Int,Int) -> ([a] -> [b]) -> ([a] -> [b])
    fld (n,pf) wf = workpool' n pf wf

[Figure: wpNested [4,5] [64,8] yields a two-level hierarchy with 4 sub-workpools of 5 workers each.]


Hierarchical Workpool

  • 1 master
  • 4 submasters
  • 20 workers
  • faster result collection via the hierarchy -> better overall runtime

Mandelbrot Trace
Problem size: 2000 x 2000
Platform: Beowulf cluster, Heriot-Watt University, Edinburgh
(32 Intel P4-SMP nodes @ 3 GHz, 512 MB RAM, Fast Ethernet)


Experimental Results

  • Mandelbrot set visualisation
  • ... for 5000 x 5000 pixels, calculated line-wise (5000 tasks)
  • Platform: Beowulf cluster, Heriot-Watt University (32 Intel P4-SMP nodes @ 3 GHz, 512 MB RAM, Fast Ethernet)

    Branching   | [31] | [2,15] | [4,7] | [6,5] | [2,2,4]
    logical PEs |  32  |   33   |  33   |  37   |   35


1-Level Nesting Trace

[Trace: branching [4,7] => 33 logical PEs, prefetch 40; four sub-workpools with 7 workers each; garbage collection pauses visible.]


3-Level Nesting Trace

[Trace: branching [3,2,4] => 34 logical PEs, prefetch 60; sub-workpools with 4 workers each.]


Divide and Conquer

dc :: (a -> Bool) -> (a -> b) -> (a -> [a]) -> ([b] -> b) -> a -> b
dc trivial solve split combine task
  = if trivial task then solve task
    else combine (map rec_dc (split task))
  where rec_dc = dc trivial solve split combine

[Figure: a regular binary divide-and-conquer call tree; node labels show the PE on which each recursive call is placed with the default round robin placement.]
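As a usage sketch (my own), the sequential mergeSort from the case study fits this scheme directly:

  mergeSortDC :: (Ord a, Show a) => [a] -> [a]
  mergeSortDC = dc trivial solve split combine
    where trivial xs      = length xs <= 1   -- lists of length <= 1 are already sorted
          solve           = id
          split           = unshuffle 2
          combine [ys,zs] = sortMerge ys zs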


Explicit Placement via Ticket List

[Figure: the list of available PEs (tickets) 2, 3, 4, ..., 8 is distributed with unshuffle along the call tree, so that each recursive call is explicitly placed on its own PE.]


Regular DC-Skeleton with Ticket Placement

dcNTickets :: (Trans a, Trans b) =>
              Int -> [Int] -> ...   -- branch degree / tickets / ...
dcNTickets k [] trivial solve split combine
  = dc trivial solve split combine
dcNTickets k tickets trivial solve split combine x
  = if trivial x then solve x
    else childRes `pseq` rnf myRes `pseq`   -- demand control
         combine (myRes : childRes ++ localRess)
  where
    childRes   = spawnAt childTickets childProcs procIns
    childProcs = map (process . rec_dcN) theirTs
    rec_dcN ts = dcNTickets k ts trivial solve split combine
    -- ticket distribution
    (childTickets, restTickets) = splitAt (k-1) tickets
    (myTs : theirTs) = unshuffle k restTickets
    -- input splitting
    (myIn : theirIn) = split x
    (procIns, localIns) = splitAt (length childTickets) theirIn
    -- local computations
    myRes     = dcNTickets k myTs trivial solve split combine myIn
    localRess = map (dc trivial solve split combine) localIns
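A usage sketch (my own; the elided "..." in the signature is assumed to follow the parameter order of the defining equations): running the mergesort divide-and-conquer with binary branching, placing child processes on the remaining PEs.

  parMergeSortDC :: (Ord a, Show a, Trans a) => [a] -> [a]
  parMergeSortDC = dcNTickets 2 [2..noPe] trivial solve split combine
    where trivial xs      = length xs <= 1
          solve           = id
          split           = unshuffle 2
          combine [ys,zs] = sortMerge ys zs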


Regular DC-Skeleton with Ticket Placement (properties)

  • arbitrary, but fixed branching degree
  • flexible, works with
    • too few tickets
    • double tickets
  • parallel unfolding controlled by the ticket list


Case Study: Karatsuba

  • multiplication of large integers
  • fixed branching degree 3
  • complexity O(n^(log2 3)), combine complexity O(n)
  • Platform: LAN (Fast Ethernet), 7 dual-core Linux workstations, 2 GB RAM
  • input size: 2 integers with 32768 digits each


Divide-and-Conquer Schemes

  • Distributed expansion
  • Flat expansion


Divide-and-Conquer Using Master-Worker

divConFlat :: (Trans a, Trans b, Show b, Show a, NFData b) =>
              ((a -> b) -> [a] -> [b])
           -> Int -> (a -> Bool) -> (a -> b) -> (a -> [a]) -> ([b] -> b) -> a -> b
divConFlat parallelMapSkel depth trivial solve split combine x
  = combineTopMaster (\_ -> combine) levels results
  where
    (tasks, levels) = generateTasks depth trivial split x
    results = parallelMapSkel dcSeq tasks
    dcSeq   = dc trivial solve split combine


Case Study: Parallel FFT

  • frequency distribution in a signal, decimation in time
  • 4-radix FFT, input size: 4^10 complex numbers
  • Platform: Beowulf Cluster Edinburgh

Problem: communicating huge data amounts



Using Master-Worker-DC


Intermediate Conclusions

  • Eden enables high-level parallel programming
  • Use predefined skeletons or design your own
  • Eden's skeleton library provides a large collection of sophisticated skeletons:
    • parallel maps: parMap, farm, offlineFarm ...
    • master-worker: flat, hierarchical, distributed ...
    • divide-and-conquer: ticket placement, via master-worker ...
    • topological skeletons: ring, torus, all-to-all, parallel transpose ...

Rita Loogen: Eden – CEFP 2011


Eden Lab Session

  • Download the exercise sheet from http://www.mathematik.uni-marburg.de/~eden/?content=cefp
  • Choose one of the three assignments and download the corresponding sequential program: sumEuler.hs (easy) - juliaSets.hs (medium) - gentleman.hs (advanced)
  • Download the sample mpihosts file and modify it to randomly chosen lab computers nylxy with xy chosen from 01 up to 64
  • Call edenenv to set up the environment for Eden
  • Compile Eden programs with ghc -parmpi --make -O2 -eventlog myprogram.hs
  • Run compiled programs with myprogram <parameters> +RTS -ls -Nx -RTS with x = noPe
  • View the activity profile (trace file) with edentv myprogram_..._-N..._-RTS.parevents

Rita Loogen: Eden – CEFP 2011


Eden's Implementation

Rita Loogen: Eden – CEFP 2011


Glasgow Haskell Compiler & Eden Extensions

[Figure: the GHC compilation pipeline (Lex/Yacc parser, reader, renamer, type checker, desugarer, Core, simplify, Core to STG, STG, code generation, abstract C, C compiler), with the Haskell front end extended for Eden and the generated code linked against the parallel Eden runtime system on top of a message passing library (MPI or PVM).]


Eden's Parallel Runtime System (PRTS)

Modification of GUM, the PRTS of GpH (Glasgow Parallel Haskell):

  • Recycled
    • Thread management: heap objects, thread scheduler
    • Memory management: local garbage collection
    • Communication: graph packing and unpacking routines
  • Newly developed
    • Process management: runtime tables, generation and termination
    • Channel management: channel representation, connection, etc.
  • Simplifications
    • no "virtual shared memory" (global address space) necessary
    • no globalisation of unevaluated data
    • no global garbage collection of data


DREAM: DistRibuted Eden Abstract Machine

  • abstract view of Eden's parallel runtime system
  • abstract view of a process:

[Figure: a process consists of a heap with inports and outports; each thread is represented by a TSO (thread state object) in the heap; a black hole (BH) closure suspends accessing threads until the closure is overwritten.]

Rita Loogen: Eden – CEFP 2011


Garbage Collection and Termination

  • no global address space
    • local heaps, inports/outports
  • no need for global garbage collection
    • local garbage collection
    • outports serve as additional roots
    • inports can be recognised as garbage

[Figure: a process heap with inports, outports and black holes; inports that have become garbage are closed.]


Implementation of Eden

Eden
  • Parallel programming at a high level of abstraction
    • explicit process definitions
    • implicit communication
  • Automatic process and channel management
    • Distributed graph reduction
    • Management of processes and their interconnecting channels
    • Message passing

[Figure: a question mark between the Eden level and the runtime system (RTS).]


Implementation of Eden (continued)

[Figure: the same picture, with the Eden Module filling the gap between the Eden level and the runtime system (RTS).]


Layer Structure

  Eden programs
  Skeleton Library
  Eden Module
  Primitive Operations
  Parallel GHC Runtime System


ParPrim – The Interface to the Parallel RTS

Primitive operations provide the basic functionality:

  • channel administration
    • primitive channels (= inports):    data ChanName' a = Chan Int# Int# Int#   -- PE, process, inport
    • create communication channel(s):   createC :: IO (ChanName' a, a)
    • connect a communication channel:   connectToPort :: ChanName' a -> IO ()
  • communication
    • send data:   sendData :: Mode -> a -> IO ()
    • modes:       data Mode = Connect | Stream | Data | Instantiate Int
  • thread creation: fork :: IO () -> IO ()
  • general: noPE, selfPE :: Int


The Eden Module

  • Type class Trans:
    • transmissible data types
    • overloaded communication functions for lists (-> streams): write :: a -> IO ()
      and for tuples (-> concurrency): createComm :: IO (ChanName a, a)
  • explicit definitions of process, ( # ) and Process, as well as spawn:

    process :: (Trans a, Trans b) => (a -> b) -> Process a b
    ( # )   :: (Trans a, Trans b) => Process a b -> a -> b
    spawn   :: (Trans a, Trans b) => [Process a b] -> [a] -> [b]

  • explicit channels:
    newtype ChanName a = Comm (a -> IO ())


Type Class Trans

class NFData a => Trans a where
  write :: a -> IO ()
  write x = rnf x `pseq` sendData Data x

  createComm :: IO (ChanName a, a)
  createComm = do (cx, x) <- createC
                  return (Comm (sendVia cx), x)

sendVia :: ChanName' a -> a -> IO ()
sendVia ch d = do connectToPort ch
                  write d


Tuple Transmission by Concurrent Threads

instance (Trans a, Trans b) => Trans (a,b) where
  createComm = do (cx,x) <- createC
                  (cy,y) <- createC
                  return (Comm (write2 (cx,cy)), (x,y))

write2 :: (Trans a, Trans b) =>
          (ChanName' a, ChanName' b) -> (a,b) -> IO ()
write2 (c1,c2) (x1,x2) = do fork (sendVia c1 x1)
                            sendVia c2 x2

    Rita Loogen: Eden – CEFP 2011


Stream Transmission of Lists

instance Trans a => Trans [a] where
  write l@[]   = sendData Data l
  write (x:xs) = do rnf x `pseq` sendData Stream x
                    write xs

    Rita Loogen: Eden – CEFP 2011


The PA Monad

Improving control over parallel activities:

newtype PA a = PA { fromPA :: IO a }

instance Monad PA where
  return b       = PA $ return b
  (PA ioX) >>= f = PA $ do x <- ioX
                           fromPA $ f x

runPA :: PA a -> a
runPA = unsafePerformIO . fromPA

    Rita Loogen: Eden – CEFP 2011


Remote Process Creation

data (Trans a, Trans b) => Process a b
  = Proc (ChanName b ->               -- output channel(s)
          ChanName' (ChanName a) ->   -- channel for returning the input channel handle(s)
          IO ())

process :: (a -> b) -> Process a b
process f = Proc f_remote
  where
    f_remote (Comm sendResult) inCC
      = do (sendInput, invals) <- createComm
           connectToPort inCC
           sendData Data sendInput
           sendResult (f invals)      -- process output


Process Instantiation

( # ) :: (Trans a, Trans b) => Process a b -> a -> b
pabs # inps
  = unsafePerformIO $ instantiateAt 0 pabs inps

instantiateAt :: (Trans a, Trans b) =>
                 Int -> Process a b -> a -> IO b
instantiateAt pe (Proc f_remote) inps
  = do (sendresult, result)   <- createComm   -- output channel(s)
       (inCC, Comm sendInput) <- createC      -- channel for returning the input channel handle(s)
       sendData (Instantiate pe) (f_remote sendresult inCC)
       fork (sendInput inps)
       return result


[Figure: process instantiation protocol between a parent process on PE 1 and a child process on PE 2. The parent creates input channels (inports) for the child process' results and for receiving the child process' input channels, then sends an instantiate message with the created input channels. The child creates its own input channels, sends them to the parent, and creates outports and forks threads that evaluate its output components to normal form and send the results to the parent, terminating each thread afterwards. The parent creates outports and forks threads that evaluate the input components to normal form and send them to the child, then continues its evaluation. Finally the child process terminates and closes its inports.]


Conclusions of Lecture 3

Layered implementation of Eden:
  Eden programs
  Skeleton Library
  Eden Module
  Primitive Operations
  Parallel GHC Runtime System

  • More flexibility
  • Complexity hidden
  • Better maintainability
  • Lean interface to the GHC RTS


Conclusions

www.informatik.uni-marburg.de/~eden

  • Eden = Haskell + Coordination
    • Explicit process definitions
    • Implicit communication (message transfer)
  • Explicit channel management -> arbitrary process topologies
  • Nondeterministic merge -> master-worker systems with dynamic load balancing
  • Remote data -> pass data directly from producer to consumer processes
  • Programming methodology: use algorithmic skeletons
  • EdenTV to analyse parallel program behaviour
  • Available on several platforms

    Rita Loogen: Eden – CEFP 2011



    More on Eden

    PhD Workshop tomorrow 16:40-17:00

    Bernhard Pickenbrock: Development of a multi-core implementation of Eden

    Rita Loogen: Eden – CEFP 2011

