1 / 106

Eden Parallel Functional Programming with Haskell

Eden Parallel Functional Programming with Haskell. Rita Loogen Philipps-Universität Marburg, Germany

pearly
Download Presentation

Eden Parallel Functional Programming with Haskell

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Eden Parallel FunctionalProgrammingwithHaskell Rita Loogen Philipps-Universität Marburg, Germany Joint workwithYolanda Ortega Mallén, Ricardo Peña Alberto de la Encina, Mercedes Hildalgo Herrero, ChristóbalPareja, Fernando Rubio, Lidia Sánchez-Gil, Clara Segura, Pablo Roldan Gomez (UniversidadComplutense de Madrid) Jost Berthold, Silvia Breitinger, Mischa Dieterle, Thomas Horstmeyer, Ulrike Klusik, Oleg Lobachev, Bernhard Pickenbrock, Steffen Priebe, Björn Struckmeier(Philipps-Universität Marburg) CEFP Budapest 2011

  2. Marburg /Lahn Rita Loogen: Eden – CEFP 2011

  3. Overview • Lectures I & II (Thursday) • Motivation • Basic Constructs • Case Study: Mergesort • Eden TV – The Eden Trace Viewer • Reducingcommunicationcosts • Parallel mapimplementations • Explicit Channel Management • The Remote Data Concept • Algorithmic Skeletons • NestedWorkpools • DivideandConquer • Lecture III: Lab Session (Friday Morning) • Lecture IV: Implementation • LayeredStructure • Primitive Operations • The Eden Module • The Trans class • The PA monad • Process Handling • Remote Data

  4. Materials Materials • Lecture Notes • Slides • Example Programs (Case studies) • Exercises areprovided via the Eden web page www.informatik.uni-marburg.de/~eden Navigateto CEFP! Rita Loogen: Eden – CEFP 2011

  5. Motivation Rita Loogen: Eden – CEFP 2011

  6. Our Goal Parallel programmingat a highlevelofabstraction inherent parallelism • functional language (e.g. Haskell) • => concise programs • => high programming efficiency automatic parallelisation or annotations

  7. Our Approach Parallel programmingat a highlevelofabstraction l l l l + • parallelismcontrol • explicit processes • implicitcommunication • distributedmemory • … • functional language (e.g. Haskell) • => concise programs • => high programming efficiency Eden = Haskell + Parallelism www.informatik.uni-marburg.de/~eden

  8. Basic Constructs Rita Loogen: Eden – CEFP 2011

  9. Eden parallel programming at a high level of abstraction • = Haskell + Coordination • processdefinition • processinstantiation process:: (Trans a, Trans b) => (a -> b) -> Process a b gridProcess = process (\ (fromLeft,fromTop) -> let ... in (toRight, toBottom)) processoutputs computedby concurrentthreads, listssentasstreams ( # ) :: (Trans a, Trans b) => Process a b -> a ->b (outEast, outSouth) = gridProcess# (inWest,inNorth)

  10. Derivedoperatorsandfunctions • Parallel functionapplication • Often, processabstractionandinstantiationareused in thefollowingcombination • Eagerprocesscreation • Eagercreationof a seriesofprocesses ($#) :: (Trans a, Trans b) => (a -> b) -> a -> b f $# x = process f # x -- ($#) = (#) . process spawn :: (Trans a, Trans b) => [Processa b] -> [a] -> [b] spawn= zipWith (#) -- ignoringdemandcontrol spawnF :: (Trans a, Trans b) => [a -> b] -> [a] -> [b] spawnF = spawn . (mapprocess) Rita Loogen: Eden – CEFP 2011

  11. Evaluating f $# e graphofprocessabstraction process f graphofargumentexpressione # will beevaluated in parentprocess bynewconcurrentthread andsenttochildprocess will beevaluated bynewchildprocess on remote PE 11 resultof f $ e mainprocess creates childprocess resultof e Rita Loogen: Eden – CEFP 2011

  12. *2 *3 *5 sm sm DefiningprocessnetsExample: Computing Hammingnumbers importControl.Parallel.Eden hamming :: [Int] hamming = 1: sm ((uncurrysm) $# (map (*2)$#hamming, map (*3)$#hamming)) (map (*5)$# hamming) sm :: [Int] -> [Int] -> [Int] sm [] ys = ys smxs [] = xs sm (x:xs) (y:ys) | x < y = x : smxs (y:ys) | x == y = x : smxsys | otherwise = y : sm (x:xs) ys 1: hamming

  13. QuestionsaboutSemantics • simple denotationalsemantics • processabstraction -> lambdaabstraction • processinstantiation -> application • value/resultofprogram, but noinformationaboutexecution, parallelismdegree, speedups /slowdowns • operational • When will a processbecreated? When will a processinstantiationbeevaluated? • Towhichdegree will process in-/outputsbeevaluated? Weakhead normal form or normal form or ...? • When will process in-/outputsbecommunicated?

  14. Answers Eden onlyifandwhenitsresult isdemanded normal form eager (push) communication: valuesarecommunicated assoonasavailable Lazy Evaluation (Haskell) onlyifandwhenitsresult isdemanded WHNF (weakhead normal form ) onlyifdemanded: requestandanswer messagesnecessary • When will a processbecreated? When will a processinstantiationbeevaluated? • Towhichdegree will process in-/outputsbeevaluated? Weakhead normal form or normal form or ...? • When will process in-/outputsbecommunicated?

  15. Lazyevaluation vs. Parallelism • Problem:Lazyevaluation ==> distributedsequentiality • Eden‘sapproach: • eagerprocesscreationwithspawn • eagercommunication: • normal form evaluationof all processoutputs(byindependentthreads) • push communication, i.e. valuesarecommunicatedassoonasavailable • explicit demandcontrolbysequentialstrategies(Module Control.Seq): • rnf, rwhnf... :: Strategy a • using :: a -> Strategy a -> a • pseq :: a -> b -> b (Module Control.Parallel)

  16. Case Study: MergeSort Rita Loogen: Eden – CEFP 2011

  17. Case Study: MergeSort Unsorted sublist 1 Sorted sublist 1 Haskell Code: mergeSort :: (Ord a, Show a) => [a] -> [a] mergeSort [] = [] mergeSort [x] = [x] mergeSortxs = sortMerge (mergeSort xs1) (mergeSort xs2) where [xs1,xs2] = unshuffle 2xs sorted list Unsorted list split merge Unsorted sublist 2 sorted Sublist 2

  18. Example: MergeSortparallel Unsorted sublist 1 Sorted sublist 1 Eden Code (simplestversion): parMergeSort :: (Ord a, Show a, Trans a) => [a] -> [a] parMergeSort [] = [] parMergeSort [x] = [x] parMergeSortxs = sortMerge (parMergeSort$# xs1) (parMergeSort$# xs2) where [xs1,xs2] = unshuffle 2xs sorted list Unsorted list split merge Unsorted sublist 2 sorted Sublist 2

  19. Example: MergeSortProcessnet childprocess childprocess Eden Code (simplestversion): parMergeSort :: (Ord a, Show a, Trans a) => [a] -> [a] parMergeSort [] = [] parMergeSort [x] = [x] parMergeSortxs = sortMerge (parMergeSort$# xs1) (parMergeSort$# xs2) where [xs1,xs2] = unshuffle 2xs childprocess mainprocess childprocess childprocess childprocess

  20. EdenTV: The Eden Trace Viewer Tool Rita Loogen: Eden – CEFP 2011

  21. The Eden-System Eden Parallel runtimesystem (Management ofprocesses andcommunication) EdenTV parallel system

  22. Compiling, Running, Analysing Eden Programs Set upenvironmentfor Eden on Lab computersbycalling edenenv Compile Eden programswith ghc –parmpi --make –O2 –eventlogmyprogram.hsor ghc –parpvm --make –O2 –eventlogmyprogram.hs Ifyouusepvm, youfirsthavetostart it. Providepvmhostsormpihostsfile Runcompiledprogramswith myprogram <parameters> +RTS –ls -N<noPe> -RTS Viewactivityprofile (tracefile) with edentvmyprogram_..._-N4_-RTS.parevents Rita Loogen: Eden – CEFP 2011

  23. deblock thread new thread runnable suspend thread block thread kill thread run thread running blocked kill thread finished kill thread Eden Threads andProcesses • An Eden processcomprisesseveralthreads(one per outputchannel). • Thread State Transition Diagram:

  24. EdenTV -Diagrams: Machines (PEs) Processes Threads - Message Overlays Machines Processes - zooming - messagestreams - additional infos - ...

  25. EdenTV Demo Rita Loogen: Eden – CEFP 2011

  26. Case Study: MergeSortcontinued Rita Loogen: Eden – CEFP 2011

  27. Example: Activityprofileof parallel mergesort • Program run, lengthofinputlist: 1.000 • Observation: • SLOWDOWN • Seq. runtime: 0,0037 s • Par. runtime: 0,9472 s • Reasons: • 1999 processes, mostlyblocked • 31940 messages • delayedprocesscreation • processplacement

  28. Howcanweimproveour parallel mergesort? Herearesomerulesofthumb. • Adaptthe total numberofprocessestothenumberofavailableprocessorelements (PEs), in Eden: noPe :: Int • UseeagerprocesscreationfunctionsspawnorspawnF. • Bydefault, Eden placesprocessesroundrobin on theavailable PEs. Try todistributeprocessesevenlyoverthe PEs. • Avoidelement-wisestreamingif not necessary, e.g. byputtingthelistintosome „box“ orbychunkingitintobiggerpieces. THINK PARALLEL! Rita Loogen: Eden – CEFP 2011

  29. Parallel Mergesortrevisited unsorted sublist 1 sorted sublist 1 mergesort unsorted sublist 2 sorted sublist 2 mergesort sorted list unsorted list mergemanylists unshuffle (noPe-1) unsorted sublist noPe-1 sorted sublistnoPe-1 mergesort unsorted sublistnoPe sorted sublistnoPe mergesort Rita Loogen: Eden – CEFP 2011

  30. ... x1 x2 x3 x4 ... f f f f ... y1 y2 y3 y4 A Simple Parallelisationofmap map :: (a -> b) -> [a] -> [b] map f xs = [ f x | x <- xs ] parMap :: (Trans a, Trans b) => (a -> b) -> [a] -> [b] parMap f = spawn (repeat(process f)) 1 process per list element

  31. Alternative Parallelisationofmergesort - 1st try Eden Code: par_ms :: (Ord a, Show a, Trans a) => [a] -> [a] par_msxs = head $ sms $parMapmergeSort (unshuffle (noPe-1) xs)) sms :: (NFData a, Ord a) => [[a]] -> [[a]] sms [] = [] smsxss@[xs] = xss sms (xs1:xs2:xss) = sms (sortMerge xs1 xs2) (smsxss)  Total numberofprocesses = noPe  eagerlycreatedprocesses  roundrobinplacementleadsto 1 process per PE but maybe still toomanymessages

  32. ResultingActivity Profile (Processes/Machine View) Previousresultsforinputsize 1000 Seq. runtime: 0,0037 s Par. runtime: 0,9472 s • Input size 1.000 • seq. runtime: 0,0037 • par. runtime: 0,0427 s • 8 Pes, 8 processes, 15 threads • 2042 messages • Much better, but still • SLOWDOWN • Reason: Indeedtoomanymessages Rita Loogen: Eden – CEFP 2011

  33. Reducing Communication Costs Rita Loogen: Eden – CEFP 2011

  34. ReducingNumberof Messages byChunking Streams Split a list (stream) intochunks: chunk :: Int -> [a] -> [[a]] chunksize [] = [] chunksizexs = ys : chunksizezs where (ys,zs) = splitAtsizexs Combine with parallel map-implementationofmergesort: par_ms_c :: (Ord a, Show a, Trans a) => Int -> -- chunksize [a] -> [a] par_ms_csizexs = head $ sms $mapconcat$ parMap ((chunksize) . mergeSort . concat) (map (chunksize)(unshuffle (noPe-1) xs))) Rita Loogen: Eden – CEFP 2011

  35. ResultingActivity Profile (Processes/Machine View) Previousresultsforinputsize 1000 Seq. runtime: 0,0037 s Par. runtime I: 0,9472 s Par. runtime II: 0,0427 s • Input size 1.000, chunksize 200 • seq. runtime: 0,0037 • par. runtime: 0,0133 s • 8 Pes, 8 processes, 15 threads • 56 messages • Much better, but still • SLOWDOWN • parallel runtime w/o Startup and Finish of parallel system: • 0,0125-0,009 = 0,0035 •  increaseinputsize Rita Loogen: Eden – CEFP 2011

  36. Activity Profile for Input Size 1.000.000 • Input size 1.000.000 • Chunksize 1000 • seq. runtime: 7,287 s • par. runtime: 2,795 s • 8 Pes, 8 processes, 15 threads • 2044 messages •  speedupof 2.6 on 8 PE unshuffle mapmergesort merge Rita Loogen: Eden – CEFP 2011

  37. Further improvement Idea: Remove inputlistdistributionbylocalsublistselection: par_ms_c :: (Ord a, Show a, Trans a) => Int -> [a] -> [a] par_ms_csizexs = head $ sms $mapconcat$ parMap ((chunksize) . mergeSort . concat) (map (chunksize)(unshuffle (noPe-1) xs)))  par_ms:: (Ord a, Show a, Trans a) => Int -> [a] -> [a] par_ms_bsizexs = head $ sms $mapconcat$ parMap(\ i -> (chunksize(mergeSort ((unshuffle (noPe-1) xs)!!i)))) [0..noPe-2] Rita Loogen: Eden – CEFP 2011

  38. CorrespondingActivityProfiles  • Input size 1.000.000 • Chunksize 1000 • seq. runtime: 7,287 s • par. runtime: 2,795 s • new par. runtime: 2.074 s • 8 Pes, 8 processes, 15 threads • 1036 messages •  speedupof 3.5 on 8 PEs Rita Loogen: Eden – CEFP 2011

  39. Parallel mapimplementations Rita Loogen: Eden – CEFP 2011

  40. T T T T T T T T ... ... ... ... P P P P P P ... ... PE PE PE PE Parallel mapimplementations: parMapvsfarm parMap farm farm :: (Trans a, Trans b) => ([a] -> [[a]]) -> ([[b]] -> [b]) -> (a -> b)-> [a] -> [b] farmdistributecombinef xs = combine (parMap(map f) (distributexs)) parMap :: (Trans a, Trans b) => (a -> b) -> [a] -> [b] parMapf xs =spawn(repeat(process f))xs

  41. Processfarms 1 process per sub-tasklist withstatic taskdistribution farm :: (Trans a, Trans b) => ([a] -> [[a]]) -> -- distribute ([[b]] -> [b]) -> -- combine (a->b) -> [a] -> [b] farmdistributecombine f xs = combine . (parMap (map f)) . distribute Choose e.g. • distribute = unshufflenoPe / combine = shuffle • distribute = splitIntoNnoPe / combine = concat 1 process per PE withstatic taskdistribution Rita Loogen: Eden – CEFP 2011

  42. Example: Functional Program for Mandelbrot Sets ul dimx Idea: parallel computation of lines lr image :: Double -> Complex Double -> Complex Double -> Integer -> String imagethresholdullrdimx = header ++ (concat$ map xy2col lines) where xy2col ::[Complex Double] -> String xy2col line = concatMap (rgb.(iterthreshold (0.0 :+ 0.0) 0)) line (dimy, lines) = coordullrdimx

  43. Example: ParallelFunctional Program for Mandelbrot Sets ul dimx Idea: parallel computation of lines lr image :: Double -> Complex Double -> Complex Double -> Integer -> String imagethresholdullrdimx = header ++ (concat$ map xy2col lines) where xy2col ::[Complex Double] -> String xy2col line = concatMap (rgb.(iterthreshold (0.0 :+ 0.0) 0)) line (dimy, lines) = coordullrdimx Replacemapby farm(unshufflenoPe) shuffle orfarmB (splitIntoNnoPe) concat

  44. Mandelbrot Traces Problem size: 2000 x 2000 Platform: Beowulf cluster Heriot-Watt-University, Edinburgh (32 Intel P4-SMP nodes @ 3 GHz 512MB RAM, Fast Ethernet) farm (unshufflenoPe) shuffle roundrobinstatic taskdistribution farm (splitIntoNnoPe) concat roundrobinstatic taskdistribution

  45. Camera 2D Image 3D Scene Example: Ray Tracing rayTrace:: Size -> CamPos -> [Object] -> [Impact] rayTracesizecameraPosscene = findImpactsallRaysscene whereallRays = generateRayssizecameraPos findImpacts:: [Ray] -> [Object] -> [Impact] findImpactsraysobjs = map (firstImpactobjs) rays

  46. Reducing Communication CostsbyChunking Combine chunkingwith parallel map-implementation: chunkMap :: Int -> (([a] -> [b]) -> ([[a]] -> [[b]])) -> (a -> b) -> [a] -> [b] chunkMapsizemapscheme f xs = concat (mapscheme (map f) (chunksizexs)) Rita Loogen: Eden – CEFP 2011

  47. RaytracerExample:Element-wise Streaming vsChunking Input size 250 Chunksize 500 Runtime: 0,235 s 8 PEs 9 processes 17 threads 48 conversations 548 messages Input size 250 Runtime: 6,311 s 8 PEs 9 processes 17 threads 48 conversations 125048 messages Rita Loogen: Eden – CEFP 2011

  48. Communication vs Parameter Passing Processinputs- canbecommunicated: f $# inp - canbepassedasparameter(\ () -> f inp) $# () () isdummyprocessinput graphofprocessabstraction graphofinputexpression # will beevaluated in parentprocess byconcurrentthread andthensenttochildprocess will bepacked (serialised) andsentto remote PE wherechildprocessiscreated toevaluatethisexpression Rita Loogen: Eden – CEFP 2011

  49. T T T T ... ... P P P P T...T T...T ... ... PE PE PE PE Farm vs Offline Farm Farm Offline Farm offlineFarm:: (Trans a, Trans b) => ([a] -> [[a]]) -> ([[b]] -> [b]) -> (a -> b) -> [a] -> [b] offlineFarmdistributecombinef xs = combine$ spawn (map (rfi (map f)) (distributexs) ) (repeat ()) rfi :: (a -> b) -> a -> Process () b rfi h x = process (\ () -> h x) farm :: (Trans a, Trans b) => ([a] -> [[a]]) -> ([[b]] -> [b]) -> (a -> b)-> [a] -> [b] farmdistributecombinef xs = combine (parMap(map f) (distributexs))

  50. RaytracerExample: Farm vs Offline Farm Input size 250 Chunksize 500 Runtime: 0,235 s 8 PEs 9 processes 17 threads 48 conversations 548 messages Input size 250 Chunksize 500 Runtime: 0,119 s 8 PEs 9 processes 17 threads 40 conversations 290 messages Rita Loogen: Eden – CEFP 2011

More Related