
VLDB Summer School Study Report

Northeastern University (东北大学). VLDB Summer School study report. Wang Chunlei (王春磊), September 6, 2013.


Presentation Transcript


  1. Northeastern University (东北大学): VLDB Summer School Study Report. Wang Chunlei (王春磊), September 6, 2013

  2. Agenda
     • Parallel Data Processing
     • Big Data
     • Map/Reduce and Hadoop
     • Stratosphere – A Platform for Big Data Analytics
     DIMA – TU Berlin, 29.07.2013

  3. DIMA – TU Berlin

  4. The Stratosphere System Stack
     Layered approach – several entry points to the system
     [Figure: the system stack. Entry points: Meteor Script (SOPREMO Compiler) and Pact4Scala (Scala-Compiler Plugin), both above the PACT Program layer; below it the Stratosphere Optimizer, the Runtime Operators, and the Nephele Dataflow Engine (Nephele parallel dataflow).]

  5. The Stratosphere System Stack
     Meteor scripting language
     - Inspired by Jaql
     - Nested data model (JSON)
     - Relational core operators
     - Packages for information extraction and integration

  6. The Stratosphere System Stack
     PACT programming model
     Generalizes MapReduce with additional second-order functions (PArallelization ConTracts)

  7. The Stratosphere System Stack
     Runtime engine
     - Memory management
     - Asynchronous IO
     - Query execution (sorting, hashing, …)

  8. The Stratosphere System Stack
     Nephele dataflow engine
     - Resource allocation
     - Scheduling
     - Task communication
     - Fault tolerance
     - Execution monitoring

  9. The Stratosphere System Stack
     Stratosphere optimizer picks:
     - Physical execution strategies
     - Partitioning strategies
     - Operator order

  10. Section overview: The Meteor Scripting Language

  11. Meteor Examples

  12. Meteor Examples (2)

  13. Section overview: The Nephele Execution Engine

  14. Nephele Job Graphs
      Tasks consume data streams and produce data streams.
      Channels are spanned according to a "distribution pattern".
      [Figure: a job graph with tasks A to F next to its execution graph, in which each task is expanded for parallel execution; tasks are connected by network channels and memory channels.]

  15. Nephele Architecture
      ■ Standard master-worker pattern
      ■ Workers can be allocated on demand
      [Figure: a client reaches the compute cloud over the public network (Internet); inside the cloud, the master, the cloud controller, persistent storage, and the workers share a private/virtualized network. A second panel shows workload over time.]

  16. Structure of a Nephele Schedule
      ■ A Nephele schedule is represented as a DAG
        □ Vertices represent tasks
        □ Edges denote communication channels
      ■ Mandatory information for each vertex
        □ Task program
        □ Input/output data location (I/O vertices only)
      ■ Optional information for each vertex
        □ Number of subtasks (degree of parallelism)
        □ Number of subtasks per virtual machine
        □ Type of virtual machine (#CPU cores, RAM, …)
        □ Channel types
        □ Sharing virtual machines among tasks
      [Figure: example schedule with three vertices. Input 1 (Task: LineReaderTask.program, Input: s3://user:key@storage/input), Task 1 (Task: MyTask.program), Output 1 (Task: LineWriterTask.program, Output: s3://user:key@storage/outp).]

  17. Internal Schedule Representation
      ■ The Nephele schedule is converted into an internal representation
      ■ Explicit parallelization
        □ Parallelization range (mpl) derived from PACT
        □ Wiring of subtasks derived from PACT
      ■ Explicit assignment to virtual machines
        □ Specified by ID and type
        □ Type refers to hardware profile
      [Figure: Input 1 (1), Task 1 (2), and Output 1 (1) assigned to virtual machines labeled "ID: 1, Type: m1.small" and "ID: 2, Type: m1.large".]

  18. Execution Stages
      ■ Issues with on-demand allocation:
        □ When to allocate virtual machines?
        □ When to deallocate virtual machines?
        □ No guarantee of resource availability!
      ■ Stages ensure three properties:
        □ VMs of the upcoming stage are available
        □ All workers are set up and ready
        □ Data of previous stages is stored in a persistent manner
      [Figure: the schedule from the previous slide divided into Stage 0 and Stage 1.]

  19. Channel Types
      ■ Network channels (pipeline)
        □ Vertices must be in same stage
      ■ In-memory channels (pipeline)
        □ Vertices must run on same VM
        □ Vertices must be in same stage
      ■ File channels
        □ Vertices must run on same VM
        □ Vertices must be in different stages
      [Figure: the staged schedule (Stage 0, Stage 1) with Input 1 (1), Task 1 (2), and Output 1 (1).]
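
The placement rules for the three channel types can be written as a small check. The following Java sketch is illustrative only: `ChannelRules`, `ChannelType`, `Vertex`, and `isValid` are hypothetical names chosen for this example and are not part of Nephele's API.

```java
/** Hypothetical sketch of the channel placement rules; not Nephele's actual API. */
final class ChannelRules {

    enum ChannelType { NETWORK, IN_MEMORY, FILE }

    /** A scheduled vertex: the VM instance it runs on and its execution stage. */
    static final class Vertex {
        final int vmId;
        final int stage;

        Vertex(int vmId, int stage) {
            this.vmId = vmId;
            this.stage = stage;
        }
    }

    /** Returns true if a channel of the given type may connect the two vertices. */
    static boolean isValid(ChannelType type, Vertex producer, Vertex consumer) {
        boolean sameVm = producer.vmId == consumer.vmId;
        boolean sameStage = producer.stage == consumer.stage;
        switch (type) {
            case NETWORK:
                return sameStage;             // pipelined, VMs may differ
            case IN_MEMORY:
                return sameVm && sameStage;   // pipelined within one VM
            case FILE:
                return sameVm && !sameStage;  // written in one stage, read in another
            default:
                throw new IllegalArgumentException("unknown channel type: " + type);
        }
    }

    public static void main(String[] args) {
        Vertex input = new Vertex(1, 0);
        Vertex task = new Vertex(1, 0);
        System.out.println(isValid(ChannelType.IN_MEMORY, input, task));
        System.out.println(isValid(ChannelType.FILE, input, task));
    }
}
```

A scheduler could run such a check on every edge before deploying a stage; how Nephele actually enforces the rules is not stated on the slide.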

  20. From PACTs to Nephele
      PACT code (grouping):
        invoke():
          while (!input2.eof)
            KVPair p = input2.next();
            hash-table.put(p.key, p.value);
          while (!input1.eof)
            KVPair p = input1.next();
            KVPair t = hash-table.get(p.key);
            if (t != null)
              KVPair[] result = UF.match(p.key, p.value, t.value);
              output.write(result);
        end
      User function:
        function match(Key k, Tuple val1, Tuple val2) -> (Key, Tuple)
        {
          Tuple res = val1.concat(val2);
          res.project(...);
          Key k = res.getColumn(1);
          return (k, res);
        }
      Nephele code (communication): in-memory channels and network channels
      [Figure: a PACT program with user functions UF1 (map), UF2 (map), UF3 (match), and UF4 (reduce) is compiled into Nephele vertices V1 to V4, which are then spanned into parallel instances connected by in-memory and network channels.]
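
The build/probe loop that wraps the match user function can be sketched in plain Java. This is a single-threaded illustration under stated assumptions: `MatchDriver`, `KVPair` (with an `int` key and a `String` value), and `MatchFunction` are hypothetical stand-ins for the slide's types, and unlike the slide's pseudocode the hash table keeps a list per key so that duplicate keys in the build input are not overwritten.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the build/probe loop around a Match user function. */
final class MatchDriver {

    /** A key/value pair, standing in for the slide's KVPair. */
    static final class KVPair {
        final int key;
        final String value;

        KVPair(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** First-order user function: called once per pair of records with equal keys. */
    interface MatchFunction {
        KVPair match(int key, String left, String right);
    }

    static List<KVPair> run(List<KVPair> input1, List<KVPair> input2, MatchFunction uf) {
        // Build phase: index input2 by key.
        Map<Integer, List<String>> hashTable = new HashMap<>();
        for (KVPair p : input2) {
            hashTable.computeIfAbsent(p.key, unused -> new ArrayList<>()).add(p.value);
        }
        // Probe phase: stream input1 and call the user function for every match.
        List<KVPair> output = new ArrayList<>();
        for (KVPair p : input1) {
            List<String> matches = hashTable.get(p.key);
            if (matches != null) {
                for (String right : matches) {
                    output.add(uf.match(p.key, p.value, right));
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<KVPair> left = Arrays.asList(new KVPair(1, "a"), new KVPair(2, "b"));
        List<KVPair> right = Arrays.asList(new KVPair(1, "x"), new KVPair(3, "z"));
        for (KVPair p : run(left, right, (k, l, r) -> new KVPair(k, l + r))) {
            System.out.println(p.key + " " + p.value);
        }
    }
}
```

The user function only ever sees one matching pair at a time; which input is built and which is probed is a decision left to the surrounding system.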

  21. Section overview: The PACT Programming Model
      Second-order functions for data parallelism

  22. Parallelization Contracts (PACTs)
      ■ Describe how input is partitioned in groups
        □ "What is processed together"
      ■ First-order UDF called once per input group
      ■ Map PACT
        □ Each input record forms a group
        □ Each record is independently processed by UDF
      ■ Reduce PACT
        □ One attribute is the designated key
        □ All records with same key value form a group
      [Figure: data flows into a second-order function that wraps the first-order function (user code) and back out as data; panels for Map PACT and Reduce PACT.]
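
The grouping behaviour of the Map and Reduce contracts can be illustrated with two generic helper methods. This is a minimal single-threaded sketch, not the PACT API: the class `MapReduceContracts` and the signatures of `map` and `reduce` are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of Map and Reduce as second-order functions; not the PACT API. */
final class MapReduceContracts {

    /** Map contract: every record is its own group, so the UDF sees one record per call. */
    static <I, O> List<O> map(List<I> input, Function<I, List<O>> udf) {
        List<O> out = new ArrayList<>();
        for (I record : input) {
            out.addAll(udf.apply(record));
        }
        return out;
    }

    /** Reduce contract: records sharing a key form one group; the UDF runs once per group. */
    static <I, K, O> List<O> reduce(List<I> input, Function<I, K> key,
                                    BiFunction<K, List<I>, O> udf) {
        Map<K, List<I>> groups = new LinkedHashMap<>();
        for (I record : input) {
            groups.computeIfAbsent(key.apply(record), unused -> new ArrayList<>()).add(record);
        }
        List<O> out = new ArrayList<>();
        for (Map.Entry<K, List<I>> group : groups.entrySet()) {
            out.add(udf.apply(group.getKey(), group.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        List<Integer> doubled = MapReduceContracts.<Integer, Integer>map(
                numbers, x -> Collections.singletonList(x * 2));
        List<String> sizes = MapReduceContracts.<Integer, Integer, String>reduce(
                numbers, x -> x % 2, (k, group) -> k + ":" + group.size());
        System.out.println(doubled);
        System.out.println(sizes);
    }
}
```

In both helpers the user function is ordinary first-order code; only the second-order wrapper decides which records are handed to it together, which is what allows a system to parallelize the calls.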

  23. More Parallelization Contracts
      ■ Cross PACT
        □ Each pair of input records forms a group
        □ Distributed Cartesian product
      ■ Match PACT
        □ Each pair with equal key values forms a group
        □ Distributed equi-join
      ■ CoGroup PACT
        □ All pairs with equal key values form a group
        □ 2D Reduce
      ■ More PACTs currently under consideration
        □ For similarity operators, stream processing, etc.
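
The three two-input contracts differ only in which records are handed to the user function together. The Java sketch below spells that out for small in-memory lists; `TwoInputContracts` and its method signatures are hypothetical, the nested loops are chosen for clarity rather than efficiency, and calling the CoGroup function with an empty list for a key that occurs on one side only is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of the two-input contracts; single-threaded, not the PACT API. */
final class TwoInputContracts {

    /** Cross: the UDF is called for every pair (l, r) of the Cartesian product. */
    static <L, R, O> List<O> cross(List<L> left, List<R> right, BiFunction<L, R, O> udf) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                out.add(udf.apply(l, r));
            }
        }
        return out;
    }

    /** Match: the UDF is called for every pair (l, r) whose keys are equal (equi-join). */
    static <L, R, K, O> List<O> match(List<L> left, List<R> right,
                                      Function<L, K> leftKey, Function<R, K> rightKey,
                                      BiFunction<L, R, O> udf) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                if (leftKey.apply(l).equals(rightKey.apply(r))) {
                    out.add(udf.apply(l, r));
                }
            }
        }
        return out;
    }

    /** CoGroup: the UDF is called once per key with all left and all right records. */
    static <L, R, K, O> List<O> coGroup(List<L> left, List<R> right,
                                        Function<L, K> leftKey, Function<R, K> rightKey,
                                        BiFunction<List<L>, List<R>, O> udf) {
        Map<K, List<L>> leftGroups = new LinkedHashMap<>();
        Map<K, List<R>> rightGroups = new LinkedHashMap<>();
        for (L l : left) {
            leftGroups.computeIfAbsent(leftKey.apply(l), unused -> new ArrayList<>()).add(l);
        }
        for (R r : right) {
            rightGroups.computeIfAbsent(rightKey.apply(r), unused -> new ArrayList<>()).add(r);
        }
        // Visit every key seen on either side; a missing side is passed as an empty list.
        Set<K> keys = new LinkedHashSet<>(leftGroups.keySet());
        keys.addAll(rightGroups.keySet());
        List<O> out = new ArrayList<>();
        for (K key : keys) {
            out.add(udf.apply(
                    leftGroups.getOrDefault(key, Collections.<L>emptyList()),
                    rightGroups.getOrDefault(key, Collections.<R>emptyList())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> l = Arrays.asList(1, 2);
        List<Integer> r = Arrays.asList(2, 3);
        System.out.println(TwoInputContracts.<Integer, Integer, Integer>cross(l, r, (a, b) -> a * b));
    }
}
```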

  24. PACT Programming Model
      ■ A PACT program is an arbitrary dataflow DAG consisting of operators
      ■ An operator consists of
        □ A second-order function (SOF) signature (PACT)
        □ A user-defined first-order function (FOF) written in Java
      ■ PACT programs serve as intermediate representation, but are also exposed to the user
        □ To implement UDFs for functionality not supported by Meteor
      [Figure: example dataflow. Source 1 Extract(A,B) feeds Map "C := max(A,B)"; Source 2 Extract(D,E) feeds Map "if (D>4) emit"; both feed Match (A=D) "if (A>3) emit", followed by Reduce (on A) "sum(B), avg(C)" and Sink 1.]


  26. Section overview: The Stratosphere Optimizer
      Opening the Black Boxes

  27. Optimizer Design
      ■ Cost-based optimizer produces a physical execution plan given a PACT program
        □ Annotates edges with distribution patterns, e.g., broadcast, partition
        □ Chooses physical execution strategies (e.g., hash/sort)
        □ Reorders PACT functions
        □ Constructs "Nephele job graph"
      ■ Challenge: Semantics of user-defined functions unknown
        □ How to derive correct transformations (this talk)
        □ How to cost functions (ongoing work)
        □ Mix and match UDFs and native operators (ongoing work)

  28. Optimization Overview
      ■ Approach:
        □ Statically analyze user code in each PACT UDF and extract properties
        □ Based on these properties, derive semantically correct transformations
        □ Enumerate semantically equivalent plans
      ■ Contribution: How to deeply embed MapReduce functions into a query optimizer
        □ Parallelization and reordering
        □ Applies to data flows composed (in part) of functions written in arbitrary imperative code
        □ Exportable to Scope, SQL/MapReduce (e.g., Aster, Greenplum)

  29. …via Static Code Analysis
      Feasible:
      1. Record data model, fixed API for
      2. No control flow between operators
      Correct:
      ■ Difficulty comes from different code paths
      ■ Correctness guaranteed through conservatism
      ■ Add to R, W when in doubt

        void match(Record left, Record right, Collector col) {
          Record out = copy(left);
          if (left.get(0) > 3) {
            double a = right.get(2);
            out.set(2, 1.0 / a);
          }
          out.set(1, 42);
          out.set(3, right.get(0));
          out.set(4, right.get(1));
          out.set(5, right.get(2));
          col.emit(out);
        }

  30. Opening the Black Boxes…
      Analyze user code to discover:
      ■ Output schema Of: Schema of output record given schema of input record(s)
      ■ Read set Rf: Attributes of the input record(s) that might influence output
      ■ Write set Wf: Attributes of the output record(s) that might have different values from respective input attributes
      ■ Emit cardinality Ef: Bounds on records emitted per call (1, >1, …)
      [Figure: a PACT with user function f annotated with the tuple (Of, Rf, Wf, Ef).]

  31. Code Analysis Algorithm
      ■ Rf from get statements
      ■ Wf by backwards traversal of the data flow graph starting from the emit statement
      ■ Ef by traversing the control flow graph

      Example:
        Input1 = [A,B,C]
        Input2 = [D,E,F]
        Output = [A,B,C,D,E,F]
        Rf = {A,B,C,D,E,F}
        Wf = {B,C}
        Ef = 1

        void match(Record left, Record right, Collector col) {
          Record out = copy(left);
          if (left.get(0) > 3) {
            double a = right.get(2);
            out.set(2, 1.0 / a);
          }
          out.set(1, 42);
          out.set(3, right.get(0));
          out.set(4, right.get(1));
          out.set(5, right.get(2));
          col.emit(out);
        }

  32. Automatic Parallelization
      ■ Optimizer can pick partitioning strategies
        □ From PACT signature
      ■ E.g., for Match: broadcast, partition, SFR
      ■ Partitioning strategies propagated top-down as interesting properties
      ■ Can infer preserved partitioning via R/W sets
        □ Identifies pass-through UDFs
      ■ A Reduce does not always imply a physical sort operator
      [Figure: the example dataflow annotated with physical strategies. Source 1 Extract(A,B) and Source 2 Extract(D,E); Map "C := max(A,B)" and Map "if (D>4) emit"; parti./sort.(A) and partition(D); Match (A=D) "if (A>3) emit" with probeHT(A) and buildHT(D); fifo; Reduce (on A) "sum(B), avg(C)"; Sink 1.]

  33. Operator Reordering
      ■ Reordering PACTs
        □ Reduce data volume
        □ Introduce new partitioning opportunities
      ■ Reordering, partitioning, and physical operators in one stage
        □ "Optimal" execution plan
      ■ Powerful transformations using read and write conflicts
      ■ Can "emulate" most relational optimizations without knowing operator semantics
      [Figure: a reordered plan for the example dataflow in which Reduce (on A) "sum(B), avg(C)" runs below Match (A=D) "if (A>3) emit"; annotations include buildHT(A), probeHT(D), part./sort(D), and partition(A); sources and Map operators as before; Sink 1.]

  34. Example Transformations
      Theorem 1: Two Map operators can be reordered if their UDFs have only read-read conflicts
      Theorem 2: For a Map and a Reduce, we need in addition the Reduce key groups to be preserved
      Enabled optimizations:
      ■ Selection push-down
      ■ (Bushy) join reordering
      ■ Aggregation push-down
        □ Equivalent to invariant grouping transformation [Chaudhuri & Shim 1994]
      ■ Reordering of non-relational Reduce functions
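
The condition in Theorem 1 can be phrased as a test on read and write sets: two UDFs have only read-read conflicts when no attribute written by one is read or written by the other. The Java sketch below encodes that reading of the theorem; `ReorderCheck`, `Properties`, and `attrs` are hypothetical names, and the attribute sets in `main` are taken by hand from the Map operators of the running example rather than derived by code analysis.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the read/write-conflict test behind Theorem 1. */
final class ReorderCheck {

    /** Read and write sets of a UDF, as attribute names. */
    static final class Properties {
        final Set<String> reads;
        final Set<String> writes;

        Properties(Set<String> reads, Set<String> writes) {
            this.reads = reads;
            this.writes = writes;
        }
    }

    static Set<String> attrs(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** True if f and g share no attribute that at least one of them writes. */
    static boolean onlyReadReadConflicts(Properties f, Properties g) {
        return Collections.disjoint(f.writes, g.reads)
            && Collections.disjoint(f.reads, g.writes)
            && Collections.disjoint(f.writes, g.writes);
    }

    public static void main(String[] args) {
        // Map "C := max(A,B)" reads A and B and writes C.
        Properties maxMap = new Properties(attrs("A", "B"), attrs("C"));
        // A filter such as "if (A>3) emit" reads A and writes nothing.
        Properties filterOnA = new Properties(attrs("A"), attrs());
        System.out.println(onlyReadReadConflicts(maxMap, filterOnA));
    }
}
```

Because the sets are computed conservatively, a `false` result only means the reordering was not proven safe, not that it is necessarily wrong.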

  35. Section overview: Support for Iterative Queries
      Spinning fast iterative data flows

  36. Motivation
      ■ Iterations important for Machine Learning, graphs, etc.
      ■ Many setups require multiple systems that are dedicated to special steps in a processing pipeline
      ■ Example:
        □ MapReduce (extract, filter, transform, aggregate)
        □ Specialized systems for model training: MapReduceUpdate, Pregel/GraphLab, specialized "homebrewed" solutions
      [Figure: pipeline from Source Data through Extract/Transform, Train Model, and Postprocess & Test Model to Result; system labels: DWH / Hadoop / Stratosphere (twice) and Pregel (Giraph) / GraphLab.]

  37. Approach
      ■ Don't build specialized systems: embed iterations in a dataflow system surfacing proper abstractions
      ■ Gain query optimization, external memory algorithms, …
      ■ Layer APIs on top
      [Figure: layers labeled DSL Script, Custom API, Data Flow API, Data Flow Programs (MapReduce / PACT / extended Rel. Alg.), Data Flow Optimizer, and Parallel Data Flow Runtime.]

  38. "Bulk" Iterations
      ■ Recompute state at each iteration
      ■ Conceptual feedback edge in the dataflow – lazy unrolling possible
      ■ Distinguish dynamic data path (different data each iteration) and constant data path (same)
        □ Caching heuristics where constant and dynamic paths meet
        □ Cached data may be indexed
      ■ Optimizer weighs costs for constant and dynamic data path differently
        □ Automatically favors plans that push work to the constant path
      [Figure: PageRank iteration. Dynamic input p (pid, r) and constant input A (pid, tid, p) feed Match (on pid) "Join P and A", which emits (tid, k = r*p); Reduce (on tid) "Sum up partial ranks" emits (pid = tid, r = ∑k) and feeds back into the next iteration.]
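
The split between the constant and the dynamic data path can be illustrated with the PageRank dataflow from the figure. The Java sketch below is a single-machine illustration under stated assumptions, not Stratosphere's runtime: `BulkPageRank`, `Edge`, and `iterate` are hypothetical names, the matrix A is indexed once outside the loop to stand in for the cached and indexed constant path, and a fixed iteration count replaces any convergence test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-machine sketch of a bulk iteration; not Stratosphere's runtime. */
final class BulkPageRank {

    /** One entry of the transition matrix A: (pid, tid, p). */
    static final class Edge {
        final int pid;
        final int tid;
        final double p;

        Edge(int pid, int tid, double p) {
            this.pid = pid;
            this.tid = tid;
            this.p = p;
        }
    }

    static Map<Integer, Double> iterate(Map<Integer, Double> initialRanks,
                                        List<Edge> matrix, int iterations) {
        // Constant data path: index A by pid once, outside the loop, and reuse it.
        Map<Integer, List<Edge>> cachedIndex = new HashMap<>();
        for (Edge e : matrix) {
            cachedIndex.computeIfAbsent(e.pid, unused -> new ArrayList<>()).add(e);
        }
        // Dynamic data path: the rank vector is recomputed in full at each iteration.
        Map<Integer, Double> ranks = initialRanks;
        for (int i = 0; i < iterations; i++) {
            Map<Integer, Double> next = new HashMap<>();
            for (Map.Entry<Integer, Double> rank : ranks.entrySet()) {
                // Match (on pid): join each (pid, r) with the cached entries of A.
                List<Edge> edges = cachedIndex.get(rank.getKey());
                if (edges == null) {
                    continue;
                }
                for (Edge e : edges) {
                    // Reduce (on tid): sum the partial ranks k = r * p.
                    next.merge(e.tid, rank.getValue() * e.p, Double::sum);
                }
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<Integer, Double> ranks = new HashMap<>();
        ranks.put(1, 0.25);
        ranks.put(2, 0.75);
        List<Edge> a = Arrays.asList(new Edge(1, 2, 1.0), new Edge(2, 1, 1.0));
        System.out.println(iterate(ranks, a, 3));
    }
}
```

In this sketch a page without incoming entries in A drops out of the rank map after one iteration; a fuller version would add the usual damping term, which the slide does not show.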

  39. PageRank: Two Optimizer Plans
      [Figure: two physical plans for the PageRank iteration, each with iteration input I carrying p (pid, r), output O, Match (on pid) "Join P and A" emitting (tid, k = r*p), and Reduce (on tid) "Sum up partial ranks" emitting (pid = tid, r = ∑k) over A (pid, tid, p). Strategy labels across the two plans: broadcast, partition(pid), buildHashTable(pid), probeHashTable(pid), CACHE, part./sort(tid), fifo.]
