
Increasing complexities

Compiler optimizations based on call-graph flattening. Carlo Alberto Ferraris, professor Silvano Rivoira. Master of Science in Telecommunication Engineering, Third School of Engineering: Information Technology, Politecnico di Torino, July 6th, 2011.


Presentation Transcript


  1. Compiler optimizations based on call-graph flattening. Carlo Alberto Ferraris, professor Silvano Rivoira. Master of Science in Telecommunication Engineering, Third School of Engineering: Information Technology, Politecnico di Torino, July 6th, 2011

  2. Increasing complexities: Everyday objects are becoming multi-purpose, networked, interoperable, customizable, reusable, upgradeable

  3. Increasing complexities: Everyday objects are becoming more and more complex

  4. Increasing complexities: Software that runs smart objects is becoming more and more complex

  5. Diminishing resources: Systems have to be resource-efficient

  6. Diminishing resources: Systems have to be resource-efficient. Resources come in many different flavours

  7. Diminishing resources: Systems have to be resource-efficient. Resources come in many different flavours: power. Especially valuable in battery-powered scenarios such as mobile, sensor, 3rd world applications

  8. Diminishing resources: Systems have to be resource-efficient. Resources come in many different flavours: power, density. Critical factor in data-center and product design

  9. Diminishing resources: Systems have to be resource-efficient. Resources come in many different flavours: power, density, computational. CPU, RAM, storage, etc. are often growing slower than the potential applications

  10. Diminishing resources: Systems have to be resource-efficient. Resources come in many different flavours: power, density, computational, development. Development time and costs should be as low as possible for low TTM and profitability

  11. Diminishing resources: Systems have to be resource-efficient. Resources come in many non-orthogonal flavours: power, density, computational, development

  12. Do more with less

  13. Abstractions: We need to modularize and hide the complexity. Operating systems, frameworks, libraries, managed languages, virtual machines, …

  14. Abstractions: We need to modularize and hide the complexity. Operating systems, frameworks, libraries, managed languages, virtual machines, … All of this comes with a cost: generic solutions are generally less efficient than ad-hoc ones

  15. Abstractions: We need to modularize and hide the complexity. Palm webOS: user interface running on HTML+CSS+Javascript

  16. Abstractions: We need to modularize and hide the complexity. Javascript PC emulator: running Linux inside a browser

  17. Optimizations: We need to modularize and hide the complexity without sacrificing performance

  18. Optimizations: We need to modularize and hide the complexity without sacrificing performance. Compiler optimizations trade compilation time for development and execution time

  19. Vestigial abstractions: The natural subdivision of code in functions is maintained in the compiler and all the way down to the processor. Each function is self-contained, with strict conventions regulating how it relates to other functions

  20. Vestigial abstractions: Processors don't care about functions; respecting the conventions is just additional work. Push the contents of the registers and return address on the stack, jump to the callee; execute the callee, jump to the return address; restore the registers from the stack

  21. Vestigial abstractions: Many optimizations are simply not feasible when functions are present

        int replace(int *ptr, int value) {
            int tmp = *ptr;
            *ptr = value;
            return tmp;
        }

        int A(int *ptr, int value) {
            return replace(ptr, value);
        }

        int B(int *ptr, int value) {
            replace(ptr, value);
            return value;
        }

        void *malloc(size_t size) {
            void *ret;
            // [various checks]
            ret = imalloc(size);
            if (ret == NULL)
                errno = ENOMEM;
            return ret;
        }

        // ...
        type *ptr = malloc(size);
        if (ptr == NULL)
            return NOT_ENOUGH_MEMORY;
        // ...
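As an illustration of the kind of contextual optimization slide 21 hints at, here is a minimal hand-written sketch (not compiler output). It assumes the call boundary between `B` and `replace` has been dissolved: the old value loaded into `tmp` is never used by `B`, so the load can be dropped. The name `B_flat` is hypothetical and does not come from the presentation.

```c
/* Helper and caller as written on the slide. */
static int replace(int *ptr, int value) {
    int tmp = *ptr;   /* load the old value */
    *ptr = value;     /* store the new one */
    return tmp;
}

static int B(int *ptr, int value) {
    replace(ptr, value);  /* result discarded */
    return value;
}

/* Hypothetical result once the body of replace() is visible in B's context:
 * the load into tmp is dead, so only the store remains. */
static int B_flat(int *ptr, int value) {
    *ptr = value;
    return value;
}
```

Both versions leave `*ptr` equal to `value` and return `value`; the second simply does less work. With a conventional call the callee cannot know that its return value is unused at this particular call site.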

  22. Vestigial abstractions: Many optimizations are simply not feasible when functions are present

        interpreter_setup();
        while (opcode = get_next_instruction())
            interpreter_step(opcode);
        interpreter_shutdown();

        function interpreter_step(opcode) {
            switch (opcode) {
                case opcode_instruction_A: execute_instruction_A(); break;
                case opcode_instruction_B: execute_instruction_B(); break;
                // ...
                default: abort("illegal opcode!");
            }
        }

  23. Vestigial abstractions: Many optimization efforts are directed at working around the overhead caused by functions. Inlining clones the body of the callee in the caller; optimal solution w.r.t. calling overhead, but causes code size increase and cache pollution; useful only on small, hot functions

  24. Call-graph flattening

  25. Call-graph flattening: What if we dismiss functions during early compilation…

  26. Call-graph flattening: What if we dismiss functions during early compilation and track the control flow explicitly instead?


  29. Call-graph flattening: We get most benefits of inlining, including the ability to perform contextual code optimizations, without the code size issues

  30. Call-graph flattening: We get most benefits of inlining, including the ability to perform contextual code optimizations, without the code size issues. Where's the catch?

  31. Call-graph flattening: The load on the compiler increases greatly, both directly due to CGF itself and indirectly due to subsequent optimizations. Worst-case complexity (number of edges) is quadratic w.r.t. the number of call sites being transformed (heuristics may help)

  32. Call-graph flattening: During CGF we need to statically keep track of all live values across all call sites in all functions. A value is alive if it will be needed in subsequent instructions

        A = 5, B = 9, C = 0;
        // live: A, B
        C = sqrt(B);
        // live: A, C
        return A + C;
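The live sets annotated on slide 32 can be reproduced with a standard backward liveness pass. The sketch below is a minimal version for straight-line code only (no branches), and it is not taken from the pilot implementation; the bitmask encoding and the names `struct instr`, `liveness` and `slide_code` are assumptions made for illustration.

```c
/* One bit per variable; the names follow the slide's example. */
enum { VAR_A = 1u << 0, VAR_B = 1u << 1, VAR_C = 1u << 2 };

/* An instruction described only by the variables it defines and uses. */
struct instr {
    unsigned def;
    unsigned use;
};

/*
 * Backward liveness over straight-line code:
 *   live_in[i]  = use[i] | (live_out[i] & ~def[i])
 *   live_out[i] = live_in[i + 1], and empty after the last instruction.
 * live_out[i] receives the set of values live right after instruction i.
 */
static void liveness(const struct instr *code, int n, unsigned *live_out) {
    unsigned live = 0; /* nothing is live after the final return */
    for (int i = n - 1; i >= 0; i--) {
        live_out[i] = live;
        live = code[i].use | (live & ~code[i].def);
    }
}

/* Hypothetical encoding of the three statements on the slide. */
static const struct instr slide_code[3] = {
    { VAR_A | VAR_B | VAR_C, 0 },  /* A = 5, B = 9, C = 0; */
    { VAR_C, VAR_B },              /* C = sqrt(B);         */
    { 0, VAR_A | VAR_C },          /* return A + C;        */
};
```

Running `liveness(slide_code, 3, out)` yields {A, B} after the first statement and {A, C} after the second, matching the comments on the slide. A real pass over a flattened program would also have to iterate to a fixed point across branches, which this sketch leaves out.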

  33. Call-graph flattening: Basically the compiler has to statically emulate, ahead of time, all the possible stack usages of the program. This has already been done on microcontrollers and resulted in a 23% decrease of stack usage (and 5% performance increase)

  34. Call-graph flattening: The indirect cause of increased compiler load comes from standard optimizations that are run after CGF. CGF does not create new branches (each call and return instruction is turned exactly into a jump), but other optimizations can

  35. Call-graph flattening: The indirect cause of increased compiler load comes from standard optimizations that are run after CGF. Most optimizations are designed to operate on small functions with limited amounts of branches

  36. Call-graph flattening: Many possible application scenarios beside inlining

  37. Call-graph flattening: Many possible application scenarios beside inlining: code motion. Move instructions across function boundaries; avoid unneeded computations, alleviate register pressure, improve cache locality

  38. Call-graph flattening: Many possible application scenarios beside inlining: code motion, macro compression. Find similar code sequences in different parts of the code and merge them; reduce code size and cache pollution

  39. Call-graph flattening: Many possible application scenarios beside inlining: code motion, macro compression, nonlinear CF. CGF natively supports nonlinear control flows; almost-zero-cost EH and coroutines

  40. Call-graph flattening: Many possible application scenarios beside inlining: code motion, macro compression, nonlinear CF, stackless execution. No runtime stack needed in fully-flattened programs

  41. Call-graph flattening: Many possible application scenarios beside inlining: code motion, macro compression, nonlinear CF, stackless execution, stack protection. Effective stack poisoning attacks are much harder or even impossible

  42. Implementation: To test if CGF is applicable also to complex architectures, and to validate some of the ideas presented in the thesis, a pilot implementation was written against the open-source LLVM compiler framework

  43. Implementation: Operates on LLVM-IR; host and target architecture agnostic; roughly 800 lines of C++ code in 4 classes. The pilot implementation cannot flatten recursive, indirect or variadic call sites; such call sites can still be used, they are just not flattened

  44. Implementation: (1) Enumerate suitable functions. (2) Enumerate suitable call sites (and their live values). (3) Create dispatch function, populate with code. (4) Transform call sites. (5) Propagate live values. (6) Remove original functions or create wrappers

  45. Examples

        int a(int n) {
            return n + 1;
        }

        int b(int n) {
            int i;
            for (i = 0; i < 10000; i++)
                n = a(n);
            return n;
        }
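To make the idea of turning each call and return into a jump concrete, here is a hand-written C sketch of what a flattened version of the loop example on slide 45 might look like. It is not output of the pilot implementation: the block names, the `b_flat` function and the use of a `switch` in place of an indirect branch are all assumptions chosen to keep the example in portable C.

```c
/* Original functions from the slide. */
static int a(int n) { return n + 1; }

static int b(int n) {
    int i;
    for (i = 0; i < 10000; i++)
        n = a(n);
    return n;
}

/* Hypothetical labels for the basic blocks of the flattened program. */
enum block { B_ENTRY, B_LOOP_TEST, A_BODY, B_AFTER_CALL, B_EXIT };

/*
 * Sketch of a dispatch function: the bodies of a() and b() live in one
 * function, and the call and the return are plain jumps. `ret_to` plays
 * the role of the return address, kept as an ordinary local value
 * instead of a slot on the runtime stack.
 */
static int b_flat(int n) {
    int i = 0;
    enum block pc = B_ENTRY;
    enum block ret_to = B_EXIT;

    for (;;) {
        switch (pc) {
        case B_ENTRY:
            i = 0;
            pc = B_LOOP_TEST;
            break;
        case B_LOOP_TEST:
            if (i < 10000) {
                ret_to = B_AFTER_CALL; /* "call a": remember where to resume */
                pc = A_BODY;
            } else {
                pc = B_EXIT;
            }
            break;
        case A_BODY:
            n = n + 1;                 /* body of a() */
            pc = ret_to;               /* "return": jump back to the caller */
            break;
        case B_AFTER_CALL:
            i++;
            pc = B_LOOP_TEST;
            break;
        case B_EXIT:
            return n;
        }
    }
}
```

`b_flat(x)` computes the same result as `b(x)` without any call instruction between the two bodies. Once everything sits in one control-flow graph, ordinary optimizations can see that `ret_to` only ever takes one value here and simplify the jump through it.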


  48. Examples

        int a(int n) {
            return n + 1;
        }

        int b(int n) {
            n = a(n);
            n = a(n);
            n = a(n);
            n = a(n);
            return n;
        }


  50.
                .type   .Ldispatch,@function
        .Ldispatch:
                movl    $.Ltmp4, %eax       # store the return dispatcher of a in rax
                jmpq    *%rdi               # jump to the requested outer disp.
        .Ltmp2:                             # outer dispatcher of b
                movl    $.LBB2_4, %eax      # store the address of %10
        .Ltmp0:                             # outer dispatcher of a
                movl    (%rsi), %ecx        # load the argument n in ecx
                jmp     .LBB2_4
        .Ltmp8:                             # block %17
                movl    $.Ltmp6, %eax
                jmp     .LBB2_4
        .Ltmp6:                             # block %18
                movl    $.Ltmp7, %eax
        .LBB2_4:                            # block %10
                movq    %rax, %rsi
                incl    %ecx                # n = n + 1
                movl    $.Ltmp8, %eax
                jmpq    *%rsi               # indirectbr
        .Ltmp4:                             # return dispatcher of a
                movl    %ecx, (%rdx)        # store in pointer rdx the return value
                ret                         # in ecx and return to the wrapper
        .Ltmp7:                             # return dispatcher of b
                movl    %ecx, (%rdx)
                ret
