
Fully Dynamic Specialization

Fully Dynamic Specialization. AJ Shankar OSQ Lunch 9 December 2003. “That’s Why They Play the Game”. Programs are executed because we can’t determine their behavior statically! Idea: Optimize programs dynamically to take advantage of runtime information we can’t get statically


Presentation Transcript


  1. Fully Dynamic Specialization AJ Shankar OSQ Lunch 9 December 2003

  2. “That’s Why They Play the Game” • Programs are executed because we can’t determine their behavior statically! • Idea: Optimize programs dynamically to take advantage of runtime information we can’t get statically • Look at portions of the program for predictable inputs that we can optimize for

  3. Specialization • Recompile portions of the program, using known runtime values as constants • Possibly many variants of the same code • Allow for fallback to original code when assumptions are not met • Predictable == recurrent [Figure labels: Unpredictable, Predictable, Generic (G), P2, P3, P4]

  4. How It Works • Choose a good region of code to specialize: the code after a highly predictable instruction • Insert a dispatch that checks the result of the chosen instruction • Recompile the code for different results of the instruction • During execution, jump to the appropriate specialized code [Figure: LOAD pc / X = … feeds Dispatch(X), which branches to Spec1, Spec2, or Default, each continuing into the rest of the code]
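The dispatch step can be illustrated with a small sketch. This is not the talk's implementation: `generic`, `specialized`, and `dispatch` are hypothetical names, and the "recompiled" variants are written by hand here, where a real specializer would generate them at runtime.

```python
from typing import Callable, Dict

def generic(x: int, y: int) -> int:
    """Original, unspecialized code: correct for any x."""
    return x * y + x

# Hypothetical variants, one per hot result of the trigger instruction (x).
# Each is what constant propagation would leave behind for that value of x.
specialized: Dict[int, Callable[[int], int]] = {
    0: lambda y: 0,       # x == 0: the whole expression folds to 0
    1: lambda y: y + 1,   # x == 1: the multiply disappears
}

def dispatch(x: int, y: int) -> int:
    """Check the trigger's result and jump to the matching variant, else fall back."""
    variant = specialized.get(x)
    if variant is not None:
        return variant(y)
    return generic(x, y)  # Default: assumptions not met, run the original code
```

The fallback keeps the program correct for values that were never specialized; the dictionary lookup stands in for the cost of the inserted dispatch.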

  5. When Is This a Good Idea? • Any app whose execution is heavily dependent on input • For instance • Interpreters • Raytracers • Dynamic content producers (CGI scripts, etc.)

  6. Specialization Is Hard! • Specializing code at runtime is costly • Can even slow the program down • Existing specializers rely on static annotations to clue them in about profitable areas • Difficult to get right • Limits specialization potential

  7. Existing: DyC, Cyclone, etc. • Explicitly annotate static data • No support for automatic specialization of frequently-executed code • Could compile lots of useless stuff • No concrete store information • Doesn’t take advantage of the fact that memory location X is constant for the lifetime of the program

  8. Existing: Calpa • Mock et al., 2000. Extension to DyC. • Profiles execution on sample input to derive annotations • But converting a concrete profile to an abstract annotation means • Still unable to detect concrete memory constants • Frequently executed code for arbitrary input? • Still needs source, and runs offline!

  9. Motivating Example: Interpreter • Sample interpreted program: X = 10; … WHILE (Z != 0) { Y = X+Z; … } • Interpreter loop:

       while (1) {
         i = instrs[pc];
         switch (i.opcode) {
           case ADD:
             env[i.res] = env[i.op1] + env[i.op2];
             pc++;
             break;
           case BNEQ:
             if (env[i.op1] != 0) pc = env[i.op2];
             else pc++;
             break;
           ...
         }
       }

     • X is constant after initialization • concrete memory location • Y = X+Z executed frequently

  10. Motivating Example: Interpreter • Sample interpreted program: X = 10; … WHILE (Z != 0) { Y = X+Z; … } • Specialized interpreter loop:

        while (1) {
          while (pc == 15) {
            // Y = X + Z
            env[3] = 10 + env[2];
            …
            // Z != 0 ?
            if (env[2] == 0) pc = 19;
          }
          // otherwise: normal loop body, as in the original loop
        }

      • Original interpreter loop:

        while (1) {
          i = instrs[pc];
          switch (i.opcode) {
            case ADD:
              env[i.res] = env[i.op1] + env[i.op2];
              pc++;
              break;
            case BNEQ:
              if (env[i.op1] != 0) pc = env[i.op2];
              else pc++;
              break;
            ...
          }
        }
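A runnable sketch of the same idea, under stated assumptions: the program below is a hypothetical stand-in for the sample program (the elided loop body is replaced by a decrement of Z), slot numbers and pc values differ from the slide, and BNEQ takes a literal branch target. The specialized runner is hand-written to show what a specializer might emit for the hot pc value; it is only valid while env[X] stays 10 and the instruction array is not modified.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "opcode res op1 op2")
ADD, BNEQ, HALT = "ADD", "BNEQ", "HALT"
X, Y, Z, M1 = 0, 1, 2, 3             # hypothetical env slots; M1 holds -1

PROGRAM = [
    Instr(ADD, Y, X, Z),             # pc 0: Y = X + Z
    Instr(ADD, Z, Z, M1),            # pc 1: Z = Z - 1 (stands in for the elided body)
    Instr(BNEQ, None, Z, 0),         # pc 2: if Z != 0 goto 0
    Instr(HALT, None, None, None),   # pc 3
]

def step(instrs, env, pc):
    """One iteration of the generic interpreter loop; returns the next pc or None."""
    i = instrs[pc]
    if i.opcode == ADD:
        env[i.res] = env[i.op1] + env[i.op2]
        return pc + 1
    if i.opcode == BNEQ:
        return i.op2 if env[i.op1] != 0 else pc + 1
    return None                      # HALT

def run_generic(instrs, env):
    pc = 0
    while pc is not None:
        pc = step(instrs, env, pc)
    return env

def run_specialized(instrs, env):
    """Same loop, with a variant for the hot value pc == 0 (assumes env[X] == 10)."""
    pc = 0
    while pc is not None:
        if pc == 0:
            while True:
                env[Y] = 10 + env[Z]         # X folded to the constant 10
                env[Z] = env[Z] + env[M1]
                if env[Z] == 0:              # BNEQ falls through
                    pc = 3
                    break
        else:
            pc = step(instrs, env, pc)       # normal loop
    return env
```

Inside the specialized region there is no instruction fetch and no opcode switch; both were resolved when pc was known.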

  11. A More Concrete Approach • Do everything at runtime! • Specialize on execution-time hot values • Know which concrete memory locations are constant • Other benefits of this approach: • Specialize temporally, as execution progresses • Specialize dynamically loaded libraries as well • No annotations or source code necessary

  12. A Quick Recap • Choose a good region of code to specialize • Insert a dispatch that checks the result of the chosen instruction (the “trigger”) • Recompile code for different values of a hot instruction • During execution, jump to the appropriate specialized code [Figure: the generic diagram (X = … feeds Dispatch(X), then Spec1 / Spec2 / Default, then the rest of the code) instantiated for the interpreter: LOAD pc feeds Dispatch(pc), with variants pc=15 and pc=27 and the while(1) loop as Default]

  13. The Details • Need to identify the best predictable instruction • Specializing on its result should provide the greatest benefit • To find it, gather profile information about all instructions • Need to actually do the specializing

  14. Instrumentation: Hot Values • What’s a hot value? One that occurs frequently as the result of an instruction • x % 2 has two very hot values, 0 and 1 • Good candidate instructions are predictable: result in (only) a few hot values • For instance, small_constant_table[x], but not rand(x) • Case study: Interpreter • Predictable instructions: LOAD pc, instr.opcode • Code: instr = instrs[pc]; switch(instr.opcode) { … }
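A minimal sketch of a hot-value profile, assuming instructions are identified by an arbitrary key and that "predictable" is measured as the fraction of executions covered by the k most frequent results. The class name, the key scheme, and the choice of k are assumptions made for illustration.

```python
from collections import Counter, defaultdict

class ValueProfile:
    """Counts the results produced by each instrumented instruction."""

    def __init__(self):
        self.counts = defaultdict(Counter)   # instruction id -> Counter of results

    def record(self, instr_id, value):
        """Called by the instrumentation after the instruction executes."""
        self.counts[instr_id][value] += 1
        return value

    def hot_values(self, instr_id, k=2):
        """The k most frequent results of this instruction."""
        return [v for v, _ in self.counts[instr_id].most_common(k)]

    def predictability(self, instr_id, k=2):
        """Fraction of executions whose result was one of the k hottest values."""
        c = self.counts[instr_id]
        total = sum(c.values())
        if total == 0:
            return 0.0
        return sum(n for _, n in c.most_common(k)) / total

profile = ValueProfile()
for x in range(100):
    profile.record("x % 2", x % 2)   # two hot values cover every execution
    profile.record("identity", x)    # every result distinct: not predictable
```

With this data, "x % 2" has predictability 1.0 for k = 2, while "identity" stays near zero, matching the slide's contrast between a small table lookup and rand(x).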

  15. Instrumentation: Store Profile • Keep track of memory locations that have been written to • Idea: if a location hasn’t been written to yet, it probably won’t be later, either • Case study: Interpreter • Store profile says env[Y] is written to a lot, but env[X] and instrs[] are never written to • Code: env[instr.res] = env[instr.op1] + env[instr.op2];
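A store profile can be sketched as a wrapper that records every write. This is an illustration only: `StoreProfile` and `ProfiledMemory` are hypothetical names, and a real system would instrument store instructions in the compiled code instead of wrapping a Python list.

```python
class StoreProfile:
    """Remembers which memory locations have been written, and how often."""

    def __init__(self):
        self.write_counts = {}

    def record_store(self, addr):
        self.write_counts[addr] = self.write_counts.get(addr, 0) + 1

    def presumed_constant(self, addr):
        # Heuristic from the slide: never written so far, so probably never written later.
        return addr not in self.write_counts

class ProfiledMemory:
    """List-like memory region whose stores are reported to a StoreProfile."""

    def __init__(self, cells, profile, name):
        self.cells = list(cells)
        self.profile = profile
        self.name = name

    def __getitem__(self, index):
        return self.cells[index]

    def __setitem__(self, index, value):
        self.profile.record_store((self.name, index))
        self.cells[index] = value

X, Y, Z = 0, 1, 2                       # hypothetical slot numbers
stores = StoreProfile()
env = ProfiledMemory([10, 0, 3], stores, "env")
for _ in range(5):
    env[Y] = env[X] + env[Z]            # only env[Y] is ever written
```

After the loop, env[X] and env[Z] are still presumed constant, so a specializer could fold env[X] to 10, subject to invalidation if a later write proves the guess wrong.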

  16. Invalidating Specialized Code • Memory locations may not really be constant • When ‘constant’ memory is overwritten, must invalidate or modify specializations that depended on it • How does Calpa handle invalidation? • Computes points-to set • Inserts invalidation calls at all appropriate points (offline) • Too costly an approach, without modification

  17. Invalidation Options • Write barrier • Still feasible if field is private, e.g. class Interpreter { private Instruction[] instrs; void SetInstrs(Instruction[] is) { instrs = is; } } • On-entry checks • Feasible if specialization depends on a small number of memory locations • e.g. Factor(BigInt x) • Hardware support • e.g. Mondrian • Ideal solution • Possible to simulate? [Figure: Hot Instruction, CheckMem, Dispatch, Invalidate, Spec1, Default]
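The first two options can be sketched in a few lines. Everything here is hypothetical illustration: the `Interpreter` class mirrors the slide's private-field example, `specialize` fakes a compiled variant with a closure, and `guarded` shows an on-entry check for a variant that depends on a single memory location.

```python
class Interpreter:
    """Write barrier: the only way to store to _instrs is through set_instrs."""

    def __init__(self, instrs):
        self._instrs = tuple(instrs)
        self._variants = {}            # hot pc -> specialized fetch
        self.invalidations = 0

    def set_instrs(self, instrs):
        self._instrs = tuple(instrs)
        if self._variants:             # the 'constant' memory was overwritten
            self._variants.clear()     # invalidate everything that depended on it
            self.invalidations += 1

    def specialize(self, pc):
        instr = self._instrs[pc]       # folded in as a constant
        self._variants[pc] = lambda: instr

    def fetch(self, pc):
        variant = self._variants.get(pc)
        return variant() if variant is not None else self._instrs[pc]

def guarded(variant, fallback, read_location, assumed):
    """On-entry check: re-read the location before trusting the variant."""
    def run(*args):
        if read_location() == assumed:
            return variant(*args)
        return fallback(*args)
    return run

cell = {"x": 10}
add_x = guarded(lambda z: 10 + z,            # specialized for x == 10
                lambda z: cell["x"] + z,     # original code
                lambda: cell["x"], 10)
```

The write barrier pays only when the field is stored to, while the on-entry check pays on every entry to the specialized code, which is why the slide restricts it to specializations that depend on few locations.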

  18. Specialization Algorithm • Find good candidate instructions • Predictable • Frequently executed • For each candidate instruction • Simultaneously evaluate method using constant propagation for some of its hot values • Compute overall cost/benefit • Choose the best instruction
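The selection step might look like the following sketch. The cost model is an assumption made for illustration, not the talk's: benefit is counted as instructions saved per execution with a hot value, and the compile and dispatch costs are made-up constants.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

COMPILE_COST = 100    # hypothetical one-time cost per specialized variant
DISPATCH_COST = 1     # hypothetical cost per execution of the inserted dispatch

@dataclass
class Candidate:
    name: str
    exec_count: int
    # hot value -> (times observed, instructions saved each time)
    hot: Dict[object, Tuple[int, int]]

def score(c: Candidate) -> int:
    """Overall cost/benefit of specializing on this instruction."""
    saved = sum(n * benefit for n, benefit in c.hot.values())
    return saved - COMPILE_COST * len(c.hot) - DISPATCH_COST * c.exec_count

def choose(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the most profitable candidate, or None if nothing pays off."""
    best = max(candidates, key=score, default=None)
    if best is None or score(best) <= 0:
        return None
    return best

opcode = Candidate("instr.opcode", 1000, {"ADD": (500, 3), "BNEQ": (500, 3)})
pc = Candidate("LOAD pc", 1000, {15: (400, 10), 16: (400, 10)})
noisy = Candidate("rand(x)", 1000, {7: (1, 2)})
```

Under these made-up numbers, specializing on pc scores 6800 against 1800 for the opcode, and the unpredictable instruction is rejected because its dispatch cost exceeds anything it saves.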

  19. Specializing the Interpreter • Interpreter loop:

        while (1) {
          i = instrs[pc];
          switch (i.opcode) {
            case ADD:
              env[i.res] = env[i.op1] + env[i.op2];
              pc++;
              break;
            case BNEQ:
              if (env[i.op1] != 0) pc = env[i.op2];
              else pc++;
              break;
            ...
          }
        }

      • Candidates: • instr.opcode: executed very frequently; a small handful of values • pc: executed very frequently; more values, but still reasonable

  20. Specializing on instr.opcode • Dispatch(opcode); trace for the hot value i.opcode = ADD, with cumulative benefit: • switch(i.opcode) becomes switch(ADD) (benefit = 1) • case ADD: … is resolved (benefit = 2) • env[i.res] = env[i.op1]+env[i.op2] is unchanged • pc = pc + 1 is unchanged • goto LOOP (benefit = 3) • LOOP: i = instrs[pc] is unchanged; nothing is known any more ({}) • Other values of opcode have similar results…

  21. Specializing on pc • Dispatch(pc); trace for the hot value pc = 15 (Y = X + Z), with cumulative benefit: • LOOP: i = instrs[pc] becomes i = instrs[15]; known: pc = 15 (benefit = 1) • switch(i.opcode) becomes switch(ADD); known: pc = 15, i = ADD Y, X, Z (benefit = 2) • case ADD: … is resolved (benefit = 3) • env[i.res] = env[i.op1]+env[i.op2] becomes env[Y] = 10 + env[Z] (benefit = 6) • pc = pc + 1 becomes pc = 15 + 1; known: pc = 16 (benefit = 7) • goto LOOP (benefit = 8) • LOOP: i = instrs[16]; known: pc = 16, i = BNEQ Z, 15 (benefit = 9) • switch(BNEQ) (benefit = 10) • if (env[Z] != 0) … pc++; … (benefit = …)

  22. Final Result • Choose to specialize on pc because benefit is far greater than for instr.opcode • Generate different versions for each of the hottest values of pc • Terminate loop unrolling either naturally (when we don’t know what pc is anymore) or with a simple heuristic

  23. Implementation Ideas • Use Dynamo • Hot trace as basis for specialization • Intuitively, follow the lifetime of an object as it travels through the program across function boundaries • Unfortunately, closed-source, and API isn’t expressive enough

  24. Implementation Ideas • JikesRVM • Java VM written in Java • Has a primitive framework for sampling • Has a fairly sophisticated framework for dynamic recompilation • Does aggressive inlining • Only instrument hot traces (but compiler is slow…)
