Unbox Variable Lookup Optimization: Enhancing Performance in Compiler Transformations
This document outlines an unboxed-variable cache structure and the lookup instructions built on it, aimed at speeding up variable access in R bytecode. Key components include a software cache for unboxed scalar variables, state-change policies for the cache cells, and special handling of for loops. The compiler's transformation passes identify unboxing opportunities, which makes variable lookup more efficient and reduces object creation. The performance evaluation covers two for-loop addition examples, for which the slides report an obvious gain.
R Unbox Variable Lookup Optimization Apr. 2012
Outline
• Unbox Variable Cache Structure
• Unbox Variable Lookup Instructions and Compiler Transformation
• Performance Evaluation
  • Obvious Performance Gain in ForLoopAdd example
Unbox Variable Cache Structure
• A Software Cache for Unboxed Scalar Variables
• Data:
  • 64 bits per cell: stores an int or a real
  • Length: the size of the current frame’s constant table
  • Use the symbol’s index to access the cache directly
  • May waste some space, but simple
• State:
  • 4 bits per cell are used at present
  • Bits 0~1: state (INVALID, VALID, MODIFIED)
  • Bits 2~3: cache type (Logical, Int, Real)
• Counter for each frame
  • modified_count: the total number of modified cache cells
[Diagram: the current frame with its 64-bit data cache and its state cache]
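The layout above can be sketched in a few lines of Python. This is a minimal model, not the presenters' implementation: the class name `UnboxCache`, the helper names, and the numeric encodings of the state and type fields are assumptions, since the slides only name the fields and their bit positions.

```python
import struct

# Assumed numeric encodings; the slides give only the field names.
INVALID, VALID, MODIFIED = 0, 1, 2      # bits 0-1
LOGICAL, INT, REAL = 0, 1, 2            # bits 2-3


def pack_state(state, ctype):
    """Pack the cell state into bits 0-1 and the cache type into bits 2-3."""
    return (ctype << 2) | state


def unpack_state(bits):
    """Return (state, cache type) from the 4 state bits."""
    return bits & 0b11, (bits >> 2) & 0b11


class UnboxCache:
    """Per-frame cache, indexed directly by symbol index (hypothetical name)."""

    def __init__(self, const_table_size):
        # One 64-bit payload and one 4-bit state per constant-table slot.
        self.data = [0] * const_table_size
        self.state = [pack_state(INVALID, LOGICAL)] * const_table_size
        self.modified_count = 0

    def write(self, sym_idx, value, ctype):
        """Store a scalar as a raw 64-bit pattern and mark the cell MODIFIED."""
        if ctype == REAL:
            raw = struct.unpack("<Q", struct.pack("<d", value))[0]
        else:
            raw = value & 0xFFFFFFFFFFFFFFFF
        old_state, _ = unpack_state(self.state[sym_idx])
        if old_state != MODIFIED:
            self.modified_count += 1
        self.data[sym_idx] = raw
        self.state[sym_idx] = pack_state(MODIFIED, ctype)

    def read(self, sym_idx):
        """Decode the 64-bit payload according to the cached type."""
        state, ctype = unpack_state(self.state[sym_idx])
        if state == INVALID:
            raise KeyError(sym_idx)
        raw = self.data[sym_idx]
        if ctype == REAL:
            return struct.unpack("<d", struct.pack("<Q", raw))[0]
        if ctype == INT:
            # Sign-extend the 64-bit pattern back to a Python int.
            return raw - (1 << 64) if raw >> 63 else raw
        return bool(raw)
```

Indexing by symbol index wastes cells for symbols that never hold a scalar, but a lookup is a single array access with no hashing, which is the trade-off the slide describes.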
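Cache State
• Simple state change policies
  • INVALID → VALID: first time get var and unbox
  • VALID → VALID: get var
  • INVALID → MODIFIED: set unbox var
  • VALID → MODIFIED: set unbox var
  • MODIFIED → MODIFIED: get var / set unbox var
  • MODIFIED → VALID: write back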
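One way to read the state diagram is as a small transition table. The sketch below is an interpretation of the diagram labels, not code from the presentation: the event names `get`, `set`, and `write_back` are hypothetical, and leaving a non-MODIFIED cell unchanged on write-back is an assumption.

```python
from enum import Enum


class State(Enum):
    INVALID = 0
    VALID = 1
    MODIFIED = 2


# (current state, event) -> next state, as read from the slide's diagram.
TRANSITIONS = {
    (State.INVALID, "get"): State.VALID,        # first get: look up and unbox
    (State.VALID, "get"): State.VALID,          # cache hit
    (State.MODIFIED, "get"): State.MODIFIED,    # cache hit on a dirty cell
    (State.INVALID, "set"): State.MODIFIED,
    (State.VALID, "set"): State.MODIFIED,
    (State.MODIFIED, "set"): State.MODIFIED,
    (State.MODIFIED, "write_back"): State.VALID,
}


def step(state, event):
    """Return the next state; unlisted pairs keep the state (assumption)."""
    return TRANSITIONS.get((state, event), state)
```

Only MODIFIED cells need boxing at write-back, which is why a per-frame modified_count is enough to skip the write-back when nothing changed.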
Unbox Variable Lookup Instructions
• Get Variable
  • GETLOGICALUNBOX, GETINTUNBOX, GETREALUNBOX
  • Merge the semantics of GETVAR, GUARD and UNBOX
  • Three operands: 1) symbol index; 2) expected type; 3) guard failure PC
• Set Variable
  • SETUNBOXVAR
  • Writes a scalar value into the cache
  • If the variable is new, it is not yet defined in the current frame; it is defined at write-back
• POP value from the stack
  • POPUNBOX
  • Slightly different from POP: it also needs to pop the unbox type stack
• Write back modified scalar variables
  • UNBOXWRITEBACK
  • Boxes all modified variables and sets them in the current frame
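The semantics of these instructions can be sketched as follows. Everything here is a simplified model under stated assumptions: `Frame`, `get_unbox`, `set_unbox_var`, and `unbox_write_back` are hypothetical names, a boxed scalar is modelled as a one-element Python list, and a missing variable is treated as a guard failure rather than triggering a lookup in enclosing environments.

```python
INVALID, VALID, MODIFIED = "INVALID", "VALID", "MODIFIED"


class Frame:
    """Toy frame: boxed variables plus the unbox cache (hypothetical)."""

    def __init__(self, size):
        self.vars = {}                  # symbol index -> boxed value
        self.cache = [None] * size      # unboxed scalars
        self.state = [INVALID] * size
        self.modified_count = 0


def get_unbox(frame, stack, sym_idx, expected_type, fail_pc, next_pc):
    """GETVAR + GUARD + UNBOX in one step; returns the PC to continue at."""
    if frame.state[sym_idx] == INVALID:
        boxed = frame.vars.get(sym_idx)
        # Guard: must be a length-1 value of the expected scalar type.
        if boxed is None or len(boxed) != 1 or type(boxed[0]) is not expected_type:
            return fail_pc
        frame.cache[sym_idx] = boxed[0]
        frame.state[sym_idx] = VALID
    elif type(frame.cache[sym_idx]) is not expected_type:
        return fail_pc
    stack.append(frame.cache[sym_idx])
    return next_pc


def set_unbox_var(frame, stack, sym_idx):
    """SETUNBOXVAR: copy the top of stack into the cache, no boxing."""
    if frame.state[sym_idx] != MODIFIED:
        frame.modified_count += 1
    frame.cache[sym_idx] = stack[-1]
    frame.state[sym_idx] = MODIFIED


def unbox_write_back(frame):
    """UNBOXWRITEBACK: box every MODIFIED cell and define it in the frame."""
    if frame.modified_count == 0:
        return
    for idx, st in enumerate(frame.state):
        if st == MODIFIED:
            frame.vars[idx] = [frame.cache[idx]]
            frame.state[idx] = VALID
    frame.modified_count = 0
```

Note that a variable written with `set_unbox_var` does not appear in `frame.vars` until `unbox_write_back` runs, which matches the "define it when writing back" behaviour described for new variables.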
Unbox Variable Lookup Compiler Transformation
• Current Compiling Passes
  • Decoding Pass
    • Build Jump Target
  • Type Annotation Pass
  • Unbox Opportunity Identification Pass
  • Code Unbox Opt Transformation Pass
    • Will use the new instructions
  • Code Clean Pass
  • PC Fix Pass
    • Jump Target Fix
  • Encoding Pass
[Slide annotation: Add new policies]
Unbox Opportunity Identification Pass
• SETVAR
  • Original: the top stack element should be a boxed value
  • New: the top stack element could be unboxed
  • Could expose more unbox opportunities
  PC  STMT
  1   LDCONST, 1
  3   SETVAR, 2
  5   POP
[Slide annotation on the listing: Can unbox this one]
Code Unbox Opt Transformation Pass
• SETVAR
  • If the top stack element is a scalar value
    • SETVAR → UNBOX; SETUNBOXVAR; BOX
    • Insert GUARD if needed (when the top stack element’s type comes from the profile)
    • The final BOX: always maintain the stack’s shape in this pass
• CALL, RETURN
  • Add UNBOXWRITEBACK in front of it
Original:
  PC  STMT
  1   LDCONST, 1      //Real
  3   SETVAR, 2
  5   POP
After the transformation:
  PC  STMT
  1   LDCONST, 1
  3   UNBOXREAL
  4   SETUNBOXVAR, 2
  6   BOXREAL
  7   POP
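The rewrite rules of this pass can be sketched as a single loop over the instruction list. This is an illustrative model only: the tuple encoding of instructions, the function name `unbox_transform`, the shape of the type annotations, and the operand of the inserted `GUARD` are assumptions, and PCs are ignored because a later pass fixes them.

```python
def unbox_transform(code, top_types):
    """Rewrite SETVAR on scalars and add write-backs before CALL/RETURN.

    code      -- list of instruction tuples, e.g. ("SETVAR", 2)
    top_types -- per instruction: None, or (type name, from_profile) for
                 the element on top of the stack (assumed annotation shape)
    """
    out = []
    for instr, ann in zip(code, top_types):
        op = instr[0]
        if op == "SETVAR" and ann is not None:
            ty, from_profile = ann
            if from_profile:
                # Profiled types are only a prediction, so guard them.
                out.append(("GUARD", ty))
            out.append(("UNBOX" + ty,))
            out.append(("SETUNBOXVAR",) + instr[1:])
            # Re-box so the stack shape is unchanged after this pass.
            out.append(("BOX" + ty,))
        elif op in ("CALL", "RETURN"):
            # Other code may read the frame, so flush the cache first.
            out.append(("UNBOXWRITEBACK",))
            out.append(instr)
        else:
            out.append(instr)
    return out


# The slide's example: the constant is known to be a real, no guard needed.
example = unbox_transform(
    [("LDCONST", 1), ("SETVAR", 2), ("POP",)],
    [None, ("REAL", False), None],
)
```

Emitting the trailing BOX even when it is immediately popped keeps this pass simple; removing such redundant pairs is left to the clean-up pass.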
Code Clean Pass (Instruction Combine)
• GETVAR + GUARD + UNBOX
  • According to the type, transform to GETLOGICALUNBOX, GETINTUNBOX, or GETREALUNBOX
• BOX + POP
  • Transform to POPUNBOX
  • No need to box the value again, but the unbox type stack must still be popped
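The two combining rules amount to a peephole pass. The sketch below assumes a tuple encoding of instructions and a `GUARD` of the form `("GUARD", type, fail_pc)`; both are hypothetical, as is the function name `clean`. It also ignores jump targets, which a real pass would have to respect before merging instructions.

```python
def clean(code):
    """Combine GETVAR+GUARD+UNBOX and BOX+POP into single instructions."""
    out = []
    i = 0
    while i < len(code):
        a = code[i]
        b = code[i + 1] if i + 1 < len(code) else None
        c = code[i + 2] if i + 2 < len(code) else None
        if (a[0] == "GETVAR" and b is not None and b[0] == "GUARD"
                and c is not None and c[0] == "UNBOX" + b[1]):
            sym, ty, fail_pc = a[1], b[1], b[2]
            # Operands: symbol index, expected type, guard failure PC.
            out.append(("GET" + ty + "UNBOX", sym, ty, fail_pc))
            i += 3
        elif a[0].startswith("BOX") and b is not None and b[0] == "POP":
            # The value is discarded anyway, so skip the boxing.
            out.append(("POPUNBOX",))
            i += 2
        else:
            out.append(a)
            i += 1
    return out


cleaned = clean([
    ("LDCONSTREAL", 1), ("SETUNBOXVAR", 2), ("BOXREAL",), ("POP",),
    ("GETVAR", 2), ("GUARD", "REAL", 8), ("UNBOXREAL",),
])
```

Because combining changes instruction lengths, the PCs of later instructions shift, which is why a PC fix pass follows the clean pass in the pipeline.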
Transformation Example
• RealAdd
  run <- function() {
    a <- 101
    b <- a + 202
    print(b)
  }
Original:
  PC  STMT
  1   LDCONST, 1
  3   SETVAR, 2
  5   POP
  6   GETVAR, 2
  8   LDCONST, 3
  10  ADD, 4
  12  SETVAR, 5
  14  POP
  15  GETFUN, 6
  17  MAKEPROM, 7
  19  CALL, 8
  21  RETURN
After the transformation:
  PC  STMT
  1   LDCONSTREAL, 1
  3   SETUNBOXVAR, 2
  5   POP_UNBOX
  6   GETREALUNBOX, 2, 2, 8
  10  LDCONSTREAL, 3
  12  REALADD
  13  SETUNBOXVAR, 5
  15  POP_UNBOX
  16  GETFUN, 6
  18  MAKEPROM, 7
  20  UNBOXWRITEBACK
  21  CALL, 8
  23  UNBOXWRITEBACK
  24  RETURN
Special Handling in ForLoop
• The ForLoop updates the loop variable in STEPFOR
  • The semantics are similar to those of GETVAR
• If the loop variable is Logical/Integer/Real
  • Just get the scalar value and load it into the cache
    • Save the value
    • Change the cache state to VALID
Performance Evaluation on ForLoop Examples
• Examples
  ForIntAdd:
    run <- function() {
      r <- 11
      for (i in 1:1000000) {
        r <- r + i
      }
      print(r)
    }
  ForIntAdd3:
    run <- function() {
      r <- 11
      for (i in 1:1000000) {
        r <- r + 1 + i + 2
      }
      print(r)
    }
• Experiment Methodology
  • Running Method
    • Run “run()” 10 times
    • 1st time: profile
    • 2nd time: trigger compiling/optimization, and start to use the new code
    • 3rd-10th: just use the optimized code
  • All the following results are normalized
    • Average of 5 runs in each case
    • Normalized by only using the 3rd-10th runs
    • And normalized to 1 iteration
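The normalization described above is simple arithmetic, sketched below. The function name `normalized_time` and the sample timings are made up for illustration, and reading "normalized to 1 iteration" as the mean time of one `run()` call is an assumption; the slides do not say whether it means one call or one loop iteration.

```python
from statistics import mean


def normalized_time(runs, warmup=2):
    """Average steady-state time per run() call.

    runs   -- one list of 10 per-call timings for each of the 5 runs
    warmup -- calls to discard (1st: profiling, 2nd: compilation)
    """
    # Keep only the 3rd-10th calls of each run, then average per run.
    per_run = [mean(times[warmup:]) for times in runs]
    # Average across the repeated runs.
    return mean(per_run)


# Made-up timings in seconds: two slow warm-up calls, then steady state.
sample = [[10.0, 9.0] + [2.0] * 8 for _ in range(5)]
steady = normalized_time(sample)
```

Discarding the first two calls keeps profiling and compilation cost out of the reported figure, so the numbers reflect the optimized code only.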
Performance Result
[Figures: performance results for ForIntAdd and ForIntAdd3]
Analysis - Performance Gain Source
• Much more efficient variable lookup
• Much less object creation during SETVAR
• ForLoop example: the same effect as hoisting all box/unbox operations out of the loop
Next Step
• Work on larger test cases
  • Code pieces extracted from real benchmarks
  • E.g. Shootout
• Add new instructions/compiler transformations
  • To support the code used in these new test cases