Aggressive Program Analysis Framework for Static Error Checking in Open64

Hongtao Yu, Wei Huo, ZhaoQing Zhang, XiaoBing Feng
Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
{ htyu, huowei, zqzhang, fxb }@ict.ac.cn

Outline
• Introduction
• Framework
• Field-sensitive pointer analysis
• Flow- and context-sensitive dataflow analysis
• Conclusion

Introduction
• Open64 is a high-performance compiler.
• However, the existing scalar analysis in Open64 is not precise enough to support static error checking.
• The original interprocedural framework is:
  • Flow-insensitive
  • Field-insensitive
  • Context-insensitive for some problems

Improvement
• Our aim is to improve the interprocedural phase so that the analysis becomes more precise.
• For flow-sensitivity, we integrate the intraprocedural analysis phase into the interprocedural phase.
• For context-sensitivity, we compute, for each procedure, transfer functions that model the procedure's effects.
• For field-sensitivity, we distinguish fields by the triple <base, offset, size>.
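The <base, offset, size> triple can be illustrated with a small C sketch. The `FieldRef` type, its member types, and the two helper functions are hypothetical names chosen for this example; the slides do not show how Open64 represents the triple internally.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field descriptor: a field is named by the memory object it
 * lives in (base), its byte offset in that object, and its size in bytes. */
typedef struct {
    int    base;    /* identifier of the enclosing memory object (assumed) */
    size_t offset;  /* byte offset of the field within the object */
    size_t size;    /* size of the field in bytes */
} FieldRef;

/* Two references denote exactly the same field. */
bool field_equal(FieldRef a, FieldRef b) {
    return a.base == b.base && a.offset == b.offset && a.size == b.size;
}

/* Two references may touch the same bytes: same base object and
 * intersecting half-open byte ranges [offset, offset + size). */
bool field_overlap(FieldRef a, FieldRef b) {
    return a.base == b.base
        && a.offset < b.offset + b.size
        && b.offset < a.offset + a.size;
}
```

Under this reading, two accesses to different fields of the same structure object are kept apart unless their byte ranges intersect, which is what a field-insensitive analysis cannot express.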

Error checking
• Several error checking problems can be abstracted as a common dataflow problem and solved using fixed-point theory on semi-lattices:
  • Uninitialized reference
  • Null pointer dereference
  • File I/O behavior

Contributions
• We have designed and implemented:
  • A flow- and context-sensitive interprocedural framework, under which several kinds of error checking are performed.
  • Two efficient field-sensitive pointer analyses:
    • One is a direct implementation of Steensgaard, B. Points-to Analysis by Type Inference of Programs with Structures and Unions. In Proceedings of the 6th International Conference on Compiler Construction, 1996.
    • The other is an improvement on it.

Framework
[Figure: analysis pipeline. Boxes shown: IPL summary phase; IPA_LINK; Interprocedural Analyzer; Build Call Graph; FICI pointer analysis; Interprocedural control flow optimization; Construct SSA Form for each procedure; FICS pointer analysis; FSCS pointer analysis; Static error checker.]
• FICI = flow- and context-insensitive.
• FICS = flow-insensitive but context-sensitive.
• FSCS = flow- and context-sensitive.
• FICS and FSCS have not been implemented yet.

Pointer analysis architecture
• Employs several field-sensitive algorithms that differ in precision and efficiency.
• The pointer analyses are performed in increasing order of precision.
• The first is a field-sensitive unification-based pointer analysis.
• Each analysis builds on the result of the previous one, which is more efficient than running the analyses separately.
• So far, only the field-sensitive unification-based pointer analysis has been implemented.

Interprocedural control flow optimization
• Dead Function Elimination (DFE)
  • Deletes functions that are never invoked
• Fake Control Flow Elimination (FCFE)
  • Recognizes the program points where control flow must terminate
  • A flow- and context-sensitive problem
  • Based on Gated Single Assignment (GSA) form
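One common way to find never-invoked functions is reachability over the call graph, sketched below in C. This is an illustration under stated assumptions, not the Open64 implementation: `CallGraph`, `MAX_FUNCS`, and both function names are hypothetical, and indirect calls through function pointers are assumed to be already resolved into call edges.

```c
#include <stdbool.h>

#define MAX_FUNCS 16  /* illustrative bound, not from the slides */

/* Hypothetical call graph: calls[i][j] is true when function i calls j. */
typedef struct {
    int  n;
    bool calls[MAX_FUNCS][MAX_FUNCS];
} CallGraph;

/* Mark every function reachable from f through call edges. */
void mark_reachable(const CallGraph *g, int f, bool live[]) {
    if (live[f]) return;
    live[f] = true;
    for (int j = 0; j < g->n; j++)
        if (g->calls[f][j]) mark_reachable(g, j, live);
}

/* Fill live[] starting from the program entry and return the number of
 * functions left unmarked; those are candidates for deletion. */
int find_dead_functions(const CallGraph *g, int entry, bool live[]) {
    int dead = 0;
    for (int i = 0; i < g->n; i++) live[i] = false;
    mark_reachable(g, entry, live);
    for (int i = 0; i < g->n; i++)
        if (!live[i]) dead++;
    return dead;
}
```

A function that is only called from another dead function is itself unreachable from the entry, so it is reported as dead as well.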

Fake Control Flow Elimination

Example taken from HyperSAT-1.7:

    #define error(str) report(__FILE__, __LINE__, str, ERROR)

    void parseClusterFile(preproc_t *p, char *name) {
        …..
        tmp_fsize = ftell(fp);
        if (tmp_fsize > INT_MAX) {
    L1:     error("File too large!");
        } else {
            fsize = (size_t)tmp_fsize;
        }
        rewind(fp);
        buf = xmalloc(fsize);
        …..
    }

    void report(char *file, unsigned line, char *str, char sev) {
        if (sev == ERROR) {
            …
    L2:     exit(EXIT_UNKNOWN);
        } else if (sev == WARRNING) {
            printf(…);
        } else if (sev == NOTE) {
            printf(…);
        }
    }

[Figure: control flow graph of parseClusterFile before FCFE. B1: tmp_fsize = ftell(fp). B2: tmp_fsize > INT_MAX. N edge to B3: fsize = (size_t) tmp_fsize. Y edge to B4: error("File too large!"). B5: rewind(fp); buf = xmalloc(fsize).]

Fake Control Flow Elimination

[Figure: control flow graph of the same parseClusterFile code after FCFE. B1: tmp_fsize = ftell(fp). B2: tmp_fsize > INT_MAX. N edge to B3: fsize = (size_t) tmp_fsize, which flows into B5: rewind(fp); buf = xmalloc(fsize). The Y edge now leads to exit, because the call to error at L1 passes ERROR and therefore reaches exit at L2.]

Unification-based pointer analysis
• We have implemented two efficient field-sensitive pointer analyses.
• One is a direct implementation of Steensgaard, B. Points-to Analysis by Type Inference of Programs with Structures and Unions. In Proceedings of the 6th International Conference on Compiler Construction, 1996.
• The other is an improvement on it.
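The core idea of unification-based pointer analysis can be sketched with a union-find structure. The C code below is a simplified, field-insensitive illustration only: it is not the field-sensitive algorithm of the cited paper and not the authors' implementation. All names (`pa_init`, `pa_find`, `pa_unify`, `pa_pointee_of`, `pa_address_of`, `pa_copy`) and the fixed-size arrays are assumptions made for the example.

```c
#define MAX_NODES 64  /* illustrative bound; no overflow checking is done */

/* Each node is a program variable or a fresh abstract location. */
static int parent[MAX_NODES];   /* union-find parent links */
static int pointee[MAX_NODES];  /* class this class points to, or -1 */
static int next_fresh;          /* next unused node index */

void pa_init(int num_vars) {
    for (int i = 0; i < MAX_NODES; i++) { parent[i] = i; pointee[i] = -1; }
    next_fresh = num_vars;
}

/* Representative of x's class, with path halving. */
int pa_find(int x) {
    while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
    return x;
}

/* Merge two classes and, recursively, the classes they point to. */
void pa_unify(int a, int b) {
    a = pa_find(a); b = pa_find(b);
    if (a == b) return;
    int pa = pointee[a], pb = pointee[b];
    parent[b] = a;
    if (pa == -1) pointee[a] = pb;
    else if (pb != -1) pa_unify(pa, pb);
}

/* Class that x points to; a fresh location is created on first use. */
int pa_pointee_of(int x) {
    int r = pa_find(x);
    if (pointee[r] == -1) pointee[r] = next_fresh++;
    return pa_find(pointee[r]);
}

/* p = &x : x joins the class that p points to. */
void pa_address_of(int p, int x) { pa_unify(pa_pointee_of(p), x); }

/* p = q : p and q must point to the same class. */
void pa_copy(int p, int q) {
    int a = pa_pointee_of(p);
    int b = pa_pointee_of(q);
    pa_unify(a, b);
}
```

Every assignment merges classes instead of adding subset edges, which is why this family of analyses runs in almost linear time at the cost of precision.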

Experiment explanation
• Example: the benchmark name.
• KLOC: the size of the benchmark, in thousands of lines of code.
• Field OPs: the number of indirect memory accesses to fields of structure objects.
• Classes: the number of alias classes over all the Field OPs above; memory access operations in the same alias class are regarded as aliased.
• Max: the maximum number of Field OPs in a single alias class.
• Min: the minimum number of Field OPs in a single alias class.
• Average: the average number of Field OPs per alias class.
• Time: the time taken to analyze the benchmark.
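The Classes, Max, Min, and Average columns can be derived from a per-access class assignment, as in the C sketch below. The `AliasStats` type, the `alias_stats` function, and the input encoding (one class id per Field OP, in the range 0 to MAX_CLASSES - 1) are assumptions for illustration; the slides do not describe how the numbers were collected.

```c
#include <stddef.h>

#define MAX_CLASSES 64  /* illustrative bound on class ids */

typedef struct {
    int    classes;  /* number of non-empty alias classes */
    int    max;      /* largest number of Field OPs in one class */
    int    min;      /* smallest number of Field OPs in one class */
    double average;  /* Field OPs per class */
} AliasStats;

/* class_of[i] is the alias class id of Field OP i. */
AliasStats alias_stats(const int class_of[], size_t n_ops) {
    int count[MAX_CLASSES] = {0};
    AliasStats s = {0, 0, 0, 0.0};
    for (size_t i = 0; i < n_ops; i++) count[class_of[i]]++;
    for (int c = 0; c < MAX_CLASSES; c++) {
        if (count[c] == 0) continue;  /* skip unused class ids */
        if (s.classes == 0 || count[c] > s.max) s.max = count[c];
        if (s.classes == 0 || count[c] < s.min) s.min = count[c];
        s.classes++;
    }
    if (s.classes > 0) s.average = (double)n_ops / s.classes;
    return s;
}
```

With this reading, a more precise analysis shows up as more classes and a smaller average class size for the same number of Field OPs.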

The main idea of the improvement is to take memory layout into account in the high-level analysis, so that fields of structure objects are distinguished precisely.

Field-sensitive Steensgaard Classification

    int i1, *i2, **i3, **i4;
    float f1, **f2;
    struct { int a, *b, *c; } s1, *s2;
    struct { int d, *e; float f, *g; } s3, *s4;

    s2 = &s1;  s4 = &s3;
    f2 = &s4->g;  *f2 = &f1;
    i3 = &s2->b;  i4 = &s2->c;  *i4 = &i1;
    i2 = (int*) s2;  i2 = (int*) s4;

Resulting classes (the points-to graph over τ1 to τ11 is omitted):
• τ1: i2
• τ2: s2
• τ3: s4
• τ4: i3
• τ5: i4
• τ6: f2
• τ7: s1, s3
• τ8: s1.a, s3.d
• τ9: s1.b, s3.e
• τ10: s1.c, s3.f, s3.g
• τ11: i1, f1

Aggressive Field-sensitive Classification

Resulting classes for the same program (the points-to graph over τ1 to τ13 is omitted):
• τ1: i2
• τ2: s2
• τ3: s4
• τ4: i3
• τ5: i4
• τ6: f2
• τ7: s1, s3
• τ8: s1.a, s3.d
• τ9: s1.b, s3.e
• τ10: s1.c
• τ11: s3.f, s3.g
• τ12: i1
• τ13: f1

Another Improvement
• Improving the type hierarchy
[Figure: (a) the original type hierarchy and (b) the improved type hierarchy, each drawn over the types object, struct, simple, and blank.]
We change the type hierarchy, and the type system accordingly, so that joining a simple type with a struct type yields another struct type that contains only the fields possibly overlapping the joined scalar.

Flow- and context-sensitive dataflow analysis
• A transfer function evaluator
  • Computes transfer functions for each procedure.
  • Traverses the procedure call graph bottom-up, from callees to callers.
  • To handle recursion, the call graph is reduced to an SCC-DAG, with iteration inside each SCC.
• A dataflow value propagator
  • A dataflow value is an element of the semi-lattice.
  • Propagates dataflow values from the entry to the exit of each procedure's local CFG in a top-down manner.
  • Traverses the procedure call graph top-down.
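The two traversal orders can be obtained from one depth-first walk of the call graph, as in the C sketch below. It assumes an acyclic call graph (recursion already collapsed into SCC nodes) and uses hypothetical names (`ProcGraph`, `visit`, `bottom_up_order`); it is not taken from the Open64 sources.

```c
#include <stdbool.h>

#define MAX_PROCS 16  /* illustrative bound */

/* Hypothetical call graph: calls[i][j] is true when procedure i calls j. */
typedef struct {
    int  n;
    bool calls[MAX_PROCS][MAX_PROCS];
} ProcGraph;

/* Depth-first post-order: a procedure is emitted after all its callees. */
static void visit(const ProcGraph *g, int p, bool seen[], int order[], int *len) {
    if (seen[p]) return;
    seen[p] = true;
    for (int q = 0; q < g->n; q++)
        if (g->calls[p][q]) visit(g, q, seen, order, len);
    order[(*len)++] = p;
}

/* Fill order[] with callees before callers and return its length.
 * Reading order[] backwards gives the top-down (callers first) order. */
int bottom_up_order(const ProcGraph *g, int order[]) {
    bool seen[MAX_PROCS] = {false};
    int len = 0;
    for (int p = 0; p < g->n; p++) visit(g, p, seen, order, &len);
    return len;
}
```

The evaluator would walk `order[]` forwards so that a callee's transfer function exists before any caller needs it, while the propagator would walk it backwards.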

Transfer functions
• A single transfer function has the form y = f(x1, …, xn).
• x1, …, xn is the list of formal-in parameters. Each is either a declared formal parameter or a location whose value at procedure entry may be accessed by the procedure or by the procedures it invokes.
• y is a formal-out parameter. Formal-out parameters include not only the return value of the procedure but also every location whose value at procedure exit may be accessed outside the procedure.
• The body of f gives the mapping from the inputs x1, …, xn to the output y.

Dataflow value propagator
• At a procedure entry
  • Perform the "meet" operation on the dataflow values from the different call sites.
• At each call site
  • Propagate the value of each actual-in parameter to the corresponding formal-in parameter of the callee.
  • Obtain the value of each formal-out parameter by applying the transfer function, and propagate it to the corresponding actual-out parameter.
• Iteration is needed for loops and recursion.

Checking uninitialized reference
• The task is abstracted as a dataflow problem.
• Every memory object at every reference site has one of the dataflow values "define", "may define", or "undefine"; the analysis determines which.
• If a memory object is initialized on all paths from the program entry to the current reference site, it has the value "define" at that site.
• If the memory object is not initialized on some path, the value is "undefine".
• Initialization through indirect memory operations (e.g. dereferencing a pointer) results in the value "may define".
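The three values can be combined where paths join with a meet operation. The C sketch below assumes they form a chain with "undefine" lowest and "define" highest, so that meet is the minimum; the slides do not spell out the ordering, and the names `InitState`, `meet`, and `meet_all` are hypothetical.

```c
/* Assumed ordering: UNDEFINE < MAY_DEFINE < DEFINE. */
typedef enum { UNDEFINE = 0, MAY_DEFINE = 1, DEFINE = 2 } InitState;

/* Meet of two values: the weaker guarantee wins. */
InitState meet(InitState a, InitState b) {
    return a < b ? a : b;
}

/* Meet over all incoming values, for example the predecessors of a CFG
 * join point or the call sites of a procedure. */
InitState meet_all(const InitState in[], int n) {
    InitState r = DEFINE;  /* top element, the identity of meet */
    for (int i = 0; i < n; i++) r = meet(r, in[i]);
    return r;
}
```

With this ordering, a single path on which the object is uninitialized is enough to make the value at the reference site "undefine", matching the description above.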

Checking uninitialized reference (2)
• First, compute a transfer function for each procedure. The transfer function states which global variables the procedure modifies and how it modifies them.
• If a global variable is modified by direct assignments on every path from the procedure's entry to its exit, the variable is "must mod".
• If the variable is modified on every path by either direct or indirect assignments, it is "may mod".
• Otherwise the variable is left unmodified on some path, so it is "may not mod".
• The transfer function evaluator performs an interprocedural modification side-effect analysis over the whole program.

Checking uninitialized reference (3)
• Then propagate the modification property from the procedure entry to the exit.
• The "must mod" and "may mod" values of a transfer function are both regarded as "define".
• The "may not mod" value is regarded as "undefine".
• Currently, values are propagated only for scalar variables, either local or global.
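The modification summary and its mapping to "define" and "undefine" can be sketched in C as follows. This is one possible reading of the slides, not the authors' code: the names `ModEffect`, `mod_seq`, `mod_join`, and `effect_defines` are hypothetical, and the sequencing rule (the stronger effect along one path wins) is an assumption.

```c
/* Assumed ordering: MAY_NOT_MOD < MAY_MOD < MUST_MOD. */
typedef enum { MAY_NOT_MOD = 0, MAY_MOD = 1, MUST_MOD = 2 } ModEffect;

/* Two statements in sequence on one path: the stronger effect wins. */
ModEffect mod_seq(ModEffect first, ModEffect second) {
    return first > second ? first : second;
}

/* Two alternative paths that join: the weaker effect wins, because the
 * summary must hold on every path from entry to exit. */
ModEffect mod_join(ModEffect a, ModEffect b) {
    return a < b ? a : b;
}

/* Mapping stated on the slide: "must mod" and "may mod" count as
 * "define", "may not mod" counts as "undefine". Returns 1 for "define". */
int effect_defines(ModEffect e) {
    return e != MAY_NOT_MOD;
}
```

For example, a procedure that assigns a global directly in one branch and through a pointer in the other is summarized as "may mod" and counts as defining it, while a procedure that assigns it in only one branch is "may not mod".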

• Time: the time taken to check the benchmark.
• Reports: the total number of warnings produced on the benchmark.
• Bugs: the number of true bugs found.
• FP Rate: the false positive rate.
• Comparison with GCC using the warning option -Wuninitialized: GCC reports no warnings, because it can check uninitialized references only for automatic variables and only intraprocedurally.

Conclusion
• We have presented our work on building an aggressive program analysis framework for error checking in Open64.
• We:
  • Integrated intraprocedural analysis into the interprocedural phase to perform flow- and context-sensitive whole-program analysis.
  • Made the original alias analysis field-sensitive and compared the three unification-based methods.
• One error checking instance, checking for uninitialized references, was presented.

Thank You!