
Chord: A Versatile Platform for Program Analysis

Chord: A Versatile Platform for Program Analysis. Mayur Naik, Intel Labs, Berkeley. PLDI 2011 Tutorial. What is Chord? A static and dynamic program analysis framework for Java, started in 2006 as a static CHecker Of Races and Deadlocks, and publicly available under the New BSD License.


Presentation Transcript


  1. Chord: A Versatile Platform for Program Analysis • Mayur Naik, Intel Labs, Berkeley • PLDI 2011 Tutorial

  2. What is Chord? • Static and dynamic program analysis framework for Java • Started in 2006 as a static CHecker Of Races and Deadlocks (hence the name) • Publicly available under New BSD License • Key goals: • versatile: applies to various analyses, domains, platforms • extensible: users can build own analyses atop given ones • productive: facilitates rapid prototyping of analyses • robust: deterministic, handles partial programs, etc.

  3. Key Features of Chord • Many standard static and dynamic analyses • Writing/solving analyses using Datalog/BDDs • Analyses as “building blocks” • Context-sensitive static analysis framework • Dynamic analysis framework

  4. Outline of Tutorial • Part 1: • Getting Started With Chord • Program Representation • Part 2: • Analysis Using Datalog/BDDs • Chaining Analyses Together • Part 3: • Context-Sensitive Analysis • Dynamic Analysis

  5. Downloading Chord • Stable Binary Release • http://jchord.googlecode.com/files/chord-bin-2.0.tar.gz • Stable Source Release • 1. http://jchord.googlecode.com/files/chord-src-2.0.tar.gz (mandatory) • Chord’s source code + JARs of libraries used by Chord • 2. http://jchord.googlecode.com/files/chord-libsrc-2.0.tar.gz (optional) • (adapted) Java source code of libraries used by Chord • Latest Development Snapshot • svn checkout http://jchord.googlecode.com/svn/trunk/ chord • Or check out only relevant directories under trunk/: • main/ (released as 1 above) • libsrc/ (released as 2 above) • test/ (Chord’s regression test suite) • … (many more)

  6. Compiling Chord • Requirements: • JVM for Java 5 or higher • Apache Ant • C++ compiler (not needed by default) • Optional: edit chord.properties • to enable the C BuDDy library: set chord.use.buddy=true • to enable the C++ JVMTI agent: set chord.use.jvmti=true • Run in main directory: ant compile • Contents of main/ shown on the slide: build.xml, chord.properties, agent/, bdd/, doc/, examples/, lib/, src/, web/, chord.jar, libbuddy.so | buddy.dll | libbuddy.dylib, libchord_instr_agent.so

  7. Running Chord • Requirements: JVM for Java 5 or higher • no other dependencies (e.g., Eclipse) • Run either command in any directory: • ant -f <…>/build.xml [-Dkey_i=val_i]* run • requires Apache Ant • not available in Binary Release • java -cp <…>/chord.jar [-Dkey_i=val_i]* chord.project.Boot • where <…> denotes path of Chord’s main/ directory • -Dkey_i=val_i sets value of system property key_i to val_i

  8. Chord Properties • All inputs to Chord are specified via System Properties • conventionally named chord.* (e.g., chord.work.dir) • Three choices with decreasing precedence: • On command line via -Dkey=val format • use to specify properties specific to the current Chord run • Via user-specified file denoted by chord.props.file • use to specify properties specific to program being analyzed (e.g., its main class, classpath, etc.) • default value = "[chord.work.dir]/chord.properties" • Via pre-defined file main/chord.properties • use to specify properties that must hold in every Chord run (e.g., maximum memory to be used by JVM)
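The three-level precedence on this slide can be illustrated with a small lookup routine. This is a minimal sketch, not Chord's actual property loader: the class `PropertyPrecedenceSketch` and its `resolve` method are hypothetical, and the three `Properties` objects stand in for the command line, the file named by chord.props.file, and main/chord.properties.

```java
import java.util.Properties;

// Hypothetical helper (not part of Chord): resolves a property by consulting
// three sources in decreasing order of precedence, as the slide describes.
class PropertyPrecedenceSketch {
    static String resolve(String key, Properties commandLine,
                          Properties userFile, Properties predefinedFile) {
        String v = commandLine.getProperty(key);              // 1. -Dkey=val
        if (v == null) v = userFile.getProperty(key);         // 2. chord.props.file
        if (v == null) v = predefinedFile.getProperty(key);   // 3. main/chord.properties
        return v;                                             // null if set nowhere
    }

    public static void main(String[] args) {
        Properties cmd = new Properties();
        Properties user = new Properties();
        Properties predefined = new Properties();
        predefined.setProperty("chord.verbose", "1");
        user.setProperty("chord.verbose", "2");
        user.setProperty("chord.main.class", "foo.Main");
        cmd.setProperty("chord.verbose", "0");

        System.out.println(resolve("chord.verbose", cmd, user, predefined));    // 0
        System.out.println(resolve("chord.main.class", cmd, user, predefined)); // foo.Main
    }
}
```

The command-line value wins for chord.verbose even though both files also set it, while chord.main.class falls through to the user-specified file.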

  9. Architecture of Chord • [Figure: architecture diagram] • Inputs: Java program (program source, program bytecode, program inputs) • Components: bytecode translator (joeq) producing program quadcode; bytecode instrumentor (javassist); static analysis, Datalog analysis, dynamic analysis; bddbddb and BuDDy; Classic or Modern Runtime; saxon XSLT and Java2HTML • Outputs: analysis result in XML, analysis result in HTML • Example program analysis: the user demands one analysis to run; the analyses producing domains D1, D2 and relations R1, R2, R12 each start, block on the domains and relations they consume (e.g., "starts, blocks on D1"), then resume and run to finish

  10. Setting Up a Java Program for Analysis • Directory layout:
       example/
         src/foo/Main.java ...
         classes/foo/Main.class ...
         lib/src/taz/ ...
         lib/jar/taz.jar
         chord.properties
         chord_output/bddbddb/
     • Contents of chord.properties:
       chord.main.class=foo.Main
       chord.class.path=classes:lib/jar/taz.jar
       chord.src.path=src:lib/src
       chord.run.ids=0,1
       chord.args.0="-thread 1 -n 10"
       chord.args.1="-thread 2 -n 50"
     • Command to run in Chord’s main directory: ant -Dchord.work.dir=<…>/example run

  11. Java Program Representations • Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode

  12. Example: Java Source Code • File test/HelloWorld.java:
       1: package test;
       2:
       3: public class HelloWorld {
       4:     public static void main(String[] args) {
       5:         System.out.println("Hello World!");
       6:     }
       7: }

  13. Pretty-Printing Java Bytecode • Command: javap -private -verbose -classpath <CLASS_PATH> [-bootclasspath <BOOT_CLASS_PATH>] <CLASS_NAME> • Run "javac -g" on .java files to keep debug info (lines, vars, source) in .class files • Output:
       public class test.HelloWorld extends java.lang.Object
       SourceFile: "HelloWorld.java"
       Constant pool:
       const #1 = Method #6.#20; // java/lang/Object."<init>":()V
       ...
       public static void main(java.lang.String[]);
       Code:
        Stack=2, Locals=1, Args_size=1
        0: getstatic #2; // Field java/lang/System.out:Ljava/io/PrintStream;
        3: ldc #3; // String Hello World!
        5: invokevirtual #4; // Method java/io/PrintStream.println:...
        8: return
       LineNumberTable:
        line 5: 0
        line 6: 8
       LocalVariableTable:
        Start Length Slot Name Signature
        0     9      0    args [Ljava/lang/String;

  14. Java Program Representations • Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode • Java bytecode (.class) → Joeq → Quadcode

  15. Pretty-Printing Quadcode • Command: ant -Dchord.work.dir=<WORK_DIR> -Dchord.out.file=<OUTPUT_FILE> -Dchord.print.classes=<CLASS_NAMES> -Dchord.verbose=0 run • Alternative options: -Dchord.print.methods=<METHOD_SIGNS>, -Dchord.print.all.classes=true • Replace any `$` by `#` to prevent shell interpretation • Output:
       Class: test.HelloWorld
       Method: main:([Ljava/lang/String;)V@test.HelloWorld
       0#1 5#3 5#2 8#4
       Control flow graph:
       BB0 (ENTRY) (in: <none>, out: BB2)
       BB2 (in: BB0 (ENTRY), out: BB1 (EXIT))
       1: GETSTATIC_A T1, .out
       3: MOVE_A T2, AConst: "Hello World!"
       2: INVOKEVIRTUAL_V println:(Ljava/lang/String;)V@java.io.PrintStream, (T1,T2)
       4: RETURN_V
       BB1 (EXIT) (in: BB2, out: <none>)
       Exception handlers: []
       Register factory: Registers: 3

  16. Type Hierarchy • jq_Type • subclasses: jq_Primitive, jq_Reference • jq_Reference • subclasses: jq_Class, jq_Array • (all defined in package joeq.Class)

  17. chord.program.Program API • static Program g() • the Program object for the program being analyzed (calling it triggers analysis scope construction) • IndexSet<jq_Type> getTypes() • all types in classes that may be loaded • IndexSet<jq_Reference> getClasses() • all classes that may be loaded • IndexSet<jq_Method> getMethods() • all methods that may be called

  18. joeq.Class.jq_Class API • String getName() • fully-qualified name of the class, e.g., "java.lang.String[]" • jq_InstanceField[] getDeclaredInstanceFields() • all instance fields declared in the class • jq_StaticField[] getDeclaredStaticFields() • all static fields declared in the class • jq_InstanceMethod[] getDeclaredInstanceMethods() • all instance methods declared in the class • jq_StaticMethod[] getDeclaredStaticMethods() • all static methods declared in the class

  19. joeq.Class.jq_Method API • String getName().toString() • name of the method • String getDesc().toString() • descriptor of the method, e.g., "(Ljava/lang/String;)V" • jq_Class getDeclaringClass() • declaring class of the method • ControlFlowGraph getCFG() • control-flow graph of the method • Quad getQuad(int bci) • first quad at the given bytecode offset (null if missing) • int getLineNumber(int bci) • line number of the given bytecode offset (-1 if missing) • String toString() • ID of the method in format mName:mDesc@cName

  20. Control Flow Graphs (CFGs) • Each CFG contains: • a set of registers (register factory) • a directed graph whose nodes are basic blocks and edges denote control flow • Register Factory: • one register per argument (local variables) • named R0, R1, …, Rn • one register per temporary (stack variables) • named Tn+1, Tn+2, …, Tm • Basic Block (BB): • sequence of primitive statements (quads) • unique entry BB: no quads and no incoming edges • unique exit BB: no quads and no outgoing edges

  21. joeq.Compiler.Quad.ControlFlowGraph API • RegisterFactory getRegisterFactory() • set of all local variables • EntryOrExitBasicBlock entry() • unique entry basic block • EntryOrExitBasicBlock exit() • unique exit basic block • List<BasicBlock> reversePostOrder() • list of all basic blocks in reverse post-order • jq_Method getMethod() • containing method of the CFG
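Reverse post-order is the traversal order that reversePostOrder() is described as returning. The sketch below computes it on a toy graph and does not use joeq: the class `ReversePostOrderSketch`, the integer block IDs, and the successor map are all hypothetical stand-ins for BasicBlock objects and their getSuccessors() lists.

```java
import java.util.*;

// Hypothetical stand-alone sketch: reverse post-order over a toy CFG whose
// basic blocks are integers and whose edges are given as successor lists.
class ReversePostOrderSketch {
    static List<Integer> reversePostOrder(Map<Integer, List<Integer>> succs, int entry) {
        List<Integer> post = new ArrayList<>();
        visit(entry, succs, new HashSet<>(), post);
        Collections.reverse(post);   // reversing the post-order gives reverse post-order
        return post;
    }

    private static void visit(int bb, Map<Integer, List<Integer>> succs,
                              Set<Integer> seen, List<Integer> post) {
        if (!seen.add(bb)) return;   // already visited
        for (int s : succs.getOrDefault(bb, Collections.emptyList()))
            visit(s, succs, seen, post);
        post.add(bb);                // emitted after all of its successors
    }

    public static void main(String[] args) {
        // Entry block 0, exit block 1, and a diamond 2 -> {3, 4} -> 5 in between.
        Map<Integer, List<Integer>> succs = new HashMap<>();
        succs.put(0, Arrays.asList(2));
        succs.put(2, Arrays.asList(3, 4));
        succs.put(3, Arrays.asList(5));
        succs.put(4, Arrays.asList(5));
        succs.put(5, Arrays.asList(1));
        System.out.println(reversePostOrder(succs, 0)); // [0, 2, 4, 3, 5, 1]
    }
}
```

In an acyclic graph this order places every block before its successors, which is why forward dataflow analyses commonly iterate in it.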

  22. joeq.Compiler.Quad.BasicBlock API • int size() • number of quads in the basic block • Quad getQuad(int index) • quad at the given 0-based index • List<BasicBlock> getPredecessors() • list of immediate predecessor basic blocks • List<BasicBlock> getSuccessors() • list of immediate successor basic blocks • jq_Method getMethod() • containing method of the basic block

  23. Quad Instructions • Each quad contains an operator and up to 4 operands • Example: getfield l = b.f:
       Operand lo = Getfield.getDest(q);
       Operand bo = Getfield.getBase(q);
       if (lo instanceof RegisterOperand && bo instanceof RegisterOperand) {
           Register l = ((RegisterOperand) lo).getRegister();
           Register b = ((RegisterOperand) bo).getRegister();
           jq_Field f = Getfield.getField(q).getField();
           ...
       }

  24. Kinds of Quads • joeq.Compiler.Quad.Operator • Move, Getstatic, Branch, Invoke • Phi, Putstatic, IntIfCmp, InvokeVirtual • Unary, Getfield, Goto, InvokeStatic • Binary, Putfield, Jsr, InvokeInterface • New, ALoad, Ret • NewArray, AStore, LookupSwitch • MultiNewArray, Checkcast, TableSwitch • ALength, Instanceof • Monitor, Return

  25. joeq.Compiler.Quad.Quad API • Operator getOperator() • kind of the quad • int getBCI() • bytecode offset of the quad in its containing method • String toByteLocStr() • unique identifier of the quad in format offset!mName:mDesc@cName • String toJavaLocStr() • location of the quad in format fileName:lineNum in Java source code • String toLocStr() • location of the quad in both Java bytecode and source code • String toVerboseStr() • verbose description of the quad (its location plus contents) • BasicBlock getBasicBlock() • containing basic block of the quad

  26. Traversing Quadcode
       import chord.program.Program;
       import joeq.Class.jq_Method;
       import joeq.Compiler.Quad.*;

       QuadVisitor qv = new QuadVisitor.EmptyVisitor() {
           public void visitNew(Quad q) { ... }
           public void visitPhi(Quad q) { ... }
           ...
       };
       Program program = Program.g();
       for (jq_Method m : program.getMethods()) {
           if (!m.isAbstract()) {
               ControlFlowGraph cfg = m.getCFG();
               for (BasicBlock bb : cfg.reversePostOrder())
                   for (Quad q : bb.getQuads())
                       q.accept(qv);
           }
       }

  27. Java Program Representations • Java source code (.java) → j2h or Java2HTML → HTMLized Java source code (.html) • Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode • Java bytecode (.class) → Joeq → Quadcode

  28. HTMLizing Java Source Code • Programmatically: • import chord.program.Program; Program program = Program.g(); program.HTMLizeJavaSrcFiles(); • From command line: • Use j2h: ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_xref • Use Java2HTML: ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_fast

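  29. Java Program Representations • Java source code (.java) → j2h or Java2HTML → HTMLized Java source code (.html) • Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode • Java bytecode (.class) → Joeq → Quadcode • Also shown: Jasmin code (.j), connected by edges labeled Chord and Jasmin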

  30. Analysis Scope Construction • Determines which parts of the program to analyze • Computed in either of these cases: • chord.build.scope=true • chord.program.Program.g() is called • Algorithm specified by chord.scope.kind=[rta|cha|dynamic] • Rapid Type Analysis (RTA) • Class Hierarchy Analysis (CHA) • Dynamic Analysis • All three algorithms require specifying: • chord.main.class=<MAIN CLASS> • chord.class.path=<CLASSPATH>
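The difference between CHA and RTA can be seen on a single virtual call. The sketch below is a deliberately simplified illustration and not Chord's implementation: the class `CallResolutionSketch` and its toy hierarchy are hypothetical, every class is assumed to define the called method, and the set of instantiated classes is passed in directly, whereas a real RTA discovers it while iterating over reachable methods.

```java
import java.util.*;

// Hypothetical sketch contrasting CHA and RTA on one virtual call site.
class CallResolutionSketch {
    // Direct-subclass edges of a toy class hierarchy: parent -> children.
    private final Map<String, List<String>> children = new HashMap<>();

    void addSubclass(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // CHA: a call on a receiver declared as 'type' may dispatch to 'type'
    // itself or to any of its transitive subclasses.
    Set<String> chaTargets(String type) {
        Set<String> out = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(type);
        while (!work.isEmpty()) {
            String c = work.pop();
            if (out.add(c))
                for (String s : children.getOrDefault(c, Collections.emptyList()))
                    work.push(s);
        }
        return out;
    }

    // RTA: keep only those CHA targets whose class is instantiated.
    Set<String> rtaTargets(String type, Set<String> instantiated) {
        Set<String> out = chaTargets(type);
        out.retainAll(instantiated);
        return out;
    }

    public static void main(String[] args) {
        CallResolutionSketch h = new CallResolutionSketch();
        h.addSubclass("Shape", "Circle");
        h.addSubclass("Shape", "Square");
        h.addSubclass("Square", "RoundedSquare");
        System.out.println(h.chaTargets("Shape"));
        // [Circle, RoundedSquare, Shape, Square]
        System.out.println(h.rtaTargets("Shape", new HashSet<>(Arrays.asList("Circle"))));
        // [Circle]
    }
}
```

RTA's result is never larger than CHA's, so the scope it builds is smaller.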

  31. Analysis Scope Representation • Reachable Methods • stored in file specified by chord.methods.file (default = "[chord.out.dir]/methods.txt") • one method per line in format mname:mdesc@cname • Resolved Reflection • stored in file specified by chord.reflect.file (default = "[chord.out.dir]/reflect.txt") • four sections, each listing sites in format bci!mname:mdesc@cname->cname1,cname2,...,cnameN: • # resolvedClsForNameSites: calls to Class Class.forName(String) • # resolvedObjNewInstSites: calls to Object Class.newInstance() • # resolvedConNewInstSites: calls to Object Constructor.newInstance(Object[]) • # resolvedAryNewInstSites: calls to Object Array.newInstance(Class, int)
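A line in the format bci!mname:mdesc@cname->cname1,cname2,...,cnameN can be split mechanically. The parser below is a hypothetical sketch (the class `ReflectSiteSketch` is not part of Chord, and the sample line is made up) that relies only on the separators named on the slide.

```java
import java.util.*;

// Hypothetical parser for one resolved-reflection line of the form
//   bci!mname:mdesc@cname->cname1,cname2,...,cnameN
class ReflectSiteSketch {
    final int bci;                      // bytecode offset of the call site
    final String methodName, methodDesc, className;
    final List<String> resolvedClasses; // classes the site was resolved to

    ReflectSiteSketch(int bci, String methodName, String methodDesc,
                      String className, List<String> resolvedClasses) {
        this.bci = bci;
        this.methodName = methodName;
        this.methodDesc = methodDesc;
        this.className = className;
        this.resolvedClasses = resolvedClasses;
    }

    static ReflectSiteSketch parse(String line) {
        int bang = line.indexOf('!');
        int arrow = line.indexOf("->");
        if (bang < 0 || arrow < bang)
            throw new IllegalArgumentException("bad site: " + line);
        String method = line.substring(bang + 1, arrow); // mname:mdesc@cname
        int colon = method.indexOf(':');
        int at = method.lastIndexOf('@');
        if (colon < 0 || at < colon)
            throw new IllegalArgumentException("bad method ID: " + method);
        return new ReflectSiteSketch(
            Integer.parseInt(line.substring(0, bang)),
            method.substring(0, colon),
            method.substring(colon + 1, at),
            method.substring(at + 1),
            Arrays.asList(line.substring(arrow + 2).split(",")));
    }

    public static void main(String[] args) {
        // Made-up example line; real files are produced by Chord itself.
        ReflectSiteSketch s = parse("12!main:([Ljava/lang/String;)V@foo.Main->foo.A,foo.B");
        System.out.println(s.bci + " " + s.methodName + " " + s.className
            + " " + s.resolvedClasses);
    }
}
```

The same mname:mdesc@cname split applies to each line of the reachable-methods file.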

  32. Rapid Type Analysis (RTA) • Preferred (and default) scope construction algorithm • Allows specifying reflection resolution via chord.reflect.kind=[none|static|dynamic] • Preferred way to resolve reflection is ‘dynamic’ and requires specifying how to run program: • chord.run.args=id1,…,idN • chord.args.id1=<ARGS1>, …, chord.args.idN=<ARGSN>

  33. Dynamic Analysis Based Scope Construction • Runs program and observes which classes are loaded • Requires JVMTI (set chord.use.jvmti=true in file main/chord.properties) • Requires specifying how to run program: • chord.run.args=id1,…,idN • chord.args.id1=<ARGS1>, …, chord.args.idN=<ARGSN> • All methods of each loaded class are deemed reachable • Currently no support for reflection resolution

  34. Additional Analysis Scope Features • Scope Reuse • Enables using scope constructed by a previous run of Chord • Constructs scope from files specified by chord.methods.file and chord.reflect.file • Specified via chord.reuse.scope=true • Scope Exclusion • Enables excluding certain classes from scope • Treats all methods in such classes as no-ops • Specified via three properties: • 1. chord.std.scope.exclude (default = "") • 2. chord.ext.scope.exclude (default = "") • 3. chord.scope.exclude (default = "[chord.std.scope.exclude],[chord.ext.scope.exclude]")

  35. Native Method Stubs • Specified in file main/src/chord/program/stubs/stubs.txt in format: • mname:mdesc@cname stub_cname • where stub_cname denotes a class implementing: • public interface joeq.Compiler.Quad.ICFGBuilder { public ControlFlowGraph run(jq_Method m); } • Example: start:()V@java.lang.Thread chord.program.stubs.ThreadStartCFGBuilder

  36. Example Native Method Stub • Behavior modeled by the stub: void start() { this.run(); return; } • CFG builder:
       public ControlFlowGraph run(jq_Method m) {
           jq_Class c = m.getDeclaringClass();
           jq_Method n = c.getDeclaredInstanceMethod(new jq_NameAndDesc("run", "()V"));
           RegisterFactory f = new RegisterFactory(0, 1);
           Register r = f.getOrCreateLocal(0, c);
           ControlFlowGraph cfg = new ControlFlowGraph(m, 1, 0, f);
           Quad q1 = Invoke.create(0, m, Invoke.INVOKEVIRTUAL_V.INSTANCE, null, new MethodOperand(n), 1);
           Invoke.setParam(q1, 0, new RegisterOperand(r, c));
           Quad q2 = Return.create(1, m, RETURN_V.INSTANCE);
           BasicBlock bb = cfg.createBasicBlock(1, 1, 2, null);
           bb.appendQuad(q1);
           bb.appendQuad(q2);
           BasicBlock eb = cfg.entry(), xb = cfg.exit();
           eb.addSuccessor(bb); bb.addPredecessor(eb);
           bb.addSuccessor(xb); xb.addPredecessor(bb);
           return cfg;
       }

  37. Outline of Tutorial • Part 1: • Getting Started With Chord • Program Representation • Part 2: • Analysis Using Datalog/BDDs • Chaining Analyses Together • Part 3: • Context-Sensitive Analysis • Dynamic Analysis

  38. Program Domain • Building block for analyses based on Datalog/BDDs • Represents an indexed set of values of a fixed kind • typically artifacts from program being analyzed (e.g., set of all methods in the program) • Assigns unique 0-based index to each value • everything in Datalog/BDDs must be numbered • indices given in order in which values are added • order affects efficiency of running analysis on large sets • initial indices (0, 1, ...) typically given to frequently used values (e.g., the main method) • O(1) access to value given index, and vice versa

  39. Example Predefined Program Domains

  40. Writing a Program Domain Analysis • Domain M: all methods in the program • main method has index 0 • java.lang.Thread.start() method has index 1
       package chord.analyses.method;

       @Chord(name = "M")
       public class DomM extends ProgramDom<jq_Method> {
           @Override
           public void fill() {
               Program program = Program.g();
               add(program.getMainMethod());
               jq_Method start = program.getThreadStartMethod();
               if (start != null)
                   add(start);
               for (jq_Method m : program.getMethods())
                   add(m);
           }
       }

  41. Running a Program Domain Analysis • To run the DomM analysis of slide 40: ant -Dchord.work.dir=<…> -Dchord.run.analyses=M run

  42. Running a Program Domain Analysis • Running DomM (slide 40) writes two files to chord_output/bddbddb/: • M.map: the <N> values of the domain, one per line in index order, e.g. main:([Ljava/lang/String;)V@Bldg, start:()V@java.lang.Thread, <init>:()V@Bldg, … • M.dom: the line "M <N> M.map"

  43. chord.project.analyses.ProgramDom<T> API • void setName(String name) • set name of domain • boolean add(T val) • add value to domain if not present; return true if added • int getOrAdd(T val) • add value to domain if not present; return its index in either case • void save() • save domain to disk (.dom and .map files) • String toUniqueString(T val) • unique string representation of value • int size() • number of values in domain • T get(int index) • value having the given index; IndexOutOfBoundsException if not found • int indexOf(T val) • index of given value; -1 if not found • Note: values once added cannot be removed!
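The indexing behavior listed for ProgramDom can be mimicked with a list plus a hash map. This is a minimal sketch, assuming nothing about how Chord implements it: the class `IndexedDomainSketch` is hypothetical and mirrors only the add, getOrAdd, indexOf, get, and size operations named on the slide.

```java
import java.util.*;

// Hypothetical stand-in for an indexed set: values receive 0-based indices in
// insertion order, lookups work in both directions, and nothing is removed.
class IndexedDomainSketch<T> {
    private final List<T> values = new ArrayList<>();           // index -> value
    private final Map<T, Integer> indices = new HashMap<>();    // value -> index

    boolean add(T val) {
        if (indices.containsKey(val)) return false;  // already present
        indices.put(val, values.size());
        values.add(val);
        return true;
    }

    int getOrAdd(T val) {
        add(val);
        return indices.get(val);
    }

    int indexOf(T val) {
        Integer i = indices.get(val);
        return i == null ? -1 : i;
    }

    T get(int index) {
        return values.get(index);  // throws IndexOutOfBoundsException if absent
    }

    int size() {
        return values.size();
    }

    public static void main(String[] args) {
        IndexedDomainSketch<String> dom = new IndexedDomainSketch<>();
        dom.add("main:([Ljava/lang/String;)V@Bldg");
        dom.add("start:()V@java.lang.Thread");
        dom.add("main:([Ljava/lang/String;)V@Bldg");  // duplicate, ignored
        System.out.println(dom.size());                                 // 2
        System.out.println(dom.indexOf("start:()V@java.lang.Thread"));  // 1
        System.out.println(dom.indexOf("<init>:()V@Bldg"));             // -1
    }
}
```

Adding the frequently used values first, as DomM does with the main method, is what gives them the smallest indices.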

  44. Program Relation • Building block for analyses based on Datalog/BDDs • Represents a set of tuples over one or more fixed program domains • Represented symbolically as a BDD • enables storing and manipulating large relations efficiently • Provides various relational operations • projection, selection, join, etc. • BDD size and efficiency of operations depend heavily on the encoding of the relation's content, not on the number of tuples • ordering of values within program domains • relative ordering between program domains

  45. Writing a Program Relation Analysis • Relation MI: tuples (m, i) such that method m contains call i
       package chord.analyses.invk;

       @Chord(name = "MI", sign = "M0,I0:M0_I0")
       public class RelMI extends ProgramRel {
           @Override
           public void fill() {
               DomI domI = (DomI) doms[1];
               for (Quad q : domI) {
                   jq_Method m = q.getMethod();
                   add(m, q);
               }
           }
       }
     • M0,I0: Domain names • Order mnemonically (hard to change over time) • Suffix 0, 1, etc. distinguishes repeating domains • M0_I0: Domain order • Only dictates performance • Can also be I0_M0 or I0xM0 • Easy to change over time

  46. Writing a Program Relation Analysis • Relation VT: tuples (v, t) such that local variable v has type t
       package chord.analyses.var;

       @Chord(name = "VT", sign = "V0,T0:T0_V0")
       public class RelVT extends ProgramRel {
           @Override
           public void fill() {
               // pseudocode loop, as on the slide
               for (each RegisterOperand o of each quad) {
                   Register v = o.getRegister();
                   jq_Type t = o.getType();
                   add(v, t);
               }
           }
       }

  47. Running a Program Relation Analysis • To run the RelVT analysis of slide 46: ant -Dchord.work.dir=<…> -Dchord.run.analyses=VT run

  48. Running a Program Relation Analysis • Running RelVT (slide 46) writes to chord_output/bddbddb/: • V.dom, T.dom, V.map, T.map • VT.bdd, for example:
       # V0:2 T0:2
       # 1 2
       # 3 4
       6 4
       2 1 4 3
       7 4 0 1
       6 3 7 1
       5 3 0 7
       4 2 5 0
       3 2 6 5
       2 1 3 4

  49. Program Relation as Binary Function • Variable v0 has types t1, t2, t3 • Variable v1 has type t3 • Variable v2 has type t3 • Relation VT = { (0, 1), (0, 2), (0, 3), (1, 3), (2, 3) }
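Viewing VT as a binary function means giving every (v, t) index pair a bit string and mapping it to 1 exactly when the pair is in the relation. The sketch below tabulates that function; the class `RelationAsFunctionSketch` is hypothetical, and the bit layout (two bits b1 b2 for v followed by two bits b3 b4 for t, most significant bit first) is an assumption that happens to agree with the leaf values listed on the next slide.

```java
import java.util.Arrays;

// Hypothetical sketch: tabulates a binary relation over two domains as a
// Boolean function of the concatenated binary encodings of its two indices.
class RelationAsFunctionSketch {
    // Returns the truth table as a string of '0'/'1', ordered by the
    // integer value of the bit string (v bits first, then t bits).
    static String truthTable(int[][] tuples, int bitsPerDomain) {
        char[] table = new char[1 << (2 * bitsPerDomain)];
        Arrays.fill(table, '0');
        for (int[] t : tuples)
            table[(t[0] << bitsPerDomain) | t[1]] = '1';
        return new String(table);
    }

    public static void main(String[] args) {
        int[][] vt = { {0, 1}, {0, 2}, {0, 3}, {1, 3}, {2, 3} };
        System.out.println(truthTable(vt, 2)); // 0111000100010000
    }
}
```

A BDD stores this same function as a shared graph rather than an explicit table of 16 entries.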

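  50. BDD: Binary Decision Diagrams (Bryant 1986) • Graphical Encoding of a Binary Function • [Figure: complete binary decision tree over variables b1, b2, b3, b4, one level per variable, each node with a 0 edge and a 1 edge; leaf values from left to right: 0 1 1 1 0 0 0 1 0 0 0 1 0 0 0 0]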
