Query-Based Debugging

Query-Based Debugging • Raimondas Lencevicius • Department of Computer Science, UCSB

Debugging of OO Programs • Symbolic debugging • Control flow debugging • Object state monitoring • Data breakpoints • Conditional breakpoints • Debugging of abstract relationships? • Complex object relationships

Debugging Object Relationships • Programmers need to find objects violating relationships • “Are there any windows that do not reference some child widget?” • Current debuggers provide only low-level views • Programmers have to write special testing code

Goals of Query-Based Debugging • Make debugging of data structures easier by answering questions about object relationships • Explore unfamiliar programs • Find data structure errors as soon as they occur

Query-Based Debugging • Ask common questions about program state • Quickly access sets of interesting objects • Check properties of large groups of objects using single query • Answer queries while program is running • Provide functionality efficiently

Windows and Widgets • [Figure: a window with its widgets in the graphical user interface, and the corresponding program objects: window, widget collection, widget1, widget2, parent window references]

Query Example • “Are there any windows that do not reference some child widget?” • [Figure: program objects: window, widget collection, widget1, parent window reference]

Talk Overview • Query case study • Query model • Implementation of debugger • Dynamic queries • Experimental results • Future work • Conclusions

Java Compiler - Case Study • Goal: understand and debug Java subset compiler written for UCSB compiler course • Variety of queries • “Can the current lexer token refer to an uninitialized token?” • “Can identifiers declared in the same scope have the same name and type?” • “Can methods have the same name?”

Java Compiler - Case Study • “Can methods have the same name?” • Experiment with input file containing such methods: … static int isOne(int c) { return 0; } … static int isOne(int c) { return 1; } …

Java Compiler - Case Study • “Can methods have the same name?” • Debugger gives positive answer • MethodDeclaration public Id name >> “isOne” … Code >> … (ReturnStmt, Num "0") ... • MethodDeclaration public Id name >> “isOne” … Code >> … (ReturnStmt, Num "1") ... • But not a program error • Compiler finds duplicate methods in later phase • SemanticException: The name `isOne' at line 27 chars 14 to 20 was already declared.

Java Compiler Example Summary • Explore unfamiliar program • Find a possible error • Further program investigation shows that there is no error • Use query as invariant to verify program’s execution • Dynamic query

Query Model • Widget wid; Window win. (wid.window == win) && (!win.widgetCollection.contains(wid)) • Search domain: Widget wid; Window win • Constraint expression in conjunctive form: (wid.window == win) && (!win.widgetCollection.contains(wid)) • Arbitrary boolean constraint expression • Assumption: side-effect free methods • Selection and join queries
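To make the query semantics concrete, here is a minimal Java sketch of what the widget/window query asks for: every (wid, win) pair from the search domain that satisfies the constraint. The classes `Window`, `Widget`, `WidgetQuerySketch`, and `Match` are hypothetical stand-ins written for this illustration, and the nested loop is the naive evaluation, not the debugger's optimized one.

```java
import java.util.ArrayList;
import java.util.List;

final class WidgetQuerySketch {
    /** One result tuple of the join query (hypothetical helper type). */
    static final class Match {
        final Widget wid;
        final Window win;
        Match(Widget wid, Window win) { this.wid = wid; this.win = win; }
    }

    /** Naive evaluation: test the constraint on every (wid, win) pair of the search domain. */
    static List<Match> evaluate(List<Widget> widgets, List<Window> windows) {
        List<Match> result = new ArrayList<>();
        for (Widget wid : widgets) {
            for (Window win : windows) {
                // (wid.window == win) && (!win.widgetCollection.contains(wid))
                if (wid.window == win && !win.widgetCollection.contains(wid)) {
                    result.add(new Match(wid, win));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Window win = new Window();
        Widget registered = new Widget(win);
        Widget forgotten = new Widget(win); // never added to the window's collection
        win.widgetCollection.add(registered);

        List<Widget> widgets = new ArrayList<>();
        widgets.add(registered);
        widgets.add(forgotten);
        List<Window> windows = new ArrayList<>();
        windows.add(win);

        System.out.println(evaluate(widgets, windows).size()); // prints 1
    }
}

/** Hypothetical stand-in for the window class named in the query. */
final class Window {
    final List<Widget> widgetCollection = new ArrayList<>();
}

/** Hypothetical stand-in for the widget class named in the query. */
final class Widget {
    final Window window; // parent window reference
    Widget(Window window) { this.window = window; }
}
```

The constraint only calls `contains`, which does not modify the collection, in line with the side-effect-free assumption on the slide.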

Java Compiler Example • “Can methods have the same name?” • MethodDecl x y. (x.name.spelling == y.name.spelling) && (x != y)

Static Query Implementation • [Figure: system architecture. Components: GUI (user input, output), parser, optimizer, code generator, execution module, domain collector. Data passed between them: query string, variable types, intermediate form, domain sizes, optimized form, generated code, domain collections]

Overview of Implementation • Enumeration primitive: finds all instances of domain • Join ordering: finds good order to evaluate query • Hash joins: speed up equality constraints • Incremental delivery: shows first result early

Query Execution • “Find all declared methods returning integers and called at least once” • Declaration d; Method m; CallExpression ce. (d.contains(m)) && (ce.decl == m) && (m.typeName != "int") • [Figure: tuples from the domains Declaration d, Method m, and CallExpression ce flowing through the joins (d.contains(m))? and (ce.decl == m)?]

Join Ordering • [Figure: inefficient and efficient orderings of the same joins; domains of 10, 100, and 200 objects, constraints labeled 1% and 10%]

Join Ordering • Join execution order significantly influences performance • cecil_method a b; cecil_formal c d. (a.formals.includes(c)) && (b.formals.includes(d)) && (c.name == d.name) && (a != c) && (b != d) • Naïve evaluation of Cartesian product is slow • Straightforward order takes 37 seconds • Optimized order takes 6 seconds • Problem is NP-complete • System uses heuristics
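Why the order matters can be illustrated with a toy cost model. The sketch below is an assumption made for illustration, not the debugger's cost model or heuristic, which the slides do not spell out. Each join step is reduced to a domain size and an estimated fraction of candidate tuples that pass its constraint. The names `JoinOrderSketch`, `Step`, `estimatedChecks`, and `greedyOrder`, and the sizes and selectivities in `main`, are all hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

final class JoinOrderSketch {
    /** One domain joined into the query, with an estimated pass rate for its constraint. */
    static final class Step {
        final String name;
        final long domainSize;
        final double selectivity; // fraction of candidate tuples that satisfy the constraint
        Step(String name, long domainSize, double selectivity) {
            this.name = name;
            this.domainSize = domainSize;
            this.selectivity = selectivity;
        }
    }

    /** Estimated number of constraint checks when the domains are joined in the given order. */
    static double estimatedChecks(Step[] order) {
        double tuples = 1.0; // partial result tuples produced so far
        double checks = 0.0;
        for (Step s : order) {
            double candidates = tuples * s.domainSize; // every partial tuple meets every object
            checks += candidates;
            tuples = candidates * s.selectivity;       // survivors feed the next join
        }
        return checks;
    }

    /** Simple greedy heuristic (an assumption): join first what grows the result least. */
    static Step[] greedyOrder(Step[] steps) {
        Step[] order = steps.clone();
        Arrays.sort(order, Comparator.comparingDouble((Step s) -> s.domainSize * s.selectivity));
        return order;
    }

    public static void main(String[] args) {
        Step[] asWritten = {
            new Step("C", 10, 0.10), new Step("B", 100, 0.10), new Step("A", 200, 0.01)
        };
        System.out.println(estimatedChecks(asWritten));              // about 2110 checks
        System.out.println(estimatedChecks(greedyOrder(asWritten))); // about 410 checks
    }
}
```

Even in this simplified model the same three joins differ by roughly a factor of five in work, which is the effect the 37-second versus 6-second measurement reflects on a real query.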

Hash Joins • Nested-loop join on X = Y over domains of 100 and 200 objects: 20,000 operations • Hash join on X = Y over the same domains: 300 operations

Incremental Delivery • Show first result early by pushing intermediate results through pipeline • [Figure: tuples from the domains Declaration d, Method m, and CallExpression ce passing one at a time through the joins (d.contains(m))? and (ce.decl == m)?]
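The operation counts on this slide can be reproduced with a small Java sketch. It joins two integer arrays on equality, once with nested loops and once with a hash table, and counts constraint checks and hash-table operations. The class `HashJoinSketch` and its `operations` counter are hypothetical, and plain integers stand in for the field values of the joined objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashJoinSketch {
    /** Counts constraint checks and hash-table operations for comparison. */
    static long operations = 0;

    /** Nested-loop join: compares every x with every y. */
    static List<int[]> nestedLoopJoin(int[] xs, int[] ys) {
        List<int[]> out = new ArrayList<>();
        for (int x : xs) {
            for (int y : ys) {
                operations++;
                if (x == y) out.add(new int[] {x, y});
            }
        }
        return out;
    }

    /** Hash join: one insert per x, then one lookup per y. */
    static List<int[]> hashJoin(int[] xs, int[] ys) {
        Map<Integer, List<Integer>> table = new HashMap<>();
        for (int x : xs) {
            operations++;
            table.computeIfAbsent(x, k -> new ArrayList<>()).add(x);
        }
        List<int[]> out = new ArrayList<>();
        for (int y : ys) {
            operations++;
            List<Integer> bucket = table.get(y);
            if (bucket != null) {
                for (int x : bucket) out.add(new int[] {x, y});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] xs = new int[100];
        int[] ys = new int[200];
        for (int i = 0; i < xs.length; i++) xs[i] = i;
        for (int i = 0; i < ys.length; i++) ys[i] = i;

        operations = 0;
        nestedLoopJoin(xs, ys);
        System.out.println(operations); // prints 20000

        operations = 0;
        hashJoin(xs, ys);
        System.out.println(operations); // prints 300
    }
}
```

The hash join only applies to equality constraints, since it relies on equal values landing in the same bucket.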

Incremental Delivery • Goal: fast response for most queries • Pipelining • Joins are separate threads connected in pipeline by limited-size buffers • Thread blocks on empty input or full output • Scheduler prefers threads closer to the end of pipeline • Time-slicing • Interrupt “slow” threads and reschedule
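The pipelining idea can be sketched with standard Java threads and bounded queues. This is a simplified illustration under stated assumptions: the class `PipelineSketch`, the two stages, and the even-number constraint are hypothetical, an empty `Optional` is used as an end-of-stream marker, and scheduling is left to the JVM, so the scheduler preference and time-slicing from the slide are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class PipelineSketch {
    /** Runs an enumerate -> filter pipeline and returns results in arrival order. */
    static List<Integer> run(List<Integer> domain, int bufferSize) {
        BlockingQueue<Optional<Integer>> candidates = new ArrayBlockingQueue<>(bufferSize);
        BlockingQueue<Optional<Integer>> results = new ArrayBlockingQueue<>(bufferSize);

        // Stage 1: enumerate the domain; put() blocks while the output buffer is full.
        Thread enumerate = new Thread(() -> {
            try {
                for (Integer x : domain) candidates.put(Optional.of(x));
                candidates.put(Optional.empty()); // end-of-stream marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: apply a constraint; take() blocks while the input buffer is empty.
        Thread filter = new Thread(() -> {
            try {
                for (Optional<Integer> item = candidates.take(); item.isPresent();
                        item = candidates.take()) {
                    if (item.get() % 2 == 0) results.put(item); // stand-in constraint
                }
                results.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        enumerate.start();
        filter.start();

        // The consumer sees each result as soon as it leaves the last stage.
        List<Integer> delivered = new ArrayList<>();
        try {
            for (Optional<Integer> item = results.take(); item.isPresent();
                    item = results.take()) {
                delivered.add(item.get());
            }
            enumerate.join();
            filter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while collecting results", e);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5, 6), 2)); // prints [2, 4, 6]
    }
}
```

Because the buffers are small, the first matching object reaches the consumer long before the enumeration stage has finished, which is the point of incremental delivery.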

Gas Tank - Case Study • Goal: to debug a gas tank simulation applet • Inter-object constraints • Molecules should stay inside the gas tank • Molecules should not occupy the same position

Gas Tank - Case Study • Detecting an error is not enough • What code led to this error? • Need dynamic queries! • Example: blue molecule at x = 20, y = 25; red molecule at x = 20, y = 25

Gas Tank - Case Study • Dynamic query finds error in move method: public void move() { … x += (int)(v*Math.cos(dtor(dir))); y += (int)(v*Math.sin(dtor(dir))); … • Fix the error: y += (int)(v*Math.sin(dtor(dir))); if (collided()) handleCollision(); • But debugger still shows an error • Exclude “atomic” regions

Motivation of Dynamic Queries • Close cause-effect gap between error and its discovery • Errors are reported as soon as they occur • Display dynamics of objects’ relationships - visualization • Perform continuous invariant or assertion checks

Dynamic Query Implementation • [Figure: system architecture. Components: Java program, custom class loader, instrumented Java program, standard Java virtual machine, custom debugger code, debugger library code. Inputs and outputs: query string and change set, query results]

Implementation of Dynamic Queries • Monitor changes that affect query result • Invoke debugger when change occurs • Reevaluate query efficiently - incrementally

Change Monitoring • Molecule m1, m2. (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2) • When to reevaluate? • What to monitor? • Change set - objects and fields affecting result of query • Domain objects • Referenced fields • Objects and fields referenced in methods • Change set for this query: Molecule <init>, x, y

Instrumentation • Query: Molecule m1, m2. (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2) • Source: … x += … ; … • Compile: … 22: iadd 23: putfield 37 26: aload_0 … • Load and instrument: … 22: iadd 23: invokestatic debug 26: aload_0 … • Debugging code invoked by the instrumented bytecode:
public final class DebuggingCode implements RunTimeCode {
  public static void debug(Molecule updatedObject, int newValue) {
    …
    updatedObject.x = newValue; // replaces putfield 37
    QueryTool.runTool(updatedObject); // invokes query evaluator
  }
}

Implementation of Monitoring • Java bytecode instrumented during load time • Custom class loader • Uses modified class file handling tools from BCA library • Creation and deletion of domain objects • Creation monitored by instrumenting constructors • Deletion handled by GC - not implemented yet • Modification of change set fields • Instrumentation of field assignments

Efficient Query Reevaluation • Same techniques as static queries • Join ordering • Hash joins • Incremental reevaluation • Custom code generation for selection queries

Incremental Reevaluation • [Figure: original query A * B * C evaluated over the full domains, compared with the incremental query A * B * C, which starts from the single changed object and reuses the old results; constraints labeled 1% and 10%]

Query Reevaluation Optimizations • Molecule m1, m2. (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2) • Same value assignments (e.g. x = 5; when m.x is already 5) • Do not change result - no reevaluation required • Fast selection queries • Lean custom code
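Incremental reevaluation can be illustrated on the molecule query used elsewhere in the talk. The sketch below is an assumption-laden simplification, not the debugger's generated code: `IncrementalQuerySketch`, its `checks` counter, and the minimal `Molecule` class are hypothetical, and each colliding pair is reported once rather than in both orders. After one molecule changes, only pairs involving that molecule are rechecked, and old results that do not involve it are kept.

```java
import java.util.ArrayList;
import java.util.List;

final class IncrementalQuerySketch {
    /** Counts constraint evaluations so the two strategies can be compared. */
    static long checks = 0;

    /** The query constraint: (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2). */
    static boolean collide(Molecule m1, Molecule m2) {
        checks++;
        return m1.x == m2.x && m1.y == m2.y && m1 != m2;
    }

    /** Full evaluation: checks every unordered pair of domain objects. */
    static List<Molecule[]> evaluateAll(List<Molecule> domain) {
        List<Molecule[]> results = new ArrayList<>();
        for (int i = 0; i < domain.size(); i++) {
            for (int j = i + 1; j < domain.size(); j++) {
                if (collide(domain.get(i), domain.get(j))) {
                    results.add(new Molecule[] {domain.get(i), domain.get(j)});
                }
            }
        }
        return results;
    }

    /** Incremental reevaluation: only pairs containing the changed object are rechecked. */
    static void reevaluate(List<Molecule[]> results, List<Molecule> domain, Molecule changed) {
        // Old results involving the changed object may be stale; all others stay valid.
        results.removeIf(pair -> pair[0] == changed || pair[1] == changed);
        for (Molecule other : domain) {
            if (other != changed && collide(changed, other)) {
                results.add(new Molecule[] {changed, other});
            }
        }
    }

    public static void main(String[] args) {
        List<Molecule> domain = new ArrayList<>();
        for (int i = 0; i < 200; i++) domain.add(new Molecule(i, i));

        List<Molecule[]> results = evaluateAll(domain);
        System.out.println(checks); // prints 19900

        checks = 0;
        Molecule moved = domain.get(7);
        moved.x = 20;
        moved.y = 20; // now at the same position as molecule 20
        reevaluate(results, domain, moved);
        System.out.println(checks);         // prints 199
        System.out.println(results.size()); // prints 1
    }
}

/** Hypothetical minimal molecule with the two fields the query reads. */
final class Molecule {
    int x;
    int y;
    Molecule(int x, int y) { this.x = x; this.y = y; }
}
```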

Static Query Experiments • Setup: Sun Ultra 2/200 (200 MHz UltraSparc) running modified Self 4.0 • Queries • Self GUI • Cecil compiler • Synthetic stress tests • Different query structures

Static Query Evaluation Time • [Figure: bar chart of time (sec, axis 0 to 3.5) by query number 1-19, grouped into Self GUI, Cecil compiler, and points-and-rectangles queries; series: completion time, response time, translation time, primitive time; annotations: 4.5K x 4.5K join, 11K x 4.5K hash join, 12 x 146 x 370, costly selection; other labels: 1804, 20.7, 5.9]

Discussion of Static Query Experiments • Most queries take less than a second to execute • Join ordering heuristic performs well • Hash joins can speed up execution • Incremental delivery decreases response time

Discussion of Results • Query 17 • 5,000x5,000 = 25,000,000 checks • Query 18 • Complex, large intermediate results

Dynamic Query Experiments • Implemented in fully portable Java 1.2 • Setup: Sun Ultra 2/2300 (300 MHz UltraSparc II) running Sun Solaris Java 1.2 with JIT compiler • Queries • Gas tank • Decaf compiler • SPECjvm98 applications: Jess expert system, compress, ray tracer • Synthetic stress test microbenchmarks

Program Slowdown - Selections • Overhead does not depend on domain size • Query 4: z.OutCnt < 0 • Queries 5-6: z.count() < 0 • Query 7: z.costlyMathCount(0) • Query 12: point.radialDistanceGreaterThan(100M) • [Figure: bar chart of slowdown (axis 0 to 3.5) by query number 1-12 for gas tank, compress, Jess, Decaf, and ray tracer; labels include a value of 5.83 and invocation frequencies of 1.9M/s and 2.3M/s]

Program Slowdown - Joins • Practical for infrequent invocations

Discussion of Dynamic Query Experiments • Selections are efficient • Join queries practical for infrequent evaluations and small query domains • Can we predict debugger performance for wide class of queries? • Query execution model

Performance Model • $T_{\text{instrumented}} = T_{\text{original}} \, (1 + T_{\text{evaluate}} \cdot F_{\text{evaluate}})$ • Slowdown depends on • Frequency of debugger invocations • Selections: $T_{\text{evaluate}}$ = 131 ns - 4.26 µs • Joins: $T_{\text{evaluate}}$ = 5.7 µs - 546 µs

Field Assignment Frequencies • Microbenchmark: 40M assignments per second • SPECjvm98 suite • Max frequency: 1.9M assignments per second in compress • 95% of fields have < 100K assignments per second • [Figure: histogram of number of fields (axis 0 to 250) by field assignment frequency, from 0.1 to 2M assignments per second]

Selection Slowdown Estimates • 500K assignments per second • 6.5% overhead for $T_{\text{evaluate}}$ = 130 ns • 213% overhead (a 3.13x slowdown) for $T_{\text{evaluate}}$ = 4.26 µs • 95% of fields have < 100K assignments per second • 43% overhead for 4.26 µs selection constraints
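These estimates follow from the performance model, in which the relative overhead is the product of evaluation time and invocation frequency. The sketch below recomputes them in Java. The class `SlowdownModel` and its method names are hypothetical, and the larger evaluation time is read as 4.26 microseconds, the reading that is consistent with the 6.5% and 43% figures.

```java
final class SlowdownModel {
    /** Relative overhead T_evaluate * F_evaluate (0.065 means 6.5%). */
    static double overhead(double evaluateSeconds, double invocationsPerSecond) {
        return evaluateSeconds * invocationsPerSecond;
    }

    /** Slowdown factor T_instrumented / T_original = 1 + T_evaluate * F_evaluate. */
    static double slowdownFactor(double evaluateSeconds, double invocationsPerSecond) {
        return 1.0 + overhead(evaluateSeconds, invocationsPerSecond);
    }

    public static void main(String[] args) {
        // 500K assignments per second, cheapest selection (130 ns): about 6.5% overhead
        System.out.printf("%.1f%%%n", 100 * overhead(130e-9, 500_000));
        // 500K assignments per second, costliest selection (4.26 us): about 213% overhead
        System.out.printf("%.1f%%%n", 100 * overhead(4.26e-6, 500_000));
        // 100K assignments per second, costliest selection: about 42.6% overhead
        System.out.printf("%.1f%%%n", 100 * overhead(4.26e-6, 100_000));
    }
}
```

At 500K assignments per second the costliest selection gives an overhead of about 213%, which corresponds to a slowdown factor of about 3.13.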

Summary of Dynamic Queries • Selection queries are efficient • Less than factor 2 slowdown in experiments including stress tests • Projected less than 43% overhead for most selection queries • Join queries are efficient for infrequent evaluations • 2-930 factor slowdown on join queries
