
Tao Xie, University of Illinois at Urbana-Champaign, USA, taoxie@illinois.edu






Presentation Transcript


  1. SBQS 2013 Cooperative Testing and Analysis: Human-Tool, Tool-Tool, and Human-Human Cooperations to Get Work Done Tao Xie, University of Illinois at Urbana-Champaign, USA, taoxie@illinois.edu

  2. Turing Test: Tell Machine and Human Apart

  3. Human vs. Machine: Machine Better Than Human? • IBM's Deep Blue defeated chess champion Garry Kasparov in 1997 • IBM Watson defeated top human Jeopardy! players in 2011

  4. Global Trend: Automation Replacing Human IBM Watson as Jeopardy! player Google’s driverless car Microsoft's instant voice translation tool

  5. CAPTCHA: Human is Better "Completely Automated Public Turing test to tell Computers and Humans Apart"

  6. Human-Computer Interaction • iPad • Movie: Minority Report • CNN News

  7. Human-Centric Software Engineering

  8. Task Allocation of Machine and Human • Machine is better at task set A • Mechanical, tedious, repetitive tasks, … • Ex. solving constraints along a long path • Human is better at task set B • Intelligence, human intent, abstraction, domain knowledge, … • Ex. local reasoning after a loop, recognizing naming semantics • Overall tasks = A ∪ B

  9. Mutually Enhanced Demands on Automation and Human Factors Ironies of Automation “Even highly automated systems, such as electric power networks, need human beings... one can draw the paradoxical conclusion that automated systems still are man-machine systems, for which both technical and human factors are important.” “As the plane passed 39 000 feet, the stall and overspeed warning indicators came on simultaneously—something that’s supposed to be impossible, and a situation the crew is not trained to handle.” IEEE Spectrum 2009 Malaysia Airlines Flight 124 @2005 Lisanne Bainbridge, "Ironies of Automation”, Automatica 1983 .

  10. Mutually Enhanced Demands on Automation and Human Factors Ironies of Automation “The increased interest in human factors among engineers reflects the irony that the more advanced a control system is, so the more crucial may be the contribution of the human operator.” Malaysia Airlines Flight 124 @2005 Lisanne Bainbridge, "Ironies of Automation”, Automatica 1983 .

  11. Takeaway Messages • Don’t forget human factors • Using your tools as end-to-end solutions • Helping your tools • Don’t forget cooperations of human and tool; human and human • Human can help your tools too • Human and human could work together to help your tools, e.g., crowdsourcing

  12. Takeaway Messages • Don’t forget human factors • Using your tools as end-to-end solutions • Helping your tools • Don’t forget cooperations of human and tool; human and human • Human can help your tools too • Human and human could work together to help your tools, e.g., crowdsourcing

  13. Google Scholar: “Pointer Analysis”

  14. “Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE 2001] “During the past 21 years, over 75 papers and 9 Ph.D. theses have been published on pointer analysis. Given the tons of work on this topic one may wonder, “Haven't we solved this problem yet?'' With input from many researchers in the field, this paper describes issues related to pointer analysis and remaining open problems.” Michael Hind. Pointer analysis: haven't we solved this problem yet?. In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001)

  15. “Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE 2001] Section 4.3 Designing an Analysis for a Client’s Needs “Barbara Ryder expands on this topic: “… We can all write an unbounded number of papers that compare different pointer analysis approximations in the abstract. However, this does not accomplish the key goal, which is to design and engineer pointer analyses that are useful for solving real software problems for realistic programs.”

  16. Google Scholar: “Clone Detection”

  17. Some Success Stories of Applying Clone Detection • MSRA XIAO: human to determine what are serious (known) bugs • Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proc. OSDI 2004 • Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning code clones at hands of engineers in practice. In Proc. ACSAC 2012 • MSR 2011 keynote by YY Zhou: Connecting Technology with Real-world Problems – From Copy-paste Detection to Detecting Known Bugs

  18. XIAO: Clone Detection @ MSRA • Available in Visual Studio 2012 • Finding refactoring opportunities • Searching similar snippets for fixing a bug once: XIAO Code Clone Search service integrated into the workflow of the Microsoft Security Response Center (MSRC) • Microsoft Technet blog about XIAO: “We wanted to be sure to address the vulnerable code wherever it appeared across the Microsoft code base. To that end, we have been working with Microsoft Research to develop a ‘Cloned Code Detection’ system that we can run for every MSRC case to find any instance of the vulnerable code in any shipping product. This system is the one that found several of the copies of CVE-2011-3402 that we are now addressing with MS12-034.” • Yingnong Dang, Dongmei Zhang, Song Ge, Yingjun Qiu, and Tao Xie. XIAO: Tuning code clones at hands of engineers in practice. In Proc. Annual Computer Security Applications Conference (ACSAC 2012)

  19. XIAO: Enabling Human Tuning and Investigation • XIAO enables code clone analysis with • High scalability, High compatibility • High tunability: what you tune is what you get • High explorability: How to navigate through the large number of detected clones? How to quickly review a pair of clones?

  20. "Are Automated Debugging [Research] Techniques Actually Helping Programmers?" • 50 years of automated debugging research • N papers → only 5 evaluated with actual programmers • Chris Parnin and Alessandro Orso. Are automated debugging techniques actually helping programmers?. In Proc. ISSTA 2011

  21. Human Factors in Real World • Academia • Tend to leave human out of loop (involving human makes evaluations difficult to conduct or write) • Tend not to spend effort on improving tool usability • tool usability would be valued more in HCI than in SE • too much to include both the approach/tool itself and usability/its evaluation in a single paper • Real-world • Often has human in the loop (familiar IDE integration, social effect, lack of expertise/willingness to write specs,…) • Examples • Agitar [ISSTA 2006] vs. Daikon [TSE 2001] • Test generation in Pex based on constraint solving

  22. NSF Workshop on Formal Methods • Goal: to identify future directions for research in formal methods and its transition to industrial practice. • The workshop will bring together researchers and identify the primary challenges in the field: foundational, infrastructural, and in transitioning ideas from research labs to developer tools. http://goto.ucsd.edu/~rjhala/NSFWorkshop/

  23. Example Barriers Related to Human Factors • “Lack of education amongst practitioners” • “Education of students in logic and design for verification” • “Expertise required to create and use a verification tool. E.g., both Astrée for Airbus and SDV for Windows drivers were closely shepherded by verification experts.” • “Tools require lots of up-front effort (e.g., to write specifications)” • “User effort required to guide verification tools, such as assertions or specifications”

  24. Example Barriers Related to Human Factors • “Not integrated with standard development flows (testing)” • “Too many false positives and no ranking of errors” • “General usability of tools, in terms of false alarms and error messages. The Coverity CACM paper pointed out that they had developed features that they do not deploy because they baffle users. Many tools choose unsoundness over soundness to avoid false alarms.”

  25. Example Barriers Related to Human Factors • “The necessity of detailed specifications and complex interaction with tools, which is very costly and discouraging for industrial, who lack high-level specialists.” • “Feedback to users. It’s difficult to explain to users why automated verification tools are failing. Counterexamples to properties can be very difficult for users to understand, especially when they are abstract, or based on incomplete environment models or constraints.”

  26. Automation in Software Testing http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011 2010 Dagstuhl Seminar 10111 Practical Software Testing: Tool Automation and Human Factors

  27. Automation in Software Testing Human Factors http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011 2010 Dagstuhl Seminar 10111 Practical Software Testing: Tool Automation and Human Factors

  28. Human-Centric SE Example: Whyline Andy Ko and Brad Myers. Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior. In Proc. ICSE 2008

  29. Takeaway Messages • Don’t forget human factors • Using your tools as end-to-end solutions • Helping your tools • Don’t forget cooperations of human and tool intelligence; human and human intelligence • Human can help your tools too • Human and human could work together to help your tools, e.g., crowdsourcing

  30. Reflexion Models • Motivation • Architecture recovery is challenging (abstraction gap) • Human typically has a high-level view in mind • Repeat • Human: define/update high-level model of interest • Tool: extract a source model • Human: define/update declarative mapping between high-level model and source model • Tool: compute a software reflexion model • Human: interpret the software reflexion model • Until happy Gail C. Murphy, David Notkin. Reengineering with Reflexion Models: A Case Study. IEEE Computer 1997
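The tool step of the loop above can be sketched as a set comparison: lift the extracted source-model edges through the human-written mapping, then classify each high-level edge as a convergence, divergence, or absence. This is only a minimal sketch of the idea; all function names and sample data below are hypothetical, not from the original tool.

```python
# Minimal sketch of computing a software reflexion model.
# Hypothetical names and data; the real tool works over richer models.

def reflexion_model(high_level_edges, source_edges, mapping):
    """Lift source-model edges through the mapping and compare them
    with the human's high-level model."""
    lifted = {(mapping[src], mapping[dst])
              for src, dst in source_edges
              if src in mapping and dst in mapping
              and mapping[src] != mapping[dst]}
    convergences = high_level_edges & lifted   # predicted and present
    divergences  = lifted - high_level_edges   # present but not predicted
    absences     = high_level_edges - lifted   # predicted but absent
    return convergences, divergences, absences

# Hypothetical example: the human predicts UI -> Core and Core -> Storage.
high = {("UI", "Core"), ("Core", "Storage")}
# Call edges between source files, extracted by the tool step.
calls = {("ui/main.c", "core/engine.c"), ("ui/main.c", "storage/db.c")}
# Human-written mapping from source entities to high-level nodes.
mapping = {"ui/main.c": "UI", "core/engine.c": "Core",
           "storage/db.c": "Storage"}

conv, div, absent = reflexion_model(high, calls, mapping)
print(sorted(conv))    # [('UI', 'Core')]
print(sorted(div))     # [('UI', 'Storage')]
print(sorted(absent))  # [('Core', 'Storage')]
```

The human then interprets the divergence (UI talks to Storage directly) and absence (Core never reaches Storage), updates the model or mapping, and repeats until happy.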

  31. State-of-the-Art/Practice Test Generation Tools Running Symbolic PathFinder ... … ====================================================== results no errors detected ====================================================== statistics elapsed time: 0:00:02 states: new=4, visited=0, backtracked=4, end=2 search: maxDepth=3, constraints=0 choice generators: thread=1, data=2 heap: gc=3, new=271, free=22 instructions: 2875 max memory: 81MB loaded code: classes=71, methods=884 …

  32. Automated Test Generation • Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing • Instrument code to explore feasible paths • Example tool: Pex from Microsoft Research (for .NET programs) L. A. Clarke. A system to generate test data and symbolically execute programs. TSE 1976. J. C. King. Symbolic execution and program testing. CACM 1976. P. Godefroid, N. Klarlund, and K. Sen. DART: directed automated random testing. PLDI 2005 K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. ESEC/FSE 2005 N. Tillmann and J. de Halleux. Pex - White Box Test Generation for .NET. TAP 2008

  33. Dynamic Symbolic Execution Code to generate inputs for:
  void CoverMe(int[] a) {
    if (a == null) return;
    if (a.Length > 0)
      if (a[0] == 1234567890)
        throw new Exception("bug");
  }
  Loop: execute & monitor, choose the next path, negate a branch condition, and solve for new inputs:
  • Input null: observed constraint a==null; next constraint to solve: a!=null
  • Input {}: observed a!=null && !(a.Length>0); solve a!=null && a.Length>0
  • Input {0}: observed a!=null && a.Length>0 && a[0]!=1234567890; solve a!=null && a.Length>0 && a[0]==1234567890
  • Input {123…}: observed a!=null && a.Length>0 && a[0]==1234567890
  Done: there is no path left.
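The iteration above can be sketched as a toy loop on a Python port of CoverMe. This is an illustrative assumption, not a real tool: where Pex would collect symbolic path constraints and call a constraint solver, a small candidate pool stands in for the solver, and each concrete run records which path it took.

```python
# Toy sketch of the dynamic symbolic execution loop (hypothetical port).

def cover_me(a, trace):
    """Python port of CoverMe; records branch outcomes in `trace`."""
    if a is None:
        trace.append("a==null")
        return
    trace.append("a!=null")
    if len(a) > 0:
        trace.append("a.Length>0")
        if a[0] == 1234567890:
            trace.append("a[0]==1234567890")
            raise RuntimeError("bug")
        trace.append("a[0]!=1234567890")
    else:
        trace.append("!(a.Length>0)")

def explore(candidates):
    """Keep executing until no candidate reaches a new path; keep each
    input that covers a new path as a generated test."""
    seen, tests = set(), []
    for a in candidates:  # stand-in for negate-and-solve
        trace, error = [], None
        try:
            cover_me(a, trace)
        except RuntimeError as e:
            error = str(e)
        path = tuple(trace)
        if path not in seen:
            seen.add(path)
            tests.append((a, error))
    return tests

for inp, err in explore([None, [], [0], [1234567890]]):
    print(inp, "->", err)
# Four inputs, four distinct paths; the last one triggers the "bug".
```

The candidate pool mirrors the four solver answers on the slide (null, {}, {0}, {1234567890}); a real DSE tool derives each next input by negating a branch condition and solving.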

  34. Test Generation by Pex • Released since 2008 • Download counts in initial 20 months of release: Academic: 17,366; Industrial: 13,022; Total: 30,388 • Pex detected various bugs (including a serious bug) in a core .NET component (already extensively tested over 5 years by 40 testers), used by thousands of developers and millions of end users. • “It has saved me two major bugs (not caught by normal unit tests) that would have taken at least a week to track down and fix normally plus a few smaller issues so I'm a big proponent of Pex.” http://research.microsoft.com/projects/pex/

  35. Automating Test Generation • Method sequences • MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06] • Environments, e.g., db, file systems, network, … • DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11] • CloudApp Testing [Zhang et al. IEEE Soft 12] • Loops • Fitnex [Xie et al. DSN 09] http://people.engr.ncsu.edu/txie/publications.htm

  36. Unit Test Generation: Replace Human or Get Human Out of the Loop
  Class under test:
  00: class Graph { …
  03:   public void AddVertex(Vertex v) {
  04:     vertices.Add(v);
  05:   }
  06:   public Edge AddEdge(Vertex v1, Vertex v2) { …
  15:   }
  16: }
  Manual test generation: tedious, missing special/corner cases, …
  Generated unit tests:
  void test1() { Graph ag = new Graph(); Vertex v1 = new Vertex(0); ag.AddVertex(v1); }
  void test2() { Graph ag = new Graph(); Vertex v1 = new Vertex(0); ag.AddEdge(v1, v1); }
  …

  37. State-of-the-Art/Practice Test Generation Tools Running Symbolic PathFinder ... … ====================================================== results no errors detected ====================================================== statistics elapsed time: 0:00:02 states: new=4, visited=0, backtracked=4, end=2 search: maxDepth=3, constraints=0 choice generators: thread=1, data=2 heap: gc=3, new=271, free=22 instructions: 2875 max memory: 81MB loaded code: classes=71, methods=884 …

  38. Challenges Faced by Test Generation Tools • Ex: Dynamic Symbolic Execution (DSE) / Concolic Testing • Instrument code to explore feasible paths • Challenge: path explosion • In an empirical study, total block coverage achieved was 50%, lowest coverage 16%, with two main causes: • object-creation problems (OCP), 65%: desirable receiver or argument objects are not generated • external-method call problems (EMCP), 27%

  39. Example Object-Creation Problem [OOPSLA 11]
  A graph example from the QuickGraph library, including two classes, Graph and DFSAlgorithm; Graph.AddEdge requires both vertices to already be in the graph.
  00: class Graph { …
  03:   public void AddVertex(Vertex v) {
  04:     vertices.Add(v); // B1
  }
  06:   public Edge AddEdge(Vertex v1, Vertex v2) {
  07:     if (!vertices.Contains(v1))
  08:       throw new VNotFoundException("");
  09:     // B2
  10:     if (!vertices.Contains(v2))
  11:       throw new VNotFoundException("");
  12:     // B3
  14:     Edge e = new Edge(v1, v2);
  15:     edges.Add(e); } }
  // DFS: DepthFirstSearch
  18: class DFSAlgorithm { …
  23:   public void Compute(Vertex s) { ...
  24:     if (graph.GetEdges().Size() > 0) { // B4
  25:       isComputed = true;
  26:       foreach (Edge e in graph.GetEdges()) {
  27:         ... // B5
  28:       }
  29:     } } }

  40. Example Object-Creation Problem [OOPSLA 11]
  • Test target: cover the true branch (B4) of Line 24
  • Desired object state: the graph should include at least one edge
  • Target sequence:
  Graph ag = new Graph();
  Vertex v1 = new Vertex(0);
  Vertex v2 = new Vertex(1);
  ag.AddVertex(v1);
  ag.AddVertex(v2);
  ag.AddEdge(v1, v2);
  DFSAlgorithm algo = new DFSAlgorithm(ag);
  algo.Compute(v1);
  (Graph and DFSAlgorithm code as on the previous slide.)
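The target sequence can be made concrete with a hypothetical Python port of the Graph/DFSAlgorithm pair; it shows why every call in the sequence is needed to put the graph into the desired state before Compute can reach block B4.

```python
# Hypothetical Python port of the QuickGraph-style example.

class Vertex:
    def __init__(self, id):
        self.id = id

class Graph:
    def __init__(self):
        self.vertices, self.edges = [], []
    def add_vertex(self, v):                    # B1
        self.vertices.append(v)
    def add_edge(self, v1, v2):
        # AddEdge requires both vertices to already be in the graph.
        if v1 not in self.vertices:
            raise ValueError("v1 not found")
        if v2 not in self.vertices:
            raise ValueError("v2 not found")
        self.edges.append((v1, v2))             # B3

class DFSAlgorithm:
    def __init__(self, graph):
        self.graph, self.is_computed = graph, False
    def compute(self, s):
        if len(self.graph.edges) > 0:           # B4: needs >= 1 edge
            self.is_computed = True

# Target sequence: each call is required; dropping either add_vertex
# makes add_edge throw, and dropping add_edge leaves B4 uncovered.
ag = Graph()
v1, v2 = Vertex(0), Vertex(1)
ag.add_vertex(v1)
ag.add_vertex(v2)
ag.add_edge(v1, v2)
algo = DFSAlgorithm(ag)
algo.compute(v1)
print(algo.is_computed)  # True
```

A random or purely constraint-driven generator rarely stumbles onto this exact eight-call sequence, which is why object creation dominates the coverage problems above.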

  41. Challenges Faced by Test Generation Tools • Ex: Dynamic Symbolic Execution (DSE) / Concolic Testing • Instrument code to explore feasible paths • Challenge: path explosion • Typically DSE instruments or explores only methods of the project under test • Third-party API external methods (network, I/O, …): too many paths, or uninstrumentable • Total block coverage achieved is 50%, lowest coverage 16% • object-creation problems (OCP): 65% • external-method call problems (EMCP): 27%

  42. Example External-Method Call Problems (EMCP)

  43. Challenges Faced by Test Generation Tools • Ex: Dynamic Symbolic Execution (DSE) / Concolic Testing • Instrument code to explore feasible paths • Challenge: path explosion • Total block coverage achieved is 50%, lowest coverage 16% • Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011

  44. What to Do Next? 2010 Dagstuhl Seminar 10111 Practical Software Testing: Tool Automation and Human Factors

  45. Conventional Wisdom: Improve Automation Capability @ NCSU ASE • Tackling object-creation problems • Seeker [OOPSLA 11], MSeqGen [ESEC/FSE 09], Covana [ICSE 11], OCAT [ISSTA 10], Evacon [ASE 08], Symclat [ASE 06] • Still not good enough (at least for now)! • Seeker (52%) > Pex/DSE (41%) > Randoop/random (26%) • Tackling external-method call problems • DBApp Testing [ESEC/FSE 11], [ASE 11] • CloudApp Testing [IEEE Soft 12] • These deal with only common environment APIs

  46. Example Object Creation Problem (OCP)
  • Test target: cover the true branch (B4) of Line 24
  • Desired object state: the graph should include at least one edge
  • Target sequence:
  Graph ag = new Graph();
  Vertex v1 = new Vertex(0);
  Vertex v2 = new Vertex(1);
  ag.AddVertex(v1);
  ag.AddVertex(v2);
  ag.AddEdge(v1, v2);
  DFSAlgorithm algo = new DFSAlgorithm(ag);
  algo.Compute(v1);
  (Graph and DFSAlgorithm code as on slide 39.)

  47. Unconventional Wisdom: Human Can Help! • Object-creation problems (OCP): tackle with human-written factory methods
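A minimal sketch of the factory-method idea, using hypothetical Python stand-ins for Pex-style factories: the human encodes the domain knowledge "an interesting Graph has at least one edge" once, and the tool then explores from states the factory produces instead of struggling to synthesize the call sequence itself.

```python
# Hypothetical Python stand-in for a human-written test-input factory.

class Vertex:
    def __init__(self, id):
        self.id = id

class Graph:
    def __init__(self):
        self.vertices, self.edges = [], []
    def add_vertex(self, v):
        self.vertices.append(v)
    def add_edge(self, v1, v2):
        assert v1 in self.vertices and v2 in self.vertices
        self.edges.append((v1, v2))

def create_graph_with_edge():
    """Factory method: captures the human insight that reaching the
    hard branch requires a graph with at least one edge."""
    g = Graph()
    v1, v2 = Vertex(0), Vertex(1)
    g.add_vertex(v1)
    g.add_vertex(v2)
    g.add_edge(v1, v2)
    return g

g = create_graph_with_edge()
print(len(g.vertices), len(g.edges))  # 2 1
```

The division of labor matches the talk's theme: the human contributes a few lines of domain knowledge; the tool keeps doing the mechanical path exploration.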

  48. Unconventional Wisdom: Human Can Help! • External-method call problems (EMCP): tackle with mock methods or method instrumentation, e.g., mocking System.IO.File.ReadAllText
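A minimal sketch of the mocking idea, using Python's unittest.mock in place of .NET mocking of System.IO.File.ReadAllText; parse_config and its config format are hypothetical. The mock turns an unexplorable external call into a test input the tool can control.

```python
# Sketch: replace an external file-read call with a mock so the file's
# contents become a controllable input (hypothetical example).
from unittest import mock

def parse_config(path):
    with open(path) as f:   # external call a DSE tool cannot explore
        text = f.read()
    return text.startswith("[ok]")

# With the mock in place, no real file is touched; the "file contents"
# are just another test input.
with mock.patch("builtins.open",
                mock.mock_open(read_data="[ok] setting=1")):
    print(parse_config("any/path.cfg"))  # True
```

Writing the mock is again a human contribution: the human states what the environment may return, and the tool can then drive both branches of parse_config by varying read_data.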

  49. Cooperative Software Testing and Analysis • Human-Assisted Computing • Driver: tool; Helper: human • Ex. Covana [ICSE 2011] • Human-Centric Computing • Driver: human; Helper: tool • Ex. Pex for Fun [ICSE 2013 SEE] • Interfaces are important. Contents are important too!

  50. Example Problems Faced by Tools • Symptoms: object-creation problems (OCP); external-method call problems (EMCP) • (Likely) causes: all non-primitive program inputs/fields (for OCP); all executed external-method calls (for EMCP)
