Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior

24th ACM SOSP (November 2013), Best Paper. Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior. Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, Armando Solar-Lezama, MIT CSAIL. Outline: Introduction, Model for Unstable Code, Design & Implementation

Presentation Transcript


  1. 24th ACM SOSP (November 2013) Best Paper • Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior • Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, Armando Solar-Lezama • MIT CSAIL

  2. Outline • Introduction • Model for Unstable Code • Design & Implementation • Evaluation A Seminar at Advanced Defense Lab

  3. Introduction • The specifications of C-family languages designate certain code fragments as having undefined behavior, giving compilers the freedom to generate instructions that behave in arbitrary ways in those cases. • Aimed at systems programming, the specifications choose to trust programmers and assume that their code will never invoke undefined behavior.

  4. Undefined Behavior in C • Notation: p, q, p′ are n-bit pointers • x, y are n-bit integers • a is an array

  5. Compiler Optimization • One way in which compilers exploit undefined behavior is to optimize a program under the assumption that the program NEVER invokes undefined behavior. • Consequence: the original program and the optimized program can behave differently (original program ≠ optimized program). • We call such code optimization-unstable code, or just unstable code for short.

  6. Unstable Code Example • Vulnerability Note VU#162289 (US-CERT) [link] • => The compiler concludes that the check is always false.
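
A minimal sketch of the kind of bounds check at issue in VU#162289, assuming a buffer delimited by `buf` and `buf_end`. The function and parameter names are hypothetical and are not taken from the slide; the second function shows one possible stable rewrite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Unstable: forming buf + len beyond the end of the object is undefined
 * behavior in C, so a compiler may assume buf + len >= buf and drop the
 * second test entirely. */
bool fits_unstable(const char *buf, const char *buf_end, size_t len)
{
    if (buf + len >= buf_end)
        return false;
    if (buf + len < buf)      /* intended wraparound check; may be folded to false */
        return false;
    return true;
}

/* Stable: compare lengths instead of pointers, so no out-of-range
 * pointer is ever computed. */
bool fits_stable(const char *buf, const char *buf_end, size_t len)
{
    assert(buf <= buf_end);   /* precondition assumed by this sketch */
    return len < (size_t)(buf_end - buf);
}
```

With a 16-byte buffer, `fits_stable` accepts a length of 8 and rejects both 16 and a huge length such as `(size_t)-1`, without relying on pointer wraparound.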

  7. Unstable Code Example (cont.) • CVE-2009-1897 [link] • Linux Kernel 2.6.30 [LXR link] • The programmer placed the null check in the wrong position, after the pointer had already been dereferenced, yet the unoptimized code still behaves as intended. • => The compiler concludes that the check is always false.
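
A minimal sketch of the pattern behind CVE-2009-1897: a pointer is dereferenced before it is checked against null. The struct and function names below are hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <stddef.h>

#define POLL_ERROR (-1)   /* hypothetical error code for this sketch */

struct sock_stub { int ready; };            /* hypothetical stand-in types */
struct tun_stub  { struct sock_stub *sk; };

/* Unstable: tun is dereferenced before the null check. Dereferencing a
 * null pointer is undefined, so the compiler may conclude tun != NULL
 * and delete the check. */
int poll_unstable(struct tun_stub *tun)
{
    struct sock_stub *sk = tun->sk;
    if (!tun)
        return POLL_ERROR;
    return sk->ready;
}

/* Stable: check first, dereference afterwards. */
int poll_stable(struct tun_stub *tun)
{
    if (!tun)
        return POLL_ERROR;
    return tun->sk->ready;
}
```

Only `poll_stable` is safe to call with a null argument; calling `poll_unstable(NULL)` would itself invoke undefined behavior.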

  8. Is this the programmers’ fault? • Poor understanding of unstable code is a major obstacle to reasoning about system behavior. • However, these bugs are quite subtle, and understanding them requires detailed knowledge of the language specification.

  9. Is this the compilers’ fault? • A story: GCC bug #30475 (2007/01/15) [link] • A GCC user: “This will create MAJOR SECURITY ISSUES in ALL MANNER OF CODE. I don’t care if your language lawyers tell you gcc is right. . . . FIX THIS! NOW!” • A GCC developer: “I am not joking, the C standard explicitly says signed integer overflow is undefined behavior. . . . GCC is not going to change.”

  10. Unstable Code Test • The default optimization level for release builds is -O2.

  11. Model for Unstable Code • C*: a C dialect that assigns well-defined semantics to code fragments that have undefined behavior in C. • P: program • e: expression or code fragment • P[e/e′]: program P with e replaced by e′ • Definition: Unstable code • A code fragment e in program P is unstable w.r.t. language specifications C and C* iff there exists a fragment e′ such that the transformation from P to P[e/e′] is legal under C but not under C*.

  12. Approach for Identifying Unstable Code • Stack does this using a two-phase scheme: • Phase 1: run optimizer O without taking advantage of undefined behavior, which resembles optimizations under C*. • Phase 2: run optimizer O again, this time taking advantage of undefined behavior, which resembles (more aggressive) optimizations under C. • Code that is optimized away only in the second phase is reported as unstable.

  13. Well-defined Program Assumption • x: input • Re(x): reachability condition => under input x, will e be reached? • Ue(x) or UB: undefined behavior condition => under input x, will e exhibit undefined behavior in C? • Definition: Well-defined program assumption • A code fragment e is well-defined on an input x iff executing e never triggers undefined behavior at e. • A program P is well-defined on an input x iff every fragment of the program is well-defined on that input, denoted as Δ(x).
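
Written out with the symbols above, the assumption can be expressed as a single formula. The formula itself does not appear in the transcript, so the notation below is a reconstruction from the two definitions on this slide and should be read as an assumption:

$$
\Delta(x) \;=\; \bigwedge_{e \in P} \bigl( R_e(x) \rightarrow \neg U_e(x) \bigr)
$$

In words: for every fragment e of P, if input x reaches e, then e does not trigger undefined behavior on x.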

  14. Eliminating Unreachable Code • Theorem: Elimination • In a well-defined program P, an optimizer can eliminate code fragment e, if there is no input x that both reaches e and satisfies the well-defined program assumption Δ(x).
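
As a worked form of the theorem, the elimination condition can be stated as an unsatisfiability query. This is a sketch derived from the wording above, not a formula preserved in the transcript:

$$
\neg \exists\, x :\; R_e(x) \wedge \Delta(x)
$$

Read together with the two-phase scheme, a fragment e would plausibly be flagged as unstable when $R_e(x)$ alone is satisfiable (e survives without the undefined-behavior assumption) but $R_e(x) \wedge \Delta(x)$ is not (e is eliminated once the assumption is used).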

  15. Simplifying Unnecessary Computation • Theorem: Simplification
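
The statement of the simplification theorem is missing from the transcript. By analogy with the elimination theorem, and assuming $e(x)$ denotes the value of expression e on input x, a plausible form of the condition under which an optimizer may replace e with a simpler e′ is:

$$
\neg \exists\, x :\; e(x) \neq e'(x) \;\wedge\; R_e(x) \;\wedge\; \Delta(x)
$$

That is, no well-defined input that reaches e can distinguish e from e′.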

  16. Simplification Oracle • Boolean oracle: propose true and false in turn for a boolean expression, enumerating possible values. • Algebra oracle: propose to eliminate common terms on both sides of a comparison if one side is a subexpression of the other. • Example: x + y < x simplifies to y < 0.
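
The algebra-oracle example can be illustrated in C with signed integers. This is a minimal sketch with hypothetical function names: the first function is the unstable overflow check, the second is what it may be simplified to, and the third is one possible stable alternative that tests against the limits before adding.

```c
#include <limits.h>
#include <stdbool.h>

/* Unstable: signed overflow in x + y is undefined, so under C the
 * comparison may be simplified to y < 0. */
bool add_wraps_unstable(int x, int y)
{
    return x + y < x;
}

/* The simplified form the algebra oracle would propose. */
bool add_wraps_simplified(int x, int y)
{
    (void)x;
    return y < 0;
}

/* Stable: decide whether x + y would overflow without performing
 * the addition. */
bool add_would_overflow(int x, int y)
{
    if (y > 0)
        return x > INT_MAX - y;
    if (y < 0)
        return x < INT_MIN - y;
    return false;
}
```

On inputs where the addition does not overflow, the unstable and simplified forms agree, which is exactly why the simplification is legal under C.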

  17. Limitation • Compilers can exploit the well-defined program assumption in forms other than elimination and simplification; unstable code affected only by such optimizations is not covered by this model.

  18. Design & Implementation • Implemented with LLVM and the Boolector solver.

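  19. Compiler Frontend • Some unstable code is generated by the compiler itself (for example, through macro expansion or function inlining) rather than written by the programmer. • To reduce false warnings, Stack ignores such compiler-generated code by tracking code origins, at the cost of missing possible bugs.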

  20. UB Condition Insertion • For each instruction that can exhibit undefined behavior, Stack inserts a special function call into the IR at the corresponding instruction: • void bug_on(bool expr)
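
Stack performs this insertion on LLVM IR. As a source-level illustration only, the sketch below annotates a signed division with its two undefined-behavior conditions. The body of `bug_on`, the `ub_seen` flag, and `checked_div` are all hypothetical and exist just to make the example runnable.

```c
#include <limits.h>
#include <stdbool.h>

bool ub_seen = false;   /* hypothetical recorder used only by this sketch */

/* Stand-in for the special function: here it merely records that an
 * undefined-behavior condition held. */
void bug_on(bool expr)
{
    if (expr)
        ub_seen = true;
}

/* Signed division annotated with its two UB conditions:
 * division by zero, and INT_MIN / -1 (signed overflow). */
int checked_div(int x, int y)
{
    bool ub = (y == 0) || (x == INT_MIN && y == -1);
    bug_on(y == 0);
    bug_on(x == INT_MIN && y == -1);
    return ub ? 0 : x / y;   /* skip the division when it would be undefined */
}
```

The argument of each `bug_on` call plays the role of the undefined-behavior condition Ue(x) for that instruction.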

  21. Solver-based Algorithm • To implement these algorithms, Stack consults the Boolector solver to decide satisfiability for elimination and simplification queries. • But it is practically infeasible to compute the terms of these queries precisely for large programs. • To address this challenge, Stack computes approximate queries by limiting the computation to a single function. • Reachability conditions are computed with Tu and Padua’s algorithm.
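
Stack answers these queries with a bit-vector solver. As a toy stand-in (an assumption of this sketch, not how Stack is implemented), the same two queries can be answered by brute force over an 8-bit model of the check `if (x + 100 < x)`. All function names are hypothetical.

```c
#include <stdbool.h>

/* Wrap a widened result into the 8-bit signed range, modeling
 * two's-complement wraparound as a C*-style dialect might define it. */
int wrap8(int wide)
{
    while (wide > 127)
        wide -= 256;
    while (wide < -128)
        wide += 256;
    return wide;
}

/* R_e(x): the body of `if (x + 100 < x)` is reached under wraparound. */
bool reaches_body(int x)
{
    return wrap8(x + 100) < x;
}

/* U(x): the 8-bit addition x + 100 overflows, which is UB under C. */
bool add_overflows(int x)
{
    return x + 100 > 127 || x + 100 < -128;
}

/* Phase 1 (C*): is there no input at all that reaches the body? */
bool unreachable_without_ub_assumption(void)
{
    for (int x = -128; x <= 127; x++)
        if (reaches_body(x))
            return false;
    return true;
}

/* Phase 2 (C): is there no input that reaches the body and is also
 * well-defined, i.e. does not overflow? */
bool unreachable_with_ub_assumption(void)
{
    for (int x = -128; x <= 127; x++)
        if (reaches_body(x) && !add_overflows(x))
            return false;
    return true;
}

/* Unstable: survives phase 1 but is eliminated in phase 2. */
bool body_is_unstable(void)
{
    return !unreachable_without_ub_assumption()
        && unreachable_with_ub_assumption();
}
```

Every input that reaches the body does so only by overflowing, so the body is reachable in phase 1 and unreachable in phase 2, which is the signature of unstable code. A real solver replaces the exhaustive loops with a satisfiability check over symbolic bit-vectors.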

  22. Evaluation • New bugs found: 160 (July 2012 to March 2013)

  23. Analysis of Bug Reports • Non-optimization bugs • Urgent optimization bugs • Time bombs • Redundant code (false alarm)

  24. Analysis of Bug Reports (cont.) • Non-optimization bugs • Example: PostgreSQL [link] • Time bomb!

  25. Precision • Kerberos: Stack produced 11 warnings • Developers accepted every patch • False warning rate: 0/11 • Postgres: Stack produced 68 warnings • 9 patches accepted • 29 patches in discussion: developers blamed compilers • 26 time bombs • 4 false warnings

  26. Performance • 64-bit Ubuntu (Linux) • Intel Core i7-980 3.3 GHz • 24 GB memory • Solver timeout: 5 s

  27. Prevalence of Unstable Code • All packages in the Debian Wheezy archive: 17,432 • Containing C/C++ code: 8,575 • Containing unstable code: 3,471 (40% of the C/C++ packages) • 150 CPU-days to analyze

  28. Prevalence of Unstable Code (cont.)

  29. Completeness • It is difficult to know precisely how much unstable code Stack would miss in general. • We analyze what kind of unstable code Stack misses. • A total of ten tests from real systems • Result: Stack identified 7 of the 10.

  30. Q & A