Non-Control-Data Attacks and Securing software by enforcing data-flow integrity

CS590 paper presentation Non-Control-Data Attacks and Securing software by enforcing data-flow integrity Zhiqiang Lin Mar 28, 2007

Non-Control-Data Attacks Are Realistic Threats
Shuo Chen, Jun Xu, Emre C. Sezer, Prachi Gauriar, and Ravishankar K. Iyer, USENIX Security’05
Credit: most slides of this presentation come from Shuo Chen’s.
Outline: Overview • Examples • Data flow Integrity • Discussions • Conclusions

Control Data Attack: Well-Known, Dominant • Control data attack: corrupt function pointers, jump targets and return addresses to run malicious code • E.g., code injection, mimicry attack and return-to-libc • Currently the most dominant form of memory corruption attacks [CERT and Microsoft Security Bulletin] • Carried out by exploiting vulnerabilities such as buffer overflow, format string bug, integer overflow, double free, etc.

Current defense techniques • Enforce control data integrity to provide security. Legal Control flow ?

Non-Control-Data Attack • Non-control-data attacks: attacks not corrupting any control data • i.e., attacks preserving the integrity of control flow of the victim process • Currently very rare in reality • Very few instances documented in literature. • Several papers: theoretically possible to construct non-control-data attacks against synthetic programs. • Not yet considered as a serious threat • How applicable are such attacks against real-world software? • Why rare: attackers’ incapability or lack of incentives? • No focused investigation yet.

Motivating Facts • Random hardware memory errors could subvert the security of real-world systems. • Boneh and DeMillo: random errors allow deriving secret keys in CRT-based RSA implementation. [Eurocrypt’97] • Our previous work: authentication of SSH and FTP servers, packet filtering of Linux firewalls can be compromised. [DSN’01 and DSN’02] • Govindavajhala and Appel: Java type system can be subverted. [S&P’03] • None of these is a control-data attack. A wide range of real-world software is susceptible. • Software vulnerabilities are more deterministic and more amenable to attacks. • Many software vulnerabilities are essentially “memory fault injectors”: overwriting an arbitrary memory location • Heap overflow • Double free • Format string bug • Integer overflow

General Applicability of Non-Control-Data Attacks • The claim: • Many real-world software applications are susceptible to non-control-data attacks. • The severity of the attack consequences is equivalent to that due to control data attacks. • Goal of their paper • Experimentally validate the claim • Construct non-control-data attacks to compromise the security of “representative” applications • Discuss the implications of the claim on current defensive techniques • Call for comprehensive defensive techniques

Realistic Non-Control-Data Attacks

Selection of Target Applications • Real-world applications, not synthetic applications. • Leading application categories • CERT advisories (2000 – 2004) • 84% are server vulnerabilities • HTTP service (18%), database service (10%), remote login service (8%), mail service (5%), FTP service (4%). • Selection criteria • Different types of vulnerabilities should be covered • Different types of server applications should be studied • Practical constraints for our selection • Uncertainties in many vulnerability reports: really exploitable? • Proprietary source code • Limited information about details of many vulnerabilities • Eventually, they selected • Open-source FTP, SSH, Telnet, HTTP servers • Stack buffer overflow, format string, heap corruption, integer overflow.

1. Non-Control-Data Attack against WU-FTPD Server (via a format string bug)

    int x;
    FTP_service(...) {
        authenticate();
        x = user ID of the authenticated user;
        seteuid(x);
        while (1) {
            get_FTP_command(...);
            if (a data command?)
                getdatasock(...);
        }
    }
    getdatasock( ... ) {
        seteuid(0);
        setsockopt( ... );
        seteuid(x);
    }

State trace (slide annotations, in order):
• x uninitialized, run as EUID 0
• x=109, run as EUID 0
• x=109, run as EUID 109. Lose the root privilege!
• Get a special SITE EXEC command. Exploit a format string vulnerability. x=0, still run as EUID 109.
• Get a data command (e.g., PUT)
• x=0, run as EUID 0
• x=0, run as EUID 0
• When return to service loop, still runs as EUID 0 (root). Allow us to upload /etc/passwd. We can grant ourselves the root privilege!
• Only corrupt an integer, not a control data attack.
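The trace above can be replayed with a minimal simulation. This is only a sketch: `sim_euid` and `sim_seteuid()` are assumed stand-ins for the real process EUID and `seteuid()`, and `exploit_format_string()` is a hypothetical helper that models the format string bug as a plain write to `x`. No real privileges are touched.

```c
/* Simulated process state; nothing here calls the real seteuid(). */
static int sim_euid = 0;  /* the server starts as root (EUID 0) */
static int x;             /* cached user ID, as in the slide's pseudocode */

static void sim_seteuid(int uid) { sim_euid = uid; }

/* Hypothetical stand-in for the format string bug: overwrites x with 0. */
static void exploit_format_string(void) { x = 0; }

static void getdatasock(void) {
    sim_seteuid(0);  /* temporarily regain root for setsockopt() */
    /* setsockopt(...) would run here */
    sim_seteuid(x);  /* intended to drop privileges again */
}

/* Replays one session and returns the EUID the service loop ends up with.
   attack != 0 inserts the SITE EXEC exploit before the data command. */
int run_session(int authenticated_uid, int attack) {
    sim_euid = 0;
    x = authenticated_uid;
    sim_seteuid(x);               /* root privilege dropped */
    if (attack)
        exploit_format_string();  /* x becomes 0, EUID is unchanged */
    getdatasock();                /* data command, e.g. PUT */
    return sim_euid;
}
```

With user ID 109, `run_session(109, 0)` ends at EUID 109, while `run_session(109, 1)` ends at EUID 0: the only corrupted value is the integer `x`.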

2. Non-Control-Data Attack against NULL-HTTP Server (via a heap overflow bug) • Attack the configuration string of CGI-BIN path. • Mechanism of CGI • Suppose server name = www.foo.com, CGI-BIN = /usr/local/httpd/exe • Requested URL = http://www.foo.com/cgi-bin/bar • The server executes /usr/local/httpd/exe/bar • Our attack • Exploit the vulnerability to overwrite CGI-BIN to /bin • Request URL http://www.foo.com/cgi-bin/sh • The server executes /bin/sh • The server gives me a root shell! Only overwrite four characters in the CGI-BIN string.
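A rough sketch of why the configuration string is security-critical. The buffer `cgi_bin`, the function `resolve_cgi()`, and `corrupt_cgi_bin()` are hypothetical names for illustration, not NULL-HTTP's actual code; the heap overflow is modeled as a direct overwrite of the string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical configuration buffer; the real server's layout differs. */
static char cgi_bin[64] = "/usr/local/httpd/exe";

/* Maps a URL path such as "/cgi-bin/bar" to the program that would be run.
   Returns 0 on success, -1 if this is not a CGI request or it does not fit. */
int resolve_cgi(const char *url_path, char *out, size_t out_len) {
    const char *prefix = "/cgi-bin/";
    size_t n = strlen(prefix);
    if (strncmp(url_path, prefix, n) != 0)
        return -1;
    int written = snprintf(out, out_len, "%s/%s", cgi_bin, url_path + n);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}

/* Stand-in for the heap overflow: rewrites the configuration string. */
void corrupt_cgi_bin(void) { strcpy(cgi_bin, "/bin"); }
```

Before the overwrite, "/cgi-bin/bar" resolves to /usr/local/httpd/exe/bar; after it, "/cgi-bin/sh" resolves to /bin/sh, with no change to the server's control flow.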

3. Non-Control-Data Attack against SSH Communications SSH Server (via an integer overflow bug)

    void do_authentication(char *user, ...) {
        int auth = 0;
        ...
        while (!auth) {
            /* Get a packet from the client */
            type = packet_read();
            switch (type) {
                ...
                case SSH_CMSG_AUTH_PASSWORD:
                    if (auth_password(user, password))
                        auth = 1;
                case ...
            }
            if (auth) break;
        }
        /* Perform session preparation. */
        do_authenticated(…);
    }

State trace (slide annotations, in order):
• auth = 0
• auth = 0
• auth = 1
• auth = 1
• Password incorrect, but auth = 1
• Logged in without correct password

4. More Non-Control-Data Attacks • Against NetKit Telnet server (default Telnet server of Redhat Linux) • Exploit a heap overflow bug • Overwrite two strings: /bin/login -h foo.com -p (normal scenario), /bin/sh -h -p -p (attack scenario) • The server runs /bin/sh when it tries to authenticate the user. • Against GazTek HTTP server • Exploit a stack buffer overflow bug • Send a legitimate URL http://www.foo.com/cgi-bin/bar • The server checks that “/..” is not embedded in the URL • Exploit the bug to change the URL to http://www.foo.com/cgi-bin/../../../../bin/sh • The server executes /bin/sh
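The GazTek case is a check-then-corrupt pattern: the input is validated, then changed before use. The sketch below assumes a hypothetical request buffer `url` and models the stack overflow as an optional overwrite between the check and the use; none of these names come from the real server.

```c
#include <string.h>

static char url[128];  /* hypothetical request buffer */

/* The server's validation: reject URLs that embed "/..". */
int url_is_safe(const char *u) { return strstr(u, "/..") == NULL; }

/* Returns 1 if the server would go on to execute whatever url now holds.
   overwrite, if non-NULL, models the overflow that runs after the check. */
int handle_request(const char *request, const char *overwrite) {
    strncpy(url, request, sizeof url - 1);
    url[sizeof url - 1] = '\0';
    if (!url_is_safe(url))
        return 0;                          /* check */
    if (overwrite) {
        strncpy(url, overwrite, sizeof url - 1);
        url[sizeof url - 1] = '\0';        /* corruption after the check */
    }
    return 1;                              /* use: url is executed as a CGI path */
}

const char *last_url(void) { return url; }
```

Sending the traversal URL directly is rejected, but sending a legitimate URL and overwriting it afterwards passes the check and leaves an unsafe path in the buffer.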

What Do Non-Control-Data Attacks Imply? • Control flow integrity is not a sufficiently accurate approximation to software security. • Many types of non-control data are critical to security • User identity data • Configuration data • User input data • Decision-making data • Once attackers have the incentive, they are likely to succeed in non-control-data attacks.

Securing software by enforcing data-flow integrity
Miguel Castro, Microsoft Research; Manuel Costa, Microsoft Research Cambridge; Tim Harris, Microsoft Research
OSDI’06

Motivation • Most software in use today is written in C or C++. This body of software has a large number of defects, and there exist many ways to exploit them, such as corrupting control data. • Removing or avoiding all defects is hard. Although it is possible to prevent attacks based on control-data exploits, certain attacks can succeed without compromising control flow, in particular non-control-data attacks.

Basic Idea – Data Flow Integrity (DFI) • A technique that computes a data-flow graph for a vulnerable program, and instruments the program to ensure that the flow of data at runtime is allowed by the data-flow graph. • It can be applied to existing C and C++ programs automatically, because it requires no modifications to the source and does not generate false positives.

DFI – High level Overview (1/2) • Analysis Part • Using reaching definition analysis to compute a data-flow graph at compile time. • For every load, compute the set of stores that may produce the loaded data. • An ID is assigned to every store operation and for each load, the set of allowed IDs is computed. • In compiler theory, a reaching definition for a given instruction is another instruction, the target variable of which may reach the given instruction without an intervening assignment.

Example 1 (d1 does not reach d3, because d2 reassigns y first):
    d1 : y := 3
    d2 : y := 4
    d3 : x := y

Example 2 (d1 is a reaching definition for d2):
    d1 : y := 3
    d2 : x := y
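For straight-line code like the two small programs on this slide, the reaching definition of a use is simply the most recent earlier assignment to the same variable. The sketch below covers only that simplified case (no branches or loops, which real reaching-definition analysis must handle); `struct stmt` and `reaching_def()` are illustrative names, not part of DFI or Phoenix.

```c
/* One statement "target := source"; source == 0 stands for a constant. */
struct stmt {
    char target;
    char source;
};

/* Returns the index of the definition of var that reaches statement `at`
   in a straight-line program, or -1 if no earlier statement defines var. */
int reaching_def(const struct stmt *prog, int at, char var) {
    for (int i = at - 1; i >= 0; i--) {
        if (prog[i].target == var)
            return i;  /* the closest earlier assignment hides older ones */
    }
    return -1;
}
```

For the program y := 3; y := 4; x := y, the use of y at index 2 is reached by index 1 (d2), not index 0 (d1). For y := 3; x := y, it is reached by index 0 (d1).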

DFI – High level Overview (2/2) • Enforcing Part (the results of the analysis are used to add run-time checks that enforce data-flow integrity) • Stores are instrumented to write their ID into the runtime definition table (RDT). The RDT keeps track of the last store to write to each memory location. • Loads are instrumented to check if the store ID in the RDT is in their set of allowed writes. If the store ID is not in the set during a check, an exception is raised.

Example: vulnerable code in C and its high-level intermediate representation (Phoenix compiler infrastructure)

Static Analysis • Compute reaching definitions using a combination of two analyses: • flow-sensitive intra-procedural analysis • flow-insensitive and context-insensitive inter-procedural analysis. • They operate on Phoenix's high-level intermediate representation. • In the example, the set of reaching definitions is {1,8} for both uses of authenticated (in lines 2 and 10).

Instrumentation • SETDEF opnd id • CHECKDEF opnd setName. • The first instruction sets the RDT entry for opnd to id. • The second retrieves the runtime definition identifier for opnd from the RDT and checks if the identifier is in the reaching definitions set with name setName. • The compiler maintains a map from set names to set values that is used when lowering CHECKDEF instructions to the assembly of the target machine.
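The effect of the two instructions can be sketched as ordinary C functions. This is an illustration under stated assumptions, not the paper's generated code: the RDT is a small array indexed by a hypothetical slot number rather than by address, `checkdef()` returns 0 where the real instrumentation raises an exception, and store ID 5 is an invented ID for an out-of-bounds write.

```c
#include <stddef.h>
#include <stdint.h>

#define RDT_SLOTS 1024

/* Runtime definition table: ID of the last store to each (hypothetical) slot. */
static uint16_t rdt[RDT_SLOTS];

/* SETDEF opnd id: record that store `id` last wrote this slot. */
void setdef(size_t slot, uint16_t id) { rdt[slot] = id; }

/* CHECKDEF opnd setName: 1 if the last writer is in the allowed set,
   0 otherwise (where the instrumented program would raise an exception). */
int checkdef(size_t slot, const uint16_t *allowed, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (rdt[slot] == allowed[i])
            return 1;
    }
    return 0;
}
```

With the allowed set {1,8} from the static analysis slide, a load passes after `setdef(slot, 1)` or `setdef(slot, 8)` and fails after a store with any other ID, such as the assumed ID 5.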

Instrumented Example code SETDEF opnd id CHECKDEF opnd setName. Note: Every Store is instrumented for the check

Optimizations • Renaming equivalent definitions • Removing bounds checks on writes • Removing SETDEFs and CHECKDEFs • Optimizing membership checks • Removing SETDEFs for safe definitions
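One way a membership check can be made cheap, assuming definitions have been renamed so that a load's allowed IDs form a consecutive range [lo, hi]: the whole set test collapses into one subtraction and one unsigned comparison. The function name `in_range_set()` is illustrative, and this sketch covers only the consecutive-range case.

```c
#include <stdint.h>

/* Tests lo <= id <= hi with a single comparison, assuming lo <= hi.
   If id < lo, the unsigned subtraction wraps to a huge value and fails. */
int in_range_set(uint32_t id, uint32_t lo, uint32_t hi) {
    return (uint32_t)(id - lo) <= (uint32_t)(hi - lo);
}
```

For the range [3, 8], IDs 3, 5 and 8 pass while 2 and 9 fail.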

Evaluation - Performance

Evaluation – space overhead

Evaluation - Performance

Evaluation – effectiveness against attacks • Synthetic attacks • Wilander’s buffer overflow testbed • NullHttpd • Corrupting cgi-bin configuration string • SSH • Overwrite a stack variable • Stunnel • A format string attack == control data attack • No false positives


Discussions on Current Defensive Techniques • Defenses based on control flow integrity • Monitor system call sequences • Protect control data • Non-executable stack and heap • Pointer encryption (PointGuard) • Identifying pointers in low-level code is really challenging • Address space randomization • Challenge: need to randomize every program segment • Limitation: 32-bit address space cannot provide sufficient entropy • Memory safety enforcement • Promising direction, e.g., CCured, Cyclone, CRED • Currently difficult to migrate existing large code bases to memory-safe versions. Incurs runtime overhead. Difficult to ensure memory safety for low-level code. • Data flow integrity • Efficient • High performance overhead (1.5X-2.7X) • Points-to analysis in inter-procedural analysis? • Still open: how to design a generic and secure defense?

Mitigating Factors • Requiring application-specific semantic knowledge • Control-data attack: unrelated to the semantics of the victim process (hijack the control flow, do whatever you like) • Non-control-data attack: relies on the semantics of the victim process • Not a fundamental constraint • Semantics of widely used applications will be well understood, if attackers have strong incentives • The more instances attackers see, the easier they can clone new ones. A matter of experience. • Lifetime of security-critical data • Attacks are not possible if the vulnerabilities exist outside the lifetime of the target data. • Programs can be modified to reduce data lifetime to enhance security.

Reducing Data Lifetime for Security Lifetime of seteuid() argument

Reducing Data Lifetime for Security Lifetime of auth flag
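A minimal sketch of the idea for the auth flag, under assumptions: `struct frame` models the stack slot that holds auth, a packet with `malicious` set models the memory corruption that happens inside `packet_read()`, and the password constant is invented. None of this is the SSH server's real code.

```c
#include <string.h>

struct frame { int auth; };  /* models the stack slot holding auth */
struct packet { const char *password; int malicious; };

/* Models packet_read(): a malicious packet corrupts the caller's auth slot. */
static void packet_read(struct frame *f, const struct packet *p) {
    if (p->malicious)
        f->auth = 1;
}

static int auth_password(const char *pw) {
    return strcmp(pw, "correct-password") == 0;  /* hypothetical check */
}

/* Original structure: auth is initialized early and is live across
   packet_read(), so a corrupted value survives until it is tested. */
int login_long_lifetime(const struct packet *p) {
    struct frame f = { 0 };
    packet_read(&f, p);
    if (auth_password(p->password))
        f.auth = 1;
    return f.auth;
}

/* Reduced lifetime: auth is assigned only after packet_read() returns,
   so anything written to the slot during the read is discarded. */
int login_short_lifetime(const struct packet *p) {
    struct frame f;
    packet_read(&f, p);
    f.auth = auth_password(p->password);
    return f.auth;
}
```

A malicious packet with a wrong password logs in through `login_long_lifetime()` but not through `login_short_lifetime()`, while a correct password still succeeds in both.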


Conclusions • Many real-world software applications are susceptible to attacks that do not hijack program control flow. • Constructing a generic and secure defensive technique to defeat both control-data attacks and non-control-data attacks is still an open problem? (DFI is the best so far?)

Conclusions • Other possible methods: • “Reducing data lifetime is a secure programming practice to increase software resilience to attacks.” • …
