Kheiron: Runtime Adaptation of Native-C and Bytecode Applications

Rean Griffith, Gail Kaiser • Programming Systems Lab (PSL), Columbia University • June 14 2006 • Presented by Rean Griffith, rg2023@cs.columbia.edu

Overview • Introduction • Problem • Solution • System Operation • Feasibility Experiments • Supported Adaptations • Conclusions & Future Work

Introduction • Self-healing systems are supposed to reduce the cost and complexity of system management. • Extra facilities for problem detection, diagnosis and remediation help end-users and administrators. • Sounds great, where do I get one?

Problem • Existing/legacy systems don’t have all the self-healing mechanisms they’ll ever need. • Tomorrow’s systems won’t have all of them either. • It’s impractical, costly and time-consuming to re-design, re-build and re-deploy new self-healing versions. • What happens when we need a new self-healing facility?

6 Questions • Can we retro-fit self-healing mechanisms onto existing systems as a form of system adaptation? • How could we do it? • Can we do it on-the-fly? • Can we do things in a general way rather than as ad-hoc, one-time fixes? • Sounds risky. If we can do it, can we give any guarantees? • What kinds of self-healing mechanisms can we add?

3.5 Quick Answers

How can we do it? • Observation: All software systems run in a software execution environment (EE). Use it as the lowest common denominator for adapting live systems. • Hypotheses: • The execution environment is a feasible target for efficiently and transparently effecting adaptations in the applications it hosts. • Existing facilities in unmodified execution environments can be used to effect runtime adaptations. • Any guarantees we give are a function of the execution environment and its operation.

Solution Considerations • Two kinds of execution environments: • Un-managed/native [Processor + OS e.g. x86 + Linux] • Managed [JVM/CLR] • What do we need from the EE? • Facility for tracing program execution. • Facility for controlling program execution. • Access to metadata about the units of execution. • Facility for adding/editing metadata.

Comparing Execution Environments

System Architecture from 10,000ft

How Kheiron Works • Attaches to programs while they run or when they load. • Interacts with programs while they run at various points of their execution. • Augments type definitions and/or executable code • Needs metadata – rich metadata is better • Interposes at method granularity, inserting new functionality via method prologues and epilogues. • Control can be transferred into/out of adaptation library logic • Control-flow changes can be done/un-done dynamically
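The interposition idea above can be illustrated with a minimal C sketch. It is not Kheiron's implementation: Kheiron rewrites running binaries or bytecode, whereas this sketch assumes every call already goes through a dispatch slot. The names `method_slot`, `invoke`, `attach_hooks` and `detach_hooks` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type; not part of Kheiron's actual interface. */
typedef void (*hook_fn)(const char *method_name);

/* One interposable method: its name stands in for EE metadata. */
typedef struct {
    const char *name;     /* method name */
    int (*body)(int);     /* original, unmodified method body */
    hook_fn prologue;     /* runs before the body when non-NULL */
    hook_fn epilogue;     /* runs after the body when non-NULL */
} method_slot;

int prologue_calls = 0;
int epilogue_calls = 0;

/* Stand-ins for adaptation-library logic: they only count invocations. */
void count_prologue(const char *method_name) { (void)method_name; prologue_calls++; }
void count_epilogue(const char *method_name) { (void)method_name; epilogue_calls++; }

/* An ordinary application method that knows nothing about the hooks. */
int square(int x) { return x * x; }

/* All calls go through the slot, so hook changes take effect on the next call. */
int invoke(method_slot *m, int arg) {
    assert(m != NULL && m->body != NULL);
    if (m->prologue) m->prologue(m->name);
    int result = m->body(arg);
    if (m->epilogue) m->epilogue(m->name);
    return result;
}

/* "Do" the control-flow change. */
void attach_hooks(method_slot *m, hook_fn pro, hook_fn epi) {
    assert(m != NULL);
    m->prologue = pro;
    m->epilogue = epi;
}

/* "Un-do" the control-flow change; the method runs bare again. */
void detach_hooks(method_slot *m) {
    attach_hooks(m, NULL, NULL);
}
```

A caller would build a `method_slot` for `square`, call `invoke` before and after `attach_hooks`, and see the counters change only while the hooks are attached. The result of the method is the same in both cases, which is the sense in which the interposition is transparent to the application.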

System Operation

Kheiron/C Operation • [Figure: the Kheiron/C mutator uses the Dyninst API to insert snippets at points in the application, e.g. void foo( int x, int y) { int z = 0; }. Other labels in the figure: Dyninst Code, C/C++ Runtime Library, ptrace/procfs.]

Kheiron/JVM Operation • [Figure: preparing and creating a shadow method, panels A to C (Prepare Shadow, Create Shadow). SampleMethod's bytecode method body becomes the body of a shadow method, _SampleMethod, and SampleMethod receives a new bytecode method body that calls _SampleMethod.] • Resulting wrapper:
SampleMethod( args ) [throws NullPointerException]
    <room for prolog>
    push args
    call _SampleMethod( args ) [throws NullPointerException]
        { try{…} catch (IOException ioe){…} } // Source view of _SampleMethod’s body
    <room for epilog>
    return value/void

Experiments • Goal: Measure the feasibility of our approach. • Look at the impact on execution when no repairs/adaptations are active. • Selected compute-intensive applications as test subjects (SciMark and Linpack). • Unmanaged experiments • P4 2.4 GHz processor, 1GB RAM, SUSE 9.2, 2.6.8x kernel, Dyninst 4.2.1. • Managed experiments • P3 Mobile 1.2 GHz processor, 1GB RAM, Windows XP SP2, Java HotspotVM v1.5 update 04.

Kheiron/C – Results

Kheiron/JVM – Results

What did we learn from our experiments? • Our approach is feasible, with roughly 1% to 5% runtime overhead when no repairs are active. • Kheiron is transparent to both the application and the unmodified execution environment. • More/richer metadata makes things “easier”. • It is easier to navigate and make changes in managed execution environments than in their un-managed counterparts. • We can perform and undo our changes on-the-fly, allowing us to manage the performance impact. • We use a general approach where we can hook/interpose at method-granularity in a variety of execution environments.

Unmanaged Execution Environment Metadata • Not enough information to support type discovery and/or type relationships. • No APIs for metadata manipulation. • In the managed world, units of execution are self-describing.

Adaptation Guarantees • Managed execution environments give guarantees about: • Valid executables – bytecode verification. • Security attributes – security sandboxes and permissions/policies. • These guarantees are encoded in metadata in the units of execution. • Any inserted adaptations are bound by the same rules as the original application. • Un-managed execution environments don’t give the same guarantees.

Supported Adaptations • Instrumentation insertion/removal. • Component/structure instance-caching. • Periodic/on-demand consistency checks on cached components or sub-system interfaces. • Hot component swaps. • Function-input filters. • Residual testing. • Ghost Transactions – (POST for software). • Selective Emulation (compiled C-binaries).

Selective Emulation Using STEM + Dyninst • STEM – an instruction-level x86 emulator developed by another group at Columbia (Locasto et al.). • Dyninst – a toolkit for instrumenting running C-applications.

How it works • Running an application in an emulator/sandbox isn’t a new idea • Security benefits • Isolation benefits • High overheads associated with whole-program execution – Valgrind, Bochs, original STEM. • Idea: Vary, at runtime, the portions of the application which run inside the STEM emulator to manage the performance impact.

Background on STEM • Original STEM works at the source level:
void foo() {
    int i = 0;
    // save cpu registers macro
    emulate_init();
    // begin emulation function call
    emulate_begin();
    i = i + 10;
    // end emulation function call
    emulate_end();
    // commit/restore cpu registers macro
    emulate_term();
}

Using un-modified Dyninst 4.2.1
void foo() {
    int i = 0;
    // save cpu registers macro
    emulate_init();   // Oops…can’t inject macros with Dyninst
    // begin emulation function call
    emulate_begin();  // OK to inject function calls with Dyninst
    i = i + 10;
    // end emulation function call
    emulate_end();    // OK to inject function calls with Dyninst
    // commit/restore cpu registers macro
    emulate_term();   // Oops…can’t inject macros with Dyninst
}

Modified STEM + Dyninst • Modify the Dyninst trampoline to save CPU state to a memory address (rather than the stack) before the method call. • Use the Dyninst API to allocate memory areas in the target process address space for a register storage area and a code storage area. • Save the instructions relocated by the trampoline in the code storage area, to prime STEM’s instruction pipeline. • Use the Dyninst API to insert calls to our RegisterSave and EmulatorPrime functions, which configure STEM. • Use the Dyninst API to insert calls to STEM’s emulate_begin(). • Modify STEM to keep track of its stack depth (initially set to 0); emulation ends when a ret/leave instruction is encountered at stack depth 0. The search for emulate_end goes away.
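The stack-depth rule in the last bullet can be sketched in C over a toy instruction trace. This is an illustration under stated assumptions, not STEM's code: `toy_op` and `emulate_until_return` are hypothetical names, the trace is a pre-decoded list rather than real x86 instructions, and `OP_RET` stands for the ret/leave case mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes standing in for decoded x86 instructions (assumption). */
typedef enum { OP_OTHER, OP_CALL, OP_RET } toy_op;

/*
 * Walks a trace of instructions in execution order, starting at stack
 * depth 0. A call increases the depth; a return at depth > 0 decreases it.
 * A return seen at depth 0 belongs to the function where emulation began,
 * so emulation stops there and no emulate_end marker is needed.
 *
 * Returns the number of instructions emulated, including the final return.
 * If the trace runs out first, returns count.
 */
size_t emulate_until_return(const toy_op *trace, size_t count) {
    assert(trace != NULL || count == 0);
    int depth = 0;
    for (size_t i = 0; i < count; i++) {
        switch (trace[i]) {
        case OP_CALL:
            depth++;              /* entered a nested callee */
            break;
        case OP_RET:
            if (depth == 0) {
                return i + 1;     /* leaving the emulated function */
            }
            depth--;              /* returned from a nested callee */
            break;
        case OP_OTHER:
            break;                /* ordinary instruction, depth unchanged */
        }
    }
    return count;
}
```

For the trace other, call, other, ret, other, ret, other, the first ret only unwinds the nested call, and emulation stops at the second ret after six instructions.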

Conclusions – 6 Answers • Kheiron can be used to efficiently and transparently retro-fit self-healing mechanisms onto existing systems as a form of adaptation. • Kheiron uses facilities and characteristics of the unmodified execution environment to adapt running programs. • Changes can be done/un-done at runtime to manage the performance impact as well as give flexibility in evolving the system. • Based on metadata, and its verification/validation rules, we can extend existing systems in a general way. • Guarantees on application properties are a function of the execution environment. • Kheiron supports a wide range of adaptations.

Future Work • Kheiron can be used for disturbance/fault injection. • Working on a methodology for benchmarking self-healing systems with respect to the efficacy of their self-healing mechanisms (extensions to work done by Aaron Brown et al.). • Actively looking for systems to field-test/refine/reject ideas about our proposed benchmarking methodology for my thesis.

Questions, Comments, Queries? Thank you for your time and attention. Contact: Rean Griffith rg2023@cs.columbia.edu [reanG@us.ibm.com]
