
Low Overhead Program Monitoring and Profiling

Low Overhead Program Monitoring and Profiling. Naveen Kumar, Bruce Childers (Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, {naveen, childers}@cs.pitt.edu). Mary Lou Soffa (Department of Computer Science, University of Virginia).





Presentation Transcript


  1. Low Overhead Program Monitoring and Profiling • Naveen Kumar, Bruce Childers: Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, {naveen, childers}@cs.pitt.edu • Mary Lou Soffa: Department of Computer Science, University of Virginia, Charlottesville, Virginia 22904, soffa@virginia.edu

  2. Introduction • Program instrumentation: insertion of additional code into a program • Used to monitor program behavior or gather information • Can be inserted at the source, intermediate, or binary level • Applications • Detect program invariants [Ernst] • Dynamic slicing [Zhang] • Software testing [Misurda] • Software security checks [Scott]

  3. Running Example • Consider a software security system that monitors the memory behavior of untrusted programs (e.g., DynamoRIO) • Instrumentation at the binary instruction level • Instrument all loads and stores • The program can be instrumented statically as well as dynamically

  4. Static instrumentation • Example from gzip • Instrumentation performed before execution starts • Probes: probe1: call secure(…) • probe2: call secure(…) • probe3: call secure(…) • probe4: call secure(…) • Expanded probe1: M[r[sp] + -20] = r[l0]; save; call save_gp_regs; …; r[o0] = M[r[sp] + 0x68]; r[o0] = r[o0] + 0x10; call secure; r[o1] = r[g0] + 1; call restore_gp_regs; restore; r[sp] = r[sp] + 124; M[r[l0] + 0x10] = r[o2]; jmp probe1_ret • Application code: r[o1] = r[o1] << 10; r[o1] = r[o1] + 0x228; r[o0] = r[o2] << 0x14; r[l4] = r[o0] << 0x14; M[r[l0] + 0x10] = r[o2]; M[r[o1] + 0x228] = r[o0]; r[i4] = r[o1]; r[l1] = r[o0]; jmp r[31]; …; M[r[l0] + 0x20] = r[o0]; r[sp] = r[sp] - 112; r[o0] = r[o0] << 10; r[o1] = M[r[o0] + 0x3d0]; …; … • Jumps to the probes: jmp probe1; jmp probe2; jmp probe3; jmp probe4

  5. Dynamic instrumentation • Instrumentation performed at run-time on code that executes • More powerful than static instrumentation, possibly less expensive • Probes: probe1: call secure(…) • probe2: call secure(…) • probe3: call secure(…) • probe4: call secure(…) • Application code: r[o1] = r[o1] << 10; r[o1] = r[o1] + 0x228; r[o0] = r[o2] << 0x14; r[l4] = r[o0] << 0x14; M[r[l0] + 0x10] = r[o2]; M[r[o1] + 0x228] = r[o0]; r[i4] = r[o1]; r[l1] = r[o0]; jmp r[31]; …; M[r[l0] + 0x20] = r[o0]; r[sp] = r[sp] - 112; r[o0] = r[o0] << 10; r[o1] = M[r[o0] + 0x3d0]; …; … • Jumps to the probes: jmp probe1; jmp probe2; jmp probe3; jmp probe4

  6. Motivation • Stumbling block: high overhead • Slowdown by an order of magnitude or more [Ernst] • Existing solutions: user guided • Sampling [Arnold] • Smaller data sets analyzed (the SPEC test data set instead of ref) [Mock] • Less aggressive uses, especially in dynamic settings [Duesterwald] • The user has to decide how best to apply instrumentation • What is needed: automatic techniques that mitigate the overheads systematically

  7. Goals • Gather exact information • Separate accuracy from efficiency • The user should focus on what to gather, rather than how to gather it efficiently • Efficient • Comparable to hand-optimized instrumentation • Automatic • Little or no user guidance

  8. Instrumentation Optimization • Costs associated with instrumentation • Dynamic probe count: Number of probes executed • Probe cost: Number of instructions in a probe • Payload cost: Frequency of invocation and cost of payload • Optimize instrumentation code to reduce costs • Dynamic probe coalescing • Partial context switches • Partial payload inlining
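
The three costs on this slide can be combined into a rough overhead estimate. The sketch below is an illustration only, assuming a simple additive model that is not taken from the presentation; the `probe_stats` type, its fields, and `instrumentation_overhead` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical per-probe record; all field names are illustrative. */
typedef struct {
    unsigned long executions;    /* times the probe ran (its share of the dynamic probe count) */
    unsigned long probe_insns;   /* instructions in the probe itself (probe cost) */
    unsigned long payload_calls; /* payload invocations per probe execution */
    unsigned long payload_insns; /* instructions per payload invocation (payload cost) */
} probe_stats;

/* Estimated number of extra instructions executed because of instrumentation,
   under the assumed additive model. */
unsigned long instrumentation_overhead(const probe_stats *p, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long per_execution =
            p[i].probe_insns + p[i].payload_calls * p[i].payload_insns;
        total += p[i].executions * per_execution;
    }
    return total;
}
```

Under this model, each of the three optimizations attacks one term: coalescing lowers the number of probe executions, partial context switches lower `probe_insns`, and partial payload inlining lowers `payload_insns` on the common path.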

  9. Base Instrumenter • Base instrumenter generates a list of Instrumentation Points • Probes: probe1: call secure(…) • probe2: call secure(…) • probe3: call secure(…) • probe4: call secure(…) • Application code: r[o1] = r[o1] << 10; r[o1] = r[o1] + 0x228; r[o0] = r[o2] << 0x14; r[l4] = r[o0] << 0x14; M[r[l0] + 0x10] = r[o2]; M[r[o1] + 0x228] = r[o0]; r[i4] = r[o1]; r[l1] = r[o0]; jmp r[31]; …; M[r[l0] + 0x20] = r[o0]; r[sp] = r[sp] - 112; r[o0] = r[o0] << 10; r[o1] = M[r[o0] + 0x3d0]; …; … • Jumps to the probes: jmp probe1; jmp probe2; jmp probe3; jmp probe4

  10. Dynamic Probe Coalescing • Original probes: probe1: call secure(…) • probe2: call secure(…) • probe3: call secure(…) • probe4: call secure(…) • Coalesced probe5: call secure(…); call secure(…), shown with probe3: call secure(…) and probe4: call secure(…) • Coalesced probe6: call secure(…); call secure(…); call secure(…) • Application code: r[o1] = r[o1] << 10; r[o1] = r[o1] + 0x228; r[o0] = r[o2] << 0x14; r[l4] = r[o0] << 0x14; M[r[l0] + 0x10] = r[o2]; M[r[o1] + 0x228] = r[o0]; r[i4] = r[o1]; r[l1] = r[o0]; jmp r[31]; …; M[r[l0] + 0x20] = r[o0]; r[sp] = r[sp] - 112; r[o0] = r[o0] << 10; r[o1] = M[r[o0] + 0x3d0]; …; … • Jump labels shown on the slide: jmp probe1; jmp probe5; jmp probe2; jmp probe3; jmp probe6; jmp probe4
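
The slide shows several single-call probes being merged into one probe that makes several payload calls. A minimal sketch of that idea follows. It assumes that instrumentation points can be merged when they fall in the same straight-line region, which is a criterion chosen for illustration and not stated in the presentation. The `instr_point` and `probe` types, the `region_id` field, and `coalesce_probes` are all hypothetical.

```c
#include <stddef.h>

#define MAX_CALLS 8  /* illustrative cap on payload calls per probe */

/* Hypothetical instrumentation point: one payload call at one instruction. */
typedef struct {
    int region_id;   /* straight-line region containing the instruction (assumed) */
    unsigned pc;     /* address of the instrumented instruction */
} instr_point;

/* A probe reached by a single jump, holding one or more payload calls. */
typedef struct {
    int region_id;
    size_t ncalls;
    unsigned pcs[MAX_CALLS];
} probe;

/* Merge consecutive points of the same region into one probe.
   `out` must have room for n probes; returns the number of probes written. */
size_t coalesce_probes(const instr_point *ips, size_t n, probe *out)
{
    size_t nprobes = 0;
    for (size_t i = 0; i < n; i++) {
        probe *last = nprobes ? &out[nprobes - 1] : NULL;
        if (last && last->region_id == ips[i].region_id
                 && last->ncalls < MAX_CALLS) {
            /* Same region: add this payload call to the current probe. */
            last->pcs[last->ncalls++] = ips[i].pc;
        } else {
            /* New region (or probe full): start a new probe. */
            out[nprobes].region_id = ips[i].region_id;
            out[nprobes].ncalls = 1;
            out[nprobes].pcs[0] = ips[i].pc;
            nprobes++;
        }
    }
    return nprobes;
}
```

With four points of which the first three share a region, this yields two probes holding three calls and one call, matching the probe6 and probe4 shape on the slide. The sketch leaves out what a real coalescer must also handle, such as how the original instructions between the merged points are executed.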

  11. Partial Context Switch • Analyze register usage in payload • Remove spill and reload of GP registers • Regs. used in payload: {…} • Not used: {g0…g7} • Probes: probe6: call secure(…); call secure(…); call secure(…) • probe4: call secure(…) • Expanded probe6: M[r[sp] - 20] = r[l0]; M[r[sp] - 28] = r[o1]; save; call save_gp_regs; … effective address …; call secure; … effective address …; call secure; … effective address …; call secure; call restore_gp_regs; restore; …; …; jmp probe6_ret • Application code: r[o1] = r[o1] << 10; r[o1] = r[o1] + 0x228; r[o0] = r[o2] << 0x14; r[l4] = r[o0] << 0x14; M[r[l0] + 0x10] = r[o2]; M[r[o1] + 0x228] = r[o0]; r[i4] = r[o1]; r[l1] = r[o0]; jmp r[31]; …; M[r[l0] + 0x20] = r[o0]; r[sp] = r[sp] - 112; r[o0] = r[o0] << 10; r[o1] = M[r[o0] + 0x3d0]; …; … • Jumps to the probes: jmp probe6; jmp probe4
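
The register-usage analysis on this slide can be sketched with bit sets. The example assumes the payload is available as a list of decoded instructions and that only registers the payload writes need to be spilled; both are assumptions made for illustration. `payload_insn`, `payload_written_regs`, and `needs_gp_save` are hypothetical names. The register numbering is the standard SPARC one.

```c
#include <stddef.h>
#include <stdint.h>

/* One bit per SPARC integer register: g0-g7 = r0-r7, o0-o7 = r8-r15,
   l0-l7 = r16-r23, i0-i7 = r24-r31. */
typedef uint32_t regset;

/* g1-g7; g0 is hardwired to zero, so writes to it never need saving. */
#define GLOBAL_REGS ((regset)0x000000FEu)

/* Hypothetical decoded payload instruction. */
typedef struct {
    int dest;   /* register number written (0-31), or -1 if none */
} payload_insn;

/* Collect the set of registers the payload writes. */
regset payload_written_regs(const payload_insn *code, size_t n)
{
    regset written = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].dest >= 0 && code[i].dest < 32)
            written |= (regset)1u << code[i].dest;
    }
    return written;
}

/* The probe needs its spill and reload of the global registers only if
   the payload may overwrite one of them. */
int needs_gp_save(regset written)
{
    return (written & GLOBAL_REGS) != 0;
}
```

When `needs_gp_save` returns 0, the calls to `save_gp_regs` and `restore_gp_regs` seen in the expanded probe could be dropped. A payload that calls other functions would need their register usage folded into the set as well, which this sketch does not do.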

  12. Partial Payload Inlining • Payload source: void secure(address) { if(address > REDZONE) return; redAlerts++; createReport(); if(critical(address)) assert(address); } • Split payload: void __inlined_secure(address) { … __full_secure(address, tag); } • void __full_secure(address, tag) { … • Payload instructions shown: r[o1] = M[r[g1] + 0]; r[o1] = r[o1] - r[o0]; r[i0] = 1; jmp r[31]; …; r[o3] = M[r[g2] + 0]; r[o3] = r[o3] + 1; …; !call createReport; …; !call assert; call __full_secure • probe6: M[r[sp] - 20] = r[l0]; M[r[sp] - 28] = r[o1]; r[sp] = r[sp] - 140; … effective address …; call secure; … effective address …; call secure; … effective address …; call secure; r[sp] = r[sp] + 140; …; …; jmp probe6_ret • Application code: r[o1] = r[o1] << 10; r[o1] = r[o1] + 0x228; r[o0] = r[o2] << 0x14; r[l4] = r[o0] << 0x14; M[r[l0] + 0x10] = r[o2]; M[r[o1] + 0x228] = r[o0]; r[i4] = r[o1]; r[l1] = r[o0]; jmp r[31]; …; M[r[l0] + 0x20] = r[o0]; r[sp] = r[sp] - 112; r[o0] = r[o0] << 10; r[o1] = M[r[o0] + 0x3d0]; …; … • Jumps to the probes: jmp probe6; jmp probe4
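
The split of `secure` into an inlined part and a full part can be sketched in C as follows. This is a hedged reading of the slide, not the authors' generated code. The names `inlined_secure` and `full_secure` stand in for the slide's `__inlined_secure` and `__full_secure` (identifiers starting with two underscores are reserved in C). The slide's `tag` parameter is omitted because its meaning is not given. The `REDZONE` value, the `createReport` and `critical` stubs, and the `criticalHits` counter that replaces the slide's `assert(address)` are all assumptions.

```c
#include <stdint.h>

uintptr_t REDZONE = 0x1000;       /* assumed boundary value */
unsigned long redAlerts = 0;
unsigned long reports = 0;        /* stub state: reports created */
unsigned long criticalHits = 0;   /* stub state: stands in for assert(address) */

void createReport(void) { reports++; }                /* stub */
int critical(uintptr_t a) { return a < 0x100; }       /* stub policy */

/* Out-of-line remainder of the payload: the calls that are not inlined. */
void full_secure(uintptr_t address)
{
    createReport();
    if (critical(address))
        criticalHits++;
}

/* Part placed directly in the probe: the cheap check and the counter
   update. The common case returns without any call. */
static inline void inlined_secure(uintptr_t address)
{
    if (address > REDZONE)
        return;
    redAlerts++;
    full_secure(address);
}
```

Calling `inlined_secure(0x2000)` returns after the comparison alone, while `inlined_secure(0x80)` falls through to `full_secure`. The design choice is that the frequent path costs a load and a compare, and the call overhead is paid only when an access lands at or below `REDZONE`.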

  13. Implementation • Strata: dynamic translation system [Scott et al.] • Generates code at run-time for an application • Suitable for dynamic instrumentation • FIST: base instrumentation system [Kumar et al.] • Flexible for diverse instrumentation needs • Generates a list of instrumentation points (IPs) • INS-OP: developed in this work • Constructs an IR for the list of IPs obtained from FIST • Each optimization is a pass that modifies the IR

  14. Case Studies • Case study 1: Program profiling • Lightweight instrumentation application • Lower initial overhead implies a smaller benefit • Demonstrates the efficacy of the optimizations in an unfavorable scenario • Case study 2: Memory simulation • Relatively heavyweight instrumentation application • Can be compared with state-of-the-art systems to see the benefits of optimization

  15. Case study 1: Program profiling • The benefit of optimization varies with the initial overhead • Speedups range from 1.26 to 2.63

  16. Case study 2: Memory Simulation • Strata-Embra is a SPARC implementation of the cache simulator from SimOS • Strata-Embra-Opt is the same cache simulator optimized using INS-OP • INS-OP speeds up the fastest cache simulator we could find by 2 to 3.3 times

  17. Conclusions • Introduced “instrumentation optimization” to reduce the cost of instrumented code • Reduce the probe count • Reduce the cost of an individual probe • Reduce the cost of the payload • Speedups of 1.2 to 3.3 times • More detailed information gathering • Accuracy need not be sacrificed for efficiency • Feasibility of certain applications • Run-time monitoring becomes more feasible • Example: applications that perform continuous testing

  18. Effectiveness of optimizations
