
Dynamic Floating-Point Error Detection



Presentation Transcript


  1. Dynamic Floating-Point Error Detection
     - Mike Lam, Jeff Hollingsworth and Pete Stewart

  2. Motivation
     - Finite precision -> roundoff error
       - Compromises ill-conditioned calculations
       - Hard to detect and diagnose
     - Increasingly important as HPC grows
       - Single-precision is faster on GPUs
       - Double-precision fails on long-running computations
     - Previous solutions are problematic
       - Numerical analysis requires training
       - Manual re-writing and testing in higher precision is tedious and time-consuming

  3. Our Solution
     - Instrument floating-point instructions
     - Automatic
       - Minimize developer effort
       - Ensure analysis consistency and correctness
     - Binary-level
       - Include shared libraries w/o source code
       - Include compiler optimizations
     - Runtime
       - Data-sensitive

  4. Our Solution
     - Three parts
       - Utility that inserts binary instrumentation
       - Runtime shared library with analysis routines
       - GUI log viewer
     - General overview
       - Find floating-point instructions and insert calls to shared library
       - Run instrumented program
       - View output with GUI

  5. Our Solution
     - Dyninst-based instrumentation
       - Cross-platform
       - No special hardware required
       - Stack walking and binary rewriting
     - Java GUI
       - Cross-platform
       - Minimal development effort

  6. Our Solution
     - Cancellation detection
       - Instrument addition & subtraction
       - Compare runtime operand values
       - Report cancelled digits
     - Side-by-side (“shadow”) calculations
       - Instrument all floating-point instructions
       - Higher/lower precision
       - Different representation (e.g., rationals)
       - Report final errors

  7. Cancellation Detection
     - Overview
       - Loss of significant digits during operations
     - For each addition/subtraction:
       - Extract value of each operand
       - Calculate result and compare magnitudes (binary exponents e_x, e_y and e_ans)
       - If e_ans < max(e_x, e_y) there is a cancellation
     - For each cancellation event:
       - Record a “priority”: max(e_x, e_y) - e_ans
       - Save event information to log
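
The exponent test on this slide can be sketched in a few lines of Python. This is only an illustration of the comparison, not the tool's binary instrumentation: the function name `cancellation_priority`, the `log` list, and the choice to report all 53 significand bits when the result is exactly zero are assumptions made for the example. Zero operands are skipped because they have no meaningful exponent.

```python
import math

DOUBLE_SIGNIFICAND_BITS = 53  # assumption: operands are IEEE 754 doubles

log = []  # hypothetical event log; the real tool writes its own log format


def cancellation_priority(x, y, subtract=False):
    """Return max(e_x, e_y) - e_ans if x +/- y cancels, else None."""
    if subtract:
        y = -y
    ans = x + y
    if x == 0.0 or y == 0.0:
        return None  # a zero operand has no exponent to compare
    e_x = math.frexp(x)[1]  # binary exponent of x
    e_y = math.frexp(y)[1]  # binary exponent of y
    if ans == 0.0:
        # Assumption: treat an exact zero as losing every significand bit.
        priority = DOUBLE_SIGNIFICAND_BITS
    else:
        e_ans = math.frexp(ans)[1]
        if e_ans >= max(e_x, e_y):
            return None  # no cancellation
        priority = max(e_x, e_y) - e_ans
    log.append({"x": x, "y": y, "result": ans, "priority": priority})
    return priority


# Subtracting two nearly equal values cancels 20 leading bits.
p = cancellation_priority(1.0 + 2.0**-20, 1.0, subtract=True)
```

The priority counts binary digits, so larger values mark events where more of the result's leading bits were lost.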

  8. Gaussian Elimination
     - A -> [L,U]
     - Comparison of eight methods
       - Classical
       - Classical w/ partial pivoting
       - Classical w/ full pivoting
       - Bordering (“Sherman’s march”)
       - “Pickett’s charge”
       - “Pickett’s charge” w/ partial pivoting
       - Crout’s method
       - Crout’s method w/ partial pivoting
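
For readers unfamiliar with the A -> [L,U] factorization, here is a minimal pure-Python sketch of one of the listed variants, classical elimination with partial pivoting. It is a textbook formulation and not necessarily the code the authors benchmarked; the function name `lu_partial_pivot` and the permutation-list return value are assumptions for the example.

```python
def lu_partial_pivot(a):
    """Factor a square matrix so that L * U equals the rows of a in order perm."""
    n = len(a)
    u = [[float(v) for v in row] for row in a]  # working copy, becomes U
    l = [[0.0] * n for _ in range(n)]           # multipliers, becomes L
    perm = list(range(n))                       # row i of P*A is a[perm[i]]
    for k in range(n):
        # Partial pivoting: pick the largest magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate entries below the pivot; these subtractions are
        # where cancellation can occur.
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]
            l[i][k] = m
            for j in range(k, n):
                u[i][j] -= m * u[k][j]
    for i in range(n):
        l[i][i] = 1.0  # unit diagonal
    return perm, l, u


a = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
perm, l, u = lu_partial_pivot(a)
```

The subtractions in the inner loop are the additions/subtractions a cancellation detector would instrument.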

  9. Gaussian Elimination

  10. Gaussian Elimination: Classical vs. Bordering

  11. Gaussian Elimination

  12. SPEC Benchmarks
     - Results are hard to interpret without domain knowledge
     - Overheads:

  13. Roundoff Error
     - Sparse “shadow value” table
       - Maps memory addresses to alternate values
       - Shadow values can be single-, double-, quad- or arbitrary-precision
       - Other ideas: rationals, # of significant digits, etc.
     - Instrument every FP instruction
       - Extract operation type and operand addresses
       - Perform the same operation on corresponding shadow values
     - Output shadow values and errors upon termination
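
The shadow-value idea can be illustrated with a small Python simulation. Everything here is an assumption made for the sketch: "memory" is a plain dict from made-up addresses to doubles, the `ShadowTable` class and its method names are hypothetical, and exact rationals (`fractions.Fraction`) stand in for the shadow type, which is one of the "other ideas" the slide mentions rather than the tool's confirmed choice.

```python
import operator
from fractions import Fraction

OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}


class ShadowTable:
    """Sparse map from (simulated) memory addresses to exact shadow values."""

    def __init__(self):
        self.values = {}

    def lookup(self, addr, native):
        # Addresses never written by an instrumented instruction start
        # out with a shadow equal to the native value.
        if addr not in self.values:
            self.values[addr] = Fraction(native)
        return self.values[addr]

    def execute(self, memory, op, dst, src1, src2):
        f = OPS[op]
        a = self.lookup(src1, memory[src1])
        b = self.lookup(src2, memory[src2])
        memory[dst] = f(memory[src1], memory[src2])  # native double operation
        self.values[dst] = f(a, b)                   # same operation on shadows

    def report(self, memory):
        """Relative error of each native value against its shadow."""
        errors = {}
        for addr, shadow in self.values.items():
            diff = abs(Fraction(memory[addr]) - shadow)
            errors[addr] = float(diff / abs(shadow)) if shadow != 0 else float(diff)
        return errors


# Accumulate 0.1 ten times at hypothetical addresses 0x10 (sum) and 0x18.
memory = {0x10: 0.0, 0x18: 0.1}
table = ShadowTable()
for _ in range(10):
    table.execute(memory, "add", 0x10, 0x10, 0x18)
errors = table.report(memory)
```

After the loop the native sum has drifted from the exact rational sum by roughly one unit in the last place, while the untouched operand at `0x18` reports zero error.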

  14. More Gaussian Elimination

  15. Issues & Possible Solutions
     - Expensive overheads (100-500X)
       - Optimize with inline snippets
       - Reduce workload with data flow analysis
     - Following values through compiler optimizations
       - Selectively instrument MOV instructions
     - Filtering false positives
       - Deduce “root cause” of error using data flow

  16. Conclusion
     - Analysis of floating-point error is hard
     - Our tool provides automatic analysis of such error
     - Work in progress

  17. Thank you!
