Dynamic floating point error detection


Dynamic Floating-Point Error Detection

Mike Lam, Jeff Hollingsworth and Pete Stewart

Motivation

Finite precision -> roundoff error

Compromises ill-conditioned calculations

Hard to detect and diagnose

Increasingly important as HPC grows

Single-precision is faster on GPUs

Double-precision fails on long-running computations

Previous solutions are problematic

Numerical analysis requires training

Manual re-writing and testing in higher precision is tedious and time-consuming

Our Solution

Instrument floating-point instructions


Minimize developer effort

Ensure analysis consistency and correctness


Include shared libraries w/o source code

Include compiler optimizations



Our Solution

Three parts

Utility that inserts binary instrumentation

Runtime shared library with analysis routines

GUI log viewer

General overview

Find floating-point instructions and insert calls to shared library

Run instrumented program

View output with GUI

Our Solution

Dyninst-based instrumentation


No special hardware required

Stack walking and binary rewriting

Java GUI


Minimal development effort

Our Solution

Cancellation detection

Instrument addition & subtraction

Compare runtime operand values

Report cancelled digits

Side-by-side (“shadow”) calculations

Instrument all floating-point instructions

Higher/lower precision

Different representation (e.g. rationals)

Report final errors

Cancellation Detection


Loss of significant digits during operations

For each addition/subtraction:

Extract value of each operand

Calculate result and compare magnitudes (binary exponents)

If e_ans < max(e_x, e_y), there is a cancellation

For each cancellation event:

Record a “priority”: max(e_x, e_y) - e_ans

Save event information to log
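
The exponent comparison above can be sketched in a few lines of Python. This is only an illustration of the rule, not the tool's actual implementation (which instruments binaries). The names `check_cancellation`, `instrumented_sub` and `log` are hypothetical, and treating a result of exactly zero as a loss of all 53 significand bits is an assumption made here for the sketch.

```python
import math

# Assumption: operands are IEEE 754 doubles (53-bit significand).
DOUBLE_SIGNIFICAND_BITS = 53


def binary_exponent(v):
    """Return e such that v = m * 2**e with 0.5 <= |m| < 1 (v nonzero)."""
    return math.frexp(v)[1]


def check_cancellation(x, y, subtract=False):
    """Return the priority (cancelled bits) of x + y or x - y, or None."""
    ans = x - y if subtract else x + y
    if x == 0.0 or y == 0.0:
        return None  # adding or subtracting zero cannot cancel digits
    e_max = max(binary_exponent(x), binary_exponent(y))
    if ans == 0.0:
        # Complete cancellation. Assumption: report the full significand width.
        return DOUBLE_SIGNIFICAND_BITS
    e_ans = binary_exponent(ans)
    if e_ans < e_max:
        return e_max - e_ans  # priority = max(e_x, e_y) - e_ans
    return None


log = []  # hypothetical in-memory event log


def instrumented_sub(x, y, site):
    """Subtract as usual, but record a log event if digits were cancelled."""
    priority = check_cancellation(x, y, subtract=True)
    if priority is not None:
        log.append({"site": site, "x": x, "y": y, "priority": priority})
    return x - y
```

For example, `instrumented_sub(1.0, 1.0 - 2.0**-20, "demo")` returns `2.0**-20` and logs an event with priority 20, while `check_cancellation(1.0, 2.0)` returns `None` because the result's exponent is not smaller than the larger operand's.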

Gaussian Elimination

  • A -> [L,U]

  • Comparison of eight methods

    • Classical

    • Classical w/ partial pivoting

    • Classical w/ full pivoting

    • Bordering (“Sherman’s march”)

    • “Pickett’s charge”

    • “Pickett’s charge” w/ partial pivoting

    • Crout’s method

    • Crout’s method w/ partial pivoting
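
As a rough illustration of why the pivoting variants are compared separately, here is a minimal pure-Python sketch of classical elimination (A -> [L,U]) with optional partial pivoting. It is not the code used in the presentation; `lu_classical` and `matmul` are hypothetical helper names, and the 2x2 matrix in the usage note is a standard textbook example chosen here, not data from the slides.

```python
def lu_classical(a, pivot=False):
    """Classical Gaussian elimination on a square list-of-lists matrix.

    Returns (perm, L, U) such that, in exact arithmetic, L times U equals
    the rows of `a` reordered by `perm`. Raises ZeroDivisionError on a
    zero pivot.
    """
    n = len(a)
    u = [row[:] for row in a]
    l = [[float(i == j) for j in range(n)] for i in range(n)]
    perm = list(range(n))
    for k in range(n - 1):
        if pivot:
            # Partial pivoting: bring the largest entry in column k to row k.
            p = max(range(k, n), key=lambda i: abs(u[i][k]))
            if p != k:
                u[k], u[p] = u[p], u[k]
                perm[k], perm[p] = perm[p], perm[k]
                # Swap the multipliers already stored in L for these rows.
                l[k][:k], l[p][:k] = l[p][:k], l[k][:k]
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]
            u[i][k] = 0.0
            for j in range(k + 1, n):
                u[i][j] -= l[i][k] * u[k][j]  # subtraction: cancellation site
    return perm, l, u


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]
```

With `a = [[1e-20, 1.0], [1.0, 1.0]]`, the unpivoted factors multiply back to a matrix whose bottom-right entry is 0.0 instead of 1.0, because the tiny pivot produces a multiplier so large that the original entry is rounded away. With `pivot=True` the product reproduces the row-swapped matrix exactly.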

Gaussian Elimination

Classical vs. Bordering

SPEC Benchmarks

Results are hard to interpret without domain knowledge


Roundoff Error

Sparse “shadow value” table

Maps memory addresses to alternate values

Shadow values can be single-, double-, quad- or arbitrary-precision

Other ideas: rationals, # of significant digits, etc.

Instrument every FP instruction

Extract operation type and operand addresses

Perform the same operation on corresponding shadow values

Output shadow values and errors upon termination
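
The shadow-value idea can be sketched as follows. This is a toy model under stated assumptions, not the tool's runtime library: string keys stand in for memory addresses, Python's `fractions.Fraction` plays the role of the alternate (rational) representation, and the class name `ShadowTable` and its methods are hypothetical.

```python
from fractions import Fraction
import operator

# Operation types the toy "instrumentation" understands.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}


class ShadowTable:
    def __init__(self):
        self.native = {}  # address -> ordinary double-precision value
        self.shadow = {}  # sparse table: address -> exact rational value

    def store(self, addr, value):
        """Write a value to an address; the shadow starts out identical."""
        self.native[addr] = float(value)
        self.shadow[addr] = Fraction(self.native[addr])

    def _shadow_of(self, addr):
        # Sparse lookup: fall back to the native value if no shadow exists.
        if addr not in self.shadow:
            self.shadow[addr] = Fraction(self.native[addr])
        return self.shadow[addr]

    def fp_op(self, op, dst, src1, src2):
        """Perform op natively and repeat it on the shadow values."""
        f = OPS[op]
        s1, s2 = self._shadow_of(src1), self._shadow_of(src2)
        n1, n2 = self.native[src1], self.native[src2]
        self.native[dst] = f(n1, n2)
        self.shadow[dst] = f(s1, s2)

    def report(self):
        """Return address -> relative error of native value vs. shadow."""
        errors = {}
        for addr, exact in self.shadow.items():
            diff = abs(Fraction(self.native[addr]) - exact)
            errors[addr] = float(diff / abs(exact)) if exact != 0 else float(diff)
        return errors
```

Storing 0.0 in `"acc"` and 0.1 in `"x"`, then calling `fp_op("add", "acc", "acc", "x")` ten times, leaves a native sum that differs from the exact rational sum, so `report()["acc"]` is a small positive relative error while `report()["x"]` is zero. Exact rationals grow without bound in long computations, so a fixed higher precision may be the more practical shadow type.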

Issues & Possible Solutions

Expensive overheads (100-500X)

Optimize with inline snippets

Reduce workload with data flow analysis

Following values through compiler optimizations

Selectively instrument MOV instructions

Filtering false positives

Deduce “root cause” of error using data flow


Analysis of floating-point error is hard

Our tool provides automatic analysis of such error

Work in progress