Dynamic Floating-Point Error Detection


Mike Lam, Jeff Hollingsworth, and Pete Stewart

Finite precision -> roundoff error

Compromises ill-conditioned calculations

Hard to detect and diagnose

Increasingly important as HPC grows

Single-precision is faster on GPUs

Double-precision fails on long-running computations
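
To make the roundoff concern concrete, here is a minimal sketch (not taken from the slides) that adds 0.1 repeatedly in simulated single precision and in double precision. The helper names `to_single` and `accumulate`, the step 0.1, and the iteration count are illustrative assumptions; single precision is emulated by round-tripping through a 32-bit float with the standard `struct` module.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool = False) -> float:
    """Add `step` to a running total `n` times, optionally rounding to single."""
    if single:
        step = to_single(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            total = to_single(total)  # emulate a single-precision accumulator
    return total


N = 100_000          # assumed iteration count, chosen only for illustration
EXPECTED = 10_000.0  # the mathematically exact sum of N copies of 0.1

err_double = abs(accumulate(0.1, N) - EXPECTED)
err_single = abs(accumulate(0.1, N, single=True) - EXPECTED)
print(f"double-precision error: {err_double:.3e}")
print(f"single-precision error: {err_single:.3e}")
```

Both accumulators drift away from the exact sum; the single-precision one drifts by many orders of magnitude more, and either error keeps growing as the loop runs longer.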

Previous solutions are problematic

Numerical analysis requires training

Manual re-writing and testing in higher precision is tedious and time-consuming

Instrument floating-point instructions

Automatic

Minimize developer effort

Ensure analysis consistency and correctness

Binary-level

Include shared libraries w/o source code

Include compiler optimizations

Runtime

Data-sensitive

Three parts

Utility that inserts binary instrumentation

Runtime shared library with analysis routines

GUI log viewer

General overview

Find floating-point instructions and insert calls to shared library

Run instrumented program

View output with GUI

Dyninst-based instrumentation

Cross-platform

No special hardware required

Stack walking and binary rewriting

Java GUI

Cross-platform

Minimal development effort

Cancellation detection

Instrument addition & subtraction

Compare runtime operand values

Report cancelled digits

Side-by-side (“shadow”) calculations

Instrument all floating-point instructions

Higher/lower precision

Different representation (e.g., rationals)

Report final errors

Overview

Loss of significant digits during operations

For each addition/subtraction:

Extract value of each operand

Calculate result and compare magnitudes (binary exponents)

If e_ans < max(e_x, e_y), there is a cancellation (e_x, e_y, and e_ans are the binary exponents of the operands and the result)

For each cancellation event:

Record a “priority”: max(e_x, e_y) - e_ans

Save event information to log
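
The cancellation check described above can be sketched in a few lines. This is an illustration of the idea only, not the tool's implementation (which works on instrumented binaries): the function names, the list-based log, and the treatment of zero operands and exactly-zero results are assumptions. Operands are assumed to be finite doubles, and exponents come from `math.frexp`.

```python
import math
import sys

MANT_BITS = sys.float_info.mant_dig  # 53 bits of significand for IEEE double


def binary_exponent(x: float) -> int:
    """Return e such that x = m * 2**e with 0.5 <= |m| < 1 (x nonzero)."""
    return math.frexp(x)[1]


def check_cancellation(x, y, subtract=False, log=None):
    """Return the cancellation priority (in bits) for x + y or x - y.

    Returns None when no cancellation is detected.
    """
    result = x - y if subtract else x + y
    if x == 0.0 or y == 0.0:
        return None  # assumption: adding or subtracting zero cancels nothing
    e_max = max(binary_exponent(x), binary_exponent(y))
    if result == 0.0:
        # Assumption: treat an exactly-zero result as all bits cancelled.
        priority = MANT_BITS
    else:
        e_ans = binary_exponent(result)
        if e_ans >= e_max:
            return None  # result is as large as the larger operand
        priority = e_max - e_ans
    if log is not None:
        log.append({
            "op": "sub" if subtract else "add",
            "x": x, "y": y, "result": result,
            "priority": priority,
        })
    return priority


events = []
check_cancellation(1.0 + 2**-20, 1.0, subtract=True, log=events)  # cancels
check_cancellation(1.0, 2.0, log=events)                          # does not
for event in events:
    print(event)
```

In the first call the operands agree in their leading 20 bits, so the result's exponent is 20 lower than the operands' and the event is logged with priority 20. The second call produces a result at least as large as both operands, so nothing is logged.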

- LU factorization: A -> [L,U]
- Comparison of eight methods
- Classical
- Classical w/ partial pivoting
- Classical w/ full pivoting
- Bordering (“Sherman’s march”)
- “Pickett’s charge”
- “Pickett’s charge” w/ partial pivoting
- Crout’s method
- Crout’s method w/ partial pivoting
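
For reference, a minimal sketch of one of the listed variants, assuming "Classical" means classical Gaussian elimination without pivoting. The function name `lu_classical` and the 2x2 test matrix are illustrative, not from the slides. The update `u[i][j] -= m * u[k][j]` is the kind of subtraction that a cancellation detector would watch.

```python
def lu_classical(a):
    """Factor a square matrix (list of rows) into unit-lower L and upper U.

    No pivoting is performed, so a zero pivot raises an error.
    """
    n = len(a)
    u = [row[:] for row in a]  # work on a copy; becomes U
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if u[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; pivoting would be required")
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]  # multiplier, stored in L
            l[i][k] = m
            for j in range(k, n):
                u[i][j] -= m * u[k][j]  # subtraction where cancellation can occur
    return l, u


A = [[4.0, 3.0],
     [6.0, 3.0]]
L, U = lu_classical(A)
print("L =", L)
print("U =", U)
```

The pivoting variants differ in that they reorder rows (and, for full pivoting, columns) before each elimination step so that the pivot `u[k][k]` is large in magnitude.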

Classical vs. Bordering

Results are hard to interpret without domain knowledge

Overheads:

Sparse “shadow value” table

Maps memory addresses to alternate values

Shadow values can be single-, double-, quad- or arbitrary-precision

Other ideas: rationals, # of significant digits, etc.

Instrument every FP instruction

Extract operation type and operand addresses

Perform the same operation on corresponding shadow values

Output shadow values and errors upon termination
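
The shadow-value scheme above can be illustrated with a toy interpreter. This is a sketch under several assumptions, not the tool's design: the class name `ShadowMachine`, the integer "addresses", and the choice of single precision for native values with exact rationals (`fractions.Fraction`) as shadows are all hypothetical. A plain dictionary stands in for the sparse shadow table.

```python
import operator
import struct
from fractions import Fraction

OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def to_single(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


class ShadowMachine:
    def __init__(self):
        self.memory = {}  # address -> native (single-precision) value
        self.shadow = {}  # sparse table: address -> shadow value (rational)

    def store(self, addr, value):
        self.memory[addr] = to_single(value)
        self.shadow[addr] = Fraction(value)

    def _shadow_of(self, addr):
        # The table is sparse: fall back to the native value if no entry exists.
        return self.shadow.get(addr, Fraction(self.memory[addr]))

    def binop(self, op, dst, a, b):
        """Run `op` on native values and repeat it on the shadow values."""
        sx, sy = self._shadow_of(a), self._shadow_of(b)
        self.memory[dst] = to_single(OPS[op](self.memory[a], self.memory[b]))
        self.shadow[dst] = OPS[op](sx, sy)

    def report(self):
        """Return the relative error of each native value against its shadow."""
        errors = {}
        for addr, sv in self.shadow.items():
            diff = abs(Fraction(self.memory[addr]) - sv)
            errors[addr] = float(diff / abs(sv)) if sv != 0 else float(diff)
        return errors


vm = ShadowMachine()
vm.store(0x10, 0.1)
vm.store(0x18, 0.2)
vm.binop("add", 0x20, 0x10, 0x18)
errors = vm.report()
for addr, err in sorted(errors.items()):
    print(f"{addr:#x}: relative error {err:.3e}")
```

Swapping `Fraction` for a higher-precision floating-point type would give the "higher precision" variant of the same idea; the per-operation bookkeeping is what makes the approach expensive.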

Expensive overheads (100-500X)

Optimize with inline snippets

Reduce workload with data flow analysis

Following values through compiler optimizations

Selectively instrument MOV instructions

Filtering false positives

Deduce “root cause” of error using data flow

Analysis of floating-point error is hard

Our tool provides automatic analysis of such error

Work in progress