
Automatically Tuned Linear Algebra Software (ATLAS)


Presentation Transcript


  1. Automatically Tuned Linear Algebra Software (ATLAS) R. Clint Whaley University of Tennessee www.netlib.org/atlas

  2. What is ATLAS
  • A package that adapts to differing architectures via AEOS techniques
  • Initially, supplies the BLAS
  • Automated Empirical Optimization of Software (AEOS)
    • The machine searches the optimization space (sketched below)
    • Finds the application-apparent architecture
  • AEOS requires:
    • A method of code variation: code generation, multiple implementations, parameterization
    • Sophisticated timers
    • A robust search heuristic
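  A minimal sketch of the empirical-search idea, in C: a hypothetical blocked multiply is timed over a few candidate blocking factors and the fastest one is kept. The kernel, the candidate list, and the clock()-based timer are illustrative placeholders, not ATLAS's actual code generator or search heuristic.

    /* Sketch of AEOS-style tuning: time candidate blocking factors, keep the best. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 512

    /* Hypothetical parameterized kernel: C += A*B, blocked with block size nb. */
    static void mm_blocked(int n, int nb, const double *A, const double *B, double *C)
    {
        for (int ii = 0; ii < n; ii += nb)
            for (int jj = 0; jj < n; jj += nb)
                for (int kk = 0; kk < n; kk += nb)
                    for (int i = ii; i < ii + nb && i < n; i++)
                        for (int j = jj; j < jj + nb && j < n; j++) {
                            double s = C[i*n + j];
                            for (int k = kk; k < kk + nb && k < n; k++)
                                s += A[i*n + k] * B[k*n + j];
                            C[i*n + j] = s;
                        }
    }

    int main(void)
    {
        double *A = calloc(N*N, sizeof *A), *B = calloc(N*N, sizeof *B);
        double *C = calloc(N*N, sizeof *C);
        int candidates[] = {16, 32, 48, 64, 80};   /* illustrative search space */
        int best_nb = candidates[0];
        double best_t = 1e30;

        for (size_t c = 0; c < sizeof candidates / sizeof candidates[0]; c++) {
            clock_t t0 = clock();
            mm_blocked(N, candidates[c], A, B, C);
            double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
            printf("nb=%d: %.3f s\n", candidates[c], t);
            if (t < best_t) { best_t = t; best_nb = candidates[c]; }
        }
        printf("selected NB = %d\n", best_nb);
        free(A); free(B); free(C);
        return 0;
    }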

  3. Why ATLAS is needed
  • Optimized BLAS require many man-hours per platform
    • Only done if there is a financial incentive
    • Many platforms will never have an optimal version
    • Tuning lags behind hardware
    • May not be affordable by everyone
  • ATLAS improves on vendor code
  • Allows for portably optimal codes
  • Obsolescence insurance
  • Covers operations that may be important, but are not general enough for a standard

  4. ATLAS Software
  • Coming soon
    • pthread support
    • Open source kernels (SSE & 3DNOW!, GOTO ev5/6 BLAS)
    • Performance for banded and packed routines
    • More LAPACK
  • Coming not-so-soon
    • Sparse support
    • User customization
  • Currently provided
    • Full BLAS (C & F77)
    • Level 3 BLAS
      • Generated GEMM (1-2 hours install time per precision)
      • Recursive GEMM-based Level 3 BLAS (Antoine Petitet)
    • Level 2 BLAS (GEMV & GER kernels)
    • Level 1 BLAS
    • Some LAPACK (LU, LLt)
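  As a usage note for the C interface listed above: a minimal call to the standard CBLAS routine cblas_dgemm, which an ATLAS install provides via cblas.h. The link line (e.g. -lcblas -latlas) is an assumption and varies by installation.

    /* C = A*B through the CBLAS interface; link against the ATLAS libraries. */
    #include <stdio.h>
    #include <cblas.h>

    int main(void)
    {
        double A[2*3] = {1, 2, 3,
                         4, 5, 6};           /* 2x3, row major */
        double B[3*2] = {7,  8,
                         9, 10,
                        11, 12};             /* 3x2, row major */
        double C[2*2] = {0};

        /* C = 1.0 * A * B + 0.0 * C */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 3, 1.0, A, 3, B, 2, 0.0, C, 2);

        printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
        return 0;
    }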

  5. Algorithmic Approach for Matrix Multiply
  [Diagram: C (M x N) = A (M x K) * B (K x N), partitioned into NB x NB blocks]
  • The only generated code is the on-chip multiply
  • All BLAS operations are written in terms of the generated on-chip multiply
  • All transpose cases are coerced through data copy to one case of on-chip multiply
  • Only one case is generated per platform
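  A rough sketch of the copy-based blocking this slide describes, assuming row-major storage and a matrix dimension that is a multiple of NB; the block size, kernel, and omitted cleanup code are illustrative, not ATLAS's generated code.

    /* Sketch: coerce every case to one fixed-size "on-chip" multiply via data copy. */
    #include <string.h>

    #define NB 40   /* L1-cache blocking factor, as if chosen by the search */

    /* Analogue of the single generated kernel: C_blk += A_blk * B_blk on NB x NB blocks. */
    static void on_chip_multiply(const double *A_blk, const double *B_blk, double *C_blk)
    {
        for (int i = 0; i < NB; i++)
            for (int k = 0; k < NB; k++) {
                double a = A_blk[i*NB + k];
                for (int j = 0; j < NB; j++)
                    C_blk[i*NB + j] += a * B_blk[k*NB + j];
            }
    }

    /* Copy an NB x NB block out of a row-major n x n matrix into contiguous storage. */
    static void copy_block(int n, const double *M, int i0, int j0, double *blk)
    {
        for (int i = 0; i < NB; i++)
            memcpy(blk + i*NB, M + (i0 + i)*n + j0, NB * sizeof(double));
    }

    /* C += A*B for n a multiple of NB; transpose and cleanup cases omitted. */
    void gemm_blocked(int n, const double *A, const double *B, double *C)
    {
        double A_blk[NB*NB], B_blk[NB*NB], C_blk[NB*NB];

        for (int i = 0; i < n; i += NB)
            for (int j = 0; j < n; j += NB) {
                copy_block(n, C, i, j, C_blk);
                for (int k = 0; k < n; k += NB) {
                    copy_block(n, A, i, k, A_blk);
                    copy_block(n, B, k, j, B_blk);
                    on_chip_multiply(A_blk, B_blk, C_blk);
                }
                /* copy the updated block back into C */
                for (int r = 0; r < NB; r++)
                    memcpy(C + (i + r)*n + j, C_blk + r*NB, NB * sizeof(double));
            }
    }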

  6. Algorithmic Approach for Level 3 BLAS: Recursive TRMM
  [Diagram: lower-triangular matrix recursively partitioned, with the zero block above the diagonal]
  • Recur down to the L1 cache block size
  • Need a kernel at the bottom of the recursion
  • Use a GEMM-based kernel for portability
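  A minimal sketch of the GEMM-based recursive TRMM idea (B := L*B with L lower triangular), assuming row-major storage; the cutoff, names, and naive kernels are illustrative rather than ATLAS's actual routines.

    /* Recursive TRMM built on GEMM, with a small kernel at the recursion's base. */
    #define L1_BLOCK 32   /* recursion cutoff, roughly the L1-cache block size */

    /* Naive GEMM kernel: C (m x n) += A (m x k) * B (k x n), row major with leading dims. */
    static void gemm_kernel(int m, int n, int k,
                            const double *A, int lda,
                            const double *B, int ldb,
                            double *C, int ldc)
    {
        for (int i = 0; i < m; i++)
            for (int p = 0; p < k; p++)
                for (int j = 0; j < n; j++)
                    C[i*ldc + j] += A[i*lda + p] * B[p*ldb + j];
    }

    /* Small TRMM kernel used at the bottom of the recursion: B := L*B in place. */
    static void trmm_kernel(int n, int m, const double *L, int ldl, double *B, int ldb)
    {
        for (int i = n - 1; i >= 0; i--)      /* bottom-up, so old rows are still available */
            for (int j = 0; j < m; j++) {
                double s = 0.0;
                for (int k = 0; k <= i; k++)
                    s += L[i*ldl + k] * B[k*ldb + j];
                B[i*ldb + j] = s;
            }
    }

    /* Recursive TRMM: B (n x m) := L (n x n, lower triangular) * B. */
    void trmm_recursive(int n, int m, const double *L, int ldl, double *B, int ldb)
    {
        if (n <= L1_BLOCK) {
            trmm_kernel(n, m, L, ldl, B, ldb);
            return;
        }
        int n1 = n / 2, n2 = n - n1;
        /* B2 := L22*B2 + L21*B1 (uses the old B1, so update B2 first) */
        trmm_recursive(n2, m, L + n1*ldl + n1, ldl, B + n1*ldb, ldb);
        gemm_kernel(n2, m, n1, L + n1*ldl, ldl, B, ldb, B + n1*ldb, ldb);
        /* B1 := L11*B1 */
        trmm_recursive(n1, m, L, ldl, B, ldb);
    }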

  7. 500x500 DGEMM Across Various Architectures

  8. 500x500 Double Precision Recursively-Blocked (RB) LU Factorization

  9. 500x500 Recursive BLAS on UltraSparc 2200
