1 / 4

ARM Cortex-A9 performance in HPC applications Kurt Keville, Clark Della Silva, Merritt Boyd

ARM Cortex-A9 performance in HPC applications Kurt Keville, Clark Della Silva, Merritt Boyd. ARM gaining market share in embedded systems and SoCs Current processors include the ARM9 series, the Cortex-A8, and the Cortex-A9 ~15 Billion ARM chips shipped to date Thumb / Thumb-2

halona
Download Presentation

ARM Cortex-A9 performance in HPC applications Kurt Keville, Clark Della Silva, Merritt Boyd

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ARM Cortex-A9 performance in HPC applicationsKurt Keville, Clark Della Silva, Merritt Boyd ARM gaining market share in embedded systems and SoCs Current processors include the ARM9 series, the Cortex-A8, and the Cortex-A9 ~15 Billion ARM chips shipped to date Thumb / Thumb-2 More efficient instruction encoding, better code density Higher performance for select applications VFPv3 Floating point co-processor NEON SIMD Extensions Up to 4x 32-bit floating point operations per instruction No double precision

  2. ARM Cortex-A8 and Cortex-A9 Uses the ARMv7-A architecture Thumb-2, NEON, VFPv3 Systems TI BeagleBoard (1GHz)‏ Genesi Efika-MX (800MHz)‏ Gumstix Overo Earth (600MHz)‏ • Uses ARMv7-A architecture • Thumb-2, NEON, VFPv3 • Available in single and dual core packages, quads upcoming (A6, Tegra 3)‏ • Systems • TI PandaBoard (dual 1GHz)‏ • NuFront NuSmart (dual 1.2GHz)‏

  3. TI PandaBoard Results 3.0 Gflop/s SP NEON, 1.2 Gflop/s DP (HPL)‏ Power consumption Idle: ~4 Watts / board, Full load: ~7.5 Watts / board Software & Hardware Challenges Most libraries assume x86 / x86-64, No precompiled binaries (unavailable or unoptimized), Compiler support immature (-mcpu=cortex-a9, -mhard-float)‏ Limited RAM on some systems, Low-quality networking hardware and software, Few possibilities for expansion, Reliability issues Energy Efficiency 2 Gflop/s / Watt gets you #1 on Green500 PandaBoard is $175, and 18 square inches .4 Gflop/s / Watt, 0.0074 Gflop/s / $, and 0.072 Gflop/s / square inch

  4. Looking Ahead : Embedded GPUs Most SoCs include a GPU, e.g. PVR SGX 540 (PandaBoard)‏ Potential for mixed CPU-GPU computation OpenCL support, pending release of drivers on TI SoCs, available for Apple Hardware ARM Cortex-A15 with PVR series 6 GPU Much more powerful and better suited for computation Tegra 3 & 4 Potential for Cuda Support

More Related