
WaveScalar. S. Swanson, et al., Computer Science and Engineering, University of Washington






Presentation Transcript


  1. WaveScalar
  S. Swanson, et al., Computer Science and Engineering, University of Washington
  Presented by Brett Meyer

  2. ILP in Modern Architecture
  • Lots of available ILP in software
  • Execute in parallel for greater performance
  • Superscalar processors can’t tap it
  • Serialized by the PC
  • Superscalar doesn’t scale
  Data-flow approaches can cheaply leverage existing parallelism.

  3. WaveScalar
  • Introduction
  • WaveCache and WaveScalar ISA
  • Evaluation and results
  • Does WaveCache make sense?
  • Compiler challenges

  4. WaveScalar: Basics
  • ALU-in-cache data-flow architecture
  • No centralized, broadcast-based resources
  • Compile data-flow binaries

  5. WaveScalar: Waves
  • Instructions → architecture
  • Programs broken into waves
  • Block with a single entry
  • Use wave number to tag data
  • Disambiguates data from multiple iterations
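The wave-number tagging on this slide can be shown with a toy tag-matching sketch (names like `Instruction` and `receive` are illustrative, not from the paper): an operation fires only when both of its operands carry the same wave number, so values from different loop iterations never combine.

```python
from collections import defaultdict

class Instruction:
    """Toy data-flow instruction with two input ports (0 and 1)."""
    def __init__(self, op):
        self.op = op
        self.waiting = defaultdict(dict)  # wave number -> {port: value}
        self.results = {}                 # wave number -> result

    def receive(self, wave, port, value):
        slots = self.waiting[wave]
        slots[port] = value
        if len(slots) == 2:               # both operands tagged with this wave
            self.results[wave] = self.op(slots[0], slots[1])

add = Instruction(lambda a, b: a + b)
add.receive(wave=0, port=0, value=1)
add.receive(wave=1, port=0, value=10)   # later iteration: held apart by its tag
add.receive(wave=0, port=1, value=2)    # wave 0 now complete -> fires
add.receive(wave=1, port=1, value=20)   # wave 1 now complete -> fires
```

Because matching is keyed on the wave number rather than on arrival order, out-of-order delivery across iterations is harmless.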

  6. WaveScalar: Memory
  • Relaxed program order
  • Follow control flow
  • Obey dependencies
  • Distributed store buffers
  • Hardware coherence
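A minimal sketch of the store-buffer idea, assuming each memory operation arrives tagged with a sequence number within its wave (a simplification; WaveScalar's actual wave-ordered memory annotations are richer). Operations may arrive out of order, but the buffer applies them to memory only in sequence order, preserving dependencies.

```python
class StoreBuffer:
    """Toy store buffer: applies memory ops in sequence-number order."""
    def __init__(self):
        self.pending = {}   # seq -> (op, addr, value)
        self.next_seq = 0   # next op allowed to touch memory
        self.memory = {}
        self.log = []       # (op, addr, observed value), in applied order

    def issue(self, seq, op, addr, value=None):
        self.pending[seq] = (op, addr, value)
        # Drain every op whose predecessors have all arrived.
        while self.next_seq in self.pending:
            op, addr, value = self.pending.pop(self.next_seq)
            if op == "store":
                self.memory[addr] = value
            self.log.append((op, addr, self.memory.get(addr)))
            self.next_seq += 1

sb = StoreBuffer()
sb.issue(1, "load", 0x10)        # arrives early: buffered until seq 0 lands
sb.issue(0, "store", 0x10, 42)   # unblocks both ops; load observes the store
```

Distributing several such buffers across the chip is what then creates the coherence problem the slide mentions.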

  7. Evaluation
  • WaveCache: 4 MB of on-chip instructions + data, 2K ALUs
  • WaveCache vs. superscalar: 16-wide OOO, 1K registers, 1K-entry window
  • WaveCache vs. TRIPS: 4 16-wide in-order cores, 2 MB on-chip cache
  • Key assumption: perfect memory
  Fair comparisons? Is it reasonable to assume perfect memory?

  8. Results
  • WaveCache outperforms the superscalar
  • Similar performance to TRIPS

  9. Memory is the problem, not ILP
  • Data-flow exposes greater ILP
  • Memory not fast enough for low-ILP CPUs
  • Processor-memory performance gap
  • What does perfect memory hide?
  • Does superscalar perform better?
  • Did not model hardware coherence
  WaveCache needs MORE bandwidth than a superscalar.

  10. Is WaveScalar Scalable?
  • Sub-linear performance improvement
  • More clusters, further away from memory
  • SPEC, MediaBench fit easily in memory
  • What happens to performance when the working set doesn’t fit in the WaveCache?

  11. Compiler Challenges
  • Wave identification
  • Can waves be optimized for performance?
  • Handling path explosion
  • 1 BR/5 inst → 1050 loaded for 100 executed?
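One way to see the path explosion is a back-of-envelope model (an assumption for illustration, not the slide's exact arithmetic): with one branch every five instructions, a 100-instruction executed path crosses 20 branches, and a fetch policy that speculatively loads both targets of every branch pulls in a full binary tree of basic blocks.

```python
BLOCK = 5               # instructions per basic block (one branch per 5 inst)
DEPTH = 100 // BLOCK    # 20 branches along any 100-instruction executed path

# Fetching both targets of every branch loads a full binary tree of
# basic blocks with DEPTH+1 levels.
fetched_blocks = 2 ** (DEPTH + 1) - 1
fetched_instructions = fetched_blocks * BLOCK
executed_instructions = DEPTH * BLOCK

print(fetched_instructions, executed_instructions)
```

Under this toy model, millions of instructions would be loaded per hundred executed, which is why wave identification and fetch policy matter to the compiler.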

  12. Compiler Challenges
  • Semi-static instruction placement
  • Fetch partial/complete waves
  • Loads/stores close to memory
  • Clustering neighboring instructions
  • Reduce coherence traffic
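The clustering bullet can be sketched as a greedy placement heuristic (a toy of my own construction, not the paper's algorithm; `place` and `CLUSTER_CAPACITY` are illustrative names): put each instruction in the same cluster as its producer when there is room, so dependent instructions communicate locally instead of across the grid.

```python
from collections import defaultdict

CLUSTER_CAPACITY = 4  # illustrative limit on instructions per cluster

def place(deps):
    """deps: {instruction: producer or None}, in fetch order.
    Returns {instruction: cluster id}."""
    placement = {}
    load = defaultdict(int)
    next_cluster = 0
    for inst, producer in deps.items():
        target = placement.get(producer)
        if target is None or load[target] >= CLUSTER_CAPACITY:
            target = next_cluster      # open a fresh cluster
            next_cluster += 1
        placement[inst] = target
        load[target] += 1
    return placement

# Chain a -> b -> c lands in one cluster; independent d gets its own.
g = {"a": None, "b": "a", "c": "b", "d": None}
print(place(g))
```

A real placer would also weigh the loads/stores-near-memory constraint from the slide; this sketch only captures the locality idea.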
