Is Out-Of-Order Out Of Date ?

IA-64’s parallel architecture will improve processor performance. William S. Worley Jr., HP Labs; Jerry Huck, IA-64 Architecture Lab. Slides by Selvin, Pascal, Pavel.

Presentation Transcript


  1. Is Out-Of-Order Out Of Date ? IA-64’s parallel architecture will improve processor performance. William S. Worley Jr., HP Labs; Jerry Huck, IA-64 Architecture Lab. Slides by Selvin, Pascal, Pavel

  2. The prelude to the IA-64
     • The need for greater processing power is increasing
       • New innovative computing technologies
       • Traditional computing has increasing problem sizes
     • Architecture designed from the ground up to support ILP
       • Enables the compiler to express more parallelism (EPIC)
       • Reduces the hardware cost of scheduling parallel instructions
     • Current approaches
       • Legacy architectures were not designed primarily for high ILP
       • Non-architectural, principally OOO dynamic superscalar hardware
     • IA-64
       • Growing market for high-performance 64-bit architecture
       • No existing Intel 64-bit binaries

  3. Not just for ILP
     • Better building block for high performance systems
     • Multi-programming gives limited improvements
     • Parallelism has to be improved at all levels in the system
     • Solely hardware-based multithreading cannot compensate for lack of parallelism in the basic processing element
     • SMT, CMP apply equally to RISC, CISC and EPIC
     • Integrated hardware multithreading is orthogonal to EPIC
     • Inter-thread interference in SMT processors
     • Hardware Resource Utilization vs. Complexity
     • Transistors: PA-8000 re-order buffer = PA-7200
     • Complexity scales quadratically for 1.5x or 2x increase in issue-width
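The last bullet can be made concrete with a short calculation. As a simplifying assumption, let the scheduling complexity C depend only on the issue width w and grow as its square (both symbols are introduced here for illustration and do not appear on the slide). The two increases named on the slide then give:

```latex
C(w) \propto w^{2}
\quad\Longrightarrow\quad
\frac{C(1.5\,w)}{C(w)} = 1.5^{2} = 2.25,
\qquad
\frac{C(2\,w)}{C(w)} = 2^{2} = 4
```

Under that assumption, widening issue by half more than doubles the scheduling hardware, and doubling it quadruples it.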

  4. Architecture vs. Implementation
     • Speed of functional units is architecture independent
     • Memory and data-cache hierarchy
     • Largely independent of the architecture
     • OOO RISC designs achieve better utilization
     • With additional cost, it is possible to realise better designs
     • IA-64 memory system balances cost and performance
     • Cycle time of IA-64
     • IC process, number of registers, register ports, bypass network, number of cache ports
     • Critical path is found in functional units and bypass networks
     • IA-64 has higher utilization of this fundamental structure

  5. IA-64 Parallelism Capabilities
     • Predication:
       • fewer branches encountered
       • fewer mispredicted branches
       • more parallelism
     • Larger register set:
       • new coding strategies (impossible with RISC)
       • more efficient than register renaming (RISC)
       • less data loss in the event of an interruption
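The predication bullets can be illustrated with a small if-conversion sketch. C has no predicate registers, so this only emulates the idea: the comparison result is kept as a 0/1 value and used to guard the update instead of branching around it. The function names are hypothetical, and a real compiler may well transform either form on its own.

```c
#include <stddef.h>

/* Branching form: one data-dependent conditional branch per element,
   which the hardware has to predict. */
long sum_positive_branchy(const long *v, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > 0) {
            sum += v[i];
        }
    }
    return sum;
}

/* If-converted form: the comparison yields a predicate p (0 or 1) and
   the add always executes, contributing nothing when p is 0.
   This emulates a predicated add; it is not IA-64 code. */
long sum_positive_predicated(const long *v, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        long p = (v[i] > 0);   /* predicate */
        sum += p * v[i];       /* guarded update, no branch in the source */
    }
    return sum;
}
```

Both functions return the same result for any input; the second simply has no data-dependent branch in the loop body, which is the property the slide credits with fewer mispredictions.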

  6. IA-64 Parallelism Capabilities (2)
     • Features to deal with memory latency:
       • earlier access to variables
       • not restricted to fixed hardware algorithms for:
         • correctly predicting execution path
         • triggering memory fetches
       • heuristics to identify speculative load candidates
       • compiler involved
       • control of the degree of speculation by the programmer
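A rough model of the speculative-load idea may help here. IA-64 lets a load be hoisted above the branch that guards it: a speculative load defers any fault by marking its target register, and a later check at the load's original position diverts to recovery code if the mark is set. The sketch below imitates that in plain C. The `spec_reg` type and all function names are hypothetical, an invalid address is modelled only as `NULL`, and the recovery path is reduced to returning a fallback value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a register that carries a deferred-fault flag. */
typedef struct {
    long value;
    bool nat;   /* true if the speculative load could not complete */
} spec_reg;

/* Model of a speculative load: it never faults. A bad address
   (modelled here only as NULL) just sets the flag. */
spec_reg spec_load(const long *addr) {
    spec_reg r = {0, false};
    if (addr == NULL) {
        r.nat = true;
    } else {
        r.value = *addr;
    }
    return r;
}

/* The load is issued before the guard is evaluated, so its latency
   overlaps with other work; the flag is checked only if the value
   is actually needed. */
long scaled_or_default(const long *addr, bool use_it, long fallback) {
    spec_reg r = spec_load(addr);   /* hoisted above the branch */
    if (!use_it) {
        return fallback;            /* speculation unused, no harm done */
    }
    if (r.nat) {
        return fallback;            /* stand-in for the recovery path */
    }
    return r.value * 2;
}
```

The point of the design is that the decision of which loads to hoist is made by the compiler (or the programmer), not by a fixed hardware algorithm.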

  7. IA-64 Parallelism Capabilities (3)
     • Register Stack Engine (RSE):
       • increases the utilization of the register file
       • reduces the cost of procedure calls and returns
       • especially valuable for object-oriented code
       • straightforward hardware design
     • Mechanisms to deliver instructions to the processor
       • eliminate effects of increased code size
       • modest design costs

  8. Results
     • Comparison between PA-RISC and IA-64
     • 15 codes (encryption, decryption and keying for five AES algorithms)
     • 8/15 IA-64 codes used more than 32 registers
     • 6/15 IA-64 codes smaller
     • 2/15 IA-64 codes 4 times smaller
     • overall code size 27% larger (could have been reduced to 10%)

  9. Compilers and IA-64
     IA-64 uses existing compiler techniques to exploit parallelism:
     • data prefetch
     • branch hints
     • loop unrolling
     • profile-based path instructions
     • other
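Of the techniques listed, loop unrolling is the easiest to show in source form. The following is a minimal sketch, assuming a dot product as the workload and an unroll factor of four; the function name is hypothetical. Using four independent accumulators removes the dependence between consecutive additions, which gives the compiler four operations per iteration that it can schedule in parallel.

```c
#include <stddef.h>

/* Dot product unrolled by four with independent accumulators, so the
   four multiply-adds in one iteration do not depend on each other. */
double dot_unrolled4(const double *a, const double *b, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main unrolled loop: handles elements in groups of four. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Remainder loop: the last n % 4 elements. */
    for (; i < n; i++) {
        s0 += a[i] * b[i];
    }

    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, the result can differ in the last bits from a strictly sequential sum; that reordering is the price of exposing the parallelism.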

  10. Need for compiler support
     • IA-64 does require well-prepared code (profiled, with branch hints, etc.) to achieve high performance, but this is also true for out-of-order processors.
     • Lack of code profiling is equally harmful for both IA-64 and OOO architectures.
     • With profiled code, IA-64 is superior to OOO, as proven by benchmark tests (specFP64)

  11. Critical path instructions (e.g. long latency operations)
     • OOO compilers don’t distinguish them, so such instructions often have a high execution cost
     • IA-64 compilers must detect such instructions and make sure they start first (*)
     * Cost of mispredicts is minimized by prefetches issued by the compiler

  12. Dealing with cache misses
     • Compiler contribution:
       • static code generation (i.e. fewer branches)
       • branch hints
     • Hardware mechanisms:
       • sample instructions on timer ticks, get information about actual program flow (HP)
       • feedback info on cache misses back to the program (Intel Itanium)

  13. Dynamic prediction mechanisms (2)
     • IA-64 has hint fields in most branch and memory instructions that allow the program to collect flow information and pass it to the processor.
     • These features allow software to improve performance at run time, without recompilation.

  14. Current and Future IA-64 implementations
     • Initial implementation (as always) focuses on the most important architectural elements only.
     • It uses the ideas of EPIC while providing compatibility with IA-32 and PA-RISC processors.
     • Future implementations will deliver even more ILP
     • Creators assure that the IA-64 architecture will not remain fixed
