IA-64 Microarchitecture --- Itanium Processor

Presentation Transcript


  1. IA-64 Microarchitecture --- Itanium Processor • Jun Feng, Jun Xie, Huafeng Lü

  2. Outline • Introduction • Pipeline Issue • Performance Comparison • Summary

  3. Itanium Processor • First implementation of IA-64 • Compiler-based exploitation of ILP • Also has many superscalar features

  4. 10-stage Pipeline • Front-end • Instruction delivery • Operand delivery • Execution

  5. Front-end • Stages: IPG, Fetch, Rotate • Prefetches up to 32 bytes per cycle (2 bundles) into a prefetch buffer that holds up to 8 bundles • Branch prediction uses a multilevel adaptive predictor
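
The 32-bytes-per-cycle figure follows from the IA-64 bundle format: a bundle is 128 bits (16 bytes), made up of a 5-bit template and three 41-bit instruction slots. The following is a minimal Python sketch of that layout, not something taken from the slides; the function name `split_bundle` and the constants are illustrative.

```python
BUNDLE_BITS = 128    # one IA-64 bundle
TEMPLATE_BITS = 5    # template field in the low-order bits
SLOT_BITS = 41       # each of the three instruction slots

def split_bundle(bundle: int):
    """Split a 128-bit bundle (held as an int) into (template, [slot0, slot1, slot2])."""
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = []
    for i in range(3):
        shift = TEMPLATE_BITS + i * SLOT_BITS
        slots.append((bundle >> shift) & ((1 << SLOT_BITS) - 1))
    return template, slots

# Fetch bandwidth quoted on the slide, expressed in bundles.
BYTES_PER_BUNDLE = BUNDLE_BITS // 8
FETCH_BYTES_PER_CYCLE = 32
bundles_per_cycle = FETCH_BYTES_PER_CYCLE // BYTES_PER_BUNDLE
```

With two bundles of three slots each, the front-end can supply up to six instructions per cycle.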

  6. Instruction delivery • Stages: EXP and REN • Distributes up to 6 instructions to the 9 functional units • Implements register renaming for both register rotation and register stacking
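
Register rotation can be pictured as adding a rotating register base (RRB) to the logical register number, modulo the size of the rotating region. The sketch below is a simplified model under stated assumptions: it covers only the general registers starting at r32, ignores the register stack mapping, and uses hypothetical helper names (`rename`, `rotate`) that are not from the slides.

```python
ROT_BASE = 32  # rotating general registers start at r32 in IA-64

def rename(logical: int, rrb: int, rot_size: int) -> int:
    """Map a logical register number to a renamed one (simplified model)."""
    if ROT_BASE <= logical < ROT_BASE + rot_size:
        # Inside the rotating region: offset by RRB and wrap around.
        return ROT_BASE + (logical - ROT_BASE + rrb) % rot_size
    # Outside the rotating region: no renaming.
    return logical

def rotate(rrb: int, rot_size: int) -> int:
    """Advance the rotation by one step (RRB decrements, wrapping)."""
    return (rrb - 1) % rot_size
```

In this model, a value written to r32 in one loop iteration is visible as r33 after one rotation, which is what lets software-pipelined loops avoid explicit register copies.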

  7. Operand delivery • Stages: WLD and REG • Accesses the register file • Performs register bypassing • Accesses and updates a register scoreboard • Checks predicate dependences
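
As a rough illustration of what a register scoreboard does, the toy model below tracks which destination registers still have a result outstanding and reports a stall when an instruction reads one of them. It is a generic sketch, not a description of Itanium's actual scoreboard logic; the class and method names are hypothetical, and predicates are left out.

```python
class Scoreboard:
    """Toy register scoreboard: tracks registers with results still pending."""

    def __init__(self):
        self.pending = set()

    def must_stall(self, sources):
        # Stall if any source register is still waiting for its producer.
        return any(reg in self.pending for reg in sources)

    def issue(self, dest):
        # Mark the destination as pending when its producer issues.
        self.pending.add(dest)

    def complete(self, dest):
        # Clear the pending mark when the result becomes available.
        self.pending.discard(dest)
```

For example, after `issue("r8")` a consumer reading `r8` must stall until `complete("r8")` is called.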

  8. Execution • Stages: EXE, DET and WRB • Executes instructions through ALUs and load/store units • Detects exceptions and posts NaTs • Retires instructions and performs write-back

  9. Integer Performance • SPECint benchmark: Itanium is considerably slower than Alpha 21264 and Pentium 4 • Itanium reaches only 60% of the Pentium 4 score and 68% of the Alpha score • Test systems: Itanium: HP rx4610, 800 MHz, 4 MB off-chip L3 cache; Alpha 21264: Compaq GS320, 1 GHz, on-chip L2 cache; Pentium 4: Compaq Precision 330, 2 GHz, 256 KB on-chip L2 cache

  10. Floating Point Performance • SPECfp benchmarks: a different story • Itanium is quicker than Alpha 21264 and Pentium 4 • Itanium reaches 108% of the Pentium 4 score and 120% of the Alpha score • Test systems: Itanium: HP rx4610, 800 MHz, 4 MB off-chip L3 cache; Alpha 21264: Compaq GS320, 1 GHz, on-chip L2 cache; Pentium 4: Compaq Precision 330, 2 GHz, on-chip L2 cache
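
Because the three test systems run at very different clock rates, it can help to look at the quoted ratios per clock cycle as well. The sketch below only rescales the percentages and clock speeds given on slides 9 and 10; the helper name `per_clock_ratio` is illustrative, and clock-normalized figures are a rough lens rather than a fair overall metric.

```python
ITANIUM_MHZ = 800

def per_clock_ratio(relative_score: float, other_mhz: float) -> float:
    """Itanium's score relative to another CPU, normalized by clock rate."""
    return relative_score * other_mhz / ITANIUM_MHZ

# (benchmark, competitor) -> Itanium's clock-normalized ratio
per_clock = {
    ("SPECint", "Pentium 4"): per_clock_ratio(0.60, 2000),
    ("SPECint", "Alpha 21264"): per_clock_ratio(0.68, 1000),
    ("SPECfp", "Pentium 4"): per_clock_ratio(1.08, 2000),
    ("SPECfp", "Alpha 21264"): per_clock_ratio(1.20, 1000),
}

for (bench, cpu), ratio in per_clock.items():
    print(f"{bench} vs {cpu}: {ratio:.2f}x per clock")
```

Under this normalization Itanium does about 1.5 times the integer work per clock of the Pentium 4 but only about 0.85 times that of the Alpha; for floating point the figures are about 2.7 and 1.5 respectively.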

  11. Discussion on SPECfp • Floating-point applications: competitive (higher degrees of ILP, aggressive memory system) • Art benchmark: 4 times the Pentium 4 score • Alpha: outperforms when tuned • Power: worse than the Pentium 4, at 56% of its floating-point performance per watt

  12. Summary By Us • Good floating-point performance • Poor integer performance • Overall: not as good as Intel advertised

  13. Conclusion • Large code size • Only static instruction-level parallelism • Cannot manage cache misses/hits flexibly • Lack of applications
