Computer Architecture Principles Dr. Mike Frank

Presentation Transcript


  1. Computer Architecture Principles, Dr. Mike Frank, CDA 5155 (UF) / CA 714-R (NTU), Summer 2003, Module #28: Limits to Instruction-Level Parallelism

  2. Limits of ILP (3.8)
     • There are limits to the amount of instruction-level parallelism that can be exploited!
     • Assume a perfect processor:
         • ∞ virtual registers, so no WAR/WAW hazards
         • Perfect branch/jump prediction
         • All memory addresses known/predicted exactly
         • Perfect caches (no fetch stalls)
         • ∞ issue width for all instruction types
         • 1-cycle execution latency for all instruction types
     • The only remaining limits on ILP are then true data dependences through registers and memory.
     • Caveat: this may not be a true limit, because even these dependences may be reduced somewhat through data value prediction.
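Under these assumptions, a program's ILP is set entirely by its dataflow graph. Each instruction can issue one cycle after its last true (RAW) producer, so average IPC is the instruction count divided by the length of the longest dependence chain. A minimal Python sketch of that idealized scheduler follows. The trace format (a destination plus source names, with registers and memory addresses treated alike) and the name `dataflow_ilp` are hypothetical and were chosen for illustration; they are not taken from the lecture.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical trace format: (destination location or None, source locations).
# Locations may be register names or memory addresses; both are treated alike.
Instr = Tuple[Optional[str], Sequence[str]]


def dataflow_ilp(trace: List[Instr]) -> float:
    """Average IPC of a dynamic trace on an idealized 'perfect' processor.

    Assumes unit latency for every instruction, unlimited issue width,
    perfect renaming (WAR/WAW ignored) and perfect branch/address prediction,
    so only true (RAW) dependences constrain scheduling.
    """
    ready: Dict[str, int] = {}  # location -> cycle in which its newest value is produced
    last_cycle = 0
    for dest, srcs in trace:
        # Issue one cycle after the latest producer of any source (cycle 1 if none).
        cycle = 1 + max((ready.get(s, 0) for s in srcs), default=0)
        if dest is not None:
            ready[dest] = cycle  # renaming: later readers see this newest writer
        last_cycle = max(last_cycle, cycle)
    return len(trace) / last_cycle if trace else 0.0


if __name__ == "__main__":
    trace = [
        ("r1", []),            # cycle 1
        ("r2", []),            # cycle 1
        ("r3", ["r1", "r2"]),  # cycle 2 (RAW on r1 and r2)
        ("r4", ["r3"]),        # cycle 3
        ("r1", ["r4"]),        # cycle 4: the WAW on r1 is ignored thanks to renaming
        ("r5", []),            # cycle 1
    ]
    print(dataflow_ilp(trace))  # 6 instructions / 4 cycles = 1.5
```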

  3. Average ILP in a Perfect Processor
     • Note that ILP, even on a perfect processor, is limited in real applications!

  4. Implications of ILP Limitations
     • Part of the historical improvement in computer performance has come from decreased CPI
         • i.e., increased IPC, the amount of ILP exploited
     • Reaching an ILP limit implies no more CPI reductions! (For serially-written programs.)
     • Further performance improvements may then come only from:
         • Reducing the total # of instructions (more efficient application algorithms)
         • Increased parallelism via other methods:
             • Programmer-visible parallelism, via various styles:
                 • Thread-level parallelism
                 • Explicit vector-based programming styles
         • Improved clock speed:
             • Reduce gate delays per clock (but the minimum is 1)
             • Improve logic gate speed (but it scales only ∝ length)
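The options above correspond to the three factors of the usual CPU performance equation: CPU time = instruction count × CPI × clock period. Once CPI is stuck at the ILP limit, only the instruction count and the clock can still change. The small Python sketch below makes this concrete. Its function names and all of its numbers are hypothetical and illustrative; they are not measurements from the lecture.

```python
def cpu_time_seconds(instr_count: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x CPI x clock period (= 1 / clock rate)."""
    if instr_count < 0 or cpi <= 0 or clock_hz <= 0:
        raise ValueError("need instr_count >= 0, cpi > 0, clock_hz > 0")
    return instr_count * cpi / clock_hz


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new configuration is than the old one."""
    return old_seconds / new_seconds


if __name__ == "__main__":
    # Hypothetical numbers: 1e9 instructions, CPI 0.5 (IPC 2), 2 GHz clock.
    base = cpu_time_seconds(1e9, 0.5, 2e9)  # 0.25 s
    # With CPI pinned at the ILP limit, only the other two factors can help:
    better_algorithm = cpu_time_seconds(0.8e9, 0.5, 2e9)  # 20% fewer instructions
    faster_clock = cpu_time_seconds(1e9, 0.5, 3e9)        # 1.5x clock rate
    print(f"{speedup(base, better_algorithm):.2f}x, {speedup(base, faster_clock):.2f}x")
    # prints 1.25x, 1.50x
```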

  5. Effect of Window Size on ILP
     [Figure: IPC vs. window size, annotated with the max size to date, a typical size, and the size used in the subsequent analysis.]
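A finite window can be added to the same kind of idealized dataflow scheduler. The sketch below assumes a deliberately simplified window rule: instruction i cannot issue until instruction i − W has issued and freed its slot. That rule, the trace format, and the name `windowed_ilp` are illustrative assumptions. They are not the exact model behind the figure.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical trace format: (destination or None, source locations).
Instr = Tuple[Optional[str], Sequence[str]]


def windowed_ilp(trace: List[Instr], window: int) -> float:
    """Average IPC with a finite window of `window` in-flight instructions.

    Simplifying assumption: instruction i may enter the window (and issue)
    only after instruction i - window has issued. Unit latency, unlimited
    issue width, perfect renaming and perfect prediction are still assumed.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    ready: Dict[str, int] = {}  # location -> cycle its newest value is produced
    issue: List[int] = []       # issue cycle of each instruction, in program order
    for i, (dest, srcs) in enumerate(trace):
        cycle = 1 + max((ready.get(s, 0) for s in srcs), default=0)  # RAW limit
        if i >= window:
            cycle = max(cycle, issue[i - window] + 1)  # wait for a free window slot
        issue.append(cycle)
        if dest is not None:
            ready[dest] = cycle
    return len(trace) / max(issue) if issue else 0.0


if __name__ == "__main__":
    independent = [(f"r{i}", []) for i in range(8)]
    for w in (1, 2, 4, 8):
        print(w, windowed_ilp(independent, w))  # prints 1 1.0, 2 2.0, 4 4.0, 8 8.0
```

On a fully independent trace the window is the only constraint, so IPC equals W. With real dependences, IPC instead levels off at the dataflow limit once W exceeds the parallelism the program actually contains.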

  6. Imperfect Branch Prediction: Effect on ILP exploitable across basic blocks via speculation
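Mispredictions can be folded into the same idealized model. A mispredicted branch stops speculation, so no later instruction can issue until the branch has resolved. In this minimal sketch the misprediction flags are supplied by hand as a hypothetical input; a real study would obtain them by running a predictor over the trace. The name `ilp_with_mispredictions` is illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical trace format: (destination or None, sources, is_mispredicted_branch).
Instr = Tuple[Optional[str], Sequence[str], bool]


def ilp_with_mispredictions(trace: List[Instr]) -> float:
    """Average IPC when each mispredicted branch acts as a speculation barrier.

    Instructions after a mispredicted branch may not issue before the cycle
    following the branch's execution. Otherwise the machine is 'perfect':
    unit latency, unlimited issue width, perfect renaming.
    """
    ready: Dict[str, int] = {}
    barrier = 0  # cycle in which the most recent mispredicted branch resolved
    last_cycle = 0
    for dest, srcs, mispredicted in trace:
        cycle = 1 + max((ready.get(s, 0) for s in srcs), default=0)  # RAW limit
        cycle = max(cycle, barrier + 1)  # cannot speculate past a misprediction
        if dest is not None:
            ready[dest] = cycle
        if mispredicted:
            barrier = cycle
        last_cycle = max(last_cycle, cycle)
    return len(trace) / last_cycle if trace else 0.0


if __name__ == "__main__":
    before = [(f"r{i}", [], False) for i in range(4)]    # issue in cycle 1
    after = [(f"r{i}", [], False) for i in range(4, 8)]  # independent of the branch
    for mispredicted in (False, True):
        branch = [(None, ["r0"], mispredicted)]          # reads r0, so resolves in cycle 2
        print(mispredicted, ilp_with_mispredictions(before + branch + after))
    # False 4.5  (9 instructions in 2 cycles)
    # True 3.0   (the 4 later instructions wait until cycle 3)
```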

  7. Effect of Finite Register Set

  8. Imperfect Alias Analysis

  9. Ambitious Semi-Realistic ILP
     [Figure: IPC]
     • Configuration studied here: up to 64 instructions issued per clock, with no issue restrictions; a tournament branch predictor with 1K entries; a 16-entry return predictor; perfect dynamic alias analysis; register renaming with 64 extra integer and 64 extra FP registers.
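A return predictor like the 16-entry one in this configuration (also the "return stack" fix listed under the loop/return/stack-pointer dependences) is commonly a small stack of return addresses. Each call pushes the address that follows it, and each return pops a predicted target. This is a minimal Python sketch. The class name and the overwrite-the-oldest-entry policy on overflow are assumptions, not details given on the slide.

```python
from typing import List, Optional


class ReturnAddressStack:
    """Fixed-size return-address predictor.

    Assumption: on overflow the oldest entry is overwritten (circular buffer),
    so very deep call chains mispredict their outermost returns.
    """

    def __init__(self, entries: int = 16) -> None:
        if entries < 1:
            raise ValueError("entries must be >= 1")
        self.entries = entries
        self.buf: List[int] = [0] * entries
        self.top = 0    # index of the next slot to write
        self.count = 0  # number of valid entries, at most `entries`

    def on_call(self, return_addr: int) -> None:
        """Push the address of the instruction after the call."""
        self.buf[self.top] = return_addr
        self.top = (self.top + 1) % self.entries
        self.count = min(self.count + 1, self.entries)

    def predict_return(self) -> Optional[int]:
        """Pop the predicted return target, or None if the stack is empty."""
        if self.count == 0:
            return None
        self.top = (self.top - 1) % self.entries
        self.count -= 1
        return self.buf[self.top]


if __name__ == "__main__":
    ras = ReturnAddressStack(16)
    for addr in (0x100, 0x200, 0x300):  # three nested calls
        ras.on_call(addr)
    print([hex(ras.predict_return()) for _ in range(3)])  # ['0x300', '0x200', '0x100']
```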

  10. Limits on ILP in ‘Perfect’ Scheme
     • WAR & WAW hazards through memory
         • E.g., reuse of stack memory in procedure calls
     • Unnecessary dependences on:
         • Loop counter variables: fix via unrolling
         • Return address register: fix with a return stack
         • Stack pointer: fix by using non-stack-based code?
     • Dataflow limit
         • Might be overcome through value prediction:
             • Predicting data values, speculating on the result
             • Predicting address values for memory alias elimination
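Value prediction, proposed above as a way past the dataflow limit, can be sketched with a per-instruction stride predictor. Once the same stride has repeated often enough, it guesses last value + stride, which suits loop counters and array addresses. The table layout, the confidence threshold of 2, and the class name are assumptions made for illustration; this is not a scheme from the lecture. A real design would also need recovery when a speculated value turns out to be wrong.

```python
from typing import Dict, Optional, Tuple


class StrideValuePredictor:
    """Per-PC stride value predictor (hypothetical parameters).

    Each entry holds (last value, last stride, confidence). A prediction of
    last + stride is made only when confidence has reached `threshold`.
    """

    def __init__(self, threshold: int = 2, max_conf: int = 3) -> None:
        self.table: Dict[int, Tuple[int, int, int]] = {}
        self.threshold = threshold
        self.max_conf = max_conf

    def predict(self, pc: int) -> Optional[int]:
        entry = self.table.get(pc)
        if entry is None:
            return None
        last, stride, conf = entry
        return last + stride if conf >= self.threshold else None

    def update(self, pc: int, actual: int) -> None:
        """Record the value the instruction at `pc` actually produced."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = (actual, 0, 0)
            return
        last, stride, conf = entry
        new_stride = actual - last
        # Same stride again: grow confidence; otherwise retrain from scratch.
        conf = min(conf + 1, self.max_conf) if new_stride == stride else 0
        self.table[pc] = (actual, new_stride, conf)


if __name__ == "__main__":
    vp = StrideValuePredictor()
    pc = 0x40  # hypothetical PC of a loop-counter increment
    hits = 0
    for value in range(10):  # the counter takes values 0, 1, ..., 9
        if vp.predict(pc) == value:
            hits += 1  # a correct guess would let dependents issue early
        vp.update(pc, value)
    print(hits)  # 6: values 4..9 are predicted after a 4-value warm-up
```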