
Application of Instruction Analysis/Synthesis Tools to x86’s Functional Unit Allocation

Ing-Jer Huang and Ping-Huei Xie, Institute of Computer & Information Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan 80441, R.O.C. ijhuang@cie.nsysu.edu.tw


Presentation Transcript


  1. Application of Instruction Analysis/Synthesis Tools to x86’s Functional Unit Allocation • Ing-Jer Huang and Ping-Huei Xie, Institute of Computer & Information Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan 80441, R.O.C. • ijhuang@cie.nsysu.edu.tw

  2. Superscalar Model under Investigation • Decoupled superscalar architecture • Register renaming • Branch prediction • Assumptions: • No cache misses • Fast instruction fetcher and decoder • 100% correct branch prediction • Load/store unit: 2 cycles; others: 1 cycle • Large RS and ROB

  3. The Problem • Q: How many functional units are needed in an x86-compatible superscalar core? • A: The distribution of functional unit usage in typical x86 programs. • [Figure: frequency of each FU usage pattern; patterns shown: 4A, 2M, 1B; 3A, 0M, 0B; 2A, 2M, 1B; 2A, 1M, 0B; 1A, 1M, 1B]
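The usage distribution asked for on this slide can be tabulated from any cycle-by-cycle schedule. The following is a minimal sketch, assuming a hypothetical schedule format (one list of unit classes per cycle, with A = integer, M = memory, B = branch, as in the notation given later in the slides). The function names and the sample schedule are illustrative and are not taken from the authors' tools.

```python
from collections import Counter

def usage_pattern(cycle_ops):
    """Return a label such as '2A, 1M, 0B' for the unit classes used in one cycle."""
    counts = Counter(cycle_ops)
    return ", ".join(f"{counts[u]}{u}" for u in ("A", "M", "B"))

def fu_distribution(schedule):
    """Map each usage pattern to the fraction of cycles in which it occurs."""
    patterns = Counter(usage_pattern(ops) for ops in schedule)
    total = sum(patterns.values())
    return {p: n / total for p, n in patterns.items()}

# Hypothetical schedule: one list of unit classes per cycle.
schedule = [["A", "A", "M"], ["A", "M", "B"], ["A", "A", "M"], ["A"]]
dist = fu_distribution(schedule)
for pattern, freq in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{pattern}: {freq:.2f}")
```

Sorting the patterns by frequency gives the kind of ranking the figure on this slide appears to show.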

  4. How to Obtain FU Distribution? • Simulation-based approaches [Shinatani, 1995], [Davidson, 1995], [Hara et al., 1996], etc. • Run on a different CPU platform • Slow, but can explore many configurations • Monitoring-based approaches [Adams et al., 1989], [Bhandarkar et al., 1997], [Huang, 1997], etc. • Run directly on the same CPU platform • Fast, but work only for the configuration of the underlying CPU platform

  5. A Fast Performance/Cost Approximation Environment

  6. ASIA: Automatic Synthesis of Instruction Set Architecture • GOAL: analyze and synthesize application-specific instruction sets for pipelined uniprocessors. • APPROACH: a micro-operation scheduling engine based on a simulated annealing algorithm • The superscalar core is treated as an application-specific RISC core for x86 emulation

  7. ASIA-II: Extensions for Superscalar Architecture • Register renaming • Temporary registers are used on the fly to resolve anti and output dependencies. • Execution window • Instructions are dispatched sequentially. • Branch prediction • Effective sizes of basic blocks are enlarged.

  8. Register Renaming • In ASIA-II: output and anti dependencies are ignored during scheduling
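To illustrate what ignoring output and anti dependencies means for a scheduler, the sketch below builds the dependency edges for a short sequence of micro-operations with and without renaming. The `(dest, sources)` tuple format and the function name are assumptions made for this example, not ASIA-II's actual representation.

```python
def dependencies(ops, renaming=True):
    """Return the set of (earlier, later) index pairs that constrain scheduling.

    Each op is a tuple (dest_register, [source_registers]).
    With renaming=True only true (read-after-write) dependencies are kept.
    """
    edges = set()
    last_writer = {}  # register -> index of the most recent op that wrote it
    readers = {}      # register -> ops that read it since its last write
    for j, (dst, srcs) in enumerate(ops):
        for r in srcs:
            if r in last_writer:
                edges.add((last_writer[r], j))        # true dependency (RAW)
        if not renaming:
            for i in readers.get(dst, []):
                edges.add((i, j))                     # anti dependency (WAR)
            if dst in last_writer:
                edges.add((last_writer[dst], j))      # output dependency (WAW)
        for r in srcs:
            readers.setdefault(r, []).append(j)
        last_writer[dst] = j
        readers[dst] = []
    return edges

# Hypothetical micro-ops: eax = ebx; ecx = eax; eax = edx
ops = [("eax", ["ebx"]), ("ecx", ["eax"]), ("eax", ["edx"])]
with_renaming = dependencies(ops, renaming=True)
without_renaming = dependencies(ops, renaming=False)
```

With renaming, the third micro-operation is free to move ahead of the other two because its write to `eax` would go to a fresh temporary register.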

  9. Realistic Patterns in the Execution Window • Balanced distribution: the objective function includes both time steps and H/W counts • Window effect: MOPs are displaced by a limited distance per move; a long-distance displacement is still possible over many iterations, as long as performance improves.
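The two points on this slide, an objective that combines time steps with hardware counts and moves that displace a micro-operation by a bounded distance, can be put together in a small simulated annealing loop. This is a rough sketch under several assumptions, not the ASIA-II engine: units are assumed fully pipelined (a unit is counted only in the step where an operation issues), the weights, cooling schedule, and `window` parameter are arbitrary, and all names are hypothetical.

```python
import math
import random
from collections import Counter

def cost(steps, ops, w_time=1.0, w_hw=1.0):
    """Objective: schedule length plus the number of units the schedule needs."""
    length = max(s + lat for s, (_, lat) in zip(steps, ops))
    usage = Counter((s, unit) for s, (unit, _) in zip(steps, ops))
    peak = Counter()
    for (_, unit), n in usage.items():
        peak[unit] = max(peak[unit], n)   # peak demand per unit class
    return w_time * length + w_hw * sum(peak.values())

def is_valid(steps, ops, deps):
    """A consumer may not start before its producer's result is available."""
    return all(s >= 0 for s in steps) and all(
        steps[j] >= steps[i] + ops[i][1] for i, j in deps)

def anneal(ops, deps, window=2, iters=5000, t0=2.0, alpha=0.999, seed=0):
    rng = random.Random(seed)
    # Start from a sequential schedule, which satisfies every dependency.
    steps, t = [], 0
    for _, lat in ops:
        steps.append(t)
        t += lat
    cur = cost(steps, ops)
    best, best_cost, temp = steps[:], cur, t0
    for _ in range(iters):
        k = rng.randrange(len(ops))
        cand = steps[:]
        cand[k] += rng.randint(-window, window)   # bounded displacement
        if is_valid(cand, ops, deps):
            new = cost(cand, ops)
            if new <= cur or rng.random() < math.exp((cur - new) / temp):
                steps, cur = cand, new
                if cur < best_cost:
                    best, best_cost = steps[:], cur
        temp *= alpha
    return best, best_cost

# Hypothetical micro-ops as (unit class, latency); deps are (producer, consumer).
ops = [("M", 2), ("A", 1), ("A", 1), ("B", 1)]
deps = {(0, 1), (1, 3)}
best, best_cost = anneal(ops, deps)
```

Because each move shifts one micro-operation by at most `window` steps, a long displacement only emerges from a chain of accepted short moves, which matches the window effect described on the slide.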

  10. Basic Block Expansion (Eblocks) Due to Branch Prediction

  11. A Small Example from Word97

  12. Extended Basic Blocks

  13. Scheduled Eblocks

  14. Description of Benchmark

  15. Micro-operation Level Parallelism (MLP)

  16. Functional Unit Usage • Notation: A - integer unit; M - memory unit; B - branch unit; F - floating-point unit • "Others" is the sum of all patterns whose frequency is less than 1.0%

  17. Accumulated Coverage of Functional Unit Allocation • [Figure: accumulated coverage, with markers for (NSC 98), (IA-64), (AMD K6), (Pentium Pro), (Base Machine)]
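One plausible reading of accumulated coverage is the fraction of cycles whose functional unit demand fits within a given allocation. The sketch below computes it under that assumption. The distribution values are invented purely for illustration and are not results from the presentation, and the function name is hypothetical.

```python
def coverage(distribution, allocation):
    """Fraction of cycles whose unit demand fits within `allocation`.

    `distribution` maps a demand tuple (A, M, B) to its frequency;
    `allocation` is the number of units (A, M, B) provided by the core.
    """
    return sum(freq for demand, freq in distribution.items()
               if all(d <= a for d, a in zip(demand, allocation)))

# Made-up usage distribution (not measured data); frequencies sum to 1.
distribution = {
    (1, 1, 1): 0.40,
    (2, 1, 0): 0.30,
    (2, 2, 1): 0.20,
    (4, 2, 1): 0.10,
}

for alloc in [(2, 1, 1), (2, 2, 1), (4, 2, 1)]:
    print(alloc, round(coverage(distribution, alloc), 2))
```

Evaluating a range of allocations this way shows how much additional coverage each extra unit buys, which is the trade-off the slide's comparison of machines points at.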

  18. Conclusions • Synthesis/analysis tools have been used to observe functional unit usage and MLP in a superscalar core. • The speedup over simulation is more than 600 times. • FUTURE WORK: investigate various microarchitecture features • Register renaming vs. branch prediction • Functional unit optimization
