
Emulating Unimplemented Instructions in an SMT

Suan Yong & Brian Forney, CS/ECE 752, Spring 2000


Presentation Transcript


  1. Emulating Unimplemented Instructions in an SMT Suan Yong & Brian Forney CS/ECE 752 Spring 2000

  2. Motivation • Simultaneous Multithreaded processors are promising and likely to be embraced by industry • Exploit thread level parallelism • Compaq’s Alpha 21464 has planned SMT support

  3. The question • What if SMT support is needed, but cost, complexity, or power consumption is an issue? • One solution is to remove functional units and emulate their instructions instead • Can anything be done to speed up emulation of instructions?
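To make "emulation" concrete, here is a minimal sketch of what a software routine might do if the integer multiplier were removed: a shift-and-add multiply that uses only operations an integer ALU provides. The slides do not show the actual emulation routine, so the function name `emulate_mulq` and the choice of a 64-bit wrapping multiply (modeled on Alpha's quadword multiply) are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit integer registers, as on Alpha


def emulate_mulq(src1: int, src2: int) -> int:
    """Return the low 64 bits of src1 * src2 using only add, shift, and AND.

    Hypothetical helper: it stands in for the handler that would run when
    the multiply unit is not implemented in hardware.
    """
    a = src1 & MASK64  # multiplicand, truncated to register width
    b = src2 & MASK64  # multiplier, consumed one bit per iteration
    product = 0
    while b:
        if b & 1:
            # Low bit of the multiplier is set: add the shifted multiplicand.
            product = (product + a) & MASK64
        a = (a << 1) & MASK64  # next power-of-two multiple
        b >>= 1
    return product
```

The loop runs once per significant bit of the multiplier, so a single multiply can expand into dozens of ALU operations. That expansion is the cost the question on this slide asks how to reduce.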

  4. Related work • “The Use of Multithreading for Exception Handling,” Zilles et al., MICRO-32, November 1999 • “Simultaneous Subordinate Microthreading (SSMT),” Chappell et al., 26th Annual ISCA, May 1999

  5. Exception Handling [Figure: pipeline timing diagram comparing exception handling in a standard pipeline with exception handling in an SMT.]

  6. Simultaneous Multithreading (SMT) [Figure: SMT pipeline block diagram. Labeled blocks: BRANCH PREDICTOR, FETCH, I$, DECODE, D$, R / W, I-ALU, I-MUL, FP, and two sets of PC and registers R1, R2, ..., Rn.]

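  7. Emulating SMT approach [Figure: the SMT block diagram with T-STRT and T-RET blocks added. Labeled blocks: BRANCH PREDICTOR, FETCH, I$, DECODE, D$, R / W, I-ALU, I-MUL, FP, T-STRT, T-RET, and two sets of PC and registers R1, R2, ..., Rn.]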
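The diagram labels on this slide and the following ones (T-STRT, T-RET, and src1/src2 appearing in the second context's R1 and R2) suggest a hand-off between the main thread and an emulation thread. The toy model below is one reading of that hand-off, not the authors' simulator: the names `Context`, `mul_handler`, and `run` are hypothetical, the instruction format is invented for the example, and the model is strictly sequential, so it shows only the data flow and none of the overlap that an SMT would provide.

```python
from dataclasses import dataclass, field

MASK64 = (1 << 64) - 1  # assumption: 64-bit registers


@dataclass
class Context:
    """One hardware thread context, reduced to its register file."""
    regs: dict = field(default_factory=dict)


def mul_handler(helper: Context) -> int:
    """Hypothetical emulation routine: shift-and-add multiply of R1 by R2."""
    a, b, acc = helper.regs["R1"], helper.regs["R2"], 0
    while b:
        if b & 1:
            acc = (acc + a) & MASK64
        a = (a << 1) & MASK64
        b >>= 1
    return acc


def run(program, main: Context, helper: Context) -> int:
    """Execute (op, dest, src1, src2) tuples; return the number of emulated ops."""
    emulated = 0
    for op, dest, s1, s2 in program:
        if op == "add":
            # Implemented in hardware: executes directly in the main context.
            main.regs[dest] = (main.regs[s1] + main.regs[s2]) & MASK64
        elif op == "mul":
            # Assumed T-STRT step: copy the operands into the helper's R1 and R2.
            helper.regs["R1"] = main.regs[s1] & MASK64
            helper.regs["R2"] = main.regs[s2] & MASK64
            # Assumed T-RET step: the handler's result lands in the main
            # thread's destination register.
            main.regs[dest] = mul_handler(helper)
            emulated += 1
        else:
            raise ValueError(f"unknown opcode: {op}")
    return emulated
```

In a real SMT the main thread's later instructions could keep issuing while the handler runs, which is presumably where the difference between the pausing and non-pausing simulator modes comes from.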

  8. [Figure: animation frame of the emulating SMT block diagram. Instructions A and B are in flight, and the second context's R1 and R2 are labeled src1 and src2.]

  9. [Figure: animation frame of the emulating SMT block diagram. Instructions A, B, and C are in flight, with src1 and src2 held in the second context's R1 and R2.]

  10. [Figure: animation frame of the emulating SMT block diagram. Instruction C is in the second context's registers.]

  11. [Figure: animation frame of the emulating SMT block diagram, as in the previous frame but with a "?" marked in the pipeline.]

  12. Methodology • Modified Zilles’s sim-multi Compaq Alpha SMT simulator: added exception thread support; added multiply thread • Ran representative execution traces of benchmarks from SPEC CPU2000 and MediaBench

  13. Simulator modes

  14. Conclusions • ESMT usually minimizes the performance cost of emulation • “ooo” mode (non-pausing) works best • “squash” is occasionally better, because of resource contention; would partial squashing help? • Some of the hardware is already needed, and could be useful for other purposes
