UltraSparc IV




  1. UltraSparc IV Tolga TOLGAY

  2. OUTLINE • Introduction • History • What is new? • Chip Multithreading • Pipeline • Cache • Branch Prediction • Conclusion

  3. INTRODUCTION • Sparc = Scalable Processor Architecture • Open processor architecture • Sun UltraSparc (SPARC V9): • RISC architecture • 64-bit address and data • Superscalar

  4. HISTORY • Sparc development begins – 1984 • First Sparc Processor – 1986 • SuperSparc – 1992 • UltraSparc I – 1995 • UltraSparc II – 1997 • UltraSparc III – 2001 • UltraSparc IV – 2004 • UltraSparc IV+ – 2005 • UltraSparc T1 – 2005

  5. WHAT IS NEW? • New in UltraSparc IV: • CMT (Chip Multithreading) • New registers added for the CMT enhancement • MCU registers and Sun Fireplan Interconnect registers are shared. • Enhancements to the Floating Point Unit • 16 MB L2 cache with a 128-byte line size, shared by the two processors. • L2 cache uses an LRU replacement strategy • New write-cache index-hashing feature

  6. Chip Multithreading (CMT) • Two UltraSparc III cores on one die. • The two mirrored cores share: • System bus • DRAM controller • Off-die L2 cache • Fireplan registers. • Also called Chip Multiprocessing

  7. Chip Multithreading

  8. Chip Multithreading • The aim is to increase performance without increasing clock speed. • Mirroring the cores causes a hot spot at the floating-point units. • How to avoid the hot spot: • Heat towers in the copper interconnect

  9. Chip Multithreading

  10. Core • More core improvements: • Improved instruction fetch and store bandwidth. • Improved data prefetching. • The FPU handles more unexpected and underflow cases, which reduces exceptions. • On-die write cache enhanced with a hashed index to better handle multiple writes.

  11. Pipeline • Because UltraSparc IV contains two UltraSparc III cores, it uses the same pipeline. • 4-way superscalar architecture. • 14-stage pipeline.

  12. Pipeline Stages

  13. Pipeline Stages

  14. Pipeline Stages

  15. Pipeline Stages • Stage A : Address Generation • Generates and selects the fetch address • The address can be selected from several sources • Stage P : Preliminary Fetch • Starts fetching from the I-Cache • Accesses the branch predictor • Stage F : Fetch • Second half of the I-Cache access • At the end of this stage, up to 4 instructions may be latched • Stage B : Branch Target Computation • Analyzes the instructions • Calculates the branch target address

  16. Pipeline Stages • Stage I : Instruction Group Formation • Instructions are grouped and placed in the instruction queue. • Stage J : Instruction Group Staging • A group of instructions is dequeued and sent to the R-stage • Stage R : Dispatch and Register Access • Dependency calculation • Dependency resolution

  17. Pipeline Stages • Stage E : Integer Instruction Execution • First stage of execution pipelines • Integer instructions -> A0 and A1 pipelines • Branch instructions -> Branch pipeline • Other instructions -> MS pipeline • Stage C : Cache • Integer pipelines write results back • SIU results are produced • First stage for Floating Point Instructions

  18. Pipeline Stages • Stage M : Miss • Data cache misses are determined • Second step for FP instructions • Stage W : Write • MS pipeline results are written • Third step for FP instructions • D-cache miss requests are sent to the L2 cache • Stage X : Extend • Final step for Floating Point instructions • Results from FP instructions are ready for bypass

  19. Pipeline Stages • Stage T : Trap • Traps are signalled • After a trap, instructions invalidate their results • Stage D : Done • Integer results are written into the architectural register file • Floating point results are written to the floating point register file. • Results become visible to any traps generated from younger instructions.
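
The fourteen stages listed on slides 15 to 19 can be written down as an ordered enumeration. The following is a minimal Python sketch, not Sun's notation: the `Stage` enum, the `PIPELINES` table and the `stage_timeline` helper are names invented here for illustration, and the timeline assumes an instruction that never stalls.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The 14 pipeline stages, in the order given on the slides."""
    A = 1   # Address generation
    P = 2   # Preliminary fetch
    F = 3   # Fetch
    B = 4   # Branch target computation
    I = 5   # Instruction group formation
    J = 6   # Instruction group staging
    R = 7   # Dispatch and register access
    E = 8   # Integer instruction execution
    C = 9   # Cache
    M = 10  # Miss
    W = 11  # Write
    X = 12  # Extend
    T = 13  # Trap
    D = 14  # Done


# Execution pipelines entered at stage E, per slide 17.
PIPELINES = {
    "integer": ("A0", "A1"),
    "branch": ("Branch",),
    "other": ("MS",),
}


def stage_timeline(start_cycle):
    """Cycle in which an unstalled instruction occupies each stage (assumption: no stalls)."""
    return {stage.name: start_cycle + stage.value - 1 for stage in Stage}


if __name__ == "__main__":
    print("".join(stage.name for stage in Stage))
    print(stage_timeline(0))
```

Under the no-stall assumption, an instruction whose fetch address is generated in cycle 0 reaches the D stage in cycle 13.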

  20. Pipeline Rules • Grouping rules : • Group : a collection of instructions that do not prevent each other from executing in parallel • Groups are formed before the R-stage • Grouping is needed because : • The execution order must be maintained • Each pipeline runs only a subset of instructions • Instructions may require helpers • Execution order : in-order execution
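
The grouping idea above can be sketched as a greedy, in-order pass over the instruction stream. This is a simplified Python illustration under stated assumptions, not the actual rule set from the user's manual: the `Instr` type, the `SLOTS` table (two integer pipelines, one branch pipeline, one MS pipeline) and the register-based dependency check are all hypothetical simplifications, and helper instructions are ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    kind: str                      # "integer", "branch", or "other"
    dest: Optional[str] = None     # register written, if any
    srcs: Tuple[str, ...] = ()     # registers read


# Assumed pipeline slots per group: A0 and A1, the Branch pipeline, the MS pipeline.
SLOTS = {"integer": 2, "branch": 1, "other": 1}
MAX_GROUP = 4  # 4-way superscalar


def form_groups(program):
    """Split an in-order instruction list into groups that can issue together."""
    groups, current = [], []
    written = set()
    used = dict.fromkeys(SLOTS, 0)
    for ins in program:
        # An instruction cannot join a group that already writes a register it reads or writes.
        depends = any(src in written for src in ins.srcs) or ins.dest in written
        no_slot = used[ins.kind] >= SLOTS[ins.kind]
        if current and (len(current) >= MAX_GROUP or depends or no_slot):
            groups.append(current)  # close the group; program order is preserved
            current, written = [], set()
            used = dict.fromkeys(SLOTS, 0)
        current.append(ins)
        used[ins.kind] += 1
        if ins.dest is not None:
            written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    program = [
        Instr("add1", "integer", "r1", ("r2", "r3")),
        Instr("add2", "integer", "r4", ("r1", "r5")),  # reads r1, so it starts a new group
        Instr("ld", "other", "r6", ("r7",)),
        Instr("br", "branch"),
    ]
    print([[i.name for i in g] for g in form_groups(program)])
```

Because a group is closed at the first instruction that cannot join it, instructions never overtake one another, which matches the in-order execution requirement.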

  21. Cache Organization • Doubled cache size because of dual core. • Data Cache : 64 KB x 2 • Instruction Cache : 32 KB x 2 • L2 Cache : 16 MB, off-chip, shared • No L3 Cache

  22. Cache Organization

  23. Cache Organization • Data Cache • 64 KB Level 1 cache per core • Instruction Cache • 32 KB Level 1 cache per core • 4 – way associative

  24. Cache Organization • Prefetch Cache • One of the L1 caches • 2 Kbyte SRAM : 32 x 64 bytes • Uses an LRU replacement algorithm • The aim is to fetch data before it is needed • Reduces main memory access latency • 2 read ports of 8 bytes each and 1 write port of 16 bytes per cycle. • Hardware prefetch
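
The prefetch cache parameters above (32 lines of 64 bytes, LRU replacement) can be modelled in a few lines of Python. This is a rough sketch: the `PrefetchCache` class and its method names are invented for illustration, and the model treats the cache as fully associative, which the slides do not state.

```python
from collections import OrderedDict

LINE_BYTES = 64
NUM_LINES = 32  # 32 x 64 bytes = 2 Kbyte


class PrefetchCache:
    """Tag-only LRU model (assumption: fully associative)."""

    def __init__(self, lines=NUM_LINES):
        self.lines = lines
        self._tags = OrderedDict()  # line number -> None, least recently used first

    def lookup(self, addr):
        """Return True on a hit and mark the line as most recently used."""
        line = addr // LINE_BYTES
        if line in self._tags:
            self._tags.move_to_end(line)
            return True
        return False

    def fill(self, addr):
        """Install the line containing addr, evicting the LRU line if the cache is full."""
        line = addr // LINE_BYTES
        if line in self._tags:
            self._tags.move_to_end(line)
            return
        if len(self._tags) >= self.lines:
            self._tags.popitem(last=False)  # drop the least recently used line
        self._tags[line] = None


if __name__ == "__main__":
    cache = PrefetchCache()
    for n in range(NUM_LINES + 1):      # one more line than the cache can hold
        cache.fill(n * LINE_BYTES)
    print(cache.lookup(0), cache.lookup(LINE_BYTES))
```

Filling 33 distinct lines evicts the first one, so the lookup of address 0 misses while the lookup of the second line hits.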

  25. Cache Organization • Write Cache • Reduces the bandwidth consumed by store traffic • 2 Kbyte cache • Handles multiprocessor and on-chip cache consistency • Improves error recovery • Optionally uses a hashed index

  26. Cache Organization • L2 Cache • 16 MB SRAM shared by two processors • Separate L2 cache tags • Two-way set associative • LRU replacement policy • 128-byte line size • UltraSparc IV+ has an on-die Level 2 cache with an off-die Level 3 cache
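
The L2 figures above fix the cache geometry by simple arithmetic: 16 MB / (2 ways x 128 bytes) = 65,536 sets, so an address splits into a 7-bit line offset, a 16-bit set index and a tag. The Python sketch below works this through, assuming the whole 16 MB is indexed as one array with plain bit-slice indexing; the `split_address` helper is a name introduced here, and the real hardware's indexing may differ.

```python
CACHE_BYTES = 16 * 1024 * 1024   # 16 MB
WAYS = 2                         # two-way set associative
LINE_BYTES = 128                 # 128-byte lines

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2 of the line size
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2 of the number of sets


def split_address(addr):
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print(NUM_SETS, OFFSET_BITS, INDEX_BITS)
    print(split_address(0x12345678))
```

Two addresses that differ only in their tag bits map to the same set and compete for its two ways, which is where the LRU policy decides which line to evict.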

  27. Branch Prediction • Branch Predictor : • Small SRAM accessed in a single cycle • Output is connected to the P-stage • Branch determination is made in the B-stage • On a misprediction, fetch returns to the A-stage.
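
The slides do not say which prediction scheme the SRAM table implements. As a generic illustration only, the Python sketch below uses a textbook table of 2-bit saturating counters indexed by the branch address; the `TwoBitPredictor` class, the table size of 1024 entries and the `count_mispredictions` helper are all assumptions made here, not details of the UltraSparc IV design.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counters (assumed scheme, not taken from the slides)."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = [1] * entries  # 0 and 1 predict not taken; 2 and 3 predict taken

    def _index(self, pc):
        return (pc >> 2) % self.entries  # SPARC instructions are 4 bytes wide

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, trace):
    """Count mispredictions; per the slide, each one sends fetch back to the A-stage."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A loop branch at a hypothetical address: taken nine times, then falls through.
    trace = [(0x100, True)] * 9 + [(0x100, False)]
    print(count_mispredictions(TwoBitPredictor(), trace))
```

With counters that start in the weakly not-taken state, this loop branch is mispredicted on its first iteration and again on loop exit.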

  28. Conclusion • UltraSparc IV is a milestone as the first dual-core chip of the UltraSparc family • Sun continues to develop UltraSparc : • UltraSparc IV+ • UltraSparc T1

  29. References • UltraSparc IV User’s Manual, Sun Microsystems • UltraSparc IV Whitepaper, Sun Microsystems • UltraSparc IV Mirrors Predecessor, Kevin Krewell • Implementation and Productization of a 4th Generation 1.8GHz Dual-Core SPARC V9 Microprocessor, Anand Dixit, Jason Hart, ... • UltraSparc III User’s Manual, Sun Microsystems

  30. References • Web Sites : • http://web.cs.unlv.edu/cs219/group3/index.html • http://bwrc.eecs.berkeley.edu/CIC/archive/cpu_history.html#SPARC • http://www.arcade-eu.org/overview/2005/sparcIV.html • http://www.top500.org/orsc/2006/sparcIV.htm • http://www.sparc.org/history.html

  31. Questions...
