1 / 22

UIUC - CS 433 IBM Power7

Adam Kunk Anil John Pete Bohman. UIUC - CS 433 IBM Power7. Quick Facts. Released by IBM in 2010 (~ February) Successor of the Power6 Clock Rate: 2.4 GHz - 4.25 GHz Feature size: 45 nm ISA: Power ISA v 2.06 Cores: 4, 6, 8 Cache: L1, L2, L3. Why the Power7?.

read
Download Presentation

UIUC - CS 433 IBM Power7

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Adam Kunk Anil John Pete Bohman UIUC - CS 433 IBM Power7

  2. Quick Facts • Released by IBM in 2010 (~ February) • Successor of the Power6 • Clock Rate: 2.4 GHz - 4.25 GHz • Feature size: 45 nm • ISA: Power ISA v 2.06 • Cores: 4, 6, 8 • Cache: L1, L2, L3 References: [1]

  3. Why the Power7? • PERCS – Productive, Easy-to-use, Reliable Computer System • DARPA funded contract that IBM won in order to develop the Power7 ($244 million contract, 2006) • Contract was to develop a petascale supercomputer architecture before 2011 in the HPCS (High Performance Computing Systems) project. • IBM, Cray, and Sun Microsystems received HPCS grant for Phase II. • IBM was chosen for Phase III in 2006. References: [1], [2]

  4. Blue Waters • Side note: • The Blue Waters system was meant to be the first supercomputer using PERCS technology. • But, the contract was cancelled (cost and complexity).

  5. History of Power POWER8 POWER7/7+ POWER6/6+ MostPOWERful & Scalable Processor in Industry POWER5/5+ FastestProcessor In Industry POWER4/4+ Hardware Virtualization for Unix & Linux • 4,6,8 Core • 32MB On-Chip eDRAM • Power Optimized Cores • Mem Subsystem ++ • 4 Thread SMT++ • Reliability + • VSM & VSX • Protection Keys+ • 45nm, 32nm • Dual Core • High Frequencies • Virtualization + • Memory Subsystem + • Altivec • Instruction Retry • Dyn Energy Mgmt • 2 Thread SMT + • Protection Keys • 65nm First Dual Corein Industry • Dual Core & Quad Core Md • Enhanced Scaling • 2 Thread SMT • Distributed Switch + • Core Parallelism + • FP Performance + • Memory bandwidth + • 130nm, 90nm • Dual Core • Chip Multi Processing • Distributed Switch • Shared L2 • Dynamic LPARs (32) • 180nm, 2001 2004 2007 2010 Future References: [3]

  6. Core Core Core Core S M P F A B R I C P O W E R L2 L2 L2 L2 G X B U S L2 L2 L2 L2 Memory Interface Core Core Core Core Memory++ Power7 Layout Cores: • 8 Intelligent Cores / chip (socket) • 4 and 6 Intelligent Cores available on some models • 12 execution units per core • Out of order execution • 4 Way SMT per core • 32 threads per chip • L1 – 32 KB I Cache / 32 KB D Cache per core • L2 – 256 KB per core Chip: • 32MB Intelligent L3 Cache on chip L3 Cache eDRAM References: [3]

  7. Power7 Options (8, 6, 4 cores) References: [3]

  8. Power7 Core • Each core implements “aggressive” out-of-order (OoO) instruction execution • The processor has an Instruction Sequence Unit capable of dispatching up to six instructions per cycle to a set of queues • Up to eight instructions per cycle can be issued to the Instruction Execution units References: [4]

  9. Execution Units • The Power7 processor has a set of 12 execution units: • 2 fixed point units • 2 load store units • 4 double precision floating point units • 1 vector unit • 1 branch unit • 1 condition register unit • 1 decimal floating point unit References: [4]

  10. Pipeline

  11. Exceptions

  12. ILP

  13. SMT • Simultaneous Multithreading • SMT1: Single instruction execution thread per core • SMT2: Two instruction execution threads per core • SMT4: Four instruction execution threads per core • This means that an 8-core Power7 can execute 32 threads simultaneously

  14. S80 HW Multi-thread Single thread Out of Order FX0 FX0 FX1 FX1 FP0 FP0 FP1 FP1 LS0 LS0 LS1 LS1 BRX BRX CRL CRL POWER5 2 Way SMT POWER7 4 Way SMT FX0 FX0 FX1 FX1 FP0 FP0 FP1 FP1 LS0 LS0 LS1 LS1 BRX BRX CRL CRL Thread 0 Executing Thread 1 Executing No Thread Executing Thread 2 Executing Thread 3 Executing Multithreading History References: [3]

  15. TLP

  16. Memory

  17. Memory Access • (Look at section 2.1.4 in http://www.redbooks.ibm.com/redpapers/pdfs/redp4639.pdf)

  18. Caches • (Look at section 2.1.6 in http://www.redbooks.ibm.com/redpapers/pdfs/redp4639.pdf)

  19. Performance

  20. Closing/Wrap-up

  21. References • 1. http://en.wikipedia.org/wiki/POWER7 • 2. http://en.wikipedia.org/wiki/PERCS • 3. Central PA PUG POWER7 review.ppt • http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CCEQFjAA&url=http%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fwikis%2Fdownload%2Fattachments%2F135430247%2FCentral%2BPA%2BPUG%2BPOWER7%2Breview.ppt&ei=3El3T6ejOI-40QGil-GnDQ&usg=AFQjCNFESXDZMpcC2z8y8NkjE-v3S_5t3A

  22. References (cont.) • 4. http://www.redbooks.ibm.com/redpapers/pdfs/redp4639.pdf

More Related