
CMP/CMT Scaling of SPECjbb2005 on UltraSPARC T1 (Niagara)

10th Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW-10). Dimitris Kaseridis and Lizy K. John, The University of Texas at Austin, Laboratory for Computer Architecture, http://lca.ece.utexas.edu


Presentation Transcript


  1. CMP/CMT Scaling of SPECjbb2005 on UltraSPARC T1 (Niagara)
     • 10th Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW-10)
     • Dimitris Kaseridis and Lizy K. John
     • The University of Texas at Austin, Laboratory for Computer Architecture
     • http://lca.ece.utexas.edu

  2. Outline • Brief Description of UltraSPARC T1 • Objectives • SPECjbb2005 Benchmark • Results

  3. UltraSPARC T1
     • A new multithreaded processor that combines CMP and SMT into CMT
     • 8 cores, each handling 4 hardware context threads: 32 active hardware context threads
     • Simple in-order pipeline per core, with no branch prediction unit
     • Optimized for multithreaded performance → throughput
     • High throughput → hide memory and pipeline stalls/latencies by scheduling other threads, with a zero-cycle thread switch penalty
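
The latency-hiding idea on this slide can be illustrated with a simple utilization model. This is a sketch that is not taken from the presentation: it assumes every thread alternates between a fixed number of busy cycles and a fixed number of stall cycles, and that the core switches threads at zero cost. The function name and the sample cycle counts are hypothetical.

```python
def core_utilization(threads: int, compute: int, stall: int) -> float:
    """Fraction of cycles the pipeline is busy under an idealized model.

    Assumption (not from the slides): each thread runs for `compute`
    cycles, then stalls for `stall` cycles, and thread switches are free.
    """
    if threads < 1 or compute <= 0 or stall < 0:
        raise ValueError("threads >= 1, compute > 0 and stall >= 0 are required")
    # Each thread keeps the pipeline busy compute / (compute + stall) of the
    # time; the core cannot be more than 100% busy.
    return min(1.0, threads * compute / (compute + stall))


if __name__ == "__main__":
    # Hypothetical workload: 10 busy cycles followed by a 30-cycle stall.
    for n in (1, 2, 4):
        print(n, core_utilization(n, compute=10, stall=30))
```

Under this model the benefit of extra threads is linear until the pipeline saturates. Real workloads share caches and execution units, so measured gains fall below this bound.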

  4. SMP vs. CMT

  5. UltraSPARC T1 Core Pipeline
     • Thread group shares L1 cache, TLBs, execution units, pipeline registers and datapath
     • Core area = 11 mm² (90 nm technology)
     • 4-way MT adds ~20% area to the core

  6. Objectives
     • Evaluate CMP/CMT benefits
     • Quantify the benefits of additional cores and/or additional hardware threads in a multithreaded environment
     • Show the effectiveness of latency hiding

  7. SPECjbb2005 Benchmark
     Characteristics:
     • Models a self-contained 3-tier system: server, database and clients
     • Every warehouse is a collection of Java objects with ~25 MB of data
     • Each client is represented by an individual thread
     • No I/O effects
     • Reported score: business operations per second (BOPS)
     • Targets the performance of CPUs, caches, the memory hierarchy and the scalability of shared-memory processors
     • Stresses the implementations of the JVM (Java Virtual Machine), the JIT (Just-In-Time) compiler, garbage collection and threads
     [Figure: SPECjbb2005 3-tier architecture]

  8. Experimental parameters
     [Table: Parameters]

  9. Measurements Methodology
     • On-chip performance counters for real/accurate results
     • Niagara: Solaris 10 tools: cpustat, cputrack
     • 2 counters per hardware thread, one of them reserved for instruction count
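
As an illustration of how per-thread instruction counts can be turned into an IPC figure, here is a minimal sketch. It assumes that cycles are derived from the elapsed sampling time and the clock frequency; the slides do not say how cycles were obtained. The function names, the 1 GHz clock and the sample counts are hypothetical, and no cpustat output format is assumed.

```python
from typing import Iterable


def ipc(instructions: int, elapsed_s: float, clock_hz: float) -> float:
    """Instructions per cycle, with cycles estimated as time * frequency."""
    cycles = elapsed_s * clock_hz
    if cycles <= 0:
        raise ValueError("elapsed_s and clock_hz must be positive")
    return instructions / cycles


def core_ipc(thread_counts: Iterable[int], elapsed_s: float, clock_hz: float) -> float:
    """Aggregate IPC of one core: sum the counts of its hardware threads."""
    return ipc(sum(thread_counts), elapsed_s, clock_hz)


if __name__ == "__main__":
    # Hypothetical sample: four hardware threads, 1-second interval, 1 GHz clock.
    counts = [150_000_000, 150_000_000, 150_000_000, 150_000_000]
    print(core_ipc(counts, elapsed_s=1.0, clock_hz=1e9))
```

Summing the threads of one core is a design choice: on a single-issue core the aggregate IPC cannot exceed 1, which makes it a direct measure of how well stalls are being hidden.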

  10. Results – Latency hiding payoff
     [Figure, two panels: "Single thread execution on T1" and "Single core execution using 4 threads on one core". Axes: SPECjbb score (BOPS) vs. number of warehouses. Annotation: 2x instead of 4x.]

  11. CMP / CMT Scaling – CMP benefits
     [Figure: 8 cores x 1 thread/core. Axes: SPECjbb score (BOPS) vs. number of warehouses.]

  12. CMP / CMT Scaling – CMT benefits
     [Figure: 8 cores x 2 threads/core. Axes: SPECjbb score (BOPS) vs. number of warehouses.]
     • 75% of the benefit of adding a single core
     • Significantly lower area and power requirements (recall that 4-way MT adds ~20% area to each core)
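
The trade-off on this slide can be made concrete with a rough benefit-per-area calculation. This is a sketch under stated assumptions, not the authors' analysis: the ~20% figure is the slide's area cost for 4-way MT and is used here as an upper bound for the second thread, a full extra core is taken as one unit of benefit and one unit of area, and shared resources outside the core are ignored. The function name is hypothetical.

```python
def benefit_per_area(relative_benefit: float, relative_area: float) -> float:
    """Throughput benefit obtained per unit of added core area."""
    if relative_area <= 0:
        raise ValueError("relative_area must be positive")
    return relative_benefit / relative_area


# Baseline: one extra core gives 1.0 unit of benefit for 1.0 unit of area.
extra_core = benefit_per_area(1.0, 1.0)

# Second hardware thread: 75% of that benefit (from the slide) for at most
# ~20% of a core's area (assumption: the 4-way MT figure as an upper bound).
second_thread = benefit_per_area(0.75, 0.20)

if __name__ == "__main__":
    print(extra_core, second_thread)
```

With these inputs the second thread delivers several times more benefit per unit of area than an extra core, which is the point the slide makes qualitatively.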

  13. CMP / CMT Scaling – SMT benefits
     [Figure: 8 cores x 4 threads/core. Axes: SPECjbb score (BOPS) vs. number of warehouses.]

  14. CMP / CMT Scaling – SMT benefits
     [Figure. Axes: SPECjbb score (BOPS) vs. number of warehouses.]
     • Hardware threads beyond 2 per core give an additional benefit of 45%
     • Gradually diminishing returns in terms of SMT efficiency
     • The garbage collector significantly affects regions 4 and 5

  15. SPECjbb Score Scaling
     [Figure, two panels: "IPC of three configurations" and "Best case SPECjbb score speedup". Axis labels: IPC, normalized SPECjbb score, number of virtual processors.]
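
A normalized score of the kind plotted on this slide can be computed as below. This is a minimal sketch: the function names are hypothetical and the sample scores are invented for illustration only; they are not the measured results from the presentation.

```python
from typing import Dict


def normalized_scores(scores: Dict[int, float], baseline: int = 1) -> Dict[int, float]:
    """Speedup of each configuration relative to the baseline configuration.

    `scores` maps the number of virtual processors to the best BOPS score.
    """
    base = scores[baseline]
    if base <= 0:
        raise ValueError("baseline score must be positive")
    return {n: s / base for n, s in scores.items()}


def scaling_efficiency(scores: Dict[int, float], baseline: int = 1) -> Dict[int, float]:
    """Speedup divided by the ideal (linear) speedup for each configuration."""
    speedups = normalized_scores(scores, baseline)
    return {n: sp * baseline / n for n, sp in speedups.items()}


if __name__ == "__main__":
    # Invented scores, for illustration only.
    sample = {1: 1000.0, 8: 7000.0, 32: 20000.0}
    print(normalized_scores(sample))
    print(scaling_efficiency(sample))
```

Reporting efficiency alongside speedup makes diminishing returns visible: a configuration can have the highest score and still use each added virtual processor less effectively.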

  16. Conclusions
     • Throughput vs. latency in multiprocessing/multithreaded environments
     • Latency hiding is a promising alternative to aggressive speculation
     • Adding SMT can give up to 75% of the benefit of CMP at significantly lower cost
     • Moving to higher levels of SMT shows diminishing returns → trade-offs between the number of cores and the number of hardware threads per core

  17. Thank you… Questions? • The Laboratory for Computer Architecture website: http://lca.ece.utexas.edu
