
Evaluating a $2M Commercial Server on a $2K PC and Related Challenges

This article discusses the challenges of evaluating a $2M commercial server using a $2K PC. It covers how to scale and tune workloads, manage simulation complexity, and cope with workload variability. It also explores the NSF challenges in computer architecture evaluation. The article includes insights from the Wisconsin Multifacet Project.


Presentation Transcript


  1. Evaluating a $2M Commercial Server on a $2K PC and Related Challenges Mark D. Hill Multifacet Project (www.cs.wisc.edu/multifacet) Computer Sciences Department University of Wisconsin-Madison February 2003

  2. Context & Summary • Commercial Servers • Processors, memory, disks ≈ $2M • Run large multithreaded transaction-oriented workloads • Use commercial applications on commercial OS • To Simulate on $2K PC • Scale & tune workloads (keep L2 miss rates, etc.) • Manage simulation complexity (separate timing & function) • Cope with workload variability (use randomness & statistics) • NSF Challenges in Computer Architecture Evaluation • Advice that researchers, program committees, & funders basically "know" but often forget to heed

  3. Multifacet: Commercial Server Design • Wisconsin Multifacet Project • Directed by Mark D. Hill & David A. Wood • Sponsors: NSF, WI, IBM, Intel, & Sun • Current Contributors: Alaa Alameldeen, Brad Beckmann, Milo Martin, Mike Marty, Kevin Moore, & Min Xu • Commercial Server Availability • SafetyNet tolerates some transient faults [ISCA 2002] • Commercial Server Software Complexity • Flight Data Recorder aids debugging of multithreaded programs [ISCA 2003] • Commercial Server Design Complexity • Token Coherence eases coherence protocol design [IEEE Micro Top Picks, Nov-Dec 2003]

  4. Outline • Workload & Simulation Methods • Select, scale, & tune workloads • Transition workload to simulator • Specify & test the proposed design • Evaluate design with simple/detailed processor models • Separate Timing & Functional Simulation • Cope with Workload Variability • NSF Challenges in Computer Architecture Evaluation

  5. Multifacet Simulation Overview [Diagram: workload development takes Full Workloads on a Commercial Server (Sun Fire V880) to Scaled Workloads; protocol development uses the Memory Protocol Generator (SLICC) and Pseudo-Random Protocol Checker; timing simulation combines the Full System Functional Simulator (Simics), Memory Timing Simulator (Ruby), and Processor Timing Simulator (Opal)] • Virtutech Simics (www.virtutech.com) • Rest is Multifacet software

  6. Select Important Workloads Full Workloads • Online Transaction Processing: DB2 w/ TPC-C-like • Java Server Workload: SPECjbb • Static web content serving: Apache • Dynamic web content serving: Slashcode • Java-based Middleware

  7. Setup & Tune Workloads (on real hardware) Full Workloads Commercial Server (Sun Fire V880) • Tune workload, OS parameters • Measure transaction rate, speed-up, miss rates, I/O • Compare to published results

  8. Scale & Re-tune Workloads Commercial Server (Sun Fire V880) Scaled Workloads • Scale down for PC memory limits • Retain similar behavior (e.g., L2 cache miss rate) • Re-tune to achieve higher transaction rates (OLTP: raw disk, multiple disks, more users, etc.)
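
The scaling step above can be sanity-checked mechanically: the scaled workload must preserve key memory behavior even though raw throughput drops. A minimal sketch, where the metric names, values, and 10% tolerance are illustrative assumptions rather than the project's actual criteria:

```python
# Hypothetical check that a scaled-down workload preserves key memory
# behavior (e.g., L2 miss rate) of the full workload, even though the
# raw transaction rate drops with fewer users and less memory.

def similar(full, scaled, rel_tol=0.10):
    """True if the scaled metric is within rel_tol of the full metric."""
    return abs(scaled - full) <= rel_tol * abs(full)

# Illustrative numbers: measured on real hardware (full) vs. scaled setup.
full_metrics = {"l2_miss_rate": 0.041, "txn_per_sec": 950.0}
scaled_metrics = {"l2_miss_rate": 0.043, "txn_per_sec": 310.0}

# Miss rate must match; transaction rate is allowed to differ.
assert similar(full_metrics["l2_miss_rate"], scaled_metrics["l2_miss_rate"])
```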

  9. Transition Workloads to Simulation Scaled Workloads • Create disk dumps of tuned workloads • In simulator: boot OS, start & warm application • Create Simics checkpoint (snapshot) Full System Functional Simulator (Simics)

  10. Specify Proposed Computer Design • Coherence Protocol (control tables: states × events) • Cache Hierarchy (parameters & queues) • Interconnect (switches & queues) • Processor (later) Memory Protocol Generator (SLICC) Memory Timing Simulator (Ruby)

  11. Test Proposed Computer Design • Randomly select write action & later read check • Massive false sharing to force interactions • Perverse network stresses design • Transient error & deadlock detection • Sound but not complete Pseudo-Random Protocol Checker Memory Timing Simulator (Ruby)
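
The write-then-read-check idea above can be sketched as follows. This is a hedged illustration, not the Multifacet checker: all names are hypothetical, and a plain dict stands in for the simulated memory system, whereas the real checker drives these actions through the Ruby protocol model under a perverse network.

```python
import random

# Hypothetical sketch of a pseudo-random coherence check: randomly write
# distinct values to shared addresses, then later read them back and
# verify that the last write is observed. A coherent dict never fails
# this; a buggy protocol model standing in for `memory` could.

def random_write_read_check(memory, addresses, steps, seed=0):
    rng = random.Random(seed)
    expected = {}  # last value written to each address
    for step in range(steps):
        addr = rng.choice(addresses)
        if addr in expected and rng.random() < 0.5:
            # Read check: the memory system must return the last write.
            assert memory[addr] == expected[addr], f"coherence bug at {addr}"
        else:
            value = (step << 8) | addr  # unique value per (step, addr)
            memory[addr] = value        # write action
            expected[addr] = value
    return True

# A few addresses hammered by many actions maximizes false sharing.
assert random_write_read_check({}, addresses=[0, 1, 2, 3], steps=1000)
```

The check is sound but not complete in the slide's sense: any assertion failure is a real bug, but passing does not prove the protocol correct.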

  12. Simulate with Simple Blocking Processor Scaled Workloads • Warm up caches (or other state when needed, e.g., SafetyNet) • Run for fixed number of transactions • Some transactions partially done at start • Other transactions partially done at end • Cope with workload variability (later) Full System Functional Simulator (Simics) Memory Timing Simulator (Ruby)

  13. Simulate with Detailed Processor Scaled Workloads • Accurate (future) timing & (current) function • Simulation complexity decoupled (discussed soon) • Same transaction methodology & workload variability issues Full System Functional Simulator (Simics) Memory Timing Simulator (Ruby) Processor Timing Simulator (Opal)

  14. Simulation Infrastructure & Workload Process Full Workloads Commercial Server (Sun Fire V880) Scaled Workloads • Select important workloads: run, tune, scale, & re-tune • Specify system & pseudo-randomly test • Create warm workload checkpoint • Simulate with simple or detailed processor • Fixed #transactions, manage simulation complexity (next), cope with workload variability (next next) Memory Protocol Generator (SLICC) Full System Functional Simulator (Simics) Pseudo-Random Protocol Checker Memory Timing Simulator (Ruby) Processor Timing Simulator (Opal)

  15. Outline • Workload & Simulation Methods • Separate Timing & Functional Simulation • Simulation Challenges & Complexity • Timing-First Simulation • Cope with Workload Variability • NSF Challenges in Computer Architecture Evaluation

  16. Simulating Function Getting Harder! [Diagram: the simulated target system spans the target applications (web server, database, kernels, SPEC benchmarks), the operating system, and the full hardware complement: processor with MMU and status registers, PCI bus, I/O MMU, DMA & IRQ controllers, real-time clock, serial port, terminal, RAM, graphics card, Ethernet controller, SCSI & Fibre Channel controllers, CD-ROM, and SCSI disks]

  17. Simulating Timing Getting Harder! • Micro-architecture complexity • Multiple “in-flight” instructions • Speculative execution • Out-of-order execution • Thread-level parallelism • Hardware Multi-threading • Traditional Multi-processing

  18. Managing Simulator Complexity • Integrated (SimOS): one simulator provides complete timing & complete function (− complex) • Functional-First (trace-driven): functional simulator (no timing, complete function) feeds a timing simulator (− no timing feedback) • Timing-Directed: timing simulator directs a functional simulator (+ timing feedback, − tight coupling, − performance?) • Timing-First (Multifacet): timing simulator (complete timing, partial function) checked by a functional simulator (no timing, complete function)

  19. Timing-First Operation [Diagram: Timing Simulator pipeline (execute, cache, commit, verify, reload) runs against a Functional Simulator modeling the full system (CPU, RAM, network)] • Timing Simulator runs speculatively ahead • On commit, calls Functional Simulator to verify • Reload Timing Simulator state if necessary, e.g., interrupt, unimplemented instruction
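
The commit/verify/reload loop above can be sketched as a tiny simulation. This is an assumed illustration: `TimingSim` and `FunctionalSim` are hypothetical stand-ins for Opal and Simics, and instruction streams are modeled as lists of retired results.

```python
# Hypothetical sketch of timing-first operation: the timing model retires
# an instruction speculatively, the functional model (the architectural
# oracle) executes the same instruction, and a mismatch (e.g., interrupt,
# unimplemented instruction) reloads the timing model from the oracle.
from dataclasses import dataclass

@dataclass
class Retired:
    pc: int
    result: int

class FunctionalSim:
    """Oracle: always architecturally correct (stands in for Simics)."""
    def __init__(self, trace):
        self.trace, self.i = trace, 0
    def step(self):
        r = self.trace[self.i]
        self.i += 1
        return r

class TimingSim:
    """Detailed model: may diverge (stands in for Opal)."""
    def __init__(self, trace):
        self.trace, self.i = list(trace), 0
    def commit_next(self):
        r = self.trace[self.i]
        self.i += 1
        return r
    def reload_state_from(self, functional):
        # Resynchronize: copy architectural state from the oracle.
        self.trace, self.i = functional.trace, functional.i

def timing_first_step(timing, functional):
    inst = timing.commit_next()   # timing simulator retires speculatively
    oracle = functional.step()    # functional simulator verifies on commit
    if inst.result != oracle.result:
        timing.reload_state_from(functional)  # reload on divergence
        return "reloaded"
    return "verified"
```

Usage: with a functional trace `[Retired(0, 1), Retired(4, 2), Retired(8, 3)]` and a timing trace that diverges on the second instruction, three calls to `timing_first_step` verify, reload, then verify again, matching the low-overhead checking the next slide quantifies.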

  20. Timing-First Discussion [Diagram: Timing Simulator (complete timing, partial function) paired with Functional Simulator (no timing, complete function)] • Supports speculative multi-processor timing models • Leverages existing simulators • Rapid development time (e.g., immediate checks) • Has low simulation overhead (18% uniprocessor) • Introduces relatively little performance error (< 3%) • BUT duplicates some code & function

  21. Outline • Workload & Simulation Methods • Separate Timing & Functional Simulation • Cope with Workload Variability • Variability in Multithreaded Workloads • Coping in Simulation • NSF Challenges in Computer Architecture Evaluation

  22. What is Happening Here? [Graph: OLTP simulation results]

  23. What is Happening Here? • How can slower memory lead to faster workload? • Answer: Multithreaded workload takes different path • Different lock race outcomes • Different scheduling decisions • (1) Does this happen for real hardware? • (2) If so, what should we do about it?

  24. One Second Intervals (on real hardware) [Graph: OLTP results per one-second interval]

  25. 60 Second Intervals (on real hardware) [Graph: OLTP results per 60-second interval; gathering these in simulation would require a 16-day run]

  26. Coping with Workload Variability • Running (simulating) long enough not appealing • Need to separate coincidental & real effects • Standard statistics on real hardware • Variation within base system runs vs. variation between base & enhanced system runs • But deterministic simulation has no “within” variation • Solution with deterministic simulation • Add pseudo-random delay on L2 misses • Simulate base (enhanced) system many times • Use simple or complex statistics
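
The solution above can be sketched as follows, under loud assumptions: `simulate_once` is a hypothetical stand-in for one full deterministic simulation, and the perturbation models the pseudo-random L2-miss delays only abstractly, as noise added to the run result.

```python
import random
import statistics

# Hypothetical sketch: run the deterministic simulator many times, each
# with a different pseudo-random perturbation (standing in for small
# added delays on L2 misses), then compare populations statistically
# instead of comparing single runs.

def simulate_once(base_cycles, seed):
    rng = random.Random(seed)
    # Each seed yields a different (but equally valid) execution path;
    # the perturbation abstracts the resulting timing variation.
    perturbation = sum(rng.randint(0, 4) for _ in range(1000))
    return base_cycles + perturbation

def run_population(base_cycles, n_runs):
    return [simulate_once(base_cycles, seed) for seed in range(n_runs)]

base = run_population(1_000_000, n_runs=20)
enhanced = run_population(990_000, n_runs=20)  # hypothetical faster design

# Compare means against within-population spread, not single runs.
print(statistics.mean(base) > statistics.mean(enhanced))  # True
```

Many runs of the base system establish the "within" variation that deterministic simulation otherwise lacks; the enhancement is credited only if its effect exceeds that spread.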

  27. Confidence Interval Example [Graph: confidence intervals across runs, ROB configurations] • Estimate #runs to get non-overlapping confidence intervals
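
The non-overlap test above can be sketched with simple statistics. The sample data are invented for illustration, and the 1.96 z-value assumes enough runs for a normal approximation (a real study might use a t-distribution for few runs).

```python
import statistics

# Hypothetical sketch: compute ~95% confidence intervals for the mean
# result of a base and an enhanced system; if the intervals overlap,
# more runs are needed before claiming a real difference.

def ci95(samples):
    mean = statistics.mean(samples)
    half = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
    return mean - half, mean + half

def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

# Invented per-run cycle counts (millions) from repeated perturbed runs.
base = [104, 101, 99, 103, 100, 102, 98, 105]
enhanced = [93, 95, 91, 94, 92, 96, 90, 94]

# Non-overlapping intervals: the improvement is distinguishable from noise.
print(intervals_overlap(ci95(base), ci95(enhanced)))  # False
```

In practice one starts with a few runs and adds more until the intervals separate (or accepts that the designs are statistically indistinguishable).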

  28. Outline • Workload & Simulation Methods • Separate Timing & Functional Simulation • Cope with Workload Variability • NSF Challenges in Computer Architecture Evaluation • Advice that researchers, program committees, & funders basically "know" but often forget to heed

  29. NSF Challenges in Computer Architecture Evaluation • Dec 2001 NSF Computer Systems Architecture Workshop • Report in IEEE Computer, Aug 2003 • By Kevin Skadron, Margaret Martonosi, David August, Mark Hill, David Lilja, & Vijay Pai • Simulation Frameworks • P (Problem): Need more modularity, portability, & reuse • R (Recommendation): More simulation frameworks, e.g., ASIM & Liberty • Benchmarking • P: Benchmarks for too few domains • R: Reward benchmark development & characterization; consider micro- and synthetic benchmarks

  30. NSF Challenges in Computer Architecture Evaluation • Abstractions & Methodology • P: Believe simulation too much & other methods too little • 1985 ISCA: 30% simulation & 30% modeling • 2001 ISCA: 90% simulation & 0% modeling • R: Push analytic models for insight, cross-validation, & far-reaching research • Metrics, Accuracy, & Validation • P: Too dependent on relative & aggregate metrics • R: More metrics & statistical methods, especially when balancing multiple dimensions (e.g., performance & power)

  31. Talk Summary • Simulations of $2M Commercial Servers must • Complete in reasonable time (on $2K PCs) • Handle OS, devices, & multithreaded hardware • Cope with variability of multithreaded software • Multifacet • Scale & tune transactional workloads • Separate timing & functional simulation • Cope w/ workload variability via randomness & statistics • References (www.cs.wisc.edu/multifacet/papers) • Simulating a $2M Commercial Server on a $2K PC [Computer 2/03] • Full-System Timing-First Simulation [Sigmetrics 02] • Variability in Architectural Simulations … [HPCA 03] • NSF Panel • Challenges in Computer Architecture Evaluation [Computer 8/03]

  32. Backup Slides

  33. Other Multifacet Methods Work • Specifying & Verifying Coherence Protocols • [SPAA98], [HPCA99], [SPAA99], & [TPDS02] • Workload Analysis & Improvement • Database systems [VLDB99] & [VLDB01] • Pointer-based [PLDI99] & [Computer00] • Middleware [HPCA03] • Modeling & Simulation • Commercial workloads [Computer02] & [HPCA03] • Decoupling timing/functional simulation [Sigmetrics02] • Simulation generation [PLDI01] • Analytic modeling [Sigmetrics00] & [TPDS TBA] • Micro-architectural slack [ISCA02] • Interaction costs [Micro02]
