
Symbiotic Jobscheduling for a Simultaneous Multithreading Processor

Authors: Allan Snavely and Dean Tullsen. Presenter: Alexandra Fedorova, Simon Fraser University.


Presentation Transcript


  1. Symbiotic Jobscheduling for a Simultaneous Multithreading Processor Authors: Allan Snavely and Dean Tullsen Presenter: Alexandra Fedorova Simon Fraser University

  2. Super-scalar Processor • A super-scalar processor has multiple issue slots • A “slot” is a place where one instruction can be issued (scheduled) in a cycle • Many issue slots → many instructions issued per cycle • This is possible because the processor has many functional units [Figure: issue slots over time]

  3. Problem: Under-Utilization • A single thread is not always able to fill all the slots • Slots are left unused, so we waste energy! • One solution is speculative, out-of-order execution, but it is difficult to implement and has limitations [Figure: issue slots over time]

  4. Simultaneous Multithreading • An idea: fill unused slots with instructions from multiple threads • More instructions to choose from means more opportunity to fill the issue slots! [Figure: issue slots over time]
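A toy model can make slides 2 to 4 concrete. The sketch below is not from the presentation: the issue width, the per-cycle counts of ready instructions, and the `utilization` helper are all hypothetical, and the model ignores dependencies and functional-unit types.

```python
def utilization(ready_per_cycle, width):
    """Fraction of issue slots filled, given how many instructions are ready each cycle."""
    # At most `width` instructions can issue in one cycle; the rest must wait.
    issued = sum(min(ready, width) for ready in ready_per_cycle)
    return issued / (width * len(ready_per_cycle))

WIDTH = 4  # hypothetical number of issue slots

# Hypothetical counts of ready instructions per cycle for two threads.
thread_a = [1, 0, 3, 2]
thread_b = [2, 1, 0, 3]

# Under SMT, both threads offer instructions in the same cycle.
smt = [a + b for a, b in zip(thread_a, thread_b)]

print(utilization(thread_a, WIDTH))  # thread A alone
print(utilization(thread_b, WIDTH))  # thread B alone
print(utilization(smt, WIDTH))       # both threads sharing the slots
```

In this made-up trace each thread alone fills 6 of 16 slots, while the pair fills 11 of 16. In the last cycle five instructions are ready for four slots, which is a small instance of the contention raised on the next slide.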

  5. Problem: Contention • Resource sharing increases utilization, but also leads to contention

  6. Research Question • How can we enable sharing of processor hardware without causing contention?

  7. Outline • Background and Problem Statement • Overview of the Idea • Challenge • Details of the Solution • Research Methodology • Results

  8. Idea: Symbiotic Schedules • Assumption: the OS has a queue of threads ready to execute • Some threads compete less than others • Co-schedule threads that compete less [Figure: CPU and scheduling queue]

  9. Challenge • How do we measure the degree of contention? • How do we identify co-schedules that have little contention?

  10. Measuring Contention • Background: IPC = instructions/cycle (a measure of progress) • $IPC_{SMT}$: a thread's IPC on the SMT processor; more contention → lower $IPC_{SMT}$ • $IPC_{single}$: the thread's IPC when running alone • $IPC_{SMT} / IPC_{single}$: measure of symbiosis for a given thread

  11. Weighted Speedup • Sum of symbiosis measures across all N threads: $WS = \sum_{i=1}^{N} IPC_{SMT,i} / IPC_{single,i}$

  12. Maximizing Symbiosis • How to achieve the best symbiosis online? • Proposal #1: run different thread combos; measure their Weighted Speedup; remember the combos with the best Weighted Speedup; co-schedule them in the future • Problem: WS cannot be measured online!

  13. Problem: Predicting WS • WS cannot be measured online • Only offline, in lab conditions • So we must estimate it • using metrics available online

  14. Estimating WS • Obtain hardware performance metrics (available online) • Measure WS (available offline) • Observe correlation between metrics and WS • Build a model to predict WS

  15. Estimating WS: Part I • 1. Measure $IPC_{SMT}$: run the threads together, measure instructions and cycles • 2. Measure $IPC_{single}$: run each thread in isolation, measure instructions and cycles • 3. Compute $WS = \sum (IPC_{SMT} / IPC_{single})$
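Steps 1 to 3 above can be sketched in a few lines. This is an illustration only: the `Counts` type, the `weighted_speedup` function, the thread names, and every instruction and cycle count are hypothetical, not measurements from the presentation.

```python
from dataclasses import dataclass

@dataclass
class Counts:
    """Instruction and cycle counts from one measured run (hypothetical type)."""
    instructions: int
    cycles: int

    @property
    def ipc(self) -> float:
        return self.instructions / self.cycles

def weighted_speedup(smt_runs, single_runs):
    """Sum over threads of IPC_SMT / IPC_single."""
    return sum(smt_runs[name].ipc / single_runs[name].ipc for name in smt_runs)

# Step 1: threads run together on the SMT processor (made-up counts).
smt_runs = {"A": Counts(1200, 1000), "B": Counts(500, 1000)}
# Step 2: each thread run in isolation (made-up counts).
single_runs = {"A": Counts(2000, 1000), "B": Counts(1000, 1000)}

# Step 3: combine the per-thread symbiosis ratios.
ws = weighted_speedup(smt_runs, single_runs)
print(round(ws, 3))
```

With these numbers thread A keeps 60% of its stand-alone IPC and thread B keeps 50%, so WS comes out at about 1.1. Step 2 is the reason this procedure is an offline one: it needs each thread to run alone.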

  16. Estimating WS: Part II • 4. Measure online hardware metrics: run the threads together and read hardware counters (AllConf, Dcache, FQ, FP, etc.) • 5. Correlate WS to each metric, using pairs such as (WS1, AllConf1), (WS2, AllConf2), (WS3, AllConf3), (WS4, AllConf4), ... [Figure: WS plotted against AllConf] • 6. The metric with the highest correlation is the best predictor
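Steps 5 and 6 amount to computing a correlation per metric and keeping the strongest one. The sketch below uses the Pearson coefficient, which is an assumption (the slides do not name the correlation measure), and ranks by absolute value on the further assumption that a conflict-style counter would correlate negatively with WS. The sample values and the `best_predictor` helper are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def best_predictor(ws, metrics):
    """Name of the metric whose samples correlate most strongly with WS."""
    return max(metrics, key=lambda name: abs(pearson(metrics[name], ws)))

# One entry per measured co-schedule (all values made up for illustration).
ws = [1.0, 1.2, 1.4, 1.6]
metrics = {
    "AllConf": [40, 30, 20, 10],  # falls steadily as WS rises
    "Dcache": [5, 9, 4, 8],       # little relation to WS
}

for name, samples in metrics.items():
    print(name, round(pearson(samples, ws), 3))
print(best_predictor(ws, metrics))
```

Here the made-up AllConf samples track WS almost perfectly (with a negative sign), so `best_predictor` picks AllConf over Dcache.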

  17. Result of the Model • We know which metric best predicts symbiosis (WS) • IPC • Dcache • FQ • Composite • Score • Measure Score online. If Score is high, there is high symbiosis.

  18. Scheduler • Sample: run many different co-schedules and measure hardware counters • Optimize: predict which co-schedules have high symbiosis (those with a high Score) • Schedule: select the co-schedules that are predicted to be symbiotic (with a high Score)
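The three phases above can be sketched as follows. This is a simplified illustration, not the authors' scheduler: a co-schedule is reduced to a single group of threads run together, `read_score` is a hypothetical stand-in for running the group and reading hardware counters, and `fake_score` and the job names are invented for the example.

```python
import itertools
import random

def sample(jobs, group_size, read_score, n_samples, rng):
    """Sample phase: try several co-schedules and record each one's Score."""
    candidates = list(itertools.combinations(jobs, group_size))
    tried = rng.sample(candidates, min(n_samples, len(candidates)))
    return {combo: read_score(combo) for combo in tried}

def optimize(scores):
    """Optimize phase: predict the most symbiotic co-schedule (highest Score)."""
    return max(scores, key=scores.get)

def schedule(jobs, group_size, read_score, n_samples=10, seed=0):
    """Schedule phase: return the co-schedule to run from now on."""
    rng = random.Random(seed)
    scores = sample(jobs, group_size, read_score, n_samples, rng)
    return optimize(scores)

def fake_score(combo):
    # Hypothetical counter-based Score: jobs of different kinds are assumed
    # to compete less, so a group with more distinct kinds scores higher.
    return len({job.rstrip("0123456789") for job in combo})

jobs = ["fp1", "fp2", "int1", "mem1"]
chosen = schedule(jobs, group_size=2, read_score=fake_score)
print(chosen, fake_score(chosen))
```

With this invented Score, pairing the two floating-point jobs scores 1 and every mixed pair scores 2, so the chosen co-schedule is always a mixed pair. In a real system the sampling cost matters, since each sampled co-schedule has to run for a while before its counters can be read.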

  19. Performance Results

  20. Summary • New processor motivated a new problem: resource contention • Addressed by co-scheduling symbiotic threads • Challenge: which threads are symbiotic? • Solution: heuristic based on hardware counters • On average 9% speedup
