1 / 25

Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing. Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg. Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

yan
Download Presentation

Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi,Aravindh Anantaraman, and Eric Rotenberg Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g North Carolina State University

  2. Embedded Processor Trends • Inheriting desktop high-performance features • Examples • ARM11: 8-stage pipeline, caches, dynamic br. pred. • Ubicom IP3023: 8 hardware threads • PowerPC 750: 2-way superscalar, OOO execution

  3. Real-Time Systems and Analyzability A B • Schedulability of task-set determined a priori • Requires worst-case execution times (WCET) •  statically analyzable microarchitecture • Dynamic microarchitecture features complicate real-time design A trade-off between performance and analyzability

  4. Multiple Simple Processors • Analyzable • Natural fit with real-time systems • Rigid resource partitioning • Higher cost/performance metric C A B proc 1 proc 2

  5. Simultaneous Multithreading (SMT) • Flexible resource sharing • Better cost/performance metric • Unanalyzable C A B SMT

  6. Unanalyzability of SMT • Violates single-task WCET assumption (tasks analyzed separately) • Arbitrary periods  arbitrary overlap of tasks • Dynamic interference Cannot derive WCETs  Cannot perform schedulability

  7. Real-Time Virtual Multiprocessor • Combine MP analyzability and SMT flexibility • Key idea: Interference-free multithreading • SMT performance • WCET of each task independent of task-set • RVMP substrate: two parts • Highly reconfigurable multithreaded superscalar • Space: multiple arbitrary interference-free partitions • Time: rapidly reconfigure partitions • Static schedule orchestrates partitioning

  8. Big Picture Co-design processor and real-time scheduling for analyzable high-performance

  9. RVMP Architecture • Superscalar “ways” are natural partitioning granularity • Different-sized virtual processors carved out of single superscalar

  10. Processor Architecture • Starting point • Alpha 21164: 4-way in-order superscalar • Ubicom IP3023: 8 hardware threads (4 in RVMP  4 VPs) • Simplifications for analyzability • In-order issue within VPs • Software-managed scratchpads • Static branch prediction Not limitations of RVMP!

  11. Processor Architecture HRT 1 Fetch Unit Interleaved Instruction Scratchpad Decode Slotter and Scoreboard (Issue Logic) Data Scratchpad FU0: INT Int RF PC FU1: INT/MUL/DIV 4 4 Fetch Buffer FU2: INT/AGEN RD FU3: INT/AGEN RD WR FU4: FPU FP RF Shadow Buffers Shadow Buffers

  12. Instruction Fetch PV HRT PV FV FV Issue Decode Fetch Unit Instruction Scratchpad Backend Fetch Buffer Shadow Buffers Shadow Buffers

  13. Real-Time Scheduling A B • Too complicated • Must schedule entire hyper-period • Overwhelming # of possible space/time schedules • High dedicated-storage cost for schedule C D

  14. Task A period WCET2 WCET4 WCET3 WCET1 4 3 # of ways 2 … 1 … 4 ways … 3 ways … 2 ways … 1 way Round

  15. Task B period 4 3 # of ways 2 … 1 … 4 ways … 3 ways … 2 ways … 1 way

  16. Task C period 4 3 # of ways 2 … 1 … 4 ways … 3 ways … 2 ways … 1 way

  17. Task D period 4 3 # of ways 2 … 1 … 4 ways … 3 ways … 2 ways … 1 way

  18. SUCCEEDED FAILED Cyclic schedule

  19. Interaction: Scheduling and Architecture HRT LTC FV PV CVs EOT FU0 FU1 FU2 60 0 FU3 FU4 FU0 FU1 1 FU2 40 FU3 FU4 INVALID INVALID 60 40 100 cycles

  20. Experiments • Tasks from C-lab and MiBench benchmarks • 100 task-sets • 4 tasks per task-set (also 8 tasks in paper) • grouped according to scalar utilization (U_scalar) • Two experiments • Worst-case schedulability analysis • Run-time experiments

  21. Worst-Case Schedulability Tests

  22. RVMP Configurations

  23. Run-Time Experiments

  24. Unsafe Behavior of SMT

  25. Summary • Novel contributions • Virtualize a single processor • Space: variable-size interference-free partitions • Time: rapid reconfiguration • Simple real-time scheduling approach • Analyzability of MP with flexibility of SMT • Co-design processor and real-time scheduling for analyzable high-performance

More Related