
Effect of Context Aware Scheduler on TLB



Presentation Transcript


  1. Effect of Context Aware Scheduler on TLB Satoshi Yamada PhD Candidate Kusakabe Laboratory 3rd Joint Workshop on Embedded and Ubiquitous Computing

  2. Contents • Introduction • Overhead of Context Switch • Context Aware Scheduler • Benchmark Applications and Measurement Environment • Result • Related Works • Conclusion

  3. Widely Spread Multithreading • Multithreading hides the latency of disk I/O and network access • Threads in many languages, such as Java, Perl, and Python, correspond to OS threads * More context switches happen today * The OS process scheduler has a greater influence on system performance

  4. Context Switch and Caches • The overhead of a context switch • includes the cost of loading the working set of the next process • is closely related to cache utilization • Agarwal et al., “Cache performance of operating system and multiprogramming workloads” (1988) • Mogul et al., “The effect of context switches on cache performance” (1991) [Figure: cache contents for "A only", "B only", and "Switch A and B"; when the processes alternate, the working sets of Process A and Process B overflow the cache]
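The working-set effect on this slide can be illustrated with a toy simulation. This is a minimal sketch, not the measurement setup from the talk: it assumes a fully associative LRU cache counted in abstract "lines", and the working-set sizes of processes A and B are made up for illustration.

```python
from collections import OrderedDict

def lru_misses(accesses, capacity):
    """Count misses for a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses

# Hypothetical working sets of 3 cache lines each for processes A and B.
A = ["a0", "a1", "a2"]
B = ["b0", "b1", "b2"]

print(lru_misses(A * 20, capacity=4))        # A only: 3 cold misses
print(lru_misses((A + B) * 10, capacity=4))  # switch A and B: 60, every access misses
print(lru_misses((A + B) * 10, capacity=6))  # larger cache: 6, both sets fit
```

With a capacity of 4 lines, LRU always evicts exactly the lines the other process is about to reuse, so every access after a switch misses; once the cache holds both working sets, only the cold misses remain.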

  5. Advantage of Sibling Threads • fork() creates a PROCESS: the child gets its own task_struct and a copy of the parent's mm_struct, signal_struct, files, etc. • clone() creates a THREAD: the child gets its own task_struct but shares the parent's mm_struct, signal_struct, files, etc. (sibling threads) • The OS does not have to switch memory address spaces when switching between sibling threads • We can therefore expect a reduction in context switch overhead
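The copy-versus-share distinction on this slide can be modeled with two small classes. This is a hypothetical toy model, not kernel code: `MM` and `Task` stand in for mm_struct and task_struct, only the address-space pointer is modeled, and the eager deep copy in `model_fork` simplifies Linux's copy-on-write behavior.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class MM:
    """Toy stand-in for mm_struct: maps virtual page numbers to contents."""
    pages: dict = field(default_factory=dict)

@dataclass
class Task:
    """Toy stand-in for task_struct; only the mm pointer is modeled."""
    name: str
    mm: MM

def model_fork(parent: Task, name: str) -> Task:
    # fork(): the child gets its own address space (real Linux uses copy-on-write).
    return Task(name, copy.deepcopy(parent.mm))

def model_clone_thread(parent: Task, name: str) -> Task:
    # clone() as used for threads: the child shares the parent's mm.
    return Task(name, parent.mm)

def needs_address_space_switch(prev: Task, nxt: Task) -> bool:
    # A context switch only has to change address spaces when the mm differs.
    return prev.mm is not nxt.mm

parent = Task("parent", MM())
thread = model_clone_thread(parent, "thread")
child = model_fork(parent, "child")

thread.mm.pages[0x10] = "written by thread"
print(parent.mm.pages.get(0x10))  # written by thread (shared mm)
print(child.mm.pages.get(0x10))   # None (the forked child has its own copy)
print(needs_address_space_switch(parent, thread), needs_address_space_switch(parent, child))
```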

  6. Contents • Introduction • Overhead of Context Switch and TLB • Context Aware Scheduler • Benchmark Applications and Measurement Environment • Result • Related Works • Conclusion

  7. TLB Flush in Context Switch • The TLB is a cache that stores translations from virtual addresses to physical addresses • TLB translation latency: 1 ns • TLB miss overhead: several memory accesses (a page-table walk) • On x86 processors, most TLB entries are invalidated (flushed) on every context switch because the memory address space changes * No TLB flush happens on a context switch between sibling threads
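To make the TLB argument concrete, the sketch below counts TLB misses for two schedules. It is an illustration, not a measurement: it assumes an unbounded TLB that is flushed completely whenever the address space changes, and that every time slice touches the same 8 virtual pages. `ToyTLB` and `run` are hypothetical names.

```python
class ToyTLB:
    """Unbounded toy TLB that is fully flushed whenever the address space changes.

    Simplification: real x86 TLBs are bounded and keep entries marked
    global (kernel pages) across an address-space change.
    """

    def __init__(self):
        self.entries = set()   # cached virtual page numbers
        self.current_mm = None
        self.misses = 0

    def switch_to(self, mm):
        if mm != self.current_mm:  # new address space: flush all entries
            self.entries.clear()
            self.current_mm = mm

    def translate(self, vpage):
        if vpage not in self.entries:
            self.misses += 1       # would require a page-table walk
            self.entries.add(vpage)

def run(schedule, pages_per_slice=8):
    """Run each entry of `schedule` (an address-space id) for one time slice."""
    tlb = ToyTLB()
    for mm in schedule:
        tlb.switch_to(mm)
        for vpage in range(pages_per_slice):
            tlb.translate(vpage)
    return tlb.misses

print(run(["P1"] * 10))       # 8: ten slices of sibling threads, only cold misses
print(run(["P1", "P2"] * 5))  # 80: every switch between processes flushes the TLB
```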

  8. Overhead of a Context Switch Measured by lat_ctx in LMbench
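lat_ctx measures context-switch latency by passing a token between processes connected by pipes. The Python sketch below imitates that idea with a ring of only two processes. It is not LMbench: unlike lat_ctx it neither subtracts the pipe and interpreter overhead nor pins the processes to one CPU, so the number it returns mixes several costs and is at best an upper bound. It is Unix-only because it uses `os.fork`; `pipe_round_trip_us` is a hypothetical name.

```python
import os
import time

def pipe_round_trip_us(iterations: int = 10_000) -> float:
    """Average microseconds per token hop between two processes over pipes."""
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent
    pid = os.fork()
    if pid == 0:
        # Child: echo every byte back to the parent, then exit without cleanup.
        os.close(p2c_w)
        os.close(c2p_r)
        for _ in range(iterations):
            os.read(p2c_r, 1)
            os.write(c2p_w, b"x")
        os._exit(0)
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(iterations):
        os.write(p2c_w, b"x")  # hand the token to the child
        os.read(c2p_r, 1)      # block until the child hands it back
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    os.close(p2c_w)
    os.close(c2p_r)
    # Each round trip passes the token twice (parent -> child -> parent).
    return elapsed / (2 * iterations) * 1e6

if __name__ == "__main__":
    print(f"{pipe_round_trip_us():.2f} us per hop")
```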

  9. Contents • Introduction • Overhead of Context Switch and TLB • Context Aware Scheduler • Benchmark Applications and Measurement Environment • Result • Related Works • Conclusion

  10. O(1) Scheduler in Linux • Each O(1) scheduler runqueue has • an active queue and an expired queue • a priority bitmap and an array of linked lists of threads • The O(1) scheduler • searches the priority bitmap • chooses the thread with the highest priority * Scheduling overhead is independent of the number of threads
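The runqueue layout on this slide can be sketched as follows. This is a simplified model, not kernel code: a Python integer stands in for the kernel's fixed-size bitmap and find-first-bit operation, and the chosen thread is dequeued for brevity (the kernel keeps the running thread queued until its time slice expires). Priorities follow Linux 2.6, where values 0 to 139 exist and a lower number means a higher priority.

```python
from collections import deque

MAX_PRIO = 140  # Linux 2.6 O(1): priorities 0..139, lower number = higher priority

class PrioArray:
    """A priority bitmap plus one FIFO list of threads per priority level."""

    def __init__(self):
        self.bitmap = 0
        self.queues = [deque() for _ in range(MAX_PRIO)]

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pop_highest(self):
        if self.bitmap == 0:
            return None
        # Index of the lowest set bit = best (numerically smallest) priority.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[prio]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << prio)
        return prio, task

class RunQueue:
    def __init__(self):
        self.active = PrioArray()
        self.expired = PrioArray()

    def pick_next(self):
        if self.active.bitmap == 0:
            # Every thread has used its time slice: swap arrays to start a new epoch.
            self.active, self.expired = self.expired, self.active
        return self.active.pop_highest()

rq = RunQueue()
rq.active.enqueue("A", 120)
rq.active.enqueue("B", 100)
rq.expired.enqueue("C", 110)
print(rq.pick_next())  # (100, 'B')
print(rq.pick_next())  # (120, 'A')
print(rq.pick_next())  # (110, 'C'), after the active/expired swap
```

The lookup cost depends only on the bitmap width (140 levels), not on how many threads are queued, which is the property the slide highlights.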

  11. Context Aware (CA) Scheduler • The CA scheduler creates an auxiliary runqueue for each group of threads • The CA scheduler compares Preg and Paux • Preg: the highest priority in the regular O(1) scheduler runqueue • Paux: the highest priority in the auxiliary runqueue • If Preg - Paux <= threshold, the CA scheduler chooses the thread with priority Paux from the auxiliary runqueue
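The selection rule can be written as a small function. This is a hedged sketch under two assumptions not stated on the slide: a larger number means a higher priority (with Linux's internal numbering, where smaller is better, the test would read Paux - Preg <= threshold), and the auxiliary runqueue consulted is the one for the thread group that ran last. The function name and its return values are hypothetical.

```python
from typing import Optional

def choose_queue(p_reg: int, p_aux: Optional[int], threshold: int) -> str:
    """Decide where the CA scheduler takes its next thread from.

    p_reg: highest priority in the regular O(1) runqueue.
    p_aux: highest priority in the auxiliary runqueue, or None if it is empty.
    Assumes a larger number means a higher priority, so p_reg - p_aux is
    the priority given up by staying with a sibling thread.
    """
    if p_aux is None:  # no runnable sibling thread
        return "regular"
    return "aux" if p_reg - p_aux <= threshold else "regular"

print(choose_queue(20, 19, threshold=1))   # aux
print(choose_queue(20, 15, threshold=1))   # regular
print(choose_queue(20, 15, threshold=10))  # aux
```

A larger threshold lets the scheduler give up more priority to stay within one address space, trading strict priority order for fewer TLB flushes.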

  12. Context Aware Scheduler • Linux O(1) scheduler: A B C D E (context switches between processes: 3) • CA scheduler: A C D B E (context switches between processes: 1)
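The switch counts on this slide can be reproduced by counting adjacent pairs of threads that belong to different processes. The slide does not say which threads are siblings; the grouping below (A, C, and D in one process, B and E in another) is an assumption that happens to reproduce both counts.

```python
def process_switches(order, owner):
    """Count context switches in `order` that cross a process (address-space) boundary."""
    return sum(1 for prev, nxt in zip(order, order[1:]) if owner[prev] != owner[nxt])

# Assumed grouping of threads into processes (not stated on the slide).
owner = {"A": "P1", "C": "P1", "D": "P1", "B": "P2", "E": "P2"}

print(process_switches("ABCDE", owner))  # O(1) order: 3
print(process_switches("ACDBE", owner))  # CA order: 1
```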

  13. Fairness • The O(1) scheduler maintains fairness through epochs • an epoch is one cycle of the active queue and the expired queue • The CA scheduler also follows epochs • it guarantees the same level of fairness as the O(1) scheduler
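The fairness argument can be checked with a toy simulation of one epoch. This is a sketch, not the authors' implementation: it assumes a larger number means a higher priority, that every picked thread uses its whole time slice and moves to the expired queue, and that the auxiliary runqueue holds the siblings of the thread that just ran. The priorities and groups are invented for illustration.

```python
def ca_epoch(tasks, threshold):
    """Simulate one epoch of a toy CA scheduler.

    tasks maps a thread name to (priority, address_space); a larger number
    means a higher priority here (assumption). A picked thread moves to the
    expired queue and cannot run again until the next epoch.
    """
    active = dict(tasks)
    order = []
    current_mm = None
    while active:
        # Regular O(1) choice: the highest-priority runnable thread.
        pick = max(active, key=lambda t: active[t][0])
        p_reg = active[pick][0]
        # Auxiliary choice: the best sibling of the thread that just ran.
        siblings = [t for t in active if active[t][1] == current_mm]
        if siblings:
            sibling = max(siblings, key=lambda t: active[t][0])
            if p_reg - active[sibling][0] <= threshold:
                pick = sibling
        order.append(pick)
        current_mm = active.pop(pick)[1]  # time slice expires
    return order

tasks = {"A": (5, "P1"), "B": (4, "P2"), "C": (3, "P1"), "D": (2, "P1"), "E": (1, "P2")}
for threshold in (0, 1, 2):
    print(threshold, ca_epoch(tasks, threshold))
# 0 ['A', 'B', 'C', 'D', 'E']
# 1 ['A', 'C', 'B', 'E', 'D']
# 2 ['A', 'C', 'D', 'B', 'E']
```

Whatever the threshold, each thread runs exactly once per epoch; the threshold only reorders threads within the epoch. With a threshold of 2 this toy produces the A C D B E order shown for the CA scheduler.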

  14. Contents • Introduction • Overhead of Context Switch • Context Aware Scheduler • Benchmarks and Measurement Environment • Result • Related Works • Conclusion

  15. Benchmarks • Java • Volano Benchmark (Volano) • lusearch program in the DaCapo benchmark suite (DaCapo) • C • Chat benchmark (Chat) • memory program in the SysBench benchmark suite (SysBench) [Table: Information on each benchmark application]

  16. Measurement Environment • Hardware • Java: Sun’s J2SE 5.0 • Thresholds of the CA scheduler: 1 and 10 • Perfctr to count TLB misses • GNU time command to measure overall system performance

  17. Contents • Introduction • Overhead of Context Switch • Context Aware Scheduler • Benchmarks and Measurement Environment • Result • Related Works • Conclusion

  18. Effect on TLB [Table: Number of TLB misses (millions)] • The CA scheduler significantly reduces TLB misses • A bigger threshold is more effective • priorities change frequently under dynamic priority, especially in DaCapo and Volano

  19. Effect on System Performance [Table: Results of the counters in each application (seconds)] [Table: Results of the time command (seconds)] • The CA scheduler • improves the throughput of every application • reduces the total elapsed time by 43%

  20. Contents • Introduction • Overhead of Context Switch • Context Aware Scheduler • Benchmarks and Measurement Environment • Result • Related Works • Conclusion

  21. Sujay Parekh et al., “Thread Sensitive Scheduling for SMT Processors” (2000) • Parekh’s scheduler • tries running groups of threads in parallel and samples information about • IPC • TLB misses • L2 cache misses, etc. • schedules based on the sampled information [Figure: alternating Sampling Phase and Scheduling Phase]
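The sample-then-schedule pattern on this slide can be sketched in a few lines. This is an illustrative sketch, not Parekh et al.'s algorithm: exhaustively sampling every group is an assumption made for brevity (the paper's actual sampling policy and heuristics may differ), `measure` is a hypothetical callback standing in for hardware-counter readings, and the IPC values are invented.

```python
from itertools import combinations

def sample_then_schedule(threads, width, measure):
    """Sampling phase: co-schedule every group of `width` threads once and
    record the metric returned by `measure` (e.g. IPC).
    Scheduling phase: return the group with the best sampled value."""
    samples = {group: measure(group) for group in combinations(threads, width)}
    return max(samples, key=samples.get)

# Hypothetical IPC numbers standing in for hardware-counter readings.
fake_ipc = {("T1", "T2"): 1.1, ("T1", "T3"): 1.8, ("T2", "T3"): 1.4}
print(sample_then_schedule(["T1", "T2", "T3"], 2, fake_ipc.__getitem__))  # ('T1', 'T3')
```

Unlike the CA scheduler, this approach needs a sampling phase at run time before it can make scheduling decisions.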

  22. Pranay Koka et al., “Opportunities for Cache Friendly Process” (2005) • Koka’s scheduler • traces the execution of each thread • focuses on the memory space shared between threads • schedules based on this information [Figure: alternating Tracing Phase and Scheduling Phase]
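One way to act on traced sharing information is to run threads that touch the same memory back to back. The sketch below does this with a greedy ordering; it is a hypothetical illustration of the idea on this slide, not Koka et al.'s policy, and the page traces are invented.

```python
def order_by_sharing(traces, start):
    """Greedily order threads so each one shares as many traced pages as
    possible with the thread that ran just before it.

    traces maps a thread name to the set of pages it touched while traced.
    """
    remaining = dict(traces)
    order = [start]
    pages = remaining.pop(start)
    while remaining:
        nxt = max(remaining, key=lambda t: len(remaining[t] & pages))
        order.append(nxt)
        pages = remaining.pop(nxt)
    return order

# Hypothetical traces (page numbers) for four threads.
traces = {"T1": {1, 2, 3}, "T2": {7, 8}, "T3": {2, 3, 4}, "T4": {8, 9}}
print(order_by_sharing(traces, "T1"))  # ['T1', 'T3', 'T2', 'T4']
```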

  23. Conclusion • Conclusion • The CA scheduler is effective in reducing TLB misses • The CA scheduler improves the throughput of every application • Future Work • Evaluation on other platforms • Investigation of fairness within an epoch • Comparison with the Completely Fair Scheduler (Linux 2.6.23)

  24. Widely Spread Multithreading [Figure: ThreadA runs while ThreadB waits for the disk] • Multithreading hides the latency of disk I/O and network access • Threads in many languages, such as Java, Perl, and Python, correspond to OS threads * More context switches happen today * The OS process scheduler has a greater influence on system performance

  25. Context Aware (CA) Scheduler • Our CA scheduler aggregates sibling threads • Linux O(1) scheduler: A B C D E (context switches between processes: 3) • CA scheduler: A C D B E (context switches between processes: 1)

  26. Results of Context Switch (microseconds) [Figure: working sets of Processes A, B, and C (2MB, 1MB) relative to a 2MB L2 cache]
