
CS519: Lecture 7



Presentation Transcript


  1. CS519: Lecture 7 – Uniprocessor and Multiprocessor Scheduling

  2. What and Why?
  • What is processor scheduling?
  • Why?
    • At first, to share an expensive resource – multiprogramming
    • Now, to perform concurrent tasks because the processor is so powerful
    • Future looks like past + now
      • Rent-a-computer approach – large data/processing centers use multiprogramming to maximize resource utilization
      • Systems still powerful enough for each user to run multiple concurrent tasks

  3. Assumptions
  • Pool of jobs contending for the CPU
  • CPU is a scarce resource
  • Jobs are independent and compete for resources (this assumption is not always used)
  • Scheduler mediates between jobs to optimize some performance criteria

  4. Types of Scheduling
  We’re mostly concerned with short-term scheduling

  5. What Do We Optimize?
  • System-oriented metrics:
    • Processor utilization: percentage of time the processor is busy
    • Throughput: number of processes completed per unit of time
  • User-oriented metrics:
    • Turnaround time: interval of time between submission and termination (including any waiting time). Appropriate for batch jobs
    • Response time: for interactive jobs, the time from the submission of a request until the response begins to be received
    • Deadlines: when process completion deadlines are specified, the percentage of deadlines met should be maximized
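  To make the user-oriented metrics concrete, here is a minimal Python sketch, assuming a single job with illustrative (made-up) timestamps:

    # Illustrative timestamps for one job, in seconds (values are arbitrary).
    submission = 0.0    # job is submitted / enters the ready queue
    first_run  = 4.0    # scheduler first dispatches it; the response begins here
    completion = 9.0    # job terminates

    turnaround_time = completion - submission   # includes all waiting time (batch view)
    response_time   = first_run - submission    # what an interactive user perceives

    print(turnaround_time)   # 9.0
    print(response_time)     # 4.0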

  6. Design Space
  • Two dimensions:
    • Selection function: which of the ready jobs should be run next?
    • Preemption:
      • Preemptive: currently running job may be interrupted and moved to Ready state
      • Non-preemptive: once a process is in Running state, it continues to execute until it terminates or it blocks for I/O or system service

  7. Job Behavior
  • I/O-bound jobs
    • Jobs that perform lots of I/O
    • Tend to have short CPU bursts
  • CPU-bound jobs
    • Jobs that perform very little I/O
    • Tend to have very long CPU bursts
  • Distribution tends to be hyper-exponential (see the sketch below)
    • Very large number of very short CPU bursts
    • A small number of very long CPU bursts
  [Figure: job cycling between the CPU and the disk]
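  The hyper-exponential burst distribution can be illustrated with a short simulation; the mixture weights and mean burst lengths below are invented for illustration, not measurements:

    import random

    def cpu_burst():
        # Hyper-exponential: a mixture of exponentials -- mostly very short
        # bursts, with an occasional very long one.
        if random.random() < 0.9:
            return random.expovariate(1 / 2.0)    # mean 2 ms: short, I/O-bound-style bursts
        return random.expovariate(1 / 200.0)      # mean 200 ms: rare, CPU-bound-style bursts

    bursts = [cpu_burst() for _ in range(10_000)]
    print(sum(bursts) / len(bursts))   # the mean is pulled up by the few long bursts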

  8. Scheduling Algorithms
  • FIFO: non-preemptive
  • Round-Robin: preemptive (sketched below)
  • Shortest Job Next: non-preemptive
  • Shortest Remaining Time: preemptive at arrival
  • Highest Response Ratio (turnaround/service time) Next: non-preemptive
  • Priority with feedback: preemptive at time quantum
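  As one example from the list, a minimal sketch of Round-Robin, assuming all jobs arrive at time 0; the job names, service times, and quantum are arbitrary:

    from collections import deque

    def round_robin(jobs, quantum):
        """jobs: dict name -> service time; preempt every `quantum` time units."""
        ready = deque(jobs.items())
        t = 0
        while ready:
            name, remaining = ready.popleft()
            run = min(quantum, remaining)
            t += run
            if remaining > run:
                ready.append((name, remaining - run))   # preempted: back of the queue
            else:
                print(f"{name} finishes at t={t}")

    round_robin({"A": 3, "B": 6, "C": 4}, quantum=1)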

  9. Example Job Set

  Process   Arrival Time   Service Time
  1         0              3
  2         2              6
  3         4              4
  4         6              5
  5         8              2
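  A small sketch that replays this job set under FIFO (serve in arrival order, non-preemptively) and prints each process's turnaround time; the same idea with a different selection function gives SJN or HRRN:

    # (process, arrival time, service time) taken from the table above
    jobs = [(1, 0, 3), (2, 2, 6), (3, 4, 4), (4, 6, 5), (5, 8, 2)]

    t = 0
    for pid, arrival, service in jobs:     # FIFO: serve in arrival order
        t = max(t, arrival) + service      # wait for the CPU if busy, then run to completion
        print(f"process {pid}: finishes at {t}, turnaround {t - arrival}")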

  10. Behavior of Scheduling Policies

  11. Behavior of Scheduling Policies

  12. Priority with Feedback Scheduling
  After each preemption, process moves to lower-priority queue
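  A minimal sketch of this feedback scheme, assuming three priority levels, a fixed quantum, and all jobs arriving at time 0; a job that uses up its whole quantum is demoted one level:

    from collections import deque

    def feedback(jobs, quantum=1, levels=3):
        """jobs: dict name -> service time."""
        queues = [deque() for _ in range(levels)]
        for name, service in jobs.items():
            queues[0].append((name, service))            # everyone starts at the highest priority
        t = 0
        while any(queues):
            level = next(i for i, q in enumerate(queues) if q)   # highest non-empty queue
            name, remaining = queues[level].popleft()
            run = min(quantum, remaining)
            t += run
            if remaining > run:                          # used the full quantum: demote
                queues[min(level + 1, levels - 1)].append((name, remaining - run))
            else:
                print(f"{name} done at t={t}")

    feedback({"interactive": 2, "cpu_bound": 8})   # the short job finishes early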

  13. Scheduling Algorithms
  • FIFO is simple but leads to poor average response times. Short processes are delayed by long processes that arrive before them
  • RR eliminates this problem, but favors CPU-bound jobs, which have longer CPU bursts than I/O-bound jobs
  • SJN, SRT, and HRRN alleviate the problem with FIFO, but require information on the length of each process. This information is not always available (although it can sometimes be approximated based on past history or user input)
  • Feedback is a way of alleviating the problem with FIFO without information on process length

  14. It’s a Changing World
  • Assumption about bi-modal workload no longer holds
    • Interactive continuous-media applications are sometimes processor-bound but require good response times
  • New computing model requires more flexibility
    • How to match priorities of cooperative jobs, such as client/server jobs?
    • How to balance execution between multiple threads of a single process?

  15. Lottery Scheduling
  • Randomized resource allocation mechanism
  • Resource rights are represented by lottery tickets
  • Scheduling happens in rounds of a lottery
  • In each round, the winning ticket (and therefore the winner) is chosen at random
  • Your chances of winning depend directly on the number of tickets that you have
  • P[winning] = t/T, where t = your number of tickets and T = total number of tickets
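  A minimal sketch of one lottery round, assuming each client's ticket count is known up front; the client holding the randomly drawn ticket wins the CPU for that round:

    import random

    def hold_lottery(tickets):
        """tickets: dict client -> number of tickets held. Returns this round's winner."""
        total = sum(tickets.values())
        drawn = random.randrange(total)        # draw a ticket number in [0, total)
        for client, count in tickets.items():
            if drawn < count:                  # the drawn ticket falls in this client's range
                return client
            drawn -= count

    print(hold_lottery({"A": 1, "B": 3}))      # B should win about 75% of the rounds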

  16. Lottery Scheduling
  • After n rounds, your expected number of wins is E[win] = n · P[winning]
  • The expected number of lotteries a client must wait before its first win is E[wait] = 1/P[winning]
  • Lottery scheduling implements proportional-share resource management
  • Ticket currencies allow isolation between users, processes, and threads
  • OK, so how do we actually schedule the processor using lottery scheduling?
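  A quick self-contained check of the E[wait] formula by simulation, assuming a client holding 1 of 4 tickets (so p = 1/4 and the expected wait is 4 rounds):

    import random

    p = 1 / 4                          # P[winning] for a client with 1 of 4 tickets
    waits = []
    for _ in range(10_000):
        rounds = 1
        while random.random() >= p:    # lost this round; wait for the next lottery
            rounds += 1
        waits.append(rounds)
    print(sum(waits) / len(waits))     # should come out close to 1/p = 4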

  17. Implementation

  18. Performance
  Allocated and observed execution ratios between two tasks running the Dhrystone benchmark. With the exception of the 10:1 allocation ratio, all observed ratios are close to their allocations.

  19. Short-term Allocation Ratio

  20. Isolation
  Five tasks running the Dhrystone benchmark. Let amount.currency denote a ticket allocation of amount denominated in currency. Tasks A1 and A2 have allocations 100.A and 200.A, respectively. Tasks B1 and B2 have allocations 100.B and 200.B, respectively. Halfway through the experiment, B3 is started with allocation 300.B. This inflates the number of tickets in currency B from 300 to 600. There is no effect on tasks in currency A or on the aggregate iteration ratio of A tasks to B tasks. Tasks B1 and B2 slow to half their original rates, corresponding to the factor-of-2 inflation caused by B3.
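  A minimal sketch of how the currency mechanism could yield this isolation, assuming (purely for illustration) that currencies A and B are each funded with an equal amount of a base currency; a task's CPU share is taken to be its fraction of its currency times the currency's fraction of the base:

    def shares(base_funding, allocations):
        """base_funding: currency -> base tickets; allocations: task -> (amount, currency)."""
        base_total = sum(base_funding.values())
        in_currency = {c: 0 for c in base_funding}
        for amount, currency in allocations.values():
            in_currency[currency] += amount
        return {task: (amount / in_currency[cur]) * (base_funding[cur] / base_total)
                for task, (amount, cur) in allocations.items()}

    before = shares({"A": 1, "B": 1},
                    {"A1": (100, "A"), "A2": (200, "A"),
                     "B1": (100, "B"), "B2": (200, "B")})
    after = shares({"A": 1, "B": 1},
                   {"A1": (100, "A"), "A2": (200, "A"),
                    "B1": (100, "B"), "B2": (200, "B"), "B3": (300, "B")})
    print(before["B1"], after["B1"])   # B1's share halves; before["A1"] == after["A1"]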

  21. Thread Scheduling for Cache Locality
  • Traditionally, each resource (CPU, memory, I/O) has been managed separately
  • Resources are not independent, however
  • Policy for one resource can affect how another resource is used. For instance, the order in which threads are scheduled can affect performance of the memory subsystem
  • Neat paper that uses a very simple scheduling idea to enhance memory performance

  22. Main Idea
  • When working with a large array, we want to tile (block) for efficient use of the cache
  • What is tiling? Restructuring loops for data re-use (see the sketch below)
  • Tiling by hand is a pain and is error-prone
  • Compilers can tile automatically, but not always – for instance, when the program contains dynamically allocated or indirectly accessed data
  • So, use threads and hints to improve cache utilization
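  To make the tiling idea concrete, here is a small sketch of a blocked loop nest in plain Python (purely to show the loop restructuring; the matrix size and tile size are arbitrary):

    N, TILE = 512, 64
    a = [[i * N + j for j in range(N)] for i in range(N)]
    b = [[0] * N for _ in range(N)]

    # Tiled (blocked) transpose: work on TILE x TILE sub-blocks so the data
    # touched by the two inner loops fits in the cache and is reused before
    # it is evicted. An untiled transpose strides through whole rows and
    # columns, evicting each cache line after a single use.
    for ii in range(0, N, TILE):
        for jj in range(0, N, TILE):
            for i in range(ii, min(ii + TILE, N)):
                for j in range(jj, min(jj + TILE, N)):
                    b[j][i] = a[i][j]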

  23. Example
  Thread t_i is denoted by t_i(a_i1, …, a_ik), where a_ij is the address of the j-th piece of data referenced by thread t_i. Simplify by using just 2 or 3 addresses, or hints. Hints might be elements of rows or columns of a matrix, for example.

  24. Algorithm
  • Hash algorithm should assign threads to bins so that threads with similar hints fall in the same bin (sketched below)
  • Threads in each bin are scheduled together
  • With 2 hints, the bins form a 2-D plane
  • Key insight: the sum of the two dimensions of a bin is less than the cache size C
  • Easy to extend to k hints
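  A minimal sketch of the binning step, assuming two address hints per thread and an illustrative cache size; threads whose hints land in the same bin are emitted back-to-back:

    from collections import defaultdict

    CACHE_SIZE = 512 * 1024            # illustrative cache size C, in bytes
    BIN = CACHE_SIZE // 2              # with 2 hints, the two bin dimensions together span about C

    def schedule_by_locality(threads):
        """threads: list of (thread_id, hint1, hint2), hints being data addresses."""
        bins = defaultdict(list)
        for tid, h1, h2 in threads:
            bins[(h1 // BIN, h2 // BIN)].append(tid)   # similar hints -> same bin
        for key in sorted(bins):                       # run each bin's threads together
            yield from bins[key]

    threads = [("t1", 0, 0), ("t2", 10_000, 20_000), ("t3", 600_000, 900_000)]
    print(list(schedule_by_locality(threads)))   # t1 and t2 share a bin, so they run adjacently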

  25. Performance
  Fork = create and schedule null thread
  Run = execute and terminate null thread

  26. More Complex Examples
  Partial differential equation solver

  27. Multiprocessor Scheduling
  • Load sharing: single ready queue; a processor dequeues the thread at the front of the queue when idle; preempted threads are placed at the end of the queue (see the sketch below)
  • Gang scheduling: all threads belonging to an application run at the same time
  • Hardware partitions: a chunk of the machine is dedicated to each application
  • Advantages and disadvantages?
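  A minimal sketch of the load-sharing mechanism, assuming a single lock-protected ready queue shared by all processors (threads are just placeholders here):

    from collections import deque
    from threading import Lock

    ready_queue = deque()      # the single, system-wide ready queue
    lock = Lock()              # every processor contends on this one lock

    def make_ready(thread):    # used both for new threads and for preempted ones
        with lock:
            ready_queue.append(thread)         # preempted threads go to the end

    def idle_processor_pick():                 # called by any processor with nothing to run
        with lock:
            return ready_queue.popleft() if ready_queue else None

  The single shared lock, and the fact that a thread may resume on a different processor each time it is picked, are exactly the contention and locality issues summarized on the next slide.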

  28. Multiprocessor Scheduling
  • Load sharing: poor locality; poor synchronization behavior; simple; good processor utilization. Affinity or per-processor queues can improve locality
  • Gang scheduling: central control; fragmentation – unnecessary processor idle times (e.g., two applications with P/2+1 threads each cannot be co-scheduled on P processors); good synchronization behavior; if careful, good locality
  • Hardware partitions: poor utilization for I/O-intensive applications; fragmentation – unnecessary processor idle times when the partitions left over are small; good locality and synchronization behavior
