
Multi-processor Scheduling



Presentation Transcript


  1. Multi-processor Scheduling • Two implementation choices • Single, global ready queue • Per-processor run queue • Which is better?

  2. Queue-per-processor • Advantages of queue per processor • Promotes processor affinity (better cache locality) • Removes a centralized bottleneck, which would otherwise live in global memory • Supported by default in Linux 2.6 • Java 1.6 support: a double-ended queue (java.util.Deque) • Use a bounded buffer per consumer • If nothing is in a consumer’s queue, steal work from somebody else • If too much is in the queue, push work somewhere else (sketched below)
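
The last three bullets describe work stealing. A minimal Java sketch of the idea (the class name and the stealing policy are invented here for illustration; ConcurrentLinkedDeque, a thread-safe Deque, stands in for a real scheduler's synchronized run queue):

      import java.util.concurrent.ConcurrentLinkedDeque;

      // Hypothetical worker with its own run queue. It pops local work
      // from the head of its deque (good cache locality) and, when its
      // queue is empty, steals from the tail of another worker's deque.
      class WorkStealingWorker implements Runnable {
          private final ConcurrentLinkedDeque<Runnable> myQueue;
          private final ConcurrentLinkedDeque<Runnable>[] allQueues;

          WorkStealingWorker(ConcurrentLinkedDeque<Runnable> myQueue,
                             ConcurrentLinkedDeque<Runnable>[] allQueues) {
              this.myQueue = myQueue;
              this.allQueues = allQueues;
          }

          public void run() {
              while (!Thread.currentThread().isInterrupted()) {
                  Runnable task = myQueue.pollFirst();       // local work first
                  if (task == null) {
                      for (ConcurrentLinkedDeque<Runnable> victim : allQueues) {
                          task = victim.pollLast();          // steal from the other end
                          if (task != null) break;
                      }
                  }
                  if (task != null) {
                      task.run();
                  } else {
                      Thread.yield();                        // nothing anywhere; back off
                  }
              }
          }
      }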

  3. Thread Implementation Issues • Andrew Whitaker

  4. Where do Threads Come From? • A few choices: • The operating system • A user-mode library • Some combination of the two…

  5. Option #1: Kernel Threads • Threads implemented inside the OS • Thread operations (creation, deletion, yield) are system calls • Scheduling handled by the OS scheduler • Described as “one-to-one” • One user thread mapped to one kernel thread • Every invocation of Thread.start() creates a kernel thread (Figure: a process whose user threads each map to one OS thread)
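
To make the last bullet concrete, here is a small illustrative example, assuming a JVM that implements Java threads one-to-one with kernel threads (the common case): each start() below ultimately makes a system call that creates an OS-schedulable thread.

      public class KernelThreadDemo {
          public static void main(String[] args) {
              for (int i = 0; i < 4; i++) {
                  final int id = i;
                  // Each start() creates a kernel thread that the OS
                  // scheduler can run in parallel on another processor.
                  new Thread(() -> System.out.println("hello from thread " + id)).start();
              }
          }
      }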

  6. Option #2: User threads • Implemented as a library inside a process • All operations (creation, destruction, yield) are normal procedure calls • Described as “many-to-one” • Many user-perceived threads map to a single OS process/thread (Figure: a process whose many user threads share one OS thread)

  7. Process Address Space Review • Every process has a user stack and a program counter • In addition, each process has a kernel stack and program counter (not shown here) (Figure: address space layout, top to bottom: stack with SP, heap (dynamically allocated memory), static data (data segment), code (text segment) with PC)

  8. Threaded Address Space • Every thread always has its own user stack and program counter • For both user and kernel threads • For user threads, there is only a single kernel stack, program counter, PCB, etc. (Figure: one user address space, shared by threads 1–3: a separate stack and SP per thread, a shared heap, static data, and code segment, with a PC per thread)

  9. User Threads vs. Kernel Threads • User threads are faster • Operations do not pass through the OS • But, user threads suffer from: • Lack of physical parallelism • Only run on a single processor! • Poor performance with I/O • A single blocking operation stalls the entire application • For these reasons, most (all?) major OSes provide some form of kernel threads

  10. When Would User Threads Be Useful? • The  calculator? • The web server? • The Fibonacci GUI?

  11. Option #3: Two-level Model • OS supports native multi-threading • And, a user library maps multiple user threads to a single kernel thread • “Many-to-many” • Potentially captures the best of both worlds • Cheap thread operations • Parallelism (Figure: a process whose user threads are multiplexed over a smaller set of OS threads)

  12. Problems with Many-to-Many Threads • Lack of coordination between user and kernel schedulers • “Left hand not talking to the right” • Specific problems • Poor performance • e.g., the OS preempts a thread holding a crucial lock • Deadlock • Given K kernel threads, at most K user threads can block • Other runnable threads are starved out!

  13. Scheduler Activations, UW 1991 • Add a layer of communication between kernel and user schedulers • Examples: • Kernel tells user-mode that a task has blocked • User scheduler can re-use this execution context • Kernel tells user-mode that a task is ready to resume • Allows the user scheduler to alter the user-thread/kernel-thread mapping • Supported by newest release of NetBSD

  14. Implementation Spilling Over into the Interface • In practice, programmers have learned to live with expensive kernel threads • For example, thread pools • Re-use a static set of threads throughout the lifetime of the program (see the sketch below)
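
As a concrete illustration of the thread-pool pattern, java.util.concurrent provides it directly: the kernel threads are created once and reused for every task.

      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      public class PoolDemo {
          public static void main(String[] args) {
              // Four kernel threads are created up front...
              ExecutorService pool = Executors.newFixedThreadPool(4);
              // ...and reused for all 100 tasks, so each submission costs
              // a queue operation instead of a thread-creation system call.
              for (int i = 0; i < 100; i++) {
                  final int task = i;
                  pool.submit(() -> System.out.println("running task " + task));
              }
              pool.shutdown();  // let queued tasks finish, then retire the threads
          }
      }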

  15. Locks • Used for implementing critical sections • Modern languages (Java, C#) implicitly acquire and release locks (example below)

      interface Lock {
          // only one thread allowed between an acquire and a release
          public void acquire();
          public void release();
      }
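
For example, Java’s synchronized statement performs the acquire/release pair implicitly, and the release happens on every exit path, even an exception:

      class Counter {
          private int count = 0;

          void increment() {
              synchronized (this) {   // implicit acquire() of this object's lock
                  count++;            // critical section
              }                       // implicit release(), even if the body throws
          }
      }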

  16. Two Varieties of Locks • Spin locks • Threads busy wait until the lock is freed • Thread stays in the ready/running state • Blocking locks • Threads yield the processor until the lock is freed • Thread transitions to the blocked state

  17. Why Use Spin Locks? • Spin locks can be faster • No context switching required • Sometimes, blocking is not an option • For example, in the kernel scheduler implementation • Spin locks are never used on a uniprocessor

  18. Bogus Spin Lock Implementation • Problem: multiple threads can acquire this lock!

      class SpinLock implements Lock {
          private volatile boolean isLocked = false;

          public void acquire() {
              while (isLocked) { ; }  // busy wait
              isLocked = true;        // not atomic with the test above!
          }

          public void release() {
              isLocked = false;
          }
      }

  19. Hardware Support for Locking • Problem: Lack of atomicity in testing and setting the isLocked flag • Solution: Hardware-supported atomic instructions • e.g., atomic test-and-set • Java conveniently abstracts these primitives (AtomicInteger, and friends)

  20. Corrected Spin Lock

      import java.util.concurrent.atomic.AtomicBoolean;

      class SpinLock implements Lock {
          private final AtomicBoolean isLocked = new AtomicBoolean(false);

          public void acquire() {
              // get the old value, set a new value
              while (isLocked.getAndSet(true)) { ; }
          }

          public void release() {
              assert (isLocked.get() == true);
              isLocked.set(false);
          }
      }

  21. Blocking Locks: Acquire Implementation • Atomically test-and-set locked status • If lock is already held: • Set thread state to blocked • Add PCB (task_struct) to a wait queue • Invoke the scheduler • Problem: must ensure thread-safe access to the wait queue! (see the sketch below)
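
A simplified user-level sketch of those steps, implementing the Lock interface from slide 15: a thread-safe queue stands in for the kernel wait queue, and LockSupport.park() stands in for “mark the thread blocked and invoke the scheduler.” (A real implementation must also close lost-wakeup races that this sketch only partially handles by re-testing after every wakeup.)

      import java.util.concurrent.ConcurrentLinkedQueue;
      import java.util.concurrent.atomic.AtomicBoolean;
      import java.util.concurrent.locks.LockSupport;

      class SimpleBlockingLock implements Lock {
          private final AtomicBoolean isLocked = new AtomicBoolean(false);
          private final ConcurrentLinkedQueue<Thread> waitQueue =
              new ConcurrentLinkedQueue<>();

          public void acquire() {
              // Atomically test-and-set the locked status
              while (!isLocked.compareAndSet(false, true)) {
                  Thread me = Thread.currentThread();
                  waitQueue.add(me);          // enqueue before sleeping
                  if (isLocked.get()) {
                      LockSupport.park();     // "block": yield the processor
                  }
                  waitQueue.remove(me);       // woke up; retry the test-and-set
              }
          }

          public void release() {
              isLocked.set(false);
              Thread next = waitQueue.peek();
              if (next != null) {
                  LockSupport.unpark(next);   // wake one waiter to retry
              }
          }
      }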

  22. Disabling Interrupts • Prevents the processor from being interrupted • Serves as a coarse-grained lock • Must be used with extreme care • No I/O or timers can be processed

  23. Thread-safe Blocking Locks • Atomically test-and-set locked status • If lock is already held: • Set thread state to blocked • Disable interrupts • Add PCB (task_struct) to a wait queue • Invoke the scheduler • Next task re-enables interrupts

  24. Disabling Interrupts on a Multiprocessor • Disabling interrupts can be done locally or globally (for all processors) • Global disabling is extremely heavyweight • Linux: spin_lock_irq • Disable interrupts on the local processor • Grab a spin lock to lock out other processors

  25. Preview For Next Week

      public class Example extends Thread {
          private static int x = 1;
          private static int y = 1;
          private static boolean ready = false;

          public static void main(String[] args) {
              Thread t = new Example();
              t.start();
              x = 2;
              y = 2;
              ready = true;
          }

          public void run() {
              while (!ready)
                  Thread.yield();  // give up the processor
              System.out.println("x=" + x + " y=" + y);
          }
      }

  26. What Does This Program Print? • Answer: it’s a race condition. Many different outputs are possible • x=2, y=2 • x=1, y=2 • x=2, y=1 • x=1, y=1 • Or, the program may print nothing! • The ready loop runs forever
