COP 5611 Operating Systems Spring 2010

Dan C. Marinescu
Office: HEC 439 B
Office hours: M-Wd 2:00-3:00 PM

Lecture 7
• Last time: Thread coordination
• Today:
  • Thread coordination
  • Scheduling
  • Multi-level memories
  • I/O bottleneck
• Next time:

Hardware support for atomic actions
• RSM (Read and Set Memory) instruction
• TST (Test and Set) instruction
• Two primitives use atomic instructions to manipulate the lock:
  • ACQUIRE(lock)
  • RELEASE(lock)
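A minimal sketch of how ACQUIRE and RELEASE could be layered on a test-and-set primitive. Python has no TST instruction, so the atomicity of the hypothetical `_test_and_set` method is emulated here with an internal `threading.Lock`; the class name `SpinLock` and the helper `run_demo` are illustrative assumptions, not part of the lecture.

```python
import threading
import time


class SpinLock:
    """Spin lock built on an emulated test-and-set (illustration only)."""

    def __init__(self):
        self._flag = False               # False = free, True = held
        self._atomic = threading.Lock()  # stands in for hardware atomicity

    def _test_and_set(self):
        # Atomically return the old value of the flag and set it to True.
        with self._atomic:
            old = self._flag
            self._flag = True
            return old

    def acquire(self):
        # Busy-wait: keep retrying until the old value was False.
        while self._test_and_set():
            time.sleep(0)  # let other Python threads run while we spin

    def release(self):
        with self._atomic:
            self._flag = False


def run_demo(n_threads=4, n_increments=500):
    """Several threads increment a shared counter under the spin lock."""
    lock = SpinLock()
    counter = [0]

    def worker():
        for _ in range(n_increments):
            lock.acquire()
            counter[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

With the lock in place every increment is a before-or-after action, so `run_demo(4, 500)` should count all 2000 increments.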


Processor sharing strategies
• The previous solution assumes that each thread runs on a different processor and has the luxury of a spin lock and busy waiting. Now we consider sharing a processor among several threads, which requires several new functions.
• Strategy 1: a thread voluntarily releases control of the processor
  • Allows a thread to wait for an event
  • Allows several threads running on the same processor to wait for a lock
• Strategy 2: force a thread to release control of the processor
• What needs to be done to switch the processor from one thread to another:
  • Save the state of the current thread
  • Schedule another thread
  • Start running the new thread

The kernel
• The role of a kernel: controls virtualization
  • Processor sharing among threads
  • Virtual memory management
  • I/O operations
• Two modes of running: user (unprivileged) and kernel (privileged)
• Two types of threads: user-layer threads and processor-layer threads
• Open questions
  • How to create and terminate a thread?
  • If multiple threads are RUNNABLE, who decides which one gets control of the processor?
  • What if no threads are ready to run?

The procedure followed when a kernel starts

procedure RUN_PROCESSORS()
    for each processor do
        allocate stack and set up processor thread    /* allocation of the stack done at processor layer */
    shutdown ← FALSE
    SCHEDULER()
    deallocate processor_thread stack    /* deallocation of the stack done at processor layer */
    halt processor

The processor_thread and the SCHEDULER
• Thread creation: thread_id ← ALLOCATE_THREAD(starting_address_of_procedure, address_space_id)
• To create and terminate threads dynamically we have to:
  • Allow a thread to destroy itself and clean up: EXIT_THREAD
  • Allow a thread to terminate another thread of the same application: DESTROY_THREAD
• What if no thread is able to run?
  • Create a dummy thread for each processor, called a processor_thread, which is scheduled to run when no other thread is available
  • The processor_thread runs in the thread layer
  • The SCHEDULER runs in the processor layer

Switching threads with dynamic thread creation
• Switching from one user thread to another requires two steps:
  • 1. Switch from the user thread releasing the processor to the processor thread
  • 2. Switch from the processor thread to the new user thread, which is going to have control of the processor. This step requires the SCHEDULER to circle through the thread_table until a thread ready to run is found
• The boundary between user-layer threads and the processor-layer thread is crossed twice
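The "circle through the thread_table" step can be sketched as a circular scan. This is only an illustration under assumed names: the table layout (a list of dictionaries with a `state` field) and the function `next_runnable` are hypothetical, not the lecture's data structures.

```python
RUNNABLE, RUNNING, WAITING = "RUNNABLE", "RUNNING", "WAITING"


def next_runnable(thread_table, current):
    """Return the index of the first RUNNABLE entry found after `current`,
    wrapping around the table; return None if no thread is ready to run."""
    n = len(thread_table)
    for step in range(1, n + 1):
        candidate = (current + step) % n
        if thread_table[candidate]["state"] == RUNNABLE:
            return candidate
    return None


table = [{"state": RUNNABLE}, {"state": WAITING}, {"state": RUNNABLE}]
chosen = next_runnable(table, 0)  # skips the WAITING entry at index 1
```

When the function returns None, the processor thread would simply keep looping, which is the case the dummy processor_thread exists to cover.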

YIELD
• A thread voluntarily releases control of the processor. This:
  • Allows a thread to wait for an event
  • Allows several threads running on the same processor to wait for a lock
• YIELD is a function implemented by the kernel to:
  • Save the state of the current thread
  • Schedule another thread, i.e., invoke the SCHEDULER
  • Start running the new thread, i.e., dispatch the processor to the new thread
• YIELD cannot be implemented in a high-level language; it must be written in machine language.
• YIELD can be called from the environment of the thread, e.g., C, C++, Java
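The three steps of YIELD can be imitated with Python generators. This is an analogy rather than a kernel implementation: the Python runtime saves and restores the generator's state, whereas a real YIELD saves registers and the stack pointer in machine code. The names `scheduler`, `worker`, and `trace` are hypothetical.

```python
from collections import deque


def scheduler(threads):
    """Round-robin over generator-based 'threads' until all have finished."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run the thread until its next YIELD
            ready.append(thread)  # still alive: back into the ready queue
        except StopIteration:
            pass                  # the thread has exited; drop it


def worker(name, steps, trace):
    for i in range(steps):
        trace.append((name, i))   # one unit of work
        yield                     # plays the role of YIELD


trace = []
scheduler([worker("A", 2, trace), worker("B", 3, trace)])
```

The recorded trace interleaves the two workers, because each one gives up the processor after every unit of work.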


Communication with a bounded buffer using YIELD
• Now the producer (the thread writing to the bounded buffer) and the consumer share one processor.
• SEND and RECEIVE use YIELD to allow the other thread to continue.
• Example: switch from thread 1 to thread 6 using
  • YIELD
  • ENTER_PROCESSOR_LAYER
  • EXIT_PROCESSOR_LAYER
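A minimal sketch of SEND and RECEIVE on one shared processor, again using generators in place of kernel threads. All names (`BoundedBuffer`, `send`, `receive`, `run`) are hypothetical. Because the switch happens only at an explicit `yield`, this toy version needs no locks; a real implementation still has to protect the buffer and the thread table.

```python
from collections import deque


class BoundedBuffer:
    def __init__(self, size):
        self.slots = [None] * size
        self.in_ = 0   # IN: total number of messages written
        self.out = 0   # OUT: total number of messages read


def send(buf, msg):
    while buf.in_ - buf.out == len(buf.slots):
        yield                      # buffer full: YIELD to the consumer
    buf.slots[buf.in_ % len(buf.slots)] = msg
    buf.in_ += 1


def receive(buf, received):
    while buf.in_ == buf.out:
        yield                      # buffer empty: YIELD to the producer
    received.append(buf.slots[buf.out % len(buf.slots)])
    buf.out += 1


def producer(buf, items):
    for msg in items:
        yield from send(buf, msg)


def consumer(buf, count, received):
    for _ in range(count):
        yield from receive(buf, received)


def run(threads):
    """Round-robin dispatcher: resume each thread until it yields or exits."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)
            ready.append(thread)
        except StopIteration:
            pass


buf = BoundedBuffer(2)
received = []
run([producer(buf, [1, 2, 3, 4, 5]), consumer(buf, 5, received)])
```

With a two-slot buffer the producer must yield whenever it gets two messages ahead, yet all five messages arrive in order.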

Shared data structures protected by locks
• All threads share:
  • The bounded buffer
  • The thread table
• Both resources are protected by locks.
• Is this sufficient? Recall that the pointers IN and OUT are shared as well.

Using events for thread sequence coordination
• YIELD requires the thread to periodically check whether a condition has occurred.
• Basic idea: use events and construct two before-or-after actions
  • WAIT(event_name): issued by a thread that can continue only after the occurrence of the event event_name.
  • NOTIFY(event_name): searches the thread_table to find a thread waiting for the occurrence of the event event_name.

Polling and interrupts
• Polling: periodically checking the status of a subsystem.
  • How often should the polling be done?
  • Too frequently: large overhead
  • After a long time interval: the system will appear non-responsive
• Interrupts
  • Could be implemented in hardware as polling: before executing the next instruction the processor checks an “interrupt” bit implemented as a flip-flop
  • If the bit is ON, invoke the interrupt handler instead of executing the next instruction
  • Multiple types of interrupts: multiple “interrupt” bits, checked according to the priority of the interrupt
  • Some architectures allow interrupts to occur during the execution of an instruction
  • The interrupt handler should be short and very carefully written. Interrupts of lower priority could be masked.
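The "interrupt as hardware polling" idea can be sketched as an instruction loop that tests a bit before each instruction. This is a toy model with assumed names (`run_cpu`, `raised_before`); a single interrupt bit is modeled and priorities are ignored.

```python
def run_cpu(program, raised_before):
    """Execute `program` (a list of instruction names). `raised_before` is the
    set of instruction indices just before which a simulated device sets the
    interrupt bit. Returns the order in which things actually ran."""
    log = []
    for pc, instruction in enumerate(program):
        interrupt_bit = pc in raised_before  # the device sets the flip-flop
        if interrupt_bit:
            log.append("handler")            # run the handler first
            interrupt_bit = False            # the handler clears the bit
        log.append(instruction)              # then resume the program
    return log


order = run_cpu(["i0", "i1", "i2"], raised_before={1})
```

The check costs one test per instruction, which is why it is done in hardware rather than by the running program.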

This solution does not work
• For the solution to work, the NOTIFY must always be sent after the WAIT.
• If the sender and the receiver run on two different processors there could be a race condition for the notempty event: the NOTIFY could be sent before the WAIT, and is then lost.
• Tension between modularity and locks
• Several possible solutions: AWAIT/ADVANCE, semaphores, etc.
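The lost notification can be reproduced with a deterministic toy model in which the two possible orderings are run explicitly. The names (`Event`, `wait`, `notify`, and the two scenario functions) are hypothetical, and the thread table is reduced to a dictionary of states.

```python
class Event:
    def __init__(self):
        self.waiters = []  # ids of threads currently waiting on this event


def wait(event, tid, thread_table):
    thread_table[tid] = "WAITING"
    event.waiters.append(tid)


def notify(event, thread_table):
    # Wake the threads waiting right now; if nobody is waiting,
    # nothing is recorded and the notification is simply lost.
    for tid in event.waiters:
        thread_table[tid] = "RUNNABLE"
    event.waiters.clear()


def wait_then_notify():
    table, notempty = {"receiver": "RUNNING"}, Event()
    wait(notempty, "receiver", table)
    notify(notempty, table)
    return table["receiver"]


def notify_then_wait():
    table, notempty = {"receiver": "RUNNING"}, Event()
    notify(notempty, table)            # the sender wins the race
    wait(notempty, "receiver", table)  # the receiver now waits forever
    return table["receiver"]
```

In the second ordering the receiver stays in the WAITING state, because an event carries no memory of a NOTIFY that has already happened.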

AWAIT - ADVANCE solution
• A new state, WAITING, and two before-or-after actions that take a RUNNING thread into the WAITING state and back to the RUNNABLE state.
• eventcount: a variable with an integer value, shared between threads and the thread manager; eventcounts are like events but have a value.
• A thread in the WAITING state waits for a particular value of the eventcount.
• AWAIT(eventcount, value)
  • If eventcount > value: control returns to the thread calling AWAIT and this thread continues execution
  • If eventcount ≤ value: the state of the thread calling AWAIT is changed to WAITING and the thread is suspended
• ADVANCE(eventcount)
  • Increments the eventcount by one, then
  • searches the thread_table for threads waiting for this eventcount
  • If it finds a thread and the eventcount exceeds the value the thread is waiting for, the state of the thread is changed to RUNNABLE
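A minimal sketch of an eventcount on top of Python's `threading.Condition`, which here replaces the thread manager and the thread_table search. The class `EventCount`, its method names (`await_` has a trailing underscore because `await` is a Python keyword), and `demo` are assumptions made for illustration.

```python
import threading


class EventCount:
    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def await_(self, value):
        # Return only once the eventcount exceeds `value`.
        with self._cond:
            while not self._value > value:
                self._cond.wait()   # the caller is now "WAITING"

    def advance(self):
        with self._cond:
            self._value += 1
            self._cond.notify_all() # waiters recheck their own condition

    def read(self):
        with self._cond:
            return self._value


def demo():
    ec = EventCount()
    seen = []

    def waiter():
        ec.await_(1)                # continue once the eventcount exceeds 1
        seen.append(ec.read())

    t = threading.Thread(target=waiter)
    t.start()
    ec.advance()
    ec.advance()
    t.join(timeout=5)
    return seen
```

Because the count persists, the result is the same whether the ADVANCE calls happen before or after the waiter reaches AWAIT, which is what removes the lost-notification race.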

Thread states and state transitions

Solution for a single sender and multiple receivers

Supporting multiple senders: the sequencer
• Sequencer: a shared variable supporting thread sequence coordination. It allows threads to be ordered and is manipulated using two before-or-after actions.
• TICKET(sequencer): returns a non-negative value which increases by one at each call. Two concurrent threads calling TICKET on the same sequencer receive different values, determined by the timing of the calls: the one calling first receives the smaller value.
• READ(sequencer): returns the current value of the sequencer
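A minimal sketch of a sequencer, with a `threading.Lock` making TICKET and READ before-or-after actions. The class and the helper `collect_tickets` are hypothetical names introduced for this example.

```python
import threading


class Sequencer:
    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()

    def ticket(self):
        # Hand out the current value, then increment, as one atomic step.
        with self._lock:
            value = self._next
            self._next += 1
            return value

    def read(self):
        with self._lock:
            return self._next


def collect_tickets(n_threads=8, per_thread=100):
    seq = Sequencer()
    tickets = []
    tickets_lock = threading.Lock()

    def worker():
        for _ in range(per_thread):
            t = seq.ticket()
            with tickets_lock:
                tickets.append(t)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(tickets), seq.read()
```

No two callers ever receive the same ticket, so concurrent senders can use their tickets to agree on who writes to the buffer first.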

Multiple sender solution; only the SEND must be modified

Thread scheduling policies
• Non-preemptive scheduling: a running thread releases the processor of its own will. Not very likely to work in a greedy environment.
• Cooperative scheduling: a thread calls YIELD periodically.
• Preemptive scheduling: a thread is allowed to run for a time slot. This is enforced by the thread manager working in concert with the interrupt handler.
  • The interrupt handler should invoke the thread exception handler.
  • What if the interrupt handler, running at the processor layer, calls YIELD directly? Imagine the following sequence:
    • Thread A acquires the thread_table_lock
    • An interrupt occurs
    • The YIELD call in the interrupt handler attempts to acquire the thread_table_lock, which thread A still holds, so the processor deadlocks
  • Solution: the processor is shared between two threads:
    • The processor thread
    • The interrupt handler thread
• Recall that threads have their individual address spaces, so when the scheduler allocates the processor to a thread it must also load the thread's page map table into the page map table register of the processor.
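The deadlock sequence can be illustrated with a non-reentrant lock. To keep the demonstration from hanging, the simulated YIELD tries the lock without blocking and reports whether it would have succeeded; a real YIELD would spin forever at that point. The function names are hypothetical.

```python
import threading

# Non-reentrant, like a simple spin lock: a second acquire on the same
# processor cannot succeed until the first holder releases the lock.
thread_table_lock = threading.Lock()


def yield_from_interrupt_handler():
    """Simulated YIELD: report whether thread_table_lock could be acquired."""
    acquired = thread_table_lock.acquire(blocking=False)
    if acquired:
        thread_table_lock.release()
    return acquired


def interrupt_while_lock_held():
    thread_table_lock.acquire()                # thread A acquires the lock
    try:
        return yield_from_interrupt_handler()  # an interrupt occurs here
    finally:
        thread_table_lock.release()
```

While thread A holds the lock the handler's attempt fails, and since the handler runs on A's processor, A never gets the chance to release it.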

Virtual machines
• First commercial product: IBM VM/370, originally developed as CP-67
• Advantages:
  • One could run multiple guest operating systems on the same machine
  • An error in one guest operating system does not bring the machine down
  • An ideal environment for developing operating systems

Performance metrics
• Wide range, sometimes correlated, other times with contradictory goals:
  • Throughput, utilization, waiting time, fairness
  • Latency (time in system)
  • Capacity
  • Reliability as an ultimate measure of performance
• Some measures of performance reflect physical limitations: capacity, bandwidth (CPU, memory, communication channel), communication latency.
• Often measures of performance reflect system organization and policies, such as scheduling priorities.
• Resource sharing is an enduring problem; recall that one of the means for virtualization is multiplexing physical resources.
• The workload can be characterized statistically.
• Queuing theory can be used for analytical performance evaluation.
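As one example of analytical evaluation, the sketch below uses the standard M/M/1 queue (Poisson arrivals, exponential service times, a single server). The choice of this model is an assumption made here for illustration; the slides do not name a specific queuing model, and `mm1_metrics` is a hypothetical helper.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state metrics of an M/M/1 queue.

    Returns (utilization, mean time in system, mean number in system).
    Rates are in requests per unit time; the queue is stable only when
    arrival_rate < service_rate.
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    utilization = arrival_rate / service_rate
    mean_time_in_system = 1.0 / (service_rate - arrival_rate)
    mean_number_in_system = utilization / (1.0 - utilization)
    return utilization, mean_time_in_system, mean_number_in_system


rho, latency, in_system = mm1_metrics(arrival_rate=8.0, service_rate=10.0)
```

The example shows the tension between the metrics listed above: pushing utilization toward 1 increases throughput but makes the latency grow without bound.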

System design for performance
• When you have a clear idea of the design, simulate the system before actually implementing it.
• Identify the bottlenecks.
• Identify those bottlenecks likely to be removed naturally by the technologies expected to be embedded in your system.
• Keep in mind that removing one bottleneck exposes the next.
• Concurrency helps a lot, both in hardware and in software.
• In hardware, concurrency implies multiple execution units:
  • Pipelining: multiple instructions are executed concurrently
  • Multiple execution units in a processor: integer, floating point, pixels
  • Graphics processors: geometric engines
  • Multi-processor systems
  • Multi-core processors
  • Paradigms: SIMD (Single Instruction Multiple Data), MIMD (Multiple Instructions Multiple Data)

System design for performance (cont’d)
• In software, concurrency complicates writing and debugging programs. SPMD (Single Program Multiple Data) paradigm.
• Design a well-balanced system:
  • The bandwidths of individual sub-systems should be as close as possible
  • The execution times of pipeline stages should be as close as possible
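Why balanced stages matter can be shown with a small calculation. The sketch assumes a linear pipeline in which every item is available at time zero, each stage handles one item at a time, and there is unlimited buffering between stages; `pipeline_time` is a hypothetical helper.

```python
def pipeline_time(stage_times, n_items):
    """Completion time of n_items under the assumptions stated above:
    the first item needs the sum of all stage times, and each further
    item finishes one slowest-stage time after the previous one."""
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return sum(stage_times) + (n_items - 1) * max(stage_times)


balanced = pipeline_time([2, 2, 2], n_items=100)    # same total work per item
unbalanced = pipeline_time([1, 4, 1], n_items=100)  # one slow stage
```

Both pipelines do six time units of work per item, yet the unbalanced one takes almost twice as long, because the slowest stage alone sets the steady-state rate.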
