
CS162 Operating Systems and Systems Programming Midterm Review



  1. CS162 Operating Systems and Systems Programming Midterm Review, March 18, 2013

  2. Overview • Synchronization, Critical Sections • Deadlock • CPU Scheduling • Memory Multiplexing, Address Translation • Caches, TLBs • Page Allocation and Replacement • Kernel/User Interface

  3. Synchronization, Critical Sections

  4. Definitions • Synchronization: using atomic operations to ensure cooperation between threads • Mutual Exclusion: ensuring that only one thread does a particular thing at a time • One thread excludes the other while doing its task • Critical Section: piece of code that only one thread can execute at once • Critical section is the result of mutual exclusion • Critical section and mutual exclusion are two ways of describing the same thing.
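As an illustrative aside (not from the slides), mutual exclusion can be made concrete with Python's threading.Lock; the guarded increment below is a critical section, so the final count is exact no matter how the threads interleave:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:  # critical section: only one thread at a time runs this
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Without the lock, the read-modify-write of `counter` could interleave and lose updates.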

  5. Locks: using interrupts • Key idea: maintain a lock variable and impose mutual exclusion only during operations on that variable

int value = FREE;

Acquire() {
  disable interrupts;
  if (value == BUSY) {
    put thread on wait queue;
    go to sleep();  // Enable interrupts?
  } else {
    value = BUSY;
  }
  enable interrupts;
}

Release() {
  disable interrupts;
  if (anyone on wait queue) {
    take thread off wait queue;
    place on ready queue;
  } else {
    value = FREE;
  }
  enable interrupts;
}

  6. Better locks: using test&set

test&set(&address) {  /* most architectures */
  result = M[address];
  M[address] = 1;
  return result;
}

int guard = 0;
int value = FREE;

Acquire() {
  // Short busy-wait time
  while (test&set(guard));
  if (value == BUSY) {
    put thread on wait queue;
    go to sleep() & guard = 0;
  } else {
    value = BUSY;
    guard = 0;
  }
}

Release() {
  // Short busy-wait time
  while (test&set(guard));
  if (anyone on wait queue) {
    take thread off wait queue;
    place on ready queue;
  } else {
    value = FREE;
  }
  guard = 0;
}

  7. Does this busy wait? If so, how long does it wait? • Why is this better than the version that disables interrupts? • Remember that a thread can be context-switched while holding a lock. • What implications does this have on the performance of the system?

  8. Semaphores • Definition: a Semaphore has a non-negative integer value and supports the following two operations: • P(): an atomic operation that waits for the semaphore to become positive, then decrements it by 1 • Think of this as the wait() operation • V(): an atomic operation that increments the semaphore by 1, waking up a waiting P(), if any • Think of this as the signal() operation • You cannot read the integer value! The only methods are P() and V(). • Unlike a lock, there is no owner (thus no priority donation). The thread that calls V() need never have called P(). Example: Producer/consumer
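The producer/consumer example mentioned above can be sketched with Python's threading.Semaphore as a bounded buffer (buffer size and item count are chosen for illustration; `empty` counts free slots, `full` counts filled ones):

```python
import threading
from collections import deque

BUF_SIZE = 4
buffer = deque()
empty = threading.Semaphore(BUF_SIZE)  # counts free slots
full = threading.Semaphore(0)          # counts filled slots
mutex = threading.Lock()               # protects the buffer itself
consumed = []

def producer(n):
    for i in range(n):
        empty.acquire()        # P(): wait for a free slot
        with mutex:
            buffer.append(i)
        full.release()         # V(): signal a filled slot

def consumer(n):
    for _ in range(n):
        full.acquire()         # P(): wait for an item
        with mutex:
            consumed.append(buffer.popleft())
        empty.release()        # V(): signal a free slot

p = threading.Thread(target=producer, args=(20,))
c = threading.Thread(target=consumer, args=(20,))
p.start(); c.start()
p.join(); c.join()
print(consumed == list(range(20)))  # True
```

With a single producer and consumer the buffer is FIFO, so items come out in order.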

  9. Condition Variables • Condition Variable: a queue of threads waiting for something inside a critical section • Key idea: allow sleeping inside critical section by atomically releasing lock at time we go to sleep • Contrast to semaphores: Can’t wait inside critical section • Operations: • Wait(&lock): Atomically release lock and go to sleep. Re-acquire lock later, before returning. • Signal(): Wake up one waiter, if any • Broadcast(): Wake up all waiters • Rule: Must hold lock when doing condition variable ops!

  10. Complete Monitor Example (with condition variable) • Is this Hoare or Mesa monitor? • Here is an (infinite) synchronized queue:

Lock lock;
Condition dataready;
Queue queue;

AddToQueue(item) {
  lock.Acquire();       // Get Lock
  queue.enqueue(item);  // Add item
  dataready.signal();   // Signal any waiters
  lock.Release();       // Release Lock
}

RemoveFromQueue() {
  lock.Acquire();           // Get Lock
  while (queue.isEmpty()) {
    dataready.wait(&lock);  // If nothing, sleep
  }
  item = queue.dequeue();   // Get next item
  lock.Release();           // Release Lock
  return(item);
}
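A runnable sketch of this synchronized queue using Python's threading.Condition, which is Mesa-style (hence the while-loop re-check after wait); class and method names are my own:

```python
import threading
from collections import deque

class SyncQueue:
    def __init__(self):
        self._queue = deque()
        self._lock = threading.Lock()
        self._dataready = threading.Condition(self._lock)

    def add(self, item):
        with self._lock:                  # Get lock
            self._queue.append(item)      # Add item
            self._dataready.notify()      # Signal any waiters

    def remove(self):
        with self._lock:                  # Get lock
            while not self._queue:        # Mesa: re-check condition after waking
                self._dataready.wait()    # Atomically release lock and sleep
            return self._queue.popleft()  # Get next item

q = SyncQueue()
result = []
t = threading.Thread(target=lambda: result.append(q.remove()))
t.start()
q.add("hello")
t.join()
print(result)  # ['hello']
```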

  11. Mesa vs. Hoare monitors • Need to be careful about precise definition of signal and wait. Consider a piece of our dequeue code:

while (queue.isEmpty()) {
  dataready.wait(&lock);  // If nothing, sleep
}
item = queue.dequeue();   // Get next item

• Why didn’t we do this?

if (queue.isEmpty()) {
  dataready.wait(&lock);  // If nothing, sleep
}
item = queue.dequeue();   // Get next item

• Answer: depends on the type of scheduling • Hoare-style (most textbooks): • Signaler gives lock, CPU to waiter; waiter runs immediately • Waiter gives up lock, processor back to signaler when it exits critical section or if it waits again • Mesa-style (most real operating systems): • Signaler keeps lock and processor • Waiter placed on ready queue with no special priority • Practically, need to check condition again after wait

  12. Reader/Writer • Two types of lock: read lock and write lock (i.e. shared lock and exclusive lock) • A shared lock is an example where you might want to do broadcast/wakeAll on a condition variable. If you do it in other situations, correctness is generally preserved, but it is inefficient because one thread gets in and then the rest of the woken threads go back to sleep.

  13. Read/Writer

Reader() {
  // check into system
  lock.Acquire();
  while ((AW + WW) > 0) {
    WR++;
    okToRead.wait(&lock);
    WR--;
  }
  AR++;
  lock.Release();
  // read-only access
  AccessDbase(ReadOnly);
  // check out of system
  lock.Acquire();
  AR--;
  if (AR == 0 && WW > 0)
    okToWrite.signal();
  lock.Release();
}

Writer() {
  // check into system
  lock.Acquire();
  while ((AW + AR) > 0) {
    WW++;
    okToWrite.wait(&lock);
    WW--;
  }
  AW++;
  lock.Release();
  // read/write access
  AccessDbase(ReadWrite);
  // check out of system
  lock.Acquire();
  AW--;
  if (WW > 0) {
    okToWrite.signal();
  } else if (WR > 0) {
    okToRead.broadcast();
  }
  lock.Release();
}

What if we remove this line?

  14. Read/Writer Revisited

Reader() {
  // check into system
  lock.Acquire();
  while ((AW + WW) > 0) {
    WR++;
    okToRead.wait(&lock);
    WR--;
  }
  AR++;
  lock.Release();
  // read-only access
  AccessDbase(ReadOnly);
  // check out of system
  lock.Acquire();
  AR--;
  if (AR == 0 && WW > 0)
    okToWrite.broadcast();
  lock.Release();
}

Writer() {
  // check into system
  lock.Acquire();
  while ((AW + AR) > 0) {
    WW++;
    okToWrite.wait(&lock);
    WW--;
  }
  AW++;
  lock.Release();
  // read/write access
  AccessDbase(ReadWrite);
  // check out of system
  lock.Acquire();
  AW--;
  if (WW > 0) {
    okToWrite.signal();
  } else if (WR > 0) {
    okToRead.broadcast();
  }
  lock.Release();
}

What if we turn signal to broadcast?

  15. Read/Writer Revisited

Reader() {
  // check into system
  lock.Acquire();
  while ((AW + WW) > 0) {
    WR++;
    okContinue.wait(&lock);
    WR--;
  }
  AR++;
  lock.Release();
  // read-only access
  AccessDbase(ReadOnly);
  // check out of system
  lock.Acquire();
  AR--;
  if (AR == 0 && WW > 0)
    okContinue.signal();
  lock.Release();
}

Writer() {
  // check into system
  lock.Acquire();
  while ((AW + AR) > 0) {
    WW++;
    okContinue.wait(&lock);
    WW--;
  }
  AW++;
  lock.Release();
  // read/write access
  AccessDbase(ReadWrite);
  // check out of system
  lock.Acquire();
  AW--;
  if (WW > 0) {
    okToWrite.signal();
  } else if (WR > 0) {
    okContinue.broadcast();
  }
  lock.Release();
}

What if we turn okToWrite and okToRead into okContinue?

  16. Read/Writer Revisited

Reader() {
  // check into system
  lock.Acquire();
  while ((AW + WW) > 0) {
    WR++;
    okContinue.wait(&lock);
    WR--;
  }
  AR++;
  lock.Release();
  // read-only access
  AccessDbase(ReadOnly);
  // check out of system
  lock.Acquire();
  AR--;
  if (AR == 0 && WW > 0)
    okContinue.signal();
  lock.Release();
}

Writer() {
  // check into system
  lock.Acquire();
  while ((AW + AR) > 0) {
    WW++;
    okContinue.wait(&lock);
    WW--;
  }
  AW++;
  lock.Release();
  // read/write access
  AccessDbase(ReadWrite);
  // check out of system
  lock.Acquire();
  AW--;
  if (WW > 0) {
    okToWrite.signal();
  } else if (WR > 0) {
    okContinue.broadcast();
  }
  lock.Release();
}

• R1 arrives
• W1, R2 arrive while R1 reads
• R1 signals R2

  17. Read/Writer Revisited

Reader() {
  // check into system
  lock.Acquire();
  while ((AW + WW) > 0) {
    WR++;
    okContinue.wait(&lock);
    WR--;
  }
  AR++;
  lock.Release();
  // read-only access
  AccessDbase(ReadOnly);
  // check out of system
  lock.Acquire();
  AR--;
  if (AR == 0 && WW > 0)
    okContinue.broadcast();
  lock.Release();
}

Writer() {
  // check into system
  lock.Acquire();
  while ((AW + AR) > 0) {
    WW++;
    okContinue.wait(&lock);
    WW--;
  }
  AW++;
  lock.Release();
  // read/write access
  AccessDbase(ReadWrite);
  // check out of system
  lock.Acquire();
  AW--;
  if (WW > 0) {
    okToWrite.signal();
  } else if (WR > 0) {
    okContinue.broadcast();
  }
  lock.Release();
}

Need to change to broadcast! Why?
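The reader/writer scheme from the preceding slides can be sketched in runnable Python, with the AR/WR/AW/WW counters and separate okToRead/okToWrite conditions as in slide 13 (a minimal sketch, not the course's reference code; the class name is my own):

```python
import threading

class RWLock:
    def __init__(self):
        self.lock = threading.Lock()
        self.okToRead = threading.Condition(self.lock)
        self.okToWrite = threading.Condition(self.lock)
        self.AR = self.WR = self.AW = self.WW = 0  # active/waiting readers/writers

    def start_read(self):
        with self.lock:
            while self.AW + self.WW > 0:   # writers active or waiting: reader waits
                self.WR += 1
                self.okToRead.wait()
                self.WR -= 1
            self.AR += 1

    def end_read(self):
        with self.lock:
            self.AR -= 1
            if self.AR == 0 and self.WW > 0:
                self.okToWrite.notify()

    def start_write(self):
        with self.lock:
            while self.AW + self.AR > 0:   # anyone active: writer waits
                self.WW += 1
                self.okToWrite.wait()
                self.WW -= 1
            self.AW += 1

    def end_write(self):
        with self.lock:
            self.AW -= 1
            if self.WW > 0:
                self.okToWrite.notify()      # waiting writers get priority
            elif self.WR > 0:
                self.okToRead.notify_all()   # broadcast to all waiting readers

rw = RWLock()
rw.start_read(); rw.start_read()   # two concurrent readers are fine
rw.end_read(); rw.end_read()
rw.start_write()                   # writer gets exclusive access
rw.end_write()
```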

  18. Example Synchronization Problems • Fall 2012 Midterm Exam #3 • Spring 2013 Midterm Exam #2 • Hard Synchronization Problems from Past Exams • http://inst.eecs.berkeley.edu/~cs162/fa13/hand-outs/synch-problems.html

  19. Deadlock

  20. Four requirements for Deadlock • Mutual exclusion • Only one thread at a time can use a resource. • Hold and wait • Thread holding at least one resource is waiting to acquire additional resources held by other threads • No preemption • Resources are released only voluntarily by the thread holding the resource, after thread is finished with it • Circular Wait • There exists a set {T1, …, Tn} of waiting threads • T1 is waiting for a resource that is held by T2 • T2 is waiting for a resource that is held by T3 • … • Tn is waiting for a resource that is held by T1
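Circular wait, the fourth condition, can be broken by imposing a global lock ordering; in this illustrative Python sketch (not from the slides) both threads acquire the two locks in the same fixed order, so no cycle can form in the wait-for graph and the program always completes:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
log = []

def worker(name):
    # Both threads acquire in the fixed order (lock_a, then lock_b),
    # so a cycle like T1 waits-for T2 waits-for T1 cannot arise.
    with lock_a:
        with lock_b:
            log.append(name)

t1 = threading.Thread(target=worker, args=("T1",))
t2 = threading.Thread(target=worker, args=("T2",))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(log))  # ['T1', 'T2']
```

If one thread instead acquired lock_b first, the two orderings could interleave into a hold-and-wait cycle, satisfying all four conditions at once.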

  21. Resource Allocation Graph Examples • Recall: • request edge – directed edge T1 → Rj • assignment edge – directed edge Rj → Ti
[Figure: three resource-allocation graphs over threads T1-T4 and resources R1-R4 – a Simple Resource Allocation Graph; an Allocation Graph With Deadlock; an Allocation Graph With Cycle, but No Deadlock]

  22. Deadlock Detection Algorithm
[Figure: example allocation graph with threads T1-T4 and resources R1, R2]
• Only one of each type of resource → look for loops • More General Deadlock Detection Algorithm • Let [X] represent an m-ary vector of non-negative integers (quantities of resources of each type): [FreeResources]: Current free resources each type; [RequestX]: Current requests from thread X; [AllocX]: Current resources held by thread X • See if tasks can eventually terminate on their own:

[Avail] = [FreeResources]
Add all nodes to UNFINISHED
do {
  done = true
  foreach node in UNFINISHED {
    if ([Request_node] <= [Avail]) {
      remove node from UNFINISHED
      [Avail] = [Avail] + [Alloc_node]
      done = false
    }
  }
} until(done)

• Nodes left in UNFINISHED → deadlocked
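The detection loop above, transcribed into runnable Python (resource vectors as lists; the three-thread example is a made-up illustration in which T1 and T2 each hold one unit of a resource and request the other's):

```python
def detect_deadlock(free, requests, allocs):
    """Return the set of threads left in UNFINISHED (deadlocked if non-empty)."""
    avail = list(free)
    unfinished = set(requests)
    done = False
    while not done:
        done = True
        for node in sorted(unfinished):
            if all(r <= a for r, a in zip(requests[node], avail)):
                # node can finish on its own; it releases what it holds
                unfinished.discard(node)
                avail = [a + h for a, h in zip(avail, allocs[node])]
                done = False
    return unfinished

# Two resource types; T1 and T2 wait on each other, T3 holds nothing.
requests = {"T1": [0, 1], "T2": [1, 0], "T3": [0, 0]}
allocs   = {"T1": [1, 0], "T2": [0, 1], "T3": [0, 0]}
print(sorted(detect_deadlock([0, 0], requests, allocs)))  # ['T1', 'T2']
```

With one free unit of each resource instead ([1, 1]), every thread can finish and the returned set is empty.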

  23. Banker’s algorithm • Method of deadlock avoidance. • Process specifies max resources it will ever request. Process can initially request less than max and later ask for more. Before a request is granted, the Banker’s algorithm checks that the system remains “safe” if the requested resources are granted. • Roughly: some process could request its specified max and eventually finish; when it finishes, that process frees up its resources. A system is safe if there exists some sequence by which processes can request their max, terminate, and free up their resources so that all other processes can finish (in the same way).
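A sketch of the safety check at the heart of the Banker's algorithm: look for some order in which every process can obtain its remaining maximum claim, finish, and release its allocation (the process names and numbers are illustrative, not from the slides):

```python
def is_safe(avail, max_need, alloc):
    """Return True if some termination order lets every process reach its max."""
    avail = list(avail)
    remaining = {p: [m - a for m, a in zip(max_need[p], alloc[p])]
                 for p in max_need}
    unfinished = set(max_need)
    progress = True
    while unfinished and progress:
        progress = False
        for p in sorted(unfinished):
            if all(n <= a for n, a in zip(remaining[p], avail)):
                # p can get its max, run to completion, and free its allocation
                avail = [a + h for a, h in zip(avail, alloc[p])]
                unfinished.discard(p)
                progress = True
    return not unfinished

max_need = {"P0": [5, 3], "P1": [3, 2]}
alloc    = {"P0": [2, 1], "P1": [2, 0]}
print(is_safe([3, 2], max_need, alloc))  # True: P0 then P1 can finish
print(is_safe([0, 0], max_need, alloc))  # False: no process can reach its max
```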

  24. Example Deadlock Problems • Spring 2004 Midterm Exam #4 • Spring 2011 Midterm Exam #2

  25. CPU Scheduling

  26. Scheduling Assumptions
[Figure: timeline of the CPU multiplexed among USER1, USER2, USER3]
• CPU scheduling big area of research in early 70’s • Many implicit assumptions for CPU scheduling: • One program per user • One thread per program • Programs are independent • In general unrealistic but they simplify the problem • For instance: is “fair” about fairness among users or programs? • If I run one compilation job and you run five, you get five times as much CPU on many operating systems • The high-level goal: Dole out CPU time to optimize some desired parameters of system

  27. Scheduling Metrics • Waiting Time: time the job is waiting in the ready queue • Time between job’s arrival in the ready queue and launching the job • Service (Execution) Time: time the job is running • Response (Completion) Time: • Time between job’s arrival in the ready queue and job’s completion • Response time is what the user sees: • Time to echo a keystroke in editor • Time to compile a program • Response Time = Waiting Time + Service Time • Throughput: number of jobs completed per unit of time • Throughput is related to response time, but they are not the same thing: • Minimizing response time will lead to more context switching than if you only maximized throughput

  28. Scheduling Policy Goals/Criteria • Minimize Response Time • Minimize elapsed time to do an operation (or job) • Maximize Throughput • Two parts to maximizing throughput • Minimize overhead (for example, context-switching) • Efficient use of resources (CPU, disk, memory, etc.) • Fairness • Share CPU among users in some equitable way • Fairness is not minimizing average response time: • Better average response time by making system less fair • Tradeoff: fairness gained by hurting average response time!

  29. First-Come, First-Served (FCFS) Scheduling • First-Come, First-Served (FCFS) • Also “First In, First Out” (FIFO) or “Run until done” • In early systems, FCFS meant one program scheduled until done (including I/O) • Now, means keep CPU until thread blocks
• Example:
Process  Burst Time
P1       24
P2       3
P3       3
• Suppose processes arrive in the order: P1, P2, P3. The Gantt chart for the schedule is:
P1 [0-24] | P2 [24-27] | P3 [27-30]
• Waiting time for P1 = 0; P2 = 24; P3 = 27
• Average waiting time: (0 + 24 + 27)/3 = 17
• Average completion time: (24 + 27 + 30)/3 = 27
• Convoy effect: short process behind long process
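The FCFS numbers above can be checked with a few lines (all jobs arrive at t = 0, as in the slide):

```python
def fcfs_metrics(bursts):
    """Waiting and completion times for jobs arriving at t=0, run in order."""
    waiting, completion, t = [], [], 0
    for b in bursts:
        waiting.append(t)      # time spent in the ready queue before running
        t += b
        completion.append(t)
    return waiting, completion

w, c = fcfs_metrics([24, 3, 3])   # P1, P2, P3
print(w, sum(w) / 3)              # [0, 24, 27] 17.0
print(c, sum(c) / 3)              # [24, 27, 30] 27.0
```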

  30. Round Robin (RR) • FCFS Scheme: Potentially bad for short jobs! • Depends on submit order • If you are first in line at supermarket with milk, you don’t care who is behind you, on the other hand… • Round Robin Scheme • Each process gets a small unit of CPU time (time quantum), usually 10-100 milliseconds • After quantum expires, the process is preempted and added to the end of the ready queue • n processes in ready queue and time quantum is q → • Each process gets 1/n of the CPU time • In chunks of at most q time units • No process waits more than (n-1)q time units • Performance • q large → FCFS • q small → Interleaved • q must be large with respect to context switch, otherwise overhead is too high (all overhead)
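A small round-robin simulator makes the scheme concrete (quantum q = 4 chosen for illustration; all jobs arrive at t = 0 and context-switch overhead is ignored):

```python
from collections import deque

def round_robin(bursts, q):
    """Per-job completion times under RR with quantum q (all arrivals at t=0)."""
    remaining = list(bursts)
    ready = deque(range(len(bursts)))
    completion = [0] * len(bursts)
    t = 0
    while ready:
        i = ready.popleft()
        run = min(q, remaining[i])
        t += run
        remaining[i] -= run
        if remaining[i] > 0:
            ready.append(i)    # quantum expired: back of the ready queue
        else:
            completion[i] = t
    return completion

# Same workload as the FCFS slide: bursts 24, 3, 3
print(round_robin([24, 3, 3], q=4))  # [30, 7, 10]
```

The short jobs finish at t = 7 and t = 10 instead of waiting behind the 24-unit job, which is the point of RR.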

  31. What if we Knew the Future? • Could we always mirror best FCFS? • Shortest Job First (SJF): • Run whatever job has the least amount of computation to do • Shortest Remaining Time First (SRTF): • Preemptive version of SJF: if job arrives and has a shorter time to completion than the remaining time on the current job, immediately preempt CPU • These can be applied either to a whole program or the current CPU burst of each program • Idea is to get short jobs out of the system • Big effect on short jobs, only small effect on long ones • Result is better average response time

  32. Discussion • SJF/SRTF are the best you can do at minimizing average response time • Provably optimal (SJF among non-preemptive, SRTF among preemptive) • Since SRTF is always at least as good as SJF, focus on SRTF • Comparison of SRTF with FCFS and RR • What if all jobs the same length? • SRTF becomes the same as FCFS (i.e., FCFS is best can do if all jobs the same length) • What if jobs have varying length? • SRTF (and RR): short jobs not stuck behind long ones
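For the workload from the FCFS slide, non-preemptive SJF (all arrivals at t = 0) reduces to sorting jobs by burst length; a sketch:

```python
def sjf_metrics(bursts):
    """Average waiting time under non-preemptive SJF, all arrivals at t=0."""
    waiting, t = [], 0
    for b in sorted(bursts):   # shortest job first
        waiting.append(t)
        t += b
    return sum(waiting) / len(bursts)

print(sjf_metrics([24, 3, 3]))  # 3.0, versus 17 under FCFS
```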

  33. Summary • Scheduling: selecting a process from the ready queue and allocating the CPU to it • FCFS Scheduling: • Run threads to completion in order of submission • Pros: Simple (+) • Cons: Short jobs get stuck behind long ones (-) • Round-Robin Scheduling: • Give each thread a small amount of CPU time when it executes; cycle between all ready threads • Pros: Better for short jobs (+) • Cons: Poor when jobs are same length (-)

  34. Summary (cont’d) • Shortest Job First (SJF)/Shortest Remaining Time First (SRTF): • Run whatever job has the least amount of computation to do/least remaining amount of computation to do • Pros: Optimal (average response time) • Cons: Hard to predict future, Unfair • Multi-Level Feedback Scheduling: • Multiple queues of different priorities • Automatic promotion/demotion of process priority in order to approximate SJF/SRTF • Lottery Scheduling: • Give each thread a number of tokens (short tasks → more tokens) • Reserve a minimum number of tokens for every thread to ensure forward progress/fairness

  35. Example CPU Scheduling Problems • Spring 2011 Midterm Exam #4 • Fall 2012 Midterm Exam #4 • Spring 2013 Midterm Exam #5

  36. Memory Multiplexing, Address Translation

  37. Important Aspects of Memory Multiplexing • Controlled overlap: • Processes should not collide in physical memory • Conversely, would like the ability to share memory when desired (for communication) • Protection: • Prevent access to private memory of other processes • Different pages of memory can be given special behavior (Read Only, Invisible to user programs, etc). • Kernel data protected from User programs • Programs protected from themselves • Translation: • Ability to translate accesses from one address space (virtual) to a different one (physical) • When translation exists, processor uses virtual addresses, physical memory uses physical addresses • Side effects: • Can be used to avoid overlap • Can be used to give uniform view of memory to programs

  38. Why Address Translation?
[Figure: Prog 1’s Virtual Address Space 1 and Prog 2’s Virtual Address Space 2 (each with code, data, heap, stack) are mapped by Translation Map 1 and Translation Map 2 into a single Physical Address Space, which also holds OS code, OS data, and OS heap & stacks]

  39. Addr. Translation: Segmentation vs. Paging
[Figure: side-by-side translation diagrams. Segmentation: the virtual address is split into seg # and offset; the seg # selects a Base/Limit pair (with valid V/N bits); the offset is checked against the limit (> limit → error) and added to the base to form the physical address. Paging: the virtual address is split into virtual page # and offset; the page # indexes the page table (via PageTablePtr) whose entries carry valid and permission bits (V,R / V,R,W / N); a valid entry yields a physical page # concatenated with the offset, permissions are checked (Check Perm), and an invalid entry or bad permission raises an access error]

  40. Review: Address Segmentation
[Figure: virtual memory view (code, data, heap, and a stack growing down from 1111 1111) mapped segment-by-segment into a physical memory view; the virtual address is split into seg # and offset]

  41. Review: Address Segmentation
[Figure: same segmentation mapping as the previous slide]
What happens if stack grows to 1110 0000?

  42. Review: Address Segmentation
[Figure: same segmentation mapping; the stack segment has reached its limit]
No room to grow!! Buffer overflow error

  43. Review: Paging
[Figure: virtual memory view (code, data, heap, stack) mapped page-by-page through a page table into a physical memory view; the virtual address is split into page # and offset, and unused page-table entries are null]

  44. Review: Paging
[Figure: same paging setup as the previous slide]
What happens if stack grows to 1110 0000?

  45. Review: Paging
[Figure: same paging setup; two new stack pages have been mapped into whatever physical frames were free]
Allocate new pages where room!
Challenge: Table size equal to the # of pages in virtual memory!

  46. Design a Virtual Memory system Would you choose Segmentation or Paging?

  47. Design a Virtual Memory system How would you choose your Page Size?

  48. Design a Virtual Memory system How would you choose your Page Size? • Page Table Size • TLB Size • Internal Fragmentation • Disk Access

  49. Design a Virtual Memory system You have a 32 bit address space and 2 KB pages. In a single-level page table, how many bits would you use for the index and how many for the offset?

  50. Design a Virtual Memory system
2 KB = 2^11 bytes → 11-bit Byte Offset
32-bit address space − 11-bit offset = 21 bits → 21-bit Virtual Page Number
[Figure: 32-bit virtual address split into Virtual Page Number (bits 31 to 11) and Byte Offset (bits 10 to 0)]
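The split can be checked mechanically; a short sketch that derives the field widths and decomposes an example address (the example address itself is arbitrary):

```python
PAGE_SIZE = 2 * 1024   # 2 KB pages
ADDR_BITS = 32

offset_bits = PAGE_SIZE.bit_length() - 1   # log2(2048) = 11
vpn_bits = ADDR_BITS - offset_bits         # 32 - 11 = 21

def split(vaddr):
    """Return (virtual page number, byte offset) for a 32-bit address."""
    return vaddr >> offset_bits, vaddr & (PAGE_SIZE - 1)

print(offset_bits, vpn_bits)  # 11 21
print(split(0x12345678))      # (149130, 1656)
```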
