
Outline of Lecture 9



  1. Outline of Lecture 9
  • Cache Replacement in Multithreaded Architectures
  • Page Replacement
  • Stack Replacement Algorithms and Their Properties
  • Priority Based Replacement
  • Belady’s Anomaly

  2. Context Selection in Multithreaded Architectures
  • Context Selection
  • Functional Units
  • Memory
  • Interconnection Network
  • I/O

  3. Context Selection, States
  [diagram: software vs. hardware view of a context - the software side tracks thread states (Running, Waiting, Ready); the hardware side holds the registers, status word, and program counter; hardware controls perform context selection.]

  4. Threads
  [diagram: a processor with register frames and a context pointer (CP) next to a memory holding a ready queue and a suspended queue; a thread is either loaded (ready or suspended, with its program counter and status register in a register frame) or unloaded (ready or suspended, held in memory).]

  5. Context Switching
  [diagram: execution timelines over time comparing one-thread, two-thread, and three-thread execution with context switching.]

  6. Processor Efficiency
  [plot: processor efficiency (up to 1.0) versus number of contexts (0-15); efficiency rises in a linear region and levels off near 0.9 in a saturation region.]

  7. Efficiency Saturation Point
  Factors influencing the saturation point:
  - context switching time, t_s
  - cache loading time, t_c
  - cache miss probability, p_m
  The saturation point occurs at the inverse of the duty factor,
    (t_s + 1/p_m + t_c)/(t_s + 1/p_m) = 1 + t_c/(t_s + 1/p_m)
  contexts. For small t_s this is about 1 + t_c p_m.
  A large cache line yields a small p_m; a large cache line or slow memory results in a large t_c.
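The saturation-point formula on slide 7 can be sketched numerically. The parameter values below (1-cycle switch, 100-cycle cache load, 25% miss rate) are illustrative assumptions, not from the lecture:

```python
def saturation_contexts(ts, tc, pm):
    """Number of contexts at which efficiency saturates:
    1 + tc / (ts + 1/pm).  ts = context switch time (cycles),
    tc = cache loading time (cycles), pm = cache miss probability."""
    return 1 + tc / (ts + 1 / pm)

def saturation_contexts_approx(tc, pm):
    """Approximation for small ts: 1 + tc * pm."""
    return 1 + tc * pm

# Illustrative values: ts = 1, tc = 100, pm = 0.25.
exact = saturation_contexts(1, 100, 0.25)       # 1 + 100/5 = 21 contexts
approx = saturation_contexts_approx(100, 0.25)  # 1 + 25 = 26 contexts
```

With a miss every 4 instructions the exact and approximate values differ noticeably; as 1/p_m grows relative to t_s the two converge.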

  8. Context Switching Analysis
  Each gray block of one-thread execution lasts 1/p_m instructions and accesses 1/p_m - 1 data items; each gap (cache miss service) lasts t_c cycles. With s register sets, s threads are ready to execute with a single-instruction context switch. We assume t_s = 1, i.e., a switch costs one instruction, so each of the s threads runs for 1/p_m instructions and accesses 1/p_m data items (one item caused a cache miss in the previous block).
  [diagram: with two threads (s = 2), the second thread fills t_s + 1/p_m of each gap, leaving t_c - t_s - 1/p_m unfilled; in general, with t_s = 1, the unfilled part of the gap is t_c - (s - 1)(1/p_m + 1).]
  Two cases need to be considered:
  • s(1/p_m + 1) < t_c + 1/p_m (the gap is not filled): in time t_c + 1/p_m the system accesses s/p_m data items, so the ADDT (average delay per data transaction) is (t_c + 1/p_m)/(s/p_m) = (1 + t_c p_m)/s.
  • Otherwise the gap is filled, so in time 1/p_m + 1 the system executes 1/p_m data accesses and the ADDT is (1/p_m + 1)/(1/p_m) = 1 + p_m.
  To combine these two formulas, notice that the condition s(1/p_m + 1) < t_c + 1/p_m means s(1 + p_m) < 1 + t_c p_m, so (1 + t_c p_m)/s > 1 + p_m. The following expression therefore always gives the correct value: ADDT = max((1 + t_c p_m)/s, 1 + p_m).
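The combined ADDT expression on slide 8 is a one-liner. The parameter values below are illustrative assumptions:

```python
def addt(s, tc, pm):
    """Average delay per data transaction with s contexts, cache
    loading time tc, and miss probability pm, assuming ts = 1:
    max of the 'gap not filled' and 'gap filled' cases."""
    return max((1 + tc * pm) / s, 1 + pm)

# Illustrative values: tc = 100 cycles, pm = 0.25.
addt(1, 100, 0.25)    # one context: (1 + 25)/1 = 26
addt(2, 100, 0.25)    # gap still not filled: 26/2 = 13
addt(100, 100, 0.25)  # gap filled: delay floors at 1 + pm = 1.25
```

Note how adding contexts divides the delay only until the gap is filled, after which the 1 + p_m floor dominates.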

  9. Performance in Virtual Memory
  The performance of a virtual memory management system depends on the total number of page faults, which in turn depends on the paging policies and the frame allocation policies:
  • Static allocation - the number of frames allocated to a process is fixed
  • Dynamic allocation - the number of frames allocated to a process changes

  10. Page Replacement
  When there is a page fault, the referenced page must be loaded. If there is no available frame in memory, one page is selected for replacement. If the selected page has been modified, it must be copied back to disk (swapped out). Since the same pages may be referenced several times, a good replacement algorithm strives to cause the minimum number of page faults.

  11. Paging Policies
  • Fetch policy - decides when a page should be loaded into memory -> demand paging
  • Replacement policy - decides which page in memory should be replaced -> difficult
  • Placement policy - decides where in memory a page should be loaded -> easy for paging

  12. Page Faults and Performance Issues
  A page fault requires the operating system to carry out the page fault service. The total time it takes to service a page fault includes several components:
  • The time to service the page fault interrupt - system
  • The time to store back (swap out) the replaced page to the secondary storage device - process/cleaning
  • The time to load (swap in) the referenced page from the secondary storage device (disk unit) - process
  • Delay in queuing for the secondary storage device - process
  • Delay in scheduling the process with the referenced page - process

  13. Demand Paging In demand paging, a page fault occurs when a reference is made to a page not in memory. The page fault may occur while: • fetching an instruction, or • fetching an operand of an instruction.

  14. Problems to be Solved within Demand Paging
  Two major problems must be solved to implement demand paging:
  1. Frame allocation - decide how many frames to allocate to each process; usually needed only when initially loading the process into memory. Each process needs a minimum number of frames, and this minimum is based on the machine architecture.
  2. Page replacement - select which pages are to be replaced when a page fault occurs.

  15. Page Replacement Algorithms
  A paging system may be characterized by 3 items:
  • The reference string
  • The page replacement algorithm
  • The number of page frames available in memory, m
  A page replacement algorithm is said to satisfy the inclusion property, or is called a stack algorithm, if the set of pages in a k-frame memory is always a subset of the pages in a (k + 1)-frame memory.

  16. Page Reference
  A page reference string is a sequence of page numbers in order of reference, for example:
  < 3,6,2,1,4,7,3,5,8,9,2,8,10,7 >
  Every process generates a sequence of memory references as it runs, and each memory reference corresponds to a specific virtual page. A process's memory access behavior during execution may therefore be characterized by this ordered list of page numbers, referred to as the reference string.
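A reference string is obtained from a raw address trace by dividing each address by the page size. A minimal sketch (the 4 KiB page size and the sample addresses are illustrative assumptions):

```python
def reference_string(addresses, page_size=4096):
    """Map a raw memory address trace to a page reference string."""
    return [addr // page_size for addr in addresses]

# Four accesses touching pages 0, 1, 2, and 1 again:
reference_string([0, 5000, 8192, 4100])  # -> [0, 1, 2, 1]
```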

  17. Reference String
  w = r_1 r_2 ... r_(T-1) r_T is a sequence of virtual page references.
  M_0 - initial memory state
  M_0, M_1, ..., M_T - real memory states
  The memory state M_t under the request for page r_t is
    M_t = M_(t-1) + X_t - Y_t
  where X_t is the set of pages brought in and Y_t the set of pages moved out in the t-th step. For demand-driven fetching, X_t = {r_t} if r_t is not in M_(t-1), and X_t is empty otherwise.

  18. Cost of Fetching
  f(k) - cost of fetching k pages, where
  • f(1) = 1 = t_seek + t_transfer
  • f(0) = 0
  • f(k+1) > f(k).
  The cost C(m,w) is the sum of f(|X_t|) over the steps t = 1...T. For demand replacement, with |X_t| ≤ 1 it simplifies to
    C(m,w) = p x f(1) = p,
  where p denotes the number of page faults in steps 1...T.
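The cost definition above can be sketched directly: given a cost function f and the per-step fetch sizes |X_t|, the total cost is the sum of the per-step costs. Under demand replacement each |X_t| is 0 or 1, so with f(1) = 1 the cost reduces to the page fault count. The linear f below is an illustrative assumption (the "electronic disk" case of the next slide):

```python
def fetch_cost(f, fetch_sizes):
    """Total fetching cost C(m,w) = sum over steps of f(|X_t|)."""
    return sum(f(k) for k in fetch_sizes)

f = lambda k: k  # electronic disk: f(k) = k * f(1) = k

# Demand replacement: fetch sizes are 0 or 1, so cost = number of faults.
fetch_cost(f, [1, 0, 1, 1, 0])  # 3 faults -> cost 3
```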

  19. Demand Policy Optimality
  For an electronic auxiliary memory (electronic disk), f(k) = k·f(1) = k (ignoring the cost of the page fault interrupt). When f(k) ≥ k, there is a demand replacement algorithm whose cost is at least as low, for all memory sizes and reference strings, as that of any other algorithm - a nice result, but not very useful when disks are used!
  Conclusion: pre-fetching helps by bringing in more than one page per page fault, but it is difficult to predict which pages to pre-fetch!

  20. Replacement Policies
  • OPT (Optimal) - remove the page whose next reference is most distant in the future
  • FIFO (First In First Out) - circulating pointer; often inefficient
  • LIFO (Last In First Out) - special situations (sequential access)
  • LFU (Least Frequently Used) - good performance
  • MRU (Most Recently Used)
  MRU is not the same as LIFO: e.g., consider the string 12314 with m = 3; after 1231, MRU will replace 1, while LIFO replaces 3.
  MRU is not the same as LFU: e.g., for w = (123)^r (456)^s with m = 3, LFU causes 3 x (1 + min(r,s)) page faults, while MRU causes 3 x (1 + s).
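Fault counts for these policies are easy to simulate. Below are minimal sketches of LRU and OPT under demand paging with an initially empty memory (the reference string used in the comment is the Belady example from a later slide; OPT's tie-breaking among pages never referenced again does not affect the count):

```python
def lru_faults(w, m):
    """Count LRU page faults for reference string w with m frames."""
    stack, faults = [], 0          # front of list = most recently used
    for p in w:
        if p in stack:
            stack.remove(p)
        else:
            faults += 1
            if len(stack) == m:
                stack.pop()        # evict the least recently used page
        stack.insert(0, p)
    return faults

def opt_faults(w, m):
    """Count OPT page faults: evict the page used farthest in the future."""
    mem, faults = set(), 0
    for t, p in enumerate(w):
        if p not in mem:
            faults += 1
            if len(mem) == m:
                def next_use(q):
                    for u in range(t + 1, len(w)):
                        if w[u] == q:
                            return u
                    return float('inf')   # never referenced again
                mem.remove(max(mem, key=next_use))
            mem.add(p)
    return faults

w = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
# With m = 3: LRU incurs 10 faults, OPT only 7.
```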

  21. Stack Algorithm Definition
  M(m,w) is the state of a real memory with m frames after referencing string w, where M(m,∅) = ∅.
  Inclusion property characterizing stack algorithms: given w, there is a permutation of the virtual pages labeled 1,2,...,n, called the stack, S(w) = <s_1(w), ..., s_n(w)>, such that M(m,w) = {s_1(w), ..., s_m(w)}.
  For a sequence of references, there is a sequence of stacks. d_p(w) is the distance of page p from the top of the stack after processing string w.
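A practical payoff of the inclusion property: a single pass recording each reference's stack distance yields the fault count for every memory size m at once, since a reference hits in m frames exactly when its distance is at most m. A minimal sketch for LRU (where the stack is the recency order):

```python
def lru_stack_distances(w):
    """Return the LRU stack distance of each reference in w
    (inf for a first reference)."""
    stack, dists = [], []
    for p in w:
        if p in stack:
            dists.append(stack.index(p) + 1)  # 1-based distance from top
            stack.remove(p)
        else:
            dists.append(float('inf'))
        stack.insert(0, p)                    # referenced page to the top
    return dists

def faults_for(dists, m):
    """Fault count in an m-frame memory: references with distance > m."""
    return sum(1 for d in dists if d > m)
```

Because the distances do not depend on m, the fault count is automatically non-increasing in m, which is why a stack algorithm cannot exhibit Belady's anomaly.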

  22. Stack Updating via Priority
  Consider a stack-updating procedure through a priority list with the following properties:
  1. Priority is independent of the number of frames m.
  2. The currently referenced page has the highest priority.
  3. The resident page with the lowest priority is removed (from real memory) only when necessary, and a displaced page is moved down in the stack only to its priority level.
  With these three properties, the class of priority algorithms is the same as the class of stack algorithms, since only removal can break the stack property. If M is the content of the m-frame memory and y is the page in the (m+1)-st frame, then the victim with m frames is min[M], while the victim with m+1 frames is min[min[M], y]. Although min[min[M], y] may be y, the stack is the same with m and m+1 frames, because in such a case the (m+1)-st frame will contain min[M].

  23. Stack Update
  If S(w) is a stack in which d_p(w) = k (so s_k(w) = p), then
    s_1(wp) = p
    s_i(wp) = max[ s_i(w), min[ M(i-1,w) ] ]   for 2 ≤ i < k
    s_k(wp) = min[ M(k-1,w) ]
    s_i(wp) = s_i(w)                           for i > k
  where min[M] is the lowest-priority page resident in memory M and max[p1,p2] is the higher-priority page among pages p1, p2. This form of update is a consequence of the stack property (for each position k consider k frames).
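The update rule of slide 23 can be sketched by carrying the displaced minimum down the stack until it settles at the referenced page's old position. This is an illustrative implementation, not the lecture's code; it assumes the referenced page's priority has already been raised to the maximum, and it treats a page not yet in the stack as sitting below the bottom:

```python
def update_stack(stack, p, priority):
    """One priority-based stack update step.  stack[0] is the top;
    priority[q] is q's priority (larger = higher)."""
    hi = lambda a, b: a if priority[a] >= priority[b] else b
    lo = lambda a, b: a if priority[a] <= priority[b] else b
    k = stack.index(p) if p in stack else len(stack)
    new, carried = [p], None          # carried = displaced minimum so far
    for q in stack[:k]:               # positions above p's old slot
        if carried is None:
            carried = q               # old top is displaced first
        else:
            new.append(hi(q, carried))
            carried = lo(q, carried)  # lower-priority page keeps sinking
    if carried is not None:
        new.append(carried)           # settles at p's old position
    return new + stack[k + 1:]        # positions below k are unchanged

# With recency as the priority, this reproduces the LRU stack:
stack, prio = [], {}
for t, p in enumerate([1, 2, 3, 1, 4]):
    prio[p] = t                       # more recent = higher priority
    stack = update_stack(stack, p, prio)
# stack is now [4, 1, 3, 2], the recency order.
```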

  24. Diagram of Stack Update
  [diagram: stacks S(w) and S(wp) side by side, positions s_1 through s_n; the referenced page at position k moves to the top, positions 1 through k are updated by the max/min rule, and positions below k map across unchanged.]

  25. Priorities for Algorithms
  • OPT (Optimal) - the smaller the time to the next reference, the higher the priority
  • LRU (Least Recently Used) - the smaller the time since the last reference, the higher the priority
  • LFU (Least Frequently Used) - the higher the frequency of references, the higher the priority
  • MRU (Most Recently Used) - the larger the time since the last reference, the higher the priority
  • FIFO (First In First Out) - the smaller the time since memory entry, the higher the priority? No! The memory entry time depends on the number of frames allocated to the program, so the priority would depend on m.

  26. Belady’s Anomaly for FIFO
  Consider m+2 different pages p_1, ..., p_(m+2) and a string w = p_1, ..., p_(m+1), p_1, p_(m+2).
  For m > 1 frames, p_1 is in memory after processing string w, but with m+1 frames it is not among the resident pages after w, so the memory contents differ!
  Consider also the string w = 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. With m = 3 frames FIFO requires 9 page faults, but with m = 4 it needs 10 page faults to process this string, displaying Belady’s Anomaly: the number of page faults increases after an increase in the memory allocation of a program.
  Memory contents after each reference (m = 3):
    1-- 12- 123 234 341 412 125 125 125 253 534 534
  Memory contents after each reference (m = 4):
    1--- 12-- 123- 1234 1234 1234 2345 3451 4512 5123 1234 2345
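The anomaly on this slide can be checked with a short FIFO simulation under demand paging, a minimal sketch:

```python
from collections import deque

def fifo_fault_count(refs, frames):
    """Count FIFO page faults for a reference string and frame count."""
    resident, order, faults = set(), deque(), 0
    for p in refs:
        if p not in resident:
            faults += 1
            if len(resident) == frames:
                resident.discard(order.popleft())  # evict oldest arrival
            resident.add(p)
            order.append(p)
    return faults

w = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
# 9 faults with 3 frames but 10 faults with 4: Belady's anomaly.
```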

  27. FIFO Algorithm
  LRU is a stack algorithm: M(m,w) = {the m pages that were most recently referenced}.
  FIFO is not a stack algorithm, since the rule M(m,w) = {the m pages that most recently entered memory} depends on m, as the previous examples showed.
  A non-stack algorithm can exhibit Belady’s anomaly: allocating more frames to a program may cause more page faults than the same program would generate with fewer frames allocated.

  28. Stack and Priority List Updating: OPT Replacement
  For w = 123, S_t = <3,1,2>: s_1 = 3; s_2 = 1; s_3 = 2 and d_1 = 2; d_2 = 3; d_3 = 1.
  w:    1  2  3  4  1  2  3  2  3  1
  S_t:  1  2  3  4  1  2  3  2  3  1
        -  1  1  1  4  1  2  3  2  2
        -  -  2  2  2  4  1  1  1  3
        -  -  -  3  3  3  4  4  4  4
  L_t:  1  1  1  1  2  3  2  3  1  1
        -  2  2  2  3  2  3  1  2  2
        -  -  3  3  1  1  1  2  3  3
        -  -  -  4  4  4  4  4  4  4
  (L_t is the priority list. In the original diagram, arrows showed the reason for each priority and lines showed the level of competition. First the stack is computed, then the priorities are computed for each column.)
  The stack can be implemented by keeping the page table as a doubly linked list and bringing the referenced page to the top of the stack: this requires 6 pointer changes for each reference!
