Efficient Load-on-Demand Implementation in Operating Systems

Deferred segment-loading
An exercise on implementing the concept of ‘load-on-demand’

The ‘do-it-later’ philosophy
• Modern operating systems often follow a policy of deferring work whenever possible
• The advantage of adopting this practice is most evident in those cases where it turns out that the work was not needed after all
• Example: Many programs contain lots of code and data for diagnosing errors, but none of it is needed if no errors actually occur

Avoiding wasted effort
• Thus it will be more efficient if an OS does not always take time to load those portions of a program (such as its error-diagnostics and error-recovery routines) which may be unnecessary in the majority of situations
• But of course the OS needs to be ready to take a ‘timeout’ for loading those routines when and if the need becomes apparent

Another example
• In a multitasking environment, many tasks are taking turns at executing instructions
• The CPU typically performs task-switching several times every second, and must do a ‘save’ of the outgoing task’s context, and a ‘load’ of the incoming task’s context, any time it switches from one task to the next
• We ask: can any of this work be deferred?

The NPX registers
• Only a few tasks typically make any use of the Pentium’s ‘floating-point’ registers, so it’s wasteful to do a ‘save-and-reload’ for these registers with every task-switch
• The TS-bit (bit #3 in Control Register 0) is designed to assist an OS in implementing a policy of ‘lazy’ context-switching for the set of registers used in floating-point work

Example: effect of TS=1
• Each time the CPU performs a task-switch it automatically sets the TS-bit to 1 (only an OS can execute a ‘clts’ to reset TS=0)
• When any task tries to execute any of the NPX instructions (to do some arithmetic with values in the floating-point registers), an exception-7 fault will occur if the TS-bit hasn’t been cleared since a task-switch

The fault-7 exception-handler
• The work involved in saving the contents of the floating-point registers being used by a no-longer-active task, and reloading those registers with values that the active task expects to work on, can be deferred to the fault-handler for exception-7
• The handler can then clear the TS-bit (with ‘clts’) and ‘retry’ the instruction that caused this ‘fault’
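The lazy policy described above can be modelled in ordinary C. This is only a sketch under stated assumptions: the names `fpu_state`, `task_switch`, `fault7_handler`, `fpu_add`, `fpu_read` and `fpu_swaps` are hypothetical, the TS-bit is modelled by a plain boolean, and the 'fault' is an ordinary function call rather than a real exception.

```c
#include <assert.h>
#include <stdbool.h>

#define NTASKS 3                      /* hypothetical number of tasks */

/* Hypothetical model: one 'hardware' register file plus one save area per task */
typedef struct { double st[8]; } fpu_state;

static fpu_state fpu;                 /* the registers the 'CPU' really has  */
static fpu_state saved[NTASKS];       /* per-task save areas                 */
static bool ts_bit = false;           /* models the TS-bit in CR0            */
static int  current_task = 0;         /* task that is now executing          */
static int  fpu_owner = 0;            /* task whose values are in 'fpu'      */
int fpu_swaps = 0;                    /* counts save-and-reload operations   */

/* A task-switch only sets TS; the floating-point registers are not touched */
void task_switch(int next)
{
    assert(next >= 0 && next < NTASKS);
    current_task = next;
    ts_bit = true;
}

/* Models the exception-7 handler: do the deferred work, then 'clts' */
static void fault7_handler(void)
{
    if (fpu_owner != current_task) {
        saved[fpu_owner] = fpu;       /* save the former owner's values   */
        fpu = saved[current_task];    /* reload the active task's values  */
        fpu_owner = current_task;
        fpu_swaps++;
    }
    ts_bit = false;                   /* clts */
}

/* Models an NPX instruction: it 'faults' if TS=1, and is then retried */
void fpu_add(double x)
{
    if (ts_bit) fault7_handler();
    fpu.st[0] += x;
}

double fpu_read(void)
{
    if (ts_bit) fault7_handler();
    return fpu.st[0];
}
```

In this model a run of task-switches among tasks that never call `fpu_add()` or `fpu_read()` leaves `fpu_swaps` unchanged, which is the saving the lazy policy is after.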

The ‘fork()’ system-call
• In a UNIX/Linux operating system, the way any new task gets created is by a call to the kernel’s ‘fork()’ service-function
• This function is supposed to ‘duplicate’ the entire program-environment of the calling task (i.e., code, data, stack and heap, plus the kernel’s process-control data-structure)
• But much of this work is often wasted!

The ‘fork-and-exec’ scenario
• In practice, the most common reason for a program to ‘fork()’ a child-process is so the child-task can launch a separate program:
    if ( fork() == 0 ) execl( "newprog", newargs, (char *)0 );
• In these cases the ‘duplicated’ code, data, and heap are not relevant to the new task, and so they will simply get discarded!
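For readers who want to try the fork-and-exec pattern, here is a minimal sketch. The helper name `run_program` is hypothetical, and the use of `execvp()` and `waitpid()` is an assumption made for convenience; the slide itself shows only `fork()` followed by `execl()`.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: launch 'prog' in a child-process, wait for it,
   and return its exit-status (or -1 if anything went wrong)          */
int run_program(const char *prog, char *const argv[])
{
    assert(prog != NULL && argv != NULL);

    pid_t pid = fork();                /* duplicate the calling task     */
    if (pid < 0) return -1;            /* the fork itself failed         */

    if (pid == 0) {                    /* we are the child-task          */
        execvp(prog, argv);            /* discard the duplicated image   */
        _exit(127);                    /* reached only if the exec fails */
    }

    int status;                        /* we are the parent-task         */
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Everything that `fork()` duplicated for the child is thrown away the moment `execvp()` succeeds, which is exactly the wasted effort the slide points out.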

‘loading-on-demand’
• An OS can avoid all the wasted effort of duplicating a parent-task’s resources (its code, data, heap, etc.) by implementing “only upon demand” loading as a policy
• For an OS that uses the CPU’s memory-segmentation capabilities, an ‘on demand’ policy can be implemented by using the Pentium ‘Segment-Not-Present’ exception

How it works
• Segments remain ‘uninitialized’ until they are actually accessed by an application
• Segment-descriptors are initially marked as ‘Not Present’ (i.e., their P-bit is zero)
• When any instruction attempts to access such a memory-segment (read, write, or fetch), the CPU responds by generating exception-11: “Segment-Not-Present”

An ‘error-code’ is pushed
• Besides pushing the memory-address of the faulting instruction onto the exception-handler’s stack, the CPU also pushes an ‘error-code’ to indicate which descriptor was not yet marked as being ‘Present’
• The handler can then ‘load’ that segment with the proper information and adjust its descriptor’s P-bit, then retry the instruction
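The fault-load-retry cycle can be imitated in user-space C. This is a sketch only: the `segment` structure, the four-entry `table`, and the function names are all hypothetical, the 'loading' is a stand-in `memset()`, and the fault is modelled by a plain function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NSEGS    4                    /* hypothetical table size   */
#define SEG_SIZE 16                   /* hypothetical segment size */

typedef struct {
    bool present;                     /* models the descriptor's P-bit */
    char memory[SEG_SIZE];            /* the segment's contents        */
} segment;

static segment table[NSEGS];          /* every P-bit starts out as zero   */
int loads_performed = 0;              /* counts how often we had to load  */

/* Models the exception-11 handler: 'load' the segment, then set its P-bit */
static void not_present_handler(unsigned index)
{
    memset(table[index].memory, 'A' + (int)index, SEG_SIZE);
    table[index].present = true;
    loads_performed++;
}

/* Models an instruction that reads through a segment */
char read_byte(unsigned index, unsigned offset)
{
    assert(index < NSEGS && offset < SEG_SIZE);
    if (!table[index].present)
        not_present_handler(index);   /* the 'fault', followed by a 'retry' */
    return table[index].memory[offset];
}
```

Only the first `read_byte()` on a given segment pays for the load; segments that are never touched are never loaded at all.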

Error-Code Format
    bits 31..16   reserved
    bits 15..3    table-index
    bit  2        TI
    bit  1        IDT
    bit  0        EXT
Legend:
    EXT = An external event caused the exception (1=yes, 0=no)
    IDT = table-index refers to Interrupt Descriptor Table (1=yes, 0=no)
    TI  = The Table Indicator flag, used when IDT=0 (1=LDT, 0=GDT)
This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D
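A fault-handler written in C could pick this error-code apart with shifts and masks. The sketch below follows the Intel convention that TI=0 selects the GDT and TI=1 selects the LDT; the type `selector_error` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned ext;      /* bit 0: an external event caused the exception */
    unsigned idt;      /* bit 1: table-index refers to the IDT          */
    unsigned ti;       /* bit 2: 0=GDT, 1=LDT (used only when idt==0)   */
    unsigned index;    /* bits 15..3: the table-index                   */
} selector_error;

selector_error decode_error_code(uint32_t code)
{
    selector_error e;
    e.ext   =  code        & 1u;
    e.idt   = (code >> 1)  & 1u;
    e.ti    = (code >> 2)  & 1u;
    e.index = (code >> 3)  & 0x1FFFu;    /* thirteen index bits */
    return e;
}

/* Names the descriptor-table that the index refers to */
const char *table_name(selector_error e)
{
    if (e.idt) return "IDT";
    return e.ti ? "LDT" : "GDT";
}

/* Self-check: error-code 0x0028 names entry 5 of the GDT */
void self_check(void)
{
    selector_error e = decode_error_code(0x0028);
    assert(e.index == 5 && e.ti == 0 && e.idt == 0 && e.ext == 0);
}
```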

Our ‘simulation’ demo
• We can illustrate the ‘just-in-time’ idea by writing a program that performs a ‘far’ call to an ‘uninitialized’ region of memory:
    lcall $sel_CS, $draw_message
• The code-segment descriptor (referenced here by the selector-value ‘sel_CS’) will be initially marked ‘Not-Present’ (so this ‘lcall’ instruction will trigger an exception-11)

Our ‘fault-handler’
• Our Interrupt-Service-Routine for fault-11 will do two things:
    - Initialize the memory-region with code and data
    - Mark the code-segment’s descriptor as ‘Present’
• It will carefully preserve the CPU registers, so that it can ‘retry’ the faulting instruction

Where is the ‘error-code’?
Layout of our fault-handler’s stack, in 16-bit words (because we used a 286 interrupt-gate):
    +6   FLAGS
    +4   CS
    +2   IP
    +0   error-code   <-- SS:SP
The Pentium provides a special pair of instructions that procedures can use to address any parameter-values that reside on the stack: ‘enter’ and ‘leave’

Code using ‘enter’ and ‘leave’
    isrNPF: # Our fault-handler for exception-0x0B
        enter   $0, $0                      # setup stackframe access
        call    initialize_the_high_arena
        call    mark_segment_as_ready
        leave                               # discard the frame access
        add     $2, %sp                     # discard the error-code
        iret                                # 'retry' the faulting instruction

What does ‘enter’ do?
• The effect of the single instruction
    enter $0, $0
  is equivalent to this instruction-sequence:
    push %bp
    mov  %sp, %bp

How the stack is changed
Layout of our fault-handler’s stack, in 16-bit words:
    BEFORE executing ‘enter’           AFTER executing ‘enter’
    +6   FLAGS                         +8   FLAGS
    +4   CS                            +6   CS
    +2   IP                            +4   IP
    +0   error-code   <-- SS:SP        +2   error-code
                                       +0   old-BP       <-- SS:SP, SS:BP
NOTE: Any memory-references that use indirect addressing via register BP will use the SS segment-register by default (not the DS segment-register), for example:
    testw $0x0007, 2(%bp)

What does ‘leave’ do?
• The effect of the single instruction
    leave
  is equivalent to this instruction-sequence:
    mov %bp, %sp
    pop %bp

How the stack is changed
Layout of our fault-handler’s stack, in 16-bit words:
    BEFORE executing ‘leave’           AFTER executing ‘leave’
    +8   FLAGS                         +6   FLAGS
    +6   CS                            +4   CS
    +4   IP                            +2   IP
    +2   error-code                    +0   error-code   <-- SS:SP
    +0   old-BP       <-- SS:BP
         … other pushed words
                      <-- SS:SP
So the effect of ‘leave’ is to undo the effect of ‘enter’
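The bookkeeping that ‘enter’ and ‘leave’ perform on SP and BP can be checked with a small C model. This is only an illustration: the `cpu16` structure and its 256-byte stack are hypothetical, and `enter0()` models just the `enter $0, $0` form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 'mem' plays the stack-segment, sp and bp are offsets */
typedef struct {
    uint8_t  mem[256];
    uint16_t sp, bp;
} cpu16;

void push16(cpu16 *c, uint16_t v)
{
    assert(c->sp >= 2);
    c->sp -= 2;                               /* the stack grows downward */
    c->mem[c->sp]     = (uint8_t)(v & 0xFF);  /* little-endian storage    */
    c->mem[c->sp + 1] = (uint8_t)(v >> 8);
}

uint16_t peek16(const cpu16 *c, uint16_t addr)
{
    return (uint16_t)(c->mem[addr] | (c->mem[addr + 1] << 8));
}

uint16_t pop16(cpu16 *c)
{
    uint16_t v = peek16(c, c->sp);
    c->sp += 2;
    return v;
}

/* enter $0, $0  ==  push %bp ; mov %sp, %bp */
void enter0(cpu16 *c) { push16(c, c->bp); c->bp = c->sp; }

/* leave  ==  mov %bp, %sp ; pop %bp */
void leave(cpu16 *c)  { c->sp = c->bp; c->bp = pop16(c); }

/* Models what the CPU pushes for exception-11 via a 286 interrupt-gate */
void raise_fault(cpu16 *c, uint16_t flags, uint16_t cs, uint16_t ip,
                 uint16_t error_code)
{
    push16(c, flags);
    push16(c, cs);
    push16(c, ip);
    push16(c, error_code);
}
```

After `raise_fault()` and `enter0()`, the error-code is found at offset 2 from `bp`, matching the operand `2(%bp)` in the ‘testw’ example; a later `leave()` puts `sp` back on the error-code no matter how many words were pushed in between.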

Our demo’s memory-layout
(each address marks the start of the region named beside it)
    0x00030000   ARENA #3 (not used by this demo)
    0x00020000   ARENA #2 (where our demo expects drawing code will reside)
    0x00010000   ARENA #1 (where the loader puts our program code and data)
    0x00007C00   BOOT_LOCN
    0x00000000
Copy contents of ARENA #1 to ARENA #2

Efficient copying
• We use the Pentium’s ‘rep movsw’ instruction to perform memory-to-memory copying operations
• The segment-selector for the segment we copy from (it must be ‘readable’) goes into register DS, and the segment-selector for the segment we copy to (it must be ‘writable’) goes into register ES
• The number of words we will copy should match the size of our code-segment (which is 64KB, i.e., 0x8000 words)
• The Direction-Flag should be cleared (DF=0)

Example assembly code
    cld                     # use 'forward' string-copying
    mov   $sel_ds, %si      # selector for arena at 0x10000
    mov   %si, %ds          #   goes in segment-register DS
    xor   %si, %si          # start copying from offset zero
    mov   $sel_DS, %di      # selector for arena at 0x20000
    mov   %di, %es          #   goes in segment-register ES
    xor   %di, %di          # start copying to offset zero
    mov   $0x8000, %cx      # number of words to be copied
    rep   movsw             # perform the arena-copying
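For comparison, here is roughly what ‘cld’ followed by ‘rep movsw’ does, written as a C loop. This is a sketch, not the demo's code: the function name `rep_movsw` and the `ARENA_*` constants are hypothetical, and ordinary pointers stand in for the DS:SI and ES:DI register-pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ARENA_BYTES = 0x10000,             /* one 64KB arena         */
    ARENA_WORDS = ARENA_BYTES / 2      /* 0x8000 words of 16 bits */
};

/* Models 'cld ; rep movsw': src plays DS:SI, dst plays ES:DI, count plays CX */
void rep_movsw(uint16_t *dst, const uint16_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    while (count != 0) {               /* CX counts down to zero          */
        *dst++ = *src++;               /* DF=0: both offsets move forward */
        count--;
    }
}
```

Passing `ARENA_WORDS` as the count copies the whole 64KB arena, the same amount that the value 0x8000 in CX copies in the assembly version.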

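Segment-Descriptor Format
    Upper doubleword (bits 63..32):
        bits 63..56   Base[31..24]
        bit  55       G
        bit  54       D
        bit  53       RSV
        bit  52       AVL
        bits 51..48   Limit[19..16]
        bit  47       P
        bits 46..45   DPL
        bit  44       S
        bit  43       X
        bit  42       C/D
        bit  41       R/W
        bit  40       A
        bits 39..32   Base[23..16]
    Lower doubleword (bits 31..0):
        bits 31..16   Base[15..0]
        bits 15..0    Limit[15..0]
The segment-descriptor’s ‘Present’ bit is bit-number 47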
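Treating a descriptor as one 64-bit value makes the ‘Present’ bit easy to test and set from C. The sketch below assumes the standard layout shown above; the function names and the sample descriptor value (a code-segment with base 0x20000 and limit 0xFFFF, chosen to resemble the demo's ARENA #2) are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_PRESENT (UINT64_C(1) << 47)       /* the P-bit */

bool     desc_is_present(uint64_t d)   { return (d & DESC_PRESENT) != 0; }
uint64_t desc_mark_present(uint64_t d) { return d | DESC_PRESENT; }

/* Reassemble the 32-bit base-address from its two pieces */
uint32_t desc_base(uint64_t d)
{
    uint32_t low  = (uint32_t)((d >> 16) & 0xFFFFFF);   /* Base[23..0]  */
    uint32_t high = (uint32_t)((d >> 56) & 0xFF);       /* Base[31..24] */
    return low | (high << 24);
}

/* Reassemble the 20-bit segment-limit from its two pieces */
uint32_t desc_limit(uint64_t d)
{
    uint32_t low  = (uint32_t)(d & 0xFFFF);             /* Limit[15..0]  */
    uint32_t high = (uint32_t)((d >> 48) & 0xF);        /* Limit[19..16] */
    return low | (high << 16);
}

/* Illustrative check using a hypothetical 'Not Present' code-descriptor */
void descriptor_demo(void)
{
    uint64_t d = UINT64_C(0x00001A020000FFFF);  /* access-byte 0x1A: P=0 */
    assert(!desc_is_present(d));
    d = desc_mark_present(d);                   /* access-byte becomes 0x9A */
    assert(d == UINT64_C(0x00009A020000FFFF));
}
```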

In-class exercise
• To get some practical ‘hands on’ experience with implementing the demand-loading concept we suggest the following exercise:
    Modify our ‘notready.s’ demo so that it uses a 32-bit Interrupt-Gate for its Segment-Not-Present entry in the Interrupt Descriptor Table (this will affect the layout of the fault-handler’s stack)
• You may need to abandon use of the ‘enter’ and ‘leave’ instructions unless you also use a 32-bit data-segment descriptor for your stack-segment
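As a hint for the exercise, the two stack layouts can be written down as C structures and compared. This is a sketch for orientation only: the type names `frame16` and `frame32` are hypothetical, and the structures describe what the CPU pushes, not code from ‘notready.s’.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the CPU pushes through a 16-bit (286) interrupt-gate */
typedef struct {
    uint16_t error_code;    /* +0 */
    uint16_t ip;            /* +2 */
    uint16_t cs;            /* +4 */
    uint16_t flags;         /* +6 */
} frame16;

/* What the CPU pushes through a 32-bit interrupt-gate */
typedef struct {
    uint32_t error_code;    /* +0  */
    uint32_t eip;           /* +4  */
    uint32_t cs;            /* +8  (selector is in the low 16 bits) */
    uint32_t eflags;        /* +12 */
} frame32;

/* Confirms that every field's offset doubles in the 32-bit frame */
void check_frame_layouts(void)
{
    assert(offsetof(frame16, ip)    == 2  && offsetof(frame32, eip)    == 4);
    assert(offsetof(frame16, cs)    == 4  && offsetof(frame32, cs)     == 8);
    assert(offsetof(frame16, flags) == 6  && offsetof(frame32, eflags) == 12);
}
```

One consequence is that the handler must discard four bytes of error-code, not two, before it returns.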
