Deferred segment-loading

amos-guzman

Presentation Transcript


  1. Deferred segment-loading An exercise on implementing the concept of ‘load-on-demand’

  2. The ‘do-it-later’ philosophy • Modern operating systems often follow a policy of deferring work whenever possible • The advantage of adopting this practice is most evident in those cases where it turns out that the work was not needed after all • Example: Many programs contain lots of code and data for diagnosing errors – but it’s not needed if no errors actually occur

  3. Avoiding wasted effort • Thus it will be more efficient if an OS does not always take time to load those portions of a program (such as its error-diagnostics and error-recovery routines) which may be unnecessary in the majority of situations • But of course the OS needs to be ready to take a ‘timeout’ for loading those routines when and if the need becomes apparent

  4. Another example • In a multitasking environment, many tasks are taking turns at executing instructions • The CPU typically performs task-switching many times every second, and must do a ‘save’ of the outgoing task’s context and a ‘load’ of the incoming task’s context any time it switches from one task to the next • We ask: can any of this work be deferred?

  5. The NPX registers • Only a few tasks typically make any use of the Pentium’s ‘floating-point’ registers, so it’s wasteful to do a ‘save-and-reload’ for these registers with every task-switch • The TS-bit (bit #3 in Control Register 0) is designed to assist an OS in implementing a policy of ‘lazy’ context-switching for the set of registers used in floating-point work

  6. Example: effect of TS=1 • Each time the CPU performs a task-switch it automatically sets the TS-bit to 1 (only an OS can execute a ‘clts’ to reset TS=0) • When any task tries to execute any of the NPX instructions (to do some arithmetic with values in the floating-point registers), an exception 7 fault will occur if the TS-bit hasn’t been cleared since a task-switch

  7. The fault-7 exception-handler • The work involved in saving the contents of the floating-point registers being used by a no-longer-active task, and reloading those registers with values that the active task expects to work on, can be deferred to the fault-handler for exception-7 • Then it can clear the TS-bit (with ‘clts’) and ‘retry’ the instruction that caused this ‘fault’
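The lazy policy described above can be modelled in ordinary C. This is a user-space simulation under stated assumptions, not kernel code: `struct cpu`, `cpu_init`, `task_switch`, `fault7_handler` and `fpu_add` are hypothetical names, a boolean stands in for CR0.TS, and an array of doubles stands in for the NPX registers. A real exception-7 handler would use `clts` together with save/restore instructions such as `fnsave`/`frstor` or `fxsave`/`fxrstor`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NTASKS 4

/* Hypothetical stand-in for the NPX (floating-point) register file. */
struct fpu_state { double st[8]; };

struct cpu {
    bool ts;                        /* models CR0.TS (bit 3) */
    int current;                    /* task now running */
    int fpu_owner;                  /* task whose values sit in the NPX registers, or -1 */
    struct fpu_state npx;           /* the live floating-point registers */
    struct fpu_state saved[NTASKS]; /* per-task save areas */
    int fault7_count;               /* how often the exception-7 handler ran */
};

void cpu_init(struct cpu *c)
{
    memset(c, 0, sizeof *c);
    c->fpu_owner = -1;              /* nobody owns the NPX yet */
    c->ts = true;
}

/* Task-switch: set TS and move on; the NPX registers are NOT saved here. */
void task_switch(struct cpu *c, int next)
{
    assert(next >= 0 && next < NTASKS);
    c->current = next;
    c->ts = true;
}

/* Exception-7 handler: do the deferred save/reload, then 'clts'. */
void fault7_handler(struct cpu *c)
{
    c->fault7_count++;
    if (c->fpu_owner != c->current) {
        if (c->fpu_owner >= 0)
            c->saved[c->fpu_owner] = c->npx;   /* save the previous owner's values */
        c->npx = c->saved[c->current];         /* reload the active task's values */
        c->fpu_owner = c->current;
    }
    c->ts = false;                             /* 'clts' */
}

/* An NPX instruction: if TS=1 it faults first, then the instruction is retried. */
void fpu_add(struct cpu *c, int reg, double x)
{
    assert(reg >= 0 && reg < 8);
    if (c->ts)
        fault7_handler(c);
    c->npx.st[reg] += x;
}
```

In a run such as "switch to task 0, switch to task 1, switch back to task 0", task 1 never executes an NPX instruction, so no save or reload is ever done on its behalf; task 0's second fault finds that it still owns the registers and merely clears TS.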

  8. The ‘fork()’ system-call • In a UNIX/Linux operating system, the way any new task gets created is by a call to the kernel’s ‘fork()’ service-function • This function is supposed to ‘duplicate’ the entire program-environment of the calling task (i.e., its code, data, stack and heap, plus the kernel’s process-control data-structure) • But much of this work is often wasted!

  9. The ‘fork-and-exec’ scenario • In practice, the most common reason for a program to ‘fork()’ a child-process is so the child-task can launch a separate program:
        if ( fork() == 0 ) execv( "newprog", newargs );
  • In these cases the ‘duplicated’ code, data, and heap are not relevant to the new task, so they will simply get discarded!
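A minimal sketch of the fork-and-exec pattern, using the standard POSIX calls `fork`, `execv` and `waitpid`. The helper name `fork_and_exec` is hypothetical, and the usage assumes `/bin/sh` exists. Note that `execv` takes a NULL-terminated argument array, whereas `execl` takes the arguments as a NULL-terminated list.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run 'path' in a child process and return its exit
   status, or -1 on failure. The child needs nothing it inherited from the
   parent once execv() succeeds. */
int fork_and_exec(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork() failed */
    if (pid == 0) {
        execv(path, argv);              /* returns only if it failed */
        _exit(127);                     /* conventional "could not exec" status */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with the path `/bin/sh` and the array `{ "sh", "-c", "exit 3", NULL }` should return 3. The child calls `_exit` rather than `exit` if `execv` fails, so that it does not flush stdio buffers it inherited from the parent.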

  10. ‘loading-on-demand’ • An OS can avoid all the wasted effort of duplicating a parent-task’s resources (its code, data, heap, etc.) by implementing “only upon demand” loading as a policy • For an OS that uses the CPU’s memory-segmentation capabilities, an ‘on demand’ policy can be implemented by using the Pentium ‘Segment-Not-Present’ exception

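  11. How it works • Segments remain ‘uninitialized’ until they are actually accessed by an application • Segment-descriptors are initially marked as ‘Not Present’ (i.e., their P-bit is zero) • When any instruction attempts to load a segment-register with a selector for such a segment (for example, a far call or jump into it, or a ‘mov’ into DS or ES), the CPU responds by generating exception-11: “Segment-Not-Present” (if the register being loaded is SS, the CPU generates exception-12, “Stack-Segment Fault”, instead)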

  12. An ‘error-code’ is pushed • Besides pushing the memory-address of the faulting instruction onto the exception-handler’s stack, the CPU also pushes an ‘error-code’ to indicate which descriptor was not yet marked as being ‘Present’ • The handler can then ‘load’ that segment with the proper information and adjust its descriptor’s P-bit, then retry the instruction
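The fault-and-retry cycle can be sketched as a small simulation. Everything here is an assumption for illustration: `struct segment`, `struct table`, `segment_not_present` and `read_byte` are hypothetical names, a boolean stands in for each descriptor's P-bit, and a `memcpy` from an ‘image’ stands in for loading the segment. As a simplification the model checks the P-bit on every read, whereas the real CPU checks it when a selector is loaded into a segment-register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSEGS   4
#define SEGSIZE 16

/* Hypothetical model of one segment: its descriptor's P-bit and its memory. */
struct segment {
    bool present;                 /* stands in for the descriptor's P-bit */
    uint8_t data[SEGSIZE];
};

struct table {
    struct segment seg[NSEGS];
    const uint8_t *image[NSEGS];  /* where each segment's contents come from */
    int np_faults;                /* how many 'Segment-Not-Present' faults occurred */
};

/* Fault-11 handler: load the segment, then mark its descriptor Present. */
void segment_not_present(struct table *t, int index)
{
    assert(t->image[index] != NULL);
    t->np_faults++;
    memcpy(t->seg[index].data, t->image[index], SEGSIZE);
    t->seg[index].present = true;
}

/* A read access: a not-present segment faults first, then the read is retried. */
uint8_t read_byte(struct table *t, int index, int offset)
{
    assert(index >= 0 && index < NSEGS && offset >= 0 && offset < SEGSIZE);
    while (!t->seg[index].present)
        segment_not_present(t, index);
    return t->seg[index].data[offset];
}
```

A segment that is never read is never loaded, and a second read of an already-present segment does not fault again.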

  13. Error-Code Format
        bits 31..16   reserved
        bits 15..3    table-index
        bit  2        TI
        bit  1        IDT
        bit  0        EXT
      Legend:
        EXT = an external event caused the exception (1=yes, 0=no)
        IDT = table-index refers to the Interrupt Descriptor Table (1=yes, 0=no)
        TI  = the Table Indicator flag, used when IDT=0 (1=LDT, 0=GDT)
      This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D.
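Decoding this error-code is a matter of masking and shifting. The sketch below is a hypothetical helper set (`struct selector_error`, `decode_error_code`, `encode_error_code`, `error_table`); note that TI=1 selects the LDT and TI=0 the GDT, exactly as in a segment selector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of the selector error-code (exceptions 0x0B, 0x0C, 0x0D). */
struct selector_error {
    bool ext;         /* bit 0: an external event caused the exception */
    bool idt;         /* bit 1: table-index refers to the IDT */
    bool ti;          /* bit 2: used when idt is 0; 1 = LDT, 0 = GDT */
    uint16_t index;   /* bits 15..3: descriptor index (13 bits) */
};

struct selector_error decode_error_code(uint32_t code)
{
    struct selector_error e;
    e.ext = (code & 0x1) != 0;
    e.idt = (code & 0x2) != 0;
    e.ti  = (code & 0x4) != 0;
    e.index = (uint16_t)((code >> 3) & 0x1FFF);   /* bits 31..16 are ignored */
    return e;
}

uint32_t encode_error_code(struct selector_error e)
{
    assert(e.index < 0x2000);                      /* must fit in 13 bits */
    return ((uint32_t)e.index << 3) | (e.ti ? 4u : 0u)
         | (e.idt ? 2u : 0u) | (e.ext ? 1u : 0u);
}

/* Which descriptor table the index refers to. */
const char *error_table(struct selector_error e)
{
    if (e.idt)
        return "IDT";
    return e.ti ? "LDT" : "GDT";
}
```

For example, an error-code of 0x0018 decodes to GDT entry 3. This matches selector 0x18, except that the error-code's two low bits carry EXT and IDT rather than the selector's RPL.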

  14. Our ‘simulation’ demo • We can illustrate the ‘just-in-time’ idea by writing a program that performs a ‘far’ call to an ‘uninitialized’ region of memory:
        lcall $sel_CS, $draw_message
  • The code-segment descriptor (referenced here by the selector-value ‘sel_CS’) will be initially marked ‘Not-Present’ (so this ‘lcall’ instruction will trigger an exception-11)

  15. Our ‘fault-handler’ • Our Interrupt-Service-Routine for fault-11 will do two things: • Initialize the memory-region with code and data • Mark the code-segment’s descriptor as ‘Present’ • It will carefully preserve the CPU registers, so that it can ‘retry’ the faulting instruction

  16. Where is the ‘error-code’? Layout of our fault-handler’s stack (16-bit words, because we used a 286 interrupt-gate):
        +6   FLAGS
        +4   CS
        +2   IP
        +0   error-code    <-- SS:SP
  • The Pentium provides a special pair of instructions, ‘enter’ and ‘leave’, that procedures can use to set up (and later discard) a BP-based stack-frame, through which any parameter-values residing on the stack can be addressed
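The layout above can be captured as a C structure whose offsets are checked at compile time, together with a simulated stack showing the order in which the CPU pushes the four words. The names `fault_frame16`, `push16`, `deliver_fault16` and `read_frame16` are hypothetical, and the model assumes the case shown here, with no privilege-level change (so no SS:SP is pushed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four words a 286 (16-bit) interrupt-gate leaves on the stack for
   exception-0x0B, from the lowest address (SS:SP) upward. */
struct fault_frame16 {
    uint16_t error_code;   /* +0 */
    uint16_t ip;           /* +2 */
    uint16_t cs;           /* +4 */
    uint16_t flags;        /* +6 */
};

_Static_assert(offsetof(struct fault_frame16, ip) == 2, "IP at +2");
_Static_assert(offsetof(struct fault_frame16, cs) == 4, "CS at +4");
_Static_assert(offsetof(struct fault_frame16, flags) == 6, "FLAGS at +6");

/* Simulated 16-bit push onto a word array that grows downward. */
void push16(uint16_t *stack, size_t *sp, uint16_t value)
{
    assert(*sp > 0);
    stack[--*sp] = value;
}

/* The CPU's pushes, in order: FLAGS, CS, IP, then the error-code. */
void deliver_fault16(uint16_t *stack, size_t *sp, uint16_t flags,
                     uint16_t cs, uint16_t ip, uint16_t error_code)
{
    push16(stack, sp, flags);
    push16(stack, sp, cs);
    push16(stack, sp, ip);
    push16(stack, sp, error_code);
}

/* Read the frame that starts at word index 'sp' (byte offset 2 * sp). */
struct fault_frame16 read_frame16(const uint16_t *stack, size_t sp)
{
    struct fault_frame16 f = { stack[sp], stack[sp + 1], stack[sp + 2], stack[sp + 3] };
    return f;
}
```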

  17. Code using ‘enter’ and ‘leave’
        isrNPF:                         # our fault-handler for exception-0x0B
                enter   $0, $0          # set up stack-frame access
                call    initialize_the_high_arena
                call    mark_segment_as_ready
                leave                   # discard the stack-frame
                add     $2, %sp         # discard the error-code
                iret                    # ‘retry’ the faulting instruction

  18. What does ‘enter’ do? • The effect of the single instruction
        enter $0, $0
      is equivalent to this instruction-sequence:
        push %bp
        mov  %sp, %bp

  19. How the stack is changed
      Layout of our fault-handler’s stack BEFORE executing ‘enter’ (offsets from SP):
        +6   FLAGS
        +4   CS
        +2   IP
        +0   error-code    <-- SS:SP
      Layout of our fault-handler’s stack AFTER executing ‘enter’ (offsets from BP):
        +8   FLAGS
        +6   CS
        +4   IP
        +2   error-code
        +0   old-BP        <-- SS:SP and SS:BP
      NOTE: Any memory-references that use indirect addressing via register BP will use the SS segment-register by default (not the DS segment-register); for example, this instruction tests the error-code’s EXT, IDT and TI bits:
        testw $0x0007, 2(%bp)

  20. What does ‘leave’ do? • The effect of the single instruction
        leave
      is equivalent to this instruction-sequence:
        mov  %bp, %sp
        pop  %bp

  21. How the stack is changed
      Layout of our fault-handler’s stack BEFORE executing ‘leave’ (offsets from BP):
        +8   FLAGS
        +6   CS
        +4   IP
        +2   error-code
        +0   old-BP        <-- SS:BP
             … other pushed words
                           <-- SS:SP
      Layout of our fault-handler’s stack AFTER executing ‘leave’ (offsets from SP):
        +6   FLAGS
        +4   CS
        +2   IP
        +0   error-code    <-- SS:SP
      So the effect of ‘leave’ is to undo the effect of ‘enter’.
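To check the offsets in these before-and-after layouts, here is a toy 16-bit stack in C. It is a sketch under stated assumptions: `struct machine`, `push`, `pop`, `load16`, `do_enter` and `do_leave` are hypothetical names, memory is a 256-byte array, words are stored little-endian as on x86, and `do_enter` models only the `enter $0, $0` form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy machine: 256 bytes of stack memory plus the registers that
   'enter' and 'leave' touch. The stack grows toward lower addresses. */
struct machine {
    uint8_t mem[256];
    uint16_t sp, bp;
};

/* Read a little-endian 16-bit word, as x86 stores it. */
uint16_t load16(const struct machine *m, uint16_t addr)
{
    assert((size_t)addr + 1 < sizeof m->mem);
    return (uint16_t)(m->mem[addr] | (m->mem[addr + 1] << 8));
}

void push(struct machine *m, uint16_t v)
{
    assert(m->sp >= 2);
    m->sp -= 2;
    m->mem[m->sp] = (uint8_t)(v & 0xFF);
    m->mem[m->sp + 1] = (uint8_t)(v >> 8);
}

uint16_t pop(struct machine *m)
{
    uint16_t v = load16(m, m->sp);
    m->sp += 2;
    return v;
}

/* enter $0, $0  is equivalent to  push %bp ; mov %sp, %bp */
void do_enter(struct machine *m)
{
    push(m, m->bp);
    m->bp = m->sp;
}

/* leave  is equivalent to  mov %bp, %sp ; pop %bp */
void do_leave(struct machine *m)
{
    m->sp = m->bp;
    m->bp = pop(m);
}
```

Pushing FLAGS, CS, IP and an error-code, then calling `do_enter`, leaves the error-code at BP+2 and FLAGS at BP+8; after `do_leave`, SP once again points at the error-code and BP holds its old value, which is what `add $2, %sp` followed by `iret` relies on.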

  22. Our demo’s memory-layout (each address is the start of the region beside it)
        0x00030000   ARENA #3 (not used by this demo)
        0x00020000   ARENA #2 (where our demo expects the drawing code to reside)
        0x00010000   ARENA #1 (where the loader puts our program code and data)
        0x00007C00   BOOT_LOCN
        0x00000000
      The contents of ARENA #1 are copied to ARENA #2.

  23. Efficient copying • We use the Pentium’s ‘rep movsw’ instruction to perform memory-to-memory copying operations: it copies CX words from DS:SI to ES:DI • The segment-selector for the segment we copy from (it must be ‘readable’) goes into register DS, and the segment-selector for the segment we copy to (it must be ‘writable’) goes into register ES • The number of words we will copy should match the size of our code-segment (64KB, i.e., 0x8000 words) • The Direction-Flag should be cleared (DF=0), so that SI and DI advance after each word

  24. Example assembly code
        cld                     # use ‘forward’ string-copying
        mov     $sel_ds, %si    # selector for arena at 0x10000
        mov     %si, %ds        # goes in segment-register DS
        xor     %si, %si        # start copying from offset zero
        mov     $sel_DS, %di    # selector for arena at 0x20000
        mov     %di, %es        # goes in segment-register ES
        xor     %di, %di        # start copying to offset zero
        mov     $0x8000, %cx    # number of words to be copied (64KB)
        rep movsw               # perform the arena-copying
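The arithmetic behind ‘0x8000 words’ can be checked with a software model of `rep movsw`. This is an illustrative sketch, not the CPU's definition: `struct string_regs` and `rep_movsw` are hypothetical names, each segment is modelled as a 64KB byte array, and segment limits and access rights are not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The registers 'rep movsw' uses in 16-bit mode. */
struct string_regs {
    uint16_t si, di, cx;
    int df;                /* Direction-Flag: 0 = forward, 1 = backward */
};

/* Copy CX words from DS:SI to ES:DI. Each 'segment' is modelled as a
   64KB (0x10000-byte) array; SI and DI wrap at 64KB, as 16-bit registers
   do. Limits and access rights are NOT checked in this model. */
void rep_movsw(uint8_t *es_seg, const uint8_t *ds_seg, struct string_regs *r)
{
    assert(es_seg != NULL && ds_seg != NULL && r != NULL);
    uint16_t step = r->df ? (uint16_t)-2 : 2;   /* 0xFFFE acts as -2 modulo 64K */
    while (r->cx != 0) {
        es_seg[r->di] = ds_seg[r->si];
        es_seg[(uint16_t)(r->di + 1)] = ds_seg[(uint16_t)(r->si + 1)];
        r->si = (uint16_t)(r->si + step);
        r->di = (uint16_t)(r->di + step);
        r->cx--;
    }
}
```

With CX=0x8000 and DF=0, exactly 0x10000 bytes (64KB) are copied, and SI and DI each advance by 2 * 0x8000 = 0x10000, which wraps a 16-bit register back to zero.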

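  25. Segment-Descriptor Format
      Upper doubleword (bits 63..32):
        63..56   Base[31..24]
        55       G
        54       D
        53       RSV
        52       AVL
        51..48   Limit[19..16]
        47       P
        46..45   DPL
        44       S
        43       X
        42       C/D
        41       R/W
        40       A
        39..32   Base[23..16]
      Lower doubleword (bits 31..0):
        31..16   Base[15..0]
        15..0    Limit[15..0]
      The segment-descriptor’s ‘Present’ bit is bit-number 47.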
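Because the descriptor's base and limit are split into pieces, a few helper functions make the layout concrete. A minimal sketch; the names `DESC_P`, `desc_present`, `desc_mark_present`, `desc_base`, `desc_limit` and `make_desc` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_P (1ULL << 47)                /* the 'Present' bit */

bool desc_present(uint64_t d)          { return (d & DESC_P) != 0; }
uint64_t desc_mark_present(uint64_t d) { return d | DESC_P; }

/* Base[15..0] and Base[23..16] are adjacent (bits 16..39); Base[31..24] is bits 56..63. */
uint32_t desc_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFF) | (((d >> 56) & 0xFF) << 24));
}

/* Limit[15..0] is bits 0..15; Limit[19..16] is bits 48..51. */
uint32_t desc_limit(uint64_t d)
{
    return (uint32_t)((d & 0xFFFF) | ((d >> 32) & 0xF0000));
}

/* 'access' is bits 47..40 (P, DPL, S, X, C/D, R/W, A);
   'flags' is bits 55..52 (G, D, RSV, AVL). */
uint64_t make_desc(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF && flags <= 0xF);
    return  (uint64_t)(limit & 0xFFFF)
          | ((uint64_t)(base & 0xFFFFFF) << 16)
          | ((uint64_t)access << 40)
          | ((uint64_t)(limit >> 16) << 48)
          | ((uint64_t)flags << 52)
          | ((uint64_t)(base >> 24) << 56);
}
```

For instance, a descriptor for the 64KB arena at 0x20000 built with access byte 0x1A (P=0, DPL=0, S=1, execute/read code) starts out ‘Not Present’; setting bit 47 turns the access byte into 0x9A, the kind of change a fault-handler makes when it marks a segment as ready.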

  26. In-class exercise • To get some practical ‘hands on’ experience with implementing the demand-loading concept we suggest the following exercise: Modify our ‘notready.s’ demo so that it uses a 32-bit Interrupt-Gate for its Segment-Not-Present entry in the Interrupt Descriptor Table (this will affect the layout of the fault-handler’s stack) • You may need to abandon use of the ‘enter’ and ‘leave’ instructions unless you also use a 32-bit data-segment descriptor for your stack-segment
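As a starting point for the exercise, the sketch below records the frame layout assumed for a 32-bit interrupt-gate when no privilege-level change occurs: each item is a doubleword, so the error-code occupies 4 bytes instead of 2. The structure and function names (`fault_frame32`, `frame32_cs`, `frame32_index`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed frame for exception-0x0B delivered through a 32-bit interrupt-gate
   with no privilege-level change: four doublewords. */
struct fault_frame32 {
    uint32_t error_code;   /* +0  (bits 31..16 reserved) */
    uint32_t eip;          /* +4 */
    uint32_t cs;           /* +8  (only bits 15..0 hold the selector) */
    uint32_t eflags;       /* +12 */
};

_Static_assert(offsetof(struct fault_frame32, eip) == 4, "EIP at +4");
_Static_assert(offsetof(struct fault_frame32, cs) == 8, "CS at +8");
_Static_assert(offsetof(struct fault_frame32, eflags) == 12, "EFLAGS at +12");

/* The upper half of the pushed CS doubleword is not guaranteed to be zero. */
uint16_t frame32_cs(const struct fault_frame32 *f)
{
    assert(f != NULL);
    return (uint16_t)(f->cs & 0xFFFF);
}

/* Descriptor index named by the error-code (bits 15..3). */
uint16_t frame32_index(const struct fault_frame32 *f)
{
    assert(f != NULL);
    return (uint16_t)((f->error_code >> 3) & 0x1FFF);
}
```

With this layout the handler must discard a 4-byte error-code, rather than a 2-byte one, before returning.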

More Related