
Multiprocessor Initialization

An introduction to the use of Interprocessor Interrupts


Multiprocessor topology

[Diagram: CPU #0 and CPU #1, each with its own Local APIC; an IO APIC; a bridge; system memory; peripheral devices; the Front Side Bus and the Back Side Bus]


The Local-APIC ID register

Bits 31..24: APIC ID
Bits 23..0: reserved

This register is initially zero, but its APIC ID field (8 bits) is programmed by the BIOS during system startup with a unique processor identification number, which is subsequently used when specifying the processor as a recipient of inter-processor interrupts.

Memory-Mapped Register-Address: 0xFEE00020


The Local-APIC EOI register

Bits 31..0: write-only register

This write-only register is used by Interrupt Service Routines to issue an ‘End-Of-Interrupt’ command to the Local-APIC. Any value written to this register will be interpreted by the Local-APIC as an EOI command. The value stored in this register is initially zero (and it will remain unchanged).

Memory-Mapped Register-Address: 0xFEE000B0


The Spurious Interrupt register

Bits 31..9: reserved
Bit 8: EN, Local-APIC is Enabled (1=yes, 0=no)
Bits 7..0: spurious vector

This register is used to enable or disable the functioning of the Local-APIC and, when it is enabled, to specify the interrupt-vector number to be delivered to the processor in case the Local-APIC generates a ‘spurious’ interrupt. (In some processor models, the vector’s lowest 4 bits are hardwired to 1s.)

Memory-Mapped Register-Address: 0xFEE000F0
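
As a rough illustration of this layout, the Python sketch below composes and decodes a Spurious Interrupt register value, taking the enable flag to be bit 8 and the vector to be bits 7..0. The helper names and the sample vector 0xFF are assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical helpers modelling the Spurious Interrupt register:
# bit 8 is taken to be the enable flag, bits 7..0 the spurious vector.
SVR_ENABLE_BIT = 1 << 8


def make_svr(vector, enable=True):
    """Compose a register value from a vector number and the enable flag."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return (SVR_ENABLE_BIT if enable else 0) | vector


def decode_svr(value):
    """Split a register value into (enabled, vector)."""
    return bool(value & SVR_ENABLE_BIT), value & 0xFF


print(hex(make_svr(0xFF)))  # enable flag set, vector 0xFF
```

On real hardware the composed value would be stored at the memory-mapped address 0xFEE000F0; the sketch only models the bit arithmetic.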


Interrupt Command Register

  • Each Pentium’s Local-APIC has a 64-bit Interrupt Command Register

  • It can be programmed by system software to transmit messages (via the Back Side Bus) to one or several other processors

  • Each processor has a unique identification number in its Local-APIC ID register that can be used for directing messages to it


ICR (upper 32-bits)

Bits 31..24: Destination field
Bits 23..0: reserved

The Destination Field (8 bits) can be used to specify which processor (or group of processors) will receive the message

Memory-Mapped Register-Address: 0xFEE00310


ICR (lower 32-bits)

Bits 31..20: reserved

Bits 19..18: Destination Shorthand
    00 = no shorthand
    01 = only to self
    10 = all including self
    11 = all excluding self

Bits 17..16: reserved

Bit 15: Trigger Mode
    0 = Edge
    1 = Level

Bit 14: Level
    0 = De-assert
    1 = Assert

Bit 13: reserved

Bit 12: Delivery Status (R/O)
    0 = Idle
    1 = Pending

Bit 11: Destination Mode
    0 = Physical
    1 = Logical

Bits 10..8: Delivery Mode
    000 = Fixed
    001 = Lowest Priority
    010 = SMI
    011 = (reserved)
    100 = NMI
    101 = INIT
    110 = Start Up
    111 = (reserved)

Bits 7..0: Vector field

Memory-Mapped Register-Address: 0xFEE00300
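
To make the field layout concrete, here is a minimal Python sketch that decodes a value for the lower half of the ICR. The function name and the dictionary keys are hypothetical. The bit positions (vector in bits 7..0, delivery mode in 10..8, destination mode in bit 11, delivery status in bit 12, level in bit 14, trigger mode in bit 15, shorthand in bits 19..18) follow the layout that Intel documents for this register. The value 0x000C4500 is the one these slides use for the ‘INIT’ IPI.

```python
# Hypothetical decoder for the lower 32 bits of the Interrupt Command Register.
DELIVERY_MODES = {
    0b000: "Fixed", 0b001: "Lowest Priority", 0b010: "SMI",
    0b100: "NMI", 0b101: "INIT", 0b110: "Start Up",
}
SHORTHANDS = {
    0b00: "no shorthand", 0b01: "only to self",
    0b10: "all including self", 0b11: "all excluding self",
}


def decode_icr_low(value):
    """Return the named fields packed into an ICR low-dword value."""
    return {
        "vector": value & 0xFF,
        "delivery_mode": DELIVERY_MODES.get((value >> 8) & 0b111, "(reserved)"),
        "destination_mode": "Logical" if (value >> 11) & 1 else "Physical",
        "delivery_status": "Pending" if (value >> 12) & 1 else "Idle",
        "level": "Assert" if (value >> 14) & 1 else "De-assert",
        "trigger_mode": "Level" if (value >> 15) & 1 else "Edge",
        "shorthand": SHORTHANDS[(value >> 18) & 0b11],
    }


for name, field in decode_icr_low(0x000C4500).items():
    print(name, "=", field)
```

Decoding 0x000C4500 this way gives delivery mode INIT, level Assert, trigger mode Edge, and shorthand "all excluding self", which matches the broadcast described in the protocol.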


MP initialization protocol

  • Set shared processor-counter equal to 1

  • Step 1: issue an ‘INIT’ IPI to all-except-self

  • Delay for 10 milliseconds

  • Step 2: issue ‘Startup’ IPI to all-except-self

  • Delay for 200 microseconds

  • Step 3: issue ‘Startup’ IPI to all-except-self

  • Delay for 200 microseconds

  • Check the value of the processor-counter


Issue ‘INIT’ IPI

        # address Local-APIC via register FS
        mov   $sel_fs, %ax
        mov   %ax, %fs

        # broadcast ‘INIT’ IPI to ‘all-except-self’
        mov   $0x000C4500, %eax
        mov   %eax, %fs:(0xFEE00300)
.B0:    btl   $12, %fs:(0xFEE00300)     # Delivery Status still Pending?
        jc    .B0                       # yes, keep polling


Issue ‘Startup’ IPI

        # broadcast ‘Startup’ IPI to all-except-self
        # using vector 0x11 to specify entry-point
        # at real memory-address 0x00011000
        mov   $0x000C4611, %eax
        mov   %eax, %fs:(0xFEE00300)
.B1:    btl   $12, %fs:(0xFEE00300)
        jc    .B1
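
The relationship between the Startup vector and the entry-point can be sketched in a few lines of Python. The function names are hypothetical; the arithmetic reflects the rule that the 8-bit vector selects a 4K page, so vector 0x11 corresponds to address 0x00011000 (real-mode CS:IP of 0x1100:0x0000).

```python
# Hypothetical helpers relating a Startup IPI vector to its entry-point.
def sipi_entry_address(vector):
    """Physical address at which an AP begins executing for this vector."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return vector << 12          # the vector is a 4K page number


def sipi_real_mode_entry(vector):
    """The same entry-point expressed as a real-mode (CS, IP) pair."""
    return vector << 8, 0


def sipi_icr_low(vector):
    """ICR low dword for a Startup IPI broadcast to all-except-self."""
    return 0x000C4600 | (vector & 0xFF)


print(hex(sipi_entry_address(0x11)), hex(sipi_icr_low(0x11)))
```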


Timing delays

  • Intel’s MP Initialization Protocol specifies the use of some timing-delays:

    • 10 milliseconds ( = 10,000 microseconds)

    • 200 microseconds

  • We can use the 8254 Timer’s Channel 2 for implementing these timed delays, by programming it for ‘one-shot’ countdown mode, then polling bit #5 at i/o port 0x61


Mathematical examples

EXAMPLE 1
Delaying for 10 milliseconds means delaying for 1/100-th of a second
(because 100 times 10 milliseconds = one-thousand milliseconds)

EXAMPLE 2
Delaying for 200 microseconds means delaying for 1/5000-th of a second
(because 5000 times 200 microseconds = one-million microseconds)

GENERAL PRINCIPLE
Delaying for x microseconds means delaying for 1/(1000000/x)-th of a second
(because 1000000/x times x microseconds = one-million microseconds)


Mathematical theory

PROBLEM: Given the desired delay-time in microseconds, express the desired delay-time in clock-frequency pulses and program that number into the PIT’s Latch-Register

RECALL: Clock-Frequency = 1193182 Hertz (pulses per second)

ALSO: One second equals one-million microseconds

APPLYING DIMENSIONAL ANALYSIS

Pulses-Per-Microsecond = Pulses-Per-Second / Microseconds-Per-Second

Delay-in-Clock-Pulses = Delay-in-Microseconds * Pulses-Per-Microsecond

CONCLUSION

For a desired time-delay of x microseconds, the number of clock-pulses may be computed as x * (1193182 / 1000000) = 1193182 / (1000000 / x), since dividing by a fraction amounts to multiplying by that fraction’s reciprocal
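
A quick way to check the conclusion is to evaluate it for the two delays the protocol needs. The Python sketch below is only an illustration: the function name is hypothetical, and the 16-bit limit is an assumption based on the counter latch being 16 bits wide.

```python
PIT_HZ = 1193182        # input pulses per second
US_PER_SEC = 1000000    # microseconds per second


def delay_to_pulses(microseconds):
    """Whole number of clock pulses in the given delay (rounded down)."""
    pulses = microseconds * PIT_HZ // US_PER_SEC
    if pulses > 0xFFFF:
        raise ValueError("count does not fit in a 16-bit latch")
    return pulses


print(delay_to_pulses(10000))   # the 10-millisecond delay
print(delay_to_pulses(200))     # the 200-microsecond delay
```

The two delays come out as 11931 and 238 pulses, both comfortably within 16 bits.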


Delaying for EAX microseconds

        # We use the 8254 Timer/Counter Channel 2 to generate a
        # timed delay (expressed in microseconds by value in EAX)

        mov   %eax, %ecx        # copy delay-time to ECX
        mov   $1000000, %eax    # microseconds-per-sec
        xor   %edx, %edx        # extended to quadword
        div   %ecx              # perform dword division
        mov   %eax, %ecx        # copy quotient into ECX
        mov   $1193182, %eax    # input-pulses-per-sec
        xor   %edx, %edx        # extended to quadword
        div   %ecx              # perform dword division

        # now transfer the quotient from AX to the Channel 2 Latch
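
The two-division computation 1193182 / (1000000 / x) can be modelled in Python to see what count it produces and how that count would be split into bytes. This is a sketch under assumptions: the function names are hypothetical, and the low-byte-then-high-byte order is the commonly used access mode for the 8254, not something the slides state.

```python
def pulses_two_step(microseconds):
    """Mirror the two unsigned divisions: 1000000 / x, then 1193182 / that."""
    quotient = 1000000 // microseconds
    if quotient == 0:
        # On the CPU the second 'div' would fault with a divide error.
        raise ZeroDivisionError("delay longer than one second")
    return 1193182 // quotient


def latch_bytes(count):
    """Split a 16-bit count into (low byte, high byte)."""
    return count & 0xFF, (count >> 8) & 0xFF


count = pulses_two_step(10000)
print(count, [hex(b) for b in latch_bytes(count)])
```

Because each division truncates, the result can differ slightly from x * 1193182 / 1000000 when x does not divide one million evenly (for example, 600 microseconds gives 716 here versus 715 from the direct formula). For the 10000 and 200 microsecond delays the two methods agree.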


Mutual Exclusion

  • Shared variables must not be modified by more than one processor at a time (‘mutual exclusion’)

  • The Pentium’s ‘lock’ prefix helps enforce this

  • Example: every processor adds 1 to count

        lock
        incl  (count)

  • Example: all processors need private stacks

        mov   $0x1000, %ax      # amount to advance new_SS by
        lock
        xadd  %ax, new_SS       # AX = old new_SS; new_SS += 0x1000
        mov   %ax, %ss
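
The private-stack example relies on ‘xadd’ returning the old value of the shared word while adding to it in one indivisible step. The Python model below shows the effect; the class and function names, and the starting value 0x2000, are hypothetical. The model runs the CPUs one after another, which is exactly the ordering guarantee that the ‘lock’ prefix provides on real hardware.

```python
class SharedWord:
    """A 16-bit shared memory word with an xadd-style fetch-and-add."""

    def __init__(self, value):
        self.value = value & 0xFFFF

    def xadd(self, register):
        """Add 'register' to the word and return the word's previous value."""
        old = self.value
        self.value = (old + register) & 0xFFFF
        return old


def assign_stack_segments(first_ss, cpus, step=0x1000):
    """Give each CPU the value it would load into SS."""
    new_ss = SharedWord(first_ss)
    return [new_ss.xadd(step) for _ in range(cpus)]


print([hex(ss) for ss in assign_stack_segments(0x2000, 3)])
```

Each CPU receives a distinct value, so no two processors end up sharing a stack.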


ROM-BIOS isn’t ‘reentrant’

  • The video service-functions in ROM-BIOS that we use to display a message-string at the current cursor-location (and afterward advance the cursor) modify global storage locations (as well as i/o ports), and hence must be called by one processor at a time

  • A shared memory-variable (called ‘mutex’) is used to enforce this mutual exclusion


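Implementing a ‘spinlock’

mutex:  .word 1                 # bit 0 set means the lock is free

spin:   btw   $0, mutex         # test the bit without a locked access
        jnc   spin              # lock is held: keep spinning
        lock
        btrw  $0, mutex         # atomically clear the bit; CF = old value
        jnc   spin              # another CPU got there first: retry

        # <CRITICAL SECTION OF CODE GOES HERE>

        lock
        btsw  $0, mutex         # set the bit again to release the lock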
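
The logic of this spinlock can be traced with a small single-threaded Python model. It is only a sketch: the class and function names are hypothetical, and the atomicity that the ‘lock’ prefix gives to ‘btrw’ and ‘btsw’ is assumed rather than simulated.

```python
class Mutex:
    """Model of the 'mutex' word: bit 0 set means the lock is free."""

    def __init__(self):
        self.word = 1

    def btr(self):
        """Like 'lock btrw $0, mutex': clear bit 0, return its old value."""
        carry = self.word & 1
        self.word &= ~1
        return carry

    def bts(self):
        """Like 'lock btsw $0, mutex': set bit 0, return its old value."""
        carry = self.word & 1
        self.word |= 1
        return carry


def try_acquire(mutex):
    """One pass through the 'spin' loop; True if the lock was obtained."""
    if not mutex.word & 1:      # the plain 'btw' test
        return False
    return mutex.btr() == 1     # the locked test-and-reset


def release(mutex):
    mutex.bts()


m = Mutex()
print(try_acquire(m), try_acquire(m))   # second attempt finds the lock held
release(m)
print(try_acquire(m))                   # free again after release
```

A real CPU would repeat try_acquire until it returns True; the initial unlocked test keeps the spinning processors from issuing locked bus cycles while the lock is held.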


Demo: ‘smphello.s’

  • Each CPU needs to access its Local-APIC

  • The BSP (“Boot-Strap Processor”) wakes up other processors by broadcasting the ‘INIT-SIPI-SIPI’ message-sequence

  • Each AP (“Application Processor”) starts executing at a 4K page-boundary, and needs its own private stack-area

  • Shared variables need ‘exclusive’ access


In-class exercise

  • Include this procedure that multiple CPUs will execute simultaneously (without ‘lock’)

    total:  .word 0                 # the shared variable

    add_one_thousand:
            mov   $1000, %cx
    nxinc:  addw  $1, (total)
            loop  nxinc
            ret
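
What can go wrong without ‘lock’ is easiest to see by replaying one unlucky interleaving. The Python sketch below is a deliberately simplified model, not a hardware simulation: it treats each unlocked ‘addw’ as a separate read step and write step, and the function name and schedules are hypothetical.

```python
def simulate(schedule):
    """Replay an interleaving of unlocked increments.

    'schedule' lists CPU ids; a CPU's first appearance reads the shared
    total into a private copy, its next appearance writes copy + 1 back.
    """
    total = 0
    private = {}
    for cpu in schedule:
        if cpu not in private:
            private[cpu] = total             # read the shared word
        else:
            total = private.pop(cpu) + 1     # write back the incremented copy
    return total


print(simulate([0, 0, 1, 1]))   # one CPU finishes before the other starts
print(simulate([0, 1, 0, 1]))   # both CPUs read before either one writes
```

In the second schedule both CPUs read the same old value, so one of the two increments is lost. This is why the final total in the exercise may fall short of 1000 times the number of CPUs.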


We may need a ‘barrier’

  • We can use a software construct (known as a ‘barrier’) to stop CPUs from entering a block of code until a prescribed number of them are all ready to enter it together

    arrived: .word 0                # shared variable

    barrier: lock
             incw  (arrived)
    await:   cmpw  $2, (arrived)
             jb    await
             call  add_one_thousand
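
The same counting barrier can be tried out with ordinary threads standing in for CPUs. This Python sketch rests on several assumptions: a threading.Lock plays the role of the ‘lock’ prefix, the function and variable names are hypothetical, and time.sleep(0) is added only so that a spinning thread lets the others run.

```python
import threading
import time


def run_barrier(cpus):
    """Run 'cpus' threads through a counting barrier; return the event log."""
    arrived = 0
    events = []
    lock = threading.Lock()         # stands in for the 'lock' prefix

    def worker():
        nonlocal arrived
        with lock:
            events.append("arrive")
            arrived += 1            # like 'lock incw (arrived)'
        while arrived < cpus:       # like the 'await' loop
            time.sleep(0)           # let the other threads make progress
        events.append("enter")      # the guarded code would start here

    threads = [threading.Thread(target=worker) for _ in range(cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


print(run_barrier(2))
```

Every "arrive" entry is logged before any "enter" entry, which is the property the barrier is meant to guarantee.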

