1. CH3 CPUs
2. summary Input and output mechanisms.
Supervisor mode, exceptions, and traps.
Memory management and address translation.
Caches.
How architecture affects program performance.
How architecture affects program power consumption.
3. 3.1 Introduction
4. outline aspects of CPUs that do not directly relate to their instruction sets
interrupts and memory management
performance and power consumption
5. outline 3.2: study input and output mechanisms such as interrupts
3.3: several mechanisms designed to handle internal events
3.4: co-processors that provide optional support for parts of the instruction set
3.5: memory systems, memory management and caches
6. outline 3.6 looks at performance
3.7 considers power consumption
3.8 data compressor example
7. 3.2 Programming Input and Output
8. basics of I/O programming
basic characteristics of I/O devices
9. 3.2.1 Input and Output Devices
10. Structure of a typical I/O device Input and output devices usually have some analog or nonelectronic component
relationship between I/O device and CPU
Registers: interface between CPU and device's internals
CPU talks to the device by reading and writing the registers
11. Structure of a typical I/O device
12. Structure of a typical I/O device Data registers: hold data values, such as the data read or written by a disk.
Status registers: provide information about the device's operation
13. Ex1. 8251 UART 8251 UART (Universal Asynchronous Receiver/Transmitter): the original device used for serial communications
Data are transmitted as streams of characters
Every character starts with a start bit (a 0) and ends with a stop bit (a 1)
14. Ex1. 8251 UART baud rate: data bits are sent as high and low voltages at a uniform rate
CPU must set the UART's mode registers
- baud rate
- data bits: 5-8 bits
- parity bit: even, odd, or none
- stop bits: 1, 1.5, or 2 bits
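To make the mode settings concrete, here is a minimal sketch that builds the sequence of line levels a UART would send for one character. It assumes data bits go out least-significant bit first (the usual convention for asynchronous serial links); the function name `uart_frame` and its parameters are hypothetical, and 1.5 stop bits is not modeled because it is not a whole number of bit cells.

```c
#include <stddef.h>

/* Parity choices corresponding to the mode-register settings. */
enum parity { PARITY_NONE, PARITY_EVEN, PARITY_ODD };

/*
 * Hypothetical helper: fill bits[] with the line levels (0 or 1) for one
 * character and return the number of bit cells written.
 * data_bits: 5-8; stop_bits: 1 or 2. bits[] must hold at least 12 entries.
 * Assumes least-significant-bit-first transmission.
 */
size_t uart_frame(unsigned char ch, int data_bits, enum parity par,
                  int stop_bits, int bits[])
{
    size_t n = 0;
    int ones = 0;                     /* count of 1 data bits, for parity */

    bits[n++] = 0;                    /* start bit is always 0 */
    for (int i = 0; i < data_bits; i++) {
        int b = (ch >> i) & 1;        /* least-significant bit first */
        ones += b;
        bits[n++] = b;
    }
    if (par == PARITY_EVEN)
        bits[n++] = ones & 1;         /* makes the total number of 1s even */
    else if (par == PARITY_ODD)
        bits[n++] = !(ones & 1);      /* makes the total number of 1s odd */
    for (int i = 0; i < stop_bits; i++)
        bits[n++] = 1;                /* stop bits are 1 */
    return n;
}
```

With 8 data bits, even parity, and 1 stop bit, each character occupies 11 bit cells on the line, so only 8 of every 11 bits transmitted at the baud rate carry data.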
15. Ex1. 8251 UART 8-bit register: buffers characters between the UART and the CPU bus.
Transmitter Ready output: transmitter is ready to accept a data character
Transmitter Empty signal: goes high when the UART has no characters to send.
Receiver Ready: goes high when UART has a character ready to be read by CPU.
16. 3.2.2 Input and Output Primitives
17. programming support for input and output I/O instructions
- special instructions (Intel x86) for input and output
memory-mapped I/O
provides addresses for the registers in each I/O device
read and write instructions communicate with the devices
18. Ex1. Memory-Mapped I/O on ARM use the EQU pseudo-op to define a symbolic name for the memory location of our I/O device
DEV1 EQU 0x1000
19. Ex1. Memory-Mapped I/O on ARM read and write the device register:
LDR r1,=DEV1 ; set up device address
LDR r0,[r1] ; read DEV1
MOV r0,#8 ; set up value to write
STR r0,[r1] ; write 8 to device
20. Ex2. Memory-Mapped I/O on SHARC A memory-mapped I/O device must be assigned within the external memory space, which starts at 0x400000.
use a DM access to read and write the off-chip device register:
I0 = 0x400000;
M0 = 0;
R1 = DM(I0,M0);
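21. write I/O devices in C The traditional names for functions that read and write arbitrary memory locations are peek and poke
The peek function can be written in C as:
int peek(volatile char *location) {
return *location; /* read the byte at the given address */
}
#define DEV1 ((char *)0x1000) /* device register address as a pointer */
dev_status = peek(DEV1);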
22. write I/O devices in C The poke function can be implemented as:
void poke(volatile char *location, char newval) {
(*location) = newval; /* write newval to the given address */
}
write 8 to the status register
poke(DEV1,8);
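On real hardware these accesses should go through `volatile`-qualified pointers so the compiler cannot cache a register value or drop a write it considers redundant. The sketch below shows the same peek/poke pattern running on an ordinary machine; it assumes a simulated register block (an array named `fake_regs`, which is not a real device) in place of address 0x1000.

```c
/* Simulated device registers standing in for memory-mapped hardware.
   On a real system DEV1 would be the device's address, e.g. 0x1000. */
static volatile char fake_regs[4];

/* volatile: every read really goes to the "device". */
static char peek(volatile char *location)
{
    return *location;
}

/* volatile: every write really reaches the "device". */
static void poke(volatile char *location, char newval)
{
    *location = newval;
}

/* Hypothetical symbolic name for the simulated register. */
#define DEV1 (&fake_regs[0])
```

Replacing `&fake_regs[0]` with `(volatile char *)0x1000` gives the embedded form; the cast is needed because C does not implicitly convert an integer to a pointer.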
23. 3.2.3 Busy-Wait I/O
24. busy-wait I/O Devices are slower than the CPU and require many cycles to complete an operation.
CPU must wait for one operation to complete before starting the next one
polling: Asking an I/O device whether it is finished by reading its status register
25. Ex3-3 Busy-Wait I/O Programming write a sequence of characters to an output device
two registers: one for the character to be written and a status register
status register's value is 1 when the device is busy writing and 0 when the write transaction has completed
26. Ex3-3 Busy-Wait I/O Programming register addresses
#define OUT_CHAR ((char *)0x1000) /* output device character register */
#define OUT_STATUS ((char *)0x1001) /* output device status register */
27. Ex3-3 Busy-Wait I/O Programming sequence of characters is stored in a standard C string, which is terminated by a null (0) character
char *mystring = "Hello, world."; /* string to write */
char *current_char; /* pointer to current position in string */
28. Ex3-3 Busy-Wait I/O Programming current_char = mystring;
/* point to head of string */
while (*current_char != '\0') {
/* until null character */
poke(OUT_CHAR,*current_char);
/* send character to device */
while (peek(OUT_STATUS) != 0);
/* keep checking status */
current_char++; /* update character pointer */
}
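To see how much time busy-waiting burns, the sketch below runs the same loop against a simulated output device. The device model is an assumption for illustration: each character keeps the status at 1 (busy) for three status reads before it returns to 0. The function names are hypothetical.

```c
#include <stddef.h>

/* --- Simulated output device (not real hardware) --- */
static char sent[64];          /* characters the "device" has written */
static size_t nsent = 0;
static int busy_polls = 0;     /* status reads remaining before status returns to 0 */
static long total_polls = 0;   /* how many times the CPU read the status */

static void dev_write_char(char c)
{
    if (nsent < sizeof sent - 1)
        sent[nsent++] = c;
    busy_polls = 3;            /* assume each character takes 3 busy reads */
}

static int dev_read_status(void)
{
    total_polls++;
    if (busy_polls > 0) {
        busy_polls--;
        return 1;              /* 1 = still busy writing */
    }
    return 0;                  /* 0 = write completed */
}

/* --- Busy-wait loop with the same structure as the example --- */
static void busy_wait_write(const char *mystring)
{
    const char *current_char = mystring;   /* point to head of string */
    while (*current_char != '\0') {         /* until null character */
        dev_write_char(*current_char);      /* send character to device */
        while (dev_read_status() != 0)
            ;                               /* spin until the device is done */
        current_char++;
    }
}
```

Under this model, writing "Hello, world." takes 52 status reads, and only 13 of them find the device idle; the rest are cycles the CPU spends doing nothing useful.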
29. Ex3-4 Copy Characters from Input to Output Using Busy-Wait I/O repeatedly read a character from the input device and write it to the output device
define addresses for the device registers:
#define IN_DATA ((char *)0x1000)
#define IN_STATUS ((char *)0x1001)
#define OUT_DATA ((char *)0x1100)
#define OUT_STATUS ((char *)0x1101)
30. Ex3-4 Copy Characters from Input to Output Using Busy-Wait I/O The input device:
sets its status register to 1 when a new character has been read;
we must set the status register back to 0 after the character has been read, so the device is ready to read another character
When writing:
set the output status register to 1 to start writing, then wait for it to return to 0
31. while (TRUE) { /* perform operation forever */
/* read a character into achar */
while (peek(IN_STATUS) == 0); /* wait until ready */
achar = (char)peek(IN_DATA); /* read the character */
poke(IN_STATUS,0); /* reset status so the device can read the next character */
/* write achar */
poke(OUT_DATA,achar);
poke(OUT_STATUS,1); /* turn on device */
while (peek(OUT_STATUS) != 0); /* wait until done */
}
32. 3.2.4 Interrupts Busy-wait I/O is inefficient: the CPU does nothing but test the device status
CPU could work in parallel with the I/O transaction:
- computation
- control of other I/O devices.
33. interrupt interrupt mechanism allows devices to signal CPU and to force execution of a particular piece of code
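When an interrupt occurs, the program counter is set to point to an interrupt handler routine (also called a device driver), which does the device's work: writing the next data, reading data, and so on
When the handler finishes, the CPU returns to the program that was interrupted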
34. interrupt
35. interrupt interface between the CPU and I/O device includes the following signals:
I/O device asserts the interrupt request signal when it wants service
CPU asserts the interrupt acknowledge signal when it is ready to handle the I/O device's request
36. interrupt The interrupt handler operates much like a subroutine, except that it is not called by the executing program
The program that runs when no interrupt is being handled is often called the foreground program
when the interrupt handler finishes, it returns to the foreground program
37. ex3-5 Copy Characters from Input to Output with Basic Interrupts repeatedly read a character from an input device and write it to an output device
use a global variable achar for the input handler to pass the character to the foreground program
use a global Boolean variable, gotchar, to signal when a new character has been received
38. void input_handler() { /* get a character and put in global */
achar = peek(IN_DATA); /* get character */
gotchar = TRUE; /* signal to main program */
poke(IN_STATUS,0); /* reset status to initiate next transfer */
}
void output_handler() { /* react to character being sent */
/* don't have to do anything */
}
39. ex3-5 Copy Characters from Input to Output with Basic Interrupts
int main(void) {
while (TRUE) { /* read then write forever */
if (gotchar) { /* write a character */
poke(OUT_DATA,achar); /* put character in device */
poke(OUT_STATUS,1); /* set status to initiate write */
gotchar = FALSE; /* reset flag */
}
}
}
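Standard C has no portable way to take hardware interrupts, so this sketch substitutes the C library's `signal()` and `raise()` for the device's interrupt request line: `raise()` runs the handler before it returns, which lets the handler/foreground handoff through `achar` and `gotchar` be exercised on an ordinary machine. The simulated register `in_data` and the helpers `foreground_poll` and `device_delivers` are hypothetical. The flag is declared `volatile sig_atomic_t` so the foreground loop re-reads it every time.

```c
#include <signal.h>

/* Simulated input data register (an assumption for this sketch). */
static volatile char in_data;

/* Globals shared by the "interrupt handler" and the foreground program. */
static volatile char achar;
static volatile sig_atomic_t gotchar = 0;

/* Plays the role of input_handler(); SIGINT stands in for the
   device's interrupt request. */
static void input_handler(int sig)
{
    (void)sig;
    achar = in_data;        /* get character */
    gotchar = 1;            /* signal to main program */
}

/* Simulated output device: collects what the foreground "writes". */
static char out_buf[16];
static int out_len = 0;

/* One pass of the foreground loop body. */
static void foreground_poll(void)
{
    if (gotchar) {
        if (out_len < (int)sizeof out_buf)
            out_buf[out_len++] = achar;   /* "write" the character */
        gotchar = 0;                      /* reset flag */
    }
}

/* Simulate the device delivering one character via an "interrupt". */
static void device_delivers(char c)
{
    signal(SIGINT, input_handler);  /* reinstall: some systems reset it after delivery */
    in_data = c;
    raise(SIGINT);                  /* handler completes before raise returns */
}
```

If two characters arrive before the foreground program runs, the second overwrites `achar` and the first is lost. That is the limitation the buffered version in Ex3-6 addresses.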
40. Ex3-6 Copy Characters from Input to Output with Interrupt and Buffer performs reads and writes independently.
The read and write routines communicate through the following global variables:
character array io_buf: holds a queue of characters that have been read but not yet written.
integers buf_start and buf_end: point to the first and last characters read.
integer error: set to 0 whenever io_buf overflows
41. Ex3-6 Copy Characters from Input to Output with Interrupt and Buffer the buffer allows the input and output devices to run at different rates
queue io_buf acts as a wraparound buffer
add characters to the tail
take characters from the head
42. Ex3-6 Copy Characters from Input to Output with Interrupt and Buffer When head and tail are equal, the queue is empty
43. Ex3-6 Copy Characters from Input to Output with Interrupt and Buffer When the buffer is full, we leave one character in the buffer unused
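The wraparound rules above (empty when head equals tail, one slot left unused when full) can be sketched as follows. The buffer size and the names `buf_head`, `buf_tail`, `add_char`, and `remove_char` are hypothetical; overflow is reported by a return value rather than the `error` variable so the sketch stays self-contained.

```c
#include <stdbool.h>

#define BUF_SIZE 8   /* hypothetical size; usable capacity is BUF_SIZE - 1 */

static char io_buf[BUF_SIZE];
static int buf_head = 0;   /* index of the next character to remove */
static int buf_tail = 0;   /* index of the next free slot */

static bool buf_empty(void) { return buf_head == buf_tail; }

/* Full when advancing the tail would make it equal the head;
   this is why one slot always stays unused. */
static bool buf_full(void) { return (buf_tail + 1) % BUF_SIZE == buf_head; }

/* Input side: add at the tail. Returns false on overflow. */
static bool add_char(char c)
{
    if (buf_full())
        return false;
    io_buf[buf_tail] = c;
    buf_tail = (buf_tail + 1) % BUF_SIZE;   /* wrap around */
    return true;
}

/* Output side: take from the head. Caller must check buf_empty() first. */
static char remove_char(void)
{
    char c = io_buf[buf_head];
    buf_head = (buf_head + 1) % BUF_SIZE;   /* wrap around */
    return c;
}
```

In real interrupt code the input handler updates only `buf_tail` and the output side updates only `buf_head`; both indices should be `volatile` so the foreground program sees changes made by the handler.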
45. Debug interrupt The fact that an interrupt can occur at any time means that the same bug can manifest itself in different ways when the interrupt handler interrupts different segments of the foreground program
46. Ex3-7 Debugging Interrupt Code y = Ax + b:
for (i = 0; i < M; i++) {
y[i] = b[i];
for (j = 0; j < N; j++)
y[i] = y[i] + A[i][j]*x[j];
}
47. Ex3-7 Debugging Interrupt Code Assume read_handler has a bug that causes it to change the value of j
Any CPU register that is written by the interrupt handler must be saved before it is modified and restored before the handler exits
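The effect of a handler that corrupts `j` can be reproduced in a small simulation. The sketch assumes a global `reg_j` stands in for the CPU register the compiler assigned to `j`, and that the "interrupt" arrives exactly once at a chosen inner-loop step; the names `axpb`, `buggy_handler`, and `careful_handler` are hypothetical.

```c
#define M 2
#define N 3

/* Pretend this global is the CPU register holding the loop index j. */
static int reg_j;

/* Buggy handler: uses the register without saving it. */
static void buggy_handler(void)
{
    reg_j = 0;                 /* clobbers the foreground's j */
}

/* Correct handler: save the register, use it, restore it. */
static void careful_handler(void)
{
    int saved = reg_j;         /* save before modifying */
    reg_j = 0;                 /* the handler's own use of the register */
    reg_j = saved;             /* restore before returning */
}

/* y = Ax + b, with one "interrupt" at inner-loop step irq_step. */
static void axpb(int A[M][N], const int x[N], const int b[M], int y[M],
                 void (*handler)(void), int irq_step)
{
    int step = 0;
    for (int i = 0; i < M; i++) {
        y[i] = b[i];
        for (reg_j = 0; reg_j < N; reg_j++) {
            if (step++ == irq_step)
                handler();     /* the interrupt arrives here */
            y[i] = y[i] + A[i][reg_j] * x[reg_j];
        }
    }
}
```

With A = {{1,2,3},{4,5,6}}, x = {1,1,1}, and b = {0,0}, the careful handler gives y = {6, 15}. The buggy handler, interrupting at step 2, yields y[0] = 9 because the first two terms are summed twice, while an interrupt at a step where j is already 0 would hide the bug entirely.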
48. implement The CPU implements interrupts by checking the interrupt request line at the beginning of execution of every instruction
If an interrupt request has been asserted, the CPU does not fetch the current instruction; instead it sets the PC to the start of the interrupt handler
The starting address of the interrupt handler is usually given as a pointer
49. interrupts and subroutines interrupt handler must return to the foreground program without disturbing the foreground program's operation
Most CPUs use the same basic mechanism for remembering the foreground program's PC as is used for subroutines
interrupt mechanism puts the return address on a stack
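The check-before-fetch and save-the-PC steps can be modeled with a toy CPU. The structure below is an assumption for illustration, not any real instruction set: `check_interrupt` runs before each fetch, pushes the foreground PC onto a stack, and jumps to the handler; `return_from_interrupt` pops it back.

```c
#include <stdbool.h>

#define STACK_DEPTH 8

/* A toy CPU model (hypothetical, not a real ISA). */
struct cpu {
    int pc;                   /* program counter */
    int stack[STACK_DEPTH];   /* return-address stack */
    int sp;                   /* number of entries on the stack */
    bool irq;                 /* interrupt request line */
    int handler_addr;         /* start address of the interrupt handler */
};

/* Called at the start of every instruction, before the fetch.
   Returns true if the CPU went to the handler instead of fetching at pc. */
static bool check_interrupt(struct cpu *c)
{
    if (!c->irq || c->sp == STACK_DEPTH)   /* a real CPU would fault on overflow */
        return false;
    c->stack[c->sp++] = c->pc;   /* remember the foreground PC */
    c->pc = c->handler_addr;     /* skip the fetch; go to the handler */
    c->irq = false;              /* request has been acknowledged */
    return true;
}

/* Return from interrupt: restore the saved PC so the foreground resumes. */
static void return_from_interrupt(struct cpu *c)
{
    if (c->sp > 0)
        c->pc = c->stack[--c->sp];
}
```

Because the saved PC goes on a stack, a second interrupt taken inside the handler simply pushes another return address, which is what makes nested interrupts possible.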
50. Priorities and Vectors interrupts can be generalized to handle multiple devices and to provide more flexible definitions
- interrupt priorities: allow the CPU to recognize some interrupts as more important than others
- interrupt vectors: allow the interrupting device to specify its handler
51. Prioritized interrupts Prioritized interrupts
- allow multiple devices to be connected
- allow the CPU to ignore less important interrupt requests
the lower-numbered interrupt lines are given higher priority
52. Prioritized device interrupts most CPUs use a set of acknowledge signals that give the priority number of the interrupt being acknowledged in binary form
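Choosing which pending request wins is the job of a priority encoder. This sketch assumes the requests arrive as a bitmask (bit k set means line k is asserted) and follows the convention above that lower-numbered lines have higher priority; the function name is hypothetical.

```c
#include <limits.h>

/* Hypothetical priority encoder: return the number of the highest-priority
   asserted line (the lowest-numbered set bit), or -1 if no line is asserted. */
static int highest_priority(unsigned requests)
{
    for (int line = 0; line < (int)(sizeof requests * CHAR_BIT); line++)
        if (requests & (1u << line))
            return line;
    return -1;
}
```

If lines 1 and 2 are both asserted (mask 0x6), line 1 wins, and its number would appear on the acknowledge signals as binary 001.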
53. change the priority How do we change the priority of a device?
Simply by connecting it to a different interrupt request line
This requires hardware modification,
so if priorities need to change, programmable switches can be provided to make the change easy
54. Nested interrupt Masking: CPU stores the priority level of the interrupt currently being handled in an internal register
When a subsequent interrupt occurs,
- its priority is checked against the priority register
- the new request is acknowledged only if it has higher priority
When the interrupt handler exits, the priority register must be reset.
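The masking rule can be sketched with a single priority register. The model is an assumption for illustration: lower numbers mean higher priority (as above), `INT_MAX` marks the foreground program, and the handler keeps the previous level so it can reset the register on exit. The function names are hypothetical.

```c
#include <stdbool.h>
#include <limits.h>

/* Priority register: INT_MAX means no interrupt is being handled. */
static int priority_reg = INT_MAX;

/* Accept a request only if it outranks the level being handled.
   On acceptance, the old level is stored in *saved for the exit path. */
static bool accept_interrupt(int prio, int *saved)
{
    if (prio >= priority_reg)
        return false;          /* equal or lower priority stays masked */
    *saved = priority_reg;
    priority_reg = prio;
    return true;
}

/* Handler exit: reset the priority register to the interrupted level. */
static void exit_interrupt(int saved)
{
    priority_reg = saved;
}
```

For example, while a level-3 handler runs, a level-5 request is held off but a level-1 request nests; when the level-1 handler exits the register returns to 3, and after the level-3 handler exits it returns to the foreground value.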
55. power-down interrupts The highest-priority interrupt is normally called the nonmaskable interrupt or NMI.
The NMI cannot be turned off
reserved for interrupts caused by power failures
it is triggered when the system detects a dangerously low power supply
the NMI handler saves critical state in nonvolatile memory and turns off I/O devices
56. Most CPUs provide a relatively small number of interrupt priority levels
more priority levels can be added with external logic
combine polling with prioritized interrupts to handle the devices efficiently
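When several devices share one interrupt line, the handler learns only that some device on the line needs service and must poll the devices' status to find out which. A minimal sketch, assuming three hypothetical devices whose request flags are plain booleans:

```c
#include <stdbool.h>

#define NDEV 3

/* Hypothetical request flags for devices sharing one interrupt line. */
static bool dev_requesting[NDEV];
static int serviced[NDEV];     /* how often each device was serviced */

static void service(int dev)
{
    serviced[dev]++;
    dev_requesting[dev] = false;   /* servicing clears the request */
}

/* Shared handler: poll every device on the line. Poll order acts as a
   second level of priority among devices on the same line. */
static void shared_line_handler(void)
{
    for (int d = 0; d < NDEV; d++)
        if (dev_requesting[d])
            service(d);
}
```

Hardware priority picks the line; within the line, the polling order decides which device is serviced first.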
57. Using polling to share an interrupt over several devices
58. Ex3-8 I/O with Prioritized Interrupts A has priority 1
B priority 2
C priority 3.
59. Interrupt vectors define the interrupt handler that should service a request from a device
hardware structure to support interrupt vectors
60. Interrupt vectors additional interrupt vector lines run from the devices to the CPU
After request is acknowledged, device sends its interrupt vector to CPU.
CPU uses the vector number as an index into a table stored in memory
gives the address of the handler
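In C, a vector table is naturally an array of function pointers indexed by the vector number the device sends. The sketch below is illustrative only: the table size, the handler names, and the fallback to a default handler for out-of-range vectors are assumptions.

```c
#define NUM_VECTORS 4

typedef void (*handler_t)(void);   /* a handler takes and returns nothing */

static int last_handled = -1;      /* records which handler ran (for the demo) */

static void uart_handler(void)    { last_handled = 0; }
static void timer_handler(void)   { last_handled = 1; }
static void default_handler(void) { last_handled = -2; }

/* Table in memory: vector number -> handler address (hypothetical entries). */
static handler_t vector_table[NUM_VECTORS] = {
    uart_handler, timer_handler, default_handler, default_handler
};

/* What the CPU does with the vector a device sends after acknowledge. */
static void dispatch(unsigned vector)
{
    if (vector < NUM_VECTORS)
        vector_table[vector]();    /* index the table, jump to the handler */
    else
        default_handler();         /* guard against an out-of-range vector */
}
```

Rewriting a table entry (for example `vector_table[0] = timer_handler;`) changes which routine services vector 0 without touching the device, so there is no fixed relationship between vector numbers and handlers.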
61. Activity on the bus during a vectored interrupt
62. Interrupt vectors Because the device, not the CPU, stores its vector number, a device can be given a new handler without modifying the system software.
there is no fixed relationship between vector numbers and interrupt handlers
63. implement Most modern CPUs implement both prioritized and vectored interrupts.
Priorities determine which device is serviced first
vectors determine what routine is used to service the interrupt
64. Interrupt Overhead complete interrupt handling process
Once a device requests an interrupt, some steps are performed by the CPU, some by the device, and others by software.
The basic procedure is described below.
1. CPU: checks for pending interrupts at the beginning of an instruction and acknowledges the highest-priority request
65. Interrupt Overhead 2. Device: device receives acknowledgment and sends the CPU its interrupt vector.
3. CPU: looks up the device handler address in the interrupt vector table and saves the current PC, internal CPU state, and possibly general-purpose registers.
66. Interrupt Overhead 4. Software: the device driver may save additional CPU state, performs the required operations, restores the saved state, and executes the interrupt return instruction.
5. CPU: the interrupt return instruction restores the PC and other automatically saved state and returns to the interrupted program.
67. performance penalty Because an interrupt causes a change in the program counter, it incurs a branch penalty. If the interrupt automatically stores CPU registers, that action requires extra cycles
interrupt requires extra cycles to acknowledge the interrupt and obtain the vector from the device.
68. performance penalty interrupt handler will save and restore CPU registers that were not automatically saved by the interrupt.
interrupt return instruction incurs a branch penalty as well as the time required to restore the automatically saved state.
69. performance penalty The time required for the hardware to respond to the interrupt and obtain the vector cannot be changed by the programmer.
careful programming can keep the number of registers used by an interrupt handler small, reducing save and restore time
coding the interrupt handler in assembly language rather than a high-level language gives tighter control over which registers are used
70. Interrupts in ARM two types of interrupts: fast interrupt requests (FIQs) and interrupt requests (IRQs).
FIQ takes priority over an IRQ.
interrupt table is kept in the bottom memory addresses, starting at location 0.
The entries in the table contain subroutine calls to the appropriate handler.
71. Interrupts in ARM responding to an interrupt:
saves the appropriate value of the PC to be used to return,
copies the CPSR into an SPSR (saved program status register),
forces bits in the CPSR to note the interrupt, and
forces the PC to the appropriate interrupt vector.
72. Interrupts in ARM leaving the interrupt handler :
restore the proper PC value,
restore the CPSR from the SPSR, and
clear interrupt disable flags.
73. Interrupts in ARM worst-case latency to respond:
2 cycles to synchronize external request,
up to 20 cycles to complete current instruction,
3 cycles for data abort
2 cycles to enter interrupt handling state.
total latency ranges from 4 clock cycles (2 + 2, best case) to 27 clock cycles (2 + 20 + 3 + 2, worst case)
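The latency bounds are just sums of the listed components, and converting them to time depends on the clock rate. The sketch below encodes the cycle counts from the list; the 100 MHz clock used in the check is a hypothetical value, not a property of any particular ARM part.

```c
/* ARM interrupt latency components, in clock cycles (from the list above). */
enum {
    SYNC_CYCLES       = 2,   /* synchronize the external request */
    MAX_INSTR_CYCLES  = 20,  /* finish the current instruction, worst case */
    DATA_ABORT_CYCLES = 3,   /* data abort, if one occurs */
    ENTRY_CYCLES      = 2    /* enter the interrupt handling state */
};

/* Best case: no instruction cycles to wait for and no data abort. */
static int min_latency(void) { return SYNC_CYCLES + ENTRY_CYCLES; }

/* Worst case: every component takes its maximum. */
static int max_latency(void)
{
    return SYNC_CYCLES + MAX_INSTR_CYCLES + DATA_ABORT_CYCLES + ENTRY_CYCLES;
}

/* Convert a cycle count to nanoseconds for a clock given in MHz. */
static double cycles_to_ns(int cycles, double clock_mhz)
{
    return cycles * 1000.0 / clock_mhz;
}
```

At an assumed 100 MHz clock, the worst case of 27 cycles is 270 ns.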
74. Interrupts in SHARC supports three prioritized, vectored, maskable interrupts,
each of which calls an interrupt handler subroutine
75. When processing an interrupt, the SHARC outputs the interrupt vector address;
pushes the current PC onto the PC stack;
may push the ASTAT and MODE1 registers onto the status stack;
sets the appropriate bit in the interrupt latch register; and
changes the interrupt mask pointer to show the current interrupt nesting state.
76. return from an interrupt When returning from an interrupt, the SHARC pops the return address off the PC stack and loads it into the PC,
pops the status stack if appropriate, and
clears the appropriate bits in the interrupt latch and mask registers.
77. Interrupts in SHARC The interrupt vector table may be kept either in internal or external memory.
vector table provides interrupt vectors for a number of actions, including:
reset, the three external interrupts,
internal DMA channels, timers,
floating-point errors,
user software interrupts.
78. Interrupts in SHARC For most instructions, the latency for an external interrupt is four cycles.
Some instructions require multiple cycles to finish and will delay interrupt handling;
waiting for external memory may also delay handling.