
Embedded Processor Architecture and Programming

Embedded Processor Architecture and Programming. 王建民, Institute of Information Science, Academia Sinica, July 2008. Contents: Introduction; Computer Architecture; ARM Architecture; Development Tools; GNU Development Tools; ARM Instruction Set; ARM Assembly Language; ARM Assembly Programming; GNU ARM ToolChain; Interrupts and Monitor.



  1. Embedded Processor Architecture and Programming 王建民, Institute of Information Science, Academia Sinica, July 2008

  2. Contents • Introduction • Computer Architecture • ARM Architecture • Development Tools • GNU Development Tools • ARM Instruction Set • ARM Assembly Language • ARM Assembly Programming • GNU ARM ToolChain • Interrupts and Monitor

  3. Lecture 2: Computer Architecture

  4. Outline • Basic Concepts • Instruction Set Architecture • Machine Organization

  5. What is “Computer Architecture”? A layered stack from application software down to hardware devices: Application (software, where programming happens) → Operating System → Instruction Set Architecture → Processor / Memory / I/O System → Circuits → Devices (hardware)

  6. What is “Computer Architecture”? • Instruction Set Architecture (ISA) • Interface between hardware and software • The true language of a machine • The hardware’s specification; defines what a machine does • Computer Organization • The guts of the machine; how the hardware works • The implementation; must obey the ISA abstraction

  7. Machine Organization • Computer = Processor (CPU, active) + Memory (passive: where programs and data live when running) + Devices • Processor = Control (“brain”) + Datapath (“brawn”) • Devices = Input (keyboard, mouse), Output (display, printer), Disk (where programs and data live when not running)

  8. Stored Program Computer • 1946: the first general-purpose electronic computer, ENIAC, at the Moore School, University of Pennsylvania (about 18,000 vacuum tubes) • Stored-Program Concept – storing programs as numbers – due to John von Neumann; Eckert and Mauchly worked on engineering the concept • Idea: A program is written as a sequence of instructions, represented by binary numbers. The instructions are stored in memory just as data; they are read one by one, decoded, and then executed by the CPU.

  9. Execution Cycle • Instruction Fetch: obtain the instruction from program storage • Instruction Decode: determine the required actions and instruction size • Operand Fetch: locate and obtain the operand data • Execute: compute the result value or status • Result Store: deposit results in storage for later use • Next Instruction: determine the successor instruction

  10. The Instruction Set • The instruction set is the layer between software and hardware • It is the actual programmer-visible instruction set

  11. Instruction-Set Processor Design (1) • Architecture (ISA): the programmer/compiler view • “Functional appearance to its immediate user/system programmer” • Opcodes, addressing modes, architected registers, IEEE floating point

  12. Instruction-Set Processor Design (2) • Implementation (µarchitecture): the processor designer view • “Logical structure or organization that performs the architecture” • Pipelining, functional units, caches, physical registers

  13. Instruction-Set Processor Design (3) • Realization (chip): the chip/system designer view • “Physical structure that embodies the implementation” • Gates, cells, transistors, wires

  14. Outline • Basic Concepts • Instruction Set Architecture • Machine Organization

  15. Levels of Abstraction • High-Level Language Program (e.g., C): temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; • ↓ Compiler • Assembly Language Program (e.g., MIPS): lw $15, 0($2); lw $16, 4($2); sw $16, 0($2); sw $15, 4($2) • ↓ Assembler • Machine Language Program (MIPS): 32-bit binary words such as 0000 1001 1100 0110 1010 1111 0101 1000 … • ↓ Machine Interpretation • Datapath Transfer Specification: IR ← Imem[PC]; PC ← PC + 4; …

  16. Recall the C Language • Operators: +, -, *, /, % • Operands: variables and constants • Assignment statement: variable = expression • Expressions consist of operators operating on operands

  17. When Translating to Assembly • Statement: a = b + 5; • load $r1, M[b] (operand from memory) • load $r2, 5 (constant operand into a register) • add $r3, $r1, $r2 (the operator) • store $r3, M[a] (result back to memory)

  18. Components of an ISA • Organization of programmable storage • Registers • Memory • Addressing modes • Data Types • Encoding and representation • Instruction Format • How are instructions specified? • Instruction Set • What operations can be performed?

  19. Basic ISA Classes (1) • Accumulator (only one register) • 1 address: Add A ; acc ← acc + mem[A] • Stack • 0 address: Add ; tos ← tos + next • General Purpose Register • 2 address: Add A, B ; EA(A) ← EA(A) + EA(B) • 3 address: Add A, B, C ; EA(A) ← EA(B) + EA(C)

  20. Basic ISA Classes (2) • Load/Store • Only load/store instructions can access memory: Load Ra, Rb ; Ra ← mem[Rb] Store Ra, Rb ; mem[Rb] ← Ra • Memory to Memory • All operands and destinations can be memory addresses: Add A, B, C ; mem[A] ← mem[B] + mem[C]

  21. Comparison of Four ISA Classes • Code sequence for C = A + B:
      Stack       Accumulator   Register (reg-mem)   Register (load-store)
      Push A      Load A        Load R1,A            Load R1,A
      Push B      Add B         Add R1,B             Load R2,B
      Add         Store C       Store R1,C           Add R3,R1,R2
      Pop C                                          Store R3,C
  • Comparison: bytes per instruction? number of instructions? cycles per instruction?

  22. CISC vs. RISC • CISC (Complex Instruction Set Computer) • May have memory-memory instructions • Variable instruction length • Relatively fewer registers • Complex addressing modes • RISC (Reduced Instruction Set Computer) • Have only load-store instructions • Uniform instruction format • Identical general-purpose registers • Simple addressing modes

  23. General Purpose Registers Dominate • Advantages of registers • Registers are faster than memory • Registers are easier for a compiler to use • E.g., as a place for temporary storage • Registers can hold variables • Memory traffic is reduced (values kept in registers need not be re-fetched from memory) • Code density is improved (a register is named with fewer bits than a memory location)

  24. MIPS Registers as an Example • 32 registers, each 32 bits wide • A group of 32 bits is called a word in MIPS • Registers are numbered from 0 to 31 • Each can be referred to by number or name • Number references: $0, $1, $2, … $30, $31 • By convention, each register also has a name to make code easier to read, e.g., $16 - $23 are $s0 - $s7 (C variables), $8 - $15 are $t0 - $t7 (temporaries) • 32 x 32-bit FP registers (paired for DP) • Others: HI, LO, PC

  25. Memory Addressing • Since 1980 almost every machine addresses memory down to the level of 8 bits (a byte) • Two questions for the design of an ISA: • Is a 32-bit word read as four byte loads from sequential byte addresses or as one word load from a single byte address? • Can a word be placed on any byte boundary?

  26. Memory Organization • Memory is viewed as a large one-dimensional array, with an address • A memory address is an index into the array • “Byte addressing” means that the index points to a byte of memory: address 0 holds 8 bits of data, address 1 the next 8 bits, and so on

  27. Word Addressing • Every word in memory has an address (the “address” of the word), similar to an index into an array • Early computers numbered words the way C numbers elements of an array: Memory[0], Memory[1], Memory[2], … • Today machines address memory as bytes, so word addresses differ by 4: Memory[0], Memory[4], Memory[8], … • Computers need to access 8-bit bytes as well as words (4 bytes/word)

  28. Alignment • An ISA may require that all words start at addresses that are multiples of 4 bytes (called alignment) • E.g., a word starting at byte address 0 is aligned; one starting at byte address 1, 2, or 3 is not

  29. Endianness • Big Endian: address of the most significant byte = word address (xx00 = big end of word) • IBM 360/370, Motorola 68k, MIPS, SPARC, HP PA • Little Endian: address of the least significant byte = word address (xx00 = little end of word) • Intel 80x86, DEC VAX, DEC Alpha (Windows NT) • Within a word, little endian stores bytes 3 2 1 0 (msb to lsb), so byte 0 is the lsb; big endian stores bytes 0 1 2 3, so byte 0 is the msb

  30. Addressing Modes
      Mode                Example              Meaning
      Register            Add R4,R3            R4 ← R4 + R3
      Immediate           Add R4,#3            R4 ← R4 + 3
      Displacement        Add R4,100(R1)       R4 ← R4 + mem[100+R1]
      Register indirect   Add R4,(R1)          R4 ← R4 + mem[R1]
      Indexed/Base        Add R4,(R1+R2)       R4 ← R4 + mem[R1+R2]
      Direct or absolute  Add R4,(1000)        R4 ← R4 + mem[1000]
      Memory indirect     Add R4,@(R3)         R4 ← R4 + mem[mem[R3]]
      Auto-increment      Add R1,(R2)+         R1 ← R1 + mem[R2]; R2 ← R2 + d
      Auto-decrement      Add R1,-(R2)         R2 ← R2 - d; R1 ← R1 + mem[R2]
      Scaled              Add R4,100(R1)[R2]   R4 ← R4 + mem[100 + R1 + R2*d]

  31. Addressing Mode Usage • 3 programs measured on machine with all address modes (VAX) • Displacement: 42% avg, 32% to 55% • Immediate: 33% avg, 17% to 43% • Register deferred (indirect): 13% avg, 3% to 24% • Scaled: 7% avg, 0% to 16% • Memory indirect: 3% avg, 1% to 6% • Misc: 2% avg, 0% to 3% • 88% displacement, immediate & register indirect • Immediate Size: • 50% to 60% fit within 8 bits • 75% to 80% fit within 16 bits

  32. Instruction Formats (1) • Three encoding styles: Variable (instructions take differing numbers of bytes), Fixed (every instruction is the same length), and Hybrid (a few fixed lengths)

  33. Instruction Formats (2) • If code size is most important, use variable-length instructions: • Difficult control design to compute the next instruction address • Complex operations, so use microprogramming • Slow due to several memory accesses • If performance is most important, use fixed-length instructions: • Simple to decode, so use hardware • Wastes code space because of simple operations • Works well with pipelining • Recent embedded machines have added an optional mode to execute a subset of 16-bit-wide instructions

  34. Typical Operations • Data Movement: register-register movement, memory-memory movement, load/store, in/out, push/pop • Arithmetic: integer or floating-point add, subtract, multiply, divide • Shift: shift left/right, rotate left/right • Logic: not, and, or, xor, set, clear • Control (Jump/Branch): unconditional, conditional • Subroutine Linkage: call, return • Interrupt: trap, return • Synchronization: test&set (atomic read-modify-write) • String: search, translate • Graphics (MMX): parallel subword ops (e.g., four 16-bit adds)

  35. Top 10 80x86 Instructions

  36. Summary • While in theory we can talk about complicated addressing modes and instructions, the ones actually used in programs are the simple ones: the RISC philosophy

  37. MIPS Instruction Set Design (1) • Use general-purpose registers with a load-store architecture: YES • Provide at least 16 general-purpose registers plus separate floating-point registers: 31 GPRs ($0 is hardwired to zero) & 32 FPRs • Support basic addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES, 16 bits for immediate and displacement • All addressing modes apply to all data transfer instructions: YES

  38. MIPS Instruction Set Design (2) • Use fixed instruction encoding if interested in performance and variable instruction encoding if interested in code size: Fixed • Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating-point numbers: YES • Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES, 16b • Aim for a minimalist instruction set: YES

  39. MIPS ISA as an Example • Registers: $r0 - $r31, PC, HI, LO • Instruction Categories: load/store, computational, jump and branch, floating point, memory management, special • 3 instruction formats, all 32 bits wide: • R-type: OP | $rs | $rt | $rd | sa | funct • I-type: OP | $rs | $rt | immediate • J-type: OP | jump target

  40. Outline • Basic Concepts • Instruction Set Architecture • Machine Organization

  41. Machine Organization • Computer = Processor (CPU, active) + Memory (passive: where programs and data live when running) + Devices • Processor = Control (“brain”) + Datapath (“brawn”) • Devices = Input (keyboard, mouse), Output (display, printer), Disk (where programs and data live when not running)

  42. Semiconductor Memory, DRAM • Semiconductor memory began to be competitive in the early 1970s • Intel was formed to exploit the market for semiconductor memory • The first commercial DRAM was the Intel 1103 • 1 Kbit of storage on a single chip • Charge on a capacitor used to hold each value • Semiconductor memory quickly replaced core memory in the ’70s

  43. DRAM Architecture • Bits are stored in 2-dimensional arrays on chip: a row address decoder takes N row-address bits and drives one of the word lines (Row 1 … Row 2^N), each memory cell at a row/column intersection stores one bit, and a column decoder & sense amplifiers take M column-address bits and select the data D from the bit lines (Col. 1 … Col. 2^M) • Modern chips have around 4 logical banks on each chip

  44. DRAM Operation • Row access (RAS) • decode row address, enable addressed row (often multiple Kb in row) • bitlines share charge with storage cell • small change in voltage detected by sense amplifiers which latch whole row of bits • sense amplifiers drive bitlines full rail to recharge storage cells • Column access (CAS) • decode column address to select small number of sense amplifier latches (4, 8, 16, or 32 bits depending on DRAM package) • on read, send latched bits out to chip pins • on write, change sense amplifier latches which then charge storage cells to required value • can perform multiple column accesses on same row without another row access (burst mode) • Precharge • charges bit lines to known value, required before next row access

  45. Processor-DRAM Performance Gap • The processor-DRAM performance gap grows about 50% per year: processor performance improves at roughly 60%/yr (2X every 1.5 years) while DRAM improves at roughly 5%/yr (2X every 15 years) • (The slide plots relative performance of CPU vs. DRAM on a log scale over 1980-2000)

  46. Memory Hierarchy • Fact: Large memories are slow, fast memories are small • How do we create a memory that is large, cheap and fast (most of the time)? • Hierarchy of Levels • Uses smaller and faster memory technologies close to the processor • Fast access time in highest level of hierarchy • Cheap, slow memory furthest from processor • The aim of memory hierarchy design is to have access time close to the highest level and size equal to the lowest level

  47. Current Memory Hierarchy
      Level              Speed (ns)   Size (MB)   Cost ($/MB)   Technology
      Registers          1            0.0005      --            Regs
      L1 cache           2            0.1         $100          SRAM
      L2 cache           6            1-4         $30           SRAM
      Main memory        100          100-1000    $1            DRAM
      Secondary memory   10,000,000   100,000     $0.05         Disk

  48. Why Hierarchy Works: Natural Locality • The Principle of Locality: programs access a relatively small portion of the address space at any instant of time • Temporal Locality (locality in time): recently accessed data tend to be referenced again soon • Spatial Locality (locality in space): items near a recent reference tend to be referenced soon

  49. How is the hierarchy managed? • Registers ↔ Memory • By the compiler (or assembly language programmer) • Cache ↔ Main Memory • By hardware • Main Memory ↔ Disks • By a combination of hardware and the operating system (virtual memory) • By the programmer (files)

  50. Inside a Cache • The cache sits between the processor and main memory, passing addresses and data in both directions • Each cache line holds an address tag and a data block of several data bytes; the data are copies of main-memory locations (e.g., copies of locations 100, 101, …) • Example address tags from the figure: 100, 304, 6848
