
Microprocessor system architectures – IA-64

Microprocessor system architectures – IA-64. Jakub Yaghob. Application architecture. Application architecture features – I: instruction set architecture; load-execute-store architecture, no stack, no division; explicit parallelism.


Presentation Transcript


  1. Microprocessor system architectures – IA-64 Jakub Yaghob

  2. Application architecture

  3. Application architecture features – I • Instruction set • Architecture • Load-Execute-Store architecture, no stack, no division • Explicit parallelism • Massive resources (128 integer and 128 FP registers, 64 predicate registers, 8 branch registers) • Enhancements • Speculation, predication, software pipelining, branch prediction, multimedia instructions • Instruction level parallelism • Independent instructions in bundles • Multiple bundles per clock

  4. Application architecture features – II • Explicit parallelism • Instruction group • Defined by the compiler • Instructions in a group execute in parallel • Strict requirements on dependencies • Register RAW and WAW dependencies are forbidden within a group • Memory model • Relatively weak • The only restriction concerns RAW, WAW and WAR dependencies on a single memory location • Explicit memory access synchronization
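
The register dependency rule for instruction groups can be illustrated with a small checker. This is only a sketch: the `Instr` model and the function name are hypothetical, and the real architecture defines exceptions to the rule (slide 6 mentions some predicate-setting modes) that are ignored here.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction model: a label plus registers written and read."""
    name: str
    writes: frozenset
    reads: frozenset


def group_violations(group):
    """Return (kind, register, earlier, later) for each register RAW or WAW
    dependency found inside one instruction group."""
    violations = []
    written_by = {}  # register -> name of the earlier instruction that wrote it
    for ins in group:
        for reg in sorted(ins.reads):
            if reg in written_by:  # read after an earlier write in the same group
                violations.append(("RAW", reg, written_by[reg], ins.name))
        for reg in sorted(ins.writes):
            if reg in written_by:  # second write to the same register
                violations.append(("WAW", reg, written_by[reg], ins.name))
        for reg in ins.writes:
            written_by[reg] = ins.name
    return violations


ok_group = [
    Instr("add r1 = r2, r3", frozenset({"r1"}), frozenset({"r2", "r3"})),
    Instr("sub r4 = r2, r5", frozenset({"r4"}), frozenset({"r2", "r5"})),
]
bad_group = [
    Instr("add r1 = r2, r3", frozenset({"r1"}), frozenset({"r2", "r3"})),
    Instr("sub r4 = r1, r5", frozenset({"r4"}), frozenset({"r1", "r5"})),
]
```

In this model a compiler would have to end the group between the two instructions of `bad_group`, because the second one reads `r1` written by the first.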

  5. Speculation • Early memory load • Control speculation • Moving a load ahead of the condition that guards it • The load is sometimes executed “uselessly”, when the condition is not met • Data speculation • Moving a load ahead of a store that may alias with it • Checked using the ALAT (Advanced Load Address Table) • Speculation check • A speculative load that would cause an exception is not performed • Data speculation is invalid if there is an intervening write to the memory location
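
Data speculation can be pictured with a toy table that remembers which address each advanced load read from. The class below is an assumption-laden sketch, not the hardware design: `ToyALAT` and its methods are hypothetical names, and it matches whole addresses exactly, whereas a real ALAT works on partial addresses and access sizes.

```python
class ToyALAT:
    """Toy model: maps a target register to the address its advanced load used."""

    def __init__(self):
        self.entries = {}

    def advanced_load(self, reg, addr, memory):
        """Load early and remember the address for a later check."""
        self.entries[reg] = addr
        return memory.get(addr, 0)

    def store(self, addr, value, memory):
        """A store removes every entry that was loaded from the same address."""
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check(self, reg):
        """True if the speculatively loaded value is still valid."""
        return reg in self.entries


memory = {0x100: 7, 0x200: 9}
alat = ToyALAT()
value = alat.advanced_load("r8", 0x100, memory)  # load hoisted above the stores
alat.store(0x200, 1, memory)                     # different address: no conflict
valid_before = alat.check("r8")
alat.store(0x100, 42, memory)                    # same address: speculation fails
valid_after = alat.check("r8")
if not valid_after:
    value = memory[0x100]                        # recovery: repeat the load
```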

  6. Predication • Predicate registers • 64 1-bit predicate registers PR0-PR63 • PR0 hardwired to 1, writes are ignored • No specialized arithmetic/logic flags • Set by compare instructions • A pair of PRs (one for the comparison, one for the complementary comparison) • Modes of setting (some of them breach the WAW rule inside an instruction group) • Nearly all instructions are conditioned by a PR
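
How a compare feeds a pair of predicate registers, and how those then guard instructions, can be sketched as follows. All names (`PredicateFile`, `compare_eq`, `predicated`) are hypothetical, and only the simplest mode of setting is modelled.

```python
class PredicateFile:
    """64 one-bit predicate registers; PR0 always reads as 1."""

    def __init__(self):
        self._bits = [False] * 64
        self._bits[0] = True

    def read(self, n):
        return self._bits[n]

    def write(self, n, value):
        if n != 0:  # writes to PR0 are ignored
            self._bits[n] = bool(value)


def compare_eq(pr, p_true, p_false, a, b):
    """A compare sets two predicates: the result and its complement."""
    result = a == b
    pr.write(p_true, result)
    pr.write(p_false, not result)


def predicated(pr, qp, operation, old_value):
    """Run operation only if predicate qp is set; otherwise keep old_value."""
    return operation() if pr.read(qp) else old_value


# Branch-free version of: r = x + 1 if x == 0 else x - 1
pr = PredicateFile()
x = 5
compare_eq(pr, 1, 2, x, 0)
r = 0
r = predicated(pr, 1, lambda: x + 1, r)  # skipped, PR1 is 0
r = predicated(pr, 2, lambda: x - 1, r)  # executed, PR2 is 1
```

Both arms are issued, but only the one whose predicate is set changes `r`, so no branch is needed.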

  7. Register stack • Support for function calls • GR0-GR31 are global registers • GR32-GR127 form a register stack • Each procedure has a register frame • 2 variable-sized areas: local and output • Register renaming using the alloc instruction • On a call, the caller's first output register becomes GR32 of the callee • If the register stack overflows, the CPU frees some registers by saving them to memory
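
The renaming on a call can be sketched as a mapping from the register number a procedure uses to a slot in the stacked register file. This is a simplified model under stated assumptions: `Frame` and its fields are hypothetical names, only the 96 stacked registers are modelled, and spilling to memory is ignored.

```python
NUM_STACKED = 96  # GR32-GR127


class Frame:
    """Toy register frame: a base slot plus local and output area sizes."""

    def __init__(self, base, size_of_locals, size_of_outputs):
        self.base = base
        self.sol = size_of_locals
        self.soo = size_of_outputs

    def physical(self, gr):
        """Map a register number GR32.. of this frame to a physical slot."""
        index = gr - 32
        if not 0 <= index < self.sol + self.soo:
            raise ValueError(f"GR{gr} is outside the current frame")
        return (self.base + index) % NUM_STACKED

    def call(self):
        """Frame seen by the callee right after the call: it starts at the
        caller's first output register and covers only the caller's outputs."""
        return Frame((self.base + self.sol) % NUM_STACKED, 0, self.soo)


caller = Frame(base=0, size_of_locals=5, size_of_outputs=3)
callee = caller.call()
# The caller's first output register (GR37 here) is the callee's GR32.
same_slot = caller.physical(32 + 5) == callee.physical(32)
```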

  8. Privilege levels and serialization • Privilege levels • Like IA-32, levels 0-3 • System instructions and registers accessible only with CPL=0 • Serialization • Data dependency • All application and system resources excluding control registers • Values written to a register are observed by instructions in subsequent instruction groups • Instruction serialization • Modifications are observed before subsequent instruction group fetches are re-initiated • Data serialization • Modifications affecting both execution and data memory access are observed • In-flight • Non-serialized resources have “some” value for reads

  9. System registers

  10. Processor Status Register (PSR) • Current execution environment • Divided into four overlapped sections • Special instructions

  11. Control registers • 128 control registers • Large number of reserved, only 26 used • Groups • Global control registers • CR0 (DCR=Default Control Register) • CR2 (IVA=Interruption Vector Address) • CR8 (PTA=Page Table Address) • Global interrupt control registers • Control of an active interrupt • Writes are not serialized

  12. Banked general registers • Fast switching of GR16-GR31 for interrupt handlers • Current bank in PSR.bn • Bank switching • Interrupt selects bank 0 • rfi sets the bank from IPSR.bn • bsw switches to the specified bank • Including NaT

  13. Virtual memory model • Virtual regions • Supports OS with Multiple Address Spaces • Protection domain mechanism • Supports OS with Single Address Space • TLB • Algorithms for paging deferred to OS • VHPT (Virtual Hash Page Table) • Augmenting TLB performance • Inverted page tables • Other mechanisms • Various page sizes, fixed translations, …

  14. Address translation

  15. TLB • Separate for code and data • The data TLB also translates VHPT and RSE accesses • Each TLB is divided into two parts • Translation registers (TR) • Fully associative array • The OS sets translations explicitly • No automatic replacement • Translation cache (TC) • Entries can be inserted by an instruction • Automatic replacement (from VHPT)

  16. Access rights on pages • Defined by TLB.ar and TLB.pl • Using TLB.ar • Read only • Read, execute • Read, write • Read, write, execute • Read only/read, write • Read, execute/read, write, execute • Read, write, execute/read, write • Exec, promote/read, execute

  17. Virtual addressing – other – I • Page sizes • 4k, 8k, 16k, 64k, 256k, 1M, 4M, 16M, 64M, 256M, 4G • Region registers (RR) • The highest 3 bits of the VA form an index into the RRs • rid – region identifier • ps – preferred page size • ve – VHPT enable
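
Selecting a region register from a 64-bit virtual address is plain bit arithmetic. A minimal sketch follows; the function name is hypothetical.

```python
REGION_BITS = 3
OFFSET_BITS = 64 - REGION_BITS  # the remaining 61 bits address within a region


def split_virtual_address(va):
    """Return (region register index, offset within the region) for a 64-bit VA."""
    if not 0 <= va < 1 << 64:
        raise ValueError("virtual address must fit in 64 bits")
    region_index = va >> OFFSET_BITS           # highest 3 bits, 0..7
    offset = va & ((1 << OFFSET_BITS) - 1)     # lower 61 bits
    return region_index, offset


region, offset = split_virtual_address(0xE000000000001234)
```

Three index bits give eight region registers, each supplying its own rid, ps and ve for addresses in that region.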

  18. Virtual addressing – other – II • Protection keys • At least 16 keys • The key in a TLB entry is compared with the protection keys; a mismatch raises a “key miss fault” exception
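
The key comparison can be sketched as a membership test. This is a toy under stated assumptions: `KeyMissFault` and `check_key` are hypothetical names, and the key registers are reduced to a bare set of key values.

```python
class KeyMissFault(Exception):
    """Raised when no protection key register matches the TLB entry's key."""


def check_key(entry_key, key_registers):
    """Compare the key stored in a TLB entry with the loaded protection keys."""
    if entry_key not in key_registers:
        raise KeyMissFault(f"no protection key register holds key {entry_key:#x}")
    return True


loaded_keys = {0x2A, 0x10}  # hypothetical key values loaded by the OS
```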

  19. VHPT – I

  20. VHPT – II • Properties • The CPU writes nothing into the VHPT • The CPU does not maintain coherence between the TLB and the VHPT • Two formats • Short – one table per region, 8B entry • Long – one large table for the whole system, 32B entry • Various sizes, powers of 2 • Searched when the TLB lookup fails • If found in the VHPT, automatically inserted into the TC • Fixed hash functions
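
The lookup order (TLB first, then the VHPT, with a VHPT hit copied into the TC) can be sketched as below. It is a simplified model: `ToyMMU` and `toy_hash` are hypothetical, the hash is a placeholder rather than one of the architecture's fixed functions, and each entry stores its virtual page number so that a hit can be verified.

```python
def toy_hash(vpn, table_size):
    """Placeholder hash, NOT the architecture's fixed hash function."""
    return vpn % table_size


class ToyMMU:
    def __init__(self, vhpt_size=8):          # table size is a power of 2
        self.tr = {}                          # translation registers, set by the OS
        self.tc = {}                          # translation cache
        self.vhpt = [None] * vhpt_size        # written only by the OS

    def os_install_vhpt(self, vpn, pfn):
        """The OS, not the CPU, writes VHPT entries."""
        self.vhpt[toy_hash(vpn, len(self.vhpt))] = (vpn, pfn)

    def translate(self, vpn):
        """Return (pfn, source); source is 'tlb', 'vhpt' or 'fault'."""
        for tlb in (self.tr, self.tc):
            if vpn in tlb:
                return tlb[vpn], "tlb"
        entry = self.vhpt[toy_hash(vpn, len(self.vhpt))]
        if entry is not None and entry[0] == vpn:
            self.tc[vpn] = entry[1]           # hit is inserted into the TC
            return entry[1], "vhpt"
        return None, "fault"                  # left to the OS to resolve


mmu = ToyMMU()
mmu.os_install_vhpt(0x12, 0x99)
first = mmu.translate(0x12)    # TLB miss, found in the VHPT
second = mmu.translate(0x12)   # now served from the TC
```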

  21. Physical addressing and memory attributes • Only 63 bits • Current architecture and implementation only 50 bits • Memory attributes • Virtual – like IA-32 (WB, WC, …) • Physical – using bit 63 of the PA • 0 – WB, speculative • 1 – UC, nonspeculative • Nontrivial rules for memory ordering
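
Decoding the memory attribute in physical addressing mode needs only bit 63 of the address. A minimal sketch, with a hypothetical function name:

```python
UC_BIT = 1 << 63  # bit 63 selects the attribute; the low 63 bits are the address


def physical_mode_attribute(address):
    """Return the memory attribute implied by bit 63 of a 64-bit address."""
    if not 0 <= address < 1 << 64:
        raise ValueError("address must fit in 64 bits")
    if address & UC_BIT:
        return "UC, nonspeculative"
    return "WB, speculative"
```

This is why only 63 bits remain for the physical address itself.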

  22. Interrupts – I • Kinds by handler • IVA • Handled by the OS, a vector defined by CR2 • PAL • Handled by PAL or by system firmware, possibly by the OS • Kinds by behavior • Abort • Interrupt • External, asynchronous • Fault • Trap • Interrupts are disabled during interrupt handling

  23. Interrupts – II • 81 exceptions currently defined • 5 for “hard” exceptions • RESET, INIT, INT, MCA, PMI • 23 for IA-32 emulation • IVA interrupts • Vectors have fixed addresses • Groups of exceptions share one vector • External interrupts • 256 vectors • Priority determined by vector number • Current vector in CR65 (IVR=Interrupt Vector Register) • Current priority in CR66 (TPR=Task Priority Register)
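
Deriving priority from the vector number can be sketched as follows. The slide does not say how the 256 vectors are split, so the grouping into classes of 16 vectors is an assumption made for this example, as are the function names and the single masking threshold.

```python
VECTORS_PER_CLASS = 16  # ASSUMPTION: class width is not stated on the slide


def priority_class(vector):
    """Priority class of an external interrupt vector (higher is more urgent)."""
    if not 0 <= vector <= 255:
        raise ValueError("external interrupt vectors are 0..255")
    return vector // VECTORS_PER_CLASS


def is_delivered(vector, masked_class):
    """Toy masking rule: deliver only vectors whose class is above the
    threshold held in a TPR-like register."""
    return priority_class(vector) > masked_class
```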

  24. RSE – I • Register Stack Engine (RSE) • Transfers the register stack from/to memory • In the background, without software intervention • Different activity modes (lazy, store intensive, load intensive, eager) • The physical register stack must hold at least 96 registers • More in multiples of 16
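
The size rule and the spilling behaviour can be combined in a small counting model. This sketch only counts registers: `ToyRSE` is a hypothetical name, refilling on return is omitted, and the activity modes are not modelled.

```python
class ToyRSE:
    """Counts how many stacked registers live in the register file and how
    many have been spilled to memory."""

    def __init__(self, physical_size=96):
        # At least 96 registers, growing in steps of 16.
        if physical_size < 96 or (physical_size - 96) % 16 != 0:
            raise ValueError("size must be 96 plus a multiple of 16")
        self.physical_size = physical_size
        self.in_registers = 0
        self.in_memory = 0

    def allocate(self, count):
        """Grow the register stack by count registers; return how many of the
        oldest registers had to be spilled to make room."""
        if not 0 <= count <= 96:
            raise ValueError("a frame holds at most 96 registers")
        overflow = max(0, self.in_registers + count - self.physical_size)
        self.in_registers += count - overflow
        self.in_memory += overflow
        return overflow


rse = ToyRSE()
spilled_first = rse.allocate(60)   # fits in the register file
spilled_second = rse.allocate(50)  # 110 registers needed, only 96 available
```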

  25. RSE – II

  26. Firmware • Processor Abstraction Layer (PAL) • Unified interface to the CPU firmware • System Abstraction Layer (SAL) • Separates the OS from platform implementation differences • Extensible Firmware Interface (EFI) • OS booting • Each FW layer (including the OS) has a defined entry point • PAL and SAL are placed in the 16M of memory just below 4G • Fixed structure
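
The placement of PAL and SAL translates into a concrete address window. A short worked calculation, with hypothetical constant and function names:

```python
FOUR_GIB = 4 << 30
SIXTEEN_MIB = 16 << 20
FIRMWARE_BASE = FOUR_GIB - SIXTEEN_MIB  # start of the 16M window below 4G


def in_firmware_window(physical_address):
    """True if the address lies in the 16M region directly below 4G."""
    return FIRMWARE_BASE <= physical_address < FOUR_GIB
```

The window starts at 4G - 16M = 0xFF000000 and ends just below 0x100000000.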

  27. Model firmware
