CS433: Computer System Organization

CS433: Computer System Organization. Luddy Harrison. Intel IA32 Architecture. History. The x86 / IA32 family. 8086 / 8088 (1978): 16-bit registers; 16-bit external data bus (8086); 8-bit external data bus (8088); 20-bit address space via segment registers. Intel 286 (1982).

Presentation Transcript


  1. CS433: Computer System Organization Luddy Harrison Intel IA32 Architecture

  2. History The x86 / IA32 family

  3. 8086 / 8088 (1978) • 16-bit registers • 16-bit external data bus (8086) • 8-bit external data bus (8088) • 20-bit address space via segment registers
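
The 20-bit address space on this slide comes from combining a 16-bit segment register with a 16-bit offset. A minimal sketch of that calculation follows, assuming real-mode 8086 behavior; the function name `real_mode_linear` is hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Real-mode 8086 address formation (illustrative helper, not from the slides):
   the 16-bit segment value is shifted left by 4 bits and added to the
   16-bit offset. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    /* The 8086 drives only 20 address lines, so the sum wraps at 1 MB. */
    return linear & 0xFFFFFu;
}
```

For instance, `real_mode_linear(0x1234, 0x0010)` and `real_mode_linear(0x1000, 0x2350)` both evaluate to `0x12350`, which shows that many segment:offset pairs name the same byte.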

  4. Intel 286 (1982) • segment registers point to descriptor tables • descriptors have 24-bit segment addresses • segment swapping • protection • bounds checking on segments • read/execute/write checking • four privilege levels

  5. Intel 386 (1985) • 32-bit registers (data and address) • virtual 8086 mode • 32-bit address bus • segmented memory model + flat memory model • paging with 4Kbyte pages • pipelined execution (decode + execution)
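
With 4 Kbyte pages, the low 12 bits of a 32-bit linear address are the offset within the page. The sketch below assumes the classic two-level 386 layout (10-bit page-directory index, 10-bit page-table index, 12-bit offset); the type `linear_parts` and the function `split_linear` are hypothetical names used for illustration.

```c
#include <stdint.h>

/* Fields of a 32-bit linear address under 4 KB paging (illustrative type). */
typedef struct {
    uint32_t dir_index;   /* bits 31..22: entry in the page directory */
    uint32_t table_index; /* bits 21..12: entry in the page table     */
    uint32_t offset;      /* bits 11..0:  byte within the 4 KB page   */
} linear_parts;

static linear_parts split_linear(uint32_t linear)
{
    linear_parts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.offset      = linear & 0xFFFu;
    return p;
}
```

As an example, `split_linear(0x00403025)` gives directory index 1, table index 3, and offset `0x025`.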

  6. Intel 486 (1989) • five stage pipeline • 8 KB on-chip L1 cache • write-through • integrated x87 FPU • power management

  7. Intel Pentium (1993) • two pipelines, u and v • superscalar execution • 8 KB data + 8 KB instruction on-chip L1 caches • write-back option in addition to write-through • branch prediction • burstable 64-bit external data bus • multiprocessor support • [second stepping: MMX]

  8. Intel P6 (1995 – 1999) • Pentium Pro • Pentium II • Pentium II Xeon • Celeron • Pentium III • Pentium III Xeon

  9. Pentium Pro • 3-way superscalar • out-of-order • more aggressive branch prediction • speculative execution • L1 + L2 cache on chip • 8K + 8K L1 • 256K L2

  10. Pentium II • MMX (in P6 family) • 16K + 16K L1 caches • 256K, 512K, 1M L2 caches supported • improved power management

  11. Pentium II Xeon • improved multiprocessor support • 4- and 8-way systems • 2 MB L2 cache on chip

  12. Celeron • low-priced / reduced power market • 128K L2 cache • cheaper package (plastic)

  13. Pentium III • Streaming SIMD Extensions (SSE) • 128-bit registers • floating point vector types

  14. Pentium III Xeon • improved cache

  15. Pentium 4 (2000) • return to Arabic numerals • NetBurst microarchitecture • SSE2 and SSE3

  16. Pentium 4 Supporting Hyper-Threading Technology (2004) • marketing team abandons names in favor of entire sentences • Hyper-Threading is Simultaneous MultiThreading

  17. Intel Xeon (2001-2004) • internal revolt against long name • recycled portion of old name(s) prevails • multiprocessor support • Was this the first Hyper-Threading IA32?

  18. Intel Pentium M (2003) • The M is not a Roman Numeral • not “Pentium 1000” • refers to “Mobile” • low-power • integrated wireless support

  19. Register Architecture The x86 / IA32 family

  20. User-Visible Architectural State

  21. System Status in EFLAGS

  22. Special Register Purposes

  23. Offset Calculation
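
The figure for this slide is not part of the transcript. As a hedged sketch, assuming the slide refers to the standard IA-32 effective-address form (base + index × scale + displacement, with scale restricted to 1, 2, 4, or 8), the calculation can be modeled as follows; `effective_offset` is a hypothetical name.

```c
#include <stdint.h>

/* Models a 32-bit IA-32 effective address (illustrative helper).
   scale is assumed to be 1, 2, 4, or 8; the result wraps modulo 2^32. */
static uint32_t effective_offset(uint32_t base, uint32_t index,
                                 uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

For example, `effective_offset(0x1000, 3, 4, 8)` evaluates to `0x1014`, the kind of address formed when indexing the fourth element of an array of 4-byte items that starts 8 bytes past a base pointer.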

  24. Overlaid Registers
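
The figure for this slide is not part of the transcript. Assuming it shows the usual overlay of AL, AH, and AX on EAX, the relationship can be sketched with shifts and masks; the helper names below are hypothetical.

```c
#include <stdint.h>

/* AX is the low 16 bits of EAX. */
static uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AL is the low byte of AX; AH is the high byte of AX. */
static uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
static uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing AH replaces bits 15..8 and leaves the rest of EAX unchanged. */
static uint32_t write_ah(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)value << 8);
}
```

With EAX holding `0x12345678`, AX reads as `0x5678`, AH as `0x56`, and AL as `0x78`; writing `0xAB` to AH leaves EAX at `0x1234AB78`.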

  25. SIMD

  26. NetBurst MicroArchitecture (Pentium 4) • deep branch prediction • dynamic dataflow analysis • instructions translated into a RISC-like form • these in turn are subject to out-of-order execution • speculative execution • up to 126 instructions in flight • up to 48 loads and 24 stores in pipeline • advanced branch predictor • 4K branch target buffer • execution trace cache stores decoded instructions • straightens code on the fly! • 8-way L2 cache • 64-byte cache line size • external bus capable of 6.4 Gbytes per second

  27. Front-End Pipeline • Prefetch • Fetch (on prefetch fail) • Decode into micro-operations • Generate microcode from complex operations • Delivers decoded instructions from execution trace cache • Branch prediction

  28. EFLAGS

  29. Data Types

  30. Fundamental Data Types

  31. Floating Point Types

  32. IEEE 754 and IA32 • Kahan et al. formulated the proper working of floating point hardware in a documented standard known as IEEE 754 • The x86 was designed to do all “scratch” calculations using a small floating point stack • the entries on the stack are 80-bit extended precision numbers • Unfortunately, this does not correspond well to the semantics of C

  33. Example of Semantic Difference Between Natural x86 Execution and C Semantics

          double A, B, C, D, E, F, G;
          // set B=D=F and C=E=G and let B*C be very close to 0 in
          // extended precision, but exactly 0 in double precision.
          ...
          // suppose we use the x86 FP stack to do this RHS:
          A = B*C + D*E - F*G;
          // this yields zero in double but non-zero
          // in extended precision
          ...
          assert (A == 0.0);
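
The underflow at the heart of this example can be observed directly. The sketch below is not the slide's code; it assumes a compiler where `long double` is wider than `double` (for example the 80-bit x87 extended format used by GCC and Clang on x86 Linux), which is not true on every platform. The function names are hypothetical.

```c
#include <float.h>

/* Returns 1 if b*c is zero once rounded to double.
   The volatile store forces the result out of any wider register. */
static int product_is_zero_in_double(double b, double c)
{
    volatile double p = b * c;
    return p == 0.0;
}

/* Returns 1 if b*c is zero when computed and kept in long double.
   Where long double is no wider than double, this matches the function above. */
static int product_is_zero_in_long_double(double b, double c)
{
    volatile long double p = (long double)b * (long double)c;
    return p == 0.0L;
}
```

With `b = c = 1e-200`, the product is about 1e-400, which is below the smallest subnormal double and so rounds to zero in double precision. On a platform where `LDBL_MIN_EXP < DBL_MIN_EXP`, the same product is representable in `long double` and stays non-zero, which is the discrepancy the slide describes.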

  34. Operating on NaNs

  35. Pointer Types

  36. Disadvantages of Far Pointers? • To dereference, two moves are necessary (in the general case): • move to segment register • move to address register • To compare, two comparisons are necessary • How do we compare ≤? • To store/load we must do two stores/loads • What about register allocation? • What other primitive types in programming languages are sometimes multi-word types (physically)?
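
One way to see why comparison is awkward: two far pointers with different segment and offset words can still refer to the same byte. The sketch below assumes real-mode 8086 addressing (segment × 16 + offset, ignoring the 1 MB wraparound); in protected mode the segment word is a selector and this arithmetic does not apply. The type `far_ptr` and the function names are hypothetical.

```c
#include <stdint.h>

/* A far pointer as two 16-bit words (illustrative type). */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* Real-mode linear address of a far pointer. */
static uint32_t far_to_linear(far_ptr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}

/* Word-by-word equality: two comparisons, and it misses aliases. */
static int far_equal_naive(far_ptr a, far_ptr b)
{
    return a.segment == b.segment && a.offset == b.offset;
}

/* Ordering by linear address: returns <0, 0, or >0. */
static int far_compare(far_ptr a, far_ptr b)
{
    uint32_t la = far_to_linear(a);
    uint32_t lb = far_to_linear(b);
    return (la > lb) - (la < lb);
}
```

For 1234:0010 and 1000:2350, `far_equal_naive` reports a mismatch while `far_compare` returns 0, because both map to linear address `0x12350`.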

  37. MMX Types (64 bits)

  38. SSEn Types (128 bits)

  39. BCD (Binary Coded Decimal) • What is the idea behind BCD? • What is this type trying to optimize?
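
As one possible starting point for these questions, packed BCD stores one decimal digit per 4-bit nibble, so converting to and from decimal text is a per-digit operation rather than a division loop over the whole number. The sketch below handles a single byte (values 0 to 99); the function names are hypothetical.

```c
#include <stdint.h>

/* Packs a value in the range 0..99 into one byte:
   tens digit in the high nibble, ones digit in the low nibble.
   Values above 99 are outside this helper's contract. */
static uint8_t to_packed_bcd(unsigned value)
{
    return (uint8_t)(((value / 10u) << 4) | (value % 10u));
}

/* Recovers the binary value from one packed BCD byte. */
static unsigned from_packed_bcd(uint8_t bcd)
{
    return (unsigned)(bcd >> 4) * 10u + (unsigned)(bcd & 0x0Fu);
}
```

For example, `to_packed_bcd(42)` is `0x42`: the hexadecimal rendering of the byte reads the same as the decimal value it encodes.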

  40. Memory Models
