CS433: Computer System Organization

Luddy Harrison

Intel IA32 Architecture

History
The x86 / IA32 family

8086 / 8088 (1978)
• 16-bit registers
• 16-bit external data bus (8086)
• 8-bit external data bus (8088)
• 20-bit address space via segment registers
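As a rough illustration of how 16-bit segment registers reach a 20-bit address space, the sketch below forms a real-mode address as segment × 16 + offset. The function name `real_mode_address` is made up for this example, and the wrap at 1 MB assumes the 8086's 20 address lines.

```c
#include <stdint.h>

/* Real-mode 8086 address formation: the 16-bit segment value is shifted
 * left by 4 bits and added to the 16-bit offset. The result is masked to
 * 20 bits, so sums past 1 MB wrap around to low memory. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;
}
```

Note that different segment:offset pairs can name the same byte: 1234:0010 and 1235:0000 both map to 0x12350.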

Intel 286 (1982)
• segment registers point to descriptor tables
• descriptors have 24-bit segment addresses
• segment swapping
• protection
  • bounds checking on segments
  • read/execute/write checking
  • four privilege levels

Intel 386 (1985)
• 32-bit registers (data and address)
• virtual 8086 mode
• 32-bit address bus
• segmented memory model + flat memory model
• paging with 4 KB pages
• pipelined execution (decode + execution)
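To make the 4 KB paging concrete, here is a minimal sketch of how a 32-bit linear address splits under the classic two-level scheme: 10 bits of page-directory index, 10 bits of page-table index, and a 12-bit offset within the page. The struct and function names are invented for illustration, and later extensions such as large pages are not modeled.

```c
#include <stdint.h>

/* The three fields of a 32-bit linear address under 4 KB paging. */
struct linear_parts {
    uint32_t dir_index;    /* bits 31..22: entry in the page directory */
    uint32_t table_index;  /* bits 21..12: entry in the page table     */
    uint32_t page_offset;  /* bits 11..0 : byte within the 4 KB page   */
};

struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index   = (linear >> 22) & 0x3FFu;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.page_offset = linear & 0xFFFu;
    return p;
}
```

For example, the address 0x00403025 selects directory entry 1, table entry 3, and byte 0x025 of the page.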

Intel 486 (1989)
• five-stage pipeline
• 8 KB on-chip L1 cache
  • write-through
• integrated x87 FPU
• power management

Intel Pentium (1993)
• two pipelines, u and v
• superscalar execution
• 8 KB data + 8 KB instruction on-chip L1 caches
• write-back option in addition to write-through
• branch prediction
• burstable 64-bit external data bus
• multiprocessor support
• [second stepping: MMX]

Intel P6 (1995 – 1999)
• Pentium Pro
• Pentium II
• Pentium II Xeon
• Celeron
• Pentium III
• Pentium III Xeon

Pentium Pro
• 3-way superscalar
• out-of-order
• more aggressive branch prediction
• speculative execution
• L1 + L2 cache on chip
  • 8K + 8K L1
  • 256K L2

Pentium II
• MMX (in P6 family)
• 16K + 16K L1 caches
• 256K, 512K, 1M L2 caches supported
• improved power management

Pentium II Xeon
• improved multiprocessor support
• 4- and 8-way systems
• 2 MB L2 cache on chip

Celeron
• low-priced / reduced power market
• 128K L2 cache
• cheaper package (plastic)

Pentium III
• Streaming SIMD Extensions (SSE)
• 128-bit registers
• floating point vector types

Pentium III Xeon
• improved cache

Pentium 4 (2000)
• return to Arabic numerals
• NetBurst microarchitecture
• SSE2 (SSE3 added in later models)

Pentium 4 Supporting Hyper-Threading Technology (2004)
• marketing team abandons names in favor of entire sentences
• Hyper-Threading is Intel's name for simultaneous multithreading (SMT)

Intel Xeon (2001-2004)
• internal revolt against long name
• recycled portion of old name(s) prevails
• multiprocessor support
• Was this the first Hyper-Threading IA32?

Intel Pentium M (2003)
• The M is not a Roman numeral
  • not “Pentium 1000”
  • refers to “Mobile”
• low-power
• integrated wireless support

Register Architecture
The x86 / IA32 family

User-Visible Architectural State

System Status in EFLAGS

Special Register Purposes

Offset Calculation
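As a hedged sketch of the usual IA-32 offset (effective address) computation, the function below adds a base register, a scaled index register, and a signed displacement. The function name and parameters are illustrative only; segment base addition and 16-bit addressing modes are deliberately left out.

```c
#include <stdint.h>

/* offset = base + index * scale + displacement
 * scale is expected to be 1, 2, 4, or 8 (the caller must ensure this).
 * All arithmetic wraps modulo 2^32, as 32-bit address arithmetic does. */
uint32_t effective_offset(uint32_t base, uint32_t index,
                          uint32_t scale, int32_t displacement)
{
    return base + index * scale + (uint32_t)displacement;
}
```

With base 0x1000, index 3, scale 4, and displacement 8, the offset is 0x1014, which is the pattern a compiler might use for element 3 of an array of 4-byte items inside a structure.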

Overlaid Registers
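The overlay of AL, AH, and AX on EAX can be mimicked with masks and shifts. This is only a model of the register naming, using hypothetical helper functions on a plain 32-bit value, not actual register access.

```c
#include <stdint.h>

/* AX is the low 16 bits of EAX; AL is the low byte of AX;
 * AH is the high byte of AX (bits 15..8 of EAX). */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing AL replaces only bits 7..0 and leaves the rest of EAX intact. */
uint32_t write_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

For EAX = 0x12345678, this gives AX = 0x5678, AH = 0x56, and AL = 0x78.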

SIMD

NetBurst MicroArchitecture (Pentium 4)
• deep branch prediction
• dynamic dataflow analysis
  • instructions translated into a RISC-like form
  • these in turn are subject to out-of-order execution
• speculative execution
  • up to 126 instructions in flight
  • up to 48 loads and 24 stores in pipeline
• advanced branch predictor
  • 4K branch target buffer
• execution trace cache stores decoded instructions
  • straightens code on the fly!
• 8-way L2 cache
  • 64-byte cache line size
• external bus capable of 6.4 GB per second

Front-End Pipeline
• Prefetch
• Fetch (on prefetch fail)
• Decode into micro-operations
• Generate microcode for complex operations
• Deliver decoded instructions from the execution trace cache
• Branch prediction

EFLAGS

Data Types

Fundamental Data Types

Floating Point Types

IEEE 754 and IA32
• Kahan et al. formulated the proper working of floating point hardware in a documented standard known as IEEE 754
• The x86 was designed to do all “scratch” calculations using a small floating point stack
  • the entries on the stack are 80-bit extended precision numbers
• Unfortunately, this does not correspond well to the semantics of C

Example of Semantic Difference Between Natural x86 Execution and C Semantics

double A, B, C, D, E, F, G;
// set B=D=F and C=E=G and let B*C be very close to 0 in
// extended precision, but exactly 0 in double precision.
...
// suppose we use the x86 FP stack to do this RHS:
A = B*C + D*E - F*G;  // this yields zero in double but non-zero
                      // in extended precision
...
assert (A == 0.0);
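One way to see the effect without relying on a particular compiler's x87 code generation is to evaluate the same expression twice: once with every intermediate forced to double, and once in `long double`. This is a sketch under assumptions. `long double` stands in for the 80-bit stack format, which holds on typical x86 GCC/Clang targets but not everywhere, so the helper `long_double_is_wider` (an invented name, like the others here) reports whether the comparison is meaningful.

```c
#include <float.h>

/* Nonzero when long double has a wider exponent range than double
 * (true for the 80-bit x87 format, false where long double == double). */
int long_double_is_wider(void)
{
    return LDBL_MIN_EXP < DBL_MIN_EXP;
}

/* A = B*C + D*E - F*G with B=D=F and C=E=G, every intermediate rounded
 * to double. The volatile stores force that rounding even if the
 * compiler evaluates the products in wider registers. */
double eval_double(double b, double c)
{
    volatile double bc = b * c;
    volatile double de = b * c;    /* D*E, with D=B and E=C */
    volatile double fg = b * c;    /* F*G, with F=B and G=C */
    volatile double sum = bc + de;
    volatile double a = sum - fg;
    return a;
}

/* The same expression with intermediates kept in long double. */
long double eval_extended(double b, double c)
{
    long double bc = (long double)b * c;
    long double de = (long double)b * c;
    long double fg = (long double)b * c;
    return bc + de - fg;
}
```

With B = C = 1e-200, the product underflows to zero in double, so `eval_double` returns 0.0. Where `long double` is wider, `eval_extended` returns a tiny non-zero value and a test such as `A == 0.0` would fail.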

Operating on NaNs

Pointer Types

Disadvantages of Far Pointers?
• To dereference, two moves are necessary (in the general case):
  • move to segment register
  • move to address register
• To compare, two comparisons are necessary
  • How do we compare ≤?
• To store/load we must do two stores/loads
• What about register allocation?
• What other primitive types in programming languages are sometimes multi-word types (physically)?
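The comparison problem can be sketched for the real-mode case, where a far pointer is a 16-bit segment plus a 16-bit offset. The struct and function names below are hypothetical, and the sketch assumes real-mode address formation (segment × 16 + offset). In protected mode the segment field is a selector, and ordering would need the descriptor's base address instead.

```c
#include <stdint.h>

/* A real-mode far pointer: two 16-bit words. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Real-mode assumption: linear address = segment * 16 + offset. */
uint32_t far_to_linear(struct far_ptr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}

/* Field-by-field equality costs two comparisons, and it treats two
 * different segment:offset spellings of the same byte as unequal. */
int far_fields_equal(struct far_ptr a, struct far_ptr b)
{
    return a.segment == b.segment && a.offset == b.offset;
}

/* Ordering by linear address: returns <0, 0, or >0. */
int far_compare(struct far_ptr a, struct far_ptr b)
{
    uint32_t la = far_to_linear(a);
    uint32_t lb = far_to_linear(b);
    return (la > lb) - (la < lb);
}
```

The pointers 1234:0010 and 1235:0000 show the difficulty: they differ in both fields yet refer to the same byte, so a meaningful ≤ has to normalize first.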

MMX Types (64 bits)

SSEn Types (128 bits)

BCD (Binary Coded Decimal)
• What is the idea behind BCD?
• What is this type trying to optimize?
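As background for these questions, here is a minimal sketch of packed BCD, which stores one decimal digit per 4-bit nibble, so two digits fit in a byte. The helper names are invented for this example, and only values 0..99 are handled.

```c
#include <stdint.h>

/* Pack a value in 0..99 into one byte of packed BCD:
 * tens digit in the high nibble, ones digit in the low nibble.
 * The caller must keep value within 0..99. */
uint8_t bcd_pack(unsigned value)
{
    return (uint8_t)(((value / 10u) << 4) | (value % 10u));
}

/* Recover the binary value from one packed BCD byte. */
unsigned bcd_unpack(uint8_t bcd)
{
    return (unsigned)(bcd >> 4) * 10u + (bcd & 0x0Fu);
}
```

Decimal 42 packs to the byte 0x42, so the hexadecimal spelling of the byte reads as the decimal number. Converting to and from printed digits needs no division by powers of ten beyond the digit split itself.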

Memory Models
