
Christopher Foster Scott Thibaudeau Brian Cleary



  1. Christopher Foster Scott Thibaudeau Brian Cleary

  2. Itanium – IA-64: Overview. • Development of the Parallel Processor • Success and Failure (Problems and Solutions) • Multiple Parallel Pipelines on a Single Die • Itanium is born! • Execution of Parallel Processing in IA-64 • 10-deep pipeline execution; 9 parallel distribution sites • Current and future IA-64 code development • The Memory Requirements and Specifications • Hierarchy: Registers, L1/L2/L3 Cache, Main Memory, HD • L1=Data; L2=Unified; L3=Off-Chip: Fully Associative • Latency Times • Full Memory Block Diagram Overview • System Management Bus (SM Bus) • Thermal System • EEPROM, PIROM • System Bus (IA-64 Bus Architecture) • Bandwidth • Parallel Processors in Parallel • SAC, SDC (Controls access to the bus)

  3. History of Microprocessors: A Very Abridged Tour. • Beginning of time: Circa 1980 and before… • CISC and RISC Computers are all that exist. • The MOS 6502 lives in every house (Nintendo). • Ronald Reagan in office. • Middle Ages: Circa 1990 • Parallel Processing exists in white papers. • IA-32 is in almost every desktop. • Vanilla Ice hits it big. • Current Day: Circa 2000 • Beowulf Clusters (Distributed Parallel Processing Networks) • Pentium breaks the GHz mark with IA-32. • Intel develops the IA-64 Architecture to support parallelism on die.

  4. So what’s so good about Parallelism? In the ideal case, doubling the number of parallel paths cuts the execution time IN HALF! • This leads to incredible gains: • Productivity (Reduced Latency) • Shorter wait times for compile/execute • Increased functionality in real-time processes • Reliability (Redundancy) • Multiple modules for each functional unit • Security (Locality) • All processors in one place (physically) • Encryption power increased • Scalability (Modular reuse)
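As a rough illustration of the best case on this slide: if the work splits evenly across n parallel paths with no overhead, run time scales as 1/n, so two paths halve it. The sketch below assumes exactly that idealization; the function name and the 8-second workload are illustrative only, and real programs fall short because of the problems listed on the next slide.

```python
def ideal_time(serial_time: float, paths: int) -> float:
    """Best-case run time when work splits evenly across `paths` with no overhead."""
    if paths < 1:
        raise ValueError("paths must be at least 1")
    return serial_time / paths


if __name__ == "__main__":
    # Hypothetical 8-second serial workload spread over 1, 2, 4 and 8 paths.
    for paths in (1, 2, 4, 8):
        print(paths, ideal_time(8.0, paths))
```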

  5. But are there any disadvantages? • YES: • Memory Size/Latency • Branch Prediction • Independent Instructions

  6. IA-64 Solves All of these problems: • Memory Size: 64 bit addressing | Huge Register File • Memory Latency: Multiple Layers of Cache • Branch Prediction: Hardware Solution • Independent Instructions: *New code classes* And with these problems out of the way…

  7. The way is prepared for: Multiple Parallel Processes on a Single Die: Explicitly Parallel Instruction Computing (EPIC) • With resources made available, the Itanium is able to use multiple functional units for each process required. • This results in an incredible number of separate pipelined execution paths: • Integer Function Units (2) • Memory Units (2) • Branch Units (3) • Floating Point Units (2) • Total: 9 separate execution paths! Note: Though the focus is not on pipelining here, each unit has a 10-stage-deep pipeline.

  8. Overall Architecture

  9. The Full Pipeline Procedure

  10. Fetch/Distribution Procedures 3 instructions per bundle × 2 bundles per clock = a full 6 instructions per clock. M0, M1, I0, I1, F0, F1, B0, B1, B2: these are all execution pipelines. M=Memory Units, F=Floating Point Units, I=Integer Units, B=Branch Units
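The fetch arithmetic and port names on this slide can be sketched as follows. The port list and the 3 × 2 figures come from the slide; the `disperse` function is a deliberately simplified, hypothetical model (first free port of the matching type, stop at the first conflict) and is not the processor's actual dispersal rule set.

```python
INSTRUCTIONS_PER_BUNDLE = 3
BUNDLES_PER_CLOCK = 2

# Execution ports named on the slide, grouped by unit type.
PORTS = {
    "M": ["M0", "M1"],        # memory units
    "I": ["I0", "I1"],        # integer units
    "F": ["F0", "F1"],        # floating-point units
    "B": ["B0", "B1", "B2"],  # branch units
}


def peak_issue_width() -> int:
    """Instructions fetched per clock: bundles per clock times slots per bundle."""
    return INSTRUCTIONS_PER_BUNDLE * BUNDLES_PER_CLOCK


def disperse(slot_types):
    """Simplified model: give each instruction the first free port of its type.

    Stops at the first instruction with no free port, leaving the rest
    for a later clock.
    """
    free = {unit: list(ports) for unit, ports in PORTS.items()}
    assigned = []
    for unit in slot_types[:peak_issue_width()]:
        if not free[unit]:
            break
        assigned.append(free[unit].pop(0))
    return assigned


if __name__ == "__main__":
    print(peak_issue_width())
    print(disperse(["M", "I", "I", "M", "F", "B"]))
```

In this toy model, six instructions that spread across the unit types all issue together, while three memory instructions in a row only fill M0 and M1.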

  11. How do we write code for The Itanium? • *NEW Code Classes* • Allow the programmer to specify particular functional units for: • Loads, Arithmetic, Branch Ops, Logic Operations • Enable users to specify INDEPENDENT INSTRUCTIONS • Interpretation at OS Level: • Windows 64 (to be released as Windows XP64); • Linux-64, HP-UX, Modesto; • PAL Level interpretation • Possibility of Virtual Machine interface.

  12. And what does this code look like?

  13. Is Itanium Fully Developed? No. • Some registers yet to be named and used. • Windows 64 not yet available. • Cost of processor/memory production still too high. And they haven’t written any books on the subject yet either. Moore’s Law: If we keep doubling, then we can expect IA-64 to be around half as long as IA-32. That’s about 5-7 years. That gives us at least 3 more.

  14. Register File • 256 general and floating-point registers (128 of each) • General registers are 64 bits wide • Rotating registers

  15. Memory Hierarchy • Level 1 Data Cache (L1-D) • Level 1 Instruction Cache (L1-I) • 16 KB, 4-way set associative with 32-byte lines • Level 2 Unified Cache (L2) • Level 3 Cache (L3) • Main Memory (FSB) Bus • Maximum Bandwidth of 2.1 GB/s. • Level 1 & Level 2 Data Translation Lookaside Buffers (L1/L2-DTLB) • Instruction Translation Cache (ITLB)

  16. Level 1 Data Cache (L1-D) • 16 KB, 4-way set associative, write through, no write allocate with 32-byte lines • Integer loads have 2-cycle latency • Floating-point loads bypass the L1 data cache

  17. Level 2 Unified Cache (L2) • 96 KB, 6-way set associative, write back and write allocate with 64-byte lines • Integer loads have 6-cycle latency • Floating-point loads have 9-cycle latency
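The L1-D and L2 geometries on these two slides determine how many sets each cache has: sets = size / (ways × line size). A minimal sketch, reading the quoted sizes as kilobytes (an assumption about the slides' units); the function name is illustrative.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


if __name__ == "__main__":
    l1d_sets = cache_sets(16 * 1024, ways=4, line_bytes=32)
    l2_sets = cache_sets(96 * 1024, ways=6, line_bytes=64)
    print("L1-D sets:", l1d_sets)
    print("L2 sets:", l2_sets)
```

The odd-looking 96 KB, 6-way L2 still works out to a power-of-two number of sets, which keeps set indexing a simple bit-field extract.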

  18. L3 Cache (L3)??? • Off-chip • 2 MB or 4 MB package • Maximum bandwidth from L3 to L2 is 16 bytes times the core frequency • Integer loads have 21-cycle latency • Floating-point loads have 24-cycle latency So what?
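To put the "16 bytes times the core frequency" figure and the load latencies from the cache slides side by side, here is a small sketch. The latencies are the ones quoted on the slides; the 800 MHz core frequency is an assumed example value, not something the slides state.

```python
# Load-to-use latencies in cycles, as quoted on the cache slides.
INT_LOAD_LATENCY = {"L1": 2, "L2": 6, "L3": 21}
FP_LOAD_LATENCY = {"L2": 9, "L3": 24}  # floating-point loads bypass L1-D


def l3_to_l2_bandwidth(core_hz: int) -> int:
    """Peak L3-to-L2 bandwidth in bytes per second: 16 bytes per core clock."""
    return 16 * core_hz


if __name__ == "__main__":
    assumed_core_hz = 800_000_000  # assumption for illustration only
    print(l3_to_l2_bandwidth(assumed_core_hz) / 1e9, "GB/s")
    # How much slower an integer load gets when it misses L1 and then L2.
    print(INT_LOAD_LATENCY["L2"] / INT_LOAD_LATENCY["L1"])
    print(INT_LOAD_LATENCY["L3"] / INT_LOAD_LATENCY["L1"])
```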

  19. L1 & L2 Data Translation Lookaside Buffer • 32 & 96 entries, respectively • Both fully associative • Both support page sizes of 4k, 8k, 16k, 64k, 256k, 1M, 4M, 16M, 64M, and 256M • Purges supported include all page sizes and 4G
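One way to read the entry counts and page sizes above is as TLB reach: the amount of memory that can be mapped without a miss, which is entries × page size. The sketch below assumes every entry holds a page of the same size, which is a simplification; the function name is illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_bytes


if __name__ == "__main__":
    for name, entries in (("L1-DTLB", 32), ("L2-DTLB", 96)):
        small = tlb_reach(entries, 4 * KIB)
        large = tlb_reach(entries, 256 * MIB)
        print(name, small // KIB, "KiB with 4k pages;",
              large // (1024 * MIB), "GiB with 256M pages")
```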

  20. Instruction Translation Cache • Single-level instruction • 64 entries • Fully associative

  21. Overall Architecture

  22. IA-64 Thermal Specifications • What are the components? How does it work? • Internal thermal circuit w/ thermal sensing diode • How does it protect itself from overheating? • Comparison to THIGH • What happens when overheating occurs? • Thermal Alert Register tripped • To restore… • What exactly are the heat tolerances? What should be calculated? Any equations? • According to Intel…

  23. IA-64 Thermal Specifications

  24. IA-64 Thermal Specifications: Dimensions of Thermal Sensor

  25. IA-64 Thermal Specifications: The Processors • What about the AMD/P4/P3? • P4: Application Slows Down (Itanium inherits fundamental heat protection) • P3: Application Freezes • As for the AMD… • Video displaying above characteristics at end of presentation

  26. IA-64 Thermal Specifications: Location of Thermal Sensor

  27. IA-64 System Management Bus (w/ Thermal Sensor) • Why do we care about the PIROM and EEPROM? • EEPROM is a read/write memory block that lets vendors specify the methods/standards for how data is transferred on the data bus. • PIROM contains write-protected information about certain characteristics of the processor (such as its frequency). • The thermal sensor, in conjunction with the above components, provides accurate temperature checking/regulation.

  28. IA-64 System Management Bus: Data/Addressing Management • Packet Types (Read/Write) • Memory Units: current address read, random access read, sequential read, byte write, page write • Thermal Unit: write byte, read byte, send byte, receive byte, ARA • Addressing • Memory Units: “1010XXY2b” • Thermal Unit: “0011XXXZb” “1001XXXZb” “0101XXXZb”
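The thermal-unit address patterns above can be checked against an 8-bit address byte with a simple wildcard match. This is only a sketch: the slide does not say what the X and Z positions encode, so the code treats every character other than 0 or 1 as a don't-care bit, and the function names are hypothetical.

```python
# Thermal-unit address patterns from the slide (trailing "b" = binary, dropped here).
THERMAL_PATTERNS = ("0011XXXZ", "1001XXXZ", "0101XXXZ")


def matches(address_byte: int, pattern: str) -> bool:
    """True if the byte agrees with every fixed 0/1 position in the pattern."""
    if not 0 <= address_byte <= 0xFF:
        raise ValueError("address_byte must fit in 8 bits")
    bits = format(address_byte, "08b")
    # Assumption: any non-0/1 character (X, Z) is a wildcard position.
    return all(p not in "01" or p == b for p, b in zip(pattern, bits))


def is_thermal_address(address_byte: int) -> bool:
    """True if the byte matches any of the thermal-unit patterns."""
    return any(matches(address_byte, p) for p in THERMAL_PATTERNS)


if __name__ == "__main__":
    for byte in (0b00110000, 0b10011111, 0b10100000):
        print(format(byte, "08b"), is_thermal_address(byte))
```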

  29. IA-64 System Management Bus: Memory Unit Packet Types

  30. IA-64 System Management Bus: Thermal Unit Packet Types

  31. IA-64 Bus Architecture: SMBus Timing Diagrams

  32. IA-64 Main Bus Architecture: Overview

  33. IA-64 Main Bus Architecture: Specifications • 64-bit bus running at 2.1 GB/s • Up to [4] Itaniums can be connected in parallel to the same bus (running at 266 MHz) • SAC: System Address Controller • SDC: System Data Controller • The above controllers pass address or data information between the Itanium(s) and the memory unit (from multiple processors to a single bus line and vice versa)
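The 2.1 GB/s figure follows from the two other numbers on this slide: a 64-bit bus moves 8 bytes per transfer, and 8 bytes × 266 million transfers per second is about 2.1 GB/s. A minimal sketch, assuming one transfer per cycle at the quoted 266 MHz rate; the function name is illustrative.

```python
def bus_bandwidth(width_bits: int, transfers_per_second: int) -> int:
    """Peak bus bandwidth in bytes per second."""
    return (width_bits // 8) * transfers_per_second


if __name__ == "__main__":
    peak = bus_bandwidth(64, 266_000_000)
    print(peak / 1e9, "GB/s")
```

Because all processors share the one bus, this peak is the total available to every Itanium on it, not a per-processor figure.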

  34. IA-64 Customer Feedback • What are journalists, customers saying? - “The heat generated from the Itanium can be compared to an EZ-Bake Oven…Intel is losing its foothold in the processor industry by relying on the archaic x86 architecture.” - “Upgrading a mission critical system is a daunting task, especially since there exists reliable 64-bit Unix Machines. Then there’s the code conversion problem…”
