Christopher Foster Scott Thibaudeau Brian Cleary

Christopher Foster

Scott Thibaudeau

Brian Cleary


Itanium – IA-64: Overview.

  • Development of the Parallel Processor
    • Success and Failure (Problems and Solutions)
    • Multiple Parallel Pipelines on a Single Die
    • Itanium is born!
  • Execution of Parallel Processing in IA-64
    • 10 deep pipeline execution; 9 Parallel distribution sites
    • Current and future IA-64 code Development
  • The Memory Requirements and Specifications
    • Hierarchy: Registers, L1/L2/L3 Cache, Main Memory, HD
    • L1=Data; L2=Unified; L3=Off-Chip: Fully Associative
    • Latency Times
    • Full Memory Block Diagram Overview
  • System Management Bus (SM Bus)
    • Thermal System
  • System Bus (IA-64 Bus Architecture)
    • Bandwidth
    • Parallel Processors in Parallel
    • SAC, SDC (Controls access to the bus)
History of Microprocessors: A Very Abridged Tour.
  • Beginning of time: Circa 1980 and before…
    • CISC and RISC Computers are all that exist.
    • MOS Technology 6502 lives in every house (a variant powers the Nintendo).
    • Ronald Reagan in office.
  • Middle Ages: Circa 1990
    • Parallel Processing exists in white-papers.
    • IA-32 is in almost every desktop.
    • Vanilla Ice hits it big.
  • Current Day: Circa 2000
    • Beowulf Clusters (Distributed Parallel Processing Networks)
    • Pentium III breaks the 1 GHz mark with IA-32.
    • Intel develops the IA-64 Architecture to support Parallel on die.
So what’s so good about Parallelism?

At its most efficient, each doubling of the parallel paths divides the execution time IN HALF!

  • This leads to incredible gains:
  • Productivity (Reduced Latency)
    • Wait times for compile/execute
    • Increased functionality in real-time processes
  • Reliability (Redundancy)
    • Multiple modules for each functional unit
  • Security (Locality)
    • All processors in one place (physically)
    • Encryption power increased
  • Scalability (Modular reuse)
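
The "in half" claim can be checked with a little arithmetic. The sketch below is not from the slides: the function name `execution_time` and the `parallel_fraction` parameter are made up for illustration, and the default assumes the ideal case where all of the work can be spread across the parallel paths.

```python
def execution_time(serial_time: float, paths: int, parallel_fraction: float = 1.0) -> float:
    """Estimate run time when part of the work is split across parallel paths.

    parallel_fraction is an illustrative assumption: 1.0 means every bit of
    the work can run in parallel (the ideal case described on the slide).
    """
    if paths < 1:
        raise ValueError("paths must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_part = 1.0 - parallel_fraction
    return serial_time * (serial_part + parallel_fraction / paths)


if __name__ == "__main__":
    for paths in (1, 2, 4, 8):
        print(paths, "paths:", execution_time(100.0, paths), "time units")
```

In the ideal case each doubling of the paths halves the time. If only part of the work is parallel, the gain shrinks, which is one way to read the disadvantages on the next slide.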
But are there any disadvantages?
  • YES:
    • Memory Size/Latency
    • Branch Prediction
    • Independent Instructions
IA-64 Solves All of these problems:
  • Memory Size: 64 bit addressing | Huge Register File
  • Memory Latency: Multiple Layers of Cache
  • Branch Prediction: Hardware Solution
  • Independent Instructions: *New code classes*

And with these problems out of the way…

The way is prepared for: Multiple Parallel Processes on a Single Die: Explicitly Parallel Instruction Computing (EPIC)
  • With resources made available, the Itanium is able to use multiple functional units for each process required.
  • This results in an incredible number of separate pipelined execution paths:
    • Integer Function Units (2)
    • Memory Units (2)
    • Branch Units (3)
    • Floating Point Units (2)
  • Total: 9 separate execution paths!

Note: Though the focus is not on pipelining here, there are 10-deep pipelines for each unit.

Fetch/Distribution Procedures

3 instructions per bundle × 2 bundles per clock = 6 instructions per clock.

M0, M1, I0, I1, F0, F1, B0, B1, B2

These are all execution pipelines.

M=Memory Units

F=Floating Point Units

I=Integer Units

B=Branch Units
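
To make "3 instructions per bundle" concrete, here is a minimal sketch of how a bundle could be pulled apart. The layout used (a 128-bit bundle holding a 5-bit template in the low bits followed by three 41-bit instruction slots) comes from the IA-64 architecture definition rather than from these slides, and the function names are made up for illustration. The template-to-unit mapping is left out on purpose.

```python
TEMPLATE_BITS = 5   # low bits of the bundle: which unit types the slots need
SLOT_BITS = 41      # each of the three instruction slots
SLOTS_PER_BUNDLE = 3
BUNDLES_PER_CLOCK = 2


def pack_bundle(template: int, slots: list[int]) -> int:
    """Pack a template and three slot values into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly 3 slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def split_bundle(bundle: int) -> tuple[int, list[int]]:
    """Return (template, [slot0, slot1, slot2]) for a 128-bit bundle."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots


# Peak issue rate from the slide: 3 slots per bundle, 2 bundles per clock.
PEAK_INSTRUCTIONS_PER_CLOCK = SLOTS_PER_BUNDLE * BUNDLES_PER_CLOCK
```

Note that 5 + 3 × 41 = 128, so the template and the three slots fill the bundle exactly.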


How do we write code for The Itanium?
  • *NEW Code Classes*
    • Allow programmer to specify specific function units for:
      • Loads, Arithmetic, Branch Ops, Logic Operations
    • Enable users to specify INDEPENDENT INSTRUCTIONS
    • Interpretation at OS Level:
      • Windows 64 (to be released as Windows XP64)
      • Linux-64, HP-UX, Modesto
    • PAL (Processor Abstraction Layer) level interpretation
      • Possibility of Virtual Machine interface.
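
What "specifying independent instructions" means can be sketched as follows. This is a simplified illustration, not the real IA-64 rules or any real assembler: the `Instr` type and `split_into_groups` are hypothetical, and the only rule modelled is that an instruction may not read or overwrite a register written earlier in the same group.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str               # human-readable form, for display only
    dest: str               # register written
    srcs: tuple[str, ...]   # registers read


def split_into_groups(instrs: list[Instr]) -> list[list[Instr]]:
    """Greedily cut a program into groups of mutually independent instructions."""
    groups: list[list[Instr]] = []
    current: list[Instr] = []
    written: set[str] = set()   # registers written so far in the current group
    for ins in instrs:
        depends = ins.dest in written or any(s in written for s in ins.srcs)
        if depends and current:
            groups.append(current)   # close the group before the dependent instruction
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    program = [
        Instr("add r1 = r2, r3", "r1", ("r2", "r3")),
        Instr("add r4 = r5, r6", "r4", ("r5", "r6")),
        Instr("add r7 = r1, r4", "r7", ("r1", "r4")),  # needs both earlier results
    ]
    for n, group in enumerate(split_into_groups(program)):
        print("group", n, [i.text for i in group])
```

The first two adds touch different registers, so they can share a group and issue together. The third reads their results and must start a new group.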
Is Itanium Fully Developed?


  • Some registers yet to be named and used.
  • Windows 64 not yet available.
  • Cost of processor/memory production still too high.

And they haven’t written any books on the subject yet either.

Moore’s Law: If we keep doubling, then we can expect IA-64 to be around half as long as IA-32. That’s about 5-7 years. That gives us at least 3 more.

Register File
  • 256 general and floating point registers (128 of each)
  • 64 bits wide (general registers)
  • Rotating registers
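
The idea behind rotating registers can be sketched with a small model. This is a simplified illustration under stated assumptions, not the exact hardware behaviour: the class `RotatingFile`, its default size of 8, and the starting register number 32 are chosen for the example. The point is that a rename base is added to the register number modulo the size of the rotating region, so the same register name refers to a different physical register after each rotation.

```python
class RotatingFile:
    """Toy model of a rotating register region (illustrative only)."""

    def __init__(self, size: int = 8, first: int = 32) -> None:
        self.first = first          # lowest logical register number that rotates
        self.size = size            # number of registers in the rotating region
        self.rrb = 0                # rename base, changed by rotate()
        self.physical = [0] * size

    def _index(self, logical: int) -> int:
        if not self.first <= logical < self.first + self.size:
            raise IndexError("register is outside the rotating region")
        return (logical - self.first + self.rrb) % self.size

    def write(self, logical: int, value: int) -> None:
        self.physical[self._index(logical)] = value

    def read(self, logical: int) -> int:
        return self.physical[self._index(logical)]

    def rotate(self) -> None:
        """Step the rename base down by one, as a loop branch would."""
        self.rrb = (self.rrb - 1) % self.size


if __name__ == "__main__":
    regs = RotatingFile()
    regs.write(32, 111)   # iteration 1 produces a value in r32
    regs.rotate()         # the loop moves to the next iteration
    print(regs.read(33))  # iteration 2 finds that value under the name r33
```

This is what lets overlapped loop iterations use the same register names without overwriting each other's values.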
Memory Hierarchy
  • Level 1 Data Cache (L1-D)
  • Level 1 Instruction Cache (L1-I)
    • 16 KB, 4-way set associative with 32-byte lines
  • Level 2 Unified Cache (L2)
  • Level 3 Cache (L3)
  • Main Memory over the Front-Side Bus (FSB)
    • Maximum bandwidth of 2.1 GB/s.
  • Level 1 & Level 2 Data Translation Lookaside Buffers (L1/L2-DTLB)
  • Instruction Translation Cache (ITLB)
Level 1 Data Cache (L1-D)
  • 16 KB, 4-way set associative, write through, no write allocate, with 32-byte lines
  • Integer loads have 2-cycle latency
  • Floating-point loads bypass the L1 data cache
Level 2 Unified Cache (L2)
  • 96 KB, 6-way set associative, write back and write allocate, with 64-byte lines
  • Integer loads have 6-cycle latency
  • Floating-point loads have 9-cycle latency
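
The size, associativity and line-length figures for the L1 data cache and the L2 cache determine how an address is split up. The sketch below works that out using the standard set-associative arithmetic (sets = size / (ways × line size)). The function names are made up for illustration, and it assumes simple power-of-two indexing, which the slides do not confirm for the real hardware.

```python
KB = 1024


def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int, int]:
    """Return (number of sets, offset bits, index bits) for a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder or sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("sketch assumes power-of-two sets and line size")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr: int, size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    sets, offset_bits, index_bits = cache_geometry(size_bytes, ways, line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    print("L1-D:", cache_geometry(16 * KB, 4, 32))   # 128 sets
    print("L2:  ", cache_geometry(96 * KB, 6, 64))   # 256 sets
```

The odd-looking 96 KB and 6-way figures for L2 work out neatly: 96 KB / (6 × 64 bytes) is exactly 256 sets.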
L3 Cache (L3)???
  • Off-chip
  • 2 MB or 4 MB package
  • Maximum bandwidth from L3 to L2 is 16 bytes times the core frequency
  • Integer loads have 21-cycle latency
  • Floating-point loads have 24-cycle latency

So what?
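
One way to answer "So what?" is to estimate the average cost of a load. The sketch below uses the integer latencies from the cache slides (2, 6 and 21 cycles). The hit rates, the 100-cycle main memory latency and the 800 MHz clock are made-up assumptions for illustration, not figures from the slides, and each quoted latency is treated as the total cost of a load that is satisfied at that level.

```python
def expected_load_latency(levels: list[tuple[float, int]], memory_latency: float) -> float:
    """Average load latency in cycles.

    levels lists (hit_rate, latency_cycles) from the nearest cache outward.
    A load that misses every level pays memory_latency.
    """
    expected = 0.0
    reach_probability = 1.0   # chance that a load gets as far as this level
    for hit_rate, latency in levels:
        expected += reach_probability * hit_rate * latency
        reach_probability *= 1.0 - hit_rate
    return expected + reach_probability * memory_latency


def l3_to_l2_bandwidth(core_hz: float) -> float:
    """Peak L3-to-L2 bandwidth in bytes per second: 16 bytes per core clock."""
    return 16 * core_hz


if __name__ == "__main__":
    # Hit rates below are assumed purely for illustration.
    integer_levels = [(0.90, 2), (0.80, 6), (0.90, 21)]
    print("average integer load:", expected_load_latency(integer_levels, 100), "cycles")
    # 800 MHz is an assumed core clock, used only to show the arithmetic.
    print("L3 to L2 peak:", l3_to_l2_bandwidth(800e6) / 1e9, "GB/s")
```

With good hit rates the average stays close to the 2-cycle L1 figure even though a trip to L3 costs 21 cycles, which is the point of having several layers of cache.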

L1 & L2 Data Translation Lookaside Buffer
  • 32 & 96 entries, respectively
  • Both fully associative
  • Both support page sizes of 4 KB, 8 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, 16 MB, 64 MB, and 256 MB
  • Purges supported include all page sizes and 4 GB
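
The entry counts and page sizes above decide how much memory each buffer can map at once (its "reach"). A minimal sketch, with illustrative function names that do not come from the slides:

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory covered when every entry maps a page of the given size."""
    return entries * page_size


def split_virtual_address(addr: int, page_size: int) -> tuple[int, int]:
    """Split an address into (virtual page number, offset within the page)."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    offset_bits = page_size.bit_length() - 1
    return addr >> offset_bits, addr & (page_size - 1)


if __name__ == "__main__":
    print("L1 DTLB, 4 KB pages:  ", tlb_reach(32, 4 * KB) // KB, "KB")
    print("L2 DTLB, 256 MB pages:", tlb_reach(96, 256 * MB) // GB, "GB")
```

With the smallest pages the 32-entry L1 DTLB covers only 128 KB. Larger pages stretch the same number of entries over far more memory.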
Instruction Translation Cache
  • Single-level instruction TLB
  • 64 entries
  • Fully associative
IA-64 Thermal Specifications
  • What are the components? How does it work?
    • Internal thermal circuit w/ thermal sensing diode
  • How does it protect itself from overheating?
    • Comparison to THIGH
  • What happens when overheating occurs?
    • Thermal Alert Register tripped
    • To restore…
  • What exactly are the heat tolerances? What should be calculated? Any equations?
    • According to Intel…
IA-64 Thermal Specifications: The Processors
  • What about the AMD/P4/P3?
    • P4: Application slows down (Itanium inherits fundamental heat protection)
    • P3: Application freezes
    • As for the AMD…
  • Video displaying above characteristics at end of presentation
IA-64 System Management Bus (w/Thermal Sensory)
  • Why do we care about the PIROM and EEPROM?
    • The EEPROM is a read/write memory block that lets vendors specify methods/standards for how data is transferred on the data bus.
    • The PIROM (Processor Information ROM) contains write-protected information about certain characteristics of the processor (such as its frequency).
    • The thermal sensor, in conjunction with the above components, provides accurate temperature checking/regulation.
IA-64 System Management Bus: Data/Addressing Management
  • Packet Types (Read/Write)
    • Memory Units: current address read, random access read, sequential read, byte write, page write
    • Thermal Unit: write byte, read byte, send byte, receive byte, ARA (Alert Response Address)
  • Addressing
    • Memory Units: “1010XXY2b”
    • Thermal Unit: “0011XXXZb” “1001XXXZb” “0101XXXZb”
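
The thermal unit address patterns above can be read as bit masks. The sketch below checks an address byte against them. It rests on assumptions the slides do not spell out: that the trailing "b" only marks the string as binary, that X and Z are "any value" positions, and that Z is the usual SMBus read/write bit in the lowest position (1 = read). The function names are made up for illustration.

```python
# Thermal unit patterns from the slide, with the trailing "b" dropped.
THERMAL_PATTERNS = ("0011XXXZ", "1001XXXZ", "0101XXXZ")


def matches(pattern: str, byte: int) -> bool:
    """True if every fixed bit of the pattern equals the matching bit of byte."""
    bits = format(byte, "08b")
    return all(p in "XZ" or p == b for p, b in zip(pattern, bits))


def is_thermal_address(byte: int) -> bool:
    """True if the address byte fits any of the thermal unit patterns."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("address byte must fit in 8 bits")
    return any(matches(p, byte) for p in THERMAL_PATTERNS)


def is_read(byte: int) -> bool:
    """Assumed SMBus convention: lowest bit set means a read transfer."""
    return bool(byte & 1)


if __name__ == "__main__":
    for byte in (0b00110101, 0b10010000, 0b10100000):
        print(format(byte, "08b"), "thermal:", is_thermal_address(byte), "read:", is_read(byte))
```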
IA-64 Main Bus Architecture: Specifications
  • 64-bit bus with a peak bandwidth of 2.1 GB/s
  • Up to 4 Itaniums can be connected in parallel to the same bus (running at 266 MHz)
  • SAC: System Address Controller
  • SDC: System Data Controller
  • The above controllers route address or data information between the Itanium(s) and the memory unit (from multiple processors to a single bus line and vice versa)
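
The 2.1 GB/s figure follows from the bus width and the 266 MHz rate quoted above. A minimal sketch of the arithmetic, assuming one 64-bit transfer per 266 MHz cycle (the function name is made up for illustration):

```python
def peak_bus_bandwidth(width_bits: int, transfers_per_second: float) -> float:
    """Peak bandwidth in bytes per second for a bus of the given width."""
    if width_bits % 8:
        raise ValueError("bus width must be a whole number of bytes")
    return (width_bits // 8) * transfers_per_second


if __name__ == "__main__":
    bandwidth = peak_bus_bandwidth(64, 266e6)
    print(bandwidth / 1e9, "GB/s")
```

That is 8 bytes × 266 million transfers per second, about 2.13 GB/s. This is a peak figure shared by every processor on the bus, so four Itaniums on one bus split it between them.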
IA-64 Customer Feedback
  • What are journalists, customers saying?

- “The heat generated from the Itanium can be compared to an EZ-Bake Oven…Intel is losing its foothold in the processor industry by relying on the archaic x86 architecture.”

- “Upgrading a mission critical system is a daunting task, especially since there exists reliable 64-bit Unix Machines. Then there’s the code conversion problem…”