
Topic 5 Processor Development

AH Computing

Computer Architecture


SQA arrangements

  • Description of the evolution of the following microprocessor architectures: the Power PC series, the Intel X86 series and the Intel IA-64 in terms, where appropriate, of the following features and techniques:

    • increasing clock speeds

    • data bus widths

    • pipelining

    • superscalar processing

    • branch prediction

    • speculative loading of data and executing of instructions

    • predication

    • the number and function of registers used

    • SIMD

    • RISC

    • CISC

  • Explanation of the relationship between these developments and system performance.


Introduction

  • From the 1980s, microprocessor architecture has developed rapidly, as a result of:

  • Increasing miniaturisation of microelectronic circuitry, which means that more and more complex chip designs have become possible and economically viable

  • The pressure from software developers to design microprocessors with ever increasing performance


Introduction

  • The first microprocessors were not general purpose processors but were designed for specific applications


Intel 4004 (1971)

  • the first complete CPU on one chip

  • the first commercially available microprocessor, used in calculators, data terminals, numeric control systems etc.

  • 16 4-bit general purpose (GP) registers

  • 1 KByte of data memory and 4 KBytes of instruction memory

  • Clock speed of 740 kHz

  • 45 instructions

Development of Intel


Intel 8080 (1974)

  • 16-bit address bus, 8-bit data bus

  • Program counter (PC) was 16 bits long

  • 7 8-bit GP registers

  • Used in the Altair 8800, widely regarded as the first personal computer

  • Contemporaries included the Zilog Z80 and the MOS Technology 6502


Processor Development

Look at the evolution of families of processors

  • Power PC

  • Intel X86

  • Intel IA-64


Processor Development

Compare the following features and techniques

  • Increasing clock speeds

  • Data bus widths

  • Pipelining

  • Superscalar processing

  • Branch prediction

  • Speculative loading of data

  • Predication

  • The number and function of registers used

  • SIMD

  • RISC

  • CISC



Pentium

  • Intel introduced superscalar architecture to the X86 series with the Pentium processor

  • 2 integer arithmetic and logic units

  • 1 Floating Point unit

  • 8 80-bit






Summary of X86

The X86 series of microprocessors can be characterised as having:

  • a relatively small number of registers (8 GP, 8 FP and 8 SIMD)

  • a large instruction set

  • instructions of varying length

  • many addressing modes

  • These characteristics are typical of CISC (complex instruction set computer) architecture. Other CISC based processors include the IBM System/370 and the VAX-11/780.


Questions (Scholar page 128)

  • Sketch a graph of the increase in clock speeds from the 8086 to the Pentium processor

  • Which of the X86 processors was the first to use pipelining to improve performance?

  • How many registers do the (a) 8086, (b) 80286, (c) 80486 and (d) Pentium have?

  • Which X86 chip was the first to have a superscalar architecture?

  • The X86 series are considered to be CISC processors. Justify this claim.



Background

  • Improvements in processor capability and operating systems led to the birth of the Wintel PC

  • Wintel is a portmanteau of Windows and Intel. It usually means a computer based on an Intel x86 compatible processor and running the Microsoft Windows operating system.

  • Still dominates the laptop and desktop market


Motorola

  • At the same time Motorola was developing its own family of microprocessors, the 68000 series

  • These were developed as 32-bit processors from the start

  • As a result, Apple was able to develop its Macintosh computers with a true graphical OS from the start


Motorola 68000 (1979)

  • Same time as Intel 8086

  • 8MHz clock speed

  • 32-bit architecture

  • 16-bit data bus, 24-bit address bus

  • 16 32-bit registers (8 data, 8 address)

  • No segment registers required, as memory is addressed directly in a single flat (linear) address space

  • Used pre-fetching to speed up execution


Motorola 68020 (1984)

  • 32-bit data and address buses

  • Pipeline had 3 stages

  • 256-byte instruction cache added


Motorola 68040 (1991)

  • 32-bit data and address buses

  • Pipeline had 6 stages

  • Floating point unit added

  • 4Kbyte caches for data and programs added


Motorola 68060 (1994)

  • Superscalar – 3 execution units, 2 integer and 1 FP

  • 10 stage pipelines

  • 8Kbyte caches for data and programs


Motorola series

  • Used in Sun workstations, Apple Macintosh computers, and later Atari computers

  • No longer in use in main computer market

  • Still used in embedded systems



Main Characteristics of Motorola series

In the final years of the 68000 processors, Apple, Motorola and IBM defined a specification for open system software and hardware, and Motorola and IBM designed the first PowerPC chip to meet this specification.


PowerPC

  • The "Power" in PowerPC is an acronym for "performance optimisation with enhanced RISC"

  • Compared with CISC-based X86

    • More registers

    • A smaller, but more efficient, instruction set

    • Fewer addressing modes


PowerPC

  • First chip was the 601 in 1993

  • 32-bit chip with a 64-bit data bus

  • Clock speed of 60MHz

  • Up to 4 GB of memory

  • Superscalar architecture: 3 independent execution units (integer, floating point and branch processing), each with a 6 stage pipeline


Used in the Xbox 360

Used in the Nintendo Wii


Power PC overview

  • Used in

    • Controllers in cars

    • Networking – routers and servers

    • Honda’s Asimo

    • Vehicle-Management Computer for the F-35 fighter jet

    • PlayStation 3, Wii, Nintendo GameCube


Comparison of X86 with PowerPC

Direct addressing for Load, Store and Branch instructions. All other instructions address internal registers.


Trends (important)


Summary of table

  • clock speeds have increased by a factor of 50 in 10 years

  • bus speeds have increased by a factor of 20

  • the complexity (no. of transistors) has increased by a factor of 20

  • on chip cache has increased

  • new features have been added.


Clock speeds

  • PowerPC chips had lower clock speeds than CISC based designs

  • But the more efficient RISC based technology gave better performance.

  • Clock speed alone cannot be used to compare processors
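
One way to see why clock speed alone is misleading is the usual relation: execution time = number of instructions x average cycles per instruction / clock rate. The sketch below applies it to two imaginary chips. All figures are invented for illustration and are not measurements of any real X86 or PowerPC processor.

```python
def execution_time(instructions, cycles_per_instruction, clock_hz):
    """Seconds needed to run a program: instructions x CPI / clock rate."""
    return instructions * cycles_per_instruction / clock_hz

# Hypothetical figures, chosen only to illustrate the point.
# Chip A: faster clock, but each instruction needs more cycles on average.
chip_a = execution_time(1_000_000, cycles_per_instruction=4.0, clock_hz=500e6)
# Chip B: slower clock and more instructions, but fewer cycles per instruction.
chip_b = execution_time(1_200_000, cycles_per_instruction=1.5, clock_hz=300e6)

print(f"Chip A: {chip_a * 1000:.1f} ms, Chip B: {chip_b * 1000:.1f} ms")
```

With these assumed numbers the chip with the lower clock speed finishes first.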


Questions (Page 133)

  • Which 3 companies cooperated in the design of the PowerPC specification?

  • What was the first PowerPC chip released, and when?

  • The 601 chip can be described as superscalar. How is this justified?

  • How many programmer accessible registers are there in all PowerPC chips?

  • Compare the X86 and PowerPC architectures in terms of

    • a) instruction set

    • b) instruction length

    • c) addressing modes

  • What new feature did the G3 chip have which improved performance?

  • Which was the first PowerPC chip to have SIMD instructions?

    • a) 601

    • b) 604e

    • c) G3

    • d) G4

    • e) G5

  • Why is clock speed not a good way of comparing a Windows PC with an Apple Macintosh?

  • Other than in Apple computers, what are PowerPC chips used for?


Answers

Q10: Apple, Motorola, IBM

Q11: the 601 in 1993

Q12: it has 3 independent processing units - the floating point unit (FPU), the integer ALU, and the system unit

Q13: 2 sets of 32 registers, each 64 bits wide

Q14: a) similar - X86 has 235 different instructions, PowerPC has 225

b) X86 has varied instruction lengths (1-11 bytes), the PowerPC instructions are all exactly 4 bytes

c) the X86 has 11 addressing modes, the PowerPC has only 2

Q15: L2 "backside" cache on chip

Q16: d) G4

Q17: because the Mac uses the more efficient RISC architecture, a Mac with a lower clock speed may outperform a Windows PC with a higher clock speed

Q18: IBM servers, Nintendo Game Cube, and a range of embedded applications


Intel IA-64

  • The X86 series reached its peak with the Pentium 3, Pentium 4 and Athlon processors.

  • These are essentially CISC processors, using pipelining and superscalar processing, but with some RISC-like features. In 1994, Intel and HP began work on designing a new 64-bit architecture to replace the X86 series.


EPIC

  • IA-64 combines RISC and CISC features, and is described as EPIC - explicitly parallel instruction computing. There are 4 key features to the design:

  • instruction level parallelism - the compiler creates code which uses the many parallel execution units of the processor

  • use of VLIW - very long instruction words

  • use of predication - executing both branches of a program, then discarding the "not chosen" branch results

  • use of speculative loading - use of large fast cache to load data and instructions in advance of when they will be required
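
The predication idea in the list above can be sketched in a few lines. This is only an illustration in Python: the function names are hypothetical, and a real IA-64 processor does this in hardware rather than in software. The first function follows one path chosen by a branch. The second computes both paths and keeps only the result whose predicate is true.

```python
def with_branch(a, b):
    # Conventional code: a conditional jump decides which single path runs.
    if a > b:
        return a - b
    return b - a

def with_predication(a, b):
    # Predicated style: both paths are computed, each guarded by a predicate.
    p_then = a > b           # predicate for the "then" path
    p_else = not p_then      # predicate for the "else" path
    then_result = a - b      # computed whether or not it is needed
    else_result = b - a      # computed whether or not it is needed
    # Keep the result whose predicate is true (True counts as 1, False as 0),
    # which discards the "not chosen" result without a jump.
    return then_result * p_then + else_result * p_else

print(with_branch(7, 3), with_predication(7, 3))
```

Both functions return the same answer; the second avoids the conditional jump at the cost of doing some work that is thrown away.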


X86 IA-64


VLIW

  • Very Long Instruction Words

  • Fetched from memory in bundles of 128 bits

  • Each bundle contains 3 instructions

  • Each instruction is 41 bits long

  • The final 5 bits are a pointer, which tells the processor which of the many execution units each instruction should be assigned to.


IA-64 Execution Units

  • I-unit (integer and logical operations)

  • M-unit (load and store operations)

  • B-unit (branch instructions)

  • F-unit (floating point operations)


Pointer

  • 5 bits = 32 different combinations

  • 00000 – send instruction 1 to the M-unit, instruction 2 to the I-unit, instruction 3 to another I-unit

  • 11101 – send instruction 1 to the M-unit, instruction 2 to the F-unit and instruction 3 to the B-unit

  • The pointer is created by the compiler which determines in advance whether or not instructions can be executed in parallel
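
The bundle format can be modelled with ordinary integer arithmetic: 3 x 41 bits plus a 5-bit pointer makes 128 bits. The sketch below is a toy model, not a decoder for real IA-64 code. It assumes the pointer sits in the lowest 5 bits of the bundle, the instruction values are arbitrary, and the lookup table holds only the two pointer values quoted above.

```python
POINTER_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1

# Only the two pointer values given in the notes are listed here.
UNITS_BY_POINTER = {
    0b00000: ("M", "I", "I"),
    0b11101: ("M", "F", "B"),
}

def pack_bundle(pointer, slots):
    """Build a 128-bit bundle from a 5-bit pointer and three 41-bit instructions."""
    bundle = pointer
    for index, slot in enumerate(slots):
        # Assumed layout: pointer in the low bits, then slot 0, slot 1, slot 2.
        bundle |= (slot & SLOT_MASK) << (POINTER_BITS + index * SLOT_BITS)
    return bundle

def unpack_bundle(bundle):
    """Split a bundle into (execution unit, instruction) pairs, one per slot."""
    pointer = bundle & ((1 << POINTER_BITS) - 1)
    units = UNITS_BY_POINTER[pointer]
    slots = [(bundle >> (POINTER_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return list(zip(units, slots))

# Arbitrary instruction values, used only to show the round trip.
bundle = pack_bundle(0b11101, [0x123, 0x456, 0x789])
for unit, instruction in unpack_bundle(bundle):
    print(f"{unit}-unit <- instruction {instruction:#x}")
```

Because the pointer is fixed by the compiler, the processor in this model only has to look it up; it does not have to work out at run time which instructions can run together.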


The Compiler

When the bundle arrives at the processor, the 3 instructions are directed to the appropriate execution units for processing.


Summary of IA-64

  • Performance is enhanced by

    • the use of VLIW, which reduces the number of relatively slow memory fetches

    • the sequencing of instructions being determined by the compiler rather than being dealt with at run time


The Itanium processor

  • The first commercial version of the IA-64 architecture was massively superscalar

    • 11 execution units

    • 4 integer units

    • 2 floating point units

    • 3 branch units

    • 2 load/store units


The Itanium processor

  • It makes extensive use of

    • Predication

    • Speculative loading of both data and instructions

  • Executes up to 20 operations per cycle

  • A clock speed of 800MHz is claimed to be the equivalent of an X86 or PowerPC running at several GHz


The Itanium processor

1 Exabyte

= 1024 Petabytes

= 1024 x 1024 Terabytes

= 1024 x 1024 x 1024 Gigabytes

It has

  • 128 64-bit registers for integer/logical/general purpose use

  • 128 82-bit registers for floating point and graphics use

  • Data bus 128 bits wide

  • Address bus 64 bits wide (potentially 16 Exabytes of addressable memory)
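
The addressable-memory figure can be checked directly. A 64-bit address can select 2^64 locations; assuming one byte per location and the 1024-based units defined above, that works out at 16 exabytes.

```python
ADDRESS_BITS = 64

# Assumption: each address refers to one byte of memory.
addressable_bytes = 2 ** ADDRESS_BITS

# 1 exabyte = 1024 petabytes = ... = 1024^6 bytes, using the units above.
exabyte = 1024 ** 6

exabytes = addressable_bytes // exabyte
print(exabytes, "exabytes")
```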


Questions (Page 137)

  • The IA-64 uses VLIW. What does this mean?

  • Can the Itanium be described as a superscalar architecture?

  • IA-64 chips use predication. Explain the difference between predication and branch prediction.

  • How can an 800MHz Itanium outperform a 2.5GHz Pentium?


Answers

Q19: VLIW = very long instruction word; the IA-64 fetches a 128 bit bundle containing 3 41-bit instructions during each memory fetch

Q20: yes, it has 11 execution units which can operate in parallel

Q21: branch prediction means "guessing" whether or not a branch will be taken, and executing the following instructions accordingly - if the prediction is wrong, the pipeline will stall; predication means executing instructions from both branches simultaneously, and discarding the results from the branch which is not required

Q22: due to its parallel execution units, 10 stage pipeline, VLIW memory accessing and use of predication and speculative loading, the Itanium can process up to 20 operations per cycle.


Intel Itanium

  • Intel has released two processor families using the brand: the original Itanium and the Itanium 2.

  • Starting November 1, 2007, new members of the second family are again called Itanium.

  • The processors are marketed for use in enterprise servers and high-performance computing systems.


Dual Core

  • Dual-core refers to a CPU that includes two complete execution cores per physical processor.


Parallel Computing


SQA arrangements

  • Description of how parallel computers function referring to their use of:

    • local (cache) as well as main memory

    • pipelining

    • local pathways and packet switching to achieve communication between CPUs.

  • Description of the performance benefits of parallel computers.


Examples of parallel computing

  • Pipelining - executing one instruction while fetching the next

  • Superscalar architecture – multiple execution units all processing different operations simultaneously

  • SIMD instructions – the same instruction being applied to several data items at the same time

All of these are forms of parallelism within a single processor
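
The SIMD idea can be modelled in plain Python. This is a software model only, not real SIMD code: the number of items per register (8) and the 8-bit item size are assumptions made for the example, and the function names are hypothetical. The point is the instruction count: one modelled SIMD instruction updates a whole register of values, where scalar code needs one add per value.

```python
LANES = 8        # data items held in one register (assumed for illustration)
LANE_MAX = 255   # each item is treated as an 8-bit value (assumed)

def scalar_brighten(pixels, amount):
    # One add per pixel: as many instructions as there are pixels.
    out = []
    for p in pixels:
        out.append(min(p + amount, LANE_MAX))
    return out

def simd_add_saturate(register, amount):
    # Models ONE instruction that updates every item in the register together.
    return [min(lane + amount, LANE_MAX) for lane in register]

def simd_brighten(pixels, amount):
    out = []
    instructions = 0
    for start in range(0, len(pixels), LANES):
        out.extend(simd_add_saturate(pixels[start:start + LANES], amount))
        instructions += 1
    return out, instructions

pixels = list(range(0, 256, 8))          # 32 sample pixel values
result, count = simd_brighten(pixels, 40)
print(f"{len(pixels)} pixels brightened with {count} modelled SIMD instructions")
```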


Parallel Computing

  • Another approach is to have multiple processors

  • This is the basis of most mainframe computers and supercomputers


Parallel Computing

  • Using multiple processing elements simultaneously to solve a problem.

  • This is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others.

  • The processing elements can include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above
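
Breaking a problem into independent parts can be sketched as follows. The example splits a list into chunks, hands each chunk to a worker, and combines the partial results. Threads from Python's standard library stand in for the processing elements here; this shows the decomposition only and is not a claim about speed, since Python threads do not normally run this kind of calculation truly in parallel. The function names and the choice of 4 workers are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # The work done by one processing element on its own part of the data.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Split the data into roughly equal, independent chunks.
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Each chunk is processed separately, then the partial results are combined.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

data = list(range(1000))
total = parallel_sum_of_squares(data)
print(total)
```

The approach only works because no chunk needs a result from any other chunk.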


Multiprocessing, mainframes and supercomputers

  • Simplest – several processors connected to the same system bus…


Multiprocessing, mainframes and supercomputers

  • Each processor has shared access to memory and to I/O devices

  • Master-slave – some systems have one processor controlling the others

  • Symmetrical Multiprocessing (SMP) – in other systems all the processors are equal (up to 10 processors)


Multiprocessing

  • Not limited to mainframe systems

  • PowerMac G5 dual processor desktop system has 2 G5 processors


Comparison


Massively Parallel Architectures


Massive parallel processing

  • Massive parallel processing (MPP) is a term used in computer architecture to refer to a computer system with many independent arithmetic units or entire microprocessors that run in parallel.

  • The term massive connotes hundreds if not thousands of such units.

  • processors are arranged in an interconnected array which serves as a network.

  • Early examples of such a system are the Distributed Array Processor, the Goodyear MPP, the Connection Machine, and the Ultracomputer.


Massively Parallel Architectures

  • Today's most powerful supercomputers are all MP systems such as

    • Earth Simulator,

    • Blue Gene,

    • ASCI White,

    • ASCI Red,

    • ASCI Purple, and

    • ASCI Thor's Hammer.


Massively Parallel Architectures

Memory

  • Each processor has access to its own local memory or cache

  • All processors can access a main (global) memory by a systemwide bus


Massively Parallel Architectures

  • processors are pipelined - the results from one processor can become the input for another processor

  • as well as the system bus, there may be local pathways connecting groups of processors into clusters, and other pathways connecting clusters


MP Architectures - communication

To achieve communication between processors, parallel computers use:

  • data pathways (buses) to connect clusters of processors, as well as system buses to connect processors and pipelines, enabling the results of one CPU to flow into another

    Or

  • packet switching techniques similar to those used in networks to manage the flow of data between processors.


MP Architectures - communication

  • Packet switching techniques, similar to those on a network, are used in which data packets are assigned the addresses of specific nodes (processors) on the array.

  • This enables any processor on the array to access the local memory of any other processor on the array or to pass data or instructions to other processors.
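
A rough sketch of the idea: each packet carries the address of its destination node, and every node it reaches passes it one step closer. The 2-dimensional grid, the "x first, then y" routing rule and all names below are assumptions made for the example; real machines use their own topologies and routing schemes.

```python
from collections import namedtuple

# A packet carries the address of the node it is going to, plus its data.
Packet = namedtuple("Packet", ["source", "destination", "payload"])

def next_hop(current, destination):
    """Move one step towards the destination: along x first, then along y."""
    x, y = current
    dest_x, dest_y = destination
    if x != dest_x:
        return (x + (1 if dest_x > x else -1), y)
    if y != dest_y:
        return (x, y + (1 if dest_y > y else -1))
    return current

def route(packet):
    """Return the list of nodes the packet visits, including both ends."""
    path = [packet.source]
    while path[-1] != packet.destination:
        path.append(next_hop(path[-1], packet.destination))
    return path

packet = Packet(source=(0, 0), destination=(2, 1), payload="partial result")
print(route(packet))
```

Because the address travels with the data, any node can send to any other node without a dedicated connection between them.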


MPP


Examples - Lucidor

  • It consists of 90 interconnected nodes. Each node has two 900MHz Itanium 2 processors accessing 16K of L1 cache, and 256K of L2 cache.

  • Each node can access the system bus via a 128 port switch at a data transfer rate of 2Gbits per second.

  • In addition to the local memory, each node has shared access to 6GB of main memory.

  • As a result, the system can achieve data processing rates of over 600 GFLOPS.


Hitachi SR2201

  • Has from 8 up to 2048 processors. The processors (Hitachi RISC chips) are arranged in a 3-dimensional grid to maximise communication between them.

  • As with Lucidor, speeds of up to 600 GFLOPS can be achieved.

  • These systems are in use for a variety of applications, including structural and crash analysis, fluid dynamics research, quantum chemistry analysis and visualisation tools.

  • All of these can make use of the parallel architecture, as they require high speed processing of large amounts of data.


Cray

  • The Cray T3D is a current example, with 2048 nodes arranged in a 3-dimensional grid.

  • Each node has 2 Alpha processors, with access to individual cache and 8Mwords of memory.

  • Cray claims that this system can process 1 trillion floating point operations per second (1 TFLOPS).


Cray 2


Blue Gene

  • Blue Gene is a computer architecture project designed to produce several supercomputers, designed to reach operating speeds in the PFLOPS (petaFLOPS) range, and currently reaching sustained speeds of nearly 500 TFLOPS (teraFLOPS).

  • Blue Gene/L has 65,536 dual-processor chips (131,072 processors).

  • Each is connected by 3 networks.

  • At the time of writing, Blue Gene/L is the fastest computer in the world, achieving over 70 TFLOPS.


Blue Gene

Chip – 2 processors

Card – 2 chips

Node – 16 cards

Cabinet – 32 nodes

System – 64 cabinets
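
Multiplying the levels of this hierarchy together gives the size of the full system. Using the counts listed above, the total is 131,072 processors on 65,536 chips, so the figure of 65,536 quoted for Blue Gene/L corresponds to the number of chips.

```python
# Counts taken from the hierarchy above, smallest unit first.
levels = [
    ("processors per chip", 2),
    ("chips per card", 2),
    ("cards per node", 16),
    ("nodes per cabinet", 32),
    ("cabinets per system", 64),
]

processors = 1
for name, count in levels:
    processors *= count

chips = processors // 2   # two processors on each chip
print(f"{chips:,} chips, {processors:,} processors")
```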


Blue Gene


Exercise

  • Research one of the following –

    • Earth Simulator,

    • Blue Gene,

    • ASCI White,

    • ASCI Red,

    • ASCI Purple, and

    • ASCI Thor's Hammer.

  • In terms of

    • Number of nodes

    • Number of processors at each node

    • Global memory

    • Processing power in teraflops

    • Applications


Past Paper Questions

  • 2011 Q 14

  • 2008 Q 13

  • 2007 Q17a,b

  • 2006 Q15b


Past Paper 2008

JGT(37) If the flag is set jump to location 37

Describe the problem that instruction JGT(37) could cause for a processor using a pipeline.
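
The cost of a conditional jump in a pipeline can be estimated with a simple cycle count. In the sketch below an ideal pipeline with k stages completes n instructions in k + n - 1 cycles. Each time instructions fetched after a jump turn out to be the wrong ones, they are thrown away and the pipeline has to refill. The penalty of (stages - 1) cycles per wrong fetch is an assumption for illustration, and real processors differ.

```python
def cycles_needed(instructions, stages, flushes=0):
    """Estimated cycles for a simple pipeline with a given number of flushes."""
    # The first instruction takes `stages` cycles; after that, one
    # instruction completes every cycle.
    ideal = stages + instructions - 1
    # Assumed penalty: each flush wastes (stages - 1) cycles while the
    # pipeline refills with the correct instructions.
    return ideal + flushes * (stages - 1)

no_jumps = cycles_needed(100, stages=5)
with_jumps = cycles_needed(100, stages=5, flushes=10)
print(no_jumps, with_jumps)
```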


2009

Mediatrain is a company which uses a high performance computer system to produce multimedia training projects. The computer system has a PowerPC superscalar processor which has thirty two 64-bit general purpose registers.

(a) The PowerPC is an example of a RISC processor. RISC processors have a large number of general purpose registers. Name three other features of a RISC processor that distinguish it from a CISC processor. (3)


2009 cont

c. Explain the benefit to the PowerPC processor of having so many general purpose registers (2)

d. Most of the instructions in the PowerPC processor instruction set have an op-code and an operand.

Describe the function of the op-code and the operand. (2)


2009 cont

e. Superscalar processing involves the use of multiple pipelines.

State a feature of the PowerPC processor which makes it suited to superscalar processing. Justify your answer. (4)


2009 cont

f. Branch instructions can cause a problem for processors which use pipelines.

Branch prediction can reduce this problem.

Describe how branch prediction operates. (3)


2009

(g) The PowerPC processor makes use of Single Instruction Multiple Data (SIMD) instructions.

Explain how the use of SIMD instructions improves performance, using a suitable multimedia example. (3)


2008

15. The Pentium III processor has eight registers which can be operated on by SIMD instructions.

(a) Describe what is meant by a SIMD instruction. (1)

(b) Describe how the Pentium III could use SIMD instructions and registers when adjusting the brightness of a graphic. (3)