Computer Architecture Guidance

Keio University

AMANO, Hideharu

hunga@am.ics.keio.ac.jp

Contents

  • Techniques of two key architectures for future system LSIs
    • Parallel Architectures
    • Reconfigurable Architectures
  • Advanced uni-processor architecture
    → Special Course of Microprocessors (by Prof. Yamasaki, Fall term)

Class
  • Lecture using PowerPoint (70 min.)
    • The ppt file is uploaded to the web site http://www.am.ics.keio.ac.jp, and you can download and print it before the lecture.
    • When the file is uploaded, a message is sent to you by e-mail.
    • Textbook: “Parallel Computers” by H. Amano (Sho-ko-do)
  • Exercise (20 min.)
    • Simple design or calculation on design issues
Evaluation
  • Exercise on Parallel Programming using SCore on RHiNET-2 (50%)
  • Exercise after every lecture (50%)

Computer Architecture 1: Introduction to Parallel Architectures

Keio University

AMANO, Hideharu

hunga@am.ics.keio.ac.jp

Parallel Architecture

A parallel architecture consists of multiple processing units which work simultaneously.

  • Purposes
  • Classifications
  • Terms
  • Trends

Boundary between Parallel Machines and Uniprocessors

  • ILP (Instruction Level Parallelism): Uniprocessors
    • A single Program Counter
    • Parallelism inside/between instructions
  • TLP (Thread Level Parallelism): Parallel Machines
    • Multiple Program Counters
    • Parallelism between processes and jobs

Definition: Hennessy & Patterson’s Computer Architecture: A Quantitative Approach

Increasing the number of simultaneously issued instructions vs. tight coupling

[Figure: two routes to performance improvement. Increasing simultaneously issued instructions: single pipeline, multiple instruction issue, multiple thread execution. Tight coupling: connecting multiple processors, shared memory / shared register, on-chip implementation.]

Purposes of providing multiple processors
  • Performance
    • A job can be executed quickly with multiple processors
  • Fault tolerance
    • If a processing unit is damaged, the total system remains available: Redundant systems
  • Resource sharing
    • Multiple jobs share memory and/or I/O modules for cost effective processing: Distributed systems
  • Low power
    • High performance with low frequency operation

Parallel Architecture: Performance Centric!

Flynn’s Classification
  • The number of Instruction Streams: M (Multiple) / S (Single)
  • The number of Data Streams: M / S
    • SISD
      • Uniprocessors (including superscalar and VLIW)
    • MISD: Not existing (Analog Computer)
    • SIMD
    • MIMD
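
The naming rule above can be written down in a few lines. This is only a sketch for illustration: the function name `flynn_class` and the idea of passing stream counts as integers are assumptions, not part of the lecture.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class name for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # S (Single) for exactly one stream, M (Multiple) otherwise.
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    print(flynn_class(1, 1))    # a uniprocessor
    print(flynn_class(1, 16))   # one instruction stream, 16 data streams
    print(flynn_class(8, 8))    # 8 independent processors
```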

SIMD (Single Instruction Stream, Multiple Data Streams)
  • All Processing Units execute the same instruction
  • Low degree of flexibility
  • Illiac-IV/MMX (coarse grain)
  • CM-2 type (fine grain)

[Figure: an instruction memory supplies the same instruction to all processing units; each processing unit has its own data memory.]
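
A minimal sketch of the SIMD idea, assuming each processing unit is modelled as one position in a Python list: a single instruction is broadcast, and every unit applies it to the operands in its own data memory. The helper name `simd_step` is hypothetical, and a real machine would do this in hardware lockstep rather than in a loop.

```python
import operator


def simd_step(instruction, local_a, local_b):
    """Broadcast one instruction; each processing unit applies it to its own data."""
    return [instruction(a, b) for a, b in zip(local_a, local_b)]


# Each list index stands for the data memory of one processing unit.
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

sums = simd_step(operator.add, a, b)      # every unit executes "add"
products = simd_step(operator.mul, a, b)  # every unit executes "mul"
```

The low flexibility noted above shows up directly: no unit can execute a different instruction from the others within one step.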

Two types of SIMD
  • Coarse grain: Each node performs floating point numerical operations
    • ILLIAC-IV, BSP, GF-11
    • Multimedia instructions in recent high-end CPUs
    • Dedicated on-chip approach: NEC’s IMEP
  • Fine grain: Each node performs operations on only a few bits
    • ICL DAP, CM-2, MP-2
    • Image/Signal Processing
    • The Connection Machine extends the application to Artificial Intelligence (CmLisp)

A processing unit of CM-2

[Figure: a 1-bit serial ALU attached to a 256-bit memory. Labels: Flags, A, B, F, OP, C, Context, s, c.]
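
To show what a 1-bit serial ALU implies, the sketch below adds two multi-bit numbers one bit per step, keeping the carry between steps. It is a generic bit-serial adder and not a description of the actual CM-2 ALU; reading the figure labels s and c as sum and carry is an assumption, and the names `full_adder` and `bit_serial_add` are hypothetical.

```python
def full_adder(a, b, carry_in):
    """One step of a 1-bit ALU: returns (sum bit, carry out)."""
    s = a ^ b ^ carry_in
    c = (a & b) | (a & carry_in) | (b & carry_in)
    return s, c


def bit_serial_add(x, y, width=8):
    """Add two width-bit numbers, processing one bit position per step."""
    carry = 0
    result = 0
    for i in range(width):          # least significant bit first
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry            # carry is the overflow out of the top bit
```

A width-bit addition takes width steps, which is why fine grain machines rely on a very large number of processing units to reach high throughput.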

Element of CM2

[Figure: one LSI chip holds a 4x4 processor array (16 PEs) and a router, with 256 bit x 16 PE RAM; instructions are supplied to the chip. 4096 chips = 64K PEs. The chips form a 4096-node hypercube connection with 12 links each.]
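
The numbers in the figure fit together: 4096 = 2^12, so a hypercube of 4096 chips needs 12 links per chip, and 4096 chips x 16 PEs = 64K PEs. The sketch below checks this and lists the neighbours of a chip under the usual hypercube numbering, where two chips are linked if their numbers differ in exactly one bit. The chip numbering and the function names are assumptions for illustration.

```python
DIMENSIONS = 12                 # links per chip
CHIPS = 2 ** DIMENSIONS         # 4096 chips
PES_PER_CHIP = 4 * 4            # 4x4 processor array on each chip


def neighbors(chip):
    """Chips directly linked to `chip`: flip one address bit per link."""
    return [chip ^ (1 << d) for d in range(DIMENSIONS)]


def hops(src, dst):
    """Minimum number of links between two chips (Hamming distance)."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    print(CHIPS * PES_PER_CHIP)   # total number of PEs
    print(neighbors(0))
    print(hops(0, CHIPS - 1))     # the two most distant chips
```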

The future of SIMD
  • Coarse grain SIMD
    • Large scale supercomputers like Illiac-IV/GF-11 will not be revived.
    • Multimedia instructions will be used in the future.
    • Special purpose on-chip systems will become popular.
  • Fine grain SIMD
    • Advantageous for specific applications like image processing
    • General purpose machines are difficult to build, e.g. CM2 → CM5

MIMD

Each processor executes individual instructions

  • Synchronization is required
  • High degree of flexibility
  • Various structures are possible

[Figure: processors connected through interconnection networks to memory modules (instructions and data).]
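
The following sketch illustrates why synchronization is required when each processor runs its own program. Two Python threads stand in for two processors (an assumption made only so the example runs on one machine): one computes a sum, the other a count, and a barrier guarantees that both results exist before the mean is computed. All names are hypothetical.

```python
import threading


def run_mimd(data):
    shared = {}                          # stands in for shared memory
    barrier = threading.Barrier(2)       # both processors must arrive

    def processor_a():
        shared["total"] = sum(data)      # program A: summation
        barrier.wait()                   # wait until B has written "count"
        shared["mean"] = shared["total"] / shared["count"]

    def processor_b():
        shared["count"] = len(data)      # program B: counting
        barrier.wait()                   # signal that "count" is ready

    threads = [threading.Thread(target=processor_a),
               threading.Thread(target=processor_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["mean"]
```

Without the barrier, processor A could read `count` before processor B has written it.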

Classification of MIMD machines: Structure of shared memory
  • UMA (Uniform Memory Access Model)
    • Provides shared memory which can be accessed from all processors in the same manner.
  • NUMA (Non-Uniform Memory Access Model)
    • Provides shared memory, but it is not uniformly accessed.
  • NORA/NORMA (No Remote Memory Access Model)
    • Provides no shared memory. Communication is done with message passing.
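
The difference between the three models can be sketched as a toy cost function. The cost values below are invented placeholders, not measurements, and the function name `access_cost` is hypothetical; only the shape of the result matters.

```python
# Hypothetical cost units, chosen only to make the three models differ.
UMA_COST = 2
LOCAL_COST = 1
REMOTE_COST = 10


def access_cost(model, cpu, home_node):
    """Cost for `cpu` to load a word whose memory belongs to `home_node`."""
    if model == "UMA":
        return UMA_COST                       # same cost from every processor
    if model == "NUMA":
        return LOCAL_COST if cpu == home_node else REMOTE_COST
    if model == "NORA":
        if cpu == home_node:
            return LOCAL_COST
        # No remote load exists; the data must be sent as a message.
        raise PermissionError("no remote memory access: use message passing")
    raise ValueError(f"unknown model: {model}")
```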

UMA
  • The simplest structure of shared memory machine
  • An extension of uniprocessors
  • An OS which is an extension of a single processor OS can be used.
  • Programming is easy.
  • System size is limited.
    • Bus connected
    • Switch connected
  • A total system can be implemented on a single chip
    • On-chip multiprocessor
    • Chip multiprocessor
    • Single chip multiprocessor

An example of UMA: Bus connected

[Figure: four PUs, each with a snoop cache, connected by a shared bus to the main memory.]

SMP (Symmetric MultiProcessor)

On chip multiprocessor
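
A snoop cache watches the shared bus and reacts to writes made by other PUs. The slide does not say which protocol is used, so the sketch below assumes the simplest one, write-through with write-invalidate. The class names `Bus` and `SnoopCache` and their methods are hypothetical.

```python
class Bus:
    """Shared bus in front of main memory; every cache snoops every write."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def read(self, addr):
        return self.memory[addr]

    def write(self, writer, addr, value):
        self.memory[addr] = value                 # write-through to main memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_invalidate(addr)      # other caches see the write


class SnoopCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                           # addr -> cached value
        bus.caches.append(self)

    def load(self, addr):
        if addr not in self.lines:                # miss: fetch over the bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)                # drop the stale copy


if __name__ == "__main__":
    bus = Bus({0: 5})
    pu0, pu1 = SnoopCache(bus), SnoopCache(bus)
    pu0.load(0)
    pu1.load(0)
    pu0.store(0, 7)          # invalidates the copy held by pu1
    print(pu1.load(0))       # pu1 misses and fetches the new value
```

Because every write travels over the one shared bus, the bus becomes the bottleneck as PUs are added, which matches the limited system size noted for UMA.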

Switch connected UMA

[Figure: CPUs, each with a local memory and an interface, connected through a switch to the main memory.]

The gap between switch and bus becomes small

NUMA
  • Each processor provides a local memory, and accesses other processors’ memory through the network.
  • Address translation and cache control often make the hardware structure complicated.
  • Scalable:
    • Programs for UMA can run without modification.
    • The performance improves as the system size grows.

Competitive with WS/PC clusters with Software DSM

Typical structure of NUMA

[Figure: Node 0 to Node 3 connected by an interconnection network; the memories of the nodes together form the logical address space.]
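
One simple way to build a single logical address space from four node memories is to give each node one contiguous block of addresses. The sketch below assumes that layout and a made-up memory size per node; real machines may interleave addresses differently, and the names are hypothetical.

```python
NODES = 4
NODE_MEMORY = 1024      # hypothetical number of words per node


def home_of(address):
    """Translate a logical address into (home node, offset in that node)."""
    if not 0 <= address < NODES * NODE_MEMORY:
        raise ValueError("address outside the logical address space")
    return divmod(address, NODE_MEMORY)


def is_local(cpu_node, address):
    """True if the access stays in the local memory of `cpu_node`."""
    node, _offset = home_of(address)
    return node == cpu_node
```

An access for which `is_local` is false has to cross the interconnection network, which is where the non-uniform access time comes from.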

Classification of NUMA
  • Simple NUMA:
    • Remote memory is not cached.
    • Simple structure, but the access cost of remote memory is large.
  • CC-NUMA: Cache Coherent NUMA
    • Cache consistency is maintained with hardware.
    • The structure tends to be complicated.
  • COMA: Cache Only Memory Architecture
    • No home memory
    • Complicated control mechanism

SGI Origin

[Figure: a Bristled Hypercube network connecting Hub Chips, each with its Main Memory.]

Main Memory is connected directly with the Hub Chip.

1 cluster consists of 2 PEs.

DDM (Data Diffusion Machine)

NORA/NORMA
  • No shared memory
  • Communication is done with message passing (inter-PU communications)
  • Simple structure but high peak performance
  • Hard to program

The fastest machine is always a NORA machine (except the Earth Simulator).

Example: cluster computing
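
A sketch of the message passing style, assuming Python threads and queues as stand-ins for nodes and network links. Threads do share an address space, so this only emulates NORA: the workers touch no shared variables and exchange data solely through explicit send (`put`) and receive (`get`) operations. The function names are hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    chunk = inbox.get()             # receive a message holding this node's data
    outbox.put(sum(chunk))          # send the partial result back to the root


def parallel_sum(data, n_workers=4):
    to_workers = [queue.Queue() for _ in range(n_workers)]
    to_root = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q, to_root))
               for q in to_workers]
    for t in threads:
        t.start()
    # The root distributes the data explicitly, one message per worker.
    for rank, q in enumerate(to_workers):
        q.put(data[rank::n_workers])
    # The root collects one partial sum from each worker.
    total = sum(to_root.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return total
```

Even this small example shows why programming is harder: the programmer has to decide how data is split, sent, and collected, which a shared memory machine does implicitly.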

Fujitsu’s NORA AP1000 (1990)
  • Mesh connection
  • SPARC
Intel’s Paragon XP/S (1991)
  • Mesh connection
  • i860
PC Cluster
  • Beowulf Cluster (NASA’s Beowulf Projects 1994, by Sterling)
    • Commodity components
    • TCP/IP
    • Free software
  • Others
    • Commodity components
    • High performance networks like Myrinet / InfiniBand
    • Dedicated software

Terms (1)
  • Multiprocessors:
    • MIMD machines with shared memory
    • Strict definition (by Enslow Jr.):
      • Shared memory
      • Shared I/O
      • Distributed OS
      • Homogeneous
    • Extended definition: All parallel machines (Wrong usage)

Terms (2)
  • Multicomputer
    • MIMD machines without shared memory, that is, NORA/NORMA
  • Array computer
    • A machine consisting of an array of processing elements: SIMD
    • A supercomputer for array calculation (Wrong usage)
  • Loosely coupled / Tightly coupled
    • Loosely coupled: NORA, Tightly coupled: UMA
    • But, are NORAs really loosely coupled??
    • Don’t use if possible

Classification

  • Stored programming based
    • SIMD
      • Fine grain
      • Coarse grain
    • MIMD
      • Multiprocessors
        • Bus connected UMA
        • Switch connected UMA
        • NUMA
          • Simple NUMA
          • CC-NUMA
          • COMA
      • Multicomputers
        • NORA
  • Others
    • Systolic architecture
    • Data flow architecture
    • Mixed control
    • Demand driven architecture

Systolic Architecture

[Figure: data streams x and y flowing through a computational array.]

Data streams are inserted into an array of special purpose computing nodes in a certain rhythm.

Introduced in Reconfigurable Architectures
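
A small simulation may help to picture the rhythm. The example below is an assumption chosen for brevity, not the array shown in the figure: a linear array of cells evaluates a polynomial by Horner's rule. Cell k holds one coefficient, a new x enters the array on every tick, and each value moves one cell to the right per tick. The function name `systolic_horner` is hypothetical.

```python
def systolic_horner(coeffs, xs):
    """Evaluate a polynomial (coefficients highest degree first) for each x.

    Returns (results, ticks). Cell k computes acc * x + coeffs[k] on the
    (x, acc) pair handed over by its left neighbour.
    """
    n = len(coeffs)
    regs = [None] * n               # output register of each cell
    pending = list(xs)
    results = []
    ticks = 0
    while pending or any(r is not None for r in regs[:-1]):
        feed = (pending.pop(0), 0) if pending else None
        sources = [feed] + regs[:-1]            # every cell reads its left neighbour
        regs = [None if s is None else (s[0], s[1] * s[0] + coeffs[k])
                for k, s in enumerate(sources)]  # all cells step at the same time
        if regs[-1] is not None:
            results.append(regs[-1][1])         # a finished value leaves the array
        ticks += 1
    return results, ticks
```

With three cells and three inputs the array finishes in five ticks instead of nine cell operations done one after another, because different cells work on different inputs in the same tick.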

Data flow machines

A process is driven by the data

[Figure: data flow graph for (a+b)×(c+(d×e)).]

Also introduced in Reconfigurable Systems
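
The expression above can be run as a tiny data flow program. In this sketch, which assumes a dictionary as the graph format, a node fires as soon as all of its input tokens are available, with no program counter deciding the order. The node names and the function `run_dataflow` are hypothetical.

```python
import operator


def run_dataflow(graph, inputs):
    """Fire every node whose operands have arrived until all nodes have fired."""
    tokens = dict(inputs)            # values that are currently available
    pending = dict(graph)            # nodes that have not fired yet
    while pending:
        ready = [name for name, (_op, srcs) in pending.items()
                 if all(s in tokens for s in srcs)]
        if not ready:
            raise ValueError("some node can never receive its inputs")
        for name in ready:           # nodes in `ready` could fire in parallel
            op, srcs = pending.pop(name)
            tokens[name] = op(*(tokens[s] for s in srcs))
    return tokens


# (a+b) x (c+(d x e)), deliberately listed in an order that is not executable
# top to bottom: the data, not the listing order, drives the execution.
GRAPH = {
    "result": (operator.mul, ("sum_ab", "sum_c_de")),
    "sum_c_de": (operator.add, ("c", "prod_de")),
    "prod_de": (operator.mul, ("d", "e")),
    "sum_ab": (operator.add, ("a", "b")),
}
```

In the first round both `a+b` and `d x e` are ready at once, which is the parallelism a data flow machine exploits.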

Exercise 1
  • In this class, a PC cluster, RHiNET, is used for exercises in parallel programming. Is RHiNET classified as a Beowulf cluster? Why do you think so?
  • If you take this class, send the answer with your name and student number to hunga@am.ics.keio.ac.jp