
Other Architectures & Examples

Multithreaded architectures

Dataflow architectures

Multiprocessor examples

1st May, 2006

Anshul Kumar, CSE IITD


Context switching

  • Delays and poor resource utilization due to -

    • Data/control hazards

    • cache misses

    • waiting for some event

  • Solution –

    • context switch to another thread

  • Context switch mechanism –

    • operating system - slow

    • hardware - fast
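The effect of switch cost on utilization can be sketched with a toy simulation. This is a minimal sketch, not a model of any particular machine: it assumes every thread alternates a fixed run of useful cycles with a fixed stall, and that the switch cost is paid whenever a different thread takes over the pipeline. The function and parameter names (`utilization`, `run_len`, `stall_len`, `switch_cost`) are illustrative, not from the slides.

```python
def utilization(num_threads, run_len, stall_len, switch_cost, total_cycles=10_000):
    """Fraction of cycles spent on useful work in a toy single-pipeline model."""
    ready_at = [0] * num_threads  # cycle at which each thread may run again
    busy = 0
    now = 0
    current = None
    while now < total_cycles:
        ready = [t for t in range(num_threads) if ready_at[t] <= now]
        if not ready:
            now = min(ready_at)  # pipeline idles until some thread wakes up
            continue
        nxt = min(ready, key=lambda t: ready_at[t])  # thread that became ready first
        if current is not None and nxt != current:
            now += switch_cost  # assumed cost of changing context
        current = nxt
        busy += max(0, min(run_len, total_cycles - now))  # count only cycles inside the window
        now += run_len
        ready_at[nxt] = now + stall_len  # thread stalls (e.g. on a cache miss)
    return busy / total_cycles
```

Comparing, say, `utilization(4, 10, 40, 1)` with `utilization(4, 10, 40, 200)` gives a rough feel for why a cheap hardware switch pays off while an expensive operating-system switch may not.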

Multithreaded architecture

  • Hardware context switching

  • Models

    • control flow or hybrid (control flow, data flow)

  • Granularity

    • fine grain or coarse grain

  • Memory organization

    • shared?, distributed?, cache coherent?

  • No. of threads

    • small, medium, large

ILP and Multithreading

[Figure: comparison of ILP, coarse-grained MT, fine-grained MT and SMT. Source: Hennessy and Patterson]


Chip level multithreading
Chip level multithreading

Executing instructions from multiple threads within one processor chip at the same time.

  • Multithreading: Interleaved issue of multiple instructions from different threads

  • Simultaneous multithreading (SMT): Issue multiple instructions from multiple threads in one cycle.

  • Chip-level multiprocessing (CMP or multicore): integrates two or more superscalar processors into one chip, each executing one thread independently

  • Any combination of multithreading/SMT/CMP

Wikipedia
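The difference between interleaved issue and SMT can be illustrated by counting filled issue slots. This is a rough sketch under simplifying assumptions: `ready[c][t]` is a made-up number of instructions thread `t` could issue in cycle `c`, unissued instructions do not carry over to later cycles, and interleaving is strict round-robin. The function names are hypothetical.

```python
def issued_fine_grained(ready, width):
    """Interleaved issue: one thread owns all issue slots in a given cycle."""
    total = 0
    for cycle, per_thread in enumerate(ready):
        thread = cycle % len(per_thread)  # strict round-robin thread selection
        total += min(width, per_thread[thread])
    return total


def issued_smt(ready, width):
    """SMT: the slots of one cycle may be filled from several threads."""
    return sum(min(width, sum(per_thread)) for per_thread in ready)


# Two threads, four cycles, a 4-wide issue stage (illustrative numbers).
ready = [[2, 1], [0, 3], [1, 1], [4, 0]]
print(issued_fine_grained(ready, 4))  # 6
print(issued_smt(ready, 4))           # 12
```

In this example SMT fills more slots because a cycle in which the selected thread has little to issue can still draw work from the other thread.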

Historical Examples

Machine          Granularity  Procs              Threads/proc        Memory              Year
HEP (Denelcor)   fine         max 16             8 active, 64 max    shared centralized  1978
Tera             fine         max 256            128                 distributed shared  1990
Alewife (MIT)    coarse       max 512 (Sparcle)  1 active, 3 loaded  CC                  1990

Modern examples

  • Pentium 4: Hyper-Threading

  • MIPS MT: 8 cores with 4 threads each

  • IBM POWER5: dual core, 2 threads each

  • UltraSPARC T1: fine-grained multithreading



HEP

[Figure: HEP processor organization. Labels: control loop; 8-stage pipeline; PSW queue; program memory; matching unit; increment control; registers; operand fetch; function units FU1 to FUn; scheduler function unit (SFU), to/from data memory]

Control Flow & Data Flow models

  • Control Flow (von Neumann)

    • control flows through a sequence of instructions, branches can alter the flow

    • instructions get data from or put data in memory

    • explicit parallelism through control operators – fork/join

  • Data Flow

    • instructions are triggered by availability of data

    • data flows from instruction to instruction

    • explicit parallelism
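On the control-flow side, fork/join parallelism can be sketched as follows for the expression R=(A-B)*(B+1) used in the dataflow slides. This is only an illustration: a thread pool stands in for the fork and join control operators, and the function name `compute` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(a, b):
    """Evaluate (a - b) * (b + 1) with the two independent parts forked off."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        diff = pool.submit(lambda: a - b)  # fork: first independent computation
        incr = pool.submit(lambda: b + 1)  # fork: second independent computation
        # join: result() blocks until each forked computation has finished
        return diff.result() * incr.result()


print(compute(7, 3))  # 16
```

The programmer states the parallelism explicitly here; in the dataflow model the same two operations become eligible to run as soon as their operands are available.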

Dataflow Model

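[Figure: dataflow graph. Inputs A and B feed a "-" node producing A-B; B and the constant 1 feed a "+" node producing B+1; both results feed a "*" node.]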

R=(A-B)*(B+1)

Dataflow Program

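[Figure: dataflow program as instruction templates with destination fields]

L1: Compute B       -> L2/2, L3/1
L2: -  (A, B)       -> L4/1    (A-B)
L3: +  (B, 1)       -> L4/2    (B+1)
L4: *  (A-B, B+1)   -> L6/1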

R=(A-B)*(B+1)
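A data-driven execution of this program can be sketched with a small token interpreter. The sketch rests on assumptions that the slide does not spell out: a destination such as L4/1 is read as "operand slot 1 of instruction L4", L2 is taken to be the subtraction and L3 the addition, and L1 ("Compute B") is modelled simply as the source of the two B tokens. The names `run`, `program` and `tokens` are illustrative.

```python
import operator
from collections import deque


def run(a, b):
    """Fire each instruction once both of its operand slots hold a token."""
    # label: (operation, operand slots, destinations as (label, slot))
    program = {
        "L2": (operator.sub, [None, None], [("L4", 1)]),
        "L3": (operator.add, [None, None], [("L4", 2)]),
        "L4": (operator.mul, [None, None], [("L6", 1)]),
    }
    # Initial tokens: A, the constant 1, and the two copies of B sent by L1.
    tokens = deque([("L2", 1, a), ("L2", 2, b), ("L3", 1, b), ("L3", 2, 1)])
    fired = []
    while tokens:
        label, slot, value = tokens.popleft()
        if label == "L6":  # L6 is outside the program: its token carries the result
            return value, fired
        op, slots, dests = program[label]
        slots[slot - 1] = value
        if None not in slots:  # firing rule: all operands have arrived
            fired.append(label)
            result = op(*slots)
            tokens.extend((dest, s, result) for dest, s in dests)
    raise RuntimeError("program finished without producing a result")


print(run(7, 3))  # (16, ['L2', 'L3', 'L4'])
```

No program counter appears anywhere: the order in which L2 and L3 fire depends only on token arrival, and L4 cannot fire until both have produced their results.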

Static Dataflow Architecture

[Figure: static dataflow processing element. Labels: activity store; fetch unit; instruction queue; function units FU1 to FUn; update unit; to/from other PEs]

Tagged-token dataflow architecture

[Figure: tagged-token dataflow processing element. Labels: token queue; matching unit; matching store; instruction/data memory; fetch unit; function units FU1 to FUn; form token unit; to/from other PEs]

UMA Examples

  • Earlier approach: large number of processors (e.g. Denelcor HEP, NYU Ultracomputer)

  • Current understanding: good only for a small number of processors (e.g. Encore Multimax in the 1980s, SGI Power Challenge in the 1990s)

SGI Power Challenge

  • 18 MIPS R8000 processors

  • 16 GB RAM, 8-way interleaved

  • 4 POWERchannel-2 I/O buses, 320 MB/s each

  • POWERpath-2: split-transaction shared bus (256-bit data, 40-bit address)

  • Snoopy cache coherence protocol
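How snooping keeps caches consistent on a shared bus can be sketched as below. The slide names the protocol family only, so this sketch assumes a basic three-state (Modified, Shared, Invalid) write-invalidate scheme purely for illustration; it is not a description of the protocol the Power Challenge actually implements. It tracks a single memory line, and the class name `SnoopyBus` is hypothetical.

```python
class SnoopyBus:
    """Toy MSI write-invalidate protocol for one memory line shared by several caches."""

    def __init__(self, num_caches):
        self.state = ["I"] * num_caches  # per-cache state of the line: M, S or I

    def read(self, cache):
        if self.state[cache] == "I":
            # The read miss is broadcast on the bus; a cache holding the line
            # in Modified state supplies the data and downgrades to Shared.
            self.state = ["S" if s == "M" else s for s in self.state]
            self.state[cache] = "S"

    def write(self, cache):
        if self.state[cache] != "M":
            # The write is broadcast; every other cache snoops it and
            # invalidates its copy, leaving a single Modified owner.
            self.state = ["I"] * len(self.state)
            self.state[cache] = "M"


bus = SnoopyBus(3)
bus.read(0)
bus.read(1)
bus.write(2)
print(bus.state)  # ['I', 'I', 'M']
```

Every transaction is visible to all caches, which is why this style of protocol suits a single shared bus and a modest processor count.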

NUMA Examples

  • BBN TC2000

  • IBM RP3

  • Hector

  • Cray T3D

Hector

  • Hierarchical Structure

    global ring

    local rings

    stations

    Proc module (P+C+M)

    I/O module
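The hierarchy implies that a memory access travels only as high as it needs to. A minimal sketch of that idea follows; the three-part address (local ring, station, module) and the names `ModuleAddr` and `interconnect_level` are assumptions made for illustration, not Hector's actual addressing scheme.

```python
from typing import NamedTuple


class ModuleAddr(NamedTuple):
    ring: int      # which local ring
    station: int   # which station on that ring
    module: int    # which processor module on the station bus


def interconnect_level(src, dst):
    """Highest level of the hierarchy a request from src to dst must cross."""
    if src.ring != dst.ring:
        return "global ring"
    if src.station != dst.station:
        return "local ring"
    if src.module != dst.module:
        return "station bus"
    return "same module"


print(interconnect_level(ModuleAddr(0, 1, 2), ModuleAddr(0, 1, 3)))  # station bus
print(interconnect_level(ModuleAddr(0, 1, 2), ModuleAddr(1, 0, 0)))  # global ring
```

Access cost grows with the level returned, which is what makes the memory access time non-uniform.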

Hector

[Figure: Hector organization. Stations sit on local rings, and the local rings are joined by a global ring. Inside a station, a station bus connects the station controller, the processor modules and an I/O module.]

Cray T3D

  • Alpha 21064 processors, Cray Y-MP host

  • up to 128 GB memory

  • 4x4x4 3D torus, configurable up to 8x8x8

  • 2 PEs in each node
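The figures above can be turned into a small worked example: the number of PEs in a given configuration and the hop distance between two nodes of a torus. This is a sketch assuming minimal routing over the wraparound links; the function names are illustrative.

```python
def torus_hops(src, dst, dims):
    """Minimum hop count between two nodes of a torus with wraparound links."""
    return sum(min(abs(s - d), n - abs(s - d)) for s, d, n in zip(src, dst, dims))


def pe_count(dims, pes_per_node=2):
    """Total PEs for a torus of the given dimensions (2 PEs per node on the slide)."""
    nodes = 1
    for n in dims:
        nodes *= n
    return nodes * pes_per_node


print(pe_count((4, 4, 4)))                            # 128
print(pe_count((8, 8, 8)))                            # 1024
print(torus_hops((0, 0, 0), (7, 7, 7), (8, 8, 8)))    # 3, thanks to wraparound
print(torus_hops((0, 0, 0), (4, 4, 4), (8, 8, 8)))    # 12, the worst case
```

The wraparound links halve the worst-case distance along each dimension compared with a plain mesh of the same size.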

CC-NUMA examples

Machine              Nodes                          Mem             Cache               Net
Wisconsin Multicube  single proc                    per column bus  snoopy              bus grid
Aquarius Multimulti  single proc                    per node        snoopy + directory  bus grid
Stanford Dash        cluster: 4 R3000 + FPU on bus  per cluster     snoopy + directory  pair of meshes
Stanford Flash       single proc: T5 + magic chip   per node        directory           2D mesh
Convex Exemplar      hyper node: 8 PA-RISC          per hyper node  SCI                 X bar (hyper node), multi rings

Magic chip : memory + I/O + network controller

COMA examples

  • DDM (Data Diffusion Machine)

    • single bus (split transaction)

    • can be made hierarchical

  • KSR 1

    • hierarchical rings

    • distributed directory is a matrix :

      rows for pages, columns for caches
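The matrix view of the directory can be sketched as a simple data structure. The slide gives only the row and column meaning, so the content of each entry is an assumption here: a single presence bit saying whether a cache currently holds the page. The class and method names are hypothetical.

```python
class DirectoryMatrix:
    """Directory as a matrix: one row per page, one column per cache."""

    def __init__(self, num_pages, num_caches):
        # Assumed entry content: True if the cache holds a copy of the page.
        self.present = [[False] * num_caches for _ in range(num_pages)]

    def record(self, page, cache):
        self.present[page][cache] = True

    def evict(self, page, cache):
        self.present[page][cache] = False

    def holders(self, page):
        """Read one row: all caches that currently hold the page."""
        return [c for c, bit in enumerate(self.present[page]) if bit]


directory = DirectoryMatrix(num_pages=4, num_caches=3)
directory.record(page=2, cache=0)
directory.record(page=2, cache=2)
print(directory.holders(2))  # [0, 2]
```

Since a COMA machine has no fixed home memory for data, a lookup of this kind is what locates a copy when a cache misses.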

Distr Mem Arch Examples

Machine            Comp. proc  Comm. proc  Vec. proc  Switch       Topology
nCUBE2             custom      custom      -          -            hypercube
iPSC2              i386        yes         yes        -            hypercube
Intel Paragon      i860        i860        -          custom       2D mesh
Genesis            i870        i870        -          custom       2-level X bar
Manna              i860        i860        -          16x16 X bar  hierarch.
Parsytec           P.PC601     T805        -          C004         3D mesh
Transtech Paramid  i860        T805        -          C004         variable
IBM SP2            Power2      i860        -          custom       fat tree
Meiko CS-2         SPARC       custom      Fujitsu    custom       fat tree
Parsys SN9800      T900        T900        -          C104         hierarch. sw

References

  • D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures : A Design Space Approach", Addison Wesley, 1997.
