CS718 : Data Parallel Processors

27th April, 2006

Anshul Kumar, CSE IITD

Data Parallel Architectures
  • SIMD Processors
    • Multiple processing elements driven by a single instruction stream
  • Associative Processors
    • SIMD-like processors with associative memory
  • Vector Processors
    • Uniprocessors with vector instructions
  • Systolic Arrays
    • Application-specific VLSI structures

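SIMD

[Figure: SIMD block diagram. Labels: C (control), IS (instruction stream), P (processor), DS (data stream), M (memory).]

One of the earliest models of parallel computers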

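ILLIAC IV SIMD Model

[Figure: a control unit (CU) with I/O drives processing elements PE1, PE2, ..., PEn over a bus; each PE pairs a processor (P) with its own memory (M), and the PEs are linked by an interconnection network.]

Planned for 64 x 4 PEs; only 64 were built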

Burroughs Scientific Processor (BSP) Model

[Figure: a control unit (CU) with I/O drives processors P1, P2, ..., Pn over a bus; the processors reach memory modules M1, M2, ..., Mk through an interconnection network.]

SIMD algorithms: sum of vector elements

Input: a_0, a_1, ..., a_7

  • Step 1: S_i = a_i + a_{i+1}, i = 0, 2, 4, 6 (gives a_0+a_1, a_2+a_3, a_4+a_5, a_6+a_7)
  • Step 2: S_i = S_i + S_{i+2}, i = 0, 4 (gives a_0+a_1+a_2+a_3, a_4+a_5+a_6+a_7)
  • Step 3: S_i = S_i + S_{i+4}, i = 0 (gives a_0+a_1+...+a_7)

OR

  • Step 1: S_i = a_i + a_{i+4}, i = 0, 1, 2, 3
  • Step 2: S_i = S_i + S_{i+2}, i = 0, 1
  • Step 3: S_i = S_i + S_{i+1}, i = 0
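
The two schedules for summing the vector can be sketched in Python as follows. This is only a sequential simulation of the lock-step behaviour, assuming the vector length is a power of two; the function names `simd_sum_doubling` and `simd_sum_halving` are made up for this illustration.

```python
def simd_sum_doubling(a):
    """First schedule: stride 1, 2, 4, ... (S[i] += S[i + stride])."""
    s = list(a)
    n = len(s)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    stride, steps = 1, 0
    while stride < n:
        active = range(0, n, 2 * stride)          # PEs enabled in this step
        # All active PEs read their operands before any of them writes.
        updates = [s[i] + s[i + stride] for i in active]
        for i, value in zip(active, updates):
            s[i] = value
        stride *= 2
        steps += 1
    return s[0], steps


def simd_sum_halving(a):
    """Second schedule: stride n/2, n/4, ..., 1 with PEs 0..stride-1 active."""
    s = list(a)
    n = len(s)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    stride, steps = n // 2, 0
    while stride >= 1:
        updates = [s[i] + s[i + stride] for i in range(stride)]
        s[:stride] = updates
        stride //= 2
        steps += 1
    return s[0], steps
```

Both functions return the total together with the number of parallel steps, which is log2(n) in either schedule; the schedules differ only in which PEs are active and how far apart the operands lie.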

No. of processors vs time

Adding vector elements:

  • n processors: O(log n) steps
  • n/log n processors: O(log n) steps

Matrix multiplication:

  • n processors: O(n^2) steps
  • n^2 processors: O(n) steps
  • n^3 processors: O(log n) steps
  • n^3/log n processors: O(log n) steps

Important factors: data distribution, network
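
Why n/log n processors still reach about log n steps for vector addition can be seen in a small step-counting sketch. It assumes an idealized machine in which data distribution and communication are free; the function name `sum_with_p_processors` is hypothetical. Each processor first adds its own block of roughly log n elements sequentially, then the partial sums are combined in a tree.

```python
import math


def sum_with_p_processors(a, p):
    """Return (total, parallel_steps) for summing `a` on p lock-step processors."""
    n = len(a)
    chunk = math.ceil(n / p)
    # Phase 1: every processor sums its own block; chunk - 1 additions each,
    # performed by all processors at the same time.
    partial = [sum(a[k * chunk:(k + 1) * chunk]) for k in range(p)]
    steps = chunk - 1
    # Phase 2: pairwise tree reduction of the partial sums.
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)                     # pad odd levels with a zero
        partial = [partial[i] + partial[i + 1]
                   for i in range(0, len(partial), 2)]
        steps += 1
    return partial[0], steps


if __name__ == "__main__":
    n = 1024
    data = list(range(n))
    for p in (n, math.ceil(n / math.log2(n))):
        total, steps = sum_with_p_processors(data, p)
        print(f"p = {p:5d}: total = {total}, steps = {steps}")
```

With fewer processors the step count grows by a constant factor (at most about 2 log n in this model), which is why both rows are quoted as log n steps.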

Rise and fall of SIMDs
  • Introduced in the 1960s (e.g. ILLIAC IV, BSP)
  • Problems:
    • not cost effective
    • serial fraction and Amdahl's law
    • I/O bottleneck
  • Overshadowed by Vector Processors
  • Resurrected in the 1980s (MPP from Goodyear, Connection Machine from Thinking Machines Corporation, MP-1 from MasPar)
  • Did not survive because of high cost
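
The serial-fraction problem listed above can be made concrete with Amdahl's law, speedup = 1 / (f + (1 - f) / N), where f is the fraction of work that cannot be parallelized and N the number of processing elements. A minimal sketch, with the function name and the sample values chosen only for illustration:

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Illustrative values only: a 64-PE array with varying serial fractions.
    for f in (0.0, 0.01, 0.1, 0.5):
        print(f"serial fraction {f:4.2f}: speedup {amdahl_speedup(f, 64):6.2f}")
```

Even a 10% serial fraction limits a 64-PE machine to a speedup of under 9, which illustrates why a large PE array is hard to justify on cost.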

Related ideas
  • Coarse-grain SIMD with off-the-shelf processors (synchronized MIMD), e.g. the CM-5 from Thinking Machines
  • This gave rise to SPMD (single program, multiple data)
  • MMX and other SIMD instructions in Pentium processors
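
The SPMD idea can be sketched as follows: every worker runs the same program and uses its rank to pick its own slice of the data. This is a toy illustration using Python threads, not a model of the CM-5; the names `program` and `run_spmd` and the sum-of-squares workload are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def program(rank, size, data):
    """The single program: each worker processes only its own slice."""
    chunk = (len(data) + size - 1) // size        # ceiling division
    local = data[rank * chunk:(rank + 1) * chunk]
    return sum(x * x for x in local)


def run_spmd(data, size=4):
    """Launch `size` copies of `program` and combine their partial results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(lambda rank: program(rank, size, data),
                                 range(size)))
    return sum(partials)


if __name__ == "__main__":
    print(run_spmd(list(range(10))))              # sum of squares of 0..9
```

Unlike SIMD, the workers are not in lock step instruction by instruction; they synchronize only where results are combined.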

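Vector Processors

[Figure: vector processor organization. Blocks: I-cache, I-unit and control, D-cache, memory, vector registers (V-reg), GPRs, address unit, memory control, buses, two vector functional units (VFU) and a functional unit (FU).]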

Four Generations of CRAY systems (vector processors)

System   CPUs   Clock (MHz)   Flops/clock/CPU   Words moved/clk/CPU   Mflops   Gates/chip
CRAY-1      1            80                 2                     1       80            2
X-MP        4           105                 2                     3      840           16
Y-MP        8           166                 2                     3     2667         2500
C90        16           240                 4                     6    15360        10000

Cray History
  • http://www.cray.com/company/history.html

CRAY C90
  • 8 GB central memory shared by 16 CPUs
  • 128 CPU-memory paths
  • Word = 64 bits + 16 ECC
  • Dual vector pipes
  • 128-element segments
  • Memory hierarchy:
    • 8 sections
    • 8x8 subsections
    • 8x8x2 bank groups
    • 8x8x2x8 banks

Convex C4/XA system
  • CPU: 7.5 ns clock, 1620 MFLOPS
  • Memory: 32 MB x 32 banks, 64-bit word, 50 ns access time
  • 3 FP pipes, 2 results each
  • Crossbar between vector registers and FPUs
  • 1.1 GB/s per I/O port

[Figure: 5 x 5 crossbar connecting CPUs, memories, I/O and utilities]

Other examples
  • NEC SX-X: 4 CPUs, 4 x 2 pipes each
  • Fujitsu VP5000: 7 - 222 CPUs; 2 LS pipes, 3 Func pipes, 2 mask pipes
  • Fujitsu VP2000: 1 - 2 CPUs

Systolic Arrays (H.T. Kung 1978)

Simplicity, Regularity, Concurrency, Communication

Example: band matrix multiplication

[Figure sequence: seven snapshots, T=0 to T=6, of the systolic array computing the band matrix product C = A B. Elements of A (A11 ... A53) and B (B11 ... B43) stream through the array while the terms of each C entry accumulate. Products and results visible in each snapshot:]

  • T=0 to T=2: elements of A and B are entering the array; no products yet
  • T=3: A11 B11
  • T=4: A11 B11 + A12 B21; A21 B11; A11 B12
  • T=5: C11 complete; A21 B11 + A22 B21; A11 B12 + A12 B22; A21 B12; A31 B11
  • T=6: C11 and C12 complete; A21 B11 + A22 B21 + A23 B31; A31 B11 + A32 B21; A21 B12 + A22 B22; A12 B23; A31 B12
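
The flow of operands through a systolic array can be sketched with a cycle-by-cycle simulation. Note the assumptions: this is a simplified square mesh in which each cell keeps one entry of C in place, not the band-matrix array shown in the snapshots, and the function name `systolic_matmul` is made up for this illustration. Row i of A enters from the left delayed by i cycles, column j of B enters from the top delayed by j cycles, so A[i][k] and B[k][j] meet in cell (i, j) at cycle i + j + k.

```python
def systolic_matmul(A, B):
    """Multiply two n x n matrices on a simulated n x n systolic mesh."""
    n = len(A)
    a_reg = [[0] * n for _ in range(n)]   # A value currently held in each cell
    b_reg = [[0] * n for _ in range(n)]   # B value currently held in each cell
    C = [[0] * n for _ in range(n)]       # accumulators stay in place

    for t in range(3 * n - 2):            # last operands meet at cycle 3n - 3
        # A values move one cell to the right; new ones enter at column 0.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i                      # skew: row i starts i cycles late
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        # B values move one cell down; new ones enter at row 0.
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j                      # skew: column j starts j cycles late
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        # Every cell performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(n):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C
```

Each cell only ever communicates with its neighbours and repeats the same multiply-accumulate operation, which reflects the simplicity, regularity and local communication named on the slide. For band matrices the zero entries outside the band simply contribute nothing; a dedicated band-matrix array avoids building cells for them.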

WARP: Programmable Systolic Processor

[Kung, CMU 1987]

Complete contrast to the original idea

  • not application specific
  • not a single VLSI chip
  • complex cell (pipelined FP adder, multiplier, FIFOs, RAM, crossbar)
  • linear array
  • asynchronous

References
  • D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures : A Design Space Approach", Addison Wesley, 1997.
  • K. Hwang, "Advanced Computer Architecture : Parallelism, Scalability, Programmability", McGraw Hill, 1993.
