
HPRCTA'07 - First International Workshop on High-Performance Reconfigurable Computing Technology and Applications

- in conjunction with SC07 -

Reno, NV, November 11, 2007

The von Neumann Syndrome calls for a Revolution

Reiner Hartenstein

TU Kaiserslautern

http://hartenstein.de

About Scientific Revolutions

Thomas S. Kuhn: The Structure of Scientific Revolutions

Ludwik Fleck: Genesis and Development of a Scientific Fact

2

What is the von Neumann Syndrome?

Computing in the von Neumann style is tremendously inefficient. Multiple layers of massive overhead phenomena at run time often lead to code sizes of astronomical dimensions, resident in drastically slower off-chip memory.

The manycore programming crisis requires complete re-mapping and re-implementation of applications. A sufficiently large population of programmers qualified to program applications for 4 and more cores is far from available.

3

Education for multi-core

[Cartoon, credited to Mateo Valero: a multicore-based pacifier, captioned "I … programming multicores"]

4

Will Computing be Affordable in the Future?

Another problem is a high-priority political issue: the very high energy consumption of von-Neumann-based systems. The electricity consumption of all visible and hidden computers amounts to more than 20% of our total electricity consumption. One study predicts 35 - 50% for the US by the year 2020.

5

Reconfigurable Computing: highly promising

Fundamental concepts from Reconfigurable Computing promise a speed-up of almost one order of magnitude, for some application areas by up to 2 or 3 orders of magnitude, while slashing the electricity bill to 10% or less.

It is really time to fully exploit the most disruptive revolution since the mainframe, Reconfigurable Computing, also to reverse the downward trend in CS enrolment.

Reconfigurable Computing shows us the road map to the personal desktop supercomputer, making HPC affordable also for small firms and individuals, and to a drastic reduction of energy consumption.

Contracts between microprocessor firms and Reconfigurable Computing system vendors are under way but not yet published. The technology is ready, but most users are not.

Why?

6

A Revolution is Overdue

This talk sketches a road map requiring a redefinition of the entire discipline, inspired by the mindset of Reconfigurable Computing.

7

Much more saved by coarse-grain

[Table not preserved in the transcript: savings comparison, fine-grain vs. coarse-grain]

*) feasible also with rDPA

8

Power-aware Applications

[Chart: several predictions of cyber-infrastructure energy consumption, from 2003 and later out to 2020; the most pessimistic predicts almost 50% by 2025 in the USA. Segments: mobile computation, communication, entertainment, etc. (high-volume market); PCs and servers (high volume); HPC and supercomputing.]

9

An Example: FPGAs in Oil and Gas .... (1)

[Herb Riley, R. Associates]

"Application migration [from supercomputer] has resulted in a 17-to-1 increase in performance"

For this example the speed-up is not my key issue (Jürgen Becker's tutorial showed much higher speed-ups, going up to a factor of 6000).

For this oil and gas example a side effect is much more interesting than the speed-up.

10

An Example: FPGAs in Oil and Gas .... (2)

[Herb Riley, R. Associates]

"Application migration [from supercomputer] has resulted in a 17-to-1 increase in performance"

Saves more than $10,000 in electricity bills per year (at 7¢/kWh), per 64-processor 19" rack. This is a strategic issue.

Did you know …

… that 25% of Amsterdam's electric energy consumption goes into server farms?

… that a quarter square kilometer of office floor space within New York City is occupied by server farms?

11

Oil and Gas as a Strategic Issue

Low-power design: not only to keep the chips cool.

Do you know the size of Google's electricity bill?

It should be investigated how far the migration achievements obtained for computationally intensive applications can also be utilized for servers.

Recently the US Senate ordered a study on the energy consumption of servers.

12

Flagship conference series: IEEE ISCA

[Chart: topic shares at IEEE ISCA, "the migration of the lemmings". 98.5% von Neumann (2001: 84%); parallelism faded away; other topics: cache coherence? speculative scheduling? Sources: David Padua, John Hennessy, et al.; Jean-Loup Baer.]

13

Transdisciplinary fragmentation of methodology

Application disciplines use their own trick boxes. Using FPGAs for scientific computation? Hiring a student from the EE dept.? Unqualified for RC?

CS is responsible for providing an RC common model
  • for transdisciplinary education
  • and to fix its intradisciplinary fragmentation

14

Computing Curricula 2004 fully ignores Reconfigurable Computing

The Joint Task Force's Computing Curricula 2004: FPGA & synonyms: 0 hits, not even in the curricula (Google: 10 million hits).

15

Curriculum Recommendations, v. 2005

This is criminal!

Upon my complaints the only change: adding to the last paragraph of the survey volume: "programmable hardware (including FPGAs, PGAs, PALs, GALs, etc.)."

However, no structural changes at all. v. 2005 is intended to be the final version (?), torpedoing the transdisciplinary responsibility of CS curricula.

16

Fine-grained vs. coarse-grained reconfigurability

"fine-grained" means: data path width ~ 1 bit
"coarse-grained": path width = many bits (e.g. 32 bits)

instruction-stream-based:
  • CPU w. extensible instruction set (partially reconfigurable)
  • Domain-specific CPU design (not reconfigurable)
  • Soft-core CPU (reconfigurable)

data-stream-based:
  • rDPU with extensible "instruction" set
  • Domain-specific rDPU design

(a software sketch of the two granularities follows below)

17
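To make the granularity contrast concrete, here is a minimal C sketch (my illustration, not from the talk): the fine-grained cell is modeled as a 4-input LUT whose 16-bit truth table is its configuration, while the coarse-grained cell is an rDPU-like 32-bit operator configured once with an opcode.

```c
/* Hedged sketch (mine, not from the talk) of the two granularities.
   Fine-grained: a 4-input LUT computes an arbitrary 1-bit function;
   its 16-bit truth table is the configuration. Coarse-grained: an
   rDPU-like cell is configured once with a word-wide operation. */
#include <stdint.h>
#include <stdio.h>

/* fine-grained cell: 1-bit result looked up in a 16-entry truth table */
static unsigned lut4(uint16_t config, unsigned a, unsigned b, unsigned c, unsigned d) {
    unsigned index = (a & 1) | ((b & 1) << 1) | ((c & 1) << 2) | ((d & 1) << 3);
    return (config >> index) & 1u;
}

/* coarse-grained cell: one opcode configures a 32-bit datapath operator */
typedef enum { OP_ADD, OP_MUL, OP_AND } rdpu_op;

static uint32_t rdpu(rdpu_op config, uint32_t a, uint32_t b) {
    switch (config) {
        case OP_ADD: return a + b;
        case OP_MUL: return a * b;
        default:     return a & b;
    }
}

int main(void) {
    /* 0x8000 is the truth table of a 4-input AND */
    printf("LUT4 as AND: %u\n", lut4(0x8000, 1, 1, 1, 1));
    printf("rDPU as ADD: %u\n", (unsigned)rdpu(OP_ADD, 40, 2));
    return 0;
}
```

The 16-bit truth table versus the single word-wide opcode is exactly the ~1-bit vs. many-bit path-width contrast of the slide.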

Coarse-grained: terminology

[Diagram: the CPU has a program counter; the DPU** does not (it is "transport-triggered"*).]

*) "transport-triggered"
**) does not have a program counter

18

Coarse-grained: terminology (continued)

PACT Corp., Munich, offers rDPU arrays: rDPAs

19

The Paradigm Shift to Data-Stream-Based

[Diagram: the method of communication and data transport. By Software: the von Neumann syndrome. By Configware: a complex pipe network on an rDPA.]

20

The Anti Machine

A non-von-Neumann machine paradigm (a generalization of the systolic array model).

A kind of trans(sub)disciplinary effort: the fusion of paradigms.

Interpretation [Thomas S. Kuhn]: clean up the terminology!

A twin paradigm? Or split up into 2 paradigms? Like matter & antimatter in elementary particle physics: one.

21

Languages turned into Religions

Java is a religion, not a language [Yale Patt]

  • Teaching students the tunnel view of language designers
  • falling in love with the subtleties of formalisms
  • instead of meeting the needs of the user

22

The language and tool disaster

At the end of April, a DARPA brainstorming conference.

Software people do not speak VHDL. Hardware people do not speak MPI.

Bad quality of the application development tools: a poll at FCCM'98 revealed that 86% of hardware designers hate their tools.

23

The first Reconfigurable Computer
  • prototyped 1884 by Herman Hollerith
  • a century before FPGA introduction
  • data-stream-based
  • 60 years later the instruction-stream-based von Neumann (vN) model took over

24

Reconfigurable Computing came back
  • As a separate community: the clash of paradigms
  • 1960: "fixed plus variable structure computer" proposed by G. Estrin
  • 1970: PLD (programmable logic device*)
  • 1985: FPGA (Field-Programmable Gate Array)
  • 1989: Anti Machine model, the counterpart of von Neumann
  • 1990: coarse-grained reconfigurable datapath array
  • When? Foundation of PACT
  • When? Reconfigurable address generator; 1994: MoPL

*) Boolean equations in sum-of-products form, implemented by an AND matrix and an OR matrix (see the sketch below)

Structured VLSI design, like memory chips: integration density very close to the Moore curve.

25
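A hedged sketch of the footnote's PLD principle (the data layout and masks are invented for illustration): product terms are formed in an AND plane, then summed in an OR plane.

```c
/* Hedged sketch of the footnote's PLD idea (masks invented for
   illustration): an AND plane forms product terms, an OR plane sums them.
   Inputs are packed one per bit: a = bit 0 ... d = bit 3. */
#include <stdio.h>
#include <stdint.h>

typedef struct { uint8_t use; uint8_t invert; } product_term;

/* evaluate f(in) = OR over selected product terms of ANDed literals */
static int eval_pld(const product_term *pt, int n_pt, uint8_t or_mask, uint8_t in) {
    uint8_t products = 0;
    for (int p = 0; p < n_pt; p++) {
        /* AND plane: a term is 1 iff every used input matches its polarity */
        uint8_t literals = (uint8_t)((in ^ pt[p].invert) | (uint8_t)~pt[p].use);
        if ((literals & 0x0F) == 0x0F)      /* all 4 input positions satisfied */
            products |= (uint8_t)(1u << p);
    }
    return (products & or_mask) != 0;       /* OR plane */
}

int main(void) {
    /* f = (a AND b) OR ((NOT c) AND d) */
    product_term pt[] = {
        { 0x03, 0x00 },   /* uses a,b; no inversion */
        { 0x0C, 0x04 },   /* uses c,d; c inverted */
    };
    printf("%d\n", eval_pld(pt, 2, 0x03, 0x0B));  /* a=1,b=1,c=0,d=1 -> 1 */
    printf("%d\n", eval_pld(pt, 2, 0x03, 0x07));  /* a=1,b=1,c=1,d=0 -> 1 */
    return 0;
}
```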

Outline
  • von Neumann overhead hits the memory wall
  • The manycore programming crisis
  • Reconfigurable Computing is the solution
  • We need a twin paradigm approach
  • Conclusions

26

The Spirit of the Mainframe Age
  • For decades, we've trained programmers to think sequentially, breaking complex parallelism down into atomic instruction steps …
  • … finally tending toward code sizes of astronomical dimensions
  • Even in "hardware" courses (the unloved child of CS scenes) we often teach von Neumann machine design, deepening this tunnel view
  • 1951: hardware design goes von Neumann (microprogramming)

27

von Neumann: an array of massive overhead phenomena

… piling up to code sizes of astronomical dimensions

28

von Neumann: an array of massive overhead phenomena (continued)

Temptations by von-Neumann-style software engineering:

  • [Dijkstra 1968] the "go to" considered harmful
  • [R.H. 1975] the universal bus considered harmful: massive communication congestion
  • [Backus 1978] Can Programming Be Liberated from the von Neumann Style?
  • [Arvind et al. 1983] A Critique of Multiprocessing the von Neumann Style

29


von Neumann overhead: just one example

[1989] an image processing example: 94% of the computation load went only into moving a window over the image (see the sketch below).

31
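The overhead figure becomes plausible in code. In this minimal sketch (my illustration; only the 94% figure is from the talk), a naive von Neumann implementation of a 3x3 window filter spends almost all of its executed instructions on loop control and address arithmetic rather than on the accumulate itself:

```c
/* Illustrative sketch (mine, not from the talk): a naive 3x3 window sum.
   Per output pixel the CPU executes dozens of loop-control and
   address-arithmetic instructions (marked lines) around a single
   accumulate, which is how overhead shares like "94%" arise. */
#include <stdint.h>

#define W 640
#define H 480

void window_sum(const uint8_t *in, uint16_t *out) {
    for (int y = 1; y < H - 1; y++) {            /* loop control: overhead */
        for (int x = 1; x < W - 1; x++) {        /* loop control: overhead */
            uint16_t acc = 0;
            for (int dy = -1; dy <= 1; dy++)     /* moving the window: overhead */
                for (int dx = -1; dx <= 1; dx++) {
                    int idx = (y + dy) * W + (x + dx); /* address computation: overhead */
                    acc += in[idx];              /* the actual computation */
                }
            out[y * W + x] = acc;
        }
    }
}

int main(void) {
    static uint8_t in[W * H];
    static uint16_t out[W * H];
    window_sum(in, out);
    return (int)out[W + 1];
}
```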

The Memory Wall

[Chart, after Dave Patterson: the processor-DRAM "performance" gap, 1980 to 2005. µProc performance grows ~60%/yr, DRAM ~7%/yr; the gap grows ~50%/yr, reaching ~1000x by 2005, where the chart ends.]

CPU clock speed ≠ performance: the processor's silicon is mostly cache.

Instruction-stream code of astronomical size needs off-chip RAM, which fully hits the memory wall; better to compare off-chip vs. fast on-chip memory.

32

Benchmarked Computational Density

[Chart, BWRC, UC Berkeley, 2004 ("stolen from Bob Colwell"): SPECfp2000 / MHz / billion transistors, 1990 to 2005, for DEC Alpha, IBM, SUN and HP processors. Alpha: down by 100x in 6 years; IBM: down by 20x in 6 years. Caches …]

CPU clock speed ≠ performance: the processor's silicon is mostly cache.

33

Outline
  • von Neumann overhead hits the memory wall
  • The manycore programming crisis
  • Reconfigurable Computing is the solution
  • We need a twin paradigm approach
  • Conclusions

34

The Manycore Future
  • we are embarking on a new computing age: the age of massive parallelism [Burton Smith]
  • everyone will have multiple parallel computers [B.S.]
  • even mobile devices will exploit multicore processors, also to extend battery life [B.S.]
  • multiple von Neumann CPUs on the same µprocessor chip lead to exploding (vN) instruction-stream overhead [R.H.]

35

von Neumann parallelism

The watering pot model [Hartenstein]

[Figure: a watering pot whose sprinkler head has only a single hole: the von Neumann bottleneck]

36

Several overhead phenomena per CPU!

The watering pot model [Hartenstein]

[Figure: an array of 16 CPUs, each drawn as a watering pot]

The instruction-stream-based parallel von Neumann approach has several von Neumann overhead phenomena per CPU.

37

Explosion of overhead by von Neumann parallelism

[Figure: the 16-CPU watering pot array again]

The overhead grows not just proportionately but disproportionately to the number of processors.

[R.H. 2006] MPI considered harmful

38

Rewriting Applications

[Figure: a 16-CPU array vs. a 16-rDPU array]

  • more processors means rewriting applications
  • we need to map an application onto different-size manycore configurations
  • most applications are not readily mappable onto a regular array
  • mapping is much less problematic with Reconfigurable Computing

39

Disruptive Development
  • The computer industry is probably going to be disrupted by some very fundamental changes. [Ian Barron]
  • We must reinvent computing. [Burton J. Smith]
  • A parallel [vN] programming model for manycore machines will not emerge for five to ten years. [experts from Microsoft Corp.]
  • I don't agree: we have a model.
  • Reconfigurable Computing: the technology is ready, the users are not.
  • It's mainly an education problem: the education wall.

40

Outline
  • von Neumann overhead hits the memory wall
  • The manycore programming crisis
  • Reconfigurable Computing is the solution
  • We need a twin paradigm approach
  • Conclusions

41

The Reconfigurable Computing Paradox
  • Bad FPGA technology: reconfigurability overhead, wiring overhead, routing congestion, slow clock speed
  • Yet: up to 4 orders of magnitude speedup, plus a tremendously slashed electricity bill, by migration to FPGAs
  • The reason for this paradox?
  • There is something fundamentally wrong with using the von Neumann paradigm
  • The spirit of the Mainframe Age is collapsing under the von Neumann syndrome

42

Beyond von Neumann Parallelism

The watering pot model [Hartenstein]: the instruction-stream-based von Neumann approach has several von Neumann overhead phenomena per CPU.

We need an approach like this instead: data-stream-based RC*.

*) "RC" = Reconfigurable Computing

43


von Neumann overhead vs. Reconfigurable Computing

[Figure: a CPU using a program counter vs. an rDPA* (reconfigurable datapath array, coarse-grained reconfigurable, 16 rDPUs) using (reconfigurable) data counters: no instruction fetch at run time.]

*) configured before run time

45

von Neumann overhead vs. Reconfigurable Computing (continued)

[Same figure: program counter vs. reconfigurable data counters on the rDPA*.]

[1989] image processing example: a 17x speedup by the GAG** alone; a 15,000x total speedup from this migration project.

*) configured before run time
**) just by the reconfigurable address generator

46

Reconfigurable Computing means …
  • For HPC, run time is more precious than compile time
  • Reconfigurable Computing means moving overhead from run time to compile time**
  • Reconfigurable Computing replaces "looping" at run time* … by configuration before run time (see the sketch below)

http://www.tnt-factory.de/videos_hamster_im_laufrad.htm

*) e.g. complex address computation
**) or loading time

47
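A minimal sketch of this move (my illustration; names like configure_scan are invented, not the author's tooling): the run-time address computation of a strided 2-D scan is hoisted into a one-time configuration step, so that run time only streams through a precomputed sequence.

```c
/* Hedged illustration (function names invented, not the author's tooling):
   the address arithmetic of a strided 2-D scan is hoisted out of run time.
   "Configuration time" enumerates the scan pattern once; "run time" only
   streams through the resulting sequence. */
#include <stdlib.h>

/* configuration time: compute the whole address sequence once */
size_t *configure_scan(int rows, int cols, int stride, size_t *n) {
    *n = (size_t)rows * (size_t)cols;
    size_t *seq = malloc(*n * sizeof *seq);
    if (!seq) exit(1);
    size_t k = 0;
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < cols; x++)
            seq[k++] = (size_t)y * (size_t)stride + (size_t)x;
    return seq;
}

/* run time: no address computation left, just streaming */
long sum_stream(const int *mem, const size_t *seq, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += mem[seq[i]];
    return acc;
}

int main(void) {
    static int mem[64 * 64];
    size_t n;
    size_t *seq = configure_scan(8, 8, 64, &n);  /* an 8x8 tile of a 64-wide array */
    long s = sum_stream(mem, seq, n);
    free(seq);
    return (int)s;
}
```

On an RC machine the sequence would not be a table in memory but the configuration of an address generator; the software table only mimics that.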


Data meeting the Processing Unit (PU) … explaining the RC advantage

We have 2 choices:
  • by Software: routing the data through shared memory by memory-cycle-hungry instruction streams
  • by Configware: data-stream-based placement* of the execution locality, via a pipe network generated by configware compilation

*) before run time

49

What pipe network?

[Figure: rDPAs whose array ports receive or send data streams; the pipe shapes depend on the connect fabric.]

A pipe network, organized at compile time: a generalization* of the systolic array [R. Kress, 1995] (see the sketch below).

rDPA = rDPU array, i.e. coarse-grained
rDPU = reconfigurable datapath unit (no program counter)

*) supporting non-linear pipes on free-form hetero arrays

50
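As a software analogy (my sketch, not Kress's or PACT's tooling), a pipe network can be pictured as operators wired together before run time; data then flows through the stages with no per-item instruction selection:

```c
/* Hedged software analogy (mine, not Kress's or PACT's tooling): three
   DPU-like stages are "wired" once, before run time; each datum then
   flows stage to stage with no per-item instruction selection. */
#include <stdio.h>

typedef int (*dpu_stage)(int);

static int scale(int v)  { return 2 * v; }
static int offset(int v) { return v + 3; }
static int clamp(int v)  { return v > 100 ? 100 : v; }

int main(void) {
    dpu_stage pipe[] = { scale, offset, clamp };  /* configured before run time */
    int stream[] = { 5, 40, 70 };
    int n_stages = 3, n_data = 3;

    for (int i = 0; i < n_data; i++) {
        int v = stream[i];
        for (int s = 0; s < n_stages; s++)
            v = pipe[s](v);                       /* the datum flows through the pipe */
        printf("%d\n", v);
    }
    return 0;
}
```

A real rDPA runs all stages concurrently on different data items; the sequential loop here only models the wiring, not the parallelism.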

Migration benefit by on-chip RAM

[Figure: an rDPA fed by ASMs. ASM = auto-sequencing memory: RAM plus a GAG with a data counter.]

Some RC chips have hundreds of on-chip RAM blocks, orders of magnitude faster than off-chip RAM, so that the drastic code size reduction by software-to-configware migration can beat the memory wall.

Multiple on-chip RAM blocks are the enabling technology for ultra-fast anti machine solutions.

GAGs inside the ASMs generate the data streams (see the sketch below).

GAG = generic address generator
rDPA = rDPU array, i.e. coarse-grained
rDPU = reconfigurable datapath unit (no program counter)

51
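A minimal sketch of what an ASM does (my illustration; the parameterization is invented and far simpler than a real GAG): a data counter steps through a compactly parameterized scan pattern and emits the data stream, with no instruction stream involved.

```c
/* Hedged sketch of an auto-sequencing memory (parameters invented and far
   simpler than a real GAG): a data counter steps through a parameterized
   2-D scan and emits a data stream; no instruction stream is involved. */
#include <stddef.h>

typedef struct {
    size_t base, stride_x, stride_y;  /* scan pattern parameters */
    size_t nx, ny;                    /* extent of the 2-D scan */
    size_t ix, iy;                    /* the "data counter" */
} gag;

/* advance the data counter; yields the next address until the scan ends */
static int gag_next(gag *g, size_t *addr) {
    if (g->iy == g->ny) return 0;                 /* scan finished */
    *addr = g->base + g->iy * g->stride_y + g->ix * g->stride_x;
    if (++g->ix == g->nx) { g->ix = 0; ++g->iy; }
    return 1;
}

/* stream data out of the RAM block into a sink, e.g. an rDPA port */
static void asm_stream(const int *ram, gag g, void (*sink)(int)) {
    size_t a;
    while (gag_next(&g, &a))
        sink(ram[a]);
}

static long total;
static void sink_sum(int v) { total += v; }

int main(void) {
    static int ram[256];
    gag g = { 0, 1, 16, 8, 8, 0, 0 };  /* 8x8 scan over a row stride of 16 */
    asm_stream(ram, g, sink_sum);
    return (int)total;
}
```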

Coarse-grained Reconfigurable Array example

[Figure: an image-processing SNN filter (mainly a pipe network) mapped onto a 10 x 16 = 160-rDPU array; 32 bits wide; mesh-connected (with exceptions, see the back-bus connect); 3 x 3 fast on-chip RAM blocks (ASMs); some rDPUs not used, some for route-through only. Compiled by Nageldinger's KressArray Xplorer (Juergen Becker's CoDe-X inside).]

Coming close to the programmer's mindset (much closer than FPGAs).

Note: a kind of software perspective, but without instruction streams: data streams + pipelining.

52

Outline
  • von Neumann overhead hits the memory wall
  • The manycore programming crisis
  • Reconfigurable Computing is the solution
  • We need a twin paradigm approach
  • Conclusions

53

Apropos compilation: Software / Configware Co-Compilation

[Diagram: the CoDe-X co-compiler, Juergen Becker 1996. C-language source feeds an analyzer / profiler and a partitioner, which splits the application between the "vN" machine paradigm and the anti machine paradigm; the SW compiler emits SW code, the CW compiler emits CW code and FW code.]

But we need a dual-paradigm approach: to run legacy software together with configware.

Reconfigurable Computing: the technology is ready. The users are not?

54

The main problem: curricula from the mainframe age

[Figure: the education wall. Only the procedural side has a common model and is taught; the structural side, the non-von-Neumann accelerators, has no common model and is not really taught: structurally disabled. (This is not a lecture on brain regions.)]

The common model is ready, but users are not.

55

We need a twin-paradigm education

Brain usage: both hemispheres. (This is not a lecture on brain regions.)

[Figure: procedural and structural side by side; each side needs its own common model.]

56

RCeducation 2008

Teaching RC?

The 3rd International Workshop on Reconfigurable Computing Education

April 10, 2008, Montpellier, France

http://fpl.org/RCeducation/

57

We need new courses

We need undergraduate lab courses with HW / CW / SW partitioning.

We need new courses with extended scope on parallelism and algorithmic cleverness for HW / CW / SW co-design.

"We urgently need a Mead-&-Conway-like text book" [R.H., Dagstuhl Seminar 03301, Germany, 2003]

2007: Here it is!

58

Outline
  • von Neumann overhead hits the memory wall
  • The manycore programming crisis
  • Reconfigurable Computing is the solution
  • We need a twin paradigm approach
  • Conclusions

59

Conclusions
  • We need to increase the population of HPC-competent people [B.S.]
  • We need to increase the population of RC-competent people [R.H.]
  • Data streaming is the key model of parallel computation, not vN
  • von-Neumann-type instruction streams considered harmful [R.H.]
  • But we need them for some small code sizes, old legacy software, etc. …
  • The twin-paradigm approach is inevitable, also in education [R.H.]

60

An Open Question
  • Coarse-grained arrays: technology ready*, users not ready
  • Much closer to the programmer's mindset: really much closer than FPGAs**
  • Which effect is delaying the breakthrough?

please, reply to:

*) offered by startups (PACT Corp. and others)
**) "FPGAs? Do we need to learn hardware design?"

61

END

63