high-end computing technology: where is it heading?

greg astfalk

woon yung chung

[email protected]


prologue

this is not a talk about hewlett-packard’s product offering(s)

the context is hpc (high performance computing)

somewhat biased to scientific computing

also applies to commercial computing


backdrop

end-users have needs and “wants” from hpc systems

the computer industry delivers the hpc systems

there exists a gap between the two wrt

programming

processors

architectures

interconnects/storage

in this talk we (weakly) quantify the gaps in these 4 areas


end-users’ programming “wants”

end-users of hpc machines would ideally like to think and code sequentially

have a compiler and run-time system that produces portable and (nearly) optimal parallel code

regardless of processor count

regardless of architecture type

yes, i am being a bit facetious but the idea remains true



parallelism methodologies

there exist 5 methodologies to achieve parallelism (two are sketched after this list)

automatic parallelization via compilers

explicit threading

pthreads

message-passing

mpi

pragma/directive

openmp

explicitly parallel languages

upc, et al.
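as an illustrative aside (mine, not the speaker’s): a minimal sketch of two of these methodologies applied to the same dot product, an openmp pragma/directive for threading and mpi for message-passing; names and sizes are arbitrary.

```c
/* minimal sketch, not from the talk: one dot product written with an
   openmp pragma (directive style) and mpi (message-passing style).
   build e.g.: mpicc -fopenmp dot.c -o dot && mpirun -np 4 ./dot */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    static double x[N], y[N];
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* each mpi rank owns one contiguous block of the vectors */
    int chunk = N / nprocs;
    int lo = rank * chunk;
    int hi = (rank == nprocs - 1) ? N : lo + chunk;

    for (int i = lo; i < hi; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* directive style: threads within the rank split the local loop */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = lo; i < hi; i++)
        local += x[i] * y[i];

    /* message-passing style: combine the per-rank partial sums */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("dot = %.1f (expect %.1f)\n", global, 2.0 * N);

    MPI_Finalize();
    return 0;
}
```

note that without -fopenmp the pragma is simply ignored and each rank runs its loop sequentially, which is much of the directive style’s appeal.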


parallel programming

parallel programming is a cerebral effort

if lots of neurons plus mpi constitutes “prime-time” then parallel programming has arrived

no major technologies on the horizon to change this status quo


discontinuities

the ease of parallel programming has not progressed at the same rate that parallel systems have become available

performance gains require compiler optimization or pbo (profile-based optimization)

most parallelism requires hand-coding

in the real world, many users don’t use any compiler optimizations


parallel efficiency

be mindful that the bounds on parallel efficiency are, in general, far apart (definitions below)

50% efficiency on 32 processors is good

10% efficiency on O(100) processors is excellent

>2% efficiency on O(1000) processors is heroic

a little communication can “knee over” the efficiency vs. processor count curve
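for reference (standard definitions, not from the slide), with T_1 the one-processor time and T_p the time on p processors:

```latex
% speedup and parallel efficiency (standard definitions)
\[
  S(p) = \frac{T_1}{T_p}, \qquad
  E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}
\]
% even a small fixed per-step communication cost c bends the curve:
\[
  T_p = \frac{T_1}{p} + c
  \quad\Longrightarrow\quad
  E(p) = \frac{1}{1 + cp/T_1}
\]
```

at p = T_1/c the efficiency has already dropped to 50%, which is the “knee” above.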


apps with sufficient parallelism

few existing applications can utilize O(1000), or even O(100), processors with any reasonable degree of efficiency

to date, this has generally required heroic effort

new algorithms (i.e., new data and control decompositions), or nearly complete rewrites, are necessary

such large-scale parallelism will have “arrived” when msc/nastran and oracle exist on such systems and utilize the processors


latency tolerant algorithms

latency tolerance will be an increasingly important theme for the future

hardware will not solve this problem

more on this point later

developing algorithms that have significant latency tolerance will be necessary (see the sketch after this list)

this means thinking “outside the box” about the algorithms

simple modifications to existing algorithms generally won’t suffice
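a minimal sketch of the idea (mine, not the speaker’s): a 1-d halo exchange in which nonblocking mpi calls let interior computation proceed while boundary values are in flight; the stencil is hypothetical.

```c
/* hedged sketch of latency hiding: overlap a halo exchange with
   interior work using nonblocking mpi. u has n interior cells plus
   halo cells u[0] and u[n+1]; edge ranks pass MPI_PROC_NULL for the
   missing neighbor. */
#include <mpi.h>

void step(double *u, double *unew, int n, int left, int right)
{
    MPI_Request req[4];

    /* start the halo exchange; nothing blocks here */
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&u[n + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&u[n],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

    /* interior cells need no halo: this work hides the latency */
    for (int i = 2; i < n; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    /* only the two boundary cells wait for the network */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    unew[1] = 0.5 * (u[0] + u[2]);
    unew[n] = 0.5 * (u[n - 1] + u[n + 1]);
}
```

the point is the structure, not the stencil: the latency is still there, it is just hidden behind work that does not depend on the in-flight data.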



operating systems

development environments will move to nt

heavy-lifting will remain with unix

four unix’s to survive (alphabetically)

aix 5l

hp-ux

linux

solaris

linux will be important at the lower-end but will not significantly encroach on the high-end


end-users’ proc/arch “wants”

all things being equal, high-end users would likely want a classic cray vector supercomputer

no caches

multiple pipes to memory

single word access

hardware support for gather/scatter

etc.

it is true, however, that for some applications contemporary risc processors perform better


processors

the “processor of choice” is now, and will be for some time to come, the risc processor

risc processors have caches

caches are good

caches are bad

if your code fits in cache, you aren’t supercomputing!


risc processor performance

a rule of thumb is that a risc processor, any risc processor, gets on average, on a sustained basis,

10% of its peak performance (e.g., ~100 mflop/s sustained from a 1 gflop/s peak)

the 3 on this is large

achieved performance varies with

architecture

application

algorithm

coding

dataset size

anything else you can think of



semiconductor processes

semiconductor processes change every 2-3 years

assuming that “technology scaling” applies to subsequent generations, then per generation (see the arithmetic after this list):

frequency increase of ~40%

transistor density increase of ~100%

energy per transition decrease of ~60%
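to make the compounding concrete (my arithmetic, not the slide’s), with ~40% frequency and ~100% density gains per generation:

```latex
% compounded process-scaling gains after n generations
\[
  f_n \approx 1.4^{\,n} f_0, \qquad d_n \approx 2^{\,n} d_0
\]
% e.g. n = 4 (the "4-5 more turns" mentioned later) gives roughly
% 3.8x the frequency and 16x the transistor density
```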



what to do with gates

it is not a simple question of what the best use of the gates is

larger caches

multiple cores

specialized functional units

etc.

the impact of soft errors with decreasing design rule size will be an important topic

what happens if an alpha particle flips a bit in a register?


processor futures

you can expect, for the short term, moore’s-law-like gains in processors’ peak performance

doubling of “performance” every 18-24 months

does not necessarily apply to application performance

moore’s law will not last forever

4-5 more turns (maybe?)


customer spending ($m)

[chart: customer spending in $m (y-axis $0 to $40,000) by processor type (cisc, risc, ia32, ia64), 1995 to 2003; source: idc, february 2000]

technology disruptions

risc crossed over cisc in 1996

itanium will cross over risc in 2004


present high-end architectures

today’s high-end architecture is either

smp

ccnuma

cluster of smp nodes

cluster of ccnuma nodes

japanese vector system

all of these architectures work

efficiency varies with application type


architectural issues

of the choices available, the smp is preferred; however

smp processor count is limited

cost of scalability is prohibitive

ccnuma addresses these limitations but induces its own

disparate latencies

better, but still limited, scalability

ras limitations

clusters too have pros and cons

huge latencies

low cost

etc.


physics

limitations imposed by physics have led us to architectures that have a deep memory hierarchy

the algorithmist and programmer must deal with, and exploit, the hierarchy to achieve good performance

this is part of the cerebral effort of parallel programming we mentioned earlier


memory hierarchy

typical latencies for today’s technology
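the latency table itself is not preserved in this transcript; as an illustrative aside, numbers of this kind are typically measured with a pointer-chase microbenchmark like the hedged sketch below (buffer sizes and iteration counts are arbitrary):

```c
/* hedged sketch: measure average load latency with a pointer chase.
   elements are linked into one random cycle (sattolo's algorithm)
   so hardware prefetching cannot predict the next address. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double ns_per_load(size_t n, long iters)
{
    void **buf = malloc(n * sizeof *buf);
    size_t *perm = malloc(n * sizeof *perm);
    for (size_t i = 0; i < n; i++) perm[i] = i;

    /* sattolo's shuffle: yields a single n-cycle over all slots */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i + 1 < n; i++) buf[perm[i]] = &buf[perm[i + 1]];
    buf[perm[n - 1]] = &buf[perm[0]];

    void **p = (void **)buf[0];
    clock_t t0 = clock();
    for (long k = 0; k < iters; k++) p = (void **)*p;  /* the chase */
    double ns = (double)(clock() - t0) / CLOCKS_PER_SEC * 1e9 / iters;

    if (p == buf) ns += 0.0;  /* use p so the loop is not optimized away */
    free(perm); free(buf);
    return ns;
}

int main(void)
{
    /* sweep from well inside l1 to well past any cache */
    for (size_t kb = 4; kb <= 64 * 1024; kb *= 4)
        printf("%8zu KB: %6.1f ns/load\n",
               kb, ns_per_load(kb * 1024 / sizeof(void *), 20000000L));
    return 0;
}
```

plotted against buffer size, the ns/load figure forms a plateau for each level of the hierarchy.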


balanced system ratios

an “ideal” high-end system should be balanced wrt its performance metrics

for each peak flop/second

0.5–1 byte of physical memory

10–100 byte of disk capacity

4–16 byte/sec of cache bandwidth

1–3 byte/sec of memory bandwidth

0.1–1 bit/sec of interconnect bandwidth

0.02–0.2 byte/sec of disk bandwidth


balanced system

applying the balanced system ratios to an unnamed contemporary 16-processor smp
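the slide’s table is not preserved; as a worked example under an assumed (hypothetical) 1 gflop/s peak per processor, i.e. a 16 gflop/s machine, the ratios imply:

```latex
% balanced-system ratios applied to an assumed 16 gflop/s peak
\begin{align*}
  \text{physical memory}  &:\ 8\text{--}16~\text{GB}\\
  \text{disk capacity}    &:\ 160~\text{GB}\text{--}1.6~\text{TB}\\
  \text{cache bandwidth}  &:\ 64\text{--}256~\text{GB/s}\\
  \text{memory bandwidth} &:\ 16\text{--}48~\text{GB/s}\\
  \text{interconnect}     &:\ 1.6\text{--}16~\text{Gbit/s}\\
  \text{disk bandwidth}   &:\ 0.32\text{--}3.2~\text{GB/s}
\end{align*}
```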


storage

data volumes are growing at an extremely rapid pace

disk capacity sold doubled from 1997 to 1998

storage is an increasingly large percentage of the total server sale

disk technology is advancing too slowly (see the arithmetic after this list)

per generation (1-1.5 years):

access time decreases 10%

spindle bandwidth increases 30%

capacity increases 50%
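one way to see “too slowly” (my arithmetic, not the slide’s): capacity compounds faster than spindle bandwidth, so the time to read a full disk grows each generation:

```latex
% time to scan a whole disk, generation over generation
\[
  t_{\text{scan}} = \frac{\text{capacity}}{\text{bandwidth}}
  \quad\Rightarrow\quad
  \frac{t_{n+1}}{t_n} = \frac{1.5}{1.3} \approx 1.15
\]
```

i.e., ~15% longer per generation, even though every individual metric improves.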


networks

only the standards will be widely deployed

gigabit ethernet

gigabyte ethernet

fibre channel (2x and 10x later)

sio

atm

dwdm backbones

the “last mile” problem remains with us

inter-system interconnect for clustering will not keep pace with the demands (for latency and bandwidth)


vendor’s constraints

rule #1: be profitable to return value to the shareholders

you don’t control the market size

you can only spend ~10% of your revenue on r&d

don’t fab your own silicon (hopefully)

you must be more than just a “technical computing” company

to not do this is to fail to meet rule #1 (see above)


market sizes

according to the industry analysts the technical market is, depending on where you draw the cut-line, $4-5 billion annually

the bulk of the market is small-ish systems (data from forest baskett at sgi)


a perspective

commercial computing is not an enemy

without the commercial market’s revenue our ability to build hpc-like systems would be limited

the commercial market benefits from the technology innovation in the hpc market

is performance “left on the table” in designing a system to serve both the commercial and technical markets?

yes


why?

lack of a cold war

performance of hpc systems has been marginalized

in the mid-70s, how many applications ran faster on a vax 11/780 than the cray-1?

none

how many applications today run faster on a pentium than the cray t90?

some

current demand for hpc systems is elastic


future prognostication

computing in the future will be all about data and moving data

the growth in data volumes is incredible

richer media types (i.e., video) means more data

distributed collaborations imply moving data

e-whatever requires large, rapid data movement

more flops → more data


data movement

the scope of data movement encompasses:

register to functional unit

cache to register

cache to cache

memory to cache

disk to memory

tape to disk

system to system

pda to client to server

continent to continent

all of these are going to be important


epilogue

for hpc in the future

it is going to be risc processors

smp and ccnuma architectures

smp processor count relatively constant

technology trends are reasonably predictable

mpi, pthreads and openmp for parallelism

latency management will be crucial

it will be all about data


epilogue (cont’d)

for the computer industry in the future

trending toward “e-everything”

e-commerce

apps-on-tap

brokered services

remote data

virtual data centers

visualization

nt for development

vectors are dying

for hpc vendors in the future

there will be fewer


conclusion

hpc users will need to yield more to what the industry can provide, rather than vice versa

vendor’s rule #1 is a cruel master

