
Seminar on parallel computing

- Goal: provide environment for exploration of parallel computing
- Driven by participants
- Weekly hour for discussion, show & tell
- Focus primarily on distributed-memory computing on Linux PC clusters
- Target audience:
- Experience with Linux computing & Fortran/C
- Requires parallel computing for own studies

- 1 credit possible for completion of a ‘proportional’ project

Main idea

- Distribute a job over multiple processing units
- Run bigger jobs than are possible on a single machine
- Solve bigger problems faster
- Resources: e.g., www-jics.cs.utk.edu

Sequential limits

- Moore’s law
- Clock speed physically limited
- Speed of light
- Miniaturization; dissipation; quantum effects

- Memory addressing
- 32-bit addresses in PCs: 4 Gbyte RAM max.

Machine architecture: serial

- Single processor
- Hierarchical memory:
- Small number of registers on CPU
- Cache (L1/L2)
- RAM
- Disk (swap space)

- Operations require multiple steps
- Fetch two floating point numbers from main memory
- Add and store
- Put back into main memory

Vector processing

- Speed up single instructions on vectors
- E.g., while adding two floating-point numbers, fetch the next two from main memory
- Pushing vectors through the pipeline

- Useful in particular for long vectors
- Requires good memory control:
- Bigger cache is better

- Common on most modern CPUs
- Implemented in both hardware and software

SIMD

- Same instruction works simultaneously on different data sets
- Extension of vector computing
- Example:
```
DO IN PARALLEL
  for i = 1, n
    x(i) = a(i)*b(i)
  end
DONE PARALLEL
```

MIMD

- Multiple instruction, multiple data
- Most flexible, encompasses SIMD/serial.
- Often best for ‘coarse grained’ parallelism
- Message passing
- Example: domain decomposition
- Divide computational grid in equal chunks
- Work on each domain with one CPU
- Communicate boundary values when necessary

Historical machines

- 1976 Cray-1 at Los Alamos (vector)
- 1980s Control Data Cyber 205 (vector)
- 1980s Cray X-MP
- 4 coupled Cray-1s

- 1985 Thinking Machines Connection Machine
- SIMD, up to 64k processors

- 1984+ NEC/Fujitsu/Hitachi
- Automatic vectorization

Sun and SGI (90s)

- Scaling between desktops and compute servers
- Use of both vectorization and large scale parallelization
- RISC processors
- Sparc for Sun
- MIPS for SGI: PowerChallenge/Origin

Happy developments

- High Performance Fortran / Fortran 90
- Definitions for message passing languages
- PVM
- MPI

- Linux
- Performance increase of commodity CPUs
- Combination leads to affordable cluster computing

Who’s the biggest?

- www.top500.org
- Ranked by the Linpack benchmark (dense linear-system solve, Ax = b)
- June 2003:
- Earth Simulator, Yokohama, NEC, 36 Tflops
- ASCI Q, Los Alamos, HP, 14 Tflops
- Linux cluster, Livermore, 8 Tflops

Parallel approaches

- Embarrassingly parallel
- “Monte Carlo” searches
- SETI@home
- Analyze lots of small time series

- Parallelize DO loops in dominantly serial code
- Domain decomposition
- Fully parallel
- Requires complete rewrite/rethinking

Example: seismic wave propagation

- 3D spherical wave propagation modeled with high order finite element technique (Komatitsch and Tromp, GJI, 2002)
- Massively parallel computation on Linux PC clusters
- Approx. 34 Gbyte RAM needed for 10 km average resolution
- www.geo.lsa.umich.edu/~keken/waves

Resolution

- Spectral elements: 10 km average resolution
- 4th order interpolation functions
- Reasonable graphics resolution: 10 km or better
- 12 km: 1024³ = 1 GB
- 6 km: 2048³ = 8 GB

Simulated EQ (d=15 km) after 17 minutes

[Figure: two 512x512 snapshot panels of particle velocity, rendered with 256 colors, positive values only, truncated maximum, log10 scale. Panel 1 marks the phases P, PPP, PP, PKPab, SK, PKP, PKIKP; panel 2, which includes some S component, marks PcSS, SS, R, S, PcS, PKS.]

Resources at UM

- Various Linux clusters in Geology
- Agassiz (Ehlers) 8 Pentium 4 @ 2 Gbyte each
- Panoramix (van Keken) 10 P3 @ 512 Mbyte
- Trans (van Keken, Ehlers) 24 P4 @ 2 Gbyte

- SGIs
- Origin 2000 (Stixrude, Lithgow, van Keken)

- Center for Advanced Computing @ UM
- Athlon clusters (384 nodes @ 1 Gbyte each)
- Opteron cluster (to be installed)

- NPACI

Software resources

- GNU and Intel compilers
- Fortran/Fortran 90/C/C++

- MPICH www-fp.mcs.anl.gov
- Primary implementation of MPI
- “Using MPI” 2nd edition, Gropp et al., 1999

- Sun Grid Engine
- PETSc www-fp.mcs.anl.gov
- Toolbox for parallel scientific computing
