
Parallel Programming on the SGI Origin2000



Presentation Transcript


  1. Parallel Programming on the SGI Origin2000 Taub Computer Center, Technion Anne Weill-Zrahia With thanks to Moshe Goldberg (TCC) and Igor Zacharov (SGI) March 2005

  2. Parallel Programming on the SGI Origin2000 • Parallelization Concepts • SGI Computer Design • Efficient Scalar Design • Parallel Programming - OpenMP • Parallel Programming - MPI

  3. Academic Press 2001 ISBN 1-55860-671-8

  4. Parallelization Concepts

  5. Introduction to Parallel Computing • Parallel computer: a set of processors that work cooperatively to solve a computational problem • Distributed computing: a number of processors communicating over a network • Metacomputing: use of several parallel computers

  6. Parallel classification • Parallel architectures Shared Memory / Distributed Memory • Programming paradigms Data parallel / Message passing

  7. Why parallel computing • Single processor performance – limited by physics • Multiple processors – break down problem into simple tasks or domains • Plus – obtain same results as in sequential program, faster. • Minus – need to rewrite code

  8. Three HPC Architectures • Shared memory • Cluster • Vector processor

  9. Shared Memory • Each processor can access any part of the memory • Access times are uniform (in principle) • Easier to program (no explicit message passing) • Bottleneck when several tasks access same location

  10. Symmetric Multiple Processors [Diagram: several CPUs connected through a common bus to a shared memory] Examples: SGI Power Challenge, Cray J90/T90

  11. Data-parallel programming • Single program defining operations • Single memory • Loosely synchronous (completion of loop) • Parallel operations on array elements
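
The transcript contains no code for this style, so the following is a minimal sketch, assuming a simple element-wise array operation in C with an OpenMP parallel loop (OpenMP itself is covered later in the course). The array names and size are illustrative; compile with the compiler's OpenMP flag (for example -fopenmp with GCC).

#include <stdio.h>

#define N 1000000

/* Data-parallel style: one program, one shared memory, and a loop
   whose iterations are divided among threads.  The implicit barrier
   at the end of the parallel loop is the "loosely synchronous"
   completion point mentioned on the slide. */
static double a[N], b[N], c[N];

int main(void)
{
    int i;

    for (i = 0; i < N; i++) {            /* serial initialization */
        b[i] = (double)i;
        c[i] = 2.0 * i;
    }

#pragma omp parallel for                 /* iterations split across threads */
    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf("a[N-1] = %f\n", a[N - 1]);   /* expect 3*(N-1) */
    return 0;
}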

  12. Distributed Parallel Computing [Diagram: each CPU has its own local memory; the nodes are connected by a network] Examples: SP2, Beowulf clusters

  13. Message Passing Programming • Separate program on each processor • Local Memory • Control over distribution and transfer of data • Additional complexity of debugging due to communications
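
No source appears on the slide; the program below is a minimal sketch of the message-passing style in C with MPI, assuming an illustrative exchange of a single integer. The same executable runs as separate processes, each with its own local memory, and data moves only through explicit calls; run it with at least two processes (for example, mpirun -np 2 a.out).

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes total */

    if (rank == 0) {
        value = 42;                          /* exists only in rank 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0 of %d sent %d\n", size, value);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 of %d received %d\n", size, value);
    }

    MPI_Finalize();
    return 0;
}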

  14. Distributed Memory • Processor can only access local memory • Access times depend on location • Processors must communicate via explicit message passing

  15. Message Passing or Shared Memory?
  Message Passing:
  • Takes longer to implement
  • More details to worry about
  • Increases source lines
  • Complex to debug and time
  • Increase in total memory used
  • Scalability limited by: communications overhead, process synchronization
  • Parallelism is visible
  Shared Memory:
  • Easier to implement
  • System handles many details
  • Little increase in source
  • Easier to debug and time
  • Efficient memory use
  • Scalability limited by: serial portion of code, process synchronization
  • Compiler-based parallelism

  16. Performance issues • Concurrency – the ability to perform actions simultaneously • Scalability – performance is not impaired by increasing the number of processors • Locality – a high ratio of local memory accesses to remote memory accesses (or low communication)
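
The "serial portion of code" limit mentioned on the previous slide is usually quantified with Amdahl's law: if a fraction f of the work is inherently serial, the speedup on p processors is at most 1 / (f + (1 - f) / p). The short C sketch below tabulates this bound; the 5% serial fraction is an assumed value, not taken from the slides.

#include <stdio.h>

/* Amdahl's law: upper bound on speedup with serial fraction f on p processors. */
static double amdahl_speedup(double f, int p)
{
    return 1.0 / (f + (1.0 - f) / p);
}

int main(void)
{
    double f = 0.05;                     /* assumed 5% serial portion */
    int p;

    for (p = 1; p <= 32; p *= 2)
        printf("p = %2d  max speedup = %5.2f\n", p, amdahl_speedup(f, p));
    return 0;
}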

  17. Objectives of HPC in the Technion • Maintain leading position in science/engineering • Production: sophisticated calculations • Required: high speed • Required: large memory • Teach techniques of parallel computing • In research projects • As part of courses

  18. HPC in the Technion • SGI Origin2000: 22 CPUs (R10000, 250 MHz), total memory 9 GB • SGI Origin2000: 32 CPUs (R12000, 300 MHz), total memory 9 GB • PC cluster (Linux Red Hat 9.0): 6 CPUs (Pentium II, 866 MHz), 500 MB memory per CPU • PC cluster (Linux Red Hat 9.0): 16 CPUs (Pentium III, 800 MHz), 500 MB memory per CPU

  19. Origin2000 (SGI) 128 processors

  20. Origin2000 (SGI) 22 processors

  21. PC clusters (Intel) • 6 processors • 16 processors

  22. Data Grids for High Energy Physics (image courtesy Harvey Newman, Caltech) [Diagram: the tiered LHC computing model. At the detector there is a "bunch crossing" every 25 nsec and ~100 "triggers" per second, each triggered event ~1 MByte in size; the online system sends ~100 MBytes/sec to the Tier 0 CERN Computer Centre (offline processor farm, ~20 TIPS). Tier 1 regional centres (FermiLab ~4 TIPS, France, Germany, Italy) connect at ~622 Mbits/sec (or air freight) and use HPSS storage; Tier 2 centres (~1 TIPS each, e.g. Caltech) serve institutes (~0.25 TIPS), which cache the physics data (~1 MByte/sec) for the analysis "channels" their ~10 physicists work on; physicists analyse it at Tier 4 workstations. 1 TIPS is approximately 25,000 SpecInt95 equivalents.]

  23. GRIDS: Globus Toolkit • Grid Security Infrastructure (GSI) • Globus Resource Allocation Manager (GRAM) • Monitoring and Discovery Service (MDS) • Global Access to Secondary Storage (GASS)

  24. November 2004

  25. A Recent Example: Matrix multiply
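
The source of the matrix-multiply example is not reproduced in the transcript (only the two profile slides that follow), so the code below is an assumed starting point: a textbook triple-loop multiply in C, with an illustrative matrix size of N = 512.

#include <stdio.h>

#define N 512

static double a[N][N], b[N][N], c[N][N];

/* Straightforward i-j-k matrix multiply, c = a * b.  In the innermost
   loop b[k][j] is accessed with a stride of N doubles, which gives
   poor cache reuse on matrices this large. */
int main(void)
{
    int i, j, k;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
            c[i][j] = 0.0;
        }

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];

    printf("c[0][0] = %f\n", c[0][0]);   /* expect 2*N = 1024 */
    return 0;
}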

  26. Profile -- original

  27. Profile – optimized code
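
The optimization behind the improved profile is not shown in the transcript. One common scalar improvement, in the spirit of the "Efficient Scalar Design" topic in the outline, is to interchange the loops into i-k-j order so the innermost loop runs with unit stride; a sketch using the same arrays as the example above (c must again start at zero):

/* i-k-j loop order: the innermost loop reads b[k][j] and updates
   c[i][j] with unit stride, so cache lines are reused instead of
   being evicted between accesses.  Produces the same result as the
   i-j-k version. */
void matmul_ikj(void)
{
    int i, j, k;

    for (i = 0; i < N; i++)
        for (k = 0; k < N; k++) {
            double aik = a[i][k];        /* invariant in the inner loop */
            for (j = 0; j < N; j++)
                c[i][j] += aik * b[k][j];
        }
}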
