
Supercomputers: New Perspective

Presentation Transcript


    1. Supercomputers: New Perspective. Prof. Sin-Min Lee, Department of Computer Science

    22. Multiple Processor Organization
    - Single instruction, single data stream (SISD)
    - Single instruction, multiple data stream (SIMD)
    - Multiple instruction, single data stream (MISD)
    - Multiple instruction, multiple data stream (MIMD)

    23. Single Instruction, Single Data Stream - SISD
    - Single processor, single instruction stream
    - Data stored in a single memory
    - The familiar uniprocessor

    24. Single Instruction, Multiple Data Stream - SIMD
    - A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis
    - Each processing element has an associated data memory, so each instruction is executed on a different set of data by the different processors
    - Vector and array processors fall into this class (a sketch of the lockstep idea follows)
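    To make the lockstep idea concrete, here is a minimal sketch using x86 SSE intrinsics - an assumption for illustration only, since the vector and array processors the slide has in mind used dedicated vector hardware rather than x86. One instruction applies the same addition to four data elements at once:

```c
/* SIMD in miniature: _mm_add_ps performs four float additions
 * with a single instruction. Compile with e.g. gcc -msse -O2. */
#include <stdio.h>
#include <xmmintrin.h>

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    __m128 va = _mm_loadu_ps(a);    /* load four floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb); /* one instruction, four adds in lockstep */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%.1f ", c[i]);      /* prints: 11.0 22.0 33.0 44.0 */
    printf("\n");
    return 0;
}
```

    Modern compilers will often emit such instructions automatically when they can auto-vectorize a plain loop.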

    25. Multiple Instruction, Single Data Stream - MISD
    - A single sequence of data is transmitted to a set of processors
    - Each processor executes a different instruction sequence on it
    - Has never been implemented

    26. Multiple Instruction, Multiple Data Stream - MIMD
    - A set of processors simultaneously executes different instruction sequences on different sets of data
    - SMPs, clusters and NUMA systems all belong to this class

    28. Taxonomy of Parallel Processor Architectures

    30. MIMD - Overview
    - Built from general-purpose processors; each can process all the instructions necessary
    - Further classified by the method of processor communication

    37. Tightly Coupled - SMP
    - Processors share memory and communicate via that shared memory
    - A Symmetric Multiprocessor (SMP) shares a single memory or pool of memory, reached over a shared bus
    - Memory access time to a given region of memory is approximately the same for each processor (a shared-memory sketch follows)
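    As a rough illustration of the shared-memory programming model this organization supports, here is a minimal OpenMP sketch in C. OpenMP is my assumption for the example; the slide names no programming model. All threads read and write the same array through the one shared memory pool:

```c
/* Shared-memory parallelism in miniature: threads split the loop,
 * but x and sum live in the single memory every processor can see.
 * Compile with e.g. gcc -fopenmp. */
#include <stdio.h>
#include <omp.h>

#define N 1000000

static double x[N];  /* one array in shared memory */

int main(void) {
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        x[i] = (double)i;  /* each thread writes its slice */
        sum += x[i];       /* partial sums combined by the reduction */
    }

    printf("sum = %.0f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```

    Communication happens implicitly through loads and stores to shared addresses, which is exactly the tight coupling the slide describes.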

    38. Tightly Coupled - NUMA
    - Nonuniform memory access: access times to different regions of memory may differ depending on which processor asks

    39. Loosely Coupled - Clusters
    - A collection of independent uniprocessors or SMPs, interconnected to form a cluster
    - Communication is via fixed paths or network connections

    40. Parallel Organizations - SISD

    41. Parallel Organizations - SIMD

    42. Parallel Organizations - MIMD Shared Memory

    43. Parallel Organizations - MIMD Distributed Memory

    46. Symmetric Multiprocessors
    An SMP has the following characteristics:
    - Two or more similar processors of comparable capacity
    - The processors share the same memory and I/O
    - The processors are connected by a bus or other internal connection
    - Memory access time is approximately the same for each processor
    - All processors share access to I/O, either through the same channels or through different channels giving paths to the same devices
    - All processors can perform the same functions (hence "symmetric")
    - The system is controlled by an integrated operating system providing interaction between processors at the job, task, file and data element levels

    47. SMP Advantages
    - Performance: gains are available if some of the work can be done in parallel (quantified in the sketch below)
    - Availability: since all processors can perform the same functions, failure of a single processor does not halt the system
    - Incremental growth: users can enhance performance by adding processors
    - Scaling: vendors can offer a range of products based on the number of processors
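    How much performance parallelism buys is usually quantified with Amdahl's law. The slide does not name it, so take this as a standard supplement rather than the author's material; the point is that the serial fraction of the work caps the speedup no matter how many processors are added:

```c
/* Amdahl's law: with parallel fraction p and n processors,
 * speedup = 1 / ((1 - p) + p / n). */
#include <stdio.h>

static double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}

int main(void) {
    const double p = 0.95;  /* assume 95% of the work parallelizes */
    const int counts[] = {2, 4, 16, 64, 1024};

    for (int i = 0; i < 5; i++)
        printf("n = %4d  speedup = %6.2f\n", counts[i], amdahl(p, counts[i]));
    /* Even with 1024 processors the speedup approaches, but never
     * exceeds, 1 / (1 - p) = 20. */
    return 0;
}
```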

    48. Block Diagram of Tightly Coupled Multiprocessor

    49. History
    - Cray Research was founded in 1972; Cray Computer was founded in 1988
    - 1976: first product, the Cray-1 (240,000,000 operations per second); Seymour Cray personally invented its vector register technology
    - 1985: Cray-2 (1,200,000,000 operations per second, a 5-fold increase over the Cray-1); Seymour Cray is credited with its immersion-cooling technology
    - The Cray-3 substituted revolutionary new gallium arsenide integrated circuits for the traditional silicon ones
    - 1996: Cray was bought by SGI
    - In March 2000 the Cray Research name and business were sold by SGI to Tera Inc.

    50. Visual Tour

    51. Market Segment (from top500)

    52. Supercomputer Architecture

    53. Current Cray Products
    - The Cray X1 is Cray's only product with a unique vector CPU; its competitors are Fujitsu, NEC and HP
    - The Cray XT3 and XD1 use AMD Opteron CPUs (series 100 and series 200 respectively)
    - Full product specifications and additional information on current systems are available at www.cray.com

    54. Performance Measurements
    - Performance is quoted in teraflops; Linpack is a standard benchmark
    - Performance is also measured by memory bandwidth and latency, disk performance, interconnects, internal I/O, reliability, and other metrics
    - For example: a home system with an Athlon 750 gives about 34 megaflops (34 * 10^6 flops), while a current mid-range supercomputer gives about 40 teraflops (40 * 10^12 flops) - roughly 1,176,470 times faster (the arithmetic is checked below)
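    A quick check of that ratio, using only the slide's own figures:

```c
/* Verifying the slide's speedup figure: 40 teraflops vs 34 megaflops. */
#include <stdio.h>

int main(void) {
    const double desktop = 34e6;   /* 34 * 10^6 flops */
    const double super   = 40e12;  /* 40 * 10^12 flops */
    printf("ratio = %.0f\n", super / desktop);  /* prints 1176471 */
    return 0;
}
```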

    55. Scalable Architecture in the XT3

    56. Is Cray a Good Deal?
    - Typical cost: approximately $30 million and above
    - Useful lifetime: 6 years
    - Most customers run their supercomputers at 90%-98% load
    - Clustered supercomputers and machines built around common desktop components (AMD/Intel CPUs, memory chips, motherboards, etc.) are significantly cheaper (a back-of-envelope calculation follows)
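    A back-of-envelope amortization using the figures above; the 95% utilization is my assumption, taken as the midpoint of the quoted 90%-98% range:

```c
/* Amortizing the purchase price over the machine's useful life. */
#include <stdio.h>

int main(void) {
    const double cost  = 30e6;  /* purchase price in USD (slide figure) */
    const double years = 6.0;   /* useful lifetime (slide figure) */
    const double util  = 0.95;  /* assumed average load */

    double useful_hours = years * 365.0 * 24.0 * util;
    printf("useful hours: %.0f\n", useful_hours);                 /* ~49932 */
    printf("cost per useful hour: $%.0f\n", cost / useful_hours); /* ~$601 */
    return 0;
}
```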

    57. Future
    - Cray's "Red Storm" system at Sandia National Laboratories runs the Linux OS
    - Current cost: $90 million
    - Uses 11,648 AMD Opteron CPUs
    - Current operational speed: 41.5 teraflops
    - Uses a unique SeaStar chip, which passes messages between thousands of CPUs
    - Upgrades to dual-core Opterons are scheduled for completion by the end of 2005, when the system is expected to reach 100 teraflops

    64. We currently have two Cray T3E supercomputers, which are used to run the daily weather forecasts and to run large-scale climate studies. These are both Massively Parallel Processor (MPP) systems, which is to say that they contain a large number of processors (CPUs), each with its own separate portion of volatile RAM serving as main memory. The individual portions of memory are relatively small, 128 MB per processor on one T3E and 256 MB on the other, but because of the number of processors involved this gives total memory sizes of 118 GB on the T3E-900 and 168 GB on the T3E-1200E respectively.

    In order to run programs with very high memory requirements, the programmer must break down the forecast or climate data into smaller sections and distribute it across a number of processors. Each processor can access its own local memory with normal load/store operations, but data held on remote processors must be accessed using special software routines, such as the Message Passing Interface (MPI) or Cray's SHMEM system (see the MPI sketch below).

    The T3Es run an operating system called Unicos/mk. This is based on the Unix operating system, but it has been extensively modified to allow it to run across a large number of processors simultaneously. Unicos/mk is best thought of as an interactive system which has the capacity to run batch work, rather than the other way round. The batch facilities are provided by an additional piece of Cray software called the Network Queueing System (NQS).

    For reasons of efficiency, the majority of the workload on the Crays, including the weather forecasts, is run in batch mode. This allows us to ensure that the supercomputers are run at full capacity over weekends and holiday periods, as well as allowing us to allocate extra computing power to our scientists when it is not required to produce the daily forecasts.
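    As an illustration of that message-passing style, here is a minimal MPI sketch in C. MPI itself is named in the text above, but this particular program is an invented example, not Met Office code: process 1 cannot load a value from process 0's memory, so the value has to travel as an explicit message.

```c
/* Distributed-memory communication in miniature: each MPI process
 * owns its local memory; remote data moves via explicit messages.
 * Compile with mpicc, run with e.g. mpirun -np 2 ./a.out. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    double value = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42.0;  /* lives in process 0's local memory */
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* no load/store can reach rank 0's memory from here;
         * the data must arrive as a message */
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %.1f\n", value);
    }

    MPI_Finalize();
    return 0;
}
```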

    66.
    > decision to make. We can rewrite our codes for each computer system we
    > buy in order to make it run really fast and efficiently on that system.
    > This costs us a lot of money in employing people to rewrite our code.
    > Alternatively we can write code that may not be as fast and efficient,
    > but that runs well on many different systems. This is closer to the
    > approach we actually take - in reality it's a bit of both - but as far
    > as possible we aim for code that is highly portable to different
    > computers.
    >
    > Slide 32:
    > Again a very complex idea to explain! Can't think off the top of my head
    > of a good analogy to use here. I'll let you know if I can think of
    > something...
    >
    > Slide 34:
    > Similar comments to 32

    67. Weather forecasting

    105. Teacher's note: you could do this as a class exercise to show how parallel processing works. Ask 12 pupils (if your class is smaller than 26 you can make this group smaller; as long as you have one person in this group you can demonstrate the concept) - call them Group A - to each write out the seven times table. Ask another 12 pupils - call them Group B - to each write out one line of the seven times table, so Ann does 1*7, Branwen does 2*7, Sabeen does 3*7, Lucy does 4*7, and so on. Then have one pupil time the first person in Group A to finish writing out the tables, and have another pupil time how long it takes for everybody in Group B to have written down their answers. Group B should be much faster. This is the idea behind parallel processing (a code analogue follows).
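    For readers who want to see the same demonstration in code, here is a sketch using OpenMP threads in place of pupils; the thread count mirrors the twelve pupils in the exercise and is otherwise an arbitrary choice:

```c
/* The classroom exercise as code: one worker writes the whole
 * seven times table (Group A), then twelve threads each write
 * one line at the same time (Group B). Compile with gcc -fopenmp. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Group A: a single worker, line after line. */
    for (int i = 1; i <= 12; i++)
        printf("serial:   %2d x 7 = %3d\n", i, i * 7);

    /* Group B: twelve threads, one line each, in parallel.
     * The output order varies from run to run - that is the
     * concurrency showing through. */
    #pragma omp parallel for num_threads(12)
    for (int i = 1; i <= 12; i++)
        printf("parallel: %2d x 7 = %3d (written by thread %d)\n",
               i, i * 7, omp_get_thread_num());
    return 0;
}
```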

    108. References
    - http://research.microsoft.com/users/gbell/craytalk/sld066.htm
    - http://inventors.about.com/library/inventors/blsupercomputer.htm
    - http://americanhistory.si.edu/csr/comphist/cray.htm
    - http://web.mit.edu/invent/iow/cray.html
    - www.top500.org
    - http://www.spikynorman.dsl.pipex.com/CrayWWWStuff/
    - http://news.zdnet.co.uk/hardware/emergingtech/0,39020357,39162182,00.htm
