CMPE 478, Parallel Processing
• Advanced Hardware
• Parallel/Distributed Processing
• High Performance Computing
• Top 500 list
• Grid computing
[picture: ASCI White, the most powerful computer in the world (2001)]
Von Neumann Architecture • sequential computer [diagram: CPU, RAM, and devices connected by a single BUS]
History of Computer Architecture • 4 Generations (identified by logic technology) • Tubes • Transistors • Integrated Circuits • VLSI (very large scale integration)
PERFORMANCE TRENDS • Traditional mainframe/supercomputer performance: 25% increase per year • But microprocessor performance: 50% increase per year since the mid 80’s.
Moore’s Law • “Transistor density doubles every 18 months” • Moore is a co-founder of Intel. • 60% increase per year • Exponential growth • PC costs decline. • PCs are the building bricks of all future systems.
Bit Level Parallelism (up to mid 80’s) • 4-bit microprocessors replaced by 8-bit, 16-bit, 32-bit, etc. • doubling the width of the datapath reduces the number of cycles required to perform a full 32-bit operation • mid 80’s: reap the benefits of this kind of parallelism (full 32-bit word operations combined with the use of caches)
Instruction Level Parallelism (mid 80’s to mid 90’s) • Basic steps in instruction processing (instruction decode, integer arithmetic, address calculation) could be performed in a single cycle • Pipelined instruction processing • Reduced instruction set computers (RISC) • Superscalar execution • Branch prediction
Thread/Process Level Parallelism (mid 90’s to present) • On average, control transfers occur roughly once in every five instructions, so exploiting instruction level parallelism at a larger scale is not possible • Use multiple independent “threads” or processes • Concurrently running threads, processes
Sequential vs Parallel Processing

Sequential:
• physical limits reached
• easy to program
• expensive supercomputers

Parallel:
• “raw” power unlimited
• more memory, multiple caches
• made up of COTS, so cheap
• difficult to program
Amdahl’s Law
• The serial fraction s of a program is fixed, so the speedup obtainable by employing parallel processing on P processors is bounded:

  Speedup = 1 / (s + (1 - s) / P)

• In the limit (P → ∞): Speedup = 1 / s
• Led to pessimism in the parallel processing community and prevented development of parallel machines for a long time.
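A minimal sketch of the bound above in Python; the serial fraction and processor counts are illustrative values, not from the slides:

```python
def amdahl_speedup(s, p):
    """Amdahl's law: speedup = 1 / (s + (1 - s) / p)."""
    return 1.0 / (s + (1.0 - s) / p)

# With a 5% serial fraction, speedup saturates near 1/s = 20
# no matter how many processors are added.
for p in (16, 256, 1024):
    print(p, round(amdahl_speedup(0.05, p), 2))   # 9.14, 18.62, 19.64
```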
Gustafson’s Law
• The serial fraction is not fixed: it depends on the number of processors and the input size, since the parallel part scales with the problem.
• Showed that Amdahl’s bound does not apply when the problem size grows with the number of processors.
• Demonstrated more than 1000-fold speedup using 1024 processors.
• Justified parallel processing.
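A matching sketch of Gustafson’s scaled speedup, where s is the serial fraction measured on the parallel machine (again, illustrative numbers):

```python
def gustafson_speedup(s, p):
    """Gustafson's law: scaled speedup = s + (1 - s) * p."""
    return s + (1.0 - s) * p

# A 1% serial fraction on 1024 processors still yields ~1014x,
# in the spirit of the >1000-fold result mentioned above.
print(round(gustafson_speedup(0.01, 1024), 1))    # 1013.8
```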
Hillis’ Thesis ‘85
• proposed “The Connection Machine” with a massive number of processors, each with a small memory, operating in SIMD mode.
• CM-1, CM-2 machines from Thinking Machines Corporation (TMC) were examples of this architecture, with 32K-128K processors. Unfortunately, TMC went out of business.
[diagram: how a piece of silicon is spent in a sequential vs a parallel computer]
Grand Challenge Applications • Important scientific & engineering problems identified by U.S. High Performance Computing & Communications Program (’92)
Flynn’s Taxonomy
• classifies computer architectures according to:
• the number of instruction streams it can process at a time
• the number of data elements on which it can operate simultaneously

                         Data Stream: Single    Data Stream: Multiple
Instruction: Single             SISD                    SIMD
Instruction: Multiple           MISD                    MIMD
SPMD Model (Single Program Multiple Data) • Each processor executes the same program asynchronously • Synchronization takes place only when processors need to exchange data • SPMD is extension of SIMD (relax synchronized instruction execution) • SPMD is restriction of MIMD (use only one source/object)
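As an illustration, a minimal SPMD sketch using mpi4py (an assumed dependency, not named in the slides); every process runs this same program, and synchronization happens only at the data exchange:

```python
# Run with e.g.: mpiexec -n 4 python spmd_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this process's id (0..size-1)
size = comm.Get_size()        # total number of processes

local = rank * rank           # each rank computes on its own, asynchronously

# The only synchronization point: ranks exchange data in a reduction.
total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print(f"sum of squares over {size} ranks: {total}")
```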
Parallel Processing Terminology
• Embarrassingly Parallel:
  • applications which are trivial to parallelize
  • large amounts of independent computation
  • little communication
• Data Parallelism:
  • model of parallel computing in which a single operation can be applied to all data elements simultaneously
  • amenable to the SIMD or SPMD style of computation (see the sketch below)
• Control Parallelism:
  • many different operations may be executed concurrently
  • requires the MIMD/SPMD style of computation
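A minimal data-parallel sketch using Python’s standard multiprocessing module: one operation (squaring, a hypothetical example) is applied across all elements of the data:

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:                    # 4 worker processes (illustrative)
        print(pool.map(square, range(10)))   # same operation on every element
```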
Parallel Processing Terminology
• Scalability:
  • if the size of the problem is increased, the number of processors that can be used effectively can also be increased (i.e. there is no limit on parallelism)
  • the cost of a scalable algorithm grows slowly as the input size and the number of processors are increased
  • data parallel algorithms are more scalable than control parallel algorithms
• Granularity:
  • fine grain machines: employ a massive number of weak processors, each with small memory
  • coarse grain machines: a smaller number of powerful processors, each with large amounts of memory
Shared Memory Machines
[diagram: processes (threads) sharing a single address space]
• Memory is globally shared; therefore processes (threads) see a single address space
• Coordination of accesses to locations is done by the use of locks provided by thread libraries (see the sketch below)
• Example machines: Sequent, Alliant, SUN Ultra, Dual/Quad Board Pentium PC
• Example thread libraries: POSIX threads, Linux threads
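A minimal sketch of lock-based coordination in a shared address space, using POSIX-style threads through Python’s threading module:

```python
import threading

counter = 0                      # a shared location, visible to all threads
lock = threading.Lock()          # lock provided by the thread library

def worker(n):
    global counter
    for _ in range(n):
        with lock:               # serialize access to the shared counter
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                   # 40000; without the lock, updates could be lost
```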
Shared Memory Machines
• can be classified as:
  • UMA: uniform memory access
  • NUMA: nonuniform memory access
• based on the amount of time a processor takes to access local and global memory
[diagrams (a)-(c): processors and memories connected by a bus or interconnection network, with memory either centralized or local to each processor]
Distributed Memory Machines
[diagram: processes with local memories communicating over a network]
• Each processor has its own local memory (not directly accessible by others)
• Processors communicate by passing messages to each other (see the sketch below)
• Example machines: IBM SP2, Intel Paragon, COWs (clusters of workstations)
• Example message passing libraries: PVM, MPI
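A minimal message-passing sketch with mpi4py (one binding of the MPI library named above; assumed available): each rank owns private data and communicates only through explicit messages:

```python
# Run with e.g.: mpiexec -n 2 python msg_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send([1, 2, 3], dest=1, tag=0)      # explicit send from rank 0
elif rank == 1:
    data = comm.recv(source=0, tag=0)        # explicit receive on rank 1
    print("rank 1 received", data)
```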
Beowulf Clusters • use COTS: ordinary PCs and networking equipment • have the best price/performance ratio [picture: a PC cluster]
Grid Computing • provides access to computing power and various resources just like accessing electrical power from the electrical grid • allows coupling of geographically distributed resources • provides inexpensive access to resources irrespective of their physical location or access point • the Internet & dedicated networks can be used to interconnect distributed computational resources and present them as a single unified resource • resources: supercomputers, clusters, storage systems, data resources, special devices
Grid Computing
• the GRID is, in effect, a set of software tools which, when combined with hardware, would let users tap processing power off the Internet as easily as electrical power can be drawn from the electricity grid.
• Examples of Grid projects:
  • Seti@home: search for extraterrestrial intelligence
  • Entropia: company brokering the processing power of idle computers; about 30,000 volunteer computers and a total processing power of 1 Tflop
  • Xpulsar@home: sifts astronomical data for pulsars
  • Folding@home: protein folding
  • Evolutionary@home: population dynamics
Seti@home Project • Screen-saver program • Sifts through signals recorded by the giant Arecibo radio telescope in Puerto Rico • 3 million people have downloaded the screen saver and run it • The program periodically prompts its host to retrieve a new chunk of data from the Internet and sends the latest processed results back to SETI • The equivalent of more than 600,000 years of PC processing time has already been clocked up
More Grid Projects
• GriPhyN: grid developed by a consortium of American labs for physics projects
• Earth System Grid: runs huge climate simulations spanning hundreds of years
• Earthquake Engineering Simulation Grid:
• Particle Physics Data Grid:
• Information Power Grid: supported by NASA for massive engineering calculations
• DataGrid: European, coordinated by CERN. Aim is to develop middleware for research projects in biological sciences, earth observation and high energy physics.
Gordon Bell & Jim Gray on “What’s next in High Performance Computing” • Beowulf ’s economics and sociology are poised to kill off the other architectural lines • Computational Grid can federate systems into supercomputers far beyond the power of any current computing center • The centers will become super-data and super-application centers • Clusters (currently) perform poorly on applications that require large shared memory
Gordon Bell & Jim Gray on “What’s next in High Performance Computing” • Now individuals and laboratories can assemble and incrementally grow any-size super-computer anywhere in the world. • By 2010, the cluster is likely to be the principal computing structure. • Seti@home does not run Linpack, so does not qualify in the top500 list. But Seti@home avarages 13 Tflops making it more powerful than the top 3 of top500 machines combined. • GRID and P2P computing using the Internet is likely to remain the world’s most powerful supercomputer.
Gordon Bell & Jim Gray on “What’s next in High Performance Computing” • Concerned that traditional supercomputer architecture is dead and a supercomputer mono-culture is being born. • Recommend increased investment in peta-scale distributed databases. • By 2010, the cluster is likely to be the principal computing structure. • Research programs that stimulate cluster understanding and training are a good investment for laboratories that depend on highest performance machines.