Parallel Computing Explained: Parallel Computing Overview

Slides Prepared from the CI-Tutor Courses at NCSA

http://ci-tutor.ncsa.uiuc.edu/

By

S. Masoud Sadjadi

School of Computing and Information Sciences

Florida International University

March 2009


Agenda

1 Parallel Computing Overview

2 How to Parallelize a Code

3 Porting Issues

4 Scalar Tuning

5 Parallel Code Tuning

6 Timing and Profiling

7 Cache Tuning

8 Parallel Performance Analysis

9 About the IBM Regatta P690


Agenda

1 Parallel Computing Overview

1.1 Introduction to Parallel Computing

1.1.1 Parallelism in our Daily Lives

1.1.2 Parallelism in Computer Programs

1.1.3 Parallelism in Computers

1.1.4 Performance Measures

1.1.5 More Parallelism Issues

1.2 Comparison of Parallel Computers

1.3 Summary


Parallel Computing Overview

  • Who should read this chapter?

    • New Users – to learn concepts and terminology.

    • Intermediate Users – for review or reference.

    • Management Staff – to understand the basic concepts – even if you don’t plan to do any programming.

    • Note: Advanced users may opt to skip this chapter.


Introduction to Parallel Computing

  • High performance parallel computers

    • can solve large problems much faster than a desktop computer

      • fast CPUs, large memory, high speed interconnects, and high speed input/output

      • able to speed up computations

        • by making the sequential components run faster

        • by doing more operations in parallel

  • High performance parallel computers are in demand

    • need for tremendous computational capabilities in science, engineering, and business.

      • require gigabytes/terabytes of memory and gigaflops/teraflops of performance

      • scientists are striving for petascale performance


Introduction to Parallel Computing

  • High performance parallel computers (HPPC) are used in a wide variety of disciplines.

    • Meteorologists: prediction of tornadoes and thunderstorms

    • Computational biologists: analyze DNA sequences

    • Pharmaceutical companies: design of new drugs

    • Oil companies: seismic exploration

    • Wall Street: analysis of financial markets

    • NASA: aerospace vehicle design

    • Entertainment industry: special effects in movies and commercials

  • These complex scientific and business applications all need to perform computations on large datasets or large equations.


Parallelism in our Daily Lives

  • There are two types of processes that occur in computers and in our daily lives:

    • Sequential processes

      • occur in a strict order

      • it is not possible to do the next step until the current one is completed.

      • Examples

        • The passage of time: the sun rises and the sun sets.

        • Writing a term paper: pick the topic, research, and write the paper.

    • Parallel processes

      • many events happen simultaneously

      • Examples

        • Plant growth in the springtime

        • An orchestra


Agenda

1 Parallel Computing Overview

1.1 Introduction to Parallel Computing

1.1.1 Parallelism in our Daily Lives

1.1.2 Parallelism in Computer Programs

1.1.2.1 Data Parallelism

1.1.2.2 Task Parallelism

1.1.3 Parallelism in Computers

1.1.4 Performance Measures

1.1.5 More Parallelism Issues

1.2 Comparison of Parallel Computers

1.3 Summary


Parallelism in Computer Programs

  • Conventional wisdom:

    • Computer programs are sequential in nature

    • Only a small subset of them lend themselves to parallelism.

    • Algorithm: the "sequence of steps" necessary to do a computation.

    • For the first 30 years of computer use, programs were run sequentially.

  • The 1980s saw great successes with parallel computers.

    • Dr. Geoffrey Fox published a book entitled Parallel Computing Works!

    • Many scientific accomplishments resulted from parallel computing.

  • The revised view:

    • Computer programs are parallel in nature.

    • Only a small subset of them need to be run sequentially.


Parallel Computing

  • What a computer does when it carries out more than one computation at a time using more than one processor.

  • By using many processors at once, we can speed up the execution.

    • If one processor can perform the arithmetic in time t, then ideally p processors can perform it in time t/p.

    • What if I use 100 processors? What if I use 1000 processors?

  • Almost every program has some form of parallelism.

    • You need to determine whether your data or your program can be partitioned into independent pieces that can be run simultaneously.

    • Decomposition is the name given to this partitioning process.

  • Types of parallelism:

    • data parallelism

    • task parallelism.
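
The ideal t/p relationship above can be tried out with a few lines of Python. This is a minimal sketch: the function name and the single-processor time of 1000 seconds are illustrative assumptions, and real programs rarely reach the ideal because of overhead and sequential portions.

```python
def ideal_parallel_time(t: float, p: int) -> float:
    """Ideal run time on p processors when one processor needs time t."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return t / p


t = 1000.0  # hypothetical single-processor time in seconds
for p in (1, 100, 1000):
    print(f"p = {p:4d}: ideal time = {ideal_parallel_time(t, p):.1f} s")
```

With these assumed numbers, 100 processors would ideally need 10 seconds and 1000 processors 1 second.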


Data Parallelism

  • The same code segment runs concurrently on each processor, but each processor is assigned its own part of the data to work on.

    • Do loops (in Fortran) define the parallelism.

    • The iterations must be independent of each other.

    • Data parallelism is called "fine grain parallelism" because the computational work is spread into many small subtasks.

  • Example

    • Dense linear algebra, such as matrix multiplication, is a perfect candidate for data parallelism.


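An example of data parallelism

Original Sequential Code

      DO K=1,N
        DO J=1,N
          DO I=1,N
            C(I,J) = C(I,J) + A(I,K)*B(K,J)
          END DO
        END DO
      END DO

Parallel Code

!     Every K iteration adds into the same C(I,J), so the K loop needs a
!     reduction on C to keep threads from overwriting each other's sums.
!$OMP PARALLEL DO REDUCTION(+:C)
      DO K=1,N
        DO J=1,N
          DO I=1,N
            C(I,J) = C(I,J) + A(I,K)*B(K,J)
          END DO
        END DO
      END DO
!$OMP END PARALLEL DO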
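
The same decomposition can be sketched in Python to show what dividing the K loop among workers involves. This is an illustration only, not the OpenMP mechanism: the function names are hypothetical, each worker accumulates into its own private copy of C, and the copies are summed at the end. Python threads do not run CPU-bound code truly in parallel, so the sketch shows the partitioning, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_product(a, b, k_values):
    """Accumulate the contribution of the given K iterations into a private matrix."""
    n = len(a)
    c_private = [[0.0] * n for _ in range(n)]
    for k in k_values:
        for j in range(n):
            for i in range(n):
                c_private[i][j] += a[i][k] * b[k][j]
    return c_private


def parallel_matmul(a, b, workers=2):
    """Split the K loop among workers, then sum the private results."""
    n = len(a)
    # Worker w handles K = w, w + workers, w + 2*workers, ...
    chunks = [range(w, n, workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda ks: partial_product(a, b, ks), chunks))
    c = [[0.0] * n for _ in range(n)]
    for part in partials:
        for i in range(n):
            for j in range(n):
                c[i][j] += part[i][j]
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(parallel_matmul(a, b))
```

The final summation is needed because different K iterations all update the same element of C.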


Quick Intro to OpenMP

  • OpenMP is a portable standard for parallel directives covering both data and task parallelism.

    • More information about OpenMP is available on the OpenMP website.

    • We will have a lecture on Introduction to OpenMP later.

  • With OpenMP, the loop that is performed in parallel is the loop that immediately follows the Parallel Do directive.

    • In our sample code, it's the K loop:

      • DO K=1,N


OpenMP Loop Parallelism

Iteration-Processor Assignments

The code segment running on each processor

      DO J=1,N
        DO I=1,N
          C(I,J) = C(I,J) + A(I,K)*B(K,J)
        END DO
      END DO
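
One way the K iterations might be assigned to processors is in contiguous blocks. The sketch below assumes that scheme for illustration; the function name is hypothetical, and the schedule an OpenMP implementation actually uses depends on the implementation and on any SCHEDULE clause.

```python
def block_assignment(n_iterations: int, n_processors: int):
    """Return {processor: list of 1-based K values} using contiguous blocks."""
    base, extra = divmod(n_iterations, n_processors)
    assignment = {}
    start = 1
    for proc in range(n_processors):
        # The first `extra` processors take one additional iteration.
        count = base + (1 if proc < extra else 0)
        assignment[proc] = list(range(start, start + count))
        start += count
    return assignment


for proc, ks in block_assignment(10, 4).items():
    print(f"processor {proc}: K = {ks}")
```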


OpenMP Style of Parallelism

  • OpenMP parallelization can be done incrementally as follows:

    1. Parallelize the most computationally intensive loop.

    2. Measure the performance of the code.

    3. If performance is not satisfactory, parallelize another loop.

    4. Repeat steps 2 and 3 as many times as needed.

  • The ability to perform incremental parallelism is considered a positive feature of data parallelism.

  • It is contrasted with the MPI (Message Passing Interface) style of parallelism, which is an "all or nothing" approach.


Task Parallelism

  • Task parallelism may be thought of as the opposite of data parallelism.

  • Instead of the same operations being performed on different parts of the data, each process performs different operations.

  • You can use task parallelism when your program can be split into independent pieces, often subroutines, that can be assigned to different processors and run concurrently.

  • Task parallelism is called "coarse grain" parallelism because the computational work is spread into just a few subtasks.

  • More code is run in parallel because the parallelism is implemented at a higher level than in data parallelism.

  • Task parallelism is often easier to implement and has less overhead than data parallelism.


Task Parallelism

  • The abstract code shown in the diagram is decomposed into 4 independent code segments that are labeled A, B, C, and D. The right hand side of the diagram illustrates the 4 code segments running concurrently.


Task Parallelism

Original Code

      program main
      code segment labeled A
      code segment labeled B
      code segment labeled C
      code segment labeled D
      end

Parallel Code

      program main
!$OMP PARALLEL
!$OMP SECTIONS
      code segment labeled A
!$OMP SECTION
      code segment labeled B
!$OMP SECTION
      code segment labeled C
!$OMP SECTION
      code segment labeled D
!$OMP END SECTIONS
!$OMP END PARALLEL
      end
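
For comparison, the same idea of running four independent code segments concurrently can be sketched in Python. The segment functions and their return values are placeholders invented for this illustration, and a thread pool stands in for the OpenMP SECTIONS construct.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_a():
    return "A done"


def segment_b():
    return "B done"


def segment_c():
    return "C done"


def segment_d():
    return "D done"


def run_sections():
    """Submit the four independent segments at once and collect their results."""
    segments = [segment_a, segment_b, segment_c, segment_d]
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        futures = [pool.submit(segment) for segment in segments]
        # Results are collected in submission order, whatever order they finish in.
        return [future.result() for future in futures]


print(run_sections())
```

This only works because the segments do not depend on each other's results, which is the precondition for task parallelism.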


OpenMP Task Parallelism

  • With OpenMP, the code that follows each SECTION(S) directive is allocated to a different processor. In our sample parallel code, the allocation of code segments to processors is as follows.


Parallelism in Computers

  • How parallelism is exploited and enhanced within the operating system and hardware components of a parallel computer:

    • operating system

    • arithmetic

    • memory

    • disk


Operating System Parallelism

  • All of the commonly used parallel computers run a version of the Unix operating system. In the table below each OS listed is in fact Unix, but the name of the Unix OS varies with each vendor.

  • For more information about Unix, a collection of Unix documents is available.


Two Unix Parallelism Features

  • background processing facility

    • With the Unix background processing facility you can run the executable a.out in the background and simultaneously view the man page for the etime function in the foreground. There are two Unix commands that accomplish this:

      a.out > results &

      man etime

  • cron feature

    • With the Unix cron feature you can submit a job that will run at a later time.


Arithmetic Parallelism

  • Multiple execution units

    • facilitate arithmetic parallelism.

    • The arithmetic operations of add, subtract, multiply, and divide (+ - * /) are each done in a separate execution unit. This allows several execution units to be used simultaneously, because the execution units operate independently.

  • Fused multiply and add

    • is another parallel arithmetic feature.

    • Parallel computers are able to overlap multiply and add. This arithmetic is named MultiplyADD (MADD) on SGI computers, and Fused Multiply Add (FMA) on HP computers. In either case, the two arithmetic operations are overlapped and can complete in hardware in one computer cycle.

  • Superscalar arithmetic

    • is the ability to issue several arithmetic operations per computer cycle.

    • It makes use of the multiple, independent execution units. On superscalar computers there are multiple slots per cycle that can be filled with work. This gives rise to the name n-way superscalar, where n is the number of slots per cycle. The SGI Origin2000 is called a 4-way superscalar computer.


Memory Parallelism

  • memory interleaving

    • memory is divided into multiple banks, and consecutive data elements are interleaved among them. For example if your computer has 2 memory banks, then data elements with even memory addresses would fall into one bank, and data elements with odd memory addresses into the other.

  • multiple memory ports

    • Port means a bi-directional memory pathway. When the data elements that are interleaved across the memory banks are needed, the multiple memory ports allow them to be accessed and fetched in parallel, which increases the memory bandwidth (MB/s or GB/s).

  • multiple levels of the memory hierarchy

    • There is global memory that any processor can access. There is memory that is local to a partition of the processors. Finally there is memory that is local to a single processor, that is, the cache memory and the memory elements held in registers.

  • Cache memory

    • Cache is a small memory that has fast access compared with the larger main memory and serves to keep the faster processor filled with data.
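
The memory interleaving described above can be sketched as a simple mapping from address to bank. The sketch assumes one data element per address and that the bank is chosen by the remainder of the address divided by the number of banks; real hardware may use a different mapping.

```python
def bank_of(address: int, n_banks: int) -> int:
    """Bank holding the data element at this address (assumed: address mod banks)."""
    return address % n_banks


def interleave(addresses, n_banks: int):
    """Group addresses by the bank they fall into."""
    banks = [[] for _ in range(n_banks)]
    for address in addresses:
        banks[bank_of(address, n_banks)].append(address)
    return banks


print(interleave(range(8), 2))
```

With 2 banks, even addresses land in one bank and odd addresses in the other, so consecutive elements can be fetched from different banks at the same time.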


Memory Parallelism

[Figures: Memory Hierarchy; Cache Memory]

Disk Parallelism

  • RAID (Redundant Array of Inexpensive Disks)

    • RAID disks are on most parallel computers.

    • The advantage of a RAID disk system is that it provides a measure of fault tolerance.

    • If one of the disks goes down, it can be swapped out, and the RAID disk system remains operational.

  • Disk Striping

    • When a data set is written to disk, it is striped across the RAID disk system. That is, it is broken into pieces that are written simultaneously to the different disks in the RAID disk system. When the same data set is read back in, the pieces are read in parallel, and the full data set is reassembled in memory.
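
Disk striping can be sketched by dealing fixed-size pieces of a data set out to the disks in turn and reading them back in the same order. The sketch below is a simplification under stated assumptions: the stripe size and function names are illustrative, the disks are plain Python lists, the pieces are written one after another instead of simultaneously, and no redundancy is modeled.

```python
def stripe(data: bytes, n_disks: int, stripe_size: int):
    """Break data into pieces and deal them out to the disks round-robin."""
    disks = [[] for _ in range(n_disks)]
    for index, offset in enumerate(range(0, len(data), stripe_size)):
        disks[index % n_disks].append(data[offset:offset + stripe_size])
    return disks


def reassemble(disks) -> bytes:
    """Read the pieces back row by row to rebuild the original data set."""
    pieces = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)


disks = stripe(b"ABCDEFGHIJ", n_disks=3, stripe_size=2)
print(disks)
print(reassemble(disks))
```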



Performance Measures

  • Peak Performance

    • is the top speed at which the computer can operate.

    • It is a theoretical upper limit on the computer's performance.

  • Sustained Performance

    • is the highest consistently achieved speed.

    • It is a more realistic measure of computer performance.

  • Cost Performance

    • is used to determine if the computer is cost effective.

  • MHz

    • is a measure of the processor speed.

    • The processor speed is commonly measured in millions of cycles per second, where a computer cycle is defined as the shortest time in which some work can be done.

  • MIPS

    • is a measure of how quickly the computer can issue instructions.

    • Millions of instructions per second is abbreviated as MIPS, where the instructions are computer instructions such as memory reads and writes, logical operations, floating point operations, integer operations, and branch instructions.


Performance Measures

  • Mflops (millions of floating point operations per second)

    • measures how quickly a computer can perform floating-point operations such as add, subtract, multiply, and divide.

  • Speedup

    • measures the benefit of parallelism.

    • It shows how your program scales as you compute with more processors, compared to the performance on one processor.

    • Ideal speedup happens when the performance gain is linearly proportional to the number of processors used.

  • Benchmarks

    • are used to rate the performance of parallel computers and parallel programs.

    • A well known benchmark that is used to compare parallel computers is the Linpack benchmark.

    • Based on the Linpack results, a list is produced of the Top 500 Supercomputer Sites. This list is maintained by the University of Tennessee and the University of Mannheim.
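
Speedup and Mflops are both simple ratios, as the sketch below shows. All timings are hypothetical values chosen for illustration, and the operation count of 2*N**3 assumes the matrix multiplication loop shown earlier, which performs one multiply and one add per innermost iteration.

```python
def speedup(time_one_processor: float, time_p_processors: float) -> float:
    """Speedup relative to the run time on one processor."""
    return time_one_processor / time_p_processors


def mflops(floating_point_ops: float, seconds: float) -> float:
    """Millions of floating point operations per second."""
    return floating_point_ops / seconds / 1.0e6


# Hypothetical wall-clock times in seconds, keyed by processor count.
timings = {1: 120.0, 2: 60.0, 4: 40.0, 8: 30.0}
for p, t in timings.items():
    print(f"p = {p}: speedup = {speedup(timings[1], t):.2f} (ideal {p})")

n = 1000
print(f"{mflops(2.0 * n**3, 4.0):.1f} Mflops")  # assumed 4-second run
```

In this made-up data the speedup is ideal at 2 processors and falls behind the ideal line at 4 and 8.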


More Parallelism Issues

  • Load balancing

    • is the technique of evenly dividing the workload among the processors.

    • For data parallelism it involves how iterations of loops are allocated to processors.

    • Load balancing is important because the total time for the program to complete is the time spent by the longest executing thread.

  • The problem size

    • must be large and must be able to grow as you compute with more processors.

    • In order to get the performance you expect from a parallel computer you need to run a large application with large data sizes, otherwise the overhead of passing information between processors will dominate the calculation time.

  • Good software tools

    • are essential for users of high performance parallel computers.

    • These tools include:

      • parallel compilers

      • parallel debuggers

      • performance analysis tools

      • parallel math software

    • The availability of a broad set of application software is also important.
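
The effect of load balancing on total run time can be illustrated with a small model. The sketch assumes a loop whose iteration i costs i units of work (a hypothetical cost pattern) and compares two ways of allocating iterations to threads; the total time is taken to be the load of the busiest thread.

```python
def thread_loads(costs, n_threads: int, cyclic: bool):
    """Total work per thread for block or cyclic allocation of iterations."""
    loads = [0] * n_threads
    block = -(-len(costs) // n_threads)  # ceiling division
    for i, cost in enumerate(costs):
        owner = i % n_threads if cyclic else i // block
        loads[owner] += cost
    return loads


def completion_time(loads):
    """The program finishes when the longest-running thread finishes."""
    return max(loads)


costs = list(range(1, 9))  # iteration i costs i units (assumed)
for cyclic in (False, True):
    loads = thread_loads(costs, n_threads=2, cyclic=cyclic)
    name = "cyclic" if cyclic else "block"
    print(f"{name}: loads = {loads}, completion time = {completion_time(loads)}")
```

Both allocations do the same 36 units of work, but the more even split finishes sooner.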


More Parallelism Issues

  • The high performance computing market is risky and chaotic. Many supercomputer vendors are no longer in business, making the portability of your application very important.

  • A workstation farm

    • is defined as a fast network connecting heterogeneous workstations.

    • The individual workstations serve as desktop systems for their owners.

    • When they are idle, large problems can take advantage of the unused cycles in the whole system.

    • An application of this concept is the SETI project. You can participate in searching for extraterrestrial intelligence with your home PC. More information about this project is available at the SETI Institute.

    • Condor

      • is software that provides resource management services for applications that run on heterogeneous collections of workstations.

      • Miron Livny at the University of Wisconsin at Madison is the director of the Condor project, and has coined the phrase high throughput computing to describe this process of harnessing idle workstation cycles. More information is available at the Condor Home Page.


Agenda

1 Parallel Computing Overview

1.1 Introduction to Parallel Computing

1.2 Comparison of Parallel Computers

1.2.1 Processors

1.2.2 Memory Organization

1.2.3 Flow of Control

1.2.4 Interconnection Networks

1.2.4.1 Bus Network

1.2.4.2 Cross-Bar Switch Network

1.2.4.3 Hypercube Network

1.2.4.4 Tree Network

1.2.4.5 Interconnection Networks Self-test

1.2.5 Summary of Parallel Computer Characteristics

1.3 Summary


Comparison of Parallel Computers

  • Now you can explore the hardware components of parallel computers:

    • kinds of processors

    • types of memory organization

    • flow of control

    • interconnection networks

  • You will see what is common to these parallel computers, and what makes each one of them unique.


Kinds of Processors

  • There are three types of parallel computers:

    • computers with a small number of powerful processors

      • Typically have tens of processors.

      • The cooling of these computers often requires very sophisticated and expensive equipment, making these computers very expensive for computing centers.

      • They are general-purpose computers that perform especially well on applications that have large vector lengths.

      • The examples of this type of computer are the Cray SV1 and the Fujitsu VPP5000.


Kinds of Processors

  • There are three types of parallel computers:

    • computers with a large number of less powerful processors

      • Called Massively Parallel Processors (MPPs), these computers typically have thousands of processors.

      • The processors are usually proprietary and air-cooled.

      • Because of the large number of processors, the distance between the furthest processors can be quite large, requiring a sophisticated internal network that allows distant processors to communicate with each other quickly.

      • These computers are suitable for applications with a high degree of concurrency.

      • The MPP type of computer was popular in the 1980s.

      • Examples of this type of computer were the Thinking Machines CM-2 computer, and the computers made by the MasPar company.


Kinds of Processors

  • There are three types of parallel computers:

    • medium-scale computers that fall between the two extremes

      • Typically have hundreds of processors.

      • The processor chips are usually not proprietary; rather they are commodity processors like the Pentium III.

      • These are general-purpose computers that perform well on a wide range of applications.

      • The most common example of this class is the Linux Cluster.


Trends and Examples

  • Processor trends :

  • The processors on today’s commonly used parallel computers:


Memory Organization

  • The following paragraphs describe the three types of memory organization found on parallel computers:

    • distributed memory

    • shared memory

    • distributed shared memory


Distributed Memory

  • In distributed memory computers, the total memory is partitioned into memory that is private to each processor.

    • There is a Non-Uniform Memory Access time (NUMA), which is proportional to the distance between the two communicating processors.

  • On NUMA computers, data is accessed the quickest from a private memory, while data from the most distant processor takes the longest to access.

  • Some examples are the Cray T3E, the IBM SP, and workstation clusters.


Distributed Memory

  • When programming distributed memory computers, the code and the data should be structured such that the bulk of a processor’s data accesses are to its own private (local) memory.

  • This is called having good data locality.

  • Today's distributed memory computers use message passing such as MPI to communicate between processors as shown in the following example:
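
The slide's own example is not reproduced in this transcript. As a stand-in, the following minimal sketch imitates the message passing style in Python: two workers with no shared data exchange values only through explicit send and receive calls. It does not use MPI; the queues, the rank numbering, and the `send`/`recv` helpers are assumptions made for illustration.

```python
import queue
import threading


def run_exchange():
    """Rank 0 sends a list to rank 1; rank 1 replies with its sum."""
    mailboxes = [queue.Queue() for _ in range(2)]  # one inbox per rank
    results = {}

    def send(dest, message):
        mailboxes[dest].put(message)

    def recv(rank):
        return mailboxes[rank].get()  # blocks until a message arrives

    def worker(rank):
        if rank == 0:
            send(1, [1, 2, 3, 4])
            results[0] = recv(0)
        else:
            data = recv(1)
            send(0, sum(data))
            results[1] = data

    threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(run_exchange())
```

Each worker touches only the data it received, which mirrors the point about keeping data accesses local.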


Distributed Memory

  • One advantage of distributed memory computers is that they are easy to scale. As the demand for resources grows, computer centers can easily add more memory and processors.

    • This is often called the LEGO block approach.

  • The drawback is that programming of distributed memory computers can be quite complicated.


Shared Memory

  • In shared memory computers, all processors have access to a single pool of centralized memory with a uniform address space.

  • Any processor can address any memory location at the same speed so there is Uniform Memory Access time (UMA).

  • Processors communicate with each other through the shared memory.

  • The advantages and disadvantages of shared memory machines are roughly the opposite of distributed memory computers.

    • They are easier to program because they resemble the programming of single processor machines

    • But they don't scale like their distributed memory counterparts


Distributed Shared Memory

  • In Distributed Shared Memory (DSM) computers, a cluster or partition of processors has access to a common shared memory.

    • It accesses the memory of a different processor cluster in a NUMA fashion.

    • Memory is physically distributed but logically shared.

    • Attention to data locality again is important.

  • Distributed shared memory computers combine the best features of both distributed memory computers and shared memory computers.

  • That is, DSM computers have both the scalability of distributed memory computers and the ease of programming of shared memory computers.

  • Some examples of DSM computers are the SGI Origin2000 and the HP V-Class computers.


Trends and Examples

  • Memory organization trends:

  • The memory organization of today’s commonly used parallel computers:


Flow of Control

  • When you look at the flow of control you will see three types of parallel computers:

    • Single Instruction Multiple Data (SIMD)

    • Multiple Instruction Multiple Data (MIMD)

    • Single Program Multiple Data (SPMD)


Flynn’s Taxonomy

  • Flynn’s Taxonomy, devised in 1972 by Michael Flynn of Stanford University, describes computers by how streams of instructions interact with streams of data.

  • There can be single or multiple instruction streams, and there can be single or multiple data streams. This gives rise to 4 types of computers as shown in the diagram below:

  • Flynn's taxonomy names the 4 computer types SISD, MISD, SIMD and MIMD.

    • Of these 4, only SIMD and MIMD are applicable to parallel computers.

    • Another computer type, SPMD, is a special case of MIMD.


SIMD Computers

  • SIMD stands for Single Instruction Multiple Data.

  • Each processor follows the same set of instructions, with different data elements being allocated to each processor.

  • SIMD computers have distributed memory with typically thousands of simple processors, and the processors run in lock step.

  • SIMD computers, popular in the 1980s, are useful for fine grain data parallel applications, such as neural networks.

  • Some examples of SIMD computers were the Thinking Machines CM-2 computer and the computers from the MasPar company.

  • The processors are commanded by the global controller that sends instructions to the processors.

    • It says add, and they all add.

    • It says shift to the right, and they all shift to the right.

    • The processors are like obedient soldiers, marching in unison.
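
The lock-step behavior can be sketched with a toy model in which a controller broadcasts one instruction and every processing element applies it to its own data element. The class and instruction names are hypothetical, and "shift to the right" is interpreted here as a one-bit right shift of each value, which is an assumption.

```python
class SimdArray:
    """Toy SIMD machine: one data element per processing element."""

    def __init__(self, values):
        self.values = list(values)

    def broadcast(self, instruction, operand=None):
        """Apply the same instruction to every element in one step."""
        if instruction == "add":
            self.values = [value + operand for value in self.values]
        elif instruction == "shift_right":
            self.values = [value >> 1 for value in self.values]
        else:
            raise ValueError(f"unknown instruction: {instruction}")


machine = SimdArray([8, 16, 24, 32])
machine.broadcast("add", 4)
print(machine.values)
machine.broadcast("shift_right")
print(machine.values)
```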


MIMD Computers

  • MIMD stands for Multiple Instruction Multiple Data.

  • There are multiple instruction streams with separate code segments distributed among the processors.

  • MIMD is actually a superset of SIMD, so that the processors can run the same instruction stream or different instruction streams.

  • In addition, there are multiple data streams; different data elements are allocated to each processor.

  • MIMD computers can have either distributed memory or shared memory.

  • While the processors on SIMD computers run in lock step, the processors on MIMD computers run independently of each other.

  • MIMD computers can be used for either data parallel or task parallel applications.

  • Some examples of MIMD computers are the SGI Origin2000 computer and the HP V-Class computer.


SPMD Computers

  • SPMD stands for Single Program Multiple Data.

  • SPMD is a special case of MIMD.

  • SPMD execution happens when a MIMD computer is programmed to have the same set of instructions per processor.

  • With SPMD computers, while the processors are running the same code segment, each processor can run that code segment asynchronously.

  • Unlike SIMD, the synchronous execution of instructions is relaxed.

  • An example is the execution of an if statement on a SPMD computer.

    • Because each processor computes with its own partition of the data elements, it may evaluate the right hand side of the if statement differently from another processor.

    • One processor may take a certain branch of the if statement, and another processor may take a different branch of the same if statement.

    • Hence, even though each processor has the same set of instructions, those instructions may be evaluated in a different order from one processor to the next.

  • The analogies we used for describing SIMD computers can be modified for MIMD computers.

    • Instead of the SIMD obedient soldiers, all marching in unison, in the MIMD world the processors march to the beat of their own drummer.
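
The if-statement example can be sketched as follows: every worker runs the same function, but because each holds a different partition of the data, workers may take different branches. The function, the branch condition, and the data partitions are all invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def program(partition):
    """The single program that every worker runs on its own partition."""
    total = sum(partition)
    if total % 2 == 0:
        return ("even branch", total // 2)
    return ("odd branch", total * 3)


partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    outcomes = list(pool.map(program, partitions))

for worker, outcome in enumerate(outcomes):
    print(f"worker {worker}: {outcome}")
```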


Summary of SIMD versus MIMD


Trends and Examples

  • Flow of control trends:

  • The flow of control on today’s commonly used parallel computers:



Interconnection Networks

  • What exactly is the interconnection network?

    • The interconnection network is made up of the wires and cables that define how the multiple processors of a parallel computer are connected to each other and to the memory units.

    • The time required to transfer data is dependent upon the specific type of the interconnection network.

    • This transfer time is called the communication time.

  • What network characteristics are important?

    • Diameter: the maximum distance that data must travel for 2 processors to communicate.

    • Bandwidth: the amount of data that can be sent through a network connection.

    • Latency: the delay on a network while a data packet is being stored and forwarded.

  • Types of Interconnection Networks

    The network topologies (geometric arrangements of the computer network connections) are:

    • Bus

    • Cross-bar Switch

    • Hypercube

    • Tree


Interconnection Networks

  • The aspects of network issues are:

    • Cost

    • Scalability

    • Reliability

    • Suitable Applications

    • Data Rate

    • Diameter

    • Degree

  • General Network Characteristics

    • Some networks can be compared in terms of their degree and diameter.

    • Degree: how many communicating wires are coming out of each processor.

      • A large degree is a benefit because it provides multiple paths between processors.

    • Diameter: This is the distance between the two processors that are farthest apart.

      • A small diameter corresponds to low latency.


Bus Network

  • Bus topology is the original coaxial cable-based Local Area Network (LAN) topology in which the medium forms a single bus to which all stations are attached.

  • The positive aspects

    • It is a mature technology that is well known and reliable.

    • The cost is very low.

    • It is simple to construct.

  • The negative aspects

    • limited data transmission rate.

    • not scalable in terms of performance.

  • Example: SGI Power Challenge.

    • Only scaled to 18 processors.


Cross-Bar Switch Network

  • A cross-bar switch is a network that works through a switching mechanism to access shared memory.

    • it scales better than the bus network but it costs significantly more.

  • The telephone system uses this type of network. An example of a computer with this type of network is the HP V-Class.

  • Here is a diagram of a cross-bar switch network which shows the processors talking through the switchboxes to store or retrieve data in memory.

  • There are multiple paths for a processor to communicate with a certain memory.

  • The switches determine the optimal route to take.


Hypercube Network

  • In a hypercube network, the processors are connected as if they were corners of a multidimensional cube. Each node in an N dimensional cube is directly connected to N other nodes.

  • The fact that the number of directly connected, "nearest neighbor", nodes increases with the total size of the network is also highly desirable for a parallel computer.

  • The degree of a hypercube network is log2 n and the diameter is also log2 n, where n is the number of processors.

  • Examples of computers with this type of network are the CM-2, NCUBE-2, and the Intel iPSC860.
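
The degree and diameter of a hypercube can be checked with a short sketch. It assumes the usual numbering in which each node is a binary label and two nodes are connected when their labels differ in exactly one bit; the function names are illustrative. Because every node of a hypercube looks the same, measuring distances from node 0 is enough to find the diameter.

```python
from collections import deque


def hypercube_neighbors(node: int, dimensions: int):
    """Nodes whose binary label differs from this node's in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hypercube_diameter(dimensions: int) -> int:
    """Largest shortest-path distance from node 0, found by breadth-first search."""
    distance = {0: 0}
    frontier = deque([0])
    while frontier:
        node = frontier.popleft()
        for neighbor in hypercube_neighbors(node, dimensions):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                frontier.append(neighbor)
    return max(distance.values())


for d in range(1, 5):
    n = 2 ** d
    degree = len(hypercube_neighbors(0, d))
    print(f"n = {n:2d}: degree = {degree}, diameter = {hypercube_diameter(d)}")
```

For n = 8 processors (a 3-dimensional cube) both values come out as 3, which is log2 of 8.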


Tree Network

  • The processors are the bottom nodes of the tree. For a processor to retrieve data, it must go up in the network and then go back down.

  • This is useful for decision making applications that can be mapped as trees.

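  • The degree of a tree network is 1, since each processor at the bottom of the tree has a single link. The diameter of the network is 2 log2(n+1) - 2, where n is the number of nodes in the tree (a complete binary tree is assumed).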

  • The Thinking Machines CM-5 is an example of a parallel computer with this type of network.

  • Tree networks are very suitable for database applications because they allow multiple searches through the database at a time.
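
The diameter formula can be checked numerically. The sketch below takes n to be the total number of nodes of a complete binary tree, which is the reading under which the formula holds, and numbers the nodes heap-style from 1 to n; both choices are assumptions made for this illustration. It measures the diameter by two breadth-first searches and compares it with the formula.

```python
import math
from collections import deque


def tree_neighbors(node: int, n: int):
    """Parent and children of a node in a complete binary tree numbered 1..n."""
    candidates = [node // 2, 2 * node, 2 * node + 1]
    return [c for c in candidates if 1 <= c <= n]


def farthest(start: int, n: int):
    """Return (node, distance) for a node farthest from start."""
    distance = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for neighbor in tree_neighbors(node, n):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                frontier.append(neighbor)
    node = max(distance, key=distance.get)
    return node, distance[node]


def tree_diameter(n: int) -> int:
    """Longest path in the tree: walk to a deepest leaf, then to the node farthest from it."""
    leaf, _ = farthest(1, n)
    _, diameter = farthest(leaf, n)
    return diameter


def formula(n: int) -> int:
    return 2 * int(math.log2(n + 1)) - 2


for n in (3, 7, 15, 31):
    print(f"n = {n:2d}: measured = {tree_diameter(n)}, formula = {formula(n)}")
```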


Interconnected Networks

  • Torus Network: A mesh with wrap-around connections in both the x and y directions.

  • Multistage Network: A network with more than one networking unit.

  • Fully Connected Network: A network where every processor is connected to every other processor.

  • Hypercube Network: Processors are connected as if they were corners of a multidimensional cube.

  • Mesh Network: A network where each interior processor is connected to its four nearest neighbors.


Interconnected Networks

  • Bus Based Network: Coaxial cable based LAN topology in which the medium forms a single bus to which all stations are attached.

  • Cross-bar Switch Network: A network that works through a switching mechanism to access shared memory.

  • Tree Network: The processors are the bottom nodes of the tree.

  • Ring Network: Each processor is connected to two others and the line of connections forms a circle.


Summary of Parallel Computer Characteristics

  • How many processors does the computer have?

    • 10s?

    • 100s?

    • 1000s?

  • How powerful are the processors?

    • what's the MHz rate

    • what's the MIPS rate

  • What's the instruction set architecture?

    • RISC

    • CISC


Summary of Parallel Computer Characteristics

  • How much memory is available?

    • total memory

    • memory per processor

  • What kind of memory?

    • distributed memory

    • shared memory

    • distributed shared memory

  • What type of flow of control?

    • SIMD

    • MIMD

    • SPMD


Summary of Parallel Computer Characteristics

  • What is the interconnection network?

    • Bus

    • Crossbar

    • Hypercube

    • Tree

    • Torus

    • Multistage

    • Fully Connected

    • Mesh

    • Ring

    • Hybrid


Design decisions made by some of the major parallel computer vendors


Summary

  • This completes our introduction to parallel computing.

  • You have learned about parallelism in computer programs, and also about parallelism in the hardware components of parallel computers.

  • In addition, you have learned about the commonly used parallel computers, and how these computers compare to each other.

  • There are many good texts which provide an introductory treatment of parallel computing. Here are two useful references:

    Highly Parallel Computing, Second Edition. George S. Almasi and Allan Gottlieb. Benjamin/Cummings Publishers, 1994.

    Parallel Computing: Theory and Practice. Michael J. Quinn. McGraw-Hill, Inc., 1994.