Types of Parallel Computers

Two principal approaches:

  • Shared memory multiprocessor

  • Distributed memory multicomputer

ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson, 2012. Jan 14, 2013



Conventional Computer

Consists of a processor executing a program stored in a (main) memory:

Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address.

[Figure: main memory connected to a processor; instructions flow to the processor, and data flows to or from the processor.]
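The address range stated above can be checked with a few lines of Python. This is only a minimal sketch; the function name `address_range` is illustrative and does not come from the slides.

```python
def address_range(b):
    """Return the lowest and highest address reachable with b address bits."""
    if b <= 0:
        raise ValueError("number of address bits must be positive")
    # b bits encode 2**b distinct values, so addresses run from 0 to 2**b - 1.
    return 0, 2**b - 1


if __name__ == "__main__":
    for bits in (16, 32, 64):
        low, high = address_range(bits)
        print(f"{bits}-bit address: {low} .. {high} ({high + 1} locations)")
```

For example, 16 address bits give addresses 0 through 65535.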


Shared Memory Multiprocessor System

Natural way to extend single processor model - have multiple processors connected to multiple memory modules, such that each processor can access any memory module:

[Figure: processors connected through processor-memory interconnections to memory modules that together form one address space.]


Simplistic view of a small shared memory multiprocessor

Examples:

  • Dual Pentiums

  • Quad Pentiums

[Figure: processors connected to a shared memory by a bus.]


Real computer systems have cache memory between the main memory and the processors: Level 1 (L1) cache and Level 2 (L2) cache.

Example Quad Shared Memory Multiprocessor

[Figure: four processors, each with its own L1 cache, L2 cache, and bus interface, connected by a processor/memory bus to a memory controller and the shared memory.]


“Recent” innovation (since 2005)

  • Dual-core and multi-core processors

  • Two or more independent processors in one package

  • Actually an old idea, but not put into wide practice until recently, when the limits of making single processors faster were reached. These limits are principally caused by:

    • Power dissipation (power wall) and clock frequency limitations

    • Limits in parallelism within a single instruction stream

    • Memory speed limitations (memory wall)


Power dissipation

[Figure: plot of clock frequency]

“The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software” Herb Sutter, http://www.gotw.ca/publications/concurrency-ddj.htm


Single “quad core” shared memory multiprocessor

[Figure: one chip containing four processors, each with its own L1 cache, sharing an L2 cache and connected through a memory controller to the shared memory.]


Multiple quad-core multiprocessors (example coit-grid05.uncc.edu)

[Figure: eight processors, each with its own L1 cache, with L2 cache and a possible L3 cache, connected through a memory controller to the shared memory.]


Programming Shared Memory Multiprocessors

Several possible ways – we will concentrate upon using threads

Threads - individual parallel sequences of instructions, each thread having its own local variables but able to access shared variables declared outside the threads.

1. Low-level thread libraries - programmer calls thread routines to create and control the threads. Examples: Pthreads, Java threads.

2. Higher level library functions and preprocessor compiler directives.

Example: OpenMP - industry standard. Consists of library functions, compiler directives, and environment variables.
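The distinction between local and shared variables can be sketched in a few lines. The slides name Pthreads, Java threads, and OpenMP; the example below instead uses Python's standard `threading` module purely as a stand-in, and the names `worker`, `run`, `counter`, and `counter_lock` are illustrative. In standard CPython the global interpreter lock keeps these threads from executing bytecode simultaneously, so this shows the programming model rather than a speedup.

```python
import threading

NUM_THREADS = 4
INCREMENTS = 10_000

counter = 0                      # shared variable, declared outside the threads
counter_lock = threading.Lock()  # serializes updates to the shared variable


def worker(results, index):
    """Body of one thread: counts locally and also updates the shared counter."""
    global counter
    local_count = 0              # local variable, private to this thread
    for _ in range(INCREMENTS):
        local_count += 1
        with counter_lock:       # without the lock, updates could be lost
            counter += 1
    results[index] = local_count


def run():
    """Create the threads, wait for them, and return the shared and local totals."""
    global counter
    counter = 0
    results = [0] * NUM_THREADS
    threads = [
        threading.Thread(target=worker, args=(results, i))
        for i in range(NUM_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, results


if __name__ == "__main__":
    total, per_thread = run()
    print("shared counter:", total)
    print("per-thread local counts:", per_thread)
```

Each thread keeps its own `local_count`, while all threads update the single `counter`; the lock is one possible way to keep those shared updates consistent.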


Tasks - rather than programming with threads, which are closely linked to the physical hardware, one can program with parallel “tasks”. Promoted by Intel with its TBB (Threading Building Blocks) tools.
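To give a feel for the task style, here is a minimal sketch. It is not TBB, which is a C++ library; it uses Python's standard `concurrent.futures` executor as an analogous interface, and the function names and the chunking scheme are assumptions made for illustration. The program describes independent pieces of work and leaves the mapping onto threads to the library.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """One task: an independent piece of work with no reference to threads."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_tasks=4):
    """Split data into roughly num_tasks chunks and submit each as a task."""
    size = max(1, (len(data) + num_tasks - 1) // num_tasks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # The executor decides how many threads to use and which one runs each task.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

Note that the code never creates or joins a thread itself; it only states what the tasks are.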

Other alternatives include parallelizing compilers compiling regular sequential programs and making them parallel programs, and special parallel languages (both not now common).


GPU clusters

  • Recent trend for clusters – incorporating GPUs for high performance.

  • GPU often attached to the CPU through a PCI-e x16 interface, with separate GPU memory.

  • Now 1000s of cores in each GPU, offering orders of magnitude speed improvement for HPC tasks.

  • 10,000s of threads possible (data parallel programming model, see later)


Message-Passing Multicomputer

Complete computers connected through an interconnection network:

Many interconnection networks were explored in the 1970s and 1980s, including 2- and 3-dimensional meshes, hypercubes, and multistage interconnection networks.

[Figure: computers, each with a processor and local memory, exchanging messages through an interconnection network.]


Networked Computers as a Computing Platform

  • Became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the early 1990s.

  • Several early projects. Notable:

    • Berkeley NOW (network of workstations) project.

    • NASA Beowulf project.


Key advantages:

  • Very high performance workstations and PCs readily available at low cost.

  • The latest processors can easily be incorporated into the system as they become available.

  • Existing software can be used or modified.


Beowulf Clusters

  • A group of interconnected “commodity” computers achieving high performance with low cost.

  • Typically using commodity interconnects (high-speed Ethernet) and the Linux OS.

    The name Beowulf comes from the NASA Goddard Space Flight Center cluster project.


Cluster Interconnects

  • Originally Fast Ethernet on low-cost clusters

  • Gigabit Ethernet - easy upgrade path

    More specialized, higher-performance interconnects are available, including Myrinet and InfiniBand.


Dedicated cluster with a master node and compute nodes

[Figure: a user reaches the dedicated cluster over an external network through the master node's Ethernet interface; a switch on the local network connects the master node to the compute nodes.]


Software Tools for Clusters

  • Based upon message passing programming model

  • User-level libraries provided for explicitly specifying messages to be sent between executing processes on each computer.

  • Use with regular programming languages (C, C++, ...).

  • Can be quite difficult to program correctly as we shall see.
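The message-passing style described above can be sketched on a single machine. The example below is not MPI and does not run on a cluster: it is a simulation, assuming that threads stand in for the processes on separate computers and that `queue.Queue` objects stand in for the network. The names `compute_node` and `master` are hypothetical. The point is that the nodes share no variables; all data moves through explicit sends and receives.

```python
import queue
import threading


def compute_node(rank, inbox, outbox):
    """Stand-in for a process on one computer: it sees only its own messages."""
    numbers = inbox.get()               # receive work from the master
    outbox.put((rank, sum(numbers)))    # send the partial result back


def master(data, num_nodes=3):
    """Distribute data to the nodes by message and collect their partial sums."""
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    results = queue.Queue()
    nodes = [
        threading.Thread(target=compute_node, args=(rank, inboxes[rank], results))
        for rank in range(num_nodes)
    ]
    for node in nodes:
        node.start()
    for rank in range(num_nodes):
        inboxes[rank].put(data[rank::num_nodes])   # explicit send to one node
    # One explicit receive per node; replies may arrive in any order.
    partial = dict(results.get() for _ in range(num_nodes))
    for node in nodes:
        node.join()
    return sum(partial.values()), partial


if __name__ == "__main__":
    total, partial = master(list(range(1, 11)))
    print("partial sums by rank:", partial)
    print("total:", total)
```

Even in this small sketch, every receive must be matched by a send; a missing or misdirected message would leave a node waiting forever, which hints at why such programs can be difficult to get right.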


Next step

  • Learn the message passing programming model, some MPI routines, write a message-passing program and test on the cluster.

