CSC 364/664 Parallel Computation Fall 2003 Burg/Miller/Torgersen

Presentation Transcript


  1. CSC 364/664 Parallel ComputationFall 2003 Burg/Miller/Torgersen Chapter 1: Parallel Computers

  2. Concurrency vs. True Parallelism • Concurrency arises in systems where more than one user is using a resource, such as a CPU or database information, at the same time • In true parallelism, multiple processors are working simultaneously on one application problem

  3. Flynn’s Taxonomy – Classification by Control Mechanism • A classification of parallel systems from a “flow of control” perspective • SISD – single instruction, single data • SIMD – single instruction, multiple data • MISD – multiple instructions, single data • MIMD – multiple instructions, multiple data

  4. SISD • Single instruction, single data • Sequential programming with one processor, just like you’ve always done

  5. SIMD • Single instruction, multiple data • One control unit issuing the same instruction to multiple CPUs that operate simultaneously on their own portions of the data • Lock-step, synchronized • Vector and matrix computation lend themselves to an SIMD implementation • Examples of SIMD computers: Illiac IV, MPP, DAP, CM-2, and MasPar MP-2
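To make the lock-step idea concrete, here is a minimal sketch in C of a data-parallel vector addition, the kind of loop that maps naturally onto SIMD hardware (written as ordinary sequential C; the array size and contents are made up for illustration):

    #include <stdio.h>
    #define N 8

    int main(void)
    {
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

        /* Single instruction, multiple data: on a SIMD machine every
           processing element would perform one of these iterations
           simultaneously, in lock-step. */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        for (int i = 0; i < N; i++)
            printf("%g ", c[i]);
        printf("\n");
        return 0;
    }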

  6. MIMD • Multiple instructions, multiple data • Each processor “doing its own thing” • Processors synchronize either by passing messages or by writing values to shared memory addresses • Subcategories • SPMD – single program, multiple data (e.g., MPI on a Linux cluster) • MPMD – multiple programs, multiple data (e.g., PVM) • Examples of MIMD computers: BBN Butterfly, Intel iPSC/1 and iPSC/2, IBM SP1 and SP2
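As a concrete illustration of the SPMD style mentioned above, here is a minimal MPI program in C: every processor runs the same executable and branches on its rank (standard MPI calls; the printed messages are arbitrary):

    #include <stdio.h>
    #include <mpi.h>

    /* SPMD: one program, run by every processor; each branches on its
       rank to "do its own thing" (MIMD behavior from a single source). */
    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
            printf("Process 0 of %d: coordinating\n", size);
        else
            printf("Process %d of %d: working\n", rank, size);

        MPI_Finalize();
        return 0;
    }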

  7. MISD • Multiple instruction, single data • Doesn’t really exist, unless you consider pipelining an MISD configuration

  8. Comparison of SIMD and MIMD • It takes a specially-designed computer to do SIMD computing, since one control unit controls multiple processors. • SIMD requires only one copy of a program. • MIMD systems have a copy of the program and operating system at each processor. • SIMD computers quickly become obsolete. • MIMD systems can be pieced together from the most up-to-date components available.

  9. Classification by Communication Mechanism • Shared-address-space • “Multiprocessors”, each with its own control unit • Virtual memory makes all memory addresses look like they come from one consistent space, but they don’t necessarily • Processors communicate with reads and writes • Message passing systems • “Multicomputers” • Separate processors and separate memory addresses • Processors communicate with message passing

  10. Shared Memory Address Space • Interprocess communication is done in the memory interface through reads and writes. • A virtual memory address maps to a real address. • Different processors may have memory locally attached to them. • Access may be needed to a processor’s own memory or to the memory attached to a different processor, so different instances of memory access can take different amounts of time. Collisions are possible. • UMA (i.e., shared memory) vs. NUMA (i.e., distributed shared memory)
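A minimal sketch of shared-address-space communication, using POSIX threads in C as a stand-in for multiple processors (the counter, loop count, and lock are made up for illustration; the lock avoids the write collisions mentioned above):

    #include <stdio.h>
    #include <pthread.h>

    /* Two threads communicate purely by reading and writing the same
       memory location; the mutex serializes conflicting writes. */
    long counter = 0;
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            counter++;                  /* a write to shared memory */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* prints 200000 */
        return 0;
    }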

  11. Message Passing System • Interprocess communication is done at the program level using sends and receives. • Reads and writes refer only to a processor’s local memory. • Data can be packed into long messages before being sent, to compensate for latency. • Global scheduling of messages can help avoid message collisions.
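A minimal MPI sketch in C of these program-level sends and receives: rank 0 packs many values into one long message to amortize latency, and rank 1 receives it (standard MPI_Send/MPI_Recv; the array size is arbitrary):

    #include <stdio.h>
    #include <mpi.h>

    #define N 1000    /* pack N values into one message, not N messages */

    int main(int argc, char *argv[])
    {
        int rank, data[N];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            for (int i = 0; i < N; i++) data[i] = i;
            MPI_Send(data, N, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(data, N, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 got %d ints, last = %d\n", N, data[N - 1]);
        }
        MPI_Finalize();
        return 0;
    }

Run with at least two processes (e.g., mpirun -np 2); with fewer, the send has no receiver.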

  12. Basic Architecture Terms – Clock Speed and Bandwidth • Clock speed of a processor – the max # of times per sec. that the device can say something new • Bandwidth of a transmission medium (e.g., a telephone line, cable line, etc.) is defined as the maximum rate at which the medium can change a signal. Bandwidth is measured in cycles per second, or Hertz. Bandwidth is determined by the physical properties of the transmission medium, including the material of which it is composed.

  13. Basic Architecture Terms – Clock Speed and Bandwidth • Data rate is a measure of the amount of data that can be sent across a transmission medium per unit time. Data rate is determined by two things: (1) the bandwidth, and (2) the number of different things that can be conveyed each time the signal changes (which, in the case of a bus, is based on the number of parallel data lines).

  14. Basic Architecture Terms -- Bus • A bus is a communication medium to which all processors are connected. • Only one communication at a time is allowed on the bus. • Only one step from any source to any destination. • Bus data rate (sometimes loosely called “bandwidth”) is defined as clock speed times number of bits transmitted at each clock pulse • Bus is low-cost, but you can’t have very many processors attached to it.
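A quick worked computation of this definition in C, using the original PCI figures quoted in the PCI slide below (33 MHz clock, 32 parallel bits per pulse):

    #include <stdio.h>

    /* Bus data rate = clock speed x bits transmitted per clock pulse. */
    int main(void)
    {
        double clock_hz = 33e6;   /* 33 MHz clock           */
        int    width    = 32;     /* 32 parallel data lines */

        double bits_per_sec  = clock_hz * width;   /* 1.056e9 b/s */
        double bytes_per_sec = bits_per_sec / 8.0; /* ~132 MB/s   */

        printf("%.0f Mb/s = %.0f MB/s\n",
               bits_per_sec / 1e6, bytes_per_sec / 1e6);
        return 0;
    }

That is, 33 MHz x 32 bits is roughly 1 Gb/s, or about 132 MB/s, matching the figure usually quoted for the original PCI bus.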

  15. Bus on a Motherboard • The bus transports data among the CPU, memory, and other components. • It consists of electrical circuits called traces, plus adapters or expansion cards. • There’s a main motherboard bus, and then buses for the CPU, memory, SCSI connections, and USB.

  16. Types of Buses • Original IBM PC bus – 8-bit parallel, 4.77 MHz clock speed • IBM AT, 1982, introduced the ISA bus (Industry Standard Architecture), 16 bit parallel, with expansion slots, still compatible with 8-bit, 8 MHz clock speed • IBM PS/2, MCA (Microchannel Architecture) bus, 32 bit parallel, but not backwardly compatible; 10 MHz clock speed; didn’t catch on

  17. Types of Buses • Compaq and other IBM rivals introduced the EISA (Extended Industry Standard Architecture) bus in 1988, 32-bit parallel, 8.2 MHz clock speed; didn’t catch on • VL-Bus (VESA Local Bus), 32-bit parallel, close to the clock speed of the CPU, tied directly to the CPU • The trend moved to specialized buses with higher clock speeds, closer to the CPU’s clock speed, and separate from the system bus – e.g., PCI (Peripheral Component Interconnect)

  18. PCI bus • The PCI bus can exist side-by-side with the ISA bus and system bus; in this sense it’s a “local” bus • Originally 33 MHz, 32 bits • PCI-X is 133 MHz, 64-bit, for a 1 GB/sec data transfer rate • Supports Plug and Play • See http://computer.howstuffworks.com/pci.htm

  19. Ethernet Bus-Based Network • All nodes branch off a common line. • Each device has an Ethernet address, also known as a MAC address. • All computers receive all data transmissions (in packets). Each looks to see whether a packet is addressed to it, and reads it only if it is. • When a computer wants to transmit data, it waits until the line is free. • The CSMA/CD protocol is used (carrier-sense multiple access with collision detection).

  20. Basic Architecture Terms -- Ethernet • Ethernet is actually an OSI layer 2 communication protocol. It does not dictate the type of connectivity – it could be copper, fiber, or wireless. • Today’s Ethernet is full-duplex, i.e., it has separate lines for send and receive • IEEE Standard 802.3 • Ethernet comes in 10, 100, and 1000 Mb/sec (1 Gb/sec) speeds. • See http://computer.howstuffworks.com/ethernet.htm

  21. Basic Architecture Terms -- Hub • Hubs connect computers in a network. • They operate on a broadcast model: when n computers are connected to a hub, it simply passes all network traffic through to each of the n computers.

  22. Basic Architecture Terms -- Switch • Unlike hubs, switches can look at data packets as they are received, determine the source and destination device, and forward the packet appropriately. • By delivering messages only to the device that the packet was intended for, switches conserve network bandwidth. • See http://howstuffworks.com/lan-switch.htm

  23. Basic Architecture Terms -- Myrinet • A packet communication and switching technology, faster than Ethernet. • Myrinet offers a full-duplex 2+2 Gb/sec data rate and low latency. It is used in Linux clusters. • Only 16 of the nodes of WFU’s clusters are connected with Myrinet. The rest are connected with Ethernet, for cost reasons.

  24. Classification by Interconnection Network • Static network • Bus-based network can be static (if no switches are involved) • Direct links between computers • Examples include completely connected, line/ring, mesh, tree (regular and fat), and hypercube • Dynamic network • Uses switches • Connections change according to whether a switch is open or closed • Could be arranged in stages (multistage) (e.g., Omega network)

  25. Hypercube • A d-dimensional hypercube has 2^d nodes. • Each node has a d-bit address. • Neighboring nodes differ in one bit. • Needs a routing algorithm. We’ll try one in class.
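One standard choice is dimension-order (e-cube) routing: repeatedly flip the lowest address bit in which the current node differs from the destination, each flip being one hop to a neighbor. Whether this is the algorithm tried in class is an assumption; the C sketch below just illustrates the idea:

    #include <stdio.h>

    /* Dimension-order (e-cube) routing in a d-dimensional hypercube. */
    void route(int src, int dst)
    {
        int node = src;
        printf("%d", node);
        while (node != dst) {
            int diff = node ^ dst;     /* bits still to be corrected */
            int bit  = diff & -diff;   /* lowest differing bit       */
            node ^= bit;               /* hop across that dimension  */
            printf(" -> %d", node);
        }
        printf("\n");
    }

    int main(void)
    {
        route(0, 7);   /* in a 3-cube, prints 0 -> 1 -> 3 -> 7 */
        return 0;
    }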

  26. Multistage Networks • See notes on Omega network from class.
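For reference, here is a minimal C simulation of the standard destination-tag routing scheme on an 8-input Omega network (a textbook formulation, not necessarily identical to the class notes): between stages the line number is perfect-shuffled, i.e., its bits are rotated left, and each 2x2 switch sets the low bit of the line number to the next bit of the destination address, most significant bit first.

    #include <stdio.h>

    #define NBITS 3    /* Omega network with 2^3 = 8 inputs/outputs */

    /* perfect shuffle between stages: left-rotate an NBITS-bit value */
    static int shuffle(int x)
    {
        return ((x << 1) | (x >> (NBITS - 1))) & ((1 << NBITS) - 1);
    }

    int main(void)
    {
        int src = 2, dst = 5, pos = src;
        printf("line %d", pos);
        for (int k = NBITS - 1; k >= 0; k--) {
            pos = shuffle(pos);                   /* inter-stage wiring */
            pos = (pos & ~1) | ((dst >> k) & 1);  /* switch setting     */
            printf(" -> %d", pos);
        }
        printf("  (arrived at destination %d)\n", dst);
        return 0;
    }

After the NBITS stages the message is guaranteed to sit on line dst, because the rotations have shifted in exactly the bits of the destination address.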

  27. Properties of Network Communication • Diameter of a network – the # of links on a shortest path between the 2 farthest nodes • Bisection width of a network – the minimum # of links that must be cut to divide the network into 2 equal parts
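As a worked example of these definitions: a d-dimensional hypercube with p = 2^d nodes has diameter d = log2(p) and bisection width 2^(d-1) = p/2, while a ring of p nodes has diameter floor(p/2) but a bisection width of only 2.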

  28. Properties of Network Communication • Message latency – time taken to prepare the message to be sent (software overhead) • Network latency – time taken for a message to pass through a network • Communication latency – total time taken to send a message, including message and network latency • Deadlock – occurs when packets cannot be forwarded because they are waiting for each other in a circular way

  29. Memory Hierarchy • Global memory • Local memory • Cache • Faster, but more expensive • Cache coherence must be maintained

  30. Communication Methods • Circuit switching • Packet switching • Wormhole routing

  31. Properties of a Parallel Program • Granularity • Speedup • Overhead • Efficiency • Cost • Scalability • Gustafson’s law
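For reference, the standard textbook forms of these measures (stated here as commonly defined, since the slide only lists the terms): speedup S = t_s / t_p, the sequential time divided by the time on p processors; efficiency E = S / p; cost = p × t_p; and Gustafson’s law for scaled speedup, S(p) = p - s(p - 1), where s is the serial fraction of the run time on the parallel machine.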
