Models for Parallel Computers
  • Goals of a machine model
    • accurately reflect constraints of existing machines
    • broad applicability with respect to existing and future machines
    • allows accurate prediction of performance
  • Models being discussed
    • PRAM
    • BSP
    • LogP
Parallel Random Access Machine (PRAM)
  • PRAM
    • S. Fortune, J. Wyllie, Parallelism in Random Access Machines, 10th ACM Symposium on Theory of Computing, pp. 114-118, 1978
    • Abstraction of a synchronized MIMD computer with shared memory
  • Definition of an n-PRAM
    • It is a parallel register machine with n processors P0, ..., Pn-1 and a shared memory.
    • In each step, each processor can work as a separate register machine or can access a cell of the shared memory.
    • The processors are working synchronously, one step takes unit time.
PRAM (2): Example
  • Task
    • Given are n variables X0, ..., Xn-1 ∈ {0,1}.
    • Compute function ORn with

ORn(X0,...,Xn-1) = 1 iff ∃ i ∈ {0,...,n-1} with Xi = 1

  • Solution 1:
    • Use an n-PRAM.
    • Calculate on a tree network in which each inner node performs the operation OR2 on the data sent by its two children; the root gets the result (see the sketch below).
    • This solution has runtime O(log n).
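
A minimal sequential sketch of this tree reduction (the function name and simulation structure are mine, not from the slides); each pass of the while loop corresponds to one synchronous PRAM step in which all processors act in parallel:

def pram_or_tree(x):
    # Simulate the O(log n) tree reduction of OR_n on an n-PRAM.
    # In round t every active processor i ORs cell i with cell i + 2**t,
    # mimicking an inner tree node combining the values of its two children.
    cells = list(x)                       # shared memory, one cell per input
    n = len(cells)
    stride = 1
    rounds = 0
    while stride < n:                     # ceil(log2 n) synchronous steps
        for i in range(0, n - stride, 2 * stride):   # "in parallel"
            cells[i] = cells[i] | cells[i + stride]
        stride *= 2
        rounds += 1
    return cells[0], rounds               # root value and number of PRAM steps

print(pram_or_tree([0, 0, 1, 0, 0, 0, 0, 0]))   # -> (1, 3) for n = 8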
PRAM (3): Example
  • Solution 2:
    • Use an n-PRAM and let processor Pi access variable Xi (a sketch follows this slide).

for all processors Pi pardo

if Xi = 1 then Result := 1

od

    • This solution has constant runtime.
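
Solution 2 in the same sketch style (again a sequential simulation with made-up names). Note that several processors may write Result in the same step, so this assumes concurrent writes - in the terminology introduced on the next slide, a common CRCW PRAM:

def pram_or_crcw(x):
    # Simulate Solution 2: a single synchronous step on an n-PRAM.
    # Every processor P_i with X_i = 1 writes 1 into the shared cell Result.
    # All concurrent writers store the same value, so the common CRCW rule applies.
    result = 0                            # shared cell Result, initialised to 0
    for i, xi in enumerate(x):            # "for all processors Pi pardo"
        if xi == 1:
            result = 1                    # concurrent write, identical value
    return result                         # O(1) runtime regardless of n

print(pram_or_crcw([0, 0, 1, 0]))         # -> 1 after one PRAM step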
PRAM (4)
  • Different PRAM models:
    • Exclusive Read Exclusive Write (EREW): No memory cell may be read or written by several processors simultaneously.
    • Concurrent Read Exclusive Write (CREW): A cell may be read simultaneously by several processors, but only written exclusively.
    • Concurrent Read Concurrent Write (CRCW): Processors can read and write a cell simultaneously. Concurrent writes force the model to define which of the writing processors wins (see the small resolver sketched below):
      • Arbitrary: One processor wins, but it is not known in advance which one.
      • Common: All processors must write identical data.
      • Priority: The processor with the lowest (or highest) index wins.
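
The three concurrent-write rules can be made concrete with a small resolver (my own illustration; a write is given as a pair of processor index and value):

def resolve_concurrent_write(writes, mode="arbitrary"):
    # writes: list of (processor_index, value) pairs targeting the same cell
    if mode == "common":
        values = {v for _, v in writes}
        if len(values) > 1:
            raise ValueError("common CRCW requires identical values")
        return values.pop()
    if mode == "priority":
        return min(writes)[1]             # here the lowest index wins
    if mode == "arbitrary":
        return writes[0][1]               # any single writer may win
    raise ValueError("unknown mode: " + mode)

print(resolve_concurrent_write([(3, 7), (1, 7)], mode="common"))    # -> 7
print(resolve_concurrent_write([(3, 5), (1, 9)], mode="priority"))  # -> 9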
PRAM (5)
  • A CRCW PRAM can be simulated on an EREW PRAM.
    • The delay per step is O(log n); e.g., a machine with 256 processors is simulated with slowdown 8c, where c is a constant factor - the simulation overhead - and 8 = log2 256.
PRAM (6)
  • The PRAM model favors maximal parallelism.
  • It does not take aspects of real machines into account. Real machines do not provide:
    • unit-time communication cost and contention-free memory access
    • synchronous execution; explicit synchronization is therefore required.
  • Hardware implementation of a PRAM at Fachbereich 14, Universität des Saarlandes (Prof. Wolfgang Paul)
    • Fork language (Christoph Kessler, Universität Trier)
Bulk-Synchronous Parallel Model (BSP)
  • Definition of a BSP machine
    • It consists of processors with local memory connected by a communication network.
    • It performs a sequence of supersteps consisting of three phases:
      • Local computation phase:
        • asynchronous operations
        • extract messages from an input buffer
        • insert messages into an output buffer
      • Communication phase:
        • messages are transferred from output buffer to destination input buffer
      • Barrier synchronization phase: concludes the superstep and makes the transferred data available in the local memories.
Superstep


  • Execution time of a superstep

w + g·h + l

  • w is the maximum number of local operations performed by any processor
  • h is the maximum number of messages sent or received by any processor (h-relation)
  • g reflects the bandwidth measured in time units per message
  • l is an upper bound on the global synchronization time
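
The cost formula can be evaluated directly; a small helper with hypothetical names, which also covers the overlapping variant max(w, h·g) + l discussed on the next slide:

def superstep_cost(w, h, g, l, overlap=False):
    # w: max local operations on any processor
    # h: max messages sent or received by any processor (h-relation)
    # g: time units per message, l: global synchronization time
    return (max(w, h * g) if overlap else w + h * g) + l

# Made-up example values: w = 1000 operations, h = 50 messages, g = 4, l = 100.
print(superstep_cost(1000, 50, 4, 100))                 # -> 1300
print(superstep_cost(1000, 50, 4, 100, overlap=True))   # -> 1100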


Other Cost Functions of a Superstep
  • Overlapping communication and computation

max(w, h·g) + l

  • Taking into account the sequence of sending and receiving messages

(hin + hout) · g

  • The cost model does not take into account:
    • splitting communication cost into startup and transfer cost
    • locality: mapping the processes in the right way onto the machine will reduce contention and communication distance.
  • L.G. Valiant: A Bridging Model for Parallel Computation, Communications of the ACM, Vol. 33, No. 8, pp. 103-111, 1990
  • users.comlab.ox.ac.uk/bill.mccoll/oparl.html
Broadcast
  • Summary: Broadcasting
    • When N is small (i.e., N < l/(p·g - 2g)), use the one-stage broadcast (see the cost sketch below).
    • The two-stage broadcast is always better than the tree broadcast when p > 4.
    • Most communication libraries still implement broadcast by the tree-based technique.
  • Good BSP design = Balanced design
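
A hedged cost sketch behind these rules (the formulas below are the usual ones for broadcasting N words among p processors and are my assumption, since the earlier broadcast slides are not part of this transcript): the one-stage broadcast sends N words to each of the p-1 other processors in a single superstep, while the two-stage broadcast first scatters N/p words to every processor and then lets every processor broadcast its own piece. Comparing the two costs, with p-1 ~ p and the per-superstep h-relation ~ N, recovers the threshold N < l/(p·g - 2g) quoted above.

def one_stage_broadcast(N, p, g, l):
    # One superstep: the source sends all N words to each of the p-1 others.
    return N * (p - 1) * g + l

def two_stage_broadcast(N, p, g, l):
    # Superstep 1: scatter N/p words to every processor.
    # Superstep 2: every processor broadcasts its own piece (an all-gather).
    h = N * (p - 1) / p                   # h-relation of each superstep, roughly N
    return 2 * (h * g + l)

# Made-up example values: p = 16, g = 4, l = 800, so l/(p*g - 2*g) is about 14.3.
for N in (5, 50, 500):
    print(N, one_stage_broadcast(N, 16, 4, 800), two_stage_broadcast(N, 16, 4, 800))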
BSP
  • The BSP model describes a special but realistic computational structure.
  • It makes it possible to study the influence of synchronization and communication.
  • It allows for asynchronous computation.
LogP
  • The LogP model takes into account the latency, overhead, and bandwidth of communication.
    • David Culler, A Practical Model of Parallel Computation, Communications of the ACM, Vol. 39, No.11, November 1996
LogP
  • A parallel computer is described in the LogP model by the following parameters:
    • L: An upper bound on the latency, or delay, incurred in communicating a message containing a word from its source node to its target node.
    • o: The overhead, defined as the time that a processor is engaged in the transmission or reception of each message. During this time the processor cannot perform other operations.
    • g: The gap, defined as the minimum time interval between consecutive message transmissions or receptions. The reciprocal of g corresponds to the available per-processor bandwidth.
    • P: The number of processor/memory modules.
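
With these four parameters, the time for delivering a single small message and the bound on outstanding messages (the finite-capacity constraint discussed on the next slide) can be written down directly; a sketch with hypothetical helper names:

import math

def point_to_point_time(L, o):
    # Send overhead + network latency + receive overhead.
    return o + L + o

def max_outstanding(L, g):
    # Finite network capacity: at most ceil(L/g) messages may be in
    # transit from a processor at any time; further sends stall.
    return math.ceil(L / g)

print(point_to_point_time(L=6, o=2))   # -> 10 with the parameters of the broadcast example below
print(max_outstanding(L=6, g=4))       # -> 2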
LogP (2)

[Figure: P processor/memory (P/M) modules connected by an interconnection network; one message path is annotated with the send overhead o, the gap g, the latency L, and the receive overhead o.]

  • The latency experienced by any message is unpredictable, but it is bounded above by L.
  • The network is treated as a pipeline of length L with initiation rate g and a processor overhead of o on each end.
  • The capacity of the network is finite: no more than ⌈L/g⌉ messages can be in transit from a processor at any time.


LogP Implementation of Broadcast

Example: P = 8, L = 6, g = 4, o = 2; total execution time: 24.
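
The stated total of 24 time units can be reproduced with a small greedy simulation (my own sketch, not the slide's figure): every processor that already knows the datum keeps forwarding it, successive sends from one processor are spaced by g (here g >= o), and a message occupies the sender for o, travels for L, and occupies the receiver for o before that receiver can itself start sending.

import heapq

def logp_broadcast_time(P, L, g, o):
    # Greedy single-item broadcast in the LogP model.
    ready = [0]                        # times at which informed processors can send next;
                                       # the source knows the datum at time 0
    informed = 1
    finish = 0
    while informed < P:
        t = heapq.heappop(ready)       # earliest possible sender
        arrival = t + o + L + o        # receiver has the datum at this time
        finish = max(finish, arrival)
        informed += 1
        heapq.heappush(ready, t + g)   # the sender may send again after the gap
        heapq.heappush(ready, arrival) # the receiver joins the set of senders
    return finish

print(logp_broadcast_time(P=8, L=6, g=4, o=2))   # -> 24, matching the slide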

LogP (3)
  • The LogP model takes into account latency and overhead for message transmission.
    • The technique of multithreading is often suggested as a way of masking latency. In practice this technique is limited by the available communication bandwidth. This is taken into account by the finite network capacity.
  • It does encourage
    • techniques that optimize data placement to reduce the amount and frequency of communication.
    • latency hiding techniques.
  • It does not model
    • local operations
    • explicit synchronization operations
    • details of the interconnection topology
Summary
  • Models highlight different aspects of the machine.
    • PRAM: parallelism
    • BSP: coarse communication model, special programming structure
    • LogP: detailed communication model