Computer Science 320

Reduction

Estimating π

Throw N darts at the unit square, and let C be the number of darts that land within the quadrant of the unit circle

Then C / N should be about the same as the ratio quadrant area / square area

Circle’s area = π * R², so with R = 1 the circle quadrant’s area is π / 4 and the square’s area is 1

Then C / N ≈ π / 4, and

π ≈ 4 * C / N
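As a rough guide to how many darts are needed (a standard binomial argument, not something stated on the slides): each dart lands in the quadrant independently with probability p = π / 4, so C is binomially distributed and the estimate has the mean and standard error below.

```latex
\[
\hat{\pi} = \frac{4C}{N}, \qquad
\mathbb{E}[\hat{\pi}] = 4p = \pi, \qquad
\sigma_{\hat{\pi}} = 4\sqrt{\frac{p(1-p)}{N}}
  = \sqrt{\frac{\pi(4-\pi)}{N}} \approx \frac{1.64}{\sqrt{N}},
\qquad p = \frac{\pi}{4}.
\]
```

The error shrinks like 1/√N, so each additional correct decimal digit costs roughly 100 times as many darts, which is why the loop is worth parallelizing.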

Sequential Program PiSeq

    // Generate N random points in the unit square, count how many are in
    // the unit circle.
    count = 0;
    for (long i = 0; i < N; ++i) {
        double x = prng.nextDouble();
        double y = prng.nextDouble();
        if (x * x + y * y <= 1.0) ++count;
    }

    // Stop timing.
    time += System.currentTimeMillis();

    // Print results.
    System.out.println("pi = 4 * " + count + " / " + N + " = " +
        (4.0 * count / N));
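For readers who want to run the sequential loop without the course library, here is a minimal self-contained sketch. It assumes `java.util.Random` as a stand-in for the slides' `prng` object; the class name, method names, and seed are hypothetical.

```java
import java.util.Random;

// Sketch of the sequential estimate. java.util.Random is an assumption
// standing in for the slides' prng object.
class PiSeqSketch {
    // Count how many of n random points in the unit square fall
    // inside the quadrant of the unit circle.
    static long countInside(long seed, long n) {
        Random prng = new Random(seed);
        long count = 0;
        for (long i = 0; i < n; ++i) {
            double x = prng.nextDouble();
            double y = prng.nextDouble();
            if (x * x + y * y <= 1.0) ++count;
        }
        return count;
    }

    static double estimatePi(long seed, long n) {
        return 4.0 * countInside(seed, n) / n;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        long time = -System.currentTimeMillis();   // start timing
        double pi = estimatePi(142857L, n);
        time += System.currentTimeMillis();        // stop timing
        System.out.println("pi = " + pi + " (" + time + " msec)");
    }
}
```

Starting `time` at the negated clock value and adding the clock again at the end leaves the elapsed milliseconds in `time`, which matches the `time += ...` idiom in the slide code.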

Parallel Program PiSmp3

    new ParallelTeam().execute(new ParallelRegion() {
        public void run() throws Exception {
            execute(0, N - 1, new LongForLoop() {
                // Set up per-thread PRNG and counter.
                . . .
                // Extra padding to avert cache interference.
                . . .
                // Parallel loop body.
                public void run(long first, long last) {
                    // Skip PRNG ahead to index <first>.
                    . . .
                    // Generate random points.
                    for (long i = first; i <= last; ++i) {
                        // x and y are drawn from the per-thread PRNG, as in PiSeq.
                        . . .
                        if (x * x + y * y <= 1.0) ++count_thread;
                    }
                }
                . . .

Reduction Step, SMP-Style

    static SharedLong count;
    . . .
    . . .
    public void finish() {
        // Reduce per-thread counts into shared count.
    }
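The same pattern (a private counter per thread, with one atomic update into the shared counter at the end) can be sketched with plain `java.util.concurrent` instead of the Parallel Java classes. This is an illustration under stated assumptions, not the PiSmp3 source: `AtomicLong` stands in for `SharedLong`, and each thread gets its own `java.util.Random` seeded with `seed + rank`, a crude substitute for skipping ahead in a single sequence. All names are hypothetical.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

class PiSmpSketch {
    // Count darts inside the quadrant using numThreads worker threads.
    static long countInside(long seed, long n, int numThreads) {
        AtomicLong count = new AtomicLong();          // shared counter
        Thread[] team = new Thread[numThreads];
        for (int t = 0; t < numThreads; ++t) {
            final int rank = t;
            // This thread handles iterations first..last (inclusive).
            final long first = rank * n / numThreads;
            final long last = (rank + 1) * n / numThreads - 1;
            team[t] = new Thread(() -> {
                // Per-thread PRNG and counter: nothing is shared in the loop.
                // Assumption: a distinct seed per thread replaces skip-ahead.
                Random prngThread = new Random(seed + rank);
                long countThread = 0;
                for (long i = first; i <= last; ++i) {
                    double x = prngThread.nextDouble();
                    double y = prngThread.nextDouble();
                    if (x * x + y * y <= 1.0) ++countThread;
                }
                // Reduction step: one atomic add per thread.
                count.addAndGet(countThread);
            });
            team[t].start();
        }
        for (Thread th : team) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count.get();
    }
}
```

Because the shared counter is touched once per thread rather than once per dart, contention on it is negligible no matter how large N is.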

Monte Carlo Design for a Cluster
• Could keep global counter in process 0, but that would involve too many messages
• Use reduction instead, so message passing is minimal
• Each process has its own PRNG, with its own split sequence
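One way to picture "its own split sequence" is sequence splitting: every process seeds the same generator, then skips ahead to the part of the sequence that belongs to its darts. The sketch below simulates K processes inside one JVM. It assumes a small SplitMix64-style generator, chosen here only because its skip-ahead is a single multiply-add; it is not the PRNG class used in the course library, and all names are hypothetical.

```java
class SplitSequenceSketch {
    // Minimal SplitMix64-style generator. The state advances by a fixed
    // increment per draw, so skipping ahead costs one multiply-add.
    static final class SkipPrng {
        private static final long GAMMA = 0x9E3779B97F4A7C15L;
        private long state;

        SkipPrng(long seed) { state = seed; }

        // Advance as if `draws` values had been generated and discarded.
        void skip(long draws) { state += draws * GAMMA; }

        long nextLong() {
            state += GAMMA;
            long z = state;
            z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
            z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
            return z ^ (z >>> 31);
        }

        // Uniform double in [0, 1) built from the top 53 bits.
        double nextDouble() { return (nextLong() >>> 11) * 0x1.0p-53; }
    }

    // Count hits for dart indices first..last (inclusive).
    // Each dart consumes two draws, so dart i starts at draw 2 * i.
    static long countRange(long seed, long first, long last) {
        SkipPrng prng = new SkipPrng(seed);
        prng.skip(2 * first);
        long count = 0;
        for (long i = first; i <= last; ++i) {
            double x = prng.nextDouble();
            double y = prng.nextDouble();
            if (x * x + y * y <= 1.0) ++count;
        }
        return count;
    }

    // Simulate K processes: rank r works on its own slice of the sequence,
    // and the per-process counts are summed at the end.
    static long countSplit(long seed, long n, int k) {
        long total = 0;
        for (int rank = 0; rank < k; ++rank) {
            long first = rank * n / k;
            long last = (rank + 1) * n / k - 1;
            total += countRange(seed, first, last);
        }
        return total;
    }
}
```

With this scheme the split run throws exactly the same darts as a sequential run with the same seed, so the count does not depend on K.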
Reduction vs Gather
• Could allocate an array of K cells for results, where the ith process’s result goes in the ith cell; then gather these into process 0 and let process 0 compute the final result from them
• Instead, the reduce method employs all processes in computing the reduction
Reduction in Cluster
• Concentrate data into fewer and fewer processes
• When K = 8,
    • processes 4-7 send their data to processes 0-3
    • processes 2-3 send their results to processes 0-1
    • process 1 sends its results to process 0
• Only log2(K) rounds of messages! (K - 1 messages in total, but those within a round are sent in parallel)
Reduction Tree for K = 8

Messages are sent in parallel at each level, starting at the bottom

Once a level’s results have been computed, messages are sent from the next level up

[Figure: reduction tree for K = 8, with panels showing the initial state and the state after the first, second, and third sets of messages]
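The message pattern described above can be simulated in a single JVM to check the round and message counts. This is a sketch that assumes K is a power of two and models each "send" as combining one array cell into another; it is not how the library implements `reduce`, and all names are hypothetical.

```java
import java.util.function.LongBinaryOperator;

class TreeReduceSketch {
    // Outcome of one simulated reduction.
    static final class Result {
        final long value;     // reduced value held by process 0
        final int rounds;     // sets of messages sent in parallel
        final int messages;   // total messages sent

        Result(long value, int rounds, int messages) {
            this.value = value;
            this.rounds = rounds;
            this.messages = messages;
        }
    }

    // Reduce values[0..K-1] into process 0. Assumes K is a power of two.
    static Result reduce(long[] values, LongBinaryOperator op) {
        long[] data = values.clone();   // data[r] = value held by process r
        int rounds = 0;
        int messages = 0;
        for (int half = data.length / 2; half >= 1; half /= 2) {
            // Processes half..2*half-1 each send to the process `half` ranks
            // below; in a real cluster these sends happen in parallel.
            for (int src = half; src < 2 * half; ++src) {
                data[src - half] = op.applyAsLong(data[src - half], data[src]);
                ++messages;
            }
            ++rounds;
        }
        return new Result(data[0], rounds, messages);
    }
}
```

For K = 8 this takes 3 rounds and 7 messages. Gathering everything into process 0 also takes 7 messages, but process 0 would have to receive and combine all of them itself, one after another.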

It’s Automatic: reduce

    // Compute the count in each process
    ...

    // Perform the reduction step
    LongItemBuf buf = new LongItemBuf();
    buf.item = count;
    world.reduce(0, buf, LongOp.SUM);
    count = buf.item;
    ...

    if (rank == 0)
        // Output the count and the estimate of PI

Reduction in Mandelbrot Histogram

    int[] histogram = new int[maxiter + 1];
    ...
    world.reduce(0, IntegerBuf.buffer(histogram), IntegerOp.SUM);
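When the buffer wraps an array, a SUM reduction combines corresponding elements, so the root ends up with one histogram whose bin i is the total of every process's bin i. The sketch below simulates that in one JVM with a plain loop. The iteration counts are made-up inputs for illustration, the order in which the library actually combines buffers is not modeled, and all names are hypothetical.

```java
class HistogramReduceSketch {
    // Build one process's histogram: histogram[i] is the number of its
    // pixels whose iteration count was i, for i in 0..maxiter.
    static int[] localHistogram(int[] iterCounts, int maxiter) {
        int[] histogram = new int[maxiter + 1];
        for (int c : iterCounts) ++histogram[c];
        return histogram;
    }

    // Element-wise SUM across all processes' histograms.
    // Assumes every histogram has the same length.
    static int[] reduceSum(int[][] perProcess) {
        int[] total = new int[perProcess[0].length];
        for (int[] h : perProcess) {
            for (int i = 0; i < total.length; ++i) total[i] += h[i];
        }
        return total;
    }
}
```

The array needs maxiter + 1 cells because iteration counts run from 0 through maxiter inclusive.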
