

CS668 Lecture 9 Objectives

  • Material from Chapter 9

  • Show how to implement manager-worker programs

  • Parallel Algorithms for Document Classification

  • Parallel Algorithms for Clustering



Outline

  • Creating communicators

  • Non-blocking communications

  • Implementation

  • Pipelining Tasks

  • Clustering Vectors



Implementation of a Very Simple Document Classifier

  • Manager/Worker design strategy

  • Manager description

    • Create initial tasks

    • Communicate to/from workers

  • Worker description

    • Receive tasks

    • Alternate between task computation and communication with the manager



Structure of Main Program: Manager/Worker Paradigm

int main (int argc, char *argv[])
{
   int myid, p;

   MPI_Init (&argc, &argv);

   /* What is my rank? */
   MPI_Comm_rank (MPI_COMM_WORLD, &myid);

   /* How many processes are there? */
   MPI_Comm_size (MPI_COMM_WORLD, &p);

   if (myid == 0)
      Manager (p);
   else
      Worker (myid, p);

   MPI_Barrier (MPI_COMM_WORLD);
   MPI_Finalize ();
   return 0;
}



More MPI Functions

  • MPI_Abort

  • MPI_Comm_split

  • MPI_Isend, MPI_Irecv, MPI_Wait

  • MPI_Probe

  • MPI_Get_count

  • MPI_Testsome



MPI_Abort

  • A “quick and dirty” way for one process to terminate all processes in a specified communicator

  • Example use: If manager cannot allocate memory needed to store document profile vectors

int MPI_Abort (
   MPI_Comm comm,       /* Communicator                          */
   int      error_code  /* Value returned to calling environment */
)
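A minimal usage sketch (hedged: the variable names and error code are illustrative, not from the slides); the manager aborts every process if it cannot allocate the profile-vector matrix.

/* Illustrative: terminate the whole job if the manager runs out of memory */
double *vectors = (double *) malloc (num_docs * dict_size * sizeof (double));
if (vectors == NULL) {
   fprintf (stderr, "Manager: cannot allocate profile-vector matrix\n");
   MPI_Abort (MPI_COMM_WORLD, -1);   /* all processes in MPI_COMM_WORLD exit */
}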



Creating a Workers-only Communicator

  • To support workers-only broadcast, need workers-only communicator

  • Can use MPI_Comm_split

  • The excluded process (here, the manager) passes MPI_UNDEFINED as the value of split_key, meaning it will not belong to any new communicator



Workers-only Communicator

int id;
MPI_Comm worker_comm;
...
if (!id)   /* Manager */
   MPI_Comm_split (MPI_COMM_WORLD, MPI_UNDEFINED, id, &worker_comm);
else       /* Worker */
   MPI_Comm_split (MPI_COMM_WORLD, 0, id, &worker_comm);
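Once worker_comm exists, worker-only collectives become possible. A short sketch (hedged: dictionary and dict_size are illustrative names; note that ranks are renumbered inside the new communicator, so world rank 1 becomes rank 0 of worker_comm):

/* Broadcast the dictionary among workers only; the manager is not involved */
int worker_id;
MPI_Comm_rank (worker_comm, &worker_id);
MPI_Bcast (dictionary, dict_size, MPI_CHAR, 0, worker_comm);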



Nonblocking Send / Receive

  • Persistent communications reduce overhead when the same point-to-point message-passing routines are called repeatedly

  • MPI_Isend, MPI_Irecv initiate operation

  • MPI_Wait blocks until operation complete

  • Calls can be made early

    • MPI_Isend as soon as value(s) assigned

    • MPI_Irecv as soon as buffer available

  • Can eliminate a message copying step

  • Allows communication / computation overlap (see the sketch below)
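A minimal overlap sketch (hedged: the buffers, counts, and do_independent_work are illustrative): start the transfers early, compute while they are in flight, then wait.

MPI_Request send_req, recv_req;
MPI_Status  status;

MPI_Isend (outbuf, out_cnt, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &send_req);
MPI_Irecv (inbuf,  in_cnt,  MPI_DOUBLE, src,  tag, MPI_COMM_WORLD, &recv_req);

do_independent_work ();            /* computation overlapped with communication */

MPI_Wait (&send_req, &status);     /* safe to reuse outbuf after this returns   */
MPI_Wait (&recv_req, &status);     /* inbuf now holds the received message      */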


Function MPI_Isend

int MPI_Isend (
   void         *buffer,
   int           cnt,
   MPI_Datatype  dtype,
   int           dest,
   int           tag,
   MPI_Comm      comm,
   MPI_Request  *handle   /* Pointer to object that identifies the communication operation */
)


Function MPI_Irecv

int MPI_Irecv (
   void         *buffer,
   int           cnt,
   MPI_Datatype  dtype,
   int           src,
   int           tag,
   MPI_Comm      comm,
   MPI_Request  *handle   /* Pointer to object that identifies the communication operation */
)



Function MPI_Wait

int MPI_Wait (
   MPI_Request *handle,
   MPI_Status  *status
)

Blocks until the operation associated with handle completes.

status points to an object containing information about the received message.



Receiving Problem

  • Worker does not know length of message it will receive

  • Example: the length of a file path name

  • Alternatives

    • Allocate huge buffer

    • Check length of incoming message, then allocate buffer

  • We’ll take the second alternative



Function MPI_Probe

int MPI_Probe (
   int         src,
   int         tag,
   MPI_Comm    comm,
   MPI_Status *status
)

Works with any kind of send. Blocks until a message with tag tag from the process with rank src is available to be received; the status object provides information about the message, including its size.



Function MPI_Get_count

int MPI_Get_count (
   MPI_Status   *status,
   MPI_Datatype  dtype,
   int          *cnt
)

cnt returns the number of elements of type dtype in the message.
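Putting MPI_Probe and MPI_Get_count together gives the "check length, then allocate" pattern a worker needs for file path names of unknown length. A hedged sketch (MANAGER and NAME_TAG are assumed constants, not from the slides):

MPI_Status status;
int name_len;
char *name;

MPI_Probe (MANAGER, NAME_TAG, MPI_COMM_WORLD, &status);   /* block until a name is pending  */
MPI_Get_count (&status, MPI_CHAR, &name_len);             /* how many characters were sent? */
name = (char *) malloc (name_len);
MPI_Recv (name, name_len, MPI_CHAR, MANAGER, NAME_TAG, MPI_COMM_WORLD, &status);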



MPI_Testsome

  • Non-blocking test

  • Often need to check whether one or more messages have arrived

  • Manager posts a nonblocking receive to each worker process

  • Builds an array of handles or request objects

  • MPI_Testsome allows the manager to determine how many of those messages have arrived



Function MPI_Testsome

int MPI_Testsome (
   int          in_cnt,        /* IN  - Number of nonblocking receives to check */
   MPI_Request *handlearray,   /* IN  - Handles of pending receives             */
   int         *out_cnt,       /* OUT - Number of completed communications      */
   int         *index_array,   /* OUT - Indices of completed communications     */
   MPI_Status  *status_array   /* OUT - Status records for completed comms      */
)
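A hedged sketch of how the manager might use it (MAX_WORKERS, VECTOR_TAG, result, dict_size, and store_vector are illustrative assumptions): one nonblocking receive is posted per worker, and MPI_Testsome reports which of them have completed.

MPI_Request handle[MAX_WORKERS];
MPI_Status  status[MAX_WORKERS];
int         index[MAX_WORKERS];
int         completed, i, w;

/* Post one nonblocking receive per worker (workers have ranks 1..num_workers) */
for (w = 0; w < num_workers; w++)
   MPI_Irecv (result[w], dict_size, MPI_DOUBLE, w + 1, VECTOR_TAG,
              MPI_COMM_WORLD, &handle[w]);

/* Later: see which document vectors have arrived so far */
MPI_Testsome (num_workers, handle, &completed, index, status);
for (i = 0; i < completed; i++)
   store_vector (index[i], result[index[i]]);   /* hypothetical helper */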



Document Classification Problem

  • Search directories and subdirectories for documents (look for .html, .txt, .tex, etc.)

  • Using a dictionary of key words, create a profile vector for each document

  • Store profile vectors



Data Dependence Graph (1)



Partitioning and Communication

  • Most time spent reading documents and generating profile vectors

  • Create two primitive tasks for each document



Data Dependence Graph (2)



Agglomeration and Mapping

  • Number of tasks not known at compile time

  • Tasks do not communicate with each other

  • Time needed to perform tasks varies widely

  • Strategy: map tasks to processes at run time



Manager/worker-style Algorithm

  • Task/Functional Partitioning

  • Domain/Data Partitioning



Roles of Manager and Workers



Manager Pseudocode

Identify documents
Receive dictionary size from worker 0
Allocate matrix to store document vectors
repeat
   Receive message from worker
   if message contains document vector then
      Store document vector
   endif
   if documents remain then
      Send worker file name
   else
      Send worker termination message
   endif
until all workers terminated
Write document vectors to file
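A condensed C rendering of the manager's inner loop, as a hedged sketch (the tags VECTOR_TAG, NAME_TAG, TERMINATE_TAG, the buffers, and the store_vector helper are illustrative assumptions, not the book's code):

int terminated = 0, next_doc = 0, src;
MPI_Status status;

while (terminated < num_workers) {
   /* Any incoming message is either a first request for work or a document vector */
   MPI_Recv (vector, dict_size, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
   src = status.MPI_SOURCE;
   if (status.MPI_TAG == VECTOR_TAG)
      store_vector (src, vector);                 /* hypothetical helper */
   if (next_doc < num_docs) {
      MPI_Send (name[next_doc], strlen (name[next_doc]) + 1, MPI_CHAR,
                src, NAME_TAG, MPI_COMM_WORLD);
      next_doc++;
   } else {
      MPI_Send (NULL, 0, MPI_CHAR, src, TERMINATE_TAG, MPI_COMM_WORLD);
      terminated++;
   }
}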



Worker Pseudocode

Send first request for work to manager
if worker 0 then
   Read dictionary from file
endif
Broadcast dictionary among workers
Build hash table from dictionary
if worker 0 then
   Send dictionary size to manager
endif
repeat
   Receive file name from manager
   if file name is NULL then terminate endif
   Read document, generate document vector
   Send document vector to manager
forever
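The matching worker loop in C, again as a hedged sketch (it reuses the probe/allocate/receive pattern shown earlier; MANAGER, the tags, and build_vector are illustrative assumptions):

/* Ask for the first piece of work, then alternate computation and communication */
MPI_Send (NULL, 0, MPI_CHAR, MANAGER, REQUEST_TAG, MPI_COMM_WORLD);
for (;;) {
   MPI_Probe (MANAGER, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
   if (status.MPI_TAG == TERMINATE_TAG) {
      MPI_Recv (NULL, 0, MPI_CHAR, MANAGER, TERMINATE_TAG, MPI_COMM_WORLD, &status);
      break;                                       /* empty message means no more work */
   }
   MPI_Get_count (&status, MPI_CHAR, &name_len);
   name = (char *) malloc (name_len);
   MPI_Recv (name, name_len, MPI_CHAR, MANAGER, NAME_TAG, MPI_COMM_WORLD, &status);
   build_vector (name, vector);                    /* hypothetical helper */
   free (name);
   MPI_Send (vector, dict_size, MPI_DOUBLE, MANAGER, VECTOR_TAG, MPI_COMM_WORLD);
}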



Task/Channel Graph



Enhancements

  • Finding a middle ground between pre-allocating file paths and allocating them one at a time

  • Pipelining of document processing


Allocation Alternatives

[Figure: execution time versus documents allocated per request. Allocating 1 document per request incurs excessive communication overhead; allocating n/p documents per request causes load imbalance; the best choice lies in between.]



Pipelining



Time Savings through Pipelining



Pipelined Manager Pseudocode

a 0 {assigned jobs}

j  0 {available jobs}

w  0 {workers waiting for assignment}

repeat

if (j > 0) and (w > 0) then

assign job to worker

j  j– 1; w  w– 1; a  a + 1

elseif (j > 0) then

handle an incoming message from workers

increment w

else

get another job

increment j

endif

until (a = n) and (w = p)



Summary

  • Manager/worker paradigm

    • Dynamic number of tasks

    • Variable task lengths

    • No communications between tasks

  • New tools for “kit”

    • Create manager/worker program

    • Create workers-only communicator

    • Non-blocking send/receive

    • Testing for completed communications

  • Next Step: Cluster Profile Vectors



K-Means Clustering

  • Assumes documents are real-valued vectors.

  • Assumes a distance function on pairs of vectors

  • Clusters are based on centroids (i.e., the center of gravity or mean) of the points in a cluster c: μ(c) = (1/|c|) Σ_{x ∈ c} x

  • Reassignment of instances to clusters is based on the distance from each vector to the current cluster centroids, as formalized below

    • (Or one can equivalently phrase it in terms of similarities)
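In standard k-means notation (my formulation, not taken from the slides), the two steps of each iteration are:

\[
  \text{assign } x_i \text{ to cluster } \arg\min_{j}\, d\bigl(x_i, \mu(c_j)\bigr),
  \qquad
  \mu(c_j) = \frac{1}{|c_j|} \sum_{x \in c_j} x .
\]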



K-Means Algorithm

Let d be the distance measure between instances.
Select k random instances {s1, s2, ..., sk} as seeds.
Until clustering converges or another stopping criterion is met:
   For each instance xi:
      Assign xi to the cluster cj such that d(xi, sj) is minimal.
   // Now update the seeds to the centroid of each cluster
   For each cluster cj:
      sj = μ(cj)


K-Means Example (K = 2)

[Figure: k-means on 2-D points, with "x" marking the centroids. Pick seeds → reassign clusters → compute centroids → reassign clusters → compute centroids → reassign clusters → converged.]



Termination Conditions

  • Ideally, the documents in each cluster are unchanged from one iteration to the next

  • Several possibilities, e.g.,

    • A fixed number of iterations.

    • Doc partition unchanged.

    • Centroid positions don’t change.

    • We’ll terminate when only a small fraction of documents change clusters (below a threshold value)



  • Quiz:

    • Describe Manager/Worker pseudo-code that implements the K-means algorithm in parallel

    • What data partitioning is used for parallelism?

    • How are cluster centers updated and distributed?



Hints

  • Objects to be clustered are evenly partitioned among all processes

  • Cluster centers are replicated

  • Global-sum reduction on cluster centers is performed at the end of each iteration to generate the new cluster centers.

  • Use MPI_Bcast and MPI_Allreduce (see the sketch below)
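A hedged sketch of one iteration along those lines (K, DIM, my_n, my_points, centers, and closest_center are illustrative assumptions; a real implementation also needs convergence testing and empty-cluster handling):

/* Rank 0 chooses the initial seeds; every process gets an identical replicated copy */
MPI_Bcast (centers, K * DIM, MPI_DOUBLE, 0, MPI_COMM_WORLD);

/* One iteration: local assignment of this process's points, then a global sum */
double sums[K][DIM] = {{0}};     /* local per-cluster coordinate sums  */
double gsums[K][DIM];            /* global per-cluster coordinate sums */
int    counts[K]    = {0};       /* local per-cluster point counts     */
int    gcounts[K];
int    i, j, d;

for (i = 0; i < my_n; i++) {
   int best = closest_center (my_points[i], centers, K);   /* hypothetical helper */
   for (d = 0; d < DIM; d++)
      sums[best][d] += my_points[i][d];
   counts[best]++;
}

/* Global-sum reduction: every process ends up with identical totals */
MPI_Allreduce (sums,   gsums,   K * DIM, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
MPI_Allreduce (counts, gcounts, K,       MPI_INT,    MPI_SUM, MPI_COMM_WORLD);

/* New replicated cluster centers */
for (j = 0; j < K; j++)
   for (d = 0; d < DIM; d++)
      if (gcounts[j] > 0)
         centers[j][d] = gsums[j][d] / gcounts[j];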

