
Document Classification

Document Classification. Syamsul Rizal, 20136089. Introduction: texts, images, music. Task: given a document and a set of labels, find the documents most likely to contain relevant information. Motivation: read a dictionary of keywords and locate a set of text documents.


Presentation Transcript


  1. Document Classification Syamsul Rizal 20136089

  2. Introduction • Texts • Images • Music

  3. Introduction • Given a document and a set of labels • Find the documents most likely to contain relevant information

  4. Motivation • Reads a dictionary of keywords • Locates a set of text documents • Reads the documents • Generates a vector for each document • Writes the document vectors
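
The "generates a vector for each document" step can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes the vector holds one occurrence count per dictionary keyword, that keywords are lowercase, and that words are runs of letters. The names `make_profile` and `MAX_WORD` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64 /* assumed limit; longer words are truncated */

/* Count how often each dictionary keyword occurs in `text`.
 * dict: k lowercase keywords (assumption); vec: output array of k counts. */
void make_profile(const char *text, const char *const *dict, size_t k,
                  unsigned *vec)
{
    assert(text != NULL && dict != NULL && vec != NULL);
    memset(vec, 0, k * sizeof vec[0]);

    char word[MAX_WORD];
    size_t len = 0;
    for (const char *p = text;; p++) {
        if (isalpha((unsigned char)*p)) {
            /* Extend the current word, folding it to lowercase. */
            if (len < MAX_WORD - 1)
                word[len++] = (char)tolower((unsigned char)*p);
        } else {
            /* A non-letter (or the end of the text) closes the word. */
            if (len > 0) {
                word[len] = '\0';
                for (size_t i = 0; i < k; i++)
                    if (strcmp(word, dict[i]) == 0)
                        vec[i]++;
                len = 0;
            }
            if (*p == '\0')
                break;
        }
    }
}
```

The linear scan over the dictionary keeps the sketch short; a hash table would be the usual choice for a large dictionary.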

  5. Parallel Algorithm Design The document classification problem

  6. Partitioning and Communication

  7. Partitioning and Communication The reading and profiling of each document may occur in parallel

  8. Agglomeration and Mapping • Agglomeration • Reduce communication • Mapping • Load balancing, task scheduling

  9. Manager / Worker Paradigm • Manager • Responsible for keeping track of assigned and unassigned data • Assigns tasks to the worker processes, and retrieves results back from them • Worker • Carries out the tasks it is assigned and returns the results to the manager

  10. Manager / Worker Paradigm • Balances workloads • Increasing execution time • Lowering speedup

  11. Manager Process • a = array showing the document assigned to each process • d = document assigned • j = ID of worker • k = document vector length • n = number of documents • p = number of processes • s = storage array containing document vectors • t = terminated workers • v = individual document vector • int MPI_Abort (MPI_Comm comm, int error_code): makes a "best effort" attempt to abort all processes
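
The transcript lists the manager's variables but not its logic, so the following is only a sketch of how they might fit together, written without MPI so that the bookkeeping can be read on its own. It assumes each worker first sends an empty request and afterwards sends the vector for its assigned document. The `Manager` struct, the `next` field, `NO_DOC`, `manager_handle`, and `manager_done` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define NO_DOC (-1) /* hypothetical marker: no document assigned */

typedef struct {
    int n;       /* number of documents */
    int k;       /* document vector length */
    int p;       /* number of processes (manager is process 0) */
    int next;    /* next unassigned document (helper, not in the slides) */
    int t;       /* terminated workers */
    int *a;      /* a[j] = document assigned to process j */
    unsigned *s; /* n x k storage array of document vectors */
} Manager;

/* Handle one message from worker j. v is NULL for the worker's first,
 * empty request; otherwise it is the k-entry vector for document a[j].
 * Returns the document to assign next, or NO_DOC to terminate worker j. */
int manager_handle(Manager *m, int j, const unsigned *v)
{
    assert(j > 0 && j < m->p);
    if (v != NULL) {
        assert(m->a[j] != NO_DOC);
        /* Store the result in the row of s for the finished document. */
        memcpy(&m->s[(size_t)m->a[j] * m->k], v, m->k * sizeof v[0]);
    }
    if (m->next < m->n) {
        m->a[j] = m->next++; /* hand out the next unassigned document */
        return m->a[j];
    }
    m->a[j] = NO_DOC;
    m->t++; /* nothing left: this worker is told to stop */
    return NO_DOC;
}

/* The manager is finished once all p - 1 workers have terminated. */
int manager_done(const Manager *m)
{
    return m->t == m->p - 1;
}
```

In a real MPI program, each call to `manager_handle` would correspond to receiving a message from any worker and replying with either a file name or a termination message.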

  12. Worker Process • Scenario 1

  13. Worker Process • Scenario 2 Broadcast bandwidth inside the parallel computer > bandwidth between File server & parallel computer

  14. Worker Process • f = file name • k = dictionary size • v = document vector

  15. Worker Process • Decide which process will be the manager • MPI_COMM_WORLD

  16. Creating a Workers-Only Communicator int id; MPI_Comm worker_comm; … if (!id) /* Manager */ MPI_Comm_split (MPI_COMM_WORLD, MPI_UNDEFINED, id, &worker_comm); else /* Worker */ MPI_Comm_split (MPI_COMM_WORLD, 0, id, &worker_comm); MPI_Comm_split splits a communicator into one or more new communicators.
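
To see what ranks the workers end up with after this split, the color/key rule of MPI_Comm_split can be modeled in plain C. This is a model of the rule, not a call into MPI: processes with the same color share a new communicator, ranks within it are ordered by key (ties broken by the old rank), and a process passing MPI_UNDEFINED gets no communicator. `split_rank`, `UNDEFINED_COLOR`, and `NULL_RANK` are hypothetical names used only for this illustration.

```c
#include <assert.h>

#define UNDEFINED_COLOR (-1) /* stand-in for MPI_UNDEFINED in this model */
#define NULL_RANK (-1)       /* stand-in for receiving MPI_COMM_NULL */

/* Rank that process `id` would get in its new communicator, given the
 * color and key that each of the p processes passes to the split. */
int split_rank(const int *color, const int *key, int p, int id)
{
    assert(id >= 0 && id < p);
    if (color[id] == UNDEFINED_COLOR)
        return NULL_RANK;

    int rank = 0;
    for (int i = 0; i < p; i++) {
        if (i == id || color[i] != color[id])
            continue; /* different color: different communicator */
        /* Count the processes ordered ahead of `id`. */
        if (key[i] < key[id] || (key[i] == key[id] && i < id))
            rank++;
    }
    return rank;
}
```

With the slide's arguments (the manager passes MPI_UNDEFINED, every worker passes color 0 and key id), the worker with world rank id gets rank id - 1 in the workers-only communicator.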

  17. Non Blocking Communications 3 phases of Manager Process:

  18. Non Blocking Communications • Blocking send (MPI_Send): does not return until either the message has been copied into a system buffer or the message has been sent • Blocking receive (MPI_Recv): does not return until the message has been received into the buffer • Overlap • System hang

  19. Non Blocking Communications • "OK, here is the status" • "Hello, I've just started this send" • "It looks like the message has been sent" • "Not yet" • MPI_REQUEST_NULL

  20. Non Blocking Communications

  21. Function for Manager • MPI_Irecv • MPI_Wait • int MPI_Irecv (void *buffer, int cnt, MPI_Datatype dtype, int src, int tag, MPI_Comm comm, MPI_Request *handle) • buffer: cannot be accessed until a matching call to MPI_Wait has returned • handle: pointer to an MPI_Request object that identifies the communication operation that has been initiated • int MPI_Wait (MPI_Request *handle, MPI_Status *status) • Blocks until the operation associated with handle completes • status: MPI_Status object containing information about the received message

  22. Function for Worker • MPI_Isend • MPI_Probe • MPI_Get_count • int MPI_Isend (void *buffer, int cnt, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm, MPI_Request *handle) • handle: MPI_Request object created by the run-time system • The message buffer may not be reused until the matching call to MPI_Wait has returned • int MPI_Probe (int src, int tag, MPI_Comm comm, MPI_Status *status) • src: the rank of the message source • tag: the incoming message's tag • comm: the communicator • Blocks until a message matching the source and tag is available to be received • int MPI_Get_count (MPI_Status *status, MPI_Datatype dtype, int *cnt)

  23. Summary • Parallel Algorithm Design • Partitioning • Communication • Agglomeration • Mapping • Manager • Finds the plain text files • Allocates documents to the workers • Writes the complete set of document vectors (output) • Non Blocking Communication • Used to enhance system performance, with the MPI_Wait function completing each operation

  24. Thank You
