
Decision Trees and MPI Collective Algorithm Selection Problem

Decision Trees and MPI Collective Algorithm Selection Problem. Jelena Pješivac-Grbović, Graham E. Fagg, Thara Angskun, George Bosilca, and Jack J. Dongarra, IPDPS (IEEE International Parallel & Distributed Processing Symposium) 2007. Reporter: Yu Tang Liu.


Presentation Transcript


  1. Decision Trees and MPI Collective Algorithm Selection Problem. Jelena Pješivac-Grbović, Graham E. Fagg, Thara Angskun, George Bosilca, and Jack J. Dongarra, IPDPS (IEEE International Parallel & Distributed Processing Symposium) 2007. Reporter: Yu Tang Liu

  2. Outline • Abstract • Introduction • C4.5 Decision Tree algorithm • Experimental Results and Analysis • Conclusion

  3. Abstract • Selecting the close-to-optimal collective algorithm based on the parameters of the collective call at run time is an important step in achieving good performance of MPI applications. • Explore the applicability of C4.5 decision trees to the MPI collective algorithm selection problem.

  4. Introduction • The performance of MPI collective operations depends on: • Total number of nodes involved in communication • System and network characteristics • Size of data being transferred • Current load • The operation that is being performed • The segment size used for operation pipelining • The goal is to select the best possible algorithm and segment size combination for every instance of a collective operation.

  5. Introduction • Process of tuning a system: 1. Detailed profiling of the system, possibly combined with communication modeling. 2. Analyzing the collected data and generating a decision function. 3. At run time, the decision function selects the close-to-optimal method (combination of algorithm and segment size) for a particular collective instance.
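
The hand-off from step 1 to step 2 can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name `best_methods`, the record layout, and all timings are hypothetical. It labels each measured (communicator size, message size) point with the fastest method, which is the kind of labeled data a decision function could then be built from.

```python
def best_methods(measurements):
    """Map each (comm_size, msg_size) point to its fastest (algorithm, segment_size).

    `measurements` is an iterable of hypothetical profiling records:
    (comm_size, msg_size, algorithm, segment_size, seconds).
    """
    best = {}
    for comm_size, msg_size, algorithm, segment_size, seconds in measurements:
        key = (comm_size, msg_size)
        # Keep the record with the lowest measured time for this point.
        if key not in best or seconds < best[key][1]:
            best[key] = ((algorithm, segment_size), seconds)
    return {key: method for key, (method, _) in best.items()}


# Made-up timings, for illustration only (segment size 0 means "no segmentation").
measurements = [
    (8, 1024, "binomial", 0, 1.2e-4),
    (8, 1024, "pipeline", 1024, 2.0e-4),
    (8, 1048576, "binomial", 0, 9.0e-3),
    (8, 1048576, "pipeline", 8192, 4.5e-3),
]
labels = best_methods(measurements)
```

Each entry of `labels` pairs the call parameters with a class (the method), which matches the attribute-value form a classifier such as C4.5 expects.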

  6. C4.5 Decision Tree Algorithm • Decision Tree Example

  7. C4.5 Decision Tree Algorithm • In the decision tree, each node corresponds to a non-categorical attribute and each arc to a possible value of that attribute. A leaf of the tree specifies the expected value of the categorical attribute for the records described by the path from the root to that leaf. • Each node should be associated with the non-categorical attribute that is most informative among the attributes not yet considered on the path from the root.
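
One common way to quantify "most informative" is the entropy-based information gain used by ID3; C4.5 normalizes this quantity into a gain ratio. The sketch below shows only the plain information gain. The attribute names (`msg_size`, `comm_size`), the class name `method`, and the records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Entropy reduction in `target` obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical records: here the message size alone determines the method.
records = [
    {"msg_size": "small", "comm_size": "small", "method": "binomial"},
    {"msg_size": "small", "comm_size": "large", "method": "binomial"},
    {"msg_size": "large", "comm_size": "small", "method": "pipeline"},
    {"msg_size": "large", "comm_size": "large", "method": "pipeline"},
]
gain_msg = information_gain(records, "msg_size", "method")
gain_comm = information_gain(records, "comm_size", "method")
```

In this toy data set the split on `msg_size` removes all uncertainty about the method while `comm_size` removes none, so `msg_size` would be placed at the root.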

  8. C4.5 Decision Tree Algorithm • Requirements for applying the C4.5 algorithm • Attribute-value description • Predefined classes • Discrete classes • Sufficient data • “Logical” classification models

  9. C4.5 Decision Tree Algorithm • Additional parameters that affect the resulting decision tree • Weight • Confidence level • Attribute grouping • Windowing

  10. C4.5 Decision Tree Algorithm • ID3 algorithm • C4.5 algorithm= ID3 algorithm +

  11. Experimental Results and Analysis • C4.5 decision tree for Alltoall on Nano cluster

  12. Experimental Results and Analysis • Barrier is a collective operation used to synchronize a group of nodes. It guarantees that by the end of the operation, all processes involved in the barrier have at least entered the barrier. • In the flat-tree/linear algorithm, all nodes report to a preselected root; once every node has reported to the root, the root sends a releasing message to all participants. • In the double ring algorithm, a zero-byte message is sent from a preselected root circularly to the right. A node can leave the barrier only after it receives the message for the second time. • The Bruck algorithm requires ⌈log2(P)⌉ communication steps, where P is the number of nodes. At step k, node r receives a zero-byte message from node (r - 2^k) and sends a message to node (r + 2^k), with wrap around.
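
The Bruck barrier pattern can be sketched as a communication schedule. This sketch assumes the usual dissemination pattern: ⌈log2(P)⌉ steps for P nodes, with node r sending to (r + 2^k) mod P and receiving from (r - 2^k) mod P at step k. The function names are hypothetical, and the second function only simulates who has heard from whom; no MPI calls are made.

```python
def bruck_barrier_schedule(rank, size):
    """Return [(send_to, recv_from), ...] for each step of the barrier."""
    steps = (size - 1).bit_length()  # equals ceil(log2(size)) for size >= 1
    return [((rank + 2**k) % size, (rank - 2**k) % size) for k in range(steps)]


def everyone_heard_from_everyone(size):
    """Simulate the exchange and check that all ranks learn about all others."""
    known = [{r} for r in range(size)]
    for k in range((size - 1).bit_length()):
        # Each rank merges what its step-k sender knew before this step.
        known = [known[r] | known[(r - 2**k) % size] for r in range(size)]
    return all(len(s) == size for s in known)


schedule = bruck_barrier_schedule(0, 8)
```

For 8 nodes, rank 0 talks to peers at distances 1, 2, and 4, and the simulation confirms that after the last step every node has (transitively) heard from every other node, which is the barrier guarantee.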

  13. Experimental Results and Analysis • Alltoall is used to exchange data among all processes in a group. The operation is equivalent to all processes executing the scatter operation on their local buffer. • In the linear algorithm at step i, the ith node sends a message to all other nodes. The (i+1)th node is able to proceed and start sending as soon as it receives the complete message from the ith node. We allow for segmentation of messages being sent. • In the pairwise exchange algorithm, at step i, node with rank r sends a message to node (r+i) and receives a message from the (r-i)th node, with wrap around. We do not segment messages in this algorithm.
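
The pairwise exchange pattern can be written down as a per-rank schedule. The sketch assumes P - 1 steps (i = 1 .. P - 1) for P nodes, which the slide does not state explicitly; the function name is hypothetical and no data is actually transferred.

```python
def pairwise_exchange_schedule(rank, size):
    """Return [(send_to, recv_from), ...] for steps i = 1 .. size - 1."""
    return [((rank + i) % size, (rank - i) % size) for i in range(1, size)]


schedule = pairwise_exchange_schedule(0, 4)
```

Over the whole schedule each rank sends to every other rank exactly once, and the node that rank r sends to at step i is the same node that expects to receive from r at that step.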

  14. Experimental Results and Analysis • The Broadcast operation transmits an identical message from the root process to all processes of the group. At the end of the call, the contents of the root’s communication buffer are copied to all other processes. • In the flat-tree/linear algorithm, the root node sends an individual message to all participating nodes. • In the pipeline algorithm, messages are propagated from the root left to right in a linear fashion. • In the binomial and binary tree algorithms, messages traverse the tree starting at the root and going towards the leaf nodes through intermediate nodes. • In the split-binary tree algorithm, the original message is split into two parts: the “left” half of the message is sent down the left half of the binary tree, and the “right” half is sent down the right half of the tree. In the final phase of the algorithm, every node exchanges messages with its “pair” from the opposite side of the binary tree. • binary tree algorithm
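
As an illustration of how a tree-based broadcast routes messages, the sketch below computes the children of a node in one common binomial tree construction, based on ranks relative to the root. The exact tree shape used in the measured implementation is not given in the slides, so this layout is an assumption, and the function name is hypothetical.

```python
def binomial_children(rank, size, root=0):
    """Ranks that `rank` forwards the message to in a binomial broadcast tree."""
    vrank = (rank - root) % size  # rank relative to the root
    # Find the lowest set bit of vrank; for the root, run past `size`.
    mask = 1
    while mask < size and not (vrank & mask):
        mask <<= 1
    children = []
    mask >>= 1
    # Children sit at every power-of-two offset below that bit.
    while mask > 0:
        if vrank + mask < size:
            children.append((vrank + mask + root) % size)
        mask >>= 1
    return children


root_children = binomial_children(0, 8)
```

With 8 nodes the root forwards to ranks 4, 2, and 1, rank 4 forwards to 6 and 5, and so on, so every non-root node receives the message exactly once.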

  15. Experimental Results and Analysis • The Reduce operation combines elements provided in the input buffer of each process within a group using the specified operation, and returns the combined value in the output buffer of the root process. • flat-tree/linear • Pipeline • binomial tree • binary tree • k-chain tree.

  16. Experimental Results and Analysis

  17. Experimental Results and Analysis • Broadcast decision tree statistics corresponding to the data presented in the previous figure.

  18. Experimental Results and Analysis • Performance penalty of the Broadcast decision trees corresponding to the data presented in the previous figure and table.
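
A performance penalty of this kind can be computed as sketched below. The definition used here is an assumption: the relative penalty of a chosen method is its slowdown over the experimentally fastest method for the same call, expressed in percent. The function names and all timings are hypothetical.

```python
def relative_penalty(times, chosen):
    """Percent slowdown of `chosen` versus the fastest method in `times`."""
    best = min(times.values())
    return 100.0 * (times[chosen] - best) / best


def mean_penalty(samples, decide):
    """Average penalty of decision function `decide` over measured samples.

    `samples` is a list of (params, times) pairs, where `times` maps each
    method name to its measured duration for those call parameters.
    """
    penalties = [relative_penalty(times, decide(params)) for params, times in samples]
    return sum(penalties) / len(penalties)


# Made-up measurements: params are (comm_size, msg_size).
samples = [
    ((8, 1024), {"binomial": 1.0, "pipeline": 1.25}),
    ((8, 1048576), {"binomial": 2.0, "pipeline": 1.0}),
]
always_pipeline = lambda params: "pipeline"
penalty = mean_penalty(samples, always_pipeline)
```

A decision function that always reproduces the fastest measured method has a mean penalty of zero on the training data; pruned, smaller trees trade some penalty for size.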

  19. Experimental Results and Analysis

  20. Experimental Results and Analysis • Statistics for the combined Broadcast and Reduce decision trees corresponding to the data presented in the previous figure.

  21. Experimental Results and Analysis • Mean performance penalty of the combined decision tree for each of the collectives.

  22. Experimental Results and Analysis • Segment of the combined Broadcast and Reduce decision tree ‘-m 40 -c 25’

  23. Conclusion • C4.5 decision trees can be used to generate a reasonably small and very accurate decision function: the mean performance penalty on existing performance data was within the measurement error for all trees we considered. • The combined Broadcast and Reduce decision trees were also able to produce decision functions with less than 2.5% relative performance penalty for both collectives. This indicates that it is possible to use information about one MPI collective operation to generate a reasonably good decision function for another collective.
