
Parallel streaming decision trees



Presentation Transcript


  1. Parallel streaming decision trees Yael Ben-Haim & Elad Yom-Tov Presented by: Yossi Richter

  2. Why decision trees?
  • Simple classification model, short testing time
  • Understandable by humans
  • BUT: difficult to train on large data (each feature needs to be sorted)

  3. Previous work
  • Presorting (SLIQ, 1996)
  • Approximations (BOAT, 1999; CLOUDS, 1997)
  • Parallel (e.g., SPRINT, 1996)
    • Vertical parallelism
    • Task parallelism
    • Hybrid parallelism
  • Streaming
    • Minibatch (SPIES, 2003)
    • Statistic (pCLOUDS, 1999)

  4. Streaming parallel decision tree [diagram: data streaming into the tree builder]

  5. Iterative parallel decision tree [timeline diagram: the master initializes the root; each worker builds a histogram over its share of the data; the master merges the histograms and computes node splits; the build/merge/split cycle repeats until convergence]
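As a rough illustration of this master/worker protocol, here is a minimal single-process Python sketch. The helper names (build_histogram, merge_histograms, choose_split) are illustrative stand-ins, not the authors' API, and the placeholder implementations are deliberately naive; the actual histogram build and merge procedures are given on the next two slides.

```python
# Minimal single-process sketch of one build/merge/split iteration.
# All helper names are hypothetical stand-ins for the real procedures.
from collections import Counter

def build_histogram(shard):
    # Worker step: summarize the feature values in this data shard.
    # (Exact counts here; the algorithm uses a fixed-size approximate histogram.)
    return Counter(shard)

def merge_histograms(hists):
    # Master step: combine the workers' summaries into one histogram.
    merged = Counter()
    for h in hists:
        merged.update(h)
    return merged

def choose_split(hist):
    # Master step: pick a split point from the merged histogram
    # (median of the observed values, as a simple placeholder criterion).
    values = sorted(hist.elements())
    return values[len(values) // 2]

shards = [[1.0, 2.0, 2.5], [0.5, 2.0, 3.0]]         # data partitioned over workers
histograms = [build_histogram(s) for s in shards]   # workers, in parallel
split = choose_split(merge_histograms(histograms))  # master
print(split)  # 2.0
```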

  6. Building an on-line histogram
  • A histogram is a list of pairs (p1, m1) … (pc, mc)
  • Initialize: c = 0, p = [ ], m = [ ]
  • For each data point x:
    • If x == pj for some j <= c:
      • mj = mj + 1
    • Otherwise:
      • Add the bin (x, 1) to the histogram
      • c = c + 1
      • If c > max_bins:
        • Merge the two closest bins in the histogram
        • c = max_bins
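A direct Python transcription of this update procedure might look as follows. The slide does not say how two bins are merged; the sketch assumes the weighted-average merge from the accompanying paper, and the function and variable names are mine. The sample points are the paper's worked example.

```python
# Sketch of the on-line histogram update; bins is a sorted list of
# [value, count] pairs. Bin merging uses a weighted average (assumed
# from the accompanying paper; the slide only says "merge").
import bisect

def update(bins, x, max_bins):
    i = bisect.bisect_left([v for v, _ in bins], x)
    if i < len(bins) and bins[i][0] == x:
        bins[i][1] += 1                      # x matches an existing bin
        return bins
    bins.insert(i, [x, 1])                   # otherwise open a new bin
    if len(bins) > max_bins:
        # merge the two closest bins into their weighted average
        j = min(range(len(bins) - 1),
                key=lambda k: bins[k + 1][0] - bins[k][0])
        (v1, m1), (v2, m2) = bins[j], bins[j + 1]
        bins[j:j + 2] = [[(v1 * m1 + v2 * m2) / (m1 + m2), m1 + m2]]
    return bins

bins = []
for x in [23, 19, 10, 16, 36, 2, 9]:
    update(bins, x, max_bins=5)
print(bins)  # [[2, 1], [9.5, 2], [17.5, 2], [23, 1], [36, 1]]
```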

  7. Merging two histograms
  • Concatenate the two histogram lists, creating a list of length c
  • Repeat until c <= max_bins:
    • Merge the two closest bins
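Using the same [value, count] representation, the merge procedure is a few lines; again, merging the two closest bins by weighted average is an assumption carried over from the update sketch above.

```python
# Sketch of merging two on-line histograms: concatenate, then repeatedly
# collapse the two closest bins until at most max_bins remain.
def merge(bins_a, bins_b, max_bins):
    bins = sorted(bins_a + bins_b)           # concatenated, sorted by value
    while len(bins) > max_bins:
        j = min(range(len(bins) - 1),
                key=lambda k: bins[k + 1][0] - bins[k][0])
        (v1, m1), (v2, m2) = bins[j], bins[j + 1]
        bins[j:j + 2] = [[(v1 * m1 + v2 * m2) / (m1 + m2), m1 + m2]]
    return bins

print(merge([[2, 1], [9.5, 2]], [[17.5, 2], [23, 1], [36, 1]], max_bins=4))
# [[2, 1], [9.5, 2], [19.33..., 3], [36, 1]]
```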

  8. Example of the histogram [figure: histogram with 50 bins built from 1000 data points]

  9. Pruning
  • Taken from the MDL-based SLIQ algorithm
  • Consists of two phases:
    • Tree construction
    • A bottom-up pass on the complete tree
  • During tree construction, for each tree node set cleaf = 1 + the number of samples that reach the node and do not belong to its majority class
  • The bottom-up pass:
    • For each leaf, set cboth = cleaf
    • For each internal node whose cboth(left) and cboth(right) have been assigned, set cboth = 2 + cboth(left) + cboth(right)
  • The subtree rooted at a node is pruned when cleaf is small, i.e. when:
    • Only a few samples reach the node, or
    • A substantial portion of the samples that reach it belong to the majority class
  • If cleaf < cboth (i.e., the subtree does not contribute much information):
    • Prune the subtree
    • Set cboth = cleaf
  A minimal Python sketch of the bottom-up pass follows this list.
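The sketch below assumes a hypothetical Node type that stores the per-node cleaf computed during tree construction; the names are mine, not SLIQ's.

```python
# Sketch of the MDL-style bottom-up pruning pass. Each node carries
# c_leaf = 1 + samples at the node outside its majority class.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    c_leaf: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def prune(node: Node) -> int:
    """Returns c_both for node, pruning subtrees where c_leaf < c_both."""
    if node.left is None and node.right is None:
        return node.c_leaf                   # leaves: c_both = c_leaf
    c_both = 2 + prune(node.left) + prune(node.right)
    if node.c_leaf < c_both:                 # subtree adds little information
        node.left = node.right = None        # prune: turn the node into a leaf
        c_both = node.c_leaf
    return c_both

# Example: children misclassify about as much as the parent does alone.
root = Node(c_leaf=5, left=Node(c_leaf=3), right=Node(c_leaf=2))
prune(root)
print(root.left is None)  # True: subtree pruned, since 5 < 2 + 3 + 2
```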

  10. Shameless PR slide: IBM Parallel Machine Learning toolbox
  • A toolbox for conducting large-scale machine learning
  • Supports architectures ranging from single machines with multiple cores to large distributed clusters
  • Works by distributing the computations across multiple nodes
  • Allows for rapid learning on very large datasets
  • Includes state-of-the-art machine learning algorithms for:
    • Classification: support vector machines (SVM), decision trees
    • Regression: linear and SVM
    • Clustering: k-means, fuzzy k-means, kernel k-means, Iclust
    • Feature reduction: principal component analysis (PCA) and kernel PCA
  • Includes an API for adding algorithms
  • Freely available from alphaWorks
  • A joint project of the Haifa Machine Learning group and the Watson Data Analytics group
  [figure: k-means on Blue Gene]

  11. Results: comparing single-node solvers (ten-fold cross-validation, unless a test/train partition exists) [table: no statistically significant difference between the solvers]

  12. Results: pruning [figure: 80% reduction in tree size]

  13. Speedup (strong scalability) [plots for the Alpha and Beta datasets] Speedup improves with data size!

  14. Weak scalability [plots for the Alpha and Beta datasets] Scalability improves with the number of processors!

  15. Algorithm complexity

  16. Summary
  • An efficient new algorithm for parallel streaming decision trees
  • Results as good as single-node trees, with scalability that improves with both the data size and the number of processors
  • Ongoing work: a proof that the algorithm's output differs only by epsilon from that of the standard decision tree algorithm

  17. Thank You [slide shows "thank you" in many languages: Thank You (English), Gracias (Spanish), Merci (French), Danke (German), Grazie (Italian), Obrigado (Portuguese), תודה / Toda (Hebrew), Kiitos, and others in Russian, Arabic, Thai, Chinese, Japanese, Korean, and Danish]
