This study explores the use of supervised learning to predict optimal loop unroll factors in compiler heuristics. By leveraging empirical data and machine learning algorithms such as nearest neighbors and support vector machines, the authors aim to improve heuristic design based on objective criteria rather than ad-hoc methods. The research provides insights into feature extraction and selection, demonstrating that a tailored approach can yield significant speedups in loop optimization. The findings suggest that machine learning can empower compiler writers to make data-driven decisions in heuristic development.
PREDICTING UNROLL FACTORS USING SUPERVISED LEARNING Mark Stephenson & Saman Amarasinghe Massachusetts Institute of Technology Computer Science and Artificial Intelligence Lab
INTRODUCTION & MOTIVATION • Compiler heuristics rely on detailed knowledge of the system • Interactions between compiler passes are not well understood • Architectures are complex
HEURISTIC DESIGN • Current approach to heuristic development is somewhat ad hoc • Can compiler writers learn anything from baseball? • Is it feasible to deal with empirical data? • Can we use statistics and machine learning to build heuristics?
CASE STUDY • Loop unrolling • Code expansion can degrade performance • Increased live ranges, register pressure • A myriad of interactions with other passes • Requires categorization into multiple classes • i.e., what’s the unroll factor?
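As a brief illustration (not part of the original slides), unrolling a simple loop by a factor of 4 replicates the body four times and leaves a cleanup loop for the leftover iterations; the function below is a hypothetical sketch:

  // Hypothetical example: the loop
  //   for (int i = 0; i < n; i++) a[i] = b[i] * c[i];
  // unrolled by a factor of 4.
  void scale(float *a, const float *b, const float *c, int n) {
    int i = 0;
    for (; i + 3 < n; i += 4) {        // unrolled body: 4 copies per iteration
      a[i]     = b[i]     * c[i];
      a[i + 1] = b[i + 1] * c[i + 1];
      a[i + 2] = b[i + 2] * c[i + 2];
      a[i + 3] = b[i + 3] * c[i + 3];
    }
    for (; i < n; i++)                 // cleanup loop for remaining iterations
      a[i] = b[i] * c[i];
  }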
ORC’S HEURISTIC (UNKNOWN TRIPCOUNT)
if (trip_count_tn == NULL) {
  // Trip count unknown: start just below the configured unroll limit
  // and decrement until the unrolled body fits the size budget.
  UINT32 ntimes = MAX(1, OPT_unroll_times - 1);
  INT32 body_len = BB_length(head);
  while (ntimes > 1 && ntimes * body_len > CG_LOOP_unrolled_size_max)
    ntimes--;
  Set_unroll_factor(ntimes);
} else {
  …
}
ORC’S HEURISTIC (KNOWN TRIPCOUNT)
} else {
  BOOL const_trip = TN_is_constant(trip_count_tn);
  INT32 const_trip_count = const_trip ? TN_value(trip_count_tn) : 0;
  INT32 body_len = BB_length(head);
  CG_LOOP_unroll_min_trip = MAX(CG_LOOP_unroll_min_trip, 1);
  if (const_trip && CG_LOOP_unroll_fully &&
      (body_len * const_trip_count <= CG_LOOP_unrolled_size_max ||
       CG_LOOP_unrolled_size_max == 0 &&
       CG_LOOP_unroll_times_max >= const_trip_count)) {
    // A small loop with a constant trip count is unrolled completely.
    Set_unroll_fully();
    Set_unroll_factor(const_trip_count);
  } else {
    UINT32 ntimes = OPT_unroll_times;
    ntimes = MIN(ntimes, CG_LOOP_unroll_times_max);
    // Round the unroll factor down to a power of two.
    if (!is_power_of_two(ntimes)) {
      ntimes = 1 << log2(ntimes);
    }
    // Halve until the unrolled body fits the size budget.
    while (ntimes > 1 && ntimes * body_len > CG_LOOP_unrolled_size_max)
      ntimes /= 2;
    if (const_trip) {
      // Halve until the trip count covers at least two unrolled iterations.
      while (ntimes > 1 && const_trip_count < 2 * ntimes)
        ntimes /= 2;
    }
    Set_unroll_factor(ntimes);
  }
}
SUPERVISED LEARNING • Supervised learning algorithms try to find a function F(X) → Y • X : vector of characteristics that define a loop • Y : empirically found best unroll factor • [Figure: F(X) maps each loop to an unroll factor from 1 to 8]
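A minimal interface sketch of that mapping in C++; the feature names here are hypothetical and far fewer than the characteristics actually extracted from ORC:

  // Hypothetical loop feature vector X (the real set is much larger).
  struct LoopFeatures {
    int num_fp_ops;      // floating-point operations in the loop body
    int num_branches;    // branches in the loop body
    int num_memory_ops;  // loads and stores
    int trip_count;      // known trip count, or 0 if unknown
  };

  // Y: the empirically best unroll factor, restricted here to 1..8.
  using UnrollFactor = int;

  // F(X) -> Y: the function the supervised learner approximates.
  UnrollFactor predict_unroll_factor(const LoopFeatures &x);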
EXTRACTING THE DATA • Extract features • Most features readily available in ORC • Kitchen sink approach • Finding the labels (best unroll factors) • Added instrumentation pass • Assembly instructions inserted to time loops • Calls to a library at all exit points • Compile and run at all unroll factors (1..8) • For each loop, choose the best one as the label
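The labeling step, sketched in C++ under the assumptions above: each instrumented loop is timed at every unroll factor from 1 to 8, and the fastest factor becomes its label. time_loop_at_unroll_factor is a hypothetical stand-in for the compile-run-measure cycle.

  #include <limits>

  // Hypothetical hook: compile the program with this loop unrolled `factor`
  // times, run it, and return the measured loop time.
  double time_loop_at_unroll_factor(int loop_id, int factor);

  // Label = the unroll factor in 1..8 with the lowest measured runtime.
  int find_best_unroll_factor(int loop_id) {
    int best_factor = 1;
    double best_time = std::numeric_limits<double>::max();
    for (int factor = 1; factor <= 8; ++factor) {
      double t = time_loop_at_unroll_factor(loop_id, factor);
      if (t < best_time) {
        best_time = t;
        best_factor = factor;
      }
    }
    return best_factor;
  }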
LEARNING ALGORITHMS • Prototyped in Matlab • Two learning algorithms classified our data set well • Near neighbors • Support Vector Machine (SVM) • Both algorithms classify quickly • Train at the factory • No increase in compilation time
NEAR NEIGHBORS • [Figure: loops plotted by # branches vs. # FP operations, with neighborhoods labeled “unroll” and “don’t unroll”]
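A minimal near-neighbors sketch, assuming only the two features from the figure and Euclidean distance; the real classifier works on the full feature vector:

  #include <cmath>
  #include <limits>
  #include <vector>

  struct Example {
    double fp_ops;      // # FP operations
    double branches;    // # branches
    int unroll_factor;  // empirically best unroll factor (the label)
  };

  // Predict by copying the label of the closest training example.
  int near_neighbor_predict(const std::vector<Example> &training,
                            double fp_ops, double branches) {
    int best_label = 1;
    double best_dist = std::numeric_limits<double>::max();
    for (const Example &e : training) {
      double dx = e.fp_ops - fp_ops;
      double dy = e.branches - branches;
      double dist = std::sqrt(dx * dx + dy * dy);
      if (dist < best_dist) {
        best_dist = dist;
        best_label = e.unroll_factor;
      }
    }
    return best_label;
  }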
SUPPORT VECTOR MACHINES • Map the original feature space into a higher-dimensional space (using a kernel) • Find a hyperplane that maximally separates the data
SUPPORT VECTOR MACHINES • [Figure: the (# branches, # FP operations) space mapped into a higher-dimensional space with a # branches² axis, where a hyperplane separates “unroll” from “don’t unroll”]
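A sketch of the idea behind the figure: lift the two original features into a higher-dimensional space (here with a hypothetical # branches² term) and classify with a separating hyperplane. In practice the weights come from SVM training, which maximizes the margin between the classes; everything below is a placeholder, not ORC code.

  #include <array>

  // Explicit feature map phi(x): lift (fp_ops, branches) into three
  // dimensions by adding branches^2, as suggested by the figure’s axes.
  std::array<double, 3> phi(double fp_ops, double branches) {
    return {fp_ops, branches, branches * branches};
  }

  // Apply a previously trained hyperplane (w, b) in the lifted space.
  bool should_unroll(double fp_ops, double branches,
                     const std::array<double, 3> &w, double b) {
    std::array<double, 3> x = phi(fp_ops, branches);
    double score = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b;
    return score > 0.0;  // positive side of the hyperplane => "unroll"
  }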
PREDICTION ACCURACY • Leave-one-out cross validation (see the sketch below) • Filter out ambiguous training examples • Only keep examples whose best unroll factor is obviously better (by at least 1.05x) • Throw away obviously noisy examples
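A sketch of leave-one-out cross validation: each example is held out in turn, the classifier is trained on the remainder, and accuracy is the fraction of held-out examples predicted correctly. train_and_predict is a placeholder for the near-neighbors or SVM prototypes.

  #include <cstddef>
  #include <vector>

  struct LabeledLoop {
    std::vector<double> features;  // extracted loop characteristics
    int best_unroll_factor;        // empirically measured label
  };

  // Placeholder: train on `training`, then predict a factor for `features`.
  int train_and_predict(const std::vector<LabeledLoop> &training,
                        const std::vector<double> &features);

  double leave_one_out_accuracy(const std::vector<LabeledLoop> &data) {
    std::size_t correct = 0;
    for (std::size_t held_out = 0; held_out < data.size(); ++held_out) {
      std::vector<LabeledLoop> training;
      for (std::size_t i = 0; i < data.size(); ++i)
        if (i != held_out) training.push_back(data[i]);
      if (train_and_predict(training, data[held_out].features) ==
          data[held_out].best_unroll_factor)
        ++correct;
    }
    return static_cast<double>(correct) / data.size();
  }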
FEATURE SELECTION • Feature selection is a way to identify the best features • Start with loads of features • Small feature sets are better • Learning algorithms run faster • Are less prone to overfitting the training data • Useless features can confuse learning algorithms
FEATURE SELECTION CONT.: MUTUAL INFORMATION SCORE • Measures the reduction of uncertainty in one variable given knowledge of another variable • Does not tell us how features interact with each other
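For reference, the standard definition of the mutual information between a feature X and the best unroll factor Y (not spelled out on the slide):

  I(X;Y) = \sum_{x} \sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)} = H(Y) - H(Y \mid X)

A high score means that knowing the feature greatly reduces the uncertainty in the label, but the score is computed one feature at a time.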
FEATURE SELECTION CONT.: GREEDY FEATURE SELECTION • Choose the single best feature • Then choose the feature that, together with the best feature, most improves classification accuracy • Repeat, adding one feature at a time (see the sketch below)
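A sketch of the greedy forward-selection loop in C++, assuming a hypothetical evaluate() that reports cross-validation accuracy for a candidate feature subset:

  #include <set>

  // Hypothetical: leave-one-out accuracy of the classifier restricted
  // to the feature subset `subset`.
  double evaluate(const std::set<int> &subset);

  // Grow the feature set greedily: at each step add the single feature
  // that improves accuracy the most, and stop when nothing helps.
  std::set<int> greedy_feature_selection(int num_features) {
    std::set<int> selected;
    double best_accuracy = 0.0;
    while (true) {
      int best_feature = -1;
      for (int f = 0; f < num_features; ++f) {
        if (selected.count(f)) continue;
        std::set<int> candidate = selected;
        candidate.insert(f);
        double acc = evaluate(candidate);
        if (acc > best_accuracy) {
          best_accuracy = acc;
          best_feature = f;
        }
      }
      if (best_feature == -1) break;  // no remaining feature improves accuracy
      selected.insert(best_feature);
    }
    return selected;
  }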
RELATED WORK • Monsifrot et al., “A Machine Learning Approach to Automatic Production of Compiler Heuristics.” 2002 • Calder et al., “Evidence-Based Static Branch Prediction Using Machine Learning.” 1997 • Cavazos et al., “Inducing Heuristics to Decide Whether to Schedule.” 2004 • Moss et al., “Learning to Schedule Straight-Line Code.” 1997 • Cooper et al., “Optimizing for Reduced Code Space using Genetic Algorithms.” 1999 • Puppin et al., “Adapting Convergent Scheduling using Machine Learning.” 2003 • Stephenson et al., “Meta Optimization: Improving Compiler Heuristics with Machine Learning.” 2003
CONCLUSION • Supervised classification can effectively find good heuristics • Even for multi-class problems • SVM and near neighbors perform well • Potentially a big impact • We spent very little time tuning the learning parameters • Let a machine learning algorithm tell us which features are best
THE END
SOFTWARE PIPELINING • ORC has been tuned with SWP in mind • Every major release of ORC has had a different unrolling heuristic for SWP • Currently 205 lines long • Can we learn a heuristic that outperforms ORC’s SWP unrolling heuristic?
HURDLES • Compiler writer must extract features • Acquiring labels takes time • Instrumentation library • ~2 weeks to collect data • Predictions confined to training labels • Have to tweak learning algorithms • Noise