Associative Classification (AC) Mining for A Personnel Scheduling Problem

Fadi Thabtah

Trainer scheduling problem
[Diagram: a schedule assigns courses (events) to resources: locations, staff (trainers), and timeslots.]

Trainer scheduling problem
• Assign a number of training courses (events) to a limited number of training staff, locations, and timeslots
• Each course has a numerical priority value
• Each trainer is penalised depending on the travel distance

Objective Function
MAX (total priority for scheduled events − total penalty for training staff)
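Assuming the two terms on this slide are combined by subtraction, the objective could be written as below. The symbols are illustrative and do not come from the slides.

```latex
% Assumed notation: S = set of scheduled events, p_e = priority of event e,
% T = set of trainers, \pi_t = travel penalty of trainer t
\max \; \sum_{e \in S} p_e \; - \; \sum_{t \in T} \pi_t
```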

Hyperheuristic approach
• Operates at a higher level of abstraction than metaheuristics
• You may think of it as a supervisor that manages the choice of simple local search neighbourhoods (low-level heuristics) at any time

Low-level heuristics
• Problem-oriented
• Represent simple methods used by human experts
• Easy to implement
• Examples:
  • Add a new event to the schedule
  • Swap two events in the schedule
  • Replace one event in the schedule by another
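The three example moves can be sketched in a few lines of Python. The representation is hypothetical: a schedule is a list of event ids, `pool` holds the events that are not yet scheduled, and the resource constraints (staff, locations, timeslots) are left out.

```python
import random

# Hypothetical representation (not from the slides): a schedule is a list of
# event ids; `pool` is the set of events that are not scheduled yet.

def add_event(schedule, pool, rng):
    """Add one unscheduled event to the schedule."""
    if not pool:
        return schedule, pool
    event = rng.choice(sorted(pool))
    return schedule + [event], pool - {event}

def swap_events(schedule, pool, rng):
    """Swap the positions of two scheduled events."""
    if len(schedule) < 2:
        return schedule, pool
    i, j = rng.sample(range(len(schedule)), 2)
    new = list(schedule)
    new[i], new[j] = new[j], new[i]
    return new, pool

def replace_event(schedule, pool, rng):
    """Replace one scheduled event by an unscheduled one."""
    if not schedule or not pool:
        return schedule, pool
    i = rng.randrange(len(schedule))
    incoming = rng.choice(sorted(pool))
    new = list(schedule)
    outgoing = new[i]
    new[i] = incoming
    return new, (pool - {incoming}) | {outgoing}

rng = random.Random(0)
schedule, pool = ["e1", "e2"], {"e3", "e4"}
for move in (add_event, swap_events, replace_event):
    schedule, pool = move(schedule, pool, rng)
print(schedule, pool)
```

Each move returns a new schedule and pool instead of mutating its inputs, so a caller can compare the perturbed solution with the current one before deciding whether to keep it.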

[Diagram: the hyperheuristic passes the current solution to one of Low Level Heuristic 1, 2, or 3 and receives a perturbed solution.]

Building a Schedule using a Hyperheuristic
[Diagram: starting from an initial solution, the hyperheuristic algorithm selects a low-level heuristic from the set of low-level heuristics, applies it to the current solution to obtain a perturbed solution, receives the objective value and CPU time, and updates the current solution according to the acceptance criterion.]
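The loop in this diagram might look like the following minimal sketch. It uses random selection and an accept-if-not-worse criterion, which are assumptions chosen for brevity rather than the method from the slides; the toy priorities, the slot limit, and the two heuristics are hypothetical.

```python
import random

def hyperheuristic(initial, heuristics, objective, iterations, rng):
    """Simple random hyperheuristic: pick a low-level heuristic at random,
    apply it, and keep the perturbed solution only if it is not worse."""
    current, current_value = initial, objective(initial)
    for _ in range(iterations):
        llh = rng.choice(heuristics)       # selected low-level heuristic
        perturbed = llh(current, rng)      # perturbed solution
        value = objective(perturbed)       # objective value fed back
        if value >= current_value:         # acceptance criterion (assumed)
            current, current_value = perturbed, value
    return current, current_value

# Toy instance (hypothetical data): event -> priority, at most 3 events.
PRIORITY = {"e1": 5, "e2": 3, "e3": 8, "e4": 1, "e5": 6}
SLOTS = 3

def objective(schedule):
    return sum(PRIORITY[e] for e in schedule)

def add(schedule, rng):
    free = sorted(set(PRIORITY) - set(schedule))
    if len(schedule) >= SLOTS or not free:
        return schedule
    return schedule + (rng.choice(free),)

def replace(schedule, rng):
    free = sorted(set(PRIORITY) - set(schedule))
    if not schedule or not free:
        return schedule
    i = rng.randrange(len(schedule))
    return schedule[:i] + (rng.choice(free),) + schedule[i + 1:]

best, value = hyperheuristic((), [add, replace], objective, 200, random.Random(1))
print(best, value)
```

The hyperheuristic itself never inspects the schedule: it only sees which heuristics exist and what objective value each application returns, which is why little domain knowledge is needed at this level.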

Advantages of hyperheuristics
• Cheap and fast to implement
• Produce solutions of good quality (comparable to those obtained by hard-to-implement metaheuristic methods)
• Require limited domain-specific knowledge
• Robustness: can be effectively applied to a wide range of problems and problem instances

Current Hyperheuristic Approaches
• Simple hyperheuristics (Cowling et al., 2001-2002)
• Choice-function-based (Cowling et al., 2001-2002)
• Based on genetic algorithms (Cowling et al., 2002; Han et al., 2002)
• Hybrid hyperheuristics (Cowling, Chakhlevitch 2003-2004)

Why Data Mining
Scenario: while constructing the solution of the scheduling problem, the hyperheuristic must choose an appropriate low-level heuristic (LLH) at each choice point, so an expert decision maker is needed (classification).
Two approaches:
• Learn the performance of LLHs from past schedules to predict the appropriate LLH in the current one
• Learn and predict the LLH while constructing the schedule, so-called "on-the-fly" learning

Classification: A Two-Step Process
1. Classifier building: describing a set of predetermined classes
2. Classifier usage:
• Calculate the error rate
• If the error rate is acceptable, apply the classifier to the test data
[Diagram labels: Training Data, Classification Algorithm, Classification Rules, Test Data, Class/LLH]

Training Data
RowId  A1  A2  Class
1      x1  y1  c1
2      x1  y2  c2
3      x1  y1  c2
4      x1  y2  c1
5      x2  y1  c2
6      x2  y1  c1
7      x2  y3  c2
8      x1  y3  c1
9      x2  y4  c1
10     x3  y1  c1

Test Data
RowId  A1  A2
1      x1  y1
2      x2  y4
3      x1  y1

Learning the Performance of LLH
[Diagram: the hyperheuristic solution is applied K times; data mining techniques produce a rule set (If/Then) that guides the derived hyperheuristic algorithm.]

Association Rules Mining
• A strong tool that aims to find relationships between variables in a database.
• It is widely applied, especially in market basket analysis, to infer items from the presence of other items in the customer's shopping cart.
• Example: if a customer buys milk, what is the probability that he/she buys cereal as well?
• Unlike classification, the target class is not pre-specified in association rule mining.
• Advantages:
  • Item shelving
  • Sales promotions
  • Future planning

Transactional Database
Transaction Id  Items                      Time
12              bread, milk, juice         10:12
13              bread, juice, milk         12:13
14              milk, beer, bread, juice   13:22
15              bread, eggs, milk          13:26
16              beer, basket, bread, juice 15:11
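A question such as "if a customer buys X, how likely is Y?" can be answered on the five transactions of this slide with a short sketch. Support is computed here as a fraction of transactions, which is one common convention and an assumption on my part; the function names are illustrative.

```python
# The five transactions from the slide (times omitted).
transactions = {
    12: {"bread", "milk", "juice"},
    13: {"bread", "juice", "milk"},
    14: {"milk", "beer", "bread", "juice"},
    15: {"bread", "eggs", "milk"},
    16: {"beer", "basket", "bread", "juice"},
}

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)

def support(itemset):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Share of transactions with `antecedent` that also hold `consequent`."""
    return count(antecedent | consequent) / count(antecedent)

print(support({"bread", "milk"}))        # 0.8
print(confidence({"beer"}, {"juice"}))   # 1.0
print(confidence({"milk"}, {"juice"}))   # 0.75
```

In this data every basket with beer also contains juice, so the rule beer → juice has confidence 1.0, while milk → juice holds in three of the four milk baskets.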

Associative Classification (AC)
• A special case of association rule mining that considers only the class label as the consequent of a rule.
• Derives a set of class association rules from the training data set that satisfy certain user constraints, i.e. support and confidence thresholds.
• Discovers the correlations between objects and class labels.
• Examples: CBA, CPAR, CMAR

AC Steps
[Diagram: training data is fed to the associative classification algorithm, which finds frequent ruleitems (attribute values that pass the user's support threshold) and then produces class association rules.]
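These steps can be illustrated with a brute-force sketch on the ten-row training table from the classification slide. It enumerates every attribute-value combination instead of using the candidate generation of CBA or MMAC, and the thresholds (20% support, 60% confidence) are picked only to keep the toy output small.

```python
from collections import Counter
from itertools import combinations

# Training rows (A1, A2, class) from the classification slide.
ROWS = [
    ("x1", "y1", "c1"), ("x1", "y2", "c2"), ("x1", "y1", "c2"),
    ("x1", "y2", "c1"), ("x2", "y1", "c2"), ("x2", "y1", "c1"),
    ("x2", "y3", "c2"), ("x1", "y3", "c1"), ("x2", "y4", "c1"),
    ("x3", "y1", "c1"),
]
ATTRS = ("A1", "A2")

def mine_rules(rows, min_sup, min_conf):
    """Return (condition, class, support, confidence) for every rule whose
    relative support and confidence reach the thresholds."""
    cond_count, rule_count = Counter(), Counter()
    for *values, label in rows:
        items = list(zip(ATTRS, values))
        for size in range(1, len(items) + 1):
            for cond in combinations(items, size):
                cond_count[cond] += 1           # rows matching the condition
                rule_count[cond, label] += 1    # ... that also have this class
    rules = []
    for (cond, label), n in rule_count.items():
        sup, conf = n / len(rows), n / cond_count[cond]
        if sup >= min_sup and conf >= min_conf:
            rules.append((cond, label, sup, conf))
    # Rank by confidence, then support (one common ordering, assumed here).
    return sorted(rules, key=lambda r: (-r[3], -r[2]))

rules = mine_rules(ROWS, min_sup=0.2, min_conf=0.6)
for cond, label, sup, conf in rules:
    print(cond, "->", label, sup, conf)
```

With these thresholds only two rules survive, A1 = x1 → c1 and A2 = y1 → c1. Lowering the thresholds makes the rule set grow quickly, which is the motivation for pruning.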

Rule support and confidence
Given a training data set T, for a rule R: cond → c
• The support of R, denoted sup(R), is the number of objects in T matching R's condition and having class label c
• The confidence of R, denoted conf(R), is the number of objects matching R's condition and having class label c over the number of objects matching R's condition
• Any item whose support is larger than the user's minimum support is called a frequent itemset
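As a worked example of these definitions, take the rule R: (A1 = x1) → c1 on the ten-row training table from the classification slide. The choice of rule is mine, for illustration only.

```latex
% Rows matching A1 = x1: rows 1, 2, 3, 4, 8 (five objects)
% Of these, rows 1, 4, 8 have class c1 (three objects)
\mathrm{sup}(R) = 3, \qquad
\mathrm{conf}(R) = \frac{3}{5} = 0.6
```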

Current Developed Techniques
• MCAR (Thabtah et al., Proceedings of the 3rd IEEE International Conference on Computer Systems and Applications, pp. 1-7)
• MMAC (Thabtah et al., Journal of Knowledge and Information Systems (2006) 00:1-21)
MCAR characteristics:
• Combines two general data mining approaches, i.e. association rule mining and classification
• Suitable for traditional classification problems
• Employs a new method of finding the rules
MMAC characteristics:
• Produces classifiers of the form [rule form not preserved] that are suitable not only for traditional binary classification problems but also for multi-class label problems such as medical diagnosis and text classification
• Presents three evaluation accuracy measures

Data and Experiments
• Learning approach: learn the performance of LLHs from past schedules to predict the appropriate LLH in the current one
• Support = 5%, confidence = 40%
• Number of data sets: 12-16 UCI data sets and 9 solutions of the trainer scheduling problem
• Algorithms used:
  • CBA (AC algorithm)
  • MMAC (AC algorithm)
  • Decision tree algorithms (C4.5)
  • Covering algorithms (RIPPER)
  • Hybrid classification algorithm (PART)

Relative prediction accuracy in terms of PART for the accuracy measures of the MMAC algorithm

Relative prediction accuracy in terms of CBA for the accuracy measures of the MMAC algorithm

Number of Rules of CBA, PART and Top-label

Accuracy (%) for PART, RIPPER, CBA and MMAC on UCI data sets

Comparison between AC algorithms on 12 UCI data sets

MCAR vs. CBA and C4.5 On UCI data sets

Conclusions
• Associative classification is a promising approach in data mining
• Since more than one LLH could improve the objective function in the hyperheuristic, we need multi-label rules in the classifier
• Associative classifiers produce more accurate classification models than traditional classification algorithms such as decision trees and rule induction approaches
• One challenge in associative classification is the exponential growth of rules, so pruning becomes essential

Future Work
• Constructing a hyperheuristic approach for the personnel scheduling problem
• Investigating the use of multi-class label classification algorithms with a hyperheuristic
• Implementing new data mining techniques based on dynamic learning that are suitable for scheduling and optimization problems
• Investigating rule pruning in AC mining

Questions?
