Literal and ProRulext: Algorithms for Rule Extraction of ANNs

Paulemir G. Campos, Teresa B. Ludermir E-mail: {pgc, tbl}@cin.ufpe.br

Presentation Summary • 1. Introduction • 2. Literal and ProRulext • 3. Experiments • 4. Results • 5. Discussions • 6. Conclusions • Acknowledgements • References

1. Introduction • Main Features of Artificial Neural Networks (ANNs): • Excellent generalization capacity; • They have been successfully applied to several real-world problems; • They represent domain knowledge in their topology, weight values and biases; • However, they cannot readily explain their answers (their main deficiency).

1. Introduction • This deficiency is usually mitigated by extracting “IF/THEN” rules from the trained network (ANN + Rule Extraction). • However, other hybrid models exist for this purpose, such as Evolutionary Algorithms and Neuro-Fuzzy Systems.

1. Introduction • This paper presents two algorithms for extracting rules from trained networks: Literal and ProRulext. • Literal's distinguishing feature is its portability. • ProRulext has a relatively low computational cost when extracting rules from feedforward MLP networks with one hidden layer.

2. Literal and ProRulext • Literal: • A very simple algorithm for extracting “IF-THEN” propositional rules from trained networks applied to pattern classification and time series forecasting problems; • The rules are extracted through a literal mapping of the network inputs to its outputs; • This approach is a Pedagogical Technique (in the taxonomy of Andrews et al. [2]).

2. Literal and ProRulext • Overview of the Literal Algorithm: • 1. Discretize the network inputs and outputs into intervals of equal width; • 2. Normalize the patterns of the network's training set to values within [0; 1] or [-1; 1]; • 3. Present each of these normalized input patterns to the trained network to obtain the corresponding rule consequents;
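Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The slides do not say which value represents an interval, so this sketch assumes that each value is replaced by the midpoint of its equal-width interval. `network` is a hypothetical stand-in for the trained MLP, and all function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def discretize_equal_width(values: Sequence[float], n_intervals: int) -> List[float]:
    """Step 1 (assumed form): replace each value by the midpoint of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    if width == 0:
        return [float(lo)] * len(values)
    result = []
    for v in values:
        k = min(int((v - lo) / width), n_intervals - 1)  # the maximum belongs to the last interval
        result.append(lo + (k + 0.5) * width)
    return result


def normalize(values: Sequence[float], lo: float, hi: float,
              a: float = 0.0, b: float = 1.0) -> List[float]:
    """Step 2: linear (min-max) scaling from [lo, hi] to [a, b], e.g. [0, 1] or [-1, 1]."""
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def literal_mapping(inputs: Sequence[Tuple[float, ...]],
                    network: Callable[[Tuple[float, ...]], float]
                    ) -> List[Tuple[Tuple[float, ...], float]]:
    """Step 3: each normalized input pattern is a rule antecedent; the network output is its consequent."""
    return [(x, network(x)) for x in inputs]
```

For example, with two intervals (the setting used in the experiments), the values 0, 2, ..., 10 collapse onto 2.5 and 7.5. These then normalize to 0.25 and 0.75 in [0, 1], or to -0.5 and 0.5 in [-1, 1].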

2. Literal and ProRulext • Overview of the Literal Algorithm (continued): • 4. De-normalize the rule antecedents and consequents obtained above back to the original values of the database; • 5. Store the new rules created in the previous steps in a file; • 6. For each rule conclusion, select the input attribute with the most frequent contents;
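Steps 4 and 6 are sketched below. De-normalization is the inverse of the min-max scaling. For step 6 the slides leave the exact criterion open, so this sketch assumes one possible reading: for each conclusion, choose the attribute whose most common value appears in the largest number of rules with that conclusion. The `Rule` type and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# (antecedent values, one per input attribute; conclusion)
Rule = Tuple[Tuple[float, ...], str]


def denormalize(v: float, lo: float, hi: float, a: float = 0.0, b: float = 1.0) -> float:
    """Step 4: invert min-max scaling, mapping a value in [a, b] back to [lo, hi]."""
    return lo + (v - a) * (hi - lo) / (b - a)


def most_frequent_attribute(rules: Sequence[Rule]) -> Dict[str, int]:
    """Step 6 (one possible reading): for each conclusion, return the index of the
    attribute whose most common value occurs in the most rules with that conclusion."""
    by_conclusion: Dict[str, List[Tuple[float, ...]]] = {}
    for antecedent, conclusion in rules:
        by_conclusion.setdefault(conclusion, []).append(antecedent)
    selected: Dict[str, int] = {}
    for conclusion, antecedents in by_conclusion.items():
        n_attrs = len(antecedents[0])
        # ties are broken in favour of the lowest attribute index
        selected[conclusion] = max(
            range(n_attrs),
            key=lambda j: Counter(a[j] for a in antecedents).most_common(1)[0][1],
        )
    return selected
```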

2. Literal and ProRulext • Overview of the Literal Algorithm (continued): • 7. Eliminate the other attributes from each of these rules, yielding more general rules; • 8. Eliminate the redundant rules that may result from steps 6 and 7;
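Steps 7 and 8 reduce to projecting each rule onto one attribute and then removing duplicates. The following is a minimal sketch. It assumes that the attribute chosen in step 6 is given as a mapping from conclusion to attribute index, and the names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[Tuple[float, ...], str]      # (antecedent values, conclusion)
GeneralRule = Tuple[int, float, str]      # (attribute index, value, conclusion)


def generalize_and_deduplicate(rules: Sequence[Rule],
                               selected: Dict[str, int]) -> List[GeneralRule]:
    """Steps 7-8: keep only the selected attribute of each rule, then drop
    the redundant (duplicate) rules while preserving their original order."""
    seen = set()
    result: List[GeneralRule] = []
    for antecedent, conclusion in rules:
        j = selected[conclusion]
        rule = (j, antecedent[j], conclusion)
        if rule not in seen:
            seen.add(rule)
            result.append(rule)
    return result
```

A generalized rule such as `(0, 2.5, "benign")` reads as "IF attribute 0 is in the interval represented by 2.5 THEN benign".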

2. Literal and ProRulext • Overview of the Literal Algorithm (continued): • 9. For each conclusion, calculate the training-set coverage of each resulting rule, based on the number of activations of these rules; • 10. Exclude the rules with 0% coverage of the patterns used to train the network from which the rules were originally extracted.
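Steps 9 and 10 can be sketched as follows. The coverage definition here is an assumption, since the slides only say it is computed per conclusion from the number of rule activations. The sketch takes coverage as the fraction of training patterns with the rule's conclusion whose discretized attribute value matches the rule, which means the rule fires on them. Names are hypothetical.

```python
from typing import List, Sequence, Tuple

GeneralRule = Tuple[int, float, str]       # (attribute index, value, conclusion)
Pattern = Tuple[Tuple[float, ...], str]    # (discretized inputs, target class)


def coverage(rule: GeneralRule, training: Sequence[Pattern]) -> float:
    """Step 9 (assumed definition): share of training patterns with the
    rule's conclusion that activate the rule."""
    j, value, conclusion = rule
    same_class = [x for x, y in training if y == conclusion]
    if not same_class:
        return 0.0
    activations = sum(1 for x in same_class if x[j] == value)
    return activations / len(same_class)


def prune_zero_coverage(rules: Sequence[GeneralRule],
                        training: Sequence[Pattern]) -> List[Tuple[GeneralRule, float]]:
    """Step 10: keep only rules that cover at least one training pattern."""
    scored = [(r, coverage(r, training)) for r in rules]
    return [(r, c) for r, c in scored if c > 0.0]
```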

2. Literal and ProRulext • ProRulext: • The other algorithm proposed in this paper, for extracting “IF-THEN” propositional rules from MLP networks with one hidden layer trained for pattern classification and time series forecasting;

2. Literal and ProRulext • ProRulext (continued): • The rule antecedents are obtained with a decompositional method, and the consequents are determined with a pedagogical approach; • This approach is an Eclectic Technique (in the taxonomy of Andrews et al. [2]).

2. Literal and ProRulext • Overview of the ProRulext Algorithm: • 1. Discretize the network inputs and outputs into intervals of equal width; • 2. Normalize the network input and output patterns of the training set to values within [0; 1] or [-1; 1]; • 3. Present each of these input patterns to the trained network;

2. Literal and ProRulext • Overview of the ProRulext Algorithm (continued): • 4. Build the AND/OR graph of the trained network, considering only its positive weights; • 5. Determine the rule antecedents through the decompositional method; • 6. Apply a pedagogical approach to find the consequents of these rules;
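The slides do not detail how the AND/OR graph is built. The following is therefore only a hypothetical sketch of the eclectic idea behind steps 4 to 6, not ProRulext's actual procedure, and it covers a single hidden unit with sigmoid activation. The decompositional part keeps only positive incoming weights. It then searches for minimal AND-combinations of inputs that, when set to 1 (all others 0), push the unit's activation above a threshold `if_limit`, and the resulting combinations are OR-ed. The pedagogical part asks the trained network (a hypothetical `network` callable) for the consequent of each antecedent pattern. Treating `if_limit` as the “IF part” limit and 1 as the “high” input value are assumptions.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hidden_unit_antecedents(weights: Sequence[float], bias: float,
                            if_limit: float) -> List[Tuple[int, ...]]:
    """Hypothetical decompositional step for ONE hidden unit.

    Returns the OR-list of minimal AND-combinations of input indices that,
    using only positive weights and setting those inputs to 1 (others 0),
    drive the unit's sigmoid activation above `if_limit`."""
    positive = [i for i, w in enumerate(weights) if w > 0]
    found: List[Tuple[int, ...]] = []
    for size in range(1, len(positive) + 1):
        for combo in combinations(positive, size):
            # skip supersets of an already-found combination (keep AND nodes minimal)
            if any(set(f) <= set(combo) for f in found):
                continue
            if sigmoid(bias + sum(weights[i] for i in combo)) > if_limit:
                found.append(combo)
    return found


def pedagogical_consequent(combo: Tuple[int, ...], n_inputs: int,
                           network: Callable[[List[float]], float]) -> float:
    """Pedagogical step: query the trained network with the antecedent pattern."""
    x = [1.0 if i in combo else 0.0 for i in range(n_inputs)]
    return network(x)
```

With weights (2.0, 1.5, -3.0) and bias -2.5, a limit of 0.5 yields the single AND node (0, 1). A limit of 0.1 instead yields the OR of (0,) and (1,). This illustrates how the threshold trades rule generality against specificity.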

2. Literal and ProRulext • Overview of the ProRulext Algorithm (continued): • 7. De-normalize the rule antecedents and consequents obtained above back to the original values of the database; • 8. Store the new rules created in the previous step in a file; • 9. For each rule conclusion, select the input attribute with the most frequent contents;

2. Literal and ProRulext • Overview of the ProRulext Algorithm (continued): • 10. Eliminate the other attributes from each of these rules, yielding more general rules; • 11. Eliminate the redundant rules that may result from steps 9 and 10;

2. Literal and ProRulext • Overview of the ProRulext Algorithm (continued): • 12. For each conclusion, calculate the training-set coverage of each resulting rule, based on the number of activations of these rules; • 13. Exclude the rules with 0% coverage of the patterns used to train the network from which the rules were originally extracted.

2. Literal and ProRulext • It is worth emphasizing that both algorithms include rule simplification stages (the last five steps of Literal and of ProRulext). • These stages help ensure that concise and legible rules are obtained from networks trained for pattern classification and time series forecasting.

3. Experiments • The trained networks and the respective rule sets were generated with AHES (Applied Hybrid Expert System) version 1.2.1.5 [4].

3. Experiments • AHES implements feedforward MLP networks with one hidden layer and the following rule extraction techniques: BIO-RE [11], Geometrical [7], NeuroLinear [10], Literal [5] and ProRulext [4].

3. Experiments - Databases • For the pattern classification problem, a Breast Cancer database from the Proben1 repository is used [6]. • This database contains 699 cases, 458 benign and 241 malignant, each described by 10 attributes plus the Breast Cancer class.

3. Experiments - Databases • For the time series forecasting problem, a database of the São Paulo Stock Exchange Index (IBOVESPA) is used [6]. • The series forecast in this work is the series of minimum values, with a total of 584 patterns.

3. Experiments - Databases • Before the experiments, both databases were submitted to pre-processing stages [6]. • As a result, the Breast Cancer database was left with 457 cases, 219 benign and 238 malignant. • For the IBOVESPA database, the time window size was set to two, and the number of patterns became 582.
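The drop from 584 to 582 IBOVESPA patterns is consistent with a standard sliding-window construction, in which each pattern uses the previous two values as inputs and the next value as the target. The first two values therefore cannot serve as targets. The following is a minimal sketch of that construction, assuming this is how the windowing was done. The function name is illustrative.

```python
from typing import List, Sequence, Tuple


def sliding_window(series: Sequence[float],
                   window: int) -> List[Tuple[Tuple[float, ...], float]]:
    """Turn a series into (inputs, target) pairs: `window` past values predict the next one."""
    return [(tuple(series[t - window:t]), series[t]) for t in range(window, len(series))]
```

A series of length n yields n - window patterns, so 584 values with a window of two give 582 patterns.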

3. Experiments - Databases • Furthermore, the databases were normalized to values in the interval [0; 1] or [-1; 1] (depending on the activation function used) before training and before extracting rules from each trained network.

3. Experiments – The Trained Networks • The MLP networks were trained according to the Holdout methodology. • Thus, each training set contains 2/3 of all the normalized input and output patterns, and each test set contains the remaining 1/3.
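A Holdout split with a 2/3 training fraction can be sketched as follows. The slides do not say whether the patterns were shuffled, so the single seeded shuffle here is an assumption. For a time series one might instead keep the chronological order. The function name and `seed` parameter are illustrative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(patterns: Sequence[T], train_fraction: float = 2 / 3,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Holdout: shuffle once (an assumption), then use `train_fraction` of the
    patterns for training and the remainder for testing."""
    items = list(patterns)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]
```

For the 582 IBOVESPA patterns, this gives 388 training and 194 test patterns.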

3. Experiments – The Trained Networks • Fixed parameters during training of the networks for the Breast Cancer database: • Weight adjustment per epoch (batch mode); • Fixed initial weights chosen from values within the interval [-0.1; 0.1]; • Momentum term equal to 0.1, 100 epochs and a maximum desired output error of 0.01.

3. Experiments – The Trained Networks • Fixed parameters during training of the networks for the IBOVESPA database: • Weight adjustment per pattern (on-line mode); • Fixed initial weights chosen from values within the interval [-0.1; 0.1]; • No momentum term, 100 epochs and a maximum desired output error of 0.01.

3. Experiments – The Trained Networks • Variable parameters during training of the networks for the Breast Cancer and IBOVESPA databases: • Number of hidden units (1, 3 and 5); • Learning rate (0.1, 0.5 and 0.9); • With or without bias; • Type of non-linear activation function (sigmoid or hyperbolic tangent).
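If every combination of the variable parameters above were trained, there would be 3 × 3 × 2 × 2 = 36 configurations per database. The slides do not state whether the full grid was used, so the following enumeration is only illustrative. The dictionary keys are hypothetical. The normalization range per activation function ([0; 1] for sigmoid, [-1; 1] for hyperbolic tangent) is an assumption based on the data-preparation slide.

```python
from itertools import product

HIDDEN_UNITS = (1, 3, 5)
LEARNING_RATES = (0.1, 0.5, 0.9)
USE_BIAS = (True, False)
ACTIVATIONS = ("sigmoid", "tanh")


def configurations():
    """Yield every combination of the variable training parameters."""
    for h, lr, bias, act in product(HIDDEN_UNITS, LEARNING_RATES, USE_BIAS, ACTIVATIONS):
        # assumed: data range matched to the activation function's output range
        target_range = (0.0, 1.0) if act == "sigmoid" else (-1.0, 1.0)
        yield {"hidden": h, "learning_rate": lr, "bias": bias,
               "activation": act, "range": target_range}
```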

3. Experiments – The Trained Networks • Trained networks selected for the Breast Cancer database, where: CM1 Network – CM_Tan_NE9_Bias_4; CM2 Network – CM_Sig_NE9_Bias_1.

3. Experiments – The Trained Networks • Trained networks selected for the IBOVESPA database, where: IB1 Network – IBOVESPA_Sig_Bias_2; IB2 Network – IBOVESPA_Tan_4; MAE – Mean Absolute Error.
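For reference, the standard definition of the Mean Absolute Error is given below. Here N is assumed to be the number of evaluated patterns, y_i the target value, and ŷ_i the network output for pattern i. The slides do not say whether the MAE was computed on normalized or on original values.

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|
```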

3. Experiments – Extracting Rules • ProRulext algorithm: • Limits of the “IF part” for both databases: 0.1, 0.5 and 0.9; • Limits of the “THEN part” for the Breast Cancer database: 0.1, 0.5 and 0.9; • Limits of the “THEN part” for the IBOVESPA database: 0.1, 0.5 and 0.8, because no rule was obtained with 0.9.

3. Experiments – Extracting Rules • Literal and ProRulext algorithms: • Number of intervals used to discretize the numerical input and output attributes of both databases: 2 (two); • This choice aims to obtain rule sets that are as compact as possible.

3. Experiments – Extracting Rules • Examples of rules extracted by Literal from the CM2 Network (Breast Cancer)

3. Experiments – Extracting Rules • Examples of rules extracted by ProRulext from the IB1 Network (IBOVESPA)

3. Experiments – Extracting Rules • Rule sets were also obtained with the BIO-RE (Bio) [11], Geometrical (Geo) [7] and NeuroLinear (Neuro) [10] techniques. • This was done to compare the results of these techniques with those of Literal and ProRulext.

4. Results • Best results of the rule sets extracted from networks trained with the Breast Cancer database, where: Sig – Sigmoid, Tan – Hyperbolic Tangent, Irr – irrelevant (either Sig or Tan)

4. Results • Best results of the rule sets extracted from networks trained with the IBOVESPA database, where: Sig – Sigmoid, Tan – Hyperbolic Tangent, Irr – irrelevant (either Sig or Tan)

5. Discussions • The results with the Breast Cancer database indicate that the BIO-RE technique [11] obtained more concise, comprehensible and faithful rule sets than the Geometrical approach [7], whose rule antecedents are hidden units, which impairs their legibility.

5. Discussions • Literal and ProRulext performed comparably to the NeuroLinear technique, which is well recognized for extracting very faithful, compact and legible rules.

5. Discussions • However, NeuroLinear was the most computationally expensive method. • Moreover, BIO-RE and Literal were not affected by the type of activation function used in network training.

5. Discussions • The results obtained with the IBOVESPA database show that all the investigated approaches, except the Geometrical technique, yielded rule sets that are very concise, legible and faithful to the networks from which they were extracted.

5. Discussions • It is worth noting that Literal and ProRulext do not share the disadvantages of the other methods investigated. • In addition, the proposed algorithms extract very expressive rules, as illustrated earlier.

6. Conclusions • Literal and ProRulext performed similarly to NeuroLinear, obtaining rule sets that are concise, legible and faithful to the networks from which they were extracted, at a lower computational cost, and they are applicable to networks trained for both pattern classification and time series forecasting.

6. Conclusions • BIO-RE obtained optimal rule sets, but it is only applicable to binary data or when conversion to binary does not significantly affect network performance [11].

6. Conclusions • Thus, since Literal and ProRulext do not have that limitation, these new approaches are efficient alternatives for extracting rules from trained networks to justify their inferred outputs.

Acknowledgements • The authors thank CNPq and CAPES (Brazilian Government research funding agencies) for financial support of this research.

References • [1] R. Andrews and S. Geva, “Rule Extraction from Local Cluster Neural Nets”, Neurocomputing, vol. 47, 2002, pp. 1-20. • [2] R. Andrews, A. B. Tickle and J. Diederich, “A Survey and Critique of Techniques for Extracting Rules from Trained Artificial Neural Networks”, Knowledge-Based Systems, vol. 8, n. 6, 1995, pp. 373-389. • [3] B. Baesens, R. Setiono, C. Mues and J. Vanthienen, “Using Neural Network Rule Extraction and Decision Tables for Credit-Risk Evaluation”, Management Science, vol. 49, 2003, pp. 312-329.

References • [4] P. G. Campos, “Explanatory Mechanisms for ANNs as Extraction of Knowledge”, Master's Thesis, Federal University of Pernambuco, Brazil, 2005 (In Portuguese). • [5] P. G. Campos and T. B. Ludermir, “Literal – A Pedagogical Technique for Rules Extraction of ANNs”, V ENIA – Brazilian Conference of Artificial Intelligence, São Leopoldo-RS, 2005, pp. 1138-1141 (In Portuguese). • [6] P. G. Campos, E. M. J. Oliveira, T. B. Ludermir and A. F. R. Araújo, “MLP Networks for Classification and Prediction with Rule Extraction Mechanism”, Proceedings of the International Joint Conference on Neural Networks, Budapest, 2004, pp. 1387-1392.

References • [7] Y. M. Fan and C. J. Li, “Diagnostic Rule Extraction from Trained Feedforward Neural Networks”, Mechanical Systems and Signal Processing, vol. 16, n. 6, 2002, pp. 1073-1081. • [8] Y. Hayashi, R. Setiono and K. Yoshida, “A Comparison Between Two Neural Network Rule Extraction Techniques for the Diagnosis of Hepatobiliary Disorders”, Artificial Intelligence in Medicine, vol. 20, n. 3, 2000, pp. 205-216. • [9] T. B. Ludermir, A. C. P. L. F. Carvalho, A. P. Braga et al., “Hybrid Intelligent Systems”, In: S. O. Rezende (Organizer), Intelligent Systems: Foundations and Applications, Manole, Barueri, 2003, pp. 249-268 (In Portuguese).

References • [10] R. Setiono and H. Liu, “NeuroLinear: From Neural Networks to Oblique Decision Rules”, Neurocomputing, vol. 17, 1997, pp. 1-24. • [11] I. A. Taha and J. Ghosh, “Symbolic Interpretation of Artificial Neural Networks”, IEEE Transactions on Knowledge and Data Engineering, vol. 11, n. 3, 1999, pp. 448-463.
