Actionable Rules in Knowledge Discovery: Uncovering Valuable Insights

ACTION RULES. Slides by A. A. Tzacheva.

Knowledge Discovery • The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad et al., 1996). • "Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories." (Gartner Group) • "The saying that knowledge is power is not quite true… Used knowledge is power." (Edward E. Free)

Knowledge Discovery • Knowledge Discovery in Databases (KDD) is a new area of research that combines many algorithms and techniques from artificial intelligence, statistics, databases, machine learning, etc. • KDD is the process of extracting previously unknown, nonobvious, novel, and interesting information from huge amounts of data. • Past research on data mining has mostly focused on techniques for generating rules from datasets.

Knowledge Discovery [Figure: the KDD process, with stages labeled Data, Cleaning, Integration, Data Warehouse, Selection, Transformation, Prepared data, Data Mining, Patterns, Evaluation, Visualization, Knowledge, and Knowledge Base]

Knowledge Discovery [Pohle, 2003] • Many data mining systems are great at deriving useful statistics and patterns from huge amounts of data, but they are not very smart at interpreting these results, which is crucial for turning them into interesting, understandable, and actionable knowledge. • There is a lack of sophisticated tool support for incorporating human domain knowledge into the mining process, and this domain knowledge should be updated with the mining results. • Mining process (Fayyad): [Business Understanding + Domain Knowledge] → [Data Understanding] → [Data Preparation] → [Modeling/Mining] → [Evaluation] → [Deployment]

Interestingness Function • Presumptive objective: associations $E = [\mathit{Cond}_1 \rightarrow \mathit{Cond}_2]$, i.e., two conditions that occur together with some confidence. • Data mining task: for a given dataset $D$, language of facts $L$, interestingness function $I_{D,L}$, and threshold $c$, efficiently find associations $E$ such that $I_{D,L}(E) > c$. • The knowledge engineer defines $c$.
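The data mining task above can be made concrete with a brute-force sketch. It rests on assumptions that are not in the slides: the dataset is a list of attribute-value rows, the language of facts $L$ is restricted to single (attribute, value) conditions, and confidence stands in for the interestingness function $I_{D,L}$. The functions `confidence` and `find_associations` and the toy data are hypothetical. Enumerating every pair of conditions is quadratic, so this illustrates the task, not the efficient search it asks for.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Condition = Tuple[str, str]  # (attribute, value)
Score = Callable[[List[Row], Condition, Condition], float]


def confidence(data: List[Row], cond1: Condition, cond2: Condition) -> float:
    """Share of the rows satisfying cond1 that also satisfy cond2."""
    covered = [r for r in data if r[cond1[0]] == cond1[1]]
    if not covered:
        return 0.0
    return sum(r[cond2[0]] == cond2[1] for r in covered) / len(covered)


def find_associations(data: List[Row], interestingness: Score, c: float):
    """Brute force: every E = [cond1 -> cond2] with interestingness(E) > c."""
    conditions = sorted({(a, v) for r in data for a, v in r.items()})
    found = []
    for cond1, cond2 in permutations(conditions, 2):
        if cond1[0] == cond2[0]:
            continue  # both conditions on the same attribute: not an association
        score = interestingness(data, cond1, cond2)
        if score > c:
            found.append((cond1, cond2, score))
    return found


# Hypothetical toy data, for illustration only.
data = [
    {"income": "high", "buys": "yes"},
    {"income": "high", "buys": "yes"},
    {"income": "low", "buys": "no"},
    {"income": "high", "buys": "no"},
]
for cond1, cond2, score in find_associations(data, confidence, c=0.6):
    print(cond1, "->", cond2, round(score, 3))
```

Swapping `confidence` for any other function with the same signature changes the interestingness criterion without touching the search.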

Are All the “Discovered” Patterns Interesting? • Data mining algorithms discover large numbers of rules, and we want the subset of rules that are interesting, because these algorithms search for accurate rules rather than interesting ones. • An association is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or if it validates some hypothesis that a user seeks to confirm. • Two aspects of rule interestingness have been studied in the data mining literature: objective and subjective measures. • Objective measures are data-driven and domain-independent. Generally, these measures evaluate rules based on their quality and the similarity between them, rather than on the user's beliefs about the domain. * Note: "domain" here refers to the type of data, e.g., financial data (the financial domain) or medical data (the medical domain). Measures that are independent of the domain can therefore always be calculated, whether the data is financial, medical, or of any other type.

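Objective Measure Examples • Assume $\alpha \rightarrow \beta$ is an association rule and $n$ is the number of objects in the dataset. • Some objective measures are: • Support or Strength: $\mathrm{card}[\alpha \wedge \beta]$ • Confidence or Certainty Factor: $\mathrm{card}[\alpha \wedge \beta] / \mathrm{card}[\alpha]$ • Coverage Factor: $\mathrm{card}[\alpha \wedge \beta] / \mathrm{card}[\beta]$ • Leverage: $\mathrm{card}[\alpha \wedge \beta]/n - (\mathrm{card}[\alpha]/n) \cdot (\mathrm{card}[\beta]/n)$ • Lift: $n \cdot \mathrm{card}[\alpha \wedge \beta] / (\mathrm{card}[\alpha] \cdot \mathrm{card}[\beta])$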
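To make the formulas concrete, here is a small sketch that counts $\mathrm{card}[\cdot]$ over a list of objects and computes the five measures. The rule sides $\alpha$ and $\beta$ are passed as predicates; the function name `objective_measures` and the eight-object dataset are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def objective_measures(data: List[Row], alpha: Predicate, beta: Predicate) -> Dict[str, float]:
    """Objective measures of the rule alpha -> beta over n = len(data) objects."""
    n = len(data)
    if n == 0:
        raise ValueError("empty dataset")
    card_a = sum(1 for r in data if alpha(r))               # card[alpha]
    card_b = sum(1 for r in data if beta(r))                # card[beta]
    card_ab = sum(1 for r in data if alpha(r) and beta(r))  # card[alpha and beta]
    return {
        "support": card_ab,
        "confidence": card_ab / card_a if card_a else 0.0,
        "coverage": card_ab / card_b if card_b else 0.0,
        "leverage": card_ab / n - (card_a / n) * (card_b / n),
        "lift": n * card_ab / (card_a * card_b) if card_a and card_b else 0.0,
    }


# Hypothetical data: 8 objects, alpha holds in 4, beta in 4, both in 3.
data = (
    [{"x": "yes", "y": "yes"}] * 3
    + [{"x": "yes", "y": "no"}, {"x": "no", "y": "yes"}]
    + [{"x": "no", "y": "no"}] * 3
)
measures = objective_measures(data, lambda r: r["x"] == "yes", lambda r: r["y"] == "yes")
print(measures)
```

None of these numbers depend on what the attributes mean, which is exactly why they are called domain-independent. A lift above 1 and a positive leverage both indicate that $\alpha$ and $\beta$ co-occur more often than they would if they were independent.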

Problem: Subjective Interestingness • A rule is: • unexpected, if it contradicts the user's beliefs about the domain and therefore surprises the user • novel, if it contributes, to some extent, new knowledge • actionable, if the user can take an action to his/her advantage based on this rule • In the data mining literature, actionability has been quantified in terms of unexpectedness. For example: - most actionable knowledge is unexpected; - most unexpected knowledge is actionable.

Actionability - Subjective • So, if we are able to calculate the actionability of a rule (the way we calculate support and confidence), then we have a way of knowing whether the rule is unexpected and interesting, i.e., a way to measure the interestingness of the rule. • In the data mining literature, data mining is viewed as the process of turning data into information, information into action, and action into value or profit. • However, finding actionable rules is not trivial: actionability is seen as an elusive concept, because it is difficult to know the space of all rules and the actions to be attached to them.

Argument: Objective Unexpectedness • Unexpectedness (does not depend on domain knowledge): if $r = [A \rightarrow B_1]$ has high confidence and $r_1 = [A \wedge C \rightarrow B_2]$ also has high confidence, then $r_1$ is unexpected. • Unexpectedness is inherently subjective, and the user's prior beliefs form an important component of it. • [Padmanabhan & Tuzhilin] $A \rightarrow B$ is unexpected with respect to the belief $X \rightarrow Y$ on the dataset $D$ if the following conditions hold: • $B \wedge Y = \mathrm{False}$ ($B$ and $Y$ logically contradict each other) • $A$ and $X$ hold together on a large subset of $D$ • the rule $A \wedge X \rightarrow B$ holds, which means that $A \wedge X \rightarrow Y$ does not hold
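The objective criterion on this slide can be checked mechanically. The sketch below assumes, beyond what the slide states, that $B_1$ and $B_2$ are two different values of one decision attribute (so they contradict each other) and that "high confidence" means reaching a threshold `min_conf`; the function names, the threshold value, and the data are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def conf(data: List[Row], lhs: Dict[str, str], rhs: Tuple[str, str]) -> float:
    """Confidence of [lhs -> rhs]; lhs is a conjunction of attribute = value."""
    covered = [r for r in data if all(r[a] == v for a, v in lhs.items())]
    if not covered:
        return 0.0
    attr, val = rhs
    return sum(r[attr] == val for r in covered) / len(covered)


def objectively_unexpected(data: List[Row], A: Dict[str, str], C: Dict[str, str],
                           decision: str, b1: str, b2: str,
                           min_conf: float = 0.75) -> bool:
    """r = [A -> (decision, b1)], r1 = [A and C -> (decision, b2)]."""
    if b1 == b2:
        return False  # B1 and B2 must contradict each other
    r_conf = conf(data, A, (decision, b1))
    r1_conf = conf(data, {**A, **C}, (decision, b2))
    return r_conf >= min_conf and r1_conf >= min_conf


# Hypothetical data: most "north" customers do not buy, but young ones do.
data = ([{"region": "north", "age": "old", "buy": "no"}] * 8
        + [{"region": "north", "age": "young", "buy": "yes"}] * 2)
print(objectively_unexpected(data, {"region": "north"}, {"age": "young"}, "buy", "no", "yes"))
```

Here $r$ holds with confidence 0.8 while the narrower rule $r_1$ holds with confidence 1.0 and predicts the opposite value, so $r_1$ is flagged without consulting any user belief.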

Actionability • The actionability measure is based on a rule's benefit to the user, that is, whether the user can act on the rule in his/her interest. • This measure is very important for rules to be interesting, because users are always looking for patterns that improve their performance and help them work better. • The practical purpose of obtaining information is to improve the business: the information must support decisions that lead to business success. Actions can be performed to make the business succeed.

Actionable Rules and Action Rules • Some methods define actionability as an approximation of unexpectedness. • To produce unexpected and/or actionable rules, the system must know what the user expects, i.e., his/her existing knowledge or concepts about the domain. • Machine learning also typically assumes that the domain knowledge is correct, or at least partially correct. • Actionable rule mining deals with the benefit-driven actions required for decision making. • Rules are unexpected if they "surprise" the user, and rules are actionable if the user can do something with them to his/her advantage. • For example, a user may be able to change undesirable/unprofitable patterns into desirable/profitable ones. • Although both unexpectedness and actionability are important, actionability is the key concept in most applications, because actionable rules allow users to do their jobs better by taking specific actions in response to the discovered knowledge.

Action Rules • Next, we will focus on a special type of actionable rules, called action rules, and study a well-defined algorithm for discovering them. • These rules can be constructed from classification rules to suggest a way to re-classify objects (for instance, customers or patients) to a desired state. • In e-commerce applications, this re-classification may mean that a consumer who is not interested in a certain product may now buy it, and may therefore fall into a group of more profitable customers. In the medical domain, this re-classification may mean changing the class of a tumor from malignant to benign.

Action Rules • These groups are described by values of classification attributes in a decision table schema. By a decision table we mean any information system in which the set of attributes is partitioned into conditions and decisions. • To discover action rules, the set of conditions must be partitioned into stable conditions and flexible conditions/attributes. For simplicity, we also assume that there is only one decision attribute. • For example, date of birth is a stable attribute, and the interest rate on a customer account is a flexible attribute (it depends on the bank). • The assumption that the decision attribute d is flexible is essential.

Action Rules * Note: the objects are the rows of the database (decision table), and the attributes are its columns. • Decision table: any information system $S$ of the form $S = (A_{Fl} \cup A_{St} \cup \{d\})$, where • $d$ is a distinguished attribute called the decision • the elements of $A_{St}$ are called stable conditions • the elements of $A_{Fl} \cup \{d\}$ are called flexible conditions • Example of an action rule: $[(b_1, v_1 \rightarrow w_1) \wedge (b_2, v_2 \rightarrow w_2) \wedge \dots \wedge (b_p, v_p \rightarrow w_p)](x) \Rightarrow [(d, k_1 \rightarrow k_2)](x)$. This means that if we change the value of attribute $b_1$ from $v_1$ to $w_1$, the value of attribute $b_2$ from $v_2$ to $w_2$, and so on, up to the value of attribute $b_p$ from $v_p$ to $w_p$, then the value of the decision attribute $d$ will change from $k_1$ to the desired value $k_2$. • Assumption: $(\forall i)\,[(1 \le i \le p) \rightarrow (b_i \in A_{Fl})]$; in other words, the attributes $b_1, b_2, \dots, b_p$ are all flexible.
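One way to hold an action rule in code is sketched below: a list of $(b_i, v_i, w_i)$ terms plus the decision change $(d, k_1, k_2)$. The class `ActionRule` and its methods are hypothetical names. `check_flexible` encodes the slide's assumption that every $b_i$ (and, per the previous slide, $d$) is flexible, and `apply` returns the object with $d$ set to the value the rule predicts; in practice the rule holds only with some confidence. The example rule mirrors the $(b, P \rightarrow S)$ rule on the next slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

Row = Dict[str, str]


@dataclass(frozen=True)
class ActionRule:
    """[(b1, v1 -> w1) and ... and (bp, vp -> wp)](x) => [(d, k1 -> k2)](x)."""
    terms: Tuple[Tuple[str, str, str], ...]  # (b_i, v_i, w_i)
    decision: Tuple[str, str, str]           # (d, k1, k2)

    def check_flexible(self, flexible: Set[str]) -> bool:
        """Every b_i, and the decision attribute d, must be flexible."""
        return all(b in flexible for b, _, _ in self.terms) and self.decision[0] in flexible

    def qualifies(self, x: Row) -> bool:
        """x qualifies if it currently has every v_i and the decision value k1."""
        d, k1, _ = self.decision
        return x[d] == k1 and all(x[b] == v for b, v, _ in self.terms)

    def apply(self, x: Row) -> Optional[Row]:
        """Copy of x after the actions, with d set to the value k2 the rule predicts."""
        if not self.qualifies(x):
            return None
        y = dict(x)
        for b, _, w in self.terms:
            y[b] = w
        d, _, k2 = self.decision
        y[d] = k2
        return y


rule = ActionRule(terms=(("b", "P", "S"),), decision=("d", "L", "H"))
print(rule.check_flexible({"b", "d"}))
print(rule.apply({"a": "2", "b": "P", "c": "2", "d": "L"}))
```

Objects that do not currently have the $v_i$ values (or the value $k_1$) are left alone, which matches the idea that only qualifying objects are re-classified.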

Action Rules • Decision table:
X   a  b  c  d
x1  0  S  0  L
x2  0  R  1  L
x3  0  S  1  L
x4  0  R  1  L
x5  2  P  2  L
x6  2  P  2  L
x7  2  S  2  H
{a, c} are stable attributes, {b, d} are flexible attributes, and d is the decision attribute (its values are L, low-profitability customer, and H, high-profitability customer). • Rules discovered: $r_1 = [(b, P) \rightarrow (d, L)]$ and $r_2 = [(a, 2) \wedge (b, S) \rightarrow (d, H)]$ • $(r_1, r_2)$-action rule: $[(b, P \rightarrow S)](x) \Rightarrow [(d, L \rightarrow H)](x)$
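The example can be checked directly against the decision table. This sketch encodes the table from the slide, confirms that $r_1$ and $r_2$ each hold with confidence 1, and lists the objects the action rule targets. The helper names are hypothetical, and the extra check that targets already have the stable value $a = 2$ required by $r_2$ is our reading of why the action rule is safe to apply to them (stable attributes cannot be changed); it is not stated on the slide.

```python
# Decision table from the slide: a, c stable; b, d flexible; d is the decision.
TABLE = {
    "x1": {"a": 0, "b": "S", "c": 0, "d": "L"},
    "x2": {"a": 0, "b": "R", "c": 1, "d": "L"},
    "x3": {"a": 0, "b": "S", "c": 1, "d": "L"},
    "x4": {"a": 0, "b": "R", "c": 1, "d": "L"},
    "x5": {"a": 2, "b": "P", "c": 2, "d": "L"},
    "x6": {"a": 2, "b": "P", "c": 2, "d": "L"},
    "x7": {"a": 2, "b": "S", "c": 2, "d": "H"},
}


def matches(row, cond):
    """True if the row has every attribute value listed in cond."""
    return all(row[k] == v for k, v in cond.items())


def rule_stats(lhs, rhs):
    """Return (support, confidence) of the classification rule lhs -> rhs."""
    covered = [x for x, row in TABLE.items() if matches(row, lhs)]
    support = sum(matches(TABLE[x], rhs) for x in covered)
    return support, support / len(covered)


print("r1:", rule_stats({"b": "P"}, {"d": "L"}))
print("r2:", rule_stats({"a": 2, "b": "S"}, {"d": "H"}))

# Objects targeted by [(b, P -> S)] => [(d, L -> H)]: they match r1 now and
# already have the stable value a = 2 that r2 requires.
targets = [x for x, row in TABLE.items()
           if matches(row, {"b": "P", "d": "L"}) and row["a"] == 2]
print("targets:", targets)
```

Changing $b$ from P to S on x5 and x6 would make them look like x7, which $r_2$ classifies as a high-profitability customer.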

Practical Examples • Next, we look at some action rules extracted from three different databases, two in the medical domain and one in the financial domain: • Binding to Thrombin database • Insurance Company Benchmark database • Breast cancer database • However, we first need to introduce one more notion: the cost of an action rule.

Cost of Action Rule • Usually, there is a cost (monetary or moral) associated with undertaking an action. For example: • decreasing the interest rate on a customer account (re-classifying the customer from one interest-rate group to another) may cost us mailing them a letter and doing some internal administration on the account, say $5. • relocating an employee from one city to another (re-classifying the employee from one division to another) may cost us the moving expense, plus a moral cost: the employee's negative feelings about the move may affect his/her future performance or perception of our organization. • We denote the cost by ρ, a number from 0 to +∞. The cost is close to 0 if the action is trivial (very easy to accomplish) and close to +∞ if the action is very difficult (almost impossible) to accomplish.

Cost of Action Rule • Assumption: let $S = (X, A, V)$ be an information system, where $X$ is the set of objects, $A$ the set of attributes, and $V$ the set of attribute values. Assume that attribute $b \in A$ is flexible and that $b_1, b_2 \in V_b$ ($b_1$ and $b_2$ are values of $b$). By $\rho_S(x, b_1, b_2)$ we mean a number from $(0, +\infty]$ that describes the average predicted cost of an approved action associated with a possible re-classification of qualifying objects $x$ from class $b_1$ to class $b_2$. • An object $x$ qualifies for re-classification from $b_1$ to $b_2$ if $b(x) = b_1$, i.e., its current value of $b$ is $b_1$.

Cost of Action Rule • Action rule $r$: $[(b_1, v_1 \rightarrow w_1) \wedge (b_2, v_2 \rightarrow w_2) \wedge \dots \wedge (b_p, v_p \rightarrow w_p)](x) \Rightarrow [(d, k_1 \rightarrow k_2)](x)$ • The cost of the left-hand side of rule $r$ (costLeft) is the sum of the costs of the terms listed in its left-hand side: $\mathrm{costLeft}(r) = \sum \{\rho_S(v_i, w_i) : 1 \le i \le p\}$ • Action rule $r$ is feasible in $S$ if $\mathrm{costLeft}(r) < \rho_S(k_1, k_2)$. • That is, for any feasible action rule $r$, the cost of the conditional part (left-hand side) of $r$ is lower than the cost of its decision part (right-hand side, where the decision attribute is listed).
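A minimal sketch of costLeft and the feasibility test, assuming the costs $\rho_S$ are already known and stored in a dictionary keyed by (attribute, from value, to value). The function names, attribute names, and cost values are hypothetical, loosely echoing the interest-rate and relocation examples; `math.inf` stands for an action that is practically impossible.

```python
import math
from typing import Dict, List, Tuple

Change = Tuple[str, str, str]  # (attribute, from_value, to_value)


def cost_left(terms: List[Change], rho: Dict[Change, float]) -> float:
    """Sum of rho_S over the left-hand-side terms of an action rule."""
    return sum(rho[t] for t in terms)


def is_feasible(terms: List[Change], decision: Change, rho: Dict[Change, float]) -> bool:
    """Feasible if costLeft < rho_S(k1, k2) for the decision change."""
    return cost_left(terms, rho) < rho[decision]


# Hypothetical costs in (0, +inf].
rho = {
    ("interest_rate", "standard", "reduced"): 0.05,
    ("contact", "none", "letter"): 0.01,
    ("branch", "A", "B"): math.inf,
    ("profitability", "L", "H"): 0.3,
}
decision = ("profitability", "L", "H")
cheap = [("interest_rate", "standard", "reduced"), ("contact", "none", "letter")]
print(is_feasible(cheap, decision, rho))
print(is_feasible(cheap + [("branch", "A", "B")], decision, rho))
```

A single term with infinite cost makes the whole left-hand side infinitely expensive, so any rule that requires an impossible action is never feasible.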

Binding to Thrombin Database • The first database, the Binding to Thrombin database, is used for drug design and was provided in the KDD Cup 2001 competition. • Drugs are typically small organic molecules that achieve their desired activity by binding to a target site on a receptor. The first step in discovering a new drug is usually to identify and isolate the receptor to which it should bind, followed by testing many small molecules for their ability to bind to the target site. • This leaves researchers with the task of determining what separates the active (binding) compounds from the inactive (non-binding) ones. Such a determination can then be used to design new compounds that not only bind, but also have all the other properties required for a drug (solubility, oral absorption, lack of side effects, appropriate duration of action, toxicity, etc.).

Binding to Thrombin Database • The data set consists of 1909 compounds (the objects/rows in the database) tested for their ability to bind to a target site on thrombin, a key receptor in blood clotting. • Each compound is described by binary features (the attributes/columns in the database), which describe three-dimensional properties of the molecule. Biological activity in general, and receptor binding affinity in particular, correlate with various structural and physical properties of small organic molecules. The task in KDD Cup 2001 was to determine which of these properties are critical in this case and to learn to accurately predict the class value: Active or Inactive. • In this test we use the class attribute, which has the value A for active and I for inactive, as the re-classification attribute for the action rules. In this way, we suggest to the user which molecular properties can be changed to reclassify a chemical compound from the inactive to the active class, so that it binds to thrombin.

Binding to Thrombin Database • The following results were found with LowestCostReclassifier (software for extracting action rules of lowest cost):
decisionAttribute = activity
valueFrom = 0
valueTo = 1
dataFile = thrombin
numberOfObjects: 1908
minConfidenceL1 = 0.65
minFeasibilityL3 = 0.0001
knownCost = 0.3
maxCostL2 = 0.01
---- Goal Node:
(f304, 1->0 | 0.12327) => (activity, 0->1 | 0.3) 1
(f172, 0->1 | 0.00472118) => (f304, 1->0 | 0.12327) 0.998424
---- Action Rule of Min Cost Found: ----
(f172, 0->1 | 0.00472118) => (activity, 0->1 | 0.3) 0.998424
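The goal node and the final rule above are related by chaining. The goal rule needs f304 to change 1->0 (cost 0.12327, above maxCostL2 = 0.01), and a second rule says that changing f172 0->1 (cost 0.00472118) achieves that change. The sketch below shows one way to compose the two rules. It is a reconstruction from the printed output, not LowestCostReclassifier's actual code: `Term`, `Rule`, `compose`, and `cost_left` are hypothetical names, and taking the minimum of the two confidences is an assumption that happens to match the confidences printed in all three outputs in these slides. It does not explain every output (in the breast cancer run the Normal Nucleoli term is dropped as well).

```python
from typing import NamedTuple, Tuple


class Term(NamedTuple):
    attr: str
    frm: str    # value before the action
    to: str     # value after the action
    cost: float


class Rule(NamedTuple):
    lhs: Tuple[Term, ...]  # conjunction of actions
    rhs: Term              # the change they are expected to cause
    conf: float


def cost_left(rule: Rule) -> float:
    return sum(t.cost for t in rule.lhs)


def compose(lower: Rule, upper: Rule) -> Rule:
    """Replace the term of upper.lhs that lower.rhs produces by lower.lhs."""
    key = lower.rhs[:3]  # compare (attr, frm, to); printed costs may differ
    if all(t[:3] != key for t in upper.lhs):
        raise ValueError("lower rule does not produce a term of the upper rule")
    kept = tuple(t for t in upper.lhs if t[:3] != key)
    # Assumption: composed confidence = min of the two confidences.
    return Rule(kept + lower.lhs, upper.rhs, min(lower.conf, upper.conf))


# The two rules printed under "Goal Node" in the thrombin output.
goal = Rule((Term("f304", "1", "0", 0.12327),), Term("activity", "0", "1", 0.3), 1.0)
step = Rule((Term("f172", "0", "1", 0.00472118),), Term("f304", "1", "0", 0.12327), 0.998424)

best = compose(step, goal)
print(best.lhs, best.conf, cost_left(best))
```

The composed rule keeps the goal's decision change but replaces an action that is too expensive with a cheaper one that is expected to bring it about.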

Insurance Company Benchmark database • The next database is in the financial domain: the Insurance Company Benchmark (COIL 2000) database, used in the CoIL 2000 Challenge. • The data contains 5,822 tuples (the customers/rows in the database). The features (the attributes/columns in the database) include product usage data and socio-demographic data derived from zip codes. • The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real-world business problem. • In our test, the user would like to reclassify the attribute Contribution car policies from the value 5 to 6.

Insurance Company Benchmark database • The following results were found with LowestCostReclassifier:
decisionAttribute = Contribution car policies
valueFrom = 5
valueTo = 6
dataFile = Insurance
numberOfObjects: 5822
minConfidenceL1 = 0.72
minFeasibilityL3 = 0.0001
knownCost = 0.4
maxCostL2 = 0.02
---- Goal Node:
(Private health insurance, 3->4 | 0.02423844) => (Contribution car policies, 5->6 | 0.4) 0.833333
(High level education, 2->3 | 0.00176027) ^ (Social class B1, 2->4 | 0.0146667) => (Private health insurance, 3->4 | 0.3) 0.714286
---- Action Rule of Min Cost Found: ----
(High level education, 2->3 | 0.00176027) ^ (Social class B1, 2->4 | 0.0146667) => (Contribution car policies, 5->6 | 0.4) 0.714286

Breast Cancer Database • Another database used is a breast cancer database, obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg. • It contains a class attribute that classifies the tumor as benign or malignant. The remaining attributes (columns in the database) describe common factors that radiologists and pathologists examine to make a diagnosis, such as Clump Thickness, Uniformity of Cell Shape, Bare Nuclei, etc. • The database has 699 instances (the objects/rows in the database): Benign: 458 (65.5%) and Malignant: 241 (34.5%). • The class attribute is used as the re-classification attribute with LowestCostReclassifier (in the output below, class value 4 denotes malignant and 2 denotes benign). This way, we provide suggestions/actions to be undertaken in order to change the class from malignant to benign.

Breast Cancer Database • The following results were found with LowestCostReclassifier:
decisionAttribute = Class
valueFrom = 4
valueTo = 2
dataFile = brcancer
numberOfObjects: 699
minConfidenceL1 = 0.7
minFeasibilityL3 = 0.0001
knownCost = 0.3
maxCostL2 = 0.00059
. . .
!!!!!!! List of sifted action rules is empty, i.e. no feasible action rules were found, so this state/child has no descendents: (Marginal Adhesion, 1->3 | 0.00128449)
@@@@@@@@ Traversing next node
The list Q of nodes traversed was empty.

Breast Cancer Database
---- Best Node ----
(Uniformity of Cell Size, 8->1 | 0.00179706) ^ (Normal Nucleoli, 5->3 | 0.00721622) => (Class, 4->2 | 0.3) 1
(Marginal Adhesion, 1->3 | 0.00128449) => (Uniformity of Cell Size, 8->1 | 0.00179706) 0.85
---- Action Rule of Min Cost Found: ----
(Marginal Adhesion, 1->3 | 0.00128449) => (Class, 4->2 | 0.3) 0.85
• In this case, the goal could not be reached, possibly because the specified desired maximum cost, 0.00059, was too low. Nevertheless, the best node found so far was returned; its cost, 0.00128449, is still lower than the cost currently known to the user, 0.3.
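The three runs can be compared with a few lines of arithmetic. The sketch sums the left-hand-side term costs of each min-cost rule printed above and compares the result with maxCostL2 and knownCost. The values are copied from the outputs; reading "goal reached" as costLeft ≤ maxCostL2 is an assumption based on the explanation on this slide, not documented behavior of LowestCostReclassifier, and `assess` is a hypothetical name.

```python
from typing import List, Tuple


def assess(term_costs: List[float], max_cost: float, known_cost: float) -> Tuple[float, bool, bool]:
    """Return (costLeft, within maxCostL2, cheaper than knownCost)."""
    cost = sum(term_costs)
    return cost, cost <= max_cost, cost < known_cost


# (left-hand-side term costs of the min-cost rule, maxCostL2, knownCost)
runs = {
    "thrombin": ([0.00472118], 0.01, 0.3),
    "insurance": ([0.00176027, 0.0146667], 0.02, 0.4),
    "breast cancer": ([0.00128449], 0.00059, 0.3),
}
for name, (costs, max_cost, known_cost) in runs.items():
    cost, within, cheaper = assess(costs, max_cost, known_cost)
    print(f"{name}: costLeft={cost:.8f} within maxCostL2={within} below knownCost={cheaper}")
```

Under this reading, only the breast cancer rule exceeds its maximum cost, yet all three rules are still cheaper than the cost the user currently knows, which is why returning the best node is still useful.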
