
ACTION RULES /Lecture II/


Presentation Transcript


  1. ACTION RULES /Lecture II/, presented by Zbigniew Ras, UNC-Charlotte, Computer Science

  2. Action Rules [Z. Ras & A. Wieczorkowska]

Decision table: any information system of the form S = (U, AFl ∪ ASt ∪ {d}), where
• d ∉ AFl ∪ ASt is a distinguished attribute called the decision,
• the elements of ASt are called stable conditions,
• the elements of AFl ∪ {d} are called flexible conditions.

Example of an action rule:
[(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ … ∧ (bp, vp → wp)](x) ⇒ [(d, k1 → k2)](x)

Assumption: (∀i)[(1 ≤ i ≤ p) ⇒ (bi ∈ AFl)]

  3. Action Rules

Decision Table:

  X    a   b   c   d
  x1   0   S   0   L
  x2   0   R   1   L
  x3   0   S   1   L
  x4   0   R   1   L
  x5   2   P   2   L
  x6   2   P   2   L
  x7   2   S   2   H

{a, c} - stable attributes, {b, d} - flexible attributes, d - decision attribute.

Rules discovered:
r1 = [(b, P) ⇒ (d, L)]
r2 = [(a, 2) ∧ (b, S) ⇒ (d, H)]

(r1, r2)-action rule: [(b, P → S)](x) ⇒ [(d, L → H)](x)

Notation: the condition attributes of r2 are {a, b}; the decision attribute of r2 is d.
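A minimal sketch of how this decision table and the rule pair above could be encoded in Python; the dictionary layout and the names TABLE, r1, r2, action_rule are illustrative choices, not part of the lecture.

```python
# Hypothetical encoding of the decision table above; names are illustrative.
# Attributes a, c are stable, b, d are flexible, and d is the decision attribute.
TABLE = {
    "x1": {"a": 0, "b": "S", "c": 0, "d": "L"},
    "x2": {"a": 0, "b": "R", "c": 1, "d": "L"},
    "x3": {"a": 0, "b": "S", "c": 1, "d": "L"},
    "x4": {"a": 0, "b": "R", "c": 1, "d": "L"},
    "x5": {"a": 2, "b": "P", "c": 2, "d": "L"},
    "x6": {"a": 2, "b": "P", "c": 2, "d": "L"},
    "x7": {"a": 2, "b": "S", "c": 2, "d": "H"},
}

# Classification rules discovered from the table:
#   r1: (b = P) => (d = L)        r2: (a = 2) and (b = S) => (d = H)
r1 = {"conditions": {"b": "P"}, "decision": ("d", "L")}
r2 = {"conditions": {"a": 2, "b": "S"}, "decision": ("d", "H")}

# The (r1, r2)-action rule pairs them on the shared flexible attribute b:
#   [(b, P -> S)](x)  =>  [(d, L -> H)](x)
action_rule = {"changes": {"b": ("P", "S")}, "decision_change": ("d", ("L", "H"))}
```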

  4. E-Action Rules [L.-S. Tsay & Z. Ras]

   A     B      C     D      E     F      G
  (St) (Flex)  (St) (Flex)  (St) (Flex) (Decision)
   a1    b1     c1    d1     -     -      g1
   a1    b2     -     -      e2    f2     g2

E-Action rule: (B, b1 → b2) ∧ (E = e2) ∧ (F, → f2) ⇒ (G, g1 → g2)

What about support & confidence of action rules?

  5. [Object-Based] Support of Action Rules

Action rule r: [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ … ∧ (bp, vp → wp)](x) ⇒ [(d, k1 → k2)](x)

Object x certainly supports rule r in S = (X, A) if:
1) (∀i ≤ p)[bi(x) = vi] and d(x) = k1
2) (∃y ∈ X)(∀i ≤ p)[bi(y) = wi] and d(y) = k2
3) (∀b ∈ A - [{bi : 1 ≤ i ≤ p} ∪ {d}])[b(x) = b(y)]

CSupS(r) = card{x : x certainly supports r in S}

  6. [Object-Based] Support of Action Rules

Action rule r: [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ … ∧ (bp, vp → wp)](x) ⇒ [(d, k1 → k2)](x)

Object x possibly supports rule r in S = (X, A) if:
1) (∀i ≤ p)[bi(x) = vi] and d(x) = k1
2) (∃y ∈ X)(∀i ≤ p)[bi(y) = wi] and d(y) = k2
3) (∀c ∈ ASt)[c(x) = c(y)]

PSupS(r) = card{x : x possibly supports r in S}
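The two object-based support counts follow directly from these definitions. Below is a minimal sketch under the data encoding assumed in the earlier snippet (TABLE, action_rule); the function names are illustrative.

```python
def matches(obj, assignment):
    """True if the object has every attribute value listed in the assignment."""
    return all(obj[attr] == val for attr, val in assignment.items())

def object_support(table, stable, rule, certain=True):
    """Count objects that certainly (certain=True) or possibly (certain=False)
    support an action rule, following conditions 1)-3) of slides 5 and 6."""
    changes = rule["changes"]
    d, (k1, k2) = rule["decision_change"]
    left = {b: v for b, (v, w) in changes.items()}    # pre-action values v_i
    right = {b: w for b, (v, w) in changes.items()}   # post-action values w_i
    supported = set()
    for x, ox in table.items():
        if not (matches(ox, left) and ox[d] == k1):          # condition 1
            continue
        for oy in table.values():
            if not (matches(oy, right) and oy[d] == k2):     # condition 2
                continue
            if certain:
                # condition 3 (certain): agree on every attribute outside {b_i} and {d}
                rest = [a for a in ox if a not in changes and a != d]
            else:
                # condition 3 (possible): agree only on the stable attributes
                rest = list(stable)
            if all(ox[a] == oy[a] for a in rest):
                supported.add(x)
                break
    return len(supported)

# For the running example (x5, x6 can be moved toward x7's class):
# object_support(TABLE, {"a", "c"}, action_rule, certain=True)   -> CSup = 2
# object_support(TABLE, {"a", "c"}, action_rule, certain=False)  -> PSup = 2
```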

  7. [Rule-Based] Support of Action Rules

Action rule r: [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ … ∧ (bp, vp → wp)](x) ⇒ [(d, k1 → k2)](x)

Object x ∈ X supports rule r in S = (X, A) if there are two classification rules r1, r2 extracted from S and there exists an object y ∈ X satisfying the following conditions:
• (∀i ≤ p)[[bi occurs in r1] ⇒ [bi(x) = vi]], the decision attribute of r1 is d, and d(x) = k1
• (∀i ≤ p)[[bi occurs in r2] ⇒ [bi(y) = wi]], the decision attribute of r2 is d, and d(y) = k2
• (∀b ∈ ASt)[b(x) = b(y)]

RSupS(r) = card{x : x supports r in S}
Confidence: ConfS(r) = RSupS(r) / SupS(r1)
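A sketch of the rule-based count and the confidence ratio, continuing the same illustrative encoding (it reuses matches, TABLE, r1, r2 from the earlier snippets); how SupS(r1) is counted here is an assumption.

```python
def rule_based_support(table, stable, rule1, rule2, decision_change):
    """Count objects x for which some y exists such that x matches the left-hand
    side of rule1 with d(x) = k1, y matches the left-hand side of rule2 with
    d(y) = k2, and x, y agree on every stable attribute."""
    d, (k1, k2) = decision_change
    supported = set()
    for x, ox in table.items():
        if not (matches(ox, rule1["conditions"]) and ox[d] == k1):
            continue
        for oy in table.values():
            if (matches(oy, rule2["conditions"]) and oy[d] == k2
                    and all(ox[a] == oy[a] for a in stable)):
                supported.add(x)
                break
    return len(supported)

def action_rule_confidence(table, stable, rule1, rule2, decision_change):
    """Conf(r) = RSup(r) / Sup(r1); Sup(r1) is taken here as the number of
    objects matching the left-hand side of r1 (an assumption)."""
    rsup = rule_based_support(table, stable, rule1, rule2, decision_change)
    sup_r1 = sum(matches(o, rule1["conditions"]) for o in table.values())
    return rsup / sup_r1 if sup_r1 else 0.0

# action_rule_confidence(TABLE, {"a", "c"}, r1, r2, ("d", ("L", "H")))
```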

  8. Cost of Action Rule [Ras & Tzacheva]

Assumption: S = (X, A, V) is an information system, Y ⊆ X, attribute b ∈ A is flexible in S, and b1, b2 ∈ Vb.

By ρS(Y, b1, b2) we mean a number from (0, +∞] which describes the average predicted cost of the approved action associated with a possible re-classification of qualifying objects in Y from class b1 to class b2. Object x ∈ Y qualifies for re-classification from b1 to b2 if b(x) = b1.

ρS(Y, b1, b2) = +∞ if there is no approved action which is required for a possible re-classification of qualifying objects in Y from class b1 to class b2.

If Y is uniquely defined, we often write ρS(b1, b2) instead of ρS(Y, b1, b2).

  9. Cost of Action Rule

Action rule r: [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ … ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x)

The cost of r in S: costS(r) = Σ{ρS(vi, wi) : 1 ≤ i ≤ p}

Action rule r is feasible in S if costS(r) < ρS(k1, k2).
For any feasible action rule r, the cost of the conditional part of r is lower than the cost of its decision part.
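A small sketch of the cost and feasibility checks; the cost table rho and its key layout (attribute, from, to) are assumptions made for illustration.

```python
def rule_cost(rho, rule):
    """cost(r) = sum of rho(v_i, w_i) over all terms (b_i, v_i -> w_i) of r.
    rho is assumed to map (attribute, from_value, to_value) to a positive cost."""
    return sum(rho[(b, v, w)] for b, (v, w) in rule["changes"].items())

def is_feasible(rho, rule):
    """r is feasible when its condition part is cheaper to enact than the
    reclassification named in its decision part."""
    d, (k1, k2) = rule["decision_change"]
    return rule_cost(rho, rule) < rho[(d, k1, k2)]

# Illustrative, made-up costs for the running example:
# rho = {("b", "P", "S"): 1.0, ("d", "L", "H"): 5.0}
# is_feasible(rho, action_rule)  -> True, since 1.0 < 5.0
```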

  10. Extension: Cost of Action Rule

RS[(d, k1 → k2)] denotes the set of feasible action rules in S having the term (d, k1 → k2) as their decision part.

Assumption: among the action rules in RS[(d, k1 → k2)] the user identifies a rule r of minimal cost value. But that cost value may still be too high to get his approval for the implementation of r. The cost of r might be high because of the high cost of one of its sub-terms (bj, vj → wj). In such a case, we may look for an action rule in RS[(bj, vj → wj)] of minimal cost value needed to re-classify qualifying objects from vj to wj.

Note on rules with a short left-hand side: it was observed that such rules were not interesting (active mining).

  11. Cost of Action Rule

Example:
r = [(b1, v1 → w1) ∧ … ∧ (bj, vj → wj) ∧ … ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x)

In RS[(bj, vj → wj)] we find
r1 = [(bj1, vj1 → wj1) ∧ (bj2, vj2 → wj2) ∧ … ∧ (bjq, vjq → wjq)](x) ⇒ (bj, vj → wj)(x)

Then we can compose r with r1 and in this way replace the term (bj, vj → wj) by the term from the left-hand side of r1:

[(b1, v1 → w1) ∧ … ∧ [(bj1, vj1 → wj1) ∧ (bj2, vj2 → wj2) ∧ … ∧ (bjq, vjq → wjq)] ∧ … ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x)
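The composition step can be phrased as a simple term replacement. A sketch under the same illustrative encoding; it assumes the sub-rule's decision part is exactly the term being replaced.

```python
def compose(rule, sub_rule):
    """Replace the term (b_j, v_j -> w_j) of `rule` by the condition part of
    `sub_rule`, whose decision part must be exactly (b_j, v_j -> w_j)."""
    b_j, change = sub_rule["decision_change"]
    assert rule["changes"].get(b_j) == change, "sub_rule must target the replaced term"
    new_changes = {b: c for b, c in rule["changes"].items() if b != b_j}
    new_changes.update(sub_rule["changes"])   # splice in (b_j1, ...), ..., (b_jq, ...)
    return {"changes": new_changes, "decision_change": rule["decision_change"]}
```

The composed rule keeps the original decision part (d, k1 → k2), which is exactly the shape of the last formula on the slide.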

  12. Search Graph [Tzacheva & Ras]

• In order to construct action rules of the lowest cost, we build the search graph GS, a directed graph that is dynamically built by applying action rules discovered from S to its nodes.
• The initial node n0 of the graph GS contains information, coming from the user and associated with the system S, about what objects he/she would like to reclassify (e.g. from the class described by value k1 of the attribute d to the class k2) and what is the current cost, ρS(k1, k2), of the reclassification k1 → k2.
• Any other node n in GS shows an alternative way to achieve the same reclassification, with a cost that is lower than the cost assigned to all nodes which precede n in GS.

  13. Search Graph

Assume that N is the set of nodes in graph GS and n0 is its initial node. For any node n ∈ N, by f(n) = (Yn, {[vn,j → wn,j, ρS(vn,j, wn,j)]} j ∈ In) we mean its domain (the set of objects in S), the set of actions needed to reclassify the objects from Yn, and their costs, where Yn ⊆ X.

We say that an action rule r, discovered from S, is applicable to node n if:
• Yn ∩ RSupS(r) ≠ Ø
• (∃k ∈ In)[r ∈ RS[vn,k → wn,k]]
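The two applicability conditions translate almost literally into a predicate. A sketch; the node encoding {"Y": ..., "actions": ...} and the precomputed RSup set are assumptions.

```python
def applicable(node, rule, rsup_objects):
    """Rule r is applicable to node n if Yn intersects RSup(r) and the decision
    part of r is one of the reclassifications listed for the node.
    node = {"Y": set_of_object_ids, "actions": {attr: (v, w), ...}}  (hypothetical)"""
    d, change = rule["decision_change"]
    overlaps = bool(node["Y"] & rsup_objects)                # Yn ∩ RSupS(r) ≠ Ø
    targets_node_action = node["actions"].get(d) == change   # r ∈ RS[vn,k → wn,k] for some k
    return overlaps and targets_node_action
```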

  14. Minimal Cost Reclassification Search Graph for S

[Figure 4: Lowest Cost Reclassification Search Graph in a standalone information system S. The rules of RS[(d, k1 → k2)] = {r1, r2, r3, …, rn} are applied to the initial node
n0 = {[k1 → k2, ρS(k1, k2)]}.
Applying r = [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ … ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x) to n0 yields the node
n1 = {[v1 → w1, ρS(v1, w1)], [v2 → w2, ρS(v2, w2)], …, [vp → wp, ρS(vp, wp)]},
which is expanded in turn by further rules (r1, r4, …, rj) into nodes n2, n3, …, nn.]

  15. Search Graph Properties

Property 1. Let f(n0) = (Y, {[k1 → k2, ρS(k1, k2)]}) and f(n) = (Yn, {[vn,k → wn,k, ρS(vn,k, wn,k)]} k ∈ In). The cost assigned to the node n for reclassifying x ∈ Yn from k1 to k2 is equal to:
Cost_{k1→k2}(n, x) = Σ{ρS(vn,k, wn,k) : k ∈ In}

Property 2. If node n2 is a successor of the node n1, then Conf_{k1→k2}(n2, x) ≤ Conf_{k1→k2}(n1, x).

Property 3. If node n2 is a successor of the node n1, then Cost_{k1→k2}(n2, x) ≤ Cost_{k1→k2}(n1, x).

  16. Search for Action Rules [Tzacheva & Ras]

• We propose an A*-type algorithm for speeding up the construction of the shortest path from the root to the goal node in graph GS.
• A* is probably one of the most popular search algorithms in AI. It is an informed, optimal search algorithm which uses a heuristic estimate of the remaining distance to the goal by means of a heuristic function h(N).
• We assume that the user provides three threshold values:
  λ1 - threshold for minimum confidence of action rules,
  λ2 - threshold for maximum cost of action rules,
  λ3 - threshold for minimum feasibility of action rules.

  17. Heuristic Method - A*

• We assume that: h(ni) = [cost(ni, Yi) - λ2] / λ3
• The heuristic value h(ni) is associated with any node ni in G. It shows the maximal number of steps that might be needed to reach the goal.
• We also assume that g(ni) is the number of edges from the root to the current node ni.
• Then, we associate an estimated path length to the goal with each node as follows: f(ni) = h(ni) + g(ni)
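In code, the two quantities are one-liners; a sketch assuming cost(n, Yn) and the depth g(n) are already available from the graph construction.

```python
def h(node_cost, lambda2, lambda3):
    """h(n) = (cost(n, Yn) - lambda2) / lambda3: an estimate of how many
    composition steps may still be needed to get under the cost threshold."""
    return (node_cost - lambda2) / lambda3

def f(node_cost, depth, lambda2, lambda3):
    """f(n) = h(n) + g(n), where g(n) is the number of edges from the root to n."""
    return h(node_cost, lambda2, lambda3) + depth
```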

  18. Proposed Algorithm - A*

1. Initialize Q with the search node [([conf(n0), h(n0)], [n0])] as the only entry; initialize the domain of n0 (given by the user) as Y0.
2. If Q is empty, fail. Else, pick the search node s from Q with the least value of f. If two search nodes in Q have the same least value of f and an ontology is available, pick the search node s with the highest value of Ont(s).
3. If state(s) is a goal and conf(s) ≥ λ1, return s (we have reached the goal).
4. Otherwise, remove s from Q.
5. Find all children of state(s) and create all the one-step extensions of s to each descendant. If state(s1) is a child of state(s) and r is the action rule applied to s in order to move from s to s1, then initialize Ystate(s1) as Ystate(s) ∩ DomS(r) and, if an ontology is available, Ont(s1) as Ont(r).
6. Add all the extended paths to Q.
7. Go to step 2.
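A compact A*-style sketch of the loop above; the ontology tie-breaking and the Ystate bookkeeping are omitted, and children, cost, conf, goal are assumed to be callables supplied by the caller.

```python
import heapq
import itertools

def lowest_cost_reclassification(n0, children, cost, conf, goal,
                                 lambda1, lambda2, lambda3):
    """A*-style search over the reclassification graph (simplified sketch)."""
    tie = itertools.count()                       # breaks ties between equal f values
    queue = [((cost(n0) - lambda2) / lambda3, next(tie), 0, n0)]   # f(n0) = h(n0), g = 0
    best = n0                                     # cheapest node seen so far
    while queue:
        _, _, depth, node = heapq.heappop(queue)  # step 2: pick a node with least f
        if goal(node) and conf(node) >= lambda1:  # step 3: goal with enough confidence
            return node
        if cost(node) < cost(best):
            best = node
        for child in children(node):              # step 5: one-step extensions
            g = depth + 1
            h = (cost(child) - lambda2) / lambda3
            heapq.heappush(queue, (h + g, next(tie), g, child))    # step 6
    return best   # queue exhausted: return the cheapest node found so far
```

Returning best when the queue empties matches the observation on the conclusions slide that, even when the thresholds cannot be met, the cheapest node found so far is still reported.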

  19. Implementation and Testing

• The heuristic strategy for lowest-cost reclassification, the LowestCostReclassifier software, is implemented in C++ using the Microsoft Visual Studio 7.0 IDE and compiler.
• The user is asked to enter the attribute he/she is interested in reclassifying, together with its current and desired values. The user also chooses the following three thresholds:
  λ1 - minimum confidence of action rules,
  λ2 - maximum cost of action rules,
  λ3 - minimum feasibility of action rules,
as well as the cost of the reclassification currently known to the user.
• The action rules have the following form:
  (attribute, valueFrom -> valueTo | cost) => (attribute, valueFrom -> valueTo | cost) confidence
• The LowestCostReclassifier software was tested and applied to three different databases: two in the medical domain and one in the financial domain.

  20. Conclusions

• We extract action rules as per the original algorithm presented in [62]. Next, we proposed a heuristic approach, based on an A*-type algorithm, for building a search graph G which identifies an action rule of the lowest cost, subject to three thresholds the user provides: minimum confidence, maximum cost, and minimum feasibility.
• Further, we observed that even if the maximum cost threshold is not reachable, we still return the best node found thus far, whose cost would still be lower than the cost currently known to the user.
• In that sense, the leaves in our graph G and the nodes close to them represent the most actionable knowledge and, at the same time, the most unexpected/interesting knowledge related to a desired reclassification of objects.

  21. Final Claims

Subjective measures are user-driven and domain-dependent. They include unexpectedness [Silberschatz and Tuzhilin, 1995], novelty, and actionability [Piatetsky-Shapiro & Matheus, 1994].

Claim 1 [Suzuki; Padmanabhan & Tuzhilin]: Unexpectedness is partially an objective concept. A ⇒ B is unexpected with respect to the belief X ⇒ Y on the dataset D if the following conditions hold:
• B ∧ Y = False [B and Y logically contradict each other]
• A ∧ X holds on a large subset of D
• A, X ⇒ B holds, which (together with the first condition) means A, X ⇒ ¬Y

Our Claim: Actionability is partially an objective concept.
Actionability measure = cost of an action rule.

  22. Final Claims

• Our Claim: the cheapest rules are the most actionable.
• Claim 2 [Silberschatz & Tuzhilin]: the most actionable rules are unexpected.
• Our Claim: the cheapest rules are unexpected.

References:
• Z. Ras, A. Tzacheva, L.-S. Tsay, "Action Rules", in Encyclopedia of Data Warehousing and Mining, (Ed. J. Wang), Idea Group Inc., 2005, to appear
• A. Tzacheva, Z. Ras, "Action rules mining", in the Special Issue on Knowledge Discovery, International Journal of Intelligent Systems, Wiley, 2005, to appear
• A. Tzacheva, Z. Ras, "Discovering non-standard semantics of semi-stable attributes", Proceedings of FLAIRS-2003, St. Augustine, Florida, AAAI Press, 2003, 330-334

Questions? Thank You
