
High Frequent Value Reduct in Very Large Databases



  1. High Frequent Value Reduct in Very Large Databases Tsau Young Lin San Jose State University, USA Jianchao Han California State University Dominguez Hills, USA RSFDGrC-2007

2. Agenda
• Introduction
• Decision table reduction review
• Our method
• An example
• Conclusion

3. Introduction
• Rough Set
  • Finding decision rules in decision tables
• Reduction
  • Row (Horizontal) reduction: merging duplicate rows
  • Column (Vertical, or Attribute) reduction: finding important attributes
  • Value reduction: simplifying decision rules

4. Finding Value Reduct

5. Row Reduction: Step 1
• An equivalence relation can be defined by RESULT: ID-i ~ ID-j iff ID-i.RESULT = ID-j.RESULT
• It partitions the transactions into three decision classes (sketched in Python below):
  • DECISION1 = {ID-1, ID-2, ID-3, ID-4, ID-5, ID-6, ID-7, ID-8, ID-9}, with RESULT = 1
  • DECISION2 = {ID-10, ID-11, ID-12, ID-13, ID-14}, with RESULT = 2
  • DECISION3 = {ID-15, ID-16, ID-17, ID-18}, with RESULT = 3
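A minimal sketch of this grouping, assuming the table is held in memory as a list of Python dicts keyed by attribute name; the condition values below are placeholders, while the RESULT values follow the decision classes listed above.

from collections import defaultdict

def decision_classes(rows, decision_attr="RESULT"):
    """Group row IDs by the value of the decision attribute."""
    classes = defaultdict(list)
    for row in rows:
        classes[row[decision_attr]].append(row["ID"])
    return dict(classes)

# Hypothetical rows: condition values are placeholders only.
rows = [
    {"ID": "ID-1",  "TEST": 1, "LOW": 1, "HIGH": 1, "CASE": 3, "NEW": 2, "RESULT": 1},
    {"ID": "ID-10", "TEST": 1, "LOW": 1, "HIGH": 0, "CASE": 2, "NEW": 1, "RESULT": 2},
    {"ID": "ID-15", "TEST": 0, "LOW": 0, "HIGH": 0, "CASE": 1, "NEW": 0, "RESULT": 3},
]
print(decision_classes(rows))  # {1: ['ID-1'], 2: ['ID-10'], 3: ['ID-15']}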

6. Row Reduction: Step 2
• For the conditional attributes {TEST, LOW, HIGH, CASE, NEW}, we have the following condition classes (sketched below):
  • CONDITION1 = {ID-1, ID-2}
  • CONDITION2 = {ID-3}
  • CONDITION3 = {ID-4, …, ID-9}
  • CONDITION4 = {ID-10}
  • CONDITION5 = {ID-11, …, ID-14}
  • CONDITION6 = {ID-15}
  • CONDITION7 = {ID-16, ID-17, ID-18}
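The condition classes can be computed the same way, grouping on the tuple of conditional attribute values rather than on RESULT; a minimal sketch using the same row-dict representation:

from collections import defaultdict

def condition_classes(rows, cond_attrs=("TEST", "LOW", "HIGH", "CASE", "NEW")):
    """Group row IDs by their combination of conditional attribute values."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        classes[key].append(row["ID"])
    return list(classes.values())

# Rows that agree on every conditional attribute (e.g. ID-1 and ID-2 on this
# slide) land in the same condition class.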

7. Row Reduction: Step 3
• Decision rules (a rule-forming sketch follows):
  • R1: CONDITION1 → DECISION1
  • R2: CONDITION2 → DECISION1
  • R3: CONDITION3 → DECISION1
  • R4: CONDITION4 → DECISION2
  • R5: CONDITION5 → DECISION2
  • R6: CONDITION6 → DECISION3
  • R7: CONDITION7 → DECISION3
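A sketch of how such rules can be formed from the condition classes, again assuming the row-dict representation; each consistent condition class (one whose rows share a single RESULT value) yields one rule:

def decision_rules(rows, cond_attrs, decision_attr="RESULT"):
    """Map each condition-value combination to its decision values, then keep
    the consistent ones as CONDITION -> DECISION rules."""
    outcomes = {}
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        outcomes.setdefault(key, set()).add(row[decision_attr])
    return {key: vals.pop() for key, vals in outcomes.items() if len(vals) == 1}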

8. Attribute Reduction
• Finding attribute reducts (a brute-force sketch follows)
• Two minimal attribute reducts:
  • {TEST, LOW, HIGH, CASE}
  • {TEST, LOW, HIGH, NEW}
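One straightforward (if brute-force) way to find such reducts is to test attribute subsets from small to large and keep the smallest ones that still determine the decision; a sketch under that assumption, suitable only for tables with few attributes:

from itertools import combinations

def is_consistent(rows, attrs, decision_attr="RESULT"):
    """True if no two rows agree on attrs but differ on the decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision_attr]) != row[decision_attr]:
            return False
    return True

def smallest_reducts(rows, cond_attrs, decision_attr="RESULT"):
    """Return all minimum-size attribute subsets that keep the table consistent."""
    for k in range(1, len(cond_attrs) + 1):
        found = [set(a) for a in combinations(cond_attrs, k)
                 if is_consistent(rows, a, decision_attr)]
        if found:
            return found
    return []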

9. Value Reduction
• Finding the value reduct for each rule
• [Rule]<attribute> denotes the equivalence class of Rule under that attribute (sketched below)
• Consider Rule 1:
  • [R1]TEST = {R1, R2, R5, R7}
  • [R1]LOW = {R1, R7}
  • [R1]HIGH = {R1, R5, R6}
  • [R1]CASE = {R1, R2, R4, R5, R6, R7}
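A sketch of computing [R]<attribute>, assuming each rule is stored as a dict of its condition values; rule_conditions and its contents are illustrative, not taken from the paper:

def rule_class(rule_conditions, r, attr):
    """[r]<attr>: names of all rules whose condition agrees with rule r on attr."""
    value = rule_conditions[r][attr]
    return {name for name, cond in rule_conditions.items() if cond.get(attr) == value}

# rule_conditions maps rule names to their condition values, e.g.
# {"R1": {"TEST": 1, "LOW": 1, "HIGH": 1, "CASE": 3}, "R2": {...}, ...}
# rule_class(rule_conditions, "R1", "LOW") then returns [R1]LOW.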

10. Value Reduct
• Find the family F = {[R1]TEST, [R1]LOW, [R1]HIGH, [R1]CASE}, with ∩F = {R1}
• The value reduct is a minimal subfamily, here {[R1]LOW, [R1]HIGH}, such that [R1]LOW ∩ [R1]HIGH = ∩F (a search sketch follows)
• If we choose the attribute reduct {TEST, LOW, HIGH, NEW} instead, we obtain different rules
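The minimal subfamily can likewise be found by testing attribute subsets from small to large; a self-contained sketch under the same rule-dict assumption:

from itertools import combinations

def value_reduct(rule_conditions, r, attrs):
    """Smallest attribute subset whose classes intersect to the same set as the whole family F."""
    family = {a: {name for name, cond in rule_conditions.items()
                  if cond.get(a) == rule_conditions[r][a]}
              for a in attrs}
    target = set.intersection(*family.values())        # = ∩F, e.g. {"R1"}
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            if set.intersection(*(family[a] for a in subset)) == target:
                return set(subset)                      # e.g. {"LOW", "HIGH"} for R1
    return set(attrs)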

11. Our Method
• Finding value reducts without finding an attribute reduct first
• Avoids the computation of selecting an attribute reduct
• Does not miss any rules by committing to one attribute reduct
• Forming rules from frequent rows
• Originated from association rule mining
• Easy implementation in a DBMS

12. Algorithm: Finding all decision rules
Input: a decision (relational) table T, a condition attribute set C, a decision attribute d, and a minimum support threshold s
Output: RB, a set of decision rules

RB ← ∅
For k = 1 to |C| Do
    RBk ← ∅
    For each subset A of C of size k Do
        TA ← the projection of T onto the columns in A ∪ {d}
        Remove all inconsistent and insufficient-support tuples from TA
        For each remaining tuple r in TA Do
            If r is not covered by RB Then RBk ← RBk ∪ {r}
    If RBk = ∅ Then Return RB
    Else RB ← RB ∪ RBk
Return RB

(A Python rendering of the same loop follows.)
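A rough Python rendering of the algorithm, as an in-memory stand-in for the SQL views on the next slide; the coverage test encodes one reading of "r is not covered by RB" (an already accepted rule with the same decision whose condition is a subset of r's):

from collections import Counter, defaultdict
from itertools import combinations

def find_all_rules(rows, cond_attrs, decision_attr, s):
    rules = []  # (condition dict, decision value, support) triples

    def covered(cond, dec):
        # Covered if some earlier rule with the same decision uses a
        # subset of this tuple's attribute-value pairs.
        return any(d == dec and all(cond.get(a) == v for a, v in c.items())
                   for c, d, _ in rules)

    for k in range(1, len(cond_attrs) + 1):
        new_rules = []
        for attrs in combinations(cond_attrs, k):
            # Project T onto attrs + decision and count support per tuple.
            groups = defaultdict(Counter)
            for row in rows:
                key = tuple(row[a] for a in attrs)
                groups[key][row[decision_attr]] += 1
            for key, decisions in groups.items():
                if len(decisions) != 1:          # inconsistent tuple: drop
                    continue
                (dec, support), = decisions.items()
                if support < s:                  # insufficient support: drop
                    continue
                cond = dict(zip(attrs, key))
                if not covered(cond, dec):
                    new_rules.append((cond, dec, support))
        if not new_rules:                        # no new rules at this level
            return rules
        rules.extend(new_rules)
    return rules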

13. Implementation in SQL
• Create a subset of T from a given subset A of C and remove all inconsistent and insufficient-support tuples
• Assume A = {A1, A2, …, Ap}; then the following SQL statement works (a generation sketch follows):

CREATE VIEW TA AS
SELECT A1, A2, …, Ap,
       MIN(d) AS d,               -- the single decision value of a consistent group
       SUM(support) AS support
FROM (SELECT A1, A2, …, Ap, d, COUNT(*) AS support
      FROM T
      GROUP BY A1, A2, …, Ap, d) AS proj
GROUP BY A1, A2, …, Ap
HAVING COUNT(*) = 1               -- consistent: exactly one decision value per group
   AND SUM(support) >= s          -- sufficient support
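In the algorithm's loop one such view is built per attribute subset A; a sketch of generating the statement text, where build_view_sql and the view name are placeholder choices of mine and MIN(d) stands in for the single decision value of a consistent group:

def build_view_sql(view, table, attrs, decision="RESULT", s=1):
    """Build the CREATE VIEW statement for one subset A of condition attributes."""
    cols = ", ".join(attrs)
    return (
        f"CREATE VIEW {view} AS "
        f"SELECT {cols}, MIN({decision}) AS {decision}, SUM(support) AS support "
        f"FROM (SELECT {cols}, {decision}, COUNT(*) AS support "
        f"FROM {table} GROUP BY {cols}, {decision}) AS proj "
        f"GROUP BY {cols} "
        f"HAVING COUNT(*) = 1 AND SUM(support) >= {s}"
    )

print(build_view_sql("TA_TEST_LOW", "T", ["TEST", "LOW"]))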

14. An Example
• The previous decision table, with the support threshold s = 1
• Loop 1: finding frequent 2-itemsets
• Two consistent rules

15. An Example: Continued
• Finding frequent 3-itemsets: three consistent rules
• Finding frequent 4-itemsets: three consistent rules

16. An Example: Continued
• No 5-itemsets remain that are consistent and not covered by RB
• The output RB is the union of the above tables:
2-item rules
  • R1'': CASE = 3 → RESULT = 1, with support = 6
  • R2'': NEW = 2 → RESULT = 1, with support = 6
3-item rules
  • R3'': TEST = 0, HIGH = 0 → RESULT = 3, with support = 1
  • R4'': LOW = 0, HIGH = 0 → RESULT = 1, with support = 2
  • R5'': LOW = 0, HIGH = 1 → RESULT = 3, with support = 3
4-item rules
  • R6'': TEST = 1, LOW = 1, HIGH = 1 → RESULT = 1, with support = 1
  • R7'': TEST = 1, LOW = 1, HIGH = 0 → RESULT = 2, with support = 4
  • R8'': TEST = 0, HIGH = 1, CASE = 2 → RESULT = 2, with support = 1
  • R9'': TEST = 0, HIGH = 1, NEW = 1 → RESULT = 2, with support = 1

17. Conclusion
• Reviewed various approaches to reducing decision tables to form decision rules
• Presented a new method that finds decision rules directly through value reduction
• Discussed the algorithm's implementation in SQL
• Demonstrated the method with an example
