
High Frequent Value Reduct in Very Large Databases



  1. High Frequent Value Reduct in Very Large Databases Tsau Young Lin San Jose State University, USA Jianchao Han California State University Dominguez Hills, USA RSFDGrC-2007

2. Agenda
• Introduction
• Decision table reduction review
• Our method
• An example
• Conclusion

3. Introduction
• Rough Set
  • Finding decision rules in decision tables
• Reduction
  • Row (Horizontal) reduction: merging duplicate rows
  • Column (Vertical, or Attribute) reduction: finding important attributes
  • Value reduction: simplifying decision rules

4. Finding Value Reduct

5. Row Reduction: Step 1
• An equivalence relation can be defined by RESULT: ID-i ~ ID-j iff ID-i.RESULT = ID-j.RESULT
• It partitions the transactions into three decision classes (sketched in Python below):
  • DECISION1 = {ID-1, ID-2, ID-3, ID-4, ID-5, ID-6, ID-7, ID-8, ID-9}, with RESULT = 1
  • DECISION2 = {ID-10, ID-11, ID-12, ID-13, ID-14}, with RESULT = 2
  • DECISION3 = {ID-15, ID-16, ID-17, ID-18}, with RESULT = 3
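A minimal sketch of this grouping, assuming the table is held in memory as a list of Python dicts keyed by attribute name; the condition values below are placeholders, while the RESULT values follow the decision classes listed above.

from collections import defaultdict

def decision_classes(rows, decision_attr="RESULT"):
    """Group row IDs by the value of the decision attribute."""
    classes = defaultdict(list)
    for row in rows:
        classes[row[decision_attr]].append(row["ID"])
    return dict(classes)

# Hypothetical rows: condition values are placeholders only.
rows = [
    {"ID": "ID-1",  "TEST": 1, "LOW": 1, "HIGH": 1, "CASE": 3, "NEW": 2, "RESULT": 1},
    {"ID": "ID-10", "TEST": 1, "LOW": 1, "HIGH": 0, "CASE": 2, "NEW": 1, "RESULT": 2},
    {"ID": "ID-15", "TEST": 0, "LOW": 0, "HIGH": 0, "CASE": 1, "NEW": 0, "RESULT": 3},
]
print(decision_classes(rows))  # {1: ['ID-1'], 2: ['ID-10'], 3: ['ID-15']}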

6. Row Reduction: Step 2
• For the conditional attributes {TEST, LOW, HIGH, CASE, NEW}, we have the following condition classes (sketched below):
  • CONDITION1 = {ID-1, ID-2}
  • CONDITION2 = {ID-3}
  • CONDITION3 = {ID-4, …, ID-9}
  • CONDITION4 = {ID-10}
  • CONDITION5 = {ID-11, …, ID-14}
  • CONDITION6 = {ID-15}
  • CONDITION7 = {ID-16, ID-17, ID-18}
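The condition classes can be computed the same way, grouping on the tuple of conditional attribute values rather than on RESULT; a minimal sketch using the same row-dict representation:

from collections import defaultdict

def condition_classes(rows, cond_attrs=("TEST", "LOW", "HIGH", "CASE", "NEW")):
    """Group row IDs by their combination of conditional attribute values."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        classes[key].append(row["ID"])
    return list(classes.values())

# Rows that agree on every conditional attribute (e.g. ID-1 and ID-2 on this
# slide) land in the same condition class.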

7. Row Reduction: Step 3
• Decision rules (a rule-forming sketch follows):
  • R1: CONDITION1 → DECISION1
  • R2: CONDITION2 → DECISION1
  • R3: CONDITION3 → DECISION1
  • R4: CONDITION4 → DECISION2
  • R5: CONDITION5 → DECISION2
  • R6: CONDITION6 → DECISION3
  • R7: CONDITION7 → DECISION3
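A sketch of how such rules can be formed from the condition classes, again assuming the row-dict representation; each consistent condition class (one whose rows share a single RESULT value) yields one rule:

def decision_rules(rows, cond_attrs, decision_attr="RESULT"):
    """Map each condition-value combination to its decision values, then keep
    the consistent ones as CONDITION -> DECISION rules."""
    outcomes = {}
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        outcomes.setdefault(key, set()).add(row[decision_attr])
    return {key: vals.pop() for key, vals in outcomes.items() if len(vals) == 1}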

8. Attribute Reduction
• Finding attribute reducts (a brute-force sketch follows)
• Two minimal attribute reducts:
  • {TEST, LOW, HIGH, CASE}
  • {TEST, LOW, HIGH, NEW}
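One straightforward (if brute-force) way to find such reducts is to test attribute subsets from small to large and keep the smallest ones that still determine the decision; a sketch under that assumption, suitable only for tables with few attributes:

from itertools import combinations

def is_consistent(rows, attrs, decision_attr="RESULT"):
    """True if no two rows agree on attrs but differ on the decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision_attr]) != row[decision_attr]:
            return False
    return True

def smallest_reducts(rows, cond_attrs, decision_attr="RESULT"):
    """Return all minimum-size attribute subsets that keep the table consistent."""
    for k in range(1, len(cond_attrs) + 1):
        found = [set(a) for a in combinations(cond_attrs, k)
                 if is_consistent(rows, a, decision_attr)]
        if found:
            return found
    return []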

9. Value Reduction
• Finding the value reduct for each rule
• [Rule]<attribute> denotes the equivalence class of Rule under that attribute (sketched below)
• Consider Rule 1:
  • [R1]TEST = {R1, R2, R5, R7}
  • [R1]LOW = {R1, R7}
  • [R1]HIGH = {R1, R5, R6}
  • [R1]CASE = {R1, R2, R4, R5, R6, R7}
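A sketch of computing [R]<attribute>, assuming each rule is stored as a dict of its condition values; rule_conditions and its contents are illustrative, not taken from the paper:

def rule_class(rule_conditions, r, attr):
    """[r]<attr>: names of all rules whose condition agrees with rule r on attr."""
    value = rule_conditions[r][attr]
    return {name for name, cond in rule_conditions.items() if cond.get(attr) == value}

# rule_conditions maps rule names to their condition values, e.g.
# {"R1": {"TEST": 1, "LOW": 1, "HIGH": 1, "CASE": 3}, "R2": {...}, ...}
# rule_class(rule_conditions, "R1", "LOW") then returns [R1]LOW.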

10. Value Reduct
• Find the family F = {[R1]TEST, [R1]LOW, [R1]HIGH, [R1]CASE}, with ∩F = {R1}
• The value reduct is a minimal subfamily, here {[R1]LOW, [R1]HIGH}, such that [R1]LOW ∩ [R1]HIGH = ∩F (a search sketch follows)
• If we choose the attribute reduct {TEST, LOW, HIGH, NEW} instead, we obtain different rules
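The minimal subfamily can likewise be found by testing attribute subsets from small to large; a self-contained sketch under the same rule-dict assumption:

from itertools import combinations

def value_reduct(rule_conditions, r, attrs):
    """Smallest attribute subset whose classes intersect to the same set as the whole family F."""
    family = {a: {name for name, cond in rule_conditions.items()
                  if cond.get(a) == rule_conditions[r][a]}
              for a in attrs}
    target = set.intersection(*family.values())        # = ∩F, e.g. {"R1"}
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            if set.intersection(*(family[a] for a in subset)) == target:
                return set(subset)                      # e.g. {"LOW", "HIGH"} for R1
    return set(attrs)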

11. Our Method
• Finding value reducts without finding an attribute reduct first
• Avoids the computation of selecting an attribute reduct
• Does not miss any rules by committing to one attribute reduct
• Forming rules from frequent rows
• Originated from association rule mining
• Easy implementation in a DBMS

12. Algorithm: Finding all decision rules
Input: a decision (relational) table T, a condition attribute set C, a decision attribute d, and a minimum support threshold s
Output: RB, a set of decision rules

RB ← ∅
For k = 1 to |C| Do
    RBk ← ∅
    For each subset A of C of size k Do
        TA ← the projection of T onto the columns in A ∪ {d}
        Remove all inconsistent and insufficient-support tuples from TA
        For each remaining tuple r in TA Do
            If r is not covered by RB Then RBk ← RBk ∪ {r}
    If RBk = ∅ Then Return RB
    Else RB ← RB ∪ RBk
Return RB

(A Python rendering of the same loop follows.)
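A rough Python rendering of the algorithm, as an in-memory stand-in for the SQL views on the next slide; the coverage test encodes one reading of "r is not covered by RB" (an already accepted rule with the same decision whose condition is a subset of r's):

from collections import Counter, defaultdict
from itertools import combinations

def find_all_rules(rows, cond_attrs, decision_attr, s):
    rules = []  # (condition dict, decision value, support) triples

    def covered(cond, dec):
        # Covered if some earlier rule with the same decision uses a
        # subset of this tuple's attribute-value pairs.
        return any(d == dec and all(cond.get(a) == v for a, v in c.items())
                   for c, d, _ in rules)

    for k in range(1, len(cond_attrs) + 1):
        new_rules = []
        for attrs in combinations(cond_attrs, k):
            # Project T onto attrs + decision and count support per tuple.
            groups = defaultdict(Counter)
            for row in rows:
                key = tuple(row[a] for a in attrs)
                groups[key][row[decision_attr]] += 1
            for key, decisions in groups.items():
                if len(decisions) != 1:          # inconsistent tuple: drop
                    continue
                (dec, support), = decisions.items()
                if support < s:                  # insufficient support: drop
                    continue
                cond = dict(zip(attrs, key))
                if not covered(cond, dec):
                    new_rules.append((cond, dec, support))
        if not new_rules:                        # no new rules at this level
            return rules
        rules.extend(new_rules)
    return rules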

13. Implementation in SQL
• Create a subset of T from a given subset A of C and remove all inconsistent and insufficient-support tuples
• Assume A = {A1, A2, …, Ap}; then the following SQL statement works (a generation sketch follows):

CREATE VIEW TA AS
SELECT A1, A2, …, Ap,
       MIN(d) AS d,               -- the single decision value of a consistent group
       SUM(support) AS support
FROM (SELECT A1, A2, …, Ap, d, COUNT(*) AS support
      FROM T
      GROUP BY A1, A2, …, Ap, d) AS proj
GROUP BY A1, A2, …, Ap
HAVING COUNT(*) = 1               -- consistent: exactly one decision value per group
   AND SUM(support) >= s          -- sufficient support
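In the algorithm's loop one such view is built per attribute subset A; a sketch of generating the statement text, where build_view_sql and the view name are placeholder choices of mine and MIN(d) stands in for the single decision value of a consistent group:

def build_view_sql(view, table, attrs, decision="RESULT", s=1):
    """Build the CREATE VIEW statement for one subset A of condition attributes."""
    cols = ", ".join(attrs)
    return (
        f"CREATE VIEW {view} AS "
        f"SELECT {cols}, MIN({decision}) AS {decision}, SUM(support) AS support "
        f"FROM (SELECT {cols}, {decision}, COUNT(*) AS support "
        f"FROM {table} GROUP BY {cols}, {decision}) AS proj "
        f"GROUP BY {cols} "
        f"HAVING COUNT(*) = 1 AND SUM(support) >= {s}"
    )

print(build_view_sql("TA_TEST_LOW", "T", ["TEST", "LOW"]))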

14. An Example
• The previous decision table, with the support threshold s = 1
• Loop 1: finding frequent 2-itemsets
• Two consistent rules

15. An Example: Continued
• Finding frequent 3-itemsets: three consistent rules
• Finding frequent 4-itemsets: three consistent rules

16. An Example: Continued
• No 5-itemsets remain that are consistent and not covered by RB
• The output RB is the union of the above tables:
2-item rules
  • R1'': CASE = 3 → RESULT = 1, with support = 6
  • R2'': NEW = 2 → RESULT = 1, with support = 6
3-item rules
  • R3'': TEST = 0, HIGH = 0 → RESULT = 3, with support = 1
  • R4'': LOW = 0, HIGH = 0 → RESULT = 1, with support = 2
  • R5'': LOW = 0, HIGH = 1 → RESULT = 3, with support = 3
4-item rules
  • R6'': TEST = 1, LOW = 1, HIGH = 1 → RESULT = 1, with support = 1
  • R7'': TEST = 1, LOW = 1, HIGH = 0 → RESULT = 2, with support = 4
  • R8'': TEST = 0, HIGH = 1, CASE = 2 → RESULT = 2, with support = 1
  • R9'': TEST = 0, HIGH = 1, NEW = 1 → RESULT = 2, with support = 1

17. Conclusion
• Reviewed various approaches to reducing decision tables to form decision rules
• Presented a new method that finds decision rules directly through value reduction
• Discussed the algorithm's implementation in SQL
• Demonstrated the method with an example
