1 / 20

Discovering associations with numeric variables

Discovering associations with numeric variables. Advisor:Dr. Hsu Graduate: Ching-Lung Chen. Outline. Motivation Objective Introduction Alternative numeric data mining techniques The OPUS search algorithm Search for impact rules Evaluation Conclusion Personal Opinion. Motivation.

gmurphy
Download Presentation

Discovering associations with numeric variables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Discovering associations with numeric variables Advisor:Dr. Hsu Graduate: Ching-Lung Chen IDSL,Intelligent Database System Lab

  2. Outline • Motivation • Objective • Introduction • Alternative numeric data mining techniques • The OPUS search algorithm • Search for impact rules • Evaluation • Conclusion • Personal Opinion IDSL,Intelligent Database System Lab

  3. Motivation • To further develops Aumann and Lindell’s proposal for a variant of association rule for which the consequent is a numeric variable. • Traditional association rules cannot discovered directly with numeric data. IDSL,Intelligent Database System Lab

  4. Objective • To improve the efficiency of algorithm that enable these rules to be discovered for dense data set. IDSL,Intelligent Database System Lab

  5. Introduction • prev-response>0.2 & region=E & socio-class=F => antecdent • profitability: => target count=l05211, mean=12.51,sum=1316190.61, max=205.31, min=-l.05 =>consequent • Association rules have demonstrated the ability to detect interesting associations between fields in a database. • This paper renamed impact rules and characterize it as follows: training set: a finite set of records. conditions:each record is an element. target: which is associated with a numeric value . antecedent: an impact rule consists of a conjunction of conditions. consequent: describing the impact on the target of selecting the training set records that satisfy the antecedent. IDSL,Intelligent Database System Lab

  6. Alternative numeric data mining techniques • Regression techniques • can predict a numeric value for a given set of input parameters. • cannot provide an explicit segmentation of the data. • Association rules • can provision a statistics relating to the frequency of one sub-range • cannot directly address the issue of the impact IDSL,Intelligent Database System Lab

  7. for m elements has search space m 2 The OPUS search algorithm • OPUS provides a efficient search for impact rules ,present by G. I. Webb. Journal of Artificial Intelligence research • It assigning an arbitrary order to the elements, so that each combination of elements was considered just once. Fig. 1.A fixed-structure search space IDSL,Intelligent Database System Lab

  8. If C can be a solution then pruning the nodes below {b} The OPUS search algorithm • Under fix structure search, algorithm typically seek to identify branches that cannot contain a solution and prune those branches. Fig. 2. Pruning a branch from a fixed-structure search space IDSL,Intelligent Database System Lab

  9. removal all nodes containing b. Fig. 3. Pruning all nodes containing a single operator from a fixed-structure search space The OPUS search algorithm • If C can be a solution then a pruning rule identifying that no node containing bmay contain a solution would enable removal all nodes containing b. IDSL,Intelligent Database System Lab

  10. Search for impact rules The OPUS_IR algorithm Algorithm: OPUS_IR(Current, Available) 1. SoFar : = {} 2. FOR EACH P in Available (a) New := Current (b) If pruning rules cannot determine that Available: solution(x New) Then record New relevant stats OPUS_IR(New, SoFar) SoFar := SoFar {p} End if End FOR IDSL,Intelligent Database System Lab

  11. t arg IRule n targ Search for impact rules • is the mean of the target for all records. • value(x) the value of the target for a record x • cover(X)is the set of record for which the condition X is true • ante(ir)is the antecedent of impact rule ir • mean(ir) is the mean value of the target for records covered by ante(ir) is the impact rule with impact(ir) = (mean(ir) - ) * |cover(ante(ir))| IDSL,Intelligent Database System Lab

  12. Search for impact rules • If the follow condition happened then no superset of New may be a solution: • If • If • For all searches, if New covers no records • If |cover(New)| < minsup IDSL,Intelligent Database System Lab

  13. = = = = = n 4 , x d , x c , x b , x a - - 1 2 1 n n n activation of OPUS_IR with Current ={ } x ,..., x - 1 n 2 Relative complexity • If all itemsets is , both and are frequent itemsets. In this case: If for the call to OPUS_IR with Current = Both generate by the activation IDSL,Intelligent Database System Lab

  14. will not be added to SoFar x n { x ,..., x , x } - - 1 n 2 n 1 will not be generated { x ,..., x , x , x } - - 1 n 2 n 1 n no call to OPUS_IR with Current = { x ,..., x , x } - - 1 n 2 n 1 Relative complexity • If is not a frequent itemset then • If is not a frequent itemset then not be passed as a member of Available to the activation with IDSL,Intelligent Database System Lab

  15. will be generated { x ,..., x , x , x } - - 1 n 2 n 1 n will be passed as a value in Available x - n 1 Relative complexity • If both are frequent itemsets then IDSL,Intelligent Database System Lab

  16. Evaluation • OPUS_IR was implemented as a DOS program and run on a 800MHz PIII Windows PC with 256Mb RAM. • This implementation was applied to each data set to find 1000 impact rule with the highest impact on the target. Data Sets Results IDSL,Intelligent Database System Lab

  17. Example rules • Each rule is preceded by the summary population statistics for the data set. The results for the data set from the KDD Repository. IDSL,Intelligent Database System Lab

  18. The frequent itemset approach Frequent Itemset Generation IDSL,Intelligent Database System Lab

  19. Conclusions • Impact rule provide association rule like knowledge discovery for numeric data. • This paper presents efficient techniques for impact rule discovery. • These techniques avoid the need to set arbitrary constraints on the cover of the antecedent of an impact rule. IDSL,Intelligent Database System Lab

  20. Personal Opinion • This algorithm even though avoid the need to set arbitrary constraints but still need to assign the recursive calls depth, maybe we can develop a algorithm for automatic search without limited. IDSL,Intelligent Database System Lab

More Related