
Modifying Logic of Discovery for Dealing with Domain Knowledge in Data Mining


Presentation Transcript


  1. Modifying Logic of Discovery for Dealing with Domain Knowledge in Data Mining Jan Rauch, University of Economics, Prague, Czech Republic

  2. Modifying Logic of Discovery for Dealing with Domain Knowledge in Data Mining The talk presents an idea of a theoretical approach; software tools exist for its partial steps. Outline: • Logic of discovery • Modifications • 4ft-Discoverer

  3. Logic of Discovery Can computers formulate and verify scientific hypotheses? Can computers rationally analyze empirical data and produce a reasonable reflection of the observed empirical world? Can it be done using mathematical logic and statistics? (Hájek & Havránek, 1978)

  4. Logic of Discovery (simplified) A state-dependent structure corresponds 1:1 to a data matrix M. Theoretical statements belong to theoretical calculi; observational statements belong to observational calculi and are verified by statistical hypothesis tests.

  5. Association rules – observational statements An association rule φ ≈ ψ is evaluated in a data matrix M: Val(φ ≈ ψ, M) ∈ {0, 1}. Some 4ft-quantifiers ≈ correspond to statistical hypothesis tests.

  6. GUHA Procedure ASSOC – a tool for finding a set of interesting association rules φ ≈ ψ with Val(φ ≈ ψ, M) = 1. A rule φ ≈ ψ is prime if it is true in M and does not logically follow from other, simpler true rules.
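The evaluation Val(φ ≈ ψ, M) can be sketched concretely: a rule is evaluated through the four-fold (4ft) contingency table of its antecedent and succedent. The sketch below is illustrative only; the row encoding and the quantifier thresholds are assumptions, not the 4ft-Miner/LISp-Miner API.

```python
# Minimal sketch: evaluating an association rule phi ~ psi on a data matrix
# via its four-fold (4ft) contingency table (a, b, c, d). Illustrative only.

def fourfold_table(rows, phi, psi):
    """4ft table of Boolean attributes phi, psi over rows:
    a = phi & psi, b = phi & ~psi, c = ~phi & psi, d = ~phi & ~psi."""
    a = b = c = d = 0
    for row in rows:
        if phi(row):
            if psi(row): a += 1
            else: b += 1
        else:
            if psi(row): c += 1
            else: d += 1
    return a, b, c, d

def val(rows, phi, psi, quantifier):
    """Val(phi ~ psi, M): 1 iff the 4ft-quantifier accepts the table."""
    return 1 if quantifier(*fourfold_table(rows, phi, psi)) else 0

# Founded implication =>_{0.9,5} as the example quantifier
# (confidence at least 0.9, at least 5 supporting rows).
founded = lambda a, b, c, d: (a + b) > 0 and a / (a + b) >= 0.9 and a >= 5
```

Any 4ft-quantifier can be plugged in as `quantifier`, since in this logic a quantifier is just a {0,1}-valued function of the table (a, b, c, d).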

  7. Deduction rules in logic of association rules Theorem: the deduction rule φ ≈ ψ / φ′ ≈ ψ′ is correct if and only if (1) or (2) holds: (1) Φ1A and Φ1B are tautologies of propositional calculus; (2) Φ2 is a tautology. Here Φ1A, Φ1B, Φ2 are formulas created from φ, ψ, φ′, ψ′. Analogous theorems give correctness criteria for additional 4ft-quantifiers. Applications: identifying prime rules and dealing with domain knowledge in data mining.

  8. Data mining – CRISP-DM http://www.crisp-dm.org/ Domain knowledge (Beer ↑↑ BMI; wine region, sportsmen, …) enters the process, and the results are summarized in an analytical report; a logical calculus links the two.

  9. Data mining – CRISP-DM http://www.crisp-dm.org/ Can domain knowledge (Beer ↑↑ BMI; wine region, sportsmen, …) and the analytical report be connected through a logical calculus?

  10. Modifying Logic of Discovery Logic of discovery: theoretical statements + a logical calculus of association rules. Logic of association rules mining: a logical calculus of association rules extended with statements on data matrices A1 ↑↑ A2, A3 ↑↑ A4, …, an evaluation Val, and consequences Cons(A1 ↑↑ A2), …

  11. Logic of association rules mining (simplified) LCAR – Logical Calculus of Association Rules: • type of M: number of columns + possible values • rules φ ≈ ψ • evaluation Val(φ ≈ ψ, M) • items of domain knowledge (DK): Beer ↑↑ BMI, … • a mapping DK → AR giving consequences of domain knowledge: Cons(Beer ↑↑ BMI), …, e.g. Beer(8–10) ⇒_{0.9,50} BMI(>30) / Status(W)

  12. Atomic consequences of Beer ↑↑ BMI (simplified) Beer takes values 0, 1, 2, …, 15; low intervals lie within 0–5, high intervals within 10–15. BMI takes values 15, 16, …, 35; low intervals lie within 15–22, high intervals within 28–35. Cons(Beer ↑↑ BMI) contains rules of the forms • Beer(low) ⇒_{0.9,50} BMI(low) • Beer(high) ⇒_{0.9,50} BMI(high) for example • Beer(0–3) ⇒_{0.9,50} BMI(15–18) • Beer(2–4) ⇒_{0.9,50} BMI(17–22) • Beer(11–13) ⇒_{0.9,50} BMI(29–31) • Beer(14–15) ⇒_{0.9,50} BMI(30–35) • …

  13. 4ft-Discoverer 4ftD = ⟨LCAR, DK → AR, 4ft-Miner, 4ft-Filter, 4ft-Synt⟩ • under implementation, based on consequences such as Cons(Beer ↑↑ BMI) and …

  14. Applying 4ft-Discoverer Goal: new knowledge that is true in the given data M and does not follow from Beer ↑↑ BMI. 4ft-Miner produces rules; 4ft-Filter uses the consequences of Beer ↑↑ BMI to keep only the rules not following from Beer ↑↑ BMI, i.e. the particular interesting rules; 4ft-Synt then synthesizes candidates for new knowledge such as C ↑↑ D, E ↑↑ F.

  15. 4ft-Filter Input: a set of rules φ ⇒_{p,Base} ψ from 4ft-Miner and the set of consequences Beer(α) ⇒_{0.9,50} BMI(β) from Cons(Beer ↑↑ BMI). For each φ ⇒_{p,Base} ψ: is there a Beer(α) ⇒_{0.9,50} BMI(β) such that the deduction rule Beer(α) ⇒_{0.9,50} BMI(β) / φ ⇒_{p,Base} ψ is correct? If so, filter out φ ⇒_{p,Base} ψ.
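The filtering step itself is a simple loop once a correctness test for deduction rules is available. In the sketch below, the predicate `follows_from` (the deduction-rule correctness check of the previous slides) is assumed to be supplied from outside; it is a placeholder, not part of the described system's API.

```python
# Sketch of the 4ft-Filter step: discard every mined rule that logically
# follows from some atomic consequence of the domain knowledge.
# `follows_from(cons, rule)` is an assumed correctness test for the
# deduction rule cons / rule.

def ft_filter(mined_rules, consequences, follows_from):
    """Keep only the rules that follow from none of the consequences."""
    return [rule for rule in mined_rules
            if not any(follows_from(cons, rule) for cons in consequences)]
```

With exact string equality as a trivial stand-in for `follows_from`, a mined rule identical to a listed consequence is removed and everything else passes through.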

  16. 4ft-Synt Input: a set of rules φ ⇒_{p,Base} ψ from 4ft-Miner and the set of consequences C(γ) ⇒_{0.9,50} D(δ) from Cons(C ↑↑ D). Are there enough pairs φ ⇒_{p,Base} ψ and C(γ) ⇒_{0.9,50} D(δ) such that the corresponding deduction rule is correct? If so, consider C ↑↑ D as a candidate for new knowledge.

  17. Conclusions • rich association rules φ ≈ ψ • criteria of correctness for deduction rules • a formal language for domain knowledge: Beer ↑↑ BMI, … • atomic consequences: Beer(low) ⇒_{p,Base} BMI(low), …, Beer(α) ⇒_{p,Base} BMI(β) • conversion between Beer ↑↑ BMI and association rules via deduction rules • partially implemented: http://lispminer.vse.cz/, http://sewebar.vse.cz/, http://sewebar.vse.cz/RuleML_demo/final/final.html

  18. Thank you

  19. Examples of 4ft-quantifiers – statistical hypothesis tests Lower critical implication ⇒^!_{p,α} for 0 < p ≤ 1, 0 < α < 0.5: the rule φ ⇒^!_{p,α} ψ corresponds to the statistical test (at significance level α) of the null hypothesis H0: P(ψ | φ) ≤ p against the alternative H1: P(ψ | φ) > p. Here P(ψ | φ) is the conditional probability of the validity of ψ under the condition φ. Fisher’s quantifier ∼_{α,Base} for 0 < α < 0.5: the rule φ ∼_{α,Base} ψ corresponds to the statistical test (at significance level α) of the null hypothesis of independence of φ and ψ against the alternative of positive dependence.
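The lower critical implication has a direct computable form over the 4ft frequencies a and b: the rule holds when the binomial tail probability of seeing a or more successes in a + b trials under H0: P(ψ | φ) ≤ p is at most α. This is the standard GUHA formulation; the sketch below should be checked against the original sources before reuse.

```python
# Lower critical implication =>!_{p,alpha} in computable form:
# sum_{i=a}^{a+b} C(a+b, i) * p^i * (1-p)^(a+b-i) <= alpha,
# i.e. a or more successes among the a+b rows satisfying phi would be
# improbable under H0: P(psi|phi) <= p.
from math import comb

def lower_critical_implication(a, b, p, alpha):
    n = a + b
    tail = sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(a, n + 1))
    return tail <= alpha
```

For example, 10 successes out of 10 is significant at α = 0.05 against p = 0.5 (tail 0.5¹⁰ ≈ 0.001), while 1 out of 2 is not (tail 0.75).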

  20. 4ft-Miner, important simple 4ft-quantifiers (n = a + b + c + d) Founded implication ⇒_{p,Base}: a/(a+b) ≥ p and a ≥ Base Double founded implication ⇔_{p,Base}: a/(a+b+c) ≥ p and a ≥ Base Founded equivalence ≡_{p,Base}: (a+d)/n ≥ p and a ≥ Base Above average ∼^+_{q,Base}: a/(a+b) ≥ (1+q)·(a+c)/n and a ≥ Base “Classical”: confidence a/(a+b) and support a/n
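Written as predicates over a 4ft table, these definitions are a few lines each. The sketch follows the standard GUHA formulations; Base ≥ 1 is assumed so that the `a >= base` check short-circuits before any division by zero.

```python
# The slide's simple 4ft-quantifiers as predicates over a 4ft table
# (a, b, c, d); n = a + b + c + d. Assumes base >= 1.

def founded_implication(a, b, c, d, p, base):
    return a >= base and a / (a + b) >= p

def double_founded_implication(a, b, c, d, p, base):
    return a >= base and a / (a + b + c) >= p

def founded_equivalence(a, b, c, d, p, base):
    return a >= base and (a + d) / (a + b + c + d) >= p

def above_average(a, b, c, d, q, base):
    # Confidence exceeds the overall relative frequency of the
    # succedent by at least the factor 1 + q.
    n = a + b + c + d
    return a >= base and a / (a + b) >= (1 + q) * (a + c) / n
```

The "classical" support/confidence pair of Agrawal-style mining is the special case obtained by thresholding a/n and a/(a+b) separately.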

  21. Associational and implicational quantifiers The generalized quantifier ≈ is associational if it satisfies: if ≈(a,b,c,d) = 1 and a′ ≥ a, b′ ≤ b, c′ ≤ c, d′ ≥ d, then also ≈(a′,b′,c′,d′) = 1. Example: Fisher’s quantifier. The generalized quantifier ≈ is implicational if it satisfies: if ≈(a,b,c,d) = 1 and a′ ≥ a, b′ ≤ b, then also ≈(a′,b′,c,d) = 1. Examples: founded implication, lower critical implication.

  22. Despecifying–dereducing deduction rule SpRd: the deduction rule φ ⇒* ψ / φ′ ⇒* ψ′, where ⇒* is implicational, is sound if there is a formula such that φ despecifies to it and it dereduces to φ′. An example: … despecifies to … and dereduces to … instead of …

  23. Deduction rules and implicational quantifiers (1) The 4ft-quantifier ≈ is implicational if it satisfies: if ≈(a,b,c,d) = 1 and a′ ≥ a, b′ ≤ b, then also ≈(a′,b′,c,d) = 1. TPC: a′ ≥ a ∧ b′ ≤ b is the True Preservation Condition for implicational quantifiers. • ≈ is a-dependent if there are a, a′, b, c, d such that ≈(a,b,c,d) ≠ ≈(a′,b,c,d); b-dependence is defined analogously. • If ≈ is implicational, then ≈(a,b,c,d) = ≈(a,b,c′,d′) for all c, c′, d, d′. • For an implicational ⇒*, we therefore write ⇒*(a,b) instead of ⇒*(a,b,c,d).
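The TPC can be checked numerically on small tables. The sketch below brute-forces the condition for a given quantifier; this is a sanity check over bounded frequencies, not a proof, and the founded-implication parameters are illustrative.

```python
# Brute-force sanity check of the True Preservation Condition (TPC)
# for implicational quantifiers: if q(a,b,c,d) = 1 and a' >= a, b' <= b,
# then q(a',b',c,d) = 1. Checked over all tables with entries < max_n.
from itertools import product

def founded(a, b, c, d, p=0.8, base=2):
    """Founded implication with small illustrative parameters."""
    return a >= base and a / (a + b) >= p

def satisfies_tpc(q, max_n=8):
    for a, b, c, d in product(range(max_n), repeat=4):
        if q(a, b, c, d):
            for a2 in range(a, max_n):
                for b2 in range(b + 1):
                    if not q(a2, b2, c, d):
                        return False
    return True
```

A quantifier that is true only for a particular value of a (so raising a can falsify it) fails the check, as expected.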

  24. Deduction rules and implicational quantifiers (2) • Definition: the implicational 4ft-quantifier ⇒* is interesting implicational if • ⇒* is both a-dependent and b-dependent • ⇒*(0,0) = 0 Theorem: if ⇒* is an interesting implicational 4ft-quantifier and R = φ ⇒* ψ / φ′ ⇒* ψ′ is a deduction rule, then there are propositional formulas Φ1A, Φ1B, Φ2 derived from φ, ψ, φ′, ψ′ such that R is sound iff at least one of the conditions i), ii) is satisfied: i) both Φ1A and Φ1B are tautologies ii) Φ2 is a tautology Founded implication ⇒_{p,Base} and lower critical implication ⇒^!_{p,α} are examples of interesting implicational 4ft-quantifiers.

  25. Overview of classes of 4ft-quantifiers • Additional results: • dealing with missing information • tables of critical frequencies • definability in classical predicate calculi • interesting subclasses

  26. Association rules and the ASSOC procedure (1) {A, B} → {E, F}

  27. Association rules and the ASSOC procedure (2) {A, B} → {E, F} Conf({A, B} → {E, F}) = Supp({A, B, E, F}) / Supp({A, B}) Supp({A, B} → {E, F}) = the fraction of transactions containing all of A, B, E, F
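The two classical measures can be stated directly over transaction data; the sketch below implements the definitions on the slide, with transactions represented as Python sets of items.

```python
# Support and confidence of a market-basket rule {A,B} -> {E,F},
# in the "classical" sense used on the slide.

def support(transactions, itemset):
    """Fraction of transactions containing every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent U consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)
```

In 4ft terms, Supp is a/n and Conf is a/(a+b) for the table of the rule's antecedent and succedent.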

  28. GUHA and association rules http://en.wikipedia.org/wiki/Association_rule_learning#cite_note-pospaper-7 History: the concept of association rules was popularised particularly by the 1993 article of Agrawal [2], which had acquired more than 6,000 citations according to Google Scholar as of March 2008 and is thus one of the most cited papers in the data mining field. However, what is now called “association rules” is similar to what appears in the 1966 paper [7] on GUHA, a general data mining method developed by Petr Hájek et al. [8].
