
First-Order Rule Learning



  1. First-Order Rule Learning

  2. Sequential Covering (I) • Learning consists of iteratively learning rules that cover as-yet-uncovered training instances • Assume the existence of a Learn_one_Rule function: • Input: a set of training instances • Output: a single high-accuracy (not necessarily high-coverage) rule

  3. Sequential Covering (II) • Algorithm Sequential_Covering(Instances) • Learned_rules ← ∅ • Rule ← Learn_one_Rule(Instances) • While Quality(Rule, Instances) > Threshold Do • Learned_rules ← Learned_rules + Rule • Instances ← Instances − {instances correctly classified by Rule} • Rule ← Learn_one_Rule(Instances) • Sort Learned_rules by Quality over Instances # Quality is a user-defined rule quality evaluation function • Return Learned_rules
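
A minimal Python sketch of this covering loop, assuming hypothetical learn_one_rule and quality functions and a rule object with a covers_correctly method (none of these names come from the slides):

```python
def sequential_covering(instances, learn_one_rule, quality, threshold):
    """Greedy covering: learn one rule, remove what it covers correctly, repeat."""
    learned_rules = []
    remaining = list(instances)
    rule = learn_one_rule(remaining)
    while quality(rule, remaining) > threshold:
        learned_rules.append(rule)
        # Drop the instances this rule already classifies correctly.
        remaining = [i for i in remaining if not rule.covers_correctly(i)]
        if not remaining:
            break
        rule = learn_one_rule(remaining)
    # Sort so the highest-quality rules are tried first at prediction time.
    learned_rules.sort(key=lambda r: quality(r, instances), reverse=True)
    return learned_rules
```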

  4. CN2 (I) • Algorithm Learn_one_Rule_CN2(Instances, k) • Best_hypo ← ∅ • Candidate_hypo ← {Best_hypo} • While Candidate_hypo ≠ ∅ Do • All_constraints ← {(a=v): a is an attribute and v is a value of a found in Instances} • New_candidate_hypo ← ∅ • For each h ∈ Candidate_hypo • For each c ∈ All_constraints, specialize h by adding c and add the result to New_candidate_hypo • Remove from New_candidate_hypo any hypotheses that are duplicates, inconsistent or not maximally specific • For all h ∈ New_candidate_hypo • If Quality_CN2(h, Instances) > Quality_CN2(Best_hypo, Instances) • Best_hypo ← h • Candidate_hypo ← the k best members of New_candidate_hypo as per Quality_CN2 • Return a rule of the form “IF Best_hypo THEN Pred” # Pred = the most frequent value of the target attribute among the instances that match Best_hypo
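
A compact Python sketch of this beam search, assuming instances are dicts mapping attribute names to values and that a quality function such as the one on the next slide is supplied (the matches helper and all other names are illustrative, not from the slides):

```python
from itertools import product

def matches(hypo, instance):
    """True if the instance satisfies every (attribute, value) test in the conjunction."""
    return all(instance[a] == v for a, v in hypo)

def learn_one_rule_cn2(instances, k, target, quality):
    """General-to-specific beam search over conjunctions of attribute = value tests."""
    attributes = [a for a in instances[0] if a != target]
    constraints = {(a, i[a]) for a in attributes for i in instances}
    best_hypo = frozenset()            # empty conjunction = most general hypothesis
    candidates = {best_hypo}
    while candidates:
        # Specialize every beam member with every constraint.
        new_candidates = {h | {c} for h, c in product(candidates, constraints) if c not in h}
        # Drop inconsistent conjunctions (two values for one attribute) and dead ones.
        new_candidates = {h for h in new_candidates
                          if len({a for a, _ in h}) == len(h)
                          and any(matches(h, i) for i in instances)}
        for h in new_candidates:
            if quality(h, instances) > quality(best_hypo, instances):
                best_hypo = h
        # Keep only the k best specializations for the next round (the beam).
        candidates = set(sorted(new_candidates, key=lambda h: quality(h, instances),
                                reverse=True)[:k])
    # The returned rule predicts the most frequent target value among matched instances.
    matched = [i for i in instances if matches(best_hypo, i)]
    values = [i[target] for i in matched]
    pred = max(set(values), key=values.count)
    return best_hypo, pred
```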

  5. CN2 (II) • Algorithm Quality_CN2(h, Instances) • h_instances ← {i ∈ Instances: i matches h} • Return −Entropy(h_instances) • where Entropy is computed with respect to the target attribute • Note that CN2 performs a general-to-specific beam search, keeping not the single best candidate at each step, but a list of the k best candidates
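
A matching sketch of the quality function, reusing the matches helper above; the target attribute name "Risk" is only a placeholder for whatever the class column is actually called:

```python
from collections import Counter
from math import log2

def quality_cn2(hypo, instances, target="Risk"):
    """Negative entropy of the target attribute over the instances matched by hypo.
    Higher is better: a pure class distribution has entropy 0 and thus quality 0."""
    matched = [i for i in instances if matches(hypo, i)]
    if not matched:
        return float("-inf")           # a hypothesis matching nothing is useless
    counts = Counter(i[target] for i in matched)
    n = len(matched)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return -entropy
```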

  6. Illustrative Training Set • 14 instances described by the attributes Income Level, Debt Level, Credit History and Collateral, with target classes HIGH, MODERATE and LOW (table not reproduced in this transcript)

  7. CN2 Example (I) • First pass: full instance set • 2-best₁: « Income Level = Low » (4-0-0), « Income Level = High » (0-1-5) • Can’t do better than (4-0-0) • Best_hypo: « Income Level = Low » • First rule: IF Income Level = Low THEN HIGH

  8. CN2 Example (II) • Second pass: instances 2-3, 5-6, 8-10, 12-14 • 2-best₁: « Income Level = High » (0-1-5), « Credit History = Good » (0-1-3) • Best_hypo: « Income Level = High » • 2-best₂: « Income Level = High AND Credit History = Good » (0-0-3), « Income Level = High AND Collateral = None » (0-0-3) • Best_hypo: « Income Level = High AND Credit History = Good » • Can’t do better than (0-0-3) • Second rule: IF Income Level = High AND Credit History = Good THEN LOW

  9. CN2 Example (III) • Third pass: instances 2-3, 5-6, 8, 12, 14 • 2-best₁: « Credit History = Good » (0-1-0), « Debt Level = High » (2-1-0) • Best_hypo: « Credit History = Good » • Can’t do better than (0-1-0) • Third rule: IF Credit History = Good THEN MODERATE

  10. CN2 Example (IV) • Fourth pass: instances 2-3, 5-6, 8, 14 • 2-best₁: « Debt Level = High » (2-0-0), « Income Level = Medium » (2-1-0) • Best_hypo: « Debt Level = High » • Can’t do better than (2-0-0) • Fourth rule: IF Debt Level = High THEN HIGH

  11. CN2 Example (V) • Fifth pass: instances 3, 5-6, 8 • 2-best₁: « Credit History = Bad » (0-1-0), « Income Level = Medium » (0-1-0) • Best_hypo: « Credit History = Bad » • Can’t do better than (0-1-0) • Fifth rule: IF Credit History = Bad THEN MODERATE

  12. CN2 Example (VI) • Sixth pass: instances 3, 5-6 • 2-best₁: « Income Level = High » (0-0-2), « Collateral = Adequate » (0-0-1) • Best_hypo: « Income Level = High » • Can’t do better than (0-0-2) • Sixth rule: IF Income Level = High THEN LOW

  13. CN2 Example (VII) • Seventh pass: instance 3 • 2-best₁: « Credit History = Unknown » (0-1-0), « Debt Level = Low » (0-1-0) • Best_hypo: « Credit History = Unknown » • Can’t do better than (0-1-0) • Seventh rule: IF Credit History = Unknown THEN MODERATE

  14. CN2 Example (VIII) • Rules are ranked by the entropy −Σi pi log2(pi) of their class counts (lower entropy = higher quality): • Rule 1: (4-0-0) - Rank 1 • Rule 2: (0-0-3) - Rank 2 • Rule 3: (1-1-3) - Rank 5 • Rule 4: (4-1-2) - Rank 6 • Rule 5: (3-1-0) - Rank 4 • Rule 6: (0-1-5) - Rank 3 • Rule 7: (2-1-2) - Rank 7
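
The ranking above can be reproduced directly from the class-count triples; a quick check (only the counts matter, not which class each count belongs to):

```python
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

triples = {1: (4, 0, 0), 2: (0, 0, 3), 3: (1, 1, 3), 4: (4, 1, 2),
           5: (3, 1, 0), 6: (0, 1, 5), 7: (2, 1, 2)}
# Sorted ascending by entropy the rules come out 1, 2, 6, 5, 3, 4, 7, i.e. the ranks shown above.
for rank, (rule, c) in enumerate(sorted(triples.items(), key=lambda kv: entropy(kv[1])), 1):
    print(f"Rank {rank}: Rule {rule}, counts {c}, entropy {entropy(c):.3f}")
```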

  15. CN2 Example (IX) • Final rule set, in rank order: • IF Income Level = Low THEN HIGH • IF Income Level = High AND Credit History = Good THEN LOW • IF Income Level = High THEN LOW • IF Credit History = Bad THEN MODERATE • IF Credit History = Good THEN MODERATE • IF Debt Level = High THEN HIGH • IF Credit History = Unknown THEN MODERATE

  16. Limitations of AVL (I) • AVL = attribute-value languages, the propositional representations used so far (decision trees, CN2 rules) • Consider the MONK1 problem: • 6 attributes • A1: 1, 2, 3 • A2: 1, 2, 3 • A3: 1, 2 • A4: 1, 2, 3 • A5: 1, 2, 3, 4 • A6: 1, 2 • 2 classes: 0, 1 • Target concept: If (A1=A2 or A5=1) then Class 1

  17. Limitations of AVL (II) • Can you build a decision tree for this concept?

  18. Limitations of AVL (III) • Can you build a rule set for this concept? • If A1=1 and A2=1 then Class=1 • If A1=2 and A2=2 then Class=1 • If A1=3 and A2=3 then Class=1 • If A5=1 then Class=1 • Otherwise (default rule) Class=0
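
That this rule set is exactly equivalent to the target concept can be verified exhaustively over all 432 attribute combinations; a small sketch:

```python
from itertools import product

def target(a1, a2, a3, a4, a5, a6):
    return 1 if (a1 == a2 or a5 == 1) else 0

def rule_set(a1, a2, a3, a4, a5, a6):
    # Ordered rules from the slide; the final line is the default class.
    if a1 == 1 and a2 == 1: return 1
    if a1 == 2 and a2 == 2: return 1
    if a1 == 3 and a2 == 3: return 1
    if a5 == 1: return 1
    return 0

domains = [(1, 2, 3), (1, 2, 3), (1, 2), (1, 2, 3), (1, 2, 3, 4), (1, 2)]
assert all(target(*x) == rule_set(*x) for x in product(*domains))   # all 432 cases agree
```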

  19. First-order Language • Supports first-order concepts, so relations between attributes are accounted for in a natural way • For simplicity, restrict attention to Horn clauses • A clause is any disjunction of literals whose variables are universally quantified • A Horn clause has a single non-negated literal and can be written as a rule: H ← L1 ∧ L2 ∧ … ∧ Ln

  20. FOIL (I) • Algorithm FOIL(Target_predicate, Predicates, Examples) • Pos ← those Examples for which Target_predicate is true • Neg ← those Examples for which Target_predicate is false • Learned_rules ← ∅ • While Pos ≠ ∅ Do • New_rule ← the rule that predicts Target_predicate with no precondition • New_rule_neg ← Neg • While New_rule_neg ≠ ∅ Do • Candidate_literals ← GenCandidateLit(New_rule, Predicates) • Best_literal ← argmax L∈Candidate_literals FoilGain(L, New_rule) • Add Best_literal to New_rule’s preconditions • New_rule_neg ← subset of New_rule_neg that satisfies New_rule’s preconditions • Learned_rules ← Learned_rules + New_rule • Pos ← Pos − {members of Pos covered by New_rule} • Return Learned_rules
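
A Python sketch of this outer loop, assuming a hypothetical Rule class with a covers method and examples carrying a boolean label; gen_candidate_literals and foil_gain correspond to the next two slides:

```python
def foil(target_predicate, predicates, examples):
    """Learn Horn-clause rules one at a time until every positive example is covered."""
    pos = [e for e in examples if e.label]            # examples where the target holds
    neg = [e for e in examples if not e.label]        # closed-world negatives
    learned_rules = []
    while pos:
        new_rule = Rule(head=target_predicate, body=[])   # most general rule: empty body
        new_rule_neg = list(neg)
        while new_rule_neg:
            # Greedily add the literal with the highest FOIL gain.
            candidates = gen_candidate_literals(new_rule, predicates)
            best = max(candidates, key=lambda lit: foil_gain(lit, new_rule))
            new_rule.body.append(best)
            new_rule_neg = [e for e in new_rule_neg if new_rule.covers(e)]
        learned_rules.append(new_rule)
        pos = [e for e in pos if not new_rule.covers(e)]
    return learned_rules
```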

  21. FOIL (II) • Algorithm GenCandidateLit(Rule, Predicates) • Let Rule = P(x1, …, xk) ← L1, …, Ln • Return all literals of the form • Q(v1, …, vr), where Q is any predicate in Predicates and the vi’s are either new variables or variables already present in Rule, with the constraint that at least one of the vi’s must already exist as a variable in Rule • Equal(xj, xk), where xj and xk are variables already present in Rule • The negation of all of the above forms of literals
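
A sketch of candidate-literal generation under simplifying assumptions: literals are (predicate, argument-tuple) pairs, predicates is a name-to-arity dict, and rule.variables() is assumed to return the list of variables already in the rule (none of this representation is prescribed by the slides):

```python
from itertools import product

def gen_candidate_literals(rule, predicates):
    """Specializing literals: predicates over old/new variables (at least one old),
    Equal over pairs of old variables, and the negation of every such literal."""
    old_vars = rule.variables()                            # variables already in the rule
    new_vars = [f"new{i}" for i in range(1, max(predicates.values()) + 1)]
    literals = []
    for name, arity in predicates.items():
        for args in product(old_vars + new_vars, repeat=arity):
            if any(a in old_vars for a in args):           # at least one existing variable
                literals.append((name, args))
    literals += [("Equal", (x, y)) for x in old_vars for y in old_vars if x != y]
    literals += [("not", lit) for lit in literals]         # negated versions
    return literals
```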

  22. FOIL (III) • Algorithm FoilGain(L, Rule) • Return t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ) • where • p0 is the number of positive bindings of Rule • n0 is the number of negative bindings of Rule • p1 is the number of positive bindings of Rule+L • n1 is the number of negative bindings of Rule+L • t is the number of positive bindings of Rule that are still covered after adding L to Rule
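
With those counts the gain can be computed directly; a small sketch (named foil_gain_from_counts here to make the inputs explicit):

```python
from math import log2

def foil_gain_from_counts(p0, n0, p1, n1, t):
    """FOIL gain for adding literal L to Rule: t times the increase in the log-precision
    of positive bindings, log2(p1/(p1+n1)) - log2(p0/(p0+n0))."""
    if p1 == 0:
        return 0.0            # no positive bindings survive, so the literal is worthless
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))
```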

  23. Illustration (I) • Consider the data: • GrandDaughter(Victor, Sharon) • Father(Sharon, Bob) • Father(Tom, Bob) • Female(Sharon) • Father(Bob, Victor) • Target concept: GrandDaughter(x, y) • Closed-world assumption

  24. Illustration (II) • Training set: • Positive examples: • GrandDaughter(Victor, Sharon) • Negative examples: • GrandDaughter(Victor, Victor) • GrandDaughter(Victor, Bob) • GrandDaughter(Victor, Tom) • GrandDaughter(Sharon, Victor) • GrandDaughter(Sharon, Sharon) • GrandDaughter(Sharon, Bob) • GrandDaughter(Sharon, Tom) • GrandDaughter(Bob, Victor) • GrandDaughter(Bob, Sharon) • GrandDaughter(Bob, Bob) • GrandDaughter(Bob, Tom) • GrandDaughter(Tom, Victor) • GrandDaughter(Tom, Sharon) • GrandDaughter(Tom, Bob) • GrandDaughter(Tom, Tom)
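
Under the closed-world assumption, every GrandDaughter pair that is not asserted in the data is taken as a negative example; the 15 negatives above can be enumerated mechanically:

```python
from itertools import product

people = ["Victor", "Sharon", "Bob", "Tom"]
positives = {("Victor", "Sharon")}            # the only asserted GrandDaughter fact

# Closed-world assumption: any unasserted ground instance of the target is negative.
negatives = [p for p in product(people, repeat=2) if p not in positives]
assert len(negatives) == 15                   # 4 * 4 pairs minus the one positive
```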

  25. Illustration (III) • Most general rule: • GrandDaughter(x, y) ← • Specializations: • Father(x, y) • Father(x, z) • Father(y, x) • Father(y, z) • Father(z, y) • Father(z, x) • Female(x) • Female(y) • Equal(x, y) • Negations of each of the above

  26. Illustration (IV) • Consider the 1st specialization • GrandDaughter(x, y) ← Father(x, y) • 16 possible bindings: • x/Victor, y/Victor • x/Victor, y/Sharon • … • x/Tom, y/Tom • FoilGain: • p0 = 1 (x/Victor, y/Sharon) • n0 = 15 • p1 = 0 • n1 = 3 (only the three Father facts satisfy the body, and all correspond to negative examples) • t = 0 • So FoilGain(1st specialization) = 0

  27. Illustration (V) • Consider the 4th specialization • GrandDaughter(x, y) ← Father(y, z) • 64 possible bindings: • x/Victor, y/Victor, z/Victor • x/Victor, y/Victor, z/Sharon • … • x/Tom, y/Tom, z/Tom • FoilGain: • p0 = 1 (x/Victor, y/Sharon) • n0 = 15 • p1 = 1 (x/Victor, y/Sharon, z/Bob) • n1 = 11: (x/Victor, y/Bob, z/Victor), (x/Victor, y/Tom, z/Bob), (x/Sharon, y/Bob, z/Victor), (x/Sharon, y/Tom, z/Bob), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob), (x/Tom, y/Sharon, z/Bob), (x/Tom, y/Bob, z/Victor), (x/Sharon, y/Sharon, z/Bob), (x/Bob, y/Bob, z/Victor), (x/Tom, y/Tom, z/Bob) • t = 1 • So FoilGain(4th specialization) = 1 · (log2(1/12) − log2(1/16)) ≈ 0.415
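
Plugging the binding counts into the gain formula from slide 22 (using the foil_gain_from_counts sketch above) reproduces the value:

```python
# 4th specialization: p0 = 1, n0 = 15, p1 = 1, n1 = 11, t = 1
gain = foil_gain_from_counts(p0=1, n0=15, p1=1, n1=11, t=1)
print(round(gain, 3))    # 0.415, i.e. log2(1/12) - log2(1/16) = log2(16/12)
```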

  28. Illustration (VI) • Assume the 4th specialization is indeed selected • Partial rule: GrandDaughter(x, y) ← Father(y, z) • Still covers 11 negative examples • New set of candidate literals: • All of the previous ones • Female(z) • Equal(x, z) • Equal(y, z) • Father(z, w) • Father(w, z) • Negations of each of the above

  29. Illustration (VII) • Consider the specialization • GrandDaughter(x, y) ← Father(y, z), Equal(x, z) • 64 possible bindings: • x/Victor, y/Victor, z/Victor • x/Victor, y/Victor, z/Sharon • … • x/Tom, y/Tom, z/Tom • FoilGain: • p0 = 1 (x/Victor, y/Sharon, z/Bob) • n0 = 11 • p1 = 0 • n1 = 3: (x/Victor, y/Bob, z/Victor), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob) • t = 0 • So FoilGain(specialization) = 0

  30. Illustration (VIII) • Consider the specialization • GrandDaughter(x, y) ← Father(y, z), Father(z, x) • 64 possible bindings: • x/Victor, y/Victor, z/Victor • x/Victor, y/Victor, z/Sharon • … • x/Tom, y/Tom, z/Tom • FoilGain: • p0 = 1 (x/Victor, y/Sharon, z/Bob) • n0 = 11 • p1 = 1 (x/Victor, y/Sharon, z/Bob) • n1 = 1 (x/Victor, y/Tom, z/Bob) • t = 1 • So FoilGain(specialization) = 1 · (log2(1/2) − log2(1/12)) ≈ 2.585

  31. Illustration (IX) • Assume that specialization is indeed selected • Partial rule: GrandDaughter(x, y) ← Father(y, z), Father(z, x) • Still covers 1 negative example • No new candidate literals are generated (the added literal introduced no new variables) • Use all of the previous ones

  32. Illustration (X) • Consider the specialization • GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y) • 64 possible bindings: • x/Victor, y/Victor, z/Victor • x/Victor, y/Victor, z/Sharon • … • x/Tom, y/Tom, z/Tom • FoilGain: • p0 = 1 (x/Victor, y/Sharon, z/Bob) • n0 = 1 • p1 = 1 (x/Victor, y/Sharon, z/Bob) • n1 = 0 • t = 1 • So FoilGain(specialization) = 1 · (log2(1/1) − log2(1/2)) = 1

  33. Illustration (XI) • No negative examples are covered and all positive examples are covered • So we get the final, correct rule: GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)
