
IT/CS 811 Principles of Machine Learning and Inference

Deductive (explanation-based) learning. Prof. Gheorghe Tecuci, Learning Agents Laboratory, Computer Science Department, George Mason University.

Presentation Transcript


  1. IT/CS 811 Principles of Machine Learning and Inference Deductive (explanation-based) learning Prof. Gheorghe Tecuci Learning Agents Laboratory Computer Science Department George Mason University

  2. Overview The explanation-based learning problem The explanation-based learning method The utility problem Discussion Recommended reading

  3. Explanation-based learning problem
Given:
A training example - a positive example of a concept to be learned.
A learning goal - a specification of the desirable features of the concept to be learned from the training example.
Background knowledge - prior knowledge that allows proving (explaining) that the training example is indeed a positive example of the concept.
Determine:
A concept definition representing a deductive generalization of the training example that satisfies the learning goal.
Purpose of learning: improve the problem-solving efficiency of the agent.

  4. Explanation-based learning problem: illustration
Given a training example, the description of a particular cup:
OWNER(OBJ1, EDGAR) & COLOR(OBJ1, RED) & IS(OBJ1, LIGHT) & PART-OF(CONCAVITY1, OBJ1) & ISA(CONCAVITY1, CONCAVITY) & IS(CONCAVITY1, UPWARD-POINTING) & PART-OF(BOTTOM1, OBJ1) & ISA(BOTTOM1, BOTTOM) & IS(BOTTOM1, FLAT) & PART-OF(BODY1, OBJ1) & ISA(BODY1, BODY) & IS(BODY1, SMALL) & PART-OF(HANDLE1, OBJ1) & ISA(HANDLE1, HANDLE) & LENGTH(HANDLE1, 5)
Learning goal: find a sufficient concept definition for CUP, expressed in terms of the features used in the training example (LIGHT, HANDLE, FLAT, etc.)
Background knowledge:
∀x, LIFTABLE(x) & STABLE(x) & OPEN-VESSEL(x) → CUP(x)
∀x ∀y, IS(x, LIGHT) & PART-OF(y, x) & ISA(y, HANDLE) → LIFTABLE(x)
∀x ∀y, PART-OF(y, x) & ISA(y, BOTTOM) & IS(y, FLAT) → STABLE(x)
∀x ∀y, PART-OF(y, x) & ISA(y, CONCAVITY) & IS(y, UPWARD-POINTING) → OPEN-VESSEL(x)
Determine a deductive generalization of the training example that satisfies the learning goal:
∀x ∀y1 ∀y2 ∀y3, [PART-OF(y1, x) & ISA(y1, CONCAVITY) & IS(y1, UPWARD-POINTING) & PART-OF(y2, x) & ISA(y2, BOTTOM) & IS(y2, FLAT) & IS(x, LIGHT) & PART-OF(y3, x) & ISA(y3, HANDLE) → CUP(x)]
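
To make these ingredients concrete, here is a minimal Python sketch of one possible encoding of the training example and the background rules. The tuple-based predicate encoding, the "?x" variable convention, and the names FACTS, RULES, and OPERATIONAL are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch: the cup training example and the background rules
# encoded as Python data. The tuple encoding and the "?x" variable
# convention are illustrative choices only.

# Training example: ground facts describing the particular object OBJ1.
FACTS = [
    ("OWNER", "OBJ1", "EDGAR"), ("COLOR", "OBJ1", "RED"),
    ("IS", "OBJ1", "LIGHT"),
    ("PART-OF", "CONCAVITY1", "OBJ1"), ("ISA", "CONCAVITY1", "CONCAVITY"),
    ("IS", "CONCAVITY1", "UPWARD-POINTING"),
    ("PART-OF", "BOTTOM1", "OBJ1"), ("ISA", "BOTTOM1", "BOTTOM"),
    ("IS", "BOTTOM1", "FLAT"),
    ("PART-OF", "BODY1", "OBJ1"), ("ISA", "BODY1", "BODY"),
    ("IS", "BODY1", "SMALL"),
    ("PART-OF", "HANDLE1", "OBJ1"), ("ISA", "HANDLE1", "HANDLE"),
    ("LENGTH", "HANDLE1", 5),
]

# Background knowledge: Horn rules written as (body, head).
RULES = [
    ([("LIFTABLE", "?x"), ("STABLE", "?x"), ("OPEN-VESSEL", "?x")],
     ("CUP", "?x")),
    ([("IS", "?x", "LIGHT"), ("PART-OF", "?y", "?x"), ("ISA", "?y", "HANDLE")],
     ("LIFTABLE", "?x")),
    ([("PART-OF", "?y", "?x"), ("ISA", "?y", "BOTTOM"), ("IS", "?y", "FLAT")],
     ("STABLE", "?x")),
    ([("PART-OF", "?y", "?x"), ("ISA", "?y", "CONCAVITY"),
      ("IS", "?y", "UPWARD-POINTING")],
     ("OPEN-VESSEL", "?x")),
]

# Learning goal: express CUP in terms of the operational predicates
# that appear directly in the training example.
OPERATIONAL = {"IS", "PART-OF", "ISA", "LENGTH", "COLOR", "OWNER"}
```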

  5. Overview The explanation-based learning problem The explanation-based learning method The utility problem Discussion Recommended reading

  6. Explanation-based learning method Explain: construct an explanation that proves that the training example is an example of the concept to be learned. Generalize: generalize the explanation as much as possible so that the proof still holds, and extract from it a concept definition that satisfies the learning goal.

  7. Explain - Prove that the training example is a cup: The leaves of the proof tree are those features of the training example that allow one to recognize it as a cup. By building the proof, one isolates the relevant features of the training example.
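
This Explain step can be implemented as ordinary backward chaining over the background rules. Below is a minimal Python sketch of such a prover, an illustrative Prolog-style procedure rather than the exact algorithm from the slides; it repeats a compact copy of the facts and rules so that it runs on its own, and it prints the proof leaves, i.e. the relevant features of OBJ1.

```python
# A minimal backward-chaining sketch of the Explain step (illustrative,
# not the exact procedure from the slides). Facts and rules are repeated
# here in compact form so the sketch is self-contained.
import itertools

FACTS = [
    ("OWNER", "OBJ1", "EDGAR"), ("COLOR", "OBJ1", "RED"),
    ("IS", "OBJ1", "LIGHT"),
    ("PART-OF", "CONCAVITY1", "OBJ1"), ("ISA", "CONCAVITY1", "CONCAVITY"),
    ("IS", "CONCAVITY1", "UPWARD-POINTING"),
    ("PART-OF", "BOTTOM1", "OBJ1"), ("ISA", "BOTTOM1", "BOTTOM"),
    ("IS", "BOTTOM1", "FLAT"),
    ("PART-OF", "HANDLE1", "OBJ1"), ("ISA", "HANDLE1", "HANDLE"),
]
RULES = [  # (body, head); strings starting with "?" are variables
    ([("LIFTABLE", "?x"), ("STABLE", "?x"), ("OPEN-VESSEL", "?x")], ("CUP", "?x")),
    ([("IS", "?x", "LIGHT"), ("PART-OF", "?y", "?x"), ("ISA", "?y", "HANDLE")], ("LIFTABLE", "?x")),
    ([("PART-OF", "?y", "?x"), ("ISA", "?y", "BOTTOM"), ("IS", "?y", "FLAT")], ("STABLE", "?x")),
    ([("PART-OF", "?y", "?x"), ("ISA", "?y", "CONCAVITY"), ("IS", "?y", "UPWARD-POINTING")], ("OPEN-VESSEL", "?x")),
]

_fresh = itertools.count()          # for renaming rule variables

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def subst(lit, s):
    return tuple(walk(t, s) for t in lit)

def unify(a, b, s):
    """Extend substitution s so that literals a and b match, or return None."""
    if len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a, b):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s

def prove(goals, s):
    """Yield (substitution, proof leaves) for every way of proving all goals."""
    if not goals:
        yield s, []
        return
    goal, rest = subst(goals[0], s), goals[1:]
    for fact in FACTS:                         # operational leaf
        s1 = unify(goal, fact, s)
        if s1 is not None:
            for s2, leaves in prove(rest, s1):
                yield s2, [fact] + leaves
    for body, head in RULES:                   # expand the goal with a rule
        n = next(_fresh)
        ren = {v: f"{v}_{n}" for lit in body + [head] for v in lit if is_var(v)}
        s1 = unify(goal, subst(head, ren), s)
        if s1 is not None:
            yield from prove([subst(lit, ren) for lit in body] + rest, s1)

_, leaves = next(prove([("CUP", "OBJ1")], {}))
print(leaves)   # only the relevant features of OBJ1 appear as proof leaves
```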

  8. The semantic network representation of the cup example. The enclosed features are the relevant ones.

  9. Generalize the proof tree as much as possible so that the proof still holds: replace each rule instance with its general pattern; find the most general unification of these patterns. Matching OPEN-VESSEL(x1) with OPEN-VESSEL(x2), STABLE(x1) with STABLE(x3), and LIFTABLE(x1) with LIFTABLE(x4) gives x1 = x2 = x3 = x4 = x.

  10. The leaves of this generalized proof tree represent an operational definition of the concept CUP: ∀x1 ∀y1 ∀y2 ∀y3, [PART-OF(y1, x1) & ISA(y1, CONCAVITY) & IS(y1, UPWARD-POINTING) & PART-OF(y2, x1) & ISA(y2, BOTTOM) & IS(y2, FLAT) & IS(x1, LIGHT) & PART-OF(y3, x1) & ISA(y3, HANDLE) → CUP(x1)]
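
Once compiled, recognition reduces to checking these operational features directly. Below is a minimal Python sketch of the learned definition used as such a test; the is_cup function, the tuple encoding of facts, and the second object O2 are illustrative assumptions, not material from the slides.

```python
# A minimal sketch of using the learned operational definition directly:
# recognizing a cup becomes a single check of the operational features,
# with no proof tree (function name and facts below are illustrative).

def is_cup(x, facts):
    """Learned rule: x is a cup if it is light and has a handle, a flat
    bottom, and an upward-pointing concavity."""
    parts = [p for (pred, p, whole) in facts
             if pred == "PART-OF" and whole == x]
    def has_part(kind, feature=None):
        return any(("ISA", p, kind) in facts
                   and (feature is None or ("IS", p, feature) in facts)
                   for p in parts)
    return (("IS", x, "LIGHT") in facts
            and has_part("HANDLE")
            and has_part("BOTTOM", "FLAT")
            and has_part("CONCAVITY", "UPWARD-POINTING"))

# A second cup-like object O2 is recognized without any reasoning chain.
o2_facts = [
    ("IS", "O2", "LIGHT"),
    ("PART-OF", "H2", "O2"), ("ISA", "H2", "HANDLE"),
    ("PART-OF", "B2", "O2"), ("ISA", "B2", "BOTTOM"), ("IS", "B2", "FLAT"),
    ("PART-OF", "C2", "O2"), ("ISA", "C2", "CONCAVITY"),
    ("IS", "C2", "UPWARD-POINTING"),
]
print(is_cup("O2", o2_facts))   # True
```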

  11. Discussion How does this learning method improve the efficiency of the problem solving process?

  12. The goal of this learning strategy is to improve the efficiency of problem solving. The agent is able to perform some task, but in an inefficient manner, and we would like to teach it to perform the task faster. Consider, for instance, an agent that is able to recognize cups. The agent receives a description of a cup that includes many features. It recognizes that this object is a cup by performing a complex reasoning process, based on its prior knowledge. This process is illustrated by the proof tree which demonstrates that object o1 is indeed a cup: the object o1 is light and has a handle, therefore it is liftable; and so on, until we conclude that, being liftable, stable and an open vessel, it is a cup. However, the agent can learn from this process to recognize a cup faster. The next step in the learning process is to generalize the proof tree. While the initial tree proves that the specific object o1 is a cup, the generalized tree proves that any object x which is light, has a handle and has certain other features is a cup. Therefore, to recognize that an object o2 is a cup, the agent only needs to look for the presence of the features discovered as important; it no longer needs to build a complex proof tree, so cup recognition is done much faster. Finally, notice that the agent needs only one example to learn from. However, it needs a lot of prior knowledge to prove that this example is a cup, and providing such prior knowledge to the agent is a very complex task.

  13. Overview The explanation-based learning problem The explanation-based learning method The utility problem Discussion Recommended reading

  14. The utility problem: discussion Let us assume that we have learned an operational definition of the concept “cup”. What happens to the efficiency of recognizing cups covered by the learned rule? Why? What happens to the efficiency of recognizing cups when the input is not covered by the learned rule? Why? When does the efficiency increase? How can we ensure that the efficiency increases?

  15. The utility problem: a solution A cost/benefit formula estimates the utility of the learned rule with respect to the efficiency of the system: Utility = (AvrSavings * ApplicFreq) - AvrMatchCost where AvrSavings = the average time savings when the rule is applicable ApplicFreq = the probability that the rule is applicable when it is tested AvrMatchCost = the average time cost of matching the rule

  16. The utility problem: discussion of the solution How to estimate ApplicFreq? Maintain a statistic on the rule's use during subsequent problem solving. How to estimate AvrMatchCost? Measure the rule's matching cost during subsequent problem solving. How to estimate AvrSavings? Measuring the savings requires running the problem solver with and without the rule on each problem. Is this practical? Which would be a good heuristic? Heuristic: one possible solution is to use an estimate of the rule's average savings based on the savings that the rule would have produced on the training example from which it was learned. Conclusion: the system maintains a statistic on the rule's use during subsequent problem solving, in order to determine its utility. If the rule has a negative utility, it is discarded.
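
As a concrete sketch of this bookkeeping, the hypothetical RuleStats class below follows the slide's formula, maintains the match-cost and application statistics during subsequent problem solving, and uses the training-example savings as the estimate of AvrSavings; the class and its field names are assumptions for illustration, not part of the slides.

```python
# Minimal sketch of tracking a learned rule's utility, following
#   Utility = AvrSavings * ApplicFreq - AvrMatchCost
# (the RuleStats class and its field names are illustrative assumptions).

class RuleStats:
    def __init__(self, estimated_savings):
        # Heuristic from the slide: estimate the average savings from the
        # savings the rule would have produced on its own training example.
        self.estimated_savings = estimated_savings
        self.times_tested = 0       # how often the rule was tried
        self.times_applied = 0      # how often it actually fired
        self.total_match_cost = 0.0

    def record_attempt(self, match_cost, applied):
        self.times_tested += 1
        self.total_match_cost += match_cost
        if applied:
            self.times_applied += 1

    def utility(self):
        if self.times_tested == 0:
            return 0.0
        applic_freq = self.times_applied / self.times_tested
        avr_match_cost = self.total_match_cost / self.times_tested
        return self.estimated_savings * applic_freq - avr_match_cost

# Usage: after each problem-solving episode, update the statistics and
# discard the rule once its measured utility becomes negative.
stats = RuleStats(estimated_savings=12.0)
stats.record_attempt(match_cost=1.5, applied=True)
stats.record_attempt(match_cost=1.5, applied=False)
if stats.utility() < 0:
    print("discard the rule")
```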

  17. The utility of the learned rules • Explanation-based learning has been introduced as a method for improving the efficiency of a system. • Let us consider again the explanation-based system learning an operational definition of the concept CUP. The system has learned a new rule for recognizing a certain kind of cup. This rule does not contain any new knowledge; it is just a compilation of some other rules from the knowledge base. • Adding this new rule to the KB has the following effects on the system's efficiency: • - it increases the efficiency in recognizing cups covered by the learned rule; • - it decreases the efficiency in recognizing cups that are not covered by the learned rule. • Adding the operational definition of cup to the KB will increase the global performance of the system only if the first effect is more important than the second one. • Both these effects may be combined into a cost/benefit formula that indicates the utility of the rule with respect to the efficiency of the system: • Utility = (AvrSavings * ApplicFreq) - AvrMatchCost • where • AvrSavings = the average time savings when the rule is applicable • ApplicFreq = the probability that the rule is applicable when it is tested • AvrMatchCost = the average time cost of matching the rule • After learning a rule, the system should maintain a statistic on the rule's use during subsequent problem solving, in order to determine its utility. If the rule has a negative utility, it is discarded. • Unfortunately, although the match cost and the application frequency can be directly measured during subsequent problem solving, it is more difficult to measure the savings. Doing so would require running the problem solver with and without the rule on each problem, and this would have to be done for all rules. • One possible solution is to use an estimate of the rule's average savings based on the savings that the rule would have produced on the training example from which it was learned.

  18. Overview The explanation-based learning problem The explanation-based learning method The utility problem Discussion Recommended reading

  19. Exercise Given • A training example The following example of “supports”: [ book(book1) & material(book1, rigid) & cup(cup1) & material(cup1, rigid) & above(cup1, book1) & touches(cup1, book1) ] => supports(book1, cup1) • Learning goal Find a sufficient concept definition for “supports”, expressed in terms of the features used in the training example. • Background knowledge ∀x ∀y [on-top-of(y, x) & material(x, rigid) → supports(x, y)] ∀x ∀y [above(x, y) & touches(x, y) → on-top-of(x, y)] ∀x ∀y ∀z [above(x, y) & above(y, z) → above(x, z)] Determine A deductive generalization of the training example that satisfies the learning goal.

  20. Discussion Do we need a training example to learn an operational definition of the concept? Why? Answer: The learner does not need a training example. It can simply build proof trees top-down, starting with an abstract definition of the concept and growing the tree until the leaves are operational features. However, without a training example the learner would learn many operational definitions. The training example focuses the learner on the most typical example.

  21. Discussion What is the classification accuracy of deductive learning? What is the classification accuracy of an example classified as positive? Why? What is the classification accuracy of an example classified as negative? Why? How could one improve the classification accuracy?

  22. Learning from several positive examples Learn an operational definition from the first example. Consider this as the first term of a disjunctive definition. Eliminate all the examples already covered by this definition. Learn another operational definition from an uncovered example. Eliminate all the examples covered by this new definition and add it as a new term in the disjunctive definition of the concept. Continue this process until there is no training example left.
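
A minimal Python sketch of this covering loop is given below; ebl_learn_one (the single-example EBL procedure) and covers (the coverage test) are hypothetical placeholders, not functions defined in the slides.

```python
# Minimal sketch of the covering loop for several positive examples.
# ebl_learn_one and covers are hypothetical placeholders supplied by
# the caller; this is an illustration, not the slides' exact procedure.

def learn_disjunctive_definition(examples, ebl_learn_one, covers):
    """Return a list of operational definitions (a disjunction) that
    together cover all the positive examples."""
    remaining = list(examples)
    disjuncts = []
    while remaining:
        # Learn an operational definition from one uncovered example.
        definition = ebl_learn_one(remaining[0])
        disjuncts.append(definition)
        # Remove every example the new definition already covers.
        remaining = [e for e in remaining if not covers(definition, e)]
    return disjuncts
```

Each iteration adds one disjunct and removes the examples it covers, so the loop stops once every positive example is covered by some term of the disjunction.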

  23. Discussion How to use negative examples? Develop a theory of why something is a negative example of some concept and apply the standard method. Does such an approach make sense when we already have a theory that explains positive examples? Why? Sometimes it is easier to explain that something is a negative example. Could you provide an example of such a case?

  24. Discussion How could we apply explanation-based learning to learn inference rules from facts? How could we apply explanation-based learning to learn macro-operators from action plans?

  25. Learning inference rules: illustration
Given • Training example An input fact: RICE-AREA(VIETNAM) • Learning goal Learn a general inference rule allowing the direct derivation of the input fact from facts explicitly represented in the background knowledge (e.g., RAINFALL, CLIMATE, SOIL) • Background knowledge RAINFALL(VIETNAM, HEAVY); CLIMATE(VIETNAM, SUBTROPICAL); SOIL(VIETNAM, RED-SOIL); LOCATION(VIETNAM, SE-ASIA)
∀x, CLIMATE(x, SUBTROPICAL) → TEMPERATURE(x, WARM)
∀x, RAINFALL(x, HEAVY) → WATER-SUPPLY(x, HIGH)
∀x, SOIL(x, RED-SOIL) → SOIL(x, FERTILE-SOIL)
∀x, WATER-SUPPLY(x, HIGH) & TEMPERATURE(x, WARM) & SOIL(x, FERTILE-SOIL) → RICE-AREA(x)
This background knowledge could be used in proving the input fact.
Determine • A general inference rule that allows the direct derivation of the input from the facts stored in the knowledge base: ∀x, RAINFALL(x, HEAVY) & CLIMATE(x, SUBTROPICAL) & SOIL(x, RED-SOIL) → RICE-AREA(x)
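
To see why the learned rule is useful, the sketch below applies the compiled rule directly to the stored facts, with no intermediate chaining through TEMPERATURE, WATER-SUPPLY, or FERTILE-SOIL; the dictionary-based fact store and the rice_area function are illustrative assumptions.

```python
# Minimal sketch: the learned rule matches the explicitly stored facts
# directly, in a single rule application (the fact-store encoding below
# is an illustrative assumption).

FACTS = {
    ("RAINFALL", "VIETNAM"): "HEAVY",
    ("CLIMATE", "VIETNAM"): "SUBTROPICAL",
    ("SOIL", "VIETNAM"): "RED-SOIL",
    ("LOCATION", "VIETNAM"): "SE-ASIA",
}

def rice_area(country):
    """Learned rule: RAINFALL(x, HEAVY) & CLIMATE(x, SUBTROPICAL) &
    SOIL(x, RED-SOIL) -> RICE-AREA(x)."""
    return (FACTS.get(("RAINFALL", country)) == "HEAVY"
            and FACTS.get(("CLIMATE", country)) == "SUBTROPICAL"
            and FACTS.get(("SOIL", country)) == "RED-SOIL")

print(rice_area("VIETNAM"))   # True, derived without intermediate inferences
```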

  26. Learning macro-operators: illustration Consider the following situation that involves a robot that can go from one room to another and can push boxes through the doors: InRoom(Robot, Room1) InRoom(Box, Room2) Connects(Door1, Room1, Room2) Connects(Door2, Room2, Room3) Connects(Door3, Room1, Room4) Apply explanation-based learning to learn a general macro-operator from the following problem solving episode: to achieve the goal InRoom(Box, Room1), perform the actions GoThru(Robot, Door1, Room1, Room2) and PushThru(Robot, Box, Door1, Room2, Room1). The knowledge of the system consists of the action models and the inference rule from the next slide.

  27. Learning macro-operators: illustration (cont.)
GoThru(a, d, r1, r2) ; robot a goes through door d from room r1 to room r2
Preconditions: InRoom(a, r1) ; a is in room r1
Connects(d, r1, r2) ; door d connects room r1 with room r2
Effects: InRoom(a, r2) ; a is in room r2
PushThru(a, o, d, r1, r2) ; a pushes box o through door d from room r1 to room r2
Preconditions: InRoom(a, r1) ; a is in room r1
InRoom(o, r1) ; o is in room r1
Connects(d, r1, r2) ; door d connects room r1 with room r2
Effects: InRoom(a, r2) ; a is in room r2
InRoom(o, r2) ; o is in room r2
Connects(d, r1, r2) => Connects(d, r2, r1) ; if d connects r1 with r2, then it also connects r2 with r1.

  28. Learning macro-operators: illustration (cont.)
GoAndPushThru(a, o, d1, d2, r1, r2, r3) ; the robot goes from room r1 into room r2 and pushes the box into room r3
Preconditions: InRoom(a, r1) ; a is in room r1
InRoom(o, r2) ; o is in room r2
Connects(d1, r1, r2) ; door d1 connects room r1 with room r2
Connects(d2, r2, r3) ; door d2 connects room r2 with room r3
Effects: InRoom(a, r3) ; a is in room r3
InRoom(o, r3) ; o is in room r3
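
A minimal Python sketch of how the two operators compose into this macro-operator follows. The compose function and the "?a" variable notation are illustrative assumptions; delete lists are ignored, which is why the intermediate effect InRoom(a, r2) also survives in the sketch even though the slide lists only the final effects.

```python
# Minimal sketch of composing two STRIPS-style operators into a
# macro-operator (delete lists are ignored; the encoding with "?a",
# "?d1", ... variables is an illustrative assumption).

def compose(op1, op2):
    """Sequence op1 then op2: the macro needs op1's preconditions plus
    every precondition of op2 not already achieved by op1, and it
    achieves the union of both effect sets (deletes ignored)."""
    pre1, eff1 = op1
    pre2, eff2 = op2
    macro_pre = list(pre1) + [p for p in pre2 if p not in eff1]
    macro_eff = list(eff1) + [e for e in eff2 if e not in eff1]
    return macro_pre, macro_eff

# GoThru(?a, ?d1, ?r1, ?r2)
go_thru = ([("InRoom", "?a", "?r1"), ("Connects", "?d1", "?r1", "?r2")],
           [("InRoom", "?a", "?r2")])

# PushThru(?a, ?o, ?d2, ?r2, ?r3)
push_thru = ([("InRoom", "?a", "?r2"), ("InRoom", "?o", "?r2"),
              ("Connects", "?d2", "?r2", "?r3")],
             [("InRoom", "?a", "?r3"), ("InRoom", "?o", "?r3")])

pre, eff = compose(go_thru, push_thru)
print(pre)  # the four preconditions of GoAndPushThru listed on the slide
print(eff)  # also keeps the intermediate InRoom(?a, ?r2), since deletes are ignored
```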

  29. Discussion How does deductive learning compare with inductive learning? What comparison criteria to consider?
Input examples: EIL (empirical inductive learning) – many, both positive and negative; EBL – only one positive example.
Background knowledge: EIL – very little needed (e.g., a generalization hierarchy); EBL – a complete and correct domain theory.
Type of inference: EIL – inductive; EBL – deductive.
Result of learning: EIL – improves the system’s competence; EBL – improves the system’s efficiency.

  30. General features of explanation-based learning • Needs only one example • Requires complete knowledge about the concept (which makes this learning strategy impractical). • Improves agent's efficiency in problem solving • Shows the importance of explanations in learning

  31. Recommended reading Mitchell T.M., Machine Learning, Chapter 11: Analytical Learning, pp. 307-333, McGraw-Hill, 1997. Mitchell T.M., Keller R.M., Kedar-Cabelli S.T., Explanation-Based Generalization: A Unifying View, Machine Learning 1, pp. 47-80, 1986. Also in Shavlik J.W. and Dietterich T.G. (eds), Readings in Machine Learning, Morgan Kaufmann, 1990. DeJong G., Mooney R., Explanation-Based Learning: An Alternative View, Machine Learning 2, 1986. Also in Shavlik J.W. and Dietterich T.G. (eds), Readings in Machine Learning, Morgan Kaufmann, 1990. Tecuci G. and Kodratoff Y., Apprenticeship Learning in Imperfect Domain Theories, in Kodratoff Y. and Michalski R. (eds), Machine Learning, vol. 3, Morgan Kaufmann, 1990. Minton S., Quantitative Results Concerning the Utility of Explanation-Based Learning, Artificial Intelligence, vol. 42, pp. 363-392, 1990. Also in Shavlik J.W. and Dietterich T.G. (eds), Readings in Machine Learning, Morgan Kaufmann, 1990.
