
Learning From Observations



Presentation Transcript


  1. Learning From Observations Marco Loog

  2. Learning from Observations • Idea is that percepts should be used for improving the agent’s ability to act in the future, not only for acting per se

  3. Outline • Learning agents • Inductive learning • Decision tree learning

  4. Learning • Learning is essential for unknown environments, i.e., when designer lacks omniscience • Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down • Learning modifies the agent’s decision mechanisms to improve performance

  5. Learning Agent [Revisited] • Four conceptual components • Learning element : responsible for making improvements • Performance element : takes percepts and decides on actions • Critic : provides feedback on how agent is doing and determines how performance element should be modified • Problem generator : responsible for suggesting actions leading to new and informative experience

  6. Figure 2.15 [Revisited]

  7. Learning Element • Design of learning element is affected by • Which components of the performance element are to be learned • What feedback is available to learn these components • What representation is used for the components

  8. Agent’s Components • Direct mapping from conditions on current state to actions [instructor : brake!] • Means to infer relevant properties about world from percept sequence [learning from images] • Info about evolution of the world and results of possible actions [braking on wet road] • Utility indicating desirability of world state [no tip / component of utility function] • ... • Each component can be learned from appropriate feedback

  9. Types of Feedback • Supervised learning : correct answers for each example • Unsupervised learning : correct answers not given • Reinforcement learning : occasional rewards

  10. Inductive Learning • Simplest form : learn a function from examples • I.e. learn the target function f • Examples : input / output pairs (x, f(x))

  11. Inductive Learning • Problem • Find a hypothesis h, such that h ≈ f, based on given training set of examples • = highly simplified model of real learning • Ignores prior knowledge • Assumes examples are given

  12. Hypothesis • A good hypothesis will generalize well, i.e., it is able to predict well on unseen examples

  13. Inductive Learning Method • E.g. function fitting • Goal is to estimate real underlying functional relationship from example observations

  14. Inductive Learning Method • Construct h to agree with f on training set

  15. Inductive Learning Method • Construct h to agree with f on training set

  16. Inductive Learning Method • Construct h to agree with f on training set

  17. Inductive Learning Method • Construct h to agree with f on training set • h is consistent if it agrees with f on all examples

  18. Inductive Learning Method • Construct h to agree with f on training set • h is consistent if it agrees with f on all examples

  19. So, which ‘Fit’ is Best?

  20. So, which ‘Fit’ is Best? • Ockham’s razor : prefer simplest hypothesis consistent with the data

  21. So, which ‘Fit’ is Best? • Ockham’s razor : prefer simplest hypothesis consistent with the data • What’s consistent? What’s simple?

  22. Hypothesis • A good hypothesis will generalize well, i.e., it is able to predict well on unseen examples • Not-exactly-consistent may be preferable to exactly consistent • Nondeterministic behavior • Consistency is not even always possible • Nondeterministic functions : trade-off between complexity of hypothesis and degree of fit
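A minimal sketch of this fit-versus-complexity trade-off: fit polynomials of increasing degree to a small noisy sample and compare training and test error. The target function, sample size, and degrees below are illustrative assumptions, not taken from the slides.

```python
# Sketch: fitting hypotheses of increasing complexity to noisy examples (x, f(x)).
# The target f, the sample size, and the polynomial degrees are all illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)                        # 10 training inputs
f = lambda v: np.sin(2 * np.pi * v)              # hypothetical target function f
y = f(x) + rng.normal(scale=0.1, size=x.shape)   # noisy observations of f(x)

x_test = np.linspace(0, 1, 100)                  # unseen examples
for degree in (1, 3, 9):                         # simple ... exactly consistent
    h = np.poly1d(np.polyfit(x, y, degree))      # hypothesis h: degree-d polynomial
    train_err = np.mean((h(x) - y) ** 2)
    test_err = np.mean((h(x_test) - f(x_test)) ** 2)
    print(f"degree {degree}: train error {train_err:.4f}, test error {test_err:.4f}")

# The degree-9 polynomial can fit the 10 points (almost) exactly, yet it
# typically generalizes worse than a simpler hypothesis: Ockham's razor in action.
```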

  23. Decision Trees • ‘Decision tree induction is one of the simplest, and yet most successful forms of learning algorithm’ • Good intro to the area of inductive learning

  24. Decision Tree • Input : object or situation described by set of attributes / features • Output [discrete or continuous] : decision / prediction • Continuous -> regression • Discrete -> classification • Boolean classification : output is binary / ‘true’ or ‘false’

  25. Decision Tree • Performs a sequence of tests in order to reach a decision • Tree [as in : graph without closed loops] • Internal node : test of the value of single property • Branches labeled with possible test outcomes • Leaf node : specifies output value • Resembles a ‘how to’ manual

  26. Decide whether to wait for a Table at a Restaurant • Based on the following attributes • Alternate : is there an alternative restaurant nearby? • Bar : is there a comfortable bar area to wait in? • Fri/Sat : is today Friday or Saturday? • Hungry : are we hungry? • Patrons : number of people in the restaurant [None, Some, Full] • Price : price range [$, $$, $$$] • Raining : is it raining outside? • Reservation : have we made a reservation? • Type : kind of restaurant [French, Italian, Thai, Burger] • WaitEstimate : estimated waiting time [0-10, 10-30, 30-60, >60]
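To make the representation concrete, one restaurant example could be encoded as a record of attribute values plus a Boolean decision. Only the attribute names come from the slide; the particular values and the label below are made up for illustration.

```python
# Hypothetical encoding of one restaurant example; attribute names follow the
# slide, the values and the label are invented for illustration only.
example = {
    "Alternate": True, "Bar": False, "Fri/Sat": False, "Hungry": True,
    "Patrons": "Full", "Price": "$", "Raining": False, "Reservation": False,
    "Type": "Thai", "WaitEstimate": "30-60",
}
label = False  # the decision: do we wait for a table?
```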

  27. Attribute-Based Representations • Examples of decisions

  28. Decision Tree • Possible representation for hypotheses • Below is the ‘true’ tree [note Type? plays no role]

  29. Expressiveness • Decision trees can express any function of the input attributes • E.g., for Boolean functions, each truth table row corresponds to a path to a leaf

  30. Expressiveness • There is a consistent decision tree for any training set with one path to leaf for each example [unless f nondeterministic in x] but it probably won’t generalize to new examples • Prefer to find more compact decision trees [This Ockham again...]

  31. Attribute-Based Representations • Such a representation is simply a lookup table • Cannot generalize to unseen examples

  32. Decision Tree • Applying Ockham’s razor : smallest tree consistent with examples

  33. Decision Tree • Applying Ockham’s razor : smallest tree consistent with examples • Able to generalize to unseen examples • No need to program everything out / specify everything in detail • ‘true’ tree = smallest tree?

  34. Decision Tree Learning • Unfortunately, finding the ‘smallest’ tree is intractable in general • New aim : find a ‘smallish’ tree consistent with the training examples • Idea : [recursively] choose ‘most significant’ attribute as root of [sub]tree • ‘Most significant’ : making the most difference to the classification
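A sketch of this greedy recursion in Python; the example representation (attribute dictionary plus label), the helper names, and the majority fallback for exhausted attributes are my own assumptions, not the exact algorithm from the slides.

```python
# Sketch of greedy decision-tree learning: recursively pick the "most significant"
# attribute and split on it. Helper names and tie-breaking choices are assumptions.
from collections import Counter

def plurality(examples):
    """Most common label among the examples (fallback value for a leaf)."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def learn_tree(examples, attributes, choose_best_attribute):
    """examples: list of (attribute_dict, label); returns a nested-dict tree or a leaf label."""
    labels = {label for _, label in examples}
    if len(labels) == 1:                 # all examples agree: make a leaf
        return labels.pop()
    if not attributes:                   # attributes exhausted: majority leaf
        return plurality(examples)
    best = choose_best_attribute(examples, attributes)   # e.g. by information gain
    tree = {best: {}}
    for value in {attrs[best] for attrs, _ in examples}:  # one branch per observed value
        subset = [(a, l) for a, l in examples if a[best] == value]
        remaining = [x for x in attributes if x != best]
        tree[best][value] = learn_tree(subset, remaining, choose_best_attribute)
    return tree
```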

  35. Choosing an Attribute Test • Idea : a good attribute splits the examples into subsets that are [ideally] ‘all positive’ or ‘all negative’ • Patrons? is a better choice

  36. Using Information Theory • Information content [entropy] : • I(P(v1), … , P(vn)) = Σi=1..n −P(vi) log2 P(vi) • For a training set containing p positive examples and n negative examples : I(p/(p+n), n/(p+n)) • Specifies the minimum number of bits of information needed to encode the classification of an arbitrary member
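The entropy formula as a small runnable sketch; the function name is my own choice, not from the slides.

```python
# General information content I(P(v1), ..., P(vn)) = sum_i -P(vi) * log2 P(vi),
# with the convention 0 * log2(0) = 0. Function name is my own.
from math import log2

def information_content(probabilities):
    return -sum(p * log2(p) for p in probabilities if p)

print(information_content([6 / 12, 6 / 12]))  # 1.0 bit, the case used on slide 39
```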

  37. Information Gain • Chosen attribute A divides training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values • Information gain [IG] : expected reduction in entropy caused by partitioning the examples

  38. Information Gain • Information gain [IG] : expected reduction in entropy caused by partitioning the examples • Choose the attribute with the largest IG • [Wanna know more : Google it...]

  39. Information Gain [E.g.] • For the training set : p = n = 6, I(6/12, 6/12) = 1 bit • Consider Patrons? and Type? [and others] • Patrons has the highest IG of all attributes and so is chosen as the root • Why is IG of Type? equal to zero?
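A sketch of that calculation; the per-value positive/negative counts below are the usual split of the 12 restaurant examples in the textbook and should be read as assumed here.

```python
# Information gain = entropy before splitting - expected entropy after splitting.
from math import log2

def entropy(p, n):
    """Entropy I(p/(p+n), n/(p+n)) of a p-positive / n-negative set."""
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c)

def information_gain(counts, p=6, n=6):
    """counts: per-branch (positive, negative) pairs for one attribute."""
    total = p + n
    remainder = sum((pi + ni) / total * entropy(pi, ni) for pi, ni in counts)
    return entropy(p, n) - remainder

# Per-value (positive, negative) counts; the usual split of the 12 restaurant
# examples in the textbook, treated as an assumption here.
patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
type_ = [(1, 1), (1, 1), (2, 2), (2, 2)]    # French, Italian, Thai, Burger

print(information_gain(patrons))  # ~0.541 bits -> Patrons? is chosen as the root
print(information_gain(type_))    # 0.0 bits: every branch keeps the same 50/50 mix
```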

  40. Decision Tree Learning • Plenty of other measures for ‘best’ attributes possible...

  41. Back to The Example... • ‘Training data’

  42. Decision Tree Learned • Based on the 12 examples; substantially simpler solution than the ‘true’ tree • More complex hypothesis isn’t justified by small amount of data

  43. Performance Measurement • How do we know that h ≈ f? • Or : how the h*ll do we know that our decision tree performs well? • Most often we don’t know... for sure

  44. Performance Measurement • However • prediction quality can be estimated using results from computational / statistical learning theory [PAC-learning] • Or we could, for example, simply try h on a new test set of examples • The crux being of course that there should actually be a new test set... • If no test set is available, several possibilities exist for creating ‘training’ and ‘test’ sets from the available data

  45. Performance Measurement • Learning curve : ‘%’ correct on test set as function of training set size
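One way such a curve could be estimated, sketched with scikit-learn purely for illustration; the dataset and the classifier choice are assumptions, not part of the slides.

```python
# Sketch of a learning curve: accuracy on a fixed held-out test set as a
# function of training-set size. Dataset and learner are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Keep a fixed test set that the learner never trains on.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for m in (10, 20, 40, 80, len(X_train)):          # growing training-set sizes
    clf = DecisionTreeClassifier(random_state=0).fit(X_train[:m], y_train[:m])
    acc = clf.score(X_test, y_test)               # '%' correct on the test set
    print(f"{m:3d} training examples -> test accuracy {acc:.2f}")
```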

  46. Bad Conduct in AI • Training on the test set! • May happen before you know it • Often very hard to justify... if at all possible • All I can say is : try to avoid it

  47. Ensemble-Learning-in-1-Slide • Idea : a collection [ensemble] of hypotheses is used / their predictions are combined • Motivation : hope that the ensemble is much less likely to misclassify [obviously!] • E.g. independence can be exploited • Examples : majority voting / boosting • Ensemble learning simply creates a new, more expressive hypothesis space
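A minimal sketch of majority voting; the three ‘hypotheses’ below are made-up predictor functions standing in for learned models.

```python
# Minimal majority-voting ensemble over Boolean predictions.
# The three hypotheses are invented stand-ins for learned classifiers.
from collections import Counter

def majority_vote(hypotheses, x):
    votes = Counter(h(x) for h in hypotheses)
    return votes.most_common(1)[0][0]

h1 = lambda x: x > 3          # three hypothetical, imperfect hypotheses
h2 = lambda x: x > 5
h3 = lambda x: x % 2 == 0

print(majority_vote([h1, h2, h3], 4))   # True: two of the three vote True
# If the hypotheses err independently, the ensemble is wrong only when a
# majority of them are wrong at once, which is (hopefully) much less likely.
```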

  48. Summary • In general : learning needed for unknown environments or lazy designers • Learning agent = performance element + learning element [Chapter 2] • Supervised learning : the aim is to find simple hypothesis [approximately] consistent with training examples • Decision tree learning using IG • Difficult to measure learning performance • Learning curve

  49. Next Week • More...
