
ITCS 3153 Artificial Intelligence



  1. ITCS 3153 Artificial Intelligence Lecture 24: Statistical Learning (Chapter 20)

  2. AI: Creating rational agents • The pursuit of autonomous, rational agents • It’s all about search • Varying amounts of model information • tree searching (informed/uninformed) • simulated annealing • value/policy iteration • Searching for an explanation of observations • used to develop a model

  3. Searching for an explanation of observations • If I can explain my observations… • can I predict the future? • Can I explain why ten coin tosses came up 6 heads and 4 tails? • Can I predict the 11th coin toss?

  4. Running example: Candy • Surprise Candy • Comes in two flavors • cherry (yum) • lime (yuk) • All candy is wrapped in the same opaque wrapper • Candy is packaged in large bags containing five different allocations of cherry and lime

  5. Statistics • Given a bag of candy, what distribution of flavors will it have? • Let H be the random variable corresponding to your hypothesis • H1 = all cherry, H2 = all lime, H3 = 50/50 cherry/lime • As you open pieces of candy, let each observation D1, D2, D3, … be either cherry or lime • D1 = cherry, D2 = cherry, D3 = lime, … • Predict the flavor of the next piece of candy • If the data caused you to believe H1 was correct, you’d predict cherry

  6. Bayesian Learning • Use the available data to calculate the probability of each hypothesis and make a prediction • Because each hypothesis has its own likelihood, we use all their relative likelihoods when making a prediction • Probabilistic inference using Bayes’ rule: • P(hi | d) = α P(d | hi) P(hi) • The probability that hypothesis hi is the true one, given that you observed data sequence d, equals the probability of seeing data sequence d generated by hypothesis hi (the likelihood) multiplied by the prior probability of hypothesis hi (the hypothesis prior), up to the normalizing constant α

  7. Prediction of an unknown quantity X • The probability of X given the observed data d is a weighted combination of how strongly each hypothesis predicts X, weighted by how probable that hypothesis is given d (see the formula below) • Even if a hypothesis assigns X a high probability, its contribution is discounted if the hypothesis itself is unlikely to be true given the observations d
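Written out as a formula (the standard Bayesian prediction rule, in the notation of the previous slides):

```latex
% Prediction of an unknown quantity X from observed data d:
% each hypothesis's prediction of X, weighted by its posterior probability.
\[
P(X \mid \mathbf{d}) \;=\; \sum_{i} P(X \mid h_i)\, P(h_i \mid \mathbf{d})
\]
```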

  8. Details of Bayes’ rule • All observations within d are • independent • identically distributed • The probability of a hypothesis generating a series of observations d • is the product of the probabilities of generating each individual observation (see the formula below)
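For independent, identically distributed observations, the likelihood of the whole sequence factors into a product over its components:

```latex
% Likelihood of a data sequence d = d1, d2, ..., dN under hypothesis h_i.
\[
P(\mathbf{d} \mid h_i) \;=\; \prod_{j} P(d_j \mid h_i)
\]
```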

  9. Example • Prior distribution across hypotheses • h1 = 100% cherry = 0.1 • h2 = 75/25 cherry/lime = 0.2 • h3 = 50/50 cherry/lime = 0.4 • h4 = 25/75 cherry/lime = 0.2 • h5 = 100% lime = 0.1 • Prediction after observing 10 lime candies • P(d|h3) = (0.5)^10
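Concretely, if the bag really were h3 (50/50), the probability of drawing 10 lime candies in a row would be:

```latex
\[
P(\mathbf{d} \mid h_3) = (0.5)^{10} = \tfrac{1}{1024} \approx 0.001
\]
```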

  10. Example • The probability of each hypothesis starts at its prior value <.1, .2, .4, .2, .1> • Unnormalized posterior of hypothesis h3 as 10 lime candies are observed (see the sketch below) • P(d|h3) · P(h3) = (0.5)^10 · 0.4
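A minimal Python sketch of this posterior update, assuming the priors <.1, .2, .4, .2, .1> and the per-hypothesis lime probabilities from the example (the variable and function names here are illustrative, not from the lecture):

```python
# Posterior over the five candy-bag hypotheses after observing only lime candies.
priors = [0.1, 0.2, 0.4, 0.2, 0.1]    # P(h1) .. P(h5)
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]  # P(lime | h_i) for each hypothesis

def posteriors(num_limes):
    """Return P(h_i | d) after observing num_limes lime candies in a row."""
    # Unnormalized posterior: P(d | h_i) * P(h_i), with P(d | h_i) = P(lime | h_i)^num_limes
    unnorm = [(p ** num_limes) * prior for p, prior in zip(p_lime, priors)]
    alpha = 1.0 / sum(unnorm)         # normalizing constant
    return [alpha * u for u in unnorm]

# Watch the posterior shift toward the all-lime hypothesis as limes pile up.
for n in range(11):
    print(n, [round(p, 4) for p in posteriors(n)])
```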

  11. Prediction of the 11th candy • If we’ve observed 10 lime candies, is the 11th lime? • Build a weighted sum of each hypothesis’s prediction (see the sketch below) • The weighted sum can become expensive to compute • Instead, use only the most probable hypothesis and ignore the others • MAP: the maximum a posteriori hypothesis given the observations
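Continuing the sketch above, the full Bayesian prediction and the MAP shortcut for the 11th candy can be compared like this (again a sketch; `posteriors` and `p_lime` are the illustrative names from the previous block):

```python
def predict_lime(num_limes):
    """Bayesian prediction: P(next = lime | d) = sum_i P(lime | h_i) * P(h_i | d)."""
    return sum(p * w for p, w in zip(p_lime, posteriors(num_limes)))

def predict_lime_map(num_limes):
    """MAP approximation: trust only the single most probable hypothesis."""
    post = posteriors(num_limes)
    best = max(range(len(post)), key=lambda i: post[i])
    return p_lime[best]

print(predict_lime(10), predict_lime_map(10))  # both close to 1 after 10 limes
```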

  12. Overfitting • Remember overfitting from the neural-network discussion? • The number of hypotheses influences predictions • Too many hypotheses can lead to overfitting

  13. Overfitting Example • Say we’ve observed 3 cherry and 7 lime candies • Consider our 5 hypotheses from before • the prediction is a weighted average of the 5 • Now consider having 11 hypotheses, one for each possible proportion of cherry (0/10, 1/10, …, 10/10) • The hypothesis matching the observed 3/7 split takes (nearly) all the weight while the others drop toward 0, so the prediction simply echoes the sample

  14. Learning with Data • First, parameter learning • Let’s create a hypothesis hθ for candies that says the probability a cherry is drawn is θ • If we unwrap N candies and c are cherry, what is θ? • The (log) likelihood is given below
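Reconstructed from the setup above (N candies unwrapped, c of them cherry, the remaining N − c lime), the log-likelihood is:

```latex
\[
L(\mathbf{d} \mid h_\theta) = \log P(\mathbf{d} \mid h_\theta)
  = \log \prod_{j=1}^{N} P(d_j \mid h_\theta)
  = c \log \theta + (N - c) \log (1 - \theta)
\]
```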

  15. Learning with Data • We want to find the θ that maximizes the log-likelihood • differentiate L with respect to θ and set the derivative to 0 • In general this maximization may not have a closed-form solution, and iterative numerical methods may be needed
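For this simple model the maximization does have a closed form; the numerical-methods caveat applies to more complex likelihoods:

```latex
\[
\frac{dL}{d\theta} = \frac{c}{\theta} - \frac{N - c}{1 - \theta} = 0
\quad\Longrightarrow\quad
\theta = \frac{c}{N}
\]
```

That is, the maximum-likelihood estimate of θ is simply the observed fraction of cherry candies.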
