
Artificial Intelligence Should Be About Predictions

Presentation Transcript


  1. Artificial Intelligence Should Be About Predictions Rich Sutton AT&T Labs with special thanks to Michael Littman, Doina Precup, Satinder Singh, David McAllester, Peter Stone

  2. Outline • AI at an Impasse • A Predictive Proposal • Some of the Machinery • Prospects and Conclusion

  3. It’s Hard to Build Large AI Systems • Brittleness • Unforeseen interactions • Scaling • Requires too much manual complexity management • people must understand, intervene, patch and tune • like programming • Need more self-maintenance • learning, verification • internal coherence of knowledge and experience

  4. AI at an Impasse • We can’t go beyond ourselves • We can’t make AI systems more complex than we can understand • All the representations • All the possible meanings • All the interactions • Beyond that, we get bogged down • Brittleness • Continual manual tuning • Teams of people diverge on rep’ns and meanings • No big return for our efforts

  5. What keeps the knowledge in an AI system correct? • People do! • But eventually this is a dead end. • The key to a successful AI is that it can tell for itself if it is working correctly.

  6. The Verification Principle • An AI system can successfully maintain knowledge only to the extent that it can verify that knowledge itself

  7. Two Strategies for Self-maintenance • Logical self-consistency • Check statements for consistency with each other • Establishes an internal coherence within the AI • But tells us nothing about the external world • Consistency with data • Make predictions, see if they happen • Establishes a coherence between the AI and its world
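
A minimal sketch of the second strategy, in hypothetical terms (the Prediction class, verify method, and 0.5 threshold are all my own inventions, not from the talk): a piece of knowledge survives only as long as its predictions keep matching the data.

```python
class Prediction:
    def __init__(self, name, predict_fn):
        self.name = name
        self.predict_fn = predict_fn  # maps a state to a predicted outcome
        self.avg_error = 0.0          # running average of prediction error

    def verify(self, state, actual_outcome, step_size=0.1):
        """Compare the prediction against what the world actually did."""
        error = abs(self.predict_fn(state) - actual_outcome)
        self.avg_error += step_size * (error - self.avg_error)
        return self.avg_error < 0.5   # keep the knowledge only if it verifies
```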

  8. Outline • AI at an Impasse • A Predictive Proposal • Some of the Machinery • Prospects and Conclusion

  9. Mind is About Predictions • Hypothesis: Knowledge is predictive, about what-leads-to-what under what ways of behaving. What will I see if I go around the corner? Objects: what will I see if I turn this over? Active vision: what will I see if I look at my hand? Value functions: what is the most reward I know how to get? Such knowledge is learnable, chainable • Hypothesis: Mental activity is working with predictions: learning them, combining them to produce new predictions (reasoning), converting them to action (planning, reinforcement learning), figuring out which are most useful

  10. Philosophical and Psychological Roots • Like classical British empiricism (1650–1800) • Knowledge is about experience • Experience is central • But not anti-nativist (evolutionary experience) • Emphasizing sequential rather than simultaneous events • Replace association/contiguity with prediction/contingency • Close to Tolman’s “Expectancy Theory” (1932–1950) • Cognitive maps, vicarious trial and error • Psychology struggled to make it a science (1890–1950) • Introspection • Behaviorism, operational definitions • Objectivity

  11. Modern Computational View of Mind • OK to talk about insides of minds • OK to talk about the function and purpose of a design • We talk about Why • Why a system works • Why it should compute X and in manner Y • Why such a system should achieve purpose Z • This is new, and resolves classical struggles • Servo-mechanisms, state-transition probabilities • Utility and decision theory • Information as signal – subjective (private) yet clear • Purpose defines and constrains mental constructs

  12. Informational View of Mind • Mind does information processing • Mind exchanges information with the world • Only experience is known for sure • Anything more public or “objective” is suspect • World is an I-O entity, a black box • Although we often seem to talk about what is inside, all we can sensibly talk about is I-O behavior (diagram: Mind and World exchanging experience)

  13. Is Mind about Predictions? OR Is Mind about Action (or Policies)? • Of course it is ultimately about action • But action generation methods are relatively clear • Value functions and decision theory • Pick action that maximizes expected cumulative reward • OR • Policy gradient RL methods • Execution-time search • Reflexes and behavior-based robotics • Learning-extended reflexes and conditioning • Flexible cognition requires more than action generation • Most mental activity is working with predictions

  14. An old, simple, appealing idea • Mind as prediction engine! • Predictions are learnable, combinable • They represent cause and effect, and can be pieced together to yield plans • Perhaps this old idea is essentially correct. • Just needs • Development, revitalization in modern forms • Greater precision, formalization, mathematics • The computational perspective to make it respectable • Imagination, determination, patience • Not rushing to performance

  15. Requisites of Prediction Proposal • The AI has to have a life • Predictions must be very flexible, expressive • To capture a wide variety of world knowledge • Mixtures of transition predictions • Closed-loop action conditioning • Closed-loop termination • And yet be grounded, directly comparable to data • Predictions must be combinable, compositional • Support varieties of planning • Projection and anticipation of futures

  16. Outline • AI at an Impasse • A Predictive Proposal • Some of the Machinery • Prospects and Conclusion

  17. Machinery for General Transition Predictions • In steps of increasing expressiveness • Simple state-transition predictions • Mixtures of predictions • Closed-loop termination • Closed-loop action conditioning • While staying grounded in data • Predictions and State

  18. The Simplest Transition Predictions (diagram: experience as a stream of states and actions; a 1-step prediction X → Y under action a; a k-step prediction X → Y with probability p)
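
As a rough illustration of what the simplest predictions amount to, here is a sketch (function name and representation are mine) that estimates k-step transition probabilities by counting over a stream of states:

```python
from collections import defaultdict

def k_step_predictions(states, k):
    """Empirical k-step predictions: Pr(Y at time t+k | X at time t),
    estimated from counts over a single long stream of states."""
    counts = defaultdict(lambda: defaultdict(int))
    for t in range(len(states) - k):
        counts[states[t]][states[t + k]] += 1
    # normalize the counts into conditional probabilities
    return {x: {y: n / sum(ys.values()) for y, n in ys.items()}
            for x, ys in counts.items()}

# one_step = k_step_predictions(trajectory, k=1)   # 1-step predictions
# five_step = k_step_predictions(trajectory, k=5)  # k-step predictions
```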

  19. Mixtures of k-step Predictions: Terminating over a period of time (diagram: termination profiles over the time steps of interest, e.g., “Where will I be in 10–20 steps?” as a window between k=10 and k=20, and “Where will I be in roughly k steps?” as a profile peaked near k) • Arbitrary termination profiles are possible: short term, medium term, long term • But sometimes anything like this is too loose and sloppy...
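
A sketch of the mixture idea under my own naming: a soft prediction is a weighted average of k-step predictions, with the weights given by a termination profile (a flat window for "in 10–20 steps", or a geometric profile for the familiar discounted prediction):

```python
import numpy as np

def mixture_prediction(k_step_preds, profile):
    """k_step_preds[k]: the (numeric) prediction for exactly k steps ahead.
    profile[k]: weight placed on horizon k; the profile sums to 1."""
    return float(np.dot(k_step_preds, profile))

horizon = 40
window = np.zeros(horizon)
window[10:21] = 1.0 / 11                     # "in 10-20 steps": flat window
geometric = 0.1 * 0.9 ** np.arange(horizon)  # soft profile (sums to ~1)
```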

  20. Closed-loop Termination • Terminate depending on what happens • E.g., instead of “Will I finish this report soon?”, which uses a soft termination profile (probability spread around 1 hr: “probably in about an hour”) • Use “Will I be done when my boss gets here?” (probability concentrated at the time the boss arrives: only one precise but uncertain time matters)


  22. Closed-loop termination allows time specification to be both flexible and precise • Instead of “what will I see at t+100?” • Can say “what will I see when I open the box?” • Will we elect a black or a woman president first? • Where will the tennis ball be when it reaches me? • What time will it be when the talk starts? or “when John arrives?” “when the bus comes?” “when I get to the store?” • A substantial increase in expressiveness
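
One hypothetical way to picture a closed-loop termination is a Monte Carlo rollout that stops on an event rather than at a fixed time (the env object, its step and actions attributes, and all names here are illustrative stand-ins):

```python
import random

def predict_at_event(env, state, event, measure, n_rollouts=100, max_steps=1000):
    """Monte Carlo estimate of a measurement taken at the moment an event
    occurs ("what will I see when I open the box?"), rather than at a
    fixed future time ("what will I see at t+100?")."""
    total = 0.0
    for _ in range(n_rollouts):
        s = state
        for _ in range(max_steps):
            s = env.step(s, random.choice(env.actions))
            if event(s):          # e.g., "the bus comes", "John arrives"
                break
        total += measure(s)       # e.g., the time on the clock
    return total / n_rollouts
```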

  23. Closed-loop Action Conditioning • What happens depends on what you do • What you do depends on what happens • Each prediction has a closed-loop policy Policy: States --> Actions (or Probs.) • If you follow the policy, then you predict and verify • Otherwise not • If partly followed, temporal-difference methods can be used
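
A minimal sketch of action conditioning in TD form (names are mine): the prediction is about what happens if its own policy is followed, so this simplest version updates only on steps where the action taken agrees with that policy. As the slide notes, with importance weighting, partially followed trajectories can be used too.

```python
def conditioned_td_update(v, s, a, r, s_next, policy, alpha=0.1, gamma=0.95):
    """TD(0) update for a prediction conditioned on its own policy."""
    if a != policy(s):
        return  # the step says nothing about "what happens if I follow this policy"
    v[s] += alpha * (r + gamma * v[s_next] - v[s])
```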

  24. General Transition Predictions (GTPs) • Closed-loop terminations and closed-loop policies correspond to arbitrary experiments and the results of those experiments • What will I see if I go into the next room? • What time will it be when the talk is over? • Is there a dollar in the wallet in my pocket? • Where is my car parked? • Can I throw the ball into the basket? • Is this a chair situation? • What will I see if I turn this object around?

  25. Anatomy of a General Transition Prediction (diagram: states mapped into a measurement space) • A GTP has two parts: (1) a predictor, which recognizes the conditions and makes the prediction (the knowledge), and (2) an experiment, consisting of a policy, a termination condition, and measurement function(s), which serves as the verifier

  26. Example: Open-the-door • Predictor: use visual input to estimate • Probabilities of succeeding in opening the door, and of other outcomes (door locked, no handle, no real door) • Expected cumulative cost (sub-par reward) in trying • Experiment • Policy for walking up to the door, shaping grasp of handle, turning, pulling, and opening the door • Terminate on successful opening or various failure conditions • Measure outcome and cumulative cost
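
One hypothetical rendering of this anatomy in code (all class and field names are mine; env is a stand-in for a world interface): the experiment is the verifier, and running it produces the ground truth against which the predictor is checked.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Experiment:            # the verifier half of a GTP
    policy: Callable         # state -> action, e.g., approach and grasp handle
    terminate: Callable      # state -> bool, e.g., door opened or failure seen
    measure: Callable        # trajectory -> outcome and cumulative cost

@dataclass
class GTP:
    predictor: Callable      # state -> predicted measurement (the knowledge)
    experiment: Experiment

def run_and_verify(gtp, env, state):
    """Run the experiment; its measured result is the ground truth
    against which the predictor can be checked and trained."""
    prediction = gtp.predictor(state)
    traj = [state]
    while not gtp.experiment.terminate(traj[-1]):
        traj.append(env.step(traj[-1], gtp.experiment.policy(traj[-1])))
    return prediction, gtp.experiment.measure(traj)
```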

  27. RoboCup-Soccer Example • Safe to pass? Predict the outcome of choosing to pass • The pass will take several steps to set up • Choosing to pass involves a whole action policy • You may choose not to pass halfway through • Terminations and outcomes: • Pass is aborted • Opponents touch the ball before teammate • Teammate touches first, appears to control ball • Ball goes out of bounds

  28. Example: Pass-to-Teammate • Predictor uses perceived positions of ball, opponents, etc. to estimate probabilities of • Successful pass, openness of receiver • Interception • Reception failure • Aborted pass, in trouble • Aborted pass, something better to do • Loss of time • Experiment • Policy for maneuvering ball, or around ball, to set up and pass • Termination strategy for aborting, recognizing completion • Measurement of outcome, time

  29. More Predictive Knowledge • John is in the coffee room • My car is in the South parking lot • What we know about geography, navigation • What we know about how an object looks, rotates • What we know about how objects can be used • Recognition strategies for objects and letters • The portrait of Washington on the dollar in the wallet in my other pants in the laundry, has a mustache on it • Composing experiments creates a productive rep’n language

  30. Relational, Propositional, and Deictic • ∀ objects X: if I drop X, then X will be on the floor • Holding object X means predicting certain sensations if, for example, one directs one’s eyes toward one’s hand • Thus, on dropping, the predicted sensations are merely transferred from the looking-at-hand prediction to the looking-at-floor prediction • Such transfer of existing predictions should be a common part of visual knowledge, updated every time the eyes move • ∃ X, Y such that Red(X), Blue(Y), and Above(X,Y) • There is some place I can foveate and see Red • There is some place I can foveate and see Blue • If I foveate first the Red place, “mark” it, then the Blue place, the mark will be Above the fovea (may need to search) • These are typical ideas of modern, active, deictic vision

  31. Combining Predictions • If the mind is about predictions, • Then thinking is combining predictions to produce new ones • Predictions obviously compose • If A->B and B->C, then A->C • GTPs are designed to do this generally • Fit into “Bellman equations” of semi-Markov extensions of dynamic programming • Can also be used for simulation-based planning

  32. Composing Predictions (diagram: an X → Y prediction chained with a Y → Z prediction yields an X → Z prediction; each prediction carries a final measurement, e.g., a partial distribution over outcome states, and a transient measurement, e.g., elapsed time or cumulative reward)

  33. Composing Predictions (diagram, numeric example: X leads to Y with probability .8, and to Y′ or Y″ with probability .1 each; composing the (p₁, T₁) prediction X → Y with a (p₂, T₂) prediction Y → Z gives an X → Z prediction with probability .8 p₂ and transient measurement T₁ + .8 T₂)
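
The composition rule in the diagram reduces to two lines of arithmetic; here is a sketch under my reading of it (outcome probabilities multiply, and the second leg's transient is weighted by the probability of reaching Y at all):

```python
def compose(p1, t1, p2, t2):
    """Compose an X -> Y prediction with a Y -> Z prediction.
    p1, p2: probabilities of reaching Y from X, and Z from Y.
    t1, t2: expected transient measurements (elapsed time, reward) per leg.
    Returns the composed X -> Z prediction."""
    return p1 * p2, t1 + p1 * t2

# With p1 = .8 as in the diagram: compose(.8, T1, p2, T2) -> (.8*p2, T1 + .8*T2)
```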

  34. Room-to-Room GTPs (“Options”: Precup 2000; Sutton, Precup, & Singh 1999) (diagram: four-rooms gridworld with a target goal hallway) • 4 stochastic primitive actions (up, down, left, right), which fail 33% of the time • 8 multi-step GTPs (to each room’s 2 hallways), with termination at the hallways • Predict: probability of reaching each terminal hallway, plus values for the target and other outcome hallway • Goal: minimize # steps

  35. Planning with GTPs

  36. Learning Path-to-Goal with and without GTPs (plot: steps per episode, from 1000 down toward 100 and 10, versus episodes 1 to 10,000 on a log scale; curves for primitives only, for GTPs & primitives, and for GTPs alone)

  37. Rooms Example: Simultaneous Learning of all 8 GTPs from their Goals (plots: RMS error in the upper-hallway and lower-hallway subgoal values versus time steps up to 100,000, with the learned two-subgoal state values approaching the ideal values; and goal-prediction error falling from about 0.4 toward 0 over the same span) • All 8 hallway GTPs were learned accurately and efficiently while actions were selected totally at random
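
Why can all 8 GTPs be learned at once from purely random behavior? Here is a sketch in the spirit of intra-option learning (Sutton, Precup, & Singh 1999): every experienced transition updates every GTP whose policy would have taken the same action. The per-GTP attributes (policy, beta, v, outcome) are my own hypothetical naming.

```python
def intra_option_updates(gtps, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One environment step updates every GTP consistent with action a."""
    for g in gtps:
        if g.policy(s) != a:
            continue  # this GTP's policy disagrees; its prediction is untouched
        # continue with prob. 1 - beta(s'); otherwise terminate and take the outcome
        target = r + gamma * ((1 - g.beta(s_next)) * g.v[s_next]
                              + g.beta(s_next) * g.outcome(s_next))
        g.v[s] += alpha * (target - g.v[s])
```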

  38. Machinery for General Predictions • In steps of increasing expressiveness • Simple state-transition predictions • Mixtures of predictions • Closed-loop termination • Closed-loop action conditioning • While staying grounded in data • Predictions and State

  39. Predictive State Representations • Problem: So far we have assumed statesbut world really just gives information, “observations” • Hypothesis: What we normally think of as stateis a set of predictions about outcomes of experiments • Wallet’s contents, John’s location, presence of objects… • Prior work: • Learning deterministic FSAs - Rivest & Schapire, 1987 • Adding stochasticity: An alternative to HMMs - Herbert Jaeger, 1999 • Adding action: An alternative to POMDPs - Littman, Sutton, & Singh 2001
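
A minimal sketch of the linear PSR state update, following the direction of Littman, Sutton, & Singh (2001) but with my own variable names: the state is a vector of predictions for a set of core tests, and conditioning on each new action-observation pair is a linear computation followed by a normalization.

```python
import numpy as np

def psr_update(p, M_ao, m_ao):
    """Linear PSR update after taking action a and observing o.
    p:    vector of core-test predictions given history h.
    m_ao: weights such that Pr(o | h, a) = m_ao . p.
    M_ao: rows give Pr(o, then test q_i succeeds | h, a) = (M_ao @ p)_i.
    The new state is the core-test predictions conditioned on (a, o)."""
    return (M_ao @ p) / (m_ao @ p)
```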

  40. Empty Gridworld with Local Sensing • Four actions: Up, Down, Right, Left • And four sensory bits

  41. Distance-to-Wall Predictions (diagram: predictions labeled 0R, 0RR, 1RRR, 1RRRR, ... and 0D, 1DD, 1DDD, ..., showing the “meaning” of each prediction: the sensory bit expected after a given action sequence) • 4 GTPs suffice to identify each state • More are needed to update the PSR • Many more are computed from the PSR • This is a Predictive State Representation (PSR)
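
Under my reading of these labels, a prediction such as 1RRR names an experiment (take Right three times) together with its predicted result (the wall-sensor bit reads 1), and is verified by running it. A sketch, with env and sense_wall as hypothetical stand-ins:

```python
def run_test(env, state, actions, predicted_bit):
    """Execute a test's action sequence and check its predicted sensory bit."""
    for a in actions:
        state = env.step(state, a)
    return env.sense_wall(state) == predicted_bit

# e.g., the prediction "1RRR":  run_test(env, s, ['R', 'R', 'R'], 1)
```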

  42. Suppose we add one non-uniformity (diagram: the same 0R, 0RR, 1RRR, 1RRRR, ..., 0D, 1DD, 1DDD, ... predictions, which now differ across the grid) • Now there is much more to know • It would be challenging to program it all correctly

  43. Other Extension Ideas • Stochasticity • Egocentric motion • Multiple Rooms • Second agent • Moveable objects • Transient goals It’s easy to make such problems arbitrarily challenging

  44. Outline • AI at an Impasse • A Predictive Proposal • Some of the Machinery • Prospects and Conclusion

  45. How Could These Ideas Proceed? • Build systems! Build Gridworlds! • A performance orientation would be problematic • The “Knowledge Representation” guys may not be impressed • But others, I think, will be very interested and appreciative, throughout modern probabilistic AI

  46. Conclusion: Predictions are the Coin of the Mental Realm • Knowledge is predictions: about what-leads-to-what, under what ways of behaving. Such knowledge is learnable, chainable • Mental activity is working with predictions: learning them, combining them to produce new predictions (reasoning), converting them to action (planning, reinforcement learning), figuring out which are most useful • Predictions are verifiable: a natural way to self-maintain knowledge, which is essential for scaling AI beyond programming • Most of the machinery is simple but potentially powerful

  47. Reliable Knowledge Requires Verification • We can distinguish • 1. Having knowledge • 2. Having the ability to verify knowledge • I.e., there is something beyond having knowledge, which we might call understanding its meaning, and which is key in practice to building powerful AIs

  48. Summary of Results for Predictive State Rep’ns (PSRs) • Exist compact, linear PSRs • # tests ≤ # states in minimal POMDP • # tests ≤ Rivest & Schapire’s Diversity • # tests can be exponentially fewer than diversity and POMDP • Compact simulation/update process • Construction algorithm from POMDP • Learning/discovery algorithms of Rivest and Schapire, and of Jaeger, do not immediately extend to PSRs • There are natural EM-like algorithms (current work)
