
Turning Probabilistic Reasoning into Programming



  1. Turning Probabilistic Reasoning into Programming Avi Pfeffer Harvard University

  2. Uncertainty • Uncertainty is ubiquitous • Partial information • Noisy sensors • Non-deterministic actions • Exogenous events • Reasoning under uncertainty is a central challenge for building intelligent systems

  3. Probability • Probability provides a mathematically sound basis for dealing with uncertainty • Combined with utilities, provides a basis for decision-making under uncertainty

  4. Probabilistic Reasoning • Representation: creating a probabilistic model of the world • Inference: conditioning the model on observations and computing probabilities of interest • Learning: estimating the model from training data

  5. The Challenge • How do we build probabilistic models of large, complex systems that are • easy to construct and understand • support efficient inference • can be learned from data

  6. (The Programming Challenge) • How do we build programs for interesting problems that are • easy to construct and maintain • do the right thing • run efficiently

  7. Lots of Representations • Plethora of existing models • Bayesian networks, hidden Markov models, stochastic context free grammars, etc. • Lots of new models • Object-oriented Bayesian networks, probabilistic relational models, etc.

  8. Goal • A probabilistic representation language that • captures many existing models • allows many new models • provides programming-language like solutions to building and maintaining models

  9. IBAL • A high-level “probabilistic programming” language for representing • Probabilistic models • Decision problems • Bayesian learning • Implemented and publicly available

  10. Outline • Motivation • The IBAL Language • Inference Goals • Probabilistic Inference Algorithm • Lessons Learned

  11. Stochastic Experiments • A programming language expression describes a process that generates a value • An IBAL expression describes a process that stochastically generates a value • Meaning of expression is probability distribution over generated value • Evaluating an expression = computing the probability distribution

  12. Simple Expressions • Constants: x = ‘hello • Variables: y = x • Conditionals: z = if x==‘bye then 1 else 2 • Stochastic choice: w = dist [ 0.4: ’hello, 0.6: ’world ]
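To make the semantics concrete, here is a minimal Python sketch (not IBAL; all names are illustrative) of what evaluating these expressions means: each one denotes a finite distribution, and `dist` mixes outcomes by weight.

```python
# Illustrative Python sketch (not IBAL): each expression denotes a finite
# distribution over values, represented here as {value: probability} dicts.

def constant(v):
    return {v: 1.0}

def dist(weighted):
    """Mix weighted outcomes, e.g. dist([(0.4, 'hello'), (0.6, 'world')])."""
    result = {}
    for p, v in weighted:
        result[v] = result.get(v, 0.0) + p
    return result

def if_equal(d, v, then_val, else_val):
    """Distribution of `if x == v then ... else ...` given the distribution of x."""
    p_eq = d.get(v, 0.0)
    return dist([(p_eq, then_val), (1.0 - p_eq, else_val)])

x = constant('hello')                        # x = 'hello
y = x                                        # y = x
z = if_equal(x, 'bye', 1, 2)                 # z = if x=='bye then 1 else 2
w = dist([(0.4, 'hello'), (0.6, 'world')])   # w = dist [0.4:'hello, 0.6:'world]
print(z)   # {1: 0.0, 2: 1.0}
print(w)   # {'hello': 0.4, 'world': 0.6}
```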

  13. Functions fair ( ) = dist [0.5 : ‘heads, 0.5 : ‘tails] x = fair ( ) y = fair ( ) • x and y are independent tosses of a fair coin

  14. Higher-order Functions fair ( ) = dist [0.5 : ‘heads, 0.5 : ‘tails] biased ( ) = dist [0.9 : ‘heads, 0.1 : ‘tails] pick ( ) = dist [0.5 : fair, 0.5 : biased] coin = pick ( ) x = coin ( ) y = coin ( ) • x and y are conditionally independent given coin
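A hedged Python enumeration of the claim above: marginally, x and y are dependent (the joint puts more mass on matching outcomes than the product of the marginals), but once the coin is fixed the joint factors.

```python
# Illustrative enumeration (not IBAL): two tosses of a randomly picked coin.
fair   = {'heads': 0.5, 'tails': 0.5}
biased = {'heads': 0.9, 'tails': 0.1}

joint = {}                      # P(x, y), marginalizing out the coin
for coin, p_coin in [(fair, 0.5), (biased, 0.5)]:
    for x, px in coin.items():
        for y, py in coin.items():
            joint[(x, y)] = joint.get((x, y), 0.0) + p_coin * px * py

p_x_heads = sum(p for (x, _), p in joint.items() if x == 'heads')   # 0.7
print(joint[('heads', 'heads')], p_x_heads ** 2)   # 0.53 vs 0.49: dependent
# Given the coin, the inner loops factor as px * py: conditional independence.
```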

  15. Data Structures and Types • IBAL provides a rich type system • tuples and records • algebraic data types • IBAL is strongly typed • automatic ML-style type inference

  16. Bayesian Networks • nodes = domain variables • edges = direct causal influence • Network structure encodes conditional independencies: I(HW-Grade, Smart | Understands) • [Figure: network over Smart, Diligent, Understands, Good Test Taker, Exam Grade, HW Grade, with CPT P(Understands | Smart, Diligent)]

  17. BNs in IBAL smart = flip 0.8 diligent = flip 0.4 understands = case <smart,diligent> of # <true,true> : flip 0.9 # <true,false> : flip 0.6 …
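A minimal Python enumeration (illustrative, not the IBAL evaluator) of the fragment above; the probability used for the elided "…" branches is a hypothetical placeholder.

```python
# Illustrative Python enumeration of the IBAL fragment above (not IBAL itself).
# The probability for the <false, _> cases is a hypothetical placeholder
# standing in for the elided "..." branches.

def flip(p):
    return {True: p, False: 1.0 - p}

smart    = flip(0.8)
diligent = flip(0.4)

def understands_given(s, d):
    if s and d:        return flip(0.9)
    if s and not d:    return flip(0.6)
    return flip(0.3)   # placeholder for the elided cases

p_understands = 0.0
for s, ps in smart.items():
    for d, pd in diligent.items():
        p_understands += ps * pd * understands_given(s, d)[True]
print(p_understands)
```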

  18. First-Order HMMs • Initial distribution P(H1) • Transition model P(Hi | Hi-1) • Observation model P(Oi | Hi) • What if the hidden state is an arbitrary data structure? • [Figure: chain H1 → H2 → … → Ht with observations O1, O2, …, Ot]

  19. HMMs in IBAL init : () -> state trans : state -> state obs : state -> obsrv sequence(current) = { state = current observation = obs(state) future = sequence(trans(state)) } hmm() = sequence(init())
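A hedged Python analogue of this recursive definition: the hidden state can be any data structure, and the infinite sequence is produced lazily (here with an ordinary generator). The init, trans, and obs below are assumed stand-ins for the abstract signatures above.

```python
import random

# Illustrative Python analogue (not IBAL): a lazily generated HMM whose
# hidden state is an arbitrary data structure (here, a list of symbols).
def init():
    return []                                   # initial hidden state

def trans(state):
    return state + [random.choice('ab')]        # transition model

def obs(state):
    return len(state) if random.random() < 0.9 else None   # noisy observation

def hmm():
    state = init()
    while True:                                 # infinite sequence, forced lazily
        state = trans(state)
        yield obs(state)

seq = hmm()
print([next(seq) for _ in range(5)])            # only 5 steps are ever computed
```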

  20. SCFGs S -> AB (0.6) S -> BA (0.4) A -> a (0.7) A -> AA (0.3) B -> b (0.8) B -> BB (0.2) • Non-terminals are data generating functions

  21. SCFGs in IBAL append(x,y) = if null(x) then y else cons (first(x), append (rest(x),y)) production(x,y) = append(x(),y()) terminal(x) = cons(x,nil) s() = dist[0.6:production(a,b), 0.4:production(b,a)] a() = dist[0.7:terminal(‘a),…
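A small illustrative Python version of the same grammar, treating each non-terminal as a data-generating (sampling) function.

```python
import random

# Illustrative Python analogue: each non-terminal is a sampling function.
def choose(weighted):
    r, acc = random.random(), 0.0
    for p, thunk in weighted:
        acc += p
        if r < acc:
            return thunk()
    return weighted[-1][1]()

def s(): return choose([(0.6, lambda: a() + b()), (0.4, lambda: b() + a())])
def a(): return choose([(0.7, lambda: 'a'),       (0.3, lambda: a() + a())])
def b(): return choose([(0.8, lambda: 'b'),       (0.2, lambda: b() + b())])

print(s())   # e.g. 'aab' or 'ba'
```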

  22. Probabilistic Relational Models • Classes: Actor (Gender), Movie (Genre), Appearance (Actor, Movie, Role-Type) • Role-Type depends on Actor.Gender and Movie.Genre • [Figure: relational schema with example instances, e.g. Chaplin appearing in Modern Times]

  23. PRMs in IBAL movie( ) = { genre = dist ... } actor( ) = { gender = dist ... } appearance(a,m) = { role_type = case (a.gender,m.genre) of (male,western) : dist ... } mod_times = movie() chaplin = actor() a1 = appearance(chaplin, mod_times)

  24. Other IBAL Features • Observations can be inserted into programs • condition probability distribution over values • Probabilities in programs can be learnable parameters, with Bayesian priors • Utilities can be associated with different outcomes • Decision variables can be specified • influence diagrams, MDPs

  25. Outline • Motivation • The IBAL Language • Inference Goals • Probabilistic Inference Algorithm • Lessons Learned

  26. Goals • Generalize many standard frameworks for inference • e.g. Bayes nets, HMMs, probabilistic CFGs • Support parameter estimation • Support decision making • Take advantage of language structure • Avoid unnecessary computation

  27. Desideratum #1: Exploit Independence • Use a Bayes net-like inference algorithm • [Figure: the student Bayesian network (Smart, Diligent, Understands, Good Test Taker, Exam Grade, HW Grade)]

  28. Desideratum #2: Exploit Low-Level Structure • Causal independence (noisy-or) x = f() y = g() z = x & flip(0.9) | y & flip(0.8)
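A worked Python check (illustrative) of the noisy-or structure: each parent contributes its own independent noise gate (0.9 for x, 0.8 for y), so P(z) never requires a full joint table over all parents.

```python
# Illustrative noisy-or: P(z | x, y) decomposes into per-parent "noise gates".
def p_z_true(x, y, px=0.9, py=0.8):
    # z = (x & flip(px)) | (y & flip(py)); parents contribute independently.
    p_not_fired = (1.0 - px if x else 1.0) * (1.0 - py if y else 1.0)
    return 1.0 - p_not_fired

for x in (False, True):
    for y in (False, True):
        print(x, y, p_z_true(x, y))
# (True, True) -> 1 - 0.1*0.2 = 0.98; each parent adds one small factor,
# so inference never needs a table exponential in the number of parents.
```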

  29. Desideratum #2: Exploit Low-Level Structure • Context-specific independence x = f() y = g() z = case <x,y> of <false,false> : flip 0.4 <false,true> : flip 0.6 <true> : flip 0.7

  30. Desideratum #3: Exploit Object Structure • Complex domains often consist of weakly interacting objects • Objects share a small interface • Objects are conditionally independent given the interface • [Figure: Student 1 and Student 2 interacting only through Course Difficulty]

  31. Desideratum #4: Exploit Repetition • Domain often consists of many of the same kinds of objects • Can inference be shared between them? f() = complex x1 = f() x2 = f() … x100 = f()
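A hedged Python sketch of the sharing idea: compute the (expensive) distribution of f() once, cache it, and let x1 … x100 reuse the result rather than repeating the work.

```python
from functools import lru_cache

# Illustrative sketch: cache the expensive distribution of f() so that
# repeated uses x1 = f(), ..., x100 = f() share one computation.
@lru_cache(maxsize=None)
def dist_of_f():
    print('computing the distribution of f once')
    return {True: 0.3, False: 0.7}      # stand-in for an expensive inference

xs = [dist_of_f() for _ in range(100)]  # "computing..." is printed only once
print(xs[0] is xs[99])                  # True: the same cached result is shared
```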

  32. Desideratum #5: Use the Query • Only evaluate required parts of model • Can allow finite computation on infinite model f() = f() x = let y = f() in true • A query on x does not require f • Lazy evaluation is required • Particularly important for probabilistic languages, e.g. stochastic grammars
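A tiny Python illustration (an assumed analogy, not IBAL's evaluator): if f() is bound lazily as a thunk, a query that never inspects y terminates even though f itself diverges.

```python
# Illustrative laziness: f diverges if it is ever evaluated ...
def f():
    return f()      # infinite recursion

# ... but binding it lazily as a thunk is harmless:
y = lambda: f()     # not evaluated here
x = True            # the query on x never forces y, so evaluation terminates
print(x)
```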

  33. Desideratum #6: Use Support • The support of a variable is the set of values it can take with positive probability • Knowing the support of subexpressions can simplify computation f() = f() x = false y = if x then f() else true

  34. Desideratum #7: Use Evidence • Evidence can restrict the possible values of a variable • It can be used like support to simplify computation f() = f() x = flip 0.6 y = if x then f() else true observe x = false

  35. Outline • Motivation • The IBAL Language • Inference Goals • Probabilistic Inference Algorithm • Lessons Learned

  36. Two-Phase Inference • Phase 1: decide what computations need to be performed • Phase 2: perform the computations

  37. Natural Division of Labor • Responsibilities of phase 1: • utilizing query, support and evidence • taking advantage of repetition • Responsibilities of phase 2: • exploiting conditional independence, low-level structure and inter-object structure

  38. Phase 1 • IBAL Program → Computation graph

  39. Computation Graph • Nodes are subexpressions • Edge from X to Y means “Y needs to be computed in order to compute X” • Graph, not tree • different expressions may share subexpressions • memoization used to make sure each subexpression occurs once in graph

  40. Construction of Computation Graph • Propagate evidence throughout program • Compute support for each node

  41. Evidence Propagation • Backwards and forwards let x = <a:flip 0.4, b:1> in observe x.a = true in if x.a then ‘a else ‘b
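An illustrative Python view of the example (not the IBAL propagation algorithm): the observation flows backwards to condition the flip and forwards to determine the if-expression.

```python
# Illustrative conditioning on the observation x.a = true.
prior_a = {True: 0.4, False: 0.6}          # x.a = flip 0.4

# Backwards: condition the flip on the evidence and renormalize.
posterior_a = {v: (p if v else 0.0) for v, p in prior_a.items()}
z = sum(posterior_a.values())
posterior_a = {v: p / z for v, p in posterior_a.items()}   # {True: 1.0, False: 0.0}

# Forwards: the conditioned value determines the if-expression's result.
result = {'a': posterior_a[True], 'b': posterior_a[False]}
print(posterior_a, result)                 # 'a with probability 1
```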

  42. Construction of Computation Graph • Propagate evidence throughout program • Compute support for each node • this is an evaluator for a non-deterministic programming language • lazy evaluation • memoization

  43. Gotcha! • Laziness and memoization don’t go together • Memoization: when a function is called, look up arguments in cache • But with lazy evaluation, arguments are not evaluated before function call!

  44. Lazy Memoization • Speculatively evaluate function without evaluating arguments • When argument is found to be needed • abort function evaluation • store in cache that argument is needed • evaluate the argument • speculatively evaluate function again • When function evaluates successfully • cache mapping from evaluated arguments to result
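A hedged Python sketch of this scheme (an assumed structure, not the IBAL implementation): run the function speculatively on probes, abort when an unevaluated argument turns out to be needed, evaluate it, retry, and key the cache only on the arguments that were actually needed.

```python
# Illustrative lazy memoization: speculative evaluation with abort-and-retry.

class Need(Exception):
    def __init__(self, index):
        self.index = index

class Probe:
    """Stands in for an argument during speculative evaluation."""
    def __init__(self, index, known, value=None):
        self.index, self.known, self.value = index, known, value
    def force(self):
        if self.known:
            return self.value
        raise Need(self.index)       # abort: this argument is needed after all

def lazy_memo(f):
    cache, needed = {}, set()
    def wrapper(*thunks):            # arguments arrive as zero-argument thunks
        while True:
            key = tuple(thunks[i]() for i in sorted(needed))   # force needed args only
            if key in cache:
                return cache[key]
            probes = [Probe(i, i in needed, thunks[i]() if i in needed else None)
                      for i in range(len(thunks))]
            try:
                result = f(*probes)  # speculative evaluation
            except Need as n:
                needed.add(n.index)  # remember it, then try again
                continue
            cache[key] = result      # cache maps needed-argument values to result
            return result
    return wrapper

# f(x, y, z) = if x then y else z, as in the example below: forces x and y, never z.
@lazy_memo
def f(x, y, z):
    return y.force() if x.force() else z.force()

print(f(lambda: True, lambda: 'a', lambda: 'b'))   # 'a'
```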

  45. Lazy Memoization let f(x,y,z) = if x then y else z in f(true,’a,’b) • Trace: f(_,_,_) → need x → f(true,_,_) → need y → f(true,’a,_) → ‘a

  46. Phase 2 • Computation graph → Microfactors → Solution, e.g. P(Outcome=true) = 0.6

  47. Microfactors • Representation of a function from variables to reals • More compact than complete tables • Can represent low-level structure • E.g. the indicator function of X∨Y: (X=False, Y=False) → 0; (X=False, Y=True) → 1; (X=True, -) → 1
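An illustrative Python encoding of a microfactor as partial-assignment rows (an assumed representation, not IBAL's internal one); the example is the X∨Y indicator above, where the X=True row acts as a wildcard over Y.

```python
# Illustrative microfactor: (partial assignment, value) rows; missing
# variables act as wildcards. This is the indicator function of X v Y.
x_or_y = [
    ({'X': False, 'Y': False}, 0.0),
    ({'X': False, 'Y': True},  1.0),
    ({'X': True},              1.0),    # wildcard on Y: one row covers two table rows
]

def lookup(microfactor, assignment):
    """Return the value of the first row consistent with the assignment."""
    for row, value in microfactor:
        if all(assignment.get(var) == val for var, val in row.items()):
            return value
    raise KeyError(assignment)

print(lookup(x_or_y, {'X': True, 'Y': False}))   # 1.0
```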

  48. Producing Microfactors • Goal: translate an IBAL program into a set of microfactors F and a set of variables X such that P(Output) = Σ_X ∏_{f∈F} f • Similar to a Bayes net • Can solve by variable elimination • exploits independence
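A brute-force Python sketch of what this sum-of-products computes (using full-table factors for simplicity, rather than microfactors with variable elimination): multiply all factors and sum out everything but the output variable. The example factors are hypothetical.

```python
from itertools import product

# Illustrative sum-of-products: P(Output) = sum over variables of the
# product of all factors. Brute force, for exposition only.
def marginal(factors, variables, keep):
    result = {}
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = 1.0
        for scope, table in factors:
            weight *= table[tuple(assignment[v] for v in scope)]
        key = assignment[keep]
        result[key] = result.get(key, 0.0) + weight
    return result

# Hypothetical example: Output copies a flip-0.6 variable A exactly.
f1 = (('A',),       {(False,): 0.4, (True,): 0.6})
f2 = (('A', 'Out'), {(False, False): 1.0, (False, True): 0.0,
                     (True, False): 0.0,  (True, True): 1.0})
print(marginal([f1, f2], ['A', 'Out'], 'Out'))   # {False: 0.4, True: 0.6}
```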

  49. Producing Microfactors • Accomplished by recursive descent on computation graph • Use production rules to translate each expression type into microfactors • Introduce temporary variables where necessary

  50. Producing Microfactors • if e1 then e2 else e3 • [Figure: translation rule introducing a variable X for e1, with microfactors that select e2 when X=True and e3 when X=False]
