
PGM ch 4.1-4.2 notes




  1. LAC group, 16/06/2011 PGM ch 4.1-4.2 notes

  2. So far... • Directed graphical models • Bayesian Networks • Useful because both the structure and the parameters provide a natural representation for many types of real-world domains.

  3. This chapter... • Undirected graphical models • Useful in modelling phenomena where we cannot determine the directionality of the interaction between the variables. • Offer a different, simpler perspective on directed models (both independence structure & inference task)

  4. This chapter... • Introduce a framework that allows both directed and undirected edges • Note: some of the results in this chapter require that we restrict attention to distributions over discrete state spaces. • Discrete vs. continuous state space: e.g. Boolean-valued vs. real-valued variables (see sec. 2.1.6)

  5. The 4 students example (the misconception example, sec. 3.4.2, ex. 3.8) • 4 students get together in pairs to work on their homework for a class. The pairs that meet are shown via the edges (lines) of an undirected graph over four nodes: • A : Alice • B : Bobby • C : Charles • D : Debbie

  6. The 4 students example We want to model a distribution in which: • A is independent of C given B and D • B is independent of D given A and C

  7. The 4 students example PROBLEM 1: If we try to model these independencies with a Bayesian network, we run into trouble: • Any Bayesian network I-map of such a distribution will have extraneous edges • At least one of the desired independence statements will not be captured (cont’d)

  8. The 4 students example (cont’d) • Any Bayesian network will require us to specify the directionality of the influence Also: • The interactions look symmetrical, and we would like to model this without representing a direction of influence.

  9. The 4 students example SOLUTION 1: Undirected graph = (here) a Markov network structure • Nodes (circles) represent variables • Edges (lines) represent a notion of direct probabilistic interaction between the neighbouring variables, not mediated by any other variable in the network.

  10. The 4 students example PROBLEM 2: • How do we parameterise this undirected graph? • A CPD (conditional probability distribution) is not useful, as the interaction is not directed • We would like to capture the affinities between the related variables, e.g. Alice and Bob are more likely to agree than disagree

  11. The 4 students example SOLUTION 2: • Associate A and B with a general-purpose function: a factor

  12. The 4 students example • Here we focus only on non-negative factors. Factor: Let D be a set of random variables. We define a factor φ to be a function from Val(D) to R. A factor is non-negative if all its entries are non-negative. Scope: The set of variables D is called the scope of the factor and is denoted as Scope[φ].
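To make this definition concrete, here is a minimal Python sketch (not from the book; the class name Factor and its members are hypothetical) that stores a factor as a table from assignments of its scope to non-negative numbers:

```python
class Factor:
    """A non-negative factor phi : Val(D) -> R over a set of discrete variables D."""

    def __init__(self, scope, values):
        # scope: tuple of variable names, e.g. ("A", "B") -- this is Scope[phi]
        # values: dict mapping assignments (tuples of variable values) to reals
        if any(v < 0 for v in values.values()):
            raise ValueError("a non-negative factor must have non-negative entries")
        self.scope = scope
        self.values = values

    def __call__(self, assignment):
        # assignment: dict such as {"A": 0, "B": 1}; select this factor's variables
        return self.values[tuple(assignment[var] for var in self.scope)]

# Illustrative entries only (0 = right, 1 = has the misconception):
phi1 = Factor(("A", "B"), {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10})
print(phi1({"A": 0, "B": 0}))  # affinity of "Alice right, Bob right"
```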

  13. The 4 students example • Let’s define the factor over A and B that captures the fact that Alice and Bob are more likely to agree than disagree: φ1(A, B) : Val(A, B) → ℝ+ The value associated with a particular assignment (a, b) denotes the affinity between the two values: the higher the value of φ1(a, b), the more compatible the two values are

  14. The 4 students example • Fig. 4.1(a) shows one possible compatibility factor for A and B • Not normalised (see the partition function later for how normalisation is done) • 0: right, 1: wrong/has the misconception

  15. The 4 students example • φ1(A, B) asserts that: • it is more likely that Alice and Bob agree: φ1(a0, b0) and φ1(a1, b1) are the large entries, so they are more likely to be either both right or both wrong • if they disagree, Alice is more likely to be right than Bob: φ1(a0, b1) > φ1(a1, b0)

  16. The 4 students example • φ3(C, D) asserts that: • Charles and Debbie argue all the time and will end up disagreeing anyway: the large entries are φ3(c0, d1) and φ3(c1, d0)

  17. The 4 students example So far: • defined the local interactions between variables (nodes) Next step: • define a global model: we need to combine these interactions, i.e. multiply them, as with a Bayesian network

  18. The 4 students example A possible GLOBAL MODEL: P(a,b,c,d) = φ1(a, b) ∙ φ2(b, c) ∙ φ3(c, d) ∙ φ4(d, a) PROBLEM: Nothing guarantees that the result is a normalised distribution (see fig. 4.2 middle column)

  19. The 4 students example SOLUTION Take the product of the local factors and normalise it: P(a,b,c,d) = 1/Z ∙ φ1(a, b) ∙ φ2(b, c) ∙ φ3(c, d) ∙ φ4(d, a) where Z = ∑_{a,b,c,d} φ1(a, b) ∙ φ2(b, c) ∙ φ3(c, d) ∙ φ4(d, a) Z is a normalising constant known as the partition function: “partition” as in Markov random fields in statistical physics; “function”, because Z is a function of the parameters [important for machine learning]
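As a minimal sketch of this construction, assuming binary variables and illustrative factor entries (the exact numbers are not given in this transcript), the unnormalised measure, the partition function Z and the normalised joint can be computed by brute-force enumeration:

```python
from itertools import product

# Illustrative compatibility tables (0 = right, 1 = has the misconception).
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}      # phi1(A, B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}    # phi2(B, C)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}    # phi3(C, D)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}    # phi4(D, A)

def unnormalised(a, b, c, d):
    # Product of the local factors for one assignment.
    return phi1[(a, b)] * phi2[(b, c)] * phi3[(c, d)] * phi4[(d, a)]

# Partition function: sum of the unnormalised measure over all 2^4 assignments.
Z = sum(unnormalised(a, b, c, d) for a, b, c, d in product([0, 1], repeat=4))

def P(a, b, c, d):
    # Normalised joint distribution: unnormalised measure divided by Z.
    return unnormalised(a, b, c, d) / Z

print(unnormalised(1, 1, 0, 1), Z, P(1, 1, 0, 1))  # e.g. the assignment (a1, b1, c0, d1)
```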

  20. The 4 students example • See figure 4.2 for the calculation of the joint distribution • Example: compute the unnormalised measure φ1(a1, b1) ∙ φ2(b1, c0) ∙ φ3(c0, d1) ∙ φ4(d1, a1) for the assignment (a1, b1, c0, d1)

  21. The 4 students example • We can use the normalised joint distribution to answer questions like: • How likely is Bob to have the misconception? • How likely is Bob to have the misconception, given that Charles doesn’t?

  22. The 4 students example • How likely is Bob to have the misconception? P(b1) ≈ 0.732 P(b0) ≈ 0.268 So Bob is much more likely to have the misconception (≈ 73%) than not (≈ 27%)

  23. The 4 students example • How likely is Bob to have the misconception, given that Charles doesn’t? P(b1|c0) ≈ 0.06
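Continuing the illustrative sketch above (reusing P and the factor tables from the previous snippet), both queries are simple marginalisations of the joint:

```python
from itertools import product

# P(B = 1): marginalise the joint over A, C and D.
p_b1 = sum(P(a, 1, c, d) for a, c, d in product([0, 1], repeat=3))

# P(B = 1 | C = 0) = P(B = 1, C = 0) / P(C = 0).
p_b1_c0 = sum(P(a, 1, 0, d) for a, d in product([0, 1], repeat=2))
p_c0 = sum(P(a, b, 0, d) for a, b, d in product([0, 1], repeat=3))

print(p_b1, p_b1_c0 / p_c0)  # should come out near the values quoted on the slides
```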

  24. The 4 students example Advantages of this approach: • Allows great flexibility in representing interactions between variables. • We can change the nature of the interaction between A and B by simply modifying the entries in the factor, without worrying about normalisation constraints or the interaction with other factors

  25. The 4 students example • Tight connection between the factorisation of the distribution and its independence properties: • Factorisation: P(A,B,C,D) = 1/Z ∙ φ1(A, B) ∙ φ2(B, C) ∙ φ3(C, D) ∙ φ4(A, D)

  26. The 4 students example • Using the factorisation above we can decompose the distribution in several ways e.g. P(A,B,C,D) = [1/Z ∙ φ1(A, B) ∙ φ2(B, C)] ∙ φ3(C, D) ∙ φ4(A, D) and infer, for example, that B is independent of D given A and C.
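Spelled out, the decomposition on this slide and the independence it implies (the one already stated on slide 6) read:

```latex
P(A,B,C,D)
  = \underbrace{\tfrac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)}_{f_1(A,B,C)}
    \cdot
    \underbrace{\phi_3(C,D)\,\phi_4(A,D)}_{f_2(A,C,D)}
\quad\Longrightarrow\quad
(B \perp D \mid A, C)
```

since B appears only in the first group of factors, D only in the second, and the two groups share only A and C.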
