
Basics of Data Mining with Bayesian Networks

Learn the fundamentals of data mining with Bayesian networks, including unconditional and conditional probability, joint probability, conditional independence, and creating a Bayesian network. Explore examples and applications in various domains.


Presentation Transcript


  1. Data Mining with Bayesian Networks (I) Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe Frank

  2. Weather data set

  3. Basics • Unconditional or prior probability • Pr(Play=yes) + Pr(Play=no) = 1 • Pr(Play=yes) is sometimes written as Pr(Play) • The table has 9 yes, 5 no • Pr(Play=yes) = 9/(9+5) = 9/14 • Thus, Pr(Play=no) = 5/14 • Joint probability of Play and Windy: Pr(Play=x, Windy=y), summed over all values x and y, should be 1

               Windy=True   Windy=False
     Play=yes     3/14          6/14
     Play=no      3/14           ?
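The prior and joint probabilities above can be checked with a short sketch (assuming the slide's 14-instance weather data; the per-cell counts are read off the joint table, with the Windy=False/Play=no count following from the totals):

```python
from fractions import Fraction

# Per-cell instance counts read off the slide's joint table
# (9 yes / 5 no overall; Windy=True splits 3 yes / 3 no)
counts = {
    ("yes", True): 3, ("yes", False): 6,
    ("no", True): 3, ("no", False): 2,
}
total = sum(counts.values())            # 14 instances

# Unconditional (prior) probability of Play=yes
p_play_yes = Fraction(counts[("yes", True)] + counts[("yes", False)], total)
print(p_play_yes)                       # 9/14

# The joint probabilities Pr(Play=x, Windy=y) must sum to 1
assert sum(Fraction(c, total) for c in counts.values()) == 1
```

Using `Fraction` keeps the arithmetic exact, matching the slide's 9/14 and 5/14 notation.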

  4. Probability Basics • Conditional probability: Pr(A|B) • #(Windy=False) = 8 • Within those 8, #(Play=yes) = 6 • Pr(Play=yes | Windy=False) = 6/8 • Pr(Windy=False) = 8/14 • Pr(Play=yes) = 9/14 • Applying Bayes Rule: Pr(B|A) = Pr(A|B) Pr(B) / Pr(A) • Pr(Windy=False | Play=yes) = (6/8 × 8/14) / (9/14) = 6/9
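As a quick check, the Bayes-rule computation on this slide can be reproduced with exact fractions (a minimal sketch; the numbers are the slide's own):

```python
from fractions import Fraction

# Quantities from the slide
p_yes_given_notwindy = Fraction(6, 8)    # Pr(Play=yes | Windy=False)
p_notwindy = Fraction(8, 14)             # Pr(Windy=False)
p_yes = Fraction(9, 14)                  # Pr(Play=yes)

# Bayes rule: Pr(Windy=False | Play=yes)
#   = Pr(Play=yes | Windy=False) Pr(Windy=False) / Pr(Play=yes)
p_notwindy_given_yes = p_yes_given_notwindy * p_notwindy / p_yes
print(p_notwindy_given_yes)              # 2/3, i.e. 6/9
```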

  5. Conditional Independence • “A and P are independent given C”: Pr(A | P,C) = Pr(A | C) • A = Ache, C = Cavity, P = Probe Catches

     C  A  P   Probability
     F  F  F   0.534
     F  F  T   0.356
     F  T  F   0.006
     F  T  T   0.004
     T  F  F   0.012
     T  F  T   0.048
     T  T  F   0.008
     T  T  T   0.032

  6. Conditional Independence • “A and P are independent given C”: Pr(A | P,C) = Pr(A | C), and also Pr(P | A,C) = Pr(P | C) • Suppose C=True: • Pr(A|P,C) = 0.032/(0.032+0.048) = 0.032/0.080 = 0.4 • Pr(A|C) = (0.032+0.008)/(0.048+0.012+0.032+0.008) = 0.04/0.1 = 0.4

     C  A  P   Probability
     F  F  F   0.534
     F  F  T   0.356
     F  T  F   0.006
     F  T  T   0.004
     T  F  F   0.012
     T  F  T   0.048
     T  T  F   0.008
     T  T  T   0.032
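The two conditional probabilities on this slide can be checked mechanically from the joint table (a small sketch; the table values are the slide's):

```python
# Joint distribution from the slide, keyed by (C, A, P)
jpd = {
    (False, False, False): 0.534, (False, False, True): 0.356,
    (False, True, False): 0.006,  (False, True, True): 0.004,
    (True, False, False): 0.012,  (True, False, True): 0.048,
    (True, True, False): 0.008,   (True, True, True): 0.032,
}

def prob(pred):
    """Marginal probability of the event described by pred(c, a, p)."""
    return sum(p for k, p in jpd.items() if pred(*k))

# Pr(A=T | P=T, C=T)
p_a_given_pc = prob(lambda c, a, p: c and a and p) / prob(lambda c, a, p: c and p)
# Pr(A=T | C=T)
p_a_given_c = prob(lambda c, a, p: c and a) / prob(lambda c, a, p: c)

print(round(p_a_given_pc, 3), round(p_a_given_c, 3))  # both 0.4
```

Since the two values agree, knowing P adds nothing about A once C is known, which is exactly the independence statement.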

  7. Conditional Independence • A conditional probability table (CPT) can encode the joint probability distribution in compact form • A = Ache, C = Cavity, P = Probe Catches

     P(C) = .1

     C   P(A)        C   P(P)
     T   0.4         T   0.8
     F   0.02        F   0.4

     C  A  P   Probability
     F  F  F   0.534
     F  F  T   0.356
     F  T  F   0.006
     F  T  T   0.004
     T  F  F   0.012
     T  F  T   0.048
     T  T  F   0.008
     T  T  T   0.032
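A quick sketch of the compactness claim: joint entries can be rebuilt from the three small CPTs via the factorization P(C, A, P) = P(C) P(A|C) P(P|C). Here P(C)=0.1 is read off the joint (the C=True rows sum to 0.1); the rounded P(A|C=F)=0.02 means the C=False rows only match approximately, so the check below covers the C=True rows.

```python
# CPT parameters (P(C) read off the joint; the rest from the slide)
p_c = 0.1
p_a = {True: 0.4, False: 0.02}   # P(A=True | C)
p_p = {True: 0.8, False: 0.4}    # P(P=True | C)

def joint(c, a, p):
    """Joint entry via the factorization P(C) P(A|C) P(P|C)."""
    pc = p_c if c else 1 - p_c
    pa = p_a[c] if a else 1 - p_a[c]
    pp = p_p[c] if p else 1 - p_p[c]
    return pc * pa * pp

# The rebuilt C=True rows reproduce the table exactly
for (a, p), expected in {(False, False): 0.012, (False, True): 0.048,
                         (True, False): 0.008, (True, True): 0.032}.items():
    assert abs(joint(True, a, p) - expected) < 1e-9
```

Five numbers (one prior plus two two-row CPTs) encode all eight joint entries; the saving grows exponentially with more variables.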

  8. Creating a Network • 1: Bayes net = representation of a JPD • 2: Bayes net = set of conditional independence statements • If we create the correct structure, one that represents causality, then we get a good network • i.e., one that’s small and thus easy to compute with • One whose numbers are easy to fill in

  9. Example • My house alarm system just sounded (A). • Both an earthquake (E) and a burglary (B) could set it off. • John will probably hear the alarm; if so, he’ll call (J). • But sometimes John calls even when the alarm is silent. • Mary might hear the alarm and call too (M), but not as reliably. • We could be assured a complete and consistent model by fully specifying the joint distribution: • Pr(A, E, B, J, M) • Pr(A, E, B, J, ~M) • etc.

  10. Structural Models (HK book 7.4.3) • Instead of starting with numbers, we start with structural relationships among the variables: • There is a direct causal relationship from Earthquake to Alarm • There is a direct causal relationship from Burglar to Alarm • There is a direct causal relationship from Alarm to JohnCall • Earthquake and Burglar tend to occur independently • etc.

  11. A Possible Bayesian Network • Structure: Earthquake → Alarm, Burglary → Alarm, Alarm → JohnCalls, Alarm → MaryCalls

  12. Complete Bayesian Network • Same structure: Earthquake → Alarm ← Burglary; Alarm → JohnCalls and Alarm → MaryCalls

     P(E) = .002    P(B) = .001

     B  E   P(A)
     T  T   .95
     T  F   .94
     F  T   .29
     F  F   .001

     A   P(J)        A   P(M)
     T   .90         T   .70
     F   .05         F   .01
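With the network complete, any full joint entry factors along the arrows: P(J, M, A, B, E) = P(E) P(B) P(A|B,E) P(J|A) P(M|A). A minimal sketch using the slide's numbers (the particular query, a burglary with no earthquake where the alarm sounds and both neighbors call, is our own choice of example):

```python
# CPT entries from the slide
p_e, p_b = 0.002, 0.001
p_a_be = {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29}          # P(A=True | B, E)
p_j_a = {True: 0.90, False: 0.05}       # P(J=True | A)
p_m_a = {True: 0.70, False: 0.01}       # P(M=True | A)

# P(J=T, M=T, A=T, B=T, E=F)
#   = P(E=F) P(B=T) P(A=T | B=T, E=F) P(J=T | A=T) P(M=T | A=T)
p = (1 - p_e) * p_b * p_a_be[(True, False)] * p_j_a[True] * p_m_a[True]
print(round(p, 6))  # ≈ 0.000591
```

Ten CPT numbers replace the 2^5 = 32 entries a fully specified joint would need.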

  13. Microsoft Bayesian Belief Net • http://research.microsoft.com/adapt/MSBNx/ • Can be used to construct and reason with Bayesian Networks • Consider the example

  14. Mining for Structural Models • The learning problem: • Known structure, fully observable: CPTs are to be learned • Unknown structure, fully observable: search over structures • Known structure, hidden variables: parameter learning using hill climbing • Unknown structure, hidden variables: no good results • Some methods have been proposed, but this is a difficult problem that often requires a domain expert’s knowledge • Once set up, a Bayesian network can be used to answer probabilistic queries (e.g., with the Microsoft Bayesian Network software)

  15. Hidden Variable (Han and Kamber’s Data Mining book, pages 301-302) • Assume that the Bayesian network structure is given • Some variables are hidden • Example: the Field Temp variable introduced on the next slide • Our objective: find the CPT for all nodes • Idea: use a method of gradient descent • Let S be the set of training examples: {X1, X2, …, Xs} • Consider a variable Yi with parents Ui = {Parent1, Parent2, …} • Question: what is Pr(Yi=yij | Ui=uik)? • Answer: learn this value from the data in iterations

  16. Learn CPT for a Hidden Variable • Suppose we are in a tennis domain • We wish to introduce a new variable not in our data set, called Field Temp, representing the temperature of the field • Assume that we don’t have a good way to measure it, but have to include it in our network • Structure: Windy and Outlook are parents of the hidden Field Temp node

  17. Learn the CPT • Yi has parents Ui = {Parent1, Parent2} • Let wijk be the value of Pr(Yi=yij | Ui=uik) • Compute a new wijk from the old one by a gradient step over the training examples Xd: wijk ← wijk + (learning rate) × Σd Pr(Yi=yij, Ui=uik | Xd) / wijk

  18. Example: Learn the CPT • w = Pr(Field Temp=Hot | Windy=True, Outlook=Sunny) • Let the old w be 0.5; compute a new w from it • Normalize the updated entries so they sum to 1, then iterate until stable
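One update step of this scheme can be sketched in Python. Everything below except the shape of the update is an assumption made for illustration: the candidate Field Temp values, the learning rate, and the per-example posterior numbers, which real code would obtain by inference on each training example Xd, are all hypothetical.

```python
# One gradient step for a hidden variable's CPT entries, in the style of
# the Han & Kamber scheme outlined on slides 15-18.
w = {"hot": 0.5, "mild": 0.3, "cool": 0.2}   # old Pr(Field Temp | Windy=T, Outlook=Sunny)
posteriors = {                               # hypothetical stand-ins for
    "hot": [0.10, 0.05],                     # Pr(Field Temp=v, Windy=T,
    "mild": [0.06, 0.03],                    #    Outlook=Sunny | Xd)
    "cool": [0.04, 0.02],                    # per training example Xd
}
lr = 0.1                                     # learning rate (an assumed value)

# Gradient step: w += lr * sum_d Pr(y, u | Xd) / w
new_w = {v: w[v] + lr * sum(posteriors[v]) / w[v] for v in w}

# Normalize so the entries again sum to 1, then iterate until stable
total = sum(new_w.values())
new_w = {v: p / total for v, p in new_w.items()}
```

Each iteration moves the CPT entries in the direction that increases the likelihood of the training data, with the normalization keeping each row a valid distribution.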
