
Bayesian Networks



Presentation Transcript


  1. Bayesian Networks Nariman Farsad

  2. Overview
  • Review previously presented models
  • Introduce Bayesian networks
  • Evaluation, sampling, and inference
  • Examples
  • Applications in NLP
  • Conclusion

  3. Probabilistic Model
  • An outcome is captured by n random variables (RVs)
  • Each RV can take m different values
  • A random configuration is an assignment of one value to each of the n variables, e.g. (V1 = v1, ..., Vn = vn)

  4. Computational Tasks

  5. Joint Distribution Model
  • Modeled using the full joint distribution P(V1, ..., Vn)
  • Issues:
    • Memory cost to store the tables: m^n − 1 free parameters
    • For n = m = 10, that is about 10 billion numbers to store
    • Runtime cost: answering queries requires lots of summations over the table
    • The sparse-data problem in learning

  6. Fully Independent Model
  • Represented by P(V1, ..., Vn) = P(V1) P(V2) ··· P(Vn)
  • Solves most problems of the joint distribution model
  • Number of parameters: n(m − 1)
  • But!
    • Too strong an assumption
    • Not accurate

  7. Naïve Bayes Model
  • Represented by P(C, X1, ..., Xn−1) = P(C) ∏i P(Xi | C), where C is the class variable
  • Efficient
  • Number of parameters: grows only linearly in n
  • Good accuracy for some applications, like text classification
  • Still over-simplified for some applications
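
To make the parameter counts on the last three slides concrete, here is a minimal Python sketch, assuming n variables that each take m values and, for the naïve Bayes model, that one of them serves as the class variable:

    def joint_params(n, m):
        # Full table over all m**n configurations, minus 1 for normalization.
        return m ** n - 1

    def independent_params(n, m):
        # One marginal per variable, each with m - 1 free entries.
        return n * (m - 1)

    def naive_bayes_params(n, m):
        # One prior over the class plus one conditional table per remaining variable.
        return (m - 1) + (n - 1) * m * (m - 1)

    print(joint_params(10, 10))        # 9999999999  (~10 billion)
    print(independent_params(10, 10))  # 90
    print(naive_bayes_params(10, 10))  # 819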

  8. Question? What if we want a better compromise between the model's computability and the model's accuracy?

  9. Conditional Independence
  • Independence of two random variables X and Y: P(X, Y) = P(X) P(Y)
  • Conditional independence of X and Y given Z: P(X, Y | Z) = P(X | Z) P(Y | Z)

  10. Answer: Bayesian Networks
A Bayesian network is defined by a directed acyclic graph (DAG) together with a collection of conditional probability tables (CPTs). Nodes in the graph represent random variables, and the graph structure encodes conditional independence assumptions. The edges are interpreted as follows: if Vj (1 ≤ j ≤ n) is a random variable and Vπ(j) are its parent variables, i.e. the source nodes of all edges whose destination node is Vj, then given Vπ(j) the variable Vj is conditionally independent of its non-descendants; i.e. P(Vj | Vπ(j), non-descendants) = P(Vj | Vπ(j)). The joint distribution therefore factorizes as P(V1, ..., Vn) = ∏j P(Vj | Vπ(j)).
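
As an illustration of this factorization, a Bayesian network can be stored as a parent list plus one CPT per node, and the probability of a complete configuration is then a product of one table lookup per node. The dictionary layout below (parents[node] and cpt[node][parent values][value]) is an illustrative choice, not something prescribed by the slides; the later sketches reuse it:

    def joint_prob(parents, cpt, config):
        # P(V1 = v1, ..., Vn = vn) as the product of P(vj | v_parents(j)).
        p = 1.0
        for node, value in config.items():
            parent_vals = tuple(config[q] for q in parents[node])
            p *= cpt[node][parent_vals][value]
        return p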

  11. A Graphical Example V1 V2 V3 V4

  12. Representational Power (1) V1 V2 V3 V4: full joint distribution model. With each node taking all earlier nodes as parents, the network represents P(V1) P(V2 | V1) P(V3 | V1, V2) P(V4 | V1, V2, V3), which is the full joint distribution.

  13. Representational Power (2) V1 V2 V3 V4: fully independent model. With no edges, the network represents P(V1) P(V2) P(V3) P(V4).

  14. Representational Power (3) V1 V2 V3 V4: naïve Bayes model. With one node (say V1) as the class and the others as its children, the network represents P(V1) P(V2 | V1) P(V3 | V1) P(V4 | V1).

  15. Representational Power (4) X1 X2 X3 o1 o2 o3: HMM model. With hidden states X1 → X2 → X3 and each observation oi a child of Xi, the network represents P(X1) P(o1 | X1) P(X2 | X1) P(o2 | X2) P(X3 | X2) P(o3 | X3).

  16. Computational Tasks (1)
  • Evaluation: compute the probability of a complete configuration as the product P(v1, ..., vn) = ∏j P(vj | vπ(j))
  • Simulation:
    • Draw each vj from P(Vj | Vπ(j) = vπ(j)), following a topological order of the graph
    • Conjoin the drawn values to form a complete configuration
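
One way to read the simulation step is ancestral sampling: visit the nodes in a topological order and draw each value from the CPT row selected by its already-sampled parents. A minimal Python sketch, assuming the parents/cpt dictionary layout from the earlier sketch (names are illustrative, not from the slides):

    import random

    def ancestral_sample(order, parents, cpt):
        # 'order' must be a topological ordering of the DAG, so every node's
        # parents are sampled before the node itself.
        config = {}
        for node in order:
            parent_vals = tuple(config[q] for q in parents[node])
            dist = cpt[node][parent_vals]           # dict: value -> probability
            values, probs = zip(*dist.items())
            config[node] = random.choices(values, weights=probs)[0]
        return config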

  17. Computational Tasks (2)
  • Inference
    • Use the tables to calculate the desired conditional probability by brute force (sum the joint over all unobserved variables)
    • In tree-structured Bayesian networks, use message-passing algorithms
  • Learning
    • Given the network graph and complete observations, use maximum-likelihood estimation (i.e. counting)
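
A minimal sketch of the brute-force approach, reusing the hypothetical joint_prob helper and dictionary layout from the earlier sketches; domains maps each variable to the list of values it can take:

    from itertools import product

    def brute_force_query(domains, parents, cpt, query, evidence):
        # P(query | evidence) = sum over configs consistent with query+evidence,
        # divided by the sum over configs consistent with evidence alone.
        def total(fixed):
            free = [v for v in domains if v not in fixed]
            s = 0.0
            for combo in product(*(domains[v] for v in free)):
                config = {**fixed, **dict(zip(free, combo))}
                s += joint_prob(parents, cpt, config)
            return s
        return total({**evidence, **query}) / total(evidence)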

  18. Number of Free Parameters
  • Number of free parameters for each node: (m − 1) · m^k, where k is the number of parents of that node and each variable takes m values
  • Examples
    • Fully independent model: every node has k = 0 parents, giving n(m − 1) parameters in total
    • Joint distribution model (fully connected DAG): summing over the nodes gives m^n − 1 parameters
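
The per-node count can be turned into a small helper. The sketch below generalizes (m − 1) · m^k to parents with arbitrary domain sizes and reproduces the two examples from the slide; the names and dictionary layout follow the earlier illustrative sketches:

    def free_parameters(domains, parents):
        # Each node needs (|domain(node)| - 1) free entries per combination of
        # parent values, i.e. (m - 1) * m**k when every variable takes m values.
        total = 0
        for node in domains:
            rows = 1
            for q in parents[node]:
                rows *= len(domains[q])
            total += (len(domains[node]) - 1) * rows
        return total

    # Example: 4 binary variables.
    doms = {v: [True, False] for v in 'WXYZ'}
    no_parents = {v: () for v in doms}
    all_previous = {'W': (), 'X': ('W',), 'Y': ('W', 'X'), 'Z': ('W', 'X', 'Y')}
    print(free_parameters(doms, no_parents))    # 4  = n(m - 1)
    print(free_parameters(doms, all_previous))  # 15 = m**n - 1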

  19. Computational Example (1)
  • You have a new burglar alarm installed.
  • It is reliable at detecting burglary, but also responds to minor earthquakes.
  • Two neighbors (John and Mary) call in case they hear the alarm.
  • John sometimes confuses the phone ringing with the alarm.
  • Mary does not hear the alarm well.

  20. Computational Example (2): Burglary, Earthquake, Alarm, John Calls, Mary Calls
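
Written in the illustrative dictionary layout used above, and using the conditional probabilities that appear in the inference slides below, the alarm network looks like this (B, E, A, J, M abbreviate Burglary, Earthquake, Alarm, JohnCalls, MaryCalls):

    T, F = True, False
    parents = {'B': (), 'E': (), 'A': ('B', 'E'), 'J': ('A',), 'M': ('A',)}
    cpt = {
        'B': {(): {T: 0.001, F: 0.999}},
        'E': {(): {T: 0.002, F: 0.998}},
        'A': {(T, T): {T: 0.95, F: 0.05}, (T, F): {T: 0.94, F: 0.06},
              (F, T): {T: 0.29, F: 0.71}, (F, F): {T: 0.001, F: 0.999}},
        'J': {(T,): {T: 0.90, F: 0.10}, (F,): {T: 0.05, F: 0.95}},
        'M': {(T,): {T: 0.70, F: 0.30}, (F,): {T: 0.01, F: 0.99}},
    }
    domains = {v: [T, F] for v in parents}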

  21. Evaluation Example (1) B E A J M

  22. Evaluation Example (2)

  23. Inference (1)
Suppose we are interested in calculating P(B = T | J = T). We can calculate it using
P(B = T | J = T) = P(B = T, J = T) / P(J = T)

  24. Inference (2)
Marginal probability:
P(B = T, J = T) = Σ over E, A, M of P(B = T) P(E) P(A | B = T, E) P(J = T | A) P(M | A)

  25. Inference (3)
P(B = T, J = T)
= P(B = T) P(E = T) P(A = T | B = T, E = T) P(J = T | A = T) P(M = T | A = T)
+ P(B = T) P(E = T) P(A = T | B = T, E = T) P(J = T | A = T) P(M = F | A = T)
+ P(B = T) P(E = T) P(A = F | B = T, E = T) P(J = T | A = F) P(M = T | A = F)
+ P(B = T) P(E = T) P(A = F | B = T, E = T) P(J = T | A = F) P(M = F | A = F)
+ P(B = T) P(E = F) P(A = T | B = T, E = F) P(J = T | A = T) P(M = T | A = T)
+ P(B = T) P(E = F) P(A = T | B = T, E = F) P(J = T | A = T) P(M = F | A = T)
+ P(B = T) P(E = F) P(A = F | B = T, E = F) P(J = T | A = F) P(M = T | A = F)
+ P(B = T) P(E = F) P(A = F | B = T, E = F) P(J = T | A = F) P(M = F | A = F)
= 0.001 · 0.002 · 0.95 · 0.9 · 0.7
+ 0.001 · 0.002 · 0.95 · 0.9 · 0.3
+ 0.001 · 0.002 · 0.05 · 0.05 · 0.01
+ 0.001 · 0.002 · 0.05 · 0.05 · 0.99
+ 0.001 · 0.998 · 0.94 · 0.9 · 0.7
+ 0.001 · 0.998 · 0.94 · 0.9 · 0.3
+ 0.001 · 0.998 · 0.06 · 0.05 · 0.01
+ 0.001 · 0.998 · 0.06 · 0.05 · 0.99
= 8.49017 · 10^−4

  26. Inference (4)
  • To calculate P(J = T), note that P(J = T) = P(B = T, J = T) + P(B = F, J = T)
  • Using a similar method we can calculate P(B = F, J = T)

  27. Inference (5)
P(B = F, J = T)
= P(B = F) P(E = T) P(A = T | B = F, E = T) P(J = T | A = T) P(M = T | A = T)
+ P(B = F) P(E = T) P(A = T | B = F, E = T) P(J = T | A = T) P(M = F | A = T)
+ P(B = F) P(E = T) P(A = F | B = F, E = T) P(J = T | A = F) P(M = T | A = F)
+ P(B = F) P(E = T) P(A = F | B = F, E = T) P(J = T | A = F) P(M = F | A = F)
+ P(B = F) P(E = F) P(A = T | B = F, E = F) P(J = T | A = T) P(M = T | A = T)
+ P(B = F) P(E = F) P(A = T | B = F, E = F) P(J = T | A = T) P(M = F | A = T)
+ P(B = F) P(E = F) P(A = F | B = F, E = F) P(J = T | A = F) P(M = T | A = F)
+ P(B = F) P(E = F) P(A = F | B = F, E = F) P(J = T | A = F) P(M = F | A = F)
= 0.999 · 0.002 · 0.29 · 0.9 · 0.7
+ 0.999 · 0.002 · 0.29 · 0.9 · 0.3
+ 0.999 · 0.002 · 0.71 · 0.05 · 0.01
+ 0.999 · 0.002 · 0.71 · 0.05 · 0.99
+ 0.999 · 0.998 · 0.001 · 0.9 · 0.7
+ 0.999 · 0.998 · 0.001 · 0.9 · 0.3
+ 0.999 · 0.998 · 0.999 · 0.05 · 0.01
+ 0.999 · 0.998 · 0.999 · 0.05 · 0.99
= 5.12899587 · 10^−2

  28. Inference (6)
Therefore
P(J = T) = P(B = T, J = T) + P(B = F, J = T) = 8.49017 · 10^−4 + 5.12899587 · 10^−2 = 0.0521389757
and finally
P(B = T | J = T) = P(B = T, J = T) / P(J = T) = 8.49017 · 10^−4 / 0.0521389757 ≈ 0.0163
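
The same numbers can be reproduced with a few lines of Python that sum the factored joint over the unobserved variables, reusing the hypothetical joint_prob helper and the parents/cpt/domains definitions sketched above:

    from itertools import product

    def marginal(fixed):
        # Sum the joint over every complete configuration consistent with 'fixed'.
        free = [v for v in domains if v not in fixed]
        total = 0.0
        for combo in product(*(domains[v] for v in free)):
            config = {**fixed, **dict(zip(free, combo))}
            total += joint_prob(parents, cpt, config)
        return total

    p_bj = marginal({'B': T, 'J': T})   # about 8.49017e-04
    p_j  = marginal({'J': T})           # about 0.0521389757
    print(p_bj / p_j)                   # P(B = T | J = T), about 0.0163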

  29. Take-Away Message
  • Inference is hard!
  • In fact, for general Bayesian networks inference can be NP-hard.
  • We can do better in tree-structured Bayesian networks:
  • The message-passing algorithm, also known as the sum-product algorithm, can be used.

  30. What about NLP?
  • Bayesian networks are relatively new to NLP.
  • Selected example for this presentation:
  Weissenbacher, D. 2006. Bayesian network, a model for NLP? In Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006): Posters & Demonstrations, Trento, Italy, April 5–6, 2006. Association for Computational Linguistics, pp. 195–198.

  31. Anaphoric Pronoun
  • A pronoun that refers to a linguistic expression previously introduced in the text.
  • Examples
    • "Nonexpression of the locus even when it is present suggests that these chromosomes …": here the pronoun it is anaphoric.
    • "Thus, it is not unexpected that this versatile cellular …": here the pronoun it is non-anaphoric.

  32. What does the paper do? It attempts to identify non-anaphoric it using Bayesian networks.

  33. History of Other Algorithms (1)
  • The first pronoun classifier was proposed in 1987 by Paice.
  • It relied on a set of first-order logical rules:
    • Non-anaphoric constructions start with it and end with a delimiter such as to, that, whether, …
    • The pronoun must not be immediately preceded by certain words such as before, from, to, …
    • The distance between the pronoun and the delimiter must be shorter than 25 words.
    • The lexical items occurring between the pronoun and the delimiter must not contain certain words belonging to specific sets.
  • It produced lots of false positives.

  34. History of Other Algorithms (2)
  • To solve the false-positive problem, Lappin in 1994 proposed:
    • More constrained rules, in the form of finite-state automata.
    • These helped in finding specific sequences like: It is not/may be <Modaladj>; It is <Cogved> that <Subject>.
  • This solved the false-positive problem but introduced lots of false negatives.

  35. History of Other Algorithms (3)
  • Evans, in 2001:
    • Proposed a machine-learning approach based on surface clues.
    • 35 syntactic and contextual surface clues were considered for learning, e.g.:
      • The pronoun's position in the sentence
      • The lemma of the following verb
    • After learning, k-nearest neighbours (k-NN) was used for classification.
  • Accuracy was good but not great.

  36. History of Other Algorithms (4)
  • Clement, in 2004:
    • Used a similar machine-learning approach.
    • Used the 21 most relevant surface clues.
    • Classified new instances with an SVM.
  • Achieved better accuracy.

  37. Room for Improvement? Each of the proposed models has its own strengths and weaknesses. Is it possible to combine the strengths of these systems to create a better system?

  38. The Answer

  39. How does it work?
  • A priori probabilities are estimated using frequency counts on the training corpus.
  • The a posteriori probability is calculated from the observations and the a priori probabilities.
  • A 50% threshold on the posterior is used to label the pronoun as non-anaphoric.
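
A very rough sketch of this decision rule, assuming a single surface clue for illustration; the function names and the one-clue simplification are assumptions made here, not the paper's actual model, which combines many clues through the network's CPTs:

    def prior_from_counts(n_non_anaphoric, n_total):
        # A priori P(non-anaphoric), estimated by frequency counts on the corpus.
        return n_non_anaphoric / n_total

    def posterior(prior, p_clue_given_non_anaphoric, p_clue_given_anaphoric):
        # Single-clue Bayes update from the observation back to the label.
        num = prior * p_clue_given_non_anaphoric
        return num / (num + (1 - prior) * p_clue_given_anaphoric)

    def classify(post, threshold=0.5):
        # 50% threshold on the posterior, as described on the slide.
        return 'non-anaphoric' if post > threshold else 'anaphoric'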

  40. Results

  41. Conclusion
  • Bayesian networks are powerful and flexible tools for probabilistic modeling.
  • Inference can be NP-hard.
  • For tree-structured Bayesian networks, efficient algorithms exist.
  • Bayesian networks are relatively new to NLP.
  • Initial results seem promising.

  42. Questions?
