Made by: Maor Levy, Temple University 2012

Probability in Artificial Intelligence. Unit 3, Introduction to Artificial Intelligence, Stanford online course. Probability expresses uncertainty and is pervasive in all of Artificial Intelligence, including machine learning and information retrieval (e.g., the Web).


Presentation Transcript


  1. Probability in Artificial Intelligence. Unit 3, Introduction to Artificial Intelligence, Stanford online course. Made by: Maor Levy, Temple University 2012

  2. Probability expresses uncertainty. • Pervasive in all of Artificial Intelligence: • Machine learning • Information Retrieval (e.g., Web) • Computer Vision • Robotics • Based on mathematical calculus. • Probability of a fair coin: P(heads) = P(tails) = 0.5

  3. Example: Probability of cancer • P(has cancer) = 0.02 • P(¬has cancer) = 0.98 • Multiple events: cancer, test result • Joint probability: P(has cancer, test positive) • The problem with joint distributions: for n binary variables it takes 2^n − 1 numbers to specify them!

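  4. Conditional probability describes the cancer test: • P(test positive | has cancer) = 0.9 • P(test positive | ¬has cancer) = 0.2 • Put this together with the prior probability: • P(has cancer) = 0.02 • The complementary outcome follows: P(test negative | has cancer) = 0.1 • Total probability is a fundamental rule relating marginal probabilities to conditional probabilities: P(test positive) = P(test positive | has cancer) P(has cancer) + P(test positive | ¬has cancer) P(¬has cancer) = 0.9 × 0.02 + 0.2 × 0.98 = 0.214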

  5. In summary: • P(has cancer) = 0.02 • P(¬has cancer) = 0.98 • P(test positive | has cancer) = 0.9 • P(test positive | ¬has cancer) = 0.2 • P(test negative | has cancer) = 0.1 • P(test negative | ¬has cancer) = 0.8 • P(cancer) and P(test positive | cancer) are called the model. • Calculating P(test positive) is called prediction. • Calculating P(cancer | test positive) is called diagnostic reasoning.
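
The prediction and diagnostic-reasoning steps named on slide 5 can be worked through numerically. The following is a minimal Python sketch, assuming the two test conditionals are P(test positive | has cancer) = 0.9 and P(test positive | ¬has cancer) = 0.2; the variable names are illustrative and not part of the slides.

```python
# Model from the cancer example (variable names are illustrative).
p_cancer = 0.02               # prior P(has cancer)
p_pos_given_cancer = 0.9      # P(test positive | has cancer)
p_pos_given_no_cancer = 0.2   # P(test positive | no cancer), assumed reading

# Prediction: total probability of a positive test.
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_no_cancer * (1 - p_cancer))

# Diagnostic reasoning: Bayes' rule gives P(has cancer | test positive).
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

print(f"P(test positive) = {p_pos:.3f}")                            # 0.214
print(f"P(has cancer | test positive) = {p_cancer_given_pos:.3f}")  # 0.084
```

Even after a positive test the posterior stays below 10%, because the 2% prior is small compared with the 20% false-positive rate.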

  6. A belief network consists of: • A directed acyclic graph with nodes labeled with random variables • A domain for each random variable • A set of conditional probability tables, one for each variable given its parents (including prior probabilities for nodes with no parents). • A belief network is a graph: the nodes are random variables; there is an arc from each parent of a node into that node. • A belief network is automatically acyclic by construction. • A belief network is a directed acyclic graph (DAG) where nodes are random variables. • The parents of a node n are those variables on which n directly depends. • A belief network is a graphical representation of dependence and independence: • A variable is independent of its non-descendants given its parents.

  7. Whether l1 is lit (L1_lit) depends only on the status of the light (L1_st) and whether there is power in wire w0. Thus, L1_lit is independent of the other variables given L1_st and W0. • In a belief network, W0 and L1_st are parents of L1_lit. • Similarly, W0 depends only on whether there is power in w1, whether there is power in w2, the position of switch s2 (S2_pos), and the status of switch s2 (S2_st).

  8. To represent a domain in a belief network, you need to consider: • What are the relevant variables? • What will you observe? • What would you like to find out (query)? • What other features make the model simpler? • What values should these variables take? • What is the relationship between them? This should be expressed in terms of local influence. • How does the value of each variable depend on its parents? This is expressed in terms of the conditional probabilities.

  9. The power network can be used in a number of ways: • Conditioning on the status of the switches and circuit breakers, whether there is outside power, and the position of the switches, you can simulate the lighting. • Given values for the switches, the outside power, and whether the lights are lit, you can determine the posterior probability that each switch or circuit breaker is ok or not. • Given some switch positions, some outputs, and some intermediate values, you can determine the probability of any other variable in the network.

  10. A Bayes network is a form of probabilistic graphical model. Specifically, a Bayes network is a directed acyclic graph of nodes representing variables and arcs representing dependence relations among the variables. • It is a representation of the joint distribution over all the variables represented by nodes in the graph. Let the variables be X(1), ..., X(n). • Let Parents(A) be the parents of the node A. • Then the joint distribution for X(1) through X(n) is represented as the product of the probability distributions P(X(i) | Parents(X(i))) for i = 1 to n: P(X(1), ..., X(n)) = ∏ P(X(i) | Parents(X(i))), with the product taken over i = 1 to n. • If X has no parents, its probability distribution is said to be unconditional; otherwise it is conditional.
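
The product form of the joint distribution on slide 10 can be illustrated with a very small network. The sketch below assumes a hypothetical two-node network, Cancer → Test, filled with the numbers from the cancer example; the dictionary layout and the names `parents`, `cpt`, and `joint_probability` are illustrative choices, not something the slides define.

```python
from itertools import product

# Hypothetical two-node network: Cancer -> Test (numbers from the cancer example).
parents = {"Cancer": [], "Test": ["Cancer"]}

# cpt[var][parent_values] = P(var = True | parents = parent_values)
cpt = {
    "Cancer": {(): 0.02},
    "Test": {(True,): 0.9, (False,): 0.2},
}


def joint_probability(assignment):
    """Multiply P(node value | values of its parents) over all nodes."""
    prob = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[p] for p in pars)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob


names = list(parents)
# Enumerate every truth assignment and compute its joint probability.
joint = {
    values: joint_probability(dict(zip(names, values)))
    for values in product([True, False], repeat=len(names))
}
total = sum(joint.values())  # a valid joint distribution sums to 1
```

Here `joint[(True, True)]` is P(has cancer, test positive) = 0.02 × 0.9, and the four entries together sum to 1.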

  11. Examples of Bayes network:

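  12. True Bayesians actually consider conditional probabilities as more basic than joint probabilities. • It is easy to define P(A|B) without reference to the joint probability P(A,B). • Bayes’ Rule: P(A|B) = P(B|A) P(A) / P(B) • Back to the cancer example: P(has cancer | test positive) = P(test positive | has cancer) P(has cancer) / P(test positive) = (0.9 × 0.02) / (0.9 × 0.02 + 0.2 × 0.98) = 0.018 / 0.214 ≈ 0.084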

  13. Two variables X and Y are independent if: P(x, y) = P(x) P(y) for all values x, y • It means that the occurrence of one event makes it neither more nor less probable that the other occurs. • This says that their joint distribution factors into a product of two simpler distributions • This implies: P(x | y) = P(x) for all x, y with P(y) > 0 • We write: X ⊥ Y • Independence is a simplifying modeling assumption • Empirical joint distributions: at best “close” to independent • For example: • The event of getting a 6 the first time a die is rolled and the event of getting a 6 the second time are independent. • By contrast, the event of getting a 6 the first time a die is rolled and the event that the sum of the numbers seen on the first and second trials is 8 are not independent.
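
The two dice examples on slide 13 can be checked by enumerating all 36 equally likely outcomes of two rolls and testing whether P(A and B) = P(A) P(B). This is a minimal sketch; the helper names `prob` and `independent` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling a fair die twice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event, given as a predicate on (first, second)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_six(o):
    return o[0] == 6


def second_six(o):
    return o[1] == 6


def sum_eight(o):
    return o[0] + o[1] == 8


print(independent(first_six, second_six))  # True
print(independent(first_six, sum_eight))   # False
```

The second check fails because P(first is 6 and sum is 8) = 1/36, while P(first is 6) × P(sum is 8) = 1/6 × 5/36 = 5/216.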

  14. Two events are dependent if the outcome or occurrence of the first affects the outcome or occurrence of the second so that the probability is changed. • Example: A card is chosen at random from a standard deck of 52 playing cards. Without replacing it, a second card is chosen. What is the probability that the first card chosen is a queen and the second card chosen is a jack? • Probabilities: • P(queen on first pick) = 4/52 • P(jack on 2nd pick given queen on 1st pick) = 4/51 • P(queen and jack) = 4/52 × 4/51 = 16/2652 = 4/663
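
The card calculation on slide 14 is an application of the product rule, P(A, B) = P(A) P(B | A). A short sketch with exact fractions, using illustrative variable names:

```python
from fractions import Fraction

p_queen_first = Fraction(4, 52)               # 4 queens among 52 cards
p_jack_second_given_queen = Fraction(4, 51)   # 4 jacks among the 51 remaining cards

# Product rule: P(queen first, jack second).
p_queen_then_jack = p_queen_first * p_jack_second_given_queen
print(p_queen_then_jack)  # 4/663
```

With replacement the second factor would be 4/52 instead of 4/51, which is exactly the change in probability that makes the two draws dependent.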

  15. X and Y are conditionally independent given a third event Z precisely if the occurrence or non-occurrence of X and the occurrence or non-occurrence of Y are independent events in their conditional probability distribution given Z. We write: X ⊥ Y | Z, meaning P(X, Y | Z) = P(X | Z) P(Y | Z).

  16. Why Bayes Networks? • Network over the nodes A, B, C, D, E, specified by: • P(A) • P(B) • P(C|A,B) • P(D|E) • P(E|C) • An unstructured joint distribution of any five binary variables needs 2^5 − 1 = 31 parameters. • In a Bayes network: • P(A,B,C,D,E)=P(A)*P(B)*P(C|A,B)*P(D|E)*P(E|C) • Parameters: 1 + 1 + 4 + 2 + 2, a total of 10
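
The parameter count on slide 16 follows from a simple rule: a binary node with k binary parents needs 2^k numbers, one per combination of parent values. A minimal sketch, assuming all variables are binary and using the parent sets implied by the factorization on the slide; `count_parameters` is a hypothetical helper.

```python
def count_parameters(parents, arity=2):
    """Independent parameters needed when every variable has `arity` values."""
    return sum((arity - 1) * arity ** len(ps) for ps in parents.values())


# Parent sets read off P(A) P(B) P(C|A,B) P(D|E) P(E|C).
network = {"A": [], "B": [], "C": ["A", "B"], "D": ["E"], "E": ["C"]}

bayes_net_params = count_parameters(network)   # 1 + 1 + 4 + 2 + 2
full_joint_params = 2 ** len(network) - 1      # unstructured joint table

print(bayes_net_params, full_joint_params)  # 10 31
```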

  17. The Naïve network has • Bayes Network needs only 47 numerical probabilities to specify the joint.

  18. Active and inactive triples • Are X and Y conditionally independent given evidence variables {Z}? • Yes, if X and Y are “separated” by Z • Look for active paths from X to Y • No active paths = independence! • A path is active if each triple is active: • Causal chain A → B → C where B is unobserved (either direction) • Common cause A ← B → C where B is unobserved • Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed • All it takes to block a path is a single inactive segment
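
The active-triple rules on slide 18 translate fairly directly into a path search. The sketch below is one straightforward way to do it, not an optimized algorithm: it walks every simple undirected path from X to Y and abandons a path as soon as a triple on it is inactive. The function names and the edge-list representation (pairs of parent, child) are assumptions made for illustration.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following arcs forward."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_d_separated(edges, x, y, observed):
    """True if no active path links x and y given the observed nodes."""
    observed = set(observed)
    edge_set = set(edges)
    children, neighbors = {}, {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
        neighbors.setdefault(parent, set()).add(child)
        neighbors.setdefault(child, set()).add(parent)

    def triple_active(a, b, c):
        if (a, b) in edge_set and (c, b) in edge_set:
            # Common effect: active only if b or one of its descendants is observed.
            return b in observed or bool(descendants(children, b) & observed)
        # Causal chain or common cause: active only if b is unobserved.
        return b not in observed

    def has_active_path(path):
        last = path[-1]
        if last == y:
            return True
        for nxt in neighbors.get(last, ()):
            if nxt in path:
                continue  # simple paths only
            if len(path) >= 2 and not triple_active(path[-2], last, nxt):
                continue  # a single inactive triple blocks this path
            if has_active_path(path + [nxt]):
                return True
        return False

    return not has_active_path([x])


chain = [("A", "B"), ("B", "C")]
collider = [("A", "B"), ("C", "B"), ("B", "D")]
print(is_d_separated(chain, "A", "C", []))        # False: chain is active
print(is_d_separated(chain, "A", "C", ["B"]))     # True: observing B blocks it
print(is_d_separated(collider, "A", "C", []))     # True: unobserved common effect
print(is_d_separated(collider, "A", "C", ["D"]))  # False: descendant D is observed
```

Enumerating paths is exponential in the worst case, so this form is only suitable for small networks like the ones on the slides.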

  19. Examples: (figure: a network over the nodes L, R, B, D, T, T’ with conditional-independence queries, each answered “Yes”)

  20. Overview: • Bayes network: • Graphical representation of joint distributions • Efficiently encode conditional independencies • Reduce number of parameters from exponential to linear (in many cases)
