
Basics of Probability



  1. Basics of Probability

  2. A Bit of Math
  A Probability Space is a triple ⟨Ω, F, P⟩, where
  • Ω is the sample space: a non-empty set of possible outcomes;
  • F is an algebra (a.k.a. field) on Ω, that is to say, F is a set of subsets of Ω that contains Ω as a member and is closed under union and complementation;
  • P is a function from F to the real numbers such that (1) P(S) ≥ 0 for all S ∈ F; (2) P(Ω) = 1; (3) for all A, B ∈ F, if A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).
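
  (An illustrative sketch, not from the original slides: for a small finite space where F is the power set of Ω and P is the uniform measure, the three axioms can be checked mechanically.)

      from fractions import Fraction
      from itertools import combinations

      omega = frozenset(range(1, 5))      # a small sample space {1, 2, 3, 4}

      def powerset(s):
          return [frozenset(c) for r in range(len(s) + 1)
                  for c in combinations(s, r)]

      F = powerset(omega)                 # the algebra: all subsets of omega

      def P(event):                       # uniform measure: |event| / |omega|
          return Fraction(len(event), len(omega))

      # Axiom 1: non-negativity
      assert all(P(S) >= 0 for S in F)
      # Axiom 2: the sample space has probability 1
      assert P(omega) == 1
      # Axiom 3: finite additivity for disjoint events
      assert all(P(A | B) == P(A) + P(B)
                 for A in F for B in F if not (A & B))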

  3. Example
  • Sample space: Ω = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}.
  • Algebra: F = P(Ω), the set of all subsets of Ω.
  • Technically, members of F are usually referred to as events. But they can also express properties, propositions, etc., e.g.
  event: “a black object being selected” – {1, 2, 3, 4, 5, 6, 7, 8, 9}
  property: “showing a letter A” – {1, 2, 7, 10, 12}
  proposition: “the object selected is both round and black” – {7, 8, 9}
  [Figure: thirteen objects numbered 1–13, each marked with the letter A or B]
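
  (A minimal sketch of this example, using only the sets given on the slide and one object drawn uniformly at random:)

      from fractions import Fraction

      omega = frozenset(range(1, 14))            # the 13 objects
      black = frozenset(range(1, 10))            # {1, ..., 9}
      shows_A = frozenset({1, 2, 7, 10, 12})
      round_and_black = frozenset({7, 8, 9})

      def P(event):                              # uniform measure on omega
          return Fraction(len(event), len(omega))

      print(P(black))              # 9/13
      print(P(shows_A))            # 5/13
      print(P(round_and_black))    # 3/13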

  4. Conditional Probability
  • Define P(A | B) = P(A ∩ B) / P(B), for P(B) > 0.
  • Bayes’ theorem: P(B | A) = [P(A | B) P(B)] / P(A).
  • Law of total probability: let B1, …, Bm ∈ F be a partition of Ω, that is to say, Bi ∩ Bj = ∅ for all i ≠ j, and B1 ∪ … ∪ Bm = Ω. Then P(A) = Σi P(A ∩ Bi) = Σi P(A | Bi) P(Bi).
  • Independence: two events A, B are said to be independent if P(A | B) = P(A) (which implies that P(B | A) = P(B)).
  • Conditional independence: A, B are independent conditional on C if P(A | B, C) = P(A | C).
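
  (Continuing the sketch style above — frozenset events and a measure P like the one defined earlier — the definitions translate almost verbatim; an illustration, not part of the slides:)

      def cond(P, A, B):              # P(A | B) = P(A ∩ B) / P(B)
          return P(A & B) / P(B)

      def bayes(P, A, B):             # P(B | A) via Bayes' theorem
          return cond(P, A, B) * P(B) / P(A)

      def total(P, A, partition):     # law of total probability
          return sum(cond(P, A, B) * P(B) for B in partition)

      def independent(P, A, B):       # A, B independent iff P(A | B) = P(A)
          return cond(P, A, B) == P(A)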

  5. Example
  • Suppose one object is chosen randomly from the 13 objects. Consider the following events: A (having a letter A), B (having a letter B), s (being a square), w (being white).
  • P(A) = 5/13, P(A & s) = 3/13, P(s | A) = P(A & s) / P(A) = 3/5.
  • P(s) = P(A) P(s | A) + P(B) P(s | B) = 8/13.
  • P(A | s) = [P(A) P(s | A)] / P(s) = 3/8.
  • A and s are not independent.
  • P(A | w) = P(A | s, w) = 1/2. So A and s are independent conditional on w.
  [Figure: the same thirteen objects, numbered 1–13 and marked A or B]
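
  (The slide’s arithmetic can be checked with exact fractions; the individual shape and color assignments are not listed on the slide, so only the stated values are used:)

      from fractions import Fraction

      P_A = Fraction(5, 13)
      P_A_and_s = Fraction(3, 13)
      P_s_given_A = P_A_and_s / P_A
      assert P_s_given_A == Fraction(3, 5)

      P_s = Fraction(8, 13)                    # given via total probability
      P_A_given_s = P_A * P_s_given_A / P_s    # Bayes' theorem
      assert P_A_given_s == Fraction(3, 8)

      assert P_A_given_s != P_A                # so A and s are not independent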

  6. Different Concepts of Probability
  • Probability as relative frequency
  • Probability as propensity
  • Probability as degrees of logical entailment
  • Probability as degrees of belief

  7. Probability as Relative Frequency
  • Probability is some sort of relative frequency:
  • relative frequency in a finite population?
  • limiting relative frequency in an infinite sequence?
  • relative frequency in a “suitably” large population?
  • Problem of the single case: how to make sense of a probability statement about a singular case? Tempting answer: relative frequency in a reference class of similar cases.
  • Problem of the reference class: one case can belong to multiple reference classes. One answer: the narrowest class with enough data.

  8. Probability as Propensity
  • Probability is a propensity or tendency of a chance set-up to produce a certain outcome.
  • It seems natural to talk about single-case probabilities (usually known as chances), but only relative to some chance set-up.
  • The chance set-up needs to be stable and genuinely chancy to admit non-degenerate probability values.
  • Relative frequencies are still relevant for measuring propensity.

  9. Logical Probability
  • Probability is degree of partial entailment or confirmation between propositions.
  • Motivated by the aspiration of generalizing deductive logic into an inductive logic.
  • Given such a logic, we can talk about valid inductive arguments: they are arguments in which the premises confer the right logical (or inductive) probability on the conclusion.
  • There are then many logical probabilities associated with a proposition, relative to different “evidence” propositions. Principle of Total Evidence: the logical probability relative to all your evidence is your “right” credence.

  10. Subjective Probability
  • Probability is the degree of belief of some agent.
  • Assumed to have a nice correspondence with one’s betting behavior.
  • Dutch-book argument: your fair betting odds satisfy the probability calculus if and only if nobody can “book” you by designing a combination of bets, each acceptable to you, that entails a sure loss on your part.
  • Various proposals for putting more constraints on subjective probability, based on relative frequency, chance, or logical probability.
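
  (A hedged numeric illustration — the credences below are invented for this sketch: suppose an agent’s degrees of belief violate additivity, say P(A) = 0.6 and P(not-A) = 0.6. A bookie who sells the agent a $1-stake bet on each proposition at the agent’s own fair price guarantees the agent a loss either way:)

      # Hypothetical credences violating additivity: P(A) + P(not A) > 1
      credence = {"A": 0.6, "not A": 0.6}

      stake = 1.0   # each bet pays `stake` if its proposition is true, else 0

      # The agent deems price credence[X] * stake fair, so accepts both bets.
      cost = sum(credence[x] * stake for x in credence)   # 1.2 paid up front

      # Exactly one of A, not-A is true, so total payout is always `stake`.
      for outcome in ("A is true", "A is false"):
          payout = stake                                  # one winning bet
          print(outcome, "-> net:", round(payout - cost, 2))   # always -0.2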

  11. Three Approaches to Statistical Explanation

  12. Hempel’s I-S Model
  • Basic idea: a statistical explanation of a particular event is a correct inductive argument that contains essentially a statistical law in the premises, and confers a high (logical or inductive) probability on the explanandum.
  • Generic form:
        p(G; F) = r        (statistical law)
        Fa                 (particular facts about a)
        ================ [r]
        Ga                 (the explanandum)
  • Note that the conclusion of the (inductive) argument is not a probabilistic statement, but rather the very explanandum: the sentence describing the particular event to be explained.

  13. The Problem of Ambiguity
  • One problem is that there can be two correct inductive arguments with high inductive probability but contradictory conclusions.
  • The root of the problem is that it is possible to have two true statistical laws, p(G; F) = r1 and p(G; F&E) = r2, such that r1 is very high but r2 is very low.
  • You may read the above two laws this way: “relative to class F (or among objects with property F), the probability of G is high (r1)”, and “relative to class F&E (or among objects with both F and E), the probability of G is low (r2)”.
  • It looks like a problem of reference class.
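
  (A toy frequency illustration — the numbers are invented for this sketch: in a population where F-things are mostly G, the F&E-things may nevertheless be mostly non-G.)

      # Invented toy population: each tuple is (has F, has E, has G)
      population = (
          [(True, False, True)] * 90 +    # F, not E, G
          [(True, True, False)] * 9 +     # F and E, not G
          [(True, True, True)] * 1        # F and E, G
      )

      def freq(pred, within):
          members = [x for x in population if within(x)]
          return sum(pred(x) for x in members) / len(members)

      is_G = lambda x: x[2]
      print(freq(is_G, within=lambda x: x[0]))           # P(G | F)   = 0.91
      print(freq(is_G, within=lambda x: x[0] and x[1]))  # P(G | F&E) = 0.1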

  14. Maximal Specificity
  • It is then natural to suggest the following solution: choose the most specific or the narrowest “reference class”.
  • This is essentially the requirement of maximal specificity imposed by Hempel.
  • Note that his “maximal specificity” is relative to a knowledge situation. That is, the requirement is that the statistical law used in an I-S explanation must be the most specific statistical law we know that governs the case in question. (There might be an even more specific but unknown statistical law that governs the case.)
  • So I-S explanations are always relative to a knowledge or epistemic situation.

  15. Railton’s D-N-P Model
  • Basic idea: the argument central to a statistical explanation is still deductive, not inductive. The relevant statistical laws are “universal laws about chances”.
  • Generic form:
        ∀x (Fx → p(G(x)) = r)    (statistical law)
        Fa                       (particular facts about a)
        ----------------------
        p(Ga) = r                (chance of the explanandum)
  • Note that the conclusion of the argument is not the explanandum itself, but a sentence describing the chance of the particular event in question.

  16. Salmon’s S-R Approach
  • Basic idea: a statistical explanation is not an argument at all, but is rather (based upon) an assembly of all (and only) the facts that are statistically relevant to the explanandum, together with how the probability of the explanandum depends on the relevant factors.
  • Salmon gave 8 steps/conditions for constructing the S-R basis.
  • Condition 5 essentially requires that all statistically relevant factors be considered in the partition. (An analogue of Hempel’s maximal specificity, but without being relativized to a knowledge situation.)
  • Condition 7 essentially requires that only statistically relevant factors be used in the partition.
  • Also note that all relevant probabilities are calculated in step 4.
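
  (Statistical relevance itself has a simple operational test. The sketch below is an illustration of that test, not Salmon’s own procedure: a factor C is relevant to G within reference class F exactly when conditioning on C changes the probability.)

      from fractions import Fraction

      omega = frozenset(range(1, 14))      # any finite sample space

      def P(event):                        # uniform measure on omega
          return Fraction(len(event), len(omega))

      def cond(A, B):                      # P(A | B)
          return P(A & B) / P(B)

      def relevant(G, C, F=omega):
          # C is statistically relevant to G within F
          # iff P(G | F ∩ C) ≠ P(G | F)
          return cond(G, F & C) != cond(G, F)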

  17. A Few More Observations
  • Neither Railton nor Salmon requires high probability of any sort.
  • Neither Railton nor Salmon relativizes the concept of statistical explanation to a knowledge situation.
  • Salmon, but not Hempel or Railton, seems to suggest that it is not only important to ascertain the probability of the explanandum (relative frequency with respect to the right reference class, or chance given the right chance set-up), but also crucial to reveal how that probability depends on various factors.
