
Combining Logic and Probability





  1. Combining Logic and Probability

  2. Can logic be combined with probability? Trivially yes: logic can be combined with anything. • Slightly less trivially, logic is already involved in constraining probabilities: certain deductive relationships have to be respected.

  3. Carnap ‘combined’ logic with probability by defining probabilities on logical formulas and called the result logic – without much in the way of argument, and none of it very convincing. But he set a trend that continues …

  4. Carnap’s model seems to have been logic + probability = logical probability. Cf. cheese + soufflé = cheese soufflé. • But other people through the ages, sporadically from the 17th century to the 20th, have had bigger things in mind: specifically Logic + Probability = Logic

  5. I think they’re right. But the unity at this level is fairly coarse-grained and hides a diversity of approaches. I will describe one, with which I am most in sympathy, and which seems in line with some versions of progicism.

  6. However, given some plausible(?) constraints, it leads us in a possibly unexpected direction. But not to progicide. Now read on …

  7. The formalisms of logic and probability are remarkably similar, suggesting a conceptual kinship. Both are domain-general theories of validity, consisting of a type of propositional algebra A and a real-valued non-negative additive valuation function V on A.

  8. Seventy years ago de Finetti claimed that the rules of subjective probability are consistency constraints on probability-evaluations (‘coerenza’ = ‘consistency’). The semantic characteristic of consistency is possession of a model. Question: Do ‘salient’ features of deductive model theory extend naturally to probability?

  9. In 1964 Haim Gaifman went a long way to answering this question positively. He extended the notion of a valuation in a relational structure to that of a probability model for a first order language L.

  10. A classical relational structure for a first order language L can be viewed as a pair (D, m), where m is an additive function assigning values in {0,1} to the atomic sentences of L(D) (L plus enough constants to name all members of D), extended to all sentences via the usual recursive definition. Gaifman considered structures (D, m) where m is a finitely additive function taking values in [0,1].
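The passage from {0,1}-valued to [0,1]-valued additive valuations can be sketched in a toy propositional setting (an illustration only; the atoms, weights, and function names here are assumptions, not part of Gaifman's construction):

```python
from itertools import product

# Toy illustration: put weights on the truth assignments to finitely many
# atoms and let m(sentence) be the total weight of the satisfying
# assignments.  m is then finitely additive, and a point mass on a single
# assignment recovers the classical two-valued case m : sentences -> {0,1}.

atoms = ["p", "q"]
assignments = [dict(zip(atoms, bits)) for bits in product([0, 1], repeat=2)]

def m(satisfies, weights):
    """Value of a sentence, given its satisfaction test and the weights."""
    return sum(w for a, w in zip(assignments, weights) if satisfies(a))

uniform = [0.25, 0.25, 0.25, 0.25]   # a genuinely probabilistic valuation
classical = [0.0, 0.0, 1.0, 0.0]     # point mass on p=1, q=0: the {0,1} case

p_or_q = lambda a: bool(a["p"] or a["q"])
print(m(p_or_q, uniform))    # 0.75
print(m(p_or_q, classical))  # 1.0
```

The same `m` serves both cases: only the weight vector decides whether the valuation is two-valued or properly probabilistic.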

  11. {0,1} → [0,1]: “a small step for (Gaif)man; a giant stride for mankind.”

  12. His best known result was a ‘logical’ analogue of the extension theorem for countably additive measures: a probability on the quantifier-free sentences of L has a unique extension to all the sentences of L satisfying the Gaifman condition. He also defined natural analogues of submodel, elementary submodel, etc.
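For reference, the Gaifman condition mentioned above can be stated as follows (a standard formulation; the notation is supplied here, not taken from the slides):

```latex
% Gaifman condition: the probability of a quantified sentence is determined
% by its instances, where the c_i range over the constants of L(D) naming
% the elements of the domain.
P\bigl(\exists x\,\varphi(x)\bigr)
  \;=\; \sup_{n}\;\sup_{c_1,\dots,c_n}
        P\bigl(\varphi(c_1)\vee\dots\vee\varphi(c_n)\bigr)
```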

  13. Thus was inaugurated a new branch of logic ‘continuously’ extending the model theory of first order logic (it becomes a limiting case).

  14. Shortly afterwards Dana Scott and Peter Krauss extended Gaifman’s work in two important ways: they (i) considered language systems closer in expressive power to the σ-algebras of mathematical probability, and (ii) developed a corresponding proof theory.

  15. Unsurprisingly, just as with ‘ordinary’ logic, we shall see that (i) and (ii) pull in opposite directions.

  16. (i) Expressive power. The typical algebraic structure investigated in mathematical probability is a σ-algebra with a countably additive probability function defined on it.

  17. There is a well-known extension of first order logic (retaining finite quantification) that is closed under countable conjunctions and disjunctions. Such a language is called an Lω₁ω language. The classes of models of Lω₁ω sentences form a σ-complete Boolean algebra.

  18. So far so good.

  19. “.... L1,0 is the proper generalisation of L0,0 to denumerable formulas. In fact we have seen several reasons for claiming that L1,0 plays the same role for L0,0 that the theory of Borel sets and -fields plays for the ordinary field of sets.” Dana Scott.

  20. Interesting side-note: this infinitary language has much stronger definability properties than first order: a single Lω₁ω sentence defines the standard model of arithmetic, and another the class of finite sets (these facts don’t seem very relevant to probability considerations).
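To make the arithmetic claim concrete, the standard example of such a sentence (supplied here for illustration, not taken from the slides) conjoins the finitely many axioms of Robinson arithmetic Q with an infinitary clause saying every element is a numeral:

```latex
% A single L_{\omega_1\omega} sentence whose models are exactly the
% standard model of arithmetic (up to isomorphism): the axioms of Q plus
% the countable disjunction "x is some numeral".
\bigwedge \mathrm{Q} \;\wedge\;
  \forall x \bigvee_{n<\omega} \; x = \underbrace{S\cdots S}_{n}\,0
```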

  21. There is also a finitely-specifiable proof system relative to which each valid sentence is provable (though a rule of ∧-introduction means there are countably infinite proofs). There is no strong completeness theorem, or compactness theorem, which worried several people (Kreisel, Barwise) but doesn’t seem especially relevant here.

  22. Scott and Krauss extended Gaifman’s notion of a probability model to Lω₁ω: this is a complete Boolean algebra B, a set D with a relation Id mapping D×D into B (with Id(a,a) = 1 for all a in D), and a strictly positive countably additive probability function m on B. If m is two-valued, B is the Boolean algebra 2 and the model reduces to a classical relational structure.

  23. (ii) Proof theory. Because there are only two values in deductive logic they don’t have to be, and usually are not, represented in the primitive vocabulary. But it has been done: Smullyan developed an effective (r.e.) proof theory for first order languages with valued sentences: these have the form T A and F A, which we can suggestively represent as v(A) = 1, v(A) = 0. For obvious reasons no embedding is permitted.

  24. Things are more complicated here because probability assignments should be able to specify a variety of relationships among real numbers – e.g. those specifiable in some suitable language/theory T.

  25. Scott and Krauss developed a formal language for probability assertions. The atomic ones have the form (φ, A) where φ is a formula of L(T) with n free variables and A = (A1, ... , An) is a sequence of sentences of Lω₁ω. (φ, A) ‘says’ that the probabilities of A1, ... , An satisfy φ. So (Tarski’s T-schema!) (φ, A) is true in a probability model with probability function m just in case φ[m(A1), ... , m(An)] holds in the intended model of T.
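The Tarski-style truth clause for an atomic assertion (φ, A) can be rendered in a few lines of toy code (the names `holds`, `phi`, and the sample model are illustrative assumptions, not Scott–Krauss notation):

```python
# Toy rendering of an atomic probability assertion (phi, A): phi constrains
# the values m(A1), ..., m(An), and the assertion is true in a probability
# model m just in case phi holds of those values.

def holds(phi, sentences, m):
    """Truth of the assertion (phi, A) in the model with probability m."""
    return phi(*[m[s] for s in sentences])

m = {"A1": 0.6, "A2": 0.3}            # a toy probability model
phi = lambda x, y: x + y <= 1         # 'P(A1) + P(A2) <= 1'
print(holds(phi, ["A1", "A2"], m))    # True
```

Validity and consequence are then quantifications over all such models m, exactly as in the deductive case.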

  26. ‘Valid probability assertion’ and ‘valid consequence’ are defined completely analogously to the deductive case: ‘φ is valid’ = ‘φ is true in all models’, and ‘φ is a consequence of Σ’ = ‘every model of Σ is a model of φ’.

  27. Of course there is the question of T ...

  28. S and K, like several people since, chose the first order theory M of real closed fields. This is algebraic real number theory. It is complete and decidable. • The decidability is very important: it is needed to show that • (i) the set of valid probability assertions is effectively enumerable. • (ii) For the first order case ‘effectively’ = ‘recursively’.

  29. (i) and (ii) constitute a remarkable result. Note that ‘effectively enumerable’ in (i) means definable by a type of formula of set theory over a very large set, the set of hereditarily countable sets.

  30. On the other hand, as far as capturing ordinary probability theory is concerned this logic of probability assertions is still (relatively) weak: e.g. it is not a valid probability assertion that the probabilities of a countable set of mutually inconsistent sentences whose disjunction is a logical truth sum to 1. Ironic, since the whole point of having countable operations was to be able to say and prove such things.

  31. A proof theory for probabilities on an infinitary language which doesn’t suffer from such defects will require a language and theory in which one can talk and prove things about arbitrary sets of sentences, of real numbers, convergence of infinite sequences of reals, sums of such sequences and so forth.

  32. In that case set theory becomes the most plausible candidate for a ‘logic’ of probability assertions. Nothing in principle wrong with that: first order set theory is r.e. (if you think real logic has to be r.e.). … or you could go back to finitary languages.

  33. The second option seems a shame, and a cop-out in view of the desire for expressiveness that drove this enterprise.

  34. But the set theory option is not outrageous. And it offers a valuable free bonus: unification (methodologists tell us this is a virtue).

  35. Up to isomorphism a certain class of sets and a certain class of infinitary sentences are indistinguishable (A ↦ Mod(A)). It is fundamental to probability theory to respect this, via the rule: if A ≡ B then P(A) = P(B). Probabilities are essentially defined on propositions in a Boolean algebra.

  36. A Modest Proposal • So let’s use Occam’s scythe to cut out the sentences entirely and model the object-languages as arbitrary σ-algebras of sets (Fabio Cozman yesterday quietly switched from formulas to sets). Set theory thereby accommodates both a semantics and a proof theory (Kolmogorov axioms, or maybe de Finetti ones) for probability assignments to sets.

  37. Semantics. • A consistent assignment is one that can be extended to all members of the algebra in a way that satisfies the constraints imposed by the probability axioms. • Model: an extension to the whole algebra (cf. propositional logic). • An assignment v is a consequence of a set V of assignments if every model of V is a model of v.
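On a finite algebra these semantic notions can be made completely concrete: events are subsets of a finite sample space, a measure is fixed by its weights on the atoms, and a model of a partial assignment is a full additive extension. A minimal sketch, with the sample space and the particular constraints chosen purely for illustration:

```python
from itertools import chain, combinations
from fractions import Fraction

# Events are the subsets of a finite space; a probability is determined by
# weights on the atoms (singletons) and extended to every event additively.

space = [1, 2, 3]

def events(s):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def extend(atom_weights):
    """Extend weights on atoms to a probability on every event."""
    assert sum(atom_weights.values()) == 1   # Kolmogorov normalisation
    return {e: sum(atom_weights[x] for x in e) for e in events(space)}

# Partial assignment: P({1,2}) = 7/10 and P({2,3}) = 1/2.  It is consistent,
# i.e. has a model, because the atom weights 1/2, 1/5, 3/10 extend it.
P = extend({1: Fraction(1, 2), 2: Fraction(1, 5), 3: Fraction(3, 10)})
print(P[frozenset({1, 2})])   # 7/10
print(P[frozenset({2, 3})])   # 1/2
```

An inconsistent partial assignment is simply one for which no such atom-weight vector exists.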

  38. Note that with finite additivity (but not countable) we get absoluteness of consistency: a semantically consistent assignment can be extended to any including algebra. Cf. deductive logic.

  39. We could even splurge and use second order set theory. It’s ‘almost’ categorical.

  40. Either way, first or second order, we have incompleteness. Well-known fact of logic: expressive adequacy and axiomatisability can’t be simultaneously optimised. • Thank you and good night.

  41. Not quite. Back to de Finetti (‘I am alpha and omega’). Another de Finetti preoccupation now comes to the fore: • Countable versus finite additivity. With probabilities defined on finitary languages the question doesn’t arise. But on σ-algebras it definitely does, and must be answered.

  42. Other reasons against • de Finetti claimed that countable additivity adds information that a purely logical principle shouldn’t: e.g. • (i) Start with a uniform distribution over [0,1] and then learn that the truth is a rational number. With countable additivity the new distribution is very skewed: for any enumeration the probability must tend to 0; • (ii) the notion of a fair countably infinite lottery becomes inconsistent. It may be unrealistic but it is not inconsistent.
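A quick numerical gloss on (i) and (ii) (the particular weights below are illustrative assumptions): under countable additivity the weights over a countable set must sum to 1, so along any enumeration they tend to 0 and cannot be uniform, whereas a constant ‘fair lottery’ weight has divergent partial sums.

```python
from fractions import Fraction

# (i) A countably additive distribution over a countable set is skewed:
# e.g. geometric weights 1/2, 1/4, 1/8, ... sum to 1 and tend to 0.
geometric = [Fraction(1, 2 ** (i + 1)) for i in range(60)]
print(sum(geometric))    # 1 - 2^-60: the total tends to 1, the terms to 0

# (ii) A constant weight c > 0 on every ticket has partial sums c*N that
# grow without bound, so no countably additive fair lottery exists.
c = Fraction(1, 1000)    # a would-be uniform weight
print([int(c * n) for n in (10**3, 10**4, 10**5)])   # [1, 10, 100]
```

Merely finitely additive functions escape both conclusions, which is de Finetti's point.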

  43. Objections: 1. Non-countably additive probability functions on infinite algebras are nonconglomerable: for some countable partition {Bi: i > 0} you will have P(A|Bi) ∈ [a,b] for all i, for some a, b, while P(A) lies outside [a,b]. ‘The Paradox of Non-Conglomerability’. But no paradox: it is not a failure of ‘or’-introduction. (But you are able to ‘reason to a foregone conclusion’: Dubins)

  44. 2. Violation of countable additivity is Dutch Bookable. E.g. the countable fair lottery is Dutch Bookable. But Dutch Book arguments implicitly assume that if bets are individually acceptable then so are finite and countable sums of them, where these are defined. Obviously false in general.

  45. 3. Halpern: ‘There are no uniform probability functions in countable domains’. Ah, yes.
