
Possibility Theory and its applications: a retrospective and prospective view



D. Dubois, H. Prade IRIT-CNRS, Université Paul Sabatier 31062 TOULOUSE FRANCE

- Basic definitions
- Pioneers
- Qualitative possibility theory
- Quantitative possibility theory

- similar to probability theory because it is based on set-functions.
- differs by the use of a pair of dual set functions (possibility and necessity measures) instead of only one.
- it is not additive and makes sense on ordinal structures.

The name "Theory of Possibility" was coined by Zadeh in 1978

- Feasibility: It is possible to do something (physical)
- Plausibility: It is possible that something occurs (epistemic)
- Consistency: Compatible with what is known (logical)
- Permission: It is allowed to do something (deontic)

- S: frame of discernment (set of "states of the world")
- x : ill-known description of the current state of affairs taking its value on S
- L: Plausibility scale: totally ordered set of plausibility levels ([0,1], finite chain, integers,...)
- A possibility distribution πx attached to x is a mapping from S to L: ∀s, πx(s) ∈ L, such that ∃s, πx(s) = 1 (normalization)
- Conventions:
πx(s) = 0 iff x = s is impossible, totally excluded

πx(s) = 1 iff x = s is normal, fully plausible, unsurprising

- If I do not know the age of the president, I may have statistics on presidents' ages… but generally not, or they may be irrelevant.
- partial ignorance:
- 70 ≤ x ≤ 80 (sets, intervals)
a uniform possibility distribution

π(x) = 1 if x ∈ [70, 80]

= 0 otherwise

- partial ignorance with preferences: May have reasons to believe that 72 > 71 ≈ 73 > 70 ≈ 74 > 75 > 76 > 77

- Linguistic information described by fuzzy sets: “he is old”: π = µOLD
- If I bet on the president's age: I may come up with a subjective probability!
But this result is enforced by the setting of exchangeable bets (Dutch book argument). Actual information is often poorer.

- π' is more specific than π in the wide sense if and only if π' ≤ π
In other words: any value possible for π' should be at least as possible for π; that is, π' is more informative than π

- COMPLETE KNOWLEDGE : The most specific ones
- π(s0) = 1 ; π(s) = 0 otherwise
- IGNORANCE: π(s) = 1, ∀s ∈ S

- A possibility distribution on S (the normal values of x)
- an event A
How confident are we that x ∈ A ⊆ S ?

- Π(A) = max_{s ∈ A} π(s): the degree of possibility that x ∈ A
- N(A) = 1 – Π(Ac) = min_{s ∉ A} (1 – π(s)): the degree of certainty (necessity) that x ∈ A
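The two set functions above can be sketched in a few lines of Python. This is only an illustration on a finite set of states with levels in [0, 1]; the function names and the sample distribution are hypothetical and do not come from the slides.

```python
def possibility(pi, event):
    """Pi(A): the largest pi(s) over the states s in A (0 for the empty event)."""
    return max((pi[s] for s in event), default=0.0)

def necessity(pi, event):
    """N(A) = 1 - Pi(complement of A)."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(pi, complement)

# Hypothetical normalized distribution over four candidate ages.
pi = {70: 0.25, 71: 1.0, 72: 1.0, 73: 0.5}
A = {71, 72}
B = {72, 73}

print(possibility(pi, A), necessity(pi, A))
# Maxitivity: Pi(A u B) = max(Pi(A), Pi(B)).
print(possibility(pi, A | B) == max(possibility(pi, A), possibility(pi, B)))
```

With this distribution, A is fully possible and only somewhat certain, because the state 73 outside A keeps a positive possibility.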

- In this example, the available knowledge is modeled by π(x) = 1 if x ∈ [a, b], 0 otherwise.
- Proposition p = "x > α" to be checked
- i) a > α: then x > α is certainly true: N(x > α) = Π(x > α) = 1.
- ii) b < α: then x > α is certainly false; N(x > α) = Π(x > α) = 0.
- iii) a ≤ α ≤ b: then x > α is possibly true or false; N(x > α) = 0; Π(x > α) = 1.

Π(A) = to what extent at least one element in A is consistent with π (= possible)

N(A) = 1 – Π(Ac) = to what extent no element outside A is possible = to what extent π implies A

Π(A ∪ B) = max(Π(A), Π(B)); N(A ∩ B) = min(N(A), N(B)).

Mind that most of the time: Π(A ∩ B) < min(Π(A), Π(B)); N(A ∪ B) > max(N(A), N(B))

Corollary: N(A) > 0 ⇒ Π(A) = 1

- In the 1950s, G.L.S. Shackle called "degree of potential surprise" of an event its degree of impossibility.
- Potential surprise is valued on a disbelief scale, namely a positive interval of the form [0, y*], where y* denotes the absolute rejection of the event to which it is assigned.
- The degree of surprise of an event is the degree of surprise of its least surprising realization.
- He introduces a notion of conditional possibility

- In his 1973 book, the philosopher David Lewis considers a relation between possible worlds he calls "comparative possibility".
- He relates this concept of possibility to a notion of similarity between possible worlds for defining the truth conditions of counterfactual statements.
- for events A, B, C: A ≥ B ⇒ C ∪ A ≥ C ∪ B.
- The one and only ordinal counterpart to possibility measures

- The philosopher L. J. Cohen considered the problem of legal reasoning (1977).
- "Baconian probabilities" understood as degrees of provability.
- It is hard to prove someone guilty at the court of law by means of pure statistical arguments.
- A hypothesis and its negation cannot both have positive "provability"
- Such degrees of provability coincide with necessity measures.

- Zadeh (1978) proposed an interpretation of membership functions of fuzzy sets as possibility distributions encoding flexible constraints induced by natural language statements.
- relationship between possibility and probability: what is probable must preliminarily be possible.
- refers to the idea of graded feasibility ("degrees of ease") rather than to the epistemic notion of plausibility.
- the key axiom of "maxitivity" for possibility measures is highlighted (also for fuzzy events).

- Qualitative:
- comparative: A complete pre-ordering ≥π on U; a well-ordered partition of U: E1 > E2 > … > En
- absolute: πx(s) ∈ L = finite chain, complete lattice...

- Quantitative: πx(s) ∈ [0, 1], integers...
One must indicate where the numbers come from.

All theories agree on the fundamental maxitivity axiom: Π(A ∪ B) = max(Π(A), Π(B))

Theories diverge on the conditioning operation

- A Bayesian-like equation: Π(A ∩ B) = min(Π(B | A), Π(A)); Π(B | A) is the maximal solution to this equation.
Π(B | A) = 1 if A, B ≠ Ø and Π(A) = Π(A ∩ B) > 0; Π(B | A) = Π(A ∩ B) if Π(A) > Π(A ∩ B)

N(B | A) = 1 – Π(Bc | A)

• Independence: Π(B | A) = Π(B) implies Π(A ∩ B) = min(Π(A), Π(B))

Not the converse!!!!
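A minimal sketch of this min-based (qualitative) conditioning at the level of the distribution, assuming a finite set of states. The pointwise rule used here (states of A reaching Π(A) are raised to 1, other states of A keep their level, states outside A get 0) is the usual reading of the maximal solution. The function names and numbers are illustrative only.

```python
def possibility(pi, event):
    """Pi(A): the largest pi(s) over the states s in A (0 for the empty event)."""
    return max((pi[s] for s in event), default=0.0)

def condition_min(pi, A):
    """Min-based conditioning of a possibility distribution on the event A."""
    top = possibility(pi, A)
    if top == 0:
        raise ValueError("cannot condition on an impossible event")
    return {
        s: 0.0 if s not in A else (1.0 if pi[s] == top else pi[s])
        for s in pi
    }

# Hypothetical distribution; A excludes the most normal state s1.
pi = {"s1": 1.0, "s2": 0.5, "s3": 0.25, "s4": 0.25}
A = {"s2", "s3"}
B = {"s3", "s4"}

pi_given_A = condition_min(pi, A)
print(pi_given_A)

# The Bayesian-like equation Pi(A n B) = min(Pi(B | A), Pi(A)) holds.
print(possibility(pi, A & B) == min(possibility(pi_given_A, B), possibility(pi, A)))
```

Only the ordering of the levels matters here, so the same code works on any finite chain encoded by numbers.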

- The set of states of affairs is partitioned via π into a totally ordered set of clusters of equally plausible states
E1 (normal worlds) > E2 > ... > En+1 (impossible worlds)

- ASSUMPTION: the current situation is normal.
By default the state of affairs is in E1

- N(A) > 0 iff Π(A) > Π(Ac)
iff A is true in all the normal situations

Then, A is accepted as an expected truth

- Accepted events are closed under deduction

Π(B) ≥ Π(C) means « Comparing propositions on the basis of their most normal models »

- ASSUMPTION for computing Π(B): the current situation is the most normal where B is true.
- PLAUSIBLE REASONING = “reasoning as if the current situation were normal” and jumping to accepted conclusions obtained from the normality assumption.
- DIFFERENT FROM PROBABILISTIC REASONING BASED ON AVERAGING

• If B is learned to be true, then the normal situations become the most plausible ones in B, and the accepted beliefs are revised accordingly

- Accepting A in the context where B is true:
- Π(A ∩ B) > Π(Ac ∩ B) iff N(A | B) > 0 (conditioning)
• One may have N(A) > 0 and N(Ac | B) > 0:

non-monotony

Given a non-dogmatic possibility distribution π on S (π(s) > 0, ∀s)

Propositions A, and B

- A |=π B iff Π(A ∩ B) > Π(A ∩ Bc)
It means that B is true in the most plausible worlds where A is true

- This is a form of inference first proposed by Shoham in nonmonotonic reasoning

(in A)

- Pieces of knowledge like ∆ = {b → f, p → b, p → ¬f}
can be expressed by constraints

Π(b ∧ f) > Π(b ∧ ¬f)

Π(p ∧ b) > Π(p ∧ ¬b)

Π(p ∧ ¬f) > Π(p ∧ f)

- the minimally specific π* ranks normal situations first:
¬p ∧ b ∧ f, ¬p ∧ ¬b

- then abnormal situations: ¬f ∧ b
- Last, totally absurd situations: f ∧ p, ¬b ∧ p

→ = material implication

- Ranking of rules: b → f has less priority than the others according to π*: N*(p → ¬f) = N*(p → b) > N*(b → f)
- Possibilistic base:
K = {(b → f, α), (p → b, β), (p → ¬f, β)}, with α < β
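The ranking of situations and rules in this example can be reproduced with a small sketch. It assumes the tolerance-based stratification known from System Z-style default reasoning as a way to obtain the least specific ranking; the slides do not spell out an algorithm, so the procedure, the names and the integer ranks are assumptions made for illustration.

```python
from itertools import product

ATOMS = ("p", "b", "f")
# Each default "if a then c" is a pair (antecedent, consequent) of predicates on a world.
RULES = {
    "b -> f": (lambda w: w["b"], lambda w: w["f"]),
    "p -> b": (lambda w: w["p"], lambda w: w["b"]),
    "p -> ~f": (lambda w: w["p"], lambda w: not w["f"]),
}
WORLDS = [dict(zip(ATOMS, bits)) for bits in product((False, True), repeat=3)]

def violates(world, rule):
    antecedent, consequent = rule
    return antecedent(world) and not consequent(world)

def verifies(world, rule):
    antecedent, consequent = rule
    return antecedent(world) and consequent(world)

def stratify(rules):
    """Give each rule a level: 0 for the lowest priority, higher for more specific rules."""
    remaining, level, levels = dict(rules), 0, {}
    while remaining:
        # A rule is tolerated if some world verifies it and violates no remaining rule.
        tolerated = [
            name for name, rule in remaining.items()
            if any(
                verifies(w, rule) and not any(violates(w, r) for r in remaining.values())
                for w in WORLDS
            )
        ]
        if not tolerated:
            raise ValueError("inconsistent rule base")
        for name in tolerated:
            levels[name] = level
            del remaining[name]
        level += 1
    return levels

def rank(world, levels, rules):
    """0 for normal worlds; larger values mean a higher-priority rule is violated."""
    broken = [levels[name] + 1 for name, rule in rules.items() if violates(world, rule)]
    return max(broken, default=0)

levels = stratify(RULES)
print(levels)
for w in WORLDS:
    print(w, rank(w, levels, RULES))
```

Rank 0 corresponds to the normal situations, rank 1 to the abnormal ones (b ∧ ¬f), and rank 2 to the situations the slide calls totally absurd (p ∧ f, p ∧ ¬b).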

- Exception-tolerant Reasoning in rule bases
- Belief revision and inconsistency handling in deductive knowledge bases
- Handling priority in constraint-based reasoning
- Decision-making under uncertainty with qualitative criteria (scheduling)
- Abductive reasoning for diagnosis under poor causal knowledge (satellite faults, car engine test-benches)

- A set of states S;
- A set of consequences X.
- A decision = a mapping f from S to X
- f(s) is the consequence of decision f when the state is known to be s.
- Problem: rank-order the set of decisions in X^S when the state is ill-known and there is a utility function on X.
- This is Savage's framework.

- Uncertainty on states is possibilistic: a function π: S → L
L is a totally ordered plausibility scale

- Preference on consequences:
a qualitative utility function µ: X → U

- µ(x) = 0: totally rejected consequence
- µ(y) > µ(x): y preferred to x
- µ(x) = 1: preferred consequence

- Qualitative pessimistic utility (Whalen):
UPES(f) = min_{s ∈ S} max(n(π(s)), µ(f(s)))

where n is the order-reversing map of V

- Low utility : plausible state with bad consequences

- Qualitative optimistic utility (Yager):
UOPT(f) = max_{s ∈ S} min(π(s), µ(f(s)))

- High utility: plausible states with good consequences
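Both criteria are easy to evaluate on a finite example. The sketch below assumes the common scale is [0, 1] with the order-reversing map n(v) = 1 – v; the states, consequences, acts and utility levels are made up for illustration.

```python
def u_pes(pi, mu, act):
    """Pessimistic utility: min over states of max(1 - pi(s), mu(f(s)))."""
    return min(max(1.0 - pi[s], mu[act[s]]) for s in pi)

def u_opt(pi, mu, act):
    """Optimistic utility: max over states of min(pi(s), mu(f(s)))."""
    return max(min(pi[s], mu[act[s]]) for s in pi)

# Hypothetical problem: dry weather is normal, rain is somewhat possible.
pi = {"dry": 1.0, "rain": 0.5}
mu = {"wet": 0.0, "dry_but_loaded": 0.75, "dry_and_free": 1.0}

take_umbrella = {"dry": "dry_but_loaded", "rain": "dry_but_loaded"}
leave_umbrella = {"dry": "dry_and_free", "rain": "wet"}

for name, act in [("take", take_umbrella), ("leave", leave_umbrella)]:
    print(name, u_pes(pi, mu, act), u_opt(pi, mu, act))
```

In this example the pessimistic criterion prefers taking the umbrella, while the optimistic one prefers leaving it, which shows how the two criteria encode opposite attitudes toward the same uncertainty.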

- in fuzzy expert systems:
- µ = membership function of rule condition
- π = imprecision of input fact

- in fuzzy databases
- µ = membership function of query
- π = distribution of stored imprecise data

- in pattern recognition
- µ = membership function of attribute template
- π = distribution of an ill-known object attribute

- There exists a common scale V that contains both L and U, so that confidence and uncertainty levels can be compared.
- (certainty equivalent of a lottery)

- If only a subset E of plausible states is known
- π = µE (the characteristic function of E)
- UPES(f) = min_{s ∈ E} µ(f(s)) (utility of the worst consequence in E)
criterion of Wald under ignorance

- UOPT(f) = max_{s ∈ E} µ(f(s))

- xAy(s) = x if A occurs; = y if its complement Ac occurs
UPES(xAy) = median {µ(x), N(A), µ(y)}

- Interpretation: If the agent is sure enough of A, it is as if the consequence is x: UPES(f) = µF(x)
If he is not sure about A it is as if the consequence is y: UPES(f) = µF(y)

Otherwise, utility reflects certainty: UPES(f) = N(A)

- WITH UOPT(f): replace N(A) by Π(A)
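The median formula for the pessimistic criterion can be checked numerically. The sketch assumes a normalized π, the scale [0, 1] with n(v) = 1 – v, and that x is at least as good as y (µ(x) ≥ µ(y)), which is the case where the identity holds in general. All names and numbers are illustrative.

```python
from itertools import product
from statistics import median

def necessity(pi, A):
    """N(A) = 1 - max of pi over the states outside A."""
    return 1.0 - max((v for s, v in pi.items() if s not in A), default=0.0)

def u_pes_binary(pi, A, mu_x, mu_y):
    """Pessimistic utility of the act xAy: consequence x on A, y on the complement."""
    return min(max(1.0 - v, mu_x if s in A else mu_y) for s, v in pi.items())

pi = {"s1": 1.0, "s2": 0.75, "s3": 0.25}
A = {"s1", "s2"}
grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# Compare both sides for every pair of utility levels with mu(x) >= mu(y).
checks = [
    u_pes_binary(pi, A, mx, my) == median([mx, necessity(pi, A), my])
    for mx, my in product(grid, grid)
    if mx >= my
]
print(all(checks))
```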

- Suppose the preference relation ≥a on acts obeys the following properties:
- (X^S, ≥a) is a complete preorder.
- there are two acts such that f >a g.
- ∀A, ∀f, and x, y constant: x ≥a y ⇒ xAf ≥a yAf
- f >a h and g >a h imply f ∧ g >a h
- if x is constant, h >a x and h >a g imply h >a x ∨ g
then there exists a finite chain L, an L-valued necessity measure on S and an L-valued utility function u, such that ≥a is representable by the pessimistic possibilistic criterion UPES(f).

- Provides a foundation for possibility theory
- Possibility theory is justified by observing how a decision-maker ranks acts
- Applies to one-shot decisions (no compensations/ accumulation effects in repeated decision steps)
- Presupposes that consecutive qualitative value levels are distant from each other (negligibility effects)

- Membership functions of fuzzy sets
- Natural language descriptions pertaining to numerical universes (fuzzy numbers)
- Results of fuzzy clustering
Semantics: metrics, proximity to prototypes

- Upper probability bound
- Random experiments with imprecise outcomes
- Consonant approximations of convex probability sets
Semantics: frequentist, subjectivist (gambles)...

- Orders of magnitude of very small probabilities
degrees of impossibility k(A) ranging on integers: k(A) = n iff P(A) = ε^n

- Likelihood functions (P(A| x), where x varies) behave like possibility distributions
P(A| B) ≤ max_{x ∈ B} P(A| x)

- Given a numerical possibility distribution π, define P(π) = {Probabilities P | P(A) ≤ Π(A) for all A}
- Then, generally it holds that Π(A) = sup {P(A) | P ∈ P(π)} and N(A) = inf {P(A) | P ∈ P(π)}
- So π is a faithful representation of a family of probability measures.

Consider a nested family of sets E1 ⊆ E2 ⊆ … ⊆ En,

a set of positive numbers a1, …, an in [0, 1]

and the family of probability functions

P = {P | P(Ei) ≥ ai for all i}.

P is always representable by means of a possibility measure. Its possibility distribution is precisely

πx = min_i max(µEi, 1 – ai)
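As an illustration, the distribution induced by nested confidence sets can be computed directly from the formula. The universe, the sets and the bounds below are hypothetical.

```python
def possibility_from_nested_sets(universe, sets, bounds):
    """pi(s) = min over i of max(mu_Ei(s), 1 - a_i), with mu_Ei the indicator of Ei."""
    return {
        s: min(max(1.0 if s in E else 0.0, 1.0 - a) for E, a in zip(sets, bounds))
        for s in universe
    }

def necessity(pi, A):
    """N(A) = 1 - max of pi over the states outside A."""
    return 1.0 - max((v for s, v in pi.items() if s not in A), default=0.0)

# Hypothetical nested sets E1 in E2 in E3 with lower probability bounds a1 <= a2 <= a3.
universe = [1, 2, 3, 4, 5]
sets = [{3}, {2, 3, 4}, {1, 2, 3, 4, 5}]
bounds = [0.25, 0.75, 1.0]

pi = possibility_from_nested_sets(universe, sets, bounds)
print(pi)
# Each bound is recovered as the necessity of the corresponding set.
print([necessity(pi, E) for E in sets])
```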

- Let mi = πi – πi+1; then m1 + … + mn = 1
A basic probability assignment (SHAFER)

- π(s) = ∑_{i: s ∈ Ai} mi (one point-coverage function)
- Only in the consonant case can m be recalculated from π

- A Coxian axiom: Π(A ∩ C) = Π(A | C) * Π(C), with * = product
Then: Π(A | C) = Π(A ∩ C) / Π(C)

N(A | C) = 1 – Π(Ac | C)
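A short sketch of this product-based conditioning on a finite set of states, written at the level of the distribution (π(s | C) = π(s) / Π(C) inside C, 0 outside). Function names and numbers are illustrative assumptions.

```python
def possibility(pi, event):
    """Pi(A): the largest pi(s) over the states s in A (0 for the empty event)."""
    return max((pi[s] for s in event), default=0.0)

def condition_product(pi, C):
    """Product-based conditioning: rescale the distribution inside C by Pi(C)."""
    top = possibility(pi, C)
    if top == 0:
        raise ValueError("cannot condition on an impossible event")
    return {s: (pi[s] / top if s in C else 0.0) for s in pi}

pi = {"s1": 1.0, "s2": 0.5, "s3": 0.25}
C = {"s2", "s3"}
A = {"s3"}

pi_given_C = condition_product(pi, C)
print(pi_given_C)

# Coxian axiom: Pi(A n C) = Pi(A | C) * Pi(C).
print(possibility(pi, A & C) == possibility(pi_given_C, A) * possibility(pi, C))
# N(A | C) = 1 - Pi(Ac | C).
print(1.0 - possibility(pi_given_C, set(pi) - A))
```

Unlike the min-based rule, every state of C is rescaled here, so the numerical ratios between possibility degrees inside C are preserved.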

Dempster rule of conditioning (preserves σ-maxitivity)

For the revision of possibility distributions: minimal change of π when N(C) = 1.

It improves the state of information (reduction of focal elements)

Π(A |b C) = sup{P(A|C), P ≤ Π, P(C) > 0}

N(A |b C) = inf{P(A|C), P ≤ Π, P(C) > 0}

It is still a possibility measure: π(s |b C) = π(s) max(1, 1/(π(s) + N(C)))

It can be shown that:

Π(A |b C) = Π(A ∩ C) / (Π(A ∩ C) + N(Ac ∩ C))

N(A |b C) = N(A ∩ C) / (N(A ∩ C) + Π(Ac ∩ C))

= 1 – Π(Ac |b C)

For inference from generic knowledge based on observations

- Why ?
- fusion of heterogeneous data
- decision-making : betting according to a possibility distribution leads to probability.
- Extraction of a representative value
- Simplified non-parametric imprecise probabilistic models

Elementary forms of probability-possibility transformations have existed for a long time

- POSS → PROB: Laplace indifference principle “All that is equipossible is equiprobable” = changing a uniform possibility distribution into a uniform probability distribution
- PROB → POSS: Confidence intervals. Replacing a probability distribution by an interval A with a confidence level c.
- It defines a possibility distribution
- π(x) = 1 if x ∈ A,
= 1 – c if x ∉ A

- Possibility-probability consistency: P ≤ Π
- Preserving the ordering of events: P(A) ≥ P(B) ⇔ Π(A) ≥ Π(B), or elementary events only: π(x) > π(x') if and only if p(x) > p(x') (order preservation)
- Informational criteria:
from Π to P: preservation of symmetries

(Shapley value rather than maximal entropy)

from P to Π: optimize information content

(maximization or minimisation of specificity)

- Rationale : given a probability p, try and preserve as much information as possible
- Select a most specific element of the set PI(P) = {Π: Π ≥ P} of possibility measures dominating P such that π(x) > π(x') iff p(x) > p(x')
- may be weakened into: p(x) > p(x') implies π(x) > π(x')
- The result is πi = ∑_{j=i,…,n} pj, with p1 > p2 > … > pn
(case of no ties)
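For a finite probability distribution without ties, the transformation above amounts to a cumulative sum taken from the least probable element upward. The sketch below is one way to write it; the function names and the sample distribution are illustrative.

```python
from itertools import combinations

def prob_to_poss(p):
    """pi(x) = sum of p(y) over all y with p(y) <= p(x); assumes no ties in p."""
    order = sorted(p, key=p.get, reverse=True)
    pi, tail = {}, 0.0
    for x in reversed(order):  # start from the least probable element
        tail += p[x]
        pi[x] = tail
    return pi

def dominates(pi, p):
    """Check Pi(A) >= P(A) for every non-empty event A."""
    items = list(p)
    return all(
        max(pi[x] for x in A) >= sum(p[x] for x in A)
        for r in range(1, len(items) + 1)
        for A in combinations(items, r)
    )

p = {"a": 0.5, "b": 0.375, "c": 0.125}
pi = prob_to_poss(p)
print(pi)
print(dominates(pi, p))
```

The most probable element always receives possibility 1, and the ordering of the elements is the same under p and under π.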

- The possibility distribution π obtained by transforming p then encodes the family of confidence intervals around the mode of p.
- The α-cut of π is the (1 – α)-confidence interval of p
- The optimal symmetric transform of the uniform probability distribution is the triangular fuzzy number
- The symmetric triangular fuzzy number (STFN) is a covering approximation of any probability with unimodal symmetric density p with the same mode.
- In other words the α-cut of a STFN contains the (1 – α)-confidence interval of any such p.

- IL = {x, p(x) ≥ β} = [aL, aL + L] is the interval of length L with maximal probability
- The most specific possibility distribution dominating p is π such that ∀L > 0, π(aL) = π(aL + L) = 1 – P(IL).


- Chebyshev inequality defines a possibility distribution that dominates any density with given mean and variance.
- The symmetric triangular fuzzy number (STFN) defines a possibility distribution that optimally dominates any symmetric density with given mode and bounded support.

- Idea (Kaufmann, Yager, Chanas):
- Pick a number α in [0, 1] at random
- Pick an element at random in the α-cut of π.
a generalized Laplacean indifference principle: change α-cuts into uniform probability distributions.
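On a finite set the two-step random experiment above has a closed form: the mass between two consecutive possibility levels is shared equally among the elements of the corresponding cut. A minimal sketch, with illustrative names and numbers:

```python
def poss_to_prob(pi):
    """Spread each level difference uniformly over the matching alpha-cut."""
    order = sorted(pi, key=pi.get, reverse=True)
    levels = [pi[x] for x in order] + [0.0]
    p = dict.fromkeys(order, 0.0)
    for j in range(len(order)):
        # Probability that alpha falls between the (j+2)-th and (j+1)-th level.
        mass = levels[j] - levels[j + 1]
        # For such alpha, the cut contains the j + 1 most possible elements.
        for x in order[: j + 1]:
            p[x] += mass / (j + 1)
    return p

pi = {"a": 1.0, "b": 0.5, "c": 0.25, "d": 0.25}
p = poss_to_prob(pi)
print(p)

# A uniform possibility distribution becomes a uniform probability distribution.
print(poss_to_prob({"x": 1.0, "y": 1.0}))
```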

- Rationale : minimise arbitrariness by preserving the symmetry properties of the representation.

- The centre of gravity of the polyhedron P(π)
- The pignistic transformation of belief functions (Smets)
- The Shapley value of the unanimity game N in game theory.

- Starting point : exploit the betting approach to subjective probability
- A critique: The agent is forced to be additive by the rules of exchangeable bets.
- For instance, the agent provides a uniform probability distribution on a finite set whether (s)he knows nothing about the concerned phenomenon, or if (s)he knows the concerned phenomenon is purely random.

- Idea : It is assumed that a subjective probability supplied by an agent is only a trace of the agent's belief.

- Assumption 1: Beliefs can be modelled by belief functions
- (masses m(A) summing to 1 assigned to subsets A).

- Assumption 2: The agent uses a probability function induced by his or her beliefs, using the pignistic transformation (Smets, 1990) or Shapley value.
- Method : reconstruct the underlying belief function from the probability provided by the agent by choosing among the isopignistic ones.

- There are clearly several belief functions with a prescribed Shapley value.

I(m) = ∑_A m(A) card(A).

- The least specific belief function in the sense of maximizing I(m) is characterized by
- πi = ∑_{j=1,…,n} min(pj, pi).
- It is a probability-possibility transformation, previously suggested in 1983: This is the unique possibility distribution whose Shapley value is p.
- It gives results that are less specific than the confidence interval approach to objective probability.
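The formula πi = ∑j min(pj, pi) and its round-trip property can be checked on a small example. The sketch includes a compact version of the α-cut based possibility-to-probability transformation so that it is self-contained; names and numbers are illustrative.

```python
def subjective_prob_to_poss(p):
    """pi(x) = sum over y of min(p(y), p(x))."""
    return {x: sum(min(p[y], p[x]) for y in p) for x in p}

def poss_to_prob(pi):
    """Share each level difference equally among the elements of the matching cut."""
    order = sorted(pi, key=pi.get, reverse=True)
    levels = [pi[x] for x in order] + [0.0]
    p = dict.fromkeys(order, 0.0)
    for j in range(len(order)):
        for x in order[: j + 1]:
            p[x] += (levels[j] - levels[j + 1]) / (j + 1)
    return p

p = {"a": 0.5, "b": 0.375, "c": 0.125}
pi = subjective_prob_to_poss(p)
print(pi)
# Transforming back recovers the probability supplied by the agent.
print(poss_to_prob(pi))
```

For the same p, the confidence-interval based transformation πi = ∑_{j=i,…,n} pj yields lower possibility degrees for the elements other than the mode, in line with the remark that the subjective transformation is less specific.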

- Representing incomplete probabilistic data for uncertainty propagation in computations
- (but fuzzy interval analysis based on the extension principle differs from conservative probabilistic risk analysis)
- Systematizing some statistical methods (confidence intervals, likelihood functions, probabilistic inequalities)
- Defuzzification based on Choquet integral (linear with fuzzy number addition)

- Uncertain reasoning : Possibilistic nets are a counterpart to Bayesian nets that copes with incomplete data. Similar algorithmic properties under Dempster conditioning (Kruse team)
- Data fusion: well suited for merging heterogeneous information on numerical data (linguistic, statistics, confidence intervals) (Bloch)
- Risk analysis : uncertainty propagation using fuzzy arithmetics, and random interval arithmetics when statistical data is incomplete (Lodwick, Ferson)
- Non-parametric conservative modelling of imprecision in measurements (Mauris)

Quantitative possibility is not as well understood as probability theory.

- Objective vs. subjective possibility (à la De Finetti)
- How to use possibilistic conditioning in inference tasks ?
- Bridge the gap with statistics and the confidence interval literature (Fisher, likelihood reasoning)
- Higher-order modes of fuzzy intervals (variance, …) and links with fuzzy random variables
- Quantitative possibilistic expectations : decision-theoretic characterisation ?

- Possibility theory is a simple and versatile tool for modeling uncertainty
- A unifying framework for modeling and merging linguistic knowledge and statistical data
- Useful to account for missing information in reasoning tasks and risk analysis
- A bridge between logic-based AI and probabilistic reasoning

- A |=π A if A ≠ Ø (restricted reflexivity)
- if A ≠ Ø, then A |=πØ never holds (consistency preservation)
- The set {B: A |=π B} is deductively closed
- If A ⊆ B and C |=π A then C |=π B

(right weakening rule RW)

- If A |=π B and A |=π C then A |=π B ∩ C

(Right AND)

- If A |=π C ; B |=π C then A ∪ B |=π C (Left OR)
- If A |=π B and A ∩ B |=π C then A |=π C
(cut, weak transitivity)

(But if A normally implies B which normally implies C, then A may not imply C)

- If A |=π B and if A |=π Cc is false, then A ∩ C |=π B (rational monotony RM)
If B is normally expected when A holds, then B is expected to hold when both A and C hold, unless it is that A normally implies not C

- Let |= be a consequence relation on 2^S × 2^S
- Define an induced partial relation on subsets as
A > B iff A ∪ B |= Bc for A ≠ Ø

- Theorem: If |= satisfies restricted reflexivity, right weakening, rational monotony, Right AND and Left OR, then A > B is the strict part of a possibility relation on events.
So a consequence relation satisfying the above properties is representable by possibilistic inference, and induces a complete plausibility preordering on the states.

- A generic rule « if A then B » is modelled by Π(A ∩ B) > Π(A ∩ Bc).
- This is a constraint that delimits a set of possibility
- distributions on the set of interpretations of the language

- ∆ = {Ai → Bi, i = 1,…,n}
- ∆ defines a set of constraints on possibility distributions: Π(Ai ∧ Bi) > Π(Ai ∧ ¬Bi), i = 1,…,n
- Π(∆) = set of feasible π's with respect to ∆
- One may compute π*: the least specific possibility distribution in Π(∆)

What « ∆ implies A → B » means

- Cautious inference
∆ |= A → B iff

for all Π ∈ Π(∆), Π(A ∩ B) > Π(A ∩ Bc).

- Possibilistic inference
∆ |=* A → B iff Π*(A ∩ B) > Π*(A ∩ Bc) for the least specific possibility measure Π* in Π(∆).

Leads to a stratification of ∆ according to N*(Ac ∪ B)

- A possibilistic knowledge base is an ordered set of propositional or 1st order formulas pi
- K = {(pi, αi), i = 1,…,n} where αi > 0 is the level of priority or validity of pi
αi = 1 means certainty.

αi = 0 means ignorance

- Captures the idea of uncertain knowledge in an ordinal setting

- Axiomatization:
All axioms of classical logic with weight 1

Weighted modus ponens: {(p, α), (¬p ∨ q, β)} |- (q, min(α, β))

OLD! Goes back to Aristotle's school

Idea: the validity of a chain of uncertain deductions is the validity of its weakest link

Syntactic inference K |- (p, α) is well-defined

- Inconsistency becomes a graded notion: inc(K) = sup{α, K |- (⊥, α)}
- Refutation and resolution methods extend: K |- (p, α) iff K ∪ {(¬p, 1)} |- (⊥, α)
- Inference with a partially inconsistent knowledge base becomes non-trivial and nonmonotonic: K |-nt p iff K |- (p, α) and α > inc(K)

- A weighted formula has a fuzzy set of models.
- If A = [p] is the set of models of p (subset of S),
- |- (p, α) means N(A) ≥ α
The least specific possibility distribution induced by |- (p, α) is:

π(p, α)(s) = max(µA(s), 1 – α)

= 1 if p is true in state s

= 1 – α if p is false in state s

- The fuzzy set of models of K is the intersection of the fuzzy sets of models of {(pi, αi), i = 1,…,n}
- πK(s) = min_{i=1,…,n} {1 – αi | s ∉ [pi]}
determined by the highest priority formula violated by s

- The p. d. πK is the least informed state of partial knowledge compatible with K
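The semantics of a weighted base can be illustrated on the three-rule example with propositional atoms p, b, f. The sketch below is only an illustration: the numerical weights 0.5 and 0.75 are hypothetical stand-ins for two levels α < β, the function names are made up, and the inconsistency degree is computed through its usual semantic reading 1 – max_s πK(s), which is assumed here rather than taken from the slides.

```python
from itertools import product

ATOMS = ("p", "b", "f")
WORLDS = [dict(zip(ATOMS, bits)) for bits in product((False, True), repeat=3)]

def pi_K(world, base):
    """1 - weight of the highest-priority formula violated by the world (1 if none)."""
    return min((1.0 - a for formula, a in base if not formula(world)), default=1.0)

def necessity(base, formula):
    """N([formula]) = 1 - max of pi_K over the worlds where the formula is false."""
    return 1.0 - max((pi_K(w, base) for w in WORLDS if not formula(w)), default=0.0)

def inconsistency(base):
    """Assumed semantic reading of inc(K): 1 - height of pi_K."""
    return 1.0 - max(pi_K(w, base) for w in WORLDS)

ALPHA, BETA = 0.5, 0.75  # hypothetical weights with ALPHA < BETA
K = [
    (lambda w: (not w["b"]) or w["f"], ALPHA),        # b -> f
    (lambda w: (not w["p"]) or w["b"], BETA),         # p -> b
    (lambda w: (not w["p"]) or (not w["f"]), BETA),   # p -> not f
]
K_b = K + [(lambda w: w["b"], 1.0)]  # add the sure fact b
K_p = K + [(lambda w: w["p"], 1.0)]  # add the sure fact p

print(necessity(K_b, lambda w: w["f"]), inconsistency(K_b))
print(necessity(K_p, lambda w: not w["f"]), inconsistency(K_p))
```

With the fact b, the base stays consistent and f is entailed at level ALPHA. With the fact p, the base becomes inconsistent at level ALPHA, yet ¬f is still entailed at a strictly higher level, which is the non-trivial inference under inconsistency.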

- Monotonic semantic entailment follows Zadeh’s entailment principle: K |= (p, α) stands for πK ≤ π(p, α)
Theorem: K |- (p, α) iff K |= (p, α)

- For the non-trivial inference under inconsistency: {(p, 1)} ∪ K |-nt q iff Π(q ∧ p) > Π(¬q ∧ p)

Possibilistic logic

Formulas are Boolean

Truth is 2-valued

Weighted formulas have fuzzy sets of models

Validity is many-valued

degrees of validity are not compositional except for conjunctions

Represents uncertainty

Fuzzy logic (Pavelka)

Formulas are non-Boolean

Truth is many-valued

Weighted formulas have crisp sets of models (cuts)

Validity is Boolean

degrees of truth are compositional

represents real functions by means of logical formulas

• K = {b → f, p → b, p → ¬f}

→ = material implication

- K ∪ {b} |- f; K ∪ {p} |- contradiction

K = {(b → f, α), (p → b, β), (p → ¬f, β)}, with α < β

then K ∪ {(b, 1)} |- (f, α) and K ∪ {(b, 1)} |-nt f