
# Axioms and Algorithms for Inferences Involving Probabilistic Independence - PowerPoint PPT Presentation

Axioms and Algorithms for Inferences Involving Probabilistic Independence. Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991, 128-141. Presentation by Guy Moses & Omer Weissbrod


### Axioms and Algorithms for Inferences Involving Probabilistic Independence

Dan Geiger, Azaria Paz, and Judea Pearl,

Information and Computation 91(1), March 1991, 128-141.

Presentation by Guy Moses & Omer Weissbrod

for the course 236372 - Bayesian Networks, Computer Science Faculty, Technion – winter 2009

partially based on the presentation by Ilan Gronau

• Introduction – some definitions, notations and reminders.

• Proof of Completeness – “if it’s true, it can be proved”.

• Preparations for the Membership Algorithm – more definitions, and some theoretical groundwork.

• The Membership Algorithm – description, proof of correctness, complexity analysis.

### Introduction

• U (Universe) – set of random variables with probability distribution P.

• X, Y – finite sets of random variables: X = {x1,…,xn}, Y = {y1,…,ym}

• P(X,Y) = P(X)·P(Y) – a short-hand notation for the equality: Pr{x1=a1,…, xn=an, y1=b1, …, ym=bm} = Pr{x1=a1,…, xn=an} · Pr{y1=b1, …, ym=bm}

for every choice of a1, …, an, b1, …, bm

• (X,Y) – short-hand for P(X,Y) = P(X)·P(Y)

This is called an independence statement.

*Note that X and Y are disjoint sets of variables (X ∩ Y = ∅).

• σ – a specific independence statement of the form (X,Y)

• Σ – a set of independence statements of the form (Xi,Yi), i = 1, …, k

• XY – short-hand notation for the union X ∪ Y

• P satisfies σ = (X,Y) means: P(X,Y) = P(X)·P(Y) for that specific P.
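To make the definition concrete, here is a minimal sketch (not from the slides) that checks whether a finite joint distribution satisfies (X,Y). The representation, a dict from value tuples to exact `Fraction` probabilities, and the helper names `marginal` and `satisfies` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def marginal(joint, names, variables):
    """Marginalize a joint pmf {value tuple: prob} onto `variables`."""
    idx = [names.index(v) for v in variables]
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def satisfies(joint, names, X, Y):
    """True iff P(X,Y) = P(X)·P(Y) for every choice of values (hypothetical helper)."""
    X, Y = list(X), list(Y)
    pxy = marginal(joint, names, X + Y)
    px, py = marginal(joint, names, X), marginal(joint, names, Y)
    # Value combinations absent from pxy have joint probability 0.
    return all(pxy.get(kx + ky, Fraction(0)) == p1 * p2
               for kx, p1 in px.items() for ky, p2 in py.items())


# Example: a and b are fair coins and c = a XOR b.
names = ("a", "b", "c")
joint = {(a, b, a ^ b): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}
print(satisfies(joint, names, ["a"], ["b"]))       # True
print(satisfies(joint, names, ["a", "b"], ["c"]))  # False
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding.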

Definitions:

• Σ ⊨ σ iff every distribution that satisfies Σ also satisfies σ.

• Σ ⊢ σ iff σ ∈ cl(Σ), i.e. there exists a derivation chain σ1, …, σn = σ such that for each j, either σj ∈ Σ or σj is derived by an axiom from the previous statements.

For a set of axioms A:

Soundness: A is sound iff for every Σ and σ: Σ ⊢ σ ⇒ Σ ⊨ σ

Completeness: A is complete iff for every Σ and σ: Σ ⊨ σ ⇒ Σ ⊢ σ

Completeness – alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.

We saw (in the 1st lecture) that axioms 1a-1d are sound (they always infer correctly).

Today we’ll show that they are complete (they can derive every true statement).
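The four axioms are not restated in this section. Assuming their standard forms (1a trivial independence (X,∅); 1b symmetry (X,Y) ⇒ (Y,X); 1c decomposition (X,YW) ⇒ (X,Y); 1d mixing (X,Y) ∧ (XY,W) ⇒ (X,YW)), a minimal sketch of them as operations on statements could look like this. Statements are pairs of frozensets, and all function names are hypothetical.

```python
def stmt(x, y):
    """An independence statement (X, Y); single-character variable names."""
    return (frozenset(x), frozenset(y))


def trivial(x):                       # 1a: (X, ∅)
    return stmt(x, "")


def symmetry(s):                      # 1b: (X,Y) ⇒ (Y,X)
    x, y = s
    return (y, x)


def decomposition(s, keep):           # 1c: (X,YW) ⇒ (X,Y) with Y = keep
    x, yw = s
    keep = frozenset(keep)
    if not keep <= yw:
        raise ValueError("Y must be a subset of the right-hand side")
    return (x, keep)


def mixing(s1, s2):                   # 1d: (X,Y) ∧ (XY,W) ⇒ (X,YW)
    (x, y), (xy, w) = s1, s2
    if xy != x | y:
        raise ValueError("left side of the second premise must be X ∪ Y")
    return (x, y | w)


# (a,b) and (ab,c) yield (a,bc); decomposition then gives (a,c).
s = mixing(stmt("a", "b"), stmt("ab", "c"))
print(s == stmt("a", "bc"))                      # True
print(decomposition(s, "c") == stmt("a", "c"))   # True
```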

### Proof of Completeness

• Definition: σ = (X,Y) ∉ cl(Σ) is minimal if for every non-empty X’, Y’ such that X’ ⊆ X, Y’ ⊆ Y, X’Y’ ≠ XY, we have (X’,Y’) ∈ cl(Σ).

• For every σ = (X,Y) ∉ cl(Σ) we can find an appropriate minimal σ’ = (X’,Y’) ∉ cl(Σ) through iterative decomposition.

• Observation: P satisfies σ ⇒ P satisfies σ’ (decomposition soundness).

Therefore: P doesn’t satisfy σ’ ⇒ P doesn’t satisfy σ.

• Our plan: given an arbitrary σ ∉ cl(Σ), we will find a distribution P that satisfies cl(Σ) but doesn’t satisfy σ’. This will prove completeness (using the alternative completeness definition and the observation above).

• To simplify notation, we will assume WLOG that σ = (X,Y) is already minimal.
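The iterative decomposition step can be sketched as follows, assuming some membership oracle `derivable(X, Y)` for cl(Σ) (a hypothetical parameter; any correct membership test would do). It repeatedly drops one variable while the result stays outside cl(Σ). When no single removal works, every proper sub-statement is contained in some derivable single-removal statement, so it is derivable by decomposition and symmetry, and the result is minimal.

```python
def minimize(x, y, derivable):
    """Shrink a non-derivable statement (X,Y) to a minimal one.

    `derivable(X, Y)` is a hypothetical oracle for membership in cl(Σ).
    """
    x, y = set(x), set(y)
    if derivable(frozenset(x), frozenset(y)):
        raise ValueError("(X, Y) must not be in cl(Σ)")
    changed = True
    while changed:
        changed = False
        for v in sorted(x | y):
            x2, y2 = x - {v}, y - {v}
            # Dropping v is a decomposition step; both sides must stay non-empty.
            if x2 and y2 and not derivable(frozenset(x2), frozenset(y2)):
                x, y, changed = x2, y2, True
                break
    return frozenset(x), frozenset(y)


def toy_derivable(X, Y):
    """Toy oracle for illustration only: statements over at most 2 variables count as derivable."""
    return len(X) + len(Y) <= 2


print(minimize("ab", "cd", toy_derivable))
```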

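Completeness Proof

Let σ = (X,Y) ∉ cl(Σ) be a minimal statement where:

X = {x1,…,xn}, Y = {y1,…,ym}, and Z = {z1,z2,…,zk} stand for the rest of the variables in U.

We will construct P as follows: all variables except x1 are independent fair coins (probability ½ for each of their two values).

x1 is defined thus:

$$x_1 = (x_2 + \dots + x_n + y_1 + \dots + y_m) \bmod 2$$

Consequently, every proper subset of X ∪ Y (in particular X and Y themselves) consists of independent fair coins.

Part 1: P does not satisfy σ

We will inspect the following scenario: x1 = 1, all other variables are 0. This assignment violates the definition of x1, so:

$$P(x_1{=}1, x_2{=}0, \dots, x_n{=}0, y_1{=}0, \dots, y_m{=}0) = 0 \neq 0.5^n \cdot 0.5^m = P(x_1{=}1, x_2{=}0, \dots, x_n{=}0)\cdot P(y_1{=}0, \dots, y_m{=}0)$$

Therefore, P does not satisfy σ, as required.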

Part 2: P satisfies cl()

Let(V,W)  cl(). We will show thatP(V,W)=P(V)·P(W). This is done by inspecting different scenarios:

Scenario 1: either V or W contains only elements of Z. We will assume WLOG that W contains only elements of Z.

All variables in Z are independent fair coins, independent of each other and of X ∪ Y under P, and therefore:

$$P(V,W) = P(V)\cdot P(W)$$

Part 2: P satisfies cl()

Let(V,W)  cl(). We will show thatP(V,W)=P(V)·P(W). This is done by inspecting different scenarios:

Scenario 2: Both V and W contain elements of X ∪ Y, but V ∪ W doesn’t contain all elements of X ∪ Y.

Without full information about the assignments of the variables in X ∪ Y, x1 could turn out to be 0 or 1 with probability ½. Hence all variables of V ∪ W are independent fair coins, and therefore, for every assignment:

$$P(V,W) = 0.5^{|V|+|W|} = 0.5^{|V|}\cdot 0.5^{|W|} = P(V)\cdot P(W)$$
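Scenarios 1 and 2 can be sanity-checked by brute force on a small instance. This sketch assumes the same parity construction and encodes variables as integers: variable 0 plays x1, variables 0..n_xy-1 form X ∪ Y, and the rest form Z. It checks every disjoint pair (V,W) that falls outside Scenario 3. All names are hypothetical, and this is an illustration on one instance, not a substitute for the proof.

```python
from fractions import Fraction
from itertools import combinations, product


def parity_joint(n_xy, n_z):
    """All variables but 0 are independent fair coins; variable 0 is the
    parity of variables 1..n_xy-1."""
    free = n_xy - 1 + n_z
    joint = {}
    for vals in product((0, 1), repeat=free):
        joint[(sum(vals[:n_xy - 1]) % 2,) + vals] = Fraction(1, 2 ** free)
    return joint


def marginal(joint, idx):
    out = {}
    for a, p in joint.items():
        key = tuple(a[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def independent(joint, V, W):
    pvw, pv, pw = marginal(joint, V + W), marginal(joint, V), marginal(joint, W)
    return all(pvw.get(kv + kw, Fraction(0)) == p1 * p2
               for kv, p1 in pv.items() for kw, p2 in pw.items())


def scenarios_1_and_2_hold(n_xy, n_z):
    """Check every disjoint pair (V, W) that falls under Scenario 1 or 2."""
    joint = parity_joint(n_xy, n_z)
    xy = set(range(n_xy))
    for r in range(2, n_xy + n_z + 1):
        for U in combinations(range(n_xy + n_z), r):
            for s in range(1, r):
                for V in combinations(U, s):
                    W = [u for u in U if u not in V]
                    both_touch = bool(xy & set(V)) and bool(xy & set(W))
                    if both_touch and xy <= set(U):
                        continue          # Scenario 3: handled by the derivation argument
                    if not independent(joint, list(V), W):
                        return False
    return True


print(scenarios_1_and_2_hold(3, 2))                   # True
print(independent(parity_joint(3, 2), [0], [1, 2]))   # False
```

The second call is a Scenario 3 pair that covers all of X ∪ Y, where independence indeed fails.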

Completeness Proof – cont’d

Part 2: P satisfies cl()- continued

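Scenario 3: Both V and W contain elements of X ∪ Y, and (X ∪ Y) ⊆ (V ∪ W).

We will show a derivation chain for σ = (X,Y), contradicting our original assumption that σ ∉ cl(Σ):

Mark:

$$(V,W) = (X_V Y_V Z_V,\ X_W Y_W Z_W) \in cl(\Sigma)$$

where X = X_V ∪ X_W, Y = Y_V ∪ Y_W, Z_V ∪ Z_W ⊆ Z, V = X_V Y_V Z_V, W = X_W Y_W Z_W.

Remove all z’s by decomposition:

$$(X_V Y_V,\ X_W Y_W) \in cl(\Sigma)$$

Due to minimality of σ = (X,Y): (X_V, Y_V) ∈ cl(Σ) and (X_W, Y) ∈ cl(Σ). Mixing (together with symmetry) then gives:

$$
\begin{aligned}
(X_V, Y_V) \wedge (X_V Y_V,\ X_W Y_W) &\Rightarrow (X_V,\ Y_V X_W Y_W) = (X_V,\ X_W Y)\\
(X_W, Y) \wedge (X_W Y,\ X_V) &\Rightarrow (Y,\ X_V X_W) = (Y, X) \Rightarrow \sigma
\end{aligned}
$$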


Reminder: Completeness – alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.

We’ve shown: given a minimal σ ∉ cl(Σ), there exists a distribution P that obeys:

• P does not satisfy σ.

• P satisfies cl(Σ).

Given a non-minimal σ ∉ cl(Σ), we derive its minimal statement σ’ and devise a distribution P’ that satisfies cl(Σ) but does not satisfy σ’. Due to soundness of decomposition, P’ cannot satisfy σ either.

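Scope of Completeness

[Figure: the classes of binary, discrete and normal p.d.’s]

The proof uses P, a binary p.d. (probability distribution function), therefore:

• P is also a discrete p.d., so the axioms 1a-1d are complete for binary p.d.’s, and for discrete p.d.’s as well.

However, for normal p.d.’s, the axiom set 1a-1d is not complete.

A stronger axiom is required:

replace mixing: (X,Y) ∧ (XY,W) ⇒ (X,YW)

with composition: (X,Y) ∧ (X,W) ⇒ (X,YW)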
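One way to see why the two axioms differ is the sketch below. It assumes composition is the axiom (X,Y) ∧ (X,W) ⇒ (X,YW). On the binary p.d. in the code, both premises of composition hold but its conclusion fails, so composition is not sound for binary p.d.’s and cannot be derivable from the sound axioms 1a-1d. For jointly normal variables, by contrast, independence amounts to a zero cross-covariance block, and composition follows directly. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def marginal(joint, idx):
    out = {}
    for a, p in joint.items():
        key = tuple(a[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def independent(joint, V, W):
    """P(V,W) = P(V)·P(W) for every value combination."""
    pvw, pv, pw = marginal(joint, V + W), marginal(joint, V), marginal(joint, W)
    return all(pvw.get(kv + kw, Fraction(0)) == p1 * p2
               for kv, p1 in pv.items() for kw, p2 in pw.items())


# Binary p.d.: variables 0 and 1 are fair coins, variable 2 is their XOR.
xor = {(a, b, a ^ b): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}

premise_1 = independent(xor, [0], [1])        # (X,Y)
premise_2 = independent(xor, [0], [2])        # (X,W)
conclusion = independent(xor, [0], [1, 2])    # (X,YW)
print(premise_1, premise_2, conclusion)       # True True False
```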

### Preparations for the Membership Algorithm

Definition: Span

span(σ): the set of elements represented in statement σ.

Example: span((x1x2, x3x4)) = {x1,x2,x3,x4}

span(Σ): the set of elements represented in the statements of Σ, i.e. the union of span(σ) over all σ ∈ Σ.

Example: span({(x1,x2),(x1,x3)}) = {x1,x2,x3}

Definition: Projection

The projection of σ on X, denoted σ(X), is the statement derived from σ by removing all elements not in X from σ.

Example: if σ = (x1x2x3, x4x5) and X = {x2,x3,x4} then σ(X) = (x2x3, x4).

The projection of Σ on X, denoted Σ(X), is {σ(X) | σ ∈ Σ}.
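Both definitions translate directly into set operations. This minimal sketch represents a statement as a pair of frozensets (an assumption; `span_set` and `project_set` are hypothetical names for the Σ versions) and reproduces the examples above.

```python
def stmt(x, y):
    """Statement (X, Y) as a pair of frozensets of variable names."""
    return (frozenset(x), frozenset(y))


def span(sigma):
    """span(σ): all variables appearing in σ."""
    x, y = sigma
    return x | y


def span_set(sigmas):
    """span(Σ): union of the spans of all statements in Σ."""
    out = frozenset()
    for sigma in sigmas:
        out |= span(sigma)
    return out


def project(sigma, keep):
    """σ(X): drop from σ every variable not in X."""
    x, y = sigma
    return (x & keep, y & keep)


def project_set(sigmas, keep):
    """Σ(X) = {σ(X) | σ ∈ Σ}."""
    return {project(sigma, keep) for sigma in sigmas}


sigma = stmt({"x1", "x2", "x3"}, {"x4", "x5"})
print(project(sigma, frozenset({"x2", "x3", "x4"})) == stmt({"x2", "x3"}, {"x4"}))  # True
print(span_set({stmt({"x1"}, {"x2"}), stmt({"x1"}, {"x3"})}) == {"x1", "x2", "x3"})  # True
```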

Projection Lemma: Σ ⊢ σ iff Σ’ ⊢ σ, where Σ’ = Σ(span(σ))

(⇐) If Σ’ ⊢ σ then clearly Σ ⊢ σ, because all the statements in Σ’ can be derived from the statements in Σ by decomposition (and symmetry).

(⇒) Let s = span(σ). If Σ ⊢ σ then there is a derivation chain for σ: σ1, σ2, …, σt = σ.

For each j:

if σj ∈ Σ, then σj(s) ∈ Σ’;

if σk ⇒ σj, k < j (by symmetry or decomposition), then σk(s) ⇒ σj(s) by symmetry or decomposition respectively.

Similarly, if σj is derived from σk and σl by mixing, then σj(s) is derived from σk(s), σl(s) by mixing.

Hence σ1(s), …, σt(s) is a derivation chain from Σ’, ending in σt(s) = σ(s) = σ, so Σ’ ⊢ σ.


Observations from the projection lemma:

• Variables not in span(σ) are unnecessary for determining whether Σ ⊢ σ.

• The problem of verifying whether Σ ⊢ σ can be reduced to the problem of verifying whether Σ’ ⊢ σ, where Σ’ = Σ(span(σ)).

• This problem can be solved with possibly reduced time and space complexity, since the statements of Σ’ contain only variables of span(σ).

Conditions for Inference of Independence

Main claim: for a given Σ and σ, with Σ’ = Σ(span(σ)), we have Σ’ ⊢ σ iff:

• σ is trivial: σ = (X,∅) (up to symmetry)

OR

• σ is in Σ’: σ ∈ Σ’ (up to symmetry)

OR

• σ is derivable from Σ’:

there exists σ’ ∈ Σ’ such that span(σ) = span(σ’),

and for σ’ = (AP,BQ), σ = (AQ,BP) (A, B, Q, P may be empty):

Σ’ ⊢ (A,P), Σ’ ⊢ (B,Q) (up to symmetry)

(⇐) If 1. σ is trivial (up to symmetry)

OR 2. σ ∈ Σ’ (up to symmetry), then the proof is immediate.

Otherwise,

3. there exists σ’ ∈ Σ’ such that span(σ) = span(σ’),

and for σ’ = (AP,BQ), σ = (AQ,BP): Σ’ ⊢ (A,P), Σ’ ⊢ (B,Q).

We will show a constructive proof under these conditions.


Proof of Main Claim


• (⇐) (contd.) Given that Σ’ ⊢ (AP,BQ) (since σ’ ∈ Σ’), Σ’ ⊢ (A,P) and Σ’ ⊢ (B,Q):

• (A,P) ∧ (AP,BQ) ⇒ (A,PBQ) (mixing)

• (B,Q) ∧ (AP,BQ) ⇒ (APB,Q) ⇒ (PB,Q) (mixing, then decomposition)

• (PB,Q) ∧ (A,PBQ) ⇒ (AQ,PB) = (AQ,BP) = σ (mixing)

• We’ve proven this direction.
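The three steps above can be replayed mechanically as a check. The sketch assumes mixing in the form (X,Y) ∧ (XY,W) ⇒ (X,YW) and uses the symmetric premise (Q,B) in place of (B,Q). The helper names are hypothetical, and each helper raises an error if a rule is applied to premises of the wrong shape.

```python
fs = frozenset


def sym(s):
    """Symmetry: (X,Y) ⇒ (Y,X)."""
    return (s[1], s[0])


def decomp(s, keep):
    """Decomposition: (X, YW) ⇒ (X, Y) with Y = keep."""
    x, yw = s
    if not keep <= yw:
        raise ValueError("not a decomposition step")
    return (x, keep)


def mix(s1, s2):
    """Mixing: (X,Y) ∧ (XY,W) ⇒ (X,YW)."""
    (x, y), (xy, w) = s1, s2
    if xy != x | y:
        raise ValueError("not a mixing step")
    return (x, y | w)


def derive(A, B, P, Q):
    """Derive σ = (AQ, BP) from σ' = (AP, BQ), (A,P) and (B,Q)."""
    sp = (A | P, B | Q)                          # σ'
    s1 = mix((A, P), sp)                         # (A, PBQ)
    s2 = sym(mix((Q, B), sym(sp)))               # (APB, Q)
    s3 = sym(decomp(sym(s2), P | B))             # (PB, Q)
    s4 = sym(mix(s3, sym(s1)))                   # (AQ, PB)
    return [s1, s2, s3, s4]


chain = derive(fs("a"), fs("b"), fs("p"), fs("q"))
print(chain[-1] == (fs("aq"), fs("bp")))   # True
```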

(⇒) Given Σ’ ⊢ σ: if 1. σ is trivial (up to symmetry) OR 2. σ ∈ Σ’ (up to symmetry),

then the proof is immediate.

Otherwise, since no axiom can add new variables to a statement, there must exist σ’ ∈ Σ’ with span(σ) = span(σ’) in the derivation chain of σ.

Also, by decomposition (and symmetry): Σ’ ⊢ σ = (AQ,BP) ⇒ (A,P)

and Σ’ ⊢ σ = (AQ,BP) ⇒ (Q,B), i.e. Σ’ ⊢ (B,Q) up to symmetry.

• We’ve seen that, after discarding unneeded variables, it is possible to tell whether Σ’ ⊢ σ (when it’s not immediately obvious) by:

• Finding another statement σ’ ∈ Σ’ for which span(σ) = span(σ’),

• Verifying that Σ’ ⊢ (A,P) and Σ’ ⊢ (B,Q), where σ’ = (AP,BQ) and σ = (AQ,BP).

• This suggests using a recursive “divide and conquer” approach, since the spans AP and BQ of the two sub-goals split span(σ) into two parts.

### The Membership Algorithm

Procedure Find(Σ,σ):

1. Set Σ’ := Σ(span(σ)).

2. If σ is trivial, or σ ∈ Σ’ (up to symmetry), then Find(Σ,σ) := TRUE.

3. Else, if span(σ) ≠ span(σ’) for all non-trivial σ’ ∈ Σ’, then Find(Σ,σ) := FALSE.

4. Else, there exists a non-trivial σ’ ∈ Σ’ with span(σ) = span(σ’); write σ’ = (AP,BQ), σ = (AQ,BP),

set σ1 := (A,P), σ2 := (B,Q), and

Find(Σ,σ) := Find(Σ’,σ1) ∧ Find(Σ’,σ2)
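Here is a direct transcription of Find into Python, as a sketch. Statements are pairs of frozensets, the first matching σ’ is used (the procedure allows an arbitrary choice), and no attempt is made to reach the complexity bound discussed in this section. The names `stmt`, `is_trivial` and `find` are hypothetical.

```python
def stmt(x, y):
    """Statement (X, Y) as a pair of frozensets (single-character names)."""
    return (frozenset(x), frozenset(y))


def is_trivial(s):
    return not s[0] or not s[1]


def find(sigmas, sigma):
    """Sketch of Find(Σ, σ): True iff σ ∈ cl(Σ) under axioms 1a-1d."""
    s = sigma[0] | sigma[1]                                  # span(σ)
    proj = {(x & s, y & s) for x, y in sigmas}               # step 1: Σ' = Σ(span(σ))
    # Step 2: trivial, or in Σ' up to symmetry.
    if is_trivial(sigma) or sigma in proj or (sigma[1], sigma[0]) in proj:
        return True
    # Step 4: a non-trivial σ' ∈ Σ' with the same span splits the problem.
    for t1, t2 in proj:
        if t1 and t2 and (t1 | t2) == s:
            A, P = t1 & sigma[0], t1 & sigma[1]              # σ' = (AP, BQ)
            B, Q = t2 & sigma[1], t2 & sigma[0]              # σ  = (AQ, BP)
            return find(proj, (A, P)) and find(proj, (B, Q))
    return False                                             # step 3


sigmas = {stmt("a", "b"), stmt("ab", "c")}
print(find(sigmas, stmt("a", "bc")))                             # True (mixing)
print(find({stmt("a", "b"), stmt("a", "c")}, stmt("a", "bc")))   # False
```

Each recursive call works on a strictly smaller span, because σ’ is non-trivial, so the recursion terminates.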

We will prove that Find(Σ,σ) = TRUE ⇔ σ ∈ cl(Σ) by induction on k = |span(σ)|.

Induction base: if k = 1 then σ is trivial, therefore the algorithm will return TRUE in step 2, and σ ∈ cl(Σ).

Induction assumption: Find(Σ,σ’) = TRUE ⇔ σ’ ∈ cl(Σ) for every Σ and every σ’ with |span(σ’)| < k.

Induction step: Find(Σ,σ) = TRUE iff either:

1. Step 2 returns TRUE: σ is trivial or σ ∈ Σ’ (up to symmetry), and in both cases σ ∈ cl(Σ).

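2. Step 4 returns TRUE:

$$
\begin{aligned}
\text{Step 4 returns TRUE}
&\iff \mathrm{Find}(\Sigma',\sigma_1) = \mathrm{TRUE} \,\wedge\, \mathrm{Find}(\Sigma',\sigma_2) = \mathrm{TRUE} && \text{(according to algorithm's definition)}\\
&\iff \sigma_1 \in cl(\Sigma') \,\wedge\, \sigma_2 \in cl(\Sigma') && \text{(according to induction assumption)}\\
&\iff \sigma \in cl(\Sigma') && \text{(according to main claim)}\\
&\iff \sigma \in cl(\Sigma) && \text{(according to projection lemma)}
\end{aligned}
$$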

Definitions:

n = the number of distinct variables in Σ ∪ {σ}.

k = the number of distinct variables in σ, i.e. k = |span(σ)|.

• First projection cost: O(|Σ|·n) – happens only once.

• Recursive step: T(k) ≤ |Σ|·k + T(k1) + T(k2)

where k1 + k2 = k, k1 = |span(σ1)|, k2 = |span(σ2)|

• Can be shown by induction: T(k) ≤ |Σ|·k·(depth of recursion)

• Worst case analysis: the depth is at most k, so T(k) ≤ |Σ|·k·k = |Σ|·k²

• Total run time is bounded by O(|Σ|·n + |Σ|·k²), which is also O(|Σ|·n²) since k ≤ n.

• Instead of arbitrarily choosing σ’, find one whose sub-statements {A,B,P,Q} have balanced size. This can improve the run-time complexity, since a shallower recursion lowers the |Σ|·k·(depth of recursion) bound.

• Using the derivation chain presented in the constructive proof, the algorithm can also return a derivation chain for σ with a length of O(k).

The algorithm can be extended into a polynomial algorithm for the following problems:

• Given two sets Σ1 and Σ2, is cl(Σ1) ⊆ cl(Σ2)? Is cl(Σ1) = cl(Σ2)?

• Minimize the size of Σ while preserving cl(Σ): start with a maximal-size statement and remove from Σ all statements derivable from it. Repeat with the next largest statement, etc.
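Both extensions reduce to repeated membership tests. This sketch assumes a `find` function transcribed from the Find procedure, repeated here so that the block is self-contained. The names `closure_subset`, `closure_equal` and `reduce_set` are hypothetical. `reduce_set` follows the greedy procedure on the slide, and no claim is made here that its output has minimum size.

```python
def stmt(x, y):
    return (frozenset(x), frozenset(y))


def find(sigmas, sigma):
    """Membership test σ ∈ cl(Σ), following the Find procedure."""
    s = sigma[0] | sigma[1]
    proj = {(x & s, y & s) for x, y in sigmas}
    if not sigma[0] or not sigma[1] or sigma in proj or (sigma[1], sigma[0]) in proj:
        return True
    for t1, t2 in proj:
        if t1 and t2 and (t1 | t2) == s:
            A, P, B, Q = t1 & sigma[0], t1 & sigma[1], t2 & sigma[1], t2 & sigma[0]
            return find(proj, (A, P)) and find(proj, (B, Q))
    return False


def closure_subset(sigmas1, sigmas2):
    """cl(Σ1) ⊆ cl(Σ2) iff every statement of Σ1 is derivable from Σ2."""
    return all(find(sigmas2, s) for s in sigmas1)


def closure_equal(sigmas1, sigmas2):
    return closure_subset(sigmas1, sigmas2) and closure_subset(sigmas2, sigmas1)


def reduce_set(sigmas):
    """Greedy reduction: keep the largest remaining statement, drop every
    statement derivable from it alone, and repeat."""
    remaining = sorted(set(sigmas), key=lambda s: len(s[0] | s[1]), reverse=True)
    kept = []
    while remaining:
        big = remaining.pop(0)
        kept.append(big)
        remaining = [s for s in remaining if not find({big}, s)]
    return kept


sigma1 = {stmt("a", "b"), stmt("ab", "c")}
sigma2 = {stmt("a", "bc"), stmt("b", "c")}
print(closure_equal(sigma1, sigma2))                                  # True
print(reduce_set([stmt("a", "bc"), stmt("a", "b"), stmt("b", "a")]))
```

Every statement dropped by `reduce_set` is derivable from a kept one, so the closure is preserved.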