Axioms and Algorithms for Inferences Involving Probabilistic Independence. Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991, 128-141. Presentation by Guy Moses & Omer Weissbrod


Axioms and Algorithms for Inferences Involving Probabilistic Independence

Dan Geiger, Azaria Paz, and Judea Pearl,

Information and Computation 91(1), March 1991, 128-141.

Presentation by Guy Moses & Omer Weissbrod

for the course 236372 - Bayesian Networks, Computer Science Faculty, Technion – winter 2009

partially based on the presentation by Ilan Gronau

What’s ahead?

Introduction - some definitions, notations and reminders.

Proof of Completeness - “if it’s true, it can be proved”.

Preparations for the Membership Algorithm - more definitions, and some theoretical groundwork.

The Membership Algorithm - description, proof of correctness, complexity analysis.

Definitions
  • U (Universe) - a set of random variables with probability distribution P.
  • X, Y - finite sets of random variables: X = {x1,…,xn}, Y = {y1,…,ym}
  • P(X,Y) = P(X)·P(Y) - a short-hand notation for the equality: Pr{x1=a1,…, xn=an, y1=b1, …, ym=bm} = Pr{x1=a1,…, xn=an} · Pr{y1=b1, …, ym=bm}

for every choice of a1, …, an, b1, …, bm

  • (X,Y) - short-hand for P(X,Y) = P(X)·P(Y)

This is called an independence statement.

*Note that X and Y are disjoint sets of variables (X ∩ Y = ∅).
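The definition of "P satisfies (X,Y)" can be checked mechanically for a small discrete distribution. The sketch below assumes binary variables and a joint distribution stored as a dict from value tuples to exact probabilities. The names `marginal`, `satisfies`, `NAMES` and `PMF` are illustrative only. The example is the classic pair of fair coins x, y with z = x XOR y: every pair of variables is independent, but x is not independent of the pair {y, z}.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of three binary variables: x and y are fair coins, z = x XOR y.
NAMES = ("x", "y", "z")
PMF = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}


def marginal(pmf, subset):
    """Marginal distribution of the variables in `subset` (a list of names)."""
    idx = [NAMES.index(v) for v in subset]
    out = {}
    for values, p in pmf.items():
        key = tuple(values[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return out


def satisfies(pmf, X, Y):
    """True iff P(X,Y) = P(X)·P(Y) for every choice of values of X and Y."""
    joint, px, py = marginal(pmf, X + Y), marginal(pmf, X), marginal(pmf, Y)
    return all(
        joint.get(a + b, 0) == px.get(a, 0) * py.get(b, 0)
        for a in product((0, 1), repeat=len(X))
        for b in product((0, 1), repeat=len(Y))
    )


print(satisfies(PMF, ["x"], ["y"]))       # True
print(satisfies(PMF, ["x"], ["z"]))       # True
print(satisfies(PMF, ["x"], ["y", "z"]))  # False
```

Exact `Fraction` arithmetic is used so that the equality test is not disturbed by floating-point rounding.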

Notations
  • σ - a specific independence statement of the form (X,Y)
  • Σ - a set of independence statements of the form (X,Y): Σ = {σ1, … , σk}
  • XY - short-hand notation for the union X ∪ Y
  • P satisfies σ = (X,Y) means: P(X,Y) = P(X)·P(Y) for that specific P.
Soundness and Completeness

Definitions:

  • Σ ⊨ σ iff every distribution that satisfies Σ also satisfies σ.
  • Σ ⊢ σ iff σ ∈ cl(Σ), i.e. there exists a derivation chain σ1,…,σn = σ s.t. for each j, either σj ∈ Σ or σj is derived by an axiom from the previous statements.

For a set of axioms A:

Soundness: A is sound iff for every Σ and σ: Σ ⊢ σ ⇒ Σ ⊨ σ

Completeness: A is complete iff for every Σ and σ: Σ ⊨ σ ⇒ Σ ⊢ σ

Completeness - Alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.

Independence Axioms

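The axioms 1a-1d (for disjoint sets X, Y, W):

  • Trivial independence: (X, ∅)
  • Symmetry: (X,Y) ⇒ (Y,X)
  • Decomposition: (X, YW) ⇒ (X,Y)
  • Mixing: (X,Y) ∧ (XY, W) ⇒ (X, YW)

We saw (in the 1st lecture) that axioms 1a-1d are sound (they always infer correctly).

Today we’ll show they are complete (they can derive every true statement).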
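For tiny universes, cl(Σ) can be computed by brute force: apply the four axioms repeatedly until no new statement appears. The sketch below does exactly that. It is exponential in the number of variables and is only a way to make "derivable" concrete. It is not the polynomial membership algorithm developed later in these slides. Statements are pairs of disjoint frozensets. The helper `st` and the other names are illustrative assumptions.

```python
from itertools import combinations

# A statement (X, Y) is a pair of disjoint frozensets of variable names.


def subsets(s):
    """All subsets of the set s, as frozensets."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def closure(sigma_set, universe):
    """Brute-force cl(Σ): apply axioms 1a-1d until no new statement appears.

    Exponential in |universe|; only meant for tiny examples.
    """
    cl = set(sigma_set)
    cl.update((x, frozenset()) for x in subsets(universe))  # trivial independence
    while True:
        new = set()
        for x, y in cl:
            new.add((y, x))                            # symmetry
            new.update((x, y2) for y2 in subsets(y))   # decomposition
        for x, y in cl:
            for xy, w in cl:
                if xy == x | y:                        # mixing: (X,Y), (XY,W) give (X,YW)
                    new.add((x, y | w))
        if new <= cl:
            return cl
        cl |= new


def st(left, right):
    """Hypothetical helper: st("a b", "c") builds the statement ({a, b}, {c})."""
    return frozenset(left.split()), frozenset(right.split())


U = {"a", "b", "c"}
print(st("a", "b c") in closure({st("a", "b"), st("a b", "c")}, U))  # True (mixing)
print(st("a", "b c") in closure({st("a", "b"), st("a", "c")}, U))    # False
```

The second query shows that pairwise statements (a,b) and (a,c) do not yield (a, bc) under these axioms.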


Minimal Statement
  • Definition: σ = (X,Y) ∉ cl(Σ) is minimal if for every non-empty X', Y' s.t. X' ⊆ X, Y' ⊆ Y, X'Y' ≠ XY we have (X',Y') ∈ cl(Σ).
  • For every σ = (X,Y) ∉ cl(Σ) we can find an appropriate minimal σ' = (X',Y') ∉ cl(Σ) through iterative decomposition.
  • Observation: P satisfies σ ⇒ P satisfies σ' (soundness of decomposition).

Therefore: P doesn't satisfy σ' ⇒ P doesn't satisfy σ.

  • Our plan: given an arbitrary σ ∉ cl(Σ), we will find a distribution P that satisfies cl(Σ) but doesn't satisfy σ'. This will prove completeness (using the alternative completeness definition and the observation above).
  • To simplify notation, we will assume WLOG that σ = (X,Y) is already minimal.
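The "iterative decomposition" step can be sketched as a greedy loop. Remove one variable at a time, and keep the smaller statement whenever it is still outside cl(Σ). When no single removal stays outside cl(Σ), every smaller sub-statement is in cl(Σ) too, by decomposition, so the result is minimal. The function `minimize` and the membership oracle `in_closure` are hypothetical names. The toy oracle below is an assumption made for illustration.

```python
def minimize(sigma, in_closure):
    """Shrink sigma ∉ cl(Σ) into a minimal statement sigma' ∉ cl(Σ).

    `in_closure` is an assumed membership oracle for cl(Σ). Removing a
    variable from a statement is a decomposition step (plus symmetry), so
    the result is always a sub-statement of the input.
    """
    if in_closure(sigma):
        raise ValueError("sigma is already in cl(Σ)")
    x, y = sigma
    shrunk = True
    while shrunk:
        shrunk = False
        for v in sorted(x | y):
            candidate = (x - {v}, y - {v})
            if not in_closure(candidate):
                x, y = candidate
                shrunk = True
                break
    return x, y


# Toy oracle (an assumption for illustration): treat a statement as derivable
# iff it is trivial or mentions at most two variables. This matches cl(Σ)
# when Σ holds every pairwise statement (u, v) over {a, b, c, d}.
def toy_oracle(s):
    return not s[0] or not s[1] or len(s[0] | s[1]) <= 2


print(minimize((frozenset("ab"), frozenset("cd")), toy_oracle))
```

Starting from (ab, cd), the loop drops a and then stops at the minimal statement (b, cd).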
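Completeness Proof

Let σ = (X,Y) ∉ cl(Σ) be a minimal statement where:

X = {x1,…,xn}, Y = {y1,…,ym}, and Z = {z1,z2,…,zk} stands for the rest of the variables in U.

We will construct P as follows: all variables except x1 are independent fair coins (probability ½ for each of their two values, 0 and 1).

x1 is defined as the parity of the other variables of X ∪ Y: x1 = (x2 + … + xn + y1 + … + ym) mod 2.

Part 1: P does not satisfy σ

We will inspect the following scenario: x1 = 1, all other variables are 0.

P(x1, … , xn, y1, … , ym) = 0 ≠ 0.5^n · 0.5^m = P(x1, … , xn)·P(y1, … , ym)

(x1 is forced to 0 when all other variables of X ∪ Y are 0, while X alone and Y alone are each uniformly distributed, since Y ≠ ∅ leaves x1 a fair coin given x2, …, xn.)

Therefore, P does not satisfy σ, as required.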

Completeness Proof – cont’d

Part 2: P satisfies cl()

Let(V,W)  cl(). We will show thatP(V,W)=P(V)·P(W). This is done by inspecting different scenarios:

Scenario 1: either V or W contains only elements of Z. We will assume WLOG that W contains only elements of Z.

all variables in Z are independent under Pand therefore:

[Figure: the sets V and W relative to X, Y and Z]

Completeness Proof – cont’d

Part 2: P satisfies cl()

Let(V,W)  cl(). We will show thatP(V,W)=P(V)·P(W). This is done by inspecting different scenarios:

Scenario 2: Both V and W contain elements of X  Y,butV  W doesn’t contain all elements of X  Y.

Without full information about the assignments of the variables in X  Y, x1could turn out to be 0 or 1 with probability, and therefore:

[Figure: the sets V and W relative to X, Y and Z]

Completeness Proof – cont’d

Part 2: P satisfies cl(Σ) (continued)

Scenario 3: Both V and W contain elements of X ∪ Y, and (X ∪ Y) ⊆ (V ∪ W).

We will show a derivation chain for σ = (X,Y), contradicting our original assumption that σ ∉ cl(Σ):

Mark: (V,W) = (X_V Y_V Z_V, X_W Y_W Z_W) ∈ cl(Σ)

where: Y = Y_V ∪ Y_W, X = X_V ∪ X_W, Z_V ∪ Z_W ⊆ Z, V = X_V Y_V Z_V, W = X_W Y_W Z_W (by symmetry of (V,W) we may assume X_V ≠ ∅)

Remove all z's by decomposition: (X_V Y_V, X_W Y_W) ∈ cl(Σ)

Due to minimality of σ = (X,Y) (both statements below are proper sub-statements of σ, since X_V ≠ ∅ and X_W Y_W ≠ ∅): (X_V, Y_V) ∈ cl(Σ) and (X_W, Y) ∈ cl(Σ)

(X_V, Y_V) ∧ (X_V Y_V, X_W Y_W) ⇒ (X_V, Y_V X_W Y_W) = (X_V, X_W Y)   [mixing]

(X_W, Y) ∧ (X_W Y, X_V) ⇒ (Y, X_V X_W) = (Y, X), i.e. σ up to symmetry   [symmetry, mixing]

[Figure: the sets V and W relative to X, Y and Z]

Completeness Proof – Summary

Reminder: Completeness - Alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.

We’ve shown: given a minimal σ ∉ cl(Σ), there exists a distribution P that obeys:

  • P does not satisfy σ.
  • P satisfies cl(Σ).

Given a non-minimal σ ∉ cl(Σ), we will derive its minimal statement σ' and devise a distribution P' that satisfies cl(Σ) but does not satisfy σ'. Due to soundness of decomposition, P' cannot satisfy σ either.


[Figure: classes of probability distributions: all p.d.’s over U, discrete p.d.’s, normal p.d.’s, binary p.d.’s]

Scope of Completeness

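The proof uses P, a binary p.d. (probability distribution function); therefore:

  • P is a binary p.d., and binary p.d.’s ⊆ discrete p.d.’s ⊆ all p.d.’s over U, so the axioms 1a-1d are complete for each of these classes.

However,

for normal p.d.’s, the axiom set 1a-1d is not complete:

a stronger axiom is required in place of one of the axioms 1a-1d.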


Some more Definitions and Tools

Definition: Span

span(): the set of elements represented in statement .

Example: span(x1x2,x3,x4) = {x1,x2,x3,x4}

span(): the set of elements represented in all statements of .

Example: span({(x1,x2),(x1,x3)}) = {x1,x2,x3}

Some more Definitions and Tools

Definition: Projection

The projection of onX, denoted (X), is the statement derived from by removing all elements not in X from .

Example: if =(x1x2x3, x4x5)and X={x2,x3,x4}then (X)=(x2x3, x4).

The projection of onX, denoted (X), is {(X) |   }.

Some more Definitions and Tools

Projection Lemma: Σ ⊢ σ iff Σ' ⊢ σ, where Σ' = Σ(span(σ))

(⇐) If Σ' ⊢ σ then clearly Σ ⊢ σ, because all the statements in Σ' can be derived from the statements in Σ by decomposition (and symmetry).

Some more Definitions and Tools

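Projection Lemma: Σ ⊢ σ iff Σ' ⊢ σ, where Σ' = Σ(s), s = span(σ)

(⇒) If Σ ⊢ σ then there is a derivation chain for σ: σ1, σ2, … , σk = σ.

For each j:

if σi ⇒ σj for some i < j (by symmetry or decomposition),

then σi(s) ⇒ σj(s) by symmetry or decomposition, respectively.

Similarly,

if σj is derived from σi and σl by mixing,

then σj(s) is derived from σi(s), σl(s) by mixing.

Finally, if σj ∈ Σ then σj(s) ∈ Σ', and a trivial statement projects to a trivial statement. Hence σ1(s), … , σk(s) is a derivation chain from Σ' ending in σk(s) = σ(s) = σ.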

Some more Definitions and Tools


Observations from projection lemma:

  • Variables not in span(σ) are unnecessary for determining whether Σ ⊢ σ.
  • The problem of verifying whether Σ ⊢ σ can be reduced to the problem of verifying whether Σ' ⊢ σ, where Σ' = Σ(span(σ)).
  • Since Σ' mentions only the variables of σ, this problem can often be solved with reduced time and space complexity.
Conditions for Inference of Independence

Main claim: for a given σ and Σ' = Σ(span(σ)), we have Σ' ⊢ σ iff:

  • σ is trivial: σ = (X, ∅) (up to symmetry)

OR

  • σ is in Σ': σ ∈ Σ' (up to symmetry)

OR

  • σ is derivable from Σ':

there exists σ' ∈ Σ' s.t. span(σ) = span(σ')

and for σ' = (AP, BQ), σ = (AQ, BP) (A, B, Q, P may be empty)

Σ' ⊢ (A,P), Σ' ⊢ (B,Q) (up to symmetry)

Proof of Main Claim

Main claim: for a given σ and Σ' = Σ(span(σ)), we have Σ' ⊢ σ iff:

  1. σ is trivial*: σ = (X, ∅)   *up to symmetry
  2. σ is in* Σ': σ ∈ Σ'
  3. σ is derivable* from Σ': ∃σ' ∈ Σ' s.t. span(σ) = span(σ')

and for σ' = (AP, BQ), σ = (AQ, BP): Σ' ⊢ (A,P), Σ' ⊢ (B,Q)

(⇐) If 1. σ is trivial*

OR 2. σ is in* Σ', then the proof is immediate.

Otherwise,

3. there exists σ' ∈ Σ' s.t. span(σ) = span(σ')

and for σ' = (AP, BQ), σ = (AQ, BP): Σ' ⊢ (A,P), Σ' ⊢ (B,Q)

We will show a constructive proof under these conditions.

Proof of Main Claim

  • (⇐) (cont’d) Given that Σ' ⊢ (AP, BQ), Σ' ⊢ (A,P), Σ' ⊢ (B,Q) (symmetry is applied implicitly throughout):
  • (A,P) ∧ (AP, BQ) ⇒ (A, PBQ)   [mixing]
  • (B,Q) ∧ (AP, BQ) ⇒ (APB, Q) ⇒ (PB, Q)   [mixing, then decomposition]
  • (PB, Q) ∧ (A, PBQ) ⇒ (AQ, PB) = (AQ, BP) = σ   [mixing]
  • We’ve proven this direction.
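The three-step chain above can be replayed mechanically. Each axiom becomes a small function on statements, with assertions that check each premise has the required shape. The sketch starts from the premises (A,P), (B,Q) and σ' = (AP, BQ) and ends at σ = (AQ, BP). The function names `sym`, `dec`, `mix` and `construct` are hypothetical. The symmetry steps that the slide leaves implicit are written out.

```python
def sym(s):
    """Symmetry: (X, Y) gives (Y, X)."""
    return s[1], s[0]


def dec(s, keep):
    """Decomposition: (X, YW) gives (X, Y) for Y = keep ⊆ YW."""
    assert keep <= s[1], "can only keep part of the right-hand side"
    return s[0], keep


def mix(s1, s2):
    """Mixing: (X, Y) and (XY, W) give (X, YW)."""
    (x, y), (xy, w) = s1, s2
    assert xy == x | y, "second premise must be (XY, W)"
    return x, y | w


def construct(A, P, B, Q):
    """Replay the chain from (AP, BQ), (A, P), (B, Q) to σ = (AQ, BP)."""
    sigma_prime = (A | P, B | Q)
    s1 = mix((A, P), sigma_prime)               # (A, PBQ)
    s2 = mix(sym((B, Q)), sym(sigma_prime))     # (Q, BAP), i.e. (APB, Q)
    s3 = sym(dec(s2, B | P))                    # (PB, Q)
    return sym(mix(s3, sym(s1)))                # (PB, QA), i.e. (AQ, BP)


f = frozenset
print(construct(f({"a"}), f({"p"}), f({"b"}), f({"q"})) == (f({"a", "q"}), f({"b", "p"})))  # True
```

Any of A, B, P, Q may be empty; the assertions still hold because the set unions line up the same way.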
Proof of Main Claim

(⇒) Given Σ' ⊢ σ: if 1. σ is trivial* OR 2. σ is in* Σ',

then the proof is immediate.

Otherwise, since no axiom can add new variables to a statement (and every statement of Σ' has a span contained in span(σ)), there must exist σ' ∈ Σ' s.t. span(σ) = span(σ') in the derivation chain of σ.

Also: Σ' ⊢ σ = (AQ, BP) ⇒ (A, P)   [decomposition]

Σ' ⊢ σ = (AQ, BP) ⇒ (Q, B)   [decomposition]

Conclusions from Claim
  • We’ve seen that, after discarding unneeded variables, it is possible to tell whether Σ' ⊢ σ (when it’s not immediately obvious) by:
    • Finding another statement σ' ∈ Σ' for which span(σ) = span(σ'),
    • Verifying that Σ' ⊢ (A,P), Σ' ⊢ (B,Q) when σ' = (AP, BQ), σ = (AQ, BP).
  • This suggests using a recursive “divide and conquer” approach.

The Membership Algorithm

Procedure Find(,):

  • set ’ :=(span()).
  • if  is trivial, or ’ (up to symmetry)then Find(,) := TRUE.
  • else if for all non-trivial ’’: span()  span(’), then Find(,) := FALSE.
  • else there exists ’’: span() = span(’),

and ’=(AP,BQ) =(AQ,BP),

set 1:= (A,P), 2:= (B,Q).

Find(,) := (Find(’,1) Find(’,2))

Algorithm Correctness Proof

We will prove that Find(Σ,σ) = TRUE ⇔ σ ∈ cl(Σ) by induction on k = |span(σ)|.

Induction base: if k = 1 then σ is trivial, therefore the algorithm will return TRUE in step 2 and σ ∈ cl(Σ).

Algorithm Correctness Proof

Induction assumption: Find(Σ,σ) = TRUE ⇔ σ ∈ cl(Σ) for every σ with |span(σ)| < k.

Induction step: Find(Σ,σ) = TRUE iff either:

1. Step 2 returns TRUE ⇔ σ is trivial or σ ∈ Σ', and in both cases σ ∈ cl(Σ).

2. Step 4 returns TRUE

iff

Find(Σ',σ1) = TRUE ∧ Find(Σ',σ2) = TRUE   (according to the algorithm’s definition)

iff

σ1 ∈ cl(Σ') ∧ σ2 ∈ cl(Σ')   (according to the induction assumption)

iff

σ ∈ cl(Σ')   (according to the main claim)

iff σ ∈ cl(Σ)   (according to the projection lemma)

Complexity Analysis

Definitions:

n = the number of distinct variables in Σ ∪ {σ}.

k = the number of distinct variables in σ, i.e. k = |span(σ)|.

  • First projection cost: O(|Σ|·n) (happens only once).
  • Recursive step: T(k) ≤ |Σ|·k + T(k1) + T(k2)

where k1 + k2 = k, k1 = |span(σ1)|, k2 = |span(σ2)|

  • Can be shown by induction: T(k) ≤ |Σ|·k·(depth of recursion)
  • Worst case analysis: the depth of recursion is at most k, so T(k) ≤ |Σ|·k·k = |Σ|·k²
  • Total run time is bounded by O(|Σ|·n + |Σ|·k²), which is also O(|Σ|·n²) since k ≤ n.
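The induction behind "T(k) ≤ |Σ|·k·(depth of recursion)" can be written out. Let d denote the depth of recursion of the call on σ. The two sub-calls then have depth at most d − 1, and the base case d = 1 (no recursive calls) costs at most |Σ|·k:

```latex
\begin{aligned}
T(k) &\le |\Sigma|\,k + T(k_1) + T(k_2), \qquad k_1 + k_2 = k \\
     &\le |\Sigma|\,k + |\Sigma|\,k_1\,(d-1) + |\Sigma|\,k_2\,(d-1)
        && \text{(induction on the sub-calls)} \\
     &= |\Sigma|\,k + |\Sigma|\,k\,(d-1) \;=\; |\Sigma|\,k\,d \\
     &\le |\Sigma|\,k^2
        && \text{(each level shrinks the span, so } d \le k\text{)}
\end{aligned}
```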
Improvements and Variations
  • Instead of arbitrarily choosing σ', find one whose sub-statements {A, B, P, Q} have balanced sizes (this can improve the run-time complexity).
  • Using the derivation chain presented in the constructive proof, the algorithm can also return a derivation chain for σ of length O(k).
Variations (contd.)

The algorithm can be expanded into a polynomial algorithm for the following problems:

  • Given two sets Σ1 and Σ2: is cl(Σ1) ⊆ cl(Σ2)? Is cl(Σ1) = cl(Σ2)?
  • Minimize the size of Σ while preserving cl(Σ): start with a maximal-size statement and remove from Σ all statements derivable from it. Repeat with the next largest statement, etc.
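The containment and equivalence questions reduce to calls of the membership test. Because cl(Σ2) is closed under the axioms, cl(Σ1) ⊆ cl(Σ2) holds exactly when every statement of Σ1 is in cl(Σ2). The sketch below repeats a compact `find` so it runs on its own. `entails_all`, `equivalent` and `st` are hypothetical names.

```python
def find(Sigma, sigma):
    """Compact membership test σ ∈ cl(Σ), following the steps of Procedure Find."""
    x, y = sigma
    s = x | y
    proj = [(l & s, r & s) for l, r in Sigma]
    if not x or not y or any({l, r} == {x, y} for l, r in proj):
        return True
    for l, r in proj:
        if l and r and (l | r) == s:
            return find(proj, (l & x, l & y)) and find(proj, (r & y, r & x))
    return False


def entails_all(Sigma1, Sigma2):
    """cl(Σ1) ⊆ cl(Σ2) iff every statement of Σ1 is derivable from Σ2."""
    return all(find(Sigma2, sigma) for sigma in Sigma1)


def equivalent(Sigma1, Sigma2):
    """cl(Σ1) = cl(Σ2) iff each set entails the other."""
    return entails_all(Sigma1, Sigma2) and entails_all(Sigma2, Sigma1)


def st(left, right):
    """Hypothetical helper: st("a b", "c") builds ({a, b}, {c})."""
    return frozenset(left.split()), frozenset(right.split())


S1 = [st("a", "b"), st("a b", "c")]
S3 = [st("a", "b c"), st("b", "c")]
print(equivalent(S1, S3))                                   # True
print(entails_all([st("a b", "c")], [st("a", "b c")]))      # False
```

Each question costs one call of `find` per statement, so both checks stay polynomial.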