Axioms and Algorithms for Inferences Involving Probabilistic Independence. Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991, 128-141. Presentation by Guy Moses & Omer Weissbrod


Axioms and Algorithms for Inferences Involving Probabilistic Independence

Dan Geiger, Azaria Paz, and Judea Pearl,

Information and Computation 91(1), March 1991, 128-141.

Presentation by Guy Moses & Omer Weissbrod

for the course 236372 - Bayesian Networks, Computer Science Faculty, Technion – winter 2009

partially based on the presentation by Ilan Gronau

What’s ahead?

Introduction - some definitions, notations and reminders.

Proof of Completeness - “if it’s true – it can be proved”.

Preparations for the Membership Algorithm - more definitions, and some theoretical groundwork.

The Membership Algorithm - description, proof of correctness, complexity analysis.

Definitions
  • U (Universe) – a set of random variables with probability distribution P.
  • X, Y – finite sets of random variables: X = {x1,…,xn}, Y = {y1,…,ym}.
  • P(X,Y) = P(X)·P(Y) – a short-hand notation for the equality: Pr{x1=a1,…,xn=an, y1=b1,…,ym=bm} = Pr{x1=a1,…,xn=an} · Pr{y1=b1,…,ym=bm} for every choice of a1,…,an, b1,…,bm.
  • (X,Y) – short-hand for P(X,Y) = P(X)·P(Y). This is called an independence statement.

*Note that X and Y are disjoint sets of variables (X ∩ Y = ∅).

Notations
  • σ – a specific independence statement of the form (X,Y).
  • Σ – a set of independence statements of the form (X,Y): Σ = {σ1,…,σk}.
  • XY – short-hand notation for the union X ∪ Y.
  • “P satisfies σ = (X,Y)” means: P(X,Y) = P(X)·P(Y) for that specific P.
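As a concrete reading of “P satisfies (X,Y)”, the following minimal Python sketch checks the product equality by brute-force enumeration. The representation (a joint distribution stored as a dict from value tuples to probabilities) and the names `marginal` and `satisfies` are assumptions made for this illustration; they do not come from the slides.

```python
from collections import defaultdict

def marginal(joint, order, subset):
    """Sum the joint distribution down to the variables listed in `subset`."""
    idx = [order.index(v) for v in subset]
    out = defaultdict(float)
    for assignment, prob in joint.items():
        out[tuple(assignment[i] for i in idx)] += prob
    return dict(out)

def satisfies(joint, order, X, Y, tol=1e-12):
    """True if P(X,Y) = P(X)*P(Y) for every choice of values."""
    p_x = marginal(joint, order, X)
    p_y = marginal(joint, order, Y)
    p_xy = marginal(joint, order, list(X) + list(Y))
    return all(
        abs(p_xy.get(xv + yv, 0.0) - px * py) <= tol
        for xv, px in p_x.items()
        for yv, py in p_y.items()
    )

order = ["x", "y"]
# Two independent fair coins satisfy the statement (x, y).
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
# Two perfectly correlated coins do not.
correlated = {(0, 0): 0.5, (1, 1): 0.5}

print(satisfies(independent, order, ["x"], ["y"]))  # True
print(satisfies(correlated, order, ["x"], ["y"]))   # False
```

The check loops over every pair of marginal values, so assignments with zero joint probability are compared against the product as well.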
Soundness and Completeness

Definitions:

  • Σ ⊨ σ iff every distribution that satisfies Σ also satisfies σ.
  • Σ ⊢ σ iff σ ∈ cl(Σ), i.e. there exists a derivation chain σ1,…,σn = σ s.t. for each j, either σj ∈ Σ or σj is derived by an axiom from the previous statements.

For a set of axioms A:

Soundness: A is sound iff for every Σ and σ: Σ ⊢ σ ⇒ Σ ⊨ σ.

Completeness: A is complete iff for every Σ and σ: Σ ⊨ σ ⇒ Σ ⊢ σ.

Completeness - alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.

Independence Axioms

We saw (in the 1st lecture) that axioms 1a-1d are sound (they always infer correctly).

Today we’ll show they are complete (they can derive every true statement).
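For reference, the four axioms used by name in the derivations of this presentation (trivial independence, symmetry, decomposition and mixing) can be written in the (X,Y) notation as follows. The assignment of the labels 1a to 1d is an assumption that follows the order in the cited paper.

```latex
\begin{align*}
\text{(1a) Trivial independence:} \quad & (X, \emptyset) \\
\text{(1b) Symmetry:} \quad & (X, Y) \;\Rightarrow\; (Y, X) \\
\text{(1c) Decomposition:} \quad & (X, YZ) \;\Rightarrow\; (X, Y) \\
\text{(1d) Mixing:} \quad & (X, Y) \wedge (XY, Z) \;\Rightarrow\; (X, YZ)
\end{align*}
```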


Minimal Statement
  • Definition: σ=(X,Y) ∉ cl(Σ) is minimal if for every non-empty X’,Y’ s.t. X’ ⊆ X, Y’ ⊆ Y, X’Y’ ≠ XY we have (X’,Y’) ∈ cl(Σ).
  • For every σ=(X,Y) ∉ cl(Σ) we can find an appropriate minimal σ’=(X’,Y’) ∉ cl(Σ) through iterative decomposition.
  • Observation: P satisfies σ ⇒ P satisfies σ’ (decomposition soundness). Therefore: P doesn’t satisfy σ’ ⇒ P doesn’t satisfy σ.
  • Our plan: given an arbitrary σ ∉ cl(Σ), we will find a distribution P that satisfies cl(Σ) but doesn’t satisfy σ’. This will prove completeness (using the alternative completeness definition and the observation above).
  • To simplify notation, we will assume WLOG that σ=(X,Y) is already minimal.
Completeness Proof

Let σ=(X,Y) ∉ cl(Σ) be a minimal statement where:

X={x1,…,xn}, Y={y1,…,ym}, and Z={z1,z2,…,zk} stands for the rest of the variables in U.

We will construct P as follows: all variables except x1 are independent fair coins (probability ½ for each of their two values).

x1 is defined as the parity of the other variables in X ∪ Y: x1 = x2 ⊕ … ⊕ xn ⊕ y1 ⊕ … ⊕ ym (sum modulo 2).

Part 1: P does not satisfy σ

We will inspect the following scenario: x1=1, all other variables are 0.

  • P(x1=1, x2=0,…,xn=0, y1=0,…,ym=0) = 0
  • P(x1=1, x2=0,…,xn=0) = 0.5^n
  • P(y1=0,…,ym=0) = 0.5^m

So for this assignment P(x1,…,xn, y1,…,ym) ≠ P(x1,…,xn)·P(y1,…,ym).

Therefore, P does not satisfy σ, as required.

Completeness Proof – cont’d

Part 2: P satisfies cl(Σ)

Let (V,W) ∈ cl(Σ). We will show that P(V,W)=P(V)·P(W). This is done by inspecting different scenarios:

Scenario 1: either V or W contains only elements of Z. We will assume WLOG that W contains only elements of Z.

All variables in Z are independent under P (of each other and of the variables in X ∪ Y), and therefore P(V,W) = P(V)·P(W).

[Figure: the variables of X, Y and Z, with the sets V and W marked.]

Completeness Proof – cont’d

Part 2: P satisfies cl(Σ) - continued

Scenario 2: Both V and W contain elements of X ∪ Y, but V ∪ W doesn’t contain all elements of X ∪ Y.

Without full information about the assignments of the variables in X ∪ Y, x1 could turn out to be 0 or 1 with probability ½, and therefore P(V,W) = P(V)·P(W).

[Figure: the variables of X, Y and Z, with the sets V and W marked.]

Completeness Proof – cont’d

Part 2: P satisfies cl(Σ) - continued

Scenario 3: Both V and W contain elements of X ∪ Y, and (X ∪ Y) ⊆ (V ∪ W).

We will show a derivation chain for σ=(X,Y), contradicting our original assumption that σ ∉ cl(Σ):

Mark: (V,W) = (X_V Y_V Z_V, X_W Y_W Z_W) ∈ cl(Σ)

where: X = X_V X_W, Y = Y_V Y_W, Z_V Z_W ⊆ Z, V = X_V Y_V Z_V, W = X_W Y_W Z_W.

  • Remove all z’s by decomposition: (X_V Y_V, X_W Y_W) ∈ cl(Σ).
  • Due to minimality of σ=(X,Y): (X_V,Y_V) ∈ cl(Σ) and (X_W,Y) ∈ cl(Σ).
  • By mixing: (X_V,Y_V) ∧ (X_V Y_V, X_W Y_W) ⇒ (X_V, Y_V X_W Y_W) = (X_V, X_W Y).
  • By mixing (up to symmetry): (X_W,Y) ∧ (X_W Y, X_V) ⇒ (Y, X_V X_W) = (Y,X) = σ.

[Figure: the variables of X, Y and Z, with the sets V and W marked.]

Completeness Proof – Summary

Reminder: Completeness - alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.

We’ve shown: given a minimal σ ∉ cl(Σ), there exists a distribution P that obeys:

  • P does not satisfy σ.
  • P satisfies cl(Σ).

Given a non-minimal σ ∉ cl(Σ), we derive its minimal statement σ’ and devise a distribution P’ that satisfies cl(Σ) but does not satisfy σ’. Due to soundness of decomposition, P’ cannot satisfy σ either.

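Scope of Completeness

[Figure: classes of p.d.’s over U: all p.d.’s, discrete p.d.’s, normal p.d.’s, binary p.d.’s.]

The proof uses P, a binary p.d. (probability distribution function); therefore:

  • P also belongs to every class of p.d.’s that contains the binary p.d.’s (such as the discrete p.d.’s and all p.d.’s over U), so the axioms are complete for each such class.

However, for normal p.d.’s the axiom set 1a-1d is not complete.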

a stronger axiom is required:

replace:

with:


Some more Definitions and Tools

Definition: Span

span(σ): the set of elements represented in statement σ.

Example: span((x1x2, x3x4)) = {x1,x2,x3,x4}

span(Σ): the set of elements represented in the statements of Σ (the union of their spans).

Example: span({(x1,x2),(x1,x3)}) = {x1,x2,x3}

Some more Definitions and Tools

Definition: Projection

The projection of σ on X, denoted σ(X), is the statement derived from σ by removing all elements not in X.

Example: if σ=(x1x2x3, x4x5) and X={x2,x3,x4} then σ(X)=(x2x3, x4).

The projection of Σ on X, denoted Σ(X), is {σ(X) | σ ∈ Σ}.
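Span and projection translate directly into set operations. The sketch below represents a statement (X,Y) as a pair of `frozenset` objects; this representation and the names `S`, `span`, `span_of_set`, `project` and `project_set` are assumptions made for illustration.

```python
def S(*names):
    """Hypothetical helper: build a set of variable names."""
    return frozenset(names)

def span(statement):
    """All variables mentioned in a statement (X, Y)."""
    left, right = statement
    return left | right

def span_of_set(statements):
    """All variables mentioned in any statement of the collection."""
    return frozenset().union(*(span(s) for s in statements))

def project(statement, keep):
    """Remove every variable not in `keep` from both sides of the statement."""
    left, right = statement
    return (left & keep, right & keep)

def project_set(statements, keep):
    """Project every statement of the collection on `keep`."""
    return {project(s, keep) for s in statements}

# The projection example from the slide: sigma = (x1x2x3, x4x5), X = {x2, x3, x4}.
sigma = (S("x1", "x2", "x3"), S("x4", "x5"))
print(project(sigma, S("x2", "x3", "x4")))
```

Because the projected statements are collected in a set, statements that become identical after projection are stored once, so the projected collection is never larger than the original.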

Some more Definitions and Tools

Projection Lemma: Σ ⊢ σ iff Σ’ ⊢ σ, where Σ’ = Σ(span(σ)), s = span(σ).

(⇐) If Σ’ ⊢ σ then clearly Σ ⊢ σ, because all the statements in Σ’ can be derived from the statements in Σ by decomposition.

(⇒) If Σ ⊢ σ then there is a derivation chain for σ: σ1, σ2, …, σk. For each j:

  • if σi ⇒ σj for some i<j by symmetry or decomposition, then σi(s) ⇒ σj(s) by symmetry or decomposition respectively;
  • similarly, if σj is derived from σi and σl by mixing, then σj(s) is derived from σi(s), σl(s) by mixing.

Hence σ1(s),…,σk(s) is a derivation chain for σ(s) = σ from Σ’.

Observations from the projection lemma:

  • Variables not in σ are unnecessary for determining whether Σ ⊢ σ.
  • The problem of verifying whether Σ ⊢ σ can be simplified to the problem of verifying whether Σ’ ⊢ σ, where Σ’ = Σ(span(σ)).
  • This problem can be solved with possibly reduced time and space complexity.
Conditions for Inference of Independence

Main claim: for a given Σ, σ we have Σ’ ⊢ σ iff:

  • σ is trivial: σ=(X,∅) (up to symmetry)

OR

  • σ is in Σ’: σ ∈ Σ’ (up to symmetry)

OR

  • σ is derivable from Σ’: there exists σ’ ∈ Σ’ s.t. span(σ) = span(σ’), and for σ’=(AP,BQ), σ=(AQ,BP) (A,B,Q,P may be empty): Σ’ ⊢ (A,P), Σ’ ⊢ (B,Q) (up to symmetry).

Proof of Main Claim

(⇐) If 1. σ is trivial*, or 2. σ is in* Σ’, then the proof is immediate. (*up to symmetry)

Otherwise,

3. there exists σ’ ∈ Σ’ s.t. span(σ) = span(σ’), and for σ’=(AP,BQ), σ=(AQ,BP): Σ’ ⊢ (A,P), Σ’ ⊢ (B,Q).

We will show a constructive proof under these conditions.

Proof of Main Claim

  • (⇐) (contd.) Given that Σ’ ⊢ (AP,BQ), Σ’ ⊢ (A,P), Σ’ ⊢ (B,Q), each step below holds up to symmetry:
  • By mixing: (A,P) ∧ (AP,BQ) ⇒ (A,PBQ)
  • By mixing, then decomposition: (B,Q) ∧ (AP,BQ) ⇒ (APB,Q) ⇒ (PB,Q)
  • By mixing: (PB,Q) ∧ (A,PBQ) ⇒ (AQ,PB) = (AQ,BP) = σ
  • We’ve proven this direction.
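The derivation chain above can be replayed mechanically. In the sketch below each axiom is a small function on statements represented as pairs of `frozenset` objects. The function names, the single-variable sets standing in for A, P, B and Q, and the explicit symmetry steps are all assumptions made for illustration.

```python
def symmetry(stmt):
    """(X, Y) => (Y, X)."""
    return (stmt[1], stmt[0])

def decompose(stmt, keep):
    """(X, YZ) => (X, Y), where `keep` is the subset Y to retain."""
    left, right = stmt
    if not keep <= right:
        raise ValueError("can only keep a subset of the right-hand side")
    return (left, keep)

def mix(first, second):
    """(X, Y) and (XY, Z) => (X, YZ)."""
    x, y = first
    xy, z = second
    if xy != x | y:
        raise ValueError("second premise must have XY on its left-hand side")
    return (x, y | z)

# Single-variable stand-ins for the sets A, P, B, Q.
A, P, B, Q = (frozenset({name}) for name in "APBQ")
sigma_prime = (A | P, B | Q)             # (AP, BQ)

step1 = mix((A, P), sigma_prime)         # (A, PBQ)
step2 = decompose(mix((Q, B), symmetry(sigma_prime)), P | B)  # (Q, PB)
step3 = mix(symmetry(step2), symmetry(step1))                 # (PB, QA)
sigma = symmetry(step3)                  # (AQ, BP)
print(sigma == (A | Q, B | P))
```

Writing the symmetry steps explicitly shows where the chain relies on “up to symmetry”: the mixing axiom is only applied in its literal form (X,Y) ∧ (XY,Z) ⇒ (X,YZ).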
Proof of Main Claim

(⇒) Given Σ’ ⊢ σ, if 1. σ is trivial* or 2. σ is in* Σ’, then the proof is immediate.

Otherwise, since no axiom can add new variables to a statement, there must exist σ’ ∈ Σ’ s.t. span(σ) = span(σ’) in the derivation chain of σ.

Also, by decomposition:

  • σ = (AQ,BP) ⇒ (A,P)
  • σ = (AQ,BP) ⇒ (Q,B)

so Σ’ ⊢ (A,P) and Σ’ ⊢ (B,Q) (up to symmetry).

Conclusions from Claim
  • We’ve seen that, after discarding unneeded variables, it is possible to tell whether Σ’ ⊢ σ (when it’s not immediately obvious) by:
    • Finding another statement σ’ ∈ Σ’ for which span(σ) = span(σ’),
    • Verifying that Σ’ ⊢ (A,P), Σ’ ⊢ (B,Q) when σ’=(AP,BQ), σ=(AQ,BP).
  • This suggests using a recursive “divide and conquer” approach.

The Membership Algorithm

Procedure Find(Σ,σ):

  1. Set Σ’ := Σ(span(σ)).
  2. If σ is trivial, or σ ∈ Σ’ (up to symmetry), then Find(Σ,σ) := TRUE.
  3. Else if for all non-trivial σ’ ∈ Σ’: span(σ) ≠ span(σ’), then Find(Σ,σ) := FALSE.
  4. Else there exists σ’ ∈ Σ’ with span(σ) = span(σ’), σ’=(AP,BQ) and σ=(AQ,BP). Set σ1 := (A,P), σ2 := (B,Q), and Find(Σ,σ) := Find(Σ’,σ1) ∧ Find(Σ’,σ2).
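A minimal Python sketch of the procedure is given below. It assumes statements are pairs of `frozenset` objects; the names `find` and `S` are hypothetical, and the first suitable σ’ encountered is used, since the main claim does not depend on which one is chosen.

```python
def S(*names):
    """Hypothetical helper: build a set of variable names."""
    return frozenset(names)

def find(statements, sigma):
    """Return True iff sigma is derivable from `statements`."""
    left, right = sigma
    s = left | right
    # Step 1: project every statement on span(sigma).
    projected = {(l & s, r & s) for l, r in statements}
    # Step 2: sigma is trivial, or appears in the projection up to symmetry.
    if not left or not right:
        return True
    if (left, right) in projected or (right, left) in projected:
        return True
    # Steps 3-4: look for a non-trivial projected statement with the same span.
    for l2, r2 in projected:
        if l2 and r2 and (l2 | r2) == s:
            # sigma' = (AP, BQ) and sigma = (AQ, BP).
            a, q = left & l2, left & r2
            p, b = right & l2, right & r2
            return find(projected, (a, p)) and find(projected, (b, q))
    return False

# (x1, x2) and (x1x2, x3) yield (x1, x2x3) by mixing.
sigma_set = {(S("x1"), S("x2")), (S("x1", "x2"), S("x3"))}
print(find(sigma_set, (S("x1"), S("x2", "x3"))))   # True

# (x1, x2) and (x1, x3) alone do not yield (x1, x2x3).
pairwise = {(S("x1"), S("x2")), (S("x1"), S("x3"))}
print(find(pairwise, (S("x1"), S("x2", "x3"))))    # False
```

The recursion terminates because σ’ is non-trivial, so both (A,P) and (B,Q) mention strictly fewer variables than σ.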

Algorithm Correctness Proof

We will prove that Find(Σ,σ) = TRUE ⇔ σ ∈ cl(Σ) by induction on k=|σ|.

Induction base: if k=1 then σ is trivial, therefore the algorithm will return TRUE in step 2 and σ ∈ cl(Σ).

Algorithm Correctness Proof

Induction assumption: Find(Σ,σ’) = TRUE ⇔ σ’ ∈ cl(Σ) for each |σ’|<k.

Induction step: Find(Σ,σ) = TRUE iff either:

1. Step 2 returns TRUE ⇔ σ is trivial or σ ∈ Σ’, in which case σ ∈ cl(Σ).

2. Step 4 returns TRUE

iff Find(Σ’,σ1) = TRUE ∧ Find(Σ’,σ2) = TRUE (according to the algorithm’s definition)

iff σ1 ∈ cl(Σ’) ∧ σ2 ∈ cl(Σ’) (according to the induction assumption)

iff σ ∈ cl(Σ’) (according to the main claim)

iff σ ∈ cl(Σ) (according to the projection lemma)

Complexity Analysis

Definitions:

n = the number of distinct variables in Σ ∪ {σ}.

k = the number of distinct variables in {σ}.

  • First projection cost: O(|Σ|·n) – happens only once.
  • Recursive step: T(k) ≤ |Σ|·k + T(k1) + T(k2), where k1+k2=k, k1=|σ1|, k2=|σ2|.
  • Can be shown by induction: T(k) ≤ |Σ|·k·(depth of recursion).
  • Worst case analysis: T(k) ≤ |Σ|·k·k = |Σ|·k².
  • Total run time is bounded by O(|Σ|·n + |Σ|·k²), which is also O(|Σ|·n²) since k ≤ n.
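The induction behind the bound can be sketched as follows. The symbols are introduced here for illustration only: d is the recursion depth of a call on a statement of size k, and d_1, d_2 ≤ d − 1 are the depths of its two sub-calls. A call that returns in step 2 or step 3 has d = 1 and costs at most |Σ|·k.

```latex
\begin{align*}
T(k) &\le |\Sigma| \cdot k + T(k_1) + T(k_2) \\
     &\le |\Sigma| \cdot k + |\Sigma| \cdot k_1 \cdot d_1 + |\Sigma| \cdot k_2 \cdot d_2 \\
     &\le |\Sigma| \cdot k + |\Sigma| \cdot (k_1 + k_2) \cdot (d - 1) \\
     &= |\Sigma| \cdot k \cdot d
\end{align*}
```

Since σ’ is non-trivial, k_1, k_2 ≥ 1, and together with k_1 + k_2 = k each sub-call works on a strictly smaller statement. Hence d ≤ k, which gives the worst-case bound T(k) ≤ |Σ|·k².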
Improvements and Variations
  • Instead of arbitrarily choosing σ’, find one whose sub-statements {A,B,P,Q} have balanced size (can improve run-time complexity).
  • Using the derivation chain presented in the constructive proof, the algorithm can also return a derivation chain for σ with a length of O(k).
Variations (contd.)

The algorithm can be expanded into a polynomial algorithm for the following problems:

  • Given two sets Σ1 and Σ2: is cl(Σ1) ⊆ cl(Σ2)? Is cl(Σ1) = cl(Σ2)?
  • Minimize the size of Σ while preserving cl(Σ): start with a maximal-size statement and remove from Σ all statements derivable from it. Repeat with the next largest statement, etc.