Ockham s razor what it is what it isn t how it works and how it doesn t
This presentation is the property of its rightful owner.
Sponsored Links
1 / 203

Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t PowerPoint PPT Presentation


  • 82 Views
  • Uploaded on
  • Presentation posted in: General

Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t. Kevin T. Kelly Department of Philosophy Carnegie Mellon University www.cmu.edu. Further Reading.

Download Presentation

Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Ockham s razor what it is what it isn t how it works and how it doesn t

Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t

Kevin T. Kelly

Department of Philosophy

Carnegie Mellon University

www.cmu.edu


Ockham s razor what it is what it isn t how it works and how it doesn t

Further Reading

“Efficient Convergence Implies Ockham's Razor”,Proceedings of the 2002 International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA, June 24-27, 2002.

(with C. Glymour) “Why Probability Does Not Capture the Logic of Scientific Justification”, C. Hitchcock, ed., Contemporary Debates in the Philosophy of Science, Oxford: Blackwell, 2004.

“Justification as Truth-finding Efficiency: How Ockham's Razor Works”,Minds and Machines 14: 2004, pp. 485-505.

“Learning, Simplicity, Truth, and Misinformation”,The Philosophy of Information, under review.

“Ockham's Razor, Efficiency, and the Infinite Game of

Science”, proceedings, Foundations of the Formal Sciences 2004: Infinite Game Theory, Springer, under review.


Which theory to choose

T1

T2

T3

T4

T5

Which Theory to Choose?

Compatible with data

???


Use ockham s razor

T4

T5

Use Ockham’s Razor

Complex

T1

T2

T3

Simple


Dilemma

T4

T5

Dilemma

If you know the truth is simple,

then you don’t need Ockham.

Complex

T1

T2

T3

Simple


Dilemma1

Dilemma

If you don’t know the truth is simple,

then how coulda fixedsimplicity bias help you if the truth is complex?

Complex

T1

T2

T3

Simple

T4

T5


Puzzle

Puzzle

A fixed bias is like a broken thermometer.

How could it possibly help you find unknown truth?

Cold!


I ockham apologists

I. Ockham Apologists


Wishful thinking

Wishful Thinking

  • Simple theories are nice if true:

    • Testability

    • Unity

    • Best explanation

    • Aesthetic appeal

    • Compress data

  • So is believing that you are the emperor.


Overfitting

Overfitting

  • Maximum likelihood estimates based on overly complex theories can have greater predictive error (AIC, Cross-validation, etc.).

    • Same is true even if you know the true model is complex.

    • Doesn’t converge to true model.

    • Depends on random data.

Thanks, but a simpler model still has lower predictive error.

The truth is complex.

-God-

.

.

.

.


Ignorance knowledge

Ignorance = Knowledge

Messy worlds are legion

Tidy worlds are few.

That is why the tidy worlds

Are those most likely true. (Carnap)

unity


Ignorance knowledge1

Ignorance = Knowledge

Messy worlds are legion

Tidy worlds are few.

That is why the tidy worlds

Are those most likely true. (Carnap)

1/3

1/3

1/3


Ignorance knowledge2

Ignorance = Knowledge

Messy worlds are legion

Tidy worlds are few.

That is why the tidy worlds

Are those most likely true. (Carnap)

2/6

2/6

1/6

1/6


Depends on notation

Depends on Notation

But mess depends on coding,

which Goodman noticed, too.

The picture is inverted if

we translate green to grue.

2/6

2/6

Notation

Indicates truth?

1/6

1/6


Same for algorithmic complexity

Same for Algorithmic Complexity

  • Goodman’s problem works against every fixed simplicity ranking (independent of the processes by which data are generated and coded prior to learning).

  • Extra problem: any pair-wise ranking of theories can be reversed by choosing an alternative computer language.

  • So how could simplicity help us find the true theory?

Notation

Indicates truth?


Just beg the question

Just Beg the Question

  • Assign high prior probability to simple theories.

    • Why should you?

    • Preference for complexity has the same “explanation”.

You presume simplicity

Therefore you should presume simplicity!


Miracle argument

Miracle Argument

  • Simple data would be a miracle if a complex theory were true (Bayes, BIC, Putnam).


Begs the question

Begs the Question

  • “Fairness” between theories 

    bias against complex worlds.

S

C


Two can play that game

Two Can Play That Game

  • “Fairness” between worlds 

    bias against simple theory.

S

C


Convergence

Convergence

  • At least a simplicity bias doesn’t preventconvergence to the truth (MDL, BIC, Bayes, SGS, etc.).

    • Neither do other biases.

    • May as well recommend flat tires since they can be fixed.

O

P

O

P

L

L

M

M

E

E

O

I

X

C

S


Does ockham have no frock

Does Ockham Have No Frock?

Ash Heap of

History

Philosopher’s stone,

Perpetual motion,

Free lunch

Ockham’s Razor???

. . .


Ii how ockham helps you find the truth

II. How Ockham Helps You Find the Truth


What is guidance

What is Guidance?

  • Indication or tracking

    • Too strong

    • Fixed bias can’t indicate anything

  • Convergence

    • Too weak

    • True of other biases

  • “Straightest” convergence

    • Just right?

S

C

S

C

S

C


A true story

A True Story

Niagara Falls

Clarion

Pittsburgh


A true story1

A True Story

Niagara Falls

Clarion

Pittsburgh


A true story2

A True Story

Niagara Falls

Clarion

Pittsburgh


A true story3

A True Story

Niagara Falls

!

Clarion

Pittsburgh


A true story4

A True Story

Niagara Falls

Clarion

Pittsburgh


A true story5

A True Story

?


A true story6

A True Story


A true story7

A True Story

Ask directions!


A true story8

A True Story

Where’s …


What does she say

What Does She Say?

Turn around. The freeway ramp is on the left.


You have better ideas

You Have Better Ideas

Phooey!

The Sun was on the right!


You have better ideas1

You Have Better Ideas

!!


You have better ideas2

You Have Better Ideas


You have better ideas3

You Have Better Ideas


You have better ideas4

You Have Better Ideas


Stay the course

Stay the Course!

Ahem…


Stay the course1

Stay the Course!


Stay the course2

Stay the Course!


Don t flip flop

Don’t Flip-flop!


Don t flip flop1

Don’t Flip-flop!


Don t flip flop2

Don’t Flip-flop!


Then again

Then Again…


Then again1

Then Again…


Then again2

Then Again…

@@$##!


One good flip can save a lot of flop

Pittsburgh

One Good Flip Can Save a Lot of Flop


The u turn

Pittsburgh

The U-Turn


The u turn1

Pittsburgh

The U-Turn


The u turn2

Pittsburgh

The U-Turn


The u turn3

Pittsburgh

The U-Turn


The u turn4

Pittsburgh

The U-Turn


The u turn5

Pittsburgh

The U-Turn


The u turn6

Pittsburgh

The U-Turn

Told ya!


The u turn7

Pittsburgh

The U-Turn


The u turn8

Pittsburgh

The U-Turn


The u turn9

Pittsburgh

The U-Turn


The u turn10

Pittsburgh

The U-Turn


The u turn11

Pittsburgh

The U-Turn


Your route

Pittsburgh

Your Route

Needless U-turn


The best route

Pittsburgh

The Best Route

Told ya!


The best route anywhere from there

NY, DC

Pittsburgh

The Best Route Anywhere from There

Told ya!


The freeway to the truth

Complex

Simple

The Freeway to the Truth

  • Fixed advice for all destinations

  • Disregarding it entails an extra course reversal…

Told ya!


The freeway to the truth1

Complex

Simple

The Freeway to the Truth

  • …even if the advice points away from the goal!

Told ya!


Counting marbles

Counting Marbles


Counting marbles1

Counting Marbles


Counting marbles2

Counting Marbles

May come at any time…


Counting marbles3

Counting Marbles

May come at any time…


Counting marbles4

Counting Marbles

May come at any time…


Counting marbles5

Counting Marbles

May come at any time…


Counting marbles6

Counting Marbles

May come at any time…


Counting marbles7

Counting Marbles

May come at any time…


Counting marbles8

Counting Marbles

May come at any time…


Ockham s razor

Ockham’s Razor

  • If you answer, answer with the current count.

3

?


Analogy

Analogy

  • Marbles = detectable “effects”.

  • Late appearance = difficulty of detection.

  • Count = model (e.g., causal graph).

  • Appearance times = free parameters.


Analogy1

Analogy

  • U-turn= model revision (with content loss)

  • Highway= revision-efficient truth-finding method.

T

T


The u turn argument

The U-turn Argument

  • Suppose you converge to the truth but

  • violate Ockham’s razor along the way.

3


The u turn argument1

The U-turn Argument

  • Where is that extra marble, anyway?

3


The u turn argument2

The U-turn Argument

  • It’s not coming, is it?

3


The u turn argument3

The U-turn Argument

  • If you never say 2 you’ll never converge to the truth….

3


The u turn argument4

The U-turn Argument

  • That’s it. You should have listened to Ockham.

3

2

2

2


The u turn argument5

The U-turn Argument

  • Oops! Well, no method is infallible!

3

2

2

2


The u turn argument6

The U-turn Argument

  • If you never say 3, you’ll never converge to the truth….

3

2

2

2


The u turn argument7

The U-turn Argument

  • Embarrassing to be back at that old theory, eh?

3

2

2

2

3


The u turn argument8

The U-turn Argument

  • And so forth…

3

2

2

2

3

4


The u turn argument9

The U-turn Argument

  • And so forth…

3

2

2

2

3

4

5


The u turn argument10

The U-turn Argument

  • And so forth…

3

2

2

2

3

4

5

6


The u turn argument11

The U-turn Argument

  • And so forth…

3

2

2

2

3

4

5

6

7


The score

The Score

  • You:

Subproblem

3

2

2

2

3

4

5

6

7


The score1

The Score

  • Ockham:

Subproblem

2

2

2

3

4

5

6

7


Ockham is necessary

Ockham is Necessary

  • If you converge to the truth,

    and

  • you violate Ockham’s razor

    then

  • some convergent method beats your worst-case revision bound in each answer in the subproblem entered at the time of the violation.


Ockham is sufficient

Ockham is Sufficient

  • If you converge to the truth,

    and

  • you never violate Ockham’s razor

    then

  • You achieve the worst-case revision bound of each convergent solution in each answer in each subproblem.


Efficiency

Efficiency

  • Efficiency = achievement of the best worst-case revision bound in each answer in each subproblem.


Ockham efficiency theorem

Ockham Efficiency Theorem

  • Among the convergent methods…

    Ockham = Efficient!

Efficient

Inefficient


Mixed strategies

“Mixed” Strategies

  • mixed strategy = chance of output depends only on actual experience.

  • convergence in probability = chance of producing true answer approaches 1 in the limit.

  • efficiency = achievement of best worst-case expected revision bound in each answer in each subproblem.


Ockham efficiency theorem1

Ockham Efficiency Theorem

  • Among the mixed methods that converge in probability…

    Ockham = Efficient!

Efficient

Inefficient


Dominance and support

Dominance and “Support”

  • Every convergent method is weakly dominated in revisions by a clone who says “?” until stage n.

    • Convergence Must leap eventually.

    • Efficiency Only leap to simplest.

    • Dominance Could always wait longer.

Can’t wait forever!


Iii ockham on steroids

III. Ockham on Steroids


Ockham wish list

Ockham Wish List

  • General definition of Ockham’s razor.

  • Compare revisions even when not bounded within answers.

  • Prove theorem for arbitrary empirical problems.


Empirical problems

Empirical Problems

  • Problem = partition of a topological space.

  • Potential answers = partition cells.

  • Evidence = open (verifiable) propositions.

Example: Symmetry


Example parameter freeing

Example: Parameter Freeing

  • Euclidean topology.

  • Say which parameters are zero.

  • Evidence = open neighborhood.

  • Curve fitting

a1

a1 = 0

a2 = 0

a1 > 0

a2 = 0

a1 = 0

a2 > 0

a1 > 0

a2 > 0

a2

0


The players

The Players

  • Scientist:

    • Produces an answer in response to current evidence.

  • Demon:

    • Chooses evidence in response to scientist’s choices


Winning

Winning

  • Scientist wins…

    • by default if demon doesn’t present an infinite nested sequence of basic open sets whose intersection is a singleton.

    • else by merit if scientist eventually always produces the true answer for world selected by demon’s choices.


Comparing revisions

Comparing Revisions

  • One answer sequence maps into another iff

    • there is an order and answer-preserving map from the first to the second (? is wild).

  • Then the revisions of first are as good as those of the second.

. . .

?

?

?

?

?

. . .


Comparing revisions1

Comparing Revisions

  • The revisions of the first are strictly better if, in addition, the latter doesn’t map back into the former.

. . .

?

?

?

?

?

. . .

?


Comparing methods

Comparing Methods

  • F is as good as G iff

    each output sequence of F is as good as some output sequence of G.

F

as good as

G


Comparing methods1

Comparing Methods

  • F is better thanG iff

    F is as good as G and

    G is not as good as F

F

not as good as

G


Comparing methods2

Comparing Methods

  • F is strongly better thanG iff each output sequence of F is strictly better than an output sequence of G but …

strictly better than


Comparing methods3

Comparing Methods

  • … no output sequence of G is as good as any of F.

not as good as


Terminology

Terminology

  • Efficient solution: as good as any solution in any subproblem.


What simplicity isn t

Symptoms…

What Simplicity Isn’t

Only by

accident!!

  • Syntactic length.

  • Data-compression (MDL).

  • Computational ease.

  • Social “entrenchment” (Goodman).

  • Number of free parameters (BIC, AIC).

  • Euclidean dimensionality


What simplicity is

What Simplicity Is

  • Simpler theories are compatible with deeper problems of induction.

Worst demon

Smaller demon


Problem of induction

Problem of Induction

  • No true information entails the true answer.

  • Happens in answer boundaries.


Demonic paths

Demonic Paths

A demonic path from w is a sequence of alternating answers that a demon can force an arbitrary convergent method through starting from w.

0…1…2…3…4


Simplicity defined

Simplicity Defined

The A-sequences are the demonic sequences beginning with answer A.

A is as simple as B iff each B-sequence is as good as some A-sequence.

3

3, 4

3, 4, 5

<

<

<

2, 3

2, 3, 4

2, 3, 4, 5

. . .

So 2 is simpler than 3!


Ockham answer

Ockham Answer

  • An answer as simple as any other answer.

  • = number of observed particles.

n

n, n+1

n, n+1, n+2

<

<

<

2, …, n

2, …, n, n+1

2, …, n, n+1, n+2

. . .

So 2 is Ockham!


Ockham lemma

Ockham Lemma

A is Ockham iff

for all demonic p, (A*p) ≤ some demonic sequence.

I can force you through 2

but not

through 3,2.

So 3 isn’t Ockham

3


Ockham answer1

Ockham Answer

E.g.: Only simplest curve compatible with data is Ockham.

a1

Demonic sequence:

Non-demonic sequences:

a2

0


General efficiency theorem

General Efficiency Theorem

If the topology is metrizable and separable and the question is countable then:

Ockham = Efficient.

Proof: uses Martin’s Borel Determinacy theorem.


Stacked problems

. . .

3

2

1

0

Stacked Problems

There is an Ockham answer at every stage.

1


Non ockham strongly worse

Non-Ockham  Strongly Worse

If the problem is a stacked countable partition over a restricted Polish space:

Each Ockham solution is strongly better than each non-Ockham solution in the subproblem entered at the time of the violation.


Simplicity low dimension

Simplicity  Low Dimension

  • Suppose God says the true parameter value is rational.


Simplicity low dimension1

Simplicity  Low Dimension

  • Topological dimension and integration theory dissolve.

  • Does Ockham?


Simplicity low dimension2

Simplicity  Low Dimension

  • The proposed account survives in the preserved limit point structure.


Iv ockham and symmetry

IV. Ockham and Symmetry


Respect for symmetry

?

Respect for Symmetry

  • If several simplest alternatives are available, don’t break the symmetry.

  • Count the marbles of each color.

  • You hear the first marble but don’t see it.

  • Why red rather than green?


Respect for symmetry1

?

Respect for Symmetry

  • Before the noise, (0, 0) is Ockham.

  • After the noise, no answer is Ockham:

Demonic

Non-demonic

(0, 0)

(1, 0)

(1, 0) (0, 1)

(0, 1)

(0, 1) (1, 0)

Right!


Goodman s riddle

Goodman’s Riddle

  • Count oneicles--- a oneicle is a particle at any stage but one, when it is a non-particle.

  • Oneicle tranlation is auto-homeomorphism that does not preserve the problem.

  • Unique Ockham answer is current oneicle count.

  • Contradicts unique Ockham answer in particle counting.


Supersymmetry

Supersymmetry

  • Say when each particle appears.

  • Refines counting problem.

  • Every auto-homeomorphism preserves problem.

  • No answer is Ockham.

  • No solution is Ockham.

  • No method is efficient.


Dual supersymmetry

Dual Supersymmetry

  • Say only whether particle count is even or odd.

  • Coarsens counting problem.

  • Particle/Oneicle auto-homeomorphism preserves problem.

  • Every answer is Ockham.

  • Every solution is Ockham.

  • Every solution is efficient.


Broken symmetry

Broken Symmetry

  • Count the even or just reportodd.

  • Coarsens counting problem.

  • Refines the even/odd problem.

  • Unique Ockham answer at each stage.

  • Exactly Ockham solutions are efficient.


Simplicity under refinement

Simplicity Under Refinement

Supersymmetry

No answer is Ockham

Time of particle appearance

Particle counting

Oneicle counting

Twoicle counting

Broken symmetry

Unique Ockham answer

Particle counting

or odd particles

Oneicle counting

or odd oneicles

Twoicle counting

or odd twoicles

Dual supersymmetry

Both answers are Ockham

Even/odd


Proposed theory is right

Proposed Theory is Right

  • Objective efficiency is grounded in problems.

  • Symmetries in the preceding problems would wash out stronger simplicity distinctions.

  • Hence, such distinctions would amount to mere conventions (like coordinate axes) that couldn’t have anything to do with objective efficiency.


Furthermore

Furthermore…

  • If Ockham’s razor is forced to choose in the supersymmetrical problems then either:

    • following Ockham’s razor increases revisions in some counting problems

      Or

    • Ockham’s razor leads to contradictions as a problem is coarsened or refined.


V conclusion

V. Conclusion


What ockham s razor is

What Ockham’s Razor Is

  • “Only output Ockham answers”

  • Ockham answer = a topological invariant of the empirical problem addressed.


What it isn t

What it Isn’t

  • preference for

    • brevity,

    • computational ease,

    • entrenchment,

    • past success,

    • Kolmogorov complexity,

    • dimensionality, etc….


How it works

How it Works

  • Ockham’s razor is necessary for mininizing revisions prior to convergence to the truth.


How it doesn t

How it Doesn’t

  • No possible method could:

    • Point at the truth;

    • Indicate the truth;

    • Bound the probability of error;

    • Bound the number of future revisions.


Spooky ockham

Spooky Ockham

  • Science without support or safety nets.


Spooky ockham1

Spooky Ockham

  • Science without support or safety nets.


Spooky ockham2

Spooky Ockham

  • Science without support or safety nets.


Spooky ockham3

Spooky Ockham

  • Science without support or safety nets.


Vi stochastic ockham

VI. Stochastic Ockham


Mixed strategies1

“Mixed” Strategies

  • mixed strategy = chance of output depends only on actual experience.

e

Pe(M = H at n) = Pe|n(M = H at n).


Stochastic case

Stochastic Case

  • Ockham =

    at each stage, you produce a non-Ockham answer with prob = 0.

  • Efficiency =

    achievement of the best worst-case expected revision bound in each answer in each subproblem over all methods that converge to the truth in probability.


Stochastic efficiency theorem

Stochastic Efficiency Theorem

  • Among the stochastic methods that converge in probability, Ockham = Efficient!

Efficient

Inefficient


Stochastic methods

Stochastic Methods

  • Your chance of producing an answer is a function of observations made so far.

2

p

Urn selected in light of

observations.


Stochastic u turn argument

Stochastic U-turn Argument

  • Suppose you converge in probability to the truth but produce a non-Ockham answer with prob > 0.

3

r > 0


Stochastic u turn argument1

Stochastic U-turn Argument

  • Choose small e > 0. Consider answer 4.

3

r > 0


Stochastic u turn argument2

Stochastic U-turn Argument

  • By convergence in probability to the truth:

3

r > 0

2

p > 1 - e/3


Stochastic u turn argument3

Stochastic U-turn Argument

  • Etc.

3

r > 0

2

3

4

p> 1-e/3

p > 1-e/3

p > 1-e/3


Stochastic u turn argument4

Stochastic U-turn Argument

  • Since e can be chosen arbitrarily small,

    • sup prob of ≥ 3 revisions ≥ r.

    • sup prob of ≥ 2 revisions =1

3

r > 0

2

3

4

p> 1-e/3

p > 1-e/3

p > 1-e/3


Stochastic u turn argument5

Stochastic U-turn Argument

  • So sup Exp revisions is ≥ 2 + 3r.

  • But for Ockham = 2.

3

r > 0

2

3

4

p> 1-e/3

p > 1-e/3

p > 1-e/3

Subproblem


Vii statistical inference

VII. Statistical Inference

(Beta Version)


The statistical puzzle of simplicity

The Statistical Puzzle of Simplicity

  • Assume: Normal distribution, s =1, m≥ 0.

  • Question:m= 0 or m> 0 ?

  • Intuition: m= 0 is simpler than m> 0 .

m = 0

mean


Analogy2

Analogy

  • Marbles: potentially small “effects”

  • Time: sample size

  • Simplicity: fewer free parameters tied to potential “effects”

  • Counting: freeing parameters in a model


U turn in probability

U-turn “in Probability”

  • Convergence in probability: chance of producing true model goes to unity

  • Retraction in probability: chance of producing a model drops from above r > .5 to below 1 – r.

1

r

Chance of producing

true model

Chance of producing

alternative model

1 - r

0

Sample size


Suppose you probably choose a model more complex than the truth

Suppose You (Probably) Choose a Model More Complex than the Truth

m = 0

m > 0

mean

Revision Counter: 0

>r

zone for choosing m > 0

sample mean


Eventually you retract to the truth in probability

Eventually You Retract to the Truth (In Probability)

m = 0

mean

RevisionCounter: 1

>r

zone for choosing m = 0

sample mean


So you probably output an overly simple model nearby

So You (Probably) Output an Overly Simple Model Nearby

m > 0

mean

RevisionCounter: 1

>r

zone for choosing m = 0

sample mean


Ockham s razor what it is what it isn t how it works and how it doesn t

Eventually You Retract to the Truth (In Probability)

m > 0

mean

RevisionCounter: 2

>r

zone for choosing m > 0

sample mean


Ockham s razor what it is what it isn t how it works and how it doesn t

But Standard (Ockham) Testing Practice Requires Just One Retraction!

m = 0

mean

RevisionCounter: 0

>r

zone for choosing m = 0

sample mean


Ockham s razor what it is what it isn t how it works and how it doesn t

In The Simplest World, No Retractions

m = 0

mean

RevisionCounter: 0

>r

zone for choosing m = 0

sample mean


Ockham s razor what it is what it isn t how it works and how it doesn t

In The Simplest World, No Retractions

m = 0

mean

RevisionCounter: 0

>r

zone for choosing m = 0

sample mean


Ockham s razor what it is what it isn t how it works and how it doesn t

In Remaining Worlds, at Most One Retraction

m > 0

mean

RevisionCounter: 0

>r

zone for choosing m > 0


Ockham s razor what it is what it isn t how it works and how it doesn t

In Remaining Worlds, at Most One Retraction

m > 0

mean

RevisionCounter: 0

zone for choosing m > 0


Ockham s razor what it is what it isn t how it works and how it doesn t

In Remaining Worlds, at Most One Retraction

m > 0

mean

RevisionCounter: 1

>r

zone for choosing m > 0


Ockham s razor what it is what it isn t how it works and how it doesn t

So Ockham Beats All Violators

  • Ockham: at most one revision.

  • Violator: at least two revisions in worst case


Ockham s razor what it is what it isn t how it works and how it doesn t

Summary

  • Standard practice is to “test” the point hypothesis rather than the composite alternative.

  • This amounts to favoring the “simple” hypothesis a priori.

  • It also minimizes revisions in probability!


Two dimensional example

Two Dimensional Example

  • Assume: independent bivariate normal distribution of unit variance.

  • Question:how many components of the joint mean are zero?

  • Intuition: more nonzeros = more complex

  • Puzzle: How does it help to favor simplicity in less-than-simplest worlds?


A real model selection method

A Real Model Selection Method

  • Bayes Information Criterion (BIC)

  • BIC(M, sample) =

    • - log(max prob that M can assign to sample) +

    • + log(sample size)  model complexity  ½.

  • BIC method: choose M with least BIC score.


Official bic property

Official BIC Property

  • In the limit, minimizing BIC finds a model with maximal conditional probability when the prior probability is flat over models and fairly flat over parameters within a model.

  • But it is also revision-efficient.


Aic in simplest world

AIC in Simplest World

  • n = 2

  • m = (0, 0).

  • Retractions = 0

Simple

Complex


Aic in simplest world1

AIC in Simplest World

  • n = 100

  • m = (0, 0).

  • Retractions = 0

Simple

Complex


Aic in simplest world2

AIC in Simplest World

  • n = 4,000,000

  • m = (0, 0).

  • Retractions = 0

Simple

Complex


Bic in simplest world

BIC in Simplest World

  • n = 2

  • m = (0, 0).

  • Retractions = 0

Simple

Complex


Bic in simplest world1

BIC in Simplest World

  • n = 100

  • m = (0, 0).

  • Retractions = 0

Simple

Complex


Bic in simplest world2

BIC in Simplest World

  • n = 4,000,000

  • m = (0, 0).

  • Retractions = 0

Simple

Complex


Bic in simplest world3

BIC in Simplest World

  • n = 20,000,000

  • m = (0, 0).

  • Retractions = 0

Simple

Complex


Performance in complex world

Performance in Complex World

  • n = 2

  • m = (.05, .005).

  • Retractions = 0

Simple

Complex

95%


Performance in complex world1

Performance in Complex World

  • n = 100

  • m = (.05, .005).

  • Retractions = 0

Simple

Complex


Performance in complex world2

Performance in Complex World

  • n = 30,000

  • m = (.05, .005).

  • Retractions = 1

Simple

Complex


Performance in complex world3

Performance in Complex World

  • n = 4,000,000 (!)

  • m = (.05, .005).

  • Retractions = 2

Simple

Complex


Ockham s razor what it is what it isn t how it works and how it doesn t

Question

  • Does the statistical retraction minimization story extend to violations in less-than-simplest worlds?

  • Recall that the deterministic argument for higher retractions required the concept of minimizing retractions in each “subproblem”.

  • A “subproblem” is a proposition verified at a given time in a given world.

  • Some analogue “in probability” is required.


Subproblem

Subproblem.

  • H is an a -subroblem in w at n:

    • There is a likelihood ratio test of {w} at significance < a such that this test has power < 1 - a at each world in H.

worlds

H

w

sample size = n

>1- a

> a

> a

reject

reject

accept


Significance schedules

Significance Schedules

  • A significance schedulea(.) is a monotone decreasing sequence of significance levels converging to zero that drop so slowly that power can be increased monotonically with sample size.

n+1

n

a(n+1)

a(n)


Ockham violation inefficient

Ockham Violation  Inefficient

Subproblem

At sample size n

(mX, mY)


Ockham violation inefficient1

Ockham Violation  Inefficient

Subproblem

At sample size n

(mX, mY)

Ockham violation:

Probably say blue hypothesis

at white world

(p > r)


Ockham violation inefficient2

Ockham Violation  Inefficient

Subproblem

at time of violation

(mX, mY)

Probably say blue

Probably say white


Ockham violation inefficient3

Ockham Violation  Inefficient

Subproblem

at time of violation

(mX, mY)

Probably say blue

Probably say white


Ockham violation inefficient4

Ockham Violation  Inefficient

Subproblem

at time of violation

(mX, mY)

Probably say blue

Probably say white

Probably say blue


Oops ockham inefficient

Oops! Ockham  Inefficient

(mX, mY)

Subproblem


Oops ockham inefficient1

Oops! Ockham  Inefficient

(mX, mY)

Subproblem


Oops ockham inefficient2

Oops! Ockham  Inefficient

(mX, mY)

Subproblem


Oops ockham inefficient3

Oops! Ockham  Inefficient

(mX, mY)

Subproblem


Oops ockham inefficient4

Oops! Ockham  Inefficient

(mX, mY)

Subproblem

Two retractions


Local retraction efficiency

Local Retraction Efficiency

  • Ockham does as well as best subproblem performance in some neighborhood of w.

(mX, mY)

Subproblem

At most one retraction

Two retractions


Ockham violation inefficient5

Ockham Violation  Inefficient

  • Note: no neighborhood around w avoids extra retractions.

Subproblem

at time of violation

(mX, mY)

w


Gonzo ockham inefficient

Gonzo Ockham  Inefficient

  • Gonzo = probably saying simplest answer in entire subproblem entered in simplest world.

(mX, mY)

Subproblem


Balance

Balance

  • Be Ockham (avoid complexity)

  • Don’t be Gonzo Ockham (avoid bad fit).

  • Truth-directed: sole aim is to find true model with minimal revisions!

  • No circles: totally worst-case; no prior bias toward simple worlds.


The end

THE END


  • Login