Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t. Kevin T. Kelly Department of Philosophy Carnegie Mellon University www.cmu.edu. Further Reading.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t
Kevin T. Kelly
Department of Philosophy
Carnegie Mellon University
www.cmu.edu
Further Reading
“Efficient Convergence Implies Ockham's Razor”,Proceedings of the 2002 International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA, June 24-27, 2002.
(with C. Glymour) “Why Probability Does Not Capture the Logic of Scientific Justification”, C. Hitchcock, ed., Contemporary Debates in the Philosophy of Science, Oxford: Blackwell, 2004.
“Justification as Truth-finding Efficiency: How Ockham's Razor Works”,Minds and Machines 14: 2004, pp. 485-505.
“Learning, Simplicity, Truth, and Misinformation”,The Philosophy of Information, under review.
“Ockham's Razor, Efficiency, and the Infinite Game of
Science”, proceedings, Foundations of the Formal Sciences 2004: Infinite Game Theory, Springer, under review.
T1
T2
T3
T4
T5
Compatible with data
???
T4
T5
Complex
T1
T2
T3
Simple
T4
T5
If you know the truth is simple,
then you don’t need Ockham.
Complex
T1
T2
T3
Simple
If you don’t know the truth is simple,
then how coulda fixedsimplicity bias help you if the truth is complex?
Complex
T1
T2
T3
Simple
T4
T5
A fixed bias is like a broken thermometer.
How could it possibly help you find unknown truth?
Cold!
I. Ockham Apologists
Thanks, but a simpler model still has lower predictive error.
The truth is complex.
-God-
.
.
.
.
Messy worlds are legion
Tidy worlds are few.
That is why the tidy worlds
Are those most likely true. (Carnap)
unity
Messy worlds are legion
Tidy worlds are few.
That is why the tidy worlds
Are those most likely true. (Carnap)
1/3
1/3
1/3
Messy worlds are legion
Tidy worlds are few.
That is why the tidy worlds
Are those most likely true. (Carnap)
2/6
2/6
1/6
1/6
But mess depends on coding,
which Goodman noticed, too.
The picture is inverted if
we translate green to grue.
2/6
2/6
Notation
Indicates truth?
1/6
1/6
Notation
Indicates truth?
You presume simplicity
Therefore you should presume simplicity!
bias against complex worlds.
S
C
bias against simple theory.
S
C
O
P
O
P
L
L
M
M
E
E
O
I
X
C
S
Ash Heap of
History
Philosopher’s stone,
Perpetual motion,
Free lunch
Ockham’s Razor???
. . .
II. How Ockham Helps You Find the Truth
S
C
S
C
S
C
Niagara Falls
Clarion
Pittsburgh
Niagara Falls
Clarion
Pittsburgh
Niagara Falls
Clarion
Pittsburgh
Niagara Falls
!
Clarion
Pittsburgh
Niagara Falls
Clarion
Pittsburgh
?
Ask directions!
Where’s …
Turn around. The freeway ramp is on the left.
Phooey!
The Sun was on the right!
!!
Ahem…
@@$##!
Pittsburgh
Pittsburgh
Pittsburgh
Pittsburgh
Pittsburgh
Pittsburgh
Pittsburgh
Pittsburgh
Told ya!
Pittsburgh
Pittsburgh
Pittsburgh
Pittsburgh
Pittsburgh
Pittsburgh
Needless U-turn
Pittsburgh
Told ya!
NY, DC
Pittsburgh
Told ya!
Complex
Simple
Told ya!
Complex
Simple
Told ya!
May come at any time…
May come at any time…
May come at any time…
May come at any time…
May come at any time…
May come at any time…
May come at any time…
3
?
T
T
3
3
3
3
3
2
2
2
3
2
2
2
3
2
2
2
3
2
2
2
3
3
2
2
2
3
4
3
2
2
2
3
4
5
3
2
2
2
3
4
5
6
3
2
2
2
3
4
5
6
7
Subproblem
3
2
2
2
3
4
5
6
7
Subproblem
2
2
2
3
4
5
6
7
and
then
and
then
Ockham = Efficient!
Efficient
Inefficient
Ockham = Efficient!
Efficient
Inefficient
Can’t wait forever!
III. Ockham on Steroids
Example: Symmetry
a1
a1 = 0
a2 = 0
a1 > 0
a2 = 0
a1 = 0
a2 > 0
a1 > 0
a2 > 0
a2
0
. . .
?
?
?
?
?
. . .
. . .
?
?
?
?
?
. . .
?
each output sequence of F is as good as some output sequence of G.
F
as good as
G
F is as good as G and
G is not as good as F
F
not as good as
G
strictly better than
not as good as
Symptoms…
Only by
accident!!
Worst demon
Smaller demon
A demonic path from w is a sequence of alternating answers that a demon can force an arbitrary convergent method through starting from w.
0…1…2…3…4
The A-sequences are the demonic sequences beginning with answer A.
A is as simple as B iff each B-sequence is as good as some A-sequence.
3
3, 4
3, 4, 5
<
<
<
2, 3
2, 3, 4
2, 3, 4, 5
. . .
So 2 is simpler than 3!
n
n, n+1
n, n+1, n+2
<
<
<
2, …, n
2, …, n, n+1
2, …, n, n+1, n+2
. . .
So 2 is Ockham!
A is Ockham iff
for all demonic p, (A*p) ≤ some demonic sequence.
I can force you through 2
but not
through 3,2.
So 3 isn’t Ockham
3
E.g.: Only simplest curve compatible with data is Ockham.
a1
Demonic sequence:
Non-demonic sequences:
a2
0
If the topology is metrizable and separable and the question is countable then:
Ockham = Efficient.
Proof: uses Martin’s Borel Determinacy theorem.
. . .
3
2
1
0
There is an Ockham answer at every stage.
1
If the problem is a stacked countable partition over a restricted Polish space:
Each Ockham solution is strongly better than each non-Ockham solution in the subproblem entered at the time of the violation.
IV. Ockham and Symmetry
?
?
Demonic
Non-demonic
(0, 0)
(1, 0)
(1, 0) (0, 1)
(0, 1)
(0, 1) (1, 0)
Right!
Supersymmetry
No answer is Ockham
Time of particle appearance
Particle counting
Oneicle counting
Twoicle counting
Broken symmetry
Unique Ockham answer
Particle counting
or odd particles
Oneicle counting
or odd oneicles
Twoicle counting
or odd twoicles
Dual supersymmetry
Both answers are Ockham
Even/odd
Or
V. Conclusion
VI. Stochastic Ockham
e
Pe(M = H at n) = Pe|n(M = H at n).
at each stage, you produce a non-Ockham answer with prob = 0.
achievement of the best worst-case expected revision bound in each answer in each subproblem over all methods that converge to the truth in probability.
Efficient
Inefficient
2
p
Urn selected in light of
observations.
3
r > 0
3
r > 0
3
r > 0
2
p > 1 - e/3
3
r > 0
2
3
4
p> 1-e/3
p > 1-e/3
p > 1-e/3
3
r > 0
2
3
4
p> 1-e/3
p > 1-e/3
p > 1-e/3
3
r > 0
2
3
4
p> 1-e/3
p > 1-e/3
p > 1-e/3
Subproblem
VII. Statistical Inference
(Beta Version)
m = 0
mean
1
r
Chance of producing
true model
Chance of producing
alternative model
1 - r
0
Sample size
m = 0
m > 0
mean
Revision Counter: 0
>r
zone for choosing m > 0
sample mean
m = 0
mean
RevisionCounter: 1
>r
zone for choosing m = 0
sample mean
m > 0
mean
RevisionCounter: 1
>r
zone for choosing m = 0
sample mean
Eventually You Retract to the Truth (In Probability)
m > 0
mean
RevisionCounter: 2
>r
zone for choosing m > 0
sample mean
But Standard (Ockham) Testing Practice Requires Just One Retraction!
m = 0
mean
RevisionCounter: 0
>r
zone for choosing m = 0
sample mean
In The Simplest World, No Retractions
m = 0
mean
RevisionCounter: 0
>r
zone for choosing m = 0
sample mean
In The Simplest World, No Retractions
m = 0
mean
RevisionCounter: 0
>r
zone for choosing m = 0
sample mean
In Remaining Worlds, at Most One Retraction
m > 0
mean
RevisionCounter: 0
>r
zone for choosing m > 0
In Remaining Worlds, at Most One Retraction
m > 0
mean
RevisionCounter: 0
zone for choosing m > 0
In Remaining Worlds, at Most One Retraction
m > 0
mean
RevisionCounter: 1
>r
zone for choosing m > 0
So Ockham Beats All Violators
Summary
Simple
Complex
Simple
Complex
Simple
Complex
Simple
Complex
Simple
Complex
Simple
Complex
Simple
Complex
Simple
Complex
95%
Simple
Complex
Simple
Complex
Simple
Complex
Question
worlds
H
w
sample size = n
>1- a
> a
> a
reject
reject
accept
n+1
n
a(n+1)
a(n)
Subproblem
At sample size n
(mX, mY)
Subproblem
At sample size n
(mX, mY)
Ockham violation:
Probably say blue hypothesis
at white world
(p > r)
Subproblem
at time of violation
(mX, mY)
Probably say blue
Probably say white
Subproblem
at time of violation
(mX, mY)
Probably say blue
Probably say white
Subproblem
at time of violation
(mX, mY)
Probably say blue
Probably say white
Probably say blue
(mX, mY)
Subproblem
(mX, mY)
Subproblem
(mX, mY)
Subproblem
(mX, mY)
Subproblem
(mX, mY)
Subproblem
Two retractions
(mX, mY)
Subproblem
At most one retraction
Two retractions
Subproblem
at time of violation
(mX, mY)
w
(mX, mY)
Subproblem