Bayesian Learning, Regression-based Learning. Overview: Full Bayesian Learning; MAP Learning; Maximum Likelihood Learning; Learning Bayesian Networks (fully observable); Regression and Logistic Regression. Full Bayesian Learning.
P(X | d) = ∑i P(X, hi | d)
= ∑i P(X | hi, d) P(hi | d)
= ∑i P(X | hi) P(hi | d)
∝ ∑i P(X | hi) P(d | hi) P(hi)
The data does not add anything to a prediction given a hypothesis: P(X | hi, d) = P(X | hi)
[Figure: posterior probabilities P(h75 | d), P(h50 | d), P(h25 | d), P(h0 | d) plotted against the number of observations]
Posterior probability of hypotheses: P(hi | d) = α P(d | hi) P(hi)
Prediction: P(next candy is lime | d) = ∑i P(next candy is lime | hi) P(hi | d)
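The full Bayesian prediction above can be sketched in code. This is a minimal sketch assuming the standard five-bag candy setup (cherry fractions 100%, 75%, 50%, 25%, 0% with priors 0.1, 0.2, 0.4, 0.2, 0.1); the function names are mine.

```python
# Full Bayesian prediction for the candy-bag example.
# Each hypothesis h_i: (fraction of cherry candies, prior P(h_i)).
# These values are the standard textbook setup, assumed here for illustration.
hypotheses = {"h100": (1.00, 0.1), "h75": (0.75, 0.2),
              "h50": (0.50, 0.4), "h25": (0.25, 0.2), "h0": (0.00, 0.1)}

def posteriors(data):
    """P(h_i | d) = alpha * P(d | h_i) * P(h_i); data is a list of 'cherry'/'lime'."""
    unnorm = {}
    for name, (cherry_frac, prior) in hypotheses.items():
        lik = 1.0
        for candy in data:
            lik *= cherry_frac if candy == "cherry" else (1.0 - cherry_frac)
        unnorm[name] = lik * prior
    z = sum(unnorm.values())
    return {name: p / z for name, p in unnorm.items()}

def predict_lime(data):
    """P(next = lime | d) = sum_i P(lime | h_i) * P(h_i | d)."""
    post = posteriors(data)
    return sum((1.0 - hypotheses[name][0]) * p for name, p in post.items())

print(predict_lime(["lime"] * 10))  # approaches 1 as more limes are observed
```

With no data the prediction is the prior-weighted average (0.5 here); as all-lime observations accumulate, the posterior concentrates on h0 and the prediction tends to 1.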
MAP approximation: the estimated proportion of cherries in the bag equals the proportion (frequency) of cherries in the data
There is one more example for you to look at in the next few slides
P(dj | hθ,θ1,θ2) = P(W = green | F = cherry, hθ,θ1,θ2) P(F = cherry | hθ,θ1,θ2)
= θ (1 − θ1)
P(d | hθ,θ1,θ2) = ∏j P(dj | hθ,θ1,θ2)
Frequencies again!
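Maximizing the product of likelihoods above yields the observed frequencies as parameter estimates. A minimal sketch (the data and function name are mine, invented for illustration):

```python
# Maximum-likelihood estimates for the candy-with-wrapper example
# reduce to observed frequencies in the data.
def ml_estimates(data):
    """data: list of (flavor, wrapper) pairs, e.g. ('cherry', 'green').
    Returns the ML estimates:
      theta  = P(F = cherry)
      theta1 = P(W = red | F = cherry)
      theta2 = P(W = red | F = lime)
    """
    cherries = [w for f, w in data if f == "cherry"]
    limes = [w for f, w in data if f == "lime"]
    theta = len(cherries) / len(data)
    theta1 = cherries.count("red") / len(cherries)
    theta2 = limes.count("red") / len(limes)
    return theta, theta1, theta2

# Made-up sample: 4 cherries (3 red-wrapped), 6 limes (1 red-wrapped).
data = [("cherry", "red")] * 3 + [("cherry", "green")] + \
       [("lime", "red")] + [("lime", "green")] * 5
print(ml_estimates(data))  # (0.4, 0.75, 0.16666...)
```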
[Figure: Bayesian network structures over class variable C and features X1, X2, ..., Xi]
WHY?
Problem: fitting a linear function to a set of training examples, i.e., input/output pairs with numeric values
hw(x) = w1x + w0
y = w1x + w0
Algebra gives an exact solution to the minimization problem
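For the univariate case hw(x) = w1x + w0, the exact solution has a well-known closed form. A minimal sketch (the function name and example data are mine):

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of h_w(x) = w1*x + w0.
    Solves the minimization exactly via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    w1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    w0 = (sy - w1 * sx) / n
    return w0, w1

# Points lying exactly on y = 2x + 1 are recovered exactly:
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```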
hw(x1, ..., xn) = w0 + w1 × x1 + ... + wn × xn = ∑i=0..n wi × xi,
where x0 is defined to be 1.
Error: E(w) = ∑e∈E (oe − hw(xe))²
f(in) = 1 when in > t
= 0 otherwise
Decision boundary
[Figure: example datasets labeled linearly separable, non-linearly separable, linearly separable, linearly separable]
Majority (I1, I2, I3)
[Figure: example inputs to Majority(I1, I2, I3) labeled TRUE or FALSE]
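A threshold unit with suitable weights implements the Majority function, which is linearly separable. A minimal sketch (the weights and threshold below are chosen by hand, not learned):

```python
# Threshold unit: f(in) = 1 if in > t, 0 otherwise.
def threshold_unit(inputs, weights, t):
    total = sum(w, )if False else sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > t else 0

# Majority(I1, I2, I3): fires when at least two of three inputs are 1.
# Weights (1, 1, 1) and threshold 1.5 implement this.
def majority(i1, i2, i3):
    return threshold_unit([i1, i2, i3], [1, 1, 1], 1.5)

print(majority(1, 0, 1))  # 1 (TRUE)
print(majority(1, 0, 0))  # 0 (FALSE)
```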
Each set of weights defines a point on the error surface.
Given a point on the surface, look at the slope of the surface along the axis formed by each weight:
the partial derivative of the error Err with respect to each weight wj
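Following those partial derivatives downhill gives gradient descent on the squared error E(w) = ∑e (oe − hw(xe))². A minimal sketch (the learning rate, epoch count, and toy data are assumptions for illustration):

```python
# Gradient descent on the squared error of a linear hypothesis h_w.
def gradient_descent(examples, n_weights, alpha=0.01, steps=2000):
    """examples: list of (x_vector, o) pairs; x_vector excludes the x0 = 1 term."""
    w = [0.0] * n_weights          # w[0] is the weight for x0 = 1
    for _ in range(steps):
        for x, o in examples:
            xs = [1.0] + list(x)   # prepend x0 = 1
            h = sum(wi * xi for wi, xi in zip(w, xs))
            err = o - h
            # d/dw_j of (o - h_w)^2 is -2 * err * x_j; step opposite the slope
            for j in range(n_weights):
                w[j] += alpha * err * xs[j]
    return w

# Recover y = 1 + 2*x from noise-free data:
data = [((x,), 1 + 2 * x) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
print(gradient_descent(data, 2))  # close to [1.0, 2.0]
```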

f′(x) = f(x)(1 − f(x))
chain rule
Weight Update for Logistic Regression
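Combining the chain rule with f′(x) = f(x)(1 − f(x)) gives the standard logistic-regression update wj ← wj + α (o − hw(x)) hw(x)(1 − hw(x)) xj. A minimal sketch (learning rate, epoch count, and data are illustrative assumptions):

```python
import math

def logistic(z):
    """The logistic function f(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(examples, n_weights, alpha=0.5, epochs=5000):
    """Gradient-descent training of a logistic unit.
    Update: w_j += alpha * (o - h) * h * (1 - h) * x_j, using f' = f(1 - f)."""
    w = [0.0] * n_weights
    for _ in range(epochs):
        for x, o in examples:
            xs = [1.0] + list(x)   # x0 = 1 for the intercept weight
            h = logistic(sum(wi * xi for wi, xi in zip(w, xs)))
            for j in range(n_weights):
                w[j] += alpha * (o - h) * h * (1 - h) * xs[j]
    return w

# Learn a soft threshold separating x < 0 (class 0) from x > 0 (class 1):
data = [((-2.0,), 0), ((-1.0,), 0), ((1.0,), 1), ((2.0,), 1)]
w = train_logistic(data, 2)
print(logistic(w[0] + w[1] * 1.5))  # close to 1
```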