Pattern Recognition and Machine Learning, Chapter 1: Introduction. Source: Bishop's book, Chapter 1, with modifications by Christoph F. Eick.
Polynomial Curve Fitting
Experiment: given a function, create N training examples.
What M (polynomial order) should we choose? Model Selection
Given M, what w's should we choose? Parameter Selection
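The experiment above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's actual code: the noise level, seed, and the orders M = 3 and M = 9 are assumptions chosen to mirror Bishop's running example of N noisy samples of sin(2πx).

```python
import numpy as np

rng = np.random.default_rng(0)

# Bishop's running example: N noisy samples of sin(2*pi*x).
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def fit_poly(x, t, M):
    # Least-squares polynomial fit of order M (coefficients, highest power first).
    return np.polyfit(x, t, deg=M)

def sse(w, x, t):
    # Sum-of-squares error E(w) = (1/2) * sum of squared residuals, as on the slides.
    return 0.5 * np.sum((np.polyval(w, x) - t) ** 2)

w3 = fit_poly(x, t, 3)  # M = 3: smooth fit, generalizes well
w9 = fit_poly(x, t, 9)  # M = 9: passes (almost) through all 10 points, over-fits
```

Raising M can only lower the training error (the M = 3 model is a special case of the M = 9 model), which is exactly why training error alone cannot guide model selection.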
0th Order Polynomial
• As N increases, the error E decreases.
• As the complexity c(H) of the hypothesis space increases, the test error E first decreases and then increases.
• As c(H) increases, the training error decreases for some time and then stays constant (frequently at 0).
How do M, the quality of fitting, and the capability to generalize relate to each other?
Over-fitting
Root-Mean-Square (RMS) Error: E_RMS = sqrt( 2 E(w*) / N )
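Dividing by N makes errors comparable across data sets of different sizes, and the square root puts E_RMS on the same scale as the target t. With E(w) defined as half the sum of squared residuals, E_RMS reduces to the ordinary root mean square of the residuals, as this small check (with made-up residuals) shows:

```python
import numpy as np

# Hypothetical residuals y(x_n, w) - t_n from some fitted model.
residuals = np.array([0.1, -0.2, 0.05, 0.15])

# E(w) = (1/2) * sum of squared residuals; E_RMS = sqrt(2 * E(w) / N).
E_w = 0.5 * np.sum(residuals ** 2)
E_rms = np.sqrt(2 * E_w / len(residuals))

# This equals the plain root-mean-square of the residuals.
rms_direct = np.sqrt(np.mean(residuals ** 2))
```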
Data Set Size: 9th Order Polynomial
Data Set Size: 9th Order Polynomial Increasing the size of the data sets alleviates the over-fitting problem.
Regularization
Penalize large coefficient values: E~(w) = (1/2) Σ_n { y(x_n, w) − t_n }² + (λ/2) ||w||²
Idea: penalize high weights that contribute to high variance and sensitivity to outliers.
Regularization: 9th Order Polynomial
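The regularized least-squares problem has a closed-form solution, w = (Φᵀ Φ + λI)⁻¹ Φᵀ t, which the following sketch uses on the same 9th-order setup. The seed, noise level, and λ = 10⁻³ are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# N = 10 noisy samples of sin(2*pi*x), fitted with a 9th-order polynomial.
N, M = 10, 9
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

# Design matrix Phi with columns x^0 .. x^M.
Phi = np.vander(x, M + 1, increasing=True)

def ridge_fit(Phi, t, lam):
    # Closed-form minimizer of (1/2)||Phi w - t||^2 + (lam/2)||w||^2:
    #   w = (Phi^T Phi + lam I)^(-1) Phi^T t
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ t)

w_unreg = ridge_fit(Phi, t, 0.0)   # lambda = 0: interpolates the noise
w_reg = ridge_fit(Phi, t, 1e-3)    # small lambda shrinks the coefficients
```

Comparing the two solutions shows the effect described on the slide: even a small λ collapses the huge coefficient magnitudes of the unregularized 9th-order fit.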
The example demonstrated:
As N increases, E decreases.
As c(H) increases, the test error E first decreases and then increases.
As c(H) increases, the training error decreases for some time and then stays constant (frequently at 0).
Polynomial Coefficients
As the weight λ of the regularization term increases, the magnitudes of the coefficients shrink.
Probability Theory Apples and Oranges
Probability Theory
Marginal Probability p(X); Conditional Probability p(Y | X); Joint Probability p(X, Y)
The Rules of Probability
Sum Rule: p(X) = Σ_Y p(X, Y)
Product Rule: p(X, Y) = p(Y | X) p(X)
Bayes’ Theorem: p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_Y p(X | Y) p(Y)
posterior ∝ likelihood × prior
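A quick worked version of the apples-and-oranges setup makes the rules concrete. The numbers below are Bishop's Chapter 1 example: a red box with 2 apples and 6 oranges, a blue box with 3 apples and 1 orange, and p(red) = 4/10, p(blue) = 6/10. Using exact fractions keeps the arithmetic transparent:

```python
from fractions import Fraction as F

# Priors over boxes and likelihoods of drawing an orange from each box.
p_box = {"red": F(4, 10), "blue": F(6, 10)}
p_orange_given = {"red": F(6, 8), "blue": F(1, 4)}

# Sum rule (via the product rule): p(orange) = sum_B p(orange | B) p(B)
p_orange = sum(p_orange_given[b] * p_box[b] for b in p_box)  # 9/20

# Bayes' theorem: p(red | orange) = p(orange | red) p(red) / p(orange)
p_red_given_orange = p_orange_given["red"] * p_box["red"] / p_orange  # 2/3
```

Observing an orange raises the probability that the red box was chosen from the prior 4/10 to the posterior 2/3, since oranges are much more common in the red box.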
Probability Densities: p(x ∈ (a, b)) = ∫_a^b p(x) dx
Cumulative Distribution Function: P(z) = ∫_{−∞}^z p(x) dx
(Continuous densities are the usual case in ML!)
Expectations (f under p(x)): E[f] = Σ_x p(x) f(x)
Conditional Expectation (discrete): E_x[f | y] = Σ_x p(x | y) f(x)
Approximate Expectation (discrete and continuous): E[f] ≈ (1/N) Σ_{n=1}^N f(x_n), with x_n drawn from p(x)
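The approximate expectation is just a sample average, which a two-line simulation illustrates. The choice of f(x) = x² under a standard normal is an assumption for the example; its exact expectation is the variance, 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# E[f] ~ (1/N) * sum f(x_n) with x_n ~ p(x).
# Here f(x) = x^2 and p(x) is the standard normal, so the exact answer is 1.
samples = rng.standard_normal(100_000)
approx = np.mean(samples ** 2)
# approx is close to the exact value 1; the error shrinks like 1/sqrt(N).
```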
Gaussian Parameter Estimation
Likelihood function: p(x | μ, σ²) = Π_{n=1}^N N(x_n | μ, σ²)
Compare: for the observations 2, 2.1, 1.9, 2.05, 1.99, which of N(2,1) and N(3,1) assigns the higher likelihood?
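The comparison can be carried out directly. Working in log space avoids underflow from multiplying many small densities; the answer is as intuition suggests, since the data cluster around 2:

```python
import math

def log_likelihood(data, mu, sigma):
    # Sum of log N(x | mu, sigma^2) over the data points.
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

data = [2.0, 2.1, 1.9, 2.05, 1.99]
ll_n21 = log_likelihood(data, mu=2.0, sigma=1.0)  # N(2,1)
ll_n31 = log_likelihood(data, mu=3.0, sigma=1.0)  # N(3,1)
# N(2,1) assigns the data a higher likelihood than N(3,1).
```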
Maximum Likelihood: determine w_ML by minimizing the sum-of-squares error E(w).
Predictive Distribution Skip initially
Model Selection Cross-Validation
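Cross-validation answers the "what M should we choose?" question by holding data out: fit on k − 1 folds, score on the remaining fold, and average. The sketch below is an illustrative k-fold implementation (seed, noise level, and k = 5 are assumptions), again on noisy sin(2πx) data:

```python
import numpy as np

rng = np.random.default_rng(2)

# N = 30 noisy samples of sin(2*pi*x).
N = 30
x = rng.uniform(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def cv_error(x, t, M, k=5):
    # Mean held-out squared error of an order-M polynomial, k-fold CV.
    idx = np.arange(len(x))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w = np.polyfit(x[train], t[train], deg=M)
        pred = np.polyval(w, x[fold])
        errs.append(np.mean((pred - t[fold]) ** 2))
    return float(np.mean(errs))

scores = {M: cv_error(x, t, M) for M in range(10)}
best_M = min(scores, key=scores.get)  # order with the lowest held-out error
```

Unlike training error, the cross-validation score does rise again for large M, so minimizing it selects a moderate order rather than the most complex model.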
Entropy • Important quantity in • coding theory • statistical physics • machine learning
Entropy
H[x] = −Σ_x p(x) log₂ p(x)
Coding theory: x is discrete with 8 possible states; how many bits are needed to transmit the state of x? If all states are equally likely, H[x] = −8 × (1/8) log₂(1/8) = 3 bits.
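The 3-bit answer, and the gain from a non-uniform distribution, can be checked numerically. The skewed distribution below is Bishop's 8-state example (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64), whose entropy is exactly 2 bits:

```python
import math

def entropy_bits(p):
    # H[x] = -sum p(x) log2 p(x), skipping zero-probability states.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

uniform8 = [1 / 8] * 8                              # entropy 3.0 bits
skewed = [1/2, 1/4, 1/8, 1/16] + [1/64] * 4          # entropy 2.0 bits
```

A non-uniform source has lower entropy, and a matched variable-length code (e.g. Huffman) achieves the shorter average message length that the entropy promises.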
Entropy
In how many ways can N identical objects be allocated to M bins? W = N! / Π_i n_i!, and H = (1/N) ln W.
Entropy is maximized when all states are equally likely, p(x_i) = 1/M.
Differential Entropy
Put bins of width Δ along the real line; in the limit Δ → 0, H[x] = −∫ p(x) ln p(x) dx.
Differential entropy is maximized (for fixed variance σ²) when p(x) is Gaussian, in which case H[x] = (1/2){1 + ln(2πσ²)}.