
Model Fitting

Model Fitting. Jean-Yves Le Boudec. Contents: What is model fitting? Linear regression; linear regression with L1 norm minimization; choosing a distribution; heavy tail. Virus infection data: we would like to capture the growth of infected hosts (explanatory model).

Presentation Transcript


  1. Model Fitting, Jean-Yves Le Boudec

  2. Contents • What is model fitting? • Linear Regression • Linear regression with L1 norm minimization • Choosing a distribution • Heavy Tail

  3. Virus Infection Data • We would like to capture the growth of infected hosts (explanatory model) • An exponential model $y(t) = a e^{\alpha t}$ seems appropriate • How can we fit the model? In particular, what is the value of $\alpha$?

  4. Least Squares Fit of Virus Infection Data • $\alpha = 0.5173$ • Mean doubling time 1.34 hours • Prediction at +6 hours: 100 000 hosts • [Figure: least squares fit]

  5. Least Squares Fit of Virus Infection Data in Log Scale • $\alpha = 0.39$ • Mean doubling time 1.77 hours • Prediction at +6 hours: 39 000 hosts • [Figure: least squares fit]
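
As a quick arithmetic check of the two fits, the mean doubling time $T$ of $y(t) = a e^{\alpha t}$ solves $e^{\alpha T} = 2$, so $T = \ln 2 / \alpha$. This short sketch (the function name is illustrative) recovers 1.34 hours for $\alpha = 0.5173$; for the rounded value $\alpha = 0.39$ it gives about 1.78 hours, so the slide's 1.77 presumably comes from an unrounded estimate.

```python
import math


def doubling_time(alpha):
    """Mean doubling time T of y(t) = a * exp(alpha * t), i.e. exp(alpha * T) = 2."""
    return math.log(2) / alpha


for alpha in (0.5173, 0.39):   # estimates from the natural-scale and log-scale fits
    print(f"alpha = {alpha}: doubling time = {doubling_time(alpha):.2f} hours")
```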

  6. Compare the Two • [Figure: LS fit in natural scale vs. LS fit in log scale]
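
To see why the two criteria can disagree, here is a minimal sketch on synthetic data (not the virus data, which is not reproduced here). It fits $y = a e^{\alpha t}$ once by least squares on $\ln y$ and once by least squares in natural scale. The natural-scale fit is a simple grid search over $\alpha$ using the closed-form optimal $a$ for each candidate; the grid bounds, noise level and seed are arbitrary assumptions.

```python
import numpy as np


def fit_log_scale(t, y):
    """LS fit in log scale: regress ln y on t, i.e. ln y = ln a + alpha * t."""
    alpha, log_a = np.polyfit(t, np.log(y), 1)   # returns [slope, intercept]
    return float(np.exp(log_a)), float(alpha)


def fit_natural_scale(t, y, alphas):
    """LS fit in natural scale: minimize sum (y - a * exp(alpha * t))^2.

    For a fixed alpha the best a has a closed form, so we profile over a grid
    of candidate alphas (a simple search, not a production optimizer).
    """
    best = None
    for alpha in alphas:
        g = np.exp(alpha * t)
        a = g @ y / (g @ g)                      # argmin_a sum (y - a g)^2
        sse = float(np.sum((y - a * g) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(a), float(alpha))
    return best[1], best[2]


# Synthetic data (NOT the virus data): a = 10, alpha = 0.5, multiplicative noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 40)
y = 10.0 * np.exp(0.5 * t) * np.exp(rng.normal(0.0, 0.1, t.size))

a_log, alpha_log = fit_log_scale(t, y)
a_nat, alpha_nat = fit_natural_scale(t, y, np.linspace(0.1, 1.0, 901))
print(f"log scale:     a = {a_log:.2f}, alpha = {alpha_log:.3f}")
print(f"natural scale: a = {a_nat:.2f}, alpha = {alpha_nat:.3f}")
```

Natural-scale least squares is dominated by the largest values of $y$, whereas log-scale least squares weights relative errors at all times equally, so on real data the two estimates can differ substantially.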

  7. Which Fitting Method Should I Use? • Which optimization criterion should I use? • The answer is in a statistical model. • Model not only the interesting part, but also the noise • For example, the model $y_i = a e^{\alpha t_i} + \epsilon_i$ with $\epsilon_i$ iid $N(0, \sigma^2)$ gives $\alpha = 0.5173$

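  8. How can I tell which is correct? • The model $\ln y_i = \ln a + \alpha t_i + \epsilon_i$ with $\epsilon_i$ iid $N(0, \sigma^2)$ gives $\alpha = 0.39$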

  9. Look at Residuals • Looking at residuals = validating the model

  12. Least Squares Fit = Gaussian iid Noise • Assume the model $y_i = f_i(\beta) + \epsilon_i$ with $\epsilon_i$ iid $N(0, \sigma^2)$ (homoscedasticity) • The theorem says: minimizing least squares = computing the MLE for this model • This is how we computed the estimates for the virus example

  13. Least Squares and Projection • Write on the board: what are the data point, the predicted response and the estimated parameter for the virus example? • [Figure: the data point; the predicted response; the manifold, i.e. where the data point would lie if there were no noise; the estimated parameter]

  14. Confidence Intervals

  16. Robustness to "Outliers"

  17. A Simple Example • Model: $y_i = m + \epsilon_i$, where $\epsilon_i$ is noise • What is $m$? • Confidence interval? • [Figure panels: least squares; L1 norm minimization]

  18. Mean Versus Median
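
For the model $y_i = m + \epsilon_i$, minimizing $\sum_i (y_i - m)^2$ gives the sample mean and minimizing $\sum_i |y_i - m|$ gives a sample median. A minimal sketch with made-up numbers shows how a single outlier moves the first estimate but not the second:

```python
import statistics


def ls_estimate(ys):
    """argmin_m sum (y_i - m)^2 is the sample mean."""
    return statistics.fmean(ys)


def l1_estimate(ys):
    """argmin_m sum |y_i - m| is attained at a sample median."""
    return statistics.median(ys)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = data[:-1] + [500.0]     # replace the last value by an outlier
print(ls_estimate(data), l1_estimate(data))                  # 3.0 3.0
print(ls_estimate(with_outlier), l1_estimate(with_outlier))  # 102.0 3.0
```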

  19. 2. Linear Regression • Also called "ANOVA" (analysis of variance) • = least squares + linear dependence on the parameter • A special case where computations are easy

  20. Example 4.3 • What is the parameter? • Is it a linear model? • How many degrees of freedom? • What do we assume on $\epsilon_i$? • What is the matrix $X$?

  22. Does this model have full rank? • Q: Matrix $X$ has full rank means the dimension of the set $X(\mathbb{R}^p)$ is ???? • A: 3
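
Full rank can be checked numerically: an $n \times p$ matrix $X$ has full rank when its rank equals $p$, i.e. when $X(\mathbb{R}^p)$ has dimension $p$. The matrices in this sketch are hypothetical and are not the matrix of Example 4.3.

```python
import numpy as np


def has_full_rank(X):
    """True if the n x p matrix X has rank p, i.e. dim X(R^p) = p."""
    return bool(np.linalg.matrix_rank(X) == X.shape[1])


# Hypothetical 4 x 3 design matrices (not the one of Example 4.3).
X_ok = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]], dtype=float)
X_bad = np.column_stack([X_ok[:, 0], X_ok[:, 1], 2.0 * X_ok[:, 1]])  # col 3 = 2 * col 2
print(has_full_rank(X_ok), has_full_rank(X_bad))   # True False
```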

  23. Some Terminology • $x_i$ are called explanatory variables • Assumed fixed and known • $y_i$ are called response variables • They are "the data" • Assumed to be one sample output of the model

  24. Least Squares and Projection • [Figure: data point; predicted response; the manifold, i.e. where the data point would lie if there were no noise; estimated parameter]

  25. Solution of the Linear Regression Model

  26. Least Squares and Projection • The theorem gives $H$ and $K$ • [Figure: data, residuals, predicted response, the manifold (where the data point would lie if there were no noise), estimated parameter]
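
The slides do not spell out $H$ and $K$. Assuming the standard closed form for a full-rank $X$, $\hat{\beta} = K y$ with $K = (X^T X)^{-1} X^T$, and the predicted response is $\hat{y} = H y$ with $H = X K$, the orthogonal projection onto $X(\mathbb{R}^p)$. A minimal NumPy sketch with hypothetical data:

```python
import numpy as np


def linear_regression(X, y):
    """Least squares for y = X beta + eps with X (n x p) of full rank p.

    Returns beta_hat = K y, the hat matrix H = X K and the residuals y - H y,
    where K = (X^T X)^{-1} X^T (assumed to match the theorem's notation).
    """
    K = np.linalg.solve(X.T @ X, X.T)   # (X^T X)^{-1} X^T without an explicit inverse
    beta_hat = K @ y
    H = X @ K                           # orthogonal projection onto X(R^p)
    residuals = y - H @ y
    return beta_hat, H, residuals


# Hypothetical data scattered around y = 1 + 2 x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
X = np.column_stack([np.ones_like(x), x])
beta_hat, H, e = linear_regression(X, y)
print(beta_hat)    # intercept and slope
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; for ill-conditioned $X$, `np.linalg.lstsq` is the more robust choice.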

  27. The Theorem Gives $\hat{\beta}$ with Confidence Interval

  28. SSR • Confidence intervals use the quantity $s$ • $s^2 = \mathrm{SSR}/(n-p)$, where $\mathrm{SSR} = \sum_i e_i^2$ is called the "sum of squared residuals" • [Figure: data, residuals, predicted response]
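
A sketch of how $s$ enters the confidence intervals, assuming the usual form $\hat{\beta}_j \pm \eta\, s \sqrt{G_{jj}}$ with $G = (X^T X)^{-1}$, $s^2 = \mathrm{SSR}/(n-p)$, and $\eta$ a quantile of Student's $t$ distribution with $n - p$ degrees of freedom. The function name and data are hypothetical; $\eta$ is passed in by the caller (it can be obtained, for example, from `scipy.stats.t.ppf(0.975, n - p)` for a 95% interval).

```python
import numpy as np


def regression_ci(X, y, eta):
    """Intervals beta_hat_j +/- eta * s * sqrt(G_jj) with G = (X^T X)^{-1}.

    s^2 = SSR / (n - p), where SSR is the sum of squared residuals.
    eta is supplied by the caller (e.g. a Student t quantile with n - p d.o.f.).
    """
    n, p = X.shape
    G = np.linalg.inv(X.T @ X)
    beta_hat = G @ X.T @ y
    e = y - X @ beta_hat                 # residuals
    ssr = float(e @ e)
    s = np.sqrt(ssr / (n - p))
    half = eta * s * np.sqrt(np.diag(G))
    return beta_hat, beta_hat - half, beta_hat + half


# Hypothetical data around y = 1 + 2 x; here n - p = 3, and 3.182 is the 0.975
# quantile of Student's t with 3 degrees of freedom (95% two-sided interval).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
X = np.column_stack([np.ones_like(x), x])
beta_hat, lower, upper = regression_ci(X, y, eta=3.182)
print(lower, upper)
```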

  29. Validate the Assumptions with Residuals

  30. Residuals • Residuals are given by the theorem • [Figure: data, residuals, predicted response]

  31. Standardized Residuals • The residuals $e_i$ are an estimate of the noise terms $\epsilon_i$ • They are not (exactly) normal iid • The variance of $e_i$ is ???? • A: $\sigma^2 (1 - H_{i,i})$ • Standardized residuals are not exactly normal iid either, but their variance is 1
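
Standardized residuals divide each $e_i$ by its estimated standard deviation $s\sqrt{1 - H_{i,i}}$. A minimal sketch, assuming $s^2 = \mathrm{SSR}/(n-p)$ and a full-rank $X$ with no point of leverage $H_{i,i} = 1$ (the data are hypothetical):

```python
import numpy as np


def standardized_residuals(X, y):
    """e_i / (s * sqrt(1 - H_ii)), with s^2 = SSR / (n - p) and H the hat matrix."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y                        # residuals
    s = np.sqrt(e @ e / (n - p))
    return e / (s * np.sqrt(1.0 - np.diag(H)))


# Hypothetical data around y = 1 + 2 x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
X = np.column_stack([np.ones_like(x), x])
r = standardized_residuals(X, y)
print(np.round(r, 2))
```

Rescaling $y$, or adding any $X b$ to it, leaves the standardized residuals unchanged, which is a convenient sanity check.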

  32. Which of these two models could be a linear regression model? • A: both • Linear regression does not mean that $y_i$ is a linear function of $x_i$; it means that the dependence on the parameter is linear • Caution: there is a hidden assumption • Noise is iid Gaussian -> homoscedasticity
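
For example, a quadratic trend is still a linear regression model, because it is linear in the parameter $\beta = (\beta_0, \beta_1, \beta_2)$ even though it is not linear in $x_i$. A minimal sketch with hypothetical noise-free data, so the fit recovers the coefficients exactly:

```python
import numpy as np

# Quadratic trend y_i = b0 + b1 x_i + b2 x_i^2 + eps_i: nonlinear in x_i,
# but linear in the parameter (b0, b1, b2), hence a linear regression model.
x = np.linspace(-2.0, 2.0, 9)
y = 0.5 - 1.0 * x + 2.0 * x**2                     # hypothetical noise-free data
X = np.column_stack([np.ones_like(x), x, x**2])    # one column per parameter
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)    # close to [0.5, -1.0, 2.0]
```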

  34. 3. Linear Regression with L1 Norm Minimization • = L1 norm minimization + linear dependence on the parameter • More robust • Less traditional

  35. This is convex programming
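
The slide solves the L1 problem with convex programming (it can be written as a linear program). As a swapped-in alternative that needs only NumPy, this sketch uses iteratively reweighted least squares (IRLS), which approximately minimizes $\sum_i |y_i - (X\beta)_i|$; the function name, iteration count and `eps` floor are assumptions, not from the slides. The hypothetical data lie exactly on $y = 1 + 2x$ except for one gross outlier.

```python
import numpy as np


def l1_regression_irls(X, y, n_iter=200, eps=1e-9):
    """Approximately minimize sum_i |y_i - (X @ beta)_i| by IRLS.

    Each step solves a weighted least squares problem with weights
    1 / max(|r_i|, eps); this is a majorize-minimize scheme for the L1 cost.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]         # start from the LS fit
    for _ in range(n_iter):
        r = y - X @ beta
        sw = np.sqrt(1.0 / np.maximum(np.abs(r), eps))  # square roots of the weights
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta


# Hypothetical data: exactly y = 1 + 2 x, except one gross outlier at x = 9.
x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[9] += 100.0
X = np.column_stack([np.ones_like(x), x])

beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
beta_l1 = l1_regression_irls(X, y)
print("least squares:", beta_ls)
print("L1 (IRLS):    ", beta_l1)
```

Least squares is pulled toward the outlier, while the L1 fit stays on the line through the other nine points, which illustrates the robustness claimed for L1 norm minimization.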

  37. Confidence Intervals • No closed form • Compare to the median! • Bootstrap: how?
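
One common way to do the bootstrap is the percentile method: resample the data with replacement, recompute the estimator, and take empirical quantiles of the replicates. A minimal sketch for the median (the L1 estimate of $m$ in the simple example); the number of replicates, the seed and the sample are arbitrary assumptions:

```python
import random
import statistics


def bootstrap_median_ci(data, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the median n_boot times.
    reps = sorted(statistics.median(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = reps[int((1 - level) / 2 * n_boot)]
    hi = reps[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


data = list(range(1, 102))          # hypothetical sample, median 51
print(bootstrap_median_ci(data))
```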

  39. 4. Choosing a Distribution • Know a catalog of distributions, guess a fit • Shape • Kurtosis, skewness • Power laws • Hazard rate • Fit • Verify the fit visually or with a test (see later)

  40. Distribution Shape • Distributions have a shape • By definition: the shape is what remains the same when we • Shift • Rescale • Example: normal distribution: what is the shape parameter? • Example: exponential distribution: what is the shape parameter?

  41. Standard Distributions • In a given catalog of distributions, we give only the distributions with different shapes. For each shape, we pick one particular distribution, which we call standard. • Standard normal: N(0,1) • Standard exponential: Exp(1) • Standard uniform: U(0,1)

  42. Log-Normal Distribution

  44. Skewness and Kurtosis
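
Sample skewness and kurtosis can be computed from central moments. This sketch uses the moment estimators $m_3/m_2^{3/2}$ and $m_4/m_2^2 - 3$; the $-3$ (excess kurtosis, zero for the normal distribution) is an assumed convention, since the slide's exact definition is not shown. The input must be a sequence, because it is traversed several times.

```python
import statistics


def skewness_kurtosis(data):
    """Moment estimators: skewness m3 / m2**1.5 and excess kurtosis m4 / m2**2 - 3."""
    mu = statistics.fmean(data)
    m2 = statistics.fmean((x - mu) ** 2 for x in data)   # central moments
    m3 = statistics.fmean((x - mu) ** 3 for x in data)
    m4 = statistics.fmean((x - mu) ** 4 for x in data)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


print(skewness_kurtosis([-1.0, 1.0]))            # (0.0, -2.0): symmetric, light tails
print(skewness_kurtosis([0.0, 0.0, 0.0, 10.0]))  # positive skewness: long right tail
```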

  45. Power Laws and Pareto Distribution

  46. Complementary Distribution Functions, Log-Log Scales • [Figure: lognormal, normal, Pareto]
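
On log-log scales, the complementary distribution function of a Pareto distribution with $P(X > x) = x^{-p}$ ($x \ge 1$) is a straight line of slope $-p$, which distinguishes it from the normal and lognormal curves. A minimal sketch, assuming this standard Pareto form: sample by inverse transform, then fit a least squares line to the empirical CCDF in log-log scale (sample size, seed and $p$ are arbitrary).

```python
import math
import random


def pareto_sample(p, n, seed=0):
    """Inverse-transform sampling from the standard Pareto, P(X > x) = x**(-p), x >= 1."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always defined
    return [(1.0 - rng.random()) ** (-1.0 / p) for _ in range(n)]


def loglog_ccdf_slope(data):
    """Least squares slope of ln P(X > x) against ln x, using the empirical CCDF."""
    xs = sorted(data)
    n = len(xs)
    # Empirical CCDF at the i-th smallest value is (n - i - 1) / n (no ties);
    # the largest value has CCDF 0 and is dropped because ln 0 is undefined.
    pts = [(math.log(x), math.log((n - i - 1) / n)) for i, x in enumerate(xs[:-1])]
    mu = sum(u for u, _ in pts) / len(pts)
    mv = sum(v for _, v in pts) / len(pts)
    sxy = sum((u - mu) * (v - mv) for u, v in pts)
    sxx = sum((u - mu) ** 2 for u, _ in pts)
    return sxy / sxx


slope = loglog_ccdf_slope(pareto_sample(p=1.5, n=20000))
print(f"log-log CCDF slope: {slope:.2f} (theory: -1.5)")
```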

  47. Zipf’s Law

  49. Hazard Rate • Interpretation: $h(t)\,dt$ is the probability that a flow dies in the next $dt$ seconds, given that it is still alive at time $t$ • Used to classify distributions • Aging • Memoryless • Fat tail • Ex: normal? Exponential? Pareto? Log-normal?
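
The hazard rate is $h(x) = f(x)/(1 - F(x))$, so $h(x)\,dx$ approximates the probability of dying in $[x, x + dx]$ given survival up to $x$. A minimal sketch evaluating it for an exponential, a standard Pareto and a standard normal distribution (the parameter values are hypothetical):

```python
import math


def hazard(pdf, ccdf, x):
    """h(x) = f(x) / P(X > x)."""
    return pdf(x) / ccdf(x)


LAM = 2.0   # exponential rate (hypothetical value)
P = 1.5     # Pareto exponent (hypothetical value)


def exp_pdf(x):
    return LAM * math.exp(-LAM * x)


def exp_ccdf(x):
    return math.exp(-LAM * x)


def pareto_pdf(x):          # standard Pareto, x >= 1
    return P * x ** (-P - 1)


def pareto_ccdf(x):
    return x ** (-P)


def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_ccdf(x):
    return 0.5 * math.erfc(x / math.sqrt(2))


for x in (1.0, 2.0, 4.0):
    print(x, hazard(exp_pdf, exp_ccdf, x),
          hazard(pareto_pdf, pareto_ccdf, x),
          hazard(normal_pdf, normal_ccdf, x))
```

The exponential hazard is constant (memoryless), the Pareto hazard $p/x$ decreases (fat tail), and the normal hazard increases (aging).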

  50. The Weibull Distribution • Standard Weibull CDF: $F(x) = 1 - e^{-x^c}$, $x \ge 0$ • Aging for $c > 1$ • Memoryless for $c = 1$ • Fat tailed for $c < 1$
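
Assuming the standard Weibull CDF is $F(x) = 1 - e^{-x^c}$ for $x \ge 0$, its hazard rate has the closed form $h(x) = c\,x^{c-1}$, which increases for $c > 1$, is constant for $c = 1$ (the standard exponential) and decreases for $c < 1$. This sketch checks the closed form against $f/(1-F)$; the function names are illustrative.

```python
import math


def weibull_ccdf(x, c):
    """Standard Weibull, assumed CDF F(x) = 1 - exp(-x**c) for x >= 0."""
    return math.exp(-x ** c)


def weibull_pdf(x, c):
    return c * x ** (c - 1) * math.exp(-x ** c)


def weibull_hazard(x, c):
    """Closed form of f(x) / (1 - F(x)) = c * x**(c - 1)."""
    return c * x ** (c - 1)


def hazard_class(c):
    """Classification by the shape parameter c, as on the slide."""
    if c > 1:
        return "aging"            # hazard increases with x
    if c == 1:
        return "memoryless"       # constant hazard: standard exponential
    return "fat tailed"           # hazard decreases with x


for c in (0.5, 1.0, 2.0):
    print(c, hazard_class(c), [round(weibull_hazard(x, c), 3) for x in (0.5, 1.0, 2.0)])
```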