Some terminology. When the relation between variables is expressed in this manner, we call the relevant equation(s) mathematical models. The intercept and weight values are called the parameters of the model.
Although one can describe the relationship between two variables in the way we have done here, from now on we'll assume that our models are causal models, such that the variable on the left-hand side of the equation is caused by the variable(s) on the right-hand side.
The values of Y in these models are often called predicted values, sometimes abbreviated as Y-hat or Ŷ. Why? They are the values of Y that are implied or predicted by the specific parameters of the model.
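The idea above can be sketched in a few lines of code. This is a minimal illustration, not the text's own example: the intercept, weight, and X values below are hypothetical numbers chosen for demonstration.

```python
# Predicted values (Y-hat) from a simple linear model: Y-hat = a + b*X.
# The parameter values a and b and the data X are hypothetical examples.
a, b = 2.0, 0.5           # intercept and weight (the model's parameters)
X = [1.0, 2.0, 3.0, 4.0]  # observed predictor values

# Each predicted value is implied by the specific parameters of the model.
Y_hat = [a + b * x for x in X]
print(Y_hat)  # [2.5, 3.0, 3.5, 4.0]
```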
Up to this point, we have assumed that our basic models are correct.
There are two important issues we need to deal with, however:
Is the basic model correct (regardless of the values of the parameters)? That is, is a linear model, as opposed to a quadratic one, the appropriate model for characterizing the relationship between the variables?
If the model is correct, what are the best parameter values for the model?
One way to do so is to find the difference between each value of Y and the corresponding predicted value (we called these differences "errors" before), square these differences, and average them together.
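The three steps just described (difference, square, average) amount to the mean squared error. A minimal sketch, using hypothetical observed and predicted values:

```python
# Mean squared error: subtract each predicted value from the observed Y,
# square the differences, and average them. Data values are hypothetical.
Y     = [2.0, 3.5, 3.0, 4.5]  # observed values
Y_hat = [2.5, 3.0, 3.5, 4.0]  # predicted values from some model

errors = [y - yh for y, yh in zip(Y, Y_hat)]     # the "errors"
mse = sum(e ** 2 for e in errors) / len(errors)  # squared, then averaged
print(mse)  # 0.25
```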
In this graph I have plotted the error variance as a function of the different parameter values we chose for b. Notice that the error was large at first but got smaller as we made b larger. Eventually, the error reached a minimum and then began to increase again as we made b larger still.
The method we just used is sometimes called the brute force or gradient descent method of estimating parameters.
More formally, gradient descent involves starting with a viable parameter value, calculating the error at slightly different values, moving the best-guess parameter value in the direction of the smallest error, and then repeating this process until the error is as small as it can be.
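The loop just described can be sketched directly. This is an illustrative implementation under simplifying assumptions: a model with a single weight b and no intercept (Y-hat = b*X), a fixed step size, and hypothetical data generated to lie roughly on Y = 2X.

```python
# Descent by trial steps: start with a guess for b, compare the error at
# slightly higher and lower values of b, move toward the smaller error,
# and repeat until neither direction improves. Data are hypothetical.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 7.8]   # roughly Y = 2*X

def mse(b):
    """Mean squared error of the model Y-hat = b*X."""
    return sum((y - b * x) ** 2 for x, y in zip(X, Y)) / len(X)

b, step = 0.0, 0.01         # initial guess and step size
for _ in range(10000):
    if mse(b + step) < mse(b):
        b += step           # error is smaller at a higher b: move up
    elif mse(b - step) < mse(b):
        b -= step           # error is smaller at a lower b: move down
    else:
        break               # neither direction improves: minimum reached
print(round(b, 2))  # 1.99
```

Note that true gradient descent uses the derivative of the error to choose the direction and size of each step; the trial-step search above follows the verbal description in the text.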
With simple linear models, the equation is so simple that brute force methods are unnecessary.