The Chi Square statistic tests whether the difference between what you observe and what chance would predict is due to sampling error. The greater the deviation of what we observe from what we would expect by chance, the more confident we can be that the difference is NOT due to chance.
Step #1: Hypotheses:
Categorizing the same individuals in two ways: Approval and Gender.
Looking at the Effect of an Independent Variable (Gender) on a Dependent Variable (Approval).
This is a classic application for the χ² test.
· 1. We have nominal data in both variables - men vs women, approve vs disapprove.
· 2. The data are in the form of frequencies and
· 3. We are looking to see if there is a relationship between the two variables.
We set as our standard 95% confidence that the difference we observe in our study is not due to chance. This is equivalent to setting alpha, the risk level, at 5% (α = .05).
STEP 4: DETERMINE CRITICAL VALUE OF χ²
Assuming the null hypothesis is true, what would the expected values be?
Expected frequency = (Row Margin * Column Margin) / N, where N = 908
Cell a: 335 * 418 / 908 = 140,030 / 908 = 154
Cell b: 335 * 490 / 908 = 164,150 / 908 = 181
Cell c: 573 * 418 / 908 = 239,514 / 908 = 264
Cell d: 573 * 490 / 908 = 280,770 / 908 = 309
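The expected-frequency rule can be checked with a short sketch. It uses only the margins given above (row totals 335 and 573, column totals 418 and 490, N = 908); the function name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(row_totals, col_totals):
    """Expected cell counts under the null hypothesis of no relationship:
    (row margin * column margin) / N for every cell."""
    n = sum(row_totals)
    # Both sets of margins must describe the same N cases.
    assert n == sum(col_totals), "row and column margins must sum to the same N"
    return [[r * c / n for c in col_totals] for r in row_totals]

# Margins from the gender-by-approval table above.
expected = expected_frequencies([335, 573], [418, 490])
for row in expected:
    print([round(cell) for cell in row])
# [154, 181]
# [264, 309]
```

Note that the unrounded expected counts still add up to N = 908, just as the observed counts do.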
Critical value: 3.84
Chi-square computed from data: 10.07
Decision: Reject Null.
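For a 2 x 2 table the degrees of freedom are (rows - 1)(columns - 1) = 1. A chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the sketch below can reproduce the 3.84 critical value, and a p-value for 10.07, using only Python's standard library. This shortcut is valid only for df = 1; larger tables would need a general chi-square distribution (for example, `scipy.stats.chi2`). The helper names are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_critical_df1(alpha=0.05):
    """Critical value of chi-square with 1 df: the squared two-tailed z cutoff."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

def chi2_p_value_df1(statistic):
    """Upper-tail probability P(chi-square_1 > statistic) = P(|Z| > sqrt(statistic))."""
    return math.erfc(math.sqrt(statistic / 2))

rows, cols = 2, 2
df = (rows - 1) * (cols - 1)          # 1 degree of freedom for a 2 x 2 table
critical = chi2_critical_df1(0.05)
observed_chi2 = 10.07                  # value computed from the data above

print(df, round(critical, 2))                  # 1 3.84
print(observed_chi2 > critical)                # True -> reject the null hypothesis
print(chi2_p_value_df1(observed_chi2) < 0.05)  # True
```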
STEP 6: STATE CONCLUSION
Thinking about cases as pairs: Concordant, Discordant, and Tied Pairs.
Concordant pair: case A scores higher than case B on BOTH variables, or lower on BOTH.
Discordant pair: case A scores higher than case B on ONE variable and lower on the other.
Tied pair: cases A and B tie on at least one of the variables.
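A minimal sketch of classifying every pair of cases, assuming each case is an (x, y) pair of ordinal scores; the function name `classify_pairs` and the three sample cases are hypothetical.

```python
from itertools import combinations

def classify_pairs(cases):
    """Count concordant, discordant, and tied pairs among (x, y) cases."""
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 or dy == 0:
            tied += 1                 # tie on at least one variable
        elif (dx > 0) == (dy > 0):
            concordant += 1           # same direction on both variables
        else:
            discordant += 1           # opposite directions on the two variables
    return concordant, discordant, tied

# Hypothetical cases scored on two ordinal variables.
print(classify_pairs([(1, 1), (2, 2), (3, 1)]))  # (1, 1, 1)
```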
All science is concerned with the relationships between variables: the effect of one variable on another. This is what hypothesis testing is all about. We hypothesize that X is related to Y. The two most powerful techniques for analyzing the relationship between interval-level variables are:
1. Regression: the magnitude of the relationship between the independent variable and the dependent variable (how much change in one yields how much change in the other).
2. Correlation: the predictive power of one variable for another (the direction and strength of association).
Consider the relationship between education and income. First we could look at the magnitude of the relationship: the impact of education on income, asking how much of a change in income is associated with each additional year of education. For example, how many more dollars of income would someone earn, on average, if he or she finished college rather than dropping out after 2 years?
We are asking: as education increases, how much does income change? Given a positive relationship between education and income (as X goes up, Y goes up), how much does yearly income, in dollars, change with each year of education? Is the effect big or small?
Correlation analysis asks: how good a predictor of the dependent variable is the "independent variable"? Here, how good a predictor of income is education? Is education a good indicator of income or not? How accurate are our predictions of income based on education? Correlation tells us how strongly related, that is, how predictive, one variable is of another, say, education of income.
The greater the spread of points around the regression line, the less predictive X is of Y and, consequently, the weaker the correlation.
Draw a single straight line through these points; it will not pass through every dot. That line is called the "regression line". The regression line is the "best-fitting line" drawn through the points on an X-Y scatterplot.
10 Years of Education Means about $12,000 Income
Now Add 5 years of education:
It adds an Additional $4,000 of Income!
Beta is the change in the dependent variable associated with one unit of change on the predictor variable.
Deviation is the vertical distance of each point from the regression line.
Problem: for the best-fitting line, the positive and negative deviations all add up to zero!
Solution: Square the distances, then sum them.
Fitting a regression line to data points by this method is called the "least squares method": the regression line is the line that minimizes the sum of the squared differences between the observed points and the points predicted by the line.
The best-fitting line is the line that, compared to any other line you could plot through the points, produces the lowest sum of squared deviations. So what we do in a regression analysis is compute the line that minimizes the squared deviations of points from the "best-fitting line".
Hopefully, these pictures will help you visualize relationships. Regression and correlation analyses each produce a summary number to represent a relationship. Regression tells you the magnitude of the relationship [shown by the slope of the line]; correlation tells you its predictive power [summarized by the correlation coefficient, written r], which gives you a summary measure of errors in prediction.
y = a + bX
a is the intercept (the predicted value of y when X = 0)
b (beta) is the slope of the regression line
X is the value of the independent variable
Interpretation: a one-unit change in X relates to a change of b (beta) in y; the predicted value of y is the intercept plus b times X.
e.g., Income = $4,100 (intercept) + $800 * X (Years of Education)
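Using the example equation above, here is a short sketch of predicting income from years of education. The coefficients $4,100 and $800 come from the example; the constant and function names are hypothetical. It also matches the earlier picture: about $12,000 at 10 years of education, and an additional $4,000 for 5 more years.

```python
INTERCEPT = 4100   # a: predicted income with 0 years of education
SLOPE = 800        # b: dollars of income per additional year of education

def predicted_income(years_of_education):
    """Predicted yearly income from y = a + bX."""
    return INTERCEPT + SLOPE * years_of_education

print(predicted_income(10))                           # 12100
print(predicted_income(15) - predicted_income(10))    # 4000
```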
The classic case of a deterministic relationship is that between the Fahrenheit and Centigrade measures of temperature:
F° = 32 + (9/5)C°
Here a, the intercept, is 32°, so when C = 0°, F = 32°. b (beta), the slope of the line, is 9/5, or 1.8, and C is X, degrees Centigrade. So for every one degree of change in C, Fahrenheit goes up by 1.8 degrees, starting at 32 degrees:
when C = 0°, F = 32° + (9/5)(0) = 32°
when C = 100°, F = 32° + (9/5)(100) = 212°
b = Cov(X, Y) / Var(X)
Where the numerator is the covariance of X and Y and the denominator is the variance of X.