
INDEPENDENT VARIABLES AND CHI SQUARE



Presentation Transcript


  1. INDEPENDENT VARIABLES AND CHI SQUARE

  2. Independent versus Dependent Variables Two variables X and Y are said to be independent if the occurrence of one does not affect the probability of the occurrence of the other. Formally, X and Y are independent if P(X | Y) = P(X) or, equivalently, P(Y | X) = P(Y). What does this mean?
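A minimal Python sketch of this definition, assuming the data arrive as a two-way table of counts (rows are categories of X, columns are categories of Y) and that every column total is positive. The function name `is_independent`, the tolerance, and both example tables are illustrative assumptions, not taken from the slides.

```python
def is_independent(table, tol=1e-9):
    """Check P(X=h | Y=k) == P(X=h) for every cell of a two-way count table."""
    n = sum(sum(row) for row in table)              # grand total
    row_totals = [sum(row) for row in table]        # totals for each category of X
    col_totals = [sum(col) for col in zip(*table)]  # totals for each category of Y
    for h, row in enumerate(table):
        for k, count in enumerate(row):
            p_x_given_y = count / col_totals[k]     # P(X=h | Y=k)
            p_x = row_totals[h] / n                 # P(X=h)
            if abs(p_x_given_y - p_x) > tol:
                return False
    return True


# Hypothetical tables: rows are proportional in the first one,
# while the second one has a one-to-one relation between categories.
independent_table = [[20, 30], [40, 60]]
dependent_table = [[50, 0], [0, 50]]

print(is_independent(independent_table))
print(is_independent(dependent_table))
```

In the first table every row is split between the columns in the same 2:3 ratio, so knowing Y says nothing about X. In the second table knowing Y determines X completely.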

  3. Independent versus Dependent Variables Consider the following contingency table of X against Y. [Table not preserved in the transcript.] We say that X is independent of Y if [formula not preserved in the transcript].

  4. Independent versus Dependent Variables: example 1 The following contingency table gives an observed population (in millions) classified by gender (X) and health insurance coverage (Y). Are the two variables independent? That is, does health insurance coverage depend on gender?

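  5. Independent versus Dependent Variables We have to verify [condition and table of X against Y not preserved in the transcript]. Result: YES.

  6. Independent versus Dependent Variables We have to verify [condition and table of X against Y not preserved in the transcript]. Result: YES.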

  7. Independent versus Dependent Variables: example 2 Consider the example of the 420 employees. Is the variable Smoke (X) independent of the variable College Graduate (Y)?

  8. Independent versus Dependent Variables: example 2 We have to verify [condition not preserved in the transcript]. Result: no independence!

  9. Independent versus Dependent Variables Two variables are maximally dependent if the contingency table is [table not preserved in the transcript]: there is a one-to-one relation between the categories of the two variables.

  10. Chi square How can we measure the “degree” of dependence between two variables? Recall that two variables are independent if [formula not preserved in the transcript]. From these relations we get [formula not preserved in the transcript], where n*hk is called the theoretical or expected frequency (E), since it expresses the frequency of category h of X and category k of Y under the condition of independence.
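The formulas on this slide did not survive in the transcript. A plausible reconstruction is sketched here, assuming the common notation in which n_{h·} is the total of row h, n_{·k} is the total of column k, and n is the grand total. This marginal notation is an assumption; only n*hk appears in the text. Independence means the share of category h within each column equals its overall share, which gives the expected frequency:

```latex
\frac{n_{hk}}{n_{\cdot k}} = \frac{n_{h\cdot}}{n}
\quad\Longrightarrow\quad
n^{*}_{hk} = \frac{n_{h\cdot}\, n_{\cdot k}}{n}
```

In words: expected frequency = (row total × column total) / grand total.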

  11. Chi square The observed frequencies nik are indicated with (O). If the observed frequencies (O) are equal to the expected frequencies (E), the variables are independent. We can build an indicator of independence/dependence between the two variables called Chi square. The formula is χ² = Σ (O − E)² / E, where the sum runs over all cells of the table. It is evident that if Chi square is equal to 0 (O = E) the two variables are independent.
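The computation can be sketched in Python as follows, assuming the standard Pearson form of the statistic, the sum over all cells of (O − E)² / E, with E computed as row total × column total / grand total. The function names and the two small tables are hypothetical and are not the data from the slides.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Pearson chi-square statistic: sum over all cells of (O - E)^2 / E."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


# Hypothetical 2x2 tables (rows: categories of X, columns: categories of Y).
proportional = [[20, 30], [40, 60]]   # observed counts equal the expected counts
skewed = [[30, 20], [20, 30]]         # observed counts differ from the expected 25s

print(chi_square(proportional))
print(chi_square(skewed))
```

The sketch assumes every row and column total is positive, since otherwise an expected frequency of zero would appear in a denominator.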

  12. Chi square: example 1 Violence and lack of discipline have become major problems in schools in the United States. A random sample of 300 adults was selected, and they were asked if they favor giving more freedom to schoolteachers to punish students for violence and lack of discipline. The two-way classification of the responses of these adults is represented in the following table. Are the two variables gender and opinion independent?

  13. Chi square: example 1 In order to compute the chi square we have to compute the expected frequencies as follows:

  14. Chi square: example 1 For example

  15. Chi square: example 1 The value of the chi square is different from 0, and hence we should conclude that the two variables are not independent.

  16. Chi square: critical value However, it can happen that even if the chi square is different from 0, its value is sufficiently small to think that there is independence between the variables of interest. But which value of the chi square can be considered a critical value, so that values under this critical value indicate independence and values over this critical value indicate dependence between the two variables? A fixed critical value does not exist. It is determined case by case, depending on the data we are examining, by using the methods and principles of statistical inference.

  17. Chi square: critical value We do not deal with the computation of the critical value. However, the critical value is computed by all statistical software, Excel included. Rule: If the critical value > chi square, the two variables can be considered independent. If the critical value < chi square, the two variables can be considered dependent, in the sense that they influence each other. In the previous example the critical value is 9.21. It is greater than the value of the chi square (8.252), so we can say that the two variables are independent; that is, the opinion of the selected people is not influenced by gender.
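The slides leave the computation of the critical value to software. As a hedged sketch using only the Python standard library: for a chi-square distribution with 2 degrees of freedom the upper-tail critical value has the closed form −2 ln(alpha), and with 1 degree of freedom it is the square of a standard normal quantile. The significance levels and degrees of freedom below are assumptions, since the slides do not state them. They are chosen because they reproduce the quoted 9.21 and the 3.841 quoted later for example 2. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_critical_df2(alpha):
    """Upper-tail critical value of chi-square with 2 degrees of freedom."""
    # With 2 degrees of freedom the CDF is 1 - exp(-x / 2), so the quantile is closed-form.
    return -2.0 * math.log(alpha)


def chi2_critical_df1(alpha):
    """Upper-tail critical value of chi-square with 1 degree of freedom."""
    # A chi-square with 1 degree of freedom is the square of a standard normal variable.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def decide(chi_square_value, critical_value):
    """Apply the rule from the slide: compare the statistic with the critical value."""
    return "dependent" if chi_square_value > critical_value else "independent"


print(round(chi2_critical_df2(0.01), 3))   # assumed: 1% level, 2 degrees of freedom
print(round(chi2_critical_df1(0.05), 3))   # assumed: 5% level, 1 degree of freedom
print(decide(8.252, 9.21))                 # values quoted for example 1
```

Note that the conclusion depends on the assumed significance level: the same statistic can fall below the critical value at one level and above it at another.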

  18. Chi square: example 2 A researcher wanted to study the relationship between gender and owning cell phones. She took a sample of 2000 adults and obtained the information given in the following table. Looking at the table, can we conclude that gender and owning cell phones are related for all adults?

  19. Chi square: example 2 We have to compute the expected frequencies

  20. Chi square: example 2 Critical value = 3.841. The critical value is less than the chi square, and hence we can conclude that the two variables are dependent; that is, owning a cell phone depends on gender.

  21. LINEAR REGRESSION

  22. LINEAR REGRESSION • So far we have investigated the relation of independence/dependence between two variables (qualitative or quantitative). • However, this kind of relation is reciprocal, in the sense that we don’t know whether one variable influences the other or vice versa, and we don’t know how strong this relation is. • If we would like to know whether one variable influences the other and how strong this relation is, we have to refer to linear regression. • By using regression analysis we can evaluate the magnitude of change in one variable due to a certain change in another variable, and we can predict the value of one variable for a given value of the other variable. • (Linear) regression is a statistical analysis that evaluates whether a linear relationship exists between two quantitative variables, X and Y.

  23. SIMPLE LINEAR REGRESSION Definition A regression model is a mathematical equation that describes the relationship between two or more variables. A simple regression model includes only two variables: one independent and one dependent. The dependent variable is the one being explained, and the independent variable is the one used to explain the variation in the dependent variable.

  24. SIMPLE LINEAR REGRESSION A (simple) regression model that gives a straight-line relationship between two variables is called a linear regression model. Why is it called “regression model” or “regression analysis”? The method was first used to examine the relationship between the heights of fathers and sons. The two were related, of course. But they found that a tall father tended to have sons shorter than himself; a short father tended to have sons taller than himself. The height of sons regressed to the mean. The term "regression" is now used for many sorts of curve fitting.

  25. LINEAR REGRESSION: example 1 We want to investigate the relation between Incomes (in hundreds of dollars) (X) and Food Expenditures of Seven Households (Y). That is, we want to investigate whether Income influences a Household’s decision about Food Expenditure and how strong this influence is.

  26. LINEAR REGRESSION: example 1 We can represent the data with a scatter plot. A scatter plot is a plot of the values of Y versus the corresponding values of X. [Scatter plot: Food expenditure against Income, with the first and seventh households labelled.]

  27. LINEAR REGRESSION: example 1 The scatter plot seems to reveal a linear relationship between the two variables: a linear regression model might be indicated. In the figure the points (observations) are replaced by a linear model (a) and a nonlinear model (b). [Figure: panel (a) Linear and panel (b) Nonlinear, each plotting Food Expenditure against Income.]

  28. LINEAR REGRESSION: the equation How can we write the linear model mathematically? y = a + b x, where y is the dependent variable, a is the constant term or y-intercept, b is the slope, and x is the independent variable.

  29. LINEAR REGRESSION: intercept How can we represent a graphically? The intercept is the Y value of the line when X equals zero. The intercept determines the position of the line on the Y axis. [Figure: lines with intercepts a1, a2, a3, a4 marked on the Y axis, plotted against X.]

  30. LINEAR REGRESSION: slope How can we represent b graphically? The slope quantifies the steepness of the line. It equals the change in Y for each unit change in X. If the slope is positive, Y increases as X increases. If the slope is negative, Y decreases as X increases. [Figure: three panels of Y against X, illustrating b > 0, b < 0, and two slopes b1 and b2 with b2 > b1.]
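The roles of the intercept and the slope in y = a + b x can be checked numerically with a short Python sketch. The helper name `make_line` and the values a = 2.0 and b = 0.5 are arbitrary assumptions chosen for illustration.

```python
def make_line(a, b):
    """Return the function x -> a + b * x for intercept a and slope b."""
    def line(x):
        return a + b * x
    return line


f = make_line(2.0, 0.5)    # hypothetical intercept and slope

print(f(0))                # value of Y when X equals zero: the intercept a
print(f(11) - f(10))       # change in Y for a one-unit change in X: the slope b

g = make_line(2.0, -0.5)   # same intercept, negative slope
print(g(11) < g(10))       # Y decreases as X increases when b < 0
```

Changing `a` alone shifts the line up or down without changing its steepness, while changing `b` alone rotates the line around its intercept.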

  31. LINEAR REGRESSION Coming back to the example, among all the possible lines that can interpolate the points in the scatter plot, which is the “best”? Choosing the best line (or the line that best describes the relation between X and Y) means finding the “best” a and the “best” b. [Scatter plot: Food expenditure against Income.]
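The transcript ends before saying how “best” is defined. One common criterion, assumed here and not stated in the slides, is least squares: choose a and b to minimise the sum of squared vertical distances between the points and the line. Under that assumption the solution is b = Sxy / Sxx and a = mean(y) − b · mean(x). The function name and the seven data points below are hypothetical; they are not the households from the example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) minimising the sum of squared residuals of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x (assumed non-zero).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical incomes and food expenditures for seven households.
incomes = [30, 40, 50, 60, 70, 80, 90]
food = [8, 11, 12, 16, 17, 21, 22]

a, b = least_squares_line(incomes, food)
print(f"fitted line: y = {a:.3f} + {b:.3f} x")
print(f"predicted food expenditure at income 65: {a + b * 65:.3f}")
```

The fitted line always passes through the point (mean of x, mean of y), which follows directly from the formula for a.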
