9/22/23

The Chi-Square Distribution
Test of Independence

Chi-square: Test of independence
• H0: Variables A and B are independent
• HA: Variables A and B are not independent
  • Knowing A will help you predict B
• Note: Relationship does not have to be causal to show dependence

Chi-square: Test of independence
• When H0 is true, the expected frequency for each cell is found by multiplying its row total by its column total, then dividing by the grand total.
  • E = (row total × column total) / grand total

Row × Column Chi-square
• Degrees of freedom
  • df = (r-1) × (c-1)
  • Where
    • r = number of rows
    • c = number of columns

Test of Independence
Researchers in California have asked a sample of 175 farmers to select their favorite from three popular British beef cattle breeds. Of the 111 Northern CA farmers in the sample, 54 selected Angus, 25 selected Hereford, and 32 selected Shorthorn. Of the 64 Southern CA farmers in the sample, 19 selected Angus, 22 selected Hereford, and 23 selected Shorthorn. At the 0.05 level, is region independent of breed preference?

Test of Independence
• First, arrange the data in a table.

                Angus   Hereford   Shorthorn
  Northern CA    54       25         32
  Southern CA    19       22         23

Test of Independence
• Second, add row and column totals.

                Angus   Hereford   Shorthorn   Totals
  Northern CA    54       25         32         111
  Southern CA    19       22         23          64
  Totals         73       47         55         175

Test of Independence
• Third, add E columns.

                  Angus       Hereford     Shorthorn
                 O    E       O    E       O    E      Totals
  Northern CA    54           25           32           111
  Southern CA    19           22           23            64
  Totals         73           47           55           175
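
The table with totals can be sketched in R as follows. The object names `observed` and `with_totals` are illustrative assumptions, not taken from the slides; the counts are the ones given in the problem statement.

```r
# Observed counts: rows are regions, columns are breeds.
observed <- matrix(
  c(54, 25, 32,
    19, 22, 23),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Region = c("Northern CA", "Southern CA"),
    Breed  = c("Angus", "Hereford", "Shorthorn")
  )
)

# addmargins() appends the row and column totals, labelled "Sum".
with_totals <- addmargins(as.table(observed))
print(with_totals)
```

The "Sum" row and column should match the Totals row and column in the table above.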

Test of Independence
• Fourth, calculate E columns.

                  Angus                  Hereford               Shorthorn
                 O    E                  O    E                 O    E                Totals
  Northern CA    54   (73×111)/175       25   (47×111)/175      32   (55×111)/175      111
  Southern CA    19   (73×64)/175        22   (47×64)/175       23   (55×64)/175        64
  Totals         73                      47                     55                     175

Test of Independence
• Fourth, calculate E columns.

                  Angus         Hereford       Shorthorn
                 O    E         O    E         O    E        Totals
  Northern CA    54   46.3      25   29.8      32   34.9      111
  Southern CA    19   26.7      22   17.2      23   20.1       64
  Totals         73             47             55             175
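
As a minimal sketch, the expected frequencies can be computed for every cell at once in R. The names `observed` and `expected` are illustrative assumptions; `outer()` forms every row total × column total product, and dividing by the grand total gives E.

```r
# Observed counts: rows are regions, columns are breeds.
observed <- matrix(
  c(54, 25, 32,
    19, 22, 23),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Region = c("Northern CA", "Southern CA"),
    Breed  = c("Angus", "Hereford", "Shorthorn")
  )
)

# E = (row total x column total) / grand total, for every cell.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 1))
```

Rounded to one decimal place, these should agree with the E columns above. The expected frequencies in each row and column add up to the same totals as the observed counts.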

Test of Independence
• Fifth, calculate (O-E)²/E columns.

                  Angus                     Hereford                  Shorthorn
                 O    E     (O-E)²/E        O    E     (O-E)²/E       O    E     (O-E)²/E      Totals
  Northern CA    54   46.3   1.28           25   29.8   0.77          32   34.9   0.24          111
  Southern CA    19   26.7   2.22           22   17.2   1.34          23   20.1   0.42           64
  Totals         73                         47                        55                        175

• χ²calc is the sum of the (O-E)²/E values: 1.28 + 0.77 + 0.24 + 2.22 + 1.34 + 0.42 = 6.27

P-value for χ²
• Upper tail (only)
• P-value = pchisq(χ²calc, df, lower.tail=FALSE)
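
The fifth step and the P-value formula can be sketched together in R. Variable names such as `contributions` and `chi_sq_calc` are illustrative assumptions. This sketch uses unrounded expected frequencies, so individual cells may differ slightly in the last digit from a hand calculation with rounded E values.

```r
# Observed counts: rows are regions, columns are breeds.
observed <- matrix(
  c(54, 25, 32,
    19, 22, 23),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Region = c("Northern CA", "Southern CA"),
    Breed  = c("Angus", "Hereford", "Shorthorn")
  )
)

# Expected frequencies under H0 (independence).
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Per-cell contribution (O - E)^2 / E, then their sum.
contributions <- (observed - expected)^2 / expected
chi_sq_calc <- sum(contributions)

# Degrees of freedom: (r - 1) x (c - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail P-value.
p_value <- pchisq(chi_sq_calc, df, lower.tail = FALSE)

print(round(contributions, 2))
print(round(chi_sq_calc, 2))  # about 6.27
print(df)
print(p_value)
```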

Critical Value for χ²
• Upper tail (only)
• χ²crit = qchisq(1-α, df)

P-Value Rules for χ²
• If the P-value is greater than α, you will fail to reject H0
• If the P-value is less than α, you will reject H0

Critical Value Rules for χ²
• Upper tail (only)
• If χ²calc is greater than χ²crit, you will reject H0
• If χ²calc is less than χ²crit, you will fail to reject H0

P-value
pchisq(6.27, 2, lower.tail=FALSE) = 0.04349975

Critical value
qchisq(.95, 2) = 5.991465

Test of Independence
• Hypotheses:
  • H0: Type of breed and region are independent.
  • H1: Type of breed and region are not independent.
• Critical Value:
  • α = 0.05
  • df = (r-1) × (c-1) = (2-1) × (3-1) = 1 × 2 = 2
  • χ²crit = 5.991
  • If χ² > 5.991 (from table or R), reject H0.
[Figure: chi-square curve split at χ² = 5.991, with the "Do Not Reject H0" region (area 0.95) on the left and the "Reject H0" region (area 0.05) in the upper tail]
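
The critical value rule for this example can be sketched in R as below. The variable names are illustrative assumptions; the test statistic 6.27 is the value computed by hand for the cattle data.

```r
alpha <- 0.05
df <- (2 - 1) * (3 - 1)   # 2 regions (rows), 3 breeds (columns)
chi_sq_calc <- 6.27       # test statistic from the hand calculation

# Upper-tail critical value at the alpha level.
chi_sq_crit <- qchisq(1 - alpha, df)

# Reject H0 when the test statistic exceeds the critical value.
reject_h0 <- chi_sq_calc > chi_sq_crit

print(chi_sq_crit)
print(reject_h0)
```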

Test of Independence
• Test Statistic: χ² = 6.27
• Conclusion: Since the test statistic of 6.27 is greater than the critical value of 5.991, we reject the null hypothesis.
• Implications: There is enough evidence to show that breed preference is not independent of region.

Chi-square: Test of independence
• A sheep research station conducts a research project on litter size in hair sheep. In the normal (fall) breeding season, 50 Dorper, 102 Katahdin, and 85 St. Croix mature ewes are exposed to their respective rams. The following spring, the scientist tabulates litter size from those 237 ewes. No ewe had more than triplets.

               Litter size
               0     1     2     3
  Dorper       1    13    31     5
  Katahdin     2    14    70    16
  St. Croix    1    10    71     3

Chi-square: Test of independence
• H0: Litter size and breed are independent
• Ha: Litter size and breed are not independent
  • Knowing litter size will help you predict breed or vice versa

Chi-square: Test of independence

                  Litter size
                  0     1     2     3    Row total
  Dorper          1    13    31     5       50
  Katahdin        2    14    70    16      102
  St. Croix       1    10    71     3       85
  Column total    4    37   172    24      237

Chi-square: Test of independence

                  Litter size 0            Litter size 1              Litter size 2                Litter size 3
                  O    E                   O    E                     O    E                       O    E                     Row total
  Dorper          1    4×50/237 = 0.8      13   37×50/237 = 7.8       31   172×50/237 = 36.3       5    24×50/237 = 5.1          50
  Katahdin        2    4×102/237 = 1.7     14   37×102/237 = 15.9     70   172×102/237 = 74.0      16   24×102/237 = 10.3       102
  St. Croix       1    4×85/237 = 1.4      10   37×85/237 = 13.3      71   172×85/237 = 61.7       3    24×85/237 = 8.6          85
  Column total    4                        37                         172                          24                           237

Chi-square: Test of independence

                  Litter size 0           Litter size 1            Litter size 2            Litter size 3
                  O    E     (O-E)²/E     O    E      (O-E)²/E     O    E      (O-E)²/E     O    E      (O-E)²/E     Row total
  Dorper          1    0.8    0.05        13   7.8     3.47        31   36.3    0.77        5    5.1     0.002          50
  Katahdin        2    1.7    0.05        14   15.9    0.23        70   74.0    0.22        16   10.3    3.15          102
  St. Croix       1    1.4    0.11        10   13.3    0.82        71   61.7    1.40        3    8.6     3.64           85
  Column total    4                       37                       172                      24                         237

χ² = 13.912, df = 6
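
As a cross-check, the sheep table can be passed to R's built-in `chisq.test()`. The object names `litters` and `result` are illustrative assumptions. Because `chisq.test()` works with unrounded expected frequencies, its statistic (roughly 13.86) is slightly smaller than the 13.912 obtained by hand with E rounded to one decimal place. R also warns that the chi-square approximation may be incorrect here because some expected frequencies are below 5; the sketch suppresses that warning only to keep the output short.

```r
# Observed litter sizes: rows are breeds, columns are litter sizes 0 to 3.
litters <- matrix(
  c(1, 13, 31,  5,
    2, 14, 70, 16,
    1, 10, 71,  3),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Breed      = c("Dorper", "Katahdin", "St. Croix"),
    LitterSize = c("0", "1", "2", "3")
  )
)

# Pearson chi-square test of independence.
result <- suppressWarnings(chisq.test(litters))

print(round(result$expected, 1))  # expected frequencies, E
print(result$statistic)           # chi-square test statistic
print(result$parameter)           # degrees of freedom
print(result$p.value)             # upper-tail P-value
```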
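
P-value
pchisq(13.912, 6, lower.tail=FALSE) = 0.03063477

Critical value
qchisq(.95, 6) = 12.59159

• At α = 0.05, the P-value of 0.0306 is less than α and χ²calc = 13.912 is greater than χ²crit = 12.592, so we reject H0: litter size and breed are not independent.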
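
Both decision rules can be wrapped in one small helper, sketched below. The function name `chi_square_decision` and its return fields are hypothetical, introduced only for illustration; it relies on the same `pchisq()` and `qchisq()` calls used in the slides.

```r
# Hypothetical helper: applies the P-value rule and the critical value rule
# to an upper-tail chi-square test.
chi_square_decision <- function(chi_sq_calc, df, alpha = 0.05) {
  p_value <- pchisq(chi_sq_calc, df, lower.tail = FALSE)
  chi_sq_crit <- qchisq(1 - alpha, df)
  list(
    p_value        = p_value,
    chi_sq_crit    = chi_sq_crit,
    reject_by_p    = p_value < alpha,
    reject_by_crit = chi_sq_calc > chi_sq_crit
  )
}

# Sheep example: chi-square = 13.912 with df = 6.
sheep <- chi_square_decision(13.912, 6)
print(sheep)
```

The two rules always give the same decision, since the P-value falls below α exactly when the test statistic exceeds the critical value.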