Prof. Saibal Chattopadhyay IIM Calcutta

IIMC Long Duration Executive Education Executive Programme in Business Management Statistics for Managerial Decisions Advanced Statistical Inference. Prof. Saibal Chattopadhyay IIM Calcutta. A Brief Review. Uncertainty and Randomness: Theory of Probability


Presentation Transcript


  1. IIMC Long Duration Executive Education: Executive Programme in Business Management. Statistics for Managerial Decisions: Advanced Statistical Inference. Prof. Saibal Chattopadhyay, IIM Calcutta

  2. A Brief Review • Uncertainty and Randomness: Theory of Probability • Decision Making Under Uncertainty: Utility Theory • Random Variables & Probability Distributions: Binomial, Poisson, Normal, Exponential • Joint Distribution of Two Random Variables: Marginal Distributions, Mean, Variance, Covariance, Correlation Coefficient, Independence of Random Variables • Regression Approach to the Analysis of Bivariate Data: Curve Fitting and the Least Squares Principle • Sampling Theory: SRS, Stratified RS, Systematic Sampling, Central Limit Theorem, Multistage Sampling, Chi-Square, t and F distributions

  3. Statistical Inference • Sample-based inference about a population • Estimation (Point and Interval) • Hypothesis Testing. Characteristics of Interest: • Population Mean • Population SD • Population Proportion. One-sample problems: Mean (SD known or unknown; n large or small). Two-sample problems: • Difference of two means • Ratio of two SDs • Difference of two proportions • Case Studies 1-5

  4. Some other Inference problems • Categorical Data Analysis. The variable is categorical in nature: information is available as frequencies (numbers of individuals) belonging to different categories. Example: 100 randomly selected items returned to a department store are categorized as: Cash Refund: 34; Credit to Charge Account: 18; Merchandise Exchange: 31; Return Refused: 17

  5. Categorical Data Analysis. Research Question: Do these four possible dispositions of a return request occur with equal frequency? • We need a hypothesis test to assess whether the data (four frequencies: 34, 18, 31, 17) support the theory that the probabilities of an observation falling in these four categories are all equal • P1, P2, P3, and P4 are these probabilities, with P1 + P2 + P3 + P4 = 1 • To test Ho: P1 = P2 = P3 = P4

  6. Hypothesis-testing for categorical data • What is the alternative hypothesis? Ha: Not all Pi's are equal. How to proceed? With 2 such categories, no problem: the test is the equality of two proportions. With multiple categories? • Goodness-of-fit tests for Ho versus Ha • An extension for testing equality of proportions from several populations

  7. Goodness-of-fit test. General idea: • k categories • P1, P2, …, Pk: true unknown proportions for these k categories; P1 + P2 + … + Pk = 1 • Ho: P1 = P1o; P2 = P2o; …; Pk = Pko • Ha: Ho not true; at least one Pi differs from the corresponding hypothesized value • Level of significance = α = 0.05 or 0.01 • Data given: observed frequencies f1, f2, …, fk for these k categories; f1 + f2 + … + fk = n = sample size

  8. Goodness-of-fit test • Calculate the ‘expected frequencies’ for these k categories if Ho is true; Under Ho, Expected Frequency = Probability*Sample Size • fe1 = n.P1o; fe2 = n.P2o; … ; fek = n.Pko • fe1 + fe2 + … + fek = n = total frequency • Examine how closely these correspond to the actual observed frequencies • If they match closely, accept Ho • Reject Ho otherwise

  9. Goodness-of-fit test. How to judge: Test Statistic? χ² = Σ (obs. freq. - exp. freq.)² / (exp. freq.) = Σ (fi - fei)² / fei, summed over the k categories. A Chi-square based on frequencies, both observed and expected (under Ho) • A Frequency Chi-Square Test • Distribution of this Chi-square? • Approximately Chi-square with (k-1) d.f., provided all expected frequencies are 'large' • How large: all fei ≥ 5

  10. Goodness-of-fit Chi-Square Test • If Ho is true, discrepancies are small and so the Chi-Square value is 'small' • Reject Ho if χ² is 'large': χ² > C • How large is large? Use level α = 0.05 or 0.01 • C = χ²(α): upper α-point of χ² with d.f. = k - 1, read from the table. Back to the Example: • k = 4 (number of categories) • Ho: P1 = P2 = P3 = P4 = ¼; Ha: Not Ho • Obs. Freq: f1 = 34, f2 = 18, f3 = 31, f4 = 17 • n = total frequency = 100

  11. Goodness-of-fit Chi-Square • Expected frequencies: fe1 = fe2 = fe3 = fe4 = 100 × ¼ = 25 • χ² = (34 - 25)²/25 + (18 - 25)²/25 + (31 - 25)²/25 + (17 - 25)²/25 = 9.2 • Suppose α = 0.05 (to test at the 5% level) • χ² value from the table (d.f. = k - 1 = 3) = 7.815 • Observed χ² = 9.2 > 7.815: reject Ho • Returns of merchandise are not equally frequent across the four categories, at the 5% level
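
The goodness-of-fit calculation for the returned-merchandise example can be reproduced in a few lines of Python. This is only a minimal sketch using the standard library: the function name `chi_square_gof` is made up for illustration, and the critical value 7.815 is taken from the slides rather than computed.

```python
# Goodness-of-fit chi-square for the returned-merchandise example.
# chi_square_gof is a hypothetical helper name, not from the slides.

def chi_square_gof(observed, probs):
    """Return (chi-square statistic, expected frequencies) under Ho."""
    n = sum(observed)
    # Under Ho, expected frequency = hypothesized probability * sample size
    expected = [n * p for p in probs]
    # The chi-square approximation needs all expected frequencies >= 5
    if min(expected) < 5:
        raise ValueError("an expected frequency is below 5")
    stat = sum((f - fe) ** 2 / fe for f, fe in zip(observed, expected))
    return stat, expected

observed = [34, 18, 31, 17]            # f1..f4 from the slides
hypothesized = [0.25, 0.25, 0.25, 0.25]  # Ho: P1 = P2 = P3 = P4 = 1/4

stat, expected = chi_square_gof(observed, hypothesized)
df = len(observed) - 1                 # k - 1 degrees of freedom
critical = 7.815                       # tabled value, alpha = 0.05, d.f. = 3 (from the slides)
reject = stat > critical

print(f"chi-square = {stat:.2f}, d.f. = {df}, reject Ho: {reject}")
```

The same function works for unequal hypothesized proportions, as long as they sum to 1 and every expected frequency is at least 5.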

  12. Another Application: Test of Homogeneity • 2 or more similarly classified populations • Data: the frequencies falling in each category are known for each population • To test whether the populations are identical. 2 populations, k classes each. P1, P2, …, Pk: probabilities for Population 1. P1*, P2*, …, Pk*: probabilities for Population 2. Ho: P1 = P1*, P2 = P2*, …, Pk = Pk*. Ha: They are not all equal

  13. Case Study 6: Right of Advertising • A study of consumers' and dentists' attitudes toward the advertising of dental services: "Should Dentists Advertise?", Journal of Advertising Research, June 1982, 33-38. Two samples, 101 consumers (population 1) and 124 dentists (population 2), were asked to respond to the following statement: "I favour the use of advertising by dentists to attract new patients". Possible responses: strongly agree, agree, neutral, disagree, strongly disagree.

  14. Should Dentists Advertise? • Data table

  15. Should Dentists Advertise? Research Question: Do the two groups, consumers and dentists, differ in their attitudes toward advertising? Probability Table:

  16. Should Dentists Advertise? To test Ho: P1 = P1*, …, P5 = P5*. Expected cell count formula: Exp = (Row marginal total)(Column marginal total) / (Grand total). Chi-sq = Σ (obs. freq. - exp. freq.)² / (exp. freq.), summed over all cells. DF = (# Rows - 1)(# Columns - 1). Reject Ho if observed Chi-sq > tabled Chi-sq. (Assumption: all expected frequencies ≥ 5)
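
The expected-count formula and the chi-square statistic for a two-way table can be sketched as below. The counts in `table` are hypothetical and chosen only to keep the arithmetic easy to check by hand; they are not the survey data from the case study, whose table is not reproduced in this transcript. The helper name `chi_square_table` is also an assumption for illustration.

```python
# Chi-square test of homogeneity on a rows-by-columns table of counts.
# NOTE: the counts below are hypothetical, NOT the dentists/consumers data.

def chi_square_table(table):
    """Return (statistic, degrees of freedom, expected counts) for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = (row marginal total)(column marginal total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

# Two hypothetical populations (rows), three response categories (columns)
table = [
    [20, 30, 50],
    [30, 30, 40],
]

stat, df, expected = chi_square_table(table)
critical = 5.991  # standard tabled chi-square value for alpha = 0.05, d.f. = 2
print(f"chi-square = {stat:.3f}, d.f. = {df}, reject Ho: {stat > critical}")
```

For the case study itself, the same function would be called with the 2 × 5 table of observed counts, and the result compared with the tabled value for 4 degrees of freedom.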

  17. Should Dentists Advertise? Table of observed (expected) counts:

  18. Should Dentists Advertise? Calculation of the Test Statistic: Here all expected frequencies are ≥ 5. Chi-sq = (34 - 19.3)²/19.30 + … + (46 - 28.11)²/28.11 = 84.47. Degrees of freedom = (2-1)(5-1) = 4. Use alpha = 0.05. Chi-sq from table = 9.488. Reject Ho if obs. Chi-sq > 9.488

  19. Should Dentists Advertise? Conclusion: Since the observed value of Chi-sq = 84.47 > 9.488, we reject Ho at the 5% level of significance. Thus, in the light of the given data, it appears that the two groups (consumers and dentists) differ significantly in their attitudes toward advertising.

  20. A Test for Independence • Two attributes A and B • A has k levels A1, A2, …, Ak • B has l levels B1, B2, …, Bl • Data available on the k × l level combinations: fij = number of observations (frequency) belonging to (Ai, Bj); n = total frequency • To test Ho: A and B are independent • Alternative Ha: they are associated

  21. Case Study 7: TV viewing and Fitness. "Television viewing and Physical fitness in adults": Research Quarterly for Exercise and Sport (1990), 315-320. A: Physical Fitness has k = 2 levels: A1 = physically fit, A2 = not physically fit. B: TV viewing time (in hours per day, rounded to the nearest hour) has l = 4 levels: B1 = 0, B2 = 1-2, B3 = 3-4, B4 = 5 or more

  22. TV viewing and Physical Fitness • A survey of 1200 adult males gave the following counts:

  23. TV viewing and Physical Fitness. Ho: TV viewing and Physical fitness are independent attributes. Ha: They are associated. Expected cell counts under Ho: (Row total)(Column total) / (Total frequency). Chi-sq = Σ (obs. - exp.)² / exp, summed over all cells. Degrees of freedom = (k-1)(l-1). Reject Ho if observed Chi-sq > tabled Chi-sq.
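
The expected cell count follows directly from the definition of independence. A short derivation, using the dot notation $f_{i\cdot}$ and $f_{\cdot j}$ for row and column totals and $e_{ij}$ for expected counts (notation assumed here, not taken from the slides):

```latex
% Under Ho, A and B are independent:
P(A_i \cap B_j) = P(A_i)\,P(B_j)

% Estimate the marginal probabilities from the row and column totals:
\hat{P}(A_i) = \frac{f_{i\cdot}}{n}, \qquad \hat{P}(B_j) = \frac{f_{\cdot j}}{n}

% Expected count in cell (i, j) = n times the estimated cell probability:
e_{ij} = n \cdot \frac{f_{i\cdot}}{n} \cdot \frac{f_{\cdot j}}{n}
       = \frac{f_{i\cdot}\, f_{\cdot j}}{n}

% Test statistic and degrees of freedom:
\chi^2 = \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{(f_{ij} - e_{ij})^2}{e_{ij}},
\qquad \text{d.f.} = (k-1)(l-1)
```

The degrees of freedom count the cells that remain free once the row and column totals are fixed.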

  24. TV viewing and Physical Fitness Table of Observed (Expected) Frequencies

  25. TV viewing and Physical Fitness • All expected frequencies are ≥ 5, so we may use the goodness-of-fit chi-square. Degrees of freedom = (2-1)(4-1) = 3. Chi-sq = (35 - 25.5)²/25.5 + … + (34 - 32.7)²/32.7 = 6.13. At the 5% level, tabled Chi-sq = 7.815. Decision Rule: Reject Ho if Chi-sq > 7.815

  26. TV Viewing and Physical Fitness • Conclusion: Since the observed Chi-sq = 6.13 is less than the tabled value 7.815, we fail to reject Ho at the 5% level. In the light of the given data, there is no significant evidence of an association: the data are consistent with Physical Fitness and TV viewing being independent of each other.

  27. References Text Book for the Course • Statistical Methods in Business and Social Sciences: Shenoy, G.V. & Pant, M. (Macmillan India Limited) Suggested Reading • Complete Business Statistics: Aczel, A.D. & Sounderpandian, J. – Fifth Edition (Tata McGraw-Hill)
