Data Analysis: Analyzing Individual Variables & Basics of Hypothesis Testing

Presentation Transcript


  1. Data Analysis: Analyzing Individual Variables & Basics of Hypothesis Testing

  2. Review: Stages in Research Process • Formulate Problem • Determine Research Design • Determine Data Collection Method • Design Data Collection Forms • Design Sample & Collect Data • Analyze and Interpret Data • Prepare Written/Oral Report

  3. Data Analysis: Two Key Considerations • (1) Is the variable to be analyzed by itself (univariate analysis) or in relationship to other variables (multivariate analysis)? • (2) What level of measurement was used? • If you can answer these two questions, data analysis is easy...

  4. Level of Measurement • CATEGORICAL MEASURES: A commonly used expression for nominal and ordinal measures. • CONTINUOUS MEASURES: A commonly used expression for interval and ratio measures.

  5. Basic Univariate Statistics: Categorical Measures • FREQUENCY ANALYSIS: A count of the number of cases that fall into each of the response categories.

  6. Use of Percentages • Percentages are very useful for interpreting the results of categorical analyses and should be included whenever possible. • Unless your sample size is VERY large, however, report percentages as whole numbers (i.e., no decimals).

  7. Frequency Analysis • Researchers almost always work with “valid” percentages which are simply percentages after taking out cases with missing data on the variable being analyzed. • Note: In the example, there were no missing cases. As a result, the “Percent” column entries were identical to the “Valid Percent” column entries.
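
As an illustration of the "Percent" versus "Valid Percent" distinction, the following is a minimal sketch in pandas; the variable name ("financed") and its responses are hypothetical, not taken from the slides.

```python
import numpy as np
import pandas as pd

financed = pd.Series(["yes", "no", "no", "yes", np.nan, "no", "yes", "no"])

counts = financed.value_counts(dropna=False)                    # frequency count, missing included
percent = (counts / len(financed) * 100).round(0)               # "Percent": based on all cases
valid = (financed.value_counts(normalize=True) * 100).round(0)  # "Valid Percent": missing excluded

print(counts)
print(percent)
print(valid)
```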

  8. Uses of Frequency Analysis • Identify blunders and cases with excessive item nonresponse • Identify outliers • Univariate categorical analysis • Determine empirical distribution of a variable

  9. Confidence Interval • A projection of the range within which a population parameter will lie at a given level of confidence based on a statistic obtained from a probabilistic sample. This is why you need to draw a probability sample!

  10. Confidence Intervals for Proportions CONFIDENCE INTERVAL: p ± z √( p(1 − p) / n ), where z = z score associated with the desired level of confidence; p = the proportion obtained from the sample; and n = the number of valid cases overall on which the proportion was based.

  11. Confidence Intervals for Proportions Therefore, we can be 95% confident that the proportion of people in the population who would respond that they had financed their most recent car purchase is between .21 and .39, inclusive.
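
A sketch reproducing this interval, assuming p = .30 and n = 100, inputs that are consistent with the reported .21 to .39 result but are not stated in the transcript:

```python
import math

p, n, z = 0.30, 100, 1.96                           # z = 1.96 for 95% confidence
margin = z * math.sqrt(p * (1 - p) / n)             # p +/- z * sqrt(p(1 - p) / n)
print(round(p - margin, 2), round(p + margin, 2))   # 0.21 0.39
```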

  12. CAUTION in Interpreting Confidence Intervals • The confidence interval only takes sampling error into account. • It DOES NOT account for other common types of error (e.g., response error, nonresponse error). • The goal is to reduce TOTAL error, not just one type of error.

  13. Basic Univariate Statistics: Continuous Measures • DESCRIPTIVE STATISTICS: Statistics that describe the distribution of responses on a variable. The most commonly used descriptive statistics are the mean and standard deviation.
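
A minimal sketch of these two statistics for a continuous measure; the ratings vector is hypothetical.

```python
import numpy as np

ratings = np.array([3, 4, 5, 4, 2, 5, 3, 4])   # hypothetical continuous (interval) measure
print(ratings.mean())                          # mean
print(ratings.std(ddof=1))                     # sample standard deviation
```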

  14. Converting Continuous Measures to Categorical Measures • Sometimes it is useful to convert continuous measures to categorical measures. • This is legitimate, because measures at higher levels of measurement (in this case, continuous measures) have all the properties of measures at lower levels of measurement (categorical measures). • Why do this? Ease of interpretation for managers.

  15. Converting Continuous Measures to Categorical Measures • TWO-BOX TECHNIQUE: A technique for converting an interval-level rating scale into a categorical measure usually used for presentation purposes. The percentage of respondents choosing one of the top two positions on a rating scale is reported.

  16. Converting Continuous Measures to Categorical Measures Please rate the quality of service provided by Better Smiles Dental Office on the following scales (frequency count of respondents selecting each response category shown in parentheses):

                          very poor   poor   neutral   good   very good
      Dental technicians     (2)       (6)     (36)    (32)     (24)
      Receptionist          (10)      (16)     (18)    (36)     (20)
      Dentist               (17)      (17)     (35)    (21)     (10)

  17. Converting Continuous Measures to Categorical Measures (n = 100)

                          two-box   mean (s.d.)
      Dental technicians    56%     3.70 (0.97)
      Receptionist          56%     3.40 (1.25)
      Dentist               31%     2.90 (1.21)
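
A sketch that recomputes the two-box percentages and means in this table from the frequency counts on the previous slide, with the scale coded 1 = very poor through 5 = very good:

```python
import numpy as np

scale = np.array([1, 2, 3, 4, 5])
counts = {
    "Dental technicians": np.array([2, 6, 36, 32, 24]),
    "Receptionist":       np.array([10, 16, 18, 36, 20]),
    "Dentist":            np.array([17, 17, 35, 21, 10]),
}

for item, freq in counts.items():
    n = freq.sum()
    two_box = freq[-2:].sum() / n * 100        # % choosing the top two scale positions
    values = np.repeat(scale, freq)            # expand counts into individual ratings
    print(f"{item}: two-box {two_box:.0f}%, mean {values.mean():.2f} "
          f"(s.d. {values.std(ddof=1):.2f})")
```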

  18. Confidence Intervals for Means CONFIDENCE INTERVAL: x̄ ± z (s / √n), where z = z score associated with the desired level of confidence; s = the sample standard deviation; and n = the total number of cases used to calculate the mean.

  19. Confidence Intervals for Means • EXAMPLE: A sample of 100 car owners revealed that the mean number of family members was 4.0, with a sample standard deviation of 1.9 family members. Assuming that the 100 respondents had been secured using a probability sampling plan, what is the 95% confidence interval for the mean number of family members in the population?

  20. Confidence Intervals for Means Therefore, we can be 95% confident that the mean number of family members in the population lies somewhere between 3.6 and 4.4, inclusive.
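
A sketch of the calculation behind this interval, using the example values (mean = 4.0, s = 1.9, n = 100) and z = 1.96 for 95% confidence:

```python
import math

mean, s, n, z = 4.0, 1.9, 100, 1.96
margin = z * s / math.sqrt(n)                            # mean +/- z * s / sqrt(n)
print(round(mean - margin, 1), round(mean + margin, 1))  # 3.6 4.4
```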

  21. Hypothesis Testing THE ISSUE: How can we tell if a particular result in the sample represents the true situation in the population or simply occurred by chance?

  22. Hypotheses • Unproven propositions about some phenomenon of interest.

  23. Hypothesis Testing • Null Hypothesis (H0): The hypothesis that a proposed result is not true for the population. Researchers typically attempt to reject the null hypothesis in favor of some alternative hypothesis. • Alternative Hypothesis (HA): The hypothesis that a proposed result is true for the population.

  24. Typical Hypothesis Testing Procedure • (1) Specify Null and Alternative Hypotheses after Analyzing the Research Problem • (2) Choose an Appropriate Statistical Test Considering the Research Design and after Determining the Sampling Distribution That Applies Given the Chosen Test Statistic • (3) Specify the Significance Level (Alpha) for the Problem Being Investigated • (4) Collect the Data and Compute the Value of the Test Statistic Appropriate for the Sampling Distribution • (5) Determine the Probability of the Test Statistic under the Null Hypothesis Using the Sampling Distribution Specified in Step 2 • (6) Compare the Obtained Probability with the Specified Significance Level and Then Reject or Do Not Reject the Null Hypothesis on the Basis of the Comparison

  25. Significance Level (α) • The acceptable level of Type I error selected by the researcher, usually set at 0.05. Type I error is the probability of rejecting the null hypothesis when it is actually true for the population.

  26. p-value • The probability of obtaining a given result if in fact the null hypothesis were true in the population. A result is regarded as statistically significant if the p-value is less than the chosen significance level of the test.
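
A minimal sketch of the decision rule this implies, comparing a p-value against the chosen significance level; the test statistic value (z = 2.1) is hypothetical.

```python
from scipy.stats import norm

alpha = 0.05                            # chosen significance level
z = 2.1                                 # hypothetical test statistic
p_value = 2 * (1 - norm.cdf(abs(z)))    # two-tailed p-value for a z statistic
print(p_value, "reject H0" if p_value < alpha else "do not reject H0")
```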

  27. Common Misinterpretations of What “Statistically Significant” Means • Viewing the α or p levels as if they are somehow related to the probability that the research (alternative) hypothesis is true (e.g., a p-value such as p<.001 is “highly significant” and therefore more valid than p<.05). • Viewing p-values as if they represent the probability that the results occurred because of sampling error (e.g., p=.05 implies that there is only a .05 probability that the results were caused by chance). • Assuming that statistical significance is the same thing as managerial significance.

  28. Testing Hypotheses about Individual Variables • Chi-square Goodness-of-Fit Test for Frequencies: A statistical test to determine whether some observed pattern of frequencies corresponds to an expected pattern.
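
A sketch of this test using scipy; the observed counts and the equal-likelihood expected pattern are hypothetical.

```python
from scipy.stats import chisquare

observed = [30, 50, 20]       # observed frequencies in three response categories
expected = [100 / 3] * 3      # expected pattern under H0 (all categories equally likely)
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)
```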

  29. Testing Hypotheses about Individual Variables • Kolmogorov-Smirnov Test: A statistical test used with ordinal data to determine whether some observed pattern of frequencies corresponds to some expected pattern; also used to determine whether two independent samples have been drawn from the same population or from populations with the same distribution.
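
A sketch of the second use described above (two independent samples) with scipy's two-sample Kolmogorov-Smirnov test; the samples are simulated, not from the slides.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
sample_a = rng.normal(0.0, 1.0, size=100)   # simulated sample 1
sample_b = rng.normal(0.3, 1.0, size=100)   # simulated sample 2 (shifted)
stat, p_value = ks_2samp(sample_a, sample_b)
print(stat, p_value)
```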

  30. Testing Hypotheses about Individual Variables • Z-test for Comparing Sample Proportion against a Standard: z = (p − π) / σp, where σp = √( π(1 − π) / n ); p = proportion from the sample, π = the proportion standard to be achieved, σp = the standard error of the proportion, and n = number of respondents in the sample.
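
A sketch of this z-test following the formula above; the inputs (p = .35, standard π = .30, n = 200) are hypothetical.

```python
import math
from scipy.stats import norm

p, pi, n = 0.35, 0.30, 200
se = math.sqrt(pi * (1 - pi) / n)        # standard error of the proportion under the standard
z = (p - pi) / se
p_value = 2 * (1 - norm.cdf(abs(z)))     # two-tailed
print(z, p_value)
```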

  31. Testing Hypotheses about Individual Variables • t-test for Comparing Sample Mean against a Standard (Small Sample, n ≤ 30): t = (x̄ − μ) / sx̄, where sx̄ = s / √n; x̄ = sample mean, μ = the population standard, sx̄ = the standard error of the mean, s = sample standard deviation, and n = sample size.
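
A sketch of the small-sample case using scipy's one-sample t-test; the ratings and the standard of 3.5 are hypothetical.

```python
from scipy.stats import ttest_1samp

ratings = [4, 3, 5, 4, 4, 2, 5, 3, 4, 4]             # n = 10 (<= 30)
t_stat, p_value = ttest_1samp(ratings, popmean=3.5)  # H0: population mean equals the standard
print(t_stat, p_value)
```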

  32. Testing Hypotheses about Individual Variables • z-test for Comparing Sample Mean against a Standard (Large Sample, n > 30): z = (x̄ − μ) / sx̄, where sx̄ = s / √n; x̄ = sample mean, μ = the population standard, sx̄ = the standard error of the mean, s = sample standard deviation, and n = sample size.
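
A sketch of the large-sample case computed directly from the formula above; the summary values (mean = 4.2, standard μ = 4.0, s = 1.5, n = 150) are hypothetical.

```python
import math
from scipy.stats import norm

x_bar, mu, s, n = 4.2, 4.0, 1.5, 150
se = s / math.sqrt(n)                    # standard error of the mean
z = (x_bar - mu) / se
p_value = 2 * (1 - norm.cdf(abs(z)))     # two-tailed
print(z, p_value)
```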
