Data Analysis: Review and Practical Application using SPSS

Data of Interest • National Insurance Company • 1000 questionnaires sent • 285 respondents • Questionnaire Presentation • Copy given in class

Coding • Coding broadly refers to the set of all tasks associated with transforming edited responses into a form that is ready for analysis • Steps • Transforming responses to each question into a set of meaningful categories • Assigning numerical codes to the categories • Creating a data set suitable for computer analysis

Transforming Responses into Meaningful Categories • A structured question is pre-categorized • Responses to a nonstructured or open-ended question must be grouped into a meaningful and manageable set of categories • Q1: How many non-categorized questions does this questionnaire contain?

Missing-Value Category • A missing value can stem from • A respondent's refusal to answer a question • An interviewer's failure to ask a question or record an answer • A "don't know" that does not seem legitimate • Best way to treat missing value responses • Sound questionnaire design • Tight control over fieldwork

Assigning Numerical Codes • Assign appropriate numerical codes to responses that are not already in quantified form • Codes should be assigned in a way that facilitates computer manipulation and analysis of the responses
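The coding steps can be sketched in a few lines of Python. This is only an illustration: the codebook, the yes/no categories, and the missing-value code 9 are assumptions, not values taken from the National Insurance Company coding sheet.

```python
# Hypothetical codebook: category label -> numerical code (not from the slides).
CODEBOOK = {"yes": 1, "no": 2}
MISSING = 9  # assumed code reserved for missing or unusable answers


def code_response(raw):
    """Map one edited response to its numerical code, or to MISSING."""
    if raw is None:
        return MISSING
    return CODEBOOK.get(raw.strip().lower(), MISSING)


responses = ["Yes", "no ", None, "maybe"]
coded = [code_response(r) for r in responses]
print(coded)
```

Reserving a single explicit code for missing values keeps refusals and unrecorded answers visible in the data set instead of silently dropping them.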

Multiple Response Question – Rank Order Question • Please rank the following insurance companies by placing a 1 beside the company you think is best overall, a 2 beside the company you think is second best, and so on. __________ Progressive __________ All State __________ National • Q2: How would you code the previous question if it were added to the questionnaire? • This question requires as many variables (and columns) as there are objects to be ranked: 3 separate variables are needed
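One possible way to code the rank-order question is shown below: each ranked company becomes its own variable holding the rank it received. The variable names (rank_progressive and so on) and the validation rule are illustrative assumptions.

```python
COMPANIES = ["Progressive", "All State", "National"]


def code_ranking(ranks):
    """Turn one respondent's ranking into one variable (column) per company."""
    if set(ranks) != set(COMPANIES) or sorted(ranks.values()) != [1, 2, 3]:
        raise ValueError("each company must receive exactly one rank from 1 to 3")
    # Hypothetical variable names: rank_progressive, rank_all_state, rank_national.
    return {"rank_" + name.lower().replace(" ", "_"): ranks[name] for name in COMPANIES}


row = code_ranking({"National": 1, "Progressive": 2, "All State": 3})
print(row)
```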

Creating a Data Set • Organized collection of data records • Each sample unit within the data set is called a Case or Observation • Structure of a Data Set • If the number of observations is n and the total number of variables embedded in the questionnaire is m, then the data set is an n x m matrix of numbers • Importance of Coding Sheet: Anybody can enter or check the data set. (Copy of coding sheet)

SPSS Data Set • 2 Views : Variable and Data. • Raw Variable (labels and values) • Transformed Variable (compute and recode)

Preliminary Data Analysis: Basic Descriptive Statistics • Preliminary data analysis examines the central tendency and the dispersion of the data on each variable in the data set • Measurement level dictates what to do • Getting a feel for the data • What can we do? Limitations are on the next slide • Run descriptives (output 1)
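The idea that the measurement level dictates which summaries are meaningful can be sketched as follows. The mapping used here (mode for nominal data, median added for ordinal data, mean and standard deviation added for interval or ratio data) is a common convention, and the sample values are invented for illustration.

```python
import statistics


def describe(values, level):
    """Return only the summaries that make sense for a measurement level."""
    summary = {"mode": statistics.mode(values)}
    if level in ("ordinal", "interval", "ratio"):
        summary["median"] = statistics.median(values)
    if level in ("interval", "ratio"):
        summary["mean"] = statistics.mean(values)
        summary["stdev"] = statistics.stdev(values)  # sample standard deviation
    return summary


region = ["north", "south", "north", "east"]  # hypothetical nominal variable
quality = [7, 8, 8, 9, 10, 6, 8]              # hypothetical 10-point ratings

nominal_summary = describe(region, "nominal")
metric_summary = describe(quality, "interval")
print(nominal_summary)
print(metric_summary)
```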

Measures of Central Tendency and Dispersion for Different Types of Variables

Why Averages May be Misleading • Researchers tested a new sauce product and found • Mean rating of the taste test was close to the middle of the scale, which had "very mild" and "very hot" as its bipolar adjectives • Researcher’s conclusion • Consumers want a sauce that is neither really hot nor really mild

Why Averages May be Misleading (Cont’d) • Deeper examination revealed • The existence of a large proportion of consumers who wanted the sauce to be mild and an equally large proportion who wanted it to be hot • Moral of the story: • A clear understanding of the distribution of responses can help a researcher avoid erroneous inferences. Talk about Skewness and Kurtosis.
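A small made-up data set shows how this can happen. The ratings below are invented (a 7-point scale with 1 = "very mild" and 7 = "very hot" is assumed); half of the respondents sit at each end of the scale, yet the mean lands in the middle.

```python
from collections import Counter
import statistics

# Hypothetical ratings: 50 respondents prefer mild, 50 prefer hot.
ratings = [1] * 25 + [2] * 25 + [6] * 25 + [7] * 25

mean_rating = statistics.mean(ratings)
counts = Counter(ratings)

print("mean:", mean_rating)
for value in range(1, 8):
    # A crude text histogram makes the two clusters visible.
    print(value, "#" * counts.get(value, 0))
```

The mean equals the scale midpoint even though no respondent chose it, which is why a frequency table or histogram should be inspected before interpreting an average.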

Crosstabs: Occurrences under specific conditions • Most of the time with categorical variables • Examples to run

Cross-Tabulations- Comparing frequencies: Chi-square Contingency Test • Technique used for determining whether there is a statistically significant relationship between two categorical (nominal or ordinal) variables

Cross-Tabulation Using SPSS for National Insurance Company • One crucial issue in the customer survey of National Insurance Company was how a customer's education was associated with whether or not she or he would recommend National to a friend.

Need to Conduct Chi-square Test to Reach a Conclusion • The hypotheses are: • H0: There is no association between educational level and willingness to recommend National to a friend (the two variables are independent of each other). • Ha: There is some association between educational level and willingness to recommend National to a friend (the two variables are not independent of each other). • Let’s do it…

Conducting the Test • Test involves comparing the actual, or observed, cell frequencies in the cross-tabulation with a corresponding set of expected cell frequencies ($E_{ij}$)

Expected Values
$$E_{ij} = \frac{n_i \, n_j}{n}$$
where $n_i$ and $n_j$ are the marginal frequencies, that is, the total number of sample units in category i of the row variable and category j of the column variable, respectively, and $n$ is the total sample size

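Chi-square Test Statistic
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies in cell (i, j), and r and c are the number of rows and columns, respectively, in the contingency table. The number of degrees of freedom associated with this chi-square statistic is given by the product (r - 1)(c - 1).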
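As a rough sketch, the statistic can be computed by hand: each expected frequency is the row total times the column total divided by the sample size, and the statistic sums (observed - expected)² / expected over all cells. The 3 x 2 table below is hypothetical and is not the National Insurance Company cross-tabulation.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency for cell (i, j)
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = three education levels, columns = recommend yes / no.
table = [[30, 20], [25, 25], [15, 35]]
chi2, df = chi_square(table)
print(chi2, df)
```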

National Insurance Company Study • [SPSS output: computed chi-square value and p-value]

National Insurance Company Study – P-Value Significance • The actual significance level (p-value) = 0.019 • The chances of getting a chi-square value as high as 10.007 when there is no relationship between education and recommendation are about 19 in 1000. • The apparent relationship between education and recommendation revealed by the sample data is unlikely to have occurred by chance. • We can reject the null hypothesis.
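The link between the chi-square value and the p-value can be checked numerically. The sketch below assumes 3 degrees of freedom (for example, four education levels by two answers); the slides do not state the table dimensions, so this is an assumption, chosen because it reproduces the reported p-value. For 3 degrees of freedom the upper-tail probability has a closed form that needs only the standard library.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# 10.007 is the chi-square value reported on the slide; df = 3 is assumed.
p_value = chi2_sf_df3(10.007)
print(round(p_value, 3))
```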

Precautions in Interpreting Cross Tabulation Results • Two-way tables cannot show conclusive evidence of a causal relationship • Watch out for small cell sizes • The risk of drawing erroneous inferences increases when more than two variables are involved

Overview of Techniques for Examining Associations • Spearman Correlation Coefficient Technique • The technique is appropriate when • The degree of association between two sets of ranks (pertaining to two variables) is to be examined • Illustrative Research Question(s) This Technique Can Answer: • Is there a significant relationship between motivation levels of salespeople and the quality of their performance? • Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 20, for 20 salespeople who were evaluated subjectively by their supervisor on each variable

Overview of Techniques for Examining Associations (Cont’d) • Pearson Correlation Coefficient Technique • This technique is appropriate when • The degree of association between two metric-scaled (interval or ratio) variables is to be examined • Illustrative Research Question(s) This Technique Can Answer: • Is there a significant relationship between customers' age (measured in actual years) and their perceptions of our company's image (measured on a scale of 1 to 7)?

Spearman Correlation Coefficient • A Spearman correlation coefficient is a measure of association between two sets of ranks
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ = the difference between the ith sample unit's ranks on the two variables and $n$ = the total sample size
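A minimal sketch of the Spearman calculation, using the standard rank-difference formula 1 - 6 Σd² / (n(n² - 1)), which holds when there are no tied ranks. The two rank lists are invented for five hypothetical salespeople.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman coefficient from two lists of untied ranks of equal length."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


motivation_rank = [1, 2, 3, 4, 5]   # hypothetical ranks on motivation
performance_rank = [2, 1, 4, 3, 5]  # hypothetical ranks on performance quality

r_s = spearman_from_ranks(motivation_rank, performance_rank)
print(r_s)
```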

Pearson Correlation Coefficient • The Pearson correlation coefficient is the degree of association between variables that are interval- or ratio-scaled. The Pearson correlation coefficient ($r_{xy}$) between two variables X and Y is given by
$$r_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, s_x s_y}$$
where $n$ = sample size (total number of data points), $\bar{X}$ and $\bar{Y}$ = means, $X_i$ and $Y_i$ = values for any sample unit i, and $s_x$ and $s_y$ = standard deviations
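The Pearson formula translates almost directly into code: sum the cross-products of deviations from the means, then divide by (n - 1) times the two sample standard deviations. The data values below are made up for illustration.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation using sample standard deviations (n - 1 divisor)."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * statistics.stdev(xs) * statistics.stdev(ys))


x = [1, 2, 3, 4, 5]  # hypothetical values of the first variable
y = [2, 4, 5, 4, 5]  # hypothetical values of the second variable

r_xy = pearson_r(x, y)
print(round(r_xy, 4))
```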

National Insurance Company – Computing Pearson Correlation Among Service Quality Constructs • National Insurance Company was interested in the correlations between respondents’ overall service-quality perceptions (on the 10-point scale) and their average ratings along each of the five dimensions of Service Quality

National Insurance Company – Computing Pearson Correlation Among Service Quality Constructs Using SPSS

Interpreting Pearson Correlation Coefficients • Each of the five service-quality measures (reliability, empathy, tangibles, responsiveness, and assurance) is significantly related to the overall quality (OQ) at the .001 level of significance • Responsiveness has the strongest correlation (.8625) • Tangibles have the weakest correlation (.5038) • All the correlations are strong enough to be meaningful

Comparing Means • Mainly T-tests and ANOVAs • T-test on OQ and gender.

Independent T-tests • Independent variable with exactly 2 categories • Equality of variance (cf. output) • p = .88: if there were no true difference between the groups, a difference at least as large as the observed .04 would occur about 88% of the time by chance (random effect). Cannot reject the null hypothesis.
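For reference, the equal-variance version of the independent-samples t statistic can be sketched as below. The two groups are invented ratings, not the OQ-by-gender data, and the sketch stops at the t value and degrees of freedom; the p-value would come from a t table or from SPSS.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Return (t statistic, degrees of freedom), assuming equal variances."""
    na, nb = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    std_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, na + nb - 2


group_1 = [8, 9, 7, 8, 8]  # hypothetical ratings, first category
group_2 = [7, 8, 7, 9, 8]  # hypothetical ratings, second category

t_value, df = pooled_t(group_1, group_2)
print(round(t_value, 3), df)
```

A t value this close to zero relative to its degrees of freedom corresponds to a large p-value, the same situation as on the slide, where the null hypothesis cannot be rejected.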

Analysis of Variance • ANOVA is appropriate in situations where the independent variable is set at certain specific levels (called treatments in an ANOVA context) and metric measurements of the dependent variable are obtained at each of those levels

Example • 24 stores chosen randomly for the study • 8 stores randomly chosen for each treatment • Treatment 1: Store brand sold at the regular price • Treatment 2: Store brand sold at 50¢ off the regular price • Treatment 3: Store brand sold at 75¢ off the regular price • Monitor sales of the store brand for a week in each store

Table 15.2 Unit Sales Data Under Three Pricing Treatments

ANOVA – Grocery Store Hypothesis • Grocery Store Example • H0: $\mu_1 = \mu_2 = \mu_3$ • Ha: At least one $\mu$ is different from one or more of the others • Hypotheses for K Treatment groups or samples • H0: $\mu_1 = \mu_2 = \dots = \mu_k$ • Ha: At least one $\mu$ is different from one or more of the others
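The F-value that tests these hypotheses is the ratio of between-treatment to within-treatment mean squares. The sketch below uses invented unit-sales figures with four stores per treatment; it does not reproduce the Table 15.2 data or the SPSS result.

```python
def one_way_anova(groups):
    """Return (F value, df between, df within) for a list of treatment groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical weekly unit sales under three pricing treatments.
regular_price = [10, 12, 11, 11]
fifty_off = [14, 15, 13, 14]
seventy_five_off = [18, 17, 19, 18]

f_value, df_b, df_w = one_way_anova([regular_price, fifty_off, seventy_five_off])
print(round(f_value, 3), df_b, df_w)
```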

Exhibit 15.1 SPSS Computer Output for ANOVA Analysis

Exhibit 15.1 SPSS Computer Output for ANOVA Analysis (Cont’d) • There is less than a .001 probability of obtaining an F-value as high as 137.447

ANOVA • OQ recommendation and OQ, individual variable • OQ and EDUC (Graph)..and post hoc

Overview of Techniques for Examining Associations (Cont’d) • Simple Regression Analysis Technique • This technique is appropriate when • A mathematical function or equation linking two metric-scaled (interval or ratio) variables is to be constructed, under the assumption that the values of one of the two variables depend on the values of the other

Overview of Techniques for Examining Associations–Simple Regression Analysis (Cont’d) • Illustrative Research Question(s) this Technique Can Answer: • Are sales (measured in dollars) significantly affected by advertising expenditures (measured in dollars)? • What proportion of the variation in sales is accounted for by variation in advertising expenditures? How sensitive are sales to changes in advertising expenditures?

Overview of Techniques for Examining Associations (Cont’d) • Multiple Regression Analysis Technique • This technique is appropriate under the same conditions as simple regression analysis, except that more than two variables are involved, with one variable assumed to depend on the others

Overview of Techniques for Examining Associations (Cont’d) • Illustrative Research Question(s) this Technique Can Answer: • Are sales significantly affected by advertising expenditures and price (where all three variables are measured in dollars)? • What proportion of the variation in sales is accounted for by advertising and price? How sensitive are sales to changes in advertising and price?

Simple Regression Analysis • Generates a mathematical relationship (called the regression equation) between one variable designated as the dependent variable (Y) and another designated as the independent variable (X)

Independent Variable vs. Dependent Variable • Independent variable • Explanatory or predictor variable • Often presumed to be a cause of the other • Dependent variable • Criterion variable • Influenced by the independent variable

Practical Applications of Regression Equations • The regression coefficient, or slope, can indicate how sensitive the dependent variable is to changes in the independent variable • The regression equation is a forecasting tool for predicting the value of the dependent variable for a given value of the independent variable
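Both uses, reading the slope as sensitivity and forecasting from the equation, can be illustrated with an ordinary least-squares fit. The numbers are invented, and the prediction is made at an X value inside the observed range.

```python
def fit_simple_regression(xs, ys):
    """Least-squares intercept a, slope b, and R-squared for Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                   # slope: change in Y per unit change in X
    a = mean_y - b * mean_x         # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable

a, b, r2 = fit_simple_regression(x, y)
prediction = a + b * 3.5  # forecast at X = 3.5, within the range of the data
print(round(a, 3), round(b, 3), round(r2, 3), round(prediction, 3))
```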

Precautions In Using Regression Analysis • Only capable of capturing linear associations between dependent and independent variables • A significant R²-value does not necessarily imply a cause-and-effect association between the independent and dependent variables • A regression equation may not yield a trustworthy prediction of the dependent variable when the value of the independent variable at which the prediction is desired is outside the range of values used in constructing the equation

Precautions In Using Regression Analysis (Cont’d) • A regression equation based on relatively few data points cannot be trusted • The ranges of data on the dependent and independent variables can affect the meaningfulness of a regression equation

Multiple Regression Analysis • Yi = a + b1X1i + b2X2i + … + bkXki • Yi is the predicted value of the dependent variable for some unit i; • X1i, X2i, …, Xki are values on the independent variables for unit i; • b1, b2, …, bk are the regression coefficients; • a is the Y-intercept representing the prediction for Y when all independent variables are set to zero
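One way to obtain a and the b coefficients is to solve the least-squares normal equations. The sketch below does this with plain Gaussian elimination so that it needs no external library; the data are constructed so that Y = 1 + 2·X1 + 3·X2 exactly, which makes the fitted coefficients easy to verify. It is an illustration of the equation, not of how SPSS computes its estimates.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_multiple_regression(predictors, ys):
    """Return [a, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 gives the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [9, 8, 19, 18, 29]

a, b1, b2 = fit_multiple_regression(predictors, ys)
print(round(a, 6), round(b1, 6), round(b2, 6))
```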

National Insurance Company – Multiple Regression Using SPSS • Jill and Tom were interested in conducting a multiple regression analysis wherein overall service-quality perception is the dependent variable and the average ratings along the five dimensions are the independent variables
