
Doing Quantitative Research 26E02900, 6 ECTS Cr.




Olli-Pekka Kauppila

Daria Volchek

Lecture II - May 14, 2014

AM session

- Descriptive statistics, assumptions for regression analyses
PM session

- Introduction to regression analysis

Deepen the understanding of research design and measures

Improve skills at using SPSS

Understand different ways of dealing with missing values

Learn more about computing variables

Learn the assumptions for multivariate analyses

Learn to make graphs to examine and illustrate your data

Learn to identify and deal with outlier observations

Learn more about interpreting correlations

Open SPSS software

File → Open → Data

Find and open your Excel file

When the dataset is open, save it in .sav format

- File → Save as

No matter where and how you collect your data, you are likely to have missing values

Missing values are not a problem, provided that there are not too many of them, and you deal with them appropriately

- Remove any cases with a high number of missing values
- Do not use variables with a high number of missing values
- Common remedies
- Deletion: if missing values are few and randomly distributed
- Replace with means: if missing values are relatively few
- Multiple imputation: use software to estimate the missing values based on what is known about the case
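
The first two remedies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the `id` field, and the helper names `listwise_delete` and `mean_replace` are hypothetical, and multiple imputation is left to dedicated software as noted above.

```python
from statistics import mean

# Hypothetical survey rows; None marks a missing answer.
cases = [
    {"id": 1, "4A": 4, "4C": 5, "4D": 4},
    {"id": 2, "4A": None, "4C": 3, "4D": 4},
    {"id": 3, "4A": 2, "4C": None, "4D": None},
    {"id": 4, "4A": 5, "4C": 4, "4D": 3},
]
items = ["4A", "4C", "4D"]


def listwise_delete(rows, columns):
    """Deletion: keep only cases with no missing value in the given columns."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


def mean_replace(rows, columns):
    """Replace each missing value with the mean of the observed values in its column."""
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {**r, **{c: (col_means[c] if r[c] is None else r[c]) for c in columns}}
        for r in rows
    ]


complete = listwise_delete(cases, items)   # cases 1 and 4 remain
filled = mean_replace(cases, items)        # all four cases remain
```

Deletion shrinks the sample, whereas mean replacement keeps every case but reduces the variance of the variable, which is why it is suggested only when missing values are relatively few.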

Before analyses, you usually need to modify your variables

Examples:

- Transform reversely coded items: item 4B on job satisfaction scale is reversely coded. Thus, 4B_re: 5→1; 4→2; 3→3; 2→4; 1→5
- Compute summated scales: Job satisfaction = (4A + 4B_re + 4C + 4D) / 4
- Create dummies: e.g. Firm 1: Firm 1 = 1; others = 0
- Transform other variables: e.g. Employee age = data collection year - year of birth
SPSS: Transform → Compute variable
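
Outside SPSS, the same computations might look as follows in Python. The respondent record, the `birth_year` key, and the data collection year are hypothetical values chosen for illustration; only the formulas come from the examples above.

```python
def reverse_code(value, scale_min=1, scale_max=5):
    """Reverse a Likert item: on a 1-5 scale, 5 -> 1, 4 -> 2, 3 -> 3, 2 -> 4, 1 -> 5."""
    return scale_min + scale_max - value


# Hypothetical respondent; keys mirror the item labels used in the examples.
respondent = {"4A": 4, "4B": 2, "4C": 5, "4D": 3, "birth_year": 1980}
data_collection_year = 2014  # assumed for illustration

# Transform the reversely coded item.
respondent["4B_re"] = reverse_code(respondent["4B"])

# Compute the summated scale: (4A + 4B_re + 4C + 4D) / 4.
job_satisfaction = (
    respondent["4A"] + respondent["4B_re"] + respondent["4C"] + respondent["4D"]
) / 4

# Transform another variable: age = data collection year - year of birth.
age = data_collection_year - respondent["birth_year"]
```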

We need to use the following primary variables

- Job satisfaction (4A, 4B*, 4C, 4D)
- Risk avoidance (11A*, 11B*, 11C*, 11D*, 11F)
- Perceived managerial support (16A, 16B, 16C, 16D, 16E, 16F)
* = Reverse-coded items

- And the following background variables
- Age (= Data collection year - Birth year)
- Gender (1 = Female, 0 = others)
- Firm membership (three different firms → Create 3 dummy variables)
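
As a rough sketch of the dummy coding described above, assuming gender is stored as text and firm membership as the codes 1, 2, and 3 (both storage formats, and the helper name `add_dummies`, are assumptions made for this example):

```python
# Hypothetical raw data.
employees = [
    {"gender": "female", "firm": 1},
    {"gender": "male", "firm": 3},
    {"gender": "female", "firm": 2},
]


def add_dummies(row, firms=(1, 2, 3)):
    """Return a copy of the row with a gender dummy and one dummy per firm."""
    coded = dict(row)
    coded["female"] = 1 if row["gender"] == "female" else 0  # 1 = Female, 0 = others
    for firm in firms:
        coded[f"firm_{firm}"] = 1 if row["firm"] == firm else 0
    return coded


coded_employees = [add_dummies(e) for e in employees]
```

Each coded row has exactly one firm dummy equal to 1, because every employee belongs to exactly one of the three firms.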

Analyze → Scale → Reliability analysis

Go to “statistics;” select all items from “descriptives for”

- Job satisfaction (4A, 4B_re, 4C, 4D): Alpha = .79 → ok
- Risk avoidance (11A_re, 11B_re, 11C_re, 11D_re, 11F): Alpha = .65; but .82 if we drop item 11F → drop it → ok
- Perceived managerial support (16A, 16B, 16C, 16D, 16E, 16F): Alpha = .96 → ok
- Transform → Compute variable
- E.g. Target variable: Jobsat; Numeric expression: (4A + 4B_re + 4C + 4D) / 4
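
To make the alpha values less of a black box, here is a minimal sketch of Cronbach's alpha using the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). The responses below are hypothetical and are not the course data, so the result will not match the values reported above.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all in the same respondent order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from five people to the four job satisfaction items.
item_4a = [4, 3, 5, 2, 4]
item_4b_re = [4, 2, 5, 2, 3]
item_4c = [5, 3, 4, 1, 4]
item_4d = [4, 3, 5, 2, 5]

alpha = cronbach_alpha([item_4a, item_4b_re, item_4c, item_4d])

# "Alpha if item deleted": recompute alpha without one item, as in the 11F example.
alpha_without_4d = cronbach_alpha([item_4a, item_4b_re, item_4c])
```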

Normal distribution

Homoscedasticity

- I.e. equal levels of variance across the range of predictor variables
Linearity

Absence of correlated errors

- I.e. relevant but unmeasured variables do not bias the results

SPSS: Graphs → Chart builder

Like in Excel, you find bar, line, area, and pie charts that you may use to depict your data

Particularly useful graphs:

- Scatter plots
- Histograms
- Boxplots
- Correlation analysis gives you an overview of how different variables are related to one another

Linear regression line

Locally weighted regression line

How does employee age relate to role clarity?

Or, perhaps the effect is not linear after all…

How are the values for employee role clarity distributed?

Are distributions of role clarity any different for male and female employees?

Analyze → Descriptive statistics → Descriptives → Options → Skewness and kurtosis

When skewness and kurtosis are both close to zero, the variable is approximately normally distributed
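
For intuition, skewness and kurtosis can be computed from the central moments of a variable. The sketch below uses the plain moment-based definitions; statistical packages typically apply small-sample corrections, so their output can differ somewhat for small samples. The function name and the two example variables are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) based on central moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skew, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric variable
right_tailed = [1, 1, 2, 2, 3, 10]     # hypothetical variable with a long right tail

sym_skew, sym_kurt = skewness_and_kurtosis(symmetric)
right_skew, right_kurt = skewness_and_kurtosis(right_tailed)
```

The symmetric variable has zero skewness and a negative kurtosis value (a flat distribution), while the long right tail produces positive skewness.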

Positively skewed distribution

- Common remedy: transform the variable by taking its logarithm or square root

Negatively skewed distribution

- Common remedy: transform the variable by taking its squared or cubed term

Peaked distribution - positive kurtosis value

- Common remedy: try all transformations

Flat distribution - negative kurtosis value

- Common remedy: transform the variable by taking its inverse (1 / X or 1 / Y)
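
As a small illustration of how a transformation changes the shape of a distribution, the sketch below applies the square root and the logarithm to a positively skewed variable and compares the skewness before and after. The `incomes` values and the `skewness` helper are hypothetical, and skewness is computed with the plain moment-based formula.

```python
import math


def skewness(values):
    """Moment-based skewness (no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positively skewed variable (all values must be > 0 for the logarithm).
incomes = [20, 22, 25, 27, 30, 35, 40, 60, 120]

sqrt_incomes = [math.sqrt(x) for x in incomes]
log_incomes = [math.log(x) for x in incomes]

raw_skew = skewness(incomes)
sqrt_skew = skewness(sqrt_incomes)
log_skew = skewness(log_incomes)
```

Both transformations pull in the long right tail; the logarithm does so more strongly than the square root.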

How is risk avoidance related to employee age?

What can you tell about the distribution of job satisfaction?

Does the level of perceived managerial support vary between firms?

Classroom exercise

Older employees tend to be more risk averse than younger employees

Outliers are observations that deviate substantially from other observations

The key question is: why is it that the outlier observation is so different?

In general, if the outlier observation seems to be caused by a mistake, then it should be deleted

- E.g. a respondent’s birth year is marked as 1776
If the outlier observation is substantially different from other observations, but nevertheless a “legitimate member” of the sample, it should be retained

- E.g. annual salaries of some (very few) individuals are millions of euros
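
One common way to flag candidate outliers is the boxplot convention of marking values more than 1.5 interquartile ranges beyond the quartiles. That rule is a widespread convention rather than something prescribed in this lecture, and the birth years below are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


# Hypothetical birth years, including one obvious data-entry mistake.
birth_years = [1776, 1962, 1970, 1975, 1981, 1984, 1988, 1990]
flagged = iqr_outliers(birth_years)
```

Flagging only identifies candidates. Whether a flagged observation is deleted or retained still depends on whether it is a mistake or a legitimate member of the sample.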

Why do these three individuals have such a low level of role clarity?

Should we remove these outliers from the analysis?

Correlation analysis shows you how different variables are related to one another

When the sample size increases, even relatively weak correlations become statistically significant

Because of multicollinearity, you do not want to include strongly correlated independent variables in the same model

Note: correlation does not imply causation!

SPSS: Analyze → Correlate → Bivariate
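
To see why sample size matters for significance, the sketch below computes a Pearson correlation and the t statistic used to test it, t = r * sqrt((n - 2) / (1 - r^2)). The age and risk avoidance values are hypothetical, and the comparison simply holds r fixed at .2 while changing n.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical data: employee age and risk avoidance score.
ages = [25, 31, 38, 44, 52, 60]
risk_avoidance = [2.0, 2.5, 2.4, 3.1, 3.4, 3.8]
r = pearson_r(ages, risk_avoidance)

# The same weak correlation yields a much larger t statistic in a larger sample.
t_small_sample = t_statistic(0.2, 30)
t_large_sample = t_statistic(0.2, 300)
```

With 30 observations the t statistic for r = .2 stays below 2, while with 300 observations it is well above 3, so the same weak correlation goes from non-significant to significant at conventional cutoffs.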

Dependent variable:

- Job satisfaction
Independent variables:

- Risk avoidance
- Perceived managerial support
- Control variables:
- Gender
- Age
- Firm affiliation

In most datasets, correlations above .2 are statistically significant. Correlations above .5 are very strong

Usual cutoff values:

p < .001

p < .01

p < .05

This correlation is not significant. Will that be a problem?

What do these correlations tell us?

Should we exclude a firm dummy from the regression model?

Will risk avoidance cause job satisfaction?

What is the role of employee age?

Besides correlations, researchers usually report means and standard deviations of the variables

- Mean is informative as it gives you a fairly good understanding of the overall level of variables.
E.g. it makes quite a difference whether the mean value of job satisfaction is 2.2 or 3.9 (on a 5-point Likert scale)

- Standard deviation is informative, because it helps you interpret high and low values. Values one standard deviation above the mean value are usually considered as “high,” and values one standard deviation below the mean value are “low”
E.g. if job satisfaction’s mean value is 3.9 and its standard deviation is 0.4, then job satisfaction of 4.3 is “high” and 3.5 is “low.”
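
The high and low cutoffs can be computed directly from the data. In this sketch the scores are hypothetical and were chosen so that the mean is 3.9 and the standard deviation 0.4, matching the example above; the helper name `high_low_cutoffs` is also made up for illustration.

```python
from statistics import mean, stdev


def high_low_cutoffs(scores):
    """Return (low, high): one standard deviation below and above the mean."""
    m = mean(scores)
    sd = stdev(scores)  # sample standard deviation
    return m - sd, m + sd


# Hypothetical job satisfaction scores with mean 3.9 and standard deviation 0.4.
job_satisfaction = [3.5, 3.5, 3.9, 4.3, 4.3]
low_cutoff, high_cutoff = high_low_cutoffs(job_satisfaction)
```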