Standardization of variables


### Standardization of variables

Maarten Buis

5-12-2005

Recap
• Central tendency
• Dispersion
• SPSS
Standardization
• Is used to improve interpretability of variables.
• Some variables have a naturally interpretable metric: e.g. income, age, gender, country.
• Others, primarily ordinal variables, do not: e.g. education, attitude items, intelligence.
• Standardizing these variables makes them more interpretable.
Standardization
• Transforming the variable to a comparable metric
• known unit
• known mean
• known standard deviation
• known range
• Three ways of standardizing:
• P-standardization (percentile scores)
• Z-standardization (z-scores)
• D-standardization (dichotomize a variable)
When you should always standardize
• When averaging multiple variables, e.g. when creating a socioeconomic status variable out of income and education.
• When comparing the effects of variables with unequal units, e.g. does age or education have a larger effect on income?
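A minimal sketch of the first case in Python, assuming hypothetical income and education scores for five respondents (the numbers and the education coding are invented for illustration). Each variable is z-standardized before averaging, so neither one dominates merely because of its unit. Z-standardization itself is explained later in the slides.

```python
from statistics import mean, stdev

def z_scores(values):
    """Z-standardize: subtract the mean, then divide by the sample standard deviation."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical data for five respondents (invented for illustration).
income = [1200, 2500, 1800, 4000, 3100]   # guilders per month
education = [2, 4, 3, 6, 5]               # years of schooling after primary school (assumed coding)

# Averaging raw scores would let income dominate, because its unit is far larger.
# After z-standardization both are measured in standard deviations and weigh equally.
ses = [(zi + ze) / 2 for zi, ze in zip(z_scores(income), z_scores(education))]
print([round(s, 2) for s in ses])
```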
P-standardization
• Every observation is assigned a number between 0 and 100, indicating the percentage of observations below it.
• Can be read from the cumulative distribution
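• In case of knots (ties, i.e. several observations with the same value): give all tied observations the midpoint of the percentages they span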
• The median, quartiles, quintiles, and deciles are special cases of P-scores.
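A small sketch of P-standardization in Python. It assumes one particular convention, P = 100 × (number of observations below + half the number tied with it) / n, which places every observation at the midpoint of its slice of the cumulative distribution; statistical packages differ slightly in the exact formula they use. The education scores are hypothetical.

```python
from bisect import bisect_left, bisect_right

def p_scores(values):
    """P-scores on a 0-100 scale; assumed convention: 100 * (below + tied / 2) / n."""
    ordered = sorted(values)
    n = len(ordered)
    scores = []
    for v in values:
        below = bisect_left(ordered, v)            # observations strictly smaller than v
        tied = bisect_right(ordered, v) - below    # observations equal to v, itself included
        scores.append(100 * (below + tied / 2) / n)
    return scores

education = [1, 2, 2, 2, 3, 4, 4, 5]   # hypothetical ordinal scores
print(p_scores(education))  # [6.25, 31.25, 31.25, 31.25, 56.25, 75.0, 75.0, 93.75]
```

The three tied 2s share one score (31.25), the midpoint of the 12.5% to 50% stretch of the cumulative distribution they occupy.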
P-standardization
• Turns the variable into a ranking, i.e. into an ordinal variable.
• It is a non-linear transformation: relative distances change
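• Results in a fixed mean, range, and standard deviation: M = 50, SD ≈ 28.9 (100/√12, the SD of a uniform distribution from 0 to 100), scores between 0 and 100. These values can change slightly due to knots (ties)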
• A histogram of a P-standardized variable approximates a uniform distribution
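The properties above can be checked numerically. This sketch applies the midpoint convention P = 100 × (below + tied/2) / n (an assumption; other conventions give nearly the same numbers) to a hypothetical, strongly skewed variable without ties. It uses the population SD (`pstdev`); the sample SD (`stdev`) would come out marginally larger.

```python
import math
from bisect import bisect_left, bisect_right
from statistics import mean, pstdev

def p_scores(values):
    """P-scores on a 0-100 scale; assumed convention: 100 * (below + tied / 2) / n."""
    ordered = sorted(values)
    n = len(ordered)
    scores = []
    for v in values:
        below = bisect_left(ordered, v)
        tied = bisect_right(ordered, v) - below
        scores.append(100 * (below + tied / 2) / n)
    return scores

# Hypothetical, strongly right-skewed variable without ties: 0, 1, 4, 9, ...
x = [i ** 2 for i in range(1000)]
p = p_scores(x)

# Raw gaps grow (1, 3, 5, ...) while P-score gaps stay constant: a non-linear transformation.
print(x[1] - x[0], x[999] - x[998])                        # 1 1997
print(round(p[1] - p[0], 6), round(p[999] - p[998], 6))    # 0.1 0.1

# Fixed mean and SD: M = 50, SD close to 100 / sqrt(12), the SD of a uniform 0-100 variable.
print(round(mean(p), 2), round(pstdev(p), 2), round(100 / math.sqrt(12), 2))  # 50.0 28.87 28.87
```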
Linear transformation
• Say you want income in thousands of guilders instead of guilders.
• Divide INCMID by 1000
Linear transformation
• Say you want to know the deviation from the mean
• Subtract the mean (2543 guilders) from INCMID
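Both transformations are one line each in Python. The incomes below are invented (the INCMID data are not reproduced here), so their mean is 2540 rather than the sample mean of 2543 guilders used on the slide.

```python
from statistics import mean

# Hypothetical monthly incomes in guilders (not the actual INCMID values).
incmid = [1500, 2000, 2600, 3200, 3400]

income_thousands = [v / 1000 for v in incmid]   # unit becomes thousands of guilders
m = mean(incmid)                                 # 2540 for these made-up values
deviation = [v - m for v in incmid]             # 0 now means "exactly average income"

print(income_thousands)   # [1.5, 2.0, 2.6, 3.2, 3.4]
print(deviation)          # [-1040, -540, 60, 660, 860]
```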
Linear transformation
• Adding a constant (X’ = X+c)
• M(X’) = M(X)+c
• SD(X’) = SD(X)
• Multiplying by a constant (X’ = X*c)
• M(X’) = M(X)*c
• SD(X’) = SD(X) * |c|
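The two rules can be checked on any set of numbers. The values and the constant below are arbitrary; the sample SD (`statistics.stdev`) is used, but the rules hold for the population SD as well. A negative c is chosen on purpose to show why the absolute value |c| is needed.

```python
import math
from statistics import mean, stdev

x = [2, 4, 4, 5, 7, 9]   # arbitrary illustrative values
c = -3                    # negative on purpose

shifted = [v + c for v in x]   # X' = X + c
scaled = [v * c for v in x]    # X' = X * c

# Adding a constant moves the mean by c and leaves the SD unchanged.
assert math.isclose(mean(shifted), mean(x) + c)
assert math.isclose(stdev(shifted), stdev(x))

# Multiplying moves the mean to M(X) * c and the SD to SD(X) * |c|:
# with c = -3 the mean flips sign, but an SD can never be negative.
assert math.isclose(mean(scaled), mean(x) * c)
assert math.isclose(stdev(scaled), stdev(x) * abs(c))
print(round(stdev(x), 3), round(stdev(scaled), 3))
```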
Z-standardization
• Z = (X-M)/SD
• two steps:
• center the variable (mean becomes zero)
• divide by the standard deviation (the unit becomes standard deviation)
• Results in fixed mean and standard deviation: M=0, SD=1
• Not in a fixed range!
• Z-standardization is a linear transformation: relative distances remain intact.
Z-standardization
• Step 1: subtract the mean
• c = -M(X)
• M(X’) = M(X)+c
• M(X’) = M(X)-M(X)=0
• SD(X’)=SD(X)
Z-standardization
• Step 2: divide by the standard deviation
• c is 1/SD(X)
• M(Z) = M(X’) * c
• M(Z) = 0 * 1/SD(X) = 0
• SD(Z) = SD(X’) * c
• SD(Z) = SD(X) * 1/SD(X) = 1
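The two steps translate directly into code. This is a sketch on hypothetical scores; it assumes the sample standard deviation (n − 1 in the denominator), which is the usual default in statistical software. With the population formula the z-scores differ only by a factor close to 1 in large samples.

```python
from statistics import mean, stdev

def z_scores(values):
    """Z = (X - M) / SD, computed in the two steps shown on the slides."""
    m = mean(values)
    centered = [v - m for v in values]   # step 1: c = -M(X); mean becomes 0, SD unchanged
    sd = stdev(values)                   # sample SD (n - 1 denominator), an assumption
    return [v / sd for v in centered]    # step 2: c = 1/SD(X); SD becomes 1

x = [3, 7, 8, 10, 12]                    # hypothetical raw scores
z = z_scores(x)
print([round(v, 2) for v in z])
print(mean(z), stdev(z))                 # 0 and 1, up to floating-point rounding
```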
Normal distribution
• Normal distribution = Gauss curve = Bell curve
• Formula (McCall p. 120)
• Note the (x − m)² part
• Apart from that, all you have to remember is that the formula is complicated
• A normal distribution arises when the outcome is the combined result of a large number of small, independent random events: e.g. measurement error
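For reference, the usual form of the normal density is given below. The notation is ours and McCall's may differ: μ is the mean (M elsewhere in these slides, m on this slide) and σ the standard deviation. The squared deviation in the exponent means the density depends only on how far x lies from the mean, not on the direction, which is why the curve is symmetric.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```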
Normal distribution
• Other examples: the height of individuals, intelligence, attitudes
• But: the variables Education, Income, and Age in Eenzaam98 are not normally distributed
Z-scores and the normal distribution
• Z-standardization will not result in a normally distributed variable
• Standardization is NOT the same as normalization
• We will not discuss normalization (but it does exist)
• But: if the original variable is normally distributed, then the z-standardized variable will have a standard normal distribution.
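One way to see that z-standardization does not normalize: the shape of a distribution, summarized here by its skewness, is unchanged by a linear transformation. The incomes are hypothetical, and the moment-based skewness formula is one common definition among several.

```python
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the SD cubed."""
    m, sd = mean(values), pstdev(values)
    return mean([(v - m) ** 3 for v in values]) / sd ** 3

def z_scores(values):
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical, right-skewed incomes: a few high earners stretch the right tail.
income = [900, 1100, 1200, 1300, 1500, 1700, 2000, 2600, 4000, 9000]
z = z_scores(income)

# Identical skewness (up to rounding): the z-scores are just as skewed as the raw incomes.
print(round(skewness(income), 3), round(skewness(z), 3))
```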
Standard normal distribution
• Normal distribution with M=0 and SD=1.
• Table A in Appendix 2 of McCall
• Important numbers (to be remembered):
• 68% of the observations lie within ±1 SD of the mean
• 90% of the observations lie within ±1.64 SD of the mean
• 95% of the observations lie within ±1.96 SD of the mean
• 99% of the observations lie within ±2.58 SD of the mean
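These numbers can be reproduced with Python's standard-library `statistics.NormalDist` instead of Table A. The sketch computes the area between −k and +k SD and, in the other direction, the cut-off that leaves a given share in the two tails.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

for k in (1, 1.64, 1.96, 2.58):
    share = std_normal.cdf(k) - std_normal.cdf(-k)   # area between -k and +k SD
    print(f"within ±{k} SD: {share:.1%}")
# within ±1 SD: 68.3%
# within ±1.64 SD: 89.9%
# within ±1.96 SD: 95.0%
# within ±2.58 SD: 99.0%

# Inverse direction: the k that leaves 5%, 2.5%, and 0.5% in each tail.
print([round(std_normal.inv_cdf(p), 3) for p in (0.95, 0.975, 0.995)])  # [1.645, 1.96, 2.576]
```

The 90% figure is rounded: ±1.64 SD covers 89.9%, and the more precise cut-off is 1.645.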
Why bother?
• If you know:
• that a variable is normally distributed, and
• its mean and standard deviation,
• then you know the percentage of observations above or below any given observation
• These numbers remain a good approximation if the variable is approximately, but not exactly, normally distributed
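A sketch of this reasoning, assuming a hypothetical intelligence-style score that is normally distributed with M = 100 and SD = 15 (illustrative parameters, not taken from the slides). Knowing only those two numbers gives the share of observations below and above a score of 120.

```python
from statistics import NormalDist

# Hypothetical: scores assumed normal with M = 100 and SD = 15.
scores = NormalDist(mu=100, sigma=15)
x = 120

z = (x - scores.mean) / scores.stdev   # z-score of the observation
below = scores.cdf(x)                  # share of observations below x
print(round(z, 2), f"{below:.1%} below, {1 - below:.1%} above")  # 1.33 90.9% below, 9.1% above
```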
P & Z standardization
• Both give a distribution with fixed mean, standard deviation, and unit
• P-standardization also gives a fixed range
• Both are relative to the sample: if you take observations out, then you have to recompute the standardized variables
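A small illustration with hypothetical scores: dropping one observation changes the mean and SD, so the same raw score receives a different z-score, here even switching from below to above average.

```python
from statistics import mean, stdev

def z_scores(values):
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]

sample = [2, 4, 5, 7, 12]   # hypothetical scores
reduced = sample[:-1]       # drop the highest observation

# The raw score 5 (index 2) is measured against a different mean and SD after the drop.
print(round(z_scores(sample)[2], 2), round(z_scores(reduced)[2], 2))  # -0.26 0.24
```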
P & Z-standardization
• When interpreting Z-standardized variables, one often translates them into percentiles (e.g. with the standard normal table)
• With P-standardization one reduces the level of measurement to ordinal, BUT this improves interpretability.
Do before Wednesday