Standardization of variables

Maarten Buis
5-12-2005

Recap
  • Central tendency
  • Dispersion
  • SPSS
Standardization
  • Standardization is used to improve the interpretability of variables.
  • Some variables have a natural interpretable metric: e.g. income, age, gender, country.
  • Others, primarily ordinal variables, do not: e.g. education, attitude items, intelligence.
  • Standardizing these variables makes them more interpretable.
  • Standardizing means transforming the variable to a comparable metric:
    • known unit
    • known mean
    • known standard deviation
    • known range
  • Three ways of standardizing:
    • P-standardization (percentile scores)
    • Z-standardization (z-scores)
    • D-standardization (dichotomize a variable)
When you should always standardize
  • When averaging multiple variables, e.g. when creating a socioeconomic status variable out of income and education.
  • When comparing the effects of variables with unequal units, e.g. does age or education have a larger effect on income?
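The first case can be sketched in Python. This is only an illustration: the income and education values are made up, the helper name `z_scores` is hypothetical, and the population standard deviation (`pstdev`) is an assumption; the course may use the sample version instead.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / SD for each value (population SD, an assumption)."""
    m = mean(values)
    sd = pstdev(values)
    return [(x - m) / sd for x in values]

# Hypothetical data: monthly income in guilders and years of education.
income = [1500, 2000, 2500, 3000, 6000]
education = [8, 10, 12, 16, 14]

# Averaging the raw values would let income dominate because of its larger unit.
# Averaging z-scores gives both variables the same weight.
ses = [(zi + ze) / 2 for zi, ze in zip(z_scores(income), z_scores(education))]
print([round(s, 2) for s in ses])
```

Because both inputs have mean 0 after standardizing, the combined score is also centered on 0.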
P-standardization
  • Every observation is assigned a number between 0 and 100, indicating the percentage of observations below it.
  • Can be read from the cumulative distribution
  • In case of ties: assign midpoints
  • The median, quartiles, quintiles, and deciles are special cases of P-scores.
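A minimal sketch of how such percentile scores could be computed. The function name `p_scores` is hypothetical, and the midpoint rule is implemented here as "count tied observations, including the observation itself, as half below", which is one common convention and an assumption about what the slides intend.

```python
def p_scores(values):
    """Percentage of observations below each value, with tied
    observations (including the value itself) counted as half below."""
    n = len(values)
    scores = []
    for x in values:
        below = sum(1 for v in values if v < x)
        tied = sum(1 for v in values if v == x)  # includes x itself
        scores.append(100 * (below + 0.5 * tied) / n)
    return scores

data = [10, 20, 20, 30, 40]  # hypothetical observations with one tie
print(p_scores(data))  # [10.0, 40.0, 40.0, 70.0, 90.0]
```

The two tied observations share the midpoint of the range they jointly occupy, and the scores still average 50.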
P-standardization
  • Turns the variable into a ranking, i.e. it turns the variable into an ordinal variable.
  • It is a non-linear transformation: relative distances change
  • Results in a fixed mean, range, and standard deviation: M=50, SD≈28.9. These can change slightly due to ties
  • A histogram of a P-standardized variable approximates a uniform distribution
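These properties can be checked numerically. The sketch below assumes untied data and midpoint scores, in which case the standard deviation comes out near that of a uniform distribution on 0 to 100, 100/√12 ≈ 28.9; ties pull it slightly lower. The sample size and the raw values are made up.

```python
from statistics import mean, pstdev

n = 1000  # hypothetical number of observations, all with distinct values
# With no ties, the i-th smallest observation gets this midpoint score.
p = [100 * (i + 0.5) / n for i in range(n)]

print(round(mean(p), 2), round(pstdev(p), 2))

# Relative distances change: raw gaps of 1, 1 and 97 all become 25 points.
raw = [1, 2, 3, 100]  # already sorted, no ties
p_raw = [100 * (rank + 0.5) / len(raw) for rank in range(len(raw))]
print(p_raw)  # [12.5, 37.5, 62.5, 87.5]
```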
Linear transformation
  • Say you want income in thousands of guilders instead of guilders.
  • To do so, divide INCMID by 1000 (f 1000,-)
Linear transformation
  • Say you want to know the deviation from the mean
  • Subtract the mean (f2543,-) from INCMID
Linear transformation
  • Adding a constant (X’ = X+c)
    • M(X’) = M(X)+c
    • SD(X’) = SD(X)
  • Multiplying by a constant (X’ = X*c)
    • M(X’) = M(X)*c
    • SD(X’) = SD(X) * |c|
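The four rules above can be verified on a small example. The incomes and constants below are made up for illustration, and the population standard deviation is used (an assumption; the rules hold for the sample version as well).

```python
import math
from statistics import mean, pstdev

x = [1200, 1800, 2500, 3100, 4100]  # hypothetical incomes in guilders

c_add = 500
shifted = [v + c_add for v in x]   # X' = X + c

c_mul = -0.001                     # negative on purpose, to show the |c|
scaled = [v * c_mul for v in x]    # X' = X * c

print(mean(shifted), pstdev(shifted))  # mean moves by c, SD is unchanged
print(mean(scaled), pstdev(scaled))    # mean times c, SD times |c|
```

The negative multiplier shows why the rule for the standard deviation needs the absolute value: a standard deviation cannot be negative.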
Z-standardization
  • Z = (X-M)/SD
  • two steps:
    • center the variable (mean becomes zero)
    • divide by the standard deviation (the unit becomes standard deviation)
  • Results in fixed mean and standard deviation: M=0, SD=1
  • Not in a fixed range!
  • Z-standardization is a linear transformation: relative distances remain intact.
Z-standardization
  • Step 1: subtract the mean
  • c = -M(X)
  • M(X’) = M(X)+c
  • M(X’) = M(X)-M(X)=0
  • SD(X’)=SD(X)
Z-standardization
  • Step 2: divide by the standard deviation
  • c = 1/SD(X)
  • M(Z) = M(X’) * c
  • M(Z) = 0 * 1/SD(X) = 0
  • SD(Z) = SD(X’) * c
  • SD(Z) = SD(X) * 1/SD(X) = 1
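The two steps can be written out as a short Python sketch. The function name `z_standardize` and the data are hypothetical, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def z_standardize(values):
    """Z = (X - M) / SD, computed in the two steps described above."""
    m = mean(values)
    centered = [x - m for x in values]   # step 1: mean becomes 0, SD unchanged
    sd = pstdev(centered)                # same as the SD of the original values
    return [x / sd for x in centered]    # step 2: SD becomes 1, mean stays 0

z = z_standardize([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, SD 2
print(z)                    # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z), pstdev(z))   # 0.0 1.0
```

The result has mean 0 and standard deviation 1, but its range (here -1.5 to 2) depends on the data.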
Normal distribution
  • Normal distribution = Gauss curve = Bell curve
  • Formula (McCall p. 120)
    • Note the (x-m)² part
    • apart from that all you have to remember is that the formula is complicated
  • Normal distribution occurs when a large number of small random events cause the outcome: e.g. measurement error
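For reference, the density of the normal distribution in its standard textbook form is given below. The symbol m for the mean follows the (x-m)² note above; the symbol s for the standard deviation is an assumption, and McCall's notation may differ.

```latex
f(x) = \frac{1}{s\sqrt{2\pi}} \exp\!\left( -\frac{(x-m)^2}{2s^2} \right)
```

The (x-m)² term makes the curve symmetric around the mean: observations equally far above and below m get the same density.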
Normal distribution
  • Other examples: the height of individuals, intelligence, attitude
  • But: the variables Education, Income and age in Eenzaam98 are not normally distributed
Z-scores and the normal distribution
  • Z-standardization will not result in a normally distributed variable
  • Standardization is NOT the same as normalization
  • We will not discuss normalization (but it does exist)
  • But: if the original variable is normally distributed, then the z-standardized variable will have a standard normal distribution.
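One way to see that z-standardization leaves the shape of a distribution unchanged is to compare the skewness before and after. The sketch below uses made-up, right-skewed data and a hypothetical `skewness` helper based on the population standard deviation.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Average cubed z-score; 0 for a symmetric distribution."""
    m, sd = mean(values), pstdev(values)
    return mean([((x - m) / sd) ** 3 for x in values])

data = [1, 1, 2, 2, 3, 10, 25]  # hypothetical, clearly right-skewed
m, sd = mean(data), pstdev(data)
z = [(x - m) / sd for x in data]

# The z-scores are exactly as skewed as the original values.
print(round(skewness(data), 3), round(skewness(z), 3))
```

A skewed variable stays skewed after z-standardization; only its mean and unit change.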
Standard normal distribution
  • Normal distribution with M=0 and SD=1.
  • Table A in Appendix 2 of McCall
  • Important numbers (to be remembered):
    • 68% of the observations lie within ± 1 SD of the mean
    • 90% of the observations lie within ± 1.64 SD of the mean
    • 95% of the observations lie within ± 1.96 SD of the mean
    • 99% of the observations lie within ± 2.58 SD of the mean
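These numbers can be reproduced without the table, for example with `NormalDist` from Python's standard library. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal: M=0, SD=1

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.64, 1.96, 2.58):
    print(k, round(100 * share_within(k), 1))
```

The printed percentages round to 68, 90, 95 and 99.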
Why bother?
  • If you know:
    • That a variable is normally distributed
    • the mean and standard deviation
  • Then you know the percentage of observations above or below an observation
  • These numbers are a good approximation, even if the variable is not exactly normally distributed
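As a sketch of this reasoning: convert the observation to a z-score and look up the cumulative proportion of the standard normal distribution. The function name `percent_below` and the example values (mean 100, SD 15, observation 130) are made up for illustration.

```python
from statistics import NormalDist

def percent_below(x, m, sd):
    """Percentage of observations below x, assuming a normal
    distribution with mean m and standard deviation sd."""
    z = (x - m) / sd
    return 100 * NormalDist().cdf(z)

pct = percent_below(130, m=100, sd=15)  # z = 2
print(round(pct, 1), round(100 - pct, 1))  # percentage below and above
```

An observation 2 SD above the mean has roughly 98% of the observations below it and roughly 2% above it.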
P & Z-standardization
  • Both give a distribution with fixed mean, standard deviation, and unit
  • P-standardization also gives a fixed range
  • Both are relative to the sample: if you take observations out, then you have to re-compute the standardized variables
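The dependence on the sample can be illustrated with z-scores. The data and the helper name `z_score` are hypothetical, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def z_score(x, sample):
    """z-score of x relative to the mean and SD of the given sample."""
    return (x - mean(sample)) / pstdev(sample)

full = [2, 4, 4, 4, 5, 5, 7, 9]
subset = [v for v in full if v != 9]  # take the largest observation out

z_full = z_score(7, full)      # relative to mean 5 and SD 2
z_subset = z_score(7, subset)  # mean and SD both change
print(round(z_full, 2), round(z_subset, 2))
```

The same raw value of 7 gets a noticeably higher z-score once the largest observation is removed, because both the mean and the standard deviation of the sample drop.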
P & Z-standardization
  • When interpreting Z-standardized variables one uses percentiles
  • With P-standardization one decreases the scale of measurement to ordinal, BUT this improves interpretability.
Do before Wednesday
  • Read McCall chapter 5
  • Understand Appendix 2, table A
  • Do exercises 5.7-5.28