Sampling in Statistics: Understanding Standardized Normal Distribution

Statistics in SPSS, Lecture 5 • Petr Soukup, Charles University in Prague

Sampling

Why sampling? • Sample vs. population • Money, money, money: surveying a whole population is usually too expensive • In practice we have only a sample

Sample types • Random (probability): simple, multistage, cluster, ... • Purposive: quota • The following tools for statistical inference can be used only for randomly sampled data

Standardized normal distribution

Stand. normal distribution • Author: Carl Friedrich Gauss (hence Gaussian distribution) • A model that many variables follow • It is wise to know about it

Stand. normal distribution • Mean is equal to 0 • Standard deviation (and variance) is equal to 1 • We use the symbol N(0,1)

Stand. normal distribution • Six sigma rule: nearly all values (about 99.7 %) are covered by a range six standard deviations wide, i.e. from -3 to +3

Stand. normal distribution • 5 % of values are above 1.96 or below -1.96
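The two figures quoted for N(0,1), the six sigma rule and the 1.96 cut-off, can be checked numerically. The following is a minimal sketch using Python's standard library rather than SPSS; the variable names are illustrative and not part of the lecture.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # the standardized normal distribution N(0,1)

# Share of values within +/- 3 standard deviations (a range six SDs wide)
within_six_sigma = z.cdf(3) - z.cdf(-3)

# Share of values above 1.96 or below -1.96 (both tails together)
two_tail_share = z.cdf(-1.96) + (1 - z.cdf(1.96))

# Cut-off that leaves 2.5 % of values in the upper tail
upper_cutoff = z.inv_cdf(0.975)

print(f"within +/-3 SD:    {within_six_sigma:.4f}")
print(f"outside +/-1.96:   {two_tail_share:.4f}")
print(f"97.5th percentile: {upper_cutoff:.2f}")
```

The first value should come out close to 0.997 and the second close to 0.05, which is where the rounded multiplier 2 used for 95 % intervals comes from.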

Sampling distribution

Sampling distribution • Basic idea (hypothetical): we draw an infinite number of samples and compute some descriptive statistic* (e.g. the mean) for each • Sampling distribution = distribution of the statistic across the individual samples • It usually follows some well-known distribution (mainly the normal distribution) • *In sampling we use only the term statistic (instead of descriptive statistic)
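The idea of repeated sampling can be imitated on a computer with a large but finite number of samples. The sketch below is an illustration only: the population is invented (a skewed exponential distribution with mean 50), and the sample size of 100 and the 2,000 repetitions are arbitrary assumptions.

```python
import random
import statistics

random.seed(5)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values from a skewed (exponential) distribution
population = [random.expovariate(1 / 50) for _ in range(10_000)]


def sample_means(pop, sample_size, n_samples):
    """Draw n_samples simple random samples and return the mean of each."""
    return [
        statistics.mean(random.sample(pop, sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(population, sample_size=100, n_samples=2_000)

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

print(f"population mean:           {pop_mean:.2f}")
print(f"mean of the sample means:  {statistics.mean(means):.2f}")
print(f"SD of the sample means:    {statistics.stdev(means):.2f}")
print(f"population SD / sqrt(100): {pop_sd / 100 ** 0.5:.2f}")
```

Even though the population is skewed, the sample means cluster around the population mean, and their spread is close to the population standard deviation divided by the square root of the sample size.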

Field’s example

Sampling distribution

Online simulation http://onlinestatbook.com/stat_sim/sampling_dist/index.html

Sampling distribution • Basic statistic: standard error • S.E. = standard deviation of the sampling distribution • Computation: $\mathrm{S.E.} = s / \sqrt{N}$, where $s$ is the standard deviation of the variable and $N$ is the sample size
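As a small worked illustration of the standard error of the mean, the sketch below computes s divided by the square root of N for a single sample. The function name and the data (ages of nine respondents) are hypothetical.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: s / sqrt(N), with s the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical data: ages of nine respondents
ages = [23, 31, 35, 42, 28, 39, 51, 46, 30]
se = standard_error(ages)

print(f"mean = {statistics.mean(ages):.2f}, S.E. = {se:.2f}")
```

Note that `statistics.stdev` divides by N - 1, the usual convention for a sample standard deviation.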

Computation of the standard deviation of the sampling distribution (STANDARD ERROR) • SPSS: ANALYZE-DESCRIPTIVE STATISTICS-EXPLORE (for mean) • SPSS: ANALYZE-DESCRIPTIVE STATISTICS-EXPLORE (for proportion of a binary variable); tip: use 0/1 coding • Question: how would you compute it for one category of a nominal or ordinal variable?
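One possible approach to the question about a single category, offered as a sketch rather than as the lecture's own answer, is to recode that category into a 0/1 indicator and treat it like any binary variable. The party data and the variable names below are invented for illustration.

```python
import math
import statistics

# Hypothetical nominal variable: preferred party of ten respondents
party = ["A", "B", "A", "C", "A", "B", "A", "C", "A", "B"]

# Recode the category of interest into a 0/1 indicator
is_a = [1 if p == "A" else 0 for p in party]

n = len(is_a)
p_hat = statistics.mean(is_a)  # mean of a 0/1 variable = proportion of 1s

# Standard error computed as for any mean: s / sqrt(N)
se_from_sd = statistics.stdev(is_a) / math.sqrt(n)

# Common formula for a proportion; it divides by N where stdev uses N - 1
se_formula = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"proportion of A = {p_hat:.2f}")
print(f"S.E. via s / sqrt(N):      {se_from_sd:.4f}")
print(f"S.E. via sqrt(p(1-p) / N): {se_formula:.4f}")
```

The two standard errors differ only slightly, and the difference shrinks as the sample grows.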

Confidence interval (CI) • Tries to cover (estimate) the unknown population parameter by a range • Mostly 95 % coverage (intervals) • Normal distribution: MEAN ± 2*SD covers about 95 % of values • Conf. int.: MEAN ± 2*S.E. (95 %), where 2 rounds the exact multiplier 1.96 • etc.

Usage of STANDARD ERROR: Confidence interval for mean • SPSS: ANALYZE-DESCRIPTIVE STATISTICS-EXPLORE (for mean) • Computation: MEAN ± 2*S.E. (95 %)
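The computation for the mean can be sketched in a few lines. The example below uses the normal-approximation multiplier (about 1.96, rounded to 2 on the slide); statistical packages typically use a t-based multiplier instead, so their interval may be slightly wider for small samples. The function name and the income data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(values, coverage=0.95):
    """Normal-approximation confidence interval for the mean: mean +/- z * S.E."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95 %
    return mean - z * se, mean + z * se


# Hypothetical data: monthly income (in thousands) of 12 respondents
income = [21, 25, 19, 32, 28, 24, 30, 22, 27, 35, 26, 23]
low, high = mean_confidence_interval(income)

print(f"mean = {statistics.mean(income):.2f}, 95 % CI = ({low:.2f}, {high:.2f})")
```

A common reading of such an interval: if the sampling were repeated many times, about 95 % of the intervals built this way would cover the population mean.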

Usage of STANDARD ERROR: Confidence interval for proportion • SPSS: ANALYZE-DESCRIPTIVE STATISTICS-EXPLORE (for proportion) • Computation: PROPORTION ± 2*S.E. (95 %); with 0/1 coding the mean of the variable is the proportion • Use 0/1 coding
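For a binary variable the same rule of thumb can be applied to the proportion. The sketch below assumes 0/1 coding and uses the common formula sqrt(p(1-p)/N) for the standard error; the voting data and the function name are invented for illustration.

```python
import math


def proportion_confidence_interval(binary_values, multiplier=2.0):
    """Rule-of-thumb interval: proportion +/- multiplier * S.E. for 0/1-coded data."""
    n = len(binary_values)
    p_hat = sum(binary_values) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since a proportion cannot leave that range
    return max(0.0, p_hat - multiplier * se), min(1.0, p_hat + multiplier * se)


# Hypothetical data: 1 = respondent voted, 0 = did not vote (200 respondents)
voted = [1] * 124 + [0] * 76
low, high = proportion_confidence_interval(voted)

print(f"proportion = {sum(voted) / len(voted):.2f}, 95 % CI = ({low:.3f}, {high:.3f})")
```

With 124 voters out of 200 the proportion is 0.62 and the interval runs from roughly 0.55 to 0.69.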

HW5 • Try to compute a confidence interval for a mean (one cardinal variable) and for a proportion (one binary variable). Interpret the results.

Thanks for your attention
