
Jan. 20-23


harriett

Presentation Transcript


  1. Jan. 20-23 • Shapes of distributions… • “Statistics” for one quantitative variable… • Mean and median • Percentiles • Standard deviations • Transforming data… • Rescale: Y = c times X • Recenter: Y = X plus a • adding variables to each other • other transformations

  2. Shape of a distribution… • Outliers • Unimodal --- Bimodal --- Multimodal • Symmetrical • Skew - right or left?

  3. Colleges – Datadesk histogram

  4. GE daily changes ($/share)

  5. NH polls, 1/26/04 - errors

  6. Population • vs. • Sample

  7. A statistic is • anything that can be computed from data.

  8. STATISTICS of a single quantitative variable • MEAN • MEDIAN • QUARTILES ( Q1, Q3 ) • Five-number summary • Boxplots • Interquartile range • PERCENTILES / QUANTILES / FRACTILES • (“quantiles” and “fractiles” are synonyms for “percentiles” for people who don’t like the implied multiplication by 100) • STANDARD DEVIATION • VARIANCE

  9. Statistics of one variable… • MEAN — Sum of values, divided by n • MEDIAN — Middle value • (when values are ranked, smallest to largest) • (or, average of two middle values)
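The two definitions on slide 9 can be written out directly. This is a minimal sketch; the sample values are made up for illustration and are not the colleges or salaries data shown in the slides.

```python
def mean(values):
    """Sum of the values divided by n."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ranked data, or the average of the two middle values."""
    ranked = sorted(values)          # rank smallest to largest
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                   # odd n: a single middle value exists
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2   # even n: average the two middle values


# Made-up illustrative data.
odd_sample = [3, 1, 4, 1, 5]
even_sample = [3, 1, 4, 1, 5, 9]

print(mean(odd_sample), median(odd_sample))
print(mean(even_sample), median(even_sample))
```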

  10. Number of Colleges (ranked)

  11. Colleges – Datadesk histogram median — 5 mean — 5.36

  12. Salaries

  13. salaries median — 60,000 mean — 106,875

  14. So, which measure of “center” is best? • All the measures agree (roughly) when the distribution is symmetrical • Mean has attractive mathematical properties • Also, the mean is related to the total, if that’s what you care about • Median may be more “typical” when the distribution is non-symmetrical • A measure is “robust” if it works reasonably well under a wide variety of circumstances • Medians are robust
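One way to see what "robust" means here is to add a single extreme value and compare how far each measure moves. The salary figures below are hypothetical, invented for this sketch; they are not the data behind slide 13.

```python
from statistics import mean, median

# Hypothetical salaries, invented for illustration.
salaries = [40_000, 50_000, 60_000, 70_000, 80_000]
with_outlier = salaries + [1_000_000]   # one very large salary added

# How far does each measure of center move when the outlier is added?
mean_shift = mean(with_outlier) - mean(salaries)
median_shift = median(with_outlier) - median(salaries)

print("mean moved by:  ", mean_shift)
print("median moved by:", median_shift)
```

In this made-up example the mean moves by far more than the median, which is the sense in which the median is the more robust of the two.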

  15. Jan. 23 • RMS, Geometric mean • Percentiles, Quartiles (Q1, Q3), BOX PLOTS • Measures of spread: • IQR (range containing middle half) • Standard deviation ( σ, s ) • Variance • Transforming data… • Rescale: Y = c times X • Recenter: Y = X plus a • adding variables to each other • other transformations • “STANDARDIZING” a variable • NORMAL DISTRIBUTIONS

  16. Computing percentiles • To calculate 20-th percentile: • Rank the values from smallest to largest • Compute 20% of n… 20% of 72 = 14.4 • Count off that many values (from lowest)… • The value at which you stop is the 20-th percentile. • What if you stop between values ?
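The counting-off procedure on slide 16 can be sketched as follows. The slide leaves open what to do when the count falls between values; the rule used here (round up when the count is not a whole number, average two neighbours when it is) is one common convention and is an assumption, not something the slides state. The function name and the data are hypothetical.

```python
import math


def percentile_by_counting(values, pct):
    """Rank the values, compute pct% of n, and count off that many from the lowest.

    Assumed convention (not from the slides): if pct% of n is not a whole
    number, stop at the next value up; if it is a whole number, average
    that value with the one after it.
    """
    ranked = sorted(values)
    n = len(ranked)
    position = pct * n / 100             # e.g. 20% of 72 = 14.4
    if position != int(position):        # stopped between two values
        return ranked[math.ceil(position) - 1]
    k = int(position)
    if k == 0:
        return ranked[0]
    if k == n:
        return ranked[-1]
    return (ranked[k - 1] + ranked[k]) / 2


data = list(range(1, 73))                # 72 made-up values: 1, 2, ..., 72
print(percentile_by_counting(data, 20))  # 14.4 rounds up to the 15th value -> 15
print(percentile_by_counting(data, 50))  # 36 is whole: average of 36th and 37th -> 36.5
```

Statistical packages use several slightly different interpolation rules, so their answers can differ a little from this sketch on small data sets.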

  17. Number of Colleges

  18. QUARTILES • Lower quartile (Q1) = 25-th percentile • Upper quartile (Q3) = 75-th percentile • ( What’s Q2 ? ) • INTERQUARTILE RANGE ( IQR ) = Q3 minus Q1

  19. Five-number summary • — maximum • — Q3 • — median • — Q1 • — minimum
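A five-number summary and the IQR can be computed with Python's standard library, as a rough sketch. The data are made up, and the `method="inclusive"` quartile convention is an assumption; other conventions (including whatever Datadesk uses) may give slightly different Q1 and Q3.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]       # made-up illustrative data
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1                            # interquartile range = Q3 minus Q1

print(low, q1, med, q3, high)
print("IQR:", iqr)
```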

  20. VARIANCE and STANDARD DEVIATION • VARIANCE (s²): • STANDARD DEVIATION (s):
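The formulas on slide 20 did not survive in this transcript. The sketch below uses the usual sample definitions, with an n - 1 divisor for the variance and the square root of the variance for the standard deviation. That divisor is an assumption about what the slide showed; the data are made up.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed divisor)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up illustrative data
print(sample_variance(data), sample_std_dev(data))

# Cross-check against the standard library, which also divides by n - 1.
print(statistics.variance(data), statistics.stdev(data))
```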

  21. Linear Transformations • If you MULTIPLY or DIVIDE a variable by a constant… • Y = c times X Y = X / c • then… • measures of center are multiplied or divided by c • measures of spread are multiplied or divided by |c| • If you ADD or SUBTRACT a constant from a variable… • Y = X + a Y = X – a • then… • measures of center are increased (decreased) by a • measures of spread are UNCHANGED.
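The rules on slide 21 are easy to check numerically. This is a small sketch with made-up values; a negative c is chosen on purpose to show why the spread is scaled by |c| rather than c.

```python
import math
from statistics import mean, stdev

x = [2.0, 4.0, 4.0, 5.0, 10.0]           # made-up illustrative data
c, a = -3.0, 7.0                         # arbitrary constants for the demo

rescaled = [c * v for v in x]            # Y = c times X
recentered = [v + a for v in x]          # Y = X plus a

# Rescaling: center is multiplied by c, spread by |c|.
print(mean(rescaled), c * mean(x))
print(stdev(rescaled), abs(c) * stdev(x))

# Recentering: center shifts by a, spread is unchanged.
print(mean(recentered), mean(x) + a)
print(stdev(recentered), stdev(x))
```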

  22. More transformations • ADDING VARIABLES: • W = X + Y • Mean (W) = Mean (X) + Mean (Y) • Standard Deviation of (W) — anything can happen • OTHER TRANSFORMATIONS: • Y = X squared ? • Y = log (X) ? • …NO RELIABLE RULES for mean • or std. dev.
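A quick illustration of "anything can happen" for the standard deviation of a sum: the two made-up Y variables below have the same mean and the same standard deviation, yet W = X + Y has very different spread depending on whether Y moves with X or against it.

```python
import math
from statistics import mean, stdev

x = [1.0, 2.0, 3.0, 4.0, 5.0]            # made-up illustrative data
y_same = [1.0, 2.0, 3.0, 4.0, 5.0]       # rises when x rises
y_opposite = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls when x rises

w_same = [xi + yi for xi, yi in zip(x, y_same)]
w_opposite = [xi + yi for xi, yi in zip(x, y_opposite)]

# The mean of the sum is the sum of the means in both cases.
print(mean(w_same), mean(w_opposite), mean(x) + mean(y_same))

# The standard deviation of the sum depends on how X and Y move together.
print(stdev(w_same), stdev(w_opposite))
```

Here the sum has twice the spread of X in the first case and no spread at all in the second.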

  23. Standardized Variables • Write x̄ and S for the mean and standard deviation of X • Then form the transformed variable: • Z = (X - x̄) / S • Then… • mean (Z) = 0 • std dev (Z) = 1 • Z answers the question: How many standard deviations is this value above (or below) the mean?
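Standardizing can be sketched in a few lines. The function name `standardize` and the data are hypothetical; the sample standard deviation from Python's `statistics` module is used for S, which is an assumption about which version the slide intends.

```python
from statistics import mean, stdev


def standardize(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up illustrative data
z = standardize(data)

print(z)                   # each entry: how many std devs above or below the mean
print(mean(z), stdev(z))   # approximately 0 and 1, up to rounding error
```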

  24. Jan. 25 • More on transforming and standardizing variables • More on normal distributions • Jan. 27++ • Relations among variables: • scatterplots • “independent” variables • correlations • linear regressions (best fit lines)

  25. Normal Density Function • X ~ N(μ, σ) • μ = mean, σ = std. dev. • (Why Greek? Why not x-bar, s?)

  26. Trying the integral • Standard normal: mean = 0, std. dev. = 1 • Density curve: f(z) = (1/√(2π)) · e^(-z²/2) • …so the area between a and b is: the integral of f(z) dz from z = a to z = b
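The standard normal density has no elementary antiderivative, so the area between a and b is found numerically in practice. The sketch below approximates it with a simple midpoint rule; the function names and the step count are arbitrary choices made for this illustration.

```python
import math


def standard_normal_density(z):
    """Density of the standard normal (mean 0, std. dev. 1) at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area_between(a, b, steps=10_000):
    """Midpoint-rule approximation of the area under the density from a to b."""
    width = (b - a) / steps
    heights = (standard_normal_density(a + (i + 0.5) * width) for i in range(steps))
    return width * sum(heights)


print(area_between(-1, 1))   # roughly 0.68
print(area_between(-2, 2))   # roughly 0.95
```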

  27. The core computation • If X ~ N(μ, σ), what fraction of values are between a and b ? • Rule of 68 – 95 – 99.7 • Standardizing • Tables and computers • Reversing the calculation

  28. Standardizing • Same Question: • Is X between a and b ? • Is (X - μ)/σ between (a - μ)/σ and (b - μ)/σ ? • But Z = (X - μ)/σ is a variable with a standard normal distribution (mean 0, standard deviation 1). • So, if we can answer this question for standard normals, we can answer it for all normals.
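The standardizing argument turns directly into a computation: convert a and b to standard units, then look up the standard normal areas. In this sketch the table lookup is replaced by Python's `math.erf`; the function names and the values 100 and 15 for the mean and standard deviation are hypothetical.

```python
import math


def standard_normal_cdf(z):
    """Fraction of a standard normal distribution lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def fraction_between(a, b, mu, sigma):
    """Fraction of a normal distribution with the given mean and std. dev. between a and b."""
    z_a = (a - mu) / sigma               # standardize the lower endpoint
    z_b = (b - mu) / sigma               # standardize the upper endpoint
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)


# Hypothetical normal distribution with mean 100 and std. dev. 15.
mu, sigma = 100, 15
for k in (1, 2, 3):
    share = fraction_between(mu - k * sigma, mu + k * sigma, mu, sigma)
    print(f"within {k} std. dev.: {share:.4f}")
```

The three printed fractions are the ones summarized by the 68 – 95 – 99.7 rule, and they come out the same for any choice of mean and standard deviation.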
