Statistical Methods in Computer Science


# Statistical Methods in Computer Science

Data 2: Central Tendency & Variability. Ido Dagan. Statistical Methods in Computer Science. Topics: frequency distributions and scales; characteristics of distributions (shape, central tendency, variability), including distributions that differ in central tendency or in variability; this lesson.




## Presentation Transcript
1. Data 2: Central Tendency & Variability. Ido Dagan, Statistical Methods in Computer Science.

2. Frequency Distributions and Scales

3. Characteristics of Distributions: shape, central tendency, variability. [Figure panels: different central tendency; different variability]

4. This Lesson: Examine measures of central tendency: mode (nominal), median (ordinal), mean (numerical). Examine measures of variability (dispersion): entropy (nominal); variance and standard deviation (numerical). Standard scores (z-scores).

5. Centrality/Variability Measures and Scales

6. The Mode (Mo). The mode of a variable is its most frequent value: $\mathrm{Mo} = \arg\max_x f(x)$, where $f(x)$ is the frequency of value $x$. For a categorical variable: the category that appeared most often. For grouped data: the midpoint of the most frequent interval, under the assumption that values are evenly distributed within the interval.
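The argmax definition of the mode translates directly into code: build the frequency table $f(x)$, then pick the value with the largest count. This is a minimal sketch; the helper name `mode` and the sample protocol labels are illustrative assumptions, and ties are broken by first occurrence (a choice of this sketch, not of the slides).

```python
from collections import Counter

def mode(values):
    """Mo = argmax_x f(x): the value with the highest frequency.

    If several values tie, max() returns the one encountered first."""
    freq = Counter(values)            # frequency table f(x)
    return max(freq, key=freq.get)    # argmax over the observed values x

# Hypothetical categorical (nominal) data
print(mode(["TCP", "UDP", "TCP", "ICMP"]))  # TCP
```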

7. Finding the Mode: Example 1. [Figure: the collection of values that a variable X took during the measurement] The mode? It depends on the grouping.

8. Finding the Mode: Example 2. The mode of a grouped frequency distribution depends on the grouping. [Figure: alternative groupings of the same data; labels 87, 88, 86]
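The dependence of the grouped mode on the grouping can be reproduced on a small made-up data set. The sketch below assumes half-open intervals `[lower + k*width, lower + (k+1)*width)` on a continuous scale; the helper `grouped_mode`, its parameters, and the scores are hypothetical and are not the data from the slide's figure.

```python
from collections import Counter

def grouped_mode(values, lower, width):
    """Midpoint of the most frequent interval [lower + k*width, lower + (k+1)*width)."""
    bins = Counter((v - lower) // width for v in values)  # interval index -> count
    k = bins.most_common(1)[0][0]                          # index of the fullest interval
    return lower + k * width + width / 2                   # its midpoint

# Hypothetical exam scores
scores = [81, 84, 85, 86, 86, 87, 89, 90, 90, 91, 92]

for lower, width in [(80, 5), (82, 5), (80, 10)]:
    print(f"start={lower}, width={width}: Mo = {grouped_mode(scores, lower, width)}")
# start=80, width=5: Mo = 87.5
# start=82, width=5: Mo = 89.5
# start=80, width=10: Mo = 85.0
```

Shifting the interval start by two points, or doubling the width, moves the mode even though the raw scores are unchanged.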

9-14. The Median (Mdn). The median of a variable is its 50th percentile, P50: the point below which 50% of all measurements fall. It requires ordering, so it applies only to the ordinal and numerical scales. Examples: 0, 8, 8, 11, 15, 16, 20 ==> Mdn = 11. 12, 14, 15, 18, 19, 20 ==> Mdn = 16.5 (halfway between 15 and 18). 5, 7, 8, 8, 8, 8 ==> Mdn = ? One method: halfway between the first and second 8, so Mdn = 8. Another: use linear interpolation, as we did for intervals, so Mdn = 7.75, where $7.75 = 7.5 + \tfrac{1}{4} \times 1.0$. Here 7.5 is the lower real limit of the interval containing the 8's (between 7 and 8); two of the six measurements (5 and 7) lie below it, so one more measurement, 1 of the four 8's, is needed to reach the 50% point, giving the fraction ¼; and 1.0 is the width of the interval containing the 8's (real limits 7.5 to 8.5).
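Both methods for the median can be sketched in a few lines. This is an illustration under stated assumptions: `median_midpoint` and `median_interpolated` are hypothetical helper names, and the interpolation assumes each distinct value `x` occupies the real limits `[x - unit/2, x + unit/2]` (with `unit=1.0` for integer data), its tied measurements spread evenly inside.

```python
from collections import Counter

def median_midpoint(values):
    """Middle value; for an even count, halfway between the two middle values."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def median_interpolated(values, unit=1.0):
    """P50 by linear interpolation within the real limits of the interval
    that contains the 50% point (ties assumed spread evenly in the interval)."""
    counts = Counter(values)
    half = len(values) / 2   # how many measurements must lie below P50
    below = 0                # measurements below the current interval
    for x in sorted(counts):
        f = counts[x]
        if below + f >= half:
            lower_limit = x - unit / 2
            return lower_limit + (half - below) / f * unit
        below += f
    raise ValueError("median of empty data")

print(median_midpoint([0, 8, 8, 11, 15, 16, 20]))  # 11
print(median_midpoint([12, 14, 15, 18, 19, 20]))   # 16.5
print(median_midpoint([5, 7, 8, 8, 8, 8]))         # 8.0
print(median_interpolated([5, 7, 8, 8, 8, 8]))     # 7.75
```

On data without ties, such as 12, 14, 15, 18, 19, 20, this interpolation lands on a real limit (15.5) rather than 16.5; any point between 15.5 and 17.5 has exactly half of the measurements below it, so both answers satisfy the P50 definition.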

15. The Arithmetic Mean (mean, for short). "Average" is colloquial and not precisely defined when used, so we avoid the term.
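For reference, the standard definition of the arithmetic mean of $N$ measurements $X_1, \dots, X_N$ is given below (the notation is ours, since the slide's own formula did not survive extraction), followed by a worked example on the first median data set.

$$
\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i,
\qquad
\text{e.g. } \frac{0 + 8 + 8 + 11 + 15 + 16 + 20}{7} = \frac{78}{7} \approx 11.14
$$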

16. Properties of Central Tendency Measures. Mo: relatively unstable between samples; problematic in grouped distributions; there can be more than one (distributions with more than one mode are sometimes called multimodal); for uniform distributions, every value is a possible mode; typically used only on nominal data.
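Because a distribution can have several modes, a helper that returns all of them is often more honest than one that picks a single winner. The sketch below (the name `modes` is hypothetical) returns every value tied for the highest frequency; for a uniform distribution that is every observed value.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]        (bimodal)
print(modes([1, 2, 3, 4]))        # [1, 2, 3, 4]  (uniform: every value is a mode)
print(modes(["a", "b", "a"]))     # ['a']
```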

17. Properties of Central Tendency Measures. Mean: responsive to the exact value of each score; only for interval and ratio scales; takes the total of the scores into account and does not ignore any value; the sum of deviations from the mean is always zero, which makes it sensitive to outliers (the presence or absence of scores at extreme values); stable between samples, and the basis for many other statistical measures.
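The zero-sum property of deviations can be checked exactly with rational arithmetic, which avoids floating-point rounding hiding or faking the result. This is a minimal sketch; `mean` and `sum_of_deviations` are hypothetical helper names, and the scores are the first median example.

```python
from fractions import Fraction

def mean(values):
    # Exact arithmetic mean as a Fraction, so no rounding error creeps in
    return Fraction(sum(values), len(values))

def sum_of_deviations(values):
    m = mean(values)
    return sum(x - m for x in values)   # sum of (X_i - mean)

scores = [0, 8, 8, 11, 15, 16, 20]
print(mean(scores))               # 78/7
print(sum_of_deviations(scores))  # 0
```

Since the deviations must always cancel, a single very large positive deviation has to be balanced by shifting the mean toward it, which is why the mean reacts to extreme scores.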

18. Properties of Central Tendency Measures. Median: robust to extreme values; only cares about ordering, not the magnitude of intervals; often used with skewed distributions. [Figure: positions of Mo, Mdn, and Mean]
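The robustness of the median can be seen by replacing one score with an extreme value. The sketch uses Python's standard `statistics` module; the modified data set is hypothetical.

```python
from statistics import mean, median

scores = [0, 8, 8, 11, 15, 16, 20]
with_outlier = [0, 8, 8, 11, 15, 16, 200]  # hypothetical: 20 replaced by an extreme value

print(median(scores), median(with_outlier))                  # 11 11
print(round(mean(scores), 2), round(mean(with_outlier), 2))  # 11.14 36.86
```

The median depends only on which value sits in the middle position, so changing the magnitude of the largest score leaves it at 11, while the mean more than triples.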

19. Properties of Central Tendency Measures: Contrasting Mode, Median, Mean. [Figure: positions of Mo, Mdn, and Mean in a distribution]

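To contrast the three measures numerically, the sketch below computes them on a hypothetical right-skewed sample (most values small, a long tail to the right). For such distributions the ordering Mo < Mdn < Mean is typical, though not guaranteed for every skewed data set.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample
sample = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]

print(mode(sample), median(sample), mean(sample))  # 2 3.0 4
```

The mode sits at the peak, the median at the middle position, and the mean is pulled furthest toward the tail by the values 8 and 10.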