
Edpsy 511



Presentation Transcript


  1. Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20

  2. Landmarks in the data • Quartiles • We’re often interested in the 25th, 50th and 75th percentiles. • 39, 38, 38, 36, 36, 31, 29, 29, 28, 19 • Steps • First, order the scores from least to greatest. • Second, add 1 to the sample size. • Why? • Third, multiply the sample size plus 1 by the percentile (written as a proportion) to find the location. • Q1 = (10 + 1) * .25 • Q2 = (10 + 1) * .50 • Q3 = (10 + 1) * .75 • If the location obtained is a fraction, take the average of the two adjacent X values.
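
The quartile steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function name `quartile` is hypothetical, and the code follows the slide's rule of averaging the two adjacent scores whenever the location is fractional (other textbooks and software interpolate differently).

```python
def quartile(scores, p):
    """Find a percentile landmark with the (n + 1) * p location rule."""
    ordered = sorted(scores)                 # step 1: least to greatest
    location = (len(ordered) + 1) * p        # steps 2 and 3: 1-based location
    if not 1 <= location <= len(ordered):
        raise ValueError("location falls outside the ordered scores")
    lower = int(location)                    # whole-number part of the location
    if location == lower:
        return ordered[lower - 1]            # location lands exactly on a score
    # Fractional location: average the two adjacent scores (the slide's rule).
    return (ordered[lower - 1] + ordered[lower]) / 2


scores = [39, 38, 38, 36, 36, 31, 29, 29, 28, 19]
q1 = quartile(scores, 0.25)   # location 2.75 -> average of 2nd and 3rd scores
q2 = quartile(scores, 0.50)   # location 5.5  -> average of 5th and 6th scores
q3 = quartile(scores, 0.75)   # location 8.25 -> average of 8th and 9th scores
print(q1, q2, q3)
```

For the ten scores on the slide the locations are 2.75, 5.5, and 8.25, so each quartile is the average of two neighboring ordered scores.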

  3. Box-and-Whiskers Plots (a.k.a., Boxplots)

  4. Shapes of Distributions • Normal distribution • Positive Skew • Or right skewed • Negative Skew • Or left skewed

  5. How is this variable distributed?

  6. How is this variable distributed?

  7. How is this variable distributed?

  8. Descriptive Statistics

  9. Statistics vs. Parameters • A parameter is a characteristic of a population. • It is a numerical or graphic way to summarize data obtained from the population • A statistic is a characteristic of a sample. • It is a numerical or graphic way to summarize data obtained from a sample

  10. Types of Numerical Data • There are two fundamental types of numerical data: • Categorical data: obtained by determining the frequency of occurrences in each of several categories • Quantitative data: obtained by determining placement on a scale that indicates amount or degree

  11. Techniques for Summarizing Quantitative Data • Frequency Distributions • Histograms • Stem and Leaf Plots • Distribution curves • Averages • Variability

  12. Summary Measures • Central Tendency: Arithmetic Mean, Median, Mode • Quartile • Variation: Range, Variance, Standard Deviation

  13. Measures of Central Tendency • Average (Mean) • Median • Mode

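  14. Mean (Arithmetic Mean) • Mean (arithmetic mean) of data values • Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $n$ is the sample size • Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where $N$ is the population size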

  15. Mean • The most common measure of central tendency • Affected by extreme values (outliers) • [Figure: two number-line plots, one from 0 to 10 with Mean = 5 and one extending to 14 with Mean = 6]

  16. Mean of Grouped Frequency

  17. Weighted Mean A form of mean obtained from groups of data in which the different sizes of the groups are accounted for or weighted.
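
A small sketch may make the weighting concrete. The group means and sizes below are hypothetical values chosen for illustration (they do not come from the slides), and `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(group_means, group_sizes):
    """Combine group means, weighting each by the size of its group."""
    total_size = sum(group_sizes)
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / total_size


# Hypothetical example: two class sections of different sizes.
group_means = [70.0, 80.0]
group_sizes = [10, 30]

combined = weighted_mean(group_means, group_sizes)
unweighted = sum(group_means) / len(group_means)
print(combined, unweighted)
```

Because the larger group counts three times as much, the weighted mean sits closer to 80 than the simple average of the two group means does.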

  18. Median • Robust measure of central tendency • Not affected by extreme values • In an ordered array, the median is the “middle” number • If n or N is odd, the median is the middle number • If n or N is even, the median is the average of the two middle numbers • [Figure: two number-line plots, one from 0 to 10 and one extending to 14; Median = 5 in both]
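
The contrast between the mean and the median can be checked with Python's standard `statistics` module. The two data sets below are assumptions: they were picked because they reproduce the values shown on slides 15 and 18 (means of 5 and 6, medians of 5 and 5), but the actual points plotted in the original figures are not recoverable from the transcript.

```python
import statistics

# Assumed data: identical except that the largest value becomes an outlier.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]

mean_a = statistics.mean(without_outlier)
mean_b = statistics.mean(with_outlier)
median_a = statistics.median(without_outlier)
median_b = statistics.median(with_outlier)

# The mean shifts when the extreme value changes; the median does not.
print(mean_a, mean_b, median_a, median_b)
```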

  19. Mode • A measure of central tendency • Value that occurs most often • Not affected by extreme values • Used for either numerical or categorical data • There may be no mode • There may be several modes • [Figure: two number-line plots, one from 0 to 6 labeled No Mode and one from 0 to 14 labeled Mode = 9]
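
The three cases on this slide (no mode, one mode, several modes) can be sketched as follows. The helper `modes` and the sample lists are hypothetical, and the code assumes one common convention: when every value occurs exactly once, the data are treated as having no mode.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []    # assumed convention: all values unique means no mode
    return [value for value, count in counts.items() if count == top]


print(modes([0, 1, 2, 3, 4, 5, 6]))        # every value occurs once
print(modes([1, 3, 9, 9, 9, 12]))          # a single most frequent value
print(modes(["red", "blue", "red", "blue", "green"]))  # categorical, two modes
```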

  20. The Normal Curve

  21. Different Distributions Compared

  22. Variability • Refers to the extent to which the scores on a quantitative variable in a distribution are spread out. • The range represents the difference between the highest and lowest scores in a distribution. • A five-number summary reports the lowest score, the first quartile, the median, the third quartile, and the highest score. • Five-number summaries are often portrayed graphically by the use of box plots.

  23. Variance • The variance, $s^2$, represents the amount of variability of the data relative to their mean. • The variance is the “average” of the squared deviations of the observations about their mean, with $n - 1$ rather than $n$ in the denominator: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ • $s^2$ is the sample variance and is used to estimate the actual population variance, $\sigma^2$.

  24. Standard Deviation • Considered the most useful index of variability. • It is a single number that represents the spread of a distribution. • If a distribution is normal, then the mean plus or minus 3 SD will encompass about 99.7% of all scores in the distribution.

  25. Calculation of the Variance and Standard Deviation of a Distribution

      Raw Score (X)   Mean     X - Mean   (X - Mean)^2
      85              54        31        961
      80              54        26        676
      70              54        16        256
      60              54         6         36
      55              54         1          1
      50              54        -4         16
      45              54        -9         81
      40              54       -14        196
      30              54       -24        576
      25              54       -29        841
                                 Sum =    3640

      Variance: $SD^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{3640}{9} = 404.44$

      Standard deviation: $SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{404.44} \approx 20.11$
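
The arithmetic on this slide can be reproduced step by step. The sketch below uses the ten raw scores from the slide and the same N - 1 denominator; the variable names are illustrative only.

```python
import math
import statistics

scores = [85, 80, 70, 60, 55, 50, 45, 40, 30, 25]

n = len(scores)
mean = sum(scores) / n                           # mean of the raw scores
deviations = [x - mean for x in scores]          # X minus the mean
squared = [d ** 2 for d in deviations]           # squared deviations
sum_of_squares = sum(squared)                    # numerator of the variance

variance = sum_of_squares / (n - 1)              # divide by N - 1, as on the slide
sd = math.sqrt(variance)                         # standard deviation

print(mean, sum_of_squares, round(variance, 2), round(sd, 2))

# Cross-check against the standard library's sample variance and SD.
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(sd, statistics.stdev(scores))
```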

  26. Comparing Standard Deviations • Data A: Mean = 15.5, S = 3.338 • Data B: Mean = 15.5, S = .9258 • Data C: Mean = 15.5, S = 4.57 • [Figure: each data set plotted on a number line from 11 to 21]

  27. Facts about the Normal Distribution • 50% of all the observations fall on each side of the mean. • About 68% of scores fall within 1 SD of the mean in a normal distribution. • About 27% of the observations fall between 1 and 2 SD from the mean (both sides combined), so about 95% fall within 2 SD. • About 99.7% of all scores fall within 3 SD of the mean. • This is often referred to as the 68-95-99.7 rule.
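
These percentages can be verified numerically. For a normal distribution, the proportion of scores within k standard deviations of the mean equals erf(k / √2); the sketch below uses `math.erf` from the Python standard library, and the helper name `proportion_within` is illustrative.

```python
import math


def proportion_within(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


within_1 = proportion_within(1)
within_2 = proportion_within(2)
within_3 = proportion_within(3)
between_1_and_2 = within_2 - within_1   # both tails combined

for label, value in [("within 1 SD", within_1),
                     ("within 2 SD", within_2),
                     ("within 3 SD", within_3),
                     ("between 1 and 2 SD", between_1_and_2)]:
    print(f"{label}: {value:.4f}")
```

The exact figures are slightly different from the rounded 68, 95, and 99.7 quoted in the rule, which is why the rule is best read as an approximation.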

  28. Fifty Percent of All Scores in a Normal Curve Fall on Each Side of the Mean

  29. Probabilities Under the Normal Curve

  30. Standard Scores • Standard scores use a common scale to indicate how an individual compares to other individuals in a group. • The simplest form of a standard score is a z score. • A z score expresses how far a raw score is from the mean in standard deviation units: z = (raw score - mean) / SD. • Standard scores provide a better basis for comparing performance on different measures than do raw scores. • A probability is a percent stated in decimal form and refers to the likelihood of an event occurring. • T scores are z scores expressed in a different form (T = 10z + 50).
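
As a rough illustration, z scores and T scores can be computed for the ten raw scores used in the variance calculation on slide 25. Reusing that data set here is an assumption made for the example, and the function names are hypothetical.

```python
import statistics

scores = [85, 80, 70, 60, 55, 50, 45, 40, 30, 25]
mean = statistics.mean(scores)    # mean of the raw scores
sd = statistics.stdev(scores)     # sample SD (N - 1 denominator)


def z_score(raw):
    """Distance of a raw score from the mean, in SD units."""
    return (raw - mean) / sd


def t_score(raw):
    """z score rescaled to a mean of 50 and an SD of 10."""
    return 10 * z_score(raw) + 50


for raw in (85, 54, 25):
    print(raw, round(z_score(raw), 2), round(t_score(raw), 1))
```

A raw score equal to the mean has z = 0 and T = 50, and scores above the mean get positive z scores and T scores above 50.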

  31. Probability Areas Between the Mean and Different Z Scores

  32. Examples of Standard Scores
