
Lecture 4 Descriptive Statistics


dirk

Presentation Transcript


  1. Math 15: Introduction to Scientific Data Analysis. Lecture 4: Descriptive Statistics. UC Merced

  2. Course Lecture Schedule • There is a quiz today!

  3. http://ccb.ucmerced.edu/home.php?id=jobs

  4. Review – Absolute and relative references

  5. How to display data. There are some steps missing!

  6. Histogram • A histogram is a graphical display of tabulated frequencies. • Frequency: the number of observations in a given statistical category, that is, the total count of outcomes in a class (bin, or group). • A grouping is called a class (or bin). • Data are often summarized with 5 to 15 classes (bins). • Bin width = (Range of data)/(# of bins)

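  7. Example: Let’s Make a Histogram by Hand! • Data: 62, 66, 74, 51, 73, 71 • Range: (Max – Min) = 74 – 51 = 23 • Bin width: (Range)/(# of bins) = 23/5 = 4.6 • Make a table! How frequently did each group (bin) appear? The bin edges below follow from starting at the minimum (51) and stepping by the bin width (4.6):

      Bin            Frequency   Relative frequency
      51.0 – 55.6    1           0.167
      55.6 – 60.2    0           0.0
      60.2 – 64.8    1           0.167
      64.8 – 69.4    1           0.167
      69.4 – 74.0    3           0.5
      Total          6           1.0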
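The hand calculation on this slide can be sketched in Python. This is only an illustration, not part of the lecture: the variable names are made up here, and the rule that the maximum value goes into the last bin is an assumption chosen so that all six values are counted.

```python
# Data and bin count taken from the slide.
data = [62, 66, 74, 51, 73, 71]
n_bins = 5

lo, hi = min(data), max(data)
data_range = hi - lo               # 74 - 51 = 23
bin_width = data_range / n_bins    # 23 / 5 = 4.6

counts = [0] * n_bins
for x in data:
    # Index of the bin x falls into; the maximum is placed in the last bin
    # (an assumption, so that 74 is not pushed into a sixth bin).
    i = min(int((x - lo) / bin_width), n_bins - 1)
    counts[i] += 1

# Relative frequency = frequency / number of observations.
rel_freq = [c / len(data) for c in counts]

for i, (c, r) in enumerate(zip(counts, rel_freq)):
    left = lo + i * bin_width
    right = left + bin_width
    print(f"{left:5.1f} - {right:5.1f}   {c}   {r:.3f}")
```

The printed rows should correspond to the frequency and relative-frequency columns of the table made by hand.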

  8. Remember these steps to create a histogram! • Data: 62, 66, 74, 51, 73, 71

  9. Any Questions?

  10. A single research study may well generate masses of data. • For example, a comparatively small study that distributes 200 questionnaires with maybe 20 items on each can generate potentially 4000 items of raw data. • To make sense of these data, they need to be summarized in some way, so that • the reader has an idea of the typical values in the data and how these vary, and • the reader can construct a mental picture (including a chart or graph) of the data and the people, events, or objects they relate to. • To do this, researchers use descriptive or summary statistics: they describe or summarize the data. • They describe the quantitative nature of the data.

  11. 1. Statistics • The science of collecting, organizing, presenting, analyzing, and interpreting numerical data in relation to the decision-making process. • Goal of statistics • Get a “feel” for the data • Types of variables • Descriptive (summary) statistics • Pictorial representation – graphs and charts (we did this in the last lab!)

  12. Basic Statistics Definitions • Population • The totality under study • e.g. • Students attending UC Merced • The US population (300 million) • Fish in Lake Yosemite, etc. • Sample • A subset of a population • e.g. • Students taking Math 15 (for UC Merced) • Fish caught in Lake Yosemite, etc. • You may need a number of samples to obtain good statistics that reflect the nature of the population!

  13. 2. Types of descriptive statistics • All quantitative studies will have some descriptive statistics, as well as frequency tables (histograms). • Examples are the sample size, the maximum and minimum values, averages, and measures of variation of the data about the average. • The two main types of descriptive statistics encountered in research papers are: • Measures of central tendency • Measures of dispersion.

  14. Measures of Central Tendency • These statistics provide a measure of what values lie at the center of the distribution. • The most common is called the MEAN or sometimes the AVERAGE (or the EXPECTED VALUE). • The formula for the sample mean is the sum of all values divided by the number of observations.

  15. Measures of Central Tendency • The MODE represents the most frequently occurring value. • Thinking visually, the mode would be the tallest bar in a histogram. • The MEDIAN is the 50th percentile, such that half of the values are above the median and half of the values are below. • Example: 51, 62, 66, 71, 73, 74 (n = 6). With an even number of values, the median is the average of the two middle values: (66 + 71)/2 = 68.5.
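As a rough illustration of the mean and median definitions, here is a short Python sketch using the six values from this slide. The helper names `mean_by_hand` and `median_by_hand` are made up for this example; Python's standard `statistics` module is used only as a cross-check.

```python
import statistics

values = [51, 62, 66, 71, 73, 74]

def mean_by_hand(xs):
    # Sum of all values divided by the number of observations.
    return sum(xs) / len(xs)

def median_by_hand(xs):
    # Sort, then take the middle value (odd n) or the
    # average of the two middle values (even n).
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

mean_value = mean_by_hand(values)
median_value = median_by_hand(values)

print("mean:", round(mean_value, 3))
print("median:", median_value)
print("library median:", statistics.median(values))
```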

  16. Measures of Central Tendency • Mean (Average) • Excel: =AVERAGE(cell range) • Median • Excel: =MEDIAN(cell range) • Mode • Excel: =MODE(cell range)

  17. The choice of which particular descriptive statistics to report will affect the “picture” that is presented of the data, and there is the potential to mislead. • Example: Redmond, WA, where Bill Gates and his family live. The city has ~46,000 people with a mean income of about $36,000. • What would be the effect on the mean and the median of including Bill Gates (assuming his income is $2.5 billion per year)?
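One way to explore this question is the sketch below. It rests on a strong simplifying assumption that is not in the slide: every one of the 46,000 residents is given exactly the mean income of $36,000, because the real income distribution (and so the real median) is not stated. Under that assumption the direction of the effect is still visible.

```python
import statistics

# Hypothetical town: everyone earns exactly the stated mean (an assumption).
incomes = [36_000] * 46_000
gates_income = 2_500_000_000  # figure assumed in the slide

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

with_gates = incomes + [gates_income]
mean_after = statistics.mean(with_gates)
median_after = statistics.median(with_gates)

print("mean before / after:  ", mean_before, "/", round(mean_after))
print("median before / after:", median_before, "/", median_after)
```

In this sketch a single extreme income more than doubles the mean, while the median does not move.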

  18. Simple Examples: Mean, Median, Mode • Respiratory rate • Group 1 = (11, 12, 13, 13, 14, 15); • Group 2 = (11, 12, 13, 13, 14, 39); • The mean (average) is more susceptible to extreme values: it is 13 for Group 1 but 17 for Group 2. • The median is 13 in both groups – the value that divides the data 50:50. • The mode is 13 in both groups – the most common value.
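The two respiratory-rate groups can be checked with Python's standard `statistics` module. This is a small sketch for illustration; the variable names are arbitrary.

```python
import statistics

group1 = [11, 12, 13, 13, 14, 15]
group2 = [11, 12, 13, 13, 14, 39]  # same as group1 except one extreme value

summary = {}
for name, group in (("Group 1", group1), ("Group 2", group2)):
    summary[name] = {
        "mean": statistics.mean(group),
        "median": statistics.median(group),
        "mode": statistics.mode(group),
    }
    print(name, summary[name])
```

Only the mean changes between the two groups; the median and the mode are unaffected by the single extreme value.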

  19. Example Question: What is the average or the median of these 400 data points?

  20. Excel can do… (figure labels: Mode, Average)

  21. Any Questions?

  22. 3. Measures of dispersion or variability • These measure how “spread out” the data are around the center. • One measure of variability is the RANGE. • This is simply the maximum value minus the minimum value.

  23. Measures of dispersion or variability • RANGE. • This is simply the maximum value minus the minimum value. • The range is more susceptible to extreme values. • There is no RANGE function in Excel; use =MAX(cell range)-MIN(cell range) instead. (Figure labels: Mode, Average, RANGE)

  24. Measures of dispersion or variability. • The most common measures of variability are • STANDARD DEVIATION and VARIANCE. • The variance is the standard deviation squared, or equivalently the standard deviation is the square root of the variance. • Personally, I prefer to think in terms of standard deviations, because the standard deviation represents the “typical” (or “standard”) deviation of values from the mean.

  25. Measures of dispersion or variability. • The formula for the sample variance is the sum of squared deviations from the mean (the sum of the squares of the distances from the mean) divided by the number of observations minus 1: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the sample mean and $n$ is the number of observations. • The sample standard deviation (= s) is simply the square root of the sample variance.

  26. Measures of dispersion or variability • Variance or Standard Deviation • The distribution on the left is more dispersed than the one on the right. It has a higher variance. (Figure labels: Mode, Average)

  27. Measures of Dispersion • Range: • Excel: =MIN(cell range) and =MAX(cell range) to get the minimum and maximum values • To compute the range: =MAX(cell range)-MIN(cell range) • Sample Standard Deviation and Sample Variance: • Excel: =STDEV(cell range) or =VAR(cell range)
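The range, sample variance, and sample standard deviation described above can be sketched in Python as follows. The data are the six values from the histogram example; the function name `sample_variance` is made up here, and `statistics.variance` / `statistics.stdev` from the standard library (which also divide by n - 1) serve as a cross-check.

```python
import math
import statistics

data = [62, 66, 74, 51, 73, 71]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

def sample_variance(xs):
    # Sum of squared deviations from the mean, divided by (n - 1).
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

variance = sample_variance(data)
std_dev = math.sqrt(variance)  # standard deviation = square root of variance

print("range:", data_range)
print("sample variance:", round(variance, 3))
print("sample standard deviation:", round(std_dev, 3))
print("library check:", round(statistics.variance(data), 3),
      round(statistics.stdev(data), 3))
```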

  28. 4. Shape of Curve

  29. 4. Measures of Shape • Skewness is a measure of the lack of symmetry in a distribution, or whether the distribution is skewed to the left or the right. • “Positive skewness”: Values clustered toward the lower range with a long tail extending to the upper ranges. • “Negative skewness”: Values clustered toward the upper range with a long tail extending to the lower ranges. • If you are interested, the formula for the skewness is:

  30. Illustrating the notion of skewness • Both distributions have the same average and variance. (Figure: one positively skewed and one negatively skewed distribution, each labeled with its Mode and Average)

  31. Measures of Shape • Skewness • Excel: =SKEW(cell range)
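The skewness formula from the lecture is not preserved in this transcript. As a hedged sketch, the Python below uses the adjusted sample skewness, n/((n-1)(n-2)) times the sum of ((x - mean)/s)^3 with s the sample standard deviation, which is the formula Excel's SKEW function is generally documented to use; treat that correspondence as an assumption. The function name `sample_skewness` and the mirrored data set are invented for illustration.

```python
import statistics

def sample_skewness(xs):
    # Adjusted sample skewness (assumed to match Excel's SKEW):
    # n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), needs n >= 3.
    n = len(xs)
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)

symmetric = [11, 12, 13, 13, 14, 15]      # respiratory-rate Group 1
right_tail = [11, 12, 13, 13, 14, 39]     # Group 2: long tail to the upper range
left_tail = [50 - x for x in right_tail]  # mirrored: long tail to the lower range

print("symmetric: ", round(sample_skewness(symmetric), 3))
print("right tail:", round(sample_skewness(right_tail), 3))
print("left tail: ", round(sample_skewness(left_tail), 3))
```

A value near zero indicates symmetry, a positive value a long upper tail, and a negative value a long lower tail.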

  32. Any Questions?

  33. One more Example: How to make an analysis? • Creating wheelchair lanes in the sidewalk

  34. Important Announcements
