
Chapter 3.4


psyche

Presentation Transcript


  1. Chapter 3.4 Exploratory Data Analysis

  2. Traditional Statistics • Data are organized by using a frequency distribution • The distribution is used to create graphs such as the histogram, frequency polygon, and ogive • The mean and standard deviation are computed to summarize the data • The purpose is to confirm various conjectures about the nature of the data

  3. Exploratory Data Analysis (EDA) • The purpose is to examine the data to find out what can be discovered about them, such as the center and the spread • Data are organized using a stem and leaf plot • The measure of central tendency is the median, and the measure of variation is the interquartile range • Data are represented graphically using a boxplot

  4. Quartiles • Quartiles divide the distribution into four groups, separated by Q1, Q2, and Q3 • Q1 is the same as the 25th percentile • Q2 is the same as the 50th percentile (the median) • Q3 is the same as the 75th percentile • Example data set: 5, 6, 12, 13, 15, 18, 22, 50
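
The quartiles of the example data set can be worked out with a short sketch. The slides do not say which quartile convention is intended, so this assumes the common textbook one: Q2 is the median, Q1 is the median of the values below it, and Q3 is the median of the values above it. The function name `quartiles` is illustrative only, and other conventions (including some software defaults) can give slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    # For an odd count the overall median is left out of both halves.
    lower = values[:half]
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return median(lower), median(values), median(upper)


q1, q2, q3 = quartiles([5, 6, 12, 13, 15, 18, 22, 50])
print(q1, q2, q3)  # 9.0 14.0 20.0
```

Under this convention the lower half is 5, 6, 12, 13 and the upper half is 15, 18, 22, 50, so Q1 = 9, Q2 = 14, and Q3 = 20.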

  5. The five-number summary • The lowest value of the data set (minimum) • Q1 • The median (Q2) • Q3 • The highest value of the data set (maximum)

  6. Boxplot • A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box passing through the median (Q2)

  7. Procedure for constructing a boxplot • Find the five-number summary for the data values • Draw a horizontal axis with a scale that includes the maximum and the minimum data values • Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through the median • Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box
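
The drawing steps above can be sketched as a rough one-line text rendering. This is only an illustration: `ascii_boxplot` is a hypothetical helper, and the five-number summary passed in (5, 9, 14, 20, 50) is the one for the example data on slide 4, assuming quartiles are taken as medians of the lower and upper halves.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, width=60):
    """Render a one-line text boxplot from a five-number summary (illustrative sketch)."""
    span = maximum - minimum
    if span <= 0:
        raise ValueError("maximum must be greater than minimum")

    def col(value):
        # Map a data value onto a column index between 0 and width - 1.
        return round((value - minimum) / span * (width - 1))

    row = [" "] * width
    # Lines from the minimum to Q1 and from Q3 to the maximum.
    for i in range(col(minimum), col(q1)):
        row[i] = "-"
    for i in range(col(q3) + 1, col(maximum) + 1):
        row[i] = "-"
    # Box between Q1 and Q3, with its two sides and the median line.
    for i in range(col(q1), col(q3) + 1):
        row[i] = "="
    row[col(q1)] = "["
    row[col(q3)] = "]"
    row[col(median)] = "|"
    return "".join(row)


# Width 46 gives one column per unit on the scale from 5 to 50.
plot = ascii_boxplot(5, 9, 14, 20, 50, width=46)
print(plot)
```

The `[` and `]` characters mark Q1 and Q3, `|` marks the median, and the dashes are the two lines out to the minimum and maximum. A plotting library would normally be used for a real graph; the text version just makes each step of the procedure visible.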

  8. Number of Meteorites Found • The numbers of meteorites found in 10 states of the U.S. are 89, 47, 164, 296, 30, 215, 138, 78, 48, 39. Construct a boxplot for the data.
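
The first step of the construction, finding the five-number summary, can be sketched for the meteorite data as follows. The quartile convention is an assumption (Q1 and Q3 taken as the medians of the lower and upper halves of the sorted data), and `five_number_summary` is an illustrative name rather than anything defined in the slides.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); quartiles by median-of-halves (assumed)."""
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = values[:half]
    # For an odd count the overall median is left out of both halves.
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return values[0], median(lower), median(values), median(upper), values[-1]


meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
summary = five_number_summary(meteorites)
print(summary)  # (30, 47, 83.5, 164, 296)
```

Sorted, the data are 30, 39, 47, 48, 78, 89, 138, 164, 215, 296, so the box runs from 47 to 164 with the median line at 83.5, and the two lines extend to 30 and 296.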

  9. Information obtained from a boxplot • If the median is near the center of the box, the distribution is approximately symmetric • If the median falls to the left of the center of the box, the distribution is positively (right) skewed • If the median falls to the right of the center of the box, the distribution is negatively (left) skewed • If the lines are about the same length, the distribution is approximately symmetric • If the right line is longer than the left line, the distribution is positively (right) skewed • If the left line is longer than the right line, the distribution is negatively (left) skewed
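
These rules can be sketched as a small check on a five-number summary. The slides do not define how close counts as "near the center" or "about the same length", so the `tolerance` parameter (10% by default) is purely an assumption, as is the function name `describe_shape`. The sample input is the five-number summary of the meteorite data from slide 8 (30, 47, 83.5, 164, 296), computed with median-of-halves quartiles.

```python
RIGHT = "positively (right) skewed"
LEFT = "negatively (left) skewed"
SYMMETRIC = "approximately symmetric"


def describe_shape(minimum, q1, median, q3, maximum, tolerance=0.1):
    """Apply the two boxplot rules; returns (verdict by median, verdict by lines)."""

    def verdict(difference, scale):
        # Differences within tolerance * scale are treated as "about equal" (assumption).
        if scale == 0 or abs(difference) <= tolerance * scale:
            return SYMMETRIC
        return RIGHT if difference > 0 else LEFT

    box_center = (q1 + q3) / 2
    left_line = q1 - minimum
    right_line = maximum - q3
    # Median left of the box center means a positive difference, hence right skew.
    by_median = verdict(box_center - median, q3 - q1)
    # A longer right line also gives a positive difference, hence right skew.
    by_lines = verdict(right_line - left_line, max(left_line, right_line))
    return by_median, by_lines


print(describe_shape(30, 47, 83.5, 164, 296))
```

For the meteorite data the median (83.5) lies left of the box center (105.5) and the right line (132) is much longer than the left line (17), so both rules point to a positively skewed distribution. The two rules need not always agree, which is why the sketch reports them separately.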

  10. Sodium Content of Cheese • A dietitian is interested in comparing the sodium content of real cheese with the sodium content of a cheese substitute. Compare the distributions using boxplots.

  11. Resistant Statistic • A resistant statistic is relatively less affected by outliers than a nonresistant statistic. • The mean and standard deviation are nonresistant statistics • Sometimes, when a distribution is skewed or contains outliers, the median and interquartile range may more accurately summarize the data than the mean and standard deviation
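
A small numerical check illustrates the idea of resistance. The sketch below takes the example data from slide 4 and replaces the largest value, 50, with a hypothetical extreme value of 500 (not from the slides), then compares the summaries. The interquartile range is computed with median-of-halves quartiles, which is an assumption about the convention in use.

```python
from statistics import mean, median, stdev


def iqr(data):
    """Interquartile range Q3 - Q1, using median-of-halves quartiles (assumed)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return median(upper) - median(lower)


original = [5, 6, 12, 13, 15, 18, 22, 50]
with_outlier = [5, 6, 12, 13, 15, 18, 22, 500]  # hypothetical extreme value

for name, data in (("original", original), ("with outlier", with_outlier)):
    print(f"{name}: mean={mean(data):.3f} stdev={stdev(data):.3f} "
          f"median={median(data)} iqr={iqr(data)}")
```

The mean moves from 17.625 to 73.875 and the standard deviation grows sharply, while the median stays at 14 and the interquartile range stays at 11, which is what makes the median and interquartile range resistant.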

  12. Correspondence between traditional and exploratory data analysis

  13. Try it! • Applying the concepts 3-4 • Pg. 174
