Learn about the arithmetic mean, median, mode, quartiles, percentiles, range, average deviation, variance, and standard deviation in data analysis.
Chapter 3 Data Description • You can describe a human being by physical and intellectual measures. • You can describe a sample data set by using two types of measures: • Measures of central tendency, • Measures of dispersion or variation
Measures of Central Tendency • There are three measures of central tendency: • Arithmetic mean • Mode • Median
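Arithmetic mean • The formula for unorganized data: \( \bar{X} = \frac{\sum X_i}{n} \) (formula 3-1, page 40) • The formula for organized data: \( \bar{X} = \frac{\sum X_j \cdot Fr(X_j)}{n} \) (formula 3-2, p. 41), where \( Fr(X_j) \) is the frequency of the value \( X_j \)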
Exercises from book • Example problem 3-1 (page 40) • The mean of 11 observations: \( \bar{X} = \frac{1 + 1 + 2 + 2 + 2 + 3 + 4 + 5 + 5 + 9 + 10}{11} = \frac{44}{11} = 4 \)
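The two mean formulas can be sketched in Python as follows. This is a minimal illustration, not the book's worked solution: the function names and the sample values are hypothetical, and the organized form assumes the data are given as a value-to-frequency mapping.

```python
from collections import Counter

def mean_unorganized(values):
    """Formula 3-1: sum of the observations divided by n."""
    return sum(values) / len(values)

def mean_organized(frequencies):
    """Formula 3-2: sum of value * frequency divided by n.

    `frequencies` maps each distinct value X_j to its frequency Fr(X_j).
    """
    n = sum(frequencies.values())
    return sum(x * f for x, f in frequencies.items()) / n

# Hypothetical sample, not taken from the book.
hypothetical_data = [2, 4, 4, 6, 9]
mean_a = mean_unorganized(hypothetical_data)
mean_b = mean_organized(Counter(hypothetical_data))
print(mean_a, mean_b)
```

Both forms give the same result because the organized formula only groups equal observations before adding them.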
Median • The median is the middle value after a sample data set is arranged in order (ascending or descending). Median for unorganized data • If a data set has n observations, the ((n+1)/2)th observation in the ordered data is the median. • If n is even, (n+1)/2 is not a whole number. In this case, the median is a value between the two neighboring observations, conventionally their midpoint. Example Problems: 3-5 on page 45, 3-6 on page 46
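The (n+1)/2 rule can be sketched as below. The function name and sample values are hypothetical, and taking the midpoint of the two neighboring values for an even n is the usual convention, assumed here rather than quoted from the book.

```python
def median_unorganized(values):
    """Median of raw data using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a whole number (index mid, 0-based).
        return ordered[mid]
    # Even n: (n + 1) / 2 falls between two observations; use their midpoint.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical samples, not taken from the book.
print(median_unorganized([7, 1, 3]))
print(median_unorganized([7, 1, 3, 5]))
```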
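Median for organized data • The formula for calculating the median of an organized data set is formula 3-5 on page 46: \( MD = L_m + \frac{(n/2) - CF(x_{m-1})}{FR(x_m)} \cdot w \), where \( L_m \) is the lower limit of the median class, \( CF(x_{m-1}) \) is the cumulative frequency of the classes before it, \( FR(x_m) \) is the frequency of the median class, and \( w \) is the class width.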
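A sketch of the organized-data median, assuming each class is described by its lower limit, its width, and its frequency. The function name, the tuple layout, and the class table are hypothetical and only illustrate how the terms of the formula are found.

```python
def grouped_median(classes):
    """Median of grouped data: L_m + ((n/2 - CF) / FR) * w.

    `classes` is a list of (lower_limit, width, frequency) tuples
    in ascending order of lower_limit.
    """
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("classes must contain at least one observation")
    half = n / 2
    cumulative = 0  # CF: total frequency of the classes before the current one
    for lower, width, freq in classes:
        # The median class is the first one whose cumulative frequency reaches n/2.
        if freq > 0 and cumulative + freq >= half:
            return lower + (half - cumulative) / freq * width
        cumulative += freq

# Hypothetical frequency table, not taken from the book.
hypothetical_classes = [(0, 10, 5), (10, 10, 8), (20, 10, 12), (30, 10, 5)]
print(grouped_median(hypothetical_classes))
```

Here n = 30, so n/2 = 15 falls in the class starting at 20, which has 13 observations before it and 12 inside it.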
Quartiles • There are three quartiles (Q1, Q2, and Q3) that divide a data set into four equal parts. • Q1: one-fourth of the data are below Q1 • Q2: half of the data are below Q2 (same as the median; it coincides with the mean when the distribution is symmetric) • Q3: three-fourths of the data are below Q3
Percentiles • The 99 percentiles are the points that divide a set of data into 100 equal parts. • 1st percentile: 1% of the data are below it • 10th percentile: 10% of the data are below it • 75th percentile: 75% of the data are below it
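Quartiles and percentiles can be computed with Python's standard `statistics.quantiles` function, sketched below on hypothetical data. Textbooks and software use several slightly different interpolation rules, so the `method="inclusive"` choice is an assumption and may not match the book's hand calculations exactly.

```python
import statistics

# Hypothetical sample, not taken from the book.
hypothetical_data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(hypothetical_data, n=4, method="inclusive")

# n=100 returns the 99 percentiles; index 74 is the 75th percentile.
percentiles = statistics.quantiles(hypothetical_data, n=100, method="inclusive")
p75 = percentiles[74]

print(q1, q2, q3, p75)
```

The second quartile matches the median and the 75th percentile matches the third quartile, as the definitions require.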
The mode • The value that appears most frequently is the mode of a data set. • 20 students are classified according to the colors of their eyes:
Eye color        blue   brown   dark   green
# of students    6      8       4      2
• Which value appears most frequently?
The mode • Brown (the category, not its frequency of 8) appears most frequently, so brown is the mode.
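The eye-color example can be reproduced with `collections.Counter`. This is a small sketch using the frequencies from the slide; the variable names are illustrative.

```python
from collections import Counter

# Frequencies from the eye-color example: 20 students in total.
eye_colors = Counter({"blue": 6, "brown": 8, "dark": 4, "green": 2})

# most_common(1) returns the (value, frequency) pair with the highest count.
mode_value, mode_frequency = eye_colors.most_common(1)[0]
print(mode_value, mode_frequency)
```

The mode is the value (`mode_value`), while the count (`mode_frequency`) only tells how often it occurs.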
Measures of dispersion • A human being can’t be described fully by height only. Weight is another measure that we use to describe someone. Similarly, a data set can’t be described fully by measures of central tendency. We need a second type of measure, called measures of dispersion, variation, or variability. • Review the table on page 53.
Measures of dispersion • There are three measures of dispersion: 1. range 2. average deviation 3. variance
Range/Average Deviation • Range: the difference between the highest and lowest values of a data set. • Average deviation: the arithmetic mean of the absolute values of the deviations from the mean.
\( AD = \frac{\sum |\bar{x} - x_i|}{n} \) (formula 3-6, page 54) • Consider table 3-3 on page 55: the average deviation is calculated as 1.6.
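The range and the average deviation can be sketched as follows. The function names and the sample are hypothetical; the data in table 3-3 are not reproduced here.

```python
def data_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

def average_deviation(values):
    """Formula 3-6: mean of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(mean - x) for x in values) / n

# Hypothetical sample, not taken from the book.
hypothetical_data = [2, 4, 4, 6, 9]
print(data_range(hypothetical_data), average_deviation(hypothetical_data))
```

For this sample the mean is 5, the absolute deviations are 3, 1, 1, 1, and 4, and their average is 2.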
Variance and Standard deviation • Variance is somewhat similar to average deviation. If the individual deviations from the mean are squared, summed, and divided by n-1, the result is the variance of a sample data set. • \( S^2 = \frac{\sum (\bar{x} - x_i)^2}{n-1} \) (formula 3-11, page 59)
Alternative formula: \( S^2 = \frac{\sum X^2 - (\sum X)^2 / n}{n-1} \) • The square root of \( S^2 \) gives the standard deviation, s. • Exercises from the book: Problem 2 (page 61)
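Both variance formulas can be checked against each other in a few lines. This is a sketch on hypothetical data with illustrative function names; it uses the n-1 divisor given in the slides.

```python
import math

def variance_definition(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((mean - x) ** 2 for x in values) / (n - 1)

def variance_alternative(values):
    """Alternative form: (sum of X^2 - (sum of X)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x ** 2 for x in values)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)

# Hypothetical sample, not taken from the book.
hypothetical_data = [2, 4, 4, 6, 9]
s_squared = variance_definition(hypothetical_data)
s = math.sqrt(s_squared)  # standard deviation
print(s_squared, variance_alternative(hypothetical_data), s)
```

The alternative form avoids computing the mean first, which is convenient by hand, but in floating-point arithmetic it can lose precision when the values are large relative to their spread.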
Uses of the standard deviation • What is the standard deviation? It is a measure of how much the observations in a data set deviate, on average, from the mean.
Uses of the standard deviation • The farther we move from the mean in each direction, the more observations fall between the two end points. Look at the following:
Uses of the standard deviation • [Figure: three intervals of increasing width, each centered on the mean] • There are two theorems that tell us what proportion of observations lies within a specified number of standard deviations from the mean:
Tchebycheff’s Theorem • The proportion of any set of values that lies within k standard deviations of the mean is at least \( 1 - \frac{1}{k^2} \), where k is greater than 1. • Exercises from the book: Example problem 3-15 (page 65), Example problem 5 (p. 67)
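The bound can be compared with the actual proportion in a data set, as sketched below. The function names and the sample (which deliberately includes an outlier) are hypothetical; the sample standard deviation from `statistics.stdev` is used to match the n-1 formula in these slides.

```python
import statistics

def tchebycheff_lower_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def proportion_within(values, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)

# Hypothetical sample with one outlier, not taken from the book.
hypothetical_data = [2, 4, 4, 6, 9, 1, 7, 5, 5, 30]
for k in (1.5, 2, 3):
    print(k, tchebycheff_lower_bound(k), proportion_within(hypothetical_data, k))
```

The observed proportion is never below the bound, whatever the shape of the distribution; the bound is simply conservative for well-behaved data.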
The Normal Rule • If a data set follows a symmetrical, bell-shaped distribution, then about 68 percent of the individual observations fall within one standard deviation of the mean; about 95 percent fall within two standard deviations of the mean; and almost 100 percent fall within three standard deviations of the mean.
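The three percentages can be checked against an exact normal distribution using `statistics.NormalDist` from the Python standard library. This sketch assumes a perfectly normal population; real bell-shaped data will only approximate these figures, and the function name is illustrative.

```python
from statistics import NormalDist

def normal_coverage(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 4))
```

The computed proportions are roughly 0.68, 0.95, and 0.997, which is why the rule says "almost 100 percent" for three standard deviations.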
Example Problem 3-17 (page 66) • A) How many members earn between $1250 and $1550? • B) How many members earn between $1100 and $1700?