Data Measures and Transformation Methods
Learn about numerical data description, sample median computation, measures of spread, boxplots, standard deviation calculation, and transformation techniques to analyze and interpret data effectively.
Set 3 Numerical description of data, data transformation
Measures based on rank of data • Five-number summary • Minimum = the smallest data value • First quartile = Q1: 25% of the data below, 75% above • Median = M: the middle value, 50% below, 50% above • Third quartile = Q3: 75% below, 25% above • Maximum = the largest data value • Measures of spread • Range = Max - Min • Interquartile range = range of the middle 50% of the data, IQR = Q3 - Q1
Computation of sample median • Order the data values in increasing order • n odd: M = the middle value • n even: M = the average of the two middle values • Data: 54, 9, 37, 15, 52, 40, 54, 128, 1 • Ordered data (positions 1 to 9): 1, 9, 15, 37, 40, 52, 54, 54, 128 • n = 9, an odd number • M = the 5th ordered value = 40
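The two-case rule above translates directly into code. The following is a minimal sketch; the function name `sample_median` is hypothetical and is not part of MINITAB or the slides.

```python
def sample_median(values):
    """Median of a sample: middle value (n odd) or average of the two middle values (n even)."""
    ordered = sorted(values)            # step 1: order the data in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:                      # n odd: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average of the two middle values


data = [54, 9, 37, 15, 52, 40, 54, 128, 1]
print(sample_median(data))              # 40
print(sample_median(data + [3]))        # 38.5
```

The second call adds the value 3, which gives the ten-value data set used in the next example.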
A simple example • Data: 54, 9, 37, 15, 52, 40, 54, 128, 1, 3 • Ordered data: 1, 3, 9, 15, 37, 40, 52, 54, 54, 128 • Min = 1, Max = 128 • Range = 128 - 1 = 127 • n = 10, M = average of 37 and 40 = 38.5 • Q1 = 9, approximately the median of the data below M (1, 3, 9, 15, 37) • Q3 = 54, approximately the median of the data above M (40, 52, 54, 54, 128) • IQR = 54 - 9 = 45, 1.5 IQR = 67.5 • Outlier: 128 > Q3 + 1.5 IQR = 54 + 67.5 = 121.5
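This example takes Q1 and Q3 as the medians of the lower and upper halves, then applies the 1.5 IQR rule. The sketch below follows that convention. It assumes the middle value is excluded from both halves when n is odd, which is one common choice. `five_number_summary` and `outliers` are hypothetical helper names.

```python
import statistics

def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max); Q1/Q3 = medians of the lower/upper halves (needs n >= 2)."""
    x = sorted(values)
    n = len(x)
    half = n // 2
    lower = x[:half]                    # values below the median position
    upper = x[half + (n % 2):]          # values above it; skip the middle value when n is odd
    return x[0], statistics.median(lower), statistics.median(x), statistics.median(upper), x[-1]

def outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


data = [54, 9, 37, 15, 52, 40, 54, 128, 1, 3]
print(five_number_summary(data))        # (1, 9, 38.5, 54, 128)
print(outliers(data))                   # [128]
```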
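Use a computer • MINITAB (Version 14): Stat >> Basic Statistics >> Display Descriptive Statistics >> Statistics
• Example data
  Variable   N    Min    Q1     Median   Q3     Max
  x          10   1.0    7.5    38.5     54.0   128.0
• Harris Bank 1977 salary data
  Variable   N    Min      Q1       Median   Q3       Max
  SALARY     93   3900.0   4890.0   5400.0   6000.0   8100.0
• MINITAB interpolates between ordered values, so its quartiles can differ slightly from the hand computation (Q1 = 7.5 here vs. 9 above).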
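One quartile rule that reproduces Q1 = 7.5 and Median = 38.5 for the example data places the p-th quantile at position p(n + 1) in the ordered data and interpolates linearly. The sketch assumes MINITAB uses this rule. `quartile_np1` is a hypothetical name. Python's `statistics.quantiles` with its default `method="exclusive"` uses the same (n + 1) positions, so it serves as a cross-check.

```python
import math
import statistics

def quartile_np1(values, p):
    """Quantile at 1-based position p*(n+1) of the ordered data, linearly interpolated."""
    x = sorted(values)
    n = len(x)
    pos = p * (n + 1)                   # e.g. 0.25 * 11 = 2.75 for n = 10
    k = math.floor(pos)
    frac = pos - k
    if k < 1:                           # position before the first value: use Min
        return x[0]
    if k >= n:                          # position past the last value: use Max
        return x[-1]
    return x[k - 1] + frac * (x[k] - x[k - 1])


data = [54, 9, 37, 15, 52, 40, 54, 128, 1, 3]
print(quartile_np1(data, 0.25), quartile_np1(data, 0.5), quartile_np1(data, 0.75))  # 7.5 38.5 54.0
print(statistics.quantiles(data, n=4))                                             # [7.5, 38.5, 54.0]
```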
Boxplot • Graph >> Boxplot >> One Y >> Simple (SALARY) • [Figure: boxplot of SALARY with labels Max, Outlier (> Q3 + 1.5 IQR), Q3, Median, IQR, Q1, Min]
Sample mean • Definition: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ • Computation methods • Use the formula • Use a calculator • Use a computer • MINITAB: Stat >> Basic Statistics >> Display Descriptive Statistics >> Statistics, or Calc >> Column Statistics >> Mean • Mean of salaries = 5420.3
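The definition of the mean is a sum divided by a count. This minimal sketch applies it to the ten-value example data used earlier; `sample_mean` is a hypothetical helper name.

```python
def sample_mean(values):
    """x-bar = (x_1 + x_2 + ... + x_n) / n."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


data = [54, 9, 37, 15, 52, 40, 54, 128, 1, 3]
print(sample_mean(data))                # 39.3
```

The mean (39.3) is above the median (38.5) for these data, because the large value 128 pulls the mean upward.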
Interpretation of the mean • Center of gravity of the distribution
  x:                  1    2    6    (mean = 3)
  Deviation x - 3:   -2   -1   +3
  x:                  1    2   12    (mean = 5)
  Deviation x - 5:   -4   -3   +7
• For any data, the sum of deviations from the mean is zero: $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$
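The deviation tables above can be recomputed to confirm that each set of deviations sums to zero. `deviations` is a hypothetical helper. With non-integer means, the computed sum may differ from zero by a tiny floating-point rounding error.

```python
def deviations(values):
    """Deviations x_i - x-bar of each observation from the sample mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


for data in ([1, 2, 6], [1, 2, 12]):
    d = deviations(data)
    print(data, d, sum(d))
# [1, 2, 6] [-2.0, -1.0, 3.0] 0.0
# [1, 2, 12] [-4.0, -3.0, 7.0] 0.0
```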
Sample variance & standard deviation • Squared deviation from the mean: $(x_i - \bar{x})^2$ • Sum of squared deviations from the mean: $\sum_{i=1}^{n}(x_i - \bar{x})^2$ • Variance = an average of the squared deviations • Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ • SD = square root of the variance • Sample SD: $s = \sqrt{s^2}$ • SD is in the same unit as the variable • Sensitive to extreme values • Not suitable for skewed distributions
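Computation • Use the variance formula • Data: 1, 12, 2, so $\bar{x} = 5$; deviations -4, 7, -3; squared deviations 16, 49, 9; sum = 74 • Sample variance: $s^2 = 74/(3-1) = 37$ • Or use a calculator to compute s: standard deviation of x = $\sqrt{37}$ = 6.0828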
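The formula steps above can be coded directly: squared deviations, their sum, division by n - 1, then a square root. This is a minimal sketch with hypothetical helper names, cross-checked against Python's `statistics.stdev`, which also uses the n - 1 divisor.

```python
import math
import statistics

def sample_variance(values):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    ss = sum((v - m) ** 2 for v in values)   # sum of squared deviations
    return ss / (n - 1)

def sample_sd(values):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [1, 12, 2]
print(sample_variance(data))            # 37.0
print(round(sample_sd(data), 4))        # 6.0828
print(statistics.stdev(data))           # standard-library value for comparison
```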
Computation by MINITAB • MINITAB: Stat >> Basic Statistics >> Display Descriptive Statistics >> Statistics, or Calc >> Column Statistics >> Standard deviation
• Harris Bank 1977 salary data
  Variable   N    Mean     StDev   Variance
  SALARY     93   5420.3   709.6   503514.0
  Variable   Min    Q1     Median   Q3     Max
  SALARY     3900   4890   5400     6000   8100
Descriptive statistics for two groups • Stat >> Basic Statistics >> Display Descriptive Statistics >> By variable >> Statistics (TrMean = trimmed mean)
  Variable   Gender   N    Mean     Median   TrMean   StDev
  Salaries   0        61   5138.9   5220.0   5137.1   539.9
             1        32   5957     6000     5927     691
  Variable   Gender   Min      Max      Q1       Q3
  Salaries   0        3900.0   6300.0   4800.0   5400.0
             1        4620     8100     5400     6225
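A by-group summary splits the data on the group code and computes the same statistics for each part. The sketch below uses a hypothetical helper, `describe_by`, and a small made-up data set. These are NOT the Harris Bank salaries, so the numbers will not match the table. The trimmed mean is omitted.

```python
import statistics
from collections import defaultdict

def describe_by(values, groups):
    """Per-group N, mean, median, and sample SD (hypothetical helper)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    summary = {}
    for g in sorted(by_group):
        x = by_group[g]
        summary[g] = {
            "N": len(x),
            "Mean": statistics.mean(x),
            "Median": statistics.median(x),
            "StDev": statistics.stdev(x) if len(x) > 1 else float("nan"),
        }
    return summary


# Hypothetical toy data with 0/1 group codes, as in the Gender column above
salary = [4800, 5200, 5400, 6000, 6300, 5700]
gender = [0, 0, 0, 1, 1, 1]
for g, stats in describe_by(salary, gender).items():
    print(g, stats)
```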
Boxplot for two groups • Graph >> Boxplot >> One Y >> With groups
Linear function of data • Compute y as y = a + bx: multiply each observation by the constant b, then add the constant a • Relations between summary measures for x and y • $\mathrm{Mean}(y) = a + b\,\mathrm{Mean}(x)$ • Also true for Min, Q1, Median, Q3, Max when b > 0 (when b < 0, Min and Max swap roles, as do Q1 and Q3) • $\mathrm{SD}(y) = |b|\,\mathrm{SD}(x)$ • Also true for Range and IQR • $\mathrm{Variance}(y) = b^2\,\mathrm{Variance}(x)$
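The relations above can be checked numerically. This sketch transforms the ten-value example data with arbitrary constants; b is deliberately negative so that the Min/Max swap appears. The constants are illustrative assumptions, not values from the slides.

```python
import statistics

x = [54, 9, 37, 15, 52, 40, 54, 128, 1, 3]
a, b = 1000, -2.0                       # illustrative constants; b < 0 on purpose
y = [a + b * v for v in x]

# Each pair below should agree up to floating-point rounding
print(statistics.mean(y), a + b * statistics.mean(x))
print(statistics.stdev(y), abs(b) * statistics.stdev(x))
print(statistics.variance(y), b ** 2 * statistics.variance(x))
print(min(y), a + b * max(x))           # with b < 0, Min(y) comes from Max(x)
```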
Example • Flat raise: y = 1000 + x • Percentage raise (10%): w = 1.1x • [Figure: salary after % raise vs. salary before raise]
Standardized data • Compute z-scores as $z_i = (x_i - \bar{x})/s$ • Average of the z-scores = 0 • SD of the z-scores = 1 • MINITAB: Calc >> Standardize (specify an output column)
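Standardizing is the linear function y = a + bx with b = 1/s and a = -x̄/s. By the rules for linear functions, the z-scores therefore have mean 0 and SD 1. This minimal sketch uses the sample SD (n - 1 divisor) and the three-value data from the SD computation; `standardize` is a hypothetical helper name.

```python
import statistics

def standardize(values):
    """z = (x - mean) / s, with s the sample SD (n - 1 divisor)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


z = standardize([1, 12, 2])
print([round(v, 4) for v in z])         # [-0.6576, 1.1508, -0.4932]
print(statistics.mean(z), statistics.stdev(z))   # approximately 0 and 1
```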
Example • [Figure: distributions of Income and of Z Income (standardized income)]
Non-linear functions of data • Monotone functions (increasing or decreasing) • Example: y = log x, Median(y) = log[Median(x)] (exact when the median is an observed value, e.g. n odd) • Also true for Min, Q1, Q3, Max (for a decreasing function, Min and Max swap roles, as do Q1 and Q3) • NOT TRUE FOR MEAN & SD • Mean & SD must be computed after transforming the data • Non-monotone functions • Example: $y = x^2$ • ALL MEASURES must be computed after transforming the data
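A quick numerical check of these rules uses the nine-value median example, where n is odd so the median is an observed value. The sketch also contrasts y = x^2 on a small made-up data set containing a negative value, which is an illustrative assumption.

```python
import math
import statistics

x = [54, 9, 37, 15, 52, 40, 54, 128, 1]          # n = 9 (odd)
logx = [math.log(v) for v in x]

# Monotone (increasing) transformation: the median carries over
print(statistics.median(logx), math.log(statistics.median(x)))   # equal

# The mean does not: mean of logs differs from log of the mean
print(statistics.mean(logx), math.log(statistics.mean(x)))       # differ

# Non-monotone transformation y = x**2 on data with a negative value
w = [-3, -1, 2]
print(statistics.median([v ** 2 for v in w]), statistics.median(w) ** 2)   # 4 1
```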
Example • [Figure: distributions of Income and of the natural log of Income; median income = 40950]
Skewed distribution • Income distribution • Hypothesis H: the data are generated from a normal distribution • If H is true, then the tail probability is P-value = P[R ≤ 0.7916] < 0.0100 (small values of R indicate departure from normality) • Conclusion: the P-value is low, hence the data reject the normality hypothesis
Transformation to normality • Distribution of log of income • Hypothesis H: the logs of income are generated from a normal distribution • If H is true, then the tail probability is P-value = P[R ≤ 0.9950] > 0.10 • Conclusion: the P-value is not low, hence the data give no evidence against the normality hypothesis