
Descriptive Statistics (Part 1) - PowerPoint PPT Presentation



Chapter 4A. Descriptive Statistics (Part 1). Numerical Description, Central Tendency, Dispersion. McGraw-Hill/Irwin. © 2008 The McGraw-Hill Companies, Inc. All rights reserved.

Presentation Transcript
Chapter 4A

Descriptive Statistics (Part 1)

Numerical Description

Central Tendency

Dispersion

McGraw-Hill/Irwin

© 2008 The McGraw-Hill Companies, Inc. All rights reserved.


Numerical Description

  • Statistics are descriptive measures derived from a sample (n items).

  • Parameters are descriptive measures derived from a population (N items).


Numerical Description

  • Three key characteristics of numerical data:


Numerical Description

  • Example: Vehicle Quality

  • Defect rate = (total no. defects / no. inspected) × 100

  • Numerical statistics can be used to summarize this random sample of brands.

  • Must allow for sampling error since the analysis is based on sampling.


Numerical Description

  • Number of defects per 100 vehicles, 1004 models.

Source: J. D. Power and Associates.


To begin, sort the data in Excel.


Numerical Description

  • Sorted data provides insight into central tendency and dispersion.


Numerical Description

  • Visual Displays

  • The dot plot offers a visual impression of the data.


Numerical Description

  • Visual Displays

  • Histograms with 5 bins (suggested by Sturges' Rule) and 10 bins are shown below.

  • Both are symmetric with no extreme values and show a modal class toward the low end.


Descriptive Statistics in Excel

Go to Tools | Data Analysis and select Descriptive Statistics.


Highlight the data range, specify a cell for the upper-left corner of the output range, check Summary Statistics and click OK.


Here is the resulting analysis.


Descriptive Statistics in MegaStat


Here is the resulting MegaStat analysis:


Central Tendency

  • Central tendency refers to the middle or typical values of a distribution.

  • Central tendency can be assessed using a dot plot, histogram or more precisely with numerical statistics.


Central Tendency

  • Six Measures of Central Tendency


Central Tendency

  • Mean

  • A familiar measure of central tendency.

  • In Excel, use function =AVERAGE(Data) where Data is an array of data values.


Central Tendency

  • Mean

  • For the sample of n = 37 car brands, the mean is 125.38.


Central Tendency

  • Characteristics of the Mean

  • Arithmetic mean is the most familiar average.

  • Affected by every sample item.

  • The balancing point or fulcrum for the data.


Central Tendency

  • Characteristics of the Mean

  • Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero.

  • Consider the following asymmetric distribution of quiz scores whose mean = 65.

Sum of deviations = (42 – 65) + (60 – 65) + (70 – 65) + (75 – 65) + (78 – 65)

= (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0
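
The balancing-point property of the mean can be checked numerically. The following is a minimal Python sketch using the five quiz scores from this slide; the variable names are illustrative and not part of the original presentation.

```python
# Quiz scores from the slide; their mean is 65.
scores = [42, 60, 70, 75, 78]

mean = sum(scores) / len(scores)

# Signed deviation of each score from the mean.
deviations = [x - mean for x in scores]

# Negative and positive deviations cancel each other out.
total_deviation = sum(deviations)

print(mean)             # 65.0
print(deviations)       # [-23.0, -5.0, 5.0, 10.0, 13.0]
print(total_deviation)  # 0.0
```

With absolute values the deviations would sum to 56, not zero, which is why the signs matter here.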


Central Tendency

  • Median

  • The median (M) is the 50th percentile or midpoint of the sorted sample data.

  • M separates the upper and lower half of the sorted observations.

  • If n is odd, the median is the middle observation in the data array.

  • If n is even, the median is the average of the middle two observations in the data array.


Central Tendency

  • Median

  • For n = 8, the median is between the fourth and fifth observations in the data array.


Central Tendency

  • Median

  • For n = 9, the median is the fifth observation in the data array.


Central Tendency

  • Median

  • For even n, Median = $(x_{n/2} + x_{n/2+1})/2$

  • Consider the following n = 6 data values: 11 12 15 17 21 32

  • What is the median?

n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4

$M = (x_3 + x_4)/2 = (15 + 17)/2 = 16$

The median lies between the third and fourth values: 11 12 15 [16] 17 21 32


Central Tendency

  • Median

  • For odd n, Median = $x_{(n+1)/2}$

  • Consider the following n = 7 data values: 12 23 23 25 27 34 41

  • What is the median?

(n+1)/2 = (7+1)/2 = 8/2 = 4

$M = x_4 = 25$

The median is the fourth value: 12 23 23 [25] 27 34 41
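
The odd and even cases above can be combined into one small function. This is a sketch in Python rather than the presentation's Excel; the function name `median` is illustrative, and the two test arrays are the n = 6 and n = 7 examples from the slides.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the observation at 1-based position (n + 1) / 2.
        return data[mid]
    # Even n: average the observations at 1-based positions n/2 and n/2 + 1.
    return (data[mid - 1] + data[mid]) / 2

print(median([11, 12, 15, 17, 21, 32]))      # 16.0
print(median([12, 23, 23, 25, 27, 34, 41]))  # 25
```

Sorting first matters: the position formulas apply to the sorted data array, not to the data in its original order.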


Central Tendency

  • Median

  • Use Excel’s function =MEDIAN(Data) where Data is an array of data values.

  • For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19.

  • So, the median is x19 = 121.

  • When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.


Central Tendency

  • Characteristics of the Median

  • The median is insensitive to extreme data values.

  • For example, consider the following quiz scores for 3 students:

Tom’s scores: 20, 40, 70, 75, 80 Mean = 57, Median = 70, Total = 285

Jake’s scores: 60, 65, 70, 90, 95 Mean = 76, Median = 70, Total = 380

Mary’s scores: 50, 65, 70, 75, 90 Mean = 70, Median = 70, Total = 350

  • What does the median for each student tell you?
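
The three students' figures can be reproduced with Python's standard `statistics` module. This sketch only recomputes the numbers on the slide; the dictionary layout and names are illustrative.

```python
from statistics import mean, median

# Quiz scores for the three students on the slide.
quiz_scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

# (mean, median, total) for each student.
summary = {
    name: (mean(scores), median(scores), sum(scores))
    for name, scores in quiz_scores.items()
}

for name, (avg, med, total) in summary.items():
    print(f"{name}: mean = {avg}, median = {med}, total = {total}")
```

All three medians are 70 even though the means range from 57 to 76: the median reflects only the middle score, so Tom's low scores and Jake's high scores leave it unchanged.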


Central Tendency

  • Mode

  • The most frequently occurring data value.

  • Similar to mean and median if data values occur often near the center of sorted data.

  • May have multiple modes or no mode.


Central Tendency

  • Mode

  • For example, consider the following quiz scores for 3 students:

Lee’s scores: 60, 70, 70, 70, 80 Mean = 70, Median = 70, Mode = 70

Pat’s scores: 45, 45, 70, 90, 100 Mean = 70, Median = 70, Mode = 45

Sam’s scores: 50, 60, 70, 80, 90 Mean = 70, Median = 70, Mode = none

Xiao’s scores: 50, 50, 70, 90, 90 Mean = 70, Median = 70, Modes = 50,90

  • What does the mode for each student tell you?
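
A short Python sketch can find the modes for the four students. It follows the slide's convention that a data set in which no value repeats has no mode; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes; empty if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once, so none is "most frequent".
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([60, 70, 70, 70, 80]))   # Lee:  [70]
print(modes([45, 45, 70, 90, 100]))  # Pat:  [45]
print(modes([50, 60, 70, 80, 90]))   # Sam:  []
print(modes([50, 50, 70, 90, 90]))   # Xiao: [50, 90]
```

All four students have the same mean and median (70), yet the modes differ widely, and Pat's mode of 45 is far from the center of the scores.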


Central Tendency

  • Mode

  • Easy to define, not easy to calculate in large samples.

  • Use Excel’s function =MODE(Array)
      - will return #N/A if there is no mode.
      - will return the first mode found if multimodal.

  • May be far from the middle of the distribution and not at all typical.


Central Tendency

  • Mode

  • Generally isn’t useful for continuous data since data values rarely repeat.

  • Best for attribute data or a discrete variable with a small range (e.g., Likert scale).


Central Tendency

  • Example: Price/Earnings Ratios and Mode

  • Consider the following P/E ratios for a random sample of 68 Standard & Poor’s 500 stocks.

  • What is the mode?


Central Tendency

  • Example: Price/Earnings Ratios and Mode

  • Excel’s descriptive statistics results are:

  • The mode 13 occurs 7 times, but what does the dot plot show?


Central Tendency

  • Example: Rose Bowl Winners’ Points

  • Points scored by the winning NCAA football team tend to have modes in multiples of 7 because a touchdown with the extra point yields 7 points.

  • Consider the dot plot of the points scored by the winning team in the first 87 Rose Bowl games.

  • What is the mode?


Central Tendency

  • Skewness

  • Compare mean and median or look at histogram to determine degree of skewness.
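
As a rough illustration of the mean-versus-median comparison, the sketch below applies the common rule of thumb that a mean above the median suggests right skew and a mean below it suggests left skew. The function name and labels are assumptions made for this example, and the rule is only a heuristic, so the histogram should still be checked.

```python
def skew_direction(mean, median):
    """Rough direction of skew, judged from the mean and median alone."""
    if mean > median:
        return "skewed right"
    if mean < median:
        return "skewed left"
    return "approximately symmetric"

# Vehicle quality data from this presentation: mean 125.38, median 121.
print(skew_direction(125.38, 121))  # skewed right

# Tom's quiz scores from this presentation: mean 57, median 70.
print(skew_direction(57, 70))       # skewed left
```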


Central Tendency

  • Symptoms of Skewness


Central Tendency

  • Midrange

  • The midrange is the point halfway between the lowest and highest values of X.

$\text{Midrange} = \dfrac{x_{\min} + x_{\max}}{2}$

  • Easy to use but sensitive to extreme data values.

  • For the J. D. Power quality data (n = 37), the midrange (130) is higher than the mean (125.38) or median (121).


Dispersion

  • Variation is the “spread” of data points about the center of the distribution in a sample. Consider the following measures of dispersion:

  • Measures of Variation


Dispersion

  • Range

  • The difference between the largest and smallest observation.

Range = $x_{\max} - x_{\min}$

  • For example, for the n = 68 P/E ratios,

Range = 91 – 7 = 84


Dispersion

  • Variance

  • The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean $\mu$ divided by the population size.

  • For the sample variance ($s^2$), we divide by n – 1 instead of n; otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$.
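
The difference between the two divisors can be seen in a small computation. This sketch reuses the five quiz scores with mean 65 from the Central Tendency slides as illustrative data; treating them as a population in one line and as a sample in the next is purely for comparison.

```python
scores = [42, 60, 70, 75, 78]  # quiz scores from the earlier slide; mean = 65
n = len(scores)

mean = sum(scores) / n

# Sum of squared deviations around the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

# Population variance: divide by the number of items.
pop_var = sum_sq_dev / n

# Sample variance: divide by n - 1 instead.
samp_var = sum_sq_dev / (n - 1)

print(sum_sq_dev)  # 848.0
print(pop_var)     # 169.6
print(samp_var)    # 212.0
```

Dividing by n – 1 always gives the larger value, which offsets the tendency described above.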


Dispersion

  • Standard Deviation

  • The square root of the variance.

Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

  • Explains how individual values in a data set vary from the mean.

  • Units of measure are the same as X.


Dispersion

  • Standard Deviation

  • Excel’s built-in functions are =STDEV(Array) for a sample and =STDEVP(Array) for a population.


Dispersion

  • Calculating a Standard Deviation

  • Consider the following five quiz scores for Stephanie.


Dispersion

  • Calculating a Standard Deviation

  • Now, calculate the sample standard deviation:

  • Somewhat easier, the two-sum formula can also be used:
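
Stephanie's scores are not preserved in this transcript, so the sketch below uses the five quiz scores from the Central Tendency slides purely as illustrative data. It computes the sample standard deviation twice: once from the definitional formula, and once from the two-sum formula, taken here in its usual form $s = \sqrt{\dfrac{\sum x_i^2 - (\sum x_i)^2 / n}{n - 1}}$.

```python
import math

scores = [42, 60, 70, 75, 78]  # illustrative data, not Stephanie's scores
n = len(scores)

# Definitional formula: squared deviations around the mean.
mean = sum(scores) / n
s_definitional = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

# Two-sum formula: needs only the sum of x and the sum of x squared.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
s_two_sum = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(round(s_definitional, 4))  # 14.5602
print(round(s_two_sum, 4))       # 14.5602
```

The two-sum version avoids computing the mean and each deviation first, which is convenient by hand, though it can lose precision in floating-point arithmetic when the values are large relative to their spread.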


Dispersion

  • Calculating a Standard Deviation

  • The standard deviation is nonnegative because deviations around the mean are squared.

  • When every observation is exactly equal to the mean, the standard deviation is zero.

  • Standard deviations can be large or small, depending on the units of measure.

  • Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.


Dispersion

  • Coefficient of Variation

  • Useful for comparing variables measured in different units or with different means.

  • A unit-free measure of dispersion

  • Expressed as a percent of the mean.

  • Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.
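
A minimal sketch of the coefficient of variation, assuming the usual sample definition CV = 100 × s / x̄. The function name and the height data are hypothetical and chosen only to show that the result does not depend on the units.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percent of the mean."""
    avg = statistics.mean(values)
    if avg <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * statistics.stdev(values) / avg

# Hypothetical heights, first in centimeters and then converted to inches.
heights_cm = [150, 160, 170, 180, 190]
heights_in = [h / 2.54 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_in = coefficient_of_variation(heights_in)

print(round(cv_cm, 2))
print(round(cv_in, 2))
```

The standard deviations differ by a factor of 2.54 between the two lists, but the CV is the same for both (up to floating-point rounding) because the unit cancels in the ratio.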


Dispersion

  • Coefficient of Variation

  • For example:


Dispersion

  • Mean Absolute Deviation

  • The Mean Absolute Deviation (MAD) reveals the average distance from an individual data point to the mean (center of the distribution).

  • Uses absolute values of the deviations around the mean.

  • Excel’s function is =AVEDEV(Array)
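
The MAD can be computed directly from its definition. The sketch below is in Python, with an illustrative function name, and reuses the five quiz scores with mean 65 from the Central Tendency slides. Like Excel's =AVEDEV, it divides by n.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the data points from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

scores = [42, 60, 70, 75, 78]  # quiz scores from the earlier slide; mean = 65
mad = mean_absolute_deviation(scores)

print(mad)  # 11.2
```

The absolute deviations are 23, 5, 5, 10 and 13, which sum to 56, so the MAD is 56 / 5 = 11.2. Without the absolute values the deviations would sum to zero.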


Dispersion

  • Central Tendency vs. Dispersion: Manufacturing

  • Consider the histograms of hole diameters drilled in a steel plate during manufacturing.

  • The desired distribution is outlined in red.

[Figure: histograms of hole diameters for Machine A and Machine B]


Dispersion

  • Central Tendency vs. Dispersion: Manufacturing

[Figure: histograms of hole diameters for Machine A and Machine B, annotated as follows]

  • Acceptable variation but mean is less than 5 mm.

  • Desired mean (5 mm) but too much variation.

  • Take frequent samples to monitor quality.


Dispersion

  • Central Tendency vs. Dispersion: Job Performance

  • Consider student ratings of four professors on eight teaching attributes (10-point scale).


Dispersion

  • Central Tendency vs. Dispersion: Job Performance

  • Jones and Wu have identical means but different standard deviations.


Dispersion

  • Central Tendency vs. Dispersion: Job Performance

  • Smith and Gopal have different means but identical standard deviations.


Dispersion

  • Central Tendency vs. Dispersion: Job Performance

  • A high mean (better rating) and low standard deviation (more consistency) are preferred. Which professor do you think is best?

