Describing data

Describing Data - PowerPoint PPT Presentation



Describing Data. Descriptive Statistics: Central Tendency and Variation. Lecture Objectives. You should be able to: Compute and interpret appropriate measures of centrality and variation. Recognize distributions of data.


PowerPoint Slideshow about 'Describing Data' - nuri


Presentation Transcript

Describing Data

Descriptive Statistics:

Central Tendency and Variation


Lecture Objectives

  • You should be able to:

  • Compute and interpret appropriate measures of centrality and variation.

  • Recognize distributions of data.

  • Apply properties of normally distributed data based on the mean and variance.

  • Compute and interpret covariance and correlation.


Summary Measures

1. Measures of Central Location

Mean, Median, Mode

2. Measures of Variation

Range, Percentile, Variance, Standard Deviation

3. Measures of Association

Covariance, Correlation


Measures of Central Location: The Arithmetic Mean

It is the Arithmetic Average of data values:

Sample Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

The Most Common Measure of Central Tendency

Affected by Extreme Values (Outliers)

[Figure: two dot plots, one on a scale from 0 to 10 with Mean = 5 and one on a scale from 0 to 14 with Mean = 6.]


Median

  • Important Measure of Central Tendency

  • In an ordered array, the median is the “middle” number.

    • If n is odd, the median is the middle number.

    • If n is even, the median is the average of the 2 middle numbers.

  • Not Affected by Extreme Values

[Figure: two dot plots, one on a scale from 0 to 10 and one on a scale from 0 to 14; Median = 5 in both.]
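The contrast between the mean and the median can be sketched in a few lines of Python. The two data sets below are illustrative values chosen for this example (the slide's own data points are not reproduced here); the second set differs from the first only in its largest value.

```python
from statistics import mean, median

# Illustrative data (an assumption for this sketch, not the slide's data).
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]  # same values, but the largest is more extreme

for label, data in [("without outlier", without_outlier),
                    ("with outlier", with_outlier)]:
    # The mean uses every value, so one extreme value pulls it upward.
    # The median depends only on the middle of the ordered data.
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

With these values the mean moves from 5 to 6 while the median stays at 5, which is the behavior the two slides describe.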


Mode

A Measure of Central Tendency

Value that Occurs Most Often

Not Affected by Extreme Values

There May Not be a Mode

There May be Several Modes

Used for Either Numerical or Categorical Data

[Figure: two dot plots, one on a scale from 0 to 14 with Mode = 9 and one on a scale from 0 to 6 with No Mode.]
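A minimal sketch of finding the mode, covering the cases listed above: no mode, one mode, several modes, and categorical data. The function name `modes` and the sample values are assumptions made for illustration, and "no mode" is taken here to mean that no value occurs more than once.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Every value occurs exactly once: treat this as "no mode".
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([0, 1, 2, 3, 4, 5, 6]))           # no repeated value
print(modes([1, 3, 9, 9, 9, 12]))             # a single mode
print(modes([1, 1, 2, 2, 3]))                 # several modes
print(modes(["red", "blue", "red", "green"])) # categorical data
```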


Measures of Variability

Range: The simplest measure

Percentile: Used with the Median

Variance/Standard Deviation: Used with the Mean


Range

Difference Between Largest & Smallest Observations:

Range = $X_{\text{largest}} - X_{\text{smallest}}$

Ignores How Data Are Distributed:

[Figure: two data sets plotted on the same scale from 7 to 12, each with Range = 12 - 7 = 5.]
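As a small illustration of why the range ignores how data are distributed, the sketch below computes it for two hypothetical data sets that share the same smallest and largest values. Both lists are invented for this example.

```python
def data_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)

# Hypothetical data: same extremes, very different spread in between.
evenly_spread = [7, 8, 9, 10, 11, 12]
clustered = [7, 10, 10, 10, 10, 12]

print(data_range(evenly_spread), data_range(clustered))
```

Both calls give 5, even though the second data set is concentrated around 10.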


Percentile

2008 Olympic Medal Tally for top 55 nations. What is the percentile score for a country with 9 medals? What is the 50th percentile?


Percentile - solutions

Order all data (ascending or descending).

  • The country with 9 medals ranks 24th out of 55. There are 31 nations (56.36%) below it and 23 nations (41.82%) above it. Hence its score can be placed at about the 57th or 58th percentile.

  • The medal tally that corresponds to a 50th percentile is the one in the middle of the group, or the 28th country, with 7 medals. Hence the 50th percentile (Median) is 7.

    Now compute the first and third quartile values.
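The arithmetic in the solution above can be checked with a short sketch. It works only from the rank and group size quoted on the slide (24th of 55); the medal table itself is not reproduced, and the function name is an assumption for illustration.

```python
def percent_below_and_above(rank_from_top, n):
    """Percent of the group ranked below and above a given position."""
    below = n - rank_from_top   # nations with a lower rank
    above = rank_from_top - 1   # nations with a higher rank
    return 100 * below / n, 100 * above / n

pct_below, pct_above = percent_below_and_above(rank_from_top=24, n=55)
print(f"below: {pct_below:.2f}%, above: {pct_above:.2f}%")

# For an odd group size, the median is the middle position in the ordering.
median_position = (55 + 1) // 2
print(f"median position: {median_position}")
```

The percentile then falls between the share below (about 56.4%) and 100 minus the share above (about 58.2%), which is why the slide quotes the 57th or 58th percentile.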


Box Plot

The box plot shows 5 points, as follows: Smallest, Q1, Median, Q3, Largest.


Outliers

[Figure: box plot on a scale marked 20, 40, 50, 60, 80, and 105, with the point at 105 labeled as an outlier.]

Interquartile Range (IQR) = [Q3 – Q1] = 60-40 = 20

1 Step = [1.5 * IQR] = 1.5*20 = 30

Q1 – 30 = 40 - 30 = 10

Q3 + 30 = 60 + 30 = 90

Any point outside the limits (10, 90) is considered an outlier.
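The 1.5 × IQR rule above can be sketched as follows, using the slide's quartiles (Q1 = 40, Q3 = 60). The function names are assumptions made for this example.

```python
def outlier_limits(q1, q3, step_factor=1.5):
    """Lower and upper limits: one step (1.5 * IQR) beyond each quartile."""
    iqr = q3 - q1
    step = step_factor * iqr
    return q1 - step, q3 + step

def is_outlier(x, q1, q3):
    """True if x lies outside the limits computed from the quartiles."""
    lower, upper = outlier_limits(q1, q3)
    return x < lower or x > upper

print(outlier_limits(40, 60))   # limits for the slide's quartiles
print(is_outlier(105, 40, 60))  # 105 lies above the upper limit
print(is_outlier(80, 40, 60))   # 80 lies inside the limits
```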


Variance

For the Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

For the Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Variance is in squared units, and can be difficult to interpret. For instance, if data are in dollars, variance is in “squared dollars”.


Standard Deviation

For the Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

For the Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Standard deviation is the square root of the variance.
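A minimal sketch of the population and sample calculations, assuming the usual definitions (divide by N for a population, by n - 1 for a sample). The data are hypothetical dollar amounts invented for this example; Python's `statistics` module provides `pvariance`, `variance`, `pstdev`, and `stdev` for the same purpose.

```python
from math import sqrt

def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Squared deviations from the sample mean, dividing by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

prices = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical dollar amounts

pop_var = population_variance(prices)   # in "squared dollars"
samp_var = sample_variance(prices)      # in "squared dollars"
print(pop_var, sqrt(pop_var))           # the square root is back in dollars
print(samp_var, sqrt(samp_var))
```

The sample variance is slightly larger than the population variance for the same data because it divides by n - 1 rather than N.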



The Normal Distribution

A property of normally distributed data is as follows:


Comparing Standard Deviations

Data A: Mean = 15.5, s = 3.338

Data B: Mean = 15.5, s = 0.9258

Data C: Mean = 15.5, s = 4.57

[Figure: the three data sets shown as dot plots on a common scale from 11 to 21.]


Outliers

Typically, a number beyond a certain number of standard deviations is considered an outlier.

In many cases, a number more than 3 standard deviations from the mean (about a 0.27% chance of occurring if the data are normally distributed) is considered an outlier.

If identifying an outlier is more critical, one can make the rule more stringent, and consider 2 standard deviations as the limit.
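The chance of landing beyond a given number of standard deviations can be sketched under the assumption that the data are normally distributed. The helper names and the sample values passed to `is_sd_outlier` are hypothetical; the probability uses the standard identity P(|Z| < k) = erf(k / √2).

```python
from math import erf, sqrt

def prob_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return erf(k / sqrt(2))

def prob_beyond(k):
    """P(|Z| > k): chance of falling more than k standard deviations away."""
    return 1 - prob_within(k)

def is_sd_outlier(x, mean, std_dev, limit=3):
    """True if x is more than `limit` standard deviations from the mean."""
    return abs(x - mean) / std_dev > limit

for k in (2, 3):
    print(f"beyond {k} standard deviations: {100 * prob_beyond(k):.2f}%")

# Hypothetical values: 20 is 3.33 standard deviations above a mean of 10.
print(is_sd_outlier(20, mean=10, std_dev=3))
```

Lowering `limit` to 2 flags many more points, since roughly 4.6% of normally distributed values fall beyond 2 standard deviations.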


Coefficient of Variation

Standard deviation relative to the mean: $CV = \frac{s}{\bar{X}} \times 100\%$

Helps compare variation across samples with different means.


Computing CV

Stock A: Average Price last year = $50

Standard Deviation = $5

Stock B: Average Price last year = $100

Standard Deviation = $5

Coefficient of Variation:

Stock A: CV = 10%

Stock B: CV = 5%
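The two CV values above follow directly from dividing each standard deviation by its mean. A short sketch, with the function name chosen for illustration:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

stock_a = coefficient_of_variation(std_dev=5, mean=50)
stock_b = coefficient_of_variation(std_dev=5, mean=100)
print(f"Stock A: CV = {stock_a:.0f}%")
print(f"Stock B: CV = {stock_b:.0f}%")
```

Although both stocks have the same $5 standard deviation, Stock A varies twice as much relative to its average price.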


Standardizing Data

Which of the two numbers for person 8 is farther from the mean? The age of 75 or the income of 200,000?

Z scores tell us the distance from the mean, measured in standard deviations.
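A z score is computed as (value - mean) / standard deviation. The sketch below applies this to person 8's age and income. The slide's data table is not available here, so the means and standard deviations are hypothetical placeholders chosen only to show the comparison.

```python
def z_score(x, mean, std_dev):
    """Distance of x from the mean, in units of standard deviations."""
    return (x - mean) / std_dev

# Hypothetical summary statistics (NOT taken from the slide's table).
age_z = z_score(75, mean=45, std_dev=15)
income_z = z_score(200_000, mean=80_000, std_dev=40_000)

print(f"age z = {age_z}, income z = {income_z}")
# Whichever z score is larger in absolute value is farther from its mean.
farther = "income" if abs(income_z) > abs(age_z) else "age"
print(f"farther from the mean: {farther}")
```

Because z scores are unit-free, years and dollars can be compared on the same scale.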


Measures of Association

Covariance and Correlation

Covariance measures the average product of the deviations of two variables from their means.

Correlation is the standardized form of covariance (divided by the product of their standard deviations).

Correlation is always between -1 and +1.
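A minimal sketch of both measures. It assumes the sample versions (dividing by n - 1), which is one common convention; the function names and the data are invented for this example.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, over n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical data: y rises (or falls) exactly in line with x.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print(sample_covariance(x, y_up), correlation(x, y_up))
print(sample_covariance(x, y_down), correlation(x, y_down))
```

The covariance depends on the units of the data, while the correlation is rescaled to lie between -1 and +1; the two perfectly linear examples sit at those extremes.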

