
# Quantitative Variables - PowerPoint PPT Presentation



## PowerPoint Slideshow about 'Quantitative Variables' - malha

Presentation Transcript

• Recall that quantitative variables have units and are measured on a continuous scale.

• Examples: income (in \$), height (in inches), website popularity (by number of hits)

• Mathematical operations on quantitative variables make sense:

• Adding, subtracting, taking the arithmetic average, etc.

• Histogram: the bars touch each other because the values along the bottom axis are continuous.

• A histogram lets us see the features of the data:

• Shape

• Center
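The bars touch because every value falls into exactly one of a set of adjacent intervals (bins). A minimal Python sketch of that counting step, assuming equal-width bins; the function name `histogram_counts` and the sample values are hypothetical, for illustration only:

```python
# Minimal sketch: sort a quantitative variable into equal-width, adjacent bins.
# Bin i covers [edges[i], edges[i+1]); the maximum value goes in the last bin.
# Adjacent bins share an edge, which is why histogram bars touch.

def histogram_counts(values, n_bins):
    """Return (edges, counts) for n_bins equal-width bins spanning the data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; bin width would be zero")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # Which bin v falls in; clamp so the maximum lands in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return edges, counts


# Hypothetical example data, for illustration only.
edges, counts = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 5], n_bins=4)
print(edges)   # [1.0, 2.0, 3.0, 4.0, 5.0]
print(counts)  # [1, 2, 3, 3]
```

Plotting software chooses bin edges with its own rules, so its counts can differ from this sketch; the point is only that the bins leave no gaps between them.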

• Identifier variables are categorical variables with exactly one individual in each category.

• Examples: Social Security Number, ISBN, FedEx Tracking Number

• Don’t be tempted to analyze identifier variables.

• Be careful not to consider all variables with one case per category, like year, as identifier variables.

• The Why (the purpose of the study) will help you decide how to treat identifier variables.

• Does the histogram have a single, central hump or several separated bumps?

• Humps in a histogram are called modes.

• A histogram with one main peak is dubbed unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.
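As a rough illustration of the vocabulary above, the sketch below counts peaks in a list of bar heights and names the shape. It is a toy rule under stated assumptions, not how modes are judged in practice: real histograms are noisy, and deciding which humps count as "main" peaks is a judgment made by eye. The names `count_peaks` and `describe_modality` are hypothetical.

```python
# Crude sketch: count the peaks (modes) in a list of histogram bar heights.
# Runs of equal heights are collapsed first so a flat-topped peak counts once.
# Assumption: every local peak counts; small noisy bumps are NOT filtered out.

def count_peaks(heights):
    collapsed = [h for i, h in enumerate(heights) if i == 0 or h != heights[i - 1]]
    peaks = 0
    for i, h in enumerate(collapsed):
        left = collapsed[i - 1] if i > 0 else float("-inf")
        right = collapsed[i + 1] if i + 1 < len(collapsed) else float("-inf")
        if h > left and h > right:
            peaks += 1
    return peaks


def describe_modality(heights):
    if not heights:
        raise ValueError("no bars")
    # Assumption: exactly equal bars stand in for "approximately the same height".
    if len(set(heights)) == 1:
        return "uniform"
    n = count_peaks(heights)
    if n == 1:
        return "unimodal"
    if n == 2:
        return "bimodal"
    return "multimodal"


print(describe_modality([1, 3, 6, 3, 1]))        # unimodal
print(describe_modality([2, 6, 2, 1, 5, 2]))     # bimodal
print(describe_modality([1, 5, 1, 5, 1, 5, 1]))  # multimodal
```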

Humps and Bumps (cont.)

A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform:

Humps and Bumps (cont.)

Symmetry

• Is the histogram symmetric?

• If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.

Symmetry (cont.)

• The (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.

• In the figure below, the histogram on the left is said to be skewed left, while the histogram on the right is said to be skewed right.

Anything Unusual?

• Do any unusual features stick out?

• Sometimes it’s the unusual features that tell us something interesting or exciting about the data.

• You should always mention any stragglers, or outliers, that stand away from the body of the distribution.

• Are there any gaps in the distribution? If so, we might have data from more than one group.

Anything Unusual? (cont.)

• The following histogram has outliers. There are three cities in the leftmost bar:

Shape - Outliers

Do any unusual features stick out?

We will discuss these in more detail when we introduce box plots.

Why do we care about shape?

• When quantitative variables are skewed, we describe the center and spread using different measures than if the variable is symmetric.
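To see why skewness matters, compare the arithmetic average (mean) with the median on the 2010 N.Z. earthquake magnitudes used in Example 1 of these slides. A sketch using Python's standard `statistics` module; the variable name `nz` is just illustrative. A common rule of thumb (not stated in the slides) is that a mean noticeably larger than the median points to a long right tail.

```python
from statistics import mean, median

# 2010 N.Z. earthquake magnitudes (Example 1 in these slides).
nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]

# The one large value (6.4) pulls the mean toward the long right tail,
# while the median depends only on the middle of the ordered data.
print(f"mean   = {mean(nz):.2f}")    # mean   = 3.77
print(f"median = {median(nz):.2f}")  # median = 3.60
```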

The center of the distribution - median

• The “most typical value” in the data usually refers to some measure of the “center” of the distribution

• The median is the point that divides the histogram into two pieces of equal area

Calculating the median

• First, order all values from smallest to largest

• Let n = sample size

• If n is odd, the median is located at the (n+1)/2 position

• If n is even, the median is the average of the two middle values (positions n/2 and n/2 + 1)
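The steps above translate directly into code. A minimal Python sketch; the function name `median_by_position` is hypothetical. The one subtlety is that the slide counts positions from 1, while Python lists are indexed from 0.

```python
def median_by_position(values):
    """Median using the slide's rules (positions are 1-based, as on the slide)."""
    if not values:
        raise ValueError("median of empty data is undefined")
    xs = sorted(values)  # first, order all values from smallest to largest
    n = len(xs)
    if n % 2 == 1:
        # n odd: the value at position (n+1)/2; Python indexes from 0, so subtract 1
        return xs[(n + 1) // 2 - 1]
    # n even: average of the values at positions n/2 and n/2 + 1
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


print(median_by_position([7, 1, 3]))     # 3
print(median_by_position([7, 1, 3, 5]))  # 4.0
```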

Calculating the median

• Example 1: Earthquakes in N.Z.

• 2010 earthquake magnitudes in N.Z.: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4

• Since n = 13 is odd:

• Median is located at position (n+1)/2 = (13+1)/2 = 7

• Median is 3.6
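Python's standard-library `statistics.median` applies the same odd/even rule, so it can double-check the hand calculation. A short sketch using the slide's data (`nz` and `position` are illustrative names):

```python
from statistics import median

# 2010 N.Z. earthquake magnitudes from Example 1, already in order.
nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]

n = len(nz)              # 13, odd
position = (n + 1) // 2  # 7th position (1-based)
print(n, position, nz[position - 1])  # 13 7 3.6
print(median(nz))                     # 3.6 (standard-library check)
```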

Calculating the median

• Example 2: Earthquakes in Samoa

• 2010 earthquake magnitudes in Samoa: 1.1, 3.5, 4.4, 4.6, 5.1, 6.0

• Since n = 6 is even, the median is the average of:

• the n/2 = 6/2 = 3rd value (4.4)

• the (n/2)+1 = (6/2)+1 = 4th value (4.6)

• Median is (4.4 + 4.6)/2 = 4.5
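The same even-n calculation in Python, as a sketch. Because the slide's positions start at 1 while Python lists start at index 0, the 3rd and 4th values are `samoa[2]` and `samoa[3]`; the variable names are illustrative.

```python
from statistics import median

# 2010 earthquake magnitudes in Samoa from Example 2, already in order.
samoa = [1.1, 3.5, 4.4, 4.6, 5.1, 6.0]

n = len(samoa)              # 6, even
third = samoa[n // 2 - 1]   # position n/2 = 3     -> index 2 -> 4.4
fourth = samoa[n // 2]      # position n/2 + 1 = 4 -> index 3 -> 4.6
print(third, fourth, (third + fourth) / 2)  # 4.4 4.6 4.5
print(median(samoa))                        # 4.5
```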

Median - Interpretation

• Example 1: The typical earthquake magnitude in N.Z. in 2010 was 3.6 on the Richter scale

• How useful is this?

Spread

• If all earthquakes in N.Z. were 3.6, then the median would be sufficient information

• But they are not, so we need to see how spread out the earthquakes are around 3.6

Spread - Range

• Range = max value - min value

• For the N.Z. example:

• Range = 6.4 - 3.2 = 3.2

• This is not very useful. Why?
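One way to see the problem: the range uses only the two most extreme values, so a single unusual earthquake can dominate it. A sketch on the N.Z. data from Example 1; `data_range` is a hypothetical helper name, chosen to avoid shadowing Python's built-in `range`.

```python
# 2010 N.Z. earthquake magnitudes from Example 1, already in order.
nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]


def data_range(values):
    """Range = max value - min value."""
    return max(values) - min(values)


print(round(data_range(nz), 2))       # 3.2
# Drop only the single largest value (6.4) and recompute:
print(round(data_range(nz[:-1]), 2))  # 0.7
```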

Spread - IQR

• Inter-quartile range

• IQR = Q3 - Q1

• Q1 = Median of 1st half

• Q3 = Median of 2nd half

• One single number that captures “how spread out the data is”

Spread - IQR

• N.Z. earthquake example (cont.):

• 2010 earthquake magnitudes in N.Z., divided into halves:

1st half: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6

2nd half: 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4

• Q1 is the median of the 1st half: with 7 values it sits at position (7+1)/2 = 4, so Q1 = 3.4

• Q3 is the median of the 2nd half: position (7+1)/2 = 4, so Q3 = 3.8

• IQR = 3.8 - 3.4 = 0.4

• When n is odd, include the median in both halves; when n is even, do not.
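The split-into-halves rule can be sketched in Python as follows; `median_sorted` and `quartiles_by_halves` are hypothetical names. Taking the first and last ceil(n/2) sorted values makes the median fall in both halves when n is odd and gives non-overlapping halves when n is even. Note that this is only one of several quartile conventions: software such as `statistics.quantiles` or spreadsheet quartile functions may interpolate and report slightly different Q1 and Q3.

```python
def median_sorted(xs):
    """Median of a list that is already sorted."""
    mid = len(xs) // 2
    return xs[mid] if len(xs) % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the slides' split-into-halves rule."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2    # ceil(n/2): the median is in both halves when n is odd
    lower = xs[:half]      # 1st half
    upper = xs[n - half:]  # 2nd half
    return median_sorted(lower), median_sorted(xs), median_sorted(upper)


nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
q1, med, q3 = quartiles_by_halves(nz)
print(q1, med, q3)        # 3.4 3.6 3.8
print(round(q3 - q1, 2))  # 0.4
```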

IQR

• Almost always a reasonable summary of the spread of a distribution

• Shows how spread out the middle 50% of the data is

• One problem is that it ignores a lot of individual variation

5-Number Summary

• Minimum

• Q1

• Median

• Q3

• Maximum

The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).
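Putting the pieces together, a minimal Python sketch of the five-number summary, applied to the N.Z. earthquake data from Example 1. The function name `five_number_summary` is hypothetical, and the quartiles follow the same split-into-halves rule used on the IQR slides rather than any particular library's convention.

```python
def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use the split-into-halves rule from these slides: when n is odd,
    the median belongs to both halves; when n is even, the halves do not overlap.
    """
    if not values:
        raise ValueError("five-number summary of empty data is undefined")
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2

    def med(s):
        mid = len(s) // 2
        return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2

    return xs[0], med(xs[:half]), med(xs), med(xs[n - half:]), xs[-1]


# 2010 N.Z. earthquake magnitudes from Example 1.
nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
print(five_number_summary(nz))  # (3.2, 3.4, 3.6, 3.8, 6.4)
```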

Example: The five-number summary for the ages at death for rock concert goers who died from being crushed is

The Five-Number Summary

Categorical or Quantitative?