# Quantitative Variables


### Quantitative Variables

• Recall that quantitative variables have units and are measured on a continuous scale.

• Examples: income (in \$), height (in inches), website popularity (by number of hits)

### Quantitative Variables

• Mathematical operations on quantitative variables make sense:

• Adding, subtracting, taking the arithmetic average, etc.

### Visualizing quantitative variables

• Histogram: note that the bars touch each other, because the values along the horizontal axis are continuous!

• Dot plot
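A minimal Python sketch of how a histogram's bars are built: values are counted into adjacent, equal-width bins that share their edges, which is why the bars touch. The function name `histogram_counts`, the half-open bins `[lo, lo + width)`, and the bin width of 0.5 are illustrative assumptions, not part of the slides; the data are the 2010 N.Z. earthquake magnitudes from the median example later in the deck.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width).

    Adjacent bins share an edge, so together they cover the axis from `start`
    without gaps; values outside that range are skipped.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the bin containing v
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
start, width = 3.0, 0.5
counts = histogram_counts(nz_2010, start, width, n_bins=7)

# Text "bars": one '#' per observation in each bin
for k, c in enumerate(counts):
    lo = start + k * width
    print(f"{lo:.1f}-{lo + width:.1f} | {'#' * c}")
```

The empty bins from 4.0 to 6.0 form a gap, and the single 6.4 stands apart from the body of the distribution.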

### So why visualize?

• To see the features of the data

• Shape

• Center

### Identifying Identifiers

• Identifier variables are categorical variables with exactly one individual in each category.

• Examples: Social Security Number, ISBN, FedEx Tracking Number

• Don’t be tempted to analyze identifier variables.

• Be careful not to treat every variable with one case per category, such as year, as an identifier variable.

• Knowing *why* the data were collected will help you decide how to treat identifier variables.
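A variable with exactly one individual per category has as many distinct values as cases, which is easy to check mechanically. The sketch below (the function name and all sample values are hypothetical, invented for illustration) flags such variables, but as the bullets above warn, passing the check does not make a variable an identifier: a year column in year-by-year data passes too.

```python
def one_case_per_category(values):
    """Return True when every category occurs exactly once.

    This is the defining trait of an identifier variable, but it is only a
    screening check: a 'year' column in year-by-year data also passes.
    """
    return len(set(values)) == len(values)


# Hypothetical example columns, invented for illustration
tracking_numbers = ["TRK-0001", "TRK-0002", "TRK-0003"]
years = [2008, 2009, 2010]
countries = ["N.Z.", "N.Z.", "Samoa"]

print(one_case_per_category(tracking_numbers))  # True: a genuine identifier
print(one_case_per_category(years))             # True, yet year is not an identifier
print(one_case_per_category(countries))         # False: categories repeat
```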

### Humps and Bumps

• Does the histogram have a single, central hump or several separated bumps?

• Humps in a histogram are called modes.

• A histogram with one main peak is dubbed unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.

A bimodal histogram has two apparent peaks:

### Humps and Bumps (cont.)

A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform:

### Symmetry

• Is the histogram symmetric?

• If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.

### Symmetry (cont.)

• The (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.

• In the figure below, the histogram on the left is said to be skewed left, while the histogram on the right is said to be skewed right.

### Anything Unusual?

• Do any unusual features stick out?

• Sometimes it’s the unusual features that tell us something interesting or exciting about the data.

• You should always mention any stragglers, or outliers, that stand apart from the body of the distribution.

• Are there any gaps in the distribution? If so, we might have data from more than one group.

### Anything Unusual? (cont.)

• The following histogram has outliers; there are three cities in the leftmost bar:

### Shape - Outliers

Do any unusual features stick out?

We will discuss these in more detail when we introduce box plots.

### Why do we care about shape?

• When quantitative variables are skewed, we describe the center and spread using different measures than if the variable is symmetric.
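As an illustration (not from the original slides), Python's standard `statistics` module shows how one value far out in the right tail affects the arithmetic average but barely moves the median. The data are the 2010 N.Z. earthquake magnitudes from the median example later in the deck.

```python
from statistics import mean, median

# 2010 N.Z. earthquake magnitudes (sorted); 6.4 sits far out in the right tail
nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
without_big_quake = nz_2010[:-1]  # list is sorted, so this drops the 6.4

for label, data in [("all 13 quakes", nz_2010), ("without 6.4", without_big_quake)]:
    print(f"{label:>14}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```

With all 13 quakes the mean (about 3.77) sits above the median (3.60); dropping the 6.4 brings both to about 3.55.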

### The center of the distribution - median

• The “most typical value” in the data usually refers to some measure of the “center” of the distribution

• The median is the value that splits the ordered data in half, so it divides the histogram into two pieces of equal area

### Calculating the median

• First, order all values from smallest to largest

• Let n = sample size

• If n is odd, the median is the value at position (n+1)/2

• If n is even, the median is the average of the two middle values, at positions n/2 and n/2 + 1
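The steps above translate directly into code. A minimal Python sketch (the function name `median_by_position` is hypothetical) that follows the ordered-position rule and converts the 1-based positions used on the slide to Python's 0-based indices; it is checked against the two earthquake examples that follow.

```python
def median_by_position(values):
    """Median via the ordered-position rule.

    n odd:  the value at (1-based) position (n + 1) / 2
    n even: the average of the values at positions n/2 and n/2 + 1
    """
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)  # step 1: order from smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # 1-based position -> 0-based index
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
samoa_2010 = [1.1, 3.5, 4.4, 4.6, 5.1, 6.0]
print(median_by_position(nz_2010))     # 3.6
print(median_by_position(samoa_2010))  # 4.5
```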

### Calculating the median

• Example 1 : Earthquakes in N.Z.

• 2010 earthquake magnitudes in N.Z.: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4

• Since n = 13 is odd, the median is at position (n+1)/2 = (13+1)/2 = 7

• Median is 3.6

### Calculating the median

• Example 2 : Earthquakes in Samoa

• 2010 Earthquake magnitudes in Samoa: 1.1,3.5,4.4,4.6,5.1,6.0

• Since n = 6 is even, the median is the average of the values at two positions:

• n/2 = 6/2 = 3rd value (4.4)

• n/2 + 1 = 6/2 + 1 = 4th value (4.6)

• Median is (4.4 + 4.6)/2 = 4.5

### Median - Interpretation

• Example 1: The typical earthquake size in N.Z. in 2010 was 3.6 on the Richter scale

• How useful is this?

• If every 2010 N.Z. earthquake had been 3.6, the median alone would be sufficient information

• But they were not, so we need to see how spread out the magnitudes are around 3.6

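• Range = max value - min value

• For the N.Z. example:

• Range = 6.4 - 3.2 = 3.2

• This is not very useful. Why? The range depends only on the two most extreme values, so the single 6.4 quake inflates it.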

• Interquartile range (IQR)

• IQR = Q3 - Q1

• Q1 = median of the 1st (lower) half

• Q3 = median of the 2nd (upper) half

• A single number that captures how spread out the middle half of the data is

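• N.Z. earthquake example (cont.):

• 2010 earthquake magnitudes in N.Z., split into halves (n = 13 is odd, so the median 3.6 goes in both):

1st half: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6

2nd half: 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4

• Q1: the 1st half has 7 values, so its median is at position (7+1)/2 = 4 → 3.4

• Q3: the 2nd half has 7 values, so its median is at position (7+1)/2 = 4 → 3.8

• IQR = 3.8 - 3.4 = 0.4

• When n is odd, include the median in both halves; when n is even, do not (the data split evenly into two halves).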
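A minimal Python sketch of the range and IQR calculations, assuming the halving convention described above (median counted in both halves when n is odd). The function names `quartiles_by_halves` and `data_range` are hypothetical. Computing both measures with and without the 6.4 quake shows why the IQR is the more robust summary of spread.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data.

    Convention assumed here: when n is odd, the median belongs to both halves;
    when n is even, the data split into two halves of n/2 values each.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of empty data are undefined")
    half = (n + 1) // 2  # n odd: (n+1)/2 values, sharing the median; n even: n/2
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median(lower), median(upper)


def data_range(values):
    """Range = max value - min value."""
    return max(values) - min(values)


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
for label, data in [("all 13 quakes", nz_2010), ("without 6.4", nz_2010[:-1])]:
    q1, q3 = quartiles_by_halves(data)
    print(f"{label:>14}: range = {data_range(data):.2f}, "
          f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {q3 - q1:.2f}")
```

Dropping the single 6.4 shrinks the range from 3.20 to 0.70, while the IQR stays at 0.40. Other tools (for example Python's `statistics.quantiles` or spreadsheet quartile functions) use different interpolation rules and can report slightly different quartiles for the same data.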

### IQR

• Almost always a reasonable summary of the spread of a distribution

• Shows how spread out the middle 50% of the data is

• One problem is that it ignores a lot of individual variation

### 5-Number Summary

• Minimum

• Q1

• Median

• Q3

• Maximum

The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).
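A minimal Python sketch that assembles the five numbers, assuming the same halving convention as the IQR calculation above (median counted in both halves when n is odd). The function name `five_number_summary` is hypothetical; the example uses the 2010 Samoa magnitudes from Example 2.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles are medians of the lower and upper halves, with the overall
    median counted in both halves when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("five-number summary of empty data is undefined")
    half = (n + 1) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[n - half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]


samoa_2010 = [1.1, 3.5, 4.4, 4.6, 5.1, 6.0]
print(five_number_summary(samoa_2010))  # (1.1, 3.5, 4.5, 5.1, 6.0)
```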

Example: The five-number summary for the ages at death for rock concert goers who died from being crushed is