Quantitative Variables

- Recall that quantitative variables have units and are measured on a continuous scale.
- Examples: income (in $), height (in inches), website popularity (by number of hits)

- Mathematical operations on quantitative variables make sense:
- Adding, subtracting, taking the arithmetic average, etc.

- Histogram: note that the bars touch each other, because the values along the horizontal axis are continuous!
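A histogram's bars touch because the bins are adjacent intervals: each bin's upper edge is the next bin's lower edge, so every value falls in exactly one bar. The sketch below counts the 2010 N.Z. earthquake magnitudes (used later in these slides) into bins of width 0.5 starting at 3.0. The function name `histogram_counts` and the bin width and starting edge are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch: count values into adjacent, equal-width bins [lower, lower + width).
# The upper edge of one bin is the lower edge of the next, so the bars touch.
# `histogram_counts`, the start of 3.0, and the width of 0.5 are hypothetical choices.

def histogram_counts(values, start, width, n_bins):
    """Return (lower_edge, upper_edge, count) for each adjacent bin."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)  # which bin the value lands in
        if 0 <= index < n_bins:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
for lower, upper, count in histogram_counts(nz_2010, start=3.0, width=0.5, n_bins=7):
    print(f"[{lower:.1f}, {upper:.1f}) {'#' * count}")
```

The empty bins between 4.0 and 6.0 are still printed, which is what makes the isolated bar for 6.4 stand out.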

- Dot plot

- To see the features of the data
- Shape
- Center
- Spread

- Identifier variables are categorical variables with exactly one individual in each category.
- Examples: Social Security Number, ISBN, FedEx Tracking Number

- Don’t be tempted to analyze identifier variables.
- Be careful not to consider all variables with one case per category, like year, as identifier variables.
- The Why (the reason the data were collected) will help you decide how to treat identifier variables.
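The definition of an identifier variable ("exactly one individual in each category") can be checked mechanically, but the check alone cannot tell a true identifier from a variable like year. The sketch below makes that concrete; the function name and all example values are hypothetical.

```python
# Sketch: does every category contain exactly one individual?
# `one_case_per_category` and the example data are hypothetical.

def one_case_per_category(values):
    """True if no value repeats, i.e., each category holds exactly one case."""
    return len(set(values)) == len(values)


tracking_numbers = ["A001", "A002", "A003", "A004"]  # hypothetical ID codes
years = [2007, 2008, 2009, 2010]                     # hypothetical one-case-per-year data
region = ["North", "South", "North", "South"]

print(one_case_per_category(tracking_numbers))  # True: behaves like an identifier
print(one_case_per_category(years))             # True, yet year is not an identifier
print(one_case_per_category(region))            # False
```

Both `tracking_numbers` and `years` pass the check, which is why the Why, not the data layout, decides how to treat such a variable.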

- Does the histogram have a single, central hump or several separated bumps?
- Humps in a histogram are called modes.
- A histogram with one main peak is dubbed unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.

A bimodal histogram has two apparent peaks.

A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform.
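The shape vocabulary above can be sketched as a rough rule applied to a histogram's bar heights: call it uniform if all bars are nearly the same height, otherwise count the peaks. This is only a heuristic under stated assumptions: the function names and the `tolerance` parameter are hypothetical, and the code counts every local maximum, whereas a person judging "main peaks" would ignore small wiggles.

```python
# Rough sketch: label a histogram's shape from its bar heights (bin counts).
# `count_peaks`, `describe_modality`, and `tolerance` are hypothetical choices.

def count_peaks(counts):
    """Count local maxima; a flat-topped peak (repeated heights) counts once."""
    # Collapse runs of equal heights so a plateau is treated as a single bar.
    heights = [h for i, h in enumerate(counts) if i == 0 or h != counts[i - 1]]
    peaks = 0
    for i, h in enumerate(heights):
        left = heights[i - 1] if i > 0 else float("-inf")
        right = heights[i + 1] if i < len(heights) - 1 else float("-inf")
        if h > left and h > right:
            peaks += 1
    return peaks


def describe_modality(counts, tolerance=1):
    """Return 'uniform', 'unimodal', 'bimodal', or 'multimodal'."""
    # Uniform: every bar is within `tolerance` of every other bar.
    if max(counts) - min(counts) <= tolerance:
        return "uniform"
    peaks = count_peaks(counts)
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal"


print(describe_modality([1, 3, 7, 3, 1]))        # unimodal
print(describe_modality([2, 6, 3, 1, 4, 8, 2]))  # bimodal
print(describe_modality([5, 6, 5, 5, 6]))        # uniform
print(describe_modality([1, 5, 1, 5, 1, 5, 1]))  # multimodal
```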

- Is the histogram symmetric?
- If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.
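The "fold" test can be imitated on bar heights: folding along the middle lines each bar up with its mirror image, which is the same as comparing the list of counts with its reverse. The function name and the `tolerance` for "match pretty closely" are hypothetical choices in this sketch.

```python
# Sketch of the fold test: compare each bar with its mirror image across the middle.
# `is_roughly_symmetric` and `tolerance` are hypothetical choices.

def is_roughly_symmetric(counts, tolerance=1):
    """True if each bar's height is within `tolerance` of its mirrored bar."""
    mirrored = counts[::-1]  # folding along the middle reverses the order of bars
    return all(abs(a - b) <= tolerance for a, b in zip(counts, mirrored))


print(is_roughly_symmetric([1, 3, 7, 4, 1]))     # True
print(is_roughly_symmetric([8, 5, 3, 2, 1, 1]))  # False
```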

- The (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.
- In the figure below, the histogram on the left is said to be skewed left, while the histogram on the right is said to be skewed right.
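One crude way to ask "which tail stretches out farther?" is to compare the distance from the median up to the maximum with the distance from the median down to the minimum. The sketch below does this for the two earthquake data sets used later in these slides; the function name and the `ratio` threshold are hypothetical, and because it looks only at the extremes, a single outlier (such as 6.4 in the N.Z. data) can decide the answer.

```python
# Crude sketch: compare the lengths of the upper and lower tails around the median.
# `tail_direction` and `ratio` are hypothetical choices, not a standard test.
from statistics import median


def tail_direction(values, ratio=1.5):
    m = median(values)
    lower_tail = m - min(values)
    upper_tail = max(values) - m
    if upper_tail > ratio * lower_tail:
        return "skewed right"  # longer tail on the right
    if lower_tail > ratio * upper_tail:
        return "skewed left"   # longer tail on the left
    return "roughly symmetric tails"


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
samoa_2010 = [1.1, 3.5, 4.4, 4.6, 5.1, 6.0]
print(tail_direction(nz_2010))     # skewed right
print(tail_direction(samoa_2010))  # skewed left
```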

- Do any unusual features stick out?
- Sometimes it’s the unusual features that tell us something interesting or exciting about the data.
- You should always mention any stragglers, or outliers, that stand apart from the body of the distribution.
- Are there any gaps in the distribution? If so, we might have data from more than one group.
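Gaps can be spotted by sorting the data and looking at the distance between neighboring values. In this sketch, `find_gaps` and the `min_gap` threshold of 1.0 are hypothetical choices; what counts as a "gap" depends on the units and the context.

```python
# Sketch: list neighboring sorted values that are farther apart than `min_gap`.
# `find_gaps` and the threshold are hypothetical choices for illustration.

def find_gaps(values, min_gap):
    ordered = sorted(values)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
print(find_gaps(nz_2010, min_gap=1.0))  # [(3.9, 6.4)]
```

The single gap isolates the 6.4 magnitude from the rest of the 2010 N.Z. earthquakes.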

- The following histogram has outliers: there are three cities in the leftmost bar.

We will discuss these in more detail when we introduce box plots.

- When quantitative variables are skewed, we describe the center and spread using different measures than if the variable is symmetric.

- The “most typical value” in the data usually refers to some measure of the “center” of the distribution.
- The median is the value that divides the histogram into two pieces of equal area: half of the data lie at or below it and half at or above it.

- First, order all values from smallest to largest
- Let n = sample size
- If n is odd, the median is the value in position (n+1)/2
- If n is even, the median is the average of the two middle values
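The steps above translate directly into code. This sketch follows the slide's positions, which count from 1, so each position is shifted down by one for Python's 0-based indexing; `median_by_position` is a hypothetical name (Python's standard `statistics.median` gives the same result). The data are the two earthquake examples that follow.

```python
# Sketch of the median steps: sort, find n, then pick the middle position(s).
# `median_by_position` is a hypothetical name; slide positions are 1-based.

def median_by_position(values):
    ordered = sorted(values)            # order from smallest to largest
    n = len(ordered)                    # sample size
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # value in position (n+1)/2
    lower = ordered[n // 2 - 1]           # value in position n/2
    upper = ordered[n // 2]               # value in position n/2 + 1
    return (lower + upper) / 2


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
samoa_2010 = [1.1, 3.5, 4.4, 4.6, 5.1, 6.0]
print(median_by_position(nz_2010))     # 3.6
print(median_by_position(samoa_2010))  # 4.5
```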

- Example 1: Earthquakes in N.Z.
- 2010 EQ magnitudes in N.Z.: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4
- Since n = 13 is odd:
- Median is located at the (n+1)/2 = (13+1)/2 = 7th position

- Median is 3.6

- Example 2: Earthquakes in Samoa
- 2010 earthquake magnitudes in Samoa: 1.1, 3.5, 4.4, 4.6, 5.1, 6.0
- Since n is even:
- Median is the average of
- (n/2) = (6/2) = 3rd value (4.4)
- (n/2)+1 = (6/2)+1 = 4th value (4.6)

- Median is (4.4+4.6)/2 = 4.5

- Example 1: The typical earthquake size in N.Z. in 2010 was 3.6 on the Richter scale
- How useful is this?

- If all earthquakes in N.Z. were 3.6, then the median would be sufficient information
- But they are not, so we need to see how spread out the magnitudes are around 3.6

- Range = max value - min value
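- For the N.Z. example:
- Range = 6.4 - 3.2 = 3.2

- This is not very useful. Why? The range depends only on the two most extreme values, so the single outlier (6.4) inflates it; without 6.4, the range would be 3.9 - 3.2 = 0.7.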
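Because the range uses only the largest and smallest values, dropping one extreme observation can change it drastically. This sketch computes the range of the 2010 N.Z. magnitudes with and without the largest value (6.4); `data_range` is a hypothetical name, and results are rounded only to hide floating-point noise in the subtraction.

```python
# Sketch: the range depends only on the two extreme values.
# `data_range` is a hypothetical helper name.

def data_range(values):
    return max(values) - min(values)


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
without_largest = sorted(nz_2010)[:-1]  # drop the single largest value (6.4)

print(round(data_range(nz_2010), 2))          # 3.2
print(round(data_range(without_largest), 2))  # 0.7
```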

- Inter-quartile range
- IQR = Q3 - Q1
- Q1 = Median of 1st half
- Q3 = Median of 2nd half
- One single number that captures “how spread out the data is”

- NZ Earthquake example (continued):
- 2010 EQ magnitudes in N.Z., divided into halves:
- 1st half: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6
- 2nd half: 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4

- Each half has 7 values, so Q1 is in position (7+1)/2 = 4 of the 1st half -> 3.4
- Likewise, Q3 is in position (7+1)/2 = 4 of the 2nd half -> 3.8
- IQR = 3.8 - 3.4 = 0.4
- When n is odd, include the median in both halves; when n is even, do not
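The halves method can be sketched as follows: when n is odd, both halves include the median; when n is even, the data split cleanly in two. `quartiles_by_halves` is a hypothetical name. Statistical software often uses other quartile conventions (for example, interpolating between values), so its Q1 and Q3 can differ slightly from these hand-computed ones.

```python
# Sketch of the "median of each half" quartiles used on these slides.
# `quartiles_by_halves` is a hypothetical name; other conventions exist.
from statistics import median


def quartiles_by_halves(values):
    ordered = sorted(values)
    n = len(ordered)
    # Odd n: each half has (n+1)/2 values and shares the median; even n: n/2 each.
    half = (n + 1) // 2 if n % 2 == 1 else n // 2
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return median(lower_half), median(upper_half)


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
q1, q3 = quartiles_by_halves(nz_2010)
print(q1, q3, round(q3 - q1, 2))  # 3.4 3.8 0.4
```

For the Samoa data (n = 6, even), the same function splits the values into 1.1, 3.5, 4.4 and 4.6, 5.1, 6.0, giving Q1 = 3.5 and Q3 = 5.1.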

- Almost always a reasonable summary of the spread of a distribution
- Shows how spread out the middle 50% of the data is
- One problem is that it ignores a lot of individual variation

- Minimum
- Q1
- Median
- Q3
- Maximum

The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).
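Putting the pieces together, a five-number summary needs only the sorted data, the median, and the two half-medians. This sketch reuses the halves convention from the quartile slides; `five_number_summary` and the dictionary keys are hypothetical choices, applied here to the 2010 N.Z. earthquake magnitudes.

```python
# Sketch: minimum, Q1, median, Q3, maximum using the "median of each half" quartiles.
# `five_number_summary` and its dictionary keys are hypothetical choices.
from statistics import median


def five_number_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2 if n % 2 == 1 else n // 2  # include the median in both halves when n is odd
    return {
        "min": ordered[0],
        "Q1": median(ordered[:half]),
        "median": median(ordered),
        "Q3": median(ordered[n - half:]),
        "max": ordered[-1],
    }


nz_2010 = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
print(five_number_summary(nz_2010))
# {'min': 3.2, 'Q1': 3.4, 'median': 3.6, 'Q3': 3.8, 'max': 6.4}
```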

Example: The five-number summary for the ages at death for rock concert goers who died from being crushed is