Quantitative variables

Quantitative Variables

  • Recall that quantitative variables have units, and are measured on a continuous scale…

  • Examples: income (in $), height (in inches), website popularity (by number of hits)


Quantitative Variables

  • Mathematical operations on quantitative variables make sense …

  • Adding, subtracting, taking the arithmetic average etc…


Visualizing quantitative variables

  • Histogram – note that the bars touch each other – the values at the bottom are continuous!


Visualizing quantitative variables

  • Dot plot


So why visualize?

  • To see the features of the data

    • Shape

    • Center

    • Spread


Constructing a Histogram


Step 1 – Choose the Classes


Step 2 – Count


Step 3 – Draw the Histogram
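
The three steps above can be sketched in a few lines of Python. This is only an illustration: the function names `count_classes` and `draw_histogram` are made up for this sketch, the class width of 0.5 starting at 3.0 is an assumed choice, and the data are the N.Z. earthquake magnitudes that appear later in this presentation.

```python
def count_classes(values, start, width, num_classes):
    """Step 2: count how many values fall in each class [lo, lo + width)."""
    counts = [0] * num_classes
    for value in values:
        index = int((value - start) // width)
        # Values outside the chosen classes are skipped in this sketch.
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts


def draw_histogram(counts, start, width):
    """Step 3: return one text bar per class, with one '#' per value."""
    lines = []
    for i, count in enumerate(counts):
        lo = start + i * width
        lines.append(f"{lo:.1f} to {lo + width:.1f} | {'#' * count}")
    return lines


nz_magnitudes = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]

# Step 1: choose the classes (assumed here: 7 classes of width 0.5 from 3.0).
counts = count_classes(nz_magnitudes, start=3.0, width=0.5, num_classes=7)
print("\n".join(draw_histogram(counts, start=3.0, width=0.5)))
```

Different class choices in Step 1 can change how the picture looks, so it is worth trying more than one width.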


Identifying Identifiers

  • Identifier variables are categorical variables with exactly one individual in each category.

    • Examples: Social Security Number, ISBN, FedEx Tracking Number

  • Don’t be tempted to analyze identifier variables.

  • Be careful not to consider all variables with one case per category, like year, as identifier variables.

    • The Why will help you decide how to treat identifier variables.


Shape - Modality and Symmetry


Humps and Bumps

  • Does the histogram have a single, central hump or several separated bumps?

    • Humps in a histogram are called modes.

    • A histogram with one main peak is dubbed unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.
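
As a rough sketch of the vocabulary above, the code below labels a list of bar heights by counting its peaks. The function names are hypothetical, and treating every local maximum as a mode is a simplifying assumption: in practice "main peaks" are judged by eye, and small bumps caused by noise would not count.

```python
def count_peaks(counts):
    """Count local maxima in a list of bar heights (a flat top counts once)."""
    # Collapse runs of equal heights so a flat-topped hump is a single peak.
    heights = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    padded = [float("-inf")] + heights + [float("-inf")]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )


def describe_modality(counts):
    """Label bar heights using the terms from the slide (simplified)."""
    if len(set(counts)) == 1:
        # Assumption: only exactly equal bars are treated as uniform here.
        return "uniform"
    peaks = count_peaks(counts)
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal"


print(describe_modality([1, 3, 7, 3, 1]))
print(describe_modality([1, 5, 2, 6, 1]))
```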


Humps and Bumps (cont.)

A bimodal histogram has two apparent peaks:


Humps and Bumps (cont.)

A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform:


Symmetry

  • Is the histogram symmetric?

    • If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.


Symmetry (cont.)

  • The (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.

  • In the figure below, the histogram on the left is said to be skewed left, while the histogram on the right is said to be skewed right.


Anything Unusual?

  • Do any unusual features stick out?

    • Sometimes it’s the unusual features that tell us something interesting or exciting about the data.

    • You should always mention any stragglers, or outliers, that stand off away from the body of the distribution.

    • Are there any gaps in the distribution? If so, we might have data from more than one group.
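
Gaps can also be looked for in the class counts themselves. The sketch below is an illustration only: `find_gaps` is a hypothetical helper, and the example counts assume the N.Z. earthquake magnitudes from later in this presentation grouped into classes of width 0.5 starting at 3.0.

```python
def find_gaps(counts):
    """Return (first, last) index pairs for runs of empty classes
    that lie between two non-empty classes."""
    nonempty = [i for i, c in enumerate(counts) if c > 0]
    gaps = []
    for left, right in zip(nonempty, nonempty[1:]):
        if right - left > 1:
            gaps.append((left + 1, right - 1))
    return gaps


# Assumed class counts: 3.0-3.5, 3.5-4.0, ..., 6.0-6.5
example_counts = [4, 8, 0, 0, 0, 0, 1]
print(find_gaps(example_counts))
```

A long gap followed by a single short bar is one hint that a straggler is worth mentioning.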


Anything Unusual? (cont.)

  • The following histogram has outliers—there are three cities in the leftmost bar:


Shape - Outliers

Do any unusual features stick out?

We will discuss these in more detail when we introduce box plots.


Why do we care about shape?

  • When quantitative variables are skewed, we describe the center and spread using different measures than if the variable is symmetric.


The center of the distribution - median

  • The “most typical value” in the data usually refers to some measure of the “center” of the distribution

  • The median is the point that divides the histogram into two pieces of equal area: half the values lie below it and half above


Calculating the median

  • First, order all values from smallest to largest

  • Let n = sample size

  • If n is odd, the median is located at the (n+1)/2 position

  • If n is even, the median is the average of the two middle points


Calculating the median

  • Example 1 : Earthquakes in N.Z.

  • 2010 earthquake magnitudes in N.Z.: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4

  • Since n is odd:

    • Median is located at the

      (n+1)/2 = (13+1)/2 = 7th position

    • Median is 3.6


Calculating the median

  • Example 2 : Earthquakes in Samoa

  • 2010 earthquake magnitudes in Samoa: 1.1, 3.5, 4.4, 4.6, 5.1, 6.0

  • Since n is even:

    • Median is the average of

      • (n/2) = (6/2) = 3rd value (4.4)

      • (n/2)+1 = (6/2)+1 = 4th value (4.6)

    • Median is (4.4+4.6)/2 = 4.5
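
The median rule and the two worked examples above can be checked with a short Python sketch. The function name `median` is chosen for this illustration; it follows the odd/even rule from the slides (Python's standard library also offers `statistics.median`).

```python
def median(values):
    """Median by the rule in the slides: sort, then take the middle value
    (n odd) or the average of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based position, as in the slides
        return ordered[position - 1]     # Python lists are 0-based
    lower = ordered[n // 2 - 1]          # the (n/2)th value
    upper = ordered[n // 2]              # the (n/2 + 1)th value
    return (lower + upper) / 2


nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
samoa = [1.1, 3.5, 4.4, 4.6, 5.1, 6.0]

print(median(nz))     # 3.6
print(median(samoa))  # 4.5
```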


Median - Interpretation

  • Example 1: The typical earthquake size in N.Z. in 2010 was 3.6 on the Richter scale

  • How useful is this?


Spread

  • If all earthquakes in N.Z. were 3.6, then the median would be sufficient information

  • But they are not, so we need to see how spread out the earthquakes are around 3.6


Spread - Range

  • Range = max value - min value

  • For the N.Z. example:

    • Range = 6.4-3.2 = 3.2

  • This is not useful…why?
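
One possible answer to the question above is that the range depends only on the two most extreme values, so a single unusual observation can dominate it. The sketch below illustrates this with the N.Z. magnitudes; `value_range` is a name made up for this example, and dropping the largest value is done purely for comparison.

```python
def value_range(values):
    """Range = max value - min value."""
    return max(values) - min(values)


nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]

# Range of all 13 values, dominated by the single 6.4 earthquake.
print(round(value_range(nz), 1))               # 3.2

# Range of the other 12 values, for comparison only.
without_largest = sorted(nz)[:-1]
print(round(value_range(without_largest), 1))  # 0.7
```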


Spread-IQR

  • Inter-quartile range

  • IQR = Q3 - Q1

  • Q1 = Median of 1st half

  • Q3 = Median of 2nd half

  • One single number that captures “how spread out the data is”


Spread-IQR

  • NZ Earthquake example cont:

  • 2010 earthquake magnitudes in N.Z. (divided):

    1st half: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6

    2nd half: 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4

  • Each half has 7 values, so its median is at position (7+1)/2 = 4

  • Q1 = 4th value of the 1st half = 3.4

  • Q3 = 4th value of the 2nd half = 3.8

  • IQR = 3.8-3.4 = 0.4

  • When n is odd, include the median in both halves; when n is even, simply split the data into two equal halves
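
The quartile rule used in this example can be sketched as follows. The function names are chosen for this illustration, and the split rule is the one stated in the slides (median included in both halves when n is odd). Other textbooks and software use slightly different quartile conventions, so results from a library routine may not match exactly.

```python
def median(values):
    """Middle value (n odd) or average of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[: half + 1]   # median included in the 1st half
        upper = ordered[half:]        # ... and in the 2nd half
    else:
        lower = ordered[:half]
        upper = ordered[half:]
    return median(lower), median(upper)


nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
q1, q3 = quartiles(nz)
iqr = q3 - q1
print(q1, q3, round(iqr, 1))
```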


IQR

  • Almost always a reasonable summary of the spread of a distribution

  • Shows how spread out the middle 50% of the data is

  • One problem is that it ignores a lot of individual variation


5-Number Summary

  • Minimum

  • Q1

  • Median

  • Q3

  • Maximum
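
The five values listed above can be collected in one small function. This is a sketch under the same assumptions as the earlier examples: `five_number_summary` is a hypothetical name, the input must be non-empty, and the quartiles use the slides' rule of including the median in both halves when n is odd.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[: half + n % 2]   # includes the median when n is odd
    upper = ordered[half:]
    return (
        ordered[0],
        _median(lower),
        _median(ordered),
        _median(upper),
        ordered[-1],
    )


nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
print(five_number_summary(nz))  # (3.2, 3.4, 3.6, 3.8, 6.4)
```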


The Five-Number Summary

The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).

Example: The five-number summary for the ages at death for rock concert goers who died from being crushed is


Categorical or Quantitative?

