Visualizing Data: Charts & Graphs for Effective Analysis

Topic 1 Organizing Information Pictorially Using Charts and Graphs

Characteristics of the individuals under study are called variables • Some variables have values that are attributes or characteristics … those are called qualitative or categorical variables • Some variables have values that are numeric measurements … those are called quantitative variables • The suggested approaches to analyzing problems vary by the type of variable

Examples of categorical variables • Gender • Zip code • Blood type • States in the United States • Brands of televisions • Categorical variables have category values … those values cannot be added, subtracted, etc.

Examples of quantitative variables • Temperature • Height and weight • Sales of a product • Number of children in a family • Points achieved playing a video game • Quantitative variables have numeric values … those values can be added, subtracted, etc.

A simple data set is blue, blue, green, red, red, blue, red, blue • A frequency table for this qualitative data is: blue 4, green 1, red 3 • The most commonly occurring color is blue

The relative frequencies are the proportions (or percents) of the observations out of the total • A relative frequency distribution lists • Each of the categories • The relative frequency for each category

A relative frequency table for this qualitative data is: blue 0.5, green 0.125, red 0.375 • A relative frequency table can also be constructed with percents (50%, 12.5%, and 37.5% for the above table)
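The counting behind both tables can be sketched in a few lines of Python. This is only an illustration using the eight colors from the simple data set; the function names are made up for this sketch and are not part of the original slides.

```python
from collections import Counter

# The simple data set from the slides.
data = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]


def frequency_table(values):
    """Count how many times each category occurs."""
    return dict(Counter(values))


def relative_frequency_table(values):
    """Divide each category count by the total number of observations."""
    total = len(values)
    return {category: count / total for category, count in Counter(values).items()}


freq = frequency_table(data)
rel = relative_frequency_table(data)

# Print category, frequency, proportion, and percent side by side.
for category in freq:
    print(f"{category:<6} {freq[category]:>2}  {rel[category]:.3f}  {rel[category]:.1%}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check that no observation was missed.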

Bar graphs for categorical data • Bar graphs for our simple data (using Excel) • Frequency bar graph • Relative frequency bar graph

Comparative Bar Graph • An example side-by-side bar graph comparing educational attainment in 1990 versus 2003

Pie Chart • An example of a pie chart

Histogram for quantitative data • Quantitative data sometimes cannot be put directly into frequency tables since they do not have any obvious categories • Categories are created using classes, or intervals of numbers • The data is then put into the classes

For ages of adults, a possible set of classes is 20 – 29, 30 – 39, 40 – 49, 50 – 59, 60 and older • For the class 30 – 39 • 30 is the lower class limit • 39 is the upper class limit • The class width is the difference between the lower class limit and the lower class limit of the next class • For the class 30 – 39, the class width is 40 – 30 = 10

All the classes have the same width, except for the last class • The class “60 and older” is an open-ended class because it has no upper limit • Classes with no lower limit are also called open-ended classes
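As a rough sketch of the class limits above, the following Python computes the class widths from consecutive lower class limits and assigns an age to its class. It assumes ages are whole numbers; the function names are hypothetical and chosen only for this example.

```python
# Lower class limits from the slides; the last class ("60 and older") is open-ended.
lower_limits = [20, 30, 40, 50, 60]


def class_widths(lowers):
    """Width of each closed class: next lower limit minus this lower limit."""
    return [nxt - lo for lo, nxt in zip(lowers, lowers[1:])]


def class_label(age, lowers):
    """Return the label of the class containing a whole-number age."""
    if age < lowers[0]:
        raise ValueError("age is below the lowest class limit")
    for lo, nxt in zip(lowers, lowers[1:]):
        if lo <= age < nxt:
            # The upper class limit is one less than the next lower limit.
            return f"{lo} - {nxt - 1}"
    return f"{lowers[-1]} and older"


print(class_widths(lower_limits))
print(class_label(39, lower_limits))
print(class_label(72, lower_limits))
```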

The classes and the number of values in each can be put into a frequency table • In this table, there are 1147 subjects between 30 and 39 years old

Good practices for constructing tables for continuous variables • The classes should not overlap • The classes should not have any gaps between them • The classes should have the same width (except for possible open-ended classes at the extreme low or extreme high ends) • The class boundaries should be “reasonable” numbers • The class width should be a “reasonable” number

Just as for discrete data, a histogram can be created from the frequency table • Instead of individual data values, the categories are the classes – the intervals of data
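To illustrate how classes turn raw values into a histogram, here is a minimal Python sketch that counts values per class and prints a text histogram. The ages are hypothetical, invented only for this example, and the sketch assumes equal-width classes with whole-number data and no open-ended class.

```python
# Hypothetical ages, made up for illustration only.
ages = [23, 35, 41, 38, 52, 67, 29, 33, 45, 31, 58, 36]


def class_frequencies(values, start, width, n_classes):
    """Count the values falling in each of n_classes equal-width classes."""
    counts = [0] * n_classes
    for value in values:
        index = (value - start) // width
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


start, width, n_classes = 20, 10, 5  # classes 20-29, 30-39, ..., 60-69
counts = class_frequencies(ages, start, width, n_classes)

# One row per class: the class limits followed by one star per observation.
for i, count in enumerate(counts):
    lower = start + i * width
    upper = lower + width - 1
    print(f"{lower}-{upper} | {'*' * count}")
```

Because the classes do not overlap and leave no gaps, every value within the covered range lands in exactly one class.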

Stemplots • A stemplot (stem-and-leaf plot) is a different way to represent data that is similar to a histogram • To draw a stem-and-leaf plot, each data value must be broken up into two components • The stem consists of all the digits except for the rightmost one • The leaf consists of the rightmost digit • For the number 173, for example, the stem would be “17” and the leaf would be “3”

Stemplots • In the stem-and-leaf plot below • The smallest value is 56 • The largest value is 180 • The second largest value is 178

Stemplots • To draw a stemplot • Write all the values in ascending order • Find the stems and write them vertically in ascending order • For each data value, write its leaf in the row next to its stem • The resulting leaves will also be in ascending order • The list of stems with their corresponding leaves is the stem-and-leaf plot
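The steps above can be sketched in Python as follows. The data values are hypothetical, and the sketch assumes non-negative whole numbers. It also lists stems that have no leaves, which is a common convention but an assumption here, since the slides do not say how empty stems are handled.

```python
def stemplot(values):
    """Map each stem to its leaves, with stems and leaves in ascending order."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # leaf is the rightmost digit
        rows.setdefault(stem, []).append(leaf)
    # Include every stem between the smallest and largest, even if empty.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


# Hypothetical data values, made up for illustration only.
heights = [173, 156, 162, 178, 180, 165, 171, 159]

plot = stemplot(heights)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```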

Comparative Stemplots • To compare two sets of data, we can draw two stem-and-leaf plots that share the same stems, with leaves going left (for one set of data) and right (for the other set)
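A back-to-back layout like this might be built as in the sketch below. Both data sets are hypothetical, the function name is made up for this example, and the code assumes non-empty lists of non-negative whole numbers.

```python
def back_to_back(left_values, right_values):
    """Return (left leaves, stem, right leaves) rows for a comparative stemplot."""

    def leaves_by_stem(values):
        table = {}
        for value in sorted(values):
            table.setdefault(value // 10, []).append(value % 10)
        return table

    left = leaves_by_stem(left_values)
    right = leaves_by_stem(right_values)
    first = min(min(left), min(right))
    last = max(max(left), max(right))

    rows = []
    for stem in range(first, last + 1):
        # Left-hand leaves are reversed so they grow outward from the stem.
        left_leaves = "".join(str(d) for d in reversed(left.get(stem, [])))
        right_leaves = "".join(str(d) for d in right.get(stem, []))
        rows.append((left_leaves, stem, right_leaves))
    return rows


# Two hypothetical data sets, made up for illustration only.
group_a = [156, 159, 162]
group_b = [151, 165, 168]

rows = back_to_back(group_a, group_b)
for left_leaves, stem, right_leaves in rows:
    print(f"{left_leaves:>5} | {stem} | {right_leaves}")
```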

A useful way to describe a variable is by the shape of its distribution • Some common distribution shapes are • Uniform • Bell-shaped (or normal) • Skewed right • Skewed left

A variable has a uniform distribution when • Each of the values tends to occur with the same frequency • The histogram looks flat

A variable has a bell-shaped distribution when • Most of the values fall in the middle • The frequencies tail off to the left and to the right • It is symmetric

A variable has a skewed right distribution when • The distribution is not symmetric • The tail to the right is longer than the tail to the left • The arrow from the middle to the long tail points right

A variable has a skewed left distribution when • The distribution is not symmetric • The tail to the left is longer than the tail to the right • The arrow from the middle to the long tail points left

The two graphs show the same data … the difference seems larger for the graph on the left • The vertical scale is truncated on the left
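A small calculation shows why a truncated vertical scale exaggerates a difference. The two values and the baselines below are hypothetical, chosen only to illustrate the effect; they are not the data from the graphs.

```python
def apparent_ratio(smaller, larger, baseline):
    """Ratio of drawn bar heights when the vertical axis starts at baseline."""
    return (larger - baseline) / (smaller - baseline)


# Hypothetical values, made up for illustration only.
value_a, value_b = 50, 55

# Axis starting at 0: the taller bar is drawn only slightly taller.
honest = apparent_ratio(value_a, value_b, baseline=0)

# Axis truncated to start at 45: the same data now looks far more different.
truncated = apparent_ratio(value_a, value_b, baseline=45)

print(f"axis from 0:  taller bar is {honest:.2f} times as tall")
print(f"axis from 45: taller bar is {truncated:.2f} times as tall")
```

With these numbers, a 10% difference in the data is drawn as a bar twice as tall once the axis is truncated.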

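The gazebo on the right is twice as large in each dimension as the one on the left • However, it looks much more than twice as large, because doubling both the width and the height makes the picture four times as large in area • [Figure: two gazebos, labeled Original and “Twice” as large]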
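The arithmetic behind the gazebo picture can be checked with a short Python sketch. It relies only on the standard geometric fact that scaling every dimension by a factor multiplies area by the factor squared and volume by the factor cubed; the function name is made up for this example.

```python
def scaled_sizes(factor):
    """Return how length, area, and volume grow when every dimension is scaled."""
    return {
        "length": factor,
        "area": factor ** 2,    # width and height both scale
        "volume": factor ** 3,  # width, height, and depth all scale
    }


sizes = scaled_sizes(2)
for measure, growth in sizes.items():
    print(f"{measure}: {growth} times as large")
```

A picture meant to show "twice as much" therefore overstates the comparison unless only one dimension, such as the height of a bar, is doubled.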
