
Lecture 3: Descriptives & Graphing. Lecturer: James Neill. Research Methods & Design in Psychology. Overview: univariate descriptives & graphs; non-parametric vs. parametric; non-normal distributions; properties of normal distributions; graphing relations b/w 2 and 3 variables.

Lecture 3

Descriptives & Graphing

Lecturer: James Neill

Overview

Univariate descriptives & graphs

Non-parametric vs. parametric

Non-normal distributions

Properties of normal distributions

Graphing relations b/w 2 and 3 variables

Empirical Approach to Research

A positivistic approach ASSUMES:

the world is made up of bits of data which can be ‘measured’, ‘recorded’, & ‘analysed’

Interpretation of data can lead to valid insights about how people think, feel and behave

What do we want to Describe?

Distributional properties of variables:

Central tendency(ies)

Dispersion

Shape

Basic Univariate Descriptive Statistics

Central tendency

• Mode
• Median
• Mean

Dispersion

• Interquartile Range
• Range
• Standard Deviation
• Variance

Shape

• Skewness
• Kurtosis
Basic Univariate Graphs

Bar Graph – Pie Chart

Stem & Leaf Plot

Boxplot

Histogram

Measures of Central Tendency

Statistics to represent the ‘centre’ of a distribution

Mode (most frequent)

Median (50th percentile)

Mean (average)

Choice of measure dependent on

Type of data

Shape of distribution (esp. skewness)
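As a minimal sketch of the three measures, the following uses Python's standard `statistics` module. The scores are hypothetical and were invented for illustration; the last value is a deliberate outlier to show how the mean reacts to it while the mode and median do not.

```python
import statistics

# Hypothetical scores (not from the lecture); 40 is a deliberate outlier.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 40]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # 50th percentile
mean = statistics.mean(scores)      # arithmetic average

print(f"mode={mode}, median={median}, mean={mean:.2f}")
```

With these values the mode and median are both 5, whereas the mean is pulled above 8 by the single extreme score.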

Measures of Central Tendency

            Mode   Median   Mean
Nominal      X
Ordinal      X       X
Interval     X       X       X
Ratio        X?      X       X

Measures of Dispersion

Measures of deviation from the central tendency

Non-parametric / non-normal: range, percentiles, min, max

Parametric: SD & properties of the normal distribution
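The measures listed above can be sketched with the standard library as follows. The data are hypothetical, and the quartile method is an assumption: `statistics.quantiles` defaults to the "exclusive" method, so other packages may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [4, 7, 8, 10, 11, 12, 15, 18]

# Non-parametric measures
minimum, maximum = min(scores), max(scores)
value_range = maximum - minimum
q1, q2, q3 = statistics.quantiles(scores, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                   # interquartile range

# Parametric measure
sd = statistics.stdev(scores)  # sample SD (N - 1 in the denominator)

print(minimum, maximum, value_range, q1, q2, q3, iqr, round(sd, 2))
```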

Measures of Dispersion

            Range, Min/Max   Percentiles   SD
Nominal
Ordinal          X
Interval         X                X         X?
Ratio            X                X         X

Describing Nominal Data

Frequencies

Most frequent?

Least frequent?

Percentages?

Bar graphs

Examine comparative heights of bars – shape is arbitrary

Consider whether to use freqs or %s

Frequencies

Number of individuals obtaining each score on a variable

Present in frequency tables

Or graphically (bar chart, pie chart)

Can also present as %
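A frequency table with percentages might be built as in the sketch below. The responses are hypothetical category labels made up for this example.

```python
from collections import Counter

# Hypothetical nominal responses (e.g., usual mode of transport).
responses = ["car", "bus", "car", "bike", "car", "bus", "walk", "car"]

counts = Counter(responses)   # frequency of each category
total = sum(counts.values())  # number of individuals
percentages = {cat: 100 * freq / total for cat, freq in counts.items()}

# Print the table from most to least frequent.
for category, freq in counts.most_common():
    print(f"{category:<6}{freq:>3}{percentages[category]:>7.1f}%")
```

The first row printed is the most frequent category, which for nominal data is the mode.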

Mode

Most common score - highest point in a distribution

Suitable for all types of data including nominal (may not be useful for ratio)

Before using, check frequencies and bar graph to see whether it is an accurate and useful statistic.

Describing Ordinal Data

Conveys order but not distance (e.g., ranks)

Descriptives as for nominal (i.e., frequencies, mode)

Also maybe median – if accurate/useful

Maybe IQR, min. & max.

Bar graphs, pie charts, & stem-&-leaf plots

Stem & Leaf Plot
• Useful for ordinal, interval and ratio data
• Alternative to histogram
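A text-based stem-and-leaf display can be sketched as follows. The function name and the choice of tens as stems and units as leaves are assumptions made for illustration, and the sketch only handles non-negative integers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical scores
for line in stem_and_leaf([12, 15, 21, 23, 23, 30, 47]):
    print(line)
```

Unlike a histogram, the display keeps every individual score readable while still showing the shape of the distribution.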
Box & whisker
• Useful for interval and ratio data
• Represents min., max., median, and quartiles
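The five values a boxplot represents can be computed as in this sketch. The function name is hypothetical, and the "inclusive" quartile method is an assumption; statistical packages differ in how they interpolate quartiles, and many boxplots also draw outliers separately rather than extending the whiskers to min. and max.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical scores
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```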
Describing Interval Data

Conveys order and distance, but no true zero (0 pt is arbitrary).

Interval data is discrete, but is often treated as ratio/continuous (especially for > 5 intervals)

Distribution (shape)

Central tendency (mode, median)

Dispersion (min, max, range)

Can also use M & SD if treating as continuous

Describing Ratio Data

Numbers convey order and distance, true zero point - can talk meaningfully about ratios.

Continuous

Distribution (shape – skewness, kurtosis)

Central tendency (median, mean)

Dispersion (min, max, range, SD)

The Four Moments of a Normal Distribution

[Figure: normal curve annotated with the mean, SD, skew (left and right), and kurtosis]

The Four Moments of a Normal Distribution

Four mathematical qualities (parameters) allow one to describe a continuous distribution which at least roughly follows a bell curve shape:

• 1st = mean (central tendency)
• 2nd = SD (dispersion)
• 3rd = skewness (lean / tail)
• 4th = kurtosis (peakedness / flatness)
Mean (1st moment)
• Average score
• Mean: $\bar{X} = \frac{\sum X}{N}$
• Use for ratio data or interval (if treating it as continuous).
• Influenced by extreme scores (outliers)
Standard Deviation (2nd moment)
• SD = square root of variance:

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$

• Standard Error (SE) = SD / square root of N, i.e., $SE = SD / \sqrt{N}$
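The SD and SE definitions above can be written out step by step as in this sketch. The function names and the scores are hypothetical.

```python
import math

def sample_sd(values):
    """Sample standard deviation: sqrt of the sum of squared deviations over N - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1))

def standard_error(values):
    """Standard error of the mean: SD divided by the square root of N."""
    return sample_sd(values) / math.sqrt(len(values))

# Hypothetical scores: mean = 5, sum of squared deviations = 32
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(scores), 3), round(standard_error(scores), 3))
```

Because SE divides by the square root of N, it shrinks as the sample grows even when the SD stays the same.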
Skewness (3rd moment)
• Lean of distribution
• +ve = tail to right
• -ve = tail to left
• Can be caused by an outlier
• Can be caused by ceiling or floor effects
• Can be accurate (e.g., the number of cars owned per person)
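The sign convention can be checked with a simple moment-based skewness coefficient, sketched below. This is the unadjusted (population) formula, which is an assumption; statistical packages typically apply a small-sample correction, so their values will differ slightly. The data are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

tail_right = [1, 1, 1, 2, 2, 3, 4, 6, 10]   # hypothetical: long tail to the right
tail_left = [11 - x for x in tail_right]    # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(tail_right), skewness(tail_left), skewness(symmetric))
```

The right-tailed data give a positive value, the mirrored data give the same value with a negative sign, and the symmetric data give zero.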
Skewness (3rd moment)
• Negative skew
• Positive skew
Kurtosis (4th moment)
• Flatness or peakedness of distribution
• +ve = peaked
• -ve = flattened
• Be aware that by altering the X and Y axis, any distribution can be made to look more peaked or more flat – so add a normal curve to the histogram to help judge kurtosis
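Because visual judgements of peakedness depend on the axes, a numeric coefficient is a useful cross-check. The sketch below computes excess kurtosis (the fourth standardised moment minus 3, so that a normal distribution scores 0). It uses the unadjusted population formula, which is an assumption; packages usually apply a sample correction. The data are hypothetical.

```python
def excess_kurtosis(values):
    """Fourth central moment / (second central moment ** 2), minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

flat = list(range(1, 11))                     # evenly spread scores
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # tall centre with two extreme scores

print(excess_kurtosis(flat), excess_kurtosis(peaked))
```

The evenly spread scores give a negative value (platykurtic) and the peaked scores give a positive value (leptokurtic).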
Kurtosis (4th moment)

Red = Positive (leptokurtic)

Blue = Negative (platykurtic)

Key Areas under the Curve for Normal Distributions
• For normal distributions, approx. +/- 1 SD ~ 68%, +/- 2 SD ~ 95%, +/- 3 SD ~ 99.7%
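The areas under the curve can be checked numerically. This sketch relies on the standard result that, for a normal distribution, the proportion of scores within k SDs of the mean equals erf(k / sqrt(2)); the function name is hypothetical.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within +/- k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} SD: {100 * area_within(k):.2f}%")
# +/- 1 SD: 68.27%
# +/- 2 SD: 95.45%
# +/- 3 SD: 99.73%
```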
Types of Non-normal Distribution
• Bi-modal
• Multi-modal
• Positively skewed
• Negatively skewed
• Flat (platykurtic)
• Peaked (leptokurtic)
Rules of Thumb in Judging Severity of Skewness & Kurtosis
• View histogram with normal curve
• Deal with outliers
• Skewness or kurtosis values < -1 or > 1
• Skewness / kurtosis significance tests
Skewed Distributions & the Mode, Median & Mean

+vely skewed: mode < median < mean

Symmetrical (normal): mean = median = mode

-vely skewed: mean < median < mode
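The ordering of the three measures can be illustrated with small hypothetical data sets, as sketched below. The orderings describe a typical pattern for unimodal skewed distributions, not a guarantee for every data set.

```python
import statistics

def centre(values):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(values), statistics.median(values), statistics.mean(values)

positive_skew = [1, 1, 1, 2, 2, 3, 4, 6, 10]      # hypothetical: tail to the right
negative_skew = [11 - x for x in positive_skew]   # mirror image: tail to the left

print(centre(positive_skew))  # mode < median < mean
print(centre(negative_skew))  # mean < median < mode
```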

Edward Tufte

Graphs:

Reveal data

Communicate complex ideas with clarity, precision, and efficiency

Tufte's Guidelines 1

Show the data

Substance rather than method

Avoid distortion

Present many numbers in a small space

Make large data sets coherent

Tufte's Guidelines 2
• Encourage eye to make comparisons
• Reveal data at several levels
• Purpose: Description, exploration, tabulation, decoration
• Closely integrated with statistical and verbal descriptions
Tufte’s Graphical Integrity 1

Some lapses intentional, some not

$\text{Lie Factor} = \frac{\text{size of effect in graph}}{\text{size of effect in data}}$

Leaving out important context

Lack of taste and aesthetics
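The Lie Factor can be computed as in the sketch below. It assumes that "size of effect" means relative change, (second value - first value) / first value, for both the graphic and the data; the function name and the numbers are hypothetical.

```python
def lie_factor(graph_first, graph_second, data_first, data_second):
    """Ratio of the relative change drawn in the graphic to the relative change in the data."""
    effect_in_graph = (graph_second - graph_first) / graph_first
    effect_in_data = (data_second - data_first) / data_first
    return effect_in_graph / effect_in_data

# Hypothetical example: the data rise from 10 to 15 (+50%),
# but the bars drawn grow from 2 cm to 6 cm (+200%).
print(lie_factor(2, 6, 10, 15))  # 4.0
```

A value near 1 means the graphic represents the data faithfully; values well above 1 mean the graphic exaggerates the effect.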

Tufte's Graphical Integrity 2

Trade-off between amount of information, simplicity, and accuracy

“It is often hard to judge what users will find intuitive and how [a visualization] will support a particular task” (Tweedie et al.)
