1 / 87

Lecture 3 Descriptives & Graphing Lecturer: James Neill Research Methods & Design in Psychology Overview Univariate descriptives & graphs Non-parametric vs. parametric Non-normal distributions Properties of normal distributions Graphing relations b/w 2 and 3 variables

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Descriptives & Graphing

Lecturer: James Neill

Univariate descriptives & graphs

Non-parametric vs. parametric

Non-normal distributions

Properties of normal distributions

Graphing relations b/w 2 and 3 variables

A positivistic approach ASSUMES:

the world is made up of bits of data which can be ‘measured’, ‘recorded’, & ‘analysed’

Interpretation of data can lead to valid insights about how people think, feel and behave

Distributional properties of variables:

Central tendency(ies)

Shape

Central tendency

Mode

Median

Mean

• Interquartile Range

• Range

• Standard Deviation

• Variance

Shape

• Skewness

• Kurtosis

Bar Graph – Pie Chart

Stem & Leaf Plot

Boxplot

Histogram

Statistics to represent the ‘centre’ of a distribution

Mode (most frequent)

Median (50th percentile)

Mean (average)

Choice of measure dependent on

Type of data

Shape of distribution (esp. skewness)

Mode

Median

Mean

Nominal

X

Ordinal

X

X

Interval

X

X

X

Ratio

X?

X

X

Measures of deviation from the central tendency

Non-parametric / non-normal:range, percentiles, min, max

Parametric:SD & properties of the normal distribution

Range, Min/Max

Percentiles

SD

Nominal

Ordinal

X

Interval

X

X

X?

Ratio

X

X

X

Frequencies

Most frequent?

Least frequent?

Percentages?

Bar graphs

Examine comparative heights of bars – shape is arbitrary

Consider whether to use freqs or %s

Number of individuals obtaining each score on a variable

Frequency tables

graphically (bar chart, pie chart)

Can also present as %

Bar chart: Do you believe in God?

Most common score - highest point in a distribution

Suitable for all types of data including nominal (may not be useful for ratio)

Before using, check frequencies and bar graph to see whether it is an accurate and useful statistic.

Conveys order but not distance (e.g., ranks)

Descriptives as for nominal (i.e., frequencies, mode)

Also maybe median – if accurate/useful

Maybe IQR, min. & max.

Bar graphs, pie charts, & stem-&-leaf plots

• Useful for ordinal, interval and ratio data

• Alternative to histogram

• Useful for interval and ratio data

• Represents min. max, median and quartiles

Conveys order and distance, but no true zero (0 pt is arbitrary).

Interval data is discrete, but is often treated as ratio/continuous (especially for > 5 intervals)

Distribution (shape)

Central tendency (mode, median)

Dispersion (min, max, range)

Can also use M & SD if treating as continuous

Numbers convey order and distance, true zero point - can talk meaningfully about ratios.

Continuous

Distribution (shape – skewness, kurtosis)

Central tendency (median, mean)

Dispersion (min, max, range, SD)

Mean

<-Kurt->

<-SD->

<-Skew

Skew->

Four mathematical qualities (parameters) allow one to describe a continuous distribution which as least roughly follows a bell curve shape:

• 1st = mean (central tendency)

• 2nd = SD (dispersion)

• 3rd = skewness (lean / tail)

• 4th = kurtosis (peakedness / flattness)

Mean (1st moment )

• Average score

• Mean =  X / N

• Use for ratio data or interval (if treating it as continuous).

• Influenced by extreme scores (outliers)

Standard Deviation (2nd moment )

• SD = square root of Variance

=  (X - X)2

N – 1

• Standard Error (SE) = SD / square root of N

Skewness (3rd moment )

• Lean of distribution

• +ve = tail to right

• -ve = tail to left

• Can be caused by an outlier

• Can be caused by ceiling or floor effects

• Can be accurate (e.g., the number of cars owned per person)

Skewness (3rd moment )

• Negative skew

• Positive skew

Kurtosis (4th moment )

• Flatness or peakedness of distribution

• +ve = peaked

• -ve = flattened

• Be aware that by altering the X and Y axis, any distribution can be made to look more peaked or more flat – so add a normal curve to the histogram to help judge kurtosis

Kurtosis (4th moment )

Red = Positive (leptokurtic)

Blue = negative (platykurtic)

• For normal distributions, approx. +/- 1 SD = 68%+/- 2 SD ~ 95%+/- 3 SD ~ 99.9%

• Bi-modal

• Multi-modal

• Positively skewed

• Negatively skewed

• Flat (platykurtic)

• Peaked (leptokurtic)

• View histogram with normal curve

• Deal with outliers

• Skewness / kurtosis <-1 or >1

• Skewness / kurtosis significance tests

Skewed Distributions& the Mode, Median & Mean

+vely skewed mode < median < mean

Symmetrical (normal) mean = median = mode

-vely skewed mean < median < mode

More on Graphing(Visualising Data)

Graphs:

Reveal data

Communicate complex ideas with clarity, precision, and efficiency

Show the data

Substance rather than method

Avoid distortion

Present many numbers in a small space

Make large data sets coherent

• Encourage eye to make comparisons

• Reveal data at several levels

• Purpose: Description, exploration, tabulation, decoration

• Closely integrated with statistical and verbal descriptions

Some lapses intentional, some not

Lie Factor = size of effect in graph size of effect in data

Leaving out important context

Lack of taste and aesthetics

Trade-off between amount of information, simplicity, and accuracy

“It is often hard to judge what users will find intuitive and how [a visualization] will support a particular task” (Tweedie et al)

Distortive Variations in Scale California 1964-1990

Distortive Variations in Scale California 1964-1990

Restricted Scales California 1964-1990

Restricted Scales California 1964-1990

People Histogram Variables (Bivariate)

Separate Graphs Variables (Bivariate)

Clustered bar chart Variables (Multivariate)

19 Variables (Multivariate)th vs. 20th century causes of death

Demographic distribution of age Variables (Multivariate)

Where partners first met Variables (Multivariate)

Line graph Variables (Multivariate)

Line graph Variables (Multivariate)

Causes of Mortality Variables (Multivariate)

Bivariate Normality Variables (Multivariate)

Exampes of More Complex Graphs Variables (Multivariate)

Sea Temperature Variables (Multivariate)

Sea Temperature Variables (Multivariate)