
Introduction to SPSS - PowerPoint PPT Presentation





PowerPoint Slideshow about ' Introduction to SPSS' - chloe-brock



Introduction to SPSS

Data types and SPSS data entry and analysis


In this session

  • What does SPSS look like?

  • Types of data (revision)

  • Data Entry in SPSS

  • Simple charts in SPSS

  • Summary statistics

  • Contingency tables and crosstabulations

  • Scatterplots and correlations

  • Tests of differences of means



Aspects of SPSS

  • Menus, especially Analyze and Graphs

  • Spreadsheet view of data

    • Rows are cases (people, respondents etc.)

    • Columns are Variables

  • Variable view of data

    • Shows detail of each variable type



In SPSS

  • We change ticks etc. on a questionnaire into numbers

  • One number for each variable for each case

  • How we do this depends on the type of variable/data


Types of data

  • Nominal

  • Ranked

  • Scales/measures

  • Mixed types

  • Text answers (open ended questions)


Nominal (categorical)

  • order is arbitrary

  • e.g. sex, country of birth, personality type, yes or no.

  • Use numeric in SPSS and give value labels.

    (e.g. 1=Female, 2=Male, 99=Missing)

    (e.g. 1=Yes, 2=No, 99=Missing)

    (e.g. 1=UK, 2=Ireland, 3=Pakistan, 4=India, 5=other, 99=Missing)
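
The coding idea above can be sketched outside SPSS as a lookup from numeric codes to value labels. This is only an illustration: the responses are made up, and the names `SEX_LABELS`, `MISSING` and `label` are hypothetical, not SPSS terms.

```python
# Value labels for a nominal variable, following the slide's sex example.
MISSING = 99
SEX_LABELS = {1: "Female", 2: "Male", MISSING: "Missing"}

# Made-up data: one number per case for this variable.
responses = [1, 2, 2, 1, 99, 1]


def label(code, labels):
    """Return the value label for a numeric code."""
    return labels.get(code, "Unknown code")


labelled = [label(code, SEX_LABELS) for code in responses]
valid = [code for code in responses if code != MISSING]

print(labelled)
print(len(valid), "valid cases out of", len(responses))
```

Keeping the missing-value code separate makes it easy to exclude those cases before counting or testing.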


Ranks or Ordinal

  • in order, 1st, 2nd, 3rd etc.

  • e.g. status, social class

  • Use numeric in SPSS with value labels

    • E.g. 1=Working class, 2=Middle class, 3=Upper class

    • E.g. Class of degree, 1=First, 2=Upper second, 3=Lower second, 4=Third, 5=Ordinary, 99=Missing


Measures, scales

  • Interval - equal units

    • e.g. IQ

  • Ratio - equal units, true zero on scale

    • e.g. height, income, family size, age

    • Makes sense to say one value is twice another

  • Use numeric (or comma, dot or scientific) in SPSS

    • E.g. family size, 1, 2, 3, 4 etc.

    • E.g. income per year, 25000, 14500, 18650 etc.


Mixed type

  • Categorised data

  • Actually ranked, but used to identify categories or groups

    • e.g. age groups

    • = ratio data put into groups

  • Use numeric in SPSS and use value labels.

    • E.g. Age group, 1=‘Under 18’, 2=‘18-24’, 3=‘25-34’, 4=‘35-44’, 5=‘45-54’, 6=‘55 or greater’
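
As a rough sketch of how ratio data (age in years) becomes a categorised variable, the function below assigns the group codes listed on the slide. The function name `age_group` and the sample ages are assumptions for illustration.

```python
# Value labels copied from the slide's age-group example.
AGE_GROUP_LABELS = {
    1: "Under 18",
    2: "18-24",
    3: "25-34",
    4: "35-44",
    5: "45-54",
    6: "55 or greater",
}


def age_group(age):
    """Map an age in whole years to the slide's age-group code."""
    if age < 18:
        return 1
    if age <= 24:
        return 2
    if age <= 34:
        return 3
    if age <= 44:
        return 4
    if age <= 54:
        return 5
    return 6


# Made-up ages, one per case.
ages = [17, 18, 24, 25, 40, 54, 55, 71]
codes = [age_group(a) for a in ages]
print([AGE_GROUP_LABELS[c] for c in codes])
```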


Text answers

  • E.g. answers to open-ended questions

  • Either enter text as given (use String in SPSS)

  • Or code or classify answers into one of a small number of types (use numeric/nominal in SPSS)


Data Entry in SPSS

  • Video by Andy Field


Frequency counts

  • Used with categorical and ranked variables

  • e.g. gender of students taking Health and Illness option



Central Tendency

  • Mean

    • = average value

    • sum of all the values divided by the number of values

  • Mode

    • = the most frequent value in a distribution

    • (N.B. it is possible to have 2 or more modes, e.g. bimodal distribution)

  • Median

    • = the half-way value, or the value that divides the ordered distribution in the middle

    • The middle score when scores are ordered

    • N.B. need to put values into order first
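
A minimal sketch of the three measures, using Python's standard `statistics` module rather than SPSS. The ages are made up for illustration.

```python
from statistics import mean, median, multimode

# Made-up ages; median() sorts the values internally before picking the middle one.
ages = [25, 18, 31, 19, 22, 19, 20]

average = mean(ages)      # sum of all the values divided by the number of values
middle = median(ages)     # the middle score when scores are ordered
modes = multimode(ages)   # every most-frequent value, so bimodal data gives two

print(average, middle, modes)

# A bimodal example: two values tie for most frequent.
print(multimode([1, 1, 2, 2, 3]))
```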


Dispersion and variability

  • Quartiles

    • The three values that split the sorted data into four equal parts.

    • Second Quartile = median.

    • Lower quartile = median of lower half of the data

    • Upper quartile = median of upper half of the data

    • Need to order the individuals first

    • One quarter of the individuals fall in each of the four parts; the inter-quartile range (lower to upper quartile) holds the middle half
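
The "median of each half" rule above can be sketched as follows. The function name `quartiles` and the data are illustrative assumptions; statistical packages, SPSS included, may use slightly different interpolation rules and so report slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile) by the median-of-halves rule."""
    ordered = sorted(values)          # need to order the individuals first
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # With an odd number of values the middle one belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


# Made-up ages.
q1, q2, q3 = quartiles([25, 18, 31, 19, 22, 19, 20])
print(q1, q2, q3)
```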


Used on Box Plot

Age of Health and Illness students

Upper quartile

Median

Lower quartile


Variance

  • Average of the squared deviations from the mean

  • 5.20 is the Sum of Squares

  • This depends on number of individuals so we divide by n (5)

  • Gives 1.04 which is the variance


Standard Deviation

  • The variance has one problem: it is measured in units squared.

  • This isn’t a very meaningful metric so we take the square root value.

  • This is the Standard Deviation
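
The steps from sum of squares to standard deviation can be traced in a few lines. The five scores below are an assumption, chosen only because they reproduce the slide's figures (sum of squares 5.20, variance 1.04); the slide's own data table is not in the transcript.

```python
from math import sqrt

# Assumed scores (not from the transcript) that give a sum of squares of 5.20.
scores = [1, 2, 3, 3, 4]

n = len(scores)
mean = sum(scores) / n
sum_of_squares = sum((x - mean) ** 2 for x in scores)
variance = sum_of_squares / n        # dividing by n, as on the slide
standard_deviation = sqrt(variance)  # back in the original units

print(round(sum_of_squares, 2), round(variance, 2), round(standard_deviation, 2))
```

Note that most packages, SPSS included, report the sample variance, which divides by n - 1 instead of n (here 5.20 / 4 = 1.30), so output will differ slightly from the hand calculation.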


Using SPSS

  • ‘Analyze > Descriptive Statistics > Explore’ menu.

  • Gives mean, median, SD, variance, min, max, range, skew and kurtosis.

  • Can also produce stem and leaf, and histogram.


Charts in SPSS

  • Use ‘Chart Builder’ from the ‘Graphs’ menu, or the ‘Legacy Dialogs’ submenu

  • And/or double click chart to edit it.

  • E.g. double click to edit bars (e.g. to change from colour to fill pattern).

  • Do this in SPSS first before cut and paste to Word

  • Label the chart (in SPSS or in Word)


Stem and leaf plots

  • e.g. age of students taking Health and Illness option

  • good at showing

    • distribution of data

    • outliers

    • range


Stem and leaf plots e.g.


Box Plot


Box Plot

Fill colour changed.

N.B. numbers refer to case numbers.


Histograms and bar charts

  • Length/height of bar indicates frequency


Histogram

Fill pattern suitable for black and white printing


Changing the bin size

Bin size made smaller to show more bars


Pie chart

  • angle of segment indicates proportion of the whole


Pie Chart

Shadow and one slice moved out for emphasis


Analysing relationships

  • Contingency tables or crosstabulations

    • Compares nominal/categorical variables

      • But can include ordinal variables

    • N.B. table contains counts (= frequency data)

    • One variable on horizontal axis

    • One variable on vertical axis

    • Row and column total counts known as marginals


Example

  • In the Health and Illness class, are women more likely to be under 21 than men?


Crosstabulations

  • e.g.

  • Use column and row percentages to look for relationships
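
A sketch of a crosstabulation with row percentages, built by counting (gender, age group) pairs. The counts are invented for illustration and are not the Health and Illness class data; `row_percentages` is a hypothetical helper name.

```python
from collections import Counter

# Made-up cases: one (gender, age group) pair per student.
cases = (
    [("Female", "Under 21")] * 12
    + [("Female", "21 and over")] * 8
    + [("Male", "Under 21")] * 3
    + [("Male", "21 and over")] * 7
)

# The contingency table: a count for every combination of categories.
counts = Counter(cases)


def row_percentages(table, row):
    """Percentage of one row's cases falling in each column."""
    total = sum(n for (r, _), n in table.items() if r == row)
    return {c: 100 * n / total for (r, c), n in table.items() if r == row}


for gender in ("Female", "Male"):
    print(gender, row_percentages(counts, gender))
```

In this made-up table 60% of women but only 30% of men are under 21, which is the kind of difference row percentages are meant to expose.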


SPSS output


Chi-square (χ²)

Cross tabulations and Chi-square are tests that can be used to look for a relationship between two variables:

  • When the variables are categorical so the data are nominal (or frequency).

  • For example, if we wanted to look at the relationship between gender and age.

  • There are several different types of Chi-square (χ²); we will be using the 2 x 2 Chi-square
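
To show what the 2 x 2 Chi-square computes, here is a hand calculation of the Pearson statistic. The observed counts are invented, and the sketch applies no continuity correction, whereas SPSS output for a 2 x 2 table typically lists a corrected value as well.

```python
def chi_square(table):
    """Return (Pearson chi-square statistic, expected counts) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total x column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected


# Made-up counts: rows = female, male; columns = under 21, 21 and over.
observed = [[12, 8], [3, 7]]
statistic, expected = chi_square(observed)

CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, alpha = 0.05 (standard tables)
print(round(statistic, 2), statistic > CRITICAL_VALUE)
```

With these counts the statistic is below the critical value, so the apparent difference would not count as significant at the 0.05 level.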



Another example

  • The Bank employees data


Bank Employees Chi-Square tests


Chi-Square analysis on SPSS

  • http://www.youtube.com/watch?v=Ahs8jS5mJKk (4m15s)

  • http://www.youtube.com/watch?v=IRCzOD27NQU

    • From 6m:30s to 9m:50s

  • http://www.youtube.com/watch?v=532QXt1PM-Q&feature=plcp&context=C3ba91a4UDOEgsToPDskJ-ABupdp-Yfvuf4j4fJGzV12m30s


Low values in cells

    • Get SPSS to output expected values

    • Look where these are <5

    • Consider recoding to combine cols or rows
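
The check described above can be sketched as a search for cells whose expected count is below 5. The table is made up and `low_expected_cells` is a hypothetical helper, not an SPSS feature.

```python
def low_expected_cells(table, threshold=5):
    """List (row index, column index, expected count) for cells below the threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / grand_total
            if expected < threshold:
                flagged.append((i, j, expected))
    return flagged


# Made-up counts in which the second row has very few cases.
observed = [[12, 8], [1, 3]]
flagged = low_expected_cells(observed)
for i, j, expected in flagged:
    print(f"row {i}, column {j}: expected {expected:.2f}")
```

Both cells in the sparse second row are flagged here, which is the situation where combining rows or columns is worth considering.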


Tabulating questionnaire responses

    • Categorical survey data often “collapsed” for purposes of data analysis

    An analysis on a sample of 2 (e.g. Black African) would not have been very meaningful!
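
Collapsing categories amounts to mapping old codes onto a smaller set of new ones. The sketch below reuses the country-of-birth codes from the nominal data example earlier in the transcript; the collapsed scheme, the responses and the names `COLLAPSE` and `NEW_LABELS` are assumptions for illustration.

```python
from collections import Counter

# Old codes: 1=UK, 2=Ireland, 3=Pakistan, 4=India, 5=other, 99=Missing.
# Hypothetical collapsed scheme: 1=UK or Ireland, 2=Elsewhere, 99=Missing.
COLLAPSE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 99: 99}
NEW_LABELS = {1: "UK or Ireland", 2: "Elsewhere", 99: "Missing"}

# Made-up responses, one code per case.
country_of_birth = [1, 1, 2, 3, 4, 5, 1, 99]

recoded = [COLLAPSE[code] for code in country_of_birth]
frequencies = Counter(recoded)

for code, count in sorted(frequencies.items()):
    print(NEW_LABELS[code], count)
```

Writing the recode into a new list, rather than overwriting the original, keeps the detailed codes available if a different grouping is needed later.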


Recoding variables

    • http://www.youtube.com/watch?v=uzQ_522F2SM&feature=related

      • Ignore t-test for now (6m11s)

    • http://www.youtube.com/watch?v=FUoYZ_f6Lxc

      • Uses old version of SPSS, no submenu now (6m)


Scatterplots and correlations

    • Looks for association between variables, e.g.

      • Population size and GDP

      • crime and unemployment rates

      • height and weight

    • Both variables must be rank, interval or ratio (scale or ordinal in SPSS).

    • Thus cannot use variables like gender, ethnicity, town of birth, occupation.


Scatterplots

    • e.g. age (in years) versus Number of GCSEs


Interpretation

    • As Y increases X increases

    • Called correlation

    • Regression line model in red


Correlation measures association not causation

    • The older the child the better s/he is at reading

    • The less your income the greater the risk of schizophrenia

    • Height correlates with weight

      • But weight does not cause height

      • Height is one of the causes of weight (also body shape, diet, fitness level etc.)

    • The number of ice creams sold is correlated with the rate of drowning

      • Ice creams do not cause drowning (nor vice versa)

      • Third variable involved – people swim more and buy more ice creams when it’s warm


Scatterplot in SPSS

    • Use Graphs menu

    • http://www.youtube.com/watch?v=74BjgPQvIEg (8m34s)

    • http://www.youtube.com/watch?v=blfflA-34pQ&feature=related (4m04s)

    • http://www.youtube.com/watch?v=UVylQoG4hZM (1m50s), ignore polynomial regression


Modifying the Scatterplot

    • http://www.youtube.com/watch?v=803YCYA2AoQ&feature=related (4m04s)

    • http://www.youtube.com/watch?v=vPzvuMuVXk8&feature=related (3m40s)


If mixed data sets

    • Change point icon and/or colour to see different subsets.

    • Overall data may have no relationship but subsets might.

    • E.g. show male and female respondents.

    • Use Chart builder


Correlation

    • Correlation coefficient = measure of strength of relationship, e.g. Pearson’s r

    • ranges from -1 to +1: the size (0 to 1) gives the strength, the plus or minus sign gives the direction
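
For readers who want to see what Pearson's r measures, here is a hand calculation from the usual sums of products and squares. The heights and weights are made up, and `pearson_r` is an illustrative name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Made-up heights (cm) and weights (kg) for five people.
heights = [150, 160, 170, 180, 190]
weights = [52, 60, 63, 75, 80]

r = pearson_r(heights, weights)
print(round(r, 2))                                      # strong positive correlation
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 2))        # perfect negative correlation
```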


Positive correlation

    • as x increases, y increases

    r = 0.7


Negative correlation

    • as x increases, y decreases

    r = -0.7


Strong correlation (i.e. close to 1)

    r = 0.9


Weak correlation (i.e. close to 0)

    r = 0.2


Interpretation cont.

    • r² is a measure of degree of variation in one variable accounted for by variation in the other.

    • E.g. If r=0.7 then r²=0.49 i.e. just under half the variation is accounted for (rest accounted for by other factors).

    • If r=0.3 then r²=0.09 so 91% of the variation is explained by other things.
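
The arithmetic behind the two examples above, written out using only the slide's own values:

```latex
\begin{align*}
r = 0.7 &\;\Rightarrow\; r^2 = 0.7^2 = 0.49
  \quad (49\% \text{ of the variation accounted for}) \\
r = 0.3 &\;\Rightarrow\; r^2 = 0.3^2 = 0.09, \qquad 1 - r^2 = 0.91
  \quad (91\% \text{ left to other factors})
\end{align*}
```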


Significance of r

    • SPSS reports if r is significant at α=0.05

    • N.B. this is dependent on sample size to a large extent.

    • Other things being equal, larger samples more likely to be significant.

    • Usually, size of r is more important than its significance
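
One way to see the sample-size effect is the standard t statistic for testing whether r differs from zero, t = r√(n − 2) / √(1 − r²). The sketch below applies it to the same weak correlation at two sample sizes; the sample sizes are arbitrary, and the critical values are quoted from standard t tables rather than computed.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing whether r differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# The same weak correlation, r = 0.3, in a small and a large sample.
t_small = t_for_r(0.3, 20)
t_large = t_for_r(0.3, 200)

# Two-tailed critical values at alpha = 0.05, from standard t tables:
# about 2.10 for 18 degrees of freedom and about 1.97 for 198.
print(round(t_small, 2), t_small > 2.10)
print(round(t_large, 2), t_large > 1.97)
```

The correlation is equally weak in both cases; only the larger sample pushes it past the significance threshold, which is why the size of r deserves more attention than its significance.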


Pearson’s r in SPSS

    • http://www.youtube.com/watch?v=loFLqZmvfzU (6m57s)


Parametric and non-parametric

    • Some statistics rely on the variables being investigated following a normal distribution – called parametric statistics

    • Others can be used if variables are not distributed normally – called Non-parametric statistics.

    • Pearson’s r is a parametric statistic

    • Kendall’s tau and Spearman’s rho (rank correlation) are non-parametric.
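
Spearman's rho can be understood as Pearson's r computed on ranks instead of raw values, which is why it does not depend on the data being normally distributed. A minimal sketch, with made-up data and illustrative function names:

```python
from math import sqrt


def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Made-up data: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Because the ordering of x and y is identical, rho comes out at 1 even though the relationship is curved and Pearson's r falls slightly short of 1.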


Assessing normality

    • Produce histogram and normal plot


Use statistical test

    • SPSS provides two formal tests for normality: Kolmogorov-Smirnov (K-S) and Shapiro-Wilk (S-W)

      • But, there is debate about KS

      • Extremely sensitive to departure from normality

      • May erroneously imply parametric test not suitable – especially in small sample

    • So, always use a histogram as well.


Often can use parametric tests

    • Parametric tests (e.g. Pearson’s r) are robust to departures from normality

    • Small, non-normal samples OK

    • But use non-parametric if

      • Data are skewed (questionnaire data often are)

      • Data are bimodal


Spearman’s rho

    • http://www.youtube.com/watch?v=r_WQe2c-ISU From 4.14 to 4.56

    • http://www.youtube.com/watch?v=POkFi5vKvI8&feature=fvwrel (6m16s)


So far…

    • Looked at relationships between nominal variables

      • Gender vs age group

    • Looked at relationships between scale variables

      • Height vs. Weight

    • Now combine the two

    • Groups vs a scale variable

      • E.g. Gender vs income


Reminder – IV vs DV

    • IV = independent variable

    • What makes a difference, causes effects, is responsible for differences.

    • DV = dependent variable

    • What is affected by things, what is changed by the IV.

    • Gender vs income. Gender = IV, income = DV

    • So we investigate the effect of gender on income


Example 1: Age group vs. no. of GCSEs

    • Using the Health and Illness class data

    • Age group defines 2 groups

      • Under 21

      • 21 and over

    • Just two groups

    • Can use independent samples t-test

    • Independent because the two groups consist of different people.

    • t-test compares the means of the 2 groups.
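
To make the comparison of two means concrete, here is a sketch of the independent samples t statistic with pooled (equal) variances. The GCSE counts are invented and are not the class data, and the critical value is quoted from standard t tables.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, degrees of freedom) for two independent groups, equal variances assumed."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Made-up numbers of GCSEs for two groups of different people.
under_21 = [6, 7, 5, 8, 6]
over_21 = [4, 5, 3, 5, 4]

t, df = independent_t(under_21, over_21)

CRITICAL_VALUE = 2.306  # two-tailed, alpha = 0.05, 8 degrees of freedom (standard tables)
print(round(t, 2), df, abs(t) > CRITICAL_VALUE)
```

SPSS reports an exact p value instead of a table lookup, along with a second row of results for when equal variances cannot be assumed.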


Difference of means

    • Do under 21s have more or fewer GCSEs than 21 and overs?

    • Means are different (6.44 & 4.28) but is that significant?


    No significant difference, therefore assume equal variances

    Means are statistically significantly different


Parametric vs non-parametric

    • Just as in the case of correlations, there are both kinds of tests.

    • Need to check if DV is normally distributed.

    • Do this visually

    • Also use statistical tests


Tests for normality

    • Kolmogorov-Smirnov and Shapiro-Wilk

    • If n>50 use KS

    • If n≤50 use SW

    • Null hypothesis is ‘data are normally distributed’.

    • So if p<0.05 then data are significantly different from a normal distribution – use non-parametric tests

    • If p≥0.05 then no significant difference – use parametric tests
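
The decision rules above can be written as two small functions. The function names are hypothetical, and the p value itself would come from the SPSS output; nothing here computes it.

```python
def choose_normality_test(n):
    """Pick the normality test by sample size, following the slide's rule."""
    return "Kolmogorov-Smirnov" if n > 50 else "Shapiro-Wilk"


def recommend_tests(p_value, alpha=0.05):
    """Null hypothesis: the data are normally distributed."""
    if p_value < alpha:
        # Significantly different from a normal distribution.
        return "non-parametric"
    return "parametric"


# Illustrative inputs only.
print(choose_normality_test(30), recommend_tests(0.20))
print(choose_normality_test(120), recommend_tests(0.01))
```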


Checking normality

    • Produce histogram of DV

    • Tick box to undertake statistical test

    • Interpret results.


t-test

    • Identify your two groups.

    • Determine what values in the data indicate those two groups (e.g. 1=female, 2=male)

    • Select Analyze: Compare Means: Independent-Samples T Test

    • http://www.youtube.com/watch?v=_KHI3ScO8sc (9m40s)


Mann-Whitney U test

    • Use this when comparing two groups and the DV is not normally distributed

    • http://www.youtube.com/watch?v=7iTvv3m9d_g (3m45s)
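
As a sketch of what the Mann-Whitney test works with, the U statistic can be computed from the ranks of the pooled data. The scores are made up and `mann_whitney_u` is an illustrative name; SPSS additionally reports a significance value, which this sketch does not compute.

```python
def mann_whitney_u(group_a, group_b):
    """Return the Mann-Whitney U statistic (the smaller of the two U values)."""
    combined = sorted(group_a + group_b)

    def rank(value):
        # 1-based rank in the pooled data; tied values share the average rank.
        return combined.index(value) + (combined.count(value) + 1) / 2

    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(rank(v) for v in group_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Made-up scores for two independent groups.
group_1 = [6, 7, 5, 8, 6]
group_2 = [4, 5, 3, 5, 4]

u = mann_whitney_u(group_1, group_2)
print(u)
```

A U close to zero means the two groups barely overlap once ranked; because only ranks are used, the DV does not need to be normally distributed.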


Comparing 3 or more groups

    • ANOVA = Analysis of Variance

    • Analyze: Compare Means: One-way ANOVA

    • http://www.youtube.com/watch?v=wFq1b3QjI1U (4m04s)

      Useful to get table of means (descriptives) and means plots from ANOVA options.
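
For a sense of what the F value in one-way ANOVA output represents, the sketch below computes it as the ratio of between-group to within-group variance. The three groups are invented for illustration, and the critical value is quoted from standard F tables.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df between, df within) for a one-way analysis of variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up scores for three independent groups.
f, df_between, df_within = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])

CRITICAL_VALUE = 5.14  # F(2, 6) at alpha = 0.05 (standard tables)
print(round(f, 2), df_between, df_within, f > CRITICAL_VALUE)
```

A significant F only says that at least one group mean differs; the table of means and the means plot show which groups are responsible.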


ANOVA Means and F value


ANOVA Means Plot

