Quantitative Tools for Qualitative Data




### Quantitative Tools for Qualitative Data

Richard Bell

University of Melbourne

For copies of this presentation

• 130 slides
• (about 500kb in a zipped file)
• email: rcb@unimelb.edu.au
What kind of Qualitative Data can be Analysed?
• Not raw continuous text data
• Discrete text units that are replicated
• Any kind of coding that has been made
What does the data have to look like?
• It must be able to be represented by a table
• not necessarily a two-way table
• for example
• a four-way table: magazine, sex, concept, category

Here the data are stored as a table: the first four columns define the cells, and the last column gives the frequency in each cell.

To analyse these data at the case level, use the SPSS WEIGHT BY command, i.e.

WEIGHT BY FREQ.

Kinds of tables
• Rows are participants, columns are categories
• Rows are categories, columns are participants
• Rows are one set of categories, columns are another set of categories
Data in cells of table
• Indicator of the presence/absence of a relationship between rows and columns
• Frequencies or counts of indicators
• Values of categories

Indicator of presence/absence of relationship between rows and columns

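Proportions of Activities by Site (Frequencies)

Data from Huber (1997)

| Site | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| SIN | 0.11 | 0.07 | 0.09 | 0.13 | 0.15 | 0.16 | 0.15 | 0.12 | 0.12 |
| SGR | 0.07 | 0.05 | 0.09 | 0.13 | 0.18 | 0.16 | 0.2 | 0.18 | 0.2 |
| TCO | 0.5 | 0.46 | 0.43 | 0.34 | 0.33 | 0.28 | 0.26 | 0.25 | 0.2 |
| TIN | 0.17 | 0.21 | 0.19 | 0.09 | 0.13 | 0.2 | 0.13 | 0.16 | 0.16 |
| TGR | 0.07 | 0.09 | 0.11 | 0.14 | 0.1 | 0.07 | 0.13 | 0.16 | 0.13 |
| TOP | 0.09 | 0.12 | 0.09 | 0.17 | 0.11 | 0.13 | 0.12 | 0.14 | 0.18 |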

SIN: Students learn as self-regulated individuals

SGR: Students learn in autonomous groups

TCO: Teacher is in control

TIN: Teacher dominates, but allows some individual autonomy

TGR: Teacher dominates, but allows some small group autonomy

TOP: Teacher dominates, but is open to students' initiatives

Getting data into statistical packages such as SPSS
• Transfer data directly from qualitative packages such as Nvivo
• Use SPSS text-import wizard (best with precoded data, ie numbers)
• Enter data by hand
Transfer data from qualitative packages
• Need to be able to export tables.
• Should only be done for tables where rows (or columns) are units of analysis (ie documents or respondents)
• The table should be saved as a text file (i.e. with the extension .txt, as in table1.txt)
Transfer data from printed table
• Type into SPSS
• Transfer table from Word document
• Word document to Excel spreadsheet
The Table in Word

• Shorten the text and insert headings in the columns that will become SPSS variable names (i.e. < 9 characters, no spaces)
• Select the table and copy it to the clipboard
• Open Excel
• Paste from the clipboard
• Open SPSS
• Under the File pull-down menu, open new Data
• Change the file type to Excel files [.xls]
• Open the saved Excel spreadsheet

If you have names as column headings in the first row of the Excel spreadsheet, SPSS can read them as its variable names.
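The naming constraint mentioned for the Word table (fewer than 9 characters, no spaces) can be checked before the spreadsheet reaches SPSS. The following is a minimal Python sketch, not part of the original slides: the function name and the sample headings are hypothetical, and it tests only the two conditions stated on the slide.

```python
def is_short_spss_name(heading: str) -> bool:
    """True if the heading has fewer than 9 characters and contains no spaces."""
    return 0 < len(heading) < 9 and " " not in heading


# Hypothetical column headings taken from the first row of a spreadsheet
headings = ["type", "p1", "Marital Status", "respondent"]

# Headings that would need shortening before import
bad = [h for h in headings if not is_short_spss_name(h)]
print(bad)
```

Any heading reported here would have to be shortened by hand, as the slide suggests.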

SPSS opens the file (the variable view)

The data view

Notice there are dud lines in this file; they need to be edited out.

Now we need to change our

a) repeated phrases (variable ‘type’)
b) symbols (variables p1 to p8)

into numbers.

Do this through Automatic Recode under the ‘Transform’ tab.

Need to create a numeric variable into which the values of the alphanumeric variable are transformed (the alphanumeric values are saved as labels).
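What Automatic Recode does can be sketched outside SPSS. This is an illustrative Python sketch only, assuming the default behaviour of assigning consecutive integers to the distinct string values in ascending sorted order and keeping the original strings as value labels. The function name and the sample values are hypothetical.

```python
def autorecode(values):
    """Map each distinct string to 1..k in sorted order; return codes and labels."""
    labels = {code: text for code, text in enumerate(sorted(set(values)), start=1)}
    lookup = {text: code for code, text in labels.items()}
    return [lookup[v] for v in values], labels


# Hypothetical repeated phrases, standing in for the variable 'type'
codes, labels = autorecode(["group", "solo", "group", "teacher", "solo"])
print(codes)   # the new numeric variable
print(labels)  # the value labels keep the original strings
```

The numeric codes carry no meaning beyond identifying categories, so they should be treated as nominal in later analyses.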

Transferring Cross-Category tables into SPSS

[where Rows are one set of categories, columns are another set of categories]

• Three types of table:
• Cells of the table contain frequencies
• Cells of the table contain other data
• Cells of the table contain binary indicator (yes/no, true/false, present/absent etc)
Transferring Frequency Tables: 1

If the table has only two dimensions (rows are categories of one variable, columns are categories of another):

• can feed the table straight in as a table: easy, but won’t have labelled output
• can feed the table in cell by cell (as for more complex tables): more complex, but allows for labelled output and other possibilities
Feeding table in as table
• Only have cells of table as data
• Can only run one procedure (correspondence analysis) via syntax.
Feeding table in cell by cell
• Have to use syntax (the DATA LIST command)

    data list free
      / slice row column frequency.
    begin data.
    1 1 1 287
    1 1 2 143
    1 2 1 94
    1 2 2 23
    end data.
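Producing the cell-by-cell layout from a printed table is mechanical. The sketch below, in Python rather than SPSS, shows the idea for a plain two-way table. It is an illustration only: the function name is hypothetical, and the 2 x 2 arrangement of the four counts is an assumption made for the example.

```python
def table_to_cells(table):
    """Yield (row, column, frequency) triples with 1-based category codes."""
    for r, row in enumerate(table, start=1):
        for c, freq in enumerate(row, start=1):
            yield r, c, freq


# Hypothetical 2 x 2 layout of the counts
counts = [[287, 143],
          [94, 23]]

# One line per cell, ready to paste between BEGIN DATA and END DATA
lines = [f"{r} {c} {f}" for r, c, f in table_to_cells(counts)]
print("\n".join(lines))
```

Each further dimension of the table (slice, block) adds one more index column in front of the frequency.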

    Data list FREE
      / EMS PMS GENDER MARSTAT FREQ.
    Weight by freq.
    Begin data.
    1 1 1 1 17
    1 1 1 2 4
    1 1 2 1 28
    1 1 2 2 11
    1 2 1 1 36
    1 2 1 2 4
    1 2 2 1 17
    1 2 2 2 4
    2 1 1 1 54
    2 1 1 2 25
    2 1 2 1 60
    2 1 2 2 42
    2 2 1 1 214
    2 2 1 2 322
    2 2 2 1 68
    2 2 2 2 130
    end data.
    Var labels
      EMS, 'Extramarital Sex' /
      PMS, 'Premarital Sex' /
      GENDER, 'Gender' /
      MARSTAT, 'Marital Status'.
    Value labels
      EMS, PMS, 1 'Yes' 2 'No' /
      GENDER, 1 'Women' 2 'Men' /
      MARSTAT, 1 'Divorced'
               2 'Still Married'.
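The effect of weighting by the frequency column can be checked by hand. The following Python sketch is an illustration, not SPSS output: it treats each data row as FREQ identical cases, then sums the frequencies to get the weighted number of cases and a weighted EMS by PMS margin. The variable names in the code are just for the example.

```python
from collections import Counter

# (EMS, PMS, GENDER, MARSTAT, FREQ) rows copied from the syntax above
cells = [
    (1, 1, 1, 1, 17), (1, 1, 1, 2, 4), (1, 1, 2, 1, 28), (1, 1, 2, 2, 11),
    (1, 2, 1, 1, 36), (1, 2, 1, 2, 4), (1, 2, 2, 1, 17), (1, 2, 2, 2, 4),
    (2, 1, 1, 1, 54), (2, 1, 1, 2, 25), (2, 1, 2, 1, 60), (2, 1, 2, 2, 42),
    (2, 2, 1, 1, 214), (2, 2, 1, 2, 322), (2, 2, 2, 1, 68), (2, 2, 2, 2, 130),
]

# Weighted number of cases: the sum of the cell frequencies
total = sum(freq for *_, freq in cells)

# Weighted two-way margin: EMS by PMS, summing FREQ over GENDER and MARSTAT
ems_by_pms = Counter()
for ems, pms, _gender, _marstat, freq in cells:
    ems_by_pms[(ems, pms)] += freq

print(total)
print(dict(ems_by_pms))
```

Without the weight, the same file would be read as 16 cases, one per cell.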

‘Traditional’ Quantitative Methods for Qualitative Data
• Miles & Huberman (1994): hierarchical cluster analysis
• Giegler & Klein (1994): correspondence analysis
• Bazeley (2002): cluster analysis, correspondence analysis
Cluster Analysis

Figure 9.11 (p. 203) from Graham Gibbs (2002), Qualitative Data Analysis: Explorations with NVivo, as an SPSS data file

Cluster Analysis: Solution I

Dendrogram using Average Linkage (Between Groups): Chi-square measure

[Dendrogram figure. Horizontal axis: Rescaled Distance Cluster Combine (0 to 25). Cases in plotted order: Youth Training (11), Redundancy Counselli (6), Start Up Business un (7), Training Access Poin (8), Workers Coops (9), Careers & Education (4), BCETA (2), Careers Information (5).]

Cluster Analysis: Solution II

Dendrogram using Average Linkage (Between Groups): Anderberg’s D Measure

[Dendrogram figure. Horizontal axis: Rescaled Distance Cluster Combine (0 to 25). Cases in plotted order: Careers & Education (4), Training Access Poin (8), BCETA (2), Start Up Business un (7), Careers Information (5), Youth Training (11), Workers Coops (9), Redundancy Counselli (6).]

Cluster Analysis: Solution III

[Dendrogram figure. Horizontal axis: Rescaled Distance Cluster Combine (0 to 25). Cases in plotted order: Careers & Education (4), Training Access Poin (8), Youth Training (11), Workers Coops (9), BCETA (2), Start Up Business un (7), Careers Information (5), Redundancy Counselli (6).]

Cluster Analysis
• Results vary according to the coefficient chosen as the measure of association between rows (or columns)
• Results vary according to the method of clustering
• Use with extreme caution
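The dependence on the coefficient is easy to demonstrate with indicator data. The Python sketch below is an illustration with made-up rows, not the data behind the dendrograms. It compares two common similarity coefficients for binary data, simple matching and Jaccard, which are stand-ins here and not the chi-square or Anderberg's D measures used in the solutions above.

```python
def simple_matching(x, y):
    """Share of positions where the two rows agree (joint absences count)."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def jaccard(x, y):
    """Joint presences divided by positions where either row has a presence."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either


# Hypothetical presence/absence codings for three cases
a = [1, 1, 1, 0, 0, 0, 0, 0]
b = [1, 1, 1, 1, 1, 1, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0]

print(simple_matching(a, b), simple_matching(a, c))
print(jaccard(a, b), jaccard(a, c))
```

Simple matching puts case a closer to c, while Jaccard puts it closer to b, so a clustering built on either coefficient would join different pairs first.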
Other Quantitative Methods
• Find weights for categories of variable that maximize relationships between variables
• correspondence analysis
• finds weights for categories of row and categories of column
• eg regression, principal components & others
Correspondence Analysis
• Similar to principal components
• Originally derived for tables of frequencies
• [for statistics to apply need one respondent per cell, but can be used with multiple responses across cells]
• but can be used with indicator data
• Can produce separate maps of relationships between categories of rows or columns
• Can produce a joint map of categories of rows and columns
Giegler & Klein
• Personal ads in a number of German magazines
• e.g. “Young man, 35 y, 176 cm, slim, with car, good income, looks for a lovely high-bosomed and well-developed partner for a common future.”

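Data: each column contains the number of instances for each coding category

| | CI | IM | AP | HEC | FB | CLC | SEX | BA | HIP | IV | SBE | SB | FO | HT | NAT | 30Y | 45Y | 60Y | OLD | PO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1001 | 2 | 2 | 1 | 0 | 1 | 3 | 0 | 1 | 0 | 1 | 2 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 2 |
| 1002 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |

i.e. each ad will appear a number of times in the cells of any table: the total frequency of the table is the number of codings, not the number of ads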
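The distinction between codings and ads can be made concrete with the two rows shown. This is a small Python sketch for illustration. It assumes that 1001 and 1002 are ad identifiers and that the remaining 20 values line up with the 20 category codes in the header.

```python
codes = ("CI IM AP HEC FB CLC SEX BA HIP IV SBE SB FO HT NAT "
         "30Y 45Y 60Y OLD PO").split()

# Counts per coding category, keyed by the (assumed) ad identifier
ads = {
    1001: [2, 2, 1, 0, 1, 3, 0, 1, 0, 1, 2, 0, 1, 1, 0, 1, 0, 0, 0, 2],
    1002: [2, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1],
}

# Every row must supply one count per category
assert all(len(row) == len(codes) for row in ads.values())

n_ads = len(ads)
n_codings = sum(sum(row) for row in ads.values())
print(n_ads, n_codings)
```

A frequency table built from these two ads alone would have a grand total equal to `n_codings`, which is why cell counts cannot be treated as independent respondents.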

Correspondence Analysis
• In SPSS one of the data reduction options (like factor analysis) as Correspondence Analysis [can be run as syntax or point-and-click]
• also a syntax-only option called ANACOR which is more limited but can analyse a table directly when the only data in the SPSS spreadsheet is the table frequencies.

ANACOR syntax example: Huber (1997) proportions table shown earlier

    * FREE indicates data values separated by spaces.
    * The variable list identifies the columns.
    data list free
      / A B C D E F G H J.
    begin data.
    0.11 0.07 0.09 0.13 0.15 0.16 0.15 0.12 0.12
    0.07 0.05 0.09 0.13 0.18 0.16 0.2 0.18 0.2
    0.5 0.46 0.43 0.34 0.33 0.28 0.26 0.25 0.2
    0.17 0.21 0.19 0.09 0.13 0.2 0.13 0.16 0.16
    0.07 0.09 0.11 0.14 0.1 0.07 0.13 0.16 0.13
    0.09 0.12 0.09 0.17 0.11 0.13 0.12 0.14 0.18
    end data.
    * Change data values from proportions to percentages.
    do repeat xs = A to J.
    compute xs = xs * 100.
    end repeat.
    * Simplest ANACOR syntax (just identifies numbers of rows & columns).
    ANACOR TABLE = ALL (6,9).

Correspondence Analysis

The point-and-click way

How many dimensions?

• Fit of Solution


• Five possible dimensions
• Singular value – square root of eigenvalue
• Inertia – eigenvalues (variance)
• Chi-square – could be partitioned between dimensions
• (only valid if cells in table are independent)
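Two of these quantities can be computed directly from a table without running the full analysis. The Python sketch below is an illustration, not SPSS output. It uses the Huber (1997) proportions expressed as percentages and relies on two standard facts: the number of possible dimensions is the smaller of (rows - 1) and (columns - 1), and the total inertia is the chi-square statistic divided by the table total. Splitting the inertia between dimensions needs a singular value decomposition, which is not attempted here.

```python
# Huber (1997) proportions as percentages: rows are activities, columns are sites
table = [
    [11, 7, 9, 13, 15, 16, 15, 12, 12],    # SIN
    [7, 5, 9, 13, 18, 16, 20, 18, 20],     # SGR
    [50, 46, 43, 34, 33, 28, 26, 25, 20],  # TCO
    [17, 21, 19, 9, 13, 20, 13, 16, 16],   # TIN
    [7, 9, 11, 14, 10, 7, 13, 16, 13],     # TGR
    [9, 12, 9, 17, 11, 13, 12, 14, 18],    # TOP
]

n = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]

# Pearson chi-square: squared deviation from independence over expected value
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_tot[i] * col_tot[j] / n
        chi2 += (observed - expected) ** 2 / expected

total_inertia = chi2 / n
max_dims = min(len(table), len(table[0])) - 1

print(max_dims, round(total_inertia, 4))
```

Because the cells here are percentages and not counts of independent respondents, the chi-square value serves only as a descriptive measure of departure from independence, in line with the caution on the slide.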

Details for Magazines

• Different ways of describing fit of each magazine
• Location in spatial representation
• Similar fit information

More Complex versions…
• Sometimes known as Multiple Correspondence Analysis
• HOMALS: HOMogeneity analysis by Alternating Least Squares
• For example
• The complete data structure of Giegler & Klein
Some other questions
• How well could we predict magazine usage from the other factors?
• Could use
• multinomial regression if cells independent (and sample size very large)
• categorical regression if just want to look at effects

A new issue: the kind of transformation to be chosen

Kinds of transformations
• Depends on what we want to assume
• Not inherent in the data
• Basic kinds
• Nominal - Categorical (unordered categories)
• Ordinal (assumes data are ordered)
• Numeric - Interval (assumes data are on a scale with equal intervals)
• Spline (smooths ordinal & nominal transformations)

Model Summary

Dependent Variable: MAGAZINE

Predictors: SEX CONCEPT CATEGORY

• Famous example
• (Not real)

Summary of a qualitative analysis of the characteristics of groups as postulated by Guttman from Bell & Sirjamaki (1962)

Category Quantifications
• Here the data were all treated as nominal
• Dimensions were quantification values
• Different quantifications for different dimensions
• Only possible for nominal data
• Other (ordinal, numeric) must have same quantification on each dimension. Nominal can also be similarly restricted.
For example: Using regression
• Make the group the dependent variable
• Other nominal variables cannot be multiple-nominal because regression coefficients are unidimensional
• Use other variables to predict group
• Artificial example: few cases and relatively many variables will give perfect prediction
• Can still compare prediction & evaluate categories
Principal Components: Demographics
• Age Group [treat as ordinal]
• Education Level [treat as ordinal]
• Marital Status [ nominal ]
• Work Status [ nominal – allow different quantifications for different dimensions]
Combining Qualitative & Quantitative Data
• The availability of numeric and other transformations makes the combining of quantitative & qualitative data simple
Combining Qualitative & Quantitative Data
• Use Categorical Regression setting measurement levels appropriately
• Use Categorical Principal Components setting measurement levels appropriately
• Save transformed variables and use ordinary regression or factor analysis for better options (eg hierarchical regression or factor rotation)
Combining Qualitative & Quantitative Data
• Preserve independence of sets of data
• Generalized (more than two sets) non-linear canonical variate analysis
• OVERALS
OVERALS
• A tool for relating sets of variables
• The variant that is a common statistical model is canonical variate analysis (producing a canonical correlation between two sets of variables)
• OVERALS
• Allows for more than two sets
• Allows variables to be numeric, categorical or ordinal
A current data set
• PhD project by Simone Pica
• People with psychosis featuring social withdrawal
• 19 young people suffering from psychosis with symptoms of social withdrawal
• Unstructured interviews
• Standard psychiatric measures also completed
Data
• Interviews transcribed, categories formed from content, coding made
• Diagnosis (DSM III-R)
• Scores on quantitative measures
• Scale for the Assessment of Negative Symptoms (SANS)
Raw material

Um, when I got home I thought it was probably a good thing I didn’t go because um, it sort of relates to motivation as well, I wasn’t really that motivated to go out and deal with people and stuff. If more of my friends were there, I’d probably would have gone, if it was a party and all my friends were there I would have thought cool you know, I’d have to go even if I only had a few dollars, that’s cool, I can go without drinks, cigarettes, I’d just want to be there you know but probably because there would have been only a couple of people I would have known there and the rest of them I wouldn’t have known. I sort of thought no, I wouldn’t have a good time because if I wanted to meet people, I like meeting people, but when I meet people I always have to talk about my psychosis, and whenever I have to talk about my psychosis, its like everyone is listening you know, and they all just stop what they are doing and they listen, “psychosis, what is that?” and then I have to explain everything about it and they are all listening type of thing, honing in type of thing.

Classified material
• 3. EXPERIENCED DIFFICULTY COMMUNICATING
• He couldn’t talk because he became jumbled, he couldn’t focus on one thing he kept thinking about whether his ex-friend was going to mention the letter to other people there
• He stayed in small groups of people throughout the evening in order to avoid saying something inappropriate that would draw attention to him
• When he felt comfortable he found it easier to talk
• He found that the comfortable feeling didn’t last, it wore off when the ‘wall’ came and he found it difficult to think of things to talk about
• When he was with the group of people he didn’t know what to talk to people about so he remained silent
• He didn’t know what to talk about because he couldn’t think of anything intelligent to say
• When he was with people and he didn’t know what to talk about his mind was blank, he didn’t think anything

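Fit of Solution

Summary of Analysis

| | Dimension 1 | Dimension 2 | Sum |
|---|---|---|---|
| Loss: Set 1 | .220 | .545 | .764 |
| Loss: Set 2 | .359 | .267 | .626 |
| Loss: Set 3 | .284 | .302 | .585 |
| Loss: Set 4 | .119 | .326 | .445 |
| Loss: Mean | .245 | .360 | .605 |
| Eigenvalue | .755 | .640 | |
| Fit | | | 1.395 |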

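Fit of Solution

Summary of Analysis

| | Dimension 1 | Dimension 2 | Sum | |
|---|---|---|---|---|
| Loss: PAS | .220 | .545 | .764 | 31.6% |
| Loss: SANS | .359 | .267 | .626 | 25.9% |
| Loss: Text | .284 | .302 | .585 | 24.1% |
| Loss: DSM | .119 | .326 | .445 | 18.4% |
| Loss: Mean | .245 | .360 | .605 (Loss) | 30% |
| Eigenvalue | .755 | .640 | 1.395 (Fit) | 70% |
| Total | 1.000 | 1.000 | 2.000 | 100% |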
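The arithmetic behind this summary can be reproduced from the per-set losses. The Python sketch below is an illustration, not OVERALS output. It uses the relationships the numbers on the slide imply: the eigenvalue of a dimension is 1 minus the mean loss across sets, the fit is the sum of the eigenvalues, and fit plus mean loss equals the number of dimensions. Small differences in the last digit come from the slide's rounded inputs.

```python
# Loss per set on each of the two dimensions, as printed on the slide
loss = {
    "PAS":  (0.220, 0.545),
    "SANS": (0.359, 0.267),
    "Text": (0.284, 0.302),
    "DSM":  (0.119, 0.326),
}
n_dims = 2

# Mean loss per dimension, averaged over the four sets
mean_loss = [sum(v[d] for v in loss.values()) / len(loss) for d in range(n_dims)]

# Eigenvalue per dimension and overall fit
eigenvalues = [1 - m for m in mean_loss]
fit = sum(eigenvalues)

# Each set's share of the summed loss, in percent
set_loss = {name: sum(v) for name, v in loss.items()}
share = {name: 100 * s / sum(set_loss.values()) for name, s in set_loss.items()}

print([round(e, 3) for e in eigenvalues], round(fit, 3))
print({name: round(pct, 1) for name, pct in share.items()})
print(round(100 * fit / n_dims))  # fit as a percentage of the maximum
```

Read this way, the set with the largest share of the loss (PAS) is the one least well represented by the two-dimensional solution.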

Some pointers for Optimal Scaling
• for SPSS optimal scaling
• CATREG & CATPCA have most sophisticated options
• CATREG produces standard regression output
• Both CATREG & CATPCA can
• save transformed variables (for repeating analysis in ordinary mode eg for rotating components)
• Eliminate need to specify range (unlike HOMALS & OVERALS which must have range 1 to n specified)
Some pointers for Optimal Scaling
• Cautions
• In general category quantifications only hold for the set of variables in the analysis
• (Incredibly) there is little published experience with these techniques
• Remember to use in exploratory mode
• Change transformations and see what happens
• Delete outlying variables/categories