Quantitative Tools for Qualitative Data

Richard Bell
University of Melbourne

For copies of this presentation
• 130 slides
• (about 500 KB in a zipped file)
• email: rcb@unimelb.edu.au

What kind of Qualitative Data can be Analysed?
• Not raw continuous text data
• Discrete text units that are replicated
• Any kind of coding that has been made

What does the data have to look like?
• It must be able to be represented by a table
  • not necessarily a two-way table
  • for example, Giegler & Klein's coding of personal advertisements is a four-way table: magazine, sex, concept, category

Giegler & Klein data as a four-way table

Here the data is stored as a table: the first four columns define the cells and the last column gives the frequency in each cell. To analyse this data at the case level, use the SPSS WEIGHT BY command, i.e. WEIGHT BY FREQ.
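WEIGHT BY tells SPSS to treat each row as if it occurred FREQ times. A minimal Python sketch of that idea expands each cell record into that many case rows. The helper name `expand_cases` is hypothetical, the four records are the ones from the cell-by-cell data list example in these notes, and SPSS itself weights rows rather than physically duplicating them.

```python
def expand_cases(records):
    """Expand (cell index..., frequency) records into one tuple per case."""
    cases = []
    for *cell, freq in records:
        cases.extend([tuple(cell)] * freq)  # repeat the cell once per counted case
    return cases

# Cell records: (cell indices..., frequency)
CELLS = [(1, 1, 1, 287), (1, 1, 2, 143), (1, 2, 1, 94), (1, 2, 2, 23)]
cases = expand_cases(CELLS)
print(len(cases))
```

Expanding is only practical for small tables; weighting gives the same counts without the extra rows.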

Kinds of tables
• Rows are participants, columns are categories
• Rows are categories, columns are participants
• Rows are one set of categories, columns are another set of categories

Data in cells of table
• Indicator of the presence/absence of a relationship between rows and columns
• Frequencies or counts of indicators
• Values of categories
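The first two kinds of cell data are linked: summing a participants-by-categories indicator table down its columns gives category frequencies. A minimal Python sketch, assuming hypothetical codings for three participants; `indicator_table` and `category_frequencies` are illustrative names, not functions of any package.

```python
# Hypothetical codings: the categories each participant's transcript was coded at
CODED = {
    "P1": {"cost", "access"},
    "P2": {"access"},
    "P3": {"cost", "quality", "access"},
}

def indicator_table(coded):
    """Participants x categories table of 1 (coded) / 0 (not coded)."""
    categories = sorted(set().union(*coded.values()))
    rows = {p: [1 if c in cats else 0 for c in categories] for p, cats in coded.items()}
    return categories, rows

def category_frequencies(categories, rows):
    """Column sums: how many participants were coded at each category."""
    return {c: sum(r[j] for r in rows.values()) for j, c in enumerate(categories)}

categories, rows = indicator_table(CODED)
print(category_frequencies(categories, rows))
```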

Indicator of presence/absence of a relationship between rows and columns

Proportions of Activities by Site (Frequencies)
Data from Huber (1997)

Activity  A     B     C     D     E     F     G     H     I
SIN       0.11  0.07  0.09  0.13  0.15  0.16  0.15  0.12  0.12
SGR       0.07  0.05  0.09  0.13  0.18  0.16  0.20  0.18  0.20
TCO       0.50  0.46  0.43  0.34  0.33  0.28  0.26  0.25  0.20
TIN       0.17  0.21  0.19  0.09  0.13  0.20  0.13  0.16  0.16
TGR       0.07  0.09  0.11  0.14  0.10  0.07  0.13  0.16  0.13
TOP       0.09  0.12  0.09  0.17  0.11  0.13  0.12  0.14  0.18

• SIN: Students learn as self-regulated individuals
• SGR: Students learn in autonomous groups
• TCO: Teacher is in control
• TIN: Teacher dominates, but allows some individual autonomy
• TGR: Teacher dominates, but allows some small group autonomy
• TOP: Teacher dominates, but is open to students' initiatives

Values of categories

Getting data into statistical packages such as SPSS
• Transfer data directly from qualitative packages such as NVivo
• Use the SPSS Text Import Wizard (best with precoded data, i.e. numbers)
• Enter data by hand

Transfer data from qualitative packages
• Need to be able to export tables
• Should only be done for tables where rows (or columns) are units of analysis (i.e. documents or respondents)
• Tables should be saved as a text file (i.e. with the extension .txt, as in table1.txt)
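Before importing an exported table it can help to check that each row really is one unit of analysis. A minimal Python sketch, assuming the export is tab-delimited with a header row and the unit (document or respondent) in the first column; the real layout depends on the qualitative package, and `read_exported_table` and the sample contents are hypothetical.

```python
import csv
import io

def read_exported_table(handle, delimiter="\t"):
    """Read a table whose first column names the unit of analysis."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    unit_field = reader.fieldnames[0]
    table = {}
    for record in reader:
        unit = record.pop(unit_field)
        if unit in table:  # each row must be a distinct document or respondent
            raise ValueError(f"unit {unit!r} appears twice; rows must be units of analysis")
        table[unit] = {name: int(value) for name, value in record.items()}
    return table

# Hypothetical export, standing in for a file such as table1.txt
SAMPLE = "Document\tCost\tAccess\nInterview 1\t3\t0\nInterview 2\t1\t2\n"
table = read_exported_table(io.StringIO(SAMPLE))
print(table)
```

For a real file, pass `open("table1.txt", newline="")` instead of the `StringIO` object.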

Transfer data from printed table
• Type into SPSS
• Transfer table from Word document
  • Word document to Excel spreadsheet
  • Excel spreadsheet to SPSS spreadsheet

The Table in Word
1. Remove the heading row

2. Move subheadings into a column

Insert a new column into the table

Copy each subheading into the empty cells of the new column for the rows it applies to

Shorten the text and insert column headings that will become SPSS variable names (i.e. fewer than 9 characters, no spaces)

Select the table and copy it to the clipboard
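With many columns, deriving the short names by rule avoids typos and clashes. A minimal Python sketch, assuming the rule stated above (at most 8 characters, no spaces) plus two assumptions of mine: a name must start with a letter, and names are compared case-insensitively. `make_variable_names` is a hypothetical helper, not part of SPSS.

```python
import re

def make_variable_names(headings, max_len=8):
    """Derive short, unique, space-free names from column headings."""
    names, taken = [], set()
    for heading in headings:
        base = re.sub(r"[^A-Za-z0-9_]", "", heading)  # drop spaces and punctuation
        if not base or not base[0].isalpha():
            base = "v" + base                           # assumption: must start with a letter
        base = base[:max_len]
        name, n = base, 1
        while name.lower() in taken:                    # assumption: case-insensitive names
            suffix = str(n)
            name = base[: max_len - len(suffix)] + suffix
            n += 1
        names.append(name)
        taken.add(name.lower())
    return names

print(make_variable_names(["Type of programme", "Type of provider", "2nd phase"]))
```

Generated names are a starting point; meaningful hand-chosen abbreviations usually read better in output.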

Open Excel and paste from the clipboard

Save the spreadsheet

Open SPSS and, under the File menu, choose Open > Data

Change the file type to Excel (*.xls) and open the saved Excel spreadsheet

If the first row of the Excel spreadsheet has names as column headings, SPSS can read them as its variable names

SPSS opens the file (the variable view)

The data view: notice there are dud lines in this file; they need to be edited out

The file, fixed up

Now we need to change
a) the repeated phrases (variable ‘type’) and
b) the symbols (variables p1 to p8)
into numbers. Do this through Automatic Recode under the ‘Transform’ menu.

We need to create a numeric variable into which the values of the alphanumeric variable are transformed (the original alphanumeric values are saved as value labels)
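What Automatic Recode does can be sketched in a few lines: sort the distinct string values, number them 1, 2, 3, ..., and keep the strings as value labels. A minimal Python sketch, assuming ascending order (Python's default string ordering, which may not match SPSS for mixed case) and one shared scheme across several variables, as would suit the symbols in p1 to p8; my understanding is that AUTORECODE's /GROUP subcommand does the latter, but check the SPSS documentation. `autorecode` here is a hypothetical function.

```python
def autorecode(*columns):
    """Recode string columns to 1..k using one shared, sorted scheme."""
    labels = sorted({value for column in columns for value in column})
    code_of = {label: code for code, label in enumerate(labels, start=1)}
    recoded = [[code_of[value] for value in column] for column in columns]
    value_labels = {code: label for label, code in code_of.items()}
    return recoded, value_labels

# Hypothetical symbol columns, e.g. p1 and p2
recoded, value_labels = autorecode(["+", "-", "+"], ["0", "+"])
print(recoded, value_labels)
```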

Transferring Cross-Category tables into SPSS
[where rows are one set of categories and columns are another set of categories]
• Three types of table:
  • Cells of the table contain frequencies
  • Cells of the table contain other data
  • Cells of the table contain a binary indicator (yes/no, true/false, present/absent etc.)

Transferring Frequency Tables: 1
If the table has only two dimensions (rows are categories of one variable, columns are categories of another), you can either
• feed the table straight in as a table
  • easy, but the output won't be labelled
• feed the table in cell by cell (as for more complex tables)
  • more complex, but allows labelled output and other possibilities

Feeding table in as table
• Only the cells of the table are in the data
• Only one procedure (correspondence analysis) can be run, and only via syntax

Feeding table in cell by cell
• Have to use syntax (the DATA LIST command), e.g.

data list free / slice row column frequency.
begin data.
1 1 1 287
1 1 2 143
1 2 1 94
1 2 2 23
end data.

Data list FREE / EMS PMS GENDER MARSTAT FREQ.
Begin data.
1 1 1 1 17
1 1 1 2 4
1 1 2 1 28
1 1 2 2 11
1 2 1 1 36
1 2 1 2 4
1 2 2 1 17
1 2 2 2 4
2 1 1 1 54
2 1 1 2 25
2 1 2 1 60
2 1 2 2 42
2 2 1 1 214
2 2 1 2 322
2 2 2 1 68
2 2 2 2 130
End data.
Weight by freq.
Variable labels EMS 'Extramarital Sex' / PMS 'Premarital Sex' / GENDER 'Gender' / MARSTAT 'Marital Status'.
Value labels EMS PMS 1 'Yes' 2 'No' / GENDER 1 'Women' 2 'Men' / MARSTAT 1 'Divorced' 2 'Still Married'.
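Entering a table cell by cell means any smaller table can be recovered by summing frequencies over the variables left out, which is what a weighted crosstabulation reports. A minimal Python sketch using the sixteen cell records above; `collapse` is a hypothetical helper, not an SPSS procedure.

```python
from collections import Counter

# Cell records (EMS, PMS, GENDER, MARSTAT, FREQ) from the data list above
RECORDS = [
    (1, 1, 1, 1, 17), (1, 1, 1, 2, 4), (1, 1, 2, 1, 28), (1, 1, 2, 2, 11),
    (1, 2, 1, 1, 36), (1, 2, 1, 2, 4), (1, 2, 2, 1, 17), (1, 2, 2, 2, 4),
    (2, 1, 1, 1, 54), (2, 1, 1, 2, 25), (2, 1, 2, 1, 60), (2, 1, 2, 2, 42),
    (2, 2, 1, 1, 214), (2, 2, 1, 2, 322), (2, 2, 2, 1, 68), (2, 2, 2, 2, 130),
]
VARIABLES = ("EMS", "PMS", "GENDER", "MARSTAT")

def collapse(records, keep):
    """Sum frequencies over every variable not named in keep."""
    idx = [VARIABLES.index(name) for name in keep]
    table = Counter()
    for *cell, freq in records:
        table[tuple(cell[i] for i in idx)] += freq
    return dict(table)

print(collapse(RECORDS, ("EMS", "MARSTAT")))
```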

‘Traditional’ Quantitative Methods for Qualitative Data
• Miles & Huberman (1994)
  • hierarchical cluster analysis
• Giegler & Klein (1994)
  • correspondence analysis
• Bazeley (2002)
  • cluster analysis
  • correspondence analysis

Cluster Analysis
Figure 9.11 (p. 203) from Graham Gibbs (2002), Qualitative Data Analysis: Explorations with NVivo, entered as an SPSS data file

Cluster Analysis: Solution I
[Dendrogram using Average Linkage (Between Groups), chi-square measure; rescaled distance cluster combine, 0 to 25. Cases in dendrogram order: Worklink (10), Youth Training (11), Adult training (1), Redundancy Counselli (6), Start Up Business un (7), Training Access Poin (8), Workers Coops (9), Business Access Sche (3), Careers & Education (4), BCETA (2), Careers Information (5).]
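The chi-square measure used for Solution I compares two rows of counts through the chi-square statistic of the 2 × k table they form. A minimal Python sketch, assuming the measure is the square root of that statistic (my understanding of SPSS's CHISQ option; check the PROXIMITIES documentation); `chisq_distance` is a hypothetical name.

```python
import math

def chisq_distance(x, y):
    """Square root of the chi-square statistic for the 2 x k table with rows x and y."""
    col_totals = [a + b for a, b in zip(x, y)]
    n = sum(col_totals)
    stat = 0.0
    for row in (x, y):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            if expected > 0:  # empty rows/columns contribute nothing
                stat += (observed - expected) ** 2 / expected
    return math.sqrt(stat)

print(chisq_distance([4, 0, 2], [1, 3, 2]))
```

Doubling every count in both rows scales the statistic, so the distance grows with the total frequencies even when the profiles are unchanged; that is one reason the choice of coefficient matters.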

Cluster Analysis: Solution II
[Dendrogram using Average Linkage (Between Groups), Anderberg’s D measure; rescaled distance cluster combine, 0 to 25. Cases in dendrogram order: Careers & Education (4), Training Access Poin (8), BCETA (2), Start Up Business un (7), Careers Information (5), Adult training (1), Worklink (10), Youth Training (11), Workers Coops (9), Redundancy Counselli (6), Business Access Sche (3).]

Cluster Analysis: Solution III
[Dendrogram using Single Linkage; rescaled distance cluster combine, 0 to 25. Cases in dendrogram order: Careers & Education (4), Training Access Poin (8), Worklink (10), Youth Training (11), Workers Coops (9), Business Access Sche (3), BCETA (2), Start Up Business un (7), Careers Information (5), Adult training (1), Redundancy Counselli (6).]

Cluster Analysis
• Results vary according to the coefficient chosen as the measure of association between rows (or columns)
• Results vary according to the method of clustering
• Use with extreme caution
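Even with the distance held fixed, the linkage method alone can change the clusters. A minimal Python sketch of agglomerative clustering on six hypothetical one-dimensional scores (plain absolute differences, not the chi-square measure), comparing single, between-groups average and complete linkage; `agglomerate` and `linkage_distance` are illustrative names.

```python
from itertools import combinations

def linkage_distance(a, b, points, method):
    """Distance between clusters a and b (lists of point indices)."""
    gaps = [abs(points[i] - points[j]) for i in a for j in b]
    if method == "single":      # nearest pair
        return min(gaps)
    if method == "complete":    # farthest pair
        return max(gaps)
    if method == "average":     # mean over all between-group pairs
        return sum(gaps) / len(gaps)
    raise ValueError(f"unknown method: {method}")

def agglomerate(points, method, n_clusters):
    """Repeatedly merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], points, method),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)

# Hypothetical scores for six cases
POINTS = [0.0, 1.0, 2.1, 3.3, 4.6, 7.0]
for method in ("single", "average", "complete"):
    print(method, agglomerate(POINTS, method, 2))
```

Single and average linkage split off only the last case, while complete linkage splits the cases four and two: same data, different "findings".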

Other Quantitative Methods
• Find weights for the categories of variables that maximize the relationships between variables
  • correspondence analysis
    • finds weights for the categories of rows and the categories of columns
  • also traditional least-squares procedures
    • e.g. regression, principal components & others

Correspondence Analysis
• Similar to principal components
• Originally derived for tables of frequencies
  • [for the statistics to apply, each respondent should appear in only one cell, but it can be used with multiple responses across cells]
  • but can also be used with indicator data
• Can produce separate maps of the relationships among row categories or among column categories
• Can produce a joint map of row and column categories
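The resemblance to principal components comes from the textbook formulation: a singular value decomposition of the table's standardized residuals, whose squared singular values (principal inertias) sum to chi-square divided by the table total. A minimal Python sketch with NumPy, applied to the Huber (1997) table above; since those cells are proportions, each site gets roughly equal mass. This is not necessarily how SPSS computes it, and SPSS's default normalization may rescale or flip the coordinates.

```python
import numpy as np

def correspondence_analysis(table):
    """Simple correspondence analysis of a two-way table via the SVD."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                        # principal inertias, largest first
    rows = U * sv / np.sqrt(r)[:, None]      # row principal coordinates
    cols = Vt.T * sv / np.sqrt(c)[:, None]   # column principal coordinates
    return inertia, rows, cols

# Activities (rows) by sites A to I (columns), Huber (1997)
ACTIVITIES = ["SIN", "SGR", "TCO", "TIN", "TGR", "TOP"]
HUBER = [
    [0.11, 0.07, 0.09, 0.13, 0.15, 0.16, 0.15, 0.12, 0.12],
    [0.07, 0.05, 0.09, 0.13, 0.18, 0.16, 0.20, 0.18, 0.20],
    [0.50, 0.46, 0.43, 0.34, 0.33, 0.28, 0.26, 0.25, 0.20],
    [0.17, 0.21, 0.19, 0.09, 0.13, 0.20, 0.13, 0.16, 0.16],
    [0.07, 0.09, 0.11, 0.14, 0.10, 0.07, 0.13, 0.16, 0.13],
    [0.09, 0.12, 0.09, 0.17, 0.11, 0.13, 0.12, 0.14, 0.18],
]
inertia, rows, cols = correspondence_analysis(HUBER)
print("share of inertia:", np.round(inertia / inertia.sum(), 3))
for name, (d1, d2) in zip(ACTIVITIES, rows[:, :2]):
    print(f"{name}: {d1:+.3f} {d2:+.3f}")
```

Plotting the first two columns of `rows` and `cols` on the same axes gives the joint map.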

Giegler & Klein
• Examined personal advertisements
  • in a number of German magazines
  • e.g. "Young man, 35 y, 176cm, slim with car, good income, looks for a lovely high-bosomed and well-developed partner for a common future."

Data: one row per ad; each column contains the number of instances of each coding category

Ad    CI IM AP HEC FB CLC SEX BA HIP IV SBE SB FO HT NAT 30Y 45Y 60Y OLD PO
1001  2  2  1  0   1  3   0   1  0   1  2   0  1  1  0   1   0   0   0   2
1002  2  1  0  0   1  0   0   0  0   1  1   1  0  1  1   0   0   1   0   1

i.e. each ad will appear a number of times in the cells of any table: the total frequency of the table is the number of codings, not the number of ads
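The difference between codings and ads is easy to see with the two rows above: summing the counts gives codings, while counting the ads with a nonzero entry gives ads. A minimal Python sketch; `coding_totals` and `ads_mentioning` are hypothetical helpers.

```python
CATEGORIES = ["CI", "IM", "AP", "HEC", "FB", "CLC", "SEX", "BA", "HIP", "IV",
              "SBE", "SB", "FO", "HT", "NAT", "30Y", "45Y", "60Y", "OLD", "PO"]
ADS = {
    1001: [2, 2, 1, 0, 1, 3, 0, 1, 0, 1, 2, 0, 1, 1, 0, 1, 0, 0, 0, 2],
    1002: [2, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1],
}

def coding_totals(ads):
    """Total codings per category: what a frequency table of these data counts."""
    return {cat: sum(counts[j] for counts in ads.values()) for j, cat in enumerate(CATEGORIES)}

def ads_mentioning(ads):
    """Number of ads coded at least once in each category."""
    return {cat: sum(1 for counts in ads.values() if counts[j] > 0) for j, cat in enumerate(CATEGORIES)}

totals = coding_totals(ADS)
print("ads:", len(ADS), "codings:", sum(totals.values()))
```

Here two ads yield 29 codings, so a chi-square computed on the coded table treats 29 observations as independent when only two ads were sampled.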

Cut-down version of Giegler & Klein example

Correspondence Analysis
• In SPSS, Correspondence Analysis is one of the Data Reduction options (like Factor Analysis) [it can be run from syntax or by point-and-click]
• There is also a syntax-only procedure, ANACOR, which is more limited but can analyse a table directly when the only data in the SPSS spreadsheet are the table frequencies
