Steps to Performing a Cluster Analysis


Presentation Transcript


  1. Steps to Performing a Cluster Analysis Rod Funk Chestnut Health Systems Bloomington, IL

  2. Performing a Cluster Analysis
     • The first step is deciding which variables you want to cluster on.
     • Data can be continuous, counts, or dichotomous.
     • Decide whether the variables are at one time point or whether you want to look at trajectories across time.
     • If across time, the data need to be in horizontal (wide) format: one row per adolescent.
     • We name variables by time with a suffix for the wave: _0 for intake, _3 for 3 months, and so on (e.g., dcs_0, dcs_3, dcs_6).
     • Cluster analysis also expects data for every variable used in the analysis. If a record is missing even one of these variables, no cluster is calculated for that record.
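
A minimal Python sketch of the two data requirements above, assuming long-format input records with hypothetical `id` and `wave` fields: pivot to one row per adolescent with `_<wave>` suffixes, then keep only complete records, mirroring how a record missing any clustering variable gets no cluster. This only illustrates the layout; in SPSS the restructure itself would typically be done with a command such as CASESTOVARS.

```python
# Hypothetical sketch: field names "id", "wave", and the sample values are
# illustrative only, not taken from the presentation.


def to_wide(long_rows, id_key, wave_key, variables):
    """Pivot long rows (one per person per wave) into one dict per person,
    with columns named <variable>_<wave>, e.g. dcs_0, dcs_3."""
    wide = {}
    for row in long_rows:
        rec = wide.setdefault(row[id_key], {id_key: row[id_key]})
        for var in variables:
            rec[f"{var}_{row[wave_key]}"] = row.get(var)
    return list(wide.values())


def complete_cases(rows, columns):
    """Keep only rows with a value (not None) on every clustering column,
    mirroring how a record missing any one variable gets no cluster."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


long_rows = [
    {"id": 1, "wave": 0, "dcs": 10},
    {"id": 1, "wave": 3, "dcs": 8},
    {"id": 2, "wave": 0, "dcs": 12},
    {"id": 2, "wave": 3, "dcs": None},  # missing at 3 months
]
wide = to_wide(long_rows, "id", "wave", ["dcs"])
usable = complete_cases(wide, ["dcs_0", "dcs_3"])
print(wide)    # two rows, one per adolescent
print(usable)  # only id 1 survives
```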

  3. Handling Missing Data
     • Scale level: when creating a scale that has shown good internal consistency (alpha > .7), we calculate the score from the average of the answers, as long as there are at least 3 valid answers:
         Compute dcs=rnd(mean.3(l3a15d,l3a16d,l3a17d,l3a18d,l3a19d)*5).
     • Item level: random replacement of missing values:
         sort cases by loc xchk1.
         rmv ms2w=median(s2w,2).
         compute ms2w=rnd(ms2w).
     • This replaces a missing s2w with the median of the 4 surrounding cases (2 above and 2 below in the sort order), rounded to a whole number.
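
A Python sketch mirroring the two SPSS steps above (an illustration, not the author's code): `mean_min_valid` imitates `MEAN.3` (the mean of the valid answers, or missing when fewer than 3 are valid), `rnd` rounds halves away from zero as SPSS `RND` does, and `rmv_median` imitates `rmv ms2w=median(s2w,2)` by taking the median of 2 valid values above and 2 below each gap in the current sort order. Leaving a gap unfilled when either side has fewer than 2 valid neighbours is this sketch's own assumption; check SPSS's edge-case handling before relying on it.

```python
import math
import statistics


def rnd(x):
    """Round to the nearest integer, halves away from zero (like SPSS RND)."""
    if x is None:
        return None
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def mean_min_valid(values, min_valid=3):
    """Like SPSS MEAN.3: mean of the non-missing values, or None (missing)
    when fewer than min_valid values are present."""
    valid = [v for v in values if v is not None]
    return statistics.mean(valid) if len(valid) >= min_valid else None


def scale_score(items, min_valid=3):
    """dcs-style score: item mean times item count (a prorated sum), rounded."""
    m = mean_min_valid(items, min_valid)
    return None if m is None else rnd(m * len(items))


def rmv_median(series, span=2):
    """Fill each None with the median of the nearest `span` valid values above
    and `span` valid values below it, in the current (sorted) order.
    Sketch assumption: a gap without `span` valid values on both sides stays None.
    Neighbours always come from the original series, never from filled values."""
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        above = [x for x in reversed(series[:i]) if x is not None][:span]
        below = [x for x in series[i + 1:] if x is not None][:span]
        if len(above) == span and len(below) == span:
            out[i] = statistics.median(above + below)
    return out


print(scale_score([4, None, 3, 5, 4]))        # 4 valid answers: mean 4, times 5
print(scale_score([4, None, None, None, 2]))  # only 2 valid answers: None
filled = rmv_median([3, 5, None, 4, 6, 2])
print(filled)          # gap filled with the median of 5, 3, 4, 6
print(rnd(filled[2]))  # then rounded, as in compute ms2w=rnd(ms2w)
```

Because the median of 4 values is the average of the middle two, it can fall on a half, which is why the rounding step follows the replacement.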

  4. Handling Missing Data
     • Replacement of variables across time
     • For scales whose items were not asked at a wave: use regression to estimate the scale from the other items in the cluster at that wave, along with the intake and last-wave values.
     • For a missing wave of data: as long as it is not the first or last wave, interpolate using the average of the two surrounding waves.
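
The wave-interpolation rule above can be sketched in Python as follows, assuming one list of values per adolescent ordered by wave, with `None` marking a missing wave. Filling only isolated gaps (both neighbours observed) is an assumption of this sketch; the regression-based replacement for unasked items is not shown.

```python
def interpolate_interior_waves(values):
    """Fill a missing interior wave (None) with the average of the waves just
    before and after it. The first and last waves are never filled.
    Sketch assumption: a gap is filled only when both neighbours are observed,
    so runs of two or more missing waves stay missing."""
    out = list(values)
    for i in range(1, len(values) - 1):
        before, after = values[i - 1], values[i + 1]
        if values[i] is None and before is not None and after is not None:
            out[i] = (before + after) / 2
    return out


# Hypothetical scores for waves 0, 3, 6, 9, 12, 30 months.
print(interpolate_interior_waves([10, None, 6, 5, None, 4]))
# [10, 8.0, 6, 5, 4.5, 4]
```

The waves in this example are not evenly spaced: the gap between 12 and 30 months is much longer than the others, so a simple average for a missing 12-month value weights the 30-month value more heavily than linear interpolation in time would. The sketch follows the slide's rule, the simple average.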

  5. Running the Cluster Analysis
     • Sample syntax:
         CLUSTER Zpci_0 Zrpci_3 Zrpci_6 Zrpci_9 Zrpci_12 Zpci_30
           Zici_0 Zrici_3 Zrici_6 Zrici_9 Zrici_12 ZSco01
           Zmdci_0 Zrdci_3 Zrdci_6 Zrdci_9 Zrdci_12 ZSco02
           Zl3v_0 Zrl3d_3 Zrl3d_6 Zrl3d_9 Zrl3d_12 Zl3d_30
           Zl3w_0 Zrl3e_3 Zrl3e_6 Zrl3e_9 Zrl3e_12 Zl3e_30
           Zmaxce_0 Zrmaxce_3 Zrmaxce_6 Zrmaxce_9 Zrmaxce_12 Zmaxce_30
          /METHOD WARD
          /MEASURE= SEUCLID
          /PRINT SCHEDULE
          /PLOTS NONE
          /SAVE CLUSTER(2,12) .
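
To show what the CLUSTER command above asks for, here is a small pure-Python sketch: Ward's agglomerative method on squared Euclidean distance over standardized inputs (the Z prefix on the variable names suggests z-scores, such as those DESCRIPTIVES /SAVE creates), saving cluster membership for every solution in a range, as /SAVE CLUSTER(2,12) does. It is a naive illustration for small data, not SPSS's implementation, and its cluster numbering may not match SPSS's; the data and function names are hypothetical.

```python
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample SD 1 (the Z-prefixed inputs).
    Assumes no column is constant."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def label_rows(clusters, n):
    """Number clusters 1..k in order of their lowest row index."""
    labels = [0] * n
    ordered = sorted(clusters, key=lambda c: min(c[0]))
    for lab, (members, _) in enumerate(ordered, start=1):
        for i in members:
            labels[i] = lab
    return labels


def ward_memberships(rows, k_min=2, k_max=12):
    """Agglomerative Ward clustering on squared Euclidean distance.
    Returns {k: membership list} for each k in k_min..k_max."""
    # Each cluster is (member row indices, centroid).
    clusters = [([i], list(row)) for i, row in enumerate(rows)]
    saved = {}
    while True:
        k = len(clusters)
        if k_min <= k <= k_max:
            saved[k] = label_rows(clusters, len(rows))
        if k == 1:
            break
        best = None
        for a in range(k):
            for b in range(a + 1, k):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return saved


# Hypothetical data: two well-separated groups of three records.
rows = [[1, 1], [1.2, 0.9], [0.8, 1.1], [5, 5], [5.1, 4.9], [4.9, 5.2]]
saved = ward_memberships(zscore_columns(rows), k_min=2, k_max=4)
print(saved[2])  # [1, 1, 1, 2, 2, 2]
```

Each merge picks the pair of clusters whose union adds the least to the within-cluster sum of squares, which is the Ward criterion; the cost of each merge is what the agglomeration schedule (/PRINT SCHEDULE) reports in some scaled form.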

  6. Demonstration
     • Purpose
     • To show how to take the results of the cluster analysis and create tables and figures for validating and deciding on the proper number of clusters.
     • Covers pivot tables in SPSS output, pasting into Excel, and graphing in Excel.
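
One way to build such a validation table outside SPSS is sketched below in Python, assuming you already have each record's values and its saved membership for one solution: cluster sizes and per-cluster means, emitted as tab-separated text, which pastes into Excel as separate columns. The variable names and values in the example are hypothetical.

```python
import statistics


def cluster_profile(rows, labels, names):
    """For one saved solution, return {cluster: {"n": size, <name>: mean}}."""
    table = {}
    for lab in sorted(set(labels)):
        members = [row for row, row_lab in zip(rows, labels) if row_lab == lab]
        entry = {"n": len(members)}
        for j, name in enumerate(names):
            entry[name] = statistics.mean(row[j] for row in members)
        table[lab] = entry
    return table


def to_tsv(table, names):
    """Tab-separated text: one header line, then one line per cluster."""
    lines = ["\t".join(["cluster", "n"] + names)]
    for lab, entry in table.items():
        cells = [lab, entry["n"]] + [entry[name] for name in names]
        lines.append("\t".join(str(c) for c in cells))
    return "\n".join(lines)


# Hypothetical values for two variables and a 2-cluster solution.
names = ["pci_0", "pci_3"]
rows = [[10, 2], [12, 4], [30, 8], [28, 6]]
labels = [1, 1, 2, 2]
profile = cluster_profile(rows, labels, names)
tsv = to_tsv(profile, names)
print(tsv)
```

Running it once per saved solution (2 through 12 clusters) gives a set of tables whose sizes and means can be compared side by side when choosing the number of clusters.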
