Describing childhood diet with cluster analysis • Andrew Smith • Young Statisticians’ Meeting, 12th April 2011
Describing diet with cluster analysis • Pauline M. Emmett • P. Kirstin Newby • Kate Northstone • World Cancer Research Fund • MRC, Wellcome Trust, University of Bristol
Outline • Introductions • ALSPAC • Food frequency questionnaires • Dietary patterns • Cluster analysis • k-means cluster analysis • Results • 3 cluster solution • Associations with socio-demographic variables
ALSPAC • Avon Longitudinal Study of Parents and Children • Birth cohort study • 14,541 pregnant women and their children • www.bris.ac.uk/alspac
Dietary patterns • Examine diet as a whole • Analyse multivariate food frequency questionnaire (FFQ) data • Use correlations between foods • Principal component analysis (PCA) • Cluster analysis Image: Paul / FreeDigitalPhotos.net
Cluster analysis • Separate subjects into non-overlapping groups • Based on ‘distances’ between individuals • Unsupervised learning Image: Boaz Yiftach / FreeDigitalPhotos.net
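As a hedged illustration of the ‘distances’ between individuals, the sketch below computes Euclidean distances between children’s food-frequency profiles after z-scoring each food item so that no single item dominates. Euclidean distance on standardised items, the toy data and the function names are assumptions for illustration, not details taken from the talk.

```python
import math
from statistics import mean, pstdev

def standardise(rows):
    """Z-score each column (food item); assumes every column has non-zero spread."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two children's (standardised) FFQ profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy data: times per week each child eats (chips, fruit, vegetables).
ffq = [
    [5, 1, 2],
    [4, 2, 1],
    [0, 7, 6],
]
z = standardise(ffq)
# Children 0 and 1 have similar diets, so they should be closer than children 0 and 2.
print(round(euclidean(z[0], z[1]), 2), round(euclidean(z[0], z[2]), 2))
```

Subjects that are close under this distance are the ones a clustering method tends to put in the same group.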
k-means cluster analysis • The most widely used clustering method for dietary patterns • Number of clusters, k, is specified beforehand • Minimises • Squared (Euclidean) distance from each subject to his/her cluster mean • Summed over all subjects in that cluster • Summed over all clusters
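The quantity described above is the total within-cluster sum of squares, W = Σ over clusters c of Σ over subjects i in c of ||x_i − m_c||². A minimal sketch of the standard (Lloyd’s) algorithm follows: alternately assign each subject to its nearest cluster mean and recompute the means, stopping when no subject moves. Pure Python, random initial means and the toy data are assumptions for illustration; the talk does not say which software or initialisation was used.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans_once(data, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen subjects as starting means.
    Returns (labels, means, total within-cluster sum of squares)."""
    means = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each subject joins the cluster whose mean is nearest.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, means[j])) for p in data]
        if new_labels == labels:
            break  # no subject moved: a (possibly local) minimum has been reached
        labels = new_labels
        # Update step: recompute each mean; keep the old mean if a cluster empties.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                means[j] = centroid(members)
    wss = sum(sq_dist(p, means[lab]) for p, lab in zip(data, labels))
    return labels, means, wss

# Hypothetical toy data with two well-separated groups of subjects.
data = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
labels, means, wss = kmeans_once(data, 2, random.Random(1))
print(labels, means, wss)
```

Each step can only lower W, which is why the algorithm always stops, but not necessarily at the lowest possible W.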
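Problems with the standard algorithm • Short-sighted: each step only moves subjects to their currently nearest cluster mean • So it tends to stop at a local minimum that depends on the starting cluster means • Therefore run the algorithm 100 times from different random starts and keep the solution with the smallest total distance (the minimum of all the minima)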
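The restart strategy above can be sketched as follows, assuming random starting means drawn from the subjects themselves; the helper names and the toy data are hypothetical. Libraries such as scikit-learn expose the same idea through the `n_init` argument of `KMeans`.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(data, k, rng, max_iter=100):
    """One Lloyd's-algorithm run from k randomly chosen subjects; returns (labels, wss)."""
    means = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, means[j])) for p in data]
        if new_labels == labels:
            break  # converged, possibly to a local minimum
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                means[j] = [sum(c) / len(members) for c in zip(*members)]
    wss = sum(sq_dist(p, means[lab]) for p, lab in zip(data, labels))
    return labels, wss

def best_of_n(data, k, n_starts=100, seed=0):
    """Run k-means n_starts times from different random starts; keep the smallest WSS."""
    rng = random.Random(seed)
    return min((kmeans_once(data, k, rng) for _ in range(n_starts)), key=lambda r: r[1])

# Toy 1-D data with three obvious pairs. A single unlucky start can put two means
# inside one pair and get stuck (WSS = 101), but the best of 100 starts should not.
data = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]
labels, wss = best_of_n(data, 3)
print(labels, wss)
```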
Reliability of the cluster solution • Split sample in half • Perform separate analyses on each half • See how many children change clusters • Repeat 5 times • 32 out of 8,279 children changed cluster (0.4%)
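The slide does not say exactly how a change of cluster was counted. One plausible approach, sketched here purely as an assumption, is: cluster each half separately, assign every child to the nearest mean under each half’s solution, align the arbitrary cluster labels of the two solutions, and count the children whose labels disagree. The function names and the permutation-based label matching are hypothetical choices; the final line only checks the slide’s arithmetic for 32 of 8,279 children.

```python
from itertools import permutations

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, means):
    """Index of the cluster mean closest to profile p."""
    return min(range(len(means)), key=lambda j: sq_dist(p, means[j]))

def match_labels(means_a, means_b):
    """Map each cluster label of solution B to the closest label of solution A.
    Exhaustive search over permutations, which is cheap for small k such as 3."""
    k = len(means_a)
    best = min(permutations(range(k)),
               key=lambda perm: sum(sq_dist(means_a[i], means_b[perm[i]]) for i in range(k)))
    # best[i] is the B cluster paired with A cluster i; invert it to B label -> A label.
    return {b: a for a, b in enumerate(best)}

def count_changes(data, means_a, means_b):
    """Number of subjects whose (aligned) cluster differs between the two solutions."""
    b_to_a = match_labels(means_a, means_b)
    return sum(nearest(p, means_a) != b_to_a[nearest(p, means_b)] for p in data)

# Hypothetical 1-D example: only the child at 4.0 switches cluster.
data = [[0.0], [1.0], [4.0], [10.0], [11.0]]
print(count_changes(data, [[0.5], [10.5]], [[0.5], [6.0]]))
print(round(100 * 32 / 8279, 1))  # percentage of children who changed cluster
```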
Processed • 4,177 children Image: Suat Eman, Rawich, Master Isolated Images / FreeDigitalPhotos.net
Plant-based • 2,065 children Image: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net
Traditional British • 2,037 children Image: Suat Eman, Filomena Scalise, Maggie Smith / FreeDigitalPhotos.net
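The three cluster sizes add up to 8,279, the same number of children used in the reliability check, with about half of the children in the Processed cluster and roughly a quarter in each of the other two. A small sketch of that arithmetic, using only the counts reported on these slides:

```python
# Cluster sizes reported on the slides (7-year-old children).
sizes = {"Processed": 4177, "Plant-based": 2065, "Traditional British": 2037}
total = sum(sizes.values())
shares = {name: n / total for name, n in sizes.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```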
Summary • Multivariate methods to compress FFQ data into dietary patterns • k-means cluster analysis is widespread but must be applied carefully • Processed, Plant-based and Traditional British clusters in 7-year-old children • Associated with various socio-demographic variables