Visualizing CDC June 2008 data with Hamming and Euclidean distance metrics to cluster samples based on spoligotypes, MIRU, CDC families, and SpolDB3 families. Explore relationships between CDC families, Mycobacterium strains, and distance measures.
Visualization of CDC June 2008 data using various distance metrics
Distance metrics
• Hamming distance between spoligotypes
• Euclidean distance between MIRU patterns
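The two metrics can be illustrated with a minimal sketch. It assumes a spoligotype is stored as a 43-character binary string (presence or absence of each spacer) and a MIRU pattern as a list of 12 repeat counts; the slides do not state the encoding, and the function names and sample values below are hypothetical.

```python
import math


def hamming_distance(spol_a: str, spol_b: str) -> int:
    """Count the positions at which two equal-length spoligotypes differ."""
    if len(spol_a) != len(spol_b):
        raise ValueError("spoligotypes must have the same length")
    return sum(x != y for x, y in zip(spol_a, spol_b))


def euclidean_distance(miru_a, miru_b) -> float:
    """Euclidean distance between two MIRU repeat-count vectors."""
    if len(miru_a) != len(miru_b):
        raise ValueError("MIRU patterns must have the same number of loci")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(miru_a, miru_b)))


# Made-up values for illustration only (not taken from the CDC data).
spol_a = "1" * 43               # all 43 spacers present
spol_b = "1" * 40 + "000"       # last three spacers absent
miru_a = [2, 2, 3, 3, 2, 5, 1, 5, 3, 3, 2, 2]
miru_b = [2, 2, 3, 3, 2, 5, 1, 5, 3, 3, 2, 4]

print(hamming_distance(spol_a, spol_b))    # 3
print(euclidean_distance(miru_a, miru_b))  # 2.0
```

Both functions reject inputs of unequal length, since neither distance is defined for patterns with a different number of spacers or loci.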
CDC June 2008 data
• 9337 samples
• Each sample has a spoligotype, a MIRU pattern, a CDC family, and a SpolDB3 family.
• We clustered the data using the Hidden Parent Assumption and the Euclidean distance between MIRU patterns to determine the distance between samples.
CDC June 2008 data: CDC families
• Indo-Oceanic
• East-African Indian
• Euro-American
• Mycobacterium bovis
• East Asian
• Mycobacterium africanum
MIRU Distance vs. Hamming Distance for all connectors (edges)
MIRU Distance vs. Hamming Distance for single-linked connectors (edges)
Hamming Distance vs. Number of Sample Pairs for all connectors (edges) with MIRU Distance = 0
Hamming Distance vs. Number of Sample Pairs for single-linked connectors (edges) with MIRU Distance = 0
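The counts behind a "Hamming distance vs. number of sample pairs" plot at MIRU distance 0 could be tallied roughly as follows. This is only a sketch under stated assumptions: it considers every pair of samples rather than the connectors (edges) produced by the clustering, which the slides do not describe in enough detail to reproduce, and it uses shortened toy spoligotypes and MIRU patterns. The function name and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(spol_a: str, spol_b: str) -> int:
    """Count differing positions between two equal-length binary strings."""
    return sum(x != y for x, y in zip(spol_a, spol_b))


def pairs_by_hamming_at_zero_miru(samples):
    """Map each Hamming distance to the number of sample pairs that have it,
    restricted to pairs whose MIRU patterns are identical.

    Identical MIRU patterns are exactly the pairs at Euclidean distance 0.
    """
    counts = Counter()
    for (spol_a, miru_a), (spol_b, miru_b) in combinations(samples, 2):
        if miru_a == miru_b:
            counts[hamming(spol_a, spol_b)] += 1
    return counts


# Toy samples as (spoligotype, MIRU pattern); shortened and made up.
samples = [
    ("1111", (2, 3)),
    ("1101", (2, 3)),
    ("1001", (2, 3)),
    ("1111", (4, 3)),  # different MIRU pattern, so excluded from every pair
]

hist = pairs_by_hamming_at_zero_miru(samples)
for distance in sorted(hist):
    print(distance, hist[distance])
```

To mirror the plots on the slides, the same tally would be applied to the list of connectors (all, or single-linked only) instead of all sample pairs.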