Machine Learning and Data Mining: Clustering. (adapted from) Prof. Alexander Ihler. Unsupervised learning. Supervised learning: predict target value (“y”) given features (“x”). Unsupervised learning
Define a distance between clusters (return to this)
Initialize: every example is a cluster
Compute distances between all clusters (store for efficiency)
Merge two closest clusters
Save both clustering and sequence of cluster operations
“Dendrogram”
Hierarchical Agglomerative Clustering
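The merge loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function names `cluster_distance` and `agglomerate`, the `mode` parameter, and the toy points are all assumptions made for the example. For brevity it recomputes distances on every pass instead of storing them, so it is slower than the cached version the steps recommend.

```python
import math

def cluster_distance(a, b, points, mode="min"):
    """Distance between clusters a and b, given as lists of point indices.

    mode="min" uses the closest pair of points, mode="max" the farthest pair.
    """
    dists = [math.dist(points[i], points[j]) for i in a for j in b]
    return min(dists) if mode == "min" else max(dists)

def agglomerate(points, mode="min"):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[i] for i in range(len(points))]  # every example is a cluster
    history = []  # sequence of merges, enough to draw a dendrogram
    while len(clusters) > 1:
        best = None
        # Recomputed each pass for simplicity (not stored for efficiency).
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q], points, mode)
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        history.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return history

# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
history = agglomerate(points)
for left, right, d in history:
    print(left, right, round(d, 3))
```

Each entry of `history` records which two clusters were merged and at what distance, which is the information a dendrogram displays.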
Initially, every datum is a cluster
In MATLAB: the “linkage” function (Statistics Toolbox)
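Minimum distance between clusters (single linkage) produces a minimal spanning tree.
Maximum distance between clusters (complete linkage) avoids elongated clusters.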
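To see how the choice of cluster distance changes which clusters merge, here is a small sketch on one-dimensional data. The helper names `d_min` and `d_max` and the three toy clusters are assumptions made for illustration only.

```python
def d_min(a, b):
    """Minimum pairwise distance between 1-D clusters a and b."""
    return min(abs(x - y) for x in a for y in b)

def d_max(a, b):
    """Maximum pairwise distance between 1-D clusters a and b."""
    return max(abs(x - y) for x in a for y in b)

# Hypothetical clusters: an elongated chain, a lone point, a compact pair.
chain = [0.0, 1.0, 2.0, 3.0]
lone = [4.0]
compact = [6.5, 7.0]

# Under the minimum distance the lone point is closest to the chain,
# so the chain keeps growing by one neighbor at a time.
joins_chain_under_min = d_min(chain, lone) < d_min(lone, compact)

# Under the maximum distance the far end of the chain counts against it,
# so the lone point joins the compact pair instead.
joins_compact_under_max = d_max(lone, compact) < d_max(chain, lone)

print(joins_chain_under_min, joins_compact_under_max)
```

The same point ends up in a different cluster depending only on the distance definition, which is why that definition has to be chosen before running the merge loop.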
Various experimental conditions
What genes change together?
What conditions are similar?
Cluster on both genes and conditions
Example: microarray expression
(can increasing k ever increase the cost?)
Number of Clusters
Scree is a loose accumulation of broken rock at the base of a cliff or mountain.
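On the question of whether increasing k can increase the cost: for the best possible clustering it cannot, because taking any point out of a cluster and giving it its own new cluster never raises the sum of squared distances. A particular run of k-means can still land in a worse local optimum at larger k. The sketch below checks the first claim on toy 1-D data; the function name `kmeans_cost`, the data, and the label vectors are assumptions made for illustration.

```python
def kmeans_cost(points, labels):
    """Sum of squared distances from each 1-D point to its cluster mean."""
    clusters = {}
    for x, c in zip(points, labels):
        clusters.setdefault(c, []).append(x)
    cost = 0.0
    for members in clusters.values():
        mu = sum(members) / len(members)
        cost += sum((x - mu) ** 2 for x in members)
    return cost

# Hypothetical data: three nearby points and one far away.
points = [0.0, 1.0, 2.0, 10.0]

labels_k1 = [0, 0, 0, 0]  # k = 1: everything in one cluster
labels_k2 = [0, 0, 0, 1]  # k = 2: the far point gets its own cluster

cost_k1 = kmeans_cost(points, labels_k1)
cost_k2 = kmeans_cost(points, labels_k2)
print(cost_k1, cost_k2)
```

Because the cost keeps falling as k grows, it cannot be minimized over k directly; plotting cost against k and looking for the point where the curve flattens is one common heuristic.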
Maximum Likelihood estimates
We’ll model each cluster using one of these Gaussian “bells”…
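As a reminder of what the maximum likelihood estimates look like for a single Gaussian, here is a minimal 1-D sketch. The function names `gaussian_mle` and `gaussian_pdf` and the sample data are assumptions for illustration; the multivariate case replaces the variance with a covariance matrix.

```python
import math

def gaussian_mle(xs):
    """Maximum likelihood mean and variance of 1-D data under a Gaussian."""
    m = len(xs)
    mu = sum(xs) / m
    # The ML estimate divides by m, not m - 1.
    var = sum((x - mu) ** 2 for x in xs) / m
    return mu, var

def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical sample.
mu, var = gaussian_mle([1.0, 2.0, 3.0, 4.0])
print(mu, var, gaussian_pdf(mu, mu, var))
```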
Total responsibility allocated to cluster c
Fraction of total assigned to cluster c
Weighted mean of assigned data
Weighted covariance of assigned data
(use new weighted means here)
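The four quantities named above can be computed from the responsibilities as in the following sketch of the update step for a 1-D Gaussian mixture. The function name `m_step`, the variable names, and the toy responsibilities are assumptions for illustration; in one dimension the covariance reduces to a variance.

```python
def m_step(xs, resp):
    """Update mixture parameters from data xs and responsibilities resp.

    resp[i][c] is the responsibility of cluster c for point xs[i];
    each row is assumed to sum to 1.
    """
    m = len(xs)
    n_clusters = len(resp[0])
    params = []
    for c in range(n_clusters):
        # Total responsibility allocated to cluster c.
        m_c = sum(r[c] for r in resp)
        # Fraction of the total assigned to cluster c.
        pi_c = m_c / m
        # Weighted mean of the assigned data.
        mu_c = sum(r[c] * x for r, x in zip(resp, xs)) / m_c
        # Weighted variance of the assigned data, using the new mean.
        var_c = sum(r[c] * (x - mu_c) ** 2 for r, x in zip(resp, xs)) / m_c
        params.append((m_c, pi_c, mu_c, var_c))
    return params

# Hypothetical data: the middle point is shared equally by both clusters.
xs = [0.0, 2.0, 10.0]
resp = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
params = m_step(xs, resp)
for m_c, pi_c, mu_c, var_c in params:
    print(m_c, pi_c, mu_c, var_c)
```

Computing the mean before the variance matters here: the variance uses the freshly updated mean, not the one from the previous iteration.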