Lab 3: DAVID, Clustering and Classification
Yang Li, Lin Liu
Feb 10 & Feb 11, 2014
DAVID (gene set analysis): http://david.abcc.ncifcrf.gov/summary.jsp
Biological processes, molecular function, cellular component
Other gene set analysis tools
Key: distance metric/divergence
Average: pairwise distances
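Assuming the two notes above refer to hierarchical clustering with average linkage, a minimal sketch in base R could look like this. The matrix `x` and its sample names are synthetic stand-ins, not the lab data.

```r
# Synthetic expression matrix (hypothetical): 6 samples x 20 genes,
# two groups of 3 samples with clearly shifted means.
set.seed(1)
x <- rbind(matrix(rnorm(3 * 20, mean = 0), nrow = 3),
           matrix(rnorm(3 * 20, mean = 5), nrow = 3))
rownames(x) <- paste0("S", 1:6)

# The key choice: the distance metric between samples (rows).
D <- dist(x, method = "euclidean")

# Average linkage: the distance between two clusters is the mean of
# all pairwise distances between their members.
hc <- hclust(D, method = "average")

# Cut the tree into two clusters and show the assignment.
groups <- cutree(hc, k = 2)
print(groups)
```

Swapping `method = "average"` for `"single"` or `"complete"` changes only how cluster-to-cluster distances are summarised; the pairwise distance matrix `D` stays the same.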
[Figure: clustering at iterations 0 to 3; axes: expression in Sample1 vs. expression in Sample2]
kmean.cluster <- kmeans(t(ld), 2)
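The call above can be tried end to end on synthetic data. In this sketch, `ld_demo` is a hypothetical stand-in for `ld` (assumed here to hold genes in rows and samples in columns), and `nstart = 10` is an added choice to reduce sensitivity to the random starting centers.

```r
set.seed(42)

# Hypothetical stand-in for `ld`: 50 genes (rows) x 12 samples (columns),
# 4 samples around mean 0 and 8 samples around mean 3.
ld_demo <- cbind(matrix(rnorm(50 * 4, mean = 0), nrow = 50),
                 matrix(rnorm(50 * 8, mean = 3), nrow = 50))
colnames(ld_demo) <- paste0("sample", 1:12)

# kmeans() clusters rows, so transpose to cluster samples, not genes.
km <- kmeans(t(ld_demo), centers = 2, nstart = 10)

print(km$cluster)         # cluster label for each sample
print(table(km$cluster))  # cluster sizes
```

The cluster numbers themselves (1 or 2) are arbitrary; only the grouping of samples is meaningful.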
Hamming distance (binary)
Correlation (range: [-1, 1]; used as a distance, 1 - |r| lies in [0, 1])
MIC (Reshef, Reshef et al. 2011, Science): Maximal Information Coefficient
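As a small illustration of two of the measures listed above, the following base R sketch computes a Hamming distance on binary vectors and two common correlation-based distances. The vectors are made up for the example, and MIC is left out because it is not available in base R.

```r
# Hamming distance: number of positions where two binary vectors differ.
a <- c(1, 0, 1, 1, 0)
b <- c(1, 1, 0, 1, 0)
hamming <- sum(a != b)
print(hamming)

# Pearson correlation between two profiles.
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
r <- cor(x, y)

# Two ways to turn correlation into a dissimilarity:
d_signed <- 1 - r       # in [0, 2]; anti-correlated profiles are far apart
d_abs    <- 1 - abs(r)  # in [0, 1]; anti-correlated profiles count as close
print(c(r = r, d_signed = d_signed, d_abs = d_abs))
```

Which version to use depends on whether anti-correlated genes should be grouped together or kept apart.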
Principal Component Analysis
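A minimal PCA sketch using base R's `prcomp()`, run on synthetic data. The matrix `expr` and its 4 vs. 8 sample split are hypothetical and are not taken from the lab data set.

```r
set.seed(7)

# Hypothetical expression matrix: 100 genes (rows) x 12 samples (columns).
expr <- cbind(matrix(rnorm(100 * 4, mean = 0), nrow = 100),
              matrix(rnorm(100 * 8, mean = 2), nrow = 100))

# prcomp() treats rows as observations, so transpose to get samples in rows.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Fraction of total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:3], 3))

# Sample coordinates on the first two principal components.
scores <- pca$x[, 1:2]
print(round(scores, 2))
```

PCA never sees the group labels: it finds directions of maximal variance, and the first component separates the two groups here only because the group difference dominates the variance in this synthetic example.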
Key difference between LDA and PCA?
mds = cmdscale(D, k = 2)
plot(mds[,1], mds[,2], type="p", main="Clustering using MDS", xlab = 'mds1', ylab = 'mds2')
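The snippet above assumes a distance matrix `D` already exists. A self-contained sketch, with a synthetic `samples` matrix standing in for the real data, shows one way to build `D` and also checks a known property: classical MDS on Euclidean distances gives the same coordinates as PCA, up to the sign of each axis.

```r
set.seed(3)

# Hypothetical data: 12 samples (rows) x 30 genes (columns).
samples <- rbind(matrix(rnorm(4 * 30, mean = 0), nrow = 4),
                 matrix(rnorm(8 * 30, mean = 3), nrow = 8))

# Pairwise Euclidean distances between samples.
D <- dist(samples)

# Classical (metric) multidimensional scaling into 2 dimensions.
mds <- cmdscale(D, k = 2)

# PCA scores on the same data, for comparison.
pca_scores <- prcomp(samples)$x[, 1:2]

# Largest absolute difference after ignoring axis signs.
max_diff <- max(abs(abs(mds) - abs(pca_scores)))
print(max_diff)
```

With a non-Euclidean `D` (for example a correlation-based distance), the MDS coordinates would in general no longer match PCA.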
Classification is prediction with a categorical outcome (binary in the two-class case)
Machine learning cares more about prediction than statistics does
Machine learning is statistics with a focus on prediction, scalability and high dimensional problems
But clustering and classification are interconnected
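library(e1071)  # provides svm()
# Labels are wrapped in factor(): svm() expects a factor (or integer) response for classification
model1 = svm(t(d[,1:12]), factor(c(rep('Normal',4), rep('Cancer',8))), type='C', kernel='linear')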
#KNN k = 1
class::knn(t(ld[,1:12]), t(ld[,13:14]), c(rep('Normal',4), rep('Cancer',8)), k=1)
#KNN k = 3
class::knn(t(ld[,1:12]), t(ld[,13:14]), c(rep('Normal',4), rep('Cancer',8)), k=3)
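To see what the two calls above return, here is a sketch on synthetic data. The `train` and `test` matrices are hypothetical substitutes for `t(ld[,1:12])` and `t(ld[,13:14])`; the label layout (4 Normal, 8 Cancer) follows the calls above.

```r
library(class)  # provides knn(); ships with standard R installations
set.seed(11)

# Hypothetical training set: 12 samples (rows) x 20 genes (columns).
train <- rbind(matrix(rnorm(4 * 20, mean = 0), nrow = 4),
               matrix(rnorm(8 * 20, mean = 3), nrow = 8))
labels <- factor(c(rep("Normal", 4), rep("Cancer", 8)))

# Two hypothetical test samples: one drawn like Normal, one like Cancer.
test <- rbind(rnorm(20, mean = 0), rnorm(20, mean = 3))

# Each test sample gets the majority label among its k nearest
# training samples (Euclidean distance).
pred1 <- knn(train, test, cl = labels, k = 1)
pred3 <- knn(train, test, cl = labels, k = 3)

print(pred1)
print(pred3)
```

With only 4 Normal training samples, a large `k` would let the 8 Cancer samples outvote them, so small values of `k` are the sensible choice for a data set of this size.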
For the graduate-level question, think about removing batch effects using PCA
For the ComBat software, search for “sva bioconductor” on Google (ComBat is part of the sva Bioconductor package).