Lab: SVM (various libraries) and a little on trees Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 9b, March 28, 2014
Support Vector Machine
• We will cover the theory, formulae, etc. next Tuesday; today is for some lead-up lab exercises.
• Complete these before next Tuesday. They will not be checked, but they will help a lot in understanding Tuesday's material.
• SVM: general (nonlinear) classification, regression and outlier detection with an intuitive model representation.
kernlab
• http://escience.rpi.edu/data/DA/svmbasic_notes.pdf
• Some scripts: Lab9b_<n>_2014.R
• Work through the examples
• They start gently and then get more open-ended
• Remember that you may not yet know what an SVM is or does (e.g. non-linear), so:
• Explore, try some things, and come with questions on Tuesday
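As one possible starting point for the exploring suggested above, here is a minimal sketch of fitting an SVM with kernlab. The choice of the built-in iris data, the Gaussian (RBF) kernel, and the values sigma = 0.1 and C = 1 are illustrative assumptions, not taken from the lab scripts.

```r
library(kernlab)

# Assumption: iris is used only because it ships with R.
data(iris)

# Fit a classification SVM with a Gaussian (RBF) kernel.
# sigma and C are arbitrary illustrative values, not tuned.
svm_fit <- ksvm(Species ~ ., data = iris,
                kernel = "rbfdot", kpar = list(sigma = 0.1), C = 1)

# Predict on the training predictors and tabulate predicted vs. actual classes.
pred <- predict(svm_fit, iris[, -5])
conf_mat <- table(predicted = pred, actual = iris$Species)
print(conf_mat)

# Training accuracy (optimistic, since no held-out data is used here).
train_acc <- sum(diag(conf_mat)) / nrow(iris)
print(train_acc)
```

Try changing the kernel (for example "vanilladot" for a linear kernel) or C and see how the confusion table changes.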
kernlab, svmpath and klaR
• http://escience.rpi.edu/data/DA/v15i09.pdf
• Start at page 9 (bottom)
• Work through the examples
• Works on familiar datasets and samples procedures from four libraries (these are the most used):
• kernlab
• e1071
• svmpath
• klaR
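For comparison with kernlab, the following is a small sketch using e1071, another of the four libraries listed above. The dataset (iris), the 45-row hold-out split, and the cost and gamma values are assumptions chosen for illustration only.

```r
library(e1071)

data(iris)
set.seed(1)  # make the random train/test split reproducible

# Hold out 45 of the 150 rows for testing (an arbitrary choice).
test_idx <- sample(nrow(iris), size = 45)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Radial-kernel SVM; cost and gamma are illustrative, untuned values.
fit <- svm(Species ~ ., data = train,
           kernel = "radial", cost = 1, gamma = 0.25)

# Evaluate on the held-out rows.
pred <- predict(fit, newdata = test)
print(table(predicted = pred, actual = test$Species))
test_acc <- mean(pred == test$Species)
print(test_acc)
```

Evaluating on held-out rows gives a more honest picture than training accuracy, which matters once you start varying kernels and parameters.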
R-SVM
• http://www.stanford.edu/group/wonglab/RSVMpage/r-svm.tar.gz
• http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html
• Read or skim the paper
• Explore this method on a dataset of your choice, e.g. one of the R built-in datasets
In case you did not do this tree example
# Regression Tree Example
library(rpart)
# build the tree
fitM <- rpart(Mileage ~ Price + Country + Reliability + Type, method="anova", data=cu.summary)
printcp(fitM) # display the results
….
Root node error: 1354.6/60 = 22.576
n=60 (57 observations deleted due to missingness)
        CP nsplit rel error  xerror     xstd
1 0.622885      0   1.00000 1.03165 0.176920
2 0.132061      1   0.37711 0.51693 0.102454
3 0.025441      2   0.24505 0.36063 0.079819
4 0.011604      3   0.21961 0.34878 0.080273
5 0.010000      4   0.20801 0.36392 0.075650
Mileage…
plotcp(fitM) # visualize cross-validation results
summary(fitM) # detailed summary of splits
par(mfrow=c(1,2)) # two plots side by side
rsq.rpart(fitM) # plot approximate R-squared and relative error against the number of splits
# plot tree
plot(fitM, uniform=TRUE, main="Regression Tree for Mileage")
text(fitM, use.n=TRUE, all=TRUE, cex=.8)
# prune the tree
pfitM <- prune(fitM, cp=0.01160389) # from cptable
# plot the pruned tree
plot(pfitM, uniform=TRUE, main="Pruned Regression Tree for Mileage")
text(pfitM, use.n=TRUE, all=TRUE, cex=.8)
# write the pruned tree to a PostScript file
post(pfitM, file = "ptree2.ps", title = "Pruned Regression Tree for Mileage")
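The cp value passed to prune() is read off the cptable by hand. One common way to pick it programmatically, sketched here as an assumption rather than the procedure used in class, is to take the row of the fitted object's cptable with the smallest cross-validated error (xerror). Because the cross-validation folds are random, the selected row can differ between runs unless a seed is set; the seed value below is arbitrary.

```r
library(rpart)

set.seed(42)  # cross-validation folds are random; the seed value is arbitrary
fitM <- rpart(Mileage ~ Price + Country + Reliability + Type,
              method = "anova", data = cu.summary)

# cptable is a matrix with columns CP, nsplit, rel error, xerror, xstd.
cp_tab <- fitM$cptable

# Pick the complexity parameter with the lowest cross-validated error.
best_row <- which.min(cp_tab[, "xerror"])
best_cp  <- cp_tab[best_row, "CP"]

# Prune to that cp and compare tree sizes.
pfitM <- prune(fitM, cp = best_cp)
print(best_cp)
print(nrow(fitM$frame))   # number of nodes before pruning
print(nrow(pfitM$frame))  # number of nodes after pruning
```

If the minimum xerror falls on the last row of the table, pruning leaves the tree unchanged.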
# Conditional Inference Tree for Mileage
library(party) # ctree() is provided by the party package
fit2M <- ctree(Mileage ~ Price + Country + Reliability + Type, data=na.omit(cu.summary))
Assignments to come…
• Assignment 7: Predictive and Prescriptive Analytics. Due ~ week 11. 20% (15% written and 5% oral; individual).
Admin info (keep/print this slide)
• Class: ITWS-4963/ITWS-6965
• Hours: 12:00pm-1:50pm Tuesday/Friday
• Location: SAGE 3101
• Instructor: Peter Fox
• Instructor contact: pfox@cs.rpi.edu, 518.276.4862 (do not leave a msg)
• Contact hours: Monday** 3:00-4:00pm (or by email appt)
• Contact location: Winslow 2120 (sometimes Lally 207A, announced by email)
• TA: Lakshmi Chenicheri, chenil@rpi.edu
• Web site: http://tw.rpi.edu/web/courses/DataAnalytics/2014
• Schedule, lectures, syllabus, reading, assignments, etc.