This overview explores the concept of computational learning, emphasizing its intuitive aspects and real-world applications. It covers learning by exploration, language processing, and human behavior modeling. Key methods such as classification, regression, and clustering are explained through practical examples from medicine and data analysis. Concerns like generalization accuracy, overfitting, and biases in representation are highlighted. The text also examines learning algorithms, including decision trees and neural networks, demonstrating their effectiveness in approximating complex functions and making predictions.
Computational Learning: An Intuitive Approach
Human Learning
• Objects in the world: learning by exploration, and who knows what else
• Language: informal training; inputs may be incorrect
• Programming: a couple of examples of loops or recursion
• Medicine: "see one, do one, teach one"
• People learn from a few complex examples, informally, and produce complex behavioral output
Computational Learning
• Representation is provided
• Simple inputs: vectors of values
• Simple outputs: e.g. yes or no, a number, a disease
• Many examples (thousands to millions)
• Quantifiable
• Useful, e.g. for the automatic generation of expert systems
Concerns
• Generalization accuracy: performance on unseen data, and how to evaluate it
• Noise and overfitting
• Biases of representation: you only find what you look for
Three Learning Problems
• Classification: from known examples, create a decision procedure to guess the class
  • Patient data -> guess the disease
• Regression: from known examples, create a decision procedure to guess a real number
  • Stock data -> guess the price
• Clustering: putting data into "meaningful" groups
  • Patient data -> discover new diseases
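A minimal sketch of all three problem types, assuming scikit-learn is available; the tiny arrays are made-up stand-ins for the patient and stock data above, and the model choices are arbitrary.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X = np.array([[50, 1], [30, 0], [65, 1], [40, 0]])  # e.g. [age, smoker]

# Classification: labeled examples -> a procedure that guesses the class
y_class = np.array([1, 0, 1, 0])                    # 1 = disease, 0 = healthy
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[55, 1]]))                       # guess a class

# Regression: labeled examples -> a procedure that guesses a real number
y_real = np.array([120.0, 95.5, 140.2, 110.0])      # e.g. a price
reg = LinearRegression().fit(X, y_real)
print(reg.predict([[55, 1]]))                       # guess a number

# Clustering: no labels at all -> "meaningful" groups
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                                   # group index per example
```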
Simple Data: Attribute-Value Representation
• <sex: male, age: 50, smoker: true, blood_pressure: low, …, disease: emphysema> is one example
• sex, age, smoker, etc. are the attributes
• male, 50, true, etc. are the values
• Only data of this form is allowed
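A sketch of one such example and a hand-rolled numeric encoding of it; the attribute names and category codes are illustrative, not from any real dataset.

```python
# One example in attribute-value form
example = {"sex": "male", "age": 50, "smoker": True,
           "blood_pressure": "low", "disease": "emphysema"}

# Map categorical values to numbers so the example becomes a vector
sex_code = {"male": 0, "female": 1}
bp_code = {"low": 0, "normal": 1, "high": 2}

vector = [sex_code[example["sex"]],
          example["age"],
          int(example["smoker"]),
          bp_code[example["blood_pressure"]]]
label = example["disease"]
print(vector, "->", label)   # [0, 50, 1, 0] -> emphysema
```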
The Data: Squares and Circles
• [Figure: scatter plot of two classes, squares and circles, with unlabeled points marked "?"]
Learning a (Hyper)line
• Given data, construct a line, the decision boundary (a hyperplane in higher dimensions)
• Usually defined by a normal vector n: a point x is on the positive side if the dot product x · n > 0
• Recall <x1, x2> · <y1, y2> = x1*y1 + x2*y2
• This is what a single neuron computes
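A small numeric check of the dot-product rule; the normal vector n here is an assumed example (it happens to separate the x > y concept used later), and the bias term is omitted so the boundary passes through the origin.

```python
import numpy as np

n = np.array([1.0, -1.0])          # assumed normal vector
points = np.array([[3.0, 1.0],     # lies on the positive side
                   [2.0, 4.0]])    # lies on the negative side

for x in points:
    side = "+" if np.dot(x, n) > 0 else "-"
    print(x, "->", side)           # [3. 1.] -> +   then   [2. 4.] -> -
```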
1-Nearest-Neighbor Classification
• Given a new example x, find its nearest neighbor NN in the data using Euclidean distance
• Guess that the class of x is the class of NN
• k-nearest neighbor: let the k nearest neighbors vote
• Called IB-k in Weka
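A bare-bones 1-nearest-neighbor classifier as a sketch; this mirrors the idea behind Weka's IB1 but none of its refinements, and the training points are made up.

```python
import numpy as np

def nn1_predict(X_train, y_train, x):
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    return y_train[np.argmin(dists)]              # class of the nearest point

X_train = np.array([[3.0, 1.0], [2.0, 4.0], [5.0, 2.0]])
y_train = np.array(["+", "-", "+"])
print(nn1_predict(X_train, y_train, np.array([4.0, 1.5])))  # -> '+'
```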
Neural Net
• A single perceptron can't learn some simple concepts, like XOR
• A multilayer network of perceptrons can represent any Boolean function
• The learning rule is not biological; it follows from multivariable calculus (gradient descent)
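A sketch of the XOR point, assuming scikit-learn's MLPClassifier; the hidden-layer size, solver, and seed are arbitrary choices, and a net this tiny can occasionally need a different seed to converge.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                     # XOR truth table

# A single perceptron (one linear unit) cannot fit this; one hidden layer can.
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=5000, random_state=0)
mlp.fit(X, y)
print(mlp.predict(X))   # ideally [0 1 1 0]
```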
Gedanken Experiments
• Try ML algorithms on imagined data
• Example concept: x > y
  • Data looks like (3, 1, +), (2, 4, -), etc.
• Which algorithms do best, and how well? Consider the decision boundaries.
• My guesses: SMO > Perceptron > NearestN > DT
Check Guesses with Weka
• 199 examples
• DT = 92.9% (called J48 in Weka)
• NN = 97.5% (called IB1 in Weka)
• SVM = 99.0% (called SMO in Weka)
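A rough re-creation of this experiment, using scikit-learn stand-ins for the Weka tools (DecisionTreeClassifier for J48, 1-NN for IB1, a linear SVC for SMO); the data is freshly generated, so exact scores will differ from the slide's Weka runs.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(199, 2))
y = (X[:, 0] > X[:, 1]).astype(int)        # the concept: x > y

models = {"DT": DecisionTreeClassifier(),
          "NearestN": KNeighborsClassifier(n_neighbors=1),
          "Perceptron": Perceptron(),
          "SMO-like SVM": SVC(kernel="linear")}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name}: {acc:.3f}")            # linear boundaries match x > y best
```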