Classification Using K-Nearest Neighbor • Assignment prepared by Rekha G
OUTLINE • KNN • KNN: Classification Approach • KNN: Euclidean distance • Nearest Neighbor and Exemplar • Nearest Neighbor Search • The kNN Algorithm • Closest Neighbors • Choosing appropriate k • Finding error with changed k
KNN • K-Nearest Neighbors (KNN) • Simple, but a very powerful classification algorithm • Classifies based on a similarity measure • Non-parametric • Lazy learning 🞑 Does not "learn" until a test example is given 🞑 Whenever we have new data to classify, we find its K nearest neighbors from the training data
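As a quick, hedged illustration of the lazy-learning idea, the sketch below uses scikit-learn's KNeighborsClassifier; the toy feature matrix, labels, and query point are illustrative assumptions, not data from these slides.

```python
# Minimal sketch (assumes scikit-learn is installed); the toy data is illustrative only.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[6, 4], [3, 7], [8, 5], [2, 3], [7, 6]]   # e.g. (sweetness, crunchiness)
y_train = ["fruit", "vegetable", "fruit", "vegetable", "fruit"]

knn = KNeighborsClassifier(n_neighbors=3)   # "training" just stores the data
knn.fit(X_train, y_train)

print(knn.predict([[5, 5]]))                # neighbors are found only at query time
```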
Pros and Cons of KNN Pros • It is a very simple algorithm to understand and interpret. • It is useful for nonlinear data because it makes no assumptions about the data. • It is versatile: it can be used for classification as well as regression. • It often achieves relatively high accuracy, although better supervised learning models exist. Cons • It is computationally somewhat expensive because it stores all the training data. • It requires more memory than many other supervised learning algorithms. • Prediction is slow when the training set is large. • It is very sensitive to the scale of the data and to irrelevant features.
KNN: Classification Approach • Classified by "MAJORITY VOTE" of its neighbors' classes 🞑 A new example is assigned to the most common class amongst its K nearest neighbors (by measuring the distance between data points)
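A minimal sketch of the majority-vote step, assuming the labels of the K nearest neighbors have already been collected; collections.Counter picks the most common class.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the most common class label among the K nearest neighbors."""
    counts = Counter(neighbor_labels)
    return counts.most_common(1)[0][0]

print(majority_vote(["fruit", "vegetable", "fruit", "fruit", "vegetable"]))  # fruit
```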
Supervised vs. Unsupervised • Supervised: labeled data • Unsupervised: unlabeled data
Distances • Distances are used to measure similarity • There are many ways to measure the distance between two instances
Distances • Manhattan Distance: |X1 - X2| + |Y1 - Y2| • Euclidean Distance: sqrt((X1 - X2)^2 + (Y1 - Y2)^2)
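Both distances can be written directly from the formulas above; a minimal sketch for points given as coordinate tuples:

```python
import math

def manhattan(p, q):
    # |X1 - X2| + |Y1 - Y2|, summed over all coordinates
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    # sqrt((X1 - X2)^2 + (Y1 - Y2)^2), generalized to n dimensions
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(manhattan((1, 2), (4, 6)))   # 7
print(euclidean((1, 2), (4, 6)))   # 5.0
```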
[Figure: Manhattan distance vs. Euclidean distance between two points]
Properties of Distance • Dist(x, y) >= 0 (non-negativity) • Dist(x, y) = Dist(y, x) (symmetry) • Detours cannot shorten distance (triangle inequality): Dist(x, z) <= Dist(x, y) + Dist(y, z)
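These three axioms can be spot-checked numerically; the sketch below verifies them for the Euclidean distance on a few arbitrary sample points (the points themselves are assumptions for demonstration).

```python
import itertools
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

points = [(0, 0), (3, 4), (-1, 2), (5, -2)]

for x, y, z in itertools.permutations(points, 3):
    assert euclidean(x, y) >= 0                                    # non-negativity
    assert math.isclose(euclidean(x, y), euclidean(y, x))          # symmetry
    assert euclidean(x, z) <= euclidean(x, y) + euclidean(y, z) + 1e-12  # triangle inequality

print("all distance axioms hold on the sample points")
```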
Distance • Hamming Distance: the number of positions at which two equal-length strings (or feature vectors) differ
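A minimal sketch of the Hamming distance; the example strings are the usual textbook illustration, not data from these slides.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("karolin", "kathrin"))         # 3
print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))   # 2
```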
Distance Measures • Distance measure: what does "similar" mean? • Minkowski Distance (Lp norm): d(x, y) = (sum_i |x_i - y_i|^p)^(1/p) • Chebyshev Distance: d(x, y) = max_i |x_i - y_i| • Mahalanobis Distance: d(x, y) = sqrt((x - y)^T S^-1 (x - y)), where S is the covariance matrix of the data
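A hedged sketch of the three measures using NumPy; the sample vectors and the covariance matrix S (which would normally be estimated from the training data) are illustrative assumptions.

```python
import numpy as np

def minkowski(x, y, p=2):
    # (sum_i |x_i - y_i|^p)^(1/p); p=1 gives Manhattan, p=2 gives Euclidean
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def chebyshev(x, y):
    # max_i |x_i - y_i|, the limit of the Minkowski distance as p -> infinity
    return np.max(np.abs(x - y))

def mahalanobis(x, y, S):
    # sqrt((x - y)^T S^-1 (x - y)); S is the covariance matrix of the data
    d = x - y
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

x, y = np.array([2.0, 3.0]), np.array([5.0, 7.0])
S = np.array([[2.0, 0.3], [0.3, 1.0]])        # assumed covariance, illustrative only
print(minkowski(x, y, p=1), minkowski(x, y, p=2), chebyshev(x, y), mahalanobis(x, y, S))
```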
Exemplar • Arithmetic Mean • Geometric Mean • Medoid • Centroid
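The exemplars listed above can each be computed from a set of points; a minimal sketch of the centroid (coordinate-wise mean, not necessarily a real data point) and the medoid (the actual data point with the smallest total distance to the others), assuming Euclidean distance and made-up sample points:

```python
import numpy as np

points = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 9.0], [2.0, 2.5]])

# Centroid: coordinate-wise arithmetic mean (need not be an actual data point)
centroid = points.mean(axis=0)

# Medoid: the real data point minimizing the sum of distances to all other points
dist_matrix = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
medoid = points[dist_matrix.sum(axis=1).argmin()]

print("centroid:", centroid)
print("medoid:  ", medoid)
```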
Nearest Neighbor Search • Given: a set P of n points in R^d • Goal: a data structure which, given a query point q, finds the nearest neighbor p of q in P
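A brute-force (linear scan) sketch of the search problem; real data structures such as k-d trees or ball trees speed this up, but the linear scan defines the goal. The sample points and query are assumptions for demonstration.

```python
import math

def nearest_neighbor(P, q):
    """Return the point in P closest to the query point q (linear scan, O(n))."""
    return min(P, key=lambda p: math.dist(p, q))

P = [(1, 1), (4, 5), (9, 2), (3, 3)]
print(nearest_neighbor(P, (3.6, 4)))   # (4, 5)
```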
Definition of Nearest Neighbor • The K-nearest neighbors of a record x are the data points that have the k smallest distances to x
K-NN • (K-l)-NN: Reduce complexity by having a threshold on the majority. We could restrict the associations through (K-l)-NN.
K-NN • With K = 5, select the 5 nearest neighbors of the query point by computing their Euclidean distances
K-NN • Assign the class held by the majority of the instances among the K nearest neighbors; here, K = 5
The kNN Algorithm • The kNN algorithm begins with a training dataset made up of examples that are classified into several categories, as labeled by a nominal variable. • Assume that we have a test dataset containing unlabeled examples that otherwise have the same features as the training data. • For each record in the test dataset, kNN identifies k records in the training data that are the "nearest" in similarity, where k is an integer specified in advance. • The unlabeled test instance is assigned the class of the majority of the k nearest neighbors.
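A from-scratch sketch of the algorithm just described: for each unlabeled test record, find the k nearest labeled training records and take the majority class. Function names and the toy (sweetness, crunchiness) data are illustrative assumptions, not material from the original slides.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, test_X, k=3):
    """Label each test record with the majority class of its k nearest training records."""
    predictions = []
    for q in test_X:
        # Sort training records by Euclidean distance to the query record
        neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], q))
        k_labels = [label for _, label in neighbors[:k]]
        predictions.append(Counter(k_labels).most_common(1)[0][0])   # majority vote
    return predictions

# Toy (sweetness, crunchiness) data, illustrative only
train_X = [(6, 4), (3, 7), (8, 5), (2, 6), (7, 3)]
train_y = ["fruit", "vegetable", "fruit", "vegetable", "fruit"]
print(knn_classify(train_X, train_y, [(4, 6), (7, 4)], k=3))   # ['vegetable', 'fruit']
```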
Distance • Euclidean distance is specified by the following formula, where p and q are the examples to be compared, each having n features. The term p1 refers to the value of the first feature of example p, while q1 refers to the value of the first feature of example q: dist(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2) • The distance formula involves comparing the values of each feature. For example, to calculate the distance between the tomato (sweetness = 6, crunchiness = 4) and the green bean (sweetness = 3, crunchiness = 7), we can use the formula as follows: dist(tomato, green bean) = sqrt((6 - 3)^2 + (4 - 7)^2) = sqrt(9 + 9) = sqrt(18) ≈ 4.2
Choosing appropriate k • Deciding how many neighbors to use for kNN determines how well the model will generalize to future data. • The balance between overfitting and underfitting the training data is a problem known as the bias-variance tradeoff. • Choosing a large k reduces the impact of variance caused by noisy data, but can bias the learner so that it runs the risk of ignoring small but important patterns.
Choosing appropriate k • In practice, choosing k depends on the difficulty of the concept to be learned and the number of records in the training data. • Typically, k is set somewhere between 3 and 10. One common practice is to set k equal to the square root of the number of training examples. • In the classifier, we might set k = 4, because there were 15 example ingredients in the training data and the square root of 15 is 3.87.
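The square-root heuristic mentioned above is easy to encode, though in practice k is usually tuned on held-out data; a hedged sketch of the rule of thumb:

```python
import math

def suggested_k(n_training_examples):
    """Rule-of-thumb starting point: k near the square root of the training set size."""
    return max(1, round(math.sqrt(n_training_examples)))

print(suggested_k(15))   # sqrt(15) ≈ 3.87, so k = 4 as in the example above
```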