This text dives into critical concepts in computer science, including the probability of node failures in distributed systems, explained using the formula f = 1 - (1-p)^n. It illustrates practical examples such as the Map-Reduce approach to data processing and the K-means algorithm for clustering. Key topics include calculating failure probabilities, assigning clusters, and utilizing Bayes' rule in classification problems. This knowledge is essential for effective algorithm design and understanding performance metrics like accuracy and precision.
From W2-S9 Node failure
The probability that at least one of n nodes fails, when each node fails independently with probability p, is: f = 1 - (1-p)^n. When n = 1, f = p.
Suppose p = 0.0001 but n = 10000. Then f = 1 - (1 - 0.0001)^10000 ≈ 0.63. [Why/how? For small p, (1-p)^n ≈ e^(-np); here np = 1, so f ≈ 1 - e^(-1) ≈ 0.63.] This is one of the most important formulas to know (in general).
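A minimal sketch (mine, not from the slides) that checks the exact formula against the e^(-np) approximation:

import math

def prob_any_failure(p, n):
    # Probability that at least one of n nodes fails,
    # when each fails independently with probability p.
    return 1 - (1 - p) ** n

p, n = 0.0001, 10000
print(prob_any_failure(p, n))      # ~0.6321  (exact: 1 - (1-p)^n)
print(1 - math.exp(-n * p))        # ~0.6321  (approx: 1 - e^(-np))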
From W2-S15 Example
• For example, suppose the hash function maps {to, Java, road} to one node. Then:
• (to, 1) remains (to, 1)
• (Java, 1); (Java, 1); (Java, 1) → (Java, [1, 1, 1])
• (road, 1); (road, 1) → (road, [1, 1])
• Now the REDUCE function converts
• (Java, [1, 1, 1]) → (Java, 3), etc.
• Remember this is a very simple example…the challenge is to take complex tasks and express them as Map and Reduce!
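A minimal word-count sketch of the same pipeline (the map_fn/shuffle/reduce_fn names are mine for illustration; a real framework handles the grouping and distribution):

from collections import defaultdict

def map_fn(document):
    # MAP: emit a (word, 1) pair for every word.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group values by key: (Java,1);(Java,1);(Java,1) -> (Java, [1,1,1])
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    # REDUCE: collapse the list to a count: (Java, [1,1,1]) -> (Java, 3)
    return key, sum(values)

pairs = map_fn("to Java road Java road Java")
print([reduce_fn(k, v) for k, v in shuffle(pairs).items()])
# [('to', 1), ('Java', 3), ('road', 2)]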
From W2-S19 Similarity Example [2] Notice, it requires some ingenuity to come up with key-value pairs. This is key to using map-reduce effectively.
From W3-S14 K-means algorithm
Let C = initial k cluster centroids (often selected randomly)
Mark C as unstable
While <C is unstable>
    Assign all data points to their nearest centroid in C.
    Compute the centroids of the points assigned to each element of C.
    Update C as the set of new centroids.
    Mark C as stable or unstable by comparing with the previous set of centroids.
End While
Complexity: O(nkdI), where n: num of points; k: num of clusters; d: dimension; I: num of iterations.
Take away: complexity is linear in n.
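A minimal NumPy sketch of the loop above (random initialization and the exact-match stability test are assumptions; the slide leaves them open):

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]   # initial centroids
    for _ in range(max_iter):                          # at most I iterations
        # Assign each point to its nearest centroid: O(nkd) per iteration.
        dists = np.linalg.norm(X[:, None] - C[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        # (keep the old centroid if a cluster goes empty).
        newC = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                         else C[j] for j in range(k)])
        if np.allclose(newC, C):                       # C is stable: stop
            return newC, labels
        C = newC
    return C, labels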
From W3-S16 Example: 2 Clusters
Four points: A(-1, 2), B(1, 2), C(-1, -2), D(1, -2).
K-means Problem: the optimal solution is centroids (0, 2) and (0, -2), and the clusters are {A, B} and {C, D}.
K-means Algorithm: suppose the initial centroids are (-1, 0) and (1, 0); then {A, C} and {B, D} end up as the two clusters. The algorithm gets stuck in a local optimum and never finds the true solution.
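A standalone check of this example, hard-coding the unlucky initial centroids (an illustration of mine, not from the slides):

import numpy as np

X = np.array([[-1, 2], [1, 2], [-1, -2], [1, -2]], dtype=float)  # A, B, C, D
C = np.array([[-1, 0], [1, 0]], dtype=float)   # the unlucky initial centroids

for _ in range(10):
    labels = np.linalg.norm(X[:, None] - C[None, :], axis=2).argmin(axis=1)
    C = np.array([X[labels == j].mean(axis=0) for j in range(2)])

print(labels)   # [0 1 0 1]: clusters {A, C} and {B, D}
print(C)        # [[-1. 0.] [1. 0.]]: stable, never reaches (0, 2) and (0, -2)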
From W4-S21 Bayes Rule
P(H | D) = P(D | H) P(H) / P(D), where P(H) is the Prior and P(H | D) is the Posterior.
From W4-S25 Example: Iris Flower
Choose the flower F that maximizes the posterior. F = Flower; SL = Sepal Length; SW = Sepal Width; PL = Petal Length; PW = Petal Width. [The training data table appeared on the slide.]
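A minimal sketch of the "choose the maximum" step as a Gaussian naive Bayes classifier (the Gaussian class-conditionals and the per-class statistics below are illustrative assumptions; the slide's data table is not reproduced here):

import numpy as np

def gaussian_pdf(x, mean, var):
    # Class-conditional density P(feature = x | F), assumed Gaussian.
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def predict(x, stats, priors):
    # Posterior is proportional to P(F) * product_i P(x_i | F);
    # return the flower F that maximizes it.
    scores = {f: priors[f] * np.prod(gaussian_pdf(x, m, v))
              for f, (m, v) in stats.items()}
    return max(scores, key=scores.get)

# Hypothetical per-class means and variances for [SL, SW, PL, PW]:
stats = {
    "setosa":     (np.array([5.0, 3.4, 1.5, 0.2]), np.array([0.12, 0.14, 0.03, 0.01])),
    "versicolor": (np.array([5.9, 2.8, 4.3, 1.3]), np.array([0.27, 0.10, 0.22, 0.04])),
}
priors = {"setosa": 0.5, "versicolor": 0.5}
print(predict(np.array([5.1, 3.5, 1.4, 0.2]), stats, priors))   # setosa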
From W5-S7 Confusion Matrix
Label 1 is called Positive; label -1 is called Negative.
Let the number of test samples be N, with N = N1 + N2 + N3 + N4, where N1 = true positives, N2 = false positives, N3 = false negatives, N4 = true negatives.
False Positive Rate (FPR) = N2/(N2+N4)
True Positive Rate (TPR) = N1/(N1+N3)
False Negative Rate (FNR) = N3/(N1+N3)
True Negative Rate (TNR) = N4/(N4+N2)
Accuracy = (N1+N4)/(N1+N2+N3+N4)
Precision = N1/(N1+N2)
Recall = N1/(N1+N3)
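A small sketch computing these metrics from true and predicted labels in {1, -1} (the function name and return layout are mine, not the slide's):

def confusion_metrics(y_true, y_pred):
    # Count the four cells of the confusion matrix.
    n1 = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))    # true positives
    n2 = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))   # false positives
    n3 = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))   # false negatives
    n4 = sum(t == -1 and p == -1 for t, p in zip(y_true, y_pred))  # true negatives
    return {
        "FPR": n2 / (n2 + n4),
        "TPR": n1 / (n1 + n3),                       # same as Recall
        "Accuracy": (n1 + n4) / (n1 + n2 + n3 + n4),
        "Precision": n1 / (n1 + n2),
    }

print(confusion_metrics([1, 1, -1, -1, 1], [1, -1, 1, -1, 1]))
# {'FPR': 0.5, 'TPR': 0.667, 'Accuracy': 0.6, 'Precision': 0.667} (rounded)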
From W5-S9 ROC (Receiver Operating Characteristic) Curves
• Generally a learning algorithm A will return a real number…but what we want is a label {1 or -1}.
• We can apply a threshold T: predict 1 when the score is at least T, and -1 otherwise. Each choice of T yields one (FPR, TPR) point (the slide's figure showed, e.g., TPR = 3/4, FPR = 2/5 at one threshold and TPR = 2/4, FPR = 2/5 at another); sweeping T traces out the ROC curve.
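A minimal sketch of that threshold sweep (the scores and labels are made-up data for illustration):

def roc_points(scores, labels):
    # One (FPR, TPR) point per candidate threshold T.
    P = sum(y == 1 for y in labels)    # number of positives
    N = sum(y == -1 for y in labels)   # number of negatives
    points = []
    for T in sorted(set(scores), reverse=True):
        pred = [1 if s >= T else -1 for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
        fp = sum(p == 1 and y == -1 for p, y in zip(pred, labels))
        points.append((fp / N, tp / P))
    return points

scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, -1, 1, 1, -1, -1, -1, -1]
print(roc_points(scores, labels))   # (FPR, TPR) pairs as T decreases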
From W5-S13 Random Variable
• A random variable X can take values in a set which is:
• Discrete and finite.
• Let's toss a coin and set X = 1 if it's a head and X = 0 if it's a tail. X is a random variable.
• Discrete and infinite (countable).
• Let X be the number of accidents in Sydney in a day. Then X = 0, 1, 2, …
• Infinite (uncountable).
• Let X be the height of a Sydney-sider.
• X could be 150, 150.11, 150.112, …
From W7-S2 These slides are from Tan, Steinbach and Kumar.
From W11-S9 The Key Idea
• Decompose the User x Movie rating matrix into:
• User x Movie = (User x Genre) x (Genre x Movie)
• The number of genres is typically small.
• Or: R ≈ UV.
• Find U and V such that ||R - UV|| is minimized… (see the sketch below)
• Almost like k-means clustering…why?
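A minimal sketch of minimizing ||R - UV|| by gradient descent over the observed entries (the learning rate, step count, and NaN-for-missing convention are all my assumptions for illustration, not the slides'):

import numpy as np

def factorize(R, k, steps=5000, lr=0.01, seed=0):
    # Find U (users x k) and V (k x movies) minimizing ||R - UV||
    # over the observed entries of R (NaN marks a missing rating).
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(R)
    R0 = np.nan_to_num(R)
    U = rng.normal(scale=0.1, size=(R.shape[0], k))
    V = rng.normal(scale=0.1, size=(k, R.shape[1]))
    for _ in range(steps):
        E = mask * (R0 - U @ V)       # error on observed entries only
        U += lr * E @ V.T             # gradient step on U
        V += lr * U.T @ E             # gradient step on V
    return U, V

R = np.array([[5.0, 2.0, np.nan], [4.0, np.nan, 1.0], [np.nan, 3.0, 5.0]])
U, V = factorize(R, k=2)
print(np.round(U @ V, 1))             # reconstruction, including the blanks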
From W11-S15 UV Computation…. This example is from Rajaraman, Leskovec and Ullman: see the textbook (Mining of Massive Datasets).