Presentation Transcript


  1. From W1-S16

  2. From W2-S9 Node failure. The probability that at least one node fails is: f = 1 - (1-p)^n. When n = 1, f = p. Suppose p = 0.0001 but n = 10000; then f = 1 - (1 - 0.0001)^10000 ≈ 0.63 [why/how? For small p, (1-p)^n ≈ e^(-np) = e^(-1) ≈ 0.37, so f ≈ 1 - 0.37 = 0.63]. This is one of the most important formulas to know (in general).
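A quick check of the slide's arithmetic (a minimal sketch; the variable names are mine):

```python
import math

# Node failure: f = 1 - (1 - p)**n, with the slide's numbers.
p, n = 0.0001, 10_000

f = 1 - (1 - p) ** n
print(f"exact:  f = {f:.4f}")                  # ~0.6321

# Why ~0.63: for small p, (1 - p)**n ≈ exp(-n*p); here n*p = 1,
# so f ≈ 1 - 1/e ≈ 0.632.
print(f"approx: f = {1 - math.exp(-n * p):.4f}")
```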

  3. From W2-S15 Example
  • For example, suppose the hash function maps {to, Java, road} to one node. Then:
  • (to,1) remains (to,1)
  • (Java,1); (Java,1); (Java,1) → (Java, [1,1,1])
  • (road,1); (road,1) → (road, [1,1])
  • Now the REDUCE function converts (Java, [1,1,1]) → (Java, 3), etc.
  • Remember, this is a very simple example... the challenge is to take complex tasks and express them as Map and Reduce! (A sketch follows below.)
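A minimal plain-Python sketch of the slide's word count (in a real framework the shuffle/grouping step is done for you; here it is simulated with a dict):

```python
from collections import defaultdict

def map_fn(document):
    # MAP: emit (word, 1) for every word in the document.
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # REDUCE: sum the list of 1s collected for each word.
    return (word, sum(counts))

# Simulate the shuffle phase: group values by key, e.g. (Java, [1,1,1]).
groups = defaultdict(list)
for key, value in map_fn("to Java Java Java road road"):
    groups[key].append(value)

print([reduce_fn(w, c) for w, c in groups.items()])
# [('to', 1), ('Java', 3), ('road', 2)]
```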

  4. From W2-S19 Similarity Example [2] Notice that it requires some ingenuity to come up with key-value pairs. This is key to using map-reduce effectively (one common pattern is sketched below).
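The slide does not show its key-value scheme, so this is a hypothetical sketch of one standard pattern for set similarity: the map step inverts (set, item) records into (item, set) pairs, and each pair of sets sharing an item becomes a reduce key. The data and names are illustrative, not from the slides.

```python
from collections import defaultdict
from itertools import combinations

# Toy input: each record is (set_id, set of items).
data = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 4, 5}}

# MAP: invert the records, emitting (item, set_id) pairs.
inverted = defaultdict(list)
for sid, items in data.items():
    for item in items:
        inverted[item].append(sid)

# REDUCE: every pair sharing an item gets a co-occurrence count;
# the pair (i, j) is the key, so all partial counts meet in one place.
overlap = defaultdict(int)
for item, sids in inverted.items():
    for i, j in combinations(sorted(sids), 2):
        overlap[(i, j)] += 1

# Turn overlaps into Jaccard similarity |i ∩ j| / |i ∪ j|.
for (i, j), inter in overlap.items():
    union = len(data[i]) + len(data[j]) - inter
    print(i, j, inter / union)
```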

  5. From W3-S14 K-means algorithm
  Let C = initial k cluster centroids (often selected randomly)
  Mark C as unstable
  While <C is unstable>
      Assign all data points to their nearest centroid in C.
      Compute the centroids of the points assigned to each element of C.
      Update C as the set of new centroids.
      Mark C as stable or unstable by comparing the new centroids with the previous set.
  End While
  Complexity: O(nkdI), where n = number of points, k = number of clusters, d = dimension, I = number of iterations.
  Take-away: complexity is linear in n.
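A minimal NumPy sketch of the loop above (random initialization; empty clusters are not handled):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means; cost is O(n*k*d) per iteration, as on the slide."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]        # initial centroids
    for _ in range(iters):                             # at most I iterations
        # Assign each point to its nearest centroid.
        labels = np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)
        # Recompute each centroid as the mean of its assigned points.
        newC = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(newC, C):                       # C is stable: stop
            break
        C = newC
    return C, labels
```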

  6. From W3-S16 Example: 2 Clusters. Four points: A(-1,2), B(1,2), C(-1,-2), D(1,-2). K-means Problem: the optimal solution is centroids (0,2) and (0,-2), with clusters {A,B} and {C,D}. K-means Algorithm: if the initial centroids are (-1,0) and (1,0), then {A,C} and {B,D} end up as the two clusters (a local optimum).
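Running the update loop on these four points reproduces the bad local optimum (a small sketch using the slide's numbers):

```python
import numpy as np

X = np.array([[-1, 2], [1, 2], [-1, -2], [1, -2]], dtype=float)  # A, B, C, D
C = np.array([[-1, 0], [1, 0]], dtype=float)   # the slide's initial centroids

for _ in range(10):
    labels = np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)
    C = np.array([X[labels == j].mean(axis=0) for j in range(2)])

print(labels)  # [0 1 0 1] -> clusters {A,C} and {B,D}
print(C)       # centroids stay at (-1,0) and (1,0), not the optimal (0,±2)
```

The centroids never move, because the assignment step immediately reproduces the same two column-wise clusters: k-means converges, but to a local optimum.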

  7. From W4-S21 Bayes Rule: P(H|D) = P(D|H) P(H) / P(D), where P(H) is the prior and P(H|D) is the posterior.

  8. From W4-S25 Example: Iris Flower. Choose the flower F with the maximum posterior probability given the measurements. F = Flower; SL = Sepal Length; SW = Sepal Width; PL = Petal Length; PW = Petal Width. [The slide's data table is not reproduced here.]
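The slides do not show code; this is a minimal scikit-learn sketch of the same idea (Gaussian naive Bayes is one standard choice for these continuous features, and the sample measurement is made up):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# The four features are SL, SW, PL, PW; the label F is the species.
X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)

# "Choose the maximum": predict the flower with the highest posterior
# P(F | SL, SW, PL, PW) for a new measurement.
sample = [[5.1, 3.5, 1.4, 0.2]]
print(model.predict(sample))        # predicted species index
print(model.predict_proba(sample))  # posterior over the three species
```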

  9. From W5-S7 Confusion Matrix
  Label 1 is called Positive; label -1 is called Negative. Let the number of test samples be N = N1 + N2 + N3 + N4, where N1 = true positives, N2 = false positives, N3 = false negatives, and N4 = true negatives.
  False Positive Rate (FPR) = N2/(N2+N4)
  True Positive Rate (TPR) = N1/(N1+N3)
  False Negative Rate (FNR) = N3/(N1+N3)
  True Negative Rate (TNR) = N4/(N4+N2)
  Accuracy = (N1+N4)/(N1+N2+N3+N4)
  Precision = N1/(N1+N2)
  Recall = N1/(N1+N3)
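The formulas translate directly to code (the counts here are made up for illustration):

```python
def rates(N1, N2, N3, N4):
    # N1 = true positives, N2 = false positives,
    # N3 = false negatives, N4 = true negatives.
    return {
        "FPR": N2 / (N2 + N4),
        "TPR": N1 / (N1 + N3),
        "FNR": N3 / (N1 + N3),
        "TNR": N4 / (N4 + N2),
        "Accuracy": (N1 + N4) / (N1 + N2 + N3 + N4),
        "Precision": N1 / (N1 + N2),
        "Recall": N1 / (N1 + N3),
    }

print(rates(N1=40, N2=10, N3=5, N4=45))
```

Note that TPR and Recall are the same quantity, which the two formulas above make explicit.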

  10. From W5-S9 ROC (Receiver Operating Characteristic) Curves
  • Generally a learning algorithm A will return a real number... but what we want is a label {1 or -1}.
  • We can apply a threshold T to the score. Each choice of T yields one (FPR, TPR) point; the slide's figure shows TPR = 3/4, FPR = 2/5 at one threshold and TPR = 2/4, FPR = 2/5 at another.
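A small sketch of the thresholding step; the scores and labels are hypothetical, chosen so that the two thresholds below reproduce the slide's (TPR, FPR) points:

```python
import numpy as np

# Hypothetical real-valued scores from a learner, true labels in {1, -1}.
scores = np.array([0.9, 0.85, 0.8, 0.75, 0.7, 0.5, 0.4, 0.3, 0.2])
labels = np.array([1, -1, 1, -1, 1, 1, -1, -1, -1])

def tpr_fpr(T):
    pred = np.where(scores >= T, 1, -1)            # apply threshold T
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == -1))
    return tp / np.sum(labels == 1), fp / np.sum(labels == -1)

print(tpr_fpr(0.65))  # (0.75, 0.4) -> TPR = 3/4, FPR = 2/5
print(tpr_fpr(0.72))  # (0.5,  0.4) -> TPR = 2/4, FPR = 2/5
# Sweeping T over all score values traces out the full ROC curve.
```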

  11. From W5-S13 Random Variable
  • A random variable X can take values in a set which is:
  • Discrete and finite. Toss a coin and set X = 1 if it's a head and X = 0 if it's a tail; X is a random variable.
  • Discrete and infinite (countable). Let X be the number of accidents in Sydney in a day. Then X = 0, 1, 2, ...
  • Infinite (uncountable). Let X be the height of a Sydney-sider: X = 150, 150.11, 150.112, ...

  12. From W7-S2 These slides are from Tan, Steinbach and Kumar

  13. From W7-S7

  14. From W7-S8

  15. From W9-S9

  16. From W9-S12

  17. From W9-S21

  18. From W9-S26

  19. From W11-S9 The Key Idea
  • Decompose the User x Movie rating matrix R into: R = (User x Genre) x (Genre x Movie)
  • The number of genres is typically small
  • Or: R ≈ UV. Find U and V such that ||R - UV|| is minimized...
  • Almost like k-means clustering... why?

  20. From W11-S15 UV Computation... This example is from Rajaraman, Leskovec and Ullman: see the textbook.
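A minimal NumPy sketch of minimizing ||R - UV|| by gradient descent (the rating matrix, learning rate, and k below are made up for illustration; the textbook's worked example uses its own numbers and an element-at-a-time update):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5x5 User x Movie rating matrix.
R = np.array([[5., 2., 4., 4., 3.],
              [3., 1., 2., 4., 1.],
              [2., 0., 3., 1., 4.],
              [2., 5., 4., 3., 5.],
              [4., 4., 5., 4., 0.]])

k = 2                                   # small number of latent "genres"
U = rng.random((5, k))                  # User x Genre
V = rng.random((k, 5))                  # Genre x Movie

lr = 0.01                               # learning rate
for _ in range(5000):                   # gradient descent on ||R - UV||_F^2
    E = R - U @ V                       # residual matrix
    # dL/dU ∝ -E V^T and dL/dV ∝ -U^T E (constants folded into lr).
    U, V = U + lr * E @ V.T, V + lr * U.T @ E

print(np.linalg.norm(R - U @ V))        # the minimized quantity ||R - UV||
print(np.round(U @ V, 1))               # rank-k approximation of R
```

Like k-means, this alternately refines a small set of representatives (here, the genre factors) to fit all the data points, and it also converges to a local minimum that depends on the initialization.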
