
FAUST - Improving Accuracy and Speed for Pixel Classification

Explore the FAUST algorithm to improve accuracy and speed for pixel classification, using oblique cuts and the midpoints of maximum class-mean gaps. This approach maximizes the gap between consecutive class means to create more accurate class partitions.


Presentation Transcript


  1. What should we do next? SVM seems interesting, but it is still one pixel at a time, as is k-Nearest-Neighbor. We should push FAUST as far as it can go!

FAUST{div,gap} is very fast. std, rankK, and/or multi-attribute cutting will improve accuracy, but there may be a need for greater accuracy yet. Note that in FAUST{div,gap} we cut perpendicular to the attribute line that contains the maximum consecutive class-mean gap, and we use the midpoint of that gap as a cut_point to separate the entire remaining space of pixels into 2 big boxes, one containing one partition of the remaining classes and the other containing the balance of the classes. (We never cut oblique to the attribute directions.) Then we do it again on one of those sub-partitions, and so on, until we reach a single class. Can that be improved? Speed: probably not. Can accuracy be improved without sacrificing (much) speed? Here is a way it can be done at about the same speed.

As motivation, think about a blue-red cars class (defined, e.g., as 2 parts red, 1 part blue). We want to make the cut (at the midpoint of the maximal gap) maximizing over all oblique directions, not just along the dimensions, since the dimensions form a measure-zero set of all possible directions. E.g., a "blue-red" cut would define a line at a 30-degree angle from the red axis toward the blue points, in the blue-red direction.

If D is any unit vector, X dot D = X1*D1 + … + Xn*Dn, and X dot D > cut_point defines an oblique big box. We ought to consider all D-lines (noting that the dimension lines ARE D-lines). For this we will need an EIN formula for the mask pTree P(X dot D > a), where X is any vector and D is an oblique vector. (NOTE: if D = ei = (0,…,1,…,0), then this is just the existing EIN formula for the ith dimension; the general case is harder because X is now a vector, not just a number.) The pTree formula for the dot product is in the pTree book, pgs. 133-134.

We would like a recursive, exhaustive search for the vector D that gives the maximal gap among the consecutive training-class means for the classes that remain (not just over all attribute directions, but over all combination directions). How can we find it? First, a few examples; a plain-array sketch of the oblique mask itself appears below.
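To make the oblique cut concrete, here is a minimal NumPy sketch of the X dot D > cut_point mask. It uses ordinary arrays and a hypothetical oblique_mask helper, not the EIN pTree formulation the text calls for (which would compute the same mask from vertical bit slices):

import numpy as np

def oblique_mask(X, D, cut_point):
    """Boolean mask for the oblique big box {x : x dot D > cut_point}.
    X is an (m, n) array of points; D is an (n,) direction vector.
    With D = e_i this reduces to an ordinary cut on attribute i."""
    D = D / np.linalg.norm(D)   # make D a unit vector
    shadows = X @ D             # projection length of each point on the D-line
    return shadows > cut_point

# Example: a 30-degree "blue-red" direction in (red, blue) space.
theta = np.radians(30)
D = np.array([np.cos(theta), np.sin(theta)])
X = np.array([[200.0, 40.0], [180.0, 60.0], [50.0, 210.0], [70.0, 190.0]])
print(oblique_mask(X, D, cut_point=150.0))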

  2. Suppose there are just 2 attributes (red and blue) and we (r,b)-scatter-plot the 10 reddish-blue (rb) class training points and the 10 bluish-red (br) class training points:

[Figure: (r,b) scatter plot of the rb and br training points, showing the D-line, the D-line mean of each class, the consecutive class-mean midpoint (= cut_point), and the Cut-HyperPlane, CHP (what we are after), erected perpendicular to D at the cut_point.]

Clearly we would want to find a ~45-degree unit vector, D, then calculate the means of the projections of the two training sets onto the D-line, and then use the midpoint of the gap between those two means as the cut_point, erecting a perpendicular-bisector "hyperplane" to D there, which separates the space into the two class big boxes, one on each side of the hyperplane. (Can it be masked using one EIN formula?)

This "diagonal" cutting produces a perfect classification (of the training points). If we had considered only cut_points along the coordinate axes, it would have been very imperfect!
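A minimal sketch of that recipe, assuming two small labeled training arrays and plain NumPy in place of the pTree arithmetic (midpoint_cut is our name, and the sample points are made up):

import numpy as np

def midpoint_cut(class_a, class_b, D):
    """Project both training classes onto the D-line and return the
    midpoint of the gap between the two projected class means."""
    D = D / np.linalg.norm(D)
    mean_a = (class_a @ D).mean()   # D-line mean of class a
    mean_b = (class_b @ D).mean()   # D-line mean of class b
    return (mean_a + mean_b) / 2    # consecutive class-mean midpoint

rb = np.array([[60.0, 200.0], [70.0, 190.0], [80.0, 210.0]])  # reddish-blue
br = np.array([[200.0, 60.0], [190.0, 70.0], [210.0, 80.0]])  # bluish-red
D = np.array([-1.0, 1.0]) / np.sqrt(2)  # unit vector pointing from br toward rb
cut = midpoint_cut(rb, br, D)
print((rb @ D) > cut, (br @ D) > cut)   # rb all True, br all False: a perfect split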

  3. [Figure: the rb/br scatter plot again, with the gap, the direction D, and the Cut-HyperPlane (CHP) marked.]

How do we search through all possible angles for the D that will maximize that gap? We would have to develop the (pTree-only) formula for the class means for any D and then maximize the gap (the distance between consecutive D-projected means). Take a look at the formulas in the book, think about it, take a look at Mohammad's formulas, and see if you can come up with the mega-formula above.

Let D = (D1, …, Dn) be a unit vector (our "cut_line" direction vector). D dot X = D1*X1 + … + Dn*Xn is the length of the perpendicular projection of X on D (the length of the high-noon shadow that X makes on the D-line, as if D were the earth). So we project every training point X_c,i (class = c, i = 1..10) onto D (i.e., compute D dot X_c,i) and calculate the D-line class means, (1/n_c) * SUM_i (D dot X_c,i). Select the maximum consecutive mean gap along D (call it best_gap(D) = bg(D)). Maximizing bg(D) exactly over all possible D is harder, so calculate it for a [polar] grid of D's, maximize over that grid, and then use continuity and hill climbing to improve it, etc. (A grid-search sketch follows this slide.)

More likely, the situation would be that the rb's are more blue than red and the br's are more red than blue:

[Figure: the same (r,b) scatter plot with the two clusters shifted away from the origin, along with the D-line means of the rb and br classes and the cut_point.]

What if the training points are shifted away from the origin? The figure should convince you that the method still works.
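A sketch of that polar-grid search in the 2-attribute case, again in NumPy rather than pTrees; best_gap and search_direction are our own names, and a hill climber could refine the winning angle as suggested above:

import numpy as np

def best_gap(D, classes):
    """bg(D): the largest gap between consecutive D-projected class means."""
    D = D / np.linalg.norm(D)
    means = np.sort([(c @ D).mean() for c in classes])
    return np.diff(means).max()

def search_direction(classes, steps=180):
    """Evaluate bg(D) on a polar grid of directions and keep the best.
    D and -D give the same hyperplane, so half a turn suffices."""
    best_D, best_bg = None, -np.inf
    for theta in np.linspace(0.0, np.pi, steps, endpoint=False):
        D = np.array([np.cos(theta), np.sin(theta)])
        bg = best_gap(D, classes)
        if bg > best_bg:
            best_D, best_bg = D, bg
    return best_D, best_bg

# With the rb/br arrays from the previous sketch, search_direction([rb, br])
# recovers a direction near (-1, 1)/sqrt(2), the blue-minus-red diagonal.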

  4. In higher dimensions, nothing changes. If there are "convex" clustered classes, FAUST{div,oblique_gap} can find them (consider a greenish-reddish-blue class and a bluish-greenish-red class):

[Figure: (r,g,b) scatter of the grb and bgr training points with their oblique separating direction, D.]

Before considering the pTree formulas for the above, we note again that any pair of classes (or multi-classes, as in the divisive case) that are convex can be separated by this method. What if they are not convex? [Figure: a 2-D example of two non-convex classes.]

A couple of comments. FAUST resembles the SVM (Support Vector Machine) method in that it constructs a separating hyperplane in the "margin" between classes. The beauty of SVM (over FAUST and all other methods) is that it is provable that there is a transformation to higher dimensions that renders two non-hyperplane-separable classes hyperplane-separable, and you don't actually have to perform the transformation; you just determine the kernel that produces it. The problem with SVM is that it is computationally intensive. I think we want to keep FAUST simple (and fast!). If we can do this generalization, I think it will be a real winner!

How do we search over all possible oblique vectors, D, for the one that is "best"? Or, if we are to use multi-box neighborhoods, how do we do that? A heuristic method follows:

  5. The heuristic: for each pair of classes, take the vector connecting the class means as D. For classes r and v, with means mr and mv, the mask pTree P((mr-mv) dot X > (mr-mv) dot (mr+mv)/2) masks the vectors whose shadow on the D-line falls on the mr side of the midpoint of the means. Likewise, for classes r and b, the mask is P((mr-mb) dot X > (mr-mb) dot (mr+mb)/2). ANDing the two pTrees masks the region belonging to r:

[Figure: 2-D scatter of the r, v, and b training points with class means mr, mv, and mb, the mean-connecting D-lines, and the midpoint cuts whose intersection isolates the r region.]

The only question left (assuming we can develop these oblique EIN pTree formulas) is which cut_points are best: the mean midpoints, or something else? Other possibilities include the points furthest from the means in each class (in terms of their projections on the D-line), the furthest non-outlier points, the best rankK points, the best std points, etc.
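A plain-array sketch of the mean-connecting cut for one pair of classes, with a boolean array standing in for the pTree mask P((mr-mv) dot X > (mr-mv) dot (mr+mv)/2); the helper name and sample points are ours:

import numpy as np

def mr_side_mask(class_r, class_v, X):
    """Mask the rows of X whose shadow on the mean-connecting D-line
    falls on the mr side of the midpoint of the two class means."""
    mr = class_r.mean(axis=0)
    mv = class_v.mean(axis=0)
    D = mr - mv                 # vector from mv toward mr
    cut = D @ (mr + mv) / 2     # projection of the means' midpoint
    return (X @ D) > cut

r = np.array([[9.0, 1.0], [8.0, 2.0]])
v = np.array([[1.0, 9.0], [2.0, 8.0]])
X = np.vstack([r, v])
print(mr_side_mask(r, v, X))    # r rows True, v rows False
# The region for class r is the AND of its pairwise masks, e.g.
#   region_r = mr_side_mask(r, v, X) & mr_side_mask(r, b, X)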

  6. [Figure: cut_point pairs (aG,bG), (aR,bR), and (aB,bB) marked along the G, R, and B coordinate axes, bounding a coordinate box.]

APPENDIX. FAUST is a Near Neighbor Classifier. It is not a voting NNC like pCkNN, where, for each unclassified sample, pCkNN builds around that sample a neighborhood of TrainingSet voters, who then classify the sample through majority, plurality, or weighted (in PINE) vote; pCkNN classifies one unclassified sample at a time. FAUST is meant for speed, and therefore FAUST attempts to classify all unclassified samples at one time.

FAUST builds a Big Box Neighborhood (BBN) for each class and then classifies all unclassified samples in that BBN into the class (constructing each class-BBN with one EIN pTree calculation per class). The BBNs can overlap, so the classification needs to be done one class at a time, sequentially, in maximum-gap, maximum-number-of-stds-in-gap, or minimum-rankK-in-gap order. The whole process can be iterated, as in k-means classification, using the predicted classes (or subsets of them) as the new training set; this can be continued until convergence.

A BBN can be a coordinate box: for coordinate R, cb(R, class, aR, bR) is all x such that aR < xR < bR (either or both of the <'s can be ≤); aR and bR are what were called the cut_points of the class. Or BBNs can be multi-coordinate boxes, which are INTERSECTIONs of the best k (k ≤ n-1, assuming n classes) cb's for a given class ("best" can be with respect to any of the above maximizations). And instead of using a fixed number of coordinates, k, we could use only those coordinates in which the "quality" of the cb is higher than a threshold, where "quality" might be measured from the dimensions of the gaps (or in other ways). (A coordinate-box sketch appears below.)

FAUST could be combined with pCkNN (probably in many ways) as follows: FAUST multi-coordinate BBNs could be used first to classify the "easy points" (those that fall in an intersection of high-quality BBNs and are therefore fairly certain to be correctly classified). Then the remaining "difficult points" could be classified using the original training set (or the union of each original TrainingSet class with the new "easy points" of that same class), using the L-infinity or Lp distance, p = 1 or 2.
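A minimal sketch of the coordinate-box BBN classification, with boolean arrays standing in for the EIN pTree masks; coordinate_box_mask, the bounds layout, and the sample numbers are all our own illustration:

import numpy as np

def coordinate_box_mask(X, bounds):
    """Mask for a multi-coordinate BBN: the INTERSECTION of the
    per-coordinate boxes cb(j, a_j, b_j) = {x : a_j < x_j < b_j}.
    bounds maps a coordinate index j to its cut_point pair (a_j, b_j)."""
    mask = np.ones(len(X), dtype=bool)
    for j, (a, b) in bounds.items():
        mask &= (X[:, j] > a) & (X[:, j] < b)
    return mask

# Classify one class at a time (e.g., in maximum-gap order); points caught
# by an earlier class's BBN are removed before the next class is tried.
X = np.array([[120.0, 30.0], [40.0, 200.0], [100.0, 90.0]])
red_bbn = {0: (80, 255), 1: (0, 60)}     # (aR,bR) on coord 0, (aB,bB) on coord 1
print(coordinate_box_mask(X, red_bbn))   # [ True False False ]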
