This study explores similarity metrics used in Case-Based Reasoning (CBR), focusing on attribute-value pairs in binary formats. We define the similarity between two cases, X and Y, using metrics like Hamming Distance and Simple Matching Coefficient (SMC). The text elaborates on the mathematical foundations, properties, and variations of these metrics, taking into account potential weights for attributes, enabling an in-depth understanding of scenarios where attributes may have varying importance. This knowledge is essential for enhancing decision-making processes in fields such as health and manufacturing.
Similarity in CBR (Cont’d) Sources: Chapter 4 www.iiia.csic.es/People/enric/AICom.html www.ai-cbr.org
Other Similarity Metrics
• Suppose that we have cases represented as attribute-value pairs (e.g., the restaurant domain)
• Suppose initially that the values are binary
• We want to define similarity between two cases of the form:
• $X = (X_1, \dots, X_n)$ where $X_i = 0$ or $1$
• $Y = (Y_1, \dots, Y_n)$ where $Y_i = 0$ or $1$
Preliminaries
• Let:
• $A = \sum_{i=1}^{n} X_i \cdot Y_i$ (number of attributes for which $X_i = 1$ and $Y_i = 1$)
• $B = \sum_{i=1}^{n} X_i \cdot (1 - Y_i)$ (number of attributes for which $X_i = 1$ and $Y_i = 0$)
• $C = \sum_{i=1}^{n} (1 - X_i) \cdot Y_i$ (number of attributes for which $X_i = 0$ and $Y_i = 1$)
• $D = \sum_{i=1}^{n} (1 - X_i) \cdot (1 - Y_i)$ (number of attributes for which $X_i = 0$ and $Y_i = 0$)
• Then, $A + B + C + D = n$
• $A + D$ = number of "matching attributes"
• $B + C$ = number of "mismatching attributes"
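The four counts can be computed directly from their definitions. The following is a minimal Python sketch; the function name `match_counts` and the sample vectors are illustrative assumptions, not part of the slides.

```python
def match_counts(x, y):
    """Return (A, B, C, D) for two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = list(zip(x, y))
    a = sum(xi * yi for xi, yi in pairs)              # X_i = 1 and Y_i = 1
    b = sum(xi * (1 - yi) for xi, yi in pairs)        # X_i = 1 and Y_i = 0
    c = sum((1 - xi) * yi for xi, yi in pairs)        # X_i = 0 and Y_i = 1
    d = sum((1 - xi) * (1 - yi) for xi, yi in pairs)  # X_i = 0 and Y_i = 0
    return a, b, c, d


# Hypothetical example vectors with n = 5 attributes.
x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
a, b, c, d = match_counts(x, y)
assert a + b + c + d == len(x)  # the four counts partition the attributes
```

Here the matching attributes number A + D = 3 and the mismatching ones B + C = 2.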
Hamming Distance
• $H(X,Y) = n - \sum_{i=1}^{n} X_i \cdot Y_i - \sum_{i=1}^{n} (1 - X_i) \cdot (1 - Y_i)$
• Properties:
• Range of $H$: $[0, n]$
• $H$ counts the mismatches between the attribute values
• $H$ is a distance metric:
• $H(X,X) = 0$
• $H(X,Y) = H(Y,X)$
• $H((1-X_1, \dots, 1-X_n), (1-Y_1, \dots, 1-Y_n)) = H((X_1, \dots, X_n), (Y_1, \dots, Y_n))$
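The definition and the listed properties can be checked on a small example. This is a sketch under the assumption that cases are plain Python lists of 0/1 values; `hamming` and `complement` are names chosen here for illustration.

```python
def hamming(x, y):
    """Hamming distance: n minus the 1-1 matches minus the 0-0 matches."""
    n = len(x)
    both_one = sum(xi * yi for xi, yi in zip(x, y))
    both_zero = sum((1 - xi) * (1 - yi) for xi, yi in zip(x, y))
    return n - both_one - both_zero


def complement(v):
    """Flip every bit of a 0/1 vector."""
    return [1 - vi for vi in v]


# Hypothetical example vectors.
x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]

assert 0 <= hamming(x, y) <= len(x)                            # range [0, n]
assert hamming(x, x) == 0                                      # H(X, X) = 0
assert hamming(x, y) == hamming(y, x)                          # symmetry
assert hamming(complement(x), complement(y)) == hamming(x, y)  # bit-flip invariance
```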
Simple-Matching-Coefficient (SMC)
• $H(X,Y) = n - (A + D) = B + C$ (the number of mismatches)
• Another distance-similarity compatible function is $f(x) = 1 - x/\max$ (where $\max$ is the maximum value for $x$)
• We can define the SMC similarity, $sim_H$: $sim_H(X,Y) = 1 - \frac{n - (A+D)}{n} = \frac{A+D}{n} = 1 - \frac{B+C}{n}$, where $(B+C)/n$ is the proportion of the difference
• Homework (Legacy): Show that $f(x)$ is order inverting: if $x < y$ then $f(x) > f(y)$
Simple-Matching-Coefficient (SMC) (II)
• Write $sim_H(X,Y) = (A+D)/n = 1 - (B+C)/n = factor(A, B, C, D)$
• Monotonic:
• If $A \le A'$ then: $factor(A,B,C,D) \le factor(A',B,C,D)$
• If $B \le B'$ then: $factor(A,B',C,D) \le factor(A,B,C,D)$
• If $C \le C'$ then: $factor(A,B,C',D) \le factor(A,B,C,D)$
• If $D \le D'$ then: $factor(A,B,C,D) \le factor(A,B,C,D')$
• Symmetric:
• $sim_H(X,Y) = sim_H(Y,X)$
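A small numeric check of the SMC and its monotonicity may help. The sketch below assumes the similarity is written as a function of the four counts, (A+D)/(A+B+C+D); the names `smc` and `factor` and the sample values are illustrative only.

```python
def smc(x, y):
    """Simple matching coefficient: fraction of attributes with equal values."""
    matches = sum(1 for xi, yi in zip(x, y) if xi == yi)
    return matches / len(x)


def factor(a, b, c, d):
    """The same similarity expressed through the counts A, B, C, D."""
    return (a + d) / (a + b + c + d)


# Hypothetical example vectors: A = 2, B = 1, C = 1, D = 1.
x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]

assert abs(smc(x, y) - factor(2, 1, 1, 1)) < 1e-12
assert smc(x, y) == smc(y, x)                    # symmetric
assert factor(2, 1, 1, 1) <= factor(3, 1, 1, 1)  # more 1-1 matches: not smaller
assert factor(2, 2, 1, 1) <= factor(2, 1, 1, 1)  # more mismatches: not larger
```

These spot checks illustrate the conditions on one example; they are not a proof.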
Variations of the SMC
• The Hamming similarity assigns equal value to both kinds of match (both 0 or both 1)
• There are situations in which a match on 1 should count differently from a match on 0
• Thus, $sim((1-X_1, \dots, 1-X_n), (1-Y_1, \dots, 1-Y_n)) = sim((X_1, \dots, X_n), (Y_1, \dots, Y_n))$ may not hold
• Example: two patients are similar with respect to a symptom if they both have fever ($X_i = 1$ and $Y_i = 1$), but not similar if neither has fever ($X_i = 0$ and $Y_i = 0$)
• Specific attributes may be more important than other attributes. Example: manufacturing domain: some parts of the workpiece are more important than others
Variations of SMC (III)
• $sim_H(X,Y) = (A+D)/n = (A+D)/(A+B+C+D)$
• We introduce a weight, $\alpha$, with $0 < \alpha < 1$: $sim_\alpha(X,Y) = \frac{\alpha (A+D)}{\alpha (A+D) + (1 - \alpha)(B+C)}$
• For which $\alpha$ is $sim_\alpha(X,Y) = sim_H(X,Y)$? $\alpha = 0.5$
• $sim_\alpha(X,Y)$ preserves the monotonic and symmetric conditions
• Homework (Legacy): Show that $sim_\alpha(X,Y)$ is monotonic
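The similarity depends only on A, B, C and D (3)
• What is the role of the weight $\alpha$? What happens if $\alpha > 0.5$? If $\alpha < 0.5$?
• $sim_\alpha(X,Y) = \frac{\alpha (A+D)}{\alpha (A+D) + (1 - \alpha)(B+C)}$
• [Figure: three curves, for $\alpha > 0.5$, $\alpha = 0.5$ and $\alpha < 0.5$; one axis runs from 0 to $n$, the other from 0 to 1]
• If $\alpha > 0.5$ we give more weight to the matching attributes than to the mismatching ones
• If $\alpha < 0.5$ we give more weight to the mismatching attributes than to the matching ones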
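The effect of the weight can be seen by evaluating the weighted similarity for a few values. This is a minimal sketch that calls the weight `alpha` and takes the counts A, B, C, D as inputs; the function name and the sample counts are assumptions made for illustration.

```python
def sim_alpha(a, b, c, d, alpha):
    """Weighted matching similarity; alpha must lie strictly between 0 and 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    matching = alpha * (a + d)
    mismatching = (1 - alpha) * (b + c)
    # For n > 0 and 0 < alpha < 1 the denominator is strictly positive.
    return matching / (matching + mismatching)


# Hypothetical counts: 3 matching attributes (A + D), 2 mismatching (B + C).
a, b, c, d = 2, 1, 1, 1
plain_smc = (a + d) / (a + b + c + d)

assert abs(sim_alpha(a, b, c, d, 0.5) - plain_smc) < 1e-12  # alpha = 0.5 gives the SMC
assert sim_alpha(a, b, c, d, 0.8) > plain_smc               # matches weigh more
assert sim_alpha(a, b, c, d, 0.2) < plain_smc               # mismatches weigh more
```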
Discarding 0-match
• Thus, $sim((1-X_1, \dots, 1-X_n), (1-Y_1, \dots, 1-Y_n)) = sim((X_1, \dots, X_n), (Y_1, \dots, Y_n))$ may not hold
• Only attributes that occur in both cases (i.e., $X_i = 1$ and $Y_i = 1$) contribute to the similarity
• Possible definition of the similarity: $sim = A / (A + B + C)$
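The ratio A / (A + B + C) is the form usually called the Jaccard coefficient. The sketch below shows that flipping all bits changes its value; returning 0.0 when no attribute is 1 in either case is an assumption made here, since the slide leaves that case undefined.

```python
def sim_no_zero_match(x, y):
    """A / (A + B + C): 0-0 matches (D) are ignored."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    if a + b + c == 0:
        return 0.0  # assumption: no attribute occurs in either case
    return a / (a + b + c)


# Hypothetical example vectors and their bitwise complements.
x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
x_flipped = [1 - xi for xi in x]
y_flipped = [1 - yi for yi in y]

# Unlike the Hamming similarity, the value is not invariant under flipping.
assert sim_no_zero_match(x, y) != sim_no_zero_match(x_flipped, y_flipped)
```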
Specific Attributes may be More Important Than Other Attributes
• Significance of the attributes varies
• Weighted Hamming distance:
• There is a weight vector $(\omega_1, \dots, \omega_n)$ such that $\sum_{i=1}^{n} \omega_i = 1$
• $H_W(X,Y) = 1 - \sum_{i=1}^{n} \omega_i \cdot X_i \cdot Y_i - \sum_{i=1}^{n} \omega_i \cdot (1 - X_i) \cdot (1 - Y_i)$
• Example: "Process planning: some features are more important than others"
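With weights that sum to 1, the weighted Hamming distance equals the total weight of the mismatching attributes. A minimal sketch follows; the function name, the tolerance used to check the weights, and the sample values are assumptions.

```python
def weighted_hamming(x, y, weights):
    """Weighted Hamming distance for 0/1 vectors; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    both_one = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    both_zero = sum(w * (1 - xi) * (1 - yi) for w, xi, yi in zip(weights, x, y))
    return 1 - both_one - both_zero


# Hypothetical example: the first attribute is the most important one.
x = [1, 1, 0, 0]
y = [1, 0, 1, 0]
weights = [0.4, 0.3, 0.2, 0.1]

# Attributes 2 and 3 mismatch, so the distance is 0.3 + 0.2.
assert abs(weighted_hamming(x, y, weights) - 0.5) < 1e-9

# With uniform weights 1/n the result is the plain Hamming distance divided by n.
assert abs(weighted_hamming(x, y, [0.25] * 4) - 2 / 4) < 1e-9
```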
Homework (Legacy, and you have to do it for the project anyway): Attributes May Have Multiple Values
• $X = (X_1, \dots, X_n)$ where $X_i \in T_i$
• $Y = (Y_1, \dots, Y_n)$ where $Y_i \in T_i$
• Each $T_i$ is finite
• Define a formula for the Hamming distance in this context
Non Monotonic Similarity
• The monotony condition in similarity, formally, says that $sim(A,B) \le sim(A',B)$ always holds if $A$ counts the number of matches and $A \le A'$
• Informally, the monotony condition can be expressed as: for any attribute-value vectors $X$, $Y$, $X'$, if we obtain $X'$ by modifying $X$ on the value of one attribute such that $X'$ and $Y$ have the same value on that attribute, then $sim(X,Y) \le sim(X',Y)$
Non Monotonic Similarity (2)
• Is the Hamming similarity monotonic? Yes: $sim_H(X,Y) = \sum_{i=1}^{n} eq(X_i,Y_i) / n$, where $eq(X_i,Y_i)$ is 1 if the two values are equal and 0 otherwise
• Consider the XOR function:
• (0,0) and (1,1) are in the same class (+)
• (0,1) and (1,0) are in the same class (-)
• Thus $d((1,1),(1,0)) > d((1,1),(0,0))$
• Is this monotonic? No
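The XOR case can be made concrete by comparing the Hamming similarity with a distance that only looks at class membership. The 0/1 class distance below is an assumption chosen for illustration; the slide does not say how d is defined.

```python
def sim_h(p, q):
    """Hamming similarity: fraction of positions with equal values."""
    return sum(1 for pi, qi in zip(p, q) if pi == qi) / len(p)


def xor_class(p):
    """Class label of a 2-bit point under XOR."""
    return p[0] ^ p[1]


def class_distance(p, q):
    """Assumed distance: 0 if both points share an XOR class, 1 otherwise."""
    return 0 if xor_class(p) == xor_class(q) else 1


query = (1, 1)

# By attribute matches, (1,0) is closer to (1,1) than (0,0) is ...
assert sim_h(query, (1, 0)) > sim_h(query, (0, 0))

# ... but by class, (0,0) is the closer one, so adding a match increased the distance.
assert class_distance(query, (1, 0)) > class_distance(query, (0, 0))
```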
Non Monotonic Similarity (3)
• You may think: "well that was mathematics, how about real world?"
• Suppose that we have two interconnected batteries B and B' and 3 lamps X, Y and Z that have the following properties:
• If X is on, B and B' work
• If Y is on, B or B' work
• If Z is on, B works
Situation   X   Y   Z   B      B'
1           0   1   1   Ok     Fail
2           0   1   0   Fail   Ok
3           0   0   0   Fail   Fail
• Thus: sim(1,3) > sim(1,2)
• Non monotonic!
Tversky Contrast Model
• Defines a non monotonic distance
• Comparison of a situation S with a prototype P (i.e., a case)
• S and P are sets of features
• The following sets are defined:
• $A = S \cap P$
• $B = P - S$
• $C = S - P$
• [Figure: diagram of the sets P and S with the regions A, B and C]
Tversky Contrast Model (2)
• Tversky-distance: $T(P,S) = \alpha f(A) - \beta f(B) - \gamma f(C)$
• Where $f$ maps a set of features to $[0, \infty)$
• $f$, $\alpha$, $\beta$, and $\gamma$ are fixed and defined by the user
• Example:
• If $f(A)$ = # elements in $A$
• $\alpha = \beta = \gamma = 1$
• $T$ counts the number of elements in common minus the differences
• The Tversky-distance is not symmetric
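The contrast model translates directly to set operations. The sketch below uses Python sets and takes f to be the set size by default; the three weights are named `alpha`, `beta` and `gamma`, and the feature names are hypothetical.

```python
def tversky(p, s, alpha=1.0, beta=1.0, gamma=1.0, f=len):
    """Tversky contrast value for prototype p and situation s (sets of features)."""
    common = s & p   # A: features in both
    only_p = p - s   # B: features of the prototype missing in the situation
    only_s = s - p   # C: features of the situation missing in the prototype
    return alpha * f(common) - beta * f(only_p) - gamma * f(only_s)


# Hypothetical feature sets.
prototype = {"fever", "cough", "rash"}
situation = {"fever", "cough", "headache", "nausea"}

# With all weights 1: 2 common features minus 1 minus 2 differing features.
assert tversky(prototype, situation) == -1

# With different weights for the two difference sets, swapping the arguments changes the value.
assert tversky(prototype, situation, beta=2.0) != tversky(situation, prototype, beta=2.0)
```

When the weights on the two difference sets are equal, swapping the arguments only swaps those two sets and the value stays the same, so the asymmetry shows up once the two weights differ.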
Local versus Global Similarity Metrics
• In many situations we have similarity metrics between attributes of the same type (called local similarity metrics). Example: for a complex engine, we may have a similarity for the temperature of the engine
• In such situations a reasonable approach to defining a global similarity $sim(x,y)$ is to "aggregate" the local similarity metrics $sim_i(x_i,y_i)$. This is a widely used practice
• What requirements should we place on $sim(x,y)$ in terms of the use of $sim_i(x_i,y_i)$? $sim(x,y)$ should increase monotonically with each $sim_i(x_i,y_i)$
Local versus Global Similarity Metrics (Formal Definitions)
• A local similarity metric on an attribute $T_i$ is a similarity metric $sim_i: T_i \times T_i \to [0,1]$
• A function $\Phi: [0,1]^n \to [0,1]$ is an aggregation function if:
• $\Phi(0,0,\dots,0) = 0$
• $\Phi$ is monotonic non-decreasing in every argument
• Given a collection of $n$ similarity metrics $sim_1, \dots, sim_n$, for attributes taking values from $T_i$, a global similarity metric is a similarity metric $sim: V \times V \to [0,1]$, $V \subseteq T_1 \times \dots \times T_n$, such that there is an aggregation function $\Phi$ with:
• $sim(X,Y) = \Phi(sim_1(X_1,Y_1), \dots, sim_n(X_n,Y_n))$
• Example: $\Phi(X_1, X_2, \dots, X_n) = (X_1 + X_2 + \dots + X_n)/n$
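The definition can be sketched as a function that applies one local metric per attribute and passes the results to an aggregation function. The arithmetic mean is the example given on the slide; the helper names and the equality-based local metric are assumptions for illustration.

```python
def mean_aggregation(values):
    """Arithmetic mean of local similarities, each expected to lie in [0, 1]."""
    return sum(values) / len(values)


def global_similarity(x, y, local_sims, aggregate=mean_aggregation):
    """Aggregate per-attribute similarities sim_i(x_i, y_i) into one value."""
    local_values = [sim_i(xi, yi) for sim_i, xi, yi in zip(local_sims, x, y)]
    return aggregate(local_values)


def equal_sim(u, v):
    """Hypothetical local metric: 1.0 if the values are equal, else 0.0."""
    return 1.0 if u == v else 0.0


# The two conditions for an aggregation function, checked on sample inputs.
assert mean_aggregation([0.0, 0.0, 0.0]) == 0.0
assert mean_aggregation([0.2, 0.5, 0.9]) <= mean_aggregation([0.4, 0.5, 0.9])

# Two of three attributes agree.
value = global_similarity(("red", 4, "yes"), ("red", 4, "no"), [equal_sim] * 3)
assert abs(value - 2 / 3) < 1e-12
```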
Example
• Cases may contain attributes of type:
• Real number A: the voltage output of a device
• define a local similarity metric, $sim_{voltage}()$
• Integer B: revolutions per second
• define a local similarity metric, $sim_{rps}()$
• A bunch of symbolic attributes $m = (C_1, \dots, C_m)$: front light blinking or none, year of manufacture, etc.
• define a Hamming similarity, $sim_H()$, combining all these attributes
• Define an aggregated similarity metric $sim()$: $sim(C,C') = \alpha_1 \cdot sim_{voltage}(A,A') + \alpha_2 \cdot sim_{rps}(B,B') + \alpha_3 \cdot sim_H(m, m')$
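One way this example could look in code is sketched below. The slide does not define the local metrics, so the linear forms, the `max_diff` ranges, the dictionary keys and the three weight values are all assumptions chosen for illustration; the weights are picked to sum to 1 so that the result stays in [0, 1].

```python
def sim_voltage(a, a2, max_diff=12.0):
    """Assumed local metric: linear drop-off over a hypothetical 12 V range."""
    return max(0.0, 1.0 - abs(a - a2) / max_diff)


def sim_rps(b, b2, max_diff=100):
    """Assumed local metric: linear drop-off over a hypothetical 100 rps range."""
    return max(0.0, 1.0 - abs(b - b2) / max_diff)


def sim_h(m, m2):
    """Hamming similarity over the symbolic attributes."""
    return sum(1 for u, v in zip(m, m2) if u == v) / len(m)


def sim(case, other, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three local similarities (weights are assumptions)."""
    w1, w2, w3 = weights
    return (w1 * sim_voltage(case["voltage"], other["voltage"])
            + w2 * sim_rps(case["rps"], other["rps"])
            + w3 * sim_h(case["symbols"], other["symbols"]))


# Hypothetical cases.
c1 = {"voltage": 12.0, "rps": 50, "symbols": ("blinking", 2004)}
c2 = {"voltage": 9.0, "rps": 60, "symbols": ("none", 2004)}

# 0.5 * 0.75 + 0.3 * 0.9 + 0.2 * 0.5
assert abs(sim(c1, c2) - 0.745) < 1e-9
assert abs(sim(c1, c1) - 1.0) < 1e-9
```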