
Chapter 9: Matching and Ranking Cases




  1. Chapter 9: Matching and Ranking Cases
  • Matching is the process of comparing two cases to each other and determining their degree of similarity
  • Ranking is the process of ordering partially matching cases according to the goodness of match, or usefulness
  • To compute the degree of match between cases, you need to:
    • Determine which features of two cases correspond to each other
    • Compute the degree of match between each pair of corresponding features
    • Determine how important each feature is in assigning an overall degree of match
  CS 682, AI:Case-Based Reasoning, Prof. Cindy Marling
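
The three steps above can be sketched as a small scoring function. This is a minimal illustration, not the chapter's own code; the convention that corresponding features share a name, and the particular similarity functions and weights, are assumptions.

```python
# Sketch of the three matching steps: find corresponding features,
# score each pair, and weight each score by importance.
def match_score(new_case, old_case, sim_funcs, weights):
    """Weighted aggregate similarity between two cases, in [0, 1].

    sim_funcs maps each feature name to a function returning a
    similarity in [0, 1]; weights maps each feature to its importance.
    """
    total, weight_sum = 0.0, 0.0
    for feature, weight in weights.items():
        # Step 1: here, corresponding features simply share a name
        if feature in new_case and feature in old_case:
            # Step 2: degree of match for this pair of feature values
            sim = sim_funcs[feature](new_case[feature], old_case[feature])
            # Step 3: weight the feature's score by its importance
            total += weight * sim
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```

With equal weights and exact-match similarity on two features, a pair of cases agreeing on one feature scores 0.5.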

  2. Types of Matching Schemes
  • Dimensional matching is the ability to compare two individual features
  • Aggregate matching is the ability to compare two whole cases
  • Aggregate matching involves dimensional matching
  • Dimensional matching can be used alone, as in traversing a hierarchical memory structure and comparing the dimensions stored at each node
  • In static matching schemes, the matching criteria are established in advance and hard coded
  • In dynamic matching schemes, the criteria may change according to the present purpose
  • Sometimes, you can hard code different schemes and choose among them dynamically
  • Sometimes, important features are determined on the fly, during situation assessment
  • Some flexibility can be achieved by determining the important features in advance, but weighting them differently each time

  3. Types of Matching Schemes (continued)
  • In absolute matching, you compute a score for how well each case matches the new one, independently of all the other cases
  • In relative matching, you arrange the cases in order from best to worst, without quantifying the goodness of each one
    • This requires dynamically comparing and contrasting cases to each other, and so is more difficult than absolute matching
  • If you use absolute matching, ranking becomes trivial
    • Any sort routine can arrange cases from best to worst based on their absolute scores
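
With absolute scores in hand, ranking really is just a sort. A tiny sketch (the case names and scores are made up):

```python
# Absolute matching assigns each case an independent score;
# ranking then reduces to an ordinary descending sort.
scores = {"case1": 0.75, "case2": 0.5, "case3": 0.9}
ranked = sorted(scores, key=scores.get, reverse=True)
# best-first order of case ids
```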

  4. Input to Matching and Ranking Functions
  • The inputs to matching and ranking functions are:
    • The new case, analyzed in terms of its important features, or indexes
    • The purpose you have in using the new case -- you can skip this if your system always performs the same task
    • The recalled cases -- this may be a subset of the case base or all cases
    • The indexes of the recalled cases
    • Reasonable criteria for determining the goodness of match
      • You may want the best case, all relevant cases, or any case that could be adapted to your purpose

  5. Feature Correspondence
  • To match and rank cases, you need to know which features correspond to each other
  • In some domains, this is very easy
    • Example: To help a buyer select a new car, desired price corresponds to actual price, desired make and model correspond to actual make and model, and so on
  • In some domains, this can be hard
    • CASEY had to compute correspondences, because its problem description was just a list of patient symptoms -- symptoms that were not identical could still correspond, due to the nature of heart disease

  6. Computing Similarity Among Corresponding Features
  • Next, you need to determine how similar the values are for corresponding features
  • You are usually looking for some measure of distance on a qualitative or quantitative scale
  • Most systems hard code this on a feature-by-feature basis
  • Example: A user might be asked if a desired restaurant should be Inexpensive, Moderate, Expensive, or Very Expensive
    • This might be translated to < $15, $15 - $30, $30 - $50, and > $50, with each restaurant classified as belonging to one category
    • If a user asks for a category, say Moderate, and a restaurant is in that category, we have an exact match
    • If the restaurant is one category away, say Inexpensive or Expensive, we have a partial match
    • If the restaurant is more than one category away, we have no match
  • You can use four or five categories if it makes sense for your domain
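
The restaurant-price scheme above can be hard coded in a few lines. This is a sketch of that scheme; the partial-match score of 0.5 is an assumed value, not one given in the slide.

```python
# Ordered price categories from the restaurant example above.
CATEGORIES = ["Inexpensive", "Moderate", "Expensive", "Very Expensive"]

def price_similarity(desired, actual, partial=0.5):
    """Exact match = 1, adjacent category = partial, further = 0."""
    distance = abs(CATEGORIES.index(desired) - CATEGORIES.index(actual))
    if distance == 0:
        return 1.0          # same category: exact match
    if distance == 1:
        return partial      # one category away: partial match
    return 0.0              # more than one category away: no match
```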

  7. Numeric Features
  • When values for features are naturally represented as numbers, you may still need special routines for comparing the numbers
  • Pitfalls to avoid are using absolute comparison and ranges
  • Example: Say your feature is age and two patients are ten years apart
    • If one is 60 and the other 70, you have at least a partial match and maybe a pretty good match
    • If one is 1 and the other 11, you may have no match at all
  • Example: Say you try to set up ranges, like young < 30, old > 50, and middle-aged as everything in between
    • Then, 31-year-olds will match 49-year-olds better than they match 29-year-olds
  • We deal with this using normalization and/or point ranges
    • We could say ages within 5 years are close matches for adults and ages within 1 year are close matches for children
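
The point-range idea can be sketched as a similarity function whose tolerance depends on who is being compared. The cutoff at 18 and the linear falloff (reaching zero at four times the tolerance) are assumptions for illustration; the 5-year and 1-year tolerances come from the slide.

```python
def age_similarity(a, b):
    """Similarity in [0, 1] with an age-dependent tolerance."""
    # Assumed cutoff: treat anyone under 18 as a child.
    tolerance = 1.0 if min(a, b) < 18 else 5.0
    diff = abs(a - b)
    # Linear falloff: identical ages score 1.0; the score hits 0.0
    # once the gap reaches four times the tolerance.
    return max(0.0, 1.0 - diff / (4 * tolerance))
```

This reproduces the slide's intuitions: 60 vs. 70 is still a partial match, 1 vs. 11 is no match at all, and 31 matches 29 better than it matches 49.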

  8. Abstraction Hierarchies
  • When qualitative or quantitative measures don’t suit your domain, you may need to organize values hierarchically
    • How can a system tell if spinach is closer to broccoli or to hamburger?
  • In general, the higher up you have to go in a hierarchy to find a node in common, the worse the match
  • Exactly how you traverse the hierarchy to find degree of match will depend on your domain

  9. Example: Abstraction Hierarchy
  Food
  ├── Vegetable
  │   ├── Green: Spinach, Broccoli, Peas
  │   └── Yellow: Squash
  ├── Fruit
  │   ├── Citrus: Orange
  │   └── Berry: Strawberry
  └── Meat
      ├── Lamb: Chop
      ├── Pork
      ├── Beef: Hamburger, Roast, Steak
      └── Veal
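
One possible traversal scheme for the food hierarchy on this slide: climb from each value to their lowest common ancestor and penalize the total climb. The parent links below encode the slide's tree (the placement of Chop under Lamb is a guess from the transcript), and the 1/(1+distance) scoring is an assumed convention, not the chapter's.

```python
# Child -> parent links for the food abstraction hierarchy.
PARENT = {
    "Spinach": "Green", "Broccoli": "Green", "Peas": "Green",
    "Squash": "Yellow", "Green": "Vegetable", "Yellow": "Vegetable",
    "Orange": "Citrus", "Strawberry": "Berry",
    "Citrus": "Fruit", "Berry": "Fruit",
    "Chop": "Lamb", "Hamburger": "Beef", "Roast": "Beef", "Steak": "Beef",
    "Lamb": "Meat", "Pork": "Meat", "Beef": "Meat", "Veal": "Meat",
    "Vegetable": "Food", "Fruit": "Food", "Meat": "Food",
}

def ancestors(node):
    """Path from a node up to the root, including the node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def hierarchy_similarity(a, b):
    """The higher the common ancestor, the worse the match."""
    path_a, path_b = ancestors(a), ancestors(b)
    # Lowest common ancestor: first node on a's path that b's path shares.
    common = next(n for n in path_a if n in set(path_b))
    distance = path_a.index(common) + path_b.index(common)
    return 1.0 / (1.0 + distance)
```

Under this scheme spinach is closer to broccoli (common ancestor Green, two steps) than to hamburger (common ancestor Food, six steps), answering the question on the previous slide.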

  10. Importance of Features
  • Besides considering how well features match, we also need to consider how important each feature is
  • In an e-commerce application, you could ask users how important each feature is to them
  • In some systems, the same features keep the same importance
  • In other systems, features change in importance depending on the task at hand
  • Kolodner uses the example of determining a salary for a professional baseball player
    • If you want to hire a fielder, then how well he bats is important
    • If you want to hire a pitcher, then batting is unimportant, but the speed of his fastball becomes very important
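
Task-dependent importance can be as simple as keeping one weight table per task. A sketch of Kolodner's baseball example; the feature names and numeric weights are illustrative assumptions.

```python
# Importance weights chosen by the task at hand (hypothetical values).
TASK_WEIGHTS = {
    # Hiring a fielder: batting matters, fastball speed does not.
    "fielder": {"batting": 0.75, "fielding": 1.0, "fastball_speed": 0.0},
    # Hiring a pitcher: batting is unimportant, fastball speed is crucial.
    "pitcher": {"batting": 0.1, "fielding": 0.25, "fastball_speed": 1.0},
}

def importance(task, feature):
    """Look up how important a feature is for the task at hand."""
    return TASK_WEIGHTS[task].get(feature, 0.0)
```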

  11. Matching and Importance
  • We need to get a handle on two things at once:
    • How closely features match
    • How important it is for them to match
  • Numeric conventions are used to indicate degree of match and degree of importance
    • 0 means no match, 1 means exact match, and numbers in between indicate degree of partial match
    • 0 means unimportant, 1 means of utmost importance, and numbers in between indicate degree of importance
  • Note: This is not something that can be carried out to many decimal places. We often use rough estimates like .25, .5, and .75.
  • The nearest neighbor algorithm is often used in practice to combine feature similarity and importance
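
A minimal nearest neighbor ranking using the 0-to-1 conventions above. This is a sketch, not the chapter's implementation: each recalled case is assumed to come with per-feature similarities to the new case, and the function returns the cases best-first.

```python
def nearest_neighbor(case_similarities, weights):
    """Rank case ids best-first by weighted average feature similarity.

    case_similarities maps case id -> {feature: similarity in [0, 1]};
    weights maps feature -> importance in [0, 1].
    """
    def score(sims):
        # Weighted sum of similarities, normalized by total importance.
        total = sum(weights[f] * sims[f] for f in weights)
        return total / sum(weights.values())
    return sorted(case_similarities,
                  key=lambda c: score(case_similarities[c]),
                  reverse=True)
```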

  12.–14. [Worked nearest neighbor example comparing baseball player cases; the slide content (tables of feature values, weights, and scores) was not captured in the transcript]

  15. Pitfalls of Applying Nearest Neighbor
  • The example was purposely chosen to point out some pitfalls of the nearest neighbor algorithm
  • The obvious problem is that the three old cases are equally similar to the new case. When this happens:
    • It’s possible that the cases really are very similar, and our case base is just too small to contain distinguishable players
    • It’s also possible that we’re comparing the wrong features, computing the wrong comparison values, or using the wrong importance weights
  • In our example, we’re using the wrong features for comparison
    • We are comparing RBIs and strikeouts without considering how many games a player has been in or how many at bats he’s had
    • We need to consider the ratio of successful attempts to opportunities
  • Moral of the story: It’s easy to crunch numbers, but it’s not easy to know which numbers to crunch
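
The fix the slide calls for is to compare rates rather than raw counts. A one-function sketch (the example numbers are made up): thirty RBIs in a hundred at bats should not match thirty RBIs in three hundred.

```python
def success_rate(successes, opportunities):
    """Ratio of successful attempts to opportunities, e.g. RBIs per at bat."""
    return successes / opportunities if opportunities else 0.0
```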
