This presentation examines automated discovery of recommendation rules for recommender systems: the role of default preferences, the challenge of ensuring reliable recommendations, retrieval approaches including rule-based and inductive retrieval, and dominance rules and rule-based retrieval in the Rubric system. The goal is to improve the reliability and accuracy of the recommendations made to users on the basis of their preferences and queries.
Automated Discovery of Recommendation Knowledge
David McSherry
School of Computing and Information Engineering, University of Ulster
Overview • Approaches to retrieval in recommender systems • Rule-based retrieval (of cases) in Rubric • Automating the discovery of recommendation rules • Role of default preferences in rule discovery • Related work • Conclusions
The Recommendation Challenge • Often we expect salespersons to make reliable recommendations based on limited information: • I'm looking for a 3-bedroom detached property • To recommend an item with confidence, a salesperson has to consider: • The customer's known preferences • The available alternatives • All features of the recommended item, including features not mentioned by the customer
Are Recommender Systems Reliable? • Features not mentioned in the user's query are typically ignored in: • Nearest neighbour (NN) retrieval • Decision tree approaches • Multi-criterion decision making • Assumed (or default) preferences are sometimes used for attributes like price • But for many attributes, no assumptions can be made about the user's preferences
Preferences Pyramid
• Known preferences, e.g. beds = 3, type = detached
• Default preferences, e.g. ..., reasonably priced, ...
• Unknown preferences, e.g. ..., location = A, ...
CBR Recommender Systems
• Descriptions of available products (e.g. houses) are stored as cases in a product dataset and retrieved in response to user queries, e.g.
             Loc   Beds  Type
  Weight:    (3)   (2)   (1)
  Case 1:    A     3     semi
  Case 2:    B     4     det
  Case 3:    B     3     det
Inductive Retrieval
• Not only are the user's unknown preferences ignored - the user is prevented from expressing them
  Bedrooms? = 4 → Case 2 (B, 4, det)
  Bedrooms? = 3 → Type? = det  → Case 3 (B, 3, det)
                  Type? = semi → Case 1 (A, 3, semi)
Inductive Retrieval
• The recommended case exactly matches the user's known preferences - but what if she prefers location A? (decision tree as on the previous slide)
Nearest Neighbour Retrieval
• The standard CBR approach is to recommend the most similar case
• The similarity of a case C to a query Q over a subset A_Q of the product attributes A is:
  Sim(C, Q) = \sum_{a \in A_Q} w_a \, sim_a(C, Q)
  where w_a is the weight assigned to a
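As a rough illustration (not Rubric itself), the following Python sketch encodes the three-case example with exact-match local similarities and the weights (3, 2, 1), and reproduces the retrievals shown on the following slides; the names WEIGHTS, CASES, similarity and most_similar are illustrative.

```python
# Minimal sketch of weighted nearest neighbour retrieval over the
# three-case example (names and weights are illustrative).

WEIGHTS = {"loc": 3, "beds": 2, "type": 1}

CASES = {
    "Case 1": {"loc": "A", "beds": 3, "type": "semi"},
    "Case 2": {"loc": "B", "beds": 4, "type": "det"},
    "Case 3": {"loc": "B", "beds": 3, "type": "det"},
}

def similarity(case, query):
    """Sum the weights of the query attributes whose values match the case."""
    return sum(WEIGHTS[a] for a, v in query.items() if case.get(a) == v)

def most_similar(query):
    """Return the set of cases with maximal similarity to the query."""
    sims = {name: similarity(case, query) for name, case in CASES.items()}
    best = max(sims.values())
    return {name for name, s in sims.items() if s == best}

print(most_similar({"beds": 3, "type": "det"}))              # {'Case 3'}
print(most_similar({"loc": "A", "beds": 3, "type": "det"}))  # {'Case 1'}
```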
Incomplete Queries in NN
             Loc   Beds  Type   Sim
             (3)   (2)   (1)
  Q:               3     det
  Case 1:    A     3     semi    2
  Case 2:    B     4     det     1
  Case 3:    B     3     det     3
  most-similar(Q) = {Case 3}
• Again, Case 3 is a good recommendation if the user happens to prefer location B
Incomplete Queries in NN
             Loc   Beds  Type   Sim
             (3)   (2)   (1)
  Q*:        A     3     det
  Case 1:    A     3     semi    5
  Case 2:    B     4     det     1
  Case 3:    B     3     det     3
  most-similar(Q*) = {Case 1}
• But not if she prefers location A
Rule-Based Retrieval in Rubric
• In rule-based retrieval, a possible recommendation rule for Case 3 might be:
  Rule 1: if beds = 3 and type = det then Case 3
• Given a target query, a product dataset, and a set of recommendation rules, Rubric:
• Retrieves the case recommended by the first rule that covers the target query
• If none of the available rules covers the target query, it abstains from making a recommendation
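A minimal sketch of the retrieval behaviour described above, assuming rules are simple condition sets paired with a recommended case; the rule set and the covers and recommend names are illustrative, not Rubric's actual implementation.

```python
# Hedged sketch of rule-based retrieval in the style described for Rubric:
# the first rule whose conditions are all satisfied by the query fires,
# otherwise the system abstains.

RULES = [
    ({"beds": 3, "type": "det"}, "Case 3"),   # Rule 1 from the slide
]

def covers(conditions, query):
    """A rule covers a query if every rule condition appears in the query."""
    return all(query.get(a) == v for a, v in conditions.items())

def recommend(query, rules=RULES):
    for conditions, case in rules:
        if covers(conditions, query):
            return case
    return None  # abstain: no rule covers the query

print(recommend({"beds": 3, "type": "det", "loc": "A"}))  # 'Case 3'
print(recommend({"beds": 4}))                             # None (abstain)
```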
Dominance Rules
• For any case C and query Q, we say that Q → C is a dominance rule if:
  most-similar(Q*) = {C} for all extensions Q* of Q
• As Rule 1 is not a dominance rule for Case 3, it is potentially unreliable:
  Rule 1: if beds = 3 and type = det then Case 3
A Dominance Rule for Case 3
             Loc   Beds  Type   Sim
             (3)   (2)   (1)
  Q:         B     3
  Case 1:    A     3     semi    2
  Case 2:    B     4     det     3
  Case 3:    B     3     det     5
  most-similar(Q) = {Case 3}
A Dominance Rule for Case 3
             Loc   Beds  Type   Sim   Max
             (3)   (2)   (1)
  Q:         B     3
  Case 1:    A     3     semi    2     3
  Case 2:    B     4     det     3     4
  Case 3:    B     3     det     5
• As Cases 1 and 2 can never equal the similarity of Case 3, a dominance rule for Case 3 is:
  Rule 2: if loc = B and beds = 3 then Case 3
Coverage of a Dominance Rule
• A dominance rule Q → C can be applied to any query Q* such that Q ⊆ Q*, since by definition:
  most-similar(Q*) = {C}
• Also by definition, most-similar(Q**) = {C} for any extension Q** of Q*
• So no other case can equal the similarity of C, regardless of the user's unknown preferences
The Role of Case Dominance
• A given case C1 dominates another case C2 with respect to a query Q if:
  Sim(C1, Q*) > Sim(C2, Q*) for all extensions Q* of Q (McSherry, IJCAI-03)
• So Q → C is a dominance rule if and only if C dominates all other cases with respect to Q
• This is not the same as Pareto dominance
Identifying Dominated Cases
• A given case C1 dominates another case C2 with respect to a query Q if and only if:
  Sim(C1, Q) - Sim(C2, Q) > \sum_{a \in A - A_Q} w_a \, (1 - sim_a(C1, C2))  (McSherry, IJCAI-03)
• Cases dominated by a given case can thus be identified with modest computational effort
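A sketch of this dominance test for the nominal attributes of the running example, reusing the WEIGHTS, CASES and similarity definitions from the earlier sketch; for 0/1 local similarities, the most C2 can gain over C1 on an unqueried attribute is simply its weight, and only when the two cases differ on that attribute.

```python
# Sketch of the dominance test for nominal attributes with 0/1 similarity:
# C1 dominates C2 w.r.t. Q if its similarity lead over C2 exceeds the most
# C2 could gain on the attributes not in Q (those where the cases differ).
# Assumes the WEIGHTS, CASES and similarity definitions sketched earlier.

def dominates(c1, c2, query):
    lead = similarity(c1, query) - similarity(c2, query)
    max_gain = sum(w for a, w in WEIGHTS.items()
                   if a not in query and c1.get(a) != c2.get(a))
    return lead > max_gain

q = {"loc": "B", "beds": 3}
case3, case1, case2 = CASES["Case 3"], CASES["Case 1"], CASES["Case 2"]
print(dominates(case3, case1, q), dominates(case3, case2, q))  # True True
```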
Dominance Rule Discovery (McSherry & Stretch, IJCAI-05)
• Our algorithm targets maximally general dominance rules Q → C such that Q ⊆ description(C)
• The candidate queries for Case 3 are the subsets of its description (B, 3, det):
  {B, 3, det}, {B, 3}, {B, det}, {3, det}, {B}, {3}, {det}, nil
• Case 3 dominates Case 1 and Case 2 with respect to the query {B, 3}
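The discovery step can be sketched as follows, again reusing the earlier definitions (CASES, dominates): enumerate the sub-queries of the target case's description from general to specific, keep those for which the target dominates every other case, and skip any that merely specialise a rule already found. This is an illustrative reconstruction, not the published algorithm.

```python
# Sketch of dominance rule discovery for one target case.
# Builds on the CASES and dominates definitions sketched earlier.
from itertools import combinations

def discover_rules(target_name):
    target = CASES[target_name]
    others = [c for n, c in CASES.items() if n != target_name]
    found = []
    attrs = list(target)
    for size in range(len(attrs) + 1):                  # general -> specific
        for subset in combinations(attrs, size):
            q = {a: target[a] for a in subset}
            if any(set(prev) <= set(q) for prev in found):
                continue                                # a more general rule exists
            if all(dominates(target, other, q) for other in others):
                found.append(q)
    return found

print(discover_rules("Case 3"))   # [{'loc': 'B', 'beds': 3}]
```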
Complexity of Rule Discovery
• Our discovery algorithm is applied with each case in turn as the target case
• For a product dataset with n cases and k attributes, where n ≥ 2^k, the worst-case complexity is:
  O(k n^2 2^k)
• If n < 2^k, the worst-case complexity is:
  O(k n 2^{2k})
Maximum Rule-Set Size
• In a dataset with k attributes, the number of rules discovered for a target case can never be more than C(k, ⌊k/2⌋) (McSherry & Stretch, IJCAI-05)
• With 1,000 products and 9 attributes, the maximum number of discovered rules is 126,000
• Rule-set sizes tend to be much smaller in practice
Digital Camera Case Base
Source: McCarthy et al. (IUI-2005)
No. of cases: 210
Attributes (weights in parentheses): make (9), price (8), style (7), resolution (6), optical zoom (5), digital zoom (1), weight (4), storage type (2), memory (3)
Discovered Rule:
  if make = toshiba and style = ultra compact and optical zoom = 3 then Case 201
Discovered Rule-Set Sizes Digital Camera Case Base (k = 9)
Lengths of Discovered Rules Digital Camera Case Base (k = 9)
Limitations of Discovered Rules
Example Rule:
  if make = sony and price = 336 and style = compact and resolution = 5 and weight = 236 then Case 29
Problem:
  Exact numeric values (e.g., price, weight) make the rule seem unnatural/unrealistic
  They also limit its coverage
Solution:
  Assume the preferred price and weight are the same for all users
LIB and MIB Attributes
• A less-is-better (LIB) attribute is one that most users would prefer to minimise, e.g. price, weight
• A more-is-better (MIB) attribute is one that most users would prefer to maximise, e.g. resolution, optical zoom, digital zoom, memory
• Often in NN retrieval, LIB and MIB attributes are treated as nearer-is-better attributes:
  How much would you like to pay? 300
• This doesn't make sense, as it implies that the user would prefer to pay 310 rather than 280
Role of Default Preferences in Rule Discovery (McSherry & Stretch, AI-2005)
• We assume the preferred value of a LIB/MIB attribute is the lowest/highest value in the case base
• These preferences are represented in a default query:
  QD: price = 106, memory = 64, resolution = 14, optical zoom = 10, digital zoom = 8, weight = 100
• In the dominance rules Q → C now targeted by our algorithm, Q includes the default preferences in QD
• Thus the assumed preferences are implicit in the discovered rules
Similarity to the Default Query
• We use the standard measure for numeric attributes:
  sim_a(x, y) = 1 - \frac{|x - y|}{max_a - min_a}
  where x is the value in a given case and y is the preferred value
• For a LIB attribute, y = min_a, so:
  sim_a(x, min_a) = \frac{max_a - x}{max_a - min_a}
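A small sketch of these measures in Python; only the minimum price (106) comes from the slides, while the maximum and the function names are illustrative assumptions.

```python
# Sketch of the standard numeric similarity measure and its use with default
# preferences for LIB/MIB attributes (the price range shown is illustrative,
# not taken from the actual digital camera case base).

PRICE_RANGE = (106, 1500)   # (min, max); max is an assumed value

def numeric_sim(x, y, lo, hi):
    """sim(x, y) = 1 - |x - y| / (max - min)."""
    return 1 - abs(x - y) / (hi - lo)

def lib_sim(x, lo, hi):
    """For a less-is-better attribute the preferred value is the minimum."""
    return numeric_sim(x, lo, lo, hi)   # = (hi - x) / (hi - lo)

def mib_sim(x, lo, hi):
    """For a more-is-better attribute the preferred value is the maximum."""
    return numeric_sim(x, hi, lo, hi)   # = (x - lo) / (hi - lo)

print(round(lib_sim(336, *PRICE_RANGE), 3))   # cheaper cameras score closer to 1
```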
Digital Camera Case Base
No. of cases: 210
Attributes: make, price, style, resolution, optical zoom, digital zoom, weight, storage type, memory
LIB attributes: price, weight
MIB attributes: resolution, optical zoom, digital zoom, memory
Discovered Rule:
  if make = sony and style = compact then Case 29
Reduced Complexity of Rule Discovery (e.g., from 512 candidate queries to 8)
Dominance Rule Discovery for Case 29:
  QD ∪ {sony, compact, memory stick}
  QD ∪ {sony, compact}   QD ∪ {sony, memory stick}   QD ∪ {compact, memory stick}
  QD ∪ {sony}   QD ∪ {compact}   QD ∪ {memory stick}
  QD
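The reduction can be sketched directly: union the default query QD into every subset of the case's remaining nominal attribute values. The attribute names and values come from the slides; the code itself is only an illustration.

```python
# Sketch of how default preferences shrink the candidate space for Case 29:
# only the non-default attributes (make, style, storage type) are enumerated,
# and the default query QD is unioned into every candidate,
# giving 2**3 = 8 candidates instead of 2**9 = 512.
from itertools import combinations

QD = {"price": 106, "memory": 64, "resolution": 14,
      "optical zoom": 10, "digital zoom": 8, "weight": 100}
CASE_29_NOMINAL = {"make": "sony", "style": "compact", "storage type": "memory stick"}

candidates = []
attrs = list(CASE_29_NOMINAL)
for size in range(len(attrs) + 1):
    for subset in combinations(attrs, size):
        candidates.append({**QD, **{a: CASE_29_NOMINAL[a] for a in subset}})

print(len(candidates))   # 8
```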
Reduced Length of Discovered Rules DPs = Default Preferences
Recommendability of Cases • Only 56 of the 210 cases can be the most similar case for any query that includes the default query QD • The reason is that most cases are dominated with respect to the default query • For most of the 56 non-dominated cases, only a single dominance rule was discovered • The discovered rules cover 29% of all queries over the attributes make, style, and storage type
Retrieving Stories for Case-Based Teaching (Burke & Kass, 1996)
• Rule-based retrieval of stories or lessons learned by experienced salespersons
• Retrieval is conservative, opportunistic, and non-mandatory
• A story is retrieved at the system's initiative and only if highly relevant
• By design, retrieval in Rubric is also conservative and non-mandatory (and potentially opportunistic)
• Easily combined with NN retrieval of a less strongly recommended case if no rule covers a given query
Incremental Nearest Neighbour (iNN) (McSherry, IJCAI-03, AICS-05, AIR 2005)
• A conversational CBR approach in which:
• Question selection is goal driven (i.e., maximise the number of cases dominated by a target case)
• Dialogue continues until it can be safely terminated (i.e., no other case can exceed the similarity of the target case)
• Relevance of any question can be explained (e.g., ability to confirm the target case)
• Recommendations can be justified (i.e., unknown preferences cannot affect the outcome)
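A hedged sketch of goal-driven question selection in this spirit, reusing the earlier CASES and dominates definitions; this is only an illustrative reading of the bullet above, not the published iNN algorithm.

```python
# Illustrative sketch of goal-driven question selection: among the attributes
# not yet in the query, ask the one whose answer (assumed to match the current
# target case) would leave the most other cases dominated.
# Assumes the CASES and dominates definitions sketched earlier.

def best_question(target_name, query):
    target = CASES[target_name]
    others = [c for n, c in CASES.items() if n != target_name]

    def dominated_count(attr):
        extended = {**query, attr: target[attr]}
        return sum(dominates(target, other, extended) for other in others)

    unasked = [a for a in target if a not in query]
    return max(unasked, key=dominated_count) if unasked else None

print(best_question("Case 3", {"beds": 3}))   # 'loc'
```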
Demand-Driven Discovery of Recommendation Knowledge in Top Case
Top Case: What is the preferred make?
User: sony
Top Case: The target case is:
  Case 40: sony, 455, ultra compact, 5, 4, 4, 298, MS, 32
  What is the preferred style?
User: why
Top Case: Because if style = ultra compact this will confirm Case 40 as the recommended case
  What is the preferred style?
User: compact
Top Case: The recommended case is:
  Case 29: sony, 336, compact, 5, 3, 4, 236, MS, 32
Conclusions • Benefits of retrieval based on dominance rules: • Provably reliable because account is taken of the user’s unknown preferences • Benefits of default preferences: • An often dramatic reduction in average length of the discovered rules • Increased coverage of queries representing the user’s personal preferences • Reduced complexity of rule discovery
References
Burke, R. and Kass, A. (1996) Retrieving Stories for Case-Based Teaching. In Leake, D. (ed.) Case-Based Reasoning: Experiences, Lessons & Future Directions. Cambridge, MA: AAAI Press, 93-109
McCarthy, K., Reilly, J., McGinty, L. and Smyth, B. (2005) Experiments in Dynamic Critiquing. Proceedings of the International Conference on Intelligent User Interfaces, 175-182
McSherry, D. (2003) Increasing Dialogue Efficiency in Case-Based Reasoning without Loss of Solution Quality. Proceedings of the 18th International Joint Conference on Artificial Intelligence, 121-126
McSherry, D. (2005) Explanation in Recommender Systems. Artificial Intelligence Review 24 (2), 179-197
McSherry, D. (2005) Incremental Nearest Neighbour with Default Preferences. Proceedings of the 16th Irish Conference on Artificial Intelligence and Cognitive Science, 9-18
McSherry, D. and Stretch, C. (2005) Automating the Discovery of Recommendation Knowledge. Proceedings of the 19th International Joint Conference on Artificial Intelligence, 9-14
McSherry, D. and Stretch, C. (2005) Recommendation Knowledge Discovery. Proceedings of the 25th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence
Acknowledgements • Thanks to: • Eugene Freuder, Barry O’Sullivan, Derek Bridge, Eleanor O’Hanlon (4C) • Chris Stretch (co-author, IJCAI-05 and AI-2005) • Kevin McCarthy, Lorraine McGinty, James Reilly, Barry Smyth (UCD) for the digital camera case base
Compromise-Driven Retrieval (McSherry, ICCBR-03, UKCBR-05)
• Similarity and compromise (unsatisfied constraints) play complementary roles
• Queries can include upper/lower limits for LIB/MIB attributes (used only in the assessment of compromise)
• Every case in the product dataset is covered by one of the recommended cases
• That is, one of the recommended cases is at least as similar and involves the same or fewer compromises