
Preference Analysis


Presentation Transcript


  1. Preference Analysis Joachim Giesen and Eva Schuberth May 24, 2006

  2. Outline • Motivation • Approximate sorting • Lower bound • Upper bound • Aggregation • Algorithm • Experimental results • Conclusion

  3. Motivation • Find the preference structure of consumers w.r.t. a set of products • Common: assign a value function to the products • The value function determines a ranking of the products • Elicitation: pairwise comparisons • Problem: deriving a metric value function from non-metric information → we restrict ourselves to finding a ranking

  4. Motivation • Efficiency measure: number of comparisons → find a ranking for every respondent individually • Comparison based sorting algorithm • Lower bound: Ω(n log n) comparisons • As the set of products can be large, this is too much

  5. Motivation Possible solutions: • Approximation • Aggregation • Modeling and distribution assumptions

  6. Approximation (joint work with J. Giesen and M. Stojaković) • Lower bound (proof) • Algorithm

  7. Approximation • Consumer's true ranking of n products corresponds to: the identity permutation id on {1, …, n} • Wanted: an approximation of the ranking, i.e. a permutation π on {1, …, n} s.t. D(π, id) is small

  8. Metric on Sn • Needed: a metric on Sn that is meaningful in the market research context • Spearman's footrule metric D: D(π, σ) = Σ_{i=1..n} |π(i) − σ(i)| • Note: D ranges from 0 to ⌊n²/2⌋ (the maximum is attained by the reversal)
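For concreteness, a minimal sketch of this metric in Python (the 0-indexed list representation and the helper name are ours, not from the slides):

```python
# Spearman's footrule distance between two permutations p and q of
# {0, ..., n-1}, where p[i] is the rank that permutation p assigns to i.
def footrule(p, q):
    assert sorted(p) == sorted(q) == list(range(len(p)))
    return sum(abs(a - b) for a, b in zip(p, q))

# The reversal is the farthest permutation from id: distance n^2 / 2.
n = 4
assert footrule(list(range(n)), list(range(n - 1, -1, -1))) == n * n // 2
```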

  9. We show: to approximate a ranking within expected distance r, • at least n log₂(n²/r) − O(n) comparisons are necessary • O(n log₂(n²/r)) comparisons are always sufficient

  10. Lower bound • Let A be a randomized approximate sorting algorithm. • If for every input permutation the expected distance of the output to id is at most r, then A performs at least n log₂(n²/r) − O(n) comparisons in the worst case.

  11. Lower bound: Proof (follows Yao's minimax principle) • Assume less than n log₂(n²/r) − O(n) comparisons for every input. • Fix the random bits → a deterministic algorithm. Then for at least n!/2 input permutations: output at distance more than 2r. • → Expected distance over a uniformly random input larger than r. • There is an input permutation s.t. the expected distance of the randomized algorithm's output is larger than r. Contradiction.

  12. Lower bound: Lemma • For r > 0, B_D(id, r) := {π ∈ Sn : D(π, id) ≤ r} is the ball centered at id with radius r (figure: ball of radius r around id) • Lemma: |B_D(id, r)| ≤ 2^n · C(n+r, n)

  13. Lower bound: Proof of Lemma • π is uniquely determined by the sequence (π(1) − 1, …, π(n) − n) • For a fixed sequence of non-negative integers (d_1, …, d_n): at most 2^n permutations satisfy |π(i) − i| = d_i for all i • If D(π, id) ≤ r, the d_i sum to at most r • # sequences of n non-negative integers whose sum is at most r: C(n+r, n) • → |B_D(id, r)| ≤ 2^n · C(n+r, n)
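The lemma is easy to sanity-check by brute force for small n; the following sketch (ours) enumerates S_n exhaustively:

```python
# Exhaustive check of |B_D(id, r)| <= 2^n * C(n+r, n) for small n.
from itertools import permutations
from math import comb

def footrule_to_id(p):
    # D(p, id) for a permutation p of {0, ..., n-1}
    return sum(abs(p[i] - i) for i in range(len(p)))

n = 6
for r in range(10):
    ball = sum(1 for p in permutations(range(n)) if footrule_to_id(p) <= r)
    bound = 2**n * comb(n + r, n)
    assert ball <= bound            # the lemma, verified exhaustively
    print(f"r={r}: |B|={ball}  bound={bound}")
```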

  14. Lower bound: deterministic case • Now to show: for a fixed deterministic algorithm, the number of input permutations whose output is at distance more than 2r to id is more than n!/2

  15. Lower bound: deterministic case • k comparisons → at most 2^k classes of input permutations with the same outcome sequence

  16. Lower bound: deterministic case • All input permutations in the same class lead to the same output

  17. Lower bound: deterministic case • For π, σ in the same class: the algorithm sees the same comparison outcomes • For π, σ in the same class: the output is therefore the same permutation τ

  18. Lower bound: deterministic case • For π in a class with common output τ: if the output is within distance 2r, then π ∈ B_D(τ, 2r)

  19. Lower bound: deterministic case • Within one class, at most |B_D(id, 2r)| input permutations can be within distance 2r of the common output

  20. Lower bound: deterministic case • At most 2^k · |B_D(id, 2r)| input permutations with output within distance 2r

  21. Lower bound: deterministic case • At least n! − 2^k · |B_D(id, 2r)| input permutations with output outside distance 2r • For k < n log₂(n²/r) − O(n) comparisons this is more than n!/2
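A short derivation (our reconstruction of the step the slides leave implicit; constants are absorbed into the O(n) term):

```latex
% Fewer than k comparisons => at most 2^k |B_D(id, 2r)| inputs can
% receive an output within distance 2r. If this is to hold for at
% least half of the n! inputs, then
\[
  2^k \,\lvert B_D(\mathrm{id}, 2r)\rvert \;\ge\; \frac{n!}{2}
  \qquad\Longrightarrow\qquad
  k \;\ge\; \log_2 n! \;-\; \log_2 \lvert B_D(\mathrm{id}, 2r)\rvert \;-\; 1.
\]
% Using the lemma |B_D(id, 2r)| <= 2^n binom(n+2r, n) <= 2^n (e(n+2r)/n)^n
% together with Stirling's bound log2(n!) >= n log2(n) - n log2(e):
\[
  k \;\ge\; n\log_2 n \;-\; n\log_2\frac{e\,(n+2r)}{n} \;-\; O(n)
    \;=\; n\log_2\frac{n^2}{r} \;-\; O(n).
\]
```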

  22. Upper Bound • An algorithm (suggested by Chazelle) approximates any ranking within distance r with less than 6n⌈log₂(n²/r)⌉ comparisons.

  23. Algorithm • Partition the elements into equal sized bins • Elements within a bin are smaller than any element in subsequent bins • No ordering of the elements within a bin • Output: any permutation consistent with the sequence of bins
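A runnable sketch of this scheme (our illustration; `sorted()` stands in for the linear-time median selection of Blum et al. that the comparison bound on the next slides refers to):

```python
import random

def approx_sort(items, rounds):
    """Split every bin at its median for `rounds` rounds; any order
    consistent with the returned bins is within footrule distance
    n^2 / 2**rounds of the true order."""
    bins = [list(items)]
    for _ in range(rounds):
        new_bins = []
        for b in bins:
            if len(b) <= 1:
                new_bins.append(b)
                continue
            s = sorted(b)               # stand-in for median partitioning
            new_bins += [s[: len(s) // 2], s[len(s) // 2 :]]
        bins = new_bins
    return bins

# Demo: shuffle within bins to get an arbitrary consistent ranking.
n, m = 64, 3
bins = approx_sort(range(n), m)
for b in bins:
    random.shuffle(b)
order = [x for b in bins for x in b]
pos = {x: i for i, x in enumerate(order)}
assert sum(abs(pos[x] - x) for x in range(n)) <= n * n / 2**m
```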

  24. Algorithm (figure: the bins after rounds 0, 1 and 2; each round splits every bin at its median)

  25. Running Time • Median search and partitioning of n elements: less than 6n comparisons (algorithm by Blum et al.) • m rounds → less than 6nm comparisons. Analysis of algorithm • m rounds → 2^m bins • Output: ranking consistent with the ordering of the bins • Distance: each element's position deviates from its true position by less than the bin size n/2^m, so the footrule distance is at most n · n/2^m = n²/2^m

  26. Algorithm: Theorem • Any ranking consistent with the bins computed in m rounds, i.e. with less than 6nm comparisons, has distance at most n²/2^m

  27. Approximation: Summary • For sufficiently large error fewer comparisons than for exact sorting: error εn² (ε constant): O(n) comparisons; error n^(2−ε): O(n log n) comparisons • For real applications: still too much • Individual elicitation of a value function is not possible • → Second approach: Aggregation

  28. Aggregation (joint work with J. Giesen and D. Mitsche) Motivation: • We think that the population splits into preference/customer types • Respondents answer according to their type (but deviations are possible) • Instead of individual preference analysis or aggregation over the whole population → aggregate within customer types

  29. Aggregation Idea: • Ask only a constant number of questions (pairwise comparisons) • Ask many respondents • Cluster the respondents into types according to their answers • Aggregate the information within a cluster to get type rankings • Philosophy: first segment, then aggregate

  30. Algorithm The algorithm works in 3 phases: • Estimate the number k of customer types • Segment the respondents into the k customer types • Compute a ranking for each customer type

  31. Algorithm • Every respondent performs pairwise comparisons • Basic data structure: matrix A = [a_ij] • Entry a_ij ∈ {−1, 1, 0} refers to respondent i and the j-th product pair (x, y): a_ij = 1 if respondent i prefers x over y, a_ij = −1 if respondent i prefers y over x, a_ij = 0 if respondent i has not compared x and y

  32. Algorithm • Define B = A·Aᵀ • Then B_ij = number of product pairs on which respondents i and j agree minus the number of pairs on which they disagree (pairs with a 0 entry for either respondent do not count)
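A small sketch of these two data structures; the respondents, product pairs and answers below are invented for illustration:

```python
import numpy as np

# +1 means x preferred over y, -1 the opposite; a missing entry means
# the pair was not asked and is stored as 0.
pairs = [("x", "y"), ("x", "z"), ("y", "z")]
answers = [
    {("x", "y"): +1, ("x", "z"): +1, ("y", "z"): +1},  # respondent 0
    {("x", "y"): -1, ("x", "z"): +1},                  # respondent 1
    {("x", "y"): +1},                                  # respondent 2
]
A = np.array([[resp.get(p, 0) for p in pairs] for resp in answers])

B = A @ A.T
# B[i, j] = (#pairs where i and j agree) - (#pairs where they disagree);
# pairs skipped by either respondent contribute 0 to the sum.
print(B)   # diagonal entries count each respondent's answered pairs
```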

  33. Algorithm: phase 1 Phase 1: Estimation of the number k of customer types • Use matrix B • Analyze the spectrum of B • We expect the k largest eigenvalues of B to be substantially larger than the remaining eigenvalues • → Search for a gap in the eigenvalues
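A sketch of phase 1 under one concrete gap rule; the slides only say to search for a gap, so taking the largest ratio of consecutive eigenvalues is our assumption:

```python
import numpy as np

def estimate_k(B, k_max=10):
    # B = A A^T is symmetric positive semidefinite, so eigvalsh applies
    # and all eigenvalues are >= 0; sort them in descending order.
    ev = np.linalg.eigvalsh(B)[::-1]
    # Assumed rule: k sits where the ratio between consecutive
    # eigenvalues is largest among the first k_max positions.
    ratios = [ev[i] / max(ev[i + 1], 1e-12)
              for i in range(min(k_max, len(ev) - 1))]
    return int(np.argmax(ratios)) + 1
```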

  34. Algorithm: phase 2 Phase 2: Cluster the respondents into customer types • Use matrix B again • Compute the projector P onto the space spanned by the eigenvectors corresponding to the k largest eigenvalues of B • Every respondent corresponds to a column of P • Cluster the columns of P
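A sketch of phase 2; the projector follows the slide, while plain k-means on the columns of P is our stand-in for the unspecified clustering step:

```python
import numpy as np

def segment(B, k, iters=100, seed=0):
    w, V = np.linalg.eigh(B)      # eigenvalues in ascending order
    Vk = V[:, -k:]                # eigenvectors of the k largest ones
    P = Vk @ Vk.T                 # projector onto their span
    cols = P.T                    # column i corresponds to respondent i
    rng = np.random.default_rng(seed)
    centers = cols[rng.choice(len(cols), size=k, replace=False)]
    for _ in range(iters):        # plain k-means on the columns
        d = ((cols[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        centers = np.array([cols[labels == j].mean(0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels                 # labels[i] = estimated type of i
```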

  35. Algorithm: phase 2 • Intuition for using projector – example on graphs:

  36. Algorithm: phase 2 (figure: the matrix A_d of the graph example)

  37. Algorithm: phase 2 (figure: the matrix P)

  38. Algorithm: phase 2 (figure: the matrix P')

  39. Algorithm: phase 2 (figure: embedding of the columns of P)

  40. Algorithm: phase 3 Phase 3: Compute the ranking for each type • For each type t compute its characteristic vector c_t: (c_t)_i = 1 if respondent i belongs to that type, 0 otherwise • For each type t compute Aᵀc_t; if the entry for product pair (x, y) is positive: x preferred over y by t; negative: y preferred over x by t; zero: type t is indifferent
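A sketch of phase 3; variable names are ours, with `labels` the segmentation from phase 2 and `A`, `pairs` as in the sketch after slide 32:

```python
import numpy as np

def type_preferences(A, labels, t, pairs):
    c = (np.asarray(labels) == t).astype(int)   # characteristic vector c_t
    s = A.T @ c                                 # one score per product pair
    out = {}
    for score, (x, y) in zip(s, pairs):
        if score > 0:
            out[(x, y)] = f"{x} over {y}"
        elif score < 0:
            out[(x, y)] = f"{y} over {x}"
        else:
            out[(x, y)] = "indifferent"
    return out
```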

  41. Experimental study On real world data • 21 data sets from Sawtooth Software, Inc. (conjoint data sets) Questions: • Do real populations decompose into different customer types? • How does our algorithm compare to Sawtooth's algorithm?

  42. Conjoint structures • Attributes: sets A1, …, An with |Ai| = mi • An element of Ai is called a level of the i-th attribute • A product is an element of A1 × … × An • Example: Car • Number of seats = {5, 7} • Cargo area = {small, medium, large} • Horsepower = {240hp, 185hp} • Price = {$29000, $33000, $37000} • … • In practical conjoint studies: …

  43. Quality measures • Difficulty: we do not know the real type rankings • We cannot directly measure the quality of the result • Other quality measures: • Number of inverted pairs: the average number of inversions in the partial rankings of the respondents in type i with respect to the j-th type ranking • Deviation probability • Hit rate (leave-one-out experiments)
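As an illustration of the first measure, a sketch (ours) that counts the inversions of one respondent's answered pairs against a given type ranking:

```python
def inversions(respondent_answers, rank):
    """respondent_answers maps a product pair (x, y) to +1 (x over y),
    -1 (y over x) or 0 (not compared); rank maps a product to its
    position in the type ranking (smaller = better)."""
    inv = 0
    for (x, y), a in respondent_answers.items():
        if a == 0:
            continue
        if (rank[x] < rank[y]) != (a == +1):
            inv += 1
    return inv

# The respondent prefers x over y, but the type ranking puts y first.
assert inversions({("x", "y"): +1}, {"x": 1, "y": 0}) == 1
```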

  44. Study 1 • # respondents = 270 • Size of study: 8 x 3 x 4 = 96 • # questions = 20 • (figure: largest eigenvalues of matrix B)

  45. Study 1 • Two types • Size of clusters: 179 – 91 • (table: number of inversions and deviation probability)

  46. Study 1 • Hit rates: Sawtooth: ? • Our algorithm: 69%

  47. Study 2 • # respondents = 539 • Size of study: 4 x 3 x 3 x 5 = 180 • # questions = 30 • (figure: largest eigenvalues of matrix B)

  48. Study 2 • Four types • Size of clusters: 81 – 119 – 130 – 209 • (table: number of inversions and deviation probability)

  49. Study 2 • Hit rates: Sawtooth: 87% • Our algorithm: 65%

  50. Study 3 • # respondents = 1184 • Size of study: 9 x 6 x 5 = 270 • # questions = 48 • Size of clusters: 6 – 3 – 1164 – 8 – 3 • Size of clusters: 3 – 1175 – 6 • 1 − p = 12% • (figure: largest eigenvalues of matrix B)
