
The 2-Catalog

The 2-Catalog Segmentation Problem. Joint work with Shmuel Safra. Motivation. The Catalog Problem. Input: a set of customers $C$; a set of pages $P$; a function $\pi : C \to 2^P$; the catalog size $r$.


Presentation Transcript


  1. The 2-Catalog Segmentation Problem. Joint work with Shmuel Safra.

  2. Motivation

  3. Motivation

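  4. The Catalog Problem. Input: • A set of customers $C$. • A set of pages $P$. • A function $\pi : C \to 2^P$ giving the pages each customer is interested in. • The catalog size $r$. Output: A catalog $P' \subseteq P$ of size $r$ s.t. $\sum_{c \in C} |\pi(c) \cap P'|$ is maximal.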

  5. The Catalog Problem (cont.) Algorithm: Take the r most popular pages.
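The single-catalog algorithm above can be sketched in a few lines of Python. This is only an illustration: the names `most_popular_catalog`, `hits`, and the toy `interest` map are assumptions and do not come from the slides.

```python
from collections import Counter

def most_popular_catalog(interest, r):
    """Return the r most popular pages.

    interest: dict mapping each customer to the set of pages it wants
              (the function pi of the slides; hypothetical encoding).
    """
    popularity = Counter(p for pages in interest.values() for p in pages)
    # Sort by decreasing popularity; break ties by page name for determinism.
    ranked = sorted(popularity, key=lambda p: (-popularity[p], p))
    return set(ranked[:r])

def hits(interest, catalog):
    """Number of (customer, page) pairs where the customer wants a catalog page."""
    return sum(len(pages & catalog) for pages in interest.values())

# Toy instance (made up for illustration).
interest = {"c1": {"a", "b"}, "c2": {"a", "c"}, "c3": {"a", "b", "d"}}
catalog = most_popular_catalog(interest, 2)
total = hits(interest, catalog)
```

With one catalog the objective is a sum of per-page popularities, so picking the `r` largest terms is optimal; pages nobody wants never enter the ranking.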

  6. Catalog Segmentation

  7. The k-Catalog Segmentation. Input: • A set of customers $C$. • A set of pages $P$. • A function $\pi : C \to 2^P$. • The catalog size $r$. Output: $k$ catalogs $P_1,\dots,P_k \subseteq P$ of size $r$ each, s.t. $\sum_{c \in C} \max_{1 \le i \le k} |\pi(c) \cap P_i|$ is maximal.

  8. Representation as a Graph • We can consider the input as a bipartite graph $G = (C, P, E)$, where $E = \{ (c,p) \mid c \in C,\ p \in \pi(c) \}$. • Then, our goal is to find $k$ sets of vertices $P_1,\dots,P_k \subseteq P$ of size $r$ each, and a partition of $C$ into $k$ sets $C_1,\dots,C_k$ s.t. $| E \cap ( C_1 \times P_1 \cup \dots \cup C_k \times P_k ) |$ is maximal.
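To make the objective concrete, here is a minimal Python sketch that, for fixed catalogs, builds the partition of the customers and counts the covered edges. The function name `segment` and the toy data are assumptions made for illustration only.

```python
def segment(interest, catalogs):
    """Assign every customer to its best catalog and count the hits.

    interest: dict customer -> set of pages the customer wants.
    catalogs: list of k sets of pages (each of size r).
    Returns (assignment, hits), where assignment maps each customer to the
    index of its catalog and hits is the number of covered edges (c, p).
    """
    assignment = {}
    hits = 0
    for customer, pages in interest.items():
        # Once the catalogs are fixed, each customer independently takes
        # the catalog that overlaps most with its interests.
        best = max(range(len(catalogs)), key=lambda i: len(pages & catalogs[i]))
        assignment[customer] = best
        hits += len(pages & catalogs[best])
    return assignment, hits

# Toy instance (made up for illustration).
interest = {"c1": {"a", "b"}, "c2": {"c", "d"}, "c3": {"a", "d"}}
assignment, total = segment(interest, [{"a", "b"}, {"c", "d"}])
```

The sketch shows that the hard part is choosing the catalogs: once they are fixed, the optimal partition of the customers is found greedily.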

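  9. Uniform Catalog Problem. Definition: A catalog problem is called uniform if there exists a number $d$ such that the degree of every vertex $p \in P$ is $d$. • The maximum possible number of hits for a uniform catalog problem is $krd$. • Thus, we can normalize the number of hits and define $\mathrm{sat}(G) = \frac{1}{krd} \max | E \cap ( C_1 \times P_1 \cup \dots \cup C_k \times P_k ) |$, where the maximum is over all choices of catalogs and partitions.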

  10. Hardness. Theorem (Kleinberg, Papadimitriou and Raghavan): It is NP-hard to compute the optimal $k$ catalogs exactly.

  11. Approximation. Proposition: Taking the $r$ most popular pages in all $k$ catalogs gives an approximation factor of $1/k$. Proof: In the optimal solution, there is a catalog that gives at least $1/k$ of the hits. Thus, using only this catalog leaves us with at least $1/k$ of the hits. Replacing this catalog by the $r$ most popular pages can only increase the number of hits.
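The proposition can be checked on a tiny instance by comparing the popular-pages baseline with an exhaustive search. The sketch below is an assumption-laden illustration (all function names and the toy data are hypothetical), and the brute force is only feasible for very small inputs.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

def segmentation_hits(interest, catalogs):
    # Each customer receives the catalog that overlaps most with its interests.
    return sum(max(len(pages & cat) for cat in catalogs)
               for pages in interest.values())

def most_popular(interest, r):
    popularity = Counter(p for pages in interest.values() for p in pages)
    ranked = sorted(popularity, key=lambda p: (-popularity[p], p))
    return set(ranked[:r])

def brute_force_optimum(interest, k, r):
    # Try every multiset of k catalogs of size r (exponential; toy sizes only).
    pages = sorted(set().union(*interest.values()))
    candidates = [set(c) for c in combinations(pages, r)]
    return max(segmentation_hits(interest, cats)
               for cats in combinations_with_replacement(candidates, k))

# Toy instance (made up for illustration).
interest = {"c1": {"a", "b"}, "c2": {"c", "d"}, "c3": {"a", "d"}}
k, r = 2, 2
baseline = segmentation_hits(interest, [most_popular(interest, r)] * k)
optimum = brute_force_optimum(interest, k, r)
```

On this instance the baseline scores 4 hits against an optimum of 5, comfortably within the guaranteed factor of $1/k = 1/2$.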

  12. Dense Instances. Kleinberg, Papadimitriou and Raghavan gave an approximation scheme for dense instances, i.e. instances in which each customer is interested in at least an $\varepsilon$ fraction of the pages.

  13. The PCP • A SAT instance $\Psi = (\psi_1,\dots,\psi_n)$ over 2 types of variables: $X$ and $Y$. • The range of the variables $x \in X$ is $R_X = \{0,1\}^l$. • The range of the variables $y \in Y$ is $\{0,1\}$. • Each $\psi_i \in \Psi$ depends on exactly one $x \in X$ and one $y \in Y$, s.t. the value assigned to $x$ determines the value of $y$. Thus, we can write it as a function $\psi_{x \to y} : R_X \to \{0,1\}$.

  14. The PCP (cont.) It is NP-hard to distinguish between the following 2 cases: Good: There exists an assignment $A$ s.t. [formula missing]. Bad: For any assignment $A$, [formula missing].

  15. The Reduction. Given an instance $\Psi$ for the above PCP, let $G$ be the following instance for the 2-catalog segmentation problem: • $P = \{ (x, a, s) \mid x \in X,\ a \in R_X,\ s \in \{0,1\} \}$ • $C = \{ (y, b) \mid y \in Y,\ b \in \{0,1\} \}$ • $(x, a, s) \in \pi((y, b)) \iff \psi_{x \to y} \in \Psi$ and $\psi_{x \to y}(a) = b \oplus s$ • $r = |X|$

  16. Completeness. Theorem: If $\Psi$ is satisfiable then $\mathrm{sat}(G) = 1$. Proof: Let $A$ be a satisfying assignment and consider the following segmentation: • $\forall i \in \{0,1\}$, $P_i = \{ (x, A(x), i) \mid x \in X \}$. • $\forall y \in Y$, $(y, A(y))$ gets $P_0$ and $(y, \neg A(y))$ gets $P_1$. Thus, for every page in the catalogs, all the customers that are interested in it get it, and hence $\mathrm{sat}(G) = 1$.
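The reduction and the completeness argument can be traced on a toy instance. The following Python sketch assumes that pages are interested-in exactly when $\psi_{x \to y}(a) = b \oplus s$ (XOR), which is the reading under which the completeness proof goes through; the encoding of clauses as Python callables, the function names, and the example instance are all hypothetical.

```python
from itertools import product

def reduce_to_2catalog(X, Y, clauses, l):
    """Build the 2-catalog instance from a PCP instance.

    clauses: dict mapping (x, y) -> function psi from R_X = {0,1}^l to {0,1}.
    Returns (pages, interest, r) with interest[(y, b)] the set of pages
    (x, a, s) such that psi(a) == b XOR s.
    """
    R_X = list(product((0, 1), repeat=l))
    pages = [(x, a, s) for x in X for a in R_X for s in (0, 1)]
    interest = {(y, b): set() for y in Y for b in (0, 1)}
    for (x, y), psi in clauses.items():
        for a in R_X:
            for s in (0, 1):
                interest[(y, psi(a) ^ s)].add((x, a, s))
    return pages, interest, len(X)

def completeness_hits(X, Y, interest, A):
    """Hits of the segmentation from the completeness proof."""
    catalogs = [{(x, A[x], i) for x in X} for i in (0, 1)]
    total = 0
    for y in Y:
        total += len(interest[(y, A[y])] & catalogs[0])      # (y, A(y)) gets P_0
        total += len(interest[(y, 1 - A[y])] & catalogs[1])  # (y, not A(y)) gets P_1
    return total

# Toy satisfiable instance with l = 1 (made up for illustration).
X, Y = ["x1", "x2"], ["y1"]
clauses = {("x1", "y1"): lambda a: a[0], ("x2", "y1"): lambda a: 1 - a[0]}
A = {"x1": (1,), "x2": (0,), "y1": 1}

pages, interest, r = reduce_to_2catalog(X, Y, clauses, l=1)
d = sum(1 for (x, _y) in clauses if x == X[0])  # page degree; assumes every x is in d clauses
total = completeness_hits(X, Y, interest, A)
sat = total / (2 * r * d)
```

Each page $(x, a, s)$ is wanted by exactly one customer per clause containing $x$, so the instance is uniform whenever every $x$ appears in the same number of clauses, and the normalized value here is 1.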

  17. Soundness. We would like to show that for every $\varepsilon > 0$, with the remaining parameters chosen as functions of $\varepsilon$, if $\mathrm{sat}(G) > \tfrac12 + \varepsilon$ then there exists an assignment $A$ s.t. [formula missing]. We would like to construct an assignment according to the catalogs. Problem: A catalog might contain many pages for the same $x$ with different assignments.

  18. Refining the PCP. Solution: Changing the PCP. Good: There exists an assignment $A$ s.t. [formula missing]. Bad: For any assignment $A$, [formula missing].

  19. Choosing One Catalog. Now, assume $\mathrm{sat}(G) > \tfrac12 + \varepsilon$. Thus, for one of the catalogs, $P_i'$, [formula missing], and hence [formula missing].

  20. Choosing a Subset of Pages • Let [definition of $P_i''$ missing]. • Thus, $|P_i''| \ge \frac{\varepsilon}{2} |X|$. • Now, let us keep only one page in $P_i''$ for each $x \in X$, and denote the set by $P_i'''$. Then $|P_i'''| \ge 2^{-l} \frac{\varepsilon}{2} |X|$.

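  21. Enforcing the Same $s$ • $\exists s' \in \{0,1\}$ s.t. $|\{ (x, a, s') \mid (x, a, s') \in P_i''' \}| \ge 2^{-(l+1)} \frac{\varepsilon}{2} |X|$, since one of the two values of $s$ accounts for at least half of $P_i'''$. • Denote the set of the corresponding $x$'s by $X'$. • For an appropriate value of $\delta$, $|X'| \ge \delta |X|$.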

  22. Constructing an Assignment. We would like to construct an assignment as follows: • $\forall x \in X'$, assign the value of the appropriate page. • $\forall y \in Y$, if $(y, b)$ gets the catalog $P_i'$, assign the value $b \oplus s'$ to $y$. Thus, $\forall x \in X'$, $\tfrac12 + \varepsilon/2$ of the clauses $\psi_{x \to y}$ are satisfied.

  23. Problem. For a variable $y \in Y$, both $(y, 0)$ and $(y, 1)$ might get the same catalog. Thus, we cannot obtain an assignment to $Y$ as we would like to.

  24. Problem. For a variable $y \in Y$, both $(y, 0)$ and $(y, 1)$ might get the same catalog. Thus, we cannot obtain an assignment to $Y$ as we would like to.

  25. Taking Subsets of $x$'s. Instead of taking one page for each $(x, a, s)$, we take a page for every tuple of: • A subset of $m$ $x$'s • An assignment to the variables in this subset • A bit $s$

  26. The PCP. $\Psi = (\psi_1,\dots,\psi_n)$ over variables $X$ and $Y$, s.t. it is NP-hard to distinguish between: Good: There exists an assignment $A$ s.t. [formula missing]. Bad: For any assignment $A$, [formula missing].

  27. par[,k] - Definitions • For a 3SAT formula over boolean variables Y, let Y(k) be the set of allk-subset of Y, and let (k)be the set of all k- subset of . •  VY(k), let SVbe the set of all assignments to V. •  C(k), let SCbe the set of all satisfying assignments to C.

  28. par[,k] – Definitions (cont.) •  VY(k), C(k), let V  Cif V is a choice of one variable of each clause in C. •  VY(k),C(k), s.t. V  C let a|V denote the natural restriction of an a  SCto SV.

  29. par[,k] Definition:For a 3SAT formula over boolean variables Y, denote by par[,k] the following instance: • There are 2 types of variables: • W : x[V] for every V  Y(k), over SV • Z : x[C] for every C  (k), over SC • There is a local test [C,V] for everyV  C that accepts x[C]|v = x[V].

  30. par[,k] (cont.) Definition: For a set of boolean clauses , letsat()denote the maximal fraction of clauses of  that can be satisfied simultaneously. Theorem: • If sat() = 1 then sat(par[,k]) = 1. • sat(par[, k])  sat()c·kfor some c>0.

  31. Long Code. Definition: An $R$-long-code has one bit for each boolean function $f : [R] \to \{0,1\}$.
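A small Python sketch of the definition: under the standard convention (not spelled out on the slide) the long-code encoding of an element $a \in [R]$ sets the bit indexed by $f$ to $f(a)$. Representing each function by its truth table and indexing $[R]$ from 0 are choices made here for illustration.

```python
from itertools import product

def long_code(a, R):
    """Long-code encoding of a in {0, ..., R-1}.

    Every boolean function f : [R] -> {0,1} is represented by its truth
    table (f(0), ..., f(R-1)); the codeword stores the bit f(a) for each f.
    """
    return {f: f[a] for f in product((0, 1), repeat=R)}

code = long_code(1, 2)  # 2^(2^1)... here R = 2, so there are 2^2 = 4 functions
```

The codeword has $2^R$ bits for a message of $\log_2 R$ bits, which is why the construction is only used for constant-size ranges.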

  32. The PCP of [ST]. For any bipartite graph $G = ([k], [k], E)$ we construct a SAT instance $\Psi(G)$ that contains one boolean function for every choice of: • $z \in Z$ • $v_1,\dots,v_k \in LC[z]$ • $w_1,\dots,w_k \in W$, s.t. $\forall\, 1 \le i \le k$, $w_i \sqsubset z$ • $\forall\, 1 \le i \le k$, $u_i \in LC[w_i]$ • $k^2$ perturbation functions $p_{1,1},\dots,p_{k,k}$

  33. The PCP of [ST] (cont.) • $\psi(v_1,\dots,v_k,u_1,\dots,u_k,p_{1,1},\dots,p_{k,k}) = \mathrm{TRUE} \iff \forall (i,j) \in E$, $v_i \oplus u_j =$ '$v_i \oplus u_j \oplus p_{i,j}$'. • Denote [formula missing].

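  34. The PCP of [ST] (cont.) Theorem: $\forall \varepsilon > 0$, it is NP-hard to distinguish between the following 2 cases: Good: $\forall G = ([k], [k], E)$, $p > (1 - \varepsilon)$-|E| [exponent placement unclear in the transcript]. Bad: $\forall G = ([k], [k], E)$, $p < 2^{-|E|}$.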

  35. Our PCP • A SAT instance $\Psi = (\psi_1,\dots,\psi_n)$ over 2 types of variables: $X$ and $Y$. • The range of the variables $x \in X$ is $R_X = \{0,1\}^l$. • The range of the variables $y \in Y$ is $\{0,1\}$. • Each $\psi_i \in \Psi$ is of the type $\psi_{x \to y} : R_X \to \{0,1\}$.

  36. Our PCP (cont.) • Let $k = l/2$. • Given an instance $\Psi(G)$ as above, we construct an instance $\Psi$ as follows: • There is a variable $x \in X$ for every test $\psi \in \Psi(G)$. An assignment to $x$ is an assignment to the bits $v_1,\dots,v_k,u_1,\dots,u_k$. • $Y = LC[W]$.

  37. Our PCP (cont.) Theorem: $\forall \varepsilon, \delta > 0$ and for some constant $c = c(\varepsilon) > 0$, it is NP-hard to distinguish between: Good: There exists an assignment $A$ s.t. [formula missing]. Bad: For any assignment $A$, [formula missing].

  38. Our PCP (cont.) Lemma: If there exists an assignment $A$ s.t. [formula missing], then there exists a graph $G = (V, U, E)$ and an assignment to $LC[W]$ and $LC[Z]$ s.t. $p \ge 2^{-|E|}$.

  39. Our PCP (cont.) Proof: Assume there exists an assignment $A$ s.t. [formula missing]. We assign the bits of $LC[W]$ the values assigned to them by $A$, and the bits of $LC[Z]$ are assigned random values.

  40. Our PCP (cont.) We now have to construct a graph $G$ that would satisfy the lemma. We call an $x$ good if [formula missing]. Let $x$ be good and let $V_0$, $U_0$ be the corresponding vertices.

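  41. Our PCP (cont.) [Figure: $U_0$ split into $U_1$ and $U_2$; $V_1 \subseteq V_0$.] $U_1$: the set of vertices in $U_0$ that are consistent with $x$. $V_1$: the set of vertices in $V_0$ for which at least $\tfrac12 + \varepsilon/2$ of their edges are consistent with $x$. $U_2 = U_0 \setminus U_1$. $|V_1| \ge \frac{\varepsilon}{2} k$.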

  42. Our PCP (cont.) Proposition: There exists $i \in \{1,2\}$ s.t. $|U_i| \ge \frac{\varepsilon}{4} k$, and at least $\tfrac12 + \varepsilon/4$ of the edges between $U_i$ and $V_1$ are consistent with $x$.

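  43. Our PCP (cont.) [Figure: the sets $U_1$, $V_1$, $U'$ and $V'$.] $U_1$: the set of vertices in $U_0$ that are consistent with $x$. $V_1$: the set of vertices in $V_0$ for which at least $\tfrac12 + \varepsilon/2$ of their edges are consistent with $x$. The remaining vertices form $U_0 \setminus U_1$. $|V_1| \ge \frac{\varepsilon}{2} k$.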

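  44. Our PCP (cont.) [Figure: the sets $U_1$, $V_1$ and $U_2$.] $U_1$: the set of vertices in $U_0$ that are consistent with $x$. $V_1$: the set of vertices in $V_0$ for which at least $\tfrac12 + \varepsilon/2$ of their edges are consistent with $x$. $U_2 = U_0 \setminus U_1$. $|V_1| \ge \frac{\varepsilon}{2} k$.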

  45. Our PCP (cont.) • Let $U' \subseteq U_i$, $V' \subseteq V_1$, s.t. $|U'| = |V'| = \frac{\varepsilon}{4} k$, and at least $\tfrac12 + \varepsilon/4$ of the edges between $U'$ and $V'$ are consistent with $x$. • There are fewer than $2^{2k}$ possibilities to choose $U'$ and $V'$ $\Rightarrow$ there is a subset $X'$ consisting of at least a $2^{-2k}$ fraction of the good $x$'s (and thus of size at least $2^{-2k} \cdot |X|$) with the same choice of $U'$ and $V'$.

  46. Our PCP (cont.) • Let $X''$ be the subset of variables $x \in X'$ that are consistent with the random assignment to $LC[Z]$. • The probability that $A(x)$ is consistent with a random assignment to $LC[Z]$ is $2^{-k}$ $\Rightarrow$ the expected size of $X''$ is $2^{-k} |X'|$. • Therefore, there exists an assignment to $LC[Z]$ s.t. $|X''| \ge 2^{-3k} \cdot |X|$.
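The step from the expectation to the existence of a good assignment is a standard averaging argument. A sketch of the computation, using only the quantities named on this slide:

```latex
\mathbb{E}\bigl[\,|X''|\,\bigr]
  = \sum_{x \in X'} \Pr\bigl[A(x) \text{ is consistent with the random assignment to } LC[Z]\bigr]
  = 2^{-k}\,|X'| .
```

Since a random variable cannot always be below its expectation, some fixed assignment to $LC[Z]$ attains $|X''| \ge 2^{-k}|X'|$; combining this with the lower bound on $|X'|$ from the previous slide gives the stated bound on $|X''|$.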

  47. Our PCP (cont.) • Let $\mathcal{G}$ be the multi-set of all graphs $G = (V', U', E)$ corresponding to the variables $x \in X''$, where $E$ is the set of all edges between $U'$ and $V'$ that are consistent with $x$. • $|\mathcal{G}| \ge 2^{-3k} \cdot |X|$. • $\forall G \in \mathcal{G}$, $|E| \ge (\tfrac12 + \varepsilon/4) (\frac{\varepsilon}{4} k)^2$.

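  48. Our PCP (cont.) Lemma: Let $\mathcal{G}$ be a multi-set of bipartite graphs on $[k'] \times [k']$, s.t. each graph in $\mathcal{G}$ has at least $(\tfrac12 + \varepsilon') k'^2$ edges. Then, $\forall t \le \frac{\varepsilon'}{2} k'^2$, $\exists G = ([k'], [k'], E)$ s.t. $|E| \ge t$ and [formula missing].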

  49. Our PCP (cont.) By the above lemma, for $k' = \frac{\varepsilon}{4} k$ and $\varepsilon' = \varepsilon/2$, $\exists G = ([\frac{\varepsilon}{4} k], [\frac{\varepsilon}{4} k], E)$ s.t. $|E| = t = c' (\frac{\varepsilon}{4} k)^2$, where $c' < \varepsilon/4$, and all the edges of this graph are consistent in at least a $2^{-3k} \cdot (\varepsilon/4)^t$ fraction of the variables in $X$. Considering this graph over the vertex sets $U$ and $V$ gives the desired result.
