
P2P systems: epidemic scheduling, content placement and user profiling

Laurent Massoulié, Thomson, Paris Research Lab. Outline: epidemic schemes for live streaming (rate-optimality, delay-optimality); content placement (optimisation framework, adaptive replication); user profiling.


Presentation Transcript


  1. P2P systems: epidemic scheduling, content placement and user profiling. Laurent Massoulié, Thomson, Paris Research Lab

  2. Outline • Epidemic schemes for live streaming • Rate-optimality • Delay-optimality • Content placement • Optimisation framework • Adaptive replication • User profiling • Spectral clustering • Linear programming

  3. Outline • Epidemic schemes for live streaming • Rate-optimality • Delay-optimality • Content placement • Optimisation framework • Adaptive replication and 3/4-competitiveness • User profiling • Spectral clustering • Linear programming

  4. Context • P2P systems for live streaming on the Internet • PPLive, CoolStreaming, Sopcast, TVants, TVUPlay, Joost…

  5. Network constraints • Graph connecting nodes • Capacities assigned to edges • Achievable broadcast rate [Edmonds, 73]: • Equals maximal number of edge-disjoint spanning trees that can be packed in graph • Coincides with minimum over receivers of max-flow ( = min-cut) between source and receiver
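The min-cut characterisation above can be checked numerically on a small graph. The following is a minimal Python sketch, assuming a directed graph with integer edge capacities stored as a dict of dicts; the function names `max_flow` and `broadcast_rate` and the example graph are illustrative and not taken from the slides.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow on a dict-of-dicts capacity map."""
    # Residual capacities: copy the graph and add reverse edges at capacity 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def broadcast_rate(capacity, source):
    """Minimum over receivers of the max-flow from the source."""
    nodes = set(capacity) | {v for nbrs in capacity.values() for v in nbrs}
    return min(max_flow(capacity, source, r) for r in nodes if r != source)

# Hypothetical three-node example: source "s" and receivers "a", "b".
example = {"s": {"a": 2, "b": 1}, "a": {"b": 1}, "b": {"a": 1}}
rate = broadcast_rate(example, "s")
```

In this example the rate is limited by receiver "b", whose min-cut from the source is 2; two edge-disjoint spanning trees (s→a→b and s→b→a) achieve it.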

  6. Random Useful chunk selection and Edmonds’ theorem [LM, A. Twigg, C. Gkantsidis & P. Rodriguez] • Based on local information • No explicit construction of spanning trees • When injection rate at source is strictly feasible, Markov process is ergodic • Chunks successfully broadcast with bounded delay [Figure: example chunk sets held at neighbouring nodes]

  7. Network with access (node) constraints • Scarce resource: access capacity • Complete communication graph: everyone can send to anyone • Bound on maximum streaming rate λ: let ci = uplink bandwidth of node i • Necessary condition for feasibility: …
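For orientation, a commonly quoted bound of this kind in the uplink-constrained setting is λ ≤ min(c_s, (c_s + Σ c_i) / N), where c_s is the source uplink and the sum runs over the N receivers. The sketch below assumes that form; it is a standard result for this model and is not confirmed by the slide text, and the function name is hypothetical.

```python
def max_streaming_rate(source_uplink, peer_uplinks):
    """Upper bound on the streaming rate under uplink constraints.

    Assumed form (not taken from the slide): the source must upload every
    chunk at least once, and the total upload capacity must cover one copy
    of the stream per receiver.
    """
    n_receivers = len(peer_uplinks)
    if n_receivers == 0:
        raise ValueError("at least one receiver is required")
    total_uplink = source_uplink + sum(peer_uplinks)
    return min(source_uplink, total_uplink / n_receivers)

# Hypothetical capacities, in chunks per second.
bound = max_streaming_rate(4.0, [1.0, 1.0, 2.0])
```

With these numbers the aggregate term, 8/3 chunks per second, is the binding constraint rather than the source uplink.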

  8. Deprived Peer / Random Useful Chunk [LM, A. Twigg, C. Gkantsidis & P. Rodriguez] • Source policy: sends “fresh” packets if any (fresh = not sent yet to anyone) [Figure: sender’s packets 1 2 4 5 7 8; potential receiver 1 holds 1 5 7 8; potential receiver 2 holds 1 4; packet 5 in transit]

  9. Deprived Peer / Random Useful Chunk [LM, A. Twigg, C. Gkantsidis & P. Rodriguez] • Neighborhood management: periodically add random neighbor & suppress least deprived neighbor • Fixed neighborhood sizes [Figure: sender’s packets 1 2 4 5 7 8; potential receiver 1 holds 1 5 7 8; potential receiver 2 holds 1 4 5]
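The selection rule on the two slides above can be sketched in a few lines of Python. This is a minimal sketch, assuming that a neighbour's deprivation is the number of the sender's packets it is missing and that ties are broken uniformly at random; the function name and the data layout (sets of packet indices) are illustrative.

```python
import random

def choose_transfer(sender_packets, neighbor_packets, rng=random):
    """Return (receiver, packet) for one transmission, or None if nothing is useful."""
    # Packets the sender holds that each neighbour is missing.
    useful = {peer: sender_packets - held for peer, held in neighbor_packets.items()}
    candidates = {peer: pkts for peer, pkts in useful.items() if pkts}
    if not candidates:
        return None
    # Most deprived neighbour: the one missing the most of the sender's packets.
    worst = max(len(pkts) for pkts in candidates.values())
    most_deprived = sorted(p for p, pkts in candidates.items() if len(pkts) == worst)
    receiver = rng.choice(most_deprived)
    # Random useful chunk: uniform among packets the receiver is missing.
    packet = rng.choice(sorted(candidates[receiver]))
    return receiver, packet

sender = {1, 2, 4, 5, 7, 8}
neighbors = {"receiver1": {1, 5, 7, 8}, "receiver2": {1, 4}}
receiver, packet = choose_transfer(sender, neighbors, random.Random(0))
```

With these sets, receiver 2 is missing four of the sender's packets against two for receiver 1, so it is selected and receives one of packets 2, 5, 7 or 8.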

  10. Main result • When λ < λ* , Markov process is ergodic. • Hence all packets are received at all nodes after time bounded in probability

  11. Multiple commodities • Several sources s, • Dedicated receiver sets V(s) • Can overlap • Sources are not receivers • Nodes cannot relay commodities they don’t consume …

  12. Multiple commodities • Necessary conditions for feasibility: • Bundled most deprived / random useful: do not distinguish between commodities when • measuring deprivation • choosing random useful packet • System is ergodic when conditions hold with strict inequality

  13. Symmetric Networks (c1 = c2 = ... = cN = 1 chunk / sec) • Previous lower bound reads log2(N) • Achievable [J. Mundinger & R. Weber] • Makes use of log2(N) trees; not robust against churn [Figure: dissemination schedule from the source, with nodes labelled by chunk indices t+1 down to t-3]
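The log2(N) figure can be motivated by a doubling argument: if every node uploads one chunk per second, the number of nodes holding a given chunk can at most double each second. A minimal Python sketch, assuming N counts every node that must hold the chunk, source included (the slide does not state which convention it uses):

```python
def doubling_rounds(n_nodes):
    """Seconds needed for one chunk to reach n_nodes nodes when holders double."""
    holders = 1  # initially only the source holds the chunk
    rounds = 0
    while holders < n_nodes:
        # Each current holder uploads the chunk to one node that lacks it.
        holders = min(n_nodes, 2 * holders)
        rounds += 1
    return rounds

delays = {n: doubling_rounds(n) for n in (4, 8, 16, 32)}
```

For powers of two the result equals log2(N) exactly; for other sizes it is the ceiling of log2(N).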

  14. A look at the corresponding trees N=4 N=8 N=16 N=32

  15. Random target / latest useful packet [Figure: sender’s packets 1 2 4 5 7 8, the receiver’s packets, and the latest useful packet selected for transmission]

  16. Random target / latest useful packet [T. Bonald, LM, F. Mathieu, D. Perino & A. Twigg] • For arbitrary injection rate λ<1 and constant x>0, each peer receives a fraction 1-1/x of packets in time log2(N)+O(x) • I.e., diffusion at rates arbitrarily close to optimal is feasible under optimal delay (plus a constant)
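One transmission step of this scheme can be sketched as follows. This is a minimal Python sketch, assuming the target is drawn uniformly among all other peers and that "latest" means the highest packet index; the function names and the data layout are illustrative.

```python
import random

def latest_useful_packet(sender_packets, receiver_packets):
    """Highest-numbered packet the sender holds and the receiver lacks, or None."""
    useful = sender_packets - receiver_packets
    return max(useful) if useful else None

def push_once(sender, packets_by_peer, rng=random):
    """Send the latest useful packet from `sender` to a random target."""
    targets = sorted(p for p in packets_by_peer if p != sender)
    target = rng.choice(targets)
    packet = latest_useful_packet(packets_by_peer[sender], packets_by_peer[target])
    if packet is not None:
        packets_by_peer[target].add(packet)
    return target, packet

# Hypothetical two-peer state: "a" pushes to the only other peer, "b".
state = {"a": {1, 2, 4}, "b": {1}}
step = push_once("a", state, random.Random(0))
```

Note that the target is chosen before looking at its packets, so a push is wasted whenever the sender holds nothing useful for that target.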

  17. Open questions • Delay optimality in heterogeneous environments • Cost optimality • Convergence time scale

  18. Outline • Epidemic schemes for live streaming • Rate-optimality • Delay-optimality • Content placement • Optimisation framework • Adaptive replication • User profiling • Spectral clustering • Linear programming

  19. Outline • Epidemic schemes for live streaming • Rate-optimality • Delay-optimality • Content placement • Optimisation framework • Adaptive replication • User profiling • Spectral clustering • Linear programming

  20. Problem statement • N users • Storage capacity: m objects • Service capacity: B requests • Local accesses are free • Request rate: specified per object f • Request duration: 1 • Aim: minimize number of lost requests

  21. Optimal placement structure • Let Mf = number of replicas of object f • Schedulable region: request rates xf verifying • Effective arrival rates: times K if objects can be split into K sub-objects of size 1/K

  22. Hot/Warm/Cold partition • Sort objects by decreasing popularity (request rate): 1, 2, … • Replicate everywhere (Mf = N) the most popular objects 1,…,f(1) • Partial replication of objects f(1)+1,…,f(2) • No replication of objects f > f(2) • f(1) and f(2): such that “warm objects” generate requests at rate BN, and all memory is used

  23. Adaptive replication • Replication policy: • Create new replica for object f after each dropped request • Remove object chosen at random • Ignoring object-specific capacity constraints, caricature dynamics: Equilibrium:
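The replication policy above amounts to a one-line update of the stored replicas. A minimal Python sketch, assuming all storage slots in the system are pooled into a single list and that "remove object chosen at random" means evicting a uniformly chosen stored replica; both are simplifying assumptions, and the function name is hypothetical.

```python
import random

def handle_dropped_request(replicas, obj, rng=random):
    """On a dropped request for `obj`, overwrite a random replica with `obj`.

    `replicas` is a list with one entry per storage slot in the system.
    Returns the evicted object so the caller can track replica counts.
    """
    victim = rng.randrange(len(replicas))
    evicted = replicas[victim]
    replicas[victim] = obj
    return evicted

# Hypothetical pool of three slots; a request for "d" has just been dropped.
pool = ["a", "b", "c"]
evicted = handle_dropped_request(pool, "d", random.Random(0))
```

Because evictions are uniform over slots, objects that are requested (and dropped) more often gain replicas at the expense of the others, which is the intuition behind the equilibrium.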

  24. Adaptive replication (ctd) • Compare to full replication of only top popular objects, i.e. • Then reductions to offered rates verify  “Value of foresight” is less than 25%...

  25. Outline • Epidemic schemes for live streaming • Rate-optimality • Delay-optimality • Content placement • Optimisation framework • Adaptive replication • User profiling • Spectral clustering • Linear programming

  26. Outline • Epidemic schemes for live streaming • Rate-optimality • Delay-optimality • Content placement • Optimisation framework • Adaptive replication • User profiling • Spectral clustering • Linear programming

  27. User profiling • Aim: predict tastes of users • Applications: • Further optimization of placement • Recommender Systems

  28. Netflix dataset • 17,770 movies, rated by 480,000 users

  29. The planted partition model • Users partitioned into clusters k=1,…,K • Each pair of users (i,j): conflict level C(i,j) in [0,1] (e.g., fraction of movies rated differently) • Statistical assumptions: • C(i,j) independent over i<j • E(C(i,j)) = bkl D/N if users i, j belong to clusters k, l
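A synthetic conflict matrix matching these assumptions can be generated as follows. This is a minimal Python sketch, assuming each C(i,j) is a Bernoulli variable with mean bkl D/N (the model only requires values in [0,1] with that mean); the function name and the parameter values are illustrative.

```python
import random

def sample_conflicts(cluster_of, b, D, rng):
    """Sample a symmetric 0/1 conflict matrix from the planted partition model."""
    n = len(cluster_of)
    conflicts = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Mean conflict level for the pair, set by the two clusters.
            p = b[cluster_of[i]][cluster_of[j]] * D / n
            if not 0.0 <= p <= 1.0:
                raise ValueError("b[k][l] * D / N must lie in [0, 1]")
            # Independent draws over pairs i < j, mirrored for symmetry.
            value = 1.0 if rng.random() < p else 0.0
            conflicts[i][j] = conflicts[j][i] = value
    return conflicts

# Hypothetical instance: two clusters of three users each.
clusters = [0, 0, 0, 1, 1, 1]
b_matrix = [[0.5, 3.0], [3.0, 0.5]]
C = sample_conflicts(clusters, b_matrix, D=2.0, rng=random.Random(1))
```

Here users in different clusters always conflict (mean 1), while users in the same cluster conflict with probability 1/6.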

  30. A spectral algorithm • Step 1: find suitable “de-noised” descriptors of users • Form normalized eigenvectors x(1),…,x(K) associated to the K largest (in absolute value) eigenvalues of the conflict matrix • To each user i, assign vector zi = (xi(1),…,xi(K))

  31. A spectral algorithm • Step 2: do crude clustering on descriptors • Pick a random set of A users u(1),…,u(A) • Identify the pair with closest descriptors (in L2 norm) and remove one of them, until only K users are left, say v(1),…,v(K) • Cluster the nodes according to proximity of their descriptors to the cluster exemplars v(1),…,v(K)
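Step 2 can be sketched in plain Python. This is a minimal sketch, assuming the descriptors from Step 1 are already available as coordinate tuples and that, within the closest pair, the second user is the one removed (the slide leaves that choice open); the function name is illustrative.

```python
import math
import random

def crude_clustering(descriptors, K, A, rng=random):
    """Return a cluster label in range(K) for every user."""
    # Pick a random set of A candidate exemplars (user indices).
    sample = rng.sample(range(len(descriptors)), A)

    def dist(i, j):
        return math.dist(descriptors[i], descriptors[j])

    # Repeatedly drop one user of the closest pair until K exemplars remain.
    while len(sample) > K:
        _, _, second = min(
            (dist(u, v), u, v)
            for idx, u in enumerate(sample)
            for v in sample[idx + 1:]
        )
        sample.remove(second)

    # Assign every user to the exemplar with the nearest descriptor.
    return [
        min(range(K), key=lambda k: dist(i, sample[k]))
        for i in range(len(descriptors))
    ]

# Hypothetical descriptors forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = crude_clustering(points, K=2, A=4, rng=random.Random(0))
```

Removing one user from the closest pair prunes near-duplicate candidates, so the K survivors tend to come from distinct clusters as long as the sample covers every cluster.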

  32. Theorem • Assume that • Fixed number K of clusters, each of size Θ(N) • Matrix (bkl) has full rank K • D ≥ C log(N) for some constant C • Then with probability 1-o(1), the algorithm correctly partitions a fraction 1-o(1) of nodes for suitable A (1 << A << D^(1/2)) • Main tool: control of spectral structure of E-R graph adjacency matrix when average degree D ≥ C log(N) [Feige-Ofek]

  33. Open question • Brute force Maximum Likelihood: retrieves clusters when D>>1 • Efficient procedure under this assumption?

  34. Another algorithmic version of Netflix • Objective: for user n, find inference of all unknown ratings that maximizes number of users fully agreeing with user n  NP-hard (badly so) • Probabilistic model • Users belong to clusters k=1,…,K, with sizes a(k) N • Within a cluster, identical ratings (i.i.d., +1 or -1 w.p. ½ for each movie, F movies in total) • Each rating of each user: revealed w.p. p

  35. Proposed algorithm (inspiration: compressive sensing; see [Decoding by linear programming, Candes & Tao]) • Consider user 1 • For suitable cost function g, determine full rating vectors X(n), compatible with known ratings (i.e. PnX(n) = Y(n)), that minimize • A proxy to (intractable) minimization of

  36. Conditions for optimality • Assume optimum of (II): “clustered” reconstruction X**(n) such that X**(n) = X**(1) for all indices n ∈ A • Then optimum of (I) such that X*(n) = X*(1), n ∈ A, provided:

  37. Application to probabilistic model • Necessary condition for hidden cluster to be optimal: • Sufficient condition for LP algorithm to retrieve hidden cluster, under choice g = |·|: • Differ by factor at most K-1

  38. Outlook • Clustering • Robustness of proposed schemes to statistical modeling assumptions • Efficient (distributed?) implementations
