
Count / Top-k Continuous Queries on P2P Networks



Presentation Transcript


  1. Count / Top-k Continuous Queries on P2P Networks 01/11/2006

  2. Outline • Problem Definition • P2P Architecture • Count • Top-K • Experiment Setup • Future Work

  3. Streaming Data in P2P • P2P • Dynamic changing topology, large scale, … • Streaming data • Continuous, unbounded, rapid, time-varying, noise • P2P + Streaming data • Dynamic in both data and topology

  4. Objective and Goal • Objective • Issue a continuous query to estimate count and top-K • Goal • Reduce communication cost • Lightweight maintenance • Approximate answers • An adaptive and progressive approach

  5. Naïve approach • Flood the overlay continuously • Pros • Closer to the exact answer • Cons • Network congestion • Still not real-time

  6. The State-of-the-Art • Count • Focus on one-time answer in P2P • Deal with streaming data only • Top-K • P2P environment without streaming data • Distributed environment not P2P

  7. P2P architecture • Assumption • Hierarchical P2P (focus of this work) • Super-peer hierarchical structure • The query issuer is a super-peer • Super-peers connect to other super-peers • Each peer belongs to only one super-peer • Pure unstructured P2P

  8. Big picture • Group: accumulate information within a group based on the constraint and statistics • Report changes • SetConstraint • Approximated answer

  9. Group in hierarchical P2P Coordinator Issuer Peer

  10. Group in hierarchical P2P 1 3 2 4

  11. Group in hierarchical P2P 1 3 2 3 4 4

  12. Group in hierarchical P2P 1 3 2 3 4 4

  13. After partition • Assume we have N objects and K groups after partition (figure: Group1, Group2, Group3)

  14. User-specified Epsilon • User-specified ε (precision) (figure: Group1, Group2, Group3)

  15. Consider a group (figure: objects O1, O2, O3; nodes P1, P2, P3, P4; coordinator)

  16. Each node maintains distribution information for the objects it owns (figure: per-node chart of # against object, with rates R1, R2, R3, R4 for P1, P2, P3, P4)

  17. At initial - Polling P2 P3 P1 P4 Node Coordinator

  18. At initial - Polling P2 P3 P1 P4 Node Coordinator

  19. Information at coordinator after polling (figure: chart of # against object at the coordinator, values 26, 33, 22; peers P1, P2, P3, P4)

  20. Statistics information • Each cell pairs the latest real value with the estimated value, written as two numbers separated by "/" • Δ: change value for each object • R: maximum changing rate (+/-) of objects in each peer • T: updated time stamp
            P1      P2      P3      P4      Δ
      O1    1/1     6/6     10/10   5/5     22
      O2    11/11   13/13   5/5     9/7     36
      O3    15/15   6/6     3/3     9/9     33
      R     0.3     0.2     -0.05   0.6
      T     15      15      17      13

  21. Update to Coordinator (Δ13, Δ23, Δ33) (Δ11, Δ21, Δ31) (Δ12, Δ22, Δ32) T2

  22. Calculate Count
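Slide 22 carries no formula in this transcript. The following is a minimal sketch, assuming the count of each object is the sum, over all peers, of the second value in each cell of the slide 20 table; that reading reproduces the last column shown there (22, 36, 33). The names `table` and `count_per_object` are illustrative and do not come from the slides.

```python
# Per-peer cells from slide 20, written as (first, second) value pairs.
# Assumption: the coordinator sums the second value of each cell.
table = {
    "O1": {"P1": (1, 1), "P2": (6, 6), "P3": (10, 10), "P4": (5, 5)},
    "O2": {"P1": (11, 11), "P2": (13, 13), "P3": (5, 5), "P4": (9, 7)},
    "O3": {"P1": (15, 15), "P2": (6, 6), "P3": (3, 3), "P4": (9, 9)},
}


def count_per_object(table):
    """Sum the second value of every peer's cell, object by object."""
    return {
        obj: sum(second for _, second in cells.values())
        for obj, cells in table.items()
    }


counts = count_per_object(table)
print(counts)  # {'O1': 22, 'O2': 36, 'O3': 33}
```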

  23. Redistribute Epsilon • $w_i = \max(\Delta_i) / C_{x,0}$, where x is the i-index of $\max(\Delta_i)$ • $\delta_i = w_i \, \varepsilon \, C_{x,0} / \sum_j w_j$
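The two formulas on slide 23 can be sketched as below. This is one possible reading, not the authors' implementation: it assumes Δi is the set of per-object changes reported by peer i, that C(x,0) is a baseline count of the object x with the largest change, and that "largest" means largest in absolute value. The function name, the data layout, and the equal-weight fallback when nothing changed are all assumptions added for illustration.

```python
def redistribute_epsilon(deltas, baseline, epsilon):
    """Return one threshold delta_i per peer.

    deltas[i][x]: change of object x reported by peer i (assumed layout).
    baseline[x]:  C_{x,0}, read here as the baseline count of object x.
    epsilon:      user-specified precision.
    """
    weights, bases = [], []
    for changes in deltas:
        # x is the object with the largest absolute change at this peer.
        x = max(changes, key=lambda obj: abs(changes[obj]))
        weights.append(abs(changes[x]) / baseline[x])
        bases.append(baseline[x])
    total = sum(weights)
    if total == 0:
        # Nothing changed anywhere: fall back to equal weights (assumption).
        weights = [1.0] * len(deltas)
        total = float(len(deltas))
    # delta_i = w_i * epsilon * C_{x,0} / sum_j w_j
    return [w * epsilon * c / total for w, c in zip(weights, bases)]


# Made-up numbers, purely for illustration.
deltas = [{"O1": 2, "O2": 1}, {"O1": 0, "O2": 6}]
baseline = {"O1": 20, "O2": 30}
thresholds = redistribute_epsilon(deltas, baseline, epsilon=0.1)
print(thresholds)  # roughly [0.667, 2.0]
```

Under this reading, a peer whose objects change faster relative to their baseline receives a larger share of the error budget.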

  24. Visiting sequence • Pick the peers that would violate δ (figure: peers P1, P2, P3, P4)

  25. Update information (group)
            P1      P2      P3      P4      Δ
      O1    1/1     6/6     10/10   8/8     -
      O2    11/11   11/11   5/5     6/6     -
      O3    15/15   5/5     3/3     11/11   -
      R     0.3     0.4     -0.05   0.2
      T     15      30      17      33

  26. For those nodes not being visited (group)
            P1      P2      P3      P4      Δ
      O1    1/2     6/6     10/9    8/8     25
      O2    11/13   11/11   5/4     6/6     34
      O3    15/18   5/5     3/2     11/11   36
      R     0.3     0.4     -0.05   0.2
      T     15      30      17      33

  27. Un-notified Leave • Ping shows that P1 is dead • Remove P1's information (figure: peers P1, P2, P3, P4)

  28. Experiment Setup • Generate synthetic data sets from statistical distributions for • Streaming data • Lifetime of peers • Metrics • Message size • Communication cost • Response latency • Result accuracy
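The handling of an un-notified leave can be sketched as follows. This is a minimal illustration, assuming the coordinator keeps its per-peer statistics in a dictionary and has some way to ping a peer; `drop_dead_peers` and the `is_alive` callback are hypothetical names, and no real network call is made.

```python
def drop_dead_peers(stats, is_alive):
    """Remove every peer whose ping fails from the coordinator's statistics.

    stats:    dict mapping peer id -> stored information (assumed layout).
    is_alive: hypothetical callback standing in for a ping; returns bool.
    Returns the list of peers that were removed.
    """
    dead = [peer for peer in stats if not is_alive(peer)]
    for peer in dead:
        del stats[peer]
    return dead


# Example mirroring slide 27: P1 no longer answers the ping.
stats = {"P1": {"O1": 1}, "P2": {"O1": 6}, "P3": {"O1": 10}, "P4": {"O1": 8}}
removed = drop_dead_peers(stats, is_alive=lambda peer: peer != "P1")
print(removed)        # ['P1']
print(sorted(stats))  # ['P2', 'P3', 'P4']
```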

  29. Top-K • Use regression to predict the likely trend of changes • Once an updated result is required, the super-peer only needs to ask the doubtful peers about the doubtful objects • Update its counting list, and return the top k objects
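The regression step on slide 29 could look like the sketch below. The slide does not say which regression is used, so this assumes an ordinary least-squares line fitted to each object's recent (time, count) samples; `linear_trend`, `predicted_top_k`, and the sample data are illustrative. The step of querying doubtful peers is not modelled here.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for a list of (time, count) pairs."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        # A single time point gives no trend; treat the count as constant.
        return 0.0, mean_c
    cov = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    slope = cov / var_t
    return slope, mean_c - slope * mean_t


def predicted_top_k(history, now, k):
    """Rank objects by their extrapolated count at time `now`."""
    predictions = {}
    for obj, samples in history.items():
        slope, intercept = linear_trend(samples)
        predictions[obj] = slope * now + intercept
    return sorted(predictions, key=predictions.get, reverse=True)[:k]


# Made-up histories: O1 is rising, O2 is falling, O3 is flat.
history = {
    "O1": [(0, 10), (1, 12), (2, 14)],
    "O2": [(0, 30), (1, 25), (2, 20)],
    "O3": [(0, 5), (1, 5), (2, 5)],
}
print(predicted_top_k(history, now=4, k=2))  # ['O1', 'O2']
```

Although O2 has the largest latest count, the fitted trend puts O1 ahead at time 4, which is the kind of change a trend-based ranking is meant to anticipate.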

  30. Future Work • Connect and recommend latent good friends for each user • Good friends: those with the same interests (behaviors) • Exploit currently connected peers to discover good friends incrementally • Design a system that forms clusters reflecting the current interests of individual peers and connects them based on their similarity, using each user's social network

  31. Advantages • Reduce search time and query traffic by using a friends list • By exploiting the differing strengths of their arcs/edges/ties (that is, the degree of friendship), social networks outperform random-walk networks at quickly finding target objects

  32. Example Level 1 Level 2

  33. Example has larger weight than Score(Ni) Similarity Score(Ni)
