
Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Jun Xu

Randomized Multi-pass Streaming Skyline Algorithm. Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Jun Xu. Georgia Tech. VLDB 2009. In one sentence: “We develop a streaming algorithm for skyline problem with near-optimal worst-case guarantee.”


Presentation Transcript


  1. Randomized Multi-pass Streaming Skyline Algorithm. Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Jun Xu. Georgia Tech. VLDB 2009

  2. In one sentence ….

  3-5. “We develop a streaming algorithm for skyline problem with near-optimal worst-case guarantee.”

  6. What is skyline?

  7. I want a cheap hotel nearby

  8. I want a cheap hotel nearby dominates

  10-11. [Figure: the hotels de la Cite, Mercure, Park & Suites, Athena and du Helder plotted on the axes Price and Distance]

  12. Problem definition
      • Given distinct d-dimensional points
      • (a1, …, ad) dominates (b1, …, bd) if ai ≤ bi for all i and ai’ < bi’ for some i’
      • Skyline = set of undominated points
      Example: among the points (1, 3), (5, 2), (3, 2), the point (3, 2) dominates (5, 2), so Skyline = { (1, 3), (3, 2) }
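
The definition on slide 12 can be checked with a few lines of Python. This is only a minimal in-memory sketch: the function names `dominates` and `skyline` are chosen here for illustration, and the quadratic scan is a reference implementation, not the streaming algorithm the talk develops.

```python
def dominates(a, b):
    """a dominates b: a_i <= b_i for all i, and a_i < b_i for some i."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Quadratic-time reference: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


example = [(1, 3), (5, 2), (3, 2)]   # the points from slide 12
print(skyline(example))              # [(1, 3), (3, 2)]
```

Because dominance requires a strict inequality in at least one coordinate, a point never dominates itself, so the inner loop does not need to skip `q == p`.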

  13. Skyline algorithms
      • RAM: DD&C (Kung et al., FOCS’75); LD&C (Bentley et al., JACM’78); FLET (Bentley et al., SODA’90)
      • Disk (External), Non-preprocessing: SD&C (Borzsonyi et al., ICDE’01); BNL (Borzsonyi et al., ICDE’01); SFS (Chomicki et al., ICDE’03); LESS (Godfrey et al., VLDB’05)
      • Disk (External), Preprocessing: BBS (Papadias et al., SIGMOD’03); NN (Kossmann et al., VLDB’02)

  14. Our Goal “Non-preprocessing external algorithm with worst-case guarantee” What is the model of external algorithms?

  15. Models for external algorithms
      • CPU processing ≠ I/O
      • Sequential I/O ≠ Random I/O
      • Multi-pass Streaming Model: # of random I/O’s = # of passes
      • Streaming model naturally forces us to minimize the number of random I/O’s

  16. What is multi-pass stream?

  17-22. Multi-pass Streaming model [Figure: a huge hard disk holds the points (1, 2), (3, 7), (5, 3), (2, 5), (4, 1), (9, 9), which are read into a small RAM; the animation steps through the data and then shows a 2nd pass and a 3rd pass]
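
The model pictured on slides 17-22 can be mimicked with a small wrapper that only allows front-to-back scans and counts them. The class `MultiPassStream` and its `scan` method are hypothetical names introduced for this sketch, and the Python list merely stands in for the hard disk.

```python
class MultiPassStream:
    """Hypothetical wrapper: data is only readable front to back, and every
    full scan counts as one pass (one random I/O in the slides' model)."""

    def __init__(self, points):
        self._points = list(points)   # stands in for the huge hard disk
        self.passes = 0

    def scan(self):
        self.passes += 1
        for p in self._points:
            yield p


stream = MultiPassStream([(1, 2), (3, 7), (5, 3), (2, 5), (4, 1), (9, 9)])

# Pass 1: count the points using O(1) memory.
n = sum(1 for _ in stream.scan())
# Pass 2: remember only the point with the smallest first coordinate.
best = min(stream.scan(), key=lambda p: p[0])
print(n, best, stream.passes)   # 6 (1, 2) 2
```

The cost measure of interest is `stream.passes` together with how much state the caller keeps between points, which is what "small RAM" refers to.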

  23. Our Goal “Non-preprocessing external algorithm with worst-case guarantee”, where “external” now means “streaming”

  24. Main results
      • Theory: RAND uses O(log n) passes & O(m) space; every algorithm that uses 1 pass needs Ω(n) space (n = # of points and m = skyline size)
      • RAND: Almost optimal multi-pass streaming algorithm for skyline
      • Next: RAND algorithm
      • Later: Experimental result

  25. RAND algorithm

  26. Algorithms: Main Idea Suppose m is known. Theorem: In 3 passes and m space, we can find skyline points that “dominate” at least n/2 points, with high probability

  27-40. Eliminate-Points algorithm
      1. Sample x = 2m ln(mn log n) points p1, p2, …, px
      2. Go through the stream, replace each pi by a point dominating it
      3. For each pi, delete pi and all points it dominates
      Output p1, p2, …, px and repeat
      Example stream: (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4). The animation follows one sampled point: (4, 4) is replaced by (3, 4), which is then replaced by (3, 3)
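
The three steps of Eliminate-Points can be sketched in Python as follows. Several things here are assumptions of the sketch rather than statements from the slides: plain lists stand in for the stream, sampling is uniform with replacement, the inner logarithm in the sample size is taken base 2, and all function names are hypothetical. The last pass also adds a safety check that the slides do not mention: a sample is reported as a skyline point only if no stream point dominates it.

```python
import math
import random


def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sample_size(m, n):
    """x = 2 m ln(m n log n); log base 2 is assumed here, and n >= 2."""
    return math.ceil(2 * m * math.log(m * n * math.log2(n)))


def replace_pass(stream, samples):
    """Step 2: one scan; a sample is replaced by any stream point dominating it."""
    current = list(samples)
    for q in stream:
        current = [q if dominates(q, p) else p for p in current]
    return current


def delete_pass(stream, samples):
    """Step 3: one scan; drop the samples and every point they dominate.
    Added check (not on the slides): only undominated samples are confirmed."""
    confirmed, survivors = set(samples), []
    for q in stream:
        confirmed -= {p for p in samples if dominates(q, p)}
        if q not in samples and not any(dominates(p, q) for p in samples):
            survivors.append(q)
    return confirmed, survivors


def repeat_until_empty(points, x, seed=0):
    """Run sample / replace / delete rounds until no point is left."""
    rng = random.Random(seed)
    stream, found = list(points), set()
    while stream:
        samples = [rng.choice(stream) for _ in range(x)]       # step 1
        samples = list(set(replace_pass(stream, samples)))     # step 2
        confirmed, stream = delete_pass(stream, samples)       # step 3
        found |= confirmed
    return found


stream = [(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)]
print(replace_pass(stream, [(4, 4)]))   # [(3, 3)], via (3, 4) as on the slides
print(delete_pass(stream, [(3, 3)]))    # ({(3, 3)}, [(1, 5)])
```

Each round removes at least the sampled points from the stream, so `repeat_until_empty` terminates; deleting only dominated points and confirmed skyline points keeps the skyline of the remaining stream intact.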

  41. Analysis Theorem: Eliminate-Points algorithm deletes at least n/2 points with high probability

  42-46. Analysis
      • Draw trees: Each point points to its first dominating point
      • Stream: (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4) [Figure: trees over the nodes (1, 5), (3, 3), (4, 5), (3, 4), (4, 3), (4, 4); slide 45 marks the point (4, 4) and slide 46 marks the point (3, 3)]
      • Note: There will be m trees, each rooted at a skyline point

  47. Analysis • Claim: If some element of a tree is sampled, that tree will be deleted [Figure: the same trees over the stream (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4), with the point (3, 3) marked]
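
The trees used in the analysis can be computed directly for the example stream. The sketch below reads "first dominating point" as the earliest point in stream order that dominates a given point; that reading, and the names `first_dominator_forest` and `root`, are assumptions made for illustration.

```python
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def first_dominator_forest(stream):
    """Map each point to the earliest stream point dominating it (None for roots)."""
    return {p: next((q for q in stream if dominates(q, p)), None) for p in stream}


def root(parent, p):
    """Follow parent pointers up to the root of p's tree."""
    while parent[p] is not None:
        p = parent[p]
    return p


stream = [(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)]
parent = first_dominator_forest(stream)
roots = [p for p in stream if parent[p] is None]

print(roots)                  # [(1, 5), (3, 3)]
print(parent[(4, 4)])         # (3, 4)
print(root(parent, (4, 4)))   # (3, 3)
```

Under this reading the roots are exactly the undominated points, which matches the note that there are m trees, each rooted at a skyline point. In this example the path from (4, 4) to its root also matches the replacement chain (4, 4), (3, 4), (3, 3) shown in the Eliminate-Points animation.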

  48-49. Analysis • There are m trees, each rooted at a skyline point [Figure: trees labeled 1, 2, …, m-1, m]

  50. Analysis • A bigger tree has a bigger chance of being sampled … and deleted [Figure: trees labeled 1, 2, …, m-1, m]
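
One way to make "bigger tree, bigger chance" quantitative is sketched below. It assumes the x samples are drawn uniformly and independently (with replacement) from the n points; the symbols T and s = |T| are introduced here, and this is a plausible reconstruction rather than the proof given in the paper.

```latex
\begin{align*}
\Pr[\text{no sample lands in } T]
  &= \left(1 - \frac{s}{n}\right)^{x} \le e^{-xs/n}, \qquad s = |T|,\\
s \ge \frac{n}{2m},\quad x = 2m\ln(mn\log n)
  &\;\Longrightarrow\; e^{-xs/n} \le e^{-\ln(mn\log n)} = \frac{1}{mn\log n},\\
\sum_{T:\,|T| < n/(2m)} |T| &< m\cdot\frac{n}{2m} = \frac{n}{2}.
\end{align*}
```

Trees smaller than n/(2m) therefore hold fewer than n/2 points in total, so the remaining trees hold more than n/2 points. A union bound over at most m such trees shows that all of them are sampled except with probability at most 1/(n log n), and by the claim on slide 47 every sampled tree is deleted.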
