
Skyline

Presentation Transcript


  1. Skyline Charuka Silva

  2. Outline • Motivation • Skyline Definition • Applications • Skyline Query • Similar Interesting Problem • Algorithms • Divide and Conquer Algorithm • Index based Algorithm • Nearest Neighbor Charuka Silva, Skyline

  3. Trip to Nassau (Bahamas) • We want a hotel that is cheap and close to the beach. • The two goals conflict, as hotels near the beach tend to be more expensive. • A travel agent can suggest all interesting hotels. • A hotel is interesting if it is not worse than any other hotel in both dimensions. • We call this set of interesting hotels the Skyline.

  4. Distribution of Hotels

  5. Formal Skyline Definition The Skyline is defined as the set of points that are not dominated by any other point. A point dominates another point if it is as good or better in all dimensions and strictly better in at least one dimension.
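The definition above can be sketched directly in code. This is a minimal, quadratic-time illustration, assuming that smaller values are better in every dimension; the names `dominates` and `skyline` and the hotel tuples (price, distance to the beach) are hypothetical and not taken from the slides.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q, assuming smaller is better in every dimension."""
    as_good_everywhere = all(a <= b for a, b in zip(p, q))
    better_somewhere = any(a < b for a, b in zip(p, q))
    return as_good_everywhere and better_somewhere


def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance) pairs.
hotels = [(50, 8), (80, 3), (120, 1), (90, 5), (130, 2)]
print(skyline(hotels))
```

In this made-up data set, (90, 5) is dominated by (80, 3) and (130, 2) is dominated by (120, 1), so only the other three hotels remain.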

  6. Where Does It Apply? The Skyline operator is important for applications involving multi-criteria decision making.

  7. Some Applications • Customer information systems, travel agencies and mobile city guides: the Skyline has to be recomputed as the user moves. • The Skyline of Manhattan, for instance, can be computed as the set of buildings which are high and close to the Hudson River. • Decision support (business intelligence), e.g. customers who buy a lot and complain little. • Data visualization, e.g. determining which points of an object are visible from a certain perspective. • Distributed query optimization, e.g. finding the set of interesting sites which have high computation power and are close to the data needed to execute the query.

  8. Skyline Query • SELECT * FROM Hotels SKYLINE OF price MIN, distance MIN • What else: MAX, joins, GROUP BY and so on.

  9. Skyline Query Results The result of the query is {a, i, k}.

  10. Top-K Queries vs. Skyline • Top-K (or ranked) queries retrieve the best K objects that minimize a specific preference function. • E.g. given the preference function f(x,y) = x + y, the top-3 query retrieves <i,5>, <h,7>, <m,8> (in this order).
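The contrast between the two query types can be made concrete with a small sketch. The coordinates below are hypothetical: the hotel figure is not part of this transcript, so the values were chosen only to reproduce the scores 5, 7 and 8 quoted above and the skyline {a, i, k}. The helper names are likewise assumptions.

```python
import heapq

# Hypothetical (x, y) coordinates, picked to match the scores on the slide.
points = {"a": (1, 9), "i": (3, 2), "h": (4, 3), "m": (6, 2), "k": (9, 1)}


def f(x, y):
    """Preference function from the slide: smaller is better."""
    return x + y


def top_k(k):
    """Return the k best (name, score) pairs in ascending score order."""
    best = heapq.nsmallest(k, points.items(), key=lambda item: f(*item[1]))
    return [(name, f(*xy)) for name, xy in best]


def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_names():
    """Names of the points that no other point dominates."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    )


print(top_k(3))
print(skyline_names())
```

With these values the top-3 result contains h and m, which are both dominated by i, while the skyline contains a and k, which score poorly under f. The top-K answer depends on the chosen preference function; the skyline does not.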

  11. Algorithm 1: Divide-and-Conquer (D&C) • Divide the dataset into several partitions so that each partition fits in memory. • Compute the partial skyline of the points in every partition. • Merge the partial skylines to obtain the full skyline.
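The three steps above can be sketched as follows. This is a simplified illustration rather than the algorithm as presented: it partitions the input into fixed-size chunks instead of partitioning the data space, and it merges by recomputing the skyline over all partial results. The names `dc_skyline` and `partition_size` and the sample points are hypothetical.

```python
def dominates(p, q):
    """p dominates q, assuming smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def local_skyline(points):
    """Quadratic skyline of a set small enough to hold in memory."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dc_skyline(points, partition_size=3):
    # Step 1: divide the data set into partitions (here: simple chunks).
    partitions = [
        points[i:i + partition_size]
        for i in range(0, len(points), partition_size)
    ]
    # Step 2: compute the partial skyline of every partition.
    partials = [local_skyline(part) for part in partitions]
    # Step 3: merge. A global skyline point always survives in its own
    # partition, so the skyline of the union of partial skylines is the answer.
    candidates = [p for part in partials for p in part]
    return local_skyline(candidates)


# Hypothetical sample data.
data = [(1, 9), (2, 10), (4, 8), (9, 1), (3, 2), (6, 2), (4, 3)]
print(dc_skyline(data))
```

A space-based partitioning, as on the following slides, allows the merge step to skip many comparisons; the chunk-based merge here trades that saving for brevity.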

  12. Partitioned Space Partial skylines: {a, c, g}, {d}, {i}, {m, k}

  13. Divide and Conquer • All points in the skyline of s3 must remain. • Those in s2 are discarded, since they are dominated by s3. • Each skyline point in s1 is compared only with points in s3, because no point in s2 or s4 can dominate those in s1.

  14. Drawbacks • D&C is efficient only for small data sets. If the data set is large, the partitioning process requires reading and writing the entire data set at least once, which means high I/O cost. • Not suitable for online applications: it cannot report any results until the partitioning process completes.

  15. Algorithm 2: Index-Based Skyline • Organize the set of d-dimensional points into d lists; a point p = (p1, p2, ..., pd) is assigned to the ith list (1 ≤ i ≤ d) when pi is the smallest of its coordinates. • Points in each list are sorted in ascending order of their minimum coordinate (minC). • A batch in the ith list consists of points that have the same ith coordinate.
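The list organization described above might be sketched like this. The function name `build_batches`, the tie-breaking rule (a point whose minimum occurs in several dimensions goes to the lowest-numbered list) and the sample coordinates are assumptions made for the illustration. The sketch uses plain in-memory lists and says nothing about how the lists are stored on disk.

```python
from itertools import groupby


def build_batches(points):
    """Return d lists; each list is a sequence of batches sorted by minC."""
    d = len(points[0])
    lists = [[] for _ in range(d)]
    for p in points:
        # Assign p to the list of its smallest coordinate (ties: lowest index).
        smallest_dim = min(range(d), key=lambda j: p[j])
        lists[smallest_dim].append(p)

    batched = []
    for i, lst in enumerate(lists):
        # Within list i the minimum coordinate of every point is p[i].
        lst.sort(key=lambda p: (p[i], p))
        # A batch groups the points that share the same i-th coordinate.
        batched.append([list(group) for _, group in groupby(lst, key=lambda p: p[i])])
    return batched


# Hypothetical 2-D points.
pts = [(1, 9), (2, 10), (4, 8), (9, 1), (3, 2), (6, 2), (4, 3)]
for i, batches in enumerate(build_batches(pts), start=1):
    print(f"list {i}: {batches}")
```

For this data the second list contains the batch [(3, 2), (6, 2)], because both points have 2 as their minimum coordinate.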

  16. Index List

  17. Processing a Batch • Compute the skyline inside the batch. • Among the computed points, add those not dominated by any of the already-found skyline points to the skyline list.

  18. Processing a Batch (Example) • Load the first batch of each list (i.e. {a}, {k}) and handle the one with the minimum minC: add {a} to the skyline list. • Compare batches {b} and {k}, and add {k} to the list. • Load {b} and {i,m}; find the skyline inside {i,m} first, which is {i}. • Compare {i} and {b}, and add {i} to the skyline list. • The algorithm stops, since the minC of every remaining batch is greater than or equal to the coordinates of i. • The Skyline is {a,k,i}.
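The batch-processing walkthrough above can be sketched end to end. This is an in-memory illustration under stated assumptions, not the disk-based index itself: the coordinates are hypothetical, ties between lists are broken toward the lower-numbered list (so {b} is handled just before {i,m} here), and an exact duplicate of a skyline point would be treated as dominated by the stopping rule.

```python
from itertools import groupby


def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def build_batches(points):
    d = len(points[0])
    lists = [[] for _ in range(d)]
    for p in points:
        lists[min(range(d), key=lambda j: p[j])].append(p)
    batched = []
    for i, lst in enumerate(lists):
        lst.sort(key=lambda p: (p[i], p))
        batched.append([list(g) for _, g in groupby(lst, key=lambda p: p[i])])
    return batched


def index_skyline(points):
    batches = build_batches(points)
    d = len(batches)
    pos = [0] * d          # next unprocessed batch in each list
    result = []
    while True:
        live = [i for i in range(d) if pos[i] < len(batches[i])]
        if not live:
            break
        # Pick the list whose next batch has the smallest minC.
        i = min(live, key=lambda j: batches[j][pos[j]][0][j])
        min_c = batches[i][pos[i]][0][i]
        # Stop once some skyline point is <= min_c in every coordinate:
        # it then dominates everything that is still unprocessed.
        if any(max(s) <= min_c for s in result):
            break
        batch = batches[i][pos[i]]
        pos[i] += 1
        # Skyline inside the batch, then filter against earlier skyline points.
        inner = [p for p in batch if not any(dominates(q, p) for q in batch)]
        fresh = [p for p in inner if not any(dominates(s, p) for s in result)]
        result.extend(fresh)
    return result


# Hypothetical coordinates for a, b, c, k, i, m, h.
pts = [(1, 9), (2, 10), (4, 8), (9, 1), (3, 2), (6, 2), (4, 3)]
print(index_skyline(pts))
```

Because batches are visited in ascending minC order, skyline points are reported progressively, and the run ends before the batches containing (4, 8) and (4, 3) are ever examined.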

  19. Pros and Cons • The hashing technique (laisser-faire) is straightforward and incurs low CPU overhead. • But it has high I/O cost, since multiple queries access large parts of the space. • Propagate and merge incur high I/O cost, because the to-do list is scanned every time a point is discovered and when finding the best fit to merge.

  20. Algorithm 3: Nearest Neighbor (NN) • Performs an NN query on the R-tree to find the point with the minimum distance from the origin of the axes (point o). • Distances are computed according to the L1 norm. • All the points in the dominance region of that point are excluded from further consideration. • The result of the NN search is used to partition the data universe recursively.

  21. Nearest Neighbor (NN) Two partitions: (i) [0, ix) × [0, ∞) and (ii) [0, ∞) × [0, iy). Partition 1 covers subdivisions 1 and 3; partition 2 covers subdivisions 1 and 2.

  22. Nearest Neighbor (NN) • The partitions resulting from the discovery of a skyline point are inserted into a to-do list. • While the to-do list is not empty, NN removes one of the partitions from the list and recursively repeats the same process.
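The to-do list loop described above can be sketched for the two-dimensional case. Several things here are assumptions made for the illustration: a brute-force scan stands in for the R-tree nearest-neighbor query, each partition [0, bx) × [0, by) is represented by its upper bounds only, and a set of already-reported points handles the duplicates that arise because the two partitions overlap. The coordinates are hypothetical.

```python
import math


def nn_skyline(points):
    """2-D NN skyline sketch; points are (x, y) tuples, smaller is better."""
    result = []
    reported = set()                  # skyline points found so far
    todo = [(math.inf, math.inf)]     # partitions as exclusive upper bounds
    while todo:
        bx, by = todo.pop()
        region = [p for p in points if p[0] < bx and p[1] < by]
        if not region:
            continue
        # Nearest neighbor of the origin under the L1 norm.
        # (Brute force here; the slides assume an R-tree query instead.)
        p = min(region, key=lambda q: q[0] + q[1])
        if p not in reported:
            reported.add(p)
            result.append(p)
        # Split around p: [0, px) x [0, by) and [0, bx) x [0, py).
        todo.append((p[0], by))
        todo.append((bx, p[1]))
    return result


# Hypothetical coordinates.
pts = [(1, 9), (2, 10), (4, 8), (9, 1), (3, 2), (6, 2), (4, 3)]
print(nn_skyline(pts))
```

Any point that dominates the nearest neighbor would lie in the same partition and be closer to the origin, so every point returned by the query is a skyline point; every other skyline point is smaller than it in at least one coordinate and therefore falls into one of the two new partitions.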

  23. Nearest Neighbor (NN) • [0, ax) × [0, ∞): subdivisions 1 and 3. • [0, ix) × [0, ay): subdivisions 1 and 2.

  24. NN Concepts • Laisser-faire: a main-memory hash table stores the skyline points found so far. • Propagate: when a point p is found, all the partitions in the to-do list that contain p are removed and re-partitioned according to p. • Merge: the main idea is to merge partitions in the to-do list, thus reducing the number of queries that have to be performed. • Fine-grained partitioning: the original NN algorithm generates d partitions after a skyline point is found. An alternative approach is to generate 2^d non-overlapping subdivisions.

