Geometry Approach for k-Regret Query (ICDE 2014)

PENG Peng, Raymond Chi-Wing Wong

CSE, HKUST

Outline
  • 1. Introduction
  • 2. Contributions
  • 3. Preliminary
  • 4. Related Work
  • 5. Geometry Property
  • 6. Algorithm
  • 7. Experiment
  • 8. Conclusion
1. Introduction
  • Multi-criteria Decision Making:
    • Design a query that returns a number of “interesting” objects to the user
  • Traditional queries:
    • Top-k queries
    • Skyline queries
1. Introduction
  • Top-k queries
    • Utility function $f$
    • Given a particular utility function $f$, the utility of every point in $D$ can be computed.
    • The output is a set of k points with the highest utilities.
  • Skyline queries
    • No utility function is required.
    • A point is said to be a skyline point if it is not dominated by any other point in the dataset.
    • Assume that a greater value in an attribute is more preferable.
    • We say that $q$ is dominated by $p$ if and only if $p[i] \ge q[i]$ for each $i \in [1, d]$ and there exists an $i$ such that $p[i] > q[i]$.
    • The output is a set of skyline points.
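
The two traditional queries above can be sketched in a few lines. This is a minimal illustration, not code from the presentation: the linear utility function, the function names and the sample points are all assumptions made here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def top_k(points: Sequence[Point], weights: Sequence[float], k: int) -> List[Point]:
    """Return the k points with the highest utility under f(p) = weights . p.

    The linear form of f is an assumption made for this sketch.
    """
    def utility(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))

    return sorted(points, key=utility, reverse=True)[:k]


def dominates(p: Point, q: Point) -> bool:
    """True if p is at least as good as q everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points that are not dominated by any point in the dataset."""
    return [q for q in points if not any(dominates(p, q) for p in points)]


# Hypothetical 2-d data; larger values are better in both attributes.
points = [(1.0, 0.2), (0.8, 0.8), (0.2, 1.0), (0.5, 0.5)]
best = top_k(points, weights=(0.5, 0.5), k=1)
sky = skyline(points)
```

Note that `top_k` needs the weights as input while `skyline` does not, and that the caller controls the size of `best` but not the size of `sky`.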
Limitations of traditional queries
  • Traditional Queries
    • Top-k queries
      • Advantage: the output size is given by the user and it is controllable.
      • Disadvantage: the utility function is assumed to be known.
    • Skyline queries
      • Advantage: there is no assumption that the utility function is known.
      • Disadvantage: the output size cannot be controlled.
  • Query recently proposed in VLDB 2010
    • k-regret queries
      • Advantage: There is no assumption that the utility function is known and the output size is given by the user and is controllable.
2. Contributions
  • We give some theoretical properties of k-regret queries
    • We give a geometric explanation of a k-regret query.
    • We define happy points, which serve as candidate points for the k-regret query.
    • Significance: existing algorithms, and new algorithms yet to be developed, can use happy points to find the solution of the k-regret query more efficiently and more effectively.
  • We propose two algorithms for answering a k-regret query
    • GeoGreedy algorithm
    • StoredList algorithm
  • We conduct comprehensive experimental studies
3. Preliminary
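  • Notations in k-regret queries
    • Database $D$ and output set $S \subseteq D$.
    • Utility function $f$, which maps each point $p$ to a non-negative utility $f(p)$; $F$ denotes the set of utility functions considered.
      • [Example with 3 utility functions; formulas missing from the transcript]
    • Maximum utility of a set $S$ under $f$: $\max_{p \in S} f(p)$.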
3. Preliminary
  • Notations in k-regret queries
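    • Regret ratio: $rr(S, f) = 1 - \frac{\max_{p \in S} f(p)}{\max_{p \in D} f(p)}$.

Measures how bad a user with utility function $f$ feels after receiving the output $S$.

If it is 1, the user feels bad; if it is 0, the user feels happy.

[Example: regret ratios of the output for the 3 utility functions; of the values, only the fragment ".029" survives in the transcript]

    • Maximum regret ratio: $mrr(S) = \max_{f \in F} rr(S, f)$.

Measures how bad a user feels after receiving the output $S$, in the worst case over all utility functions in $F$.

A user feels better when $mrr(S)$ is smaller.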
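
The regret ratio and the maximum regret ratio can be sketched as follows. This is a minimal illustration that assumes the standard definition from the VLDB 2010 work cited later (one minus the best utility in S divided by the best utility in D), linear utility functions given as weight vectors, and a finite list F. The data and all names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Weights = Tuple[float, ...]


def utility(w: Weights, p: Point) -> float:
    """Linear utility f(p) = w . p (the linear form is an assumption)."""
    return sum(a * b for a, b in zip(w, p))


def regret_ratio(D: Sequence[Point], S: Sequence[Point], w: Weights) -> float:
    """1 - (best utility in S) / (best utility in D); assumes the latter is > 0."""
    best_in_D = max(utility(w, p) for p in D)
    best_in_S = max(utility(w, p) for p in S)
    return 1.0 - best_in_S / best_in_D


def max_regret_ratio(D: Sequence[Point], S: Sequence[Point], F: Sequence[Weights]) -> float:
    """Worst regret ratio over a finite set F of utility functions."""
    return max(regret_ratio(D, S, w) for w in F)


# Hypothetical data: S misses the balanced point (0.8, 0.8).
D = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.8)]
S = [(1.0, 0.0), (0.0, 1.0)]
F = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
mrr = max_regret_ratio(D, S, F)  # driven by the balanced utility (0.5, 0.5)
```

A user who cares about a single attribute has regret 0 here, while the user with balanced weights gets utility 0.5 instead of 0.8, which determines the maximum.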
3. Preliminary

Problem Definition

    • Given a d-dimensional database of size n and an integer k, a k-regret query finds a set $S$ of at most $k$ points such that the maximum regret ratio $mrr(S)$ is minimized.
    • The maximum regret ratio of the optimal solution is the smallest value that any such set can achieve.
  • Example
    • Given a set of 4 points, each represented as a 2-dimensional vector.
    • A 2-regret query on these 4 points selects 2 of them as the output such that the maximum regret ratio of the selected points is the smallest among all selections.
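
The problem definition can be made concrete with a brute-force sketch that tries every subset of size k. It is for illustration only: it approximates the function space by a small finite list of linear utility functions, which is an assumption of this sketch, and the four points are hypothetical rather than the ones on the slide.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def utility(w: Sequence[float], p: Point) -> float:
    return sum(a * b for a, b in zip(w, p))


def k_regret_bruteforce(D: Sequence[Point], k: int,
                        F: Sequence[Sequence[float]]) -> Tuple[List[Point], float]:
    """Return a subset of size min(k, |D|) with the smallest maximum regret ratio.

    Adding points never increases the regret, so subsets smaller than k
    need not be examined. F is a finite stand-in for the function space.
    """
    best_in_D = [max(utility(w, p) for p in D) for w in F]
    best_set: List[Point] = []
    best_mrr = float("inf")
    for subset in combinations(D, min(k, len(D))):
        mrr = max(
            1.0 - max(utility(w, p) for p in subset) / top
            for w, top in zip(F, best_in_D)
        )
        if mrr < best_mrr:
            best_set, best_mrr = list(subset), mrr
    return best_set, best_mrr


# Hypothetical 4-point, 2-dimensional instance of a 2-regret query.
D = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.8), (0.5, 0.5)]
F = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
answer, answer_mrr = k_regret_bruteforce(D, 2, F)
```

The number of subsets grows combinatorially with n and k, which is why the presentation turns to a greedy method.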
4. Related Work
  • Variations of top-k queries
    • Personalized Top-k queries (Information Systems 2009)

- Partial information about the utility function is assumed to be known.

    • Diversified Top-k queries (SIGMOD 2012)

- The utility function is assumed to be known.

    • No assumption on the utility function is made for a k-regret query.
  • Variations of skyline queries
    • Representative skyline queries (ICDE 2009)

- The importance of a skyline point changes when the data is contaminated.

    • K-dominating skyline queries (ICDE 2007)

- The importance of a skyline point changes when the data is contaminated.

    • We do not need to consider the importance of a skyline point in a k-regret query.
  • Hybrid queries
    • Top-k skyline queries (OTM 2005)

- The importance of a skyline point changes when the data is contaminated.

    • $\epsilon$-skyline queries (ICDE 2008)

- No bound is guaranteed and it is unknown how to choose $\epsilon$.

    • The maximum regret ratio used in a k-regret query is bounded.
4. Related Work
  • k-regret queries
    • Regret-Minimizing Representative Databases (VLDB 2010)
      • First proposes the k-regret query;
      • Proves a worst-case upper bound and a lower bound for the maximum regret ratio of k-regret queries;
      • Proposes the best-known fastest algorithm for answering a k-regret query.
    • Interactive Regret Minimization (SIGMOD 2012)
      • Proposes an interactive version of the k-regret query and an algorithm to answer a k-regret query.
    • Computing k-Regret Minimizing Sets (VLDB 2014)
      • Proves the NP-completeness of a k-regret query;
      • Defines a new k-regret minimizing set query and proposes two algorithms to answer this new query.
5. Geometry Property
  • Geometry explanation of the maximum regret ratio given an output set S
  • Happy point and its properties
Geometry Explanation of $mrr(S)$
    • Maximum regret ratio $mrr(S)$.
  • How to compute $mrr(S)$ given an output set $S$?
    • The function space $F$ can be infinite.
    • The method used in “Regret-Minimizing Representative Databases” (VLDB 2010): Linear Programming
    • This is time-consuming because Linear Programming has to be called independently many times.
Geometry Explanation of $mrr(S)$
    • Maximum regret ratio $mrr(S)$.
  • We compute $mrr(S)$ with a geometric method.
    • Straightforward and easily understood;
    • Saves time when computing $mrr(S)$.
An example in 2-d
  • [Figure: 2-d example of an output set $S$, both axes normalized to 1]
Geometry Explanation of $mrr(S)$
  • Critical ratio
    • A $q$-critical point given $S$, denoted by $q'$, is defined as the intersection between the vector $\overrightarrow{Oq}$ and the surface of the convex hull $Conv(S)$.
    • Critical ratio: $cr(q, S) = \|q'\| / \|q\|$
Geometry Explanation of $mrr(S)$
  • Lemma 0: $mrr(S) = \max_{q \in D \setminus Conv(S)} (1 - cr(q, S))$
    • According to the lemma shown above, we first compute $cr(q, S)$ for each point $q$ that is outside $Conv(S)$ and then find the greatest value of $1 - cr(q, S)$, which is the maximum regret ratio of $S$.
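
The computation described by Lemma 0 can be sketched in 2-d without linear programming. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes linear utility functions with non-negative weights, and it takes the hull to be the convex hull of S closed downward towards the axes. Under that assumption the critical ratio equals the minimum, over directions w >= 0, of max_{p in S}(w . p) / (w . q), and in 2-d the minimum is attained on an axis or in a direction where two points of S tie. All names and data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def critical_ratio_2d(q: Point, S: List[Point]) -> float:
    """Scale t at which the ray from the origin through q leaves the hull of S.

    A value below 1 means q lies outside the hull; 1 - t is then the
    regret caused by leaving q out (assumptions are in the lead-in).
    """
    directions = [(1.0, 0.0), (0.0, 1.0)]
    for i, p in enumerate(S):
        for r in S[i + 1:]:
            a, b = r[1] - p[1], p[0] - r[0]  # direction w with w.p == w.r
            if a <= 0 and b <= 0:
                a, b = -a, -b
            if a >= 0 and b >= 0 and (a > 0 or b > 0):
                directions.append((a, b))
    best = float("inf")
    for w in directions:
        denom = w[0] * q[0] + w[1] * q[1]
        if denom > 0:
            support = max(w[0] * p[0] + w[1] * p[1] for p in S)
            best = min(best, support / denom)
    return best


def max_regret_ratio_geo(D: List[Point], S: List[Point]) -> float:
    """Greatest value of 1 - cr(q, S) over points outside the hull, else 0."""
    return max(max(0.0, 1.0 - critical_ratio_2d(q, S)) for q in D)


# Hypothetical normalized data: (0.8, 0.8) lies outside the hull of S.
D = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.8), (0.5, 0.5)]
S = [(1.0, 0.0), (0.0, 1.0)]
cr = critical_ratio_2d((0.8, 0.8), S)  # ray hits the hull at (0.5, 0.5)
mrr = max_regret_ratio_geo(D, S)
```

Here the ray through (0.8, 0.8) crosses the segment between the two selected points at (0.5, 0.5), so the critical ratio is about 0.625 and the maximum regret ratio about 0.375.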
An example in 2-d
  • [Figure and worked computation of the maximum regret ratio for a 2-d example with a given output set; both axes normalized to 1]
Happy point
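  • The set $VC$ is defined as a set of $d$-dimensional points of size $d$, where for each point $vc_i$ and each $j \in [1, d]$, we have $vc_i[j] = 1$ when $j = i$, and $vc_i[j] = 0$ when $j \ne i$.
  • In a 2-dimensional space, $VC = \{vc_1, vc_2\}$, where $vc_1 = (1, 0)$ and $vc_2 = (0, 1)$.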
Happy Point
  • In the following, we give an example in the 2-dimensional case.
  • Example:
Happy point
  • Definition of domination:
    • We say that $q$ is dominated by $p$ if and only if $p[i] \ge q[i]$ for each $i \in [1, d]$ and there exists an $i$ such that $p[i] > q[i]$.
  • Definition of subjugation:
    • We say that $q$ is subjugated by $p$ if and only if $q$ is on or below all the hyperplanes containing the faces of the convex hull constructed from $p$, and is below at least one of these hyperplanes.
    • We say that q is subjugated by p if and only if for each and there exists a such that .
An example in 2-d
  • A point $p$ subjugates a point $q$ when $q$ is below both lines that contain the faces determined by $p$.
  • $p$ does not subjugate $q$ when $q$ is above one of these lines.
Happy Point
  • Lemma 1:
    • There may exist a point in , which cannot be found in the optimal solution of a k-regret query.
  • Example:
    • In the example shown below, the optimal solution of a 3-regret query is , where is not a point in
An example in 2-d
  • Lemma 2: $D_{conv} \subseteq D_{happy} \subseteq D_{sky}$
  • Example:
Happy point
  • All existing studies use the skyline points $D_{sky}$ as candidate points for the k-regret query.
  • Lemma 3:
    • Let be the maximum regret ratio of the optimal solution. Then, there exists an optimal solution of a k-regret query, which is a subset of when .
  • Example:
    • Based on Lemma 3, we compute the optimal solution based on the happy points $D_{happy}$ instead of the skyline points $D_{sky}$.
6. Algorithm
  • Geometry Greedy algorithm (GeoGreedy)
    • Pick the $d$ boundary points of the dataset and insert them into an output set;
    • Repeatedly compute the regret ratio for each point that is outside the convex hull constructed from the current output set, and add the point that currently achieves the maximum regret ratio to the output set;
    • The algorithm stops when the output size is $k$ or all the candidate points are selected.
  • Stored List Algorithm (StoredList)
    • Preprocessing Step:
      • Call the GeoGreedy algorithm to return the output of an $n$-regret query;
      • Store the points of the output set in a list, in the order in which they were selected.
    • Query Step:
      • Return the first $k$ points of the list as the output of a k-regret query.
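
The two algorithms can be sketched for the 2-d case as follows. This is a rough sketch under stated assumptions, not the authors' pseudocode. It assumes k >= 2 and that the boundary points are the points with the largest value in each dimension. It measures regret through a critical ratio computed as the minimum, over directions w >= 0, of max_{p in S}(w . p) / (w . q), and it stops early once no remaining point lies outside the hull. All names and data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
EPS = 1e-12  # tolerance for "strictly outside the hull"


def critical_ratio_2d(q: Point, S: List[Point]) -> float:
    """Scale at which the ray from the origin through q leaves the hull of S."""
    directions = [(1.0, 0.0), (0.0, 1.0)]
    for i, p in enumerate(S):
        for r in S[i + 1:]:
            a, b = r[1] - p[1], p[0] - r[0]  # direction w with w.p == w.r
            if a <= 0 and b <= 0:
                a, b = -a, -b
            if a >= 0 and b >= 0 and (a > 0 or b > 0):
                directions.append((a, b))
    best = float("inf")
    for w in directions:
        denom = w[0] * q[0] + w[1] * q[1]
        if denom > 0:
            support = max(w[0] * p[0] + w[1] * p[1] for p in S)
            best = min(best, support / denom)
    return best


def geo_greedy(D: List[Point], k: int) -> List[Point]:
    """Greedy selection in insertion order (assumes k >= 2 in 2-d)."""
    S: List[Point] = []
    for dim in range(2):  # boundary points: largest value per dimension
        boundary = max(D, key=lambda p: p[dim])
        if boundary not in S:
            S.append(boundary)
    while len(S) < k:
        worst, worst_cr = None, 1.0 - EPS
        for q in D:
            if q in S:
                continue
            cr = critical_ratio_2d(q, S)
            if cr < worst_cr:  # smallest critical ratio = largest regret
                worst, worst_cr = q, cr
        if worst is None:  # every remaining point is inside the hull
            break
        S.append(worst)
    return S


# StoredList: run the greedy once for the largest k, then answer by prefix.
D = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.8), (0.5, 0.5), (0.95, 0.4)]
stored_list = geo_greedy(D, len(D))


def stored_list_query(k: int) -> List[Point]:
    return stored_list[:k]
```

Because each greedy step only appends to the output, the answer for a smaller k is a prefix of the answer for a larger k, which is what makes the stored list sufficient at query time.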
7. Experiment
  • Datasets
  • Experiments on Synthetic datasets
  • Experiments on Real datasets
    • Household dataset
    • NBA dataset
    • Color dataset
    • Stocks dataset
  • Algorithms:
    • Greedy algorithm (VLDB 2010)
    • GeoGreedy algorithm
    • StoredList algorithm
  • Measurements:
    • The maximum regret ratio
    • The query time
7. Experiment
  • Experiments
    • Relationship among $D_{conv}$, $D_{happy}$ and $D_{sky}$
    • Effect of Happy Points
    • Performance of Our Method
Effect of Happy Points
  • Household: maximum regret ratio

[Figure: the result based on each of the two candidate sets]
Effect of Happy Points
  • Household: query time

[Figure: the result based on each of the two candidate sets]
Performance of Our Method
  • Experiments on Synthetic datasets
    • Maximum regret ratio

[Figure panels: effect of d; effect of n]
Performance of Our Method
  • Experiments on Synthetic datasets
    • Query time

[Figure panels: effect of d; effect of n]
Performance of Our Method
  • Experiments on Synthetic datasets
    • Maximum regret ratio

[Figure panels: effect of k; effect of large k]
Performance of Our Method
  • Experiments on Synthetic datasets
    • Query time

[Figure panels: effect of k; effect of large k]
8. Conclusion
  • We studied a k-regret query in this paper.
  • We proposed happy points, a set of candidate points for the k-regret query that is much smaller than the set of skyline points, so that the solution of the k-regret query can be found more efficiently and effectively.
  • We conducted experiments based on both synthetic and real datasets.
  • Future directions:
    • Average regret ratio minimization
    • Interactive version of a k-regret query
GeoGreedy Algorithm
  • GeoGreedy Algorithm
GeoGreedy Algorithm
  • An example in 2-d:
  • In the following, we compute a 4-regret query using GeoGreedy algorithm.

GeoGreedy Algorithm
  • Line 2 – 4:

GeoGreedy Algorithm
  • Line 2 – 4:
    • .
  • Line 5 – 10 (Iteration 1):
    • Since and , we add

in .

GeoGreedy Algorithm
  • Line 5 – 10 (Iteration 2):
    • After Iteration 1, .
    • We can only compute which is less than 1 and we add in .

StoredList Algorithm
  • Stored List Algorithm
    • Pre-compute the outputs of the GeoGreedy algorithm for every output size up to $n$.
    • An output of a smaller size is a subset of every output of a larger size.
    • Store the output of size $n$ in a list, in the order of selection.
StoredList Algorithm
  • After two iterations of the GeoGreedy algorithm, we obtain the output set.
  • Since the critical ratio of each unselected point is at least 1, we stop the GeoGreedy algorithm, and the current output set is the output set with the greatest size.
  • We store the output in a list L, which ranks the selected points in the order in which they were added to the output set.
  • When a 3-regret query is issued, we return the first 3 points of L.
Effect of Happy Points
  • NBA: maximum regret ratio

[Figure: the result based on each of the two candidate sets]
Effect of Happy Points
  • NBA: query time

[Figure: the result based on each of the two candidate sets]
Effect of Happy Points
  • Color: maximum regret ratio

[Figure: the result based on each of the two candidate sets]
Effect of Happy Points
  • Color: query time

[Figure: the result based on each of the two candidate sets]
Effect of Happy Points
  • Stocks: maximum regret ratio

[Figure: the result based on each of the two candidate sets]
Effect of Happy Points
  • Stocks: query time

[Figure: the result based on each of the two candidate sets]
Preliminary
  • Example: [worked derivation for the notation of the Preliminary section; formulas missing from the transcript]
An example in 2-d
  • Points (normalized):
