Supporting Ad-Hoc Ranking Aggregates. Chengkai Li (UIUC) joint work with Kevin Chang (UIUC) Ihab Ilyas (Waterloo). Ranking (Top-k) Queries. Find the top k answers with respect to a ranking function , which often is the aggregation of multiple criteria .
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Supporting Ad-Hoc Ranking Aggregates
Chengkai Li (UIUC)
joint work with
Kevin Chang (UIUC) Ihab Ilyas (Waterloo)
Find the top k answers with respect to a ranking function, which often is the aggregation of multiple criteria.
Ranking is important in many database applications:
Find the best hotel deals by price, distance, etc.
Find the most similar images by color, shape, texture, etc.
Find the most relevant records/documents/pages.
Find the top profitable customers to send ads.
SPJ queries (SELECT … FROM … WHERE … ORDER BY …)
top k groups instead of tuples.
(SELECT … FROM … WHERE … GROUP BY … ORDER BY …)
SELECT zipcode,
AVG(income*w1+age*w2+credit*w3) as score
FROM customer
WHERE occupation=‘student’
GROUP BY zipcode
ORDER BY score
LIMIT 5
SELECT P.cateogy, S.zipcode,
MID_SUM(S.price - P.manufact_price)
as score
FROM products P, sales S
WHERE P.p_key=S.p_key
GROUP BY P.category, S.zipcode
ORDER BY score
LIMIT 5
Ranking Condition : F=G(T)
e.g.AVG (income*w1+age*w2+credit*w3)
MID_SUM (S.price - P.manufact_price)
w1, w2, w3 can be any values.
DSS applications are exploratory and interactive:
pre-computation
e.g., mid_sum cannot be derived from sum, avg, etc.
e.g., a+b does not help in doing a*b, and vice versa.
from the scratch
Select zipcode, AVG(income
*w1+age*w2+credit*w3)
as score
From Customer
Where
occupation=‘student’
Group By zipcode
Order By score
Limit 5
5 results
R
sorting
G
grouping
B
Boolean
sort
Total order of all groups, although only top 5 are requested.
Full materialization (scan, join, grouping, sorting).
all groups
(materialized)
group
all tuples
(materialized)
Boolean operators
(a) Traditional query plan.
sort
Without any further info, full materialization is all that we can do.
all groups
(materialized)
group
all tuples
(materialized)
Boolean operators
(a) Traditional query plan.
sort
Goal: minimize the number of tuples processed. (Partial vs. full materialization)
all groups
(materialized)
group g1
group g2
(incremental)
agg
group
tuple x1
tuple x2
(incremental)
all tuples
(materialized)
rank- & group-aware operators
Boolean operators
(b) New query plan.
(a) Traditional query plan.
There is a certain minimal number of tuples to retrieve before we can stop.
A non-trivial upper-bound is a must. (e.g., +infinity will not save anything.)
Upper-bound of a group indicates the best a group can achieve, thus tells us if it is going to make top-k or not.
Select g, SUM(v)
From R
Group By g
Order By SUM(v)
Limit 1
Assumptions for getting a non-trivial upper-bound:
F[g] can be obtained by applying G over the maximal T of g’s members.
v <= 1.)
Process the most promising group first.
agg
group-aware
scan
R
Process the most promising group first.
agg
group-aware
scan
R
Process the most promising group first.
agg
group-aware
scan
R
Process the most promising group first.
agg
group-aware
scan
R
Process the most promising group first.
agg
group-aware
scan
R
Process the most promising group first.
agg
group-aware
scan
R
Process the most promising group first.
agg
group-aware
scan
R
Retrieve tuples within a group in the order of tuple-aggregate function T.
agg
group &
rank-aware
scan
R
in the order of R.v
not in the order of R.v
Retrieve tuples within a group in the order of tuple-aggregate function T.
agg
group &
rank-aware
scan
R
in the order of R.v
not in the order of R.v
Retrieve tuples within a group in the order of tuple-aggregate function T.
agg
group &
rank-aware
scan
R
in the order of R.v
not in the order of R.v
Retrieve tuples within a group in the order of tuple-aggregate function T.
agg
group &
rank-aware
scan
R
in the order of R.v
not in the order of R.v
Though G(T) is ad-hoc, the Boolean conditions are shared in sessions of decision making.
Similar queries computed.
Pay as much as materialize-group-sort for the 1st query; amortized by the future similar queries.
agg
GetNext( )
operator
GetNext(g)
GetNext(g’’)
GetNext(g’)
scan
GetNext(g’)
GetNext(g)
scan
operator
Upper-Bound, Group-Ranking, and Tuple-Ranking
Minimal number of tuples processed