
Preferential top-k search over local data

Preferential top-k search over local data. Dissertation thesis by RNDr. Martin Šumák. Supervisor: doc. RNDr. Stanislav Krajči, PhD. Consultant: RNDr. Peter Gurský, PhD. Outline: top-k search (motivation and example, restrictions and assumptions), R-tree-based solution.



  1. Preferential top-k search over local data
     Dissertation thesis
     RNDr. Martin Šumák
     Supervisor: doc. RNDr. Stanislav Krajči, PhD.
     Consultant: RNDr. Peter Gurský, PhD.

  2. Outline
     • Top-k search
       • motivation and example
       • restrictions and assumptions
     • R-tree-based solution
       • normalization of data
       • R++-tree
     • Grid file-based solution
     • Experiments
       • comparison with the B+-tree-based solution, table scan, etc.
     Preferential top-k search over local data, Dissertation thesis, RNDr. Martin Šumák

  3. Top-k search
     • Example: find the top 20 apartments with 3 or 4 rooms, not on the first floor, with a price of about 60,000 euro and not exceeding 70,000 euro.
     • Moreover, price is the most important attribute and floor is the least important one.

  4. Top-k query
     • k = 20
     • Preferences over attribute values: fuzzy functions
     • Importance of attributes: weights
       w_price = 3, w_rooms = 2, w_floor = 1

  5. Top-k query
     • The overall value of an object O is
       3*f_price(O_price) + 2*f_rooms(O_rooms) + 1*f_floor(O_floor)
     • In general: c(f_price(O_price), f_rooms(O_rooms), f_floor(O_floor))
     • The function c has to be monotone!
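The query from slides 3 to 5 can be sketched as a brute-force baseline in Python. This is only an illustration: the trapezoid shape of `f_price`, its break points, the crisp 0/1 preferences for rooms and floor, and the sample apartments are all assumptions made for this sketch, not values from the thesis. Only the weights 3, 2, 1 and the weighted-sum form come from the slides.

```python
import heapq

def trapezoid(a, b, c, d):
    """Fuzzy preference: 0 outside (a, d), 1 on [b, c], linear in between."""
    def f(x):
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return f

# Hypothetical fuzzy functions for the apartment example (assumed shapes).
PREFS = {
    "price": trapezoid(40000, 55000, 65000, 70000),  # about 60000, zero from 70000 up
    "rooms": lambda r: 1.0 if r in (3, 4) else 0.0,  # 3 or 4 rooms
    "floor": lambda fl: 0.0 if fl == 1 else 1.0,     # not on the first floor
}
WEIGHTS = {"price": 3, "rooms": 2, "floor": 1}       # weights from slide 4

def overall_value(obj):
    """Weighted sum of fuzzy scores; monotone in every score."""
    return sum(WEIGHTS[a] * PREFS[a](obj[a]) for a in WEIGHTS)

def top_k(objects, k):
    """Table-scan baseline: score every object, keep the k best."""
    return heapq.nlargest(k, objects, key=overall_value)

# Made-up sample data.
apartments = [
    {"id": "A", "price": 60000, "rooms": 3, "floor": 2},
    {"id": "B", "price": 68000, "rooms": 4, "floor": 1},
    {"id": "C", "price": 75000, "rooms": 3, "floor": 3},
    {"id": "D", "price": 45000, "rooms": 2, "floor": 4},
]
best_two = [a["id"] for a in top_k(apartments, 2)]
```

A weighted sum with non-negative weights is monotone, so it is a valid choice for c. This baseline reads every object, which is exactly the cost an index-based search tries to avoid.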

  6. The goal of top-k search
     • To find the top-k objects efficiently, by processing the minimum amount of data
     • Restrictions and assumptions
       • all the data is accessible locally
       • all attributes are numerical

  7. R-tree-based solution
     • An object is
       • a vector of n numbers
       • a point in n-dimensional space
     • R-tree, R*-tree, R+-tree, R++-tree

  8. From kNN to top-k search
     • k nearest neighbours (kNN)
       • a known incremental algorithm exists
       • the distance from the “query point Z” is the measure of “closeness”

  9. From kNN to top-k search
     • Top-k search
       • the overall value (h) is the measure of “goodness”
       • by replacing distance with overall value and reversing the order, we change the result from kNN to top-k
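The step from incremental kNN to top-k can be sketched as a best-first traversal with a priority queue: instead of the smallest possible distance to a rectangle, each node is ranked by the largest overall value any point inside its bounding rectangle could reach. The sketch below is an assumption-laden illustration, not the thesis implementation: the `Node` class, the triangular fuzzy functions `tri`, the hand-built two-level tree, and the sample points are all hypothetical. The upper bound relies on c being monotone and on each fuzzy function being unimodal with a known peak.

```python
import heapq
from itertools import count

def tri(peak, width):
    """Assumed triangular fuzzy function: 1 at the peak, 0 beyond peak +/- width."""
    return lambda x: max(0.0, 1.0 - abs(x - peak) / width)

class Node:
    """Minimal tree node: a leaf holds points, an inner node holds children."""
    def __init__(self, children=None, points=None):
        self.children = children or []
        self.points = points or []
        if self.points:
            # Minimum bounding rectangle of the stored points.
            self.rect = [(min(d), max(d)) for d in zip(*self.points)]
        else:
            rects = [c.rect for c in self.children]
            self.rect = [(min(r[i][0] for r in rects), max(r[i][1] for r in rects))
                         for i in range(len(rects[0]))]

def top_k(root, k, peaks, widths, weights):
    fs = [tri(p, w) for p, w in zip(peaks, widths)]

    def value(pt):
        return sum(w * f(x) for w, f, x in zip(weights, fs, pt))

    def bound(rect):
        # Best score reachable inside the rectangle: per dimension, evaluate f
        # at the point of [lo, hi] closest to the peak (valid for unimodal f).
        return sum(w * f(min(max(p, lo), hi))
                   for w, f, p, (lo, hi) in zip(weights, fs, peaks, rect))

    tie = count()  # tie-breaker so the heap never compares nodes with points
    heap = [(-bound(root.rect), next(tie), root)]
    result = []
    while heap and len(result) < k:
        neg, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for child in item.children:
                heapq.heappush(heap, (-bound(child.rect), next(tie), child))
            for pt in item.points:
                heapq.heappush(heap, (-value(pt), next(tie), pt))
        else:
            # A point at the top of the queue beats every remaining bound.
            result.append((item, -neg))
    return result

# Made-up points: (price in thousands of euro, floor).
root = Node(children=[
    Node(points=[(58, 2), (66, 1)]),
    Node(points=[(90, 5), (85, 7)]),
    Node(points=[(61, 3), (40, 2)]),
])
result = top_k(root, 3, peaks=(60, 3), widths=(20, 4), weights=(3, 1))
best_points = [pt for pt, _ in result]
```

Because a point is reported only when its value is at least as large as every bound still in the queue, subtrees whose bound is too low are never opened.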

  10. Analogy of kNN and top-k search (kNN vs. top-k)
     • Correctness
     • Efficiency

  11. Disproportion of attribute values
     • Floor, area, price: very different ranges
     • Solution: normalization, i.e. a linear transformation of attribute values to the interval [0; 1]
     • Another disproportion comes from the weights
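The linear transformation to [0; 1] can be written as a per-attribute min-max scaling. The following is a minimal sketch; the function name `normalize` and the sample rows (floor, area, price) are assumptions for illustration.

```python
def normalize(points):
    """Linearly map every attribute (column) of the points to [0, 1]."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(x, lo, hi):
        # A constant attribute carries no information; map it to 0.
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    return [tuple(scale(x, lo, hi) for x, lo, hi in zip(p, lows, highs))
            for p in points]

# Made-up rows: (floor, area in square metres, price in euro).
rows = [(1, 40.0, 50000), (5, 80.0, 70000), (3, 60.0, 60000)]
scaled = normalize(rows)
```

After scaling, a difference of one floor and a difference of a few thousand euro occupy comparable extents of the indexed space, which is the point of the normalization step.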

  12. Normalization applicability
     • Useful for
       • R*-tree
     • Meaningless for
       • R-tree (proven for the quadratic split method)
       • R+-tree, R++-tree
       • Grid file

  13. Why the R++-tree
     • Zero overlaps combined with minimum bounding rectangles may cause a problem when adding a new object
     • The R+-tree avoids overlaps at the price of larger rectangles

  14. The R++-tree idea
     • Zero overlaps combined with minimum bounding rectangles may cause a problem when adding a new object
     • The R++-tree keeps two rectangles for each node: the minimum one and the parent covering one

  15. The R++-tree properties
     • Height-balanced
     • Zero overlaps
     • Overflow nodes at the leaf level only
     • Minimum node occupancy is 1
     • For top-k search purposes, attribute values can be strings or any other comparable values (not just numbers)

  16. Top-k search over Grid file
     • The Grid file is a spatial index for point data
     • We used a static Grid file without an extra directory

  17. Top-k search over Grid file
     • We have proven correctness and efficiency for this solution as well
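One way to picture top-k search over a static grid is to rank the grid cells by the best overall value any point in the cell could reach, then scan cells in that order and stop once the k-th best object found so far is at least as good as the next cell's bound. The sketch below is a simplified in-memory illustration under assumptions, not the algorithm from the thesis: the helper names (`build_grid`, `cell_rect`, `top_k_grid`), the triangular fuzzy functions, the split positions, and the sample points are all hypothetical.

```python
import heapq
from bisect import bisect_right
from collections import defaultdict

def build_grid(points, splits):
    """splits[i] is the sorted list of interior cell boundaries in dimension i."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(bisect_right(s, x) for s, x in zip(splits, p))
        cells[key].append(p)
    return cells

def cell_rect(key, splits, lows, highs):
    """Closed rectangle covering the cell with the given index tuple."""
    rect = []
    for i, s, lo, hi in zip(key, splits, lows, highs):
        edges = [lo] + list(s) + [hi]
        rect.append((edges[i], edges[i + 1]))
    return rect

def top_k_grid(cells, splits, lows, highs, k, peaks, widths, weights):
    def f(x, peak, width):  # assumed triangular fuzzy function
        return max(0.0, 1.0 - abs(x - peak) / width)

    def value(p):
        return sum(w * f(x, pk, wd) for w, x, pk, wd in zip(weights, p, peaks, widths))

    def bound(rect):  # best value reachable anywhere inside the rectangle
        return sum(w * f(min(max(pk, lo), hi), pk, wd)
                   for w, (lo, hi), pk, wd in zip(weights, rect, peaks, widths))

    ordered = sorted(((bound(cell_rect(key, splits, lows, highs)), key)
                      for key in cells), reverse=True)
    best, visited = [], 0  # best is a min-heap of (value, point)
    for b, key in ordered:
        if len(best) == k and best[0][0] >= b:
            break  # no remaining cell can improve the current top-k
        visited += 1
        for p in cells[key]:
            item = (value(p), p)
            if len(best) < k:
                heapq.heappush(best, item)
            elif item > best[0]:
                heapq.heappushpop(best, item)
    return sorted(best, reverse=True), visited

# Made-up points: (price in thousands of euro, floor).
points = [(58, 2), (66, 1), (90, 5), (85, 7), (61, 3), (40, 2)]
splits = [[50, 75], [4]]
cells = build_grid(points, splits)
top, visited = top_k_grid(cells, splits, lows=(40, 1), highs=(90, 7), k=2,
                          peaks=(60, 3), widths=(20, 4), weights=(3, 1))
top_points = [p for _, p in top]
```

In this toy run only one of the three non-empty cells has to be read, since its two best points already exceed the bounds of the other cells.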
