R++-tree: an efficient spatial access method for highly redundant point data

Presentation Transcript


  1. R++-tree: an efficient spatial access method for highly redundant point data • Martin Šumák, Peter Gurský • University of P. J. Šafárik in Košice

  2. Research motivation • Besides kNN and range queries, an R-tree-like index can be used to compute top-k queries (find the best k objects according to user preferences) • h(x1, x2) = f1(x1) + f2(x2) • Martin Šumák, Peter Gurský at ADBIS 2013
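
To make the top-k query concrete, here is a minimal brute-force sketch of what such a query returns. The preference functions f1 and f2, their constants, and the sample flats are hypothetical and do not come from the slides; only the additive form of h does. An index-based evaluation would avoid scoring every point, which this sketch does not attempt.

```python
import heapq

# Hypothetical preference functions (not from the slides): each maps an
# attribute value to a score in [0, 1], higher meaning "more preferred".
def f1(price):
    return max(0.0, 1.0 - price / 200_000)   # cheaper flats score higher

def f2(area):
    return min(1.0, area / 100)              # larger flats score higher

def h(x1, x2):
    # Aggregate score in the form given on the slide.
    return f1(x1) + f2(x2)

def top_k(points, k):
    # Brute force: score every point and keep the k best.
    return heapq.nlargest(k, points, key=lambda p: h(*p))

# Made-up (price, area) pairs, for illustration only.
flats = [(150_000, 60), (90_000, 45), (120_000, 80), (200_000, 90)]
best = top_k(flats, 2)
```

With these assumed functions, the two best flats are (120 000, 80) and (90 000, 45), in that order.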

  3. Why highly redundant point data • Our data consists of flats with the following attributes: price, area, floor, max floor of building, year of approval, number of rooms • Each flat is represented by a point in 6-dimensional space

  4. R+-tree fundamentals • The R+-tree is an R-tree-like index with the following properties: • zero overlap between nodes at the same level • the rectangles of the nodes cover the whole rectangle of the parent • suitable for point data and point queries
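
The two structural properties listed on this slide can be checked mechanically. The sketch below is an illustration, not the authors' code: it assumes rectangles are (lo, hi) pairs of coordinate tuples with non-zero volume, treats touching borders as non-overlapping, and uses integer coordinates so that the volume comparison is exact. All function names are made up for the example.

```python
from itertools import combinations
from math import prod

# A rectangle is (lo, hi): two equal-length tuples of coordinates.

def volume(r):
    lo, hi = r
    return prod(h - l for l, h in zip(lo, hi))

def interiors_overlap(a, b):
    # True if a and b share interior volume (touching borders is allowed).
    return all(al < bh and bl < ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))

def contains(outer, inner):
    return all(ol <= il and ih <= oh
               for ol, oh, il, ih in zip(outer[0], outer[1], inner[0], inner[1]))

def satisfies_r_plus_invariants(parent, children):
    no_overlap = not any(interiors_overlap(a, b)
                         for a, b in combinations(children, 2))
    inside = all(contains(parent, c) for c in children)
    # Disjoint children inside the parent cover it exactly when volumes add up.
    covers = sum(volume(c) for c in children) == volume(parent)
    return no_overlap and inside and covers

parent = ((0, 0), (10, 10))
ok = satisfies_r_plus_invariants(parent, [((0, 0), (4, 10)), ((4, 0), (10, 10))])
gap = satisfies_r_plus_invariants(parent, [((0, 0), (4, 10)), ((5, 0), (10, 10))])
overlap = satisfies_r_plus_invariants(parent, [((0, 0), (6, 10)), ((4, 0), (10, 10))])
```

Here `ok` is true, while `gap` fails the covering test and `overlap` fails the zero-overlap test.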

  5. R+-tree fundamentals • Desired state: zero overlaps and minimum bounding rectangles • R+-tree: avoids overlaps at the cost of rectangle size

  6. The R++-tree idea • Desired state: zero overlaps and minimum bounding rectangles • R++-tree: inner nodes keep two rectangles for each child node, the minimum bounding one and the parent covering one

  7. The R++-tree idea • Desired state: zero overlaps and minimum bounding rectangles • R++-tree: inner nodes keep two rectangles for each child node, the minimum bounding one and the parent covering one • Leaf nodes are left unchanged
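
The difference between the two rectangles kept per child can be illustrated with a small sketch. The covering rectangle and the points below are made-up values, and `mbr` and `volume` are hypothetical helper names; the example only shows how much of a covering rectangle may be empty space that the minimum bounding rectangle excludes.

```python
from math import prod

def mbr(points):
    # Minimum bounding rectangle of a non-empty list of coordinate tuples.
    lo = tuple(min(c) for c in zip(*points))
    hi = tuple(max(c) for c in zip(*points))
    return lo, hi

def volume(rect):
    lo, hi = rect
    return prod(h - l for l, h in zip(lo, hi))

# Assumed example: the part of the parent's rectangle assigned to one child,
# and the points actually stored in that child.
cover = ((0, 0), (5, 10))
points = [(1, 2), (2, 4), (1, 3)]

tight = mbr(points)
dead_space = volume(cover) - volume(tight)
```

In this example the minimum bounding rectangle is ((1, 2), (2, 4)), so 48 of the 50 area units of the covering rectangle contain no data.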

  8. Nodes of the R++-tree • Leaf nodes: exactly the same as leaf nodes of the R+-tree; contain the ID and coordinates of each object; take one disk page each • Inner nodes: contain a pointer and two rectangles for each child node; take two disk pages each

  9. Use of the two rectangles in inner nodes • Searching: only the minimum bounding rectangles are necessary • Inserting new objects: both the minimum bounding and the parent covering rectangles need to be used (read/updated)
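
A range search that consults only the minimum bounding rectangles might look like the sketch below. The class and field names (`Leaf`, `Entry`, `Inner`, `mbr`, `cover`) and the tiny two-leaf tree are assumptions made for illustration, not the authors' implementation; one I/O is counted per visited node.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    points: list            # list of (object_id, coordinates) pairs

@dataclass
class Entry:
    mbr: tuple              # minimum bounding rectangle (lo, hi)
    cover: tuple            # parent covering rectangle (lo, hi)
    child: object           # Leaf or Inner

@dataclass
class Inner:
    entries: list

def intersects(a, b):
    # Closed rectangles intersect if their intervals meet in every dimension.
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))

def inside(point, rect):
    return all(l <= x <= h for x, l, h in zip(point, rect[0], rect[1]))

def range_search(node, query, stats):
    stats["io"] += 1        # one page read per visited node
    if isinstance(node, Leaf):
        return [oid for oid, p in node.points if inside(p, query)]
    found = []
    for e in node.entries:
        if intersects(e.mbr, query):   # the covering rectangle is never consulted
            found += range_search(e.child, query, stats)
    return found

leaf_a = Leaf([(1, (1, 2)), (2, (2, 4))])
leaf_b = Leaf([(3, (7, 1)), (4, (9, 9))])
root = Inner([
    Entry(mbr=((1, 2), (2, 4)), cover=((0, 0), (5, 10)), child=leaf_a),
    Entry(mbr=((7, 1), (9, 9)), cover=((5, 0), (10, 10)), child=leaf_b),
])

stats_empty = {"io": 0}
hits_empty = range_search(root, ((3, 5), (6, 8)), stats_empty)
stats_hit = {"io": 0}
hits = range_search(root, ((0, 0), (3, 3)), stats_hit)
```

The first query touches both covering rectangles but neither minimum bounding rectangle, so only the root is read. The second query reads the root and one leaf and returns object 1.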

  10. Implementation of inner nodes • The first page contains the minimum bounding rectangles • The second page contains the parent covering rectangles

  11. Advantages and drawbacks of the two-page idea • Advantages: searching requires reading one page per node involved; the ratio between page size and node capacity is the same as in the R+-tree • Drawbacks: when updating, two pages per inner node need to be processed; the index is larger, although the real impact on the whole index size is relatively low
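
A back-of-the-envelope calculation shows why the two-page layout keeps the node capacity of the R+-tree and why the size overhead stays small. Every number here is an assumption chosen for illustration (4096-byte pages, 8-byte coordinates and pointers, child pointers stored on the first page, completely full nodes); none of them comes from the slides.

```python
PAGE_SIZE = 4096      # bytes per disk page (assumed)
COORD_SIZE = 8        # bytes per coordinate (assumed)
POINTER_SIZE = 8      # bytes per child pointer (assumed)

def rect_size(dims):
    # A rectangle stores a low and a high coordinate per dimension.
    return 2 * dims * COORD_SIZE

def capacity_two_pages(dims):
    # Page 1 holds pointer + minimum bounding rectangle per child;
    # page 2 holds the covering rectangles, so page 1 limits the capacity.
    return PAGE_SIZE // (POINTER_SIZE + rect_size(dims))

def capacity_one_page(dims):
    # Alternative layout: both rectangles squeezed into a single page.
    return PAGE_SIZE // (POINTER_SIZE + 2 * rect_size(dims))

def inner_node_count(leaves, fanout):
    # Number of inner nodes above `leaves` leaf nodes, assuming full nodes.
    count, level = 0, leaves
    while level > 1:
        level = -(-level // fanout)   # ceiling division
        count += level
    return count

fanout = capacity_two_pages(6)                 # 6 dimensions, as in the flat data
inner = inner_node_count(10_000, fanout)       # 10 000 leaves is an assumed size
overhead = inner / (10_000 + inner)            # extra pages relative to one page per node
```

Under these assumptions a 6-dimensional inner node holds 39 children with two pages, versus 20 if both rectangles shared one page, and the second pages add roughly 2.6 % to the index size.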

  12. Experiments - data • Artificial data (range, kNN and top-k queries): 100 000 random points in 2- to 10-dimensional space, with decimal values within [0; 1], integer values from 1 to 100, or integer values from 1 to 10 • Pseudo-real data (top-k query): 6-dimensional points (data of flats for sale); 550 000 flats (20-multiple set) and 2 700 000 flats (100-multiple set)
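
The three artificial data sets can be approximated with the sketch below, which also counts distinct points to show how redundancy grows as the value domain shrinks. The uniform distribution, the fixed seed, the choice of 2 dimensions, and the helper names are assumptions; the slides only say that the points are random.

```python
import random

def make_points(n, dims, draw, seed=0):
    # Generate n points, drawing each coordinate with the given function.
    rng = random.Random(seed)
    return [tuple(draw(rng) for _ in range(dims)) for _ in range(n)]

def distinct_points(points):
    return len(set(points))

N, DIMS = 100_000, 2   # the slides use 2 to 10 dimensions; 2 is chosen here

# random() returns values in [0, 1), a close stand-in for the interval [0; 1].
decimal_pts = make_points(N, DIMS, lambda r: r.random())
int100_pts = make_points(N, DIMS, lambda r: r.randint(1, 100))
int10_pts = make_points(N, DIMS, lambda r: r.randint(1, 10))

redundancy = {
    "decimal": distinct_points(decimal_pts),
    "1..100": distinct_points(int100_pts),
    "1..10": distinct_points(int10_pts),
}
```

In 2 dimensions the integer sets can contain at most 10 000 and 100 distinct points respectively, so most of the 100 000 points are duplicates of one another.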

  13. Experiments - measures • 300 random queries per data set and query type • Average time per query • Average number of I/Os per query • One I/O corresponds to reading one page, i.e. processing one node

  14. Artificial data: 100 000 random points with decimal values within [0; 1]

  15. Artificial data: 100 000 random points with decimal values within [0; 1]

  16. Artificial data: 100 000 random points with decimal values within [0; 1]

  17. Artificial data: 100 000 random points with integer values from 1 to 100

  18. Artificial data: 100 000 random points with integer values from 1 to 100

  19. Artificial data: 100 000 random points with integer values from 1 to 100

  20. Artificial data: 100 000 random points with integer values from 1 to 10

  21. Artificial data: 100 000 random points with integer values from 1 to 10

  22. Artificial data: 100 000 random points with integer values from 1 to 10

  23. Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)

  24. Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)

  25. Pseudo-real data: 2 700 000 flats (i.e. 6-dimensional points)

  26. Thank you for your attention
