The R++-tree is a spatial access method designed to manage highly redundant point data efficiently, with kNN, range and Top-k queries in mind. It extends the R+-tree by keeping two rectangles per child in each inner node: a minimum bounding rectangle, which is all that searching needs, and a parent-covering rectangle, which is used together with the first one when inserting. Nodes at the same level keep zero overlap, as in the R+-tree. The experiments use artificial data (2 to 10 dimensions) and pseudo-real 6-dimensional data describing flats for sale, and report average time and number of I/Os per query.
R++-tree: an efficient spatial access method for highly redundant point data • Martin Šumák, Peter Gurský • University of P. J. Šafárik in Košice
Research motivation • Besides kNN and range queries, an R-tree-like index can also be used to compute Top-k queries (find the best k objects according to user preferences) • e.g. h(x1, x2) = f1(x1) + f2(x2) Martin Šumák, Peter Gurský at ADBIS 2013
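To make the Top-k query concrete, here is a minimal brute-force sketch in Python. Only the additive form h(x1, x2) = f1(x1) + f2(x2) comes from the slide; the preference functions `f1` and `f2`, the sample flats and the helper `top_k` are hypothetical. An index-based evaluation would prune whole nodes instead of scoring every object.

```python
import heapq

def f1(price):
    # Hypothetical preference: cheaper is better, scaled to [0, 1].
    return max(0.0, 1.0 - price / 200_000)

def f2(area):
    # Hypothetical preference: larger is better, capped at 100 square metres.
    return min(area, 100.0) / 100.0

def h(x1, x2):
    # Additive aggregate score, as on the slide.
    return f1(x1) + f2(x2)

def top_k(points, k):
    # Brute force: score every object and keep the k best.
    return heapq.nlargest(k, points, key=lambda p: h(*p))

# Made-up (price, area) pairs, for illustration only.
flats = [(150_000, 60.0), (90_000, 45.0), (120_000, 80.0), (199_000, 100.0)]
best = top_k(flats, 2)
```

With these made-up values the two best flats are (120 000, 80.0) and (199 000, 100.0), in that order.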
Why highly redundant point data? • Our data consists of flats with the following attributes: price, area, floor, max floor of building, year of building approval, number of rooms • Each flat is represented by a point in 6-dimensional space
R+-tree fundamentals • The R+-tree is an R-tree-like index with the following properties: • zero overlap between nodes at the same level • the rectangles of child nodes cover the whole parent rectangle • suitable for point data and point queries
R+-tree fundamentals • Desired state: zero overlaps and minimum bounding rectangles • R+-tree: avoids overlaps at the cost of rectangle size
The R++-tree idea • Desired state: zero overlaps and minimum bounding rectangles • R++-tree: inner nodes keep two rectangles for each child node, the minimum bounding one and the parent-covering one • Leaf nodes are left unchanged
Nodes of the R++-tree • Leaf nodes: exactly the same as leaf nodes of the R+-tree; contain the Id and coordinates of each object; take one disk page each • Inner nodes: contain a pointer and two rectangles for each child node; take two disk pages each
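The node layout described above could be modelled in memory roughly as follows. This is only a sketch: the class and field names (`Rect`, `LeafNode`, `ChildEntry`, `InnerNode`, `mbr`, `cover`) are assumptions made for illustration, and the on-disk page layout is not represented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, ...]

@dataclass
class Rect:
    low: Point    # lower corner, one value per dimension
    high: Point   # upper corner, one value per dimension

@dataclass
class LeafNode:
    # One (object id, coordinates) pair per stored object, as in the R+-tree.
    entries: List[Tuple[int, Point]] = field(default_factory=list)

@dataclass
class ChildEntry:
    child: "Node"  # pointer to the child node
    mbr: Rect      # minimum bounding rectangle of the child
    cover: Rect    # parent-covering rectangle of the child

@dataclass
class InnerNode:
    entries: List[ChildEntry] = field(default_factory=list)

Node = Union[LeafNode, InnerNode]

# Tiny hand-built example with made-up coordinates.
leaf = LeafNode([(1, (0.2, 0.3)), (2, (0.4, 0.1))])
entry = ChildEntry(leaf, Rect((0.2, 0.1), (0.4, 0.3)), Rect((0.0, 0.0), (0.5, 1.0)))
root = InnerNode([entry])
```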
Using two rectangles in inner nodes • Searching: only the minimum bounding rectangles are needed • Inserting new objects: both the minimum bounding and the parent-covering rectangles are used (read and updated)
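The searching case can be sketched as a range query that descends only into children whose minimum bounding rectangle intersects the query. The dictionary-based node representation, the helper names and the `pages_read` counter are hypothetical; the `covers` list is present but deliberately never touched by the search.

```python
def intersects(rect, query):
    # Two axis-aligned rectangles overlap if they overlap in every dimension.
    (lo, hi), (q_lo, q_hi) = rect, query
    return all(l <= qh and ql <= h for l, h, ql, qh in zip(lo, hi, q_lo, q_hi))

def contains(query, point):
    q_lo, q_hi = query
    return all(l <= x <= h for l, x, h in zip(q_lo, point, q_hi))

def range_search(node, query, stats):
    stats["pages_read"] += 1  # one page per visited node
    if node["leaf"]:
        return [oid for oid, p in node["objects"] if contains(query, p)]
    found = []
    # Only the minimum bounding rectangles are consulted here.
    for mbr, child in zip(node["mbrs"], node["children"]):
        if intersects(mbr, query):
            found.extend(range_search(child, query, stats))
    return found

# Hand-built two-leaf tree with made-up coordinates.
leaf_a = {"leaf": True, "objects": [(1, (0.1, 0.1)), (2, (0.2, 0.3))]}
leaf_b = {"leaf": True, "objects": [(3, (0.7, 0.8)), (4, (0.9, 0.6))]}
root = {
    "leaf": False,
    "mbrs": [((0.1, 0.1), (0.2, 0.3)), ((0.7, 0.6), (0.9, 0.8))],
    "covers": [((0.0, 0.0), (0.5, 1.0)), ((0.5, 0.0), (1.0, 1.0))],
    "children": [leaf_a, leaf_b],
}

stats_1 = {"pages_read": 0}
hits_1 = range_search(root, ((0.0, 0.0), (0.3, 0.5)), stats_1)

# This query falls inside leaf_a's covering rectangle but misses its MBR,
# so the leaf is never read.
stats_2 = {"pages_read": 0}
hits_2 = range_search(root, ((0.3, 0.4), (0.45, 0.9)), stats_2)
```

The second query shows why the tighter rectangle helps: a search guided by the covering rectangle alone would have had to read `leaf_a` only to find nothing in it.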
Implementation of inner nodes • The first page contains the minimum bounding rectangles • The second page contains the parent-covering rectangles
Advantages and drawbacks of the two-page idea • Advantages: searching reads one page per node involved; the ratio between page size and node capacity is the same as in the R+-tree • Drawbacks: when updating, two pages per inner node need to be processed • The real impact on the whole index size is relatively low
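A small back-of-the-envelope calculation illustrates the capacity argument. All numbers here are assumptions, not figures from the slides: a 4096-byte page, 8-byte coordinates, 8-byte child pointers, and child pointers stored on the first page next to the minimum bounding rectangles.

```python
def entries_per_page(page_bytes, dims, rects_per_entry, coord_bytes=8, pointer_bytes=8):
    # A rectangle stores a low and a high coordinate in every dimension.
    rect_bytes = 2 * dims * coord_bytes
    return page_bytes // (pointer_bytes + rects_per_entry * rect_bytes)

PAGE = 4096  # assumed page size in bytes
DIMS = 6     # dimensionality of the flats data

# Two-page layout: the first page holds pointer + MBR per child (same as an R+-tree node).
two_page_capacity = entries_per_page(PAGE, DIMS, rects_per_entry=1)

# Alternative: squeeze pointer + both rectangles into a single page.
single_page_capacity = entries_per_page(PAGE, DIMS, rects_per_entry=2)

# The covering rectangles of all children must fit into the second page.
second_page_fits = two_page_capacity * 2 * DIMS * 8 <= PAGE
```

Under these assumptions the two-page layout holds 39 children per inner node, against 20 if both rectangles shared one page, which is the sense in which node capacity stays the same as in the R+-tree.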
Experiments: data • Artificial data (range, kNN and Top-k queries): 100 000 random points in 2- to 10-dimensional space, with decimal values within [0; 1], integer values from 1 to 100, or integer values from 1 to 10 • Pseudo-real data (Top-k query): 6-dimensional points describing flats for sale; 550 000 flats (20-multiple set) and 2 700 000 flats (100-multiple set)
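The three artificial value domains could be generated along the following lines. This is a sketch assuming uniformly distributed values; the slides only say "random". The function names, the `kind` labels and the seed are hypothetical.

```python
import random

def make_points(n, dims, kind, seed=0):
    rng = random.Random(seed)
    if kind == "decimal":
        draw = rng.random                   # floats in [0, 1); 1.0 itself is never drawn
    elif kind == "int100":
        draw = lambda: rng.randint(1, 100)  # integers 1..100, both ends included
    elif kind == "int10":
        draw = lambda: rng.randint(1, 10)   # integers 1..10, both ends included
    else:
        raise ValueError(f"unknown kind: {kind}")
    return [tuple(draw() for _ in range(dims)) for _ in range(n)]

def distinct_per_dimension(points):
    # Number of different values seen in each dimension; a low count means high redundancy.
    dims = len(points[0])
    return [len({p[d] for p in points}) for d in range(dims)]

points = make_points(100_000, 6, "int10")
distinct = distinct_per_dimension(points)
```

With 100 000 points and only ten possible values per coordinate, every value repeats thousands of times in each dimension, which is the kind of redundancy the index is aimed at.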
Experiments: measures • 300 random queries per data set and query type • Average time per query • Average number of I/Os per query • One I/O corresponds to reading one page, i.e. processing one node
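The two measures could be collected with a small harness like the one below. It is a sketch only: `measure` expects a query function that returns its result together with the number of pages it read, and the paged sequential scan used here is a hypothetical stand-in for a real index, not the setup from the experiments.

```python
import random
import time

def measure(run_query, queries):
    # Returns (average seconds per query, average I/Os per query).
    total_time = 0.0
    total_ios = 0
    for q in queries:
        start = time.perf_counter()
        _, ios = run_query(q)
        total_time += time.perf_counter() - start
        total_ios += ios
    return total_time / len(queries), total_ios / len(queries)

# Stand-in "index": a sequential scan over fixed-size pages (assumed capacity).
PAGE_CAPACITY = 50
rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(1000)]
pages = [data[i:i + PAGE_CAPACITY] for i in range(0, len(data), PAGE_CAPACITY)]

def scan_range(query):
    (x_lo, y_lo), (x_hi, y_hi) = query
    hits = [p for page in pages for p in page
            if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi]
    return hits, len(pages)  # a full scan reads every page

def random_query():
    x, y = rng.random() * 0.9, rng.random() * 0.9
    return (x, y), (x + 0.1, y + 0.1)

queries = [random_query() for _ in range(300)]  # 300 random queries, as on the slide
avg_time, avg_ios = measure(scan_range, queries)
```

Swapping `scan_range` for an index search that counts one I/O per visited node would give numbers comparable in kind to those the slides describe.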
Artificial data: 100 000 random points with decimal values within [0; 1]
Artificial data: 100 000 random points with integer values from 1 to 100
Artificial data: 100 000 random points with integer values from 1 to 10
Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)
Pseudo-real data: 2 700 000 flats (i.e. 6-dimensional points)
Thank you for your attention