Mining for Empty Rectangles in Large Data Sets


Presentation Transcript


  1. Mining for Empty Rectangles in Large Data Sets. Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller

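  2. Matrix representation of π_{A,B}(R ⋈ S). Tuples (A, B): (3, 6), (1, 7), (3, 8). Matrix with columns A = 1, 2, 3 and rows B = 6, 7, 8: row 6 is 0 0 1; row 7 is 1 0 0; row 8 is 0 0 1.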
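The matrix representation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `build_matrix` and its parameters are assumptions, not names from the talk, and the domains of A and B are passed in explicitly.

```python
def build_matrix(pairs, x_values, y_values):
    """Return a 0-1 matrix with one row per B value and one column per A value.

    Entry [j][i] is 1 exactly when (x_values[i], y_values[j]) occurs in pairs.
    """
    present = set(pairs)
    return [[1 if (a, b) in present else 0 for a in x_values] for b in y_values]


# (A, B) tuples taken from the slide; the domains are listed by hand.
pairs = [(3, 6), (1, 7), (3, 8)]
matrix = build_matrix(pairs, x_values=[1, 2, 3], y_values=[6, 7, 8])
for b, row in zip([6, 7, 8], matrix):
    print(b, row)
```

Every 0 entry is a combination of A and B values that does not occur in the join result, which is what the rest of the talk mines for.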

  3. Find All Maximal 0-Rectangles in π_{A,B}(R ⋈ S). [Figure: the 0-1 matrix of (A, B) values with its 0 entries shown.]

  4. Example: π_{A,B}(R ⋈ S) over Car and Year. Matrix with columns Year = 95, 96, 97: BMW Z3 is 0 0 1; Honda L2 is 1 0 0; Toyota 6A is 0 0 1. First BMW Z3 series cars were made in 1997.

  5. Relation to Previous Work (column headings: [Namaad, Hsu, Lee]; Our Work; [Lui, Ku, Hsu] & [Orlowski]). Problem: find all maximal empty rectangles, either between points in the real plane or within a 0-1 matrix. Purpose: Machine Learning; Computational Geometry; Query Optimization. # of maximal 0-rectangles: O((# 1's)^2); O(# 0's).

  6. Relation to Previous Work (column headings: [Namaad, Hsu, Lee]; Our Work; [Lui, Ku, Hsu] & [Orlowski]). Time: O(# 1's log(# 1's) + # rectangles) = O(|X||Y|); O(# 0's) = O(|X||Y|). Space: O(|X||Y|); O(min(|X|, |Y|)), with only two rows of the matrix kept in memory.

  7. Relation to Previous Work (column headings: [Namaad, Hsu, Lee]; Our Work; [Lui, Ku, Hsu] & [Orlowski]). Practical Implementation: intensive random memory access; requires a single scan of the sorted data. Scalable: scales badly; scales well wrt # of tuples in join, # of maximal rectangles, and # of values |X| & |Y|. Practical? IBM paid us $25,000 to patent it!

  8. Structure of Algorithm: loop y = 1..|Y|, loop x = 1..|X|: (a) construct staircase(x,y); (b) output all maximal 0-rectangles with <x,y> as bottom-right corner. Timing: O(1) amortized time per <x,y>. (Outline markers First through Fourth label parts of this slide.) [Figure: 0-1 matrix with axes X and Y and the current cell <x,y> marked.]
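The loop structure above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it keeps two full helper tables in memory instead of two rows, and it rebuilds the staircase for each cell by walking left rather than updating it incrementally, so it does not achieve the O(1) amortized bound. The function name and the `(left, top, right, bottom)` output format are hypothetical.

```python
def maximal_zero_rectangles(m):
    """Return all maximal all-zero rectangles of a 0-1 matrix.

    Rectangles are (left, top, right, bottom) with 0-based inclusive indices;
    m[y][x] is the cell in row y, column x.
    """
    rows, cols = len(m), len(m[0])
    # up[y][x]: length of the run of 0s ending at (x, y) going upward.
    # left[y][x]: length of the run of 0s ending at (x, y) going leftward.
    up = [[0] * cols for _ in range(rows)]
    left = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if m[y][x] == 0:
                up[y][x] = 1 + (up[y - 1][x] if y > 0 else 0)
                left[y][x] = 1 + (left[y][x - 1] if x > 0 else 0)

    found = []
    for y in range(rows):
        for x in range(cols):
            if m[y][x] != 0:
                continue
            # Walk left from <x, y>; every drop in column height is one step
            # of the staircase and gives one candidate rectangle.
            height = up[y][x]
            xl = x
            while True:
                while xl > 0 and up[y][xl - 1] >= height:
                    xl -= 1
                width = x - xl + 1
                # The candidate cannot grow up or left by construction, so
                # only growth to the right and downward must be ruled out.
                grows_right = x + 1 < cols and up[y][x + 1] >= height
                grows_down = y + 1 < rows and left[y + 1][x] >= width
                if not grows_right and not grows_down:
                    found.append((xl, y - height + 1, x, y))
                if xl == 0 or up[y][xl - 1] == 0:
                    break  # reached the matrix edge or a 1 in row y
                height = up[y][xl - 1]
                xl -= 1
    return found


example = [[0, 0, 1], [1, 0, 0], [0, 0, 1]]
print(maximal_zero_rectangles(example))
```

On the 3-by-3 matrix from slide 2 this sketch reports four maximal 0-rectangles. Each rectangle is emitted exactly once because it is reported only at its bottom-right corner.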

  9. Structure of Algorithm (same loop as slide 8): loop y = 1..|Y|, loop x = 1..|X|: (a) construct staircase(x,y); (b) output all maximal 0-rectangles with <x,y> as bottom-right corner. Fifth: Query Optimization & Experimental Results.

  10. Staircase(x,y): a stack of steps (x_1, y_1), …, (x_r, y_r). [Figure: 0-1 matrix with axes X and Y, the cell <x,y>, and a staircase of five numbered steps above and to its left.]

  11. Constructing Maximal Rectangles. [Figure: staircase ending at <x,y>.]

  12. Constructing Maximal Rectangles. Figure labels on the candidate rectangles at <x,y>: Too narrow; Maximal; Too short.

  13. Constructing staircase(x,y) from staircase(x-1,y): Case 1. [Figure: 0-1 matrix with cells <x-1,y> and <x,y> marked; <x,y> contains 0.]

  14. Constructing staircase(x,y) from staircase(x-1,y): Case 2. [Figure: 0-1 matrix with cells <x-1,y> and <x,y> marked; <x,y> contains 0.]

  15. Constructing staircase(x,y) from staircase(x-1,y). Figure labels: Delete; Keep; Too narrow; Maximal; Too short. [Figure: 0-1 matrix with axes X and Y, steps (x_1, y_1), …, (x_r, y_r), and cells <x-1,y> and <x,y> marked.]

  16. Constructing x*(x,y) & y*(x,y). Figure labels: x*(x-1,y); y*(x-1,y). [Figure: 0-1 matrix with steps (x_1, y_1), …, (x_r, y_r) and cell <x-1,y> marked.]

  17. Constructing x*(x,y) & y*(x,y) from x*(x-1,y) & y*(x,y-1). Figure labels: x*(x,y); y*(x,y); x*(x-1,y); y*(x,y-1) (saved); Query. [Figure: 0-1 matrix with steps (x_1, y_1), …, (x_r, y_r) and cells <x-1,y> and <x,y> marked.]
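The incremental update named on this slide can be sketched as below. The definitions are an assumption, since the figure that defines them did not survive in the transcript: here x*(x,y) is taken to be the leftmost column of the run of 0s ending at <x,y> in row y, and y*(x,y) the topmost row of the run of 0s ending at <x,y> in column x. The generator name `scan_extents` is hypothetical.

```python
def scan_extents(rows):
    """Yield (x, y, x_star, y_star) for every 0-cell, one row at a time.

    Only the y* values of the previous row are kept between rows.
    """
    prev_y_star = None  # y*(., y-1), saved from the previous row
    for y, row in enumerate(rows):
        y_star = [None] * len(row)
        x_star = None  # x*(x-1, y); None while the cell to the left is a 1
        for x, cell in enumerate(row):
            if cell == 1:
                x_star = None  # a 1 ends the horizontal run of 0s
                continue
            if x_star is None:
                x_star = x  # a new horizontal run starts here
            above = prev_y_star[x] if prev_y_star is not None else None
            # Extend the vertical run from the row above, or start a new one.
            y_star[x] = above if above is not None else y
            yield x, y, x_star, y_star[x]
        prev_y_star = y_star


for record in scan_extents([[0, 0, 1], [1, 0, 0], [0, 0, 1]]):
    print(record)
```

Because each value depends only on the cell to the left and the cell above, the scan needs the current row and the saved row, which matches the "only two rows of matrix kept in memory" remark on slide 6.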

  18. Structure of Algorithm: loop y = 1..|Y|, loop x = 1..|X|: (a) construct staircase(x,y); (b) output all maximal 0-rectangles with <x,y> as bottom-right corner. Third: Timing. O(1) amortized time per <x,y>.

  19. Timing. Only work that is not constant time: Delete. Figure labels: Too narrow; Maximal; Too short. [Figure: 0-1 matrix with axes X and Y, steps (x_1, y_1), …, (x_r, y_r), and cell <x,y> marked.]

  20. Timing. Amortized # of steps deleted (per <x,y>) = # of steps created (per <x,y>) ≤ 1. [Figure: 0-1 matrix with cell <x-1,y> marked.]

  21. Number of Maximal Rectangles. # of maximal 0-rectangles ≤ O((# 1's)^2) [Namaad, Hsu, Lee]. # of maximal 0-rectangles ≤ running time of alg = O(# 0's).

  22. How many empty rectangles are there? Tests were run on 4 pairs of attributes with numerical domains that appear in typical joins in a real-world workload of a health insurance company.

  23. How big are the rectangles?

  24. Query rewrite: simple case. Original: select … from R, S,... where R.C=S.C and 60<R.A<80 and 20<S.B<80 and... Rewritten: select … from R, S,... where R.C=S.C and 60<R.A<80 and 20<S.B<60 and...
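The simple rewrite can be illustrated with a small range-trimming helper. This is a sketch under assumptions, not the optimizer logic from the talk: query ranges are treated as open intervals, the empty rectangle as a closed region known to contain no join tuples, and the rectangle used in the example (A in [60, 80], B in [60, 80]) is hypothetical, chosen only so that the result matches the slide.

```python
def trim_b_range(a_range, b_range, empty_rect):
    """Shrink the open range on B using one known empty rectangle.

    a_range, b_range: (low, high) open intervals from the query.
    empty_rect: ((a_low, a_high), (b_low, b_high)), closed, with no tuples inside.
    Returns the trimmed B range, the original range if nothing can be trimmed,
    or None if the query range is entirely empty.
    """
    a_lo, a_hi = a_range
    b_lo, b_hi = b_range
    (ea_lo, ea_hi), (eb_lo, eb_hi) = empty_rect
    # The rectangle must span the whole queried A range to be usable here.
    if not (ea_lo <= a_lo and a_hi <= ea_hi):
        return b_range
    covers_low = eb_lo <= b_lo
    covers_high = eb_hi >= b_hi
    if covers_low and covers_high:
        return None  # the whole queried region is empty
    if covers_high and eb_lo < b_hi:
        return (b_lo, eb_lo)  # cut off the empty upper part
    if covers_low and eb_hi > b_lo:
        return (eb_hi, b_hi)  # cut off the empty lower part
    return b_range


# Query from the slide: 60 < R.A < 80 and 20 < S.B < 80.
print(trim_b_range((60, 80), (20, 80), ((60, 80), (60, 80))))
```

A rectangle that sits strictly inside the queried B range cannot be handled by a single trimmed range; that situation corresponds to the complex case on the next slide.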

  25. Query rewrite: complex case. Original: select … from R, S,... where R.C=S.C and 60<R.A<80 and 20<S.B<80 and... Rewritten: select … from R, S,... where R.C=S.C and (… and …) or (… and …) or (… and …) or ...

  26. How much do the rectangles overlap with queries?

  27. Query optimization experiments: real-world workload of 26 queries; 5 of the queries “qualified” for the rewrite; only simple rewrites were considered; all rewrites led to improved performance.
