On Random Sampling over Joins

Spring 2007, CSE 6392-003: Data Exploration and Analysis in Relational Databases. On Random Sampling over Joins, by Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. Presented by Lekhendro, 2/27/2007. Outline: semantics and algorithms of sampling; the difficulty of join sampling.

Presentation Transcript


  1. Spring 2007, CSE 6392-003: Data Exploration and Analysis in Relational Databases. On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya. Presented by: Lekhendro, 2/27/2007

  2. Outline • Semantics and algorithms of sampling • The difficulty of join sampling • Classification of join problems • Two previous sampling strategies • New strategies for join sampling • Experimental results

  3. Semantics of Sampling • Sampling with Replacement (WR) • Sampling without Replacement (WoR) • Independent Coin Flips (CF) • Stream Sampling • Sequential • Non-sequential (may operate on materialized data) • Weighted and Unweighted Sampling
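
The first three semantics can be contrasted with a short Python sketch. The function names are ours and serve only as illustration. WR makes r independent uniform draws. WoR picks r distinct positions. CF keeps each tuple independently with probability p, so the size of a CF sample is itself random.

```python
import random

def sample_wr(rel, r, rng):
    """With replacement: r independent uniform draws, so duplicates can occur."""
    return [rng.choice(rel) for _ in range(r)]

def sample_wor(rel, r, rng):
    """Without replacement: r distinct positions chosen uniformly (needs r <= len(rel))."""
    return rng.sample(rel, r)

def sample_cf(rel, p, rng):
    """Independent coin flips: keep each tuple with probability p; the size is random."""
    return [t for t in rel if rng.random() < p]

rng = random.Random(7)
R = list(range(10))
print(sample_wr(R, 5, rng))
print(sample_wor(R, 5, rng))
print(sample_cf(R, 0.3, rng))
```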

  4. Sequential WR Sampling. I. Black-Box U1: Given a relation R with n tuples, generate an UNWEIGHTED WR sample of size r.
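
Below is a minimal sketch of sequential WR sampling for U1. It assumes n is known before the scan, and the helper names `binomial` and `sample_u1` are hypothetical. Suppose i tuples have been passed and x draws are still unassigned. Each of those draws is uniform over the remaining n − i tuples, so the current tuple receives a Binomial(x, 1/(n − i)) number of copies.

```python
import random

def binomial(x, p, rng):
    """Draw from Binomial(x, p) by counting x Bernoulli(p) trials (fine for a sketch)."""
    return sum(1 for _ in range(x) if rng.random() < p)

def sample_u1(rel, r, rng):
    """Sequential unweighted WR sample of size r; assumes n = len(rel) is known.

    The x draws not yet assigned are uniform over the n - i unscanned tuples,
    so tuple i (0-based) receives Binomial(x, 1/(n - i)) of them.
    """
    n = len(rel)
    x = r                          # draws not yet assigned to any tuple
    out = []
    for i, t in enumerate(rel):
        if x == 0:
            break                  # sample complete; skip the rest of the scan
        k = binomial(x, 1.0 / (n - i), rng)   # the last tuple gets p = 1
        out.extend([t] * k)        # emit k copies of t
        x -= k
    return out

print(sample_u1(list("abcde"), 8, random.Random(1)))
```

The output lists the copies in scan order, so shuffle it if a random order is needed. Summing coin flips costs O(x) per tuple, and a library binomial generator would be faster.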

  5. Sequential WR Sampling: continued… II. Black-Box U2: Given a relation R with n tuples, generate an UNWEIGHTED WR sample of size r.
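
Here U2 is taken to be the variant for inputs whose size n is not known in advance. That is an assumption, because the slide gives the same problem statement as U1. Under it, one standard approach keeps r independent reservoirs of size one. The tuple at 1-based position i replaces each slot with probability 1/i, and the function name `sample_u2` is hypothetical.

```python
import random

def sample_u2(stream, r, rng):
    """Unweighted WR sample of size r in one pass, without knowing n up front.

    Keeps r independent size-1 reservoirs. The i-th tuple (1-based) replaces
    each slot with probability 1/i, so at the end each slot holds any given
    tuple with probability (1/i) * (i/n) = 1/n. Assumes a non-empty input.
    """
    slots = [None] * r
    for i, t in enumerate(stream, start=1):
        for j in range(r):
            if rng.random() < 1.0 / i:
                slots[j] = t
    return slots

print(sample_u2(iter("abcde"), 8, random.Random(2)))
```

This costs r coin flips per tuple. A cheaper version with the same distribution draws the number of replaced slots from Binomial(r, 1/i) and then chooses that many slots at random.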

  6. Sequential WR Sampling: continued… III. Black-Box WR1: Given a relation R with n tuples, generate a WEIGHTED WR sample of size r.
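
The weighted analogue of the sequential scheme replaces 1/(n − i) with the current tuple's share of the weight not yet scanned. This sketch for WR1 assumes non-negative integer weights whose total is known or computed up front. The names `sample_wr1` and the `weight` callback are illustrative.

```python
import random

def binomial(x, p, rng):
    """Draw from Binomial(x, p) by counting x Bernoulli(p) trials."""
    return sum(1 for _ in range(x) if rng.random() < p)

def sample_wr1(rel, weight, r, rng):
    """Weighted WR sample of size r: each draw picks t with probability weight(t)/W.

    Assumes non-negative integer weights; W is computed before the scan.
    """
    remaining = sum(weight(t) for t in rel)   # weight of tuples not yet scanned
    x = r                                     # draws not yet assigned
    out = []
    for t in rel:
        if x == 0:
            break
        w = weight(t)
        if w > 0:
            # t's share of the unscanned weight; equals 1 for the last positive weight
            k = binomial(x, min(1.0, w / remaining), rng)
            out.extend([t] * k)
            x -= k
        remaining -= w
    return out

weights = {"a": 1, "b": 2, "c": 0}
print(sample_wr1(["a", "b", "c"], weights.get, 9, random.Random(3)))
```

Integer weights, such as join-partner counts, keep the running total exact. As a result, the last positive-weight tuple receives every remaining draw.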

  7. Sequential WR Sampling: continued… IV. Black-Box WR2: Given a relation R with n tuples, generate a WEIGHTED WR sample of size r.
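
This sketch for WR2 assumes the total weight W is not known until the scan ends. Each of the r slots is a weighted reservoir of size one. A tuple of weight w replaces a slot with probability w/D, where D is the running total including that tuple. Telescoping the survival probabilities shows that each slot ends up holding t with probability w(t)/W. The name `sample_wr2` is illustrative.

```python
import random

def sample_wr2(stream, weight, r, rng):
    """Weighted WR sample of size r in one pass; the total weight is not needed up front.

    Tuple t replaces each slot independently with probability weight(t)/D,
    where D is the running weight total including t. Slots are independent,
    so the result is a WR sample. Assumes at least one positive weight.
    """
    slots = [None] * r
    total = 0
    for t in stream:
        w = weight(t)
        if w <= 0:
            continue                 # zero-weight tuples can never be sampled
        total += w
        p = w / total                # equals 1 for the first positive-weight tuple
        for j in range(r):
            if rng.random() < p:
                slots[j] = t
    return slots

weights = {"a": 1, "b": 2, "c": 0}
print(sample_wr2(iter(["c", "a", "b"]), weights.get, 9, random.Random(4)))
```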

  8. The Difficulty of Join Sampling • Suppose that we have the relations
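
A Python experiment can illustrate why a sample of a join is not the join of samples. The skewed relations below are hypothetical and chosen only for illustration. R1(A, B) has one tuple with A = a1 and k tuples with A = a2. R2(A, C) has k tuples with A = a1 and one tuple with A = a2. Half of the 2k join tuples have A = a1, yet a 10% coin-flip sample of R1 contains the lone a1 tuple only 10% of the time.

```python
import random
from collections import defaultdict

def hash_join(r1, r2):
    """Equi-join R1(A, B) with R2(A, C) on A, returning (A, B, C) tuples."""
    index = defaultdict(list)
    for a, c in r2:
        index[a].append(c)
    return [(a, b, c) for a, b in r1 for c in index.get(a, [])]

def coin_flip(rel, p, rng):
    """Keep each tuple independently with probability p."""
    return [t for t in rel if rng.random() < p]

# Hypothetical skewed relations, for illustration only.
k = 500
R1 = [("a1", "b0")] + [("a2", f"b{i}") for i in range(1, k + 1)]
R2 = [("a2", "c0")] + [("a1", f"c{i}") for i in range(1, k + 1)]
J = hash_join(R1, R2)            # 2k tuples, half of them with A = a1

rng = random.Random(42)
p, trials = 0.1, 500
sizes, no_a1 = [], 0
for _ in range(trials):
    Js = hash_join(coin_flip(R1, p, rng), coin_flip(R2, p, rng))
    sizes.append(len(Js))
    no_a1 += all(t[0] != "a1" for t in Js)   # True counts as 1
print(len(J), sum(sizes) / trials, no_a1 / trials)
```

The join of 10% samples should average about 2k · p² = 10 tuples. A 10% sample of the join would contain roughly 100. In about 90% of trials the join of samples has no a1 tuples at all, so it is both too small and badly skewed.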

  9. Classification of the Problem: • Case A: No information is available for either R1 or R2. • Case B: No information is available for R1, but indexes and/or statistics are available for R2. • Case C: Indexes/statistics are available for both R1 and R2.

  10. Previous Sampling Strategies: I. Strategy Naive-Sample:
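
Assume Naive-Sample is the baseline its name suggests: compute the full join, then sample the result. Under that assumption it can be sketched as below. The relations are lists of tuples joined on their first field, and `naive_sample` is an illustrative name.

```python
import random
from collections import defaultdict

def naive_sample(r1, r2, r, rng):
    """Materialize R1 join R2 on the first field, then draw r pairs WR."""
    index = defaultdict(list)
    for t2 in r2:
        index[t2[0]].append(t2)
    join = [(t1, t2) for t1 in r1 for t2 in index.get(t1[0], [])]
    if not join:
        return []                      # nothing to sample from
    return [rng.choice(join) for _ in range(r)]

R1 = [(1, "x"), (2, "y")]
R2 = [(1, "p"), (1, "q"), (3, "z")]
print(naive_sample(R1, R2, 3, random.Random(5)))
```

Its cost is dominated by materializing R1 ⋈ R2. That join can be far larger than either input, even when only a few sample tuples are wanted.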

  11. Previous Strategies: continued… II. Strategy Olken-Sample:
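
Olken's approach is acceptance/rejection. Draw a random R1 tuple, then draw a random matching R2 tuple through an index. Accept the pair with probability m2(v)/M, where m2(v) is the number of R2 tuples with join value v and M is an upper bound on m2. In this sketch a Python dict stands in for the index on R2, and M is computed exactly as the largest group size; both are assumptions made for illustration.

```python
import random
from collections import defaultdict

def olken_sample(r1, r2, r, rng):
    """Acceptance/rejection WR sample of R1 join R2 on the first field.

    A trial draws t1 uniformly from R1 and t2 uniformly from the m2(v) R2
    tuples matching v = t1[0], then accepts with probability m2(v) / M, where
    M is the largest m2 value. Each join tuple is accepted with the same
    probability 1 / (|R1| * M) per trial, so accepted pairs are uniform.
    """
    index = defaultdict(list)            # stands in for an index on R2.A
    for t2 in r2:
        index[t2[0]].append(t2)
    if not any(t1[0] in index for t1 in r1):
        return []                        # empty join: no trial can succeed
    M = max(len(group) for group in index.values())
    out = []
    while len(out) < r:
        t1 = rng.choice(r1)
        matches = index.get(t1[0])
        if not matches:
            continue                     # t1 has no partner in R2: reject
        t2 = rng.choice(matches)
        if rng.random() < len(matches) / M:
            out.append((t1, t2))
    return out
```

When a few join values have far more matches than the rest, M is large and most trials are rejected. That is where the cost of this strategy comes from.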

  12. New Strategies for Join Sampling: I. Strategy Stream-Sample:
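
One way to sample the join in a single pass over R1 assumes that frequency statistics m2(v) and an index on R2 are available, as in Case B. First, draw a weighted WR sample of R1 with weight m2(t1.A) using a WR2-style reservoir. Then join each sampled tuple with one uniformly chosen match. This is our reading of a stream-based strategy, not a transcription of the slide, and the function name and data layout are illustrative.

```python
import random
from collections import defaultdict

def stream_join_sample(r1_stream, r2, r, rng):
    """WR sample of R1 join R2 (on the first field) with one pass over R1.

    Step 1: weighted reservoir (WR2-style) over R1 with weight m2(t1[0]).
    Step 2: pair each sampled t1 with a uniformly chosen matching R2 tuple.
    P(t1, t2) = m2(v)/|J| * 1/m2(v) = 1/|J|, so every join tuple is equally likely.
    """
    index = defaultdict(list)            # assumed index on R2.A; len() gives m2(v)
    for t2 in r2:
        index[t2[0]].append(t2)
    slots = [None] * r
    total = 0                            # running weight sum = join size so far
    for t1 in r1_stream:
        w = len(index.get(t1[0], ()))
        if w == 0:
            continue                     # t1 contributes no join tuples
        total += w
        for j in range(r):
            if rng.random() < w / total:
                slots[j] = t1
    if total == 0:
        return []                        # the join is empty
    return [(t1, rng.choice(index[t1[0]])) for t1 in slots]
```

Because R1 is consumed as a stream, it can be the output of another operator rather than a stored table.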

  13. New Strategies: continued… II. Strategy Group-Sample:
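
Sampling R1 with weights m2(v) and then choosing one matching R2 tuple per sampled R1 tuple does not require an index on R2. The sampled tuples can be grouped by join value, and each one can pick its partner during a single scan of R2 using a size-one reservoir. Whether this matches the slide's Group-Sample in detail is an assumption, and `group_sample` and the `m2` statistics dict are hypothetical.

```python
import random
from collections import defaultdict

def group_sample(r1, r2, m2, r, rng):
    """WR sample of R1 join R2 (first field) using statistics m2 and no index on R2.

    m2: hypothetical statistics mapping join value v -> number of R2 tuples
    with A = v; assumed to match R2 exactly.
    """
    pairs = [(t1, m2.get(t1[0], 0)) for t1 in r1]
    pairs = [(t1, w) for t1, w in pairs if w > 0]
    if not pairs:
        return []                            # the join is empty
    tuples, weights = zip(*pairs)
    chosen = rng.choices(tuples, weights=weights, k=r)   # weighted WR sample of R1
    groups = defaultdict(list)               # join value -> positions in `chosen`
    for j, t1 in enumerate(chosen):
        groups[t1[0]].append(j)
    seen = [0] * r
    partner = [None] * r
    for t2 in r2:                            # one scan of R2, no index needed
        for j in groups.get(t2[0], ()):
            seen[j] += 1
            if rng.random() < 1.0 / seen[j]: # size-one reservoir per sampled tuple
                partner[j] = t2
    return list(zip(chosen, partner))
```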

  14. New Strategies: continued… III. Strategy Frequency-Partition-Sample:
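
One generic step of any partition-based join sampler can be sketched without assuming how Frequency-Partition-Sample chooses its partitions: merging per-part samples. Suppose J is split into disjoint parts J_1, …, J_m with known sizes, and a WR sample of each part is available. A WR sample of J then takes each draw from J_i with probability |J_i|/|J|. The sketch covers only this merging step, and `merge_wr_samples` is an illustrative name.

```python
import random

def merge_wr_samples(parts, r, rng):
    """Merge WR samples of disjoint partitions J_1..J_m into a WR sample of J.

    parts: list of (size_i, sample_i), where size_i = |J_i| and sample_i is a
    WR sample of J_i with at least r tuples. Each output draw picks partition i
    with probability |J_i| / |J| and takes the next unused tuple of sample_i,
    so the draws stay independent.
    """
    sizes = [size for size, _ in parts]
    cursor = [0] * len(parts)
    out = []
    for i in rng.choices(range(len(parts)), weights=sizes, k=r):
        _, sample = parts[i]
        out.append(sample[cursor[i]])
        cursor[i] += 1
    return out
```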

  15. Experimental Results:

  16. Experimental Results:

  17. Experimental Results:

  18. Summary • The difficulty of join sampling. • Classification of the problem of random sampling over joins into 3 cases. • Different strategies:

  19. Summary: continued… • When indexes/statistics are NOT provided for both operands: • The Frequency-Partition strategy outperformed the other strategies. • When indexes/statistics are provided for both operands: • The Stream strategy is the best among them. • The Stream strategy is also applicable when indexes/statistics are available only for the inner relation.
