Aggregate Query Answering under Uncertain Schema Mappings

Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, VS Subrahmanian. Presented by Stephen Lynn.

Presentation Transcript


  1. Aggregate Query Answering under Uncertain Schema Mappings • Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, VS Subrahmanian • Presented by Stephen Lynn

  2. Overview • Aggregate Queries • Probabilistic Schema Mapping • Goals/Objectives • Aggregate Processing (3 proposals) • By-Table Algorithm • By-Tuple Algorithm • Evaluation • Analysis

  3. Aggregate Queries • COUNT, MIN, MAX, SUM, AVG • On an ordinary (certain) table, each is computed by a simple PTIME algorithm
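As a reference point for the slide's claim, all five aggregates can be computed in one linear scan of a certain column. A minimal Python sketch, assuming SQL-style results on empty input (COUNT is 0, the others are NULL, shown here as `None`); the function name `aggregate_all` is hypothetical:

```python
from typing import Dict, Optional, Sequence


def aggregate_all(values: Sequence[float]) -> Dict[str, Optional[float]]:
    """Compute COUNT, MIN, MAX, SUM and AVG over a certain column in one O(n) scan."""
    if not values:
        # Mirror SQL: COUNT is 0, the other aggregates are NULL (None here).
        return {"COUNT": 0, "MIN": None, "MAX": None, "SUM": None, "AVG": None}
    count, total = 0, 0
    lo = hi = values[0]
    for v in values:
        count += 1
        total += v
        lo = min(lo, v)
        hi = max(hi, v)
    return {"COUNT": count, "MIN": lo, "MAX": hi, "SUM": total, "AVG": total / count}


print(aggregate_all([12, 3, 9]))
```

Here the call prints `{'COUNT': 3, 'MIN': 3, 'MAX': 12, 'SUM': 24, 'AVG': 8.0}`.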

  4. Probabilistic Schema Mappings

  5. By-Table vs. By-Tuple • By-tuple: each tuple independently considers all possible mappings • By-table: a single mapping applies to the entire table • Example: P(date → postedDate) = 0.7, P(date → reducedDate) = 0.3
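To make the two semantics concrete, the following Python sketch enumerates them by brute force for a COUNT query, using the slide's probabilities for `date`. The source tuples, the cutoff date and the query `COUNT(*) WHERE date < cutoff` are hypothetical. By-table produces one answer per mapping, while by-tuple lets every tuple pick its own mapping, so this brute-force version visits all m^n combinations and is only meant as an illustration of the definitions, not as an efficient algorithm.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Slide 5's p-mapping: mediated attribute "date" corresponds to one of two
# source attributes. Fractions keep the probabilities exact.
P_MAPPING = {"postedDate": Fraction(7, 10), "reducedDate": Fraction(3, 10)}

# Hypothetical source tuples; ISO-8601 strings compare in date order.
ROWS = [
    {"postedDate": "2008-01-05", "reducedDate": "2008-02-10"},
    {"postedDate": "2008-01-20", "reducedDate": "2008-01-25"},
    {"postedDate": "2008-01-10", "reducedDate": "2008-03-15"},
    {"postedDate": "2008-03-10", "reducedDate": "2008-01-10"},
]
CUTOFF = "2008-02-01"  # hypothetical query: COUNT(*) WHERE date < CUTOFF


def count_under(choice):
    """COUNT when tuple i reads "date" from source attribute choice[i]."""
    return sum(1 for row, attr in zip(ROWS, choice) if row[attr] < CUTOFF)


def by_table_distribution():
    # One mapping for the whole table: at most len(P_MAPPING) answers.
    dist = defaultdict(Fraction)
    for attr, p in P_MAPPING.items():
        dist[count_under([attr] * len(ROWS))] += p
    return dict(dist)


def by_tuple_distribution_bruteforce():
    # Each tuple picks its mapping independently: m**n combinations.
    dist = defaultdict(Fraction)
    for choice in product(P_MAPPING, repeat=len(ROWS)):
        prob = Fraction(1)
        for attr in choice:
            prob *= P_MAPPING[attr]
        dist[count_under(choice)] += prob
    return dict(dist)


print(by_table_distribution())
print(by_tuple_distribution_bruteforce())
```

With these rows, by-table gives COUNT = 3 with probability 7/10 and 2 with probability 3/10, whereas by-tuple spreads the answer over 1 to 4 (probabilities 63/1000, 321/1000, 469/1000 and 147/1000).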

  6. Goals/Objectives • Analyze the impact of probabilistic schema mappings on aggregate queries • Develop aggregate query algorithms • Analyze their time complexity • Evaluate them empirically

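  7. Aggregation Methods • Range: the interval [min, max] of possible answers • Distribution: each possible answer with its probability • Expected value: the probability-weighted average of the possible answers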

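  8. Method Relationships • Distribution: most time-consuming to compute, but carries the most information • Range: can be computed directly from the distribution • Expected value: can be computed directly from the distribution • Range and expected value also have more efficient ways to compute than deriving the full distribution first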
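The relationship on this slide can be sketched directly: once the full distribution of an aggregate's answers is known, the range and the expected value each fall out of a single pass over it. A minimal Python sketch, assuming the distribution is given as a mapping from answer to exact probability (`fractions.Fraction`); the function names and the sample distribution are hypothetical:

```python
from fractions import Fraction
from typing import Dict, Tuple


def check_distribution(dist: Dict[int, Fraction]) -> None:
    """Reject inputs whose probabilities do not sum to 1."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")


def range_from_distribution(dist: Dict[int, Fraction]) -> Tuple[int, int]:
    """Smallest and largest answers that occur with non-zero probability."""
    check_distribution(dist)
    support = [value for value, prob in dist.items() if prob > 0]
    return min(support), max(support)


def expected_value_from_distribution(dist: Dict[int, Fraction]) -> Fraction:
    """Probability-weighted average of the possible answers."""
    check_distribution(dist)
    return sum((value * prob for value, prob in dist.items()), Fraction(0))


# Hypothetical distribution of a COUNT answer.
count_dist = {1: Fraction(63, 1000), 2: Fraction(321, 1000), 3: Fraction(469, 1000), 4: Fraction(147, 1000)}
print(range_from_distribution(count_dist))           # (1, 4)
print(expected_value_from_distribution(count_dist))  # 27/10
```

The reverse direction does not work: a range or an expected value discards information about the distribution, which is why dedicated algorithms for those two semantics can be cheaper than building the whole distribution first.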

  9. By-Table Algorithm • All aggregates are PTIME computable under all three semantics
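Under by-table semantics, a p-mapping with m candidate mappings yields at most m possible answers, so one straightforward way to answer an aggregate query is to evaluate it once per mapping and weight each answer by that mapping's probability; m evaluations of a PTIME aggregate remain PTIME. A minimal Python sketch of that approach (not necessarily the authors' implementation); the mediated attribute `price`, its hypothetical source attributes `bidPrice` and `buyItNowPrice`, their probabilities and the rows are all invented for illustration:

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, int]


def by_table(
    rows: Sequence[Row],
    p_mapping: Dict[str, Fraction],
    aggregate: Callable[[List[int]], int],
) -> Tuple[Dict[int, Fraction], Tuple[int, int], Fraction]:
    """Return (distribution, range, expected value) of an aggregate under by-table semantics."""
    dist: Dict[int, Fraction] = defaultdict(Fraction)
    for attr, prob in p_mapping.items():
        # One ordinary query per candidate mapping: O(n) for these aggregates.
        answer = aggregate([row[attr] for row in rows])
        dist[answer] += prob  # mappings giving the same answer merge
    # Assumes every mapping has positive probability, so all keys are possible.
    low, high = min(dist), max(dist)
    expected = sum((value * prob for value, prob in dist.items()), Fraction(0))
    return dict(dist), (low, high), expected


# Hypothetical p-mapping for a mediated attribute "price".
PRICE_MAPPING = {"bidPrice": Fraction(3, 5), "buyItNowPrice": Fraction(2, 5)}
ROWS = [
    {"bidPrice": 10, "buyItNowPrice": 15},
    {"bidPrice": 25, "buyItNowPrice": 30},
    {"bidPrice": 40, "buyItNowPrice": 50},
]

print(by_table(ROWS, PRICE_MAPPING, sum))  # SUM(price)
print(by_table(ROWS, PRICE_MAPPING, max))  # MAX(price)
```

For SUM this gives 75 with probability 3/5 and 95 with probability 2/5, a range of [75, 95] and an expected value of 83; passing `min` or `len` works the same way.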

  10. By-Tuple Algorithm (COUNT) O(n * m)

  11. Example By-Tuple (COUNT)
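For COUNT under by-tuple semantics, enumerating all per-tuple mapping choices is unnecessary. Writing n for the number of tuples and m for the number of candidate mappings (an assumed reading of the slide's O(n * m)), each tuple qualifies with probability p_i, the total probability of the mappings under which it satisfies the selection condition, and computing every p_i takes O(n · m) time. The sketch below is one way to finish from there, assuming tuples choose their mappings independently: the expected COUNT is the sum of the p_i, and the range runs from the number of tuples with p_i = 1 to the number with p_i > 0. The full distribution uses a standard Poisson-binomial dynamic program, which adds O(n^2) work and is a swapped-in technique rather than the algorithm on the slide. The rows, cutoff and function names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Sequence, Tuple

# Slide 5's probabilities; the rows and cutoff are hypothetical.
P_MAPPING = {"postedDate": Fraction(7, 10), "reducedDate": Fraction(3, 10)}
ROWS = [
    {"postedDate": "2008-01-05", "reducedDate": "2008-02-10"},
    {"postedDate": "2008-01-20", "reducedDate": "2008-01-25"},
    {"postedDate": "2008-01-10", "reducedDate": "2008-03-15"},
    {"postedDate": "2008-03-10", "reducedDate": "2008-01-10"},
]


def tuple_probabilities(
    rows: Sequence[Dict[str, str]],
    p_mapping: Dict[str, Fraction],
    predicate: Callable[[str], bool],
) -> List[Fraction]:
    """p_i = total probability of the mappings under which tuple i qualifies: O(n * m)."""
    return [
        sum((prob for attr, prob in p_mapping.items() if predicate(row[attr])), Fraction(0))
        for row in rows
    ]


def by_tuple_count(probs: Sequence[Fraction]) -> Tuple[Dict[int, Fraction], Tuple[int, int], Fraction]:
    """Distribution, range and expected value of COUNT for independent tuples."""
    expected = sum(probs, Fraction(0))
    low = sum(1 for p in probs if p == 1)  # qualifies under every mapping
    high = sum(1 for p in probs if p > 0)  # qualifies under at least one mapping
    # dist[k] = probability that exactly k of the tuples processed so far qualify.
    dist = [Fraction(1)]
    for p in probs:
        nxt = [Fraction(0)] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)  # this tuple does not qualify
            nxt[k + 1] += q * p    # this tuple qualifies
        dist = nxt
    return {k: q for k, q in enumerate(dist) if q > 0}, (low, high), expected


probs = tuple_probabilities(ROWS, P_MAPPING, lambda d: d < "2008-02-01")
print(probs)
print(by_tuple_count(probs))
```

With these rows the per-tuple probabilities are 7/10, 1, 7/10 and 3/10, giving an expected COUNT of 27/10 and a range of [1, 4]. Exact fractions matter for the range: the test `p == 1` would be fragile with floating-point sums such as 0.7 + 0.3.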

  12. Time Complexity

  13. Evaluation • Empirical evaluation on a real-world dataset (eBay) and a synthetic dataset • Running-time analysis, varying the number of tuples and the number of attribute mappings

  14. Evaluation Results

  15. Evaluation Results

  16. Evaluation Results

  17. Analysis • Strengths: shows the effect of probabilistic schemas on aggregates; nice PTIME algorithms • Weaknesses: the evaluation results were predictable; by-table results are biased by database optimizations • Future work: improve the algorithms; extend to sub-queries; explore heuristics
