
Using Taxonomies to Perform Aggregated Querying over Imprecise Data

Using Taxonomies to Perform Aggregated Querying over Imprecise Data. Atanu Roy Chandrima Sarkar Rafal A. Angryk. Presented by: Rafal A. Angryk Date: 2010-12-14. Outlines of the Presentation. Idea Imprecision Motivation Limitations of Previous Work Definitions Approach


Presentation Transcript


  1. Using Taxonomies to Perform Aggregated Querying over Imprecise Data. Atanu Roy, Chandrima Sarkar, Rafal A. Angryk. Presented by: Rafal A. Angryk. Date: 2010-12-14

  2. Outlines of the Presentation • Idea • Imprecision • Motivation • Limitations of Previous Work • Definitions • Approach • Experimental Setup & Results • Conclusion and Future Work Roy, Sarkar, Angryk. Using Taxonomies to Perform Aggregated Querying over Imprecise Data

  3. Idea of the Project • This paper provides a framework for answering queries over imprecise data found in common databases. • We propose to solve this by classifying the data into taxonomical hierarchies and then capturing it in a weighted hierarchical hypergraph.

  4. Imprecision in Databases: An Example

  5. Constraint: All soybean seeds with the same kind of stem canker should germinate in the same month of the season.

  6. Motivation • Several recent papers have focused on retrieval of imprecise data, where every fact can be a region, instead of a point, in a multi-dimensional space. • The most prominent one is [BDRV07]. • They solved it by constructing marginal databases (MDBs) from extended databases (EDBs) with the help of a constraint hypergraph.

  7. Limitations of Previous Work • Creating marginal databases using a weighted hierarchical hypergraph employs a brute-force method for retrieving connected facts (tuples). • This increases the overall time complexity and processing time of the queries. • [BDRV07] follows a data-specific technique, whereas we propose to rely on domain-specific knowledge.

  8. Definitions • Background knowledge: Knowledge required to generate taxonomies. • Expert knowledge: Domain-specific human expertise. • Data-derived knowledge: Knowledge derived from a historic precise database and used to generate mutually exclusive probabilities. • Possible worlds: All the possible combinations of values that an imprecise record can assume. • Valid worlds: All the possible worlds that satisfy a given set of constraints.

  9. Roy, Sarkar, Angryk. Using Taxonomies to Perform Aggregated Querying over Imprecise Data

  10. Assignment of Probabilities

  11. EDB Creation • The probability of a possible world is the product of the unconditional occurrences of all imprecise attributes. • The sum of the probabilities of all possible worlds of an imprecise record is 1. • Probability assignment rule creates a set of tuples using
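
The two properties on the EDB Creation slide can be illustrated with a small Python sketch. The probability assignment rule itself is not reproduced in this transcript, so the code below is only one possible reading: it assumes each attribute's unconditional probabilities are renormalised over the candidate values of the imprecise record, which is what makes the world probabilities sum to 1. The function name `possible_worlds`, the `priors` table, and all numbers are hypothetical.

```python
from itertools import product


def possible_worlds(record, priors):
    """Expand one imprecise record into (world, probability) pairs.

    record: attribute -> list of candidate precise values (one value if precise)
    priors: attribute -> {value: unconditional probability}  (hypothetical input)
    """
    attrs = list(record)
    # Assumption: renormalise each attribute over its candidate values,
    # so that the probabilities of all worlds of this record sum to 1.
    norm = {a: sum(priors[a][v] for v in record[a]) for a in attrs}
    worlds = []
    for combo in product(*(record[a] for a in attrs)):
        p = 1.0
        for a, v in zip(attrs, combo):
            p *= priors[a][v] / norm[a]  # product over attributes
        worlds.append((dict(zip(attrs, combo)), p))
    return worlds


# Illustrative values only; they are not taken from the presentation.
priors = {
    "month": {"august": 0.3, "september": 0.1, "october": 0.6},
    "stem_canker": {"above-sec-node": 0.5, "below-soil": 0.5},
}
record = {"month": ["august", "september"], "stem_canker": ["above-sec-node"]}
worlds = possible_worlds(record, priors)
for world, p in worlds:
    print(world, round(p, 4))
```

In this example the record is imprecise only in `month`, so it expands into two possible worlds; a precise attribute contributes a factor of 1 to the product.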

  12. Hyperedge Creation

  13. MDB Creation • A weighted hierarchical hypergraph is defined as H(L, E), where L represents the nodes and E is the set of hyperedges between different taxonomies. • Each hyperedge signifies a distinct combination of attribute values. The weight of a possible world assigned to a hyperedge [AC10] needs to preserve a few properties. • All t-norms [AC10] (e.g. minimum, product) fulfill these requirements. We choose the product for the purposes of our preliminary investigation.
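
As a rough illustration of the t-norm choice on the MDB Creation slide, the Python sketch below combines per-attribute weights into a single hyperedge weight with either the minimum or the product t-norm. The slide does not list the required properties or the exact weighting procedure, so the function `hyperedge_weight` and the sample weights are assumptions made for illustration only. The fold starts from 1.0 because 1 is the neutral element of every t-norm.

```python
from functools import reduce


def t_norm_min(a, b):
    """Minimum t-norm."""
    return min(a, b)


def t_norm_product(a, b):
    """Product t-norm (the one chosen in the presentation)."""
    return a * b


def hyperedge_weight(weights, t_norm=t_norm_product):
    """Fold per-attribute weights in [0, 1] into one hyperedge weight.

    Hypothetical helper: 1.0 is the identity of any t-norm, T(a, 1) = a.
    """
    return reduce(t_norm, weights, 1.0)


# Illustrative weights only; they are not taken from the presentation.
weights = [0.5, 0.8]
w_product = hyperedge_weight(weights)
w_minimum = hyperedge_weight(weights, t_norm_min)
print(w_product, w_minimum)
```

The product never exceeds the minimum for inputs in [0, 1], so it penalises hyperedges that combine several uncertain attribute values more strongly.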

  14. EDB → MDB

  15. Aggregated Querying • We aggregate tuples for aggregated querying based on their uniqueness. • Group two tuples only when all their attribute values and the corresponding probabilities are the same. • Example query: find the total number of plants grown in August that have stem canker above-sec-node. • (44*0.9057) + (25*0.6429) ≈ 56
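
The arithmetic on the Aggregated Querying slide can be reproduced with a short Python sketch. It assumes that 44 and 25 are the sizes of two groups of identical tuples and that 0.9057 and 0.6429 are their probabilities; the slide does not say so explicitly. The helper names `group_tuples` and `expected_count` and the attribute keys are hypothetical.

```python
from collections import Counter


def group_tuples(tuples):
    """Group tuples only when all attribute values and the probability match."""
    return Counter((tuple(sorted(attrs.items())), prob) for attrs, prob in tuples)


def expected_count(groups, predicate):
    """Sum group size * probability over the groups that satisfy the query."""
    return sum(n * prob for (attrs, prob), n in groups.items()
               if predicate(dict(attrs)))


# Assumption: 44 identical tuples with probability 0.9057 and 25 with 0.6429.
hit = {"month": "august", "stem_canker": "above-sec-node"}
tuples = [(hit, 0.9057)] * 44 + [(hit, 0.6429)] * 25
groups = group_tuples(tuples)
total = expected_count(
    groups,
    lambda t: t["month"] == "august" and t["stem_canker"] == "above-sec-node",
)
print(len(groups), round(total))
```

The two groups stay separate because their probabilities differ, and the weighted sum 39.8508 + 16.0725 = 55.9233 rounds to the 56 reported on the slide.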

  16. Experimental Setup • Census-Income dataset from the UCI Machine Learning Repository. • 7 dimensions were used in the final experiments. • The precise database has 191239 records. • The test dataset has 99762 records. • Imprecision was randomly inserted into the test dataset to make it imprecise.

  17. Distribution of Imprecision

  18. Imprecision Characteristics

  19. Scalability Test

  20. Extended Database Analysis

  21. Influence of Imprecision

  22. Absolute Percentage Error

  23. Conclusion and Future Work • In this research we present a framework for efficient querying over imprecise data with an average accuracy of ≈ 94%. • We intend to extend this research to use an ontology in place of a taxonomy. • We also intend to use Associative Weight Mining to assign weights to hyperedges.

  24. Questions? • References • [BDRV07]: Douglas Burdick, AnHai Doan, Raghu Ramakrishnan, Shivakumar Vaithyanathan: OLAP over Imprecise Data with Domain Constraints. VLDB 2007: 39-50 • [AC10]: Rafal A. Angryk, Jacek Czerniak: Heuristic Algorithm for Interpretation of Multi-Valued Attributes in Similarity-based Fuzzy Relational Databases. International Journal of Approximate Reasoning 51: 895-911 (2010)
