
Linguistic summaries on relational databases

Linguistic summaries on relational databases. Miroslav Hudec, University of Economics in Bratislava, Department of Applied Informatics. FSTA, 2014. Relational knowledge from a data set. Do most municipalities with high altitude have small pollution?


Presentation Transcript


  1. Linguistic summaries on relational databases. Miroslav Hudec, University of Economics in Bratislava, Department of Applied Informatics. FSTA, 2014

  2. Relational knowledge from a data set: Do most municipalities with high altitude have small pollution? If-then rules: if population density is high, then is waste production high? Validity of a rule.

  3. Linguistic summary - introduction: Q entities in the database are (have) S, formally Qx(Px). Q is a linguistic quantifier, X = {x} is the universe of discourse, and P(x) is a predicate depicting the summarizer S. The truth value of a summary is called its validity and takes values from the [0, 1] interval.

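  4. Linguistic summary - elementary: Q entities in the database are (have) S. The validity is $v(Qx(Px)) = \mu_q\left(\frac{1}{n}\sum_{i=1}^{n}\mu_p(x_i)\right)$, where n is the cardinality of the database (number of entities), $\frac{1}{n}\sum_{i=1}^{n}\mu_p(x_i)$ is the proportion of objects in the database that satisfy P(x), and µq is the membership function of the quantifier.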
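
The elementary summary can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the breakpoints of `most` (0.3 and 0.8), the shape of the `small` summarizer, and the altitude values are all assumptions chosen for the example.

```python
def most(y: float) -> float:
    """Quantifier 'most' as a piecewise-linear function (assumed breakpoints 0.3 and 0.8)."""
    if y <= 0.3:
        return 0.0
    if y >= 0.8:
        return 1.0
    return 2.0 * y - 0.6


def small(x: float, a: float = 200.0, b: float = 400.0) -> float:
    """Hypothetical summarizer 'small': 1 up to a, linearly decreasing to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def validity_elementary(values, summarizer, quantifier) -> float:
    """Validity of 'Q entities are S': quantifier applied to the mean membership degree."""
    if not values:
        raise ValueError("the database must contain at least one entity")
    proportion = sum(summarizer(x) for x in values) / len(values)
    return quantifier(proportion)


# Illustrative altitudes in metres (made-up data).
altitudes = [150, 180, 250, 300, 900]
v = validity_elementary(altitudes, small, most)  # proportion 0.65, validity about 0.7
```

Passing the summarizer and the quantifier as functions keeps the validity computation independent of how either fuzzy set is parameterized.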

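  5. Linguistic summary - extended: Q R objects in the database are (have) S. The validity is $v = \mu_q\left(\frac{\sum_{i=1}^{n} t(\mu_S(x_i), \mu_R(x_i))}{\sum_{i=1}^{n}\mu_R(x_i)}\right)$, where the argument of µq is the proportion of R objects in the database that satisfy S, t is a t-norm, and µq is the membership function of the quantifier.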
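
A hedged sketch of the extended summary follows, using the minimum t-norm. The membership functions, their parameters, the data, and the convention of returning 0 when no object belongs to R are assumptions made for illustration only.

```python
def ramp_up(a: float, b: float):
    """Membership function that is 0 up to a and 1 from b on, linear in between."""
    def mu(x: float) -> float:
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)
    return mu


def ramp_down(a: float, b: float):
    """Membership function that is 1 up to a and 0 from b on, linear in between."""
    up = ramp_up(a, b)
    return lambda x: 1.0 - up(x)


most = ramp_up(0.3, 0.8)            # assumed parameters of 'most'
high_altitude = ramp_up(500, 700)   # hypothetical restriction R
small_pollution = ramp_down(20, 60) # hypothetical summarizer S


def validity_extended(records, restriction, summarizer, quantifier, t_norm=min):
    """Validity of 'Q R objects are S' with a configurable t-norm (minimum by default)."""
    numerator = sum(t_norm(summarizer(r), restriction(r)) for r in records)
    denominator = sum(restriction(r) for r in records)
    if denominator == 0:
        return 0.0  # assumption: no object belongs to R, so the summary is not supported
    return quantifier(numerator / denominator)


# Made-up (altitude, pollution) pairs.
municipalities = [(900, 10), (800, 20), (300, 80), (850, 60)]
v = validity_extended(
    municipalities,
    restriction=lambda r: high_altitude(r[0]),
    summarizer=lambda r: small_pollution(r[1]),
    quantifier=most,
)
```

Here three municipalities fully belong to "high altitude" and two of them have "small pollution", so the proportion is 2/3.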

  6. Linguistic summary - graph: Q R objects in the database are (have) S

  7. Issues

  8. Summarizer: Let Dmin and Dmax be the lowest and the highest domain values of attribute A, i.e. Dom(A) = [Dmin, Dmax], and let L and H be the lowest and the highest values in the current content of the database, respectively. In practice, [L, H] ⊆ [Dmin, Dmax]. This fact should be considered in linguistic summaries.

  9. Family of summarizers: the uniform domain covering method (Tudorie, 2008)
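
One possible reading of uniform domain covering is sketched below: the interval [L, H] taken from the current database content is covered by three trapezoidal fuzzy sets of equal core length with equal slopes. The specific proportions (core of one quarter and slope of one eighth of H − L) and the names `small`, `medium`, `high` are assumptions of this sketch and may differ from the parameters used on the slide.

```python
def trapezoid(a: float, b: float, c: float, d: float):
    """Trapezoidal membership function with support [a, d] and core [b, c]."""
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


def uniform_covering(low: float, high: float) -> dict:
    """Cover [low, high] with 'small', 'medium' and 'high' (assumed core 1/4, slope 1/8)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    step = (high - low) / 8.0
    return {
        "small": trapezoid(low, low, low + 2 * step, low + 3 * step),
        "medium": trapezoid(low + 2 * step, low + 3 * step, low + 5 * step, low + 6 * step),
        "high": trapezoid(low + 5 * step, low + 6 * step, high, high),
    }


# L and H would be mined from the data, e.g. min and max of the attribute column.
sets = uniform_covering(0.0, 800.0)
degrees = {name: mu(250.0) for name, mu in sets.items()}
```

With L = 0 and H = 800, the value 250 lies on the boundary slope and belongs to both "small" and "medium" with degree 0.5.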

  10. Quantifier: For a regular non-decreasing quantifier (e.g. most), the membership function should meet a monotonicity property. The quantifier most might be given as in (Kacprzyk and Zadrożny, 2009).
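
The formulas of this slide are not present in the transcript. As a hedged reconstruction, the conditions usually stated for a regular non-decreasing quantifier, together with a piecewise-linear form of most that is common in the literature, are given below. The breakpoints 0.3 and 0.8 are an assumption and may not match the slide.

```latex
% Regular non-decreasing quantifier: boundary conditions and monotonicity
\mu_q(0) = 0, \qquad \mu_q(1) = 1, \qquad
y_1 \le y_2 \;\Rightarrow\; \mu_q(y_1) \le \mu_q(y_2)

% A commonly used form of the quantifier "most" (assumed breakpoints)
\mu_q(y) =
\begin{cases}
1,        & y \ge 0.8 \\
2y - 0.6, & 0.3 < y < 0.8 \\
0,        & y \le 0.3
\end{cases}
```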

  11. Example: Rules: if population density is small then production of waste is small, with cf = 1; if population density is high then production of waste is high, with cf = 0.662.

  12. Family of quantifiers: the uniform domain covering method on the [0, 1] interval

  13. Comparison of quantifiers

  14. Optimization of summaries: • The decision maker creates a particular linguistic summary or sentence of interest and evaluates its validity • Automatic generation of relevant linguistic summaries (Liu, 2011): the search runs over a set of relevant quantifiers, a set of relevant linguistic expressions, and a set defining the subpopulation of interest, and β is the threshold value from the (0, 1] interval. Each solution produces a linguistic summary Q* R* are S*.
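
Automatic generation can be sketched as an exhaustive search over the three sets, keeping every combination whose validity reaches β. This brute-force enumeration is an assumption of the sketch and not necessarily the optimization model of (Liu, 2011); all fuzzy sets, labels, and data below are hypothetical.

```python
from itertools import product


def ramp_up(a, b):
    """Membership function that is 0 up to a and 1 from b on, linear in between."""
    def mu(x):
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)
    return mu


def ramp_down(a, b):
    """Membership function that is 1 up to a and 0 from b on, linear in between."""
    up = ramp_up(a, b)
    return lambda x: 1.0 - up(x)


def generate_summaries(records, quantifiers, restrictions, summarizers, beta):
    """Return (validity, sentence) pairs for all 'Q R are S' with validity >= beta."""
    found = []
    for (q_name, q), (r_name, r), (s_name, s) in product(
        quantifiers.items(), restrictions.items(), summarizers.items()
    ):
        support = sum(r(rec) for rec in records)
        if support == 0:
            continue  # assumption: skip subpopulations with no members
        proportion = sum(min(s(rec), r(rec)) for rec in records) / support
        validity = q(proportion)
        if validity >= beta:
            found.append((validity, f"{q_name} {r_name} are {s_name}"))
    return sorted(found, reverse=True)


# Hypothetical fuzzy sets and made-up data.
low_density, high_density = ramp_down(50, 100), ramp_up(200, 300)
low_waste, high_waste = ramp_down(10, 20), ramp_up(70, 80)
quantifiers = {"most": ramp_up(0.3, 0.8), "few": ramp_down(0.2, 0.5)}
restrictions = {
    "low-density municipalities": lambda rec: low_density(rec["density"]),
    "high-density municipalities": lambda rec: high_density(rec["density"]),
}
summarizers = {
    "low-waste": lambda rec: low_waste(rec["waste"]),
    "high-waste": lambda rec: high_waste(rec["waste"]),
}
records = [
    {"density": 30, "waste": 5},
    {"density": 40, "waste": 8},
    {"density": 400, "waste": 90},
    {"density": 500, "waste": 95},
]
found = generate_summaries(records, quantifiers, restrictions, summarizers, beta=0.5)
```

Exhaustive search is only practical for small sets of quantifiers and linguistic terms; larger search spaces would need a proper optimization method.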

  15. Optimization of summaries {(small, small), (small, medium), (medium, medium), (high, high)}

  16. Fuzzy functional dependencies and linguistic summaries

  17. Queries by summaries: Data on the lower hierarchical level are the basis for summaries, but only data on the higher level are revealed, ranked downward from the best to the worst. Example: select regions where most municipalities have small altitude above sea level. The validity for cluster i is $v_i = \mu_q\left(\frac{1}{N_i}\sum_{j=1}^{N_i}\mu_p(x_{ji})\right)$, $i = 1, \dots, R$, with $\sum_{i=1}^{R} N_i = n$, where n is the number of entities in the whole database, Ni is the number of entities in cluster i (municipalities in region i), R is the number of clusters in the database (regions), and µp(xji) is the matching degree of the j-th entity in the i-th cluster. • Advantages: • Sensitive data, or data that are not free of charge, remain hidden • Policy maker… is interested in a general overview, not in data

  18. Example: Select regions where most municipalities have small altitude above sea level
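
A query of this kind can be sketched by computing the summary validity separately for each region and returning only region names with their validity, ranked from best to worst. The region names, altitude values, and membership function parameters are made up for illustration.

```python
def ramp_up(a, b):
    """Membership function that is 0 up to a and 1 from b on, linear in between."""
    def mu(x):
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)
    return mu


most = ramp_up(0.3, 0.8)  # assumed parameters of 'most'


def small_altitude(x):
    """Hypothetical 'small altitude': 1 up to 200 m, linearly decreasing to 0 at 400 m."""
    return 1.0 - ramp_up(200, 400)(x)


def rank_regions(regions, summarizer, quantifier):
    """Rank clusters by the validity of 'Q entities in the cluster are S'."""
    ranked = []
    for name, values in regions.items():
        if not values:
            continue  # a region without municipalities cannot be evaluated
        proportion = sum(summarizer(v) for v in values) / len(values)
        ranked.append((name, quantifier(proportion)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Municipality altitudes per region (made-up data); these values stay hidden from the user.
regions = {
    "Region A": [120, 150, 180, 700],
    "Region B": [600, 800, 900],
    "Region C": [100, 140, 160],
}
ranking = rank_regions(regions, small_altitude, most)
```

Only the aggregated ranking leaves the function, which mirrors the advantage noted on the previous slide: municipality-level data remain hidden.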

  19. Conclusion: The work demonstrates how we can start with a simple linguistic summary and build more complex summaries by merging knowledge from several fields: mining parameters for functions of summarizers from data and extending this to defining parameters of quantifiers, optimization of summaries, and fuzzy queries. Although fuzzy set theory has already been established as an adequate framework to deal with linguistic summaries, there is still space for improvement.

  20. Some topics for further research: • Linguistic summaries on fuzzy databases • Operations research task for optimising the process of rule generation • Full applications for practitioners • Fuzzy functional dependencies and linguistic summaries in data mining

  21. Thank you for your attention
