
Relative Information Capacity of Simple Relational Database Schemata

Paper by: Richard Hull. Presented by: Jose Picado. Outline: problem (data relativism and information capacity), definitions, examples, importance, hierarchy of dominance measures, basic results, discussion.

Presentation Transcript


  1. Relative Information Capacity of Simple Relational Database Schemata Paper by: Richard Hull Presented by: Jose Picado

  2. Outline • Problem: Data relativism and information capacity • Definition • Examples • Importance • Hierarchy of dominance measures • Basic results • Discussion

  3. Data relativism • Represent the same data in different ways

  4. Data relativism • Represent the same data in different ways • Represent the same data under different schemas

  5. Data relativism • Represent the same data in different ways • Represent the same data under different schemas [Figure: Schema 1] Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.

  6. Data relativism • Represent the same data in different ways • Represent the same data under different schemas [Figure: Schema 1 and Schema 2] Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.

  7. Relative information capacity • Expressiveness of a schema • Different schemas representing same data may have different information capacity

  8. Relative information capacity • Expressiveness of a schema • Different schemas representing same data may have different information capacity [Figure: Schema 1 and Schema 2] Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.

  9. Relative information capacity • Expressiveness of a schema • Different schemas representing same data may have different information capacity • Schema 1: • Does not require that the spouse attribute of a man goes to a woman. • Does not require that for each spouse attribute in one direction there is a corresponding spouse attribute in another direction. Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.

  10. Relative information capacity • Expressiveness of a schema • Different schemas representing same data may have different information capacity • Schema 2: • Allows unmarried people to be represented in the database. Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.

  11. Relative information capacity • Possible solution: • Transform existing schema to new schema by structural manipulations [Figure: transformation between schemas]

  12. Relative information capacity • Possible solution: • Transform existing schema to new schema by structural manipulations • Is the transformation information capacity preserving? [Figure: transformation between schemas]

  13. Importance • Schema evolution • None of the information stored in the initial database is lost

  14. Importance • Data integration • All information in one of the component databases is reflected in the integrated database Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.

  15. Importance • Database normalization theory • User view construction • Schema simplification • Translation between data models

  16. Hull’s paper • Introduces theoretical tools for studying measures of relative information capacity • Theoretical frameworks at the time were complex • There was no clear definition about the concept • Hull introduced nice ways of comparing schemata and their information capacity • Defines a hierarchy of measures to compare information capacity of schemata

  17. Hull’s paper • Gives some basic results concerning the previous measures • Considers only non-keyed relations [Figure: non-keyed vs. keyed relations and their instances]

  18. Definitions • Schema P is a set of relations • Relations are composed of attributes, which may be of different basic types • Basic types are domain designators (each has a fixed domain of possible values) • I(P) is the set of instances of P, usually infinite [Figure: schema P and its instances I(P)]

  19. Transformation • P and Q are relational schemata • A transformation from P to Q is a map σ: I(P) → I(Q)

  20. Transformation • P and Q are relational schemata • A transformation from P to Q is a map σ: I(P) → I(Q) [Figure: schema P]

  21. Transformation • P and Q are relational schemata • A transformation from P to Q is a map σ: I(P) → I(Q) [Figure: schemas P and Q]

  22. Transformation • P and Q are relational schemata • A transformation from P to Q is a map σ: I(P) → I(Q) • Example, from P to Q: PersonInfo(x,y,z) :- Person(x,y), Birth(x,z).
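The rule on slide 22 can be read as a function from instances of P to instances of Q. The following is a minimal sketch under stated assumptions: relations are modelled as Python sets of tuples, and the attribute meanings (a sex for Person, a year for Birth) and the sample rows are invented for illustration, not taken from the slides.

```python
def person_info(person, birth):
    """PersonInfo(x, y, z) :- Person(x, y), Birth(x, z)."""
    # Join the two relations on their shared first attribute x.
    return {(x, y, z) for (x, y) in person for (x2, z) in birth if x == x2}


def transform(instance_p):
    """Map an instance of P (relations Person, Birth) to an instance of Q."""
    return {"PersonInfo": person_info(instance_p["Person"], instance_p["Birth"])}


# Hypothetical instance of P.
example_p = {
    "Person": {("ann", "female"), ("bob", "male")},
    "Birth": {("ann", 1980), ("bob", 1975)},
}
example_q = transform(example_p)
```

Here `example_q` holds one PersonInfo tuple per person that appears in both input relations.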

  23. Dominance • P and Q are relational schemata • Q dominates P via (σ, τ), where σ: I(P) → I(Q) and τ: I(Q) → I(P), if the composition of σ followed by τ is the identity on I(P)

  24. Dominance [Figure: schemas P and Q]

  25. Dominance • Take instances of P: I(P)

  26. Dominance • Apply σ to I(P): Male(x) :- Person(x,y,z), y="male". Female(x) :- Person(x,y,z), y="female". Marriage(x,y) :- Person(x,u,y), Person(y,v,x), u="male", v="female".

  27. Dominance • Apply τ to σ(I(P)): Person(x,"male",z) :- Male(x), Marriage(x,z). Person(x,"female",z) :- Female(x), Marriage(z,x).

  28. Dominance • Compare I(P) and τ(σ(I(P)))

  29. Dominance • P and Q are relational schemata • Q dominates P via (σ, τ) if the composition of σ followed by τ is the identity on I(P) • Information structured according to P can be restructured to “fit” into Q, and restructured again to “fit” back into P • Q has at least as much capacity for storing information as P
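The round trip on slides 25 to 28 can be sketched in code. This is an illustrative sketch, not the paper's construction: `sigma` stands for the map from P to Q and `tau` for the map back (names assumed here), relations are Python sets of tuples, and the rule for women reads each Marriage pair as (husband, wife), which is an assumption about the intended meaning. The sample rows are invented.

```python
def sigma(p):
    """P -> Q: split Person(name, sex, spouse) into Male, Female, Marriage."""
    person = p["Person"]
    male = {(x,) for (x, sex, _) in person if sex == "male"}
    female = {(x,) for (x, sex, _) in person if sex == "female"}
    # A marriage needs a matching spouse attribute in both directions.
    marriage = {
        (x, y)
        for (x, u, y) in person
        for (y2, v, x2) in person
        if y2 == y and x2 == x and u == "male" and v == "female"
    }
    return {"Male": male, "Female": female, "Marriage": marriage}


def tau(q):
    """Q -> P: rebuild Person tuples from Male, Female, Marriage."""
    men = {x for (x,) in q["Male"]}
    women = {x for (x,) in q["Female"]}
    person = set()
    for husband, wife in q["Marriage"]:
        if husband in men:
            person.add((husband, "male", wife))
        if wife in women:
            person.add((wife, "female", husband))
    return {"Person": person}


# Spouse attributes agree in both directions: the round trip is the identity.
well_formed = {"Person": {("bob", "male", "ann"), ("ann", "female", "bob")}}
# Spouse attributes disagree: the round trip loses both tuples.
one_sided = {"Person": {("bob", "male", "ann"), ("ann", "female", "carl")}}

round_trip_ok = tau(sigma(well_formed)) == well_formed
round_trip_lossy = tau(sigma(one_sided)) != one_sided
```

The second instance shows why the identity requirement matters: on instances where the spouse attributes do not correspond, this particular pair of maps does not witness dominance.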

  30. Equivalence • P and Q are equivalent (xxx), for a given dominance measure xxx, if they have equivalent information capacity under that measure • P and Q are equivalent (xxx) if • Q dominates P (xxx) and • P dominates Q (xxx)

  31. Information dominance measures (listed from most restrictive to least restrictive) • Calculous dominance • Generic dominance • Internal dominance • Absolute dominance

  32. Types of equivalency (listed from most restrictive to least restrictive) • P and Q are equivalent (calc) • P and Q are equivalent (gen) • P and Q are equivalent (int) • P and Q are equivalent (abs)

  33. Level 1: Calculous dominance • Only allow transformations to be relational calculus expressions • Relational calculus: • First order logic or predicate calculus • Predicates: atom, • Each query Q(x1, …, xn) is a predicate P

  34. Level 1: Calculous dominance • Only allow transformations to be relational calculus expressions • If the transformations σ (from P to Q) and τ (from Q to P) are relational calculus expressions, then Q dominates P calculously

  35. Level 2: Generic dominance • Only allow transformations that treat domain elements as “essentially uninterpreted objects” • Treat all elements as equals except some set of constants • Property of all query languages, such as SQL and Datalog

  36. Level 2: Generic dominance • Only allow transformations that treat domain elements as “essentially uninterpreted objects” • If the transformations σ (from P to Q) and τ (from Q to P) treat all elements as equals, then Q dominates P generically
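One way to make "treats elements as uninterpreted objects" concrete is to check that a transformation commutes with a renaming of domain values. The sketch below tests this for a single permutation on a single instance, which is only a necessary condition, not a proof of genericity. All function names and sample data are hypothetical.

```python
def apply_perm(rel, perm):
    """Rename every value in a relation; unmapped values stay as they are."""
    return {tuple(perm.get(v, v) for v in row) for row in rel}


def swap_columns(rel):
    """Generic: only rearranges positions, never inspects the values."""
    return {(y, x) for (x, y) in rel}


def keep_smallest(rel):
    """Not generic: relies on an ordering of the domain values."""
    return {min(rel)} if rel else set()


def commutes(transform, rel, perm):
    """True if renaming before or after the transformation gives the same result."""
    return transform(apply_perm(rel, perm)) == apply_perm(transform(rel), perm)


perm = {"a": "b", "b": "a"}          # swap the values a and b
rel = {("a", "c"), ("b", "c")}

swap_is_generic_here = commutes(swap_columns, rel, perm)
smallest_is_generic_here = commutes(keep_smallest, rel, perm)
```

`swap_columns` passes the check, while `keep_smallest` fails it because swapping a and b changes which tuple counts as the smallest.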

  37. Level 3: Internal dominance • Only allow transformations that do not invent any data • Inventing data: numerical computations or string manipulations, e.g. performance = goals/games

  38. Level 3: Internal dominance • Only allow transformations that do not invent any data • If the transformations σ (from P to Q) and τ (from Q to P) do not invent data, then Q dominates P internally
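The "does not invent data" condition can be sketched as a check that every value in the output already occurs in the input, possibly together with a fixed set of constants (the `constants` parameter is an assumption of this sketch). The relation layout (player, goals, games) and the sample rows are invented to match the performance = goals/games example.

```python
def values(rel):
    """All values that occur anywhere in a relation."""
    return {v for row in rel for v in row}


def invents_data(transform, rel, constants=frozenset()):
    """True if the output contains a value absent from the input and constants."""
    return not values(transform(rel)) <= (values(rel) | set(constants))


def project_player(stats):
    """Internal: only copies values that are already present."""
    return {(player,) for (player, _goals, _games) in stats}


def performance(stats):
    """Not internal: computes new numbers as goals / games."""
    return {(player, goals / games) for (player, goals, games) in stats}


stats = {("ann", 6, 4), ("bob", 2, 4)}

projection_invents = invents_data(project_player, stats)
performance_invents = invents_data(performance, stats)
```

The projection reuses only existing values, whereas the ratio 6/4 = 1.5 does not occur in the input, so `performance` is flagged as inventing data on this instance.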

  39. Level 4: Absolute dominance • Y: some finite set of values • I_Y(P): the instances of P that contain only values in Y • |I_Y(P)|: the number of instances of P containing only values in Y • If |I_Y(P)| ≤ |I_Y(Q)| for every sufficiently large finite Y, then Q dominates P absolutely • Easy to compute: based on counting instances rather than on transformations
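The counting idea can be illustrated for the simplest case. The sketch assumes non-keyed relations over a single basic type, so a relation of arity k over n available values has n^k possible tuples and therefore 2^(n^k) possible instances. Representing a schema as a list of arities, and comparing counts only over a finite range of sizes, are simplifications made here: a finite range can refute the counting condition but can only suggest that it holds.

```python
def count_instances(arities, n):
    """Number of instances of a schema whose values come from n available values.

    Each relation of arity k is any subset of the n**k possible tuples.
    """
    total = 1
    for k in arities:
        total *= 2 ** (n ** k)
    return total


def counts_allow_dominance(p_arities, q_arities, sizes):
    """True if Q has at least as many instances as P for every tested size."""
    return all(
        count_instances(p_arities, n) <= count_instances(q_arities, n)
        for n in sizes
    )


binary = [2]        # hypothetical schema: one binary relation
two_unary = [1, 1]  # hypothetical schema: two unary relations

binary_count = count_instances(binary, 3)        # 2 ** 9
two_unary_count = count_instances(two_unary, 3)  # 2 ** 3 * 2 ** 3

binary_may_dominate = counts_allow_dominance(two_unary, binary, range(2, 6))
two_unary_may_dominate = counts_allow_dominance(binary, two_unary, range(2, 6))
```

With three values the binary relation has 512 instances and the pair of unary relations only 64, so the counts rule out the two unary relations dominating the binary one.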

  40. Basic results • Q dominates P calculously ⇒ Q dominates P generically ⇒ Q dominates P internally ⇒ Q dominates P absolutely

  41. Basic results • Sometimes absolute and internal dominance hold, but generic and calculous dominance don’t • Q dominates P (abs, int) • and transformation (int) does not invent data • Q does not dominate P (gen, calc) • There is no transformation (gen, calc) that takes instances of P to Q and then back to P [Figure: schemas P and Q]

  42. Basic results • Absolute dominance is useful for verifying calculous dominance or its absence • Q dominates P calculously ⇒ Q dominates P absolutely • P does not dominate Q absolutely ⇒ P does not dominate Q calculously [Figure: schemas P and Q] * Under certain constraints

  43. Basic results • Dominance is preserved by renamings of basic types (homomorphisms) • h(P): the image of P under a homomorphism h • If Q dominates P, then h(Q) dominates h(P), for any measure of dominance (calc, gen, int, abs)

  44. Basic results • Calculous dominance does not accurately measure the presence of “semantic correspondence”

  45. Basic results • Calculous dominance does not accurately measure the presence of “semantic correspondence” [Figure: schema P with relations R1, R2, S1, S2 over the basic types NAME and NUMBER]

  46. Basic results • Calculous dominance does not accurately measure the presence of “semantic correspondence” [Figure: schema P with relations R1, R2, S1, S2 and schema Q with relation T, over the basic types NAME and NUMBER]

  47. Basic results • Calculous dominance does not accurately measure the presence of “semantic correspondence” [Figure: schema P with relations R1, R2, S1, S2 and schema Q with relation T, over the basic types NAME and NUMBER] • Q dominates P (calc), but there is no semantic mapping from P to Q

  48. Basic results • For non-keyed relational schemata over a single basic type, all four types of dominance coincide • Theorem: Let P and Q be non-keyed relational schemata over a single basic type B. Then the following are equivalent: (a) Q dominates P (calc); (b) Q dominates P (gen); (c) Q dominates P (int); (d) Q dominates P (abs)

  49. Basic results • With any reasonable measure of relative information capacity, two non-keyed relational schemata are equivalent iff they are identical • In the relational model (non-keyed), there is essentially at most one way to represent a given data set

  50. Discussion • Strong points: • ???
