1 / 13

CS 430 Database Theory

CS 430 Database Theory. Winter 2005 Lecture 7: Designing a Database Logical Level. Logical Design. Define the Relations and Attributes for a Relational Database Key problems: What tables will there be? What columns will each table have? Additional issue:

Download Presentation

CS 430 Database Theory

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 430Database Theory Winter 2005 Lecture 7: Designing a Database Logical Level

  2. Logical Design • Define the Relations and Attributes for a Relational Database • Key problems: • What tables will there be? • What columns will each table have? • Additional issue: • Where will Integrity Constraints be enforced • Intrinsic (Database design), Triggers and other DBMS mechanisms, Application

  3. Outline • Informal Guidance • First Normal Form • Functional Dependencies and Second, Third and Boyce-Codd Normal Forms • Join Dependencies and Fourth and Fifth Normal Forms • Material from Chapters 10 and 11 in Text Book

  4. “Informal” Guidance • Informal means: without a formal (i.e. mathematical) theory • Outline: • Keep to real world semantics • Avoid redundant information • Avoid update anomalies • Minimize use of null values • Avoid generation of spurious tuples

  5. Real World Semantics • (To the extent possible) Your design should track the semantics of the real world • “Database is a record of executed policy” • Each relation should correspond to a single real world concept (e.g. Person) • Attributes should be attributes of that real world concept • Each attribute of each relation should correspond to an independent real world fact • You should be able to easily explain to the user the meaning of all the relations and attributes • You should choose semantically meaningful names

  6. Avoid Redundant Informationand Update Anomalies • See Figures 10.3 and 10.4 for examples • Insertion Anomalies • Can’t insert data without having the redundant information available (or filling with nulls) • No way to insert a (false) subentity without inserting the parent (e.g. Department can’t exist if they have no Employees) • Deletion Anomalies • May delete a (false) subentity (e.g. delete a department by deleting the last employee)

  7. Update Anomalies (Continued) • Modification (Update) Anomalies • Must change an attribute in all the duplicated locations or have multiple versions of same fact • Bottom line: • Design the database to avoid Update Anomalies • If not, document the discrepancies and make sure that these are handled correctly by applications that update the database • Data is sometimes duplicated to improve performance

  8. Minimize use of null values • Problem: null may mean: • Value is not applicable • Value is not known • Value is missing • Value will be provided later • Another problem: Large numbers of nulls can waste space • Advice: • Try to avoid null values • Restrict null to exceptional cases • Use a separate table when nulls are common • This usually implies either a class/sub-class relationship or an entity with multiple states

  9. Avoid Generation of Spurious Tuples • Natural Joins should represent True facts • See example Figures 10.5 and 10.6 • Equi-joins of non-key attributes (primary or foreign keys) are problematic • Non-key attributes represent unique facts associated with the entity designated by the key • Advice: • Design relations so that they are joined via primary or foreign keys

  10. Normal Forms • Formal basis for design of database • First normal form • Follows Relational Model • Second, Third, and Boyce-Codd Normal Forms • Individual relations have the right keys • Fourth and Fifth Normal Forms • Multi-value and Join dependencies • Collection of Relations behave correctly

  11. Practical Guidance on Normal Forms • Do a preliminary design of the database first • Use ER modeling and your understanding of the problem • Follow the informal guidance • Check that the design observes normalization rules • Iterate when there is a “bug” in the design • Document discrepancies you choose to accept • If performance becomes a problem, consider “denormalization” of the database • Most frequent denormalization: Storing joins • Beware premature optimization

  12. First Normal Form • “Tables are square” • No repeated values • No variant records • Today: This is part of the relational model • When Codd defined it: This was counter to standard practice • Tomorrow: Object-Relational and XML databases allow repeated values • See Figure 10.9 for example

  13. Fixing First Normal Form Violations Possible strategies: • (Preferred) Create a new table to store multi-valued attributes • Repeat the whole tuple for each value • Usually creates redundancy problems and update anomalies • Store the values as multiple attributes in a single tuple • Requires knowing the maximum number of values or creating overflow relations • Creates problems when querying the database • Code multiple values into a single string • Yuck!

More Related