1 / 67

Relational Database Design

Relational Database Design. Bill Woolfolk Public Health Sciences University of Virginia woolfolk@virginia.edu. Objectives. Understand definition of modern relational database Understand and be able to apply a practical method for designing databases

josh
Download Presentation

Relational Database Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Relational Database Design Bill Woolfolk Public Health Sciences University of Virginia woolfolk@virginia.edu

  2. Objectives • Understand definition of modern relational database • Understand and be able to apply a practical method for designing databases • Recognize and avoid common pitfalls of database design

  3. What’s a database? • A collection of logically-related information stored in a consistent fashion • Phone book • Bank records (checking statements, etc) • Library card catalog • Soccer team roster • The storage format typically appears to users as some kind of tabular list (table, spreadsheet)

  4. What Does a Database Do? • Stores information in a highly organized manner • Manipulates information in various ways, some of which are not available in other applications or are easier to accomplish with a database • Models some real world process or activity through electronic means • Often called modeling a business process • Often replicates the process only in appearance or end result

  5. Databases and the Systems which manage them • Modern electronic databases are created and managed through means of RDBMS: Relational DataBase Management Systems • An individual data storage structure created with an RDBMS is typically called a “database” • A database and its attendant views, reports, and procedures is called an “application”

  6. Database Applications • Database (the actual DB with its attendant storage structure) • SQL Engine - interprets between the database and the interface/application • Interface or application – the part the user gets to see and use

  7. Relational DatabaseManagement Systems • Low-end, proprietary, specific purpose • Email: Outlook, Eudora, Mulberry • Bibliographic: Ref. Mgr., EndNote, ProCite • Mid-level • Microsoft Access, Lotus Approach, Borland’s Paradox • More or less total control of design allows custom builds • High-end • Oracle, Microsoft SQL Server, Sybase, IBM DB2 • Professional level DBs: Banks, e-commerce, secure • Amazon.com, Ebay.com, Yahoo.com

  8. Problems with Bad Design • Early computers were slow and had limited storage capacity • Redundant or repeating data slowed operations and took up too much precious storage space • Poor design increased chance of data errors, lost or orphaned information

  9. Benefits of Good Design • Computers today are faster and possess much larger storage devices • Rigid structure of modern relational databases helped codify problems and solutions • Design problems are still possible, because the DBMS software won’t protect you from poor practices • Good design still increases efficiency of data processes, reduces waste of storage, and helps eliminate data entry errors

  10. Codd’s Rules • Edgar F. Codd • Mathematician and Researcher at IBM • Devised the relational data model in 1970 • Published 12 rules in 1985 defining ideal relational database, added 6 more in 1990 E. F. Codd: A Relational Model of Data for Large Shared Data Banks. CACM 13(6): 377-387 (1970) (http://www.acm.org/classics/nov95/toc.html) Codd, E. (1985). "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the Rules?" ComputerWorld, October 14 and October 21.

  11. Modification Anomalies Customers_Orders_Inventory A search for “General Tool Co.” would miss “General Tool” and “General Toll”. A case-sensitive search for “Totally Toys” would miss “TOTALLY TOYS”

  12. Insertion Anomalies Customers_Orders_Inventory How would you enter a new item into your inventory if no one had ordered it yet?

  13. Deletion Anomalies Customers_Orders_Inventory If you wanted to stop selling “dot matrix printer” and remove it from your inventory, you would have to delete the order and customer info for “XYZ Inc.”

  14. The Fix Order_Items Orders Products Customers

  15. The Design Process • Identify the purpose of the database • Review existing data • Make a preliminary list of fields • Make a preliminary list of tables and enter fields • Identify the key fields • Draft the table relationships • Enter sample data and normalize the data/tables • Review and finalize the design

  16. Database Modeling • Refers to various, more-or-less formal methods for designing a database • Some provide precision steps and tools • Ex.: Entity-Relationship (E-R) Modeling • Widely used, especially by high-end database designers who can’t afford to miss things • Fairly complex process • Extremely precise

  17. 1. Identify purpose of the DB Clients can tell you what information they want but have no idea what data they need. • “We need to keep track of inventory” • “We need an order entry system” • “I need monthly sales reports” • “We need to provide our product catalog on the Web” Be sure to Limit the Scope of the database.

  18. 2. Review Existing Data • Electronic • Legacy database(s) • Spreadsheets • Web forms • Manual • Paper forms • Receipts and other printed output

  19. 3. Make Preliminary Field List • Make sure fields exist to support needs • Ex. if client wants monthly sales reports, you need a date field for orders. • Ex. To group employees by division, you need a division identifier • Make sure values are atomic • Ex. First and Last names stored separately • Ex. Addresses broken down to Street, City, State, etc. • Do not store values that can be calculated from other values • Ex. “Age” can be calculated from “Date of Birth”

  20. 4. Make Preliminary Tables(and insert the fields into them) • Each table holds info about one subject • Don’t worry about the quantity of tables • Look for logical groupings of information • Use a consistent naming convention

  21. Naming Conventions • Rules of thumb • Table names must be unique in DB; should be plural • Field names must be unique in the table(s) • Clearly identify table subject or field data • Be as brief as possible • Avoid abbreviations and acronyms • Use less than 30 characters, • Use letters, numbers, underscores (_) • Do not use spaces or other special characters

  22. Naming Conventions (cont’d) • Leszynski Naming Convention (LNC) • Example: tblEmployees, qryPartNum • tbl, qry = tag • Employees, PartNum = basename • LNC at Microsoft Developers Network

  23. 5. Identify the Key Fields • Primary Key(s) • Can never be Null; must hold unique values • Automatically indexed in most RDBMSs • Values rarely (if ever) change • Try to include as few fields as possible • Multi-field Primary Key • Combination of two or more fields that uniquely identify an individual record • Candidate Key • Field or fields that qualify as a primary key • Important in Third and Boyce-Codd Normal Forms

  24. 6. Identify Table Relationships • Based on business rules being modeled • Examples: • “each customer can place many orders” • “all employees belong to a department” • “each TA is assigned to one course”

  25. Relationship Terminology • Relationship Type • One-to-one: expressed as 1:1 • One-to-Many: expressed as 1:N or 1:M or 1:∞ • Many-to-Many: expressed as N:N or M:M • Primary or Parent Table • Table on the left side of 1:N relationship • Related or Child Table • Table on the right side of 1:N relationship • Relational Schema • Diagram of table relationships in database

  26. Relationship Terminology (cont’d) • Join • Definition of how related records are returned • Join Line • Visual relationship indicators in schema • Key fields • Primary Key: the linking field on the one side of a 1:N relationship • Foreign Key: the primary key from one table that is added to another table so the records can be related • Non-Key Fields: any field that is not part of a primary key, multi-field primary key, or foreign key

  27. One-to-One (1:1) • Each record in Table A relates to one, and only one, record in Table B, and vice versa. • Either table can be considered the Primary, or Parent Table • Can usually be combined into one table, although may not be most efficient design

  28. One-to-Many (1:N) • Each record in Table A may relate to zero, one or many records in Table B, but each record in Table B relates to only one record in Table A. • The potential relationship is what’s important: there might be no related records, or only one, but there could be many. • The table on the One (or left) side of a 1:N relationship is considered the Primary Table.

  29. Many-to-Many (N:N) • A record in Table A can relate to many records in Table B, and a record in Table B can relate to many records in Table A. • Most RDBMSs do not support N:N relationships, requiring the use of a linking (or intersection or bridge) table that breaks the N:N relationship down into two 1:N relationships with the linking table being on the Many side of both new relationships.

  30. Relational Schema 1 N

  31. 7. Normalization • Normal Forms (NF): design standards based on database design theory • Normalization is the process of applying the NFs to table design to eliminate redundancy and create a more efficient organization of DB storage. • Each successive NF applies an increasingly stringent set of rules

  32. First Normal Form (1NF) • A table is in first normal form if there are no repeating groups. • Repeating Groups : a set of logically related fields or values that occur multiple times in one record • 1: non-atomic value, or multiple values, stored in a field • 2: multiple fields in the same table that hold logically similar values

  33. Sample 1NF Violation - 1 Employee_Projects_Time

  34. Sample 1NF Violation - 2 Employee_Projects_Time

  35. Tables in 1NF Employees Employees_Projects

  36. Second Normal Form (2NF) • A table is in 2NF if it is in 1NF and each non-key field is functionally dependent on the entire primary key. • Functional dependency: a relationship between fields such that the value in one field determines the one value that can be contained in the other field. • Determinant: a field in which the value determines the value in another field. Example Airport – City Dulles – Washington, DC

  37. Sample 2NF Violation Employees_Projects

  38. Tables in 2NF Employees Employees_Projects Projects

  39. Third Normal Form (3NF) • A table is in 3NF when it is in 2NF and there are no transitive dependencies. • Transitive Dependency: a type of functional dependency in which the value of a non-key field is determined by the value in another non-key field and that field is not a candidate key.

  40. Sample 3NF Violation Projects_Managers

  41. Tables in 3NF Projects Project Managers

  42. Boyce-Codd Normal Form (BCNF) • A table is in BCNF when it is in 3NF and all determinants are candidate keys. • Developed to cover situations that 3NF did not address. • Applies to situations where you have overlapping candidate keys.

  43. Sample Business Rules • Business Rules: • Each course can have many students • Each student can take many courses • Each course can have multiple teaching assistants (TAs) • Each TA is associated with only one course • For each course, each student has one TA

  44. Sample BCNF Violation Course_Students_TAs

  45. Tables in BCNF Courses Students TAs

  46. Fourth Normal Form (4NF) • A table is in 4NF when it is in BCNF and there are no multi-valued dependencies. • Multi-valued Dependency: occurs when, for each value in field A, there is a set of values for field B and a set of values for field C, but B and C are not related. • Occurs when the table contains fields that are not logically related.

  47. Sample 4NF Violation - 1 Movies

  48. Tables in 4NF - 1 Stars Producers

  49. Sample 4NF Violation - 2 Projects_Equipment

  50. Tables in 4NF - 2 Equipment Projects

More Related