1 / 56

More On ERD and Concept of Normalization in Relational Data Model

More On ERD and Concept of Normalization in Relational Data Model. Weak Entities Before we discuss Normalization, we need to introduce one more type of entity and relationship: Weak entity and Weak relationship. Weak Entities

llaird
Download Presentation

More On ERD and Concept of Normalization in Relational Data Model

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. More On ERD and Concept of Normalization in Relational Data Model

  2. Weak Entities • Before we discuss Normalization, we need to introduce one more type of entity and relationship: • Weak entity and • Weak relationship

  3. Weak Entities • Entities that we have talked about so far, are all strong (regular) entity types. • A strong entity type has key attribute(s) of its own whereas a weak entity type does not have a particular key attribute that can uniquely identify its instances. • Entities belonging to a weak entity type are identified by being related to specific entities from another entity type in combination with one of their attribute values. • We can call the other entity type the Identifying or Owner Entity type and we call the relationship type that relates a weak entity to its owner entity Identifying Relationship.

  4. Weak Entities -- Cont • For example, we can store the information about the dependants of an employee in a separate entity called Dependents. • The Dependants entity can keep track of the name and other personal information of dependants of the employees. • Note that it might be possible to have two different employees that have a daughter named “Nancy”. So name cannot be used as the primary key of Dependants entity. • In order to distinguish one instance of entity Dependants from another instance, we need to know the employee Id of the employees which is the primary key of Employee entity. • The combination of employee id (that is a property of Employee entity) and the name attribute (that is a property of Dependants entity) can be used to create a primary key for the Dependants entity.

  5. Mapping ERD to Tables for weak Entities • For each weak entity E in the ERD. • Create a table T that includes all the simple attributes of E. • Include only the simple component attributes of a composite attribute. • Choose one of the key attributes of E as partial primary key for T • If the chosen key of E is composite, the composite keywill form the partial primary key of T. • Include the primary key of the owner table in T. • The primary key of the owner with the partial primary key chosen for T become the composite primary key of T.

  6. Example of Weak Entity Relationship email Age Name Name Emp-Id Gender M 1 EMPLOYEE DEPENDANTS HAS-DEPENDANT Foreign key that references Employee table PART Table 1: DEPENDANTS (Emp-Id, Name, Age, Gender) A Composite Primary key for Dependants table

  7. Another Example Of Changing ERD to tables

  8. 1 SeatNo SEAT

  9. We start with taking care of the strong entities first. They include: • AIRPORT: • AirportCode, Name, City, State, … • AIRPLANE-TYPE: • TypeName, MaxSeats, Company,… • AIRPLANE • AirPlaneId, TotalNoOfSeats,… • FLIGHT: • Number, WeekDays, Airline,…. • We better check mark the entities we have taken care of

  10. 1 SeatNo SEAT

  11. Now we also take care of the week entities. As you see, the owner entity of some week entities are strong entities and the owner of others are week entities. • It is a better idea to perform the ones that their owners are strong entities. So in our example, we get: • FLIGHT-LEG • FlightNumber, LegNo, … • FARES • FlightNumber, Code, Amount, Restrictions, …. • LEG-INSTANCE • FlightNumber, LegNo, Date, NoOfAvailSeats, … • SEAT • FlightNumber, LegNo, Date, SeatNo, CustomerName, CPhone,… • Again, lets check mark what we have done

  12. 1 SeatNo SEAT

  13. Here is what we have so far: • AIRPORT: • AirportCode, Name, City, State, … • AIRPLANE-TYPE: • TypeName, MaxSeats, Company,… • AIRPLANE • AirPlaneId, TotalNoOfSeats, • FLIGHT: • Number, WeekDays, Airline,…. • FLIGHT-LEG • FlightNumber, LegNo, • FARES • FlightNumber, Code, Amount, Restrictions, …. • LEG-INSTANCE • FlightNumber, LegNo, Date, NoOfAvailSeats, • SEAT • FlightNumber, LegNo, Date, SeatNo, CustomerName, CPhone,…

  14. Now we take care of 1:1 and 1:N relationships: • AIRPORT: • AirportCode, Name, City, State, … • AIRPLANE-TYPE: • TypeName, MaxSeats, Company,… • AIRPLANE • AirPlaneId, TotalNoOfSeats, TypeName, • FLIGHT: • Number, WeekDays, Airline,…. • FLIGHT-LEG • FlightNumber, LegNo, DepAirPortCode, ScheduledDepTime, ArrAirportCode, ScheduledArrTime, • FARES • FlightNumber, Code, Amount, Restrictions, …. • LEG-INSTANCE • FlightNumber, LegNo, Date, NoOfAvailSeats, AirPlaneId, DepAirPortCode, DepTime, ArrAirportCode, ArrTime • SEAT • FlightNumber, LegNo, Date, SeatNo, CustomerName, CPhone,…

  15. 1 SeatNo SEAT

  16. Now we take care of M:N relationships: • AIRPORT: • AirportCode, Name, City, State • AIRPLANE-TYPE: • TypeName, MaxSeats, Company • AIRPLANE • AirPlaneId, TotalNoOfSeats, TypeName • FLIGHT • Number, WeekDays, Airline • FLIGHT-LEG • FlightNumber, LegNo, DepAirPortCode, ScheduledDepTime, ArrAirportCode, ScheduledArrTime • FARES • FlightNumber, Code, Amount, Restrictions • LEG-INSTANCE • FlightNumber, LegNo, Date, NoOfAvailSeats, AirPlaneId, DepAirPortCode, DepTime, ArrAirportCode, ArrTime • SEAT • FlightNumber, LegNo, Date, SeatNo, CustomerName, CPhone • CAN-LAND • AirportCode, TypeName

  17. 1 SeatNo SEAT

  18. So far we learned how to develop an ERD for an organization • We also found how ERD can be changed to initial tables • In developing a database, do all developers come up with the same ERD? • Depending on assumptions, and style of development, it is very likely for two developers of the same database to come up with two different ERDs • Your ERD does not have to be the best ERD that can possibly be created • However, you need to do your best to relate entities properly and place attributes where they may fit the best • No matter how well-structured your ERD is, your initial tables produced from your ERD can potentially contain redundancy. • Redundancy can cause insertion, deletion, and update problems.

  19. Assume a developer plans to develop a simple database for computer science department. • The developer’s ERD looks as follows: a single entity with some attributes. Dept Name Salary Course EmpID DateCourseCompleted ComputerScience

  20. He changes his ERD to a table and inserts some data into the table. Therefore, his database contains only one table and here it is. • What is the problem with this database? • What is the primary key? What is/are the foreign key(s)? • Can you see any redundancy in this table?

  21. Well-Structured table • Definition: It is a table that contains a minimum amount of redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies. • Consider this table:

  22. Anomalies • Definition: Errors or inconsistencies that may result when a user attempts to update a table that contains redundant data. There are three types of anomalies: insertion, deletion, and modification anomalies. • Insertion Anomaly: • Every time you try to input the personal information of an employee into the employee table, you also need to insert the information about the course he has completed. • What if we have an employee that has not assigned any course? • Can we add personal information and add the course information later? • No, because EmpID and Course (together) make the primary key and primary key cannot be NULL.

  23. Deletion Anomaly: • Suppose that ASP is no longer used by the company. They want to delete all information regarding ASP • If we delete ASP, we lose information about employee 190. • Modification Anomaly: • Suppose that employee number 100 gets a salary increase. You must record the increase in each of the rows for that employee (two occurrences in this case); otherwise, the data is inconsistent.

  24. Concept of Normalization • In order to take care of the anomalies, we need to normalize the tables. • Normalization: • Definition: The process of converting complex data structures into simple, stable data structures. • Normal Form: • Definition: A state of a table that can be determined by applying simple rules regarding dependencies to that table • In order to ensure that our tables have no serious anomalies, they all have to be in third normal form. • Before explaining different normal forms, we need to understand the concept of functional dependency.

  25. Functional Dependencies • Definition: If an attribute of a table (Employee Name) can be determined from another attribute in the same table (Employee Number), the Name is functionally dependant on the Number. • Common examples of functional dependencies are: • SSN determinesName, Address and Birthday, • A person’s name, address, and birthday are functionally dependent on that person’s Social Insurance number • VIN determines Make, Model and Color • The make, model, and color of a vehicle are functionally dependent on the vehicle identification number, • ISBN determines Title • The title of the book is functionally dependent on the book’s international standard book number (ISBN)

  26. First Normal Form • Definition: A table is in first normal form if each field (intersection of a row and a column) contains only one instance • Having more than one data item in a field is called ‘Repeating Groups’ • For example the following table is not in first normal form.

  27. The simplest method to make a table into the first normal form is to expand the number of rows in the table to ensure that the value of each attribute is atomic. • Thus, to make our table into the first normal formal we have to create two extra rows

  28. Second Normal Form • Definition: A table is in second normal form if it is in first normal form and non-key attributes don’t have partial functional dependency. • Partial functional dependency: • Definition: A dependency in which one or more non-key attributes are functionally dependent on part, but not on all of the primary key. • Note that, a table in first normal form is also in second normal form if • The primary key consists of only one attribute (column), or • No non-key attribute exists, or • Every non-key attribute is functionally dependent on the full primary key.

  29. In the above table the primary key is “EmpID and Course”. • However, the EmpID can functionally determine Name, Dept, and Salary. Thus, Name, Dept, and Salary are only dependent on EmpID that is part of the primary key. • EmpID and Course functionally determine the DateCompleted. • Therefore, the above table is not in second normal form.

  30. In order to solve this problem and making the table into second normal form we can split the table into two new tables • Note that, if you split a table into two tables or more, they still have to be connected through primary and foreign key • The following two tables are still related through the EmpId attribute

  31. Third Normal Form • Definition: A table is in third normal form if it is in second normal form and neither Derived Dependencies nor Transitive Dependencies exist. • Derived Dependency: • Definition: A table has a Derived Dependency if one or more non-key attribute(s) in that table can be calculated (derived) from other non-key attributes in either the same table or in other tables in the database • Transitive dependency: • Definition: A table has a Transitive Dependency if one or more non-key attribute(s) depends on another non-key attribute (in the same table)

  32. Example of Derived Dependency • The “Total” column can be derived from multiplying “QtyOrdered” and “PricePerUnit” column. Therefore, it can be removed

  33. Example of Transitive Dependency • The following is not in 3rd normal form. Why? • Answer: The reason is that the non-key attribute “SalesRepID” can determine the Region of the SalesRep. • Therefore, since transitive dependency exits, the above table is not in third normal form

  34. Note that this table is in second normal form because the primary key is a single attribute(CustID). • However, the following anomalies still exists: • Insertion Anomaly: A new salesperson assigned to the North region cannot be entered until a customer has been assigned to that sales person • Deletion Anomaly:If customer number 6837 is deleted from the table, we lose the information that salesperson Hernandez is assigned to the East region • Modification Anomaly: If salesperson Smith is assigned to the East region, several rows must be changed to reflect that fact.

  35. Since a Primary Key is unique, using a name is never a good idea. It is better to add a new attribute (numeric) and use it instead. Foreign key and Primary key relationship

  36. Customer Name Customer Number Customer Address Order Date Order Number Orders 1 has PartNumber Quantity Ordered N Details Total Price Part Description PartCost PerUnit

  37. Another Example • Consider the following tables that is generated from this ERD. The goal is to make the tables into third normal form. Orders Details

  38. First normal form -- Remove repeating groups. • Question: Are the two tables, Orders and Details, in first normal form? • Orders table is in first normal form because the value for each field is atomic • However, Details Table is not in first normal form because some fields have more than one value. • To put Details Table in first normal form, we need to remove repeating group in Details table as follows: Details

  39. Second normal form - Remove partial dependencies • Question: Are the two tables, Orders and Details, in Second normal form? • The Orders table has a single attribute primary key, (orderNumber) so it is in 2nd normal form already

  40. However, Details Table is not in Second Normal form. Why? • Note that Details Table has a multiple Attribute primary key(Order Number; Part Number) and there is a partial dependency in this table: • “PartDescription” and “PartCostPerUnit” are dependant of Part Number • We can separate “PartDescription” and “PartCostPerUnit” from the Details table and place them in a new table.

  41. The old Details Table Details Parts Foreign key and Primary key relationship

  42. Third Normal Form -- Remove Derived Dependency and Transitive Dependency • The Details table has a column with derived dependency • The “TotalPayable” can be derived from multiplying “QuantityOrdered” in Details table by the “PartCostPerUnit” in the parts table • TotalPayable can be removed from the table. • Here is what we get: TotalPayable = QuantityOrdered * PartCostPerUnit Details Parts

  43. Do you see anymore derived dependency or transitive dependency in the following tables? Details Parts • Is there any more derived dependency in any of the non-key attributes? • To have a transitive dependency in a table, we need to have at least two non-key attributes. Therefore, Details table cannot have transitive dependency either. • In the Parts table the description and cost columns cannot functionally determine each other. • Therefore, Details table and Parts table are in third normal form

  44. How about the Orders table? Orders • The Orders table contains Transitive data. Why? • The customer name and address depend on Customer Number. Note that “CustomerNumber” is a non-key attribute • We can place a customer with its name and address in a separate table and remove it from the Orders table.

  45. The old Orders Table Orders Customers Primary key and foreign key relationship

  46. What we are left with is 4 tables • These table can be used to represent any order we want Customers Orders Details Parts

  47. Revising ERD • Based on your normalization, you should revise your ERD Customer Number CustomerAddress OrderDate Order Number Customer Name 1 M MAKES CUSTOMER ORDER QuantityOrdered Part Number M INCLUDES Part Description PART L PartCostPerUnit

  48. Some Comments • The normalization process can be complicated by the ERD you start with • In the order example, you should know right away that a Customer should have a table of its own • How you do the normalization is not important. As long as the data reaches Third Normal Form is all that matters • When normalizing data, there is no ‘Right’ way to do it. Some say there is no ‘wrong’ way either. It depends on the data you are working with

  49. Sometimes, it even makes sense to ‘not’ fully normalize • EX. You calculate a ‘total’ several times a day, and each time it takes 5 minutes. This total changes at most once a day. • It is faster to save the total, and update it when it changes than to recalculate it every time • You should go through the normalization process to be sure though

  50. Cust ID Prod ID Order No Inv No Cust Name Cust Add Prod Desc Prod Price Order Date Order QTY Order Total Given This data, Derive normalized tables and ERD • Assumptions: • Each customer can make several orders; but a particular order is only make for one customer • Each product can be in many orders; and each order can contain many products • Several invoices can be issues for the same order but each invoice refers to one order

More Related