1 / 24

Chapter 3 Data Normalization Get data properly tabled!

University of Manitoba Asper School of Business 3500 DBMS Bob Travica. Chapter 3 Data Normalization Get data properly tabled!. Based on G. Post, DBMS: Designing & Building Business Applications. Updated 2014 . Normalization.

wynn
Download Presentation

Chapter 3 Data Normalization Get data properly tabled!

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. University of ManitobaAsper School of Business3500 DBMSBob Travica Chapter 3 Data Normalization Get data properly tabled! Based on G. Post, DBMS: Designing & Building Business Applications Updated 2014

  2. Normalization • The process of putting data into the format of relational databases or organizing data into correctly designed tables. • Tables should be designed so that • a) problems (anomalies) with insertion, deletion and modification of data are avoided • b) redundancy is reduced • c) data quality is preserved (completeness, consistency)

  3. Attributes/ Properties Entity (Class): Employee Table: Employee Rows/Objects EmployeeID TaxpayerID LastName FirstName HomePhone Address 12512 888-22-5552 Cartom Abdul (603) 323-9893 252 South Street 15293 222-55-3737 Venetiaan Roland (804) 888-6667 937 Paramaribo Ln 22343 293-87-4343 Johnson John (703) 222-9384 234 Main Street 29387 837-36-2933 Stenheim Susan (410) 330-9837 8934 W. Maple Relational Database Terminology • Relational database: A collection of tables (relations). Tables store atomic data. • Table: A collection of columns (attributes, properties, fields) describing an entity (class). Table is also a collection of rows (records) each with the same number of columns. • Each row represent an object (an instance of a class).

  4. OrderItem OrderIDItemID Quantity 1 229 2 1 253 4 2 229 1 2 555 4 Relational Database Terminology – Primary Key • Every table has a primary key (key) – an attribute that uniquely identifies each row (e.g., EmployeeID on previous slide) • Primary key can span more than one column combined (combined, composite, concatenated) key. • Other attributes are called non-key columns. A non-key depends on key. • Primary key can be generated automatically by DBMS – surrogate key. Note: Watch for data types (e.g., number vs. text) and naming rules (arbitrary but consistent).

  5. Primary key is underlined Non-key columns Table name Relational Database Shorthand Notation Customer(CustomerID, LastName, FirstName, Address, City, State, ZipPostalCode, TelephoneNumber) * • Note: Telephone number can be used as a “backup key.” • Shorthand notation is good for analysis but not for official diagrams. Do not use it in your assignments and exams.

  6. Tables Diagram – Schema (Normalized) Customer Customer Salesperson Salesperson 1 1 1 1 * * * * Order Order 1 * * OrderItem OrderItem * 1 * Association class (ItemOrdered, OrderDetail, etc.) Item Item Class Diagram to Schema Class Diagram (Non-Normalized)

  7. Shorthand Notation for Normalized Tables Diagram – Foreign Key Customer(CustomerID, Name, Address, City, Phone) Salesperson(EmployeeID, Name, DateHired) Order(OrderID, OrderDate, CustomerID, EmployeeID) OrderItem(OrderID, ItemID, Quantity) Item(ItemID, Description, ListPrice) • Foreign Key (FK) = Attribute that is a (primary) key in another table (e.g., CustomerID in Order). • Logic & naming of OrderItem: Replacing the Order-Item M:M relationship with two 1:M relationships. • The OrderItem key is a combination of FKs (OrderID+ItemID).

  8. NORMALIZATION GET IN TABLE!

  9. Customer table Key: CustomerID Attributes: Name Address Phone Video table Key: VideoID Attributes : Title RentalFee Rating… RentalTransaction table Key: TransactionID Attributes : CustomerID Date VideoRented table Key: TransactionID + VideoID Attributes: Copy# Video Store Transaction Processing System (VSTPS):Classes, Columns & Business Rules Transaction Data (“Dynamic” ) — Operations Entities (change more often) Master Data (“Static”)— Market & Inventory Entities (don’t change often)

  10. Customer VideoTitle 1 * includes has ? * * RentalTransaction Business Rules and Class Diagram for VSTPS • Business Rules: • A customer can have many rental transactions, each being for a specific customer. • A transaction can include many video titles, and a title is in many transactions. • A transaction can include just one copy of a video title. VideoRented

  11. Schema for VSTPS You can draw a normalized schema based on knowledge of multiplicity and data analysis you already have! 1 Customer(CustomerID, LastName, FirstName, Address, City, …) * RentalTransaction(TransID, RentDate, CustomerID) 1 * Transaction data VideoRented(TransID, VideoID, Copy#) * 1 Video(VideoID, Title, RentalFee)

  12. VideoRental Why Normalize – Avoiding Data Anomalies • How to get to those four tables using normalization logic? Why not simple design for recording rentals: Test: VideoRental(Rec#, CustomerID, LastName, FirstName,… VideoID, Title, RentalFee, Copy#, Date) Poor design because: • Master data (Customer , Video) repeat for each transaction - highredundancy. • Deletion of transaction data causes deletion of master data and reverse – deletion anomaly: Cannot delete target data but more (or less) than wanted. • A new customer can’t be added without adding a new video and reverse – insertion anomaly: Data can’t be added without corrupting other data. • To change customer name, all records must be rewritten – • update anomaly: Data can’t be updated only in a single master record. • Conclusion: From the normalization perspective, data must be properly designed in order to avoid CRUD* anomalies and reduce redundancy.

  13. 1. * rents * Customer Video 1 includes * has RentalTransaction * * Customer Video 2. 1 is rented * 1 * has contains * * RentalTransaction VideoRented (copy#) 1 includes * Normalization A process of splitting a chunk of data to arrive at clear master and transactional classes. Each many-to-many relationship must be replaced by 2 one-to-many relationships. How to track copies of a same video?

  14. Interview users, understand output needed. Put data into a large table (RentalForm). Pick out attributes. Find repeating groups (sections). Look for potential keys. Identify computed values. Normalization Process Focus is on logic not really using such process in practice. RentalForm(TransID, RentDate, (CustomerID, Name, Address, City, State, …), (VideoID, Copy#, Title, RentalFee))

  15. Problems with Repeating Groups (Sections) RentalForm(TransID, RentDate, (CustomerID, Phone, Name, Address, City, State, …), (VideoID, Copy#, Title, Rent)) Repeating Groups TransID RentDate CustomerID LastName Phone Address VideoID Copy# Title Rent 1 4/18/02 3 Washington 502-777-7575 95 Easy Street 1 2 2001: A Space Odyssey $1.50 1 4/18/02 3 Washington 502-777-7575 95 Easy Street 6 3 Clockwork Orange $1.50 2 4/30/02 7 Lasater 615-888-4474 67 S. Ray Drive 8 1 Hopscotch $1.50 2 4/30/02 7 Lasater 615-888-4474 67 S. Ray Drive 2 1 Apocalypse Now $2.00 2 4/30/02 7 Lasater 615-888-4474 67 S. Ray Drive 6 1 Clockwork Orange $1.50 • Repeating groups cause • high redundancy • update anomaly (must run through all records) • insertion anomaly as errors in data (fake CustomerID if new video added) • - deletion anomaly (can’t delete simply what is needed) • If there are repeating sections, the table is not in the first normal form (1NF).

  16. 1NF: A table is in 1NF if it does not haverepeating sections. Normalization Procedure: Remove repeating sections by splitting the initial table into new tables. Preserve associations between the initial table and new tables by replicating the initial key. Customer(TransID, CustomerID, Phone, Name, Address, City, State, ZipCod) Reminder of initial table RentalTransaction(TransID, RentDate) New Video(TransID, VideoID, Copy#, Title, RentalFee) First Normal Form (1NF)

  17. There are problems in the relationship between the key and non-keys. Concept of Functional Dependence: An attribute depends on another attribute if the change of its value is caused by a change of the other attribute. The key column must be sufficient for determining values of the non-key columns. Video TransIDVideoID Copy# Title RentalFee 1 1 2 2001: A Space Odyssey $1.50 1 6 3 Clockwork Orange $1.50 2 8 1 Hopscotch $1.50 2 2 1 Apocalypse Now $2.00 2 6 1 Clockwork Orange $1.50 Problems with First Normal Form • Problems apply only to tables with combined keys! (A single-key table in 1NF is also in 2NF.)

  18. If any non-key column depends just on a part of the key there is partial functional dependence and the table is not in 2NF. Video(TransID, VideoID, Copy#, Title, RentalFee) Problems with First Normal Form (cont.) Copy# depends on full key (TransID + VideoID) -- Full Functional Dependency on the key. * Combined determine Sufficient to determine VideoID is sufficient for predicting titles and rental fees. Therefore, there is Partial Functional Dependence between the combined key and Title and RentalFee. **

  19. Video(TransID, VideoID, Copy#, Title, RentalFee) move move replicate VideoRented(TransID, VideoID, Copy#) New Video(VideoID, Title, RentalFee) Resulting Video table * Second Normal Form (2NF) • 2NF: A table is in 2NF if it is (a) is 1NF and (b) non-key columns depend on the entire key. • Normalization Procedure: • Move TransID and Copy# into a new table VideoRented. • Preserve the association between Video and VideoRented by replicating VideoID in table VideoRented. X X

  20. Customer(TransID, CustomerID, Phone, Name, Address, City, State,…) move replicate RentalTransaction(TransID, RentDate, CustomerID) Completed Customer(CustomerID, LastName, FirstName, Address, City, …) Resulting Customer table Finalize 2NF… Table Customer must also be brought into 2NF by moving TransID into table RentalTransaction (already there) and replicating CustomerID (see Slide 15). X

  21. Problems with 3NF: If any non-key depends on some other non-key there is transitive dependence and the table is not in 3NF. 3 NF: Table is in 3NF if it is (a) in 2NF, and (b) each non-key attribute depends on the key only (or the key and nothing but the key). Our design is already in 3NF! Check it below: Customer(CustomerID, LastName, FirstName, Address, City, …) RentalTransaction(TransID, RentDate, CustomerID) VideoRented(TransID, VideoID, Copy#) Video(VideoID, Title, RentalFee) Third Normal Form (3NF)

  22. 3NF Example Table in 2NF: Sale(SaleID, CustomerID, SalespersonID, SalespersonRank…) • Violation of 3NF: SalespersonRank (non-key) is dependent on SalespersonID, not SaleID. • Solution – split table into 2 tables:: Sale(SaleID, CustomerID, SalespersonID) Salesperson(SalespersonID, SalespersonRank) • Forms beyond the 3rd are very rare and therefore reaching 3NF is sufficient for most of practical purposes. • When we say “create schema”, we mean “create tables that are in 3NF”.

  23. Customer(CustomerID, LastName, FirstName, Address, City, …) RentalTransaction(TransID, CustomerID, VideoID, RentDate) Video(VideoID, Title, RentalFee) Simplified Schema for VSTPSUsing Different Key Design 1 * * 1 Note: Video key can be made unique: VideoID = 85.1 (decimal place designates a copy), or 85c1 (text type), or use a bar code for each video and copy (ItemID).

  24. 1) If a table has repeating sections, there is huge redundancy, different classes are mixed together, and all anomalies occur. Split the table, so that classes are clearly differentiated. Result: 1NF. Normalization Summary (Must know by heart!) 1NF: A table is in 1NF if it does not haverepeating sections. 2) If a table has a combined key, non-key columns may depend on just a part of the primary key, and so there is partial functional dependency. Split the table so that in new tables non-keys depend on the entire key. Result: 2NF. 2NF: A table is in 2NF if it is in 1NF and non-key columns depend on the entire key. 3) If a non-key depends on another non-key, there is transitive dependency. Split the table so that in new tables each non-key depends on the key and nothing but the key. Result: 3NF. 3NF: A table is in 3NF if it is in 2NF and all non-keys depend on the key only.

More Related