1 / 23

Normalization

Normalization. Data inconsistency, incorrectness and incompleteness is (partially) caused by redundancy. Normalization is the process of eliminating redundancy as much as possible, without losing data. Tools of Normalization. Normalization is achieved by lossless decomposition

kris
Download Presentation

Normalization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Normalization Data inconsistency, incorrectness and incompleteness is (partially) caused by redundancy. Normalization is the process of eliminating redundancy as much as possible, without losing data.

  2. Tools of Normalization Normalization is achieved by • lossless decomposition • functional dependency

  3. Functional dependency A functional dependency is a 1-n relation of a set of attributes within a relation to another set of attributes in the same relation. Example: anr  last_name

  4. Functional dependency Airport is functionally dependent on IATA. For each airport, exactly one IATA exists. Notation: IATA  Airport

  5. Case Description Airline Reservation System (assignment 1999) • Flights are operated by Carriers • flight number, origin, destination, ETD, ETA • destination is coded in IATA codes • Passengers make a Reservation • last name, initials, passport, nationality • booked, standby, checked in, boarded

  6. UNF

  7. UNF (continued)

  8. First Normal Form A relation is in first normal form if, and only if all domains contain atomic values only • Problems with current tables: • Carrier contains both ID and full name • Departure and Arrival contains both date and time • Solution: • Split up columns

  9. 1 NF • Anomalies: • Flights without reservations do not exist • Carriers without flights do not exist

  10. Second Normal Form A relation is in second normal form if, and only if the relation is in first normal form and if all non key-attributes are fully functionally dependent on the primary key. Full functional dependency A  B is a functional dependency on B of A, so that B is not dependant on a part of A. Example: anr + last_name  address,

  11. 1NF Dependency Diagram

  12. 2NF Violations • FlightNumber is a determinant for Destination, Origin, CarrierID and CarrierName • Passport is a determinant for Initials, LastName and Nationality. • FlightNumber and DepartureDate together determine DepartureTime, ArrivalDate and ArrivalTime. • FlightNumber, Passport and DepartureDate together determine Status.

  13. Splitting up the tables... Anomalies: Carriers without flights cannot be stored in the database

  14. Third Normal Form A relation is in third normal form if, and only if the relation is in second normal form and if all non key-attributes are non-transitively dependent on the primary key. A transitive dependency is a situation where two functional dependencies are related in the sense that A  B and B  C In other words; C is transitively dependent on A

  15. 2NF Dependency Diagram 3NF violation: FlightNo  CarrierId CarrierId  CarrierName

  16. Splitting up the tables...

  17. Boyce-Codd Normal Form Relations in 3NF can still contain redundancy when: 1. A relation has two or more candidate keys, and 2. Candidate keys are composite keys, and 3. Candidate keys overlap. Boyce-Codd Normal form: A relation is in BCNF if all determinants are candidate keys. Usually, a relation that is in 3NF is in BCNF too.

  18. Conclusion Normalization 1NF: a relation is in 1NF iff all cells contain atomic values only. 2NF: a relation is in 2NF iff it is in 1NF and all non-key attributes are fully functionally dependent on the primary key 3NF: a relation is in 3NF iff it is in 2NF and all non-key attributes are non-transitively dependent on the primary key BCNF: a relation is in BCNF iff all determinants are candidate keys. Tools of normalization are functional dependency and lossless decomposition.

  19. Normalisation assignment 2000 Case Study: Digital Library Database components: Document Records Thesaurus

  20. Thesaurus A thesaurus consists of Terms and Relationships Example: information retrieval - narrower term - text retrieval TERM Relationship TERM A document is classified into a (set of) thesaurus entries.

  21. Linking records with thesauri Example: “Conceptualising information spaces in federated digital libraries” (Papazoglou M, Hoppenbrouwers J) 1999 SIGMOD Record has keywords: meta data; semantics; information retrieval; query optimisation; federated data base; electronic library

  22. UNF digital library Bibliographic Records Table: Thesaurus Table: NOT A GOOD IDEA!

More Related