157 Views

Download Presentation
##### MIS710 Module 1b Relational Model and Normalization

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**MIS710 Module 1bRelational Model and Normalization**Arijit Sengupta**Structure of this semester**MIS710 1. Design 2. Querying 4. Advanced Topics 0. Intro 3. Applications Database Fundamentals Conceptual Modeling Query Languages Java DB Applications – JDBC Transaction Management Relational Model Advanced SQL Data Mining Normalization Newbie Users Designers Developers Professionals**Today’s Buzzwords**• Relational Model • Superkey, Candidate Key, Primary Key and Foreign Key • Entity Integrity Rule • Referential Integrity Rule • Normalization • First, Second, Third, and Boyce-Codd Normal Forms • Unnormalization**Objectives of this lecture**• Understand the Relational Model and its properties • Understand the notion of keys • Understand the use and importance of referential integrity • Provide an alternative way to design relations using semantics rather than concepts • Take an existing “flat file” design and creating a relational design from it through the process of Normalization • Identify sources of problems (or anomalies) within a given relational design • Argue about improvements to designs created by others**Attribute Names**Relation Tuples Attribute Values Attributes Relational Data Model • Originally proposed by Codd in 1970 • Based on mathematical set theory**Relation: Properties**• A relation is a set of tuples • A tuple is a set of attribute-value properties (relations) • Ordering of attributes is immaterial • Ordering of Tuples is immaterial • Tuples are distinct from one another • Attributes contain atomic values only**Attributes**• Attribute name • Attribute names are unique within a relation • Attribute domain • Set of all possible values an attribute may take • Domain (GPA) = • Domain (name) = • Domain (DateOfBirth) = • Domain (year) • Number of attributes: degree of the relation**Tuples**• Aggregation of attribute values • S1 = (s1, ‘Jose’, 21, ‘StonedHill’, 3.1) • S2 = (s2, ‘Alice’, 18, ‘BigHead’, 3.2) • Cardinality: Number of tuples in a relation • What is the difference between the cardinality and the degree?**Primary Keys**• Superkey: SK, a subset of attributes of R, satisfying Uniqueness, that is, no two tuples have the same combination of values for these attributes • Candidate Key: K, a superkey SK, satisfying minimality, that is, no component of K can be eliminated without destroying the uniqueness property. • Primary Key: PK, the selected Candidate key, K. • Can a primary key be composed of multiple attributes? • Can a relation have multiple primary keys?**Keys - example**Disk: (ISBN#, Artist_name, Album_name, Year, Producer, Genre, time, price) • Superkeys? • Candidate keys? • Primary key?**Entity Integrity Rule**• The primary key of a base relation cannot contain a NULL value. • Enforcement of the rule: • An update which results in a NULL value in the primary key must be rejected. • Are the following ok? Primary Key**Foreign Key**Physician (ID, Name, …) Patient (ID, Name, PhysID*, …) Club (ID, Name, …) Player (ID, Name, ?*, …) Order (OrdID, Date, …, ?*) Customer (ID, Name, …, ?*) Dept (DeptID, Name, …, ?*) Employee (EID, Name, …, ?*) • Attribute(s) of one relation that reference(s) the PK of another relation • FK may or may not be (a part of) the PK of this relation Course (CourseID, Name, …, ?*) Class (ClassID, Meets, …, ?*) Student (SID, Name, …, ?*) Registration (?) • Can an FK refer to a part of the PK of another relation? • Can an FK refer to a PK of the same relation?**Foreign Key ..**• FK and referenced PK may have different names • The values of FK must draw from the value set of PK • How do we define the Domain of an FK? • Can an FK have a NULL value? • What can we enforce with PKs and FKs? Primary Key Foreign Key Domain Value Set Domain**Referential Integrity Rule**• If FK is the foreign key of a relation R2, which matches the primary key PK of the relation R1, then: • the FK value must match the PK value in some tuple of R1, or • the FK value may be NULL, but only if the FK is not (a part of) the PK of R2. • Enforcement of the Rule • An update on either a referenced PK or an FK must satisfy the rule. Otherwise, the operation is rejected. • Which operation on the primary key may violate this rule? • Which operation on the foreign key may violate this rule?**Referential Integrity Enforcement**• If an operation violates referential integrity: • Restrict • reject the operation • Cascade • try to propagate the operation to all dependent FK values, if it is not possible, reject the operation • Nullify (or Default) • set all dependent FK values to NULL (or a default value), if that is not possible, reject the operation • Cases for each of the above situations?**Creating Relations**create table STUDENT ( ID char (11) not null primary key, Name char(30) not null, age int, GPA number (2,1)); create table COURSE ( courseno char (6) not null primary key, coursename char(30) not null, credithours number (2,1)); create table REGISTRATION ( ID references STUDENT (ID) on delete cascade, CourseNum references COURSE (courseno), primary key (ID, CourseNum) );**Normalization - Motivating Example**• Is there any redundant data? • Can we insert a new course# with a new textbook? • What should be done if ‘CIS’ is changed to ‘MIS’? • What would happen if we remove all CIS 800 students?**Why Normalization?**• Poor Relation Design causes Anomalies • Insertion anomalies - Insertion of some piece of information cannot be performed unless other irrelevant information is added to it. • Update anomalies - Update of a single piece of information requires updates to multiple tuples. • Deletion anomalies - Deletion of a piece of information removes other unrelated but necessary information. • Normalization improves the design to remove these anomalies**Why Normalization?**• Benefits • contain minimum amount of redundancy • allow users to insert, delete and modify tuples in the relation without errors or inconsistencies. • improve quality of information in the database • decrease storage space for the database • Costs • may contribute to performance problems • may require more storage in some cases**Unnormalized Relation**• Create a ‘Definition’ for this relation. • Do you see any problems in the definition? • Do you see any anomalies in the data?**Normal Forms**NF2 1NF 2NF 3NF BCNF Unnormalized Relation Only atomic attributes First Normal Form Remove nonkey dependency Second Normal Form Remove transitive dependency Third Normal Form Dependency preservation: BCNF Remove Multi-valued Dependencies: 4NF Remove Join Dependencies: 5NF Higher Order Forms**The Basis of Normalization**• Functional Dependency (FD) • Consider two attributes, X and Y, and two arbitrary tuples r1 and r2 of a relation R. • Y is functionally dependent on X iff: value of x in r1 = value of x in r2 implies value of Y in r1 = value of Y in r2 • Also stated as: R.X R.Y or X Y**Properties of FDs**• If R.X R.Y or X Y • X is called the determinant of Y. • X may or may not be the key attribute of R. • A FD changes with its semantic meaning • Name Address? • X and Y may be composite • X and Y may be mutually dependent on each other • Husband Wife, Wife Husband • The same Y value may occur in multiple tuples • Course# Text**Fully Functional Dependencies**• When is X Y a FFD? When Y is not functionally dependent on any proper subset of X • X Y is a fully functional dependency ( FFD ) ( SID, Course# ) Name? ( SID, Course# ) Grade? ( SID, Name ) Major? ( SID, Name ) SID? • By default, the term FD refers to FFD**Transitive Dependencies**• Given attributes X, Y, and Z of a relation R, • Z is transitively dependent on X (X Z) iff X Y and Y Z • For example: SID Dept, SID Major, Dept School, Major Dept • Do you see any Transitive Functional Dependencies?**Some Inference Rules for FDs**• An FD is redundant if it can be derived from other FDs based on a set of inference rules. Some of these rules are: • Reflexive rule: If X Y, then X Y • X always determines a subset of itself. • Augmentation rule: If X Y, then XZ YZ • Adding an attribute(s) on both side does not change the FD. • Transitive rule: If X Y & Y Z, then X Z • Functional dependencies can be ‘chained’. • Decomposition rule: If X YZ, then X Y and X Z • Given: { SID Name, SID Major, Major Dept }, which ones is/are redundant? SID School, SID Dept, Dept School SID ( Name, Major ), (SID, Name) (Major, Name) SID SID, SID (Name, SID)**First Normal Form**• DEFINITION • A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only. • Translation • To be in first normal form the table must not contain any repeating attributes. • Implication • Are all ‘relations’ in First Normal Form (1NF) ?**Example - 1NF**The ‘unnormalized’ relation has been decomposed in two. • What are the PKs? Relation: Student Relation: Student-Course**Anomalies (with only 1NF)**• Insertion Anomaly • A new course cannot be inserted in the database (relation Student-Course) until a student registers for that course. • Update Anomaly • If the instructor of a course is changed, this fact would have to be noted at many places in the database (many tuples of the relation Student-Course). • Deletion Anomaly • Withdrawal of all students from an existing course (that is, deletion of related tuples from the relation Student-Course) will result in unwarranted removal of that course from the database.**Anomalies in 1NF**Course (SID, Name, Grade, Course#, Text, Major, Dept) • 1NF Relations have anomalies • Redundant Information ? • Update Anomalies ? • Insertion Anomalies ? • Deletion Anomalies ? Name SID Major Course# Grade Dept Text**Second Normal Form**• DEFINITION • A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is dependent on the full primary key. • Translation • A table is in second normal form if there are no partial dependencies. • Implication • What kinds of primary keys may lead to a violation of the Second Normal Form (2NF) ?**Bubble Chart**• Reconsider the example .. StudentName Credits CourseTitle StudentId+ CourseId Instructor Classroom Grade**Dealing with Compound Keys**• Revised Bubble Chart StudentName Credits CourseTitle StudentId Instructor CourseId Classroom Grade**Anomalies with (only) 2NF**• Insertion anomaly • Information about a faculty (potential advisor) cannot be added to the database unless a student is assigned to him/her. • Update anomaly • If the advisor’s office location or phone were changed, many tuples would need to be changed. • Deletion anomaly • If all students assigned to an advisor graduate, information about the advisor will disappear from the database.**Third Normal Form**• DEFINITION • A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key. • Translation • A table is in Third Normal Form if every non-key attribute is determined by the key, and nothing else. • Implication • How many total attributes must the relation have for a possible violation of the Third Normal Form (3NF) ?**3NF Example**• Chalk out the relations. How do you maintain student-advisor relation? StudentName StudentId TotalCredits Advisor Status AdvisorOffice Advisor AdvisorPhone**Boyce-Codd Normal Form (BCNF)**• Update anomalies occur in an 3NF relation R if • R has multiple candidate keys, • Those candidate keys are composite, and • The candidate keys are overlapped. Computer-Lab (SID, Account, Class, Hours) • A relation R is in BCNF iff every determinant is a candidate key.**The Normalization Process**• Flatten the Table Completely (no composite columns) • Find the Key and “all” FDs (well as many as you can possibly detect) • Find Partial Dependencies and decompose relation using them (2NF) • Find Transitive dependencies and decompose using them (3NF) • Remember – this is not a deterministic method – depends on the order in which FDs are chosen, so same Relation, same set of FDs can lead to different decompositions!**Lossless Decomposition**• A bad decomposition loses information • In a good decomposition • The join of decomposed relations restores the original relation • Decomposed relations can be maintained independently • Rissanen’s rule for non-loss decomposition: Two projections R1 and R2 of a relation R are independent iff: • Every FD in R can be logically deduced from those in R 1 and R 2 , and • The common attributes of R 1 and R 2 form a candidate key for at least one of the pair.**Higher Normal Forms**• Fourth Normal Form • Multivalued Dependencies (Fagin 1977) • Fifth Normal Form • Join Dependencies (Fagin 1979) • Other Dependencies • Inclusion Dependencies (Casanova 1981) • Template Dependencies (Sadri 1982) • Domain-Key Normal Form (Fagin 1981)