420 likes | 607 Views
MIS710 Module 1b Relational Model and Normalization. Arijit Sengupta. Structure of this semester. MIS710. 1. Design. 2. Querying. 4. Advanced Topics. 0. Intro. 3. Applications. Database Fundamentals. Conceptual Modeling. Query Languages. Java DB Applications – JDBC.
 
                
                E N D
MIS710 Module 1bRelational Model and Normalization Arijit Sengupta
Structure of this semester MIS710 1. Design 2. Querying 4. Advanced Topics 0. Intro 3. Applications Database Fundamentals Conceptual Modeling Query Languages Java DB Applications – JDBC Transaction Management Relational Model Advanced SQL Data Mining Normalization Newbie Users Designers Developers Professionals
Today’s Buzzwords • Relational Model • Superkey, Candidate Key, Primary Key and Foreign Key • Entity Integrity Rule • Referential Integrity Rule • Normalization • First, Second, Third, and Boyce-Codd Normal Forms • Unnormalization
Objectives of this lecture • Understand the Relational Model and its properties • Understand the notion of keys • Understand the use and importance of referential integrity • Provide an alternative way to design relations using semantics rather than concepts • Take an existing “flat file” design and creating a relational design from it through the process of Normalization • Identify sources of problems (or anomalies) within a given relational design • Argue about improvements to designs created by others
Attribute Names Relation Tuples Attribute Values Attributes Relational Data Model • Originally proposed by Codd in 1970 • Based on mathematical set theory
Relation: Properties • A relation is a set of tuples • A tuple is a set of attribute-value properties (relations) • Ordering of attributes is immaterial • Ordering of Tuples is immaterial • Tuples are distinct from one another • Attributes contain atomic values only
Attributes • Attribute name • Attribute names are unique within a relation • Attribute domain • Set of all possible values an attribute may take • Domain (GPA) = • Domain (name) = • Domain (DateOfBirth) = • Domain (year) • Number of attributes: degree of the relation
Tuples • Aggregation of attribute values • S1 = (s1, ‘Jose’, 21, ‘StonedHill’, 3.1) • S2 = (s2, ‘Alice’, 18, ‘BigHead’, 3.2) • Cardinality: Number of tuples in a relation • What is the difference between the cardinality and the degree?
Primary Keys • Superkey: SK, a subset of attributes of R, satisfying Uniqueness, that is, no two tuples have the same combination of values for these attributes • Candidate Key: K, a superkey SK, satisfying minimality, that is, no component of K can be eliminated without destroying the uniqueness property. • Primary Key: PK, the selected Candidate key, K. • Can a primary key be composed of multiple attributes? • Can a relation have multiple primary keys?
Keys - example Disk: (ISBN#, Artist_name, Album_name, Year, Producer, Genre, time, price) • Superkeys? • Candidate keys? • Primary key?
Entity Integrity Rule • The primary key of a base relation cannot contain a NULL value. • Enforcement of the rule: • An update which results in a NULL value in the primary key must be rejected. • Are the following ok? Primary Key
Foreign Key Physician (ID, Name, …) Patient (ID, Name, PhysID*, …) Club (ID, Name, …) Player (ID, Name, ?*, …) Order (OrdID, Date, …, ?*) Customer (ID, Name, …, ?*) Dept (DeptID, Name, …, ?*) Employee (EID, Name, …, ?*) • Attribute(s) of one relation that reference(s) the PK of another relation • FK may or may not be (a part of) the PK of this relation Course (CourseID, Name, …, ?*) Class (ClassID, Meets, …, ?*) Student (SID, Name, …, ?*) Registration (?) • Can an FK refer to a part of the PK of another relation? • Can an FK refer to a PK of the same relation?
Foreign Key .. • FK and referenced PK may have different names • The values of FK must draw from the value set of PK • How do we define the Domain of an FK? • Can an FK have a NULL value? • What can we enforce with PKs and FKs? Primary Key Foreign Key Domain Value Set Domain
Referential Integrity Rule • If FK is the foreign key of a relation R2, which matches the primary key PK of the relation R1, then: • the FK value must match the PK value in some tuple of R1, or • the FK value may be NULL, but only if the FK is not (a part of) the PK of R2. • Enforcement of the Rule • An update on either a referenced PK or an FK must satisfy the rule. Otherwise, the operation is rejected. • Which operation on the primary key may violate this rule? • Which operation on the foreign key may violate this rule?
Referential Integrity Enforcement • If an operation violates referential integrity: • Restrict • reject the operation • Cascade • try to propagate the operation to all dependent FK values, if it is not possible, reject the operation • Nullify (or Default) • set all dependent FK values to NULL (or a default value), if that is not possible, reject the operation • Cases for each of the above situations?
Creating Relations create table STUDENT ( ID char (11) not null primary key, Name char(30) not null, age int, GPA number (2,1)); create table COURSE ( courseno char (6) not null primary key, coursename char(30) not null, credithours number (2,1)); create table REGISTRATION ( ID references STUDENT (ID) on delete cascade, CourseNum references COURSE (courseno), primary key (ID, CourseNum) );
Normalization - Motivating Example • Is there any redundant data? • Can we insert a new course# with a new textbook? • What should be done if ‘CIS’ is changed to ‘MIS’? • What would happen if we remove all CIS 800 students?
Why Normalization? • Poor Relation Design causes Anomalies • Insertion anomalies - Insertion of some piece of information cannot be performed unless other irrelevant information is added to it. • Update anomalies - Update of a single piece of information requires updates to multiple tuples. • Deletion anomalies - Deletion of a piece of information removes other unrelated but necessary information. • Normalization improves the design to remove these anomalies
Why Normalization? • Benefits • contain minimum amount of redundancy • allow users to insert, delete and modify tuples in the relation without errors or inconsistencies. • improve quality of information in the database • decrease storage space for the database • Costs • may contribute to performance problems • may require more storage in some cases
Unnormalized Relation • Create a ‘Definition’ for this relation. • Do you see any problems in the definition? • Do you see any anomalies in the data?
Normal Forms NF2 1NF 2NF 3NF BCNF Unnormalized Relation Only atomic attributes First Normal Form Remove nonkey dependency Second Normal Form Remove transitive dependency Third Normal Form Dependency preservation: BCNF Remove Multi-valued Dependencies: 4NF Remove Join Dependencies: 5NF Higher Order Forms
The Basis of Normalization • Functional Dependency (FD) • Consider two attributes, X and Y, and two arbitrary tuples r1 and r2 of a relation R. • Y is functionally dependent on X iff: value of x in r1 = value of x in r2 implies value of Y in r1 = value of Y in r2 • Also stated as: R.X R.Y or X Y
Properties of FDs • If R.X R.Y or X Y • X is called the determinant of Y. • X may or may not be the key attribute of R. • A FD changes with its semantic meaning • Name Address? • X and Y may be composite • X and Y may be mutually dependent on each other • Husband Wife, Wife Husband • The same Y value may occur in multiple tuples • Course# Text
Fully Functional Dependencies • When is X  Y a FFD? When Y is not functionally dependent on any proper subset of X • X  Y is a fully functional dependency ( FFD ) ( SID, Course# )  Name? ( SID, Course# )  Grade? ( SID, Name )  Major? ( SID, Name )  SID? • By default, the term FD refers to FFD
Transitive Dependencies • Given attributes X, Y, and Z of a relation R, • Z is transitively dependent on X (X Z) iff X Y and Y Z • For example: SID Dept, SID Major, Dept School, Major Dept • Do you see any Transitive Functional Dependencies?
Some Inference Rules for FDs • An FD is redundant if it can be derived from other FDs based on a set of inference rules. Some of these rules are: • Reflexive rule: If X  Y, then X  Y • X always determines a subset of itself. • Augmentation rule: If X  Y, then XZ  YZ • Adding an attribute(s) on both side does not change the FD. • Transitive rule: If X  Y & Y  Z, then X  Z • Functional dependencies can be ‘chained’. • Decomposition rule: If X  YZ, then X  Y and X  Z • Given: { SID  Name, SID  Major, Major  Dept }, which ones is/are redundant? SID  School, SID  Dept, Dept  School SID  ( Name, Major ), (SID, Name)  (Major, Name) SID  SID, SID  (Name, SID)
First Normal Form • DEFINITION • A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only. • Translation • To be in first normal form the table must not contain any repeating attributes. • Implication • Are all ‘relations’ in First Normal Form (1NF) ?
Example - 1NF The ‘unnormalized’ relation has been decomposed in two. • What are the PKs? Relation: Student Relation: Student-Course
Anomalies (with only 1NF) • Insertion Anomaly • A new course cannot be inserted in the database (relation Student-Course) until a student registers for that course. • Update Anomaly • If the instructor of a course is changed, this fact would have to be noted at many places in the database (many tuples of the relation Student-Course). • Deletion Anomaly • Withdrawal of all students from an existing course (that is, deletion of related tuples from the relation Student-Course) will result in unwarranted removal of that course from the database.
Anomalies in 1NF Course (SID, Name, Grade, Course#, Text, Major, Dept) • 1NF Relations have anomalies • Redundant Information ? • Update Anomalies ? • Insertion Anomalies ? • Deletion Anomalies ? Name SID Major Course# Grade Dept Text
Second Normal Form • DEFINITION • A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is dependent on the full primary key. • Translation • A table is in second normal form if there are no partial dependencies. • Implication • What kinds of primary keys may lead to a violation of the Second Normal Form (2NF) ?
Bubble Chart • Reconsider the example .. StudentName Credits CourseTitle StudentId+ CourseId Instructor Classroom Grade
Dealing with Compound Keys • Revised Bubble Chart StudentName Credits CourseTitle StudentId Instructor CourseId Classroom Grade
Anomalies with (only) 2NF • Insertion anomaly • Information about a faculty (potential advisor) cannot be added to the database unless a student is assigned to him/her. • Update anomaly • If the advisor’s office location or phone were changed, many tuples would need to be changed. • Deletion anomaly • If all students assigned to an advisor graduate, information about the advisor will disappear from the database.
Third Normal Form • DEFINITION • A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key. • Translation • A table is in Third Normal Form if every non-key attribute is determined by the key, and nothing else. • Implication • How many total attributes must the relation have for a possible violation of the Third Normal Form (3NF) ?
3NF Example • Chalk out the relations. How do you maintain student-advisor relation? StudentName StudentId TotalCredits Advisor Status AdvisorOffice Advisor AdvisorPhone
Boyce-Codd Normal Form (BCNF) • Update anomalies occur in an 3NF relation R if • R has multiple candidate keys, • Those candidate keys are composite, and • The candidate keys are overlapped. Computer-Lab (SID, Account, Class, Hours) • A relation R is in BCNF iff every determinant is a candidate key.
The Normalization Process • Flatten the Table Completely (no composite columns) • Find the Key and “all” FDs (well as many as you can possibly detect) • Find Partial Dependencies and decompose relation using them (2NF) • Find Transitive dependencies and decompose using them (3NF) • Remember – this is not a deterministic method – depends on the order in which FDs are chosen, so same Relation, same set of FDs can lead to different decompositions!
Lossless Decomposition • A bad decomposition loses information • In a good decomposition • The join of decomposed relations restores the original relation • Decomposed relations can be maintained independently • Rissanen’s rule for non-loss decomposition: Two projections R1 and R2 of a relation R are independent iff: • Every FD in R can be logically deduced from those in R 1 and R 2 , and • The common attributes of R 1 and R 2 form a candidate key for at least one of the pair.
Higher Normal Forms • Fourth Normal Form • Multivalued Dependencies (Fagin 1977) • Fifth Normal Form • Join Dependencies (Fagin 1979) • Other Dependencies • Inclusion Dependencies (Casanova 1981) • Template Dependencies (Sadri 1982) • Domain-Key Normal Form (Fagin 1981)