1 / 74

Revision for Mid 1

CS157B. Lecture 6 Functional Dependency,2NF,3NF and SQL. Revision for Mid 1. Functional Dependencies. FDs defined over two sets of attributes: X, Y Ì R Notation: X à Y reads as “ X determines Y ” If X à Y, then all tuples that agree on X must also agree on Y. R. X Y Z. 1 2 3

Download Presentation

Revision for Mid 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS157B Lecture 6 Functional Dependency,2NF,3NF and SQL Revision for Mid 1

  2. Functional Dependencies • FDs defined over two sets of attributes: X, Y Ì R • Notation: X à Y reads as “X determines Y” • If X à Y, then all tuples that agree on X must also agree on Y R X Y Z 1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9

  3. Functional Dependencies (example) X Y Z X Y Z 1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9

  4. Candidate Keys • an attribute (or set of attributes) that uniquely identifies a row • primary key is a special candidate key • values cannot be null • e.g. • ENROLL (Student_ID, Name, Address, …) • PK = Student_ID • candidate key = Name, Address

  5. 2NF • a relation is in second normal form if it is in first normal form AND every nonkey attribute is fully functionally dependant on the primary key • i.e. remove partial functional dependencies, so no nonkey attribute depends on just part of the key

  6. EMPLOYEE2 (Emp_ID, Course_Title, Name, Dept_Name, Salary, Date_Completed) not fully functionally dependant on the primary key

  7. Second Normal Form ( 2NF ) • it is based on the concept of full functional dependency. • A functional dependency XY is a full functional dependency , for any attribute A  X, {X - {A}}  Y.

  8. 2NF (Example) A B C D 2 Candidate Keys R with key{AB} is NOT 2NF R with key{AC} is NOT 2NF

  9. Second Normal Form Second normal form: Let R’ be a relation, and let F be the set of governing FDs. An attribute belongs to R’ is prime if a key of R’ contains A. In other words, A is prime in R’ if there exists K<R’ such that (1) K->R’, (2) for all B belongs to K, (K-B)->R’ not belongs to F+, and (3) A belongs to K

  10. General Definitions of Second Normal Form • A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on every key of R.

  11. Third Normal Form • The definition of 3NF is similar to that of BCNF, with the only difference being the third condition. • Recall that a key for a relation is a minimal set of attributes that uniquely determines all other attributes. • A must be part of a key (any key, if there are several). • It is not enough for A to be part of a superkey, because this condition is satisfied by every attribute. A relation R is in 3NF if, for all X  A that holds over R • A  X ( i.e., X  A is a trivial FD ), or • X is a superkey, or • A is part of some key for R If R is in BCNF, obviously it is in 3NF.

  12. Suppose that a dependency X  A causes a violation of 3NF. There are two cases: • X is a proper subset of some key K. Such a dependency is sometimes called a partial dependency. In this case, we store (X,A) pairs redundantly. • X is not a proper subset of any key. Such a dependency is sometimes called a transitive dependency, because it means we have a chain of dependencies K  XA.

  13. Key Attributes X Attributes A A not in a key Partial Dependencies Key Attributes X Attributes A A not in a key Key Attributes A Attributes X A in a key Transitive Dependencies

  14. Motivation of 3NF • By making an exception for certain dependencies involving key attributes, we can ensure that every relation schema can be decomposed into a collection of 3NF relations using only decompositions. • Such a guarantee does not exist for BCNF relations. • It weaken the BCNF requirements just enough to make this guarantee possible. • Unlike BCNF, some redundancy is possible with 3NF. • The problems associate with partial and transitive dependencies persist if there is a nontrivial dependency XA and X is not a superkey, even if the relation is in 3NF because A is part of a key.

  15. Reserves • Assume: sid  cardno (a sailor uses a unique credit card to pay for reservations). • Reserves is not in 3NF • sid is not a key and cardno is not part of a key • In fact, (sid, bid, day) is the only key. • (sid, cardno) pairs are redundantly.

  16. Reserves • Assume: sid  cardno, and cardno  sid (we know that credit cards also uniquely identify the owner). • Reserves is in 3NF • (cardno, sid, bid) is also a key for Reserves. • sid  cardno does not violate 3NF.

  17. Lecture 12:Further relational algebra, further SQL www.cl.cam.ac.uk/Teaching/current/Databases/

  18. Today’s lecture • Where does SQL differ from relational model? • What are some other features of SQL? • How can we extend the relational algebra to match more closely SQL?

  19. Duplicate rows • Consider our relation instances from lecture 6, Reserves, Sailors and Boats • Consider SELECT rating,age FROM Sailors; • We get a relation that doesn’t satisfy our definition of a relation! • RECALL: We have the keyword DISTINCT to remove duplicates

  20. Multiset semantics • A relation in SQL is really a multiset or bag, rather than a set as in the relational model • A multiset has no order (unlike a list), but allows duplicates • E.g. {1,2,1,3} is a bag • select, project and join work for bags as well as sets • Just work on a tuple-by-tuple basis

  21. Extended relational algebra Add features needed for SQL • Bag semantics • Duplicate elimination operator,  • Sorting operator,  • Grouping and aggregation operator,  • Outerjoin operators, oV, Vo, oVo

  22. Duplicate-elimination operator • (R) = relation R with any duplicated tuples removed • R= (R)= • This is used to model the DISTINCT feature of SQL

  23. Sorting • L1,… Ln(R) returns a list of tuples of R, ordered according to the attributes L1, …, Ln • Note:  does not return a relation • R= B(R)= [(5,2),(1,3),(3,4)] • ORDER BY in SQL, e.g. SELECT * FROM Sailors WHERE rating>7 ORDER BY age, sname;

  24. Extended projection • SQL allows us to use arithmetic operators SELECT age*5 FROM Sailors; • We extend the projection operator to allow the columns in the projection to be functions of one or more columns in the argument relation, e.g. • R= A+B,A,A(R)=

  25. Arithmetic • Arithmetic (and other expressions) can not be used at the top level • i.e. 2+2 is not a valid SQL query • How would you get SQL to compute 2+2?

  26. Aggregation • SQL provides us with operations to summarise a column in some way, e.g. SELECT COUNT(rating) FROM Sailors; SELECT COUNT(DISTINCT rating) FROM Sailors; SELECT COUNT(*) FROM Sailors WHERE rating>7; • We also have SUM, AVG, MIN and MAX

  27. Grouping • These aggregation operators have been applied to all qualifying tuples. Sometimes we want to apply them to each of several groups of tuples, e.g. • For each rating, find the average age of the sailors • For each rating, find the age of the youngest sailor

  28. GROUP BY in SQL SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list; • The target-list contains • List of column names • Aggregate terms • NOTE: The variables in target-list must be contained in grouping-list

  29. GROUP BY cont. For each rating, find the average age of the sailors SELECT rating,AVG(age) FROM Sailors GROUP BY rating; For each rating find the age of the youngest sailor SELECT rating,MIN(age) FROM Sailors GROUP BY rating;

  30. Grouping and aggregation • L(R) where L is a list of elements that are either • Individual column names (“Grouping attributes”), or • Of the form (A), where  is an aggregation operator (MIN, SUM, …) and A is the column it is applied to • For example, rating,AVG(age)(Sailors)

  31. Example • Let R= • Compute beer,AVG(price)(R)

  32. Example cont. • Group according to the grouping attribute, beer: • Compute average of price within groups:

  33. NULL values • Sometimes field values are unknown (e.g. rating not known yet), or inapplicable (e.g. no spouse name) • SQL provides a special value, NULL, for both these situations • This complicates several issues • Special operators needed to check for NULL • Is NULL>8? Is (NULL OR TRUE)=TRUE? • We need a three-valued logic • Need to carefully re-define semantics

  34. NULL values • Consider INSERT INTO Sailors (sid,sname) VALUES (101,”Julia”); SELECT * FROM Sailors; SELECT rating FROM Sailors; SELECT sname FROM Sailors WHERE rating>0;

  35. Entity integrity constraint • An entity integrity constraint states that no primary key value can be NULL

  36. Outer join • Note that with the usual join, a tuple that doesn’t ‘join’ with any from the other relation is removed from the resulting relation • Instead, we can ‘pad out’ the columns with NULLs • This operator is called an full outer join, written oVo

  37. Example of full outer join • Let R= Let S= • Then RVS = • But RoVoS =

  38. Outer joins in SQL • SQL/92 has three variants: • LEFT OUTER JOIN (algebra: oV) • RIGHT OUTER JOIN (algebra: Vo) • FULL OUTER JOIN (algebra: oVo) • For example: SELECT * FROM Reserves r LEFT OUTER JOIN Sailors sON r.sid=s.sid;

  39. Views • A view is a query with a name that can be used in further SELECT statements, e.g. CREATE VIEW ExpertSailors(sid,sname,age) AS SELECT sid,sname,age FROM Sailors WHERE rating>9; • Note that ExpertSailors is not a stored relation • (WARNING: mysql does not support views )

  40. Querying views • So an example query SELECT sname FROM ExpertSailors WHERE age>27; • is translated by the system to the following: SELECT sname FROM Sailors WHERE rating>9 AND age>27;

  41. Relational Algebra • The Relational Algebra is used to define the ways in which relations (tables) can be operated to manipulate their data. • It is used as the basis of SQL for relational databases, and illustrates the basic operations required of any DML. • This Algebra is composed of Unary operations (involving a single table) and Binary operations (involving multiple tables).

  42. SQL • Structured Query Language (SQL) • Standardised by ANSI • Supported by modern RDBMSs • Commands fall into three groups • Data Definition Language (DLL) • Create tables, etc • Data Manipulation Language (DML) • Retrieve and modify data • Data Control Language • Control what users can do – grant and revoke privileges

  43. Selection • The selection or  operation selects rows from a table that satisfy a condition: < condition > < tablename > • Example: course = ‘CM’Students Students stud# namecourse 100Fred PH stud# namecourse 200 Dave CM 200 Dave CM 300 Bob CM 300 Bob CM

  44. Projection • The projection or  operation selects a list of columns from a table. < column list > < tablename > • Example: stud#, nameStudents Students stud# namecoursestud# name 100Fred PH 100Fred 200 Dave CM 200 Dave 300 Bob CM 300 Bob

  45. Selection / Projection • Selection and Projection are usually combined: stud#, name(course = ‘CM’Students) Students stud# namecourse 100Fred PH stud# name 200 Dave CM 200 Dave 300 Bob CM 300 Bob

  46. Cartesian Product • Concatenation of every row in the first relation (R) with every row in the second relation (S): R X S

  47. Cartesian Product - Example StudentsCourses stud# namecoursecourse# name 100Fred PH PH Pharmacy 200 Dave CM CM Computing 300 Bob CM Students XCourses = stud# Students.name course course# Courses.name 100 Fred PH PH Pharmacy 100 Fred PH CM Computing 200 Dave CM PH Pharmacy 200 Dave CM CM Computing 300 Bob CM PH Pharmacy 300 Bob CM CM Computing

  48. Theta Join • A Cartesian product with a condition applied: R ⋈ <condition> S

More Related