1 / 69

Functional Dependency Graphs and SQL

CS157A. Lecture 11. Functional Dependency Graphs and SQL. Prof. Sin-Min Lee Department of Computer Science San Jose State University. Data Normalization.

leontyne
Download Presentation

Functional Dependency Graphs and SQL

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS157A Lecture 11 Functional Dependency Graphs and SQL Prof. Sin-Min Lee Department of Computer Science San Jose State University

  2. Data Normalization • Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data. • The process of decomposing relations with anomalies to produce smaller, well-structured relations. • Primary Objective: Reduce Redundancy,Reduce nulls, • Improve “modify” activities: • insert, • update, • delete, • but not read • Price: degraded query, display, reporting

  3. Functional Dependency and Keys • Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute. • Candidate Key: Each non-key field is functionally dependent on every candidate key.

  4. Functional dependency • a constraint between two attributes (columns) or two sets of columns • A  B if “for every valid instance of A, that value of A uniquely determines the value of B” • or …A B if “there exists at most one value of B for every value of A”

  5. Functional Dependencies R X Y Z • FDs defined over two sets of attributes: X, Y Ì R • Notation: X à Y reads as “X determines Y” • If X à Y, then all tuples that agree on X must also agree on Y 1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9

  6. Functional Dependencies Graph(example) X Y Z X Y Z 1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9

  7. R( A B C D) A B C D * 1 1 2 1 2 1 3 2 3 1 2 3 2 1 2 3 1 2 1 2 3 1 1 2 A B C D * Two Candidate Keys {AC}, {AD} Super Keys {AC}, {AD} {ABC}, {ACD} {ABD}, {ABCD}

  8. … functional dependency • some examples • SSN  Name, Address, Birthdate • VIN  Make, Model, Color • note: the LHS is the determinant • so functional dependency is the technical term for determines

  9. Candidate Keys • an attribute (or set of attributes) that uniquely identifies a row • primary key is a special candidate key • values cannot be null • e.g. • ENROLL (Student_ID, Name, Address, …) • PK = Student_ID • candidate key = Name, Address

  10. … candidate key • a candidate key must satisfy: • unique identification. • implies that each nonkey attribute is functionally dependent on the key (for not(A  B) to be true, A must occur more than once (with a different B), or A must map to more than one B in a given row) • nonredundancy • no attribute in the key can be deleted and still be unique • minimal set of columns (Simsion)

  11. keys and dependencies EMPLOYEE1 (Emp_ID, Name, Dept_Name, Salary) determinant functional dependency

  12. EMPLOYEE2 (Emp_ID, Course_Title, Name, Dept_Name, Salary, Date_Completed) not fully functionally dependant on the primary key

  13. determinants & candidate keys • candidate key is always a determinant (one way to find a determinant) • determinant may or may not be a candidate key •  candidate key is a determinant that uniquely identifies the remaining (nonkey) attributes • determinant may be • a candidate key • part of a composite candidate key • nonkey attribute

  14. Introduction • Data integrity maintained by various constraints on data • Functional dependencies are application constraints that help DB model real-world entity • Join dependencies are a further constraint that help resolve some FD constraint limitations

  15. What is SQL? • It is a language used to communicate with a database. • SQL statements are used to perform tasks such as update or retrieve data from a database. • Standard SQL commands: “Select”, “Create”, “Insert”, “Update”, “Delete”, and “Drop” can be used to accomplish almost everything that a database needs to do.

  16. History • Originally developed by IBM at San Jose Research Laboratory (a.k.a. Almaden Research Center) • Oracle, Sybase, Microsoft SQL server are some common relational database management systems that use SQL

  17. Table Basics • Data or information for the database are stored in tables. • Tables are uniquely identified by their names and are comprised of columns and rows. • Rows contain data for the columns.

  18. Basic Structure • The basic structure of an SQL expression consists of three clauses: select, from, and where. • The select clause corresponds to the projection operation of the relational algebra. It is used to list attributes desired in the result. • The from clause corresponds to the Cartesian-product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression. • The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause.

  19. Selecting Data • The Select statement is used to query the database and retrieve selected data that match the criteria that you specify • Example: select “column1” [,”column2”, etc] from “tablename” [where “condition”]; [ ] = optional continued…

  20. Example: select “column1” [,”column2”, etc] from “tablename” [where “condition”]; [ ] = optional The column names that follow the select keyword determine which columns will be returned in the results. The table name that follows the keyword from specifies the table that will be queried to retrieve the desired results. cont… Selecting Data cont…

  21. Example: select “column1” [,”column2”, etc] from “tablename” [where “condition”]; [ ] optional The where clause is optional and specifies which data values or rows will be returned or displayed, based on the criteria described on the condition. Conditional selections used in the where clause: =, >, <, >=, <=, and <> (not equal to). Selecting Data cont…

  22. Problem: Find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200. Answer: selectloan-number fromloan wherebranch-name = ‘Perryridge’ and amount > 1200 **SQL uses the logical connectives and, or, and not – rather than the mathematical symbols in the where clause. The where Clause

  23. The from clause, by itself, defines a Cartesian product of the relations in the clause. Problem: For all customers who have a loan from the bank, find their names, loan numbers, and loan amount. Answer: select customer-name, borrower.loan- number, amount from borrower, loan whereborrower.loan- number = loan.loan- number The from Clause

  24. The create table statement is used to create a new table. The data types specify what the type of data can be for that particular column. Example of simple create table statement: create table “tablename” (“column1” “data type”, “column2” “data type”, “column3” “data type”); Creating Tables

  25. Most Common Data Types • char (size) – Fixed length character string. Size is specified in parenthesis. Max 255 bytes. • varchar (size) – Variable-length character string. Max size is specified in parenthesis. • number (size) – Number value with a max number of column digits specified in parenthesis. • date – Date value • number (size, d) – Number value with a max number of digits of “size” total, with a max number of “d” digits to the right of the decimal.

  26. Example: Problem: Create a table for a new company that contains the following information about your new employees: first name, last name, title, age, and salary. Answer: Create table myemployees (firstname varchar(30), lastname varchar(30), title varchar (30), age number(2), salary number(8, 2)); Creating Tables cont…

  27. The insert statement is used to insert or add a row of data into the table. Strings should be enclosed in single quotes, and numbers should not. Example 1: Insert into “tablename” (first_column, … last_column) values (first_value,…last_value); Example 2: Insert into employee (first, last, age, address, city, state) values (‘Luke’, ‘Duke’, 45, ‘2130 Boars Nest’, ‘Hazard Co.’, ‘Georgia’); Inserting into a Table

  28. The update statement is used to update or change records that match a specified criteria. This is accomplished by carefully constructing a where clause. Example 1: update phone_book set area_code = 623 where prefix = 979; Example 2: update “tablename” set “columnname” = “newvalue” [,”nextcolumn” = “newvalue2”…] where “columnname” OPERATOR “value” [and | or “column” OPERATOR “value”]; **[ ] = optional Updating Records

  29. The delete statement is used to delete records or rows from the table. To delete an entire row/record, enter “delete from” followed by the table name, the where clause. **If you leave off the where clause, it will delete all records. Example: delete from “tablename” where “columnname” OPERATOR “value” [and | or “column” OPERATOR “value”]; [ ] = optional Deleting Records

  30. The drop table command is used to delete a table and all rows in the table. drop table is different from deleting all of the records in the table. Dropping the table removes the table definition as well as all of its rows. Example: Drop table “tablename” Drop a Table

  31. The order by clause causes the tuples in the result of a query to appear in sorted order. By default, the order by clause lists items in ascending order. Example: To list in alphabetic order all customers who have a loan at the Perryridge branch: select distinctcustomer-name fromborrower, loan whereborrower.loan-number = loan.loan-number and branch-name = ‘Perryridge’ order bycustomer-name Ordering the Display of Tuples

  32. Aggregates • Functions that operate on sets: • COUNT, SUM, AVG, MAX, MIN • Produce numbers (not tables) • Not part of relational algebra SELECT COUNT(*) FROM Professor P SELECT MAX (Salary) FROMEmployee E

  33. Aggregates Count the number of courses taught in S2000 SELECT COUNT (T.CrsCode) FROM Teaching T WHERE T.Semester = ‘S2000’ But if multiple sections of same course are taught, use: SELECT COUNT (DISTINCT T.CrsCode) FROM Teaching T WHERE T.Semester = ‘S2000’

  34. Aggregates: Proper and Improper Usage SELECT COUNT (T.CrsCode), T. ProfId – makes no sense (in the absence of GROUP BYclause) SELECT COUNT (*), AVG (T.Grade) – but this is OK WHERE T.Grade > COUNT (SELECT ….) – aggregate cannot be applied to result of SELECT statement

  35. Grouping • But how do we compute the number of courses taught in S2000 per professor? • Strategy 1: Fire off a separate query for each professor: SELECT COUNT(T.CrsCode) FROM Teaching T WHERE T.Semester = ‘S2000’ AND T.ProfId = 123456789 • Cumbersome • What if the number of professors changes? Add another query? • Strategy 2: define a special grouping operator: SELECT T.ProfId, COUNT(T.CrsCode) FROM Teaching T WHERE T.Semester = ‘S2000’ GROUP BYT.ProfId

  36. GROUPBY

  37. GROUP BY - Example Transcript Attributes: -student’s Id -avg grade -number of courses 1234 1234 1234 1234 1234 3.3 4 SELECT T.StudId, AVG(T.Grade), COUNT (*) FROM TranscriptT GROUP BY T.StudId

  38. HAVING Clause • Eliminates unwanted groups (analogous to WHERE clause) • HAVING condition constructed from attributes of GROUP BY list and aggregates of attributes not in list SELECT T.StudId, AVG(T.Grade) AS CumGpa, COUNT (*) AS NumCrs FROM Transcript T WHERE T.CrsCode LIKE ‘CS%’ GROUP BY T.StudId HAVING AVG (T.Grade) > 3.5

  39. Evaluation of GroupBy with Having

  40. Example • Output the name and address of all seniors on the Dean’s List SELECT S.Id, S.Name FROMStudent S, Transcript T WHERE S.Id = T.StudIdAND S.Status = ‘senior’ GROUP BY HAVING AVG (T.Grade) > 3.5 AND SUM (T.Credit) > 90 S.Id -- wrong S.Id, S.Name -- right Every attribute that occurs in SELECT clause must also occur in GROUP BY or it must be an aggregate. S.Name does not.

  41. ORDER BY Clause • Causes rows to be output in a specified order SELECT T.StudId, COUNT (*) AS NumCrs, AVG(T.Grade) AS CumGpa FROM Transcript T WHERE T.CrsCode LIKE ‘CS%’ GROUP BY T.StudId HAVING AVG (T.Grade) > 3.5 ORDER BYDESCCumGpa, ASCStudId

  42. Query Evaluation Strategy • Evaluate FROM: produces Cartesian product, A, of tables in FROM list • Evaluate WHERE: produces table, B, consisting of rows of A that satisfy WHERE condition • Evaluate GROUP BY: partitions B into groups that agree on attribute values in GROUP BY list • Evaluate HAVING: eliminates groups in B that do not satisfy HAVING condition • Evaluate SELECT: produces table C containing a row for each group. Attributes in SELECT list limited to those in GROUP BY list and aggregates over group • Evaluate ORDER BY: orders rows of C

  43. Nested Queries List all courses that were not taught in S2000 SELECT C.CrsName FROM Course C WHERE C.CrsCode NOT IN (SELECT T.CrsCode --subquery FROM Teaching T WHERE T.Sem = ‘S2000’) Evaluation strategy: subquery evaluated once to produces set of courses taught in S2000. Each row (as C) tested against this set.

More Related