1 / 66

OCL4 Oracle 10g: SQL & PL/SQL Session #2

OCL4 Oracle 10g: SQL & PL/SQL Session #2. Matthew P. Johnson CISDD, CUNY June, 2005. Agenda. Last time: FDs This time: Anomalies Normalization Then: SQL. Review examples: finding FDs. Product(name, price, category, color) name, category  price category  color

sherrya
Download Presentation

OCL4 Oracle 10g: SQL & PL/SQL Session #2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. OCL4 Oracle 10g:SQL & PL/SQLSession #2 Matthew P. Johnson CISDD, CUNY June, 2005 Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  2. Agenda • Last time: FDs • This time: • Anomalies • Normalization • Then: SQL Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  3. Review examples: finding FDs • Product(name, price, category, color) name, category  price category  color Keys are: {name, category} • Enrollment(student, address, course, room, time) student  address room, time  course student, course room, time Keys are: [in class] Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  4. Next topic: Anomalies • Identify anomalies in existing schema • How to decompose a relation • Boyce-Codd Normal Form (BCNF) • Recovering information from a decomposition • Third Normal Form Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  5. Types of anomalies • Redundancy • Repeat info unnecessarily in several tuples • Update anomalies: • Change info in one tuple but not in another • Deletion anomalies: • Delete some values & lose other values too • Insert anomalies: • Inserting row means having to insert other, separate info / null-ing it out Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  6. Example of anomalies • Redundancy: name, maddress • Update anomaly: Bill moves • Delete anom.: Bill doesn’t pay bills, lose phones  lose Bill! • Insert anom: can’t insert someone without a (non-null) phone • Underlying cause: SSN-phone is many-many • Effect: partial dependency ssn  name, maddress, • Whereas key = {ssn,phone} SSN Name, Mailing-address SSN Phone Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  7. Decomposition by projection • Soln: replace anomalous R with projections of R onto two subsets of attributes • Projection: an operation in Relational Algebra • Corresponds to SELECT command in SQL • Projecting R onto attributes (A1,…,An) means removing all other attributes • Result of projection is another relation • Yields tuples whose fields are A1,…,An • Resulting duplicates ignored Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  8. R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) Projection for decomposition R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp A1, ..., An  B1, ..., Bm C1, ..., Cp= all attributes, usually disjoint sets R1 and R2 may (/not) be reassembled to produce original R Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  9. Name SSN Mailing-address SSN Phone Michael 123 NY 123 212-111-1111 Hilary 456 DC 123 917-111-1111 Bill 789 Chappaqua 456 202-222-2222 456 914-222-2222 789 914-222-2222 789 212-333-3333 Decomposition example Break the relation into two: • The anomalies are gone • No more redundant data • Easy to for Bill to move • Okay for Bill to lose all phones Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  10. name buys Person Product price name ssn Relational Model: plus FD’s Normalization: Eliminates anomalies Thus: high-level strategy Conceptual Model: Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  11. Using FDs to produce good schemas • Start with set of relations • Define FDs (and keys) for them based on real world • Transform your relations to “normal form” (normalize them) • Do this using “decomposition” • Intuitively, good design means • No anomalies • Can reconstruct all (and only the) original information Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  12. Decomposition terminology • Projection: eliminating certain attributes from relation • Decomposition: separating a relation into two by projection • Join: (re)assembling two relations • Whenever a row from R1 and a row from R2 have the same value for some atts A, join together to form a row of R3 • If exactly the original rows are reproduced by joining the relations, then the decomposition was lossless • We join on the attributes R1 and R2 have in common (As) • If it can’t, the decomposition was lossy Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  13. Lossless Decompositions Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C) R1(B,C) R2(B,A) R’(A,B,C) should be the same as R(A,B,C) Decompose Recover R’ is in general larger than R. Must ensure R’ = R Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  14. Lossless decomposition • Sometimes the data can be reproduced: • (Word, 100) + (Word, WP)  (Word, 100, WP) • (Oracle, 1000) + (Oracle, DB)  (Oracle, 1000, DB) • (Access, 100) + (Access, DB)  (Access, 100, DB) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  15. Lossy decomposition • Sometimes it’s not: • (Word, WP) + (100, WP)  (Word, 100, WP) • (Oracle, DB) + (1000, DB)  (Oracle, 1000, DB) • (Oracle, DB) + (100, DB)  (Oracle, 100, DB) • (Access, DB) + (1000, DB)  (Access, 1000, DB) • (Access, DB) + (100, DB)  (Access, 100, DB) What’swrong? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  16. Ensuring lossless decomposition • Examples: • name price, so first decomposition was lossless • category name and category price, and so second decomposition was lossy R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) If A1, ..., An B1, ..., Bmor A1, ..., An C1, ..., Cp Then the decomposition is lossless Note: don’t need both Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  17. Quick lossless/lossy example • At a glance: can we decompose into R1(Y,X), R2(Y,Z)? • At a glance: can we decompose into R1(X,Y), R2(X,Z)? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  18. Normal Forms • First Normal Form = all attributes are atomic • As opposed to set-valued • Assumed all along • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce Codd Normal Form (BCNF) • Fourth Normal Form (4NF) • Fifth Normal Form (5NF) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  19. Most important: BCNF A simple condition for removing anomalies from relations: A relation R is in BCNF if: If As  Bs is a non-trivial dependency in R , then As is a superkey for R I.e.: The left side must always contain a key I.e: If a set of attributes determines other attributes, it must determine all the attributes • Codd: Ted Codd, IBM researcher, inventor of relational model, 1970 • Boyce: Ray Boyce, IBM researcher, helped develop SQL in the 1970s Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  20. B’s A’s Others R1 R2 BCNF decomposition algorithm Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition //Heuristic: choose Bs as large as possible split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2Until no more violations Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  21. Boyce-Codd Normal Form • Name/phone example is not BCNF: • {ssn,phone} is key • FD: ssn  name,mailing-address holds • Violates BCNF: ssn is not a superkey • Its decomposition is BCNF • Only superkeys  anything else Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  22. BCNF motivation • Two big ideas: • Only a key field can determine other fields • Key values are unique •  no FD-caused redundancy • Slogan: “Every FD must contain the key, the whole key and nothing but the key.” • More accurate: “Every FD must contain (on the left) a key, a whole key, and maybe other fields. Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  23. BCNF Decomposition • Larger example: multiple decompositions • {Title, Year, Studio, President, Pres-Address} • FDs: • Title Year  Studio • Studio  President • President  Pres-Address •  Studio  President, Pres-Address (why?) • No many-many this time • Problem cause: transitive FDs: • Title,year  studio  president Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  24. BCNF Decomposition • Illegal: As  Bs, where As don’t include key • Decompose: Studio  President, Pres-Address • As = {studio} • Bs = {president, pres-address} • Cs = {title, year} • Result: • Studios(studio, president, pres-address) • Movies(studio, title, year) • Is (2) in BCNF? Is in (1) BCNF? • Key: Studio • FD: President  Pres-Address • Q: Does president  studio? If so, president is a key • But if not, it violates BCNF Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  25. BCNF Decomposition • Studios(studio, president, pres-address) • Illegal: As  Bs, where As don’t include key •  Decompose: President  Pres-Address • As = {president} • Bs = {pres-address} • Cs = {studio} • {Studio, President, Pres-Address} becomes • {President, Pres-Address} • {Studio, President} Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  26. Roadmap • Want to remove redundancy/anomalies • Convert to BCNF • Find FDs – closure alg • Check if each FD AB is ok • If A contains a key • If not, decompose into R1(A,B), R2(A,rest) • Because AB, this will be lossless • Could check by joining R1 and R2 • Would get no rows not in original Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  27. Decomposition algorithm example • R(N,O,R,P) F = {N  O, O  R, R  N} • Key: N,P • Violations of BCNF: N  O, OR, N OR • which kinds of violations are these? • Pick N  OR (on board) • Can we rejoin? (on board) • What happens if we pick N  O instead? • Can we rejoin? (on board) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  28. BCNF summary • BCNF decomposition is lossless • Can reproduce original by joining • Saw last time: Every 2-attribute relation is in BCNF • Final set of decomposed relations might be different depending on • Order of bad FDs chosen • Saw last time: But all results will be in BCNF Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  29. BCNF summary • BCNF decomp. does not lose data • Resulting relations can be rejoined to obtain the original • In BCNF, there’s no FD-based redundancy • Values in key field are unique • Other FDs are from key fields •  everything is “as compressed as possible” Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  30. BCNF Review • Q: What’s required for BCNF? • Q: What’s the slogan for BCNF? • Q: Who are B & C? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  31. BCNF Review • Q: How do we fix a non-BCNF relation? • Q: If AsBs violates BCNF, what do we do? • Q: In this case, could the decomposition be lossy? • Q: Under what circumstances could a decomposition be lossy? • Q: How do we combine two relations? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  32. Design Goals • Goal for a relational database design is: • No redundancy • Lossless Join • Dependency Preservation • If we cannot achieve this, we accept one of • dependency loss • use of more expensive inter-relational methods to preserve dependencies • data redundancy due to use of 3NF • Interesting: SQL does not provide a direct way of specifying FDs other than superkeys • can specify FDs using assertions, but they are expensive to test Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  33. Recap: You are here • First part of course is done: conceptual foundations • You now know: • E/R Model • Relational Model • Relational Algebra • You now know how to: • Capture part of world as an E/R model • Convert E/R models to relational models • Convert relational models to good (normal) forms • Next: • Create, update, query tables with R.A/SQL • Write SQL/DB-connected applications Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  34. High-level agenda • Install Oracle • Start SQL • Lab on Oracle system info/SQL Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  35. Next topic: SQL • Standard language for querying and manipulating data Structured Query Language • Many standards: ANSI SQL, SQL92/SQL2, SQL3/SQL99 • Vendors support various subsets/extensions • We’ll do Oracle’s version • Basic form (many more bells and whistles in addition): SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  36. Data Types in SQL • Characters: • CHAR(20) -- fixed length • VARCHAR(40) -- variable length • Numbers: • BIGINT, INT, SMALLINT, TINYINT • REAL, FLOAT -- differ in precision • MONEY • Times and dates: • DATE • DATETIME -- SQL Server Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  37. Table name “Tables” Attribute names Product Tuples or rows Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  38. Simple SQL Query Product SELECT *FROM ProductWHERE category='Gadgets' “selection” Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  39. Simple SQL Query Product SELECT PName, Price, ManufacturerFROM ProductWHERE Price > 100 “selection” and “projection” Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  40. A Notation for SQL Queries Input Schema Product(PName, Price, Category, Manfacturer) SELECT Name, Price, ManufacturerFROM ProductWHERE Price > 100 (PName, Price, Manfacturer) Output Schema Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  41. SQL • SQL SELECT • Sometimes called a “projection” • What goes in the WHERE clause: • x = y, x < y, x <= y, etc. • For number, they have the usual meanings • For CHAR and VARCHAR: lexicographic ordering • Expected conversion between CHAR and VARCHAR • For dates and times, what you expect Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  42. SQL e.g. • Movies(Title,Year,Length,inColor,Studio,Prdcr#) • Q: How long was Star Wars (1977), in SQL? • Q: Which Fox movies are are at least 100 minutes long, in SQL? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  43. SQL e.g. • Reps(ssn, name, etc.) • Clients(ssn, name, rssn) • Q: Who are George’s clients, in SQL? • Conceptually: • PClients.name(sReps.name=“George” and Reps.ssn=rssn(Reps x Clients)) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  44. The LIKE operator • s LIKE p: pattern matching on strings • p may contain two special symbols: • _ = any single character • % = zero or more chars • Product(Name, Price, Category, Manufacturer) • Find all products whose name contains ‘gizmo’: SELECT *FROM ProductsWHERE Name LIKE '%gizmo%' Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  45. The LIKE operator • Q: What it want to search for values containing a ‘%’? • PName LIKE '%%%' won’t work • Instead, must use escape chars • In C/C++/J, prepend ‘\’ • In SQL, prepend an arbitrary escape char: • PName LIKE '%x%%' ESCAPE 'x' Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  46. More on escape chars • SQL: no official default escape char • In SQL*Plus: default escape char == '\' • Can set with • SQL> set escape x • Other tools, DBMSs: your mileage may vary • SQL string literals put in ' ': • 'mystring' • Single-quote literals escaped with single-quotes: • 'George''s string' Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  47. Q: What about double quotes? • A: Can’t be used in place of single quotes • But can be used when Oracle would otherwise misparse your command, e.g.: • Names with spaces: • create table bad table name (a int, b int); • Reserved words as names: • create table badfieldname(from int, b int); Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  48. Q: Can an escape char be an escape string? • A: No. SQL> select * from newtable where a like '%\%%' escape '\'; A B ---------- ---------- h%i there SQL> select * from newtable where a like '%\%%' escape '\\'; select * from newtable where a like '%\%%' escape '\\' * ERROR at line 1: ORA-01425: escape character must be character string of length 1 Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  49. More on single-quotes • Dates with DATE: • DATE '1948-05-14' • Timestamps with TIMESTAMP: • TIMESTAMP '1948-05-14 12:00:00' Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

  50. Eliminating Duplicates SELECT category FROM Product Compare to: SELECTDISTINCT category FROM Product Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005

More Related