1 / 41

Midterm Review II

Join the epic journey of Star Wars as Luke Skywalker, Princess Leia, and Han Solo battle the dark forces of the Empire. Experience the thrill of this iconic film that defined a generation.

chuckn
Download Presentation

Midterm Review II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Midterm Review II

  2. title year length filmTyp studioNam starName Star Wars 1977 124 color Fox Carrie Fisher Star Wars 1977 124 color Fox Mark Hamill Star Wars 1977 124 color Fox Harrison Ford Mighty Ducks 1991 104 color Disney Emilio Estevez Wayne’s World 1992 95 color Paramount Dana Carvey Wayne’s World 1992 95 color Paramount Mike Meyers Anomalies • Redundancy. • Information may be repeated unnecessarily in several tuples. • E.g. length and filmType. • Update anomalies. • We may change information in one tuple but leave it unchanged in other tuples. • E.g. we could change the length of Star Wars to 125, in the first tuple, and forget to do the same in the second and third tuple. • Deletion anomalies. • If a set of values becomes empty, we may lose other information as a side effect. • E.g. if we delete Emilio Estevez we will lose all the information about Mighty Ducks.

  3. Decomposing Relations - Example title year starName title year length filmTyp studioNam Star Wars 1977 Carrie Fisher Star Wars 1977 124 color Fox Star Wars 1977 Mark Hamill Mighty Ducks 1991 104 color Disney Star Wars 1977 Harrison Ford Wayne’s World 1992 95 color Paramount Mighty Ducks 1991 Emilio Estevez Wayne’s World 1992 Dana Carvey Wayne’s World 1992 Mike Meyers Movie1relation • No true redundancy! • The update anomaly disappeared. If we change the length of a movie, it is done only once. • The deletion anomaly disappeared. If we delete all the stars from Movie2 we still will have the other info for a movie. Movie2relation

  4. Boyce-Codd Normal Form • The goal of decomposition is to replace a relation by several that do not exhibit anomalies. • There is a simple condition under which the anomalies can be guaranteed not to exist. • This condition is calledBoyce-Codd Normal Form, or BCNF. • A relation is in BCNF if: • Whenever there is a nontrivial dependency A1A2…AnB1B2…Bm for R, it must be the case that the left hand side {A1 ,A2 ,… , An} is a superkey for R.

  5. Boyce-Codd Normal Form - Example Violating BCNF • Relation Movie in the previous figure is not in BCNF. • Consider the FD: title yearlength filmType studioName • Unfortunately, the left side of the above dependency is not a superkey. • In particular we know that the title and the year does not functionally determine starName. • On the other hand, Movie1 is in BCNF. • The only key is {title, year} and • title year  length filmType studioName is the only (non-trivial) FD that holds in the relation.

  6. Decomposition into BCNF • The decomposition strategy is: • Find a non-trivial FD A1A2…AnB1B2…Bm that violates BCNF, i.e. A1A2…An is not a superkey. • Decompose the relation schema into two overlapping relation schemas: • One is all the attributes involved in the violating dependency and • the other is the left side and all the other attributes not involved in the dependency. • By repeatedly, choosing suitable decompositions, we can break any relation schema into a collection of smaller schemas in BCNF. • The data in the original relation is represented faithfully by the data in the relations that are the result of the decomposition. • i.e. we can reconstruct the original relation exactly from the decomposed relations.

  7. Boyce-Codd Normal Form - Example Consider relation schema: Movies(title, year, studioName, president, presAddr) and functional dependencies: title year  studioName studioName  president president  presAddr Last two FDs violate BCNF. Why? Compute {title, year}+, {studioName}+, {president}+ and see if you get all the attributes of the relation. If not, you got some FD which is violates BCNF, and need to break relation.

  8. Boyce-Codd Normal Form – Example • Let’s decompose starting with: • studioName  president • Let’s add to the right-hand side any other attributes in the closure of studioName (optional “rule of thumb”). • X={studioName}studioNamepresident • X={studioName, president}presidentpresAddr • X={studioName}+={studioName, president, presAddr}

  9. Boyce-Codd Normal Form – Example From the closure we get: studioNamepresident presAddr We decompose the relation schema into the following two schemas: Movies1(studioName, president, presAddr) Movies2(title, year, studioName) Movies2 is in BCNF. Because we can’t find a “bad” FD holding there. What about Movies1? The following dependency violates BCNF. presidentpresAddr Why it’s bad to leave Movies1 table as is? If many studios share the same president than we would have redundancy when repeating the presAddr in all those studios.

  10. Boyce-Codd Normal Form – Example We must decompose Movies1, using the FD: presidentpresAddr The resulting relationschemas, both in BCNF, are: Movies11(title, year, studioName) Movies12(studioName, president) In general, we must keep applying the decomposition rule as many times as needed, until all our relations are in BCNF. So, finally we got Movies11, Movies12, and Movies2.

  11. Finding FDs for the decomposed relations • When we decompose a relation, we need to check that the resulting schemas are in BCNF. • We can’t tell a relation is in BCNF, unless we can determine the FDs that hold for that relation.

  12. Finding FDs for the decomposed relations • Suppose S is one of the resulting relations in a decomposition of R. • For this: • Consider each subset X of attributes of S. • Compute X+using the FD on R. • At the end throw out the attributes of R, which aren’t in S. • Then, for each attribute B such that: • B is an attribute of S, • B is in X+ • we have that the functionaldependency XB holds in S.

  13. Relational Algebra Operations • Operations of relational algebra fall into four broad classes: • The usual set operations • union • intersection • difference • Operations that remove parts of a relation: • selection eliminates some rows(tuples) • projection eliminates some columns • Operations that combine the tuples of two relations: • Cartesian product pairs the tuples of two relations in all possible ways • join selectively pairs tuples from two relations. • An operation called “renaming.”

  14. Conditions for Set Operations on Relations • R and S must have schemas with identical sets of attributes. • Before applying the operations, the columns of R and S must be ordered so that the order of attributes is the same for both relations.

  15. Projection Produces from a relation R a new relation that has only some of R’s columns. A1, A2,…,An(R)is a relation that has only the columns for attributes A1, A2,…, Anof R. Example: Compute the expression title, year, length(Movies) on the table: title year length filmType studioName producerC# Star wars 1977 124 color Fox 12345 Mighty Ducks 1991 104 color Disney 67890 Wayne’s World 1992 95 color Paramount 99999

  16. Example (Continued) Resulting relation: title year length Star wars 1977 124 Mighty Ducks 1991 104 Wayne’s World 1992 95 What about filmtype(Movies)

  17. Selection Selection, applied to a relation R, produces a new relation with a subset of R’s tuples. The tuples in the result are those that satisfy some condition C. Denote it with C( R ). The schema for the resulting relation is the same as R’s schema. Example: The expression length100(Movie) is: title year length filmType studioName producerC# Star wars 1977 124 color Fox 12345 Mighty Ducks 1991 104 color Disney 67890

  18. Cartesian Product • Cartesian Product of two relations R and S is the set of pairs that can be formed by choosing the first element of the pair to be any element of R and the second an element of S. This denoted as RS. • Example: • R: A B S: B C D • 1 2 2 5 6 • 3 4 4 7 8 • 9 10 11 RS: A R.B S.B C D 1 2 2 5 6 1 2 4 7 8 1 2 9 10 11 3 4 2 5 6 3 4 4 7 8 3 4 9 10 11

  19. Natural Join Denoted as R S. Let A1, A2,…,An be the attributes in both the schema of R and the schema of S. Then a tuple r from R and a tuple s from S are successfully paired if and only if r and s agree on each of the attributes A1, A2, …, An. Example: The natural join of the relation R and S from previous example is: A B C D 1 2 5 6 3 4 7 8

  20. Combing Operations to Form Queries • “What are the title and years of movies made by Fox that are at least 100 minutes long?” • One way to compute the answer to this query is: • Select those Movie tuples that have length  100. • Select those Movie tuples that have studioName =‘Fox’. • Compute the intersection of first and second steps. • Project the relation from the third step onto attributes title and year.

  21. Another Example • Consider two relations Movie1and Movie2, • With schemas: • Movie1(title, year, length, filmType, studioName) • Movie2(title, year, starName) • Suppose we want to know: • “Find the stars of the movies that are at least 100 minutes long.” • First we join the two relations: Movie1, Movie2 • Second we select movies with length at least 100 min. • Then we project the starName.

  22. Relational Algebra on Bags • A bag is like a set, but an element may appear more than once. • Example: {1,2,1,3} is a bag. {1,2,3} is also a bag that happens to be a set. • Bags also resemble lists, but order in a bag is unimportant. • Example: • {1,2,1} = {1,1,2} as bags, but • [1,2,1] != [1,1,2] as lists.

  23. Operations on Bags • Selection applies to each tuple, so its effect on bags is like its effect on sets. • Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates. • Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.

  24. Bag Union, Intersection, Difference • An element appears in the union of two bags the sum of the number of times it appears in each bag. • Example: {1,2,1}  {1,1,2,3,1} = {1,1,1,1,1,2,2,3} • An element appears in the intersection of two bags the minimum of the number of times it appears in either. • Example: {1,2,1}  {1,2,3} = {1,2}. • An element appears in differenceA – B of bags as many times as it appears in A, minus the number of times it appears in B. • But never less than 0 times. • Example: {1,2,1} – {1,2,3} = {1}.

  25. The Extended Algebra •  = eliminate duplicates from bags. •  = sort tuples. • Extended projection: arithmetic, duplication of columns. •  = grouping and aggregation. • OUTERJOIN: avoids “dangling tuples” = tuples that do not join with anything.

  26. R S = A B C 1 2 3 4 5 NULL NULL 6 7 Example: Outerjoin R = A B S = B C 1 2 2 3 4 5 6 7 (1,2) joins with (2,3), but the other two tuples are dangling.

  27. Aggregation Operators • They apply to entire columns of a table and produce a single result. • The most important examples: • SUM • AVG • COUNT • MIN • MAX

  28. Example: Aggregation R = A B 1 3 3 4 3 2 SUM(A) = 7 COUNT(A) = 3 MAX(B) = 4 MIN(B) = 2 AVG(B) = 3

  29. Grouping Operator • R1 := L (R2). • L is a list of elements that are either: • Grouping attributes. • AGG(A), where AGG is one of the aggregation operators and A is an attribute. Semantics • Group R according to all the grouping attributes on list L. • That is, form one group for each distinct list of values for those attributes in R. • Within each group, compute AGG(A) for each aggregation on list L. • Result has grouping attributes and aggregations as attributes. • One tuple for each list of values for the grouping attributes and their group’s aggregations.

  30. Then, average C within groups: A B AVG(C) 1 2 4 4 5 6 First, group R : A B C 1 2 3 1 2 5 4 5 6 Example: Grouping/Aggregation R = A B C 1 2 3 4 5 6 1 2 5 A,B,AVG(C) (R) = ??

  31. Example: Grouping/Aggregation • StarsIn(title, year, starName) • We want, for each star who has appeared in at least three movies the earliest year in which he or she appeared. • First we group, using starName as a grouping attribute. • Then, we have to compute the MIN(year) for each group. • However, we need also compute COUNT(title) aggregate for each group, in order to filter out those stars with less than three movies. • ctTitle>3[starName,MIN(year)minYear,COUNT(title)ctTitle(StarsIn)]

  32. Aggregations in SQL • SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column. • Find the average length of movies from Disney. SELECT AVG(length) FROM Movie WHERE studioName = 'Disney';

  33. Eliminating Duplicates in an Aggregation • DISTINCT inside an aggregation causes duplicates to be eliminated before the aggregation. • Example: Find the number of different producers for Disney movies: SELECT COUNT(DISTINCT producerc) FROM Movie WHERE studioname = 'Disney'; This is not the same as: SELECT DISTINCT COUNT(producerc) FROM Movie WHERE studioname = 'Disney';

  34. NULL’s Ignored in Aggregation • NULL never contributes to a sum, average, or count, and can never be the minimum or maximum of a column. select SUM(networth) from movieexec;

  35. The number of movies from Disney with a known length. Example: Effect of NULL’s SELECT count(*) FROM Movie WHERE studioName = 'Disney'; SELECT count(length) FROM Movie WHERE studioName = 'Disney'; The number of movies from Disney.

  36. Grouping • We may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes. • The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group. • From Movie relation, find the average length for each studio: SELECT studioName, AVG(length) FROM Movie GROUP BY studioName;

  37. Compute those tuples first, then group by name. Example: Grouping • Find the producer’s total length of film produced. SELECT name, SUM(length) FROM Movie, MovieExec WHERE producerc = cert GROUP BY name;

  38. Restriction on SELECT Lists With Aggregation • If any aggregation is used, then each element of the SELECT list must be either: • Aggregated, or • An attribute on the GROUP BY list.

  39. Illegal Query Example • We might think we could find the shortest movie of Disney as: SELECT title, MIN(length) FROM Movie WHERE studioName = 'Disney'; • But this query is illegal in SQL. Why? • Because title is neither aggregated nor on the GROUP BY list. • We should do instead: SELECT title, length FROM Movie WHERE studioName = 'Disney' AND length = (SELECT MIN(length) FROM Movie WHERE studioName = 'Disney');

  40. HAVING Clauses • HAVING <condition> may follow a GROUP BY clause. • If so, the condition applies to each group, and groups not satisfying the condition are eliminated. • These conditions may refer to attributes that make sense within a group; i.e., they are either: • Grouping attributes, or • Aggregated attributes.

  41. Example: HAVING • Suppose that we didn’t wish to include all the producers in our table of aggregated movie lengths. • Suppose for instance we want those producers who have at least one movie before 1972. SELECT name, SUM(length) FROM MovieExec, Movie WHERE producerc = cert GROUP BY name HAVING MIN(year) < 1973;

More Related