Download Presentation
## Relational Algebra and My SQL(II)

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Lecture 6 CS157B**Relational Algebra and My SQL(II) Prof. Sin Min Lee Deparment of Computer Science San Jose State University**Lecture 12:Further relational algebra, further SQL**www.cl.cam.ac.uk/Teaching/current/Databases/**Today’s lecture**• Where does SQL differ from relational model? • What are some other features of SQL? • How can we extend the relational algebra to match more closely SQL?**Duplicate rows**• Consider our relation instances from lecture 6, Reserves, Sailors and Boats • Consider SELECT rating,age FROM Sailors; • We get a relation that doesn’t satisfy our definition of a relation! • RECALL: We have the keyword DISTINCT to remove duplicates**Multiset semantics**• A relation in SQL is really a multiset or bag, rather than a set as in the relational model • A multiset has no order (unlike a list), but allows duplicates • E.g. {1,2,1,3} is a bag • select, project and join work for bags as well as sets • Just work on a tuple-by-tuple basis**Bag operations**• Bag union: • Sum the number of times that an element appears in the two bags, e.g. • {1,2,1}{1,2,3} = {1,1,1,2,2,3} • Bag intersection: • Take the minimum of the number of occurrences in each bag, e.g. • {1,2,1}{1,2,3,3} = {1,2} • Bag difference: • Proper-subtract the number of occurrences in the two bags, e.g. • {1,2,1}-{1,2,3,3} = {1}**Laws for bags**• Note that whilst some of the familiar (set-theoretic) laws continue to hold, some of them do not • Example: R(ST) = (RS)(RT) ??**Extended relational algebra**Add features needed for SQL • Bag semantics • Duplicate elimination operator, • Sorting operator, • Grouping and aggregation operator, • Outerjoin operators, oV, Vo, oVo**Duplicate-elimination operator**• (R) = relation R with any duplicated tuples removed • R= (R)= • This is used to model the DISTINCT feature of SQL**Sorting**• L1,… Ln(R) returns a list of tuples of R, ordered according to the attributes L1, …, Ln • Note: does not return a relation • R= B(R)= [(5,2),(1,3),(3,4)] • ORDER BY in SQL, e.g. SELECT * FROM Sailors WHERE rating>7 ORDER BY age, sname;**Extended projection**• SQL allows us to use arithmetic operators SELECT age*5 FROM Sailors; • We extend the projection operator to allow the columns in the projection to be functions of one or more columns in the argument relation, e.g. • R= A+B,A,A(R)=**Arithmetic**• Arithmetic (and other expressions) can not be used at the top level • i.e. 2+2 is not a valid SQL query • How would you get SQL to compute 2+2?**Aggregation**• SQL provides us with operations to summarise a column in some way, e.g. SELECT COUNT(rating) FROM Sailors; SELECT COUNT(DISTINCT rating) FROM Sailors; SELECT COUNT(*) FROM Sailors WHERE rating>7; • We also have SUM, AVG, MIN and MAX**Grouping**• These aggregation operators have been applied to all qualifying tuples. Sometimes we want to apply them to each of several groups of tuples, e.g. • For each rating, find the average age of the sailors • For each rating, find the age of the youngest sailor**GROUP BY in SQL**SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list; • The target-list contains • List of column names • Aggregate terms • NOTE: The variables in target-list must be contained in grouping-list**GROUP BY cont.**For each rating, find the average age of the sailors SELECT rating,AVG(age) FROM Sailors GROUP BY rating; For each rating find the age of the youngest sailor SELECT rating,MIN(age) FROM Sailors GROUP BY rating;**Grouping and aggregation**• L(R) where L is a list of elements that are either • Individual column names (“Grouping attributes”), or • Of the form (A), where is an aggregation operator (MIN, SUM, …) and A is the column it is applied to • For example, rating,AVG(age)(Sailors)**Semantics**• Group R according to the grouping attributes • Within each group, compute (A) • Result is the relation consisting of one tuple for each group. The components of that tuple are the values associated with each element of L for that group**Example**• Let R= • Compute beer,AVG(price)(R)**Example cont.**• Group according to the grouping attribute, beer: • Compute average of price within groups:**NULL values**• Sometimes field values are unknown (e.g. rating not known yet), or inapplicable (e.g. no spouse name) • SQL provides a special value, NULL, for both these situations • This complicates several issues • Special operators needed to check for NULL • Is NULL>8? Is (NULL OR TRUE)=TRUE? • We need a three-valued logic • Need to carefully re-define semantics**NULL values**• Consider INSERT INTO Sailors (sid,sname) VALUES (101,”Julia”); SELECT * FROM Sailors; SELECT rating FROM Sailors; SELECT sname FROM Sailors WHERE rating>0;**Entity integrity constraint**• An entity integrity constraint states that no primary key value can be NULL**Outer join**• Note that with the usual join, a tuple that doesn’t ‘join’ with any from the other relation is removed from the resulting relation • Instead, we can ‘pad out’ the columns with NULLs • This operator is called an full outer join, written oVo**Example of full outer join**• Let R= Let S= • Then RVS = • But RoVoS =**Outer joins in SQL**• SQL/92 has three variants: • LEFT OUTER JOIN (algebra: oV) • RIGHT OUTER JOIN (algebra: Vo) • FULL OUTER JOIN (algebra: oVo) • For example: SELECT * FROM Reserves r LEFT OUTER JOIN Sailors sON r.sid=s.sid;**Views**• A view is a query with a name that can be used in further SELECT statements, e.g. CREATE VIEW ExpertSailors(sid,sname,age) AS SELECT sid,sname,age FROM Sailors WHERE rating>9; • Note that ExpertSailors is not a stored relation • (WARNING: mysql does not support views )**Querying views**• So an example query SELECT sname FROM ExpertSailors WHERE age>27; • is translated by the system to the following: SELECT sname FROM Sailors WHERE rating>9 AND age>27;**Relational Algebra**• The Relational Algebra is used to define the ways in which relations (tables) can be operated to manipulate their data. • It is used as the basis of SQL for relational databases, and illustrates the basic operations required of any DML. • This Algebra is composed of Unary operations (involving a single table) and Binary operations (involving multiple tables).**SQL**• Structured Query Language (SQL) • Standardised by ANSI • Supported by modern RDBMSs • Commands fall into three groups • Data Definition Language (DLL) • Create tables, etc • Data Manipulation Language (DML) • Retrieve and modify data • Data Control Language • Control what users can do – grant and revoke privileges**Selection**• The selection or operation selects rows from a table that satisfy a condition: < condition > < tablename > • Example: course = ‘CM’Students Students stud# namecourse 100Fred PH stud# namecourse 200 Dave CM 200 Dave CM 300 Bob CM 300 Bob CM**Projection**• The projection or operation selects a list of columns from a table. < column list > < tablename > • Example: stud#, nameStudents Students stud# namecoursestud# name 100Fred PH 100Fred 200 Dave CM 200 Dave 300 Bob CM 300 Bob**Selection / Projection**• Selection and Projection are usually combined: stud#, name(course = ‘CM’Students) Students stud# namecourse 100Fred PH stud# name 200 Dave CM 200 Dave 300 Bob CM 300 Bob**Binary OperationsCartesian ProductTheta JoinInner**JoinNatural JoinOuter JoinsSemi Joins**Cartesian Product**• Concatenation of every row in the first relation (R) with every row in the second relation (S): R X S**Cartesian Product - Example**StudentsCourses stud# namecoursecourse# name 100Fred PH PH Pharmacy 200 Dave CM CM Computing 300 Bob CM Students XCourses = stud# Students.name course course# Courses.name 100 Fred PH PH Pharmacy 100 Fred PH CM Computing 200 Dave CM PH Pharmacy 200 Dave CM CM Computing 300 Bob CM PH Pharmacy 300 Bob CM CM Computing**Theta Join**• A Cartesian product with a condition applied: R ⋈ <condition> S**Theta Join - Example**StudentsCourses stud# namecoursecourse# name 100 Fred PH PH Pharmacy 200 Dave CM CM Computing 300 Bob CM Students ⋈ stud# = 200Courses stud# Students.name course course# Courses.name 200 Dave CM PH Pharmacy 200 Dave CM CM Computing**Inner Join (Equijoin)**• A Theta join where the <condition> is the match (=) of the primary and foreign keys. R ⋈<R.primary_key = S.foreign_key> S**Inner Join - Example**StudentsCourses stud# namecoursecourse# name 100 Fred PH PH Pharmacy 200 Dave CM CM Computing 300 Bob CM Students ⋈ course = course#Courses stud# Students.name course course# Courses.name 100 Fred PH PH Pharmacy 200 Dave CM CM Computing 300 Bob CM CM Computing**Natural Join**• Inner join produces redundant data (in the previous example: course and course#). To get rid of this duplication: < stud#, Students.name, course, Courses.name > (Students ⋈<course = course#> Courses) Or R1= Students ⋈<course = course#> Courses R2= < stud#, Students.name, course, Courses.name > R1 The result is called the natural join of Students and Courses**Natural Join - Example**StudentsCourses stud# namecoursecourse# name 100Fred PH PH Pharmacy 200 Dave CM CM Computing • Bob CM R1=Students⋈<course = course#>Courses R2=< stud#, Students.name, course, Courses.name >R1 stud# Students.name course Courses.name 100 Fred PH Pharmacy 200 Dave CM Computing 300 Bob CM Computing**Outer Joins**• Inner join + rows of one table which do not satisfy the <condition>. • Left Outer Join:R <R.primary_key = S.foreign_key> S All rows from R are retained and unmatched rows of S are padded with NULL • Right Outer Join:R <R.primary_key = S.foreign_key> S All rows from S are retained and unmatched rows of R are padded with NULL**Left Outer Join - Example**StudentsCourses stud# namecoursecourse# name 100Fred PH PH Pharmacy 200 Dave CM CM Computing 400 Peter EN CH Chemistry Students<course = course#>Courses stud# Students.name course course# Courses.name 100 Fred PH PH Pharmacy 200 Dave CM CM Computing • Peter EN NULL NULL**Right Outer Join - Example**StudentsCourses stud# namecoursecourse# name 100Fred PH PH Pharmacy 200 Dave CM CM Computing 400 Peter EN CH Chemistry Students<course = course#>Courses stud# Students.name course course# Courses.name 100 Fred PH PH Pharmacy 200 Dave CM CM Computing NULL NULL NULL CH Chemistry**Combination of Unary and Join Operations**StudentsCourses stud# name address course course# name 100Fred Aberdeen PH PH Pharmacy 200 Dave Dundee CM CM Computing 300 Bob Aberdeen CM Show the names of students (from Aberdeen) and the names of their courses R1= Students ⋈<course=course#> Courses R2= <address=“Aberdeen”> R1 R3= <Students.name, Course.name> R2 Students.name Courses.name Fred Pharmacy Bob Computing**Set Operations**Union Intersection Difference**Union**• Takes the set of rows in each table and combines them, eliminating duplicates • Participating relations must be compatible, ie have the same number of columns, and the same column names, domains, and data types R S R S A B a2 b2 a3 b3 A B a1 b1 a2 b2 a3 b3 A B a1 b1 a2 b2**Intersection**• Takes the set of rows that are common to each relation • Participating relations must be compatible R S R S A B a1 b1 a2 b2 A B a2 b2 a3 b3 A B a2 b2