460 likes | 477 Views
Explore SQL basics with an emphasis on Select-From-Where statements, grouping, and aggregation, using Jeff Ullman's slides with modified examples. Learn how to build databases using XAMPP and execute queries efficiently.
E N D
Introduction to SQL Elements and process: relations database (with XAMPP) Select-From-Where Statements Grouping and Aggregation Slides by Jeff Ullman (infolab.stanford.edu/~ullman/dscb/pslides/sql1.ppt), with example modified and some additions Bettina Berendt, ISI 2015 Last updated 2015-10-21
Why SQL? • SQL is a very-high-level language. • Say “what to do” rather than “how to do it.” • Avoid a lot of data-manipulation details needed in procedural languages like C++ or Java. • Database management system figures out “best” way to execute query. • Called “query optimization.”
Select-From-Where Statements SELECT desired attributes FROM one or more tables WHERE condition about tuples of the tables
A note about case-sensitivity • SQL is case-sensitive inside strings. • MySQL is case-sensitive on some operating systems (e.g. Mac). • It isn‘t on Windows.
Our Running ExampleAll our SQL queries will be based on the following database schema Session (Month, Year) Session_Day (Day, Month, Year) Agenda_Item (Agenda_item_ID, Title, Day, Month, Year, Number) Speech (Speech_ID, Spoken_text, Language, Video_URI, Agenda_item_ID, Number, MEP_ID) Parliament_Member (MEP_ID, Date_of_birth, Given_name, Family_name) Country (Acronym, Name, EU_member_since) Role (Name) Political_Institution (Acronym, Institution_label) Represents (MEP_ID, CountryAcronym) In_Political_Function (MEP_ID, RoleName, Inst_Acronym, Start_date, End_date) EU_Party (Inst_Acronym) National_Party (Inst_Acronym) EU_Committee (Inst_Acronym)
So first, we need to turn these relations into actual tables! • We will use XAMPP (https://www.apachefriends.org), which installs a web server, a database management system, and a browser-based interface for your databases on your computer. The first is Apache, the second MariaDB (was: MySQL*), and the third phpmyadmin. • A short tutorial is here:https://www.siteground.com/tutorials/phpmyadmin/phpmyadmin_create_database.htm • In the lecture, we will build up the database step by step. • To load the already-implemented and –populated database: • Download the file eup.sql from Toledo • create a database named (e.g.) eup • Go to Import, choose the file eup.sql, click OK * For our purposes, the same. See http://programmers.stackexchange.com/questions/120178/whats-the-difference-between-mariadb-and-mysql
Example • Using Agenda_Item (Agenda_item_ID, Title, Day, Month, Year, Number), what titles were discussed in 2012? SELECT Title FROM Agenda_Item WHERE Year=2012;
Result of Query The answer is a relation with a single attribute, Title, and tuples with the title of each agenda item discussed in 2012.
Meaning of Single-Relation Query(“Formal Semantics”) • Begin with the relation in the FROM clause. • Apply the selection indicated by the WHERE clause. • Apply the extended projection indicated by the SELECT clause.
tv Include tv.Title in the result Check if 2012 Operational Semantics Title Year … Com- position <etc.> 2012
Operational Semantics • To implement this algorithm think of a tuple variable ranging over each tuple of the relation mentioned in FROM. • Check if the “current” tuple satisfies the WHERE clause. • If so, compute the attributes or expressions of the SELECT clause using the components of this tuple.
* In SELECT clauses • When there is one relation in the FROM clause, * in the SELECT clause stands for “all attributes of this relation.” • Example using Agenda_item: SELECT * FROM Agenda_Item WHERE Year=2012;
Result of Query: Now, the result has each of the attributes of Agenda item.
Complex Conditions in WHERE Clause • From Agenda_item, find the titles of agenda items in February 2012: SELECT Title FROM Agenda_Item WHERE Year=2012 AND Month=02;
Patterns • WHERE clauses can have conditions in which a string is compared with a pattern, to see if it matches. • General form: <Attribute> LIKE <pattern> or <Attribute> NOT LIKE <pattern> • Pattern is a quoted string with % = “any string”; _ = “any character.”
Example • From Agenda_item, find the agenda items whose title includes “Composition” : SELECT Title, Month, Year FROM Agenda_Item WHERE Title LIKE "%Composition%"; Note that SQL is case-sensitive inside strings, and MySQL is case-sensitive on some operating systems (e.g. Mac).
NULL Values • Tuples in SQL relations can have NULL as a value for one or more components. • Meaning depends on context. Two common cases: • Missing value: e.g., we know Louise Weiss has some birth date, but we don’t know what it is. • Inapplicable : e.g., the value of attribute spouse for an unmarried person.
Comparing NULL’s to Values • The logic of conditions in SQL is really 3-valued logic: TRUE, FALSE, UNKNOWN. • When any value is compared with NULL, the truth value is UNKNOWN. • But a query only produces a tuple in the answer if its truth value for the WHERE clause is TRUE (not FALSE or UNKNOWN).
Three-Valued Logic • To understand how AND, OR, and NOT work in 3-valued logic, think of TRUE = 1, FALSE = 0, and UNKNOWN = ½. • AND = MIN; OR = MAX, NOT(x) = 1-x. • Example: TRUE AND (FALSE OR NOT(UNKNOWN)) = MIN(1, MAX(0, (1 - ½ ))) = MIN(1, MAX(0, ½ ) = MIN(1, ½ ) = ½.
UNKNOWN UNKNOWN UNKNOWN Surprising Example • From the following Sells relation: bar beer price Joe’s Bar Bud NULL SELECT bar FROM Sells WHERE price < 2.00 OR price >= 2.00;
Multirelation Queries • Interesting queries often combine data from more than one relation. • We can address several relations in one query by listing them all in the FROM clause. • Distinguish attributes of the same name by “<relation>.<attribute>”
Example • Using relations Parliament_member and In_Political_Function, find the names of MEPs who are chairs (or co-chairs, vice-chairs, …) of some institution. SELECT Given_name, Family_name, Inst_Acronym, RoleName FROM Parliament_Member, In_Political_Function WHERE Parliament_Member.MEP_ID = In_Political_Function.MEP_ID AND RoleName LIKE "%chair";
Formal Semantics • Almost the same as for single-relation queries: • Start with the product of all the relations in the FROM clause. • Apply the selection condition from the WHERE clause. • Project onto the list of attributes and expressions in the SELECT clause.
Operational Semantics • Imagine one tuple-variable for each relation in the FROM clause. • These tuple-variables visit each combination of tuples, one from each relation. • If the tuple-variables are pointing to tuples that satisfy the WHERE clause, send these tuples to the SELECT clause.
to output check for role check these are equal Example MEP_ID Family_name MEP_ID Inst_Acronym Role tv1 tv2 1003 PPE vice-chair 1003 Pisoni Parliament_memberIn_political_function
Joins with more tables: In what languages do these people give speeches?
Explicit Tuple-Variables • Sometimes, a query needs to use two copies of the same relation. • Distinguish copies by following the relation name by the name of a tuple-variable, in the FROM clause. • It’s always an option to rename relations this way, even when not essential.
Example • From Speech, find all pairs of speeches by the same MEP. • Do not produce pairs like (1,1). • Produce pairs in ascending order, e.g. (1,2), not (2,1).
Controlling Duplicate Elimination • Force the result to be a set by SELECT DISTINCT . . . • Force the result to be a bag (i.e., don’t eliminate duplicates) by ALL, as in . . . UNION ALL . . .
Example: DISTINCT(What does this mean? What would you get without DISTINCT?)
Aggregations • SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column. • Also, COUNT(*) counts the number of tuples.
Example: Aggregation • From Session, find the average month: SELECT AVG(Month) FROM Session; Why?
Eliminating Duplicates in an Aggregation • Use DISTINCT inside an aggregation. • Example: Session_day has 4 days in January 2012 and 1 in February 2012. • What is the result of these queries? SELECT count( * ) FROM Session_day; SELECT count( Month ) FROM Session_day; SELECT count( DISTINCT Month ) FROM Session_day;
NULL’s Ignored in Aggregation • NULL never contributes to a sum, average, or count, and can never be the minimum or maximum of a column. • But if there are no non-NULL values in a column, then the result of the aggregation is NULL.
Example: Effect of NULL´s • An incomplete version of the database has • SELECT count(*) FROM `speech` 628 • SELECT count(agenda_item_ID) FROM `speech` 0
Grouping • We may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes. • The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group.
Groupingover >1 table,sorted(Note:ORDER BY also for other attributes)
Restriction on SELECT Lists With Aggregation • If any aggregation is used, then each element of the SELECT list must be either: • Aggregated, or • An attribute on the GROUP BY list.
HAVING Clauses • HAVING <condition> may follow a GROUP BY clause. • If so, the condition applies to each group, and groups not satisfying the condition are eliminated.
Requirements on HAVING Conditions • These conditions may refer to any relation or tuple-variable in the FROM clause. • They may refer to attributes of those relations, as long as the attribute makes sense within a group; i.e., it is either: • A grouping attribute, or • Aggregated.
Reading Book III, • Ch.1, 207-213 about basics (literal values, variables, functions) • Ch.2, 231-234, 236-240, 248-255 about SELECT statements • Ch.4, 308-313 about SELECT statements on JOINed tables.
Next week • More on SQL!