1 / 24

Missing Information

Missing Information. Stanislava Armstrong http://www.cs.nott.ac.uk/~saw/teaching/G64DBS/lecture_slides.html. Missing Information. Sometimes we don’t know what the value is for an entry in a relation We know that there is a value, but don’t know what it is

ford
Download Presentation

Missing Information

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Missing Information Stanislava Armstrong http://www.cs.nott.ac.uk/~saw/teaching/G64DBS/lecture_slides.html SQL Data Definition II

  2. Missing Information • Sometimes we don’t know what the value is for an entry in a relation • We know that there is a value, but don’t know what it is • There is no value at all that makes any sense • The value is intentionally undisclosed • Two main methods have been proposed to deal with this • NULLs can be used as markers to show that information is missing • A default value can be used to represent the missing value Missing Information

  3. NULLs, ' ' and 0 • A NULL is a placeholder for a missing value (unknown, inapplicable or withheld) • NULL is not an actual value • An extra set of rules have been devised to accommodate NULL when using: • Arithmetic expressions • Comparison expressions • GROUP BY and ORDER BY • Aggregate functions • Fro comparision: ' ' and 0 are actual values • They will be treated as normal values at all times. • One suggestion is to use ' ' and 0 when you know that the value does not exist and NULL when you don’t know what the value is Missing Information

  4. Problems with NULLs • Problems with extending relational algebra operations to NULLs: • Defining selection operation: if we check tuples for some property like “Mark > 40” and one of our tuples is {Mark, NULL}, do we include it? • Defining intersection or difference of two relations: are two tuples {John,NULL} and {John,NULL} the same or not? • Additional problems for SQL: do we treat NULLs as duplicates? Do we include them in count, sum, average and if so, how? How do arithmetic operations behave when an argument is NULL? Missing Information

  5. NULLS and Arithmetic Expressions Fruit ID 1 2 3 4 Name kiwi NULL ' ' lime Price NULL 34 10 0 Fruit • Arithmetic expressions with NULL in them evaluate to NULL e.g. ID 1 2 3 4 Name kiwi NULL ' ' lime Price NULL 39 15 5 UPDATE Fruit SET price = price + 5; Fruit ID 1 2 3 4 Name a kiwi NULL a a lime Price NULL 34 10 0 UPDATE Fruit SET name = CONCAT ('a ', name); Missing Information

  6. NULLS and Comparing Expressions • 1 > NULL can neither be evaluated to true nor to false. In fact it just cannot be evaluated. • If we allow NULLs in the database we need to have a way to handle comparison of expressions. • An expression which involves NULL theoretically evaluates to unknown. • In MySQL that is usually represented by NULL Exceptions and specifics: • True OR NULL evaluates to True • NULL is not equivalent to NULL, so NULL = NULL will evaluate to NULL and not to True. • WHERE clause like Price >= 120 OR Price < 120 will not display rows which include NULL for the price Missing Information

  7. Example Fruit Fruit ID 1 2 3 4 ID 3 4 Name kiwi NULL apple lime Name apple lime Price NULL 34 0 0 Price 0 0 SELECT * FROM Fruit WHERE price <20; SELECT * FROM Fruit WHERE price = NULL; The empty set. Missing Information

  8. ISNULL, ISNOTNULL and ISNULL() • As NULL could take any value, one NULL is not necessarily equal to another NULL. • To deal with the problem of extracting rows that contain the NULL marker, MySQL has introduced number of keywords and functions. • IS NOT NULL - returns all values that are not NULL • IS NULL can be used to return all rows which contain the NULL marker for the tested column • ISNULL(<expr>) – returns true if the <expr> evaluates to NULL, false otherwise. Missing Information

  9. Example Fruit Fruit Fruit Fruit ID 1 ID 1 ID 2 3 4 ID 1 2 3 4 Name kiwi Name kiwi NULL apple lime Name NULL apple lime Name kiwi Price 34 0 0 Price NULL 34 0 0 Price NULL Price NULL SELECT * FROM Fruit WHERE price IS NOT NULL; SELECT * FROM Fruit WHERE price IS NULL; SELECT * FROM Fruit WHERE ISNULL(Price) = TRUE; Missing Information

  10. NULLs and aggregate functions • Aggregate functions remove NULLs before performing the required operations • COUNT(*) is the only one which takes rows containing NULLs into account • Note that 0 unlike NULL will be taken in account in the calculations Missing Information

  11. Example Fruit Fruit ID 1 2 3 4 Name kiwi kiwi lime lime COUNT(*) 4 COUNT(Price) 3 Name kiwi lime Price NULL 34 0 34 AVG (Price) 34 17 SELECT COUNT(*) FROM Fruit; SELECT COUNT(Price) FROM Fruit; SELECT Name, AVG(price) FROM Fruit GROUP BY Name; Missing Information

  12. NULL with GROUP BY, ORDER BY AND DISTINCT Fruit Fruit Fruit Fruit ID 4 2 1 3 ID 1 2 3 4 Name kiwi NULL kiwi NULL Name kiwi Name NULL NULL kiwi kiwi Name NULL kiwi Price NULL 34 100 0 Price 17 100 Price 34 0 NULL 100 • GROUP BY, ORDER BY and DISTINCT all treat NULLs as equivalent SELECT Name, AVG(Price) FROM Fruit GROUP BY Name; SELECT ID, Name, Price FROM Fruit ORDER BY Name, Price; SELECT DISTINCT Name FROM Fruit; Missing Information

  13. Outer Joins • When we take the join of two relations we match up tuples which share values • Some tuples have no match, and are ‘lost’ • These are called ‘dangles’ • Outer joins include dangles in the result and use NULLs to fill in the blanks • Left outer join • Right outer join • Full outer join – not available in MySQL Missing Information

  14. Outer Join Syntax in MySQL SELECT <cols> FROM <table1> <type> OUTER JOIN <table2> ON <condition> Where <type>is one of LEFTor RIGHT • You can then use any of the normal SELECT attributes, e.g. a WHERE clause, GROUP BY, etc. • You could also have a natural outer, the syntax is: SELECT <cols> FROM <table1> NATURAL <type> OUTER JOIN <table2> Missing Information

  15. Example Fruit Buyer ID 1 NULL ID 1 2 ID 1 2 ID 1 3 Kingdom Fairlyland Trolland Name kiwi lime Name kiwi NULL Name kiwi lime Price 10 NULL Price 10 20 Price 10 20 SELECT * FROM Fruit LEFT OUTER JOIN Buyers ON Fruit.ID = Buyers.ID; Kingdom Fairlyland NULL Kingdom Fairlyland Trolland ID 1 3 ID 1 NULL SELECT * FROM Fruit RIGHT OUTER JOIN Buyers ON Fruit.ID = Buyers.ID; Missing Information

  16. Default Values • Default values are an alternative to the use of NULLs • If a value is not known a particular placeholder value - the default - is used • These are actual values, so no special rules have to be introduced for them. • Problem: since defaults are real values • They can be updated like any other value • You need to use a value that won’t appear in any other circumstances • They might not be interpreted properly Missing Information

  17. DEFAULTS • A default value can be assigned for each column • You can use a meaningful value like -1 for price, but be careful with UPDATE • The default value in MySQL has to be an exact value and not a function or an expression • Exceptions: • CURRENT_TIMESTAMP for TIMESTAMP • AUTO_INCREMENT for numerical data types - the DEFAULT is the next value in the sequence • In MySQL the TEXT and BLOB data types cannot be assigned a DEFAULT value • If a column does not have a specific DEFAULT value, MySQL will assume DEFAULT NULL • If the column does not have a DEFAULT value and it has been declared NOT NULL, the DEFAULT will be automatically set to 0 for numerical types and ‘ ‘ for character types Missing Information

  18. Splitting Tables • NULLs and defaults both try to fill entries with missing data • NULLs mark the data as missing • Defaults give some indication as to what sort of missing information we are dealing with • Often you can remove entries that have missing data • You can split the table up so that columns which might have NULLs are in separate tables • Entries that would be NULL are not present in these tables Missing Information

  19. Splitting Tables Example Fruit ID Name Fairyland Trolland 1 kiwi 10 20 2 lime 15 NULL 3 orange 3 100 grape NULL 30 5 cherry NULLNULL 6 Fig 150 0 Fruit ID Name Fairyland 1 kiwi 10 2 lime 15 3 orange 3 6 Fig 150 Fruit ID Name Trolland 1 kiwi 20 3 orange 100 grape 30 6 Fig 0 Missing Information

  20. Problems with Splitting Tables • Splitting tables has its own problems • We might introduce many extra tables • Information gets spread out over the database • Queries become more complex and require many joins • We can recover the original table, but • We need to do an outer join to do so • This introduces NULLs, which brings in all the associated problems again • In MySQL full outer joins are not possible, which means that we will never be able to recover the original table exactly. Missing Information

  21. Which Method to Use? • Often a matter of personal choice, but • Default values should not be used when they might be confused with ‘real’ values • Splitting tables shouldn’t be used too much or you’ll have lots of tables • NULLs can (and often are) used where the other approaches seem inappropriate • You don’t have to always use the same method - you can mix and match as needed Missing Information

  22. Exercise Determine the outcome from executing each of the following statements on the Holidays table below: • SELECT Destination FROM Holiday WHERE Destination LIKE 'C%'; • SELECT Destination FROM Holiday WHERE Destination LIKE 'N%'; • SELECT * FROM Holiday WHERE Price <> 500; • SELECT * FROM Holiday WHERE ISNULL(Price) = TRUE; • SELECT COUNT(*) FROM Holiday WHERE ISNULL(Price) = TRUE; Missing Information

  23. Exercise Determine the outcome from executing each of the following statements on the Holidays table below: • SELECT Destination, COUNT(*) FROM Holiday WHERE ISNULL(Price) = TRUE GROUP BY Destination; • SELECT COUNT(*), AVG(Price) FROM Holiday; • UPDATE Holiday SET Price = Price*0.80; • SELECT * FROM Holiday; • UPDATE Holiday SET Price = Price*0.80; • SELECT * FROM Holiday; Missing Information

  24. Reading Material • A First Course in Database Systems by Ullman and Widom – chapters 6.1.5, 6.1.6 and 6.3.8 • Any other database book – chapters on missing values SQL Data Definition

More Related