CMPT 354 Views and Indexes

CMPT 354 Views and Indexes Spring 2012 Instructor: Hassan Khosravi

Three level vision of a database

What are views • Relations that are defined with a create table statement exist in the physical layer and • do not change unless explicitly told so • Virtual views do not physically exist, they are defined by expression over the tables. • Can be queries (most of the time) as if they were tables. • Why use views? • Hide some data from users • Make some queries easier • Modularity of database • When not specified exactly based on tables.

Defining and using Views • Create View <view name> As <view definition> • View definition is defined in SQL • From now on we can use the view almost as if it is just a normal table • View V (R1,…Rn ) • query Q involving V • Conceptually • V (R1,…Rn ) is used to evaluate Q • In reality • The evaluation is performed over R1,…Rn

Example of views

Views • Conceptually • Reality • Query optimizer • Users do not need to worry about any of this

Views on Views

Views make life easy

Deleting views • Drop View <view name> • Dropping a view does not affect any tuples of the in the underlying relation. • Dropping a table would remove all of the tuples of the table and views that use that table. • What would happen if CSaccept is dropped? • Some DBMS would not allow it because it has dependencies • Some would delete it and the be unable to answer queries on CSberk

Modifying views • Can views be modified? • Doesn’t make sense to modify a view since it doesn’t physically exist • Should be possible since many users can only observe views and do not have access to tables, so they should be able to alter tables using views. • Solution Modifications to the view rewritten to modify base tables.

Example • R(A,B) which has (1, 2) • V is just the A attribute of V = ∏ A (R) • Insert (3) into A • What would be insert into R(A,B)? • (3,1)? • (3,2)? • (3,null)? • R(N) • V = avg(N) • Update

Example

Approaches to View Modification • Rewriting process specified explicitly by view creator (user) • + Can handle all modification • - No guarantee of correctness • - More work • Restrict views and modifications so that translation to base tables can be done automatically, correctly, and unambiguously • + No User intervention • - Restrictions are significant

Rewriting process specified explicitly by view creator • Instead of Triggers are mainly used • We can write a trigger to perform deletion on this view

This will produce an error without having a trigger for it • There is nothing stopping you from making a mistake in your trigger.

Restricting modifications in Views • Showing students that are either in CS or EE • Inserting new students

Automatic View Modification • Restrict views and modifications so that translation to base tables can be done automatically, correctly, and unambiguously • The view should be on one table and not join of tables • Attributes that are not in the view can be null or have default values • Sub-queries must not refer to T • You cannot have group by or aggregation • MYSQL supports automatic updates on views • Postgress and SQLite do not support automatic updates on views and triggers should be used

Deletion Example • The deletion will be done in the apply table and would show in the CSaccept view

Insertion Example • You should not be able to add to view tuples that do not belong to the view

Updatable Views -- Aggregation • What would this deletion mean? • The system disallows this deletion.

Updatable views -- Sub-queries • This is disallowed since it is vague • It would have surprising effects on the base table

Updatable views-- Joins • What would insertion or deletion to this table mean? • Updates the apply table but not the student table so the first row would not match any more.

Indexes in SQL • Primary mechanism to get improved performance on database • Data structure stored in the database • Implementation issues • Full scan vs. immediate tuple location

Implementation of Indexes • Underlying data structure • Balance Trees • A = v • A < v • A > v • Logarithm time • Hash tables • Only A = v • If designed properly, constant time

Downside of Indexes • Extra space • Local hard space • Over head in Index creation • Index maintenance

Calculating the best indexes to create • Typically a relation is stored over many disk blocks • Principle cost of a query or modification is often the number of disks that have to be brought into the main memory • Indexes store where tuples are stored, so we know which block to restore instead of full scan of the table • However indexes themselves have to be stored and modified • Modification in the actual relation needs one page access to modify the relation and one page access to modify the index • Assuming we have history of the previous queries and modifications how can we decide what to have indexes on?

Example • StarIn(movieTitle, movieYear, starName) • Three database operations are performed • (Q1)Look for title and year of movies in which a given star appeared (p1) • (Q2) Look for stars that appear in a given movie (p2) • (I) Insert a new tuple into the starIn table (1-p1-p2) • StarIn occupies 10 pages • On average a star has appeared in 3 movies • On average a movie has 3 stars • Movies of a star and stars of a movie are speared over all pages • One disk access is needed to read a page in the index • Insertion requires 1 access to find page to write to and 1 to actually write it

Automatic Selection of Indexes to create

Materialized views • Materialized views • Periodic Maintenance of Materialized views • Common for databases to serve two purposes • Record current inventory • Analysts to study patterns • Analyst’s queries involve many aggregation and joins • Not as sensitive to modifications • They are updated (reconstructed) periodically • Once a day or week

CMPT 354 Views and Indexes