Final Review

Final Review

SQL • Will be in the exam. • So, refresh midterm review. • Constraints as well.

Security and Authorization

GRANT Command GRANT privileges ON object TO users [WITH GRANT OPTION] • The following privileges can be specified: • SELECT Can read all columns • including those added later via ALTER TABLE command • INSERT(column-name) Can insert tuples with non-null or nondefault values in this column. • INSERT means same right with respect to all columns. • DELETE Can delete tuples. • REFERENCES(column-name) Can define foreign keys (in other tables) that refer to this column.

Grant Examples I • Joe has created the tables • Sailors(sid, sname, rating, age) • Boats(bid, bname, color) • Reserves(sid, bid, day) • Joe now executes the following: GRANT INSERT, DELETE ON Reserves TO Yuppy WITH GRANT OPTION; • Yuppy can now insert or delete Reserves rows and authorize someone else to do the same.

Grant Examples II • Joe further executes: GRANT SELECT ON Reserves TO Michael; GRANT SELECT ON Sailors TO Michael WITH GRANT OPTION; • Michael can now execute SELECT queries on Sailors and Reserves, and he can pass this privilege to others for Sailors but not for Reserves. • With the SELECT privilege, Michael can create a view that accesses the Sailors and Reserves tables, for example, the ActiveSailors view: CREATE VIEW ActiveSailors (name, age, day) AS SELECT S.sname, S.age, R.day FROM Sailors S, Reserves R WHERE S.sid = R.sid AND S.rating > 6; • However, Michael cannot grant SELECT on ActiveSailors to others. Why?

Grant Examples III • On the other hand, suppose that Michael creates the following view: CREATE VIEW YoungSailors (sid, age, rating)AS SELECT S.sid, S.age, S.rating FROM Sailors S WHERE S.age < 18; • The only underlying table is Sailors, for which Michael has SELECT with grant option. Therefore he can pass this on to Eric and Guppy: GRANT SELECT ON YoungSailors TO Eric, Guppy; • Eric and Guppy can now execute SELECT queries on the view YoungSailors. • Note, however, that Eric and Guppy don’t have the right to execute SELECT queries directly on the underlying Sailor table.

Grant Examples IV • Suppose now Joe executes: GRANT UPDATE (rating) ON Sailors TO Leah; • Leah can update only the rating column of Sailors.E.g. UPDATE Sailors S SET S.rating = 8; • However, she cannot execute: UPDATE Sailors S SET S.age = 25; • She cannot execute either: UPDATE Sailors S SET S.rating = S.rating-l; • Why?

Grant Examples V • Suppose now Joe executes: GRANT SELECT, REFERENCES(bid) ON Boats TO Bill; • Bill can refer to the bid column of Boats as a foreign key in another table. E.g. CREATE TABLE Reserves ( sid INTEGER, bid INTEGER, day DATE, PRIMARY KEY (bid, day), FOREIGN KEY (sid) REFERENCES Sailors, FOREIGN KEY (bid) REFERENCES Boats );

Revoke Examples I REVOKE [GRANT OPTION FOR] privileges ON object FROM users {RESTRICT | CASCADE} • Suppose Joe is the creator of Sailors. GRANT SELECT ON Sailors TO Art WITH GRANT OPTION (executed by Joe) GRANT SELECT ON Sailors TO Bob WITH GRANT OPTION (executed by Art) REVOKE SELECT ON Sailors FROM Art CASCADE (executed by Joe)

Revoke Examples II • Art loses the SELECT privilege on Sailors. • Then Bob, who received this privilege from Art, and only Art, also loses this privilege. • Bob’s privilege is said to be abandoned • When CASCADE is specified, all abandoned privileges are also revoked • Possibly causing privileges held by other users to become abandoned and thereby revoked recursively. • If the RESTRICT keyword is specified, the command is rejected if revoking privileges causes other privileges becoming abandoned.

Grant and Revoke on Views • Suppose that Joe created Sailors and gave Michael the SELECT privilege on it with the grant option. • Michael then created the view YoungSailors and gave Eric the SELECT privilege on YoungSailors. • Eric now defines a view called FineYoungSailors: CREATE VIEW FineYoungSailors (name, age, rating) AS SELECT S.sname, S.age, S.rating FROM YoungSailors S WHERE S.rating> 6 • What happens if Joe revokes the SELECT privilege on Sailors from Michael? • Michael no longer has the authority to execute the query used to define YoungSailors because the definition refers to Sailors. • Therefore, the view YoungSailors is dropped (I.e., destroyed). • In turn, FineYoungSailors is dropped as well.

Revoking REFERENCES privilege • Suppose Joe had executed: GRANT REFERENCES(bid) ON Boats TO Bill; • Bill can refer to the bid column of Boats as a foreign key in another table. E.g. CREATE TABLE Reserves ( sid INTEGER, bid INTEGER, day DATE, PRIMARY KEY (bid, day), FOREIGN KEY (sid) REFERENCEs Sailors, FOREIGN KEY (bid) REFERENCES Boats ); • If Joe revokes the REFERENCES privilege from Bill, then the Foreign Key constraint referencing the Boat table will be dropped from the Bill’s Reserves table.

Storage

The Memory Hierarchy Desired data carried to read/write port, access times in seconds. Most common: racks of tapes; newer devices: CDROM “juke boxes,” tape “silo's.” Capacities in terabytes. • Typically magnetic disks, magnetooptical • (erasable), CDROM. • Access times in milliseconds, great • variability. • Unit of read/write = block or page, • typically 16Kb. • Capacities in gigabytes. under a microsecond, random access, perhaps 512Mb fastest, perhaps 1Mb

Disks To motivate many of the ideas used in DBMS’es, we must examine the operation of disks in detail. • Platters with top and bottom surfaces rotate • around a spindle. • Diameters 1 inch to 4 feet. • 2--30 surfaces. • Rotation speed: 3600--7200 rpm. • One head per surface. • All heads move in and out in unison.

Tracks and sectors • Surfaces are covered with concentric tracks. • Tracks at a common radius = cylinder. • Important because all data of a cylinder can be read quickly, without moving the heads. • Typical magnetic disk: 16,000 cylinders • Tracks are divided into sectors by unmagnetized gaps (which are 10% of track). • Typical track: 512 sectors. • Typical sector: 4096 bytes. • Sectors are grouped into blocks. • Typical: one 16K block = 4 4096byte sectors.

I/O model of computation • Disk I/O = read or write of a block is very expensive compared with what is likely to be done with the block once it arrives in main memory. • Perhaps 1,000,000 machine instructions in the time to do one random disk I/O. • Random block accesses is the norm if there are several processes accessing disks, and the disk controller does not schedule accesses carefully. • Reasonable model of computation that requires secondary storage: count only the disk I/O's.

TwoPhase, Multiway Merge Sort Merge Sort still not very good in disk I/O model. • log2n passes, so each record is read/written from disk log2n times. • The secondary memory algorithms operate in a small number of passes; • in one pass every record is read into main memory once and written out to disk once. • 2PMMS: 2 reads + 2 writes per block. • Phase 1 1. Fill main memory with records. 2. Sort using favorite mainmemory sort. 3. Write sorted sublist to disk. 4. Repeat until all records have been put into one of the sorted lists.

Phase 2 • Use one buffer for each of the sorted sublists and one buffer for an output block. • Initially load input buffers with the first blocks of their respective sorted lists. • Repeatedly run a competition among the first unchosen records of each of the buffered blocks. • Move the record with the least key to the output block; it is now “chosen.” • Manage the buffers as needed: • If an input block is exhausted, get the next block from the same file. • If the output block is full, write it to disk.

Reasons to limit the block size • First, we cannot use blocks that cover several tracks effectively. • Second, small relations would occupy only a fraction of a block, so large blocks would waste space on the disk. • The larger the blocks are, the fewer records we can sort by 2PMMS (see next slide). • Nevertheless, as machines get faster and disks more capacious, there is a tendency for block sizes to grow.

How many records can we sort? • The block size is B bytes. • The main memory available for buffering blocks is M bytes. • Records take R bytes. • Number of main memory buffers = M/B blocks • We need one output buffer, so we can actually use (M/B)-1 input buffers. • How many sorted sublists makes sense to produce? • (M/B)-1. • What’s the total number of records we can sort? • Each time we fill in the memory we sort M/R records. • Hence, we are able to sort (M/R)*[(M/B)-1] or approximately M2/RB. If we use the parameters in the example about TPMMS we have: M=100MB = 100,000,000 Bytes = 108 Bytes B = 16,384 Bytes R = 160 Bytes So, M2/RB = (108)2 / (160 * 16,384) = 4.2 billion records, or 2/3 of a TeraByte.

Sorting larger relations • If our relation is bigger, then, we can use 2PMMS to create sorted sublists of M2/RB records. • Then, in a third pass we can merge (M/B)-1 of these sorted sublists. • The third phase let’s us sort • [(M/B)-1]*[M2/RB]  M3/RB2records • For our example, the third phase let’s us sort 75 trillion records occupying 7500 Petabytes!!

Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). • Can make sense because records may be much bigger than keypointer pairs. • Fit index in memory, even if data file does not? • Faster search through index than data file? • Test existence of record without going to data file. Sparse Indexes Keypointer pairs for only a subset of records, typically first in each block. • Saves index space.

Dense Index

Secondary Indexes • A primary index is an index on a sorted file. • Such an index “controls” the placement of records to be “primary,” • Secondary index = index that does not control placement, surely not on a file sorted by its search key. • Sparse, secondary index makes no sense. • Usually, search key is not a “key.”

Indirect Buckets • To avoid repeating keys in index, use a level of indirection, called buckets. • Additional advantage: allows intersection of sets of records without looking at records themselves. • Example Movies(title, year, length, studioName); secondary indexes on studioName and year. SELECT title FROM Movies WHERE studioName = 'Disney' AND year = 1995;

BTrees Generalizes multilevel index. • Number of levels varies with size of data file, but is often 3. • B+ tree = form we'll discuss. • All nodes have same format: n keys, n + 1 pointers. • Useful for primary, secondary indexes, primary keys, nonkeys. • Leaf has at least key-pointer pairs • Interior nodes use at least pointers.

Leaf 57 81 95 To next leaf in sequence To record with key 57 To record with key 95 To record with key 81 Interior Node 57, 81, and 95 are the least keys we can reach by via the corresponding pointers. 57 81 95 To keys K<57 To keys K95 To keys 81K<95 To keys 57K<81 A typical leaf and interior node (unclusttered index)

Lookup Try to find a record with search key 40. • Recursive procedure: • If we are at a leaf, look among the keys there. If the i-th key is K, the the i-th pointer will take us to the desired record. • If we are at an internal node with keys K1,K2,…,Kn, then if K<K1we follow the first pointer, if K1K<K2 we follow the second pointer, and so on.

Insertion Try to insert a search key = 40. First, lookup for it, in order to find where to insert. It has to go here, but the node is full!

Beginning of the insertion of key 40 What’s the problem? No parent yet for the new node! Observe the new node and the redistribution of keys and pointers

Continuing of the Insertion of key 40 We must now insert a pointer to the new leaf into this node. We must also associate with this pointer the key 40, which is the least key reachable through the new leaf. But the node is full. Thus it too must split!

Completing of the Insertion of key 40 This is a new node. • We have to redistribute 3 keys and 4 pointers. • We leave three pointers in the existing node and give two pointers to the new node. 43 goes in the new node. • But where the key 40 goes? • 40 is the least key reachable via the new node.

Completing of the Insertion of key 40 It goes here! 40 is the least key reachable via the new node.

Structure of B-trees • Degree n means that all nodes have space for n search keys and n+1 pointers • Node = block • Let • block size be 4096 Bytes, • key 4 Bytes, • pointer 8 Bytes. • Let’s solve for n: 4n + 8(n+1)  4096  n  340 n = degree = order = fanout

Example • n = 340, however a typical node has 255 keys • At level 3 we have: 2552 nodes, which means 2553 16  220 records can be indexed. • Suppose record = 1024 Bytes  we can index a file of size 16  220 210 16 GB • If the root is kept in main memory accessing a record requires 3 disk I/O

Transactions

Transactions Correctness Principle • A transaction is atomic --all or none property. If it executes partly, an invalid state is likely to result. • A transaction, may change the DB from a consistent state to another consistent state. Otherwise it is rejected (aborted). • Concurrent execution of transactions may lead to inconsistency – each transaction must appear to be executed in isolation • The effect of a committed transaction is durable i.e. the effect on DB of a transaction must never be lost, once the transaction has completed. • ACID: Properties of a transaction: Atomicity, Consistency, Isolation, and Durability

Concurrent Transactions • Even when there is no “failure,”several transactions can interact to turn a consistent state into an inconsistent state.

Transactions and Schedules • A transaction (model) is a sequence of r and w actions on database elements. • A schedule is a sequence of reads/writes actions performed by a collection of transactions. • Serial Schedule = All actions for each transaction are consecutive. r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B); … • Serializable Schedule: A schedule whose “effect” is equivalent to that of some serial schedule. • We will introduce a sufficient condition for serializability.

Conflicts • Suppose for DB elements X and Y, ri(X); rj(Y) is part of a schedule, and we flip the order of these operations. • ri(X); rj(Y) ≡ rj(Y); ri(X) • This holds always (even when X=Y) • We can flip ri(X); wj(Y), as long as X≠Y • That is, ri(X); wj (X) wj(X); ri (X) • In the RHS, Ti reads the value of X written by Tj, whereas it is not so in the LHS.

Conflicts (Cont’d) • We can flip wi(X); wj(Y); provided X≠Y • However, wi(X); wj(X) ≢ wj(X); wi(X); • The final value of X may be different depending on which write occurs last. • There is a conflict if 2 conditions hold. • A read and a write of the same X, or • Two writes of X conflict in general and may not be swapped in order. All other events (reads/writes) may be swapped without changing the effect of the schedule (on the DB).

Schedulers • A scheduler takes requests from transactions for reads and writes, and decides if it is “OK” to allow them to operate on DB or defer them until it is safe to do so. • Ideal: a scheduler forwards a request iffit cannot lead to inconsistency of DB • Too hard to decide this in real time. • Real: it forwards a request if it cannot result in a violation of conflictserializability. • We thus need to develop schedulers which ensure conflict-serializability.

Lock Actions • Before reading or writing an element X, a transaction Ti requests a lock on X from the scheduler. • The scheduler can either grant the lock to Ti or make Ti wait for the lock. • If granted, Ti should eventually unlock (release) the lock on X. • Shorthands: • li(X) = “transaction Ti requests a lock on X” • ui(X) = “Ti unlocks/releases the lock on X”

Two Phase Locking There is a simple condition, which guarantees confict-serializability: In every transaction, all lock requests (phase 1) precede all unlock requests (phase 2).

Undo Logging (Cont’d) Two rules of Undo Logging: • U1: Log records for a DB element X must be on disk before any database modification to X appears on disk. • U2: If a transaction T commits, then the log record <COMMIT T> must be written to disk only after all database elements changed by T are written to disk. • In order to force log records to disk, the log manager needs a FLUSH LOG command that tells the buffer manager to copy to disk any log blocks that haven’t previously been copied to disk or that have been changed since they were last copied.

Example: Action t Buff A Buff B A in HD B in HD Log Read(A,t) 8 8 8 8 <Start T> t:=t*2 16 8 8 8 Write(A,t) 16 16 8 8 <T,A,8> Read(B,t) 8 16 8 8 8 t:=t*2 16 16 8 8 8 Write(B,t) 16 16 16 8 8 <T,B,8> Flush Log Output(A) 16 16 16 16 8 Output(B) 16 16 16 16 16 <Commit T> Flush Log

Recovery With Undo Logging 1. Examine the log to identify all transactions T such that <START T> appears in the log, but neither <COMMIT T> nor <ABORT T> does. • Call such transactions incomplete. 2. Examine each log entry <T, X, v> a) If T isn’t an incomplete transaction, do nothing. b) If T is incomplete, restore the old value of X In what order? From most recent to earliest. 3. For each incomplete transaction T add <ABORT T> to the log, and flush the log. • What about the transactions that had already <ABORT T> in the log? • We do nothing about them. If T aborted, then the effect on the DB should have been restored anyway.

Final Review

Final Review

Presentation Transcript

Final Review

Final Review

Final Review

Final Review

Final review

Final review

Final Review

Final Review

Final Review

Final Review

Final Review

Final review

Final Review

Final Review

Final Review

FINAL REVIEW

Final Review