1 / 16

Query Processing for SQL Updates

Query Processing for SQL Updates. Galindo-Legaria, Stefani and Waas (Microsoft Corp) Talk for CS 632 by S. Sudarshan. Issues in Efficient Processing of Updates. Checking constraints primary/unique key check constraints referential integrity, esp. with cascading actions

jraymer
Download Presentation

Query Processing for SQL Updates

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Query Processing for SQL Updates Galindo-Legaria, Stefani and Waas (Microsoft Corp) Talk for CS 632 by S. Sudarshan

  2. Issues in Efficient Processing of Updates • Checking constraints • primary/unique key • check constraints • referential integrity, esp. with cascading actions • Updating primary and secondary indices • Bulk updates vs. individual updates • Not in this paper • Write optimized index structures • Materialized view maintenance

  3. Modeling Updates • Delta stream • StreamUpdate operator • issues updates to the storage engine • multiple of these can be cascaded or run serially for a single update

  4. General form of update Plan • Read-query provides • for insert/update: values to be inserted/new values for update • for delete/update: locators of tuples to be deleted updated

  5. Index Maintenance • Primary access path • B-tree or heap • locator: uniquely identify row, and allow efficient retrieval • For B-tree: index attributes + uniquifier • For heap: physical locator: RID • Secondary indices (non-clustered) • Store locator of record corresp to primary access path • Optionally: included columns at leaf level • (a.k.a. covering indices) • Also: • computed columns • unique declaration

  6. Index Maintenance • Primary access path updated first • locator needed for secondary indices • old values needed for secondary index update, obtained from primary access path • copied out into delta table (pre-copy) • Per-row vs Per-Index update plans • Per-row: update all indices corresp. to a row, before processing next row • Per-index: use all rows to update an index, then move to next index • Allows sorting of delta table, to reduce access costs • but requires spooling of delta table

  7. Per-Index Maintenance • A single update affects two areas of an index • messes up access order even if delta table is sorted • idea: create two records, one for delete action, one for insert action • called split delta • sort split delta stream by (locator, action) • delete sorts first to avoid spurious uniqueness violations

  8. Split Delta

  9. Choosing an Update Plan • Per-index plan can be more efficient for a large batch • Sorted order, piggy back other steps such as referential integrity checks • But may be less efficient in other cases due to • overhead of more operators for per-index plan • spooling of delta • Wide vs stacked plans • i.e. separate deltas for each secondary index vs. shared delta pipelined through many operators • wide seems better since it can prune unnecessary columns • Cache effects: when you have a choice of index for selection, may be better to use one that is going to be updated

  10. Checking Single Table Constraints • Only those that may be affected by update are checked • exception/rollback on failure

  11. Referential Integrity Constraints • Updates/deletes on referenced side • Inserts/updates on referencing side • Use outerjoin of delta rows with other table • ensure match/lack of match depending on updated side being referencing/referenced • Checks performed after updating the table • But for updates to referenced table, old values must be used • May require index on attributes of referencing table • index on referenced table already present normally

  12. Checking Referential Integrity

  13. Referential Integrity (Cont.) • When checking self-referential integrity constraints, perform all updates done by a DML statement, only then check • Why? • If there is an index on f.k. attrs, integrity check can be combined with index maintenance, since both need same sort order • place ref integrity check operator just above index maintenance • Another optimization: eliminate duplicates

  14. Cascading Updates • E.g. S has f.k. referencing R, on update cascade • Update to p.k. of tuple in R cascades to S • Similar to f.k. checking, but instead of validating, update referencing relation(s) • to do so, create delta table, and use update plan to process delta table

  15. Cascading Updates

  16. Issues not addressed by paper • All optimizations are for a single transaction • What if there is a sequence of small transactions? • Insert/update/delete to large B-tree may require 1 I/O per update to leaf page • even assuming internal pages are in memory • Solution: write-optimized B-trees • collect deltas and apply as a batch • but ensure queries are correctly answered meanwhile, even if more slowly • Several schemes including work at IITB in 1997, and a nice survey paper by Graefe with some neat implementation ideas

More Related