1 / 13

CSE444: REVIEW

CSE444: REVIEW. CSE444 in one slide. Logical : E/R diagram  normalized relations Physical : files, buffering, and indexes Logical : Relational algebra and SQL Physical : join methods, optimization CC/R choices affect correctness, performance

spielman
Download Presentation

CSE444: REVIEW

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE444: REVIEW

  2. CSE444 in one slide • Logical: E/R diagramnormalized relations • Physical: files, buffering, and indexes • Logical: Relational algebra and SQL • Physical: join methods, optimization • CC/R choices affect correctness, performance • Object Databases extend, improve relations • … OODBMS, ORDBMS are converging? • … OQL, SQL3 are converging?

  3. The E/R model • Entities vs. relationships • Attributes for entities and relationships • Keys and weak entities (no foreign keys!!) • Cardinality constraints • Participation (key) constraints • Translation to relations: • Entity becomes relation • Relationship becomes relation iff it is many-many • Other relationships: key of “parent” goes with “child”

  4. FDs and Normal Forms • Determine Candidate Keys (CKs) • A field is prime iff it is in some CK • 1NF: All relations are in 1NF • 2NF: For each FD XA: • A is prime OR X is not a proper subset of any CK • 3NF: For each FD XA: • X is a superkey OR A is prime OR A X (trivial) • BCNF: For each FD XA: • X is a superkey OR A X (trivial) • Normalize by decomposing R (ABC) to R1(AB) and R2 (BC), where BC violates a NF

  5. Physical DB Design • File organizations: heapfile, sorted file, hashed file • Disks: speed and physical characteristics • Buffer pool can vary in size; frame holds page • Replacement policy choice is important • Indexes can speed up data access: • Clustered vs. unclustered • Primary vs. secondary • Dense vs. sparse • B+ Trees are the most common: log(N) searches

  6. Queries: Logical Aspects • Relational Algebra • , , , ,  • Joins kind of important too!!! • General join • Equijoin • Natural join • Transformations for optimization • SQL • Declarative • Relational calculus + algebra + grouping, etc. • Computationally incomplete • Query execution: relational algebra operations

  7. Queries: Physical Aspects • Implementation of relational algebra • File scan vs. index lookup vs. binary search • Exact-match vs. range queries • Impact of clustered vs. non-clustered indices • Join methods: • Block-Oriented Nested Loops (good idea, if buffers avail.) • Index Nested Loops (good idea, if index avail.) • Sort-Merge (good idea, if sorted or needs to be sorted) • Query optimization: cost estimates hard!!! • Large space of physical, logical alternatives • Prune space by considering only left-deep plans • Enables pipelined execution

  8. Concurrency Control and Recovery • ACID transactions enforce CC&R • Strict 2PL “guarantees” serializability (C, I) • Deadlocks possible (detect and abort a XACT) • Phantom tuples possible (fix with index locking) • ARIES guarantees XACTS are A, D • Write-ahead logging • Log UNDO actions to allow complete replaying of history • Recovery phases: • Analyze (rebuild main memory structures) • REDO (rebuild data in buffer frames) • UNDO (abort uncommitted XACTS)

  9. Object Databases • ORDBMS (e.g. Oracle 8i) vs. OODBMS (e.g. O2) • Structural aspects • Collection-valued fields • OIDs • Freely composable type constructors • Behavioral aspects (methods) • Adds new dimensions to RDBMS problems • SQL3, OQL standards are emerging

  10. Current and Future DBMS Issues • New applications yield new techniques • New techniques yield new applications • Some “new” applications: • Data warehousing • On-line analytical processing (OLAP) • Data mining • Distributed data • Heterogeneous data and data integration • Scientific/sequential/ordered data • Partial or approximate query answers

  11. Current and Future Issues (cont.) • Active DBs: rule management (ICs and triggers) • Real-time DBMS • Web-based DBMS • XML and semi-structured data • Spatial and high-dimensional data (lots of columns) • Special-purpose DBMSs • Digital Libraries • Geographic Information Systems • etc…..

  12. Current and Future Issues (cont.) • Some “new” techniques: • New kinds of indices • Improved B Trees • Faster aggregation algorithms • New QP algorithms • Better optimization techniques • Data broadcasting • Generic data models • Faster sorting algorithms • New query languages • Deductive DBMSs

  13. Current and Future Issues (cont.) • Object databases • New algebras • Query cost estimation • New locking and commit protocols • Main-memory databases • CC/R techniques for non-relational settings • DBMS interfaces, visualization tools • DBMS development tools • etc…. • BOTTOM LINE: Lots of opportunities for jobs, research, development, and fun !!!

More Related