html5-img
1 / 25

CS 543 Lecture 1: Review of DBMS

CS 543 Lecture 1: Review of DBMS. January 23, 2012. Basic Definitions. Database: A collection of related data. Data: Known facts that can be recorded and have an implicit meaning. Mini-world:

cedric
Download Presentation

CS 543 Lecture 1: Review of DBMS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 543Lecture 1: Review of DBMS January 23, 2012

  2. Basic Definitions • Database: • A collection of related data. • Data: • Known facts that can be recorded and have an implicit meaning. • Mini-world: • Some part of the real world about which data is stored in a database. For example, student registration, grades and transcripts at a university. • Database Management System (DBMS): • A software package/system to facilitate the creation and maintenance of a computerized database. • Database System: • The DBMS software together with the data itself. Sometimes, the application programs and interfaces are also included.

  3. Data Models • Data Model: • A set of concepts to describe the structure of a database, the operations for manipulating the data, and the constraints that the data should follow. • Data Model Structure and Constraints: • Data Model constructs define the database structure • Data model constructs often include: data elements and their data types (often called attributes); grouping of related elements into entities (also called objects or records or tuples); and relationships among entities • Constraints specify restrictions on the stored data; the data that satisfies the constraints is called valid data

  4. Relational Model Concepts • A Relation is a mathematical concept based on the ideas of sets • The model was first proposed by Dr. E.F. Codd of IBM Research in 1970 in the following paper: • "A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970 • The above paper caused a major revolution in the field of database management and earned Dr. Codd the coveted ACM Turing Award

  5. Informal Definitions • Informally, a relation looks like a table of values. • A relation typically contains a set of rows. • The data elements in each row represent certain facts that correspond to a real-world entity or relationship • In the formal model, rows are called tuples • Each column has a column header that gives an indication of the meaning of the data items in that column • In the formal model, the column header is called an attribute name (or just attribute)

  6. Informal Definitions • Key of a Relation: • Each row has a value of a data item (or set of items) that uniquely identifies that row in the table • Called the key • In the STUDENT table, SSN is the key • Sometimes row-ids or sequential numbers are assigned as keys to identify the rows in a table • Called artificial key or surrogate key

  7. Formal Definitions - Summary • Formally, • Given R(A1, A2, .........., An) • r(R) dom (A1) X dom (A2) X ....X dom(An) • R(A1, A2, …, An) is the schema of the relation • R is the name of the relation • A1, A2, …, An are the attributes of the relation • r(R): a specific state (or "value" or “population”) of relation R – this is a set of tuples (rows) • r(R) = {t1, t2, …, tn} where each ti is an n-tuple • ti = <v1, v2, …, vn> where each vjelement-ofdom(Aj)

  8. Definition Summary

  9. Relational Integrity Constraints • Constraints are conditions that must hold on all valid relation states. • There are three main types of constraints in the relational model: • Key constraints • Entityintegrity constraints • Referential integrity constraints • Another implicit constraint is the domain constraint • Every value in a tuple must be from the domain of its attribute (or it could be null, if allowed for that attribute)

  10. Overview of SQL • SQL is a standard, comprehensive language, based on the relational model • SQL includes capabilities for many functions: • DDL statements for creating schemas and specifying data types and constraints • Statements for specifying database retrievals • Statements for modifying the database • Statements for specifying views, triggers, and assertions (see Chapter 5) • Many other capabilities

  11. Summary of SQL Queries • A query in SQL can consist of up to six clauses, but only the first two, SELECT and FROM, are mandatory. The clauses are specified in the following order:SELECT <attribute list>FROM <table list>[WHERE <condition>][GROUPBY <grouping attribute(s)>][HAVING <group condition>][ORDER BY <attribute list>] ;

  12. ER Model Concepts • Entities and Attributes • Entities are specific objects or things in the mini-world that are represented in the database. • For example the EMPLOYEE John Smith, the Research DEPARTMENT, the ProductX PROJECT • Attributes are properties used to describe an entity. • For example an EMPLOYEE entity may have the attributes Name, SSN, Address, Sex, BirthDate • A specific entity will have a value for each of its attributes. • For example a specific employee entity may have Name='John Smith', SSN='123456789', Address ='731, Fondren, Houston, TX', Sex='M', BirthDate='09-JAN-55‘ • Each attribute has a value set (or data type) associated with it – e.g. integer, string, subrange, enumerated type, …

  13. Relationships and Relationship Types • A relationship relates two or more distinct entities with a specific meaning. • For example, EMPLOYEE John Smith works on the ProductX PROJECT, or EMPLOYEE Franklin Wong manages the Research DEPARTMENT. • Relationships of the same type are grouped or typed into a relationship type. • For example, the WORKS_ON relationship type in which EMPLOYEEs and PROJECTs participate, or the MANAGES relationship type in which EMPLOYEEs and DEPARTMENTs participate. • The degree of a relationship type is the number of participating entity types. • Both MANAGES and WORKS_ON are binary relationships.

  14. Constraints on Relationships • Constraints on Relationship Types • (Also known as ratio constraints) • Cardinality Ratio (specifies maximum participation) • One-to-one (1:1) • One-to-many (1:N) or Many-to-one (N:1) • Many-to-many (M:N) • Existence Dependency Constraint (specifies minimum participation) (also called participation constraint) • zero (optional participation, not existence-dependent, partial) • one or more (mandatory participation, existence-dependent, total)

  15. Structural Constraints • 2 options: • 1. Combine notation • (Min, Max) • 2. Separate notation • One for cardinality ratio • One for participation constraints

  16. Relational Database Design by ER- and EER-to-Relational Mapping • ER-to-Relational Mapping Algorithm • Step 1: Mapping of Regular Entity Types • Step 2: Mapping of Weak Entity Types • Step 3: Mapping of Binary 1:1 Relation Types • Step 4: Mapping of Binary 1:N Relationship Types. • Step 5: Mapping of Binary M:N Relationship Types. • Step 6: Mapping of Multivalued attributes. • Step 7: Mapping of N-ary Relationship Types. • Mapping EER Model Constructs to Relations • Step 8: Options for Mapping Specialization or Generalization. • Step 9: Mapping of Union Types (Categories).

  17. Functional Dependencies • Functional dependencies (FDs) • Are used to specify formal measures of the "goodness" of relational designs • And keys are used to define normal forms for relations • Are constraints that are derived from the meaning and interrelationships of the data attributes • A set of attributes X functionallydetermines a set of attributes Y if the value of X determines a unique value for Y

  18. Inference Rules for FDs (1) • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold • Armstrong's inference rules: • IR1. (Reflexive) If Y subset-of X, then X -> Y • IR2. (Augmentation) If X -> Y, then XZ -> YZ • (Notation: XZ stands for X U Z) • IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z • IR1, IR2, IR3 form a sound and complete set of inference rules • These are rules hold and all other rules that hold can be deduced from these

  19. Inference Rules for FDs (2) • Some additional inference rules that are useful: • IR4. Decomposition: If X -> YZ, then X -> Y and X -> Z • IR5. Union: If X -> Y and X -> Z, then X -> YZ • IR6.Psuedotransitivity: If X -> Y and WY -> Z, then WX -> Z • The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)

  20. Closure • Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F • Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X • X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

  21. Practical Use of Normal Forms • Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties • The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect • The database designers need not normalize to the highest possible normal form • (usually up to 3NF, BCNF or 4NF) • Denormalization: • The process of storing the join of higher normal form relations as a base relation—which is in a lower normal form

  22. First Normal Form • Disallows • composite attributes • multivalued attributes • nested relations; attributes whose values for an individual tuple are non-atomic • Considered to be part of the definition of relation

  23. Second Normal Form • A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key • R can be decomposed into 2NF relations via the process of 2NF normalization

  24. Third Normal Form • A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key • R can be decomposed into 3NF relations via the process of 3NF normalization • NOTE: • In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. • When Y is a candidate key, there is no problem with the transitive dependency . • E.g., Consider EMP (SSN, Emp#, Salary ). • Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

More Related