Chapter 2 THE RELATIONAL MODEL OF DATA
Chapter 2The Relational Model of Data 2.1 An Overview of Data Models 2.2 Basics of Relational Model 2.3 Defining a Relation Schema in SQL 2.4 An Algebraic Query Language 2.5 Constraints on Relations 2.6 Summary of Chapter 2 2.7 References for Chapter 2
Section 2.1 An Overview of Data Models
2.1 An Overview of Data Models 2.1.1 What is Data Model? 2.1.2 Important Data Models 2.1.3 The Relational Data Model in Brief 2.1.4 The Semi-structured Model in Brief 2.1.5 Other Data Models 2.1.6 Comparison of Modeling Approaches
2.1.1 What is Data Model? • A data model is a “small set of notations/mathematics” (mathematical model – see Definition in Discrete Mathematics) for describing data. • The description generally consists of 3 parts: • Structure: it can be imagined as 'object' in Java or 'struct' in C but in database world, the structure of data is higher level than physical data model. That's why we refer to it as conceptual model. • Operations: a limited set of queries (retrieving data) and modifications (changing data) • Constraints: applying some limitations on what the data can be.
Structure • External view • E.g. CS • conceptual view • E.g Knowledge • Internal view • Congress libaray
2.1.2 Important Data Models • Currently, there are two important data models as follows: • The Relational Data Model (including Object-Relational Extension) which is present in all commercial DBMS's • The Semi-Structured Data Model (including XML) which is an added feature of most DBMS's
2.1.3 The Relational Data Model in Brief • The relational model is based on tables. • For instance, the following table shows 3 movies but you can imagine that there are many more rows.
2.1.3 The Relational Data Model in Brief (cont'd) • We are not going to talk about how to implement the structure of the tables here and it will be postponed to higher courses in database. • There are some operations that we can do on the tables. For example we can query the rows where the genre is 'comedy'. • As an example for the constraints, we may decide there could never be two movies with the same title and year in this table.
2.1.4 The Semi-structured Model in Brief • Semi-structured data resembles trees or graphs, rather than tables or arrays. • The principal manifestation of this viewpoint today is XML, a way to represent data by hierarchically nested tagged elements. TYLIN: IBM ARC abandom it • The tags, similar to those used in HTML. • You can imagine tags as the column headers do in the relational model. • You can see an example of XML in the next slide which is the same as movies data.
2.1.4 The Semi-structured Model in Brief (cont'd) <Movies> <Movie title=“Gone with the wind”> <Year>1939</Year> <Length>281</Length> <Genre>drama</Genre> </Movie> <Movie title=“Star Wars”> <Year>1977</Year> <Length>124</Length> <Genre>Scifi</Genre> </Movie> <Movie title=“Wayne’s World”> <Year>1992</Year> <Length>95</Length> <Genre>Comedy</Genre> </Movie> </Movies>
2.1.5 Other Data Models • A modern trend is to add object-oriented features to the relational model. • There are two effects of object-orientation on relations: • Values can have structure, rather than being elementary types such as integer or strings. • Relations can have associated methods. • These extensions are called object-relational model.
2.1.5 Other Data Models (cont'd) • In earlier DBMS's, there were several other models like hierarchical model or network model. • Hierarchical model was a tree-oriented model that unlike the modern DBMS's, it really operated at the physical level. • Network model was a graph-oriented and also physical level model as well.
2.1.6 Comparison of Modeling Approaches (IBM ARC; no more • Semi-structured models have more flexibility but relational model is still preferred. • In large databases, efficiency of access to data and modifying data are of great importance. • Ease of use is another factor of using DBMS's. • Both of these features can found in relational DBMS's. • Moreover, SQL, the structured query language, in spite of its simplicity, is a powerful language for database operations.
Section 2.2 Basics of Relational Model
2.2 Basics of Relational Model 2.2.1 Attributes 2.2.2 Schemas 2.2.3 Tuples 2.2.4 Domains 2.2.5 Equivalent Representations of a Relation 2.2.6 Relation Instances 2.2.7 Keys of Relations 2.2.8 An Example Database Schema 2.2.9 Exercises for Section 2.2
2.2 Basics of Relational Model • The relational model gives us a single way to represent data: as a two-dimensional table called a relation. Movies relation • Each row (tuple) represents a movie and each column (attribute) represents a property of the movie.
2.2.1 Attributes • The columns of a relation are called attributes. • In the Movies relation (in previous slide), title, year, length, and genre are attributes. • Attributes appear at the top of the columns. • Like choosing descriptive names for variables in regular programming languages, attributes names should be chosen in such a way that describe the contents.
2.2.2 Schemas • The names of a relation and the set of attributes for the relation is called the schema of the relation. • We show the schema for the relation with the relation name followed by the parenthesized list of its attributes • For instance, the following is the schema of relation Movies: Movies(title, year, length, genre)
2.2.2 Schemas (cont'd) • Note that the attributes are a set, not a list but when we talk about relations, we often specify an order for the attributes. • A database consists of one ore more relations. The set of schemas in the database is called a relational database schema, or just a database schema.
2.2.3 Tuples • The rows of a relation, other than the header row containing the attributes names, are called tuples. • A tuple has one component for each attribute of the relation. • For example, in the Movies relation, the first tuple has four components: 'Gone with the wind', 1939, 231, and drama for attributes title, year, length, and genre respectively.
2.2.3 Tuples (cont'd) • When we wish to write a tuple in isolation, not as a part of a relation, we normally use commas to separate components like this:('Gone with the wind', 1939, 231, 'drama') • Note that we always use the same order of the attributes to show the tuple in isolation.
2.2.4 Domains Tylin 8/29: Past, present, future data • The relational model requires that each component of each tuple be atomic. • That is, it must be of some elementary type such as integer or string. • It is not allowed for a value to be a set, list, array or any other type that reasonably can be broken into smaller components. • It's further assumed that associated with each attribute of a relation is a domain.
2.2.4 Domains (cont'd) • We can include the domain for each attributes in a schema as follows:Movies(title:string, year:integer, length:integer, genre:string)
2.2.5 Equivalent Representations of a Relation • Relations are sets of tuples, not lists of tuples. In other words, the order of tuples in a relation has no significance. • Moreover, we can reorder the attributes of a relation as well. • Note that, when we change the order of the attributes, we should change the order of the contents as well.
2.2.6 Relation Instances • Relations change over time. In other words, relations are not static. • For example, we insert tuples in the Movies relation over time, and also, we may delete or update some tuples as well. • Even the schema can be changed. In other words, we may add/delete an attribute to/from the schema. • We call a set of tuples for a given relation an instance of that relation. • Current instance, is the set of tuples that exists now.
2.2.7 Keys of Relations • Relational model allows us to place some constraints on a schema. • One important constraint is called key constraint or simply a key. • A set of attributes (one or more) forms a key if two tuples in the relation cannot have the same values in all the attributes of the key.
2.2.7 Keys of Relations (cont'd) Example 2.1 For the Movies relation, we can assign the attributes title and year be the key of the relation. In this way, the relation cannot have two tuples with the same title and year. Note that the title by itself does not form a key because there are many movies over the years that have the same name. In other words, the title by itself is not unique and cannot identify a movie uniquely.
2.2.7 Keys of Relations (cont'd) • We indicate the attribute(s) contributing in the key by underlying them as follows:Movies(title, year, length, genre) • Note that the key is a constraint for all possible instances of the relation, and not for a specific instance. • Most of the time, we use an artificial keys for a relation. For example, for the Movies relation, we could add a new attributes movie_id and assign it as the key. In this way, we could make sure it was unique for all possible tuples.
2.2.8 An Example Database Schema The database schema that are used during this book is as follows: Movies(title:string, year:integer, length:integer, genre:string, studioName:string, producerC#:integer) Moviestar(name:string, address:string, gender:char, birthdate:date)
2.2.8 An Example Database Schema (cont'd) StarsIn(movieTitle:string, movieYear:integer, starName:string) MovieExec (name:string, addres:string, cert#:integer, netWorth:integer) Studio (name:string, address:string, presC#:integer)
Section 2.3 Defining a Relation Schema in SQL
2.3 Defining a Relation Schema in SQL 2.3.1 Relations in SQL 2.3.2 Data Types 2.3.3 Simple Table Declaration 2.3.4 Modifying Relations Schemas 2.3.5 Default Values 2.3.6 Declaring Keys 2.3.7 Exercises for Section 2.3
2.3 Defining a Relation Schema in SQL • SQL, Structured Query Language, pronounced "sequel", is the principal language to describe, and manipulate relational database. • There is a standard called SQL-99 that most commercial databases implemented something similar, but not identical to, the standard. • There are two sub-languages for SQL: • DDL: Data Definition Language • DML: Data Manipulation Language
2.3.1 Relations in SQL • SQL makes a distinction between three kinds of relations: • Stored relations: are called tables. These relations exists in database and usually we deal with them. • Views: are relations that do not exist but are constructed when needed. • Temporary tables: are constructed temporarily by SQL processor when it executes queries or other tasks. • We are going to discuss about tables in this chapter. Views will be covered in chapter 8 and temporary tables are never declared.
2.3.2 Data Types • All attributes must have a data type. • The primitive data types supported by SQL are: • Character string • CHAR(n): fixed length string of length n; short strings will be padded with trailing blank to make n characters. • VARCHAR(n): variable length string up to n character; an end-marker or string-length is used to show the end of the string; the purpose is to save space.Note that longer values will be truncated to fit. • Bit string • BIT(n): fixed bit string of length n; • BIT VARYING(n): bit string of length up to n;
2.3.2 Data Types (cont'd) • The primitive data types (cont'd): • BOOLEAN: a logical value of TRUE, FALSE, or UNKNOWN (NULL) • INT or INTEGER: integer value • SHORTINT: short integer; usually the lower bound and the upper bound of SHORTINT is half of INTEGER's. • FLOAT or REAL: floating point number • DOUBLE: double precision real number • DECIMAL(n, d): customized real number; • NUMERIC(n, d): a synonym for DECIMAL
2.3.2 Data Types (cont'd) • The primitive data types (cont'd): • DATE : represents a date value of the form 'yyyy-mm-dd' • TIME: represents a time value of the form'HH:mm:ss' or 'HH:mm:ss.d' (d is a fraction of seconds) • You can create a date constant like this:DATE '2011-08-24' • You can create a time constant like this:TIME '16:09:25' or TIME '16:09:25.378' • Most databases have TIMESTAMP data type of the form 'yyyy-mm-ddHH:mm:ss'
2.3.3 Simple Table Declaration • The simplest form of relation declaration:CREATE TABLE tabName(attrib1 type,attrib2 type,...attribn type);
2.3.3 Simple Table Declaration (cont'd) Example 2.2 The relation Movies can be declared as follows: CREATE TABLE Movies( title CHAR(100), year INT, length INT, genre CHAR(10), studioName CHAR(30), producerC# INT);
2.3.3 Simple Table Declaration (cont'd) Example 2.3 The relation MovieStar can be declared as follows: CREATE TABLE MovieStar( name CHAR(30), address VARCHAR(255), gender CHAR(1), birthdate DATE); The gender attribute can be 'M' or 'F'.
2.3.4 Modifying Relations Schemas • To drop a relation R, execute the following SQL statement:DROP TABLE R; • To alter the schema, we have several options. • To add attributes:ALTER TABLE R ADD attrib1 type, ..., attribn type; • To drop attributes:ALTER TABLE RDROP attrib1, ..., attribn;
2.3.4 Modifying Relations Schemas (cont'd) Example 2.4 Add an attribute to MoviesStar for phone data. ALTER TABLE MovieStar ADD phone CHAR(16); Note that phone attribute will be NULL for all existing tuples. Drop birthdate attribute ALTER TABLE MovieStar DROP bithdate;
2.3.5 Default Values • When we insert or modify a tuple, we sometimes do not have values for some attributes and we wish to assign a default values for them. • To assign a default value for attribute1, we use the following syntax: CREATE TABLE tabName(attrib1 type DEFAULT defaultValue,...attribn type);
2.3.4 Modifying Relations Schemas (cont'd) Example 2.5 Assign default value '?' for gender and default value '0000-00-00' for birthdate. CREATE TABLE MovieStar( name CHAR(30), address VARCHAR(255), gender CHAR(1) DEFAULT '?', birthdate DATE DEFAULT DATE '0000-00-00'); Note that we can assign a default value when altering a schema as well: ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';
2.3.6 Declaring Keys • There are two ways to declare an attribute or a set of attributes to be a key: • Method 1:CREATE TABLE tabName(attrib1 type PRIMARY KEY,...attribn type); • Method 2:CREATE TABLE tabName(attrib1 type,...attribn type,PRIMARY KEY(attrib1,...,attribk));
2.3.6 Declaring Keys (cont'd) • Note that if the key is a set of attributes, then we have to use method 2 but if the key is just one attribute, then either methods can be used. • There are two declarations that may be used to indicate key: • PRIMARY KEY • UNIQUE • Both have the same effect but in PRIMARY KEY case, none of the attributes of the key can be NULL but in UNIQUE case, it's possible.
2.3.6 Declaring Keys (cont'd) Example 2.6 Declare name attribute as primary key in MovieStar relation. CREATE TABLE MovieStar( name CHAR(30) PRIMARY KEY, address VARCHAR(255), gender CHAR(1) DEFAULT '?', birthdate DATE DEFAULT DATE '0000-00-00');
2.3.6 Declaring Keys (cont'd) Example 2.6 (cont'd) Alternatively, we can use the following syntax: CREATE TABLE MovieStar( name CHAR(30), address VARCHAR(255), gender CHAR(1) DEFAULT '?', birthdate DATE DEFAULT DATE '0000-00-00'), PRIMARY KEY (name); Note that UNIQUE can replace PRIMARY KEY.