680 likes | 1.21k Views
Database Introduction and Terminology. Alan Schneider. Database Introduction and Terminology Overview. Types of Databases Operational / OLTP Analytical / DW Operational Data Stores / ODS Database Evolution Hierarchical Network Relational Database Terminology Value-Related
E N D
Database Introduction and Terminology Alan Schneider
Database Introduction and Terminology Overview • Types of Databases • Operational / OLTP • Analytical / DW • Operational Data Stores / ODS • Database Evolution • Hierarchical • Network • Relational • Database Terminology • Value-Related • Structure-Related • Relationship-Related
Types of Databases Found Today • The Business Cycle below shows us that any enterprise must operate at three levels • Operational (i.e., the day-to-day running of the business); • Tactical (i.e., the definition of policy and the monitoring of operations) • Strategic (i.e., the definition of organization's vision, goals and objectives).
Types of Databases Found Today • Operational or On-Line Transaction Processing (OLTP) • Used in the day-to-day life of an organization, institution, or business • Stores dynamic data • Data changes constantly and reflects up-to-the-minute information • Examples include: • Inventory databases • Order maintenance databases • Patient-tracking databases • Periodical subscription databases
Types of Databases Found Today • Analytical or Data Warehouse (DW) • Used to store and track historical and time-dependent data • Track trends, view statistical data over a long period of time, or make long-term projections • Stores Static data • Data is never (or at least rarely) modified, and the information reflected by the database is applicable to a specific point in time • Examples include: • Chemical test databases • Geological sample databases • Survey databases
Types of Databases Found Today • Operational Data Stores (ODS) • “The architectural construct where collective integrated operational data is stored." • Unlike OLTP databases, the Operational Data Store contains subject-oriented, enterprise-wide data like a DW • Unlike DW databases, the data in Operational Data Stores are volatile, current, and detailed like a OLTP • Data in the Operational Data Store are constantly refreshed so that the resulting image reflects the latest state of operations.
The Hierarchical Database Model • Data • Structured hierarchically • Visualized as an inverted tree. • A single table will act as the "root" of the inverted tree • Other tables will act as the branches flowing from the root
The Hierarchical Database Model • Relationships • Represented in terms of parent / child. • A parent table can be associated with many child tables • A single child table can have only one parent table
The Hierarchical Database Model • Advantages • Data can be retrieved very quickly because the table structures are explicitly linked • Referential integrity is built in and automatically enforced. What this basically means is: • A record in a child table must be linked to an existing record in a parent table • If a record is deleted in a Parent table, all associated records in any child tables to which that record is linked will be deleted as well
The Hierarchical Database Model • Disadvantages • Cannot store a record in a child table that is currently unrelated to any record in a parent table • Cannot support complex relationships
The Network Database Model • Developed in part in an attempt to address some of the problems of the hierarchical database model • As with the hierarchical database model, the structure can be visualized as an inverted tree • However, there can be several inverted trees that share branches, all of which are part of the same database structure
The Network Database Model • Established and represented by a “Set Structure” • A transparent construction that relates a pair of tables together • One table is an owner • The other table as a member
The Network Database Model • Set structures • Support a one-to-many relationship • Within a specific set, a record in the “Owner” table can be related to many records in the member table • Single record in the “Member” table is related to only one record in the owner table • A record in the “Member” table cannot exist without being related to an existing record in the “Owner” table • For example, a client must be assigned to an agent, although an agent with no clients may still be listed
The Network Database Model • Data • Accessed by working through the appropriate Set Structures • Unlike the hierarchical database model, where a user must begin to access data from the root table, a user can access data from any table and work backward or forward through related sets
The Network Database Model • Advantages • Data can be accessed very rapidly • A user can fashion more complex queries
The Network Database Model • Disadvantages • A user has to be very familiar with the structure of the database in order to work through the set structures • It is not easy to change the database structure without affecting the application programs that interact with it • Relationships are explicitly defined as Set Structures • Set Structures are used by the application to navigate through the data • All references made from within the application program to that Set Structure will have to be modified
The Relational Database Model • The Network database model was clearly a step up from the Hierarchical database model • However, as each model emerged, users found that they could ask more complex questions • As the complexity of the questions increased, more demands were made on the database • The result was the development of the Relational Database Model
The Relational Database Model • History • Late 1960s: Dr. E. F. Codd, a research mathematician at IBM, was looking into new ways to handle large amounts of data • He had the idea that applying the disciplines and structures of mathematics to data management would help to solve many of the problems encountered when using other database models • Such as: • Data redundancy • Too much dependence on physical implementation • Weak data integrity
The Relational Database Model • History • June, 1970: Dr. Codd presented his now landmark work titled • "A Relational Model of Data for Large Shared Databanks” • Based on two branches of mathematics • Set Theory • First Order Predicate Logic • The RDM derives its name from the term “Relation" which is a part of Set Theory • A general misconception is that the RDM derives its name from the fact that tables in the database can be "related" to one another
The Relational Database Model • Description • Data is stored in “Relations”, which are perceived by the user as tables • Each “Relation” is composed of • “Tuples” or “Records” • “Attributes” or “Fields” • The terms “Tables," “Records," and “Fields" will be used from now on
The Relational Database Model • Description • Two characteristics of an RDM allow the data to exist independent of the way it is physically stored in the computer. In other words, a user isn't required to know the physical location of a record in order to retrieve its data • The physical order of the records or fields in a table is completely immaterial • Each record in the table is identified by a field that contains a unique value
The Relational Database Model • Description • The RDM is unlike the Hierarchical and Network database models, in which knowing the layout of the structures is very important if a user is going to retrieve any data
The Relational Database Model • Relationships • Categorized as: • One-to-One • One-to-Many • Many-to-Many. • A relationship between a pair of tables is established implicitly through matching values of a shared field (Primary and Foreign Keys)
The Relational Database Model • Accessing Data • As long as a user is familiar with the relationships among the tables in the database: • They can access data in an almost unlimited number of ways • Can access data from tables that are directly related as well as from tables that are indirectly related
The Relational Database Model • Advantages • Built-in multilevel integrity: • At the field level to ensure the accuracy of the data • At the table level to ensure that records are not duplicated and to detect missing Primary key values • At the relationship level to ensure that the relationship between a pair of tables is valid • At the business level to ensure that the data is accurate in terms of the business itself
The Relational Database Model • Advantages • Logical and physical data independence from database applications • Neither changes a user makes to the logical design of the database nor changes made by the database software vendor to the physical implementation of the database will adversely affect the applications built upon it
The Relational Database Model • Advantages • Guaranteed data consistency and accuracy • Data is consistent and accurate due to the various levels of integrity you can impose within the database • Easy data retrieval • At the user's command, data can be retrieved either from a particular table or from any number of related tables within the database
Relational Database Management Systems (RDBMS) • A software program that is used to create, maintain, modify, and manipulate a relational database • Also used to create the applications that users will use to interact with the data stored in the database • Even among "true" RDBMSs, support for the RDM varies among vendors, and there is yet to be a full implementation of the RDM's potential
Relational Database Management Systems (RDBMS) • Since the early 1970s, a number of RDBMS programs have been produced by a variety of software vendors (e.g., Oracle, Microsoft, IBM, Informix, Sybase) • These programs have encompassed various types of computer hardware, operating systems, and programming environments. • Today RDBMS programs can be found on just about any type of system and can be used under a wide variety of circumstances
Relational Database Management Systems (RDBMS) • Initially were widely used to manage large volumes of shared data in a Client / Server architecture (db server, custom application) • In recent years RDBMS have been incorporated into a three-tier architecture (db server, application server, web browser)
RDBMS Database Terminology • Used to express and define the special ideas and concepts of the relational database model • Used to express and define the database design process itself • Used anywhere a relational database or RDBMS is discussed
Value-Related Terms • “Data” • The values that are stored in the database. • Data are static in the sense that they remain in the same state until they are modified by some manual or automated process
Value-Related Terms • “Information” • Data that has been processed in a way that makes it meaningful and therefore useful to the person working with or viewing it • Information is dynamic in the sense that it constantly changes relative to the data stored in the database, and also in the sense that it can be processed in an unlimited number of ways • Information can be presented in a variety of forms. • Result of a query • Displayed on-screen • Printed on a reports • But the point to remember is that data must be processed in some manner in order to become meaningful
Structure-Related Terms • “Table” • Known as a “Relation” in relational database theory • The chief structure in a relational database • Composed of fields and records • Represents something that is tangible, such as a: • Person • Place • Thing
Structure-Related Terms • “Field “ • Known as an “Attribute” in relational database theory • The smallest structure in a relational database • Used to store data in the database and it represents a characteristic of the subject of the table in which the field resides
Structure-Related Terms • “Record” • Known as a “Tuple” in relational database theory • A structure within a table that represents a unique instance of the subject of the table • Composed of the entire set of fields in a table, regardless of whether or not the fields contain any values
Structure-Related Terms • “View” • A virtual table that is composed of the fields from one or more tables • Considered "virtual" because it doesn't store any data on its own • instead it draws its data from the tables on which it is based
Structure-Related Terms • “View” • Important for at least three reasons. • Allows you to draw data from multiple tables. (In order for a View to draw data from multiple tables, it is necessary for the tables to have connections (joins), or relationships, to each other) • Provides a means of preventing specified fields within a table (or group of tables) from being manipulated or seen by certain users. This capability can be very advantageous in terms of security • Can be used to implement data integrity; this type of View is known as a Validation View
Structure-Related Terms • “Keys” • Special fields that serve specific purposes within a table, and the type of key determines its use within the table • Primary key is a field that uniquely identifies a record within a table • Foreign key, which is a field that is used to establish a relationship between a pair of tables
Structure-Related Terms • “Index” • A structure within an RDBMS that is used to improve data processing
Structure-Related Terms • Key and Index are two terms that are frequently used interchangeably throughout the database industry • The best way to remember the difference between the two is that Keys are logical structures used to identify records within a table, and Indexes are physical structures used to optimize data processing
Relationship-Related Terms • “Relationships” • A connection established between a pair of tables is known as a relationship • A relationship exists when a pair of tables is connected by a Primary key and a Foreign key or is linked together by a third table, known as a linking table
Relationship-Related Terms • “Relationships” • Are very important to data integrity because they • Help reduce redundant data and duplicate data • They also provide the means to define views • Every relationship can be characterized by: • The Type of relationship that exists between the tables • The Type of participation each table has within the relationship • The Degree of participation each table has within the relationship
Relationship-Related Terms • “Relationships” • When two tables are related, there is always a specific type of relationship (traditionally known as cardinality) that exists between them • One-to-One • One-to-Many • Many-to-Many.
Relationship-Related Terms • “Relationships” • One-to-One Relationships • Exists between a pair of tables if a single record in the first table is related to only one record in the second table, and a single record in the second table is related to only one record in the first table
Relationship-Related Terms • “Relationships” • One-to-Many Relationship • Exists between a pair of tables if a single record in the first table can be related to one or more records in the second table, but a single record in the second table can be related to only one record in the first table • This is by far the most common relationship that exists between a pair of tables in a database. A one-to-many relationship is very important because it helps to eliminate duplicate data and to keep redundant data to an absolute minimum.
Relationship-Related Terms • “Relationships” • Many-to-Many • Relationship exists between a pair of tables if a single record in the first table can be related to one or more records in the second table, and a single record in the second table can be related to one or more records in the first table • Establishing a direct connection between these two tables is difficult because it will produce a large amount of redundant data in one of the tables. There is also a problem with inserting, updating, and deleting data in this type of relationship. However, these problems can be surmounted