Chapter 1: Introduction • Purpose of Database Systems • View of Data • Data Models • Data Definition Language • Data Manipulation Language • Transaction Management • Storage Management • Database Administrator • Database Users • Database Management System Structure
What is a Database? • According to the book: • Collection of interrelated data • Set of programs to access the data • A DBMS contains information about a particular enterprise • DBMS provides an environment that is both convenient and efficient to use. • Another definition: • A database is a collection of organized, interrelated data, typically relating to a particular enterprise • A Database Management System (DBMS) is a set of programs for managing and accessing databases
Some PopularDatabase Management Systems • Commercial “off-the-shelf” (COTS): • Oracle • IBM DB2 (IBM) • SQL Server (Microsoft) • Sybase • Informix (IBM) • Access (Microsoft) • Open Source: • MySQL • PostgreSQL Note: This is not a course on any particular DBMS!
Some Database Applications • Databases touch all aspects of our lives: • Banking – accounts, loans, customers • Airlines - reservations, schedules • Universities - registration, grades • Sales - customers, products, purchases • Manufacturing - production, inventory, orders, supply chain • Human resources - employee records, salaries, tax deductions • Anywhere there is data, there could be a database. • Course context is an “enterprise” that has requirements for: • Storage and management of 100’s of gigabytes or terabytes of data • Support for 100’s or more of concurrent users and transactions • Traditional supporting platform, e.g, Sun Enterprise server, 2GB RAM, 10TB of disk space
Purpose of Database System • In the early days, database applications were built on top of file systems – coded from the ground up. Sometimes this approach is still advocated. • Drawbacks of this approach: • Data redundancy and inconsistency • Multiple files and formats • Difficulty accessing data • A new program is required to carry out each new task • Data integrity problems • Integrity constraints (e.g. account balance > 0) become embedded throughout program code • Hard to add new constraints or change existing ones • Plus others…
Purpose of Database Systems (Cont.) • Database systems offer solutions for the above problems. • Database systems also support: • Atomicity of updates • Failures may leave database in an inconsistent state with partial updates • E.g., transfer of funds from one account to another should either complete or not happen at all • Concurrent access by multiple users • Concurrent accessed is needed for performance • Uncontrolled concurrent accesses can lead to inconsistencies • E.g. two people reading a balance and updating it at the same time • Data security • Recoding this functionality from scratch is not easy!
Levels of Abstraction • Physical level: defines low-level details about how a data item (e.g., customer) is stored on disk. • Logical level: describes data stored in a database, and the relationships among the data. type customer = recordname : string;street : string;city : integer;end; • View level: defines how information is presented to users. Views can also hide details of data types, and information (e.g., salary) for security purposes.
View of Data An “architecture” for a database system:
View of Data, Cont. • In general, the interfaces between the three levels should be defined so that changes in some parts do not seriously influence others. • Carefully defining the interfaces helps enhance Physical Data Independence, which is the ability to modify the physical schema without changing the logical schema.
Instances vs. Schemas • Similar to types and variables in programming languages • Schema – defines the structure or design of a database • Analogous to type information of a variable in a program • E.g., a database might consists of information about a set of customers and accounts and the relationship between them • More precisely: • Logical schema: database design at the logical level • Physical schema: database design at the physical level • Instance – a database and its’ contents at one point in time • Analogous to a variable and its’ value
What is a Data Model? • The phrase “data model” is used in a couple of different ways. • The phrase (use #1) is frequently used to refer to an overall approach or philosophy for database design and development. • For those individuals, groups and corporations that subscribe to a specific data model, that model permeates all aspects of database design, development, implementation, etc. • Current data models: • Entity-Relationship model • Relational model • Object-oriented model • Object-relational model • Semi, and non-structured data models • Legacy models: • Network • Hierarchical
What is a Data Model, Cont? • During the early phases of database design and development, a “data model” is frequently developed (use #2). • The purpose of developing the data model is to define: • Data • Relationships between data items • Semantics of data items • Constraints on data items In other words, a data model defines the logical schema, i.e., the logical level of design of a database. • A data model is typically conveyed as one or more diagrams. • The type of diagrams used depends on the overall approach or philosophy (i.e., the data model, as defined in the first sense). • This early phase is referred to as data modeling.
Entity-Relationship Model Example of an entity-relationship diagram: • Widely used for database modelling. • An ER model is converted to tables in a relational database.
Attributes account-number customer-city customer-street customer-name customer-id 192-83-7465 019-28-3746 192-83-7465 321-12-3123 019-28-3746 Johnson Smith Johnson Jones Smith Alma North Alma Main North Palo Alto Rye Palo Alto Harrison Rye A-101 A-215 A-201 A-217 A-201 Relational Model • Example of tabular data in the relational model • From a data modeling perspective, which approach is preferable? The ER model, or the relational model?
Data Definition Language (DDL) • Notation and program for defining a (physical) database schema (see page 129 for a more complete example): create tableaccount (account-number char(10),balance integer) • Given a DDL file, the DDL compiler generates a set of tables • A description of those tables is stored in a data dictionary: • Contains information from the database schema • Frequently referred to as metadata (i.e., data about data) • Data storage and definition language: • Language in which the storage structure and access methods used by the database system are specified • Usually an extension of the data definition language
Data Manipulation Language (DML) • Language for accessing and manipulating the data • DML is also known as query language • Two classes of DML languages: • Procedural – user specifies what data is required and how to get that data • Non-procedural – user specifies what data is required without specifying how to get that data • SQL is the most widely used query language • Usually referred to as a non-procedural query language
SQL Examples • Find the name of the customer with customer-id 192-83-7465: selectcustomer.customer-namefromcustomerwherecustomer.customer-id = ‘192-83-7465’ • Find the balances of all accounts held by the customer with customer-id 192-83-7465: selectaccount.balancefromdepositor, accountwheredepositor.customer-id = ‘192-83-7465’ anddepositor.account-number = account.account-number • Databases are typically accessed by: • Users through a command line interface • Application programs that (generally) access them through: • Language extensions to allow embedded SQL • An application program interface (e.g. ODBC/JDBC) which allow SQL queries to be sent to a database
Database Users • Users are differentiated by the way they interact with the system • Naïve users – invoke application programs that have been written previously • E.g. people accessing a database over the web, bank tellers, clerical staff, ATM users • Application programmers – interact with the system by making DML calls through an API, e.g., ODBC or JDBC from within a computer program • Sophisticated users – form requests in a database query language, typically submitted at the command-line • Specialized users – write specialized database applications that do not fit into the traditional data processing framework
Database Administrator (DBA) • Coordinates all the activities of the database system; the DBA has a good understanding of the enterprise’s information resources and needs. • DBA duties include: • Granting user authority to access the database • Acting as liaison with users • Installing and maintaining DBMS software • Monitoring performance and performance tuning • Backup and recovery • According to the book, the DBA is also responsible for: • Schema definition • Storage structure and access method definition • Schema and physical organization modification • Specifying integrity constraints • Responding to changes in requirements • These latter tasks are typically performed by a DB designer.
Transaction Management • A transaction is a collection of operations that performs a single logical function in a database application • The backup and recovery components of a DBMS ensure that the database remains in a consistent (correct) state despite failures: • system, power, network failures • operating system crashes • transaction failures. • The concurrency-control manager in a DBMS controls the interaction among the concurrent transactions, to ensure the consistency of the database.
Storage Management • The storage manager in a DBMS provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. • The storage manager is responsible to the following tasks: • interaction with the file manager • efficient storing, retrieving and updating of data • Note that the DBMS may or may not make use of the facilities of the operating systems’ file management facilities.
Overall System Structure Query Optimizer
Application Architectures • Architectures: • Mainframe – client programs and DBMS reside on one platform. • Two Tier – client programs and DBMS reside on different platforms; clients connect to DBMS via an API such as ODBC/JDBC. • Three Tier – client programs, application server (or other “middleware”), and DBMS; clients connect to DBMS indirectly through the application server (also via an API). Typically used in web-based applications. • N Tier – recent generalization of 2 and 3 tier architectures.