440 likes | 706 Views
Introduction to Database Design. Donghui Zhang CCIS, Northeastern University. Outline. Database and DBMS Architecture of Database Applications Database Design Database Application Programming. Database, DBMS. A Database is a very large, integrated collection of data .
E N D
Introduction toDatabase Design Donghui Zhang CCIS, Northeastern University
Outline • Database and DBMS • Architecture of Database Applications • Database Design • Database Application Programming
Database, DBMS • A Database is a very large, integrated collection of data. • A Database Management System (DBMS)is a software designed to store and manage databases. • A Database Application is a software which enables the users to access the database.
Why DBMS? • We currently live in a world experiencing information explosion. • To manage the huge amount of data: DBMS • the total RDBMS market in 2003 was $7 billion in license revenues. • Much more money was spent to develop Database applications.
The worldwide database management software market saw double-digit growth in 2004. • The five-year forecast calls for a compound annual growth rate of nearly 6 percent, bringing the market to $12.7 billion in new license revenue by 2009. • Title: Forecast: Database Management Systems Software, Worldwide, 2003-2009 • Author: Colleen Graham, Gartner • Time: April 21, 2005
DBMS can Provide … • Data independence and efficient access. • Reduced application development time. • Data integrity and security. • Uniform data administration. • Concurrent access, recovery from crashes.
DBMS Historic Points • First DBMS developed by Turing Award winner Charles Bachman in the early 1960s. • in 1970, Turing Award winner Edgar Codd proposed the relational data model. • in the late 1980s, IBM proposed SQL.
Outline • Database and DBMS • Architecture of Database Applications • Database Design • Database Application Programming
Components of Data-Intensive Systems Three separate types of functionality: • Data management • Application logic • Presentation
Example: Course Enrollment -- Build a system using which students can enroll in courses: • Data Management • Student info, course info, instructor info, course availability, pre-requisites, etc. • Application Logic • Logic to add a course, drop a course, create a new course, etc. • Presentation • Log in different users (students, staff, faculty), display forms and human-readable output
The Three-Tier Architecture Client Program (Web Browser) Presentation tier Application Server Middle tier Database System Data managementtier
E.g. What we use Client Program (Web Browser) Presentation tier Application Server Apache JSP Middle tier Database System Data managementtier MySQL
<HTML> <HEAD></HEAD> <BODY> <h1>Barns and Nobble Internet Bookstore</h1> Our inventory: <h3>Science</h3> <b>The Character of Physical Law</b> <UL> <LI>Author: Richard Feynman</LI> <LI>Published 1980</LI> <LI>Hardcover</LI> </UL> <h3>Fiction</h3> <b>Waiting for the Mahatma</b> <UL> <LI>Author: R.K. Narayan</LI> <LI>Published 1981</LI> </UL> <b>The English Teacher</b> <UL> <LI>Author: R.K. Narayan</LI> <LI>Published 1980</LI> <LI>Paperback</LI> </UL> </BODY> </HTML> HTML: An Example
HTML: static vs dynamic • Static: you create an HTML file which is sent to the client’s web browser upon request. E.g.: • your CCIS login is ‘donghui’, • your HTML file is /home/donghui/.www/index.html • The URL is http://www.ccs.neu.edu/home/donghui • Dynamic: the HTML file is generated dynamically via your ASP.NET code.
Another View Client Machines Machine 2 Machine 1 Client browser 1 Your JSP Code Your database Client browser 2 Apache MySQL Client browser 3
Client-Server Architecture • Data Management: DBMS @ Server. • Presentation: Client program. • Application Logic: can go either way. • If combined with server: thin-client architecture • If combined with client: thick-client architecture Server Client
Thin-Client Architecture Client • Database server and web server too closely coupled, • E.g. Does not allow the application logic to access multiple databases on different servers. Server Client Client
Thick-Client Architecture Client • No central place to update the business logic • Security issues: Server needs to trust clients • Does not scale to more than several 100s of clients Client Server Client
Advantages of the Three-Tier Architecture • Heterogeneous systems • Tiers can be independently maintained, modified, and replaced • Thin clients • Only presentation layer at clients (web browsers) • Integrated data access • Several database systems can be handled transparently at the middle tier • Central management of connections • Scalability • Replication at middle tier permits scalability of business logic • Software development • Code for business logic is centralized • Interaction between tiers through well-defined APIs: Can reuse standard components at each tier
Outline • Database and DBMS • Architecture of Database Applications • Database Design • Database Application Programming
ER-Model • Entity: Real-world object distinguishable from other objects. E.g. Students, Courses. • An entity has multiple attributes. E.g. Students have ssn, name, phone. • Entities have relationships with each other. E.g. Students enroll Courses.
Example of ER Diagram time name title ssn phone unit cid Enroll Courses Students To implement the above design, store three tables in the database.
Students Enroll Courses
Key Constraint in ER Diagram name dname ssn phone address did BelongsTo Departments Students Many-to-one relationship: no need to be implemented as a table!
Students Departments
Some Other Design Concepts • Primary key • Participation constraint • Normal forms (BCNF, 3-NF, etc.) • IS-A hierarchy • Ternary relationships
Outline • Database and DBMS • Architecture of Database Applications • Database Design • Database Application Programming
SQL Query Find the students in Computer Science Department . • if we know the did is 1: • otherwise: SELECT S.name FROM Students S WHERE S.did=1 SELECT S.name FROM Students S, Departments D WHERE D.did=S.did AND D.dname=`Computer Science’
SQL in Application Code • SQL commands can be called from within a host language (e.g., C++, Java) program. • Two main integration approaches: • Embed SQL in the host language (Embedded SQL, SQLJ) • Create special API to call SQL commands (JDBC)
Implementation of Database SystemIntroduction Donghui Zhang Partially using Prof. Hector Garcia-Molina’s slides (Notes01) http://www-db.stanford.edu/~ullman/dscb.html
Relations Statements Results Isn’t Implementing a Database System Simple?
Introducing the MEGATRON 3000 Database Management System • The latest from Megatron Labs • Incorporates latest relational technology • UNIX compatible
Megatron 3000 Implementation Details • Relations stored in files (ASCII) e.g., relation R is in /usr/db/R Smith # 123 # CS Jones # 522 # EE . . .
Megatron 3000 Implementation Details • Directory file (ASCII) in /usr/db/directory R1 # A # INT # B # STR … R2 # C # STR # A # INT … . . .
Megatron 3000Sample Sessions % MEGATRON3000 Welcome to MEGATRON 3000! & & quit % . . .
Megatron 3000Sample Sessions & select * from R # Relation R ABC SMITH 123 CS &
Megatron 3000Sample Sessions & select A,B from R,S where R.A = S.A and S.C > 100 # AB 123 CAR 522 CAT &
Megatron 3000 • To execute “select * from R where condition”: (1) Read directory file to get R attributes (2) Read R file, for each line: (a) Check condition (b) If OK, display
Megatron 3000 • To execute “select A,B from R,S where condition”: (1) Read dictionary to get R,S attributes (2) Read R file, for each line: (a) Read S file, for each line: (i) Create join tuple (ii) Check condition (iii) Display if OK
What’s wrong with the Megatron 3000 DBMS? • Expensive update and search e.g., - To locate an employee with a given SSN, file scan. - To change “Cat” to “Cats”, complete file write. • Solution: Indexing!
What’s wrong with the Megatron 3000 DBMS? • Brute force query processing e.g., select * from R,S where R.A = S.A and S.B > 1000 - Do select first? - More efficient join? • Solution: Query optimization!
What’s wrong with the Megatron 3000 DBMS? • No concurrency control or reliability e.g., - if two client programs read your bank balance ($5000) and add $1000 to it… - Crash. • Solution: Transaction management!