1 / 108

Compiler Concepts for Database Systems

Compiler Concepts for Database Systems. Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155. steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818. Overview.

cricket
Download Presentation

Compiler Concepts for Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155 steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818

  2. Overview • Motivation and Background • Database System Architecture • Exploring its Capabilities • Focusing on Compiler-Related Concepts • Compile Time Issues in Database Systems • The SQL Query Language • Optimization Issues in Database Systems • Typing • Runtime Issues in Database Systems • Transaction Processing • Execution for Complex Joins

  3. Database System Architecture • What are the Various Components? • How do they Relate to Compilers?

  4. How Does it Compare to Java Environment?

  5. Database Concepts - Summary • Schema vs. Data • Database-Structured Collection of Data Describing • Objects of Universe of Discourse being Modeling. • A Database Consists of Schema and Data • Schema: Describes the Intension (Type) of Objects • Data: Describes the Extension (Instances) of Objects • What is Schema w.r.t. Compilers? What is Data?

  6. What is a DBMS? • A Database Management System (DBMS) is the Generalized Tool that Facilitates the Management of and Access to the Database • Main Functions: • Defining a Database: Specifying Data Types, Structures, and Constraints • Constructing a Database: the Process of Storing the Data Itself on Some Storage Medium • Manipulating a Database: Function for Querying Specific Data in the Database and Updating the Database • What are the Analogies of Each of the Main Functions w.r.t. Programming Languages and Compilers?

  7. What is a DBMS? • Additional Functions: • Interaction with File Manager • So that Details Related to Data Storage and Access are Removed From Application Programs • Integrity Enforcement • Guarantee Correctness, Validity, Consistency • Security Enforcement • Prevent Data From Illegal Uses • Concurrency Control • Control the Interference Between Concurrent Programs • Recovery from Failure • Query Processing and Optimization • Again – What are Relevant Compiler Concepts?

  8. DBMS Architecture • DBMS Languages • Data Definition Language (DDL) • Data Manipulation Language (DML) • From Embedded Queries or DB Commands Within a Program • “Stand-alone” Query Language • Host Language: • DML Specification (e.g., SQL) is Embedded in a “Host” Programming Language (e.g., Java, C++) • DBMS Interfaces • Menu-Based Interface • Graphical Interface • Forms-Based Interface • Interface for DBA (DB Administrator)

  9. DBMS Architecture • Main DBMS Modules • DDL Compiler • DML Compiler • Ad-hoc (Interactive) Query Compiler • Run-time Database Processor • Stored Data Manager • Concurrency/Back-Up/Recovery Subsystem • DBMS Utility Modules • Loading Routines • Backup Utility • System Catalog/data Dictionary

  10. Components of a DBMS

  11. ANSI/SPARC - Three Schema Architecture • External Data Schema (Users’ view) • Conceptual Data Schema (Logical Schema) • Internal Data Schema (Physical Schema) • What are the Programming Language Analogies?

  12. Conceptual Schema • Describes the Meaning of Data in the Universe of Discourse • Emphasizes on General, Conceptually Relevant, and Often Time Invariant Structural Aspects of the Universe of Discourse • Excludes the Physical Organization and Access Aspects of the Data • This could be a UML Design that Realizes a Set of Classes (no data) or Java Class Declarations (APIs)

  13. Conceptual Schema • Another Example – A Programming Language Level Definition

  14. External Schema • Describes Parts of the Information in the Conceptual Schema in a form Convenient to a Particular User Group’s View • Derived from the Conceptual Schema • What is the View of the Outside World in OO? • Akin to Public Interface

  15. External Schema • Another Example

  16. Internal Schema • Describes How the Information Described in the Conceptual Schema is Physically Represented in a Database to Provide the Overall Best Performance

  17. Internal Schema • Another Example • This Corresponds to Data Typing and Layout in Compilers from Runtime Environment!

  18. Unified Example of Three Schemas

  19. Database Access Process • What Does This Access Process Resemble? • Akin to Runtime Execution Environment! • A More Complex Activation Process!

  20. Metadata vs. Data • Recall Introspection and Reflection in Java where you Can “Look” into the Class Definitions Themselves!

  21. Data Independence • Ability that Allows Application Programs Not Being Affected by Changes in Irrelevant Parts of the Conceptual Data Representation, Data Storage Structure and Data Access Methods • Invisibility (Transparency) of the Details of Entire Database Organization, Storage Structure and Access Strategy to the Users • Recall Software Engineering Concepts: • Abstraction the Details of an Application's Components Can Be Hidden, Providing a Broad Perspective on the Design • Representation Independence: Changes Can Be Made to the Implementation that have No Impact on the Interface and Its Users • Realized in Today’s Modern PLs!

  22. What are System Components? • How are these Similar to Complier/PL Concepts?

  23. Relational Model • Relational Model of Data Based on the Concept of a Relation • Relation - a Mathematical Concept Based on Sets • Strength of the Relational Approach to Data Management Comes From the Formal Foundation Provided by the Theory of Relations • RELATION: A Table of Values • A Relation May Be Thought of as a Set of Rows • A Relation May Alternately be Though of as a Set of Columns • Each Row of the Relation May Be Given an Identifier • Each Column Typically is Called by its Column Name or Column Header or Attribute Name

  24. Relational Tables - Rows/Columns/Tuples

  25. Relational Database Definition CREATE TABLE Student: Name(CHAR(30)), SSN(CHAR(9)), Gpa(FLOAT(2)) CREATE TABLE Faculty: Name(CHAR(30)), SSN(CHAR(9)), Ophone(CHAR(7)) CREATE TABLE Courses: Course#(CHAR(6)), Title(CHAR(20)), Descrip(CHAR(100)), PCourse#(CHAR(6)) CREATE TABLE Formats: Section#(INTEGER(3)), Quarter(CHAR(10)), Campus(CHAR(15)) CREATE TABLE TakeorTeach: SSN(CHAR(9)), Course#(CHAR(6)), Section#(INTEGER(3)) CREATE TABLE COfferings: Course#(CHAR(6)), Section#(INTEGER(3)) Student(Name*, SSN, Gpa) Faculty(Name*, SSN, Ophone) Courses(Course#*, Title, Descrip, PCourse#*) Formats(Section#*, Quarter, Campus) TakeorTeach(SSN, Course#, Section#) COfferings(Course#, Section#)

  26. Relational Views • Two Views Derived From Prior Tables • Student Transcript View • Course Prerequisite View

  27. SQL is a Partial Example of a Tuple Relational Language Simple Queries are all Declarative More Complex Queries are both Declarative and Procedural (e.g., joins, nested queries) Find the names of employees working on the CAD/CAM project SELECT EMP.ENAME FROM EMP, WORKS, PROJ WHERE (EMP.ENO= WORKS.ENO) AND (WORKS.PNO = PROJ.PNO) AND (PROJ.PNAME = “CAD/CAM”) SQL Defines a Programming Language and Associated Semantics for Usage and Processing SQL: Tuple Relational Calculus-Based

  28. SQL Components • Data Definition Language (DDL) • For External and Conceptual Schemas • Views - DDL for External Schemas • Data Manipulation Language (DML) • Interactive DML Against External and Conceptual Schemas • Embedded DML in Host PLs (EQL, JDBC, etc.) • Note: Separation of Definition (DDL) from Usage (DML) – Is there Something Similar in PLs? • Others • Integrity (Allowable Values/Referential) • Transaction Control (Long-Duration and Batch) • Authorization (Who can Do What When)

  29. SQL DDL and DML • Data Definition Language (DDL) - Declarations • Defining the Relational Schema - Relations, Attributes, Domains - The Meta-Data CREATE TABLE Student: Name(CHAR(30)),SSN(CHAR(9)),GPA(FLOAT(2)) CREATE TABLE Courses: Course#(CHAR(6)), Title(CHAR(20)), Descrip(CHAR(100)), PCourse#(CHAR(6)) • Data Manipulation Language (DML) - Code • Defining the Queries Against the Schema SELECT Name, SSN From Student Where GPA > 3.00

  30. Data Definition Language - DDL • A Pre-Defined set of Primitive Types • Numeric • Character-string • Bit-string • Additional Types • Defining Domains • Defining Schema • Defining Tables • Defining Views • Note: Each DBMS May have their Own DBMS Specific Data Types - Is this Good or Bad? • What is this Similar to re. Different C++ Compilers? • These are Akin to PL Data Types!

  31. DDL - Primitive Types • Numeric • INTEGER (or INT), SMALLINT • REAL, DOUBLE PRECISION • FLOAT(N) Floating Point with at Least N Digits • DECIMAL(P,D) (DEC(P,D) or NUMERIC(P,D)) have P Total Digits with D to Right of Decimal • Note that INTs and REALs are Machine Dependent (Based on Hardware/OS Platform) • Again – this is Similar to PLs/Compilers and Code Generation – Data Layout

  32. DDL - Primitive Types • Character-String • CHAR(N) or CHARACTER(N) - Fixed • VARCHAR(N), CHAR VARYING(N), or CHARACTER VARYING(N) Variable with at Most N Characters • Bit-Strings • BIT(N) Fixed • VARBIT(N) or BIT VARYING(N) Variable with at Most N Bits

  33. DDL - Primitive Types • These Specialized Primitive Types are Used to: • Simplify Modeling Process • Include “Popular” Types • Reduce Composite Attributes/Programming • DATE : YYYY-MM-DD • TIME: HH-MM-SS • TIME(I): HH-MM-SS-F....F - I Fraction Seconds • TIME WITH TIME ZONE: HH-MM-SS-HH-MM • TIME-STAMP: YYYY-MM-DD-HH-MM-SS-F...F{-HH-MM} • PLs also have Specialized Types! • Problem: Different Database Systems Sometime Implement these Types very Differently • This Impacts Portability!

  34. What is a SQL Schema? • A Schema in SQL is the Major Meta-Data Construct • Supports the Definition of: • Relation - Table with Name • Attributes - Columns and their Types • Identification - Primary Key • Constraints - Referential Integrity (FK) • Two Part Definition • CREATE Schema - Named Database or Conceptually Related Tables • CREATE Table - Individual Tables of the Schema

  35. DDL-Create/Drop a Schema • Creating a Schema:CREATE SCHEMA MY_COMPANY AUTHORIZATION Demurjian; • Schema MY_COMPANY bas Been Created and is Owner by the User “Demurjian” • Tables can now be Created and Added to Schema • Dropping a Schema:DROP SCHEMA MY_COMPANY RESTRICT;DROP SCHEMA MY_COMPANY CASCADE; • Restrict: • Drop Operation Fails If Schema is Not Empty • Cascade: • Drop Operation Removes Everything in the Schema

  36. DDL - Create Tables CREATE TABLE EMPLOYEE ( FNAME VARCHAR(15) NOT NULL , MINIT CHAR , LNAME VARCHAR(15) NOT NULL , SSN CHAR(9) NOT NULL , BDATE DATE ADDRESS VARCHAR(30) , SEX CHAR , SALARY DECIMAL(10,2) , SUPERSSN CHAR(9) , DNO INT NOT NULL , PRIMARY KEY (SSN) , FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN) , FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER) ) ;

  37. DDL - Create Tables (continued) CREATE TABLE DEPARTMENT ( DNAME VARCHAR(15) NOT NULL , DNUMBER INT NOT NULL ,MGRSSN CHAR(9) NOT NULL , MGRSTARTDATE DATE , PRIMARY KEY (DNUMBER) , UNIQUE (DNAME) , FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(SSN) ) ; CREATE TABLE DEPT_LOCATIONS (DNUMBER INT NOT NULL , DLOCATION VARCHAR(15) NOT NULL , PRIMARY KEY (DNUMBER, DLOCATION) , FOREIGN KEY (DNUMBER) REFERENCES DEPARTMENT(DNUMBER) ) ;

  38. DDL - Create Tables (continued) CREATE TABLE PROJECT (PNAME VARCHAR(15) NOT NULL , PNUMBER INT NOT NULL ,PLOCATION VARCHAR(15) , DNUM INT NOT NULL , PRIMARY KEY (PNUMBER) , UNIQUE (PNAME) , FOREIGN KEY (DNUM) REFERENCES DEPARTMENT(DNUMBER) ) ; CREATE TABLE WORKS_ON (ESSN CHAR(9) NOT NULL , PNO INT NOT NULL , HOURS DECIMAL(3,1) NOT NULL , PRIMARY KEY (ESSN, PNO) , FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN) ,FOREIGN KEY (PNO) REFERENCES PROJECT(PNUMBER) ) ;

  39. DDL - Create Tables with Constraints CREATE TABLE EMPLOYEE ( . . . , DNO INT NOT NULL DEFAULT 1, CONSTRAINT EMPPK PRIMARY KEY (SSN) , CONSTRAINT EMPSUPERFK FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN) ON DELETE SET NULL ON UPDATE CASCADE , CONSTRAINT EMPDEPTFK FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER) ON DELETE SET DEFAULT ON UPDATE CASCADE );

  40. DDL - Create Tables with Constraints • Is there an Equivalent to Keys and Constraints in PLs? • What Does Java Have Internally? • Constraints Facilitate Type Checking at Data Level! CREATE TABLE DEPARTMENT ( . . . , MGRSSN CHAR(9) NOT NULL DEFAULT '888665555' , . . . , CONSTRAINT DEPTPK PRIMARY KEY (DNUMBER) , CONSTRAINT DEPTSK UNIQUE (DNAME), CONSTRAINT DEPTMGRFK FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(SSN) ON DELETE SET DEFAULT ON UPDATE CASCADE );

  41. Data Manipulation Language - DML • SQL has the SELECT Statement for Retrieving Info. from a Database (Not Relational Algebra Select) • SQL vs. Formal Relational Model • SQL Allows a Table (Relation) to have Two or More Identical Tuples in All Their Attribute Values • Hence, an SQL Table is a Multi-set (Sometimes Called a Bag) of Tuples; it is Not a Set of Tuples • SQL Relations Can Be Constrained to Sets by • PRIMARY KEY or UNIQUE Attributes • Using the DISTINCT Option in a Query • Implied Processing and Procedural Semantics • SQL Queries have Specific Semantics • These Semantics Dictate Processing • Includes Code Generation, Optimization, etc.

  42. Interactive DML - Main Components • Select-from-where Statement Contains: • Select Clause - Chosen Attributes/Columns • From Clause - Involved Tables • Where Clause - Constrain Tuple Values • Tuple Variables - Distinguish Among Same Names in Different Tables • String Matching - Detailed Matching Including • Exact • Starts With • Near • Ordering of Rows - Sorting Tuple Results

  43. Recall Prior Schema

  44. …and Corresponding DB Tables Which Represent Tuples/Instances of Each Relation A S C null W B null null 1 4 5 5

  45. …and Corresponding DB Tables

  46. Simple SQL Queries B S C null W B null null • Query 0: Retrieve the Birthdate and Address of the Employee whose Name is 'John B. Smith'.SELECT BDATE, ADDRESSFROM EMPLOYEEWHERE FNAME='John' AND MINIT='B’ AND LNAME='Smith’ • Which Row(s) are Selected? • Note: While All of these Next Queries are from Chapter 8, Some are From “Earlier” Edition

  47. Simple SQL Queries • Query 1: Retrieve Name and Address of all Employees who work for the 'Research' DepartmentSELECT FNAME, MINIT, LNAME, ADDRESS, DNAMEFROM EMPLOYEE, DEPARTMENTWHERE DNAME='Research' ANDDNUMBER=DNO • What Action is Being Performed? Join! Cartesian Product!

  48. Simple SQL Queries - Result Theta Join on DNO=DNUMBER

  49. Simple SQL Queries • Query 2: For Every Project in 'Stafford', list the Project Number, the Controlling Dept. Number, and the Dept. Manager's Last Name, Address, and BirthdateSELECT PNUMBER, DNUM, LNAME, BDATE,ADDRESSFROM PROJECT, DEPARTMENT, EMPLOYEEWHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford' • In Q2, there are Two Join Conditions: • The Join Condition DNUM=DNUMBER Relates a Project to its Controlling Department • The Join Condition MGRSSN=SSN Relates the Controlling Department to the Employee who Manages that Department

  50. Query Results A S C null W B null null SELECT PNUMBER, DNUM, LNAME, BDATE,ADDRESSFROM PROJECT, DEPARTMENT, EMPLOYEEWHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford'

More Related