COP5725 Advanced Database Systems - PowerPoint PPT Presentation

cop5725 advanced database systems n.
Skip this Video
Loading SlideShow in 5 Seconds..
COP5725 Advanced Database Systems PowerPoint Presentation
Download Presentation
COP5725 Advanced Database Systems

play fullscreen
1 / 106
COP5725 Advanced Database Systems
Download Presentation
Download Presentation

COP5725 Advanced Database Systems

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. COP5725Advanced Database Systems DB Fundamentals Spring 2014

  2. What are Database Management Systems DBMS is a system for providing EFFICIENT, CONVENIENT, and SAFEMULTI-USER storage of and access to MASSIVE amounts of PERSISTENT data

  3. Example: Banking System • Data • Information on accounts, customers, balances, current interest rates, transaction histories, etc. • MASSIVE • Many gigabytes at a minimum for big banks, more if keep history of all transactions, even more if keep images of checks -> Far too big to fit in main memory • PERSISTENT • Data outlives programs that operate on it

  4. Example: Banking System • SAFE: • from system failures • from malicious users • CONVENIENT: • simple commands to debit account, get balance, write statement, transfer funds, etc. • also unpredicted queries should be easy • EFFICIENT: • don't search all files in order to get balance of one account, get all accounts with low balances, get large transactions, etc. • massive data! -> DBMS's carefully tuned for performance

  5. Multi-user Access • Many people/programs accessing same database, or even same data, simultaneously -> Need careful controls • Alex @ ATM1: withdraw $100 from account #007 get balance from database; if balance >= 100 then balance := balance - 100; dispense cash; put new balance into database; • Bob @ ATM2: withdraw $50 from account #007 get balance from database; if balance >= 50 then balance := balance - 50; dispense cash; put new balance into database; • Initial balance = 120. Final balance = ??

  6. Why File Systems Won’t Work • Storing data: file system is limited • size limit by disk or address space • when system crashes we may loose data • Password/file-based authorization insufficient • Query/update: • need to write a new C++/Java program for every new query • need to worry about performance • Concurrency: limited protection • need to worry about interfering with other users • need to offer different views to different users (e.g. registrar, students, professors) • Schema change: • entails changing file formats • need to rewrite virtually all applications That’s why the notion of DBMS was motivated!

  7. DBMS Architecture User/Web Forms/Applications/DBA query transaction DDL commands Query Parser Transaction Manager DDL Processor Query Rewriter Concurrency Control Logging & Recovery Query Optimizer Query Executor Records Indexes Lock Tables Buffer: data, indexes, log, etc Buffer Manager Main Memory Storage Manager Storage data, metadata, indexes, log, etc CS411

  8. Data Structuring: Model, Schema, Data • Data model • conceptual structuring of data stored in database • ex: data is set of records, each with student-ID, name, address, courses, photo • ex: data is graph where nodes represent cities, edges represent airline routes • Schema versus data • schema: describes how data is to be structured, defined at set-up time, rarely changes (also called "metadata") • data: actual "instance" of database, changes rapidly • vs. types and variables in programming languages

  9. Schema vs. Data • Schema: name, name of each field, the type of each field • Students (Sid:string, Name:string, Age: integer, GPA: real) • A template for describing a student • Data: an example instance of the relation

  10. Data Structuring: Model, Schema, Data • Data definition language (DDL) • commands for setting up schema of database • Data Manipulation Language (DML) • Commands to manipulate data in database: • RETRIEVE, INSERT, DELETE, MODIFY • Also called "query language"

  11. People • DBMS user: queries/modifies data • DBMS application designer • set up schema, loads data, … • DBMS administrator • user management, performance tuning, … • DBMS implementer: builds systems

  12. Key Steps in Building DB Applications • Step 0: pick an application domain • Step 1: conceptual design • Discuss with your team mate what to model in the application domain • Need a modeling language to express what you want • ER model is the most popular such language • Output: an ER diagram of the application domain • Step 2: pick a type of DBMS’s • Relational DBMS is most popular and is our focus

  13. Key Steps in Building DB Applications • Step 3: translate ER design to a relational schema • Use a set of rules to translate from ER to relational schema • Use a set of schema refinement rules to transform the above relational schema into a goodrelational schema • 1NF, 2NF, 3NF, BCNF, 4NF,…… • At this point • You have a good relational schema on paper

  14. Key Steps in Building DB Applications • Step 4: Implement your relational DBMS using a "database programming language" called SQL • SELECT-FROM-WHERE-GROUPBY-HAVING • Step 5: Ordinary users cannot interact with the database directly and the database also cannot do everything you want, hence • Write your application program in C++, Java, PHP, etc. to handle the interaction and take care of things that the database cannot do

  15. Constraints • Constraint: an assertion about the database that must be true at all times • Part of the database schema • Very important in database design • Finding constraints is part of the modeling process • Keys: social security number uniquely identifies a person • Single-value constraints: a person can have only one father • Referential integrity constraints: if you work for a company, it must exist in the database • Domain constraints: peoples’ ages are between 0 and 150 • General constraints: all others (at most 50 students enroll in a class)

  16. More about Keys • Every entity must have a key • why? • A key can consist of more than one attribute • There can be more than one key for an entity set • Among all candidate keys, one key will be designated as primary key

  17. ER Model vs. Relational Model • Both are used to model data • ER model has many concepts • Entities, relationships, attributes, etc. • Well-suited for capturing the app. requirements • Not well-suited for computer implementation • Relational model • Has just a single concept: relation • World is represented with a collection of tables • Well-suited for efficient manipulations on computers

  18. Relation: An Example Name of Table (Relation) Column (Field, Attribute) Products Domain (Atomic type) Row (Record, Tuple)

  19. Relations • Schema vs. instance = columns vs. rows • Schema of a relation • Relation name • Attribute names • Attribute types (domains) • Schema of a database • A set of relation schemas • Questions • When do you determine a schema (instance)? • How often do you change your mind?

  20. Relations • The database maintains a current database state • Updates to the data happen very frequently • add a tuple • delete a tuple • modify an attribute in a tuple • Updates to the schema are relatively rare, and rather painful. Why?

  21. Defining a Database Schema • A database schema comprises declarations for the relations (“tables”) of the database • Simplest form of creation is: CREATE TABLE <name> ( <list of elements> ); • And you may remove a relation from the database schema by: DROP TABLE <name>;

  22. Elements of Table Declarations • The principal element is a pair consisting of an attribute and a type • The most common types are: • INT or INTEGER (synonyms) • REAL or FLOAT (synonyms) • CHAR(n ) = fixed-length string of n characters • VARCHAR(n ) = variable-length string of up to ncharacters

  23. Example: Create Table CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL );

  24. Declaring Keys • An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE • Each says the attribute(s) so declared functionally determines all the attributes of the relation schema • Single attribute keys CREATE TABLE Beers ( name CHAR(20) UNIQUE, manf CHAR(20) );

  25. Multi-attribute Keys CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL, PRIMARY KEY (bar, beer) );

  26. Foreign Keys • A Foreign Key is a field whose values are keys in another relation • Must correspond to primary key of the second relation • Like a `logical pointer’ Enrolled Students CREATE TABLE Enrolled ( sidCHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCESStudents, FOREIGN KEY (cid) REFERENCESCourses )

  27. Relational Algebra • Querying the database: specify what we want from our database • Find all the people who earn more than $1,000,000 and pay taxes in Tallahassee • Could write in C++/Java, but a bad idea • Instead use high-level query languages: • Theoretical: Relational Algebra, Datalog • Practical: SQL • Relational algebra: a basic set of operations on relations that provide the basic principles

  28. What is an “Algebra”? • Mathematical system consisting of: • Operands--- variables or values from which new values can be constructed • Operators--- symbols denoting procedures that construct new values from given values • Examples • Arithmetic algebra, linear algebra, Boolean algebra …… • What are operands? • What are operators?

  29. What is Relational Algebra? • An algebra • Whose operands are relations or variables that represent relations • Whose operators are designed to do common things that we need to do with relations in a database • relations as input, new relation as output • Can be used as a query languagefor relations

  30. Relational Operators at a Glance • Five basic RA operations: • Basic Set Operations • union, difference (no intersection, no complement) • Selection: s • Projection: p • Cartesian Product: X • When our relations have attribute names: • Renaming: r • Derived operations: • Intersection, complement • Joins (natural join, equi-join, theta join, semi-join)

  31. Set Operations • Union: all tuples in R1 or R2, denoted as R1 U R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • Active-Employees U Retired-Employees • Difference: all tuples in R1 and not in R2, denoted as R1 – R2 • R1, R2 must have the same schema • R1 - R2 has the same schema as R1, R2 • Example • All-Employees - Retired-Employees

  32. Selection • Returns all tuples which satisfy a condition, denoted assc(R) • c is a condition: =, <, >, AND, OR, NOT • Output schema: same as input schema • Find all employees with salary more than $40,000: • sSalary > 40000(Employee)

  33. Projection • Unary operation: returns certain columns, denoted as P A1,…,An(R) • Eliminates duplicate tuples ! • Input schema R(B1, …, Bm) • Condition: {A1, …, An} {B1, …, Bm} • Output schema S(A1, …, An) • Example: project social-security number and names: • PSSN, Name (Employee)

  34. Selection vs. Projection • Think of relation as a table • How are they similar? • How are they different? • Why do you need both?

  35. Cartesian Product • Each tuple in R1 with each tuple in R2, denoted as R1 x R2 • Input schemas R1(A1,…,An), R2(B1,…,Bm) • Output schema is S(A1, …, An, B1, …, Bm) • Very rare in practice; but joinsare very common • Example: Employee x Dependent

  36. Example Employee Dependent Employee x Dependent

  37. Renaming • Does not change the relational instance, denoted as Notation: rS(B1,…,Bn) (R) • Changes the relational schema only • Input schema: R(A1, …, An) • Output schema: S(B1, …, Bn) • Example: • Soc-sec-num, firstname(Employee)

  38. Set Operations: Intersection • Intersection: all tuples both in R1 and in R2, denoted as R1 R2 • R1, R2 must have the same schema • R1 R2 has the same schema as R1, R2 • Example • UnionizedEmployeesRetiredEmployees • Intersection is derived: • R1 R2 = R1 – (R1 – R2) why ?

  39. Theta Join • A join that involves a predicate q, denoted as R1 qR2 • Input schemas: R1(A1,…,An), R2(B1,…,Bm) • Output schema: S(A1,…,An,B1,…,Bm) • Derived operator: R1 qR2 = sq (R1 x R2) • Take the product R1 x R2 • Then apply SELECTC to the result • As for SELECT, C can be any Boolean-valued condition

  40. Theta Join: Example Sells Bar BarInfo := Sells Sells.Bar=Bar.Name Bar

  41. Natural Join • Notation: R1 R2 • Input Schema: R1(A1, …, An), R2(B1, …, Bm) • Output Schema: S(C1,…,Cp) • Where{C1, …, Cp} = {A1, …, An} U{B1, …, Bm} • Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes: • {A1,…,An} {B1,…, Bm}(called the join attributes)

  42. Natural Join: Examples Employee Dependent • Employee Dependent = • PSSN, Name, Dependent-Name(sEmployee.SSN=Dependent.SSN(Employee x Dependent)

  43. Natural Join: Examples R S R S

  44. Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R S ? • Given R(A, B, C), S(D, E), what is R S? • Given R(A, B), S(A, B), what is R S?

  45. Equi-join • Special case of theta join: condition c contains only conjunction of equalities • Result schema is the same as that of Cartesian product • May have fewer tuples than Cartesian product • Most frequently used in practice: R1 A=BR2 • Natural join is a particular case of equi-join • A lot of research on how to do it efficiently

  46. Building Complex Expressions • Algebras allow us to express sequences of operations in a natural way • Example • In arithmetic algebra: (x + 4)*(y - 3) • Relational algebra allows the same • Three notations, just as in arithmetic: • Sequences of assignment statements • Expressions with several operators • Expression trees

  47. Sequences of Assignments • Create temporary relation names • Renaming can be implied by giving relations a list of attributes • Example: R3 := R1 JOINC R2 can be written: R4 := R1 x R2 R3 := SELECTC (R4)

  48. Expressions with Several Operators • Example: the theta-join R3 := R1 JOINC R2 can be written: R3 := SELECTC (R1 x R2) • Precedence of relational operators: • Unary operators --- select, project, rename --- have highest precedence, bind first • Then come products and joins • Then intersection • Finally, union and set difference bind last • But you can always insert parentheses to force the order you desire

  49. Expression Trees • Leaves are operands • either variables standing for relations or particular constant relations • Interior nodes are operators, applied to their child or children

  50. UNION RENAMER(name) PROJECTname PROJECTbar SELECTaddr = “Tennessee St.” SELECT price<3 AND beer=“Bud” Expression Tree: Examples Given Bars(name, addr), Sells(bar, beer, price), find the names of all the bars that are either on Tennessee St. or sell Bud for less than $3 Bars Sells