1 / 53

Student Self-Introduction

COP5725: Principles of Database Management Systems Fall 2011 Dr. Tao Li Florida International University. Student Self-Introduction. Name I will try to remember your names. But if you have a Long name, please let me know how should I call you  Anything you want us to know. Course Overview.

wkarl
Download Presentation

Student Self-Introduction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. COP5725:Principles of Database Management SystemsFall 2011Dr. Tao LiFlorida International University

  2. Student Self-Introduction • Name • I will try to remember your names. But if you have a Long name, please let me know how should I call you  • Anything you want us to know

  3. Course Overview • Meeting time and Location • MW 18:25pm – 19:40pm • Office hours: • Th 14:30pm – 16:30pm or by appointment • ECS 318 Phone 76036 • TA Info • TBA • Course Webpage: • http://www.cs.fiu.edu/~taoli/class/COP5725-F11/ • Announcements • Course Materials • Lecture Notes • Assignments • Username: student • Password: COP5725

  4. Textbook and References • Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems. Third Edition, McGraw Hill, 2003. ISBN: 0-07-246563-8. • Ramez Elmasri and Shamkant B. Navathe Fundamentals of Database Systems. Fourth Edition, Addison-Wesley, 2004. ISBN 0-321-12226-7. • Abraham Silberschatz, Henry F. Korth and S. Sudarshan. Database System Concepts . Fourth Edition. McGraw Hill, 2004. ISBN 0-07-255481-9.

  5. Assignments and Grading • Reading/Written Assignments • Projects • Will be done in groups • Group will be assigned by the instructor • Midterm Exam • Final Exam • Class attendance is mandatory. • Final Grade • Quizzes and Class Participation: 10% • Midterm Exam: 25% • Final Exam: 25% • Assignments: 40%

  6. Course Objectives • To study databases from three viewpoints, in particular, those of the database user, the database designer and manager, and the database system implementer. • Sub-objectives • To understand the principles of relational database management systems and their languages – in particular SQL. • To learn the methodology for building applications on top of database management systems – the so called data modeling process. • To investigate the internal operations of relational database management systems.

  7. Tentative Topics • Database Design • Relational Model • Relational Algebra and Calculus • SQL • Indexing and Storage • Transaction Processing • Query Evaluation and Optimization • XML • Data Warehouse • Spatial Data Management

  8. Historical Perspective • 1973 Turing Award Winner: Charles Bachman ( seminal work in network data model) • 1981 Turing Award Winner: Edgar Codd (seminal work in relation data model) • 1999 Turing Award Winner: James Gray (contributions in database transaction management)

  9. Acknowledgements Some of the material used in this course is drawn from other sources: • Prof. Mohammed Zakiat RPI • Professor Mitch Cherniack at brandeis • Professor Joakim Kalvenes at SMU

  10. What is a Database? • A database (DB) is an integrated and structured collection of data about the entities that exist in the environment that is being modeled. • A database management system (DBMS) is the generalized tool/software that facilitates the management of and access to the database.

  11. What need to be managed? • Data Organization • Data Retrieval • Data Integrity

  12. 1. Data Organization • Logical: Data Models • Physical: Data Storage 2. Data Retrieval • Queries = Declarative “Retrieval Programs” • Easy-to-write (compared to standard PL’s) • Efficient execution (w/ query processing) 3. Data Integrity • Managing Concurrency (Transactions) • Managing Crashes (Recovery) • Managing Semantic Inconsistencies (Integrity Constraints) SummaryWhat Gets Managed in a DBMS?

  13. In A Sentence • How to manage large amounts of persistent, and structureddata that are shared among distributed users and processes and whose integrity must be maintained and whose security must be controlled?

  14. Why is it different from storing them in files and writing applications to access these files?

  15. DBMSs VS File Processing Why do we need a DBMS? Why not just use files to store data? User ApplicationProgram File

  16. Old File Method • Stores permanent data in various files • Different application programs to extract data from, and add data to, the appropriate files

  17. Pay History Benefits Employee Employee Choices Old File Method Programs Files Payroll Data Definition File 1 … File 2 … Benefits Data Definition File 3 File 2 File 4 …

  18. Problems I: Data Organization • Data Redundancy and Inconsistency • The address and telephone numbers of an employee may appear both in the Benefits file and the Employee file • Data Isolation • Data in various files and files in different formats • Data definition is part of application programs • programs & data are interdependent • File defined in programs • Cannot read file without its definition • Every time you alter file, you must rewrite code • Change in a program/file will crash other code Hard to develop application programs Hard to change data

  19. Problem II: Data Retrieval • Difficulty in accessing data • Find all employees who live in 33134 • Find the pay history of the employee who enrolled in benefit plan B • No Sophisticated Tools for Selective Data Access • Not efficient • scan the file, • Complex program for efficient Access • Special code for different queries • Want all “retrieval” program to be • Easy to write • Convenient and Efficient to execute

  20. Problem 3: Data Integrity • No Support for Sharing (Multi-user problems) • How to prevent simultaneous modification? • No Coping Mechanisms for System Crashes • How to recover work? • No Means of Preventing Data Entry Errors

  21. Data is most important Data defined first Standard format Access through DBMS Database Approach PROGRAM 1 DBMS Query Processor Integrated Database PROGRAM 1 Transaction Mgr … PROGRAM 2 • Data independence • Change data definition without changing code • Alter code without changing data • Move/split data without changing code.

  22. Why Use database approach? • Data independence and efficient access. • Minimal data redundancy • Sharing of Data • Reduced application development time. • Data integrity and security. • Uniform data administration. • Concurrent access, recovery from crashes.

  23. An Overview of Data Organization Occurs at two levels: • Logical • How do users see data? (e.g., what data is stored where?) • data model, data semantics • Physical • How does data get stored?

  24. Logical Level: Data Models • Data Models • collection of concepts for describing data. • Framework for organizing, interpreting data • Relational (tables), Entity-Relationships (E/R) • Aschemais a description of a particular collection of data, using a given data model. • Relational data model. • Relation, basically a table with rows and columns. • Every relation has a schema, which describes the columns, or fields.

  25. Instance

  26. Physical Data Model • Physical models describe how data is stored on disk and what access methods are available to it • Which data type is assigned to which attribute ? • What indices are available (Indexing) • Where can data be stored? • When to Move data? (Buffer Management) • Mapping data to files? (File Management) • Data independenceis the separation of the physical implementation from the conceptual view of the database.

  27. Levels of Abstraction • Many views, single conceptual (logical) schemaand physical schema. • Views describe how users see the data. • Conceptual schema defines logical structure • The underlying data in terms of the data models • Physical schema describes the files and indexes used. View 1 View 2 View 3 Conceptual Schema Physical Schema • Schemas are defined using DDL; data is modified/queried using DML.

  28. Example: University Database • Conceptual schema: • EMP(eno: string, ename: string, Title: string) • PROJ(pno: string, pname: string, budget: integer) • WORKS(eno: string, pno: string, resp: string, duration: integer) • Physical schema: • Relations stored as unordered files. • Index on first column of EMP. • External Schema (View): • Project_info(pid:string,ename:string) • Defined through SQL query

  29. Data Independence • Applications are insulated from how data is structured and stored. • Invisibility (transparency) of the details of conceptual organization, storage structure and access strategy to the users • Logical • transparency of the conceptual organization • transparency of logical access strategy • Physical • transparency of the physical storage organization • transparency of physical access paths • One of the most important benefits of using a DBMS!

  30. Internal view ANSI/SPARC Architecture ASSIGNMENT(ENO,PNO,ENAME,PNAME) Users External Schema External view External view External view EMP(ENO: string, ENAME: string, TITLE: string) PROJ(PNO: string, PNAME: string, BUDGET: integer) WORKS(ENO: string, PNO: string, RESP: string, DUR: integer) Conceptual Schema Conceptual view Internal Schema Store all the relations as unsorted files. Build indexes on EMP(ENO), PROJ(PNO) and WORKS(ENO,PNO). DBMS

  31. An Overview of Data OrganizationDBMS Components DBA DDL Commands DDL Interpreter Storage Manager File Manager Buffer Manager Secondary Storage Metadata Data Schema

  32. An Overview of Data Retrieval • Query = Declarative data retrieval program • declarative: describes what data, not how to retrieve it • Declarative: Give me all appointments with John • Non-declarative: Scan appointments from start to finish • Why? • 1. Easier-to-write • 2. Efficient to execute (Why?)

  33. Query Language • SQL is a complete programming language for all possible database actions • Relational algebra is a base language for user queries to “extract data” • Embedded SQL is a language for combining SQL programs with a host language • ODBC/JDBC/Dynamic SQL are standard protocols for accessing a database system from a remote machine

  34. An Overview of Data Retrieval Query Query Processor Query Optimizer Query Evaluator Plan Data • Query Optimizer • “compiler” for queries (aka “DML Compiler”)

  35. Index An Overview of Data Retrieval • Index = Auxiliary Data Structure to Facilitate Data Access • Much like a library card catalog ? I’m looking for “x” x Look here! • Essential for Query Processing • ~ 50% of query optimization = choosing indexes

  36. DBProgrammer User Query Code w/ embedded queries DML Precompiler Statistics An Overview of Data RetrievalDBMS Components DBA DDL Commands Query Processor Query Optimizer DDL Interpreter Query Evaluator Storage Manager File Manager Buffer Manager Secondary Storage Metadata Indices Data Schema

  37. An Overview of Data Integrity • Why Concurrent Access to Data Must Be Managed • Jane@ATM3: Withdraws $50 from Acct #55 • 1. Get balance from database • 2. If balance > $50 then • a. balance := balance - 50 • b. dispense cash • c. update balance in database • Joe@ATM2: Withdraws $100 from Acct #55 • 1. Get balance from database • 2. If balance > $100 then • a. balance := balance - 100 • b. dispense cash • c. update balance in database • Q. Acct #55’s Initial Balance = $400, Final Balance = ? A: It depends...

  38. An Overview of Data Integrity • Joe@ATM2: Withdraws $100 from Acct #55 • 1. Get balance from database • 2. If balance > $100 then • a. balance := balance - 100 • b. dispense cash • c. update balance in database • Jane@ATM3: Withdraws $50 from Acct #55 • 1. Get balance from database • 2. If balance > $50 then • a. balance := balance - 50 • b. dispense cash • c. update balance in database • Acct #55’s Balance: $300 • Acct #55’s Balance: $400 • Acct #55’s Balance: $250

  39. An Overview of Data Integrity • Joe@ATM2: Withdraws $100 from Acct #55 • 1. Get balance from database • 2. If balance > $100 then • a. balance := balance - 100 • b. dispense cash • Jane@ATM3: Withdraws $50 from Acct #55 • 1. Get balance from database • 2. If balance > $50 then • a. balance := balance - 50 • b. dispense cash c. update balance in database c. update balance in database • Soln: Transactions and Concurrency Control • Acct #55’s Balance: $400 • Acct #55’s Balance: $350 • Acct #55’s Balance: $300

  40. An Overview of Data Integrity • Jim transfers $100 from Acct #55 to Acct #376 • 1. Get balance from database for acct #55 • 2. If balance55 > $100 then • a. balance55 := balance55 - 100 • b. update balance55 in database • c. get balance from database for acct #376 • d. balance376 := balance376 + 100 • e. update balance376 in database • Suppose there’s a crash between the dotted lines... • Soln: Recovery Management

  41. An Overview of Data Integrity • Example #1: • Example #2: No Branch # 342! • Soln: Integrity Constraints

  42. Transaction • Key concept is transaction, which is an atomicsequence of database actions (reads/writes). • Transactions • Execute user requests as atomic units • May contain one query or multiple queries • Provide • Concurrency transparency • Multiple users may access the database, but they each see the database as their own personal data • Concurrency control • Failure transparency • Even when system failure occurs, database consistency is not violated • Logging and recovery

  43. Transaction Properties • Atomicity • All-or-nothing property • Consistency • Each transaction is correct and does not violate database consistency • Isolation • Concurrent transactions do not interfere with each other • Durability • Once the transaction completes its work (commits), its effects are guaranteed to be reflected in the database regardless of what may occur

  44. Concurrency Control • Concurrent execution of user programs is essential for good DBMS performance. • Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently. • Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed. • DBMS ensures such problems don’t arise: users can pretend they are using a single-user system.

  45. Scheduling Concurrent Transactions • DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1’ ... Tn’. • Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.) • Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions.

  46. Crash Recovery: Ensuring Atomicity • DBMS ensures atomicity(all-or-nothing property) even if system crashes in the middle of an transaction. • Idea: Keep a log(history) of all actions carried out by the DBMS while executing a set of transactions: • Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol; OS support for this is often inadequate.) • After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if log entry wasn’t saved before the crash, corresponding change was not applied to database!)

  47. The Log • The following actions are recorded in the log: • Ti writes an object: The old value and the new value. • Log record must go to disk beforethe changed page! • Ti commits/aborts: A log record indicating this action. • Log records chained together by transaction id, so it’s easy to undo a specific transaction (e.g., to resolve a deadlock). • Log is often duplexed and archived on “stable” storage. • All log related activities (and in fact, all CC related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.

  48. Transaction Manager Recovery Manager Integrity Constraints An Overview of Data OrganizationComponents of a DBMS DBProgrammer DBA User DDL Commands Query Code w/ embedded queries Query Processor Query Optimizer DML Precompiler DDL Interpreter Query Evaluator Storage Manager File Manager Buffer Manager Secondary Storage Metadata Indices Data Statistics Schema

More Related