1 / 17

CSC 485D/586D/SEng 480D Introduction

CSC 485D/586D/SEng 480D Introduction. What’s a database?. In essence a database is nothing more than a collection of information that exists over a long period of time.

erica
Download Presentation

CSC 485D/586D/SEng 480D Introduction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSC 485D/586D/SEng 480D Introduction

  2. What’s a database? • In essence a database is nothing more than a collection of information that exists over a long period of time. • Databases are empowered by a body of knowledge and technology embodied in specialized software called a database management system, or DBMS. • A DBMS is a powerful tool for creating and managing large amounts of data efficiently and allowing it to persist over long periods of time, safely. • Among the most complex types of software available.

  3. The database [management] system • Allows users to create new databases and specify their schema(logical structure of the data), using a data-definition language. • Gives user the ability to query the data and modify the data, using a query language and data-manipulation language. • Supports intelligent storage of very large amounts of data. • Protects the data from accident or not proper use. • Example: We can require from the DBMS to not allow the insertion of two different employees with the same SIN. • Allows efficient access to the data for queries and modifications. • Example: The use of indexes over a specified field, e.g. on the name field for employees, allows fast response for queries asking a specific name. • Controls access to data from many users at once (concurrency), without allowing “bad” interactions that can corrupt the data accidentally. • Recovers from failures and crashes.

  4. Relational Model • Based on tables: • Today used in most DBMS’s • Oracle, SQL-Server, IBM DB2, Sybase, Microsoft Access etc… • Challengers: Object oriented DB’s (ObjectStore, Objectivity…) • Object-relational:: oo-extension of rel’s

  5. Database Studies • Design of databases. • What kinds of information go into the database? • How is the information structured? • How do data items connect? • Database programming. • How does one express queries on the database? • How does one use other capabilities of a DBMS, such as transactions or constraints, in an application? • How is database programming combined with conventional programming? • Database system implementation. • How does one build a DBMS, including such matters as query processing, transaction processing and organizing storage for efficient access?

  6. Fictitious Megatron 2006 DBMS • Stores relations as Unix files • Students(name, sid, dept) is stored in the file /home/megatron/students as Smith#123#CS Jones#533#EE • Schemas are stored in /home/megatron/schemas e.g. Students#name#STR#id#INT#dept#STR Depts#name#STR#office#str

  7. Megatron sample session mayne$ megatron WELCOME TO MEGATRON 2006 megaSQL% SELECT * FROM Students; Name id dept ---------------------------------- Smith 123 CS Johnson 522 EE megaSQL%

  8. Megatron sample session II megaSQL% SELECT * FROM Students WHERE id >= 500 | HighId.txt; megaSQL% more HighId.txt Jones#522#EE megaSQL% quit THANK YOU FOR USING MEGATRON 2006 mayne$

  9. Megatron Implementation • To execute SELECT * FROM R WHERE <COND> • Read file schema to get attributes of R • Check that the <COND> is semantically valid for R • Read file R, • for each line • check condition • if OK, display • If we pipe the result into a file, say T, then add an entry for T in the file /home/megatron/schemas

  10. Megatron Implementation II • To execute SELECT office FROM Students, Dept WHERE Students.name = 'Smith' AND Students.dept = Depts.name; • Read file schema to get attributes and do semantic check. • If Ok, then, for each tuple s in Students for each tuple d in Depts if s and d satisfy the WHERE condition, display the office value from s

  11. What’s wrong with Megatron? • Tuple layout on disk: no flexibility for DB modifications. • Change CS to ECON and the entire file has to be rewritten. • Search Expensive: no indexes; always read entire relation. • Brute­force query processing. • Did we need to look at all pairs of student­dept tuples? • No buffer manager: everything comes off of disk all the time. • No concurrency control: several users can modify a file at the same time with unpredictable results. • No reliability: can lose data in a crash or leave operations half done. • Little security: file system protection too coarse.

  12. Architecture of a DBMS • The “cylindrical” component contains not only data, but also metadata, i.e.info about the structure of data. • If the DBMS is relational the metadata includes: • names of relations, • names of attributes of those relations, and • data types for those attributes (e.g., integer or character string). • Often a database maintains indexes for the data. • Indexes are part of the stored data. • A description of which attributes have indexes is part of the metadata.

  13. Storage and BufferManager • The job of the storage manager is • to obtain requested information from the data storage, and • to modify the information to the data storage when requested. • The buffer manager handles main memory. It obtains and returns blocks of data from/to the file manager and stores the blocks temporarily in main memory pages. • E.g. 1 block = 1 page = 4,000 to 16,000 bytes.   • One block is the smallest unit of data that is read/written from/to disk.

  14. Query Processor • The query processor handles: queries+modifications to the data. • Its job is to find the best way to carry out a requested operation and, • to issue commands to the storage manager that will carry them out. • E.g. A bank has a DB with two relat.:   • Customers (name, ssn, address), • Accounts (accountNo, balance, ssn) • Query: “Find the balances of all accounts of which Sally is the owner.” • SELECT Accounts.balance • FROM Customers, Accounts • WHERE Customers.ssn = Accounts.ssn AND Customers.name = “Sally”

  15. Query Processor (Cont.) • What this query logically says is: • Make the Cartesian product of the tables specified in the FROM-clause, • i.e. associate each tuple of Customers with each tuple of Accounts. • We get a new temporary relation R with longer tuples, • the attributes are renamed so as to include the name of originating relation. (Customer.ssnetc.) • Chose from R only the tuples satisfying the condition in the WHERE clause. • Produce in the answer only the values of attributes in SELECT-clause. • Of course,if we would answer this query as it says the performance would be terrible (step 1). • Supp. we have an index on name of Customer and an index on ssn of Accounts. • Using the index on name of Customer we need usually three disk access. • One more access gets us the tuple for “Sally” • Similarly we need four disk accesses for finding the correspond. account. • If there are several accounts of “Sally” we need a few more accesses.

  16. Transaction Manager • The transaction manager is responsible for the integrity of the system. It must assure that: • several queries running simultaneously do not interfere with each other and that, • the system will not lose data even if there is a power failure. • The transaction manager interacts with: • execution engine, • it may need to delay certain queries or operations in order to avoid conflicts. • storage manager • schemes for protecting the data usually involve storing a log of changes to the data.

  17. What will be covered • Storage Systems: • Physical devices and characteristics, especially disks. • Logical layout of data; data structures, especially, indexed­sequential files, B­trees, hashing. • Multidimensional indexes for GIS and OLAP • Query optimization (we concentrate a lot here): • Query­plan generation; algebraic transformations. • Join methods. • Resilience: • Logging. • Authorization and encryption. • Transaction processing: Serialization, deadlocks, locking, timestamping. • Distributed DB's. • OLAP in detail • Data Integration

More Related