1 / 55

Lecture 15

Lecture 15. Persistence Database Management Systems. Resources so far. We studied: H ow memory is managed within the virtual memory space of a process. How messages can be exchanged between processes, using inter-process communication. Storage resource.

Download Presentation

Lecture 15

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 15 Persistence Database Management Systems

  2. Resources so far • We studied: • How memory is managed within the virtual memory space of a process. • How messages can be exchanged between processes, using inter-process communication

  3. Storage resource • Storage remains unchanged beyond the end of the process execution. • RTE provides services to processes to support persistent data storage. • Persistent: the data survives after the process with which it was created has ended.

  4. Persistent storage • Can be shared among several processes • We will discuss two types of persistent storage services: • The file system • Database services.

  5. File System (FS) Services • The file system provides the following interface to processes: • Map names to storage locations: • Storage location=storage on disk. • FS maps names ("/usr/local/file.ext") to concrete locations on disk. • Interface through system calls:open(name), create(name), delete(name);queries getfiles(name-pattern). • FS organizes space of file names in hierarchical structure of embedded directories.

  6. File System Services • Access through streams: • Stream interface to read and writea file as a stream of bytes. • FS Allows to seek within the file, read its content and write new content. • Allocationof space / automatic increase storage space (in writes).

  7. File System (FS) Services (interface) • Locking files: • Allows multiple processes to synchronize access on shared files. • One can lock a whole file or just a segment of bytes within the content of a file.

  8. File Systems • FS provides abstract interface of system calls (open, read, write, seek etc) • implemented by specific drivers which know how to interact with specific hardware devices. • Distributed FS: • process can request access to files stored on a remote machine: • Local FS in RTE passes the system calls (open, read, etc) to FS-server on server • Distributed file systems (for example, NFS and SAMBA) have complex locking mechanisms.

  9. Database Management System (DBMS) • Another way to store persistent data. • DBMS is present as a service. • Processes connect through inter-process communication protocol (over TCP). • DBMS manages storage, often by accessing FS on its side.

  10. File systems vs. DBMS - differences • Data Model • FS is very simple: a stream of bytes with the open/close, read/write and lock operation. • DBMS defines a rich data model, manages serialization (encoding/decoding). • Programmer must define the specific data model for the data he is interested in storing.

  11. File systems vs. DBMS • Data Independence – • DBMSs provide an abstract interface for data storage/access. • Programmer does not need to define in which file, at which offset data is to be stored. • One can query the DBMS by content (e.g., "retrieve all content that satisfies certain criteria").

  12. File systems vs. DBMS • Efficient concurrency • DBMSs are built to support thousands of concurrent users. • Ensure data is kept consistent • Provide efficient management of such high concurrency.

  13. File systems vs. DBMS • Reduced application development time • DBMSs allow data manipulation using a simple API. • DBMS manages storage, query optimization, concurrency and integrity management.

  14. DBMS Services • Security management: • Verify that users are authorized to access data. • Specific rights can be granted for each part of the data and each group of users.

  15. DBMS Services • Session and Transaction management: • Users can request the execution of data transactions. • Transaction = complex data operation - read and modify many different objects • Viewed from the outside as a single atomic operation- completely succeeds or not performed.

  16. DBMS Services • Query optimization and execution: • clients interact with DBMS by complex queries • DBMS execute queries most efficiently • Powerful query compilation: relational database servers.“Relational” == “tabular”.

  17. Back-end services • not directly visible to user: • File and Access methods. • Buffer management. • Disk space management.

  18. Transaction: • Unitof work performed against a database management system • Treated in a coherent and reliable way independentlyof other transactions. • A transaction is atomic, consistent, isolated, durable (ACID). • Composed of independent units of work, each reading and/or writing information to DB

  19. Transaction • Provide "all-or-nothing" - work units must complete or take no effect whatsoever. • Must be isolated from other transactions • Results must agree with existing constraints in database. • If completed successfully transactions must be committed to the storage.

  20. Transaction pattern • A transaction is usually issued to the DB in a language like SQL, using a pattern similar to the following: • Begin transaction. • Executedata manipulations and queries. • If no errors occur then committransaction. • If errors occur rollbackthe transaction.

  21. Transactions resources • To implement transactions, the RTE employs the following logic: • Identify resources that will be accessed during the transaction. • SQL: full tables or parts of tables (rows). • Acquire locks on all resources before transaction starts.

  22. Transactions safety • Perform transaction. • Modifications performed on snapshot in memory. • isolation: data in transaction is not affected by other threads. intermediate modifications not committed are not visible to other threads.

  23. Transactions safety • Transaction committed: modifications are merged to persistent storage • If transaction is rollbacked: modifications are deleted and the storage is not modified.

  24. Data Models • Data model = which data values can be sent for storage • Example: Object Oriented method: • Primitive data types (int, bool, char etc) • Complex data types • References, methods etc. • Dominates the programming world. • DBMS - relational model – dominates DB world.

  25. Relational Model • "Relation" = "table","relational" = "based on tables“… • Logical representation of information. • Database normalization - is the process of organizing the columns (attributes) and tables (relations) of a DB to reduce data redundancy and improve data integrity.

  26. Case Study: A Data Model for Colleges • Data model of an academic college. • Different departments, each specializing in teaching a certain domain. • Computer Science / Physics department • Department has a department head and a geographical location. • Students may be enrolled in one department and registered to courses given by it.

  27. building blocks • Relation - set of records of same fields. header of a table • Our example: students, departments, etc. • Attribute (columns) - field within a relation(data type, name) • Our example: student name, department head,

  28. building blocks • Domain - restriction of data types • our example: student ID is 9 digits 0-9 each. • Records - (or tuple), single row within a relation. • Integrity constraints - constraint on attribute with regard to all records • our example: Student ID is unique.

  29. Relational Algebra • Selection - select specific rows from a relation. • Projection - projects specific columns • Set operations - union, intersection, cross-product, set-difference • on relations that have the same structure (same attributes) • Join - cross product followed by selection and projection. • Renaming operation - change name of columns.

  30. Example: select Selection: Projection:

  31. Join example: consider the tables

  32. Cross join

  33. Inner join

  34. Properties • Take existing relations as arguments and return new relations as value • Columns of tables in DB are restricted to being simple values • Columns must be primitive data types • No embedded tables or array. • Complex data - primary and foreign keys.

  35. SQL (Structured Query Language) • Query language - interface with relational DBMS • DDL: Data Definition Language define schemas, relations and domains. • DML: Data Manipulation Language - queries, insertions, updates and deletions

  36. Client interacts with SQL server by sending SQL queries (TCP protocol) • Receiving as answers a result-set. • The client requests the rows of the result as needed (result may be large). • Communication protocol between an SQL server and its clients is statefuland session oriented.

  37. Keys • subset of attributes of relation which uniquely identify rows. • Example: relation for students with attributes name, birthday, address and Social Security Number (SSN). • Name is not a key (two students could have the same name). SSN is a key. • Several attributes as a key: courses (course name, department, semester, lecturer) course name is not a key / but the pair (course name, semester) could be.

  38. Database Normalization and Foreign Keys • minimize duplication of information (also known as "Don't Repeat Yourself" (DRY) • Duplication= data appear twice in the domain of the problem modeled. • Example: relation student (ID, Name, Course) a student is enrolled to 3 courses = 3 records duplicating the ID and Name

  39. Solution: student data will be stored only in a single relation. • Use keys to remove data repetition • use a single attribute SSN to refer to a student.

  40. Join • Retrieve the information on the student given a tuple from the courses relation, we need to operate a join operation on the SSN attribute (which is shared by the 2 relations) • The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no matches in the right table (table_name2).

  41. Foreign key • In student relation, SSN is a key. • In course relation, StudentSSN is not a key • Several courses taken by a single student • StudentSSNrefers to the key of another table. • It is foreign key in the course relation - refers to the student relation.

  42. Data Integrity Constraints • Cannot create a tuple in the course relation that refers to a student that does not exist. • New row in course relation: check foreign key exists in student relation. • Update SSN field of course relation: check new StudentSSNexists in student relation. • Delete row in student relation: verify no rows in course that refer to it • Update row in student relation: verify no rows in course that refer to old value.

  43. When defining a data model on a server, user declares which constraints are part of the model: • keys and foreign keys. • DBMS server enforces constraints and verifies that insert, update, delete operations – have no conflict with constraints.

  44. NULL • belongs to all domains • indicateslack of knowledge- we do not know what its value is • example (1 = NULL) is false and (1 != NULL) is also false! (NULL = NULL) is also false. • compare NULL with value, result is always false.

  45. Indexing • Representing the records in a relation in a data structure to allow efficient queries with respect to certain attributes. • B-Tree / hash-tables • query efficient • cost time when inserting or deleting data. • Implicit indexing - employed by DBMS for attributes defined as unique (keys).

  46. Design Steps

  47. Step A – Identify simple correlation among attributes • A Student has a name and ID. • ignore Courses and Department for now - not simple - hold correlation to other attributes as well: • Department has a name, location and head. • Course has a name.

  48. Step B – Identify unique attributes • Unique attribute – the most important feature. • We wish to have at least one attribute in a relation that is unique • Primary Key - a unique attribute that must have a value (NULL is not allowed). • Student relation: ID = primary key. • Courses relation = artificial course number. department ID.

  49. Any relation that should be later queried efficiently must have a primary key. • in rare cases, no need to define primary key. • example: logs stored in DB, only insertions are performed occasionally based on a time range. • no requirement to uniquely identify a single row of a log - so there is no need to define a key on such a table.

More Related