1 / 49

Lecture 10 Term 2

Lecture 10 Term 2. 10/3/14 Data Integrity and Security. Sub-queries within sub-queries. • A sub-query is a query within a query. We can also have sub-queries within sub-queries (a query within a query within a query). These are called nested subqueries and we can go to 255 levels.

morag
Download Presentation

Lecture 10 Term 2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 10 Term 2 10/3/14 Data Integrity and Security

  2. Sub-queries within sub-queries • A sub-query is a query within a query. We can also have sub-queries within sub-queries (a query within a query within a query). These are called nested subqueries and we can go to 255 levels. • An example of this is select emp_name, sal from employee where sal > (select avg(sal) from employee where sal > (select losal from salgrade where grade =1) ); Output: WALSH 3500 Walsh 3500 1550 700

  3. EMP_NO EMP_NAME JOB DEPT_NO SAL COMM HIREDATE MANAGER_NO ---------- -------------------- -------------------- ---------- -------------------- -------------------- --------- ---------- 3 HEARNE ANALYST 4 800 100 07-JAN-11 4 BYRNE CLERK 3 100 20-FEB-09 6 WALSH MANAGER 4 3500 11-OCT-07 8 HARTE ACCOUNTANT 3 700 300 20-JAN-12 11 CASEY MANAGER 1 1000 150 17-OCT-08 12 MURRAY CLERK 2 360 80 27-JAN-12 9 DOHERTY CLERK 3 150 28-JUN-11 10 MARTIN MANAGER 2 900 190 07-SEP-13 • GRADE LOSAL HISAL • ---------- ---------- ---------- • 5 3001 9999 • 4 2001 3000 • 3 1401 2000 • 2 1201 1400 • 700 1200

  4. Usage of Multiple Row operators [> ALL] More than the highest value returned by the subquery [< ALL] Less than the lowest value returned by the subquery [< ANY] Less than the highest value returned by the subquery [> ANY] More than the lowest value returned by the subquery [= ANY] Equal to any value returned by the subquery (same as IN)

  5. Multiple Row Sub queries • Returns more than one row • Uses multiple comparison operators • IN – equal to any member on the list • ANY – compare value to each value returned by the subquery • ALL – compare value to every value returned

  6. More on Subqueries select emp_no, emp_name, job, sal from employee where sal < ALL (select sal from employee where job='MANAGER'); EMP_NO EMP_NAME JOB SAL ---------- -------------------- -------------------- -------------------- 4 BYRNE CLERK 100 ALL compares a value to every value returned by a subquery Example above displays employees whose salary is less than the salary of all employees with job MANAGER and whose job is not MANAGER >ALL means more than the maximum < ALL means less than the minimum SAL -------------------- 3500 1000 900

  7. More on All.. select emp_no, emp_name, job, sal from employee where sal > ALL (select sal from employee where job='MANAGER');

  8. Any select emp_no, emp_name, job, sal from employee where sal = ANY (select sal from employee where job='MANAGER'); EMP_NO EMP_NAME JOB SAL ---------- -------------------- -------------------- -------------------- 6 WALSH MANAGER 3500 11 CASEY MANAGER 1000 10 MARTIN MANAGER 900 ANY (SYNONYM SOME) compares a value to EACH value returned by a subquery SAL should be ANY of the Manager’s salary >Any means more than the minimum < Any means less than the maximum Some further examples here: • http://www.w3resource.com/sql/subqueries/multiplee-row-column-subqueries.php

  9. Using Outer and Inner Rows • We have seen that a sub-query is a query within a query. • The inner query is run first, and the result passed to the outer query. • What happens in a correlated sub-query? • A correlated sub-query takes this a step further. • In a correlated sub-query, the outer rows and inner rows are used together. The query is executed once for each row in the outer query. • In queries that include a correlated subquery (also known as a repeating subquery), the subquery depends on the outer query for its values. • This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query.

  10. Correlated Sub queries • As opposed to a regular subquery, where the outer query depends on values provided by the inner query • A correlated subquery is one where the inner query depends on values provided by the outer query • This means that in a correlated subquery, the inner query is executed repeatedly, once for each row that might be selected by the outer query • Correlated subqueries can produce result tables that answer complex management questions.

  11. Continued.. • Let us take the example of determining all employees who earn less than the average salary for their job type. select emp_name, job, sal from employee e where sal < ( select avg(sal) from employee e where job=e.job ) order by job; EMP_NAME JOB SAL -------------------- -------------------- -------------------- HARTE ACCOUNTANT 700 HEARNE ANALYST 800 BYRNE CLERK 100 MURRAY CLERK 360 DOHERTY CLERK 150 MARTIN MANAGER 900

  12. SELECT * FROM EMPLOYEE ORDER BY JOB; EMP_NO EMP_NAME JOB DEPT_NO SAL COMM HIREDATE MANAGER_NO ---------- -------------------- -------------------- ---------- -------------------- -------------------- --------- ---------- 8 HARTE ACCOUNTANT 3 700 300 20-JAN-12 3 HEARNE ANALYST 4 800 100 07-JAN-11 4 BYRNE CLERK 3 100 20-FEB-09 12 MURRAY CLERK 2 360 80 27-JAN-12 9 DOHERTY CLERK 3 150 28-JUN-11 6 WALSH MANAGER 4 3500 11-OCT-07 10 MARTIN MANAGER 2 900 190 07-SEP-13 11 CASEY MANAGER 1 1000 150 17-OCT-08

  13. The Process The sequence that is followed is: • Get the first row from the outer query • Run the inner query using the value from the row from step A • Determine whether to return a result based on the outer where clause • Repeat process until all outer rows have been selected

  14. EMP_NO EMP_NAME JOB DEPT_NO SAL COMM HIREDATE MANAGER_NO ---------- -------------------- -------------------- ---------- -------------------- -------------------- --------- ---------- 3 HEARNE ANALYST 4 800 100 07-JAN-11 4 BYRNE CLERK 3 100 20-FEB-09 6 WALSH MANAGER 4 3500 11-OCT-07 8 HARTE ACCOUNTANT 3 700 300 20-JAN-12 11 CASEY MANAGER 1 1000 150 17-OCT-08 12 MURRAY CLERK 2 360 80 27-JAN-12 9 DOHERTY CLERK 3 150 28-JUN-11 10 MARTIN MANAGER 2 900 190 07-SEP-13

  15. Continued… No other analyst in my dataset no matching inner record EMP_NAME JOB SAL -------------------- -------------------- -------------------- HEARNE ANALYST 800 BYRNE CLERK 100 WALSH MANAGER 3500 HARTE ACCOUNTANT 700 CASEY MANAGER 1000 MURRAY CLERK 360 DOHERTY CLERK 150 MARTIN MANAGER 900 select emp_name, job, sal from employee e where sal < ( select avg(sal) from employee where job=e.job ); No other accountant in my dataset no matching inner record EMP_NAME JOB SAL -------------------- -------------------- -------------------- BYRNE CLERK 100 CASEY MANAGER 1000 DOHERTY CLERK 150 MARTIN MANAGER 900

  16. Example of the process • Lets look at this process for the query above. • Get first row from employee table (outer query) • Run the inner query. • The where clause ensures that the job type matches between the two rows. • Get the average salary for the job type from the outer row • The outer where clause determines whether or not to return a row. • In this case, is the salary less than the average salary for that employees job type. • Get the next row and repeat.

  17. Which is it? If you are unsure whether a sub-query is a correlated sub-query, there is a simple check. • If the inner query compares to a row from the outer query, then it is a correlated sub-query. http://www.tutorialspoint.com/sql_certificate/subqueries_to_solve_queries.htm

  18. EXISTS • We have already seen the SOME, ANY, and ALL operators. • There is another operator which is often used with correlated sub-queries. • This operator is the EXISTS operator (NOT EXISTS is its opposite). • EXISTS determines if the inner query returns any values.

  19. EXISTS (cont) • An example of this is determining all employees who have people reporting to them (i.e. they manage people). select emp_name, emp_no, job from employee e where exists (select emp_name from employee where manager_no=e.emp_no);

  20. The Process for EXISTS example • Get the first row from the inner query • Run the inner query using the value from the row from step A • Determine whether to return a result based on the outer where clause • Repeat process until all outer rows have been selected. • The outer query gets the first row (the first employee) • The inner query goes through all employees checking if their managers number (manager_no) is equal to the employee number from the outer query. • If one or more employees manager number matches the employee number from the row from the outer query, then we have a match • The “outer” employee does have people reporting to him/her • If there is a match, then the inner query will return rows. • If the inner query returns rows, then the EXISTS operator is true and the “outer” employee will be returned.

  21. Database Administrator • Oversee a staff of database specialists • Final recommendations for DB design • Load and maintain DB • Establish security controls • Perform backup and recovery

  22. Data Administration • The role of managing the organisation’s data resources, concerned with information policy, data planning, maintenance of data dictionaries, and data quality standards. • http://www.youtube.com/watch?v=Jos6dRLzi80

  23. Data Administration Data Administrator Database technology And management Database Management System Data planning and modelling technology Users

  24. NOSQL • NoSQL encompasses a wide variety of different database technologies and were developed in response to a rise in the volume of data stored about users, objects and products, the frequency in which this data is accessed, and performance and processing needs. • Relational databases, on the other hand, were not designed to cope with the scale and agility challenges that face modern applications, nor were they built to take advantage of the cheap storage and processing power available • Document databases • Key-value stores • Graph stores • Wide-column stores • http://www.mongodb.com/learn/nosql

  25. PostgreSQL • Open source object-relational database system • It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness • Runs on all major operating systems • http://db-engines.com/en/system/Oracle%3BPostgreSQL

  26. Threats to the Database • User Errors • Software errors • Hardware failures • Malicious damage • Breach of confidentiality • Concurrency errors

  27. Data Integrity • Entity Integrity • Referential Integrity

  28. Data Validation • Defence against erroneous data entry to the database • Example You can define rules to validate certain fields against specific values, a range of values, patterns, and length. You can define multiple rules, which are listed in the ruleList element. Each rule you define is contained within a rule element, and each rule element only defines one rule (though the rule can be complex and include conditional rules and operators).

  29. Type Checking • Range of acceptable input values • Admissible operations on the data • How the data is handled on output

  30. Validation Techniques • Constraints definable with the DBMS • Field level validation • Input mask • Validation rule/text • required • Record level validation • Control validation • Form-level validation • SQL constraints

  31. More validation… • Assertions • General controls on content of the database • Triggers • General validation mechanism enabling execution of units of code when certain related events occur

  32. Back-up and Recovery • Strategy for protecting against loss of database data • Periodic copying of the data to some other storage unit and place the copy in a secure location • Typically taken when the system is inactive • Frequency dictated by: • the rate of transactions applied to the database • The level of availability of service demanded by the application • The balance of time requires to perform the backup compared with the potential delay in recovery

  33. Transaction Logs • Journal to record the effect of all changes made to the database by transaction by application systems • In the event of systems failure, it is possible to recover the database by restoring the most recent backup and then re-executing the transactions recorded in the transaction log

  34. Checkpoints • Transaction status records taken at intervals during normal processing of transactions • At checkpoint: • Initiation of new transactions are temporarily suspended • All memory buffers are flushed to disk (ensures that all committed transactions have been actioned on the physical database) • All currently active transactions are noted and recorded in the transaction log

  35. Data Privileges and Permissions • Multi-user databases require control of data in terms of access to modification and accessibility • Necessary to ensure who can read or change the data • Who has permission to: • Create/change or drop a table • To grant privileges to others

  36. Facilities in SQL • Use the commands: • GRANT • REVOKE • E.g GRANT select, update ON Order TO Joe, Mary; REVOKE update ON Order TO Joe:

  37. Facilities in MS Access • Administer permission allows a user or group to change anyone’s permissions • Allowed by user/group by object and permission • This should only be accessed where necessary

  38. For consideration… • Describe the techniques that can be employed to ensure data recovery in the event of a system failure • Consider the techniques that can be employed to ensure that data maintains its accuracy in the database

  39. Requirements collection and analysis • Involves the collection and analysis of the requirements of the users and application areas. • Gather info for each major user view. • A description of the data used or generated. • Details of how the data will be used. • Additional requirements for the new db application.

  40. Multitple User Views • Three approaches to dealing with a number of user views.. • The centralised approach • The view integration approach • A combination of both approaches.

  41. Database Design • Includes the conceptual, logical and physical design of the database. • Conceptual design • models the information used in an organisation independent of all physical considerations. • Logical database design • identifies the important objects that need representation. • Physical design • decides how the logical design is physically implemented.

  42. DBMS Selection(optional) • Selection of a suitable DBMS for the database applications. • If no DBMS exists, this is an appropriate part of the lifecycle to make the selection. • A simple approach is to tick off DBMS features against company requirements.

  43. Application Design • Phase used for designing the user interface and the application programs that use and process the database. • Need to design the interaction between the application itself and the database (transaction design). • Requires design of an appropriate interface to the database application.

  44. Prototyping (optional) • Involves building a working model of the db application. • At various points during the design of the application there is the option of build a prototype of the system. • Two common strategies used. • Requirements Prototyping • Evolutionary Prototyping

  45. Implementation • Involves creating the external, conceptual, and internal db definitions and the applications programs. • It is done using DDL (data definition language) creating the database structure and empty database files. • The application programs are usually third or fourth generation languages

  46. Data Conversion and Loading • Involves converting and loading data from the old system to the new system. • Required only when a new system is replacing an old system. • It is common that the DBMS has utilities that load data into the new db. • This data is automatically converted into the required format of the new db files.

  47. Testing • The database application is tested for errors and validated against the requirements specified by the users. • Before going live the db application is put through a number of planned test strategies and loaded with realistic data so that the entire testing process is methodically and rigorously carried out.

  48. Maintenance • When the db application is fully implemented the system is continually monitored and maintained. • New requirements are incorporated into the db application through the preceeding stages of the lifecycle. • Maintaining and upgrading the db application when required. • Involves monitoring the performance of the db applcation.

  49. Critical Factors of Database Design • Work interactively with users as much as possible. • Follow a structured methodology throughout the data modeling process. • Employ a data-driven approach. • Incorporate structural and integrity considerations into the data models. • Combine conceptualization, normalisation, and transaction validation techniques into the data modelling methodology.

More Related