1 / 48

Introduction to Database Systems

Introduction to Database Systems. Today’s session. Some history Origins of DB Organisational structure Organisational information systems Data storage DBMS Data models. What is “Data”?. ANSI definition: Data

conan
Download Presentation

Introduction to Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Database Systems

  2. Today’s session • Some history • Origins of DB • Organisational structure • Organisational information systems • Data storage • DBMS • Data models

  3. What is “Data”? • ANSI definition: • Data • A representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means. • Any representation such as characters or analog quantities to which meaning is or might be assigned. Generally, we perform operations on data or data items to supply some informationabout an entity. • Volatile vs. persistent data • Our concern is primarily with persistent data

  4. PROGRAM 1 DATA SET 1 Data Management PROGRAM 2 DATA SET 2 Data Management PROGRAM 3 DATA SET 3 Data Management Early Data Management - Ancient History • Data are not stored on disk • Programmer defines both logical data structure and physical structure (storage structure, access methods, I/O modes, etc) • One data set per program. High data redundancy.

  5. Problems • There is no persistence. • All data is transient and disappears when the program terminates. • Random access memory (RAM) is expensive and limited • All data may not fit available memory • Programmer productivity low • The programmer has to do a lot of tedious work.

  6. File Processing - Recent (and Current) History • Data are stored in files with interface between programs and files. • Various access methods exist (e.g., sequential, indexed, random) • One file corresponds to one or several programs. PROGRAM 1 Data Management FILE 1 File System Services PROGRAM 2 Redundant Data Data Management PROGRAM 3 FILE 2 Data Management

  7. File System Functions • Mapping between logical files and physical files • Logical files: a file viewed by users and programs. • Data may be viewed as a collection of bytes or as a collection of records (collection of bytes with a particular structure) • Programs manipulate logical files • Physical files: a file as it actually exists on a storage device. • Data usually viewed as a collection of bytes located at a physical address on the device • Operating systems manipulate physical files. • A set of services and an interface (usually called application independent interface – API)

  8. Problems With File Systems • Data are highly redundant • sharing limited and at the file level • Data is unstructured • “flat” files • High maintenance costs • data dependence; accessing data is difficult • ensuring data consistency and controlling access to data • Sharing granularity is very coarse • Difficulties in developing new applications

  9. Database Approach PROGRAM 1 DBMS Query Processor Integrated Database PROGRAM 1 Transaction Mgr … PROGRAM 2

  10. Origins of Databases • c1455: Gutenberg invents printing. Explosive interest in publication of books (analogous with explosive growth of Web in early 1990s) leads to public libraries • Libraries were first to introduce standards for information storage and retrieval • These paper-based systems were extended and enhanced, and filing, indexing and classification schemes were developed • Second World War: accelerated R&D in computing technologies spawned capability to computerise maintenance of records

  11. Computerised Data Storage • Advantages of computerised data storage over paper-based systems include: • ability to store data compactly (e.g Britannica CD) • enhanced data retrieval • ability to access data remotely, e.g. from a mobile workstation, off-site location, or distant branch • ability to share data amongst multiple users with concurrent access • facility to automate regular, speedy back-ups • enhanced data editing • Most significant disadvantage is vulnerability e.g system crash, corrupted data, viruses, hackers, etc.

  12. Information: A Corporate Asset • Information is a vital corporate asset. Without accurate, current, relevant information, mistakes and misjudgements may be made • Data management is an essential capability in the modern business environment / information society • A knowledge organisation is one in which the primary asset is information; its competitive advantage is derived from effective use of documented knowledge. Examples: accounting firms, marketing companies, software houses • Organisational memory extends and amplifies information / knowledge by capturing, organising, disseminating, and reusing it

  13. Organisational Memory Database (logical grouping of related files) File / Table (collection of related records) Record (collection of related fields, bound together as single units) Field (part of a record reserved for a particular data item) Byte (group of eight binary bits - can represent a single character) Organisational Memory

  14. Organisational Memory Organisational memory Data Informed decisions Improved products and services

  15. Attributes of Organisational Memory • Current • Timely • real-time systems are commonplace in modern business environment • Relevant • data is only useful if relevant to task in hand • Shareable • Complete • Accurate and consistent

  16. Attributes of Organisational Memory • Transportable • authorised personnel should have access to data anywhere, anytime • Secure against unauthorised access

  17. Organisational Information Systems • Generally, there is an alignment between business units and core operational systems • Typical core systems are: • Sales & Marketing Department: Customer management system, Order processing system • Operations Unit: Purchasing system, Inventory control system • Finance Department: Accounts payable and receivable systems, Credit Management system • There are interdependencies between these systems; hence the need for an integrated data management approach

  18. Data Storage • Information systems create, read, update and delete data • Data can be stored in conventional files or databases • Filesare collections of similar records • Databasesare collections of interrelated files. • Records can be linked through specified relationships to records in other files

  19. File Information System File File File Conventional Files • In the file environment, data storage is built around the applications that will use the files • Essentially, the file “belongs” to a specific application. This is termed program-data dependence • As applications are developed, customised files are created which may be unusable by other applications

  20. Conventional Files • First attempts at computerised storage of records followed traditional paper-based metaphors (“Flat file” systems) • Flat files were inefficient for data retrieval: it might be necessary to search entire file for a record (which, it may transpire, does not exist). Remedy: index files • Indexing improved data retrieval, but conventional files have other disadvantages: • Program-Data dependence • Proprietary file formats (closed systems) • Poor scalability

  21. Conventional Files • Disadvantages (Cont’d): • Duplicated and redundant data • ambiguity: same thing being referred to by different names in different places • inconsistency: conflicting / unsynchronised data • wasted effort • Separation and isolation of data • data dispersed amongst many files complicates processing • Inflexibility • cumbersome data structures and report layouts • not responsive to ad hoc queries • excessive program maintenance • Development environment • procedural -v- non-procedural (3GL -v- 4GL)

  22. Conventional Files • Advantages: • Historically, conventional files have been faster to process than DBMS applications • As legacy file-based systems become candidates for reengineering, the trend is to replace them with database systems

  23. Information System Information System Information System Information System Database Database Management Systems • A database is a large, integrated collection of data which models a real-world enterprise • A Database Management System (DBMS)is a software package designed to store and manage databases • In a DBMS environment, applications are built around an integrated adaptable database

  24. Database Management Systems • Advantages: • ability to share the same data across multiple applications and systems • data independence • control of redundancy • enforced data integrity • improved data security • uniform data administration • concurrent access • improved backup and recovery facility • flexible data structures

  25. Database Management Systems • Advantages (Cont’d): • databases allow the use of the data in ways not originally specified by the end-users (ad hoc queries) • database definition can be extended without impacting existing programs that use it • economies of scale

  26. Database Management Systems • Disadvantages: • database technology is more complex than file technology • requires more sophisticated hardware and software (DBMS) • DBMS’s can still be slower than file-based systems • database technology requires a significant investment • database administration • operating costs and ongoing maintenance • end-user training • higher impact of system failure

  27. Database Management Systems • Roles in a DBMS environment • Data Administrator • Database Administrator • Database Designer • Application Programmer • End-User

  28. Database Architectures • Hierarchical Data Model • Network Data Model • Relational Data Model • Object-Relational Model

  29. Data Relationships • One-to-One (1:1) • Example: Bank Manager manages one and only one Bank Branch; Bank Branch is managed by one and only one Bank Manager • One-to-Many (1:M) • Example: An Employee works in a designated Department; in any Department, there may be many Employees • Many-to-Many (M:M) • Example: A Student registers for one or more Courses; for any Course, there may be one or more registered Students

  30. Hierarchical Data Model

  31. Hierarchical Data Model • Arose as a solution to a practical problem • Managing millions of parts for the space program (standard “Bill of Materials” problem) • The basic structure is a hierarchy or tree • Parent-child relationship • Relationships are represented by pointers • Restrictions: • Each segment has at most one parent • All relationships are 1:M

  32. Hierarchical Data Model • Problem: how to represent a M:M relationship ? • A hierarchical structure (tree) can only support 1:M relationships. Therefore, to represent M:M, must create multiple hierarchies • … but, this means that records are duplicated in different hierarchies • duplication gives rise to data anomalies • duplication can be eliminated using pointers; must decide in which table to store data, and in which table to store pointer. This is a very awkward means of implementing M:M

  33. Network Data Model

  34. Network Data Model • Objective was to overcome shortcomings of the hierarchical data model, in particular, representation of M:M relationship • Like the hierarchical model, it can be likened to trees; unlike hierarchical model, several trees can share branches • In practice, enjoyed little commercial success • Too complex: suited to use by programmers as opposed to end-users • Overtaken by the relational model • No clear theoretical base

  35. Network Model: Structures • Data item • a field or attribute • Record • a collection of data items • Vectors (repeating structures) are permitted • Relationships are represented by sets • Sets have owners (parent) and members (children) • A member cannot have two parents in the one set • cannot directly represent M:M relationships • Member records of a set can be ordered

  36. Shortcomings of Early Models • Languages of both hierarchical and network are procedural and record-at-a-time • To retrieve data … • you must navigate (find a path) to the required record • issue multiple statements directing the system to traverse that path • It was necessary to issue multiple requests to the DBMS to retrieve a data item • It was necessary to have a detailed understanding of how data was stored and structured • this is contrary to data independence principle

  37. Relational Data Model • Relational model, developed by E. F. Codd, has a strong theoretical basis and overcomes shortcomings of network / hierarchical models • Relational model is non-procedural (What?, not How?), and set-at-a-time • Navigation is automatic • Relations • 2-dimensional data set consisting of N columns (fields / attributes) and M rows (records) • All rows in a relation are unique • A relational database is a set of relations

  38. Relational Model: Structures

  39. Relational Model: Structures

  40. Relational Model: Structures • Primary key • A unique identifier of a row in a relation; can be composite • Candidate key • An attribute that could be a primary key • Alternative key • A candidate key that is not selected as the primary key • Foreign key • An attribute of a relation that is the primary key of another relation; can be composite

  41. SQL Relational Algebra SELECT * FROM A WHERE condition A where condition Restrict A [X] SELECT X FROM A Project A times B SELECT * FROM A, B Product SELECT * FROM A UNION SELECT * FROM B A union B Union SELECT * FROM A WHERE NOT EXISTS (SELECT * FROM B WHERE A.X = B.X AND A.Y=B.Y AND ... A minus B Difference Relational Algebra and SQL • A major strength of the relational model is that it supports SQL, a flexible data retrieval language which facilitates ad hoc queries • Relational algebra is a standard for judging a data retrieval language

  42. Relational Databases • A truly relational database supports • structures (domains and relations) • integrity rules • a manipulation language • The word “relational” is sometimes used too freely; many commercial systems are not fully relational because they do not support domains and integrity rules • E. F. Codd has set forward 12 rules that a database must satisfy before it can be said to be truly relational

  43. Codd’s 12 Rules • Information representation • All data, including metadata (data definition, constraints, user names, etc.), is represented solely and explicitly as values in a table • Guaranteed access • Every value in a database is accessed by specifying a combination of table name, column name, and value of the primary key of the row in which it is stored • No artificial paths, such as linked lists • Systematic treatment of null values • There must be a distinct representation for unknown data, irrespective of data type • Null values are not equivalent to zero or the empty string

  44. Codd’s 12 Rules • Dynamic on-line catalog based on the relational model • database description is represented at the logical level in the same way as ordinary data • thus, only one manipulation language needs to be learned • Comprehensive data sublanguage rule • A relational system may support several languages and various modes of terminal use • However, there must be at least one language that supports data definition, data manipulation, security and integrity constraints, and transaction processing operations • View updating • If the base tables of a view are updated, then the view itself should be updated

  45. Codd’s 12 Rules • High-level insert, update, and delete • The system must support set-at-a-time operations; for example, a set of rows can be deleted by a single statement • Physical data independence • Changes to storage representation or access methods will not affect application programs • Logical data independence • Implementation of changes to base tables will not affect application programs; for example, if a table is restructured or split, applications should be immune to change (views are beneficial here)

  46. Codd’s 12 Rules • Integrity independence • Integrity constraints should be part of a database's definition rather than embedded within application programs • It must be possible to change integrity constraints without affecting any existing application programs • Distribution independence • Introduction of a distributed DBMS or redistributing existing distributed data should have no impact on existing applications • Nonsubversion • It must not be possible to use low-level record-at-a-time interface to by-pass high-level set-at-a-time security or integrity constraints

  47. Codd’s Rule 0 • A relational DBMS must be able to manage databases entirely through its relational capacities • A DBMS is either totally relational or it is not relational

  48. Tasks in the seminars 1 - 3 • In order to do MMDB you need to be able to use SQL – self study sessions with tutor support

More Related