Organizing Data and Information Chapter 5
Topics in Ch. 5 • General data management concepts and terms • Two major approaches to data management • Three database models • The common functions performed by database management systems • Three popular end-user database management systems • Recent database developments.
Data and Information What is data? • A collection of raw unprocessed facts, figures, and symbols What is information? • Data that is organized, meaningful, and useful How are data and information related? • Computers process data into information
What is a database? Includes a collection of data organized so you can access, retrieve, and use the data Database software allows you to Create a computerized database Add, change, and delete data Sort and retrieve data from the database Create forms and reports for the data in the database Database software also called a database management system (DBMS)
Hierarchy of data • Bit • A single binary digit with either a one or zero state • Character • A byte, which is the basic building block of information • Field • A combination of one or more characters • The smallest unit of data a user accesses • A field name uniquely identifies each field • Record • A collection of related fields • Files or Tables • A collection of related records • Database • A collection of related files or tables
Data entities, Attributes, and Keys • Entity • Generalized class of people, places, or things for which data is collected, stored, and maintained • E.g. at a university, students, library books, and courses; the entity in figure 5.2 is employee. • Attributes • A characteristic of an entity • E.g., employee number, last name, first name, hire date, and department number in figure 5.2 • Key • A field or set of fields that identifies a record • A primary key: a field or a set of fields the uniquely defines a record: e.g. SSN • A secondary key: an alternative key that can be used to access records: e.g., last name
What is a file management approach? Each department or area within an organization has its own set of files Records in one file often do not relate to the records in any other file Traditional approach (File management approach)
Weakness of the “Traditional” Approach • Data redundancy • Wastes resources such as storage space and people’s time • The files often store the same fields in multiple files • Requires a larger storage capacity • Compromises data integrity • Data integrity • The degree to which data is correct • When a database contains errors, it loses integrity • Very important because computers and people use information to make decisions and take actions • Program-data dependence • Programs and data developed and organized for one application are incompatible with programs and data organized differently for another application • Isolated data • It is often difficult to access data stored in many files across several departments
Reduced data redundancy Improved data integrity Shared data Reduced development time Easier modification and updating so on ( Table 5.1 p. 175) A high cost involved with acquiring and implementing a database Increased vulnerability What are the strengths of the database approach? What are the weakness of the database approach?
How do a database application and a file processing application differ in the way they might store data? File processing example Database example
Data Design Issues • Content: What data should be collected? • Access: What data should be given to what users? • Logical structure: How will the data be organized to make sense to a particular user? • Physical organization: Where will the data actually be located?
Data Modeling • Logical design • How data are grouped together and how that are related to one another • Physical design • Combining or splitting some of the groups identified in the logical design • Planned data redundancy • Data model • A diagram used by a database designer to show the logical relationships among the entities in the database • Entity-relationship (ER) diagrams
Entity-Relationship (ER) Diagrams • Fig 5.5
Correspondence with ER Model • Relations (tables) correspond with entity types and with many-to-many relationship types • Rows correspond with entity instances and with many-to-many relationship instances • Columns correspond with attributes • NOTE: The word relation (in relational database) is NOT the same same the word relationship (in ER model)
Key Fields • Keys are special fields that serve two main purposes: • Primary keys are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique • Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship)
Basic ER Symbols Entity symbol Attribute symbol Relationship symbol
(a) CUSTOMER entity type with simple attributes (b) CUSTOMER relation
(a) CUSTOMER entity type with composite attribute (b) CUSTOMER relation with address detail
Mapping the relationship Foreign key
Database Models • Every database and DBMS is based on a specific data model • Consists of rules and standards that define how the database organizes data • Defines how users view the organization of the data • Three types of database models • Hierarchical models • Data is organized in a top-down or inverted tree structure • Network models • An extension of the hierarchical model in which a member may have many owners • Relational models • Stores data in tables that consist of rows and columns • Each row has a primary key and each column has a unique name
Operations of a relational database • Selection operation • The operation extracts data from a row (record) • Projection operation • Join operation
What is an example of a selection operation? selection operation results
Database Management System • A DBMS is a data storage and retrieval system which permits data to be stored non-redundantly while making it appear to the user as if the data is well-integrated • Software that allows you to create, access, and manage a database • Available for many sizes and types of computers.
Application #1 Application #2 Application #3 DBMS Database containing centralized shared data Database Management System DBMS manages data resources like an operating system manages hardware resources
Provide a user view Schema: the logical description of an entire database Subschema: a file that contains a description of part of the database and identifies which users can view or modify the data items in that part of the database Provide tools to create & modify the database Data Definition Language (DDL) Data dictionary: a detailed description of all the data used in the database Store & retrieve data Manipulate data and produce reports SQL (Structured Query Language) Data manipulation language (DML) Database Management Systems (DBMSs)
Schema • A description of the entire database • Subschema • Shows only some of the records and their relationships in the database • Data Definition Language (DDL) • Command used to create a database • Commands that define a database, including creating, altering, and dropping tables and establishing constraints • Schemas and subschemas are described using a DDL.
Data Dictionary System tables that store metadata Contains data about each file in the database and each field within those files Sometimes called metadata Users usually can view some of these tables Users are restricted from updating them
What is a default value for a field? A value that the DBMS initially displays in a field Data dictionary allows you to specify a default value for a field
Data Dictionary Benefits • Reduced data redundancy • Faster program development • Easier data & information modification • Increased data reliability
What is a query? A request for specific data from the database Four commonly used methods to access data Query language Query by example Form Report generator
What is Structured Query Language (SQL)? The standard data manipulation language for relational databases A query language that allows you to manage, update, and retrieve data Has special keywords and rules that you include in SQL statements (Data manipulation language) projection operation join operation selection operation
Query by Example (QBE) query results wizard for querying the database • The program retrieves records that match criteria you enter in the form fields • Uses a graphical user interface • Available in MS Access • MS Access translates QBE to SQL and vice versa
What is a form? Sometimes called a data entry form A window on the screen that provides areas for entering or changing data in a database
Concurrency Control • Problem – in a multi-user environment, simultaneous access to data can result in interference and data loss • Solution – Concurrency Control • A method of dealing with situation in which two or more people need to access the same record in a database at the same time. • Serializability – • Finish one transaction before starting another • Locking Mechanisms • The most common way of achieving serialization • Data that is retrieved for the purpose of updating is locked for the updater • No other user can perform update until unlocked
Popular DBMS • Popular mainframe-based DBMSs • Oracle • IBM DB/2 • Microsoft SQL Server • Popular end-user DBMSs • MS Access • Lotus Approach • Inprise’s dBASE
Selecting a DBMS • Database size • Number of concurrent users • Performance • Integration • Features • Vendor • Cost
Database Developments • Distributed database • Data warehouses • Data marts • Data mining • Online analytical processing (OLAP)
Distributed Database • Distributed Database: A single logical database that is spread physically across computers in multiple locations that are connected by a data communications link • Replicated database: a database that holds a duplicate set of frequently used data • Decentralized Database: A collection of independent databases on non-networked computers They are NOT the same thing!
Reasons forDistributed Database • Business unit autonomy and distribution • Data sharing • Data communication costs • Multiple application vendors • Database recovery • Transaction and analytic processing