Systems Analysis I Designing Databases

Systems Analysis IDesigning Databases ISYS 200 Glenn Booker Week #5

Databases • Storing and manipulating data are the heart of an information system • Data must be • Available when needed • Accurate • Consistent in its definition • Efficiently stored and retrieved • Presented in some useful form Week #5

Databases • There are two approaches for database design – flat file and relational • A flat (or ‘conventional’) file database keeps separate files for each application • COBOL systems, and FileMaker Pro • Like using Microsoft Excel as a database • A relational database stores data in a single structure for use by many applications • Every major database system; Oracle, DB2, Access, MySQL, Informix, etc. Week #5

Databases • Flat file systems are, by definition, redundant • Data is copied to each report table only when the report is generated, so later changes make the table obsolete • Relational databases keep the connections to each set of data • Each record exists once in the system • Is much more flexible to generate custom views of the data, without changing its structure Week #5

Reality, Data, Metadata • A database models some collection of data which appears in reality • Reality is the thing being modeled • Data is used to describe the key attributes of that thing • About what aspects of that thing do we wish to keep track? • Metadata (data about data) describes the characteristics of the data Week #5

Data Modeling • Data modeling (or database or information modeling) is a way of organizing and describing the data in a system • It is a logical model to describe the specific data fields (elements) we wish to capture, and how they are related to each other Week #5

Where to start? • Recall that data modeling starts with thinking about the things involved in your system • These things are formally called “entities” – nouns, if you will • Start by identifying all of the places, people, events, and ideas which are affected by your system Week #5

Permanent vs. Transient Data • A key for relational data modeling is that we are primarily concerned with data we need to keep permanently • Data which is only needed briefly isn’t modeled in an ERD • Major difference between relational and object-oriented analysis Week #5

Lots of Types of Files • Master files contain permanent information • Transaction files describe business events • Document files contain copies of historic data • Archival files contain master and transaction file records offline, which are no longer needed • Table look-up files contain static data for reference calculations (tax tables, Zip codes) • Audit files record changes to other files Week #5

Characterize Entities • Then examine each entity and determine the attributes which you are interested in – what data do you need to know in order to describe one such entity meaningfully? • Consider if some attributes can be readily grouped together, thereby forming compound attributes (e.g. name) Week #5

Characterize Entities • Entities are generally one of two types: • A set of data you want to keep permanently (customer orders, product information, etc.), or • A lookup list or table (types of status codes, shipping rates, tax rates, etc.) • Data which is transient is generally kept in local variables, and doesn’t appear in an ERD (e.g. change of address info) Week #5

Keep it or not? • In trying to decide if data needs to be kept, consider whether someone might want to analyze that data in the future • For examples, to look for sales patterns, trace relocation history, keep record of data changes (who modified what data and when?) • When in doubt, keep it for now Week #5

Attributive Entities • An entity which depends on the existence of another entity, but only indicates multiple examples of that ‘parent’ entity, is an attributive entity • Hence it’s referred to as describing ‘repeating groups of data’, or they are ‘parent/child’ entities • Shown using part of an oval inside the entity box (p. 36) Week #5

Database Notation • In week 2 we covered the basic notation for cardinality • Only one • Zero or one • One or many • Zero, one, or many • Strictly many • Now we want to put all the two-entity pairs together, into a single diagram Week #5

ERD • The Entity-Relationship Diagram (ERD) captures all the entities for our system, and shows the relationships among them (the lines connecting them) • The relationships all need to have • A verb phrase in at least one direction • Cardinality at both ends • Now we can add attributes and keys Week #5

Attributes • Each entity is almost certain to have at least two attributes (often more) • At least one of them is a unique identifier for each record in the entity, called the primary key • Other attributes describe significant characteristics of the entity • So, about what aspects of the entity do you want to retain knowledge? • Attributes are also called fields, data items, etc. Week #5

Characterize Attributes • For each attribute, define its data type: • Text (“Fred”) [and the character set (Latin)] • Number (real (3.56) or integer (124)) • Date and/or time • Yes/No (a.k.a. T/F, binary, or Boolean) • A fixed set of possible values (e.g. grades) • Formally known as an enumerated list • Multimedia: photos, drawings, movies, sounds Week #5

Attributes & Records • Attributes can be fixed or variable length • Length of integer (8-64 bits, in powers of 2), • Lengths of text fields (the number of characters or bytes it contains) • Length of real number fields and the number of decimals (8.1 only has six digits, -12345.6) • We won’t worry about attribute lengths • A record is a set of attributes from one entity • Typically corresponding to one row in a data table Week #5

Relevant Data Type Standards • Character sets • ISO/IEC 8859 – a set of nine 8-bit character sets, each used for a different family of languages (Latin, Cyrillic, Greek, Arabic, etc.) • Unicode – a 16-bit character set to represent almost all languages • Representation of dates and times • ISO 8601 Week #5

Characterize Attributes • Other traits to look for, beyond the scope of this course: • Identify the domain of each attribute • What is the range of allowable values? • Determine if there is a default value for each attribute • Is each attribute mandatory (required) for each entity? (Avoid many mandatory fields) • We care: could an attribute be a key? Week #5

Key Attributes • An attribute or group of attributes may be a unique identifier, or key, for each entity • Examples are Social Security Number, driver’s license number, ISBN, Student ID • If a group of attributes is used, it is a concatenated (or composite or compound) key • {Course number, section number, and term} could form a concatenated key Week #5

Many Keys Possible • There might be more than one usable key for an entity • Each possible key is called a candidate key • One candidate key is selected to be the primary key (PK) • All others are alternate keys (AK) • Example: the electric company may use a customer ID or account number as a primary key, and your phone number as an alternate key Week #5

Primary Key may be Meaningless • A primary key may correspond to some recognizable attribute • SSN, student ID, ISBN, etc. • Or it may be completely meaningless • A sequential number, called Order_ID • As long as the primary key is unique for every record, either kind is acceptable Week #5

Foreign Keys • A foreign key is an attribute which establishes the connection between two entities • A Sales entity might have a foreign key attribute Customer_ID, which points to all of the data in the Customer entity for the particular customer who placed a given sale • If that were the case, the primary key in the Customer entity must be Customer_ID • This type of connection is the heart of relational database modeling (!) Week #5

Foreign Keys • A foreign key (FK) is an attribute which exists, in an entity other than where it is a primary key (PK), to establish the relationship between the two entities • Primary key must be unique for each record, but a foreign key value may appear many times • Only one PK-FK connection is required for the relationship to exist • Entity with FK generally has a PK of its own Week #5

Other Key Traits • A PK attribute may also be a FK • Especially for 1:1 relationships • An associative entity builds a concatenated primary key from more than one entity • Uses a diamond shape inside the normal box to show its special nature Week #5

Other Relationships • A many-to-many (non-specific) relationship implies a lot of one-to-many relationships • Often use an associative entity to bridge between them • An identifying relationship is when a parent entity’s PK is used as part of the PK for a child entity • Child entity is then considered “weak” because it depends on the parent Week #5

Summary of Key Traits • The bottom line for keys is: • Each entity must have at least one PK • More than one PK implies a concatenated key • Alternate keys are completely optional • Each entity may have from zero to many FK’s • Each FK is a PK in another, related entity • Only one PK-FK relationship is needed to relate two entities • Some keys are not inherently meaningful data Week #5

Data Normalization • Analysis of a data model for implementation is done using data normalization • Normalization organizes data attributes to form simple, non-redundant, flexible, adaptive entities • There are five levels of data normalization, of which three are ever used • They build on each other; so to get to second level you have to comply with first level too Week #5

First Normal Form (1NF) • An entity is in first normal form if there are no attributes which can have more than one value for each instance (record) of the entity • Attributes which could have more than one value for a given entity belong to a different kind of entity • In other words, every attribute appears only once for each record Week #5

Second Normal Form (2NF) Look at concatenated keys only! • Must be first normal form, and: • Each non-primary-key attribute is uniquely determined by the entire primary key • Non-primary-key attributes may not be dependent on only part of the primary key • If any are, move them to another table which uses only that part of the primary key • If there are no concatenated keys, second normal form comes automatically with 1NF Week #5

Third Normal Form (3NF) • Must be second normal form, and: • The value of each non-primary-key attribute is not dependent upon any other non-primary-key attribute • Every attribute depends only on the primary key • The two ways to look for this are derived attributes and transitive dependencies... Week #5

Third Normal Form (3NF) • Derived attributes (data) are fields calculated or logically derived from other fields • Exception: OK to keep attribute if multiple entities are involved in deriving an attribute • Transitive dependencies may exist for non-concatenated keyed tables; is when a non-key attribute depends on another non-key attribute Week #5

Third Normal Form (3NF) • Or in brief, for third normal form…An entity is in third normal form if every non-primary key attribute is dependent on the primary key, the whole primary key, and nothing but the primary key (as in, “Do you swear to tell the truth…”) Week #5

Further Normalization • Additional improvement in data structure is possible through “Simplification by Inspection” - look for other redundancies or simplifications possible • Many CASE tools can also inspect for first level normalization, but generally no further • Just for the record, here are the 4th and 5th normal forms • I’ve never seen them used in industry Week #5

Fourth and Fifth Normal Forms • Fourth normal form (4NF) involves removing multivalued dependencies • If a pair of records has two matching attributes, decompose the data structure to remove that • Fifth normal form (5NF) involves removing join dependencies (nearly impossible to do) • This is when business rules define a connection among many entities (e.g. if you replace a tire, you must also replace the valve stem) Taken from the INFO 605 text, pp. 351-354 Week #5

File Organization • There are many ways to organize files • Sequential organization • Linked lists • Hash files • We won’t go into detail on them • You can forget about 4NF and 5NF, too Week #5

Summary: How to create an ERD • Define the entities about which you’ll want to keep information • Define the relationships between those entities • Must have a verb phrase and both cardinalities • Define the attributes for each entity • At a minimum, define the name and data type of each attribute • Identify a primary key for each entity • Identify foreign keys as needed • Normalize to third normal form (and fix keys as needed) Week #5

Systems Analysis I Designing Databases

Systems Analysis I Designing Databases

Presentation Transcript

Databases Management Systems

Chapter 10 Designing Databases

Designing Databases

Lecture 12 Designing Databases

Chapter 12: Designing Databases

Chapter 10 Designing Databases

Designing Relational Databases

Chapter 12: Designing Databases

Designing Databases Using Access

Chapter 12: Designing Databases

PART I : Databases and Database Management Systems

Designing Databases for Biological Research

Databases I Domeincalculus

Chapter 10: Designing Databases

Database Systems I Design Theory for Relational Databases

Designing Databases for Biological Research

Databases Management Systems

Internet Databases I

Designing Databases for Biological Research

Chapter 12: Designing Databases