1 / 38

Systems Analysis I Designing Databases

Systems Analysis I Designing Databases. ISYS 200 Glenn Booker. Databases. Storing and manipulating data are the heart of an information system Data must be Available when needed Accurate Consistent in its definition Efficiently stored and retrieved Presented in some useful form.

amina
Download Presentation

Systems Analysis I Designing Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Systems Analysis IDesigning Databases ISYS 200 Glenn Booker Week #5

  2. Databases • Storing and manipulating data are the heart of an information system • Data must be • Available when needed • Accurate • Consistent in its definition • Efficiently stored and retrieved • Presented in some useful form Week #5

  3. Databases • There are two approaches for database design – flat file and relational • A flat (or ‘conventional’) file database keeps separate files for each application • COBOL systems, and FileMaker Pro • Like using Microsoft Excel as a database • A relational database stores data in a single structure for use by many applications • Every major database system; Oracle, DB2, Access, MySQL, Informix, etc. Week #5

  4. Databases • Flat file systems are, by definition, redundant • Data is copied to each report table only when the report is generated, so later changes make the table obsolete • Relational databases keep the connections to each set of data • Each record exists once in the system • Is much more flexible to generate custom views of the data, without changing its structure Week #5

  5. Reality, Data, Metadata • A database models some collection of data which appears in reality • Reality is the thing being modeled • Data is used to describe the key attributes of that thing • About what aspects of that thing do we wish to keep track? • Metadata (data about data) describes the characteristics of the data Week #5

  6. Data Modeling • Data modeling (or database or information modeling) is a way of organizing and describing the data in a system • It is a logical model to describe the specific data fields (elements) we wish to capture, and how they are related to each other Week #5

  7. Where to start? • Recall that data modeling starts with thinking about the things involved in your system • These things are formally called “entities” – nouns, if you will • Start by identifying all of the places, people, events, and ideas which are affected by your system Week #5

  8. Permanent vs. Transient Data • A key for relational data modeling is that we are primarily concerned with data we need to keep permanently • Data which is only needed briefly isn’t modeled in an ERD • Major difference between relational and object-oriented analysis Week #5

  9. Lots of Types of Files • Master files contain permanent information • Transaction files describe business events • Document files contain copies of historic data • Archival files contain master and transaction file records offline, which are no longer needed • Table look-up files contain static data for reference calculations (tax tables, Zip codes) • Audit files record changes to other files Week #5

  10. Characterize Entities • Then examine each entity and determine the attributes which you are interested in – what data do you need to know in order to describe one such entity meaningfully? • Consider if some attributes can be readily grouped together, thereby forming compound attributes (e.g. name) Week #5

  11. Characterize Entities • Entities are generally one of two types: • A set of data you want to keep permanently (customer orders, product information, etc.), or • A lookup list or table (types of status codes, shipping rates, tax rates, etc.) • Data which is transient is generally kept in local variables, and doesn’t appear in an ERD (e.g. change of address info) Week #5

  12. Keep it or not? • In trying to decide if data needs to be kept, consider whether someone might want to analyze that data in the future • For examples, to look for sales patterns, trace relocation history, keep record of data changes (who modified what data and when?) • When in doubt, keep it for now Week #5

  13. Attributive Entities • An entity which depends on the existence of another entity, but only indicates multiple examples of that ‘parent’ entity, is an attributive entity • Hence it’s referred to as describing ‘repeating groups of data’, or they are ‘parent/child’ entities • Shown using part of an oval inside the entity box (p. 36) Week #5

  14. Database Notation • In week 2 we covered the basic notation for cardinality • Only one • Zero or one • One or many • Zero, one, or many • Strictly many • Now we want to put all the two-entity pairs together, into a single diagram Week #5

  15. ERD • The Entity-Relationship Diagram (ERD) captures all the entities for our system, and shows the relationships among them (the lines connecting them) • The relationships all need to have • A verb phrase in at least one direction • Cardinality at both ends • Now we can add attributes and keys Week #5

  16. Attributes • Each entity is almost certain to have at least two attributes (often more) • At least one of them is a unique identifier for each record in the entity, called the primary key • Other attributes describe significant characteristics of the entity • So, about what aspects of the entity do you want to retain knowledge? • Attributes are also called fields, data items, etc. Week #5

  17. Characterize Attributes • For each attribute, define its data type: • Text (“Fred”) [and the character set (Latin)] • Number (real (3.56) or integer (124)) • Date and/or time • Yes/No (a.k.a. T/F, binary, or Boolean) • A fixed set of possible values (e.g. grades) • Formally known as an enumerated list • Multimedia: photos, drawings, movies, sounds Week #5

  18. Attributes & Records • Attributes can be fixed or variable length • Length of integer (8-64 bits, in powers of 2), • Lengths of text fields (the number of characters or bytes it contains) • Length of real number fields and the number of decimals (8.1 only has six digits, -12345.6) • We won’t worry about attribute lengths • A record is a set of attributes from one entity • Typically corresponding to one row in a data table Week #5

  19. Relevant Data Type Standards • Character sets • ISO/IEC 8859 – a set of nine 8-bit character sets, each used for a different family of languages (Latin, Cyrillic, Greek, Arabic, etc.) • Unicode – a 16-bit character set to represent almost all languages • Representation of dates and times • ISO 8601 Week #5

  20. Characterize Attributes • Other traits to look for, beyond the scope of this course: • Identify the domain of each attribute • What is the range of allowable values? • Determine if there is a default value for each attribute • Is each attribute mandatory (required) for each entity? (Avoid many mandatory fields) • We care: could an attribute be a key? Week #5

  21. Key Attributes • An attribute or group of attributes may be a unique identifier, or key, for each entity • Examples are Social Security Number, driver’s license number, ISBN, Student ID • If a group of attributes is used, it is a concatenated (or composite or compound) key • {Course number, section number, and term} could form a concatenated key Week #5

  22. Many Keys Possible • There might be more than one usable key for an entity • Each possible key is called a candidate key • One candidate key is selected to be the primary key (PK) • All others are alternate keys (AK) • Example: the electric company may use a customer ID or account number as a primary key, and your phone number as an alternate key Week #5

  23. Primary Key may be Meaningless • A primary key may correspond to some recognizable attribute • SSN, student ID, ISBN, etc. • Or it may be completely meaningless • A sequential number, called Order_ID • As long as the primary key is unique for every record, either kind is acceptable Week #5

  24. Foreign Keys • A foreign key is an attribute which establishes the connection between two entities • A Sales entity might have a foreign key attribute Customer_ID, which points to all of the data in the Customer entity for the particular customer who placed a given sale • If that were the case, the primary key in the Customer entity must be Customer_ID • This type of connection is the heart of relational database modeling (!) Week #5

  25. Foreign Keys • A foreign key (FK) is an attribute which exists, in an entity other than where it is a primary key (PK), to establish the relationship between the two entities • Primary key must be unique for each record, but a foreign key value may appear many times • Only one PK-FK connection is required for the relationship to exist • Entity with FK generally has a PK of its own Week #5

  26. Other Key Traits • A PK attribute may also be a FK • Especially for 1:1 relationships • An associative entity builds a concatenated primary key from more than one entity • Uses a diamond shape inside the normal box to show its special nature Week #5

  27. Other Relationships • A many-to-many (non-specific) relationship implies a lot of one-to-many relationships • Often use an associative entity to bridge between them • An identifying relationship is when a parent entity’s PK is used as part of the PK for a child entity • Child entity is then considered “weak” because it depends on the parent Week #5

  28. Summary of Key Traits • The bottom line for keys is: • Each entity must have at least one PK • More than one PK implies a concatenated key • Alternate keys are completely optional • Each entity may have from zero to many FK’s • Each FK is a PK in another, related entity • Only one PK-FK relationship is needed to relate two entities • Some keys are not inherently meaningful data Week #5

  29. Data Normalization • Analysis of a data model for implementation is done using data normalization • Normalization organizes data attributes to form simple, non-redundant, flexible, adaptive entities • There are five levels of data normalization, of which three are ever used • They build on each other; so to get to second level you have to comply with first level too Week #5

  30. First Normal Form (1NF) • An entity is in first normal form if there are no attributes which can have more than one value for each instance (record) of the entity • Attributes which could have more than one value for a given entity belong to a different kind of entity • In other words, every attribute appears only once for each record Week #5

  31. Second Normal Form (2NF) Look at concatenated keys only! • Must be first normal form, and: • Each non-primary-key attribute is uniquely determined by the entire primary key • Non-primary-key attributes may not be dependent on only part of the primary key • If any are, move them to another table which uses only that part of the primary key • If there are no concatenated keys, second normal form comes automatically with 1NF Week #5

  32. Third Normal Form (3NF) • Must be second normal form, and: • The value of each non-primary-key attribute is not dependent upon any other non-primary-key attribute • Every attribute depends only on the primary key • The two ways to look for this are derived attributes and transitive dependencies... Week #5

  33. Third Normal Form (3NF) • Derived attributes (data) are fields calculated or logically derived from other fields • Exception: OK to keep attribute if multiple entities are involved in deriving an attribute • Transitive dependencies may exist for non-concatenated keyed tables; is when a non-key attribute depends on another non-key attribute Week #5

  34. Third Normal Form (3NF) • Or in brief, for third normal form…An entity is in third normal form if every non-primary key attribute is dependent on the primary key, the whole primary key, and nothing but the primary key (as in, “Do you swear to tell the truth…”) Week #5

  35. Further Normalization • Additional improvement in data structure is possible through “Simplification by Inspection” - look for other redundancies or simplifications possible • Many CASE tools can also inspect for first level normalization, but generally no further • Just for the record, here are the 4th and 5th normal forms • I’ve never seen them used in industry Week #5

  36. Fourth and Fifth Normal Forms • Fourth normal form (4NF) involves removing multivalued dependencies • If a pair of records has two matching attributes, decompose the data structure to remove that • Fifth normal form (5NF) involves removing join dependencies (nearly impossible to do) • This is when business rules define a connection among many entities (e.g. if you replace a tire, you must also replace the valve stem) Taken from the INFO 605 text, pp. 351-354 Week #5

  37. File Organization • There are many ways to organize files • Sequential organization • Linked lists • Hash files • We won’t go into detail on them • You can forget about 4NF and 5NF, too Week #5

  38. Summary: How to create an ERD • Define the entities about which you’ll want to keep information • Define the relationships between those entities • Must have a verb phrase and both cardinalities • Define the attributes for each entity • At a minimum, define the name and data type of each attribute • Identify a primary key for each entity • Identify foreign keys as needed • Normalize to third normal form (and fix keys as needed) Week #5

More Related