1 / 49

Database Design: From Conceptual Design to Physical Implementation

Database Design: From Conceptual Design to Physical Implementation. University of California, Berkeley School of Information Management and Systems SIMS 257 – Database Management. Review. Database Design Process Normalization. Database Design Process. Application 1. Application 2.

sorley
Download Presentation

Database Design: From Conceptual Design to Physical Implementation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Design: From Conceptual Design to Physical Implementation University of California, Berkeley School of Information Management and Systems SIMS 257 – Database Management SIMS 257 – Database Management

  2. Review • Database Design Process • Normalization SIMS 257 – Database Management

  3. Database Design Process Application 1 Application 2 Application 3 Application 4 External Model External Model External Model External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Conceptual Model Logical Model Internal Model Application 3 Conceptual requirements Application 4 Conceptual requirements SIMS 257 – Database Management

  4. Normalization • Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data • Normalization is a multi-step process beginning with an “unnormalized” relation • Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management. SIMS 257 – Database Management

  5. Normal Forms • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce-Codd Normal Form (BCNF) • Fourth Normal Form (4NF) • Fifth Normal Form (5NF) SIMS 257 – Database Management

  6. Normalization Unnormalized Relations First normal form Functional dependencyof nonkey attributes on the primary key - Atomic values only Second normal form No transitive dependency between nonkey attributes Third normal form Boyce- Codd and Higher Full Functional dependencyof nonkey attributes on the primary key All determinants are candidate keys - Single multivalued dependency SIMS 257 – Database Management

  7. Unnormalized Relations • First step in normalization is to convert the data into a two-dimensional table • In unnormalized relations data can repeat within a column SIMS 257 – Database Management

  8. Unnormalized Relation SIMS 257 – Database Management

  9. First Normal Form SIMS 257 – Database Management

  10. Second Normal Form SIMS 257 – Database Management

  11. Second Normal Form SIMS 257 – Database Management

  12. Second Normal Form SIMS 257 – Database Management

  13. Third Normal Form SIMS 257 – Database Management

  14. Third Normal Form SIMS 257 – Database Management

  15. Most 3NF Relations are also BCNF SIMS 257 – Database Management

  16. Fourth Normal Form • Any relation is in Fourth Normal Form if it is BCNF and any multivalued dependencies are trivial • Eliminate non-trivial multivalued dependencies by projecting into simpler tables SIMS 257 – Database Management

  17. Fifth Normal Form • A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation • Implies that relations that have been decomposed in previous NF can be recombined via natural joins to recreate the original relation. SIMS 257 – Database Management

  18. Normalization • Normalization is performed to reduce or eliminate Insertion, Deletion or Update anomalies. • However, a completely normalized database may not be the most efficient or effective implementation. • “Denormalization” is sometimes used to improve efficiency. SIMS 257 – Database Management

  19. Denormalization • Usually driven by the need to improve query speed • Query speed is improved at the expense of more complex or problematic DML (Data manipulation language) for updates, deletions and insertions. SIMS 257 – Database Management

  20. Customer ID Address Name Telephone Order Order No Date Taken Date Dispatched Date Invoiced Cust ID Order Order No Date Taken Date Dispatched Date Invoiced Cust ID Cust Name Customer ID Address Name Telephone Downward Denormalization Before: After: SIMS 257 – Database Management

  21. Order Order No Date Taken Date Dispatched Date Invoiced Cust ID Cust Name Order Order No Date Taken Date Dispatched Date Invoiced Cust ID Cust Name Order Price Order Item Order No Item No Item Price Num Ordered Order Item Order No Item No Item Price Num Ordered Upward Denormalization SIMS 257 – Database Management

  22. Today: New Design • Today we will build the COOKIE database from needs (rough) through the conceptual model, logical model and finally physical implementation in Access. SIMS 257 – Database Management

  23. ER Diagram Symbols Ovals are used to indicate the attributes associated with an entity or relationship (That is, the pieces of information recorded in the database about the entity or relationship) An underlined name indicates that the attribute is a primary key (That is, it can uniquely identify the entity) Rectangles are used to indicate entities (That is, the representatives or records describing persons, things, or events in the database) Diamonds are used to indicate relationships between entities. (That is, some association between the data records of different entities) Primary key Attribute Entity Relationship SIMS 257 – Database Management

  24. Cookie Requirements • Cookie is a bibliographic database that contains information about a hypothetical union catalog of several libraries. • Need to record which books are held by which libraries • Need to search on bibliographic information • Author, title, subject, call number for a given library, etc. • Need to know who publishes the books for ordering, etc. SIMS 257 – Database Management

  25. Cookie Database • There are currently 5 main types of entities in the database • Books (bibfile) • Local Call numbers (callfile) • Libraries (libfile) • Publishers (pubfile) • Subject headings (subfile) • Links between subject and books (indxfile) SIMS 257 – Database Management

  26. BIBFILE • Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are: • accno -- an “accession” or serial number • author -- The author’s name (not realistic -- one author per book) • title -- The title of the book • loc -- Location of publication (where published) • date -- Date of publication • price -- Price of the book • pagination -- Number of pages • ill -- What type of illustrations (maps, etc) if any • height -- Height of the book in centimeters SIMS 257 – Database Management

  27. Books/BIBFILE Author Title accno Loc Books Price Date Pagination Ill Height SIMS 257 – Database Management

  28. CALLFILE • CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are: • accno -- the book accession number • libid -- the id of the holding library • callno -- the call number of the book in the particular library • copies -- the number of copies held by the particular library SIMS 257 – Database Management

  29. LocalInfo/CALLFILE libid Callno accno Copies CALLFILE SIMS 257 – Database Management

  30. LIBFILE • LIBFILE contain information about the libraries participating in this union catalog. Its attributes include: • libid -- Library id number • library -- Name of the library • laddress -- Street address for the library • lcity -- City name • lstate -- State code (postal abbreviation) • lzip -- zip code • lphone -- Phone number • mop - suncl -- Library opening and closing times for each day of the week. SIMS 257 – Database Management

  31. Libraries/LIBFILE lstate lcity laddress lzip Library lphone Libid SatCl LIBFILE SunOp SatOp Suncl FCl MOp FOp ThCl Mcl TuOp TuCl WOp WCl ThOp SIMS 257 – Database Management

  32. PUBFILE • PUBFILE contain information about the publishers of books. Its attributes include • pubid -- The publisher’s id number • publisher -- Publisher name • paddress -- Publisher street address • pcity -- Publisher city • pstate -- Publisher state • pzip -- Publisher zip code • pphone -- Publisher phone number • ship -- standard shipping time in days SIMS 257 – Database Management

  33. Publisher/PUBFILE paddress Publisher pcity PUBFILE pubid pstate pzip Ship pphone SIMS 257 – Database Management

  34. SUBFILE • SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are • subcode -- Subject identification number • subject -- the subject heading/description SIMS 257 – Database Management

  35. Subjects/SUBFILE subid Subject SUBFILE SIMS 257 – Database Management

  36. INDXFILE • INDXFILE provides a way to allow many-to-many mapping of subject headings to books. Its attributes consist entirely of links to other tables • subcode -- link to subject id • accno -- link to book accession number SIMS 257 – Database Management

  37. Linking Subjects and Books subid accno INDXFILE SIMS 257 – Database Management

  38. Some examples of Cookie Searches • Who wrote Microcosmographia Academica? • How many pages long is Alfred Whitehead’s The Aims of Education and Other Essays? • Which branches in Berkeley’s public library system are open on Sunday? • What is the call number of Moffitt Library’s copy of Abraham Flexner’s book Universities: American, English, German? • What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries? • Print a list of the Mechanics Library holdings, in descending order by height. • What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc.)? • Which library closes earliest on Friday night? SIMS 257 – Database Management

  39. publishes PUBFILE pubid pubid Cookie ER diagram accno Has call Has copy BIBFILE CALLFILE LIBFILE accno libid libid Note: diagram contains only attributes used for linking Has index Has subject INDXFILE SUBFILE accno subcode subcode SIMS 257 – Database Management

  40. What Problems? • What sorts of problems and missing features arise given the previous ER diagram? SIMS 257 – Database Management

  41. Field sizes inappropriate Author doesn’t allow multiple authors (editors, etc). Subtitles, parallel titles Edition information Series information lending status material type designation Genre, class information Better codes (ISBN?) Missing information (ISBN) Authority control for authors Missing/incomplete data Data entry problems Ordering information Illustrations Subfield separation (such as last_name, first_name) Separate personal and corporate authors Problems Identified SIMS 257 – Database Management

  42. Location field inconsistent No notes field No language field Zipcode doesn’t support plus-4 No publisher shipping addresses No (indexable) keyword search capability No support for multivolume works No support for URLs to online version to libraries to publishers Problems (Cont.) SIMS 257 – Database Management

  43. pubid accno Has call Has copy BIBFILE CALLFILE LIBFILE accno Libid Callno libid Library publishes PUBFILE pubid Address, etc Has index Has subject INDXFILE SUBFILE accno subid subid subject Original Cookie ER diagram SIMS 257 – Database Management

  44. PUBFILE pubid Cookie2: Separate Name Authorities pubid accno BIBFILE CALLFILE LIBFILE accno accno libid AUTHBIB libid nameid authtype INDXFILE SUBFILE AUTHFILE accno subcode subcode name nameid SIMS 257 – Database Management

  45. PUBFILE pubid Cookie3: Keywords termid accno termid pubid TERMS KEYMAP accno BIBFILE CALLFILE LIBFILE accno accno libid AUTHBIB libid nameid authtype INDXFILE SUBFILE AUTHFILE accno subcode subcode name nameid SIMS 257 – Database Management

  46. ser_title PUBFILE pubid SERIES seriesid Cookie 4: Series termid seriesid accno termid pubid TERMS KEYMAP accno BIBFILE CALLFILE LIBFILE accno accno libid AUTHBIB libid nameid authtype INDXFILE SUBFILE AUTHFILE accno subcode subcode name nameid SIMS 257 – Database Management

  47. ser_title PUBFILE pubid SERIES seriesid Cookie 5: Circulation termid seriesid accno termid pubid TERMS KEYMAP accno BIBFILE CALLFILE LIBFILE accno accno circid libid AUTHBIB libid nameid authtype PATRON CIRC INDXFILE SUBFILE AUTHFILE subcode accno subcode copynum patronid circid name nameid SIMS 257 – Database Management

  48. Mapping to Relations • Take each entity • BIBFILE • LIBFILE • CALLFILE • SUBFILE • PUBFILE • INDXFILE • And make it a table... SIMS 257 – Database Management

  49. Implementing the Physical Database... • For each of the entities, we will build a table… • Start up access… • Use “New” in Tables… • Loading data • Entering data • Data entry forms SIMS 257 – Database Management

More Related