
Data Management: Databases and Organizations Richard Watson

Data Management: Databases and Organizations Richard Watson. Summary of Chapters 3-6 prepared by Kirk Scott. Data Modeling and SQL. Chapter 3. The Single Entity Chapter 4. The One-to-Many Relationship Chapter 5. The Many-to-Many Relationship


Presentation Transcript


  1. Data Management: Databases and Organizations, Richard Watson

    Summary of Chapters 3-6 prepared by Kirk Scott
  2. Data Modeling and SQL: Chapter 3, The Single Entity; Chapter 4, The One-to-Many Relationship; Chapter 5, The Many-to-Many Relationship; Chapter 6, One-to-One and Recursive Relationships.
  3. Introduction. Large parts of these overheads will be somewhat repetitive. They cover in general terms some of the things that were specifically illustrated by concrete SQL examples. However, the repetition shouldn’t be harmful. It should put the examples into a broader context and add new examples to flesh the ideas out. The ultimate goal is for the basic concepts and diagramming to be clear, so that there will be no trouble considering design questions in unit 14.
  4. Chapter 3. The Single Entity. The author starts with the entity relationship diagramming conventions and the concept of a single entity. The author represents an entity with a box containing its name in capital letters at the top. Full field names are given after that in small letters. The primary key field is marked with an asterisk.
  5. Different diagramming conventions are perfectly acceptable, as long as you are consistent. The name of the entity may be given above the box representing it. You may choose to capitalize just the first letter of the name.
  6. In theory, you could qualify field names, although this would be redundant, given the entity name at the top. You could also use short names for fields if space is at a premium. Primary keys could be marked with pk or underlined.
  7. Chapter 4. The One-to-Many Relationship. The author uses the crow’s foot to mark a one-to-many relationship in an ER diagram. In a simple ER diagram, fields may not be listed, just entity names and crow’s feet. In a more complete diagram, fields can be listed.
  8. The author does not include the embedded pk/fk in the list of fields in the fk/many table because it is redundant. I do not follow this convention. I believe that, in the interests of clarity, it is worthwhile to include the fk in the list of fields.
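As a minimal sketch of how a single entity with a starred primary key might turn into a table, the following uses Python's built-in sqlite3 module. The table name `item` and its fields are hypothetical, chosen only for illustration; they are not taken from the book's diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical single entity: itemno plays the role of the starred pk field.
conn.execute("""
    CREATE TABLE item (
        itemno   INTEGER PRIMARY KEY,
        itemname TEXT NOT NULL,
        itemtype TEXT
    )
""")
conn.execute("INSERT INTO item VALUES (1, 'Shovel', 'Tool')")

# A second row with the same pk value violates the primary key constraint.
try:
    conn.execute("INSERT INTO item VALUES (1, 'Screw', 'Hardware')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
print(duplicate_rejected, row_count)
```

The primary key is what guarantees that each row of the entity can be identified uniquely, whichever diagramming convention is used to mark it.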
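A one-to-many relationship with the fk listed explicitly in the many table might be sketched as follows. The `dept` and `emp` tables and their fields are assumptions made for this example, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces fks only when asked

conn.executescript("""
    CREATE TABLE dept (
        deptno   INTEGER PRIMARY KEY,
        deptname TEXT NOT NULL
    );
    -- The many side lists the embedded fk among its fields.
    CREATE TABLE emp (
        empno   INTEGER PRIMARY KEY,
        empname TEXT NOT NULL,
        deptno  INTEGER NOT NULL REFERENCES dept(deptno)
    );
    INSERT INTO dept VALUES (10, 'Marketing');
    INSERT INTO emp VALUES (1, 'Alice', 10), (2, 'Bob', 10);
""")

# An employee pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO emp VALUES (3, 'Carol', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

emps_in_marketing = [
    r[0] for r in conn.execute(
        "SELECT empname FROM emp WHERE deptno = 10 ORDER BY empno")
]
print(orphan_rejected, emps_in_marketing)
```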
  9. Chapter 5. The Many-to-Many Relationship. As is known, the many-to-many relationship is the most “complicated” of the relationships. The book presents some interesting examples that arise in real situations. They illustrate ideas that are not immediately apparent from the examples that have gone before. The first example is based on a bill of sale, shown on the next overhead.
  10. The Bill of Sale Example: An Interesting Case of a pk/fk Relationship
  11. The book analyzes this situation as consisting of base entities, which are a sale and the items which are sold. There is a many-to-many relationship between these base entities because each sale can consist of many items. Also, each item can be present in many sales. The book’s ER diagram for this analysis is shown on the next overhead.
  12. When first introducing many-to-many relationships, I referred to the table in the middle. More formally, the book refers to an associative entity. The associative entity is the table in the middle that captures the relationship between two base entities.
  13. In the ER notation for this example the + sign is used. This has not been seen before. For the purposes of understanding the book’s example, it is important to know what this means.
  14. The + sign is shown over a crow’s foot. It symbolizes the fact that the embedded fk is part of the pk of the table it’s embedded in. You have seen an example of a table in the middle where the pk is the concatenation of the two embedded fk’s. This example is not the same as that.
  15. In this example the saleno is the pk of the Sale table. It is embedded as a fk in the Lineitem table. A saleno value will appear in the Lineitem table as many times as there are separate lines belonging to the sale. These separate lines are identified by lineno’s. The lineno’s are not embedded fk’s based on the unique identifiers, itemno’s, of entries in the Item table.
  16. An alternative way of representing the relationship would be to list the fields of the table in the middle this way: saleno (pk, fk); lineno (pk); itemno (fk); lineqty; lineprice. Note again that the saleno is both a pk and a fk, while the lineno is purely pk.
  17. At first glance it may seem a little strange, but the table in the middle contains every line of every sale, listed separately. It is the saleno and the lineno together which uniquely identify the entries in the Lineitem table. This model actually reflects reality well. It differs, in particular, from the car sale example.
  18. In the car sale example, there were individual cars that were sold. In the example database they were only shown as being sold once. In reality, the same car might be sold more than once. This could be modeled by making the salesdate part of the pk of the Carsale table.
  19. In the Sale, Lineitem, Item example, the items are not actually individual items. An item is a kind of item, like a screw or a shovel or a microwave oven. The seller may have many of each kind of item in stock and doesn’t distinguish between individual items.
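The Sale, Lineitem, Item structure described here could be sketched as below. The field names saleno, lineno, itemno, lineqty, and lineprice come from the overheads; the remaining columns (`saledate`, `itemname`) and all of the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE sale (saleno INTEGER PRIMARY KEY, saledate TEXT);
    CREATE TABLE item (itemno INTEGER PRIMARY KEY, itemname TEXT);
    CREATE TABLE lineitem (
        saleno    INTEGER NOT NULL REFERENCES sale(saleno),  -- pk and fk
        lineno    INTEGER NOT NULL,                          -- pk only
        itemno    INTEGER NOT NULL REFERENCES item(itemno),  -- fk only
        lineqty   INTEGER,
        lineprice REAL,
        PRIMARY KEY (saleno, lineno)
    );
    -- Made-up sample rows.
    INSERT INTO sale VALUES (1, '2024-01-15'), (2, '2024-01-16');
    INSERT INTO item VALUES (100, 'Screw'), (200, 'Shovel');
    INSERT INTO lineitem VALUES
        (1, 1, 100, 10, 0.05),
        (1, 2, 200, 1, 25.00),
        (2, 1, 100, 50, 0.05);
""")

# Each saleno appears once per line of that sale.
lines_per_sale = conn.execute(
    "SELECT saleno, COUNT(*) FROM lineitem GROUP BY saleno ORDER BY saleno"
).fetchall()
print(lines_per_sale)
```

Note that lineno restarts at 1 for each sale; it only identifies a row in combination with saleno.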
  20. Multiple instances of the same (kind of) item may be sold to the same customer. Also, the same (kind of) item can be sold to more than one customer. It’s not incredibly difficult, but it’s worth emphasizing that the itemno does appear in the table in the middle as a fk. This tells which item that line of a sale was in reference to. However, the itemno is not part of the pk of the table in the middle.
  21. In a perfect world, you might argue that each item should appear on only one line of a sale. If so, then you could dispense with individual line numbers and use the itemno as part of the pk instead. However, reality makes the given solution better.
  22. A data model should be flexible and accommodate all possibilities. Could a customer, in the middle of making a purchase, decide that more instances of a certain item were desired? If so, do you allow this, and how do you support it?
  23. From a business point of view, few things are more destructive than a computer system whose model imposes artificial constraints on the user (seller and customer). Of course, if a customer decides that more instances are desired, you want to sell them.
  24. Have you ever heard things like these: “I’d like to let you buy more, but the computer won’t allow it.” “I’d like to let you buy more, but it will be necessary to start a completely new bill of sale.” “I’d like to let you buy more, but it will be necessary to go back and modify the earlier line of the sale for that item.”
  25. In any of the previous scenarios, both the customer and the salesperson want to scream. The best scenario would go like this: “Oh, you want 20 instead of 10? We’ll just add another line here at the bottom for another 10.” Now everybody sighs with satisfaction…
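The "add another line" scenario can be sketched by comparing the two candidate keys. Both tables below are stripped-down, hypothetical versions: `lineitem` uses (saleno, lineno) as its pk, while `lineitem_alt` uses (saleno, itemno), the "perfect world" design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lineitem (
        saleno INTEGER, lineno INTEGER, itemno INTEGER, lineqty INTEGER,
        PRIMARY KEY (saleno, lineno)
    );
    CREATE TABLE lineitem_alt (
        saleno INTEGER, itemno INTEGER, lineqty INTEGER,
        PRIMARY KEY (saleno, itemno)
    );
    -- The customer starts with 10 of item 100 on sale 1.
    INSERT INTO lineitem VALUES (1, 1, 100, 10);
    INSERT INTO lineitem_alt VALUES (1, 100, 10);
""")

# "We'll just add another line here at the bottom for another 10."
conn.execute("INSERT INTO lineitem VALUES (1, 2, 100, 10)")
total_qty = conn.execute(
    "SELECT SUM(lineqty) FROM lineitem WHERE saleno = 1 AND itemno = 100"
).fetchone()[0]

# The alternative key forbids a second line for the same item on the same sale.
try:
    conn.execute("INSERT INTO lineitem_alt VALUES (1, 100, 10)")
    alt_allows_second_line = True
except sqlite3.IntegrityError:
    alt_allows_second_line = False

print(total_qty, alt_allows_second_line)
```

With the alternative key, the only options are the ones the salesperson apologizes for: refuse, start a new sale, or go back and update the earlier line.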
  26. Relational Division, For All, and Not Exists. The book points out that SQL, with operations like AND, OR, NOT, and so on, has qualities of algebra. Similarly, there are set operators like UNION. Although Microsoft Access SQL doesn’t support INTERSECT, some implementations do.
  27. The Cartesian product represents a form of multiplication for relations. The results of a join operation are a subset of the results of a product. In an algebraic system, the existence of a multiplication operation implies the existence of a division operation.
  28. As pointed out when doing the concrete SQL examples, there is no FOR ALL operator. However, double NOT EXISTS can accomplish the same thing. FOR ALL/double NOT EXISTS is roughly analogous to division in a relational system. Before we’re finished with SQL we will see queries which are actually stated in terms of division.
  29. This is the point where Watson takes up the case of double NOT EXISTS. The book shows an ER diagram of 3 tables capturing a many-to-many relationship. This diagram is labeled generically, but it is of the same structure as the Lineitem example.
  30. It then outlines the double NOT EXISTS query that could be written for it. The fact that this models the Lineitem example is not important. The table in the middle could have a completely concatenated primary key. It could also have its own, separate primary key.
  31. The important point is that the base tables are at the ends of the ER diagram. The book refers to these as target and source, respectively. The table in the middle, the associative entity, is labeled Target-Source by the book.
  32. If you want to find those rows of the target which are in relation to all of the rows of the source, then in the double NOT EXISTS query: the target appears first, in the outermost query; the source appears second, in the middle, in the first nested subquery; and the table in the middle appears last, in the second nested subquery. The ER diagram and the schematic query are shown on the next overhead.
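A concrete instance of the target/source ordering might look like the sketch below. It assumes, for illustration only, that the target is `item`, the source is `sale`, and the table in the middle is `lineitem`, so the question becomes "which items appear in all sales?" The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (saleno INTEGER PRIMARY KEY);
    CREATE TABLE item (itemno INTEGER PRIMARY KEY, itemname TEXT);
    CREATE TABLE lineitem (
        saleno INTEGER, lineno INTEGER, itemno INTEGER,
        PRIMARY KEY (saleno, lineno)
    );
    INSERT INTO sale VALUES (1), (2);
    INSERT INTO item VALUES (100, 'Screw'), (200, 'Shovel');
    -- Screws are on both sales; shovels are only on sale 1.
    INSERT INTO lineitem VALUES (1, 1, 100), (1, 2, 200), (2, 1, 100);
""")

# Target (item) outermost, source (sale) in the first nested subquery,
# table in the middle (lineitem) in the second nested subquery.
# Read as: items for which there is no sale that lacks a line for that item.
items_in_all_sales = conn.execute("""
    SELECT itemno, itemname
    FROM item
    WHERE NOT EXISTS (
        SELECT * FROM sale
        WHERE NOT EXISTS (
            SELECT * FROM lineitem
            WHERE lineitem.itemno = item.itemno
              AND lineitem.saleno = sale.saleno
        )
    )
    ORDER BY itemno
""").fetchall()
print(items_in_all_sales)
```

The innermost subquery is correlated with both outer levels, which is what ties a particular target row to a particular source row.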
  33. A Design with a Cycle. The next diagram illustrates a design containing a cycle. Such designs will become especially important when considering normalization, the theory of correctness in designs. For the time being, simply note that there is nothing preventing designs with cycles.
  34. A Concatenated Key with Date. The next example design is one where both of the embedded foreign keys are part of the primary key of a table in the middle. However, it is more complicated than that because a date field is also included in the primary key. This allows the same pair of base values to be paired with each other more than once.
  35. A Simple Concatenated Key. The next design is actually somewhat simpler. It also has two embedded pk/fk’s in the table in the middle. The table in the middle isn’t pure key, though. There is also a non-key attribute field for the table in the middle.
  36. The Music CD Library Example. In the overheads for chapters 3 and 4 some very primitive starting designs were given for a collection of music CD’s. At the end of chapter 5, with the capability to model many-to-many relationships, this model blossoms.
  37. On the next overhead an 8 entity design is shown. Note that 4 of the 8 entities can be classified as associative entities. These are the entities: CD, Composition, Label, Person, Person-CD, Person-Composition, Person-Track, Track.
  38. The next overhead shows the music CD design blossoming further. The Person-Track table has been removed. Recording and Person-Recording tables have been added. In the book, the new relationships are analyzed. I will not list the analysis here. The new design reflects additional assumptions and capabilities. The new design should be a better model of reality, with fewer exceptions and more flexibility.
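A concatenated key that includes a date could be sketched as follows. The book's actual diagram is not reproduced in these overheads, so the `car`, `customer`, and `carsale` tables, their fields, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE car (carno INTEGER PRIMARY KEY);
    CREATE TABLE customer (custno INTEGER PRIMARY KEY);
    CREATE TABLE carsale (
        carno     INTEGER NOT NULL REFERENCES car(carno),
        custno    INTEGER NOT NULL REFERENCES customer(custno),
        salesdate TEXT NOT NULL,
        price     REAL,
        PRIMARY KEY (carno, custno, salesdate)  -- both fks plus the date
    );
    INSERT INTO car VALUES (1);
    INSERT INTO customer VALUES (7);
    INSERT INTO carsale VALUES (1, 7, '2020-03-01', 9000.0);
""")

# The same car and customer can be paired again on a different date.
conn.execute("INSERT INTO carsale VALUES (1, 7, '2023-06-15', 6500.0)")
pairings = conn.execute(
    "SELECT COUNT(*) FROM carsale WHERE carno = 1 AND custno = 7"
).fetchone()[0]

# Repeating the pair on the same date still violates the key.
try:
    conn.execute("INSERT INTO carsale VALUES (1, 7, '2023-06-15', 6400.0)")
    same_day_repeat_rejected = False
except sqlite3.IntegrityError:
    same_day_repeat_rejected = True

print(pairings, same_day_repeat_rejected)
```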
  39. Chapter 6. One-to-One and Recursive Relationships. The meaning of a one-to-one relationship should be clear. The book uses the term recursive relationship for those cases where a table is in a relationship with itself.
  40. One-to-One Relationships. You may recall some of the different options for capturing one-to-one relationships. If the relationship is truly one-to-one in all cases at all times, then it can be captured in a single relation. Otherwise, you end up embedding the pk of one entity as a fk in another.
  41. Maintaining this as a one-to-one relationship then becomes a question of data integrity. When choosing which pk to embed as a fk, you should take into consideration any possible exceptions or changes in the relationship in the future. The book has a number of examples which illustrate details of this concept.
  42. The book’s examples start with a company with a two-level management hierarchy. There are bosses of departments and there is an overall managing director. The (non-ER) diagram on the following overhead illustrates this.
  43. Next the book shows an ER diagram illustrating that departments have employees and that departments have bosses. A garden variety crow’s foot doesn’t have to be labeled. A one-to-one relationship should be labeled.
  44. The foregoing diagram doesn’t explicitly show whether the pk of Dept is embedded as a fk in Emp or vice versa. In this case it is likely that the pk of Emp is embedded as a fk in Dept. This is because, all else being equal, a department will have a boss. However, few employees will be bosses. There would be lots of nulls if there were a “department which you’re the boss of” field in Emp.
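One way to embed the pk of Emp as a fk in Dept while keeping the relationship one-to-one is a UNIQUE constraint on the fk, sketched below. The column names and sample rows are hypothetical, and the Emp-to-Dept membership relationship is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE emp (
        empno   INTEGER PRIMARY KEY,
        empname TEXT NOT NULL
    );
    CREATE TABLE dept (
        deptname TEXT PRIMARY KEY,
        -- Every department has a boss (NOT NULL), and UNIQUE keeps any
        -- employee from being the boss of more than one department.
        bossno   INTEGER NOT NULL UNIQUE REFERENCES emp(empno)
    );
    INSERT INTO emp VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO dept VALUES ('Marketing', 2);
""")

# Making Bob the boss of a second department breaks the one-to-one rule.
try:
    conn.execute("INSERT INTO dept VALUES ('Sales', 2)")
    second_dept_rejected = False
except sqlite3.IntegrityError:
    second_dept_rejected = True

print(second_dept_rejected)
```

Because the fk lives in Dept, no nulls are needed for the many employees who are not bosses.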
  45. A One-to-Many Recursive Relationship. Next, the book considers recording which employee is which other employee’s boss. This leads to what the book calls a recursive relationship. This is when there is a one-to-many relationship between a table and itself. Such a one-to-many relationship should be labeled because the meaning of the embedding would not necessarily be clear. An ER diagram illustrating this follows.
  46. The previous design may not be ideal. If every employee is assigned to a department, it would seem that the employee’s boss would be the boss of that department. At first glance, at the very least, this appears to be redundant. Redundancy means that information is repeated, and it opens up the possibility of inconsistencies between the repeated representations of the same data.
  47. However, this is another problem that arises from real life. Ask yourself, what departments are the bosses of departments assigned to? For example, if “Bob” is the head of Marketing and his department is listed as Marketing, is he his own boss? It should be apparent that his boss is the managing director.
  48. Another detail that might be considered is split assignments or temporary assignments. If an employee is split 50-50 between departments, who is their boss? If an employee is only temporarily assigned to a department, who is their boss? The apparently redundant design allows such cases to be handled with full flexibility.
  49. A One-to-One Recursive Relationship that Forms a Linked List. The next example the book pursues is a little artificial. However, something like it might arise in real life, and this provides an introduction to the idea. It is possible for there to be a one-to-one relationship between a table and itself.
  50. The following overhead illustrates the idea with the succession of monarchs. The idea is that the pk of the monarch table is embedded as a fk in the table. Every monarch except the first has the previous monarch recorded. The problem could also be solved by simply recording a numbering for the monarchs.
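The recursive boss relationship, including the "Bob" case, might be sketched with a self-referencing fk and a self-join. The column names are assumptions, the department is stored as a plain name for brevity, and "Alice" stands in for the managing director.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE emp (
        empno    INTEGER PRIMARY KEY,
        empname  TEXT NOT NULL,
        deptname TEXT,
        bossno   INTEGER REFERENCES emp(empno)  -- fk back into the same table
    );
    INSERT INTO emp VALUES
        (1, 'Alice', NULL, NULL),        -- managing director, no boss
        (2, 'Bob', 'Marketing', 1),      -- head of Marketing, boss is Alice
        (3, 'Carol', 'Marketing', 2);    -- works in Marketing, boss is Bob
""")

# Join the table to itself: w is the worker's row, b is the boss's row.
worker_boss = conn.execute("""
    SELECT w.empname, b.empname
    FROM emp AS w LEFT JOIN emp AS b ON w.bossno = b.empno
    ORDER BY w.empno
""").fetchall()
print(worker_boss)
```

Bob and Carol share a department but have different bosses, which is exactly the information that would be lost if the boss were derived from the department alone.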
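The linked-list idea can be sketched with a self-referencing fk that is also UNIQUE, so that each monarch is the predecessor of at most one other. The table and column names are hypothetical, and the three-row sample starts at Victoria purely to keep it short, so she plays the role of "the first" here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE monarch (
        monno    INTEGER PRIMARY KEY,
        monname  TEXT NOT NULL,
        -- fk to the previous monarch; UNIQUE makes the relationship 1-to-1.
        premonno INTEGER UNIQUE REFERENCES monarch(monno)
    );
    INSERT INTO monarch VALUES
        (1, 'Victoria', NULL),
        (2, 'Edward VII', 1),
        (3, 'George V', 2);
""")

# Pair each monarch with the predecessor recorded in its own row.
successions = conn.execute("""
    SELECT cur.monname, pre.monname
    FROM monarch AS cur JOIN monarch AS pre ON cur.premonno = pre.monno
    ORDER BY cur.monno
""").fetchall()

# A second successor for the same predecessor is rejected.
try:
    conn.execute("INSERT INTO monarch VALUES (4, 'Pretender', 1)")
    second_successor_rejected = False
except sqlite3.IntegrityError:
    second_successor_rejected = True

print(successions, second_successor_rejected)
```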
  51. A Many-to-Many Recursive Relationship. The next example considers a table in a many-to-many relationship with itself. This is another example drawn from real life which is very instructive about how relational databases work. It is helpful because it brings out one of the limitations of relational databases. It provides insight into the subject of object-oriented databases.
  52. Whenever a table is in a relationship with itself, the book refers to this as a recursive relationship. As far as I’m concerned, the use of the term recursive is optional, although descriptive. I am just as happy in this context with saying “in a relationship with itself”. In any case, consider the ER diagram on the next overhead and the explanatory remarks that follow.
  53. The idea is that the Product table contains entries for stand-alone products (possible sub-products) and for products (super-products) that consist of collections of other products. Potentially the Product table might also contain things (sub-products) which themselves aren’t even individual products, but which only exist as components of finished products.
  54. The Assembly table is the table which shows the relationship between products and sub-products (whether those sub-products have an independent existence or not). Notice that both of the crows’ feet in the diagram have + signs on them. This means that the pk of an assembly is the concatenation of the embedded fk’s of a (super) product and a (sub) product.
  55. In addition, the Assembly table has a quantity field, telling how many of the sub-product there are in the super-product. If you assume that this is just a two-level hierarchy with super-products and sub-products, things seem relatively clear. However, both from a database point of view and a real life point of view, there is no need for this restriction to apply.
  56. There is no reason why a given product might not consist of several other (sub) products. Each of these (sub) products, in turn, might be super-products consisting of other sub-products, and so on. Now the descriptiveness of the term recursion becomes apparent.
  57. There is no theoretical limit on how deeply things might be related in this kind of “has-a” relationship. Practically speaking, the only limit is how many rows there are in the Product table. This last claim leads to one more observation.
  58. Data integrity would require that no product be a super-product or sub-product of itself. Otherwise you would have a containment cycle. It seems apparent that in real life this shouldn’t occur.
  59. The product-assembly relationship crops up reasonably frequently in real life. If you think about it, what’s really being captured is a tree-like containment structure. Manufacturing is a problem domain where this is relevant.
  60. Working from the top down, a car has various components, including doors. Doors may be made of a variety of panels, among other things. The panels may consist of various items, including screws. And so on, down the line.
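The Product and Assembly tables, with the pk of Assembly formed from two fk's into Product, might be sketched as below. The column names are assumptions. A CHECK constraint is one simple way to stop a product from directly containing itself, but it does not catch longer cycles, as the last insert shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (
        prodid   INTEGER PRIMARY KEY,
        prodname TEXT NOT NULL
    );
    CREATE TABLE assembly (
        prodid    INTEGER NOT NULL REFERENCES product(prodid),  -- super-product
        subprodid INTEGER NOT NULL REFERENCES product(prodid),  -- sub-product
        quantity  INTEGER NOT NULL,
        PRIMARY KEY (prodid, subprodid),
        CHECK (prodid <> subprodid)  -- no product directly contains itself
    );
    INSERT INTO product VALUES (1, 'Car'), (2, 'Door');
    INSERT INTO assembly VALUES (1, 2, 4);  -- a car has 4 doors
""")

# Direct self-containment violates the CHECK constraint.
try:
    conn.execute("INSERT INTO assembly VALUES (2, 2, 1)")
    self_containment_rejected = False
except sqlite3.IntegrityError:
    self_containment_rejected = True

# An indirect cycle (a door containing a car) is NOT caught by the CHECK.
try:
    conn.execute("INSERT INTO assembly VALUES (2, 1, 1)")
    indirect_cycle_rejected = False
except sqlite3.IntegrityError:
    indirect_cycle_rejected = True

print(self_containment_rejected, indirect_cycle_rejected)
```

Ruling out indirect cycles would need extra logic in the application or in a trigger; that is beyond what the table definitions alone can express.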
  61. SQL and Recursive Relationships. The given relational design works, to a certain extent, but it has shortcomings. For example, it is not necessarily an easy way to understand or a natural way to envision tree-like relationships. In particular, consider what you know about SQL and what kind of query you might like to execute against products and assemblies.
  62. SQL is non-procedural. For a given product you could ask for all of its immediate sub-products or sub-assemblies. However, with the non-recursive SQL covered so far, it would not be possible to form a single query that would retrieve all of the constituent parts of a given product to an arbitrary depth. That form of SQL won’t allow you to travel “down the tree”.
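The limitation described here applies to SQL without recursion, such as the Microsoft Access SQL used elsewhere in these overheads. As an aside that goes beyond the overheads, the SQL:1999 standard added recursive common table expressions, and dialects that support them (SQLite among them) can walk down the tree. The sketch below assumes the hypothetical Product and Assembly columns shown and uses the car, door, panel, screw example with made-up quantities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (prodid INTEGER PRIMARY KEY, prodname TEXT);
    CREATE TABLE assembly (
        prodid INTEGER, subprodid INTEGER, quantity INTEGER,
        PRIMARY KEY (prodid, subprodid)
    );
    INSERT INTO product VALUES (1, 'Car'), (2, 'Door'), (3, 'Panel'), (4, 'Screw');
    INSERT INTO assembly VALUES (1, 2, 4), (2, 3, 2), (3, 4, 8);
""")

# Ordinary SQL: only the immediate sub-products of the car.
immediate = [r[0] for r in conn.execute("""
    SELECT p.prodname
    FROM assembly AS a JOIN product AS p ON p.prodid = a.subprodid
    WHERE a.prodid = 1
    ORDER BY p.prodid
""")]

# Recursive CTE: start from the immediate parts, then keep following
# assembly rows downward. UNION drops duplicates, so this also terminates
# if the data happens to contain a cycle.
all_parts = [r[0] for r in conn.execute("""
    WITH RECURSIVE parts(prodid) AS (
        SELECT subprodid FROM assembly WHERE prodid = 1
        UNION
        SELECT a.subprodid
        FROM assembly AS a JOIN parts AS pt ON a.prodid = pt.prodid
    )
    SELECT p.prodname
    FROM parts JOIN product AS p ON p.prodid = parts.prodid
    ORDER BY p.prodid
""")]

print(immediate, all_parts)
```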
  63. Object-Oriented Databases. It is these problems that led, at least in part, to the development of what are known as object-oriented databases. In essence, O-O databases are constructed around tree-like containment.
  64. Although O-O db’s are extremely useful in some problem domains, it is estimated that they have about 5% of the commercial market. The remaining 95% is relational because relational db’s are applicable and convenient in so many other problem domains.
  65. The CD Music Library Again. The chapter concludes with the latest version of the CD music library. It illustrates several points. Although the ER diagram is useful for getting the big picture, it’s becoming clear that without written text explaining the problem and the assumptions made, you haven’t completely and clearly documented what’s going on.
  66. This example illustrates another point, which is also relevant to the final project. You might have thought that a CD music library was a pretty simple, toy application. Notice that it has grown to 13 tables, twice as many as you’re required to have for your project.
  67. It is likely that before you’re finished with your project, you will be simplifying the problem you tackled so that you meet the minimum requirements without inviting too much trouble for yourself.
  68. The previous version of the design had these tables: CD, Composition, Label, Person, Person-CD, Person-Composition, Person-Recording, Recording, Track. This latest version has these tables added to it: Group, Group-CD, Group-Recording, Person-Group. The ER diagram is shown on the following overhead.
  69. The End