1 / 31

LIS 557 Database Design and Management

William Voon Michael Cole Spring '04. LIS 557 Database Design and Management. 5 February 2004. Relational Databases. "Simplicity does not precede complexity, but follows it." - Alan J. Perlis. The Relational Database. 1970 – Invented by Edgar (Ted) Codd (IBM)

bly
Download Presentation

LIS 557 Database Design and Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. William Voon Michael Cole Spring '04 LIS 557Database Design and Management

  2. 5 February 2004 Relational Databases

  3. "Simplicity does not precede complexity, but follows it." - Alan J. Perlis

  4. The Relational Database 1970 – Invented by Edgar (Ted) Codd (IBM) "A Relational Model of Data for Large Shared Data Banks" http://www.acm.org/classics/nov95/toc.html 1981 – Turing Award 2002 – Forbes names relational databses as one of the most important modern innovations http://www.forbes.com/forbes/2002/1223/172_print.html

  5. The Problem • Databases were already important, but computer time cost hundreds of dollars a minute. • So, programs needed to be as efficient as possible before execution. • Early databases had • a rigid hierarchical structure, or • a complex system of navigational pointers to the physical locations of the data on magnetic tapes. It took groups of programmers to write database queries (CODASYL). Every query required a new program. New data types required a redesign of the database. The price of efficiency was utter inflexibility.

  6. Codd's Insight • Relationships between data items should be based on the item's values, and not on separately specified linking or nesting. • A big simplification for queries since we only talk about the data • Allows flexibility to use existing data in new ways • Users should be able to work at a more natural-language level and not have to know anything about the details of the data storage

  7. Codd's RDB • This separation of the data from the details of its storage, etc. requires much more computing power to keep track of those details • In this respect relational databases are far less efficient than thier predecessors, but they are spectacularly more cost effective • Relational databases really took off as computing power continued to improve (Moore's Law) • Today, relational databases are dominant.

  8. RDB development • 1973 - IBM begins development in the System R project • Structured Query Language (SQL), to express queries (Don Chamberlin and Ray Boyce) • A cost-based optimizer, which automatically translates a high-level query into an efficient plan for executing the query (Pat Selinger) • A query compiler, to save query plans for future use (Raymond Lorie) • Boyce and Codd also developed Boyce-Codd Normal Form to design efficient RDB tables without needless duplication of information in different tables

  9. Early RDB Commercialization • 1970s – INGRES UC-Berkeley (Mike Stonebraker) • 1977 – Relational Software Inc. (Larry Ellison) produces first commercially available relational database. • 1981 – IBM announces SQL/DS. • 1983 – IBM introduces DB2 for large mainframes. It now stores data on handhelds to supercomputers and handles billions of transactions per day • 1983 – Relational Software renamed Oracle

  10. The RDBMS • Handles the complex physical details of the location and actual relationships between the data sets • Presents the data model so that the user sees the database as a collection of tables • In designing and using the database we need only think about relationships between tables

  11. Tables Columns store attributes (same data type for each column!) Rows (tuples) store entities Table = entity set = relation

  12. A flat version of a database Why is this inefficient?

  13. Relating Tables Table name: CUSTOMER Table name: SALESREP The tables are independent, but we can still learn that Nam June Paik is Sonny Blount's sales rep or that Paik handles both Blount and John Gilmore (and that makes sense).

  14. What's the big deal? • Complex things (the point of the database) can be broken into logical units (perhaps along the lines of the data domains) • The units (aka tables) can be selected so that they minimize the redundant data in all of the tables

  15. Nature of Inefficiency • Duplicated data • Natural groupings are associated with types of queries

  16. Anatomy of a Table • A table (relation) is a 2-D structure • Each row (tuple) defines a single entity • Each column defines the attributes of the entities in the set • Each row/column element has a single data value of a type appropriate to the attribute (and allowed by the software) • Order of rows and columns does not matter to the user • Every table has a primary key

  17. Column = Attribute • Each column represents an attribute • Each column has a unique name • The values in the column are of a single type • Range of possible values is the column's domain

  18. Row = Entity • A row corresponds to an entity, so reading across the columns we see all of the specific attributes for that entity • Taken together the rows are the entity set. Notice that this is logically equivalent to the table itself.

  19. Null Values • Can be useful • Placeholder for an unknown attribute value • Known, but missing attribute value • A 'not applicable' condition

  20. Data Types • Options are defined by the software • Should be appropriate to the attribute • Standard data types: • Numeric • Character • Date • Logical • Other data types may be supported

  21. Numeric Data • Numbers that are arithmetically meaningful • It makes sense to add, subtract, etc. • Other uses of numbers are handled as character data • e.g. Telephone numbers, credit card identifiers, zip codes

  22. Character Data • Also called string data • Any set of characters that is not intended for mathematical manipulation

  23. Date Data Calendar dates are stored in a Julian date format invented by Joseph Scaliger in 1583 • the Julian date is the number of elapsed days since the beginning of a cycle of 7,980 years • The system allows easy calculation of the integer difference between two calendar dates

  24. Julian Dates • The 7,980 year cycle was derived by combining several traditional time cycles (solar, lunar, and a particular Roman tax cycle) for which 7,980 was a common multiple. • The first Julian cycle began on January 1, 4713 B.C. and will end on January 22, 3268. February 5, 2004 @ 19:00:00 EST = 2453041.5000 A calendar calculator: http://www.fourmilab.com/documents/calendar/

  25. Logical Data • True / False • Can perform Boolean logical calculations (AND, OR, NOT) on this data

  26. Keys • The RDB works because of controlled redundancy • Tables share common attributes, so they can be linked • The idea is that a specific attribute in one table can be used to point to a row (entity) in another table

  27. Key Types • Superkey • Can be an attribute or a combination of attributes that uniquely identifies each entity (row) in a table • Candidate key • A minimal superkey (A superkey that contains no superkeys) • Primary key • A candidate key that identifies all the values in a row (entity)

  28. The Primary Key • Each table must have a primary key • There must be a unique key for each entity (row), null values are not allowed • The primary key is the identifier for each entity • A table with this property has entity integrity

  29. Key Types • Secondary key • An attribute (or combination) used only for data retrieval • Foreign key • An attribute (or combination) in one table whose values must match the primary key in another table or be null.

  30. Integrity • Entity integrity depends on the primary key • Ensures each entity has a unique identifier • Referential integrity depends on the foreign key(s) • Makes it impossible for an attribute to have an invalid entry (might be null)

  31. Assignment • Design a collection of tables for a database • Briefly, what is the database about? Who are the users? What purposes will the database serve? • At least three related tables • Explain why this is the most efficient design • Is this also the most logical design?

More Related