normalizing your database and why you want to do it l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Normalizing Your Database and Why you WANT to do it! PowerPoint Presentation
Download Presentation
Normalizing Your Database and Why you WANT to do it!

Loading in 2 Seconds...

play fullscreen
1 / 39

Normalizing Your Database and Why you WANT to do it! - PowerPoint PPT Presentation


  • 284 Views
  • Uploaded on

INFYS540 Normalizing Your Database and Why you WANT to do it! Lesson 7 Chapter 5 Appendix Data Redundancy Problems Redundancy breeds errors Same data defined in multiple places is BAD Spelling/typographical error prone Lack of data integrity Inability to perform simple queries

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Normalizing Your Database and Why you WANT to do it!' - Anita


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
data redundancy problems
Data Redundancy Problems
  • Redundancy breeds errors
    • Same data defined in multiple places is BAD
    • Spelling/typographical error prone
    • Lack of data integrity
  • Inability to perform simple queries
  • Inflexibility and inscalability
  • Impossible to MAINTAIN!
relational database
Relational Database

PROJECTCHIEF

Project Project Chief

Computing 333-22-1111

Intranet987-65-4321

Contracting 123-45-6789

CAT 333-22-1111

EMPLOYEES

LNameFName SSNDept

Jones Mike 123-45-6789 M&B

Smith Tony 987-65-4321 C2G

Lee Bruce 567-89-1234 MLD

Doodle Yankee 333-22-1111 M&B

1

DEPARTMENTS

Dept Dept. DirectorRoom

MLD 181-94-5676 B115

C2G 987-65-4321 123

M&B 123-45-6789 147

1

What is a candidate key?

What is a primary key?

What is a foreign key?

purpose of normalization
Purpose of Normalization
  • Take advantage of the powerful tools available in a DBMS
  • There are five levels of Normalization
    • The higher the Normal Form the “better” and more efficient the database
    • But, increasing the levels of Normal Form takes time and effort
    • For most applications, 3rd Normal Form will solve most potential problems with a DB
normalizing database
Normalizing Database
  • Process of creating well-structured tables.
  • Improve performance, integrity of data
  • 5-step process (w/ 2 rules) to achieve

Third Normal Form (3NF)

  • First two steps put DB into a form so you can normalize it
rule 1 in databases
Rule #1 in Databases

Never design redundant data into a Database

duplicate data is not consistent

duplicate data wastes space

step 1 primary keys
Step 1. Primary Keys
  • A primary key is one or more data fields (columns) that uniquely identify each record in the table
  • What would the primary key be below?
        • “table of employees, assigned to a department.”

EMPLOYEES

LNameFNameSSNDept

Jones Mike 123-45-6789 Math

Smith Tony 987-65-4321 M&B

Lee Bruce 567-89-1234 Science

step 1 primary keys8
Step 1. Primary Keys
  • Answer: The SSN
  • It is the only “guaranteed” unique column in the table. Names are easily repeated.

EMPLOYEES

LNameFName SSNDept

Jones Mike 123-45-6789 Math

Smith Tony 987-65-4321 M&B

Lee Bruce 567-89-1234 Science

step 1 primary keys9
Step 1. Primary Keys
  • Now try the following example:
      • “A table of projects assigned to employees, listing the project name and the employee’s function on the project.”

A Counter --The MS Access Default Key

EmpProj

Counter SSNProject Function

1 123-45-6789 Dining Designer

2 123-45-6789 Computing Designer

3 987-65-4321 Contracting Designer

4 444-55-6666 Intranet Webmaster

5 222-99-7777 Dining Overwatch

step 1 primary keys10
Step 1. Primary Keys
  • It is the combination of the SSN and the Project fields. Why?

EMPLOYEES’ PROJECTS

Counter SSNProject Function

1 123-45-6789 Dining Designer

2 123-45-6789 Computing Designer

3 987-65-4321 Contracting Designer

4 444-55-6666 IntranetWebmaster

5222-99-7777 Dining Overwatch

step 1 primary keys11
Step 1. Primary Keys
  • Because, you can have the following:

EMPLOYEES’ PROJECTS

Counter SSNProject Function

1 123-45-6789 Dining Designer

2 123-45-6789 Dining Designer

3 987-65-4321 Intranet Designer

4 444-55-6666 Intranet Webmaster

5 222-99-7777 Dining Overwatch

  • Redundant records! (Redundancy = BAD)
rule 2 about databases
Rule #2 about Databases

NEVER Use a

Counter as a

Primary Key

step 2 eliminate many to many relationships
Step 2: Eliminate Many-to-Many Relationships
  • What is wrong with the following table?
      • “a table of personnel authorized access to a project”

PROJECTS QUERY ACCESS

Project Access_1 Access_2 Access_3

Dining 222-99-7777 181-94-5676

Computing 222-99-7777 181-94-5676

Intranet 987-65-4321 818-49-6765 123-45-6789

step 2 eliminate many to many relationships14
Step 2: Eliminate Many-to-Many Relationships
  • Here’s essentially what this table looks like within the Access relationships diagram:

Employees:

SSN

Last Name

First Name

....

Projects:

Project

Project Chief

Department

Access_1

Access_2

Access_3

has access to info about

step 2 eliminate many to many relationships15
Step 2: Eliminate Many-to-Many Relationships
  • Here’s how you model it in a database:
    • Break it up into two one-to-many relationships

Projects:

Project

Project Chief

Department

....

Employees:

SSN

Last Name

First Name

....

1

1

Access to

Project Info:

Project

SSN

step 2 eliminate many to many relationships16
Step 2: Eliminate Many-to-Many Relationships
  • How to do it:
    • The primary key of the new table is the composite of the primary keys of the existing tables.
      • Primary key of Projects = Project Name
      • Primary key of Employees = SSN
      • New table primary key of Project Name and SSN
step 2 eliminate many to many relationships17
Step 2: Eliminate Many-to-Many Relationships
  • No artificial restrictions on number of people with access
  • You can add attributes about the types of access granted
  • You can easily query who has access to information about each project

PROJ QUERY ACCESS

Project SSN

Dining 222-99-7777

Dining 181-94-5676

Computing 222-99-7777

Computing 181-94-5676

Intranet 987-65-4321

Intranet 818-49-6765

Intranet 123-45-6789

PROJECT

ProjectProjectChief Dept

Computing 333-22-1111 MATH

Intranet987-65-4321 M&B

Contracting 123-45-6789 M&B

CAT 333-22-1111 Admin

EMPLOYEE

LNameFName SSN

Jones Mike 123-45-6789

Smith Tony 987-65-4321

Lee Bruce 567-89-1234

Doodle Yankee 333-22-1111

what is wrong with the following
What is wrong with the following?

“A table of PCs, which are loaded with many different applications, and assigned to a user.”

PCSerial# LoadedSoftware Assigned

10291 Word, Powerpoint, ccMail Jones

10301 Word, Powerpoint, Lotus Notes Smith

10311 Word, LotusNotes, Borland C++ Hacker

step 3 achieving 1nf all data must be atomic
Step 3: Achieving 1NF:All Data must be Atomic
  • “Atomic” - the data occupying a field cannot be further broken down.
    • i.e., no multi-data entries
    • i.e., “No attributes can have more than one value for a single instance of an entity”

PCSerial# LoadedSoftware Assigned

10291 Word, Powerpoint, ccMail Jones

  • If not atomic, updating is complex and error prone
  • If not atomic, can not easily query the database
step 3 answer
Step 3 Answer

PCSerial# LoadedSoftware Assigned

10291 Word Jones

10291 Powerpoint Jones

10291 ccMail Jones

10301 Word Smith

10301 Powerpoint Smith

10301 LotusNotes Smith

10311 Word Hacker

10311 LotusNotes Hacker

10311 Borland C++ Hacker

step 3 achieving 1nf all data must be atomic21
Step 3. Achieving 1NF:All Data must be Atomic

Another source of redundancy: calculated fields

TotalYTD

Age

DaysRemaining

Solution: Use a Query!

Remove all calculated fields from table and create a query

...then use the query whenever you need up-to-date data

step 4 achieving 2nf eliminate partial dependencies
Step 4. Achieving 2NF:Eliminate Partial Dependencies
  • What is a partial dependency?
    • Look at the table. What’s redundant?
        • “A table of functions an employee is assigned to for a project, and the project chief.”

EMPLOYEES’ PROJECTS

SSNProject Function Project Chief

123-45-6789 Dining Designer 222-99-7777

123-45-6789 Computing Designer 333-88-5656

123-45-6789 Intranet Member 987-65-4321

987-65-4321 Intranet Designer 987-65-4321

444-55-6666 Intranet Webmaster 987-65-4321

222-99-7777 Dining Overwatch 222-99-7777

step 4 achieving 2nf eliminate partial dependencies23

EMPLOYEES’ PROJECTS

SSNProject Function ProjectChief

123-45-6789 Dining Designer 222-99-7777

123-45-6789 Computing Designer 333-88-5656

123-45-6789 Intranet Member 987-65-4321

987-65-4321 Intranet Designer 987-65-4321

444-55-6666 Intranet Webmaster 987-65-4321

222-99-7777 Dining Overwatch 222-99-7777

Step 4. Achieving 2NF:Eliminate Partial Dependencies
  • Function depends on the entire primary key: SSN and Project.
  • ProjectChief is dependent on just a portion of the primary key
step 4 achieving 2nf eliminate partial dependencies24
Step 4. Achieving 2NF:Eliminate Partial Dependencies
  • Why is this bad?
    • Well, what’s wrong with the following?

EMPLOYEES’ PROJECTS

SSNProject Function Project Chief

123-45-6789 Dining Designer 222-99-7777

123-45-6789 Computing Designer 333-88-5656

123-45-6789 Intranet Member 987-65-4321

987-65-4321 Intranet Designer 987-65-4321

444-55-6666 Intranet Webmaster 222-99-7777

222-99-7777 Dining Overwatch 222-99-7777

step 4 achieving 2nf eliminate partial dependencies25
Step 4. Achieving 2NF:Eliminate Partial Dependencies
  • A partial dependency (PD) occurs when a non-key field depends on only a part of the primary key, and not the whole primary key.
  • PDs are a relation. So, we need a new table.....

EMPLOYEES’ PROJECTS

SSNProject Function Project Chief

123-45-6789 Dining Designer 222-99-7777

123-45-6789 Computing Designer 333-88-5656

123-45-6789 Intranet Member 987-65-4321

987-65-4321 Intranet Designer 987-65-4321

444-55-6666 Intranet Webmaster 987-65-4321

222-99-7777 Dining Overwatch 222-99-7777

step 4 achieving 2nf eliminate partial dependencies26
Step 4. Achieving 2NF:Eliminate Partial Dependencies
  • Here’s how it should look......

EMPLOYEES’ PROJECTS

SSNProject Function

123-45-6789 Dining Designer

123-45-6789 Computing Designer

123-45-6789 Intranet Member

987-65-4321 Intranet Designer

444-55-6666 Intranet Webmaster

222-99-7777 Dining Overwatch

PROJECTS

Project Project Chief

Dining 222-99-7777

Computing 333-88-5656

Intranet 987-65-4321

step 5 achieving 3nf eliminate transitive dependencies
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
  • What is wrong with the following table?

PROJECTS

Project Project Chief Dept. Dept. Director Room

Dining 222-99-7777 Admin 181-94-5676 B115

Computing 333-88-5656 Admin 181-94-5676 B115

Intranet987-65-4321 M&B 818-49-6765 123

Contracting 187-87-8787 M&B 818-49-6765 123

CAT 333-22-1111 Grounds 123-45-6789 147

step 5 achieving 3nf eliminate transitive dependencies28
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
  • We have fields dependent on a non-key field:
    • The Director and Room fields clearly relate to the Dept., and have nothing to do with the project. (Dept is a “determinant” that is not a candidate key)

PROJECTS

Project Project Chief Dept. Dept. Director Room

Dining 222-99-7777 Admin 181-94-5676 B115

Computing 333-88-5656 Admin 181-94-5676 B115

Intranet987-65-4321 M&B 818-49-6765 123

Contracting 187-87-8787 M&B 818-49-6765 123

CAT 333-22-1111 GRND 123-45-6789 147

step 5 achieving 3nf eliminate transitive dependencies29
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
  • A transitive dependency occurs when a non-key field depends on another non-key field.
  • Why is this bad?.
    • A typo appeared in the Contracting line. A database without the transitive dependency would not have allowed this to happen.

PROJECTS

Project Project Chief Dept. Dept. Director Room

Dining 222-99-7777 Admin 181-94-5676 B115

Computing 333-88-5656 Admin 181-94-5676 B115

Intranet987-65-4321 M&B 818-49-6765 123

Contracting 187-87-8787 M&B 818-49-6765 124

CAT 333-22-1111 GRND 123-45-6789 147

step 5 achieving 3nf eliminate transitive dependencies30
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
  • How to do it:

a. Which fields are dependent on a non-key field in the table? (Director, Room)

b. Which fields are these dependent on? (Dept)

c. Create a new table with (b) as the primary key.

d. Put (a) in the new table.

e. Remove (a) from the old table.

step 5 achieving 3nf eliminate transitive dependencies31
Step 5: Achieving 3NF:Eliminate Transitive Dependencies
  • Here are the new tables.

PROJECTS

Project Project ChiefDept.

Dining 222-99-7777 Admin

Computing 333-88-5656 Admin

Intranet987-65-4321 M&B

Contracting 187-87-8787 M&B

CAT 333-22-1111 GRND

DEPARTMENTS

Dept. NameDept. DirectorRoom

Admin 181-94-5676 B115

M&B 818-49-6765 123

GRND 123-45-6789 147

slide32
Data Analysis: Normalization
  • An entity is in first normal form (1NF) if there are no attributes that can have more than one value for a single instance of the entity.
  • An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key – not just part of it.
  • An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes.
conclusion
Conclusion
  • Rule1: Never design redundant data into a database
  • Rule2: Never use a counter as Primary Key
  • Identify proper primary keys (1NF)
  • Break up many-to-many relationships (1NF)
  • 1NF: Break all data into atomic components
  • 2NF: Identify/eliminate partial dependencies
  • 3NF: Eliminate transitive dependencies
  • Common sense test
slide34
What is a Good Data Model?
  • A good data model is simple.
    • As a general rule, the data attributes that describe an entity should describe only that entity.
  • A good data model is essentially non-redundant.
    • This means that each data attribute, other than foreign keys, describes at most one entity.
  • A good data model should be flexible and adaptable to future needs.
    • We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs.
slide35
Data and Referential Integrity
    • There are at least three types of data integrity that must be designed into any database - key integrity, domain integrity and referential integrity.
    • Key Integrity:
      • Every table should have a primary key (which may be concatenated).
        • The primary key must be controlled such that no two records in the table have the same primary key value.
        • The primary key for a record must never be allowed to have a NULL value.
slide36
Data and Referential Integrity
    • Domain Integrity:
      • Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values.
    • Referential Integrity:
      • A referential integrity error exists when a foreign key value in one table has no matching primary key value in the related table.
slide37
Referential Integrity:
      • Referential integrity is specified in the form of deletion rules as follows:
        • No restriction.
          • Any record in the table may be deleted without regard to any records in any other tables.
        • Delete:Cascade.
          • A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table.
        • Delete:Restrict.
          • A deletion of a record in the table must be disallowed until any matching records are deleted from a related table.
        • Delete:Set Null.
          • A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL.
slide38
Database Capacity Planning
    • A database is stored on disk.
      • The database administrator will want an estimate of disk capacity for the new database to ensure that sufficient disk space is available.
    • Database capacity planning can be calculated with simple arithmetic as follows.
      • For each table, sum the field sizes.
        • This is the record size for the table.
      • For each table, multiply the record size times the number of entity instances to be included in the table.
        • This is the table size.
slide39
Database Capacity Planning
    • Database capacity planning can be calculated with simple arithmetic as follows. (continued)
      • Sum the table sizes.
        • This is the database size.
      • Optionally, add a slack capacity buffer (e.g., 10%) to account for unanticipated factors or inaccurate estimates above.
        • This is the anticipated database capacity.