- 60 Views
- Uploaded on
- Presentation posted in: General

CM20145 Further DB Design – Normalization

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Dr Alwyn Barry

Dr Joanna Bryson

CM20145Further DB Design –Normalization

- Database design is an ongoing, iterative process.
- Requirements come from data, user demands, design issues.
- Change occurs:
- Corporations & technologies grow.
- Programmers & users learn.

- Views / security.
- Lossless-join decomposition
Now: Science for improving design.

- We assume a schema R is given.
- R could have been generated when converting E-R diagram to a set of tables.
- R could have been a single relation containing all attributes that are of interest (called universal relation).

- Normalization breaks R into smaller relations.
R could be the result of any ad hoc design of relations, which we then test & convert to normal form.

- First Normal Form.
- Functional Dependencies.
- Second Normal Form.
- Third Normal Form.
- Boyce-Codd Normal Form.
- Fourth Normal Form.
- Fifth Normal Form.
- Domain Key / Normal Form.
- Design Process & Problems.

- You aren’t supposed to have more than one value per attribute of a tuple.
- All tuples have the same number of attributes.
- Necessary for a relational database.

BAD

- A domain is atomic if its elements are considered to be indivisible units.
- Examples of non-atomic domains:
- Set-valued attributes, composite attributes.
- Identifiers like CS101 that can be broken up into parts.

- Examples of non-atomic domains:
- A relational schema R is in first normal form if the domains of all attributes of R are atomic.
- Non-atomic values:
- complicate storage,
- encourage redundancy,
- Depend on interpretation built into application programs.

- Atomicity is not an intrinsic property of the elements of the domain.
- Atomicity is a property of how the elements of the domain are used.
- E.g. strings containing a possible delimiter (here: a space)
- cities = “Melbourne Sydney” (non-atomic: space separated list)
- surname = “Fortescue Smythe” (atomic: compound surname)

- E.g. strings encoding two separate fields
- bucs_login = cssjjb
- If the first two characters are extracted to find the department, the domain bucs_login is not atomic.
- Leads to encoding of information in application program rather than in the database.

- E.g. strings containing a possible delimiter (here: a space)

- Violated when a nonkey column is a fact about part of the primary key.
- A column is not fully functionally dependent on the full primary key.
- CUSTOMER-CREDIT in this case:

From Watson

BAD

FIX

ITEM

*itemid

…

ORDER

quantity

…

CUSTOMER

*customerid

customer-credit

…

- Let R be a relation schema
R and R

- The functional dependency (FD) holds on R (“ is FD on ”) iff for any legal relations r(R):
- whenever any two tuples t1and t2 of r agree on the attributes
- they also agree on the attributes .
- i.e. (t1) = (t2) (t1) = (t2)

- Example: Consider r(A,B) with the following instance of r:
- AB does NOT hold, but BA does hold

- Way to encode “business rules”.
- Specify constraints on the set of legal relations.
- We say that Fholds onR if all legal relations on R satisfy the set of FDs F.

- Test relations to see if they are legal under a given set of FDs.
- If a relation r is legal under a set F of FDs, we say that rsatisfies F.

- An FD is an assertion about a schema, not an instance.
- If we only consider an instance or a few instances, we can’t tell if an FD holds.
- Inspecting only a few bird species (e.g. crows, cardinals and canaries) we might conclude colour species.
- However, this would be a bad FD as we would find out if we found some ravens.

- Thus, identifying FDs is part of the data modelling process.

- An FD is trivial if it is satisfied by all instances of a relation
- E.g.
- customer-name, loan-number customer-name
- customer-name customer-name

- In general, is trivial if

- E.g.
- Permitting such FDs makes certain definitions and algorithms easier to state.

- FDs can express the same constraints we could express using keys:
- Superkeys:
- K is a superkey for relation schema R if and only if K R

- Candidate keys:
- K is a candidate key for R if and only if
- K R, and
- there is no K’ K such that K’ R

- K is a candidate key for R if and only if
- Of course, which candidate key becomes the primary key is arbitrary.

- FDs can represent more information than keys can on their own.
- Consider the following Loan-info-schema:
Loan-info-schema = (customer-name, loan-number,branch-name, amount).

We expect these FDs to hold:

loan-numberamountloan-number branch-name

We could try to express this by making loan-number the key,however the following FD does not hold:

loan-number customer-name

- Incidentally, this isn’t a very good table! (¬2NF)

- Given a set F of FDs, other FDs are logically implied.
- E.g. If A B and B C, we can infer that A C

- The set of all FDs implied by F is the closure of F, written F+ .
- Find F+by applying Armstrong’s Axioms:
- if , then (reflexivity)
- if , then (augmentation)
- if , and , then (transitivity)

- Additional rules (derivable from Armstrong’s Axioms):
- If and holds, then holds (union)
- If holds, then holds and holds (decomposition)
- If holds and holds, then holds (pseudotransitivity)

- A Non Lossless-Join Decomposition
R = (A, B) R1 = (A), R2 = (B)

A

B

A

B

A

B

1

2

1

2

1

2

1

1

2

B(r)

A(r)

r

A (r) ⋈ B (r)

- Thus, r is different to A (r) ⋈ B (r)
- So A,B is not a lossless-join decomposition of R.

- All attributes of an original schema (R) must appear in the decomposition (R1, R2):
R = R1 R2

- Lossless-join decomposition.For all possible relations r on schema R
r = R1 (r) ⋈ R2 (r)

- A decomposition of R into R1 and R2 is lossless-join if and only if at least one of the following dependencies is in F+:
- R1 R2R1
- R1 R2R2

- Violated when a nonkey column is a fact about part of the primary key.
- A column is not fully functionally dependent on the full primary key.
- CUSTOMER-CREDIT in this case:

From Watson

BAD

FIX

ITEM

*itemid

…

ORDER

quantity

…

CUSTOMER

*customerid

customer-credit

…

- Violated when a nonkey column is a fact about another nonkey column.
- A column is not fully functionally dependent on the primary key.
- R is 3NF iff R is 2NF and has no transitive dependencies.
- EXCHANGE RATE violates this.

FIX

BAD

- Arises when a table:
- has multiple candidate keys,
- the candidate keys are composite,
- the candidate keys overlap.

- R is BCNF iff every determinant is a cand. key.
- E.g. Assume one consultant per problem per client, and one problem per consultant.
- If client-problem is the primary key, how do you add a new consultant?

- Like 3NF but now worry about all fields.

BAD

FIX

- Goals for a relational database design:
- eliminate redundancies by decomposing relations,
- must be able to recover original data using lossless joins,
- prefer not to loose dependencies.

- BCNF:
- no redundancies,
- no guarantee of dependency preservation.

- 3NF:
- dependency preservation,
- but possible redundancies.

- A row should not contain two or more independent multivalued facts.
- 4NF iff BCNF & no non-trivial multi-valued dependencies.
- Multivalued dependency means the value of one attributed determines a set of values for another.

BAD

FIX

- 5NF iff a relation has no join dependency.
- The schemas R1, R2,.., Rn have a join dependency over R if they define a lossless-join decomposition over R.
- This is way too complicated, don’t worry about it.

- Every constraint on the relation must be a logical consequence of the domain constraints and the key constraints that apply to the relation.
- Key: unique identifier.
- Constraint: rule governing attribute values.
- Domain: set of values of the same data type.

- No known algorithm gives DK/NF.

- When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated should not need further normalization.
- However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity.
- The keys identified in E-R diagrams might not be minimal - FDs can help us to identify minimal keys.
- FDs from non-key attributes of a relationship set are possible, but rare.

- May want to use non-normalized schema for performance.
- E.g. displaying customer-name along with account-number and balance requires join of account with depositor.

- Alternative 1: Use denormalized relation containing attributes of account as well as depositor.
- faster lookup.
- extra space and extra execution time for updates.
- extra coding work for programmer and possibility of error in extra code.

- Alternative 2: use a materialized view defined as account ⋈ depositor
- as above, except less extra coding, errors.

- Examples of bad database design, not caught by normalization.
- Good:
- earnings(company-id, year, amount)

- Bad:
- earnings-2000, earnings-2001, earnings-2002, etc., on (company-id, earnings)
- all are BCNF, but querying across years difficult
- needs a new table each year

- company-year(company-id, earnings-2000,earnings-2001, earnings-2002)
- in BCNF, but querying across years difficult
- requires new attribute each year

- earnings-2000, earnings-2001, earnings-2002, etc., on (company-id, earnings)

- 1NF: attributes not atomic.
- 2NF: non-key attribute FD on part of key.
- 3NF: one non-key attribute FD on another.
- Boyce-Codd NF: overlapping but otherwise independent candidate keys.
- 4NF: multiple, independent multi-valued attributes.
- 5NF: join dependency.
- Domain Key / NF: all constraints either domain or key

- Functional Dependencies:
- Axioms & Closure.

- Lossless-join decomposition.
- Design Process.
- Normalization Problems.
Next: Interfaces and Architectures

- Reading
- Connolly & Begg Chapter (13, 14)
- Silberschatz Chapters 7.
- Any other book, the design/normalization chapter.

- Exercises:
- Silberschatz
- 7.1, 7.2, 7.16, 7.23, 7.24, 7.27-29

- Silberschatz

- Architectures and Implementations
- Integrity and Security

Slides after and including this one you are not responsible for, but I am saving in case I decide to use them in the future.

- Process:
- Decide whether a particular relation R is in “good” form.
- In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that:
- each relation is in good form,
- the decomposition is a lossless-join decomposition.

- Theory:
- Constraints on the set of legal relations.
- Require that the value for a certain set of attributes determines uniquely the value for another set of attributes – functional dependencies.