dr alwyn barry dr joanna bryson
Download
Skip this Video
Download Presentation
CM20145 Further DB Design – Normalization

Loading in 2 Seconds...

play fullscreen
1 / 33

CM20145 Further DB Design Normalization - PowerPoint PPT Presentation


  • 123 Views
  • Uploaded on

Dr Alwyn Barry Dr Joanna Bryson. CM20145 Further DB Design – Normalization. Last Time. Database design is an ongoing, iterative process. Requirements come from data, user demands, design issues. Change occurs: Corporations & technologies grow. Programmers & users learn. Views / security.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'CM20145 Further DB Design Normalization' - nieve


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
last time
Last Time
  • Database design is an ongoing, iterative process.
    • Requirements come from data, user demands, design issues.
    • Change occurs:
      • Corporations & technologies grow.
      • Programmers & users learn.
  • Views / security.
  • Lossless-join decomposition

Now: Science for improving design.

design process normalization
Design Process & Normalization
  • We assume a schema R is given.
    • R could have been generated when converting E-R diagram to a set of tables.
    • R could have been a single relation containing all attributes that are of interest (called universal relation).
  • Normalization breaks R into smaller relations.

R could be the result of any ad hoc design of relations, which we then test & convert to normal form.

overview
Overview
  • First Normal Form.
  • Functional Dependencies.
  • Second Normal Form.
  • Third Normal Form.
  • Boyce-Codd Normal Form.
  • Fourth Normal Form.
  • Fifth Normal Form.
  • Domain Key / Normal Form.
  • Design Process & Problems.
first normal form 1nf
First Normal Form – 1NF
  • You aren’t supposed to have more than one value per attribute of a tuple.
  • All tuples have the same number of attributes.
  • Necessary for a relational database.

BAD

getting caught out with 1nf
Getting Caught Out With 1NF
  • A domain is atomic if its elements are considered to be indivisible units.
    • Examples of non-atomic domains:
      • Set-valued attributes, composite attributes.
      • Identifiers like CS101 that can be broken up into parts.
  • A relational schema R is in first normal form if the domains of all attributes of R are atomic.
  • Non-atomic values:
    • complicate storage,
    • encourage redundancy,
    • Depend on interpretation built into application programs.
are you atomic
Are You Atomic?
  • Atomicity is not an intrinsic property of the elements of the domain.
  • Atomicity is a property of how the elements of the domain are used.
    • E.g. strings containing a possible delimiter (here: a space)
      • cities = “Melbourne Sydney” (non-atomic: space separated list)
      • surname = “Fortescue Smythe” (atomic: compound surname)
    • E.g. strings encoding two separate fields
      • bucs_login = cssjjb
      • If the first two characters are extracted to find the department, the domain bucs_login is not atomic.
      • Leads to encoding of information in application program rather than in the database.
second normal form 2nf
Second Normal Form (2NF)
  • Violated when a nonkey column is a fact about part of the primary key.
  • A column is not fully functionally dependent on the full primary key.
    • CUSTOMER-CREDIT in this case:

From Watson

BAD

FIX

ITEM

*itemid

ORDER

quantity

CUSTOMER

*customerid

customer-credit

def functional dependency
Def: Functional Dependency
  • Let R be a relation schema

  R and   R

  • The functional dependency (FD)  holds on R (“ is FD on ”) iff for any legal relations r(R):
    • whenever any two tuples t1and t2 of r agree on the attributes 
    • they also agree on the attributes .
    • i.e. (t1) = (t2)   (t1) =  (t2)
  • Example: Consider r(A,B) with the following instance of r:
  • AB does NOT hold, but BA does hold
functional dependencies uses
Functional Dependencies: Uses
  • Way to encode “business rules”.
  • Specify constraints on the set of legal relations.
    • We say that Fholds onR if all legal relations on R satisfy the set of FDs F.
  • Test relations to see if they are legal under a given set of FDs.
    • If a relation r is legal under a set F of FDs, we say that rsatisfies F.
functional dependencies
Functional Dependencies
  • An FD is an assertion about a schema, not an instance.
  • If we only consider an instance or a few instances, we can’t tell if an FD holds.
    • Inspecting only a few bird species (e.g. crows, cardinals and canaries) we might conclude colour  species.
    • However, this would be a bad FD as we would find out if we found some ravens.
  • Thus, identifying FDs is part of the data modelling process.
trivial functional dependencies
Trivial Functional Dependencies
  • An FD is trivial if it is satisfied by all instances of a relation
    • E.g.
      • customer-name, loan-number customer-name
      • customer-name customer-name
    • In general,   is trivial if   
  • Permitting such FDs makes certain definitions and algorithms easier to state.
functional dependency vs key
Functional Dependency vs Key
  • FDs can express the same constraints we could express using keys:
  • Superkeys:
    • K is a superkey for relation schema R if and only if K R
  • Candidate keys:
    • K is a candidate key for R if and only if
      • K R, and
      • there is no K’  K such that K’ R
  • Of course, which candidate key becomes the primary key is arbitrary.
fds keys
FDs <> Keys
  • FDs can represent more information than keys can on their own.
  • Consider the following Loan-info-schema:

Loan-info-schema = (customer-name, loan-number,branch-name, amount).

We expect these FDs to hold:

loan-numberamount loan-number  branch-name

We could try to express this by making loan-number the key,however the following FD does not hold:

loan-number customer-name

  • Incidentally, this isn’t a very good table! (¬2NF)
fd closure
FD Closure
  • Given a set F of FDs, other FDs are logically implied.
    • E.g. If A  B and B  C, we can infer that A  C
  • The set of all FDs implied by F is the closure of F, written F+ .
  • Find F+by applying Armstrong’s Axioms:
    • if   , then   (reflexivity)
    • if  , then    (augmentation)
    • if  , and   , then   (transitivity)
  • Additional rules (derivable from Armstrong’s Axioms):
    • If   and  holds, then    holds (union)
    • If    holds, then   holds and  holds (decomposition)
    • If   holds and   holds, then   holds (pseudotransitivity)
bad decomposition example from last time
Bad Decomposition Example(From Last Time)
  • A Non Lossless-Join Decomposition

R = (A, B) R1 = (A), R2 = (B)

A

B

A

B

A

B

1

2

1

2

1

2

1

1

2

B(r)

A(r)

r

A (r) ⋈ B (r)

  • Thus, r is different to A (r) ⋈ B (r)
  • So A,B is not a lossless-join decomposition of R.
fds lossless decomposition
FDs & Lossless Decomposition
  • All attributes of an original schema (R) must appear in the decomposition (R1, R2):

R = R1  R2

  • Lossless-join decomposition.For all possible relations r on schema R

r = R1 (r) ⋈ R2 (r)

  • A decomposition of R into R1 and R2 is lossless-join if and only if at least one of the following dependencies is in F+:
    • R1 R2R1
    • R1 R2R2
second normal form 2nf1
Second Normal Form (2NF)
  • Violated when a nonkey column is a fact about part of the primary key.
  • A column is not fully functionally dependent on the full primary key.
    • CUSTOMER-CREDIT in this case:

From Watson

BAD

FIX

ITEM

*itemid

ORDER

quantity

CUSTOMER

*customerid

customer-credit

third normal form 3nf
Third Normal Form (3NF)
  • Violated when a nonkey column is a fact about another nonkey column.
  • A column is not fully functionally dependent on the primary key.
  • R is 3NF iff R is 2NF and has no transitive dependencies.
    • EXCHANGE RATE violates this.

FIX

BAD

boyce codd bcnf
Boyce-Codd (BCNF)
  • Arises when a table:
    • has multiple candidate keys,
    • the candidate keys are composite,
    • the candidate keys overlap.
  • R is BCNF iff every determinant is a cand. key.
    • E.g. Assume one consultant per problem per client, and one problem per consultant.
    • If client-problem is the primary key, how do you add a new consultant?
  • Like 3NF but now worry about all fields.

BAD

FIX

design goals their discontents
Design Goals & their discontents
  • Goals for a relational database design:
    • eliminate redundancies by decomposing relations,
    • must be able to recover original data using lossless joins,
    • prefer not to loose dependencies.
  • BCNF:
    • no redundancies,
    • no guarantee of dependency preservation.
  • 3NF:
    • dependency preservation,
    • but possible redundancies.
fourth normal form 4nf
Fourth normal form (4NF)
  • A row should not contain two or more independent multivalued facts.
  • 4NF iff BCNF & no non-trivial multi-valued dependencies.
  • Multivalued dependency means the value of one attributed determines a set of values for another.

BAD

FIX

fifth normal form 5nf
Fifth normal form (5NF)
  • 5NF iff a relation has no join dependency.
  • The schemas R1, R2,.., Rn have a join dependency over R if they define a lossless-join decomposition over R.
  • This is way too complicated, don’t worry about it.
domain key normal form
Domain Key Normal Form
  • Every constraint on the relation must be a logical consequence of the domain constraints and the key constraints that apply to the relation.
    • Key: unique identifier.
    • Constraint: rule governing attribute values.
    • Domain: set of values of the same data type.
  • No known algorithm gives DK/NF.
e r model and normalization
E-R Model and Normalization
  • When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated should not need further normalization.
  • However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity.
  • The keys identified in E-R diagrams might not be minimal - FDs can help us to identify minimal keys.
  • FDs from non-key attributes of a relationship set are possible, but rare.
denormalization performance
Denormalization & Performance
  • May want to use non-normalized schema for performance.
    • E.g. displaying customer-name along with account-number and balance requires join of account with depositor.
  • Alternative 1: Use denormalized relation containing attributes of account as well as depositor.
    • faster lookup.
    • extra space and extra execution time for updates.
    • extra coding work for programmer and possibility of error in extra code.
  • Alternative 2: use a materialized view defined as account ⋈ depositor
    • as above, except less extra coding, errors.
limits of normalization
Limits of Normalization
  • Examples of bad database design, not caught by normalization.
  • Good:
    • earnings(company-id, year, amount)
  • Bad:
    • earnings-2000, earnings-2001, earnings-2002, etc., on (company-id, earnings)
      • all are BCNF, but querying across years difficult
      • needs a new table each year
    • company-year(company-id, earnings-2000,earnings-2001, earnings-2002)
      • in BCNF, but querying across years difficult
      • requires new attribute each year
summary 1 rules to watch
Summary 1 – Rules to Watch
  • 1NF: attributes not atomic.
  • 2NF: non-key attribute FD on part of key.
  • 3NF: one non-key attribute FD on another.
  • Boyce-Codd NF: overlapping but otherwise independent candidate keys.
  • 4NF: multiple, independent multi-valued attributes.
  • 5NF: join dependency.
  • Domain Key / NF: all constraints either domain or key
summary 2 concepts
Summary 2 – Concepts
  • Functional Dependencies:
    • Axioms & Closure.
  • Lossless-join decomposition.
  • Design Process.
  • Normalization Problems.

Next: Interfaces and Architectures

reading exercises
Reading & Exercises
  • Reading
    • Connolly & Begg Chapter (13, 14)
    • Silberschatz Chapters 7.
    • Any other book, the design/normalization chapter.
  • Exercises:
    • Silberschatz
      • 7.1, 7.2, 7.16, 7.23, 7.24, 7.27-29
next week
Next Week
  • Architectures and Implementations
  • Integrity and Security
slide32

Slides after and including this one you are not responsible for, but I am saving in case I decide to use them in the future.

goal formalize good design
Goal: Formalize “Good Design”
  • Process:
    • Decide whether a particular relation R is in “good” form.
    • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that:
      • each relation is in good form,
      • the decomposition is a lossless-join decomposition.
  • Theory:
    • Constraints on the set of legal relations.
    • Require that the value for a certain set of attributes determines uniquely the value for another set of attributes – functional dependencies.
ad