Keys and Functional Dependency

1 / 60

# Keys and Functional Dependency - PowerPoint PPT Presentation

CS157A. Lecture 14. Keys and Functional Dependency. Prof. Sin-Min Lee Department of Computer Science San Jose State University. Data Normalization. Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Keys and Functional Dependency' - maj

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

CS157A

Lecture 14

### Keys and Functional Dependency

Prof. Sin-Min Lee

Department of Computer Science

San Jose State University

Data Normalization
• Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.
• The process of decomposing relations with anomalies to produce smaller, well-structured relations.
• Primary Objective: Reduce Redundancy,Reduce nulls,
• Improve “modify” activities:
• insert,
• update,
• delete,
• Price: degraded query, display, reporting
Functional Dependency and Keys
• Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
• Candidate Key: Each non-key field is functionally dependent on every candidate key.
Functional dependency
• a constraint between two attributes (columns) or two sets of columns
• A  B if “for every valid instance of A, that value of A uniquely determines the value of B”
• or …A B if “there exists at most one value of B for every value of A”
Functional Dependencies

R

X Y Z

• FDs defined over two sets of attributes: X, Y Ì R
• Notation: X à Y reads as “X determines Y”
• If X à Y, then all tuples that agree on X must also agree on Y

1 2 3

2 4 5

1 2 4

1 2 7

2 4 8

3 7 9

Functional Dependencies (example)

X Y Z

X Y Z

1 2 3

2 4 5

1 2 4

1 2 7

2 4 8

3 7 9

… functional dependency
• some examples
• SSN  Name, Address, Birthdate
• VIN  Make, Model, Color
• note: the LHS is the determinant
• so functional dependency is the technical term for determines
Candidate Keys
• an attribute (or set of attributes) that uniquely identifies a row
• primary key is a special candidate key
• values cannot be null
• e.g.
• ENROLL (Student_ID, Name, Address, …)
• PK = Student_ID
• candidate key = Name, Address
… candidate key
• a candidate key must satisfy:
• unique identification.
• implies that each nonkey attribute is functionally dependent on the key (for not(A  B) to be true, A must occur more than once (with a different B), or A must map to more than one B in a given row)
• nonredundancy
• no attribute in the key can be deleted and still be unique
• minimal set of columns (Simsion)
keys and dependencies

EMPLOYEE1 (Emp_ID, Name, Dept_Name, Salary)

determinant

functional dependency

EMPLOYEE2 (Emp_ID, Course_Title, Name, Dept_Name, Salary, Date_Completed)

not fully functionally dependant on the primary key

determinants & candidate keys
• candidate key is always a determinant (one way to find a determinant)
• determinant may or may not be a candidate key
•  candidate key is a determinant that uniquely identifies the remaining (nonkey) attributes
• determinant may be
• a candidate key
• part of a composite candidate key
• nonkey attribute
Introduction
• Data integrity maintained by various constraints on data
• Functional dependencies are application constraints that help DB model real-world entity
• Join dependencies are a further constraint that help resolve some FD constraint limitations
Normal Forms provide database designers with:
• A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes.
• A series of tests that can be carried out on individual relation schemas so that the relational database can be normalized to any degree.
Keys
• superkey:a superkey is a set of attributes S  R={A1,A2,….An} with the property that no two tuples t1 and t2 in any relation state r of R will have t1[S] = t2[S].
• A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey anymore.
Keys
• The difference between a key and a superkey is that a key has to be “minimal”.
• Example:
• {SSN} is a key for EMPLOYEE, whereas {SSN}, {SSN,ENAME}, {SSN, ENAME, BDATE} are all superkeys.
Keys
• If a relation schema has more than one “minimal” key, each is called a candidate key.
Keys
• one of the candidate keys is designated to be the primary key.
• Each relation schema must have a primary key.
• For example, {SSN} is the only candidate key for EMPLOYEE, so it is also the primary key.

R(A B C D E)

• FD1. A -> C
• FD2. BC ->D
• FD3. E ->AB
• result = A
• By FD1. A -> C A result
• result = {A, C}

By FD2. BC -> D BC result

result = {A, C}

By FD3. E ->AB E result

result = {A, C}

 {A}+ = {A, C}

Similarly {B}+ = {B}

• {C}+ = {C}
• {D}+ = {D}
• {E}+ = {E,A,B,C,D}
• E is a candidate key

Now, we see

{AB}+ = {ABCD} {AC}+ = {AC} {AD}+ = {ACD}

{BC}+ = {BCD} {BD}+ = {BD} {CD}+ = {CD}

{ABC}+ = {ABCD} {ABD}+ = {ABCD} {BCD}+ = {BCD}

{ACD}+ = {ACD}

What is the largest normal form of this table?

R(A B C D E)

FD1. A ->C

FD2. BC ->D

FD3. E ->AB

Answer: {E} is the only candidate key of R

The non-prime attributes are: A, B, C, D

As FD!. A->C, we have transitive dependency.

Thus R(ABCD) is 2NF but not 3NF

What is Normalization?
• The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise. By following the principles of normalization, we can achieve a design that is highly flexible, allowing the model to be extended when needed to account for new attributes, entity sets, and relationships.
Normal Forms
• A relation is in specific normal form if it satisfies the set of requirements or constraints for that form. All of the normal forms are nested in that each satisfies the constraints of the previous one but is a "better" form because each eliminates flaws found in the previous
1NF
• relation is in first normal form if it contains no multivalued attributes
• remove repeating groups to a new table as already demonstrated, “carrying” the PK as a FK
First Normal Form ( 1NF )
• the domains of attributes must include only atomic(simple, indivisible) valuesand the value of any attribute in a tuple must be a single value from the domain of the attribute.
First Normal Form ( 1NF )
• example:

Department

research 5 333445555 {Bellaire ,

Sugarland Houston}

• the domain of DLOCATIONS contains atomic values, but some tuples can have a set of these values. In this case,
• DNUMBER x->DLOCATIONS.
• The domain of DLOCATIONS contains sets of values and hence in non-atomic.

Our Example in 1NF

PROJ_NUM

PROJ_NAME

EMP_NUM

EMP_NAME

JOB_CLASS

CHG_HOUR

HOURS

• Key (PROJ_NUM, EMP_NUM)
• Given PROJ_NUM
• PROJ_NAME is determined
• Given EMP_NUM
• EMP_NAME, JOB_CLASS, and CHG_HOUR are determined
2NF
• a relation is in second normal form if it is in first normal form AND every nonkey attribute is fully functionally dependant on the primary key
• i.e. remove partial functional dependencies, so no nonkey attribute depends on just part of the key

EMPLOYEE2 (Emp_ID, Course_Title, Name, Dept_Name, Salary, Date_Completed)

not fully functionally dependant on the primary key

Second Normal Form ( 2NF )
• it is based on the concept of full functional dependency.
• A functional dependency XY is a full functional dependency , for any attribute A  X, {X - {A}}  Y.
Second Normal Form
• A relation is in second normal form (2NF) if and only if it is in first normal form and all the nonkey attributes are fully functionally dependent on the key.

Second Normal Form

• A table is in second normal form (2NF) if:
• It is in 1NF
• It includes no partial dependencies. No attribute is dependent on only a portion of the primary key.
2NF
• a relation is in 2NF if it is in 1NF and any one of these is true:
• the PK consists of only 1 attribute
• all attributes are part of the PK (no nonkey attributes)
• every nonkey attribute is functionally dependant on the whole PK

2NF (Example)

A B C D

2 Candidate Keys

R with key{AB} is NOT 2NF

R with key{AC} is NOT 2NF

Second Normal Form ( 2NF )

fd1

fd2

• {SSN, PNUMBER}HOURS is a fully dependency (neither SSNHOURS nor PNUMBERHOURS holds).

fd3

Second Normal Form ( 2NF )

EMP_PROJ

• The functional dependencies fd1,fd2,fd3 lead to the decomposition of EMP_PROJ into the three relation schemas EP1,EP2,EP3, each of which is in 2NF.

fd1

fd2

fd3

2NF NORMALIZATION

EP2

EP3

EP1

fd2

fd1

fd3

1NF 2NF
• EMPLOYEE2 (Emp_ID, Course_Title, Name, Dept_Name, Salary, Date_Completed)
• EMPLOYEE1 (Emp_ID, Name, Dept_Name, Salary)
• and
• EMP_COURSE (Emp_ID, Course_Title, Date_Completed)
• EMPLOYEE1 satisfies condition1
• EMP_COURSE satisfies condition3
3NF
• a relation is in third normal form if it is in 2NF, AND no transitive dependencies exist
• transitive dependency is a functional dependency between nonkey attributes
transitive dependency

transitive

dependency

… transitive dependency
• same problems
• insertion anomaly (no salesman without a customer)
• deletion anomaly (if a salesman is assigned to only 1 customer, and the customer is deleted, we lose the salesman!)
• modification (update) anomaly (reassign salesperson to region)

Converting to 2NF

• To convert from 1NF to 2NF, list each key component and then the key itself.
• Each component will become the key in a new table.

Our Example in 2NF

Table Name: PROJECT

PROJ_NUM

PROJ_NAME

Table Name: EMPLOYEE

EMP_NUM

EMP_NAME

JOB_CLASS

CHG_HOUR

Table Name: ASSIGN

PROJ_NUM

EMP_NUM

HOURS

Problems with 2NF

• Transitive Dependency
• An attribute that is dependent on a non-prime attribute exhibits transitive dependency.
• Still leads to data anomalies.

EMP_NUM

EMP_NAME

JOB_CLASS

CHG_HOUR

Our example contains the transitive dependency:

JOB_CLASS -----> CHG_HOUR

Second Normal Form

Second normal form:

Let R’ be a relation, and let F be the set of governing FDs. An attribute belongs to R’ is prime if a key of R’ contains A. In other words, A is prime in R’ if there exists K<R’ such that (1) K->R’,

(2) for all B belongs to K, (K-B)->R’ not belongs to F+, and

(3) A belongs to K

Third Normal Form

Third normal form:

Let R’ be a relation, a subset of the universal relation, in the context of a set of FDs F. R’ satisfies third normal form if for every nontrival X->A belong to F+, either

(1). X is superkey for R’ or

(2). A is a prime attribute in R’.

Third Normal Form ( 3 NF )
• Third normal form is based on the concept of transitive dependency.
• A functional dependency XY in a relation schema R is a transitive dependency if there is a set of attributes Z that is not a subset of any key of R, and both XZ and ZY hold.
Third Normal Form ( 3 NF )

EMP_DEPT