Skip to content

Database Normalization ​

Also Read Keys 🔑

Database normalization

The process of restructuring a a relational database in accordance with a series of normal forms in order to reduce data redundancy and improve data integrity.

Normalization prevents anomalies:

  • Update anomaly
  • Insertion anomaly
  • Deletion anomaly

Functional Dependency ​

Functional Dependency

  • Let R(..., A, ..., B, ...) be a relational schema.
  • There is a functional dependency between A and B denoted A−>B in R if for any instance of R, tuples with the same value for A have the same value for B.

In other words the value of B follows from the value of A.

Examples:

  • birth_date−>astrological_sign: the astrological sign (B) will always be the same for the same values of birth_date (A)
  • ssn−>name: since the ssn is unique (A) it will agree with name (B)
  • on the other hand ssn (B) does not functionally depend on birth_date (A) because there can be multiple people born on the same date (and they would have a different B).

Functional Dependencies For Sets:

Functional Dependency

  • Let R be a relational schema.
  • Let X, Y be sets of attributes.
  • There is a functional dependency between X and Y denoted X−>Y in R if for any instance of R, tuples with the same values for X also have the same values for Y.

In other words the values of Y follow from the values of X

Rewrite Rules for FDs ​

Rewrite Rules

  1. XY−>X (reflexivity)
  2. If X−>Y then XZ−>YZ (augmentation)
  3. If X−>Y and Y−>Z then X−>Z (transitivity)
  4. If X−>Y and YW−>Z then XW−>Z
  5. If X−>Y and X−>Z then X−>YZ
  6. If X−>YZ then X−>Y and X−>Z

Rules 1-3 are known as Armstrong's Axioms. Rules 4-6 are convenient rules derived from Armstrong's Axioms.

Closure ​

Closure

For a set of functional dependencies F, it's closure F+ is the set of all functional dependencies that can be logically implied by F

Algorithms ​

Naive Algorithm ​

Compute the closure by applying rewrite rules until no more FDs can be derived.

Smarter Algorithm ​
  • If we have A−>B can we find some Y such that Y−>A ? Then from Y−>A,A−>B we derive Y−>B.
Example ​

Example

Given: ABC−>E,CD−>B,E−>D Which new FD's can we derive ?

  • Applying E−>D to CD−>B yields CE−>B (CE−>CD−>B)
  • Applying ABC−>E to E−>D yields ABC−>D
  • Applying CD−>BtoABC−>EyieldsACD−>E

Written Formally:

Let,

F={ABC−>E,CD−>B,E−>D}$$bethesetofFDsforarelation∗R(A,B,C,D,E)∗Theclosureofthisset:$$F+={ABC−>DE,ACD−>E,CD−>B,CE−>B,E−>D}

Notation

  • Discard trivial FDs (like ABC−>A)
  • Consider only FDs with a single attribute as the right hand side.
  • If FDs have identical left-hand sides, combine them as a shorthand notation. (i.e. ABC−>DE for ABC−>D,ABC−>E)
show table
SNBA
115565789Alice Anderson1987-04-15Aries
136729221Bill Brown1973-05-02Taurus
262823054Bill Brown1998-03-21Aries
228923541Chris Carpenter1992-07-17Leo

Example

F={S−>NB,B−>A}
  • we can add S−>A from S−>B,B−>A
F+={S−>NBA,B−>A}

Closure of a set of attributes ​

X+=X∪{all attributes that are functionally dependent on X}

For example:

F={course−>responsible,responsible−>officetel_no,tel_no−>office}
  • {responsible}+={responsibleofficetel_no}
  • {course}+={courseresponsibleofficetel_no}
  • {office}+={office}
  • {tel_no}+=officetel_no

Normal Forms ​

1NF ​

1NF

All attributes have atomic values: each attribute contains a single value.

This normal form is enforced by default by the way we construct database schemas: consequence of the principle that attributes in a class diagram cannot have set values.

2NF ​

2NF

  • The relation is in 1NF There is no non-prime attribute that is functionally dependent on a proper subset of a candidate key.

non-prime attribute: an attribute that is not part of any candidate key.

In other words all non-key attributes are fully dependent on the primary key.

Example ​

Not 2NF Not Normalized
courseteacheroffice
Database TheoryRebecca RobertsAB 183
Database TheoryStuart StevensAB 240
Database TheoryTheresa ThompsonH 3072
Advanced ProgrammingStuart StevensAB 240
Advanced ProgrammingWilliam WaltersH 3122
  • Functional dependency: teacher−>office
  • One candidate key: course teacher
  • Not in 2NF because: office depends on part of the candidate key. (teacher)
2NF Normalized
courseteacher
Database TheoryRebecca Roberts
Database TheoryStuart Stevens
Database TheoryTheresa Thompson
Advanced ProgrammingStuart Stevens
Advanced ProgrammingWilliam Walters
teacheroffice
Rebecca RobertsAB 183
Stuart StevensAB 240
Theresa ThompsonH 3072
William WaltersH 3122
  • Functional dependencies:
{course−>responsibleresponsible−>office}
  • One candidate key: course.

3NF ​

3NF

  • The relation is in 2NF
  • No non-prime attribute is transitively dependent on a superkey

transitive dependency: X−> and Y−>Z then Z is transitivley dependent on X.

Example: ​

not 3NF Not Normalized
courseresponsibleoffice
Database TheoryRebecca RobertsAB 183
Advanced ProgrammingWilliam WaltersAB 240
Operation SystemsWilliam WaltersAB 240
Artificial IntelligencePhil PetersenPT 5120
Discrete MathematicsNaomi NilssonH 2403
  • Functional dependencies:
{course−>responsibleresponsible−>office}
  • Transitive dependency: course−>office

A non-prime attribute office is transitively dependent on the superkey course, therfore the relation is not in 3NF.

3NF Normalized
courseresponsible
Database TheoryRebecca Roberts
Advanced ProgrammingWilliam Walters
Operation SystemsWilliam Walters
Artificial IntelligencePhil Petersen
Discrete MathematicsNaomi Nilsson
responsibleoffice
Rebecca RobertsAB 183
William WaltersH 3122
Phil PetersenPT 5120
Naomi NilssonH 2403

BCNF (Boyce-Codd Normal Form) ​

BCNF

If X−>Y is a functional dependecy of R, then either X−>Y is a trivial dependency (i.e Y⊆X) or X is a superkey of R.

or:

A relation is in BCNF if the left-hand side of every nontrivial FD is a superkey.

BCNF decomposition algorithm ​

The BCNF decomposition algorithm serves allows us to make a relational schema satisfy BCNF.

algorithm

  • Let R be a relational schema. FD X -> Y be a violation of the BCNF constraint
    • compute X+
    • R1=X+
    • R2=X∪{allattributesnotinX}
    • Compute FDs for R1 and R2
  • If R1 or R2 not in BCNF, apply the same rules.

The decomposition is losless, i.e. an instance of R can be reconstructed from instances of R1 and R2.

Decomposition Properites ​

  1. elimination of anomalies
  2. recoverability of information (losslessness)
  3. Preservation of dependencies
  • BCNF guarantees 1,2
  • 3NF guarantees 2,3

Multivalued Dependencies ​

MVD

  • Let R(XYZ) be a relational schema
  • Let X, Y, Z be sets of attributes (a mvd only exists when there are at least three attributes.)

X↠Y if for any instance of R the following holds:

  • If xy1z1 in R and xy2z2 in R
  • Then also xy1z2 in R and xy2z1 in R

↠ is called a multivalued dependency.

Example

Kindergarten has a database with parents, their tel numbers and their children. This is their table:

| parent | tel_no | child | | Alice Anderson| 053-4332567| Tom | | Alice Anderson | 06-10547732 | Tom | | Alice Anderson | 053-4332567 | Bram | | Alice Anderson | 06-10547732 | Bram | | Bob Anderson | 053-4332567 | Tom | | Bob Anderson | 053-4332567 | Bram | Deborah Davids | 074-2662359 | Harry | | Deborah Davids | 06-18809372 | Harry |

this table is in BCNF because it has no functional dependencies.

But if a new child would be born we would have to add to add a new row to the table for each tel_no of that parent. 😬

There are two multivalued dependencies in this relation: {parent}↠{telno} and {parent}↠{child}

MVD Properties

Let R(XYZ) be a relation:

  1. If X↠Y then X↠Z
  2. If X−>Y then X↠Y

Fourth normal form ​

4NF

If X↠Y a functional dependency of R, then either X \twoheadrightarrow Y is a trivial MVD (i.e. Y⊆X) or X is a superkey of R.

Example

| parent | tel_no | child | | Alice Anderson| 053-4332567| Tom | | Alice Anderson | 06-10547732 | Tom | | Alice Anderson | 053-4332567 | Bram | | Alice Anderson | 06-10547732 | Bram | | Bob Anderson | 053-4332567 | Tom | | Bob Anderson | 053-4332567 | Bram | Deborah Davids | 074-2662359 | Harry | | Deborah Davids | 06-18809372 | Harry |

MVDs:

  • {parent}↠{telno}
  • {parent}↠{child} Candidate key:
  • (parent, tel_no, child)

Decomposition:

parenttel_no
Alice Anderson053-4332567
Alice Anderson06-10547732
Bob Anderson053-4332567
Bob Anderson053-4332567
Deborah Davids074-2662359
Deborah Davids06-18809372
parentchild
Alice AndersonTom
Alice AndersonBram
Bob AndersonTom
Bob AndersonBram
Deborah DavidsHarry