Database Normalization ​
Also Read Keys 🔑
Database normalization
The process of restructuring a a relational database in accordance with a series of normal forms in order to reduce data redundancy and improve data integrity.
Normalization prevents anomalies:
- Update anomaly
- Insertion anomaly
- Deletion anomaly
Functional Dependency ​
Functional Dependency
- Let R(..., A, ..., B, ...) be a relational schema.
- There is a functional dependency between A and B denoted
in R if for any instance of R, tuples with the same value for A have the same value for B.
In other words the value of B follows from the value of A.
Examples:
: the astrological sign (B) will always be the same for the same values of birth_date (A) : since the ssn is unique (A) it will agree with name (B) - on the other hand ssn (B) does not functionally depend on birth_date (A) because there can be multiple people born on the same date (and they would have a different B).
Functional Dependencies For Sets:
Functional Dependency
- Let R be a relational schema.
- Let X, Y be sets of attributes.
- There is a functional dependency between X and Y denoted
in R if for any instance of R, tuples with the same values for X also have the same values for Y.
In other words the values of Y follow from the values of X
Rewrite Rules for FDs ​
Rewrite Rules
(reflexivity) (augmentation) (transitivity)
Rules 1-3 are known as Armstrong's Axioms.
Rules 4-6 are convenient rules derived from Armstrong's Axioms.
Closure ​
Closure
For a set of functional dependencies
Algorithms ​
Naive Algorithm ​
Compute the closure by applying rewrite rules until no more FDs can be derived.
Smarter Algorithm ​
- If we have
can we find some such that ? Then from we derive .
Example ​
Example
Given:
- Applying
to yields ( ) - Applying
to yields - Applying
Written Formally:
Let,
Notation
- Discard trivial FDs (like
) - Consider only FDs with a single attribute as the right hand side.
- If FDs have identical left-hand sides, combine them as a shorthand notation. (i.e.
for )
show table
S | N | B | A |
---|---|---|---|
115565789 | Alice Anderson | 1987-04-15 | Aries |
136729221 | Bill Brown | 1973-05-02 | Taurus |
262823054 | Bill Brown | 1998-03-21 | Aries |
228923541 | Chris Carpenter | 1992-07-17 | Leo |
Example
- we can add
Closure of a set of attributes ​
For example:
Normal Forms ​
1NF ​
1NF
All attributes have atomic values: each attribute contains a single value.
This normal form is enforced by default by the way we construct database schemas: consequence of the principle that attributes in a class diagram cannot have set values.
2NF ​
2NF
- The relation is in 1NF There is no non-prime attribute that is functionally dependent on a proper subset of a candidate key.
non-prime attribute: an attribute that is not part of any candidate key.
In other words all non-key attributes are fully dependent on the primary key.
Example ​
Not 2NF Not Normalized
course | teacher | office |
---|---|---|
Database Theory | Rebecca Roberts | AB 183 |
Database Theory | Stuart Stevens | AB 240 |
Database Theory | Theresa Thompson | H 3072 |
Advanced Programming | Stuart Stevens | AB 240 |
Advanced Programming | William Walters | H 3122 |
- Functional dependency:
- One candidate key: course teacher
- Not in 2NF because: office depends on part of the candidate key. (teacher)
2NF Normalized
course | teacher |
---|---|
Database Theory | Rebecca Roberts |
Database Theory | Stuart Stevens |
Database Theory | Theresa Thompson |
Advanced Programming | Stuart Stevens |
Advanced Programming | William Walters |
teacher | office |
---|---|
Rebecca Roberts | AB 183 |
Stuart Stevens | AB 240 |
Theresa Thompson | H 3072 |
William Walters | H 3122 |
- Functional dependencies:
- One candidate key:
course
.
3NF ​
3NF
- The relation is in 2NF
- No non-prime attribute is transitively dependent on a superkey
transitive dependency:
Example: ​
not 3NF Not Normalized
course | responsible | office |
---|---|---|
Database Theory | Rebecca Roberts | AB 183 |
Advanced Programming | William Walters | AB 240 |
Operation Systems | William Walters | AB 240 |
Artificial Intelligence | Phil Petersen | PT 5120 |
Discrete Mathematics | Naomi Nilsson | H 2403 |
- Functional dependencies:
- Transitive dependency:
A non-prime attribute office
is transitively dependent on the superkey course, therfore the relation is not in 3NF.
3NF Normalized
course | responsible |
---|---|
Database Theory | Rebecca Roberts |
Advanced Programming | William Walters |
Operation Systems | William Walters |
Artificial Intelligence | Phil Petersen |
Discrete Mathematics | Naomi Nilsson |
responsible | office |
---|---|
Rebecca Roberts | AB 183 |
William Walters | H 3122 |
Phil Petersen | PT 5120 |
Naomi Nilsson | H 2403 |
BCNF (Boyce-Codd Normal Form) ​
BCNF
If
or:
A relation is in BCNF if the left-hand side of every nontrivial FD is a superkey.
BCNF decomposition algorithm ​
The BCNF decomposition algorithm serves allows us to make a relational schema satisfy BCNF.
algorithm
- Let R be a relational schema. FD X -> Y be a violation of the BCNF constraint
- compute
- Compute FDs for
and
- compute
- If
or not in BCNF, apply the same rules.
The decomposition is losless, i.e. an instance of R can be reconstructed from instances of
Decomposition Properites ​
- elimination of anomalies
- recoverability of information (losslessness)
- Preservation of dependencies
BCNF
guarantees 1,23NF
guarantees 2,3
Multivalued Dependencies ​
MVD
- Let R(XYZ) be a relational schema
- Let X, Y, Z be sets of attributes (a mvd only exists when there are at least three attributes.)
- If
- Then also
Example
Kindergarten has a database with parents, their tel numbers and their children. This is their table:
| parent | tel_no | child | | Alice Anderson| 053-4332567| Tom | | Alice Anderson | 06-10547732 | Tom | | Alice Anderson | 053-4332567 | Bram | | Alice Anderson | 06-10547732 | Bram | | Bob Anderson | 053-4332567 | Tom | | Bob Anderson | 053-4332567 | Bram | Deborah Davids | 074-2662359 | Harry | | Deborah Davids | 06-18809372 | Harry |
this table is in BCNF because it has no functional dependencies.
But if a new child would be born we would have to add to add a new row to the table for each tel_no of that parent. 😬
There are two multivalued dependencies in this relation:
MVD Properties
Let R(XYZ) be a relation:
- If
- If
Fourth normal form ​
4NF
If
Example
| parent | tel_no | child | | Alice Anderson| 053-4332567| Tom | | Alice Anderson | 06-10547732 | Tom | | Alice Anderson | 053-4332567 | Bram | | Alice Anderson | 06-10547732 | Bram | | Bob Anderson | 053-4332567 | Tom | | Bob Anderson | 053-4332567 | Bram | Deborah Davids | 074-2662359 | Harry | | Deborah Davids | 06-18809372 | Harry |
MVDs:
Candidate key: - (parent, tel_no, child)
Decomposition:
parent | tel_no |
---|---|
Alice Anderson | 053-4332567 |
Alice Anderson | 06-10547732 |
Bob Anderson | 053-4332567 |
Bob Anderson | 053-4332567 |
Deborah Davids | 074-2662359 |
Deborah Davids | 06-18809372 |
parent | child |
---|---|
Alice Anderson | Tom |
Alice Anderson | Bram |
Bob Anderson | Tom |
Bob Anderson | Bram |
Deborah Davids | Harry |