Normalisation: fine points

from the Artful MySQL Tips List


1NF = an unordered table of unordered atomic columns with no repeat rows

2NF = 1NF + all nonkey columns are fully functionally dependent on the PK

3NF = 2NF + all nonkey columns are mutually independent

BCNF = 3NF + every determinant (column on which some other column depends) is a candidate key (rowwise unique column combination, a column or column combo with no nulls & no dupes)

4NF = 3NF + all multivalued dependencies (MVDs) but one are projected to new tables (MVD = cols B & C depend on A but C does not depend on B).

5NF = 4NF + no cyclic dependencies ("join dependencies", JD).

Suppose you project a table with cyclic dependencies to N new tables, ie keep doing projections till there are no more non-key dependencies. N-decomposable means that having done these projections, recomposing the original table from the children gets you exactly the original table.

Another way of presenting it: cyclic dependency or JD exists in a table if for any i and j, if
    (Ai, Bi, Cj), eg A=Imre B=Izzy C=Jussi
    (Ai, Bj, Ci), eg A=Imre B=Jock C=Iago
    (Aj, Bi, Ci), eg A=Jan  B=Izzy C=Iago

all occur, then
    (Ai, Bi, Ci ) eg A=Imre B=Izzy C=Iago

also occurs, without which projection followed by reconstitution would create spurious rows.

N-decomposability is clarified by pointing at where it isn't. Consider this cribbed example:

           PSC
  PID      SID      Course
  ~~~~~~~~~~~~~~~~~~~~~~~~
  Emile    Wendy    CSC330
  Emile    Joan     CSC111
  Merrie   Wendy    CSC111

Now projecting PSC to obtain PS(PID,SID), SC(SID,Course), and PC(PID,Course), ie
every projection of degree 2, we get:

       PS                SC                  PC
  PID     SID        SID    Course      PID      Course
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Emile   Wendy      Wendy  CSC330      Emile    CSC330
  Emile   Joan       Joan   CSC111      Emile    CSC111
  Merrie  Wendy      Wendy  CSC111      Merrie   CSC111

But neither the projections PS and SC (nor any two of them) are equivalent to PSC:
joining them over the common domain SID yields mythical rows (marked *):

              PS_SC                            PS_SC_PC
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PID       SID      Course          PID       SID      Course
    Emile     Wendy    CSC330          Emile     Wendy    CSC330
    Emile     Wendy    CSC111*         Emile     Wendy    CSC111*
    Emile     Joan     CSC111          Emile     Joan     CSC111
    Merrie    Wendy    CSC330*         Merrie    Wendy    CSC111
    Merrie    Wendy    CSC111

Bottom line:
BCNF = every FD is a consequence of the candidate keys
4NF = every MVD is a consequence of the candidate keys
5NF = every JD is a consequence of the candidate keys

Return to the Artful MySQL Tips page