Normalization is the process of structuring data to reduce redundancy and improve consistency.
It’s goal is to avoid duplicated data, prevent anomalies (update, insert, delete problems) and make data logically organized.
Normalization in Relational Databases
In relational databases, normalization means organizing tables and their relationships according to formal rules called normal forms.
Core
First Normal Form: 1NF
- All attributes are atomic (no lists or nested values)
- No repeating groups
❌ Bad:
Student(name, phones = [123, 456])
✅ Good:
Student(name, phone)
Second Normal Form: 2NF
- Must be in 1NF
- No partial dependency on part of a composite key
Meaning: Every non-key attribute depends on the whole primary key
Third Normal Form: 3NF
- Must be in 2NF
- No transitive dependencies
Meaning: Non-key attributes depends only on the primary key, not on other non-key attributes
❌ Bad:
Student(id, dept_id, dept_name)
→ dept_name depends on dept_id, not directly on id
✅ Good:
Student(id, dept_id)
Department(dept_id, dept_name)
Boyce-Codd Normal Form: BCNF
BCNF is a stronger version of Third Normal Form focused on eliminating certain dependency anomalies that 3NF may still allow.
A relation is in BCNF if: For every functional dependency must be a candidate key.
Key concepts
Functional Dependency
A functional dependency: means: The value of X uniquely determines the value of Y.
Example: student_id → student_name
A student ID determines exactly one student name.
Determinant
The determinant is the left side of the dependency.
In:
- = Determinant
Candidate Key
A candidate key is a minimal set of attributes that uniquely identifies tuples.
The BCNF rule
BCNF requires that:
Meaning:
- No non-key attribute may determine other attributes
- Only keys are allowed to determine values
Why 3NF is not always enough
3NF allows some dependencies where:
- the determinant is not a candidate key
- but the dependent attribute is part of a candidate key
BCNF removes these exceptions.
Example (classic BCNF violation)
Consider:
Courses(student, course, professor)
Assume:
- A professor teaches only one course
- A student can take many courses
Functional dependencies:
Candidate key
The key is: because together they uniquely identify tuples.
Problem
Dependency: means:
- knowing the professor determines the course
But:
professoris not a candidate key
→ BCNF violation
Why this is bad
Redundancy appears:
(Alice, Databases, Smith)(Bob, Databases, Smith)(Carol, Databases, Smith)
If Smith changes course:
- many rows must be updated
This causes:
- update anomalies
- redundancy
BCNF decomposition
Split into:
ProfessorCourse(professor, course)
StudentProfessor(student, professor)
Now:
- determinants are keys
- redundancy reduced
Relationship with 3NF
3NF
Allows some non-key determinants under special conditions.
BCNF
Stricter:
- every determinant must be a candidate key
- no exceptions
Important trade-off
BCNF sometimes:
- requires more decomposition
- may sacrifice dependency preservation
So some systems stop at 3NF for practicality.