3. Normalization
Normalization: the process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
Normalization is carried out in practice so that the
resulting designs are of high quality and meet the
desirable properties
The main goal of Database Normalization is to
restructure the logical data model of a database to:
Eliminate redundancy
Organize data efficiently
Reduce the potential for data anomalies
4. Data Anomalies
Mixing attributes of multiple entities may cause
problems
Information is stored redundantly wasting storage
Well structured relations contain minimal redundancy of data
They allow modification, insertion and deletion of data in the
relation without error
Data Anomalies are errors/inconsistencies that arise due to
redundantly stored data in a relation
The three most common anomalies in relational database
design are:
Insertion anomalies
Deletion anomalies
Modification anomalies (update anomalies)
6. Data Anomalies: Insertion Anomalies
This type of data anomaly occurs when we try to
insert new records into a relation.
Insertion anomalies can be differentiated into two
types:
7. Data Anomalies: Insertion Anomalies
1. To insert a new employee tuple into EMP_DEPT, we must
include either the attribute values for the department that the
employee works for, or nulls (if the employee does not yet work
for a department)
2. It is difficult to insert a new department that has no
employees as yet into the EMP_DEPT relation.
The only way to do this is to place null values in the
attributes for employee.
This causes a problem because SSN is the primary key of
EMP_DEPT, and each tuple is supposed to represent an
employee entity, not a department entity
Moreover, when the first employee is assigned to that
department, we no longer need this tuple with null values.
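The two insertion cases above can be tried out in a minimal sketch using Python's built-in sqlite3 module. The reduced EMP_DEPT column list and the sample values are assumptions made for illustration, not the schema from the slides. SSN is declared NOT NULL explicitly because SQLite does not enforce that by default on a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, reduced EMP_DEPT schema: one row per employee, keyed by SSN.
conn.execute(
    """
    CREATE TABLE emp_dept (
        ssn     TEXT PRIMARY KEY NOT NULL,
        ename   TEXT,
        dnumber INTEGER,
        dname   TEXT,
        dmgrssn TEXT
    )
    """
)

# Case 1: a new employee without a department forces NULLs
# into all department attributes.
conn.execute(
    "INSERT INTO emp_dept VALUES (?, ?, NULL, NULL, NULL)",
    ("111", "New Hire"),
)

# Case 2: a new department without employees has no SSN to use as key,
# so the database rejects the row.
try:
    conn.execute(
        "INSERT INTO emp_dept VALUES (NULL, NULL, ?, ?, ?)",
        (6, "Marketing", "999"),
    )
    department_inserted = True
except sqlite3.IntegrityError:
    department_inserted = False

row_count = conn.execute("SELECT COUNT(*) FROM emp_dept").fetchone()[0]
print(department_inserted, row_count)
```

Only the employee row is stored; the department cannot be recorded until it has at least one employee.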
8. Data Anomalies: Deletion anomalies
This type of anomaly occurs when critical data is removed from the
database, perhaps unintentionally, as a side effect of deleting other data
E.g.: If we delete from EMP_DEPT an employee tuple that
happens to represent the last employee working for a
particular department, the information concerning that
department is lost from the database
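A small sketch of the deletion anomaly, using plain Python tuples in place of a real table. The rows and the column order (ssn, ename, dnumber, dname, dmgrssn) are illustrative assumptions.

```python
# Hypothetical EMP_DEPT rows: (ssn, ename, dnumber, dname, dmgrssn).
emp_dept = [
    ("123", "Smith", 5, "Research", "333"),
    ("333", "Wong", 5, "Research", "333"),
    ("888", "Borg", 1, "Headquarters", "888"),
]


def departments(rows):
    """Return the department facts that can be recovered from EMP_DEPT rows."""
    return {(dnumber, dname, dmgrssn) for _, _, dnumber, dname, dmgrssn in rows}


before = departments(emp_dept)

# Delete Borg, who happens to be the last employee of department 1.
emp_dept = [row for row in emp_dept if row[0] != "888"]
after = departments(emp_dept)

# Department facts that disappeared together with the employee tuple.
lost = before - after
print(lost)
```

Deleting one employee also removed everything the relation knew about department 1.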
9. Data Anomalies: Modification/Update Anomalies
These anomalies arise when the database must make
multiple changes on records to reflect a single attribute
change
Example:
In EMP_DEPT, if we change the value of one of the
attributes of a particular department (say, the manager of
department 5), we must update the tuples of all employees
who work in that department; otherwise, the database will
become inconsistent.
If we fail to update some tuples, the same department will
be shown to have two different values for manager in
different employee tuples, which would be wrong
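The inconsistency described above can be reproduced in a few lines of Python. The rows and SSN values are made up for illustration; only the attribute names SSN, DNUMBER and DMGRSSN come from the example.

```python
# Hypothetical EMP_DEPT rows, reduced to the attributes that matter here.
emp_dept = [
    {"ssn": "123", "dnumber": 5, "dmgrssn": "333"},
    {"ssn": "333", "dnumber": 5, "dmgrssn": "333"},
    {"ssn": "453", "dnumber": 5, "dmgrssn": "333"},
]


def managers_of(rows, dnumber):
    """Return every manager value recorded for one department."""
    return {row["dmgrssn"] for row in rows if row["dnumber"] == dnumber}


# Faulty update: the new manager is written to only one employee tuple.
emp_dept[0]["dmgrssn"] = "987"
inconsistent = managers_of(emp_dept, 5)

# Correct update: every tuple of department 5 must be changed.
for row in emp_dept:
    if row["dnumber"] == 5:
        row["dmgrssn"] = "987"
consistent = managers_of(emp_dept, 5)

print(sorted(inconsistent), sorted(consistent))
```

After the faulty update, department 5 appears to have two managers; a single logical change required three physical updates.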
10. Practical Use of Normal Forms
Normal form: a condition, stated in terms of the keys and
functional dependencies (FDs) of a relation, that certifies
whether a relation schema is in a particular normal form
The practical utility of these normal forms becomes
questionable when the constraints on which they are
based are hard to understand or to detect
Denormalization: the process of storing the join of
higher normal form relations as a base relation,
which is in a lower normal form
11. Normalization and Normal Forms
The normalization process, as first proposed by Codd (1972a),
takes a relation schema through a series of tests to "certify"
whether it satisfies a certain normal form.
Normalization helps to:
Eliminate redundancy
Organize data efficiently
Reduce the potential for anomalies during data operations,
and
Improve data consistency
12. Normalization and Normal Forms
In the relational model, criteria exist for classifying how well
a relation schema avoids redundancy and anomalies.
These classifications are called normal forms (or NF), and
there are algorithms for converting a given schema from one
normal form to a higher one
Edgar F. Codd originally established three normal forms:
1NF
2NF and
3NF
Later, others like BCNF, 4NF and 5NF were introduced and
were generally accepted, but 3NF is widely considered to be
sufficient for most applications
Most tables that reach 3NF are also in BCNF
(Boyce-Codd Normal Form)
13. Normal Forms: First Normal Form (1NF)
A relation (table) R is in 1NF if and only if all underlying domains of
attributes contain only atomic values (simple/non divisible)
Each attribute must be atomic
• No repeating columns within a row
• No multi-valued columns.
1NF simplifies non atomic attributes
• Queries become easier
Normalization (Decomposition)
There are three options to normalize a relation into 1NF (as discussed
in the next slide), but the best option is to form a new relation for each
non-atomic attribute or nested relation
Example: Employee Relation (unnormalized)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java
14. Normal Forms: First Normal Form (1NF)
There are three techniques to achieve 1NF for such a relation:
1. Expand the key so that there is a separate tuple in the original Employee
relation for each skill of an employee. This option has the disadvantage of
introducing redundancy in the relation:
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
2. Remove the attribute skills that violates 1NF and place it in a separate relation
Emp_Skills along with the primary key emp_no of Employee.
This decomposes the non-1NF relation into two 1NF relations with the
following schemas:
Employee (emp_no, name, dept_no, dept_name)
Emp_Skills (emp_no, skills)
3. If a maximum number of values is known for the non-atomic attribute (for
example, if it is known that at most three skills can exist for an employee),
replace the skills attribute by three atomic attributes: Skill1, Skill2 and
Skill3. This has the disadvantage of introducing null values if some
employees have fewer than three skills
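The decomposition into Employee and Emp_Skills can be sketched in Python as follows. The rows are taken from the unnormalized Employee example; the assumption that skills are stored as one comma-separated string is ours.

```python
# Unnormalized Employee rows: the skills column holds a list packed into one value.
unnormalized = [
    (1, "Kevin Jacobs", 201, "R&D", "C, Perl, Java"),
    (2, "Barbara Jones", 224, "IT", "Linux, Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2, Oracle, Java"),
]

employee = []    # Employee (emp_no, name, dept_no, dept_name)
emp_skills = []  # Emp_Skills (emp_no, skills)

for emp_no, name, dept_no, dept_name, skills in unnormalized:
    # Atomic employee attributes stay in Employee, once per employee.
    employee.append((emp_no, name, dept_no, dept_name))
    # Each skill becomes its own tuple, linked back through emp_no.
    for skill in skills.split(","):
        emp_skills.append((emp_no, skill.strip()))

print(len(employee), len(emp_skills))
```

Employee keeps one tuple per employee, while Emp_Skills holds one tuple per (employee, skill) pair, so neither redundancy nor nulls are introduced.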
15. Normal Forms: Second Normal Form (2NF)
Second normal form (2NF) is based on the concept of full
functional dependency. A functional dependency X -> Y is a
full functional dependency if removal of any attribute A from X
means that the dependency does not hold any more.
The test for 2NF involves testing for functional
dependencies whose left-hand side attributes are part of
the primary key. If the primary key contains a single
attribute, the test need not be applied at all
16. Normal Forms: Second Normal Form (2NF)
Example: Consider the Employee-Project relation schema below
The relation is in 1NF
But the functional dependencies FD2 and FD3 make ENAME, PNAME, and
PLOCATION partially dependent on the primary key {SSN, PNUMBER} of
EMP_PROJ, thus violating the 2NF test.
Normalizing the relation into 2NF hence leads to the decomposition of
EMP_PROJ into the three relation schemas EP1, EP2, and EP3 as shown below:
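The 2NF test for EMP_PROJ can be sketched as a search for partial dependencies. The figure with the exact dependencies is not reproduced in this text, so the FD list below is an assumption consistent with the description: FD1 = {SSN, PNUMBER} -> HOURS (the HOURS attribute is assumed), FD2 = SSN -> ENAME, FD3 = PNUMBER -> {PNAME, PLOCATION}. The helper names closure and partial_dependencies are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) - set(key)
            if determined:
                found[frozenset(subset)] = determined
    return found


key = frozenset({"SSN", "PNUMBER"})
fds = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),        # FD1 (assumed)
    (frozenset({"SSN"}), frozenset({"ENAME"})),                   # FD2 (assumed)
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),  # FD3 (assumed)
]

partial = partial_dependencies(key, fds)
for subset, attrs in partial.items():
    print(sorted(subset), "->", sorted(attrs))
```

Under these assumptions each partial dependency becomes its own relation (SSN with ENAME, PNUMBER with PNAME and PLOCATION), and the full key keeps only the attribute that depends on all of it, which gives three schemas in the spirit of EP1, EP2 and EP3.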
17. Normal Forms: Third Normal Form (3NF)
Third normal form (3NF) is based on the concept of
transitive dependency
A functional dependency X -> Y in a relation schema R is a
transitive dependency if there is a set of attributes Z that is
neither a candidate key nor a subset of any key of R, and
both X -> Z and Z -> Y hold.
Example:
The dependency SSN -> DMGRSSN is transitive through
DNUMBER in EMP_DEPT
18. Normal Forms: Third Normal Form (3NF)
Example:
The relation EMP_DEPT is in 2NF since no partial dependencies on a key exist
However, EMP_DEPT is not in 3NF because of the transitive dependency of
DMGRSSN (and also DNAME) on SSN via DNUMBER.
We can normalize EMP_DEPT by decomposing it into the two 3NF relation
schemas ED1 and ED2 as shown below
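A data-level sketch of this decomposition, in plain Python. The exact attribute lists of ED1 and ED2 are in a figure that is not reproduced here, so the reduced schemas below (ED1 with SSN, ENAME, DNUMBER and ED2 with DNUMBER, DNAME, DMGRSSN) and the sample rows are assumptions.

```python
# Hypothetical EMP_DEPT rows: (ssn, ename, dnumber, dname, dmgrssn).
emp_dept = [
    ("123", "Smith", 5, "Research", "333"),
    ("333", "Wong", 5, "Research", "333"),
    ("888", "Borg", 1, "Headquarters", "888"),
]

# ED1 keeps the employee facts plus DNUMBER as a reference to the department.
ed1 = {(ssn, ename, dnumber) for ssn, ename, dnumber, _, _ in emp_dept}

# ED2 stores each department exactly once, removing the transitive dependency.
ed2 = {(dnumber, dname, dmgrssn) for _, _, dnumber, dname, dmgrssn in emp_dept}

# A natural join on DNUMBER rebuilds the original tuples, so nothing is lost.
rejoined = {
    (ssn, ename, dnumber, dname, dmgrssn)
    for ssn, ename, dnumber in ed1
    for d2_number, dname, dmgrssn in ed2
    if dnumber == d2_number
}

print(len(ed1), len(ed2), rejoined == set(emp_dept))
```

The Research department is now stored once instead of once per employee, and joining ED1 with ED2 recovers exactly the original EMP_DEPT tuples.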
20. Denormalization
Normalization is performed to reduce or eliminate Insertion, Deletion or
Update anomalies
However, a completely normalized database may not be the most
efficient or effective implementation
“Denormalization” is sometimes used to improve efficiency.
Denormalization is the process of selectively taking normalized tables
and recombining the data in them.
It is usually driven by the need to improve query speed.
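As a rough illustration, the sketch below uses Python's sqlite3 module to store the join of two normalized tables as a base table. The table names, columns and rows are assumptions chosen to echo the EMP_DEPT example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized design: each department is stored once.
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT, dmgrssn TEXT);
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT,
                           dnumber INTEGER REFERENCES department(dnumber));
    INSERT INTO department VALUES (5, 'Research', '333'), (1, 'Headquarters', '888');
    INSERT INTO employee VALUES ('123', 'Smith', 5), ('333', 'Wong', 5), ('888', 'Borg', 1);

    -- Denormalized design: the join is stored as a base table.
    CREATE TABLE emp_dept AS
        SELECT e.ssn, e.ename, d.dnumber, d.dname, d.dmgrssn
        FROM employee AS e JOIN department AS d ON e.dnumber = d.dnumber;
    """
)

# Normalized read: needs a JOIN at query time.
joined = conn.execute(
    "SELECT e.ssn, d.dname FROM employee AS e "
    "JOIN department AS d ON e.dnumber = d.dnumber ORDER BY e.ssn"
).fetchall()

# Denormalized read: a single-table scan returns the same answer.
stored = conn.execute("SELECT ssn, dname FROM emp_dept ORDER BY ssn").fetchall()

print(joined == stored, stored)
```

The denormalized table answers the query without a join, at the price of repeating department data in every employee row, which brings back the update anomalies that normalization removed.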
21. Normalization
Improves maintenance for database changes
Tends to slow down retrieval
Better at finding problems than solving them
Standard normalization procedures are subtle and
may leave BCNF or 4NF problems in tables
22. Intuitive (Accepted) Interpretation of Normal Forms
1NF: Tables represent entities
2NF: Each table represents only one entity
3NF: Tables do not contain attributes from embedded
entities
4NF: Triple relationships should not represent a pair
of dual relationships
23. Exercise
1. Given the Grade report relation below and its functional dependencies,
normalize the relation
Gradereport (StudNo, StudName, Major, Advisor, CourseNo, Ctitle, InstrucName,
InstrucLocn, Grade)
Functional Dependencies:
• StudNo -> StudName
• CourseNo -> Ctitle, InstrucName
• InstrucName -> InstrucLocn
• StudNo, CourseNo, Major -> Grade
• StudNo, Major -> Advisor
• Advisor -> Major
25. Example: Company Database
The COMPANY database keeps track of a company's employees, departments,
and projects. Suppose that after the requirements collection and analysis phase,
the database designers provided the following description of the part of the
company to be represented in the database:
The company is organized into departments. Each department has a unique
name, a unique number, and a particular employee who manages the department.
We keep track of the start date when that employee began managing the
department. A department may have several locations.
A department controls a number of projects, each of which has a unique name, a
unique number, and a single location.
We store each employee's name, social security number, address, salary, sex, and
birth date. An employee is assigned to one department but may work on several
projects, which are not necessarily controlled by the same department. We keep
track of the number of hours per week that an employee works on each project.
We also keep track of the direct supervisor of each employee.
We want to keep track of the dependents of each employee for insurance
purposes. We keep each dependent's first name, sex, birth date, and relationship to
the employee.
28. Reading Assignments
1. Discuss the correspondences between the ER model constructs and the
relational model constructs. Show how each ER model construct can be
mapped to the relational model, and discuss any alternative mappings
2. Discuss the options for mapping EER model constructs to relations.
3. Why should nulls in a relation be avoided as far as possible?
4. What does the term spurious tuples refer to? Discuss the problem of spurious
tuples and how we may prevent it
5. Discuss insertion, deletion, and modification anomalies. Why are they
considered bad? Illustrate with examples.
6. What does the term unnormalized relation refer to?
7. What undesirable dependencies are avoided when a relation is in 2NF?
8. What undesirable dependencies are avoided when a relation is in 3NF?
9. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it
considered a stronger form of 3NF?
Normalization splits database information across multiple tables.
To retrieve complete information from a normalized database, the JOIN operation must be used.
JOIN tends to be expensive in terms of processing time, and very large joins are very expensive.
Example:
If a relation contains a transitive dependency, it mixes attributes of different entities in a single relation