2. AKN/IDBII.2Introduction to databases
The Goal
The goal of relational database design is to
generate a set of relation schemas that allows us
to store information without unnecessary
redundancy,
and also allows us to retrieve information easily
and efficiently.
Redundancy: The Problem
Consider a relation schema
instDept (ID, name, salary, dept_name, building, budget)
Problems
For every instructor of the same department, the building
and budget information is repeated.
If a new department is opened, the database cannot
record this department's information until an
instructor is appointed to it.
What assurance is there that each department is
housed in one building and has one budget?
Solution
Database design tries to avoid these
problems using the concept of normalization.
It is the technique of designing relation schemas
to comply with one of several normal forms.
Normal forms are well-defined rules that avoid
unnecessary redundancy and other anomalous
conditions.
Normal forms arranged according to strictness
(6NF is the strictest and 1NF the least strict):
6NF, 5NF, 4NF, BCNF, 3NF, 2NF, 1NF
Anomalies in Relational Database-I
A database that is not designed properly may exhibit
the following anomalies.
Redundancy (repetition of information)
Unnecessary waste of disk space.
studNum  Address     deptNum  deptName  Building
S21      Patna       5        CSIT      C-Block
S22      Edinburgh   5        CSIT      C-Block
S23      BBSR        4        MECH      B-Block
S24      Kolkata     4        MECH      B-Block
S25      Manchester  1        PHY       D-Block
Any change to a department's building information needs
to be applied to multiple records, which may lead to
inconsistency if some records are missed.
Anomalies in Relational Database
Insertion Anomaly
If a new department is opened, there is no
way to insert its information into the database
until a student is admitted into the department.
Deletion Anomaly
If the last student of a department leaves the college and
is deleted from the database, then the department
information is also lost from the database.
All these problems arise from a faulty design of
the database.
Therefore, a database should be designed using
normalization techniques, which avoid
redundancy and hence these anomalies.
First Normal Form - I
A relation schema R is said to be in 1NF if the domains
of all attributes in R are atomic.
A domain is atomic if its elements are treated as
indivisible units,
i.e. according to 1NF there can be no sub-structure
within a column, and the value of each
attribute is never a set or a list of values.
Examples of non-atomic values
Sub-structure: address (street, city, state, pin), regNo
(SOAITERCSIT2016A101)
Set/List of values: multiple phone numbers, mail ids,
names etc.
First Normal Form - II
regNo (SOAITERCSIT2016A101): the department of a student
can be found only by writing code (extra programming!),
i.e. information is encoded in the program rather than in the data.
What if this attribute is used as the primary key and the student
changes department?
The regNo of that student, interpreted by code, gives the wrong
result,
and it needs to be changed everywhere it occurs, a difficult task.
However, in some domains entities have a
complex structure, and forcing 1NF puts an extra burden
on the programmer to write code that converts data back
and forth.
In fact, modern databases do support many non-atomic
values!
Functional Dependency
Functional dependencies give a formal methodology for evaluating whether a
relation schema should be decomposed.
Notations used
relation schema: r(R)
i.e. r: relation name and R: set of attributes; we write just R when the
relation name is not important.
K: superkey of r(R)
r alone: an instance of relation r
Certain real-world constraints exist on the data, e.g.
Students and instructors are uniquely identified by their ID.
Each student and instructor has only one name.
Each instructor and student is (primarily) associated with only
one department etc.
Super Key
An instance of a relation that satisfies all such real-world
constraints is called a legal instance of the relation.
Super Key: A subset K of R is a superkey of r(R)
if t1 ≠ t2 implies t1[K] ≠ t2[K] for all pairs of tuples t1 and t2 in
every legal instance of r.
That is, no two tuples in any legal instance of relation r(R) may
have the same value on attribute set K.
A superkey uniquely identifies a tuple in r.
A functional dependency allows us to express
constraints in which some attributes uniquely determine the values of certain
other attributes.
Functional Dependency - I
Let x, y ⊆ R. An instance of r(R) is said to
satisfy the functional dependency x → y
if t1[x] = t2[x] implies t1[y] = t2[y] for all pairs of tuples t1 and t2.
The functional dependency x → y holds on schema r(R) if
every legal instance of r(R) satisfies the functional
dependency.
Functional dependency is a generalization of the key
concept, i.e.
K is a superkey if, for every pair of tuples t1 and t2,
t1[K] = t2[K] implies t1[R] = t2[R], i.e. t1 = t2.
That is, K is a superkey of r(R) if the functional dependency K → R
holds on r(R), where K ⊆ R; K then uniquely determines tuples in r(R).
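The definition above can be checked mechanically on a single instance. The following is a minimal sketch, assuming each tuple is stored as a Python dict; the function names satisfies_fd and is_superkey_of_instance are illustrative, and the data is the student table from the anomalies slide. Note that an instance can only refute an FD: an FD holds on the schema only if every legal instance satisfies it.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on `lhs` also agrees on `rhs`."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same lhs value, different rhs value
    return True


def is_superkey_of_instance(rows, attrs):
    """K behaves as a superkey on this instance if K -> R is satisfied.

    Assumes the instance has no duplicate rows (a relation is a set).
    """
    return satisfies_fd(rows, attrs, list(rows[0].keys()))


ATTRS = ["studNum", "Address", "deptNum", "deptName", "Building"]
DATA = [
    ("S21", "Patna", 5, "CSIT", "C-Block"),
    ("S22", "Edinburgh", 5, "CSIT", "C-Block"),
    ("S23", "BBSR", 4, "MECH", "B-Block"),
    ("S24", "Kolkata", 4, "MECH", "B-Block"),
    ("S25", "Manchester", 1, "PHY", "D-Block"),
]
students = [dict(zip(ATTRS, r)) for r in DATA]

print(satisfies_fd(students, ["deptNum"], ["deptName", "Building"]))  # True
print(satisfies_fd(students, ["deptNum"], ["Address"]))               # False
print(is_superkey_of_instance(students, ["studNum"]))                 # True
print(is_superkey_of_instance(students, ["deptNum"]))                 # False
```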
Example: FD
Consider the relation schema
account(accNum, balance, brID).
There exist functional dependencies like
accNum → balance
i.e. if t1[accNum] = t2[accNum], then t1[balance] =
t2[balance] etc.
accNum → brID,
. . .
accNum → accNum, balance, brID
i.e. accNum uniquely determines the tuples in the account
relation.
Therefore accNum is a key
Example-II
Find which functional dependencies hold
A → B    A → C    A → D
B → A    C → A    D → A
A → A    B → B
AB → A   AB → B
The FDs in the last two rows are satisfied by all relations and are called
trivial functional dependencies.
An FD of the form x → y in r(R) is said to be a trivial FD
if y ⊆ x, where x, y ⊆ R
Closure of FD Set
A given set of FDs may logically imply more FDs.
For any FD set F, the set of all FDs that can be inferred from F
is called the closure of F and is denoted by F+.
Example: Let r(A,B,C,D,E) and given F={A → D, D → B, B → C}
Then F+ includes {A → D, D → B, B → C, A → B, A → C, D → C},
together with all trivial FDs and the FDs obtained from these by augmentation and union.
The rules (axioms) used to find the closure of an FD set are
called Armstrong's Axioms
Rule 1: Reflexivity Rule
If y ⊆ x, then x → y holds
Rule 2: Augmentation Rule
If x → y, then zx → zy holds
Armstrong’s rule contd.
Rule 3: Transitivity Rule
If x → y AND y → z, then x → z holds
Armstrong's rules are sound and complete, but to make finding the
closure easier some more rules are derived from these
axioms.
Rule 4: Union Rule
If x → y AND x → z, then x → yz holds
Rule 5: Decomposition Rule
If x → yz, then x → y AND x → z hold
Rule 6: Pseudo-transitivity Rule
If x → y AND yz → w, then xz → w holds
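To illustrate how a derived rule follows from the three axioms, here is one possible derivation of the union rule, written as a LaTeX sketch. It uses only augmentation and transitivity, and relies on the fact that attribute sets are sets, so xx = x.

```latex
\begin{align*}
& x \to y   && \text{(given)} \\
& x \to xy  && \text{(augment the line above with } x \text{, using } xx = x\text{)} \\
& x \to z   && \text{(given)} \\
& xy \to yz && \text{(augment the line above with } y\text{)} \\
& x \to yz  && \text{(transitivity of } x \to xy \text{ and } xy \to yz\text{)}
\end{align*}
```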
Example: Finding F+
Let R=(A, B, C, G, H, I) and F={A → B, A → C, CG → H,
CG → I, B → H}. Find F+.
A → B AND B → H ⟹ A → H (Transitivity)
CG → H AND CG → I ⟹ CG → HI (Union)
A → C AND CG → I ⟹ AG → I (Pseudo-transitivity)
Besides the trivial FDs, F+ therefore includes {
A → B,
A → C,
CG → H,
CG → I,
B → H,
A → H,
CG → HI,
AG → I }
Attribute Closure
a → b: b is functionally determined by a
How can we know whether a is a superkey?
i.e. can we prove that a functionally determines all
other attributes?
Naive solution: compute F+, then consider all FDs with a as
the LHS and take the union of their RHSs. However, this
process is expensive because F+ can be large.
The attribute closure of x, written x+, is the set of
all attributes of R that are functionally
determined by x under F.
Attribute closure may be used to
Find whether an attribute or a set of attributes is a superkey, i.e. if x+ = R,
then x is a superkey of r(R) (and a candidate key if no proper subset of x has this property)
Determine whether the FD x → y holds: it holds if and only if y ⊆ x+
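The attribute closure can be computed without enumerating F+. The following is a minimal sketch of the usual fixed-point computation, assuming FDs are represented as (lhs, rhs) pairs of Python sets; the function name attribute_closure is illustrative. The data is the FD set F={A → D, D → B, B → C} on r(A,B,C,D,E) used on the closure slide.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


R = set("ABCDE")
F = [({"A"}, {"D"}), ({"D"}, {"B"}), ({"B"}, {"C"})]

a_plus = attribute_closure({"A"}, F)
print(sorted(a_plus))                          # ['A', 'B', 'C', 'D']
print(a_plus == R)                             # False: A alone is not a superkey
print(attribute_closure({"A", "E"}, F) == R)   # True: AE is a superkey
print({"C"} <= attribute_closure({"D"}, F))    # True: D -> C holds
```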
Ex: Attribute Closure
Example 1: R=(A, B, C, D, E), F={A → CD, C → B, B → E},
find the key.
Solution
A+ = {ABCDE} : A is a key
BC+ = {BCE}
B+ = {BE}
Example 2: For the above example, check whether A
functionally determines E.
Solution
A+ = {ABCDE} contains E, so A → E holds
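Finding keys by hand means trying attribute sets and computing their closures. A brute-force sketch of that search is shown below; it tries subsets from smallest to largest and keeps only minimal ones. The names attribute_closure and candidate_keys are illustrative, the approach is exponential in the number of attributes, and the second schema (R2, F2) is a hypothetical example added to show a relation with two candidate keys.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(attrs, fds) >= set(schema):
                keys.append(attrs)
    return keys


# Example 1 from the slide: F = {A -> CD, C -> B, B -> E}
R1 = set("ABCDE")
F1 = [({"A"}, {"C", "D"}), ({"C"}, {"B"}), ({"B"}, {"E"})]
print(candidate_keys(R1, F1))  # [{'A'}]

# Hypothetical second schema with two candidate keys
R2 = set("ABC")
F2 = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
keys2 = candidate_keys(R2, F2)
print(len(keys2))  # 2 (AB and AC)
```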
Decomposition
Relational DB design often requires a relation schema to be
decomposed into more than one relation as part of the process
of DB normalization.
A decomposition of a relation schema should satisfy
the following properties
Lossless decomposition
Dependency preservation
Lossless Decomposition
If R is decomposed into two relation schemas R1 and
R2, then the decomposition is said to be lossless
if no information is lost in the process of decomposition and
all information can be recovered by joining the decomposed
relations.
In other words, the decomposition is lossless
if r1(R1) ⨝ r2(R2) = r(R), where r1 and r2 are the projections of r on R1 and R2, and ⨝ is the natural join operator
The decomposition can be verified to be
lossless if at least one of the following FDs is in F+, i.e.
Either R1 ∩ R2 → R1
Or R1 ∩ R2 → R2
Informally, a decomposition is lossless if the decomposed relations share
referential integrity, i.e. if the common attributes form the primary key (PK) of one relation and a foreign key (FK)
of the other relation.
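The FD-based test above can be sketched in a few lines: compute the closure of the common attributes and check whether it covers R1 or R2. This is a minimal sketch for a two-way decomposition with FDs only; the function names are illustrative, and the schema R=(A, B, C) with F={A → B, B → C} is used as a small example.

```python
def attribute_closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_lossless(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from `fds`."""
    common_closure = attribute_closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


F = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(is_lossless({"A", "B"}, {"B", "C"}, F))  # True:  B -> BC
print(is_lossless({"A", "B"}, {"A", "C"}, F))  # True:  A -> AB
print(is_lossless({"A", "C"}, {"B", "C"}, F))  # False: C determines only C
```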
Dependency Preservation
If R with FD set F is decomposed into two relation
schemas R1 and R2, with FD sets F1 and F2
respectively, then the decomposition is said to be
dependency preserving if it satisfies
(F1 ∪ F2)+ = F+
That is, no FD exhibited by the original relation schema is lost in the
process of decomposition.
Example 1:
Let R=(A, B, C) with F = {A → B, B → C} be decomposed as R1=(A, B)
with F1 = {A → B} and R2=(B, C) with F2 = {B → C}
Here (F1 ∪ F2)+ = F+, therefore dependencies are preserved
Example 2:
Let R=(A, B, C) with F = {A → B, B → C} be decomposed as R1=(A, B)
with F1 = {A → B} and R2=(A, C) with F2 = {A → C}
Here (F1 ∪ F2)+ ≠ F+ because B → C is lost, therefore dependencies are not preserved
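Comparing (F1 ∪ F2)+ with F+ directly is expensive. A common alternative, sketched below, checks each FD x → y of F by growing x using only what each decomposed schema can "see": result := result ∪ ((result ∩ Ri)+ ∩ Ri), repeated until nothing changes. The function names are illustrative; this is a sketch of that textbook test, not necessarily the method intended by the slides.

```python
def attribute_closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def preserves_dependencies(fds, decomposition):
    """True if every FD in `fds` is implied by the FDs projected onto the parts."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # attributes of ri derivable from the part of `result` inside ri
                gained = attribute_closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False  # this FD cannot be checked within the parts
    return True


F = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(preserves_dependencies(F, [{"A", "B"}, {"B", "C"}]))  # True  (Example 1)
print(preserves_dependencies(F, [{"A", "B"}, {"A", "C"}]))  # False (Example 2: B -> C lost)
```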
Second Normal form
A relation schema is said to be in second normal form
if it is in 1NF and does not exhibit any partial functional
dependency.
If a relation schema has a composite primary
key, then
there may exist an FD where a proper part of the key functionally
determines non-key attributes;
such FDs are referred to as partial functional dependencies.
Ex. R(A, B, C, D, E), F={AB → C, B → D, D → E}
With key AB, R exhibits a partial FD of the form B → D
Hence it does not satisfy 2NF
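Partial dependencies can be found mechanically: for every candidate key, take each proper subset and see whether its closure contains a non-prime attribute. The following is a brute-force sketch under that reading of 2NF; the function names are illustrative and the search is exponential, so it is only meant for slide-sized examples.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue
            if attribute_closure(attrs, fds) >= set(schema):
                keys.append(attrs)
    return keys


def partial_dependencies(schema, fds):
    """List (key_part, non_prime_attributes) pairs that violate 2NF."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty subsets only
            for part in combinations(sorted(key), size):
                dependent = attribute_closure(part, fds) - prime
                if dependent:
                    found.append((set(part), dependent))
    return found


R = set("ABCDE")
F = [({"A", "B"}, {"C"}), ({"B"}, {"D"}), ({"D"}, {"E"})]
print(partial_dependencies(R, F))  # B determines the non-prime attributes D and E

# After decomposition, R1(A, B, C) with F1 = {AB -> C} has no partial FD
print(partial_dependencies(set("ABC"), [({"A", "B"}, {"C"})]))  # []
```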
Normalizing to 2NF
Divide R(A, B, C, D, E) into two relations
R1(A,B,C), F1={AB → C}, key={AB}
R2(B,D,E), F2={B → D, D → E}, key={B}
Neither R1 nor R2 has a partial FD, so they are
now normalized to 2NF
R1 ∩ R2 = B and B → R2, so the decomposition is lossless
F1 ∪ F2 = F, so it is dependency preserving
Problem: Check whether the following relation is in 2NF; if not,
normalize it
order(orderNum, clientNum, itemNum, unitPrice, qty)
F={orderNum → clientNum
itemNum → unitPrice
orderNum, itemNum → qty }
Key={orderNum, itemNum}
Solution - I
order exhibits partial dependencies of the form
orderNum → clientNum,
itemNum → unitPrice, where part of the key determines a non-key attribute,
hence it does not satisfy 2NF
Normalization: divide the relation into the following
orderItem(orderNum, itemNum, qty),
F1={orderNum, itemNum → qty}, key1={orderNum, itemNum}
orderClient(orderNum, clientNum),
F2={orderNum → clientNum}, key2={orderNum}
item(itemNum, unitPrice),
F3={itemNum → unitPrice}, key3={itemNum}
Solution - II
Check for lossless decomposition
orderItem ∩ orderClient = orderNum → orderClient
orderItem ∩ item = itemNum → item, so lossless
Check for dependency preservation
F1 ∪ F2 ∪ F3 = F, so it is also dependency preserving
Therefore, the relation schemas are in 2NF
N.B.: A relation schema having a single-attribute (non-
composite) primary key is always in 2NF! (why?)
as it cannot have a partial FD
Example
Check whether the following relation is in 2NF; if not, normalize
it.
F={Manufacturer → ManufacturerCountry
Manufacturer, Model → ModelFullName}
Key={Manufacturer, Model}
The key is composite and Manufacturer → ManufacturerCountry is a partial FD, hence not in 2NF
Manufacturer  Model        ModelFullName         ManufacturerCountry
Forte         X-Prime      Forte X-Prime         Italy
Forte         Ultraclean   Forte Ultraclean      Italy
Dent-o-Fresh  EZbrush      Dent-o-Fresh EZbrush  USA
Kobayashi     ST-60        Kobayashi ST-60       Japan
Hoch          Toothmaster  Hoch Toothmaster      Germany
Hoch          X-Prime      Hoch X-Prime          Germany
Solution
Break it into two tables as follows
Key1={Manufacturer}
Key2={Manufacturer, Model}
Lossless?
Dependency preserving?
Manufacturer  ManufacturerCountry
Forte         Italy
Dent-o-Fresh  USA
Kobayashi     Japan
Hoch          Germany
Manufacturer  Model        ModelFullName
Forte         X-Prime      Forte X-Prime
Forte         Ultraclean   Forte Ultraclean
Dent-o-Fresh  EZbrush      Dent-o-Fresh EZbrush
Kobayashi     ST-60        Kobayashi ST-60
Hoch          Toothmaster  Hoch Toothmaster
Hoch          X-Prime      Hoch X-Prime
Third Normal Form (3NF)
A relation r(R) with a given set of FDs F is said to be in
3NF
Defn 1: if for every FD of the form X → Y in F+, at least one
of the following three conditions is satisfied
X → Y is a trivial FD
X is a superkey
Every attribute in Y − X is a prime attribute (i.e. contained in some candidate key)
Defn 2: if for every non-trivial FD of the form X → Y in F+,
at least one of the following two conditions is satisfied
X is a superkey
Every attribute in Y − X is a prime attribute (i.e. contained in some candidate key)
Third Normal Form (3NF)
Defn 3: if the schema does not exhibit any transitive
dependency of the form
key → non-key → non-key
That is, a schema is said to be in 3NF if it does not
exhibit any functional dependency from a non-key attribute to
another non-key attribute.
Ex1. Consider the relation instance below and check for 3NF and 2NF
studNum  Address     deptNum  deptName  Building
S21      Patna       5        CSIT      C-Block
S22      Edinburgh   5        CSIT      C-Block
S23      BBSR        4        MECH      B-Block
S24      Kolkata     4        MECH      B-Block
S25      Manchester  1        PHY       D-Block
Solution-I
Find Functional Dependencies
F = {studNum → Address, deptNum, deptName, Building
deptNum → deptName, Building}
Find the key
Key = {studNum}
Check for 3NF
studNum → deptNum → deptName, Building
i.e. key → non-key → non-key
Hence it is not in 3NF
Decomposition
R1(studNum, Address, deptNum), R2(deptNum, deptName,
Building)
F1={studNum → Address, deptNum},
F2={deptNum → deptName, Building}
Solution-II
Decomposition continued
Key1 = {studNum}, key2={deptNum}
Hence R1 and R2 are now in 3NF as they do not
exhibit a transitive dependency
Lossless decomposition
R1 ∩ R2 = deptNum → R2, hence lossless
Dependency Preservation
(F1 ∪ F2)+ = F+, hence dependency preserving
2NF
There is no partial FD, therefore R1 and R2 are in 2NF
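The 3NF check in this solution can be reproduced in code. The sketch below tests every given FD against the three conditions of the definition (trivial, superkey on the left, or only prime attributes on the right). It inspects only the FDs in F rather than all of F+, which is sufficient when testing the schema that F was stated for. The function names are illustrative.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue
            if attribute_closure(attrs, fds) >= set(schema):
                keys.append(attrs)
    return keys


def is_3nf(schema, fds):
    """True if every FD is trivial, has a superkey LHS, or adds only prime attributes."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD
        if attribute_closure(lhs, fds) >= schema:
            continue  # LHS is a superkey
        if (rhs - lhs) <= prime:
            continue  # only prime attributes on the RHS
        return False
    return True


R = {"studNum", "Address", "deptNum", "deptName", "Building"}
F = [
    ({"studNum"}, {"Address", "deptNum", "deptName", "Building"}),
    ({"deptNum"}, {"deptName", "Building"}),
]
R1, F1 = {"studNum", "Address", "deptNum"}, [({"studNum"}, {"Address", "deptNum"})]
R2, F2 = {"deptNum", "deptName", "Building"}, [({"deptNum"}, {"deptName", "Building"})]

print(is_3nf(R, F))    # False: deptNum -> deptName, Building violates 3NF
print(is_3nf(R1, F1))  # True
print(is_3nf(R2, F2))  # True
```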
Example-2
Consider the relation schema R(A, B, C, D, E) with FD
set F={AB → C, B → D, D → E}
Which normal form is R in? Normalize the relation up to
3NF.
Solution:
Check for 2NF
Key={AB}
Partial FD B → D, hence not in 2NF
Decompose: R1(A, B, C), R2(B, D, E)
F1={AB → C}, F2={B → D, D → E}, key1 = {AB}, key2={B}
Both are now in 2NF
Example-2 contd.
Check for 3NF
R1 is in 3NF, R2 is not in 3NF (why?)
Transitive dependency in R2 (B → D → E)
Decompose R2: R3(B, D), R4(D, E)
F3={B → D}, F4={D → E}
Now both are in 3NF
Final Schema: R1(A, B, C), R3(B, D), R4(D, E)
Check the decomposition for losslessness and dependency
preservation
Task
Consider the relation schema R(A, B, C, D, E) with FD
set F={AC → B, E → D, A → E}
Which normal form is R in? Normalize the relation up to
3NF.
Boyce Codd Normal Form (BCNF)
Defn 1: r(R) is said to be in BCNF with respect to F if, for every FD of
the form X → Y in F+, at least one of the following two conditions holds
X → Y is a trivial FD
X is a superkey
Defn 2: r(R) is said to be in BCNF with respect to F if, for every
non-trivial FD of the form X → Y in F+, X is a superkey
Defn 3: BCNF allows only those non-trivial FDs whose left side
is a superkey of the relation schema.
Note:
BCNF is the highest possible normal form for relation schemas
exhibiting only FDs
BCNF is stricter than 3NF
Every relation in BCNF is also in 3NF; however, a relation in
3NF is not necessarily in BCNF.
Boyce Codd Normal Form (BCNF)
Example: check for 3NF and BCNF
R={A,B,C}
F={AB → C,
C → B }
Candidate keys: AB and AC, so A, B and C are all prime attributes
3NF
Both FDs are non-trivial
AB → C: AB is a superkey
C → B: B is a prime attribute
Hence in 3NF
BCNF
C → B: C is not a superkey, hence not in BCNF
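The BCNF test is simpler than the 3NF test because there is no exception for prime attributes: every non-trivial FD must have a superkey on the left. A minimal sketch follows; the name bcnf_violations is illustrative, and as with 3NF it inspects only the given FDs, which is sufficient for the schema F was stated for but not for sub-schemas produced by a decomposition.

```python
def attribute_closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not attribute_closure(lhs, fds) >= schema
    ]


R = set("ABC")
F = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]

print(bcnf_violations(R, F))  # [({'C'}, {'B'})]: C -> B, and C is not a superkey
print(bcnf_violations({"B", "C"}, [({"C"}, {"B"})]))  # []: in R(B, C), C is a key
```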
Boyce Codd Normal Form (BCNF)
Every relation in BCNF is also in 3NF; however, a relation
in 3NF is not necessarily in BCNF.
Example:
R(property_id, countryName, lot#, area, price, taxRate)
F={property_id → countryName, lot#, area, price, taxRate
countryName, lot# → property_id, area, price, taxRate
countryName → taxRate
area → price
area → countryName
}
Example - III
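area → countryName is of the form non-key → key attribute:
area is not a superkey, hence the relation is not in BCNF
Normalize to BCNF (successive decompositions of LOTS)
1NF: LOTS
2NF: LOTS1, LOTS2
3NF: LOTS1A, LOTS1B, LOTS2
BCNF: LOTS1AX, LOTS1AY, LOTS1B, LOTS2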
Limitations of BCNF
There exist multiple ways of decomposing/normalising
a non-BCNF schema into BCNF schemas
Although a lossless BCNF decomposition always
exists, it may not guarantee the property of
dependency preservation.
If the DB designer does not find a BCNF
decomposition that guarantees dependency
preservation, they may have to settle for
the lower normal form, i.e. 3NF
Functional Dependency Contd.
In some cases, constraints can't be expressed
as functional dependencies.
Ex. loan(custNum, loanNum, phoneNum)
One customer can have multiple loans and multiple
phone numbers
Is it in BCNF?
Key = {custNum, loanNum, phoneNum}
It exhibits only trivial functional dependencies, hence it is in
BCNF
But this schema still exhibits redundancy
Example contd.
If we have two or more independent multi-valued
attributes, then we need to repeat every value of one
attribute with every value of the other attribute to keep
the relation consistent.
This type of constraint is specified by a multi-valued
dependency.
Loan
custNum  loanNum  phoneNum
C1       L1       P1
C1       L1       P2
C1       L2       P1
C1       L2       P2
Multi-Valued Dependency
A multi-valued dependency (MVD) from X to Y
(X →→ Y, where X, Y ⊆ R) specified on a relation r(R)
imposes the following constraint on r: if two tuples t1
and t2 exist in r such that t1[X] = t2[X], then two
tuples t3 and t4 (not necessarily distinct from t1 and t2)
should also exist in r with the following properties.
t3[X] = t4[X] = t1[X] = t2[X]
t3[Y] = t1[Y] & t4[Y] = t2[Y]
t3[R-XY] = t2[R-XY] & t4[R-XY] = t1[R-XY]
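The tuple-swapping condition above can be tested on an instance. The following is a minimal sketch, assuming tuples are stored as Python dicts; the name satisfies_mvd is illustrative. For every ordered pair (t1, t2) that agrees on X it builds t3 (Y from t1, the rest from t2) and checks that it is present; t4 is covered when the pair is visited in the opposite order. The data is the Loan instance from the earlier slide.

```python
def satisfies_mvd(rows, x, y):
    """Return True if the instance `rows` satisfies the MVD x ->-> y."""
    attrs = list(rows[0].keys())
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes Y from t1 and everything else from t2
                t3 = {**t2, **{a: t1[a] for a in y}}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


loan = [
    {"custNum": "C1", "loanNum": "L1", "phoneNum": "P1"},
    {"custNum": "C1", "loanNum": "L1", "phoneNum": "P2"},
    {"custNum": "C1", "loanNum": "L2", "phoneNum": "P1"},
    {"custNum": "C1", "loanNum": "L2", "phoneNum": "P2"},
]

print(satisfies_mvd(loan, ["custNum"], ["loanNum"]))       # True
print(satisfies_mvd(loan, ["custNum"], ["phoneNum"]))      # True
print(satisfies_mvd(loan[:3], ["custNum"], ["loanNum"]))   # False: (C1, L2, P2) is missing
```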
Multi-Valued Dependency - I
Whenever X →→ Y holds, we say that X
multi-determines Y.
Because of the symmetry in the definition,
whenever X →→ Y holds in R, so does X →→ Z,
where Z = R − XY.
Hence X →→ Y implies X →→ Z, and therefore it is
sometimes written as X →→ Y|Z.
An MVD X →→ Y in R is called a trivial MVD if
Y is a subset of X, or
X ∪ Y = R
Fourth Normal form (4NF)- I
Let r(R) be a relation schema with a given set of
dependencies D, where D includes FDs and
MVDs. Then r(R) is said to be in 4NF if every MVD X →→ Y
in D+ satisfies at least one of the following two
conditions.
X →→ Y is a trivial MVD
X is a superkey
Example 1: test whether the relation schema is in 4NF
R(A,B,C,E) and
D={A →→ E
A → B
A → C}
4NF Example contd.
It is not in 4NF because
A →→ E is not a trivial MVD
A is not a superkey
Decompose into R1(A,E), D1={A →→ E} and R2(A,B,C),
F2={A → B, A → C}
In R1: A →→ E is a trivial MVD, thus in 4NF
In R2: A is the key, thus in 4NF
Example 2: R(custNum, loanNum, phoneNum)
D={custNum →→ loanNum,
custNum →→ phoneNum}
Not in 4NF?
Denormalization for Performance
Occasionally database designers choose a schema
that has redundant information
They use the redundancy to improve performance for
specific applications.
The penalty paid for not using a normalized schema is
the extra work (in terms of coding time and execution
time) to keep redundant data consistent.
The process of taking a normalized schema and
making it non-normalized is called denormalization
Designers use it to tune performance of systems to
support time-critical operations.
A better alternative is to use the normalized schema
and additionally store the join of its relations as a
materialized view.