Lesson06 database design
1. 6 September 2019 S M Irteza / Dr Rafi Ullah 1
LESSON 06
DATABASE DESIGN
Relation Design
How to build efficient and problem-free relations
How to evaluate which design is more efficient
Normalization
A design technique for building good table structures
NORMAL FORMS:
A relation is in a particular normal form (NF) if it satisfies a certain set of constraints.
1st NF, 2nd NF, 3rd NF, BCNF, 4th NF, 5th NF
1st NF = lowest
2nd NF = higher than 1st, and so on
Normalization
A higher NF is more desirable: the higher the NF, the better the table.
Subset relationship between NFs:
If a table is in 3rd NF, it already satisfies 1st and 2nd NF.
(Diagram: nested sets, with 3rd NF inside 2nd NF inside 1st NF)
Normalization
If a table is in BCNF, then it already satisfies 1st, 2nd and 3rd NF.
HOW TO DETERMINE THE CURRENT NF:
Use the test-and-fail method.
We test the table for 2nd NF; if it passes, it is in at least 2nd NF, and we go on to test 3rd NF, and so on.
But if it fails 2nd NF, it is in 1st NF.
If tables are in 3rd NF, that is good enough for most practical purposes.
Normalization Process
A relation in some given NF can be converted into a set of relations in a higher NF.
Functional Dependencies
A key component in defining several NFs (the tests are based on functional dependencies, FDs)
<definition>
Let A and B be attributes of a relation R. B is functionally dependent on A if, for a given value of A, there corresponds precisely one value of B.
→ is the symbol for FD.
S# → City
as a particular supplier has one specific city,
BUT
City → S# ? (wrong)
E.g., in the Supplier table:
London → S1
London → S4
More than one answer for the same city, so this is not an FD.
Supplier S
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens
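The single-answer test can be sketched in Python against the Supplier table. The function name `holds` and the list-of-dicts layout are illustrative assumptions, not part of the lecture. Note that checking current rows can only refute an FD; passing the check does not prove the FD holds for all future data.

```python
def holds(rows, lhs, rhs):
    """Return True if, in these rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

# Supplier table S as given on the slide
SUPPLIER = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
    {"S#": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
    {"S#": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
]

s_to_city = holds(SUPPLIER, ["S#"], ["CITY"])   # one city per supplier
city_to_s = holds(SUPPLIER, ["CITY"], ["S#"])   # London has S1 and S4
print(s_to_city, city_to_s)
```

Here S# → City survives the check, while City → S# is refuted because London appears with two different suppliers.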
But if we say
S# → City
S1 → London (only one value, so this is correct; it is an FD)
So a functional dependency exists only when the question has a single answer.
P# → Color (is correct)
Color → P# (is wrong)
Several different values on the left may share the same value on the right; that is OK, e.g.
City → Area Code
Islamabad → 051
Pindi → 051
City X → 051
Nation → President
and
President → Nation
so
Nation ↔ President
If attribute x is the PK, then all the other attributes in the table must be functionally dependent on x,
e.g.
A(x, y, z), PK = x
so x → y
   x → z
Sometimes it is dangerous to decide on an FD based on current values, e.g.
in a table, SName might be unique for a while, but later we might get more suppliers with the same name.
Various Normal Forms
1) First Normal Form (1st NF)
<definition>
A relation is in 1st NF if and only if (iff) all cells contain atomic (single) values.
2) Second Normal Form (2nd NF)
<definition>
A relation is in 2nd NF iff it is in 1st NF and contains no partial dependencies.
<definition> Partial dependency
A partial dependency exists when a non-key attribute is functionally dependent on only a part of the PK,
i.e. the PK has to be composite in such a relation.
non-key = not a part of the PK or of any candidate key (CK)
A partial dependency can exist only if the PK is composite. E.g.
Relation(S#, P#, SName, QTY)
PK = (S#, P#)
S# → SName
so this table fails to be in 2nd NF and is in 1st NF.
If the PK is not composite, the table is at least in 2nd NF.
Second Normal Form, e.g.
PK = (S#, P#)
Additional FDs:
S# → City
S# → Status
City → Status
The table is in 1st NF because it fails to be in 2nd NF.
P#   S#   CITY     STATUS   QTY
P1   S1   London   30       200
P2   S1   London   30       300
P3   S1   London   30       200
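The 2nd NF test on this example can be sketched as a small Python check: an FD is partial when its left side is a proper subset of the composite PK and its right side contains a non-key attribute. The function name `partial_dependencies` and the tuple encoding of FDs are illustrative assumptions; the FD (S#, P#) → QTY is added here on the assumption that QTY depends on the whole key.

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs whose left side is only a part of the primary key."""
    pk = set(primary_key)
    found = []
    for lhs, rhs in fds:
        # "<" on sets means proper subset: part of the PK, but not all of it
        if lhs and set(lhs) < pk:
            non_key = tuple(a for a in rhs if a not in pk)
            if non_key:
                found.append((tuple(lhs), non_key))
    return found

PK = ("S#", "P#")
FDS = [
    (("S#",), ("CITY",)),
    (("S#",), ("STATUS",)),
    (("CITY",), ("STATUS",)),
    (("S#", "P#"), ("QTY",)),   # assumed: QTY depends on the full key
]

partial = partial_dependencies(PK, FDS)
in_2nf = not partial
print(partial, in_2nf)
```

The check reports S# → City and S# → Status as partial dependencies, so the table stays in 1st NF. City → Status is not reported here because City is not part of the PK.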
Problems
When a table fails to be in 2nd NF, several problems arise:
1. Redundancy
2. Anomalies: operational problems, i.e. some operations don't work or work badly.
   Insertion
   Deletion
   Update
Anomalies
Insertion:
We cannot insert the fact that a particular supplier is located in a particular city until that supplier makes at least one shipment,
because the PK includes P#, which only exists once there is a shipment.
E.g. S7 in Tokyo
Anomalies
Deletion:
If we delete the only tuple for a particular supplier, we destroy not only the shipment info but also the supplier's city information.
E.g.
S4, P1, Paris, 40, 100
There is only one shipment for S4, so if we delete the row, we lose the supplier's info forever.
Anomalies
Update:
The redundancy causes update problems.
E.g.
If S1 moves from London to Paris, we have to update many tuples.
If some of them are missed, inconsistency occurs.
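The update anomaly can be made concrete with a short Python sketch over the three S1 rows from the 2nd NF example. The helper names `move_supplier` and `cities_of`, and the `limit` parameter that simulates an incomplete update, are hypothetical and only for illustration.

```python
# Rows of the un-normalized table, as in the 2nd NF example slide
FIRST = [
    {"P#": "P1", "S#": "S1", "CITY": "London", "STATUS": 30, "QTY": 200},
    {"P#": "P2", "S#": "S1", "CITY": "London", "STATUS": 30, "QTY": 300},
    {"P#": "P3", "S#": "S1", "CITY": "London", "STATUS": 30, "QTY": 200},
]

def move_supplier(rows, supplier, new_city, limit=None):
    """Change the city on up to `limit` rows of the supplier; return rows touched."""
    touched = 0
    for row in rows:
        if row["S#"] == supplier and (limit is None or touched < limit):
            row["CITY"] = new_city
            touched += 1
    return touched

def cities_of(rows, supplier):
    """Set of distinct cities recorded for the supplier."""
    return {row["CITY"] for row in rows if row["S#"] == supplier}

full = [dict(r) for r in FIRST]
touched_full = move_supplier(full, "S1", "Paris")          # every S1 row changes

partial = [dict(r) for r in FIRST]
move_supplier(partial, "S1", "Paris", limit=2)             # one row is missed
cities_after_partial = cities_of(partial, "S1")
print(touched_full, cities_after_partial)
```

A complete update has to touch all three rows, and an update that misses one leaves S1 recorded in two cities at once.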
Strategy For Normalizing Process
(1st to 2nd)
The key to the 1st-to-2nd conversion is to identify and eliminate the partial dependencies.
This is achieved by moving the attributes involved in partial dependencies into a new table.
Strategy For Normalizing Process
(1st to 2nd)
DECOMPOSITION:
First(S#, City, Status, P#, Qty)
where the PK is (S#, P#)
The partial dependencies that exist in this example are
S# → City
S# → Status
Decompose the table First into 2 tables.
Second(S#, City, Status)
SP(S#, P#, Qty)
We know City and Status are the attributes partially dependent on S#, so we take them out into Second, together with S#.
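As a minimal sketch, the decomposition of First into Second and SP can be written as two projections in Python. The `project` helper and the list-of-dicts representation are illustrative assumptions; the rows are the three S1 shipments from the 2nd NF example.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed (first-seen order kept)."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

FIRST = [
    {"S#": "S1", "CITY": "London", "STATUS": 30, "P#": "P1", "QTY": 200},
    {"S#": "S1", "CITY": "London", "STATUS": 30, "P#": "P2", "QTY": 300},
    {"S#": "S1", "CITY": "London", "STATUS": 30, "P#": "P3", "QTY": 200},
]

second = project(FIRST, ["S#", "CITY", "STATUS"])   # supplier facts, stored once
sp = project(FIRST, ["S#", "P#", "QTY"])            # one row per shipment
print(second)
print(sp)
```

Second ends up with a single row for S1, while SP keeps one row per shipment, which is where the redundancy disappears.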
Problems Solved
1) Insertion
Before, with
First(S#, City, Status, P#, Qty)
a supplier needed at least one shipment before it could be added.
Now we have
Second(S#, Status, City)
so a supplier can be added without any shipment.
So the insertion problem is solved.
Problems Solved
2) Deletion
Before, if we wanted to erase one shipment, we could end up deleting the supplier completely from the database.
Now we can delete the shipment from SP and the supplier remains in Second.
So the deletion problem is solved.
Problems Solved
3) Update
We were repeating too much redundant information.
For example, if S1 supplied 10 parts, then City and Status were repeated 10 times, and we had to change 10 rows if S1 moved to another city.
Now City is written only once per supplier.
So the update problem is solved.
Second(S#, City, Status)
SP(S#, P#, Qty)
Note !
Decomposition is done by suitable projections
(projection is the decomposition operator).
RECOMPOSITION:
The original table can be recovered by taking the JOIN of the new tables.
JOIN is called the re-composition operator.
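The projection and JOIN round trip can be sketched in Python as follows. The helpers `project`, `natural_join` and `as_set` are illustrative assumptions, and the sample rows combine the S1 shipments from the 2nd NF example with the S4 row from the deletion anomaly slide.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attribute names."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result

def as_set(rows):
    """Order-independent form of a table, for comparing two tables."""
    return {tuple(sorted(row.items())) for row in rows}

FIRST = [
    {"S#": "S1", "CITY": "London", "STATUS": 30, "P#": "P1", "QTY": 200},
    {"S#": "S1", "CITY": "London", "STATUS": 30, "P#": "P2", "QTY": 300},
    {"S#": "S1", "CITY": "London", "STATUS": 30, "P#": "P3", "QTY": 200},
    {"S#": "S4", "CITY": "Paris", "STATUS": 40, "P#": "P1", "QTY": 100},
]

second = project(FIRST, ["S#", "CITY", "STATUS"])
sp = project(FIRST, ["S#", "P#", "QTY"])
recomposed = natural_join(second, sp)
print(as_set(recomposed) == as_set(FIRST))
```

Joining Second and SP on their shared attribute S# gives back exactly the rows of First, with nothing lost and nothing added.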
Re-Composition Example
R(A, B, C, D)
PK is (A, B)
FDs are
A → D
What NF is R in?
Table R is in 1st NF because it fails 2nd NF. Reason: there exists a partial dependency A → D.
How to decompose?
R1(A, D)
R2(A, B, C)
3rd Normal Form
<Definition>
A relation is in 3rd NF iff it is in 2nd NF and there are no transitive dependencies.
<Definition>
If a non-key attribute functionally determines another non-key attribute, the FD is called a transitive dependency.
3rd Normal Form
e.g.
Second(S#, City, Status)
PK = S#
The transitive dependency (TD) that exists is
City → Status
The other table, SP, has no transitive dependency and needs no further work.
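Following the slide's definition, a transitive dependency can be spotted with a small Python check: the left side contains no key attribute and the right side contains a non-key attribute. The function name and the FD encoding are illustrative assumptions, and the sketch assumes the only candidate key of Second is S#.

```python
def transitive_dependencies(key_attrs, fds):
    """Return FDs where non-key attributes determine other non-key attributes."""
    key = set(key_attrs)
    found = []
    for lhs, rhs in fds:
        lhs_is_non_key = not (set(lhs) & key)
        non_key_rhs = tuple(a for a in rhs if a not in key and a not in lhs)
        if lhs and lhs_is_non_key and non_key_rhs:
            found.append((tuple(lhs), non_key_rhs))
    return found

# Second(S#, City, Status), PK = S# (assumed to be the only candidate key)
FDS = [
    (("S#",), ("CITY",)),
    (("S#",), ("STATUS",)),
    (("CITY",), ("STATUS",)),
]

td = transitive_dependencies(("S#",), FDS)
in_3nf = not td
print(td, in_3nf)
```

Only City → Status is reported, so Second is in 2nd NF but fails 3rd NF.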
Problems
Insertion
We cannot insert the fact that a particular city has a particular status until we have some supplier actually located in that city.
(Status might represent, for example, per capita income.)
Problems
Deletion
If we delete the only tuple for a particular city, we destroy not only the supplier info but also the city's status info.
Second(S#, City, Status)
E.g. (S5, Boston, 60) is a record in table Second.
If this is deleted and it is the only one containing Boston, then the Boston info is destroyed for good.
Problems
Update
The redundancy of Status can cause update problems,
e.g. changing the status of London from 30 to 40.
We have to search for and change all tuples that have London in them.
Strategy For Normalization
(2nd to 3rd)
The key to the 2nd-to-3rd conversion is to detect and eliminate transitive dependencies by decomposing the table into more tables,
i.e. separate out the attributes involved in the TD into a new table.
DECOMPOSITION:
Decompose Second into 2 tables:
S1(City, Status)
S2(S#, City)
The attribute kept alongside S# in S2 cannot be Status, as two cities can have the same status at the same time.
Problems Solved
1) Insertion
The insertion anomaly is solved: we can now add a city without creating a supplier.
2) Deletion
Now if we delete a supplier, there is no chance of deleting the city's status data for good.
3) Update
The update problem is solved, as there is very little repeated or redundant data.
Strategy For Normalization
(2nd to 3rd) example
S(x, y, z)
PK = x
TD = (z → y)
What NF is S in?
This table is in 2nd NF, as it fails to be in 3rd NF because of the TD (z → y).
How to decompose?
S1(z, y)
S2(x, z)
This decomposition can be a little tricky, as there are other ways to decompose the same table too.
Good And Bad Decomposition
If a decomposition is bad, joining the new tables back together may cause the table to re-appear with some additional 'false' tuples.
A NON-LOSS decomposition guarantees that the JOIN produces exactly the original relation.
A LOSSY decomposition loses information, in the sense that the JOIN may produce a superset of the original, and there is no way of knowing which tuples are true and which are false.
Good And Bad Decomposition example
Second(S#, CITY, STATUS); PK = S#
TD = City → Status
DECOMPOSITION #1
SC(S#, City) & CS(City, Status)
A non-loss and good decomposition.
DECOMPOSITION #2
SS(S#, Status) & SC(S#, City)
A non-loss but less satisfactory decomposition.
DECOMPOSITION #3
SS(S#, Status) & CS(City, Status)
A lossy and bad decomposition.
Decomposition example 1
Decomposition #1
SECOND
S#   CITY     STATUS
S1   London   30
S2   Paris    30
SC
S#   CITY
S1   London
S2   Paris
CS
CITY     STATUS
London   30
Paris    30
SC JOIN CS
S#   CITY     STATUS
S1   London   30
S2   Paris    30
When we join SC and CS we get the same table as before.
So this is a good, NON-LOSS decomposition, which is always desired.
Decomposition example 2
Decomposition #2
SECOND
S#   CITY     STATUS
S1   London   30
S2   Paris    30
SS
S#   STATUS
S1   30
S2   30
SC
S#   CITY
S1   London
S2   Paris
SS JOIN SC
S#   CITY     STATUS
S1   London   30
S2   Paris    30
The JOIN gives the same table, but the decomposition is less satisfactory: Status is no longer stored with City, so City → Status is not kept in either table.
Decomposition example 3
SECOND
S#   CITY     STATUS
S1   London   30
S2   Paris    30
SS
S#   STATUS
S1   30
S2   30
CS
CITY     STATUS
London   30
Paris    30
SS JOIN CS
S#   CITY     STATUS
S1   London   30
S1   Paris    30
S2   London   30
S2   Paris    30
When JOIN is used we get false tuples, as shown. Remember that the status of 2 cities may be the same. So this decomposition is LOSSY and dangerous.
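The false tuples in this example can be reproduced with a few lines of Python. The set-of-tuples representation and the variable names are illustrative assumptions; the data is the two-row SECOND table from the slide.

```python
# SECOND(S#, CITY, STATUS) from the slide
SECOND = {("S1", "London", 30), ("S2", "Paris", 30)}

# Decomposition #3: project onto (S#, STATUS) and (CITY, STATUS)
SS = {(s, status) for s, _city, status in SECOND}
CS = {(city, status) for _s, city, status in SECOND}

# Natural join of SS and CS on the shared attribute STATUS
joined = {
    (s, city, status)
    for s, status in SS
    for city, cs_status in CS
    if status == cs_status
}

false_tuples = joined - SECOND
print(sorted(joined))
print(sorted(false_tuples))
```

Because both cities share status 30, the join pairs every supplier with every city, so (S1, Paris, 30) and (S2, London, 30) appear even though they were never in SECOND.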
Verifying a “GOOD” Decomposition
RULES
1) The common attribute of the new tables forms a candidate key for at least one of the pair.
If this fails: LOSSY
Decomposition #1: SC(S#, City) and CS(City, Status); the common attribute City is a key of CS.
Decomposition #2: SS(S#, Status) and SC(S#, City); the common attribute S# is a key of both.
2) Every FD in the original can be logically deduced from the FDs in the new tables.
If this fails: less satisfactory
For example:
Verifying a “GOOD” Decomposition
Second(S#, City, Status)
FDs are: S# → City
         S# → Status
         City → Status
Decomposition 2
Relations: SS(S#, Status)   SC(S#, City)
FDs:       S# → Status      S# → City
We cannot logically deduce City → Status from these FDs, hence this decomposition is less satisfactory.
Verifying a “GOOD” Decomposition
Second(S#, City, Status)
FDs are: S# → City
         S# → Status
         City → Status
Decomposition 1
Relations: SC(S#, City)   CS(City, Status)
FDs:       S# → City      City → Status
S# → Status follows from S# → City and City → Status, so all FDs of the original can be logically deduced. Hence GOOD.
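Both rules can be sketched in Python using attribute closure. This is a sketch under assumptions: the function names are hypothetical, rule 1 is read as "the closure of the common attributes covers one of the two tables", and rule 2 uses the standard closure-based preservation check rather than anything stated on the slides.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_non_loss(r1, r2, fds):
    """Rule 1: common attributes form a key of at least one of the pair."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach

def preserves_fds(tables, fds):
    """Rule 2: every original FD is deducible from FDs local to the new tables."""
    for lhs, rhs in fds:
        known = set(lhs)
        changed = True
        while changed:
            changed = False
            for table in tables:
                gained = closure(known & set(table), fds) & set(table)
                if not gained <= known:
                    known |= gained
                    changed = True
        if not set(rhs) <= known:
            return False
    return True

FDS = [(("S#",), ("CITY",)), (("S#",), ("STATUS",)), (("CITY",), ("STATUS",))]
SC, CS, SS = ("S#", "CITY"), ("CITY", "STATUS"), ("S#", "STATUS")

d1 = (is_non_loss(SC, CS, FDS), preserves_fds([SC, CS], FDS))
d2 = (is_non_loss(SS, SC, FDS), preserves_fds([SS, SC], FDS))
d3_non_loss = is_non_loss(SS, CS, FDS)
print(d1, d2, d3_non_loss)
```

Decomposition 1 passes both rules, decomposition 2 passes rule 1 but loses City → Status, and decomposition 3 already fails rule 1, which matches the good, less satisfactory and lossy labels used in the slides.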
Good And Bad Decomposition example 2
PCZ(Phone, Company, Zip)
PK = Phone
FDs: Zip → Company
     Phone → Zip
     Phone → Company
As the PK is not composite, the table is at least in 2nd NF.
Good And Bad Decomposition example 2
PCZ(Phone, Company, Zip)
Good and non-lossy:
PZ(Phone, Zip)
ZC(Zip, Company)
Less satisfactory and non-lossy:
PZ(Phone, Zip)
PC(Phone, Company)
Bad and lossy:
PC(Phone, Company)
CZ(Company, Zip)
• One zip is handled by only 1 company.
• One company can have control over many zips.