2. • Normalization
  • Normal Forms
    • 1NF
    • 2NF
    • 3NF
• Codd’s Rules
3. Data Normalization
• The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise.
• Achieve a design that is highly flexible
• Reduce redundancy
• Ensure that the design is free of certain update, insertion and deletion anomalies
4. Normalization
1NF: Flat file
2NF: Partial dependencies removed
3NF: Transitive dependencies removed
BCNF: Every determinant is a candidate key
4NF: Non-trivial multi-valued dependencies removed
5. Stereos To Go: Invoice
Order No.: 10001
Date: 6/15/99
Account No.: 0000-000-0000-0
Customer: John Smith
Address: 2036-26 Street, Sacramento, CA 95819
Date Shipped: 6/18/99
Item Number  Product Code  Product Description/Manufacturer  Qty  Price
1            SAGX730       Pioneer Remote A/V Receiver       1    569.95
2            AT10          Cerwin Vega Loudspeakers          1    359.95
3            CDPC725       Sony Disc-Jockey CD Changer       1    399.95
Subtotal             1,329.85
Shipping & Handling    100.00
Sales Tax              103.06
Total                1,532.91
6. Unnormalized Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account,
Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
Item1, Item1_descrip, Item1_qty, Item1_price,
Item2, Item2_descrip, Item2_qty, Item2_price, . . . ,
Item7, Item7_descrip, Item7_qty, Item7_price)
How would a program process the data to recreate the invoice?
7. Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account,
Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
Item1, Item1_descrip, Item1_qty, Item1_price,
Item2, Item2_descrip, Item2_qty, Item2_price, . . . ,
Item7, Item7_descrip, Item7_qty, Item7_price)
Repeating groups: Item1 . . . Item7
A flat file places all the data of a transaction into a single record.
This is reminiscent of a COBOL or BASIC program
processing a single transaction with one read statement.
8. Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account,
Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
Item, Item_descrip, Item_qty, Item_price)
Nominated group of attributes
to serve as the key
(form a unique combination)
• Eliminate the repeating groups.
• Each row retains data for one item.
• If a person bought five items, we would have five tuples.
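The step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `wide` record, the `to_1nf` helper and the limit of three item slots are hypothetical, and only a subset of the invoice attributes is shown.

```python
# Hypothetical unnormalized record: one invoice with repeating item groups.
wide = {
    "Invoice_number": 10001,
    "Cust_name": "John Smith",
    "Item1": "SAGX730", "Item1_qty": 1, "Item1_price": 569.95,
    "Item2": "AT10", "Item2_qty": 1, "Item2_price": 359.95,
    "Item3": None, "Item3_qty": None, "Item3_price": None,  # unused slot
}

def to_1nf(record, max_items=3):
    """Return one flat row per purchased item, repeating the invoice data."""
    # Everything that is not part of a repeating group is copied to each row.
    header = {k: v for k, v in record.items() if not k.startswith("Item")}
    rows = []
    for i in range(1, max_items + 1):
        item = record.get(f"Item{i}")
        if item is None:
            continue  # skip empty repeating-group slots
        rows.append({
            **header,
            "Item": item,
            "Item_qty": record[f"Item{i}_qty"],
            "Item_price": record[f"Item{i}_price"],
        })
    return rows

rows_1nf = to_1nf(wide)
for row in rows_1nf:
    print(row)
```

Each resulting row carries the invoice and customer data again, which is exactly the redundancy that the later normal forms remove.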
9. 1NF
Flat File
Invoice  Account  Customer                                              Item      Item
number   number   name             Item     Description                 Quantity  Price
10001    123456   John Smith  •••  SAGX730  Pioneer Remote A/V Rec      1         569.95
10001    123456   John Smith  •••  AT10     Cerwin Vega Loudspeakers    1         359.95
10001    123456   John Smith  •••  CDPC725  Sony Disc Jockey CD         1         399.95
10001    123456   John Smith  •••  S/H      Shipping                    1         100.00
10001    123456   John Smith  •••  Tax      Sales Tax                   1         103.06
10. From 1NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code,
Item, Item_descrip, Item_qty, Item_price)
Functional dependencies and determinants
Example: Item_descrip is functionally dependent on Item,
so Item is the determinant of Item_descrip.
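A functional dependency can be tested against sample data. The sketch below is a hedged illustration: the `holds` helper and the second invoice (10002) are hypothetical. A sample can only refute a dependency; whether it really holds is a property of the enterprise's rules, not of the rows that happen to be stored.

```python
def holds(rows, determinant, dependent):
    """True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"Invoice_number": 10001, "Item": "SAGX730",
     "Item_descrip": "Pioneer Remote A/V Receiver", "Item_qty": 1},
    {"Invoice_number": 10001, "Item": "AT10",
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 1},
    {"Invoice_number": 10002, "Item": "AT10",  # hypothetical second invoice
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 2},
]

fd_item_descrip = holds(rows, ["Item"], ["Item_descrip"])          # Item -> Item_descrip
fd_item_qty = holds(rows, ["Item"], ["Item_qty"])                  # Item -> Item_qty
fd_key_qty = holds(rows, ["Invoice_number", "Item"], ["Item_qty"])
print(fd_item_descrip, fd_item_qty, fd_key_qty)
```

In this sample Item determines Item_descrip but not Item_qty; the quantity needs the combination of Invoice_number and Item.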
11. From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code)
(Item, Item_descrip, Item_qty, Item_price)
Is this unique by itself?
What happens if the item is purchased more than once?
12. From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code)
Partial dependency
(Invoice_number, Item, Item_descrip, Item_qty, Item_price)
Composite key (forms a unique combination)
13. From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account,
Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
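The split into the last two relations can be sketched as two projections of the 1NF rows. The `project` helper and invoice 10002 are assumptions made for illustration; the invoice header relation is left out to keep the example short.

```python
rows_1nf = [
    {"Invoice_number": 10001, "Item": "SAGX730",
     "Item_descrip": "Pioneer Remote A/V Receiver", "Item_qty": 1, "Item_price": 569.95},
    {"Invoice_number": 10001, "Item": "AT10",
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 1, "Item_price": 359.95},
    {"Invoice_number": 10002, "Item": "AT10",  # hypothetical second invoice
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 2, "Item_price": 359.95},
]

def project(rows, attrs):
    """Relational projection: keep the listed attributes, drop duplicate tuples."""
    result = []
    for row in rows:
        tup = {a: row[a] for a in attrs}
        if tup not in result:
            result.append(tup)
    return result

# (Invoice_number, Item, Item_qty, Item_price): depends on the whole composite key.
invoice_items = project(rows_1nf, ["Invoice_number", "Item", "Item_qty", "Item_price"])
# (Item, Item_descrip): depends on Item alone, so it moves to its own relation.
items = project(rows_1nf, ["Item", "Item_descrip"])

# Joining the two relations on Item gives back the original rows.
descrip = {i["Item"]: i["Item_descrip"] for i in items}
rejoined = [{**ii, "Item_descrip": descrip[ii["Item"]]} for ii in invoice_items]
print(len(invoice_items), len(items), rejoined == rows_1nf)
```

The description of AT10 is now stored once, and the join shows that no information was lost by the split.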
14. From 2NF to 3NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account,
Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
Which attributes are dependent on others?
Is there a problem?
15. Transitive Dependencies and Anomalies
• Insertion anomalies
  • To add a new row, all customer data (name, address, city, state, zip code, phone) and product data (description) must be consistent with previous entries
• Deletion anomalies
  • By deleting a row, a customer or product may cease to exist
• Modification anomalies
  • To modify a customer’s or product’s data in one row, the same modification must be carried out in every other row that repeats that data
16. Insertion and Modification Anomalies
For example…
Insert a new Panasonic product:
Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       Panasonic
CT-32S35      PAN
PV-4250       Panasonic
Inconsistency
Change all Panasonic products’ manufacturer name to “Panasonic USA”:
Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       PanaSonic
PV-4250       Pana Sonic
CT-32S35      PAN
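The modification anomaly can be reproduced with the four rows above. This is a minimal sketch: the exact-match update and the manufacturer code `"PAN"` used as a key are assumptions for illustration.

```python
# Denormalized rows: the manufacturer name is retyped in every product row.
flat = [
    {"Product_code": "DVD-A110", "Manufacturer_name": "Panasonic"},
    {"Product_code": "PV-4210", "Manufacturer_name": "PanaSonic"},
    {"Product_code": "PV-4250", "Manufacturer_name": "Pana Sonic"},
    {"Product_code": "CT-32S35", "Manufacturer_name": "PAN"},
]

# An update that matches the exact spelling "Panasonic" reaches only one row.
matched = 0
for row in flat:
    if row["Manufacturer_name"] == "Panasonic":
        row["Manufacturer_name"] = "Panasonic USA"
        matched += 1

# Normalized alternative: products hold a manufacturer code (hypothetical "PAN"),
# and the name is stored once in a separate Manufacturers relation.
manufacturers = {"PAN": "Panasonic"}
products = {code: "PAN" for code in ["DVD-A110", "PV-4210", "PV-4250", "CT-32S35"]}

manufacturers["PAN"] = "Panasonic USA"  # one change, seen by every product
names = {code: manufacturers[m] for code, m in products.items()}
print(matched, names)
```

With the name stored once, there is nothing to misspell on insertion and only one value to change on modification.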
17. Deletion Anomaly
For Example…
4377182  John Smith  •••  Sacramento  CA  95831
4398711  Arnold S    •••  Davis       CA  95691
4578461  Gray Davis  •••  Sacramento  CA  95831
4873179  Lisa Carr   •••  Reno        NV  89557
By deleting customer Arnold S, we would also be deleting
Davis, California.
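The same effect can be shown on the four customer rows. The sketch assumes the rows are held as plain tuples, and the `known_places` helper is hypothetical.

```python
customers = [
    (4377182, "John Smith", "Sacramento", "CA", "95831"),
    (4398711, "Arnold S", "Davis", "CA", "95691"),
    (4578461, "Gray Davis", "Sacramento", "CA", "95831"),
    (4873179, "Lisa Carr", "Reno", "NV", "89557"),
]

def known_places(rows):
    """City, state and zip code combinations recorded anywhere in the rows."""
    return {(city, state, zip_code) for _, _, city, state, zip_code in rows}

before = known_places(customers)
remaining = [row for row in customers if row[1] != "Arnold S"]
lost = before - known_places(remaining)
print(lost)
```

Sacramento survives the deletion of either of its customers because it is stored twice; Davis is stored only in the deleted row, so the fact that 95691 is Davis, CA disappears with the customer.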
18. Transitive Dependencies
• A condition where A, B, C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Attributes, grouped by determinant:
Invoice_number: Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code
Item: Item_descrip
Invoice_number+Item: Item_qty, Item_price
19. Why Should City and State Be
Separated from Customer Relation?
• City and state are dependent on zip code for their values and not the customer’s identifier (i.e., key).
  Zip_code → City, State
• Otherwise,
  Cust_account → Cust_addr,
  Zip_code → City, State
21. 3NF
Invoice Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation
(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation
(Zip_code, City, State)
Invoice_items Relation
(Invoice_number, Item, Item_qty, Item_price)
Items Relation
(Item, Item_descrip)
Manufacturers Relation
(Manuf_code, Manuf_name)
Since the Items relation contains the manufacturer’s name in the description, a separate Manufacturers relation can be created.
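As a rough check that the 3NF design still answers the earlier question of how a program would recreate the invoice, the relations can be loaded into an in-memory SQLite database and joined. The column types, the ISO date strings and the lower-case table names are assumptions; the Manufacturers relation is omitted because the slides do not say how it links to Items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zip_code (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
CREATE TABLE customer (Cust_account TEXT PRIMARY KEY, Cust_name TEXT, Cust_addr TEXT,
                       Zip_code TEXT REFERENCES zip_code(Zip_code));
CREATE TABLE invoice (Invoice_number INTEGER PRIMARY KEY, Invoice_date TEXT,
                      Date_delivered TEXT,
                      Cust_account TEXT REFERENCES customer(Cust_account));
CREATE TABLE items (Item TEXT PRIMARY KEY, Item_descrip TEXT);
CREATE TABLE invoice_items (Invoice_number INTEGER REFERENCES invoice(Invoice_number),
                            Item TEXT REFERENCES items(Item),
                            Item_qty INTEGER, Item_price REAL,
                            PRIMARY KEY (Invoice_number, Item));

INSERT INTO zip_code VALUES ('95819', 'Sacramento', 'CA');
INSERT INTO customer VALUES ('123456', 'John Smith', '2036-26 Street', '95819');
INSERT INTO invoice VALUES (10001, '1999-06-15', '1999-06-18', '123456');
INSERT INTO items VALUES ('SAGX730', 'Pioneer Remote A/V Receiver');
INSERT INTO items VALUES ('AT10', 'Cerwin Vega Loudspeakers');
INSERT INTO items VALUES ('CDPC725', 'Sony Disc-Jockey CD Changer');
INSERT INTO invoice_items VALUES (10001, 'SAGX730', 1, 569.95);
INSERT INTO invoice_items VALUES (10001, 'AT10', 1, 359.95);
INSERT INTO invoice_items VALUES (10001, 'CDPC725', 1, 399.95);
""")

# Recreate the invoice lines by joining the five relations on their keys.
lines = conn.execute("""
    SELECT i.Invoice_number, c.Cust_name, z.City, ii.Item, it.Item_descrip,
           ii.Item_qty, ii.Item_price
    FROM invoice i
    JOIN customer c ON c.Cust_account = i.Cust_account
    JOIN zip_code z ON z.Zip_code = c.Zip_code
    JOIN invoice_items ii ON ii.Invoice_number = i.Invoice_number
    JOIN items it ON it.Item = ii.Item
    WHERE i.Invoice_number = ?
    ORDER BY ii.Item
""", (10001,)).fetchall()

subtotal = round(sum(qty * price for *_, qty, price in lines), 2)
for line in lines:
    print(line)
print("Subtotal:", subtotal)
```

Each fact (customer name, city for a zip code, item description) is stored in one row only, yet a single query reassembles the invoice.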
23. First to Third Normal Form (1NF - 3NF)
• 1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-value attributes and create a flat file)
• 2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies)
• 3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies)
24. Codd's Rules
E. F. Codd presented these rules as a basis for determining whether a DBMS could be classified as relational.
25. Codd's Rules
• Codd's Rules can be divided into 5 functional areas:
  • Foundation Rules
  • Structural Rules
  • Integrity Rules
  • Data Manipulation Rules
  • Data Independence Rules
26. Foundation Rules
• Rule 0
  • Any system claimed to be an RDBMS must be able to manage databases entirely through its relational capabilities.
  • All data definition & manipulation must be able to be done through relational operations.
27. Foundation Rules
• Rule 12 - Nonsubversion Rule
  • If an RDBMS has a low-level (record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules & constraints expressed in the higher-level relational language.
  • All database access must be controlled through the DBMS so that the integrity of the database cannot be compromised without the knowledge of the user or the DBA.
  • This does not prohibit use of record-at-a-time languages, e.g. PL/SQL
28. Codd's Rules
• Structural Rules (Rules 1 & 6)
  • The fundamental structural construct is the table.
  • Codd states that an RDBMS must support tables, domains, primary & foreign keys.
  • Each table should have a primary key.
29. Structural Rules
• Rule 1
  • All information in an RDB is represented explicitly at the logical level in exactly one way: by values in a table.
  • ALL information, even the metadata held in the system catalogue, MUST be stored as relations (tables) & manipulated in the same way as data.
30. Structural Rules
• Rule 6 - View Updating
  • All views that are theoretically updatable are updatable by the system.
  • Not really implemented yet by any available system.
31. Codd's Rules
• Integrity Rules (Rules 3 & 10)
  • Integrity should be maintained by the DBMS, not the application.
• Rule 3 - Systematic treatment of null values
  • Null values are supported for representation of 'missing' & inapplicable information in a systematic way & independent of data type.
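Rule 3 can be illustrated with SQLite through Python's `sqlite3` module. The sketch assumes a hypothetical `invoice` table with a `Package_count` column that is not part of the invoice example; it shows the same NULL marker being used for a text column and an integer column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE invoice (
    Invoice_number INTEGER PRIMARY KEY,
    Date_delivered TEXT,       -- NULL until the order has been delivered
    Package_count INTEGER      -- hypothetical column, NULL when unknown
)""")
conn.execute("INSERT INTO invoice VALUES (10001, '1999-06-18', 3)")
conn.execute("INSERT INTO invoice VALUES (10002, NULL, NULL)")

# NULL is tested with IS NULL, whatever the column's data type.
undelivered = conn.execute(
    "SELECT Invoice_number FROM invoice WHERE Date_delivered IS NULL").fetchall()
uncounted = conn.execute(
    "SELECT Invoice_number FROM invoice WHERE Package_count IS NULL").fetchall()

# NULL is not a value: comparing with '=' never evaluates to true.
equals_null = conn.execute(
    "SELECT Invoice_number FROM invoice WHERE Date_delivered = NULL").fetchall()
print(undelivered, uncounted, equals_null)
```

The application does not need a made-up placeholder such as an empty string or zero to mean "not known yet".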
32. Integrity Rules
• Rule 10 - Integrity independence
  • Integrity constraints specific to a particular RDB MUST be definable in the relational data sublanguage & storable in the DB, NOT the application program.
  • This gives the advantage of centralised control & enforcement
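A hedged sketch of Rule 10, using SQLite from Python: the constraints are declared once in the table definitions and the database rejects bad rows, whichever program sends them. The `try_insert` helper and the specific CHECK conditions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE items (Item TEXT PRIMARY KEY, Item_descrip TEXT NOT NULL);
CREATE TABLE invoice_items (
    Invoice_number INTEGER NOT NULL,
    Item TEXT NOT NULL REFERENCES items(Item),
    Item_qty INTEGER NOT NULL CHECK (Item_qty > 0),
    Item_price REAL NOT NULL CHECK (Item_price >= 0),
    PRIMARY KEY (Invoice_number, Item)
);
INSERT INTO items VALUES ('AT10', 'Cerwin Vega Loudspeakers');
""")

def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO invoice_items VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert((10001, "AT10", 1, 359.95)),   # valid row
    try_insert((10001, "AT10", 1, 359.95)),   # duplicate primary key
    try_insert((10001, "NOPE", 1, 10.00)),    # item does not exist (foreign key)
    try_insert((10002, "AT10", 0, 359.95)),   # quantity fails the CHECK
]
for result in results:
    print(result)
```

No validation code lives in the application; the rules travel with the database.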
33. Codd's Rules
• Data Manipulation Rules (Rules 2, 4, 5 & 7)
  • User should be able to manipulate the 'Logical View' of the data with no need for knowledge of how it is physically stored or accessed.
• Rule 2 - Guaranteed Access
  • Each & every datum in an RDB is guaranteed to be logically accessible by a combination of table name, primary key value & column name.
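Rule 2 can be sketched as a lookup that needs nothing except a table name, a primary key value and a column name. The `datum` helper is hypothetical, and because identifiers cannot be passed as bound parameters it should only ever be given trusted table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (Cust_account TEXT PRIMARY KEY, Cust_name TEXT, Zip_code TEXT)")
conn.execute("INSERT INTO customer VALUES ('123456', 'John Smith', '95819')")

def datum(table, pk_column, pk_value, column):
    """Fetch one value addressed by table name, primary key value and column name."""
    # Identifiers are quoted into the statement; only the key value is bound.
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{pk_column}" = ?'
    row = conn.execute(sql, (pk_value,)).fetchone()
    return None if row is None else row[0]

name = datum("customer", "Cust_account", "123456", "Cust_name")
zip_code = datum("customer", "Cust_account", "123456", "Zip_code")
missing = datum("customer", "Cust_account", "999999", "Cust_name")
print(name, zip_code, missing)
```

No pointer, record number or file position is involved in reaching the value.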
34. Data Manipulation Rules
• Rule 4 - Dynamic on-line Catalog based on relational model
  • The DB description (metadata) is represented at logical level in the same way as ordinary data, so that the same relational language can be used to interrogate the metadata as regular data.
  • System & other data stored & manipulated in the same way.
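SQLite gives a small illustration of the query side of Rule 4: its catalog is exposed as a table named `sqlite_master` that is read with an ordinary SELECT. The table and view created here are hypothetical, and SQLite does not let the catalog be changed with ordinary INSERT or UPDATE statements, so this is only a partial example of the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Item TEXT PRIMARY KEY, Item_descrip TEXT)")
conn.execute("CREATE VIEW item_codes AS SELECT Item FROM items")

# The metadata is queried with the same language as the data itself.
catalog = conn.execute("""
    SELECT type, name
    FROM sqlite_master
    WHERE name NOT LIKE 'sqlite_%'   -- hide SQLite's internal indexes
    ORDER BY name
""").fetchall()
print(catalog)
```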
35. Data Manipulation Rules
• Rule 5 - Comprehensive Data Sublanguage
  • An RDBMS may support many languages & modes of use, but there must be at least ONE language whose statements can express ALL of the following:
    • Data definition
    • View definition
    • Data manipulation (interactive & via program)
    • Integrity constraints
    • Authorization
    • Transaction boundaries (begin, commit & rollback)
  • 1992: the ISO standard for SQL provides all these functions
36. Data Manipulation Rules
• Rule 7 - High-level insert, update & delete
  • Capability of handling a base table or view as a single operand applies not only to data retrieval but also to insert, update & delete operations.
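A minimal sketch of Rule 7 with SQLite: one UPDATE statement treats the whole table as its operand and changes every qualifying row, with no record-at-a-time loop in the application. The `products` table and its rows are assumptions based on the earlier Panasonic example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (Product_code TEXT PRIMARY KEY, Manufacturer_name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [
    ("DVD-A110", "Panasonic"),
    ("PV-4210", "Panasonic"),
    ("PV-4250", "Panasonic"),
    ("SAGX730", "Pioneer"),
])

# A single set-level statement; the DBMS decides how to visit the rows.
cursor = conn.execute(
    "UPDATE products SET Manufacturer_name = ? WHERE Manufacturer_name = ?",
    ("Panasonic USA", "Panasonic"))
changed = cursor.rowcount

names = conn.execute(
    "SELECT DISTINCT Manufacturer_name FROM products ORDER BY Manufacturer_name").fetchall()
print(changed, names)
```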
37. Codd's Rules
• Data Independence Rules (Rules 8, 9 & 11)
  • These rules protect users & application developers from having to change the applications following any low-level reorganisation of the DB.
38. Data Independence Rules
• Rule 8 - Physical Data Independence
  • Application programs & terminal activities remain logically unimpaired whenever any changes are made either to the storage organisation or access methods.
• Rule 9 - Logical Data Independence
  • Application programs & terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
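Physical data independence (Rule 8) can be sketched by changing the access method underneath an unchanged query. The index name `idx_customer_zip` is hypothetical; the rows are the customers from the deletion anomaly example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (Cust_account INTEGER PRIMARY KEY, Cust_name TEXT, Zip_code TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (4377182, "John Smith", "95831"),
    (4398711, "Arnold S", "95691"),
    (4578461, "Gray Davis", "95831"),
    (4873179, "Lisa Carr", "89557"),
])

# The application's query text never changes.
query = "SELECT Cust_name FROM customer WHERE Zip_code = ? ORDER BY Cust_name"

before = conn.execute(query, ("95831",)).fetchall()
# Change the access method: add an index on the column used by the query.
conn.execute("CREATE INDEX idx_customer_zip ON customer (Zip_code)")
after = conn.execute(query, ("95831",)).fetchall()
print(before == after, after)
```

Only the way the DBMS finds the rows changes; the query and its result stay the same.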
39. Data Independence Rules
• Rule 11 - Distribution Independence
  • The data manipulation sublanguage of an RDBMS must enable application programs & queries to remain logically unchanged whether & whenever data is physically centralised or distributed.
40. Data Independence Rules
• Rule 11 - Distribution Independence
  • This means that an application program that accesses the DBMS on a single computer should also work, without modification, even if the data is moved from one computer to another in a network environment.
  • The user should 'see' one centralised DB whether data is located on one or more computers.
41. Data Independence Rules
• Rule 11 - Distribution Independence
  • This rule does not say that to be fully relational the DBMS must support distributed DBs, but that if it does, the query must remain the same.
42. Summary
• Codd's Rules can be divided into 5 functional areas:
  • Foundation Rules
  • Structural Rules
  • Integrity Rules
  • Data Manipulation Rules
  • Data Independence Rules