JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
Relational Model and Relational Algebra - Lecture 3 - Introduction to Databases (1007156ANR)
1. 2 December 2005
Introduction to Databases
Relational Model and Relational Algebra
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
2. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 2March 3, 2017
Relational Model
Theory for data management developed
by Edgar F. Codd while working at IBM
Edgar F. Codd, A Relational Model of Data for
Large Shared Data Banks, Communications
of the ACM 13(6), June 1970
data independence
- between logical and physical level
set-based query language
- relational query language
normalisation
- avoid redundancy
IBM first did not implement the relational model in order
to "protect" their IMS/DB revenues
Edgar F. Codd
3. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 3March 3, 2017
Relational Model …
IBM's System R (1974) was a DBMS prototype
implementing Codd's relational model
first implementation of the Structured English Query
Language (SEQUEL)
- later renamed to Structured Query Language (SQL)
The System R prototype finally led to the development of
different commercial DBMSs including IBM's DB2,
Oracle or Microsoft SQL Server
4. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 4March 3, 2017
Relational Database
A relational database consists of a number of tables
each table row defines a relationship between a set of values
There is an analogy between the concept of a table
(collection of relationships) and the mathematical
concept of a relation
in the following we therefore talk about relations instead of tables
a relational database consists of a collection of relations
customerID name street postcode city
1 Max Frisch Bahnhofstrasse 7 8001 Zurich
2 Eddy Merckx Pleinlaan 25 1050 Brussels
... ... ... ... ...
5. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 5March 3, 2017
Relational Database ...
Information is normally partitioned into different relations
since the storage in a single relation would lead to
a replication of information (redundancy)
a large number of necessary null values
While tables are used at the logical level, different
storage structures can be used at the physical level
data independence
6. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 6March 3, 2017
Relation
The column headers of a table are called attributes and
for each attribute ai there is a set of permitted values
called the domain Di of ai
Given the domains D1, D2,..., Dn, a relation r is defined as
a subset of the cartesian product D1D2...Dn
name street ... city
Max Frisch Bahnhofstrasse 7 ... Zurich
... ... ... ...
D1 D2 Dn
t1
tm
a1 a2 an
attributes
tuples
relation r
degree
cardinality
7. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 7March 3, 2017
Relation ...
A relation r is a set of n-ary tuples ti = (a1, a2,..., an) where
each aiDi
the number of attributes is called a relation's degree
the number of tuples is called a relation's cardinality
Since a relation is a set of tuples, the order of the tuples
is irrelevant
each tuple is distinctive (no duplicate tuples)
The order of attributes is irrelevant
The domain of each attribute has to be atomic
all members of the domain have to be indivisible units
a relation with only atomic values is normalised (first normal form)
8. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 8March 3, 2017
Relation ...
Multiple attributes can have the same domain Di
we will see later how the domain types can be defined (in SQL)
A tuple variable is a variable that stands for a tuple
its domain is defined by the set of all tuples
The special null value is part of any domain Di
used to represent an unknown or non-existing value
null values are not easy to handle (not the same as empty string)
- avoid null values whenever possible
9. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 9March 3, 2017
Example of a Relation
Let us assume that we have the following attributes with
their domains
name = {Merckx, Frisch, Botta, ...}
street = {Bahnhofstrasse, Pleinlaan, Via Nassa, ...}
city = {Zurich, Brussels, Lugano, Paris, ...}
Then
r = {(Merckx, Pleinlaan, Brussels),
(Frisch, Bahnhofstrasse, Zurich),
(Botta, Via Nassa, Lugano),
(Botta, Bahnhofstrasse, Lugano)}
forms a relation over namestreetcity
10. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 10March 3, 2017
Relational Database Example
customerID name street postcode city
1 Max Frisch Bahnhofstrasse 7 8001 Zurich
2 Eddy Merckx Pleinlaan 25 1050 Brussels
5 Claude Debussy 12 Rue Louise 75008 Paris
53 Albert Einstein Bergstrasse 18 8037 Zurich
8 Max Frisch ETH Zentrum 8092 Zurich
cdID name duration price year
1 Falling into Place 2007 17.90 2007
2 Carcassonne 3156 15.50 1993
customer
cd
11. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 11March 3, 2017
Relational Database Example ...
orderID customerID cdID date amount status
1 53 2 13.02.2010 2 open
2 2 1 15.02.2010 1 delivered
order
supplierID name city
5 Max Frisch Zurich
2 Mario Botta Lugano
supplier
Customer (customerID, name, street, postcode, city)
CD (cdID, name, duration, price, year)
Order (orderId, customerID, cdID, date, amount, status)
Supplier (supplierID, name, city)
relational database schema
12. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 12March 3, 2017
Database Schema
The logical design of a database is defined by the
database schema
A database instance is a snapshot of the data stored in a
database at a given time
A relation schema R = (A1, A2,..., An) is defined by a list of
attributes
by convention, the name of a relation schema starts with an
uppercase letter (in contrast to relations)
- e.g. Customer (customerID, name, street, postcode, city)
a relation can be defined based on the relation schema
- e.g. customer = (Customer)
a relation instance contains the relation's actual values (tuples)
13. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 13March 3, 2017
Keys
For the keys we use the same terminology as
introduced earlier for the ER model
K R is a superkey of R if the values of K uniquely
identify any tuple of possible relation instances r(R)
t1 and t2 r and t1 t2 t1[K] t2[K]
e.g. {cdID} and {cdID, name} are both superkeys of CD
K is a candidate key if K is minimal
e.g. {cdID} is a candidate key of CD
The DB designer has to choose one of the candidate
keys for each relation schema as primary key
if possible, the value of a primary key should not change or only in
very rare cases
14. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 14March 3, 2017
Foreign Keys
A relation schema may have one or multiple attributes
that correspond to the primary key of another relation
schema and are called foreign keys
e.g. customerID is a foreign key of the order relation schema
note that a foreign key of the referencing relation can only contain
values that occur in the primary key of the referenced relation or it
must be null referential integrity
customerID
name
street
postcode
city
cdID
name
duration
price
year
orderID
customerID
cdID
date
amount
status
customer cd
order
schema diagram
15. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 15March 3, 2017
A query language is a language that is used to access
(read) information stored in a database as well as to
create, update and delete information (CRUD)
There are different types of query languages
procedural
- e.g. relational algebra
declarative
- e.g. Structured Query Language (SQL)
The relational algebra as well as the tuple and domain
relational calculus form the basis for "higher level" query
languages (e.g. SQL)
Query Language
16. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 16March 3, 2017
Relational Algebra
The relational algebra consists of six fundamental
operations
unary operations
selection: s
projection: p
rename: r
binary operations
union:
set difference: -
cartesian product:
An operator takes one (unary) or two (binary) relations
as input and returns a new single relation as output
17. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 17March 3, 2017
Relational Algebra ...
Based on the six fundamental operations, we can define
additional relational operations (add no additional power)
set intersection:
natural join: ⋈
theta join: q
equijoin: q
semijoin: ⋉
division:
assignment:
Extended operations with additional expressiveness
generalised projection
aggregate function
outer join
18. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 18March 3, 2017
Selection (s)
Goal: Select specific tuples (rows) from a relation (table)
sp(r) = {t | t r and p(t)}
p is a selection predicate
- consists of terms pi connected by an and (), or () or not ()
a term pi has the form attributem operator attributen or constant
- the available operators are: =, , >, <, ,
Examples
"Find all tuples in the customer relation that are in the city of
Zurich and have a postcode greater than 8010."
scity = "Zurich" postcode > 8010 (customer)
customerID name street postcode city
53 Albert Einstein Bergstrasse 18 8037 Zurich
8 Max Frisch ETH Zentrum 8092 Zurich
19. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 19March 3, 2017
Selection (s) ...
Examples ...
A selection predicate may contain a comparison between two
attributes
"Find all tuples in the cd relation with a value for the duration
attribute that is equal to the year of release."
sduration = year (cd)
Attention: The selection operation in relational algebra
has a different meaning than the SELECT statement used
in SQL
SELECT in SQL corresponds to a projection in relational algebra
cdID name duration price year
1 Falling into Place 2007 17.90 2007
20. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 20March 3, 2017
Projection (p)
Goal: Select specific attributes (columns) from a relation
pA1, A2,..., Am
(r)
returns a relation instance that only contains the columns for
which an attribute Ai has been listed
duplicate tuples are removed from the resulting relation (set)
Example
"Return the name and city of all tuples in the customer relation."
pname,city (customer)
name city
Max Frisch Zurich
Eddy Merckx Brussels
Claude Debussy Paris
Albert Einstein Zurich
21. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 21March 3, 2017
Composition of Relational Operations
Since the result of a relational operation is a new single
relation instance, multiple operations can be combined
Example
"Find the names of all customers who live in Zurich."
pname (scity = "Zurich" (customer))
name
Max Frisch
Albert Einstein
22. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 22March 3, 2017
Union ()
Goal: Unify tuples from two relations
r s = {t | t r or t s}
r and s must have the same degree (same number of attributes)
the corresponding attribute domains must have a compatible type
Example
"Find the names of persons who are either customers or suppliers."
pname (customer) pname (supplier)
name
Max Frisch
Eddy Merckx
Claude Debussy
Albert Einstein
Mario Botta
23. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 23March 3, 2017
Set Difference (-)
Goal: Find tuples that are in one relation but not in
another relation
r - s = {t | t r and t s}
r and s must have the same degree
the corresponding attribute domains must have a compatible type
finds tuples that are in a relation r but not in another relation s
Example
"Find the names of suppliers who are no customers."
pname (supplier) - pname (customer)
name
Mario Botta
24. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 24March 3, 2017
Cartesian Product ()
Goal: Combine information from any two relations
r s = {tu | t r and u s}
the attribute names of r(R) and s(S) have to be distinct
- attributes with the same name have to be renamed via the rename operator r
Example
pname (customer) pcity (customer)
name city
Max Frisch Zurich
Max Frisch Brussels
Max Frisch Paris
Eddy Merckx Zurich
Eddy Merckx Brussels
... ...
25. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 25March 3, 2017
Cartesian Product () ...
Example ...
"List the names of all customers with at least one order."
pname( scustomer.customerID = order.customerID (customer order) )
Note that we will later see another operator (natural join)
for the combination of two tables based on common
attributes
name
Eddy Merckx
Albert Einstein
26. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 26March 3, 2017
Rename (r)
Goal: Rename a relation or an attribute
rx(E)
renames the result of expression E to x
rx(A1,A2,..., An)
(E)
renames the result of expression E to x and renames the
attributes to A1,A2,...,An
Example
rperson(name,location) (pname,city (customer))
name location
Max Frisch Zurich
... ...
person
27. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 27March 3, 2017
Rename (r) ...
Example
"Find the price of the most expensive CD in the cd relation."
Pprice (cd) - Pcd.price(scd.price < d.price (cd rd (cd)))
We will later see another operator (aggregate function)
that can be used to simplify this type of query for a max
value
price
17.90
28. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 28March 3, 2017
Formal Definition
A basic relational algebra expression is either a relation
from the database or a constant relation
The set of all relational algebra expressions is defined by
E1 E2
E1 - E2
E1 E2
sp(E1)
pA1,A2,..., Am
(E1)
rx(A1,A2,..., An)
(E1)
29. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 29March 3, 2017
Set Intersection
r s = {t | t r and t s}
r and s must have the same degree
the corresponding attribute domains must have a compatible type
Note that the set intersection can be implemented via a
pair of set difference operations
r s = r - (r - s)
Example
"Find the names of people who are customers and suppliers."
pname (customer) pname (supplier)
name
Max Frisch
30. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 30March 3, 2017
Natural Join (⋈)
r ⋈ s = pR S (sr.A1= s.A1 r.A2= s.A2 ... r.An= s.An
(r s))
where R S = {A1, A2,..., An}
Keep all tuples of the cartesian product r s that have the
same value for the shared attributes of r(R) and s(S)
the natural join is an associative operation: (r ⋈ s) ⋈ t = r ⋈ (s ⋈ t)
Example
"List the name and street of customers whose order is still open."
pname, street(sstatus="open"(order ⋈ customer))
note that we could do the selection first (more efficient)
- we will review this later when discussing query optimisation
name street
Albert Einstein Bergstrasse 18
31. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 31March 3, 2017
Theta Join (q) and Equijoin
r ⋈p s = sp (r s)
where the predicate p is of the form r.Ai q s.Aj and q is a
comparison operator (=, , >, <, , )
Example
customer ⋈postcode duration (rcd(cdID,cdName,duration,price,year) (cd))
An equijoin (r ⋈p s) is a special form of a theta join where
only the equality operator (=) may be used
note that a natural join is a equijoin over all common attributes of
a relation r and s
customerID name ... cdID cdName ...
2 Eddy Merckx ... 1 Falling into Place ...
2 Eddy Merckx ... 2 Carcassonne ...
32. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 32March 3, 2017
Semijoin (⋉)
r ⋉ s = pA1, A2,..., An
(r ⋈ s)
where A1, A2,..., An are all the attributes of r
Example
"List the customers whose order is still open."
customer ⋉ sstatus="open" (order)
customerID Name Street Postcode city
53 Albert Einstein Bergstrasse 18 8037 Zurich
33. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 33March 3, 2017
Division ()
r s = {t | t pR-S(r) and " u s tu r}
for the relations r(R) and s(S) and S R
suited for queries that include the phrase "for all"
Example
r s
A B C D
a 3 a 1
a 1 a 3
a 3 b 3
b 1 a 1
c 4 b 3
b 2 b 2
c 4 a 1
r
C D
b 3
a 1
A B
a 3
c 4
s
r s
34. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 34March 3, 2017
Assigment
variable E
Works like an assigment in programming languages
Assigments must always be made to temporaray relation
variables
no database modification
Example
temp1 pname, street(sstatus="open"(order ⋈ customer))
temp2 pname,street (scity = "Brussels" (customer))
result temp1 temp2
name street
Eddy Merckx Pleinlaan 25
Albert Einstein Bergstrasse 18
35. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 35March 3, 2017
Generalised Projection (p)
pF1, F2,..., Fm
(E)
generalisation of the project operation that supports the projection
to arithmetic expressions F1, F2,..., Fm
Example
"Show a list of the names of all CDs together with a reduced
price (80% of the original price)."
pname, price 0.8 as reducedPrice (cd)
name reducedPrice
Falling into Place 14.32
Carcassonne 12.72
36. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 36March 3, 2017
Aggregate Function
G1, G2,..., Gm
G F1(A1), F2(A2),..., Fn(An) (E)
takes a collection of values as input and returns a single value
based on the following operations
- min: minimum value
- max: maximum value
- sum: sum of values
- count: number of value
- avg: average value
G1, G2,..., Gm lists the attributes on which to group
- can be empty
Fi are the aggregate functions
by default, duplicates are not elimiated before aggregating
- can use the keyword distict to eliminate duplicates (e.g. sum-distinct)
37. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 37March 3, 2017
Aggregate Function ...
Examples
"List the average amount of items in an order."
G avg(amount)(order)
"List the number of customers for each city."
cityG count(customerID) as customerNo (customer)
avg(amount)
1.5
city customerNo
Zurich 3
Brussels 1
Paris 1
38. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 38March 3, 2017
Outer Joins
The left outer join (⟕), right outer join (⟖) and full outer
join (⟗) are extensions of the natural join operation
Computes the natural join and then adds the tuples from
one relation that do not match the other relation
filled up with null values
Example
customer ⟕ order
customerID name ... orderID cdID ...
53 Albert Einstein ... 1 2 ...
2 Eddy Merckx ... 2 1 ...
5 Claude Debussy ... null null ...
... ... ... ... ... ...
39. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 39March 3, 2017
Outer Joins ...
left outer join (⟕)
keeps all tuples of the left relation
non-matching parts filled up with null values
right outer join (⟖)
keeps all tuples of the right relation
non-matching parts filled up with null values
full outer join (⟗)
keeps all tuples of both relations
non-matching parts filled up with null values
40. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 40March 3, 2017
Null Values
A null value means that the value is unknown or
nonexistent
Primary key attributes can never be null (entity integrity)
The result of an arithmetic operation that involves a null
value is null
For grouping and duplicate elimination null values are
treated like other values (two null values are the same)
Null values are ignored in aggregate functions
41. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 41March 3, 2017
Database Modifications
A database may be modified by using one of the
following operations
insert
update
delete
These operations are defined via the assigment operator
42. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 42March 3, 2017
Insert
r r E
To insert new data into a relation we can
explicitly specify the tuples to be inserted
write a query and insert the result tuple set
Example
cd cd {(3, "Chromatic", 3012, 16.50, 1994)}
cdID name duration price year
1 Falling into Place 2007 17.90 2007
2 Carcassonne 3156 15.50 1993
3 Chromatic 3012 16.50 1994
43. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 43March 3, 2017
Update
r pF1, F2,..., Fn
(r)
Update specific values in a tuple r
Fi is either the old value of r or a new value if the
attribute has to be updated
Example
"Increase the price of all CDs by 10%."
cd pcdID, name, duration, price 1.1, year (cd)
cdID name duration price year
1 Falling into Place 2007 19.69 2007
2 Carcassonne 3156 17.05 1993
3 Chromatic 3012 18.15 1994
44. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 44March 3, 2017
Delete
r r - E
A delete is expressed similar to a query except that the
result is removed from the database
Cannot remove single attributes
Example
"Remove the CD with the name 'Carcassonne' from the
database."
cd cd - sname = "Carcassonne" (cd)
cdID name duration price year
1 Falling into Place 2007 19.69 2007
3 Chromatic 3012 18.15 1994
45. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 45March 3, 2017
Homework
Study the following chapters of the
Database System Concepts book
chapter 2
- Relational Model
chapter 6
- section 6.1
- Relational Algebra
46. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 46March 3, 2017
Exercise 3
Relational model
Relational algebra
47. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 47March 3, 2017
References
A. Silberschatz, H. Korth and S. Sudarshan,
Database System Concepts (Sixth Edition),
McGraw-Hill, 2010
E. F. Codd, A Relational Model of Data for Large Shared
Data Banks, Communications of the ACM 13(6),
June 1970