1. Subtypes vs. Roles
A question posed recently concerned subtypes in the logical data model: what they are, why we use
them and what benefits they bring. Using subtypes to categorize data has always been a challenging
area for the novice data modeler, especially when it comes to the distinction between exclusive and
inclusive subtypes. An exclusive subtype is defined as follows: for each occurrence in the supertype
there can be one and only one subtype occurrence. Example:
    Supertype: Party
    Subtypes:  Person, Organization
The supertype Party may be subtyped as a Person (an individual) or an Organization (a group of
people). These subtypes are mutually exclusive: a Party can be categorized as one or the other, but
never both.
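One way to picture the exclusive rule is as a class hierarchy in which every Party is created as exactly one subtype. The following is a minimal sketch only; the class names mirror the example above, while the attribute names (party_id, first_name, legal_name and so on) and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Party:
    """Supertype: attributes shared by every party (illustrative names)."""
    party_id: int


@dataclass
class Person(Party):
    """Exclusive subtype: an individual."""
    first_name: str
    last_name: str


@dataclass
class Organization(Party):
    """Exclusive subtype: a group of people."""
    legal_name: str


def party_kind(party: Party) -> str:
    """Return the single subtype that a party belongs to."""
    if isinstance(party, Person):
        return "Person"
    if isinstance(party, Organization):
        return "Organization"
    # A bare Party with no subtype breaks the "one and only one" rule.
    raise TypeError("a Party must be exactly one of Person or Organization")


# Hypothetical sample data.
parties = [Person(1, "Pat", "Smith"), Organization(2, "Example Supplies")]
kinds = [party_kind(p) for p in parties]
```

Because an object has a single class, a party here cannot be both a Person and an Organization, which is the behavior the exclusive subtype describes.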
The inclusive subtype is defined as follows: for each occurrence in the supertype there may be one or
many subtype occurrences. Example:
    Supertype: Person
    Subtypes:  Customer, Employee, Vendor
The supertype Person may then be categorized as both a Customer and an Employee. For that matter,
a Person could be subtyped as all three or as any combination of them.
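The "any combination" rule can be sketched with a set of flags, where one Person carries one or more subtype memberships at the same time. This is an illustration under assumed names (PersonSubtype and the variable pat are hypothetical), not a recommended design.

```python
from enum import Flag, auto
from itertools import combinations


class PersonSubtype(Flag):
    """Inclusive subtypes of Person; a person may hold several at once."""
    CUSTOMER = auto()
    EMPLOYEE = auto()
    VENDOR = auto()


# A hypothetical person who is both a Customer and an Employee.
pat = PersonSubtype.CUSTOMER | PersonSubtype.EMPLOYEE

is_customer = PersonSubtype.CUSTOMER in pat
is_vendor = PersonSubtype.VENDOR in pat

# Enumerate every non-empty combination of the three subtypes.
members = [PersonSubtype.CUSTOMER, PersonSubtype.EMPLOYEE, PersonSubtype.VENDOR]
combos = [c for size in range(1, len(members) + 1)
          for c in combinations(members, size)]
```

With three inclusive subtypes there are already seven possible combinations for a single Person, and each added subtype doubles the number of states the model has to account for.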
In reality, the exclusive subtype is the form of subtyping most commonly used. Inclusive subtypes have
been used less and less over time for the following reasons:
1. Data modelers have focused more on the semantics (the study of meaning) of data objects
such as entities in the data model.
2. Inclusive subtypes can create complexity and ambiguity in the data model.
3. There are inherent difficulties in defining relationships between subtype entities, as well as
relationships between subtype entities that are on different levels of a supertype-subtype
hierarchy.
Data modeling is more than creating structures for databases. A logical data model must also
capture clear definitions of data objects, the relationships between data objects and the related
business rules. On reflection, identifying Customer as a subtype of Person does not accurately
capture the semantic difference between Person and Customer. A Person is a unique human being, a
single physical object. A Customer is a role that a Person may play. In life and in business, people
often play multiple roles: a Person may play the role of a Customer for one or more businesses while
being an Employee of one or more businesses, and that same Person may also be a Vendor providing
goods and/or services to a Customer.
There is, then, a clearly defined difference between entities in a supertype-subtype hierarchy and
entities that represent roles.
- Entities in a supertype-subtype hierarchy represent things.
- Entities that are role entities represent the roles/purposes that things serve.
So why does this matter? It’s just semantics, right?
Actually, unclear semantics can carry a heavy price. How many systems have we seen that have a
Customer table, an Employee table and a Vendor table such as the ones below?
Supertype: Person
Subtypes:  Customer, Employee, Vendor

Customer
    Customer Number: INTEGER
    Customer Name: TEXT(45)
    Customer Street Address: VARCHAR(20)
    Customer City: VARCHAR(30)
    Customer State Abbreviation: CHAR(2)
    Customer: NUMBER(5)

Employee
    Employee Number: NUMBER(5)
    Employee First Name: VARCHAR(25)
    Employee Middle Initial: CHAR(1)
    Employee Last Name: VARCHAR(20)
    Employee Address: VARCHAR(45)

Vendor
    Vendor Number: DECIMAL(7)
    Vendor Name: VARCHAR(20)
    Vendor Personal First Name: VARCHAR(15)
    Vendor Personal Middle Name: VARCHAR(15)
    Vendor Personal Last Name: VARCHAR(15)
    Vendor Street Address: VARCHAR(35)
    Vendor City State Address: VARCHAR(45)
    Vendor Zip Code: CHAR(10)
The challenges abound! Inclusive subtyping tends to lead to denormalization and data duplication.
As always, duplicated data creates challenges of synchronizing data and maintaining standardized
formats and data. The non-standardized names and addresses create problems with sharing and
comparing data. Also, how do we determine which Employees are also Vendors to the company? Harte
Hanks, Trillium … many companies have made significant profits through offering address cleansing,
name cleansing and rationalization (de-duplication) services just to identify the same person in
these different tables and/or to determine the correct and most current address for a Customer who
is also a Vendor.
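To make the matching problem concrete, here is a rough sketch of what "which Employees are also Vendors?" looks like when the only link between the two tables is a person's name. The rows, the dictionary keys and the name_key helper are all hypothetical, and the normalization is deliberately crude; real name and address cleansing is far more involved.

```python
def name_key(first, last):
    """Crude normalization: trim whitespace and ignore case."""
    return (first.strip().lower(), last.strip().lower())


# Hypothetical rows from separate Employee and Vendor tables.
employees = [
    {"employee_number": 101, "first_name": "Pat", "last_name": "Smith"},
    {"employee_number": 102, "first_name": "Lee", "last_name": "Jones"},
]
vendors = [
    {"vendor_number": 9001, "personal_first_name": "PAT ", "personal_last_name": "smith"},
    {"vendor_number": 9002, "personal_first_name": "Leigh", "personal_last_name": "Jones"},
]

# Index vendors by normalized personal name.
vendor_by_name = {}
for v in vendors:
    key = name_key(v["personal_first_name"], v["personal_last_name"])
    vendor_by_name[key] = v["vendor_number"]

# Pair each employee with a vendor row that has the same normalized name.
matches = []
for e in employees:
    key = name_key(e["first_name"], e["last_name"])
    if key in vendor_by_name:
        matches.append((e["employee_number"], vendor_by_name[key]))
```

In this sample only employee 101 is paired with a vendor. If "Lee Jones" and "Leigh Jones" are in fact the same person, the match is silently missed, and two different people who share a name would be wrongly merged. Nothing in the tables themselves can settle either case.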
But wait, there is more!
Can’t a Vendor be a Person or a Company (Organization)? Can’t a Customer be a Person or a Company?
What does this look like?
[Diagram: Party and its subtypes Person and Organization, together with Employee, Customer,
Regulator and Vendor]
What does this do to relationships?
We have not added all of the relationships, but we already begin to see challenges 2 and 3 of the
inclusive subtypes listed earlier. Multi-subtype relationships, ambiguity and a lack of clarity abound!
[Diagram: the same entities (Party, Person, Organization, Employee, Customer, Regulator, Vendor)
with some of the relationships between them added]
Clearly, if you are a believer in the KISS (Keep It Simple, Stupid) principle in data modeling, this
will not be a satisfactory outcome. Several other questions arise:
1. How flexible would this model be?
2. How extensible would this model be?
3. How easily could the programs developed against the model be maintained?
Let’s look at a role-based model.
[Diagram: role-based model]
    Party (subtypes: Person, Organization)
    Roles: Customer Role, Employee Role, Vendor Role, Regulator Role
    Address (subtypes: Physical Address, Phone Address, EMail Address)
This is a much simpler model: it is more semantically correct, less complex and clearer. It is, of
course, only a starting point. Addresses definitely have roles as well; an associative entity
between Address and Party, named Party Address Role, would cover this.
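As a rough illustration of how the role-based model might be carried into tables, the sketch below builds a small in-memory SQLite database. Everything in it is an assumption made for the example: the table and column names (party, party_role, address, party_address_role), the CHECK lists, the address role names and the sample rows. It is one possible physical rendering, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE party (
    party_id   INTEGER PRIMARY KEY,
    -- Exclusive subtype: each party is exactly one of these.
    party_type TEXT NOT NULL CHECK (party_type IN ('Person', 'Organization')),
    party_name TEXT NOT NULL
);
CREATE TABLE party_role (
    -- A party may play any number of roles, one row per role.
    party_id  INTEGER NOT NULL REFERENCES party(party_id),
    role_name TEXT NOT NULL
        CHECK (role_name IN ('Customer', 'Employee', 'Vendor', 'Regulator')),
    PRIMARY KEY (party_id, role_name)
);
CREATE TABLE address (
    address_id   INTEGER PRIMARY KEY,
    address_type TEXT NOT NULL CHECK (address_type IN ('Physical', 'Phone', 'EMail')),
    address_text TEXT NOT NULL
);
CREATE TABLE party_address_role (
    -- Associative entity: which address a party uses, and for what purpose.
    party_id     INTEGER NOT NULL REFERENCES party(party_id),
    address_id   INTEGER NOT NULL REFERENCES address(address_id),
    address_role TEXT NOT NULL,
    PRIMARY KEY (party_id, address_id, address_role)
);
""")

# Hypothetical sample data: one person and one organization.
conn.executemany("INSERT INTO party VALUES (?, ?, ?)", [
    (1, "Person", "Pat Smith"),
    (2, "Organization", "Example Supplies"),
])
conn.executemany("INSERT INTO party_role VALUES (?, ?)", [
    (1, "Employee"), (1, "Vendor"), (2, "Vendor"), (2, "Customer"),
])
conn.execute("INSERT INTO address VALUES (10, 'Physical', '1 Example Street')")
conn.executemany("INSERT INTO party_address_role VALUES (?, ?, ?)", [
    (1, 10, "Home"), (1, 10, "Remittance"),
])

# Which parties are both an Employee and a Vendor? A plain join on the key.
both = conn.execute("""
    SELECT p.party_name
    FROM party p
    JOIN party_role e ON e.party_id = p.party_id AND e.role_name = 'Employee'
    JOIN party_role v ON v.party_id = p.party_id AND v.role_name = 'Vendor'
    ORDER BY p.party_name
""").fetchall()
```

Here the question of which Employees are also Vendors is answered by joining on party_id rather than by comparing names, and the person's address is stored once however many roles use it.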
Semantics play an important role in developing successful supertype-subtype hierarchies. The
exclusive subtype helps maintain clarity and simplicity in the model. Inclusive subtypes are often
indicative of roles, which are better expressed in the model using a role-based approach.
Hopefully, these musings will spark some thoughts.