4. Questions
• Who works with databases?
• Who has had to design a database?
• Who has no idea what normalization is but has heard about it?
• ERD?
• Anyone heard of denormalization?
14. Example Data
parents              student Name    emergency contact  student age  classroom  teacher    school year  grade level
John and Mary Smith  John Smith Jr.  John Smith         10           C110       Ms. Brown  2010         6
                                     Mary Smith         9            C80        Mr Green   2009         5
                     April Smith     John Smith         10           C110       Ms. Brown  2010         6
                                     Mary Smith         9            A25        Mr Baker   2009         5
Dave Harris          Julie Harris    Dave Harris        6            A10        Mr Jones   2010         3
19. Example Data
parents      student Name    emergency contact  student age  classroom  teacher    school year  grade level
John Smith   John Smith Jr.  John Smith         10           C110       Ms. Brown  2010         6
Mary Smith   John Smith Jr.  John Smith         10           C110       Ms. Brown  2010         6
John Smith   John Smith Jr.  Mary Smith         9            C80        Mr Green   2009         5
Mary Smith   John Smith Jr.  Mary Smith         9            C80        Mr Green   2009         5
John Smith   April Smith     John Smith         10           C110       Ms. Brown  2010         6
Mary Smith   April Smith     John Smith         10           C110       Ms. Brown  2010         6
John Smith   April Smith     Mary Smith         9            A25        Mr Baker   2009         5
Mary Smith   April Smith     Mary Smith         9            A25        Mr Baker   2009         5
Dave Harris  Julie Harris    Dave Harris        6            A10        Mr Jones   2010         3
44. Example Data
Student
studentName    studentAge
John Smith Jr  10
April Smith    10
Julie Harris   6

StuClass
studentName    class_id
John Smith Jr  1
John Smith Jr  2
April Smith    1
April Smith    3
Julie Harris   4

Class
class_id  classroom  teacher    schoolYear  gradeLevel
1         C110       Ms. Brown  2010        6
2         C80        Mr. Green  2009        5
3         A25        Mr. Baker  2009        5
4         A10        Mr. Jones  2010        3
45. Example Data
Student
studentName    studentAge
John Smith Jr  10
April Smith    10
Julie Harris   6

StuContact
studentName    contactName
John Smith Jr  John Smith
John Smith Jr  Mary Smith
April Smith    John Smith
April Smith    Mary Smith
Julie Harris   Dave Harris

Contact
contactName  parent  contactPhone  contactEmail   contactAddress
John Smith   Y       123-555-9876  john@blah.com  1 Main St
Mary Smith   Y       123-555-2947  mary@blah.com  1 Main St
Dave Harris  Y       123-555-3456  dave@work.cm   5 Baker Rd
Yeah, yeah - Oracle, I know.
3 years with MySQL (the company) and about 5-6 years as a developer.
OK, before we go any further I want to note a few things about this talk. When I first submitted it, I thought it would be quite an easy one to discuss. But as I dug deeper into the material (including my old textbooks and what is on the web), I found there is a lot of “discussion/argument” about it, depending on which relational theorist you prefer.

I was originally taught these rules according to the theorist Codd. However, there have been advancements to Codd’s theory - most notably by Date - that many now consider to be just as valid. Depending on the theorist you prefer to follow, some of the information I give will be either valid or invalid.

I am not here to argue over which theorist is more “valid” but rather to try and give you, the beginner, a handhold on the concepts of normalization. That is my primary objective.

On the other hand, I want to make sure that you, the beginner, understand that there is a depth of technical information that I will not be covering, so as not to overwhelm you with technical details, terminology and arguments (atomicity, super keys, candidate keys, transitive dependencies, and partial dependencies, for example). If anyone would like to discuss these things in more detail after the session, I would be more than happy to do it in the hallway track.

Finally, I would love to hear feedback on this talk. It has been much harder than I thought to balance the complexity of the theory with the need not to overwhelm a beginner. I would like to know how I did and what anyone thinks can be done to improve the talk.
To put it simply - it is the process of organizing your data. This includes deciding what tables to create, what attributes/columns each table has, how you inter-relate those tables, and what data you put in each table.
- forces you to break down the data to its smallest form
- small data saved in only one place == minimal space == increased speed (generally speaking)
- because you are only saving the data in one place, you are less likely to “miss” data when you insert, update or delete it
So what the heck is that - I hear you say!

This is an elementary school table (grades 1-6) and shows some of the basic information we are going to have to handle. As we go, we may choose to expand on some of the information we are going to keep.

FYI - the software is Workbench. It draws the ERD (Entity-Relationship Diagram) for us.
Our spreadsheet:
John Jr and April are twins of John and Mary Smith.
They have attended this school for 2 years.
They have had the same teacher this year.

We have John and Mary Smith, who have twin children in the school - John Jr and April. John Jr and April have been at the school for 2 years and are both in Ms Brown’s class this year.

Dave Harris has recently moved into the area and has only 1 child - Julie - in the school this year.
So where are the “repeating groups”?
1) We have John *and* Mary Smith as parents for both John Jr and April Smith. We will need to isolate each parent to each child.
2) I also see that John Jr and April have both been in the school for 2 years. So we will want to isolate each child to each year in school.

Basically, what we are working toward is making sure that each intersection between a row and a column contains only a single “atomic” value.
When we break everything out, this is what we get. Notice that there is an entry for both parents, each child and each emergency contact.

DO NOT SHOW THIS SLIDE!
So we make sure each row has the (repeating) values.
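
The "break everything out" step can be sketched in a few lines of Python. This is only an illustration, not the method used to build the slides: the values come from the example spreadsheet, while the dictionary layout and the `flatten` helper are made up for this sketch.

```python
from itertools import product

# One spreadsheet-style record. The list-valued fields are the "repeating groups".
# Values are taken from the example spreadsheet; the field names are hypothetical.
record = {
    "student": "John Smith Jr.",
    "parents": ["John Smith", "Mary Smith"],
    # (emergency contact, age, classroom, teacher, school year, grade level)
    "years": [
        ("John Smith", 10, "C110", "Ms. Brown", 2010, 6),
        ("Mary Smith", 9, "C80", "Mr Green", 2009, 5),
    ],
}


def flatten(rec):
    """Return one row per (school year, parent) pair so every cell holds one value."""
    return [
        (parent, rec["student"], *year)
        for year, parent in product(rec["years"], rec["parents"])
    ]


rows = flatten(record)
for row in rows:
    print(row)
```

Two parents times two school years gives four rows for John Jr, each with exactly one value per column.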
This can be a single column value, or a combination of columns. Modern web systems now tend to use MySQL’s auto_increment ability to create a unique integer value for each row. But the primary key *DOES NOT* have to be this.

There are a number of reasons why most systems now use auto_incs, from space being cheap (which historically was not always true) to the speed and ease of manipulating an integer. But do not lock yourself into the thought that it must be one.

A primary key is a column or set of columns that can uniquely identify any row. In this example the combination of the student name, grade level, school year and teacher will uniquely identify each row. (It can handle repeated years, skipping a grade in the same year, changing teachers, or the same teacher in multiple years.)
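
A composite primary key like the one described above can be tried out with a small sketch. It uses Python’s built-in sqlite3 module as a stand-in for MySQL, and the `Enrollment` table and its column names are assumptions made for this example rather than the table on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the primary key is the combination of four columns.
conn.execute("""
    CREATE TABLE Enrollment (
        studentName TEXT    NOT NULL,
        gradeLevel  INTEGER NOT NULL,
        schoolYear  INTEGER NOT NULL,
        teacher     TEXT    NOT NULL,
        classroom   TEXT,
        PRIMARY KEY (studentName, gradeLevel, schoolYear, teacher)
    )
""")
insert = "INSERT INTO Enrollment VALUES (?, ?, ?, ?, ?)"
row = ("John Smith Jr.", 6, 2010, "Ms. Brown", "C110")
conn.execute(insert, row)

# Same student, different year and teacher: the key differs, so this is accepted.
conn.execute(insert, ("John Smith Jr.", 5, 2009, "Mr Green", "C80"))

# Exact repeat of all four key columns: the database refuses it.
try:
    conn.execute(insert, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Enrollment").fetchone()[0]
print(duplicate_rejected, row_count)
```

No auto-increment column is involved here; the four columns together do the identifying.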
This technically is in 1NF.

However, depending upon how the term “repeating groups” is interpreted (and by whom), some could argue that there are additional changes we must make. For example, if you take the term “repeating groups” and think of it in terms of atomic data, we would have to break down the parents listing into individual names for each row. So John Jr and April would each then have 6 rows associated with them. I personally prefer it this way, but for brevity’s sake, I chose not to try and make a table that displays 12 rows and 13 columns for you to see.

We could also drill down further and require the separation of first and last names into individual columns... and it goes on and on.

I told you it could drive you mad! :D
So looking at our data - can we uniquely identify each row? I think so. I personally like the combination of student name with teacher, classroom, year and grade. This would allow a student to skip grades within the same school year and change classrooms or teachers without breaking the uniqueness.

Now that we have first normal form, we can see what it takes to get to second.

Second normal form builds off first normal form. In order to be in second normal form we must first meet all the requirements of 1NF. Once we have that, we will need to isolate subsets of data.

This sounds hard, but let’s do a few examples so you can see what we mean.
Ok - so what related subsets of data do we have?
First thing I see is the grades.

Ok - so what repeating subsets of data do we have? Well, we know that John Jr and April are in the same class now. So that repeats. Let’s pull that out into its own table.
So we break this out into its own table - Grades. Note that I have also added a grades_id column as the primary key for this table. I did this because I still have to maintain 1NF, which requires a primary key. I could hypothetically make the primary key out of a combination of all 4 quarters’ grades, but this time around I chose to use an artificial primary key rather than a natural key made of a composite of multiple columns. This is mostly for convenience’s sake, as you will see later.
OK, now that we have removed the grades information - what other repeating subsets do we see? Hmm - John Jr and April are in the same class, so that is repeated. Let’s break that out.
I am creating an artificial primary key for the Class table.
Ok - so is there anything else that repeats? Yep - we have the parent and emergency contact information.

The next thing I can think of is that John Jr and April share the same parents. So that repeats again. We can then pull that out.
Now, the way I chose to handle this is to make a Contact table. In it we will place the emergency contact information along with the parent information, since it is not uncommon for them to be the same.

I will also include a parent flag column. This will tell me if the name is a parent or a separate emergency contact (Ex: a single parent who has the grandparent as the emergency contact).

If this were a real-world situation, we would also want to hold contact information like phone numbers, address, maybe even email. Later on you will see that I add this information to the table.
What is left over we can rename to the Student table.
Now that we have isolated related data, it is time to create the relations between the data.

So we have these tables, and now we have to “reconnect” them to each other. This puts the relations back in.
The easiest relationship I see is the Student to the Emergency Contact.

We have to keep in mind how our relationships work. We are working under the understanding that each student can have many contacts (Ex: John and Mary Smith) and that each contact can be for 1 or more students (Ex: John Jr and April).

Because this is a many-students-to-many-contacts relationship, we make a pivot/associative table so we can link an individual student to an individual contact.
The StuContact table now links the 2 tables. We can now find each contact for an individual student (we can search on John Jr and find Mary Smith and John Smith) and each student associated with a specific contact (we can search on Mary Smith and find John Jr and April).

Now, could I make artificial primary keys for the Student and the Contact tables so the pivot table is only working with integers? Sure. This is now a common practice on the web to help indexes stay small and fast. However, it should be noted that when you do this you are potentially taking up a lot of extra space to hold that unrelated value.

FYI - this is called crow’s foot notation. The crow’s foot symbolizes a “many” relationship. The other side stands for a singular relationship. So just by looking at the diagram we know that Student is a 1-N relationship to StuContact, and StuContact is an N-1 relationship with Contact.
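
The two lookups described above can be sketched as follows. The rows are the ones from the example Student, Contact and StuContact tables; Python’s built-in sqlite3 module stands in for MySQL, and the column types, the foreign key declarations and the two helper functions are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE Student (studentName TEXT PRIMARY KEY, studentAge INTEGER);
    CREATE TABLE Contact (contactName TEXT PRIMARY KEY, parent TEXT);
    -- Pivot table: one row per (student, contact) pair.
    CREATE TABLE StuContact (
        studentName TEXT NOT NULL REFERENCES Student(studentName),
        contactName TEXT NOT NULL REFERENCES Contact(contactName),
        PRIMARY KEY (studentName, contactName)
    );
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [("John Smith Jr", 10), ("April Smith", 10), ("Julie Harris", 6)])
conn.executemany("INSERT INTO Contact VALUES (?, ?)",
                 [("John Smith", "Y"), ("Mary Smith", "Y"), ("Dave Harris", "Y")])
conn.executemany("INSERT INTO StuContact VALUES (?, ?)", [
    ("John Smith Jr", "John Smith"), ("John Smith Jr", "Mary Smith"),
    ("April Smith", "John Smith"), ("April Smith", "Mary Smith"),
    ("Julie Harris", "Dave Harris"),
])


def contacts_for(student):
    """All contacts linked to one student (hypothetical helper)."""
    sql = "SELECT contactName FROM StuContact WHERE studentName = ? ORDER BY contactName"
    return [name for (name,) in conn.execute(sql, (student,))]


def students_for(contact):
    """All students linked to one contact (hypothetical helper)."""
    sql = "SELECT studentName FROM StuContact WHERE contactName = ? ORDER BY studentName"
    return [name for (name,) in conn.execute(sql, (contact,))]


print(contacts_for("John Smith Jr"))
print(students_for("Mary Smith"))
```

The composite primary key on StuContact also keeps the same student/contact pair from being linked twice.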
Please note here that I have changed the primary key for the Class table. Originally I was using the classroom, teacher, school year combination to uniquely identify each row. And that was fine. But now I will be connecting the Class table to the Student table. So rather than have all that data repeatedly listed, I chose to add an artificial primary key.
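
Here is a rough sketch of that switch, using Python’s built-in sqlite3 module as a stand-in for MySQL. In SQLite an `INTEGER PRIMARY KEY` column is filled in automatically when omitted, which plays the role of auto_increment here. Keeping a UNIQUE constraint on the old natural key is an extra choice made in this sketch, not something the slides show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Class (
        class_id   INTEGER PRIMARY KEY,          -- artificial key, assigned by SQLite
        classroom  TEXT    NOT NULL,
        teacher    TEXT    NOT NULL,
        schoolYear INTEGER NOT NULL,
        gradeLevel INTEGER NOT NULL,
        UNIQUE (classroom, teacher, schoolYear)  -- the original natural key (assumption)
    )
""")
classes = [
    ("C110", "Ms. Brown", 2010, 6),
    ("C80", "Mr. Green", 2009, 5),
    ("A25", "Mr. Baker", 2009, 5),
    ("A10", "Mr. Jones", 2010, 3),
]
# class_id is left out of the INSERT, so the database numbers the rows itself.
conn.executemany(
    "INSERT INTO Class (classroom, teacher, schoolYear, gradeLevel) VALUES (?, ?, ?, ?)",
    classes,
)
ids = [r[0] for r in conn.execute("SELECT class_id FROM Class ORDER BY class_id")]
print(ids)
```

Other tables can now point at a class with a single small integer instead of repeating classroom, teacher and school year.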
Like the student and contact listing, the student and class listing is also many-to-many (many students are in a class, and a student can be in many classes over time).

So again we make a pivot/associative table.
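
As a sketch of how the pivot table “reconnects” the data, the example StuClass and Class rows can be joined back together. Python’s built-in sqlite3 module stands in for MySQL; the column types and the `class_history` and `roster` helpers are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Class (
        class_id INTEGER PRIMARY KEY, classroom TEXT, teacher TEXT,
        schoolYear INTEGER, gradeLevel INTEGER
    );
    CREATE TABLE StuClass (
        studentName TEXT    NOT NULL,
        class_id    INTEGER NOT NULL REFERENCES Class(class_id),
        PRIMARY KEY (studentName, class_id)
    );
""")
conn.executemany("INSERT INTO Class VALUES (?, ?, ?, ?, ?)", [
    (1, "C110", "Ms. Brown", 2010, 6), (2, "C80", "Mr. Green", 2009, 5),
    (3, "A25", "Mr. Baker", 2009, 5), (4, "A10", "Mr. Jones", 2010, 3),
])
conn.executemany("INSERT INTO StuClass VALUES (?, ?)", [
    ("John Smith Jr", 1), ("John Smith Jr", 2),
    ("April Smith", 1), ("April Smith", 3), ("Julie Harris", 4),
])


def class_history(student):
    """Every class a student has been in, oldest first (hypothetical helper)."""
    sql = """
        SELECT c.schoolYear, c.gradeLevel, c.classroom, c.teacher
        FROM StuClass AS sc
        JOIN Class AS c ON c.class_id = sc.class_id
        WHERE sc.studentName = ?
        ORDER BY c.schoolYear
    """
    return conn.execute(sql, (student,)).fetchall()


def roster(class_id):
    """Every student in one class (hypothetical helper)."""
    sql = "SELECT studentName FROM StuClass WHERE class_id = ? ORDER BY studentName"
    return [name for (name,) in conn.execute(sql, (class_id,))]


print(class_history("April Smith"))
print(roster(1))
```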
I chose to bring the Grades table in by also linking it to the student and the class. So if we have the name of the student and the class, we can then find the grades.

This is what all the tables and relationships look like so far.

So looking at any individual table - is there any data in a table that is not, or could not be, linked to the primary key?

Nope, there is nothing to improve here. So let’s change things up a little bit.
What about now?

Does this help any?

So what is here that is not dependent upon the primary key? The teacherSalary.

Since it is not dependent upon the primary key, to place this table into 3NF we will need to extract it (and any related data) out into its own table.
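
The extraction can be sketched in plain Python. The first four rows follow the example Class table, but the fifth row and every salary figure are invented purely for illustration, and the `extract_teachers` helper is hypothetical. The sketch assumes the salary belongs to the teacher, not to any one class.

```python
# (class_id, classroom, teacher, schoolYear, gradeLevel, teacherSalary)
# Rows 1-4 follow the Class table; row 5 and all salary figures are made up.
class_rows = [
    (1, "C110", "Ms. Brown", 2010, 6, 50000),
    (2, "C80", "Mr. Green", 2009, 5, 48000),
    (3, "A25", "Mr. Baker", 2009, 5, 47000),
    (4, "A10", "Mr. Jones", 2010, 3, 45000),
    (5, "C110", "Ms. Brown", 2009, 5, 50000),  # same teacher, salary stored twice
]


def extract_teachers(rows):
    """Split the wide rows into Class rows (no salary) and a Teacher lookup."""
    classes, teachers = [], {}
    for class_id, classroom, teacher, year, grade, salary in rows:
        # setdefault stores the first salary seen; a later mismatch is an anomaly.
        if teachers.setdefault(teacher, salary) != salary:
            raise ValueError(f"conflicting salaries stored for {teacher}")
        classes.append((class_id, classroom, teacher, year, grade))
    return classes, teachers


classes, teachers = extract_teachers(class_rows)
print(teachers)
```

After the split each salary is stored once, so a raise for Ms. Brown is a single update instead of one per class she teaches.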
All the tables in 3NF.

Let’s start with the Grades table.

Is there any data here that is not dependent on the primary key? Hmm - the final grade is a derived value (the sum of the 4 quarters’ values divided by 4). So technically it depends on the other columns rather than on the primary key. So we remove it.
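
One way to still get a final grade without storing it is to compute it when reading. This is a sketch under assumptions: Python’s built-in sqlite3 module stands in for MySQL, grades_id comes from the talk, but the q1-q4 column names, the view, and the grade values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Only the four quarter grades are stored (q1..q4 are hypothetical names).
    CREATE TABLE Grades (
        grades_id INTEGER PRIMARY KEY,
        q1 REAL, q2 REAL, q3 REAL, q4 REAL
    );
    -- The final grade is derived on read: sum of the 4 quarters divided by 4.
    CREATE VIEW GradesWithFinal AS
        SELECT grades_id, q1, q2, q3, q4,
               (q1 + q2 + q3 + q4) / 4.0 AS finalGrade
        FROM Grades;
""")
conn.execute("INSERT INTO Grades (q1, q2, q3, q4) VALUES (80, 90, 70, 100)")


def final_grade(grades_id):
    sql = "SELECT finalGrade FROM GradesWithFinal WHERE grades_id = ?"
    return conn.execute(sql, (grades_id,)).fetchone()[0]


before = final_grade(1)
# Correct one quarter; there is no stored final grade that could go stale.
conn.execute("UPDATE Grades SET q4 = 60 WHERE grades_id = 1")
after = final_grade(1)
print(before, after)
```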
BCNF, 4NF, etc.

But generally speaking, 3NF is as high as most applications need to go. For applications, I personally prefer to start at 3NF and then adjust as needed for my requirements.
Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data.
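
As a rough sketch of “adding redundant data”, the example StuClass and Class rows can be pre-joined into one wide table. Python’s built-in sqlite3 module stands in for MySQL, and the `StuClassFlat` table is a hypothetical name chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Class (
        class_id INTEGER PRIMARY KEY, classroom TEXT, teacher TEXT,
        schoolYear INTEGER, gradeLevel INTEGER
    );
    CREATE TABLE StuClass (studentName TEXT, class_id INTEGER);
""")
conn.executemany("INSERT INTO Class VALUES (?, ?, ?, ?, ?)", [
    (1, "C110", "Ms. Brown", 2010, 6), (2, "C80", "Mr. Green", 2009, 5),
    (3, "A25", "Mr. Baker", 2009, 5), (4, "A10", "Mr. Jones", 2010, 3),
])
conn.executemany("INSERT INTO StuClass VALUES (?, ?)", [
    ("John Smith Jr", 1), ("John Smith Jr", 2),
    ("April Smith", 1), ("April Smith", 3), ("Julie Harris", 4),
])

# Denormalized copy: the class details are repeated on every student row,
# so reads need no join (hypothetical table name).
conn.execute("""
    CREATE TABLE StuClassFlat AS
    SELECT sc.studentName, c.classroom, c.teacher, c.schoolYear, c.gradeLevel
    FROM StuClass AS sc
    JOIN Class AS c ON c.class_id = sc.class_id
""")

flat = conn.execute(
    "SELECT * FROM StuClassFlat WHERE studentName = ?", ("Julie Harris",)
).fetchall()
teacher_copies = conn.execute(
    "SELECT COUNT(*) FROM StuClassFlat WHERE teacher = ?", ("Ms. Brown",)
).fetchone()[0]
print(flat, teacher_copies)
```

The trade-off is visible in `teacher_copies`: Ms. Brown is now stored once per student in her class, so any change to a class has to be applied to every copy.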