2. Introduction
• Most commonly used data format next to
spreadsheets.
• Spreadsheets are relatively easy to use
• Research projects mostly claim that their data are
stored in a relational database.
• Understanding relational structure opens up
access to a lot of data
3. Relational databases - Data mining
• Exploration of data
• Prerequisite: data should be available in a
minable format - database
• Database = electronic document storing data
– Non-relational: 1 bulk system with unrelated
items (e.g. MS Excel files, text documents,
unrelated tables)
– Relational: all items (tables) are linked to each
other (see further)
4. Relational databases
Why use a database
• Relational database:
– All your data is stored in 1 file
• Easy to retrieve data
• Easy to backup
– Data and metadata stored together
• Data ...
• Metadata: data about the data (documentation)
– Many data files contain undocumented values:
– E.g. species A has an abundance of 17 (what does the value 17 mean?)
5. Relational databases
Why use a database
• In a well-designed relational database, all data
is stored only once:
– Example: species list typing errors
• Nudora thorakista
• Nudora thorrakista
• Nudora thorakhista
• Nudora thorakisa
– 1 species, but the species richness calculation gives 4
– Solution: 1 table with 1 record per species,
used as a reference
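The typing-error problem can be sketched in a few lines of Python with the standard sqlite3 module. This is only an illustration: the table and column names (species, observations, species_id) and the id value 1 are assumptions, not part of the original slides.

```python
import sqlite3

# Free-text entry: the same species typed four different ways.
typed_names = [
    "Nudora thorakista",
    "Nudora thorrakista",
    "Nudora thorakhista",
    "Nudora thorakisa",
]
# Counting distinct strings overestimates species richness.
richness_free_text = len(set(typed_names))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
)
conn.execute(
    "CREATE TABLE observations ("
    " id INTEGER PRIMARY KEY,"
    " species_id INTEGER NOT NULL REFERENCES species(id))"
)
# The name is typed exactly once, in the reference table (hypothetical id 1).
conn.execute("INSERT INTO species (id, name) VALUES (1, 'Nudora thorakista')")
# Each observation points at the reference record instead of retyping the name.
conn.executemany("INSERT INTO observations (species_id) VALUES (?)", [(1,)] * 4)

richness_reference = conn.execute(
    "SELECT COUNT(DISTINCT species_id) FROM observations"
).fetchone()[0]

print(richness_free_text, richness_reference)
```

With the reference table, four observations of the same species count as one species rather than four.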
6. Why use a database
• Data is much more rigid ...
– More difficult to make errors
– E.g. sorting in Excel
7. Relational databases
Principle - Exercise
• A practical example to understand ...
– Make a list of 15 people you know
– Make a list of all genders
– Make a list of character types and indicate for each
one whether it is nice or not
– Make a list of countries
• Start coupling all your lists
• You made a relational database
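One possible way to write the coupled lists down is sketched below with Python's sqlite3 module. All table names, column names and sample rows (Person A, cheerful, Kenya, ...) are invented for illustration; the exercise itself is meant to be done on paper with your own lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genders    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE characters (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         nice INTEGER NOT NULL);  -- 1 = nice, 0 = not nice
CREATE TABLE countries  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- The list of people refers to the other lists through their ids.
CREATE TABLE people (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    gender_id    INTEGER REFERENCES genders(id),
    character_id INTEGER REFERENCES characters(id),
    country_id   INTEGER REFERENCES countries(id)
);

INSERT INTO genders    VALUES (1, 'female'), (2, 'male');
INSERT INTO characters VALUES (1, 'cheerful', 1), (2, 'grumpy', 0);
INSERT INTO countries  VALUES (1, 'Belgium'), (2, 'Kenya');
INSERT INTO people     VALUES (1, 'Person A', 1, 1, 2),
                              (2, 'Person B', 2, 2, 1);
""")

# "Coupling the lists": follow each id back to the list it came from.
rows = conn.execute("""
    SELECT p.name, g.name, c.name, c.nice, k.name
    FROM people p
    JOIN genders g    ON g.id = p.gender_id
    JOIN characters c ON c.id = p.character_id
    JOIN countries k  ON k.id = p.country_id
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```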
12. Relational databases
Principles
• Think before you start ...
– Structure of a database is the key to a good
dataset
– Structure has to translate the whole concept
• One look at the structure (relational scheme) should
explain the database
13. Relational databases - components
• Tables
– Basic structures containing the data
– Structure of table important
– ID
• Relations
– Definition of how different tables are connected
and form a meaningful unit
• Queries
– Extractions of data from database
14. Table designs ...
• A table consists of a series of Columns ...
• Each record as such:
– Different fields
– Design of table must be done
before data is entered
– Each field: name, data type
– Each field can also be given a display format
[Figure: table layout with the labels Record, Column and Field]
15. Table designs ...
• Field types:
– Numeric – integer/double
– Text
– Date/Time
– Memo
– Autonumber ID
– Yes/No
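As a rough sketch, the field types above can be mirrored in a table definition. The example uses SQLite through Python's sqlite3 module as a stand-in, so the type names differ from those in MS Access; the table specimens and all its column names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE specimens (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,    -- Autonumber ID
        n_counted  INTEGER,                              -- Numeric, integer
        length_mm  REAL,                                 -- Numeric, double
        name       TEXT,                                 -- Text
        collected  TEXT,                                 -- Date/Time, kept as ISO 8601 text
        remarks    TEXT,                                 -- Memo (long free text)
        identified INTEGER CHECK (identified IN (0, 1))  -- Yes/No
    )
""")

# The design exists before any data is entered: list each field name and type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(specimens)")]

# The id is filled in automatically when a record is added.
first_id = conn.execute(
    "INSERT INTO specimens (name, identified) VALUES ('Nudora sp1', 1)"
).lastrowid
second_id = conn.execute(
    "INSERT INTO specimens (name, identified) VALUES ('Nudora sp2', 0)"
).lastrowid

print(columns)
print(first_id, second_id)
```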
16. Exercise on field types:
• 12
• 15 jan 1988
• hallo
• 12,456
• 12:56
• Azdazdazd azdda zda azdd dad zd dadazdzd
azdazddazdd azdazd azdazd dzdzdzzd ada zzd
azdaz dda azd da az d z azdzadazd a zd a azd
azd z dd da a z a z zd d ddaa zd
• 09:89
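One way to reason about the exercise is to try the strictest type first and fall back to text. The helper below, guess_field_type, is a hypothetical sketch and not an MS Access feature. It assumes that a comma is a decimal separator, that dates are written as day, English month abbreviation, year, and that text longer than 255 characters belongs in a Memo field.

```python
from datetime import date, time

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def guess_field_type(value: str, text_limit: int = 255) -> str:
    """Guess a field type for a raw string (hypothetical helper)."""
    v = value.strip()
    # Whole number -> Numeric (integer).
    try:
        int(v)
        return "Numeric (integer)"
    except ValueError:
        pass
    # Assumption: a comma is a decimal separator, so "12,456" reads as 12.456.
    try:
        float(v.replace(",", "."))
        return "Numeric (double)"
    except ValueError:
        pass
    # Date written as "day month-abbreviation year", e.g. "15 jan 1988".
    parts = v.split()
    if len(parts) == 3 and parts[1].lower() in MONTHS:
        try:
            date(int(parts[2]), MONTHS.index(parts[1].lower()) + 1, int(parts[0]))
            return "Date/Time"
        except ValueError:
            pass
    # Time written as "hh:mm"; time() rejects impossible values such as 89 minutes.
    if v.count(":") == 1:
        hours, minutes = v.split(":")
        try:
            time(int(hours), int(minutes))
            return "Date/Time"
        except ValueError:
            pass
    # Everything else is text; very long text goes into a Memo field.
    return "Text" if len(v) <= text_limit else "Memo"


for example in ["12", "15 jan 1988", "hallo", "12,456", "12:56", "09:89"]:
    print(example, "->", guess_field_type(example))
```

Under these assumptions 09:89 is not a valid time, so it can only be stored as text.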
17. Special field in a table: key
• A key = a unique identifier for a record
– Example: passport number:
• Number in a database which is unique and relates to all data
about you
– Each record in a table also gets a key
– This key is used to link tables to each other
– Example:
• Nudora sp1 – id: 123776
• Nudora sp2 – id: 34688
– Advantage: species name changes: linked taxa remain
linked
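The advantage of linking through a key can be sketched as follows with Python's sqlite3 module. The ids are the ones from the example above; the records table, its note column and the new name 'Nudora renamed' are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE records (
    id         INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(id),
    note       TEXT
);
INSERT INTO species VALUES (123776, 'Nudora sp1'), (34688, 'Nudora sp2');
INSERT INTO records (species_id, note)
    VALUES (123776, 'first find'), (123776, 'second find');
""")


def names_for_records(connection):
    """Look up the current species name for every record through the key."""
    return connection.execute(
        "SELECT r.note, s.name FROM records r "
        "JOIN species s ON s.id = r.species_id ORDER BY r.id"
    ).fetchall()


before = names_for_records(conn)
# The name changes in one place only; the key 123776 stays the same.
conn.execute("UPDATE species SET name = 'Nudora renamed' WHERE id = 123776")
after = names_for_records(conn)

print(before)
print(after)
```

No record had to be edited: both records still point at key 123776 and pick up the new name.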
18. Linking tables through IDs
• Storing numbers is the most efficient way to store
data:
• Nudora sp1 is found in the North Sea with a
density of 32
• Species 123776 is found in station 2 (North
Sea) with a density of 32
• Record in table density becomes:
123776 | 2 | 32
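A minimal sketch of how such a record is stored and then translated back into readable names, using Python's sqlite3 module. The table and column names (species, stations, density, species_id, station_id) are assumptions; the values are the ones from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE density (
    species_id INTEGER NOT NULL REFERENCES species(id),
    station_id INTEGER NOT NULL REFERENCES stations(id),
    density    INTEGER NOT NULL
);
INSERT INTO species  VALUES (123776, 'Nudora sp1');
INSERT INTO stations VALUES (2, 'North Sea');
INSERT INTO density  VALUES (123776, 2, 32);
""")

# What is actually stored: three numbers.
stored = conn.execute("SELECT * FROM density").fetchone()

# What a reader wants to see: the ids resolved through the linked tables.
readable = conn.execute("""
    SELECT s.name, st.name, d.density
    FROM density d
    JOIN species s   ON s.id = d.species_id
    JOIN stations st ON st.id = d.station_id
""").fetchone()

print(stored)
print(readable)
```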
19. Setting up relations between tables
• Relations: links between tables
• Tables are connected to each other in a rigid way
through certain fields
• Advantage: the database becomes a coherent whole
• Types of relations:
– 1 to many
– Many to many (= 2 times 1 to many)
20. Examples of relations
• Table places: field country (numeric)
• Table countries – list of countries,
each country has a unique id
• Relation is made between:
– Field country in places
– Field id in countries
• One to many relation: 1 record in table
countries linked to multiple records in places
• A country cannot be deleted while places still refer to it
[Figure: relation between the tables Places and Country]
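The one-to-many relation between countries and places can be sketched with Python's sqlite3 module. SQLite is used here as a stand-in for MS Access, and it only enforces relations once foreign keys are switched on for the connection. The place names and the name column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE places (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    country INTEGER NOT NULL REFERENCES countries(id)  -- numeric link field
);
INSERT INTO countries VALUES (1, 'Belgium');
INSERT INTO places (name, country) VALUES ('Ghent', 1), ('Ostend', 1);
""")

# One country record is linked to several place records.
places_in_country = conn.execute(
    "SELECT COUNT(*) FROM places WHERE country = 1"
).fetchone()[0]

# Deleting a country that is still in use is refused by the database.
try:
    conn.execute("DELETE FROM countries WHERE id = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

print(places_in_country, delete_blocked)
```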
21. Examples of relations
• Many to many
• Id of sample
• Id of species
• Table density: unique combination of sample,
species ...
[Figure: relation between the tables Species, Sample and Density]
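A many-to-many relation can be sketched as a linking table with a combined key, here with Python's sqlite3 module. The table layout is an assumption about how the density table could be built, and the density values other than 32 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Linking table: two 1-to-many relations make one many-to-many relation.
CREATE TABLE density (
    sample_id  INTEGER NOT NULL REFERENCES samples(id),
    species_id INTEGER NOT NULL REFERENCES species(id),
    density    INTEGER NOT NULL,
    PRIMARY KEY (sample_id, species_id)  -- each combination occurs only once
);

INSERT INTO samples VALUES (1, 'sample 1'), (2, 'sample 2');
INSERT INTO species VALUES (123776, 'Nudora sp1'), (34688, 'Nudora sp2');
INSERT INTO density VALUES (1, 123776, 32), (1, 34688, 5), (2, 123776, 11);
""")

# A second record for the same sample and species is rejected.
try:
    conn.execute("INSERT INTO density VALUES (1, 123776, 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# One sample holds many species ...
species_in_sample_1 = [row[0] for row in conn.execute(
    "SELECT s.name FROM density d JOIN species s ON s.id = d.species_id "
    "WHERE d.sample_id = 1 ORDER BY s.name"
)]
# ... and one species occurs in many samples.
samples_with_sp1 = [row[0] for row in conn.execute(
    "SELECT sample_id FROM density WHERE species_id = 123776 ORDER BY sample_id"
)]

print(duplicate_rejected, species_in_sample_1, samples_with_sp1)
```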
23. Queries
• All data in database:
– Next step: get it out again
– Selections on 1 table: by using filters
– Selections on multiple tables: using queries
– Queries can be saved and reused
– Queries can be the basis for new queries
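Saved queries that build on each other can be sketched with SQL views, here in SQLite through Python's sqlite3 module. Views are used as a stand-in for saved MS Access queries; the table names, view names and data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE density (
    species_id INTEGER REFERENCES species(id),
    station    INTEGER,
    density    INTEGER
);
INSERT INTO species VALUES (123776, 'Nudora sp1'), (34688, 'Nudora sp2');
INSERT INTO density VALUES (123776, 2, 32), (34688, 2, 4), (123776, 3, 10);

-- A saved query on multiple tables.
CREATE VIEW density_named AS
    SELECT s.name AS species, d.station, d.density
    FROM density d JOIN species s ON s.id = d.species_id;

-- A new query that uses the saved query as its basis.
CREATE VIEW total_per_species AS
    SELECT species, SUM(density) AS total
    FROM density_named
    GROUP BY species;
""")

# Selection on 1 table: a simple filter.
filtered = conn.execute(
    "SELECT station, density FROM density WHERE density > 5 ORDER BY station"
).fetchall()

# Reusing the saved queries.
totals = conn.execute(
    "SELECT species, total FROM total_per_species ORDER BY species"
).fetchall()

print(filtered)
print(totals)
```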
36. Exporting data
• From MS Access it is possible to export to
different formats
• Tables, queries, ...
• Exports can be used for further data mining:
– Making graphs in MS Excel
– Doing statistical analysis
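As an illustration of what an export amounts to, the sketch below writes a query result as CSV text, which spreadsheet and statistics programs can read. It uses Python's sqlite3 and csv modules rather than the MS Access export dialog, and the table and data are invented.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE density (species_id INTEGER, station INTEGER, density INTEGER);
INSERT INTO density VALUES (123776, 2, 32), (34688, 3, 7);
""")

cursor = conn.execute(
    "SELECT species_id, station, density FROM density ORDER BY station, species_id"
)

# Write to an in-memory buffer; open a real file with open(path, "w", newline="")
# to produce a file on disk instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([column[0] for column in cursor.description])  # header row
writer.writerows(cursor)                                       # data rows

csv_text = buffer.getvalue()
print(csv_text)
```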
39. Step by step demonstration
• Open a database
• Different items in database
• Open tables, sorting, filtering
• Table design
• Relationships
• Queries
40. Query operators
= equals
> larger than
< smaller than
>= larger than or equal to
Between ... And ...
Is null
Like ...
Not like ...
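The operators can be tried out on a small table. The sketch below uses SQLite through Python's sqlite3 module, where the wildcard in Like is % rather than the * used in MS Access. The table, the genus 'Other' and all values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE density (species TEXT, station INTEGER, density INTEGER);
INSERT INTO density VALUES
    ('Nudora sp1', 1, 32),
    ('Nudora sp2', 1, 4),
    ('Other sp1',  2, 10),
    ('Other sp2',  2, NULL);  -- density not measured
""")


def species_where(condition):
    """Return the species matching a condition, sorted by name.

    The conditions are fixed strings written in this script; user input
    should be passed as query parameters instead.
    """
    sql = f"SELECT species FROM density WHERE {condition} ORDER BY species"
    return [row[0] for row in conn.execute(sql)]


results = {
    "=":        species_where("density = 32"),
    ">":        species_where("density > 4"),
    "<":        species_where("density < 10"),
    ">=":       species_where("density >= 10"),
    "between":  species_where("density BETWEEN 4 AND 10"),
    "is null":  species_where("density IS NULL"),
    "like":     species_where("species LIKE 'Nudora%'"),
    "not like": species_where("species NOT LIKE 'Nudora%'"),
}

for operator, names in results.items():
    print(operator, names)
```

Note that Between includes both limits, and that a record with an empty density matches none of the comparisons, only Is null.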
42. Query operators
and both true
or at least 1 true
< smaller than
>= larger than or equal to
Between ... And ...
Is null
Like ...
Not like ...
Examples (criterion | field | result):
>"q*" and <"u*" | VOORNAAM (first name) | René, Robbie, Stefan, Stijn, Tim, Tristam
="r*" or "s*" | VOORNAAM (first name) | Robbie, Stefan, Stijn
43. Intermezzo ... Design a dataset
• Research project:
– You work on it with 3 people
– You will sample 4 times on 3 locations
– You will measure 5 environmental characteristics
– You will identify all species
– You will count them
– Extra: you will measure each specimen
– Task: design on paper what your dataset will look
like
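One possible design is sketched below with Python's sqlite3 module. It is not the only valid answer to the task. Every table and column name is an assumption, and the sketch reads the task as 3 persons, 3 locations and 4 sampling rounds per location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Reference lists: each item is stored once.
CREATE TABLE persons         (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE locations       (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE characteristics (id INTEGER PRIMARY KEY, name TEXT NOT NULL, unit TEXT);
CREATE TABLE species         (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- One record per sampling event: a location visited in a given round.
CREATE TABLE samples (
    id             INTEGER PRIMARY KEY,
    location_id    INTEGER NOT NULL REFERENCES locations(id),
    sampling_round INTEGER NOT NULL,
    person_id      INTEGER NOT NULL REFERENCES persons(id),
    UNIQUE (location_id, sampling_round)
);
-- Environmental measurements: many-to-many between samples and characteristics.
CREATE TABLE environment (
    sample_id         INTEGER NOT NULL REFERENCES samples(id),
    characteristic_id INTEGER NOT NULL REFERENCES characteristics(id),
    measured_value    REAL,
    PRIMARY KEY (sample_id, characteristic_id)
);
-- Counts: many-to-many between samples and species.
CREATE TABLE counts (
    sample_id  INTEGER NOT NULL REFERENCES samples(id),
    species_id INTEGER NOT NULL REFERENCES species(id),
    n          INTEGER NOT NULL,
    PRIMARY KEY (sample_id, species_id)
);
-- Extra: one record per measured specimen.
CREATE TABLE specimens (
    id         INTEGER PRIMARY KEY,
    sample_id  INTEGER NOT NULL REFERENCES samples(id),
    species_id INTEGER NOT NULL REFERENCES species(id),
    length     REAL
);
""")

# Fill the reference lists with placeholder names.
conn.executemany("INSERT INTO persons (id, name) VALUES (?, ?)",
                 [(i, f"person {i}") for i in range(1, 4)])
conn.executemany("INSERT INTO locations (id, name) VALUES (?, ?)",
                 [(i, f"location {i}") for i in range(1, 4)])
conn.executemany("INSERT INTO characteristics (id, name) VALUES (?, ?)",
                 [(i, f"characteristic {i}") for i in range(1, 6)])

# 3 locations x 4 rounds; person 1 is used as a placeholder collector.
conn.executemany(
    "INSERT INTO samples (location_id, sampling_round, person_id) VALUES (?, ?, ?)",
    [(loc, rnd, 1) for loc in range(1, 4) for rnd in range(1, 5)],
)
# One (still empty) measurement slot per sample and characteristic.
conn.execute(
    "INSERT INTO environment (sample_id, characteristic_id) "
    "SELECT s.id, c.id FROM samples s CROSS JOIN characteristics c"
)

n_samples = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
n_measurements = conn.execute("SELECT COUNT(*) FROM environment").fetchone()[0]
print(n_samples, n_measurements)
```

Under this reading the design holds 12 sampling events and 60 environmental measurements before any species has been identified.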