Tools for evaluating geo-referencing quality and characterizing germplasm collection sites
1. Tools
Mauricio Parra Quijano
FAO consultant
International Treaty on Plant Genetic Resources
for Food and Agriculture
CAPFITOGEN Program Coordinator
6. Description of the collecting site / Error describing the collecting sites
Level     Value
ORIGCTY   CRI
ADM1      Punta Arenas
ADM2      Buenos Aires
ADM3      NA
ADM4      NA

Level     Value
ORIGCTY   CRI
ADM1      Punta Arenas
ADM2      Pérez Zeledón
ADM3      NA
ADM4      NA
7. GEOQUAL features
• GEOQUAL is a tool that assigns a quality value to the georeferenced passport data
of a germplasm collection (records that include coordinates).
• The user enters the passport data in FAO-Bioversity 2012 format.
• GEOQUAL calculates three parameters (COORQUAL, LOCALQUAL and SUITQUAL),
along with other sub-parameters.
• The parameters are combined to generate both TOTALQUAL (0-60 range) and
TOTALQUAL100 (0-100 range).
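As a minimal sketch of how the two totals could relate: the slides give a 0 to 20 range for each of the three parameters and a 0 to 60 range for TOTALQUAL, so the code below assumes TOTALQUAL is their plain sum and TOTALQUAL100 a linear rescaling of it. Both the formula and the function name `total_quality` are assumptions made for illustration, not taken from GEOQUAL itself.

```python
def total_quality(coorqual, localqual, suitqual):
    """Combine the three GEOQUAL parameters (each assumed to be in 0-20).

    Assumption: TOTALQUAL is the plain sum (0-60) and TOTALQUAL100 is the
    same value rescaled linearly to 0-100.
    """
    for name, value in (("COORQUAL", coorqual),
                        ("LOCALQUAL", localqual),
                        ("SUITQUAL", suitqual)):
        if not 0 <= value <= 20:
            raise ValueError(f"{name} must be between 0 and 20, got {value}")
    totalqual = coorqual + localqual + suitqual
    totalqual100 = totalqual * 100 / 60
    return totalqual, totalqual100


# Hypothetical accession with COORQUAL=20, LOCALQUAL=10, SUITQUAL=15
print(total_quality(20, 10, 15))
```

Under these assumptions, an accession scoring 20, 10 and 15 would get TOTALQUAL 45 and TOTALQUAL100 75.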
8. COORQUAL
Parameter that determines the intrinsic quality of the coordinates included in the
passport data. Values from 0 to 20. Sub-parameters:
• ERRORES: Coordinate values that fall outside the coordinate frame
• PRECIS: Precision level of the coordinates: degrees, minutes or seconds (sexagesimal)
• GEORBLE: Probability of obtaining correct coordinates from the site description
• INTERTEMP: Quality of the coordinates according to collection year
• *GEOREFMETH: Method by which the coordinates were assigned
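The checks behind two of the COORQUAL sub-parameters (out-of-range values and sexagesimal precision) can be illustrated with a small sketch. It assumes the latitude is written in the passport format DDMMSSH, with hyphens for missing minutes or seconds (for example `10----S` or `4531--S`). The function name `check_latitude` and the returned labels are hypothetical, and the actual scoring used by GEOQUAL is not reproduced here.

```python
import re

# DD, then MM or "--", then SS or "--", then hemisphere N/S
_LATITUDE = re.compile(r"^(\d{2})(\d{2}|--)(\d{2}|--)([NS])$")


def check_latitude(lat):
    """Return (in_range, precision) for a DDMMSSH latitude string.

    precision is "degrees", "minutes" or "seconds"; it is None when the
    string does not follow the expected pattern.
    """
    match = _LATITUDE.match(lat)
    if match is None:
        return False, None
    deg, minutes, seconds, _hemisphere = match.groups()
    d = int(deg)
    m = 0 if minutes == "--" else int(minutes)
    s = 0 if seconds == "--" else int(seconds)
    # Out-of-range test: minutes/seconds below 60, latitude within 90 degrees
    in_range = m < 60 and s < 60 and d + m / 60 + s / 3600 <= 90
    if minutes == "--":
        precision = "degrees"
    elif seconds == "--":
        precision = "minutes"
    else:
        precision = "seconds"
    return in_range, precision


for value in ("103020S", "4531--S", "10----S", "953000N"):
    print(value, check_latitude(value))
```

A longitude check would follow the same pattern with three degree digits, E/W hemispheres and a 180 degree limit.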
9. SUITQUAL
Parameter that assigns a quality value to the coordinates according to the
appropriateness of the collection site for plant growth. Values from 0 to 20.
• Distinguishes between cultivated and wild plants (SAMPSTAT)
• Uses land-use information from the Global Land Cover map (1 km resolution)
10. SUITQUAL
[Figure: SUITQUAL value (scale from 0 to 20) according to the distance from the coastline. Labeled distance bands: > 30 km, 10-20 km, 5-10 km, 0-1 km, ground level.]
12. LOCALQUAL
Parameter derived from comparing the site (locality) description with the
administrative data obtained from the coordinates; both the description and the
coordinates come from the user's passport data.
• The georeferenced administrative information is extracted from the GADM
database.
• The comparison is made between character strings and produces a (Levenshtein)
distance: insertions, deletions and substitutions are counted to decide whether
one string can be considered equal to another (function "agrep" in R).
• A value from 0 to 20 is assigned according to the number of correct matches.
Passport description   GADM from coordinates   GADM (second option)
ORIGCTY                ISO
ADM1                   NAME1                   VARNAME1
ADM2                   NAME2                   VARNAME2
ADM3                   NAME3                   VARNAME3
ADM4                   NAME4                   VARNAME4
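The string comparison behind LOCALQUAL can be sketched in a few lines. The code below is a plain-Python illustration of Levenshtein-based approximate matching, not a port of R's `agrep` (which matches approximate substrings, whereas this sketch compares whole strings). The tolerance of 10% of the longer string, the helper names and the GADM-side names in the example are all assumptions made for illustration; the conversion from match count to the 0 to 20 score is not shown because the slides do not give it.

```python
import math


def levenshtein(a, b):
    """Number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_equal(a, b, max_fraction=0.1):
    """Treat two names as equal if their distance is within the tolerance."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return False
    allowed = math.ceil(max_fraction * max(len(a), len(b)))
    return levenshtein(a, b) <= allowed


def count_matches(passport, gadm):
    """Count administrative levels whose passport value matches any GADM name."""
    matches = 0
    for level, value in passport.items():
        if value == "NA":
            continue  # nothing to compare at this level
        if any(fuzzy_equal(value, name) for name in gadm.get(level, [])):
            matches += 1
    return matches


# Passport values as on the example slide; GADM-side names are hypothetical.
passport = {"ADM1": "Punta Arenas", "ADM2": "Pérez Zeledón", "ADM3": "NA"}
gadm = {"ADM1": ["Puntarenas"], "ADM2": ["Buenos Aires"], "ADM3": ["Volcán"]}
print(count_matches(passport, gadm))
```

Each administrative level can carry more than one candidate name (NAME and VARNAME in the table above), which is why the GADM side is modeled as a list per level.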
16. ELC maps
This tool lets the user create ecogeographical land characterization (ELC) maps
that reflect adaptive scenarios for a given species (or group of species) in a
specific country or region.
18. How is an ELC map developed?
• Variable selection
• Bioclimatic variables: cluster analysis, determination of the optimal number of groups
• Geophysical variables: cluster analysis, determination of the optimal number of groups
• Edaphic variables: cluster analysis, determination of the optimal number of groups
• Combination (N bioclimatic * N geophysical * N edaphic)
• Categories
• Map
• Description of categories using the original variables
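The combination step can be illustrated with a short sketch. It assumes that each map cell has already received one cluster label per component (bioclimatic, geophysical, edaphic) and assigns a category to every distinct triple that actually occurs, so the number of categories is at most N bioclimatic * N geophysical * N edaphic. The function name `combine_clusters` and the sample labels are hypothetical; the clustering itself and the choice of the optimal number of groups are not shown.

```python
def combine_clusters(bioclimatic, geophysical, edaphic):
    """Assign one ELC category per cell from three per-component cluster labels.

    Returns the category of each cell and the lookup table mapping each
    (bioclimatic, geophysical, edaphic) triple to its category number.
    """
    triples = list(zip(bioclimatic, geophysical, edaphic))
    # Number the observed combinations in sorted order, starting at 1
    lookup = {triple: i + 1 for i, triple in enumerate(sorted(set(triples)))}
    return [lookup[t] for t in triples], lookup


# Four hypothetical cells: 2 bioclimatic groups, 2 geophysical, 1 edaphic
categories, lookup = combine_clusters([1, 1, 2, 2], [1, 2, 1, 1], [1, 1, 1, 1])
print(categories)
print(lookup)
```

In this example only three of the four possible combinations occur, so the map has three categories; the lookup table is what allows each category to be described afterwards with the original variables.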
19. Variable selection I
Expert opinion / knowledge
• Experts on the target species are a valuable source of information.
• Surveys are an efficient way to gather expert knowledge (internet/email,
meetings, workshops, etc.).
• Variable lists are prepared for each component, with details on the nature of the
variables (explanation of codes, units, source, etc.). Each variable is then given a
value based on its importance for the adaptation of the species.
Literature search on the major factors in the adaptation of the target species
20. Variable selection II
Screening:
• Redundancy? Correlation? Collinearity?
• Bivariate correlation analysis, PCA, variance inflation factor (VIF)
(assessment of linear relationships between variables; only in regression)
• Significance: through a multiple regression analysis that uses a dependent
variable (one that gives a measurement of adaptation).
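A bivariate correlation screen of the kind listed above could look like the following sketch. It computes Pearson correlations between every pair of candidate variables and flags pairs above a threshold as potentially redundant. The 0.9 threshold, the variable names and the sample values are assumptions made for illustration; PCA, VIF and the regression-based significance test are not covered.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(variables, threshold=0.9):
    """List variable pairs whose absolute correlation reaches the threshold."""
    names = list(variables)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(variables[first], variables[second])
            if abs(r) >= threshold:
                flagged.append((first, second, r))
    return flagged


# Hypothetical values of three candidate variables at five sites
variables = {
    "annual_temp": [1, 2, 3, 4, 5],
    "warmest_month_temp": [2, 4, 6, 8, 10],
    "slope": [5, 1, 4, 2, 3],
}
print(redundant_pairs(variables))
```

Only one variable of each flagged pair would normally be kept; which one to keep is a judgment that can draw on the expert ranking from the previous step.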
21. What type of map do you need?
Depending on the approach of the analysis, the ELC map can be:
1. Generalist map
It defines the major environments for a large number of species
(related or not). For most of these species, the ELC map should
discriminate different adaptive scenarios in a given target area. A poor
fit can be expected between the adaptive characteristics of a smaller
group of species and the resulting map (see Parra-Quijano et al., 2012).
2. Map by species / gene pool / group of related species (specific map)
It defines in more detail the key environments for a particular
species or a limited set of genetically related species. A good fit
between the map and the adaptive characteristics of the target
species is expected.
22. ELC mapas tool results
• Maps (which can be opened with DIVA-GIS) and tables describing each category.
23. ECOGEO
This tool performs an ecogeographical characterization
of georeferenced collecting sites.
24. ECOGEO is a characterization NOT of the germplasm but of the collecting site
[Figure: examples of germplasm characterization. Internode length measured against a 0-10 cm scale (= 5.56 cm) and a presence/absence matrix (present = 1, absent = 0).]
25. Process of ecogeographical characterization
• Input: passport data (including coordinates, X and Y)
• GIS layers: Elevation, Average Annual Temp, Soil Organic Carbon, Soil pH, ...
• Output: characterization matrix
  Rows: germplasm identifier
  Columns: ecogeographical descriptors
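The process above amounts to reading, for each accession, the value of every GIS layer at the collecting-site coordinates. The sketch below illustrates this with tiny in-memory grids instead of real raster files. The grid layout (row 0 at the top, square cells), the layer values, the coordinates and all function names are assumptions made for illustration, not the ECOGEO implementation.

```python
NA = None  # marker for cells without data


def point_extract(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the cell containing (x, y), or NA outside the grid."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)  # row 0 is the northernmost row
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return NA


def characterization_matrix(passport, layers, x_min, y_max, cell_size):
    """Rows: accession identifiers. Columns: one descriptor per GIS layer."""
    return {
        accession: {name: point_extract(grid, x_min, y_max, cell_size, x, y)
                    for name, grid in layers.items()}
        for accession, (x, y) in passport.items()
    }


# Hypothetical 2 x 2 layers covering x in [0, 2) and y in (0, 2]
layers = {
    "elevation": [[100, 200], [300, 400]],
    "soil_ph": [[5.5, 6.0], [NA, 7.1]],
}
passport = {"a": (0.5, 1.5), "b": (1.5, 0.5), "c": (0.2, 0.3)}
matrix = characterization_matrix(passport, layers, x_min=0, y_max=2, cell_size=1)
for accession, row in matrix.items():
    print(accession, row)
```

Accession c falls on a cell with no soil pH data, so that descriptor stays NA, which is the situation the point-versus-radial comparison addresses.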
26. Point or radial extraction?
[Figure: raster grid of ecogeographical variable X (cell values 1 to 4, plus NA cells) overlaid (GIS overlap) with the distribution of the passport data entries for accessions a, b and c, and with their true locations.]

Point extraction at the passport coordinates:
ACCENUMB   VARIABLE
a          NA
b          NA
c          2

Extraction results at the true location:
ACCENUMB   VARIABLE
a          NA (1)
b          1
c          3

Radial extraction: the GEOQUAL value (a=68, b=65, c=50) gives the uncertainty, which sets the radius.
27. Results of radial extraction (GIS overlap)
ACCENUMB   CAPTURED VALUES   AVERAGE
a          NA,1,1            1
b          NA,1,1            1
c          3,2,1,3,2,3       2.333

Comparison of extraction results:
ACCENUMB   Correct extraction   Point extraction   Radial extraction
a          1                    NA                 1
b          1                    NA                 1
c          3                    2                  2.333
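The radial extraction averages can be reproduced with a short sketch: every cell within a given radius of the passport location is captured, NA cells are ignored, and the remaining values are averaged. The captured values for accessions a and c are the ones listed in the results table; the 3 x 3 grid, the radius expressed in cell units and the function names are assumptions made for illustration. How GEOQUAL values translate into a radius is not shown because the slides do not specify it.

```python
import math

NA = None  # marker for cells without data


def radial_extract(grid, row, col, radius):
    """Capture the values of all cells whose centre lies within `radius`
    (in cell units) of the cell at (row, col)."""
    captured = []
    for r, line in enumerate(grid):
        for c, value in enumerate(line):
            if math.hypot(r - row, c - col) <= radius:
                captured.append(value)
    return captured


def mean_ignoring_na(values):
    """Average the captured values, skipping NA; NA if nothing is left."""
    valid = [v for v in values if v is not NA]
    return sum(valid) / len(valid) if valid else NA


# Captured values as listed in the results table
print(mean_ignoring_na([NA, 1, 1]))                     # accessions a and b
print(round(mean_ignoring_na([3, 2, 1, 3, 2, 3]), 3))   # accession c

# Hypothetical grid: capture the cells within one cell of the centre
grid = [[2, 4, 3],
        [1, 3, 2],
        [1, 3, 2]]
captured = radial_extract(grid, row=1, col=1, radius=1.0)
print(captured, mean_ignoring_na(captured))
```

Averaging over the captured cells recovers a value for accessions a and b, which point extraction leaves as NA, at the cost of a smoothed value for c (2.333 instead of 3).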