Herrera B - Spatial Epidemiology and Crop Pest and Diseases Mapping 2012
1. Pic: Neil Palmer, CIAT
Species distribution modeling of pests and diseases
Beatriz Vanessa Herrera
International Center for Tropical Agriculture (CIAT), Decision and Policy Analysis, 2012
2. Methodology
• Occurrence records related to knowledge about pest behaviour and the epidemiology of pathogens
• Variable selection
• Evaluation of niche models
CLASSIFICATION
• Consensus distribution maps
7. Several niche approximations
CSM - Climate Space Model
ED - Environmental Distance
GARP - Genetic Algorithm for Rule-Set Production
SVM - Support Vector Machines
Maxent - Maximum Entropy species distribution model
A ∩ B ∩ M = RN
RN = Realized niche
Soberón & Peterson, 2005, 3
9. EVALUATION METRICS
Confusion matrix (rows: model prediction; columns: evaluation sample)

MODEL PREDICTION   REAL PRESENCE (+)                   PSEUDO-ABSENCES (-)
PRESENCE (+)       a: true positive                    b: false positive
                   (correct prediction)                (commission error, overprediction)
ABSENCE (-)        c: false negative                   d: true negative
                   (omission error, underprediction)   (correct prediction)

a and d: correct predictions
b: commission error (false positive, overprediction)
c: omission error (false negative, underprediction)
N = a + b + c + d

$$\text{Sensitivity} = \frac{a}{a + c}$$
$$1 - \text{Specificity} = \frac{b}{b + d}, \qquad \text{Specificity} = \frac{d}{b + d}$$
$$\text{Classification error} = \frac{b + c}{N}$$
$$\kappa = \frac{(a + d) - \dfrac{(a + c)(a + b) + (b + d)(c + d)}{N}}{N - \dfrac{(a + c)(a + b) + (b + d)(c + d)}{N}}$$
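The metrics on this slide can be checked by hand with a few lines of Python. The sketch below is only an illustration: the function name `evaluation_metrics` and the counts in the example are hypothetical and do not come from the study.

```python
def evaluation_metrics(a, b, c, d):
    """Metrics from a 2x2 confusion matrix.

    a = true positives, b = false positives (commission),
    c = false negatives (omission), d = true negatives.
    """
    n = a + b + c + d
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    error_rate = (b + c) / n
    # Agreement expected by chance, computed from the row and column totals.
    chance = ((a + c) * (a + b) + (b + d) * (c + d)) / n
    kappa = ((a + d) - chance) / (n - chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "error_rate": error_rate,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
metrics = evaluation_metrics(a=40, b=10, c=5, d=45)
print(metrics)
```

With these made-up counts the chance agreement is 50 out of 100 records, so Kappa is (85 - 50) / (100 - 50) = 0.7.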
10. Evaluation metrics and selection criteria
Commission error: (pseudo) absence records in predicted areas
Omission error: records in non-predicted areas
12. Whitefly - expert criteria / Whitefly - COR
Some results and comparisons of variable datasets
Kappa/threshold
Specificity/error rate
Climate Space Model
0.83/8
0.269–0.96
Environmental distance
0.802/63.1 - 0.357/90
0.826 0.902
Maxent
0.842/23 - 0.5/1
0.442 0.995
Garp
0.806/30 - 0.632/30
0.903 0.844
Support vector Machines
0.823/25.1 - 0.73/56
0.596 0.956
Source: Herrera et al. 2011. Threats to cassava production: known and potential geographic distribution of four key biotic constraints. Food security: 3:329-345
13. Problems
Whitefly
Model        Sensitivity   Error rate   Weight
GARP All     0.904         0.16         26.1
ED All       0.942         0.05         27.23
ED Exp       0.826         0.09         23.8
CSM all      0.788         0.03         22.7
Mean/total   0.865         0.0825       99.8
Maxent - COR
Source: Herrera et al. 2011. Threats to cassava production: known and potential geographic distribution of four key biotic constraints. Food security: 3:329-345
14. Realized vs potential distribution
Source: Herrera et al. 2011. Threats to cassava production: known and potential geographic distribution of four key biotic constraints. Food security: 3:329-345
15. Model comparisons
CMD
Model        Sensitivity   Error rate   Weight
GARP all     0.722         0.04         50
ED all       0.833         0.02         50
Mean/total   0.7775        0.03         100
Examples of underprediction
Source: Herrera et al. 2011. Threats to cassava production: known and potential geographic distribution of four key biotic constraints. Food security: 3:329-345
16. Cassava Mosaic Disease
Source: Herrera et al. 2011. Threats to cassava production: known and potential geographic distribution of four key biotic constraints. Food security: 3:329-345
24. Some implications for CWR research
Future research should make full use of the advantages of
several species distribution models for global and regional
studies.
In CC research complementary models are required in
order to better explain expected changes in species
responses.
Research in CWR should include pressures due to biotic
constraints.
Our basic methodological approach is species distribution modeling, a process for:
• mapping the known distribution of pests and diseases;
• analyzing the environment where these pests and diseases have been found;
• developing ecological niche models by analyzing the environmental characteristics of the known locations of the pests and diseases;
• validating the models;
• producing statistics and maps showing the known and predicted distributions of the pests and diseases.

On the left side of the diagram, the steps to assess the known distribution of a pest or disease are shown. The first step is to collect information on the known distribution from one of four sources: databases from virology or entomology labs; online databases, such as the Global Biodiversity Information Facility (GBIF); scientific articles; and surveys. The known locations should be geographically referenced with latitude and longitude coordinates.

On the right side of the diagram, environmental variables that are in some way related to the distribution of pests and diseases are organized. Different sets of variables are tested and methods for reducing collinearity are employed.

At the bottom of the diagram, the known distributions are overlaid on the related environmental variables to produce a data set for modeling.

Ecological niche models are variations on logistic regression. We have been using six different models: Bioclim, environmental distance, climate space model, support vector machine, GARP and Maxent. These models are implemented in three different software environments: DIVA-GIS, openModeller and Maxent.

The next step is to assess errors, sensitivity and overall model performance. In this step some of the input data is usually held back and used for validation.

The final maps can be selected according to error and sensitivity statistics, or by determining where different models agree.
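The validation step described in these notes, where part of the input data is held back, could be sketched as a simple random split of the occurrence records. This is a minimal illustration, not the procedure used in the study: the function name `split_occurrences`, the 25% test fraction and the coordinates are all assumptions.

```python
import numpy as np


def split_occurrences(records, test_fraction=0.25, seed=0):
    """Randomly hold back a fraction of occurrence records for validation."""
    records = np.asarray(records)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(records))
    n_test = int(round(test_fraction * len(records)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return records[train_idx], records[test_idx]


# Hypothetical latitude/longitude pairs, for illustration only.
points = np.array([
    [3.5, -76.4], [4.1, -75.9], [2.9, -76.6], [5.0, -74.8],
    [1.2, -77.3], [6.3, -75.6], [7.1, -73.1], [10.4, -75.5],
])
train, test = split_occurrences(points)
print(len(train), "records for model fitting,", len(test), "held back")
```

The held-back records would then be scored against the model prediction to fill in a confusion matrix of presences and pseudo-absences.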
Typical environmental inputs for the models are global climate databases such as WorldClim. Another possibility is to use global circulation model data on climate change: the same methodology is applied, but with predicted future climate. Other non-climatic information could be used as well.
Species distribution models use climatic and other data to assess the environmental range of a species in multi-dimensional space. There is a large biogeographical literature on this topic.
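One of the simplest ways to picture an "environmental range in multi-dimensional space" is a rectangular envelope: the minimum and maximum of each variable at the known occurrence sites. The sketch below is a deliberately simplified envelope in the spirit of Bioclim, not the implementation found in DIVA-GIS or openModeller; the function names, variables and values are hypothetical.

```python
import numpy as np


def fit_envelope(presence_env):
    """Return per-variable minimum and maximum at the presence sites."""
    presence_env = np.asarray(presence_env, dtype=float)
    return presence_env.min(axis=0), presence_env.max(axis=0)


def inside_envelope(env, lower, upper):
    """True where every variable falls within the fitted range."""
    env = np.asarray(env, dtype=float)
    return np.all((env >= lower) & (env <= upper), axis=-1)


# Hypothetical sites: columns are mean temperature (°C) and annual rainfall (mm).
presence_env = np.array([[24.0, 1200.0], [26.5, 1800.0], [25.0, 1500.0]])
lower, upper = fit_envelope(presence_env)

candidates = np.array([[25.5, 1600.0], [30.0, 1500.0]])
suitable = inside_envelope(candidates, lower, upper)
print(suitable)
```

The first candidate site lies inside the envelope on both variables; the second is too warm and falls outside it.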
The error analysis tells us which models performed well.
The models produce a map of the potential distribution of a pest or disease, based on the known occurrences. Values close to 1 mark places where the pest or disease could occur but with lower likelihood; values close to 100 mark the places most likely to offer suitable environmental conditions. By applying a weighted overlay analysis, maps can be developed that show where different models agree. Agreement across models is taken as an indicator of the reliability of the predictions.
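A weighted overlay of this kind can be sketched as a weighted average of the model output grids. The example is an assumption-laden illustration: the function name `weighted_consensus` and the 2x2 grids are made up, and the 50/50 weights simply echo the GARP and ED weights listed on the CMD comparison slide.

```python
import numpy as np


def weighted_consensus(maps, weights):
    """Weighted average of model suitability grids.

    maps: array of shape (n_models, rows, cols), values on the 1-100 scale.
    weights: one weight per model; normalized here so they sum to 1.
    """
    maps = np.asarray(maps, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the model axis: result has shape (rows, cols).
    return np.tensordot(weights, maps, axes=1)


# Hypothetical 2x2 suitability grids from two models.
garp_map = [[100, 1], [50, 80]]
ed_map = [[80, 1], [10, 100]]
consensus = weighted_consensus([garp_map, ed_map], weights=[50, 50])
print(consensus)
```

Cells where both models give high values stay high in the consensus, while cells where the models disagree are pulled toward the middle of the scale.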
Same model, same species, same number of occurrence records. Performance depends on the number of occurrence records, but even more on the correlation among variables.
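Because correlation among variables matters this much, a common precaution is to drop one variable from each highly correlated pair before modeling. The sketch below shows one simple greedy way to do that; it is not necessarily the collinearity method used in the study, and the function name `drop_correlated`, the 0.8 threshold and the data are assumptions.

```python
import numpy as np


def drop_correlated(data, names, max_abs_r=0.8):
    """Keep variables in order, skipping any that correlate too
    strongly (|r| > max_abs_r) with a variable already kept."""
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)
    kept = []
    for j in range(data.shape[1]):
        if all(abs(corr[j, k]) <= max_abs_r for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Hypothetical values at five sites.
temp_c = np.array([20.0, 22.0, 24.0, 26.0, 28.0])
temp_f = temp_c * 1.8 + 32.0  # same information as temp_c
rain_mm = np.array([1500.0, 900.0, 1800.0, 1000.0, 1300.0])

data = np.column_stack([temp_c, temp_f, rain_mm])
selected = drop_correlated(data, ["temp_c", "temp_f", "rain_mm"])
print(selected)
```

The result depends on the order of the variables, since the first member of each correlated pair is the one retained; in practice that choice would be guided by knowledge of the pest's biology.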
Bemisia tabaci predictions. Any one of the models shows us areas outside the range where the species actually occurs. One problem in this case is that, if we are looking for the variables that best explain the distribution of a species, that should be a separate exercise from summarizing the models. Which could be the same
Maxent and SVM (left), run with few variables, underrepresented the realized distribution of the species.
As an example, in the case of cassava, climate change predictions suggest that cassava will not be affected by abiotic constraints (drought or higher temperatures). … But higher temperatures and changes in precipitation patterns could affect the development rate of arthropod pests. The greatest impact on cassava would be from biotic constraints!
In this case modelling is possible only for key pests for which information is available. One problem here is that we are modelling species as static, and they aren't! Crops can be modified but invasive species cannot … and other species related to cassava could also affect the crop under global change.
Here we have a long list of the main pests of cassava and their natural enemies, which have significance and…
Specialist species of cassava. These are success stories of biological control, made possible by many years of effort and research. But we could experience similar impacts with emerging pests, and we have to be prepared because the demand for food supplies is large and increasing.
And the cassava complex is large.
This is an example of a “possible” emerging pest. In this map we use a few records to predict the potential distribution in Asia, but the information is not sufficient because this pest is not a problem here in the Americas. But… this is already happening in Asia.