Key lecture for the EURO-BASIN Training Workshop on Introduction to Statistical Modelling for Habitat Model Development, 26-28 Oct, AZTI-Tecnalia, Pasaia, Spain
2. 2
Index
• Introduction to species habitat and some concepts in community ecology
• Statistical methods dealing with communities
• Analysis of β-diversity: Similarity and distance matrices & Mantel and
partial Mantel test
Practical session “Community Ecology with R”
• Direct Ordination Methods (CCA and RDA)
• Variation partitioning
Practical session “Community Ecology with R”
• 4th corner method
3. 3
Hypothesis
• Which are the main factors that determine the distribution (or the habitat) of species?
• Environmental factors (e.g. temperature, nutrients, …) → Adaptation processes
versus
• Dispersal limitation factors (reproduction and mortality rate, growth,
migration,…) → Historical processes
• for a species, but species compete for resources (hence, for space)
• for an assemblage (or community) of species, within a guild
A guild (or ecological guild) is any group of species that exploit the same
resources: e.g. zooplankton, phytoplankton, trees
4. 4
Hypothesis
[Figure: two pairs of sites (Site 1, Site 2) holding species A-G. Left pair: shared species ↓. Right pair: shared species ↑]
• Which are the main factors that determine the species composition of a
community in a region?
• What are the factors that determine the maintenance of local and regional
diversity?
5. 5
Diversities…
• α-diversity / within a homogeneous habitat
• β-diversity / environmental gradient
• γ-diversity / landscape
Whittaker (1960, 1977)
6. 6
Habitat theories
1. Environmental factors ⇔ Niche ⇔ « Environmental patchiness »
[Figure: abundance of species a, b, c, d, e along an environmental gradient]
2. Geographic Distance ⇔ Dispersal limitation ⇔ « random walk »
(Neutral theory, Hubbell 2001)
[Figure: shared species against distance between sites]
Neutral community: all individuals have the same rates of reproduction and mortality
7. 7
Niche model
• The Hutchinsonian niche views the niche as a multi-dimensional hypervolume, where the dimensions are the environmental conditions and the resources that define the requirements of an individual or a species (G. E. Hutchinson, 1957).
• The full range of environmental conditions (physical and biological, i.e. the resources) under which an organism can exist describes its fundamental niche.
[Figures: unidimensional, two-dimensional and three-dimensional niches; abundance against environmental variable(s)]
8. 8
Dispersal-limited model
• Species composition fluctuates in a random, autocorrelated way.
[Figure: two pairs of sites (Site 1, Site 2) holding species A-G. Left pair: similarity ↓, β-diversity ↑. Right pair: similarity ↑, β-diversity ↓]
Distance decay
[Figure: shared species (β-diversity) against geographical distance, for Metacommunity A and Metacommunity B]
Metacommunity: a set of local communities that are linked by dispersal of multiple, potentially interacting species
9. 9
Terminology
A metapopulation is a group of spatially separated populations of the same species which interact at some level
[Diagram: labels n1, n2 and m1]
A metacommunity is a set of local communities that are linked by dispersal of multiple, potentially interacting species
[Diagram: labels na1, na2, na3, nb1, nb2, nc1 and ma1, mb1]
10. 10
The theory of island biogeography
(MacArthur and Wilson, 1967)
• The number of species found on an undisturbed island is determined by immigration and
extinction.
• Immigration and emigration are affected by the distance of an island from a source of colonists
(distance effect).
• Large islands => lower extinction rate
• Islands near continents => higher immigration rate
MacArthur, R. H. and Wilson, E. O. 1967. The Theory
of Island Biogeography. Princeton, N.J.: Princeton
University Press.
11. 11
Dispersal limited model
Variance partitioning
β-diversity
Condit et al. Science, January 25, 2002. Duivenvoorden et al. Science, January 25, 2002.
12. 12
Spatial Autocorrelation
[Figures: shared species against environmental gradient, and shared species against geographic distance]
Legendre, P. (1993) Spatial autocorrelation: trouble or new paradigm. Ecology, 74, 1659–1673.
• Environmental variables and species distributions tend to be spatially autocorrelated:
• Species distributions are most often aggregated because of contagious biotic processes such as
local dispersal
• But also, environment is structured primarily by climate and geomorphological processes on
land that cause gradients and patchy structures.
• Therefore values of these variables are not stochastically independent from one another. This may
lead to misinterpretation of patterns using classical statistics when ecologists conclude that species–
habitat associations are statistically significant.
• To evaluate the relative importance of environmental segregation and limited dispersal in explaining
species distributions, spatial structure must be considered.
• Spatial autocorrelation can be a problem when explaining the ecological niche of a species; however, it can improve habitat modelling
13. 13
Some statistical methods to analyse distribution
patterns of species communities
• Similarity and distance matrices &
Mantel and partial Mantel test
(Analysis of β-diversity)
Practical session “Community Ecology
with R”
• Direct Ordination Methods (CCA and
RDA)
• Variation partitioning
• Practical session “Community Ecology
with R”
• 4th corner method
14. 14
Analysis of β-diversity:
Similarity and distance matrices
&
Mantel and partial Mantel test
16. 16
(Dis)Similarity and distance indices
Similarity indices (for species data): 0 → 1
• Jaccard index (for presence-absence data) is the number of species shared between the two plots, divided by the total number of species observed.
0 (no shared species) → 1 (all species shared)
[Figure: Site 1 and Site 2 holding species A-F; Jaccard = 4 / 6]
• Bray-Curtis index (for abundance data) is defined by 2W/(A+B), where W is the sum over all species of the minimum abundance of each species between the two stations, and A and B are the sums of the abundances of all species at each of the two stations.
       sp1   sp2
St1     3     4
St2     5     2
min     3     2    → W = 5
• This similarity form of Bray-Curtis is also known as the Steinhaus or Czekanowski index; on presence-absence data it reduces to the Sørensen index
•…
Distance indices (for variables):
• Euclidean: $d = \sqrt{\sum_j d_j}$, where $d_j$ is the squared difference in variable $j$ between the two stations
       var1   var2
St1    32.3   0.2
St2    34.6   0.3
d1 = 2.3², d2 = 0.1²
•…
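The three indices defined on this slide can be checked numerically. The sketch below is a minimal illustration in plain Python, not the code used in the practical session. The abundances and variable values are the ones shown above; the two species lists are hypothetical, chosen only so that 4 of 6 species are shared.

```python
from math import sqrt


def jaccard(site1, site2):
    """Shared species divided by total species observed (presence-absence)."""
    a, b = set(site1), set(site2)
    return len(a & b) / len(a | b)


def bray_curtis_similarity(abund1, abund2):
    """2W/(A+B), with W the sum of per-species minimum abundances."""
    w = sum(min(x, y) for x, y in zip(abund1, abund2))
    return 2 * w / (sum(abund1) + sum(abund2))


def euclidean(values1, values2):
    """Square root of the summed squared differences between two stations."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(values1, values2)))


# Hypothetical species lists: 4 shared species out of 6 observed.
site1 = ["A", "B", "C", "D", "E"]
site2 = ["A", "B", "C", "D", "F"]
jac = jaccard(site1, site2)                   # 4 / 6

# Abundances of sp1 and sp2 at St1 and St2, as on the slide (W = 3 + 2 = 5).
bc = bray_curtis_similarity([3, 4], [5, 2])   # 2 * 5 / (7 + 7)

# var1 and var2 at St1 and St2, as on the slide.
d = euclidean([32.3, 0.2], [34.6, 0.3])       # sqrt(2.3**2 + 0.1**2)
```

Because var1 and var2 are on very different scales, var1 dominates the Euclidean distance here, which is one reason environmental variables are usually scaled before distances are computed.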
19. 19
Case Study 1: Tree rainforest in Panama
Floristic data:
708 tree species (> 10 cm dbh)
53 sites of ~1 ha
[Maps: precipitation gradient; floristic composition]
Environmental Variables:
• Precipitation
• Elevation
• Slope
• Water accumulation flow
• Geology
• Fragmentation
20. 20
Case Study 1: Tree rainforest in Panama
[Figure: fraction of species shared (β-diversity) against distance (km). Condit et al. Science, January 25, 2002.]
Factor group                Variable                      Jaccard
Dispersal-related factors   Geographical Distance (GD)    0.637
                            ln(GD)                        0.696
                            Cross-plot forest fraction    0.323
Environmental factors       Elevation                     0.424
                            Slope                         0.318
                            Runoff                        0.078
                            Precipitation                 0.572
                            Dry season                    0.461
                            Geologic types                0.126
Spectral data               Band 1                        0.305
                            Band 2                        0.117
                            Band 3                        0.127
                            Band 4                        0.258
                            Band 5                        0.148
                            Band 7                        0.160
21. 21
Identification of complementary areas of diversity
[Figure: two pairs of sites (Site 1, Site 2) holding species A-G]
• Problem of the minimal area
Minimise the total surface while preserving all species
• Problem of the maximal coverage
Maximise the number of species within a fixed surface
Optimising γ-diversity
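The maximal-coverage problem can be illustrated with a simple greedy heuristic: repeatedly pick the site that adds the most species not yet covered. This is only a sketch of the idea of complementarity, not the procedure used in the case study, and a greedy choice is not guaranteed to be optimal. The site names and species sets are hypothetical, and every site is assumed to have the same surface.

```python
def greedy_complementary_sites(site_species, max_sites=None):
    """Pick sites one at a time, each adding the most not-yet-covered species."""
    remaining = dict(site_species)
    covered, chosen = set(), []
    while remaining and (max_sites is None or len(chosen) < max_sites):
        # Sorting the names makes ties resolve in a reproducible order.
        best = max(sorted(remaining),
                   key=lambda site: len(set(remaining[site]) - covered))
        gain = set(remaining.pop(best)) - covered
        if not gain:
            break  # no remaining site adds a new species
        chosen.append(best)
        covered |= gain
    return chosen, covered


# Hypothetical species lists per site.
sites = {
    "site1": {"A", "B", "C", "D"},
    "site2": {"A", "E"},
    "site3": {"E", "F", "G"},
    "site4": {"B", "C"},
}

# Minimal-area flavour: keep adding sites until nothing new is gained.
all_sites, all_species = greedy_complementary_sites(sites)

# Maximal-coverage flavour: at most two sites.
two_sites, two_species = greedy_complementary_sites(sites, max_sites=2)
```

In this toy example site1 and site3 are complementary: together they hold all seven species, so site2 and site4 add nothing.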
23. 23
Identification of complementary areas
Step 1. Hierarchical agglomerative clustering
[Dendrogram: similarity levels 0%, 8%, 20%]
Cluster   Plots
3.1       1, 3, 4, 21, 22, 29, 23, 27, 24, 28, 30, C1, C4, C2, C3
3.2       2, S0, S1, S3, S2, S4, SH, 25, 26, 5, 17, 13, 10, 11, 18, 14, P1, P2, 6, 7, 12, 15, 16, 8, 9, 20, 19, G1, G2
3.3       34
3.4       40
3.5       41
2.1       31, 32, 33
2.2       36
1.1       35
1.2       37
Step 2. Multiple regression model between distance matrices
• Log(GD)
• Elevation
• Bands 1-4
R² = 0.57 (p < 0.001)
[Figure: Jaccard similarity against predicted similarity, both axes from 0.2 to 1.0]
Step 3. Extrapolation of the model and cluster assignation
Ŝ(pixel i, site 1), Ŝ(pixel i, site 2), …, Ŝ(pixel i, site 53)
24. 24
Predicted floristic types: identification of complementary areas
[Map legend: Non-rain forest; Water surfaces; Clusters 1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 3.3, 3.4, 3.5]
Chust, G., J. Chave, R. Condit, S. Aguilar, S. Lao, & R. Pérez (2006) Determinants and spatial modeling of beta-
diversity in a tropical forest landscape in Panama. Journal of Vegetation Science 17: 83-92.
25. 25
Case Study 2: zooplankton in the Bay of Biscay
267 zooplankton samples collected from May 2-16, 2004
24 most abundant copepods
[Map: Bay of Biscay (axes 43 to 47 and -7 to 0), showing Cap Ferret Canyon, Gironde Estuary, Arcachon Bay, Cap Breton Canyon and Adour river]
[Photo: copepod Calanus helgolandicus]
Irigoien, X., G.Chust, J.A. Fernandes, A. Albaina, L. Zarauz (2011) Factors determining
mesozooplankton species distribution and community structure in shelf and coastal waters.
Journal of Plankton Research 33: 1182-1192.
26. 26
Case Study 2: zooplankton in the Bay of Biscay
Species similarity indices against geographic distance
[Figures: species similarity (Bray-Curtis) against distance (km), and species similarity (Jaccard) against distance (km); distances from 0 to 350 km]
27. 27
Case Study 2: zooplankton in the Bay of Biscay
Species similarity indices against environment:
• 15 environmental variables (bottom depth; temperature, salinity and density at surface and bottom; difference in density between surface and bottom; Brunt-Väisälä frequency; integrated fluorescence; depth of the maximum fluorescence; fluorescence at the maximum; abundance of chaetognaths, jellyfish and fish eggs)
• 32767 possible subsets were compared:
  $\sum_{k=1}^{n} \frac{n!}{k!\,(n-k)!} = 2^n - 1$, where n is the number of variables and k the number of variables in each combination
• The best subset, i.e. the one explaining the maximum variation in species similarity, contained 4 variables: Brunt-Väisälä frequency, salinity at surface, density at bottom and jellyfish abundance (for the Bray-Curtis index)
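The count of 32767 subsets follows from summing the number of combinations of every size from 1 to 15. A short check in plain Python (the variable names are ours):

```python
from math import comb

n = 15  # number of environmental variables

# Number of subsets containing exactly k variables, for k = 1 .. n.
subsets_by_size = [comb(n, k) for k in range(1, n + 1)]

# Every non-empty subset is counted once, so the total equals 2**n - 1.
total = sum(subsets_by_size)
```

With 15 variables an exhaustive comparison is still cheap; each extra variable doubles the number of subsets.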
28. 28
Model Selection
Aim: to select the best subset of environmental variables, so that distances of
(scaled) environmental variables have the maximum correlation with community
dissimilarities
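Environmental matrix (m rows, q variables) → (Euclidean, …) → environmental similarity matrix:

$$
\mathrm{AMB} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1q} \\
x_{21} & \cdots & \cdots & \vdots \\
\vdots & & & \vdots \\
x_{m1} & \cdots & \cdots & x_{mq}
\end{pmatrix}
\;\longrightarrow\;
\mathrm{AMB}_{sim} =
\begin{pmatrix}
1 & s_{12} & s_{13} & s_{14} & s_{15} \\
\cdot & 1 & s_{23} & s_{24} & s_{25} \\
\cdot & \cdot & 1 & s_{34} & s_{35} \\
\cdot & \cdot & \cdot & 1 & s_{45} \\
\cdot & \cdot & \cdot & \cdot & 1
\end{pmatrix}
$$

Each combination keeps only a subset of the columns of AMB (e.g. the first two):

$$
\mathrm{AMB} =
\begin{pmatrix}
x_{11} & x_{12} \\
x_{21} & \cdots \\
\vdots & \vdots \\
x_{m1} & \cdots
\end{pmatrix}
$$

n combinations of q variables → n environmental similarity matrices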
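The selection described on this slide can be sketched as an exhaustive search: for every subset of variables, compute Euclidean distances between sites on the scaled variables, and keep the subset whose distances correlate best with the community dissimilarities. The sketch below makes several assumptions that are not in the slides: it uses the Pearson correlation, the function names are ours, and the three variables and four sites are invented toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def scale(column):
    """Centre a variable and divide by its sample standard deviation."""
    mean = sum(column) / len(column)
    sd = sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]


def env_distances(env, variables):
    """Euclidean distances between all site pairs on the chosen variables."""
    cols = [scale(env[v]) for v in variables]
    m = len(cols[0])
    return [sqrt(sum((c[i] - c[j]) ** 2 for c in cols))
            for i in range(m) for j in range(i + 1, m)]


def best_subset(env, community_dissimilarity):
    """Try every non-empty subset and keep the best-correlated one."""
    best_vars, best_r = None, -2.0
    names = sorted(env)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            r = pearson(env_distances(env, subset), community_dissimilarity)
            if r > best_r:
                best_vars, best_r = subset, r
    return best_vars, best_r


# Toy data (hypothetical): 4 sites, 3 variables.
env = {
    "depth": [10.0, 50.0, 20.0, 40.0],
    "salinity": [30.0, 31.0, 32.0, 33.0],
    "temperature": [12.0, 12.5, 15.0, 11.0],
}
# Community dissimilarities for site pairs (1,2), (1,3), (1,4), (2,3), (2,4),
# (3,4), built here to follow the salinity differences exactly.
community = [0.2, 0.4, 0.6, 0.2, 0.4, 0.2]

selected, r_best = best_subset(env, community)
```

In the toy data the community dissimilarities track salinity alone, so the search returns the single-variable subset; adding depth or temperature only dilutes the correlation.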
29. 29
Case Study 2: zooplankton in the Bay of Biscay
Comparison                                             Mantel r   p-value
Bray-Curtis × Environment                              0.54       0.001
Bray-Curtis × Distance                                 0.43       0.001
Bray-Curtis × Environment (Distance partialled out)    0.50       0.001
Jaccard × Environment                                  0.44       0.001
Jaccard × Distance                                     0.47       0.001
Jaccard × Environment (Distance partialled out)        0.34       0.001
Terms selected for environmental variables:
• Bray-Curtis: Brunt-Väisälä frequency, salinity at surface, density at bottom, jellyfish abundance
• Jaccard: temperature at bottom, density at surface and at bottom, fish abundance
[Venn diagram: ENV, DIS]
Conclusion: mesozooplankton communities in the Bay of Biscay are subject to a balanced degree of dispersal limitation and niche segregation.
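The Mantel r and p-values in this table come from correlating two distance matrices and then permuting the sites of one of them. The sketch below is a simple one-tailed Mantel test in plain Python, written for illustration only: it is not the vegan implementation, it does not cover the partial Mantel test, and the coordinates and dissimilarities are hypothetical.

```python
import random
from math import exp, sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def upper_triangle(matrix):
    """Values above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """Mantel r and one-tailed permutation p-value (r at least as large)."""
    rng = random.Random(seed)
    n = len(d1)
    fixed = upper_triangle(d2)
    observed = pearson(upper_triangle(d1), fixed)
    at_least_as_large = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of d1 together, i.e. relabel the sites.
        permuted = [[d1[a][b] for b in order] for a in order]
        if pearson(upper_triangle(permuted), fixed) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical sites along a transect (km) and a community dissimilarity
# that increases smoothly with distance.
coords_km = [0, 1, 2, 4, 7, 11, 16, 22]
geo = [[abs(a - b) for b in coords_km] for a in coords_km]
dissim = [[1 - exp(-d / 20) for d in row] for row in geo]

r_obs, p_val = mantel(geo, dissim)
```

Whole sites are permuted rather than individual matrix cells because the pairwise distances are not independent observations. With 999 permutations the smallest attainable p-value is 0.001, as reported in the table.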
30. 30
Case Study 2: a comparison of estuarine intertidal communities
Saltmarsh and seagrass plants Macroalgae Macroinvertebrates
rM = 0.625 rM = 0.316 rM = 0.064
Slope = -0.0021 Slope = -0.0020 Slope = -0.0003
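The slopes reported for the three groups describe how fast similarity decays with distance. A minimal least-squares fit in plain Python shows the calculation. The distances and similarities are hypothetical, built to lie exactly on a line, and the function name is ours.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical site pairs: similarity = 0.82 - 0.002 * distance.
distance_km = [10, 50, 100, 200, 300]
similarity = [0.80, 0.72, 0.62, 0.42, 0.22]

slope, intercept = fit_line(distance_km, similarity)
```

The fitted slope only describes the decay. Because site pairs are not independent, its significance should be assessed by permutation, as in the Mantel test, rather than from the usual regression p-value.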
31. 31
Software for Similarity/distance indices and Mantel tests
• R: vegan package (Oksanen et al. 2011, see Docs)
• PRIMER (Clarke & Gorley 2006; http://www.primer-e.com/)
• …
33. 33
ANALYZING BETA DIVERSITY: PARTITIONING THE
SPATIAL VARIATION OF COMMUNITY COMPOSITION
DATA (Legendre et al. 2005, Ecological Monographs)
• The variance of a dissimilarity matrix among sites (rM²) is not the variance of the
community composition,
• hence, partitioning on distance matrices should not be used to study the variation in
community composition among sites.
• Partitioning on distance matrices underestimated the amount of variation in
community composition explained by the raw-data approach.
• The proper statistical procedure for partitioning the spatial variation of community
composition data among environmental and spatial components, and for testing
hypotheses about the origin and maintenance of variation in community composition
among sites, is canonical partitioning.
• The Mantel approach is appropriate for testing other hypotheses, such as the variation
in beta diversity among groups of sites. Regression on distance matrices is also
appropriate for fitting models to similarity decay plots.
36. 36
Constrained (Canonical) Ordination Methods
• Ordination methods such as principal component analysis (PCA) are used to
reduce the variation in community composition in an ordination diagram.
(PCA uses an orthogonal transformation to convert a set of observations of possibly correlated
variables into a set of values of uncorrelated variables called principal components)
[Diagram: species matrix S (entries x11 … xmn) → PCA → components pc1, pc2, pc3, …; × environmental matrix AMB (entries x11 … xmq)]
• Constrained (Canonical) Ordination is a combination of ordination and multiple regression. It extracts continuous axes of variation from species abundance data in order to determine which portion of this variation is directly explained by environmental variables. The axes are constrained to be linear combinations of the environmental variables. The orthogonal directions found by PCA are only one particular choice, and other directions may well be better related to the environmental variables; Canonical Ordination addresses this.
Response models   Indirect   Direct Multivariate
Linear            PCA        Constrained Ordination: RDA
Unimodal                     Constrained Ordination: CCA
37. 37
Constrained (Canonical) Ordination Methods
Canonical Correspondence Analysis (CCA): species are assumed to have unimodal response surfaces with respect to compound environmental gradients. It is related to Correspondence Analysis and it is based on the Chi-squared distance.
[Figure: unimodal abundance curves of species a, b, c along an environmental gradient]
Redundancy Analysis (RDA): species are assumed to have linear response surfaces with respect to compound environmental gradients. Thus, RDA is a direct extension of multiple regression to the modelling of multivariate response data. It is related to PCA and it is based on Euclidean distances.
[Figure: linear abundance responses of species a, b, c along an environmental gradient]
38. 38
Spatial terms for Canonical Ordination Methods: trend surface
• Geographic distance d (for Mantel approaches)
• Trend surface model, built from the site coordinates x, y:
  Linear: x, y
  Cubic: x, y, xy, x², y², x²y, y²x, x³, y³
[Figures: linear and cubic trend surfaces, Z plotted over X and Y]
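The spatial terms of a trend surface are simply the monomials of the site coordinates up to a given degree. The sketch below generates them in plain Python. The function names and the compact labels (`x2y` for x²y) are ours, and the coordinates are hypothetical.

```python
def monomial_name(i, j):
    """Label x^i * y^j compactly, e.g. (2, 1) -> 'x2y' and (1, 0) -> 'x'."""
    x_part = "" if i == 0 else "x" if i == 1 else f"x{i}"
    y_part = "" if j == 0 else "y" if j == 1 else f"y{j}"
    return x_part + y_part


def trend_surface_terms(x, y, degree=3):
    """All monomials x^i * y^j with 1 <= i + j <= degree, for one site."""
    terms = {}
    for total in range(1, degree + 1):
        for i in range(total, -1, -1):
            j = total - i
            terms[monomial_name(i, j)] = (x ** i) * (y ** j)
    return terms


# Hypothetical site at coordinates x = 2, y = 3.
linear = trend_surface_terms(2.0, 3.0, degree=1)   # terms x, y
cubic = trend_surface_terms(2.0, 3.0, degree=3)    # the nine cubic terms
```

In practice the coordinates are usually centred first, and the resulting columns are then used as explanatory variables in the constrained ordination alongside the environmental variables.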
40. 40
Software for Canonical and Redundancy analysis, and Variation Partitioning:
• R: vegan package (Oksanen et al. 2011, see Docs)
• CANOCO (ter Braak and Smilauer 1998;
http://www.pri.wur.nl/uk/products/canoco/)
• …
41. 41
Case Study 1: Tree rainforest in Panama
[Pie charts: variation partitioning based on the Mantel test (Duivenvoorden et al. Science, 2002) and based on Canonical Correspondence Analysis (Chust et al. 2006. JVS*). Fractions: Spatial terms, Shared, Environment, Not explained. Values shown: 17%, 25%, 10%, 46%]
Conclusion: The distribution of Panamanian tree species appears
to be primarily determined by dispersal limitation, then by
environmental heterogeneity
*Chust, G., J. Chave, R. Condit, S. Aguilar, S. Lao, & R. Pérez (2006) Determinants and spatial modeling
of beta-diversity in a tropical forest landscape in Panama. Journal of Vegetation Science 17: 83-92.
44. 44
4th Corner Method (Legendre et al. 1997)
• The fourth-corner method tests for associations between biological traits and habitat at the locations where the corresponding species are found.
• How do the biological and behavioral characteristics of species determine their relative locations in an ecosystem?
• e.g. are the modes of dispersal related to habitat fragmentation?
Matrix            Content             Example
A (sp × sites)    Presence/Absence    245 sp × 78 sites
B (sp × trait)    Traits              3 life forms, 4 types of dispersal
C (var × sites)   Environment         Fragmentation
D (var × trait)   D = C * A’ * B
• test F (global)
• Correlation r
Legendre, P., Galzin, R. & Harmelin-Vivien, L. (1997) Relating behavior to habitat: solutions to the fourth-corner
problem. Ecology, 78, 547–562.
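The matrix product D = C * A’ * B can be worked through on a very small example. The sketch below uses plain Python lists. The three species, four sites, two dispersal types and single fragmentation variable are hypothetical, and the permutation tests used to assess D (global F, correlation r) are not implemented here.

```python
def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(p, q):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)]
            for row in p]


# A: species x sites presence/absence (3 species, 4 sites).
A = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]

# B: species x traits, dummy-coded (columns: wind-dispersed, animal-dispersed).
B = [[1, 0],
     [1, 0],
     [0, 1]]

# C: variables x sites (one variable: 1 = fragmented site, 0 = not).
C = [[1, 1, 0, 0]]

# D: variables x traits, the "fourth corner".
D = matmul(matmul(C, transpose(A)), B)
```

Here D is a 1 × 2 matrix: it counts how many occurrences in fragmented sites belong to wind-dispersed species (3) and to animal-dispersed species (0).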
45. 45
Case study 1: Coral reef fish data
• Biological and behavioural traits
• Environmental variables:
Bottom type
Depth
…
Legendre et al. 1997
46. 46
Case study 2: Plant traits
3 life forms
Habitat fragmentation
4 types of dispersal
47. 47
Case study: Plant traits
Test F
Fragmentation
Correlation
Interpretation: The effects of fragmentation of scrubland on scrub species community are related to the dispersal type
Interpretation: Wind-dispersed species are positively related to the defragmentation
48. 48
Case study: Plant traits
[Figure: number of species in scrublands (woody plants, annual herbs, animal-dispersed, wind-dispersed) against fraction of scrubland (0-33, 34-66, 67-100 %), i.e. fragmentation]
Interpretation: Wind-dispersed and annual species are positively related to the defragmentation of scrublands
Chust, G., A. Pérez-Haase, J. Chave, & J. Ll. Pretus. (2006) Linking floristic patterns and plant traits of Mediterranean communities in
fragmented habitats. Journal of Biogeography 33: 1235–1245.