Integrated GIS/Machine-Learning Workflows
for Modeling Spatiotemporal Variations in Potential Seagrass Habitats within a Changing Climate, European Geosciences Union General Assembly Paper ESSI4.3 – EGU2018-10081, Vienna, Austria, April 2018.
Coastal marine plant habitats are impacted by changes in ocean conditions, and the resulting changes in plant
populations can produce positive climate feedbacks that exacerbate warming (Waycott et al., 2009). One such
example is seagrasses, marine plants that can sequester vast amounts of carbon. When compared to tropical
terrestrial forests, seagrasses can store up to 100 times more CO2 at a rate that is 12 times faster (Mcleod et al.,
2011). Understanding the future of an important biological carbon sink such as seagrass can shed light on
the future carbon balance. Modeling the relationships between seagrass occurrence and ocean conditions, current and
future, can aid in quantifying the impacts on future carbon balance. In this work, we use an integrated GIS and
machine learning approach to build a data-driven model of seagrass presence-absence in a changing climate. We
quantify the relationships between observed seagrass occurrence and ocean conditions. This relationship allows us
to delineate patterns in current ocean conditions that promote favorable seagrass habitats. We pose this relationship
as a binary classification problem and use Random Forest to establish a relationship for seagrass occurrence.
This relationship is projected into the future under changing ocean conditions. We use deep-learning methods,
recurrent neural networks, to forecast ocean conditions as the oceans get warmer and use these conditions in
conjunction with the Random Forest model to predict the abundance of future seagrass habitats. We integrate
multiple data sources including fine-scale seagrass data from MarineCadastre.gov and the recently released,
globally extensive, publicly available Ecological Marine Units (EMU) dataset. In addition, we use global ocean
models from NOAA to calibrate our ocean forecasts. Our analysis includes a sensitivity study which investigates
the vulnerability of seagrass to changes in specific ocean variables. We use the proposed model to provide an
upper bound of the amount of carbon that can be stored in seagrasses as ocean conditions change. Finally, we
use a Getis-Ord Gi* statistic within a space-time window to quantify the temporal changes in potential seagrass
habitats.
Integrated GIS/Machine-Learning Workflows - Seagrass Use Case
1. Integrated GIS/Machine-Learning Workflows
for Modeling Spatiotemporal Variations in Potential Seagrass
Habitats within a Changing Climate
Orhun Aydin, Kevin Butler, and Dawn Wright
Environmental Systems Research Institute
Redlands, California, USA
oaydin@esri.com
ESSI4.3 – EGU2018-10081
2. 1 acre of seagrass = 1000s of fish + 10,000,000s of invertebrates
(Bloomfield & Gillanders, 2005, Estuaries & Coasts)
What will be the fate of seagrasses
worldwide as oceans warm?
600,000 km² coverage (Duarte et al., 2005, Aquatic Ecosystems: Trends and Global Prospects)
30-100 times more CO2 captured compared to tropical forests (Irving et al., 2011, PLOS ONE)
3. Data at esriurl.com/emudata, Story Map at esriurl.com/emustory
Ecological Marine Units (EMU)
… www.esri.com/ecological-marine-units
Transitioning to Ecological Coastal Units (ECUs) in 2018…
Friday ESSI4.3 Poster X1.37
EGU2018-6938
5. ① Quantify the relationship
between observed seagrass
occurrence and long-term ocean
conditions (EMU data)
6. ① Quantify the relationship between
observed seagrass occurrence and
existing ocean conditions
• Created continuous surfaces of
EMU data using Empirical Bayesian
Kriging
• Used temperature, salinity,
dissolved oxygen and nutrients to
train a random forest classifier on
U.S. seagrass occurrence data
(yielded 97.8% classification
accuracy). Predicted globally.
• Full details about the RF model
available in a Jupyter notebook:
https://github.com/orhuna/RF-Demo/tree/master
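The classification step on this slide can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic stand-ins for the interpolated EMU variables; the presence rule, the variable ranges, and every name below are invented for illustration and are not taken from the authors' notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for interpolated ocean variables (assumed ranges).
temperature = rng.uniform(0, 30, n)
salinity = rng.uniform(30, 38, n)
oxygen = rng.uniform(2, 8, n)
nitrate = rng.uniform(0, 20, n)
feature_names = ["temperature", "salinity", "oxygen", "nitrate"]
X = np.column_stack([temperature, salinity, oxygen, nitrate])

# Invented presence rule, used only to give the classifier something to learn.
y = ((temperature > 12) & (temperature < 26) & (nitrate < 10)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Binary presence/absence classifier.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
importances = dict(zip(feature_names, clf.feature_importances_))
print(f"held-out accuracy: {accuracy:.3f}")
print(importances)
```

The held-out accuracy here says nothing about the 97.8% reported on the slide; it only shows the shape of the train, evaluate, and inspect-importance loop.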
10. Predicting Ocean Conditions: Spatial or Aspatial?
[Figure axes: Temperature (°C), Salinity (ppm)]
• A regression model applied to data
globally cannot capture different
characteristics in the data
• Relationships between ocean
conditions vary with space (e.g., the
relationship in the Mediterranean is
different from the one in the North
Sea)
• How can we better identify these
regions before creating a predictive
model?
11. Defining Data-Driven Neritic Regions
• Ocean regions based on
• Temperature
• Salinity
• Phosphate
• Silicate
• Nitrate
• Oxygen
• Without supervision, Spatially
Constrained Clustering identified six
regions:
• Land-locked seas
• Southern coasts of Americas
• Coast of Antarctica
• Northern Coasts
• Warmer and nutrient rich coasts
• Asian portion of the Ring of Fire
12. Predicting Ocean Conditions in Each Region
[Figure panels: Southern coasts of the Americas; Land-locked seas; Coast of Antarctica; Warmer and nutrient rich coasts; Northern coasts; Asian portion of the Ring of Fire]
13. ③ Simulate future ocean
conditions and determine
impacts on seagrass occurrence
14. Predicting Seagrass Occurrence In Warming Oceans
Predicted Seagrass Habitats (Present)
Observed Seagrass Distribution (Short et al. 2007,
J Experimental Marine Bio & Ecol.)
[Legend: low density to high density]
34. Predicting Seagrasses in Warming Oceans
• Seagrass occurrence predicted for a
gradual ocean temperature increase
• Steps of 0.1°C increase up to 2°C
• Changes in seagrass habitats are
mined using Emerging Hot Spot
Analysis (EHSA)
• Australia could lose significant
seagrass habitat
• Arctic (e.g., Siberia) could potentially
gain
35. Conclusions, Discussion, and Future Work
• Seagrass occurrence predicted for a gradual
increase in ocean temperature
• Steps of 0.1°C increase up to 2°C
• Australia could lose significant seagrass
habitat
• Arctic becomes a favorable location for
seagrass habitat
• However, seagrass seeds still need to
travel
• Zostera marina species endangered
• Grows in Hudson Bay, Australia and
Polynesia
• Global warming is destroying mechanisms
that actively keep it in check (such as
seagrass)
• Current-day temperature range is preserved
in the analysis
• Random Forest regressor is truncated
to avoid extrapolation
• Model does not take into account human-
induced changes to nutrients such as
fertilizer run-off
• Acidification and amount of colloids in the
water column will be modelled to build a
new seagrass habitat model
36. Resources
• Download this presentation at esriurl.com/eguseagrass and contact Orhun Aydin,
oaydin@esri.com, for more info; see also the AGU Eos article
eos.org/articles/rising-ocean-temperatures-threaten-carbon-storing-sea-grass
• Github repo
github.com/orhuna/RF-Demo/tree/master
• Learning Module with Jupyter Notebook
http://learn.arcgis.com/en/projects/predict-seagrass-habitats-with-machine-learning/
• Main GeoAI repo
github.com/ArcGIS/geo-ai
Email github_admin@esri.com for access
• Friday ESSI4.3 Poster X1.37, EGU2018-6938, Drew Stephens presenting
37. References
1. Bloomfield, A. L., & Gillanders, B. M. (2005). Fish and invertebrate assemblages in seagrass,
mangrove, saltmarsh, and nonvegetated habitats. Estuaries and Coasts, 28(1), 63-77.
2. Duarte, C. M., Borum, J., Short, F. T., & Walker, D. I. (2005). Seagrass ecosystems: their global
status and prospects. In N. V. C. Polunin (Ed.), Aquatic Ecosystems: Trends and Global Prospects.
Cambridge University Press.
3. Irving, A. D., Connell, S. D., & Russell, B. D. (2011). Restoring coastal plants to improve global
carbon storage: reaping what we sow. PLoS One, 6(3), e18311.
4. Short, F., Carruthers, T., Dennison, W., & Waycott, M. (2007). Global seagrass distribution and
diversity: a bioregional model. Journal of Experimental Marine Biology and Ecology, 350(1), 3-20.
Editor's notes
Seagrasses in Numbers…
1 acre provides habitat for 1000s of fish and 10s of millions of invertebrates
Large coverage of 600,000 sq km on shallow seabeds throughout the world
Can capture 30-100 times more CO2 than tropical forests, which matters in our battle against greenhouse gas emissions, and they can do this TWELVE TIMES FASTER
This graph from NOAA shows a constant increase in sea temperature. We want to understand what may happen to seagrass habitats as oceans get warmer.
To answer the question in the title: Yes it is! Kudos to the EGU audience as many are directly involved in collecting and managing the data that shows this
“It stands to reason that as the atmosphere warms from the buildup of greenhouse gases, so does the ocean. Scientists have long suspected this was true, but they did not have enough solid evidence. Now they do. Data compiled by Marinexplore (now PlanetOS) in Sunnyvale, Calif., not only confirm previous studies that the world's oceans are simmering, but they also bring surprising news: the heating extends beyond the first few meters of surface waters, down to 700 meters…”
Courtesy planetos.com
The Ecological Marine Units (EMU) project is a new undertaking commissioned by the Group on Earth Observations (GEO) as a means of developing a standardized and practical global ecosystems classification and map for the oceans, and thus a key outcome of the GEO Biodiversity Observation Network (GEO BON). The project is one of four components of the new GI-14 GEO Ecosystems Initiative within the GEO 2016 Transitional Work Plan, and is intended for eventual use by the Global Earth Observation System of Systems (GEOSS). The project is also the follow-on to a comprehensive Ecological Land Units project (ELU), also commissioned by GEO.
The EMU comprises a global point mesh framework, created from 52,487,233 points from the NOAA World Ocean Atlas; spatial resolution is ¼° by ¼° by varying depth; temporal resolution is currently decadal; each point has x, y, z, as well as six attributes of chemical and physical oceanographic structure (temperature, salinity, dissolved oxygen, nitrate, silicate, phosphate) that are likely drivers of many ecosystem responses. We implemented a k-means statistical clustering of the point mesh (using the pseudo-F statistic to help determine the number of clusters), allowing us to identify and map 37 environmentally distinct 3D regions (candidate ‘ecosystems’) within the water column. These units can be attributed according to their productivity, direction and velocity of currents, species abundance, global seafloor geomorphology (from Harris et al.), and much more.
A series of data products for open access will share the 3D point mesh and EMU clusters at the surface, bottom, and within the water column, as well as 2D and 3D web apps for exploration of the EMUs and the original World Ocean Atlas data. A global delineation of Ecological Coastal Units (ECU) at a much finer spatial resolution is now underway, and a future project will delineate global ecological freshwater ecosystems (EFUs).
We will also be exploring how to conceptually and spatially connect EMUs, ELUs, and EFUs at the ECU interface.
The workflow includes establishing a relationship between seagrass occurrence and ocean conditions
and predicting ocean variables into the future using the 6 major parameters of the EMUs (which in turn are based on NOAA’s World Ocean Atlas).
We used a random forest (generic ML) methodology as a classifier to establish a relationship between where seagrass occurs and the ocean conditions at that location.
We then forecasted these variables into the future, assuming the case where oceans get gradually warmer.
Quantify the relationship between observed seagrass occurrence and long-term ocean conditions (EMU data). Blue font marks data sources. We have seagrass habitat polygons from Marine Cadastre and ocean measurements from Esri’s open-source EMU dataset.
Quantify the relationship between temperature and other EMU variables to simulate future ocean conditions. Our goal is to simulate temperature increase in order to know how increased temperatures might impact seagrass habitat, in association with other ocean variables.
Implement the seagrass simulation model based on random forests ML to estimate where seagrasses can grow under these new conditions, where temperatures are slightly increased. We increase the temperature in 0.1 degree increments to see what the impact of the increased temperature is on the other EMU variables and then reevaluate seagrass occurrence under these new ocean conditions.
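The temperature-stepping loop described in these notes might look like the following minimal sketch. It assumes scikit-learn, a single companion variable (salinity), and synthetic data with an invented presence rule; the model names are hypothetical, and the real workflow uses all six EMU variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(3)
n = 1000

# Synthetic present-day conditions (assumed relationship and ranges).
temperature = rng.uniform(5, 28, n)
salinity = 33 + 0.1 * temperature + rng.normal(0, 0.2, n)
present = ((temperature > 12) & (temperature < 24)).astype(int)

# Step 2 stand-in: regress the companion variable on temperature.
salinity_model = RandomForestRegressor(n_estimators=100, random_state=0)
salinity_model.fit(temperature.reshape(-1, 1), salinity)

# Step 1 stand-in: presence/absence classifier on current conditions.
habitat_model = RandomForestClassifier(n_estimators=100, random_state=0)
habitat_model.fit(np.column_stack([temperature, salinity]), present)

# Step 3: warm the ocean in 0.1 degree steps up to 2 degrees and re-evaluate.
steps = np.linspace(0.0, 2.0, 21)
suitable_fraction = []
for delta in steps:
    warm_t = temperature + delta
    warm_s = salinity_model.predict(warm_t.reshape(-1, 1))
    future = np.column_stack([warm_t, warm_s])
    suitable_fraction.append(habitat_model.predict(future).mean())

for delta, frac in zip(steps, suitable_fraction):
    print(f"+{delta:.1f} degrees: {frac:.3f} of locations suitable")
```

Each entry of `suitable_fraction` corresponds to one warming step, which is the kind of per-step snapshot that a later space-time analysis can consume.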
Quantify the relationship between observed seagrass occurrence and long-term ocean conditions (EMU data).
EBK – Empirical Bayesian kriging (EBK) is a geostatistical interpolation method that automates the most difficult aspects of building a valid kriging model. Other kriging methods in the ArcGIS Geostatistical Analyst extension require manual adjustment of parameters to generate accurate results. However, EBK automatically calculates these parameters through a process of subsetting and simulations.
Even with an accuracy of 97.8%, we need to think about the applicability of the model we built to the entire dataset. Just like any other statistical model, Random Forests have their limitations. Although they are powerful interpolators, they are poor extrapolators. Thus, we cannot accurately predict areas where oceanic conditions fall outside the bounds of U.S. coastal conditions. To deal with this limitation, we eliminated areas close to the Poles that fell outside of the bounds of the training data.
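One simple way to flag locations where a prediction would be an extrapolation is a per-variable range check against the training data. The sketch below is an assumption about how such a mask could be built, not the authors' procedure; the function name and the example values are hypothetical.

```python
import numpy as np

def within_training_bounds(X_train, X_new):
    """Return True for rows of X_new whose every feature lies inside
    the min/max range seen in X_train (a per-variable bounding box)."""
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    return np.all((X_new >= lo) & (X_new <= hi), axis=1)

# Columns: temperature, salinity (made-up values for illustration).
X_train = np.array([[10.0, 33.0], [25.0, 36.5], [18.0, 35.0]])
X_new = np.array([
    [15.0, 34.0],   # inside both ranges
    [-1.5, 34.0],   # colder than anything in training (polar-like)
    [20.0, 39.0],   # saltier than anything in training
])

mask = within_training_bounds(X_train, X_new)
print(mask)
```

Rows where the mask is False would be excluded from prediction rather than assigned a label the model has no basis for.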
Seagrass occurrence under present conditions. Interactive maps available at: https://www.arcgis.com/home/webmap/viewer.html?webmap=d7b5bd75f6494894a8183e601d86939f&extent=-170.1562,-59.0887,174.375,77.1167
To forecast ocean conditions into the future (and so determine the impact of a changing ocean on seagrass presence), we need to understand the relationship between temperature (the variable we are going to change) and the other EMU variables. Looking at this scatterplot matrix, we can see that the relationships are complex (no neat, clear linear relationships). This could be because the relationships are truly complex or because there are different relationships in different parts of the globe (spatial regimes).
Looking at subsets of this scatterplot, we see clearer relationships (e.g. the yellow and green lines). How can we better identify these regions before creating a predictive model? GIS to the rescue!
Given that relationships between temperature and the other ocean variables in the scatter plots were not likely to show an easy linear relationship, we turned to spatial clustering to create data-driven regions.
Using an unsupervised machine learning technique available in ArcGIS Pro – Spatially Constrained Multivariate Clustering – we identified six unique coastal regions. We determined the optimum number of clusters based on the Pseudo-F statistic.
Regions are based on all six EMU variables. Very similar to the process used in creating the EMUs except in this slide, we used only shallow, nearshore EMUs (< 90 meters deep), and we imposed spatial constraints. Individual EMU points can’t be combined into a cluster unless they are adjacent to one another. This analytical result is very interesting in and of itself!
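The idea of contiguity-constrained clustering with a pseudo-F criterion can be sketched outside ArcGIS Pro. The example below swaps in scikit-learn's Ward agglomerative clustering with a k-nearest-neighbour connectivity graph, which is a different algorithm from the Spatially Constrained Multivariate Clustering tool, and scores each candidate cluster count with the Calinski-Harabasz index (another name for the pseudo-F statistic). The coastline, attributes, and neighbour count are synthetic assumptions.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import calinski_harabasz_score
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic "coastline": 150 points along a line, in three stretches.
coords = np.column_stack([np.linspace(0, 30, 150), np.zeros(150)])
true_region = np.repeat([0, 1, 2], 50)

# Two attributes per point (temperature, salinity) with region-specific means.
region_means = np.array([[5.0, 34.0], [15.0, 35.0], [25.0, 36.0]])
attributes = region_means[true_region] + rng.normal(0, 0.3, (150, 2))
Z = StandardScaler().fit_transform(attributes)

# Spatial constraint: points may only merge with nearby points.
connectivity = kneighbors_graph(coords, n_neighbors=4, include_self=False)

def cluster(k):
    model = AgglomerativeClustering(
        n_clusters=k, linkage="ward", connectivity=connectivity
    )
    return model.fit_predict(Z)

# Choose the number of regions that maximizes the pseudo-F statistic.
pseudo_f = {k: calinski_harabasz_score(Z, cluster(k)) for k in range(2, 7)}
best_k = max(pseudo_f, key=pseudo_f.get)
labels = cluster(best_k)
print(best_k, {k: round(v, 1) for k, v in pseudo_f.items()})
```

Because merges are restricted to spatial neighbours, every resulting region is contiguous, which mirrors the adjacency requirement described above.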
How well did machine learning do in predicting the relationships between temperature and the other EMU variables? Recall that we want to be able to increase temperature. If we increase by 0.2 degrees to model future ocean conditions, what will the future salinity look like? Nutrients? We used four machine learning techniques to model the current relationship of temperature to the other EMU variables.
RANSAC (Random Sample Consensus): This is a linear regression method favored when there are outliers in the data.
SVM (Support Vector Machines): They are favored to model non-linear relationships.
MLP (multilayer perceptron, a type of neural network): This is a deep neural network that consists of numerous layers. Like SVMs, MLPs are used to capture complex relationships in the data.
RF (Random Forest Regression): Unlike the other regression methods, RF is tree-based. It is an ideal method to use when the relationship between variables is a scatter rather than a (non-)linear curve.
The blue dots on these scatterplots represent the actual data. The gold dots represent the predictions from random forest regression. Support vector machine estimates are shown in red and RANSAC estimates in green. Even from a quick examination of these scatterplots, it is clear that Random Forest outperformed the other three machine learning techniques.
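A comparison of the four regressors named above can be sketched with scikit-learn. The data are synthetic, the temperature-salinity relation is invented, and the hyperparameters are arbitrary assumptions, so the ranking this prints should not be read as a reproduction of the slide's result.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import RANSACRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)

# Invented non-linear relation between temperature and salinity.
temperature = rng.uniform(0, 30, 600)
salinity = 34 + 1.5 * np.sin(temperature / 5) + rng.normal(0, 0.2, 600)

X = temperature.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, salinity, random_state=0
)

# The MLP target is standardized so the network does not have to learn
# a large offset; inputs are standardized for SVM and MLP.
mlp = TransformedTargetRegressor(
    regressor=make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
    ),
    transformer=StandardScaler(),
)

models = {
    "RANSAC": RANSACRegressor(random_state=0),  # robust linear fit
    "SVM": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    "MLP": mlp,
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Coefficient of determination on held-out data for each method.
r2 = {name: m.fit(X_train, y_train).score(X_test, y_test)
      for name, m in models.items()}
for name, score in r2.items():
    print(f"{name}: R^2 = {score:.3f}")
```

Scoring on held-out data gives a numeric counterpart to the visual comparison of the scatterplots.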
To check the validity of our predicted seagrass habitats, we compared the map of our predictions to a map produced by Short et al., 2007. There is a reasonable amount of similarity. Differences can, in part, be attributed to the fact that our model was limited to a depth of ~90 meters and was trained on a limited set of seagrass species which grow off the US coasts.
This is an indication that the Random Forest classifier we used to model seagrass habitats is useful. Now we apply this model to future oceans to predict how seagrass presence changes as oceans get warmer.
Given that the Random Forest classifier we used to model seagrass habitats is useful, we apply this model to future oceans to predict how seagrass presence changes as oceans get warmer.
So the result here is a snapshot of seagrass habitat suitability as temperatures increase.
We used a tool called Emerging Hot Spot Analysis (EHSA) to mine the changes in spatial/temporal patterns in seagrass habitat suitability.
There are so many categories in the legend because the EHSA tool computes the change of a statistic over time, discretized into hexagons, and classifies it into 17 different spatio-temporal signatures. EHSA always returns all of these categories, which have very specific meanings for the change of the Gi* statistic over time.
In this SIMULATION, as oceans get warmer, portions of the Arctic (e.g., Siberia) become a prime location for seagrass.
Australia’s seagrass habitats are threatened, with implications for its fisheries. A 1.4°C increase in temperature is a tipping point at which Australian coasts become unsuitable for seagrass.
Basically an example of incorporating 3rd party libraries from Python and integrating them into a big spatial workflow
Examining the previous maps carefully shows that the patterns of seagrass loss are temporally complex. For example, areas off the east coast of Greenland are consistently unsuitable for seagrass regardless of the increase in temperature. However, areas off the southeast coast of South America initially show an increase in seagrass occurrence; the amount of seagrass then decreases for a while and, in the most recent time periods (when the temperature increase is approaching 2.0 degrees), increases again. To summarize these complex temporal patterns, we created a space time cube from the expected seagrass counts at each location. Using the Emerging Hot Spot Analysis tool, we identified hot and cold spots of seagrass. Hot spots are areas that have significantly more seagrass than expected, and cold spots have significantly less seagrass than expected. This tool also looks at the intensity of the clustering over time. The clustering can intensify over time, stay the same, or decrease. Based on the type of clustering (hot spots indicate suitable areas for seagrass growth) and the trend in the intensity of the clustering, each location was categorized as indicated in the legend to the right.
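The hot and cold spots described here rest on the Getis-Ord Gi* statistic, a z-score comparing the local sum of a value around each location with what the global mean would predict. The following is a minimal NumPy sketch for a single time slice; the nine-location chain, the binary neighbour weights, and the seagrass counts are invented for illustration, and the trend classification that EHSA applies across time slices is not included.

```python
import numpy as np

def getis_ord_gi_star(values, weights):
    """Gi* z-score per location.

    values  : 1-D array of the attribute (e.g., expected seagrass count)
    weights : (n, n) spatial weights matrix whose diagonal is non-zero,
              because Gi* includes each location in its own neighbourhood
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)  # global standard deviation
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Nine locations along a line; neighbours are adjacent cells plus self.
counts = np.array([9, 8, 9, 1, 1, 1, 1, 1, 1], dtype=float)
idx = np.arange(counts.size)
weights = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(float)

z = getis_ord_gi_star(counts, weights)
print(np.round(z, 2))
```

Large positive z-scores mark candidate hot spots and large negative ones candidate cold spots; repeating the calculation for every time slice of a space time cube gives the series whose trend is then classified.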