Bio-inspired techniques and their application to precision agriculture (Andre...
Bio-inspired algorithms for clustering and visualization of geospatial data
1. Faculté des Hautes Etudes Commerciales (HEC)
Institut des Systèmes d'information (ISI)
Bio-inspired algorithms for clustering and
visualization of geospatial data
Miguel Arturo Barreto Sánz
2. Outline
● Bio-inspired algorithms?
● Challenges in clustering and visualization of geospatial data
● Bio-inspired algorithms used in clustering and visualization of geospatial data
● Conclusions
3. 1. Bio-inspired?
Speedo's "Fastskin" suit, inspired by shark skin
Aerodynamic surfaces for vehicles: technologies inspired by sharks
By Tracy Staedter, Feb 2009, Discovery News
4. 1. Bio-inspired?
Inspired by human skin
A clear version of Touchco's multitouch sensor platform
By Nick Bilton, Dec 30 2009, The New York Times
Sensors pick up the pressure of a hand placed on a Touchco device.
Sensors capture the variation in pressure levels of a pencil drawing.
5. 1. Bio-inspired?
• Nature has been innovating, inventing, testing, validating, improving, and
diversifying living systems for hundreds of millions of years.
• The bio-inspired approach is based on studying nature's "inventions" and
"tricks" in order to draw inspiration and create solutions (this does not
necessarily mean copying).
• Countless examples of "natural" engineering solutions are already used to
develop new materials, artificial retinas, etc.
Andres Perez-Uribe
6. 1. Bio-inspired?
Sources of inspiration
[Diagram: Evolution, Self-organization, Learning, and Emergence arranged by time scale (short term to long term) and by level (individual to populations)]
8. 1. Bio-inspired?
Self-organization
The rat whisker-barrel system
It is also the rat's sensory system of choice for exploring the environment and collecting information
about the location, shape, size and texture of objects around it. The system is well suited to examining
neural coding issues because of its functional efficiency and its elegant structural organization. The
whisker area of somatosensory cortex (known as barrel cortex) is arranged as a topographic map of
the whiskers. This means that sensory signals arising in one whisker are channelled through a
restricted population of neurons and can be sampled by an electrode at different stages of the sensory
system.
9. 1. Bio-inspired?
Bio-inspired clustering
Neural networks have solved a wide range of
problems and have good learning capabilities.
Their strengths include adaptation, ease of
implementation, parallelization, speed, and
flexibility.
Bio-inspired clustering is closely related to the
concept of competitive learning.
10. 1. Bio-inspired?
Bio-inspired clustering
Hard and soft competitive learning
Hard …
a) k initial "means"
b) k clusters are created by associating every observation with the nearest mean
c) The centroid of each of the k clusters becomes the new mean
d) Steps b) and c) are repeated until convergence has been reached.
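The four steps of hard competitive learning listed above match the classic k-means procedure. The following is a minimal sketch in plain Python, not code from the presentation. The function name `kmeans`, the helper names, the sample `POINTS`, and the choice of drawing the initial means from the data itself are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Component-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # a) k initial means (assumption: sampled from the observations)
    means = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # b) associate every observation with the nearest mean
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, means[j]))
            clusters[nearest].append(p)
        # c) the centroid of each cluster becomes the new mean
        #    (an empty cluster keeps its previous mean)
        new_means = [centroid(c) if c else means[j]
                     for j, c in enumerate(clusters)]
        # d) repeat until the means stop changing
        if new_means == means:
            break
        means = new_means
    return means, clusters


# Hypothetical observations: two well-separated groups of three points.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, clusters = kmeans(POINTS, k=2)
print(means)
print(clusters)
```

Each observation belongs to exactly one cluster here, which is what makes the assignment "hard": only the winning mean is affected by an observation.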
11. 1. Bio-inspired?
Bio-inspired clustering
Hard and soft competitive learning
Soft …
$m_i \leftarrow m_i + \alpha(t)\, h_{ci}(t)\, (x - m_i)$
The neighborhood function $h_{ci}(t)$ is centered over the best-matching
neuron $m_c$, which is shown as a black cell. The neighboring neurons
whose weights are recalculated because of this best match are shown in
gray. Other neurons are not affected.
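One step of the soft update rule above can be sketched as follows. This is an illustrative sketch, not the presenter's code. It assumes a rectangular grid stored as a dictionary and a "bubble" neighborhood (h = 1 inside a given grid radius around the best-matching neuron, 0 elsewhere), which fits the statement that other neurons are not affected. The names `som_step`, `alpha`, and `radius` are hypothetical.

```python
def som_step(weights, x, alpha, radius):
    """Apply one update m_i <- m_i + alpha * h_ci * (x - m_i) in place.

    weights: dict mapping grid position (row, col) to a list of floats.
    x:       input vector (list of floats).
    alpha:   learning rate at the current time step.
    radius:  neighborhood radius on the grid (assumed bubble neighborhood).
    Returns the grid position c of the best-matching neuron.
    """
    # Best-matching neuron: the one whose weight vector is closest to x.
    c = min(weights,
            key=lambda pos: sum((xi - wi) ** 2
                                for xi, wi in zip(x, weights[pos])))
    for pos in list(weights):
        # Chebyshev distance on the grid between neuron i and the winner c.
        grid_dist = max(abs(pos[0] - c[0]), abs(pos[1] - c[1]))
        h = 1.0 if grid_dist <= radius else 0.0
        weights[pos] = [wi + alpha * h * (xi - wi)
                        for xi, wi in zip(x, weights[pos])]
    return c


# Hypothetical 3x3 map with 2-dimensional weights.
grid = {(r, col): [0.0, 0.0] for r in range(3) for col in range(3)}
grid[(0, 0)] = [1.0, 1.0]
winner = som_step(grid, x=[1.0, 1.0], alpha=0.5, radius=1)
print(winner, grid[(0, 1)], grid[(2, 2)])
```

In contrast to the hard case, the winner's grid neighbors also move toward the input, which is what produces a topologically ordered map over many steps. In practice both `alpha` and `radius` are usually decreased over time.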
12. 1. Bio-inspired?
Bio-inspired clustering
Hierarchical self-organizing structures
● Self-organizing Hierarchical Feature Maps
● Adaptive Hierarchical Incremental Grid Growing
● Growing Hierarchical SOM
14. 2. Challenges in clustering and visualization of
geospatial data
Information received from
remote sensing systems
and environmental
monitoring devices is used in:
● Agro-ecology
● Environmental change
● Species distribution
● Disease propagation
● Urban dynamics
● Migration patterns
15. 2. Challenges in clustering and visualization of
geospatial data
The special nature of spatio-temporal data poses several
challenges to clustering and visualization.
For instance:
1. Visualizing clusters in both geographic and feature space
2. Spatial and temporal relationships that exist at various
levels (scales)
3. Handling fuzzy boundaries in geospatial clusters
4. The temporal context in which some variables are involved
5. The high dimensionality of geospatial data sets
6. The large quantity of data
16. 2. Challenges in clustering and visualization of geospatial data
Geographic space and feature space
Geographic space is concerned with surface features, such as the terrain
we walk on.
Feature space is concerned with the representation of similarities
associated with geo-referenced sites in the geographic space.
[Figure panels: Geographic space, Feature space]
17. 2. Challenges in clustering and visualization of geospatial data
Geographic space and feature space
The clusters found in the
feature space in many
cases are not the same as
those found in geographic
space.
Represent clusters of a
multidimensional space:
map multidimensional data
onto a two-dimensional
lattice of cells.
[Figure: Similarity of sugarcane growing environmental conditions (1999-2005) using self-organizing maps]
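A small sketch can make the difference between the two spaces concrete. The sites, coordinates, and climate values below are invented for illustration only, and the function name `nearest` is hypothetical. The example shows that the closest site on the map need not be the closest site in feature space.

```python
import math

# Hypothetical sites. "geo" holds map coordinates in km; "features" holds
# (mean temperature in degrees C, annual rainfall in hundreds of mm).
# Real feature vectors would normally be normalized before comparison.
SITES = [
    {"name": "A", "geo": (0.0, 0.0), "features": (24.0, 12.0)},
    {"name": "B", "geo": (5.0, 0.0), "features": (12.0, 30.0)},
    {"name": "C", "geo": (400.0, 300.0), "features": (24.5, 12.5)},
]


def nearest(name, space):
    """Name of the site closest to `name` in the given space.

    space is either "geo" (geographic space) or "features" (feature space).
    """
    target = next(s for s in SITES if s["name"] == name)
    others = [s for s in SITES if s["name"] != name]
    closest = min(others, key=lambda s: math.dist(s[space], target[space]))
    return closest["name"]


print(nearest("A", "geo"))       # neighbor of A on the map
print(nearest("A", "features"))  # site most similar to A in climate
```

Site B is next to A on the map but has a very different climate, while the distant site C is A's nearest neighbor in feature space. A clustering run on the features would therefore group A with C, not with B.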
18. 2. Challenges in clustering and visualization of geospatial data
Heterogeneity in scales
Methodologies are needed to
evaluate clusters at
different scales in order
to find "interesting"
patterns between levels.
Improve the analysis of
cluster structure at
different scales by
creating representations
of the clusters that facilitate
the selection of clusters
at different scales.
[Figure panels: Geographic space, Feature space]
19. 2. Challenges in clustering and visualization of geospatial data
Boundaries in geospatial data
[Figure panels: Crisp, Fuzzy]
Algorithms for clustering spatio-temporal
databases have to
consider the neighbors of the
geo-referenced data.
For instance, part of the complexity
of the problem lies in the fact that
the boundaries of these neighbors
are not hard, but rather soft
boundaries.
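The contrast between crisp and fuzzy boundaries can be sketched with the membership formula used in fuzzy c-means, u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)). This formula is brought in here only as an illustration of soft boundaries; the slides do not state which membership function is used. The names `fuzzy_memberships`, `crisp_label`, and the fuzzifier `m` are assumptions.

```python
import math


def crisp_label(point, centers):
    """Hard boundary: index of the single nearest cluster center."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))


def fuzzy_memberships(point, centers, m=2.0):
    """Soft boundary: degree of membership of `point` in every cluster.

    Uses the fuzzy c-means membership formula with fuzzifier m > 1.
    The returned degrees are between 0 and 1 and sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:
        # The point coincides with one or more centers.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists)
            for dj in dists]


# Hypothetical cluster centers and a site lying between them.
CENTERS = [(0.0, 0.0), (4.0, 0.0)]
print(crisp_label((1.0, 0.0), CENTERS))
print(fuzzy_memberships((1.0, 0.0), CENTERS))
print(fuzzy_memberships((2.0, 0.0), CENTERS))
```

With a crisp boundary the site at (1, 0) simply belongs to the first cluster. With fuzzy membership it belongs mostly to the first cluster and partly to the second, and a site exactly halfway between the centers belongs equally to both.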
20. 2. Challenges in clustering and visualization of geospatial data
Temporal relationships between spatial objects
The relationship between spatial objects can change over time.
These dynamic relationships can be observed, for instance, in the
way clusters change over time.
[Figure: Similarity of sugarcane growing environmental conditions (1999-2001) using self-organizing maps]
21. 3. Bio-inspired algorithms used in clustering and
visualization of geospatial data
Why use bio-inspired algorithms?
1. Discovering natural clusters in unlabeled data sets.
2. Reduction of information redundancy contained in the data.
3. The maximization of mutual information between the inputs
and the outputs of a network in the presence of noise.
4. To help discover nonlinear, local, or partial correlations
between variables.
5. To work with data with unknown distribution.
22. 3. Bio-inspired algorithms used in clustering and
visualization of geospatial data
A trivial case: finding zones with analogous precipitation and air temperature
in South America by using FGHSON
Reminder!
FGHSON: Fuzzy Growing Hierarchical Self-organizing Networks
23. 3. Bio-inspired algorithms used in clustering and visualization of geospatial data
A trivial case: finding zones with analogous precipitation and air temperature in South America by using
FGHSON
January
Air temperature
and precipitation
24. 3. Bio-inspired algorithms used in clustering and visualization of geospatial data
A trivial case: finding zones with analogous precipitation and air temperature in South America by using
FGHSON
January
Air temperature
and precipitation
25. 3. Bio-inspired algorithms used in clustering and visualization of geospatial data
Clusters of sites with similar
characteristics in time and space
For commercial (mass-production) crops (rice, corn), the
"when" and "where" are known.
For native crops (e.g. guanabana, lulo), this is not the case.
When and what must I cultivate?
Market demand
The COCH project
26. 3. Bio-inspired algorithms used in clustering and visualization of geospatial data
Clusters of sites with similar
characteristics in time and space
What crops or varieties are likely to perform well where and
when.
Soil, Climate, Genotype
(Source: Homologue)
Homologous places for Colombian coffee production:
Brazil, Ecuador, East Africa, and New Guinea.
27. 3. Bio-inspired algorithms used in clustering and visualization of geospatial data
Clusters of sites with similar
characteristics in time and space
Harvests of the same crop at different times
28. 3. Bio-inspired algorithms used in clustering and visualization of geospatial data
Using FGHSON to find analogous ecoregions through time
29. 3. Bio-inspired algorithms used in clustering and visualization of geospatial data
Using FGHSON to find analogous ecoregions through time
30. Conclusions (I)
• Discovering natural clusters in unlabeled data sets. The continuous updating,
large quantity, and diverse uses of geospatial data make it difficult to label
observations in order to define classes.
• Reduction of information redundancy contained in the data. Soft competitive
learning algorithms create prototypes of the observations. Hence, large data sets
can be reduced with little or no loss of information.
• The maximization of mutual information between the inputs and the outputs
of a network in the presence of noise. Usually, geospatial variables are measured
by instruments in difficult and uncontrolled environmental conditions (e.g. satellites,
meteorological stations).
• To help discover nonlinear, local, or partial correlations between variables.
Several soft competitive learning algorithms allow the projection of a high-dimensional
space onto a two-dimensional grid. This allows visual exploratory analysis of the
data, making it easier to discover nonlinear, local, or partial correlations.
• To work with data with unknown distribution. Many clustering algorithms have
been developed to deal with certain data distributions (e.g. Gaussian distributions).
Soft competitive learning algorithms are very useful when working with geospatial
data because they do not need to assume any data distribution.
31. Conclusions (II)
FGHSON
Advantages
1. FGHSON does not require the number of clusters to be set a priori.
This aspect is critical when dealing with geospatial data, because
it is usually not possible to estimate a priori the optimal number of
clusters that best represents a data set.
2. The membership of the observations in the clusters is fuzzy.
3. The final structure does not necessarily lead to a balanced hierarchy
(i.e. a hierarchy with equal depth in each branch). Therefore, areas of the
input space that require more units for appropriate data representation
create deeper branches than others. This is important when dealing with
geographic data, because in many cases some regions
must be represented in more detail.
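The idea of an unbalanced hierarchy (advantage 3) can be illustrated with a deliberately simplified sketch. This is not the FGHSON algorithm. It is a toy recursive splitter on one-dimensional data that keeps dividing a node only while its mean quantization error exceeds a threshold. The names `grow`, `leaf_depths`, and `threshold`, as well as the data values, are hypothetical.

```python
def grow(values, threshold, depth=0):
    """Build a hierarchy that is deeper where the data are more spread out.

    Each node stores a prototype (the mean of its values). A node is split
    into two children when the mean absolute deviation from the prototype
    (its quantization error) is larger than `threshold`.
    """
    prototype = sum(values) / len(values)
    error = sum(abs(v - prototype) for v in values) / len(values)
    node = {"prototype": prototype, "depth": depth, "children": []}
    if error > threshold:
        # error > 0 guarantees that both sides of the split are non-empty.
        left = [v for v in values if v <= prototype]
        right = [v for v in values if v > prototype]
        node["children"] = [grow(part, threshold, depth + 1)
                            for part in (left, right)]
    return node


def leaf_depths(node):
    """Depths of all leaves, left to right."""
    if not node["children"]:
        return [node["depth"]]
    return [d for child in node["children"] for d in leaf_depths(child)]


# Hypothetical data: a tight group near 0 and widely spread larger values.
DATA = [0.0, 0.1, 0.2, 5.0, 9.0, 13.0, 17.0]
tree = grow(DATA, threshold=0.5)
print(leaf_depths(tree))
```

The tight group near 0 is represented well by a single prototype and stops early, while the spread-out values need further splits. The resulting branches have different depths, which is the behavior the slide describes for regions that require a more detailed representation.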
32. Conclusions (III)
FGHSON
Advantages
4. The algorithm executes self-organizing processes that can be performed in
parallel. Hence, when dealing with large data sets, the tasks can be divided,
distributing the computational cost.
5. Software using the FGHSON algorithm in geosciences is in development.
6. The maps on individual layers cannot grow irregularly in shape, and they cannot
remove connections between neighboring units. In this way, information
about the input data is lost.
Disadvantages
1. FGHSON cannot project a high-dimensional space onto a two-dimensional
space.
2. FGHSON is a new algorithm.