Here are the steps I would take to answer these competency queries about a marine species:
1. Look up the scientific name in a taxonomic database such as WoRMS to get basic information about the species: kingdom, phylum, class, order, family, genus. This establishes the biological classification.
2. Search occurrence records in a global biodiversity database such as OBIS or GBIF to find locations where the species has been observed or collected. This gives insights into its natural environment and ecosystem associations.
3. Cross-reference the occurrence locations with geographic data sets such as marine ecoregions or countries to characterize its known range at different spatial scales.
4. Check nomenclature resources and literature for documented common names.
The iMarine initiative provides a data infrastructure aimed at facilitating open access, the sharing of data, collaborative analysis, processing and mining, as well as the dissemination of newly generated knowledge. The iMarine data infrastructure is developed to support decision making in high-level challenges that require the policy decisions typical of the ecosystem approach. The iMarine offering can be articulated in six bundles. A “bundle” is a set of services and technologies grouped according to a family of related tasks for achieving a common objective. Bundles can be customized and/or enriched into flexible, purpose-built Virtual Research Environments (VREs). VREs offer flexible and secure web-based, community-centric platforms, so researchers can work together on common challenges. Each VRE in the infrastructure is tightly integrated with the underlying gCube enabling software, and can access and re-purpose data from other iMarine applications.
1. 4th iMarine Board
Rome
17-18 October 2013
iMarine Products and services delivery
Pasquale Pagano (CNR)
iMarine Technical Director
pasquale.pagano@isti.cnr.it
2. Outline
Products and services development progress report
• BiolCube
• StatsCube
• GeosCube
• ConnectCube
Products and services catalogue at project conclusion
• Tiny selection of products
4. Application Bundles
• Management and interpretation of biological and ecological data in the environment
• Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools
• Storage and interpretation of geospatially explicit information, including WPS processing
• Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities
A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective
5. A fraction of the products and services belonging to BiolCube
PRODUCTS AND SERVICES DEVELOPMENT PROGRESS REPORT
6. Species Data Discovery
Search for multiple species
Search across several data providers
Search for all occurrences of a set of species and their synonyms
Search occurrences for all species belonging to a taxon group
7. Species Data Discovery
Search in GBIF for all occurrences of 'sarda sarda' and its synonyms found in WoRMS
• SEARCH BY SN 'sarda sarda' EXPAND WITH WoRMS IN GBIF RETURN Occurrence
Search in CoL for all taxa matching 'sarda sarda' and its synonyms found in WoRMS
• SEARCH BY SN 'sarda sarda' EXPAND WITH WoRMS IN CoL RETURN TAXON
Search for all occurrences of the species commonly known as 'shark' in WoRMS and of their synonyms as recognized by CoL. Accept only results with coordinates less than or equal to (15.12, 16.12).
• SEARCH BY CN 'shark' RESOLVE WITH WoRMS EXPAND WITH CoL WHERE coordinate <= 15.12, 16.12 RETURN Occurrence
Search in OBIS for all occurrences of 'sarda sarda' and 'Carcharodon carcharias', expanded with synonyms from WoRMS and CoL. Accept only results with an event date between 2000 and 2005.
• SEARCH BY SN 'sarda sarda', 'Carcharodon carcharias' EXPAND WITH WoRMS, CoL IN OBIS WHERE eventDate >= '2000' AND eventDate <= '2005' RETURN Occurrence
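The four example queries share a visible clause order (SEARCH BY, RESOLVE WITH, EXPAND WITH, IN, WHERE, RETURN). The following is a minimal sketch of a helper that assembles such strings. The function name `build_query` and its parameters are hypothetical, and the clause order is inferred only from the examples on this slide, not from a published grammar.

```python
def build_query(names, by="SN", resolve_with=None, expand_with=None,
                sources=None, where=None, returns="Occurrence"):
    """Assemble a query string in the style of the slide's examples.

    The clause order is an assumption inferred from the four examples.
    """
    quoted = ", ".join(f"'{name}'" for name in names)
    parts = [f"SEARCH BY {by} {quoted}"]
    if resolve_with:
        parts.append("RESOLVE WITH " + ", ".join(resolve_with))
    if expand_with:
        parts.append("EXPAND WITH " + ", ".join(expand_with))
    if sources:
        parts.append("IN " + ", ".join(sources))
    if where:
        parts.append("WHERE " + " AND ".join(where))
    parts.append(f"RETURN {returns}")
    return " ".join(parts)


# Rebuild the first and the last example of the slide.
q_gbif = build_query(["sarda sarda"], expand_with=["WoRMS"], sources=["GBIF"])
q_obis = build_query(
    ["sarda sarda", "Carcharodon carcharias"],
    expand_with=["WoRMS", "CoL"],
    sources=["OBIS"],
    where=["eventDate >= '2000'", "eventDate <= '2005'"],
)
print(q_gbif)
print(q_obis)
```

Building the string from named clauses keeps optional parts (resolution, expansion, filters) independent of each other, which mirrors how the examples combine them.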
8. Occurrence Points
[Diagram: occurrence data from GBIF (set A) and occurrence data from OBIS (set B) are combined with the operators Intersection (∩), Union (∪), Difference (-) and Duplicates Deletion (DD). Records are compared by similarity on x,y coordinates, event date, modification date, author and species scientific name.]
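The four operators can be sketched on small in-memory record lists. This is only an illustration: the record fields, the `key` function and the sample values are assumptions, and exact key equality is used here in place of the similarity comparison mentioned on the slide.

```python
from collections import namedtuple

# Hypothetical record layout; values below are made up for illustration.
Occurrence = namedtuple("Occurrence", "scientific_name x y event_date source")


def key(rec):
    # Assumed identity rule: same name (case-insensitive), coordinates, event date.
    return (rec.scientific_name.lower(), round(rec.x, 4), round(rec.y, 4), rec.event_date)


def delete_duplicates(records):
    """Keep the first record seen for each key (the DD operator)."""
    seen, unique = set(), []
    for rec in records:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            unique.append(rec)
    return unique


def union(a, b):
    return delete_duplicates(list(a) + list(b))


def intersection(a, b):
    keys_b = {key(r) for r in b}
    return delete_duplicates([r for r in a if key(r) in keys_b])


def difference(a, b):
    keys_b = {key(r) for r in b}
    return delete_duplicates([r for r in a if key(r) not in keys_b])


gbif = [Occurrence("Sarda sarda", 12.5, 41.9, "2003-05-01", "GBIF"),
        Occurrence("Sarda sarda", 14.2, 40.8, "2004-07-15", "GBIF")]
obis = [Occurrence("sarda sarda", 12.5, 41.9, "2003-05-01", "OBIS"),
        Occurrence("Sarda sarda", 9.1, 39.2, "2001-03-20", "OBIS")]

print(len(union(gbif, obis)), len(intersection(gbif, obis)), len(difference(gbif, obis)))
```

A similarity-based comparison would replace `key` with a scoring function and a threshold; the set logic around it would stay the same.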
9. Similarity between habitats
Habitat Representativeness Score (HRS):
1. Measures the similarity between the environmental features of two areas
2. Assesses the quality of models and environmental features
[Figure: Habitat Representativeness Score for Latimeria chalumnae, HRS=10.5]
10. BiOnym
A flexible workflow approach to taxon name matching
Accounts for:
• Variations in the spelling and interpretation of taxonomic names
• Combination of data from different sources
• Harmonization and reconciliation of taxa names
[Workflow diagram: Raw Input String (e.g. Gadus morua Lineus 1758) → Preprocessing and Parsing → Taxon Matcher 1, Taxon Matcher 2, ..., Taxon Matcher n, fed by Reference Sources (ASFIS, FISHBASE, OBIS, other in DwC-A) → PostProcessing → Correct Transcriptions (e.g. Gadus morhua (Linnaeus, 1758))]
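The chain of matchers can be illustrated with a small sketch. It is not BiOnym's implementation: the tiny `REFERENCE` table stands in for a reference source, the two matchers (exact lookup, then fuzzy matching with Python's standard `difflib`) are assumptions chosen for brevity, and authorship correction is left out.

```python
import difflib

# Tiny stand-in for a reference source such as ASFIS or FishBase (assumption).
REFERENCE = {
    "gadus morhua": "Gadus morhua",
    "sarda sarda": "Sarda sarda",
    "thunnus albacares": "Thunnus albacares",
}


def parse(raw):
    """Preprocessing: split a raw string into a binomial and trailing authorship."""
    tokens = raw.split()
    return " ".join(tokens[:2]).lower(), " ".join(tokens[2:])


def exact_matcher(name):
    return REFERENCE.get(name)


def fuzzy_matcher(name, cutoff=0.8):
    # difflib.get_close_matches returns the best candidates above the cutoff.
    close = difflib.get_close_matches(name, list(REFERENCE), n=1, cutoff=cutoff)
    return REFERENCE[close[0]] if close else None


# Matchers are tried in order; the first one that answers wins.
MATCHERS = [exact_matcher, fuzzy_matcher]


def match_name(raw):
    name, _authorship = parse(raw)
    for matcher in MATCHERS:
        result = matcher(name)
        if result is not None:
            return result
    return None


print(match_name("Gadus morua Lineus 1758"))
```

Keeping the matchers in a list makes it easy to add or reorder steps, which is the point of a workflow-style design.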
11. Trendylyzer - Scope
• Fill some knowledge gaps on marine species
• Account for sampling biases
• Define trends for common species
We focus on the OBIS database
Is the Fulmar losing its common species status among the seabirds?
Herring recovered after the fish ban
Can we recognize big changes in species presence?
Plankton regime shift
14. Trendylyzer – Observation ranks on Marine Ecoregions of the World
15. Length-Weight Relationships
Objective:
Calculate the a and b parameters for several species.
Requirements:
Account for...
• Many studies about a single species
• Single study
• Use existing studies to inform new studies
Solution:
Combine existing knowledge with new data by means of Bayesian methods.
Approach:
Collaborative development with the ‘stakeholder’
Integration of R Scripts
Usage of Cloud computing for R Scripts
[Image: bluewatermag.com.au]
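As a rough illustration of the idea, the sketch below assumes the standard length-weight form W = a·L^b, fits a and b by least squares on log-transformed data, and then combines a prior estimate of b with a new estimate through a precision-weighted normal update. This is a deliberately simplified stand-in for the Bayesian R scripts mentioned above, not a port of them; the function names and all numbers are hypothetical.

```python
import math


def fit_lwr(lengths, weights):
    """Least-squares fit of log10(W) = log10(a) + b * log10(L)."""
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(weight) for weight in weights]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = 10 ** (my - b * mx)
    return a, b


def combine_normal(prior_mean, prior_sd, new_mean, new_sd):
    """Normal-normal update: weight each estimate by its precision (1/sd^2)."""
    w_prior, w_new = 1 / prior_sd ** 2, 1 / new_sd ** 2
    mean = (w_prior * prior_mean + w_new * new_mean) / (w_prior + w_new)
    sd = math.sqrt(1 / (w_prior + w_new))
    return mean, sd


# Synthetic data generated from W = 0.01 * L^3 (made-up values).
lengths = [10.0, 20.0, 40.0]
weights = [0.01 * length ** 3 for length in lengths]
a, b = fit_lwr(lengths, weights)

# Combine a prior for b (e.g. from earlier studies) with a new estimate.
posterior_mean, posterior_sd = combine_normal(3.0, 0.2, 3.2, 0.2)
print(a, b, posterior_mean, posterior_sd)
```

The update shows why existing studies help: the combined estimate has a smaller standard deviation than either input.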
16. LWR - Performance
Porting to the D4Science Statistical Manager made it possible to run the scripts in a distributed fashion
The scientist’s original procedure took 20 days
After optimization on our R development machines, the sequential run took 10 days
On the Statistical Manager the run took 11 hours!
Time reduction of 95.4% with respect to the 10-day sequential run
The script has been run periodically and currently solves LWR for 37 234 species
17. A fraction of the products and services belonging to StatsCube
PRODUCTS AND SERVICES DEVELOPMENT PROGRESS REPORT
18. Tabular Data Manager
A completely new application for the management of data workflows. It allows users to *manage* a *flow of data* and to create reports on the management activities.
• flow of data: datasets compliant with a template that are generated and updated in chunks.
• manage: import, store, transform, validate, access, analyze, visualize, and export.
19. Tabular Data Manager: Templates
• A table template defines:
– Table definition
– Columns definition
– A set of table transformations
– A set of validation procedures
• Can be applied to any dataset
• Can be modified and shared among people
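One way to picture a table template is as a small object holding column definitions, transformations and validation procedures that can be applied to any dataset. The sketch below is an assumption-laden illustration: the class `TableTemplate`, its fields and the sample rows are hypothetical and do not reflect the Tabular Data Manager's actual data model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class TableTemplate:
    name: str
    columns: Dict[str, type]  # column name -> expected Python type
    transformations: List[Callable[[Row], Row]] = field(default_factory=list)
    validations: List[Callable[[Row], bool]] = field(default_factory=list)

    def apply(self, rows):
        """Transform each row, then split rows into (valid, rejected)."""
        valid, rejected = [], []
        for row in rows:
            for transform in self.transformations:
                row = transform(row)
            types_ok = all(isinstance(row.get(col), typ)
                           for col, typ in self.columns.items())
            if types_ok and all(check(row) for check in self.validations):
                valid.append(row)
            else:
                rejected.append(row)
        return valid, rejected


catch_template = TableTemplate(
    name="catch",
    columns={"species": str, "tonnes": float},
    transformations=[lambda r: {**r, "species": r["species"].strip().capitalize()}],
    validations=[lambda r: r["tonnes"] >= 0],
)

valid, rejected = catch_template.apply([
    {"species": " sarda sarda ", "tonnes": 12.5},
    {"species": "Thunnus albacares", "tonnes": -1.0},
])
print(valid, rejected)
```

Because the template is plain data plus functions, the same object can be reused on another dataset or shared and modified, as the bullets above describe.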
20. Tabular Data Manager: Menu
Ribbon-style menu
Button behavior depends on the current document
Alt messages on mouseover
23. 330 Cores Currently Allocated
Infrastructure: Computing as Service
Hadoop
• MapReduce
Statistical Manager
• Analysis/clustering/modeling
R clusters
• Windows and Linux
I-MARINE EXTENDED BOARD
24. A fraction of the products and services belonging to GeosCube
PRODUCTS AND SERVICES DEVELOPMENT PROGRESS REPORT
25. Rasterization
A polygonal map is transformed into a raster map or into a point map
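A minimal sketch of the idea, assuming a single simple polygon and a regular grid: each cell is marked when its centre falls inside the polygon (ray-casting test), which yields both a raster (grid of 0/1) and a point map (the centres of the marked cells). The function names and the grid parameters are hypothetical; production rasterizers handle holes, multipolygons and cell-coverage rules that are ignored here.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, xmin, ymin, cell, ncols, nrows):
    """Return (grid, points); row 0 of the grid is the lowest y."""
    grid, points = [], []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell
            cy = ymin + (r + 0.5) * cell
            hit = point_in_polygon(cx, cy, polygon)
            row.append(1 if hit else 0)
            if hit:
                points.append((cx, cy))
        grid.append(row)
    return grid, points


square = [(1, 1), (3, 1), (3, 3), (1, 3)]
grid, points = rasterize(square, xmin=0, ymin=0, cell=1, ncols=4, nrows=4)
for row in grid:
    print(row)
```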
29. Environmental Enrichment: Approach
• (Oozie) workflow to optimize the processing chain:
– Extract occurrences of Carcharodon carcharias (White Shark) for a given time of interest
– Apply the dbscan algorithm (R implementation) to identify geospatial clusters
– Create bounding boxes around the clusters
– Use the bounding boxes as queryables for the WCS request
– Apply BEAM Pixel Extraction (same algorithm as the BioOracle environmental enrichment service)
– Create the time series
– Visualize the time series
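The bounding-box step of the chain above can be sketched as follows, assuming the clustering has already produced one integer label per occurrence point. The function name is hypothetical, the coordinates are made up, and the noise label defaults to 0 on the assumption that the labels come from an R dbscan run, where 0 marks noise points.

```python
def cluster_bounding_boxes(points, labels, noise_label=0):
    """Return {cluster label: (min_lon, min_lat, max_lon, max_lat)}.

    Points carrying the noise label are skipped.
    """
    boxes = {}
    for (lon, lat), label in zip(points, labels):
        if label == noise_label:
            continue
        if label not in boxes:
            boxes[label] = [lon, lat, lon, lat]
        else:
            box = boxes[label]
            box[0], box[1] = min(box[0], lon), min(box[1], lat)
            box[2], box[3] = max(box[2], lon), max(box[3], lat)
    return {label: tuple(box) for label, box in boxes.items()}


# Made-up (lon, lat) points and cluster labels, for illustration only.
points = [(18.0, -34.0), (18.5, -34.5), (19.0, -33.8),
          (150.0, -35.0), (151.0, -36.0), (0.0, 0.0)]
labels = [1, 1, 1, 2, 2, 0]

boxes = cluster_bounding_boxes(points, labels)
print(boxes)
```

Each resulting box could then be used as the spatial subset of a coverage request, one request per cluster instead of one per occurrence.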
31. SPREAD
• Interactive investigation process for statisticians & scientists to compare data from different domains (e.g. Statistics vs. GIS data) and batch processing of data reallocation hypotheses
[Workflow diagram: DATA IMPORT / CURATION (estimates dataset by EEZ – high seas; catch dataset by FAO area) and GIS DATA DISCOVERY, SEARCHING & SHARING (FAO Areas; available target areas) → DATA SELECTION (e.g. filter) → geographic intersection FAO Areas / EEZs – high seas → REALLOCATION (species distributions)]
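A reallocation hypothesis can be pictured as a rule for splitting the catch reported for an FAO area across the target areas it intersects. The sketch below assumes a simple proportional rule driven by per-intersection weights (for example intersection surface, or a species distribution value); the function name, the area labels and all numbers are hypothetical and are not SPREAD's actual algorithm.

```python
def reallocate(catch_by_fao_area, intersection_weights):
    """Split each FAO-area catch across target areas in proportion to weights.

    intersection_weights maps FAO area -> {target area: weight}.
    """
    result = {}
    for fao_area, catch in catch_by_fao_area.items():
        weights = intersection_weights[fao_area]
        total = sum(weights.values())
        for target, weight in weights.items():
            result[target] = result.get(target, 0.0) + catch * weight / total
    return result


# Made-up catches (tonnes) and intersection weights, for illustration only.
catch = {"area 1": 100.0, "area 2": 50.0}
weights = {
    "area 1": {"EEZ A": 1.0, "High seas": 3.0},
    "area 2": {"High seas": 1.0, "EEZ B": 1.0},
}

reallocated = reallocate(catch, weights)
print(reallocated)
```

Swapping the weights (surface, species distribution, or another hypothesis) changes the result without touching the splitting logic, which is what makes batch comparison of hypotheses straightforward.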
32. Legacy Processes (IRD)
• iX Catches per Species: per Ocean / Area, per Fishing Gear type, per Month / Year, and kernel density for biodiversity / ecological datasets (IRD+OBIS+GBIF)
[Map: latitude axis from 30°S to 20°N, longitude axis from 30°E to 110°E]
33. A fraction of the products and services belonging to ConnectCube
PRODUCTS AND SERVICES DEVELOPMENT PROGRESS REPORT
34. MarineTLO
Version 3.0.0
Version 2.0.0
– Species
– Scientific Name of Species
– FAO Species Code
– IRD Species Code
– WoRMS Species Code
– Predators and Prey
– Competitors
– Biological Classification of Species (e.g. WoRMS)
MarineTLO Version 2.0.0
– Water Areas
– Species connected to Water Areas
– Countries
– Countries connected to Water Areas
– Species connected to Countries
– Ecosystems
– Ecosystems connected to Countries
– Species connected to Ecosystems
– Exclusive Economical Zones
– Fishing Gears
– Fishing Vessels
– More species and more Predators
– Common Names of Species
35. Requirements as Competency Queries
#Query: For a scientific name of a species (e.g. Thunnus albacares or Poromitra crassiceps), find/give me
Q1: the biological environments (e.g. ecosystems) in which the species has been introduced and more general descriptive information about it (such as the country)
Q2: its common names and their complementary info (e.g. languages and countries where they are used)
Q3: the water areas and their FAO codes in which the species is native
Q4: the countries in which the species lives
Q5: the water areas and the FAO portioning code associated with a country
Q6: the presentation w.r.t. Country, Ecosystem, Water Area and Exclusive Economical Zone (of the water area)
Q7: the projection w.r.t. Ecosystem and Competitor, providing for each competitor the identification information (e.g. several codes provided by different organizations)
Q8: a map w.r.t. Country and Predator, providing for each predator both the identification information and the biological classification
Q9: who discovered it, in which year, the biological classification, the identification information, and the common names, providing for each common name the language and the countries where it is used.
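To make the competency queries concrete, the sketch below answers Q3 and Q4 over a handful of subject-predicate-object triples held in memory. Everything in it is illustrative: the predicate names (`isNativeAt`, `hasFaoCode`, `livesIn`, `hasCommonName`) and the placeholder areas, codes and countries are assumptions, not the MarineTLO vocabulary or real data.

```python
# (subject, predicate, object) triples; placeholders, not real warehouse data.
TRIPLES = [
    ("Thunnus albacares", "isNativeAt", "WaterArea-X"),
    ("WaterArea-X", "hasFaoCode", "FAO-00"),
    ("Thunnus albacares", "livesIn", "Country-Y"),
    ("Thunnus albacares", "hasCommonName", "yellowfin tuna"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def q3_native_water_areas(species):
    """Q3: water areas where the species is native, with their FAO codes."""
    return [(area, objects(area, "hasFaoCode"))
            for area in objects(species, "isNativeAt")]


def q4_countries(species):
    """Q4: countries in which the species lives."""
    return objects(species, "livesIn")


print(q3_native_water_areas("Thunnus albacares"))
print(q4_countries("Thunnus albacares"))
```

Against an RDF triple store the same questions would be expressed as graph patterns in a query language such as SPARQL; the two-step join in `q3_native_water_areas` corresponds to a pattern with two triple clauses.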
36. The MarineTLO-based warehouse Evolution
[Architecture diagram: copies of the sources FLOD (by FAO), ECOSCOPE (by IRD), WoRMS (part; generated by SPD & TLO wrapper), DBpedia (part; by DBpedia SPARQL Endpoint) and Fishbase (part; by Fishbase RDBMS) are transformed through the FLOD2TLO, ECOSCOPE2TLO, WoRMS2TLO, DBpediaS2TLO and FB2TLO mappings into MarineTLO and loaded into an RDF Triple Store.]
37. Warehouse V3
[Table: coverage of concepts by source. Columns: Ecoscope, FLOD, WoRMS, DBpedia, Fishbase. Rows: Species, Scientific Names, Authorships, Common Names, Predators, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ. The cell marks are not preserved in this transcript.]
38. TLO warehouse V2 vs V3
V2 contains information about 19,000 distinct marine species
Species number per source:
• DBpedia: 14,291
• FLOD: 10,849
• WoRMS: 1124
• Ecoscope: 277
Common species (size of intersections):
• DBpedia and FLOD: 3,046
• DBpedia and WoRMS: 731
• DBpedia and Ecoscope: 56
• FLOD and WoRMS: 768
• FLOD and Ecoscope: 73
• WoRMS and Ecoscope: 53
V3 contains information about 37,000 distinct marine species
Species number per source:
• DBpedia: 14,291
• FLOD: 10,849
• WoRMS: 1124
• Ecoscope: 277
• FishBase: 31,277
Common species (size of intersections):
• DBpedia and FLOD: 3,046
• DBpedia and WoRMS: 731
• DBpedia and Ecoscope: 56
• DBpedia and FishBase: 9833
• FLOD and WoRMS: 768
• FLOD and Ecoscope: 73
• FLOD and FishBase: 6141
• WoRMS and Ecoscope: 53
• WoRMS and FishBase: 1288
53
39. A tiny fraction of the products and services belonging to BiolCube
PRODUCTS AND SERVICES CATALOGUE AT PROJECT CONCLUSION
40. Trendylyzer – Definition of Common Species
Grey = not a common species in 1990
Trends for common species can be indicators of ecological changes
A formal definition of common species is not trivial
A definition based on the distribution of occurrences gives interesting results but is affected by sampling biases
41. Trendylyzer – Definition of Common Species
We are searching for a more formal definition of common species (C.S.) that accounts for the biases in the database …
We defined a commonness score function
The terms influencing the commonness of a species are given a weight using pattern recognition models
For each species:
1. Nr of observations
2. Nr of individuals per observation
3. Nr of observations per dataset
4. Nr of datasets
5. Nr of geographical cells
6. Temporal frequency of the observations
Normalizing => relative commonness.
Create score or rank by taxonomic group
We are assessing the performance against the indications by FishBase and IUCN on some benchmark species
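A commonness score of this kind can be sketched as a weighted sum of the six terms, each normalized within a taxonomic group. The sketch below is an assumption: the feature names, the max-based normalization, the equal weights and the sample values are hypothetical, whereas the slide states that the actual weights are derived with pattern recognition models.

```python
FEATURES = [
    "observations",
    "individuals_per_observation",
    "observations_per_dataset",
    "datasets",
    "cells",
    "temporal_frequency",
]


def relative_commonness(species_features, weights):
    """Rank species of one taxonomic group by a weighted, normalized score.

    Each feature is divided by its maximum within the group (assumed rule).
    """
    maxima = {f: max(v[f] for v in species_features.values()) or 1.0
              for f in FEATURES}
    scores = {}
    for species, values in species_features.items():
        scores[species] = sum(weights[f] * values[f] / maxima[f] for f in FEATURES)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up feature values for two species of the same group.
group = {
    "species A": {f: 10.0 for f in FEATURES},
    "species B": {f: 5.0 for f in FEATURES},
}
equal_weights = {f: 1.0 for f in FEATURES}

ranking = relative_commonness(group, equal_weights)
print(ranking)
```

Normalizing inside each group keeps heavily sampled groups from dominating the ranking, which is one simple way to reduce the effect of sampling bias.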
42. Trendylyzer - Performance
A preliminary definition of CS was done using
1. Nr of observations per dataset in one year
2. Nr of datasets containing the species in one year
On a ‘trustable’ benchmark with 255 species, agreement with an expert classification was 99.21%!
The complex approximating function, which also includes time and geographical extent, gave 80% agreement with an expert classification on a ‘wild’ benchmark (80 species)
The results are very promising!
43. A tiny fraction of the products and services belonging to StatsCube
PRODUCTS AND SERVICES CATALOGUE AT PROJECT CONCLUSION