CromoCat: New developments to genetic diversity databasing
Cèsar Blanché & Joan Simon
GReB, Laboratori de Botànica, Facultat de Farmàcia, Universitat de Barcelona (Barcelona)
Introduction
Continued contributions to chromosome diversity and the increasing volume of reports on plant molecular diversity in recent years have prompted some improvements to
CROMOCAT (Chromosome Database of the Flora of the Catalan Countries) in order to gather a wider dataset of plant genetic diversity. Both scientific (systematics,
plant population biology) and applied (plant conservation, genetic resources cataloguing, etc.) sectors will benefit from easier access to plant genetic information.
Figure 1. CROMOCAT area coverage: the Catalan Countries
CROMOCAT current coverage
CROMOCAT new developments
⢠Acknowledgements to Dr. Juli CaujapÊ-
Castells for permission to test and use the
SAGE database software and to Albert
Armengol for technical assistance in data
gathering.
CROMOCAT TECHNICAL CHARACTERISTICS
⢠Based on OPTIMA Standards (OPTIMA Comission for Karyosystematics, 1993)
⢠Designed as independent module of- and linked/available through BIOCAT (Biodiversity Database of Catalonia)
⢠Full territorial coverage (Catalonia, Valencian Country, Balearic Isl., Aragon Strip, Andorra and Northern/French Catalonia) (Fig. 1)
⢠Based on Microsoft AccessŽ software and built as relational database (Fig. 2)
⢠Fields design available from Simon & BlanchÊ, 1997 (OPTIMA Newsletter 32: 6-7) and Simon, Margelà & BlanchÊ, 2001 (Bocconea 13: 281-297)
⢠Taxonomic theasaurus following Bolòs, Vigo, Masalles & Ninot, 2005 (Fl. Manual Països Cat. 3rd ed., Pòrtic, Barcelona)
⢠Full chromosome data from the covered area (CRO-IN) and summarized data from outside (CRO-OUT)
⢠Data provided to Euro+Med Chromosome Database
⢠Partially funded by Autonomous Government of Catalonia - Generalitat de Catalunya (Departament de Medi Ambient i Habitatge)
⢠Available trough the internet at http://biodiver.bio.ub.es/biocat/homepage.html and full references through http://www.ub.es/cedocbiv/
• Increasing research on plant molecular genetic diversity in recent years led us to enlarge the coverage of data gathered by CROMOCAT to include other, non-chromosomal
reports of genetic diversity. First attempts, at the end of 2006, aimed to collect data on DNA values and molecular markers widely used in systematics
and population genetics in a new module called GENOCAT (Fig. 6, 7). This module follows the CROMOCAT system, adding new fields for C-DNA (values and
bibliographic references) and for allozyme references. The latter are mainly bibliographic citations: attempts to gather variation indices such as Ho, He, A, Ap and P
were unsuccessful, because these data are not standardized and may need additional calculations from the original publications, which are frequently impossible to
perform because the primary data are not available owing to journal space limitations. 4981 new records and 1059 new references were added (Table 2).
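To illustrate why variation indices cannot be recomputed without the primary data, the sketch below derives A (number of alleles), Ho (observed heterozygosity) and He (expected heterozygosity) for a single locus directly from individual genotypes. It assumes a diploid, codominant marker and uses the uncorrected formula He = 1 - sum(p_i^2), with no small-sample correction. The function name and the sample genotypes are hypothetical and are not taken from GENOCAT.

```python
from collections import Counter


def locus_indices(genotypes):
    """Return A, Ho and He for one diploid locus.

    genotypes: list of (allele, allele) tuples, one tuple per individual.
    He is the uncorrected expected heterozygosity, 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Allele counts over all 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    copies = 2 * n
    ho = sum(1 for a, b in genotypes if a != b) / n
    he = 1.0 - sum((c / copies) ** 2 for c in counts.values())
    return {"A": len(counts), "Ho": ho, "He": he}


# Invented sample: four individuals scored at one locus.
sample = [("a", "a"), ("a", "b"), ("b", "b"), ("a", "b")]
print(locus_indices(sample))
```

Every quantity above depends on the per-individual genotypes, so a publication that reports only summary values in a different convention cannot be converted without them.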
• Although phylogenetic molecular data sets (sequences, phylogenetic trees) are usually available from current databases (e.g. GenBank), there is no internationally
accepted system for collecting population genetics data on genetic diversity (patterns of diversity that are in some way analogous to chromosome diversity and significant
at both the individual and the population level). After a few published attempts to compile individual genotypes into population genetics databases (mainly
microsatellite and allozyme reports), a new database is now available: SAGE (Storage and Analysis of Genotypes; Caujapé-Castells et al. 2007), based on the computer program
Transformer-3 (Caujapé-Castells and Baccarani-Rosas 2005) and its future upgrades. The CROMOCAT team has joined this initiative to upgrade
the GENOCAT module to a new version (v. 2.0), to be launched at the end of 2007, with a link to the SAGE database.
Figure 6. GENOCAT v. 1.0 (2006): C-DNA values table
SAGE DATABASE CHARACTERISTICS (Caujapé-Castells et al., 2007)
• Based on the capabilities of Transformer-3® software (by J. Caujapé-Castells & M. Baccarani-Rosas) and its future upgrades
• Stores matrices of individual genotypes ready for analysis, plus ancillary biological information
• Suitable for all types of population genetic molecular markers
• All data georeferenced and mapped (through the Google Maps® API), allowing for distance matrix building
• Matrices workable for further meta-analyses
• Current data submission by e-mail (further access by web)
• Projected to be adopted as an international standard prior to submission of scientific papers to journals
• SAGE is available through http://www.exegen.org/sage (still at a preliminary stage)
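As a rough illustration of how a geographic distance matrix can be built from georeferenced populations, the sketch below applies the haversine great-circle formula to (latitude, longitude) pairs in decimal degrees. It does not use the Google Maps API or any SAGE code. The function names and the sample coordinates are hypothetical, and a spherical Earth with a mean radius of 6371 km is assumed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_matrix(points):
    """Symmetric matrix of pairwise distances between sampling sites."""
    return [[haversine_km(p, q) for q in points] for p in points]


# Invented coordinates for three sampling sites.
sites = [(41.5, 2.0), (39.5, -0.5), (39.6, 3.0)]
for row in distance_matrix(sites):
    print(["%.1f" % d for d in row])
```

A matrix of this kind can be set against a matrix of pairwise genetic distances between the same populations, for example when looking for isolation by distance.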
               Pteridophyta  Gymnosperms  Dicots  Monocots  Total (records number)
CRO-IN                   73            0    2231       571                    2875
CRO-OUT                 699           90   25095     10250                   36134
References                                                                    6603
TOTAL RECORDS                                                                47618
[Chart: Record number evolution (last 6 years). x-axis: years (2001 to 2006); y-axis: records (0 to 50000); series: REFERENCES, CRO-OUT, CRO-IN]
Table 1. CROMOCAT current size (December 2006)
Figure 5. CROMOCAT was launched in 1999 and has grown continuously each year.
Figure 2. Display of the first version (1999) of CROMOCAT
Figure 3. Relational database structure with fields shared with other databases for further conversions
Figure 4. Field design
Figure 7. GENOCAT v. 1.0 (2006): Molecular markers table
Easy data transfer to SAGE
1.- Enter the data matrix for any molecular marker (Transformer-3 format)
2.- Enter geographic coordinates for populations or individuals
Images reproduced with permission of Dr. J. Caujapé-Castells
GenoCat                    Pteridophyta  Gymnosperms  Dicots  Monocots  Total (records number)
Genetic diversity records            11          106     824       285                    1226
References                                                                                 980
TOTAL RECORDS                                                                             2206

                           Pteridophyta  Gymnosperms  Dicots  Monocots  Total (records number)
C-DNA records                         0            0    1842       864                    2706
References                                                                                  79
TOTAL RECORDS                                                                             2785

Table 2. GENOCAT current size (December 2006)