Presentation held at the M3 SIG meeting at ISMB 2009 in Stockholm. Its purpose was to show the audience the software development activities of the Genomic Standards Consortium. See also http://gensc.org
Software Development by the Genomic Standards Consortium
1. Bringing Standards to Life:
Software Development by the Genomic Standards Consortium
Renzo Kottmann
Microbial Genomics Group
Max Planck Institute for Marine Microbiology
M3 SIG Stockholm July 2009
2. Genomic Standards Consortium (GSC)
Goal
• Promote mechanisms that standardize
– the description of genomes
– the exchange and integration of genomic data
Open-membership, international working body
• Established in Sept 2005
• Participants include DDBJ, EMBL, GenBank, Sanger,
JCVI, JGI, EBI and a range of US, UK and EU research
institutions
• Organized a series of workshops
http://gensc.org and http://gensc.org/gc_wiki/index.php/GSC_Membership
3. Minimum Information about a Genome Sequence
(MIGS) Specification
MIGS extends what DDBJ/EMBL/GenBank request
upon submission of a genome sequence
• Examples:
Description of geographic location of a sample and
habitat
“Minimum Information about a Metagenomic Sequence”
(MIMS)
– Temperature
– pH
Description of sequence generation
– Sequencing method
– Assembly method
Field et al. Nat Biotechnol. 2008
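The example descriptors on this slide can be pictured as a small record with a completeness check. This is a minimal sketch, not the MIGS/MIMS specification itself: the field names and the `missing_fields` helper are hypothetical and only mirror the examples listed above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ContextualRecord:
    """Illustrative MIGS/MIMS-style record; all field names are assumptions."""
    geographic_location: Optional[str] = None
    habitat: Optional[str] = None
    temperature_celsius: Optional[float] = None  # MIMS example from the slide
    ph: Optional[float] = None                   # MIMS example from the slide
    sequencing_method: Optional[str] = None
    assembly_method: Optional[str] = None


def missing_fields(record: ContextualRecord) -> List[str]:
    """Return the names of descriptors that have not been filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


record = ContextualRecord(
    geographic_location="example location",  # placeholder value
    habitat="Aquatic: Marine",
    sequencing_method="example method",      # placeholder value
)
print(missing_fields(record))
```

A submission tool could run such a check before accepting a report and point the submitter to the descriptors that are still empty.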
6. Software Development for MIGS/MIMS
Mechanisms for achieving compliance are needed:
• Such mechanisms involve an appropriate reporting
structure for capturing and exchanging data, software,
databases and controlled vocabularies and/or ontologies
for defining the terms used in the annotations.
Field et al. Nat Biotechnol. 2008
10. Software Development for MIGS/MIMS
Mechanisms for achieving compliance are needed:
• Such mechanisms involve an appropriate reporting
structure for capturing and exchanging data, software,
databases and controlled vocabularies and/or ontologies
for defining the terms used in the annotations.
Supporting Projects:
• Habitat-Lite (Ontology specification)
• Genomic Rosetta Stone (Identifier Mapping)
• GCDML (MIGS/MIMS specification in XML)
• Genomes Catalogue (Database and Web Server)
Field et al. Nat Biotechnol. 2008
11. Habitat-Lite (= EnvO-Lite)
[Figure: top-level terms Aquatic, Aquatic: Freshwater, Aquatic: Marine, Terrestrial, Air, Fossil, Food, Organism-Associated, Extreme Habitat, Other]
Easy-to-use (small) set of terms
• Captures high-level information about habitat
• Derived from the Environment Ontology (EnvO).
Meet the needs of multiple users
• Annotators, database providers, biologists, and
bioinformaticians alike who need to search and
employ such data in comparative analyses.
Hirschman et al. OMICS. 2008
12. Habitat-Lite
First level
• Aquatic
• Aquatic: Freshwater
• Aquatic: Marine
• Terrestrial
• Air
• Fossil
• Food
• Organism-Associated
• Extreme Habitat
• Other
Second level
• soil
• sediment
• sludge
• waste water
• hot spring
• hydrothermal vent
• biofilm
• microbial mat
< 20 terms
Hirschman et al. OMICS. 2008
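Because the term set is this small, checking an annotation against it is straightforward. The sketch below copies the terms from the slide; the helper name `is_habitat_lite_term` and the case-insensitive matching are assumptions made for illustration, not part of Habitat-Lite.

```python
# Terms copied from the slide; the grouping follows its two columns.
FIRST_LEVEL = {
    "Aquatic", "Aquatic: Freshwater", "Aquatic: Marine", "Terrestrial",
    "Air", "Fossil", "Food", "Organism-Associated", "Extreme Habitat", "Other",
}
SECOND_LEVEL = {
    "soil", "sediment", "sludge", "waste water",
    "hot spring", "hydrothermal vent", "biofilm", "microbial mat",
}


def is_habitat_lite_term(term: str) -> bool:
    """Case-insensitive membership test against both levels (hypothetical helper)."""
    known = {t.lower() for t in FIRST_LEVEL | SECOND_LEVEL}
    return term.strip().lower() in known


print(len(FIRST_LEVEL | SECOND_LEVEL))            # 18 terms in total
print(is_habitat_lite_term("Hydrothermal Vent"))  # True
print(is_habitat_lite_term("deep sea"))           # False
```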
14. Genomic Rosetta Stone (GRS)
Create a unified mapping between different genomic
resources
Improve navigation across these resources
Enable the integration of this information in the near
future.
Van Brabant et al. OMICS. 2008
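The idea of a unified mapping can be sketched as a table with one row per genome and one column per resource. Everything in the example is hypothetical: the resource names, the identifiers, and the `translate` function are placeholders and do not reflect the actual GRS data or interface.

```python
from typing import Dict, List, Optional

# One row per genome; keys are resource names, values are that resource's
# identifier for the genome. All identifiers are made-up placeholders.
MAPPING_TABLE: List[Dict[str, str]] = [
    {"resource_a": "A-0001", "resource_b": "B-17", "resource_c": "C-xyz"},
    {"resource_a": "A-0002", "resource_b": "B-42"},
]


def translate(identifier: str, source: str, target: str) -> Optional[str]:
    """Map an identifier from one resource to another, or None if unmapped."""
    for row in MAPPING_TABLE:
        if row.get(source) == identifier:
            return row.get(target)
    return None


print(translate("A-0001", "resource_a", "resource_c"))  # C-xyz
print(translate("A-0002", "resource_a", "resource_c"))  # None (no entry yet)
```

Returning `None` for an unmapped genome keeps gaps in the mapping visible instead of hiding them behind a guessed identifier.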
17. Genomic Contextual Data
Markup Language (GCDML)
An Extensible Markup Language (XML)-based markup language
Aim
• Implement MIGS/MIMS
• Provide even more descriptors
• Facilitate exchange and integration of genomic data
Kottmann et al. OMICS. 2008
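To illustrate what exchanging contextual data as XML looks like in practice, the sketch below writes a few descriptors to XML and reads them back with Python's standard library. The element names (`report`, `habitat`, `ph`) are invented for this example and are not taken from the GCDML schema.

```python
import xml.etree.ElementTree as ET


def to_xml(descriptors: dict) -> str:
    """Serialize descriptors as child elements of a hypothetical <report> root."""
    root = ET.Element("report")  # element name is an assumption, not GCDML
    for name, value in descriptors.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml({"habitat": "Aquatic: Marine", "ph": 8.1})
print(xml_text)

# A receiving system can parse the same text and read a descriptor back.
parsed = ET.fromstring(xml_text)
print(parsed.find("ph").text)
```

In a real exchange the document would additionally be validated against the published schema, which is what makes the format a shared contract between databases.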
22. Genome Catalogue
Requirements
• Rich, user-friendly toolkit
• Designed to give credit to all contributors
• XML-based (GCDML)
– Able to maintain all versions of GCDML schemas
• Web services-based
– Supporting the automated exchange of content
• Serve as the international GCAT identifier authority
• Comprehensive
– Containing reports for all taxa and metagenomes
• Ontology-supportive
• Shared by the GSC
23. Current Status
We have specifications:
• MIGS/MIMS
• Habitat-Lite
• Genomic Rosetta Stone
Work on supporting software is ongoing:
• Genomes Catalogue is in prototype status
• Funding
This is a long-term endeavour that cannot be done on a
voluntary basis
24. Discussion
Software is needed for:
• Creation of MIGS/MIMS data
• Storage
• Analysis
Expand standardization efforts to
• Software specification/development
• Work on a standardized genomic data management
architecture / cyberinfrastructure
Data intensive science is successful if it works
towards one community with one vision
• World Wide Genomics project
25. Acknowledgements
All Members of GSC incl.
Dawn Field
Peter Sterk
Saul Kravitz
Tanya Gray
Megx.net team
Frank Oliver Glöckner
Ivaylo Kostadinov
Melissa Beth Duhaime
Pier Luigi Buttigieg
Wolfgang Hankeln
Pelin Yilmaz