The document describes the Mouse Gene Expression Database (GXD), which integrates data on gene expression during mouse development from various sources and assay types. GXD provides search tools to query expression data for genes, anatomical structures, developmental stages, mutants, and references. It also offers visual summaries of expression assay results, images, and links to detailed annotations. Recent improvements include enhanced search capabilities, sortable summaries, and direct links to expression images.
Martin Ringwald, Mouse Gene Expression DB, fged_seattle_2013
1. The Mouse Gene Expression Database (GXD)
Martin Ringwald
The Jackson Laboratory
2. Mouse developmental gene expression data provide insights into
• organismal function of genes
• molecular mechanism of differentiation
• molecular basis of disease
Genotype, Phenotype, Expression
Mouse Strains and Mutants
Of mice and men …
3. The Gene Expression Database (GXD)
• integrates different types of expression data
- RNA in situ hybridization
- Northern blot
- Immunohistochemistry
- Western blot
- Knock-in reporter studies
- RT-PCR
• focus on endogenous gene expression during mouse development
• all developmental stages
• expression data from wild-type and mutant mice
[Figure: GXD data model relating Gene, RNA (1…n), Protein (1…p), Time, Space and Genotype]
4. Standardized description of expression patterns
Hierarchical structure:
• Extensibility
• Hierarchical searches
• Integrated description of expression patterns from assays with differing spatial resolution
Anatomical Ontology for Mouse Development: developed by the Edinburgh Mouse Atlas Project (EMAP); maintained and expanded by EMAP and GXD
Anatomical Ontology for the Adult Mouse: developed and maintained by GXD
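The hierarchical searches listed above can be illustrated with a small sketch. A query for a structure also returns results annotated to any of its substructures. As a result, a coarse annotation (for example, to "brain") and a fine one (for example, to "telencephalon") are both found by the same search. This is a minimal Python sketch that assumes a toy part-of hierarchy. The terms, gene annotations and function names are hypothetical. They are not actual anatomy ontology identifiers or GXD data.

```python
from collections import defaultdict

# Hypothetical toy anatomy hierarchy: child -> parents (a DAG, as in anatomical ontologies).
PARENTS = {
    "forebrain": ["brain"],
    "midbrain": ["brain"],
    "telencephalon": ["forebrain"],
    "brain": ["central nervous system"],
    "spinal cord": ["central nervous system"],
}

# Hypothetical expression annotations: (gene, structure, detected?).
# Annotations come at different resolutions: "brain" is coarse, "telencephalon" is fine.
ANNOTATIONS = [
    ("Pax6", "telencephalon", True),
    ("En1", "midbrain", True),
    ("Otx2", "brain", True),
    ("Hoxb1", "forebrain", False),
    ("Hoxb1", "spinal cord", True),
]


def descendants(term):
    """Return the term itself plus every structure below it in the hierarchy."""
    children = defaultdict(list)
    for child, parents in PARENTS.items():
        for parent in parents:
            children[parent].append(child)
    seen, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def genes_expressed_in(term):
    """Hierarchical search: genes detected in the term or in any of its substructures."""
    scope = descendants(term)
    return sorted({gene for gene, structure, detected in ANNOTATIONS
                   if structure in scope and detected})


print(genes_expressed_in("brain"))                   # ['En1', 'Otx2', 'Pax6']
print(genes_expressed_in("central nervous system"))  # ['En1', 'Hoxb1', 'Otx2', 'Pax6']
```

Because the hierarchy is a DAG rather than a strict tree, the traversal tracks visited terms. This prevents a structure with several parents from being counted twice.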
5. Integrated access to complex and heterogeneous data to facilitate the use of the mouse as an experimental model to study human development and disease.
Integration with all the other data in MGI (Mouse Genome Informatics):
Genotype, Phenotype, Expression: Function!
Many links to other resources:
PubMed, OMIM, GenBank/EMBL/DDBJ, Entrez Gene, UniProt, InterPro, EMAGE, GenePaint, GEO, ArrayExpress, IMSR, other species DBs
8. Recent Progress
• Data Acquisition and Current Data Content
• New Search and Display Features
9. Data Acquisition for GXD
• curation of expression data from the literature
• electronic submission from laboratories: small- and large-scale data
• collaboration with projects that generate data at a large scale
10. First step of literature curation:
Each article is indexed with regard to
- Genes
- Assay types
- Embryonic ages
- Bibliographic information
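The indexing step above amounts to one searchable record per article and gene. This minimal Python sketch assumes a hypothetical record layout (`IndexEntry`) and placeholder references. It shows how indexing assay types and embryonic ages lets a user narrow a gene's literature. It does not show how GXD actually stores its index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record: one reference indexed for one gene."""
    reference: str          # placeholder reference identifier
    gene: str               # standard gene symbol
    assay_types: frozenset  # assay types reported in the article
    ages: frozenset         # embryonic ages studied


# Placeholder index content, for illustration only.
INDEX = [
    IndexEntry("Ref-A", "Pax6", frozenset({"RNA in situ", "Immunohistochemistry"}),
               frozenset({"E9.5", "E10.5"})),
    IndexEntry("Ref-B", "Pax6", frozenset({"RT-PCR"}), frozenset({"E12.5"})),
    IndexEntry("Ref-C", "Shh", frozenset({"RNA in situ"}), frozenset({"E10.5"})),
]


def find_references(gene, assay_type=None, age=None):
    """Return references indexed for a gene, optionally restricted by assay type and age."""
    return sorted({
        entry.reference
        for entry in INDEX
        if entry.gene == gene
        and (assay_type is None or assay_type in entry.assay_types)
        and (age is None or age in entry.ages)
    })


print(find_references("Pax6"))                            # ['Ref-A', 'Ref-B']
print(find_references("Pax6", assay_type="RNA in situ"))  # ['Ref-A']
```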
11. As of 6/15/13:
• 149,941 entries
• 20,996 references
• 15,033 genes
• up-to-date, complete from 1993 (1990) to the present
13. Superior to PubMed:
• Manual annotation of the whole manuscript
• Use of standard gene nomenclature
• Indexing of assay types and embryonic ages
17. Data Quality Control
• Standard nomenclature
• Extensive use of controlled vocabularies
• Manual and computational consistency checks
• Editorial Interface and QC reports
• Detailed and regularly updated editorial guidelines
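One kind of computational consistency check works like this. Every annotation must use a structure from the controlled vocabulary, and that structure must exist at the annotated developmental stage. This Python sketch assumes hypothetical Theiler-stage ranges and annotation field names. It is illustrative and does not reproduce GXD's QC reports.

```python
# Hypothetical Theiler-stage ranges during which each structure exists.
STRUCTURE_STAGES = {
    "forebrain": (15, 28),
    "telencephalon": (15, 28),
    "otic pit": (13, 14),
}


def check_annotations(annotations):
    """Return (annotation, problem) pairs for annotations that fail either check."""
    problems = []
    for ann in annotations:
        structure, stage = ann["structure"], ann["theiler_stage"]
        # Check 1: the structure must come from the controlled vocabulary.
        if structure not in STRUCTURE_STAGES:
            problems.append((ann, "structure not in controlled vocabulary"))
            continue
        # Check 2: the structure must exist at the annotated stage.
        first, last = STRUCTURE_STAGES[structure]
        if not first <= stage <= last:
            problems.append((ann, f"{structure} not defined at TS{stage} (TS{first}-TS{last})"))
    return problems
```

Checks like these only flag suspect records. Under the editorial approach described above, a curator would still resolve each flagged annotation against the authors' statements.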
18. Data Quality Control
• Text-based annotations complemented by primary image data
• Annotations are NOT based on our own interpretation of the images. They strictly rely on the statements of the authors.
• Resolution of annotations is determined by details provided in the text of the manuscript.
• We notify authors once data for their publications have been entered. Authors can provide comments and additional information.
21. Incorporation of large-scale data sets
• Develop parsers to extract and evaluate data
• Manual and computational quality controls
- verify gene identity: probe-to-gene mapping
- verify probe identity: is the probe already in the database?
- map results to the anatomical ontology and other controlled vocabularies
- resolve ambiguities
- complete annotations
• Bring data into a standardized format for data loads
• Bulk-load curated data into GXD
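The quality-control steps for large-scale data sets can be sketched as a function that splits parsed rows into two groups: load-ready records and rows that need curator review. This is a minimal Python sketch under stated assumptions. The lookup tables (`KNOWN_PROBES`, `PROBE_TO_GENE`, `STRUCTURE_SYNONYMS`) are hypothetical stand-ins for database content, and a real pipeline would handle many more cases.

```python
# Hypothetical lookup tables standing in for database content.
KNOWN_PROBES = {"probe-001"}  # probes already present in the database
PROBE_TO_GENE = {"probe-001": "Pax6", "probe-002": "Shh", "probe-003": "Hoxb1"}
STRUCTURE_SYNONYMS = {  # free-text structure names -> anatomy ontology term
    "forebrain": "forebrain",
    "fore brain": "forebrain",
    "neural tube": "neural tube",
}


def curate(rows):
    """Split parsed rows into load-ready records and rows that need manual review."""
    load, review = [], []
    for row in rows:
        # Verify gene identity: the probe must map to the gene the submitter reported.
        mapped_gene = PROBE_TO_GENE.get(row["probe"])
        if mapped_gene != row["gene"]:
            review.append((row, f"probe maps to {mapped_gene!r}, submitted as {row['gene']!r}"))
            continue
        # Map the free-text structure name to a controlled anatomy term.
        term = STRUCTURE_SYNONYMS.get(row["structure"].strip().lower())
        if term is None:
            review.append((row, f"unmapped structure {row['structure']!r}"))
            continue
        # Verify probe identity: flag whether a new probe record must be created.
        load.append({
            "probe": row["probe"],
            "new_probe": row["probe"] not in KNOWN_PROBES,
            "gene": row["gene"],
            "structure": term,
            "strength": row["strength"],
        })
    return load, review
```

Ambiguous rows are routed to review rather than dropped, in keeping with the steps above ("resolve ambiguities", "complete annotations"). Only the `load` list would be written out in the standardized load format.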
22. GXD adds value to large-scale data sets from other databases
• data are integrated with all the other data in GXD and MGI
• data are accessible via many new search parameters
• data and data connections are maintained and kept up-to-date
23. GXD: Current Data Content
249,010 Expression Images
1,394,685 Annotated Expression Results
63,374 Expression Assays
13,751 Genes
1,820 Mouse Mutants with Expression Data
24. Improved Search and Display Capabilities
• Gene Expression Data Query Forms
• Expression Data Summaries
• Expression Assay Details
• Images
31. 1824 genes annotated to ‘DNA binding’: expression data are available for this gene set (otherwise ‘DNA binding’ would be greyed out).
Auto-fill function
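The auto-fill behaviour described above can be sketched as prefix matching over a vocabulary. Each suggestion is flagged as greyed out when none of its annotated genes have expression data. This Python sketch is an assumption about the logic, not GXD's implementation. The term-to-gene table, the gene set and the placeholder symbol `GeneZ` are hypothetical.

```python
# Hypothetical vocabulary terms (e.g. function terms) -> genes annotated to them.
TERM_GENES = {
    "DNA binding": {"Pax6", "Hoxb1", "Sox2"},
    "DNA repair": {"GeneZ"},  # placeholder symbol
    "dopamine binding": {"Drd1"},
}
# Hypothetical set of genes that have expression data in the database.
GENES_WITH_EXPRESSION = {"Pax6", "Hoxb1", "Sox2", "Drd1"}


def autofill(prefix, limit=10):
    """Suggest terms that start with prefix (case-insensitive).

    Each suggestion is (term, number of annotated genes, selectable); a term is
    not selectable (greyed out) when none of its genes have expression data.
    """
    prefix = prefix.lower()
    suggestions = []
    for term in sorted(TERM_GENES, key=str.lower):
        if term.lower().startswith(prefix):
            genes = TERM_GENES[term]
            suggestions.append((term, len(genes), bool(genes & GENES_WITH_EXPRESSION)))
    return suggestions[:limit]


print(autofill("dna"))  # [('DNA binding', 3, True), ('DNA repair', 1, False)]
```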
33. New Summary – Assay Results
• 4 sortable data summaries: genes, assays, assay results, images
• links to detailed annotations and images
• summary data can be downloaded and exported to other applications
Sort
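The sortable, exportable summaries described above can be sketched with Python's standard `csv` module. Rows are sorted by any column, and embryonic ages are sorted numerically rather than as text (so E9.5 precedes E10.5). Rows are then written as tab-separated text for other applications. The row fields and values are hypothetical, and the age parser assumes embryonic ages of the form "E10.5" only.

```python
import csv
import io

# Hypothetical rows of an assay-results summary (values are placeholders).
RESULTS = [
    {"gene": "Shh", "assay_type": "RNA in situ", "age": "E10.5", "structure": "notochord", "detected": "Yes"},
    {"gene": "Pax6", "assay_type": "Immunohistochemistry", "age": "E12.5", "structure": "retina", "detected": "Yes"},
    {"gene": "Pax6", "assay_type": "RNA in situ", "age": "E9.5", "structure": "forebrain", "detected": "Yes"},
]


def age_in_days(age):
    """Turn an embryonic age such as 'E10.5' into 10.5 so it sorts numerically."""
    return float(age.lstrip("E"))


# Columns that need a custom sort key; all other columns sort by their text value.
SORT_KEYS = {"age": lambda row: age_in_days(row["age"])}


def sort_results(rows, column, reverse=False):
    """Return rows sorted by one summary column (stable, so ties keep their order)."""
    key = SORT_KEYS.get(column, lambda row: row[column])
    return sorted(rows, key=key, reverse=reverse)


def export_tsv(rows):
    """Serialize rows as tab-separated text for a spreadsheet or other tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print([row["age"] for row in sort_results(RESULTS, "age")])  # ['E9.5', 'E10.5', 'E12.5']
```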
43. Improved Search and Display Capabilities
• Gene Expression Data Query Forms
- improved layout
- new query capabilities
• Strongly enhanced query performance
• Expression Data Summaries
- more flexible and interactive
- option to download and export data
- image summaries
• Expression Assay Details
- integration of images and annotations
- improved layout: focus on essential data
44. New ways to access GXD Data
• MGI Batch Query
• GXD BioMart
45. • Enter a list of gene symbols or IDs and look up associated expression data
• Download and export data to other applications
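A batch lookup like the one described above can be sketched as two steps. First, resolve each input (symbol, synonym or ID) to a current gene symbol. Then report the expression data available for it. This Python sketch uses hypothetical tables, and the accession ID shown is a placeholder, not a real MGI ID. Unresolved inputs are kept in the output so users can see which entries failed.

```python
# Hypothetical identifier table: lower-cased symbols, synonyms and IDs -> current symbol.
# "mgi:0000001" is a placeholder, not a real MGI accession ID.
IDENTIFIERS = {
    "pax6": "Pax6",
    "pax-6": "Pax6",
    "mgi:0000001": "Pax6",
    "shh": "Shh",
    "drd1": "Drd1",
}
# Hypothetical counts of expression results per gene.
EXPRESSION_RESULTS = {"Pax6": 3, "Shh": 1}


def batch_lookup(queries):
    """Resolve each input and report (input, symbol, expression result count).

    Unresolved inputs get (input, None, None); resolved genes without
    expression data get a count of 0.
    """
    report = []
    for query in queries:
        symbol = IDENTIFIERS.get(query.strip().lower())
        count = EXPRESSION_RESULTS.get(symbol, 0) if symbol else None
        report.append((query, symbol, count))
    return report
```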
46. GXD BioMart
Find expression data
• for a gene
• for a list of genes
• for an anatomical structure
• for a mutant
• for a reference
Integrated searches across different BioMarts
47. GXD BioMart: Query Results (default view)
Export Data
Link to Images
Link to Assay Details
48. Acknowledgements
Constance Smith
Jacqueline Finger
Terry Hayamizu
Ingeborg McCright
Jingxia Xu
David Shaw
Joanne Berghout
MGI Software Group
Jim Kadin
Joel Richardson
Janan Eppig
GXD is supported by NICHD