Manual curation is crucial to improving the quality of the annotations of a genome. It enables curators to refine automated gene predictions using experimental data and aligned predictions from closely related organisms to more accurately represent the underlying biology. Apollo is a web-based genome annotation editor that allows curators to manually revise and edit the structure and function of predicted genomic elements.
Apollo, built on top of the JBrowse genome browser, offers an ‘Annotator Panel’ that allows users to efficiently navigate the genome and its annotations. Changes are reflected in real time to all users (similar to Google Docs) and aggregated in a revertible, visual history of structural edits. Apollo exports the sequences and metadata associated with each annotated genomic element as FASTA, GFF3, or Chado. A single Apollo server can be scaled to support multiple genome projects and curators. Access to genomes is controlled with fine-grained permissions (e.g., administrator, curator, public). To support integration into larger workflows, we expose the suite of web services that drives user-interface functionality. These web services have been leveraged to integrate with Docker and the Galaxy platform.
To expand Apollo’s repertoire of visual exploration and exploratory analytics tools, two major features are currently under development. The first is the ability to visualize variant data and to annotate their predicted effects, primarily on coding regions. New technology trends and scientific paradigms point to a need for genomic analysis tools that leverage information about variants that impact human health. Driven by a growing need to identify disease-causing variants across diverse groups, we are working towards providing full functionality in genomic variant analysis and curation. The second is the transformation of separate genomic coordinates into a single, synthetic region. This will allow the visualization of two or more genomic regions, from the length of entire chromosomes to just a few exons, within an artificially constructed genomic region. Artificially joining scaffolds facilitates the annotation of genomic features split across two or more regions of a fragmented assembly (e.g., scaffolds), likely informing potential improvements to the genome assembly in the process. Additionally, this will allow hiding intra- and intergenic regions (visual genome folding) to provide a more information-rich visualization of the genome. For example, bringing exons closer together will facilitate annotating gene models with long introns, as sequences at the edges of exons separated by thousands of base pairs will be shown adjacently.
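The folding idea described above can be illustrated with a minimal Python sketch. It is not Apollo's implementation: the function names, the half-open 0-based interval convention, and the choice to raise an error for hidden positions are all assumptions made for illustration. It only shows how hiding introns makes exon edges adjacent in the folded view.

```python
from bisect import bisect_right
from typing import List, Tuple

# Assumption: intervals are half-open [start, end), 0-based, sorted, non-overlapping.
Interval = Tuple[int, int]


def build_folded_offsets(visible: List[Interval]) -> List[int]:
    """Return the folded start coordinate of each visible interval."""
    offsets, cursor = [], 0
    for start, end in visible:
        offsets.append(cursor)
        cursor += end - start
    return offsets


def fold(pos: int, visible: List[Interval], offsets: List[int]) -> int:
    """Map a genomic position inside a visible interval to folded coordinates."""
    starts = [start for start, _ in visible]
    i = bisect_right(starts, pos) - 1
    if i < 0 or pos >= visible[i][1]:
        raise ValueError(f"position {pos} is hidden by folding")
    return offsets[i] + (pos - visible[i][0])


def unfold(folded: int, visible: List[Interval], offsets: List[int]) -> int:
    """Map a folded coordinate back to its genomic position."""
    i = bisect_right(offsets, folded) - 1
    if i < 0:
        raise ValueError(f"folded position {folded} is out of range")
    start, end = visible[i]
    pos = start + (folded - offsets[i])
    if pos >= end:
        raise ValueError(f"folded position {folded} is out of range")
    return pos


# Hypothetical gene model: two exons separated by a 5,000 bp intron.
exons = [(100, 200), (5200, 5300)]
offsets = build_folded_offsets(exons)
last_base_exon1 = fold(199, exons, offsets)
first_base_exon2 = fold(5200, exons, offsets)
```

In this sketch the last base of the first exon and the first base of the second exon receive consecutive folded coordinates, which is the effect the abstract describes for gene models with long introns.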
Apollo is currently being used in over one hundred genome annotation projects around the world, ranging from annotation of a single species to lineage-specific efforts supporting the annotation of dozens of species at a time.
2. Genome Annotation (genomearchitect.org)
Structural Annotation
• exons, introns, UTRs
• repeat regions
• transposable elements
• tRNA, snRNA, snoRNA, miRNA, ncRNA, rRNA
Functional Annotation
• metabolic pathways / functions
• Gene Ontology
  • molecular function
  • biological process
  • cellular component
• expression
• gene families
http://geneontology.org
Photo Credit: Alex Wild at http://www.alexanderwild.com/
4. Example Genome Analysis Workflow
• Experimental design, sampling
• Sequencing
• Create Assembly
• Automated Annotation (FGENESH)
• Manual Annotation
• Curated Gene Set
• Comparative analyses
• Synthesis & dissemination
5. Example Genome Analysis Workflow: Analyses need Quality Data
• Experimental design, sampling
• Sequencing
• Create Assembly
• Annotation
  • Automated Annotation (FGENESH)
  • Manual Annotation
• Consensus Gene Set
• Comparative analyses
• Synthesis & dissemination
6. Integration into Workflow and Tools
• Over 100 organizations refine annotation
• Multiple genomes per organization
• Refined Annotations Distributed to Public: NCBI, Ensembl
7. Automated Identification is not Perfect
Automated Annotation
• Generation of Gene Models: find ORFs, multiple rounds of gene prediction
• Annotation of Gene Models: predicting function, expression patterns, metabolic network memberships
• Assembly errors can cause fragmented annotations
• Limited coverage makes precise identification difficult
Manual Annotation
8. Manual Annotation Refines Genome
Human Analysis
Automated Annotation
Experimental Evidence: cDNAs, HMM domain searches, RNAseq, genes from other species.
• Additional data
• Biological knowledge
• Curator best represents underlying evidence
Manual Annotation
9. Apollo is a Tool for Collaborative Annotation
Annotators
Apollo: Google Web Toolkit (GWT) / Bootstrap
• Web-based Editor
• Real-time collaborative
• Easy to use
• genomic browser
Photo Credits: i5K; Alex Wild at http://www.alexanderwild.com/: leaf cutter ant, ensign wasp; Leo Bukeboom: Nasonia vitripennis jewel wasp; Wikimedia Commons: Apis mellifera honey bee; Mike MacNeil USDA/ARS Fort Keogh LARRL: Bos taurus cow.
10. 1 - Evidence Viewer / Genome Browser
Evidence
• Transcripts (GFF3, GBK)
• BAM Reads
• BigWig XY
• BigWig HeatMap
• Themes (dark/light)
• Color CDS Frame
Automated Annotation
Manual Annotation
11. 1 - Evidence Viewer (Genome Browser)
Configure Multiple Tracks
• Dynamically Open
• Statically Configure
• Append via URL:
  addStores={"url":{"type":"JBrowse/Store/SeqFeature/GFF3","urlTemplate":"http://host/genes.gff"}}&addTracks=[{"label":"genes","type":"JBrowse/View/Track/CanvasFeatures","store":"url"}]
• BAM
• BigWig
• GFF
• GTF
• GBK
• VCF
• FASTA
• FASTAi
• SPARQL
• custom types (e.g., REST endpoint)
https://gmod.github.io/jbrowse-registry/
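The "Append via URL" option can be sketched in Python as follows. The store and track definitions are the ones shown on the slide; the base URL `http://host/jbrowse/index.html` and the helper name `jbrowse_url` are assumptions for illustration, and the query string is percent-encoded so that the JSON survives in a link.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Store and track definitions taken from the slide.
STORES = {
    "url": {
        "type": "JBrowse/Store/SeqFeature/GFF3",
        "urlTemplate": "http://host/genes.gff",
    }
}
TRACKS = [
    {
        "label": "genes",
        "type": "JBrowse/View/Track/CanvasFeatures",
        "store": "url",
    }
]


def jbrowse_url(base: str, stores: dict, tracks: list) -> str:
    """Build a JBrowse link that adds stores and tracks through URL parameters."""
    params = {
        # Compact JSON keeps the link short; urlencode handles the escaping.
        "addStores": json.dumps(stores, separators=(",", ":")),
        "addTracks": json.dumps(tracks, separators=(",", ":")),
    }
    return f"{base}?{urlencode(params)}"


# Hypothetical base URL; replace with the address of a real JBrowse instance.
link = jbrowse_url("http://host/jbrowse/index.html", STORES, TRACKS)

# Round trip: decoding the query string recovers the original JSON.
decoded = parse_qs(urlsplit(link).query)
```

Generating the link programmatically avoids hand-escaping quotes and braces, which is the usual source of errors when such URLs are typed manually.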
12. 2 - Genome Annotation Editor
• Transcripts (GFF3, GBK)
• BAM Reads
• BigWig XY
• BigWig HeatMap
Automated Annotation
Manual Annotation
Exported Refined Genomic Elements
13. Create Annotation
• Add Annotation by Dragging a Genomic Element
• Alignments shown in red
• Annotate other genomic types with drop-down
14. Edit Annotation Structure
• Adjust exon by dragging
15. Editing Annotations
• Edit Additional Structural Data (right-click popup)
• Edit Associations
  • PubMed / dbxref
  • Gene Ontology
  • Metadata
  • key/value
  • status
  • comments
• Change Annotation Type
• History of Structural Edits
16. Edit Annotation Structure
Revertible History of Structural Operations
• Current position
• Highlighted row shown
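A revertible history with a "current position" can be sketched as a list of snapshots plus a pointer. This is an illustrative Python model only, not Apollo's code: the class name, the snapshot representation (exon coordinate tuples), and the choice to discard later entries when a new edit is recorded after a revert are all assumptions.

```python
from typing import List, Tuple

Exon = Tuple[int, int]


class AnnotationHistory:
    """Snapshot history of structural edits with a movable current position."""

    def __init__(self, initial_exons: List[Exon]) -> None:
        self._entries = [("initial", tuple(initial_exons))]
        self.position = 0  # index of the highlighted (current) row

    @property
    def current(self) -> List[Exon]:
        return list(self._entries[self.position][1])

    def record(self, operation: str, exons: List[Exon]) -> None:
        # Assumption: a new edit discards any entries after the current position.
        del self._entries[self.position + 1:]
        self._entries.append((operation, tuple(exons)))
        self.position += 1

    def revert_to(self, index: int) -> None:
        if not 0 <= index < len(self._entries):
            raise IndexError(f"no history entry {index}")
        self.position = index

    def undo(self) -> None:
        self.revert_to(max(0, self.position - 1))

    def redo(self) -> None:
        self.revert_to(min(len(self._entries) - 1, self.position + 1))

    def rows(self) -> List[Tuple[bool, str]]:
        """Rows for display: (is_current, operation name)."""
        return [(i == self.position, op) for i, (op, _) in enumerate(self._entries)]


history = AnnotationHistory([(100, 200), (300, 400)])
history.record("set exon boundary", [(100, 220), (300, 400)])
history.record("delete exon", [(100, 220)])
history.undo()  # the current position moves back to "set exon boundary"
```

Storing full snapshots keeps reverting trivial; a production system would more likely store operations or diffs, which this sketch does not attempt.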
17. Annotate Reference Sequence Alterations
• Alteration Reflected
18. 3 - Annotator Panel
• Search
• View / Edit Details
• List / Navigate Vertically
• Collapsible
• Link to Location
• Alternate Annotations View
19. Reference Sequence - Search and Export
• Search
• Navigation
• Export Annotations
20. Organism: Configuration
• Import JBrowse directory
• Create JBrowse tracks from FASTA / GFF3 / BAM / BigWig
• Share “Public” organisms
Genome Res. 2009 Sep;19(9):1630-8. doi: 10.1101/gr.094607.109
21. Admin: Users and Groups
• Add / Search Users
• Edit User Permission
• User Can “Admin” an Organism
• Use Groups to Manage Bulk Permissions
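How per-user and group-level permissions might combine can be sketched as below. This is a hypothetical model, not Apollo's permission system: the level names follow the examples given in the abstract (administrator, curator, public), and the rule that the most permissive grant wins is an assumption.

```python
from enum import IntEnum
from typing import Dict, Iterable, Tuple


class Permission(IntEnum):
    # Hypothetical levels, ordered from least to most permissive.
    NONE = 0
    PUBLIC = 1
    CURATOR = 2
    ADMINISTRATOR = 3


Grants = Dict[Tuple[str, str], Permission]  # (user or group, organism) -> level


def effective_permission(
    user: str,
    organism: str,
    user_grants: Grants,
    group_grants: Grants,
    memberships: Dict[str, Iterable[str]],
) -> Permission:
    """Combine direct and group grants; assumes the most permissive one wins."""
    levels = [user_grants.get((user, organism), Permission.NONE)]
    for group in memberships.get(user, ()):
        levels.append(group_grants.get((group, organism), Permission.NONE))
    return max(levels)


# Hypothetical data: one group grants curator rights on an organism in bulk.
user_grants = {("alice", "Apis mellifera"): Permission.ADMINISTRATOR}
group_grants = {("bee-curators", "Apis mellifera"): Permission.CURATOR}
memberships = {"alice": ["bee-curators"], "bob": ["bee-curators"]}

bob_level = effective_permission(
    "bob", "Apis mellifera", user_grants, group_grants, memberships
)
```

Here `bob` has no direct grant but inherits curator rights from the group, which is the kind of bulk management the slide refers to.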
22. Architecture
Client(s)
• Annotators
• Apollo: Google Web Toolkit (GWT) / Bootstrap
• JBrowse: DOJO / jQuery
• Web Services Client: Perl, Shell, Groovy, PHP, etc.
REST / WebSocket
Server
• Apollo Server - Grails
• Security
• JDBC
• File System
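A web services client in the sense of this diagram could look like the Python sketch below. It only constructs the HTTP request and does not send it. The server address, the credentials, the endpoint path `organism/findAllOrganisms`, and the convention of passing credentials in the JSON body are all assumptions here and should be checked against the web services documentation of the Apollo version in use.

```python
import json
from urllib.request import Request


def build_request(server: str, path: str, payload: dict,
                  username: str, password: str) -> Request:
    """Build (but do not send) a JSON POST request for a REST-style service."""
    # Assumption: credentials travel in the JSON body alongside the payload.
    body = dict(payload, username=username, password=password)
    return Request(
        url=f"{server.rstrip('/')}/{path.lstrip('/')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server, endpoint, and credentials, for illustration only.
request = build_request(
    server="http://localhost:8080/apollo/",
    path="organism/findAllOrganisms",
    payload={},
    username="curator@example.org",
    password="secret",
)
# To actually send it: urllib.request.urlopen(request), then read the response.
```

Because the interface is plain HTTP with JSON, the same request can be issued from Perl, shell, Groovy, or PHP, as the slide notes.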
27. Summary
Annotators
Apollo: Google Web Toolkit (GWT) / Bootstrap
• Real-time collaborative
• Curators refine genome annotations
• Integrates within workflow
• Visual evidence and feedback
28. Future Work: Coordinate Transform
[Figure: Apollo roadmap. Labels: Desktop Apollo; Web Apollo; Mavenize; DB backend, Sidebar, Grails, Multi-organism, WS; versions 1.0, 2.0, 2.1, 2.2, 2.3; * Coordinate Transformation; * Variant annotation and visualization; Phenotype annotation]
• Genome Folding
• Assembly Composition (Group 20, Group 31)
29. Combine Scaffolds
Lock and Orient Combined Scaffolds
• Select to Combine
• Drag to Rearrange
• Set Orientation
• Used Scaffolds
30. Combine Scaffolds
• View Individual Features
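Combining oriented scaffolds into one synthetic region amounts to a coordinate projection, sketched below in Python. This is an illustration under stated assumptions rather than Apollo's implementation: the class and method names are hypothetical, coordinates are 0-based and half-open, and scaffolds are simply concatenated in the order given with no gap between them.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Placement:
    name: str
    length: int
    reverse: bool = False  # True if the scaffold is shown reverse-oriented


class SyntheticRegion:
    """Concatenate oriented scaffolds and project coordinates onto the result."""

    def __init__(self, placements: Iterable[Placement]) -> None:
        self.placements: Dict[str, Placement] = {}
        self.offsets: Dict[str, int] = {}
        cursor = 0
        for placement in placements:
            self.placements[placement.name] = placement
            self.offsets[placement.name] = cursor
            cursor += placement.length
        self.length = cursor

    def project(self, name: str, pos: int) -> int:
        placement = self.placements[name]
        if not 0 <= pos < placement.length:
            raise ValueError(f"{pos} is outside scaffold {name}")
        # A reversed scaffold is mirrored before the offset is applied.
        local = placement.length - 1 - pos if placement.reverse else pos
        return self.offsets[name] + local

    def project_feature(self, name: str, start: int, end: int) -> Tuple[int, int]:
        """Project a half-open feature [start, end); the result is half-open too."""
        a = self.project(name, start)
        b = self.project(name, end - 1)
        return min(a, b), max(a, b) + 1


# Hypothetical fragmented assembly: a gene split across the end of scaffold A
# and the end of scaffold B, with B placed in reverse orientation.
region = SyntheticRegion([Placement("A", 1000), Placement("B", 500, reverse=True)])
part_on_a = region.project_feature("A", 900, 1000)
part_on_b = region.project_feature("B", 400, 500)
```

With this ordering and orientation the two fragments become contiguous in the synthetic region, so they can be viewed and annotated as a single feature.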
31. Variant Annotation and Visualization
[Figure: Apollo roadmap. Labels: Desktop Apollo; Web Apollo; Mavenize; DB backend, Sidebar, Grails, Multi-organism, WS; versions 1.0, 2.0, 2.1, 2.2, 2.3; * Coordinate Transformation; * Variant annotation and visualization; Phenotype annotation]
• Create from Evidence (e.g., VCF)
• Annotate Variants
• Visual Predictions
32. • Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Stephen Ficklin, GenSAS, Washington State University
• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231
• Alex Wild at http://www.alexanderwild.com/: leaf cutter ant, ensign wasp; Leo Bukeboom: Nasonia vitripennis jewel wasp; Wikimedia Commons: Apis mellifera honey bee; Mike MacNeil USDA/ARS
• Thanks to you and the Apollo / GMOD Communities
Apollo: *Nathan Dunn, Monica Munoz-Torres, Deepak Unni §, Colin Diesh §
JBrowse: Eric Yao
Texas A&M University: Eric Rasche
Gene Ontology: Chris Mungall, Seth Carbon, Jeremy Nguyen
NAL at USDA: Christopher Childers, Monica Poelchau, Yu-Yu “Fish” Lin
BBOP
Apollo: http://genomearchitect.org
https://github.org/GMOD/Apollo/
@apollo_bbop
Questions?