The document proposes an Oz Mammals Bioinformatics and Data Resource to store, share, and analyze genomic and other data from Australian mammal studies. It would:
1) Capture existing Oz mammal data and resources, provide long-term storage, and integrate new genomic data from the OMG Project.
2) Enable data sharing within the OMG project and provide access to Oz mammal data worldwide.
3) Give access to data processing, analysis, and visualization tools, and integrate with external resources like the Atlas of Living Australia.
1. An Oz Mammals Bioinformatics and Data Resource
Vicky Schneider, Andrew Pask, Denis O’Meally, Philippa Griffin, Jeff Christiansen, Mike Charleston, Dominique Gorse, Andrew Treloar, Jason Williams, Rebecca Johnson and Andrew Lonie
With comments from Paul Flicek, Michelle Barker, Susanna Sansone, Dave Burt
2. Raw
• Whole-genome sequencing reads
• Exon-capture sequencing reads
• RADseq/GBS/exon-capture sequencing reads
Processed
• Genome assemblies
• Gene alignments
• Phylogenetic trees
• Variant calls
• Transcriptome data
• Annotations
• Sequence alignments
• Phylogenetic trees
• Microsatellite datasets
• Cytological data (images?)
• Phenotype information
• …
From Australian Alps on Flickr: https://www.flickr.com/photos/australianalps/6954940609 CC BY-NC-ND 2.0
3. OMG Project data
● Hugely valuable for
○ Understanding our natural heritage
○ Tackling evolutionary and ecological questions
○ Placental mammal research including human biomedical research
● Uniquely Australian
● Often irreplaceable samples and data
We are leading the world in generating these data
Could also be leading in sharing the data!
5. Data Life Cycle framework
Metadata (contextual information about the data) is key to making this work, e.g.
Sample
- species
- tissue type
- collection location
- museum ID
Experiment
- sample processing method
- technology used
- settings
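The Sample and Experiment fields listed above could be captured in a small structured record, so that incomplete metadata is easy to spot before data are shared. This is only a minimal sketch: the class names, field names and example values are illustrative assumptions based on the slide, not an agreed OMG project schema.

```python
from dataclasses import dataclass, asdict, field
from typing import Optional


@dataclass
class Sample:
    # Field names follow the slide; the types are assumptions.
    species: str
    tissue_type: str
    collection_location: str
    museum_id: Optional[str] = None


@dataclass
class Experiment:
    sample: Sample
    processing_method: str
    technology: str
    settings: dict = field(default_factory=dict)


def missing_fields(record) -> list:
    """Return the names of top-level fields that are empty (None, '' or {})."""
    return [name for name, value in asdict(record).items()
            if value is None or value == "" or value == {}]


# Hypothetical example values, for illustration only.
sample = Sample(
    species="Tachyglossus aculeatus",
    tissue_type="liver",
    collection_location="New South Wales",
)
experiment = Experiment(
    sample=sample,
    processing_method="DNA extraction",
    technology="whole-genome sequencing",
)

print(missing_fields(sample))      # museum ID has not been recorded
print(missing_fields(experiment))  # instrument settings have not been recorded
```

A check like `missing_fields` could run at submission time to prompt contributors for whatever is still blank.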
6. What is not covered by the OMG project?
• A place to store and share data and metadata for the OMG project
• A place to store and share existing Oz mammal datasets
• A place to share data processing/analysis workflows
• A place to access data processing, analysis and visualisation tools (with appropriate compute resources)
• Integration with external tools, e.g. Atlas of Living Australia
7. • 5 strains x 2 growth conditions of 2 bacterial species
• Genomic, transcriptomic, metabolomic and proteomic profiles
8.
9. Select datasets using drop-down menu of metadata values, e.g.
• ‘all raw transcriptomic datasets from E. coli grown in blood media’
• ‘all datasets from bacterial samples collected from patients in NSW before 2010’
Send to local desktops, HPC systems for analysis
Log in
Process/analyse/visualise in common cloud-based environment with pre-installed software tools
Submit data and metadata to international repository
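Selecting datasets by metadata values, as in the first example query above, amounts to filtering a catalogue on exact matches. The sketch below assumes a hypothetical in-memory catalogue of dictionaries; the keys (`organism`, `data_type`, `stage`, `growth_medium`) and the records themselves are made up for illustration and do not reflect how any existing platform stores its metadata.

```python
def select_datasets(catalogue, **criteria):
    """Return the records whose metadata match every given key/value pair."""
    return [
        record for record in catalogue
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical catalogue entries, for illustration only.
catalogue = [
    {"id": "ds1", "organism": "E. coli", "data_type": "transcriptomic",
     "stage": "raw", "growth_medium": "blood"},
    {"id": "ds2", "organism": "E. coli", "data_type": "proteomic",
     "stage": "raw", "growth_medium": "blood"},
    {"id": "ds3", "organism": "E. coli", "data_type": "transcriptomic",
     "stage": "processed", "growth_medium": "blood"},
]

# 'all raw transcriptomic datasets from E. coli grown in blood media'
hits = select_datasets(
    catalogue,
    organism="E. coli",
    data_type="transcriptomic",
    stage="raw",
    growth_medium="blood",
)
print([record["id"] for record in hits])
```

Each drop-down menu would supply one key/value pair; a date-range query such as 'before 2010' would need a comparison rather than an exact match.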
10. • Large collaborative project funded by Research Data Services (RDS), linked with the BPA Antibiotic-Resistant Pathogens Project
• Project members from VicNode, QCIF, Melbourne Bioinformatics (formerly VLSCI), Intersect
• There is expertise in Australia in developing this kind of resource:
○ data storage
○ research data management
○ delivering analysis tools in a common cloud environment
○ linking across storage/management/analysis layers
• Many of the pieces can be reused/adapted for different research projects
11. Existing Oz Mammal data resources
• For within-project collaboration
• Focus on data sharing, (storage), community genome annotation
• Datasets mostly unpublished as yet
Tools:
• File downloads
• JBrowse
• BLAST
• Apollo
12. http://copo-project.org/
Aims to provide an easy-to-use interface for researchers to access interoperable
• Metadata annotation services
• Data repository services
• Data analysis services
• Data publishing services
17. What could a well-funded Oz Mammals Data and Bioinformatics Resource do?
• Capture Oz Mammal data and resources that already exist
○ long-term, secure data storage
• Integrate new OMG data and metadata
• Enable data sharing within OMG project (and collaborators)
• Provide access to Oz Mammal data for the world!
• Access to data processing, analysis and visualisation tools in one place
• Integrate external tools, e.g. Atlas of Living Australia
• Enable sharing of processing/analysis workflows within the project
• Enable sharing via submission to appropriate international repositories
○ encourage best-practice data formats
○ encourage complete, rich metadata that complies with repository and community standards
• Use and build on existing platforms like the OMICS platform
• Long-term hosting and maintenance
18.
19. Current way forward
• Drafting a proposal aimed at ANDS/NeCTAR/RDS
• No funding scheme yet - but possibly later this year
• Engaging with the European Bioinformatics Institute (Ensembl Vertebrates), ISA-Tools and CyVerse for potential collaborations and advice
• Aligns with the broader digital infrastructure strategy currently being mapped at national level
20. Timescale
• Year 1-2: scoping requirements, building, ongoing testing
• Year 2-3: building, release, outreach/training, improvement
Expertise required
• Research Software Engineering
• Business Analyst expertise + domain knowledge
• Biocuration
• Input on Bioinformatics Needs
• Input on User Experience Design
• Input on Training/Outreach
• Project Management
For comparison
• COPO: 4 FTE for 3 years
• CyVerse: 35 FTE for 5 years (US$100 million over 10 years)
Matt Francey on Flickr: https://www.flickr.com/photos/howfardad/31879952075
CC BY-NC 2.0
21. Your thoughts?
• Are there OMG project needs not covered in this list?
• Any other Oz Mammal portals/resources to be aware of / consider incorporating?
• What do you see as the highest priority in data management / accessing compute resources / sharing and storing data for the OMG:
• Currently?
• A year from now?