Five-minute presentation for the SSI Fellowship selection day at Manchester University, 10th November 2014.
I present an overview of the current state of genome-scale evolutionary analyses ('phylogenomics') and highlight some technical and cultural problems in the field, which I hope to address by using SSI funding to run a series of phylogenetic software curation hackathons.
Software Sustainability Institute - Fellowship Selection Day 2014 - Joe Parker
1. Massively parallel phylogenomics
Joe Parker
Queen Mary University of London
Phylogenomics matters
flickr/stephenjjohnson Illumina.com spectrum.ieee.org
• Interface of genetics, evolution, statistics, computation
• Disease, adaptation, climate change, medicine
• Distributed, parallel & high-performance computation
2. Challenges
• Size of data sets
• Heterogeneous architecture
• Use of non-standard workflows
• Limited testing
• Time-limited projects
• Versioning / legacy
• Collaboration / silos
• Compatibility
Distribution
Source / binaries
Plugins, Git / SVN
Commercial
Open
Languages
Python Perl
C/C++ Java Javascript
Ruby R Matlab
3. Objective
Portal for phylo tools
(e.g. wiki / blog / searchable metadata feed / stack exchange)
“What are we all doing?”
“Surely this has been done before?”
Solutions
Projects
Binaries
Source
Discovery
Tag clouds / wiki
Metadata
Catalogue
Inputs / outputs
Versioning
Architecture
“How do you do that?”
4. Approach
Six Hackspace / networking events
“Live stack exchange for phylogenetics”
Portal
Share
Hack
Learn
Discovery
Existing tools
Best practice
Problems
Community
Build network
Develop consensus
Engage
• Build awareness / community, develop consensus on best practice, share hacks and develop a phylo tools portal.
• Research software engineers from academia, industry (including SMEs
and startups) and hackers.
• London’s Silicon Roundabout; space hire, refreshments and dissemination.
(L-R): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey
5. Outputs
Live-blog & documentation
Live stack exchange for phylogenetics, disseminate
Resource for tools metadata
Community-specified, developed and maintained
Supporting SSI Aims
Sustainability through community
Repeatability, communication and continuity
Legacy of best-practice
Peer-based dissemination of skills, problems, solutions
Editor's notes
Lots of existing tools - not parallelised for batch
‘cloud, db, network cartoons + phylogeny’
binaries from author plugins repos vendor
Repository for bio tools
Input
Output
Process
Binaries / test harnesses
Turn these into a network diag
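One possible way to turn catalogue entries into a network diagram is sketched below. It assumes each tool is described by the sets of formats it reads and writes, and links two tools whenever one tool's output format is another's input. The tool names, format names, the CATALOGUE list and the build_network function are all hypothetical and chosen only for illustration; they do not describe any existing portal or package.

```python
# Hypothetical catalogue entries: tool and format names are illustrative only.
CATALOGUE = [
    {"name": "aligner", "inputs": {"fasta"}, "outputs": {"alignment"}},
    {"name": "tree_builder", "inputs": {"alignment"}, "outputs": {"newick"}},
    {"name": "tree_viewer", "inputs": {"newick"}, "outputs": {"svg"}},
]


def build_network(catalogue):
    """Return directed edges (producer, consumer, shared_format).

    An edge is added whenever a format written by one tool is read by
    a different tool, i.e. the two could be chained in a workflow.
    """
    edges = []
    for producer in catalogue:
        for consumer in catalogue:
            if producer is consumer:
                continue  # ignore self-loops
            shared = producer["outputs"] & consumer["inputs"]
            for fmt in sorted(shared):
                edges.append((producer["name"], consumer["name"], fmt))
    return edges


if __name__ == "__main__":
    for src, dst, fmt in build_network(CATALOGUE):
        print(f"{src} -> {dst} [{fmt}]")
```

The resulting edge list could be passed to any graph-drawing tool; tools with no edges at all would be candidates for the compatibility work discussed under Challenges.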
Identify areas for collaboration / enhancements
Awareness of other developers / groups
Document best practices, hacks, common pitfalls
Live stack exchange for phylogenetics
Use ‘SSI Aims - action - outcome’