The SciLifeLab National Genomics Infrastructure is one of the largest sequencing facilities in Europe. We are an accredited facility providing library preparation, sequencing, basic analysis and quality control for Swedish research groups. Our sample throughput requires a highly automated and robust bioinformatics platform. Until recently, we had multiple analysis pipelines, built with a range of different workflow tools, for each data type. This made development work difficult and inevitably led to technical debt. In this talk, I will describe how we have migrated to Nextflow for a range of our data types, the difficulties that we faced, and how we hope to leverage Nextflow to migrate to the cloud in the coming years.
4. NGI bioinformatics
• Initial data analysis for major protocols
• Internal QC and standardised starting point for users
• Team of 10 bioinformaticians
• Accredited facility
9. sharing is caring
• Open-source on GitHub from day one
• Easier help and feedback from others
• Other people may help to develop your code
• https://github.com/nextflow-io/awesome-nextflow
11. use containers
• Create a Docker image, even if you don’t think you need to
• Makes local and automated testing possible
• Future-proof for cloud / Singularity / other people
13. test, test and test again
• Find a small test dataset
• Make a bash script to fetch data and launch pipeline
• Different flags to launch with different parameters
• Use Travis build matrix to launch parallel test runs
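The test setup above can be sketched as a small launcher that expands a matrix of parameter sets into one `nextflow run` command each, much as a CI build matrix would. This is a minimal sketch, not the NGI test script: the pipeline name `my-org/my-pipeline`, the `docker` profile name, the `--reads`, `--singleEnd` and `--skip_qc` parameters, and the test data path are all hypothetical. Only `nextflow run` and the `-profile` option are standard Nextflow command-line usage.

```python
import shlex

# Hypothetical pipeline and test data location (assumptions for illustration).
PIPELINE = "my-org/my-pipeline"
TEST_READS = "test_data/*_{1,2}.fastq.gz"

# Each entry mimics one cell of a CI build matrix: a test name plus extra flags.
# The flag names are hypothetical pipeline parameters.
TEST_MATRIX = {
    "default": [],
    "single_end": ["--singleEnd"],
    "skip_qc": ["--skip_qc"],
}


def build_command(extra_flags, profile="docker"):
    """Return the argument list for one test run of the pipeline."""
    return [
        "nextflow", "run", PIPELINE,
        "-profile", profile,
        "--reads", TEST_READS,
        *extra_flags,
    ]


def build_all_commands(matrix=TEST_MATRIX, profile="docker"):
    """Expand the matrix into a dict mapping test name to command."""
    return {name: build_command(flags, profile) for name, flags in matrix.items()}


if __name__ == "__main__":
    for name, cmd in build_all_commands().items():
        # shlex.join quotes the glob so the shell passes it to Nextflow unexpanded.
        print(f"[{name}] {shlex.join(cmd)}")
```

The commands are only printed here; each one could be passed to `subprocess.run` or pasted into a CI job, so every matrix cell runs as an independent test.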
17. minimal configs
• Build config files around functional blocks
• Hardware / software deps / genome references
• Use Nextflow profiles
• Even if only using ‘standard’ default
• Don’t be afraid to use multiple configs per profile
• Build on a base profile and be clever with limits
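The layering idea above (a base profile, several config fragments per profile, and limits that cap resource requests) can be illustrated outside Nextflow. The following Python sketch only models the logic; it is not Nextflow's config syntax or its implementation, and all setting names, profile names and values are hypothetical.

```python
# Hypothetical settings: a shared base, per-profile fragments, and hard limits.
BASE = {"cpus": 2, "memory_gb": 16, "time_h": 8}
PROFILES = {
    "standard": [{}],  # the default profile just uses the base
    "cluster": [{"cpus": 16, "memory_gb": 128}, {"time_h": 48}],
}
LIMITS = {"cpus": 8, "memory_gb": 64, "time_h": 24}


def resolve(profile, base=BASE, profiles=PROFILES, limits=LIMITS):
    """Layer a profile's config fragments over the base, then cap at the limits."""
    if profile not in profiles:
        raise KeyError(f"unknown profile: {profile}")
    merged = dict(base)
    for fragment in profiles[profile]:
        merged.update(fragment)  # later fragments override earlier ones
    # Never request more than the limit; settings without a limit pass through.
    return {
        key: min(value, limits[key]) if key in limits else value
        for key, value in merged.items()
    }


if __name__ == "__main__":
    print(resolve("standard"))  # {'cpus': 2, 'memory_gb': 16, 'time_h': 8}
    print(resolve("cluster"))   # {'cpus': 8, 'memory_gb': 64, 'time_h': 24}
```

Keeping the limits separate from the requests means a pipeline can ask for generous resources by default while a small machine or test environment lowers only the caps.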
21. reference genomes
• Illumina iGenomes is a great resource for this
• Standard organisation allows easy use of multiple genomes
• Use AWS iGenomes for free on AWS S3
• See https://ewels.github.io/AWS-iGenomes/
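Because iGenomes uses the same directory layout for every genome, reference paths can be built from a few keys rather than hard-coded per organism. A minimal sketch, assuming the usual `<species>/<source>/<build>/...` layout; the S3 base path in the example is an assumption and should be checked against the AWS-iGenomes page linked above, and `igenomes_path` is a hypothetical helper.

```python
# Relative locations inside one genome build, following the iGenomes layout.
RESOURCES = {
    "fasta": "Sequence/WholeGenomeFasta/genome.fa",
    "gtf": "Annotation/Genes/genes.gtf",
    "bowtie2": "Sequence/Bowtie2Index/",
}


def igenomes_path(base, species, source, build, resource):
    """Build the path to one reference resource under an iGenomes-style root."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource}")
    return "/".join([base.rstrip("/"), species, source, build, RESOURCES[resource]])


if __name__ == "__main__":
    # Assumed base path; verify against the AWS-iGenomes documentation.
    base = "s3://ngi-igenomes/igenomes"
    for build in ("GRCh37", "GRCh38"):
        print(igenomes_path(base, "Homo_sapiens", "NCBI" if build == "GRCh38" else "Ensembl", build, "fasta"))
```

Switching genome then only means changing the species, source and build keys; the rest of the pipeline configuration stays the same.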
40. future plans
• Use Singularity for everything
• Benchmark AWS run pricing for future planning
• Refine pipelines
• Improve resource requests
• Automate launch and run management