NGI stockholm
NGI-ChIPseq
Processing ChIP-seq data at the
National Genomics Infrastructure
Phil Ewels
phil.ewels@scilifelab.se
NBIS ChIP-seq tutorial
2017-11-29
SciLifeLab NGI
Our mission is to offer a state-of-the-art infrastructure for massively parallel DNA sequencing and SNP genotyping, available to researchers all over Sweden
National resource
State-of-the-art infrastructure
Guidelines and support
We provide guidelines and support for sample collection, study design, protocol selection and bioinformatics analysis
NGI Organisation
Funding
Staff salaries
Premises and service contracts
Capital equipment
Host universities
SciLifeLab
VR
KAW
User fees
Reagent costs
NGI Stockholm NGI Uppsala
Project timeline
• Sample QC
• Library preparation, sequencing, genotyping
• Data processing and primary analysis
• Scientific support and project consultation
• Data delivery
Methods offered at NGI
• Exome sequencing
• Nanopore sequencing
• ATAC-seq
• Metagenomics
• ChIP-seq
• Bisulphite sequencing
• RAD-seq
• RNA-seq
• de novo
• Whole Genome seq
Figure legend: Data analysis included for FREE / Just Sequencing / Accredited methods
ChIP-seq: NGI Stockholm
• You do the ChIP, we do the seq

• Rubicon ThruPlex DNA (NGI Production)

• Min 1 ng input
• Min 10 μl
• 0.2-10 ng/μl
• Ins. size 200-800 bp
• 963 kr / prep
• Typically run SE 50bp

• Illumina HiSeq High Output mode v4, SR 1x50bp
• 1226 kr / sample (40M reads)
• Start by organising a planning meeting
https://ngisweden.scilifelab.se
ChIP-seq Pipeline
• Takes raw FastQ sequencing data as input

• Provides range of results

• Alignments (BAM)
• Peaks (optionally filtered)
• Quality Control
• Pipeline in use since early 2017 (on request)
Tool                   Step
FastQC                 Sequence QC
TrimGalore!            Read trimming
BWA                    Alignment
Samtools, Picard       Sort, index, mark duplicates
Phantompeakqualtools   Strand cross-correlation QC
deepTools              Fingerprint, sample correlation
NGSPlot                TSS / Gene profile plots
MACS2                  Peak calling
Bedtools               Filtering blacklisted regions
MultiQC                Reporting
File formats shown: FastQ, BAM, BED, HTML
Nextflow
• Tool to manage computational pipelines

• Handles interaction with compute infrastructure

• Easy to learn how to run, minimal oversight required
https://www.nextflow.io/
#!/usr/bin/env nextflow
input = Channel.fromFilePairs( params.reads )
process fastqc {
    input:
    file reads from input
    output:
    file "*_fastqc.{zip,html}" into results
    script:
    """
    fastqc -q $reads
    """
}
Default: Run locally, assume software is installed
Config: submit jobs to SLURM queue, use environment modules
process {
    executor = 'slurm'
    clusterOptions = { "-A b2017123" }
    cpus = 1
    memory = 8.GB
    time = 2.h
    $fastqc {
        module = ['bioinfo-tools', 'FastQC']
    }
}
Config: run locally, use a Docker container for all software dependencies
docker {
    enabled = true
}
process {
    container = 'biocontainers/fastqc'
    cpus = 1
    memory = 8.GB
    time = 2.h
}
NGI-ChIPseq
https://github.com/SciLifeLab/NGI-ChIPseq
Running NGI-ChIPseq
Step 1: Install Nextflow

• Uppmax - load the Nextflow module
module load nextflow
• Anywhere (including Uppmax) - install Nextflow
curl -s https://get.nextflow.io | bash
Step 2: Try running NGI-ChIPseq pipeline

nextflow run SciLifeLab/NGI-ChIPseq --help
Step 3: Choose your reference

• Common organism - use iGenomes
--genome GRCh37
• MACS peak calling config file
--macsconfig config.csv
Step 4: Organise your data

• One (if single-end) or two (if paired-end) FastQ per sample
• Everything in one directory, simple filenames help!
Step 5: Run the pipeline on your data

• Remember to run detached from your terminal
screen / tmux / nohup
Step 6: Check your results

• Read the Nextflow log and check the MultiQC report
Step 7: Delete temporary files

• Delete the ./work directory, which holds all intermediates
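Step 5 can be sketched with nohup. This is a minimal sketch rather than part of the pipeline: the helper name run_detached, the variable detached_pid and the log file name are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of NGI-ChIPseq): start any command so that
# it keeps running after the terminal closes, writing all output to a log.
run_detached() {
    log_file=$1
    shift
    # nohup ignores the hangup signal; '&' puts the job in the background.
    nohup "$@" > "$log_file" 2>&1 &
    detached_pid=$!
    echo "Started PID $detached_pid, logging to $log_file"
}

# Example usage (commented out so the sketch runs without Nextflow installed):
# run_detached pipeline.log nextflow run SciLifeLab/NGI-ChIPseq \
#     --genome GRCh37 --macsconfig config.csv --reads 'data/*.fastq.gz' --singleEnd
# tail -f pipeline.log
```

screen and tmux achieve the same goal interactively; the nohup variant is shown only because it fits in a few lines.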
Using UPPMAX
nextflow run SciLifeLab/NGI-ChIPseq \
    --project b2017123 \
    --genome GRCh37 --macsconfig p.txt \
    --reads "data/*_R{1,2}.fastq.gz"
• Default config is for UPPMAX

• Knows about central iGenomes references
• Uses centrally installed software
Using other clusters
nextflow run SciLifeLab/NGI-ChIPseq \
    -profile hebbe \
    --bwaindex ./ref --macsconfig p.txt \
    --reads "data/*_R{1,2}.fastq.gz"
• Can run just about anywhere

• Supports local, SGE, LSF, SLURM, PBS/Torque, HTCondor, DRMAA, DNAnexus, Ignite, Kubernetes
Using Docker
nextflow run SciLifeLab/NGI-ChIPseq \
    -profile docker \
    --fasta genome.fa --macsconfig p.txt \
    --reads "data/*_R{1,2}.fastq.gz"
• Can run anywhere with Docker

• Downloads required software and runs in a container
• Portable and reproducible.
Using AWS
nextflow run SciLifeLab/NGI-ChIPseq \
    -profile aws \
    --genome GRCh37 --macsconfig p.txt \
    --reads "s3://my-bucket/*_{1,2}.fq.gz" \
    --outdir "s3://my-bucket/results/"
• Runs on the AWS cloud with Docker

• Pay-as-you-go, flexible computing
• Can launch from anywhere with minimal configuration
Input data
ERROR ~ Cannot find any reads matching: XXXX
NB: Path needs to be enclosed in quotes!
NB: Path requires at least one * wildcard!
If this is single-end data, please specify --singleEnd on the command line.
Works:
--reads '*_R{1,2}.fastq.gz'
--reads '*.fastq.gz' --singleEnd
Fails:
--reads *_R{1,2}.fastq.gz (no quotes)
--reads '*.fastq.gz' (single-end data without --singleEnd)
--reads sample.fastq.gz (no quotes, no wildcard)
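The quoting rule can be checked without running the pipeline. In the sketch below, count_args is a hypothetical stand-in for nextflow that only reports how many arguments it received, and the file names are made up.

```shell
#!/bin/sh
# Hypothetical stand-in for the pipeline: prints how many arguments it got.
count_args() {
    echo "$#"
}

# Throwaway directory with two FastQ-like files.
demo_dir=$(mktemp -d)
touch "$demo_dir/sample1.fastq.gz" "$demo_dir/sample2.fastq.gz"

# Quoted: the pattern reaches the program untouched (flag + pattern).
quoted=$(cd "$demo_dir" && count_args --reads '*.fastq.gz')

# Unquoted: the shell expands the wildcard first (flag + one argument per file).
unquoted=$(cd "$demo_dir" && count_args --reads *.fastq.gz)

echo "quoted:   $quoted arguments"
echo "unquoted: $unquoted arguments"

rm -rf "$demo_dir"
```

With the two demo files, the quoted call reports 2 arguments and the unquoted call reports 3, which is why the pipeline never sees the wildcard when the quotes are left out.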
Read trimming
• Pipeline runs TrimGalore! to remove adapter
contamination and low quality bases automatically

• Use --notrim to disable this
• Some library preps also include additional adapters

--clip_r1 [int]
--clip_r2 [int]
--three_prime_clip_r1 [int]
--three_prime_clip_r2 [int]
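As a rough illustration of what the clipping options do to a single read, assuming they behave like the matching TrimGalore! options (remove a fixed number of bases from the 5' or 3' end): clip_read is a hypothetical helper, the read sequence is made up, and real trimming also handles quality strings and adapters.

```shell
#!/bin/sh
# Illustration only: mimic 5' and 3' clipping on a single read sequence.
# clip_read is a hypothetical helper, not a pipeline or TrimGalore! command.
clip_read() {
    read_seq=$1
    five_prime=$2    # bases to remove from the start (cf. --clip_r1)
    three_prime=$3   # bases to remove from the end (cf. --three_prime_clip_r1)
    first=$(( five_prime + 1 ))
    last=$(( ${#read_seq} - three_prime ))
    printf '%s\n' "$read_seq" | cut -c"${first}-${last}"
}

clip_read ACGTACGTAC 2 3
```

For the 10-base demo read this prints GTACG: two bases are dropped from the start and three from the end.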
Blacklist filtering
• Some parts of the reference genome collect incorrectly
mapped reads

• Good practice to remove these peaks
• Pipeline has ENCODE regions for Human & Mouse

• Can pass own BED file of custom regions

--blacklist_filtering
--blacklist regions.bed
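The idea behind blacklist filtering can be sketched with a small overlap test on BED intervals. This is an illustration only: the pipeline uses Bedtools for this step, filter_blacklist is a hypothetical helper, and the peak and blacklist coordinates are made up. It assumes a non-empty blacklist file.

```shell
#!/bin/sh
# Illustration only: drop peaks that overlap any blacklisted region.
# The pipeline itself uses Bedtools; filter_blacklist is a hypothetical helper.
filter_blacklist() {
    peaks_bed=$1
    blacklist_bed=$2
    awk 'BEGIN { FS = OFS = "\t" }
         # First file: remember every blacklisted interval.
         NR == FNR { chrom[NR] = $1; lo[NR] = $2 + 0; hi[NR] = $3 + 0; n = NR; next }
         # Second file: keep a peak only if it overlaps none of them.
         {
             keep = 1
             for (i = 1; i <= n; i++)
                 if ($1 == chrom[i] && $2 + 0 < hi[i] && $3 + 0 > lo[i]) { keep = 0; break }
             if (keep) print
         }' "$blacklist_bed" "$peaks_bed"
}

# Made-up example data.
work_dir=$(mktemp -d)
printf 'chr1\t100\t200\tpeak_a\nchr1\t500\t600\tpeak_b\nchr2\t100\t200\tpeak_c\n' > "$work_dir/peaks.bed"
printf 'chr1\t150\t300\n' > "$work_dir/blacklist.bed"

kept=$(filter_blacklist "$work_dir/peaks.bed" "$work_dir/blacklist.bed")
printf '%s\n' "$kept"
rm -rf "$work_dir"
```

Here peak_a is removed because it overlaps the blacklisted interval, while peak_b and peak_c are kept. With Bedtools installed, a comparable result would typically come from bedtools intersect -v -a peaks.bed -b blacklist.bed.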
Broad Peaks
• Some chromatin profiles don't have narrow, sharp peaks

• For example, H3K9me3 & H3K27me3
• MACS2 can call peaks in "broad peak" mode

• Pipeline uses default qvalue cutoff of 0.1
--broad
Extending Read Length
• When using single-end data, sequenced read length is
shorter than the sequence fragment length

• For DeepTools, need to "extend" the read length

• Set to 100 bp by default. Use this parameter to customise this value.
• Value = expected fragment length minus sequenced read length
--extendReadsLen [int]
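The arithmetic is simple enough to script. A minimal sketch, assuming you already know your expected fragment length; extend_reads_len is a hypothetical helper and the 200 bp fragment length is a made-up example.

```shell
#!/bin/sh
# Hypothetical helper: suggest a value for --extendReadsLen.
extend_reads_len() {
    fragment_len=$1   # expected fragment length in bp
    read_len=$2       # sequenced read length in bp
    echo $(( fragment_len - read_len ))
}

# Made-up example: 200 bp fragments sequenced as SE 50 bp reads.
extend_reads_len 200 50
```

This prints 150, so the run would use --extendReadsLen 150. With 150 bp fragments and 50 bp reads the result is 100, which matches the default.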
Saving intermediates
• By default, the pipeline doesn't save some intermediate
files to your final results directory

• Reference genome indices that have been built
• FastQ files from TrimGalore!
• BAM files from BWA (we have BAMs from Picard)
--saveReference
--saveTrimmed
--saveAlignedIntermediates
Resuming pipelines
• If something goes wrong, you can resume a stopped
pipeline

• Will use cached versions of completed processes
• NB: Only one hyphen!
-resume
• Can resume specific past runs

• Use nextflow log to find job names
-resume job_name
Customising output
-name         Give a name to your run. Used in logs and reports
--outdir      Specify the directory for saved results
--saturation  Run saturation analysis, subsampling reads from 10% - 100%
--email       Get e-mailed a summary report when the pipeline finishes
Nextflow config files
• Can save a config file with defaults

• Anything with two hyphens on the command line is a param
params {
    email = 'phil.ewels@scilifelab.se'
    project = "b2017123"
}
process.$multiqc.module = []
• Config file locations:
./nextflow.config
~/.nextflow/config
-c /path/to/my.config
NGI-ChIPseq config
N E X T F L O W ~ version 0.26.1
Launching `SciLifeLab/NGI-ChIPseq` [deadly_bose] - revision: 28e24c2a2a
=========================================
NGI-ChIPseq: ChIP-Seq Best Practice v1.4
=========================================
Run Name : deadly_bose
Reads : data/*fastq.gz
Data Type : Single-End
Genome : GRCh37
BWA Index : /sw/data/uppnex/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/
MACS Config : data/macsconfig.txt
Saturation analysis : false
MACS broad peaks : false
Blacklist filtering : false
Extend Reads : 100 bp
Current home : /home/phil
Current user : phil
Current path : /home/phil/demo_data/ChIP/Human/test
Working dir : /home/phil/demo_data/ChIP/Human/test/work
Output dir : ./results
R libraries : /home/phil/R/nxtflow_libs/
Script dir : /home/phil/GitHub/NGI-ChIPseq
Save Reference : false
Save Trimmed : false
Save Intermeds : false
Trim R1 : 0
Trim R2 : 0
Trim 3' R1 : 0
Trim 3' R2 : 0
Config Profile : UPPMAX
UPPMAX Project : b2017001
E-mail Address : phil.ewels@scilifelab.se
====================================
Version control
$fastqc.module = ['FastQC/0.7.2']
$trim_galore.module = ['TrimGalore/0.4.1', 'FastQC/0.7.2']
$bw.module = ['bwa/0.7.8', 'samtools/1.5']
$samtools.module = ['samtools/1.5', 'BEDTools/2.26.0']
$picard.module = ['picard/2.10.3', 'samtools/1.5']
$phantompeakqualtools.module = ['phantompeakqualtools/1.1']
$deepTools.module = ['deepTools/2.5.1']
$ngsplot.module = ['samtools/1.5', 'R/3.2.3', 'ngsplot/2.61']
$macs.module = ['MACS/2.1.0']
$saturation.module = ['MACS/2.1.0', 'samtools/1.5']
$saturation_r.module = ['R/3.2.3']
• Pipeline is always released under a stable version tag

• Software versions and code reproducible

• For full reproducibility, specify the pipeline version (revision) when running the pipeline
nextflow run SciLifeLab/NGI-ChIPseq -r v1.3
Conclusion
• Use NGI-ChIPseq to prepare your data if you want:

• To not have to remember every parameter for every tool
• Extreme reproducibility
• Ability to run on virtually any environment
• Now running for all ChIPseq projects at NGI-Stockholm
https://github.com/
SciLifeLab/NGI-RNAseq
SciLifeLab/NGI-MethylSeq
SciLifeLab/NGI-smRNAseq
SciLifeLab/NGI-ChIPseq
MIT Licence
Acknowledgements

Phil Ewels
Chuan Wang
Jakub Westholm
Rickard Hammarén
Max Käller
Denis Moreno
NGI Stockholm Genomics Applications Development Group
support@ngisweden.se
http://opensource.scilifelab.se

pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicineIDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicinesherlingomez2
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 

Último (20)

biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicineIDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicine
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 

NBIS ChIP-seq course

  • 1. NGI stockholm NGI-ChIPseq Processing ChIP-seq data at the National Genomics Infrastructure Phil Ewels phil.ewels@scilifelab.se NBIS ChIP-seq tutorial 2017-11-29
  • 2. SciLifeLab NGI: Our mission is to offer a state-of-the-art infrastructure for massively parallel DNA sequencing and SNP genotyping, available to researchers all over Sweden
  • 3. SciLifeLab NGI: National resource; State-of-the-art infrastructure; Guidelines and support. We provide guidelines and support for sample collection, study design, protocol selection and bioinformatics analysis
  • 4. NGI Organisation: NGI Stockholm, NGI Uppsala
  • 5. NGI Organisation: Funding. Staff salaries; Premises and service contracts; Capital equipment; Host universities; SciLifeLab; VR; KAW; User fees; Reagent costs (NGI Stockholm, NGI Uppsala)
  • 6. Project timeline: Sample QC; Library preparation, Sequencing, Genotyping; Data processing and primary analysis; Scientific support and project consultation; Data delivery
  • 7. Methods offered at NGI: Exome sequencing; Nanopore sequencing; ATAC-seq; Metagenomics; ChIP-seq; Bisulphite sequencing; RAD-seq; RNA-seq; de novo; Whole Genome seq. Categories: Data analysis included for FREE; Just Sequencing; Accredited methods
  • 8. ChIP-seq: NGI Stockholm
      • You do the ChIP, we do the seq
      • Rubicon ThruPlex DNA (NGI Production)
      • Min 1 ng input
      • Min 10 μl
      • 0.2-10 ng/μl
      • Ins. size 200-800 bp
      • 963 kr / prep
  • 9. ChIP-seq: NGI Stockholm
      • You do the ChIP, we do the seq
      • Rubicon ThruPlex DNA (NGI Production)
      • Typically run SE 50bp
      • Illumina HiSeq High Output mode v4, SR 1x50bp
      • 1226 kr / sample (40M reads)
  • 10. ChIP-seq: NGI Stockholm
      • You do the ChIP, we do the seq
      • Rubicon ThruPlex DNA (NGI Production)
      • Typically run SE 50bp
      • Start by organising a planning meeting: https://ngisweden.scilifelab.se
  • 11. ChIP-seq Pipeline (NGI-ChIPseq)
      • Takes raw FastQ sequencing data as input
      • Provides range of results
          • Alignments (BAM)
          • Peaks (optionally filtered)
          • Quality Control
      • Pipeline in use since early 2017 (on request)
  • 12. ChIP-seq Pipeline (NGI-ChIPseq), tool and step:
      • FastQC: Sequence QC
      • TrimGalore!: Read trimming
      • BWA: Alignment
      • Samtools, Picard: Sort, index, mark duplicates
      • Phantompeakqualtools: Strand cross-correlation QC
      • deepTools: Fingerprint, sample correlation
      • NGSPlot: TSS / Gene profile plots
      • MACS2: Peak calling
      • Bedtools: Filtering blacklisted regions
      • MultiQC: Reporting
      • File formats shown: FastQ, BAM, HTML, BED
  • 13. Nextflow
      • Tool to manage computational pipelines
      • Handles interaction with compute infrastructure
      • Easy to learn how to run, minimal oversight required
  • 15. Nextflow (https://www.nextflow.io/)
          #!/usr/bin/env nextflow

          input = Channel.fromFilePairs( params.reads )

          process fastqc {
            input:
            file reads from input

            output:
            file "*_fastqc.{zip,html}" into results

            script:
            """
            fastqc -q $reads
            """
          }
  • 16. Nextflow
      • Default: Run locally, assume software is installed (same pipeline script as slide 15)
      • Submit jobs to SLURM queue, use environment modules:
          process {
            executor = 'slurm'
            clusterOptions = { "-A b2017123" }
            cpus = 1
            memory = 8.GB
            time = 2.h
            $fastqc {
              module = ['bioinfo-tools', 'FastQC']
            }
          }
  • 17. Nextflow
      • Same pipeline script and SLURM config as slides 15 and 16
      • Run locally, use docker container for all software dependencies:
          docker {
            enabled = true
          }
          process {
            container = 'biocontainers/fastqc'
            cpus = 1
            memory = 8.GB
            time = 2.h
          }
  • 20. Running NGI-ChIPseq
      Step 1: Install Nextflow
      • Uppmax - load the Nextflow module
          module load nextflow
      • Anywhere (including Uppmax) - install Nextflow
          curl -s https://get.nextflow.io | bash
      Step 2: Try running NGI-ChIPseq pipeline
          nextflow run SciLifeLab/NGI-ChIPseq --help
  • 21. Running NGI-ChIPseq
      Step 3: Choose your reference
      • Common organism - use iGenomes
          --genome GRCh37
      • MACS peak calling config file
          --macsconfig config.csv
      Step 4: Organise your data
      • One (if single-end) or two (if paired-end) FastQ per sample
      • Everything in one directory, simple filenames help!
  • 22. Running NGI-ChIPseq
      Step 5: Run the pipeline on your data
      • Remember to run detached from your terminal
          screen / tmux / nohup
      Step 6: Check your results
      • Read the Nextflow log and check the MultiQC report
      Step 7: Delete temporary files
      • Delete the ./work directory, which holds all intermediates
  • 23. Using UPPMAX
          nextflow run SciLifeLab/NGI-ChIPseq --project b2017123 --genome GRCh37 --macsconfig p.txt --reads "data/*_R{1,2}.fastq.gz"
      • Default config is for UPPMAX
      • Knows about central iGenomes references
      • Uses centrally installed software
  • 24. Using other clusters
          nextflow run SciLifeLab/NGI-ChIPseq -profile hebbe --bwaindex ./ref --macsconfig p.txt --reads "data/*_R{1,2}.fastq.gz"
      • Can run just about anywhere
      • Supports local, SGE, LSF, SLURM, PBS/Torque, HTCondor, DRMAA, DNAnexus, Ignite, Kubernetes
  • 25. Using Docker
          nextflow run SciLifeLab/NGI-ChIPseq -profile docker --fasta genome.fa --macsconfig p.txt --reads "data/*_R{1,2}.fastq.gz"
      • Can run anywhere with Docker
      • Downloads required software and runs in a container
      • Portable and reproducible.
  • 26. Using AWS
          nextflow run SciLifeLab/NGI-ChIPseq -profile aws --genome GRCh37 --macsconfig p.txt --reads "s3://my-bucket/*_{1,2}.fq.gz" --outdir "s3://my-bucket/results/"
      • Runs on the AWS cloud with Docker
      • Pay-as-you-go, flexible computing
      • Can launch from anywhere with minimal configuration
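The four invocations above differ only in a handful of flags. As a rough illustration, the Python sketch below assembles such a command line from options that appear on these slides. The helper `build_command` and its argument names are hypothetical and not part of the pipeline; the sketch only prints the command and does not run Nextflow.

```python
import shlex


def build_command(reads, macsconfig, profile=None, genome=None,
                  fasta=None, project=None):
    """Assemble a `nextflow run` argument list (hypothetical helper)."""
    args = ["nextflow", "run", "SciLifeLab/NGI-ChIPseq"]
    if profile:
        # Nextflow's own options take a single hyphen.
        args += ["-profile", profile]
    if project:
        args += ["--project", project]
    if genome:
        args += ["--genome", genome]
    if fasta:
        args += ["--fasta", fasta]
    # Pipeline parameters take two hyphens.
    args += ["--macsconfig", macsconfig, "--reads", reads]
    return args


uppmax = build_command("data/*_R{1,2}.fastq.gz", "p.txt",
                       genome="GRCh37", project="b2017123")
docker = build_command("data/*_R{1,2}.fastq.gz", "p.txt",
                       profile="docker", fasta="genome.fa")

# shlex.join quotes the glob, so a shell would pass it on unexpanded.
print(shlex.join(uppmax))
print(shlex.join(docker))
```

Keeping the arguments as a list until the last moment avoids quoting mistakes; the read pattern is only quoted when the list is rendered as a shell string.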
  • 27. Input data
          ERROR ~ Cannot find any reads matching: XXXX
          NB: Path needs to be enclosed in quotes!
          NB: Path requires at least one * wildcard!
          If this is single-end data, please specify --singleEnd on the command line.
      Example --reads arguments from the slide:
          --reads '*_R{1,2}.fastq.gz'
          --reads '*.fastq.gz' --singleEnd
          --reads *_R{1,2}.fastq.gz
          --reads '*.fastq.gz'
          --reads sample.fastq.gz
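To see why the pattern matters, the sketch below imitates in plain Python how a glob such as `*_R{1,2}.fastq.gz` can group files into per-sample pairs. It is only an approximation for illustration, not Nextflow's implementation; the function names and file names are made up, and it assumes at most one `{...}` group in the pattern.

```python
import re
from collections import defaultdict


def expand_braces(pattern):
    """Expand a single {a,b,...} group into plain glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + option + tail for option in match.group(1).split(",")]


def glob_to_regex(pattern):
    """Compile a glob into a regex that captures what each '*' matched."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("(.*)".join(parts) + r"\Z")


def group_reads(filenames, pattern):
    """Group file names by the text matched by the first '*' wildcard."""
    groups = defaultdict(list)
    for alternative in expand_braces(pattern):
        regex = glob_to_regex(alternative)
        for name in sorted(filenames):
            match = regex.match(name)
            if match:
                # Without a wildcard there is nothing to derive a sample name from.
                key = match.group(1) if regex.groups else name
                groups[key].append(name)
    return dict(groups)


files = ["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s2_R1.fastq.gz", "s2_R2.fastq.gz"]
print(group_reads(files, "*_R{1,2}.fastq.gz"))  # two samples, two files each
print(group_reads(files, "*.fastq.gz"))         # four single-file groups
print(group_reads(files, "sample.fastq.gz"))    # nothing matches
```

With the paired pattern each sample gets two files, while `*.fastq.gz` yields one file per group, which is the layout that single-end mode expects.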
  • 28. Read trimming
      • Pipeline runs TrimGalore! to remove adapter contamination and low quality bases automatically
      • Use --notrim to disable this
      • Some library preps also include additional adapters
          --clip_r1 [int]
          --clip_r2 [int]
          --three_prime_clip_r1 [int]
          --three_prime_clip_r2 [int]
  • 29. Blacklist filtering
      • Some parts of the reference genome collect incorrectly mapped reads
      • Good practice to remove these peaks
      • Pipeline has ENCODE regions for Human & Mouse
      • Can pass own BED file of custom regions
          --blacklist_filtering
          --blacklist regions.bed
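The idea behind blacklist filtering can be sketched in a few lines of Python. According to the pipeline overview the real filtering is done with Bedtools; the code below is only an illustration and assumes that any overlap with a blacklisted region removes the whole peak. The function names and coordinates are invented for the example.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open BED intervals [start, end) share any base."""
    return a_start < b_end and b_start < a_end


def filter_peaks(peaks, blacklist):
    """Drop every peak that overlaps a blacklisted region on the same chromosome."""
    regions_by_chrom = {}
    for chrom, start, end in blacklist:
        regions_by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end in peaks:
        regions = regions_by_chrom.get(chrom, [])
        if not any(overlaps(start, end, r_start, r_end) for r_start, r_end in regions):
            kept.append((chrom, start, end))
    return kept


# Made-up (chromosome, start, end) tuples in BED-style coordinates.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
blacklist = [("chr1", 150, 300)]
print(filter_peaks(peaks, blacklist))
```

Only the first peak overlaps the blacklisted region, so the other two are kept; a region on chr1 never affects peaks on chr2.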
  • 30. Broad Peaks
      • Some chromatin profiles don't have narrow, sharp peaks
      • For example, H3K9me3 & H3K27me3
      • MACS2 can call peaks in "broad peak" mode
      • Pipeline uses default qvalue cutoff of 0.1
          --broad
  • 31. Extending Read Length
      • When using single-end data, the sequenced read length is shorter than the sequenced fragment length
      • For deepTools, need to "extend" the read length
      • Set to 100bp by default. Use this parameter to customise this value.
      • Value to use: expected fragment length minus sequenced read length
          --extendReadsLen [int]
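The value for `--extendReadsLen` is a simple subtraction. The small Python sketch below works through it; the function name and the fragment lengths are illustrative assumptions, not values taken from the pipeline.

```python
def extend_reads_len(fragment_len, read_len):
    """Extension in bp: expected fragment length minus sequenced read length."""
    if read_len > fragment_len:
        raise ValueError("read length exceeds the expected fragment length")
    return fragment_len - read_len


# 50 bp single-end reads with an assumed 150 bp fragment give 100.
print(extend_reads_len(150, 50))  # 100
# A longer assumed fragment of 200 bp would call for a larger value.
print(extend_reads_len(200, 50))  # 150
```

For the second case the parameter would be passed as `--extendReadsLen 150`.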
  • 32. Saving intermediates
      • By default, the pipeline doesn't save some intermediate files to your final results directory
          • Reference genome indices that have been built
          • FastQ files from TrimGalore!
          • BAM files from BWA (we have BAMs from Picard)
          --saveReference
          --saveTrimmed
          --saveAlignedIntermediates
  • 33. Resuming pipelines
      • If something goes wrong, you can resume a stopped pipeline
      • Will use cached versions of completed processes
      • NB: Only one hyphen!
          -resume
      • Can resume specific past runs
      • Use nextflow log to find job names
          -resume job_name
  • 34. Customising output
      • -name: Give a name to your run. Used in logs and reports
      • --outdir: Specify the directory for saved results
      • --saturation: Run saturation analysis, subsampling reads from 10% - 100%
      • --email: Get e-mailed a summary report when the pipeline finishes
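As a rough picture of what subsampling reads from 10% to 100% means, the Python sketch below draws random subsets of a made-up read list. The 10% step size, the fixed seed and the function names are assumptions for illustration; the slides only give the range, and whatever analysis follows on each subset is left out.

```python
import random


def subsample_fractions(step_percent=10):
    """Fractions from step_percent up to 100% (the step size is an assumption)."""
    return [percent / 100 for percent in range(step_percent, 101, step_percent)]


def subsample(reads, fraction, seed=0):
    """Draw a reproducible random subset holding the given fraction of reads."""
    rng = random.Random(seed)
    count = round(len(reads) * fraction)
    return rng.sample(reads, count)


# Placeholder read names standing in for real alignments.
reads = [f"read_{i}" for i in range(1000)]
for fraction in subsample_fractions():
    subset = subsample(reads, fraction)
    print(f"{fraction:.0%}: {len(subset)} reads")
```

If the number of peaks keeps rising as the subsets grow, the library has not yet been sequenced to saturation.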
  • 35. Nextflow config files
      • Can save a config file with defaults
      • Anything with two hyphens is a pipeline parameter and goes in the params scope
          params {
            email = 'phil.ewels@scilifelab.se'
            project = "b2017123"
          }
          process.$multiqc.module = []
      • Config file locations:
          ./nextflow.config
          ~/.nextflow/config
          -c /path/to/my.config
  • 36. NGI-ChIPseq config
          N E X T F L O W ~ version 0.26.1
          Launching `SciLifeLab/NGI-ChIPseq` [deadly_bose] - revision: 28e24c2a2a
          =========================================
          NGI-ChIPseq: ChIP-Seq Best Practice v1.4
          =========================================
          Run Name            : deadly_bose
          Reads               : data/*fastq.gz
          Data Type           : Single-End
          Genome              : GRCh37
          BWA Index           : /sw/data/uppnex/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/
          MACS Config         : data/macsconfig.txt
          Saturation analysis : false
          MACS broad peaks    : false
          Blacklist filtering : false
          Extend Reads        : 100 bp
          Current home        : /home/phil
          Current user        : phil
          Current path        : /home/phil/demo_data/ChIP/Human/test
          Working dir         : /home/phil/demo_data/ChIP/Human/test/work
          Output dir          : ./results
          R libraries         : /home/phil/R/nxtflow_libs/
          Script dir          : /home/phil/GitHub/NGI-ChIPseq
          Save Reference      : false
          Save Trimmed        : false
          Save Intermeds      : false
          Trim R1             : 0
          Trim R2             : 0
          Trim 3' R1          : 0
          Trim 3' R2          : 0
          Config Profile      : UPPMAX
          UPPMAX Project      : b2017001
          E-mail Address      : phil.ewels@scilifelab.se
          ====================================
  • 37. Version control
          $fastqc.module = ['FastQC/0.7.2']
          $trim_galore.module = ['TrimGalore/0.4.1', 'FastQC/0.7.2']
          $bw.module = ['bwa/0.7.8', 'samtools/1.5']
          $samtools.module = ['samtools/1.5', 'BEDTools/2.26.0']
          $picard.module = ['picard/2.10.3', 'samtools/1.5']
          $phantompeakqualtools.module = ['phantompeakqualtools/1.1']
          $deepTools.module = ['deepTools/2.5.1']
          $ngsplot.module = ['samtools/1.5', 'R/3.2.3', 'ngsplot/2.61']
          $macs.module = ['MACS/2.1.0']
          $saturation.module = ['MACS/2.1.0', 'samtools/1.5']
          $saturation_r.module = ['R/3.2.3']
  • 38. Version control
      • Pipeline is always released under a stable version tag
      • Software versions and code reproducible
      • For full reproducibility, specify version revision when running the pipeline
          nextflow run SciLifeLab/NGI-ChIPseq -r v1.3
  • 39. Conclusion
      • Use NGI-ChIPseq to prepare your data if you want:
          • To not have to remember every parameter for every tool
          • Extreme reproducibility
          • Ability to run on virtually any environment
      • Now running for all ChIPseq projects at NGI-Stockholm
  • 41. Conclusion
      • On https://github.com/: SciLifeLab/NGI-RNAseq, SciLifeLab/NGI-MethylSeq, SciLifeLab/NGI-smRNAseq, SciLifeLab/NGI-ChIPseq
      • Acknowledgements: Phil Ewels, Chuan Wang, Jakub Westholm, Rickard Hammarén, Max Käller, Denis Moreno, NGI Stockholm Genomics Applications Development Group
      • support@ngisweden.se
      • http://opensource.scilifelab.se