2. ChIP-Seq experiment
By Jkwchui - Cell diagram adapted from LadyOfHats' Animal Cell diagram. Information based on Illumina data sheet, as well as ChIP and immunoprecipitation articles
& references., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17890854
7. Raw data, reads in FASTQ format
@HWI-ST227:389:C4WA2ACXX:7:1204:2272:59979
GGAGGAAGGTCCTCGCTCCTCTTTCATATAAGGGAAATGGCTGAAT
+
FFFFHHHHHHJIJJJJJJJJIJJJIGIGIGGIJJIJIJJJJJJIII
@HWI-ST227:389:C4WA2ACXX:7:1205:15214:42893
GAGGATCCCAGGGAGGAAGGTCCTCGCTCCTCTTTCATCTAAGGGA
+
12BAFB?A:3<AE1@<FF;1*@EG*)?0?DBD>9BF9B*?######
@HWI-ST227:389:C4WA2ACXX:8:2208:2467:44624
AAAGAGGAGAGAGGACCATCCTCCCTGGGATCCTCAGAAGTCTACT
+
BDDA:DB?2AA@FC>F?EEGC<FED>GFD;?GBB?<?F99*/9?9?
Each record spans four lines: header (starting with @), sequence, a + separator line, and quality string.
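The four-line record layout can be checked mechanically. The following is a minimal sketch, assuming unwrapped FASTQ (exactly four lines per read); the function name `fastq_summary` is hypothetical and not part of any tool mentioned here.

```shell
# Hypothetical helper: read FASTQ on stdin, print "read_name<TAB>read_length"
# for each record, and stop with an error if a record is malformed.
# Line position (NR % 4) is used instead of pattern matching because a
# quality string may itself start with "@" or "+".
fastq_summary() {
  awk '
    NR % 4 == 1 {                      # header line
      if ($0 !~ /^@/) { print "bad header at line " NR > "/dev/stderr"; exit 1 }
      name = substr($1, 2)             # drop the leading "@"
    }
    NR % 4 == 2 { seq = $0 }           # sequence line
    NR % 4 == 3 {                      # separator line
      if ($0 !~ /^\+/) { print "bad separator at line " NR > "/dev/stderr"; exit 1 }
    }
    NR % 4 == 0 {                      # quality line: one character per base
      if (length($0) != length(seq)) {
        print "length mismatch at line " NR > "/dev/stderr"; exit 1
      }
      printf "%s\t%d\n", name, length(seq)
    }
  '
}
```

A possible call on the file used in these slides would be `gzip -dc B7_H3K4me1.fastq.gz | fastq_summary | head`.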
8. Raw data, reads in FASTQ format
zcat B7_H3K4me1.fastq.gz | awk '{num++} END {print num/4}'
41103741
Counting fastq reads (the slow way)
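One common alternative is to let `wc -l` count the lines and divide by four. This is only a sketch: it assumes four lines per read (no wrapped sequences), the function name `count_reads` is hypothetical, and whether it is actually faster will depend on the system.

```shell
# Hypothetical helper: count reads in a gzip-compressed FASTQ file.
# Assumes exactly four lines per read.
count_reads() {
  # gzip -dc is used in place of zcat, which behaves differently on some systems
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}
```

Usage would look like `count_reads B7_H3K4me1.fastq.gz`.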
9. Raw data, reads in FASTQ format
Phred quality score:
• $Q = -10 \log_{10} p$
• p = probability that the corresponding base call is incorrect
• Example: p = 0.001 corresponds to a quality of Q = 30
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJ
0........................26...31.........41
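The formula and the character scale above can be tried out with two small awk helpers. This is a sketch that assumes the Phred+33 encoding shown in the scale (`!` = 0 up to `J` = 41); the function names `phred_from_p` and `phred_decode` are hypothetical.

```shell
# Error probability -> Phred score, rounded to the nearest integer:
# Q = -10 * log10(p)
phred_from_p() {
  awk -v p="$1" 'BEGIN { printf "%.0f\n", -10 * log(p) / log(10) }'
}

# Quality string -> space-separated Phred scores (Phred+33 assumed).
phred_decode() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      # build a character -> ASCII code lookup table for printable characters
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    {
      out = ""
      for (j = 1; j <= length($0); j++) {
        q = ord[substr($0, j, 1)] - 33      # subtract the Phred+33 offset
        sep = (j > 1) ? " " : ""
        out = out sep q
      }
      print out
    }
  '
}
```

For example, `phred_from_p 0.001` prints 30, matching the example above, and `phred_decode 'I!J'` prints `40 0 41`.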
10. Raw data, reads in FASTQ format
Analyzing the quality (FastQC)
[Figure: example FastQC reports for a GOOD and a BAD sequencing run]
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
11. Alignment
• Align 20-30 million reads per sample to the reference genome.
• The reference genome can be very long (the human genome is about 3 gigabases).
• We need ultra-fast mappers:
  • Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)
  • BWA (http://bio-bwa.sourceforge.net/)
  • GEM (https://github.com/smarco/gem3-mapper)
  • …
18. Alignment
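• Align 20-30 million reads per sample to the reference genome.
• The reference genome has to be indexed.
• Repetitive sequences are a problem: a read from a repeat can match several locations equally well.
• PCR artifacts are a problem: identical reads produced by amplification are handled by marking duplicates.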
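The idea behind marking duplicates can be illustrated on a toy scale. The sketch below assumes the alignments have been exported as tab-separated BED6 lines (chrom, start, end, name, score, strand) and treats reads with the same chromosome, strand and 5' end as duplicates. The function name `mark_duplicates` is hypothetical. Real pipelines typically do this on BAM files with dedicated tools such as Picard MarkDuplicates or samtools markdup, which apply more careful rules.

```shell
# Hypothetical helper: append "uniq" or "dup" to each BED6 line on stdin.
# The first read seen at a given (chrom, 5' end, strand) is kept as "uniq";
# later reads at the same position are flagged as "dup".
mark_duplicates() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         pos = ($6 == "-") ? $3 : $2      # 5-prime end depends on the strand
         key = $1 FS pos FS $6
         print $0, (seen[key]++ ? "dup" : "uniq")
       }'
}
```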
26. Peak calling
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.
The fragment size can be inferred from the data and used to extend the reads, which gives more reliable peaks (i.e. binding sites). The peak summit lies in the middle, between the forward-strand and reverse-strand read pileups.
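Read extension itself is simple to sketch. The example below assumes the reads are available as tab-separated BED6 lines and that the fragment size has already been estimated (the value 200 in the usage line is a hypothetical placeholder). It is an illustration of the extension step only, not the MACS implementation, and the function name `extend_reads` is hypothetical.

```shell
# Hypothetical helper: extend each BED6 read on stdin to the fragment length
# given as the first argument, in the direction of its strand.
# Note: the end coordinate is not clamped to the chromosome length.
extend_reads() {
  awk -v frag="$1" 'BEGIN { FS = OFS = "\t" }
    $6 == "+" { $3 = $2 + frag }                       # extend downstream
    $6 == "-" { $2 = $3 - frag; if ($2 < 0) $2 = 0 }   # extend upstream, clamp at 0
    { print }'
}
```

Usage would look like `extend_reads 200 < reads.bed > fragments.bed`, where both file names are placeholders.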
28. bigBed and bigWig format
https://genome.ucsc.edu/goldenpath/help/bigWig.html
https://genome.ucsc.edu/goldenpath/help/bigBed.html
bigBed and bigWig are indexed binary formats generated from BED and wiggle files, respectively.
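The UCSC converters expect their input sorted by chromosome and start position. The sketch below shows that sorting step and, only if the UCSC `bedToBigBed` utility is installed, a conversion call. The file names `peaks.bed`, `chrom.sizes` and `peaks.bb` are placeholders, and the function name `sort_bed` is hypothetical.

```shell
# Hypothetical helper: sort a BED file by chromosome, then by start position.
sort_bed() {
  LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Conversion needs the UCSC utilities and a chromosome sizes file;
# it is skipped here when the tool or the placeholder files are missing.
if command -v bedToBigBed >/dev/null 2>&1 && [ -f peaks.bed ] && [ -f chrom.sizes ]; then
  sort_bed peaks.bed > peaks.sorted.bed
  bedToBigBed peaks.sorted.bed chrom.sizes peaks.bb
fi
```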
29. Annotating peaks
https://bedtools.readthedocs.io/en/latest/
Quinlan AR. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr Protoc Bioinformatics. 2014 Sep 8;47:11.12.1-34
Intersecting GTF and BED files with BEDTools
# Column 9 of the combined output is the GTF feature type (6 BED columns + GTF column 3)
intersectBed -a Peaks/B7_H3K4me1_vs_B7_input-macs-narrow--q_0_peaks.bed \
  -b gencode.vM17.annotation.gtf \
  -wa -wb -nonamecheck |
  awk '$9 == "gene"'
chr1 3444977 3445551 peak_15 31 .
chr1 HAVANA gene 3205901 3671498 . - .
gene_id "ENSMUSG00000051951.5"; gene_type "protein_coding"; gene_name "Xkr4"; level 2; havana_gene "OTTMUSG00000026353.2";
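Output lines like the one above can be reduced to a compact peak-to-gene table. The following sketch assumes tab-separated intersectBed output in which column 4 is the peak name and the GTF attributes contain a `gene_name "..."` entry, as in the example line. The function name `peak_genes` is hypothetical.

```shell
# Hypothetical helper: print "peak_name<TAB>gene_name" for each line on stdin
# that carries a gene_name attribute.
peak_genes() {
  awk -F'\t' '{
    if (match($0, /gene_name "[^"]+"/))
      # skip the 11 characters of: gene_name "  and drop the closing quote
      print $4 "\t" substr($0, RSTART + 11, RLENGTH - 12)
  }'
}
```

It would be appended to the pipeline above as a final `| peak_genes` step.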
31. Annotating peaks
# Keep only gene records from the GTF, then report the closest gene and its distance (-d) for each peak
awk '$3 == "gene"' gencode.vM17.annotation.gtf |
  closestBed -a Peaks/B7_H3K4me1_vs_B7_input-macs-narrow--q_0_peaks.bed \
    -d -b -
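With `-d`, the distance to the closest gene is appended as the last column, so peaks can be grouped by how far they are from a gene. The sketch below assumes tab-separated closestBed output with the peak name in column 4; the 1000 bp default cutoff and the function name `classify_peaks` are hypothetical choices for illustration.

```shell
# Hypothetical helper: classify each peak on stdin by its distance to the
# closest gene (last column). The first argument is the cutoff in bp.
classify_peaks() {
  awk -v max="${1:-1000}" 'BEGIN { FS = OFS = "\t" }
    {
      d = $NF
      if (d < 0)         cls = "no_gene"       # assumed to mean no gene was found
      else if (d == 0)   cls = "overlapping"   # peak overlaps the gene
      else if (d <= max) cls = "proximal"
      else               cls = "distal"
      print $4, d, cls
    }'
}
```

It would be appended to the command above as `| classify_peaks 1000`.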