In early 2014 we upgraded JASPAR, the largest open-access, manually curated database of transcription factor (TF) binding profiles (PMID:24194598), and we are now preparing the 2016 release. A new BioPython module dedicated to accessing and using the TF binding profiles stored in JASPAR is available; we will introduce it in the first part of the webinar.
In the second part of the webinar, we will introduce MANTA (Mongodb for the ANalysis of Tfbs Alteration), the database we used to analyse cis-regulatory somatic mutations in B-cell lymphomas (PMID:25903198). The database stores the positions of TFBSs predicted in ChIP-seq data with JASPAR TF binding profiles. We will describe the database and show how to access and use it.
Webinar about JASPAR BioPython module and MANTA.
1. www.cmmt.ubc.ca
JASPAR BioPython & MANTA
Anthony Mathelier, David Arenillas & Wyeth Wasserman
anthony.mathelier@gmail.com & dave@cmmt.ubc.ca
Wasserman Lab
2. 2
Outline
● JASPAR BioPython module
– What is JASPAR?
– How to construct matrices from JASPAR files using
the JASPAR BioPython module.
● MANTA
– What is stored in MANTA?
– How to interrogate the MANTA DB using Python and
our web application.
3. 3
http://jaspar.genereg.net
Mathelier et al. JASPAR 2014: an extensively expanded and updated open-access database of
transcription factor binding profiles. Nucleic Acids Res. 2014 PMID 24194598
5. 5
Scoring putative TFBS sequences
PFM (Position Frequency Matrix)
A [ 1  0 19 20 18  1 20  7 ]
C [ 1  0  1  0  1 18  0  2 ]
G [17  0  0  0  1  0  0  3 ]
T [ 1 20  0  0  0  1  0  8 ]
PWM – Position Weight Matrix (aka PSSM – Position Specific Scoring Matrix)
A [1.5 2.5 1.7 1.8 1.6 1.5 1.8 0.4 ]
C [1.5 2.5 1.5 2.5 1.5 1.6 2.5 1.0 ]
G [1.6 2.5 2.5 2.5 1.5 2.5 2.5 0.6 ]
T [1.5 1.8 2.5 2.5 2.5 1.5 2.5 0.6 ]
Sequence: A C G A G T T A A A C A A G C T A
Sum the PWM score at each position of the candidate site: Score = 9.2
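The scoring idea on this slide can be sketched in plain Python. This is a minimal illustration, not the method used to produce the slide: the pseudocount (0.5), the uniform background (0.25) and the log2-odds formula are assumptions, so the numbers it produces are not expected to match the illustrative PWM values or the score printed above. Only the count matrix and the sequence come from the slide.

```python
import math

# Count matrix copied from the slide (rows A, C, G, T; 8 columns).
PFM = {
    "A": [1, 0, 19, 20, 18, 1, 20, 7],
    "C": [1, 0, 1, 0, 1, 18, 0, 2],
    "G": [17, 0, 0, 0, 1, 0, 0, 3],
    "T": [1, 20, 0, 0, 0, 1, 0, 8],
}
SEQUENCE = "ACGAGTTAAACAAGCTA"  # sequence shown on the slide


def pfm_to_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert counts to log2-odds weights (pseudocount and background are assumptions)."""
    width = len(pfm["A"])
    pwm = {base: [] for base in pfm}
    for i in range(width):
        total = sum(pfm[base][i] for base in pfm) + pseudocount * len(pfm)
        for base in pfm:
            freq = (pfm[base][i] + pseudocount) / total
            pwm[base].append(math.log2(freq / background))
    return pwm


def score_site(pwm, site):
    """Sum the weight of the observed base at each position of the site."""
    return sum(pwm[base][i] for i, base in enumerate(site))


def scan(pwm, sequence):
    """Score every window of motif width along the sequence (forward strand only)."""
    width = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        hits.append((start, site, score_site(pwm, site)))
    return hits


pwm = pfm_to_pwm(PFM)
hits = scan(pwm, SEQUENCE)
best_start, best_site, best_score = max(hits, key=lambda hit: hit[2])
print(best_start, best_site, round(best_score, 2))
```

The sketch scans the forward strand only; a real scan would usually also score the reverse complement of each window.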
7. 7
JASPAR Biopython modules
➢ Bio.motifs.jaspar
➢ Read / write motifs encoded in the JASPAR flat file formats:
sites, PFM and jaspar
➢ Bio.motifs.jaspar.db
➢ Search / fetch motifs from a JASPAR formatted database.
http://biopython.org*
*Cock et al. Biopython: freely available Python tools for computational molecular biology and
bioinformatics. Bioinformatics. 2009 Jun 1;25(11):1422-3. PMID: 19304878
Extend Biopython's Bio.motifs module to support construction
of TFBS matrices from JASPAR supported formats.
8. 8
Constructing a matrix from a JASPAR sites
formatted file
The JASPAR sites format consists of a list of known binding sites for a motif.
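A minimal sketch of reading a sites file with Bio.motifs follows. The records are hypothetical (made-up ID, name and sequences) and are read from an in-memory string rather than a file. Biopython's parser keeps the upper-case letters of each sequence as the binding site and ignores the lower-case flanks.

```python
from io import StringIO

from Bio import motifs

# Hypothetical sites records: the motif ID, name and sequences are made up.
SITES_TEXT = """\
>MX0001 demo 1
CACGTGatgtcctc
>MX0001 demo 2
CACGTGggaggtac
>MX0001 demo 3
CACGTTccgcgcgc
"""

# read() returns a single motif built from the listed sites.
motif = motifs.read(StringIO(SITES_TEXT), "sites")
print(len(motif))        # motif width
print(motif.counts)      # frequency matrix derived from the sites
print(motif.consensus)   # most frequent base per column
```

With a real file, the same call would take an open file handle in place of the StringIO object.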
9. 9
Constructing a matrix from a JASPAR pfm
formatted file
The JASPAR pfm format simply describes a frequency matrix for a single motif.
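As a minimal sketch, the count matrix from the scoring slide can be written in pfm format (four rows in A, C, G, T order) and read with Bio.motifs. The matrix is read from an in-memory string here; a pfm file carries no ID or name, so none is set on the motif.

```python
from io import StringIO

from Bio import motifs

# Count matrix from the scoring slide, one row per nucleotide in A, C, G, T order.
PFM_TEXT = """\
 1  0 19 20 18  1 20  7
 1  0  1  0  1 18  0  2
17  0  0  0  1  0  0  3
 1 20  0  0  0  1  0  8
"""

# read() returns the single motif described by the four rows.
motif = motifs.read(StringIO(PFM_TEXT), "pfm")
print(len(motif))          # motif width
print(motif.counts["G"])   # counts for G at each position
print(motif.consensus)     # most frequent base per column
```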
10. 10
Constructing matrices from a JASPAR jaspar
formatted file
The JASPAR jaspar format allows for multiple motifs. Each record consists of a header line
followed by four lines defining the frequency matrix.
Note the use of the parse method rather than the read method to read multiple motifs.
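A minimal sketch of parsing a multi-motif jaspar file is given here. The IDs and names (MX0001.1 demoA, MX0002.1 demoB) are hypothetical placeholders; the first matrix is the one from the scoring slide and the second is made up for illustration.

```python
from io import StringIO

from Bio import motifs

# Two records in jaspar format. IDs and names are hypothetical placeholders.
JASPAR_TEXT = """\
>MX0001.1 demoA
A [ 1  0 19 20 18  1 20  7 ]
C [ 1  0  1  0  1 18  0  2 ]
G [17  0  0  0  1  0  0  3 ]
T [ 1 20  0  0  0  1  0  8 ]
>MX0002.1 demoB
A [ 4 19  0  0 ]
C [16  0 20  0 ]
G [ 0  1  0 20 ]
T [ 0  0  0  0 ]
"""

# parse() returns a list-like record holding one motif per header line.
records = motifs.parse(StringIO(JASPAR_TEXT), "jaspar")
for m in records:
    print(m.matrix_id, m.name, len(m), m.consensus)
```

Unlike the sites and pfm formats, the header line gives each motif a matrix ID and a name.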
11. 11
Constructing matrices from a JASPAR jaspar
formatted file cont'd
The frequency portions of the file can be specified in a simpler format identical to the pfm
format.
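A minimal sketch of the simpler layout: the header line is kept, but the four rows are plain numbers in A, C, G, T order. The ID and name are hypothetical placeholders. The example also writes the record back with motifs.write, which emits the bracketed layout.

```python
from io import StringIO

from Bio import motifs

# A jaspar record with plain pfm-style rows (A, C, G, T order).
# The ID and name are hypothetical placeholders.
SHORT_TEXT = """\
>MX0002.1 demoB
 4 19  0  0
16  0 20  0
 0  1  0 20
 0  0  0  0
"""

records = motifs.parse(StringIO(SHORT_TEXT), "jaspar")

# Writing the records back produces the bracketed jaspar layout as a string.
text = motifs.write(records, "jaspar")
print(text)

# The written text can be parsed again to recover the same counts.
roundtrip = motifs.parse(StringIO(text), "jaspar")
```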
12. 12
The JASPAR DB module
Modelled after the Perl TFBS modules*.
Specifically, the Bio.motifs.jaspar.db.JASPAR5 BioPython class is modelled
after the TFBS::DB::JASPAR5 Perl class.
Connect to a JASPAR database:
Fetch a specific motif by its JASPAR ID:
* Lenhard et al. TFBS: Computational framework for transcription factor binding site analysis.
Bioinformatics. 2002 PMID 12176838
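The two steps on this slide (connect, then fetch by ID) can be sketched as below. The connection settings are hypothetical placeholders for your own JASPAR MySQL instance, and MA0004 is used only as an example ID. The import sits inside the function because Bio.motifs.jaspar.db requires the MySQLdb driver when it is imported.

```python
# Hypothetical connection settings: replace with those of your own JASPAR MySQL instance.
JASPAR_DB = {
    "host": "yourhostname",
    "name": "yourdatabase",
    "user": "yourusername",
    "password": "yourpassword",
}


def fetch_motif(matrix_id, db_settings=JASPAR_DB):
    """Connect to a JASPAR database and return the motif with the given ID."""
    # Imported here because Bio.motifs.jaspar.db needs the MySQLdb driver at import time.
    from Bio.motifs.jaspar.db import JASPAR5

    jdb = JASPAR5(**db_settings)
    return jdb.fetch_motif_by_id(matrix_id)


# Example call (needs a reachable database and real settings):
# motif = fetch_motif("MA0004")
# print(motif)
```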
13. 13
JASPAR DB module cont'd
Fetch multiple motifs according to various attributes.
Example: fetch the motifs of all the vertebrate and insect transcription factors from the CORE
JASPAR collection which are part of the Forkhead family and which have an information
content of at least 12 bits:
Note that selection criteria (such as 'tax_group' and 'tf_family') which allow multiple values may
be specified either as a single value or as a list of values.
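The query described on this slide could look like the sketch below. The helper name fetch_forkhead_motifs is hypothetical, and it assumes that jdb is an already connected Bio.motifs.jaspar.db.JASPAR5 object; the exact spelling of the tax_group and tf_family values depends on the contents of the database.

```python
def fetch_forkhead_motifs(jdb):
    """Return CORE vertebrate and insect Forkhead motifs with IC >= 12 bits.

    `jdb` is expected to be a connected Bio.motifs.jaspar.db.JASPAR5 object.
    """
    return jdb.fetch_motifs(
        collection="CORE",
        tax_group=["vertebrates", "insects"],  # criterion given as a list of values
        tf_family="Forkhead",                  # criterion given as a single value
        min_ic=12,                             # minimum information content in bits
    )


# Example call (needs a connected JASPAR5 object named jdb):
# for motif in fetch_forkhead_motifs(jdb):
#     print(motif.matrix_id, motif.name)
```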
14. 14
For more information...
For an overview and examples of using these modules, please
see the JASPAR sub-section under the “Reading motifs”
section of the BioPython Tutorial and Cookbook:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
For more technical information see the Bio.motifs.jaspar
section of the BioPython API docs:
http://biopython.org/DIST/docs/api
15. 15
MANTA
MongoDB for Analysis of TFBS Alteration
Mathelier et al. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.
Genome Biology. 2015. PMID 25903198
26. 26
Assessing the impact of variations on TF binding
[Figure: density plot; x-axis: alt/ref (0.80 to 1.10), y-axis: Density (0 to 7); labels: DNA, TFBS]
27. 27
Assessing the impact of variations on TF binding
[Figure: density plot; x-axis: alt/ref (0.80 to 1.10), y-axis: Density (0 to 7); labels: DNA, SNV, Alternative]
28. 28
Example of Application of MANTA
Mathelier et al. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.
Genome Biology. 2015. PMID
29. 29
The MANTA Database
Implemented with MongoDB (http://www.mongodb.org)
Consists of 3 collections:
Experiments
- experiment name, type, TF name, JASPAR matrix ID, etc.
Peaks
- peak position (chromosome, start, end), score, position of maximum
peak height, etc.
TFBSs / SNVs
- position (chromosome, start, end), strand, score for the unmutated
TFBS plus similar information and impact score for each position / alt.
allele mutation.
30. 30
MANTA DB with Python
Example: connect to MANTA DB and fetch all TFBS affected by an SNV at position 6425005
on chromosome 19.
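A rough sketch of such a lookup with the pymongo driver is given below. Every name in it is a hypothetical placeholder, not the actual MANTA schema: the host, the database name, the collection name, the field names (chrom, start, end) and the chromosome label "19" are all assumptions, as is the use of closed intervals. Check the MANTA source code for the real names.

```python
# Every name below (host, database, collection, field names) is a hypothetical
# placeholder; check the MANTA source code for the real schema.
MANTA_HOST = "localhost"
MANTA_DB = "manta"
TFBS_COLLECTION = "tfbs_snvs"


def overlap_query(chrom, position):
    """Build a MongoDB filter for TFBSs whose interval covers the position."""
    return {
        "chrom": chrom,
        "start": {"$lte": position},
        "end": {"$gte": position},
    }


def fetch_tfbs_at(chrom, position):
    """Return all TFBS documents overlapping the given position."""
    from pymongo import MongoClient  # third-party MongoDB driver

    client = MongoClient(MANTA_HOST, 27017)
    try:
        collection = client[MANTA_DB][TFBS_COLLECTION]
        return list(collection.find(overlap_query(chrom, position)))
    finally:
        client.close()


# Filter for the position used in the slide's example.
query = overlap_query("19", 6425005)
print(query)

# Example call (needs a reachable MongoDB instance holding MANTA data):
# for tfbs in fetch_tfbs_at("19", 6425005):
#     print(tfbs)
```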
31. 31
MANTA Web Interface
URL: http://manta.cmmt.ubc.ca/manta
Source code: https://github.com/wassermanlab/MANTA