Highlights of the Biopython project for computational biology, 2011-2012: Artemis-like genome track comparison with GenomeDiagram, new formats for SeqIO, phylogenetics with Bio.Phylo, Bio.PDB improvements, and an update on Google Summer of Code (GSoC) projects.
Biopython Project Update (BOSC 2012)
1. Project Update
Bioinformatics Open Source Conference (BOSC)
July 14, 2012
Long Beach, California, USA
Eric Talevich, Peter Cock,
Brad Chapman, João Rodrigues,
and Biopython contributors
2. Hello, BOSC
Biopython is a freely available Python library for biological
computation, and a long-running, distributed collaboration
to produce and maintain it [1].
● Supported by the Open Bioinformatics Foundation
(OBF)
● "This is Python's Bio* library. There are several Bio*
libraries like it, but this one is ours."
● http://biopython.org/
_____
[1] Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A.,
Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009)
Biopython: freely available Python tools for computational molecular biology
and bioinformatics. Bioinformatics 25(11) 1422-3.
doi:10.1093/bioinformatics/btp163
3. Bio.Graphics (Biopython 1.59, February 2012)
New features in...
BasicChromosome:
● Draw simple sub-features on chromosome segments
● Show the position of genes, SNPs or other loci
GenomeDiagram [2]:
● Cross-links between tracks
● Track-specific start/end positions for showing regions
_____
[2] Pritchard, L., White, J.A., Birch, P.R., Toth, I.K. (2006) GenomeDiagram: a
python package for the visualization of large-scale genomic data.
Bioinformatics 22(5) 616-7.
doi:10.1093/bioinformatics/btk021
7. SeqIO and AlignIO
(Biopython 1.58, August 2011)
● SeqXML format [3]
● Read support for ABI chromatogram files (Wibowo A.)
● "phylip-relaxed" format (Connor McCoy, Brandon I.)
○ Relaxes the 10-character limit on taxon names
○ Space-delimited instead
○ Used in RAxML, PhyML, PAML, etc.
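The difference between the two PHYLIP dialects can be sketched in plain Python. This only illustrates the file layout (sequential, one line per sequence) and is not Biopython's implementation; the helper names write_strict, write_relaxed and read_relaxed are hypothetical. In Biopython itself the dialect is selected by passing "phylip-relaxed" as the format name to Bio.AlignIO.

```python
def phylip_header(records):
    """First line: number of sequences and alignment length."""
    return f" {len(records)} {len(records[0][1])}"


def write_strict(records):
    """Strict PHYLIP: every name is cut or padded to exactly 10 characters."""
    lines = [phylip_header(records)]
    for name, seq in records:
        lines.append(name[:10].ljust(10) + seq)
    return "\n".join(lines) + "\n"


def write_relaxed(records):
    """Relaxed PHYLIP: full name, then whitespace, then the sequence."""
    width = max(len(name) for name, _ in records) + 2
    lines = [phylip_header(records)]
    for name, seq in records:
        if any(ch.isspace() for ch in name):
            raise ValueError(f"name contains whitespace: {name!r}")
        lines.append(name.ljust(width) + seq)
    return "\n".join(lines) + "\n"


def read_relaxed(text):
    """Parse the relaxed layout back into (name, sequence) pairs."""
    lines = text.strip().splitlines()
    count, _length = map(int, lines[0].split())
    # The name ends at the first run of whitespace on each line.
    return [tuple(line.split(None, 1)) for line in lines[1 : count + 1]]


# Two taxon names that only differ after the 10th character.
records = [
    ("Homo_sapiens_isoform_1", "ACGTACGT"),
    ("Homo_sapiens_isoform_2", "ACGTTCGT"),
]
strict = write_strict(records)
relaxed = write_relaxed(records)
strict_names = [line[:10] for line in strict.splitlines()[1:]]
```

In the strict output both names collapse to the same 10-character prefix, so the taxa can no longer be told apart. The relaxed output round-trips through read_relaxed unchanged.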
_____
[3] Schmitt et al. (2011) SeqXML and OrthoXML: standards for sequence and
orthology information. Briefings in Bioinformatics 12(5): 485-488.
doi:10.1093/bib/bbr025
8. Bio.Phylo & pypaml
● PAML interop: wrappers, I/O, glue
○ Merged Brandon Invergo’s pypaml as
Bio.Phylo.PAML (Biopython 1.58, August 2011)
● Phylo.draw improvements
● RAxML wrapper (Biopython 1.60, June 2012)
● Paper in review [4]
_____
[4] Talevich, E., Invergo, B.M., Cock, P.J.A., Chapman, B.A. (2012) Bio.Phylo:
a unified toolkit for processing, analysis and visualization of phylogenetic data
in Biopython. BMC Bioinformatics 13:209. doi:10.1186/1471-2105-13-209
10. Bio.bgzf (Blocked GNU Zip Format)
● BGZF is a GZIP variant that compresses data in
independent blocks of bounded size (at most 64 KB)
● Used in Next Generation Sequencing for
efficient random access to compressed files
○ SAM + BGZF = BAM
Bio.SeqIO can now index BGZF compressed
sequence files. (Biopython 1.60, June 2012)
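Why blocked compression allows random access can be sketched with the standard library alone. This is a minimal illustration of the BGZF block layout (a gzip member whose extra field "BC" stores the block size) and of 64-bit virtual offsets; it is not the Bio.bgzf code. The helper functions are defined locally, and putting one record per block is an assumption made to keep the example short.

```python
import gzip
import struct
import zlib


def make_bgzf_block(data):
    """Wrap one chunk as a BGZF block: a gzip member with a 'BC' extra field."""
    if len(data) > 0xFF00:
        raise ValueError("chunk too large for a single block")
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15 selects raw deflate
    cdata = deflater.compress(data) + deflater.flush()
    bsize = len(cdata) + 25  # total block length minus one
    # magic, CM=8, FLG=4 (extra field), MTIME, XFL, OS, XLEN=6, 'B','C', SLEN=2, BSIZE
    header = struct.pack("<4BI2BH2B2H", 31, 139, 8, 4, 0, 0, 255, 6, 66, 67, 2, bsize)
    trailer = struct.pack("<2I", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + cdata + trailer


def read_block_at(blob, start):
    """Decompress only the block that starts at byte offset `start`."""
    # Assumes 'BC' is the first extra subfield, as written by make_bgzf_block.
    (bsize,) = struct.unpack_from("<H", blob, start + 16)
    return gzip.decompress(blob[start : start + bsize + 1])


def make_virtual_offset(block_start, within_block):
    """Pack the block's file offset and the offset inside its data into one int."""
    return (block_start << 16) | within_block


def split_virtual_offset(voffset):
    return voffset >> 16, voffset & 0xFFFF


chunks = [b">seq1\nACGTACGT\n", b">seq2\nGGCCGGCC\n", b">seq3\nTTAATTAA\n"]
blob = b""
block_starts = []
for chunk in chunks:
    block_starts.append(len(blob))
    blob += make_bgzf_block(chunk)

# Jump straight to the sequence line of seq2 (6 bytes into the second block).
voffset = make_virtual_offset(block_starts[1], 6)
start, within = split_virtual_offset(voffset)
seq2_line = read_block_at(blob, start)[within:]
```

Because every block is a complete gzip member, the concatenated blob is still readable by ordinary gzip tools. An index only needs to store one virtual offset per record to reach it without decompressing the blocks before it.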
11. TogoWS
(Biopython 1.59, February 2012)
● TogoWS is an integrated web resource for
bioinformatics databases and services
● Provided by the Database Center for Life Science in
Japan
● Usage is similar to NCBI Entrez
_____
http://togows.dbcls.jp/
12. PyPy and Python 3
Biopython:
● works well on PyPy 1.9
(excluding NumPy & C extensions)
● works on Python 3 (excluding some C
extensions), but concerns remain about
performance in default Unicode mode.
○ Currently 'beta' level support.
13. Bio.PDB
● mmCIF parser restored (Biopython 1.60, June 2012)
○ Lenna Peterson fixed a 4-year-old lex/yacc-related
compilation issue
○ That was awesome
○ Now she's a GSoC student
○ Py3/PyPy/Jython compatibility in progress
● Merging GSoC results incrementally
○ Atom element names & weights (João Rodrigues,
GSoC 2010)
○ Lots of feature branches remaining...
15. Google Summer of Code (GSoC)
In 2011, Biopython had three projects funded via the OBF:
● Mikael Trellet (Bio.PDB)
● Michele Silva (Bio.PDB, Mocapy++)
● Justinas Daugmaudis (Mocapy++)
In 2012, we have two projects via the OBF:
● Wibowo Arindrarto (SearchIO)
● Lenna Peterson (Variants)
_____
http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.google-melange.com/
16. GSoC 2011: Mikael Trellet
Biomolecular interfaces in Bio.PDB
Mentor: João Rodrigues
● Representation of protein-protein
interfaces: SM(I)CRA
● Determining interfaces from PDB coordinates
● Analyses of these objects
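One common way to determine an interface from coordinates is a distance cutoff between atoms of different chains. The sketch below is a brute-force illustration under stated assumptions, not the GSoC project's code: the 5.0 angstrom cutoff, the toy coordinates and the function name interface_pairs are all hypothetical. For real structures a spatial index such as Bio.PDB's NeighborSearch would avoid the all-against-all loop.

```python
import math

CUTOFF = 5.0  # angstroms; an assumed threshold, not taken from the project

# Toy data: chain id -> {residue number: list of (x, y, z) atom coordinates}
chains = {
    "A": {1: [(0.0, 0.0, 0.0)], 2: [(2.0, 0.0, 0.0)], 3: [(20.0, 0.0, 0.0)]},
    "B": {10: [(0.0, 4.0, 0.0)], 11: [(40.0, 0.0, 0.0)]},
}


def interface_pairs(chain_a, chain_b, cutoff=CUTOFF):
    """Return residue pairs that have at least one atom pair closer than cutoff."""
    pairs = []
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(math.dist(p, q) < cutoff for p in atoms_a for q in atoms_b):
                pairs.append((res_a, res_b))
    return sorted(pairs)


pairs = interface_pairs(chains["A"], chains["B"])
interface_a = sorted({a for a, _ in pairs})  # chain A residues at the interface
interface_b = sorted({b for _, b in pairs})  # chain B residues at the interface
```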
_____
http://biopython.org/wiki/GSoC2011_mtrellet
17. GSoC 2011: Michele Silva
Python/Biopython bindings for Mocapy++
Mentor: Thomas Hamelryck
Michele Silva wrote a Python bridge for Mocapy++ and
linked it to Bio.PDB to enable statistical analysis of protein
structures.
More-or-less ready to merge after the next Mocapy++
release.
_____
http://biopython.org/wiki/GSOC2011_Mocapy
18. GSoC 2011: Justinas Daugmaudis
Mocapy extensions in Python
Mentor: Thomas Hamelryck
Enhance Mocapy++ in a complementary way, developing a
plugin system for Mocapy++ allowing users to easily write
new nodes (probability distribution functions) in Python.
He's finishing this as part of his master's thesis project with
Thomas Hamelryck.
_____
http://biopython.org/wiki/GSOC2011_MocapyExt
19. GSoC 2012: Lenna Peterson
Diff My DNA: Development of a
Genomic Variant Toolkit for Biopython
Mentors: Brad Chapman, James Casbon
● I/O for VCF, GVF formats
● internal schema for variant data
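To make "I/O plus an internal schema" concrete, here is a minimal sketch that reads the eight fixed VCF columns into a small record type. The Variant class and both parser functions are hypothetical and are not the toolkit's design; genotype columns, header metadata and type conversion of INFO values are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alts: list
    info: dict = field(default_factory=dict)


def parse_vcf_line(line):
    """Parse one tab-separated VCF data line (first eight columns only)."""
    columns = line.rstrip("\n").split("\t")[:8]
    chrom, pos, _id, ref, alt, _qual, _filter, info = columns
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True  # flags carry no value
    return Variant(chrom, int(pos), ref, alt.split(","), info_dict)


def parse_vcf(text):
    """Skip '#' header lines and parse every remaining data line."""
    return [
        parse_vcf_line(line)
        for line in text.splitlines()
        if line and not line.startswith("#")
    ]


example = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\n"
    "20\t1110696\t.\tA\tG,T\t67\tPASS\tDP=10\n"
)
variants = parse_vcf(example)
```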
_____
http://arklenna.tumblr.com/tagged/gsoc2012
20. GSoC 2012: Wibowo Arindrarto
SearchIO implementation in
Biopython
Mentor: Peter Cock
Unified, BioPerl-like API for
search results from BLAST,
HMMER, FASTA, etc.
_____
http://biopython.org/wiki/SearchIO
http://bow.web.id/blog/tag/gsoc/
21. Thanks
● OBF
● BOSC organizers
● Biopython contributors
● Scientists like you
Check us out:
● Website: http://biopython.org
● Code: https://github.com/biopython/biopython