Biopython Project Update
Peter Cock, Plant Pathology, SCRI, Dundee, UK
10th Annual Bioinformatics Open Source Conference (BOSC)
Stockholm, Sweden, 28 June 2009
Contents

•  Brief introduction to Biopython & history
•  Releases since BOSC 2008
•  Current and future projects
•  CVS, git and github
•  BoF hackathon and tutorial at BOSC 2009
Biopython

•  Free, open source library for bioinformatics
•  Supported by Open Bioinformatics Foundation
•  Runs on Windows, Linux, Mac OS X, etc.
•  International team of volunteer developers
•  Currently about three releases per year
•  Extensive “Biopython Tutorial & Cookbook”
•  See www.biopython.org for details
Biopython’s Ten Year History

1999 •  Started by Jeff Chang & Andrew Dalke
2000 •  First release, Biopython 0.90
2001 •  Biopython 1.00, “semi-complete”
 …    •  Biopython 1.10, …, 1.41
2007 •  Biopython 1.43 (Bio.SeqIO), 1.44
2008 •  Biopython 1.45, 1.47, 1.48, 1.49
2009 •  Biopython 1.50, 1.51beta
      •  OA Publication, Cock et al.
Biopython Publication – Cock et al. 2009

N.B. Open Access!
November 2008 – Biopython 1.49

•  Support for Python 2.6
•  Switched from “Numeric” to “NumPy”
   (important numerical library for Python)
•  More biological methods on core Seq object
April 2009 – Biopython 1.50

•  New Bio.Motif module for sequence motifs
   (to replace Bio.AlignAce and Bio.MEME)
•  Support for QUAL and FASTQ in Bio.SeqIO
   (important NextGen sequencing formats)
•  Integration of GenomeDiagram for figures
   (Pritchard et al. 2006)
Biopython 1.50 includes GenomeDiagram

[Figure: GenomeDiagram of a de novo assembly of a 42 kb phage from Roche 454 data]

•  “Feature Track” showing ORFs
•  Scale tick marks
•  “Barchart Track” of read depth (~100, scale max 200)
Reading a FASTA file with Bio.SeqIO
>FL3BO7415JACDX	
TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAA	
GTACGAGCAGATACTCCAATCGCAATTGTTTCTTCATTTAAAATTAGCTCGTCGCCACCT	
TCAATTGGAAATTTATAATCACGATCTAACCAGATTGGTACATTATGTTTTGCAAATCTT	
GGATGATATTTAATGATGTACTCCATGAATAATGATTCACGTCTACGCGCTGGTTCTCTC	
ATCTTATTTATCGTTAAGCCA	
>FL3BO7415I7AFR	
...	



from Bio import SeqIO
# Print each record's identifier, length and first ten bases
with open("phage.fasta") as handle:
    for rec in SeqIO.parse(handle, "fasta"):
        print(rec.id, len(rec.seq), rec.seq[:10] + "...")


FL3BO7415JACDX 261 TTAATTTTAT...
FL3BO7415I7AFR 267 CATTAACTAA...
FL3BO7415JCAY5 136 TTTCTTTTCT...
FL3BO7415JB41R 208 CTCTTTTATG...
FL3BO7415I6HKB 268 GGTATTTGAA...
FL3BO7415I63UC 219 AACATGTGAG...
...

Focus on the filename and format (“fasta”)…
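
To make the FASTA layout above concrete, here is a minimal pure-Python sketch of what a FASTA parser has to do: a line starting with ">" opens a new record, and the sequence lines that follow are joined together. This is only an illustration and not Biopython's implementation; the function name parse_fasta and the two toy records are hypothetical.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted handle.

    Hypothetical helper for illustration only; Bio.SeqIO does far more
    (descriptions, SeqRecord objects, many other formats).
    """
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any
            if identifier is not None:
                yield identifier, "".join(chunks)
            # The identifier is the first word after ">"
            fields = line[1:].split(None, 1)
            identifier = fields[0] if fields else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)  # sequence may span several lines
    if identifier is not None:
        yield identifier, "".join(chunks)


# Toy input (made-up records, not the phage data from the slide)
example = io.StringIO(
    ">read1 first toy record\n"
    "TTAATTTTAT\n"
    "TTTGTCGG\n"
    ">read2\n"
    "CATTAACTAA\n"
)
records = list(parse_fasta(example))
for name, seq in records:
    print(name, len(seq), seq[:10] + "...")
```

In practice Bio.SeqIO.parse is the better choice, since the same loop then works for other formats by changing only the format name.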
Reading a FASTQ file with Bio.SeqIO
@FL3BO7415JACDX	
TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAAGTACGAGCAGATACTCCAATCGCAATTGTTTCTTC
ATTTAAAATTAGCTCGTCGCCACCTTCAATTGGAAATTTATAATCACGATCTAACCAGATTGGTACATTATGTTTTGCAAATCTTGGATGATATT
TAATGATGTACTCCATGAATAATGATTCACGTCTACGCGCTGGTTCTCTCATCTTATTTATCGTTAAGCCA	
+	
BBBB2262=1111FFGGGHHHHIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGGFFFFFFFFFFFFFFFFFGB
BBCFFFFFFFFFFFFFFFFFFFFFFFGGGGGGGIIIIIIIGGGIIIGGGIIGGGG@AAAAA?===@@@???	
@FL3BO7415I7AFR	
...	


from Bio import SeqIO
# As for FASTA, plus the first ten PHRED quality scores of each read
with open("phage.fastq") as handle:
    for rec in SeqIO.parse(handle, "fastq"):
        print(rec.id, len(rec.seq), rec.seq[:10] + "...")
        print(rec.letter_annotations["phred_quality"][:10], "...")

FL3BO7415JACDX 261 TTAATTTTAT...
[33, 33, 33, 33, 17, 17, 21, 17, 28, 16] ...
FL3BO7415I7AFR 267 CATTAACTAA...
[37, 37, 37, 37, 37, 37, 37, 37, 38, 38] ...
FL3BO7415JCAY5 136 TTTCTTTTCT...
[37, 37, 36, 36, 29, 29, 29, 29, 36, 37] ...
FL3BO7415JB41R 208 CTCTTTTATG...
[37, 37, 37, 38, 38, 38, 38, 38, 37, 37] ...
FL3BO7415I6HKB 268 GGTATTTGAA...
[37, 37, 37, 37, 34, 34, 34, 37, 37, 37] ...
FL3BO7415I63UC 219 AACATGTGAG...
[37, 37, 37, 37, 37, 37, 37, 37, 37, 37] ...
...

Just filename and format changed (“fasta” to “fastq”)
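
The numbers in the "phred_quality" lists come straight from the quality line of each FASTQ record. As a small sketch, assuming the Sanger FASTQ convention (PHRED score = ASCII code minus 33), the first ten quality characters of the record shown above decode as follows. The helper names are hypothetical, not part of Biopython.

```python
def decode_sanger_quality(quality_string):
    """Convert a Sanger FASTQ quality string to a list of PHRED scores.

    Assumes the Sanger convention: score = ASCII code - 33.
    """
    return [ord(ch) - 33 for ch in quality_string]


def phred_to_error_probability(score):
    """Return the error probability implied by a PHRED score, 10**(-Q/10)."""
    return 10 ** (-score / 10.0)


# First ten quality characters of read FL3BO7415JACDX from the slide
first_ten = decode_sanger_quality("BBBB2262=1")
print(first_ten)
print(phred_to_error_probability(20))  # a score of 20 means a 1 in 100 error rate
```

Decoding "BBBB2262=1" this way gives the same ten scores that Bio.SeqIO reports for the first read.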
June 2009 – Biopython 1.51 beta

•  Support for Illumina 1.3+ FASTQ files
   (in addition to Sanger FASTQ and older
   Solexa/Illumina FASTQ files)
•  Faster parsing of UniProt/SwissProt files
•  Bio.SeqIO now writes feature table in
   GenBank output (already being used at SCRI
   for genome annotation, e.g. with RAST and Artemis)
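
The three FASTQ variants named on this slide differ in how quality scores are encoded as characters. The sketch below assumes the usual conventions: Sanger stores PHRED scores with an ASCII offset of 33, Illumina 1.3+ stores PHRED scores with an offset of 64, and older Solexa/Illumina files store Solexa scores with an offset of 64. The dictionary keys are the Bio.SeqIO format names; the helper functions and the toy quality strings are hypothetical and are not Biopython's own code.

```python
import math

# ASCII offset used by each FASTQ variant (keys are Bio.SeqIO format names)
OFFSETS = {
    "fastq-sanger": 33,    # PHRED scores, offset 33
    "fastq-solexa": 64,    # Solexa scores, offset 64
    "fastq-illumina": 64,  # PHRED scores, offset 64 (Illumina 1.3+)
}


def decode_scores(quality_string, variant):
    """Return the raw integer scores stored in a quality string."""
    offset = OFFSETS[variant]
    return [ord(ch) - offset for ch in quality_string]


def solexa_to_phred(solexa_score):
    """Convert a Solexa score to the equivalent PHRED score.

    Uses Q_phred = 10 * log10(10**(Q_solexa / 10) + 1).
    """
    return 10 * math.log10(10 ** (solexa_score / 10.0) + 1)


# The same qualities (40, 40, 20) written in two different encodings
sanger = decode_scores("II5", "fastq-sanger")
illumina = decode_scores("hhT", "fastq-illumina")
print(sanger, illumina)

# Solexa scores can be negative; they map onto small positive PHRED scores
solexa_raw = decode_scores("hT;", "fastq-solexa")
print(solexa_raw, [round(solexa_to_phred(q), 2) for q in solexa_raw])
```

Reading a file with the wrong variant name shifts every score by 31, so it matters which format string is passed to Bio.SeqIO.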
Google Summer of Code Projects

•  Eric Talevich - Parsing and writing phyloXML

  •  Mentors Brad Chapman & Christian Zmasek

•  Nick Matzke - Biogeographical Phylogenetics

  •  Mentors Stephen Smith, Brad Chapman & David Kidd

•  Hosted by NESCent Phyloinformatics Group

•  Code development on github branches...
Other Notable Active Projects

•  Brad Chapman – GFF parsing
•  Tiago Antão – Population genetics statistics
•  Peter Cock – Parsing Roche 454 SFF files
   (with Jose Blanca, co-author of sff_extract)
•  Plus other ongoing refinements and
   documentation improvements
Distributed Development

•  Currently work from a stable branch in CVS
•  CVS master is mirrored to github.com
•  Several sub-projects are being developed on
   github branches (in public)
•  This is letting us get familiar with git & github
•  Suggested plan is to switch from CVS to git in
   summer 2009 (still hosted by OBF), and continue
   to push to github for public collaboration
Acknowledgements
•  Other Biopython contributors & developers!
•  Open Bioinformatics Foundation (OBF)
   supports Biopython (and BioPerl etc)
•  Society for General Microbiology
   (SGM) for my travel costs
•  My Biopython work supported by:
  •  EPSRC funded PhD
     (MOAC DTC, University of Warwick, UK)
  •  SCRI (Scottish Crop Research Institute),
     who also paid my conference fees
What next?

•  This afternoon’s “Birds of a Feather” session:
   Biopython Tutorial and/or Hackathon

•  Sign up to our mailing list?




•  Homepage www.biopython.org
