Biomanycores, a repository of interoperable
            open-source code for many-cores bioinformatics

                  Jean-Stéphane Varré, Stéphane Janot, Mathieu Giraud
                             contact@biomanycores.org
                                   Sequoia Bioinformatics
                       LIFL – UMR CNRS 8022 – Université Lille 1, France
                               INRIA Lille Nord-Europe, France



                                          June 2009



J.-S. Varré, S. Janot, M. Giraud (LIFL)          Biomanycores          June 2009
Outline




         High-performance computing
         Graphical Processing Units and bioinformatics
         biomanycores.org
                 aim of the project
                 what has been done?
                 future developments




High Performance Bioinformatics – Manycores


                                                              1970 – 2002:
                                                              Moore’s law =
                                                              increasing frequencies
                                                              problems:
                                                              power consumption,
                                                              heat dissipation

         from now on: Moore’s law continues with multiple cores
                 from multicores: dual-cores, quad-cores, octo-cores...
                 to manycores:
                         Graphic processing units (GPUs)
                         Nvidia GTX 285 ⇒ 30 × 8 cores, 1.2 GHz, 40 (×8) GFlops
                         convergence CPU-GPU: Intel Larrabee



High Performance Bioinformatics – Manycores

 GPGPU = General-Purpose computation on GPU
         until 2007: tweaking graphics primitives
         2007: Nvidia CUDA
         2009: OpenCL (Khronos Group)
                 dec 08: 1.0 specification
                 may 09: beta release of a Nvidia compiler
                 AMD/ATI compiler coming soon
         ⇒ portable manycore applications?

 With GPGPU...
         10× / 100× peak speed-up, low costs ($50–$500)
         even with loss due to parallelism, 10× speed-up is possible
         (relatively) easy with CUDA / OpenCL, requires some learning

GPU + Bioinformatics
Methods




         “Graphical” GPGPU (2005/06):
                                              speed-up
                                   RAxML     up to 2×         Charalambous et al. 2005
                                  ClustalW   up to 7×         Liu et al. 2006
         CUDA (since 2007):
                                              speed-up
                         mummerGPU           up to 10×        Schatz et al. 2007
                      Smith-Waterman         up to 15×        Manavski and Valle 2008
                      Neighbor-Joining       up to 26×        Liu et al. 2009
                             RNAfold         up to 17×        Risk and Lavenier 2009
         ∼ 10 papers between 2007 and 2009



GPU + Bioinformatics
Specific Bioinformatics HPC Events




         HiComb (IEEE Workshop on High Performance Computational Biology)
         since 2002
         in conjunction with IPDPS [may 09, Roma]
         PBC (Parallel Bio-Computing Workshop)
         since 2005, every two years
         in conjunction with PPAM [sept 09, Wroclaw]
         HiBi (Workshop on High Performance Computational Systems Biology)
         [oct 09, Trento]




Sequoia Bioinformatics
LIFL, INRIA, Université Lille 1, France




 H. Touzet’s group, 14 people (including 5 PhD students)
         Large-scale sequence analysis
         Sequence comparisons, seed-based heuristics
         RNA, transcription factors, NRPS
         High-Performance Bioinformatics
                 SIMD flexible read mapper (L. Noé, M. Gîrdea)
                 GPU PWM scan / P-value (22× – 77× on a GTX 280)
                 GPU ADP (6.1× – 22.8× on a GTX 280, with U. Bielefeld)
                 GPU & bit-parallelism pattern matching (ongoing)
                 Supported by NVIDIA (Professor Partnership, 2009)




GPU + Position-Weight Matrices (PWM)
Parallel Position Weight Matrices Algorithms. M. Giraud and J.-S. Varré. ISPDC’09

        PWMs are used for modeling transcription factor binding sites,
        transcription start sites, protein domains, ...

        score threshold or P-value computation: requires enumerating words

        occurrences: requires quickly scanning a very long sequence

        [Figure: sequence logo of a PWM (WebLogo 3.0), y-axis in bits]
        [Figure: speedup versus matrix length (log scale, 1x to 100x) for CPU (one thread), GeForce 8800, GTX 280 and GTX 280 (+ atomic)]
        [Figure: speedup versus matrix length (5x to 25x) for CPU (one thread), GeForce 8800 and GTX 280]
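The two PWM tasks named on this slide can be sketched on the CPU as follows. This is a minimal reference sketch, not the cudaPWM implementation: the toy matrix, the function names `score`, `scan` and `p_value`, and the uniform background model are all assumptions made for illustration. The P-value routine enumerates all 4^m words of length m, which is why that step becomes expensive as matrices grow.

```python
from itertools import product

# Hypothetical toy PWM: one dict of scores per matrix column (not from the slides).
PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": 0.0},
    {"A": -1.0, "C": 1.0, "G": 0.0, "T": -1.0},
    {"A": 0.0, "C": -1.0, "G": 1.0, "T": -1.0},
]


def score(word, pwm):
    """Sum the per-column scores of a word that is as long as the matrix."""
    return sum(col[ch] for col, ch in zip(pwm, word))


def scan(sequence, pwm, threshold):
    """Return (position, score) for every window scoring at least `threshold`."""
    m = len(pwm)
    hits = []
    for i in range(len(sequence) - m + 1):
        s = score(sequence[i:i + m], pwm)
        if s >= threshold:
            hits.append((i, s))
    return hits


def p_value(pwm, threshold, alphabet="ACGT"):
    """Fraction of all words of length m scoring at least `threshold`.

    Assumes a uniform background; enumerates len(alphabet) ** m words.
    """
    m = len(pwm)
    good = sum(1 for w in product(alphabet, repeat=m) if score(w, pwm) >= threshold)
    return good / len(alphabet) ** m


hits = scan("TTACGACGTT", PWM, 3.0)
pval = p_value(PWM, 3.0)
print(hits, pval)
```

Each window of the scan, and each enumerated word, is scored independently of the others, which is what makes both tasks natural candidates for a manycore implementation.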

HPC Bioinformatics for human beings?




         Research in High-Performance Computing
                 nice ideas, nice papers
                 but not always exploited
         A few HPC bioinformatics frameworks projects...

            ⇒ far from everyday usage of bioinformaticians and biologists




www.biomanycores.org



                                             1. Share OpenCL code
                                                = public repository, open-source

                                             2. Make it easy
                                                = Bio∗ integration

                                             3. Benchmark
                                                algorithms, implementations, hardware




www.biomanycores.org



                                             1. Share OpenCL code (currently CUDA)
                                                = public repository, open-source

                                             2. Make it easy
                                                = Bio∗ integration

                                             3. Benchmark
                                                algorithms, implementations, hardware




Already included projects


         SWcuda – Smith-Waterman protein alignment
                 CRIBI Genomics, University of Padova, Italy
                 S. A. Manavski, G. Valle, CUDA compatible GPU cards as efficient hardware
                 accelerators for Smith-Waterman sequence alignment, BMC Bioinformatics 2008,
                 9(S2):S10
         pknotsRG – pseudoknots of an RNA sequence
                 Universität Bielefeld, Germany
                 J. Reeder, P. Steffen, R. Giegerich, pknotsRG: RNA pseudoknot folding including
                 near-optimal structures and sliding windows, Nucl. Acids Res., 2007
         cudaPWM – scan a PWM against a DNA sequence
                 Sequoia, LIFL, INRIA, Université Lille 1
                 M. Giraud, J.-S. Varré, Parallel Position Weight Matrices Algorithms, ISPDC’09


            Interfaces to BioJava 1.6, BioPerl 1.52, and Biopython 1.50b



Biopython + CRIBI SW




from Bio import SeqIO
from Biomanycores import PadovaSW

# protein bank to search
bank = SeqIO.parse(open("uniprot-start.fa"), "fasta")

# align each query sequence against the bank
for query in SeqIO.parse(open("prot64.fa"), "fasta"):
    handle = PadovaSW.run(query, bank)
    result = PadovaSW.SWParser().parse()
    print(result)




Biopython + CRIBI SW
Tests on a GeForce 8800


 biopython$ time python sw-demo.py cuda
 ** cd ../bin/ ; ./swcuda config.gpu ../tmp/swcuda.fa ../tmp/swcuda.bank
 ** 1.846s
 12098 results...
 [(84.0, 0, 0, 'sp|P30350|ADH1_ANAPL'), (81.0, 0, 0, 'sp|P23991|ADH1_CHICK'), (81.0,
 real 2.81    user 1.79     sys 0.27

 biopython$ time python sw-demo.py cpu
 ** cd ../bin/ ; ./swcuda config.cpu ../tmp/swcuda.fa ../tmp/swcuda.bank
 ** 16.604s
 12098 results...
 [(84.0, 0, 0, 'sp|P30350|ADH1_ANAPL'), (81.0, 0, 0, 'sp|P23991|ADH1_CHICK'), (81.0,
 real 17.57     user 16.42     sys 0.14


         10× – 15× paper speedup (BMC Bioinformatics 2008, 9S2)
         8.7× application speedup
         6.2× final speedup (including Biopython/Biomanycores)
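The gap between the application speedup and the final speedup can be checked from the timings printed above. The sketch below assumes that the value after `**` is the run time of `swcuda` alone and that `real` is the wall-clock time of the whole Python script; the variable names are illustrative.

```python
# Timings in seconds, copied from the two runs shown on this slide.
gpu_app, cpu_app = 1.846, 16.604    # value printed after "**" (assumed: swcuda alone)
gpu_real, cpu_real = 2.81, 17.57    # "real" time of the whole sw-demo.py run

app_speedup = cpu_app / gpu_app         # about 9.0
final_speedup = cpu_real / gpu_real     # about 6.25

# Time spent outside swcuda (Python start-up, file conversion, parsing).
overhead_gpu = gpu_real - gpu_app
overhead_cpu = cpu_real - cpu_app

print(f"application speedup: {app_speedup:.1f}x")
print(f"final speedup:       {final_speedup:.1f}x")
print(f"overhead: {overhead_gpu:.2f}s (GPU run), {overhead_cpu:.2f}s (CPU run)")
```

With these numbers the overhead is close to one second in both runs, so it is a fixed cost of the Biopython/Biomanycores layer. The application ratio computed here is about 9.0×, slightly above the 8.7× quoted on the slide, which presumably comes from a different run.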

BioPerl + CRIBI SW

 BioPerl tutorial
 use Bio::Tools::pSW;

 $factory = new Bio::Tools::pSW('-matrix' => 'blosum62.bla', '-gap' => 12, '-ext' => 2);
 $factory->align_and_show($seq1, $seq2, STDOUT);
 $aln = $factory->pairwise_alignment($seq1, $seq2);



 With biomanycores
 use Bio::SeqIO;
 use Biomanycores::PadovaSW;

 $factory = PadovaSW->new();

 $factory->swcuda($inputseq, $bank);
 @r = $factory->parse_result();



BioJava + PWM

 import org.biojavax.bio.seq.RichSequence;
 import org.biojava.bio.dp.SimpleWeightMatrix;
  ...
 import org.biomanycores.bio.pwm.*;
  ...
 {
     LillePWMScan scanner = new LillePWMScan(launcher);

     // read the sequence
     RichSequenceIterator it = null;
     BufferedReader in1 = new BufferedReader(new FileReader(args[1]));
     it = RichSequence.IOTools.readFastaDNA(in1, null);
     RichSequence query = it.nextRichSequence();

     // read a weight matrix
     SimpleWeightMatrix pwm = PFMParser.PARSER.get(args[2], alph, "ACGT");

     // scan the sequence
     List<PWMHit> al = scanner.scan(query, pwm, threshold);
 }



Challenges


         Different APIs, different philosophies
                 BioJava: no external program execution?
                 Object representation (alignments)
                 Object existence (PWM)

         Minimal modifications to the source code of applications
                 CribiSW: command-line arguments

         Real-world pipelines?
                 Bio∗ are not HPC frameworks
                 Succession of several programs

         Usage: requires CUDA / OpenCL SDK




Licenses




         Projects must have an open-source license
         Bio∗ interfaces: same license as the parent API
                 BioJava: LGPL 2.1
                 BioPerl: Perl artistic license
                 Biopython: Biopython license




www.biomanycores.org

                                             1. Share OpenCL code (currently CUDA)
                                                = public repository, open-source
                                                ⇒ bring new projects

                                             2. Make it easy
                                                = Bio∗ integration
                                                ⇒ integrate new projects
                                                ⇒ improve current interfaces

                                             3. Benchmark
                                                algorithms, implementations, hardware
                                                ⇒ think!


Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Último (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

Varre_Biomanycores_BOSC2009

  • 1. Biomanycores, a repository of interoperable open-source code for many-cores bioinformatics
       Jean-Stéphane Varré, Stéphane Janot, Mathieu Giraud
       contact@biomanycores.org
       Sequoia Bioinformatics
       LIFL – UMR CNRS 8022 – Université Lille 1, France
       INRIA Lille Nord-Europe, France
       June 2009
  • 2. Outline
       High-performance computing
       Graphics Processing Units and bioinformatics
       biomanycores.org
           aim of the project
           what has been done?
           future developments
  • 3. High Performance Bioinformatics – Manycores
       1970 – 2002: Moore’s law = increasing frequencies
           problems: power consumption, heat dissipation
       from now on: Moore’s law continues with multiple cores
           from multicores: dual-cores, quad-cores, octo-cores...
           to manycores:
               Graphics processing units (GPUs)
               Nvidia GTX 285 ⇒ 30 × 8 cores, 1.2 GHz, 40 (×8) GFlops
               convergence CPU-GPU: Intel Larrabee
  • 4. High Performance Bioinformatics – Manycores
       GPGPU = General-Purpose computation on GPU
           until 2007: tweaking graphics primitives
           2007: Nvidia CUDA
           2009: OpenCL (Khronos Group)
               Dec 08: 1.0 specification
               May 09: beta release of an Nvidia compiler
               AMD/ATI compiler coming soon
               ⇒ portable manycores applications?
       With GPGPU...
           10× / 100× peak speed-up, low costs ($50–$500)
           even with loss due to parallelism, 10× speed-up is possible
           (relatively) easy with CUDA / OpenCL, requires some learning
  • 5. GPU + Bioinformatics
       Methods
           “Graphical” GPGPU (2005/06), speed-up:
               RAxML             up to 2×    Charalambous et al. 2005
               ClustalW          up to 7×    Liu et al. 2006
           CUDA (since 2007), speed-up:
               MUMmerGPU         up to 10×   Schatz et al. 2007
               Smith-Waterman    up to 15×   Manavski and Valle 2008
               Neighbor-Joining  up to 26×   Liu et al. 2009
               RNAfold           up to 17×   Rizk and Lavenier 2009
           ∼ 10 papers between 2007 and 2009
  • 6. GPU + Bioinformatics
       Specific Bioinformatics HPC Events
           HiComb (IEEE Workshop on High Performance Computational Biology), since 2002, in conjunction with IPDPS [May 09, Rome]
           PBC (Parallel Bio-Computing Workshop), since 2005, every two years, in conjunction with PPAM [Sept 09, Wroclaw]
           HiBi (Workshop on High Performance Computational Systems Biology) [Oct 09, Trento]
  • 7. Sequoia Bioinformatics
       LIFL, INRIA, Université Lille 1, France
       H. Touzet’s group, 14 people (including 5 PhD students)
       Large-scale sequence analysis
           Sequence comparisons, seed-based heuristics
           RNA, transcription factors, NRPS
       High-Performance Bioinformatics
           SIMD flexible read mapper (L. Noé, M. Gîrdea)
           GPU PWM scan / P-value (22× – 77× on a GTX 280)
           GPU ADP (6.1× – 22.8× on a GTX 280, with U. Bielefeld)
           GPU & bit-parallelism pattern matching (ongoing)
       Supported by NVIDIA (Professor Partnership, 2009)
  • 8. GPU + Position-Weight Matrices (PWM)
       Parallel Position Weight Matrices Algorithms. M. Giraud and J.-S. Varré. ISPDC’09
       PWMs are used for modeling transcription factor binding sites, transcription start sites, protein domains, ...
       score threshold or P-value computation: requires enumerating words
       occurrences: requires quickly scanning a very long sequence
       [Figure: sequence logo of a PWM, in bits (WebLogo 3.0)]
       [Figure: two plots of speedup versus matrix length, comparing CPU (one thread), GeForce 8800, GTX 280, and GTX 280 (+ atomic)]
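The two PWM tasks named on this slide can be sketched on the CPU as follows. This is a minimal Python 3 illustration, not the cudaPWM implementation: the function names, the toy matrix values, and the uniform background used for the P-value are all assumptions made for the example. It shows why the P-value side means enumerating words (all 4^m of them for a matrix of length m) while the occurrence side is a sliding-window scan.

```python
from itertools import product

ALPHABET = "ACGT"

def score_word(pwm, word):
    """Sum the per-position weights of `word` under `pwm` (a list of dicts)."""
    return sum(column[base] for column, base in zip(pwm, word))

def scan_pwm(pwm, sequence, threshold):
    """Return (position, score) for every window scoring at least `threshold`."""
    m = len(pwm)
    hits = []
    for i in range(len(sequence) - m + 1):
        s = score_word(pwm, sequence[i:i + m])
        if s >= threshold:
            hits.append((i, s))
    return hits

def p_value(pwm, threshold):
    """Fraction of all 4^m words scoring at least `threshold`.

    Assumes a uniform background (each word equally likely); this is a
    brute-force enumeration, only practical for short matrices.
    """
    m = len(pwm)
    n_hits = sum(1 for word in product(ALPHABET, repeat=m)
                 if score_word(pwm, word) >= threshold)
    return n_hits / 4 ** m

# Toy 3-column matrix with integer weights (hypothetical values).
toy_pwm = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": -1, "C": 2, "G": 0, "T": -1},
    {"A": 0, "C": -1, "G": 2, "T": -1},
]

hits = scan_pwm(toy_pwm, "TTACGACGT", 6)
print(hits)
print(p_value(toy_pwm, 6))
```

Each window in the scan and each word in the enumeration is scored independently of the others, which is what makes both loops natural candidates for a many-core implementation.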
  • 9. HPC Bioinformatics for human beings?
       Research in High-Performance Computing
           nice ideas, nice papers
           but not always exploited
       A few HPC bioinformatics framework projects...
       ⇒ far from the everyday usage of bioinformaticians and biologists
  • 10. www.biomanycores.org
       1. Share OpenCL code = public repository, open-source
       2. Make it easy = Bio* integration
       3. Benchmark algorithms, implementations, hardware
  • 11. www.biomanycores.org
       1. Share OpenCL code (currently CUDA) = public repository, open-source
       2. Make it easy = Bio* integration
       3. Benchmark algorithms, implementations, hardware
  • 12. Already included projects
       SWcuda – Smith-Waterman protein alignment
           CRIBI Genomics, University of Padova, Italy
           S. A. Manavski, G. Valle, CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment, BMC Bioinformatics 2008, 9(S2):S10
       pknotsRG – pseudoknots of an RNA sequence
           Universität Bielefeld, Germany
           J. Reeder, P. Steffen, R. Giegerich, pknotsRG: RNA pseudoknot folding including near-optimal structures and sliding windows, Nucl. Acids Res., 2007
       cudaPWM – scan a PWM against a DNA sequence
           Sequoia, LIFL, INRIA, Université Lille 1
           M. Giraud, J.-S. Varré, Parallel Position Weight Matrices Algorithms, ISPDC’09
       Interfaces to BioJava 1.6, BioPerl 1.52, and Biopython 1.50b
  • 14. Biopython + CRIBI SW
       from Bio import SeqIO
       from Biomanycores import PadovaSW

       bank = SeqIO.parse(open("uniprot-start.fa"), "fasta")
       for query in SeqIO.parse(open("prot64.fa"), "fasta"):
           handle = PadovaSW.run(query, bank)
           result = PadovaSW.SWParser().parse()
           print(result)
  • 15. Biopython + CRIBI SW
       Tests on a GeForce 8800
           biopython$ time python sw-demo.py cuda
           ** cd ../bin/ ; ./swcuda config.gpu ../tmp/swcuda.fa ../tmp/swcuda.bank
           ** 1.846s
           12098 results...
           [(84.0, 0, 0, 'sp|P30350|ADH1_ANAPL'), (81.0, 0, 0, 'sp|P23991|ADH1_CHICK'), (81.0,
           real 2.81
           user 1.79
           sys 0.27
           biopython$ time python sw-demo.py cpu
           ** cd ../bin/ ; ./swcuda config.cpu ../tmp/swcuda.fa ../tmp/swcuda.bank
           ** 16.604s
           12098 results...
           [(84.0, 0, 0, 'sp|P30350|ADH1_ANAPL'), (81.0, 0, 0, 'sp|P23991|ADH1_CHICK'), (81.0,
           real 17.57
           user 16.42
           sys 0.14
       10× – 15× paper speedup (BMC Bioinformatics 2008, 9(S2))
       8.7× application speedup
       6.2× final speedup (including Biopython/Biomanycores)
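As a rough check on how the application and final speedups relate, the ratios can be recomputed from the timings in the test log on this slide. The sketch below (Python 3) uses only those logged numbers; the variable names are made up for the example. The recomputed ratios land close to, though not exactly on, the reported 8.7× and 6.2×, which may have been measured on other runs (an assumption).

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline time to accelerated time."""
    return baseline_seconds / accelerated_seconds

# Timings copied from the test log (GeForce 8800).
app_cpu, app_gpu = 16.604, 1.846   # time reported for the swcuda call itself
real_cpu, real_gpu = 17.57, 2.81   # wall-clock time of the whole Python script

application = speedup(app_cpu, app_gpu)    # swcuda only
end_to_end = speedup(real_cpu, real_gpu)   # including the Biopython wrapper

# Time spent outside swcuda (parsing, file I/O, interpreter start-up).
overhead_cpu = real_cpu - app_cpu
overhead_gpu = real_gpu - app_gpu

print(f"application speedup: {application:.2f}x")
print(f"end-to-end speedup:  {end_to_end:.2f}x")
print(f"wrapper overhead:    {overhead_cpu:.3f}s (cpu run), {overhead_gpu:.3f}s (gpu run)")
```

The overhead is nearly the same in both runs (a little under one second), so it weighs much more heavily on the short GPU run; that fixed cost is what pulls the end-to-end figure below the application figure.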
  • 16. BioPerl + CRIBI SW
       BioPerl tutorial
           use Bio::Tools::pSW;
           $factory = new Bio::Tools::pSW('-matrix' => 'blosum62.bla', '-gap' => 12, '-ext' => 2);
           $factory->align_and_show($seq1, $seq2, STDOUT);
           $aln = $factory->pairwise_alignment($seq1, $seq2);
       With biomanycores
           use Bio::SeqIO;
           use Biomanycores::PadovaSW;
           $factory = PadovaSW->new();
           $factory->swcuda($inputseq, $bank);
           @r = $factory->parse_result();
  • 17. BioJava + PWM
       import org.biojavax.bio.seq.RichSequence;
       import org.biojava.bio.dp.SimpleWeightMatrix;
       ...
       import org.biomanycores.bio.pwm.*;
       ...
       {
           LillePWMScan scanner = new LillePWMScan(launcher);

           // read the sequence
           RichSequenceIterator it = null;
           BufferedReader in1 = new BufferedReader(new FileReader(args[1]));
           it = RichSequence.IOTools.readFastaDNA(in1, null);
           RichSequence query = it.nextRichSequence();

           // read a weight matrix
           SimpleWeightMatrix pwm = PFMParser.PARSER.get(args[2], alph, "ACGT");

           // scan the sequence
           List<PWMHit> al = scanner.scan(query, pwm, threshold);
       }
  • 18. Challenges
       Different APIs, different philosophies
           BioJava: no external program execution?
           Object representation (alignments)
           Object existence (PWM)
       Minimal modifications to the source code of applications
           CribiSW: command-line arguments
       Real-world pipelines?
           Bio* are not HPC frameworks
           Succession of several programs
       Usage: requires CUDA / OpenCL SDK
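The "command-line arguments" point can be illustrated with a thin wrapper that leaves the application untouched and only assembles its argument list. The sketch below (Python 3) follows the argument order visible in the Biopython test log (`./swcuda config.gpu ../tmp/swcuda.fa ../tmp/swcuda.bank`); the function names, the default paths, and the use of `subprocess` are assumptions for illustration, not the actual Biomanycores interface.

```python
import shlex
import subprocess

def build_swcuda_command(config, query_fasta, bank_fasta, binary="./swcuda"):
    """Assemble the argument list: binary, config file, query file, bank file."""
    return [binary, config, query_fasta, bank_fasta]

def run_swcuda(config, query_fasta, bank_fasta, bin_dir="../bin"):
    """Run the external program from `bin_dir` and return its raw text output.

    Hypothetical helper: parsing the output is left to the caller, since the
    output format is not described here.
    """
    cmd = build_swcuda_command(config, query_fasta, bank_fasta)
    completed = subprocess.run(cmd, cwd=bin_dir, capture_output=True,
                               text=True, check=True)
    return completed.stdout

# Building the command does not require the binary to be installed.
cmd = build_swcuda_command("config.gpu", "../tmp/swcuda.fa", "../tmp/swcuda.bank")
printable = " ".join(shlex.quote(arg) for arg in cmd)
print(printable)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names, and switching between the GPU and CPU runs only means swapping the configuration file argument.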
  • 19. Licenses
       Projects must have an open-source license
       Bio* interfaces: same license as the parent API
           BioJava: LGPL 2.1
           BioPerl: Perl Artistic License
           Biopython: Biopython license
  • 20. www.biomanycores.org
       1. Share OpenCL code (currently CUDA) = public repository, open-source
           ⇒ bring new projects
       2. Make it easy = Bio* integration
           ⇒ integrate new projects
           ⇒ improve current interfaces
       3. Benchmark algorithms, implementations, hardware
           ⇒ think!