Concurrent Bioinformatics Software for Discovering Genome-Wide Patterns and Word-Based Genomic Signatures
Jens Lichtenberg, Kyle Kurz, Xiaoyu Liang, Rami Al-Ouran, Lev Neiman, Lee Nau, Joshua Welch, Edwin Jacox, Thomas Bitterman, Klaus Ecker, Laura Elnitski, Frank Drews, Stephen Lee, Lonnie Welch
The WordSeeker Tool
  Enumeration
    Suffix Tree and Suffix Array
    Radix Tree
  Scoring
  Clustering
    Sequence Clustering
    Word Clustering
  Conservation Analysis
    PhastCons Score Extraction
  Location Distributions
  Sequence Coverage
    Minimum set of words necessary to cover all sequences
  Module Discovery
    Enumerative
  Ranger Markup
    Basic Functional Elements
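The sequence coverage step asks for a minimum set of words that covers all sequences. The slides do not say how WordSeeker computes it, so the following is only a minimal sketch using the standard greedy set-cover heuristic, which gives a small cover but not necessarily the smallest one. The function name `greedy_word_cover` and the input layout (a mapping from each word to the set of sequence IDs containing it) are assumptions made for illustration.

```python
def greedy_word_cover(word_to_seqs, all_seqs):
    """Greedy approximation of a small word set that covers every sequence.

    word_to_seqs: dict mapping a word to the set of sequence IDs containing it
                  (hypothetical layout, not taken from WordSeeker).
    all_seqs:     iterable of all sequence IDs that should be covered.
    Returns (chosen_words, uncovered_sequence_ids).
    """
    uncovered = set(all_seqs)
    chosen = []
    while uncovered:
        # Pick the word that covers the most still-uncovered sequences.
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(word_to_seqs),
                   key=lambda w: len(word_to_seqs[w] & uncovered))
        gained = word_to_seqs[best] & uncovered
        if not gained:
            break  # the remaining sequences contain none of the words
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


example = {"ACG": {0, 1}, "CGT": {1, 2}, "TTT": {2, 3}, "GGA": {3}}
print(greedy_word_cover(example, {0, 1, 2, 3}))  # (['ACG', 'TTT'], set())
```

The second return value lists sequences that no candidate word covers, which lets a caller tell an incomplete cover apart from a complete one.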
Software Properties
  Google Code repository: http://code.google.com/p/word-seeker/
  GNU General Public License v3
  Doxygen code generator (internal documentation)
  SVN for command-line access: http://word-seeker.googlecode.com/svn/trunk
  Requirements
    G++ compiler version 4.1* or higher
    OpenMP headers
    MPI environment (distributed version)
  For visualizations and other post-processing steps
    Perl 5.8.8
    TFBS (http://tfbs.genereg.net/), Set::Scalar, LWP::Simple, Parallel::ForkManager, GD::Graph::bars, Algorithm::Cluster, Bio::SeqIO (all available through CPAN)
    Gnuplot version 4.2 or higher
Need for a Scalable Approach
  Word Enumeration Module
    Represents a set of biological input sequences in a chosen data structure
    Keeps track of words, word counts, sequence counts, and word locations
    Needs to keep the data persistent in memory
  Word Scoring Module
    Determines statistical scores for each word
    Requires frequent lookups of words and substrings of words
    Example: a Markov model of order m requires lookups for all substrings of up to length m for all words
    Lookup cost must therefore be kept low
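To show why scoring depends on cheap substring lookups, here is a minimal sketch of one common way to estimate a word's expected count under a Markov model. It is not taken from the WordSeeker code: the formula (product of the counts of overlapping substrings of length order + 1, divided by the counts of the interior substrings of length order), the function names, and the toy sequence are all assumptions for illustration.

```python
from collections import Counter


def count_substrings(seq, max_len):
    """Count every substring of length 1..max_len in seq (overlapping)."""
    counts = Counter()
    for length in range(1, max_len + 1):
        for start in range(len(seq) - length + 1):
            counts[seq[start:start + length]] += 1
    return counts


def expected_count(word, counts, order):
    """Expected occurrences of word under a Markov model of the given order.

    Needs one lookup per overlapping (order + 1)-mer and one per interior
    order-mer, so each scored word triggers several substring lookups.
    """
    if order < 1 or len(word) <= order:
        raise ValueError("need 1 <= order < len(word)")
    steps = len(word) - order
    expected = 1.0
    for i in range(steps):
        expected *= counts[word[i:i + order + 1]]
    for i in range(1, steps):
        denom = counts[word[i:i + order]]
        if denom == 0:
            return 0.0  # an interior substring never occurs
        expected /= denom
    return expected


toy = "ACGACGTACG"  # made-up sequence, for illustration only
counts = count_substrings(toy, max_len=3)
print(counts["ACG"], expected_count("ACG", counts, order=1))  # 3 3.0
```

In this sketch a word of length L scored at order k costs 2(L - k) - 1 lookups, which is why the data structure behind `counts` dominates scoring time when many words are scored.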
Enumeration Approaches
  Total number of nucleotides in the input sequences: n
  Word length: m
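As a point of reference for the quantities n and m, the sketch below enumerates all words of length m with a plain hash table and records the three things the enumeration module is said to track: word counts, sequence counts, and word locations. The hash table is a stand-in chosen for brevity, not the suffix tree or radix tree used by the tool, and the function name `enumerate_words` is hypothetical.

```python
from collections import defaultdict


def enumerate_words(sequences, m):
    """Enumerate all overlapping words of length m in a list of sequences.

    Returns three dicts keyed by word:
      word_count: total number of occurrences
      seq_ids:    set of indices of sequences containing the word
      locations:  list of (sequence index, start position) pairs
    """
    word_count = defaultdict(int)
    seq_ids = defaultdict(set)
    locations = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        # A sequence of length L holds L - m + 1 words of length m.
        for pos in range(len(seq) - m + 1):
            word = seq[pos:pos + m]
            word_count[word] += 1
            seq_ids[word].add(seq_id)
            locations[word].append((seq_id, pos))
    return word_count, seq_ids, locations


counts, seq_ids, locs = enumerate_words(["ACGT", "CGTA"], m=2)
print(counts["CG"], len(seq_ids["CG"]), locs["CG"])  # 2 2 [(0, 1), (1, 0)]
```

Every position is visited once and each visit slices m characters, so this version does work proportional to n times m and keeps one location entry per occurrence in memory.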
Solution: Parallelization
  Distributed solution: tasks executed on different nodes (distributed memory)
  Multi-core solution: tasks executed on different cores (shared memory)
Parallel Software Properties
  Shared memory: OpenMP parallelization
    Simple, portable directives that compile even on unsupported architectures
    Simple loops are run in parallel on multiple processors
  Distributed memory: MPI parallelization
    Hardware optimizations and support for Fortran, C/C++, Perl
    Each node is provided a subset of the data to process
    “Smart” division of tasks is key
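The idea of giving each worker a subset of the data can be sketched as follows. Splitting the work by the first letters of each word is an assumption, suggested only by the mention of "more prefixes" under Future Work, and Python threads here stand in for OpenMP threads or MPI ranks. Because of Python's global interpreter lock, this shows the structure of the division rather than a real speedup; the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words_with_prefix(sequences, m, prefix):
    """Count words of length m that start with the given prefix."""
    counts = Counter()
    for seq in sequences:
        for pos in range(len(seq) - m + 1):
            word = seq[pos:pos + m]
            if word.startswith(prefix):
                counts[word] += 1
    return counts


def parallel_word_counts(sequences, m, prefixes=("A", "C", "G", "T"), workers=4):
    """Run one counting task per prefix and merge the partial results.

    The prefixes must not overlap, otherwise words would be counted twice.
    Words starting with any other character (for example N) are skipped.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda p: count_words_with_prefix(sequences, m, p), prefixes)
        merged = Counter()
        for partial in partials:
            merged.update(partial)  # Counter.update adds counts
    return merged


result = parallel_word_counts(["ACGT", "CGTA"], m=2)
print(sorted(result.items()))  # [('AC', 1), ('CG', 2), ('GT', 2), ('TA', 1)]
```

Since every word belongs to exactly one prefix, the partial results never conflict and the merge is a simple sum. Using longer prefixes yields more, smaller tasks, which is one way to balance load across workers.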
Results
  Analyzed the Arabidopsis thaliana genome
    All segments and the full genome
    Multiple word lengths (1-20)
    Searched top words against AGRIS (repository of known elements in A. thaliana)
  Characterized the framework
    Speedup and runtime analysis
    Radix trie and suffix tree
Memory Requirements for Arabidopsis thaliana (conducted at the Ohio Supercomputer Center)
Execution Times for Arabidopsis thaliana
Analyzing the Parallel System
  Speedup, efficiency, and timing using A. thaliana core promoter sequences
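For readers unfamiliar with the two metrics, the sketch below uses their standard definitions: speedup is the serial runtime divided by the parallel runtime, and efficiency is speedup divided by the number of workers. The timings in the usage line are made-up numbers for illustration, not measurements from the talk.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    if t_parallel <= 0:
        raise ValueError("parallel runtime must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Efficiency E = S / p, where p is the number of cores or nodes."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return speedup(t_serial, t_parallel) / workers


# Hypothetical timings: 120 s on one core, 40 s on four cores.
print(speedup(120.0, 40.0), efficiency(120.0, 40.0, 4))  # 3.0 0.75
```

An efficiency of 1.0 would mean perfectly linear scaling; values below that reflect serial portions of the work plus communication and synchronization overhead.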
Shared and Distributed Memory Speedup (panels: Radix Trie, Suffix Tree)
Shared and Distributed Memory Efficiency (panels: Radix Trie, Suffix Tree)
Shared and Distributed Memory Performance (panels: Radix Trie, Suffix Tree)
Scoring Speedup Contribution (panels: Runtime, Scoring)
Results: Pushing the limits
Summary
  Parallel
    Shared memory on single nodes
    Distributed memory on 5 nodes
  High-throughput
    Full genomes analyzed in under 5 hours
  Long word lengths
    Genomes: approaching 20
    Smaller files: often 100 or greater
  Powerful analysis
    Detailed statistics
    Degeneracy via clustering
    Additional post-processing (scatter plots, logos, etc.)
Future Work
  Post-processing
    Word distributions
    Sequence clustering
    GBrowse visualization
  Further parallelization
    Within a node
    Greater distributed abstraction (more prefixes)
Questions?

Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Lichtenberg bosc2010 wordseeker

  • 1. Concurrent Bioinformatics Software for Discovering Genome-Wide Patterns and Word-based Genomic Signatures. Jens Lichtenberg, Kyle Kurz, Xiaoyu Liang, Rami Al-Ouran, Lev Neiman, Lee Nau, Joshua Welch, Edwin Jacox, Thomas Bitterman, Klaus Ecker, Laura Elnitski, Frank Drews, Stephen Lee, Lonnie Welch
  • 2. The WordSeeker Tool. Enumeration: Suffix Tree and Suffix Array, Radix Tree. Scoring. Clustering: Sequence Clustering, Word Clustering. Conservation Analysis: PhastCons Score Extraction. Location Distributions. Sequence Coverage: minimum set of words necessary to cover all sequences. Module Discovery: Enumerative, Ranger. Markup: Basic Functional Elements.
  • 3. Software Properties. Google Code repository: http://code.google.com/p/word-seeker/ (GNU General Public License v3). Internal documentation: Doxygen code generator. SVN for command line access: http://word-seeker.googlecode.com/svn/trunk Requirements: G++ compiler version 4.1* or higher, OpenMP headers, MPI environment (distributed version). For visualizations and other post-processing steps: Perl 5.8.8, TFBS (http://tfbs.genereg.net/), Set::Scalar, LWP::Simple, Parallel::ForkManager, GD::Graph::bars, Algorithm::Cluster, Bio::SeqIO (all available through CPAN), and Gnuplot version 4.2 or higher.
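  • 4. Need for a Scalable Approach. Word Enumeration Module: represents a set of biological input sequences based on some data structure; keeps track of words, word counts, sequence counts, and word locations; needs to keep the data persistent in memory. Word Scoring Module: determines statistical scores for each word; requires frequent lookups for words and substrings of words (example: a Markov order m model requires lookups for all substrings of up to length m for all words).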
  • 5. Enumeration Approaches. Total number of nucleotides in the input sequences: n. Word length: m.
  • 6. Solution: Parallelization. Distributed solution: tasks executed on different nodes (distributed memory). Multi-core solution: tasks executed on different cores (shared memory).
  • 7. Parallel Software Properties. Shared memory: OpenMP parallelization; simple, portable directives that compile even on unsupported architectures; simple loops are run in parallel on multiple processors. Distributed memory: MPI parallelization; hardware optimizations and support for Fortran, C/C++, Perl; each node is provided a subset of the data to process; “smart” division of tasks is key.
  • 8. Results. Analyzed the Arabidopsis thaliana genome: all segments and the full genome, multiple word lengths (1-20); searched top words against AGRIS (repository of known elements in A. thaliana). Characterized the framework: speedup and runtime analysis for the Radix Trie and the Suffix Tree.
  • 9. Memory Requirements for Arabidopsis thaliana. Conducted at the Ohio Supercomputer Center.
  • 10. Execution Times for Arabidopsis thaliana
  • 11. Analyzing the Parallel System. Speedup, efficiency and timing using A. thaliana core promoter sequences.
  • 12. Shared and Distributed Memory Speedup (panels: Radix Trie, Suffix Tree)
  • 13. Shared and Distributed Memory Efficiency (panels: Radix Trie, Suffix Tree)
  • 14. Shared and Distributed Memory Performance (panels: Radix Trie, Suffix Tree)
  • 15. Scoring Speedup Contribution (panels: Runtime, Scoring)
  • 17. Summary. Parallel: shared memory on single nodes, distributed memory on 5 nodes. High-throughput: full genomes analyzed in under 5 hours. Long word lengths: approaching 20 for genomes, often 100 or greater for smaller files. Powerful analysis: detailed statistics, degeneracy via clustering, additional post-processing (scatter plots, logos, etc.).
  • 18. Future Work. Post-processing: word distributions, sequence clustering, GBrowse visualization. Further parallelization: within a node, greater distributed abstraction (more prefixes).
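
The slides describe an enumeration step that tracks word counts and sequence counts for words of length m, and a shared-memory version in which simple loops are run in parallel through OpenMP directives. A minimal sketch of that idea follows. It is not WordSeeker's code: it swaps the suffix tree and radix trie for a plain std::map, and the names WordStats, WordTable and enumerate_words are hypothetical. Each thread fills a private table and the tables are merged at the end, which is one common way to avoid shared writes; whether WordSeeker does the same is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-word record: total occurrences and the number of
// distinct input sequences that contain the word at least once.
struct WordStats {
    std::size_t word_count = 0;
    std::size_t sequence_count = 0;
};

using WordTable = std::map<std::string, WordStats>;

// Enumerate all words of length m over the input sequences.
// Assumption: a std::map stands in for the suffix tree / radix trie.
WordTable enumerate_words(const std::vector<std::string>& sequences,
                          std::size_t m) {
    WordTable merged;
    if (m == 0) return merged;
    const long total = static_cast<long>(sequences.size());

    // Without OpenMP support the pragmas are ignored and the block
    // simply runs once, serially, with the same result.
    #pragma omp parallel
    {
        WordTable local;  // private to each thread

        #pragma omp for nowait
        for (long s = 0; s < total; ++s) {
            const std::string& seq = sequences[static_cast<std::size_t>(s)];
            std::set<std::string> seen;  // words already credited to this sequence
            for (std::size_t i = 0; i + m <= seq.size(); ++i) {
                const std::string word = seq.substr(i, m);
                WordStats& stats = local[word];
                ++stats.word_count;
                if (seen.insert(word).second) ++stats.sequence_count;
            }
        }

        // Merge the private tables one thread at a time.
        #pragma omp critical
        {
            for (const auto& entry : local) {
                merged[entry.first].word_count += entry.second.word_count;
                merged[entry.first].sequence_count += entry.second.sequence_count;
            }
        }
    }
    return merged;
}
```

For the two toy sequences ACGTACGT and ACGA with m = 3, the table holds five words; ACG occurs three times across two sequences. The std::map keeps the sketch short but does not reflect the memory or lookup behaviour of the tree structures compared in the slides.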

Editor's notes

  1. MPI: Widely Supported by network interface designers
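
For the distributed-memory version, the slides say that each node is provided a subset of the data and mention "more prefixes" as a route to further parallelization. One way to picture such a division is sketched below, under stated assumptions: the word space is split by nucleotide prefixes of length k, and prefixes are dealt to nodes round-robin. The functions prefix_from_index and prefixes_for_node are hypothetical, no MPI calls are made, and the actual division used by WordSeeker may differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Turn a prefix index into its nucleotide string by reading the index
// as a base-4 number over the alphabet A, C, G, T.
std::string prefix_from_index(std::size_t index, std::size_t k) {
    static const char alphabet[] = "ACGT";
    std::string prefix(k, 'A');
    for (std::size_t pos = k; pos > 0; --pos) {
        prefix[pos - 1] = alphabet[index % 4];
        index /= 4;
    }
    return prefix;
}

// List the length-k prefixes assigned to one node (rank) when the
// 4^k prefixes are dealt round-robin to node_count nodes.
// Assumption: round-robin is an illustrative policy, not the tool's.
std::vector<std::string> prefixes_for_node(std::size_t k,
                                           std::size_t node,
                                           std::size_t node_count) {
    if (node_count == 0 || node >= node_count) {
        throw std::invalid_argument("node must be in [0, node_count)");
    }
    std::size_t total = 1;
    for (std::size_t i = 0; i < k; ++i) total *= 4;  // 4^k prefixes

    std::vector<std::string> assigned;
    for (std::size_t index = node; index < total; index += node_count) {
        assigned.push_back(prefix_from_index(index, k));
    }
    return assigned;
}
```

With k = 1 there are only four prefixes, so the fifth of five nodes receives no work. With k = 2 the sixteen prefixes spread over all five nodes (node 0 would handle AA, CC, GG and TT), which illustrates why a finer prefix granularity can improve the balance across nodes.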