SlideShare una empresa de Scribd logo
1 de 23
BioMake Chris Mungall Berkeley Drosophila Genome Project [email_address]
build networks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Existing approaches ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
biomake: executable computational protocols ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example Genomic Sequence Analysis Pipeline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Target dependencies Assembled Genomic Sequence Chunked Sequence RepeatMasked Sequence Gene Predictions Blast Alignments FastaDB (local) FastaDB (remote) BlastIndexed FastaDB Relational Database Gene Models BlastP & HMM Alignments XML Export XML Import Flatfile Export
Specifying Targets Chunked Sequence Blast Alignments FastaDB (local) BlastIndexed FastaDB formatdb(D) run: formatdb -i  D   blast(P,S,D,A) req:  formatdb(D) run: blastall -p  P  -i  S  -d  D A flat: S-blast/ bn(S).D . P .out  A generic target pattern has a name and arguments A target pattern has tags for specifying dependencies, actions and filesystem or database IDs
BioMake Execution TARGET: blast(blastx, my.seq, nr.fly, -) SUBTARGET: formatdb(nr.fly) RUN: formatdb -i nr -p T OUT: nr.fly.{psq,pin,phr} formatdb-nr.fly.OK RUN:  blastall -p blastx -i my.seq -d nr.fly  OUT: my.seq-blast/my.seq.nr.fly.blastx.out my.seq-blast/my.seq.nr.fly.blastx.out.OK formatdb(D) run: formatdb -i  D   blast(P,S,D,A) req:  formatdb(D) run: blastall -p  P  -i  S  -d  D A flat: S-blast/ bn(S).D . P .out
Targets can be nested bop(S,B) run: apollo -bop -s  S B  -o  target flat:  B .game.xml  store(XML) run: xml2db  XML store(bop(my.seq, genscan(repeatmask(my.seq, drosophila)))) target instantiations can be thought of as a  skolem IDs repeatmask(S,Org) run: repeatmask  S  -a  Org flat:  S .mask  genscan(S) run: genscan  S flat:  S -pred/ bn(S) .genscan.out
Iteration ,[object Object],[object Object],[object Object],[object Object],[object Object]
Iterating over datasets splitfasta(F) run:  splitfasta.pl -d  F -split -md5  F flat:  F -split/ bn(F). pathlist comment:  splitfasta.pl is part of the biomake distro analyze_multifasta(F) iterate:  analyze_seq(S)   where  S  in  splitfasta(F) analyze_seq(S) req:  genie(S) blast( blastx ,S, nr ,-) MultiFasta >seq1 TAGGTATTGGTT AGGTGCGTCCTC >seq2 GCGGTATAGCTT TTCCTTCTCTCT >seq3 CAAAGCAGAGAT ATATTTATTCGC >seq1 TAGGTATTGGTT AGGTGCGTCCTC >seq2 GCGGTATAGCTT TTCCTTCTCTCT >seq3 CAAAGCAGAGAT ATATTTATTCGC seq1.genie.out seq1.nr.blastx.out seq2.genie seq2.nr.blastx.out seq3.nr.blastx.out seq3.genie
Controlling the runmode ,[object Object],[object Object],[object Object],[object Object],[object Object]
runmode example blast(P,S,D,A) req:  formatdb(D) run: blastall -p  P  -i  S  -d  D A flat: S-blast/ bn(S).D . P .out runmode: async(qsubwrap) The blast job will be executed on the compute farm via qsubwrap (comes with biomake distro) Upon submission, the status target status_run(blast(P,S,D,A)) will be generated; on completion, the target status_ok (blast(P,S,D,A)) will be generated biomake can automatically handle moving data in and out between user’s filesystem (or db) and local cluster nodes
Datastores ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Asynchronous Execution local machine running biomake scheduler node NFS For each target T to be built: 1) biomake fetches status of T skips T if status = ok/run 2) biomake stores  status_run(T) 3) biomake creates a runner agent  script and submits it to the cluster 4) continue onto next target 1) agent  fetches  any input data 2) agent  runs  command (eg blast) synchronously 3) agent  stores  result 4) agent  stores  status of T as ‘ ok’ or ‘err’ AGENT AGENT
Specifying rules ,[object Object],[object Object],[object Object],[object Object],[object Object]
Embedding prolog facts <data relation=“fastadb” cols=“ID,SeqAlphabet,SOType,Org” del=“ws”> </data> <prolog> nucdb(D):- fastadb(D,na,_,_). pepdb(D):- fastadb(D,aa,_,_). </prolog> formatdb(D) run: formatdb -i D -p ‘T’  {pepdb(D)} run: formatdb -i D -p ‘F’  {nucdb(D)} na na aa D melanogaster cDNA cdna.fly.fst D melanogaster EST est.fly.fst D melanogaster polypeptide protein.fly.fst
BioMake module system ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
biomake extensibility ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BioMake in use ,[object Object],[object Object],[object Object],[object Object]
Running biomake ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Acknowledgements Shengqiang Shu Sima Misra Erwin Frise Eric Smith Mark Yandell George Hartzell Chris Smith Simon Prochnik Jon Tupy Josh Kaminker Karen Eilbeck Nomi Harris Suzanna Lewis Gerry Rubin
Problem Specification ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Más contenido relacionado

La actualidad más candente

ch6-pv2-device-drivers
ch6-pv2-device-driversch6-pv2-device-drivers
ch6-pv2-device-drivers
yushiang fu
 
Hacking and Computer Forensics
Hacking and Computer ForensicsHacking and Computer Forensics
Hacking and Computer Forensics
Kristian Arjianto
 
Smashing the stack for fun and profit
Smashing the stack for fun and profitSmashing the stack for fun and profit
Smashing the stack for fun and profit
Alexey Miasoedov
 

La actualidad más candente (19)

Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
 
101 4.2 maintain the integrity of filesystems
101 4.2 maintain the integrity of filesystems101 4.2 maintain the integrity of filesystems
101 4.2 maintain the integrity of filesystems
 
ch6-pv2-device-drivers
ch6-pv2-device-driversch6-pv2-device-drivers
ch6-pv2-device-drivers
 
Lineage-driven Fault Injection, SIGMOD'15
Lineage-driven Fault Injection, SIGMOD'15Lineage-driven Fault Injection, SIGMOD'15
Lineage-driven Fault Injection, SIGMOD'15
 
Unit 5 dwqb ans
Unit 5 dwqb ansUnit 5 dwqb ans
Unit 5 dwqb ans
 
4.2 maintain the integrity of filesystems
4.2 maintain the integrity of filesystems4.2 maintain the integrity of filesystems
4.2 maintain the integrity of filesystems
 
Linux Ethernet device driver
Linux Ethernet device driverLinux Ethernet device driver
Linux Ethernet device driver
 
Page table manipulation attack
Page table manipulation attackPage table manipulation attack
Page table manipulation attack
 
Namespaces and Autoloading
Namespaces and AutoloadingNamespaces and Autoloading
Namespaces and Autoloading
 
H2HC - R3MF
H2HC - R3MFH2HC - R3MF
H2HC - R3MF
 
intro unix/linux 08
intro unix/linux 08intro unix/linux 08
intro unix/linux 08
 
Hacking and Computer Forensics
Hacking and Computer ForensicsHacking and Computer Forensics
Hacking and Computer Forensics
 
File handling in C
File handling in CFile handling in C
File handling in C
 
File Management in C
File Management in CFile Management in C
File Management in C
 
The Internals of "Hello World" Program
The Internals of "Hello World" ProgramThe Internals of "Hello World" Program
The Internals of "Hello World" Program
 
File in C language
File in C languageFile in C language
File in C language
 
Buffer overflow attacks
Buffer overflow attacksBuffer overflow attacks
Buffer overflow attacks
 
File handling in 'C'
File handling in 'C'File handling in 'C'
File handling in 'C'
 
Smashing the stack for fun and profit
Smashing the stack for fun and profitSmashing the stack for fun and profit
Smashing the stack for fun and profit
 

Similar a BioMake BOSC 2004

Systemtap
SystemtapSystemtap
Systemtap
Feng Yu
 
Swift profiling middleware and tools
Swift profiling middleware and toolsSwift profiling middleware and tools
Swift profiling middleware and tools
zhang hua
 
Linux kernel-rootkit-dev - Wonokaerun
Linux kernel-rootkit-dev - WonokaerunLinux kernel-rootkit-dev - Wonokaerun
Linux kernel-rootkit-dev - Wonokaerun
idsecconf
 
Virtual platform
Virtual platformVirtual platform
Virtual platform
sean chen
 

Similar a BioMake BOSC 2004 (20)

[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Systemtap
SystemtapSystemtap
Systemtap
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
 
Efficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native EnvironmentsEfficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native Environments
 
Workshop NGS data analysis - 2
Workshop NGS data analysis - 2Workshop NGS data analysis - 2
Workshop NGS data analysis - 2
 
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
 
Debugging Python with gdb
Debugging Python with gdbDebugging Python with gdb
Debugging Python with gdb
 
Srgoc dotnet
Srgoc dotnetSrgoc dotnet
Srgoc dotnet
 
Os lab final
Os lab finalOs lab final
Os lab final
 
Introduction to FPGA acceleration
Introduction to FPGA accelerationIntroduction to FPGA acceleration
Introduction to FPGA acceleration
 
Debugging node in prod
Debugging node in prodDebugging node in prod
Debugging node in prod
 
Technical Overview of Apache Drill by Jacques Nadeau
Technical Overview of Apache Drill by Jacques NadeauTechnical Overview of Apache Drill by Jacques Nadeau
Technical Overview of Apache Drill by Jacques Nadeau
 
Swift profiling middleware and tools
Swift profiling middleware and toolsSwift profiling middleware and tools
Swift profiling middleware and tools
 
Smash the Stack: Writing a Buffer Overflow Exploit (Win32)
Smash the Stack: Writing a Buffer Overflow Exploit (Win32)Smash the Stack: Writing a Buffer Overflow Exploit (Win32)
Smash the Stack: Writing a Buffer Overflow Exploit (Win32)
 
Linux kernel-rootkit-dev - Wonokaerun
Linux kernel-rootkit-dev - WonokaerunLinux kernel-rootkit-dev - Wonokaerun
Linux kernel-rootkit-dev - Wonokaerun
 
Mach-O Internals
Mach-O InternalsMach-O Internals
Mach-O Internals
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
Virtual platform
Virtual platformVirtual platform
Virtual platform
 
Membrane protein-ligand tutorial with GROMACS.pdf
Membrane protein-ligand tutorial with GROMACS.pdfMembrane protein-ligand tutorial with GROMACS.pdf
Membrane protein-ligand tutorial with GROMACS.pdf
 
Node Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In ProductionNode Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In Production
 

Más de Chris Mungall

Más de Chris Mungall (20)

MADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxMADICES Mungall 2022.pptx
MADICES Mungall 2022.pptx
 
Scaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciencesScaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciences
 
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOLinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptx
 
LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)
 
LinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupLinkML presentation to Yosemite Group
LinkML presentation to Yosemite Group
 
Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...
 
All together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeAll together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of life
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
 
Representation of kidney structures in Uberon
Representation of kidney structures in UberonRepresentation of kidney structures in Uberon
Representation of kidney structures in Uberon
 
SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)
 
Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019
 
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
 
Uberon: opening up to community contributions
Uberon: opening up to community contributionsUberon: opening up to community contributions
Uberon: opening up to community contributions
 
Modeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesModeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologies
 
Causal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyCausal reasoning using the Relation Ontology
Causal reasoning using the Relation Ontology
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene Ontology
 
Introduction to the BioLink datamodel
Introduction to the BioLink datamodelIntroduction to the BioLink datamodel
Introduction to the BioLink datamodel
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015
 
ENVO GSC 2015
ENVO GSC 2015ENVO GSC 2015
ENVO GSC 2015
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

BioMake BOSC 2004

  • 1. BioMake Chris Mungall Berkeley Drosophila Genome Project [email_address]
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. Target dependencies Assembled Genomic Sequence Chunked Sequence RepeatMasked Sequence Gene Predictions Blast Alignments FastaDB (local) FastaDB (remote) BlastIndexed FastaDB Relational Database Gene Models BlastP & HMM Alignments XML Export XML Import Flatfile Export
  • 7. Specifying Targets Chunked Sequence Blast Alignments FastaDB (local) BlastIndexed FastaDB formatdb(D) run: formatdb -i D blast(P,S,D,A) req: formatdb(D) run: blastall -p P -i S -d D A flat: S-blast/ bn(S).D . P .out A generic target pattern has a name and arguments A target pattern has tags for specifying dependencies, actions and filesystem or database IDs
  • 8. BioMake Execution TARGET: blast(blastx, my.seq, nr.fly, -) SUBTARGET: formatdb(nr.fly) RUN: formatdb -i nr -p T OUT: nr.fly.{psq,pin,phr} formatdb-nr.fly.OK RUN: blastall -p blastx -i my.seq -d nr.fly OUT: my.seq-blast/my.seq.nr.fly.blastx.out my.seq-blast/my.seq.nr.fly.blastx.out.OK formatdb(D) run: formatdb -i D blast(P,S,D,A) req: formatdb(D) run: blastall -p P -i S -d D A flat: S-blast/ bn(S).D . P .out
  • 9. Targets can be nested bop(S,B) run: apollo -bop -s S B -o target flat: B .game.xml store(XML) run: xml2db XML store(bop(my.seq, genscan(repeatmask(my.seq, drosophila)))) target instantiations can be thought of as a skolem IDs repeatmask(S,Org) run: repeatmask S -a Org flat: S .mask genscan(S) run: genscan S flat: S -pred/ bn(S) .genscan.out
  • 10.
  • 11. Iterating over datasets splitfasta(F) run: splitfasta.pl -d F -split -md5 F flat: F -split/ bn(F). pathlist comment: splitfasta.pl is part of the biomake distro analyze_multifasta(F) iterate: analyze_seq(S) where S in splitfasta(F) analyze_seq(S) req: genie(S) blast( blastx ,S, nr ,-) MultiFasta >seq1 TAGGTATTGGTT AGGTGCGTCCTC >seq2 GCGGTATAGCTT TTCCTTCTCTCT >seq3 CAAAGCAGAGAT ATATTTATTCGC >seq1 TAGGTATTGGTT AGGTGCGTCCTC >seq2 GCGGTATAGCTT TTCCTTCTCTCT >seq3 CAAAGCAGAGAT ATATTTATTCGC seq1.genie.out seq1.nr.blastx.out seq2.genie seq2.nr.blastx.out seq3.nr.blastx.out seq3.genie
  • 12.
  • 13. runmode example blast(P,S,D,A) req: formatdb(D) run: blastall -p P -i S -d D A flat: S-blast/ bn(S).D . P .out runmode: async(qsubwrap) The blast job will be executed on the compute farm via qsubwrap (comes with biomake distro) Upon submission, the status target status_run(blast(P,S,D,A)) will be generated; on completion, the target status_ok (blast(P,S,D,A)) will be generated biomake can automatically handle moving data in and out between user’s filesystem (or db) and local cluster nodes
  • 14.
  • 15. Asynchronous Execution local machine running biomake scheduler node NFS For each target T to be built: 1) biomake fetches status of T skips T if status = ok/run 2) biomake stores status_run(T) 3) biomake creates a runner agent script and submits it to the cluster 4) continue onto next target 1) agent fetches any input data 2) agent runs command (eg blast) synchronously 3) agent stores result 4) agent stores status of T as ‘ ok’ or ‘err’ AGENT AGENT
  • 16.
  • 17. Embedding prolog facts <data relation=“fastadb” cols=“ID,SeqAlphabet,SOType,Org” del=“ws”> </data> <prolog> nucdb(D):- fastadb(D,na,_,_). pepdb(D):- fastadb(D,aa,_,_). </prolog> formatdb(D) run: formatdb -i D -p ‘T’ {pepdb(D)} run: formatdb -i D -p ‘F’ {nucdb(D)} na na aa D melanogaster cDNA cdna.fly.fst D melanogaster EST est.fly.fst D melanogaster polypeptide protein.fly.fst
  • 18.
  • 19.
  • 20.
  • 21.
  • 22. Acknowledgements Shengqiang Shu Sima Misra Erwin Frise Eric Smith Mark Yandell George Hartzell Chris Smith Simon Prochnik Jon Tupy Josh Kaminker Karen Eilbeck Nomi Harris Suzanna Lewis Gerry Rubin
  • 23.