Genomes on Rails
  has_many :sequences
Hello
➊ Previously
➋ Production
➌ Process
➊ Previously
The human genome
15 years to decode
3 billion letters
$3 billion
$3 billion ++
Race for the prize
Open data
Open source
Perl
Lots of Perl
 ~4500 modules
Onwards!
40 species
Map evolutionary space
Compare genomes
compare species
Compare genomes
compare individuals
More Perl
~1500 modules
Quantum leap!
1000 personal genomes
beyond 23andme
Hypertension
Diabetes
Coronary heart disease
Bipolar disorder
Malaria
➋ Production
Register projects
Register samples
Sample prep
Sequencing
Analysis
Change!
Flexible data capture
Virtual fields
Sample
  Name
  Organism
  Concentration
class Sample < ActiveRecord::Base
  has_many :descriptors
  has_many :descriptor_values
end
Key value pairs
Faster than you’d think
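The key/value idea can be sketched in plain Ruby without a database. This is only an illustration, not the production code: `Descriptor`, `DescriptorValue`, `set`, `get` and `fields` are hypothetical names standing in for whatever the `has_many` associations on the slide point at, and the sample values are made up.

```ruby
# Plain-Ruby sketch of key/value "virtual fields"; no Rails or database needed.
# Descriptor and DescriptorValue are hypothetical stand-ins for the models
# behind the has_many associations shown on the slide.
Descriptor = Struct.new(:name)
DescriptorValue = Struct.new(:descriptor, :value)

class Sample
  attr_reader :descriptor_values

  def initialize
    @descriptor_values = []
  end

  # Store a value under a field name, replacing any earlier value.
  def set(field, value)
    existing = find_value(field)
    if existing
      existing.value = value
    else
      @descriptor_values << DescriptorValue.new(Descriptor.new(field), value)
    end
    value
  end

  # Look up a field by name; returns nil when the field was never captured.
  def get(field)
    found = find_value(field)
    found && found.value
  end

  # Names of the fields this sample currently carries, in insertion order.
  def fields
    @descriptor_values.map { |dv| dv.descriptor.name }
  end

  private

  def find_value(field)
    @descriptor_values.find { |dv| dv.descriptor.name == field }
  end
end

sample = Sample.new
sample.set("Name", "sample-1")
sample.set("Organism", "Homo sapiens")
sample.set("Concentration", 12.5)
sample.set("Concentration", 14.0) # overwrites the earlier value
```

Adding a new field such as "Origin" is then a matter of calling `set` with a new name; no class or table definition has to change.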
Change!
V1               V2
Sample           Sample
Name             Name
Organism         Organism
Concentration    Concentration
                 Origin
                 Quality metric
Rationalize!
V1               V2
Sample           Sample
Name             Name
Organism         Organism
Concentration    Concentration
                 Origin
                 Quality metric
Mapping!
V1               V3
Sample           Sample
Name             Name
Organism         Species
Concentration    Concentration
Origin           Origin
                 Quality metric
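One way to picture the mapping step is a rename table applied to each record. The sketch below is an assumption about how that could look, not the deck's implementation: `FIELD_RENAMES`, `V3_FIELDS` and `map_to_v3` are hypothetical names, only the Organism to Species rename comes from the slide, and the record values are invented.

```ruby
# Hypothetical rename table between schema versions; only the
# Organism -> Species rename is taken from the slide.
FIELD_RENAMES = { "Organism" => "Species" }.freeze
V3_FIELDS = ["Name", "Species", "Concentration", "Origin", "Quality metric"].freeze

# Translate a record captured under the older layout into the V3 layout.
# Fields the old record never captured come through as nil.
def map_to_v3(record)
  renamed = record.each_with_object({}) do |(field, value), out|
    out[FIELD_RENAMES.fetch(field, field)] = value
  end
  V3_FIELDS.each_with_object({}) { |field, out| out[field] = renamed[field] }
end

old_record = {
  "Name" => "sample-1",
  "Organism" => "Homo sapiens",
  "Concentration" => 12.5,
  "Origin" => "blood"
}
new_record = map_to_v3(old_record)
```

Keeping the renames in data rather than code means a further schema version would only need another table entry.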
Pipeline management
Workflow
Task 1        Task 2           Task 3
Name          Name             Name
Operator      Serial number    Passed
Instrument    Kit
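A workflow like this can be modelled as an ordered list of tasks, each declaring the fields it captures. The following is a minimal sketch under that assumption: `Task`, `WORKFLOW`, `missing` and `next_incomplete` are hypothetical names, the field names are from the slide, and the captured values are placeholders.

```ruby
# Each task declares which fields it captures; field names are from the slide.
Task = Struct.new(:name, :fields) do
  # Return the declared fields that are absent from the captured data.
  def missing(data)
    fields.reject { |f| data.key?(f) }
  end
end

WORKFLOW = [
  Task.new("Task 1", ["Name", "Operator", "Instrument"]),
  Task.new("Task 2", ["Name", "Serial number", "Kit"]),
  Task.new("Task 3", ["Name", "Passed"])
].freeze

# Walk the tasks in order and return the first one whose data is incomplete,
# or nil when every task has all of its fields.
def next_incomplete(workflow, captured)
  workflow.find { |task| !task.missing(captured.fetch(task.name, {})).empty? }
end

captured = {
  "Task 1" => { "Name" => "prep", "Operator" => "op-1", "Instrument" => "inst-1" },
  "Task 2" => { "Name" => "load", "Kit" => "kit-1" }
}
pending = next_incomplete(WORKFLOW, captured)
```

Here `pending` is the second task, because its "Serial number" has not been recorded yet.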
Throughput!
320Tb  450 CPU  Archive
75Tb
pilot study!
Multiple apps
Multiple instances
Loosely coupled
Loose coupling is hard
Deployment
Maintenance
Monitoring
Hard to maintain separation
Support novel science
Single code base
nginx reverse proxy
fairnginx
Mongrel
Fast deployment
Automate everything
Play well with others!
Interoperability!
Legacy databases
RESTful services
Generate API stubs
SCALE!
Trillionics
2X
150Tb per week
Over 6 months
More hardware
400 additional nodes
additional 360 Tb
Towards a Virtual Institute
Lots of data, lots of people, lots of compute, lots of uses, lots and lots and lots and lots...
➌ Process
Concept Requirements Development Product
takes too long
these change
[Diagram: cycle with stages Plan, Development, Review and Concept, labelled "What we need" and "Get ready"]
What we need              Get ready
Focused
Project owner is key
Weekly releases
More flexible
Less time
Better transparency
Less software
Sequencing informatics
Thank you
GREENISGOOD.CO.UK
