KIPPER: SEQUENCE DATABASE VERSIONING FOR
GALAXY BIOINFORMATICS SERVERS
Damion Dooley
Hsiao Lab, BC Public Health Microbiology & Reference Laboratory
And UBC Department of Pathology, Vancouver, Canada
https://github.com/Public-Health-Bioinformatics/kipper
https://github.com/Public-Health-Bioinformatics/versioned_data
How to recreate sequencing analysis?
Retrieve or redo sequencing data
Get right software versions
Get databases as they appeared on a certain date
Nice database vs. juggernaut
Periodically published
Varying ability to download past versions
RDP RNA v10.1 – 11.4 (5.5 GB)
Silva RNA v89 – 119 (2.6 GB)
Uniref (~50 versions, ~35 GB latest)
Pseudo-versioned
Version stated but no way to get past ones?
No client software for insert/delete diff
NCBI nt (58 GB)
NCBI nr (78 GB)
Ancient juggernaut supporting immortal database and crushing unwary sys admins in its path
Kipper – fetch!
What is a poor server admin to do?
Kipper data store
Metadata file
Kipper data store
Volume file(s)
Version listing
• Kipper is a Python script
$ Kipper rdp_rna
• Add new version:
$ Kipper rdp_rna -i download.fasta -o .
• Retrieve a version by id:
$ Kipper rdp_rna -e -n11
Galaxy - version retrieval
Version retrieval
Acknowledgements
This work was supported by Genome Canada / Genome BC Grant “A
Federated Bioinformatics Platform for Public Health Microbial
Genomics” to Fiona Brinkman, Gary Van Domselaar and William Hsiao.
More information about the IRIDA project (Integrated Rapid Infectious
Disease Analysis) can be found at http://www.irida.ca

Editor's Notes

  1. EXPERIMENTAL REPRODUCIBILITY: HOW do you recreate a sequencing analysis, say, 2 years from now?
  2. This juggling requires some infrastructure MAYBE WE HAVE … WHAT ABOUT THOSE …
  3. NO SOFTWARE FOR GENERATING OLDER RESULTS INFRASTRUCTURE PROBLEM Anyone doing NCBI diff processing? What if diff version regeneration speed >= download speed?
  4. BIOMAJ + KIPPER KIPPER IS PROTOTYPE KIPPER HANDLES LARGE FILES THAT GIT HAS TO EXTERNALIZE
  5. ----- Meeting Notes (15-07-10 01:08) ----- The volume file holds all the diff info for a range of versions. For a particular FASTA sequence it records WHEN (in which version) the sequence was CREATED, WHAT version it was deleted in, and its ACCESSION ID, title & description.
  6. ----- Meeting Notes (15-07-10 01:08) ----- The volume file holds all the diff info for a range of versions. For a particular FASTA sequence it records WHEN (in which version) the sequence was CREATED, WHAT version it was deleted in, and its ACCESSION ID, title & description.
  7. Retrieve any number of databases Access by global version date Lists dates (and/or ids) of available versions Select one or more workflows on given database
  8. SHOULDN’T PROVIDERS be convinced to supply their databases in a versioned format?
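
The volume file idea from notes 5 and 6 (each sequence record carries the version it was created in and the version it was deleted in) can be sketched in a few lines of Python. This is only an illustration of the insert/delete bookkeeping, not Kipper's actual code or file format: the `Record` and `VersionedStore` names, the in-memory list, and the sample accession IDs are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    """One stored key/value pair with its version lifetime (hypothetical layout)."""
    key: str                       # e.g. a FASTA accession ID
    value: str                     # e.g. title, description and sequence
    created: int                   # version id in which this pair first appeared
    deleted: Optional[int] = None  # version id in which it disappeared, if any


class VersionedStore:
    """In-memory stand-in for a volume file holding diffs for a range of versions."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self.latest = 0  # id of the most recent version; 0 means empty store

    def add_version(self, snapshot: Dict[str, str]) -> int:
        """Import a full snapshot, storing only what changed since the last version."""
        vid = self.latest + 1
        live = {r.key: r for r in self.records if r.deleted is None}
        # Mark pairs that vanished or changed as deleted in this version.
        for key, rec in live.items():
            if snapshot.get(key) != rec.value:
                rec.deleted = vid
        # Insert pairs that are new or whose value changed.
        for key, value in snapshot.items():
            rec = live.get(key)
            if rec is None or rec.value != value:
                self.records.append(Record(key, value, created=vid))
        self.latest = vid
        return vid

    def retrieve(self, vid: int) -> Dict[str, str]:
        """Rebuild the snapshot exactly as it was imported for version `vid`."""
        if not 1 <= vid <= self.latest:
            raise ValueError(f"unknown version id: {vid}")
        return {
            r.key: r.value
            for r in self.records
            if r.created <= vid and (r.deleted is None or r.deleted > vid)
        }


if __name__ == "__main__":
    store = VersionedStore()
    store.add_version({"A1": "seq one", "B2": "seq two"})
    store.add_version({"A1": "seq one", "C3": "seq three"})
    print(store.retrieve(1))
    print(store.retrieve(2))
```

Unchanged sequences are stored once no matter how many versions they survive, which is why this kind of layout suits large databases whose releases differ only slightly from one another.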