To provide a genomic narrative that can be trusted, microbiology
laboratories need quality control (QC) metrics to accompany their
genomic pipelines. QC metrics enable:
•  Implementing standards in routine lab sample processing
•  Performance comparison of pipeline optimizations or alternatives
•  Retrospective tracing of problems that arise
QC metrics are not easy to implement: they may need to be adjusted for
organism type, sample quality, sequencing technology and preparation,
and the mix of software components that are brought together in a
pipeline. Another challenge is to transform QC reporting from a manual
review of a pipeline’s disparate and often opaque application log files
into an automated system of reporting and decision making that can be
adjusted by researchers and system administrators who are not expert
programmers.
We have developed a general-purpose text-mining and reporting
application called Report Calc for Quality Control (RCQC) that works
directly within command-line scripts, or as a tool in Galaxy (an interactive
bioinformatics platform and workflow engine). An RCQC interpreter
follows instructions in an RCQC script to extract QC variables from various
application log and report files. It can implement rules that trigger
warning or failure statuses in an active pipeline. Various opportunities
arise for metrics along the stages of a genomic pipeline; our initial focus
is on basic assembly metrics as illustrated on this poster.
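As a rough illustration of the text-mining step described above, the sketch below pulls named QC variables out of a log with regular expressions. It is written in plain Python rather than in RCQC's own script syntax, and the log excerpt, the variable names and the `extract_metrics` helper are all assumptions made for this example, not part of RCQC.

```python
import re

# Made-up log excerpt for illustration only; real tool output may differ.
LOG_TEXT = """\
[FLASH] Total pairs:      1000
[FLASH] Combined pairs:   800
[FLASH] Percent combined: 80.00%
"""

# Hypothetical extraction rules: report variable name -> regex with one
# capture group holding the numeric value.
RULES = {
    "total_pairs": r"Total pairs:\s+(\d+)",
    "combined_pairs": r"Combined pairs:\s+(\d+)",
    "percent_combined": r"Percent combined:\s+([\d.]+)%",
}


def extract_metrics(text, rules):
    """Return a dict of numeric QC variables found in the log text."""
    metrics = {}
    for name, pattern in rules.items():
        match = re.search(pattern, text)
        if match:  # variables that are absent from the log are skipped
            metrics[name] = float(match.group(1))
    return metrics


metrics = extract_metrics(LOG_TEXT, RULES)
print(metrics)
```

Keeping the patterns in a table separate from the extraction loop is one way to let non-programmers adjust what is mined without touching the code.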
Abstract
RCQC Recipes
QC Ontology
Using the JSON-LD format’s metadata feature, RCQC can link particular
QC report terms to their standardized ontology counterparts. Creating a
controlled vocabulary for QC enables reports from disparate genomic
pipelines to be compared, which should eventually lead to a set of
pipeline metrics for accrediting commercial, government and open source
software. Within the context of the OBO Foundry of ontologies, we are
introducing an ontology called GenEpiO (currently available at
https://github.com/Public-Health-Bioinformatics/irida_ontology), which
holds QC terms like “genome size ratio”, “contig count”, etc. Using the
Protégé ontology editor, it is easy to see the definitions for these terms.
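The linking of report terms to ontology terms can be pictured with a small Python sketch that adds a JSON-LD `@context` block to a plain metrics dictionary. The IRIs below are placeholders under `example.org`, since the actual GenEpiO identifiers are not given on this poster, and `to_json_ld` is a hypothetical helper rather than an RCQC function.

```python
import json

# Placeholder IRIs: the real GenEpiO term identifiers are not given here.
CONTEXT = {
    "genome_size_ratio": "http://example.org/genepio/genome_size_ratio",
    "contig_count": "http://example.org/genepio/contig_count",
}


def to_json_ld(metrics, context):
    """Wrap plain QC metrics in a JSON-LD document with an @context block."""
    # Only terms that have an ontology mapping are listed in the context.
    used = {key: context[key] for key in metrics if key in context}
    return {"@context": used, **metrics}


report = to_json_ld({"contig_count": 112, "genome_size_ratio": 0.97}, CONTEXT)
print(json.dumps(report, indent=2))
```

Because the mapping lives in `@context`, software that ignores JSON-LD can still read the report as ordinary JSON.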
Acknowledgements
IRIDA project funding is provided by Genome Canada, Genome BC, and
the Genomics R&D Initiative (GRDI) with additional support from Simon
Fraser University and Cystic Fibrosis Canada. We thank additional
project advisors for constructive comments.
We have started a library of simple “recipe” scripts that extract quality
control (QC) data from various reports like FastQC, QUAST, CheckM and
SPAdes into the popular and software-friendly JSON format (an
auto-generated HTML version of the same content is also available). One can
override sections of an RCQC recipe with settings that test variations in a
pipeline job. An example RCQC text-mining script and its output HTML and
JSON reports are shown below, along with typical report files from other
pipeline tools.
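One way to think about overriding sections of a recipe is as a merge of per-job settings over recipe defaults. The following Python sketch shows that idea only; the section names, thresholds and the `apply_overrides` helper are invented for illustration and do not reflect how RCQC stores or overrides recipe sections.

```python
import copy
import json

# Hypothetical recipe settings, grouped into named sections.
DEFAULT_RECIPE = {
    "assembly": {"min_contig_length": 500, "max_contig_count": 300},
    "reads": {"min_percent_combined": 70.0},
}


def apply_overrides(recipe, overrides):
    """Return a copy of the recipe with per-section settings replaced."""
    merged = copy.deepcopy(recipe)  # leave the shared defaults untouched
    for section, settings in overrides.items():
        merged.setdefault(section, {}).update(settings)
    return merged


# Try a stricter contig-count limit for one pipeline job.
job_recipe = apply_overrides(DEFAULT_RECIPE, {"assembly": {"max_contig_count": 200}})
print(json.dumps(job_recipe, indent=2, sort_keys=True))
```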
A Scripting Language For Standardized Evaluation Of Quality
Metrics In Galaxy And Command-line Driven Workflows
Damion M. Dooley¹; Aaron J. Petkau²; Franklin Bristow²;
Gary Van Domselaar²; William W.L. Hsiao³
¹ Department of Pathology, University of British Columbia
² National Microbiology Laboratory, Public Health Agency of Canada
³ Department of Pathology, University of British Columbia & BC Public
Health Microbiology and Reference Laboratory
This work stemmed from the plan to enhance QC reporting on the
web-based Integrated Rapid Infectious Disease Analysis (www.IRIDA.ca)
project, which manages sequence libraries and pipelines for food-borne
pathogen assembly, annotation, SNP detection, and phylogenetic
analysis. RCQC has been developed to work as a command-line Python
app, but in addition, since IRIDA uses Galaxy to execute its pipeline, we
have a Galaxy RCQC tool for “pro” users to develop recipes. We will be
offering a basic version of this tool that allows users without programming
skills to adjust key QC parameters only.
Recipes can include conditionals that trigger a halt to a pipeline by
sending the appropriate signal (exit code). More than one RCQC recipe
can be run in a pipeline, and their report output can be daisy chained in
order to contribute to a single collective report. QC metric conditionals
shown below can either signal a possible error situation (the “fail(qc…)”
call), or even call a halt to futile pipeline work (via “fail(job …)”).
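The distinction between a QC-level failure and a job-level failure, and the daisy-chaining of reports, can be sketched in Python as follows. This is not RCQC recipe syntax: the thresholds, the metric names, the exit-code values and the `evaluate` and `chain` helpers are assumptions chosen for this example.

```python
EXIT_OK = 0
EXIT_JOB_FAILED = 1  # assumed convention: a non-zero code stops the pipeline


def evaluate(metrics):
    """Apply illustrative rules and return (report, exit_code)."""
    report = {"metrics": dict(metrics), "qc_failures": [], "job_failed": False}

    # QC-level rule: record a possible problem but let the pipeline continue.
    if metrics["contig_count"] > 300:
        report["qc_failures"].append("contig count above 300")

    # Job-level rule: further work would be futile, so request a halt.
    if metrics["total_length"] == 0:
        report["job_failed"] = True
        return report, EXIT_JOB_FAILED

    return report, EXIT_OK


def chain(reports):
    """Combine reports from several recipes into one collective report."""
    combined = {"qc_failures": [], "job_failed": False}
    for item in reports:
        combined["qc_failures"].extend(item["qc_failures"])
        combined["job_failed"] = combined["job_failed"] or item["job_failed"]
    return combined


report, code = evaluate({"contig_count": 412, "total_length": 4_900_000})
print(report["qc_failures"], code)
```

A command-line wrapper would typically pass the returned code to `sys.exit` so that the calling shell script or workflow engine sees the status.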
Implementation
In developing a scripting language to do this work, we did not want to
reinvent the wheel (in fact, RCQC offers up for reuse all of Python’s
built-in math and operator functions). We did, however, need a flexible
mechanism for adjusting parameters and formulae for pipeline operation,
one that did not require recompilation after each user-driven change. As
a result, the RCQC system provides a more transparent rule set that
reduces the skill needed to make process adjustments. Standard assembly
pipeline QC metrics are introduced which provide a blueprint for the way
QC components could be shared amongst NGS pipelines.
Further information, including source code, is available at
https://github.com/Public-Health-Bioinformatics/rcqc.
[Figure panels: Protégé ontology editor view of GenEpiO assembly quality
control terms; RCQC recipe for text-mining flash.log; FLASH, FastQC and
CheckM report files; JSON-LD and HTML report output]
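The reuse of Python's built-in math and operator functions mentioned under Implementation could, in principle, work by exposing those modules' functions by name to the rule interpreter. The sketch below shows one such lookup table; it is an assumption about the general approach, not RCQC's actual code, and `build_function_table` and `call` are hypothetical names.

```python
import math
import operator


def build_function_table():
    """Collect public callables from the math and operator modules by name."""
    table = {}
    for module in (math, operator):
        for name in dir(module):
            fn = getattr(module, name)
            if callable(fn) and not name.startswith("_"):
                table[name] = fn
    return table


FUNCTIONS = build_function_table()


def call(name, *args):
    """Look up a function by the name used in a rule and apply it."""
    return FUNCTIONS[name](*args)


# Example rule arithmetic: ratio of assembled to expected genome size.
ratio = call("truediv", 4_900_000, 5_000_000)
print(ratio, call("lt", ratio, 0.9), call("floor", 3.7))
```

Because rules refer to functions by name, a changed threshold or formula only needs an edited script, with no recompilation step.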