QBI's Centre for Brain Genomics: the informatics side of things
September 8, 2011
Objectives of QBI's Centre for Brain Genomics: on-time delivery, reliable data production, convincing data and easy delivery.
Perkel JM. Coding your way out of a problem. Nat Methods. 2011 Jun. PMID: 21716280.
Bird's-eye view of the facility's workflow (diagram)
Detailed workflow (diagram): on the HiSeq side, flowcell → cBot → HiSeq; on the cluster, CASAVA → raw sequence reads → 30 different programs → projects.
Overview of the Production Informatics framework (diagram)
Processing (automatic, //clusterstorage): Run/Data/ → MakeFastq.sh → Unaligned/ → trigger.sh armed → bwa/, reCalAln/, variant/ → trigger.sh html → Summary.html
Evaluation (manual, //cluster-vm): Apache, IGV, R, UCSC
trigger.sh: keeps data separate from scripts; automates verification, quality control and summary HTML generation; lets the pipeline be rerun from any point.
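A minimal sketch of how a trigger.sh-style entry point could keep the project config apart from generic scripts and restart the pipeline at any step. It is not the facility's script: the step list (borrowed from the task and directory names on these slides), the optional restart argument and the stub `run_step` function are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only, not the facility's trigger.sh.
# Assumptions: the pipeline is an ordered list of steps (names borrowed from
# the task and directory names on these slides), and an optional third
# argument names the step to restart from. run_step is a hypothetical stub.
STEPS=(fastQC bwa reCalAln variant)

run_step() {
  # Stub: the real pipeline would submit a cluster job for the step here.
  echo "running $1 with $2"
}

trigger() {
  local config=$1 mode=$2 from=${3:-${STEPS[0]}} started=0 step
  case "$mode" in
    armed)
      for step in "${STEPS[@]}"; do
        # Steps before the restart point are skipped; later ones all run.
        [[ "$step" == "$from" ]] && started=1
        if (( started )); then
          run_step "$step" "$config"
        fi
      done
      if (( ! started )); then
        echo "unknown step: $from" >&2
        return 1
      fi
      ;;
    html)
      # Stub for the reporting pass that builds Summary.html.
      echo "building Summary.html for $config"
      ;;
    *)
      echo "usage: trigger <config> armed|html [start-step]" >&2
      return 1
      ;;
  esac
}
```

For example, `trigger config.txt armed reCalAln` would rerun recalibration and variant calling in this sketch without repeating FASTQC and mapping.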
Flexible generic names: the script header
    # Programs
    BWA="/clusterdata/hiseq_apps/bin/$MODE/bwa"
    SAMTOOLS="/clusterdata/hiseq_apps/bin/$MODE/samtools"
    IGVTOOLS="/clusterdata/hiseq_apps/bin/$MODE/igvtools/IGVTools/igvtools.jar"
    # Task names
    TASKFASTQC="fastQC"
    TASKBWA="bwa"
    TASKRCA="reCalAln"
    # File name abbreviations
    READONE="read1"
    READTWO="read2"
    FASTQ="fastq.gz"
    ALN="aln" # aligned
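The point of the header is that scripts never hard-code tool paths or file suffixes. A small sketch of names derived only from the header variables, using bash pattern substitution (`${var/pattern/replacement}`); the default `MODE` value and the helper function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: names derived from the header variables.
# MODE is not defined on the slide; "x86_64" is a hypothetical default.
MODE=${MODE:-x86_64}
BWA="/clusterdata/hiseq_apps/bin/$MODE/bwa"   # tool path switches with MODE
TASKBWA="bwa"
TASKRCA="reCalAln"
READONE="read1"
READTWO="read2"
FASTQ="fastq.gz"

# Hypothetical helpers built only from the generic names above.
task_dir() { echo "$1/$2"; }                   # <project>/<task name>
mate_of()  { echo "${1/$READONE/$READTWO}"; }  # read-1 file -> read-2 file
sai_of()   { echo "${1/$FASTQ/sai}"; }         # x.fastq.gz -> x.sai
```

With this layout, renaming the recalibration task would only require changing `TASKRCA` in the header.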
config.txt: per-project settings
    #********************
    # Tasks: specifies what to do, e.g. mapping and recalibration
    #********************
    mappingBWA="1"
    recalibrateQualScore="1"
    #********************
    # Paths: specifies where to find resources
    #********************
    FASTA="/clusterdata/resources/hg19/hg19.fasta"
    SEQREG="chr1:229994688-230071581"
    DBSNP="/clusterdata/resources/hg19/snpdb132.vcf"
    #********************
    # Parameters: customizes the standard scripts for this project
    #********************
    LIBRARY="QBI"
    ADDPARAMBWA="--force single"
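Because config.txt is plain `KEY="value"` shell syntax, a script can source it and check it before submitting anything. A sketch under that assumption; the function names and the choice of checks are illustrative, not taken from the slides.

```shell
#!/usr/bin/env bash
# Sketch: load and sanity-check a project config.txt before submitting jobs.
# Assumption: config.txt uses plain KEY="value" shell syntax, as on the slide,
# so it can be sourced. Function names are hypothetical.
load_config() {
  # shellcheck source=/dev/null
  source "$1"
}

enabled_tasks() {
  # A task runs only if its flag is "1"; unset flags count as off.
  local flag
  for flag in mappingBWA recalibrateQualScore; do
    if [[ "${!flag:-0}" == "1" ]]; then
      echo "$flag"
    fi
  done
}

missing_resources() {
  # Print the name of every resource variable whose file does not exist.
  local var
  for var in FASTA DBSNP; do
    if [[ ! -e "${!var:-}" ]]; then
      echo "$var"
    fi
  done
}

valid_region() {
  # Expect chrom:start-end, e.g. chr1:229994688-230071581.
  [[ "${SEQREG:-}" =~ ^[A-Za-z0-9_]+:[0-9]+-[0-9]+$ ]]
}
```

Sourcing also means a single stray quote, such as an unterminated string on the SEQREG line, makes the whole file fail to load, which is one reason to check the config before launching jobs.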
Calling trigger.sh with the project config, first to run the pipeline and then to build the report:
    trigger.sh config.txt armed
    trigger.sh config.txt html
Input: s_1_read1.fastq, s_1_read2.fastq, s_2_read1.fastq, s_2_read2.fastq, s_3_read1.fastq, s_3_read2.fastq, s_4_read1.fastq, s_4_read2.fastq
Output: s_1.bam, s_2.bam, s_3.bam, s_4.bam and s_1.ashrr.bam, s_2.ashrr.bam, s_3.ashrr.bam, s_4.ashrr.bam
Job logs: Sub1_s_1.out, Sub1_s_2.out, Sub2_s_3.out, Sub2_s_4.out
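The file listing above follows an `s_<lane>_read1/read2` naming scheme. A sketch of pairing mates per lane and deriving the BAM name from that scheme; the function name is hypothetical, and the .ashrr.bam outputs are left out because the slide does not say how they are produced.

```shell
#!/usr/bin/env bash
# Sketch: pair read-1/read-2 FASTQ files per lane and derive the BAM name,
# following the s_<lane>_read1.fastq naming on the slide.
# The function name pair_lanes is hypothetical.
pair_lanes() {
  local dir=$1 f mate
  shopt -s nullglob                  # no matches -> empty loop, not a literal glob
  for f in "$dir"/s_*_read1.fastq; do
    mate=${f/_read1.fastq/_read2.fastq}
    if [[ -e "$mate" ]]; then
      # s_1_read1.fastq s_1_read2.fastq -> s_1.bam
      echo "$(basename "$f") $(basename "$mate") $(basename "${f/_read1.fastq/.bam}")"
    else
      echo "missing mate for $(basename "$f")" >&2
    fi
  done
  shopt -u nullglob
}
```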
Summary.html project cards: sequence statistics, run checkpoints, data visualization, mapping stats, download, interesting regions.
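A sketch of a summary-page generator that emits one section per project-card heading listed above. The HTML layout and the `write_summary` name are assumptions; a real report would fill each section with statistics, plots and links.

```shell
#!/usr/bin/env bash
# Sketch: write a Summary.html skeleton with one section per project-card
# heading from the slide. Layout and function name are assumptions.
write_summary() {
  local out=$1 project=$2 section
  {
    echo "<!DOCTYPE html>"
    # The project name is inserted unescaped; fine for simple names only.
    echo "<html><head><title>${project} summary</title></head><body>"
    echo "<h1>${project}</h1>"
    for section in "Sequence statistics" "Run checkpoints" \
                   "Data visualization" "Mapping stats" \
                   "Download" "Interesting regions"; do
      echo "<h2>${section}</h2>"
    done
    echo "</body></html>"
  } > "$out"
}
```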
Scaffold of pbsScript.sh: error catching
Code example for declaring which errors to look out for:
    # QCVARIABLES, loosing reads, unmapped read,no such file,file not found,bwa.sh: line
Output in Summary.html:
    >>>>>>>>>> Errors
    QC_PASS .. 0 have We are loosing reads/184
    QC_PASS .. 0 have for unmapped read/184
    QC_PASS .. 0 have no such file/184
    QC_PASS .. 0 have file not found/184
    QC_PASS .. 0 have bwa.sh: line/184
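One way the error counts could be produced: split the comma-separated pattern list, then count how many job logs mention each pattern with `grep -l` (one hit per file). This is a sketch; the `QCVARIABLES` format, the .out log location and the QC_FAIL label for non-zero counts are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: tally error patterns across job logs for Summary.html.
# Assumptions (not from the slides): QCVARIABLES is a plain comma-separated
# list, each job writes one *.out log, and any hit is reported as QC_FAIL.
QCVARIABLES="loosing reads,unmapped read,no such file,file not found,bwa.sh: line"

report_errors() {
  local dir=$1 pattern hits total status
  local -a patterns logs
  shopt -s nullglob
  logs=("$dir"/*.out)
  shopt -u nullglob
  total=${#logs[@]}
  # Split on commas only, so patterns may contain spaces.
  IFS=',' read -r -a patterns <<< "$QCVARIABLES"
  echo ">>>>>>>>>> Errors"
  for pattern in "${patterns[@]}"; do
    hits=0
    if (( total > 0 )); then
      # -l: count each log at most once; -F: match the pattern literally.
      hits=$(grep -lF -- "$pattern" "${logs[@]}" | wc -l)
    fi
    status=QC_PASS
    (( hits > 0 )) && status=QC_FAIL
    echo "$status .. $((hits)) have $pattern/$total"
  done
}
```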
Scaffold of pbsScript.sh: checkpoints
Submission:
    qsub -by -jy [PBSOPTIONS] pbsScript.sh -k HISEQINF [PARAMETERS]
Code example for setting up checkpoints in pbsScript.sh:
    echo "********* mapping"
    # Read 1: $f is the read-1 FASTQ; $n is presumably its file name
    "$BWA" aln -t "$THREADS" "$FASTA" "$f" > "$OUT/${n/$FASTQ/sai}"
    # Read 2: the mate is found by replacing read1 with read2 in the path
    "$BWA" aln -t "$THREADS" "$FASTA" "${f/$READONE/$READTWO}" > "$OUT/${n/$READONE.$FASTQ/$READTWO.sai}"
Output in Summary.html:
    >>>>>>>>>> CheckPoints
    QC_PASS .. 184 have mapping/184
    QC_PASS .. 184 have sorting and bam-conversion/184
    QC_PASS .. 184 have mark duplicates/184
    QC_PASS .. 184 have statistics/184
    QC_PASS .. 184 have coverage track/184
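The checkpoint tally works the other way round from the error tally: a step passes only when every job log contains its marker line. A sketch, assuming each job prints `********* <step>` as on the slide and writes one .out log; the function names and the QC_FAIL label are hypothetical, while the checkpoint names come from the summary output above.

```shell
#!/usr/bin/env bash
# Sketch: checkpoint markers in a job script and their tally in the summary.
# Assumptions: each job echoes "********* <step>" when it reaches a step and
# writes one *.out log; function names are hypothetical.
CHECKPOINTS=("mapping" "sorting and bam-conversion" "mark duplicates" "statistics" "coverage track")

checkpoint() { echo "********* $1"; }

report_checkpoints() {
  local dir=$1 step hits total status
  local -a logs
  shopt -s nullglob
  logs=("$dir"/*.out)
  shopt -u nullglob
  total=${#logs[@]}
  echo ">>>>>>>>>> CheckPoints"
  for step in "${CHECKPOINTS[@]}"; do
    hits=0
    if (( total > 0 )); then
      # -x: the whole line must be the marker; -F: no regex interpretation.
      hits=$(grep -lxF -- "********* $step" "${logs[@]}" | wc -l)
    fi
    # A checkpoint passes only if every job reached it.
    status=QC_FAIL
    (( hits == total )) && status=QC_PASS
    echo "$status .. $((hits)) have $step/$total"
  done
}
```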
Availability tailored to skill level: (1) website, (2) RStudio, (3) command line.
The big picture: covering all aspects of design*, set-up*, maintenance* and usage (*except the cluster).
Documentation lives on the project server (//project). Numbers: 5 TB raw data, 750 GB processed data, 57 GB external data; 10 projects, 6 HiSeq runs, 7 project cards; 40 wiki pages, 250 tasks, 551 h logged; 160 commits; 35 external programs and 41 custom scripts (4,197 lines of code).
Diagram components: HiSeq output, raw data, quality control, data processing and analysis, processed data, project cards, statistical analysis (RStudio), visualization (IGV, genome browser, Galaxy), hypothesis generation; backup/version control (rsync); data warehousing; software (BWA, GATK, samtools, etc., plus custom scripts under version control); external genomic resources (genomes, annotation, etc.); hosts: cluster, project server, //cluster-vm, //clusterstorage, //groupshare, //ethan.
Things to remember
Reliable data production: all projects share a similar structure and are processed the same way.
Convincing data: every step is tightly quality controlled, and the QC report is accessible.
Easy delivery: data availability is tailored to skill level (webpage, RStudio, console).
On-time delivery: production informatics has priority on the cluster.
Next week's NGS discussion group: methylation analysis, with Kevin Dudley and Danay Baker-Andresen.

Más contenido relacionado

Destacado

Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Reid Robison
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...Denis C. Bauer
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedSri Ambati
 
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Wesley De Neve
 
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive
 

Destacado (6)

Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
 
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
 
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
 
Machine learning
Machine learningMachine learning
Machine learning
 

Similar a Qbi Centre for Brain genomics (Informatics side)

Django - Python MVC Framework
Django - Python MVC FrameworkDjango - Python MVC Framework
Django - Python MVC FrameworkBala Kumar
 
Jboss World 2011 Infinispan
Jboss World 2011 InfinispanJboss World 2011 Infinispan
Jboss World 2011 Infinispancbo_
 
Shibboleth 2.0 SP slides - Installfest
Shibboleth 2.0 SP slides - InstallfestShibboleth 2.0 SP slides - Installfest
Shibboleth 2.0 SP slides - InstallfestJISC.AM
 
Intro To Mvc Development In Php
Intro To Mvc Development In PhpIntro To Mvc Development In Php
Intro To Mvc Development In Phpfunkatron
 
Instrumentación de entrega continua con Gitlab
Instrumentación de entrega continua con GitlabInstrumentación de entrega continua con Gitlab
Instrumentación de entrega continua con GitlabSoftware Guru
 
Deploy Rails Application by Capistrano
Deploy Rails Application by CapistranoDeploy Rails Application by Capistrano
Deploy Rails Application by CapistranoTasawr Interactive
 
Scaling up development of a modular code base
Scaling up development of a modular code baseScaling up development of a modular code base
Scaling up development of a modular code baseRobert Munteanu
 
Buying a Ferrari for your teenager? You may want to think twice
Buying a Ferrari for your teenager? You may want to think twiceBuying a Ferrari for your teenager? You may want to think twice
Buying a Ferrari for your teenager? You may want to think twiceAl Zindiq
 
Internet Explorer 8 for Developers by Christian Thilmany
Internet Explorer 8 for Developers by Christian ThilmanyInternet Explorer 8 for Developers by Christian Thilmany
Internet Explorer 8 for Developers by Christian ThilmanyChristian Thilmany
 
Workshop quality assurance for php projects - phpbelfast
Workshop quality assurance for php projects - phpbelfastWorkshop quality assurance for php projects - phpbelfast
Workshop quality assurance for php projects - phpbelfastMichelangelo van Dam
 
Create a web-app with Cgi Appplication
Create a web-app with Cgi AppplicationCreate a web-app with Cgi Appplication
Create a web-app with Cgi Appplicationolegmmiller
 
Service Oriented Integration With ServiceMix
Service Oriented Integration With ServiceMixService Oriented Integration With ServiceMix
Service Oriented Integration With ServiceMixBruce Snyder
 
Introduction To ASP.NET MVC
Introduction To ASP.NET MVCIntroduction To ASP.NET MVC
Introduction To ASP.NET MVCAlan Dean
 
Workshop quality assurance for php projects - ZendCon 2013
Workshop quality assurance for php projects - ZendCon 2013Workshop quality assurance for php projects - ZendCon 2013
Workshop quality assurance for php projects - ZendCon 2013Michelangelo van Dam
 
Augustus Overview Open Source Analytics
Augustus Overview  Open Source AnalyticsAugustus Overview  Open Source Analytics
Augustus Overview Open Source Analyticsjtrussell
 
Front End Website Optimization
Front End Website OptimizationFront End Website Optimization
Front End Website OptimizationGerard Sychay
 
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020data://disrupted®
 

Similar a Qbi Centre for Brain genomics (Informatics side) (20)

Django - Python MVC Framework
Django - Python MVC FrameworkDjango - Python MVC Framework
Django - Python MVC Framework
 
Jboss World 2011 Infinispan
Jboss World 2011 InfinispanJboss World 2011 Infinispan
Jboss World 2011 Infinispan
 
Shibboleth 2.0 SP slides - Installfest
Shibboleth 2.0 SP slides - InstallfestShibboleth 2.0 SP slides - Installfest
Shibboleth 2.0 SP slides - Installfest
 
Bioinformatica 10-11-2011-p6-bioperl
Bioinformatica 10-11-2011-p6-bioperlBioinformatica 10-11-2011-p6-bioperl
Bioinformatica 10-11-2011-p6-bioperl
 
Intro To Mvc Development In Php
Intro To Mvc Development In PhpIntro To Mvc Development In Php
Intro To Mvc Development In Php
 
Instrumentación de entrega continua con Gitlab
Instrumentación de entrega continua con GitlabInstrumentación de entrega continua con Gitlab
Instrumentación de entrega continua con Gitlab
 
Deploy Rails Application by Capistrano
Deploy Rails Application by CapistranoDeploy Rails Application by Capistrano
Deploy Rails Application by Capistrano
 
Scaling up development of a modular code base
Scaling up development of a modular code baseScaling up development of a modular code base
Scaling up development of a modular code base
 
Buying a Ferrari for your teenager? You may want to think twice
Buying a Ferrari for your teenager? You may want to think twiceBuying a Ferrari for your teenager? You may want to think twice
Buying a Ferrari for your teenager? You may want to think twice
 
Internet Explorer 8 for Developers by Christian Thilmany
Internet Explorer 8 for Developers by Christian ThilmanyInternet Explorer 8 for Developers by Christian Thilmany
Internet Explorer 8 for Developers by Christian Thilmany
 
Workshop quality assurance for php projects - phpbelfast
Workshop quality assurance for php projects - phpbelfastWorkshop quality assurance for php projects - phpbelfast
Workshop quality assurance for php projects - phpbelfast
 
Create a web-app with Cgi Appplication
Create a web-app with Cgi AppplicationCreate a web-app with Cgi Appplication
Create a web-app with Cgi Appplication
 
Service Oriented Integration With ServiceMix
Service Oriented Integration With ServiceMixService Oriented Integration With ServiceMix
Service Oriented Integration With ServiceMix
 
Beyond Unit Testing
Beyond Unit TestingBeyond Unit Testing
Beyond Unit Testing
 
Introduction To ASP.NET MVC
Introduction To ASP.NET MVCIntroduction To ASP.NET MVC
Introduction To ASP.NET MVC
 
Workshop quality assurance for php projects - ZendCon 2013
Workshop quality assurance for php projects - ZendCon 2013Workshop quality assurance for php projects - ZendCon 2013
Workshop quality assurance for php projects - ZendCon 2013
 
Augustus Overview Open Source Analytics
Augustus Overview  Open Source AnalyticsAugustus Overview  Open Source Analytics
Augustus Overview Open Source Analytics
 
Php frameworks
Php frameworksPhp frameworks
Php frameworks
 
Front End Website Optimization
Front End Website OptimizationFront End Website Optimization
Front End Website Optimization
 
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
 

Más de Denis C. Bauer

Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Denis C. Bauer
 
Translating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteTranslating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteDenis C. Bauer
 
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataGoing Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataDenis C. Bauer
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
Population-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisPopulation-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisDenis C. Bauer
 
Allelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingAllelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingDenis C. Bauer
 
Centralizing sequence analysis
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysisDenis C. Bauer
 
Differential gene expression
Differential gene expressionDifferential gene expression
Differential gene expressionDenis C. Bauer
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseqDenis C. Bauer
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variantsDenis C. Bauer
 
Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Denis C. Bauer
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Denis C. Bauer
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencingDenis C. Bauer
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to BioinformaticsDenis C. Bauer
 
The missing data issue for HiSeq runs
The missing data issue for HiSeq runsThe missing data issue for HiSeq runs
The missing data issue for HiSeq runsDenis C. Bauer
 
Deciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDeciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDenis C. Bauer
 
STAR: Recombination site prediction
STAR: Recombination site predictionSTAR: Recombination site prediction
STAR: Recombination site predictionDenis C. Bauer
 

Más de Denis C. Bauer (20)

Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research
 
Translating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteTranslating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynote
 
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataGoing Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
Population-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisPopulation-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysis
 
Trip Report Seattle
Trip Report SeattleTrip Report Seattle
Trip Report Seattle
 
Allelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingAllelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome Sequencing
 
Centralizing sequence analysis
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysis
 
Differential gene expression
Differential gene expressionDifferential gene expression
Differential gene expression
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseq
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variants
 
Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencing
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
The missing data issue for HiSeq runs
The missing data issue for HiSeq runsThe missing data issue for HiSeq runs
The missing data issue for HiSeq runs
 
Deciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDeciphering the regulatory code in the genome
Deciphering the regulatory code in the genome
 
ReliF
ReliFReliF
ReliF
 
STAR: Recombination site prediction
STAR: Recombination site predictionSTAR: Recombination site prediction
STAR: Recombination site prediction
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Último (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

QBI Centre for Brain Genomics (Informatics Side)

  • 1. QBI’s Centre for Brain Genomics: The informatics side of things [Sprengben [why not get a friend]]. September 8, 2011
  • 2. Objectives of QBI’s Centre for Brain Genomics: on-time delivery, reliable data production, convincing data, easy delivery. Reference: Perkel JM. Coding your way out of a problem. Nat Methods. 2011 Jun. PMID: 21716280.
  • 3. Bird’s-eye view of the facility’s workflow (diagram)
  • 4. Detailed workflow (diagram): cBot, HiSeq, CASAVA, raw sequence reads, flowcell, projects, HiSeq cluster; 30 different programs.
  • 5. Overview of the production informatics framework (diagram): automatic processing and manual evaluation. Run/ and Data/ are converted by MakeFastq.sh into Unaligned/; trigger.sh armed produces bwa/, reCaAl/ and variant/; trigger.sh html produces Summary.html. Storage: //clusterstorage; evaluation tools (Apache, IGV, R, UCSC) on //cluster-vm.
  • 6. trigger.sh: keeps data separate from scripts; automates verification, quality control and generation of the summary HTML; allows the pipeline to be rerun from any point.
  • 7. Flexible generic names (header). Programs: BWA="/clusterdata/hiseq_apps/bin/$MODE/bwa" SAMTOOLS="/clusterdata/hiseq_apps/bin/$MODE/samtools" IGVTOOLS="/clusterdata/hiseq_apps/bin/$MODE/igvtools/IGVTools/igvtools.jar". Task names: TASKFASTQC="fastQC" TASKBWA="bwa" TASKRCA="reCalAln". File abbreviations: READONE="read1" READTWO="read2" FASTQ="fastq.gz" ALN="aln" (aligned).
  • 8. Config.txt. Tasks (specifies what to do, e.g. mapping and recalibration): mappingBWA="1" recalibrateQualScore="1". Paths (specifies where to find resources): FASTA="/clusterdata/resources/hg19/hg19.fasta" SEQREG="chr1:229994688-230071581" DBSNP="/clusterdata/resources/hg19/snpdb132.vcf". Parameters (customize the standard scripts for this project): LIBRARY="QBI" ADDPARAMBWA="--force single".
  • 9. Calls: trigger.sh config.txt armed and trigger.sh config.txt html. Files (diagram): paired inputs s_1_read1.fastq/s_1_read2.fastq, s_2_read1.fastq/s_2_read2.fastq, s_3_read1.fastq/s_3_read2.fastq, s_4_read1.fastq/s_4_read2.fastq; alignments s_1.bam, s_2.bam, s_3.bam, s_4.bam and s_1.ashrr.bam, s_2.ashrr.bam, s_3.ashrr.bam, s_4.ashrr.bam; job outputs Sub1_s_1.out, Sub1_s_2.out, Sub2_s_3.out, Sub2_s_4.out.
  • 10. Summary.html project cards: sequence statistics, run checkpoints, data visualization, mapping statistics, download, interesting regions.
  • 11. Scaffold of pbsScripts.sh: error catching. Code example for declaring which errors to look out for: # QCVARIABLES, loosing reads, unmapped read, no such file, file not found, bwa.sh: line. Output in Summary.html: >>>>>>>>>> Errors: QC_PASS .. 0 have We are loosing reads/184; QC_PASS .. 0 have for unmapped read/184; QC_PASS .. 0 have no such file/184; QC_PASS .. 0 have file not found/184; QC_PASS .. 0 have bwa.sh: line/184.
  • 12. Scaffold of pbsScripts.sh: checkpoints. Submission: qsub -by -jy [PBSOPTIONS] pbsScript.sh -k HISEQINF [PARAMETERS]. Code example for setting up checkpoints in pbsScript.sh: echo "********* mapping"; $BWA aln -t $THREADS $FASTA $f > $OUT/${n/$FASTQ/sai}; $BWA aln -t $THREADS $FASTA ${f/$READONE/$READTWO} > $OUT/${n/$READONE.$FASTQ/$READTWO.sai}. Output in Summary.html: >>>>>>>>>> CheckPoints: QC_PASS .. 184 have mapping/184; QC_PASS .. 184 have sorting and bam-conversion/184; QC_PASS .. 184 have mark duplicates/184; QC_PASS .. 184 have statistics/184; QC_PASS .. 184 have coverage track/184.
  • 13. Availability tailored to skill level: (1) website, (2) RStudio, (3) command line.
  • 14. The big picture. Covering all aspects of design*, set-up*, maintenance* and usage (*except the cluster). Documentation: project server (//project). In numbers: 5 TB raw data, 750 GB processed data, 57 GB external data; 7 project cards; 10 projects, 6 HiSeq runs; 40 wiki pages, 250 tasks, 551 h logged; 160 commits; 35 external programs; 41 custom scripts (4197 lines of code). Diagram components: application, backup/version control, data warehousing, statistical analysis; HiSeq output, raw data, quality control, project cards, processed data (rsync), RStudio, hypothesis generation; software (BWA, GATK, samtools, etc.) and custom scripts under version control for data processing and analysis; external genomic resources (genomes, annotation, etc.); cluster; project server content; Galaxy; visualization with IGV and a genome browser; locations //cluster-vm, //clusterstorage, //groupshare, //ethan.
  • 15. Three things to remember. Reliable data production: all projects have a similar structure and are processed in the same way. Convincing data: every step is tightly quality-controlled and the QC report is accessible. Easy delivery: we tailored data availability to skill levels (webpage, RStudio, console). On-time delivery: production informatics has priority on the cluster.
  • 16. Next week’s NGS discussion group: methylation analysis, Kevin Dudley and Danay Baker-Andresen.
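
Slide 6 says trigger.sh can rerun the pipeline from any point. Below is a minimal bash sketch of that idea. It assumes the pipeline is an ordered list of named steps. The step names are borrowed from the task names on slide 7, and `steps_from` is a hypothetical helper, not part of the original scripts.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the original trigger.sh): the pipeline is an ordered
# list of named steps, and a rerun starts at any named step, skipping the
# steps before it. Step names reuse the task names from the header slide.

STEPS=(fastQC bwa reCalAln)

# steps_from [STEP] -> prints, one per line, the steps to run when starting
# at STEP (default: the first step). Returns 1 for an unknown step.
steps_from() {
  local start="${1:-${STEPS[0]}}" found=0 s
  for s in "${STEPS[@]}"; do
    if [[ "$s" == "$start" ]]; then
      found=1
    fi
    if (( found )); then
      echo "$s"
    fi
  done
  if (( ! found )); then
    echo "unknown step: $start" >&2
    return 1
  fi
}
```

For example, `for step in $(steps_from bwa); do echo "running $step"; done` would redo mapping and recalibration but leave the FastQC results untouched.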
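
The header on slide 7 gives every script the same generic names, and `$MODE` selects which build directory under /clusterdata/hiseq_apps/bin supplies the tools. The sketch below wraps that header in a function and adds a check that the selected tools exist. `load_header`, `missing_tools` and mode names such as `test` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the slide-7 header: program paths, task names and
# file abbreviations are defined once; MODE picks the build directory.
# Mode names (e.g. "test") are assumptions.

# load_header MODE -> sets the program paths, task names and file abbreviations
load_header() {
  if [[ -z "${1:-}" ]]; then
    echo "usage: load_header MODE" >&2
    return 1
  fi
  local mode="$1"
  # Programs
  BWA="/clusterdata/hiseq_apps/bin/$mode/bwa"
  SAMTOOLS="/clusterdata/hiseq_apps/bin/$mode/samtools"
  IGVTOOLS="/clusterdata/hiseq_apps/bin/$mode/igvtools/IGVTools/igvtools.jar"
  # Task names
  TASKFASTQC="fastQC"
  TASKBWA="bwa"
  TASKRCA="reCalAln"
  # File abbreviations
  READONE="read1"
  READTWO="read2"
  FASTQ="fastq.gz"
  ALN="aln"  # aligned
}

# missing_tools -> prints each program path from the header that does not exist
missing_tools() {
  local t
  for t in "$BWA" "$SAMTOOLS" "$IGVTOOLS"; do
    [[ -e "$t" ]] || echo "$t"
  done
}
```

Switching a project to another tool build then only means calling `load_header` with a different mode. Running `missing_tools` before submitting jobs catches a wrong mode early.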
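
Slide 8 splits config.txt into task switches, resource paths and project parameters. Because the file consists of plain shell assignments, a generic script can simply source it. The sketch below assumes that, and it also assumes each switch maps to one task name. `enabled_tasks` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of how a project's config.txt (slide 8) could drive the
# generic scripts: the file is sourced, and each task switch set to "1"
# enables one pipeline task. The switch-to-task mapping is an assumption.

# enabled_tasks CONFIG -> prints the enabled task names, one per line
enabled_tasks() {
  local config="$1"
  (
    # Subshell: the project's settings do not leak into the caller.
    # Sourcing runs the file as shell code, so only use trusted configs.
    # shellcheck source=/dev/null
    source "$config" || exit 1
    if [[ "${mappingBWA:-0}" == "1" ]]; then echo "bwa"; fi
    if [[ "${recalibrateQualScore:-0}" == "1" ]]; then echo "reCalAln"; fi
  )
}
```

Keeping the switches in the project file means the scripts themselves never change between projects. Only config.txt records what was run and against which reference files.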
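
The diagram on slide 9 shows each lane's read-1 FASTQ paired with its read-2 mate, one BAM per lane, and lanes batched into submissions (Sub1 holds s_1 and s_2, Sub2 holds s_3 and s_4). The batch size of two is read off that diagram and is an assumption. The .ashrr.bam step is not modelled. `plan_lanes` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the file fan-out on slide 9: each lane's read-1 FASTQ
# has a read-2 mate, produces one BAM, and lanes are batched into submissions
# whose job logs are named Sub<k>_<lane>.out. BATCH=2 is an assumption.

BATCH=2

# plan_lanes READ1_FASTQ... -> "lane read2_fastq bam job_log", one line per lane
plan_lanes() {
  local i=0 f lane
  for f in "$@"; do
    lane=$(basename "$f" _read1.fastq)
    echo "$lane ${f%_read1.fastq}_read2.fastq $lane.bam Sub$(( i / BATCH + 1 ))_$lane.out"
    i=$((i + 1))
  done
}
```

Deriving every name from the read-1 file keeps the mate, the BAM and the job log in lockstep. A missing read-2 file therefore shows up as a single, predictable path.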
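
Slide 10 lists the sections of the Summary.html project card. The sketch below shows one way such a page could be assembled from plain-text QC output. Section titles come from the slide; the HTML markup and the helpers `html_escape`, `card` and `summary_page` are assumptions, not the original page layout.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of Summary.html generation: each project-card section is
# rendered from plain text (e.g. QC_PASS lines) into HTML. The markup is an
# assumption, not the original page layout.

# html_escape: make arbitrary log text safe to embed in HTML
html_escape() { sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'; }

# card TITLE: render stdin as one titled section
card() {
  printf '<h2>%s</h2>\n<pre>\n' "$(printf '%s' "$1" | html_escape)"
  html_escape
  printf '</pre>\n'
}

# summary_page TITLE: wrap already-rendered sections (stdin) in a minimal page
summary_page() {
  printf '<html><head><title>%s</title></head><body>\n' "$1"
  cat
  printf '</body></html>\n'
}
```

For example, piping the checkpoint lines through `card "Run checkpoints"` and then `summary_page "Summary"` yields a page that can be served by Apache. Escaping matters because the QC lines themselves contain `>` characters.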
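
Slide 11 declares a list of error strings and reports, for each string, how many of the 184 job logs contain it. A minimal bash sketch of that counting follows. The strings are spelled as on the slide. Case-insensitive fixed-string matching with `grep -qiF` and the exact report format are assumptions, and `count_pattern` and `error_report` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the error-catching scaffold on slide 11: known error
# strings are checked against every job log, and the report gives, per string,
# how many logs contain it. Matching rules and line format are assumptions.

ERROR_PATTERNS=("loosing reads" "unmapped read" "no such file" "file not found" "bwa.sh: line")

# count_pattern PATTERN LOG... -> "QC_PASS .. <hits> have <PATTERN>/<number of logs>"
count_pattern() {
  local pattern="$1"
  shift
  local total=$# hits=0 f
  for f in "$@"; do
    if grep -qiF -- "$pattern" "$f" 2>/dev/null; then
      hits=$((hits + 1))
    fi
  done
  echo "QC_PASS .. $hits have $pattern/$total"
}

# error_report LOG... -> header line plus one count line per error pattern
error_report() {
  local p
  echo ">>>>>>>>>> Errors"
  for p in "${ERROR_PATTERNS[@]}"; do
    count_pattern "$p" "$@"
  done
}
```

Usage would be along the lines of `error_report Sub*_s_*.out`. For errors the healthy result is 0 hits. `count_pattern` also works for the checkpoint markers on slide 12, where the healthy result is every log.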
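
The checkpoint code on slide 12 echoes a marker before mapping. It then derives the read-2 file and both `.sai` names with bash pattern substitution (`${f/$READONE/$READTWO}`, `${n/$FASTQ/sai}`). The sketch below isolates that naming logic and prints the `bwa aln` commands instead of running them, so it works without bwa installed. The values of `BWA`, `THREADS` and `FASTA` are placeholders, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the checkpoint pattern on slide 12: a marker line is
# echoed into the job log before a stage, so the summary can count how many
# logs reached it. Commands are printed (dry run); BWA/THREADS/FASTA are placeholders.

READONE="read1"
READTWO="read2"
FASTQ="fastq.gz"
BWA="bwa"
THREADS=4
FASTA="hg19.fasta"

# checkpoint NAME -> marker line written to the job log when a stage starts
checkpoint() { echo "********* $1"; }

# mate_of READ1_PATH -> matching read-2 path (first "read1" replaced by "read2")
mate_of() { echo "${1/$READONE/$READTWO}"; }

# sai_names READ1_PATH -> output .sai names for read 1 and read 2
sai_names() {
  local n
  n=$(basename "$1")
  echo "${n/$FASTQ/sai} ${n/$READONE.$FASTQ/$READTWO.sai}"
}

# map_pair READ1_PATH OUTDIR -> prints the checkpoint and both "bwa aln" commands
map_pair() {
  local f="$1" out="$2" sai1 sai2
  read -r sai1 sai2 <<< "$(sai_names "$f")"
  checkpoint "mapping"
  echo "$BWA aln -t $THREADS $FASTA $f > $out/$sai1"
  echo "$BWA aln -t $THREADS $FASTA $(mate_of "$f") > $out/$sai2"
}
```

In a real job script the two echoed commands would be executed, with the variables quoted, for example `"$BWA" aln -t "$THREADS" "$FASTA" "$f" > "$out/$sai1"`. Because the marker is printed first, a log that contains it but lacks later markers points to the stage that failed.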
