IMGS 2011 Bioinformatics Workshop Deanna Church, NCBI Carol Bult, The Jackson Laboratory
Intro
  Sequencing technology: life in the fast lane
  Alignments: things to consider
  File formats: everything you always wanted to know but were afraid to ask
  Tools: pick the right one for the job at hand
Cost and throughput (figure: gigabases sequenced and cost per Kb). Source: Lucinda Fulton, The Genome Center at Washington University
Sequencing Technologies http://www.geospiza.com/finchtalk/uploaded_images/plates-and-slides-718301.png
Sequence “Space”
  Roche 454 – Flow space
    Measures the pyrophosphate released when a nucleotide is added to a growing DNA chain
    Flow space describes the sequence in terms of these base incorporations
    http://www.youtube.com/watch?v=bFNjxKHP8Jc
  AB SOLiD – Color space
    Sequencing by DNA ligation, using synthetic DNA molecules that contain two nested known bases and a fluorescent dye
    Each base is sequenced twice
    http://www.youtube.com/watch?v=nlvyF8bFDwM&feature=related
  Illumina/Solexa – Base space
    Single-base extensions of fluorescently labeled nucleotides with protected 3' OH groups
    Sequencing proceeds in cycles of base addition and detection, followed by deprotection of the 3' OH
    http://www.youtube.com/watch?v=77r5p8IBwJk&feature=related
  GenomeTV – Next Generation Sequencing (lecture)
    http://www.youtube.com/watch?v=g0vGrNjpyA8&feature=related
  http://finchtalk.geospiza.com/2008/03/color-space-flow-space-sequence-space_23.html
Global and local alignments
  Optimal global alignment (Needleman-Wunsch): sequences align essentially from end to end
  Optimal local alignment (Smith-Waterman): sequences align only in small, isolated regions
  References
    Needleman and Wunsch (1970). J. Mol. Biol. 48, 443-453.
    Smith and Waterman (1981). J. Mol. Biol. 147, 195-197.
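To make the global/local distinction concrete, here is a minimal score-only sketch of both dynamic-programming recurrences. The scoring values (match +1, mismatch -1, linear gap -1) and the function names are assumptions chosen for illustration; they are not taken from the slides, and real aligners use substitution matrices and affine gap penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score when a and b are aligned end to end."""
    # Row 0: aligning the empty prefix of a against b costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 lets an alignment restart anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

The only differences between the two are the first row and column (gap penalties versus zeros), the floor at zero, and where the answer is read from (the last cell versus the best cell anywhere).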
Hashing methods
  Word size = 3 (configurable)
  Query sequence: MVRRLPERTSTPACE
  Words: MVR VRR RRL RLP LPE PER ERT RTS TST STP TPA PAC ACE
  References
    Wilbur & Lipman (1983), PNAS 80, 726-30
    Lipman & Pearson (1985), Science 227, 1435-1441
    Pearson & Lipman (1988), PNAS 85, 2444-2448
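The word decomposition on this slide can be sketched in a few lines. The function names `words` and `word_index` are hypothetical; the lookup table is only the first step of a hashing-based search, not a full implementation of the cited methods.

```python
def words(seq, k=3):
    """Return every overlapping word of length k, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_index(seq, k=3):
    """Map each word to the list of 0-based positions where it starts."""
    index = {}
    for pos, word in enumerate(words(seq, k)):
        index.setdefault(word, []).append(pos)
    return index


query = "MVRRLPERTSTPACE"
print(" ".join(words(query)))
```

A search tool can then scan a database sequence word by word and look each word up in the index, instead of comparing every pair of positions.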
http://www.slideshare.net/thomaskeane/eccb-2010-nextgen-sequencing-tutorial
Sensitivity vs. Specificity
  Sensitivity = fraction of actual positives that are correctly identified (true positives, TP)
  Specificity = fraction of actual negatives that are correctly identified (true negatives, TN)
                      Predicted positives   Predicted negatives
  Actual positives    TP                    FN
  Actual negatives    FP                    TN
  Sensitivity = TP/(TP+FN)
  Specificity = TN/(TN+FP)
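As a worked example of the two formulas, the sketch below computes both values from confusion-matrix counts. The counts used in the usage lines are made up for illustration only.

```python
def sensitivity(tp, fn):
    """Fraction of actual positives that were called positive: TP/(TP+FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that were called negative: TN/(TN+FP)."""
    return tn / (tn + fp)


# Hypothetical counts: 90 true positives, 10 false negatives,
# 80 true negatives, 20 false positives.
print(sensitivity(tp=90, fn=10))
print(specificity(tn=80, fp=20))
```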
MHC alternate locus: alignment to chr6 (figure credit: Richa Agarwala)
Tools
  Alignments: BLAST (not for NGS), BWA, Bowtie, Maq, …
  Transcriptomics: TopHat, Cufflinks, …
  Variant calling: ssahaSNP, Mosaic, …
  Counting (ChIP-Seq, etc.): FindPeaks, PeakSeq
Genome Workbench http://www.ncbi.nlm.nih.gov/projects/gbench/
“Standard” File formats
  Sequence containers: FASTA, FASTQ, BAM/SAM
  Alignments: BAM/SAM, MAF
  Annotation: BED, GFF/GTF/GFF3, WIG
  Variation: VCF, GVF
FASTQ: Data Format
  FASTQ
    Text based
    Encodes sequence calls and quality scores with ASCII characters
    Stores minimal information about the sequence read
    4 lines per sequence
      Line 1: begins with "@", followed by the sequence identifier and an optional description
      Line 2: the sequence
      Line 3: begins with "+", optionally followed by the same sequence identifier and description
      Line 4: encoded quality scores for the sequence in line 2
  References/Documentation
    http://maq.sourceforge.net/fastq.shtml
    Cock et al. (2009). Nuc Acids Res 38:1767-1771.
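The four-line layout described above can be read with a short parser. This is a minimal sketch that assumes exactly four lines per record (no wrapped sequence lines); `FastqRecord` and `parse_fastq` are hypothetical names, and the read in the usage example is invented.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # text after "@" on line 1 (identifier plus description)
    sequence: str    # line 2
    quality: str     # line 4, one ASCII character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from an iterable of text lines, four lines per record."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        if not header.startswith("@"):
            raise ValueError("line 1 must begin with '@': " + header)
        if not plus.startswith("+"):
            raise ValueError("line 3 must begin with '+': " + plus)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header)
        yield FastqRecord(header[1:], seq, qual)


# Made-up record for illustration.
demo = ["@read1 demo\n", "GATTACA\n", "+\n", "IIIIIII\n"]
for rec in parse_fastq(demo):
    print(rec.identifier, rec.sequence, rec.quality)
```

The same function works on an open file handle, since a text file iterates line by line.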
FASTQ Example
  For analysis, it may be necessary to convert to the Sanger form of FASTQ. For example, Illumina stores quality scores ranging from 0-62, while Sanger quality scores range from 0-93. Solexa quality scores have to be converted to PHRED quality scores.
  FASTQ example from: Cock et al. (2009). Nuc Acids Res 38:1767-1771.
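The conversions mentioned on this slide can be sketched as follows. The ASCII offsets (33 for Sanger, 64 for Illumina 1.3+ and Solexa) and the Solexa-to-PHRED formula follow the conventions described in Cock et al.; the function names are hypothetical, and the sketch assumes the input really is in the stated encoding.

```python
import math

SANGER_OFFSET = 33    # Sanger FASTQ: PHRED 0-93 stored as ASCII 33-126
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ: PHRED 0-62 stored as ASCII 64-126


def illumina13_to_sanger(quality):
    """Re-encode an Illumina 1.3+ quality string with the Sanger offset."""
    out = []
    for char in quality:
        score = ord(char) - ILLUMINA_OFFSET
        if score < 0:
            raise ValueError("not an Illumina 1.3+ quality character: " + repr(char))
        out.append(chr(score + SANGER_OFFSET))
    return "".join(out)


def solexa_to_phred(q_solexa):
    """Convert one Solexa score to the PHRED scale (result is a float)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


print(illumina13_to_sanger("@h"))     # Q0 and Q40 re-encoded with offset 33
print(round(solexa_to_phred(0), 2))   # Solexa 0 is roughly PHRED 3
```

The two scales agree closely at high quality and diverge at low quality, which is why Solexa scores need the formula rather than a simple offset shift.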
SAM (Sequence Alignment/Map)
  It may not be necessary to align reads from scratch; you can instead use existing alignments in SAM format
  SAM is the output of aligners that map reads to a reference genome
  Tab-delimited, with a header section and an alignment section
  Header lines begin with "@"; the header section is optional
  The alignment section has 11 mandatory fields
  BAM is the binary form of SAM
  http://samtools.sourceforge.net/
Mandatory Alignment Fields http://samtools.sourceforge.net/SAM1.pdf
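As a sketch of how the 11 mandatory fields map onto a tab-delimited alignment line, the parser below splits one line into a dictionary. The field names are those used in the SAM specification (older versions of the spec call RNEXT, PNEXT and TLEN by the names MRNM, MPOS and ISIZE); `parse_sam_line` is a hypothetical helper, and the example line is illustrative, modeled on the style of the specification's examples.

```python
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line):
    """Parse one SAM line; return None for header lines (starting with '@')."""
    if line.startswith("@"):
        return None
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-separated fields")
    record = {name: (int(value) if name in INT_FIELDS else value)
              for name, value in zip(SAM_FIELDS, cols)}
    record["TAGS"] = cols[len(SAM_FIELDS):]  # optional fields, left unparsed
    return record


example = "r001\t163\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
aln = parse_sam_line(example)
print(aln["RNAME"], aln["POS"], aln["CIGAR"])
```

For real data, an established library or samtools itself is the safer choice; this only shows the layout.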
Alignment Examples Alignments in SAM format http://samtools.sourceforge.net/SAM1.pdf
Valid BED files
chr1	86114265	86116346	nsv433165
chr2	1841774	1846089	nsv433166
chr16	2950446	2955264	nsv433167
chr17	14350387	14351933	nsv433168
chr17	32831694	32832761	nsv433169
chr17	32831694	32832761	nsv433170
chr18	61880550	61881930	nsv433171

chr1	16759829	16778548	chr1:21667704	270866	-
chr1	16763194	16784844	chr1:146691804	407277	+
chr1	16763194	16784844	chr1:144004664	408925	-
chr1	16763194	16779513	chr1:142857141	291416	-
chr1	16763194	16779513	chr1:143522082	293473	-
chr1	16763194	16778548	chr1:146844175	284555	-
chr1	16763194	16778548	chr1:147006260	284948	-
chr1	16763411	16784844	chr1:144747517	405362	+
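The rows above can be checked with a small parser. This sketch only validates the first three columns and the strand, and keeps the name and score as raw text; `parse_bed_line` is a hypothetical helper. It relies on the BED convention that coordinates are 0-based and half-open, so the feature length is simply end minus start.

```python
def parse_bed_line(line):
    """Parse one tab-separated BED line with 3 to 6 columns into a dict."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 3:
        raise ValueError("BED needs at least chrom, start and end")
    start, end = int(cols[1]), int(cols[2])
    if start < 0 or start > end:
        raise ValueError("invalid coordinates: %d, %d" % (start, end))
    record = {"chrom": cols[0], "start": start, "end": end,
              "length": end - start}  # 0-based, half-open interval
    if len(cols) > 3:
        record["name"] = cols[3]
    if len(cols) > 4:
        record["score"] = cols[4]  # kept as text, not range-checked here
    if len(cols) > 5:
        if cols[5] not in ("+", "-", "."):
            raise ValueError("strand must be '+', '-' or '.'")
        record["strand"] = cols[5]
    return record


print(parse_bed_line("chr1\t86114265\t86116346\tnsv433165"))
print(parse_bed_line("chr1\t16759829\t16778548\tchr1:21667704\t270866\t-"))
```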
Mouse chrX: 35,000,000-36,000,000
Mouse chrX: 35,000,000-36,000,000 X MGSCv3 Build 36
NC_000086.6
GRCh37 hg19 Zv7 danRer5 MGSCv37 mm8 NCBIM37
Assemblies with the same name aren’t always the same chr21:8,913,216-9,246,964
Assemblies with the same name aren’t always the same Zv7 chr21:8,913,216-9,246,964 X Mouse Build 36 chrX
GRCh37 hg19 GCA_000001405.1
Tutorial Web Site http://www.ncbi.nlm.nih.gov/staff/church/GenomeAnalysis/index.shtml This site will be accessible after the meeting. Check back for updates and new tutorials.
RNA-Seq Workflow
  Convert data to FASTQ
  Upload files to Galaxy
  Quality control
    Throw out low-quality sequence reads, etc.
  Map reads to a reference genome
    Many algorithms available
    Trade-off between speed and sensitivity
  Data summarization
    Associating alignments with genome annotations
    Counts
  Data visualization
  Statistical analysis
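The quality-control step can be illustrated with a simple mean-quality filter. This is one possible criterion, not the workshop's method: the threshold of 20 and the function names are assumptions, and the sketch expects Sanger-encoded quality strings (ASCII offset 33).

```python
def mean_phred(quality, offset=33):
    """Average PHRED score of a quality string (Sanger offset by default)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def passes_qc(quality, min_mean=20):
    """Keep a read only if it is non-empty and its mean quality is high enough."""
    return bool(quality) and mean_phred(quality) >= min_mean


# Made-up quality strings: "I" encodes Q40 and "#" encodes Q2.
for qual in ("IIIIIIII", "########"):
    print(qual, passes_qc(qual))
```

In practice, tools available within Galaxy perform this kind of filtering, often together with trimming of low-quality read ends.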
Typical RNA-Seq Project Work Flow
  Tissue sample → Total RNA → mRNA → cDNA → Sequencing → FASTQ file → QC → TopHat → Cufflinks → Gene/Transcript/Exon expression → Visualization → Statistical analysis
  JAX Computational Sciences Service
TopHat
  http://tophat.cbcb.umd.edu/
  TopHat is better suited to aligning RNA-Seq data than aligners such as Maq or BWA because it takes splicing into account during the alignment process.
  Figure from: Trapnell et al. (2010). Nature Biotechnology 28:511-515.
  Trapnell et al. (2009). Bioinformatics 25:1105-1111.
TopHat is built on the Bowtie alignment algorithm. Trapnell C et al. Bioinformatics 2009;25:1105-1111
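In SAM output, a spliced alignment shows up as an "N" operation in the CIGAR string, which marks a region of the reference that the read skips (an intron, for RNA-Seq). The sketch below splits an alignment into the reference blocks it covers; `aligned_blocks` is a hypothetical helper and the coordinates in the example are made up.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks, split at N."""
    blocks = []
    ref = pos            # next reference position to be consumed
    block_start = None   # start of the block currently being extended
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in ("M", "=", "X", "D"):   # these consume reference bases
            if block_start is None:
                block_start = ref
            ref += n
        elif op == "N":                  # skipped region: close the block
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            ref += n
        # I, S, H and P do not consume reference bases
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks


# A 100-base read split across two exons separated by a 1000-base intron.
print(aligned_blocks(100, "50M1000N50M"))
```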
Cufflinks
  http://cufflinks.cbcb.umd.edu/
  Assembles transcripts,
  Estimates their abundances, and
  Tests for differential expression and regulation in RNA-Seq samples
  Trapnell et al. (2010). Nature Biotechnology 28:511-515.
Galaxy
  See Tutorial 1
  http://main.g2.bx.psu.edu/
  Build and share data and analysis workflows
  No programming experience required
  Strong and growing development and user community
Short Read Archive http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? Short Read Archive Handbook http://www.ncbi.nlm.nih.gov/books/NBK47528/
Aspera Connect http://www.asperasoft.com/en/products/client_software_2/aspera_connect_8 High performance file transfer for getting data from the Short Read Archive
SRA Toolkit http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
Galaxy on the Cloud
  Create an Amazon Web Services (AWS) account
  Sign up for Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
  Use the AWS Management Console to start a master EC2 instance
  Use the Galaxy Cloud web interface to manage the cluster
  Step-by-step instructions: https://bitbucket.org/galaxy/galaxy-central/wiki/cloud
  Screencast demonstrating the sign-up process: https://bitbucket.org/galaxy/galaxy-central/wiki/cloud
  Afgan E., Baker D., Coraor N., Chapman B., Nekrutenko A., Taylor J. (2010) BMC Bioinformatics. 11:2010.
Why Go to the Cloud?
  Storage and compute needs are much greater for next-gen sequence data
  The Amazon cloud provides a scalable, cost-effective solution
  Afgan E., Baker D., Coraor N., Chapman B., Nekrutenko A., Taylor J. (2010) BMC Bioinformatics. 11:2010.

Imgc2011 bioinformatics tutorial

  • 1. IMGS 2011 Bioinformatics Workshop Deanna Church, NCBI Carol Bult, The Jackson Laboratory
  • 2. Intro Sequencing Technology: life in the fast lane Alignments: things to consider File formats: everything you always wanted to know but were afraid to ask Tools: Pick the right one for the job at hand
  • 3. Cost Throughput Gigabases Cost per Kb Lucinda Fulton, The Genome Center at Washington University
  • 5. Sequence “Space” Roche 454 – Flow space Measure pyrophosphate released by a nucleotide when it is added to a growing DNA chain Flow space describes sequence in terms of these base incorporations http://www.youtube.com/watch?v=bFNjxKHP8Jc AB SOLiD – Color space Sequencing by DNA ligation via synthetic DNA molecules that contain two nested known bases with a flouorescent dye Each base sequenced twice http://www.youtube.com/watch?v=nlvyF8bFDwM&feature=related Illumina/Solexa – Base space Single base extentions of fluorescent-labeled nucleotides with protected 3 ‘ OH groups Sequencing via cycles of base addition/detection followed deprotection of the 3’ OH http://www.youtube.com/watch?v=77r5p8IBwJk&feature=related GenomeTV – Next Generation Sequencing (lecture) http://www.youtube.com/watch?v=g0vGrNjpyA8&feature=related http://finchtalk.geospiza.com/2008/03/color-space-flow-space-sequence-space_23.html
  • 6. Optimal global alignment Optimal local alignment Needleman-Wunsch Smith-Waterman Sequences align essentially from end to end Sequences align only in small, isolated regions Global and local alignments References Needleman and Wunsch (1970). J. Mol. Biol. 48, 443-453. Smith and Waterman (1981). Nucleic Acids Res 13, 645-656.
  • 7.
  • 8. Word size = 3(configurable) Hashing methods References Wilbur & Lipman (1983), PNAS80, 726-30 Lipman & Pearson (1985), Science227, 1435-1441 Pearson & Lipman (1988), PNAS85, 2444-2448 MVRRLPERTSTPACE Query sequence MVR VRR RRL RLP LPE PER ERT RTS TST STP TPA PAC ACE
  • 9.
  • 10.
  • 11.
  • 12.
  • 15. Sensitivity vs. Specificity Sensitivity = actual number of true positives (tp) identified Specificity = number of true negatives (tn) identified Predicted positives negatives positives Actual negatives Sensitivity= TP/(TP+FN) Specificity=TN/(TN+FP)
  • 16. Richa Agarwala MHC Alternate locus Alignment to chr6
  • 17. Tools Alignments BLAST: not for NGS BWA Bowtie Maq … Transcriptomics Tophat Cufflinks … Variant calling ssahaSNP Mosaic … Counting (Chip-Seq, etc) FindPeaks PeakSeq
  • 19. “Standard” File formats Sequence containers FASTA FASTQ BAM/SAM Alignments BAM/SAM MAF Annotation BED GFF/GTF/GFF3 WIG Variation VCF GVF
  • 20. FASTQ: Data Format FASTQ Text based Encodes sequence calls and quality scores with ASCII characters Stores minimal information about the sequence read 4 lines per sequence Line 1: begins with @; followed by sequence identifier and optional description Line 2: the sequence Line 3: begins with the “+” and is followed by sequence identifiers and description (both are optional) Line 4: encoding of quality scores for the sequence in line 2 References/Documentation http://maq.sourceforge.net/fastq.shtml Cock et al. (2009). Nuc Acids Res 38:1767-1771.
  • 21. FASTQ Example For analysis, it may be necessary to convert to the Sanger form of FASTQ…For example, Illumina stores quality scores ranging from 0-62; Sanger quality scores range from 0-93. Solexa quality scores have to be converted to PHRED quality scores. FASTQ example from: Cock et al. (2009). Nuc Acids Res 38:1767-1771.
  • 22. SAM (Sequence Alignment/Map). It may not be necessary to align reads from scratch; you can instead use existing alignments in SAM format. SAM is the output of aligners that map reads to a reference genome. It is tab delimited, with a header section and an alignment section. Header lines begin with @ and are optional. Each line in the alignment section has 11 mandatory fields. BAM is the binary format of SAM. http://samtools.sourceforge.net/
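
A single alignment line can be split into its 11 mandatory fields as sketched below. The field names follow the current SAM specification (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL); older versions of the specification used different names for some of them. `parse_sam_line` and the example line are illustrative assumptions, and for real data a dedicated library is the safer choice.

```python
from collections import namedtuple

SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual tags",
)


def is_header(line):
    """Header lines in SAM start with '@'."""
    return line.startswith("@")


def parse_sam_line(line):
    """Split one tab-delimited alignment line into the 11 mandatory fields plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(
        qname, int(flag), rname, int(pos), int(mapq), cigar,
        rnext, int(pnext), int(tlen), seq, qual, fields[11:],
    )


# Invented alignment line, for illustration only.
line = "read1\t0\tchr1\t100\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII\tNM:i:0"
record = parse_sam_line(line)
print(record.rname, record.pos, record.cigar)
```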
  • 23. Mandatory Alignment Fields http://samtools.sourceforge.net/SAM1.pdf
  • 24. Alignment Examples Alignments in SAM format http://samtools.sourceforge.net/SAM1.pdf
  • 25. Valid BED files
      Four columns (chrom, start, end, name):
      chr1 86114265 86116346 nsv433165
      chr2 1841774 1846089 nsv433166
      chr16 2950446 2955264 nsv433167
      chr17 14350387 14351933 nsv433168
      chr17 32831694 32832761 nsv433169
      chr17 32831694 32832761 nsv433170
      chr18 61880550 61881930 nsv433171
      Six columns (chrom, start, end, name, score, strand):
      chr1 16759829 16778548 chr1:21667704 270866 -
      chr1 16763194 16784844 chr1:146691804 407277 +
      chr1 16763194 16784844 chr1:144004664 408925 -
      chr1 16763194 16779513 chr1:142857141 291416 -
      chr1 16763194 16779513 chr1:143522082 293473 -
      chr1 16763194 16778548 chr1:146844175 284555 -
      chr1 16763194 16778548 chr1:147006260 284948 -
      chr1 16763411 16784844 chr1:144747517 405362 +
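
Rows like the ones on this slide can be parsed and sanity-checked with a short function. This sketch assumes whitespace-separated fields, 0-based half-open coordinates (so length = end - start), and an integer score; `parse_bed_line` is an illustrative name and the checks are deliberately minimal.

```python
def parse_bed_line(line):
    """Parse a BED line with 3 to 6 columns into a dict, with basic validation."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("BED needs at least chrom, start and end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError("invalid interval: %d-%d" % (start, end))
    # BED intervals are 0-based and half-open, so the length is end - start.
    record = {"chrom": chrom, "start": start, "end": end, "length": end - start}
    if len(fields) > 3:
        record["name"] = fields[3]
    if len(fields) > 4:
        record["score"] = int(fields[4])
    if len(fields) > 5:
        if fields[5] not in ("+", "-", "."):
            raise ValueError("strand must be '+', '-' or '.'")
        record["strand"] = fields[5]
    return record


print(parse_bed_line("chr1 86114265 86116346 nsv433165"))
```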
  • 27. Mouse chrX: 35,000,000-36,000,000 X MGSCv3 Build 36
  • 29. GRCh37 hg19 Zv7 danRer5 MGSCv37 mm8 NCBIM37
  • 30. Assemblies with the same name aren’t always the same chr21:8,913,216-9,246,964
  • 31. Assemblies with the same name aren’t always the same Zv7 chr21:8,913,216-9,246,964 X Mouse Build 36 chrX
  • 33. Tutorial Web Site http://www.ncbi.nlm.nih.gov/staff/church/GenomeAnalysis/index.shtml This site will be accessible after the meeting. Check back for updates and new tutorials.
  • 34.
  • 35. RNA Seq Workflow. (1) Convert data to FASTQ. (2) Upload files to Galaxy. (3) Quality control: throw out low quality sequence reads, etc. (4) Map reads to a reference genome: many algorithms are available, with a trade-off between speed and sensitivity. (5) Data summarization: associating alignments with genome annotations; counts. (6) Data visualization. (7) Statistical analysis.
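
The data summarization step, associating alignments with annotations and producing counts, can be illustrated with a deliberately simple overlap count. This is a sketch under stated assumptions: intervals are half-open (start, end) pairs, a read counts toward every feature it overlaps, and the feature names and coordinates are invented. Real tools handle strand, multi-mapped reads, and splicing, and use indexed lookups rather than this nested loop.

```python
def count_reads(features, alignments):
    """Count alignments overlapping each feature.

    features: list of (chrom, start, end, name); alignments: list of (chrom, start, end).
    """
    counts = {name: 0 for _, _, _, name in features}
    for chrom, start, end in alignments:
        for f_chrom, f_start, f_end, name in features:
            # Half-open intervals overlap when each one starts before the other ends.
            if chrom == f_chrom and start < f_end and end > f_start:
                counts[name] += 1
    return counts


# Invented annotation and alignments, for illustration only.
features = [("chr1", 100, 200, "geneA"), ("chr1", 300, 400, "geneB")]
alignments = [("chr1", 150, 186), ("chr1", 190, 226), ("chr1", 350, 386), ("chr2", 150, 186)]
print(count_reads(features, alignments))
```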
  • 36. Typical RNA-Seq Project Work Flow: Tissue Sample → Total RNA → mRNA → cDNA → Sequencing → FASTQ file → QC → TopHat → Cufflinks → Gene/Transcript/Exon Expression → Visualization → Statistical Analysis. JAX Computational Sciences Service
  • 37. TopHat http://tophat.cbcb.umd.edu/ TopHat is better suited to aligning RNA-Seq data than general-purpose short-read aligners (Maq, BWA) because it takes splicing into account during the alignment process. Figure from: Trapnell et al. (2010). Nature Biotechnology 28:511-515. Trapnell et al. (2009). Bioinformatics 25:1105-1111.
  • 38. TopHat is built on the Bowtie alignment algorithm. Trapnell C et al. Bioinformatics 2009;25:1105-1111
  • 39.
  • 41. Tests for differential expression and regulation in RNA-Seq samples Trapnell et al. (2010). Nature Biotechnology 28:511-515.
  • 42. Galaxy See Tutorial 1 http://main.g2.bx.psu.edu/ Build and share data and analysis workflows No programming experience required Strong and growing development and user community
  • 43. Short Read Archive http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi? Short Read Archive Handbook http://www.ncbi.nlm.nih.gov/books/NBK47528/
  • 44. Aspera Connect http://www.asperasoft.com/en/products/client_software_2/aspera_connect_8 High performance file transfer for getting data from the Short Read Archive
  • 46. Galaxy on the Cloud. Create an Amazon Web Services (AWS) account. Sign up for Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). Use the AWS Management Console to start a master EC2 instance. Use the Galaxy Cloud web interface to manage the cluster. Step by step instructions are here: https://bitbucket.org/galaxy/galaxy-central/wiki/cloud Screencast to demonstrate the sign up process is here: https://bitbucket.org/galaxy/galaxy-central/wiki/cloud Afgan E., Baker D., Coraor N., Chapman B., Nekrutenko A., Taylor J. (2010) BMC Bioinformatics. 11:2010.
  • 47. Why Go to the Cloud? File sizes and compute needs are much greater for next-generation sequence data. The Amazon cloud provides a scalable, cost-effective solution. Afgan E., Baker D., Coraor N., Chapman B., Nekrutenko A., Taylor J. (2010) BMC Bioinformatics. 11:2010.
  • 48. Some Tips. You’ll need a credit card to activate the service. You’ll need to be near a phone so that you can verify your identity during the sign up process. There is a time lag between signing up for AWS and getting access.
  • 49. History Dialog/Parameter Selection Tools Let’s Get Started!

Editor's notes

  1. Show alignment of a feature from first slide to show how far down the chromosome it has moved…
  2. Keeping track of people is way easier than keeping track of assemblies.
  3. Can talk about Genomic Collections here