Genome Annotation
Karan Veer Singh, Scientist, NBAGR, Karnal, India
The Genome
• The genome contains all the biological information required to build and maintain any given living organism
• The genome contains the organism's molecular history
• Decoding the biological information encoded in these molecules will have an enormous impact on our understanding of biology
Genomics
1. Structural genomics: genetic and physical mapping of genomes.
2. Functional genomics: analysis of gene function (and non-genes).
3. Comparative genomics: comparison of genomes across species.
    Includes structural and functional genomics.
    Evolutionary genomics.
Human Genome Project
The Human Genome Project promised to revolutionise medicine and explain every base of our DNA.
Large medical genetics focus:
• Identify variation in the genome that is disease-causing
• Determine how individual genes play a role in health and disease
Human Genome Project & Functional Genome
• It cost about 3 billion dollars and took 13 years to complete (1990 to 2003), 2 fewer than the 15 initially predicted.
• Approx. 200 Mb still in progress
  – Heterochromatin
  – Repetitive
Genomics & Genome annotation
 The first genome annotation software system was designed in 1995 by Dr. Owen White at The Institute for Genomic Research, the team that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae
 Sequencing reads are assembled into contigs, which are then assembled into the complete genome either against a reference genome (reference assembly) or by de novo assembly
 Variations such as mutations, SNPs and InDels can then be identified
 The genome is then annotated by structural and functional annotation
 The whole genome is finally mapped as an image that is easy to understand
Sequence to Annotation
Input 1 to Genome Viewer - Variant Annotation
Input 2 to Genome Viewer - Structural Annotation
 Structural Annotation - AUGUSTUS (version 2.5.5)
Input 3 to Genome Viewer - Functional Annotation
Genome Annotation
 The process of identifying the locations of genes and coding regions in a genome and determining what those genes do
 Finding the structural elements of the genome and attaching the related function to each genomic location
Genome Annotation
 Gene structure prediction: identifying elements (introns/exons, CDS, start, stop) in the genome
 Gene function prediction: attaching biological information to these elements, e.g. which protein an exon codes for
Structural annotation
Structural annotation - identification of genomic elements
 Open reading frames and their localisation
 Gene structure
 Coding regions
 Location of regulatory motifs
Functional annotation
Functional annotation - attaching biological information to genomic elements
 Biochemical function
 Biological function
 Regulation involved
Genome annotation - workflow
1. Genome sequence
2. Repeats: masked or un-masked genome sequence
3. Structural annotation - gene finding: nc-RNAs (tRNA, rRNA), introns, protein-coding genes
4. Functional annotation
5. View in genome viewer
Genome Repeats & features
Polymorphic between individuals/populations
 Percentage of repetitive sequences in different organisms

Genome               Genome Size (Mb)    % Repeat
Aedes aegypti        1,300               ~70
Anopheles gambiae    260                 ~30
Culex pipiens        540                 ~50

Microsatellite
Minisatellite
Tandem repeat
Short tandem repeat
SSR (simple sequence repeat)
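The tandem-repeat terms above (microsatellite, short tandem repeat, SSR) all describe a short unit copied head to tail. A minimal sketch of how such repeats might be located with a regular expression follows; the function name `find_ssrs` and the default thresholds (unit length 1 to 6 bp, at least 4 copies) are assumptions made for illustration, not values from the slides, and dedicated repeat finders are far more thorough.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Return (start, end, unit, copies) for each perfect tandem repeat.

    Coordinates are 0-based and half-open. Thresholds are illustrative.
    """
    seq = seq.upper()
    # Capture the shortest possible unit, then require it to be repeated
    # at least (min_repeats - 1) more times immediately afterwards.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits


# A (CA)5 microsatellite flanked by unique sequence.
print(find_ssrs("GGCACACACACATT"))
```

Only perfect repeats are reported; interrupted or degenerate repeats would need an alignment-based tool.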
Finding repeats as a preliminary to gene prediction
 Repeat discovery
 Homology-based approaches
Use RepeatMasker to search the genome and mask the sequence
Masked sequence
 A repeat-masked sequence is an artificial construct in which the regions thought to be repetitive are replaced with X's
 Masking is widely used to reduce the overhead of subsequent computational analyses and to reduce the impact of TEs (transposable elements) on the final annotation set

>my sequence
atgagcttcgatagcgatcagctagcgatcaggct
actattggcttctctagactcgtctatctctatta
gctatcatctcgatagcgatcagctagcgatcagg
ctactattggcttcgatagcgatcagctagcgatc
aggctactattggcttcgatagcgatcagctagcg
atcaggctactattggctgatcttaggtcttctga
tcttct

>my sequence (repeatmasked)
atgagcttcgatagcgatcagctagcgatcaggct
actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxatctcgatagcgatcagctagcgatcagg
ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
aggctactattggcttcgatagcgatcagctagcg
atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
tcttct

Positions/locations are not affected by masking
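Because masking replaces bases one for one, the sequence length and every coordinate stay the same. A minimal sketch of that idea follows, assuming the repeat regions are already available as 0-based, half-open intervals; the function name `hard_mask` and that coordinate convention are assumptions for illustration and are not RepeatMasker's own output format.

```python
def hard_mask(seq, intervals, mask_char="x"):
    """Replace every base inside the given intervals with mask_char.

    intervals: iterable of (start, end) pairs, 0-based and half-open
    (an assumed convention for this sketch).
    """
    chars = list(seq)
    for start, end in intervals:
        # Clamp to the sequence so out-of-range intervals are harmless.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


original = "atgagcttcgat"
masked = hard_mask(original, [(3, 6)])
print(masked)
# Length is unchanged, so downstream coordinates still line up.
print(len(original) == len(masked))
```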
Types of Masking - Hard or Soft?
 Sometimes we want to mark up repetitive sequence but not exclude it from downstream analyses. This is achieved using a format known as soft-masking, in which repeats are written in lower case rather than replaced with X's

>my sequence
ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
TACTATTGGCTTCTCTAGACTCGTCTATCTCTATT
AGTATCATCTCGATAGCGATCAGCTAGCGATCAGG
CTACTATTGGCTTCGATAGCGATCAGCTAGCGATC
AGGCTACTATTGGCTTCGATAGCGATCAGCTAGCG
ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
TCTTCT

>my sequence (softmasked)
ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
TACTATTggcttctctagactcgtctatctctatt
agtatcATCTCGATAGCGATCAGCTAGCGATCAGG
CTACTATTggcttcgatagcgatcagcTAGCGATC
AGGCTACTATTggcttcgatagcgatcagcTAGCG
ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
TCTTCT

>my sequence (hardmasked)
atgagcttcgatagcgatcagctagcgatcaggct
actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxatctcgatagcgatcagctagcgatcagg
ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
aggctactattggcttcgatagcgatcagctagcg
atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
tcttct
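A soft-masked sequence keeps all the information of a hard-masked one, so it can be converted on demand. The sketch below assumes the common convention that unmasked bases are upper case and repeats are lower case; the helper names `soft_to_hard` and `masked_fraction` are made up for this example.

```python
def soft_to_hard(seq, mask_char="x"):
    """Turn a soft-masked sequence (repeats in lower case) into a
    hard-masked one by replacing every lower-case base with mask_char."""
    return "".join(mask_char if base.islower() else base for base in seq)


def masked_fraction(seq):
    """Fraction of the sequence that is soft-masked (lower case)."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)


soft = "ACGTacgtAC"
print(soft_to_hard(soft))
print(masked_fraction(soft))
```

The reverse conversion is not possible: once bases are replaced by X's the underlying sequence is gone, which is one reason to store the soft-masked form.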
Structural annotation
Identification of genomic elements
 Open reading frames and their localization
 Coding regions
 Location of regulatory motifs
 Start/Stop
 Splice sites
 Non-coding regions/RNAs
 Introns
Methods
 Similarity
• Similarity between sequences, which does not necessarily imply any evolutionary relationship
 Ab initio prediction
• Prediction of gene structure from first principles using only the genome sequence
Genefinding
[Figure: the two gene-finding approaches, ab initio and similarity]
ab initio prediction
[Figure: signals read from the genome: coding potential, ATG & stop codons, splice sites]
Examples: Genefinder, Augustus, Glimmer, SNAP, fgenesh
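The simplest of the signals above, ATG and stop codons, can be illustrated with a naive open reading frame scan. This is only a sketch of the idea and not how Augustus, Glimmer or the other tools listed work internally; those combine such signals with statistical models of coding potential and splice sites. The function names and the `min_codons` threshold are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end) for each ATG...stop ORF in all six frames.

    Coordinates are 0-based, half-open, and relative to the strand that was
    scanned (for '-' they refer to the reverse complement).
    min_codons counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATAGCC"))
```

A scan like this suits intron-free prokaryotic genes at best; eukaryotic gene models also need splice-site prediction, which is why the slide lists splice sites as a separate signal.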
Genefinding - similarity
 Use known coding sequence to define coding regions
   EST sequences
   Peptide sequences
 Problem: handling fuzzy alignment regions around splice sites
 Examples: EST2Genome, exonerate, genewise, Augustus, Prodigal

Gene-finding - comparative
 Use two or more genomic sequences to predict genes based on conservation of exon sequences
 Examples: Twinscan and SLAM
Genefinding - non-coding RNA genes
 Non-coding RNA genes can be predicted using knowledge of their structure or by similarity to known examples
 tRNAscan - uses an HMM and a covariance model to predict tRNA genes
 Rfam - a collection of covariance models (structure-aware relatives of HMMs) built for a large number of different RNA gene families
Gene-finding omissions
Alternative isoforms
 Currently there is no good method for predicting alternative isoforms
 Only created where supporting transcript evidence is present
Pseudogenes
 Each genome project has a fuzzy definition of pseudogenes
 Badly curated/described across the board
Promoters
 Rarely a priority for a genome project
 Some algorithms exist but are usually not integrated into an annotation set
Practical- structural annotation
Eukaryotes- AUGUSTUS (gene model)

~/Programs/augustus.2.5.5/bin/augustus --strand=both --genemodel=partial
--singlestrand=true --alternatives-from-evidence=true --alternatives-from-sampling=tr
--progress=true --gff3=on --uniqueGeneId=true --species=magnaporthe_grisea
our_genome.fasta >structural_annotation.gff

Prokaryotes - PRODIGAL (codon usage table)
~/Programs/prodigal.v2_60.linux -a protein_file.fa -g 11 -d nucleotide_exon_seq.fa
-f gff -i contigs.fa -o genes_quality.txt -s genes_score.txt -t genome_training_file.txt
Structural Annotation - output
Structural annotation conducted using AUGUSTUS (version 2.5.5), with magnaporthe_grisea as the species model
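The structural annotation is written as GFF, a tab-separated format with nine columns in which the third column holds the feature type (gene, CDS, intron and so on). A minimal sketch for summarising such a file follows; the function name `count_gff_features` and the three example records are invented for illustration and are not real AUGUSTUS output.

```python
from collections import Counter


def count_gff_features(lines):
    """Count how often each feature type (GFF column 3) occurs."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF feature line
        counts[fields[2]] += 1
    return counts


# Hypothetical records, made up for this example.
example = [
    "##gff-version 3",
    "contig1\tAUGUSTUS\tgene\t100\t900\t0.9\t+\t.\tID=g1",
    "contig1\tAUGUSTUS\tCDS\t100\t400\t0.9\t+\t0\tID=g1.t1.cds;Parent=g1.t1",
    "contig1\tAUGUSTUS\tCDS\t500\t900\t0.9\t+\t2\tID=g1.t1.cds;Parent=g1.t1",
]
print(count_gff_features(example))
```

To run it on a real file, pass an open file handle instead of the list, for example the structural_annotation.gff produced by the command above.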
Functional annotation
Functional annotation
[Figure: from genome to function. Genome → transcription → primary transcript → RNA processing → processed mRNA (m7G cap, ATG, STOP, poly(A) tail) → translation → polypeptide → protein folding → folded protein → find function (enzyme activity, functional activity)]
Functional annotation
Attaching biological information to genomic elements
 Biochemical function
 Biological function
 Regulation and interactions involved
 Expression
• Use the known structural annotation to derive the predicted protein sequence
Functional annotation - Homology Based
 Predicted exons/CDS/ORFs are searched against non-redundant protein databases (NCBI, SwissProt) to find similar sequences
 Visually assess the top 5-10 hits to identify whether these have been assigned a function
 Functions are assigned
Functional annotation - Other features
 Other features which can be determined
   Signal peptides
   Transmembrane domains
   Low complexity regions
   Various binding sites, glycosylation sites etc.
   Protein domains
   Secretome
See http://expasy.org/tools/ for a good list of possible prediction algorithms
Functional annotation - Other features (Ontologies)
 Use of ontologies to annotate gene products
 Gene Ontology (GO)
   Cellular component
   Molecular function
   Biological process
Practical - Functional annotation
Homology-based method
 Set up a BLAST database (nucleotide/protein)
 BLAST genome.fasta against it for annotations (nucleotide/protein)
 Filter the BLAST hits by E-value, keeping those at or below the cutoff (E-value <= 0.01), for nucleotide/protein
 Assign functions
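The filtering step above can be sketched for BLAST's standard 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `best_hits`, the choice to keep the highest-bitscore hit per query, and the example rows are assumptions for illustration; the slides do not say how the final hit was chosen.

```python
def best_hits(lines, max_evalue=0.01):
    """Keep, for each query, the best-scoring hit with E-value <= max_evalue.

    Expects BLAST tabular lines with the 12 default columns.
    Returns {query: (subject, evalue, bitscore)}.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # a lower E-value means a more significant hit
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits, made up for this example.
rows = [
    "g1\tP001\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t380",
    "g1\tP002\t60.0\t180\t70\t2\t5\t185\t3\t182\t1e-20\t95.5",
    "g2\tP003\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25.0",
]
print(best_hits(rows))
```

The function of the retained subject sequence would then be looked up and transferred to the query gene, ideally after a manual look at the top hits as suggested earlier.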
Functional annotation- output

August 2008

Bioinformatics tools for Comparative Genomics
of Vectors

Conclusion
 Annotation accuracy depends on the supporting data available at the time of annotation, so annotations need to be updated
 Gene predictions will change over time as new data that are more similar than the previous ones become available (NCBI)
 Functional assignments will change over time as new data become available (characterization of hypothetical proteins)
Genome Viewer
The files that can be visualised:
 Annotation files
 Indel files
 Consensus sequence
 Comparative genomics
Genome View
[Screenshots: genome viewer displays, including a short read track]
Thank You
Genome annotation 2013

  • 1. Genome Annotation. Karan Veer Singh, Scientist, NBAGR, Karnal, India
  • 2. The Genome • The genome contains all the biological information required to build and maintain any given living organism • The genome contains the organism's molecular history • Decoding the biological information encoded in these molecules will have an enormous impact on our understanding of biology
  • 3. Genomics 1. Structural genomics: genetic and physical mapping of genomes. 2. Functional genomics: analysis of gene function (and of non-genes). 3. Comparative genomics: comparison of genomes across species; includes structural and functional genomics, and evolutionary genomics.
  • 4. Human Genome Project: The Human Genome Project promised to revolutionise medicine and explain every base of our DNA. Large MEDICAL GENETICS focus: identify variation in the genome that is disease causing; determine how individual genes play a role in health and disease.
  • 5. Human Genome Project & Functional Genome: It cost about 3 billion dollars and took 13 years to complete (1990 to 2003, 2 fewer than initially predicted). • Approx. 200 Mb still in progress: heterochromatin, repetitive sequence
  • 6. Genomics & Genome annotation • The first genome annotation software system was designed in 1995 by Dr. Owen White at The Institute for Genomic Research, which sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae • Sequencing involves assembling the reads into contigs, then assembling against a reference genome (reference assembly) or performing de novo assembly to obtain the complete genome • Variations such as mutations, SNPs, InDels, etc. can be identified • The genome is then annotated by structural and functional annotation • The whole genome is mapped as an image in an easily understandable manner.
  • 8. Input 1 to Genome Viewer: Variant Annotation
  • 9. Input 2 to Genome Viewer: Structural Annotation • Structural Annotation: AUGUSTUS (version 2.5.5)
  • 10. Input 3 to Genome Viewer: Functional Annotation
  • 11. Genome Annotation • The process of identifying the locations of genes and coding regions in a genome and determining what those genes do • Finding the structural elements and attaching the related function to each genome location
  • 12. Genome Annotation • Gene structure prediction: identifying elements (introns/exons, CDS, start, stop) in the genome • Gene function prediction: attaching biological information to these elements, e.g., which protein an exon codes for
  • 13. Structural annotation: identification of genomic elements • Open reading frames and their localisation • Gene structure • Coding regions • Location of regulatory motifs
  • 14. Functional annotation: attaching biological information to genomic elements • Biochemical function • Biological function • Involved regulation
  • 15. Genome annotation - workflow: Genome sequence; Repeats; Masked or un-masked genome sequence; Structural annotation (gene finding): nc-RNAs (tRNA, rRNA), introns, protein-coding genes; Functional annotation; View in Genome viewer
  • 16. Genome Repeats & features • Polymorphic between individuals/populations • Percentage of repetitive sequences in different organisms (genome, genome size, % repeat): Aedes aegypti, 1,300 Mb, ~70; Anopheles gambiae, 260 Mb, ~30; Culex pipiens, 540 Mb, ~50 • Repeat classes: microsatellite, minisatellite, tandem repeat, short tandem repeat, SSR
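The repeat classes named on this slide (microsatellites, SSRs) are short units repeated in tandem, so a perfect SSR can be located with a back-referencing regular expression. The following is a minimal sketch under that assumption; the function name `find_ssrs` and its default thresholds are illustrative choices, not part of the original deck, and real repeat finders also tolerate imperfect copies.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return (start, end, unit, copies) for perfect tandem repeats.

    Coordinates are 0-based and end-exclusive. The lazy quantifier makes
    the regex try the shortest repeat unit first at each position.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits

if __name__ == "__main__":
    # Hypothetical test sequence: six copies of "CA" flanked by unique bases.
    print(find_ssrs("GGT" + "CA" * 6 + "TTG"))
```

Raising `min_repeats` or `min_unit` makes the search stricter, for example to ignore short homopolymer runs.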
  • 17. Finding repeats as a preliminary to gene prediction • Repeat discovery • Homology-based approaches • Use RepeatMasker to search the genome and mask the sequence
  • 18. Masked sequence • A repeatmasked sequence is an artificial construction in which the regions thought to be repetitive are marked with X's • Widely used to reduce the overhead of subsequent computational analyses and to reduce the impact of TEs in the final annotation set • Positions/locations are not affected by masking
    >my sequence
    atgagcttcgatagcgatcagctagcgatcaggct
    actattggcttctctagactcgtctatctctatta
    gctatcatctcgatagcgatcagctagcgatcagg
    ctactattggcttcgatagcgatcagctagcgatc
    aggctactattggcttcgatagcgatcagctagcg
    atcaggctactattggctgatcttaggtcttctga
    tcttct
    >my sequence (repeatmasked)
    atgagcttcgatagcgatcagctagcgatcaggct
    actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxatctcgatagcgatcagctagcgatcagg
    ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
    aggctactattggcttcgatagcgatcagctagcg
    atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
    tcttct
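The point that masking leaves positions unchanged can be illustrated with a small sketch. It assumes the repeat intervals are already known (for example, taken from a repeat finder's report); the function name `hard_mask` and the 0-based, end-exclusive interval convention are assumptions made for this example.

```python
def hard_mask(seq, intervals, mask_char="x"):
    """Replace every base inside the given intervals with mask_char.

    Intervals are (start, end) pairs, 0-based and end-exclusive.
    One character replaces one base, so the sequence length and all
    downstream coordinates stay the same.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)

if __name__ == "__main__":
    original = "atgagcttcgatagcg"
    masked = hard_mask(original, [(3, 8)])  # hypothetical repeat at 3..7
    print(masked)
    print(len(masked) == len(original))
```

Because the replacement is one character per base, a feature predicted on the masked sequence can be reported directly in the coordinates of the unmasked one.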
  • 19. Types of Masking - Hard or Soft? • Sometimes we want to mark up repetitive sequence but not exclude it from downstream analyses. This is achieved using a format known as soft-masked
    >my sequence
    ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
    TACTATTGGCTTCTCTAGACTCGTCTATCTCTATT
    AGTATCATCTCGATAGCGATCAGCTAGCGATCAGG
    CTACTATTGGCTTCGATAGCGATCAGCTAGCGATC
    AGGCTACTATTGGCTTCGATAGCGATCAGCTAGCG
    ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
    TCTTCT
    >my sequence (softmasked)
    ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
    TACTATTggcttctctagactcgtctatctctatt
    agtatcATCTCGATAGCGATCAGCTAGCGATCAGG
    CTACTATTggcttcgatagcgatcagcTAGCGATC
    AGGCTACTATTggcttcgatagcgatcagcTAGCG
    ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
    TCTTCT
    >my sequence (hardmasked)
    atgagcttcgatagcgatcagctagcgatcaggct
    actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxatctcgatagcgatcagctagcgatcagg
    ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
    aggctactattggcttcgatagcgatcagctagcg
    atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
    tcttct
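Soft masking keeps the bases and records the repeat annotation in letter case, so the masked intervals can be recovered, or the sequence converted to a hard-masked form, at any later point. A minimal sketch, assuming uppercase means unmasked and lowercase means repeat; the helper names are hypothetical.

```python
import re

def softmasked_intervals(seq):
    """Return (start, end) runs of lowercase bases, 0-based, end-exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]

def soft_to_hard(seq, mask_char="X"):
    """Replace each lowercase (soft-masked) base with mask_char."""
    return re.sub(r"[a-z]", lambda m: mask_char, seq)

if __name__ == "__main__":
    soft = "ACGTacgtACgtA"  # hypothetical soft-masked fragment
    print(softmasked_intervals(soft))
    print(soft_to_hard(soft))
    print(soft.upper())  # the unmasked sequence is still recoverable
```

The reverse conversion is not possible: once bases are overwritten by a mask character, the original sequence cannot be recovered from the hard-masked string alone.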
  • 20. Genome annotation - workflow: Genome sequence; Map repeats; Masked or un-masked; Gene finding (structural annotation): nc-RNAs, introns, protein-coding genes; Functional annotation; View in Genome viewer
  • 21. Structural annotation: identification of genomic elements • Open reading frames and their localization • Coding regions • Location of regulatory motifs • Start/Stop • Splice sites • Non-coding regions/RNAs • Introns
  • 22. Methods • Similarity: similarity between sequences, which does not necessarily imply any evolutionary linkage • Ab initio prediction: prediction of gene structure from first principles using only the genome sequence
  • 24. Ab initio prediction • Signals read from the genome: coding potential, ATG and stop codons, splice sites • Examples: Genefinder, Augustus, Glimmer, SNAP, fgenesh
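Of the signals listed on this slide, the start and stop codons are the simplest to demonstrate. The sketch below scans both strands for ATG-to-stop open reading frames. It is only an illustration of that one signal: it ignores coding potential and splice sites, and it is not how the listed gene finders work internally. The function names and the `min_codons` threshold are assumptions for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_on_strand(seq, min_codons):
    """Yield (start, end) of ATG..stop ORFs; 0-based, end-exclusive, stop included."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens the ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield (start, i + 3)
                start = None

def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, strand) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    hits = [(s, e, "+") for s, e in orfs_on_strand(seq, min_codons)]
    hits += [(n - e, n - s, "-")
             for s, e in orfs_on_strand(reverse_complement(seq), min_codons)]
    return sorted(hits)

if __name__ == "__main__":
    print(find_orfs("CCATGAAATGACC"))  # hypothetical toy sequence
```

In practice `min_codons` would be set far higher (hundreds of bases), since short ORFs occur by chance throughout any genome.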
  • 25. Genefinding - similarity • Use known coding sequence to define coding regions: EST sequences, peptide sequences • Problem: handling fuzzy alignment regions around splice sites • Examples: EST2Genome, exonerate, genewise, Augustus, Prodigal. Gene-finding - comparative • Use two or more genomic sequences to predict genes based on conservation of exon sequences • Examples: Twinscan and SLAM
  • 26. Genome annotation - workflow: Genome sequence; Map repeats; Masked or un-masked; Gene finding (structural annotation): nc-RNAs, introns, protein-coding genes; Functional annotation; View in Genome viewer
  • 27. Genefinding - non-coding RNA genes • Non-coding RNA genes can be predicted using knowledge of their structure or by similarity with known examples • tRNAscan: uses an HMM and a covariance model for prediction of tRNA genes • Rfam: a suite of HMMs trained against a large number of different RNA genes
  • 28. Gene-finding omissions • Alternative isoforms: currently there is no good method for predicting alternative isoforms; they are only created where supporting transcript evidence is present • Pseudogenes: each genome project has a fuzzy definition of pseudogenes; badly curated/described across the board • Promoters: rarely a priority for a genome project; some algorithms exist but are usually not integrated into an annotation set
  • 29. Practical - structural annotation • Eukaryotes: AUGUSTUS (gene model)
    ~/Programs/augustus.2.5.5/bin/augustus --strand=both --genemodel=partial --singlestrand=true --alternatives-from-evidence=true --alternatives-from-sampling=true --progress=true --gff3=on --uniqueGeneId=true --species=magnaporthe_grisea our_genome.fasta > structural_annotation.gff
    • Prokaryotes: PRODIGAL (codon usage table)
    ~/Programs/prodigal.v2_60.linux -a protein_file.fa -g 11 -d nucleotide_exon_seq.fa -f gff -i contigs.fa -o genes_quality.txt -s genes_score.txt -t genome_training_file.txt
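Both commands above write GFF output, a tab-separated format with nine columns per feature and 1-based, inclusive coordinates. A quick sanity check on such a file is to count the features by type and total the coding length. The sketch below is a minimal parser under those format assumptions; the sample records are invented for illustration and are not real AUGUSTUS output.

```python
from collections import Counter

def parse_gff(lines):
    """Yield one dict per feature line, skipping comments, blanks and malformed rows."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        yield {
            "seqid": cols[0], "source": cols[1], "type": cols[2],
            "start": int(cols[3]), "end": int(cols[4]),
            "strand": cols[6], "attributes": cols[8],
        }

def count_feature_types(lines):
    """Count features by their type column (gene, CDS, ...)."""
    return Counter(f["type"] for f in parse_gff(lines))

def total_length(lines, feature_type):
    """Sum the lengths of all features of one type (coordinates are inclusive)."""
    return sum(f["end"] - f["start"] + 1
               for f in parse_gff(lines) if f["type"] == feature_type)

# Hypothetical records, used only to exercise the functions above.
SAMPLE = [
    "##gff-version 3",
    "contig1\tAUGUSTUS\tgene\t101\t400\t0.9\t+\t.\tID=g1",
    "contig1\tAUGUSTUS\tCDS\t101\t200\t0.9\t+\t0\tID=g1.t1.cds;Parent=g1.t1",
    "contig1\tAUGUSTUS\tCDS\t301\t400\t0.9\t+\t2\tID=g1.t1.cds;Parent=g1.t1",
]

if __name__ == "__main__":
    print(count_feature_types(SAMPLE))
    print(total_length(SAMPLE, "CDS"))
```

For a real run, the list would be replaced by an open file handle such as `open("structural_annotation.gff")`, which yields lines in the same way.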
  • 30. Structural Annotation - output • Structural annotation conducted using AUGUSTUS (version 2.5.5), with Magnaporthe_grisea as the genome model
  • 32. Genome annotation - workflow: Genome sequence; Map repeats; Masked or un-masked; Gene finding (structural annotation): nc-RNAs, introns, protein-coding genes; Functional annotation; View in Genome viewer
  • 33. Functional annotation: Genome; transcription; primary transcript; RNA processing; processed mRNA (m7G cap, ATG, STOP, AAAn tail); translation; polypeptide; protein folding; folded protein; find function (enzyme activity, functional activity)
  • 34. Functional annotation: attaching biological information to genomic elements • Biochemical function • Biological function • Involved regulation and interactions • Expression • Use the known structural annotation to predict the protein sequence
  • 35. Functional annotation - Homology Based • Predicted exons/CDS/ORFs are searched against the non-redundant protein database (NCBI, SwissProt) to find similarities • Visually assess the top 5-10 hits to identify whether these have been assigned a function • Functions are assigned
  • 36. Functional annotation - Other features • Other features which can be determined: signal peptides, transmembrane domains, low complexity regions, various binding sites, glycosylation sites etc., protein domains, secretome • See http://expasy.org/tools/ for a good list of possible prediction algorithms
  • 37. Functional annotation - Other features (Ontologies) • Use of ontologies to annotate gene products • Gene Ontology (GO): cellular component, molecular function, biological process
  • 38. Practical - FUNCTIONAL ANNOTATION • Homology-based method: set up a BLAST database (nucleotide/protein); BLAST the genome.fasta for annotations (nucleotide/protein); filter the BLAST hits by E-value, keeping those at or below the cutoff (<= 0.01), for nucleotide/protein; assign functions
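The E-value filtering step can be sketched as follows, assuming the BLAST results were saved in the standard 12-column tabular format (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The function name `best_hits` and the accession-like ids in the example are hypothetical; keeping one top-scoring hit per query is one possible policy, not necessarily the one used in the original practical.

```python
def best_hits(lines, max_evalue=0.01):
    """Return {query_id: (subject_id, evalue, bitscore)}.

    Hits with an E-value above max_evalue are discarded; among the rest,
    the hit with the highest bit score is kept for each query.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a standard tabular hit line
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical hit lines, used only to exercise the function above.
EXAMPLE_HITS = [
    "g1\tP12345\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "g1\tQ99999\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30",
    "g2\tA00001\t35.0\t40\t26\t0\t1\t40\t1\t40\t2.0\t25",
]

if __name__ == "__main__":
    for query, hit in best_hits(EXAMPLE_HITS).items():
        print(query, hit)
```

Queries with no hit under the cutoff are simply absent from the result, which matches the usual practice of labelling such genes as hypothetical until better evidence is available.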
  • 39. Functional annotation - output (August 2008, Bioinformatics tools for Comparative Genomics of Vectors)
  • 40. Conclusion • Annotation accuracy depends on the supporting data available at the time of annotation; updating the information is necessary • Gene predictions will change over time as new data that are more similar than previous ones become available (NCBI) • Functional assignments will change over time as new data become available (characterization of hypothetical proteins)
  • 41. Genome annotation - workflow: Genome sequence; Map repeats; Masked or un-masked; Gene finding (structural annotation): nc-RNAs, introns, protein-coding genes; Functional annotation; View in Genome viewer
  • 42. Genome Viewer • The files that can be visualised: annotation files, indel files, consensus sequence, comparative genomics

Editor's notes

  1. Try to describe genome annotation as a process. Emphasize the ongoing nature of annotation: there is no real end point to the annotation process (only artificially defined ones). It is best to think of this as a 'best guess' annotation.
  2. Softmasking