Motivation
Your research is valuable
All advances in knowledge are incremental, with
each new idea ultimately building on earlier
knowledge such as you are gathering.
Losing data at a rapid rate
up to 80% unavailable after 20 years
http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
Data valuation
 Information is infinitely shareable without any loss of
value
 Reuse increases the value derived from the original
investment
 By combining data, their value increases
 The more these assets are used, the more additional
knowledge can be gathered (data science)
 As a corollary, unshared or insufficiently documented
information is less valuable
 The more accurate and complete the information is,
the more useful, and therefore valuable, it is
Moody and Walsh 1999
WHAT ONTOLOGIES ARE
 what kinds of things exist?
 what are the relationships between these things?
[Figure: the terms "eye", "ommatidium", "sense organ", and "eye disc", linked by is_a, part_of, and develops_from relationships]
A biological ontology is:
 A machine interpretable representation of
some aspect of biological reality
October 25, 2016
Ontology defined
 The science of what is: of the kinds and
structures of the objects, and their properties
and relations in every area of reality.
 The classification of entities and the relations
between them.
 Defined by a scientific field's vocabulary and by
the canonical formulations of its theories.
 Seeks to solve problems which arise in these
domains.
WHY ARE ONTOLOGIES
NEEDED
Ontologies help with decision
making
handy ontology tells us what’s there…
Where
should I
eat…?
Ontologies don’t just organize data; they
also facilitate inference,
and that creates new knowledge, often
unconsciously in the user.
(Presumable)
country of origin
Type of cuisine
What a 5 year old child (or a computer) will likely infer
about the world from this helpful ontology…
Flag of fresh juice
‘Frozen Yogurt’ cuisine in
search of a national identity?
Where delicatessen food
hails from…
Fresh Juice is a national cuisine…
Information retrieval is not straightforward
 18-day pregnant females
 female (lactating)
 individual female
 worker caste (female)
 2 yr old female
 female (pregnant)
 lgb*cc females
 sex: female
 400 yr. old female
 female (outbred)
 mare
 female, other
 adult female
 female parent
 female (worker)
 female child
 asexual female
 female plant
 monosex female
 femal
 femlale
 diploid female
 female(gynoecious)
 remale
 metafemale
 f
 femele
 semi-engorged female
 sterile female
 famale
 female, pooled
 sexual oviparous female
 normal female
 femail
 femalen
 sterile female worker
 sf
 female
 females
 strictly female
 vitellogenic replete
female
 female - worker
 females only
 tetraploid female
 worker
 female (alate sexual)
 gynoecious
 thelytoky
 hexaploid female
 female (calf)
 healthy female
 female (gynoecious)
 female (f-o)
 hen
 probably female (based
on morphology)
 castrate female
 female with eggs
 ovigerous female
 3 female
 cf.female
 female worker
 oviparous sexual
females
 female (phenotype)
 cystocarpic female
 female, 6-8 weeks old
 worker bee
 female mice
 dikaryon
 female, virgin
 female enriched
 female, spayed
 dioecious female
 female, worker
 pseudohermaprhoditic
female
Courtesy of N. Silvester and S. Orchard, European Nucleotide Archive, EMBL-EBI
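The list above shows why free-text fields are hard to query. The following is a minimal sketch, not a production normalizer, of how far simple string matching gets before a curated vocabulary is needed. The helper name `mentions_female`, the similarity cutoff, and the small curated synonym set are all assumptions made for illustration; none of them come from the ENA data.

```python
import difflib
import re

# Hypothetical curated synonyms: words that imply "female" without containing it.
CURATED_SYNONYMS = {"mare", "hen"}


def mentions_female(raw: str, cutoff: float = 0.82) -> bool:
    """Return True if a free-text value appears to describe a female sample."""
    for token in re.findall(r"[a-z]+", raw.lower()):
        if token in ("male", "males"):
            continue  # never let "male" fuzzy-match "female"
        if "female" in token or token in CURATED_SYNONYMS:
            return True
        # Fuzzy matching catches misspellings such as "femlale" or "remale".
        if difflib.get_close_matches(token, ["female"], n=1, cutoff=cutoff):
            return True
    return False


values = ["18-day pregnant females", "femlale", "remale", "mare", "thelytoky", "f"]
for value in values:
    print(f"{value!r}: {mentions_female(value)}")
```

Values such as "f", "worker", or "thelytoky" still fall through this kind of matching, which is the argument for annotating with a shared ontology term when the data are submitted rather than repairing strings afterwards.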
Motivation is to represent biology
accurately
 Inferences and decisions we make are
based upon what we know of the
biological reality.
 An ontology is a computable
representation of this underlying biological
reality.
 Enables a computer to reason over the
data in (some of) the ways that we do.
Annotation bottleneck
 Even the best research will be for naught if
data can never be found again.
 An active lab can easily generate 10-100GB of
data per month, and it is very difficult to
manage on this scale.
 Must be annotated at the rate at which it is
generated
 And the data must be integrated with other data
 Furthermore, the effort put into generating this data
will be utterly wasted if the curated data cannot be
reliably computed upon.
HOW TO BUILD ONTOLOGIES
Ontologies must be shared
 Communities form scientific theories
 that seek to explain all of the existing evidence
 and can be used for prediction
 The computable representation must also be
shared
 Thus ontology development is inherently
collaborative
Ontologies must be used
 Usage feeds back on ontology development and
improves the ontology
 It improves even more when these data are used
to answer research questions
 There will be fewer problems in the ontology and
more commitment to fixing remaining problems
when important research data is involved that
scientists depend upon
Why do we need rules for good
ontology?
 Ontologies must be intelligible
 To humans (for annotation) and
 To machines (for searching, reasoning and error-checking)
 Makes it easier to find the most accurate term(s) to use
 Avoids annotation errors
 Makes it easier for new curators to learn and understand
 Makes it easier to combine with other ontologies and terminologies
 Makes automatic reasoning possible for searching & inference
 Bottom line:
 Following basic rules makes more useful ontologies
First Rule: Univocity
 Terms (including those describing
relations) should have the same meanings
on every occasion of use.
 In other words, they should refer to the
same kinds of entities in reality
[Figure: “Glucose synthesis” and “Gluconeogenesis”: the same process?]
The Challenge of Univocity:
People call the same thing by different names
Comparison is difficult, especially across species or across
databases that each use one of these different variants
Disambiguation
 Use a single term, and
plenty of synonyms
 Gluconeogenesis
 Synonyms:
 Glucose synthesis
 Glucose biosynthesis
 Glucose formation
 Glucose anabolism
Bud initiation? How is a
computer to know?
= tooth bud initiation
= cellular bud initiation
= flower bud initiation
Include plain “bud initiation” as a synonym for each of
these terms
Classification rule:
Disambiguation
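The synonym strategy on these slides can be sketched as a lookup index. This is an illustrative sketch only: the `EX:` identifiers are hypothetical placeholders rather than real ontology IDs, and `build_index` and `lookup` are names invented here. It shows one label resolving to a single term (gluconeogenesis) and another resolving to several (bud initiation), which is the signal that a human or the surrounding context has to disambiguate.

```python
from collections import defaultdict

# Each term has one primary name and any number of synonyms.
# The identifiers are hypothetical placeholders, not real ontology IDs.
TERMS = {
    "EX:0001": {"name": "gluconeogenesis",
                "synonyms": ["glucose synthesis", "glucose biosynthesis",
                             "glucose formation", "glucose anabolism"]},
    "EX:0002": {"name": "tooth bud initiation", "synonyms": ["bud initiation"]},
    "EX:0003": {"name": "cellular bud initiation", "synonyms": ["bud initiation"]},
    "EX:0004": {"name": "flower bud initiation", "synonyms": ["bud initiation"]},
}


def build_index(terms):
    """Map every name and synonym (lower-cased) to the set of term IDs using it."""
    index = defaultdict(set)
    for term_id, term in terms.items():
        for label in [term["name"], *term["synonyms"]]:
            index[label.lower()].add(term_id)
    return index


def lookup(index, text):
    """Return the sorted term IDs that a label could refer to."""
    return sorted(index.get(text.lower(), set()))


index = build_index(TERMS)
print(lookup(index, "Glucose synthesis"))  # one candidate term
print(lookup(index, "bud initiation"))     # three candidates: ambiguous
```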
Second Rule: Positivity
 Complements of classes are not
themselves classes.
 Terms such as ‘non-mammal’ or ‘non-
membrane’ do not designate genuine
classes.
The Challenge of Positivity
Some organelles are membrane-bound.
A centrosome is not a membrane bound organelle,
but it still may be considered an organelle.
Positivity
 Note the logical difference between
 “non-membrane-bound organelle” and
 “not a membrane-bound organelle”
 The latter includes everything that is not a
membrane bound organelle!
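The logical difference can be made concrete with sets. This is a toy sketch: the universe of entities and the membership of each set are illustrative assumptions, chosen only to show that the complement of a class sweeps in things that are not organelles at all.

```python
# Toy universe of entities; membership is illustrative only.
universe = {"mitochondrion", "chloroplast", "centrosome", "ribosome",
            "glucose", "femur"}
organelles = {"mitochondrion", "chloroplast", "centrosome", "ribosome"}
membrane_bound_organelles = {"mitochondrion", "chloroplast"}

# "non-membrane-bound organelle": still an organelle, just without a membrane.
non_membrane_bound_organelles = organelles - membrane_bound_organelles

# "not a membrane-bound organelle": the complement, i.e. everything else.
not_membrane_bound_organelles = universe - membrane_bound_organelles

print(sorted(non_membrane_bound_organelles))
print(sorted(not_membrane_bound_organelles))
```

The first set is a positively defined subclass of organelle; the second also contains glucose and a femur, which is why complements make poor classes.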
Third Rule: Objectivity
 Which classes exist is not a function of our
biological knowledge.
 Terms such as ‘unknown’ or
‘unclassified’ or ‘unlocalized’ do not
designate biological natural kinds.
Objectivity
 How can we annotate when we know that
we don’t have any information?
 Annotate to root nodes and use the ND (no data)
evidence code
 Similar strategies can be used for any
situation where more specific information is not
yet known
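A minimal sketch of the root-node strategy, assuming a simple annotation record. The `Annotation` class and the `annotate_no_data` helper are hypothetical names; the three identifiers are the Gene Ontology root terms, and "ND" is the evidence code named on the slide.

```python
from dataclasses import dataclass

# Root terms of the three Gene Ontology aspects.
GO_ROOTS = {
    "molecular_function": "GO:0003674",
    "biological_process": "GO:0008150",
    "cellular_component": "GO:0005575",
}


@dataclass(frozen=True)
class Annotation:
    gene: str           # gene or gene product being annotated
    term_id: str        # ontology term the gene is annotated to
    evidence_code: str  # e.g. "ND" when nothing is known


def annotate_no_data(gene: str, aspect: str) -> Annotation:
    """Record 'we looked and found nothing' without inventing an 'unknown' class."""
    return Annotation(gene, GO_ROOTS[aspect], "ND")


# "geneX" is a made-up gene name used only for the example.
print(annotate_no_data("geneX", "molecular_function"))
```

The lack of knowledge is stored in the evidence code, so the ontology itself does not need an "unknown" term.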
GPCRs with unknown ligands
Annotate
to this
Ontologies are graphs, where the nodes (terms in the
ontology ) are connected by edges (relationships
between the terms)
is-a
part-of
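As a rough sketch of the graph view, the terms can be stored as nodes with typed edges and queried by walking those edges. The specific edges below are an illustrative assumption about how membrane, chloroplast, and cell terms relate; `build_graph` and `ancestors` are names invented for this example.

```python
from collections import defaultdict

# (child, relationship, parent) triples; an illustrative fragment only.
EDGES = [
    ("mitochondrial membrane", "is_a", "membrane"),
    ("chloroplast membrane", "is_a", "membrane"),
    ("chloroplast membrane", "part_of", "chloroplast"),
    ("chloroplast", "part_of", "cell"),
]


def build_graph(edges):
    """Index outgoing edges by child term."""
    graph = defaultdict(list)
    for child, relation, parent in edges:
        graph[child].append((relation, parent))
    return graph


def ancestors(graph, term, relations=("is_a", "part_of")):
    """Collect every term reachable from `term` over the given relationship types."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in graph.get(current, []):
            if relation in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


graph = build_graph(EDGES)
print(sorted(ancestors(graph, "chloroplast membrane")))
print(sorted(ancestors(graph, "chloroplast membrane", relations=("is_a",))))
```

Restricting the traversal to one relationship type changes the answer, which is why each edge needs a defined meaning.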
Fourth Rule: Use defined
relationships
mitochondrial
membrane
chloroplast
Cell
membrane
Chloroplast
membrane
Reasoning is critical
 Prokaryotic cell and
Eukaryotic cell are
declared disjoint
 Fungal cell is a
Eukaryotic cell
 Spore is a Fungal cell
and a Prokaryotic cell
Satisfiable?
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022006
Prokaryotic
Cell
Eukaryotic
Cell
Fungal
Cell
Spore
disjoint
Reasoning is critical
Solution: clarify spore
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022006
Prokaryotic
Cell
Eukaryotic
Cell
Fungal
Cell
disjoint
Actinomycete
Type Spore
Mycetozoa
Type Spore
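The spore example can be reproduced with a very small consistency check. This is a simplified sketch, not an OWL reasoner: it only follows is_a edges and flags a class whose superclasses include both members of a disjoint pair. The parents chosen for the two clarified spore classes are an assumption, since the slide's edges did not survive extraction.

```python
DISJOINT = [("Prokaryotic cell", "Eukaryotic cell")]

# Original modelling: Spore is both a Fungal cell and a Prokaryotic cell.
IS_A_BEFORE = {
    "Fungal cell": ["Eukaryotic cell"],
    "Spore": ["Fungal cell", "Prokaryotic cell"],
}

# Clarified modelling; the parent of each spore type is assumed for illustration.
IS_A_AFTER = {
    "Fungal cell": ["Eukaryotic cell"],
    "Actinomycete Type Spore": ["Prokaryotic cell"],
    "Mycetozoa Type Spore": ["Eukaryotic cell"],
}


def superclasses(cls, is_a):
    """Return cls plus every class reachable through is_a edges."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable(is_a, disjoint):
    """List classes that would have to belong to two disjoint classes at once."""
    classes = set(is_a) | {p for parents in is_a.values() for p in parents}
    bad = []
    for cls in sorted(classes):
        supers = superclasses(cls, is_a)
        if any(a in supers and b in supers for a, b in disjoint):
            bad.append(cls)
    return bad


print(unsatisfiable(IS_A_BEFORE, DISJOINT))
print(unsatisfiable(IS_A_AFTER, DISJOINT))
```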
Fifth Rule: Intelligibility of Definitions
 The terms used in a definition should be
simpler (more intelligible) than the term to
be defined
 otherwise the definition provides no
assistance
 to human understanding
 for machine processing
Sixth Rule: Keep it Real
 When building or maintaining an ontology,
always think carefully about how classes
(types, kinds, species) relate to instances
in reality
The Rules
1. Univocity: Terms should have the same meanings
on every occasion of use
2. Positivity: Terms such as ‘non-mammal’ or ‘non-
membrane’ do not designate genuine classes.
3. Objectivity: Terms such as ‘unknown’ or
‘unclassified’ or ‘unlocalized’ do not designate
biological natural kinds.
4. Single Inheritance: No class in a classification
hierarchy should have more than one is_a parent
on the immediate higher level
5. Intelligibility of Definitions: The terms used in a
definition should be simpler (more intelligible) than
the term to be defined
6. Basis in Reality: When building or maintaining an
ontology, always think carefully about how classes
relate to instances in reality
7. Distinguish Universals and Instances
How to best describe biology?
Natural language:
+ Large existing body of information
+ Highly expressive
- Ambiguous (making it difficult and unreliable to compute on)
Computable ontology:
- Less expressive
+ Logical
+ Precise
ONTOLOGIES AND BIOLOGY
Without rigor, we won’t know what we
know, where to find it, or what we can infer
from it.
GENOME ANNOTATION
Apollo
Once a genome is sequenced…
 What are the parts? (sequence features)
 Protein coding genes (coding sequence)
 Non-coding RNAs (rRNA, snoRNA, tRNA, microRNA,
antisense RNA)
 Promoters and regulatory regions
 Transposons
 Recombination hotspots, origins of replication
 Centromeres & telomeres
 …
ComputeCrawler
RepeatMasker
Genscan
FgenesH
Grail
Blast
Sim4
Genewise
Lap
CGTGTGCGCAGGGGGATATGCGGCGCATATTGTGTTGAAGAGATGCGCTGCATTTCGCGATGCCGATTAGGNCACAGGGAA
DNA on a linear coordinate
Little boxes
de novo predictions
protein alignments
transcript alignments
full length cDNAs
APOLLO
annotation editing environment
BECOMING ACQUAINTED WITH APOLLO
Color by CDS frame,
toggle strands, set color
scheme and highlights.
Upload evidence files
(GFF3, BAM, BigWig),
add combination and
sequence search
tracks.
Query the genome using
BLAT.
Navigation and zoom.
Search for a gene
model or a scaffold.
Get coordinates and “rubber
band” selection for zooming.
Login
User-created
annotations.
Annotator
panel.
Evidence
Tracks
Stage and
cell-type
specific
transcription
data.
http://genomearchitect.org/web_apollo_user_guide
Coordinate transforms:
 Curator ‘ligation’
 Intron folding
Alterations: whether experimental artifacts or
natural differences
 Substitutions
 Insertions
 Deletions
 Impact
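The three kinds of alteration can be sketched as plain string operations on a sequence. This is an illustration of the concepts only and says nothing about how Apollo implements them: the function names, the 0-based coordinates, and the frame-shift test (a length change that is not a multiple of three) are assumptions made for the example.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at 0-based position `pos`."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq: str, pos: int, bases: str) -> str:
    """Insert `bases` before 0-based position `pos`."""
    return seq[:pos] + bases + seq[pos:]


def delete(seq: str, pos: int, length: int) -> str:
    """Remove `length` bases starting at 0-based position `pos`."""
    return seq[:pos] + seq[pos + length:]


def shifts_frame(original: str, altered: str) -> bool:
    """Simplified impact check: does the length change break the codon frame?"""
    return (len(altered) - len(original)) % 3 != 0


reference = "CGTAGC"
for label, altered in [
    ("substitution", substitute(reference, 4, "C")),
    ("insertion", insert(reference, 3, "AC")),
    ("deletion", delete(reference, 1, 3)),
]:
    print(label, altered, "frame shift:", shifts_frame(reference, altered))
```

The frame check only matters inside coding sequence; a fuller impact assessment would also need the gene model.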
APOLLO ON THE WEB
instructions
Username:
user.number@example.com
Password:
usernumber
Email Password Server Begin at
user.one@example.com userone 1 1
user.two@example.com usertwo 2 1
user.three@example.com userthree 3 1
user.four@example.com userfour 4 1
user.five@example.com userfive 5 1
user.six@example.com usersix 1 7
user.seven@example.com userseven 2 7
user.eight@example.com usereight 3 7
user.nine@example.com usernine 4 7
user.ten@example.com userten 5 7
user.eleven@example.com usereleven 1 1
user.twelve@example.com usertwelve 2 1
user.thirteen@example.com userthirteen 3 1
user.fourteen@example.com userfourteen 4 1
user.fifteen@example.com userfifteen 5 1
user.sixteen@example.com usersixteen 1 7
user.seventeen@example.com userseventeen 2 7
user.eightteen@example.com usereighteen 3 7
user.nineteen@example.com usernineteen 4 7
user.twenty@example.com usertwenty 5 7
user.twentyone@example.com usertwentyone 1 1
user.twentytwo@example.com usertwentytwo 2 1
user.twentythree@example.com usertwentythree 3 1
user.twentyfour@example.com usertwentyfour 4 1
user.twentyfive@example.com usertwentyfive 5 1
user.twentysix@example.com usertwentysix 1 7
user.twentyseven@example.com usertwentyseven 2 7
user.twentyeight@example.com usertwentyeight 3 7
user.twentynine@example.com usertwentynine 4 7
Server URL
1 http://ec2-52-63-181-136.ap-southeast-2.compute.amazonaws.com/apollo/
2 http://ec2-52-64-198-214.ap-southeast-2.compute.amazonaws.com/apollo/
3 http://ec2-52-62-166-89.ap-southeast-2.compute.amazonaws.com/apollo/
4 http://ec2-52-64-182-170.ap-southeast-2.compute.amazonaws.com/apollo/
5 http://ec2-52-63-255-136.ap-southeast-2.compute.amazonaws.com/apollo/
For example – ontologically described
genotypes/variants
[Figure: the genotype apchu745/+; fgf8ati282/ti282(AB) broken down into ontology terms. An intrinsic genotype = genomic variation complement + genomic background (AB). The parts are linked by has_part: genomic variation complement, variant single locus complement (e.g. apchu745/+), variant allele (e.g. apchu745), and sequence alteration (e.g. hu745). Aligned sequence panels (e.g. CGTAGC vs. CGTACC) mark the altered bases with an X.]
FUNCTIONAL ANNOTATION
Phylogenetic Annotation Inferencing Tool
— PAINT
Evolutionary history is the
natural way to organize and
analyze biological data
Ancestral inference
• Integration at points of common ancestry
• Infer “hidden” character of living organisms
• Explicitly leverage evolutionary relationships
[Figure: gene tree of the MTHFR family, drawn along a divergence axis. Leaves: E.c.; A.t. MTHFR1; A.t. MTHFR2; D.d.; S.p.; S.c. MET13; D.m.; A.g.; S.p.; S.c. MET12; C.e.; D.r.; G.g.; H.s. MTHFR; R.n.; M.m.]
Biochemistry: purification and assay
Genetics: mutant phenotypes
What is transitive annotation?
 Related genes have a common function because their common
ancestor had that function.
 Not just an inference about one gene. It is also an inference for
 The most recent common ancestor (MRCA)
 Continuous inheritance since the MRCA
 Potential inheritance by other descendants of the MRCA
[Figure: a gene tree with genes in yeast, mouse, zebrafish, and human descending from a gene in the opisthokont MRCA; every node and branch is labelled "Function X".]
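The logic of transitive annotation can be sketched on a toy tree. This is a deliberately simplified illustration and not the PAINT procedure, which relies on curator judgement: the tree shape, the choice of which leaves carry experimental evidence, and the rule that an ancestor is annotated when at least two of its child lineages hold evidence are all assumptions made for this example.

```python
# Parent -> children; a toy tree loosely following the slide.
TREE = {
    "opisthokont MRCA": ["yeast gene", "vertebrate ancestor"],
    "vertebrate ancestor": ["zebrafish gene", "mouse gene", "human gene"],
}

# Assumed experimental annotations (which leaves have them is hypothetical).
EXPERIMENTAL = {"yeast gene": {"Function X"}, "mouse gene": {"Function X"}}


def descendants(tree, node):
    """Return all nodes below `node`, depth first."""
    result = []
    for child in tree.get(node, []):
        result.append(child)
        result.extend(descendants(tree, child))
    return result


def has_evidence(tree, node, experimental, function):
    """True if `node` or anything below it is experimentally annotated."""
    nodes = [node] + descendants(tree, node)
    return any(function in experimental.get(n, set()) for n in nodes)


def propagate(tree, ancestor, experimental, function):
    """Annotate the ancestor from its descendants, then its other descendants."""
    supported = [child for child in tree.get(ancestor, [])
                 if has_evidence(tree, child, experimental, function)]
    if len(supported) < 2:
        return {}  # not enough independent lineages to infer ancestry
    inferred = {ancestor: "inferred by descendants"}
    for node in descendants(tree, ancestor):
        if function in experimental.get(node, set()):
            inferred[node] = "experimental"
        else:
            inferred[node] = "inferred by ancestry"
    return inferred


for node, basis in propagate(TREE, "opisthokont MRCA", EXPERIMENTAL, "Function X").items():
    print(f"{node}: {basis}")
```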
• Green indicates experimental
• Black dot indicates direct
experimental data.
dot indicates a more
general functional class
inferred from ontology
Red indicates NOT
function for the gene
All nodes have persistent identifiers
which are retained across different
builds of the protein family trees.
cholinesterase
carboxylic ester hydrolase
Evolutionary event type:
duplication
speciation
• PAINTed nodes –
• 3 steps carried out by
curator
• Gain & Loss of function
• Inferred By Descendants
• Experimental annotations
provide evidence
• Inferred by Ancestry
• Propagation to
unannotated leaves
carboxylic ester
hydrolase
Node with loss of function
Gaudet, P., et al. (2011). Phylogenetic-
based propagation of functional
annotations within the Gene
Ontology consortium. Briefings in
Bioinformatics, 12(5), 449–62.
doi:10.1093/bib/bbr042
Node with gain of
function- cholinesterase
PGM1
subfamily
PGM5
subfamily
Curated active site information from CDD (cd03085)
phosphoglucomutase
Duplication
event
http://questfororthologs.org/
FUNCTIONAL ANNOTATION
Noctua for Building Models of Biology
Motivation: multi-scale knowledge
models of mechanistic biology
Bai, J. P. F., & Abernethy, D. R. (n.d.). Systems Pharmacology to Predict Drug Toxicity: Integration Across Levels of Biological Organization,
451–473. doi:10.1146/annurev-pharmtox-011112-140248
A data model for causal ontology
annotations: “LEGO”
Activity
GO:nnnnnnn
What: <molecule>
Where: GO/CL/Uberon
Relationship
RO:nnnnnnn
Activity
GO:nnnnnnn
What: <molecule>
Where: GO/CL/Uberon
Evidence: ECO, SEPIO
Source: PMID, ORCID, ...
Process
GO:nnnnnnn
A data model for causal ontology
annotations: “LEGO” (example)
GTPase activity
GO:0003924
What: TEM1 S000004529
Where: spindle pole
GO:0000922
GTPase inhibitor activity
GO:0005095
What: BFA1 S000003814
Where: spindle pole
GO:0000922
Exit from mitosis
GO:0010458
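The boxes in the “LEGO” slides map naturally onto a small record structure. The sketch below is one possible reading, not the Noctua data model: the class and field names (`Activity`, `CausalEdge`, `enabled_by`, `occurs_in`, `part_of`) are invented here, the relation identifier is left as the `RO:nnnnnnn` placeholder from the slides, and both the direction of the edge (BFA1 acting on TEM1) and the assignment of both activities to the process are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    activity: str                  # GO molecular function ID
    enabled_by: str                # "What": gene product identifier
    occurs_in: str                 # "Where": GO/CL/Uberon ID
    part_of: Optional[str] = None  # "Process": GO biological process ID


@dataclass
class CausalEdge:
    source: Activity
    relation: str                  # RO relationship ID
    target: Activity
    evidence: List[str] = field(default_factory=list)    # ECO/SEPIO codes
    references: List[str] = field(default_factory=list)  # PMID, ORCID, ...


EXIT_FROM_MITOSIS = "GO:0010458"
SPINDLE_POLE = "GO:0000922"

tem1 = Activity("GO:0003924", "S000004529", SPINDLE_POLE, EXIT_FROM_MITOSIS)
bfa1 = Activity("GO:0005095", "S000003814", SPINDLE_POLE, EXIT_FROM_MITOSIS)

# Placeholder relation ID kept from the slides; the edge direction is assumed.
edge = CausalEdge(source=bfa1, relation="RO:nnnnnnn", target=tem1)

print(edge.source.enabled_by, edge.relation, edge.target.enabled_by)
```

Keeping evidence and source on the edge, rather than on the activities, lets each causal claim carry its own provenance.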
http://noctua.berkeleybop.org/
Collaborative
Editing!
RDF/OWL
Semantic
Representation
Pathway data
-Reasoning
-Linked data
Gene sets
Building causal models
of biology
using ontologies
Diabetes mockup example
https://vimeo.com/channels/Noctua

Annotation Systems & Implementation Issues - Suzanna Lewis

  • 1. Motivation Your research is valuable All advances in knowledge are incremental, with each new idea ultimately building on earlier knowledge such as you are gathering.
  • 2. Losing data at a rapid rate up to 80% unavailable after 20 years 2 http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
  • 3. Data valuation  Information is infinitely shareable without any loss of value  Reuse increases the value derived from the original investment  By combining data, their value increases  The more these assets are used, the more additional knowledge can be gathered (data science)  As a corollary, unshared or insufficiently documented information is less valuable  The more accurate and complete the information is, the more useful, and therefore valuable, it is Moody and Walsh 1999
  • 4.
  • 5.
  • 7. What kinds of things exist? What are the relationships between these things? [Figure labels: eye; ommatidium; sense organ; eye disc; relations is_a, part_of, develops from.] A biological ontology is a machine-interpretable representation of some aspect of biological reality.
  • 8. October 25, 2016. Ontology defined: The science of what is: of the kinds and structures of objects, and their properties and relations in every area of reality. The classification of entities and the relations between them. Defined by a scientific field's vocabulary and by the canonical formulations of its theories. Seeks to solve problems which arise in these domains.
  • 10. Ontologies help with decision making: a handy ontology tells us what’s there… Where should I eat…?
  • 11. Ontologies don’t just organize data; they also facilitate inference, and that creates new knowledge, often unconsciously in the user. [Figure labels: (presumable) country of origin; type of cuisine.]
  • 12. What a 5 year old child (or a computer) will likely infer about the world from this helpful ontology… Flag of fresh juice. ‘Frozen Yogurt’ cuisine in search of a national identity? Where delicatessen food hails from… Fresh Juice is a national cuisine…
  • 13. Information retrieval is not straightforward. Values submitted for "female": 18-day pregnant females; female (lactating); individual female; worker caste (female); 2 yr old female; female (pregnant); lgb*cc females; sex: female; 400 yr. old female; female (outbred); mare; female, other; adult female; female parent; female (worker); female child; asexual female; female plant; monosex female; femal; femlale; diploid female; female(gynoecious); remale; metafemale; f; femele; semi-engorged female; sterile female; famale; female, pooled; sexual oviparous female; normal female; femail; femalen; sterile female worker; sf; female; females; strictly female; vitellogenic replete female; female - worker; females only; tetraploid female; worker; female (alate sexual); gynoecious; thelytoky; hexaploid female; female (calf); healthy female; female (gynoecious); female (f-o); hen; probably female (based on morphology); castrate female; female with eggs; ovigerous female; 3 female; cf.female; female worker; oviparous sexual females; female (phenotype); cystocarpic female; female, 6-8 weeks old; worker bee; female mice; dikaryon; female, virgin; female enriched; female, spayed; dioecious female; female, worker; pseudohermaprhoditic female. Courtesy of N. Silvester and S. Orchard, European Nucleotide Archive, EMBL-EBI
  • 14. Motivation is to represent biology accurately: The inferences and decisions we make are based upon what we know of biological reality. An ontology is a computable representation of this underlying biological reality. It enables a computer to reason over the data in (some of) the ways that we do.
  • 15. Annotation bottleneck: Even the best research will be for naught if the data can never be found again. An active lab can easily generate 10-100 GB of data per month, which is very difficult to manage at this scale. Data must be annotated at the rate at which it is generated, and it must be integrated with other data. Furthermore, the effort put into generating this data will be utterly wasted if the curated data cannot be reliably computed upon.
  • 16. HOW TO BUILD ONTOLOGIES
  • 17. Ontologies must be shared: Communities form scientific theories that seek to explain all of the existing evidence and can be used for prediction. The computable representation must also be shared. Thus ontology development is inherently collaborative.
  • 18. Ontologies must be used: Usage feeds back on ontology development and improves the ontology. It improves even more when these data are used to answer research questions. There will be fewer problems in the ontology, and more commitment to fixing the remaining ones, when important research data that scientists depend upon is involved.
  • 19. Why do we need rules for good ontology? Ontologies must be intelligible to humans (for annotation) and to machines (for searching, reasoning and error-checking). Intelligibility makes it easier to find the most accurate term(s) to use; avoids annotation errors; makes it easier for new curators to learn and understand; makes it easier to combine with other ontologies and terminologies; and makes automatic reasoning possible for searching and inference. Bottom line: following basic rules makes more useful ontologies.
  • 20. First Rule: Univocity. Terms (including those describing relations) should have the same meanings on every occasion of use. In other words, they should refer to the same kinds of entities in reality.
  • 21. The Challenge of Univocity: People call the same thing by different names. [Figure labels: Glucose synthesis; Gluconeogenesis; Glucose synthesis?]
  • 22. Comparison is difficult, especially across species or across databases that each use one of these different variants. Disambiguation: use a single term, and plenty of synonyms. Term: Gluconeogenesis. Synonyms: Glucose synthesis; Glucose biosynthesis; Glucose formation; Glucose anabolism.
  • 23. Bud initiation? How is a computer to know?
  • 24. Classification rule: Disambiguation. [Figure labels: tooth bud initiation; cellular bud initiation; flower bud initiation.] Include plain “bud initiation” as a synonym for each of these terms.
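The disambiguation rule on the preceding slides (one primary term, many synonyms, and a shared synonym such as "bud initiation" attached to several terms) can be sketched as a small lookup. This is a minimal illustration, not how any particular ontology tool stores synonyms: the `TERMS` table, `build_index`, and `resolve` are hypothetical names, and only the term and synonym strings come from the slides.

```python
from collections import defaultdict

# Hypothetical mini-ontology: primary term -> synonyms (strings taken from the slides).
TERMS = {
    "gluconeogenesis": ["glucose synthesis", "glucose biosynthesis",
                        "glucose formation", "glucose anabolism"],
    "tooth bud initiation": ["bud initiation"],
    "cellular bud initiation": ["bud initiation"],
    "flower bud initiation": ["bud initiation"],
}

def build_index(terms):
    """Map every label (primary name or synonym) to the primary terms it may denote."""
    index = defaultdict(set)
    for primary, synonyms in terms.items():
        index[primary.lower()].add(primary)
        for synonym in synonyms:
            index[synonym.lower()].add(primary)
    return index

def resolve(label, index):
    """Return the sorted candidate primary terms for a free-text label."""
    return sorted(index.get(label.strip().lower(), ()))

INDEX = build_index(TERMS)
print(resolve("Glucose synthesis", INDEX))  # one candidate: unambiguous
print(resolve("bud initiation", INDEX))     # three candidates: a curator must choose
```

A label that resolves to exactly one term can be annotated automatically; a label that resolves to several is flagged for a human, which is the point of keeping the ambiguous phrase as a synonym rather than as a term.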
  • 25. Second Rule: Positivity. Complements of classes are not themselves classes. Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine classes.
  • 26. The Challenge of Positivity: Some organelles are membrane-bound. A centrosome is not a membrane-bound organelle, but it may still be considered an organelle.
  • 27. Positivity: Note the logical difference between “non-membrane-bound organelle” and “not a membrane-bound organelle”. The latter includes everything that is not a membrane-bound organelle!
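The logical difference noted on the Positivity slide can be made concrete with plain sets. The sketch below uses a toy universe whose membership is illustrative only (apart from the centrosome example, which comes from the slides); the variable names are hypothetical.

```python
# Toy universe of entities; membership is illustrative, not curated.
UNIVERSE = {"mitochondrion", "chloroplast", "centrosome", "ribosome", "glucose", "eye"}
ORGANELLES = {"mitochondrion", "chloroplast", "centrosome", "ribosome"}
MEMBRANE_BOUND_ORGANELLES = {"mitochondrion", "chloroplast"}

# "non-membrane-bound organelle": still an organelle, just without a bounding membrane.
non_membrane_bound_organelles = ORGANELLES - MEMBRANE_BOUND_ORGANELLES

# "not a membrane-bound organelle": the complement over everything that exists.
not_membrane_bound_organelles = UNIVERSE - MEMBRANE_BOUND_ORGANELLES

print(sorted(non_membrane_bound_organelles))
print(sorted(not_membrane_bound_organelles))
```

The first set stays inside the organelle class; the second also picks up glucose and the eye, which is why a bare complement makes a poor ontology class.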
  • 28. Third Rule: Objectivity. Which classes exist is not a function of our biological knowledge. Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.
  • 29. Objectivity: How can we annotate when we know that we don’t have any information? Annotate to root nodes and use the ND (no data) evidence code. Similar strategies can be used for any situation where more specific information is not yet known.
  • 30. GPCRs with unknown ligands. [Figure callout: Annotate to this.]
  • 31. Fourth Rule: Use defined relationships. Ontologies are graphs, where the nodes (terms in the ontology) are connected by edges (relationships between the terms), such as is-a and part-of. [Figure labels: mitochondrial membrane; chloroplast; cell membrane; chloroplast membrane.]
  • 32. Reasoning is critical: Prokaryotic cell and Eukaryotic cell are declared disjoint. Fungal cell is a Eukaryotic cell. Spore is a Fungal cell and a Prokaryotic cell. Satisfiable? [Figure labels: Prokaryotic Cell; Eukaryotic Cell; Fungal Cell; Spore; disjoint.] http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022006
  • 33. Reasoning is critical. Solution: clarify spore. [Figure labels: Prokaryotic Cell; Eukaryotic Cell; Fungal Cell; disjoint; Actinomycete Type Spore; Mycetozoa Type Spore.] http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022006
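The spore example on the two preceding slides can be checked mechanically. The following is a minimal sketch of the idea, not an OWL reasoner: it follows is_a edges upward and reports any class that ends up under both members of a disjoint pair. The function names and the `FIXED` hierarchy are hypothetical; in particular, placing the actinomycete-type spore under prokaryotic cell and the mycetozoa-type spore under eukaryotic cell is an assumption, since the slide shows only the class names.

```python
# is_a edges as stated on the slide: class -> direct superclasses.
IS_A = {
    "eukaryotic cell": [],
    "prokaryotic cell": [],
    "fungal cell": ["eukaryotic cell"],
    "spore": ["fungal cell", "prokaryotic cell"],
}
DISJOINT = [("prokaryotic cell", "eukaryotic cell")]

def ancestors(term, is_a):
    """All classes reachable through is_a, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen

def unsatisfiable(is_a, disjoint):
    """Classes that fall under both members of some disjoint pair."""
    bad = []
    for term in is_a:
        supers = ancestors(term, is_a)
        if any(a in supers and b in supers for a, b in disjoint):
            bad.append(term)
    return sorted(bad)

# Assumed repair: split "spore" into two classes with one lineage each.
FIXED = {
    "eukaryotic cell": [],
    "prokaryotic cell": [],
    "fungal cell": ["eukaryotic cell"],
    "actinomycete type spore": ["prokaryotic cell"],
    "mycetozoa type spore": ["eukaryotic cell"],
}

print(unsatisfiable(IS_A, DISJOINT))   # the original hierarchy has a problem class
print(unsatisfiable(FIXED, DISJOINT))  # the split hierarchy has none
```

A real reasoner handles far richer axioms, but even this transitive-closure check shows why declaring disjointness pays off: the contradiction is found without anyone inspecting the spore class by hand.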
  • 34. Fifth Rule: Intelligibility of Definitions. The terms used in a definition should be simpler (more intelligible) than the term to be defined; otherwise the definition provides no assistance to human understanding or to machine processing.
  • 35. Sixth Rule: Keep it Real. When building or maintaining an ontology, always think carefully about how classes (types, kinds, species) relate to instances in reality.
  • 36. The Rules: 1. Univocity: Terms should have the same meanings on every occasion of use. 2. Positivity: Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine classes. 3. Objectivity: Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds. 4. Single Inheritance: No class in a classification hierarchy should have more than one is_a parent on the immediate higher level. 5. Intelligibility of Definitions: The terms used in a definition should be simpler (more intelligible) than the term to be defined. 6. Basis in Reality: When building or maintaining an ontology, always think carefully about how classes relate to instances in reality. 7. Distinguish Universals and Instances.
  • 37. How to best describe biology? Natural language: + large existing body of information; + highly expressive; - ambiguous (making it difficult and unreliable to compute on). Computable ontology: - less expressive; + logical; + precise.
  • 38. ONTOLOGIES AND BIOLOGY: Without rigor, we won’t know what we know, or where to find it, or what we can infer from it.
  • 39.
  • 41. Once a genome is sequenced… What are the parts (sequence features)? Protein-coding genes (coding sequence); non-coding RNAs (rRNA, snoRNA, tRNA, microRNA, antisense RNA); promoters and regulatory regions; transposons; recombination hotspots, origins of replication; centromeres and telomeres; …
  • 43. DNA on a linear coordinate: little boxes
  • 47. APOLLO annotation editing environment. Becoming acquainted with Apollo: Color by CDS frame, toggle strands, set color scheme and highlights. Upload evidence files (GFF3, BAM, BigWig), add combination and sequence search tracks. Query the genome using BLAT. Navigation and zoom. Search for a gene model or a scaffold. Get coordinates and “rubber band” selection for zooming. Login. User-created annotations. Annotator panel. Evidence tracks. Stage and cell-type specific transcription data. http://genomearchitect.org/web_apollo_user_guide
  • 50. Alterations, whether experimental artifacts or natural differences: Substitutions
  • 51. Alterations, whether experimental artifacts or natural differences: Insertions
  • 52. Alterations, whether experimental artifacts or natural differences: Deletions
  • 53. Alterations, whether experimental artifacts or natural differences: Impact
  • 54. Apollo on the web: instructions. Username: user.number@example.com; Password: usernumber. Columns: Email, Password, Server, Begin at. user.one@example.com, userone, 1, 1; user.two@example.com, usertwo, 2, 1; user.three@example.com, userthree, 3, 1; user.four@example.com, userfour, 4, 1; user.five@example.com, userfive, 5, 1; user.six@example.com, usersix, 1, 7; user.seven@example.com, userseven, 2, 7; user.eight@example.com, usereight, 3, 7; user.nine@example.com, usernine, 4, 7; user.ten@example.com, userten, 5, 7; user.eleven@example.com, usereleven, 1, 1; user.twelve@example.com, usertwelve, 2, 1; user.thirteen@example.com, userthirteen, 3, 1; user.fourteen@example.com, userfourteen, 4, 1; user.fifteen@example.com, userfifteen, 5, 1; user.sixteen@example.com, usersixteen, 1, 7; user.seventeen@example.com, userseventeen, 2, 7; user.eightteen@example.com, usereighteen, 3, 7; user.nineteen@example.com, usernineteen, 4, 7; user.twenty@example.com, usertwenty, 5, 7; user.twentyone@example.com, usertwentyone, 1, 1; user.twentytwo@example.com, usertwentytwo, 2, 1; user.twentythree@example.com, usertwentythree, 3, 1; user.twentyfour@example.com, usertwentyfour, 4, 1; user.twentyfive@example.com, usertwentyfive, 5, 1; user.twentysix@example.com, usertwentysix, 1, 7; user.twentyseven@example.com, usertwentyseven, 2, 7; user.twentyeight@example.com, usertwentyeight, 3, 7; user.twentynine@example.com, usertwentynine, 4, 7. Server URLs: 1: http://ec2-52-63-181-136.ap-southeast-2.compute.amazonaws.com/apollo/; 2: http://ec2-52-64-198-214.ap-southeast-2.compute.amazonaws.com/apollo/; 3: http://ec2-52-62-166-89.ap-southeast-2.compute.amazonaws.com/apollo/; 4: http://ec2-52-64-182-170.ap-southeast-2.compute.amazonaws.com/apollo/; 5: http://ec2-52-63-255-136.ap-southeast-2.compute.amazonaws.com/apollo/
  • 55. For example: ontologically described genotypes/variants. [Figure: intrinsic genotype = genomic variation complement + genomic background, shown for apchu745/+; fgf8ati282/ti282 (AB). has_part relations link genomic variation complement, variant single locus complement, variant allele, and sequence alteration, with the examples apchu745/+, apchu745, and hu745, illustrated on example DNA sequences.]
  • 56.
  • 57. FUNCTIONAL ANNOTATION: Phylogenetic Annotation Inferencing Tool (PAINT)
  • 58. Evolutionary history is the natural way to organize and analyze biological data
  • 59. Ancestral inference: Integration at points of common ancestry. Infer “hidden” character of living organisms. Explicitly leverage evolutionary relationships. [Figure labels: gene tree leaves E.c.; A.t. MTHFR1; A.t. MTHFR2; D.d.; S.p.; S.c. MET13; D.m.; A.g.; S.p.; S.c. MET12; C.e.; D.r.; G.g.; H.s. MTHFR; R.n.; M.m.; divergence; Biochemistry: purification and assay; Genetics: mutant phenotypes.]
  • 60. What is transitive annotation? Related genes have a common function because their common ancestor had that function. It is not just an inference about one gene; it is also an inference for the most recent common ancestor (MRCA), for continuous inheritance since the MRCA, and for potential inheritance by other descendants of the MRCA. [Figure labels: Gene in Yeast; Gene in Mouse; Gene in Zebrafish; Gene in Human; Gene in Opisthokont MRCA; Function X.]
  • 61. Green indicates experimental. Black dot indicates direct experimental data. dot indicates a more general functional class inferred from ontology. Red indicates NOT function for the gene. All nodes have persistent identifiers which are retained across different builds of the protein family trees. [Figure labels: cholinesterase; carboxylic ester hydrolase; evolutionary event type: duplication, speciation.]
  • 62. PAINTed nodes: 3 steps carried out by curator. Gain & loss of function. Inferred by descendants: experimental annotations provide evidence. Inferred by ancestry: propagation to unannotated leaves. [Figure labels: carboxylic ester hydrolase; node with loss of function; node with gain of function (cholinesterase).] Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics, 12(5), 449–62. doi:10.1093/bib/bbr042
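The "inferred by ancestry" step described on the PAINT slides, where functions placed on ancestral nodes flow down to unannotated leaves unless a loss is recorded on the way, can be sketched as a recursive walk over a gene tree. This is only an illustration of the propagation idea and not PAINT's implementation: the tree shape, the node names, and where the gain and loss are placed are all hypothetical, and only the two function names come from the slides.

```python
# Hypothetical gene tree: internal node -> children. Leaves have no entry.
TREE = {
    "root": ["dup_A", "dup_B"],
    "dup_A": ["human_1", "mouse_1"],
    "dup_B": ["human_2", "yeast_2"],
}
# Curator decisions (assumed placement): functions gained or lost at a node.
GAINS = {"root": {"carboxylic ester hydrolase"}, "dup_A": {"cholinesterase"}}
LOSSES = {"dup_B": {"carboxylic ester hydrolase"}}

def propagate(tree, gains, losses, node="root", inherited=frozenset()):
    """Return {leaf: inferred functions} by passing annotations down the tree."""
    # A node keeps what it inherited, plus its gains, minus its losses.
    current = (set(inherited) | gains.get(node, set())) - losses.get(node, set())
    children = tree.get(node, [])
    if not children:
        return {node: current}
    result = {}
    for child in children:
        result.update(propagate(tree, gains, losses, child, frozenset(current)))
    return result

for leaf, functions in sorted(propagate(TREE, GAINS, LOSSES).items()):
    print(leaf, sorted(functions))
```

In this toy tree the leaves under `dup_A` inherit both functions, while the leaves under `dup_B` inherit none because the loss is recorded on their ancestor. The earlier "inferred by descendants" step, where experimental evidence on leaves justifies annotating an ancestor, remains a curator judgment and is not modeled here.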
  • 63. [Figure labels: PGM1 subfamily; PGM5 subfamily; curated active site information from CDD (cd03085); phosphoglucomutase; duplication event.]
  • 65. FUNCTIONAL ANNOTATION: Noctua for Building Models of Biology
  • 66. Motivation: multi-scale knowledge models of mechanistic biology. Bai, J. P. F., & Abernethy, D. R. (n.d.). Systems Pharmacology to Predict Drug Toxicity: Integration Across Levels of Biological Organization, 451–473. doi:10.1146/annurev-pharmtox-011112-140248
  • 67. A data model for causal ontology annotations: “LEGO”. Activity: GO:nnnnnnn; What: <molecule>.
  • 68. A data model for causal ontology annotations: “LEGO”. Activity: GO:nnnnnnn; What: <molecule>; Where: GO/CL/Uberon.
  • 69. A data model for causal ontology annotations: “LEGO”. Activity: GO:nnnnnnn; What: <molecule>; Where: GO/CL/Uberon. Relationship: RO:nnnnnnn. Activity: GO:nnnnnnn; What: <molecule>; Where: GO/CL/Uberon.
  • 70. A data model for causal ontology annotations: “LEGO”. Activity: GO:nnnnnnn; What: <molecule>; Where: GO/CL/Uberon. Relationship: RO:nnnnnnn. Activity: GO:nnnnnnn; What: <molecule>; Where: GO/CL/Uberon. Evidence: ECO, SEPIO. Source: PMID, ORCID, ...
  • 71. A data model for causal ontology annotations: “LEGO”. Process: GO:nnnnnnn. Activity: GO:nnnnnnn; What: <molecule>; Where: GO/CL/Uberon. Relationship: RO:nnnnnnn. Activity: GO:nnnnnnn; What: <molecule>; Where: GO/CL/Uberon.
  • 72. A data model for causal ontology annotations: “LEGO”. Activity: GTPase activity (GO:0003924); What: TEM1 (S000004529); Where: spindle pole (GO:0000922). Activity: GTPase inhibitor activity (GO:0005095); What: BFA1 (S000003814); Where: spindle pole (GO:0000922).
  • 73. A data model for causal ontology annotations: “LEGO”. Process: Exit from mitosis (GO:0010458). Activity: GTPase activity (GO:0003924); What: TEM1 (S000004529); Where: spindle pole (GO:0000922). Activity: GTPase inhibitor activity (GO:0005095); What: BFA1 (S000003814); Where: spindle pole (GO:0000922).
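The “LEGO” slides build up a record shape piece by piece: an activity, what carries it out, where it happens, a relationship to another activity, evidence and source, and an enclosing process. One way to picture that shape is as a few small record types, sketched below. The class and field names are hypothetical and this is not Noctua's actual schema. The identifiers for TEM1, BFA1, spindle pole, and exit from mitosis are copied from the slides; the relation is left as the slides' `RO:nnnnnnn` placeholder, and the direction of the edge (BFA1 activity acting on TEM1 activity) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class Activity:
    function: str                 # GO molecular function ID
    label: str                    # human-readable name of the function
    what: str                     # gene product carrying out the activity
    where: Optional[str] = None   # GO/CL/Uberon ID for the location

@dataclass(frozen=True)
class CausalEdge:
    source: Activity
    relation: str                     # RO identifier (placeholder here)
    target: Activity
    evidence: Optional[str] = None    # e.g. an ECO code; none given on the slides
    reference: Optional[str] = None   # e.g. a PMID; none given on the slides

@dataclass
class Model:
    process: str                  # GO biological process ID for the whole model
    activities: List[Activity] = field(default_factory=list)
    edges: List[CausalEdge] = field(default_factory=list)

tem1 = Activity("GO:0003924", "GTPase activity", "TEM1 S000004529", "GO:0000922")
bfa1 = Activity("GO:0005095", "GTPase inhibitor activity", "BFA1 S000003814", "GO:0000922")

model = Model(
    process="GO:0010458",  # exit from mitosis
    activities=[tem1, bfa1],
    # Assumed direction; the relation ID is the slides' unfilled placeholder.
    edges=[CausalEdge(source=bfa1, relation="RO:nnnnnnn", target=tem1)],
)

for edge in model.edges:
    print(edge.source.what, edge.relation, edge.target.what)
```

Keeping evidence and reference on the edge rather than on the model mirrors the slide that attaches "Evidence" and "Source" to an individual assertion, so each causal claim can be traced separately.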