Experiences using logic programming in bioinformatics
Chris Mungall
Berkeley Bioinformatics and Ontologies Group (http://berkeleybop.org)
Lawrence Berkeley National Laboratory
ICLP 2009

Outline
- Biology and biological data integration: a brief introduction
- Obol: first experiences applying LP
- Blipkit: a reusable bioinformatics developer's toolkit
  - Modular structure
  - I/O and relational database connectivity
- Some applications of Blipkit and LP
  - Genes and genomics
  - Phenotype matching
  - Web applications
- Conclusions
  - Where next?
  - Some recommendations for the LP community

The promise and challenges of biological research
- Why study biological systems?
  - Because they're fascinating
  - Improve health
  - Improve the environment
- But biology is hard
  - Biological systems are extremely diverse
  - Biology deals with phenomena at multiple levels of granularity
  - There is a deluge of data
- Bioinformatics
  - Biology as an information science
  - Computational methods are vital to understanding

Diversity of biological systems
Biology in the small: molecules (figures: DNA; RNA pseudoknot)
Cells and organismal biology (figures: gastrulation, blastula to gastrula; neuron anatomy: axon terminal, dendrite, node of Ranvier, soma, axon, Schwann cell, cell nucleus, myelin sheath)
Ecosystems
Bio-databases
- 1200 biological databases published in Nucleic Acids Research
  - Many more unpublished
  - Many of these are database federations (e.g. Ensembl)
- Heterogeneous systems
  - Storage mechanisms: relational, XML, flat files, ad hoc (semi-structured, natural language)
  - Limited APIs: lack of standards, limited query expressivity
- Poorly integrated
  - Limited integration beyond identifier cross-references
  - Users must integrate manually
- Bioinformatics runs on Perl glue
(figure labels: metabolic pathways, mutants, genes, fruit flies, tumors)

Data interrogation and discovery
- Sample tasks
  - Find mutations in regions upstream of neurotransmitter-producing genes
  - Find drug targets or animal models for neurodegenerative diseases
  - Which biological pathways are enriched in high-acidity environments?
- Answering each of these is difficult
  - Manual aggregation from many databases
  - Various kinds of inference required

OBO: Open Biological Ontologies (figure: ontologies ranging from small to large)

Obol: first experience with LP in bioinformatics
- Problem
  - Many existing bio-ontologies were in fact more like terminologies
    - Basic axioms, is_a hierarchies
  - Deeper logical structure was implicit in the terms
    - Long noun phrases, recursively composed
    - "regulation of transcription during G1 phase of mitotic cell cycle"
- Existing solutions (2004)
  - Take advantage of the semi-controlled syntax of terms
  - Parse using ad hoc regular expressions (the influence of Perl in bioinformatics!)
  - But context-free grammars (at least) were required

A better solution: Definite Clause Grammars
- Obol: a collection of domain-specific DCGs
- Significant improvement over Perl regexes
  - Declarative
  - More expressive
  - Integration with simple reasoning
  - Bi-directional: can be used for term generation from logical expressions

Example process grammar

    process(P) --> regulation(P) | specification(P) | transcription(P) | ...
    process(P and during(W)) --> process(P), [during], process(W).
    process(P and part_of(W)) --> process(P), [of], process(W).
    regulation(regulates(P)) --> [regulation, of], process(P).
    specification(specifies(C)) --> [specification, of], cell(C).
    cell(C and part_of(O)) --> organ(O), cell(C).

"regulation of transcription during G1 phase of mitotic cell cycle"
    → regulates(transcription) and during(g1_phase and part_of(mitosis))

"regulation of transcription from RNA polymerase II promoter involved in ventral spinal cord interneuron specification"
    → regulates(transcription and has_signal(rna_pol_ii)) and part_of(specifies(interneuron and part_of(ventral_spinal_cord)))

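Because `process//1` above calls itself before consuming any input (left recursion), running it as a plain Prolog DCG can loop, which is one reason Obol v1 relied on tabling and chart parsing. The following is a hypothetical, non-left-recursive reformulation in SWI-Prolog with a toy lexicon; the nonterminals `base_process//1`, `modifiers//2` and `simple_process//1` and the priority of the `and` operator are assumptions, not Obol's grammar. Like the slide's grammar it is ambiguous, so Prolog enumerates the alternative attachments of "during" and "of" on backtracking. It only targets parsing; generating text from expressions would need a different formulation.

```prolog
% Hypothetical Obol-style grammar sketch with a toy lexicon (not Obol's
% actual grammar). Left recursion is avoided by parsing a base process
% first and then folding in zero or more "during"/"of" modifiers.
:- op(700, xfy, and).

process(P) --> base_process(B), modifiers(B, P).

% A base process is a regulation of some process, or a simple process.
base_process(regulates(P)) --> [regulation, of], process(P).
base_process(P) --> simple_process(P).

% Each modifier consumes a keyword and a process before recursing, so
% parsing a finite token list always terminates.
modifiers(P0, P) --> [during], process(W), modifiers(P0 and during(W), P).
modifiers(P0, P) --> [of], process(W), modifiers(P0 and part_of(W), P).
modifiers(P, P) --> [].

% Toy lexicon: token sequences mapped to class names.
simple_process(transcription) --> [transcription].
simple_process(g1_phase) --> ['G1', phase].
simple_process(mitosis) --> [mitotic, cell, cycle].
```

For example, `findall(P, phrase(process(P), [regulation, of, transcription, during, 'G1', phase, of, mitotic, cell, cycle]), Ps)` collects several readings, and the slide's `regulates(transcription) and during(g1_phase and part_of(mitosis))` is among them.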
Implementations
- Obol v1: 2005
  - XSB: DCGs + tabling (Earley / chart parsing)
  - Basic ontology reasoning (tabling to avoid cycles)
  - Integration into a Java editing environment (XSB InterProlog)
- Obol v2: 2006
  - Port to SWI-Prolog
  - Web interface
  - Earley algorithm implementation
  - Backward chaining for simple reasoning
  - Forward chaining for full reasoning
- Obol v2.5: 2007
  - Reversion to plain DCGs, with careful construction to avoid cycles
  - Obol Java
- Obol v3: 2009
  - In progress
  - OWL-centric
  - Built on Thea2
- http://wiki.geneontology.org/index.php/Obol

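The point about tabling to avoid cycles in ontology reasoning can be illustrated with a minimal sketch, assuming SWI-Prolog 7 or later (XSB accepts the same `table` directive). The `is_a/2` facts are a hypothetical toy graph that deliberately contains a cycle; without the `:- table` directive the left-recursive `subclass/2` would not terminate, while with it each answer is derived once.

```prolog
% Transitive is_a closure with tabling. The facts are a hypothetical toy
% ontology with a deliberate cycle (c -> d -> c).
:- table subclass/2.

is_a(a, b).
is_a(b, c).
is_a(c, d).
is_a(d, c).

% Left-recursive closure: safe only because subclass/2 is tabled.
subclass(X, Y) :- subclass(X, Z), is_a(Z, Y).
subclass(X, Y) :- is_a(X, Y).
```

Querying `?- subclass(a, Y).` yields `b`, `c` and `d` (in no guaranteed order) and then stops, despite the cycle.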
Results
- Obol grammars were applied successfully to generate axioms for multiple ontologies, particularly the Gene Ontology
  - Still used frequently
- Lessons learned
  - A small amount of basic LP goes a long way
  - LP techniques are not widely known in bioinformatics
  - Different LP systems have different strengths
    - Choosing between them is hard, and frustrating

Could LP prove as successful in the wider bioinformatics arena?
- Rule-based analysis pipelines: Prolog > make
- Integration of ontology reasoning and database queries: Prolog > Datalog > SQL
- Pathways: graphs, ASP
- Genomics: linear transformations, CLP
- Phylogenetics: operations on trees

Toolkit paradigm: BioPerl (http://www.bioperl.org/)
- Established in the 1990s
- Collaborative
  - Open source, svn repository
  - No funding, all voluntary
- Modular
  - Namespaces
  - Interrelated
- Separation of I/O from models
  - Parsers
  - Writers
  - SQL database bindings
- Publication: The BioPerl toolkit, Stajich et al., Genome Research 2002
  - 1044 citations (Google Scholar)
- Related toolkits: BioJava, Biopython, BioRuby, bioocaml, …
- Parent org: Open Bioinformatics Foundation
- Issues
  - Object oriented
  - Perl!

Anatomy of a Blip domain package
- Model(s) of the domain
  - Dependencies on other domain modules
  - Extensional and intensional predicates
- I/O
  - Parsers/writers for a small subset of bioinformatics file formats
    - DCGs, or external Perl translators for common XML schemas
  - Native Prolog serialization of the model "for free"
- Web UI
- Bridges
  - Relational
  - Other Prolog models
  - Ontology models

Domain model modules
- A model consists of extensional + intensional predicates
- Extensional predicates
  - Unit clauses / facts, asserted and/or compiled from fact files
  - Akin to relational tables
- Intensional predicates
  - Declarative: no I/O side effects
- Prolog has no built-in extensional/intensional distinction
  - All clauses are treated equally
  - Facts are conventionally declared with dynamic/1 and multifile/1
- Some metamodeling is useful
  - Easy to roll your own
  - A standard metamodel module would be useful
    - Optional type system + relational DDL-style constraints
    - Works as documentation

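Blip's `extensional/1` comes from its `dbmeta` metamodel module, whose implementation is not shown in these slides. As a roll-your-own illustration only, the sketch below defines a hypothetical `extensional/1` that declares a fact table dynamic and registers it, plus a hypothetical `dump_extensional/0` that writes every registered fact back out as a Prolog clause, one way to get native serialization of a model "for free". The reaction facts are made up.

```prolog
% Hypothetical roll-your-own metamodel; extensional/1 here is NOT Blip's
% dbmeta implementation. It declares a fact table dynamic (a fuller version
% might also declare it multifile) and records it so tools can find it.
:- dynamic extensional_pred/1.

% extensional(+Name/Arity): declare and register an extensional predicate.
extensional(Name/Arity) :-
    dynamic(Name/Arity),
    assertz(extensional_pred(Name/Arity)).

% dump_extensional: write every fact of every registered extensional
% predicate as a Prolog clause (a native serialization of the model).
dump_extensional :-
    forall(extensional_pred(Name/Arity),
           ( functor(Head, Name, Arity),
             forall(clause(Head, true), portray_clause(Head)) )).

:- extensional(reaction_reactant/2).
:- extensional(reaction_product/2).

% Hypothetical example facts.
reaction_reactant(r1, glucose).
reaction_product(r1, glucose_6_phosphate).
```

Because the serialized output is itself Prolog, it can be reloaded with `consult/1` without writing a parser.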
Example from a systems biology model

    :- module(sb_db, [
            reaction_product/2,
            reaction_reactant/2,
            reaction_modifier/2,
            derivation_link/3,
            …
       ]).
    :- use_module(bio(dbmeta)).   % metamodel

    %% reaction_product(?R, ?P) is nondet
    %  relation between a biochemical reaction and a molecular constituent
    %  produced in the reaction
    :- extensional(reaction_product/2).

    %% reaction_reactant(?R, ?P) is nondet
    %  relation between a biochemical reaction and a molecular constituent
    %  that is consumed in the reaction
    :- extensional(reaction_reactant/2).

    %% reaction_modifier(?R, ?P) is nondet
    %  relation between a biochemical reaction and a molecular constituent
    %  that plays a role in the process but is unmodified
    :- extensional(reaction_modifier/2).

    % --- INTENSIONAL PREDICATES ---

    %% derivation_link(?Input, ?Output, ?Via)
    %  two species directly linked via a connecting reaction
    %  (excludes modifiers)
    derivation_link(Input, Output, R) :-
        reaction_reactant(R, Input),
        reaction_product(R, Output).

    % ... [snip] ...

Integrating with relational databases
- Most biological data is stored in relational databases
  - Many provide open SQL ports for distributed queries
- RDBs scale well with large quantities of data
- …but RDBs lack the necessary deductive capabilities
- Using Prolog with RDBs should be easy… right?
(figure: expressivity hierarchy, from most to least expressive: FOL, pure Prolog, Datalog, relational model)

sql_compiler
- Given a mapping to a relational schema, rewrites Prolog terms as SQL queries
- Used in conjunction with a database connectivity module
- History
  - Draxler, 1992
  - Source forked; modified versions available with various Prologs
- Blip includes extensions to
  - rewrite sub-optimal queries
  - rewrite non-recursive Prolog clauses
  - integrate with SWI ODBC

Example query rewriting

Program:

    derivation_link(Input, Output, R) :-
        reaction_reactant(R, Input),
        reaction_product(R, Output).

Rewriting the program:

    ?- sqlbind(sb_db:all, mydb).

Mapping:

    reaction_reactant(R, P) <- reac_in(R, P).
    reaction_product(R, P) <- reac_out(R, P).

Schema metadata:

    relation(reac_in, 2).
    attribute(1, reac_in, reac_id, int).
    attribute(2, reac_in, input_id, int).
    relation(reac_out, 2).
    attribute(1, reac_out, reac_id, int).
    attribute(2, reac_out, output_id, int).

Call goal:

    ?- derivation_link(X, Y, R).

Query rewriting (via odbc.pl):

    SELECT reac_in.reac_id, reac_in.input_id, reac_out.output_id
    FROM reac_in, reac_out
    WHERE reac_in.reac_id = reac_out.reac_id;

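The core idea of this rewriting can be sketched in a few dozen lines: map each model goal to a table row, turn every argument into a qualified column, project the requested variables, and turn each variable shared between columns into an equi-join. This is not Draxler's `sql_compiler` or Blip's extension of it. The `maps_to/2` facts are a hypothetical stand-in for the slide's `<-` mappings, goal arguments are assumed to be variables, and no SQL quoting is attempted. The helper names `goals_sql/3`, `row_columns/2`, `column_of/3` and `shared_var_join/3` are likewise hypothetical.

```prolog
% Minimal Prolog-to-SQL rewriting sketch (not the sql_compiler code).
:- use_module(library(lists)).
:- use_module(library(apply)).

% Schema metadata, as on the slide.
relation(reac_in, 2).
relation(reac_out, 2).
attribute(1, reac_in, reac_id, int).
attribute(2, reac_in, input_id, int).
attribute(1, reac_out, reac_id, int).
attribute(2, reac_out, output_id, int).

% Hypothetical mapping facts: model goal -> table row.
maps_to(reaction_reactant(R, P), reac_in(R, P)).
maps_to(reaction_product(R, P), reac_out(R, P)).

% goals_sql(+Goals, +ProjectVars, -SQL)
goals_sql(Goals, ProjectVars, SQL) :-
    maplist(maps_to, Goals, Rows),
    maplist(row_columns, Rows, PairLists),
    append(PairLists, Pairs),                 % [Var-'table.column', ...]
    maplist(column_of(Pairs), ProjectVars, SelectCols),
    maplist(row_table, Rows, Tables),
    findall(Cond,
            ( shared_var_join(Pairs, C1, C2),
              format(atom(Cond), "~w = ~w", [C1, C2]) ),
            Conds),
    atomic_list_concat(SelectCols, ', ', Select),
    atomic_list_concat(Tables, ', ', From),
    (   Conds == []
    ->  format(atom(SQL), "SELECT ~w FROM ~w", [Select, From])
    ;   atomic_list_concat(Conds, ' AND ', Where),
        format(atom(SQL), "SELECT ~w FROM ~w WHERE ~w", [Select, From, Where])
    ).

% Pair each argument of a row with its qualified column name.
row_columns(Row, Pairs) :-
    Row =.. [Table|Args],
    length(Args, N),
    numlist(1, N, Positions),
    maplist(arg_column(Table), Positions, Args, Pairs).

arg_column(Table, I, Arg, Arg-Col) :-
    attribute(I, Table, Attr, _Type),
    atomic_list_concat([Table, '.', Attr], Col).

row_table(Row, Table) :- functor(Row, Table, _).

% First column holding variable V (identity test with ==, not unification).
column_of(Pairs, V, Col) :-
    member(K-Col, Pairs), K == V, !.

% A variable occurring in two columns becomes an equi-join condition.
shared_var_join(Pairs, C1, C2) :-
    append(_, [V1-C1|Rest], Pairs),
    member(V2-C2, Rest),
    V1 == V2.
```

Calling `goals_sql([reaction_reactant(R, X), reaction_product(R, Y)], [R, X, Y], SQL)` on the body of `derivation_link/3` produces the same query as the slide, without the trailing semicolon.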
Obtaining data from web services
- Many large bioinformatics data providers offer RESTful APIs
  - NCBI
  - caBIG
- SWI libraries used
  - http_client
  - sgml (for parsing XML payloads)
- XML → models
  - Direct translation of sgml output is too low level
  - XSLT-inspired, template-oriented Prolog processing language
- Application: ontology-enhanced search term expansion
  - E.g. "find all genes implicated in neurodegenerative disease" → 'parkinsons' OR 'alzheimers' OR …

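The search term expansion above amounts to collecting every descendant of a class and joining them into a boolean query. A minimal sketch, assuming a hypothetical toy `is_a/2` fragment (not real ontology IDs or labels) and a plain query string that does not follow any particular provider's syntax:

```prolog
% Ontology-enhanced query expansion sketch. The is_a/2 facts are a
% hypothetical toy fragment, not real ontology content.
is_a(parkinsons, neurodegenerative_disease).
is_a(alzheimers, neurodegenerative_disease).
is_a(early_onset_alzheimers, alzheimers).

% descendant(?X, ?Y): X is a direct or indirect subclass of Y.
descendant(X, Y) :- is_a(X, Y).
descendant(X, Y) :- is_a(X, Z), descendant(Z, Y).

% expand_query(+Term, -Query): OR together the quoted names of all
% descendants of Term, sorted and without duplicates.
expand_query(Term, Query) :-
    findall(D, descendant(D, Term), Ds0),
    sort(Ds0, Ds),
    maplist(quote_term, Ds, Quoted),
    atomic_list_concat(Quoted, ' OR ', Query).

quote_term(D, Q) :- format(atom(Q), "'~w'", [D]).
```

Here `expand_query(neurodegenerative_disease, Q)` binds `Q` to the text `'alzheimers' OR 'early_onset_alzheimers' OR 'parkinsons'`, picking up the indirect subclass as well as the direct ones.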
Applications of Blipkit and LP techniques
- Genomics and DNA sequences
  - Deduction of implicit information
  - Consistency checking of genome datasets
- Phenotype matching
  - Finding similarities of mutational effects

Genome inference
- Deluge of genomic data
  - Cost per genome decreasing
  - Soon we will all know our genome sequence
  - But what does it mean?
- Effective use of genomics data relies on deductive inference
  - Many rules are logical: a "genome calculus"
  - Currently encoded using ad hoc imperative code
- Probabilistic inference is also useful
  - But it must be built on top of the logical inference

DNA
- Human chromosome 1: 247 million base pairs, 4220 genes
- Entire genome: $3 \times 10^9$ bp, 20k genes
- Gene expression: transcription, splicing, translation
(figure: nucleotides T, A, G, C)

Transcription
A subsequence of a DNA sequence is transcribed to an RNA sequence. Transcription is regulated by sequences called promoters and enhancers.

Splicing
Zero or more subsequences (introns) of the RNA sequence are spliced out. The remaining sequences (exons) are joined together at splice sites.
- Combinatorial possibilities

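To give a feel for the combinatorial possibilities, here is a deliberately crude toy model, an assumption for illustration rather than a biological rule set: any non-empty, order-preserving selection of a transcript's exons counts as a possible variant. The predicate names `variant/2` and `exon_subset/2` are hypothetical.

```prolog
% Toy model of combinatorial splicing (an illustrative assumption): a
% variant keeps a non-empty, order-preserving subset of the exons.
variant(Exons, Kept) :-
    exon_subset(Exons, Kept),
    Kept \== [].

% exon_subset(+Exons, -Kept): Kept keeps some Exons in their original order.
exon_subset([], []).
exon_subset([X|Xs], [X|Ys]) :- exon_subset(Xs, Ys).
exon_subset([_|Xs], Ys) :- exon_subset(Xs, Ys).
```

Under this model $n$ exons give $2^n - 1$ variants, e.g. seven for three exons, which is why real rule sets must constrain the search rather than enumerate it.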
Genomics databases
- Genome databases are important for
  - biomedicine
  - understanding evolution at the molecular level
- Problem: genome databases are incomplete
  - Stating all implicit features leads to redundancy
  - Integration and complex queries are difficult
  - Ad hoc rules are embedded in imperative code
- Problem: genome databases are inconsistent
  - Different interpretations of gene, exon, UTR, etc.

Solution: Sequence Ontology + deductive database
- The Sequence Ontology standardizes sequence terms
  - Additional axioms are being added
    - Encoding the genome calculus
    - Genome relations based on Allen's interval algebra
- Can be used in conjunction with a deductive genome database
  - Consistency checking: does this genome dataset make sense?
  - Inference and querying: what entities are present in region X?

Sequence relationship predicates
- Based on Allen's interval algebra
  - No recursion
  - Conjunctions of binary terms
  - Uses arithmetic (for efficiency)
- Extensions: strands, circular genomes

    upstream_of(X, Y) :-
        has_end(X, XE),
        has_start(Y, YS),
        XE < YS.

    ?- upstream_of(X, exon3).
    X = exon1 ;
    X = exon2.

(figure: exon1 to exon5 laid out along a sequence)

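The interval predicates can be run against concrete coordinates. The sketch below assumes hypothetical toy coordinates for exon1 to exon5 on one linear strand with inclusive intervals. `overlapping/2` and `in_region/3` are illustrative additions, not Sequence Ontology relations; `in_region/3` answers the "what entities are present in region X?" kind of query.

```prolog
% Allen-style interval predicates over hypothetical toy coordinates
% (single linear strand, inclusive intervals).
has_start(exon1, 100).
has_start(exon2, 300).
has_start(exon3, 500).
has_start(exon4, 700).
has_start(exon5, 900).

has_end(exon1, 200).
has_end(exon2, 400).
has_end(exon3, 600).
has_end(exon4, 800).
has_end(exon5, 1000).

% upstream_of(X, Y): X ends before Y starts (Allen's "before").
upstream_of(X, Y) :-
    has_end(X, XE), has_start(Y, YS), XE < YS.

% overlapping(X, Y): distinct features sharing at least one position
% (broader than Allen's strict "overlaps" relation).
overlapping(X, Y) :-
    has_start(X, XS), has_end(X, XE),
    has_start(Y, YS), has_end(Y, YE),
    X \== Y,
    XS =< YE, YS =< XE.

% in_region(F, S, E): feature F lies entirely within [S, E].
in_region(F, S, E) :-
    has_start(F, FS), has_end(F, FE),
    FS >= S, FE =< E.
```

With these facts `?- upstream_of(X, exon3).` returns `exon1` then `exon2`, and `?- in_region(F, 250, 650).` returns `exon2` then `exon3`. As the slide notes, none of these rules recurse; each is a conjunction plus arithmetic comparisons.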
Intron-exon inference

    intron(i(T, S, E)) :-
        exon(X1),
        exon(X2),
        has_end(X1, S, T),
        has_start(X2, E, T),
        \+ ( exon(X3),
             contained_by(X3, T),
             starts_after_start_of(X3, X1),
             ends_before_end_of(X3, X2) ).

- Possibility of recursion through negation

    exon(exon1).
    exon(exon2).
    has_end(exon1, 1000, t1).
    has_start(exon2, 2000, t1).

    ?- intron(X).
    X = i(t1, 1000, 2000).

(figure: transcript t1 with exon1 and exon2)

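A runnable version of the intron rule, as a sketch under stated assumptions: the helper predicates `contained_by/2`, `starts_after_start_of/2` and `ends_before_end_of/2` are not defined on the slide, so the definitions here (strict coordinate comparisons on a single transcript) are hypothetical, as are the toy coordinates for three exons. An `S < E` guard is also added so that an intron only runs forward from one exon to a later one. The `\+` negation is stratified here because `intron/1` does not occur inside it.

```prolog
% Runnable sketch of intron inference; helpers and facts are hypothetical.
exon(exon1).
exon(exon2).
exon(exon3).

has_start(exon1, 100, t1).
has_start(exon2, 2000, t1).
has_start(exon3, 3000, t1).

has_end(exon1, 1000, t1).
has_end(exon2, 2500, t1).
has_end(exon3, 3500, t1).

% A feature belongs to transcript T if it has a start on T.
contained_by(X, T) :- has_start(X, _, T).

starts_after_start_of(X, Y) :-
    has_start(X, XS, T), has_start(Y, YS, T), XS > YS.

ends_before_end_of(X, Y) :-
    has_end(X, XE, T), has_end(Y, YE, T), XE < YE.

% An intron spans from the end of one exon to the start of a later exon
% on the same transcript, with no exon lying strictly between them.
intron(i(T, S, E)) :-
    exon(X1),
    exon(X2),
    has_end(X1, S, T),
    has_start(X2, E, T),
    S < E,
    \+ ( exon(X3),
         contained_by(X3, T),
         starts_after_start_of(X3, X1),
         ends_before_end_of(X3, X2) ).
```

Here `?- intron(X).` yields `i(t1, 1000, 2000)` and `i(t1, 2500, 3000)`; the pair exon1/exon3 is rejected because exon2 lies between them.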
OWL implementation
- Many axioms cannot be expressed in OWL
  - Interval relations: no arithmetic in OWL
    - Option 1: use SWRL
    - Option 2: enumerate all base pairs and use property chain axioms
  - Cannot infer properties of unnamed individuals
    - E.g. introns from exons
  - Cyclic structures cannot be described
    - Requires the Description Graph extension
- The Open World Assumption is useful for the semantic web
  - The CWA is more convenient for genomics

Deductive database implementation
- Methods
  - Convert the Sequence Ontology from OWL to DLP via Thea2
  - Manually edit
  - Add rules that cannot be expressed in OWL
  - Tested on XSB and Yap (requires tabling)
- Results
  - Currently scales to small regions
  - More debugging required
  - Difficult to eliminate unstratified negation

Disjunctive Datalog implementation
- Adds
  - Constraints
  - Disjunctions in rule heads
- Implementation
  - DLV-Complex: allows functions in arguments
  - Program written from scratch
  - Rules must be "safe"
- Results
  - Scales over small regions
  - Useful for detecting inconsistencies in data
- More research needed
  - More efficient programs
  - Use of a relational database backend
  - Further exploration of ASP semantics
  - Genomic rules have many exceptions

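DLV syntax is outside the scope of a runnable Prolog example, so the following is a plain-Prolog stand-in rather than DLV code: each `violation/1` clause plays the role of an ASP integrity constraint, and enumerating violations is one way to use such rules for detecting inconsistencies in data. The feature facts are a hypothetical, deliberately inconsistent toy dataset, and both rules are illustrative assumptions, not the Sequence Ontology's axioms.

```prolog
% Plain-Prolog stand-in for ASP-style integrity constraints (not DLV
% syntax). Facts are a hypothetical, deliberately inconsistent dataset.
exon(exon1).
exon(exon2).

intron(intron1).

has_start(exon1, 100).
has_start(exon2, 900).
has_start(intron1, 150).

has_end(exon1, 200).
has_end(exon2, 800).
has_end(intron1, 180).

% A feature must not end before it starts.
violation(bad_interval(F)) :-
    has_start(F, S), has_end(F, E), E < S.

% An exon and an intron must not overlap.
violation(exon_intron_overlap(X, I)) :-
    exon(X), intron(I),
    has_start(X, XS), has_end(X, XE),
    has_start(I, IS), has_end(I, IE),
    XS =< IE, IS =< XE.

% A dataset is consistent when no constraint is violated.
consistent :- \+ violation(_).
```

Where an ASP solver would simply report no answer set for this data, listing `violation/1` answers shows which features are at fault: here exon2's reversed coordinates and the overlap between exon1 and intron1.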
Prolog implementation
- Removes rules that cause cycles under backtracking
- Implementation
  - Optional use of a Nested Containment List library (C + SWI FLI)
- Results
  - Results can be incomplete due to missing rules
    - E.g. intron :- exon, but not exon :- intron
  - Ruleset can be tailored to the dataset
  - Scales over medium-sized datasets

Hybrid Prolog-relational implementation
- Uses the same program as the Prolog implementation
- A relational database stores the facts (extensional)
  - Can be distributed
- Uses sql_compiler + mappings to genomics databases
  - Ensembl
  - Chado
- Non-recursive Prolog rules are dynamically translated to complex SQL
- Recursive subclass rules are
  - translated by the query compiler using UNIONs, or
  - precomputed and stored in the relational database
- Scales to full genomes

LP for genomics: conclusions
- No one paradigm is perfect
- Many axioms cannot be expressed in OWL, but the tools are good
- Disjunctive Datalog is good for consistency checking in small regions
- More research is required on the efficiency of the tabling and ASP solutions
- The WAM solution is the most efficient
  - Manually rewriting programs is tedious!
- Hybrid solutions are useful
  - RDBs for asserted facts

Application: match.com for diseases
- Organisms have phenotypes: characteristics under the control of the organism's genes
- Related genes can have similar phenotypic effects, even when the genes' last common ancestor lived 500 million years ago
- Finding these genes can help us understand disease evolution

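Semantic similarity
Given
- a collection of features $F = \{f_1, f_2, \ldots\}$
- attributes $A = \{a_1, a_2, \ldots\}$
- feature-attribute mappings $a \subseteq F \times A$, where $a(f)$ is the set of attributes of feature $f$

For any feature pair $x, y$, calculate:
- the Jaccard coefficient
  $$J(x,y) = \frac{|a(x) \cap a(y)|}{|a(x) \cup a(y)|}$$
- the maximum information content (IC)
  $$IC(a) = -\log_2 p(a)$$
  $$maxIC(x,y) = \max\{\, IC(a) : a \in a(x) \cap a(y) \,\}$$
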
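The two measures can be computed directly over ordered sets before any bit-level optimization. A minimal sketch: the `annot/2` facts are hypothetical, and $p(a)$ is estimated here as the fraction of features annotated with attribute $a$, which is an assumption since the slide does not say how $p$ is obtained.

```prolog
% Set-based sketch of Jaccard and maxIC. Annotations are hypothetical;
% p(a) is estimated as the fraction of features annotated with a.
:- use_module(library(ordsets)).
:- use_module(library(lists)).

annot(f1, [a1, a2, a3]).
annot(f2, [a2, a3, a4]).
annot(f3, [a3]).
annot(f4, [a4]).

attrs(F, S) :- annot(F, L), list_to_ord_set(L, S).

% jaccard(+X, +Y, -J): |a(x) ∩ a(y)| / |a(x) ∪ a(y)|.
jaccard(X, Y, J) :-
    attrs(X, AX), attrs(Y, AY),
    ord_intersection(AX, AY, I), ord_union(AX, AY, U),
    length(I, NI), length(U, NU), NU > 0,
    J is NI / NU.

% ic(+A, -IC): -log2 p(A), with p(A) estimated from the annotations.
ic(A, IC) :-
    aggregate_all(count, annot(_, _), N),
    aggregate_all(count, (annot(_, L), memberchk(A, L)), NA),
    NA > 0,
    IC is -log2(NA / N).

% max_ic(+X, +Y, -Max): highest IC among shared attributes.
max_ic(X, Y, Max) :-
    attrs(X, AX), attrs(Y, AY),
    ord_intersection(AX, AY, Shared),
    Shared \== [],
    findall(IC, (member(A, Shared), ic(A, IC)), ICs),
    max_list(ICs, Max).
```

For f1 and f2 the shared attributes are a2 and a3, so the Jaccard coefficient is 2/4 = 0.5; a2 annotates half of the features, giving it the higher IC of 1 bit, so `max_ic(f1, f2, M)` gives 1.0.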
SWI-Prolog implementation
- Uses GMP
  - Normal Prolog programs have unbounded integer arithmetic
  - Allows fast bitwise implementations of set intersection/union
- Encode feature attribute lists as integers
  $$m : A \to \{0, \ldots, |A|-1\}$$
  $$ai(f) = \sum_{a \in a(f)} 2^{m(a)}$$
- Set intersection and union are computed using bitwise and/or
- Fast implementation of the Jaccard coefficient:

    J is popcount(A1 /\ A2) / popcount(A1 \/ A2)

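The bit-set encoding can be sketched as follows, assuming SWI-Prolog (which provides the `popcount/1` arithmetic function and GMP-backed unbounded integers). The `attr_index/2` table standing in for $m$ is hypothetical. Bits are combined with bitwise OR, which equals the sum $\sum 2^{m(a)}$ as long as each attribute appears once.

```prolog
% Bit-set sketch: each attribute gets a bit position via a hypothetical
% attr_index/2 table (the mapping m), and a feature's attribute set
% becomes a single, arbitrarily large integer.
attr_index(a1, 0).
attr_index(a2, 1).
attr_index(a3, 2).
attr_index(a4, 3).

% attrs_int(+Attrs, -Int): Int has bit m(a) set for each attribute a.
attrs_int(Attrs, Int) :-
    foldl(add_bit, Attrs, 0, Int).

add_bit(A, Acc0, Acc) :-
    attr_index(A, I),
    Acc is Acc0 \/ (1 << I).

% jaccard_bits(+I1, +I2, -J): popcount of the intersection (bitwise and)
% divided by popcount of the union (bitwise or).
jaccard_bits(I1, I2, J) :-
    Union is I1 \/ I2,
    Union =\= 0,
    J is popcount(I1 /\ I2) / popcount(Union).
```

For example, `[a1, a2, a3]` encodes as 7 and `[a2, a3, a4]` as 14; their intersection 6 has two set bits and their union 15 has four, so `jaccard_bits(7, 14, J)` gives 0.5 without building any lists.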

Ontology Development Kit: Bio-Ontologies 2019Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Chris Mungall
 
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...Chris Mungall
 
Uberon: opening up to community contributions
Uberon: opening up to community contributionsUberon: opening up to community contributions
Uberon: opening up to community contributionsChris Mungall
 
Modeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesModeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesChris Mungall
 
Causal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyCausal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyChris Mungall
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyChris Mungall
 
Introduction to the BioLink datamodel
Introduction to the BioLink datamodelIntroduction to the BioLink datamodel
Introduction to the BioLink datamodelChris Mungall
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Chris Mungall
 
Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Chris Mungall
 

Más de Chris Mungall (20)

LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOLinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptx
 
LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)
 
LinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupLinkML presentation to Yosemite Group
LinkML presentation to Yosemite Group
 
Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...
 
All together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeAll together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of life
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
 
Representation of kidney structures in Uberon
Representation of kidney structures in UberonRepresentation of kidney structures in Uberon
Representation of kidney structures in Uberon
 
SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)
 
Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019
 
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
 
Uberon: opening up to community contributions
Uberon: opening up to community contributionsUberon: opening up to community contributions
Uberon: opening up to community contributions
 
Modeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesModeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologies
 
Causal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyCausal reasoning using the Relation Ontology
Causal reasoning using the Relation Ontology
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene Ontology
 
Introduction to the BioLink datamodel
Introduction to the BioLink datamodelIntroduction to the BioLink datamodel
Introduction to the BioLink datamodel
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015
 
ENVO GSC 2015
ENVO GSC 2015ENVO GSC 2015
ENVO GSC 2015
 
Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017
 
Kboom phenoday-2016
Kboom phenoday-2016Kboom phenoday-2016
Kboom phenoday-2016
 

Último

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Último (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Experiences applying logic programming in bioinformatics

  • 1. Experiences using logic programming in bioinformatics. Chris Mungall, Berkeley Bioinformatics and Ontologies Group (http://berkeleybop.org), Lawrence Berkeley National Laboratory. ICLP 2009
  • 2. Outline: Biology and biological data integration: a brief introduction. Obol: first experiences applying LP. Blipkit: a reusable bioinformatics developer’s toolkit (modular structure; I/O and relational database connectivity). Some applications of Blipkit and LP (genes and genomics; phenotype matching; web applications). Conclusions. Where next? Some recommendations for the LP community.
  • 3. The promise and challenges of biological research. Why study biological systems? Because they’re fascinating; to improve health; to improve the environment. BUT: biology is hard. Biological systems are extremely diverse; biology deals with phenomena at multiple levels of granularity; there is a deluge of data. Bioinformatics: biology as an information science; computational methods are vital to understanding.
  • 5. Biology in the small: molecules (figures: DNA; RNA pseudoknot)
  • 6. Cells and organismal biology (figure labels: gastrulation, blastula, gastrula; axon terminal, dendrite, node of Ranvier, soma, axon, Schwann cell, cell nucleus, myelin sheath)
  • 8. Bio-databases: 1200 biological databases published in Nucleic Acids Research, and many more unpublished; many of these are database federations (e.g. Ensembl). Heterogeneous systems. Storage mechanisms: relational, XML, flat files, ad hoc/semi-structured, natural language. Limited APIs: lack of standards, limited query expressivity. Poorly integrated: limited integration beyond identifier cross-references; users must integrate manually. Bioinformatics runs on Perl glue. (Figure labels: metabolic pathways, mutants, genes, fruit flies, tumors.)
  • 9. Data interrogation and discovery. Sample tasks: find mutations in regions upstream of neurotransmitter-producing genes; find drug targets or animal models for neurodegenerative diseases; what biological pathways are enriched in high-acidity environments? Answering each of these is difficult: it requires manual aggregation from many databases and various kinds of inference.
  • 10. OBO: Open Biological Ontologies (figure axis: small to large)
  • 11. Obol: first experience with LP in bioinformatics. Problem: many existing bio-ontologies were in fact more like terminologies, with basic axioms and is_a hierarchies. Deeper logical structure was implicit in the terms: long noun phrases, recursively composed, e.g. “regulation of transcription during G1 phase of mitotic cell cycle”. Existing solutions (2004) took advantage of the semi-controlled syntax of terms and parsed them using ad hoc regular expressions (the influence of Perl in bioinformatics!). But context-free grammars (at least) were required.
  • 12. A better solution: Definite Clause Grammars. Obol: a collection of domain-specific DCGs. A significant improvement over Perl regexes: declarative; more expressive; integrates with simple reasoning; bidirectional, so it can also be used for term generation from logical expressions.
  • 13. Example process grammar:

```prolog
:- op(700, xfy, and).   % infix 'and' used to build the logical expressions
% The process//1 rules below are left-recursive and loop under plain DCG
% translation; tabling (e.g. in SWI-Prolog or XSB) lets them terminate.
:- table process//1.

process(P) --> regulation(P) | specification(P) | transcription(P) | ...   % further alternatives elided
process(P and during(W)) --> process(P), [during], process(W).
process(P and part_of(W)) --> process(P), [of], process(W).
regulation(regulates(P)) --> [regulation, of], process(P).
specification(specifies(C)) --> [specification, of], cell(C).
cell(C and part_of(O)) --> organ(O), cell(C).

% "regulation of transcription during G1 phase of mitotic cell cycle"
%   -> regulates(transcription) and during(g1_phase and part_of(mitosis))
% "regulation of transcription from RNA polymerase II promoter involved in
%  ventral spinal cord interneuron specification"
%   -> regulates(transcription and has_signal(rna_pol_ii))
%      and part_of(specifies(interneuron and part_of(ventral_spinal_cord)))
```
  • 16. Obol v3: 2009
  • 20. Results: Obol grammars were applied successfully to generate axioms for multiple ontologies, particularly the Gene Ontology, and are still used frequently. Lessons learned: a small amount of basic LP goes a long way; LP techniques are not widely known in bioinformatics; different LP systems have different strengths; choosing between them is hard, and frustrating.
  • 21. Could LP prove as successful in the wider bioinformatics arena? Rule-based analysis pipelines: Prolog > make. Integration of ontology reasoning and database queries: Prolog > Datalog > SQL. Pathways: graphs, ASP. Genomics: linear transformations, CLP. Phylogenetics: operations on trees.
  • 33. Anatomy of a Blip domain package. Model(s) of the domain: dependencies on other domain modules; extensional and intensional predicates. I/O: parsers/writers for a small subset of bioinformatics file formats (DCGs, or external Perl translators for common XML schemas); native Prolog serialization of the model ‘for free’. Web UI. Bridges: relational; other Prolog models; ontology models.
  • 34. Domain model modules. A model consists of extensional + intensional predicates. Extensional predicates: unit clauses/facts, asserted and/or compiled from fact files; akin to relational tables. Intensional predicates: declarative, with no I/O side effects. Prolog has no built-in extensional/intensional distinction: all clauses are treated equally, and facts are conventionally declared dynamic/1 and multifile/1. Some metamodeling is useful. It is easy to roll your own, but a standard metamodel module would be useful (an optional type system + relational-DDL-style constraints); it also works as documentation.
  • 35. Example from a systems biology model:

```prolog
:- module(sb_db, [
        reaction_product/2,
        reaction_reactant/2,
        reaction_modifier/2,
        derivation_link/3
        % , … (further exports elided on the slide)
    ]).
:- use_module(bio(dbmeta)).   % metamodel

%% reaction_product(?R, ?P) is nondet
%  relation between a biochemical reaction and a molecular constituent
%  produced in the reaction
:- extensional(reaction_product/2).

%% reaction_reactant(?R, ?P) is nondet
%  relation between a biochemical reaction and a molecular constituent
%  that is consumed in the reaction
:- extensional(reaction_reactant/2).

%% reaction_modifier(?R, ?P) is nondet
%  relation between a biochemical reaction and a molecular constituent
%  that plays a role in the process but is unmodified
:- extensional(reaction_modifier/2).

% --- INTENSIONAL PREDICATES ---

%% derivation_link(?Input, ?Output, ?Via)
%  two species directly linked via a connecting reaction (excludes modifiers)
derivation_link(Input, Output, R) :-
    reaction_reactant(R, Input),
    reaction_product(R, Output).

%...[snip]…
```
  • 36. Integrating with relational databases. Most biological data is stored in relational databases, and many provide open SQL ports for distributed queries. RDBs scale well with large quantities of data… but RDBs lack the necessary deductive capabilities. (Figure: expressivity hierarchy: FOL > pure Prolog > Datalog > relational model.) Using Prolog with RDBs should be easy… right?
  • 37. sql_compiler: given a mapping to a relational schema, it rewrites Prolog terms as SQL queries; used in conjunction with a database connectivity module. History: Draxler, 1992; the source has been forked, and modified versions are available with various Prologs. Blip includes extensions to rewrite sub-optimal queries, rewrite non-recursive Prolog clauses, and integrate with SWI ODBC.
  • 38. Example query rewriting. The program, schema metadata, and mapping are combined by query rewriting into SQL, which is executed via odbc.pl:

```prolog
% program
derivation_link(Input, Output, R) :-
    reaction_reactant(R, Input),
    reaction_product(R, Output).

% rewriting program
?- sqlbind(sb_db:all, mydb).

% schema metadata
relation(reac_in, 2).
attribute(1, reac_in, reac_id, int).
attribute(2, reac_in, input_id, int).
relation(reac_out, 2).
attribute(1, reac_out, reac_id, int).
attribute(2, reac_out, output_id, int).

% mapping
reaction_reactant(R, P) <- reac_in(R, P).
reaction_product(R, P) <- reac_out(R, P).

% call goal
?- derivation_link(X, Y, R).
```

```sql
SELECT reac_in.reac_id, reac_in.input_id, reac_out.output_id
FROM reac_in, reac_out
WHERE reac_in.reac_id = reac_out.reac_id;
```
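To make the rewrite concrete, here is a minimal Python sketch (not Blip's sql_compiler) that evaluates derivation_link/3 twice over the same facts: once as an in-memory join on the shared reaction variable R, and once with the SQL join the rewriting should produce, run against an in-memory SQLite database. Table and column names follow the slide's schema metadata; the fact values are invented for illustration.

```python
import sqlite3

# Hypothetical facts standing in for reaction_reactant/2 and reaction_product/2.
# Tuples follow the slide's schema: reac_in(reac_id, input_id), reac_out(reac_id, output_id).
REACTANTS = [(1, 10), (1, 11), (2, 12)]
PRODUCTS = [(1, 20), (2, 21), (3, 22)]


def derivation_links_in_memory(reactants, products):
    """derivation_link(Input, Output, R) as a join on the shared variable R."""
    return sorted(
        (inp, out, r1)
        for (r1, inp) in reactants
        for (r2, out) in products
        if r1 == r2
    )


def derivation_links_sql(reactants, products):
    """The same rule evaluated by the SQL join it should be rewritten into."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE reac_in (reac_id INTEGER, input_id INTEGER)")
        conn.execute("CREATE TABLE reac_out (reac_id INTEGER, output_id INTEGER)")
        conn.executemany("INSERT INTO reac_in VALUES (?, ?)", reactants)
        conn.executemany("INSERT INTO reac_out VALUES (?, ?)", products)
        # Columns are selected in the argument order of derivation_link(Input, Output, R).
        rows = conn.execute(
            "SELECT reac_in.input_id, reac_out.output_id, reac_in.reac_id "
            "FROM reac_in, reac_out "
            "WHERE reac_in.reac_id = reac_out.reac_id"
        ).fetchall()
    finally:
        conn.close()
    return sorted(rows)


if __name__ == "__main__":
    # Both strategies should agree: [(10, 20, 1), (11, 20, 1), (12, 21, 2)]
    assert derivation_links_in_memory(REACTANTS, PRODUCTS) == derivation_links_sql(REACTANTS, PRODUCTS)
    print(derivation_links_sql(REACTANTS, PRODUCTS))
```

Reaction 3 has a product but no reactant, so it drops out of both results, just as the conjunctive rule body fails for it.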
  • 39. Obtaining data from web services. Many large bioinformatics data providers offer RESTful APIs (NCBI, caBIG). SWI libraries used: http_client; sgml (for parsing XML payloads). XML → models: direct translation of the sgml output is too low-level, so an XSLT-inspired, template-oriented Prolog processing language is used. Application: ontology-enhanced search term expansion, e.g. “find all genes implicated in neurodegenerative disease” → ‘parkinsons’ OR ‘alzheimers’ OR …
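The search term expansion idea can be sketched in a few lines of Python: collect every term below the query term in an is_a hierarchy and OR them together. The hierarchy here is a hypothetical toy; in practice the terms would come from a disease ontology, and the expanded query would be sent to the provider's REST API.

```python
# Hypothetical is_a hierarchy: child term -> list of parent terms.
TERM_HIERARCHY = {
    "parkinsons": ["neurodegenerative disease"],
    "alzheimers": ["neurodegenerative disease"],
    "early-onset alzheimers": ["alzheimers"],
    "neurodegenerative disease": ["disease"],
    "asthma": ["disease"],
}


def descendants(term, is_a):
    """All terms that are (transitively) is_a `term`, excluding `term` itself."""
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def expand_query(term, is_a):
    """Boolean OR query over a term and all of its descendants."""
    terms = [term] + sorted(descendants(term, is_a))
    return " OR ".join(f"'{t}'" for t in terms)


if __name__ == "__main__":
    print(expand_query("neurodegenerative disease", TERM_HIERARCHY))
```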
  • 40. Applications of Blipkit and LP techniques. Genomics and DNA sequences: deduction of implicit information; consistency checking of genome datasets. Phenotype matching: finding similarities of mutational effects.
  • 41. Genome inference. A deluge of genomic data: the cost per genome is decreasing, and soon we will all know our genome sequence. But what does it mean? Effective use of genomics data relies on deductive inference. Many rules are logical (a “genome calculus”), but they are currently encoded using ad hoc imperative code. Probabilistic inference is also useful, but must be built on top of the logical inference.
  • 42. DNA. Human chromosome 1: 247m base pairs, 4220 genes. Entire genome: 3 × 10^9 bp, 20k genes. (Figure labels: T, A, G, C.)
  • 43. DNA. Human chromosome 1: 247m base pairs, 4220 genes. Entire genome: 3 × 10^9 bp, 20k genes. (Figure labels: T, A, G, C.) Gene expression: transcription, splicing, translation.
  • 44. Transcription: a subsequence of a DNA sequence is transcribed to an RNA sequence, regulated by sequences called promoters and enhancers.
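A minimal Python sketch of the transcription step, assuming the genome is stored as its forward-strand sequence, coordinates are 0-based and half-open, and the transcribed RNA has the sequence of the coding strand with uracil (U) in place of thymine (T). Regulation by promoters and enhancers is not modeled.

```python
# Complement table for DNA bases (assumes an uppercase A/C/G/T alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(dna: str, start: int, end: int, strand: str = "+") -> str:
    """Return the RNA transcribed from dna[start:end] (0-based, half-open)."""
    region = dna[start:end].upper()
    if strand == "-":
        # Minus-strand feature: read the reverse complement of the region.
        region = region.translate(COMPLEMENT)[::-1]
    # The transcript matches the coding strand, with uracil in place of thymine.
    return region.replace("T", "U")


if __name__ == "__main__":
    print(transcribe("GGATGCGTAA", 2, 8))       # AUGCGU
    print(transcribe("GGATGCGTAA", 2, 8, "-"))  # ACGCAU
```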
  • 47. Genomics databases. Genome databases are important for biomedicine and for understanding evolution at a molecular level. Problem: genome databases are incomplete: stating all implicit features leads to redundancy; integration and complex queries are difficult; ad hoc rules are embedded in imperative code. Problem: genome databases are inconsistent: different interpretations of gene, exon, UTR, etc.
  • 48. Solution: Sequence Ontology + deductive database. The Sequence Ontology standardizes sequence terms. Additional axioms are being added, encoding the genome calculus, with genome relations based on Allen's Interval Algebra. This can be used in conjunction with a deductive genome database for consistency checking (does this genome dataset make sense?) and for inference and querying (what entities are present in region X?).
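As an illustration of the kind of consistency check a deductive genome database can run, this Python sketch tests one assumed genome-calculus rule: a feature that is part_of another (say, an exon part_of a transcript) must lie within its parent's coordinates. The rule, feature names, and coordinates are hypothetical and only show the shape of such a check; the Sequence Ontology axioms themselves are richer.

```python
def containment_violations(locations, part_of):
    """Return (child, parent) pairs where the child extends beyond its parent.

    locations: feature name -> (start, end), inclusive coordinates on one strand (assumed)
    part_of:   iterable of (child, parent) pairs
    """
    violations = []
    for child, parent in part_of:
        child_start, child_end = locations[child]
        parent_start, parent_end = locations[parent]
        if child_start < parent_start or child_end > parent_end:
            violations.append((child, parent))
    return violations


if __name__ == "__main__":
    locations = {"tx1": (100, 500), "ex1": (100, 200), "ex2": (450, 600)}
    part_of = [("ex1", "tx1"), ("ex2", "tx1")]
    print(containment_violations(locations, part_of))  # [('ex2', 'tx1')]
```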
  • 49. Sequence relationship predicates, based on Allen's Interval Algebra: no recursion; conjunctions of binary terms; uses arithmetic (for efficiency). Extensions: strands; circular genomes.

```prolog
upstream_of(X, Y) :-
    has_end(X, XE),
    has_start(Y, YS),
    XE < YS.

?- upstream_of(exon3, X).
X = exon1 ;
X = exon2
```

(Figure labels: exon3, exon1, exon2, exon4, exon5.)
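The same arithmetic style of interval reasoning can be sketched in Python. upstream_of mirrors the slide's rule (X ends before Y starts, in the spirit of Allen's "before"); overlaps and contained_in are two further interval relations added for illustration. Coordinates are assumed to be inclusive and on a single strand; strands and circular genomes, the extensions mentioned on the slide, are not handled.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # first base (inclusive), single strand assumed
    end: int    # last base (inclusive)


def upstream_of(x: Feature, y: Feature) -> bool:
    """Mirrors the slide's rule: X ends before Y starts."""
    return x.end < y.start


def overlaps(x: Feature, y: Feature) -> bool:
    """X and Y share at least one base."""
    return x.start <= y.end and y.start <= x.end


def contained_in(x: Feature, y: Feature) -> bool:
    """X lies entirely within Y (Allen's 'during', 'starts', 'finishes' or 'equals')."""
    return y.start <= x.start and x.end <= y.end


def query_upstream_of(x: Feature, features: List[Feature]) -> List[str]:
    """Analogue of ?- upstream_of(x, Y): names of all Y that x is upstream of."""
    return [y.name for y in features if upstream_of(x, y)]


if __name__ == "__main__":
    e1, e2, e3 = Feature("e1", 100, 200), Feature("e2", 300, 400), Feature("e3", 150, 250)
    print(query_upstream_of(e1, [e1, e2, e3]))  # ['e2']
```

Like the Prolog rule, each relation is a non-recursive comparison of coordinates, which is what makes these predicates cheap to evaluate or to push down into SQL.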
  • 51. Possibility of recursion through negation:

```prolog
exon(exon1).
exon(exon2).
has_end(exon1, 1000, t1).
has_start(exon2, 2000, t2).

?- intron(X).
X = i(t1, 1000, 2000)
```

(Figure labels: t1, exon1, exon2.)
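For comparison, a procedural Python sketch of the same inference: introns are the gaps between consecutive exons of one transcript. The (transcript_id, start, end) input layout is a hypothetical simplification, and results are returned as (transcript, upstream exon end, downstream exon start) tuples echoing the i(T, End, Start) terms above. The slide's point is that in a logic program such a rule invents new individuals, and that it can interact with negation; this imperative version sidesteps that by running once, bottom-up.

```python
from collections import defaultdict


def infer_introns(exons):
    """Infer introns as gaps between consecutive exons of the same transcript.

    exons: list of (transcript_id, start, end) tuples, inclusive coordinates (assumed).
    Returns (transcript_id, upstream_exon_end, downstream_exon_start) tuples.
    """
    by_transcript = defaultdict(list)
    for transcript, start, end in exons:
        by_transcript[transcript].append((start, end))
    introns = []
    for transcript, spans in sorted(by_transcript.items()):
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start > prev_end + 1:  # skip exons that abut or overlap
                introns.append((transcript, prev_end, next_start))
    return introns


if __name__ == "__main__":
    print(infer_introns([("t1", 1, 1000), ("t1", 2000, 2500)]))  # [('t1', 1000, 2000)]
```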
  • 52. OWL implementation. Many axioms cannot be expressed in OWL. Interval relations: there is no arithmetic in OWL (option 1: use SWRL; option 2: enumerate all base pairs and use property chain axioms). OWL cannot infer properties of unnamed individuals, e.g. introns from exons. Cyclic structures cannot be described; this requires the Description Graph extension. The Open World Assumption is useful for the semantic web, but the CWA is more convenient for genomics.
  • 53. Deductive database implementation. Methods: convert the Sequence Ontology from OWL to DLP via Thea2; edit manually; add rules that cannot be expressed in OWL. Tested on XSB and Yap (requires tabling). Results: currently scales to small regions; more debugging required; it is difficult to eliminate unstratified negation.
  • 54. Disjunctive Datalog implementation. Adds constraints and disjunctions in rule heads. Implementation: DLV-Complex, which allows functions in arguments; the program was written from scratch, and rules must be ‘safe’. Results: scales over small regions; useful for detecting inconsistencies in data. More research needed: more efficient programs; use of a relational database backend; further exploration of ASP semantics (genomic rules have many exceptions).
  • 55. Prolog implementation. Removes rules that cause cycles with backtracking. Implementation: optional use of a Nested Containment List library (C + SWI FLI). Results: results can be incomplete due to missing rules (e.g. intron :- exon, but not exon :- intron); the ruleset can be tailored to the dataset; scales over medium-sized datasets.
  • 56. Hybrid Prolog-relational implementation. Uses the same program as the Prolog implementation. A relational database stores the facts (extensional) and can be distributed. Uses sql_compiler + mappings to genomics databases (Ensembl, Chado). Non-recursive Prolog rules are dynamically translated to complex SQL. Recursive subclass rules are translated by the query compiler using UNIONs, or precomputed and stored in the relational database. Scales to full genomes.
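The "precomputed and stored" option for recursive subclass rules can be sketched with a recursive common table expression. This Python/SQLite example illustrates that strategy only; it is not the SQL emitted by Blip's query compiler, and the table, column, and class names are hypothetical.

```python
import sqlite3


def subclass_closure(edges):
    """Materialize all (child, ancestor) pairs implied by direct subclass edges."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE subclass (child TEXT, parent TEXT)")
        conn.execute("CREATE TABLE subclass_closure (child TEXT, ancestor TEXT)")
        conn.executemany("INSERT INTO subclass VALUES (?, ?)", edges)
        # UNION (not UNION ALL) drops duplicate rows, so the recursion also stops on cycles.
        conn.execute(
            """
            WITH RECURSIVE closure(child, ancestor) AS (
                SELECT child, parent FROM subclass
                UNION
                SELECT c.child, s.parent
                FROM closure AS c JOIN subclass AS s ON c.ancestor = s.child
            )
            INSERT INTO subclass_closure SELECT child, ancestor FROM closure
            """
        )
        rows = conn.execute(
            "SELECT child, ancestor FROM subclass_closure ORDER BY child, ancestor"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    edges = [("neuron", "cell"), ("cell", "anatomical entity"), ("interneuron", "neuron")]
    for row in subclass_closure(edges):
        print(row)
```

Once the closure table exists, a recursive subclass query becomes a single non-recursive join, which is the kind of rule the Prolog-to-SQL translation handles well.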
  • 57. LP for genomics: conclusions. No one paradigm is perfect. Many axioms cannot be expressed in OWL, but the tools are good. Disjunctive Datalog is good for consistency checking in small regions. More research is required on the efficiency of tabling solutions and ASPs. The WAM solution is the most efficient; manually rewriting programs is tedious! Hybrid solutions are useful: RDBs for asserted facts.
  • 58. Application: match.com for diseases. Organisms have phenotypes: characteristics under the control of the genes of that organism. Related genes can have similar phenotypic effects even when the genes last shared a common ancestor 500 million years ago. Finding these genes can help us understand disease evolution.
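  • 60. Semantic similarity. Given a collection of features $F = \{f_1, f_2, \dots\}$, attributes $A = \{a_1, a_2, \dots\}$, and feature-attribute mappings $a \subseteq F \times A$ (writing $a(f)$ for the attributes of feature $f$), calculate for any feature pair $x, y$: the Jaccard coefficient $J(x,y) = |a(x) \cap a(y)| \,/\, |a(x) \cup a(y)|$; and the maximum IC, where $IC(a) = -\log_2 p(a)$ and $\mathrm{maxIC}(x,y) = \max\{IC(a) : a \in a(x) \cap a(y)\}$.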
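Both measures can be sketched directly from these definitions in Python. The sketch assumes p(a) is estimated as the fraction of features annotated with attribute a, and that annotation sets already include any inferred attributes; the features and attributes are hypothetical.

```python
import math


def jaccard(ax, ay):
    """|a(x) ∩ a(y)| / |a(x) ∪ a(y)| for two attribute sets."""
    union = ax | ay
    return len(ax & ay) / len(union) if union else 0.0


def information_content(attr, annotations):
    """IC(a) = -log2 p(a), with p(a) estimated as the fraction of features annotated with a."""
    n_with = sum(1 for attrs in annotations.values() if attr in attrs)
    return -math.log2(n_with / len(annotations))


def max_ic(x, y, annotations):
    """Maximum IC over the attributes shared by features x and y (0.0 if none are shared)."""
    shared = annotations[x] & annotations[y]
    return max((information_content(a, annotations) for a in shared), default=0.0)


if __name__ == "__main__":
    annotations = {"f1": {"a", "b"}, "f2": {"a", "c"}, "f3": {"b", "c"}, "f4": {"a"}}
    print(jaccard(annotations["f1"], annotations["f2"]))  # 1/3
    print(max_ic("f1", "f3", annotations))                # 1.0 ('b' annotates 2 of 4 features)
```

Jaccard rewards overall overlap, while maxIC rewards sharing a single rare, informative attribute; the two are complementary.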
  • 61. SWI-Prolog implementation. Uses GMP: normal Prolog programs have unbounded integer arithmetic, which allows fast bitwise implementations of set intersection/union. Encode feature attribute lists as integers: $m : A \to \{0, \dots, |A|-1\}$, $a_i(f) = \sum_{a \in a(f)} 2^{m(a)}$. Set intersection and union are computed using bitwise and/or, giving a fast implementation of the Jaccard coefficient: `J is popcount(A1 /\ A2) / popcount(A1 \/ A2)`.
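Python integers are also arbitrary-precision, so the same bitset encoding can be sketched there. The index dictionary plays the role of m; the bit count uses bin(...).count("1") so the sketch does not depend on int.bit_count() (Python 3.10+). Attribute names are hypothetical.

```python
def encode(attrs, index):
    """a_i(f): set bit m(a) for every attribute a of the feature (OR equals the sum for distinct bits)."""
    bits = 0
    for a in attrs:
        bits |= 1 << index[a]
    return bits


def popcount(n):
    """Number of set bits in a non-negative integer."""
    return bin(n).count("1")


def jaccard_bits(b1, b2):
    """Jaccard coefficient from bitwise AND (intersection) and OR (union)."""
    union = b1 | b2
    return popcount(b1 & b2) / popcount(union) if union else 0.0


if __name__ == "__main__":
    index = {"a": 0, "b": 1, "c": 2}
    f1 = encode({"a", "b"}, index)  # 0b011 == 3
    f2 = encode({"a", "c"}, index)  # 0b101 == 5
    print(jaccard_bits(f1, f2))     # 1/3
```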
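  • 62. Similarity metrics + reasoning. Attributes are description logic class expressions, rarely exact matches across species: $a(\mathit{human1}) \ni \mathit{dystrophic} \sqcap \exists \mathit{quality\_of}.\mathit{arm\_muscle}$ and $a(\mathit{zebrafish7}) \ni \mathit{atrophied} \sqcap \exists \mathit{quality\_of}.\mathit{pectoral\_fin\_muscle}$; these expressions are not equal ($\neq$), so $a(\mathit{human1}) \cap a(\mathit{zebrafish7}) = \emptyset$.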
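  • 63. Use reasoning to find a subsumer: find the least common ancestor expression, typically a class expression rather than a named class. For $\mathit{dystrophic} \sqcap \exists \mathit{quality\_of}.\mathit{arm\_muscle} \in a(\mathit{human1})$ and $\mathit{atrophied} \sqcap \exists \mathit{quality\_of}.\mathit{pectoral\_fin\_muscle} \in a(\mathit{zebrafish7})$, the common subsumer is $\mathit{decreased\_size} \sqcap \exists \mathit{quality\_of}.\mathit{muscle\_of\_upper\_limb}$, so $a^*(\mathit{human1}) \cap a^*(\mathit{zebrafish7}) = \{\mathit{decreased\_size} \sqcap \exists \mathit{quality\_of}.\mathit{muscle\_of\_upper\_limb}\}$, where $a^*$ includes inferred subsumers.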
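A simplified Python sketch of subsumer finding, restricted to named classes in an is_a hierarchy and to phenotypes of the form quality ⊓ ∃quality_of.entity, treated as (quality, entity) pairs whose two parts are generalized independently. A full implementation would use a DL reasoner over arbitrary class expressions; this sketch does not. The toy hierarchy is an assumption, chosen so that it reproduces the subsumer in the example.

```python
# Toy is_a hierarchy (assumed), chosen to mirror the example above.
TOY_IS_A = {
    "dystrophic": ["decreased_size"],
    "atrophied": ["decreased_size"],
    "decreased_size": ["quality"],
    "arm_muscle": ["muscle_of_upper_limb"],
    "pectoral_fin_muscle": ["muscle_of_upper_limb"],
    "muscle_of_upper_limb": ["muscle"],
}


def ancestors(cls, is_a):
    """Reflexive-transitive closure of is_a for one named class."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def least_common_subsumers(c1, c2, is_a):
    """Most specific named classes that subsume both c1 and c2."""
    common = ancestors(c1, is_a) & ancestors(c2, is_a)
    # Keep c only if no other common subsumer d lies strictly below it.
    return {c for c in common
            if not any(d != c and c in ancestors(d, is_a) for d in common)}


def lcs_of_phenotypes(p1, p2, is_a):
    """Generalize (quality, entity) pairs, read as quality ⊓ ∃quality_of.entity, part by part."""
    (q1, e1), (q2, e2) = p1, p2
    return {(q, e)
            for q in least_common_subsumers(q1, q2, is_a)
            for e in least_common_subsumers(e1, e2, is_a)}


if __name__ == "__main__":
    human1 = ("dystrophic", "arm_muscle")
    zebrafish7 = ("atrophied", "pectoral_fin_muscle")
    print(lcs_of_phenotypes(human1, zebrafish7, TOY_IS_A))
    # {('decreased_size', 'muscle_of_upper_limb')}
```

Generalizing the quality and the entity separately is only a shortcut for this conjunctive pattern; it is not a general least-common-subsumer algorithm for description logics.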
  • 64. Implementation: uses Thea2. Thea2 is a Prolog package for OWL 2 (http://github.com/vangelisv/thea). It reads/writes RDF/XML, OWL-XML, and a native Prolog form, and supports Description Logic Programs (DLPs). Reasoning strategies: Prolog; DL reasoners (via JPL/OWLAPI); SQL DB + forward chaining.
  • 65. Phenotype matching: results. Proof of concept on 10 human disease genes (publication forthcoming). Currently being applied to neurodegenerative diseases. Funding obtained to extend to all Mendelian diseases.
  • 66. Web applications. http://berkeleybop.org/obo: a web interface to the Open Bio Ontologies, implemented in Perl + SWI-Prolog. Prototype for future development: SWI-Prolog. Production version: Perl and/or Java.
  • 67. Experiences using LP for bioinformatics: conclusions. A little bit of LP goes a long way. The territory between theory and application is largely untapped. A variety of LP paradigms are useful: ASP, Datalog, DLs, Prolog, ILP, … Interoperation can be hard! LP for ‘real world’ applications is possible! The declarative approach is arguably superior, and web/database applications are a sweet spot. We need to show more success stories ... and to dispel myths.
  • 68. Recommendation: make it easier for users. Documentation: unify community knowledge in a single wiki; create a general LP mailing list (cf. the OWL/Semantic Web community). Tools: program analysis, including lint-like tools for tabled Prologs and ASP; visualization. Libraries: a CPAN for Prolog.
  • 69. Recommendation: make it open source. Why: it encourages collaboration; bioinformaticians love open source; the people who fund bioinformaticians love open source; open source can still generate revenue. How: deposit code in open-source code repositories (GitHub, SourceForge, Google Code, etc.); embrace Web 2.0: blog it, put it on a wiki.
  • 70. Recommendation: interoperate with RDBs. Why? RDBs and LP should be a natural match; application developers are conservative and familiar with RDBs; lightweight in-memory embedded RDBs are becoming more popular. How: hide LP systems behind a pseudo-SQL interface, with SQL queries and DDL translated behind the scenes (cf. sql_compiler), so users can move to native LP syntax and semantics as they feel comfortable; embed LP systems directly in RDBs (e.g. PostgreSQL extensions); improve Prolog-to-SQL interfaces with a common API (cf. JDBC for Java, DBI for Perl).
  • 71. Recommendation: a unified API to all LP systems. Use case: calling an LP system from a host language (Java, Perl, Ruby, even another Prolog). Problem: there is no standardization amongst APIs. Analogous problem: RDB APIs, a problem solved in the 20th century. Proposal: a common REST interface, and a single interface per host language.
  • 72. Interoperation between LP systems. LP systems (ILP, ASP, Prolog, …) differ in whether they accept: Foo(x).  'Foo'(x).  'foobar'(x).  foo('xy').  foo("xy"). Non-Prolog systems should adhere to the ISO standard for their intersection with pure Prolog, or at least provide an ISO mode. See also: ISO Common Logic, W3C RIF.
  • 73. Future directions. Scalable LP. Probabilistic + logic modeling: CLP(Bayes), PRISM.
  • 74. Robot scientist. King et al., "The Automation of Science", Science, 3 April 2009: 85-89. DOI: 10.1126/science.1165620. http://news.bbc.co.uk/2/hi/science/nature/7979113.stm
  • 75. Acknowledgments: Vangelis Vassiliadis (Thea); Stephen Veitch (intervaldb); Christoph Draxler (sql_compiler); Jan Wielemaker and the SWI-Prolog mailing list; Paulo Moura; Vítor Santos Costa and the Yap developers; Terrance Swift and the XSB developers.
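The two similarity measures defined on slide 60 can be sketched directly over Python sets. This is a minimal illustration, not the original SWI-Prolog code: estimating p(a) as the fraction of features annotated with a, and returning 0.0 for an empty union or when nothing is shared, are our own assumptions, and the feature and attribute names are hypothetical.

```python
import math
from typing import Dict, Set


def jaccard(ax: Set[str], ay: Set[str]) -> float:
    """|a(x) ∩ a(y)| / |a(x) ∪ a(y)|; returns 0.0 when both sets are empty (our convention)."""
    union = ax | ay
    return len(ax & ay) / len(union) if union else 0.0


def information_content(annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """IC(a) = -log2 p(a), with p(a) estimated as the fraction of features annotated with a."""
    n = len(annotations)
    counts: Dict[str, int] = {}
    for attrs in annotations.values():
        for attr in attrs:
            counts[attr] = counts.get(attr, 0) + 1
    return {attr: -math.log2(c / n) for attr, c in counts.items()}


def max_ic(ax: Set[str], ay: Set[str], ic: Dict[str, float]) -> float:
    """maxIC(x, y): the largest IC among shared attributes; 0.0 if nothing is shared."""
    return max((ic[attr] for attr in ax & ay), default=0.0)


# Hypothetical toy annotations: feature -> attribute set.
annotations = {"f1": {"a1", "a2"}, "f2": {"a2", "a3"}, "f3": {"a3"}, "f4": {"a3"}}
ic = information_content(annotations)
print(jaccard(annotations["f1"], annotations["f2"]))      # 1 shared attribute out of 3 in total
print(max_ic(annotations["f1"], annotations["f2"], ic))   # IC(a2) = -log2(2/4)
```

Rarely used attributes get a high IC, so maxIC rewards pairs that share something specific, whereas the Jaccard coefficient treats every attribute equally.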
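Python integers are also arbitrary precision, so the bit-vector encoding from slide 61 can be mirrored in a short sketch. The function names are hypothetical, and popcount here is a plain helper (counting set bits) standing in for the cardinality step that turns the bitwise and/or results into set sizes.

```python
from typing import Dict, Iterable


def index_attributes(attributes: Iterable[str]) -> Dict[str, int]:
    """m : A -> {0, ..., |A|-1}, assigning each attribute its own bit position."""
    return {attr: i for i, attr in enumerate(sorted(set(attributes)))}


def encode(attrs: Iterable[str], m: Dict[str, int]) -> int:
    """Sum of 2**m(a) over a in a(f): the attribute set packed into one unbounded int."""
    bits = 0
    for attr in set(attrs):
        bits |= 1 << m[attr]
    return bits


def popcount(bits: int) -> int:
    """Number of set bits, i.e. the cardinality of the encoded set."""
    return bin(bits).count("1")


def jaccard_bits(x: int, y: int) -> float:
    """Jaccard coefficient using bitwise AND for intersection and OR for union."""
    union = x | y
    return popcount(x & y) / popcount(union) if union else 0.0


m = index_attributes(["a1", "a2", "a3"])
x = encode({"a1", "a2"}, m)  # bits 0 and 1 set
y = encode({"a2", "a3"}, m)  # bits 1 and 2 set
print(x, y, jaccard_bits(x, y))
```

Because the integers are unbounded, the encoding keeps working when |A| runs into the thousands; each set operation becomes a single bitwise operation on a large integer instead of a list traversal.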
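The subsumer computation on slides 62 and 63 can be approximated with a toy structural-subsumption sketch. Assumptions, stated loudly: a class expression Q ⊓ ∃quality_of.E is encoded as the pair (Q, E); the two is-a hierarchies below are invented toy data chosen to reproduce the slide's example; and the closure is computed by walking named-class hierarchies rather than by a DL reasoner such as those reached via Thea2.

```python
from itertools import product
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy is-a hierarchies (child -> direct parents), NOT taken from a real ontology.
QUALITY_PARENTS: Dict[str, Set[str]] = {
    "dystrophic": {"decreased_size"},
    "atrophied": {"decreased_size"},
}
ENTITY_PARENTS: Dict[str, Set[str]] = {
    "arm_muscle": {"muscle_of_upper_limb"},
    "pectoral_fin_muscle": {"muscle_of_upper_limb"},
}

# (Q, E) stands for the class expression  Q ⊓ ∃quality_of.E  (an assumed encoding).
Expr = Tuple[str, str]


def superclasses(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Reflexive-transitive closure of is-a for a single named class."""
    result, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def subsumers(expr: Expr) -> Set[Expr]:
    """All (Q', E') with Q ⊑ Q' and E ⊑ E': structural subsumption only, no DL reasoner."""
    quality, entity = expr
    return set(product(superclasses(quality, QUALITY_PARENTS),
                       superclasses(entity, ENTITY_PARENTS)))


def closure(attrs: Iterable[Expr]) -> Set[Expr]:
    """a*(f): the asserted attributes plus everything that subsumes them."""
    result: Set[Expr] = set()
    for attr in attrs:
        result |= subsumers(attr)
    return result


def most_specific(exprs: Set[Expr]) -> Set[Expr]:
    """Drop any expression that strictly subsumes another expression in the same set."""
    return {x for x in exprs if not any(y != x and x in subsumers(y) for y in exprs)}


a_human1 = {("dystrophic", "arm_muscle")}
a_zebrafish7 = {("atrophied", "pectoral_fin_muscle")}
print(a_human1 & a_zebrafish7)  # no exact match
print(most_specific(closure(a_human1) & closure(a_zebrafish7)))
```

The exact intersection is empty, as on slide 62, while the intersection of the closures yields the shared subsumer from slide 63; most_specific keeps only the least common subsumers when several shared expressions survive.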
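Slide 64 lists "SQL DB + forward chaining" as one reasoning strategy. As a rough illustration of what that can mean, and not a description of Thea2's actual code, here is a naive fixed-point materialization of the transitivity rule subclass(X, Z) :- subclass(X, Y), subclass(Y, Z) inside an in-memory SQLite table, using Python's standard sqlite3 module. The table and function names are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def materialize_subclass_closure(pairs: Iterable[Pair]) -> List[Pair]:
    """Naive forward chaining of subclass(X, Z) :- subclass(X, Y), subclass(Y, Z).

    Each round derives new rows with one SQL self-join; the primary key plus
    INSERT OR IGNORE discards duplicates, and the loop stops once a round adds nothing.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE subclass (sub TEXT, sup TEXT, PRIMARY KEY (sub, sup))")
    con.executemany("INSERT OR IGNORE INTO subclass VALUES (?, ?)", list(pairs))

    def size() -> int:
        return con.execute("SELECT COUNT(*) FROM subclass").fetchone()[0]

    before = -1
    while size() != before:  # fixed point: no new facts derived in the last round
        before = size()
        con.execute(
            "INSERT OR IGNORE INTO subclass "
            "SELECT a.sub, b.sup FROM subclass AS a JOIN subclass AS b ON a.sup = b.sub"
        )
    rows = con.execute("SELECT sub, sup FROM subclass ORDER BY sub, sup").fetchall()
    con.close()
    return rows


print(materialize_subclass_closure([
    ("arm_muscle", "muscle_of_upper_limb"),
    ("muscle_of_upper_limb", "muscle"),
]))
```

Materializing inferences up front keeps later queries as plain SQL lookups, at the cost of storing every derived row; a production system would use semi-naive evaluation (joining only against rows new in the previous round) rather than this naive loop.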
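Slide 70's call for better Prolog-to-SQL interfaces can be pictured with a tiny translator from a conjunctive Datalog query to a single SELECT. This is a hedged sketch, not Draxler's sql_compiler: it assumes each predicate maps to a table whose columns are named arg0, arg1, ... (a hypothetical schema), treats capitalized strings as variables per Prolog convention, supports no negation, recursion, or anonymous variables, and interpolates predicate names unescaped, so they must be trusted identifiers.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple, Union

Term = Union[str, int]
Atom = Tuple[str, Sequence[Term]]


def is_variable(term: Term) -> bool:
    """Prolog convention: a variable is a name starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def conjunctive_query_to_sql(head_vars: Sequence[str],
                             body: Sequence[Atom]) -> Tuple[str, List[Term]]:
    """Translate  ans(head_vars) :- body  into one SELECT over hypothetical argN columns."""
    froms: List[str] = []
    wheres: List[str] = []
    params: List[Term] = []
    first_column: Dict[str, str] = {}  # variable -> first column it was seen in
    for i, (pred, args) in enumerate(body):
        alias = f"t{i}"
        froms.append(f"{pred} AS {alias}")  # pred is interpolated: trusted names only
        for j, arg in enumerate(args):
            column = f"{alias}.arg{j}"
            if not is_variable(arg):
                wheres.append(f"{column} = ?")  # constant -> selection (bound parameter)
                params.append(arg)
            elif arg in first_column:
                wheres.append(f"{first_column[arg]} = {column}")  # shared variable -> join
            else:
                first_column[arg] = column
    sql = "SELECT DISTINCT " + ", ".join(first_column[v] for v in head_vars)
    sql += " FROM " + ", ".join(froms)
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql, params


def run_query(con: sqlite3.Connection, head_vars: Sequence[str],
              body: Sequence[Atom]) -> List[tuple]:
    """Translate the query and run it against an open SQLite connection."""
    sql, params = conjunctive_query_to_sql(head_vars, body)
    return con.execute(sql, params).fetchall()
```

For example, ans(G, P) :- gene(G, human), phenotype(G, P) becomes a two-table join with the shared variable G as the join condition and the constant human as a bound parameter.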
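A conservative way to exchange facts between the systems on slide 72 is to emit only syntax that ISO Prolog agrees on. The hypothetical helper below quotes an atom only when it is not a plain lowercase-initial alphanumeric name, escapes embedded quotes and backslashes, and never emits double-quoted strings, whose meaning depends on the double_quotes flag (SWI-Prolog 7, for example, reads them as strings by default). It handles only atoms and integers; it is an illustrative sketch, not a complete ISO term writer.

```python
import re
from typing import Sequence, Union

# Atoms matching this pattern can be written without quotes in ISO Prolog.
_PLAIN_ATOM = re.compile(r"[a-z][A-Za-z0-9_]*")


def iso_atom(name: str) -> str:
    """Leave lowercase-initial alphanumeric atoms bare; single-quote everything else."""
    if _PLAIN_ATOM.fullmatch(name):
        return name
    escaped = name.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def iso_term(arg: Union[str, int]) -> str:
    """Integers are written as-is; strings are always written as (possibly quoted) atoms."""
    if isinstance(arg, bool) or not isinstance(arg, (str, int)):
        raise TypeError(f"unsupported argument: {arg!r}")
    return str(arg) if isinstance(arg, int) else iso_atom(arg)


def portable_fact(pred: str, args: Sequence[Union[str, int]] = ()) -> str:
    """Write a ground fact using the ISO subset; never emits double-quoted strings."""
    if not args:
        return f"{iso_atom(pred)}."
    return f"{iso_atom(pred)}({', '.join(iso_term(a) for a in args)})."


print(portable_fact("Foo", ["x"]))
print(portable_fact("foo", ["xy"]))
```

Under this scheme Foo(x) is never produced: a capitalized predicate name is written as 'Foo'(x), and the data from foo("xy") is written as foo(xy), sidestepping the forms on which the systems disagree.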

Speaker notes

  1. Data: a curse and a blessing.
  2. Users typically download flat files of data and integrate them manually. References: Stein, L. (1996). How Perl saved the human genome project. The Perl Journal, 1(1). Stein, L. (2002). Creating a bioinformatics nation. Nature, 417(6885), 119-120. Stein, L. D. (2003). Integrating biological databases. Nature Reviews Genetics, 4(5), 337-345.
  3. Damascene conversion
  4. Stajich, J. E., Block, D., Boulez, K., Brenner, S. E., Chervitz, S. A., Dagdigian, C., Fuellen, G., Gilbert, J. G., Korf, I., Lapp, H., Lehvaslaiho, H., Matsalla, C., Mungall, C. J., Osborne, B. I., Pocock, M. R., Schattner, P., Senger, M., Stein, L. D., Stupka, E., Wilkinson, M. D., and Birney, E. (2002). The Bioperl toolkit: Perl modules for the life sciences. Genome Res., 12(10), 1611-1618.
  5. The empty extension problem
  6. extensional is a macro for dynamic + multifile. It also asserts facts in a metamodel module, allowing introspection, saving, etc. There is still some repetition. pldoc comments: it is harder to extract metamodel info from them. More typing would be good. Metamodel directives don't do much: graceful failure when there is no data; I/O.
  7. The same code works for both in-memory and RDB storage: amazing! Very powerful. Non-recursive predicates only: you choose when to swap out the Prolog store and use the RDB. With recursive clauses, you can choose to bind only the fact predicates to the RDB.
  8. The term ‘junk DNA’ is outdated
  9. Pax6 is a master regulator. Shared ancestor 0.5bn years ago. Fly eyes are vastly different.
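Note 7's idea, keeping recursive rules in the host program while binding only the fact predicates to either backend, can be sketched in Python. This is an analogy, not the author's Prolog code: MemoryStore, SqliteStore, subclass_of and ancestors are hypothetical names, and the two stores simply expose the same lookup so that the recursive ancestors function runs unchanged against either one.

```python
import sqlite3
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


class MemoryStore:
    """subclass_of facts held in a Python dict (stands in for the in-memory Prolog store)."""

    def __init__(self, pairs: Iterable[Pair]) -> None:
        self._parents: Dict[str, List[str]] = {}
        for child, parent in pairs:
            self._parents.setdefault(child, []).append(parent)

    def subclass_of(self, child: str) -> List[str]:
        return list(self._parents.get(child, []))


class SqliteStore:
    """The same subclass_of facts held in an SQLite table (stands in for the RDB)."""

    def __init__(self, pairs: Iterable[Pair]) -> None:
        self._con = sqlite3.connect(":memory:")
        self._con.execute("CREATE TABLE subclass_of (child TEXT, parent TEXT)")
        self._con.executemany("INSERT INTO subclass_of VALUES (?, ?)", list(pairs))

    def subclass_of(self, child: str) -> List[str]:
        rows = self._con.execute("SELECT parent FROM subclass_of WHERE child = ?", (child,))
        return [parent for (parent,) in rows]


def ancestors(store, cls: str) -> Set[str]:
    """Recursive rule evaluated in the host language; only base facts come from the store."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in store.subclass_of(stack.pop()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


facts = [("arm_muscle", "muscle_of_upper_limb"), ("muscle_of_upper_limb", "muscle")]
print(ancestors(MemoryStore(facts), "arm_muscle") == ancestors(SqliteStore(facts), "arm_muscle"))
```

Only the single-table lookup crosses the storage boundary, so the recursion never has to be expressed in SQL; the trade-off is one database round trip per visited class.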