Graph DB + Bioinformatics:
    Bio4j, recent applications and future
                  directions




www.ohnosequences.com               www.bio4j.com
But who's this guy talking here?
     I am currently working as a bioinformatics consultant/developer/researcher at
     Oh no sequences!, and I have spent the last two months here at the Ohio State
     University as a Visiting Scholar.

     Oh no what!?
     We are the R&D group at Era7 Bioinformatics.
     We like bioinformatics, cloud computing, NGS, category theory, bacterial
     genomics…
     well, lots of things.


     What about Era7 Bioinformatics?
     Era7 Bioinformatics is a bioinformatics company specializing in sequence analysis,
     knowledge management and sequencing data interpretation.
     Our area of expertise revolves around biological sequence analysis, particularly
     Next Generation Sequencing data management and analysis.

We're a small but quite peculiar company! (in the good sense, of course :) )
  Currently we have offices in:

       Madrid (Spain)
       Boston, MA (USA)
       Granada (Spain)

  Yeah, I know what you're thinking:
  they are not exactly ugly cities…

Our team is multidisciplinary: bioinformaticians, mathematicians, lab
   researchers, immunologists, biologists specialized in biochemistry and IT
   professionals.

   A team formed by people with different backgrounds is able to analyze the
   same problem from different points of view.


   We are based on research

            In a fast-changing area, our activity depends on being able to offer
            cutting-edge solutions. This is only possible by maintaining continuous
            research and innovation activity.
            In addition, since many of our customers are researchers, being part
            of that community allows us to be really customer oriented.

Everything we do is 100% open source!

         Yes, we hate patents.
         And no, we're not crazy (or maybe just a bit…)


   OK, that's really nice, but how can that actually work?

         •   Free marketing and dissemination
         •   We can use other open source bioinformatics tools/DBs/etc.
         •   Faster adaptation to a fast-changing field (bioinformatics, genomics)
         •   You may not earn a lot of money, but you earn enough while doing many
             creative things

Money? Where from ??

        • Providing services
        • Adapting services to different infrastructures and frameworks…


   OK, but you could probably get much more money with
   a different business model…




                Yeah, but this is our philosophy!




We are also based on Cloud Computing




   Cloud computing has revolutionized the world of computing because in this
   paradigm you get the infrastructure as a service (IaaS). We are experts in the
   use of the leader in this field: Amazon Web Services (AWS).


   So, what do we get?

         a) No investment in infrastructure. Pay per use.

         b) Scalability: for example, we can launch just one virtual server for two
            hours, or more than one hundred for ten hours, depending on the
            amount of data to be analyzed in each project.

What's Bio4j?
     Bio4j is a bioinformatics graph-based DB including most data
     available in:

            Uniprot KB (Swiss-Prot + Trembl)
            Gene Ontology (GO)
            UniRef (50, 90, 100)
            NCBI Taxonomy
            RefSeq
            Enzyme DB

What's Bio4j?

     It provides a completely new and powerful framework
     for querying and managing protein-related
     information.

     Since it relies on a high-performance graph engine, data
     is stored in a way that semantically represents its own
     structure.

What's Bio4j?

     Bio4j uses Neo4j technology, a "high-performance graph
     engine with all the features of a mature and robust
     database".

     Thanks to both the underlying Neo4j DB and the API
     provided, Bio4j is also very scalable, allowing anyone
     to easily incorporate their own data and make the most
     of it.

What's Bio4j?

     Everything in Bio4j is open source!

     Released under AGPLv3

Bioinformatics DBs and Graphs

     Highly interconnected, overlapping knowledge
     spread throughout different DBs

     Outline:
      - Bioinformatics DBs and Graphs
      - Initial motivation
      - Bio4j structure
      - Some samples
      - Why Bio4j?
      - Bio4j and the Cloud

     However, in most cases all this data is modeled in relational databases,
     sometimes even just as plain CSV files.

          As the amount and diversity of data grows, domain models
          become crazily complicated!

     With a relational paradigm, the double implication

                    Entity ⇔ Table

     does not go both ways.

          You get 'auxiliary' tables that have no relationship with the small
          piece of reality you are modeling.

          You need 'artificial' IDs only for connecting entities (and these are mixed
          with IDs that somehow live in reality).

          Entity-relationship models are cool, but in the end you always have to
          deal with 'raw' tables plus SQL.

          Integrating/incorporating new knowledge into already existing
          databases is hard, and sometimes not even possible without changing
          the domain model.

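To make the point about 'auxiliary' tables and 'artificial' IDs a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions made up for illustration; this is not the official GO or Uniprot schema.

```python
import sqlite3

# Hypothetical mini-schema: two entity tables plus one 'auxiliary' join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE protein (id INTEGER PRIMARY KEY, accession TEXT);
    CREATE TABLE go_term (id INTEGER PRIMARY KEY, go_id TEXT);
    -- Auxiliary table: it models no entity, it only connects two others
    -- through artificial integer IDs.
    CREATE TABLE protein_go_annotation (protein_id INTEGER, go_term_id INTEGER);
""")
conn.execute("INSERT INTO protein VALUES (1, 'P12345')")
conn.executemany("INSERT INTO go_term VALUES (?, ?)",
                 [(10, "GO:0005739"), (11, "GO:0008483")])
conn.executemany("INSERT INTO protein_go_annotation VALUES (?, ?)",
                 [(1, 10), (1, 11)])


def go_terms_relational(accession):
    """Two joins are needed just to follow one kind of association."""
    rows = conn.execute("""
        SELECT g.go_id
        FROM protein p
        JOIN protein_go_annotation a ON a.protein_id = p.id
        JOIN go_term g ON g.id = a.go_term_id
        WHERE p.accession = ?
        ORDER BY g.go_id
    """, (accession,)).fetchall()
    return [row[0] for row in rows]


# Graph-style view of the same (made-up) facts: the association hangs
# directly off the entity, keyed only by identifiers that exist in reality.
annotations = {"P12345": ["GO:0005739", "GO:0008483"]}


def go_terms_graph(accession):
    """Follow the association directly, with no join table in between."""
    return sorted(annotations.get(accession, []))
```

Both functions answer the same question; the relational one needs the extra table and the artificial `id` columns to get there.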
     Life in general and biology in particular are probably not 100% like a graph…

          but one thing's sure: they are not a set of tables!

     NoSQL (not only SQL)

     NoSQ… what!??

     Let's see what Wikipedia says…

          "NoSQL is a broad class of database management systems
          that differ from the classic model of the relational database
          management system (RDBMS) in some significant ways.
          These data stores may not require fixed table schemas,
          usually avoid join operations and typically scale
          horizontally."

     NoSQL data models

     Neo4j is a high-performance, NoSQL graph database with all
     the features of a mature and robust database.

     The programmer works with an object-oriented, flexible
     network structure rather than with strict and static tables.

     All the benefits of a fully transactional, enterprise-strength
     database.

     For many applications, Neo4j offers performance
     improvements on the order of 1000x or more compared to
     relational DBs.

     OK, but why start all this?
     Were you that bored…?!

     It all started with our need for massive access to
     protein GO (Gene Ontology) annotations.

     At that point I had to develop my own MySQL DB based on the official
     GO SQL database, and problems started from the beginning:

          I went crazy 'deciphering' how to extract Uniprot protein annotations
          from the official GO table schema.

          Uniprot and GO official protein annotations were not always consistent.

          Populating my own DB took really long due to all the joins and
          subqueries needed in order to get and store the protein annotations.

          Soon enough we also needed massive access to basic
          protein information.

     These processes had to be automated for BG7, our bacterial genome
     annotation system (specifically designed for NGS data).

          The available Uniprot web services were too limited:

            - Slow
            - Limited number of queries
            - Too little information available

          So I downloaded the whole Uniprot DB in XML format
          (Swiss-Prot + Trembl)

          and started to have some fun with it!

BG7 algorithm


   1. Selection of the specific reference protein set

   2. Prediction of possible genes by BLAST similarity

   3. Gene definition: merging compatible similarity regions, detecting start and stop

   4. Solving overlapped predicted genes

   5. RNA prediction by BLAST similarity

   6. Final annotation and complete deliverables. Quality control.




www.era7bioinformatics.com
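The gene definition step merges compatible similarity regions. The slides do not say what "compatible" means in BG7, so the sketch below assumes the simplest reading: regions on the same contig and strand that overlap or lie within a small gap. The function name and the `max_gap` parameter are hypothetical, not part of BG7.

```python
def merge_regions(regions, max_gap=0):
    """Merge (start, end) regions that overlap or are at most max_gap apart.

    Assumption: all regions are on the same contig and strand. BG7's real
    compatibility criteria are not described in the slides.
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + max_gap:
            # Extend the previous region instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For instance, two BLAST similarity regions covering positions 100-200 and 150-300 would collapse into a single candidate region 100-300, while a region at 400-500 would stay separate.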
     We got used to having massive direct access to all this protein-related
     information…

          So why not add other resources that we needed quite often
          in most projects and that were becoming a sort of
          bottleneck compared to everything already included in Bio4j?

     Then came:

           - Isoform sequences
           - Protein interactions and features
           - UniRef 50, 90, and 100
           - RefSeq
           - NCBI Taxonomy
           - Enzyme Expasy DB

     Let's dig a bit into the Bio4j structure:

     [Figure: data sources and their relationships]

     [Figure: Bio4j domain model]

     The Graph DB model: representation

     Core abstractions:
        Nodes
        Relationships between nodes
        Properties on both

     Let's dig a bit into the Bio4j structure:

          How are things modeled?

          Couldn't be simpler!

          Entities                       -->  Nodes
          Associations / Relationships   -->  Edges

     Some examples of nodes would be:

          Protein
          GO term
          Genome Element

     and relationships:

          Protein  --PROTEIN_GO_ANNOTATION-->  GO term

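A minimal sketch of that idea in plain Python: nodes, typed relationships between them, and properties on both. The class names, the property keys and the pairing of this protein with this GO term are assumptions for illustration only; this is not the Bio4j API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


# Illustrative data only: one protein node, one GO term node, one edge.
protein = Node("Protein", {"accession": "P12345"})
go_term = Node("GO term", {"id": "GO:0005739"})
annotation = Relationship("PROTEIN_GO_ANNOTATION", protein, go_term)


def outgoing(node, relationships, rel_type):
    """Return the end nodes reached from `node` via relationships of one type."""
    return [r.end for r in relationships
            if r.start is node and r.rel_type == rel_type]
```

Following an association is then just a walk along edges of a given type, e.g. `outgoing(protein, [annotation], "PROTEIN_GO_ANNOTATION")`.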
     We have developed a tool intended to serve both as a reference manual and
     as a first contact with the Bio4j domain model: Bio4jExplorer

     Bio4jExplorer allows you to:

     • Navigate through all nodes and relationships
     • Access the javadocs of any node or relationship
     • Graphically explore the neighborhood of a node/relationship
     • Look up the indexes that may serve as an entry point for a node
     • Check incoming/outgoing relationships of a specific node
     • Check start/end nodes of a specific relationship

     Entry points and indexing

     There are two kinds of entry points for the graph:

          Auxiliary relationships going from the reference node, e.g.

            - CELLULAR_COMPONENT: leads to the root of the GO cellular component
              sub-ontology
            - MAIN_DATASET: leads to both main datasets: Swiss-Prot and Trembl

          Node indexing

          There are two types of node indexes:

            - Exact: Only exact values are considered hits
            - Fulltext: Regular expressions can be used

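The difference between the two index types can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class names are hypothetical, and the fulltext index is approximated with Python regular expressions to mirror the description above, whereas the real Neo4j indexes are backed by Lucene and use its query syntax.

```python
import re


class ExactIndex:
    """Maps a property value to nodes; only identical values are hits."""

    def __init__(self):
        self._entries = {}

    def add(self, value, node):
        self._entries.setdefault(value, []).append(node)

    def get(self, value):
        return list(self._entries.get(value, []))


class FulltextIndex:
    """Keeps (text, node) pairs and matches the text against a pattern."""

    def __init__(self):
        self._entries = []

    def add(self, text, node):
        self._entries.append((text, node))

    def query(self, pattern):
        regex = re.compile(pattern, re.IGNORECASE)
        return [node for text, node in self._entries if regex.search(text)]


# Toy node represented as a plain dict (an assumption, not a Bio4j type).
node = {"accession": "P12345"}

accession_index = ExactIndex()
accession_index.add("P12345", node)

name_index = FulltextIndex()
name_index.add("Aspartate aminotransferase, mitochondrial", node)
```

Here `accession_index.get("P1234")` finds nothing because the value is not an exact match, while `name_index.query("amino.*ase")` does find the node.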
     Querying Bio4j with Cypher

     Getting a keyword by its ID

     START k=node:keyword_id_index(keyword_id_index = "KW-0181")
     return k.name, k.id

     Finding circuits/simple cycles of length 3 where at least one protein is from
     the Swiss-Prot dataset:

     START d=node:dataset_name_index(dataset_name_index = "Swiss-Prot")
     MATCH d <-[r:PROTEIN_DATASET]- p,
     circuit = (p) -[:PROTEIN_PROTEIN_INTERACTION]-> (p2) -
     [:PROTEIN_PROTEIN_INTERACTION]-> (p3) -
     [:PROTEIN_PROTEIN_INTERACTION]-> (p)
     return p.accession, p2.accession, p3.accession

          Check this blog post for more info and our Bio4j Cypher cheat sheet

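For readers who do not know Cypher, here is a rough Python sketch of what the circuit query is asking for: directed cycles p -> p2 -> p3 -> p that start at a protein from a given dataset. The function name and the toy accessions are made up, and requiring three distinct proteins is an assumption of this sketch; it says nothing about how Neo4j evaluates the pattern.

```python
def three_cycles_from(start_nodes, interactions):
    """Return (p, p2, p3) tuples with p -> p2 -> p3 -> p and p in start_nodes.

    `interactions` maps an accession to the set of accessions it points to
    (directed PROTEIN_PROTEIN_INTERACTION edges).
    """
    cycles = []
    for p in sorted(start_nodes):
        for p2 in sorted(interactions.get(p, ())):
            if p2 == p:
                continue  # skip self-interactions
            for p3 in sorted(interactions.get(p2, ())):
                if p3 in (p, p2):
                    continue  # keep the cycle simple: three distinct proteins
                if p in interactions.get(p3, ()):
                    cycles.append((p, p2, p3))
    return cycles


# Made-up accessions, for illustration only.
interactions = {"A": {"A", "B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}
swiss_prot = {"A", "D"}
```

With this toy data only A closes a cycle (A -> B -> C -> A); D interacts with A but nothing leads back to D.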
     Gremlin: a graph traversal language

     Get a protein by its accession number and return its full name

      gremlin> g.idx('protein_accession_index')[['protein_accession_index':'P12345']].full_name
      ==> Aspartate aminotransferase, mitochondrial

     Get proteins (accessions) associated with an InterPro motif (limited to 4 results)

      gremlin> g.idx('interpro_id_index')[['interpro_id_index':'IPR023306']].inE('PROTEIN_INTERPRO').outV.accession[0..3]
      ==> E2GK26
      ==> G3PMS4
      ==> G3Q865
      ==> G3PIL8

          Check our Bio4j Gremlin cheat sheet

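The second traversal reads as a pipeline: look the motif up in an index, take its incoming PROTEIN_INTERPRO edges, step to the vertex each edge comes from, read the accession and keep the first four. A toy Python equivalent, with made-up identifiers and data structures that are purely assumptions of this sketch, could look like this:

```python
from itertools import islice

# Made-up toy data: edges are (source, label, target) triples.
interpro_index = {"IPR000001": "motif-1"}
edges = [
    ("prot-1", "PROTEIN_INTERPRO", "motif-1"),
    ("prot-2", "PROTEIN_INTERPRO", "motif-1"),
    ("prot-3", "PROTEIN_PROTEIN_INTERACTION", "prot-1"),
]
accession = {"prot-1": "ACC1", "prot-2": "ACC2", "prot-3": "ACC3"}


def proteins_for_motif(interpro_id, limit=4):
    """Mimic: index lookup, inE(label), outV, accession, [0..limit-1]."""
    motif = interpro_index[interpro_id]                      # index lookup
    incoming = (e for e in edges
                if e[2] == motif and e[1] == "PROTEIN_INTERPRO")  # inE
    sources = (e[0] for e in incoming)                       # outV
    return list(islice((accession[s] for s in sources), limit))
```

Each stage is lazy, so the traversal stops as soon as `limit` results have been produced.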
     Visualizations (1) --> REST Server Data Browser

     Navigate through Bio4j data in real time!

     Visualizations (2) --> Bio4j + Gephi

     Get really cool graph visualizations using Bio4j and the Gephi visualization and
     exploration platform

     Visualizations (3) --> Bio4j GO Tools

     Why would I use Bio4j?

     Massive access to protein/genome/taxonomy… related
     information

     Integration of your own DBs/resources around common
     information

     Development of services tailored to your needs built around
     Bio4j

     Network analysis

     Visualizations

     Plus many other uses I cannot think of myself…
     If you have something in mind for which Bio4j might be useful, please let
     us know so we can all see how it could help you meet your needs! ;)

     Bio4j + Cloud (1)

     We use AWS (Amazon Web Services) everywhere we can around
     Bio4j, which gives us the following benefits:

          Interoperability and data distribution

          Releases are available as public EBS snapshots, so AWS users
          can create volumes with the Bio4j DB 100% ready and attach them
          to their instances in just a few seconds.

          CloudFormation templates:

             - Basic Bio4j DB Instance
             - Bio4j REST Server Instance

     Bio4j + Cloud (2)

         Backup and storage using S3 (Simple Storage Service)

          We use S3 both for backup (indirectly, through the EBS snapshots) and
          storage (directly, storing RefSeq sequences as independent S3 files)

          What kind of benefits do we get from this?

             • Easy to use
             • Flexible
             • Cost-effective
             • Reliable
             • Scalable and high-performance
             • Secure

     Bio4j + Cloud (3)

         Web servers and service providers in the cloud

          Deploying your own web server in AWS using Bio4j as back-end is really
          simple.

          A good example of this would be Bio4jTestServer, a continuously
          developed server showcasing Web Services based on Bio4j.

     Community

     Bio4j has a fast-growing internet presence:

      - Twitter: check @bio4j for updates
      - Blog: go to http://blog.bio4j.com
      - Mailing list: ask any question you may have on our list.
      - LinkedIn: check the Bio4j group
      - GitHub issues: don't be shy! Open a new issue if you think
                       something's going wrong.

     And the best part of all this is:

       You have the latest version of Bio4j
       already imported and
       fully working in EgStation! ;)

Bio4j + MG7 for the integration and
     analysis of ChIP-seq data

Bio4j + MG7 + 24 ChIP-seq samples

     Some numbers:

                •   157 639 502 nodes
                •   742 615 705 relationships
                •   632 832 045 properties
                •   148 relationship types
                •   44 node types

             And it works just fine!

MG7 domain model




What's MG7?

     MG7 is a new system for massive analysis of sequences from
     metagenomics samples, specially designed for next generation sequencing
     technologies.

     MG7 uses cloud computing to solve the problem of massive data analysis,
     providing scalable, real-time, on-demand computing for metagenomics data
     analysis.

     MG7 obtains annotation and functional profiles for shotgun genomic
     sequences and taxonomic assignment for any type of read.

     The inference of function and the assignment of taxonomic origin for each
     sequence are based on massive BLAST similarity analysis.

What's MG7?

     MG7 lets you choose the parameters that set the
     thresholds for filtering the BLAST hits:

     i.    E-value
     ii.   Identity and query coverage

     It allows exporting the results of the analysis to different data formats such as:
     • XML
     • CSV
     • GEXF (Graph Exchange XML Format)

     It also provides heat maps and graph visualizations, together with
     a user-friendly interface that gives access to the alignment
     responsible for each functional or taxonomic read assignment and displays
     the frequencies in the taxonomic tree --> MG7Viewer

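As a rough illustration of threshold-based filtering of BLAST hits, here is a small Python sketch. The `BlastHit` class, the function name and the default threshold values are all hypothetical; the slides do not give MG7's actual parameters or defaults.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    subject_id: str
    evalue: float
    identity: float        # percent identity, 0-100
    query_coverage: float  # percent of the query covered, 0-100


def filter_hits(hits, max_evalue=1e-5, min_identity=90.0, min_query_coverage=80.0):
    """Keep the hits that pass all three thresholds (defaults are made up)."""
    return [h for h in hits
            if h.evalue <= max_evalue
            and h.identity >= min_identity
            and h.query_coverage >= min_query_coverage]
```

Only the hits that survive this filter would then be used for the functional or taxonomic assignment of a read.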
Heat-map Viz




Graph Viz




MG7 Viewer




Bio4j + GRG

        A completely new approach for
       modeling genomic information and
          gene regulatory networks




Bio4j + GRG

     Integrating genomic information from organisms such as:


            • Zea mays subsp. mays

            • Oryza sativa Japonica Group

            • Sorghum bicolor

            • Brachypodium distachyon

            • Arabidopsis thaliana Columbia

            • Arabidopsis lyrata lyrata MN47




Bio4j + GRG domain model




Bio4j + GRG

     Get all the advantages of Bio4j and graph DBs while modeling genomic data for
     grasses (although it could also be applied to other species/families).

     Possibility of integrating data from other projects here at CAPS/EGLab in a
     common framework.

     Data mining of data that is currently not accessible or simply not structured
     well enough to explore, both for external genomic data included in
     sites like Phytozome and for data coming directly from the experiments/analyses
     performed in the lab.

     Common framework for accessing all this information together with other
     "universal" resources such as Uniprot, RefSeq or Gene Ontology.

Bio4j + GRG

     Chance for the Lab to enter the Cloud and Graph DB world, being a pioneer in
     providing access to this sort of data to a whole range of different users.

      No more worrying about possible problems with backups, maintaining
      infrastructure or things like that…
      And what's more important:

              Scalability --> Being able to adapt to the specific needs of new projects
              as they go along.

And the best part… Acknowledgments!



                               Bio4j + MG7 + Chip-Seq results




                 Bio4j + GRG




The other guys from the basement… :)




                          (Brett)




                                    (Matias)
          (Andrew)


And of course the rest of the Lab !




That’s it !


                        Thanks for
                        your time ;)




www.ohnosequences.com                  www.bio4j.com

Más contenido relacionado

La actualidad más candente

Ontologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlinOntologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlinSimon Jupp
 
Introduction to the BioLink datamodel
Introduction to the BioLink datamodelIntroduction to the BioLink datamodel
Introduction to the BioLink datamodelChris Mungall
 
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015Mark Wilkinson
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...Mark Wilkinson
 
How SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico scienceHow SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico scienceMark Wilkinson
 
How to search_free_crystallography_databases_benedictine_university final 111...
How to search_free_crystallography_databases_benedictine_university final 111...How to search_free_crystallography_databases_benedictine_university final 111...
How to search_free_crystallography_databases_benedictine_university final 111...Benedictine University Library
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Prof. Wim Van Criekinge
 
Chado for evolutionary biology
Chado for evolutionary biologyChado for evolutionary biology
Chado for evolutionary biologyChris Mungall
 
Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)ALATechSource
 
Wilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiWilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiBOSC 2010
 
From OBO to OWL and back - building scalable ontologies
From OBO to OWL and back - building scalable ontologiesFrom OBO to OWL and back - building scalable ontologies
From OBO to OWL and back - building scalable ontologiesdosumis
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_uploadProf. Wim Van Criekinge
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Rothamsted Research, UK
 
Libraries and Linked Data: Looking to the Future (3)
Libraries and Linked Data: Looking to the Future (3)Libraries and Linked Data: Looking to the Future (3)
Libraries and Linked Data: Looking to the Future (3)ALATechSource
 
We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...
We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...
We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...Michel Dumontier
 

Graph DB + Bioinformatics: Bio4j, recent applications and future directions

  • 1. Graph DB + Bioinformatics: Bio4j, recent applications and future directions www.ohnosequences.com www.bio4j.com
  • 2. But who’s this guy talking here? I am currently working as a bioinformatics consultant/developer/researcher at Oh no sequences!, and I have been here at the Ohio State University as a Visiting Scholar for the last two months. Oh no what!? We are the R&D group at Era7 Bioinformatics. We like bioinformatics, cloud computing, NGS, category theory, bacterial genomics… well, lots of things. What about Era7 Bioinformatics? Era7 Bioinformatics is a bioinformatics company specialized in sequence analysis, knowledge management and sequencing data interpretation. Our area of expertise revolves around biological sequence analysis, particularly Next Generation Sequencing data management and analysis.
  • 3. We’re a small but quite peculiar company! (in the good sense, of course) Currently we have offices in: Madrid (Spain), Boston MA (USA) and Granada (Spain). Yeah, I know what you’re thinking, they are not precisely ugly cities…
  • 4. Our team is multidisciplinary: bioinformaticians, mathematicians, lab researchers, immunologists, biologists specialized in biochemistry and IT professionals. A team formed by people with different backgrounds is able to analyze the same problem from different points of view. We are based on research. In a fast-changing area, our activity depends on being able to offer cutting-edge solutions. This is only possible by maintaining continuous research and innovation activity. In addition, since many of our customers are researchers, being part of that community allows us to be really customer oriented.
  • 5. Everything we do is 100% open source! Yes, we hate patents. And no, we’re not crazy (or maybe just a bit…). OK, that’s really nice, but how can that actually work?? • Free marketing and dissemination • We can use other bioinformatics open source tools/DBs/etc. • Faster adaptation to a fast-changing field (bioinformatics, genomics) • You may not earn a lot of money, but you earn enough money doing many creative things
  • 6. Money? Where from?? • Providing services • Adapting services to different infrastructures and frameworks… OK, but you could probably get much more money with a different business model… Yeah, but this is our philosophy!
  • 7. We are also based on cloud computing. Cloud computing has revolutionized the world of computing because in this paradigm you get the infrastructure as a service (IaaS). We are experts in the use of the leader in this field: Amazon Web Services (AWS). So, what do we get? a) No investment in infrastructure: pay per use. b) Scalability: for example, we can launch just one virtual server for two hours, or more than one hundred for ten hours, depending on the amount of data to be analyzed in each project.
  • 8. What’s Bio4j? Bio4j is a graph-based bioinformatics DB including most of the data available in: Uniprot KB (SwissProt + Trembl), Gene Ontology (GO), UniRef (50, 90, 100), NCBI Taxonomy, RefSeq and Enzyme DB.
  • 9. What’s Bio4j? It provides a completely new and powerful framework for querying and managing protein-related information. Since it relies on a high-performance graph engine, data is stored in a way that semantically represents its own structure.
  • 10. What’s Bio4j? Bio4j uses Neo4j technology, a "high-performance graph engine with all the features of a mature and robust database". Thanks both to being based on Neo4j and to the API provided, Bio4j is also very scalable, allowing anyone to easily incorporate their own data and make the best of it.
  • 11. What’s Bio4j? Everything in Bio4j is open source! Released under AGPLv3.
  • 12. Bioinformatics DBs and Graphs: highly interconnected, overlapping knowledge spread throughout different DBs. (Section outline: Bioinformatics DBs and Graphs; Initial motivation; Bio4j structure; Some samples; Why Bio4j?; Bio4j and the Cloud.)
  • 13. However, in most cases all this data is modeled in relational databases, sometimes even just as plain CSV files. As the amount and diversity of data grows, domain models become crazily complicated!
  • 14. With a relational paradigm, the double implication Entity ⇔ Table does not go both ways. You get ‘auxiliary’ tables that have no relationship with the small piece of reality you are modeling. You need ‘artificial’ IDs only for connecting entities (and these are mixed with IDs that somehow live in reality). Entity-relationship models are cool, but in the end you always have to deal with ‘raw’ tables plus SQL. Integrating new knowledge into already existing databases is hard, and sometimes not even possible without changing the domain model.
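The point about auxiliary tables and artificial IDs can be made concrete with a small sketch. It is not taken from the talk: the table names, the accessions and the GO identifiers are illustrative assumptions, and SQLite merely stands in for any relational engine. The same protein-to-GO-term annotations are stored once relationally (two entity tables plus a join table with surrogate keys) and once as direct links.

```python
import sqlite3

# Hypothetical toy data: (protein accession, GO term id) pairs.
annotations = [
    ("P12345", "GO:0005737"),
    ("P12345", "GO:0006412"),
    ("Q67890", "GO:0005737"),
]

# Relational version: two entity tables plus an auxiliary join table.
# The integer "id" columns are artificial keys that exist only to connect rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE protein (id INTEGER PRIMARY KEY, accession TEXT UNIQUE);
    CREATE TABLE go_term (id INTEGER PRIMARY KEY, go_id TEXT UNIQUE);
    CREATE TABLE protein_go (
        id INTEGER PRIMARY KEY,
        protein_id INTEGER REFERENCES protein(id),
        go_term_id INTEGER REFERENCES go_term(id)
    );
""")
for accession, go_id in annotations:
    conn.execute("INSERT OR IGNORE INTO protein (accession) VALUES (?)", (accession,))
    conn.execute("INSERT OR IGNORE INTO go_term (go_id) VALUES (?)", (go_id,))
    conn.execute(
        "INSERT INTO protein_go (protein_id, go_term_id) "
        "SELECT p.id, g.id FROM protein p, go_term g "
        "WHERE p.accession = ? AND g.go_id = ?",
        (accession, go_id),
    )


def go_terms_relational(accession):
    """Two joins through the auxiliary table to reach the GO terms."""
    rows = conn.execute(
        "SELECT g.go_id FROM protein p "
        "JOIN protein_go pg ON pg.protein_id = p.id "
        "JOIN go_term g ON g.id = pg.go_term_id "
        "WHERE p.accession = ? ORDER BY g.go_id",
        (accession,),
    )
    return [row[0] for row in rows]


# Graph-like version: each protein points directly at its GO terms.
graph = {}
for accession, go_id in annotations:
    graph.setdefault(accession, []).append(go_id)


def go_terms_graph(accession):
    """Follow the direct links; no join table and no surrogate keys."""
    return sorted(graph.get(accession, []))
```

Both functions return the same answer for a given accession. The difference is that the relational version needs a third table and two joins that correspond to nothing in the domain itself.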
  • 15. Life in general and biology in particular are probably not 100% like a graph… but one thing’s sure, they are not a set of tables!
  • 16. NoSQL (not only SQL). NoSQ… what!?? Let’s see what Wikipedia says… “NoSQL is a broad class of database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally.”
  • 17. NoSQL data models
  • 18. Neo4j is a high-performance NoSQL graph database with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables. You get all the benefits of a fully transactional, enterprise-strength database. For many applications, Neo4j offers performance improvements on the order of 1000x or more compared to relational DBs.
  • 19. OK, but why start all this? Were you so bored…?! It all started with our need for massive access to protein GO (Gene Ontology) annotations. At that point I had to develop my own MySQL DB based on the official GO SQL database, and problems started from the beginning: I went crazy ‘deciphering’ how to extract Uniprot protein annotations from the official GO table schema; Uniprot and official GO protein annotations were not always consistent; populating my own DB took really long due to all the joins and subqueries needed to get and store the protein annotations. Soon enough we also needed massive access to basic protein information.
  • 20. These processes had to be automated for BG7, our bacterial genome annotation system (specifically designed for NGS data). The Uniprot web services available were too limited: - Slow - Limited number of queries - Too little information available. So I downloaded the whole Uniprot DB in XML format (Swiss-Prot + Trembl) and started to have some fun with it!
  • 21. BG7 algorithm: 1. Selection of the specific reference protein set 2. Prediction of possible genes by BLAST similarity 3. Gene definition: merging compatible similarity regions, detecting start and stop 4. Solving overlapped predicted genes 5. RNA prediction by BLAST similarity 6. Final annotation and complete deliverables. Quality control. www.era7bioinformatics.com
  • 22. We got used to having massive direct access to all this protein-related information… So why not add other resources that we needed quite often in most projects and that were now becoming a sort of bottleneck compared to those already included in Bio4j? Then came: - Isoform sequences - Protein interactions and features - Uniref 50, 90, and 100 - RefSeq - NCBI Taxonomy - Enzyme Expasy DB
  • 23. Let’s dig a bit into the Bio4j structure. Data sources and their relationships:
  • 24. Bio4j domain model
  • 25. The Graph DB model: representation. Core abstractions:
      - Nodes
      - Relationships between nodes
      - Properties on both
  • 26. Let's dig a bit into the Bio4j structure. How are things modeled? It couldn't be simpler!
      - Entities → Nodes
      - Associations / Relationships → Edges
  • 27. Some examples of nodes would be:
      - GO term
      - Protein
      - Genome Element
    and of relationships:
      - Protein -[PROTEIN_GO_ANNOTATION]-> GO term
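The node and relationship examples on this slide can be mimicked in a few lines of Python. This is only a toy in-memory sketch of the property graph idea (nodes, typed relationships, properties on both), not the Bio4j or Neo4j API. The class names, the `evidence` property and the GO term chosen here are assumptions made for illustration; only the `PROTEIN_GO_ANNOTATION` type, the `accession` and `full_name` properties and the P12345 protein come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: a type label plus free-form properties."""
    node_type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge between two nodes, with its own properties."""
    rel_type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


class Graph:
    """Hypothetical minimal container; real graph DBs index and persist all this."""

    def __init__(self):
        self.nodes = []
        self.relationships = []

    def add_node(self, node_type, **properties):
        node = Node(node_type, dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(rel_type, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node, rel_type):
        """End nodes reached from `node` through relationships of `rel_type`."""
        return [r.end for r in self.relationships
                if r.start is node and r.rel_type == rel_type]


def build_example():
    """Protein -[PROTEIN_GO_ANNOTATION]-> GO term, with illustrative data."""
    g = Graph()
    protein = g.add_node("Protein", accession="P12345",
                         full_name="Aspartate aminotransferase, mitochondrial")
    # The GO term and the 'evidence' property are illustrative assumptions.
    go_term = g.add_node("GO term", id="GO:0005739", name="mitochondrion")
    g.relate(protein, "PROTEIN_GO_ANNOTATION", go_term, evidence="illustrative")
    return g, protein, go_term
```

Calling `g.outgoing(protein, "PROTEIN_GO_ANNOTATION")` on the example graph walks from the protein to its annotated GO terms, which is the kind of hop that needed joins and subqueries in the relational setup described earlier.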
  • 28. We have developed Bio4jExplorer, a tool meant to serve both as a reference manual and as a first contact with the Bio4j domain model. Bio4jExplorer allows you to:
      - Navigate through all nodes and relationships
      - Access the javadocs of any node or relationship
      - Graphically explore the neighborhood of a node/relationship
      - Look up the indexes that may serve as an entry point for a node
      - Check incoming/outgoing relationships of a specific node
      - Check start/end nodes of a specific relationship
  • 29. Entry points and indexing. There are two kinds of entry points for the graph:
      - Auxiliary relationships going from the reference node, e.g.
          - CELLULAR_COMPONENT: leads to the root of the GO cellular component sub-ontology
          - MAIN_DATASET: leads to both main datasets, Swiss-Prot and Trembl
      - Node indexing. There are two types of node indexes:
          - Exact: only exact values are considered hits
          - Fulltext: regular expressions can be used
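To make the difference between the two index types concrete, here is a toy Python sketch. It is not how Bio4j or its underlying indexes are implemented; both classes are hypothetical stand-ins, and the fulltext lookup is modeled as a case-insensitive regular-expression search purely because the slide says regular expressions can be used.

```python
import re


class ExactIndex:
    """Toy exact index: a lookup hits only when the key matches exactly."""

    def __init__(self):
        self._entries = {}

    def add(self, key, node):
        self._entries.setdefault(key, []).append(node)

    def get(self, key):
        return list(self._entries.get(key, []))


class FulltextIndex:
    """Toy fulltext index: a lookup hits when the pattern matches the text."""

    def __init__(self):
        self._entries = []

    def add(self, text, node):
        self._entries.append((text, node))

    def query(self, pattern):
        # Assumption: case-insensitive regex search over the indexed text.
        regex = re.compile(pattern, re.IGNORECASE)
        return [node for text, node in self._entries if regex.search(text)]


if __name__ == "__main__":
    keyword_ids = ExactIndex()
    keyword_ids.add("KW-0181", "keyword node")
    print(keyword_ids.get("KW-0181"))   # exact key: hit
    print(keyword_ids.get("KW-018"))    # partial key: no hit

    protein_names = FulltextIndex()
    protein_names.add("Aspartate aminotransferase, mitochondrial", "P12345")
    print(protein_names.query(r"amino.*ase"))
```

An exact index suits identifiers such as accessions or keyword IDs, while a fulltext index is the natural entry point when only part of a name is known.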
  • 30. Querying Bio4j with Cypher
    Getting a keyword by its ID:
        START k=node:keyword_id_index(keyword_id_index = "KW-0181")
        return k.name, k.id
    Finding circuits/simple cycles of length 3 where at least one protein is from the Swiss-Prot dataset:
        START d=node:dataset_name_index(dataset_name_index = "Swiss-Prot")
        MATCH d <-[r:PROTEIN_DATASET]- p,
              circuit = (p) -[:PROTEIN_PROTEIN_INTERACTION]-> (p2) -[:PROTEIN_PROTEIN_INTERACTION]-> (p3) -[:PROTEIN_PROTEIN_INTERACTION]-> (p)
        return p.accession, p2.accession, p3.accession
    Check this blog post for more info, and our Bio4j Cypher cheat sheet.
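For readers who do not know Cypher, the logic of the cycle query above can be sketched in plain Python. This is an illustration of what the pattern asks for, not of how the database evaluates it. The function name, the edge-list input and the sample accessions are hypothetical. As a simplification, the sketch requires the three proteins to be distinct, which is the usual reading of "simple cycle".

```python
def find_3_cycles(interactions, swiss_prot):
    """Return (p, p2, p3) triples with p -> p2 -> p3 -> p and p in Swiss-Prot.

    interactions: iterable of (source, target) accession pairs, one per
                  directed PROTEIN_PROTEIN_INTERACTION relationship.
    swiss_prot:   set of accessions belonging to the Swiss-Prot dataset.
    """
    # Adjacency map: protein -> set of proteins it points to.
    successors = {}
    for source, target in interactions:
        successors.setdefault(source, set()).add(target)

    cycles = []
    for p in sorted(swiss_prot):                      # the START/dataset part
        for p2 in sorted(successors.get(p, ())):      # first hop
            for p3 in sorted(successors.get(p2, ())):  # second hop
                closes = p in successors.get(p3, ())   # third hop back to p
                if closes and len({p, p2, p3}) == 3:
                    cycles.append((p, p2, p3))
    return cycles


if __name__ == "__main__":
    # Hypothetical accessions, for illustration only.
    edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "A")]
    print(find_3_cycles(edges, {"A"}))
```

Like the query, the sketch reports one row per Swiss-Prot starting protein, so the same cycle can show up more than once when several of its members belong to Swiss-Prot.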
  • 31. Gremlin: a graph traversal language
    Get a protein by its accession number and return its full name:
        gremlin> g.idx('protein_accession_index')[['protein_accession_index':'P12345']].full_name
        ==> Aspartate aminotransferase, mitochondrial
    Get the proteins (accessions) associated with an InterPro motif (limited to 4 results):
        gremlin> g.idx('interpro_id_index')[['interpro_id_index':'IPR023306']].inE('PROTEIN_INTERPRO').outV.accession[0..3]
        ==> E2GK26
        ==> G3PMS4
        ==> G3Q865
        ==> G3PIL8
    Check our Bio4j Gremlin cheat sheet.
  • 32. Visualizations (1) → REST Server Data Browser. Navigate through Bio4j data in real time!
  • 33. Visualizations (2) → Bio4j + Gephi. Get really cool graph visualizations using Bio4j and the Gephi visualization and exploration platform.
  • 34. Visualizations (3) → Bio4j GO Tools
  • 35. Why would I use Bio4j?
      - Massive access to protein/genome/taxonomy… related information
      - Integration of your own DBs/resources around common information
      - Development of services tailored to your needs, built around Bio4j
      - Network analysis
      - Visualizations
      - And many other uses I cannot think of myself…
    If you have something in mind for which Bio4j might be useful, please let us know so we can all see how it could help you meet your needs! ;)
  • 36. Bio4j + Cloud (1). We use AWS (Amazon Web Services) everywhere we can around Bio4j, which gives us the following benefits:
      - Interoperability and data distribution: releases are available as public EBS Snapshots, so AWS users can create fully ready Bio4j DB volumes and attach them to their instances in just a few seconds.
      - CloudFormation templates:
          - Basic Bio4j DB Instance
          - Bio4j REST Server Instance
  • 37. Bio4j + Cloud (2). Backup and storage using S3 (Simple Storage Service). We use S3 both for backup (indirectly, through the EBS snapshots) and for storage (directly, storing RefSeq sequences as independent S3 files). What kind of benefits do we get from this?
      - Easy to use
      - Flexible
      - Cost-effective
      - Reliable
      - Scalable and high-performance
      - Secure
  • 38. Bio4j + Cloud (3). Web servers and service providers in the cloud. Deploying your own web server in AWS using Bio4j as a back-end is really simple. A good example of this would be Bio4jTestServer, a continuously developed server showcasing Web Services based on Bio4j.
  • 39. Community. Bio4j has a fast-growing internet presence:
      - Twitter: check @bio4j for updates
      - Blog: go to http://blog.bio4j.com
      - Mailing list: ask any question you may have on our list
      - LinkedIn: check the Bio4j group
      - Github issues: don't be shy! Open a new issue if you think something's going wrong.
  • 40. And the best part of all this is: you have the latest version of Bio4j already imported and fully working in EgStation! ;)
  • 41. Bio4j + MG7 for the integration and analysis of ChIP-seq data
  • 42. Bio4j + MG7 + 24 ChIP-seq samples. Some numbers:
      - 157 639 502 nodes
      - 742 615 705 relationships
      - 632 832 045 properties
      - 148 relationship types
      - 44 node types
    And it works just fine!
  • 44. What’s MG7? MG7 is a new system for massive analysis of sequences from metagenomics samples, specially designed for next generation sequencing technologies. MG7 uses cloud computing to solve the problem of massive data analysis, providing scalable, real-time, on-demand computing for metagenomics data analysis. MG7 can obtain annotation and functional profiles for shotgun genomic sequences, and taxonomic assignment for any type of read. The inference of function and the assignment of taxonomic origin for each sequence are based on massive BLAST similarity analysis.
  • 45. What’s MG7? MG7 lets you choose different parameters to set the thresholds for filtering the BLAST hits:
      i. E-value
      ii. Identity and query coverage
    It can export the results of the analysis to different data formats, such as:
      - XML
      - CSV
      - Gexf (Graph exchange XML format)
    It also provides heat maps and graph visualizations, plus a user-friendly interface (MG7Viewer) that gives access to the alignment responsible for each functional or taxonomic read assignment and displays the frequencies on the taxonomic tree.
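The threshold filtering mentioned on this slide can be pictured with a short Python sketch. It is not MG7's code: the `BlastHit` fields, the function name and the default threshold values are all assumptions chosen for illustration, and the sketch simply applies the E-value, identity and query coverage criteria together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """One BLAST hit; field names are hypothetical, not MG7's data model."""
    query_id: str
    subject_id: str
    evalue: float
    identity: float        # percent identity, 0-100
    query_coverage: float  # percent of the query covered, 0-100


def filter_hits(hits, max_evalue=1e-5, min_identity=0.0, min_query_coverage=0.0):
    """Keep the hits that pass every threshold (default values are assumptions)."""
    return [
        hit for hit in hits
        if hit.evalue <= max_evalue
        and hit.identity >= min_identity
        and hit.query_coverage >= min_query_coverage
    ]


if __name__ == "__main__":
    # Made-up hits, for illustration only.
    hits = [
        BlastHit("read1", "P12345", 1e-30, 98.0, 95.0),
        BlastHit("read2", "P12345", 1e-3, 99.0, 99.0),   # fails the E-value cut
        BlastHit("read3", "E2GK26", 1e-12, 60.0, 40.0),  # fails identity/coverage
    ]
    kept = filter_hits(hits, max_evalue=1e-5, min_identity=80.0,
                       min_query_coverage=70.0)
    print([hit.query_id for hit in kept])
```

Stricter thresholds trade sensitivity for confidence in each functional or taxonomic assignment, which is why it helps to have them exposed as parameters.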
  • 49. Bio4j + GRG: a completely new approach for modeling genomic information and gene regulatory networks
  • 50. Bio4j + GRG. Integrating genomic information from organisms such as:
      - Zea mays subsp. mays
      - Oryza sativa Japonica Group
      - Sorghum bicolor
      - Brachypodium distachyon
      - Arabidopsis thaliana Columbia
      - Arabidopsis lyrata lyrata MN47
  • 51. Bio4j + GRG domain model
  • 52. Bio4j + GRG
      - Get all the advantages of Bio4j and Graph DBs while modeling genomic data for grasses (although it could also be applied to other species/families).
      - Possibility of integrating data from other projects here at CAPS/EGLab in a common framework.
      - Data mining of data that is currently not accessible, or simply not structured well enough to explore, both for external genomic data from sites like Phytozome and for data coming directly from the experiments/analyses performed in the lab.
      - A common framework for accessing all this information together with other "universal" resources such as Uniprot, RefSeq or Gene Ontology.
  • 53. Bio4j + GRG
      - A chance for the Lab to enter the Cloud and Graph DB world, pioneering access to this sort of data for a whole range of possible users.
      - No more worrying about back-ups, maintaining infrastructure or things like that…
      - And, more importantly, scalability → being able to adapt to the specific needs of new projects as they go along.
  • 54. And the best part… Acknowledgments! Bio4j + MG7 + ChIP-seq results; Bio4j + GRG
  • 55. The other guys from the basement… :) (Brett) (Matias) (Andrew)
  • 56. And of course the rest of the Lab!
  • 57. That’s it! Thanks for your time ;)