Please turn off your mobiles or put
them on silent mode
Biological Relation Extraction Tools Using Biomedical
                          Ontologies and Text Mining
Agenda
 Introduction to Biomedical Text Mining
 System Overview
   Problem Description
   Motivation
   Challenges
 System Framework
 Application upon System Framework
   Swanson’s Algorithm
   Protein to Protein Interactions (PPI)
   Gene Clustering based on Text Mining
 Extended Work
 Conclusion and Future Work.
Introduction to Biomedical Text
Mining
 Text Mining = Process unstructured (textual)
  information, extract meaningful data, make the
  information contained in the text accessible to the
  various data mining (statistical and machine learning)
  algorithms.
 Biomedical Text Mining = Working on biomedical
 documents.
System Overview
 Problem Description
    A huge amount of information is stored in millions of
     documents
    This information can be used effectively to solve many
     problems
        Knowledge retrieval with little effort
        Discovering relationships between different entities
        Assessing relationship strength between different entities
        Grouping entities into different clusters
System Overview
 Motivation:
    Build a semantic structure of documents that
     facilitates navigation through thousands of
     documents.
    Extract relationships between biomedical terms using
     text mining techniques with the aid of biomedical
     ontologies.
    Use text mining to group genes into different clusters.
System Overview
 Challenges:
   Concept Recognition
   Build semantic structure of annotated documents using
    ontologies
   Relationship Recognition
   Similarity (distance) between different entities.
Overall System Components
 Framework
 Searching and Browsing
 Swanson’s Algorithm
 PPI
 Gene Clustering
Overall System Architecture
[Figure: layered architecture. PPI, Gene Clustering, Swanson’s Algorithm, and Searching & Browsing sit on top of the Framework.]
System Framework Agenda
 Objective
 Framework Concept Issues
 Framework Design Issues
 Framework Sequence Diagram
 Framework Database
 Framework GUI
 Comparison
 Framework Demo
System Framework
 Objective:
    Use ontologies to mark up biomedical text documents.
    Build a semantic representation of information from the
     semantic links established between documents and
     ontology concepts.
    Provide services to other applications and users.
System Framework
[Figure: the Framework is discussed under two headings, Concept Issues and Design Issues.]
Framework Concept Issues
[Figure: data-flow diagram. Recoverable labels: User Query, Query Expansion, Expanded Query, Search PubMed, PubMed, Fetching Documents, Documents, Gene Ontology, Extract GO terms, Annotate documents, Annotated Documents, Structure Representation of documents.]
System Framework
 PubMed:
    Largest document source in the biomedical field
    Contains over 18 million documents
    Maintained by the United States National Library
     of Medicine (NLM)
    Indexes all documents by MeSH terms to facilitate
     searching and retrieval
System Framework
 Gene Ontology:
   The Gene Ontology project is a major
    bioinformatics initiative with the aim of
    standardizing the representation of gene and gene
    product attributes across species and databases
   Includes a controlled vocabulary of terms for
    describing gene product characteristics.
   Consists of three main categories
       Cellular component
       Biological process
       Molecular function
System Framework
 MeSH database:
     Comprehensive controlled vocabulary for indexing journal articles and
        books in the life sciences; it can also serve as a thesaurus that facilitates searching
        [Wikipedia]
 MeSH main headings:
       Anatomy
       Organisms
       Diseases
       Chemicals and Drugs
       Analytical, Diagnostic and Therapeutic Techniques and Equipment
       Psychiatry and Psychology
       Phenomena and Processes
       Disciplines and Occupations
       Anthropology, Education, Sociology and Social Phenomena
       Technology, Industry, Agriculture
       Humanities
       Information Science
       Named Groups
       Health Care
       Publication Characteristics
       Geographical locations
System Framework
 Query Expansion (QE): the process of reformulating
  a seed query to improve retrieval performance in
  information retrieval operations [Wikipedia]
 How?
 Example
[Figure: two query expansion diagrams. The first is labelled “Query Expansion” and shows the term “Ocellus pigmentation”. The second shows “Pigmentation” surrounded by the terms “Pigment metabolic process”, “Pigment accumulation” and “Cellular pigmentation”.]
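One common way to expand a seed query with an ontology is to add the terms that sit below the seed term. The sketch below assumes that approach; the slides do not state the exact expansion rule. The term names come from the example above, while the parent/child links in `TOY_ONTOLOGY` and both function names are illustrative assumptions.

```python
# Toy ontology fragment: each term maps to its more specific child terms.
# Term names are from the slide; the links between them are assumed here.
TOY_ONTOLOGY = {
    "pigmentation": ["cellular pigmentation", "pigment accumulation"],
    "cellular pigmentation": [],
    "pigment accumulation": [],
}


def expand_query(seed, ontology):
    """Return the seed term followed by all of its descendants (breadth-first)."""
    expanded, queue = [], [seed.lower()]
    while queue:
        term = queue.pop(0)
        if term in expanded:
            continue  # already added through another parent
        expanded.append(term)
        queue.extend(ontology.get(term, []))
    return expanded


def to_boolean_query(terms):
    """Join the expanded terms into a single OR query of quoted phrases."""
    return " OR ".join(f'"{t}"' for t in terms)


print(to_boolean_query(expand_query("Pigmentation", TOY_ONTOLOGY)))
```

The expanded string can then be sent to the document source in place of the original query.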
System Framework
 Document Annotation
    Annotate documents with Gene Ontology terms, genes
     and proteins.
    Represent each document by a set of terms. (How?)
GO extractor
● GO’s vocabulary consists of 7,841 words. The majority of the GO words
occur only once in the whole ontology, and more than 90% occur no more
than 10 times. On the other hand, 51 of the GO words occur at least 100 times.
● Words with a very high frequency do not give much information, as they are
part of many labels in the ontology. Extracting a word with a low
frequency gives a much better hint about the mentioned concept (Zipf's law).
● By the nature of GO terms, the words at the end are very general,
e.g. “activity”, “transport”.
● Besides, many GO terms are substrings of their descendant GO terms.
● The algorithm is taken from GoPubMed (2008), “GoPubMed: Ontology-based
literature search for the life sciences”.
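The point about word frequency can be made concrete with a small information-content score: the fewer labels a word appears in, the higher its score. This is a minimal sketch; the labels in `LABELS` are a hypothetical mini-vocabulary, and `-log2` of the label fraction is one possible scoring choice, not necessarily the one used in the system.

```python
import math
from collections import Counter

# Hypothetical mini-vocabulary of GO-style labels, for illustration only.
LABELS = [
    "kinase activity",
    "protein kinase activity",
    "transporter activity",
    "ion transport",
    "protein transport",
]


def word_information(labels):
    """Score each word by -log2 of the fraction of labels that contain it."""
    counts = Counter(w for label in labels for w in set(label.split()))
    return {w: -math.log2(c / len(labels)) for w, c in counts.items()}


scores = word_information(LABELS)
for word, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{word:12s} {score:.2f}")
```

In this toy vocabulary the frequent word "activity" receives the lowest score and words that occur in a single label receive the highest.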
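GO extractor algorithm
[Flowchart; branch labels are given as far as they can be recovered from the slide.]
 Get the last word.
 Compare it with the root.
 Did the same word occur at any sibling?
    Y: get the next word, do BFS and consider each one as a root.
    N: do BFS and take each one as a root.
 Reaches a leaf?
    Y: get the next word.
    N: set the main root as the root.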
GO Extractor
Example:
Abstract: “... and it is affected by the Kinase activity”.
● Start from the last word of the paragraph, “activity”.
● Starting from the root of the GO tree, search for a GO term ending with
“activity”.
● When we reach it, fetch the next word and start from the new root.
● Now we look in the subtree for a GO term ending with “Kinase activity”.
● When the search reaches a leaf, we have found a GO term. Restart
by taking the next word and searching from the root.
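The backward, last-word-first matching in this example can be sketched with a trie that indexes labels from their last word to their first. This is a simplified exact-match illustration, not the GoPubMed algorithm itself: it ignores gaps within matches and word information content, and the labels, `END` marker and function names are assumptions made for the sketch.

```python
END = None  # trie key marking a complete label; can never equal a text word


def build_suffix_trie(labels):
    """Index labels word by word, from the LAST word to the first."""
    root = {}
    for label in labels:
        node = root
        for word in reversed(label.lower().split()):
            node = node.setdefault(word, {})
        node[END] = label
    return root


def extract_terms(text, trie):
    """Scan the text from its last word backwards and return matched labels."""
    words = [w.strip('.,;:()"').lower() for w in text.split()]
    found, i = [], len(words) - 1
    while i >= 0:
        node, j, best, best_j = trie, i, None, i
        # Walk left through the text while the trie has a matching branch.
        while j >= 0 and words[j] in node:
            node = node[words[j]]
            j -= 1
            if END in node:
                best, best_j = node[END], j  # longest complete label so far
        if best is not None:
            found.append(best)
            i = best_j  # continue to the left of the match
        else:
            i -= 1
    return found


# Hypothetical labels; a real run would load them from the Gene Ontology.
TRIE = build_suffix_trie(["kinase activity", "protein kinase activity", "transport"])
print(extract_terms("... and it is affected by the Kinase activity.", TRIE))
```

Because the longest complete label wins, "protein kinase activity" is preferred over "kinase activity" whenever the text contains the longer phrase.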
Framework Design Issues
 The top-level architecture of the system can be divided into:
     Data Handling Components
     Information Handling Components
         Information Extraction
         Information Representation
         Information Retrieval
Class Diagram
System Framework
 Framework main components:
   Document Sources
   Extractor
   Document Annotators
   Ontology Manager
   System Engine
   Database Manager
   Cache Manager
   Document
System Framework
 Document Sources
    Fetch single documents or collections of documents from
     remote stores.
 Extractor
    Implements information extraction algorithms to extract
     ontology terms from the documents.
 Document Annotators
    Establish semantic links between documents and ontology
     concepts.
    For example, linking documents with their GO terms, MeSH
     terms, etc.
System Framework
 Ontology Manager
    Provides an interface around the ontologies
    Composed of sub-managers to merge ontologies such as
     the Gene Ontology
 System Engine
    Main component of the system.
    Responsible for maintaining all operations and
     communications between the various components of the
     system
System Framework
 Database Manager
    Implemented as a pool object (connection pool)
    Handles and maintains queries to the database, such as
     inserting, updating and deleting documents
 Cache Manager
    Implemented as the client side of Memcached (an open source
     caching project).
    Handles operations on the system cache
Framework Sequence Diagram
Framework Database
Framework GUI
 GUI Goals
    User friendly
    Consistency
    Model-View-Controller (MVC)
    Human-Computer Interaction concepts
    Usability
    Specific Application services satisfaction
    Standard Data Exchange
    Internationalization
 Ontology used
    Our system: full Gene Ontology
    Textpresso: only 30 categories
    XplorMed: top hierarchy of the MeSH ontology
    Vivisimo: derives an ontology from the search results
 Output
    Our system: uses the deep ontology to navigate through a large result set in a non-sequential order
    Textpresso: returns a list of relevant abstracts
    XplorMed: for each MeSH category, there is an associated list
    Vivisimo: returns a list of relevant abstracts
IBN-SINA vs. Others
 Works on
    IBN-SINA: all PubMed abstracts
    Textpresso: designed for full papers, which are not available most of the time
    XplorMed: all PubMed abstracts
    Vivisimo: all PubMed abstracts
 Term extraction
    IBN-SINA: allows gaps within matches and considers the information content of the words, which leads to more refined term extraction
    Textpresso: tries to find the category terms directly in the text, only allowing for some variations in lower/uppercase letters and plural forms
    XplorMed: extracts terms based on term frequency in the collected documents
    Vivisimo: extracts terms based on term frequency in the collected documents
Our System Vs. GoPubMed
Framework Demo
Swanson Algorithm (1986)
Swanson’s method is a way of finding indirect relations between
objects.
[Figure: objects A and B, each shown with its related terms (A1, A2 and B1, B2).]
 1986: “Undiscovered public knowledge”
Cosine Similarity
Cosine similarity is a measure of similarity between two vectors of n
dimensions, obtained by finding the cosine of the angle between them; it is
often used to compare documents in text mining [Wikipedia].

Terms related to the first term (A's related terms):   A B C D E F G H
Terms related to the second term (B's related terms):  A X Y B Z D E F

Term vectors over the combined vocabulary:

Term                A  B  C  D  E  F  G  H  X  Y  Z
A's related terms   1  1  1  1  1  1  1  1  0  0  0
B's related terms   1  1  0  1  1  1  0  0  1  1  1
Cosine Similarity (Cont.)
Finally, applying the cosine similarity function to the two term vectors:

Similarity = (1+1+0+1+1+1+0+0+0+0+0) / (√8 × √8) = 5/8 = 0.625
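The worked example can be reproduced in a few lines. This is a minimal sketch that treats each list of related terms as a binary vector; the function and variable names are illustrative.

```python
import math


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity of two term sets treated as binary (0/1) vectors."""
    a, b = set(terms_a), set(terms_b)
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the size of the overlap, and the
    # product of the two norms is sqrt(|a| * |b|).
    return len(a & b) / math.sqrt(len(a) * len(b))


related_to_first = list("ABCDEFGH")   # A's related terms from the slide
related_to_second = list("AXYBZDEF")  # B's related terms from the slide
print(cosine_similarity(related_to_first, related_to_second))  # 0.625
```

The five shared terms (A, B, D, E, F) give the numerator, and both lists contain eight terms, so the denominator is 8.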
Swanson example
                                            Relation between P53 and P51




 1986: “Fish oil, Raynaud’s syndrome, and
 undiscovered public knowledge”
PPI Agenda
   Problem Description
   Motivation
   PPI System Overview
   PPI System Main Components
       Dependency Parse Tree
       Similarity Metrics
       K-Nearest Neighbor Classifier
   Evaluation of PPI
       Evaluation Metrics
   Results and Comparison
Problem Description
 Due to the ever-growing number of publications about
 protein-protein interactions, information extraction from
 text is increasingly recognized as one of the crucial
 technologies in bioinformatics.

 Reference:
 Gunes Erkan, Arzucan Ozgur, Dragomir R. Radev. Semi-Supervised Classification
 for Extracting Protein Interaction Sentences using Dependency Parsing.
 Proceedings of the 2007 Joint Conference on Empirical Methods in Natural
 Language Processing and Computational Natural Language Learning, pp. 228–237,
 Prague, June 2007.
Motivation
    Interactions between proteins are important for
     many, if not all, biological functions.
    The function of a protein can be characterized more
     precisely through knowledge of PPI.
    Information about these interactions improves our
     understanding of diseases and can provide the basis
     for new therapeutic approaches.
    Validate experimental results and test benches.
System Overview
    We worked at the sentence level (Why?)
        It increases the semantic information captured from the sentence.
        Analysing the structure of the sentence increases the knowledge
         obtained from it.
        A specific relation between proteins can be deduced from
         it.
System Overview
   Our approach depends on:
      The shortest path between the entities in dependency
      tree of a sentence usually captures the necessary
      information to identify their relationship.
Dependency Parse Tree
• Unlike a syntactic parse, it captures the semantic
 predicate-argument relationships among its words.
• The Stanford Parser API is used for the natural language
 processing task.
• The shortest path is found using Breadth-First Search
 (BFS): every edge has equal weight, so the
 shortest path is discovered first.
Dependency Parse Tree (Example)
 The dependency tree of the sentence “The results demonstrated
  that KaiC interacts rhythmically with KaiA, KaiB, and SasA.”
Example (Cont.)
• Then, we select the shortest paths between the
 protein pairs:
  • KaiC - nsubj - interacts - prep_with - SasA
  • KaiC - nsubj - interacts - prep_with - SasA - conj_and - KaiA
  • KaiC - nsubj - interacts - prep_with - SasA - conj_and - KaiB
  • SasA - conj_and - KaiA
  • SasA - conj_and - KaiB
  • KaiA - conj_and - SasA - conj_and - KaiB
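The shortest-path step can be sketched with a plain breadth-first search over the dependency edges. The edges below are written by hand from the paths listed above rather than produced by a parser, relation labels are spelled with underscores, and the function names are illustrative.

```python
from collections import deque

# Dependency edges of the example sentence as (head, relation, dependent),
# transcribed by hand from the listed paths; no parser is called here.
EDGES = [
    ("interacts", "nsubj", "KaiC"),
    ("interacts", "prep_with", "SasA"),
    ("SasA", "conj_and", "KaiA"),
    ("SasA", "conj_and", "KaiB"),
]


def build_graph(edges):
    """Undirected adjacency list: node -> [(neighbour, relation), ...]."""
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns alternating words and relation labels."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour, rel in graph.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [rel, neighbour])
    return None  # the two words are not connected


GRAPH = build_graph(EDGES)
print(" - ".join(shortest_path(GRAPH, "KaiC", "KaiA")))
```

Since every edge counts as one step, the first path that reaches the goal is a shortest one.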
Example (Cont.)
• Then, we rename the proteins in the pair as PROTX1
 and PROTX2, and all the other proteins in the sentence
 as PROTX0:
  • PROTX1 - nsubj - interacts - prep_with - PROTX2
  • PROTX1 - nsubj - interacts - prep_with - PROTX0 - conj_and - PROTX2
  • PROTX1 - nsubj - interacts - prep_with - PROTX0 - conj_and - PROTX2
  • PROTX1 - conj_and - PROTX2
  • PROTX1 - conj_and - PROTX2
  • PROTX1 - conj_and - PROTX0 - conj_and - PROTX2
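The renaming step is a simple token substitution over a path. A minimal sketch follows, assuming the path is available as a list of tokens and the set of protein names in the sentence is known; `anonymize_path` is an illustrative name.

```python
def anonymize_path(path, pair, proteins):
    """Rename the pair's proteins to PROTX1/PROTX2 and other proteins to PROTX0."""
    first, second = pair
    renamed = []
    for token in path:
        if token == first:
            renamed.append("PROTX1")
        elif token == second:
            renamed.append("PROTX2")
        elif token in proteins:
            renamed.append("PROTX0")
        else:
            renamed.append(token)  # relation labels and other words are kept
    return renamed


PROTEINS = {"KaiA", "KaiB", "KaiC", "SasA"}
path = ["KaiC", "nsubj", "interacts", "prep_with", "SasA", "conj_and", "KaiA"]
print(" - ".join(anonymize_path(path, ("KaiC", "KaiA"), PROTEINS)))
```

Renaming makes paths from different sentences comparable, because the similarity metric then sees the structure of the path and not the specific protein names.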
Similarity Metrics
   The main idea of using similarity metrics is to
    find a function that maps input patterns into a
    target space such that a simple distance in the
    target space approximates the “semantic”
    distance in the input space.
Similarity Metrics
    We implemented the Levenshtein distance (edit
     distance).
        The number of insertions, deletions and substitutions
         needed to transform one string into another.
    We also used an open source library called
     “SimMetrics”, a Java library of 23 string similarity
     metrics.
     • Developed at the University of Sheffield (Chapman,
         2004)
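The standard Levenshtein distance counts single-element insertions, deletions and substitutions, and can be computed with dynamic programming. The sketch below works on any sequence, so it can compare character strings or lists of path tokens; whether the system compared characters or tokens is not stated in the slides.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(target) + 1))  # distances from an empty prefix
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,             # delete s
                current[j - 1] + 1,          # insert t
                previous[j - 1] + (s != t),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3

short_path = ["PROTX1", "nsubj", "interacts", "prep_with", "PROTX2"]
long_path = ["PROTX1", "nsubj", "interacts", "prep_with", "PROTX0", "conj_and", "PROTX2"]
print(levenshtein(short_path, long_path))  # 2
```

Only two rows of the table are kept at a time, so memory grows with the length of the target and not with the product of both lengths.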
Similarity Metrics
• We used only 10 string similarities from SimMetrics.
  • Cosine Similarity
  • Block Distance
  • Dice Similarity
  • Euclidean Distance
  • Jaccard Similarity
  • Jaro Similarity
  • Jaro Winkler Similarity
  • Matching Coecient
  • Monge Elkan Similarity
K-Nearest Neighbor Classifier
• k-nearest neighbor: assign a label according to the
 majority label of the k nearest training
 patterns.
KNN Example
• If k = 3, it is classified as
  a triangle
• If k = 5, it is classified as a
  square
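The majority-vote rule can be sketched as follows. The figure behind this example is not recoverable from the slides, so the points in `TRAINING` are hypothetical and were chosen only so that the answer changes between k = 3 and k = 5. The distance function is a parameter, which is how a string or path similarity metric could be plugged in.

```python
from collections import Counter


def knn_classify(query, training, k, distance):
    """Label `query` by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda ex: distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Hypothetical 2-D training points (not taken from the slide's figure).
TRAINING = [
    ((1.0, 0.0), "triangle"),
    ((0.0, 1.2), "triangle"),
    ((1.5, 0.0), "square"),
    ((0.0, 2.0), "square"),
    ((2.2, 0.0), "square"),
    ((5.0, 5.0), "triangle"),
]

print(knn_classify((0.0, 0.0), TRAINING, 3, euclidean))  # triangle
print(knn_classify((0.0, 0.0), TRAINING, 5, euclidean))  # square
```

With k = 3 the two closest triangles outvote one square; with k = 5 three squares outvote the two triangles.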
KNN Strengths and Weaknesses
• Strengths:
  • Simple to implement and use
  • Comprehensible – easy to explain prediction
  • Robust to noisy data by averaging k-nearest neighbors
• Weaknesses:
  • Need a lot of space to store all examples.
  • Takes more time to classify a new example than with a
    model (need to calculate and compare distance from new
    example to all other examples).
Evaluation of PPI
• We used five different datasets:
  • BioInfer dataset.
  • AIMed dataset.
  • LLL dataset.
  • IEPA dataset.
  • HPRD50 dataset.
• We used a KNN classifier, varying K and the similarity
 metric as parameters.
Confusion Matrix
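Evaluation Metrics
• Precision: TP / (TP + FP)
• Recall: TP / (TP + FN)
• F-measure: 2 × Precision × Recall / (Precision + Recall)
(TP, FP and FN are the true positive, false positive and false negative counts from the confusion matrix.)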
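The three metrics follow directly from the confusion matrix counts. The sketch below uses the standard definitions and assumes the balanced F-measure (F1), since the slides do not give a weighting; the counts in the usage lines are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and balanced F-measure from confusion matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 30 true positives, 10 false positives, 20 false negatives.
p, r, f = precision_recall_f1(tp=30, fp=10, fn=20)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

The guards return 0.0 when a denominator would be zero, for example when the classifier predicts no positive pairs at all.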
Results
Results and Comparison
Dataset    Min. Result   Max. Result
BioInfer   32            56.9
AIMed      5             48.9
LLL        48.8          73
IEPA       36.6          72
HPRD50     12.9          63.49
Our PPI System Vs. Graph Kernel Approach
Dataset    Our System (%)   Graph Kernel Approach (%)
BioInfer   56.9             52.9
AIMed      48.9             56.4
LLL        73               76.8
IEPA       72               75.1
HPRD50     67               63.4
Motivation
 Goal:
    Group genes according to some features.
 Challenges:
    Large number of genes.
    The complexity of biological networks.
 The solution is: Gene Clustering
Gene Clustering Techniques
 Based on Gene Expression:
    Advantages:
        High accuracy.
    Disadvantages:
        High cost.
        Time consuming.
        Noise.
 Based on Text Mining:
    Advantages:
        Low cost.
        Less time consuming.
    Disadvantages:
        Low accuracy.
Gene Clustering Based on Text Mining
 To perform gene clustering we need:
    Clustering algorithms.
    Similarity measurements.
Clustering Algorithms
 Hierarchical algorithms.
 Partitioning algorithms.
 Density-based algorithms.
Hierarchical Algorithms
 Single Linkage
Partitioning Algorithms
 K-Medoids
Density-Based Algorithms
 DBSCAN
Graph-Theoretic Algorithms
 Zahn Algorithm
Similarity Measurements
 Swanson Algorithm.
 Document Occurrences.
Swanson Algorithm
 Search PubMed for gene A and extract set A (the
  most related keywords: MeSH or GO terms).
 Search PubMed for gene B and extract set B (the most
  related keywords: MeSH or GO terms).
 Based on the intersection between set A and set B, we
  apply the cosine similarity.
Document Occurrences
 Search PubMed for gene A and extract set A
  (document IDs of gene A).
 Search PubMed for gene B and extract set B
  (document IDs of gene B).
 Based on the intersection between set A and set B, we
  apply the Jaccard similarity coefficient.
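The Jaccard similarity coefficient is the size of the intersection divided by the size of the union. A minimal sketch follows; the document IDs are hypothetical and no PubMed search is performed.

```python
def jaccard(ids_a, ids_b):
    """Jaccard similarity coefficient of two collections of document IDs."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical document IDs returned for two genes.
docs_gene_a = {101, 102, 103, 104}
docs_gene_b = {103, 104, 105}
print(jaccard(docs_gene_a, docs_gene_b))  # 2 shared / 5 distinct = 0.4
```

A clustering algorithm that expects distances rather than similarities can use `1 - jaccard(a, b)`.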
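Extended Work: PPI System with
SVM Classifier (1)
 Equation:
      $u = w \cdot x - b$
 Objective:
      $\min \tfrac{1}{2} \|w\|^2$
      subject to $y_i (w \cdot x_i - b) \ge 1, \ \forall i$
Extended Work: PPI System with
SVM Classifier (2)
 $\min_{\alpha} \Psi(\alpha) = \min_{\alpha} \tfrac{1}{2} \sum_i \sum_j y_i y_j (x_i \cdot x_j) \alpha_i \alpha_j - \sum_i \alpha_i$
     The $\alpha_i$ are called (Lagrange) multipliers; once $\alpha$ is known, $(w, b)$ can be obtained:
       $w = \sum_i y_i \alpha_i x_i, \quad b = w \cdot x_k - y_k$ for some $\alpha_k > 0$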
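The last step, recovering (w, b) from the multipliers, can be checked on a toy problem. The sketch below does not solve the optimisation; it takes a dual solution worked out by hand for two one-point classes at (1, 0) and (-1, 0), for which α₁ = α₂ = 0.5 minimises Ψ. All names and data are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recover_hyperplane(alphas, xs, ys):
    """w = sum_i y_i * alpha_i * x_i and b = w . x_k - y_k for some alpha_k > 0."""
    dim = len(xs[0])
    w = [sum(a * y * x[d] for a, x, y in zip(alphas, xs, ys)) for d in range(dim)]
    k = next(i for i, a in enumerate(alphas) if a > 0)  # any support vector
    b = dot(w, xs[k]) - ys[k]
    return w, b


def decision(w, b, x):
    """u = w . x - b; the sign of u is the predicted class."""
    return dot(w, x) - b


# Toy data: one positive and one negative example.
XS = [(1.0, 0.0), (-1.0, 0.0)]
YS = [1, -1]
ALPHAS = [0.5, 0.5]  # hand-derived dual solution for this toy problem

w, b = recover_hyperplane(ALPHAS, XS, YS)
print(w, b)                         # [1.0, 0.0] 0.0
print(decision(w, b, (3.0, 1.0)))   # 3.0, so the positive class
```

Both training points satisfy y_i (w · x_i - b) = 1, so they lie exactly on the margin, as expected for support vectors.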
Conclusion
 Problem 1: Algorithms for concept recognition in
  document abstracts and titles
     We introduced an algorithm to annotate the Gene Ontology
      terms in the documents.
 Problem 2: Use the annotated documents to build a
  structured representation of documents
     We showed how the framework uses the Gene Ontology to build a
      semantic representation of the obtained documents
 Problem 3: Design a system for ontology-based search
  engines for biological researchers
     We presented the design of the framework and how it is flexible
      for future modifications and scalable with respect to the number
      of documents and the number of users.
Conclusion
 Problem 4: Using Swanson’s algorithm to assess the similarity between
  different biological terms
     We showed how Swanson's algorithm can be used to estimate the
       similarity between two instances (P53 and P21)
 Problem 5: Supervised machine learning algorithms for prediction of
  protein-protein interactions
     We showed how we used supervised machine learning algorithms such
       as KNN, together with a new technique for estimating the distance between
       sentences, to predict the possible interactions between proteins mentioned in
       the documents.
 Problem 6: Unsupervised machine learning algorithms to identify
  different clusters of genes
     We showed how we used unsupervised machine learning algorithms
       such as DBSCAN, with similarity based on Swanson's algorithm and
       cosine similarity, to group genes mentioned in the documents into
       different clusters.
Future work
 There are hot research areas and open problems
 in biological text mining
    The content provider for documents
        Google Scholar
        Using Semantic Web 3.0 (online journals)
    Ontology generation
        Ability to edit the ontologies and add knowledge
    Other ontologies
        Using Wikipedia as an ontology
Future work
 Some features may be added to the
 system
    Biomedical ontology-based search engine
      Provide a document summary for each group of documents
      Allow the user to save and print the results obtained by the system.
    Protein-Protein Interaction (PPI)
      Use more sophisticated classifiers and machine learning techniques
       such as AdaBoost to enhance the classification process.
      Use background knowledge of verbs, as many verbs carry the
       same meaning.
      This will help the system produce more accurate results, as we can
       introduce a fuzzy distance between the meanings
       of verbs. It will also make it possible to discover the type of
       relation between terms, making relation identification
       more semantic.
Future work
• Some features may be added to
  the system
    Gene Clustering
      Use more sophisticated clustering algorithms that were originally
       designed for gene clustering.
 More applications:
    Based on the services provided by the ontology-based
     engine, we can construct applications such as
     extracting the relations between drugs and diseases,
     grouping diseases into clusters, which helps
     to identify the characteristics of a newly discovered disease,
     and other applications that rely on text mining of
     biomedical documents.
Ibn Sina
Ibn Sina

  • 1. Please turn off your mobiles or put them on silence mode
  • 2. Biological Relation Extraction Tools Using Biomedical Ontologies and Text Mining
  • 3. Agenda  Introduction to Biomedical Text Mining  System Overview  Problem Description  Motivation  Challenges  System Framework  Application upon System Framework  Swanson’s Algorithm  Protein to Protein Interactions (PPI)  Gene Clustering based on Text Mining  Extended Work  Conclusion and Future Work.
  • 4. Agenda  Introduction to Biomedical Text Mining  System Overview  Problem Description  Motivation  Challenges  System Framework  Application upon System Framework  Swanson’s Algorithm  Protein to Protein Interactions (PPI)  Gene Clustering based on Text Mining  Extended Work  Conclusion and Future Work.
  • 5. Introduction to Biomedical Text Mining  Text Mining = Process unstructured (textual) information, extract meaningful data, make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms.  Biomedical Text Mining = Working on biomedical documents.
  • 6. Agenda  Introduction to Biomedical Text Mining  System Overview  Problem Description  Motivation  Challenges  System Framework  Application upon System Framework  Swanson’s Algorithm  Protein to Protein Interactions (PPI)  Gene Clustering based on Text Mining  Extended Work  Conclusion and Future Work.
  • 7. System Overview  Problem Description  Huge amount of information stored in million of documents  These information can be used effectively to solve many problems  Knowledge retrieval with no much effort  Discover relationship between different entities  Assessing relationship strength between different entities  Group entities into different clusters
  • 8. System Overview  Motivation:  Build semantic structure of documents which facilitates navigation through thousands of documents.  Extract relationships between biomedical terms using text mining techniques with aid of biomedical ontologies.  Using text mining to group genes into different clusters.
  • 9. System Overview  Challenges:  Concept Recognition  Build semantic structure of annotated documents using ontologies  Relationship Recognition  Similarity (distance) between different entities.
  • 10. Overall System Components  Framework  Searching and Browsing  Swanson’s Algorithm  PPI  Gene Clustering
  • 11. Overall System Architecture Searching Gene Swanson’s PPI & Clustering Algorithm Browsing Framework
  • 12. Agenda  Introduction to Biomedical Text Mining  System Overview  Problem Description  Motivation  Challenges  System Framework  Application upon System Framework  Swanson’s Algorithm  Protein to Protein Interactions (PPI)  Gene Clustering based on Text Mining  Extended Work  Conclusion and Future Work.
  • 13. System Framework Agenda  Objective  Framework Concept Issues  Framework Design Issues  Framework Sequence Diagram  Framework Database  Framework GUI  Framework Demo
  • 14. System Framework Agenda  Objective  Framework Concept Issues  Framework Design Issues  Framework Sequence Diagram  Framework Database  Framework GUI  Comparison  Framework Demo
  • 15. System Framework  Objective:  Use ontologies to markup biomedical text documents.  Based on established semantic links between documents and ontology concepts, the goal is build semantic representation of information.  Provide services to other applications and users.
  • 16. System Framework Framework Concept Issues Design Issues
  • 17. System Framework Agenda  Objective  Framework Concept Issues  Framework Design Issues  Framework Sequence Diagram  Framework Database  Framework GUI  Comparison  Framework Demo
  • 18. Framework Concept Issues User Expanded Query Query Expansion Query Fetching Documents Search PubMed Gene Documents Ontology Extract GO terms Annotate PubMed documents Structure Representation of documents Annotated Documents
  • 19. System Framework  PubMed:  Largest documents source in the biomedical field  Contains over 18 million documents  Maintained by the United States National Library of Medicine (NLM)  Indexes all documents by MeSH terms to facilitate searching and retrieval
  • 20. System Framework  Gene Ontology:  The Gene Ontology project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases  Includes a controlled vocabulary of terms for describing gene product characteristics.  Consists of three main categories  Cellular component  Biological process  Molecular function
  • 21. System Framework  MeSH database:  Comprehensive controlled vocabulary for the purpose of indexing journal articles and books in the life sciences; it can also serve as a thesaurus that facilitates searching [Wikipedia]  MeSH main heading:  Anatomy  Organisms  Diseases  Chemicals and Drugs  Analytical, Diagnostic and Therapeutic Techniques and Equipment  Psychiatry and Psychology  Phenomena and Processes  Disciplines and Occupations  Anthropology, Education, Sociology and Social Phenomena  Technology, Industry, Agriculture  Humanities  Information Science  Named Groups  Health Care  Publication Characteristics  Geographical liocations
  • 22. System Framework  Query Expansion (QE):is the process of reformulating a seed query to improve retrieval performance in information retrieval operations [Wikipedia]  How ?  Example
  • 23. Query Expansion Ocellus pigmentation Example Pigment Pigment metabolic Pigmentation accumulation process Cellular pigmentation
  • 24. System Framework  Documents Annotating  Annotate documents with Gene Ontology Terms, Genes and proteins.  Represent each documents by set of terms. (How ?)
  • 25. GO extractor ●GO’s vocabulary consists of 7,841 words. The majority of the GO words found occur only once in the whole ontology. On the other hand 51 of the GO words occur at least 100 times in the ontology. More than 90%, do not occur more than 10 times. ●words with a very high frequency do not give much information as they are part of many labels in the ontology. However, extracting a word with a low frequency gives a much better hint about a mentioned concept. (Zipf's law). ●From the nature of GO-terms, the words in the end are very general ex.(activity , transport). ●Besides, many GO-terms are substring of descending GO-terms. ●The algorithm is taken from GOPubMed (2008) “GoPubMed: Ontology-based literature search for the life sciences”.
  • 26. GO extractor algorithm (flowchart; box labels only): “Get last word”; “Set main root as a root”; “Compare with root” (Y/N); “Do BFS and take each one as a root”; “The same word occurred at any sibling” (Y/N); “Reaches leaf” (Y/N); “Get next word”; “Get next word & do BFS and consider each one as a root”
  • 27. GO Extractor Example: Abstract: “............................................and it is affected by the Kinase activity”. ● Start from the last word of the paragraph, “activity”. ● Starting from the root of the GO tree, search for a GO term ending with “activity”. ● When we reach it, fetch the next word and start from the new root. ● Now we look in the subtree for a GO term ending with “Kinase activity”. ● If the search reaches a leaf, we have found a GO term. Restart by taking the next word and searching from the root again.
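The backward, last-word-first matching in the example above can be sketched roughly as follows. This is a simplified illustration, not the GoPubMed algorithm itself: instead of traversing the GO tree with BFS, it indexes term labels in a reversed-word trie, and the label list is a small hypothetical sample rather than the real ontology.

```python
def build_suffix_trie(labels):
    """Index labels by their words in reverse order (last word first)."""
    trie = {}
    for label in labels:
        node = trie
        for word in reversed(label.lower().split()):
            node = node.setdefault(word, {})
        node[None] = label  # the None key marks a complete label
    return trie


def extract_terms(text, trie):
    """Scan the text from its last word backward, keeping the longest label match."""
    words = [w.strip(".,;:()\"'").lower() for w in text.split()]
    found = []
    i = len(words) - 1
    while i >= 0:
        node, j = trie, i
        best, best_start = None, i
        while j >= 0 and words[j] in node:
            node = node[words[j]]
            if None in node:
                best, best_start = node[None], j
            j -= 1
        if best is not None:
            found.append(best)
            i = best_start - 1  # continue before the matched term
        else:
            i -= 1
    return found


# Hypothetical sample of GO-style labels (assumption, for illustration only).
SAMPLE_LABELS = ["kinase activity", "catalytic activity", "transport"]
trie = build_suffix_trie(SAMPLE_LABELS)
print(extract_terms("and it is affected by the Kinase activity.", trie))
```

Matching from the last word first mirrors the observation that the final words of GO terms are very general, so the rarer preceding words are what narrow the match down.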
  • 28. System Framework Agenda  Objective  Framework Concept Issues  Framework Design Issues  Framework Sequence Diagram  Framework Database  Framework GUI  Comparison  Framework Demo
  • 29. Framework Design Issues  Top Level Architecture of the System can be divided into:-  Data Handling Components  Information Handling Components  Information Extraction  Information Representation  Information Retrieval
  • 31. System Framework  Framework main components:  Document Sources  Extractor  Document Annotators  Ontology Manager  System Engine  Database Manager  Cache Manager  Document
  • 32. System Framework  Document Sources  Fetch single documents or collections of documents from remote stores.  Extractor  Implements information extraction algorithms to extract ontology terms from the documents.  Document Annotators  Establish semantic links between documents and ontology concepts.  For example, linking documents with their GO terms, MeSH terms, etc.
  • 33. System Framework  Ontology Manager  Provides an interface around the ontologies.  Composed of sub-managers to merge ontologies such as Gene Ontology.  System Engine  Main component of the system.  Responsible for maintaining all the operations and communications between the various components of the system.
  • 34. System Framework  Database Manager  Implemented as a pool object (connection pool).  Handles and maintains queries to the database, such as inserting, updating and deleting documents.  Cache Manager  Implemented as the client side of MemCached (open source caching project).  Handles operations on the system cache.
  • 40. Framework GUI  GUI Goals  User friendly  Consistency  Model-View-Controller (MVC)  Human-Computer Interaction concepts  Usability  Specific Application services satisfaction  Standard Data Exchange  Internationalization
  • 43. Comparison: Our system vs. Textpresso, XplorMed and Vivismo
      Ontology used: Our system: full Gene Ontology. Textpresso: only 30 categories. XplorMed: top hierarchy of the MeSH ontology. Vivismo: derives the ontology from the search result.
      Output: Our system: uses the deep ontology to navigate through a large result set. Textpresso: returns a list of relevant abstracts. XplorMed: for each MeSH category, there is an associated list in a non-sequential order. Vivismo: returns a list of relevant abstracts.
  • 44. IBN-SINA vs. Others
      Works on: IBN-SINA: all the PubMed abstracts. Textpresso: designed for full papers, which are not available most of the time. XplorMed: all the PubMed abstracts. Vivismo: all the PubMed abstracts.
      Term extraction: IBN-SINA: allows gaps within matches and considers the information content of the words, which leads to more refined term extraction. Textpresso: tries to find the category terms directly in the text, only allowing for some variations in lower/uppercase letters and plural forms. XplorMed: extracts terms based on term frequency in the collected documents. Vivismo: extracts terms based on term frequency in the collected documents.
  • 45. Our System Vs. GoPubMed
  • 49. Overall System Architecture (diagram): three applications (Swanson's Algorithm, PPI, Gene Clustering) built on top of the Searching & Browsing Framework
  • 50. Swanson Algorithm (1986) Swanson's method is a way of finding indirect relations between objects. (Diagram: object A with related terms A1 and A2; object B with related terms B1 and B2.) 1986: “Undiscovered public knowledge”
  • 51. Cosine Similarity Cosine similarity is a measure of similarity between two vectors of n dimensions by finding the cosine of the angle between them, often used to compare documents in text mining [Wikipedia].
      Terms related to the first term (“A's related terms”): A B C D E F G H
      Terms related to the second term (“B's related terms”): A X Y B Z D E F
      Vocabulary:    A B C D E F G H X Y Z
      First vector:  1 1 1 1 1 1 1 1 0 0 0
      Second vector: 1 1 0 1 1 1 0 0 1 1 1
  • 52. Cosine Similarity (Cont.) Finally, applying the cosine similarity function to the two vectors:
      First vector:  1 1 1 1 1 1 1 1 0 0 0
      Second vector: 1 1 0 1 1 1 0 0 1 1 1
      Similarity = (1+1+0+1+1+1+0+0+0+0+0) / (√8 × √8) = 5/8 = 0.625
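The worked example above can be reproduced with a few lines of Python. This is a minimal sketch; the function name is illustrative and the term sets are the single-letter placeholders used on the slide.

```python
import math


def cosine_from_term_sets(terms_a, terms_b):
    """Cosine similarity of two term sets encoded as binary vectors."""
    vocabulary = sorted(set(terms_a) | set(terms_b))
    vec_a = [1 if term in terms_a else 0 for term in vocabulary]
    vec_b = [1 if term in terms_b else 0 for term in vocabulary]
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
    return dot / norm if norm else 0.0


related_to_first = {"A", "B", "C", "D", "E", "F", "G", "H"}
related_to_second = {"A", "X", "Y", "B", "Z", "D", "E", "F"}
# Five shared terms, eight terms in each set: the result is approximately 0.625.
print(round(cosine_from_term_sets(related_to_first, related_to_second), 3))
```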
  • 53. Swanson example Relation between P53 and P51 1986: “Fish oil, Raynaud’s syndrome, and undiscovered public knowledge”
  • 56. PPI Agenda  Problem Description  Motivation  PPI System Overview  PPI System Main Components  Dependency Parse Tree  Similarity Metrics  K-Nearest Neighbor Classifier  Evaluation of PPI  Evaluation Metrics  Results and Comparison
  • 58. Problem Description  Due to the ever growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. Reference: Gunes Erkan, Arzucan Ozgur, Dragomir R. Radev. Semi-Supervised Classification for Extracting Protein Interaction Sentences using Dependency Parsing. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 228-237, Prague, June 2007
  • 60. Motivation  The interactions between proteins are important for many, if not all, biological functions.  The function of a protein can be characterized more precisely through knowledge of PPI.  Information about these interactions improves our understanding of diseases and can provide the basis for new therapeutic approaches.  Validate experimental results and test benches.
  • 62. System Overview  We worked at the sentence level (Why?)  It increases the semantic information that can be understood from the sentence.  The syntactic structure of the sentence increases the knowledge obtained from it.  A specific relation between proteins can be deduced from it.
  • 64. System Overview  Our approach depends on the following: the shortest path between the entities in the dependency tree of a sentence usually captures the necessary information to identify their relationship.
  • 67. Dependency Parse Tree • Unlike a syntactic parse, it captures the semantic predicate-argument relationships among its words.  We used the Stanford Parser API to perform the natural language processing task.  The shortest path is found using Breadth First Search (BFS): since each edge has equal weight, the shortest path is the first one discovered.
  • 68. Dependency Parse Tree (Example)  The dependency tree of the sentence “The results demonstrated that KaiC interacts rhythmically with KaiA, KaiB, and SasA.”
  • 69. Example (Cont.) • Then, we select the shortest paths between the protein pairs: • KaiC - nsubj - interacts - prep_with - SasA • KaiC - nsubj - interacts - prep_with - SasA - conj_and - KaiA • KaiC - nsubj - interacts - prep_with - SasA - conj_and - KaiB • SasA - conj_and - KaiA • SasA - conj_and - KaiB • KaiA - conj_and - SasA - conj_and - KaiB
  • 70. Example (Cont.) • Then, we rename the proteins in the pair as PROTX1 and PROTX2, and all the other proteins in the sentence as PROTX0: • PROTX1 - nsubj - interacts - prep_with - PROTX2 • PROTX1 - nsubj - interacts - prep_with - PROTX0 - conj_and - PROTX2 • PROTX1 - nsubj - interacts - prep_with - PROTX0 - conj_and - PROTX2 • PROTX1 - conj_and - PROTX2 • PROTX1 - conj_and - PROTX2 • PROTX1 - conj_and - PROTX0 - conj_and - PROTX2
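The path selection and renaming steps above might look like the following sketch. The dependency edges are hand-coded from the paths listed on the slide (an assumption; in the actual system they would come from the Stanford Parser), and the function names are illustrative.

```python
from collections import deque

# (head, dependent, relation) edges, read off the example paths above.
EDGES = [
    ("interacts", "KaiC", "nsubj"),
    ("interacts", "SasA", "prep_with"),
    ("SasA", "KaiA", "conj_and"),
    ("SasA", "KaiB", "conj_and"),
]
PROTEINS = {"KaiA", "KaiB", "KaiC", "SasA"}


def build_graph(edges):
    """Treat the dependency tree as an undirected graph with labelled edges."""
    graph = {}
    for head, dependent, relation in edges:
        graph.setdefault(head, []).append((dependent, relation))
        graph.setdefault(dependent, []).append((head, relation))
    return graph


def shortest_path(graph, start, goal):
    """BFS; returns [word, relation, word, ...] or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor, relation in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [relation, neighbor])
    return None


def anonymize(path, pair, proteins):
    """Rename the pair as PROTX1/PROTX2 and any other protein as PROTX0."""
    first, second = pair
    renamed = []
    for token in path:
        if token == first:
            renamed.append("PROTX1")
        elif token == second:
            renamed.append("PROTX2")
        elif token in proteins:
            renamed.append("PROTX0")
        else:
            renamed.append(token)
    return " - ".join(renamed)


graph = build_graph(EDGES)
path = shortest_path(graph, "KaiC", "KaiA")
print(anonymize(path, ("KaiC", "KaiA"), PROTEINS))
```

Because every edge has the same weight, the first path BFS finds to the goal is a shortest one, which is why no weighted search is needed.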
  • 73. Similarity Metrics  The main idea of using similarity metrics is to find a function that maps input patterns into a target space such that a simple distance in the target space approximates the “semantic” distance in the input space.
  • 74. Similarity Metrics  We implemented the Levenshtein distance (edit distance):  the number of insertions, deletions and substitutions needed to transform one string into another.  We also used an open source library called “SimMetrics”, a Java library of 23 string similarity metrics. • Developed at the University of Sheffield (Chapman, 2004)
  • 75. Similarity Metrics • We used only 10 string similarities from SimMetrics. • Cosine Similarity • Block Distance • Dice Similarity • Euclidean Distance • Jaccard Similarity • Jaro Similarity • Jaro Winkler Similarity • Matching Coefficient • Monge Elkan Similarity
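For reference, the Levenshtein (edit) distance mentioned above can be computed with the standard dynamic-programming recurrence. This is a generic sketch, not the system's own implementation, and it works on characters; the same recurrence could be applied to sequences of path tokens.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(target) + 1))
    for i, s_item in enumerate(source, start=1):
        current = [i]
        for j, t_item in enumerate(target, start=1):
            substitution_cost = 0 if s_item == t_item else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```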
  • 78. K-Nearest Neighbor Classifier • k-nearest neighbor: assign a label according to the majority label of the k nearest-neighbor training patterns.
  • 79. KNN Example • If k = 3, it is classified as a triangle • k = 5, it is classified as a square
  • 80. KNN Strengths and Weaknesses • Strengths: • Simple to implement and use • Comprehensible – easy to explain prediction • Robust to noisy data by averaging k-nearest neighbors
  • 81. KNN Strengths and Weaknesses • Weaknesses: • Need a lot of space to store all examples. • Takes more time to classify a new example than with a model (need to calculate and compare distance from new example to all other examples).
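A minimal sketch of the majority-vote rule, with the distance function passed in as a parameter so that any similarity metric can be plugged in. The one-dimensional training points are hypothetical and chosen only to reproduce the k = 3 versus k = 5 behaviour described in the KNN example.

```python
from collections import Counter


def knn_classify(query, training, k, distance):
    """Label the query by majority vote among its k nearest training examples."""
    neighbors = sorted(training, key=lambda example: distance(query, example[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical (point, label) pairs; the query point is 0.0.
TRAINING = [
    (1.0, "triangle"),
    (1.5, "square"),
    (2.0, "triangle"),
    (3.0, "square"),
    (3.5, "square"),
]


def absolute_difference(a, b):
    return abs(a - b)


print(knn_classify(0.0, TRAINING, 3, absolute_difference))  # two triangles vs. one square
print(knn_classify(0.0, TRAINING, 5, absolute_difference))  # three squares vs. two triangles
```

The sort over all training examples also illustrates the weakness noted above: every classification compares the new example against the whole training set.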
  • 84. Evaluation of PPI • We used five different datasets: • BioInfer dataset. • AIMed dataset. • LLL dataset. • IEPA dataset. • HPRD50 dataset. • We used the KNN classifier, varying K and the similarity metric as parameters.
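  • 86. Evaluation Metrics • Precision: TP / (TP + FP) • Recall: TP / (TP + FN) • F-measure: 2 × Precision × Recall / (Precision + Recall), where TP, FP and FN are the numbers of true positives, false positives and false negatives.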
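The three metrics can be computed from counts of true positives, false positives and false negatives using their standard definitions. The counts in the usage line are made up for illustration and are not results from the evaluated datasets.

```python
def precision(tp, fp):
    """Fraction of predicted interactions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true interactions that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
print(precision(8, 2), recall(8, 4), f_measure(8, 2, 4))
```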
  • 93. Results and Comparison
      Dataset: Min. Result / Max. Result
      BioInfer: 32 / 56.9
      AIMed: 5 / 48.9
      LLL: 48.8 / 73
      IEPA: 36.6 / 72
      HPRD50: 12.9 / 63.49
  • 94. Our PPI System Vs. Graph Kernel Approach
      Dataset: Our System (%) / Graph Kernel Approach (%)
      BioInfer: 56.9 / 52.9
      AIMed: 48.9 / 56.4
      LLL: 73 / 76.8
      IEPA: 72 / 75.1
      HPRD50: 67 / 63.4
  • 97. Motivation  Goal:  Grouping genes according to some features.  Challenges:  Large number of genes.  The complexity of biological networks.
  • 98. Motivation  The solution is :  Gene Clustering
  • 99. Gene Clustering Techniques  Based on Gene Expression :  Advantages :  High Accuracy .  Disadvantages :  High cost .  Time Consuming .  Noise .
  • 100. Gene Clustering Techniques  Based on Text Mining:  Advantages:  Low cost.  Less time-consuming.  Disadvantages:  Low accuracy.
  • 101. Gene Clustering Based on Text Mining  To perform Gene Clustering we need :  Clustering Algorithms .  Similarity Measurements .
  • 102. Clustering Algorithms  Hierarchical Algorithms .  Partitioning Algorithms .  Density-Based Algorithms .
  • 109. Similarity Measurements  Swanson Algorithm .  Document Occurrences .
  • 110. Swanson Algorithm  Search PubMed for gene A and extract set A (the most related keywords: MeSH or GO terms).  Search PubMed for gene B and extract set B (the most related keywords: MeSH or GO terms).  Based on the intersection between set A and set B, we apply the cosine similarity.
  • 111. Document Occurrences  Search PubMed for gene A and extract set A (the document IDs of gene A).  Search PubMed for gene B and extract set B (the document IDs of gene B).  Based on the intersection between set A and set B, we apply the Jaccard Similarity Coefficient.
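The document-occurrence measure reduces to the Jaccard coefficient of two sets of document IDs, as in the sketch below. The ID sets are hypothetical placeholders; in the system they would be the results of the two PubMed searches.

```python
def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical document IDs returned for two genes (not real PubMed results).
docs_gene_a = {101, 102, 103, 104}
docs_gene_b = {103, 104, 105}
print(jaccard(docs_gene_a, docs_gene_b))  # 2 shared documents out of 5 distinct ones
```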
  • 113. Extended Work: PPI System with SVM Classifier (1)  Equation: $u = w \cdot x - b$  Objective: $\min \tfrac{1}{2} \lVert w \rVert^2$ subject to $y_i (w \cdot x_i - b) \ge 1, \ \forall i$
  • 114. Extended Work: PPI System with SVM Classifier (2)  $\min_{\alpha} \Psi(\alpha) = \min_{\alpha} \tfrac{1}{2} \sum_i \sum_j y_i y_j (x_i \cdot x_j) \alpha_i \alpha_j - \sum_i \alpha_i$  $\alpha$ is called the multiplier; if we can get $\alpha$, we can get $(w, b)$.  $w = \sum_i y_i \alpha_i x_i$, $b = w \cdot x_k - y_k$ for some $\alpha_k > 0$
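To illustrate how (w, b) follow from the multipliers, here is a small sketch using the two formulas above. The two training points and their multipliers are a hand-worked toy case (an assumption for illustration): for x1 = (1, 1) with y1 = +1 and x2 = (-1, -1) with y2 = -1, the maximum-margin solution has both multipliers equal to 0.25.

```python
def recover_w_b(xs, ys, alphas):
    """Compute w = sum_i y_i * alpha_i * x_i and b = w . x_k - y_k for some alpha_k > 0."""
    dimensions = len(xs[0])
    w = [
        sum(y * alpha * x[d] for x, y, alpha in zip(xs, ys, alphas))
        for d in range(dimensions)
    ]
    k = next(i for i, alpha in enumerate(alphas) if alpha > 0)  # any support vector
    b = sum(w_d * x_d for w_d, x_d in zip(w, xs[k])) - ys[k]
    return w, b


def decision_value(w, b, x):
    """u = w . x - b; the predicted class is the sign of u."""
    return sum(w_d * x_d for w_d, x_d in zip(w, x)) - b


xs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alphas = [0.25, 0.25]  # toy multipliers worked out by hand
w, b = recover_w_b(xs, ys, alphas)
print(w, b, decision_value(w, b, (2.0, 0.5)))
```

Both training points satisfy y_i (w · x_i − b) = 1 here, so they lie exactly on the margin, consistent with their multipliers being positive.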
  • 116. Conclusion  Problem 1: Algorithms for concept recognition in document abstracts and titles  We introduced an algorithm to annotate the Gene Ontology terms in the documents.  Problem 2: Use the annotated documents to build a structured representation of documents  We showed how the framework uses Gene Ontology to build a semantic representation of the obtained documents.  Problem 3: Design a system for ontology-based search engines for biological researchers  We introduced the design of the framework and showed how it is flexible for future modifications and scalable with respect to the number of documents and number of users.
  • 117. Conclusion  Problem 4: Using Swanson's algorithm to assess the similarity between different biological terms  We showed how Swanson's algorithm can be used to estimate the similarity between two instances (P53 and P21).  Problem 5: Supervised machine learning algorithms for prediction of protein-to-protein interactions  We showed how we used supervised machine learning algorithms such as KNN, together with a new technique to estimate the distance between sentences, in order to predict the possible interactions between proteins mentioned in the documents.  Problem 6: Unsupervised machine learning algorithms to identify different clusters of genes  We showed how we used unsupervised machine learning algorithms such as DBScan, with similarity based on Swanson's algorithm and cosine similarity, in order to group genes mentioned in the documents into different clusters.
  • 118. Future work  There are hot research areas and open problems in biological text mining  The content provider for documents  Google Scholar  Using Semantic web 3.0 (Online Journals)  The ontology generation  Ability to edit the ontologies and add knowledge  Other ontologies  Using Wikipedia as an ontology
  • 119. Future work  There are some features that may be added to the system  Biomedical ontology-based search engine  Provide a document summary for each group of documents.  Allow the user to save and print the results obtained by the system.  Protein-Protein Interaction (PPI)  Use more sophisticated classifiers and machine learning techniques such as AdaBoost to enhance the classification process.  Use background knowledge of verbs, as many verbs have the same meaning.  This will help the system produce more accurate results, since we can introduce a fuzzy distance for the differences between the meanings of verbs. It will also make it possible to discover the type of relation between the terms, allowing more semantic identification of relations.
  • 120. Future work • There are some features that may be added to the system  Gene Clustering  Use more sophisticated clustering algorithms that were originally designed for gene clustering.  More Applications:  Based on the services provided by the ontology-based engine, we can construct applications such as extracting the relations between drugs and diseases, grouping diseases into different clusters (which helps to identify the characteristics of a newly discovered disease), and other applications that rely on text mining in biomedical documents.