Semantic annotation, clustering and visualization
Media Technology MSc Programme
David Graus, Graduation Project
Supervisor: Joris Slob
David Graus   Media Technology Msc Programme   07/02/2012




Introduction




Cyttron DB entry
                   "The volume of the brain evaluated in this study. The
                   color scale represents the number of 4-mm voxels with
                   data in at least 7 subjects along a 3-cm deep line into
                   the brain. A three-dimensional rendering of a brain is
                   shown in regions where insufficient data were
                   obtained. The most superior regions of the frontal and
                   parietal lobes and the most inferior regions of the
                   temporal lobes were not evaluated. Imaging artifacts
                   may also compromise the significance of results in the
                   most inferior portions of the frontal lobe."




Tasks
 1.      Semantic annotation
         Identify and tag most important concepts from text [NLP]
 2.      Topic extraction
         Relate concepts and find clusters [Linked Data]
 3.      Visualization
         Draw resulting graphs and clusters [Data visualization]




1. Semantic Annotation
  Method I: Find words
  Method II: Compare texts




Semantic Annotation: Method I
                "The volume of the brain evaluated in this study. The
                color scale represents the number of 4-mm voxels with
                data in at least 7 subjects along a 3-cm deep line into
                the brain. A three-dimensional rendering of a brain is
                shown in regions where insufficient data were
                obtained. The most superior regions of the frontal and
                parietal lobes and the most inferior regions of the
                temporal lobes were not evaluated. Imaging artifacts
                may also compromise the significance of results in the
                most inferior portions of the frontal lobe."




Formal knowledge: Biomedical Ontology




NCI Thesaurus
                89,129 unique concepts
                50,804 definitions
                258,051 synonyms
                Relations!

                Concept         Agrobacterium tumefaciens
                Definition      A species of Gram negative, rod shaped bacteria
                                assigned to the phylum Proteobacteria. This
                                bacteria is motile by flagella and mediates the
                                horizontal gene transfer of its Ti plasmid to
                                infect plants. A. tumefaciens is commonly found
                                in soil and around the root surfaces of plants
                                and is the causative agent of crown gall disease.
                Synonyms        RHIZOBIUM RADIOBACTER
                                CDC GROUP VD-3




Example
     "The volume of the brain evaluated in this study. The color scale represents the number of
     4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-
     dimensional rendering of a brain is shown in regions where insufficient data were obtained.
     The most superior regions of the frontal and parietal lobes and the most inferior regions of
     the temporal lobes were not evaluated. Imaging artifacts may also compromise the
     significance of results in the most inferior portions of the frontal lobe."




     Most, Brain, A, Inferior, Data, And, With, Volume, Volume, Three,
     Temporal, Superior, Study, Scale, Parietal, Number, Lobe, Line, Into,
     Frontal Lobe, Deep, Color, At
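
The literal matching behind this example can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `CONCEPT_LABELS` is a hypothetical mini-vocabulary standing in for the ontology's concept labels, and `find_concepts` is an assumed helper name.

```python
import re

# Hypothetical mini-vocabulary; real labels would come from the ontology.
CONCEPT_LABELS = ["Brain", "Frontal Lobe", "Lobe", "Volume", "Color", "Mammary Gland"]

def find_concepts(text, labels):
    """Return the labels that occur in the text as whole words (case-insensitive)."""
    found = []
    for label in labels:
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(label)
    return found

entry = ("The volume of the brain evaluated in this study. Imaging artifacts may "
         "also compromise the significance of results in the most inferior "
         "portions of the frontal lobe.")

matches = find_concepts(entry, CONCEPT_LABELS)
print(matches)  # ['Brain', 'Frontal Lobe', 'Lobe', 'Volume']
```

Because every label is matched literally, short labels such as "A", "And" or "At" in the list above also produce hits, which is one reason the keyword representations and modifiers are explored next.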




Semantic Annotation: Method I
       2 ‘Modifiers’ of representations:
                1.   (Porter) Stemming (text & ontology concepts)
                     Lobes – lobe
                     Brains – brain
                     Etc…
                2.   Generate synonyms (using WordNet)
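
As a rough illustration of the first modifier, the sketch below applies only step 1a of the Porter stemmer (the plural-suffix rules) to both the text and a concept label, so that "Lobes" and "lobe" end up with the same form. It is deliberately not the full Porter algorithm, and the helper names are assumptions. Synonym generation with WordNet is not shown.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter stemmer (plural suffixes)."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # lobes -> lobe
    return word

def normalise(phrase):
    """Lowercase a phrase and stem each word, for text and concept labels alike."""
    return " ".join(porter_step_1a(token) for token in phrase.lower().split())

print(normalise("Lobes"))          # lobe
print(normalise("Brains"))         # brain
print(normalise("Frontal Lobes"))  # frontal lobe
```

Applying the same normalisation to both sides is what lets a plural in the text match a singular concept label.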



Different text representations
       Most frequent words:
           'brain, regions, data, evaluated, frontal, inferior, lobes, along, also, artifacts,
           color, compromise, deep, dimensional, imaging, insufficient, least, line, lobe'

       Most frequent nouns:
           'brain, color, deep, imaging, insufficient, line, lobe, number, rendering, scale,
           significance, study, volume'

       Bigrams:
           'also compromise, artifacts may, cm deep, color scale, compromise significance,
           deep line, dimensional rendering, imaging artifacts, may also, mm voxels,
           represents number, scale represents, significance results, subjects along, data
           least, data obtained, evaluated study, frontal lobe, frontal parietal, inferior
           portions'

       Trigrams:
           'also compromise significance, artifacts may also, cm deep line, color scale
           represents, compromise significance results, imaging artifacts may, may also
           compromise, scale represents number, insufficient data obtained, mm voxels
           data, portions frontal lobe, […]

       Combo:
           'brain, regions, data, evaluated, frontal, inferior, lobes, along, also, artifacts,
           color, compromise, deep, dimensional, imaging, insufficient, least, line, lobe.
           brain, color, deep, imaging, insufficient, […]
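
A minimal sketch of how the frequent-word, bigram and trigram representations could be derived. The stopword list is a small illustrative assumption (the slides do not say which list was used), sentence boundaries are ignored, and the noun representation is left out because it would need a part-of-speech tagger.

```python
import re
from collections import Counter

# Small illustrative stopword list; an assumption, not the list used in the project.
STOPWORDS = {"the", "of", "in", "this", "a", "is", "with", "at",
             "into", "and", "were", "not", "where"}

def tokens(text):
    """Lowercase, keep alphabetic words only, drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def ngrams(words, n):
    """Join every run of n consecutive words into one string."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def most_frequent(items, k):
    """Return the k most common items, most frequent first."""
    return [item for item, _ in Counter(items).most_common(k)]

text = ("The volume of the brain evaluated in this study. The color scale "
        "represents the number of 4-mm voxels with data in at least 7 subjects "
        "along a 3-cm deep line into the brain.")

words = tokens(text)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)

print(most_frequent(words, 1))       # ['brain']
print("color scale" in bigrams)      # True
print("cm deep line" in trigrams)    # True
```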




Semantic Annotation: Method I
  6 Representations (literal + 5 keyword variations)

  4 Treatments (literal + stem + synonyms + both)

  24 results
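
The count of 24 follows from pairing every representation with every treatment (6 × 4). A short sketch, with the labels below chosen for illustration only:

```python
from itertools import product

# Labels are illustrative; the slides name the groups but not these exact strings.
REPRESENTATIONS = ["literal", "frequent words", "frequent nouns",
                   "bigrams", "trigrams", "combo"]
TREATMENTS = ["literal", "stemmed", "synonyms", "stemmed + synonyms"]

# One annotation run per (representation, treatment) pair.
runs = list(product(REPRESENTATIONS, TREATMENTS))
print(len(runs))  # 24
```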




Method II: Text Comparison
 Find concepts that might not occur in text
 "The volume of the brain evaluated in
 this study. The color scale represents the
 number of 4-mm voxels with data in at
 least 7 subjects along a 3-cm deep line
 into the brain. A three-dimensional
 rendering of a brain is shown in regions
 where insufficient data were obtained.
 The most superior regions of the frontal
 and parietal lobes and the most inferior
 regions of the temporal lobes were not
 evaluated. Imaging artifacts may also
 compromise the significance of results
 in the most inferior portions of the
 frontal lobe."




Compare text to definitions
 Find relevant concepts based on their (textual) definitions

 Cyttron entry:
     "The volume of the brain evaluated in this study. The color scale
     represents the number of 4-mm voxels with data in at least 7 subjects
     along a 3-cm deep line into the brain. A three-dim […]

 NCI Thesaurus definitions:
     Parietal Lobe: One of the lobes of the cerebral hemisphere located
     superiorly to the occipital lobe and posteriorly to the frontal lobe.
     Cognition and visuospatial processing are its main functions.




Compare how?
       Bag of Words + TF-IDF

       Dictionary: BioMedCentral Corpus
           > 100,000 articles

           > 8GB raw data

       Process Corpus
           Clean (strip tags, store only article body)
           Tokenize (create list of words)
           Remove common words (stopwords)
           Stem remaining words
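
The four processing steps can be sketched as one small pipeline. This is an assumption-laden illustration: tags are stripped with a simple regular expression, the stopword list is a tiny stand-in, and `stem` is a crude plural stripper used in place of a real stemmer. Selecting only the article body is not modelled.

```python
import re

# Tiny stand-in stopword list (assumption).
STOPWORDS = {"the", "of", "in", "a", "is", "and", "with", "at", "into"}

def clean(raw):
    """Strip markup tags, keeping the text between them."""
    return re.sub(r"<[^>]+>", " ", raw)

def tokenize(text):
    """Create a list of lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(words):
    return [w for w in words if w not in STOPWORDS]

def stem(word):
    """Crude plural stripping, standing in for a real stemmer."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def preprocess(raw):
    """Clean, tokenize, remove stopwords, then stem what remains."""
    return [stem(w) for w in remove_stopwords(tokenize(clean(raw)))]

print(preprocess("<p>The frontal lobes of the brains</p>"))
# ['frontal', 'lobe', 'brain']
```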




Method II: Text Comparison
       Convert both texts to vector space using dictionary, compute similarity.
       Return most similar concepts.

        "The volume of the brain evaluated in this study.
        The color scale represents the number of 4-mm
        voxels with data in at least 7 subjects along a 3-
        cm deep line into the brain. A three-dimensional
        rendering of a brain is shown in regions where
        insufficient data were obtained. The most
        superior regions of the frontal and parietal lobes
        and the most inferior regions of the temporal
        lobes were not evaluated. Imaging artifacts may
        also compromise the significance of results in the
        most inferior portions of the frontal lobe."

        1. Frontotemporal Dementia
        2. Parietal Lobe
        3. Area of Broca
        4. Anterior Cranial Fossa
        5. Brain Lobectomy
        6. Anterior Parietal Artery
        7. Mammary Gland
        8. Frontal Lobe
        9. Interlobar
        10. Lobar
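
A minimal sketch of the comparison step, using bag-of-words TF-IDF vectors. The slides only say "compute similarity"; cosine similarity is assumed here as the measure. The corpus, the definition tokens and the entry tokens are toy data made up for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vector(tokens, doc_freq, n_docs):
    """Weight each known term by count * log(n_docs / document frequency)."""
    counts = Counter(tokens)
    return {t: c * math.log(n_docs / doc_freq[t])
            for t, c in counts.items() if t in doc_freq}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(entry_tokens, definitions, doc_freq, n_docs):
    """Return (concept, score) pairs, most similar definition first."""
    entry_vec = tfidf_vector(entry_tokens, doc_freq, n_docs)
    scores = [(name, cosine(entry_vec, tfidf_vector(toks, doc_freq, n_docs)))
              for name, toks in definitions.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy corpus standing in for the dictionary corpus (already preprocessed).
corpus = [
    ["brain", "lobe", "frontal", "imaging"],
    ["brain", "parietal", "lobe", "cognition"],
    ["mammary", "gland", "tissue"],
    ["imaging", "data", "voxel"],
]
doc_freq = Counter(t for doc in corpus for t in set(doc))
n_docs = len(corpus)

definitions = {
    "Parietal Lobe": ["lobe", "cerebral", "hemisphere", "parietal",
                      "frontal", "lobe", "cognition"],
    "Mammary Gland": ["mammary", "gland", "tissue"],
}
entry = ["brain", "frontal", "parietal", "lobe", "imaging", "data"]

ranked = rank(entry, definitions, doc_freq, n_docs)
print(ranked[0][0])  # Parietal Lobe
```

Terms that are missing from the dictionary are ignored, so the quality of the ranking depends on how well the corpus covers the vocabulary of both the entries and the definitions.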




Method II: Text Comparison
       Different cut-off rules:
       1.       Anything over x% similar
       2.       5 most similar
       3.       10 most similar
       4.       20% most similar
       5.       10% most similar
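
The cut-off rules could be applied to a ranked list of (concept, score) pairs along these lines. The reading of "20% most similar" as "the top 20% of the ranked list" is an assumption, as are the function names and the placeholder scores.

```python
def over_threshold(ranked, min_score):
    """Rule 1: keep anything scoring above min_score."""
    return [(concept, score) for concept, score in ranked if score > min_score]

def top_n(ranked, n):
    """Rules 2 and 3: keep the n most similar concepts."""
    return ranked[:n]

def top_fraction(ranked, fraction):
    """Rules 4 and 5: keep the given fraction of the list, at least one item."""
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

# Placeholder ranking, sorted from most to least similar (scores 1.0 down to 0.1).
ranked = [(f"concept_{i}", 1.0 - i / 10) for i in range(10)]

print(len(over_threshold(ranked, 0.5)))  # 5
print(len(top_n(ranked, 5)))             # 5
print(len(top_fraction(ranked, 0.2)))    # 2
print(len(top_fraction(ranked, 0.1)))    # 1
```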




Result
       Long list of (linked) concepts
       Relevancy?




Find clusters
       Measure semantic similarity between concepts

       - Shortest paths
       - Shared parents
       - Node’s ‘depth’
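
The three measures can be illustrated on a toy hierarchy. The `PARENT` mapping below is hypothetical and does not reflect the actual structure of the NCI Thesaurus; it also assumes each concept has a single parent, which makes the path through the lowest shared ancestor the shortest one. A real ontology with multiple parents or other relation types would need a general graph search.

```python
# Hypothetical is-a hierarchy: child -> parent.
PARENT = {
    "Frontal Lobe": "Cerebral Lobe",
    "Parietal Lobe": "Cerebral Lobe",
    "Cerebral Lobe": "Brain Part",
    "Brain Part": "Anatomic Structure",
    "Mammary Gland": "Gland",
    "Gland": "Anatomic Structure",
}

def ancestors(concept):
    """Chain from the concept up to the root, the concept itself included."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(concept):
    """Number of edges between the concept and the root."""
    return len(ancestors(concept)) - 1

def shared_parents(a, b):
    """Ancestors that both concepts have in common, nearest first."""
    chain_b = set(ancestors(b))
    return [c for c in ancestors(a) if c in chain_b]

def shortest_path_length(a, b):
    """Edges between a and b via their lowest shared ancestor (tree only)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, concept in enumerate(chain_a):
        if concept in chain_b:
            return steps_a + chain_b.index(concept)
    return None  # no shared ancestor

print(depth("Frontal Lobe"))                                   # 3
print(shared_parents("Frontal Lobe", "Parietal Lobe")[0])      # Cerebral Lobe
print(shortest_path_length("Frontal Lobe", "Parietal Lobe"))   # 2
print(shortest_path_length("Frontal Lobe", "Mammary Gland"))   # 5
```

In this toy data the two lobes sit two edges apart while the mammary gland is five edges away from either, which is the kind of difference a clustering step could use to separate relevant concepts from outliers.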




To do
        Get data!
        Analyse algorithms

Más contenido relacionado

Destacado

Semantic Annotation Dc 2009
Semantic Annotation Dc 2009Semantic Annotation Dc 2009
Semantic Annotation Dc 2009
sdas617
 

Destacado (6)

Semantic Annotation of the Cyttron Database
Semantic Annotation of the Cyttron DatabaseSemantic Annotation of the Cyttron Database
Semantic Annotation of the Cyttron Database
 
Semantic Annotation - Ontobras 2015
Semantic Annotation - Ontobras 2015Semantic Annotation - Ontobras 2015
Semantic Annotation - Ontobras 2015
 
Semantic Annotation Dc 2009
Semantic Annotation Dc 2009Semantic Annotation Dc 2009
Semantic Annotation Dc 2009
 
Semantic Annotation of Documents
Semantic Annotation of DocumentsSemantic Annotation of Documents
Semantic Annotation of Documents
 
Semantic annotation with Pundit: Enriching the Web of Science
Semantic annotation with Pundit: Enriching the Web of ScienceSemantic annotation with Pundit: Enriching the Web of Science
Semantic annotation with Pundit: Enriching the Web of Science
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents
 

Similar a Semantic annotation, clustering and visualization

from pq import from search import class InformedNode(No
from pq import from search import class InformedNode(Nofrom pq import from search import class InformedNode(No
from pq import from search import class InformedNode(No
JeanmarieColbert3
 

Similar a Semantic annotation, clustering and visualization (20)

8-13-2013LPPPresentation
8-13-2013LPPPresentation8-13-2013LPPPresentation
8-13-2013LPPPresentation
 
USING SINGULAR VALUE DECOMPOSITION IN A CONVOLUTIONAL NEURAL NETWORK TO IMPRO...
USING SINGULAR VALUE DECOMPOSITION IN A CONVOLUTIONAL NEURAL NETWORK TO IMPRO...USING SINGULAR VALUE DECOMPOSITION IN A CONVOLUTIONAL NEURAL NETWORK TO IMPRO...
USING SINGULAR VALUE DECOMPOSITION IN A CONVOLUTIONAL NEURAL NETWORK TO IMPRO...
 
Using Singular Value Decomposition in a Convolutional Neural Network to Impro...
Using Singular Value Decomposition in a Convolutional Neural Network to Impro...Using Singular Value Decomposition in a Convolutional Neural Network to Impro...
Using Singular Value Decomposition in a Convolutional Neural Network to Impro...
 
Sangeetha seminar (1)
Sangeetha  seminar (1)Sangeetha  seminar (1)
Sangeetha seminar (1)
 
from pq import from search import class InformedNode(No
from pq import from search import class InformedNode(Nofrom pq import from search import class InformedNode(No
from pq import from search import class InformedNode(No
 
From pq import from search import class informed node(no
From pq import from search import class informed node(noFrom pq import from search import class informed node(no
From pq import from search import class informed node(no
 
Image Steganography Technique By Using Braille Method of Blind People (LSBrai...
Image Steganography Technique By Using Braille Method of Blind People (LSBrai...Image Steganography Technique By Using Braille Method of Blind People (LSBrai...
Image Steganography Technique By Using Braille Method of Blind People (LSBrai...
 
Brain Image Segmentation Methods using Image Processing Techniques to Analysi...
Brain Image Segmentation Methods using Image Processing Techniques to Analysi...Brain Image Segmentation Methods using Image Processing Techniques to Analysi...
Brain Image Segmentation Methods using Image Processing Techniques to Analysi...
 
Brain Image Segmentation Methods using Image Processing Techniques to Analysi...
Brain Image Segmentation Methods using Image Processing Techniques to Analysi...Brain Image Segmentation Methods using Image Processing Techniques to Analysi...
Brain Image Segmentation Methods using Image Processing Techniques to Analysi...
 
Brain structural connectivity and functional default mode network in deafness
Brain structural connectivity and functional default mode network in deafnessBrain structural connectivity and functional default mode network in deafness
Brain structural connectivity and functional default mode network in deafness
 
HIGH RESOLUTION MRI BRAIN IMAGE SEGMENTATION TECHNIQUE USING HOLDER EXPONENT
HIGH RESOLUTION MRI BRAIN IMAGE SEGMENTATION TECHNIQUE USING HOLDER EXPONENTHIGH RESOLUTION MRI BRAIN IMAGE SEGMENTATION TECHNIQUE USING HOLDER EXPONENT
HIGH RESOLUTION MRI BRAIN IMAGE SEGMENTATION TECHNIQUE USING HOLDER EXPONENT
 
High Resolution Mri Brain Image Segmentation Technique Using Holder Exponent
High Resolution Mri Brain Image Segmentation Technique Using Holder Exponent  High Resolution Mri Brain Image Segmentation Technique Using Holder Exponent
High Resolution Mri Brain Image Segmentation Technique Using Holder Exponent
 
H0114857
H0114857H0114857
H0114857
 
Segmentation and Classification of Brain MRI Images Using Improved Logismos-B...
Segmentation and Classification of Brain MRI Images Using Improved Logismos-B...Segmentation and Classification of Brain MRI Images Using Improved Logismos-B...
Segmentation and Classification of Brain MRI Images Using Improved Logismos-B...
 
Visual Character N-Grams for Classification and Retrieval of Radiological Images
Visual Character N-Grams for Classification and Retrieval of Radiological ImagesVisual Character N-Grams for Classification and Retrieval of Radiological Images
Visual Character N-Grams for Classification and Retrieval of Radiological Images
 
Visual character n grams for classification and retrieval of radiological images
Visual character n grams for classification and retrieval of radiological imagesVisual character n grams for classification and retrieval of radiological images
Visual character n grams for classification and retrieval of radiological images
 
Analysis of Alzheimer’s Disease Using Color Image Segmentation
Analysis of Alzheimer’s Disease Using Color Image SegmentationAnalysis of Alzheimer’s Disease Using Color Image Segmentation
Analysis of Alzheimer’s Disease Using Color Image Segmentation
 
34 107-1-pb
34 107-1-pb34 107-1-pb
34 107-1-pb
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
 

Más de David Graus

Dynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity RankingDynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity Ranking
David Graus
 
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27thDavid Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
David Graus
 
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
David Graus
 
Semantic Search in E-Discovery
Semantic Search in E-DiscoverySemantic Search in E-Discovery
Semantic Search in E-Discovery
David Graus
 

Más de David Graus (20)

Pragmatic ethical and fair AI for data scientists
Pragmatic ethical and fair AI for data scientistsPragmatic ethical and fair AI for data scientists
Pragmatic ethical and fair AI for data scientists
 
Bias in Recommendations
Bias in RecommendationsBias in Recommendations
Bias in Recommendations
 
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.
 
CAT/AI: Computer Assisted Translation 
Assessment for Impact
CAT/AI: Computer Assisted Translation 
Assessment for ImpactCAT/AI: Computer Assisted Translation 
Assessment for Impact
CAT/AI: Computer Assisted Translation 
Assessment for Impact
 
Opening the Black Box of User Profiles in Content-based Recommender Systems
Opening the Black Box of User Profiles in Content-based Recommender SystemsOpening the Black Box of User Profiles in Content-based Recommender Systems
Opening the Black Box of User Profiles in Content-based Recommender Systems
 
Zoeken, vinden, en aanbevelen: personalisatie vs. privacy
Zoeken, vinden, en aanbevelen: personalisatie vs. privacyZoeken, vinden, en aanbevelen: personalisatie vs. privacy
Zoeken, vinden, en aanbevelen: personalisatie vs. privacy
 
Layman's Talk: Entities of Interest --- Discovery in Digital Traces
Layman's Talk: Entities of Interest --- Discovery in Digital TracesLayman's Talk: Entities of Interest --- Discovery in Digital Traces
Layman's Talk: Entities of Interest --- Discovery in Digital Traces
 
Financial News Mining @ PyData Amsterdam
Financial News Mining @ PyData AmsterdamFinancial News Mining @ PyData Amsterdam
Financial News Mining @ PyData Amsterdam
 
De Macht van Data --- Hoe algoritmen ons leven vormgeven
De Macht van Data --- Hoe algoritmen ons leven vormgevenDe Macht van Data --- Hoe algoritmen ons leven vormgeven
De Macht van Data --- Hoe algoritmen ons leven vormgeven
 
Financial News Mining @ FD Mediagroep/Company.info
Financial News Mining @ FD Mediagroep/Company.infoFinancial News Mining @ FD Mediagroep/Company.info
Financial News Mining @ FD Mediagroep/Company.info
 
Big Data & Machine Learning - Mogelijkheden & Valkuilen
Big Data & Machine Learning - Mogelijkheden & ValkuilenBig Data & Machine Learning - Mogelijkheden & Valkuilen
Big Data & Machine Learning - Mogelijkheden & Valkuilen
 
Analyzing and Predicting Task Reminders
Analyzing and Predicting Task RemindersAnalyzing and Predicting Task Reminders
Analyzing and Predicting Task Reminders
 
Dynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity RankingDynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity Ranking
 
Dynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity RankingDynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity Ranking
 
Understanding Email Traffic
Understanding Email TrafficUnderstanding Email Traffic
Understanding Email Traffic
 
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27thDavid Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
 
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
 
Generating Pseudo-ground Truth for Detecting New Concepts in Social Streams
Generating Pseudo-ground Truth for Detecting New Concepts in Social StreamsGenerating Pseudo-ground Truth for Detecting New Concepts in Social Streams
Generating Pseudo-ground Truth for Detecting New Concepts in Social Streams
 
yourHistory - entity linking for a personalized timeline of historic events
yourHistory - entity linking for a personalized timeline of historic eventsyourHistory - entity linking for a personalized timeline of historic events
yourHistory - entity linking for a personalized timeline of historic events
 
Semantic Search in E-Discovery
Semantic Search in E-DiscoverySemantic Search in E-Discovery
Semantic Search in E-Discovery
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
AWS Community Day CPH - Three problems of Terraform
Semantic annotation, clustering and visualization

  • 1. Semantic annotation, clustering and visualization Media Technology Msc Programme David Graus Graduation Project Supervisor: Joris Slob
  • 2. David Graus Media Technology Msc Programme 07/02/2012 Introduction
  • 3. Cyttron DB entry: "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dimensional rendering of a brain is shown in regions where insufficient data were obtained. The most superior regions of the frontal and parietal lobes and the most inferior regions of the temporal lobes were not evaluated. Imaging artifacts may also compromise the significance of results in the most inferior portions of the frontal lobe."
  • 4. Tasks
       1. Semantic annotation: identify and tag the most important concepts from text [NLP]
       2. Topic extraction: relate concepts and find clusters [Linked Data]
       3. Visualization: draw resulting graphs and clusters [Data visualization]
  • 5. 1. Semantic Annotation
       Method I: Find words
       Method II: Compare texts
  • 6. Semantic Annotation: Method I
  • 7. Formal knowledge: Biomedical Ontology
  • 8. NCI Thesaurus: 89,129 unique concepts, 50,804 definitions, 258,051 synonyms. Relations!
       Concept: Agrobacterium tumefaciens
       Definition: A species of Gram negative, rod shaped bacteria assigned to the phylum Proteobacteria. This bacteria is motile by flagella and mediates the horizontal gene transfer of its Ti plasmid to infect plants. A. tumefaciens is commonly found in soil and around the root surfaces of plants and is the causative agent of crown gall disease.
       Synonyms: RHIZOBIUM RADIOBACTER; CDC GROUP VD-3
  • 9. Semantic Annotation: Method I
  • 10. Semantic Annotation: Method I
  • 11. Semantic Annotation: Method I
  • 12. Example: "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dimensional rendering of a brain is shown in regions where insufficient data were obtained. The most superior regions of the frontal and parietal lobes and the most inferior regions of the temporal lobes were not evaluated. Imaging artifacts may also compromise the significance of results in the most inferior portions of the frontal lobe."
       Most, Brain, A, Inferior, Data, And, With, Volume, Volume, Three, Temporal, Superior, Study, Scale, Parietal, Number, Lobe, Line, Into, Frontal Lobe, Deep, Color, At
  • 13. Example
  • 14. Semantic Annotation: Method I. 2 'modifiers' of representations:
       1. (Porter) stemming (text & ontology concepts): lobes – lobe, brains – brain, etc.
       2. Generate synonyms (using WordNet)
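The idea of matching stemmed text against stemmed ontology labels can be sketched as follows. This is a minimal illustration, not the project's code: `toy_stem` is a deliberately crude stand-in for Porter stemming (it only strips a plural "s"), the label list is a hand-picked sample, and synonym expansion via WordNet is left out.

```python
import re

def toy_stem(word):
    """Crude stand-in for Porter stemming: lower-case and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def tokenize(text):
    """Split text into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", text)

def annotate(text, concept_labels):
    """Return the labels whose stemmed tokens occur contiguously in the stemmed text."""
    stems = [toy_stem(t) for t in tokenize(text)]
    found = []
    for label in concept_labels:
        label_stems = [toy_stem(t) for t in tokenize(label)]
        n = len(label_stems)
        if n and any(stems[i:i + n] == label_stems for i in range(len(stems) - n + 1)):
            found.append(label)
    return found

text = "The most superior regions of the frontal and parietal lobes were not evaluated."
labels = ["Frontal Lobe", "Lobe", "Brain", "Superior", "Mammary Gland"]  # hypothetical sample
matches = annotate(text, labels)
print(matches)  # ['Lobe', 'Superior']
```

Note that "Frontal Lobe" is not matched here, because "frontal" and "lobes" are not adjacent in the sentence; only the stemmed plural "lobes" matches "Lobe".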
  • 15. Different text representations
       Most frequent words: brain, regions, data, evaluated, frontal, inferior, lobes, along, also, artifacts, color, compromise, deep, dimensional, imaging, insufficient, least, line, lobe
       Most frequent nouns: brain, color, deep, imaging, insufficient, line, lobe, number, rendering, scale, significance, study, volume
       Bigrams: also compromise, artifacts may, cm deep, color scale, compromise significance, deep line, dimensional rendering, imaging artifacts, may also, mm voxels, represents number, scale represents, significance results, subjects along, data least, data obtained, evaluated study, frontal lobe, frontal parietal, inferior portions
       Trigrams: also compromise significance, artifacts may also, cm deep line, color scale represents, compromise significance results, imaging artifacts may, may also compromise, scale represents number, insufficient data obtained, mm voxels data, portions frontal lobe, […]
       Combo: brain, regions, data, evaluated, frontal, inferior, lobes, along, also, artifacts, color, compromise, deep, dimensional, imaging, insufficient, least, line, lobe. brain, color, deep, imaging, insufficient, […]
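A rough sketch of how the word, bigram and trigram representations could be derived from one sentence of the entry. The stopword set below is a small assumed sample rather than the list actually used, and noun selection (which would need a part-of-speech tagger) is omitted.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "of", "in", "a", "and", "is", "were", "with",
             "at", "into", "this", "where", "not", "to"}

def content_tokens(text):
    """Lower-case, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def most_frequent(tokens, k):
    """The k most frequent tokens (ties keep first-seen order)."""
    return [word for word, _ in Counter(tokens).most_common(k)]

def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = ("The color scale represents the number of 4-mm voxels with data in "
        "at least 7 subjects along a 3-cm deep line into the brain.")
tokens = content_tokens(text)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams[:3])   # ['color scale', 'scale represents', 'represents number']
print(trigrams[:2])  # ['color scale represents', 'scale represents number']
```

Because stopwords are removed before the n-grams are built, pairs such as "represents number" appear even though "the" sits between the two words in the original sentence.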
  • 16. Semantic Annotation: Method I
       6 representations (literal + 5 keyword variations)
       4 treatments (literal + stem + synonyms + both)
       24 results
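The count of 24 is simply every representation paired with every treatment. A small sketch, with labels paraphrased from the slides (the exact names are assumptions):

```python
from itertools import product

# Labels paraphrase the slides; the exact names are assumptions.
REPRESENTATIONS = ["literal", "frequent words", "frequent nouns",
                   "bigrams", "trigrams", "combo"]
TREATMENTS = ["literal", "stemmed", "synonyms", "stemmed + synonyms"]

# One annotation run per (representation, treatment) pair.
runs = list(product(REPRESENTATIONS, TREATMENTS))
print(len(runs))  # 24
```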
  • 17. Method II: Text Comparison. Find concepts that might not occur in the text.
  • 18. Compare text to definitions: find relevant concepts based on their (textual) definitions
       Cyttron entry: "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dim […]"
       NCI Thesaurus definitions: "Parietal Lobe: One of the lobes of the cerebral hemisphere located superiorly to the occipital lobe and posteriorly to the frontal lobe. Cognition and visuospatial processing are its main functions."
  • 19. Method II: Text Comparison. Find concepts that might not occur in the text.
  • 20. Compare how? Bag of Words + TF-IDF
       Dictionary: BioMedCentral corpus (> 100,000 articles, > 8 GB raw data)
       Process corpus:
       - Clean (strip tags, store only article body)
       - Tokenize (create list of words)
       - Remove common words (stopwords)
       - Stem remaining words
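The four corpus-processing steps could look roughly like this. It is a sketch under assumptions: the regular expression only strips simple markup, the stopword set is a tiny sample, and `toy_stem` is a crude substitute for a real stemmer.

```python
import re

# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "of", "in", "a", "and", "is", "were", "with", "not", "to"}

def strip_tags(raw):
    """Clean: replace anything that looks like a markup tag with a space."""
    return re.sub(r"<[^>]+>", " ", raw)

def toy_stem(word):
    """Crude stand-in for a real stemmer: strip a plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def preprocess(raw):
    """Clean, tokenize, remove stopwords, then stem what remains."""
    text = strip_tags(raw)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]

doc = "<p>The temporal <b>lobes</b> of the brain were not evaluated.</p>"
result = preprocess(doc)
print(result)  # ['temporal', 'lobe', 'brain', 'evaluated']
```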
  • 21. Method II: Text Comparison. Convert both texts to vector space using the dictionary, compute similarity, and return the most similar concepts.
       Text: "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dimensional rendering of a brain is shown in regions where insufficient data were obtained. The most superior regions of the frontal and parietal lobes and the most inferior regions of the temporal lobes were not evaluated. Imaging artifacts may also compromise the significance of results in the most inferior portions of the frontal lobe."
       Most similar concepts:
       1. Frontotemporal Dementia
       2. Parietal Lobe
       3. Area of Broca
       4. Anterior Cranial Fossa
       5. Brain Lobectomy
       6. Anterior Parietal Artery
       7. Mammary Gland
       8. Frontal Lobe
       9. Interlobar
       10. Lobar
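A minimal sketch of the vector-space comparison, assuming a plain `tf * log(N / df)` weighting and cosine similarity; the slides do not say which TF-IDF variant or similarity measure was used. The four-document background corpus and the "Mammary Gland" definition are toy data made up for illustration; only the Parietal Lobe definition is taken from the slides.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def build_idf(corpus):
    """Dictionary step: inverse document frequency for every term in the background corpus."""
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokens(doc)))
    return {term: math.log(len(corpus) / df) for term, df in doc_freq.items()}

def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; terms missing from the dictionary are ignored."""
    counts = Counter(t for t in tokens(text) if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}

def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_concepts(text, definitions, idf):
    """Score every concept definition against the text, most similar first."""
    query = tfidf_vector(text, idf)
    scored = [(cosine(query, tfidf_vector(d, idf)), name) for name, d in definitions.items()]
    return sorted(scored, reverse=True)

# Toy background corpus (made up; stands in for the article corpus).
corpus = [
    "the frontal lobe and the parietal lobe of the brain",
    "imaging of the brain with voxels",
    "the gland produces milk",
    "plants and soil bacteria",
]
definitions = {
    "Parietal Lobe": "One of the lobes of the cerebral hemisphere located superiorly "
                     "to the occipital lobe and posteriorly to the frontal lobe.",
    "Mammary Gland": "A gland that produces milk.",  # toy definition, not from the thesaurus
}
idf = build_idf(corpus)
ranked = rank_concepts("The most inferior portions of the frontal lobe of the brain.",
                       definitions, idf)
print([name for _, name in ranked])  # ['Parietal Lobe', 'Mammary Gland']
```

With a dictionary this small, common words such as "the" still carry weight; a large background corpus pushes their IDF towards zero, which is the reason for building the dictionary from many articles.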
  • 22. Method II: Text Comparison. Different cut-off rules:
       1. Anything over x% similar
       2. 5 most similar
       3. 10 most similar
       4. 20% most similar
       5. 10% most similar
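The three kinds of cut-off rule (threshold, fixed count, percentage of the list) can be expressed as small filters over a scored list. The scores and concept names below are placeholders, and rounding the percentage cut-off up is an assumption.

```python
import math

def over_threshold(scored, threshold):
    """Rule 1: keep everything more similar than the threshold."""
    return [(score, name) for score, name in scored if score > threshold]

def top_k(scored, k):
    """Rules 2 and 3: keep the k most similar concepts."""
    return sorted(scored, reverse=True)[:k]

def top_fraction(scored, fraction):
    """Rules 4 and 5: keep the most similar fraction of the list (rounded up, by assumption)."""
    return top_k(scored, math.ceil(len(scored) * fraction))

# Placeholder scores 1.0, 0.9, ..., 0.1 for ten hypothetical concepts.
scored = [(round(1 - i / 10, 1), f"concept_{i}") for i in range(10)]

print(len(over_threshold(scored, 0.75)))  # 3
print(len(top_k(scored, 5)))              # 5
print(len(top_fraction(scored, 0.2)))     # 2
```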
  • 23. Result: long list of (linked) concepts. Relevancy?
  • 24. Find clusters. Measure semantic similarity between concepts:
       - Shortest paths
       - Shared parents
       - Node's 'depth'
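The three measures can be illustrated on a small is-a hierarchy. The hierarchy below is hypothetical (it is not taken from the NCI Thesaurus), and it assumes each concept has a single parent; a real ontology can have several parents per concept and would need a proper graph search.

```python
# Hypothetical child -> parent hierarchy, for illustration only.
PARENT = {
    "Frontal Lobe": "Brain Lobe",
    "Parietal Lobe": "Brain Lobe",
    "Brain Lobe": "Brain Part",
    "Brain Part": "Organ Part",
    "Mammary Gland": "Gland",
    "Gland": "Organ Part",
}

def ancestors(node):
    """Path from the node up to the root, starting with the node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    """Number of edges between the node and the root."""
    return len(ancestors(node)) - 1

def lowest_shared_parent(a, b):
    """Deepest concept that lies on both paths to the root, or None."""
    on_b_path = set(ancestors(b))
    for node in ancestors(a):
        if node in on_b_path:
            return node
    return None

def path_length(a, b):
    """Shortest path in a tree: up from a to the shared parent, then down to b."""
    shared = lowest_shared_parent(a, b)
    if shared is None:
        return None
    return ancestors(a).index(shared) + ancestors(b).index(shared)

print(path_length("Frontal Lobe", "Parietal Lobe"))           # 2
print(path_length("Frontal Lobe", "Mammary Gland"))           # 5
print(lowest_shared_parent("Frontal Lobe", "Mammary Gland"))  # Organ Part
print(depth("Frontal Lobe"))                                  # 3
```

In this toy hierarchy the two lobes are close (path length 2, shared parent deep in the tree), while a lobe and "Mammary Gland" only meet at the root, which is the kind of signal that could separate clusters from outliers.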
  • 25. David Graus Media Technology Msc Programme 07/02/2012
  • 26. To do
       - Get data!
       - Analyse algorithms

Editor's notes

  1. So these are the 10 most similar concepts returned
  2. Example of a connected graph. I want to explore the possibilities of visualizing the results, with varying node (circle) sizes for more and less important concepts, and colored and transparent circles for literal and non-literal concepts, conveying the information from the text in a graph. This might also help with analyzing the differences between my method and that of humans.