Topic Analysis in ARCOMEM
Yahoo Research Barcelona
What is Probabilistic Topic Modelling?
Exploring and retrieving meaningful information from large
collections of textual documents is a challenging task.
Probabilistic topic models are a suite of algorithms (a framework)
that aim to discover and annotate large archives of documents
with thematic information.
They do not require any prior annotation or labelling of the
documents.
Topics emerge from the statistical analysis of the original texts.
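The deck does not say which topic model ARCOMEM uses. Latent Dirichlet allocation (LDA) is the best-known member of this family, so the following is a minimal, standard-library-only sketch of LDA fitted by collapsed Gibbs sampling, assuming pre-tokenised documents. It illustrates the point above: no labels are supplied, and topics come purely from word statistics. The function name `lda_gibbs`, the hyperparameter values (`alpha`, `beta`, `iterations`) and the toy corpus are illustrative assumptions, not ARCOMEM's implementation.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                # Resample the token's topic and put it back into the counts.
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimates from the final sample, smoothed by the priors.
    theta = [[(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[t][v] + beta) / (n_k[t] + V * beta) for v in range(V)]
           for t in range(K)]
    return theta, phi, vocab


# Hypothetical toy corpus: two biology-like and two physics-like documents.
docs = [
    "dna genetic evolution gene".split(),
    "gene dna evolution species".split(),
    "quantum energy particle physics".split(),
    "energy particle quantum field".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2, iterations=100)
```

On a corpus this small the result depends on the seed; a real archive would call for an optimised library implementation rather than this pure-Python loop.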
Probabilistic Topic Model
Topic models are based upon the idea that documents are mixtures
of topics, where a topic is a probability distribution over a fixed
vocabulary.
A topic model is a generative model for documents: it specifies a
simple probabilistic procedure by which documents can be generated.
The idea is to study word co-occurrence, assuming that words
which frequently co-occur express, or belong to, the same
semantic concept.
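The generative procedure can be sketched directly. The following assumes an LDA-style process (the deck does not name the model): draw a topic mixture for the document from a Dirichlet distribution, then, for each word, draw a topic from that mixture and a word from that topic. The three toy topics and their word probabilities are hypothetical, chosen to echo the Biology/Physics/Mathematics example on this slide.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document following an LDA-style generative process.

    topics: one dict per topic, mapping word -> probability.
    alpha:  symmetric Dirichlet concentration (> 0) for the topic mixture.
    """
    theta = sample_dirichlet([alpha] * len(topics), rng)  # this document's topic mixture
    words = []
    for _ in range(length):
        k = rng.choices(range(len(topics)), weights=theta)[0]  # choose a topic
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])  # choose a word from that topic
    return theta, words


# Hypothetical topics loosely matching "Biology", "Physics" and "Mathematics".
topics = [
    {"dna": 0.4, "genetic": 0.3, "evolution": 0.3},
    {"energy": 0.5, "particle": 0.3, "quantum": 0.2},
    {"theorem": 0.5, "proof": 0.3, "algebra": 0.2},
]
theta, words = generate_document(topics, alpha=0.5, length=10, rng=random.Random(42))
```

Fitting a topic model amounts to running this story in reverse: given only the words, infer plausible topics and per-document mixtures.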
Example: a document d can be represented by the following mixture
of topics (the proportions sum to 1):
Topic        Biology   Physics   Mathematics
Proportion   0.6       0.3       0.1
In the topic “Biology”, words such as “DNA”, “genetic” and “evolution”
have high probability.
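The mixture view gives a simple way to compute how likely a word is in document d: p(w | d) = Σ_k p(k | d) · p(w | k). The sketch below applies this to the slide's mixture (0.6, 0.3, 0.1); the per-topic probabilities of the word “evolution” are hypothetical values chosen for illustration.

```python
# Topic mixture of document d, as given on the slide.
theta_d = {"Biology": 0.6, "Physics": 0.3, "Mathematics": 0.1}

# Hypothetical probability of the word "evolution" within each topic.
p_evolution = {"Biology": 0.05, "Physics": 0.001, "Mathematics": 0.002}


def word_probability(theta, phi_w):
    """p(w | d) = sum over topics k of p(k | d) * p(w | k)."""
    return sum(theta[k] * phi_w[k] for k in theta)


p = word_probability(theta_d, p_evolution)  # 0.6*0.05 + 0.3*0.001 + 0.1*0.002 ≈ 0.0305
```

Almost all of the result (0.03 of 0.0305) comes from the Biology term: a word that is probable under a document's dominant topic is probable in the document.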
Intuition behind topic modelling
Documents exhibit multiple topics
Each topic is individually interpretable, providing a probability
distribution over words that picks out a coherent cluster of
correlated terms
[Figure: example topics labelled Evolution, Biology, Genetics and Statistical Analysis]
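Interpreting a topic usually means listing its most probable words, and a document exhibits every topic that holds a noticeable share of its mixture. A minimal sketch, assuming a topic-word matrix `phi` and a vocabulary list of the kind a fitted model produces; the numbers and the 0.2 cut-off are hypothetical.

```python
def top_words(phi, vocab, n=3):
    """Return the n most probable words of each topic, most probable first."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


def document_topics(theta_d, threshold=0.2):
    """Indices of topics holding at least `threshold` of a document (cut-off is arbitrary)."""
    return [k for k, share in enumerate(theta_d) if share >= threshold]


# Hypothetical topic-word distributions over a six-word vocabulary.
vocab = ["dna", "energy", "evolution", "genetic", "particle", "quantum"]
phi = [
    [0.35, 0.02, 0.30, 0.28, 0.03, 0.02],  # reads as a "biology" topic
    [0.02, 0.31, 0.03, 0.02, 0.33, 0.29],  # reads as a "physics" topic
]
print(top_words(phi, vocab))  # [['dna', 'evolution', 'genetic'], ['particle', 'energy', 'quantum']]

# The slide's mixture (Biology, Physics, Mathematics): two topics pass the cut-off.
print(document_topics([0.6, 0.3, 0.1]))  # [0, 1]
```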
The challenge is to identify, for each campaign, the significant
topics that are relevant to the two use cases: broadcasting
and parliament libraries.
Topic analysis provides semantically meaningful categories that allow
end-users to search and browse content archives.
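One way such categories can support browsing and search is to file each document under its dominant topic and to rank documents by their share of a chosen topic. The sketch below assumes per-document topic proportions (`theta`) from a fitted model; the document IDs, the numbers and the `min_share` threshold are hypothetical, and the deck does not describe how ARCOMEM's interface actually uses topics.

```python
from collections import defaultdict


def group_by_dominant_topic(doc_ids, theta):
    """Browse: map each topic index to the documents in which it has the largest share."""
    groups = defaultdict(list)
    for doc_id, row in zip(doc_ids, theta):
        dominant = max(range(len(row)), key=row.__getitem__)
        groups[dominant].append(doc_id)
    return dict(groups)


def search_by_topic(doc_ids, theta, topic, min_share=0.3):
    """Search: documents with at least `min_share` of `topic`, strongest match first."""
    hits = [(row[topic], doc_id) for doc_id, row in zip(doc_ids, theta) if row[topic] >= min_share]
    return [doc_id for _, doc_id in sorted(hits, reverse=True)]


# Hypothetical archive items and their topic proportions.
doc_ids = ["broadcast-001", "parliament-017", "broadcast-042"]
theta = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.1, 0.4],
]
print(group_by_dominant_topic(doc_ids, theta))  # {0: ['broadcast-001', 'broadcast-042'], 1: ['parliament-017']}
print(search_by_topic(doc_ids, theta, topic=2))  # ['broadcast-042']
```

Grouping by the single dominant topic keeps browsing simple but hides secondary topics; the threshold-based search is one way to surface documents where a topic is present without dominating.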
Try out on SARA: Trending topics
Try out on SARA: Statistical Topic Models

