SlideShare a Scribd company logo
1 of 65
Download to read offline
Badenes-Olmedo, Carlos
Corcho, Oscar
Ontology Engineering Group (OEG)
Universidad Politécnica de Madrid (UPM)
Probabilistic
Topic Models
K-CAP 2017
Knowledge Capture

December 4th-6th, 2017

Austin, Texas, United States
cbadenes@fi.upm.es
@carbadol
github.com/librairy
oeg-upm.net
2
get ready to play ..
$	git	clone	git@github.com:librairy/tutorial.git	.	
$	docker	run	--name	test	-v	"$PWD":/src	librairy/compiler
3
Outline
1. Artifacts
2. Parameters
3. Training
4. Evaluation and Interpretation
5. Inference
6. Trends
7. Domains
8. Topic-based Similarity
4
Documents
5
Words
dataset
datum
knowledge ontology
query
example
page
eventconcept
class
extraction
section
science term
instance
measure
6
ontology 0.011696399528999426
query 0.00840189384297452
datum 0.007648923027408832
example 0.007628095111872131
knowledge 0.006779803572327723
page 0.006705717576295264
event 0.0066423608486780505
case 0.006638825784874849
dataset 0.006361954166489025
concept 0.005970016686794867
.. ..
class 0.0002888599713777317
6
extraction 0.0002747438325633771
6
datum 0.0002738706330554531
section 0.000268698920616526
example 0.0002680552015363467
6
event 0.0002648849674904949
dataset 0.0002602128572589013
5
case 0.0002595933993586539
result 0.0002585102532654629
entity 0.0002571243653838122
5
.. ..
Topics
dataset
event
7
ontology 0.01169639952899942
6query 0.00840189384297452
datum 0.00764892302740883
2example 0.00762809511187213
1knowledge 0.00677980357232772
3page 0.00670571757629526
4event 0.00664236084867805
05case 0.00663882578487484
9dataset 0.00636195416648902
5concept 0.00597001668679486
7.. ..
class 0.00028885997137773
176extraction 0.00027474383256337
716datum 0.00027387063305545
31section 0.00026869892061652
6example 0.00026805520153634
676event 0.00026488496749049
49dataset 0.00026021285725890
135case 0.00025959339935865
39result 0.00025851025326546
29entity 0.00025712436538381
225.. ..
Documents
dataset event
8
Representational Model
[ 0.6, 0.3, 0.1]
[ 0.6, 0.3, 0.1]
[ 0.6, 0.3, 0.1]
[ 0.6, 0.3, 0.1]
[ 0.6, 0.3, 0.1]
[ 0.6, 0.3, 0.1]
[ 0.6, 0.3, 0.1]
[ 0.6, 0.3, 0.1]
[ 0.6, 0.3, 0.1]
[ 0.6, 0.3, 0.1]
[ 0.2, 0.2, 0.6]
[ 0.4, 0.3, 0.3]
[ T0, T1, T2]
dataset
event
term
datum
concept
knowledge
TOPICS
WORDS
DOCUMENTS
Topic
Model
9
Visualization
• PMLA items from JSTOR's Data for Research service
• restricted to items categorized as “full-length articles” with more than 2000 words
• 5605 articles out of the 9200 items from the years 1889–2007
• LDA, 64 topics
10
Outline
1. Artifacts
2. Parameters
3. Training
4. Evaluation and Interpretation
5. Inference
6. Trends
7. Domains
8. Topic-based Similarity
11
• Each topic is a distribution over words
• Each document is a mixture of corpus-wide topics
• Each word is drawn from one of those topics
Probabilistic Topic Models
12
• We only observe the documents
• The other structure are hidden variables.
• Our goal is to infer the hidden variables by computing their distribution
conditioned on the documents:
p(topics, proportions, assignments | documents)
Probabilistic Topic Models
Graphical Models
13
• Encodes assumptions
• Defines a factorization of the joint
distribution
• Connects to algorithms for computing
with data
Proportions
parameter
Per-document
topic proportions
Per-word
topic assignment
Observed
word
TopicsDocuments
Topic
parameter
Words
• Nodes are random variables
• Edges indicate dependence
• Shaded nodes are observed
• Plates indicate replicated variables
Plate Notation:
Per-topic
word proportions
14
Proportions
parameter
Per-document
topic proportions
Per-word
topic assignment
Observed
word
TopicsDocuments
Topic
parameter
Words
Per-topic
word proportions
Latent Dirichlet Allocation (LDA) [Blei et al, 2003]
15
Approximate posterior inference algorithms:
- Mean FieldVariational methods [Blei et al., 2001,2003]
- Expectation Propagation [Minka and Lafferty, 2002]
- Collapsed Gibbs Sampling [Griggiths and Steyvers, 2002]
- CollapsedVariational Inference [Teh et al., 2006]
- OnlineVariational Inference [Hoffman et al., 2010]
Probabilistic Topic Models as Graphical Models
From a collection of documents, infer:
- Per-word topic assignment Zd,n
- Per-document topic proportions θd
- Per-corpus topic distributions βk
16
Dirichlet Distribution
• Exponential family distribution over the simplex, 

i.e. positive vectors that sum to one

• The parameter 𝛼 controls the mean shape and sparsity of 𝛳
probability distribution of
documents over topics (dt)
probability distribution of
words over topics (wt)
[ dt1, dt2, dt3]
dt1+dt2+dt3 = 1
[ wt1, wt2, wt3]
wt1+wt2+wt3 = 1
Word
Latent Dirichlet Allocation (LDA) [Blei et al, 2003]
Topic 1
Topic 2
Topic 3
17
Latent Dirichlet Allocation (LDA) [Blei et al, 2003]
𝛼=100
18
Latent Dirichlet Allocation (LDA) [Blei et al, 2003]
𝛼=1
19
Latent Dirichlet Allocation (LDA) [Blei et al, 2003]
𝛼=0.1
20
Latent Dirichlet Allocation (LDA) [Blei et al, 2003]
𝛼=0.001
21
Outline
1. Artifacts
2. Parameters
3. Training
4. Evaluation and Interpretation
5. Inference
6. Trends
7. Domains
8. Topic-based Similarity
22
http://librairy.linkeddata.es/api
❖ Distributing Text Mining tasks with librAIry. 

Badenes-Olmedo, C.; Redondo-Garcia, J.; Corcho, O.

In Proceedings of the 2017 ACM Symposium on Document Engineering (DocEng '17).
ACM, 63-66. DOI: https://doi.org/10.1145/3103010.3121040
password: kcap2017user: oeg
23
• kcap:

<librairy-api>/domains/kcap

• kcap2015:

<librairy-api>/domains/kcap2015

• kcap2017: 

<librairy-api>/domains/kcap2017
READ
CORPUS
24
• Documents per Corpus:

<librairy-api>/domains/kcap/items

• Document Content:

<librairy-api>/items/3707eb49b81fb67e76bf1d40da842275?content=true

• Document (noun) Lemmas:

<librairy-api>/items/3707eb49b81fb67e76bf1d40da842275/annotations/lemma

• Document Annotations:

<librairy-api>/items/3707eb49b81fb67e76bf1d40da842275/annotations

READ
DOCUMENTS
25
• Topics per Corpus:

<librairy-api>/domains/kcap/topics?words=10

• Topic Details:

<librairy-api>/domains/kcap/topics/0

• Most Representative Documents in a Topic:

<librairy-api>/domains/kcap/topics/0/items

• Topics per Document:

<librairy-api>/domains/kcap/items/d8933e1de1888ccbdf4e0df42c404d7e/topics?
words=5

READ
TOPICS
26
let’s create a LDA model
27
• Parameters:



[GET] <librairy-api>/domains/{domainId}/
parameters



[POST]<librairy-api>/domains/{domainId}/
parameters
✴ lda.alpha = 0.1
✴ lda.beta = 0.01
✴ lda.topics = 6
✴ lda.vocabulary.size = 10000
✴ lda.max.iterations = 100
✴ lda.optimizer = manual 

(basic, nsga)
✴ lda.stopwords =example,service
CREATE
{
"name": "lda.beta",
"value": "0.01"
}
TOPICS
• (Re)Train a model: [PUT] <librairy-api>/domains/kcap/topics

28
CREATE
TOPICS
$	git	checkout	lda
$	docker	start	-i	test
• Adjust parameter.properties
• Move to the ‘lda’ stage
• Try
$	nano	parameters.properties
• Results in ‘output/models/lda' folder
29
CREATE
TOPICS
$	dataset.csv.gz							#	texts	
$	model.documents.txt		#	topics	per	document	
$	model.parameters.txt	#	model	settings	
$	model.topics.txt					#	topic	words	
$	model.uris.txt							#	document	URI	
$	model.vocabulary.txt	#	list	of	words	
$	model.words.txt						#	topics	per	word	in	document
• See ‘output/models/lda/kcap' folder:
30
Outline
1. Artifacts
2. Parameters
3. Training
4. Evaluation and Interpretation
5. Inference
6. Trends
7. Domains
8. Topic-based Similarity
Evaluation and Interpretation
31
Evaluation and Interpretation
32
• Synthetic Data

- Because topic models have a generative process, we can generate data

- Latent variables are no longer latent.

- Take it account topic identifiers

- Semi-synthetic data can be a good compromise: train from real data and generate
new documents from that model

• Baselines and Metrics

- Compare to another model that either has a similar structure or application.

- Perplexity or held-out likelihood: probability of held-out data given the settings of the
model

- Accuracy: prediction performance -> purpose oriented- Accuracy: prediction performance
How well the model fit the
data to be modelled
Evaluation and Interpretation
33
Chang, J., Gerrish, S.,Wang, C., & Blei, D. M. (2009). ReadingTea Leaves: How Humans InterpretTopic Models.Advances in Neural
Information Processing Systems 22, 288--296
• Word Intrusion: how semantically ‘cohesive’ the topics inferred by a model are and
tests whether topics correspond to natural groupings for humans (Topic Coherence)











• Topic Intrusion: how well a topic model’s decomposition of a document as a mixture
of topics agrees with human associations of topics with a document
{ dog, cat, horse, apple, pig, cow}
{ car, teacher, platypus, agile, blue, Zaire}
How interpretable is the
model by a human
34
Outline
1. Artifacts
2. Parameters
3. Training
4. Evaluation and Interpretation
5. Inference
6. Trends
7. Domains
8. Topic-based Similarity
35
• Create a Document

[POST] <librairy-api>/items/{id}

• Add it to a Domain

[POST] <librairy-api>/domains/{domainId}/items/{itemId}

• Read Topics per Document:

[GET] <librairy-api>/domains/{domainId}/items/{id}/topics

INFERENCE TOPIC DISTRIBUTIONS
CREATE
{	

		"content":	"string",	
		"language":	"string",	
		"name":	“string"	
}

36
CREATE
$	git	checkout	inference
• Write text in file.txt:
$	docker	start	-i	test
• Execute:
$	nano	file.txt
• Move to the ‘inference’ stage
INFERENCE TOPIC DISTRIBUTIONS
• Results in ‘output/inferences/lda' folder
37
Outline
1. Artifacts
2. Parameters
3. Training
4. Evaluation and Interpretation
5. Inference
6. Trends
7. Domains
8. Topic-based Similarity
Correlated Topic Model (CTM) [Blei and Lafferty, 2006]
38
M, B. D., & D, L. J. (2006). CorrelatedTopic Models.Advances in Neural Information Processing Systems 18, 147–154
Correlated Topic Model (CTM)
uses a logistic normal distribution for the topic proportions
instead of a Dirichlet distribution to allow representing
correlations between topics
Extending LDA
39
• A Dynamic Topic Model (DTM) uses a logistic normal in a linear dynamic model to capture
how topics change over time:
• Supervised LDA are topic models of documents and responses, fit to find topics predictive of
the response
40
COMPARE TOPICS
CREATE
$	git	checkout	trends
•Move to the ‘trends’ stage
$	docker	start	-i	test
• Adjust parameter.properties
• Execute
$	nano	parameters.properties
• Results in ‘output/similarities/topics' folder
41
CREATE
$	similarities.csv				#	pairwise	topic	similarities	
$	topics.{domain}.txt	#	topic	words	in	domain	
• See the ‘output/similarities/topics/ folder:
Kumar, R., & Vassilvitskii, S. (2010). Generalized distances between rankings.
Proceedings of the 19th International Conference on World Wide Web - WWW ’10,(3), 571.
http://doi.org/10.1145/1772690.1772749
COMPARE TOPICS
42
Outline
1. Artifacts
2. Parameters
3. Training
4. Evaluation and Interpretation
5. Inference
6. Trends
7. Domains
8. Topic-based Similarity
Historical Domain
43
• Finding themes in document that reflect temporal trends

• Looking not just for the topical of each time period, but how those topics
shift in concentration as they are influenced by historical events 

• Discovering how events are reflected in writing and how ideas and
emotions emerge in response to changing events.
Topic Models in Historical Documents
44
Allen Beye Riddell. How to Read 22,198 Journal Articles: Studying the History
of German Studies with Topic Models, pages 91–113. Camden House, 2012.
A LDA model trained with 150 topics on 22.198 articles from JSTOR in the 20th century, removing
numbers and articles having fewer than 200 words
Historical Domain
45
Allen Beye Riddell. How to Read 22,198 Journal Articles: Studying the History
of German Studies with Topic Models, pages 91–113. Camden House, 2012.
Historical Domain
46
• the chronological trajectory of a topic is not the same thing as the trajectory of the
individual words that compose it



• individual topics always need to be interpreted in the context of the larger model



• it becomes essential validate the description provided by a topic model by reference to
something other than the topic model itself
Conclusions
Historical Domain
Scientific Domain
47
• Specialized vocabularies define fields of study

• Scientific documents innovate: unlike other domains, scientific documents are not just
reports of news or events; they are news. Innovation is hard to detect and hard to
attribute

• Science and Policy: understanding scientific publications is important for funding
agencies, lawmakers, and the public.
Topic Models in Scientific Publications
48
Thomas L. Griffths and Mark Steyvers. Finding scientific topics. Proceedings of the National
Academy of Sciences, 101(Suppl 1):5228–5235, 2004
Reconstruct the official
Proceedings of the National
Academy of Sciences (PNAS)
topics automatically using topic
models
28.154 abstract published in
PNAS from 1991 to 2001
LDA, beta=0.1, alpha=50/K,
K=numTopics=50/100/200../1000
Scientific Domain
49
Thomas L. Griffths and Mark Steyvers. Finding scientific topics. Proceedings of the National
Academy of Sciences, 101(Suppl 1):5228–5235, 2004
Scientific Domain
50
Zhou, Ding, Xiang Ji, Hongyuan Zha and C. Lee Giles. Topic evolution and social
interactions: how authors effect research. CIKM (2006).
given a seemingly new topic, from where does this topic evolve?
a newly emergent topic is truly new or rather a variation of an old topic?
CiteSeer Dataset
hypothesis
• “one topic evolves into another topic when the
corresponding social actors interact with other
actors with different topics in the latent social
network"
Scientific Domain
51
Zhou, Ding, Xiang Ji, Hongyuan Zha and C. Lee Giles. Topic evolution and social
interactions: how authors effect research. CIKM (2006).
Topics with heavy methodology
requirements (e.g. np problem, linear
system) and/or popular topics (e.g. mobile
computing, net- work) are more likely to
remain stable. By contrast, topics closely
related to applications are more likely to
have higher transition probabilities than
other topics (e.g. data mining in database,
security) all things being equal
Scientific Domain
52
Zhou, Ding, Xiang Ji, Hongyuan Zha and C. Lee Giles. Topic evolution and social
interactions: how authors effect research. CIKM (2006).
Scientific Domain
53
• Understanding science communication allows us to see how our understanding of
nature, technology, and engineering have advanced over the years 



• Topic models can capture how these fields have changed and have gained additional
knowledge with each new discovery


• As the scientific enterprise becomes more distributed and faster moving, these tools
are important for scientists hoping to understand trends and development and for
policy makers who seek to guide innovation
Conclusions
Scientific Domain
Fiction and Literature Domain
54
• What is a document?:Treating novels as a single bag of words does not work.Topics
resulting from this corpus treatment are overly vague and lack thematic coherence 

• People and Places: Because most works of fiction are set in imaginary worlds that do
not exist outside the work itself, they have words such as character names that are
extremely frequent locally but never occur elsewhere
• Beyond the Literal: Documents that are valued not just for their information content
but for their artistic expression
• Topic models complement close reading in two ways, as a survey method and as a
means for tracing and comparing large-scale patterns
Topic Models in Literature
Fiction and Literature
55
Beyond the Literal
Lia M. Rhody. Topic modeling and figurative language. Journal of Digital Humanities, 2(1), 2012
In a corpus of 4600 poems and a sixty-topic model, one of the topics discovered: 

{night, light, moon, stars, day, dark, sun, sleep, sky, wind, time, eyes, star, darkness,
bright }
Analyzing the context of the topic (i.e. poems), they are using a metaphor relating night and sleep
to death.
Thus, the topic can be considered as the death as metaphor.
Rhody [2012] demonstrates on a corpus of poetry that although topics do not represent
symbolic meanings, they are a good way of detecting the concrete language associated with
repeated metaphors.
Fiction and Literature
56
• Topic models cannot by themselves study literature, but they are useful tools for
scholars studying literature 

• Literary concepts are complicated, but they often have surprisingly strong statistical
signatures
• Models can still be useful in identifying areas of potential interest, even if they don’t
“understand” what they are finding
Conclusions
57
Outline
1. Artifacts
2. Parameters
3. Training
4. Evaluation and Interpretation
5. Inference
6. Trends
7. Domains
8. Topic-based Similarity
58
Topic Model
Similarity Valuetopic0: 0.520,
topic1: 0.327
topic2: 0.081
…
topic122: 0.182
topic1: 0.573,
topic2: 0.172
topic3: 0.136
…
topic122: 0.099
0.595
..
Topic0 Topic1 Topic2 Topic122
DOCUMENT
‘A’
DOCUMENT
‘B’
Topic-based Similarity
Jensen-Shannon Divergence
59
Representativeness
Badenes-Olmedo, C., Redondo-Garcia, J. L., & Corcho, O. (2017). An Initial Analysis of Topic-based Similarity among Scientific Documents
based on their rhetorical discourse parts. In Proceedings of the 1st SEMSCI workshop co-located with ISWC.
Full-Paper
Summary
Internal
External
finding related items
describing main ideas
(JSD-based similarity)
(precision / recall / f-measure)
JSD-based
similarity
JSD-based
similarity
librAIry
60
librAIry
• kcap:



[GET] <librairy-api>/domains/{domainId}/items/{itemId}/relations

RELATIONS
READ
61
$	git	checkout	similarity
CREATE
RELATIONS
• Move to the ‘similarity’ stage:
$	docker	start	-i	test
• Adjust parameter.properties
• Execute
$	nano	parameters.properties
• Results in ‘output/models/similarities' folder
62
CREATE
• Open ‘similarity-network.html’ in a browser (Firefox)
63
CREATE
• Try with input.domain=ecommerce (1000 nodes)
64
References
• Jurafsky, D., & Martin, J. H. (2016). Language Modeling with N- grams. Speech and Language Processing, 1–28.

• Jordan Boyd-Graber, Yuening Hu and David Mimno (2017), "Applications of Topic Models", Foundations and
Trends® in Information Retrieval: Vol. 11: No. 2-3, pp 143-296

• Blei, David M., Lawrence Carin and David B. Dunson. “Probabilistic Topic Models.” IEEE Signal Processing
Magazine 27 (2010): 55-65.

• Zhai, ChengXiang. “Probabilistic Topic Models for Text Data Retrieval and Analysis.” SIGIR (2017).

• Wang, C., Blei, D., & Heckerman, D. (2008). Continuous Time Dynamic Topic Models. Proc of UAI, 579–586.

• Chang, J., Gerrish, S., Wang, C., & Blei, D. M. (2009). Reading Tea Leaves: How Humans Interpret Topic Models.
Advances in Neural Information Processing Systems 22, 288--296. 

• Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(4–
5), 993–1022. 

• Cheng, X., Yan, X., Lan, Y., & Guo, J. (2014). BTM: Topic Modeling over Short Texts. Knowledge and Data Engineering,
IEEE Transactions

• Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1(1), 17–
35. 

• Griffiths, T. L., Jordan, M. I., Tenenbaum, J. B., & Blei, D. M. (2004). Hierarchical Topic Models and the Nested
Chinese Restaurant Process. Advances in Neural Information Processing Systems, 17–24.

• Badenes-Olmedo, C.; Redondo-Garcia, J.; Corcho, O. (2017) Distributing Text Mining tasks with librAIry. In
Proceedings of the 2017 ACM Symposium on Document Engineering (DocEng '17). ACM, 63-66.
Probabilistic
Topic Models
Badenes-Olmedo, Carlos
Corcho, Oscar
Ontology Engineering Group (OEG)
Universidad Politécnica de Madrid (UPM)
K-CAP 2017
Knowledge Capture

December 4th-6th, 2017

Austin, Texas, United States
cbadenes@fi.upm.es
@carbadol
github.com/librairy
oeg-upm.net

More Related Content

What's hot

Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...andrea huang
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Anubhav Jain
 
LOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataLOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataFilip Ilievski
 
Visualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscapeVisualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscapeJonathan Yu
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Bhaskar Mitra
 
How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?andrea huang
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
From Open Linked Data towards an Ecosystem of Interlinked Knowledge
From Open Linked Data towards an Ecosystem of Interlinked KnowledgeFrom Open Linked Data towards an Ecosystem of Interlinked Knowledge
From Open Linked Data towards an Ecosystem of Interlinked KnowledgeSören Auer
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogsandrea huang
 
Accelerating materials design through natural language processing
Accelerating materials design through natural language processingAccelerating materials design through natural language processing
Accelerating materials design through natural language processingAnubhav Jain
 
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment Paris Sud University
 
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives TaiwanA Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwanandrea huang
 
Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017Jonathan Yu
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaAndy Petrella
 
Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...Anubhav Jain
 

What's hot (15)

Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 
LOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataLOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked Data
 
Visualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscapeVisualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscape
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
From Open Linked Data towards an Ecosystem of Interlinked Knowledge
From Open Linked Data towards an Ecosystem of Interlinked KnowledgeFrom Open Linked Data towards an Ecosystem of Interlinked Knowledge
From Open Linked Data towards an Ecosystem of Interlinked Knowledge
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs
 
Accelerating materials design through natural language processing
Accelerating materials design through natural language processingAccelerating materials design through natural language processing
Accelerating materials design through natural language processing
 
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
 
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives TaiwanA Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
 
Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
 
Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...
 

Similar to Probabilistic Topic models

Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaPaul Groth
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceGlobus
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain OntologyKeerti Bhogaraju
 
CiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big DataCiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big DataJian Wu
 
Data Science Keys to Open Up OpenNASA Datasets
Data Science Keys to Open Up OpenNASA DatasetsData Science Keys to Open Up OpenNASA Datasets
Data Science Keys to Open Up OpenNASA DatasetsPyData
 
Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017
Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017
Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017Noemi Derzsy
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxelisarosa29
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesIan Foster
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use CasesCarole Goble
 
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)rchbeir
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...Armin Haller
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Bertram Ludäscher
 
Knowledge Graph Research and Innovation Challenges
Knowledge Graph Research and Innovation ChallengesKnowledge Graph Research and Innovation Challenges
Knowledge Graph Research and Innovation ChallengesSören Auer
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...Artificial Intelligence Institute at UofSC
 
The Future of Semantics on the Web
The Future of Semantics on the WebThe Future of Semantics on the Web
The Future of Semantics on the WebJohn Domingue
 

Similar to Probabilistic Topic models (20)

Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPedia
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain Ontology
 
CiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big DataCiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big Data
 
Data Science Keys to Open Up OpenNASA Datasets
Data Science Keys to Open Up OpenNASA DatasetsData Science Keys to Open Up OpenNASA Datasets
Data Science Keys to Open Up OpenNASA Datasets
 
Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017
Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017
Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017
 
Sybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal PresentationSybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal Presentation
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptx
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
 
Knowledge Graph Research and Innovation Challenges
Knowledge Graph Research and Innovation ChallengesKnowledge Graph Research and Innovation Challenges
Knowledge Graph Research and Innovation Challenges
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
The Future of Semantics on the Web
The Future of Semantics on the WebThe Future of Semantics on the Web
The Future of Semantics on the Web
 

More from Carlos Badenes-Olmedo

Semantically-enabled Browsing of Large Multilingual Document Collections
Semantically-enabled Browsing of Large Multilingual Document CollectionsSemantically-enabled Browsing of Large Multilingual Document Collections
Semantically-enabled Browsing of Large Multilingual Document CollectionsCarlos Badenes-Olmedo
 
Scalable Cross-lingual Document Similarity through Language-specific Concept ...
Scalable Cross-lingual Document Similarity through Language-specific Concept ...Scalable Cross-lingual Document Similarity through Language-specific Concept ...
Scalable Cross-lingual Document Similarity through Language-specific Concept ...Carlos Badenes-Olmedo
 
Distributing Text Mining tasks with librAIry
Distributing Text Mining tasks with librAIryDistributing Text Mining tasks with librAIry
Distributing Text Mining tasks with librAIryCarlos Badenes-Olmedo
 
Efficient Clustering from Distributions over Topics
Efficient Clustering from Distributions over TopicsEfficient Clustering from Distributions over Topics
Efficient Clustering from Distributions over TopicsCarlos Badenes-Olmedo
 

More from Carlos Badenes-Olmedo (11)

NLP and Knowledge Graphs
NLP and Knowledge GraphsNLP and Knowledge Graphs
NLP and Knowledge Graphs
 
Semantically-enabled Browsing of Large Multilingual Document Collections
Semantically-enabled Browsing of Large Multilingual Document CollectionsSemantically-enabled Browsing of Large Multilingual Document Collections
Semantically-enabled Browsing of Large Multilingual Document Collections
 
Scalable Cross-lingual Document Similarity through Language-specific Concept ...
Scalable Cross-lingual Document Similarity through Language-specific Concept ...Scalable Cross-lingual Document Similarity through Language-specific Concept ...
Scalable Cross-lingual Document Similarity through Language-specific Concept ...
 
Crosslingual search-engine
Crosslingual search-engineCrosslingual search-engine
Crosslingual search-engine
 
Cross-lingual Similarity
Cross-lingual SimilarityCross-lingual Similarity
Cross-lingual Similarity
 
Multilingual searchapi
Multilingual searchapiMultilingual searchapi
Multilingual searchapi
 
Multilingual document analysis
Multilingual document analysisMultilingual document analysis
Multilingual document analysis
 
Topic Models Exploration
Topic Models ExplorationTopic Models Exploration
Topic Models Exploration
 
Docker Introduction
Docker IntroductionDocker Introduction
Docker Introduction
 
Distributing Text Mining tasks with librAIry
Distributing Text Mining tasks with librAIryDistributing Text Mining tasks with librAIry
Distributing Text Mining tasks with librAIry
 
Efficient Clustering from Distributions over Topics
Efficient Clustering from Distributions over TopicsEfficient Clustering from Distributions over Topics
Efficient Clustering from Distributions over Topics
 

Recently uploaded

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 

Recently uploaded (20)

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 

Probabilistic Topic models

  • 1. Badenes-Olmedo, Carlos Corcho, Oscar Ontology Engineering Group (OEG) Universidad Politécnica de Madrid (UPM) Probabilistic Topic Models K-CAP 2017 Knowledge Capture December 4th-6th, 2017 Austin, Texas, United States cbadenes@fi.upm.es @carbadol github.com/librairy oeg-upm.net
  • 2. 2 get ready to play .. $ git clone git@github.com:librairy/tutorial.git . $ docker run --name test -v "$PWD":/src librairy/compiler
  • 3. 3 Outline 1. Artifacts 2. Parameters 3. Training 4. Evaluation and Interpretation 5. Inference 6. Trends 7. Domains 8. Topic-based Similarity
  • 6. 6 ontology 0.011696399528999426 query 0.00840189384297452 datum 0.007648923027408832 example 0.007628095111872131 knowledge 0.006779803572327723 page 0.006705717576295264 event 0.0066423608486780505 case 0.006638825784874849 dataset 0.006361954166489025 concept 0.005970016686794867 .. .. class 0.0002888599713777317 6 extraction 0.0002747438325633771 6 datum 0.0002738706330554531 section 0.000268698920616526 example 0.0002680552015363467 6 event 0.0002648849674904949 dataset 0.0002602128572589013 5 case 0.0002595933993586539 result 0.0002585102532654629 entity 0.0002571243653838122 5 .. .. Topics dataset event
  • 7. 7 ontology 0.01169639952899942 6query 0.00840189384297452 datum 0.00764892302740883 2example 0.00762809511187213 1knowledge 0.00677980357232772 3page 0.00670571757629526 4event 0.00664236084867805 05case 0.00663882578487484 9dataset 0.00636195416648902 5concept 0.00597001668679486 7.. .. class 0.00028885997137773 176extraction 0.00027474383256337 716datum 0.00027387063305545 31section 0.00026869892061652 6example 0.00026805520153634 676event 0.00026488496749049 49dataset 0.00026021285725890 135case 0.00025959339935865 39result 0.00025851025326546 29entity 0.00025712436538381 225.. .. Documents dataset event
  • 8. 8 Representational Model [ 0.6, 0.3, 0.1] [ 0.6, 0.3, 0.1] [ 0.6, 0.3, 0.1] [ 0.6, 0.3, 0.1] [ 0.6, 0.3, 0.1] [ 0.6, 0.3, 0.1] [ 0.6, 0.3, 0.1] [ 0.6, 0.3, 0.1] [ 0.6, 0.3, 0.1] [ 0.6, 0.3, 0.1] [ 0.2, 0.2, 0.6] [ 0.4, 0.3, 0.3] [ T0, T1, T2] dataset event term datum concept knowledge TOPICS WORDS DOCUMENTS Topic Model
  • 9. 9 Visualization • PMLA items from JSTOR's Data for Research service • restricted to items categorized as “full-length articles” with more than 2000 words • 5605 articles out of the 9200 items from the years 1889–2007 • LDA, 64 topics
  • 10. 10 Outline 1. Artifacts 2. Parameters 3. Training 4. Evaluation and Interpretation 5. Inference 6. Trends 7. Domains 8. Topic-based Similarity
  • 11. 11 • Each topic is a distribution over words • Each document is a mixture of corpus-wide topics • Each word is drawn from one of those topics Probabilistic Topic Models
  • 12. 12 • We only observe the documents • The other structure are hidden variables. • Our goal is to infer the hidden variables by computing their distribution conditioned on the documents: p(topics, proportions, assignments | documents) Probabilistic Topic Models
  • 13. Graphical Models 13 • Encodes assumptions • Defines a factorization of the joint distribution • Connects to algorithms for computing with data Proportions parameter Per-document topic proportions Per-word topic assignment Observed word TopicsDocuments Topic parameter Words • Nodes are random variables • Edges indicate dependence • Shaded nodes are observed • Plates indicate replicated variables Plate Notation: Per-topic word proportions
  • 15. 15 Approximate posterior inference algorithms: - Mean FieldVariational methods [Blei et al., 2001,2003] - Expectation Propagation [Minka and Lafferty, 2002] - Collapsed Gibbs Sampling [Griggiths and Steyvers, 2002] - CollapsedVariational Inference [Teh et al., 2006] - OnlineVariational Inference [Hoffman et al., 2010] Probabilistic Topic Models as Graphical Models From a collection of documents, infer: - Per-word topic assignment Zd,n - Per-document topic proportions θd - Per-corpus topic distributions βk
  • 16. 16 Dirichlet Distribution • Exponential family distribution over the simplex, 
 i.e. positive vectors that sum to one
 • The parameter 𝛼 controls the mean shape and sparsity of 𝛳 probability distribution of documents over topics (dt) probability distribution of words over topics (wt) [ dt1, dt2, dt3] dt1+dt2+dt3 = 1 [ wt1, wt2, wt3] wt1+wt2+wt3 = 1 Word Latent Dirichlet Allocation (LDA) [Blei et al, 2003] Topic 1 Topic 2 Topic 3
  • 17. 17 Latent Dirichlet Allocation (LDA) [Blei et al, 2003] 𝛼=100
  • 18. 18 Latent Dirichlet Allocation (LDA) [Blei et al, 2003] 𝛼=1
  • 19. 19 Latent Dirichlet Allocation (LDA) [Blei et al, 2003] 𝛼=0.1
  • 20. 20 Latent Dirichlet Allocation (LDA) [Blei et al, 2003] 𝛼=0.001
  • 21. 21 Outline 1. Artifacts 2. Parameters 3. Training 4. Evaluation and Interpretation 5. Inference 6. Trends 7. Domains 8. Topic-based Similarity
  • 22. 22 http://librairy.linkeddata.es/api ❖ Distributing Text Mining tasks with librAIry. 
 Badenes-Olmedo, C.; Redondo-Garcia, J.; Corcho, O.
 In Proceedings of the 2017 ACM Symposium on Document Engineering (DocEng '17). ACM, 63-66. DOI: https://doi.org/10.1145/3103010.3121040 password: kcap2017user: oeg
  • 24. 24 • Documents per Corpus:
 <librairy-api>/domains/kcap/items
 • Document Content:
 <librairy-api>/items/3707eb49b81fb67e76bf1d40da842275?content=true
 • Document (noun) Lemmas:
 <librairy-api>/items/3707eb49b81fb67e76bf1d40da842275/annotations/lemma
 • Document Annotations:
 <librairy-api>/items/3707eb49b81fb67e76bf1d40da842275/annotations
 READ DOCUMENTS
  • 25. 25 • Topics per Corpus:
 <librairy-api>/domains/kcap/topics?words=10
 • Topic Details:
 <librairy-api>/domains/kcap/topics/0
 • Most Representative Documents in a Topic:
 <librairy-api>/domains/kcap/topics/0/items
 • Topics per Document:
 <librairy-api>/domains/kcap/items/d8933e1de1888ccbdf4e0df42c404d7e/topics? words=5
 READ TOPICS
  • 26. 26 let’s create a LDA model
  • 27. 27 • Parameters:
 
 [GET] <librairy-api>/domains/{domainId}/ parameters
 
 [POST]<librairy-api>/domains/{domainId}/ parameters ✴ lda.alpha = 0.1 ✴ lda.beta = 0.01 ✴ lda.topics = 6 ✴ lda.vocabulary.size = 10000 ✴ lda.max.iterations = 100 ✴ lda.optimizer = manual 
 (basic, nsga) ✴ lda.stopwords =example,service CREATE { "name": "lda.beta", "value": "0.01" } TOPICS • (Re)Train a model: [PUT] <librairy-api>/domains/kcap/topics

  • 28. 28 CREATE TOPICS $ git checkout lda $ docker start -i test • Adjust parameter.properties • Move to the ‘lda’ stage • Try $ nano parameters.properties • Results in ‘output/models/lda' folder
  • 30. 30 Outline 1. Artifacts 2. Parameters 3. Training 4. Evaluation and Interpretation 5. Inference 6. Trends 7. Domains 8. Topic-based Similarity
  • 32. Evaluation and Interpretation 32 • Synthetic Data
 - Because topic models have a generative process, we can generate data
 - Latent variables are no longer latent.
 - Take it account topic identifiers
 - Semi-synthetic data can be a good compromise: train from real data and generate new documents from that model
 • Baselines and Metrics
 - Compare to another model that either has a similar structure or application.
 - Perplexity or held-out likelihood: probability of held-out data given the settings of the model
 - Accuracy: prediction performance -> purpose oriented- Accuracy: prediction performance How well the model fit the data to be modelled
  • 33. Evaluation and Interpretation 33 Chang, J., Gerrish, S.,Wang, C., & Blei, D. M. (2009). ReadingTea Leaves: How Humans InterpretTopic Models.Advances in Neural Information Processing Systems 22, 288--296 • Word Intrusion: how semantically ‘cohesive’ the topics inferred by a model are and tests whether topics correspond to natural groupings for humans (Topic Coherence)
 
 
 
 
 
 • Topic Intrusion: how well a topic model’s decomposition of a document as a mixture of topics agrees with human associations of topics with a document { dog, cat, horse, apple, pig, cow} { car, teacher, platypus, agile, blue, Zaire} How interpretable is the model by a human
  • 34. 34 Outline 1. Artifacts 2. Parameters 3. Training 4. Evaluation and Interpretation 5. Inference 6. Trends 7. Domains 8. Topic-based Similarity
  • 35. 35 • Create a Document
 [POST] <librairy-api>/items/{id}
 • Add it to a Domain
 [POST] <librairy-api>/domains/{domainId}/items/{itemId}
 • Read Topics per Document:
 [GET] <librairy-api>/domains/{domainId}/items/{id}/topics
 INFERENCE TOPIC DISTRIBUTIONS CREATE { 
 "content": "string", "language": "string", "name": “string" }

  • 36. 36 CREATE $ git checkout inference • Write text in file.txt: $ docker start -i test • Execute: $ nano file.txt • Move to the ‘inference’ stage INFERENCE TOPIC DISTRIBUTIONS • Results in ‘output/inferences/lda' folder
  • 37. 37 Outline 1. Artifacts 2. Parameters 3. Training 4. Evaluation and Interpretation 5. Inference 6. Trends 7. Domains 8. Topic-based Similarity
  • 38. Correlated Topic Model (CTM) [Blei and Lafferty, 2006] 38 M, B. D., & D, L. J. (2006). CorrelatedTopic Models.Advances in Neural Information Processing Systems 18, 147–154 Correlated Topic Model (CTM) uses a logistic normal distribution for the topic proportions instead of a Dirichlet distribution to allow representing correlations between topics
  • 39. Extending LDA 39 • A Dynamic Topic Model (DTM) uses a logistic normal in a linear dynamic model to capture how topics change over time: • Supervised LDA are topic models of documents and responses, fit to find topics predictive of the response
  • 40. 40 COMPARE TOPICS CREATE $ git checkout trends •Move to the ‘trends’ stage $ docker start -i test • Adjust parameter.properties • Execute $ nano parameters.properties • Results in ‘output/similarities/topics' folder
  • 41. 41 CREATE $ similarities.csv # pairwise topic similarities $ topics.{domain}.txt # topic words in domain • See the ‘output/similarities/topics/ folder: Kumar, R., & Vassilvitskii, S. (2010). Generalized distances between rankings. Proceedings of the 19th International Conference on World Wide Web - WWW ’10,(3), 571. http://doi.org/10.1145/1772690.1772749 COMPARE TOPICS
  • 42. 42 Outline 1. Artifacts 2. Parameters 3. Training 4. Evaluation and Interpretation 5. Inference 6. Trends 7. Domains 8. Topic-based Similarity
  • 43. Historical Domain 43 • Finding themes in document that reflect temporal trends
 • Looking not just for the topical of each time period, but how those topics shift in concentration as they are influenced by historical events 
 • Discovering how events are reflected in writing and how ideas and emotions emerge in response to changing events. Topic Models in Historical Documents
  • 44. 44 Allen Beye Riddell. How to Read 22,198 Journal Articles: Studying the History of German Studies with Topic Models, pages 91–113. Camden House, 2012. A LDA model trained with 150 topics on 22.198 articles from JSTOR in the 20th century, removing numbers and articles having fewer than 200 words Historical Domain
  • 45. 45 Allen Beye Riddell. How to Read 22,198 Journal Articles: Studying the History of German Studies with Topic Models, pages 91–113. Camden House, 2012. Historical Domain
  • 46. 46 • the chronological trajectory of a topic is not the same thing as the trajectory of the individual words that compose it
 
 • individual topics always need to be interpreted in the context of the larger model
 
 • it becomes essential validate the description provided by a topic model by reference to something other than the topic model itself Conclusions Historical Domain
  • 47. Scientific Domain 47 • Specialized vocabularies define fields of study
 • Scientific documents innovate: unlike other domains, scientific documents are not just reports of news or events; they are news. Innovation is hard to detect and hard to attribute
 • Science and Policy: understanding scientific publications is important for funding agencies, lawmakers, and the public. Topic Models in Scientific Publications
  • 48. 48 Thomas L. Griffths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(Suppl 1):5228–5235, 2004 Reconstruct the official Proceedings of the National Academy of Sciences (PNAS) topics automatically using topic models 28.154 abstract published in PNAS from 1991 to 2001 LDA, beta=0.1, alpha=50/K, K=numTopics=50/100/200../1000 Scientific Domain
  • 49. 49 Thomas L. Griffths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(Suppl 1):5228–5235, 2004 Scientific Domain
  • 50. 50 Zhou, Ding, Xiang Ji, Hongyuan Zha and C. Lee Giles. Topic evolution and social interactions: how authors effect research. CIKM (2006). given a seemingly new topic, from where does this topic evolve? a newly emergent topic is truly new or rather a variation of an old topic? CiteSeer Dataset hypothesis • “one topic evolves into another topic when the corresponding social actors interact with other actors with different topics in the latent social network" Scientific Domain
  • 51. 51 Zhou, Ding, Xiang Ji, Hongyuan Zha and C. Lee Giles. Topic evolution and social interactions: how authors effect research. CIKM (2006). Topics with heavy methodology requirements (e.g. np problem, linear system) and/or popular topics (e.g. mobile computing, net- work) are more likely to remain stable. By contrast, topics closely related to applications are more likely to have higher transition probabilities than other topics (e.g. data mining in database, security) all things being equal Scientific Domain
  • 52. 52 Zhou, Ding, Xiang Ji, Hongyuan Zha and C. Lee Giles. Topic evolution and social interactions: how authors effect research. CIKM (2006). Scientific Domain
  • 53. 53 • Understanding science communication allows us to see how our understanding of nature, technology, and engineering have advanced over the years 
 
 • Topic models can capture how these fields have changed and have gained additional knowledge with each new discovery 
 • As the scientific enterprise becomes more distributed and faster moving, these tools are important for scientists hoping to understand trends and development and for policy makers who seek to guide innovation Conclusions Scientific Domain
  • 54. Fiction and Literature Domain 54 • What is a document?:Treating novels as a single bag of words does not work.Topics resulting from this corpus treatment are overly vague and lack thematic coherence 
 • People and Places: Because most works of fiction are set in imaginary worlds that do not exist outside the work itself, they have words such as character names that are extremely frequent locally but never occur elsewhere • Beyond the Literal: Documents that are valued not just for their information content but for their artistic expression • Topic models complement close reading in two ways, as a survey method and as a means for tracing and comparing large-scale patterns Topic Models in Literature
  • 55. Fiction and Literature 55 Beyond the Literal Lia M. Rhody. Topic modeling and figurative language. Journal of Digital Humanities, 2(1), 2012 In a corpus of 4600 poems and a sixty-topic model, one of the topics discovered: 
 {night, light, moon, stars, day, dark, sun, sleep, sky, wind, time, eyes, star, darkness, bright } Analyzing the context of the topic (i.e. poems), they are using a metaphor relating night and sleep to death. Thus, the topic can be considered as the death as metaphor. Rhody [2012] demonstrates on a corpus of poetry that although topics do not represent symbolic meanings, they are a good way of detecting the concrete language associated with repeated metaphors.
  • 56. Fiction and Literature 56 • Topic models cannot by themselves study literature, but they are useful tools for scholars studying literature 
 • Literary concepts are complicated, but they often have surprisingly strong statistical signatures • Models can still be useful in identifying areas of potential interest, even if they don’t “understand” what they are finding Conclusions
  • 57. 57 Outline 1. Artifacts 2. Parameters 3. Training 4. Evaluation and Interpretation 5. Inference 6. Trends 7. Domains 8. Topic-based Similarity
  • 58. 58 Topic Model Similarity Valuetopic0: 0.520, topic1: 0.327 topic2: 0.081 … topic122: 0.182 topic1: 0.573, topic2: 0.172 topic3: 0.136 … topic122: 0.099 0.595 .. Topic0 Topic1 Topic2 Topic122 DOCUMENT ‘A’ DOCUMENT ‘B’ Topic-based Similarity Jensen-Shannon Divergence
  • 59. 59 Representativeness Badenes-Olmedo, C., Redondo-Garcia, J. L., & Corcho, O. (2017). An Initial Analysis of Topic-based Similarity among Scientific Documents based on their rhetorical discourse parts. In Proceedings of the 1st SEMSCI workshop co-located with ISWC. Full-Paper Summary Internal External finding related items describing main ideas (JSD-based similarity) (precision / recall / f-measure) JSD-based similarity JSD-based similarity librAIry
  • 61. 61 $ git checkout similarity CREATE RELATIONS • Move to the ‘similarity’ stage: $ docker start -i test • Adjust parameter.properties • Execute $ nano parameters.properties • Results in ‘output/models/similarities' folder
  • 63. 63 CREATE • Try with input.domain=ecommerce (1000 nodes)
  • 64. 64 References • Jurafsky, D., & Martin, J. H. (2016). Language Modeling with N- grams. Speech and Language Processing, 1–28.
 • Jordan Boyd-Graber, Yuening Hu and David Mimno (2017), "Applications of Topic Models", Foundations and Trends® in Information Retrieval: Vol. 11: No. 2-3, pp 143-296
 • Blei, David M., Lawrence Carin and David B. Dunson. “Probabilistic Topic Models.” IEEE Signal Processing Magazine 27 (2010): 55-65.
 • Zhai, ChengXiang. “Probabilistic Topic Models for Text Data Retrieval and Analysis.” SIGIR (2017).
 • Wang, C., Blei, D., & Heckerman, D. (2008). Continuous Time Dynamic Topic Models. Proc of UAI, 579–586.
 • Chang, J., Gerrish, S., Wang, C., & Blei, D. M. (2009). Reading Tea Leaves: How Humans Interpret Topic Models. Advances in Neural Information Processing Systems 22, 288--296. 
 • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(4– 5), 993–1022. 
 • Cheng, X., Yan, X., Lan, Y., & Guo, J. (2014). BTM: Topic Modeling over Short Texts. Knowledge and Data Engineering, IEEE Transactions
 • Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1(1), 17– 35. 
 • Griffiths, T. L., Jordan, M. I., Tenenbaum, J. B., & Blei, D. M. (2004). Hierarchical Topic Models and the Nested Chinese Restaurant Process. Advances in Neural Information Processing Systems, 17–24.
 • Badenes-Olmedo, C.; Redondo-Garcia, J.; Corcho, O. (2017) Distributing Text Mining tasks with librAIry. In Proceedings of the 2017 ACM Symposium on Document Engineering (DocEng '17). ACM, 63-66.
  • 65. Probabilistic Topic Models Badenes-Olmedo, Carlos Corcho, Oscar Ontology Engineering Group (OEG) Universidad Politécnica de Madrid (UPM) K-CAP 2017 Knowledge Capture December 4th-6th, 2017 Austin, Texas, United States cbadenes@fi.upm.es @carbadol github.com/librairy oeg-upm.net