Social Web: Where are the Semantics?
ESWC 2014
Miriam Fernández, Victor Rodríguez,
Andrés García-Silva, Oscar Corcho
Ontology Engineering Group, UPM, Spain
Knowledge Media Institute, The Open University
Outline
•  Part 1: Understanding Social Media
–  Theory: background & applications described in this tutorial
–  Hands on: data extraction from Twitter and Facebook
•  Part 2: Using semantics to represent data from SNS
–  Theory: Using SW to represent content, users and relations
–  Hands on: applying and extending SIOC
•  Part 3: Using semantics to understand social media conversations
–  Theory: Using semantics to understand topics in social media
–  Hands on: using LDA to extract topics from social media
•  Part 4: Using semantics to understand user behaviour
Why do we need semantics to understand social media?
•  Information overload
–  We need mechanisms to support
•  Better information search/recommendation
•  Better information integration
•  Automatic knowledge extraction
•  User generated content is generally unstructured
–  Machines cannot understand this content!
"The Semantic Web is an extension of the current Web in which
information is given well-defined meaning, better enabling computers
and people to work in cooperation."
Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web,
Scientific American, May 2001
Implicit vs. Explicit Semantics
•  Implicit Semantics
–  Implicit semantics, also called statistical semantics, focuses on
extracting word sense by studying the patterns of human word usage in
massive collections of text or other human-generated data.
–  It does not rely on an explicit formalisation/conceptualisation of
knowledge
•  Explicit Semantics
–  Explicit semantics focuses on analysing content with the support of
explicit conceptualisations in the form of ontologies and knowledge
bases
Implicit semantics: Topic models
•  Topic models: one possible way of extracting implicit semantics
bags of words
Word        Count
ESWC        3
rank        1
technology  2
conference  1
venue       1
semantic    5
Web         7
knowledge   5
...

Word        Count
ISWC        3
rank        0
conference  0
venue       1
semantic    5
Web         5
knowledge   0
...
term-document matrix
•  Term-document matrix
–  A very large, sparse matrix
–  A document can be seen as a vector
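As a minimal sketch of how bags of words turn into a term-document matrix, the Python below counts words per document and stacks the counts with one row per term and one column per document. The toy corpus and the function names `bag_of_words` and `term_document_matrix` are assumptions made for illustration; they are not taken from the slides.

```python
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, split on whitespace and count each word."""
    return Counter(text.lower().split())


def term_document_matrix(docs):
    """Return (vocabulary, matrix): one row per term, one column per document."""
    bags = [bag_of_words(d) for d in docs]
    vocab = sorted(set().union(*bags))
    matrix = [[bag[term] for bag in bags] for term in vocab]
    return vocab, matrix


# Hypothetical toy corpus, chosen only for illustration.
docs = [
    "semantic web conference semantic knowledge",
    "web conference venue web",
]
vocab, matrix = term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(f"{term:<12}{row}")
```

Each column of the matrix is the vector of one document. Most entries are zero once the vocabulary grows, which is why the matrix is described as sparse.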
similarity measures
•  Useful to answer… how similar are two documents?
–  Distance measures between two documents
•  Cosine similarity: $\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = 0.72$
•  Jaccard index: $J(A, B) = \frac{|A \cap B|}{|A \cup B|} = 0.50$
•  However:
–  Synonyms appear far apart although they are close in meaning
–  Polysemous words appear close although their senses differ
•  What is a document talking about?
–  Explicit Semantic Analysis (ESA)
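The two similarity measures can be sketched in a few lines of Python. The word counts below are hypothetical and are not the bags of words from the slides, so the printed values are not expected to match the figures quoted above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors given as dicts."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def jaccard_index(a, b):
    """Size of the intersection over size of the union of the two word sets."""
    set_a = {w for w, c in a.items() if c > 0}
    set_b = {w for w, c in b.items() if c > 0}
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical counts, chosen only for illustration.
doc_a = {"semantic": 2, "web": 1, "conference": 1}
doc_b = {"semantic": 1, "web": 1, "venue": 1}
print(round(cosine_similarity(doc_a, doc_b), 3))
print(jaccard_index(doc_a, doc_b))
```

Cosine similarity uses the counts, whereas the Jaccard index only looks at which words are present. Neither knows that two different words can mean the same thing, which is the limitation the slide points out.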
Information retrieval, text mining
•  Classification. Documents may belong to different classes
•  How relevant is a word for a document (or class of documents)?
$\text{TF-IDF}(x, y) = tf_{x,y} \times \log\left(\frac{D}{D_x}\right)$
where $tf_{x,y}$ is the frequency of term x in document y, $D$ is the total number of documents, and $D_x$ is the number of documents containing x.
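A small Python sketch of the TF-IDF weight follows. It assumes the natural logarithm (the slide does not state the base) and uses a hypothetical three-document corpus; the function name `tf_idf` is also an assumption.

```python
import math
from collections import Counter


def tf_idf(term, doc_index, docs):
    """TF-IDF of `term` in docs[doc_index]; docs are lists of tokens."""
    tf = Counter(docs[doc_index])[term]          # tf_{x,y}: frequency of x in y
    d_x = sum(1 for doc in docs if term in doc)  # D_x: documents containing x
    if d_x == 0:
        return 0.0                               # term never occurs in the corpus
    return tf * math.log(len(docs) / d_x)        # D = len(docs)


# Hypothetical corpus, chosen only for illustration.
docs = [
    ["semantic", "web", "semantic"],
    ["web", "conference"],
    ["web", "venue"],
]
print(tf_idf("semantic", 0, docs))  # frequent here and rare elsewhere: high weight
print(tf_idf("web", 0, docs))       # present in every document: weight 0.0
```

A word that appears in every document gets a weight of zero because log(D / D_x) = log(1) = 0, so it does not help to tell documents apart.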
latent semantic analysis
Figure taken from http://faculty.washington.edu/jwilker/559/SteyversGriffiths.pdf
Singular Value Decomposition to reduce dimensionality
•  The term-document matrix is large
•  Latent Semantic Analysis
•  Rank of D can be reduced
•  Meaning
–  U=term-topic correlation
–  D=topic importance
–  V=document-topic correlation
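To illustrate the rank reduction, the sketch below extracts only the largest singular value and its two singular vectors with power iteration, in pure Python. This is a simplification chosen to avoid external libraries: a real LSA pipeline would compute a full truncated SVD with a linear algebra package. The matrix values and helper names are hypothetical.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def leading_singular_triplet(a, iterations=100):
    """Power iteration on A^T A: returns (u, sigma, v) for the largest singular value."""
    at = transpose(a)
    v = normalise([1.0] * len(a[0]))
    for _ in range(iterations):
        v = normalise(matvec(at, matvec(a, v)))
    av = matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return u, sigma, v


# Hypothetical term-document counts: rows are terms, columns are documents.
a = [[2.0, 0.0, 1.0],
     [1.0, 0.0, 1.0],
     [0.0, 3.0, 0.0]]
u, sigma, v = leading_singular_triplet(a)
# Rank-1 approximation: keep only the most important "topic".
rank1 = [[sigma * ui * vj for vj in v] for ui in u]
print(round(sigma, 4))
```

In the notation of the slide, `u` plays the role of one column of U (term-topic), `sigma` of one diagonal entry of D (topic importance) and `v` of one column of V (document-topic).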
Semantics of a social media message
Topics
discriminative models / generative models
•  Discriminative models (1 step)
1.  Directly infer the posterior probabilities p(C_k|x)
•  Generative models (2 steps)
1.  Infer the class-conditional densities p(x|C_k) and the priors p(C_k)
2.  Use Bayes' theorem to determine the posterior probabilities:
$p(C_1 \mid x) = \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_1)\, p(C_1) + p(x \mid C_2)\, p(C_2)}$
With a generative model we can also generate samples x that are likely to have been produced by class C_1
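The two-step generative recipe can be sketched for two classes as follows. The Gaussian class-conditional densities and all parameter values are assumptions chosen for illustration; the slides do not commit to a particular density.

```python
import math


def gaussian_pdf(x, mean, std):
    """Normal density, used here as the class-conditional p(x|C_k)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_c1(x, prior_c1, params_c1, params_c2):
    """p(C1|x) via Bayes' theorem for a two-class problem."""
    joint_c1 = gaussian_pdf(x, *params_c1) * prior_c1
    joint_c2 = gaussian_pdf(x, *params_c2) * (1.0 - prior_c1)
    return joint_c1 / (joint_c1 + joint_c2)


# Hypothetical classes: C1 ~ N(0, 1) and C2 ~ N(2, 1), with equal priors.
c1, c2 = (0.0, 1.0), (2.0, 1.0)
print(posterior_c1(1.0, 0.5, c1, c2))  # halfway between the means: both classes equally likely
print(posterior_c1(0.0, 0.5, c1, c2))  # at the mean of C1: C1 is more likely
```

Because the class-conditional densities are modelled explicitly, the same model could also be used to draw new samples x for a given class, which a purely discriminative model cannot do.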
Generative model
LDA: a probabilistic generative model
This is a probabilistic generative process: we can generate documents
according to certain topics.
Topic models
Topics known a priori vs. latent topics
•  With latent topics:
–  We don’t know the topics in advance
–  We don’t know the importance of each word in a topic
•  Latent topics are not pre-specified but found from the corpus
topic models vs LSA
Figure taken from http://faculty.washington.edu/jwilker/559/SteyversGriffiths.pdf
Singular Value Decomposition reduces dimensionality
•  Latent Semantic Analysis vs Topic Model
topic models
•  How important is a word in a topic
•  How important is a topic in a document
topic models: LDA
•  Latent Dirichlet Allocation (LDA)
D documents, using a total of W words
K topics
LDA: each document d is a mixture over the K topics, with each topic z
being a multinomial distribution over a vocabulary of W words
θ_d : topic distribution for a document d (~Dirichlet(α))
ϕ_z : word probabilities for a topic z (~Dirichlet(β))
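The generative story behind LDA can be sketched by sampling a tiny synthetic corpus: draw a word distribution for every topic, draw a topic distribution for every document, then pick a topic and a word for each position. The vocabulary, the hyperparameter values and the function names below are assumptions for illustration; Dirichlet samples are obtained by normalising Gamma draws.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Symmetric Dirichlet(alpha) sample via normalised Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probabilities, rng):
    """Draw an index from a discrete distribution."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


def generate_corpus(num_docs, doc_length, vocab, num_topics, alpha, beta, seed=0):
    rng = random.Random(seed)
    # phi_z: word distribution of each topic z, drawn from Dirichlet(beta)
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # theta_d: topic distribution of document d, drawn from Dirichlet(alpha)
        theta = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(doc_length):
            z = sample_index(theta, rng)                    # topic for this position
            words.append(vocab[sample_index(phi[z], rng)])  # word from that topic
        corpus.append(words)
    return corpus


# Hypothetical vocabulary and hyperparameters.
vocab = ["semantic", "web", "ontology", "football", "goal", "league"]
corpus = generate_corpus(num_docs=3, doc_length=8, vocab=vocab,
                         num_topics=2, alpha=0.5, beta=0.1)
for doc in corpus:
    print(" ".join(doc))
```

Small values of α and β make each document concentrate on few topics and each topic on few words. Topic modelling runs this story in reverse: given only the words, infer θ and ϕ.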
Probability of a word: $p(w \mid d) = \sum_{z=1}^{K} \phi_z(w)\, \theta_d(z)$
Reminder of the Gamma function: $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$, with $\Gamma(n) = (n-1)!$ for a positive integer n
Topic models
•  Joint probability distribution, for a document, of all the random
variables, assuming we know α and β. Its terms are:
–  A product over each word in the document
–  The probability of a word in a document, knowing the distribution of words in a topic
–  The probability of a topic in a document, knowing the topic distribution for a document
–  The Dirichlet distributions declared before
•  Given a set of documents D and the LDA model, we can use inference
to find θ, ϕ and the topic assignment for each word
•  Exact inference is intractable, but the problem can be solved numerically
with Gibbs sampling, a Markov chain Monte Carlo (MCMC) method
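A compact sketch of inference with collapsed Gibbs sampling is given below, in the spirit of Griffiths and Steyvers (2004). It is a teaching sketch under stated assumptions, not the implementation used in the tutorial or in any toolkit: documents are lists of integer word ids, the hyperparameters are symmetric, and the function name `lda_gibbs` and the toy corpus are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Collapsed Gibbs sampler for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # words of d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is assigned to k
    topic_total = [0] * num_topics                              # words assigned to k overall
    assignments = []
    for d, doc in enumerate(docs):                              # random initialisation
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights, k=1)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Point estimates of theta (per document) and phi (per topic) from the final counts.
    theta = [[(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
             for counts, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[t] + vocab_size * beta) for c in topic_word[t]]
           for t in range(num_topics)]
    return theta, phi, assignments


# Hypothetical corpus: word ids 0-2 and 3-5 never co-occur in a document.
docs = [[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5], [0, 1, 0, 2, 1, 0], [4, 5, 3, 5, 4, 3]]
theta, phi, assignments = lda_gibbs(docs, num_topics=2, vocab_size=6)
for row in theta:
    print([round(p, 2) for p in row])
```

Each sweep resamples the topic of every word from its conditional distribution, so only count tables are needed; θ and ϕ are integrated out during sampling and estimated from the counts at the end.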
Topic models: summary
•  Latent Dirichlet Allocation is a generative probabilistic model which
can discover latent topics in unlabelled data
•  Labelled LDA: a supervised version
•  Implementations:
–  MALLET
–  Stanford Topic Modeling Toolbox (Stanford)
•  Applications:
–  Similarity between two documents
–  Classification of texts
–  Indexing of documents
Some references
•  D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal
of Machine Learning Research, 3:993-1022, January 2003.
•  T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings
of the National Academy of Sciences, 101(suppl. 1):5228-5235, 2004.
•  A. García-Silva, V. Rodríguez-Doncel, and O. Corcho. Semantic
Characterization of Tweets Using Topic Models: A Use Case in the
Entertainment Domain. Int. Journal on Semantic Web and Information
Systems (IJSWIS), 9(3):1-13, 2013.
Social Web: Where are the Semantics?
ESWC 2014
Víctor, Andrés, Oscar, Miriam
Universidad Politécnica de Madrid, Spain
Knowledge Media Institute, The Open University
mixture model
We observe data generated by a mixture of distributions and we want to learn
the parameters of these distributions. A mixture model is a probabilistic model
that represents the data density as a linear combination of several probability
density functions (PDFs).
We don't even see the colours!
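One common way to learn the parameters of a mixture when the component labels are hidden is expectation-maximisation (EM). The slides do not name an algorithm, so EM is an assumption here, as are the two-Gaussian setup, the synthetic data and the function names.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, iterations=50):
    """EM for a 1-D mixture of two Gaussians; returns (weights, means, stds)."""
    weights = [0.5, 0.5]
    means = [min(data), max(data)]  # crude initialisation at the extremes
    stds = [1.0, 1.0]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters from the weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # guard against a collapsing component
    return weights, means, stds


# Synthetic data from two hypothetical components; the labels are discarded.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(6.0, 1.0) for _ in range(200)])
weights, means, stds = em_two_gaussians(data)
print([round(m, 2) for m in means])
```

The responsibilities computed in the E-step are soft guesses of the missing "colours": for every point, how likely it is to come from each component.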
Rule based
Classes (each with a positive and a negative pole):
•  SD: Satisfaction (+) / Dissatisfaction (-)
•  LH: Love (+) / Hate (-)
•  TF: Trust (+) / Fear (-)
•  HS: Happiness (+) / Sadness (-)
Rule grammar
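import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object RuleGrammar extends StandardTokenParsers {
  // Tokens the lexer treats as delimiters. "<ENTITY>" is added here on the
  // assumption that it must be tokenised as a keyword for `entity` to match.
  lexical.delimiters ++= List("[", "]", "#", "->", "+", "-", "*", "/", "=", ".", "<ENTITY>")
  lexical.reserved += "_"

  // A rule set is any number of replacement or classification rules.
  def rule_set = (replace_rule | classify_rule).*
  def replace_rule = stringLit.* ~ "=" ~ stringLit
  def classify_rule = morpho_sequence ~ "->" ~ action

  // Left-hand side: a sequence of word, lemma, part-of-speech and wildcard patterns.
  def morpho_sequence = (word | lemma_pos | lemma | pos | entity | wildcard |
    limited_wildcard).*
  def word = stringLit
  def lemma = ident
  def pos = "[" ~> ident <~ "]"
  def lemma_pos = ident ~ "#" ~ ident
  def entity = "<ENTITY>"
  def wildcard = "*"
  def limited_wildcard = "/" ~> numericLit <~ "/"

  // Right-hand side: update the score of a class, or mark a chunk.
  def action = classify_action | chunk_action
  def classify_action = ident ~ ("+" | "-" | "*" | "/") ~ number
  def number = numericLit | negative
  def negative = "-" ~ numericLit
  def chunk_action = ident <~ "."
}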
Examples (Spanish):
(cómo/cada día) odio (más) a (el/la/esta/…) ent  [English: (how/every day) I hate (more) (the/this/…) ent]:
/2/ ODIAR#V /1/ A#SP /1/ <ENTITY> -> LH -2
mi/este odio a/por ent  [English: my/this hatred of/for ent]:
D ODIO#NC [SP] <ENTITY> -> LH -1

ESWC 2014 Tutorial part 3

  • 1. Social Web: Where are the Semantics? ESWC 2014 Miriam Fernández, Victor Rodríguez, Andrés García-Silva, Oscar Corcho Ontology Engineering Group, UPM, Spain Knowledge Media Institute, The Open University
  • 2. Outline 2 •  Part 1: Understanding Social Media –  Theory: background & applications described in this tutorial –  Hands on: data extraction from Twitter and Facebook •  Part 2: Using semantics to represent data from SNS –  Theory: Using SW to represent content, users and relations –  Hands on: applying and extending SIOC •  Part 3: Using semantics to understand social media conversations –  Theory: Using semantics to understand topics in social media –  Hands on: using LDA to extract topics from social media •  Part 4: Using semantics to understand user behaviour
  • 3. Why we need semantics to understand social media? •  Information overwhelming –  We need mechanisms to support •  Better information search/recommendation •  Better information integration •  Automatic knowledge extraction •  User generated content is generally unstructured –  Machines can not understand this content! ESWC 2014 Social Web: Where are the Semantics? 3 "The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
  • 4. Implicit vs. Explicit Semantics •  Implicit Semantics –  Implicit, also called statistical semantics focuses on extracting word sense by studying the patterns of human word usage in massive collections of text or other human generated data. –  It does not rely on an explicit formalisation/conceptualisation of knowledge •  Explicit Semantics –  Explicit semantics, focus on the analysis of content by using the support of explicit conceptualisations in the form of ontologies and knowledge bases ESWC 2014 Social Web: Where are the Semantics? 4
  • 5. Implicit semantics: Topic models •  Topic models: one possible way of extracting implicit semantics ESWC 2014 Social Web: Where are the Semantics? 5
  • 6. bags of words ESWC 2014 Social Web: Where are the Semantics? 6 Word count ESWC 3 rank 1 technology 2 conference 1 venue 1 semantic 5 Web 7 knowledge 5 ... Word count ISWC 3 rank 0 venue 1 semantic 5 conference 0 venue 1 semantic 5 Web 5 knowledge 0 ...
  • 7. term-document matrix ESWC 2014 Social Web: Where are the Semantics? 7 •  Term-document matrix –  A very large, sparse matrix –  A document can be seen as a vector
  • 8. similarity measures •  Useful to answer… how similar are two documents? –  Distance measures between two documents •  Cosine similarity = ​ 𝐴•𝐵∕‖𝐴‖‖𝐵‖ =0,72 •  Jackard index = ​| 𝐴∩ 𝐵|∕|𝐴∪ 𝐵| =0,50 •  However: –  Synonyms will appear far apart while they aren’t –  Polysemic words will appear close while they aren’t •  What is a document talking about? –  «Explicit semantic analysis»(ESA) ESWC 2014 Social Web: Where are the Semantics? 8
  • 9. info retrieval, text mining •  Classification. Documents may belong to different classes •  How relevant is a word for a document (or class of documents)? TF−IDF(𝑥, 𝑦)=​ 𝑡 𝑓↓𝑥, 𝑦 ×log​(​ 𝐷⁄​ 𝑑↓𝑥  ) ESWC 2014 Social Web: Where are the Semantics? 9 tfx,y=freq. of x in y D=number of documents Dx=number of documents containing x
  • 10. latent semantic analysis ESWC 2014 Social Web: Where are the Semantics? 10 Figure taken from http://faculty.washington.edu/jwilker/559/ SteyversGriffiths.pdf Singular Value Decomposition to reduce dimensionality •  The term-document matrix is large •  Latent Semantic Analysis •  Rank of D can be reduced •  Meaning –  U=term-topic correlation –  D=topic importance –  V=document-topic correlation
  • 11. Semantics of a social media message ESWC 2014 Social Web: Where are the Semantics? 11
  • 12. Topics ESWC 2014 Social Web: Where are the Semantics? 12
  • 13. discriminative models / generative models •  Discriminative Models (1-step) 1.  Directly infer posterior probabilities p(Ck|x) •  Generative Models (2-steps) 1.  Infer class-conditional densities p(x|Ck) and priors p(Ck) 2.  Use Bayes theorem to determine posterior probabilities 𝑝​​ 𝐶↓1 ⁠ 𝑥 =​ 𝑝​ 𝑥⁠​ 𝐶↓1  𝑝(​ 𝐶↓1 )/𝑝​ 𝑥⁠​ 𝐶↓1  𝑝(​ 𝐶↓1 )+ 𝑝​ 𝑥⁠​ 𝐶↓2  𝑝(​ 𝐶↓2 )  ESWC 2014 Social Web: Where are the Semantics? 13 We can generate x that are likely to have been produced by class C1
  • 14. Generative model
  • 15. LDA: a probabilistic generative model
      This is a probabilistic generative process: we can generate documents according to certain topics.
  • 16. Topic models
  • 17. Topic models
  • 18. Topic models
      •  Topics known a priori
      •  Latent topics
         –  We don't know the topics in advance
         –  We don't know the importance of each word in a topic
      Latent topics are not pre-specified but found from the corpus
  • 19. Topic models vs LSA
      •  Latent Semantic Analysis vs topic model
      •  Singular Value Decomposition reduces dimensionality
      Figure taken from http://faculty.washington.edu/jwilker/559/SteyversGriffiths.pdf
  • 20. Topic models
      –  How important is a word in a topic
      –  How important is a topic in a document
  • 21. Topic models: LDA
      •  Latent Dirichlet Allocation (LDA)
         –  D documents, using a total of W words
         –  K topics
         –  Each document d is a mixture over the K topics, with each topic z being a multinomial distribution over a vocabulary of W words
         –  $\theta_d$: topic distribution for a document d ($\theta_d \sim \text{Dirichlet}(\alpha)$)
         –  $\phi_z$: word probabilities for a topic z ($\phi_z \sim \text{Dirichlet}(\beta)$)
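The generative reading of LDA can be made concrete with a short sketch that produces synthetic documents. Everything specific here is an assumption for illustration: the toy vocabulary, the hyperparameter values, the fixed document length and the helper names. Dirichlet draws are obtained by normalising Gamma draws, a standard construction.

```python
import random


def sample_dirichlet(alpha: float, size: int, rng: random.Random) -> list[float]:
    """Draw from a symmetric Dirichlet by normalising Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, n_docs, doc_len, alpha, beta, seed=0):
    """Generate documents following the LDA generative process."""
    rng = random.Random(seed)
    # phi[z]: word probabilities for topic z, phi_z ~ Dirichlet(beta)
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(n_topics)]
    docs, thetas = [], []
    for _doc in range(n_docs):
        # theta_d: topic distribution for this document, theta_d ~ Dirichlet(alpha)
        theta = sample_dirichlet(alpha, n_topics, rng)
        words = []
        for _pos in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]  # pick a topic
            w = rng.choices(vocab, weights=phi[z])[0]           # pick a word from it
            words.append(w)
        docs.append(words)
        thetas.append(theta)
    return docs, thetas, phi


# Toy vocabulary and settings (invented for this example).
vocab = ["goal", "match", "team", "vote", "party", "election"]
docs, thetas, phi = generate_corpus(vocab, n_topics=2, n_docs=3, doc_len=8,
                                    alpha=0.5, beta=0.5)
for words in docs:
    print(" ".join(words))
```

Topic modelling runs this process in reverse: given only the documents, it estimates the θ and ϕ that most plausibly generated them.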
  • 22. Topic models
      •  Latent Dirichlet Allocation (LDA)
         –  Each document d is a mixture over the K topics, with each topic z being a multinomial distribution over a vocabulary of W words
         –  Probability of a word
         –  $\theta_d$: topic distribution for a document d ($\theta_d \sim \text{Dirichlet}(\alpha)$)
         –  $\phi_z$: word probabilities for a topic z ($\phi_z \sim \text{Dirichlet}(\beta)$)
         –  Reminder of the Gamma function
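The formulas on this slide did not survive as text. A plausible reconstruction, assuming the standard LDA formulation of Blei et al. (2003) and the symbols defined above, is given here. The probability of a word w in a document d marginalises over the K topics, and the Gamma function appears in the normalising constant of the Dirichlet density.

```latex
p(w \mid d) = \sum_{z=1}^{K} p(w \mid z)\, p(z \mid d)
            = \sum_{z=1}^{K} \phi_z(w)\, \theta_d(z)

\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt, \qquad
\Gamma(n) = (n-1)! \ \text{ for integer } n \ge 1

\text{Dirichlet}(\theta \mid \alpha)
  = \frac{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)}
    \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}
```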
  • 23. Topic models
      •  Joint probability distribution of all the random variables for a document, assuming we know α and β. Its factors are:
         –  The Dirichlet distributions declared before
         –  For each word in the document:
            •  Probability of a topic in a document, knowing the topic distribution for the document
            •  Probability of a word in a document, knowing the distribution of words in a topic
      •  Given a set of documents D and the LDA model, we can use inference to find θ, ϕ and the topic assignment for each word
      •  Exact inference is intractable, but the problem can be solved numerically with Gibbs sampling (a Markov chain Monte Carlo method)
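The equation itself is missing from the transcript. Based on the factors described on the slide and the standard LDA formulation, it presumably has the following form, where $N_d$ (the number of words in document d) and the index n are symbols assumed here:

```latex
p(\mathbf{w}, \mathbf{z}, \theta_d, \phi \mid \alpha, \beta)
  = \underbrace{\prod_{k=1}^{K} p(\phi_k \mid \beta)\; p(\theta_d \mid \alpha)}_{\text{Dirichlet priors}}
    \;\prod_{n=1}^{N_d}
    \underbrace{p(z_n \mid \theta_d)}_{\text{topic in document}}\,
    \underbrace{p(w_n \mid \phi_{z_n})}_{\text{word in topic}}
```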
  • 24. Topic models: summary
      •  Latent Dirichlet Allocation is a generative probabilistic model which can discover latent topics in unlabelled data
      •  Labelled LDA: a supervised version
      •  Implementations:
         –  Mallet
         –  Stanford Topic Modeling Toolbox
      •  Applications:
         –  Similarity between two documents
         –  Classification of texts
         –  Indexing of documents
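As a rough illustration of how latent topics can be discovered in unlabelled data, the sketch below implements a small collapsed Gibbs sampler in the style of Griffiths and Steyvers (2004). It is not the implementation used by Mallet or the Stanford toolbox, which are far more optimised. The toy corpus, the hyperparameter defaults and the function name `lda_gibbs` are assumptions made for this example.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns vocabulary and count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # n(d, z)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # n(z, w)
    topic_total = [0] * n_topics                            # n(z)
    assignments = []

    # Random initial topic for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = word_id[w], assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # p(z = k | all other assignments), up to a constant factor.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return vocab, doc_topic, topic_word


# Toy corpus (invented for this example).
docs = [
    "goal match team goal".split(),
    "team match goal team".split(),
    "vote party election vote".split(),
    "election party vote party".split(),
]
vocab, doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(zip(counts, vocab), reverse=True)[:3]
    print(k, [w for _, w in top])
```

The count tables play the role of unnormalised estimates of θ (per document) and ϕ (per topic); normalising them with the α and β smoothing terms gives the usual point estimates.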
  • 25. Some references
      •  D. Blei, A. Ng, M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.
      •  T. Griffiths, M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl. 1):5228-5235, 2004.
      •  A. García-Silva, V. Rodríguez-Doncel, O. Corcho. Semantic characterization of tweets using topic models: a use case in the entertainment domain. Int. Journal on Semantic Web and Information Systems (IJSWIS), 9(3):1-13, 2013.
  • 26. Social Web: Where are the Semantics?
      ESWC 2014
      Victor, Andres, Oscar, Miriam
      Universidad Politécnica de Madrid, Spain
      Knowledge Media Institute, The Open University
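  • 27. Mixture model
      •  We observe some data generated by a mixture of distributions, and we want to learn the parameters of these distributions.
      •  A mixture model is a probabilistic model representing a linear combination of several probability density functions (PDFs), with non-negative weights that sum to 1.
      •  We don't even see the colours! (We do not observe which distribution generated each data point.)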
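A small sketch can make the hidden "colours" concrete. The two Gaussian components, their weights and parameters, and the function names are hypothetical choices for this example; the slide does not specify any particular mixture.

```python
import math
import random

# Hypothetical two-component Gaussian mixture: (weight, mean, standard deviation).
COMPONENTS = [(0.3, -2.0, 0.5), (0.7, 3.0, 1.0)]


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def mixture_pdf(x, components=COMPONENTS):
    """Weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, mean, std) for w, mean, std in components)


def sample_mixture(n, components=COMPONENTS, seed=0):
    """Draw n points; the component label is the hidden 'colour'."""
    rng = random.Random(seed)
    weights = [w for w, _mean, _std in components]
    points, labels = [], []
    for _ in range(n):
        k = rng.choices(range(len(components)), weights=weights)[0]
        _weight, mean, std = components[k]
        points.append(rng.gauss(mean, std))
        labels.append(k)
    return points, labels


points, hidden_labels = sample_mixture(1000)
print(points[:5])                               # what we observe
print(sum(hidden_labels) / len(hidden_labels))  # share of component 1, normally unseen
```

Learning a mixture model means recovering `COMPONENTS` from `points` alone, without access to `hidden_labels`.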
  • 28. Rule based
  • 30. Rule grammar
      import scala.util.parsing.combinator.syntactical.StandardTokenParsers

      object RuleGrammar extends StandardTokenParsers {
        lexical.delimiters += ("[", "]", "#", "->", "+", "-", "*", "/", "=", ".")
        lexical.reserved += ("_")

        // A rule set is any number of replace rules and classify rules
        def rule_set = (replace_rule | classify_rule).*
        def replace_rule = stringLit.* ~ "=" ~ stringLit
        def classify_rule = morpho_sequence ~ "->" ~ action

        // Left-hand side of a classify rule: a sequence of token patterns
        def morpho_sequence = (word | lemma_pos | lemma | pos | entity | wildcard | limited_wildcard).*
        def word = stringLit
        def lemma = ident
        def pos = "[" ~> ident <~ "]"
        def lemma_pos = ident ~ "#" ~ ident
        def entity = "<ENTITY>"
        def wildcard = "*"
        def limited_wildcard = "/" ~> numericLit <~ "/"

        // Right-hand side: an identifier with an operator and a number, or an identifier followed by "."
        def action = classify_action | chunk_action
        def classify_action = ident ~ ("+" | "-" | "*" | "/") ~ number
        def number = numericLit | negative
        def negative = "-" ~ numericLit
        def chunk_action = ident <~ "."
      }

      Examples (Spanish):
      –  "(cómo/cada día) odio (más) a (el/la/esta/…) ent" ("(how much / every day) I hate (even more) (the/this/…) ent"):
         /2/ ODIAR#V /1/ A#SP /1/ <ENTITY> -> LH -2
      –  "mi/este odio a/por ent" ("my/this hatred of/for ent"):
         D ODIO#NC [SP] <ENTITY> -> LH -1