#1
EventSense: Capturing the Pulse of Large-scale
Events by Mining Social Media Streams
Case study: Thessaloniki International Film Festival
E. Schinas, S. Papadopoulos, S. Diplaris, Y. Kompatsiaris,
Y. Mass, J. Herzig, L. Boudakidis
#2
Capturing & mining large-scale events
• Large-scale events are attended by
thousands of people and captured by
mobile devices in the form of status
updates, photos, ratings, etc.
• SXSW Music, Film and Interactive
Conferences and Festivals
– 30,000+ attendees
– ~300,000 tweets between Mar 3 and 7
– 40,247 tweets even in the last month
• Sundance film festival
– 200 films, 10 days, 50,000+ attendees
– 200,000+ tweets during the festival
– 20,438 tweets even in the last month
A search for #tiff53 on Twitter returns an unstructured list of tweets
#3
Capturing & mining large-scale events
• The online representation of an event as a sequential list of
posts and status updates is ineffective
• A more effective means of event representation would
employ facets, such as entities, sub-events and sentiment.
• Challenge:
– Organize information around entities of interest
– Extract meaningful insights, obtain informative summaries
• EventSense framework
#4
Entity Detection (1/3)
• Entities are defined as lists of properties:
– a film consists of a title, description, names of
director(s)/actors
• Matching status updates (tweets) to entities relies on
representing both as tf * idf vectors
m: message (tweet), f: feature (term), M: set of all event messages
boost(f): boosting factor when f is a named entity
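The weighting formula on this slide did not survive extraction. A minimal sketch of what the legend describes, assuming the weight of feature f in message m is tf(f, m) * idf(f, M) * boost(f), with idf taken as log(|M| / df(f)). The idf variant, the boost value of 2.0 and the function name are assumptions made for illustration, not values from the slides.

```python
import math
from collections import Counter

def tfidf_vector(message_terms, all_messages, named_entities, boost=2.0):
    """Return {term: weight} for one message.

    Assumed form: tf(f, m) * idf(f, M) * boost(f), where boost(f) equals
    `boost` for named entities and 1.0 otherwise (illustrative values).
    """
    n_messages = len(all_messages)
    # Document frequency: number of messages in M that contain each term.
    df = Counter(term for msg in all_messages for term in set(msg))
    tf = Counter(message_terms)
    vector = {}
    for term, count in tf.items():
        idf = math.log(n_messages / df[term]) if df[term] else 0.0
        factor = boost if term in named_entities else 1.0
        vector[term] = count * idf * factor
    return vector

# Toy message set M; "mold" plays the role of a named entity (a film title).
M = [["great", "film", "mold"], ["great", "festival"], ["film", "queue"]]
weights = tfidf_vector(M[0], M, named_entities={"mold"})
```

In this toy set the boosted, rare term "mold" ends up with the largest weight, which is the effect the boosting factor is meant to have.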
#5
Entity Detection (2/3)
Unigrams            Bigrams
αργυρ : 0.348       αργυρ αλεξανδρ : 0.348
αλεξανδρ : 0.289    αλεξανδρ τουρκ : 0.233
τουρκ : 0.231       τουρκ ταιν : 0.201
ταιν : 0.191        ταιν μουχλ : 0.418
μουχλ : 0.616       -
1. Language Detection
2. Tokenization (using the appropriate tokenizer)
3. Stemming
4. tf * idf weighting
5. Boost (e.g. terms from the film’s name)
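The unigram and bigram features in the table can be derived from a stemmed token sequence as sketched below. The sketch assumes that language detection, tokenization and stemming have already produced the stems; the helper name `ngrams` is hypothetical.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Stems taken from the unigram column of the slide.
stems = ["αργυρ", "αλεξανδρ", "τουρκ", "ταιν", "μουχλ"]
unigrams = ngrams(stems, 1)
bigrams = ngrams(stems, 2)
```

Five stems yield four bigrams, which matches the single empty cell in the bigram column.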
#6
Entity Detection (3/3)
Entity of interest
1. Select a combination of properties,
e.g. title, director and actors
2. Aggregate the selected properties into a
single string, e.g. «Μούχλα Αλί Αιντίν»
3. Calculate the tf * idf vector of n-grams
using the same vocabulary as the tweets
4. Calculate the cosine similarity between an
incoming message and each of the
entities of interest.
5. Assign the message to the entities whose
similarity exceeds a predefined
threshold.
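Steps 4 and 5 can be sketched as follows, assuming messages and entities are already represented as sparse weight vectors (dicts). The entity names, stems, weights and the 0.2 threshold are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_entities(message_vec, entity_vecs, threshold=0.2):
    """Return the names of all entities whose similarity exceeds the threshold."""
    return [name for name, vec in entity_vecs.items()
            if cosine(message_vec, vec) > threshold]

# Hypothetical entity vectors; a message may match zero, one or several entities.
entities = {
    "film_a": {"μουχλ": 0.6, "αλ": 0.3, "αιντιν": 0.3},
    "film_b": {"φεστιβαλ": 0.5, "βραβει": 0.5},
}
tweet = {"μουχλ": 0.616, "ταιν": 0.191}
matched = match_entities(tweet, entities)
```

Because every entity above the threshold is returned, a tweet that mentions two films is assigned to both.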
#7
Topic analysis
• 1-NN clustering algorithm to create clusters/topics
Assign an incoming message to the nearest topic if the cosine similarity
exceeds a predefined threshold; otherwise create a new topic.
• Similarity threshold sensitivity analysis, similar to entity extraction
• LSH approximation to scale up (Petrovic et al., NAACL 2010)
Hash the input items so that similar items are mapped to the same
bucket with high probability, then restrict the search to that bucket.
• Title extraction per topic
For the set of items in a topic, we find the longest sequence of words
with the highest frequency.
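A minimal sketch of the incremental 1-NN assignment described on this slide. It scans all previous messages exhaustively, whereas the LSH approximation would restrict the scan to one bucket. The threshold of 0.5 and the toy vectors are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def assign_topic(message_vec, topics, threshold=0.5):
    """Add the message to the topic of its nearest neighbour, or open a new topic.

    `topics` is a list of topics, each a list of message vectors.
    Returns the index of the topic the message ended up in.
    """
    best_idx, best_sim = None, 0.0
    for idx, members in enumerate(topics):
        for member in members:
            sim = cosine(message_vec, member)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
    if best_idx is not None and best_sim > threshold:
        topics[best_idx].append(message_vec)
        return best_idx
    topics.append([message_vec])
    return len(topics) - 1

topics = []
stream = [
    {"award": 1.0, "winner": 1.0},
    {"award": 1.0, "winner": 0.8},
    {"queue": 1.0, "tickets": 1.0},
]
labels = [assign_topic(vec, topics) for vec in stream]
```

The first two messages share a topic and the third opens a new one, so the number of topics grows with the stream rather than being fixed in advance.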
#8
Sentiment Analysis
• Training using tweets with emoticons, e.g. a happy emoticon → positive, a sad emoticon → negative
(A. Go, R. Bhayani, and L. Huang)
• For each message we extract two types of features. The first is n-grams.
The second includes the existence of user mentions and URLs,
punctuation, repeated letters
• Naive Bayes (NB) classifier for positive and negative data. Assuming a
uniform prior for all classes, independence between features, and using
the Bayes rule we get:
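The resulting expression is missing from the extracted slide. Under the stated assumptions (uniform prior, independent features), the predicted class is the one that maximises the product of the per-feature likelihoods P(f | c). A minimal sketch of such a classifier follows; the add-one smoothing and the toy training data are assumptions, not details from the slides.

```python
import math
from collections import Counter

def train(labelled_messages):
    """Count feature occurrences per class from (features, label) pairs."""
    counts = {}
    for features, label in labelled_messages:
        counts.setdefault(label, Counter()).update(features)
    return counts

def classify(features, counts):
    """Return the class with the highest summed log-likelihood."""
    vocabulary = {f for class_counts in counts.values() for f in class_counts}
    scores = {}
    for label, class_counts in counts.items():
        total = sum(class_counts.values())
        # Uniform prior: the log-prior is identical for every class, so it is omitted.
        scores[label] = sum(
            math.log((class_counts[f] + 1) / (total + len(vocabulary)))
            for f in features
        )
    return max(scores, key=scores.get)

# Toy training set; in the slides the labels come from emoticons.
training = [
    (["great", "film"], "positive"),
    (["love", "it"], "positive"),
    (["boring", "film"], "negative"),
    (["awful", "queue"], "negative"),
]
model = train(training)
label = classify(["great", "film"], model)
```

Working in log space avoids numerical underflow when a message has many features.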
#9
Aggregation & summarization
• For each entity we retrieve the set of associated messages
and calculate the mean value of sentiment, Polarity and
Subjectivity
• Calculate the same sentiment measures per topic and per user
• Several other statistics: top shared messages, URLs and images,
top active & influential users
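The per-entity, per-topic and per-user aggregation can be sketched as a single group-by over annotated messages. The sketch assumes each message already carries numeric polarity and subjectivity scores; the field names and score ranges are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def aggregate(messages, key):
    """Group messages by `key` and average their sentiment measures."""
    groups = defaultdict(list)
    for msg in messages:
        groups[msg[key]].append(msg)
    return {
        group: {
            "polarity": mean(m["polarity"] for m in msgs),
            "subjectivity": mean(m["subjectivity"] for m in msgs),
            "count": len(msgs),
        }
        for group, msgs in groups.items()
    }

# Toy annotated messages (illustrative values only).
messages = [
    {"entity": "film_a", "user": "u1", "polarity": 1.0, "subjectivity": 1.0},
    {"entity": "film_a", "user": "u2", "polarity": 0.0, "subjectivity": 0.0},
    {"entity": "film_b", "user": "u1", "polarity": -1.0, "subjectivity": 1.0},
]
per_entity = aggregate(messages, "entity")
per_user = aggregate(messages, "user")
```

The same function serves all three groupings; only the grouping key changes.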
#10
Dataset: 53rd Thessaloniki International Film Festival
Three sources of data
1. A detailed set of the 168 films included in the official festival
program of tiff53
2. 3,974 tweets that contain the official hashtag of the festival
(#tiff53) for the period between November 1st and 13th
3. Film rating and bookmarking data created by the ThessFest
mobile app (available both for iPhone* and Android**).
* https://itunes.apple.com/gr/app/thessfest/id504913309?mt=8
** https://play.google.com/store/apps/details?id=com.mk4droid.FF_pack&hl=el
10-day event, 2-11 November 2012
#11
Tweet-film matching results
• film = <title, description, directors, actors>
• Multiple entity representations using Greek/English/both, uni-/bi-grams
• Similarity threshold sensitivity analysis
Pooling multiple representations
threshold ∈ (0.1, 0.3)
#12
Topic analysis results
• 834 topics (clusters)
• Manual inspection of topics:
– 53.8% of topic titles considered informative
– 98.5% of topics were found to be “clean”
Topics in time
Top-10
#13
Sentiment analysis results
• Training
– 800K positive & negative tweets for English
– 12K positive & negative tweets for Greek
• Tuning (for threshold)
– Manually annotated dataset from Thessaloniki Documentary Festival
(similar event)
– 325/73/553 (pos/neg/neut) in English and 781/216/781 in Greek
• Testing
– 324/33/724 (pos/neg/neut) in English and 901/315/1667 in Greek
– Best accuracy (English) ~ 0.75
– Performance in Greek much poorer compared to English:
a richer training corpus is needed
#14
Aggregation & summarization results (1/2)
#T: number of tweets
Pol: polarity of film tweets
Subj: subjectivity of film tweets
R: average rating
#R: number of ratings
#F: number of times the film was bookmarked
• Films with positive polarity are rated higher.
• Films that are tweeted a lot are also more
likely to be rated.
• Films that are tweeted a lot are also more
likely to be added to the users’ bookmarks.
Pearson correlation across film statistics
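For reference, the Pearson correlation between two per-film statistics can be computed as sketched below. The numbers are made-up toy values chosen to illustrate the call, not data from the festival.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-film counts: number of tweets (#T) and number of ratings (#R).
tweets_per_film = [5, 12, 30, 44]
ratings_per_film = [2, 6, 14, 25]
r = pearson(tweets_per_film, ratings_per_film)
```

A value of r close to 1 indicates that films with more tweets also tend to have more ratings; it says nothing about causation.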
#15
Aggregation & summarization results (2/2)
Most active & influential Twitter
accounts (+sentiment per user)
Most shared photos
(+number of retweets)
#16
Summary
• Extract entities of interest from messages
F1 = 0.737 (precision = 0.774, recall = 0.697)
• Detect topics in event related messages
834 topics, 98.5% considered “clean”
• Sentiment analysis per message, entity & topic
Accuracy: 0.75 for English, 0.62 for Greek
• Aggregation & statistics
Valuable insights and overview information
#17
Future Work
• Apply the proposed framework to larger-scale events
of different nature (e.g. music festivals, sports
events).
• Monitoring and processing more OSN sources (e.g.
Facebook, Instagram).
• Refine the proposed methods with the goal of
improving accuracy and robustness over different
datasets.
• Experiment with techniques for automatically
creating visual informative summaries based on the
results of the automatic analysis.
#18
Thank You
Questions?
#19
References
1. S. Petrovic, M. Osborne, and V. Lavrenko. Streaming first story
detection with application to Twitter. Human Language Technologies:
The 2010 Annual Conference of the North American Chapter of the ACL
(NAACL). 2010.
2. A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using
distant supervision. 2009.

Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 

Capturing & mining large-scale events from social media

  • 1. #1 EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams
      Case study: Thessaloniki International Film Festival
      E. Schinas, S. Papadopoulos, S. Diplaris, Y. Kompatsiaris, Y. Mass, J. Herzig, L. Boudakidis
  • 2. #2 Capturing & mining large-scale events
      • Large-scale events → attended by thousands of people → captured by mobile devices in the form of status updates, photos, ratings, etc.
      • SXSW Music, Film and Interactive Conferences and Festivals
          – 30,000+ attendees
          – ~300,000 tweets between Mar 3 and 7
          – 40,247 tweets even in the last month
      • Sundance Film Festival
          – 200 films, 10 days, 50,000+ attendees
          – 200,000+ tweets during the festival
          – 20,438 tweets even in the last month
      • A search for #tiff53 on Twitter returns an unstructured list of tweets
  • 3. #3 Capturing & mining large-scale events
      • Representing an event online as a sequential list of posts and status updates is ineffective.
      • A more effective representation would employ facets such as entities, sub-events and sentiment.
      • Challenge:
          – Organize information around entities of interest
          – Extract meaningful insights and obtain informative summaries
      • EventSense framework
  • 4. #4 Entity Detection (1/3)
      • Entities are defined as lists of properties:
          – a film consists of a title, a description, and the names of its director(s) and actors
      • Matching status updates (tweets) to entities relies on representing both as tf * idf vectors
      • Notation: m = message (tweet), f = feature (term), M = set of all event messages, boost(f) = boosting factor applied when f is a named entity
  • 5. #5 Entity Detection (2/3)
      Example weights:
          Unigrams             Bigrams
          αργυρ    : 0.348     αργυρ αλεξανδρ : 0.348
          αλεξανδρ : 0.289     αλεξανδρ τουρκ : 0.233
          τουρκ    : 0.231     τουρκ ταιν     : 0.201
          ταιν     : 0.191     ταιν μουχλ     : 0.418
          μουχλ    : 0.616     -
      Processing steps:
          1. Language detection
          2. Tokenization (using the appropriate tokenizer)
          3. Stemming
          4. tf * idf weighting
          5. Boost
      Callout on the slide: film's name
  • 6. #6 Entity Detection (3/3)
      Entity of interest:
          1. Select a combination of properties, e.g. title, director and actors.
          2. Aggregate the selected properties into a single string → «Μούχλα Αλί Αιντίν»
          3. Calculate the tf * idf vector of its n-grams, using the same vocabulary as the tweets.
          4. Calculate the cosine similarity between an incoming message and each entity of interest.
          5. Assign the message to every entity whose similarity exceeds a predefined threshold.
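The entity-matching steps on slides 4 to 6 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the exact tf * idf formula is not recoverable from the transcript, so the `log(N / df) + 1` form, the boost factor of 2.0, and the use of film-title terms as the boosted named entities are all assumptions. The toy tweets and films are made up and use English tokens, and language detection and stemming are skipped. The default threshold of 0.2 is simply a value inside the (0.1, 0.3) range mentioned on slide 11.

```python
import math
from collections import Counter


def ngrams(tokens):
    """Unigrams plus bigrams (joined by a space) of a token list."""
    return list(tokens) + [" ".join(p) for p in zip(tokens, tokens[1:])]


def build_idf(documents):
    """Assumed idf form: log(N / df(f)) + 1, computed over the tweet set M."""
    n_docs = len(documents)
    df = Counter(f for doc in documents for f in set(doc))
    return {f: math.log(n_docs / count) + 1.0 for f, count in df.items()}


def tfidf_vector(features, idf, boosted, boost=2.0):
    """Sparse tf * idf vector; features outside the tweet vocabulary are dropped."""
    vec = {}
    for f, tf in Counter(features).items():
        if f in idf:
            weight = tf * idf[f]
            # Hypothetical boost(f): multiply the weight of named-entity terms.
            vec[f] = weight * boost if f in boosted else weight
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_entities(tokens, entity_vectors, idf, boosted, threshold=0.2):
    """Return every entity whose similarity to the message exceeds the threshold."""
    message = tfidf_vector(ngrams(tokens), idf, boosted)
    return [name for name, vec in entity_vectors.items()
            if cosine(message, vec) > threshold]


# Toy data (hypothetical): already tokenized, lower-cased tweets.
tweets = [
    ["loved", "mold", "by", "ali", "aydin"],
    ["amour", "by", "haneke", "was", "great"],
    ["long", "queue", "at", "the", "cinema"],
]
# Each film is the aggregated string of its selected properties (title + director).
films = {"Mold": ["mold", "ali", "aydin"], "Amour": ["amour", "michael", "haneke"]}
TITLE_TERMS = frozenset({"mold", "amour"})

idf = build_idf([ngrams(t) for t in tweets])
entity_vectors = {name: tfidf_vector(ngrams(toks), idf, TITLE_TERMS)
                  for name, toks in films.items()}
matches = [match_entities(t, entity_vectors, idf, TITLE_TERMS) for t in tweets]
print(matches)
```

Because a message is assigned to every entity above the threshold, one tweet can be linked to several films, and a tweet that mentions no film is linked to none.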
  • 7. #7 Topic analysis
      • 1-NN clustering algorithm to create clusters/topics
          – Assign an incoming message to the nearest topic if the cosine similarity exceeds a predefined threshold; otherwise create a new topic.
      • Similarity threshold sensitivity analysis, similar to entity extraction
      • LSH approximation to scale up (Petrovic et al., NAACL 2010)
          – Hash the input items so that similar items are mapped to the same bucket with high probability, then restrict the search to that bucket.
      • Title extraction per topic
          – For the set of items in a topic, find the longest sequence of words with the highest frequency.
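A minimal sketch of the threshold-based 1-NN clustering described on slide 7 is given below. It assumes raw term-frequency vectors and a brute-force nearest-neighbour search over all earlier messages; the function name `cluster_stream`, the threshold of 0.5 and the toy messages are illustrative choices, not values from the slides. In the LSH variant the inner loop would only scan the messages that fall into the same hash bucket.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(w * v.get(f, 0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_stream(messages, threshold=0.5):
    """Assign each message to the topic of its nearest earlier message,
    or open a new topic when no earlier message is similar enough."""
    seen = []          # (vector, topic id) for every message processed so far
    assignments = []
    next_topic = 0
    for tokens in messages:
        vec = Counter(tokens)
        best_sim, best_topic = 0.0, None
        for other_vec, topic in seen:   # brute force; LSH would shrink this scan
            sim = cosine(vec, other_vec)
            if sim > best_sim:
                best_sim, best_topic = sim, topic
        if best_topic is not None and best_sim > threshold:
            topic = best_topic
        else:
            topic = next_topic
            next_topic += 1
        seen.append((vec, topic))
        assignments.append(topic)
    return assignments


# Hypothetical, already tokenized messages.
stream = [
    ["opening", "ceremony", "tonight"],
    ["opening", "ceremony", "tonight", "olympion"],
    ["award", "winners", "announced"],
    ["award", "winners", "announced", "today"],
]
topics = cluster_stream(stream)
print(topics)
```

Since topics are created on the fly, the number of clusters does not have to be fixed in advance; it is controlled only by the similarity threshold.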
  • 8. #8 Sentiment Analysis
      • Training uses tweets that contain emoticons as noisy labels, e.g. :) → positive, :( → negative (Go, Bhayani and Huang, 2009).
      • Two types of features are extracted from each message:
          – n-grams
          – the presence of user mentions and URLs, punctuation, and repeated letters
      • A Naive Bayes (NB) classifier separates positive from negative messages. It assumes a uniform prior over the classes and independence between features, and applies Bayes' rule to score each class.
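Under the assumptions stated on slide 8 (uniform class prior, independent features), the Naive Bayes decision reduces to picking the class with the largest product of per-feature likelihoods. The sketch below shows this with unigram features only. The add-one (Laplace) smoothing, the function names and the four training examples are assumptions made for illustration; the slides do not specify the smoothing or show the training data.

```python
import math
from collections import Counter


def train_nb(labelled, alpha=1.0):
    """Estimate log P(f | class) with add-alpha smoothing (an assumption)."""
    counts, vocab = {}, set()
    for features, label in labelled:
        counts.setdefault(label, Counter()).update(features)
        vocab.update(features)
    model = {}
    for label, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        model[label] = {f: math.log((c[f] + alpha) / total) for f in vocab}
    return model


def classify(model, features):
    """Uniform prior: the prior term is the same for every class, so only the
    summed log-likelihoods of the known features decide the label."""
    scores = {label: sum(logp[f] for f in features if f in logp)
              for label, logp in model.items()}
    return max(scores, key=scores.get)


# Hypothetical training set, labelled by the emoticon each tweet contained.
training = [
    (["great", "film"], "pos"),
    (["loved", "it"], "pos"),
    (["boring", "film"], "neg"),
    (["hated", "it"], "neg"),
]
model = train_nb(training)
print(classify(model, ["great", "film"]), classify(model, ["boring"]))
```

Working in log space avoids numeric underflow when a message has many features; features never seen during training are ignored.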
  • 9. #9 Aggregation & summarization
      • For each entity, retrieve the set of associated messages and calculate the mean sentiment, polarity and subjectivity.
      • Calculate the same sentiment measures per topic and per user.
      • Several other statistics: top shared messages, URLs and images; most active and influential users.
  • 10. #10 Dataset: 53rd Thessaloniki International Film Festival
      10-day event, 2-11 November 2012
      Three sources of data:
          1. A detailed set of the 168 films included in the official festival program of tiff53
          2. 3,974 tweets containing the official festival hashtag (#tiff53), posted between November 1st and 13th
          3. Film rating and bookmarking data created by the ThessFest mobile app (available for both iPhone* and Android**)
      * https://itunes.apple.com/gr/app/thessfest/id504913309?mt=8
      ** https://play.google.com/store/apps/details?id=com.mk4droid.FF_pack&hl=el
  • 11. #11 Tweet-film matching results
      • film = <title, description, directors, actors>
      • Multiple entity representations using Greek/English/both, uni-/bi-grams
      • Similarity threshold sensitivity analysis
      • Pooling multiple representations; threshold ∈ (0.1, 0.3)
  • 12. #12 Topic analysis results
      • 834 topics (clusters)
      • Manual inspection of topics:
          – 53.8% of topic titles considered informative
          – 98.5% of topics were found to be “clean”
      [Figures: Topics in time; Top-10]
  • 13. #13 Sentiment analysis results
      • Training
          – 800K positive & negative tweets for English
          – 12K positive & negative tweets for Greek
      • Tuning (for threshold)
          – Manually annotated dataset from the Thessaloniki Documentary Festival (a similar event)
          – 325/73/553 (pos/neg/neut) in English and 781/216/781 (pos/neg/neut) in Greek
      • Testing
          – 324/33/724 (pos/neg/neut) in English and 901/315/1667 (pos/neg/neut) in Greek
          – Best accuracy (English) ~ 0.75
          – Performance in Greek is much poorer than in English → need for a richer training corpus
  • 14. #14 Aggregation & summarization results (1/2)
      Table: Pearson correlation across film statistics
      Legend: #T = number of tweets, Pol = polarity of film tweets, Subj = subjectivity of film tweets, R = average rating, #R = number of ratings, #F = number of times the film was bookmarked
      • Films with positive polarity are rated higher.
      • Films that are tweeted a lot are also more likely to be rated.
      • Films that are tweeted a lot are also more likely to be added to the users’ bookmarks.
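For reference, the Pearson correlation used to compare film statistics on slide 14 can be computed as in the sketch below. The per-film numbers are invented purely to exercise the function and are not taken from the festival data; the function name `pearson` is likewise an illustrative choice.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


# Hypothetical per-film statistics: number of tweets (#T) and number of ratings (#R).
num_tweets = [12, 40, 5, 23]
num_ratings = [3, 9, 1, 6]
r_tweets_ratings = pearson(num_tweets, num_ratings)
print(round(r_tweets_ratings, 3))
```

A value close to +1 would correspond to the slide's observation that heavily tweeted films also tend to collect more ratings; a value near 0 would indicate no linear relationship.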
  • 15. #15 Aggregation & summarization results (2/2)
      [Figures: Most active & influential Twitter accounts (+ sentiment per user); Most shared photos (+ number of retweets)]
  • 16. #16 Summary
      • Extract entities of interest from messages
          – F1 = 0.737 (precision = 0.774, recall = 0.697)
      • Detect topics in event-related messages
          – 834 topics, 98.5% considered “clean”
      • Sentiment analysis per message, entity & topic
          – Accuracy: 0.75 for English, 0.62 for Greek
      • Aggregation & statistics
          – Valuable insights and overview information
  • 17. #17 Future Work
      • Apply the proposed framework to larger-scale events of a different nature (e.g. music festivals, sports events).
      • Monitor and process more OSN sources (e.g. Facebook, Instagram).
      • Refine the proposed methods to improve accuracy and robustness across different datasets.
      • Experiment with techniques for automatically creating informative visual summaries from the results of the automatic analysis.
  • 19. #19 References
      1. S. Petrovic, M. Osborne, and V. Lavrenko. Streaming first story detection with application to Twitter. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL (NAACL), 2010.
      2. A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using distant supervision. 2009.