SlideShare una empresa de Scribd logo
1 de 19
Descargar para leer sin conexión
8th Author Profiling task at PAN
Profiling Fake News Spreaders
on Twitter
PAN-AP-2020 CLEF 2020
Online, 22-25 September
Francisco Rangel
Symanto Research
Paolo Rosso
PRHLT Research Center
Universitat Politècnica de Valencia
Bilal Ghanem
Symanto Research
Anastasia Giachanou
PRHLT Research Center
Universitat Politècnica de Valencia
Introduction
Author profiling aims at identifying
personal traits such as age, gender,
personality traits, native language,
language variety… from writings?
This is crucial for:
- Marketing.
- Security.
- Forensics.
2
Author
Profiling
PAN’20
Task goal
Given a Twitter feed, determine whether
its author is keen to spread fake news or
not.
3
Author
Profiling
Two languages:
English Spanish
PAN’20
Corpus
4
Author
Profiling
PAN’20
(EN) English (ES) Spanish
Keen to spread
fake news
Not keen to spread
fake news
Total
Keen to spread
fake news
Not keen to
spread fake news
Total
Training 150 150 300 150 150 300
Test 100 100 200 100 100 200
Total 250 250 500 250 250 500
Methodology
1. Selection of fake news from Politifact and Snopes related sites (+ manual review).
2. Collection of tweets responding to the previous news:
2.1. Manual inspection to ensure that the tweet refers to the news.
2.2. Manual annotation of those tweets supporting vs. rejecting the news.
3. Timeline collection
3.1. Manual review of the tweets to label the fake ones.
3.2. Users with one of more fake tweets are keen to spread them. Otherwise, they are not.
3.3. Removal of tweets referring explicitly to the fake news (to avoid bias).
Evaluation measures
5
Author
Profiling
PAN’20
The accuracy is calculated per language and averaged:
Baselines
6
Author
Profiling
PAN’20
RANDOM A baseline that randomly generates the predictions among the different classes
LSTM An Long Short-Term Memory neural network that uses FastTex embeddings to
represent texts.
CHAR N-GRAMS With values for $n$ from 2 to 6, with a SVM
WORD N-GRAMS With values for $n$ from 1 to 3, with a Neural Network
EIN The Emotionally-Infused Neural (EIN) network with word embedding and
emotional features as the input of an LSTM
Symanto (LDSE) This method represents documents on the basis of the probability distribution of
occurrence of their words in the different classes. The key concept of LDSE is a
weight, representing the probability of a term to belong to one of the different
categories: fake news spreaders / non-spreader. The distribution of weights for
a given document should be closer to the weights of its corresponding category.
LDSE takes advantage of the whole vocabulary
66 participants
33 working notes
22 countries
7
Author
Profiling
PAN’20
Participation
https://mapchart.net/world.html
Approaches
8
Author
Profiling
PAN’20
Approaches - Preprocessing
9
Author
Profiling
Twitter elements (RT, VIA,
FAV)
Giglou; Hashemi; Pinnaparaju
Emojis and other
non-alphanumeric chars
Buda; Pinnaparaju; Vogel; Giglou; Espinosa; Majumder; Lichouri; Shashirekha
Lemmatisation Giglou; Hashemi; Lichouri; Shashirekha
Tokenisation Vogel; Labadie; Fernández; Espinosa; Lichouri; Shashirekha; Baruah
Punctuation signs Vogel; Koloski; Giglou; Espinosa; Hashemi; Lichouri; Shashirekha
Numbers Pizarro; Vogel; Giglou; Espinosa; Hashemi; Shashirekha
Lowercase Buda; Pizarro; Vogel; Pinnaparaju
Stopwords Vogel; Koloski; Giglou; Espinosa; Hashemi; Lichouri; Shashirekha
Character flooding Vogel; Labadie
Infrequent terms Ikade
Short texts Vogel
PAN’20
Approaches - Features
10
Author
Profiling
Stylistic features:
- Number of occurrences
- Verbs, adjs, pronouns
- Number of hashtags, mentions,
URLs...
- Capital vs. lower letters
- Punctuation marks
- ...
Manna; Buda; Lichouri; Justin; Niven; Russo; Hörtenhuemer;
Cardaioli; Spezanno; Ogaltsov; Labadie; Hashemi;
Moreno-Sandoval;
N-gram models Pizarro; Espinosa; Vogel; Koloski; López-Fernández; Vijayasaradhi;
Buda; Lichouri; Justin; Hörtenhuemer; Spezanno; Aguirrezabal;
Shashirekha; Babaei; Labadie; Hashemi;
Emotional and personality features Justin; Niven; Russo; Hörtenhuemer; Espinosa; Cardaioli;
Spezanno; Moreno-Sandoval;
Embeddings Justin; Hörtenhuemer; Aguirrezabal; Ogaltsov; Shashirekha;
Babaei; Labadie; Hashemi; Cilet; Majumder;
...BERT Spezanno; Kaushik; Baruah; Chien;
PAN’20
* 9 teams have used Symanto API to obtain psycholinguistic and/or emotional features
Approaches - Methods
11
Author
Profiling
SVM Pizarro; Vogel; Koloski; Espinosa; Fernández; Hashemi; Lichouri;
Aguirrezabal; Fersini
Logistic regression Buda; Vogel; Koloski; Hörtennhuemer; Pinnaparaju; Aguirrezabal; Manna
Random Forest Cardaioli; Espinosa; Hashemi; Aguirrezabal; Sandoval; Manna
Ensembles Ikade; Shrestha; Shashirekha; Niven
Multilayer Perceptron Aguerrizabal
NN with Dense Layer Baruah
Fully-Connected NN Giglou
CNN Chilet
LSTM Majumder; Labadie
bi-LSTM Saeed
Ensemble (GRU + CNN) Bakhteev
PAN’20
Global ranking
12
Author
Profiling
PAN’20
Confusion matrices
13
Author
Profiling
PAN’20
ENGLISH
SPANISH
Best results at PAN'20
14
Author
Profiling
PAN’20
Buda and Bolonyai
- n-Grams
- Stylistic features
- Logistic Regression ensemble
Pizarro
- word and char n-grams
- SVM
Conclusions
● Several approaches to tackle the task:
○ n-Grams + SVM prevailing.
● Best results in English:
○ Over 67% on average.
○ Best (75%): Buda and Bolonyai - n-Grams + Stylistic features + Logistic Regression ensemble
● Best results in Spanish:
○ Over 73% on average.
○ Best (82%): Pizarro - char & word n-Grams + SVM.
● Error analysis:
○ English:
■ False positives (real news spreaders as fake news spreaders): 35.50%
■ False negatives (fake news spreaders as real news spreaders): 30.03%
○ Spanish:
■ False positives (real news spreaders as fake news spreaders): 20.23%
■ False negatives (fake news spreaders as real news spreaders): 35.09%
Looking at the results, we can conclude:
● It is feasible to automatically identify Fake News Spreaders with high precision
○ ...even when only textual features are used.
● We have to bear in mind false positives since especially in English, they sum up to one-third of the
total predictions, and misclassification might lead to ethical or legal implications.
15
Author
Profiling
PAN’20
16
Author
Profiling
PAN’20
Industry at PAN (Author Profiling)
17
Author
Profiling
Organisation
Sponsors
PAN’20
This year, the winners of the task are (ex aequo):
● Jakab Buda and Flora Bolonyai, Eötvös
Loránd University, Hungary
● Juan Pizarro, Chile
2021 -> HATE
speech spreadeRS
18
Author
Profiling
PAN’20
19
Author
Profiling
On behalf of the author profiling task organisers:
Thank you very much for participating
and hope to see you next year!!
PAN’20

Más contenido relacionado

Similar a Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreaders on Twitter

Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesZoltan Varju
 
Netflix Sample Hispanic Intelligence Report
Netflix Sample Hispanic Intelligence ReportNetflix Sample Hispanic Intelligence Report
Netflix Sample Hispanic Intelligence ReportOYE! Intelligence
 
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)Francisco Manuel Rangel Pardo
 
Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016Francisco Manuel Rangel Pardo
 
Leveraging the Power of Social Media
Leveraging the Power of Social MediaLeveraging the Power of Social Media
Leveraging the Power of Social MediaLeon Derczynski
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 

Similar a Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreaders on Twitter (7)

Author Profiling. PAN@CLEF-2013 Task
Author Profiling. PAN@CLEF-2013 TaskAuthor Profiling. PAN@CLEF-2013 Task
Author Profiling. PAN@CLEF-2013 Task
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entities
 
Netflix Sample Hispanic Intelligence Report
Netflix Sample Hispanic Intelligence ReportNetflix Sample Hispanic Intelligence Report
Netflix Sample Hispanic Intelligence Report
 
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
 
Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016
 
Leveraging the Power of Social Media
Leveraging the Power of Social MediaLeveraging the Power of Social Media
Leveraging the Power of Social Media
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 

Más de Francisco Manuel Rangel Pardo

AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019Francisco Manuel Rangel Pardo
 
Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Francisco Manuel Rangel Pardo
 
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Francisco Manuel Rangel Pardo
 
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Francisco Manuel Rangel Pardo
 
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)Francisco Manuel Rangel Pardo
 
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...Francisco Manuel Rangel Pardo
 
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...Francisco Manuel Rangel Pardo
 
A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...Francisco Manuel Rangel Pardo
 
Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Francisco Manuel Rangel Pardo
 
Native Language Identification - Brief review to the state of the art
Native Language Identification - Brief review to the state of the artNative Language Identification - Brief review to the state of the art
Native Language Identification - Brief review to the state of the artFrancisco Manuel Rangel Pardo
 
Overview of the 2nd. Author Profiling task at PAN-CLEF 2014
Overview of the 2nd. Author Profiling task at PAN-CLEF 2014Overview of the 2nd. Author Profiling task at PAN-CLEF 2014
Overview of the 2nd. Author Profiling task at PAN-CLEF 2014Francisco Manuel Rangel Pardo
 
Social Business Intelligence - Inteligencia Social de Negocio
Social Business Intelligence - Inteligencia Social de NegocioSocial Business Intelligence - Inteligencia Social de Negocio
Social Business Intelligence - Inteligencia Social de NegocioFrancisco Manuel Rangel Pardo
 
Dualidad onda-partícula del científico de datos en la empresa
Dualidad onda-partícula del científico de datos en la empresaDualidad onda-partícula del científico de datos en la empresa
Dualidad onda-partícula del científico de datos en la empresaFrancisco Manuel Rangel Pardo
 

Más de Francisco Manuel Rangel Pardo (20)

AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019
 
Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.
 
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
 
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
 
Redes sociales y preadolescentes
Redes sociales y preadolescentesRedes sociales y preadolescentes
Redes sociales y preadolescentes
 
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
 
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
 
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
 
Smart Listening - MUIinf
Smart Listening - MUIinfSmart Listening - MUIinf
Smart Listening - MUIinf
 
IA + Big Data = problema + oportunidad
IA + Big Data = problema + oportunidadIA + Big Data = problema + oportunidad
IA + Big Data = problema + oportunidad
 
A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...
 
Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...
 
Author Profiling task at PAN Lab at CLEF 2015
Author Profiling task at PAN Lab at CLEF 2015Author Profiling task at PAN Lab at CLEF 2015
Author Profiling task at PAN Lab at CLEF 2015
 
EmoGraph for Age and Gender Identification
EmoGraph for Age and Gender IdentificationEmoGraph for Age and Gender Identification
EmoGraph for Age and Gender Identification
 
My Phd Student T-Shirt
My Phd Student T-ShirtMy Phd Student T-Shirt
My Phd Student T-Shirt
 
Kico's Stairway to Phd
Kico's Stairway to PhdKico's Stairway to Phd
Kico's Stairway to Phd
 
Native Language Identification - Brief review to the state of the art
Native Language Identification - Brief review to the state of the artNative Language Identification - Brief review to the state of the art
Native Language Identification - Brief review to the state of the art
 
Overview of the 2nd. Author Profiling task at PAN-CLEF 2014
Overview of the 2nd. Author Profiling task at PAN-CLEF 2014Overview of the 2nd. Author Profiling task at PAN-CLEF 2014
Overview of the 2nd. Author Profiling task at PAN-CLEF 2014
 
Social Business Intelligence - Inteligencia Social de Negocio
Social Business Intelligence - Inteligencia Social de NegocioSocial Business Intelligence - Inteligencia Social de Negocio
Social Business Intelligence - Inteligencia Social de Negocio
 
Dualidad onda-partícula del científico de datos en la empresa
Dualidad onda-partícula del científico de datos en la empresaDualidad onda-partícula del científico de datos en la empresa
Dualidad onda-partícula del científico de datos en la empresa
 

Último

Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridihmeghakumariji156
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...vershagrag
 
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...HyderabadDolls
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxAniqa Zai
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...HyderabadDolls
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...HyderabadDolls
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...ThinkInnovation
 

Último (20)

Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rohtak [ 7014168258 ] Call Me For Genuine Models We...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 

Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreaders on Twitter

  • 1. 8th Author Profiling task at PAN Profiling Fake News Spreaders on Twitter PAN-AP-2020 CLEF 2020 Online, 22-25 September Francisco Rangel Symanto Research Paolo Rosso PRHLT Research Center Universitat Politècnica de Valencia Bilal Ghanem Symanto Research Anastasia Giachanou PRHLT Research Center Universitat Politècnica de Valencia
  • 2. Introduction Author profiling aims at identifying personal traits such as age, gender, personality traits, native language, language variety… from writings? This is crucial for: - Marketing. - Security. - Forensics. 2 Author Profiling PAN’20
  • 3. Task goal Given a Twitter feed, determine whether its author is keen to spread fake news or not. 3 Author Profiling Two languages: English Spanish PAN’20
  • 4. Corpus 4 Author Profiling PAN’20 (EN) English (ES) Spanish Keen to spread fake news Not keen to spread fake news Total Keen to spread fake news Not keen to spread fake news Total Training 150 150 300 150 150 300 Test 100 100 200 100 100 200 Total 250 250 500 250 250 500 Methodology 1. Selection of fake news from Politifact and Snopes related sites (+ manual review). 2. Collection of tweets responding to the previous news: 2.1. Manual inspection to ensure that the tweet refers to the news. 2.2. Manual annotation of those tweets supporting vs. rejecting the news. 3. Timeline collection 3.1. Manual review of the tweets to label the fake ones. 3.2. Users with one of more fake tweets are keen to spread them. Otherwise, they are not. 3.3. Removal of tweets referring explicitly to the fake news (to avoid bias).
  • 5. Evaluation measures 5 Author Profiling PAN’20 The accuracy is calculated per language and averaged:
  • 6. Baselines 6 Author Profiling PAN’20 RANDOM A baseline that randomly generates the predictions among the different classes LSTM An Long Short-Term Memory neural network that uses FastTex embeddings to represent texts. CHAR N-GRAMS With values for $n$ from 2 to 6, with a SVM WORD N-GRAMS With values for $n$ from 1 to 3, with a Neural Network EIN The Emotionally-Infused Neural (EIN) network with word embedding and emotional features as the input of an LSTM Symanto (LDSE) This method represents documents on the basis of the probability distribution of occurrence of their words in the different classes. The key concept of LDSE is a weight, representing the probability of a term to belong to one of the different categories: fake news spreaders / non-spreader. The distribution of weights for a given document should be closer to the weights of its corresponding category. LDSE takes advantage of the whole vocabulary
  • 7. 66 participants 33 working notes 22 countries 7 Author Profiling PAN’20 Participation https://mapchart.net/world.html
  • 9. Approaches - Preprocessing 9 Author Profiling Twitter elements (RT, VIA, FAV) Giglou; Hashemi; Pinnaparaju Emojis and other non-alphanumeric chars Buda; Pinnaparaju; Vogel; Giglou; Espinosa; Majumder; Lichouri; Shashirekha Lemmatisation Giglou; Hashemi; Lichouri; Shashirekha Tokenisation Vogel; Labadie; Fernández; Espinosa; Lichouri; Shashirekha; Baruah Punctuation signs Vogel; Koloski; Giglou; Espinosa; Hashemi; Lichouri; Shashirekha Numbers Pizarro; Vogel; Giglou; Espinosa; Hashemi; Shashirekha Lowercase Buda; Pizarro; Vogel; Pinnaparaju Stopwords Vogel; Koloski; Giglou; Espinosa; Hashemi; Lichouri; Shashirekha Character flooding Vogel; Labadie Infrequent terms Ikade Short texts Vogel PAN’20
  • 10. Approaches - Features 10 Author Profiling Stylistic features: - Number of occurrences - Verbs, adjs, pronouns - Number of hashtags, mentions, URLs... - Capital vs. lower letters - Punctuation marks - ... Manna; Buda; Lichouri; Justin; Niven; Russo; Hörtenhuemer; Cardaioli; Spezanno; Ogaltsov; Labadie; Hashemi; Moreno-Sandoval; N-gram models Pizarro; Espinosa; Vogel; Koloski; López-Fernández; Vijayasaradhi; Buda; Lichouri; Justin; Hörtenhuemer; Spezanno; Aguirrezabal; Shashirekha; Babaei; Labadie; Hashemi; Emotional and personality features Justin; Niven; Russo; Hörtenhuemer; Espinosa; Cardaioli; Spezanno; Moreno-Sandoval; Embeddings Justin; Hörtenhuemer; Aguirrezabal; Ogaltsov; Shashirekha; Babaei; Labadie; Hashemi; Cilet; Majumder; ...BERT Spezanno; Kaushik; Baruah; Chien; PAN’20 * 9 teams have used Symanto API to obtain psycholinguistic and/or emotional features
  • 11. Approaches - Methods 11 Author Profiling SVM Pizarro; Vogel; Koloski; Espinosa; Fernández; Hashemi; Lichouri; Aguirrezabal; Fersini Logistic regression Buda; Vogel; Koloski; Hörtennhuemer; Pinnaparaju; Aguirrezabal; Manna Random Forest Cardaioli; Espinosa; Hashemi; Aguirrezabal; Sandoval; Manna Ensembles Ikade; Shrestha; Shashirekha; Niven Multilayer Perceptron Aguerrizabal NN with Dense Layer Baruah Fully-Connected NN Giglou CNN Chilet LSTM Majumder; Labadie bi-LSTM Saeed Ensemble (GRU + CNN) Bakhteev PAN’20
  • 14. Best results at PAN'20 14 Author Profiling PAN’20 Buda and Bolonyai - n-Grams - Stylistic features - Logistic Regression ensemble Pizarro - word and char n-grams - SVM
  • 15. Conclusions ● Several approaches to tackle the task: ○ n-Grams + SVM prevailing. ● Best results in English: ○ Over 67% on average. ○ Best (75%): Buda and Bolonyai - n-Grams + Stylistic features + Logistic Regression ensemble ● Best results in Spanish: ○ Over 73% on average. ○ Best (82%): Pizarro - char & word n-Grams + SVM. ● Error analysis: ○ English: ■ False positives (real news spreaders as fake news spreaders): 35.50% ■ False negatives (fake news spreaders as real news spreaders): 30.03% ○ Spanish: ■ False positives (real news spreaders as fake news spreaders): 20.23% ■ False negatives (fake news spreaders as real news spreaders): 35.09% Looking at the results, we can conclude: ● It is feasible to automatically identify Fake News Spreaders with high precision ○ ...even when only textual features are used. ● We have to bear in mind false positives since especially in English, they sum up to one-third of the total predictions, and misclassification might lead to ethical or legal implications. 15 Author Profiling PAN’20
  • 17. Industry at PAN (Author Profiling) 17 Author Profiling Organisation Sponsors PAN’20 This year, the winners of the task are (ex aequo): ● Jakab Buda and Flora Bolonyai, Eötvös Loránd University, Hungary ● Juan Pizarro, Chile
  • 18. 2021 -> HATE speech spreadeRS 18 Author Profiling PAN’20
  • 19. 19 Author Profiling On behalf of the author profiling task organisers: Thank you very much for participating and hope to see you next year!! PAN’20