SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
Weather events identification in
social media streams: tools to
detect their evidence in Twitter
Alfonso Crisci(1)
Valentina Grasso (1,2), Imad Zaza (3) , Federica
Zabini (1,2), Paolo Nesi (3) and Gianni Pantaleo (3)
(1) CNR Ibimet, Italian National Research Council, Florence, Italy
(v.grasso@ibimet.cnr.it)
(2) LaMMA Consortium, Italian National Research Council, Florence,
Italy.
(3) DISIT Lab, Distributed [Systems and internet | Data Intelligence and]
Technologies Lab, Dep. of Information Engineering (DINFO), University
of Florence, Italy
Where the River Rises: A River of Words.
Source Going Deeper or Flatter: Connecting Deep Mapping, Flat
Ontologies and the Democratizing of Knowledge
Humanities 2015, 4(4), 623-636; doi:10.3390/h4040623
http://selinaspringett.com/wp/?p=166
OGRS'16, October 12-14, 2016,
Perugia, Italy
Open Source Geospatial Research & Education Symposium
OGRS is a meeting dedicated to sharing knowledge, new
solutions, methods, practices, ideas and trends in the field of
geospatial information through the development and the use
of free and open source software in both research and
education.
● Social media data become a powerful real time
informative source to monitor the impacts of
meteoclimatic events.
● The reliability of social media activity and content on
weather related natural risk is a current topic.
● Weather events are bounded in time and space and
could have a deep impact on civil society
● Performing real-time events detection from social
media sources requires tools and a
cross-disciplinary framework.
● Methods to detect and track natural hazards by
using public informative continue to increase in
coverage, resolution and reliability.
Background
River of Words is a project by writer Israel
Centeno and visual artists Carolina Arnal and
Gisela Romero Pittsburg
● Report a 4 year experience on analysis of Twitter
social media (SM) streams related to weather.
● Share background needed for an effective information
extraction from public SM flows paying attention to a
fast and real detection of impacting weather events in
order to increase situational awareness.
● Showing some empirical evidences to help implement
a real open toolchain leading to a visual dashboard for
regional/national weather services.
Aims of work
Social Media Public Sources: crowd-sensing by using Twitter data
Who feed information in ?
Citizen, Institution, Institutional Public Services, Business
Companies, Community - NGO organization,
Media,Conversational Bots and Sensors Bots.
Many and various
On-line multilanguage platform for
social-networking and microblogging.
Twitter data perform significant crowd-sensing.
6.4 million of active users in Italy (2015)
RTW_TW: N° of tweets & retweets
RTW: N° of tweets & retweets
TW: N of native tweets & retweets
U_native_users: N° of native TW authors
U_full_user: N° authors of TW & RTW
U_unique_hashtag: N° of hash-TAG
Daily activity common metrics
https://github.com/alfcrisci/rTwChannel
Metrics of activity
Open Source tools
DATA EXTRACTION by word query : the Twitter Vigilance DISIT Model
Twitter VIgilance platform is an environment
developed by DISIT University of Florence
that:
● Manage multiple queries in twitter API
● Store the data of messages collected by user
defined queries -> channel
● It is a dashboard able to visualize data collecting
process & analytics of twitter metrics of channel
http://www.disit.org/tv/
Social media data: Channel “CALDO”
http://www.disit.org/tv/index.php?p=chart_singlechannel&canale=CALDO
EVENT DETECTION in social media: fishing words/tags in a FLOOD of in dropped messages
EVENT DETECTION in social media: words/tags creates “word drainage” channels
#maltempo
@meteousers
#allertameteo
Different queries
correspond to different
data extraction volumes.
Each one can be
summarized as daily
statistics by using key
activity metric and
obtaining different
representable flows in
function of the querying
parameters (terms,
hashtags (#) or users
(@).
DATA EXTRACTION : Twitter corpora have the highest lexical diversity
Modeling Word Meaning: Distributional Semantics and the Corpus
Quality-Quantity Trade-Off
S Sridharan, B Murphy
Proceedings of the 24th International Conference on Computational Linguistics
Twitter shows many OOV
out-of-vocabulary terms that
change in function of the events.
Considering specific features of
Twitter textual data it is ever hard to
mine conversations.
DATA EXTRACTION : Terms of channel have different extractive capabilities
More “crowd” corpora More “oriented community based”
corpora
Italian Corpus
repubblica.it
Maltempo
Channel
Weather
services TW
Channel
Weather People
communities
TW Channel
Semantic capabilities ( N of
connection) of words are
different
and their power of extraction
in the Twitter streams
differs in function to
Users sets.
In popular streams wide
sense words seems also
high frequency words and
are able extract more
messages.
Babelnet.org metrics are
used for meteorology
category.
Semantic network of word sense: word “PIOGGIA” from Babelnet.org
EVENT IDENTIFICATION: Social media and real-world event condition for linking
Identification and
Characterization of Events in
Social Media
Hila Becker
Thesis
Columbia University 2011
http://www.cs.columbia.edu/~hila/
publications.html
● Document stream: a time ordered sequences of featured
documents Dev
● Trending time: a trending time period for a feature over a
document stream is time period where document
frequency of the features in document stream is
substantially higher than expected.
● Trending event : is a real-world occurrence with an (1)
associated time ( Tev),(2) a stream document (Dev) about
the occurrence and published during Tev and (3) one or
more features that describes the occurrence and for
which Tev is trending time period over document stream
(Dev).
EVENT DETECTION in social media: what is a Twitter channel?
● Channel is a conversation stream obtained by a recursive API query of tweets (Twitter
API 1.1)
● Channel produces a dataset with a time dimension.
● Channel has a set of contributors (original unique authors or amplifiers by re-tweet
mechanism).
● Channel is a virtual conversational space among different communities.
● Channel is a textual corpus with specific linguistic and semantic properties.
● Channel has proper activity dynamics and behaviours regarding social media
mechanisms (retweeting, mentioning, tagging, media and linking content)
● Channel could be representative of a specific web-community or not.
● Semantic oriented channels could represent a set of topic specific informations.
EVENT DETECTION. Social media and real-world event: conditions for their link.
As a flood needs rain, in social media
real-impact events needs documents.
Quantitative relation relies on some factors:
● Population density of the area impacted
● Level of awareness of people involved
● Digital literacy of impacted population
● Institutional preparedness to weather related
risks
EVENT DETECTION : 12-08-2015 Twitter channels for Rossano (Calabria) flood event
Image source :
http://www.caritasitaliana.it/pls/caritasitaliana/
EVENT DETECTION : 01-10-2015 Twitter channels for Olbia (Sardinia) flood event
Image source
http://www.ilpost.it/2013/11/19/foto-alluvione-sardegna/alluvione-sardegna-14/
EVENT DETECTION : 01-08-2015 twitter channels for Firenze Toscana burst event
Image Sources:
http://aldopiombino.blogspot.it/2015/08/il-downburst-del-temporale-di-fir
enze.html
Channel “Caldo” and Heatwave periods 2015 (15 May to 15 September)
A
B
C
D
E
Framing conjectures for detection of weather events
● For weather events that have a slow impact “trending
periods” of SM activity work well.
● For the detection of fast and sudden events the
synchronization of trending periods could be exploited by
using search terms with different semantic extents.
● During synchronization time document frequency reach its
relative maximum creating a well recognizable pattern and
event is clearly detected (half onion effect).
● Geographical terms show their importance because are good
proxy of event ‘s situational awareness.
● During severe events trending topics generally contain the
name of the places involved.
http://www.shutterstock.com/pic-86262523/stock-
photo-a-red-onion-sliced-in-half-isolated-on-white
-background.html
● Public & accessible social media data could be considered as a real huge informative data stream
concerning severe weather events or other climatic threats.
● Social media (SM) data acquisition, storage and filtering to obtain reliable data collections requires
huge work and dedicated tools. It is a real challenge for open source developers.
● Finding appropriate query-terms to obtain suitable data. The semantic tuning of TW channels is
ever required. Platforms as TwitterVigilance are suitable example.
● Weather events are different in space, time and atmospheric processes involved. Events detection
requires a specific search strategy. Appropriate use of semantic features of words point to
interesting directions.
Conclusion
Contacts
Image from :https://commons.wikimedia.org/wiki/File:Thank-you-word-cloud.jpg
Communication analyst
Valentina Grasso
grasso@lamma.rete.toscana.it
v.grasso@ibimet.cnr..it
Federica Zabini
zabini@lammarete.toscana.it
http://www.lamma.rete.toscana.it
http://www.ibimet.cnr.it/
Biometeorology
Alfonso Crisci
a.crisci@ibimet.cnr.it
alfcrisci@gmail.com
@alf_crisci
Informatics Science
Imad Zaza
Imad_zaza@unifi.it
Paolo Nesi
paolo.nesi@unifi.it
Gianni Pantaleo
gianni.pantaleo@unifi.it
http://www.disit.org/
http://www.disit.org/tv/
Data & Code:
https://github.com/alfcrisci/ogrs_2016_weathersocial_paper
This study was carried out in the field of the CARISMAND Project:
Culture And RISkmanagement in Man-made And Natural Disasters
http://www.carismand.eu/

Más contenido relacionado

Similar a Weather events identification in social media streams: tools to detect their evidence in Twitter

Expelling Information of Events from Critical Public Space using Social Senso...
Expelling Information of Events from Critical Public Space using Social Senso...Expelling Information of Events from Critical Public Space using Social Senso...
Expelling Information of Events from Critical Public Space using Social Senso...
ijtsrd
 
Eswc2013 audience short
Eswc2013 audience shortEswc2013 audience short
Eswc2013 audience short
Claudia Wagner
 
Beyond-Data-Literacy-2015
Beyond-Data-Literacy-2015Beyond-Data-Literacy-2015
Beyond-Data-Literacy-2015
Amanda noonan
 

Similar a Weather events identification in social media streams: tools to detect their evidence in Twitter (20)

Fusing text and image for event
Fusing text and image for eventFusing text and image for event
Fusing text and image for event
 
Twitter Intelligent Sensor Agent
Twitter Intelligent Sensor AgentTwitter Intelligent Sensor Agent
Twitter Intelligent Sensor Agent
 
From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?
 
Eavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteEavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging Site
 
SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...
SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...
SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...
 
Expelling Information of Events from Critical Public Space using Social Senso...
Expelling Information of Events from Critical Public Space using Social Senso...Expelling Information of Events from Critical Public Space using Social Senso...
Expelling Information of Events from Critical Public Space using Social Senso...
 
Strategic perspectives 3
Strategic perspectives 3Strategic perspectives 3
Strategic perspectives 3
 
Twitter, Public Communication and the Media Ecology: The Case of the Queensla...
Twitter, Public Communication and the Media Ecology: The Case of the Queensla...Twitter, Public Communication and the Media Ecology: The Case of the Queensla...
Twitter, Public Communication and the Media Ecology: The Case of the Queensla...
 
Public crowd-sensing of heat-waves by social media data
Public crowd-sensing of heat-waves by social media dataPublic crowd-sensing of heat-waves by social media data
Public crowd-sensing of heat-waves by social media data
 
s00146-014-0549-4.pdf
s00146-014-0549-4.pdfs00146-014-0549-4.pdf
s00146-014-0549-4.pdf
 
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
 
Processing Large Complex Data
Processing Large Complex DataProcessing Large Complex Data
Processing Large Complex Data
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
 
Pestle based event detection and classification
Pestle based event detection and classificationPestle based event detection and classification
Pestle based event detection and classification
 
Eswc2013 audience short
Eswc2013 audience shortEswc2013 audience short
Eswc2013 audience short
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detection
 
Beyond-Data-Literacy-2015
Beyond-Data-Literacy-2015Beyond-Data-Literacy-2015
Beyond-Data-Literacy-2015
 
Evolving social data mining and affective analysis
Evolving social data mining and affective analysis  Evolving social data mining and affective analysis
Evolving social data mining and affective analysis
 
Mapping Online Publics: Researching the Uses of Twitter
Mapping Online Publics: Researching the Uses of TwitterMapping Online Publics: Researching the Uses of Twitter
Mapping Online Publics: Researching the Uses of Twitter
 
Dealing with Information Overload When Using Social Media for Emergency Manag...
Dealing with Information Overload When Using Social Media for Emergency Manag...Dealing with Information Overload When Using Social Media for Emergency Manag...
Dealing with Information Overload When Using Social Media for Emergency Manag...
 

Más de Alfonso Crisci

Mappiamo biodiversita
Mappiamo biodiversitaMappiamo biodiversita
Mappiamo biodiversita
Alfonso Crisci
 

Más de Alfonso Crisci (20)

monitoraggio isola di calore
monitoraggio isola di caloremonitoraggio isola di calore
monitoraggio isola di calore
 
Terrazzi
TerrazziTerrazzi
Terrazzi
 
Mappatura_hotspot_termici
Mappatura_hotspot_termiciMappatura_hotspot_termici
Mappatura_hotspot_termici
 
Ecosemiotica del territorio
Ecosemiotica del territorioEcosemiotica del territorio
Ecosemiotica del territorio
 
Complessità nascoste
Complessità nascosteComplessità nascoste
Complessità nascoste
 
Sistemi digitali per la geolocalizzazione di piante aromatiche e medicinali
Sistemi digitali per la geolocalizzazione di piante aromatiche e medicinali Sistemi digitali per la geolocalizzazione di piante aromatiche e medicinali
Sistemi digitali per la geolocalizzazione di piante aromatiche e medicinali
 
Mappiamo biodiversita
Mappiamo biodiversitaMappiamo biodiversita
Mappiamo biodiversita
 
Resilienza climatica
Resilienza climaticaResilienza climatica
Resilienza climatica
 
La città che scotta: le prospettive e i dati per la valutazione della resilie...
La città che scotta: le prospettive e i dati per la valutazione della resilie...La città che scotta: le prospettive e i dati per la valutazione della resilie...
La città che scotta: le prospettive e i dati per la valutazione della resilie...
 
Italian weather type
Italian weather typeItalian weather type
Italian weather type
 
Not only rome burns
Not only rome burnsNot only rome burns
Not only rome burns
 
Cemento e l'eroica vendetta del letame
Cemento e l'eroica vendetta del letameCemento e l'eroica vendetta del letame
Cemento e l'eroica vendetta del letame
 
#SoilDay Roma
#SoilDay Roma #SoilDay Roma
#SoilDay Roma
 
Heat Wave risk mapping in Europe for elderly people
Heat Wave risk mapping in Europe for elderly peopleHeat Wave risk mapping in Europe for elderly people
Heat Wave risk mapping in Europe for elderly people
 
IBIMET Heat WAVE resiliency
IBIMET Heat WAVE resiliency IBIMET Heat WAVE resiliency
IBIMET Heat WAVE resiliency
 
Flyers Smart Cities and Big Data
Flyers Smart Cities and Big Data Flyers Smart Cities and Big Data
Flyers Smart Cities and Big Data
 
Smart cities and Big data DISIT
Smart cities and Big data DISITSmart cities and Big data DISIT
Smart cities and Big data DISIT
 
A. albopictus: la piattaforma digitale dedicata all’ascolto e alla visualizza...
A. albopictus: la piattaforma digitale dedicata all’ascolto e alla visualizza...A. albopictus: la piattaforma digitale dedicata all’ascolto e alla visualizza...
A. albopictus: la piattaforma digitale dedicata all’ascolto e alla visualizza...
 
RedLav meeting
RedLav meetingRedLav meeting
RedLav meeting
 
Lana e Comfort Termico: un esperienza concreta in Toscana
Lana e Comfort Termico: un esperienza concreta in ToscanaLana e Comfort Termico: un esperienza concreta in Toscana
Lana e Comfort Termico: un esperienza concreta in Toscana
 

Último

Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
Bhagirath Gogikar
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 

Último (20)

Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicineIDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicine
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 

Weather events identification in social media streams: tools to detect their evidence in Twitter

  • 1. Weather events identification in social media streams: tools to detect their evidence in Twitter Alfonso Crisci(1) Valentina Grasso (1,2), Imad Zaza (3) , Federica Zabini (1,2), Paolo Nesi (3) and Gianni Pantaleo (3) (1) CNR Ibimet, Italian National Research Council, Florence, Italy (v.grasso@ibimet.cnr.it) (2) LaMMA Consortium, Italian National Research Council, Florence, Italy. (3) DISIT Lab, Distributed [Systems and internet | Data Intelligence and] Technologies Lab, Dep. of Information Engineering (DINFO), University of Florence, Italy Where the River Rises: A River of Words. Source Going Deeper or Flatter: Connecting Deep Mapping, Flat Ontologies and the Democratizing of Knowledge Humanities 2015, 4(4), 623-636; doi:10.3390/h4040623 http://selinaspringett.com/wp/?p=166 OGRS'16, October 12-14, 2016, Perugia, Italy Open Source Geospatial Research & Education Symposium OGRS is a meeting dedicated to sharing knowledge, new solutions, methods, practices, ideas and trends in the field of geospatial information through the development and the use of free and open source software in both research and education.
  • 2. ● Social media data become a powerful real time informative source to monitor the impacts of meteoclimatic events. ● The reliability of social media activity and content on weather related natural risk is a current topic. ● Weather events are bounded in time and space and could have a deep impact on civil society ● Performing real-time events detection from social media sources requires tools and a cross-disciplinary framework. ● Methods to detect and track natural hazards by using public informative continue to increase in coverage, resolution and reliability. Background River of Words is a project by writer Israel Centeno and visual artists Carolina Arnal and Gisela Romero Pittsburg
  • 3. ● Report a 4 year experience on analysis of Twitter social media (SM) streams related to weather. ● Share background needed for an effective information extraction from public SM flows paying attention to a fast and real detection of impacting weather events in order to increase situational awareness. ● Showing some empirical evidences to help implement a real open toolchain leading to a visual dashboard for regional/national weather services. Aims of work
  • 4. Social Media Public Sources: crowd-sensing by using Twitter data Who feed information in ? Citizen, Institution, Institutional Public Services, Business Companies, Community - NGO organization, Media,Conversational Bots and Sensors Bots. Many and various On-line multilanguage platform for social-networking and microblogging. Twitter data perform significant crowd-sensing. 6.4 million of active users in Italy (2015) RTW_TW: N° of tweets & retweets RTW: N° of tweets & retweets TW: N of native tweets & retweets U_native_users: N° of native TW authors U_full_user: N° authors of TW & RTW U_unique_hashtag: N° of hash-TAG Daily activity common metrics https://github.com/alfcrisci/rTwChannel Metrics of activity Open Source tools
  • 5. DATA EXTRACTION by word query : the Twitter Vigilance DISIT Model Twitter VIgilance platform is an environment developed by DISIT University of Florence that: ● Manage multiple queries in twitter API ● Store the data of messages collected by user defined queries -> channel ● It is a dashboard able to visualize data collecting process & analytics of twitter metrics of channel http://www.disit.org/tv/ Social media data: Channel “CALDO” http://www.disit.org/tv/index.php?p=chart_singlechannel&canale=CALDO
  • 6. EVENT DETECTION in social media: fishing words/tags in a FLOOD of in dropped messages
  • 7. EVENT DETECTION in social media: words/tags creates “word drainage” channels #maltempo @meteousers #allertameteo Different queries correspond to different data extraction volumes. Each one can be summarized as daily statistics by using key activity metric and obtaining different representable flows in function of the querying parameters (terms, hashtags (#) or users (@).
  • 8. DATA EXTRACTION : Twitter corpora have the highest lexical diversity Modeling Word Meaning: Distributional Semantics and the Corpus Quality-Quantity Trade-Off S Sridharan, B Murphy Proceedings of the 24th International Conference on Computational Linguistics Twitter shows many OOV out-of-vocabulary terms that change in function of the events. Considering specific features of Twitter textual data it is ever hard to mine conversations.
  • 9. DATA EXTRACTION : Terms of channel have different extractive capabilities More “crowd” corpora More “oriented community based” corpora Italian Corpus repubblica.it Maltempo Channel Weather services TW Channel Weather People communities TW Channel Semantic capabilities ( N of connection) of words are different and their power of extraction in the Twitter streams differs in function to Users sets. In popular streams wide sense words seems also high frequency words and are able extract more messages. Babelnet.org metrics are used for meteorology category.
  • 10. Semantic network of word sense: word “PIOGGIA” from Babelnet.org
  • 11. EVENT IDENTIFICATION: Social media and real-world event condition for linking Identification and Characterization of Events in Social Media Hila Becker Thesis Columbia University 2011 http://www.cs.columbia.edu/~hila/ publications.html ● Document stream: a time ordered sequences of featured documents Dev ● Trending time: a trending time period for a feature over a document stream is time period where document frequency of the features in document stream is substantially higher than expected. ● Trending event : is a real-world occurrence with an (1) associated time ( Tev),(2) a stream document (Dev) about the occurrence and published during Tev and (3) one or more features that describes the occurrence and for which Tev is trending time period over document stream (Dev).
  • 12. EVENT DETECTION in social media: what is a Twitter channel? ● Channel is a conversation stream obtained by a recursive API query of tweets (Twitter API 1.1) ● Channel produces a dataset with a time dimension. ● Channel has a set of contributors (original unique authors or amplifiers by re-tweet mechanism). ● Channel is a virtual conversational space among different communities. ● Channel is a textual corpus with specific linguistic and semantic properties. ● Channel has proper activity dynamics and behaviours regarding social media mechanisms (retweeting, mentioning, tagging, media and linking content) ● Channel could be representative of a specific web-community or not. ● Semantic oriented channels could represent a set of topic specific informations.
  • 13. EVENT DETECTION. Social media and real-world event: conditions for their link. As a flood needs rain, in social media real-impact events needs documents. Quantitative relation relies on some factors: ● Population density of the area impacted ● Level of awareness of people involved ● Digital literacy of impacted population ● Institutional preparedness to weather related risks
  • 14. EVENT DETECTION : 12-08-2015 Twitter channels for Rossano (Calabria) flood event Image source : http://www.caritasitaliana.it/pls/caritasitaliana/
  • 15. EVENT DETECTION : 01-10-2015 Twitter channels for Olbia (Sardinia) flood event Image source http://www.ilpost.it/2013/11/19/foto-alluvione-sardegna/alluvione-sardegna-14/
  • 16. EVENT DETECTION : 01-08-2015 twitter channels for Firenze Toscana burst event Image Sources: http://aldopiombino.blogspot.it/2015/08/il-downburst-del-temporale-di-fir enze.html
  • 17. Channel “Caldo” and Heatwave periods 2015 (15 May to 15 September) A B C D E
  • 18. Framing conjectures for detection of weather events ● For weather events that have a slow impact “trending periods” of SM activity work well. ● For the detection of fast and sudden events the synchronization of trending periods could be exploited by using search terms with different semantic extents. ● During synchronization time document frequency reach its relative maximum creating a well recognizable pattern and event is clearly detected (half onion effect). ● Geographical terms show their importance because are good proxy of event ‘s situational awareness. ● During severe events trending topics generally contain the name of the places involved. http://www.shutterstock.com/pic-86262523/stock- photo-a-red-onion-sliced-in-half-isolated-on-white -background.html
  • 19. ● Public & accessible social media data could be considered as a real huge informative data stream concerning severe weather events or other climatic threats. ● Social media (SM) data acquisition, storage and filtering to obtain reliable data collections requires huge work and dedicated tools. It is a real challenge for open source developers. ● Finding appropriate query-terms to obtain suitable data. The semantic tuning of TW channels is ever required. Platforms as TwitterVigilance are suitable example. ● Weather events are different in space, time and atmospheric processes involved. Events detection requires a specific search strategy. Appropriate use of semantic features of words point to interesting directions. Conclusion
  • 20. Contacts Image from :https://commons.wikimedia.org/wiki/File:Thank-you-word-cloud.jpg Communication analyst Valentina Grasso grasso@lamma.rete.toscana.it v.grasso@ibimet.cnr..it Federica Zabini zabini@lammarete.toscana.it http://www.lamma.rete.toscana.it http://www.ibimet.cnr.it/ Biometeorology Alfonso Crisci a.crisci@ibimet.cnr.it alfcrisci@gmail.com @alf_crisci Informatics Science Imad Zaza Imad_zaza@unifi.it Paolo Nesi paolo.nesi@unifi.it Gianni Pantaleo gianni.pantaleo@unifi.it http://www.disit.org/ http://www.disit.org/tv/ Data & Code: https://github.com/alfcrisci/ogrs_2016_weathersocial_paper
  • 21. This study was carried out in the field of the CARISMAND Project: Culture And RISkmanagement in Man-made And Natural Disasters http://www.carismand.eu/