Social media platforms have become key portals for sharing and consuming information during crisis situations. However, humanitarian organisations and affected communities often struggle to sift through the large volumes of data shared on such platforms during crises to determine which posts are truly relevant to the crisis and which are not. Previous work on automatically classifying crisis information focused mostly on statistical features. However, such approaches tend to perform poorly on a type of crisis the model was not trained on, such as classifying information about a train crash with a classifier trained on floods, earthquakes, and typhoons. In such cases, the model must be retrained, which is costly and time-consuming.
In this paper, we explore the impact of semantics in classifying Twitter posts across the same, and different, types of crises. We experiment with 26 crisis events, using a hybrid system that combines statistical features with various semantic features extracted from external knowledge bases. We show that adding semantic features has no noticeable benefit over statistical features when classifying same-type crises, whereas it improves classifier performance by up to 7.2% when classifying information about a new type of crisis.
2. www.comrades-project.eu
Motivation
Example posts and their crisis types:
- 'People of NSW, be careful because there's fires spreading! Stay safe everyone!' – Wildfire
- 'Hundreds of volunteers in Mexico tried to unearth children they hoped were still alive beneath a school's ruins' – Earthquake
- 'Two trucks and one car in the water after a road collapse at Hwy 287 and Dillon. #cowx #boulderflood' – Floods
3. www.comrades-project.eu
Motivation
Challenges
- A flood of data gets generated. For example: over a million tweets were posted during the 2017 Hurricane Harvey, and tweet volume increased by 500% during the 2011 Japan earthquake.
- It is almost impossible to manually absorb and process such sheer volumes.
- In addition, characteristics of social media posts such as short length, colloquialism, and syntactic issues pose additional challenges for processing the data.
4. www.comrades-project.eu
Motivation
FEMA launched an initiative to use public social media data for situational awareness purposes [1].
1: https://www.dhs.gov/sites/default/files/publications/privacy-pia-FEMA-OUSM-April2016.pdf
Image source – fema.gov
7. www.comrades-project.eu
Key Problem – Broad Spectrum of Data
The diverse range of situations results in a broad spectrum of content:
People of NSW, be careful because
there's fires spreading! Stay safe
everyone!
BREAKING: Reports of shots fired at
LAX Airport, says senior government
official.
Two trucks and one car in the water
after a road collapse at Hwy 287 and
Dillon. #cowx #boulderflood
Report: Between 3 and 5 firefighters
missing following massive blast at West,
Texas, fertilizer plant, police say
Hundreds of volunteers in Mexico
tried to unearth children they hoped
were still alive beneath a school's
ruins during earthquake
Casualties from 7.2 #earthquake in
the #Philippines is now 20+ according
to authorities.
8. www.comrades-project.eu
Access Relevant Information Across
Crisis Situations
• How do we handle information overload?
• How do we identify relevant and irrelevant
information across diverse crisis situations?
• Can we learn from one type of crisis situation, and
identify relevant information in another type?
9. www.comrades-project.eu
Previous Efforts - Identifying Crisis Related
Information
• ML Classification Methods:
Supervised Approaches: Often making use of n-grams,
linguistic features, and/or statistical features of tweets.
Unsupervised Approaches: Keyword processing and
clustering.
• Semantic Models:
Representation of the information emerging from Crisis
Events, providing faceted search of crisis related
information.
10. www.comrades-project.eu
Hypothesis and Aim
• Hypothesis:
Semantics establish consistency across various types of crisis situations, thereby enabling the identification of relevant information, and can enhance the discriminative power of classification systems.
• Go beyond statistical features and n-grams, and incorporate contextual semantics alongside the statistical features.
11. www.comrades-project.eu
Statistical Features
Example of statistical features:
- Text length.
- Number of words.
- Presence and count of various Parts of Speech (PoS).
- Data specific features such as hashtags (in tweets).
- E.g., #neworleans #nola #algiers #nolafood #hurricanekatrina.
- Readability score (Gunning Fog Index, using average sentence length (ASL) and percentage of complex words (PCW): 0.4 * (ASL + PCW)).
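The statistical features above are cheap to compute directly from the tweet text. A minimal sketch in Python, with our own helper names; since the slide does not define "complex words", we assume the common vowel-group heuristic as a rough syllable proxy:

```python
import re

def gunning_fog(text, min_syllables=3):
    """Readability per the slide's formula: 0.4 * (ASL + PCW), where ASL is
    the average sentence length in words and PCW the percentage of complex
    words. 'Complex' here means >= 3 vowel groups, a rough syllable proxy
    (an assumption; the slide does not define complex words)."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    asl = len(words) / len(sentences)
    complex_words = [w for w in words
                     if len(re.findall(r'[aeiouy]+', w.lower())) >= min_syllables]
    pcw = 100.0 * len(complex_words) / len(words)
    return 0.4 * (asl + pcw)

def statistical_features(tweet):
    """A few of the statistical features named on the slide."""
    tokens = tweet.split()
    return {
        'length': len(tweet),                                  # text length
        'n_tokens': len(tokens),                               # number of words/tokens
        'n_hashtags': sum(t.startswith('#') for t in tokens),  # hashtag count
        'readability': gunning_fog(tweet),                     # Gunning Fog score
    }
```

The PoS counts from the slide would additionally require a tagger, so they are omitted here.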
12. www.comrades-project.eu
Semantic Features
Example of semantic features:
Additional information about terms found in the tweets can
be extracted using NER tools, entity linking tools, and
semantic databases:
- Entity linking to a knowledge base.
- Co-occurring words (from a data corpus).
- Synset sense (WordNet).
- Hierarchical context: hypernyms, synonyms.
- DBpedia properties.
13. www.comrades-project.eu
Extracting Semantics
Available tools for entity extraction and knowledge expansion:
NER
DBpedia Spotlight
Alchemy (IBM)
Babelfy (BabelNet)
Text Razor NLP API
Aylien Text Analysis API
Knowledge Bases
DBpedia
YAGO
BabelNet
WordNet
Google Knowledge Graph
Wikidata
14. www.comrades-project.eu
Babelfy and BabelNet
BabelNet – a multilingual lexicalised semantic network formed by combining various knowledge resources: WordNet, Wikipedia, Wiktionary, OmegaWiki, etc. It enables multilingual NLP applications and can be used for word sense disambiguation and entity linking via Babelfy.
Babelfy – a word sense disambiguation and entity linking API built on top of BabelNet.
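In code, the enrichment step amounts to one call to Babelfy's HTTP endpoint and a walk over the returned annotations. A sketch with our own helper names; the field names follow Babelfy's JSON output (with `charFragment` end offsets assumed inclusive), and the synset id in the test below is purely illustrative:

```python
# Sketch of enriching a tweet via the Babelfy HTTP API (babelfy.io).
# The real request needs a (free) API key; here we only show the call
# shape and parse a response-shaped record.

def babelfy_request_params(text, key, lang='EN'):
    """Query parameters for GET https://babelfy.io/v1/disambiguate."""
    return {'text': text, 'lang': lang, 'key': key}

def extract_annotations(tweet, response):
    """Pull (surface form, BabelNet synset id, DBpedia URL) triples out of
    a Babelfy JSON response (a list of annotation dicts); the character
    offsets in 'charFragment' are taken as inclusive."""
    triples = []
    for ann in response:
        frag = ann['charFragment']
        surface = tweet[frag['start']:frag['end'] + 1]
        triples.append((surface, ann['babelSynsetID'], ann.get('DBpediaURL', '')))
    return triples
```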
15. www.comrades-project.eu
Features extracted
Statistical Features
Number of Nouns, Verbs, Pronouns
Tweet Length
Number of words/tokens
Number of Hashtags
Semantic Features
BabelNet Semantics
BabelNet Sense: English labels of entities identified via Babelfy.
BabelNet Hypernym: Direct English hypernyms of each entity (at distance 1).
DBpedia Semantics: List of properties associated with the DBpedia URI returned by Babelfy:
subject, label, type, city, state, country
16. www.comrades-project.eu
Semantic Enrichment – Broader Perspective
Post A: 'No confirmed casualties yet from landslide reported in Compostela Valley. #PabloPH'
Post B: 'News: Italy quake victims given shelter http://t.co/cXQEusVm via @BBC'

Babelfy entity senses (English):
- Post A: confirm, casualty, report, landslide
- Post B: Italy, earthquake, victim, shelter, news

Hypernyms (English):
- Post A: victim, affirm, flood, seismology, geology, soil slide, announce, disaster, natural disaster, geological phenomenon
- Post B: natural disaster, geological phenomenon, broadcasting, communication, nation, country, unfortunate

DBpedia:
- Post A: dbc:landslide, dbr:landslide, dbo:place, dbc:Geological hazards, dbc:Seismology
- Post B: dbc:Geological hazards, dbc:Seismology, dbr:Earthquake, dbc:Communication, dbr:News
17. www.comrades-project.eu
Method
• Collect data from CrisisLex.org, a collection of crisis-oriented tweets.
• Extract statistical features.
• Semantically enrich the tweets via annotation using the Babelfy API.
• Expand the semantics by incorporating hypernyms through BabelNet.
• Retrieve DBpedia features through the SPARQL endpoint.
• Classify using an SVM classifier.
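The DBpedia step of this pipeline amounts to one SPARQL query per entity URI against the public endpoint. A minimal illustration; the endpoint URL and predicates are standard DBpedia/RDF terms, while the helper itself is ours and only builds the query string:

```python
# Public DBpedia SPARQL endpoint (queries would be POSTed/GET here).
DBPEDIA_SPARQL = 'https://dbpedia.org/sparql'

def properties_query(resource_uri):
    """SPARQL retrieving dct:subject, rdfs:label and rdf:type values for a
    resource URI returned by Babelfy."""
    return f"""
    SELECT ?p ?o WHERE {{
      <{resource_uri}> ?p ?o .
      FILTER (?p IN (<http://purl.org/dc/terms/subject>,
                     <http://www.w3.org/2000/01/rdf-schema#label>,
                     <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>))
    }}
    """
```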
18. www.comrades-project.eu
Data
• CrisisLexT26
• 26 crisis events with 1000 labelled tweets in each event.
• 4 Labels: Related & Informative, Related & Not
Informative, Not Related, and Not Applicable.
• Merged Related & Informative and Related & Not Informative into Related.
• Merged Not Related and Not Applicable into Not Related.
19. www.comrades-project.eu
Data
• After removing duplicates: 21378 Related and 2965 Not
Related.
• To prevent bias, we use a balanced dataset:
• Selected the same number of Related tweets as Not Related in each event.
• Final figures: 2966 Related and 2965 Not Related.
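The per-event balancing described above can be sketched as a small downsampling routine. The record layout with 'event' and 'label' keys is our assumption, not the dataset's actual format:

```python
import random
from collections import defaultdict

def balance_per_event(tweets, seed=42):
    """Downsample Related tweets to the Not Related count within each event,
    as described on the slide. `tweets` is a list of dicts with 'event' and
    'label' keys (an assumed record shape)."""
    random.seed(seed)
    by_event = defaultdict(lambda: {'Related': [], 'Not Related': []})
    for t in tweets:
        by_event[t['event']][t['label']].append(t)
    balanced = []
    for groups in by_event.values():
        n = len(groups['Not Related'])  # minority class size per event
        balanced += random.sample(groups['Related'], min(n, len(groups['Related'])))
        balanced += groups['Not Related']
    return balanced
```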
20. www.comrades-project.eu
Data
Event                          Related  Not Related  Total
CWF  Colorado Wildfire             242          242    484
COS  Costa Rica Earthquake         470          470    940
GAU  Guatemala Earthquake          103          103    206
ITL  Italy Earthquake               56           56    112
PHF  Philippines Flood              70           70    140
TYP  Typhoon P                      88           88    176
VNZ  Venezuela Fire                 60           60    120
ALB  Alberta Flood                  16           16     32
ABF  Australia Bushfire            183          183    366
BOL  Bohol Earthquake               31           31     62
BOB  Boston Bomb                    69           69    138
BRZ  Brazil Fire                    44           44     88
CFL  Colorado Flood                 61           61    122
GLW  Glasgow Crash                 110          110    220
LAX  LA Shootout                   112          112    224
LAM  Train Crash                    34           34     68
MNL  Manila Flood                   74           74    148
NYT  NY Train Crash                  2            1      3
QFL  Queensland Flood              278          278    556
RUS  Russia Meteor                 241          241    482
SAR  Sardinia Flood                 67           67    134
SVR  Savar Building                305          305    610
SGR  Singapore Haze                 54           54    108
SPT  Spain Train Crash               8            8     16
TPY  Typhoon Y                     107          107    214
WTX  West Texas Explosion           81           81    162
21. www.comrades-project.eu
Data – Event Type Distribution
Event Type (count): Events
- Wildfire/Bushfire (2): CWF, ABF
- Earthquakes (4): COS, ITL, BOL, GAU
- Flood/Typhoons (8): TPY, TYP, CFL, QFL, ALB, PHF, SAR, MNL
- Terror Shooting/Bombing (2): LAX, BOB
- Train Crash (2): SPT, LAM
- Meteor (1): RUS
- Haze (1): SGR
- Helicopter Crash (1): GLW
- Building Collapse (1): SVR
- Location Fire (2): BRZ, VNZ
- Explosion (1): WTX

Crisis type distribution (pie chart over the types above).
22. www.comrades-project.eu
Experiment Design
Feature Models:
Statistical Features (SF – baseline)
Statistical Features + BabelNet Semantics (SF + SemEF_BN)
Statistical Features + DBpedia Semantics (SF + SemEF_DB)
Statistical Features + BabelNet Semantics + DBpedia Semantics (SF + SemEF_BNDB)
Crisis Classification Model
Merge the data from all events and perform 20 iterations of 5-fold cross-validation across all the feature models to evaluate their performance.
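The 20 x 5-fold protocol can be reproduced with a small index generator. A pure-Python sketch with our own function name; scikit-learn's RepeatedKFold offers equivalent behaviour:

```python
import random

def repeated_kfold_indices(n_samples, k=5, iterations=20, seed=0):
    """Yield (train_idx, test_idx) splits for `iterations` independently
    shuffled rounds of k-fold cross-validation -- the 20 x 5-fold protocol
    described on the slide."""
    rng = random.Random(seed)
    for _ in range(iterations):
        idx = list(range(n_samples))
        rng.shuffle(idx)                       # fresh shuffle each iteration
        folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, test
```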
Statistical Features (SF), BabelNet Semantics (SemEF_BN), DBpedia Semantics (SemEF_DB), BabelNet and DBpedia Semantics (SemEF_BNDB)
23. www.comrades-project.eu
Experiment Design
Cross Crisis Classification
Criteria 1 – content relatedness classification of an already seen crisis event type.
The type of the test data already exists in the training data.
E.g., a classifier trained on data containing tweets/documents from flood events (along with other event types) is used to classify data from a new flood crisis event.
24. www.comrades-project.eu
Experiment Design
Cross Crisis Classification
Criteria 2 – content relatedness classification of an unseen crisis event type.
The type of the test data does not exist in the training data.
E.g., a classifier trained on data containing tweets/documents from all crisis event types except building collapse events is used to classify data from such an event.
To classify: "With death toll at 300, Bangladesh factory collapse becomes worst tragedy in garment industry history"
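The Criteria 2 split can be sketched as a leave-one-type-out selection over an event-to-type map. The map below reproduces part of the type table from the earlier slide; the helper name is ours:

```python
# Partial event -> type map taken from the dataset's type-distribution slide
# (floods, earthquakes, and one building collapse shown here).
EVENT_TYPES = {
    'QFL': 'flood', 'ALB': 'flood', 'PHF': 'flood', 'SAR': 'flood',
    'COS': 'earthquake', 'ITL': 'earthquake', 'BOL': 'earthquake',
    'GAU': 'earthquake', 'SVR': 'building_collapse',
}

def unseen_type_split(test_event, event_types=EVENT_TYPES):
    """Return (train_events, test_event) such that no event sharing the
    test event's type appears in training (Criteria 2)."""
    held_out_type = event_types[test_event]
    train = [e for e, t in event_types.items() if t != held_out_type]
    return train, test_event
```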
25. www.comrades-project.eu
Experiment
• Classifier Selection
Support Vector Machine with Linear Kernel
Chosen after determining its significant performance advantage over the RBF kernel, polynomial kernel, and logistic regression, via 20 iterations of 5-fold CV over the entire data.
• Tools & Library
Scikit-learn Library
Python 2.7
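Putting the pieces together, a sketch of the classification setup with scikit-learn: semantic labels are concatenated onto the tweet text before n-gram tokenisation (as described in the talk), and a linear-kernel SVM is trained. The tweets and semantic labels below are illustrative toy data, not the CrisisLexT26 corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def enrich(tweet, semantics):
    """Append semantic labels (e.g. BabelNet hypernyms, DBpedia properties)
    to the tweet so they enter the same n-gram vocabulary."""
    return tweet + ' ' + ' '.join(semantics)

# Toy training data: two crisis-related posts, two unrelated ones.
docs = [
    enrich("fires spreading in NSW stay safe", ['natural_disaster', 'conflagration']),
    enrich("road collapse two trucks in the water", ['natural_disaster', 'flood']),
    enrich("enjoying formula 1 highlights tonight", ['sport', 'broadcasting']),
    enrich("new cafe opened downtown great coffee", ['food', 'commerce']),
]
labels = ['related', 'related', 'not_related', 'not_related']

# n-gram features over the enriched text, then a linear-kernel SVM,
# the classifier chosen in the experiments.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(docs, labels)
```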
26. www.comrades-project.eu
Results
Crisis Classification Model (20 iterations of 5-fold cross-validation)
Features           Pmean    Rmean    Fmean    Std. Dev. σ (20 iter.)   ΔF/F (%)   Sig. (p-value)
SF (Baseline)      0.8145   0.8093   0.8118   0.0101                   -          -
SF + SemEF_BN      0.8233   0.8231   0.8231   0.0111                   1.3919     <0.00001
SF + SemEF_DB      0.8148   0.8146   0.8145   0.0113                   0.3326     0.01878
SF + SemEF_BNDB    0.8169   0.8167   0.8167   0.0106                   0.6036     0.00001
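The ΔF/F column is simply the relative F1 gain over the SF baseline, e.g. for SF + SemEF_BN: 100 * (0.8231 - 0.8118) / 0.8118 ≈ 1.39%. As a one-liner:

```python
def pct_gain(f_model, f_baseline):
    """Relative F1 gain over the baseline, in percent: 100 * (F - F_base) / F_base."""
    return 100.0 * (f_model - f_baseline) / f_baseline
```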
29. www.comrades-project.eu
Results and Observations
• Based on the IG score for each feature model (on the overall data), we observed very event-specific features in the SF model, such as collapse, terremoto, fire, and earthquake among the top-ranked features.
• We observed 7 different hashtags in the top 50 features (indicating event-specific vocabulary).
• In the SF+SemEF_BN and SF+SemEF_DB models, we observed concepts such as natural_hazard, structural_integrity_and_failure, conflagration, perception, geological_phenomenon, dbo:location, dbc:building_defect, etc. in the top 50 features.
• structural_integrity_and_failure – the annotated entity for terms like collapse and building collapse, which occur frequently in earthquake and flood events.
• natural_disaster – a hypernym of event terms such as flood, landslide, earthquake.
30. www.comrades-project.eu
Results and Observations
• On average, SF+SemEF_DB is the best performing model (from Criteria 2):
• An average percentage gain in F1 score (ΔF/F) of +7.2%, with a std. dev. of 12.83%.
• Improvement over the baseline SF model in 10 out of 15 events:
• 5 of 7 flood/typhoon, 3 of 4 earthquake, and 2 of 4 crash/terrorist events.
• The results show that when the type of the test event is NOT seen in the training data, semantics enhance classifier performance.
31. www.comrades-project.eu
Results and Observations
• Semantics generalise event-specific terms and consequently adapt to new event types (e.g., dbc:flood and dbc:natural hazard).
• Semantic concepts can also be too general and thus not help the classification of a document (e.g., the desire and virtue hypernyms).
– Virtue is a hypernym of a broad range of concepts such as loyalty, courage, cooperation, and charity.
• Automatic semantic extraction tools can extract many non-relevant entities and therefore might confuse the classifier.
– e.g., for "Super Typhoon in Philliphines is 236 mph It's roughly the top speed of Formula 1 cars http://t.co/vcRE…", annotation and semantic extraction yield entities unrelated to the crisis.
32. www.comrades-project.eu
Further explorations
• A more in-depth error analysis of misclassified documents is
required.
• Event type is based on the nature of the crisis. However,
events of different types could produce overlapping content.
Hence, content similarity could also be taken into account,
along with event types.
• Data about the same crisis event can emerge in multiple
languages. Hence we need to expand the analysis to
multilingual content.
• Khare, P., Burel, G., Maynard, D., and Alani, H., Cross-Lingual Classification of Crisis
Data, Int. Semantic Web Conference (ISWC), Monterey, 2018 (to be presented)
‘Classifying Crisis-information Relevancy with Semantics’
As the title of the paper suggests, crisis situations are the principal motivation behind this work. People around the world are impacted by crises and disasters in various forms, and in this era, with the ability to share and access information in real time, they resort to different online social media forums. Twitter is certainly among the most prominent media for sharing and accessing real-time information.
To support this idea, I would like to highlight a case from Hurricane Harvey in 2017. What you see highlighted are two very crucial pieces of information shared in the course of a crisis situation: a volunteer-driven handle that collates information on a portal about who needs help and rescue, and who could assist in that geographical area. This interaction between two parties resulted in rescuing three elderly ladies. But such critical information is not always easy to find and access, as we are well aware of the challenges social media poses alongside the opportunities it offers.
Given those challenges, the opportunities these platforms offer have been widely acknowledged by humanitarian and government agencies.
The need for tools and systems to rightly determine what is valuable on social media, with respect to its relatedness to crisis situations, is highlighted by a small exercise of performing keyword-based search on Twitter during Hurricane Harvey. It is evident that not everything containing crisis-specific terminology is related content.
That is precisely where the problem of 'what is crisis related' is defined.
But the problem explored in this work isn't just about what is crisis related and what is not. The key problem is the diversity of crisis events. Different crisis situations have different types and levels of impact on human life, in the form of well-being, civic facilities, and more.
This results in a very broad spectrum of data. For instance, consider the social media data from situations like earthquakes, wildfires, explosions, road incidents, and terrorism, to name a few. You can see how diverse the information is. We therefore need ways to filter in as much diverse related content as we can.
As a requirement of disaster management, we come down to the following questions. Since manually sieving through the social stream is nearly impossible: can we have automated ways to identify relevant information, and can we learn from crisis situations to identify relevancy in new crisis situations?
Previous approaches have attempted to tackle this problem. Some adopt ML approaches, following either a supervised or an unsupervised classification approach.
Supervised Approaches:
Often making use of n-grams, linguistic features, and/or statistical features of tweets.
Unsupervised Approaches:
Keyword processing and clustering.
Some approaches perform semantic enrichment of the data to create faceted search on top of the semantic data. But that does not strictly tell you whether something is crisis related or not, and it may require a new search strategy for each new type of crisis event.
Our hypothesis is that crisis relatedness is exhibited by a combination of various concepts that occur in the user-generated content. It might not always be just one key term that establishes the crisis relatedness of a tweet. So we go beyond the statistical features and incorporate contextual semantics alongside them. We hypothesise that different crisis situations, while they may be very distinct in their vocabulary and sense, can relate to each other at a broader contextual level.
A number of statistical features can be extracted: length, number of words, various parts of speech such as nouns, verbs, and pronouns, and hashtags. One can also calculate a readability score, which rates how easy or complicated a tweet is to read; the Gunning Fog Index is one method of calculating it.
Various additional semantics can be considered. Co-occurring words are words that commonly occur together across large-scale corpora; word embeddings are a good example. Extracted entities, alongside the original text, can sharpen the context. Next, we can consult a knowledge base (such as DBpedia) to retrieve extra information and properties about an entity. We can also use hierarchical context (from WordNet) to retrieve hypernyms and synonyms for each concept, generalising the context further.
We require NER tools and knowledge bases to extract the semantics we want. Here are some of the well-known tools available.
In this work we rely on the BabelNet knowledge base and the NER API built on top of it. BabelNet is a multilingual semantic network incorporating multiple knowledge bases, such as Wikipedia and WordNet, which really caters to the requirements here.
Here is an example of using BabelNet to semantically enrich a tweet by extracting entities and their hypernyms. We annotate the key entities in the text, then look up their corresponding hypernyms and add them to the overall context.
The features we extract are as follows. The statistical features are… The semantic features are… We extract hypernyms on the assumption that diverse concepts can relate once the context is expanded to parent-level concepts. We consult DBpedia properties to retrieve extra information about each entity, which links us to broader knowledge about its nature.
To gain a perspective on what we mean by semantic enrichment across crisis types, let us look at the following real tweets.
Another example elaborating the same.
As an end-to-end process in a nutshell, this is what we do.
CrisisLex is a very popular data repository that has repeatedly been used in related research. We use one particular corpus from CrisisLex called CrisisLexT26. This dataset comprises manually labelled tweets from 26 crisis events that occurred in 2012 and 2013, with close to 1,000 labelled tweets per event under 4 labels, which we merge pairwise to create a binary class system.
To have balanced learning of both classes, we ensure an un-skewed dataset is passed to the classifier: in each of the 26 events we select the same number of Related tweets as Not Related.
Here you can see the Related and Not Related distribution across all 26 crisis events. As is obvious, the distribution is not equal across events.
I would like to highlight that each event is basically a broad crisis situation; the tweets collected during that situation are what we call an event. For instance, CWF is the Colorado Wildfire, and it contains the tweets collected during that event.
Here we have categorised the events by type. This infographic shows that most events in the dataset are floods/typhoons, followed by earthquakes; next, we have equal numbers of crash and shooting/bombing events.
Now we describe our experiment design. We create 4 feature models to evaluate: the statistical model (SF), which is our baseline, and 3 semantic models in which we enhance the SF model with semantic features: SF + BabelNet semantics, SF + DBpedia semantics, and a model combining both the BabelNet and DBpedia features. When we add the semantics, we concatenate them with the original text of the tweet, then tokenise and create n-grams.
So, these are the feature models. Now we design the classification methods. First, as a broad run, we simply merge the data from all the events and perform 20 iterations of 5-fold CV to see how the 4 feature models perform.
Next, we design the cross-crisis classification methods, where we aim to evaluate how the classifier performs when classifying data from a new crisis event. We set up 2 criteria. In Criteria 1, we test a crisis event whose type has already been seen by the classifier in the training data.
In Criteria 2, we test a crisis event whose type has not been seen by the classifier in the training data. For instance, a classifier trained on data collected during, say, flood and earthquake events has to classify something coming from a factory collapse situation, as here. These 2 criteria are a critical part of our analysis.
We chose an SVM with a linear kernel. SVMs are known for their suitability for text classification problems. On top of that, we compared the linear kernel's performance against other kernels and logistic regression over 20 iterations of 5-fold CV on the entire dataset. The SVM linear kernel was statistically significantly better, with the best mean F1 of 0.8118 and a p-value of < 0.00001.
Looking at the results of merging the entire dataset and cross-validating: while this is not strictly cross-crisis classification, it shows that, broadly, the semantics perform slightly better than the baseline, with improvements ranging from 0.6% to 1.4%.
In Criteria 1, the event is completely new but the classifier has seen the type of the test event in training. In this case we take the test event out of the data and use the rest of the dataset as training data.
As seen in the earlier infographic, floods/typhoons and earthquakes had a good number of events in the overall dataset, so we chose to perform our analysis on these crisis types.
When the classifier has already seen a type of crisis event, the semantics may not always be superior to the statistical features, as we can observe from the results. The DBpedia semantics seem to be the more consistent semantic feature model, performing better than the baseline in 6 out of 11 test events.
In Criteria 2, the type of the test event is not seen in the training data: while we test an event of a given type, we ensure that none of the other events of a similar type are in the training data. Here we see, firstly, that the baseline's performance drops significantly in comparison to its performance in Criteria 1. Secondly, the semantic models, particularly the DBpedia feature model, outperform the baseline in 10 out of 15 test events. As additional type categories, we also included the train crash and bombing/terror attack events.
To analyse how the semantics were affecting the nature of the data and the classifier, we computed Information Gain across all the feature models. In the Statistical Features we observed very crisis-specific terms among the top-ranked features, including 7 different hashtags in the top 50 features, which indicates how vocabulary-specific the important features are. As we analysed the semantic models, we saw more generic, yet crisis-related, concepts showing up among the top-ranked features; some of these semantic concepts occurred across different crisis types.
Overall, the DBpedia semantics model was the best performing when the classifier was tested on an unseen crisis type, with an average gain of 7.2% over the baseline, and it performed uniformly well across all 3 tested types. DBpedia likely performed well due to its better coverage and semantic depth, something we need to explore further.
A few more takeaways from the analysis: semantically generalising the data helps in adapting to new crisis events.
Sometimes very broad and general concepts can cause the classifier to underperform. For instance, virtue is the hypernym of diverse concepts that can be used in quite different contexts.
Also, semantic extraction can sometimes yield very unwanted concepts, and in large volumes, as in this example.
As a progression of this work, we need to perform a more in-depth error analysis.
Currently we only take the type of an event into account, which is broadly the nature of a given crisis. However, different events can have overlapping content based on similar situations, so it would make sense to take content similarity into account as well.
Crisis data can also originate in different languages, so the analysis should be expanded to how the classifier can handle the multilingual aspect of crisis data. We have made an attempt at this, which can be found in the research paper we will present at the upcoming International Semantic Web Conference.