Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
1. Mentions of Security
Vulnerabilities on Reddit,
Twitter and GitHub
Sameera Horawalavithana*,
Abhishek Bhattacharjee, Renhao Liu, Nazim Choudhury,
Lawrence O. Hall, Adriana Iamnitchi
University of South Florida
IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece
2. Security Vulnerabilities
❏ Identified by CVE (Common
Vulnerabilities and Exposures)
identifiers:
❏ Each publicly known security
vulnerability is uniquely identified
by the pattern CVE-YYYY-NNNN
❏ Formally recorded in the National
Vulnerability Database (NVD)
❏ “U.S. government repository of
standards based vulnerability
management data represented
using the Security Content
Automation Protocol (SCAP)”
❏ Discussed on social media
CVEs published in NVD over time.
3. Research Questions
1) What is the relationship between
mentions of security
vulnerabilities as posted on
Twitter, Reddit and GitHub?
2) Can the software development
activities in GitHub be predicted
from the discussions on Reddit
and Twitter?
4. Outline
❏ Dataset
❏ Data analysis
❏ CVE mentions in Reddit and Twitter
❏ CVE mentions in GitHub actions
❏ Predicting GitHub activities by using Reddit and Twitter activity signals
❏ Summary
5. Datasets
❏ Two social-media platforms: Reddit and
Twitter
❏ One software collaborative platform:
GitHub
❏ 18 months of records: 03/16-08/17
❏ Data filtering using the regular expression
CVE-\d{4}-\d{4} to match CVE identifiers
that appear in Reddit posts and comments,
Twitter tweets and replies, and
GitHub event descriptions
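The filtering step can be sketched as follows. This is a minimal illustration using the pattern quoted on the slide; the function names `extract_cve_ids` and `filter_messages` are hypothetical and not taken from the authors' code.

```python
import re

# Pattern as quoted on the slide: "CVE-", a four-digit year, then four digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4}")

def extract_cve_ids(text):
    """Return the distinct CVE identifiers in text, in order of first appearance."""
    found = []
    for cve_id in CVE_PATTERN.findall(text):
        if cve_id not in found:
            found.append(cve_id)
    return found

def filter_messages(messages):
    """Keep only the messages that mention at least one CVE identifier."""
    return [message for message in messages if CVE_PATTERN.search(message)]
```

Note that CVE sequence numbers may have more than four digits, so this exact pattern would capture only the first four digits of such identifiers. Widening it to `\d{4,}` would be an extension that the slides do not state.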
6. RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume
of security vulnerability mentions?
7. CVE Mentions in Reddit and Twitter (1)
❏ 10,257 CVE identifiers are
mentioned in our Reddit/Twitter
dataset
❏ 95% of CVE identifiers are
mentioned only on Twitter
❏ 0.5% of CVE identifiers are mentioned
only on Reddit
❏ 4.5% are mentioned on both
platforms
More security vulnerabilities are discussed on Twitter
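The three percentages above are shares of the union of CVE identifiers seen on either platform. A minimal sketch of that computation, assuming the identifiers have already been extracted into one collection per platform (the function name `platform_overlap` is hypothetical):

```python
def platform_overlap(twitter_ids, reddit_ids):
    """Percentage of CVE identifiers seen only on Twitter, only on Reddit, or on both."""
    twitter_ids, reddit_ids = set(twitter_ids), set(reddit_ids)
    total = len(twitter_ids | reddit_ids)
    if total == 0:
        return {"twitter_only": 0.0, "reddit_only": 0.0, "both": 0.0}
    return {
        "twitter_only": 100.0 * len(twitter_ids - reddit_ids) / total,
        "reddit_only": 100.0 * len(reddit_ids - twitter_ids) / total,
        "both": 100.0 * len(twitter_ids & reddit_ids) / total,
    }
```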
8. RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
9. CVE Mentions in Reddit and Twitter (2)
Figure panels: Reddit and Twitter
Both platforms show a peak in the mentions of CVE identifiers near their
public disclosure
❏ Day 0 represents the NVD public disclosure date
❏ The published date of each message (post/tweet) is measured relative to the NVD public
disclosure date of the mentioned CVE identifier
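The relative timing described above can be sketched as a signed day offset, where negative values mean the message appeared before disclosure. The helper names and the input layout (pairs of CVE identifier and message date, plus a lookup of disclosure dates) are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

def day_offset(message_date, disclosure_date):
    """Days between a message and the NVD disclosure date (day 0 = disclosure day)."""
    return (message_date - disclosure_date).days

def offset_histogram(mentions, disclosure_dates):
    """Count mentions per day offset; mentions is an iterable of (cve_id, message_date)."""
    histogram = Counter()
    for cve_id, message_date in mentions:
        disclosure = disclosure_dates.get(cve_id)
        if disclosure is None:
            continue  # skip identifiers with no known disclosure date
        histogram[day_offset(message_date, disclosure)] += 1
    return histogram
```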
10. CVE Mentions in Reddit and Twitter (3)
Figure panels: Reddit and Twitter
❏ Timing of social-media messages with respect to Reddit subreddits and
Twitter Hashtags
Of the CVE identifiers discussed on Reddit, the majority are discussed
before public disclosure
11. RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the
timing of vulnerability mentions on the two platforms?
12. CVE Mentions in Reddit and Twitter (4)
❏ Timing of social-media messages
with respect to the severity of
mentioned security vulnerabilities
❏ We identified bot-driven
communities using the textual
description of the subreddit
❏ We used BotHunter to detect
Twitter bot users
Early discussions related to high
severity CVE identifiers occur on
Reddit
13. RQ1: What is the relationship between mentions of security vulnerabilities as
posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the timing of
vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
14. CVE Mentions in Reddit and Twitter (5)
❏ Three Cascade Types
❏ Before (completed): cascades start and end before the public disclosure day of the
mentioned CVE
❏ Before (not completed): cascades start before the public disclosure day, but continue
after the public disclosure day of the mentioned CVE
❏ After: cascades start and end after the public disclosure day of the mentioned CVE
Reddit discussions are viral before the CVE public disclosure,
Twitter re-shares emerge after the CVE public disclosure
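The three cascade types can be expressed as a small classifier over a cascade's first and last message dates. This is a sketch under one assumption the slides do not settle: a cascade that starts on the disclosure day itself is counted as "after". The function name `classify_cascade` is hypothetical.

```python
from datetime import date

def classify_cascade(start, end, disclosure):
    """Label a cascade by its timing relative to the CVE public disclosure day."""
    if start > end:
        raise ValueError("cascade start must not be later than its end")
    if end < disclosure:
        return "before (completed)"      # starts and ends before disclosure
    if start < disclosure:
        return "before (not completed)"  # starts before, continues on or after disclosure
    return "after"                       # assumption: starting on day 0 counts as after
```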
15. RQ1: What is the relationship between mentions of security vulnerabilities as
posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the timing of
vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
e. What types of sentiments fuel these discussions?
16. CVE Mentions in Reddit and Twitter (6)
● Uncertainty analysis of Reddit
comments
○ Used a pre-trained machine learning model
(Yu et al. [1]) to classify whether a comment
is certain or uncertain about the subject of the
conversation
● Reaction types of Twitter replies
○ Used a pre-trained machine learning model
(Glenski et al. [2]) to classify each
reply as an answer, elaboration,
question, appreciation, negative reaction,
or agreement
1. Ning Yu and Graham Horwood. 2018. Veracity Enriched Event Extraction. In 2018 International Workshop on Social Sensing (SocialSens). 3–3.
2. Maria Glenski, Tim Weninger, and Svitlana Volkova. 2018. Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 176–181.
More “certain” comments on Reddit.
The majority of Twitter replies are classified
as “elaboration”, followed by “answer”,
both before and after public disclosure
17. RQ1: What is the relationship between mentions of security vulnerabilities as
posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the timing of
vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
e. How does GitHub activity depend on the public disclosure of
security vulnerabilities?
18. CVE Mentions in GitHub Events (1)
❏ 10,502 CVE identifiers
mentioned in GitHub Events
❏ Overlap with the CVE
identifiers mentioned on the
social-media platforms:
❏ 40% with Twitter
❏ 3% with Reddit
Moderate overlap of CVE identifiers
subject to software development
with Twitter
19. CVE Mentions in GitHub Events (2)
❏ The majority of GitHub events
mention only one CVE identifier
❏ One CVE identifier
(CVE-2015-1805) is mentioned
in more than 3,000 GitHub events
❏ CVE-2015-1805 was published in
NVD around August 2015
❏ We noticed an increased
volume of related GitHub
activities in early 2016
❏ What really happened?
20. RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the timing of
vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
e. How does GitHub activity depend on the public disclosure of
security vulnerabilities?
f. How does GitHub activity correlate with the number of CVEs
for the most vulnerable repositories?
21. CVE Mentions in GitHub Events (3)
❏ We selected the two most vulnerable
repositories with respect to the
number of associated CVE identifiers
❏ We compare the monthly number of
mentioned CVEs with three GitHub event
time-series: Forks, Watches
and Push Events
❏ We calculate the Dynamic Time
Warping (DTW) distance to measure the
similarity between each GitHub event
time-series and the CVE time-series
Push Events follow the pattern of
CVE mentions most closely
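For readers unfamiliar with DTW, the comparison above can be sketched with the textbook dynamic-programming formulation. This is a generic illustration, not the authors' implementation; the absolute difference as the local cost and the name `dtw_distance` are assumptions.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance between two sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats the match with b[j-1]'s predecessor
                cost[i][j - 1],      # b[j-1] repeats the match with a[i-1]'s predecessor
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

A smaller distance means the two monthly series follow a more similar shape, so the event type with the smallest distance to the CVE series is the one that tracks it most closely.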
22. RQ2: Can the software development activities in GitHub be predicted
from the discussions on Reddit and Twitter?
23. Predicting GitHub Activities
A GitHub event is defined as (U, R, E_p, T_h):
❏ U: user
❏ R: repository
❏ E_p: type of action (PushEvent,
PullRequestEvent, IssuesEvent,
ForkEvent, WatchEvent,
CommitEvent, ReleaseEvent)
❏ T_h: the event timestamp in hours
Figure: timeline of the training and testing setup. In both periods, Reddit and Twitter supply the features and GitHub supplies the target (event).
Training: January 2017 to May 2017*; testing: August 2017
*June and July 2017 serve as validation data
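The event tuple and the chronological split described on this slide can be sketched as below. The class name `GitHubEvent`, its field names, and the helper `split_of` are hypothetical; only the tuple contents and the month boundaries come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class GitHubEvent:
    user: str            # U
    repo: str            # R
    event_type: str      # E_p, e.g. "PushEvent"
    timestamp: datetime  # T_h, kept at hourly resolution

def to_hour(ts):
    """Truncate a timestamp to the hour, matching the hourly granularity of T_h."""
    return ts.replace(minute=0, second=0, microsecond=0)

def split_of(ts):
    """Assign a 2017 timestamp to train (Jan-May), validation (Jun-Jul) or test (Aug)."""
    if ts.year != 2017:
        return None
    if 1 <= ts.month <= 5:
        return "train"
    if ts.month in (6, 7):
        return "validation"
    if ts.month == 8:
        return "test"
    return None
```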
24. Predicting GitHub Activities: Features and Approach
❏ Reddit time-series features
❏ Daily count of posts
❏ Daily count of active authors
❏ Daily count of active subreddits
❏ Daily counts of comments
❏ Twitter time-series features
❏ Daily count of tweets
❏ Daily count of tweeting users
❏ Daily count of retweets
❏ Daily count of retweeting users
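The daily count features listed above can be derived with a single pass over the messages. The sketch below covers the Reddit post features and assumes each post is given as a (day, author, subreddit) tuple; the function name and input layout are illustrative, and the Twitter features would follow the same pattern with tweets and users.

```python
from collections import defaultdict

def daily_reddit_features(posts):
    """Build per-day counts from an iterable of (day, author, subreddit) tuples."""
    post_count = defaultdict(int)
    authors = defaultdict(set)
    subreddits = defaultdict(set)
    for day, author, subreddit in posts:
        post_count[day] += 1
        authors[day].add(author)        # distinct authors active that day
        subreddits[day].add(subreddit)  # distinct subreddits active that day
    return {
        day: {
            "posts": post_count[day],
            "active_authors": len(authors[day]),
            "active_subreddits": len(subreddits[day]),
        }
        for day in sorted(post_count)
    }
```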
Figure: prediction approach. Labels: Reddit/Twitter time-series features; NN; number of GitHub events in a day; hourly GitHub activities of a user to a repository; LSTM; likelihood of a user performing an action on a repository in an hour.
Predicting Longitudinal User Activity at Fine Time Granularity in Online Collaborative
Platforms, Renhao Liu, Frederick Mubang, Lawrence Hall*, Sameera Horawalavithana,
Adriana Iamnitchi, John Skvoretz, IEEE International Conference on Systems, Man, and
Cybernetics (SMC), Bari, Italy, 2019
26. Predicting GitHub Activities: Relevance
❏ Why is predicting GitHub activities
important?
❏ GitHub hosts many exploits and
patches related to CVE identifiers
❏ Predictions might reflect the
software development activities of
an attacker who develops an exploit
❏ Predictions can be used to estimate
the availability of a patch related to
a security vulnerability
Reddit/Twitter features are helpful for
predicting the number of GitHub events.
It is more difficult to predict the
identity of a user and the repository
in an event.
27. Summary
❏ We characterized a use-case scenario where diverse online platforms are
interconnected such that the activities in one platform can be predicted based
on the activities in the others.
Practical implications of our findings:
❏ Advance or calibrate security alert tools based on information from multiple
social media platforms.
❏ Better coordinate software development activities with the lessons learned
from social-media information
28. Acknowledgements
❏ Funded by DARPA SocialSim Program and the Air Force Research
Laboratory
❏ Data: Leidos, Netanomics
❏ Evaluation code provided by Pacific Northwest National Laboratory
29. Mentions of Security
Vulnerabilities on Reddit,
Twitter and GitHub
Sameera Horawalavithana*
(sameera1@mail.usf.edu)
Check out our project @SocialSim
32. Related Work
❏ Different types of security vulnerability information available on Twitter (Syed
et al., Sauerwein et al.)
❏ Description of vulnerabilities (e.g., URLs to security mailing lists, expert blogs, etc.)
❏ Demonstration of Exploits (e.g., URLs to YouTube videos)
❏ Unofficial proposals of countermeasures (e.g., URLs to security blogs describing unofficial
patches)
❏ Announcement of patch releases (e.g., URLs to official blog posts by vendors)
❏ Automatically discovering security threats from independent platforms.
❏ E.g., Twitter, Dark Web (Sapienza et al.), security blogs (Mittal et al.), etc.