Exploration of gaps in Bitly's spam detection and relevant countermeasures

Neha Gupta
Advisor: Dr. Ponnurangam Kumaraguru
M.Tech Thesis Defense
23-April-2014

Thesis Committee
 Mr. Sachin Gaur, MixORG
 Dr. Vinayak Naik, IIIT-Delhi
 Dr. PK (Chair), IIIT-Delhi

Achievements
 Gupta, N., Aggarwal, A., and Kumaraguru, P. bit.ly/can-do-better.
Poster at Security and Privacy Symposium (SPS), IIT-K, 2014.

Presentation Outline
 Research Motivation and Aim
 Related Work
 Research Contribution
 Methodology
 Experiments and Analysis
 Malicious Bitly Link Detection
 Conclusion
 Future Work
 Questions

What are URL shortening services?
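[Figure: a URL shortening service hashes a long URL into a short URL]
Long URL: https://www.youtube.com/watch?v=ukUL_I14GPw
Short URL: http://bit.ly/1oL7gi5
[Figure: shortening service logos, one labeled "the most popular", the rest "Others"]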
Research Motivation and Aim
 Bitly shortens close to 80 million links each day
 Bitly marks 2-3 million links as suspicious every week
 Bitly was Twitter’s default URL shortening service before 2011

Use of URL shortening services
 Space gain (Twitter’s 140 character limit)
 More manageable
 Prevent line breaks
 Easy dissemination of content
 Provides useful analytics (e.g. click data)
 Online Social Media (OSM) connection
 Complex link obfuscation
[Figure: the same tweet by Dr. ABC (@abc, 10:44 PM Tue Aug 30, 2011) shown twice]
With the long URL: Please like this picture http://3.bp.blogspot.com/_s5emCsFnEdE/TKUVi2BopBI/AAAAAAAADl8/lffvi7khF7g/s1600/googl.png
versus
With the short URL: Please like this picture bit.ly/1hAVVaE

Abuse of URL shortening services
- Attack scenario
One-level obfuscation
Long malicious URL -> URL shortening service -> Short malicious URL
Example: http://www.sexpixbox.com/kingnet/cute/index.html -> http://bit.ly/QCMW2S
Multi-level obfuscation
Long malicious URL -> not so popular URL shortening service -> popular URL shortening service -> … -> Short malicious URL
Example: http://www.sexpixbox.com/kingnet/cute/index.html -> http://short.me/ABCD -> http://bit.ly/QCMW2S
Many short URL services detect and restrict long URLs at submission, but Bitly does not
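The multi-level obfuscation scenario can be sketched as repeatedly expanding a short URL until a non-shortener URL is reached. This is only an illustration, not Bitly's logic: the `SHORTENER_HOSTS` set, the `EXPANSIONS` lookup table, the `long.example` destination, and the function name `resolve_chain` are all assumptions made for the example. A real checker would expand each hop over HTTP rather than through a dictionary.

```python
from urllib.parse import urlparse

# Hypothetical list of shortener hosts; a real list would be far longer.
SHORTENER_HOSTS = {"bit.ly", "short.me"}

def resolve_chain(url, expand, max_hops=10):
    """Follow shortener-to-shortener hops and return every URL visited."""
    chain = [url]
    for _ in range(max_hops):
        host = urlparse(chain[-1]).netloc.lower()
        if host not in SHORTENER_HOSTS:
            break  # reached a non-shortener landing page
        target = expand.get(chain[-1])
        if target is None or target in chain:
            break  # unknown link or redirect loop
        chain.append(target)
    return chain

# Hypothetical expansion table standing in for live HTTP lookups.
EXPANSIONS = {
    "http://bit.ly/QCMW2S": "http://short.me/ABCD",
    "http://short.me/ABCD": "http://long.example/landing.html",
}

chain = resolve_chain("http://bit.ly/QCMW2S", EXPANSIONS)
obfuscation_levels = len(chain) - 1  # number of shortener hops
```

A link with `obfuscation_levels` greater than 1 passed through more than one shortener, which is the pattern the multi-level scenario describes.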

Abuse of URL shortening services
- Attack execution
Legitimate looking tweet Scam!!

Major attacks
Year 2014

Bitly's Spam Detection Policies
[Figure: Bitly's stated spam detection sources combined ("+ ... + More filters.."), with quoted policy excerpts]

Research Aim
A focused study on Bitly to:
 characterize malicious URLs
 examine Bitly’s security policies
 identify Bitly specific features to detect spam


Related Work
2009
• Kandylas et al. Relative study of long and short Bitly URLs on Twitter
2010
• Benevenuto et al. + Grier et al. Identification of distinctive features to detect spammers on Twitter
2011
• Antoniades et al. Analysis of content, popularity, and impact of short URLs
• Chhabra et al. Overview of evolving phishing attacks through short URLs on Twitter
• Thomas et al. Classification of a long URL as malicious / benign in real time
2012
• Klien et al. Global usage pattern analysis of short URLs
• Aggarwal et al. Real time phishing detection on Twitter using Twitter and URL based features
• Lee et al. Real time suspicious URL detection technique on Twitter using conditional redirects
2013
• Maggi et al. Study of abuse of short URLs using 622 distinct shortening services
2014
• Nikiforakis et al. Study of ecosystem of ad-based URL shortening services

Related Work
2013
• Click Traffic Analysis of Short URL Spam on Twitter
Wang et al. Classification of a link as spam / non-spam using only click traffic based features
Research Gaps
 No dedicated study to identify ground security issues specific to a URL shortener
 Unexplored short URL based features to detect malicious content before it targets the audience

Research Contribution
 Click traffic analysis and social network impact of malicious Bitly links
 Detailed inspection to highlight weaknesses in Bitly’s spam detection techniques
 Proposal to detect clicked / unclicked malicious Bitly links

Methodology - Acquiring Dataset
Phase 1: Link Dataset (763,160 links): Bitly global hash, long URL, #warnings
Phase 2 (Bitly API): Link Metric Dataset (413,119 links, 54.13% of the Link Dataset): link_info, link_expand, link_clicks, link_referring_domains, link_encoders
Phase 3 (Bitly API): Encoder/User Metric Dataset (12,344 encoders, 100%): link_encoder_info, link_encoder_link_history


Metadata Analysis – Content Creation
[Figure: Link Dataset (763,160; Bitly global hash, long URL, #warnings) -> TLD domains (22,038) -> whitelist check -> domains (21,982) -> status check after 5 months -> non-existent domains (18,966)]
Results:
 83.06% of suspicious domains were non-existent before / after 5 months
 These dead domains received 9,937,250 click requests in October alone
Inference:
These domains are created for the dedicated purpose of spamming and eventually die out after achieving a significant number of hits.

Network Analysis
 Referrer Network
 Connected Network

Network Analysis - Referrer Network
[Figure: pipeline using the Twitter API: links with Twitter as referrer -> Twitter profiles -> last <=200 tweets per profile -> Jaccard similarity over text + URL + domain -> status check after 5 months -> manual annotation. Counts shown: 37,903; 788,759; 21,679]
Jaccard similarity: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
[Figure: pie chart over 37,903: 4,336 (11.44%), 636 (1.68%), 5,444 (14.36%), 5,302 (13.99%), 22,185 (58.53%)]
[Figure: hour of the day vs. minute of the hour graph for Twitter users (a) @dtitgp2 and (b) @fujisakikaoru]
17 Twitter profiles with variance <=0.00012:
3 profiles: pornographic content
1 profile: work from home scam
11 profiles: spam but <=3 tweets
2 profiles: suspended
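A small sketch of the Jaccard similarity used in this step, applied to the tweets of one profile. The slides do not say exactly how the variance was computed, so the sketch assumes it is the variance of the pairwise similarities between a profile's tweets. The tokenisation, the sample tweets, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import pvariance

def jaccard(a, b):
    """J(A, B) = |A intersect B| / |A union B|; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def tokens(tweet):
    """Whitespace tokenisation; words, URLs and domains all stay as tokens."""
    return set(tweet.lower().split())

def pairwise_similarities(tweets):
    """Jaccard similarity for every unordered pair of tweets."""
    return [jaccard(tokens(x), tokens(y)) for x, y in combinations(tweets, 2)]

# Hypothetical near-duplicate tweets from a single profile.
tweets = [
    "win a free phone bit.ly/aaa",
    "win a free phone bit.ly/bbb",
    "win a free phone bit.ly/ccc",
]
sims = pairwise_similarities(tweets)
variance = pvariance(sims)  # near zero when all tweets look alike
```

A profile whose tweets differ only in the link gives almost identical pairwise scores, so its variance is close to zero, which matches the kind of low-variance profile singled out for manual annotation.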

Network Analysis – Connected Network

5,375 users connected a Twitter / Facebook profile:
Twitter only: 3,415 (63.54%)
Facebook only: 951 (17.69%)
Both: 1,009 (18.77%)
 Users can connect any number of Facebook / Twitter accounts
Why more Twitter than Facebook?
 Bitly doesn't allow users to connect Facebook brand / fan pages for free
Multiple connections
 507 malicious users connected multiple Twitter accounts
 28 malicious users connected at least 10 Twitter accounts
[Figure: connected OSM network of all encoders]

[Figure: Bitly profiles (link history) -> Bitly warning check -> connected Twitter accounts (<=200 tweets) -> inter Twitter profile Jaccard similarity; Bitly user name -> inter Bitly profile Jaccard similarity; manual annotation based on similarity scores]
3 malicious communities detected

Community 1
2 Bitly users, 9 associated Twitter accounts each
All 18 Twitter accounts shared similar explicit pornographic content
Dormant on Bitly, active on Twitter
Counter measure 1
Bitly should impose a restriction on the number of OSM accounts a user can connect

Security Analysis
 Efficiency Check
 Promptness Check
 Tractability Check

Security Analysis - Efficiency Check
(a) Malicious link identification
1. APWG: international consortium that brings together businesses affected by phishing attacks
2. VirusTotal: free checking of suspicious URLs using 52 different website / domain scanning engines and datasets
3. SURBL: domain / IP level blacklist created from the occurrence of websites in unsolicited messages

1. APWG
[Figure: APWG live feed request setup (6 months): 142,660 -> 216 + 2,656 = 2,872 -> Bitly warning check: 382 (13.30%)]
Inference:
 Bitly left about 86% of these malicious links undetected over 6 months
 Such a low detection rate is alarming, as APWG is a popular and trusted source to detect phishing
Bitly is not even using the claimed detection services effectively
 A similar analysis against VirusTotal found 71.53% of links undetected
 36.66% of domains blacklisted by SURBL were undetected by Bitly
 Bitly claims to use SURBL (http://blog.bitly.com/post/138381844/spam-and-malware-protection)

(b) Malicious user profile identification
$\text{Suspicion Factor}_{\text{encoder}} = \frac{\#\,\text{links redirecting to the Bitly warning page}}{\#\,\text{total links collected}}$
 Suspicion Factor measures the credibility of a Bitly profile
 Computed for all 12,344 encoders in our dataset
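The Suspicion Factor can be computed per encoder as a plain ratio. The sketch below follows the definition on this slide and also applies the "at least 100 links and Suspicion Factor of 1" rule the deck uses for highly suspicious profiles. The encoder names and counts are invented for illustration.

```python
def suspicion_factor(warning_links, total_links):
    """Share of an encoder's collected links that redirect to the Bitly warning page."""
    if total_links <= 0:
        raise ValueError("total_links must be positive")
    if not 0 <= warning_links <= total_links:
        raise ValueError("warning_links must lie between 0 and total_links")
    return warning_links / total_links

# Hypothetical encoders: (links redirecting to the warning page, total links collected)
encoders = {"user_a": (120, 120), "user_b": (3, 40), "user_c": (5, 5)}

factors = {name: suspicion_factor(w, t) for name, (w, t) in encoders.items()}

# Highly suspicious: shortened at least 100 links, all of them flagged.
highly_suspicious = [
    name for name, (w, t) in encoders.items()
    if t >= 100 and factors[name] == 1.0
]
```

Here `user_c` also has a factor of 1 but too few links to count as highly suspicious, which shows why the link-count threshold matters.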

2,018 (12,344 - 10,326) out of 12,344 encoders (16.35%) had a Suspicion Factor = 1, i.e. they shortened only suspicious links
Counter measure 2
 Bitly should take measures to detect and suspend such users
 If not suspension, a credibility score can be attached to each Bitly profile

[Figure: our blog; Bitly’s response; tweet to our blog]

Security Analysis – Promptness Check
Highly suspicious profiles: user has shortened at least 100 links and Suspicion Factor is 1 (80 profiles)
 Also collected their recent link history (after 1 January 2014)
 4 of these 80 users were still active and propagating malicious content
Examples: user bamsesang, month lag 24; user iplayonlinegames, month lag 18
Inference: spammers penetrate easily, and the suspicious user detection process that Bitly claims to follow is delayed

Security Analysis – Tractability Check
Popular malicious Bitly links: links with a large number of warning pages displayed (URLs with high overall impact)
[Figure: Link Dataset (763,160; Bitly global hash, long URL, #warnings) -> reverse sort by number of warnings -> top 1,000 links -> Bitly API recent click history]
Result:
 35.2% of the malicious links identified in October 2013 were still being actively clicked in 2014
Inference:
 A by-passable Bitly warning page alone is not enough to curtail the dissemination of spam
 Leaving already detected malicious links accessible can heavily encourage spammers to use Bitly
Counter measure 3
Bitly should not only show a warning page but also block visits to popular malicious Bitly links that are already detected

Security Analysis – Major Findings
 Bitly is not using the claimed detection services effectively
 Extreme delay in suspicious user identification (if at all)
 A by-passable Bitly warning page alone is not enough to control the problem of spam


Malicious Bitly Link Detection - Data Collection and Labeling
Data collection: tweets from Twitter’s REST API (412,139) -> extract and expand bitly URLs (34,802) -> unlabeled-dataset
Data labeling: blacklist + Bitly warning check -> labeled-dataset (malicious / benign)
Blacklists:
1. Google Safebrowsing: a repository of suspected phishing or malware pages maintained by Google Inc.
2. SURBL
3. PhishTank: a public crowdsourced database of phishing URLs
4. VirusTotal

Malicious Bitly Link Detection – Feature Selection
No. | Feature name | Feature description
1 | Domain age | Difference between domain creation / update date and expiration date
2 | Link creation - domain creation difference | Difference between domain creation date and bitly link creation date
3 | Link creation hour | Bitly link creation hour
4 | Number of encoders | Number of bitly users who encoded a particular link
5 | Anonymous and API encoder ratio | Ratio of encoders that are "anonymous" or from a Twitter based application (Twitterfeed, TweetDeck, Tweetbot) to the total number of encoders
6 | Link creation - first click difference | Difference in days between bitly link creation date and date of first click received
7 | Referring domains - direct by total | Ratio of referring domains from a direct source to the total number of referring domains
Feature groups: WHOIS specific and Bitly specific; non-click based and click based (features 6 and 7)
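The seven features can be derived from a per-link record along the following lines. This is a sketch under assumptions: the record layout, the field names, and the sample values are hypothetical and do not reflect the actual Bitly API or WHOIS response formats.

```python
from datetime import datetime

# Encoder names treated as anonymous or Twitter-application based (from the feature table).
API_ENCODERS = {"anonymous", "twitterfeed", "tweetdeck", "tweetbot"}

def ratio(hits, total):
    """Safe ratio that returns 0.0 for an empty denominator."""
    return hits / total if total else 0.0

def extract_features(link):
    """Build the feature dictionary for one link record (hypothetical field names)."""
    created = link["link_created"]
    encoders = link["encoders"]
    referrers = link["referring_domains"]
    features = {
        "domain_age_days": (link["domain_expires"] - link["domain_created"]).days,
        "link_domain_creation_diff_days": (created - link["domain_created"]).days,
        "link_creation_hour": created.hour,
        "num_encoders": len(encoders),
        "anon_api_encoder_ratio": ratio(
            sum(e.lower() in API_ENCODERS for e in encoders), len(encoders)
        ),
    }
    # Click based features exist only once the link has received a click.
    if link.get("first_click") is not None:
        features["creation_first_click_diff_days"] = (link["first_click"] - created).days
        features["direct_referrer_ratio"] = ratio(
            sum(r == "direct" for r in referrers), len(referrers)
        )
    return features

# Hypothetical link record.
sample_link = {
    "domain_created": datetime(2013, 9, 1),
    "domain_expires": datetime(2014, 9, 1),
    "link_created": datetime(2013, 10, 5, 14, 30),
    "first_click": datetime(2013, 10, 7, 9, 0),
    "encoders": ["anonymous", "tweetdeck", "alice", "bob"],
    "referring_domains": ["direct", "twitter.com", "direct", "t.co"],
}
features = extract_features(sample_link)
```

Keeping the click based features optional mirrors the split between clicked and unclicked links: a link with no clicks still yields the five non-click features.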

Malicious Bitly Link Detection - Experimental Setup
Machine learning algorithms:
1) Naive Bayes
2) Decision Tree
3) Random Forest
 Trained on the pre-labeled dataset to predict labels of unseen data
 Used the classifiers implemented in the Weka software package, an open source collection of machine learning classifiers for data mining tasks
Training and testing data: 75% training (10 fold cross validation), 25% testing; performance evaluation on unlabeled data
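The 75% / 25% split with 10 fold cross validation on the training part can be illustrated in plain Python. This is a simplified sketch, not Weka's implementation: it does not stratify by class, and the seed, the sample data, and the function names are assumptions.

```python
import random

def train_test_split(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of the items and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def k_folds(items, k=10):
    """Yield (training, validation) pairs; every item is validated exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation

samples = list(range(100))  # stand-in for labeled links
train, test = train_test_split(samples)
fold_sizes = [len(validation) for _, validation in k_folds(train)]
```

Each of the 10 rounds trains on nine folds and validates on the remaining one, while the 25% test part stays untouched until the final evaluation.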

Malicious Bitly Link Detection – Evaluation Results
(a) Experiment 1
Mix dataset – Click and Non-click, all features
Dataset | Malicious | Benign
Training data (75%) | 5,926 | 6,074
Test data (25%) | 2,074 | 1,926
Evaluation Metric | Naive Bayes | Decision Tree | Random Forest
Accuracy | 72.15% | 78.37% | 80.43%
Recall (malicious) | 73.10% | 82.40% | 81.00%
Recall (benign) | 71.10% | 74.10% | 79.90%
Precision (malicious) | 73.10% | 77.40% | 81.20%
Precision (benign) | 71.10% | 79.60% | 79.60%
F-measure (malicious) | 73.10% | 76.70% | 81.10%
F-measure (benign) | 71.10% | 76.70% | 79.70%
[Figure: confusion matrix with cells TP, FP, FN, TN]
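Malicious Bitly Link Detection – Evaluation Results
(b) Experiment 2
Mix dataset – Click and Non-click, WHOIS + Non-click based features
Dataset | Malicious | Benign
Training data (75%) | 5,926 | 6,074
Test data (25%) | 2,074 | 1,926
Evaluation Metric | Naive Bayes | Decision Tree | Random Forest
Accuracy | 73.57% | 81.93% | 83.50%
Recall (benign) | 73.80% | 80.30% | 82.80%
F-measure (benign) | 72.90% | 81.00% | 82.80%
[Figure: confusion matrix with cells TP, FP, FN, TN]

Malicious Bitly Link Detection – Evaluation Results
(c) Experiment 3
Only Non-click data, WHOIS + Non-click based features
Dataset | Malicious | Benign
Training data (75%) | 2,743 | 2,796
Test data (25%) | 950 | 897
Evaluation Metric | Naive Bayes | Decision Tree | Random Forest
Accuracy | 80.02% | 85.06% | 86.41%
Recall (benign) | 80.40% | 80.80% | 83.40%
F-measure (benign) | 80.50% | 84.80% | 86.30%
[Figure: confusion matrix with cells TP, FP, FN, TN]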

Feature Ranks
Complete labeled-dataset:
1 Type of referring domains
2 Link Creation domain creation difference
3 Domain age
4 Link creation hour
5 Type of encoders
6 Link creation-click lag
7 Number of encoders
Non-Click labeled-dataset:
1 Link creation hour
2 Link Creation domain creation difference
3 Domain age
4 Type of encoders
5 Number of encoders
 Using Weka's InfoGainAttributeEval evaluator for attribute selection
 Evaluates the worth of an attribute by measuring the information gain with respect to the class
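Information gain with respect to the class is the class entropy minus the entropy that remains after splitting on the attribute. The sketch below is a from-scratch illustration, not Weka's code; it assumes attribute values are already discrete (numeric attributes would need binning first), and the toy data and names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """H(class) - H(class | attribute) for one discrete attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data: the first attribute separates the classes, the second does not.
labels = ["malicious", "malicious", "benign", "benign"]
attributes = {
    "creation_hour_bin": ["night", "night", "day", "day"],
    "num_encoders_bin": ["one", "many", "one", "many"],
}
gains = {name: information_gain(vals, labels) for name, vals in attributes.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
```

Sorting attributes by gain in descending order yields a ranking of the same kind as the two lists on this slide.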

Malicious Bitly Link Detection - Result Summary
 Increase in accuracy and F-measure on using only non-click based features
 Not only efficient in detecting clicked malicious Bitly links, but can identify suspicious links even when no click is received
 This solution can also capture the multi level obfuscation technique used by attackers
Counter measure 4
In addition to the blacklists and other spam detection filters, a Bitly specific feature set can also be used to detect malicious content

Proposed Counter Measures - Summary
 Impose a check on the number of OSM accounts connected by a single profile
 Either directly delete the identified malicious links or introduce a credibility score for each profile to warn users (of upcoming risks)
 Beyond showing a warning page, block popular malicious links to prevent their persistence on the web
 In addition to various blacklists and other filters, incorporate available Bitly analytics to better detect illegitimate content

Conclusion
 Domains created for a dedicated purpose of spamming eventually die out after achieving a significant number of hits
 Spammers exploit Bitly's policy of not imposing a cap on the number of connected OSM accounts
 Existence of malicious communities which operate across Bitly and Twitter
 Bitly is not using the claimed detection services effectively
 Inability / extreme delay in detection of malicious accounts
 By-passable warning page does not restrict the overall problem of spam
 High classification accuracy for non-click dataset; capable of identifying suspicious Bitly links much before they target their audience

Future Work
 Since characteristics of spammers change over time, do a detailed comparative analysis on a more exhaustive dataset
 Broaden and generalize our feature set to detect spam from any short URL service
 Develop a browser extension that can work in real time and classify any short link as malicious or benign

Acknowledgements
Dr. PK, IIIT-Delhi
Anupama Aggarwal, PhD, IIIT-Delhi
Brian David Eoff (senior data scientist), Mark Josephson (CEO), Bitly
CERC, IIIT-Delhi
Precog members, friends and family

References (I)
 Alexander Neumann, Johannes Barnickel, Ulrike Meyer. Security and Privacy Implications of URL Shortening
Services. In proceedings of Web 2.0 Security and Privacy (W2SP) (2011).
 Florian Klien, Markus Strohmaier. Short Links Under Attack: Geographical Analysis of Spam in a URL Shortener
Network. In proceedings of the 23rd ACM conference on Hypertext and social media (2012), Pages 83-88.
 Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis. we.b: The web of short URLs. In proceedings of the
20th international conference on World wide web (2011), Pages 715-724.
 Federico Maggi, Alessandro Frossi, Stefano Zanero, Gianluca Stringhini, Brett Stone-Gross, Christopher
Kruegel, Giovanni Vigna. Two Years of Short URLs Internet Measurement: Security Threats and
Countermeasures. In proceedings of the 22nd international conference on World Wide Web (2013), Pages 861-872.
 Sangho Lee and Jong Kim. WARNINGBIRD: Detecting Suspicious URLs in Twitter Stream. NDSS 2012 (2012).
 De Wang, Shamkant B. Navathe, Ling Liu, Danesh Irani, Acar Tamersoy, and Calton Pu. Click Traffic Analysis of
Short URL Spam on Twitter. Collaborative Computing: Networking, Applications and Worksharing
(Collaboratecom), 2013 9th International Conference (2013), Pages 250-259.
 Aditi Gupta and Ponnurangam Kumaraguru. Credibility ranking of tweets during high impact events. In
Proceedings of the 1st Workshop on Privacy and Security in Online Social Media (PSOSM), In conjunction with
WWW'12 (2012).

References (II)
 Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. Design and Evaluation of a Real-Time URL
Spam Filtering Service. Security and Privacy (SP) IEEE Symposium (2011), Pages 447-462.
 Hongyu Gao, Jun Hu, and Christo Wilson. Detecting and Characterizing Social Spam Campaigns. In proceedings
of the 10th ACM SIGCOMM conference on Internet measurement (2010), Pages 35-47.
 Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. Detecting Spammers on Twitter. In
Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS) (2010).
 Anupama Aggarwal, Ashwin Rajadesingan, and Ponnurangam Kumaraguru. PhishAri: Automatic Realtime
Phishing Detection on Twitter. In Seventh IEEE APWG eCrime researchers summit (eCRS) (2012). Master's
thesis, IIIT-Delhi, http://precog.iiitd.edu.in/Publications les/Anupama_MTech_Thesis.pdf (2012).
 Chris Grier, Kurt Thomas, Vern Paxson, and Michael Zhang. @spam: The Underground on 140 Characters or
Less. In proceedings of the 17th ACM conference on Computer and communications security (2010), Pages 27-37.
 Sidharth Chhabra, Anupama Aggarwal, Fabricio Benevenuto, and Ponnurangam Kumaraguru. Phi.sh/$oCiaL:
The Phishing Landscape through Short URLs. CEAS '11 Proceedings of the 8th Annual Collaboration, Electronic
messaging, Anti-Abuse and Spam Conference (2011), Pages 92-101.

References (III)
 Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, and Suku Nair. A Comparison of Machine Learning Techniques
for Phishing Detection. In proceedings of eCrime researchers summit (2007), ACM, Pages 60-69.
 Mashable. Warning: Bit.ly Is Eating Other URL Shorteners for Breakfast.
http://mashable.com/2009/10/12/bitly-domination/, October 2009.
 Symantec. Spam with .gov URLs. http://www.symantec.com/connect/blogs/spam-gov-urls, October 2012.
 Symantec. Malicious Shortened URLS on Social Networking Sites.
http://www.symantec.com/threatreport/topic.jsp?id=threat_activity_trends&aid=malicious_shortened_urls,
2010.
 Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA
Data Mining Software: An Update. ACM SIGKDD Explorations Newsletter (2009), vol. 11, no. 1, Pages 10-18.
