SlideShare a Scribd company logo
1 of 50
Exploration of gaps in Bitly's spam detection
and relevant counter measures
Neha Gupta
Advisor: Dr. Ponnurangam Kumaraguru
M.Tech Thesis Defense
23-April-2014
Thesis Committee
 Mr. Sachin Gaur, MixORG
 Dr. Vinayak Naik, IIIT-Delhi
 Dr. PK (Chair), IIIT-Delhi
2
Achievements
 Gupta, N., Aggarwal, A., and Kumaraguru, P. bit.ly/can-do-better.
Poster at Security and Privacy Symposium (SPS), IIT-K, 2014.
3
Presentation Outline
 Research Motivation and Aim
 Related Work
 Research Contribution
 Methodology
 Experiments and Analysis
 Malicious Bitly Link Detection
 Conclusion
 Future Work
 Questions
4
Presentation Outline
What are URL shortening services?
Long URL Short URL
…
Others
http://bit.ly/1oL7gi5
https://www.youtube.com/watch?v=ukUL_I14GPw
URL shortening service
hash
The most
popular
5
Research Motivation and Aim
 Shortens close to 80 million links each day
 Marks 2-3 million as suspicious every week
 Twitter’s default URL shortening service before 2011
Use of URL shortening services
 Space gain (Twitter’s 140 character limit)
 More manageable
 Prevent line breaks
 Easy dissemination of content
 Provides useful analytics (e.g. click data)
 Online Social Media (OSM) connection
 Complex link obfuscation
Please like this picture
http://3.bp.blogspot.com/_s5emCsFnEd
E/TKUVi2BopBI/AAAAAAAADl8/lffvi7khF
7g/s1600/googl.png
@abc
Dr. ABC
10:44 PM Tue Aug 30, 2011
Please like this picture bit.ly/1hAVVaE
@abc
Dr. ABC
10:44 PM Tue Aug 30, 2011
versus
6
Research Motivation and Aim
Abuse of URL shortening services
- Attack scenario
URL
shortening
service
One-level obfuscation
Long malicious URL Short malicious URL
Many short URL services detect and restrict long URLs at submission, but Bitly does not
Not so
popular
URL
shortening
service
Long malicious URL Short malicious URL
Popular
URL
shortening
service
Multi-level obfuscation
…
7
Research Motivation and Aim
http://www.sexpixbox.com/kingnet/cute/index.html http://bit.ly/QCMW2S
http://www.sexpixbox.com/kingnet/cute/index.html http://bit.ly/QCMW2Shttp://short.me/ABCD
Abuse of URL shortening services
- Attack execution
Legitimate looking tweet Scam!!
8
Research Motivation and Aim
Major attacks
Year 2014
9
Research Motivation and Aim
Bitly's Spam Detection Policies
+
+
More filters..
10
‘‘ ’’
‘‘
’’
Research Motivation and Aim
Research Aim
A focused study on Bitly to:
 characterize malicious URLs
 examine Bitly’s security policies
 identify Bitly specific features to detect spam
11
Research Motivation and Aim
Presentation Outline
 Research Motivation and Aim
 Related Work
 Research Contribution
 Methodology
 Experiments and Analysis
 Malicious Bitly Link Detection
 Conclusion
 Future Work
 Questions
12
Presentation Outline
Related Work
2009
• Kandylas et al. Relative study of long and short Bitly URLs on Twitter
2010
• Benevenuto et al. + Grier et al. Identification of distinctive features to detect spammers on Twitter
2011
• Antoniades et al. Analysis of content, popularity, and impact of short URLs
• Chhabra et al. Overview of evolving phishing attacks through short URLs on Twitter
• Thomas et al. Classification of a long URL as malicious / benign in real time
2012
• Klien et al. Global usage pattern analysis of short URLs
• Aggarwal et al. Real time phishing detection on Twitter using Twitter and URL based features
• Lee et al. Real time suspicious URL detection technique on Twitter using conditional redirects
2013
• Maggi et al. Study of abuse of short URLs using 622 distinct shortening services
2014
• Nikiforakis et al. Study of ecosystem of ad-based URL shortening services
13
Related Work
Related Work
2013
• Click Traffic Analysis of Short URL Spam on Twitter
Wang et al. Classification of a link as spam / non-spam using only click traffic based features
14
Related Work
 No dedicated study to identify ground security issues specific to a URL shortener
 Unexplored short URL based features to detect malicious content before it targets the audience
Research Gaps
Research Contribution
 Click traffic analysis and social network impact of malicious Bitly links
 Detailed inspection to highlight weaknesses in Bitly’s spam detection
techniques
 Proposal to detect clicked / unclicked malicious Bitly links
15
Related Contribution
Methodology - Acquiring Dataset
link_encoder_info
link_encoder_link_history
link_info
link_expand
link_clicks
link_referring_domains
link_encoders
Bitly Global Hash
Long URL
#Warnings
Link Dataset
(763,160)
Link Metric Dataset
(413,119)
Encoder/User Metric Dataset
(12,344)
(Bitly API)
(Bitly API)
Phase 1 Phase 2 Phase 3
(54.13%) (100%)
16
Methodology
Presentation Outline
 Research Motivation and Aim
 Related Work
 Research Contribution
 Methodology
 Experiments and Analysis
 Malicious Bitly Link Detection
 Conclusion
 Future Work
 Questions
17
Presentation Outline
Metadata Analysis – Content Creation
Bitly Global Hash
Long URL
#Warnings
TLD
Domains
Status check after 5 months
(22,038)Link Dataset (763,160)
Non-existent
Domains
(18,966)
Whitelist check
Domains
(21,982)
Results:
 83.06% suspicious domains non-existent
before / after 5 months
 Total number of click requests made to these dead
domains (only in October) found to be 9,937,250
Inference:
Created for a dedicated purpose of spamming and
eventually die out after achieving significant
number of hits!! 18
Experiments and Analysis
Experiments and Analysis
19
Network Analysis
 Referrer Network
 Connected Network
Experiments and Analysis
Network Analysis - Referrer Network
Referrer as Twitter
37,903
last <=200 tweets
788,759
Text + URL + Domain
Jaccard Similarity |BA|
|BA|
B)J(A,


(Twitter API)
17 Twitter profiles with variance <=0.00012
…
Status check
after 5 months
Manual annotation
21,679
20
4,336
(11.44%)
636
(1.68%)
5,444
(14.36%)
5,302
(13.99%)
22,185
(58.53%)
Experiments and Analysis
Hour of the day vs. Minute of the hour graph for Twitter user –
(a) @dtitgp2. (b) @fujisakikaoru
3 profiles: pornographic content
1 profile: work from home scam
11 profiles: spam but <=3 tweets
2 profiles: suspended
21
Network Analysis – Connected Network
Experiments and Analysis
Twitter
3,415
(63.54%)
Facebook
951
(17.69%)
1,009
18.77%
5,375 users connected Twitter / Facebook profile
22
 Users can connect any number of Facebook / Twitter
accounts
Why more Twitter than Facebook?
 Doesn't allow users to connect Facebook brand / fan
pages for free
Multiple connections
 507 malicious users connected multiple Twitter accounts
 28 malicious users connected at least 10 Twitter accounts
Experiments and Analysis
Network Analysis – Connected Network
Connected OSM network of all encoders
23
Bitly profiles
(Link history)
Bitly warning check
(Connected Twitter
accounts)
(<=200 tweets)
Inter Twitter profile
Jaccard Similarity
(Bitly user name)
Inter Bitly profile
Jaccard Similarity
Manual annotation based on similarity scores
3 malicious communities detected
Experiments and Analysis
Network Analysis – Connected Network
Community 1
2 Bitly users, 9 associated Twitter accounts each
All 18 Twitter accounts shared similar explicit
pornographic content
Dormant on Bitly, active on Twitter
24
Experiments and Analysis
Network Analysis – Connected Network
Counter measure 1
Bitly should impose a restriction on the
number of OSM accounts a user can
connect
Experiments and Analysis
25
Experiments and Analysis
Security Analysis
 Efficiency Check
 Promptness Check
 Tractability Check
(a) Malicious link identification
Security Analysis - Efficiency Check
International consortium that
brings together businesses
affected by phishing attacks
Free checking of suspicious URLs using 52
different website / domain scanning engines
and datasets
Domain / IP level blacklist created
from the occurrence of websites
in unsolicited messages.
1 2 3
26
Experiments and Analysis
1
APWGs live
feed request
setup
142,660
6 months
216
2,656
2,872
+
Bitly warning check 382
(13.30%)
Inference:
 86% undetected malicious links by Bitly in 6 months
 Such low detection rate looks alarming as APWG is a popular and trusted source to detect phishing!
27
Experiments and Analysis
Security Analysis - Efficiency Check
Bitly is not even using the claimed detection services effectively
 Similar analysis when performed against Virustotal, 71.53% undetected links obtained
 36.66% domains blacklisted by SURBL, but undetected by Bitly
 Bitly claims to use SURBL (http://blog.bitly.com/post/138381844/spam-and-malware-protection)
(b) Malicious user profile Identification
collectedlinksTotal
pagewarningBitlytogredirectinLinks
FactorSuspicion encoder
#
#
 Suspicion Factor measures the credibility of a Bitly profile
 Computed for all 12,344 encoders in our dataset
28
Experiments and Analysis
Security Analysis - Efficiency Check
2,018 (12,344 - 10,326) out of 12,344 encoders (16.35%) had a Suspicion Factor=1
i.e. they shortened only suspicious links 29
Experiments and Analysis
Security Analysis - Efficiency Check
Counter measure 2
 Bitly should take some measures
to detect and suspend such users
 If not suspend, a credibility score
can be added with a Bitly profile
Our blog Bitly’s reponse
Tweet to our blog
30
Experiments and Analysis
Security Analysis - Efficiency Check
Highly suspicious profiles: User has shortened at least 100 links + Suspicion Factor is 1 80 profiles
31
Experiments and Analysis
Security Analysis – Promptness Check
User: bamsesang, Month lag: 24
 Also collected their recent link history (after 1 January 2014)
 4 of these 80 users were still active and propagating malicious content
Ease of penetration of spammers and delay in Bitly’s suspicious user detection process which it claims to follow
User: iplayonlinegames, Month lag: 18
Result:
 35.2% identified malicious links in October 2013 are also being actively clicked in year 2014
Inference:
 By-passable Bitly warning page is alone not enough to curtail the dissemination of spam
 No control over the access to already detected malicious links can heavily encourage
spammers to use Bitly
Popular malicious Bitly links
Links with large number of warning pages displayed (URLs with high overall impact)
Bitly Global Hash
Long URL
#Warnings
Link Dataset (763,160)
Reverse sort based on
number of warnings Top 1000
links
Bitly API Recent Click
history
32
Experiments and Analysis
Security Analysis – Tractability Check
Counter measure 3
Bitly should not only throw a
warning page but also block the
visit on popular malicious Bitly links
already detected
 Bitly is not using the claimed detection services effectively
 Extreme delay in suspicious user identification (if at all)
 By-passable Bitly warning page is alone not enough to control the
problem of spam
33
Experiments and Analysis
Security Analysis – Major Findings
Presentation Outline
 Research Motivation and Aim
 Related Work
 Research Contribution
 Methodology
 Experiments and Analysis
 Malicious Bitly Link Detection
 Conclusion
 Future Work
 Questions
34
Presentation Outline
Malicious Bitly Link Detection - Data
Collection and Labeling
a repository of suspected phishing or malware pages maintained by Google Inc.
a public crowdsourced database of phishing URLs
35
Malicious Bitly Link Detection
Tweets from
Twitter’s REST
API
(412,139)
Blacklist + Bitly Warning Check
Extract and
expand bitly
URLs
(34,802)
Malicious
Benign
labeled-datasetunlabeled-datasetCollect data
1. Google Safebrowsing
2. SURBL
3. PhishTank
4. VirusTotal
Data Collection Data Labeling
Malicious Bitly Link Detection – Feature
Selection
No. Feature Name Feature Description
1 Domain age Difference between domain creation / updation date and expiration
date
2 Link Creation domain
creation difference
Difference between domain creation date and bitly link creation date
3 Link creation hour Bitly link creation hour
4 Number of encoders Number of bitly users who encoded a particular link
5 Anonymous and API
encoder ratio
Ratio of encoders as ‘’anonymous’’ or from a Twitter based application
(Twitterfeed, TweetDeck, Tweetbot) to the total number of encoders
6 Link creation first click
difference
Difference in days between bitly link creation date and date of first
click received
7 Referring domains -
direct by total
Ratio of referring domains from a direct source to the total number of
referring domains
WHOIS
specific
Bitly
specific
Non-Click
based
Clickbased
36
Malicious Bitly Link Detection
Malicious Bitly Link Detection -
Experimental Setup
1) Naive Bayes
2) Decision Tree
3) Random Forest
Machine Learning Algorithms
Training on pre-labeled dataset
Predict labels of an unseen data
 Used the classifiers implemented in Weka software package
 Open source collection of machine learning classifiers for data mining tasks
37
Malicious Bitly Link Detection
Training and Testing Data
Testing
25%
Training
75%
10 fold
cross validation
Performance
evaluation on
unlabeled data
Malicious Bitly Link Detection –
Evaluation Results
(a) Experiment 1
Mix dataset – Click and Non-click
All features
Malicious Benign
5,926 6,074
Malicious Benign
2,074 1,926
Training data (75%) Test Data (25%)
Evaluation Metric Naive Bayes Decision Tree Random Forest
Accuracy 72.15% 78.37% 80.43%
Recall (malicious) 73.10% 82.40% 81.00%
Recall (Benign) 71.10% 74.10% 79.90%
Precision (malicious) 73.10% 77.40% 81.20%
Precision (Benign) 71.10% 79.60% 79.60%
F-measure (malicious) 73.10% 76.70% 81.10%
F-measure (benign) 71.10% 76.70% 79.70%
38
TP
FP
FN
TN
Malicious Bitly Link Detection
Malicious Bitly Link Detection -
Evaluation Results
(c) Experiment 2
Mix dataset – Click and Non-click
WHOIS + Non-click based features
Evaluation Metric Naive Bayes Decision Tree Random Forest
Accuracy 73.57% 81.93% 83.50%
Recall (malicious) 73.40% 83.50% 84.20%
Recall (Benign) 73.80% 80.30% 82.80%
Precision (malicious) 75.10% 82.00% 84.00%
Precision (Benign) 72.00% 81.80% 82.90%
F-measure (malicious) 74.20% 82.70% 84.10%
F-measure (benign) 72.90% 81.00% 82.80%
Malicious Benign
5,926 6,074
Malicious Benign
2,074 1,926
Training data (75%) Test Data (25%)
39
TP
FP
FN
TN
Malicious Bitly Link Detection
Malicious Bitly Link Detection –
Evaluation Results
(b) Experiment 3
Only Non-click data
WHOIS + Non-click based features
Malicious Benign
2,743 2,796
Malicious Benign
950 897
Training data (75%) Test Data (25%)
Evaluation Metric Naive Bayes Decision Tree Random Forest
Accuracy 80.02% 85.06% 86.41%
Recall (malicious) 79.60% 89.50% 89.60%
Recall (Benign) 80.40% 80.80% 83.40%
Precision (malicious) 79.30% 81.50% 83.60%
Precision (Benign) 80.70% 89.10% 89.50%
F-measure (malicious) 79.50% 85.30% 86.50%
F-measure (benign) 80.50% 84.80% 86.30%
40
TP
FP
FN
TN
Malicious Bitly Link Detection
Malicious Bitly Link Detection -
Feature Ranks
Rank Feature
1 Type of referring domains
2 Link Creation domain creation difference
3 Domain age
4 Link creation hour
5 Type of encoders
6 Link creation-click lag
7 Number of encoders
Rank Feature
1 Link creation hour
2 Link Creation domain creation difference
3 Domain age
4 Type of encoders
5 Number of encoders
Complete labeled-dataset Non-Click labeled-dataset
 Using Weka's InfoGainAttributeEval package for attribute selection
 Evaluates the worth of an attribute by measuring the information gain with respect to the class
41
Malicious Bitly Link Detection
Malicious Bitly Link Detection - Result
Summary
 Increase in accuracy and F-measure on using only non-click based
features
 Not only efficient in detecting clicked malicious Bitly links, but can
identify suspicious links even when no click is received
 This solution can also capture the multi level obfuscation technique used
by attackers
42
Malicious Bitly Link Detection
Counter measure 4
In addition to the blacklists and
other spam detection filters, Bitly
specific feature set can also be
used to detect malicious content
Proposed Counter Measures - Summary
 Impose a check on the number of connected OSM accounts by a single
profile
 Either directly delete the identified malicious links or introduce a
credibility score for each profile to warn users (of upcoming risks)
 More than a warning page, should go ahead and block the popular
malicious links to prevent its persistence over web
 In addition to various blacklists and other filters, can also incorporate
available Bitly analytics to better detect illegitimate content
43
Counter Measures
 Domains created for a dedicated purpose of spamming eventually die out after
achieving a significant number of hits
 Spammers exploit Bitly's policy of not imposing a cap on the number of connected OSM
accounts
 Existence of malicious communities which operate across Bitly and Twitter
 Bitly is not using the claimed detection services effectively
 Inability / extreme-delay in detection of malicious accounts
 By-passable warning page does not restrict the overall problem of spam
 High classification accuracy for non-click dataset; capable of identifying suspicious Bitly
links much before they target their audience
Conclusion
44
Conclusion
 Since characteristics of spammers change over time , do a detailed comparative
analysis on a more exhaustive dataset
 Broaden and generalize our feature set to detect spam from any short URL services
 Develop a browser extension that can work in real time and classify any short link as
malicious or benign
Future Work
45
Future Work
Acknowledgements
Dr. PK, IIIT-Delhi
Anupama Aggarwal, PhD ,IIIT-Delhi
Brian David Eoff (senior data scientist), Mark Josephson (CEO), Bitly
CERC, IIIT-Delhi
Precog members, friends and family
46
References (I)
 Alexander Neumann, Johannes Barnickel, Ulrike Meyer. Security and Privacy Implications of URL Shortening
Services. In proceedings of Web 2.0 Security and Privacy (W2SP) (2011).
 Florian Klien, Markus Strohmaier. Short Links Under Attack: Geographical Analysis of Spam in a URL Shortener
Network. In proceedings of the 23rd ACM conference on Hypertext and social media (2012), Pages 83-88.
 Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis. we.b: The web of short URLs. In proceedings of the
20th international conference on World wide web (2011), Pages 715-724.
 Federico Maggi, Alessandro Frossi, Stefano Zanero, Gianluca Stringhini, Brett Stone-Gross, Christopher
Kruegel, Giovanni Vigna. Two Years of Short URLs Internet Measurement: Security Threats and
Countermeasures. In proceedings of the 22nd international conference on World Wide Web (2013), Pages 861-
872.
 Sangho Lee and Jong Kim. WARNINGBIRD: Detecting Suspicious URLs in Twitter Stream. NDSS 2012 (2012).
 De Wang, Shamkant B. Navathe, Ling Liu, Danesh Irani, Acar Tamersoy, and Calton Pu. Click Traffic Analysis of
Short URL Spam on Twitter. Collaborative Computing: Networking, Applications and Worksharing
(Collaboratecom), 2013 9th International Conference (2013), Pages 250-259.
 Aditi Gupta and and Ponnurangam Kumaraguru. Credibility ranking of tweets during high impact events. In
Proceedings of the 1st Workshop on Privacy and Security in Online Social Media (PSOSM), In conjunction with
WWW'12 (2012).
47
 Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. Design and Evaluation of a Real-Time URL
Spam Filtering Service. Security and Privacy (SP) IEEE Symposium (2011), Pages 447 - 462.
 Hongyu Gao, Jun Hu, and Christo Wilson. Detecting and Characterizing Social Spam Campaigns. In proceedings
of the 10th ACM SIGCOMM conference on Internet measurement (2010), Pages 35-47.
 Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. Detecting Spammers on Twitter. In
Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS) (2010).
 Anupama Aggarwal, Ashwin Rajadesingan, and Ponnurangam Kumaraguru. PhishAri: Automatic Realtime
Phishing Detection on Twitter. In Seventh IEEE APWG eCrime researchers summit (eCRS) (2012). Master's
thesis, IIIT-Delhi, http://precog.iiitd.edu.in/Publications les/Anupama_MTech_Thesis.pdf (2012).
 Chris Grier, Kurt Thomas, Vern Paxson, and Michael Zhang. @spam: The Underground on 140 Characters or
Less. In proceedings of the 17th ACM conference on Computer and communications security (2010), Pages 27-
37.
 Sidharth Chhabra, Anupama Aggarwal, Fabricio Benevenuto, and Ponnurangam Kumaraguru. Phi.sh/$oCiaL:
The Phishing Landscape through Short URLs. CEAS '11 Proceedings of the 8th Annual Collaboration, Electronic
messaging, Anti-Abuse and Spam Conference (2011), Pages 92-101.
References (II)
48
 Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, and Suku Nair. A Comparison of Machine Learning Techniques
for Phishing Detection. In proceedings of eCrime researchers summit (2007), ACM, Pages 60-69.
 Mashable. Warning: Bit.ly Is Eating Other URL Shorteners for Breakfast.
http://mashable.com/2009/10/12/bitly-domination/, October 2009.
 Symantec. Spam with .gov URLs. http://www.symantec.com/connect/blogs/spam-gov-urls, October 2012.
 Symantec. Malicious Shortened URLS on Social Networking Sites.
http://www.symantec.com/threatreport/topic.jsp?id=threat_activity_trends&aid=malicious_shortened_urls,
2010.
 Mark Hall, Eibe Frank, Georey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA
Data Mining Software: An Update. ACM SIGKDD Explorations Newsletter (2009), vol. 11, no. 1, Pages 1018.
References (III)
49
Thank You!
Questions?
50

More Related Content

What's hot

Classification of instagram fake users using supervised machine learning algo...
Classification of instagram fake users using supervised machine learning algo...Classification of instagram fake users using supervised machine learning algo...
Classification of instagram fake users using supervised machine learning algo...
IJECEIAES
 
Structural Balance Theory Based Recommendation for Social Service Portal
Structural Balance Theory Based Recommendation for Social Service PortalStructural Balance Theory Based Recommendation for Social Service Portal
Structural Balance Theory Based Recommendation for Social Service Portal
YogeshIJTSRD
 

What's hot (20)

IRJET-Fake Product Review Monitoring
IRJET-Fake Product Review MonitoringIRJET-Fake Product Review Monitoring
IRJET-Fake Product Review Monitoring
 
Social networks protection against fake profiles and social bots attacks
Social networks protection against  fake profiles and social bots attacksSocial networks protection against  fake profiles and social bots attacks
Social networks protection against fake profiles and social bots attacks
 
Game Theory Approach for Identity Crime Detection
Game Theory Approach for Identity Crime DetectionGame Theory Approach for Identity Crime Detection
Game Theory Approach for Identity Crime Detection
 
IRJET- Fake Profile Identification using Machine Learning
IRJET-  	  Fake Profile Identification using Machine LearningIRJET-  	  Fake Profile Identification using Machine Learning
IRJET- Fake Profile Identification using Machine Learning
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
 
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
 
PHISHING MITIGATION TECHNIQUES: A LITERATURE SURVEY
PHISHING MITIGATION TECHNIQUES: A LITERATURE SURVEYPHISHING MITIGATION TECHNIQUES: A LITERATURE SURVEY
PHISHING MITIGATION TECHNIQUES: A LITERATURE SURVEY
 
IRJET- An Effective Analysis of Anti Troll System using Artificial Intell...
IRJET-  	  An Effective Analysis of Anti Troll System using Artificial Intell...IRJET-  	  An Effective Analysis of Anti Troll System using Artificial Intell...
IRJET- An Effective Analysis of Anti Troll System using Artificial Intell...
 
Classification of instagram fake users using supervised machine learning algo...
Classification of instagram fake users using supervised machine learning algo...Classification of instagram fake users using supervised machine learning algo...
Classification of instagram fake users using supervised machine learning algo...
 
F017433947
F017433947F017433947
F017433947
 
Tweet sentiment analysis
Tweet sentiment analysisTweet sentiment analysis
Tweet sentiment analysis
 
Twitter sentimentanalysis report
Twitter sentimentanalysis reportTwitter sentimentanalysis report
Twitter sentimentanalysis report
 
A Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering TechniquesA Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering Techniques
 
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORKMALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
 
IRJET - Twitter Sentimental Analysis
IRJET -  	  Twitter Sentimental AnalysisIRJET -  	  Twitter Sentimental Analysis
IRJET - Twitter Sentimental Analysis
 
Structural Balance Theory Based Recommendation for Social Service Portal
Structural Balance Theory Based Recommendation for Social Service PortalStructural Balance Theory Based Recommendation for Social Service Portal
Structural Balance Theory Based Recommendation for Social Service Portal
 
Development of Twitter Application #5 - Users
Development of Twitter Application #5 - UsersDevelopment of Twitter Application #5 - Users
Development of Twitter Application #5 - Users
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITYFRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
 
Learning to detect phishing ur ls
Learning to detect phishing ur lsLearning to detect phishing ur ls
Learning to detect phishing ur ls
 

Similar to Exploration of gaps in Bitly's spam detection and relevant countermeasures

PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHESPHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
IJCNCJournal
 
Phishing URL Detection using LSTM Based Ensemble Learning Approaches
Phishing URL Detection using LSTM Based Ensemble Learning ApproachesPhishing URL Detection using LSTM Based Ensemble Learning Approaches
Phishing URL Detection using LSTM Based Ensemble Learning Approaches
IJCNCJournal
 
Identification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A SurveyIdentification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A Survey
Cybersecurity Education and Research Centre
 

Similar to Exploration of gaps in Bitly's spam detection and relevant countermeasures (20)

Phishing detection & protection scheme
Phishing detection & protection schemePhishing detection & protection scheme
Phishing detection & protection scheme
 
IRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
IRJET- Detecting the Phishing Websites using Enhance Secure AlgorithmIRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
IRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
 
DETECTION OF MALICIOUS SOCIAL BOTS USING ML TECHNIQUE IN TWITTER NETWORK
DETECTION OF MALICIOUS SOCIAL BOTS USING ML TECHNIQUE IN TWITTER NETWORKDETECTION OF MALICIOUS SOCIAL BOTS USING ML TECHNIQUE IN TWITTER NETWORK
DETECTION OF MALICIOUS SOCIAL BOTS USING ML TECHNIQUE IN TWITTER NETWORK
 
Detecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine Learning
 
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHESPHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
PHISHING URL DETECTION USING LSTM BASED ENSEMBLE LEARNING APPROACHES
 
Phishing URL Detection using LSTM Based Ensemble Learning Approaches
Phishing URL Detection using LSTM Based Ensemble Learning ApproachesPhishing URL Detection using LSTM Based Ensemble Learning Approaches
Phishing URL Detection using LSTM Based Ensemble Learning Approaches
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptx
 
Spammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksSpammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social Networks
 
Malicious Link Detection System
Malicious Link Detection SystemMalicious Link Detection System
Malicious Link Detection System
 
Phishing Detection using Decision Tree Model
Phishing Detection using Decision Tree ModelPhishing Detection using Decision Tree Model
Phishing Detection using Decision Tree Model
 
IRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard AlgorithmIRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
IRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
 
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Pol...
 
IRJET - Chrome Extension for Detecting Phishing Websites
IRJET -  	  Chrome Extension for Detecting Phishing WebsitesIRJET -  	  Chrome Extension for Detecting Phishing Websites
IRJET - Chrome Extension for Detecting Phishing Websites
 
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
 
Detecting Phishing using Machine Learning
Detecting Phishing using Machine LearningDetecting Phishing using Machine Learning
Detecting Phishing using Machine Learning
 
Identification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A SurveyIdentification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A Survey
 
Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...
Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...
Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...
 
IRJET - Detecting Spiteful Accounts in Social Network
IRJET - Detecting Spiteful Accounts in Social NetworkIRJET - Detecting Spiteful Accounts in Social Network
IRJET - Detecting Spiteful Accounts in Social Network
 
Warningbird a near real time detection system for suspicious urls in twitter ...
Warningbird a near real time detection system for suspicious urls in twitter ...Warningbird a near real time detection system for suspicious urls in twitter ...
Warningbird a near real time detection system for suspicious urls in twitter ...
 

More from Cybersecurity Education and Research Centre

Video Inpainting detection using inconsistencies in optical Flow
Video Inpainting detection using inconsistencies in optical FlowVideo Inpainting detection using inconsistencies in optical Flow
Video Inpainting detection using inconsistencies in optical Flow
Cybersecurity Education and Research Centre
 
Clotho : Saving Programs from Malformed Strings and Incorrect
Clotho : Saving Programs from Malformed Strings and IncorrectClotho : Saving Programs from Malformed Strings and Incorrect
Clotho : Saving Programs from Malformed Strings and Incorrect
Cybersecurity Education and Research Centre
 
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling�Clotho: Saving Programs from Malformed Strings and Incorrect String-handling�
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling
Cybersecurity Education and Research Centre
 
Analyzing Social and Stylometric Features to Identify Spear phishing Emails
Analyzing Social and Stylometric Features to Identify Spear phishing EmailsAnalyzing Social and Stylometric Features to Identify Spear phishing Emails
Analyzing Social and Stylometric Features to Identify Spear phishing Emails
Cybersecurity Education and Research Centre
 
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing PageEmerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Cybersecurity Education and Research Centre
 
Securing the Digital Enterprise
Securing the Digital EnterpriseSecuring the Digital Enterprise
Securing the Digital Enterprise
Cybersecurity Education and Research Centre
 
Broker Bots: Analyzing automated activity during High Impact Events on Twitter
Broker Bots: Analyzing automated activity during High Impact Events on TwitterBroker Bots: Analyzing automated activity during High Impact Events on Twitter
Broker Bots: Analyzing automated activity during High Impact Events on Twitter
Cybersecurity Education and Research Centre
 
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Cybersecurity Education and Research Centre
 

More from Cybersecurity Education and Research Centre (15)

Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...
Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...
Novel Instruction Set Architecture Based Side Channels in popular SSL/TLS Imp...
 
Video Inpainting detection using inconsistencies in optical Flow
Video Inpainting detection using inconsistencies in optical FlowVideo Inpainting detection using inconsistencies in optical Flow
Video Inpainting detection using inconsistencies in optical Flow
 
TASVEER : Tomography of India’s Internet Infrastructure
TASVEER : Tomography of India’s Internet InfrastructureTASVEER : Tomography of India’s Internet Infrastructure
TASVEER : Tomography of India’s Internet Infrastructure
 
Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...
Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...
Data-Driven Assessment of Cyber Risk: Challenges in Assessing and Migrating C...
 
A Strategy for Addressing Cyber Security Challenges
A Strategy for Addressing Cyber Security Challenges A Strategy for Addressing Cyber Security Challenges
A Strategy for Addressing Cyber Security Challenges
 
Clotho : Saving Programs from Malformed Strings and Incorrect
Clotho : Saving Programs from Malformed Strings and IncorrectClotho : Saving Programs from Malformed Strings and Incorrect
Clotho : Saving Programs from Malformed Strings and Incorrect
 
National Critical Information Infrastructure Protection Centre (NCIIPC): Role...
National Critical Information Infrastructure Protection Centre (NCIIPC): Role...National Critical Information Infrastructure Protection Centre (NCIIPC): Role...
National Critical Information Infrastructure Protection Centre (NCIIPC): Role...
 
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling�Clotho: Saving Programs from Malformed Strings and Incorrect String-handling�
Clotho: Saving Programs from Malformed Strings and Incorrect String-handling
 
Analyzing Social and Stylometric Features to Identify Spear phishing Emails
Analyzing Social and Stylometric Features to Identify Spear phishing EmailsAnalyzing Social and Stylometric Features to Identify Spear phishing Emails
Analyzing Social and Stylometric Features to Identify Spear phishing Emails
 
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing PageEmerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
Emerging Phishing Trends and Effectiveness of the Anti-Phishing Landing Page
 
Securing the Digital Enterprise
Securing the Digital EnterpriseSecuring the Digital Enterprise
Securing the Digital Enterprise
 
Broker Bots: Analyzing automated activity during High Impact Events on Twitter
Broker Bots: Analyzing automated activity during High Impact Events on TwitterBroker Bots: Analyzing automated activity during High Impact Events on Twitter
Broker Bots: Analyzing automated activity during High Impact Events on Twitter
 
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
Twitter and Polls: What Do 140 Characters Say About India General Elections 2014
 
Web Application Security 101
Web Application Security 101Web Application Security 101
Web Application Security 101
 
The future of interaction & its security challenges
The future of interaction & its security challengesThe future of interaction & its security challenges
The future of interaction & its security challenges
 

Recently uploaded

VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 

Recently uploaded (20)

(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Intro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdfIntro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdf
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 

Exploration of gaps in Bitly's spam detection and relevant countermeasures

  • 1. Exploration of gaps in Bitly's spam detection and relevant counter measures Neha Gupta Advisor: Dr. Ponnurangam Kumaraguru M.Tech Thesis Defense 23-April-2014
  • 2. Thesis Committee  Mr. Sachin Gaur, MixORG  Dr. Vinayak Naik, IIIT-Delhi  Dr. PK (Chair), IIIT-Delhi 2
  • 3. Achievements  Gupta, N., Aggarwal, A., and Kumaraguru, P. bit.ly/can-do-better. Poster at Security and Privacy Symposium (SPS), IIT-K, 2014. 3
  • 4. Presentation Outline  Research Motivation and Aim  Related Work  Research Contribution  Methodology  Experiments and Analysis  Malicious Bitly Link Detection  Conclusion  Future Work  Questions 4 Presentation Outline
  • 5. What are URL shortening services? Long URL Short URL … Others http://bit.ly/1oL7gi5 https://www.youtube.com/watch?v=ukUL_I14GPw URL shortening service hash The most popular 5 Research Motivation and Aim  Shortens close to 80 million links each day  Marks 2-3 million as suspicious every week  Twitter’s default URL shortening service before 2011
  • 6. Use of URL shortening services  Space gain (Twitter’s 140 character limit)  More manageable  Prevent line breaks  Easy dissemination of content  Provides useful analytics (e.g. click data)  Online Social Media (OSM) connection  Complex link obfuscation Please like this picture http://3.bp.blogspot.com/_s5emCsFnEd E/TKUVi2BopBI/AAAAAAAADl8/lffvi7khF 7g/s1600/googl.png @abc Dr. ABC 10:44 PM Tue Aug 30, 2011 Please like this picture bit.ly/1hAVVaE @abc Dr. ABC 10:44 PM Tue Aug 30, 2011 versus 6 Research Motivation and Aim
  • 7. Abuse of URL shortening services - Attack scenario URL shortening service One-level obfuscation Long malicious URL Short malicious URL Many short URL services detect and restrict long URLs at submission, but Bitly does not Not so popular URL shortening service Long malicious URL Short malicious URL Popular URL shortening service Multi-level obfuscation … 7 Research Motivation and Aim http://www.sexpixbox.com/kingnet/cute/index.html http://bit.ly/QCMW2S http://www.sexpixbox.com/kingnet/cute/index.html http://bit.ly/QCMW2Shttp://short.me/ABCD
  • 8. Abuse of URL shortening services - Attack execution Legitimate looking tweet Scam!! 8 Research Motivation and Aim
  • 10. Bitly's Spam Detection Policies + + More filters.. 10 ‘‘ ’’ ‘‘ ’’ Research Motivation and Aim
  • 11. Research Aim A focused study on Bitly to:  characterize malicious URLs  examine Bitly’s security policies  identify Bitly specific features to detect spam 11 Research Motivation and Aim
  • 12. Presentation Outline  Research Motivation and Aim  Related Work  Research Contribution  Methodology  Experiments and Analysis  Malicious Bitly Link Detection  Conclusion  Future Work  Questions 12 Presentation Outline
  • 13. Related Work 2009 • Kandylas et al. Relative study of long and short Bitly URLs on Twitter 2010 • Benevenuto et al. + Grier et al. Identification of distinctive features to detect spammers on Twitter 2011 • Antoniades et al. Analysis of content, popularity, and impact of short URLs • Chhabra et al. Overview of evolving phishing attacks through short URLs on Twitter • Thomas et al. Classification of a long URL as malicious / benign in real time 2012 • Klien et al. Global usage pattern analysis of short URLs • Aggarwal et al. Real time phishing detection on Twitter using Twitter and URL based features • Lee et al. Real time suspicious URL detection technique on Twitter using conditional redirects 2013 • Maggi et al. Study of abuse of short URLs using 622 distinct shortening services 2014 • Nikiforakis et al. Study of ecosystem of ad-based URL shortening services 13 Related Work
  • 14. Related Work 2013 • Click Traffic Analysis of Short URL Spam on Twitter Wang et al. Classification of a link as spam / non-spam using only click traffic based features 14 Related Work  No dedicated study to identify ground security issues specific to a URL shortener  Unexplored short URL based features to detect malicious content before it targets the audience Research Gaps
  • 15. Research Contribution  Click traffic analysis and social network impact of malicious Bitly links  Detailed inspection to highlight weaknesses in Bitly’s spam detection techniques  Proposal to detect clicked / unclicked malicious Bitly links 15 Related Contribution
  • 16. Methodology - Acquiring Dataset link_encoder_info link_encoder_link_history link_info link_expand link_clicks link_referring_domains link_encoders Bitly Global Hash Long URL #Warnings Link Dataset (763,160) Link Metric Dataset (413,119) Encoder/User Metric Dataset (12,344) (Bitly API) (Bitly API) Phase 1 Phase 2 Phase 3 (54.13%) (100%) 16 Methodology
  • 17. Presentation Outline  Research Motivation and Aim  Related Work  Research Contribution  Methodology  Experiments and Analysis  Malicious Bitly Link Detection  Conclusion  Future Work  Questions 17 Presentation Outline
  • 18. Metadata Analysis – Content Creation Bitly Global Hash Long URL #Warnings TLD Domains Status check after 5 months (22,038)Link Dataset (763,160) Non-existent Domains (18,966) Whitelist check Domains (21,982) Results:  83.06% suspicious domains non-existent before / after 5 months  Total number of click requests made to these dead domains (only in October) found to be 9,937,250 Inference: Created for a dedicated purpose of spamming and eventually die out after achieving significant number of hits!! 18 Experiments and Analysis
  • 19. Experiments and Analysis 19 Network Analysis  Referrer Network  Connected Network Experiments and Analysis
  • 20. Network Analysis - Referrer Network Referrer as Twitter 37,903 last <=200 tweets 788,759 Text + URL + Domain Jaccard Similarity |BA| |BA| B)J(A,   (Twitter API) 17 Twitter profiles with variance <=0.00012 … Status check after 5 months Manual annotation 21,679 20 4,336 (11.44%) 636 (1.68%) 5,444 (14.36%) 5,302 (13.99%) 22,185 (58.53%) Experiments and Analysis Hour of the day vs. Minute of the hour graph for Twitter user – (a) @dtitgp2. (b) @fujisakikaoru 3 profiles: pornographic content 1 profile: work from home scam 11 profiles: spam but <=3 tweets 2 profiles: suspended
  • 21. 21 Network Analysis – Connected Network Experiments and Analysis
  • 22. Twitter 3,415 (63.54%) Facebook 951 (17.69%) 1,009 18.77% 5,375 users connected Twitter / Facebook profile 22  Users can connect any number of Facebook / Twitter accounts Why more Twitter than Facebook?  Doesn't allow users to connect Facebook brand / fan pages for free Multiple connections  507 malicious users connected multiple Twitter accounts  28 malicious users connected at least 10 Twitter accounts Experiments and Analysis Network Analysis – Connected Network Connected OSM network of all encoders
  • 23. 23 Bitly profiles (Link history) Bitly warning check (Connected Twitter accounts) (<=200 tweets) Inter Twitter profile Jaccard Similarity (Bitly user name) Inter Bitly profile Jaccard Similarity Manual annotation based on similarity scores 3 malicious communities detected Experiments and Analysis Network Analysis – Connected Network
  • 24. Community 1 2 Bitly users, 9 associated Twitter accounts each All 18 Twitter accounts shared similar explicit pornographic content Dormant on Bitly, active on Twitter 24 Experiments and Analysis Network Analysis – Connected Network Counter measure 1 Bitly should impose a restriction on the number of OSM accounts a user can connect
  • 25. Experiments and Analysis 25 Experiments and Analysis Security Analysis  Efficiency Check  Promptness Check  Tractability Check
  • 26. (a) Malicious link identification Security Analysis - Efficiency Check International consortium that brings together businesses affected by phishing attacks Free checking of suspicious URLs using 52 different website / domain scanning engines and datasets Domain / IP level blacklist created from the occurrence of websites in unsolicited messages. 1 2 3 26 Experiments and Analysis
  • 27. 1 APWGs live feed request setup 142,660 6 months 216 2,656 2,872 + Bitly warning check 382 (13.30%) Inference:  86% undetected malicious links by Bitly in 6 months  Such low detection rate looks alarming as APWG is a popular and trusted source to detect phishing! 27 Experiments and Analysis Security Analysis - Efficiency Check Bitly is not even using the claimed detection services effectively  Similar analysis when performed against Virustotal, 71.53% undetected links obtained  36.66% domains blacklisted by SURBL, but undetected by Bitly  Bitly claims to use SURBL (http://blog.bitly.com/post/138381844/spam-and-malware-protection)
  • 28. (b) Malicious user profile Identification collectedlinksTotal pagewarningBitlytogredirectinLinks FactorSuspicion encoder # #  Suspicion Factor measures the credibility of a Bitly profile  Computed for all 12,344 encoders in our dataset 28 Experiments and Analysis Security Analysis - Efficiency Check
  • 29. 2,018 (12,344 - 10,326) out of 12,344 encoders (16.35%) had a Suspicion Factor=1 i.e. they shortened only suspicious links 29 Experiments and Analysis Security Analysis - Efficiency Check Counter measure 2  Bitly should take some measures to detect and suspend such users  If not suspend, a credibility score can be added with a Bitly profile
  • 30. Our blog Bitly’s reponse Tweet to our blog 30 Experiments and Analysis Security Analysis - Efficiency Check
  • 31. Highly suspicious profiles: User has shortened at least 100 links + Suspicion Factor is 1 80 profiles 31 Experiments and Analysis Security Analysis – Promptness Check User: bamsesang, Month lag: 24  Also collected their recent link history (after 1 January 2014)  4 of these 80 users were still active and propagating malicious content Ease of penetration of spammers and delay in Bitly’s suspicious user detection process which it claims to follow User: iplayonlinegames, Month lag: 18
  • 32. Result:  35.2% identified malicious links in October 2013 are also being actively clicked in year 2014 Inference:  By-passable Bitly warning page is alone not enough to curtail the dissemination of spam  No control over the access to already detected malicious links can heavily encourage spammers to use Bitly Popular malicious Bitly links Links with large number of warning pages displayed (URLs with high overall impact) Bitly Global Hash Long URL #Warnings Link Dataset (763,160) Reverse sort based on number of warnings Top 1000 links Bitly API Recent Click history 32 Experiments and Analysis Security Analysis – Tractability Check Counter measure 3 Bitly should not only throw a warning page but also block the visit on popular malicious Bitly links already detected
  • 33.  Bitly is not using the claimed detection services effectively  Extreme delay in suspicious user identification (if at all)  By-passable Bitly warning page is alone not enough to control the problem of spam 33 Experiments and Analysis Security Analysis – Major Findings
  • 34. Presentation Outline  Research Motivation and Aim  Related Work  Research Contribution  Methodology  Experiments and Analysis  Malicious Bitly Link Detection  Conclusion  Future Work  Questions 34 Presentation Outline
  • 35. Malicious Bitly Link Detection - Data Collection and Labeling a repository of suspected phishing or malware pages maintained by Google Inc. a public crowdsourced database of phishing URLs 35 Malicious Bitly Link Detection Tweets from Twitter’s REST API (412,139) Blacklist + Bitly Warning Check Extract and expand bitly URLs (34,802) Malicious Benign labeled-datasetunlabeled-datasetCollect data 1. Google Safebrowsing 2. SURBL 3. PhishTank 4. VirusTotal Data Collection Data Labeling
  • 36. Malicious Bitly Link Detection – Feature Selection No. Feature Name Feature Description 1 Domain age Difference between domain creation / updation date and expiration date 2 Link Creation domain creation difference Difference between domain creation date and bitly link creation date 3 Link creation hour Bitly link creation hour 4 Number of encoders Number of bitly users who encoded a particular link 5 Anonymous and API encoder ratio Ratio of encoders as ‘’anonymous’’ or from a Twitter based application (Twitterfeed, TweetDeck, Tweetbot) to the total number of encoders 6 Link creation first click difference Difference in days between bitly link creation date and date of first click received 7 Referring domains - direct by total Ratio of referring domains from a direct source to the total number of referring domains WHOIS specific Bitly specific Non-Click based Clickbased 36 Malicious Bitly Link Detection
  • 37. Malicious Bitly Link Detection - Experimental Setup 1) Naive Bayes 2) Decision Tree 3) Random Forest Machine Learning Algorithms Training on pre-labeled dataset Predict labels of an unseen data  Used the classifiers implemented in Weka software package  Open source collection of machine learning classifiers for data mining tasks 37 Malicious Bitly Link Detection Training and Testing Data Testing 25% Training 75% 10 fold cross validation Performance evaluation on unlabeled data
  • 38. Malicious Bitly Link Detection – Evaluation Results (a) Experiment 1 Mix dataset – Click and Non-click All features Malicious Benign 5,926 6,074 Malicious Benign 2,074 1,926 Training data (75%) Test Data (25%) Evaluation Metric Naive Bayes Decision Tree Random Forest Accuracy 72.15% 78.37% 80.43% Recall (malicious) 73.10% 82.40% 81.00% Recall (Benign) 71.10% 74.10% 79.90% Precision (malicious) 73.10% 77.40% 81.20% Precision (Benign) 71.10% 79.60% 79.60% F-measure (malicious) 73.10% 76.70% 81.10% F-measure (benign) 71.10% 76.70% 79.70% 38 TP FP FN TN Malicious Bitly Link Detection
  • 39. Malicious Bitly Link Detection - Evaluation Results (c) Experiment 2 Mix dataset – Click and Non-click WHOIS + Non-click based features Evaluation Metric Naive Bayes Decision Tree Random Forest Accuracy 73.57% 81.93% 83.50% Recall (malicious) 73.40% 83.50% 84.20% Recall (Benign) 73.80% 80.30% 82.80% Precision (malicious) 75.10% 82.00% 84.00% Precision (Benign) 72.00% 81.80% 82.90% F-measure (malicious) 74.20% 82.70% 84.10% F-measure (benign) 72.90% 81.00% 82.80% Malicious Benign 5,926 6,074 Malicious Benign 2,074 1,926 Training data (75%) Test Data (25%) 39 TP FP FN TN Malicious Bitly Link Detection
  • 40. Malicious Bitly Link Detection – Evaluation Results (b) Experiment 3 Only Non-click data WHOIS + Non-click based features Malicious Benign 2,743 2,796 Malicious Benign 950 897 Training data (75%) Test Data (25%) Evaluation Metric Naive Bayes Decision Tree Random Forest Accuracy 80.02% 85.06% 86.41% Recall (malicious) 79.60% 89.50% 89.60% Recall (Benign) 80.40% 80.80% 83.40% Precision (malicious) 79.30% 81.50% 83.60% Precision (Benign) 80.70% 89.10% 89.50% F-measure (malicious) 79.50% 85.30% 86.50% F-measure (benign) 80.50% 84.80% 86.30% 40 TP FP FN TN Malicious Bitly Link Detection
  • 41. Malicious Bitly Link Detection - Feature Ranks Rank Feature 1 Type of referring domains 2 Link Creation domain creation difference 3 Domain age 4 Link creation hour 5 Type of encoders 6 Link creation-click lag 7 Number of encoders Rank Feature 1 Link creation hour 2 Link Creation domain creation difference 3 Domain age 4 Type of encoders 5 Number of encoders Complete labeled-dataset Non-Click labeled-dataset  Using Weka's InfoGainAttributeEval package for attribute selection  Evaluates the worth of an attribute by measuring the information gain with respect to the class 41 Malicious Bitly Link Detection
  • 42. Malicious Bitly Link Detection - Result Summary  Increase in accuracy and F-measure on using only non-click based features  Not only efficient in detecting clicked malicious Bitly links, but can identify suspicious links even when no click is received  This solution can also capture the multi level obfuscation technique used by attackers 42 Malicious Bitly Link Detection Counter measure 4 In addition to the blacklists and other spam detection filters, Bitly specific feature set can also be used to detect malicious content
  • 43. Proposed Counter Measures - Summary  Impose a check on the number of connected OSM accounts by a single profile  Either directly delete the identified malicious links or introduce a credibility score for each profile to warn users (of upcoming risks)  More than a warning page, should go ahead and block the popular malicious links to prevent its persistence over web  In addition to various blacklists and other filters, can also incorporate available Bitly analytics to better detect illegitimate content 43 Counter Measures
  • 44.  Domains created for a dedicated purpose of spamming eventually die out after achieving a significant number of hits  Spammers exploit Bitly's policy of not imposing a cap on the number of connected OSM accounts  Existence of malicious communities which operate across Bitly and Twitter  Bitly is not using the claimed detection services effectively  Inability / extreme-delay in detection of malicious accounts  By-passable warning page does not restrict the overall problem of spam  High classification accuracy for non-click dataset; capable of identifying suspicious Bitly links much before they target their audience Conclusion 44 Conclusion
  • 45.  Since characteristics of spammers change over time , do a detailed comparative analysis on a more exhaustive dataset  Broaden and generalize our feature set to detect spam from any short URL services  Develop a browser extension that can work in real time and classify any short link as malicious or benign Future Work 45 Future Work
  • 46. Acknowledgements Dr. PK, IIIT-Delhi Anupama Aggarwal, PhD ,IIIT-Delhi Brian David Eoff (senior data scientist), Mark Josephson (CEO), Bitly CERC, IIIT-Delhi Precog members, friends and family 46
  • 47. References (I)  Alexander Neumann, Johannes Barnickel, Ulrike Meyer. Security and Privacy Implications of URL Shortening Services. In proceedings of Web 2.0 Security and Privacy (W2SP) (2011).  Florian Klien, Markus Strohmaier. Short Links Under Attack: Geographical Analysis of Spam in a URL Shortener Network. In proceedings of the 23rd ACM conference on Hypertext and social media (2012), Pages 83-88.  Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis. we.b: The web of short URLs. In proceedings of the 20th international conference on World wide web (2011), Pages 715-724.  Federico Maggi, Alessandro Frossi, Stefano Zanero, Gianluca Stringhini, Brett Stone-Gross, Christopher Kruegel, Giovanni Vigna. Two Years of Short URLs Internet Measurement: Security Threats and Countermeasures. In proceedings of the 22nd international conference on World Wide Web (2013), Pages 861- 872.  Sangho Lee and Jong Kim. WARNINGBIRD: Detecting Suspicious URLs in Twitter Stream. NDSS 2012 (2012).  De Wang, Shamkant B. Navathe, Ling Liu, Danesh Irani, Acar Tamersoy, and Calton Pu. Click Traffic Analysis of Short URL Spam on Twitter. Collaborative Computing: Networking, Applications and Worksharing (Collaboratecom), 2013 9th International Conference (2013), Pages 250-259.  Aditi Gupta and and Ponnurangam Kumaraguru. Credibility ranking of tweets during high impact events. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media (PSOSM), In conjunction with WWW'12 (2012). 47
  • 48.  Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. Design and Evaluation of a Real-Time URL Spam Filtering Service. Security and Privacy (SP) IEEE Symposium (2011), Pages 447 - 462.  Hongyu Gao, Jun Hu, and Christo Wilson. Detecting and Characterizing Social Spam Campaigns. In proceedings of the 10th ACM SIGCOMM conference on Internet measurement (2010), Pages 35-47.  Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. Detecting Spammers on Twitter. In Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS) (2010).  Anupama Aggarwal, Ashwin Rajadesingan, and Ponnurangam Kumaraguru. PhishAri: Automatic Realtime Phishing Detection on Twitter. In Seventh IEEE APWG eCrime researchers summit (eCRS) (2012). Master's thesis, IIIT-Delhi, http://precog.iiitd.edu.in/Publications les/Anupama_MTech_Thesis.pdf (2012).  Chris Grier, Kurt Thomas, Vern Paxson, and Michael Zhang. @spam: The Underground on 140 Characters or Less. In proceedings of the 17th ACM conference on Computer and communications security (2010), Pages 27- 37.  Sidharth Chhabra, Anupama Aggarwal, Fabricio Benevenuto, and Ponnurangam Kumaraguru. Phi.sh/$oCiaL: The Phishing Landscape through Short URLs. CEAS '11 Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (2011), Pages 92-101. References (II) 48
  • 49.  Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, and Suku Nair. A Comparison of Machine Learning Techniques for Phishing Detection. In proceedings of eCrime researchers summit (2007), ACM, Pages 60-69.  Mashable. Warning: Bit.ly Is Eating Other URL Shorteners for Breakfast. http://mashable.com/2009/10/12/bitly-domination/, October 2009.  Symantec. Spam with .gov URLs. http://www.symantec.com/connect/blogs/spam-gov-urls, October 2012.  Symantec. Malicious Shortened URLS on Social Networking Sites. http://www.symantec.com/threatreport/topic.jsp?id=threat_activity_trends&aid=malicious_shortened_urls, 2010.  Mark Hall, Eibe Frank, Georey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA Data Mining Software: An Update. ACM SIGKDD Explorations Newsletter (2009), vol. 11, no. 1, Pages 1018. References (III) 49

Editor's Notes

  1. Dominated the market and gained major traction when Twitter started to use it as a default URL shortener in year 2009 before the launch of its own service, t.co in the year 2011.
  2. This snapshot on the right clearly shows the amount of space saved
  3. Thus task of an attacker becomes even more simple on bitly
  4. Here we can see some major attacks carried out by exploiting bitlyWe are interested in studying bitly not only because it is the most popular but also because attackers take undue advantage of the trust users have on Bitly. This makes them propagate a lot of spam using bitly as a medium. All this imposes additional performance pressure on Bitly to timely detect and handle illegitimate content
  5. Domain / IP level blacklist created from the occurrence of websites in unsolicited messages.Google Safebrowsing 2 is a repository of suspected phishing or malware pages maintained by Google Inc.
  6. APWG: International consortium that brings together businesses affected by phishing attacksFirst analysis that we did was to analyze the metadata to understand the content creation
  7. Referrer network is the network that is directing traffic to the maliciousbitly linksAll this makes it evident that the detection of malicious Bitly URLs can also help identify suspicious Twitter profiles. The results also highlights that only detection but no deletion of these Bitly links does not restrict the activities of spammers.After jaccard value 0.00012, the next value Explain jaccard similarity
  8. Connected OSM network of all encoders in link metric dataset
  9. *In order to infer a possible reason behind the connection of multiple twitter accounts with a single bitly account, we did a small experiment*We again computed the jaccard similarity scores and compared all these accounts with each otherNext we collected all their connected twitter accounts with their tweets and inter compared them based on their jaccard similarity scoresAll the bitly profiles were then also inter compared the same wayWith all this experiment and a little manual annotation, we could identify 3 malicious communities in our dataset
  10. Sharing pornographic content is banned according to Twitter’s policiesExistence of these malicious communities across Bitly and Twitter shows that spammers exploit Bitly’s policy of imposing no restriction on the number of connected OSM accountsThis community of 2 Bitly and 45 Twitter accounts appears to be an active spam campaign
  11. In order to inspect the efficiency of Bitly in identifying malicious links, we did a comparison against 3 popular blacklists
  12. Bitly does not claim to use APWG, but such low rate still looks alarming as APWG is a popular and trusted source to detect phishing. Since VirusTotal is a collated result set from 52 detection services, such high undetection rate again highlights how Bitly misses on a lot of spamWe also did similar kind of experiments using virustotal and SURBL, details of these experiments are documented in my thesis report
  13. To make users aware of the malicious profiles and uncoming risks* 2,558 encoders (20.72%) had at least 80% of their shortened URLs as malicious (Suspicion Factor &gt;= 0.8)Highlights the malicious intent of these encoders on creating their Bitly accounts
  14. Our results in the above experiment clearly brings to light as to how Bitly users keep shortening only malicious URLs. Till this point of our study, it was unknown to us as to how (if at all) Bitly reacts to suspicious user profiles. To report our findings and get some insights, we made a blog entry on our initial data analysis
  15. This particular graph gives the month separated timeline for the bitly user bamsesang* Extracted Link creation time and Number of clicks* user bamsesang shortened all malicious links for 7 months, remained inactive for close to one year and then shortened the bad URLs again.
  16. * 7 of 10 malicious Bitly links with more than 70,000 warning pages are also active in terms of luring users and receiving clicks* important to study because it gives a clear picture about the persistence of already identified malignant URL propagation through Bitly network.* Restricted our study to only popular links because we wanted to capture the URLs with high overall impact.* Collected click history of top 1000 Bitly links from our link-dataset based on the number of warning pages reported in the month of October 2013 * Bitly detected these suspicious links months before, users are still getting trapped and visiting these links
  17. * In order to collect the data, we used Twitter Rest API 1 and its search method to get only tweets with a Bitly URL.* restricted our search query to bit.lySURBL 3 is a consolidated list of websites that appear in unsolicited messages. SURBL lookup feature allows a user to check a domain name against the ones blacklisted by SURBL. For this we used SURBL client library implemented in python.VirusTotal is an aggregated information warehouse of malicious links and domains as marked by 52 website scanning engines and contributed byusers.
  18. WHOIS is a query and response protocol that gives information like domain name, domain creation / updation date and domain expiration date for a particular URL.Non-Click based features are the ones which define general characteristics of a short Bitly URL and are independent of its click history. Clickbased features on the other hand depends on the click analytics of a Bitly link.
  19. Simple probabilistic classifier based on the Bayes&apos; theorem, Maximum likelihood principle, Data point classified into the class with highest probability Most popular , Decision tree as a predictive model,Binary (yes / no) decisions at each level Multiple decision trees, Output class is the mode of classes output by individual trees, Can run efficiently on large datasets
  20. * Of the 8,000 malicious ground truth links in our labeled-dataset, we found that 3,693 (46.16%) links were never clicked by anyone. * Although this property itself serves as a feature in the above classication, but we believe that our classier should perform even better if we segregate all links with zero clicks.
  21. On using onlyclick-dataset, accuracy dropped to 74% -&gt; though this is decent, but this shows that the click pattern is not very distinct for malicious / benign linksThus our algorithm is not only efficient in detecting malicious links after they receive clicks, but can also identify such malicious content much before it target its audience
  22. our data is only october 13, Since the characteristics of spammers change over time, we can do a detailed and comparative analysis on a more exhaustive datasetClassier currently works better on non-click dataset, because only 2 click based features used for the classificationThis would require a temporal analysis and can serve as another good feature for our classierProfile attributes not considered because it requires extracting encoder information and analyzing their click history for each link. We understand that such an evaluation is a time consuming process.As a part of our future work, we would like to include these parameters since incorporation of such discriminating features can help improve the overall performance ofour classifiers
  23. AnupamaAggarwal for shepherding me and spending her valuable time to review my thesis.Bitly and particularly Brian David Eo (senior data scientist) and Mark Josephson (CEO) for sharing the data with us.express my sincere gratitude to CERC at IIIT-Delhi for providing me an exposure to share my ideas with experts from different parts of the world. I thank my fellow lab-mates from Precog research group at IIIT- Delhi for all their encouragement and insightful comments. Last but not the least, I would like to thank all my family members and friends who encouraged and kept me motivated throughout the project.