Classifying Phishing URLs Using Recurrent Neural Networks
Sergio Villegas, Javier Vargas, *Alejandro Correa Bahnsen
Easy Solutions Research
Eduardo Contreras Bohorquez, Fabio A. Gonzalez
MindLab Research Group, Universidad Nacional de Colombia
About Easy Solutions®
A leading global provider of electronic fraud prevention for financial institutions and enterprise customers
• 385 customers in 30 countries
• 100 million users protected
• 27+ billion online connections monitored
Industry recognition
Easy Solutions to be Acquired by New Joint Venture Creating Global, Secure Infrastructure Company
Phishing
Phishing is the act of defrauding an online user in order to obtain personal information by posing as a trustworthy institution or entity.
Typical Phishing Example
Why Phishing Detection is Hard
[Figure: three screenshots compared: Original Website, Only Using Images, Subtle Changes]
Ideal Phishing Detection System
[Figure: a website is passed to a Machine Learning Algorithm, which answers: Is It Phishing?]
Ideal Phishing Detection System - Issues
Issues with full content analysis:
• Time consuming
• Impractical to process millions of websites per day
• Hard to implement for small devices
There is always a need for a URL
Database of URLs
1,000,000 Phishing URLs from PhishTank
• http://moviesjingle.com/auto/163.com/index.php
• http://paypal.com.update.account.toughbook.cl/8a30e847925afc5975161aeabe8930f1/?cmd=_home&dispatch=d09b78f5812945a73610edf38
• http://msystemtech.ru/components/com_users/Italy/zz/Login.php?run=_login-submit&session=68bbd43c854147324d77872062349924
1,000,000 Legitimate URLs from Common Crawl
• https://www.sanfordhealth.org/ChildrensHealth/Article/73980
• http://www.grahamleader.com/ci_25029538/these-are-5-worst-super-bowl-halftime-shows&defid=1634182
• http://www.carolinaguesthouse.co.uk/onlinebooking/?industrytype=1&startdate=2013-09-05&nights=2&location&productid=25d47a24-6b74
CLASSIFYING PHISHING USING URL LEXICAL AND STATISTICAL FREQUENCIES
URL Lexical and Statistical Frequencies
Example URL: http://www.papaya.com/secure_login.php
Features extracted from the URL:
• URL length
• Alexa Ranking
• Path length
• URL Entropy
• # of .com
• Punctuation count
• TLD count
• Is IP?
• Euclidean distance
• KS & KL distance
Is It Phishing?
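The slides name the features but do not define them, so the following is only a minimal sketch of how a few of them could be computed with the Python standard library. The function names (`shannon_entropy`, `is_ip_host`, `lexical_features`) and the exact definitions are assumptions made for illustration. Alexa Ranking, TLD count, and the Euclidean, KS, and KL distances are left out because they need external reference data that the slides do not describe.

```python
import ipaddress
import math
import string
from collections import Counter
from urllib.parse import urlparse


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_ip_host(host: str) -> bool:
    """True when the host part is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Assumed definitions of some of the features listed on the slide."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "path_length": len(parsed.path),
        "url_entropy": shannon_entropy(url),
        "dot_com_count": url.count(".com"),
        "punctuation_count": sum(ch in string.punctuation for ch in url),
        "is_ip": is_ip_host(host),
    }


features = lexical_features("http://www.papaya.com/secure_login.php")
```

A feature dictionary like this would then be turned into a numeric vector and passed to the classifier that answers "Is It Phishing?".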
URL Lexical and Statistical Frequencies - Results
3-Fold CV    Accuracy   Recall   Precision
Average      93.47%     93.28%   93.64%
Deviation    0.01%      0.02%    0.03%
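For readers who want to see how figures of this kind are produced, here is a small sketch that computes accuracy, recall, and precision per fold and then the average and deviation across folds. The labels are toy values made up for illustration, not the data behind the slide, and the use of the population standard deviation (`pstdev`) is an assumption: the slide does not say how "Deviation" was computed.

```python
from statistics import mean, pstdev


def classification_metrics(y_true, y_pred):
    """Accuracy, recall and precision for binary labels (1 = phishing)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def summarize_folds(per_fold):
    """(average, deviation) of each metric across the cross-validation folds."""
    names = per_fold[0].keys()
    return {
        name: (mean(f[name] for f in per_fold), pstdev(f[name] for f in per_fold))
        for name in names
    }


# Toy labels for three folds, invented for illustration only.
fold_results = [
    classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]),
    classification_metrics([1, 1, 0, 0], [1, 1, 0, 1]),
    classification_metrics([1, 1, 0, 0], [1, 1, 0, 0]),
]
summary = summarize_folds(fold_results)
```

Recall answers "how many phishing URLs were caught", while precision answers "how many flagged URLs were really phishing", which is why the slide reports both alongside accuracy.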
URL Lexical and Statistical Frequencies - Feature Importance
[Figure: Feature Importance chart]
MODELING PHISHING URLS WITH RECURRENT NEURAL NETWORKS
Normal Neural Network
Source: https://en.wikipedia.org/wiki/Artificial_neural_network
Recurrent Neural Networks (RNN)
Have loops!
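The "loop" can be made concrete with a one-unit recurrent network: at each step the new hidden state is computed from the current input and from the previous hidden state. The sketch below is a toy with hand-picked weights (`w_x`, `w_h`, `b` are illustrative assumptions), not the network used in the talk.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The loop: the new state depends on the input AND the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


states = rnn_forward([1.0, 0.0, 0.0])
```

Even though the second and third inputs are zero, the later states stay non-zero because the first input is carried forward through the recurrent connection.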
The Problem of Long-Term Dependencies
Short-term dependencies are easy; long-term …
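The slide does not say why long-term dependencies are hard. One commonly cited reason is that the influence of an early state on a later one shrinks as it is multiplied through many steps. The toy below, a scalar RNN with an assumed recurrent weight `w_h = 0.9`, measures that influence with the chain rule; it is an illustration under those assumptions, not an analysis from the talk.

```python
import math


def state_sensitivity(num_steps, w_h=0.9, first_input=1.0):
    """|d h_T / d h_0| for the scalar RNN h_t = tanh(w_h * h_{t-1} + x_t)."""
    h = 0.0
    grad = 1.0
    for t in range(num_steps):
        x = first_input if t == 0 else 0.0
        h = math.tanh(w_h * h + x)
        # Chain rule: d h_t / d h_{t-1} = w_h * (1 - h_t^2)
        grad *= w_h * (1.0 - h * h)
    return abs(grad)


short_range = state_sensitivity(5)
long_range = state_sensitivity(50)
```

Each factor in the product is below 0.9 here, so the sensitivity over 50 steps is far smaller than over 5 steps, which makes distant context hard to learn from.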
Long Short-Term Memory Networks (LSTM)
• The repeating module of a standard RNN contains a single layer
• The repeating module of an LSTM contains four interacting layers
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short-Term Memory Networks (LSTM)
Key idea: Cell State
LSTM Step-by-Step
Step 1. Decide what information from the old cell state is going to be used (forget gate)
Step 2. Decide which new information is stored (input gate)
Step 3. Update the old cell state
Step 4. Make the prediction (output gate)
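The four steps can be sketched as one time step of a single-unit LSTM, following the standard formulation described in the post cited on the earlier slide. The weights in `params` are hypothetical values picked by hand so the arithmetic is easy to follow; a trained model would learn them, and a real layer would use vectors and matrices rather than scalars.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM; p maps gate name -> (w_x, w_h, b)."""
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # Step 1: how much of the old cell state to keep
    i = sigmoid(pre("input"))         # Step 2: how much new information to store
    c_tilde = math.tanh(pre("cand"))  # Step 2: candidate values to store
    c = f * c_prev + i * c_tilde      # Step 3: update the old cell state
    o = sigmoid(pre("output"))        # Step 4: decide what to output
    h = o * math.tanh(c)
    return h, c


# Hypothetical weights chosen by hand; a trained model would learn them.
params = {
    "forget": (0.0, 0.0, 0.0),
    "input": (0.0, 0.0, 0.0),
    "cand": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=params)
```

With all gate weights at zero, each gate evaluates to 0.5, so half of the old cell state is kept and half of the candidate value is added. The additive update in Step 3 is what lets the cell state carry information across many steps.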
Modeling Architecture for URL Classification
[Figure: the URL (http://www.papaya.com) is split into characters; each character is One hot Encoded, mapped to an Embedding vector, and fed through a sequence of LSTM units whose output goes to a Sigmoid]
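The input side of this architecture can be sketched without any deep learning library. The snippet below encodes a URL character by character and shows that multiplying a one-hot vector by an embedding table is the same as looking up one row of that table. The vocabulary (printable ASCII), the sequence length `MAX_LEN`, the embedding size `EMBED_DIM`, and the left-padding scheme are all assumptions for illustration; the slides do not give these values.

```python
import random
import string

# Assumed vocabulary: printable ASCII; index 0 is reserved for padding.
VOCAB = {ch: idx + 1 for idx, ch in enumerate(string.printable)}
MAX_LEN = 30     # hypothetical fixed sequence length
EMBED_DIM = 4    # hypothetical embedding size


def encode_url(url, max_len=MAX_LEN):
    """Map characters to integer ids, then truncate or left-pad to max_len."""
    ids = [VOCAB.get(ch, 0) for ch in url][:max_len]
    return [0] * (max_len - len(ids)) + ids


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


rng = random.Random(0)
vocab_size = len(VOCAB) + 1
# Randomly initialised embedding table; training would adjust these values.
embedding = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
             for _ in range(vocab_size)]

ids = encode_url("http://www.papaya.com")

# Multiplying a one-hot vector by the table selects one row of the table.
last_char_one_hot = one_hot(ids[-1], vocab_size)
via_matmul = [
    sum(last_char_one_hot[r] * embedding[r][d] for r in range(vocab_size))
    for d in range(EMBED_DIM)
]

# Sequence of embedding vectors: what the LSTM units would consume.
sequence = [embedding[i] for i in ids]
```

In a full model, `sequence` would be passed through the LSTM units, and the final hidden state through a sigmoid to produce a phishing probability between 0 and 1.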
Long Short-Term Memory Networks - Results
3-Fold CV    Accuracy   Recall   Precision
Average      98.76%     98.93%   98.60%
Deviation    0.04%      0.02%    0.02%
Models Comparison
[Figure: bar chart of Accuracy, Recall, and Precision (axis from 90% to 100%) for the Long Short-Term Memory Network and the Random Forest]
Models Comparison
Model                            Memory Consumption (MB)   Evaluation Time (URLs per sec)   Training Time (minutes)
Random Forest                    289                       942                              2.95
Long Short-Term Memory Network   0.56                      281                              238.7
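The trade-off in the table is easier to read as ratios. The short calculation below uses only the figures from the table; the helper `minutes_to_score` and the one-million-URL workload are illustrative assumptions, and the result assumes throughput stays constant.

```python
# Figures copied from the Models Comparison table.
random_forest = {"memory_mb": 289, "urls_per_sec": 942, "train_min": 2.95}
lstm = {"memory_mb": 0.56, "urls_per_sec": 281, "train_min": 238.7}

# Random Forest memory relative to LSTM memory.
memory_ratio = random_forest["memory_mb"] / lstm["memory_mb"]
# Random Forest throughput relative to LSTM throughput.
speed_ratio = random_forest["urls_per_sec"] / lstm["urls_per_sec"]
# LSTM training time relative to Random Forest training time.
training_ratio = lstm["train_min"] / random_forest["train_min"]


def minutes_to_score(num_urls, urls_per_sec):
    """Time to evaluate num_urls at a constant throughput, in minutes."""
    return num_urls / urls_per_sec / 60


rf_minutes = minutes_to_score(1_000_000, random_forest["urls_per_sec"])
lstm_minutes = minutes_to_score(1_000_000, lstm["urls_per_sec"])
```

By these figures the Random Forest needs roughly 500 times more memory, evaluates URLs a little over 3 times faster, and trains about 80 times faster than the LSTM, so the LSTM trades training cost and throughput for a much smaller footprint.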
What we learned
• The patterns in a URL alone are a good predictor of phishing websites
• The LSTM model shows higher overall prediction performance without requiring expert knowledge to create the features
Free to use
Thank you!
Any questions or comments, please let me know.
Alejandro Correa Bahnsen, PhD
Chief Data Scientist
acorrea@easysol.net
