THE FIFTH ELEPHANT
- ARJUN B.M.
MUDPIPE
Malicious URL Detection for
Phishing Identification and Prevention
PHISHING INTRODUCTION
Phishing is the fraudulent practice of sending emails purporting to be from
reputable companies in order to induce individuals to reveal personal
information, such as passwords and credit card numbers.
MOTIVES: Financial gain, reputational damage, identity theft, fame & notoriety
Indicators of a phishing website:
• Visually resembles the original website
• Email creates a sense of urgency to force user action
• Fake HTTPS certificate & domain name
• Provides attractive offers which tempt the user to respond
PROBLEM STATEMENT
For Employees:
• Accessing malicious sites after falling victim to phishing emails
• No self-service mechanism for employees to check suspicious sites
• Lack of awareness and training for employees
For Security Teams:
• Manual time & effort spent by the Security Operations team to block sites
• Lack of internal ML-driven insight into phishing data; current solutions may be rule-based
• Different teams/networks may have different requirements for site access, which cannot be
served by external commercial solutions
For Business:
• 91% of all cyber-attacks are via phishing and they have devastating consequences
• Licensing cost of commercial solutions to detect phishing sites
• Dependency on external solution/product
APPROACH & METHODOLOGY
MACHINE LEARNING APPROACH
• Data Collection & Validation
• Parameter Determination (Address Bar based Features, HTML and JavaScript based Features,
Domain based Features, Abnormal Based Features, URL Blacklist Features)
• Feature Extraction from unknown, incoming data (test data)
• Create baseline model with initial dataset
• Evaluate performance of model and fine-tune
• Apply test data to the pre-trained baseline model & make predictions
• Compare with known data sources & further fine-tune results
• Retrain model at frequent intervals for better accuracy, context and relevancy
• Classification model pickled and exposed as REST API
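The train, evaluate, pickle and predict steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the deck does not say which classifier was used, so a hand-rolled logistic regression stands in for it, and the three features and the toy rows are made up for the example.

```python
import math
import pickle

# Hypothetical toy rows: three binary features per URL
# (has_ip, has_at_symbol, long_url); label 1 = phishing, 0 = legitimate.
TRAIN_X = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0],
           [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(model, row):
    """Probability that a feature row is phishing under the given weights."""
    z = model["bias"] + sum(w * x for w, x in zip(model["weights"], row))
    return sigmoid(z)


def train(rows, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights with per-sample gradient descent."""
    model = {"weights": [0.0] * len(rows[0]), "bias": 0.0}
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = predict_proba(model, row) - label
            model["bias"] -= lr * error
            model["weights"] = [w - lr * error * x
                                for w, x in zip(model["weights"], row)]
    return model


def accuracy(model, rows, labels):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum((predict_proba(model, r) >= 0.5) == bool(y)
               for r, y in zip(rows, labels))
    return hits / len(rows)


baseline = train(TRAIN_X, TRAIN_Y)      # create the baseline model
blob = pickle.dumps(baseline)           # what would be written to disk
restored = pickle.loads(blob)           # what the REST service would load
```

In practice the evaluation would run on held-out data rather than the training rows, and retraining would repeat train() on the refreshed dataset before pickling a new model version.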
WHITELIST / BLACKLIST APPROACH
• Identify data sources which provide info on phishing sites
• Scrape data from data sources
• Create whitelist / blacklist and compare URLs
CONS
• Lack of updated data sources
• Lack of real-time intelligence
• Data not comprehensive enough
• Extensive effort for data scraping
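A minimal sketch of the list comparison, assuming the lists are keyed by hostname. The list entries and the function names are hypothetical. The "unknown" result shows where the cons above bite: a site missing from both lists gets no verdict.

```python
from urllib.parse import urlsplit

# Hypothetical lists; in practice these would be built from scraped sources.
BLACKLIST = {"login-paypa1.example", "secure-update.example"}
WHITELIST = {"intranet.example", "docs.example"}


def hostname_of(url):
    """Return the lower-cased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def list_verdict(url):
    """Classify a URL as 'deny', 'allow' or 'unknown' from the two lists."""
    host = hostname_of(url)
    if host in BLACKLIST:
        return "deny"
    if host in WHITELIST:
        return "allow"
    return "unknown"
```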
RULE BASED APPROACH
• Determine phishing indicators
• Define rules using combination of indicators
• Compare & match URLs against rules to deny/allow
CONS
• Complex rule set definitions
• Overhead in managing and updating rules
• High False Positive and False Negative rates
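A rule-based check might look like the sketch below. The four indicators, the 75-character limit and the "two rules must fire" threshold are assumptions chosen for illustration, not rules taken from the deck.

```python
import re
from urllib.parse import urlsplit

DENY_THRESHOLD = 2  # assumed: deny when at least two indicators fire


def _host(url):
    return (urlsplit(url).hostname or "").lower()


def ip_as_host(url):
    """True when the host is a dotted-quad IPv4 address."""
    return re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", _host(url)) is not None


def at_symbol(url):
    return "@" in url


def very_long(url):
    return len(url) > 75


def hyphen_in_host(url):
    return "-" in _host(url)


# Hypothetical rule set: name -> predicate returning True when suspicious.
RULES = {
    "ip_as_host": ip_as_host,
    "at_symbol": at_symbol,
    "very_long": very_long,
    "hyphen_in_host": hyphen_in_host,
}


def fired_rules(url):
    """Names of all rules that match the URL, in alphabetical order."""
    return sorted(name for name, rule in RULES.items() if rule(url))


def rule_verdict(url):
    return "deny" if len(fired_rules(url)) >= DENY_THRESHOLD else "allow"
```

A legitimate host such as my-shop.example.com already fires one rule here, which hints at why such rule sets tend toward false positives when the threshold is lowered and false negatives when it is raised.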
ARCHITECTURE / WORKFLOW
[Workflow diagram. Components: BASELINE DATA SET, TRAIN, BASELINE ML MODEL, EXPOSE AS REST API, WEB TRAFFIC (UNKNOWN DATA), INPUT, FEATURE EXTRACTION, TEST DATA, PREDICTION, OUTPUT, RETRAIN, SECURITY ACTION (BLOCK / ALLOW)]
Feature extraction:
• Address Bar based Features
• HTML and JavaScript based Features
• Domain based Features
• Abnormal Based Features
• URL Blacklist Features
• Total of 30 features
Output:
• CLASSIFICATION = 0: LEGITIMATE
• CLASSIFICATION = 1: PHISHING
• COMPARE WITH KNOWN SOURCES
• PROBABILITY OF PREDICTION
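One way to expose the prediction step as a REST API is sketched below with Python's standard http.server. The /predict path, the JSON field names, the 0.5 threshold and the stub scoring function are all assumptions. A real deployment would load the pickled model in place of the stub.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCK_THRESHOLD = 0.5  # assumed cut-off on the phishing probability


def stub_model_proba(features):
    """Stand-in for the unpickled classifier: fraction of features that fired."""
    return sum(features) / len(features) if features else 0.0


def predict_payload(payload):
    """Turn a request like {"features": [0, 1, ...]} into the response body."""
    proba = stub_model_proba(payload["features"])
    label = 1 if proba >= BLOCK_THRESHOLD else 0
    return {
        "classification": label,          # 0 = legitimate, 1 = phishing
        "probability": round(proba, 3),
        "action": "BLOCK" if label == 1 else "ALLOW",
    }


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":       # hypothetical endpoint name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = predict_payload(json.loads(self.rfile.read(length)))
            status = 200
        except (ValueError, KeyError, TypeError):
            body, status = {"error": "bad request"}, 400
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Run the service on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling serve() starts the service. Keeping predict_payload() separate from the handler makes the scoring logic testable without opening a socket.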
FEATURE EXTRACTION
1. having_IP_Address
2. URL_Length
3. Shortening_Service
4. having_At_Symbol
5. double_slash_redirecting
6. Prefix_Suffix
7. having_Sub_Domain
8. SSL_State
9. Domain_registeration_length
10. Favicon
11. Open_ports
12. HTTPS_token_in_URL
13. Request_URL
14. URL_of_Anchor
15. Links_in_tags
16. Server_Form_Handler
17. Submitting_to_email
18. Abnormal_URL
19. Site_Redirect
20. on_mouseover_changes
21. RightClick_Disabled
22. popUpWindow
23. Iframe_redirection
24. age_of_domain
25. DNS_Record
26. web_traffic_rank
27. Page_Rank
28. Google_Index
29. Links_pointing_to_page
30. Statistical_report - top phishing domains
Classification output: 0 = legitimate, 1 = phishing
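Seven of the address-bar features in the list can be derived from the URL string alone. The sketch below is one possible encoding, with 1 meaning "suspicious". The shortener list, the 75-character limit, the position test for "//" and the "more than three dots" rule are assumptions, since the slides give only the feature names.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed values, not taken from the slides.
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}
LONG_URL_CHARS = 75


def _is_ip(host):
    """True when the host parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def address_bar_features(url):
    """Return 0/1 flags (1 = suspicious) for seven address-bar features."""
    host = (urlsplit(url).hostname or "").lower()
    return {
        "having_IP_Address": int(_is_ip(host)),
        "URL_Length": int(len(url) > LONG_URL_CHARS),
        "Shortening_Service": int(host in SHORTENERS),
        "having_At_Symbol": int("@" in url),
        # A "//" beyond the scheme separator suggests an embedded redirect.
        "double_slash_redirecting": int(url.rfind("//") > 7),
        "Prefix_Suffix": int("-" in host),
        # More than three dots in the host is treated as "many sub-domains".
        "having_Sub_Domain": int(not _is_ip(host) and host.count(".") > 3),
    }
```

The remaining features (SSL state, domain age, page rank and so on) need network lookups or page content and are left out of this sketch.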
DEPLOYING TO PRODUCTION
• Context-specific use cases:
  • Certain subnets within the org might require access to certain websites to support business functionality
  • Org might want to block access to sites even though commercial software classifies them only as “suspicious”
• Infrastructure & capacity planning considerations: client, load balancer, web server, queues, etc.
• REST API approach: train, retrain & predict
• Develop automated test cases for your model (especially on the feature engineering side)
• Automate evaluation of the production model, so that changes to the model can be back-tested efficiently on historical data to determine
whether improvements have been made
• Possibly expose different ML models / endpoints for different sections of the network or for different departments
• Have a fall-back or default value for parameters which the Feature Engineering module fails to process (exception handling)
• Decouple the input and the output of the model; the model should still work if parameters are added, modified or deleted in feature engineering
• Single egress point for web traffic, where the ML model can be plugged in via the REST API
• Have a fail-open or kill-switch mechanism so that traffic flows through if model processing fails
• Place the model in “monitoring” or “non-blocking” mode initially, which lets the ML model gather additional data, allows for fine-tuning
and helps prevent errors
• Supplement with existing controls like spam filtering, blacklisting, etc.
• Model should refer to other data sources as well for fine-tuning in the initial stages
• Baseline and retrain the model at frequent intervals; also maintain model versions
• Provide security analysts with an option to tweak/edit input data for contextual representation
• Deploying the model client-side versus server-side
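Three of the points above (default values for failed features, fail-open with a kill switch, and an initial monitoring mode) can be combined in one small wrapper. This is a sketch under assumptions: the flag names, the default value of 0 and the convention that the model is a callable taking a feature dictionary are all hypothetical.

```python
import logging

log = logging.getLogger("mudpipe.sketch")  # hypothetical logger name

DEFAULT_FEATURE_VALUE = 0      # assumed neutral value for a failed feature
MONITOR_ONLY = True            # start in non-blocking mode
KILL_SWITCH = False            # set True to bypass the model entirely


def safe_features(url, extractors):
    """Run each extractor; fall back to the default if one raises."""
    features = {}
    for name, extractor in extractors.items():
        try:
            features[name] = extractor(url)
        except Exception:
            log.warning("feature %s failed for %s", name, url)
            features[name] = DEFAULT_FEATURE_VALUE
    return features


def decide(url, extractors, model,
           monitor_only=MONITOR_ONLY, kill_switch=KILL_SWITCH):
    """Return 'ALLOW' or 'BLOCK', failing open on any model error."""
    if kill_switch:
        return "ALLOW"
    try:
        is_phishing = model(safe_features(url, extractors)) == 1
    except Exception:
        log.exception("model failed for %s; failing open", url)
        return "ALLOW"
    if is_phishing and monitor_only:
        # Record what would have happened, but let the traffic through.
        log.info("would block %s (monitoring mode)", url)
        return "ALLOW"
    return "BLOCK" if is_phishing else "ALLOW"
```

Passing features by name rather than by position is one way to keep the model input decoupled from changes in feature engineering.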
PROS & CONS
PROS
• Reduce dependency on, and cost & licensing of, third-party external software
• Re-use of the org’s in-house data rather than contributing towards improving commercial software
• Better insights into online behavior of employees
• Real-time protection for employees who access malicious websites or click on phishing links
• Detect and prevent unknown phishing attacks, as new patterns are created by attackers
• Next level of intelligence on top of signature-based prevention techniques & blacklists
• Email filtering solutions help in filtering phishing/spam emails, but this provides holistic protection for all outgoing internet traffic
• Centralized solution implemented org-wide, with no dependency on client-side agents/software
• Anti-phishing: move from offline to real-time; move from reactive to proactive
CONS
• Data collection & building data repository
• Initial baseline dataset has too few records
• Cost / maintenance of solution/product
• Fine-tuning of rules & predictions to meet changing threat vectors
• False positive rate could cause bad user experience
• Needs to be supplemented with Cyber Threat Intel
• Solution works only when users are connected to the org network, since there is no client-side agent
AUDIENCE TAKE-AWAYS
• Opportunity for engineers and analysts to collaborate on building
tailored, intelligent security solutions / products
• Learn the various considerations in designing and deploying an
ML solution in the InfoSec domain
EMAIL : arjun.job14@gmail.com
LINKEDIN: https://www.linkedin.com/in/arjunbm
FURTHER READING
LINKS & REFERENCES
• https://www.researchgate.net/publication/226420039_Detection_of_Phishing_Attacks_A_Machine_Learning_Approach
• https://ieeexplore.ieee.org/document/8004877
• https://pdfs.semanticscholar.org/188f/3bde688d5a47ce86bc0a8eca03aeb1bb9dfc.pdf
