Malware Detection using Machine Learning & Deep Learning
Minh Đức + Đình Phúc
CyRadar Team at SBC 2019
Disclaimer: This topic is about Machine Learning & Deep Learning
Contents
1. Reality
2. In Research
3. In CyRadar
4. Demo
5. Conclusion
6. Q&A
1. Reality
Over 350,000 new malware samples per day
Malware
- is a very big threat in today's computing world.
- continues to grow in volume and evolve in complexity.
- There are many malware generators.
- The number of websites distributing malware is increasing at an alarming rate and is getting out of control.
Malware detection
- Signature-based: code, hash, behavior, rules, ...
Advantages:
• High accuracy.
Disadvantages:
• Unable to detect new malware.
• Easy to bypass.
• Requires frequent database updates.
• Relies on human expertise to create the signatures.
*A Theoretical Feature-wise Study of Malware Detection Techniques
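As a rough illustration of why hash signatures miss new malware, the sketch below checks files against a set of known SHA-256 digests. The sample bytes and the `signature_db` set are hypothetical stand-ins, not data from the talk.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical bytes standing in for a known malicious file.
known_sample = b"MZ\x90\x00 placeholder payload"
signature_db = {sha256_hex(known_sample)}  # the "signature database"

def is_known_malware(data: bytes) -> bool:
    """Flag a file only if its exact hash is already in the database."""
    return sha256_hex(data) in signature_db

# The exact known sample is detected.
detected_original = is_known_malware(known_sample)
# A variant with one extra byte has a different hash and is missed.
detected_variant = is_known_malware(known_sample + b"\x00")
print(detected_original, detected_variant)
```

A single changed byte is enough to evade an exact-hash signature, which is one reading of "easy to bypass" and "requires frequent database updates".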
2. In Research
1. Malware Detection using Machine Learning and Deep Learning | Hemant Rathore, Swati Agarwal, Sanjay K. Sahay and Mohit Sewak | BITS, Pilani, Dept. of CS & IS, Goa Campus, Goa, India | 4 Apr 2019
2. Malware Detection using Windows Api Sequence and Machine Learning | Chandrasekar Ravi, R Manoharan | Department of Computer Science and Engineering, Pondicherry Engineering College, Pillaichavady, Puducherry - 605014, India | April 2012
3. DeepSign: Deep Learning for Automatic Malware Signature Generation and Classification | Eli (Omid) David | Dept. of Computer Science, Bar-Ilan University | 23 Nov 2017
4. DeepAM: a heterogeneous deep learning framework for intelligent malware detection | Yanfang Ye, Lingwei Chen, Shifu Hou, William Hardy, Xin Li | 12 May 2016
5. Behavior-based features model for malware detection | Hisham Shehata Galal, Yousef Bassyouni Mahdy, Mohammed Ali Atiea | 12 December 2014
6. A Fast Malware Detection Algorithm Based on Objective-Oriented Association Mining | Yuxin Ding, Xuebing Yuan, Ke Tang, Xiao Xiao, Yibin Zhang | 19 January 2013
Machine learning principle
Training phase: Benign/malware → Extract features → Training → Predictive model
Detection phase: Unknown → Extract features → Predictive model → Model decision
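The two phases can be sketched with a deliberately simple stand-in model. The nearest-centroid classifier below is not one of the models from the talk; it is only the shortest way to show "train on labelled features, then decide on an unknown file". The feature values are hypothetical.

```python
from statistics import mean

def train(features, labels):
    """Training phase: compute one mean feature vector (centroid) per label."""
    model = {}
    for label in set(labels):
        rows = [f for f, lab in zip(features, labels) if lab == label]
        model[label] = [mean(col) for col in zip(*rows)]
    return model

def predict(model, feature_vector):
    """Detection phase: return the label whose centroid is closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], feature_vector))

# Hypothetical two-feature vectors (for example two opcode frequencies).
train_x = [[0.10, 0.80], [0.15, 0.75], [0.70, 0.20], [0.65, 0.30]]
train_y = ["benign", "benign", "malware", "malware"]

model = train(train_x, train_y)
decision = predict(model, [0.68, 0.25])  # features of an unknown file
print(decision)
```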
Dataset:
• VirusTotal.
• Windows API library.
• VxHeavens website.
• Malicia project.
• ...
Drawback: the available datasets are small and outdated.
Malware detection: features
Static analysis:
• Raw bytes
• Strings
• Header
• Metadata
• Entropy
• Opcodes
• ...
Dynamic analysis:
• API calls
• Resource usage
• Ports
• Host
• Arguments
• ...
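Of the static features listed, entropy is the easiest to show in a few lines. The sketch below computes Shannon entropy in bits per byte; the function name and the sample inputs are illustrative assumptions, not the speakers' code.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant input, at most 8.0."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum of p * log2(1/p) over the byte values that occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

low = byte_entropy(b"\x00" * 1024)          # constant padding
high = byte_entropy(bytes(range(256)) * 4)  # every byte value equally often
print(low, high)
```

Compressed or encrypted content tends to sit near the upper end of this range, which is why entropy is commonly used as a hint that a file may be packed.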
Malware detection: advantages
Static analysis:
• Allows malicious files to be detected prior to execution.
• Easy to run.
• Fast identification.
Dynamic analysis:
• Detects previously unseen types of malware attacks.
• Detects polymorphic malware.
Malware detection: disadvantages
Static analysis:
• Fails to detect polymorphic malware.
• Needs a separate model per sub-type.
• Fooled by encryption, fileless malware, ...
Dynamic analysis:
• Hard to extract features.
• Storage complexity for behavioral patterns.
• Time complexity.
Algorithms:
• Supervised learning:
• Decision tree.
• Random forest.
• Logistic Regression.
• SVM.
• Deep Learning
• ...
• Unsupervised learning:
• KNN
• Many algorithms achieve good results (> 90%).
• Random forest has the best results.
The AV industry
1. CrowdStrike
2. Cylance
3. Endgame
4. MAX
5. Trapmine
6. SentinelOne
7. Sophos ML
3. In CyRadar
Opcode frequency models
• Binary classification problem
• Static analysis
PE32 files → opcode frequency vectors, one row per file:
push   xor    ...  call   jm
0.125  0.23   ...  0.345  0.098
0.071  0.123  ...  0.32   0.22
An opcode is the portion of a machine language instruction that specifies what operation is to be performed by the central processing unit (CPU).
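To make "opcode frequency" concrete, the sketch below turns a list of instruction mnemonics into relative frequencies. It assumes a disassembler has already produced one mnemonic per instruction; the listing itself is hypothetical.

```python
from collections import Counter

def opcode_frequencies(mnemonics):
    """Map each opcode mnemonic to its share of all instructions in the file."""
    counts = Counter(mnemonics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}

# Hypothetical disassembly output: one mnemonic per instruction.
listing = ["push", "mov", "xor", "call", "mov", "push", "mov", "ret"]
freqs = opcode_frequencies(listing)
print(freqs["mov"], freqs["push"])
```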
Step 1: Collect data:
• Benign: download pages, Windows system files.
• Malware: VirusTotal.
Step 1: Collect data.
Step 2: Data cleaning:
1. Remove duplicated files.
2. Verify with VirusTotal's API.

Source                      Number of files
Crawl from download pages   10899
Windows 7                   4804
Windows 8                   7768
Windows 10                  8394
VirusTotal                  44984

Benign   Malware
31865    44984
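One plausible way to remove duplicated files is to compare content hashes. The slides do not say how duplicates were detected, so the SHA-256 approach and the file names below are assumptions. The VirusTotal verification step is not sketched because it depends on an API key and service details not given in the talk.

```python
import hashlib

def deduplicate(files):
    """Keep the first file seen for each distinct SHA-256 content hash."""
    seen = set()
    unique = []
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, content))
    return unique

# Hypothetical (name, bytes) pairs; two of them have identical content.
collected = [("a.exe", b"AAAA"), ("copy_of_a.exe", b"AAAA"), ("b.exe", b"BBBB")]
unique_files = deduplicate(collected)
print([name for name, _ in unique_files])
```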
Step 1: Collect data.
Step 2: Data cleaning.
Step 3: Extract features:
1. Disassemble files.
2. Calculate opcode frequencies.
• Feature matrix: 63730 files × 1230 opcodes
mov  push  ...  xor  and
120  150   ...  100  30
065  12    ...  239  123
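The feature matrix has one row per file and one column per opcode seen anywhere in the dataset. A minimal sketch, assuming each file has already been disassembled into a list of mnemonics (the two listings are hypothetical):

```python
from collections import Counter

def build_feature_matrix(listings):
    """Return (columns, rows): one row of opcode counts per file."""
    # The column set is the union of all opcodes across all files.
    columns = sorted({op for listing in listings for op in listing})
    rows = []
    for listing in listings:
        counts = Counter(listing)
        rows.append([counts[op] for op in columns])  # absent opcodes count as 0
    return columns, rows

# Hypothetical mnemonic listings for two files.
listings = [
    ["mov", "push", "mov", "xor"],
    ["push", "and", "push"],
]
columns, matrix = build_feature_matrix(listings)
print(columns)
print(matrix)
```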
Step 1: Collect data.
Step 2: Preprocessing.
Step 3: Extract features.
Step 4: Data preprocessing:
1. Variance threshold (0.1).
2. Remove NaNs.
• Feature matrix: 50388 files × 681 opcodes
• Reduces the number of features by ~45% (1230 → 681)
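A variance threshold drops columns that barely change across files. The sketch below assumes population variance and a strict "greater than" comparison; the slides give only the threshold value (0.1), and the small matrix is hypothetical.

```python
from statistics import pvariance

def variance_threshold(rows, threshold=0.1):
    """Return indices of columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]

def select_columns(rows, keep):
    """Keep only the listed column indices in every row."""
    return [[row[i] for i in keep] for row in rows]

# Hypothetical 4-file x 3-opcode matrix; the middle column is constant
# and the last column varies only slightly.
matrix = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.0],
    [4.0, 5.0, 0.1],
]
kept = variance_threshold(matrix, threshold=0.1)
reduced = select_columns(matrix, kept)
print(kept, reduced)
```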
Step 1: Collect data.
Step 2: Preprocessing.
Step 3: Extract features.
Step 4: Data preprocessing:
1. Variance threshold (0.1)
2. Calculate opcode percentage.
3. Remove NaNs.
4. Standardize features.
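The percentage, NaN-removal, and standardization steps might look like the sketch below. It assumes NaNs arise from files with no counted opcodes (a 0/0 division) and that standardization means zero mean and unit variance per column; neither detail is stated on the slide, and the counts are hypothetical.

```python
from statistics import mean, pstdev

def to_percentages(rows):
    """Convert each row of opcode counts to percentages of that file's total."""
    result = []
    for row in rows:
        total = sum(row)
        # A file with no counted opcodes would give 0/0 (NaN), so it is skipped.
        if total:
            result.append([100.0 * v / total for v in row])
    return result

def standardize(rows):
    """Scale each column to zero mean and unit variance (constant columns become 0)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

counts = [[120, 150, 30], [0, 0, 0], [60, 20, 120]]  # hypothetical counts
percent = to_percentages(counts)  # the all-zero row is dropped
scaled = standardize(percent)
print(percent)
print(scaled)
```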
Step 1: Collect data.
Step 2: Preprocessing.
Step 3: Extract features.
Step 4: Dimension reduction.
Step 5: Training:
1. Split train-test data:
• Train: (45349, 681)
• Test : (5039, 681)
2. Try several algorithms:
• Random forest
• SVM
• Linear regression
• Neural network (9 layers)
[Figures: Random forest; Neural network]
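The train and test shapes are consistent with holding out about 10% of the 50388 files. The sketch below reproduces those sizes; the 0.1 fraction, the rounding up, and the seed are assumptions, since the slide reports only the resulting shapes.

```python
import math
import random

def train_test_split_indices(n_samples, test_fraction=0.1, seed=0):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(50388, test_fraction=0.1)
print(len(train_idx), len(test_idx))
```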
Step 1: Collect data.
Step 2: Preprocessing.
Step 3: Extract features.
Step 4: Dimension reduction.
Step 5: Training.
Step 6: Evaluate models:
• Test set: ~5000 files:
• ~2900 malware
• ~2100 benign

Algorithm                          Precision          Recall
Random Forest (Machine Learning)   98% (1996/2037)    97% (2037/2100)
Deep learning (DNN, 9 layers)      96% (1955/2037)    97% (2037/2100)
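The percentages in the table follow from the printed ratios, as the sketch below checks. The slide does not say how its counts map onto true positives, false positives, and false negatives, so the `precision` and `recall` helpers are shown with separate, hypothetical counts purely to recall the standard definitions.

```python
def ratio_percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)

def precision(tp, fp):
    """Share of positive predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that are found: TP / (TP + FN)."""
    return tp / (tp + fn)

# Ratios exactly as printed on the slide.
rf_precision = ratio_percent(1996, 2037)
dnn_precision = ratio_percent(1955, 2037)
shared_recall = ratio_percent(2037, 2100)
print(rf_precision, dnn_precision, shared_recall)

# Hypothetical confusion counts, only to show the standard definitions.
print(precision(tp=90, fp=10), recall(tp=90, fn=30))
```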
4. Demo
5. Conclusion
1. Malware continues to grow in volume and evolve in complexity.
2. Traditional approaches are less effective at detecting new malware.
3. A lot of research uses ML & DL to detect malware.
4. Industry is trying to apply these techniques in real-world products.
[Diagram labels: Internet Shield; Advanced Threat Detection; Web; Email; DNS; EDR; EDR integrated to Threat Intelligence Platform]
6. Q&A
