SlideShare una empresa de Scribd logo
1 de 37
Descargar para leer sin conexión
FFRI, Inc.

@PacSec 2013

Fighting advanced malware
using machine learning
Fourteenforty Research Institute, Inc.
Junichi Murakami
Director of Advanced Development
FFRI, Inc.
http://www.ffri.jp
FFRI, Inc.

Who am I ?
• Security Researcher at FFRI, Inc
– Malware and vulnerabilities analysis with RCE
– Both Windows and Linux kernel development

• Speaker at
– BlackHat USA/JP, RSA, PacSec, AVAR, etc.
• *NOT* a MS/PhD degree in CS/Math
– Just a user of machine learning

2
FFRI, Inc.

The Data Science Venn Diagram (in security)

http://www.niemanlab.org/images/drew-conway-data-science-venn-diagram.jpg
3
FFRI, Inc.

I am, and this talk is

http://www.niemanlab.org/images/drew-conway-data-science-venn-diagram.jpg
4
FFRI, Inc.

Agenda
• Background
• Our approach
• Future works
– Computers vs. Man
– Applying to real time protection

• Conclusion

5
FFRI, Inc.

Background – malware and its detection
Generators

Obfuscators

Increasing
malware

Limitation of
pattern matching

Targeted Attack
(Unknown malware)

other methods
Reputation

Bigdata

Heuristic

Behavioral

Machine learning
6
FFRI, Inc.

Limitation of signature matching
• Evaluated 11 AV-product’s TRP using Metascan
• Used fresh malware (not wildlist malware)
• Prepared 2 test sets from different sources and period
– test-1: 1,000 samples
Pattern
– test-2: 900 samples
Reputation
Huer.

Bhvr.

ML
7
test-1
test-2
test-1
test-2
test-1
test-2
test-1
test-2
test-1
test-2
test-1
test-2
test-1
test-2

A
B
C
D
E
F
G
H
I
J

test-1
test-2

test-1
test-2

TPR

test-1
test-2

test-1
test-2

FFRI, Inc.

Limitation of signature matching
FNR

100%

80%

60%

40%
Avg.

20%

0%

K
8
FFRI, Inc.

Advantage of (cloud) reputation
• Concept is the same with signature-matching(Blacklisting)
• Endpoints don’t have to keep HEAVY patterns anymore
• Easy to reflect a new pattern to the others
2. check

1. query

3. response
Pattern

Reputation
Huer.

Bhvr.

ML
9
FFRI, Inc.

Disdvantage of (cloud) reputation
• “Detectable” means that someone is already attacked
• What if you are the first victim?
• How much effective against “Targeted Attack” it is?
2. check

1. query

3. response
Pattern

Reputation
Huer.

Bhvr.

ML
10
FFRI, Inc.

Advantage of heuristic/behavioral detection
• Based on pre-determined characteristics or behaviors
– OpenProcess -> WriteProcessMemory -> CreateRemoteThread
– Registering itself to auto start extensibility points
– Disabling Windows Firewall, etc.

• Providing generic logics to detect malware
• Signature-free
(eliminate regular scanning and update)

Pattern

Reputation
Huer.

Bhvr.

ML
11
FFRI, Inc.

Disadvantage of heuristic/behavioral detection
• Difficult to avoid false positives completely
• Or ask user to determine if an action would be allow or not
(User dependent)

• Have to analyze malware and update logics continuously
(Not human task, more suitable for computers)
Pattern

Reputation
Huer.

Bhvr.

ML
12
FFRI, Inc.

Heuristic Behavioral Test - July 2012
• AV-comparatives publishes H/B Test results since 2012
• The behavioral hardly contributes to detect (avg: 4.8%)

http://chart.av-comparatives.org/chart1.php
13
FFRI, Inc.

Our approach
• Behavioral-based detection powered by machine learning
• Not new but more practical for the industry
• Easy to try and automate using open source below
– Cuckoo Sandbox
– Jubatus

Pattern

Reputation
Huer.

Bhvr.

ML
14
FFRI, Inc.

Machine learning-based detection
•
•
•
•

Pattern

Most research is doing in academic
Basically, it is a classification problem (task)
Mainly focusing on a combination of the factors below
Some good results are reported

Reputation

Bhvr.

Huer.

ML

Features

Algorithms

Evaluation

Static information

SVM

TPR/FRP, etc.

Dynamic information

Naive bayes

Accuracy, Precision

Hybrid

Perceptron, etc.

ROC-curve, etc.

15
FFRI, Inc.

Overview
original data set

malware
Dynamic analysis
Cuckoo
Sandbox

benign
training set

feature
selection

test set

convert to
Feature
Vector

train

test

Jubatus

Jubatus

TPR: X%
FPR: Y%
16
FFRI, Inc.

How many samples should we use?
• It is a “Confidence Interval” theory

• It depends on how margin of error we accept
• All of these below hit rates are 1%
– 1/100 (N=100)
– 10/1,000 (N=1,000)
– 100/10,000 (N=10,000)
• Each confidences for determining to “1%” are different
– Each of them has different error
17
FFRI, Inc.

How many samples should we use?
Hit Rate containing errors (95%
confidence)

• We can calculate estimation of margin of error based on N
6.0%
5.0%
4.0%
3.0%
2.0%
1.0%
0.0%
100

1,000

10,000

100,000

1,000,000

(N)

18
FFRI, Inc.

Malware and benign files
• Randomly sampling from files collected by ourselves
– Malware: 15,000(5,000 = training, 10,000 = testing)
– Benign: 15,000 (5,000 = training, 10,000 = testing)
• *Random* is very important
– Different period (choose 15,000/N sample from every day)
– Different sources
– Never care about filetype or malware type

19
FFRI, Inc.

Cuckoo Sandbox - http://www.cuckoosandbox.org/
• Open source automated malware analysis system
– Sending malware into a virtual machine from a host
– Executing the malware inside the virtual machine
– Monitoring and saving its behaviors at runtime
• API calls, network activity, VT results, etc.

20
FFRI, Inc.

API calls

21
FFRI, Inc.

Trends of API calls

The Number of samples
which finished calling API

• Most of the samples finished calling API within 1s
(or keep calling APIs) -> Used API only called within 5s
malware

goodware

10000
8000
6000
4000
2000
0
1s

5s

10s

15s 30s 45s
Elapsed time

60s 60s+

22
FFRI, Inc.

Jubatus - http://jubat.us/en/
• Distributed Online Machine Learning Framework
– Distributed: Scalable
– Online: Not batch, continuous learning
• Open source, LGPL v2.1. Latest release is 0.4.5(22/07/2013)

• Developed by Preferred Infrastructure, Inc. and NTT Software
Innovation Center
• Support various machine learning
– Classification, Regression, Recommendation, Anomaly Detection
• Easy to use (many language bindings, feature converter, etc)
23
FFRI, Inc.

Feature selection and convert to FV
NtOpenFile

Call sequence

NtWriteFile
NtWriteFile

NtOpenFile
NtWriteFile
NtWriteFile

NtOpenFileNtWriteFileNtWriteFile
NtWriteFileNtWriteFileNtOpenFile

NtWriteFileNtOpenFileNtWriteFile

.
.
.
.
.

NtCreateProcess

24
FFRI, Inc.

Feature selection and convert to FV

NtOpenFile NtWriteFile Nt
NtOpenFileNtWriteFileNtWriteFile
malware
malware
malware
label:malware

train
Jubatus

NtCreateProcess NtClose
NtCreateProcess NtClose
NtCreateProcessNtCloseNtClose
benign
NtClose
benign
NtClose
benign
label:benign

25
FFRI, Inc.

(Image) Training internal
Calc and update each FV’s weight based on its freq. and label
(for the detail is dependent on the algorithm called AROW, don’t ask me :-)

malware

benign

0.0
0.25

0.75

NtClose NtClose NtClose

0.75

NtWriteFile LdrLoadDll NtDelayExecution

0.25

0.75

NtCreateProcess NtClose NtClose

0.25

0.5

NtAllocateVirtualMemory ... ...

0.5
26
FFRI, Inc.

Testing results
• N of N-gram
– 3~5-gram > 2-gram = 6-gram
• Best: 3-gram
– TRP: 72.33% [71.58 ~ 73.07 % (95% confidence)]
– FPR: 0.77% [0.60 ~ 0.99% (95% confidence)]
• The result above is an example
– A lot of combination of features are available
(we used only “API-name” and its sequence)
27
FFRI, Inc.

Demo

28
FFRI, Inc.

Future Works

29
FFRI, Inc.

Dumping training model
• http://blog.jubat.us/2013/06/classifier.html (Japanese only)

Investigating weight parameters of classifier

jubalocal_storage_dump.cpp
https://gist.github.com/t-abe/5746333
30
FFRI, Inc.

Indicators of malware likeness in API 3-gram
[foo@nolife classifier]$ ./dump --input model --label “malware”
0.181128
api_call$VirtualProtectEx_VirtualProtectEx_VirtualProtectEx@space#log_tf/bin
0.142254
api_call$RegOpenKeyExA_NtOpenKey_NtOpenKey@space#log_tf/bin
0.137144
api_call$NtReadFile_NtReadFile_NtFreeVirtualMemory@space#log_tf/bin
0.134443
api_call$LdrLoadDll_LdrGetProcedureAddress_VirtualProtectEx@space#log_tf/bin
0.130287
api_call$LdrLoadDll_RegOpenKeyExA_NtOpenKey@space#log_tf/bin
0.130287
api_call$DeviceIoControl_LdrLoadDll_RegOpenKeyExA@space#log_tf/bin
0.122363
api_call$VirtualProtectEx_LdrLoadDll_LdrGetProcedureAddress@space#log_tf/bin
0.102545
api_call$NtFreeVirtualMemory_LdrGetDllHandle_NtCreateFile@space#log_tf/bin
0.102485
api_call$RegCloseKey_RegCloseKey_RegCloseKey@space#log_tf/bin
0.0983165
api_call$NtReadFile_NtFreeVirtualMemory_LdrLoadDll@space#log_tf/bin
0.0966545
api_call$NtSetInformationFile_NtReadFile_NtFreeVirtualMemory@space#log_tf/bin
0.094639
api_call$NtMapViewOfSection_NtFreeVirtualMemory_NtOpenKey@space#log_tf/bin
0.0933827
api_call$NtFreeVirtualMemory_LdrLoadDll_LdrGetProcedureAddress@space#log_tf/bin
0.0905402
api_call$DeviceIoControl_DeviceIoControl_NtWriteFile@space#log_tf/bin
0.0903766
api_call$DeviceIoControl_NtWriteFile_NtWriteFile@space#log_tf/bin
0.0884724
api_call$RegOpenKeyExW_RegOpenKeyExW_LdrGetDllHandle@space#log_tf/bin
0.0853282
api_call$LdrLoadDll_LdrLoadDll_LdrLoadDll@space#log_tf/bin
...

31
FFRI, Inc.

Indicators of goodware likeness in API 3-gram
[foo@nolife classifier]$ ./dump --input model --label “goodware”
0.268353
api_call$LdrGetDllHandle_LdrGetDllHandle_ExitProcess@space#log_tf/bin
0.268353
api_call$LdrGetDllHandle_ExitProcess_NtTerminateProcess@space#log_tf/bin
0.259838
api_call$NtWriteFile_LdrGetDllHandle_LdrGetDllHandle@space#log_tf/bin
0.25887 api_call$NtWriteFile_NtWriteFile_LdrGetDllHandle@space#log_tf/bin
0.135514
api_call$NtOpenFile_NtOpenFile_NtCreateFile@space#log_tf/bin
0.122445
api_call$DeviceIoControl_LdrLoadDll_LdrGetProcedureAddress@space#log_tf/bin
0.12242 api_call$DeviceIoControl_DeviceIoControl_LdrGetDllHandle@space#log_tf/bin
0.119231
api_call$GetSystemMetrics_LdrLoadDll_NtCreateMutant@space#log_tf/bin
0.115319
api_call$DeviceIoControl_LdrGetDllHandle_LdrGetProcedureAddress@space#log_tf/bin
0.109306
api_call$LdrGetProcedureAddress_NtOpenKey_LdrLoadDll@space#log_tf/bin
0.105579
api_call$NtReadFile_NtReadFile_NtReadFile@space#log_tf/bin
0.104565
api_call$NtCreateFile_NtCreateFile_NtWriteFile@space#log_tf/bin
0.103304
api_call$RegOpenKeyExA_LdrGetDllHandle_LdrGetProcedureAddress@space#log_tf/bin
0.10306
api_call$VirtualProtectEx_RegOpenKeyExA_LdrGetDllHandle@space#log_tf/bin
0.100701
api_call$NtFreeVirtualMemory_NtFreeVirtualMemory_GetSystemMetrics@space#log_tf/bin
...

32
FFRI, Inc.

Computer vs. Man
• “VirtualProtectEx_VirtualProtectEx_VirtualProtectEx”
looks like to related to malware
• How about “RegOpenKeyExA_NtOpenKey_NtOpenKey”?

• Computers might recognize indicators which human can’t
(Extremely strong left-brain player)
• Why don’t we cooperate with machine?

33
FFRI, Inc.

Using computers
• Generating models using computers
• Checking them and guessing new logics by human
(Using our right-brain)

• ML-based detection is sometimes difficult to control
– Cannot specify strict conditions to detect

“ It is detected because ML said so ! ”
• Hybrid of ML-based and Logic-based would be powerful

34
FFRI, Inc.

Applying to real time protection
• Using static information as feature
– We can check a file before its execution
– The performance is dependent on features
• Using dynamic information as feature
– Malware is already executed
– Sometimes, detections would be too late
– The hybrid detection above might be also useful
in this perspective

35
FFRI, Inc.

Conclusion
• Traditional pattern-matching reaches its limit
• Current behavioral detections hardly contributes to detect
• By applying ML to behavioral detections
– TPR is improved
– Computers recognize new features which human can’t
– We should make use of them

36
FFRI, Inc.

Thank you!

Contact: research-feedback@ffri.jp
Twitter: @FFRI_Research
37

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

The VTC experience
The VTC experienceThe VTC experience
The VTC experience
 
Nguyen Huu Trung - Building a web vulnerability scanner - From a hacker’s view
Nguyen Huu Trung - Building a web vulnerability scanner - From a hacker’s viewNguyen Huu Trung - Building a web vulnerability scanner - From a hacker’s view
Nguyen Huu Trung - Building a web vulnerability scanner - From a hacker’s view
 
The Difference between Track and Testing Performance
The Difference between Track and Testing PerformanceThe Difference between Track and Testing Performance
The Difference between Track and Testing Performance
 
Active Testing
Active TestingActive Testing
Active Testing
 
Testing Heuristic Detections
Testing Heuristic DetectionsTesting Heuristic Detections
Testing Heuristic Detections
 
Sorting Out The Trash
Sorting Out The TrashSorting Out The Trash
Sorting Out The Trash
 
Whittaker How To Break Software Security - SoftTest Ireland
Whittaker How To Break Software Security - SoftTest IrelandWhittaker How To Break Software Security - SoftTest Ireland
Whittaker How To Break Software Security - SoftTest Ireland
 
Building & Leveraging White Database for Antivirus Testing
Building & Leveraging White Database for Antivirus TestingBuilding & Leveraging White Database for Antivirus Testing
Building & Leveraging White Database for Antivirus Testing
 
Web application Testing
Web application TestingWeb application Testing
Web application Testing
 
Introduction to penetration testing
Introduction to penetration testingIntroduction to penetration testing
Introduction to penetration testing
 
Malenko Аndrii "Security for AI"
Malenko Аndrii "Security for AI"Malenko Аndrii "Security for AI"
Malenko Аndrii "Security for AI"
 
What Every Developer And Tester Should Know About Software Security
What Every Developer And Tester Should Know About Software SecurityWhat Every Developer And Tester Should Know About Software Security
What Every Developer And Tester Should Know About Software Security
 
An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)
An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)
An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)
 
Maintaining a Malware Collection
Maintaining a Malware CollectionMaintaining a Malware Collection
Maintaining a Malware Collection
 
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
 
Fighting Software Inefficiency Through Automated Bug Detection
 Fighting Software Inefficiency Through Automated Bug Detection Fighting Software Inefficiency Through Automated Bug Detection
Fighting Software Inefficiency Through Automated Bug Detection
 
Reverse Engineering - Protecting and Breaking the Software
Reverse Engineering - Protecting and Breaking the SoftwareReverse Engineering - Protecting and Breaking the Software
Reverse Engineering - Protecting and Breaking the Software
 
Penetration testing tools and phases
Penetration testing tools and phasesPenetration testing tools and phases
Penetration testing tools and phases
 
Penetration testing in wireless network
Penetration testing in wireless networkPenetration testing in wireless network
Penetration testing in wireless network
 
Compliance in the 2016 Future of Open Source
Compliance in the 2016 Future of Open SourceCompliance in the 2016 Future of Open Source
Compliance in the 2016 Future of Open Source
 

Similar a Fighting advanced malware using machine learning (English)

Mr201306 machine learning for computer security
Mr201306 machine learning for computer securityMr201306 machine learning for computer security
Mr201306 machine learning for computer security
FFRI, Inc.
 
Matt Eakin - The New Tester Skillset
Matt Eakin - The New Tester SkillsetMatt Eakin - The New Tester Skillset
Matt Eakin - The New Tester Skillset
QA or the Highway
 
Feb 2013Lesson 38 Software Acquisition Development
Feb 2013Lesson 38 Software Acquisition DevelopmentFeb 2013Lesson 38 Software Acquisition Development
Feb 2013Lesson 38 Software Acquisition Development
Barb Tillich
 

Similar a Fighting advanced malware using machine learning (English) (20)

Mr201306 machine learning for computer security
Mr201306 machine learning for computer securityMr201306 machine learning for computer security
Mr201306 machine learning for computer security
 
Ask me anything: A Conversational Interface to Augment Information Security w...
Ask me anything:A Conversational Interface to Augment Information Security w...Ask me anything:A Conversational Interface to Augment Information Security w...
Ask me anything: A Conversational Interface to Augment Information Security w...
 
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
 
Improving accuracy of malware detection by filtering evaluation dataset based...
Improving accuracy of malware detection by filtering evaluation dataset based...Improving accuracy of malware detection by filtering evaluation dataset based...
Improving accuracy of malware detection by filtering evaluation dataset based...
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)
 
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stack
 
Top 10 Software to Detect & Prevent Security Vulnerabilities from BlackHat US...
Top 10 Software to Detect & Prevent Security Vulnerabilities from BlackHat US...Top 10 Software to Detect & Prevent Security Vulnerabilities from BlackHat US...
Top 10 Software to Detect & Prevent Security Vulnerabilities from BlackHat US...
 
Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)
Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)
Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)
 
Software Security Assurance for DevOps
Software Security Assurance for DevOpsSoftware Security Assurance for DevOps
Software Security Assurance for DevOps
 
Software Security Assurance for Devops
Software Security Assurance for DevopsSoftware Security Assurance for Devops
Software Security Assurance for Devops
 
Machine programming
Machine programmingMachine programming
Machine programming
 
Matt Eakin - The New Tester Skillset
Matt Eakin - The New Tester SkillsetMatt Eakin - The New Tester Skillset
Matt Eakin - The New Tester Skillset
 
LJC-Unconference-2023-Keynote.pdf
LJC-Unconference-2023-Keynote.pdfLJC-Unconference-2023-Keynote.pdf
LJC-Unconference-2023-Keynote.pdf
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
 
Open-Source Formative Evaluation Process in Remote Software Maintenance
Open-Source Formative Evaluation Process in Remote Software Maintenance   Open-Source Formative Evaluation Process in Remote Software Maintenance
Open-Source Formative Evaluation Process in Remote Software Maintenance
 
Web Hacking With Burp Suite 101
Web Hacking With Burp Suite 101Web Hacking With Burp Suite 101
Web Hacking With Burp Suite 101
 
Feb 2013Lesson 38 Software Acquisition Development
Feb 2013Lesson 38 Software Acquisition DevelopmentFeb 2013Lesson 38 Software Acquisition Development
Feb 2013Lesson 38 Software Acquisition Development
 
8 Usability Lessons from the UPA Conference by Mark Alves
8 Usability Lessons from the UPA Conference by Mark Alves8 Usability Lessons from the UPA Conference by Mark Alves
8 Usability Lessons from the UPA Conference by Mark Alves
 
Penetration testing dont just leave it to chance
Penetration testing dont just leave it to chancePenetration testing dont just leave it to chance
Penetration testing dont just leave it to chance
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 

Más de FFRI, Inc.

Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMAppearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
FFRI, Inc.
 
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMAppearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
FFRI, Inc.
 
Malwarem armed with PowerShell
Malwarem armed with PowerShellMalwarem armed with PowerShell
Malwarem armed with PowerShell
FFRI, Inc.
 

Más de FFRI, Inc. (20)

Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMAppearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
 
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMAppearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
 
TrustZone use case and trend (FFRI Monthly Research Mar 2017)
TrustZone use case and trend (FFRI Monthly Research Mar 2017) TrustZone use case and trend (FFRI Monthly Research Mar 2017)
TrustZone use case and trend (FFRI Monthly Research Mar 2017)
 
Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...
Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...
Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...
 
An Overview of the Android Things Security (FFRI Monthly Research Jan 2017)
An Overview of the Android Things Security (FFRI Monthly Research Jan 2017) An Overview of the Android Things Security (FFRI Monthly Research Jan 2017)
An Overview of the Android Things Security (FFRI Monthly Research Jan 2017)
 
Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016)
Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016) Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016)
Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016)
 
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
 
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
 
Black Hat USA 2016 Survey Report (FFRI Monthly Research 2016.8)
Black Hat USA 2016  Survey Report (FFRI Monthly Research 2016.8)Black Hat USA 2016  Survey Report (FFRI Monthly Research 2016.8)
Black Hat USA 2016 Survey Report (FFRI Monthly Research 2016.8)
 
About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7)
About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7) About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7)
About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7)
 
Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)
Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)
Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)
 
Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)
Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)
Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)
 
ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...
ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...
ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...
 
CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)
CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)
CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)
 
Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...
Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...
Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...
 
A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)
A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)
A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)
 
Security of Windows 10 IoT Core(FFRI Monthly Research 201506)
Security of Windows 10 IoT Core(FFRI Monthly Research 201506)Security of Windows 10 IoT Core(FFRI Monthly Research 201506)
Security of Windows 10 IoT Core(FFRI Monthly Research 201506)
 
Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...
Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...
Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...
 
Malwarem armed with PowerShell
Malwarem armed with PowerShellMalwarem armed with PowerShell
Malwarem armed with PowerShell
 
MR201504 Web Defacing Attacks Targeting WordPress
MR201504 Web Defacing Attacks Targeting WordPressMR201504 Web Defacing Attacks Targeting WordPress
MR201504 Web Defacing Attacks Targeting WordPress
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Fighting advanced malware using machine learning (English)

  • 1. FFRI, Inc. @PacSec 2013 Fighting advanced malware using machine learning Fourteenforty Research Institute, Inc. Junichi Murakami Director of Advanced Development FFRI, Inc. http://www.ffri.jp
  • 2. FFRI, Inc. Who am I ? • Security Researcher at FFRI, Inc – Malware and vulnerabilities analysis with RCE – Both Windows and Linux kernel development • Speaker at – BlackHat USA/JP, RSA, PacSec, AVAR, etc. • *NOT* a MS/PhD degree in CS/Math – Just a user of machine learning 2
  • 3. FFRI, Inc. The Data Science Venn Diagram (in security) http://www.niemanlab.org/images/drew-conway-data-science-venn-diagram.jpg 3
  • 4. FFRI, Inc. I am, and this talk is http://www.niemanlab.org/images/drew-conway-data-science-venn-diagram.jpg 4
  • 5. FFRI, Inc. Agenda • Background • Our approach • Future works – Computers vs. Man – Applying to real time protection • Conclusion 5
  • 6. FFRI, Inc. Background – malware and its detection Generators Obfuscators Increasing malware Limitation of pattern matching Targeted Attack (Unknown malware) other methods Reputation Bigdata Heuristic Behavioral Machine learning 6
  • 7. FFRI, Inc. Limitation of signature matching • Evaluated 11 AV-product’s TRP using Metascan • Used fresh malware (not wildlist malware) • Prepared 2 test sets from different sources and period – test-1: 1,000 samples Pattern – test-2: 900 samples Reputation Huer. Bhvr. ML 7
  • 9. FFRI, Inc. Advantage of (cloud) reputation • Concept is the same with signature-matching(Blacklisting) • Endpoints don’t have to keep HEAVY patterns anymore • Easy to reflect a new pattern to the others 2. check 1. query 3. response Pattern Reputation Huer. Bhvr. ML 9
  • 10. FFRI, Inc. Disdvantage of (cloud) reputation • “Detectable” means that someone is already attacked • What if you are the first victim? • How much effective against “Targeted Attack” it is? 2. check 1. query 3. response Pattern Reputation Huer. Bhvr. ML 10
  • 11. FFRI, Inc. Advantage of heuristic/behavioral detection • Based on pre-determined characteristics or behaviors – OpenProcess -> WriteProcessMemory -> CreateRemoteThread – Registering itself to auto start extensibility points – Disabling Windows Firewall, etc. • Providing generic logics to detect malware • Signature-free (eliminate regular scanning and update) Pattern Reputation Huer. Bhvr. ML 11
  • 12. FFRI, Inc. Disadvantage of heuristic/behavioral detection • Difficult to avoid false positives completely • Or ask user to determine if an action would be allow or not (User dependent) • Have to analyze malware and update logics continuously (Not human task, more suitable for computers) Pattern Reputation Huer. Bhvr. ML 12
  • 13. FFRI, Inc. Heuristic Behavioral Test - July 2012 • AV-comparatives publishes H/B Test results since 2012 • The behavioral hardly contributes to detect (avg: 4.8%) http://chart.av-comparatives.org/chart1.php 13
  • 14. FFRI, Inc. Our approach • Behavioral-based detection powered by machine learning • Not new but more practical for the industry • Easy to try and automate using open source below – Cuckoo Sandbox – Jubatus Pattern Reputation Huer. Bhvr. ML 14
  • 15. FFRI, Inc. Machine learning-based detection • • • • Pattern Most research is doing in academic Basically, it is a classification problem (task) Mainly focusing on a combination of the factors below Some good results are reported Reputation Bhvr. Huer. ML Features Algorithms Evaluation Static information SVM TPR/FRP, etc. Dynamic information Naive bayes Accuracy, Precision Hybrid Perceptron, etc. ROC-curve, etc. 15
  • 16. FFRI, Inc. Overview original data set malware Dynamic analysis Cuckoo Sandbox benign training set feature selection test set convert to Feature Vector train test Jubatus Jubatus TPR: X% FPR: Y% 16
  • 17. FFRI, Inc. How many samples should we use? • It is a “Confidence Interval” theory • It depends on how margin of error we accept • All of these below hit rates are 1% – 1/100 (N=100) – 10/1,000 (N=1,000) – 100/10,000 (N=10,000) • Each confidences for determining to “1%” are different – Each of them has different error 17
  • 18. FFRI, Inc. How many samples should we use? Hit Rate containing errors (95% confidence) • We can calculate estimation of margin of error based on N 6.0% 5.0% 4.0% 3.0% 2.0% 1.0% 0.0% 100 1,000 10,000 100,000 1,000,000 (N) 18
  • 19. FFRI, Inc. Malware and benign files • Randomly sampling from files collected by ourselves – Malware: 15,000(5,000 = training, 10,000 = testing) – Benign: 15,000 (5,000 = training, 10,000 = testing) • *Random* is very important – Different period (choose 15,000/N sample from every day) – Different sources – Never care about filetype or malware type 19
  • 20. FFRI, Inc. Cuckoo Sandbox - http://www.cuckoosandbox.org/ • Open source automated malware analysis system – Sending malware into a virtual machine from a host – Executing the malware inside the virtual machine – Monitoring and saving its behaviors at runtime • API calls, network activity, VT results, etc. 20
  • 22. FFRI, Inc. Trends of API calls The Number of samples which finished calling API • Most of the samples finished calling API within 1s (or keep calling APIs) -> Used API only called within 5s malware goodware 10000 8000 6000 4000 2000 0 1s 5s 10s 15s 30s 45s Elapsed time 60s 60s+ 22
  • 23. FFRI, Inc. Jubatus - http://jubat.us/en/ • Distributed Online Machine Learning Framework – Distributed: Scalable – Online: Not batch, continuous learning • Open source, LGPL v2.1. Latest release is 0.4.5(22/07/2013) • Developed by Preferred Infrastructure, Inc. and NTT Software Innovation Center • Support various machine learning – Classification, Regression, Recommendation, Anomaly Detection • Easy to use (many language bindings, feature converter, etc) 23
  • 24. FFRI, Inc. Feature selection and convert to FV NtOpenFile Call sequence NtWriteFile NtWriteFile NtOpenFile NtWriteFile NtWriteFile NtOpenFileNtWriteFileNtWriteFile NtWriteFileNtWriteFileNtOpenFile NtWriteFileNtOpenFileNtWriteFile . . . . . NtCreateProcess 24
  • 25. FFRI, Inc. Feature selection and convert to FV NtOpenFile NtWriteFile Nt NtOpenFileNtWriteFileNtWriteFile malware malware malware label:malware train Jubatus NtCreateProcess NtClose NtCreateProcess NtClose NtCreateProcessNtCloseNtClose benign NtClose benign NtClose benign label:benign 25
  • 26. FFRI, Inc. (Image) Training internal Calc and update each FV’s weight based on its freq. and label (for the detail is dependent on the algorithm called AROW, don’t ask me :-) malware benign 0.0 0.25 0.75 NtClose NtClose NtClose 0.75 NtWriteFile LdrLoadDll NtDelayExecution 0.25 0.75 NtCreateProcess NtClose NtClose 0.25 0.5 NtAllocateVirtualMemory ... ... 0.5 26
  • 27. FFRI, Inc. Testing results • N of N-gram – 3~5-gram > 2-gram = 6-gram • Best: 3-gram – TRP: 72.33% [71.58 ~ 73.07 % (95% confidence)] – FPR: 0.77% [0.60 ~ 0.99% (95% confidence)] • The result above is an example – A lot of combination of features are available (we used only “API-name” and its sequence) 27
  • 30. FFRI, Inc. Dumping training model • http://blog.jubat.us/2013/06/classifier.html (Japanese only) Investigating weight parameters of classifier jubalocal_storage_dump.cpp https://gist.github.com/t-abe/5746333 30
  • 31. FFRI, Inc. Indicators of malware likeness in API 3-gram [foo@nolife classifier]$ ./dump --input model --label “malware” 0.181128 api_call$VirtualProtectEx_VirtualProtectEx_VirtualProtectEx@space#log_tf/bin 0.142254 api_call$RegOpenKeyExA_NtOpenKey_NtOpenKey@space#log_tf/bin 0.137144 api_call$NtReadFile_NtReadFile_NtFreeVirtualMemory@space#log_tf/bin 0.134443 api_call$LdrLoadDll_LdrGetProcedureAddress_VirtualProtectEx@space#log_tf/bin 0.130287 api_call$LdrLoadDll_RegOpenKeyExA_NtOpenKey@space#log_tf/bin 0.130287 api_call$DeviceIoControl_LdrLoadDll_RegOpenKeyExA@space#log_tf/bin 0.122363 api_call$VirtualProtectEx_LdrLoadDll_LdrGetProcedureAddress@space#log_tf/bin 0.102545 api_call$NtFreeVirtualMemory_LdrGetDllHandle_NtCreateFile@space#log_tf/bin 0.102485 api_call$RegCloseKey_RegCloseKey_RegCloseKey@space#log_tf/bin 0.0983165 api_call$NtReadFile_NtFreeVirtualMemory_LdrLoadDll@space#log_tf/bin 0.0966545 api_call$NtSetInformationFile_NtReadFile_NtFreeVirtualMemory@space#log_tf/bin 0.094639 api_call$NtMapViewOfSection_NtFreeVirtualMemory_NtOpenKey@space#log_tf/bin 0.0933827 api_call$NtFreeVirtualMemory_LdrLoadDll_LdrGetProcedureAddress@space#log_tf/bin 0.0905402 api_call$DeviceIoControl_DeviceIoControl_NtWriteFile@space#log_tf/bin 0.0903766 api_call$DeviceIoControl_NtWriteFile_NtWriteFile@space#log_tf/bin 0.0884724 api_call$RegOpenKeyExW_RegOpenKeyExW_LdrGetDllHandle@space#log_tf/bin 0.0853282 api_call$LdrLoadDll_LdrLoadDll_LdrLoadDll@space#log_tf/bin ... 31
  • 32. FFRI, Inc. Indicators of goodware likeness in API 3-gram [foo@nolife classifier]$ ./dump --input model --label “goodware” 0.268353 api_call$LdrGetDllHandle_LdrGetDllHandle_ExitProcess@space#log_tf/bin 0.268353 api_call$LdrGetDllHandle_ExitProcess_NtTerminateProcess@space#log_tf/bin 0.259838 api_call$NtWriteFile_LdrGetDllHandle_LdrGetDllHandle@space#log_tf/bin 0.25887 api_call$NtWriteFile_NtWriteFile_LdrGetDllHandle@space#log_tf/bin 0.135514 api_call$NtOpenFile_NtOpenFile_NtCreateFile@space#log_tf/bin 0.122445 api_call$DeviceIoControl_LdrLoadDll_LdrGetProcedureAddress@space#log_tf/bin 0.12242 api_call$DeviceIoControl_DeviceIoControl_LdrGetDllHandle@space#log_tf/bin 0.119231 api_call$GetSystemMetrics_LdrLoadDll_NtCreateMutant@space#log_tf/bin 0.115319 api_call$DeviceIoControl_LdrGetDllHandle_LdrGetProcedureAddress@space#log_tf/bin 0.109306 api_call$LdrGetProcedureAddress_NtOpenKey_LdrLoadDll@space#log_tf/bin 0.105579 api_call$NtReadFile_NtReadFile_NtReadFile@space#log_tf/bin 0.104565 api_call$NtCreateFile_NtCreateFile_NtWriteFile@space#log_tf/bin 0.103304 api_call$RegOpenKeyExA_LdrGetDllHandle_LdrGetProcedureAddress@space#log_tf/bin 0.10306 api_call$VirtualProtectEx_RegOpenKeyExA_LdrGetDllHandle@space#log_tf/bin 0.100701 api_call$NtFreeVirtualMemory_NtFreeVirtualMemory_GetSystemMetrics@space#log_tf/bin ... 32
  • 33. FFRI, Inc. Computer vs. Man • “VirtualProtectEx_VirtualProtectEx_VirtualProtectEx” looks like to related to malware • How about “RegOpenKeyExA_NtOpenKey_NtOpenKey”? • Computers might recognize indicators which human can’t (Extremely strong left-brain player) • Why don’t we cooperate with machine? 33
  • 34. FFRI, Inc. Using computers • Generating models using computers • Checking them and guessing new logics by human (Using our right-brain) • ML-based detection is sometimes difficult to control – Cannot specify strict conditions to detect “ It is detected because ML said so ! ” • Hybrid of ML-based and Logic-based would be powerful 34
  • 35. FFRI, Inc. Applying to real time protection • Using static information as feature – We can check a file before its execution – The performance is dependent on features • Using dynamic information as feature – Malware is already executed – Sometimes, detections would be too late – The hybrid detection above might be also useful in this perspective 35
  • 36. FFRI, Inc. Conclusion • Traditional pattern-matching reaches its limit • Current behavioral detections hardly contributes to detect • By applying ML to behavioral detections – TPR is improved – Computers recognize new features which human can’t – We should make use of them 36
  • 37. FFRI, Inc. Thank you! Contact: research-feedback@ffri.jp Twitter: @FFRI_Research 37