Near-memory & In-Memory Detection of Fileless Malware
Challenges in High Accuracy of Malware Detection
1. Intro
Issues
Objectives
Methodology
Conclusion
Muhammad Najmi Ahmad Zabidi
International Islamic University Malaysia
IEEE Control & System Graduate Research Colloquium 2012
Shah Alam, Malaysia
16th July 2012
Muhammad Najmi Ahmad Zabidi ICSRGC 2012 1/26
About
I am a research grad student at Universiti Teknologi
Malaysia, Skudai, Johor Bahru, Malaysia
My current employer is International Islamic University
Malaysia, Kuala Lumpur
Research area: malware detection, focusing on Windows executables
Malware in short
Malware is software
Its maliciousness is defined by the risks it exposes the user to
When the classification is vague, the term ‘‘Potentially Unwanted Program/Application’’ (PUP/PUA) is sometimes used instead
Methods of detection
Static analysis
We have developed pi-ngaji, an open-source Python tool for static malware analysis
Dynamic analysis
We execute the malware in a Windows environment and dump the API traces into a text file
This talk outlines several challenges in current methods of malware detection
Analysis of strings
Important, although not foolproof
Look for interesting calls first
Counts as static analysis, since the binary is not executed
Methods to find interesting strings
Use the strings command (on *NIX systems)
Open the binary in an editor
Check the Import Address Table (IAT)
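A minimal sketch of what the strings command does, written in Python for illustration. The function name extract_strings and the sample bytes are assumptions made for this example and are not taken from pi-ngaji; it simply collects runs of printable ASCII of a minimum length.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable-ASCII runs of at least min_len bytes, similar to `strings`."""
    # 0x20-0x7e covers the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical byte blob standing in for part of a Windows executable.
sample = b"\x00\x01MZ\x90\x00GetProcAddress\x00\xff\xfeLoadLibraryA\x00ab\x00"
found = extract_strings(sample)
print(found)
```

Short runs such as "MZ" and "ab" fall below the default minimum length of 4 and are dropped, which is also how the strings command behaves by default.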
Issues
The number of malware samples is enormous
Detection needs to be automated
Our proposal: use machine learning methods
Objectives
Reduce the number of API-call features, since
Some features are weak or irrelevant
These are considered ‘‘noise’’
A feature selection (ranking) method is chosen for this
The features
The following features are used:
Application Programming Interface (API) calls
XOR’ed strings
Anti-virtualization / virtual machine detection
Binary entropy is also of interest
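Two of these features can be sketched in a few lines of Python. This is an illustration under assumptions, not the talk's actual implementation: the function names, the single-byte XOR model, and the sample data are all hypothetical. The first function computes Shannon entropy in bits per byte (packed or encrypted sections tend towards 8); the second looks for a known string hidden with a one-byte XOR key.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

def find_single_byte_xor(data: bytes, needle: bytes) -> list:
    """Return (key, offset) pairs where needle occurs XOR'ed with a one-byte key."""
    hits = []
    for key in range(1, 256):  # key 0 would be the plain, un-XOR'ed string
        encoded = bytes(b ^ key for b in needle)
        offset = data.find(encoded)
        if offset != -1:
            hits.append((key, offset))
    return hits

# Hypothetical inputs for illustration only.
uniform = bytes(range(256))   # every byte value once: maximal entropy
constant = b"A" * 64          # one repeated byte: zero entropy
hidden = b"\x00\x00" + bytes(b ^ 0x5A for b in b"http://") + b"\x00"
xor_hits = find_single_byte_xor(hidden, b"http://")
```

Real samples may use multi-byte or rolling keys, which this brute-force search does not cover.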
Binary file structure
Figure: Structure of a PE file [Pietrek, 1994]
Figure: PE components, simplified
API calls
Examples of features:
GetSystemTimeAsFileTime
SetUnhandledExceptionFilter
GetCurrentProcess
TerminateProcess
LoadLibraryExW
GetVersionExW
GetProcAddress
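One plausible way to turn such API names into input for a learner is a binary presence vector. The sketch below is an assumption about the encoding, not a description of the talk's tool; FEATURE_APIS, api_feature_vector and the sample trace are hypothetical names and data.

```python
# A fixed, ordered feature list; these names are taken from the examples above.
FEATURE_APIS = [
    "GetSystemTimeAsFileTime",
    "TerminateProcess",
    "LoadLibraryExW",
    "GetVersionExW",
    "GetProcAddress",
]

def api_feature_vector(observed_calls, feature_apis=FEATURE_APIS):
    """Return 1 for each feature API seen in the trace, 0 otherwise."""
    observed = set(observed_calls)
    return [1 if api in observed else 0 for api in feature_apis]

# Hypothetical API trace; repeated calls and unknown APIs are ignored.
trace = ["GetProcAddress", "CreateFileW", "TerminateProcess", "GetProcAddress"]
vector = api_feature_vector(trace)
print(vector)
```

A count-based vector would keep call frequencies instead of presence; which works better is an empirical question.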
Anti Debugger/AntiVM strings
IsDebuggerPresent
VMCheck.dll
"Red Pill":"\x0f\x01\x0d\x00\x00\x00\x00\xc3",
"VirtualPc trick":"\x0f\x3f\x07\x0b",
"VMware trick":"VMXh",
"VMCheck.dll":"\x45\xC7\x00\x01",
"VMCheck.dll for VirtualPC":"\x0f\x3f\x07\x0b\xc7\x45\xfc\xff\xff\xff\xff",
"Xen":"XenVMM", # Or XenVMMXenVMM
"Bochs & QEmu CPUID Trick":"\x44\x4d\x41\x63",
"Torpig VMM Trick": "\xE8\xED\xFF\xFF\xFF\x25\x00\x00\x00\xFF"
                    "\x33\xC9\x3D\x00\x00\x00\x80\x0F\x95\xC1\x8B\xC1\xC3",
"Torpig (UPX) VMM Trick": "\x51\x51\x0F\x01\x27\x00\xC1\xFB\xB5\xD5\x35"
                          "\x02\xE2\xC3\xD1\x66\x25\x32"
                          "\xBD\x83\x7F\xB7\x4E\x3D\x06\x80\x0F\x95\xC1\x8B\xC1\xC3"
Source: ZeroWine source code
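A minimal sketch of how a signature table like this could be applied to a file's raw bytes. It assumes the entries are byte sequences written with \x escapes, uses only a few of them, and is not the ZeroWine code itself; scan_anti_vm and the test blob are hypothetical.

```python
# A subset of the signatures listed above, written as Python 3 bytes literals.
ANTI_VM_SIGNATURES = {
    "Red Pill": b"\x0f\x01\x0d\x00\x00\x00\x00\xc3",
    "VirtualPc trick": b"\x0f\x3f\x07\x0b",
    "VMware trick": b"VMXh",
    "Xen": b"XenVMM",
}

def scan_anti_vm(data: bytes, signatures=ANTI_VM_SIGNATURES) -> dict:
    """Map each matching signature name to the first offset where it occurs."""
    hits = {}
    for name, pattern in signatures.items():
        offset = data.find(pattern)
        if offset != -1:
            hits[name] = offset
    return hits

# Hypothetical blob: two NOPs, the Red Pill bytes, filler, then the VMware magic.
blob = b"\x90\x90" + b"\x0f\x01\x0d\x00\x00\x00\x00\xc3" + b"padding" + b"VMXh"
hits = scan_anti_vm(blob)
print(hits)
```

Short patterns such as the four-byte ones can match by coincidence in large binaries, so a hit is a hint for further analysis rather than proof.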
Sample execution
Analyzing e665297bf9dbb2b2790e4d898d70c9e9
Analyzing registry...
[+] Malware is Adding a Key at Hive: HKEY_LOCAL_MACHINE
^G^@Label11^@^A^AÃˇ^Nreg add "HKEY_LOCAL_MACHINESOFTWAREMicrosoftWindows NTCurrentVersion
R
File Execution OptionsRx.exe" /v debugger /t REG_SZ /d %systemrot%repair1sass.exe /f^M
....
[+] Malware Seems to be IRC BOT: Verified By String : ADMIN
[+] Malware Seems to be IRC BOT: Verified By String : LIST
[+] Malware Seems to be IRC BOT: Verified By String : QUIT
[+] Malware Seems to be IRC BOT: Verified By String : VERSION
Analyzing interesting calls..
[+] Found an Interesting call to: FindWindow
[+] Found an Interesting call to: LoadLibraryA
[+] Found an Interesting call to: CreateProcess
[+] Found an Interesting call to: GetProcAddress
[+] Found an Interesting call to: CopyFile
[+] Found an Interesting call to: shdocvw
Advantages on the researcher’s side
Malware writers are usually ‘‘lazy’’, so they tend to reuse chunks of earlier code
This makes it easier to trace a sample back to an earlier family through the commonalities
Our methods
Roughly, our methods consist of:
1 Feature selection (ranking/pruning)
2 Supervised classification
3 Unsupervised classification
Items 2 and 3 can also be combined into a method known as ‘‘semi-supervised classification’’.
Information Gain
[Zhang et al., 2007, Altaher et al., 2011, Singhal and Raul, 2012] use the following formula to apply IG to malware detection.
Information gain is the amount by which the entropy of X decreases once Y is known; it reflects the additional information about X that Y provides:
\[ IG(X \mid Y) = H(X) - H(X \mid Y) \]
[Singhal and Raul, 2012] introduced the following adjustment to ‘‘correct out’’ errors in the results:
\[ IG(X) = IG(X) \pm \frac{\sum_{i=0}^{n} IG(X_i)}{n} \]
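The entropy form IG(X|Y) = H(X) − H(X|Y) can be sketched directly in Python. Here X is taken to be the class label and Y a binary feature; the function names, the toy labels and the two toy feature columns are assumptions for illustration, not data from the experiments.

```python
import math
from collections import Counter

def entropy(values) -> float:
    """H(X) in bits for a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def information_gain(labels, feature) -> float:
    """IG(X|Y) = H(X) - H(X|Y), with X = class labels and Y = feature values."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [x for x, y in zip(labels, feature) if y == value]
        conditional += (len(subset) / total) * entropy(subset)  # P(y) * H(X|Y=y)
    return entropy(labels) - conditional

# Toy data, made up for this example: four samples, two candidate features.
labels = ["malware", "malware", "benign", "benign"]
features = {
    "GetProcAddress": [1, 0, 1, 0],     # appears equally in both classes
    "IsDebuggerPresent": [1, 1, 0, 0],  # separates the classes perfectly
}
ranked = sorted(features, key=lambda f: information_gain(labels, features[f]), reverse=True)
print(ranked)
```

Ranking features by this score and keeping the top ones is one way to realise the feature selection step; the cut-off is a tuning choice.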
Information Gain (cont’d)
From [Jiang et al., 2011]:
\[ IG(t) = \sum_{c \in \{c_i, \bar{c}_i\}} \sum_{t' \in \{t, \bar{t}\}} P(t', c) \log \frac{P(t', c)}{P(t')\,P(c)} \]
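This sum form and the entropy form IG(X|Y) = H(X) − H(X|Y) describe the same quantity. A short derivation, where C (the class variable) and T (presence or absence of the feature t) are symbols introduced here for illustration, and the sums run over both classes and both values t' of T:

```latex
\begin{align*}
IG(t) &= \sum_{c}\sum_{t'} P(t',c)\,\log\frac{P(t',c)}{P(t')\,P(c)} \\
      &= \sum_{c}\sum_{t'} P(t',c)\,\log\frac{P(c \mid t')}{P(c)} \\
      &= -\sum_{c} P(c)\,\log P(c) \;+\; \sum_{c}\sum_{t'} P(t',c)\,\log P(c \mid t') \\
      &= H(C) - H(C \mid T)
\end{align*}
```

The third line uses the marginal identity that summing P(t', c) over t' gives P(c). With X = C and Y = T this is the entropy form.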
For research purposes, the following issues keep coming up:
No standard dataset exists, unlike in the Intrusion Detection System (IDS) area
Malware samples change quickly, so the datasets used in an experiment may be questioned
As a last resort, stick to an existing database, and avoid tying the method to any specific malware family so that it can work on new, incoming malware
Table: Differences between clustering and classification
Classification                Clustering
Deals with known data         Deals with unknown data
Supervised learning           Unsupervised learning
Popular algorithms include:   Popular algorithms include:
Random Forest                 K-means
Neural Networks               Fuzzy C
k-Nearest Neighbor            Gaussian
Decision Trees
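As an illustration of the classification column, here is a minimal k-Nearest Neighbor sketch over binary feature vectors. The Hamming distance, the function names and the six labelled training vectors are assumptions made for this example; the talk's experiments use Weka and Octave/Matlab rather than this code.

```python
from collections import Counter

def hamming(a, b) -> int:
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """train is a list of (vector, label) pairs; return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training set: each vector marks presence of four hypothetical features.
train = [
    ([1, 1, 1, 0], "malware"),
    ([1, 1, 0, 0], "malware"),
    ([1, 0, 1, 1], "malware"),
    ([0, 0, 0, 1], "benign"),
    ([0, 0, 0, 0], "benign"),
    ([0, 1, 0, 0], "benign"),
]
prediction = knn_predict(train, [1, 1, 1, 1], k=3)
print(prediction)
```

Because the labels are known in advance, this is supervised learning: the query is assigned the label that dominates among its closest labelled neighbours.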
Classification (supervised) is chosen to deal with a known but incomplete corpus
Clustering (unsupervised) is chosen to deal with new inputs
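For the clustering side, a minimal K-means sketch (Lloyd's algorithm) in plain Python. The fixed initial centroids, the function names and the four sample vectors are assumptions chosen to keep the example deterministic; they do not come from the experiments.

```python
def squared_distance(a, b) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Made-up unlabelled feature vectors; no class labels are used anywhere.
points = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
assignment, centroids = kmeans(points, centroids=[points[0], points[3]])
print(assignment, centroids)
```

A new, unseen sample can then be placed in the cluster whose centroid is nearest, which is why clustering suits inputs that have no label yet.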
Some results
We managed to detect several malware samples using the existing API traces and other features (bot commands, file/registry deletion)
Newer, more sophisticated malware such as Stuxnet/Duqu is very platform specific (it attacks SCADA systems), so detecting it needs further study.
Perhaps the most obvious sign would be the use of XOR’ed communication channels, if any.
The flow
Figure: workflow with the stages Feature Selection, Feature Categorization, Clustering, Classification and Visualization; tools noted on the diagram: Weka, Octave/Matlab, scipy
Altaher, A., Ramadass, S., and Ali, A. (2011).
Computer Virus Detection Using Features Ranking and Machine Learning.
Australian Journal of Basic and Applied Sciences, 5(9):1482--1486.
Jiang, Q., Zhao, X., and Huang, K. (2011).
A feature selection method for malware detection.
In 2011 IEEE International Conference on Information and Automation (ICIA), pages 890--895.
Pietrek, M. (1994).
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format.
http://msdn.microsoft.com/en-us/library/ms809762.aspx.
Singhal, P. and Raul, N. (2012).
Malware detection module using machine learning algorithms to assist in centralized security in enterprise networks.
International Journal of Network Security & Its Applications, 4.
Zhang, B., Yin, J., Hao, J., Wang, S., and Zhang, D. (2007).
New malicious code detection based on n-gram analysis and rough set theory.
pages 626--633. Springer-Verlag, Berlin, Heidelberg.