Intro
                           Issues
                       Objectives
                     Methodology
                       Conclusion




Challenges in High Accuracy of
      Malware Detection
         Muhammad Najmi Ahmad Zabidi
                  International Islamic University Malaysia



   IEEE Control & System Graduate Research Colloquium 2012
                      Shah Alam, Malaysia


                        16th July 2012



     Muhammad Najmi Ahmad Zabidi         ICSRGC 2012          1/26



About


   I am a graduate research student at Universiti Teknologi
   Malaysia, Skudai, Johor Bahru, Malaysia
   My current employer is International Islamic University
   Malaysia, Kuala Lumpur
   Research area - malware detection, focusing on
   Windows executables







Malware in short


    Is software
    Maliciousness is defined by the risks it exposes the user to
    When the classification is vague, the term ‘‘Potentially
    Unwanted Program/Application’’ (PUP/PUA) is sometimes used







Methods of detection


    Static analysis
        For this we developed pi-ngaji, an open source
        Python tool for static malware analysis
    Dynamic analysis
        For this we execute the malware in a Windows
        environment and dump the API traces into a text file








This talk outlines several challenges in current methods of
malware detection







Analysis of strings


    Important, although not foolproof
    Find interesting calls first
    Considered static analysis, since the binary is not
    executed







Methods to find interesting strings


    Use the strings command (on *NIX systems)
    Editors
    Check the Import Address Table (IAT)
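
As a rough illustration of the first method, the sketch below mimics the basic behaviour of the strings command in Python: it pulls runs of printable ASCII out of raw bytes. The function name `extract_strings` and the minimum length of 4 are assumptions made for this example, and it is not taken from pi-ngaji.

```python
import re

def extract_strings(data, min_length=4):
    """Return runs of printable ASCII that are at least min_length bytes long."""
    # 0x20-0x7e covers the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]

# Hypothetical byte blob standing in for the contents of a binary.
sample = b"\x00\x01GetProcAddress\x00\x02ab\x00LoadLibraryA\x00"
print(extract_strings(sample))  # ['GetProcAddress', 'LoadLibraryA']
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed in the same way. This only finds plain ASCII; wide-character (UTF-16) strings would need a second pattern.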







Issues



    The number of malware samples is enormous
    Detection needs to be automated
         Our proposal - use Machine Learning methods







Objectives


    Reduce the number of malware API features, since
        Some are weak or irrelevant features
        These are considered ‘‘noise’’
        A feature selection (ranking) method is chosen




Intro   API calls
                                Issues    Anti Debugger/AntiVM strings
                            Objectives    Feature Ranking Selection with Information Gain
                          Methodology     Classification and Clustering
                            Conclusion



The features


  The following are the features
     Application Programming Interface (API) calls
     XOR’ed strings
     Anti virtualization/virtual machine detector
     Binary entropy is also interesting
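
To make the entropy feature concrete, here is a minimal sketch of Shannon entropy computed over the bytes of a file or section. The function name `shannon_entropy` is an assumption for this example. Values close to 8 bits per byte are commonly taken as a hint of packed or encrypted content, but that is a heuristic rather than proof.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (between 0 and 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical inputs: a uniform spread of byte values versus plain text.
print(shannon_entropy(bytes(range(256))))
print(shannon_entropy(b"This program cannot be run in DOS mode."))
```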







Binary file structure




        Figure: Structure of a PE file [Pietrek, 1994]




      Figure: PE components, simplified







API calls

  Features are as follows:
  Example of Features
  GetSystemTimeAsFileTime
  SetUnhandledExceptionFilter
  GetCurrentProcess
  TerminateProcess
  LoadLibraryExW
  GetVersionExW
  GetProcAddress






Anti Debugger/AntiVM strings



   IsDebuggerPresent
   VMCheck.dll








         "Red Pill": "\x0f\x01\x0d\x00\x00\x00\x00\xc3",
         "VirtualPc trick": "\x0f\x3f\x07\x0b",
         "VMware trick": "VMXh",
         "VMCheck.dll": "\x45\xC7\x00\x01",
         "VMCheck.dll for VirtualPC": "\x0f\x3f\x07\x0b\xc7\x45\xfc\xff\xff\xff\xff",
         "Xen": "XenVMM",  # Or XenVMMXenVMM
         "Bochs & QEmu CPUID Trick": "\x44\x4d\x41\x63",
         "Torpig VMM Trick": "\xE8\xED\xFF\xFF\xFF\x25\x00\x00\x00\xFF"
                             "\x33\xC9\x3D\x00\x00\x00\x80\x0F\x95\xC1\x8B\xC1\xC3",
         "Torpig (UPX) VMM Trick": "\x51\x51\x0F\x01\x27\x00\xC1\xFB\xB5\xD5\x35"
                                   "\x02\xE2\xC3\xD1\x66\x25\x32"
                                   "\xBD\x83\x7F\xB7\x4E\x3D\x06\x80\x0F\x95\xC1\x8B\xC1\xC3"


Source: ZeroWine source code
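
A minimal sketch of how such a signature table might be used: each entry is treated as a raw byte sequence and searched for in the binary. Only a subset of the signatures above is included. The names `VM_SIGNATURES` and `find_vm_checks` are assumptions for this example, and this is not the ZeroWine code itself.

```python
# Illustrative subset of the signatures listed above, written as bytes.
VM_SIGNATURES = {
    "Red Pill": b"\x0f\x01\x0d\x00\x00\x00\x00\xc3",
    "VirtualPc trick": b"\x0f\x3f\x07\x0b",
    "VMware trick": b"VMXh",
    "Xen": b"XenVMM",
}

def find_vm_checks(data):
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, signature in VM_SIGNATURES.items() if signature in data]

# Hypothetical blob containing the VMware backdoor magic value.
print(find_vm_checks(b"\x90\x90VMXh\x90"))  # ['VMware trick']
```

A plain substring match like this can produce false positives, especially for short patterns such as the 4-byte ones, so a hit is better read as a hint than as a verdict.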







Sample execution
 Analyzing   e665297bf9dbb2b2790e4d898d70c9e9

 Analyzing registry...
 [+] Malware is Adding a Key at Hive: HKEY_LOCAL_MACHINE
 ^G^@Label11^@^A^AÃˇ^Nreg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion
  File Execution Options\Rx.exe" /v debugger /t REG_SZ /d %systemrot%repair1sass.exe /f^M

 ....

 [+] Malware Seems to be IRC BOT: Verified By String    :   ADMIN
 [+] Malware Seems to be IRC BOT: Verified By String    :   LIST
 [+] Malware Seems to be IRC BOT: Verified By String    :   QUIT
 [+] Malware Seems to be IRC BOT: Verified By String    :   VERSION
 Analyzing interesting calls..
 [+] Found an Interesting call to: FindWindow
 [+] Found an Interesting call to: LoadLibraryA
 [+] Found an Interesting call to: CreateProcess
 [+] Found an Interesting call to: GetProcAddress
 [+] Found an Interesting call to: CopyFile
 [+] Found an Interesting call to: shdocvw
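
The kind of report shown above can be approximated by matching extracted strings against watchlists. The sketch below is only an illustration under assumptions: the watchlists are taken from the sample output, the function name `report` is hypothetical, and the actual matching logic of the tool may differ.

```python
# Watchlists taken from the sample output above; a real tool would use longer lists.
IRC_COMMANDS = ["ADMIN", "LIST", "QUIT", "VERSION"]
INTERESTING_CALLS = ["FindWindow", "LoadLibraryA", "CreateProcess",
                     "GetProcAddress", "CopyFile"]

def report(strings):
    """Build report lines from a list of strings extracted from a binary."""
    lines = []
    for command in IRC_COMMANDS:
        # Exact match, since short words like LIST occur inside many other strings.
        if command in strings:
            lines.append(f"[+] Malware Seems to be IRC BOT: Verified By String : {command}")
    for call in INTERESTING_CALLS:
        # Substring match, so CreateProcessA and CreateProcessW both count.
        if any(call in s for s in strings):
            lines.append(f"[+] Found an Interesting call to: {call}")
    return lines

# Hypothetical extracted strings.
for line in report(["QUIT", "CreateProcessA", "hello"]):
    print(line)
```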







Advantages on the researcher’s side


    Malware writers are usually ‘‘lazy’’, so they tend to
    reuse chunks of earlier code
    Hence, it is easier to trace a sample back to an earlier
    family based on the commonalities







Our methods

 Roughly, our methods consist of:

    1   Feature Selection (Ranking/Pruning)
    2   Supervised Classification
    3   Unsupervised Classification

 Items 2 and 3 above can also be combined into a method
 known as ‘‘Semi-Supervised Classification’’.







Information Gain
  [Zhang et al., 2007, Altaher et al., 2011,
  Singhal and Raul, 2012] use the following formula for applying
  IG to malware:
      The amount by which the entropy of X decreases, reflecting
      the additional information about X provided by Y, is
      called information gain, given by

          \[ IG(X \mid Y) = H(X) - H(X \mid Y) \]

  [Singhal and Raul, 2012] introduced the following algorithm
  to ‘‘correct out’’ errors in the results:

          \[ IG(X) = IG(X) \pm \frac{\sum_{i=0}^{n} IG(X_i)}{n} \]



Information Gain (cont’d)


  From [Jiang et al., 2011]

      \[ IG(t) = \sum_{c \in \{c_i, \bar{c}_i\}} \sum_{t' \in \{t, \bar{t}\}} P(t', c) \log \frac{P(t', c)}{P(t') P(c)} \]
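
As a worked illustration of the two information gain forms above, the sketch below computes both for a binary feature (for example, whether a given API call is present) against class labels, and shows that they agree. It assumes base-2 logarithms and probabilities estimated from simple counts. The function names and the toy data are hypothetical, and the error-correction step of [Singhal and Raul, 2012] is not included.

```python
import math
from collections import Counter

def entropy(labels):
    """H(X): Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(X|Y) = H(X) - H(X|Y), with X the class label and Y the feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [label for f, label in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def information_gain_joint(feature, labels):
    """Sum over observed (t', c) pairs of P(t', c) log(P(t', c) / (P(t') P(c)))."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_t, p_c = Counter(feature), Counter(labels)
    # Pairs that never occur contribute nothing, so only observed pairs are summed.
    return sum((count / n) * math.log2((count / n) / ((p_t[t] / n) * (p_c[c] / n)))
               for (t, c), count in joint.items())

# Hypothetical data: 1 = API call present, labels 1 = malware, 0 = benign.
labels = [1, 1, 0, 0]
print(information_gain([1, 1, 0, 0], labels))        # feature separates the classes
print(information_gain_joint([1, 0, 1, 0], labels))  # feature carries no information
```

Ranking features then amounts to computing this value for every API call and keeping the highest-scoring ones.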








For research purposes, the following issues always come up:
    There is no standard dataset, unlike in the Intrusion
    Detection System (IDS) area
    Malware samples change quickly, so the datasets used
    for an experiment may be questioned
    As a last resort, stick to an existing database and avoid
    tying it to any specific malware family, so that the
    method can work with new, incoming malware




Table: Differences between clustering and classification

 Classification                        Clustering

 Deals with known data                 Deals with unknown data

 Supervised learning                   Unsupervised learning

 Popular algorithms include:           Popular algorithms include:
      Random Forest                          K-means
      Neural Networks                        Fuzzy C-means
      k-Nearest Neighbor                     Gaussian
      Decision Trees








Classification (supervised) is chosen to deal with a known
corpus that has incomplete data
Clustering (unsupervised) is chosen to deal with new inputs
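
The contrast between the two approaches can be sketched in plain Python: k-Nearest Neighbor needs labelled samples, while k-means groups samples without any labels. This is an illustration only, not the Weka, Octave/Matlab or scipy setup used in this work. The feature vectors are hypothetical API call counts and all names are assumptions for this example.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_vectors, train_labels, query, k=3):
    """Supervised: majority vote among the k labelled samples nearest to query."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: distance(pair[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

def kmeans(vectors, centroids, iterations=10):
    """Unsupervised: assign each sample to its nearest centroid, then re-average."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(len(centroids)),
                          key=lambda i: distance(v, centroids[i]))
                      for v in vectors]
        for i in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == i]
            if members:  # leave a centroid in place if nothing was assigned to it
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical counts of [IsDebuggerPresent, CreateProcess, GetVersionExW] calls.
vectors = [(3, 5, 0), (2, 4, 1), (4, 6, 0), (0, 1, 3), (0, 0, 4), (1, 1, 5)]
labels = ["malware", "malware", "malware", "benign", "benign", "benign"]

print(knn_predict(vectors, labels, (3, 4, 1)))       # uses the labels
print(kmeans(vectors, [vectors[0], vectors[3]]))     # ignores the labels
```

Seeding k-means with two of the samples keeps the example deterministic; practical implementations usually pick initial centroids at random or with a scheme such as k-means++.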







Some results

    We managed to detect several malware samples by using
    the existing API traces and other features (bot
    commands, file/registry deletion)
    Newer, more sophisticated malware such as Stuxnet/Duqu
    is very platform specific (it attacks SCADA systems),
    so detecting it needs further study.
    Perhaps the most obvious indicator is the use of
    XOR’ed communication channels.
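
One simple way to look for XOR'ed content is to search for a known string under every possible single-byte key. The sketch below assumes a one-byte key, which is only the simplest case, and the function names are hypothetical. Multi-byte or rolling keys would need a different approach.

```python
def xor_bytes(data, key):
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)

def find_xor_keys(data, needle):
    """Return every single-byte key (1 to 255) under which needle occurs in data."""
    # Encoding the short needle 255 times is cheaper than decoding all of data.
    return [key for key in range(1, 256) if xor_bytes(needle, key) in data]

# Hypothetical blob hiding an API name under key 0x5A.
blob = b"\x00\x00" + xor_bytes(b"GetProcAddress", 0x5A) + b"\x00"
print(find_xor_keys(blob, b"GetProcAddress"))  # [90]
```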







The flow

  Feature Selection              Feature Categorization
               Weka, Octave/Matlab



     Clustering                            Classification
                Weka, Octave/Matlab
           scipy, Octave/Matlab

    Visualization
                                   scipy, Octave/Matlab








Altaher, A., Ramadass, S., and Ali, A. (2011).
Computer Virus Detection Using Features Ranking and Machine Learning.
Australian Journal of Basic and Applied Sciences, 5(9):1482--1486.

Jiang, Q., Zhao, X., and Huang, K. (2011).
A feature selection method for malware detection.
In 2011 IEEE International Conference on Information and Automation (ICIA), pages 890--895.

Pietrek, M. (1994).
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format.
http://msdn.microsoft.com/en-us/library/ms809762.aspx.

Singhal, P. and Raul, N. (2012).
Malware detection module using machine learning algorithms to assist in centralized security in enterprise
networks.
International Journal of Network Security & Its Applications, 4.

Zhang, B., Yin, J., Hao, J., Wang, S., and Zhang, D. (2007).
New malicious code detection based on n-gram analysis and rough set theory.
pages 626--633. Springer-Verlag, Berlin, Heidelberg.




Introduction to Adversary Evaluation ToolsIntroduction to Adversary Evaluation Tools
Introduction to Adversary Evaluation Tools
 
Application Security TRENDS – Lessons Learnt- Firosh Ummer
Application Security TRENDS – Lessons Learnt- Firosh UmmerApplication Security TRENDS – Lessons Learnt- Firosh Ummer
Application Security TRENDS – Lessons Learnt- Firosh Ummer
 
Manual Testing
Manual TestingManual Testing
Manual Testing
 
Near-memory & In-Memory Detection of Fileless Malware
Near-memory & In-Memory Detection of Fileless MalwareNear-memory & In-Memory Detection of Fileless Malware
Near-memory & In-Memory Detection of Fileless Malware
 

Challenges in High Accuracy of Malware Detection

  • 1. Intro Issues Objectives Methodology Conclusion Challenges in High Accuracy of Malware Detection Muhammad Najmi Ahmad Zabidi International Islamic University Malaysia IEEE Control & System Graduate Research Colloquium 2012 Shah Alam, Malaysia 16th July 2012 Muhammad Najmi Ahmad Zabidi ICSRGC 2012 1/26
  • 2. About: I am a research graduate student at Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia. My current employer is International Islamic University Malaysia, Kuala Lumpur. Research area: malware detection, focusing on Windows executables.
  • 3. Malware in short: Malware is software whose maliciousness is defined by the risks it exposes the user to. When the intent is unclear, the term "Potentially Unwanted Program/Application" (PUP/PUA) is sometimes used instead.
  • 4. Methods of detection: Static analysis: we have developed pi-ngaji, an open source, Python-based tool for static malware analysis. Dynamic analysis: we execute the malware in a Windows environment and dump the API traces into a text file.
  • 5. This talk outlines several challenges in the current methods of malware detection.
  • 6. Analysis of strings: Important, although not foolproof. Find interesting calls first. Considered static analysis, since the binary is not executed.
  • 7. Methods to find interesting strings: Use the strings command (on *NIX systems). Use editors. Check the Import Address Table (IAT).
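The string-extraction step can be illustrated with a minimal sketch. It is not the pi-ngaji implementation; it only mimics the basic behaviour of the strings command by collecting runs of printable ASCII characters. The function name `extract_strings` and the default minimum length of 4 are assumptions made for this example.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters that are at least min_len long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical byte blob standing in for the contents of an executable.
sample = b"\x00\x01GetProcAddress\x00\x02ab\x00LoadLibraryA\x00"
print(extract_strings(sample))
```

Short runs such as `ab` are dropped because they fall below the minimum length, which is how most of the random byte noise in a binary gets filtered out.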
  • 8. Issues: The number of malware samples is enormous. Detection needs to be automated. Our proposal: use machine learning methods.
  • 9. Objectives: Reduce the number of malware API features, since some are weak or irrelevant and are considered "noise". Feature selection with a ranking method is chosen.
  • 10. Section outline: API calls; Anti Debugger/AntiVM strings; Feature Ranking Selection with Information Gain; Classification and Clustering. The features: Application Programming Interface (API) calls; XOR'ed strings; anti-virtualization/virtual machine detectors. Binary entropy is also of interest.
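Two of the features listed here, binary entropy and XOR'ed strings, can be sketched in a few lines. This is an illustrative example under stated assumptions, not the talk's tool: entropy is taken to be Shannon entropy over byte values (in bits per byte), and the XOR search only covers single-byte keys. The names `shannon_entropy`, `xor_bytes` and `find_xor_keys` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key."""
    return bytes(b ^ key for b in data)


def find_xor_keys(data: bytes, needle: bytes) -> list:
    """Return the single-byte keys (1..255) under which needle is hidden in data."""
    return [key for key in range(1, 256) if xor_bytes(needle, key) in data]


# Hypothetical blob: a known API name obfuscated with key 0x5A.
blob = xor_bytes(b"xxIsDebuggerPresentxx", 0x5A)
print(find_xor_keys(blob, b"IsDebuggerPresent"))
print(shannon_entropy(bytes(range(256))))
```

High entropy is commonly read as a hint of packing or encryption, but it is only a hint; compressed resources in benign files score high as well.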
  • 11. Binary file structure. Figure: Structure of a PE file [Pietrek, 1994].
  • 12. Figure: PE components, simplified.
  • 13. API calls: Examples of features: GetSystemTimeAsFileTime, SetUnhandledExceptionFilter, GetCurrentProcess, TerminateProcess, LoadLibraryExW, GetVersionExW, GetProcAddress.
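One common way to turn API names like these into machine learning input is a binary presence vector. The sketch below assumes that encoding; the talk does not state which encoding was used. The `FEATURES` list reuses names from the slide, and `to_feature_vector` is a hypothetical helper.

```python
# Fixed feature order; each position answers "does the sample import this API?"
FEATURES = [
    "GetSystemTimeAsFileTime",
    "TerminateProcess",
    "LoadLibraryExW",
    "GetVersionExW",
    "GetProcAddress",
]


def to_feature_vector(imports, features=FEATURES):
    """Encode a collection of imported API names as a list of 0/1 flags."""
    imported = set(imports)
    return [1 if name in imported else 0 for name in features]


# Hypothetical import list of one sample.
print(to_feature_vector(["GetProcAddress", "TerminateProcess", "CreateFileW"]))
```

Names outside the feature list (here `CreateFileW`) are ignored, so every sample yields a vector of the same length.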
  • 14. Anti Debugger/AntiVM strings: IsDebuggerPresent, VMCheck.dll.
  • 15. Anti-VM signatures (source: ZeroWine source code):
      "Red Pill":"\x0f\x01\x0d\x00\x00\x00\x00\xc3",
      "VirtualPc trick":"\x0f\x3f\x07\x0b",
      "VMware trick":"VMXh",
      "VMCheck.dll":"\x45\xC7\x00\x01",
      "VMCheck.dll for VirtualPC":"\x0f\x3f\x07\x0b\xc7\x45\xfc\xff\xff\xff\xff",
      "Xen":"XenVMM", # Or XenVMMXenVMM
      "Bochs & QEmu CPUID Trick":"\x44\x4d\x41\x63",
      "Torpig VMM Trick": "\xE8\xED\xFF\xFF\xFF\x25\x00\x00\x00\xFF\x33\xC9\x3D\x00\x00\x00\x80\x0F\x95\xC1\x8B\xC1\xC3",
      "Torpig (UPX) VMM Trick": "\x51\x51\x0F\x01\x27\x00\xC1\xFB\xB5\xD5\x35\x02\xE2\xC3\xD1\x66\x25\x32\xBD\x83\x7F\xB7\x4E\x3D\x06\x80\x0F\x95\xC1\x8B\xC1\xC3"
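A signature table of this kind can be applied with a plain substring search over the file's bytes. The following is a minimal sketch, not ZeroWine's code: it uses four of the entries from the slide, written as byte literals on the assumption that each `x..` pair is a hexadecimal byte escape, and the function name `scan_antivm` is hypothetical.

```python
# Subset of the slide's signatures, written as byte strings (assumed escapes).
ANTIVM_SIGNATURES = {
    "Red Pill": b"\x0f\x01\x0d\x00\x00\x00\x00\xc3",
    "VMware trick": b"VMXh",
    "Xen": b"XenVMM",
    "Bochs & QEmu CPUID Trick": b"\x44\x4d\x41\x63",
}


def scan_antivm(data: bytes, signatures=ANTIVM_SIGNATURES) -> list:
    """Return the names of all signatures whose bytes occur anywhere in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


# Hypothetical sample containing the VMware backdoor magic value.
sample = b"\x90\x90\xb8VMXh\xba\x58\x56\x00\x00"
print(scan_antivm(sample))
```

Short signatures such as the four-byte ones can match by coincidence in a large file, so a hit is better treated as a feature for the classifier than as a verdict.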
  • 16. Sample execution:
      Analyzing e665297bf9dbb2b2790e4d898d70c9e9
      Analyzing registry...
      [+] Malware is Adding a Key at Hive: HKEY_LOCAL_MACHINE
      ^G^@Label11^@^A^AÃˇ^Nreg add "HKEY_LOCAL_MACHINESOFTWAREMicrosoftWindows NTCurrentVersion R File Execution OptionsRx.exe" /v debugger /t REG_SZ /d %systemrot%repair1sass.exe /f^M
      ....
      [+] Malware Seems to be IRC BOT: Verified By String : ADMIN
      [+] Malware Seems to be IRC BOT: Verified By String : LIST
      [+] Malware Seems to be IRC BOT: Verified By String : QUIT
      [+] Malware Seems to be IRC BOT: Verified By String : VERSION
      Analyzing interesting calls..
      [+] Found an Interesting call to: FindWindow
      [+] Found an Interesting call to: LoadLibraryA
      [+] Found an Interesting call to: CreateProcess
      [+] Found an Interesting call to: GetProcAddress
      [+] Found an Interesting call to: CopyFile
      [+] Found an Interesting call to: shdocvw
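The "interesting calls" part of this output could be produced by checking a watch list of names against the raw bytes of the sample. The sketch below is an assumption about how such a check might look, not the tool's actual code; the watch list is taken from the output shown on the slide, and `find_interesting_calls` is a hypothetical name.

```python
# Watch list taken from the sample output on the slide.
INTERESTING_CALLS = [
    "FindWindow",
    "LoadLibraryA",
    "CreateProcess",
    "GetProcAddress",
    "CopyFile",
    "shdocvw",
]


def find_interesting_calls(data: bytes, watch_list=INTERESTING_CALLS) -> list:
    """Return watch-list names that occur as substrings of the sample bytes."""
    return [name for name in watch_list if name.encode("ascii") in data]


# Hypothetical fragment of an import table.
sample = b"\x00LoadLibraryA\x00CopyFileW\x00"
for name in find_interesting_calls(sample):
    print("[+] Found an Interesting call to: " + name)
```

Substring matching means `CopyFile` also matches `CopyFileW` and `CopyFileExA`, which is convenient for catching the ANSI and wide variants with one entry.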
  • 17. Advantages on the researcher's side: Malware writers are usually "lazy", so they tend to reuse chunks of earlier code. This makes it easier to trace a sample back to a previous family based on the commonalities.
  • 18. Our methods: Roughly, our methods consist of: (1) Feature Selection (Ranking/Pruning); (2) Supervised Classification; (3) Unsupervised Classification. Items 2 and 3 can also be combined into a method known as "Semi-Supervised Classification".
  • 19. Information Gain: [Zhang et al., 2007, Altaher et al., 2011, Singhal and Raul, 2012] use the following formula for applying IG to malware. The amount by which the entropy of X decreases, reflecting the additional information about X provided by Y, is called information gain, given by $IG(X \mid Y) = H(X) - H(X \mid Y)$. [Singhal and Raul, 2012] introduced the following adjustment to "correct out" error in the results: $IG(X) = IG(X) \pm \frac{\sum_{i=0}^{n} IG(X_i)}{n}$
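The definition IG(X|Y) = H(X) − H(X|Y) can be checked numerically on a toy dataset. The sketch below assumes X is the class label and Y is a single feature, both given as paired lists, and uses base-2 logarithms; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values) -> float:
    """Shannon entropy H of a list of discrete values, in bits."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(x, y) -> float:
    """IG(X|Y) = H(X) - H(X|Y) for paired observations x[i], y[i]."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    # H(X|Y) is the entropy of X within each value of Y, weighted by P(Y = y).
    h_conditional = sum(len(g) / len(x) * entropy(g) for g in groups.values())
    return entropy(x) - h_conditional


labels = ["malware", "malware", "benign", "benign"]
print(information_gain(labels, [1, 1, 0, 0]))  # feature that separates the classes
print(information_gain(labels, [1, 0, 1, 0]))  # feature unrelated to the class
```

A feature that splits the classes perfectly recovers the full entropy of the labels, while a feature that is independent of the class gains nothing, which is what makes IG usable as a ranking score.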
  • 20. Information Gain (cont'd): From [Jiang et al., 2011]: $IG(t) = \sum_{c \in \{c_i, \bar{c}_i\}} \sum_{t' \in \{t, \bar{t}\}} P(t', c) \log \frac{P(t', c)}{P(t')\,P(c)}$
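The double sum over feature presence and class can be turned into a ranking routine, which is the feature selection step named earlier in the talk. This is a sketch under assumptions: probabilities are estimated as simple frequencies, the logarithm is base 2 (the slide does not give the base), terms with zero joint probability are skipped, and the names `ig_term` and `rank_features` are hypothetical.

```python
import math


def ig_term(present, is_malware) -> float:
    """IG(t): sum over t' in {t, not t} and c in {c, not c} of P(t',c) log(P(t',c) / (P(t')P(c)))."""
    n = len(present)
    total = 0.0
    for t_val in (True, False):
        for c_val in (True, False):
            p_tc = sum(1 for p, c in zip(present, is_malware) if p == t_val and c == c_val) / n
            p_t = sum(1 for p in present if p == t_val) / n
            p_c = sum(1 for c in is_malware if c == c_val) / n
            if p_tc > 0:  # convention: a zero-probability term contributes nothing
                total += p_tc * math.log2(p_tc / (p_t * p_c))
    return total


def rank_features(samples, labels):
    """Score every feature seen in the samples and sort by descending IG."""
    features = sorted(set().union(*samples))
    scores = {f: ig_term([f in s for s in samples], labels) for f in features}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical corpus: each sample is the set of API names it imports.
samples = [
    {"IsDebuggerPresent", "GetProcAddress"},
    {"IsDebuggerPresent", "GetProcAddress"},
    {"GetProcAddress"},
    {"GetProcAddress"},
]
labels = [True, True, False, False]  # True means malware
for name, score in rank_features(samples, labels):
    print(name, score)
```

In this toy corpus `GetProcAddress` appears in every sample and so carries no information about the class; pruning would drop it first.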
  • 21. For research purposes, the following issues keep coming up: There is no standard dataset, unlike in the Intrusion Detection System (IDS) area. Malware samples change quickly, so will the datasets used in the experiments be questioned? As a last resort, stick to the existing database but try to stay independent of any specific malware family, so that the method can work with new, incoming malware.
  • 31. Table: Differences between clustering and classification
      Classification                | Clustering
      Deals with known data         | Deals with unknown data
      Supervised learning           | Unsupervised learning
      Popular algorithms include:   | Popular algorithms include:
      Random Forest                 | K-means
      Neural Networks               | Fuzzy C
      k-Nearest Neighbor            | Gaussian
      Decision Trees                |
  • 32. Classification (supervised) is chosen to deal with the known but incomplete corpus. Clustering (unsupervised) is chosen to deal with new inputs.
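The difference between the two approaches shows up clearly in code: classification needs labels for training, while clustering only sees the feature vectors. The sketch below uses pure-Python versions of k-Nearest Neighbor and k-means on toy binary feature vectors. It is an illustration under assumptions (Hamming distance for k-NN, the first k points as initial centroids for k-means), not the Weka or Octave/Matlab setup used in the talk, and all names and data are hypothetical.

```python
from collections import Counter


def hamming(a, b) -> int:
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, labels, query, k=3):
    """Supervised: majority label among the k training vectors closest to query."""
    order = sorted(range(len(train)), key=lambda i: hamming(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def kmeans(points, k, iterations=10):
    """Unsupervised: plain k-means, seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


train = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
labels = ["malware", "malware", "benign", "benign"]
print(knn_predict(train, labels, [1, 1, 0], k=3))

unlabeled = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
print(kmeans(unlabeled, k=2))
```

The cluster numbers returned by `kmeans` are arbitrary identifiers; mapping a cluster to a malware family still requires an analyst or a labelled reference sample.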
  • 33. Some results: We managed to detect several malware samples by using the existing API traces and other features (bot commands, file/registry deletion). Newer, more sophisticated malware such as Stuxnet/Duqu is very platform specific (it attacks SCADA systems), so detecting it needs further study. Perhaps the most obvious indicator is the use of XOR'ed communication channels, if any.
  • 34. The flow (diagram). Stages shown: Feature Selection, Feature Categorization, Clustering, Classification, Visualization. Tools shown: Weka, Octave/Matlab, scipy.
  • 35. References:
      Altaher, A., Ramadass, S., and Ali, A. (2011). Computer Virus Detection Using Features Ranking and Machine Learning. Australian Journal of Basic and Applied Sciences, 5(9):1482-1486.
      Jiang, Q., Zhao, X., and Huang, K. (2011). A feature selection method for malware detection. In 2011 IEEE International Conference on Information and Automation (ICIA), pages 890-895.
      Pietrek, M. (1994). Peering Inside the PE: A Tour of the Win32 Portable Executable File Format. http://msdn.microsoft.com/en-us/library/ms809762.aspx.
      Singhal, P. and Raul, N. (2012). Malware detection module using machine learning algorithms to assist in centralized security in enterprise networks. International Journal of Network Security & Its Applications, 4.
      Zhang, B., Yin, J., Hao, J., Wang, S., and Zhang, D. (2007). New malicious code detection based on n-gram analysis and rough set theory. Pages 626-633. Springer-Verlag, Berlin, Heidelberg.