Introduction                      Assembler as a natural language          Anomaly detection




                 Detecting abnormal executable files using
                            binary code mining

                                       Rechkov Anton

                               TU Berlin Germany & TTI SFU Russia


                                      21st March 2012




               Rechkov Anton          Lomonosov Scholarship Report   21st March 2012    1 / 31




Malware evolution

       Encrypted
       The malware body is stored in encrypted form.


       Oligomorphic
       The decryptor is generated by randomly selecting each of its pieces
       from several predefined alternatives.


       Polymorphic
       Each sample is generated by encrypting the malware body and modifying
       the decryptor on every replication.


       Metamorphic
       The entire virus body is rewritten by an obfuscation engine.





Modern detection techniques


       Signature analysis
       Searching the code for a predetermined pattern.


       Emulation
       Unpacking and analysing malware code by emulating it, followed by
       signature analysis.


       Behavioral analysis
       Analysis of the flow graph of function calls.
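
Signature analysis can be illustrated with a minimal sketch: scanning a byte buffer for a fixed pattern in which some positions are wildcards. The function name `find_signature`, the signature itself, and the sample bytes are hypothetical and chosen only for illustration; real scanners use far richer signature formats.

```python
from typing import List, Optional, Sequence

# A signature is a sequence of byte values; None marks a wildcard position.
Signature = Sequence[Optional[int]]


def find_signature(data: bytes, signature: Signature) -> List[int]:
    """Return every offset in `data` at which `signature` matches."""
    hits = []
    n = len(signature)
    for offset in range(len(data) - n + 1):
        window = data[offset:offset + n]
        if all(s is None or s == b for s, b in zip(signature, window)):
            hits.append(offset)
    return hits


# Hypothetical signature: mov eax, imm32 (B8 xx xx xx xx) followed by ret (C3).
SIGNATURE = [0xB8, None, None, None, None, 0xC3]

sample = bytes([0x90, 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3, 0x90])
print(find_signature(sample, SIGNATURE))
```

A fixed pattern of this kind is what encrypted, polymorphic, and metamorphic code is designed to evade, which motivates the statistical approach of the later slides.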








Code modification



       Obfuscation
       Transformation of executable program code that preserves functionality but
       complicates analysis and understanding of its algorithms.


       Deobfuscation
       Resolving irrelevant code using
                  Algebraic models
                  Formal grammars








Outline



       1       Assembler as a natural language
                 Binary code mining
                 Natural language processing
                 Stochastic models

       2       Anomaly detection






Binary code mining


Table of Contents


       1        Assembler as a natural language
                  Binary code mining
                  Natural language processing
                  Stochastic models

       2        Anomaly detection
                  Preparation
                  Code generator lexemes
                  Anomaly detection by neural networks
                  Anomaly detection by probability model





Structure of a compiler

       [Figure: Common compiler scheme]

       Code generator engine:
                  Machine code generator
                  Optimizers:
                         interprocedural optimization (IPO)
                         profile-guided optimization (PGO)
                         high-level optimizations
                  Mutation code generator / obfuscator





Common code generator features


       High-level optimizations
                  Unique intermediate language
                  Pre-optimization in the intermediate representation


       Code generation
                  Code templates from intermediate to target code
                  Number of instruction types used


       Machine-dependent optimizer
                  Instruction costs





Testing the theory


       Experiment
                  Determine instruction sequences
                  Compile the same source code with different compilers
                  Compare the resulting distributions


       Compilers
          ⇒ MSVC
          ⇒ LLVM
          ⇒ GCC
          ⇒ Intel C++ Compiler
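
The comparison step of the experiment can be sketched as follows, assuming each binary has already been disassembled into a list of mnemonics. The mnemonic lists are invented for illustration and are not output of the compilers named above. Total variation distance is used only as one possible measure; the slides do not say how the distributions were compared.

```python
from collections import Counter
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]


def ngram_distribution(mnemonics: List[str], n: int) -> Dict[Ngram, float]:
    """Relative frequency of every run of n consecutive mnemonics."""
    counts = Counter(
        tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)
    )
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def total_variation(p: Dict[Ngram, float], q: Dict[Ngram, float]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical mnemonic streams standing in for two builds of the same source.
build_a = ["push", "mov", "sub", "mov", "add", "pop", "ret"]
build_b = ["push", "mov", "sub", "lea", "add", "pop", "ret"]

dist_a = ngram_distribution(build_a, 2)
dist_b = ngram_distribution(build_b, 2)
print(round(total_variation(dist_a, dist_b), 3))
```

A distance near zero would suggest that two code generators emit the same instruction sequences with similar frequencies; larger values indicate distinguishable "dialects".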





XTEA distribution test
       [Figure: Frequency of words in the binary, one panel per compiler:
       (a) LLVM, (b) MSVC, (c) Intel C++, (d) GCC]



       [Figure: Mean distribution of optimized binaries]






Natural language processing




Text Mining


       Language detection


       Author detection


       Text Classification


       Document clustering






Stochastic models




Neural networks


       Advantages
           + effective with a small number of training vectors
           + assesses the proximity of all samples


       Disadvantages
               - the model must be predetermined
                         manual word definition
                         manual analysis of redundant elements
                         retraining limitations






Probability model


       Advantages
           + words are defined automatically
           + training on positive vectors only
           + uniform training procedure (flexible retraining)


       Disadvantages
               - large sample set needed for training
               - errors in determining the distribution
               - computational complexity








Outline



       1       Assembler as a natural language

       2       Anomaly detection
                 Preparation
                 Code generator lexemes
                 Anomaly detection by neural networks
                 Anomaly detection by probability model






Preparation




Collecting statistical samples



       Python
                  Building the list of maximal repeated sequences
                  Disassembling
                  Searching strings


       MATLAB
                  Stochastic models






Code generator lexemes




From disassembly to lexemes




       Lexeme
                  sequences of 3 to 6 instructions
                  unknown bytes are ignored
                  maximal repeated sequences
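
A minimal sketch of lexeme extraction under the constraints listed above: sequences of 3 to 6 instructions that occur more than once, keeping only maximal repeats. The mnemonic stream is hypothetical, and brute-force counting is used here for brevity; it is not necessarily the method used in the talk. "Maximal" is interpreted as: not contained in a longer repeated sequence that occurs equally often.

```python
from collections import Counter
from typing import Dict, List, Tuple

Lexeme = Tuple[str, ...]


def repeated_sequences(
    mnemonics: List[str], min_len: int = 3, max_len: int = 6
) -> Dict[Lexeme, int]:
    """Count every sequence of min_len..max_len instructions seen at least twice."""
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(mnemonics) - n + 1):
            counts[tuple(mnemonics[i:i + n])] += 1
    return {seq: c for seq, c in counts.items() if c >= 2}


def contains_run(long: Lexeme, short: Lexeme) -> bool:
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_lexemes(repeats: Dict[Lexeme, int]) -> Dict[Lexeme, int]:
    """Drop a repeat when a longer repeat contains it and occurs as often."""
    return {
        seq: c
        for seq, c in repeats.items()
        if not any(
            len(other) > len(seq)
            and repeats[other] == c
            and contains_run(other, seq)
            for other in repeats
        )
    }


# Hypothetical disassembly in which the same four-instruction run appears twice.
stream = [
    "push", "mov", "sub", "mov", "call",
    "xor", "ret",
    "push", "mov", "sub", "mov", "test",
]

lexemes = maximal_lexemes(repeated_sequences(stream))
for seq, count in lexemes.items():
    print(count, " ".join(seq))
```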






 Lexeme analysis


       [Figure: Suffix tree example]


Suffix tree:
       Memory-efficient
       String searching faster than O(N^2)
       Fast assessment of maximal repeats in strings
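
As a rough illustration of repeat detection with a suffix structure, the sketch below finds the longest repeated run of tokens using a sorted list of suffixes (a naive suffix array) rather than a suffix tree. This substitution is an assumption made to keep the example short: the naive construction shown here has none of the memory or speed advantages of a real suffix tree. The token stream is hypothetical.

```python
from typing import List, Sequence, Tuple


def longest_repeat(tokens: Sequence[str]) -> Tuple[str, ...]:
    """Longest run of tokens that occurs at least twice (overlaps allowed)."""
    # Naive suffix array: suffix start positions sorted by suffix content.
    order: List[int] = sorted(range(len(tokens)), key=lambda i: tuple(tokens[i:]))
    best_start, best_len = 0, 0
    # The longest repeated run is the longest common prefix of two suffixes
    # that are adjacent in sorted order, so only neighbours are compared.
    for a, b in zip(order, order[1:]):
        length = 0
        while (
            a + length < len(tokens)
            and b + length < len(tokens)
            and tokens[a + length] == tokens[b + length]
        ):
            length += 1
        if length > best_len:
            best_start, best_len = a, length
    return tuple(tokens[best_start:best_start + best_len])


# Hypothetical mnemonic stream with one repeated four-instruction run.
stream = ["push", "mov", "sub", "mov", "call", "ret",
          "push", "mov", "sub", "mov", "test", "ret"]
print(longest_repeat(stream))
```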






Anomaly detection by neural networks




Radial basis networks



       [Figure: Neural net architecture]

      no need to choose the number of hidden layers
      no pathological convergence
      fast convergence through a combination of learning algorithms
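
The structure of a radial basis network can be sketched in a few lines: one hidden layer of Gaussian units and a linear output layer. In this sketch the centers are fixed by hand and only the output weights are trained, using the delta rule. The feature vectors, centers, width, and training settings are all hypothetical, and this is not the architecture or training procedure from the talk.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian(x: Vector, center: Vector, width: float) -> float:
    """Gaussian radial basis function of the distance between x and center."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))


class RBFNetwork:
    """A single hidden layer of Gaussian units feeding a linear output layer."""

    def __init__(self, centers: List[Vector], width: float, n_classes: int):
        self.centers = centers
        self.width = width
        # One row per class: a weight for each hidden unit plus a bias.
        self.weights = [[0.0] * (len(centers) + 1) for _ in range(n_classes)]

    def hidden(self, x: Vector) -> List[float]:
        """Hidden-layer activations with a constant 1.0 appended for the bias."""
        return [gaussian(x, c, self.width) for c in self.centers] + [1.0]

    def outputs(self, h: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, h)) for row in self.weights]

    def train(self, samples: List[Vector], labels: List[int],
              epochs: int = 200, rate: float = 0.1) -> None:
        """Fit only the output weights with the delta rule; centers stay fixed."""
        for _ in range(epochs):
            for x, label in zip(samples, labels):
                h = self.hidden(x)
                out = self.outputs(h)
                for k, row in enumerate(self.weights):
                    error = (1.0 if k == label else 0.0) - out[k]
                    for j, activation in enumerate(h):
                        row[j] += rate * error * activation

    def predict(self, x: Vector) -> int:
        out = self.outputs(self.hidden(x))
        return out.index(max(out))


# Hypothetical two-dimensional feature vectors (e.g. two lexeme frequencies)
# for binaries produced by two compilers, labelled 0 and 1.
samples = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels = [0, 0, 1, 1]

net = RBFNetwork(centers=[(0.85, 0.15), (0.15, 0.85)], width=0.3, n_classes=2)
net.train(samples, labels)
print([net.predict(s) for s in samples])
```

Because there is exactly one hidden layer, the only structural choices are the number of Gaussian units and their centers; in practice the centers are often placed by clustering the training vectors before the output weights are fitted.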






Compiler detection

       [Figure: Compiler detection testing]






Anomaly detection by probability model




Multivariate Gamma

   Using a set of bi- and 3-variate Gamma distributions:
               A Gamma distribution is assumed
               Sample proximity
               Fast training

   [Figure: Empirical and theoretical PDF of an element. Curves: Gamma PDF and
   Empirical PDF. Axes: X from −0.02 to 0.12, PDF from 0 to 40.]
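
To illustrate how a Gamma density can be fitted to an empirical element distribution, the sketch below estimates shape and scale by the method of moments for a single variable. This is a univariate simplification: the slide refers to bi- and 3-variate Gamma distributions, and both the fitting method and the sample values here are assumptions.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def fit_gamma_moments(values: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimate of (shape k, scale theta) for positive data."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and non-zero variance")
    return mean * mean / var, var / mean


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Gamma probability density; returns 0.0 for non-positive x."""
    if x <= 0:
        return 0.0
    log_pdf = (
        (shape - 1.0) * math.log(x)
        - x / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.exp(log_pdf)


# Hypothetical relative frequencies of one lexeme across training binaries.
frequencies = [0.018, 0.022, 0.025, 0.030, 0.034, 0.041, 0.047]

k, theta = fit_gamma_moments(frequencies)
print(f"shape={k:.2f} scale={theta:.4f} pdf(0.03)={gamma_pdf(0.03, k, theta):.1f}")
```

The fitted density can then be compared with a histogram of the same values, which is the kind of comparison the figure on this slide shows.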






Probability model testing

       Error graphs of compiler probabilities based on the coefficient of the
       minimal value: $P^{p}_{i} = P^{\min}_{i} \cdot 10^{coef}$

       [Figure: two panels plotting error (0 to 1) against the coefficient for the minimal value.
       Left panel (coefficient 0 to 10): false positive GCC O0; false negative Clang, Intel, GCC O2, MS.
       Right panel (coefficient 0 to 20): false positive MS; false negative LLVM.]






Probability model testing


       Problem of existing zero elements

       [Figure: two panels plotting error (0 to 1) against the coefficient for the minimal value (0 to 10).
       Both panels share the legend: false positive GCC O2; false negative Clang, Intel, GCC O0, MS.]






Conclusion


                  Proposed a connection between natural language and
                  assembler
                  Developed algorithms for lexical analysis of assembler
                  language
                  Developed experimental stochastic models:
                         Based on neural networks
                         Based on a probability model
                  Implemented lexical analysis of assembler language
                  Approximate false positive errors of compiler detection:
                         27%
                         10-15%






                                           Questions?




APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 

Último (20)

BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 

Rechkov. Lomonosov Report

  • 1. Detecting abnormal executable files using binary code mining. Rechkov Anton, TU Berlin (Germany) & TTI SFU (Russia). Lomonosov Scholarship Report, 21st March 2012. Sections: Introduction; Assembler as a native language; Anomalies detection.
  • 2. Malware evolution. Ciphered: the malware code of the virus is encrypted. Oligomorphic: a decryptor is generated by randomly selecting each piece of it from several predefined alternatives. Polymorphic: a sample is generated by encrypting the malware body and modifying the decryptor at each replication. Metamorphic: the whole virus body is reprogrammed by an obfuscation engine.
  • 3. Modern detection techniques. Signature analysis: searching for a predetermined pattern in the code. Emulation: unpacking and analysing the malware code by emulating it, followed by signature analysis. Behavioral analysis: analysis of the flow graph of functions.
  • 4. Code modification. Obfuscation: a transformation of executable program code that preserves functionality but complicates analysis and understanding of the algorithms. Deobfuscation: resolving irrelevant code using algebraic models; formal grammars.
  • 6. Outline. 1 Assembler as a native language: Binary code mining; Native language processing; Stochastic models. 2 Anomalies detection.
  • 7. Table of Contents (current subsection: Binary code mining). 1 Assembler as a native language: Binary code mining; Native language processing; Stochastic models. 2 Anomalies detection: Preparation; Code generator lexemes; Anomalies detection by neural networks; Anomalies detection by probability model.
  • 8. Structure of compiler. Common compiler scheme. Code generator engine: machine code generator; optimizers (interprocedural optimization (IPO), profile-guided optimization (PGO), high-level optimizations); mutation code generator / obfuscator.
  • 9. Common code generator features. High-level optimizations: unique intermediate language; preoptimizing in the intermediate representation. Code generation: code templates from intermediate to target; number of used instruction types. Machine-dependent optimizer: instruction costs.
  • 12. Validating the theory. Experiment: determine instruction sequences; compile source code with the compilers; compare distributions. Compilers: MSVC, LLVM, GCC, Intel C++ Compiler.
  • 14. XTEA distribution test. Frequency of words in the binary. [Figure: four panels, (a) LLVM, (b) MSVC, (c) Intel C++, (d) GCC]
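The comparison of word-frequency distributions between compilers can be illustrated with a small sketch. The slides do not say which measure was used to compare the distributions, so the total variation distance below is an assumption chosen for simplicity, and the function names `relative_frequencies` and `total_variation` are hypothetical.

```python
from collections import Counter


def relative_frequencies(lexemes):
    """Map each lexeme to its share of all lexemes found in one binary."""
    counts = Counter(lexemes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lex: c / total for lex, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two frequency maps.

    0.0 means identical distributions, 1.0 means no lexeme in common.
    This measure is an assumption, not the one used in the slides.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy lexeme streams standing in for two builds of the same source.
build_a = relative_frequencies(["a", "a", "b", "c"])
build_b = relative_frequencies(["a", "b", "b", "d"])
distance = total_variation(build_a, build_b)
```

A small distance between two binaries would suggest the same code generator; a large one would suggest different generators, under the assumption that lexeme frequencies are characteristic of a compiler.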
  • 15. Optimize binary’s mean distribution
  • 16. Section 1, Assembler as a native language. Subsection: Native language processing.
  • 17. Text mining. Language detection; author detection; text classification; document clustering.
  • 18. Section 1, Assembler as a native language. Subsection: Stochastic models.
  • 19. Neural networks. Advantages: effective with a small number of training vectors; assesses the proximity of all samples. Disadvantages: predetermined model; manual word definition; manual analysis of excessive elements; retraining limitations.
  • 20. Probability model. Advantages: self-sufficient word definition; training on positive vectors only; unified training (flexible retraining). Disadvantages: a large sample set is needed for training; errors in determining the distribution; computational complexity.
  • 21. Outline. 1 Assembler as a native language. 2 Anomalies detection: Preparation; Code generator lexemes; Anomalies detection by neural networks; Anomalies detection by probability model.
  • 22. Section 2, Anomalies detection. Subsection: Preparation.
  • 23. Collect statistics samples. Python: detection of the list of maximal repeated sequences; disassembling; searching strings. Matlab: stochastic models.
  • 26. Section 2, Anomalies detection. Subsection: Code generator lexemes.
  • 27. From disassembling to lexemes. Lexeme: a sequence of 3 to 6 instructions; unknown bytes are ignored; maximal repeated sequences are used.
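The lexeme definition above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes the disassembler output is already a list of mnemonic strings, that undecodable bytes appear as `None` (the `UNKNOWN` marker is hypothetical), and that "repeated" means occurring at least twice.

```python
from collections import Counter

UNKNOWN = None  # hypothetical marker for bytes the disassembler could not decode


def extract_lexemes(mnemonics, min_len=3, max_len=6):
    """Count every run of min_len..max_len consecutive instructions.

    No counted sequence spans an unknown byte: the stream is split
    into decoded runs first, and windows are taken inside each run.
    """
    counts = Counter()
    run = []  # current stretch of successfully decoded instructions
    for item in list(mnemonics) + [UNKNOWN]:  # trailing sentinel flushes the last run
        if item is not UNKNOWN:
            run.append(item)
            continue
        for n in range(min_len, max_len + 1):
            for start in range(len(run) - n + 1):
                counts[tuple(run[start:start + n])] += 1
        run = []
    return counts


def repeated_lexemes(counts, min_count=2):
    """Keep only the sequences that occur at least min_count times."""
    return {seq: c for seq, c in counts.items() if c >= min_count}


# Toy instruction stream with one undecodable byte in the middle.
stream = ["push", "mov", "sub", UNKNOWN, "push", "mov", "sub", "call"]
counts = extract_lexemes(stream)
repeats = repeated_lexemes(counts)
```

Here only `("push", "mov", "sub")` survives the filter, because it is the only 3-to-6 instruction sequence that appears on both sides of the unknown byte.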
  • 28. Lexemes analysis. Suffix tree example. Suffix tree: economical in memory; string searching faster than O(N^2); fast assessment of maximal repeats in strings.
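To show what "maximal repeats" means in practice, the sketch below finds the longest repeated instruction sequence. It deliberately swaps the suffix tree for a naive sorted list of suffixes (a simple suffix array), which is far slower than a real suffix tree but returns the same answer on small inputs. The function name `longest_repeat` is hypothetical.

```python
def longest_repeat(tokens):
    """Return the longest sequence that occurs at least twice in tokens.

    Naive suffix-array stand-in for a suffix tree: sort all suffixes,
    then the longest repeat is the longest common prefix of some pair
    of neighbouring suffixes. Occurrences are allowed to overlap.
    """
    tokens = list(tokens)
    # Sort suffix start positions by the content of the suffix.
    order = sorted(range(len(tokens)), key=lambda i: tokens[i:])
    best = []
    for a, b in zip(order, order[1:]):
        # Length of the common prefix of two neighbouring suffixes.
        k = 0
        while (a + k < len(tokens) and b + k < len(tokens)
               and tokens[a + k] == tokens[b + k]):
            k += 1
        if k > len(best):
            best = tokens[a:a + k]
    return best


stream = ["push", "mov", "call", "pop", "push", "mov", "call", "ret"]
repeat = longest_repeat(stream)
```

For real binaries the quadratic cost of comparing whole suffixes would be prohibitive, which is the motivation for the suffix tree named on the slide.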
  • 29. Section 2, Anomalies detection. Subsection: Anomalies detection by neural networks.
  • 30. Radial basis networks. Neural net architecture: no need to choose the number of hidden layers; no pathological convergence; fast convergence through a combination of learning algorithms.
  • 31. Detecting compilers. Compiler detection testing.
  • 32. Section 2, Anomalies detection. Subsection: Anomalies detection by probability model.
  • 33. Multivariate Gamma. Using a set of bi- and 3-variate Gamma distributions: suggest a Gamma distribution; sample proximity PDF; fast training. [Figure: empirical and theoretical PDF of an element; curves: Gamma PDF, empirical PDF; x-axis: X]
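Fitting a Gamma density to the observed frequency of one element can be sketched as follows. This is a univariate simplification of the bi- and 3-variate model named on the slide, and the method-of-moments estimator is an assumption: the slides do not state how the parameters were fitted. The names `fit_gamma` and `gamma_pdf` are hypothetical.

```python
import math


def fit_gamma(samples):
    """Estimate Gamma shape k and scale theta by the method of moments.

    Uses mean = k * theta and variance = k * theta**2.
    Requires at least two distinct positive samples.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean * mean / var, var / mean  # (k, theta)


def gamma_pdf(x, k, theta):
    """Gamma density at x; zero outside the support x > 0."""
    if x <= 0:
        return 0.0
    # Work in log space to avoid overflow for large shape values.
    log_pdf = ((k - 1) * math.log(x) - x / theta
               - math.lgamma(k) - k * math.log(theta))
    return math.exp(log_pdf)


# Toy relative frequencies of one lexeme across three training binaries.
k, theta = fit_gamma([0.02, 0.04, 0.06])
density_at_mean = gamma_pdf(0.04, k, theta)
```

A sample whose element frequency falls where the fitted density is very low would then count as far from the training set for that compiler.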
  • 34. Probability model testing. Error graphs of compiler probabilities based on the coefficient of the minimal value, $P_i^{p} = P_i^{\min} \cdot 10^{\mathrm{coef}}$. [Figure: two panels, error (0 to 1) versus coeff for min value (0 to 10 and 0 to 20); legend entries across the two panels: false positive GCC O0, false positive MS, false negative Clang, false negative LLVM, false negative Intel, false negative GCC O2, false negative MS]
  • 35. Probability model testing. Problem of existing zero elements. [Figure: two panels, error (0 to 1) versus coeff for min value (0 to 10); legend in both panels: false positive GCC O2, false negative Clang, false negative Intel, false negative GCC O0, false negative MS]
  • 36. Conclusion. Proposed a connection between native language and assembler. Developed algorithms for lexical analysis of assembler language. Developed experimental stochastic models: based on neural networks; based on a probability model. Implemented lexical analysis of assembler language. Approximate false positive errors of compiler detection: 27%; 10-15%.
  • 37. Questions?