SlideShare a Scribd company logo
1 of 52
Download to read offline
Platform-independent static
binary code analysis using a meta-
        assembly language
    Thomas Dullien, Sebastian Porst
          zynamics GmbH


          CanSecWest 2009
Overview

The REIL Language


  Abstract Interpretation


    MonoREIL


       Results
                            2
Motivation
• Bugs are getting harder to find
• Defensive side (most notably Microsoft) has
  invested a lot of money in a „bugocide“
• Concerted effort: Lots of manual code auditing
  aided by static analysis tools
• Phoenix RDK: Includes „lattice based“ analysis
  framework to allow pluggable abstract
  interpretation in the compiler

                                                   3
Motivation
• Offense needs automated tools if they want to
  avoid being sidelined
• Offensive static analysis: Depth vs. Breadth
• Offense has no source code, no Phoenix RDK,
  and should not depend on Microsoft
• We want a static analysis framework for
  offensive purposes


                                                  4
Overview

The REIL Language


  Abstract Interpretation


    MonoREIL


       Results
                            5
REIL
• Reverse Engineering Intermediate Language
• Platform-Independent meta-assembly language
• Specifically made for static code analysis of
  binary files
• Can be recovered from arbitrary native
  assembly code
  – Supported so far: x86, PowerPC, ARM


                                                  6
Advantages of REIL
•   Very small instruction set (17 instructions)
•   Instructions are very simple
•   Operands are very simple
•   Free of side-effects
•   Analysis algorithms can be written in a
    platform-independent way
    – Great for security researchers working on more
      than one platform

                                                       7
Creation of REIL code
• Input: Disassembled Function
  – x86, ARM, PowerPC, potentially others
• Each native assembly instruction is translated to
  one or more REIL instructions
• Output: The original function in REIL code




                                                      8
Example




          9
Design Criteria
• Simplicity
• Small number of instructions
  – Simplifies abstract interpretation (more later)
• Explicit flag modeling
  – Simplifies reasoning about control-flow
• Explicit load and store instructions
• No side-effects


                                                      10
REIL Instructions
• One Address
  – Source Address * 0x100 + n
  – Easy to map REIL instructions back to input code
• One Mnemonic
• Three Operands
  – Always
• An arbitrary amount of meta-data
  – Nearly unused at this point

                                                       11
REIL Operands
• All operands are typed
  – Can be either registers, literals, or sub-addresses
  – No complex expressions
• All operands have a size
  – 1 byte, 2 bytes, 4 bytes, ...




                                                          12
The REIL Instruction Set
• Arithmetic Instructions
  – ADD, SUB, MUL, DIV, MOD, BSH
• Bitwise Instructions
  – AND, OR, XOR
• Data Transfer Instructions
  – LDM, STM, STR




                                   13
The REIL Instruction Set
• Conditional Instructions
  – BISZ, JCC
• Other Instructions
  – NOP, UNDEF, UNKN


• Instruction set is easily extensible



                                         14
REIL Architecture
• Register Machine
  – Unlimited number of registers t0, t1, ...
  – No explicit stack
• Simulated Memory
  – Infinite storage
  – Automatically assumes endianness of the source
    platform



                                                     15
Limitations of REIL
• Does not support certain instructions (FPU,
  MMX, Ring-0, ...) yet
• Can not handle exceptions in a platform-
  independent way
• Can not handle self-modifying code
• Does not correctly deal with memory selectors



                                                  16
Overview

The REIL Language


  Abstract Interpretation


    MonoREIL


       Results
                            17
Abstract Interpretation
• Theoretical background for most code analysis
• Developed by Patrick and Rhadia Cousot around
  1975-1977
• Formalizes „static abstract reasoning about
  dynamic properties“
• Huh ?
• A lot of the literature is a bit dense for many
  security practitioners

                                                    18
Abstract Interpretation
• We want to make statements about programs
• Example: Possible set of values for variable x at
  a given program point p
• In essence: For each point p, we want to find
  Kp P(States)
• Problem: P(States) is a bit unwieldly
• Problem: Many questions are undecidable
  (where is the w*nker that yells „halting
  problem“) ?
                                                      19
Dealing with unwieldy stuff
• Reason about something simpler:

                       Abstraction
          P(States)                    D
                      Concretisation
          P(States)                    D


• Example: Values vs. Intervals



                                           20
Lattices
• In order for this to work, D must be structurally
  similar to P(States)
• P(States) supports intersection and union
• You can check for inclusion (contains, does not
  contain)
• You have an empty set (bottom) and
  „everything“ (top)


                                                      21
Lattices
• A lattice is something like a generalized
  powerset
• Example lattices: Intervals, Signs, P(Registers ) ,
  mod p




                                                        22
Dealing with halting
• Original program consists of p1 ... pn program
  points
• Each instruction transforms a set of states into a
  different set of states
• p1 ... pn are mappings P(States) P(States)
• Specify p'1  p'n : D D
• This yields us ~ : D n D n
                  p



                                                       23
Dealing with halting
•   We cheat: Let D be finite  Dn      is finite
                    ~
•   Make sure that is monotonous (like this talk)
                    p
•   Begin with initial state I
•   Calculate ~(l )
               p
•   Calculate ~( ~(l ))
               p p
                            ~ n (l ) ~ n 1 (l )
•   Eventually, you reach p          p
•   You are done – read off the results and see if
    your question is answered
                                                     24
Theory vs. practice
• A lot of the academic focus is on proving
  correctness of the transforms
                           pi
           P(States)                 P(States)
                          pi '
                 D                  D
• As practitioner we know that pi is probably not
  fully correctly specified
• We care much more about choosing and
  constructing a D so that we get the results we need
                                                        25
Overview

The REIL Language


  Abstract Interpretation


    MonoREIL


       Results
                            26
MonoREIL
• You want to do static analysis
• You do not want to write a full abstract
  interpretation framework
• We provide one: MonoREIL
• A simple-to-use abstract interpretation
  framework based on REIL



                                             27
What does it do ?
• You give it
  – The control flow graph of a function (2 LOC)
  – A way to walk through the CFG (1 + n LOC)
  – The lattice D (15 + n LOC)
     • Lattice Elements
     • A way to combine lattice elements
  – The initial state (12 + n LOC)
  – Effects of REIL instructions on D (50 + n LOC)


                                                     28
How does it work?
• Fixed-point iteration until final state is found
• Interpretation of result
  – Map results back to original assembly code
• Implementation of MonoREIL already exists
• Usable from Java, ECMAScript, Python, Ruby




                                                     29
Overview

The REIL Language


  Abstract Interpretation


    MonoREIL


       Results
                            30
Register Tracking
• First Example: Simple
• Question: What are the effects of a register on
  other instructions?
• Useful for following register values




                                                    31
Register Tracking
• Demo




                             32
Register Tracking
• Lattice: For each instruction, set of influenced
  registers, combine with union
• Initial State
  – Empty (nearly) everywhere
  – Start instruction: { tracked register }
• Transformations for MNEM op1, op2, op3
  – If op1 or op2 are tracked  op3 is tracked too
  – Otherwise: op3 is removed from set

                                                     33
Negative indexing
• Second Example: More complicated
• Question: Is this function indexing into an array
  with a negative value ?
• This gets a bit more involved




                                                      34
Negative indexing
• Simple intervals alone do not help us much
• How would you model a situation where
  – A function gets a structure pointer as argument
  – The function retrieves a pointer to an array from an
    array of pointers in the structure
  – The function then indexes negatively into this array
• Uh. Ok.


                                                           35
Abstract locations
• For each instruction, what are the contents of the
  registers ? Let‘s slowly build complexity:
• If eax contains arg_4, how could this be modelled ?
   – eax = *(esp.in + 8)
• If eax contains arg_4 + 4 ?
   – eax = *(esp.in + 8) + 4
• If eax can contain arg_4+4, arg_4+8, arg_4+16,
  arg_4 + 20 ?
   – eax = *(esp.in + 8) + [4, 20]

                                                        36
Abstract locations
• If eax can contain arg_4+4, arg_8+16 ?
  – eax = *(esp.in + [8,12]) + [4,16]
• If eax can contain any element from
  – arg_4mem[0] to arg_4mem[10], incremented
    once, how do we model this ?
  – eax = *(*(esp.in + [8,8]) + [4, 44]) + [1,1]
• OK. An abstract location is a base value and a
  list of intervals, each denoting memory
  dereferences (except the last)
                                                   37
Range Tracking

eax.in + [a, b] + [0, 0]




                 eax.in + a   eax.in + b




                                           38
Range Tracking

             eax + [a, b] + [c, d] + [0, 0]

                        eax + a                   eax + b




[eax+a]+c   [eax+a]+d        [eax+a+4]+c      [eax+a+4]+d   [eax+b]+c   [eax+b]+d
                                                                                    39
Range Tracking
• Lattice: For each instruction, a map:
            Register  Aloc Aloc

• Initial State
  – Empty (nearly) everywhere
  – Start instruction: { reg -> reg.in + [0,0] }
• Transformations
  – Complicated. Next slide.


                                                   40
Range Tracking
• Transformations
  – ADD/SUB are simple: Operate on last intervals
  – STM op1, , op3
     • If op1 or op3 not in our input map M skip
     • Otherwise, M[ M[op3] ] = op1
  – LDM op1, , op3
     • If op1 or op3 is not in our input map M skip
     • M[ op3 ] = M[ op1 ]
  – Others: Case-specific hacks

                                                      41
Range Tracking
• Where is the meat ?
• Real world example: Find negative array
  indexing




                                            42
MS08-67
• Function takes in argument to a buffer
• Function performs complex pointer arithmetic
• Attacker can make this pointer arithmetic go
  bad
• The pointer to the target buffer of a wcscpy will
  be decremented beyond the beginning of the
  buffer


                                                      43
MS08-67
• Michael Howard‘s Blog:
  – “In my opinion, hand reviewing this code and
    successfully finding this bug would require a great deal
    of skill and luck. So what about tools? It's very difficult
    to design an algorithm which can analyze C or C++ code
    for these sorts of errors. The possible variable states
    grows very, very quickly. It's even more difficult to take
    such algorithms and scale them to non-trivial code
    bases. This is made more complex as the function
    accepts a highly variable argument, it's not like the
    argument is the value 1, 2 or 3! Our present toolset
    does not catch this bug.”

                                                                  44
MS08-67
• Michael is correct
  – He has to defend all of Windows
  – His „regular“ developers have to live with the
    results of the automated tools
  – His computational costs for an analysis are gigantic
  – His developers have low tolerance for false positives




                                                            45
MS08-67
• Attackers might have it easier
  – They usually have a much smaller target
  – They are highly motivated: I will tolerate 100 false
    positives for each „real“ bug
     • I can work through 20-50 a day
     • A week for a bug is still worth it
  – False positive reduction is nice, but if I have to read
    100 functions instead of 20000, I have already
    gained something

                                                              46
MS08-67
• Demo




                   47
Limitations and assumptions
• Limitations and assumptions
  – The presented analysis does not deal with aliasing
  – We make no claims about soundness
  – We do not use conditional control-flow information
  – We are still wrestling with calling convention issues
  – The important bit is not our analysis itself – the
    important part is MonoREIL
  – Analysis algorithms will improve over time – laying
    the foundations was the boring part

                                                            48
Status
• Abstract interpretation framework available in
  BinNavi
• Currently x86
• In April (two weeks !): PPC and ARM
  – Was only a matter of adding REIL translators
• Some example analyses:
  – Register tracking (lame, but useful !)
  – Negative array indexing (less lame, also useful !)

                                                         49
Outlook
• Deobfuscation through optimizing REIL
• More precise and better static analysis
• Register tracking etc. release in April (two
  weeks !)
• Negative array indexing etc. release in October
• Attempting to encourage others to build their
  own lattices


                                                    50
Related work ?
• Julien Vanegue / ERESI team (EKOPARTY)
• Tyler Durden‘s Phrack 64 article
• Principles of Program Analysis
  (Nielson/Nielson/Hankin)
• University of Wisconsin WISA project
• Possibly related: GrammaTech CodeSurfer x86



                                                51
Questions ?



 ( Good Bye, Canada )




                        52

More Related Content

What's hot

Data types and Operators Continued
Data types and Operators ContinuedData types and Operators Continued
Data types and Operators ContinuedMohamed Samy
 
Preparing Java 7 Certifications
Preparing Java 7 CertificationsPreparing Java 7 Certifications
Preparing Java 7 CertificationsGiacomo Veneri
 
EXTENT-2016: Industry Practices of Advanced Program Analysis
EXTENT-2016: Industry Practices of Advanced Program AnalysisEXTENT-2016: Industry Practices of Advanced Program Analysis
EXTENT-2016: Industry Practices of Advanced Program AnalysisIosif Itkin
 
Lecture 11 semantic analysis 2
Lecture 11 semantic analysis 2Lecture 11 semantic analysis 2
Lecture 11 semantic analysis 2Iffat Anjum
 
45 aop-programming
45 aop-programming45 aop-programming
45 aop-programmingdaotuan85
 
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for visionP04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for visionzukun
 
Bert pre_training_of_deep_bidirectional_transformers_for_language_understanding
Bert  pre_training_of_deep_bidirectional_transformers_for_language_understandingBert  pre_training_of_deep_bidirectional_transformers_for_language_understanding
Bert pre_training_of_deep_bidirectional_transformers_for_language_understandingThyrixYang1
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingMinh Pham
 
FregeDay: Design and Implementation of the language (Ingo Wechsung)
FregeDay: Design and Implementation of the language (Ingo Wechsung)FregeDay: Design and Implementation of the language (Ingo Wechsung)
FregeDay: Design and Implementation of the language (Ingo Wechsung)Dierk König
 
Sentence-to-Code Traceability Recovery with Domain Ontologies
Sentence-to-Code Traceability Recovery with Domain OntologiesSentence-to-Code Traceability Recovery with Domain Ontologies
Sentence-to-Code Traceability Recovery with Domain OntologiesShinpei Hayashi
 
02 direct3 d_pipeline
02 direct3 d_pipeline02 direct3 d_pipeline
02 direct3 d_pipelineGirish Ghate
 
A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Ev...
A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Ev...A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Ev...
A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Ev...Minh Pham
 
14.jun.2012
14.jun.201214.jun.2012
14.jun.2012Tech_MX
 
Scala: functional programming for the imperative mind
Scala: functional programming for the imperative mindScala: functional programming for the imperative mind
Scala: functional programming for the imperative mindSander Mak (@Sander_Mak)
 
The Rise of Dynamic Languages
The Rise of Dynamic LanguagesThe Rise of Dynamic Languages
The Rise of Dynamic Languagesgreenwop
 

What's hot (20)

Slides
SlidesSlides
Slides
 
Data types and Operators Continued
Data types and Operators ContinuedData types and Operators Continued
Data types and Operators Continued
 
Preparing Java 7 Certifications
Preparing Java 7 CertificationsPreparing Java 7 Certifications
Preparing Java 7 Certifications
 
EXTENT-2016: Industry Practices of Advanced Program Analysis
EXTENT-2016: Industry Practices of Advanced Program AnalysisEXTENT-2016: Industry Practices of Advanced Program Analysis
EXTENT-2016: Industry Practices of Advanced Program Analysis
 
Lecture 11 semantic analysis 2
Lecture 11 semantic analysis 2Lecture 11 semantic analysis 2
Lecture 11 semantic analysis 2
 
45 aop-programming
45 aop-programming45 aop-programming
45 aop-programming
 
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for visionP04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
 
Bert pre_training_of_deep_bidirectional_transformers_for_language_understanding
Bert  pre_training_of_deep_bidirectional_transformers_for_language_understandingBert  pre_training_of_deep_bidirectional_transformers_for_language_understanding
Bert pre_training_of_deep_bidirectional_transformers_for_language_understanding
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Bert
BertBert
Bert
 
Garbage
GarbageGarbage
Garbage
 
FregeDay: Design and Implementation of the language (Ingo Wechsung)
FregeDay: Design and Implementation of the language (Ingo Wechsung)FregeDay: Design and Implementation of the language (Ingo Wechsung)
FregeDay: Design and Implementation of the language (Ingo Wechsung)
 
Sentence-to-Code Traceability Recovery with Domain Ontologies
Sentence-to-Code Traceability Recovery with Domain OntologiesSentence-to-Code Traceability Recovery with Domain Ontologies
Sentence-to-Code Traceability Recovery with Domain Ontologies
 
02 direct3 d_pipeline
02 direct3 d_pipeline02 direct3 d_pipeline
02 direct3 d_pipeline
 
A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Ev...
A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Ev...A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Ev...
A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Ev...
 
From DOT to Dotty
From DOT to DottyFrom DOT to Dotty
From DOT to Dotty
 
14.jun.2012
14.jun.201214.jun.2012
14.jun.2012
 
Scala: functional programming for the imperative mind
Scala: functional programming for the imperative mindScala: functional programming for the imperative mind
Scala: functional programming for the imperative mind
 
Boost.Dispatch
Boost.DispatchBoost.Dispatch
Boost.Dispatch
 
The Rise of Dynamic Languages
The Rise of Dynamic LanguagesThe Rise of Dynamic Languages
The Rise of Dynamic Languages
 

Similar to Platform-independent static binary code analysis using a meta-assembly language

Applications of the Reverse Engineering Language REIL
Applications of the Reverse Engineering Language REILApplications of the Reverse Engineering Language REIL
Applications of the Reverse Engineering Language REILzynamics GmbH
 
CNIT 127 Ch 1: Before you Begin
CNIT 127 Ch 1: Before you BeginCNIT 127 Ch 1: Before you Begin
CNIT 127 Ch 1: Before you BeginSam Bowne
 
The Factoring Dead: Preparing for the Cryptopocalypse
The Factoring Dead: Preparing for the CryptopocalypseThe Factoring Dead: Preparing for the Cryptopocalypse
The Factoring Dead: Preparing for the CryptopocalypseAlex Stamos
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
ESIL - Universal IL (Intermediate Language) for Radare2
ESIL - Universal IL (Intermediate Language) for Radare2ESIL - Universal IL (Intermediate Language) for Radare2
ESIL - Universal IL (Intermediate Language) for Radare2Anton Kochkov
 
CNIT 127 Ch Ch 1: Before you Begin
CNIT 127 Ch Ch 1: Before you BeginCNIT 127 Ch Ch 1: Before you Begin
CNIT 127 Ch Ch 1: Before you BeginSam Bowne
 
Can programming be liberated from the von neumann style?
Can programming be liberated from the von neumann style?Can programming be liberated from the von neumann style?
Can programming be liberated from the von neumann style?Oriol López Massaguer
 
Cs4hs2008 track a-programming
Cs4hs2008 track a-programmingCs4hs2008 track a-programming
Cs4hs2008 track a-programmingRashi Agarwal
 
WiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitr
WiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitrWiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitr
WiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitrAnn Loraine
 
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly Sam Bowne
 
CNIT 126 4: A Crash Course in x86 Disassembly
CNIT 126 4: A Crash Course in x86 DisassemblyCNIT 126 4: A Crash Course in x86 Disassembly
CNIT 126 4: A Crash Course in x86 DisassemblySam Bowne
 
Bluffers guide to Terminology
Bluffers guide to TerminologyBluffers guide to Terminology
Bluffers guide to TerminologyJim Gough
 
Clojure - An Introduction for Lisp Programmers
Clojure - An Introduction for Lisp ProgrammersClojure - An Introduction for Lisp Programmers
Clojure - An Introduction for Lisp Programmerselliando dias
 
Verilog Lecture1
Verilog Lecture1Verilog Lecture1
Verilog Lecture1Béo Tú
 

Similar to Platform-independent static binary code analysis using a meta-assembly language (20)

Verification with LoLA: 1 Basics
Verification with LoLA: 1 BasicsVerification with LoLA: 1 Basics
Verification with LoLA: 1 Basics
 
Applications of the Reverse Engineering Language REIL
Applications of the Reverse Engineering Language REILApplications of the Reverse Engineering Language REIL
Applications of the Reverse Engineering Language REIL
 
CNIT 127 Ch 1: Before you Begin
CNIT 127 Ch 1: Before you BeginCNIT 127 Ch 1: Before you Begin
CNIT 127 Ch 1: Before you Begin
 
The Factoring Dead: Preparing for the Cryptopocalypse
The Factoring Dead: Preparing for the CryptopocalypseThe Factoring Dead: Preparing for the Cryptopocalypse
The Factoring Dead: Preparing for the Cryptopocalypse
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
ESIL - Universal IL (Intermediate Language) for Radare2
ESIL - Universal IL (Intermediate Language) for Radare2ESIL - Universal IL (Intermediate Language) for Radare2
ESIL - Universal IL (Intermediate Language) for Radare2
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
CNIT 127 Ch Ch 1: Before you Begin
CNIT 127 Ch Ch 1: Before you BeginCNIT 127 Ch Ch 1: Before you Begin
CNIT 127 Ch Ch 1: Before you Begin
 
Can programming be liberated from the von neumann style?
Can programming be liberated from the von neumann style?Can programming be liberated from the von neumann style?
Can programming be liberated from the von neumann style?
 
Cs4hs2008 track a-programming
Cs4hs2008 track a-programmingCs4hs2008 track a-programming
Cs4hs2008 track a-programming
 
WiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitr
WiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitrWiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitr
WiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitr
 
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
 
Mcs lec2
Mcs lec2Mcs lec2
Mcs lec2
 
CNIT 126 4: A Crash Course in x86 Disassembly
CNIT 126 4: A Crash Course in x86 DisassemblyCNIT 126 4: A Crash Course in x86 Disassembly
CNIT 126 4: A Crash Course in x86 Disassembly
 
Paradigms
ParadigmsParadigms
Paradigms
 
Bluffers guide to Terminology
Bluffers guide to TerminologyBluffers guide to Terminology
Bluffers guide to Terminology
 
Clojure - An Introduction for Lisp Programmers
Clojure - An Introduction for Lisp ProgrammersClojure - An Introduction for Lisp Programmers
Clojure - An Introduction for Lisp Programmers
 
Extlect04
Extlect04Extlect04
Extlect04
 
Verilog Lecture1
Verilog Lecture1Verilog Lecture1
Verilog Lecture1
 
lecture 01.1.ppt
lecture 01.1.pptlecture 01.1.ppt
lecture 01.1.ppt
 

More from zynamics GmbH

Automated Mobile Malware Classification
Automated Mobile Malware ClassificationAutomated Mobile Malware Classification
Automated Mobile Malware Classificationzynamics GmbH
 
VxClass for Incident Response
VxClass for Incident ResponseVxClass for Incident Response
VxClass for Incident Responsezynamics GmbH
 
Malware classification
Malware classificationMalware classification
Malware classificationzynamics GmbH
 
Automated static deobfuscation in the context of Reverse Engineering
Automated static deobfuscation in the context of Reverse EngineeringAutomated static deobfuscation in the context of Reverse Engineering
Automated static deobfuscation in the context of Reverse Engineeringzynamics GmbH
 

More from zynamics GmbH (7)

Automated Mobile Malware Classification
Automated Mobile Malware ClassificationAutomated Mobile Malware Classification
Automated Mobile Malware Classification
 
VxClass for Incident Response
VxClass for Incident ResponseVxClass for Incident Response
VxClass for Incident Response
 
Malware classification
Malware classificationMalware classification
Malware classification
 
Hitb
HitbHitb
Hitb
 
Eusecwest
EusecwestEusecwest
Eusecwest
 
Bh dc09
Bh dc09Bh dc09
Bh dc09
 
Automated static deobfuscation in the context of Reverse Engineering
Automated static deobfuscation in the context of Reverse EngineeringAutomated static deobfuscation in the context of Reverse Engineering
Automated static deobfuscation in the context of Reverse Engineering
 

Recently uploaded

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Platform-independent static binary code analysis using a meta-assembly language

  • 1. Platform-independent static binary code analysis using a meta- assembly language Thomas Dullien, Sebastian Porst zynamics GmbH CanSecWest 2009
  • 2. Overview The REIL Language Abstract Interpretation MonoREIL Results 2
  • 3. Motivation • Bugs are getting harder to find • Defensive side (most notably Microsoft) has invested a lot of money in a „bugocide“ • Concerted effort: Lots of manual code auditing aided by static analysis tools • Phoenix RDK: Includes „lattice based“ analysis framework to allow pluggable abstract interpretation in the compiler 3
  • 4. Motivation • Offense needs automated tools if they want to avoid being sidelined • Offensive static analysis: Depth vs. Breadth • Offense has no source code, no Phoenix RDK, and should not depend on Microsoft • We want a static analysis framework for offensive purposes 4
  • 5. Overview The REIL Language Abstract Interpretation MonoREIL Results 5
  • 6. REIL • Reverse Engineering Intermediate Language • Platform-Independent meta-assembly language • Specifically made for static code analysis of binary files • Can be recovered from arbitrary native assembly code – Supported so far: x86, PowerPC, ARM 6
  • 7. Advantages of REIL • Very small instruction set (17 instructions) • Instructions are very simple • Operands are very simple • Free of side-effects • Analysis algorithms can be written in a platform-independent way – Great for security researchers working on more than one platform 7
  • 8. Creation of REIL code • Input: Disassembled Function – x86, ARM, PowerPC, potentially others • Each native assembly instruction is translated to one or more REIL instructions • Output: The original function in REIL code 8
  • 10. Design Criteria • Simplicity • Small number of instructions – Simplifies abstract interpretation (more later) • Explicit flag modeling – Simplifies reasoning about control-flow • Explicit load and store instructions • No side-effects 10
  • 11. REIL Instructions • One Address – Source Address * 0x100 + n – Easy to map REIL instructions back to input code • One Mnemonic • Three Operands – Always • An arbitrary amount of meta-data – Nearly unused at this point 11
  • 12. REIL Operands • All operands are typed – Can be either registers, literals, or sub-addresses – No complex expressions • All operands have a size – 1 byte, 2 bytes, 4 bytes, ... 12
  • 13. The REIL Instruction Set • Arithmetic Instructions – ADD, SUB, MUL, DIV, MOD, BSH • Bitwise Instructions – AND, OR, XOR • Data Transfer Instructions – LDM, STM, STR 13
  • 14. The REIL Instruction Set • Conditional Instructions – BISZ, JCC • Other Instructions – NOP, UNDEF, UNKN • Instruction set is easily extensible 14
  • 15. REIL Architecture • Register Machine – Unlimited number of registers t0, t1, ... – No explicit stack • Simulated Memory – Infinite storage – Automatically assumes endianness of the source platform 15
  • 16. Limitations of REIL • Does not support certain instructions (FPU, MMX, Ring-0, ...) yet • Can not handle exceptions in a platform- independent way • Can not handle self-modifying code • Does not correctly deal with memory selectors 16
  • 17. Overview The REIL Language Abstract Interpretation MonoREIL Results 17
  • 18. Abstract Interpretation • Theoretical background for most code analysis • Developed by Patrick and Rhadia Cousot around 1975-1977 • Formalizes „static abstract reasoning about dynamic properties“ • Huh ? • A lot of the literature is a bit dense for many security practitioners 18
  • 19. Abstract Interpretation • We want to make statements about programs • Example: Possible set of values for variable x at a given program point p • In essence: For each point p, we want to find Kp P(States) • Problem: P(States) is a bit unwieldly • Problem: Many questions are undecidable (where is the w*nker that yells „halting problem“) ? 19
  • 20. Dealing with unwieldy stuff • Reason about something simpler: Abstraction P(States) D Concretisation P(States) D • Example: Values vs. Intervals 20
  • 21. Lattices • In order for this to work, D must be structurally similar to P(States) • P(States) supports intersection and union • You can check for inclusion (contains, does not contain) • You have an empty set (bottom) and „everything“ (top) 21
  • 22. Lattices • A lattice is something like a generalized powerset • Example lattices: Intervals, Signs, P(Registers ) , mod p 22
  • 23. Dealing with halting • Original program consists of p1 ... pn program points • Each instruction transforms a set of states into a different set of states • p1 ... pn are mappings P(States) P(States) • Specify p'1  p'n : D D • This yields us ~ : D n D n p 23
  • 24. Dealing with halting • We cheat: Let D be finite  Dn is finite ~ • Make sure that is monotonous (like this talk) p • Begin with initial state I • Calculate ~(l ) p • Calculate ~( ~(l )) p p ~ n (l ) ~ n 1 (l ) • Eventually, you reach p p • You are done – read off the results and see if your question is answered 24
  • 25. Theory vs. practice • A lot of the academic focus is on proving correctness of the transforms pi P(States) P(States) pi ' D D • As practitioner we know that pi is probably not fully correctly specified • We care much more about choosing and constructing a D so that we get the results we need 25
  • 26. Overview The REIL Language Abstract Interpretation MonoREIL Results 26
  • 27. MonoREIL • You want to do static analysis • You do not want to write a full abstract interpretation framework • We provide one: MonoREIL • A simple-to-use abstract interpretation framework based on REIL 27
  • 28. What does it do ? • You give it – The control flow graph of a function (2 LOC) – A way to walk through the CFG (1 + n LOC) – The lattice D (15 + n LOC) • Lattice Elements • A way to combine lattice elements – The initial state (12 + n LOC) – Effects of REIL instructions on D (50 + n LOC) 28
  • 29. How does it work? • Fixed-point iteration until final state is found • Interpretation of result – Map results back to original assembly code • Implementation of MonoREIL already exists • Usable from Java, ECMAScript, Python, Ruby 29
  • 30. Overview The REIL Language Abstract Interpretation MonoREIL Results 30
  • 31. Register Tracking • First Example: Simple • Question: What are the effects of a register on other instructions? • Useful for following register values 31
  • 33. Register Tracking • Lattice: For each instruction, set of influenced registers, combine with union • Initial State – Empty (nearly) everywhere – Start instruction: { tracked register } • Transformations for MNEM op1, op2, op3 – If op1 or op2 are tracked  op3 is tracked too – Otherwise: op3 is removed from set 33
  • 34. Negative indexing • Second Example: More complicated • Question: Is this function indexing into an array with a negative value ? • This gets a bit more involved 34
  • 35. Negative indexing • Simple intervals alone do not help us much • How would you model a situation where – A function gets a structure pointer as argument – The function retrieves a pointer to an array from an array of pointers in the structure – The function then indexes negatively into this array • Uh. Ok. 35
  • 36. Abstract locations • For each instruction, what are the contents of the registers ? Let‘s slowly build complexity: • If eax contains arg_4, how could this be modelled ? – eax = *(esp.in + 8) • If eax contains arg_4 + 4 ? – eax = *(esp.in + 8) + 4 • If eax can contain arg_4+4, arg_4+8, arg_4+16, arg_4 + 20 ? – eax = *(esp.in + 8) + [4, 20] 36
  • 37. Abstract locations • If eax can contain arg_4+4, arg_8+16 ? – eax = *(esp.in + [8,12]) + [4,16] • If eax can contain any element from – arg_4mem[0] to arg_4mem[10], incremented once, how do we model this ? – eax = *(*(esp.in + [8,8]) + [4, 44]) + [1,1] • OK. An abstract location is a base value and a list of intervals, each denoting memory dereferences (except the last) 37
  • 38. Range Tracking eax.in + [a, b] + [0, 0] eax.in + a eax.in + b 38
  • 39. Range Tracking eax + [a, b] + [c, d] + [0, 0] eax + a eax + b [eax+a]+c [eax+a]+d [eax+a+4]+c [eax+a+4]+d [eax+b]+c [eax+b]+d 39
  • 40. Range Tracking • Lattice: For each instruction, a map: Register  Aloc Aloc • Initial State – Empty (nearly) everywhere – Start instruction: { reg -> reg.in + [0,0] } • Transformations – Complicated. Next slide. 40
  • 41. Range Tracking • Transformations – ADD/SUB are simple: Operate on last intervals – STM op1, , op3 • If op1 or op3 not in our input map M skip • Otherwise, M[ M[op3] ] = op1 – LDM op1, , op3 • If op1 or op3 is not in our input map M skip • M[ op3 ] = M[ op1 ] – Others: Case-specific hacks 41
  • 42. Range Tracking • Where is the meat ? • Real world example: Find negative array indexing 42
  • 43. MS08-67 • Function takes in argument to a buffer • Function performs complex pointer arithmetic • Attacker can make this pointer arithmetic go bad • The pointer to the target buffer of a wcscpy will be decremented beyond the beginning of the buffer 43
  • 44. MS08-67 • Michael Howard‘s Blog: – “In my opinion, hand reviewing this code and successfully finding this bug would require a great deal of skill and luck. So what about tools? It's very difficult to design an algorithm which can analyze C or C++ code for these sorts of errors. The possible variable states grows very, very quickly. It's even more difficult to take such algorithms and scale them to non-trivial code bases. This is made more complex as the function accepts a highly variable argument, it's not like the argument is the value 1, 2 or 3! Our present toolset does not catch this bug.” 44
  • 45. MS08-67 • Michael is correct – He has to defend all of Windows – His „regular“ developers have to live with the results of the automated tools – His computational costs for an analysis are gigantic – His developers have low tolerance for false positives 45
  • 46. MS08-67 • Attackers might have it easier – They usually have a much smaller target – They are highly motivated: I will tolerate 100 false positives for each „real“ bug • I can work through 20-50 a day • A week for a bug is still worth it – False positive reduction is nice, but if I have to read 100 functions instead of 20000, I have already gained something 46
  • 48. Limitations and assumptions • Limitations and assumptions – The presented analysis does not deal with aliasing – We make no claims about soundness – We do not use conditional control-flow information – We are still wrestling with calling convention issues – The important bit is not our analysis itself – the important part is MonoREIL – Analysis algorithms will improve over time – laying the foundations was the boring part 48
  • 49. Status • Abstract interpretation framework available in BinNavi • Currently x86 • In April (two weeks !): PPC and ARM – Was only a matter of adding REIL translators • Some example analyses: – Register tracking (lame, but useful !) – Negative array indexing (less lame, also useful !) 49
  • 50. Outlook • Deobfuscation through optimizing REIL • More precise and better static analysis • Register tracking etc. release in April (two weeks !) • Negative array indexing etc. release in October • Attempting to encourage others to build their own lattices 50
  • 51. Related work ? • Julien Vanegue / ERESI team (EKOPARTY) • Tyler Durden‘s Phrack 64 article • Principles of Program Analysis (Nielson/Nielson/Hankin) • University of Wisconsin WISA project • Possibly related: GrammaTech CodeSurfer x86 51
  • 52. Questions ? ( Good Bye, Canada ) 52