Crash Analysis with
Reverse Taint
Powered by Taintgrind
Marek Zmysłowski
V. 12122019
whoami
Security Researcher @
Interested in fuzzing and vulnerability finding
Fan of The Matrix and Hacker movies
Co-organizer of “H4x0r5 @ Warsaw” meetings
Crash Analysis with Reverse Taint
- What is a crash?
Crash, or system crash, occurs when a computer program (...) stops functioning properly
and exits. *
An application typically crashes when it performs an operation that is not allowed by the
operating system. The operating system then triggers an exception or signal in the
application. Unix applications traditionally responded to the signal by dumping core. *
- Why is it important to identify a crash?
“performs an operation that is not allowed”. This indicates that a bug exists inside the
application. Once the crash is identified and can be reproduced, the bug can be found.
*https://en.wikipedia.org/wiki/Crash_(computing)
Crash Analysis with Reverse Taint
- How to identify a crash?
“operating system then triggers an exception or signal”. Different operating
systems have different mechanisms to “collect” crashes. Some dump core, that is,
store the process memory (Linux); others attach a debugger so that the user can
start a debugging session with the crashed application (Windows).
- How do we get crashes?
“That, Detective, is the ‘right question.’”
“I, Robot” (2004)
Hunting in the wild
- How can a crash be found?
By accident, or with one of the most popular techniques, which is also used
here: fuzzing. The sooner a bug is found in the production process, the lower
the cost.
- So what is ‘fuzzing’?
The idea behind fuzzing is very simple. Take a malformed input and feed it
to the application. Maybe it will crash. Of course, how the input is “chosen” and
how the crashes are caught is a topic for another presentation (or several).
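The idea can be sketched in a few lines of Python. This is only a toy mutation loop under simplifying assumptions, not how AFL or any other real fuzzer works: `mutate`, `fuzz`, and `toy_parser` are hypothetical names, and a Python exception stands in for a crash. A real fuzzer runs the target as a separate process and watches for signals.

```python
import random

def mutate(data: bytes, rng: random.Random, flips: int = 1) -> bytes:
    """Return a copy of data with `flips` randomly chosen bytes replaced."""
    buf = bytearray(data)
    for _ in range(flips):
        if not buf:
            break
        pos = rng.randrange(len(buf))
        buf[pos] = rng.randrange(256)
    return bytes(buf)

def fuzz(target, seed: bytes, iterations: int, rng_seed: int = 0):
    """Feed mutated inputs to `target` and collect the ones that raise."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception:
            crashes.append(candidate)
    return crashes

def toy_parser(data: bytes) -> int:
    """Hypothetical target: trusts a length byte taken from the input."""
    length = data[0]
    payload = data[1:]
    # IndexError when the length byte is larger than the payload.
    return payload[length - 1] if length else 0
```

Calling `fuzz(toy_parser, b"\x02AB", 200)` returns the mutated inputs whose first byte claims more payload than the file contains.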
Fuzzers
The godfather of them all is, of course, AFL.
In recent years, however, many fuzzers have appeared for different purposes.
Some are AFL derivatives; others try to do things differently.
Everyone can find something for themselves.
American Fuzzy Lop
HonggFuzz
AFL++
Angora
QSYM
WinAFL
Real Example - Fuzzing
jhead is used to display and manipulate data contained in the Exif header of
JPEG images from digital cameras. By default, jhead displays the more useful
camera settings from the file in a user-friendly format.
The version used here is 3.03.
http://www.sentex.net/~mwandel/jhead/
Real Example - Fuzzing
Crash
Example
Crash vs Bug
- What is the difference between Crash and Bug?
A crash is the result of incorrectly working code, which in turn is caused by a bug.
Sometimes the place where the program crashes and the place of the bug are “the same”. And
sometimes not ...
- Is one crash caused by one bug?
Not always.
Crash vs Bug
Case 1.
One bug causes one crash.
This is the easiest situation as the identification
is straightforward.
Crash vs Bug
Case 2.
One bug can cause several crashes.
This happens quite often, especially with simple buffer overflows where a “size”
variable is used. Depending on its value, the read or write reaches different
memory regions and causes different crashes.
Crash vs Bug
Case 3.
Several bugs can cause one crash.
This depends on how a crash is identified. The simplest example is a frame
processor: the size parser works incorrectly for several types of frame, so
different paths may fail, yet they can be identified as the same crash.
Crash vs Bug
In an additional experiment we computed a portion of ground truth. We applied all patches to cxxfilt
from the version we fuzzed up until the present. We grouped together all inputs that a particular patch
caused to now gracefully exit [11], confirming that the patch represented a single conceptual bugfix. We
found that all 57,142 crashing inputs deemed “unique” by coverage profiles were addressed by 9
distinct patches.
Stack hashes did better, but still over-counted bugs. Instead of the bug mapping to, say 500 AFL
coverage-unique crashes in a given trial, it would map to about 46 stack hashes, on average.
Stack hashes were also subject to false negatives: roughly 16% of hashes for crashes from one bug
were shared by crashes from another bug. In five cases, a distinct bug was found by only one crash,
and that crash had a non-unique hash, meaning that evidence of a distinct bug would have been
dropped by “de-duplication.”
“Evaluating Fuzz Testing” https://arxiv.org/pdf/1808.09700.pdf
Crash vs Bug
So what is needed?
Crash Analysis with Reverse Taint
- What is crash analysis and why do we need that?
Crash analysis is the process of evaluating the exploitability of a crash and
identifying its root cause. If you are fuzzing something, the number
of crashes can be huge. In addition, the impact and consequences (criticality) that the bug
might have depend on the application technology and the system.
- So what exactly do we analyze?
There are two major things to analyze: the crash and the bug.
Analysis - Crash
- What type of the crash is it?
For example: out-of-bounds read, NULL pointer dereference, buffer overflow, etc.
- Is the crash exploitable?
Part of the identification process is to find out whether the crash can be used to achieve
something more than just crashing the application: reading a piece of data, overwriting
memory, or executing code.
- Critical or exploitable - what is the difference?
Exploitability relates only to the bug and the crash itself. Criticality relates to
the whole environment. A safe NULL pointer dereference means something different in nuclear
power plant software than in a kids' game.
Analysis - Bug
The second important part of the analysis is to identify the bug. As mentioned
before, there can be different relations between a bug and a crash.
It is also important to know how the input (which bits and bytes) correlates with the bug.
This, of course, may later influence the crash and its exploitability.
- What different relations are there?
User data can control the crash directly (e.g. an offset inside a table is calculated
from user data) or indirectly (e.g. the incorrect branch is taken).
For example: NULL Pointer Dereference vs Safe NULL Pointer Dereference
Analysis - Bug (Direct vs. Indirect)
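Declarations:
    void *pointer = NULL
    char table[100]
    int index = 0
    char user_data
Branch condition:
    user_data>100
Indirect (user data only selects the branch):
    pointer[index] = 0
Direct (user data is the index):
    table[user_data] = 0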
Analysis - Few Interesting Tools
crashwalk
https://github.com/bnagy/crashwalk
- cwtriage
It runs crash files with instrumentation and outputs results in various formats.
- cwdump
It summarizes crashes in a crashwalk database by major / minor stack hash.
Although AFL (for example) already de-dupes crashes, bucketing summarizes
those crashes by an order of magnitude or more. Crashes that bucket the same
have exactly the same stack contents, so they're likely (not guaranteed) to be
the same bug.
- cwfind
It is a simple utility to output the filenames of all crashes matching a given hash. I
use it in combination with xargs to bulk delete / move crash files.
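Bucketing by stack hash can be sketched as follows. This is a simplified illustration, not crashwalk's actual algorithm: it assumes the major hash covers only the top few frames and the minor hash covers the whole backtrace, and `stack_hash`, `bucket_crashes`, and `major_depth` are hypothetical names.

```python
import hashlib
from collections import defaultdict

def stack_hash(frames, depth=None):
    """Hash the first `depth` frame names (all frames when depth is None)."""
    selected = frames if depth is None else frames[:depth]
    return hashlib.md5("|".join(selected).encode()).hexdigest()[:8]

def bucket_crashes(crashes, major_depth=5):
    """Group crash files by a (major, minor) pair of stack hashes.

    `crashes` maps a crash file name to its backtrace, innermost frame first.
    """
    buckets = defaultdict(list)
    for filename, frames in crashes.items():
        key = (stack_hash(frames, major_depth), stack_hash(frames))
        buckets[key].append(filename)
    return dict(buckets)
```

Two crash files with identical backtraces land in the same bucket. As the paper quoted earlier shows, one bucket does not have to mean one bug.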
Analysis - Few Interesting Tools
afl-utils
https://github.com/rc0r/afl-utils
- afl-collect
Copies all crash sample files from an afl synchronisation
directory (used by multiple afl instances when run in parallel)
into a single location, providing easy access for further crash
analysis. It also executes exploitable on them and removes
uninteresting crashes.
- afl-minimize
Helps to create a minimized corpus from the samples of a parallel
fuzzing job.
afl-collect
[Screenshot: results of the “exploitable” plugin]
afl-minimize
[Screenshot: reducing input files]
Analysis - Few Interesting Tools
afl-analyze
It takes an input file, attempts to sequentially flip bytes, and observes
the behavior of the tested program. It then color-codes the input based on
which sections appear to be critical and which are not.
While not bulletproof, it can often offer quick insights into complex file
formats.
https://lcamtuf.blogspot.com/2016/02/say-hello-to-afl-analyze.html
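The byte-flipping idea can be sketched like this. It is a strong simplification of what afl-analyze does (the real tool applies several kinds of mutation and compares execution paths); `classify_bytes`, `run`, and `toy_run` are hypothetical names used only for illustration.

```python
def classify_bytes(data: bytes, run):
    """Flip each byte in turn and record whether the observed behavior changes.

    `run` is a hypothetical callable that maps an input to any comparable
    summary of program behavior (exit status, execution path, ...).
    """
    baseline = run(data)
    critical = []
    for offset in range(len(data)):
        mutated = bytearray(data)
        mutated[offset] ^= 0xFF          # flip every bit of this byte
        critical.append(run(bytes(mutated)) != baseline)
    return critical

def toy_run(data: bytes) -> str:
    """Hypothetical target: only the two magic bytes influence behavior."""
    return "parsed" if data[:2] == b"\xff\xd8" else "rejected"
```

For the input `b"\xff\xd8AB"` only the first two offsets are reported as critical, because flipping the remaining bytes does not change what `toy_run` returns.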
afl-analyze
*Of course it works, it just does not always give expected results.
Crash Analysis with Reverse Taint
Tainting
- What is tainting?
The purpose of dynamic taint analysis is to track information flow between
sources and sinks. Any program value whose computation depends on data
derived from a taint source is considered tainted. Any other value is considered
untainted.
- What are the types of tainting?
● The direct value is tainted
● Indirect/Control flow
● Address/Pointer relation
https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf
Tainting - Types
Direct Value
    Y = X + 2
Indirect/Control Flow
    if (X > 2)
        Y = 5
    else
        Y = 10
Address/Pointer
    Y = A[X]
Tainting Propagation (Policy)
Depending on the application, different rules can be used to propagate the taint. The policy
is the set of rules that describe how taint moves from the source operands to the destination. In
standard taint analysis, the destination operand is typically marked as tainted if
any of the source operands is tainted, regardless of how the specific semantics
of the instruction affect its destination operands.
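The standard policy described above can be sketched as a small value wrapper. This is an illustrative model only, not Taintgrind's implementation: `Tainted`, `binop`, and `from_input` are hypothetical names, and the labels are simply the input offsets a value was derived from.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """A value together with the set of input offsets it was derived from."""
    value: int
    labels: frozenset = frozenset()

def binop(op, a: Tainted, b: Tainted) -> Tainted:
    """Apply `op`; the result is tainted if any source operand is tainted."""
    return Tainted(op(a.value, b.value), a.labels | b.labels)

def from_input(data: bytes, offset: int) -> Tainted:
    """Taint source: byte `offset` of the input carries its own label."""
    return Tainted(data[offset], frozenset({offset}))

data = b"\x05\x07"
x = from_input(data, 0)
y = binop(lambda p, q: p + q, x, Tainted(2))            # Y = X + 2
z = binop(lambda p, q: p * q, y, from_input(data, 1))   # depends on both bytes
masked = binop(lambda p, q: p & q, x, Tainted(0))       # always 0, still tainted
```

The last line shows the "regardless of semantics" part: `masked.value` is 0 whatever the input is, yet it still carries the label of byte 0.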
Tainting - Issues
- What is over-tainting?
Over-tainting occurs when code or data identified by the analysis as tainted is
not in fact influenced by any taint source (false-positive).
- What is under-tainting?
Under-tainting occurs when code or data that is influenced by a taint source is
not identified by the analysis as tainted. Such imprecision can be problematic,
especially in systems where the result of the taint analysis is critically important
(false-negative).
Powered by Taintgrind
- What is Valgrind?
Valgrind
It is an instrumentation framework for
building dynamic analysis tools. It comes with
a set of tools each of which performs some
kind of debugging, profiling, or similar task
that helps you improve your programs.
Valgrind's architecture is modular, so new
tools can be created easily and without
disturbing the existing structure.
http://valgrind.org
Valgrind IR
Originally, Valgrind had an x86-specific, part D&R (disassemble-and-resynthesise),
part C&A (copy-and-annotate), assembly-code-like IR in which the
units of translation were basic blocks. Since
then Valgrind has had an architecture-neutral,
D&R, single-static-assignment
(SSA) IR that is more similar to what might
be used in a compiler. IR blocks are
superblocks: single-entry, multiple-exit
stretches of code.
*http://valgrind.org/docs/valgrind2007.pdf
Single-Static-Assignment (SSA)
It is a property of an intermediate representation (IR),
which requires that each variable is assigned exactly
once, and every variable is defined before it is used.
Existing variables in the original IR are split into
versions, new variables typically indicated by the
original name with a subscript in textbooks, so that
every definition gets its own version. In SSA form,
use-def chains are explicit and each contains a single
element.
*https://en.wikipedia.org/wiki/Static_single_assignment_form
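A minimal sketch of the renaming step, assuming straight-line code only (no branches, so no phi functions are needed) and statements of the form `dst = expr`. The function name `to_ssa` and the `name_N` version suffix are illustrative choices, not Valgrind's notation.

```python
import re

def to_ssa(statements):
    """Rename variables in straight-line `dst = expr` statements into SSA form."""
    version = {}          # variable name -> latest version number
    result = []

    def rename_use(match):
        # A use refers to the latest version defined so far; names that were
        # never assigned (program inputs) are left untouched.
        name = match.group(0)
        return f"{name}_{version[name]}" if name in version else name

    for stmt in statements:
        dst, expr = (part.strip() for part in stmt.split("=", 1))
        expr = re.sub(r"[A-Za-z_]\w*", rename_use, expr)
        version[dst] = version.get(dst, 0) + 1   # every definition gets a fresh version
        result.append(f"{dst}_{version[dst]} = {expr}")
    return result
```

For `["x = 1", "y = x + 2", "x = x + y"]` the second assignment to `x` becomes `x_2 = x_1 + y_1`, so every use points at exactly one definition.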
- What is Taintgrind?
Taintgrind
Taintgrind is based on Valgrind's MemCheck and Flayer plugin.
Taintgrind borrows the bit-precise shadow memory from MemCheck and only
propagates explicit data flow. This means that Taintgrind will not propagate taint
in control structures such as if-else, for-loops and while-loops. Taintgrind will also
not propagate taint in dereferenced tainted pointers.
http://valgrind.org/docs/memcheck2005.pdf
Taintgrind - Propagation Rules
1. The direct value is tainted
2. Indirect/Control flow
3. Address/Pointer relation
Taintgrind - Propagation Rules
- What are the Taintgrind propagation rules?
The granularity for memory operations is 1 byte.
For register operations it is the size of the operand: even if only one byte
is used, the whole register is tainted. In such a case, Taintgrind
is over-tainting.
However, because Taintgrind handles only the first type (direct value), it is also under-tainting.
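The over-tainting caused by register granularity can be illustrated with a toy shadow state. This models only the behavior described on the slide and is not Taintgrind's code: the `Shadow` class, its methods, and the addresses are hypothetical.

```python
class Shadow:
    """Toy shadow state: one taint flag per memory byte, one per whole register."""

    def __init__(self):
        self.mem = set()   # addresses of tainted bytes
        self.reg = set()   # names of tainted registers

    def load(self, reg, addr, size):
        """reg <- mem[addr:addr+size]; the register is tainted as a whole."""
        if any(a in self.mem for a in range(addr, addr + size)):
            self.reg.add(reg)
        else:
            self.reg.discard(reg)

    def store(self, addr, reg, size):
        """mem[addr:addr+size] <- reg; every written byte inherits the register flag."""
        for a in range(addr, addr + size):
            if reg in self.reg:
                self.mem.add(a)
            else:
                self.mem.discard(a)

s = Shadow()
s.mem.add(0x1000)            # a single tainted input byte
s.load("rax", 0x1000, 8)     # 8-byte load: the whole register becomes tainted
s.store(0x2000, "rax", 8)    # 8-byte store: all eight destination bytes are tainted
```

Only one of the eight bytes written to `0x2000` actually came from the input, but all eight are now marked as tainted.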
Taintgrind - Propagation Rules
[Diagram: WRITE and READ operations; over-tainting in a bit/byte operation]
Taintgrind
Here is an example of the logs and of how
the taint is propagated over the file.
The job is to find all the paths from
the end of the log file back to the beginning.
One instruction can be tainted by
multiple input bytes.
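The backward walk can be sketched as follows. The record format here is invented for illustration and is neither Taintgrind's log format nor rtaint's internal representation: each record is a `(dest, sources)` pair in execution order, where a source is either the name of an earlier destination or an integer standing for a byte offset in the tainted input file.

```python
def reverse_taint(log, sink):
    """Walk a definition log backwards from `sink` and collect input offsets."""
    wanted = {sink}        # names whose origin still has to be explained
    offsets = set()
    for dest, sources in reversed(log):
        if dest not in wanted:
            continue
        wanted.discard(dest)            # this record is the latest definition
        for src in sources:
            if isinstance(src, int):
                offsets.add(src)        # reached a byte of the input file
            else:
                wanted.add(src)         # keep walking towards its definition
    return sorted(offsets)

log = [
    ("t1", [4]),            # t1 <- input byte 4
    ("t2", [5]),            # t2 <- input byte 5
    ("t3", ["t1", "t2"]),   # t3 <- t1 op t2
    ("t4", [9]),            # unrelated to the crash
    ("rax", ["t3"]),        # value used by the faulting instruction
]
```

Starting from `rax`, the walk reaches input bytes 4 and 5 and never visits the unrelated byte 9.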
Taintgrind
The original Taintgrind was not sufficient for reverse tainting. Its log was
missing a few details.
- What was changed?
The “Read” function did not show the size of the data that was read.
The “Load” and “Store” functions did not report the size of the operation either.
Taintgrind
[Screenshot: Taintgrind log with the tracked variables and the reported crash marked]
GDB
[Screenshot: GDB session showing the function where the crash occurred and the instruction that caused the crash. The crash occurred on a reference to the address stored in the RAX register.]
Taintgrind
/work/taint-analysis/valgrind-3.15.0/build/bin/valgrind \
    --tool=taintgrind \
    --file-filter=/work/taint-analysis/CRASH \
    --compact=yes \
    --taint-start=0 \
    --taint-len=1504 \
    /work/taint-analysis/jhead-3.03/jhead \
    /work/taint-analysis/CRASH
Taintgrind
/work/taint-analysis/valgrind-3.15.0/build/bin/valgrind --tool=taintgrind
    Calls the Taintgrind tool.
--file-filter=/work/taint-analysis/CRASH
    The name of the file that needs to be tainted. It must be the FULL path.
--compact=yes
    Makes the log file smaller.
--taint-start=0
    Offset inside the file.
--taint-len=1504
    Taint size.
/work/taint-analysis/jhead-3.03/jhead /work/taint-analysis/CRASH
    The command to run.
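When many crash files have to be processed, the same command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown on the slide; the function name `taintgrind_command` and its parameters are hypothetical, and choosing the taint length is left to the caller.

```python
import os

def taintgrind_command(valgrind, target, crash_file, taint_len, taint_start=0):
    """Build the Taintgrind argument list for one crash file."""
    crash_path = os.path.abspath(crash_file)   # --file-filter needs the full path
    return [
        valgrind,
        "--tool=taintgrind",
        f"--file-filter={crash_path}",
        "--compact=yes",
        f"--taint-start={taint_start}",
        f"--taint-len={taint_len}",
        target,
        crash_path,
    ]
```

The resulting list can be passed to `subprocess.run`. Valgrind tools print to stderr unless Valgrind's `--log-file` option is given, so the output has to be captured in one of those two ways.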
Crash Analysis with Reverse Taint
Reverse Tainting the Value
An example of how the values are tracked.
Reverse Tainting the Value
[Diagram: parts of one tainted variable]
rtaint
https://github.com/Cycura/rtaint
rtaint
- -f
The name of the log file created by Taintgrind. It can be in the compact
format.
- -g
The script can also produce a file in dot format, which is used to generate a graph.
- -s
The name of the file with the slice. Later, this can be used to display which
operations were tainted with the values.
- -k
The path of the directory where the Kaitai Struct files will be stored.
Reverse Tainting the Value
[Screenshot: crash file with Kaitai Struct. The marked indexes from the file are the ones causing the crash.]
Reverse Tainting the Value
- With or without the file size?
What is the probability that files of different sizes with the same Kaitai Struct
have different root causes?
- What is the relation between the AFL uniqueness algorithm and the tainted
input?
It is an open question...
Reverse Tainting the Value - Results
413 total crashes found by 4 instances (1 master and 3 slaves)
- master - 44 crashes
- slave1 - 116 crashes
- slave2 - 124 crashes
- slave3 - 129 crashes
349 crashes were reproduced under Taintgrind
177 crashes had a unique Kaitai Struct.
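Grouping crashes by the input bytes that influence them could look roughly like this. It is a hypothetical sketch: a tuple of byte ranges stands in for the Kaitai Struct that rtaint produces, and `to_ranges` and `group_by_signature` are illustrative names. Whether an identical signature really means an identical root cause is exactly the open question raised above.

```python
from collections import defaultdict

def to_ranges(offsets):
    """Collapse offsets into sorted (start, end) ranges, end inclusive."""
    ranges = []
    for off in sorted(set(offsets)):
        if ranges and off == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], off)   # extend the current range
        else:
            ranges.append((off, off))           # start a new range
    return tuple(ranges)

def group_by_signature(crash_offsets):
    """Group crash names by the byte ranges that influence the crash."""
    groups = defaultdict(list)
    for name, offsets in crash_offsets.items():
        groups[to_ranges(offsets)].append(name)
    return dict(groups)
```

Crashes whose tainted offsets collapse to the same ranges end up in one group, which gives a count comparable to the "unique structure" figure above.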
Slicing
It is the computation of the set of program
statements, the program slice, that may
affect the values at some point of interest,
referred to as a slicing criterion.*
https://en.wikipedia.org/wiki/Program_slicing
Graph
Taintgrind and rtaint make it possible to
create a dot graph that can be
converted with the Graphviz
package.
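Producing a dot file from dependency records takes only a few lines. The sketch below assumes a made-up record format, `(dest, sources)` pairs where an integer source stands for an input byte offset; it is not the output format of Taintgrind or rtaint, and `to_dot` is a hypothetical helper.

```python
def to_dot(log, name="taint"):
    """Render (dest, sources) records as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for dest, sources in log:
        for src in sources:
            label = f"input[{src}]" if isinstance(src, int) else src
            lines.append(f'    "{label}" -> "{dest}";')
    lines.append("}")
    return "\n".join(lines)
```

After writing the returned text to a file such as `taint.dot`, Graphviz can render it with `dot -Tpng taint.dot -o taint.png`.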
Simple Example
Crash Analysis with Reverse Taint
Powered by Taintgrind
Powered by ...
Moflow - https://github.com/Cisco-Talos/moflow/tree/master/BAP-0.7-moflow
Binary Ninja - https://blog.trailofbits.com/2019/08/29/reverse-taint-analysis-using-binary-ninja/
Triton - https://triton.quarkslab.com/
BARF - https://github.com/programa-stic/barf-project
libdft - http://www.cs.columbia.edu/~vpk/research/libdft/
Commercial / Free
TETRANE - https://www.tetrane.com/
What Next?
Updates
- Address
Tainting starts from the last line of the log file. This is
useful when there is a crash, but there is no way to start from an
arbitrary instruction if the application doesn’t crash.
- Scripts
An IDA Pro/Ghidra/Binary Ninja script for highlighting the tainted
instructions. This will make it easier to identify the data flow.
- Speed
The way it is currently written makes it slow. Optimization
or a change of language (thinking about Rust) is required.
What Next?
Issues
Currently Taintgrind doesn't work on ARM
processors. This is caused by Valgrind itself: it is missing
some of the ARM conversions. The bug has already been
reported.
Summary
The solution is based on Valgrind/Taintgrind. This means that it supports all the
systems supported by Valgrind itself (+), but it also suffers from Valgrind's issues
(-)
Creating the taint log is time-consuming (-)
rtaint can be used in most cases, making the analysis “faster” and
automated. It is easy to incorporate into other tools. (+)
Python may not be the best choice for rtaint. Too slow? (-)
It requires more testing on real-life applications. I’m happy to receive any
feedback :) (+)
References and Interesting Docs
https://github.com/wmkhoo/taintgrind
http://valgrind.org/
http://valgrind.org/docs/memcheck2005.pdf
https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf
https://www2.cs.arizona.edu/~debray/Publications/bit-level-taint.pdf
http://bitblaze.cs.berkeley.edu/papers/dta%2B%2B-ndss11.pdf
http://shell-storm.org/blog/Taint-analysis-and-pattern-matching-with-Pin/
https://www.blackhat.com/docs/eu-15/materials/eu-15-Kim-Triaging-Crashes-With-Backward-Taint-Analysis-For-ARM-Architecture.pdf
Special Thanks
Wei Ming Khoo
Questions
Thank you :)
mzmyslowski@cycura.com
@marekzmyslowski
https://github.com/Cycura/rtaint
https://twitter.com/H4x0r54

Más contenido relacionado

La actualidad más candente

Serial Killer - Silently Pwning your Java Endpoints // OWASP BeNeLux Day 2016
Serial Killer - Silently Pwning your Java Endpoints // OWASP BeNeLux Day 2016Serial Killer - Silently Pwning your Java Endpoints // OWASP BeNeLux Day 2016
Serial Killer - Silently Pwning your Java Endpoints // OWASP BeNeLux Day 2016
Christian Schneider
 
Honeypots for Active Defense
Honeypots for Active DefenseHoneypots for Active Defense
Honeypots for Active Defense
Greg Foss
 

La actualidad más candente (20)

Practical Windows Kernel Exploitation
Practical Windows Kernel ExploitationPractical Windows Kernel Exploitation
Practical Windows Kernel Exploitation
 
ORM Injection
ORM InjectionORM Injection
ORM Injection
 
Serial Killer - Silently Pwning your Java Endpoints // OWASP BeNeLux Day 2016
Serial Killer - Silently Pwning your Java Endpoints // OWASP BeNeLux Day 2016Serial Killer - Silently Pwning your Java Endpoints // OWASP BeNeLux Day 2016
Serial Killer - Silently Pwning your Java Endpoints // OWASP BeNeLux Day 2016
 
Honeypots for Active Defense
Honeypots for Active DefenseHoneypots for Active Defense
Honeypots for Active Defense
 
Process hollowing
Process hollowingProcess hollowing
Process hollowing
 
How Functions Work
How Functions WorkHow Functions Work
How Functions Work
 
コンピュータフォレンジックにちょっとだけ触れてみる
コンピュータフォレンジックにちょっとだけ触れてみるコンピュータフォレンジックにちょっとだけ触れてみる
コンピュータフォレンジックにちょっとだけ触れてみる
 
Lie to Me: Bypassing Modern Web Application Firewalls
Lie to Me: Bypassing Modern Web Application FirewallsLie to Me: Bypassing Modern Web Application Firewalls
Lie to Me: Bypassing Modern Web Application Firewalls
 
Secure Coding 101 - OWASP University of Ottawa Workshop
Secure Coding 101 - OWASP University of Ottawa WorkshopSecure Coding 101 - OWASP University of Ottawa Workshop
Secure Coding 101 - OWASP University of Ottawa Workshop
 
Introduction httpClient on Java11 / Java11時代のHTTPアクセス再入門
Introduction httpClient on Java11 / Java11時代のHTTPアクセス再入門Introduction httpClient on Java11 / Java11時代のHTTPアクセス再入門
Introduction httpClient on Java11 / Java11時代のHTTPアクセス再入門
 
Race condition
Race conditionRace condition
Race condition
 
Client Side Exploits using PDF
Client Side Exploits using PDFClient Side Exploits using PDF
Client Side Exploits using PDF
 
64ビット対応Dllインジェクション
64ビット対応Dllインジェクション64ビット対応Dllインジェクション
64ビット対応Dllインジェクション
 
iOS Security
iOS SecurityiOS Security
iOS Security
 
Java Deserialization Vulnerabilities - The Forgotten Bug Class
Java Deserialization Vulnerabilities - The Forgotten Bug ClassJava Deserialization Vulnerabilities - The Forgotten Bug Class
Java Deserialization Vulnerabilities - The Forgotten Bug Class
 
Rust: Systems Programming for Everyone
Rust: Systems Programming for EveryoneRust: Systems Programming for Everyone
Rust: Systems Programming for Everyone
 
ACRiウェビナー:岩渕様ご講演資料
ACRiウェビナー:岩渕様ご講演資料ACRiウェビナー:岩渕様ご講演資料
ACRiウェビナー:岩渕様ご講演資料
 
Linux KVM のコードを追いかけてみよう
Linux KVM のコードを追いかけてみようLinux KVM のコードを追いかけてみよう
Linux KVM のコードを追いかけてみよう
 
Glibc malloc internal
Glibc malloc internalGlibc malloc internal
Glibc malloc internal
 
Blackhat USA 2016 - What's the DFIRence for ICS?
Blackhat USA 2016 - What's the DFIRence for ICS?Blackhat USA 2016 - What's the DFIRence for ICS?
Blackhat USA 2016 - What's the DFIRence for ICS?
 

Similar a Crash Analysis with Reverse Taint

Breaking av software
Breaking av softwareBreaking av software
Breaking av software
Joxean Koret
 
Breaking Antivirus Software
Breaking Antivirus SoftwareBreaking Antivirus Software
Breaking Antivirus Software
rahmanprojectd
 
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of TryingShowing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Dan Kaminsky
 
What
WhatWhat
What
anity
 

Similar a Crash Analysis with Reverse Taint (20)

nullcon 2011 - Reversing MicroSoft patches to reveal vulnerable code
nullcon 2011 - Reversing MicroSoft patches to reveal vulnerable codenullcon 2011 - Reversing MicroSoft patches to reveal vulnerable code
nullcon 2011 - Reversing MicroSoft patches to reveal vulnerable code
 
DEFCON 21: EDS: Exploitation Detection System WP
DEFCON 21: EDS: Exploitation Detection System WPDEFCON 21: EDS: Exploitation Detection System WP
DEFCON 21: EDS: Exploitation Detection System WP
 
Breaking Antivirus Software - Joxean Koret (SYSCAN 2014)
Breaking Antivirus Software - Joxean Koret (SYSCAN 2014)Breaking Antivirus Software - Joxean Koret (SYSCAN 2014)
Breaking Antivirus Software - Joxean Koret (SYSCAN 2014)
 
Breaking av software
Breaking av softwareBreaking av software
Breaking av software
 
Breaking av software
Breaking av softwareBreaking av software
Breaking av software
 
Breaking Antivirus Software
Breaking Antivirus SoftwareBreaking Antivirus Software
Breaking Antivirus Software
 
Malware 101 by saurabh chaudhary
Malware 101 by saurabh chaudharyMalware 101 by saurabh chaudhary
Malware 101 by saurabh chaudhary
 
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of TryingShowing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
 
Call Graph Agnostic Malware Indexing (EuskalHack 2017)
Call Graph Agnostic Malware Indexing (EuskalHack 2017)Call Graph Agnostic Malware Indexing (EuskalHack 2017)
Call Graph Agnostic Malware Indexing (EuskalHack 2017)
 
What
WhatWhat
What
 
PVS-Studio Static Analyzer as a Tool for Protection against Zero-Day Vulnerab...
PVS-Studio Static Analyzer as a Tool for Protection against Zero-Day Vulnerab...PVS-Studio Static Analyzer as a Tool for Protection against Zero-Day Vulnerab...
PVS-Studio Static Analyzer as a Tool for Protection against Zero-Day Vulnerab...
 
Malware Classification Using Structured Control Flow
Malware Classification Using Structured Control FlowMalware Classification Using Structured Control Flow
Malware Classification Using Structured Control Flow
 
Debugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programsDebugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programs
 
A Smart Fuzzing Approach for Integer Overflow Detection
A Smart Fuzzing Approach for Integer Overflow DetectionA Smart Fuzzing Approach for Integer Overflow Detection
A Smart Fuzzing Approach for Integer Overflow Detection
 
Attacking antivirus
Attacking antivirusAttacking antivirus
Attacking antivirus
 
Parallel Lint
Parallel LintParallel Lint
Parallel Lint
 
Introduction into Fault-tolerant Distributed Algorithms and their Modeling (P...
Introduction into Fault-tolerant Distributed Algorithms and their Modeling (P...Introduction into Fault-tolerant Distributed Algorithms and their Modeling (P...
Introduction into Fault-tolerant Distributed Algorithms and their Modeling (P...
 
44CON 2014 - Breaking AV Software
44CON 2014 - Breaking AV Software44CON 2014 - Breaking AV Software
44CON 2014 - Breaking AV Software
 
How to find 56 potential vulnerabilities in FreeBSD code in one evening
How to find 56 potential vulnerabilities in FreeBSD code in one eveningHow to find 56 potential vulnerabilities in FreeBSD code in one evening
How to find 56 potential vulnerabilities in FreeBSD code in one evening
 
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesCyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on Examples
 

Último

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Crash Analysis with Reverse Taint

  • 1. Crash Analysis with Reverse Taint Powered by Taintgrind Marek Zmysłowski V. 12122019
  • 2. whoami Security Researcher @ Interested in fuzzing and vulnerability finding Fan of The Matrix and Hacker movies Co-organizer of “H4x0r5 %40 Warsaw” meetings
  • 3. Crash Analysis with Reverse Taint
  • 4. Crash Analysis with Reverse Taint - What is a crash? Crash, or system crash, occurs when a computer program (...) stops functioning properly and exits. * An application typically crashes when it performs an operation that is not allowed by the operating system. The operating system then triggers an exception or signal in the application. Unix applications traditionally responded to the signal by dumping core. * - Why is it important to identify a crash? “performs an operation that is not allowed”. This indicate that inside the application a bug exists. When the crash is identified and recurrent the bug can be found. *https://en.wikipedia.org/wiki/Crash_(computing)
  • 5. Crash Analysis with Reverse Taint - How to identify a crash? “operating system then triggers an exception or signal” Different operating systems contain different mechanisms to “collect” crashes. Some perform core dump, store the memory (Linux), other attach the debugger to allow user the debugging session with the crashed application (Windows). - How do we get crashes? “That, Detective, is the ‘right question.’” “I, Robot” (2004)
  • 6. Hunting in the wild - How can a crash be found? By accident or by one of the most popular techniques that we will be also mentioned here, fuzzing. The sooner the bug is found in the production process, the less the costs are. - So what is ‘fuzzing’? The idea behind fuzzing is very simple. Let’s take an malformed input and feed it to the application. Maybe it will crash. Of course, how the input is “chosen” and how the crashes are caught is a topic for another presentation(s).
  • 7. Fuzzers The godfather of all is, of course, AFL. However, recent years brought multiple fuzzers used for different purposes. Some of them are different clones of AFL, some of them try to do things differently. Everyone can find something for themselves. American Fuzzy Lop HonggFuzz AFL++ Angora QSYM WinAFL
  • 8. Real Example - Fuzzing jhead is used to display and manipulate data contained in the Exif header of JPEG images from digital cameras. By default, jhead displays the more useful camera settings from the file in a user-friendly format. The version used here is 3.03. http://www.sentex.net/~mwandel/jhead/
  • 9. Real Example - Fuzzing
  • 11. Crash vs Bug - What is the difference between Crash and Bug? A crash is the result of incorrectly working code caused by a bug. Sometimes the place of the crash and the place of the bug are “the same”. And sometimes not ... - Is one crash caused by one bug? No.
  • 12. Crash vs Bug Case 1. One bug causes one crash. This is the easiest situation as the identification is straightforward.
  • 13. Crash vs Bug Case 2. One bug can cause a few crashes. This happens quite often, especially with simple buffer overflows where a “size” variable is used. Direct reads or writes that access different memory regions cause different crashes.
  • 14. Crash vs Bug Case 3. A few bugs can cause one crash. This depends on how we identify a crash. The simplest example is a frame processor. For different frame types, the size parser works incorrectly and may cause different crashes on different paths.
  • 15. Crash vs Bug In an additional experiment we computed a portion of ground truth. We applied all patches to cxxfilt from the version we fuzzed up until the present. We grouped together all inputs that a particular patch caused to now gracefully exit [11], confirming that the patch represented a single conceptual bugfix. We found that all 57,142 crashing inputs deemed “unique” by coverage profiles were addressed by 9 distinct patches. Stack hashes did better, but still over-counted bugs. Instead of the bug mapping to, say 500 AFL coverage-unique crashes in a given trial, it would map to about 46 stack hashes, on average. Stack hashes were also subject to false negatives: roughly 16% of hashes for crashes from one bug were shared by crashes from another bug. In five cases, a distinct bug was found by only one crash, and that crash had a non-unique hash, meaning that evidence of a distinct bug would have been dropped by “de-duplication.” “Evaluating Fuzz Testing” https://arxiv.org/pdf/1808.09700.pdf
  • 16. Crash vs Bug So what is needed?
  • 17. Crash Analysis with Reverse Taint
  • 18. Crash Analysis with Reverse Taint - What is crash analysis and why do we need that? Crash analysis is the process of evaluating the exploitability of a crash and identifying its root cause. If you are fuzzing something, the number of crashes can be huge. Also, the impact and consequences (criticality) a bug might have depend on the application technology and the system. - So what exactly do we analyze? There are two major things to analyze: the crash and the bug.
  • 19. Analysis - Crash - What type of crash is it? For example: Out-of-bounds read, NULL Pointer Dereference, Buffer Overflow, etc. - Is the crash exploitable? Part of the identification process is to find out whether the crash can be used to achieve something more than just crashing the application: reading a piece of data, overwriting memory or executing code. - Critical or exploitable - what is the difference? Exploitability relates only to the bug and the crash itself. Criticality relates to the whole environment. A Safe NULL Pointer Dereference means something different in nuclear power plant software than in a kids' game.
  • 20. Analysis - Bug The second important part of the analysis is to identify the bug. As mentioned before, there can be different relations between bug and crash. It is also important to know how inputs (which bits and bytes) correlate with the bug. This of course may influence the crash later and its exploitability. - What different relations are there? User data can control the crash directly (e.g. an offset inside a table is calculated based on user data) or indirectly (e.g. the incorrect branch is taken). For example: NULL Pointer Dereference vs Safe NULL Pointer Dereference
  • 21. Analysis - Bug (Direct vs. Indirect) void *pointer = NULL; char table[100]; int index = 0; char user_data; Indirect: if (user_data > 100) pointer[index] = 0; Direct: table[user_data] = 0;
  • 22. Analysis - Few Interesting Tools crashwalk (https://github.com/bnagy/crashwalk) - cwtriage: It runs crash files with instrumentation and outputs results in various formats. - cwdump: It summarizes crashes in a crashwalk database by major / minor stack hash. Although AFL (for example) already de-dupes crashes, bucketing summarizes those crashes by an order of magnitude or more. Crashes that bucket the same have exactly the same stack contents, so they're likely (not guaranteed) to be the same bug. - cwfind: It is a simple utility to output the filenames of all crashes matching a given hash. I use it in combination with xargs to bulk delete / move crash files.
  • 23. Analysis - Few Interesting Tools afl-utils (https://github.com/rc0r/afl-utils) - afl-collect: Copies all crash sample files from an afl synchronisation directory (used by multiple afl instances when run in parallel) into a single location providing easy access for further crash analysis. Also executes exploitable on them and removes uninteresting crashes. - afl-minimize: Helps to create a minimized corpus from samples of a parallel fuzzing job.
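The stack-hash bucketing described on this slide can be sketched in a few lines. This is only an illustration of the idea, not crashwalk's actual algorithm: the `depth` parameter, the SHA-1 choice and the frame names in the sample data are assumptions made for the example.

```python
import hashlib

def stack_hash(frames, depth=None):
    """Hash the top `depth` frame names (all frames when depth is None)."""
    top = frames if depth is None else frames[:depth]
    return hashlib.sha1("|".join(top).encode()).hexdigest()[:12]

def bucket_crashes(crashes, depth=3):
    """Group crash file names by the hash of their top `depth` frames."""
    buckets = {}
    for name, frames in crashes.items():
        buckets.setdefault(stack_hash(frames, depth), []).append(name)
    return buckets

# Hypothetical backtraces, innermost frame first.
crashes = {
    "id_000": ["memcpy", "ProcessExifDir", "process_EXIF", "main"],
    "id_001": ["memcpy", "ProcessExifDir", "process_EXIF", "main"],
    "id_002": ["strlen", "ShowImageInfo", "main"],
}
buckets = bucket_crashes(crashes)
for digest, names in buckets.items():
    print(digest, names)
```

Two crashes with the same top frames land in one bucket, which is why bucketing is a hint about shared root cause rather than proof of it.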
  • 26. Analysis - Few Interesting Tools afl-analyze It takes an input file, attempts to sequentially flip bytes, and observes the behavior of the tested program. It then color-codes the input based on which sections appear to be critical, and which are not. While not bulletproof, it can often offer quick insights into complex file formats. https://lcamtuf.blogspot.com/2016/02/say-hello-to-afl-analyze.html
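The byte-flipping idea behind afl-analyze can be sketched as follows. This is a simplified model, not AFL's implementation: the real tool runs an instrumented binary and uses several mutation patterns, while here `run_target` is a hypothetical Python callable and a single XOR with 0xFF is used.

```python
def classify_bytes(data, run_target):
    """Mark each byte as critical if flipping it changes the target's behavior."""
    baseline = run_target(data)
    critical = []
    for i in range(len(data)):
        mutated = bytearray(data)
        mutated[i] ^= 0xFF  # flip every bit of byte i
        critical.append(run_target(bytes(mutated)) != baseline)
    return critical

def toy_parser(data):
    """Hypothetical target: returns a label describing the path taken."""
    if data[:2] != b"\xff\xd8":
        return "bad-magic"
    return "long" if data[2] > 0x80 else "short"

sample = b"\xff\xd8\x20AAAA"
flags = classify_bytes(sample, toy_parser)
print(flags)  # [True, True, True, False, False, False, False]
```

The magic bytes and the length byte are flagged, while the payload bytes are not. A single flip pattern can miss bytes whose effect depends on a threshold, which is one reason the results are not always the expected ones.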
  • 27. afl-analyze *Of course it works, it just does not always give expected results.
  • 28. Crash Analysis with Reverse Taint
  • 29. Tainting - What is tainting? The purpose of dynamic taint analysis is to track information flow between sources and sinks. Any program value whose computation depends on data derived from a taint source is considered tainted. Any other value is considered untainted. - What are the types of tainting? ● The direct value is tainted ● Indirect/Control flow ● Address/Pointer relation https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf
  • 30. Tainting - Types Direct Value: Y = X + 2. Indirect/Control Flow: if (X > 2) Y = 5 else Y = 10. Address/Pointer: Y = A[X].
  • 31. Tainting Propagation (Policy) Depending on the application, different rules can be used to propagate the taint. The policy is the set of rules that define how taint on the source operands is propagated to the destination. In standard taint analysis, the destination operand is typically marked as tainted if any of the source operands is tainted, regardless of how the specific semantics of the instruction affects its destination operands.
  • 32. Tainting - Issues - What is over-tainting? Overtainting occurs when code or data identified by the analysis as tainted is not in fact influenced by any taint source (false-positive). - What is under-tainting? Under-tainting occurs when code or data that is influenced by a taint source is not identified by the analysis as tainted. Such imprecision can be problematic, especially in systems where the result of the taint analysis is critically important (false-negative).
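A minimal sketch of the standard propagation policy, and of how it produces both kinds of imprecision, might look like this. Taint is modelled as a set of input byte offsets; the function name and the example values are assumptions made for illustration.

```python
def propagate(*sources):
    """Standard policy: the destination carries the union of all source taints."""
    return frozenset().union(*sources)

x_taint = frozenset({4})   # x was read from input byte 4
clean = frozenset()        # a constant carries no taint

# Direct value: y = x + 2 depends on x, so y is tainted.
sum_taint = propagate(x_taint, clean)

# Over-tainting: z = x ^ x is always 0, yet the policy ignores the
# semantics of the operation and still marks z as tainted.
xor_taint = propagate(x_taint, x_taint)

# Under-tainting: in `if x > 2: w = 5 else: w = 10` only constants are
# assigned, so a data-flow-only policy leaves w untainted although x
# decides its value.
w_taint = propagate(clean)

print(sorted(sum_taint), sorted(xor_taint), sorted(w_taint))  # [4] [4] []
```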
  • 34. - What is Valgrind?
  • 35. Valgrind It is an instrumentation framework for building dynamic analysis tools. It comes with a set of tools each of which performs some kind of debugging, profiling, or similar task that helps you improve your programs. Valgrind's architecture is modular, so new tools can be created easily and without disturbing the existing structure. http://valgrind.org
  • 36. Valgrind IR Originally, Valgrind had an x86-specific, part D&R, part C&A, assembly-code-like IR in which the units of translation were basic blocks. Since then Valgrind has had an architecture-neutral, D&R, single-static-assignment (SSA) IR that is more similar to what might be used in a compiler. IR blocks are superblocks: single-entry, multiple-exit stretches of code. *http://valgrind.org/docs/valgrind2007.pdf
  • 37. Single-Static-Assignment (SSA) It is a property of an intermediate representation (IR), which requires that each variable is assigned exactly once, and every variable is defined before it is used. Existing variables in the original IR are split into versions, new variables typically indicated by the original name with a subscript in textbooks, so that every definition gets its own version. In SSA form, use-def chains are explicit and each contains a single element. *https://en.wikipedia.org/wiki/Static_single_assignment_form
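The versioning described above can be illustrated with a small renamer for straight-line code. This is a sketch under strong assumptions: statements have the form `dest = expr`, there is no control flow (so no phi nodes are needed), and it has nothing to do with how Valgrind builds its IR.

```python
import re

def to_ssa(statements):
    """Rename variables in straight-line `dest = expr` statements into SSA form."""
    version = {}
    out = []
    for stmt in statements:
        dest, expr = [part.strip() for part in stmt.split("=", 1)]

        def latest(match):
            # A use refers to the latest version of an already-defined variable.
            name = match.group(0)
            return f"{name}{version[name]}" if name in version else name

        expr = re.sub(r"[A-Za-z_]\w*", latest, expr)
        # Every definition gets a fresh version number.
        version[dest] = version.get(dest, 0) + 1
        out.append(f"{dest}{version[dest]} = {expr}")
    return out

ssa = to_ssa(["x = 1", "x = x + 2", "y = x * x"])
print(ssa)  # ['x1 = 1', 'x2 = x1 + 2', 'y1 = x2 * x2']
```

Because every name is defined exactly once, finding the definition that feeds a given use is a direct lookup, which is what makes walking a taint log backwards practical.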
  • 38. - What is Taintgrind?
  • 39. Taintgrind Taintgrind is based on Valgrind's MemCheck and Flayer plugin. Taintgrind borrows the bit-precise shadow memory from MemCheck and only propagates explicit data flow. This means that Taintgrind will not propagate taint in control structures such as if-else, for-loops and while-loops. Taintgrind will also not propagate taint in dereferenced tainted pointers. http://valgrind.org/docs/memcheck2005.pdf
  • 40. Taintgrind - Propagation Rules 1. The direct value is tainted 2. Indirect/Control flow 3. Address/Pointer relation
  • 41. Taintgrind - Propagation Rules - What are the Taintgrind propagation rules? The granularity for memory operations is 1 byte. For register operations it is the size of the operand. Even if only one byte is used there, the whole register will still be tainted. In such a case, Taintgrind is over-tainting. However, because Taintgrind handles only the first type (direct value), it is also under-tainting.
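The effect of mixing the two granularities can be shown with a toy shadow state. This is a simplified model of the rule stated on the slide, not Taintgrind's shadow memory implementation; the class, the register name and the addresses are assumptions made for the example.

```python
class Shadow:
    """Toy shadow state: one taint flag per memory byte, one per whole register."""

    def __init__(self):
        self.mem = {}  # address -> bool
        self.reg = {}  # register name -> bool

    def load(self, reg, addr, size):
        # The whole register becomes tainted if any loaded byte is tainted.
        self.reg[reg] = any(self.mem.get(addr + i, False) for i in range(size))

    def store(self, addr, reg, size):
        # Every stored byte inherits the register's single flag.
        for i in range(size):
            self.mem[addr + i] = self.reg.get(reg, False)

shadow = Shadow()
shadow.mem[0x1000] = True          # only one input byte is tainted
shadow.load("eax", 0x1000, 4)      # 4-byte load taints all of eax
shadow.store(0x2000, "eax", 4)     # 4-byte store taints four memory bytes
tainted = [hex(a) for a in sorted(shadow.mem) if shadow.mem[a]]
print(tainted)  # ['0x1000', '0x2000', '0x2001', '0x2002', '0x2003']
```

One tainted byte has become four after a round trip through a register, which is the over-tainting the slide refers to.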
  • 42. Taintgrind - Propagation Rules WRITE READ Overtainting bit-byte operation
  • 43. Taintgrind Here is an example of the logs and how the taint is propagated over the file. The job is to find all the paths from the end of the file back to the beginning. One instruction can be tainted by multiple inputs.
  • 44. Taintgrind The original Taintgrind was not useful for the purpose of the reverse taint. It was missing a few parts. - What was changed? The “Read” function was not showing the size of data that was read. The “Load” and “Store” functions were also not presenting the size of the operation.
  • 46. GDB Function where the crash occurred Instruction that caused the crash
  • 47. GDB The crash occurred with the reference to the address stored in the RAX register
  • 49. Taintgrind /work/taint-analysis/valgrind-3.15.0/build/bin/valgrind --tool=taintgrind: calls the Taintgrind tool. --file-filter=/work/taint-analysis/CRASH: the name of the file that needs to be tainted; it must be the FULL path. --compact=yes: makes the log file smaller. --taint-start=0: offset inside the file. --taint-len=1504: taint size. /work/taint-analysis/jhead-3.03/jhead /work/taint-analysis/CRASH: the command to run.
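When many crash files have to be processed, the command from this slide can be assembled programmatically. The sketch below only builds the argument list from the options shown on the slide; the helper name and its parameters are assumptions, and it does not run anything.

```python
import os

def taintgrind_argv(valgrind, target, crash_file, taint_len, taint_start=0):
    """Build the Taintgrind command line for one crash file (hypothetical helper)."""
    crash_path = os.path.abspath(crash_file)  # --file-filter needs the full path
    return [
        valgrind,
        "--tool=taintgrind",
        f"--file-filter={crash_path}",
        "--compact=yes",
        f"--taint-start={taint_start}",
        f"--taint-len={taint_len}",
        target,
        crash_path,
    ]

argv = taintgrind_argv(
    "/work/taint-analysis/valgrind-3.15.0/build/bin/valgrind",
    "/work/taint-analysis/jhead-3.03/jhead",
    "/work/taint-analysis/CRASH",
    taint_len=1504,
)
print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run`, with `taint_len` set to the size of each crash file so that the whole input is tainted.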
  • 50. Crash Analysis with Reverse Taint
  • 51. Reverse Tainting the Value An example how the values are tracked.
  • 52. Reverse Tainting the Value Parts of one tainted variable diagram
  • 54. rtaint -f: the name of the log file created by Taintgrind. It can be in the compact version. -g: the script can also produce a file in dot format used to generate a graph. -s: the name of the file with the slice. Later, this can be used to display which operations were tainted with the values. -k: the directory path where the KaiTai struct files will be stored.
  • 55. Reverse Tainting the Value These are the indexes from the file that are causing the crash.
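The backward walk that recovers those indexes can be sketched as below. The record format is a deliberately simplified stand-in and not the real Taintgrind log syntax: each record is a hypothetical `(dest, sources)` pair, named values are strings, and integers denote input file offsets.

```python
def reverse_taint(records):
    """Walk the log backwards from the crashing operation (the last record)
    and return the input offsets that reach it."""
    _, crash_sources = records[-1]
    wanted = set(crash_sources)
    for dest, sources in reversed(records[:-1]):
        if dest in wanted:
            # Replace the value by whatever it was computed from.
            wanted.discard(dest)
            wanted.update(sources)
    return sorted(s for s in wanted if isinstance(s, int))

records = [
    ("t1", [18]),            # t1 <- input byte 18
    ("t2", [19]),            # t2 <- input byte 19
    ("t3", ["t1", "t2"]),    # t3 combines both bytes
    ("t4", [40]),            # unrelated tainted value
    ("rax", ["t3"]),         # rax <- t3
    ("crash", ["rax"]),      # faulting access through rax
]
offsets = reverse_taint(records)
print(offsets)  # [18, 19]
```

Byte 40 is tainted in the log but never reaches the crashing access, so it is not reported.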
  • 57. Reverse Tainting the Value - With or without the file size? What is the probability that files of different sizes with the same KaiTai Struct will have different root causes? - What is the relation between AFL Unique Algorithm and the Tainted Input? It is an open question...
  • 58. Reverse Tainting the Value - Results 413 total crashes found by 4 instances (1 master and 3 slaves) - master - 44 crashes - slave1 - 116 crashes - slave2 - 124 crashes - slave3 - 129 crashes 349 crashes were reproduced under Taintgrind 177 crashes had unique KaiTai structure.
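Counting crashes with a unique structure amounts to grouping them by a signature. The sketch below uses the set of tainted input offsets as that signature, which is a simplified stand-in and not how rtaint compares its KaiTai struct output; the crash names and offsets are made up for the example.

```python
from collections import defaultdict

def group_by_offsets(crash_offsets):
    """Group crash names by the sorted set of input offsets that reach the crash."""
    groups = defaultdict(list)
    for name, offsets in crash_offsets.items():
        groups[tuple(sorted(set(offsets)))].append(name)
    return dict(groups)

crash_offsets = {
    "id_000": [18, 19],
    "id_001": [19, 18, 18],
    "id_002": [40, 41, 42, 43],
}
groups = group_by_offsets(crash_offsets)
print(len(groups), "unique signatures for", len(crash_offsets), "crashes")
```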
  • 59. Slicing It is the computation of the set of program statements, the program slice, that may affect the values at some point of interest, referred to as a slicing criterion.* https://en.wikipedia.org/wiki/Program_slicing
  • 60. Graph Taintgrind and rtaint make it possible to create a dot graph that can be rendered with the Graphviz package.
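Producing DOT text from a list of dependencies takes only a few lines. This sketch is not rtaint's output format: the `(dest, sources)` records and the `in[N]` node names are assumptions made for the example.

```python
def to_dot(records, name="taint"):
    """Emit a Graphviz digraph with one edge per source -> destination pair."""
    lines = [f"digraph {name} {{"]
    for dest, sources in records:
        for src in sources:
            lines.append(f'  "{src}" -> "{dest}";')
    lines.append("}")
    return "\n".join(lines)

records = [
    ("t1", ["in[18]"]),
    ("t2", ["in[19]"]),
    ("t3", ["t1", "t2"]),
    ("rax", ["t3"]),
]
dot_text = to_dot(records)
print(dot_text)
```

Saved to a file such as `taint.dot`, the text can be rendered with `dot -Tpng taint.dot -o taint.png`.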
  • 62. Crash Analysis with Reverse Taint Powered by Taintgrind
  • 63. Powered by ... Free: Moflow - https://github.com/Cisco-Talos/moflow/tree/master/BAP-0.7-moflow Triton - https://triton.quarkslab.com/ BARF - https://github.com/programa-stic/barf-project libdft - http://www.cs.columbia.edu/~vpk/research/libdft/ Commercial: Binary Ninja - https://blog.trailofbits.com/2019/08/29/reverse-taint-analysis-using-binary-ninja/ TETRANE - https://www.tetrane.com/
  • 64. What Next? Updates - Address: The tainting starts from the last line inside the file. This is useful when there is a crash, but there is no way to taint an arbitrary instruction if the application doesn’t crash. - Scripts: An IDA Pro/Ghidra/Binary Ninja script for highlighting the tainted instructions. This will help to easily identify the data flow. - Speed: The way it is currently written makes it slow. Optimization or a language change (thinking about Rust) is required.
  • 65. What Next? Issues Currently Taintgrind doesn't work on ARM processors. This is caused by Valgrind itself, which is missing some of the ARM conversions. The bug has already been reported.
  • 66. Summary The solution is based on Valgrind/Taintgrind. This means that it supports all the systems supported by Valgrind itself (+) But it also suffers from the Valgrind issues (-) The process of creating the taint log is time consuming (-) rtaint can be used in most cases, making the analysis “faster” and automated. Easy to incorporate into other tools. (+) Python may not be the best solution for rtaint. Too slow? (-) It requires more testing on real-life applications. I’m happy to receive any feedback :) (+)
  • 67. References and Interesting Docs https://github.com/wmkhoo/taintgrind http://valgrind.org/ http://valgrind.org/docs/memcheck2005.pdf https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf https://www2.cs.arizona.edu/~debray/Publications/bit-level-taint.pdf http://bitblaze.cs.berkeley.edu/papers/dta%2B%2B-ndss11.pdf http://shell-storm.org/blog/Taint-analysis-and-pattern-matching-with-Pin/ https://www.blackhat.com/docs/eu-15/materials/eu-15-Kim-Triaging-Crashes-With-Backward-Taint-Analysis-For-ARM-Architecture.pdf