KEVIN BACKHOUSE
With the increasing awareness and adoption of DevSecOps, organisations are beginning to fully understand the crucial role that security plays, integrating it into every part of the development and deployment process, from start to finish. New processes such as vulnerability disclosures & bug bounty programs, red team exercises, pen-testing initiatives and static & dynamic code analysis are putting security front and center. These initiatives are proving to be an incredible source for discovering previously unknown vulnerabilities, and fixes are generally implemented and deployed pretty quickly. However, this response is often not quite enough.
In software development, we frequently see the same logical coding mistakes being made repeatedly over the course of a project’s lifetime, and sometimes across multiple projects. Sometimes there are a number of simultaneously active instances of these mistakes, and sometimes there’s only ever one active instance at a time, but it keeps reappearing. When these mistakes lead to security vulnerabilities, the consequences can be severe.
With each vulnerability discovered or reported, if the root cause was a bug in the code, we’re presented with an opportunity to investigate how often this mistake is repeated, whether there are any other unknown vulnerabilities as a result, and to implement a process to prevent it from reappearing. In this talk, I’ll be introducing Variant Analysis, a process for doing just this, and discussing how it can be integrated into your development and security operations. I’ll also be sharing real-world stories of what has happened when variant analysis was neglected, as well as stories of when it’s saved the day.
DevSecCon London 2018: Variant Analysis – A critical step in handling vulnerabilities
1. LONDON 18-19 OCT 2018
Variant Analysis – A critical step in handling vulnerabilities
Kevin Backhouse
Sam Lanning
2. LONDON 18-19 OCT 2018
Variant Analysis: who is it for?
• Organizations that develop their own software
• The software is security or safety critical
• Primary use case: incident response
7. LONDON 18-19 OCT 2018
Apache Struts 2 OGNL injections
S2-008 Johannes Dahse, Bruce Phillips
S2-032 / CVE-2016-3081 Nike Zheng
S2-033 / CVE-2016-3087 Alvaro Munoz
S2-037 / CVE-2016-4438 Chao Jack PKAV_香草, Shinsaku Nomura
S2-045 / CVE-2017-5638 Nike Zheng
S2-046 / CVE-2017-5638 Chris Frohoff, Nike Zheng, Alvaro Munoz
S2-057 / CVE-2018-11776 Man Yue Mo (my colleague!)
8. LONDON 18-19 OCT 2018
Apple packet-mangler (CVE-2017-13904, CVE-2018-4249)
9. LONDON 18-19 OCT 2018
packet-mangler.c (macOS 10.13)
• Two bugs
• Infinite loop
• Stack buffer overflow
• Both remotely triggerable (if packet-mangler is enabled)
10. LONDON 18-19 OCT 2018
packet-mangler.c (macOS 10.13)
while (tcp_optlen) {
    if (tcp_opt_buf[i] == 0x1) {
        PKT_MNGLR_LOG(LOG_INFO, "Skipping NOP\n");
        tcp_optlen--;
        i++;
        continue;
    } else if ((tcp_opt_buf[i] != 0) && (tcp_opt_buf[i] != TCP_OPT_MULTIPATH_TCP)) {
        PKT_MNGLR_LOG(LOG_INFO, "Skipping option %x\n", tcp_opt_buf[i]);
        tcp_optlen -= tcp_opt_buf[i+1];
        i += tcp_opt_buf[i+1];
        continue;
    } else if (tcp_opt_buf[i] == TCP_OPT_MULTIPATH_TCP) {
        int j = 0;
        int mptcpoptlen = tcp_opt_buf[i+1];
        …
        for (; j < mptcpoptlen; j++) {
            if (p_pkt_mnglr->proto_action_mask &
                PKT_MNGLR_TCP_ACT_NOP_MPTCP) {
                tcp_opt_buf[i+j] = 0x1;
            }
        }
        tcp_optlen -= mptcpoptlen;
        i += mptcpoptlen;
    } else {
        tcp_optlen--;
        i++;
    }
}
• mptcpoptlen: attacker controlled; could be any value from -128 to 127
• Out-of-bounds write if mptcpoptlen is large
• Loops until tcp_optlen == 0; tcp_optlen grows if mptcpoptlen < 0 and goes negative if mptcpoptlen > tcp_optlen
11. LONDON 18-19 OCT 2018
packet-mangler.c (macOS 10.13.2)
while (tcp_optlen > 0) {
    if (tcp_opt_buf[i] == 0x1) {
        PKT_MNGLR_LOG(LOG_INFO, "Skipping NOP\n");
        tcp_optlen--;
        i++;
        continue;
    } else if ((tcp_opt_buf[i] != 0) && (tcp_opt_buf[i] != TCP_OPT_MULTIPATH_TCP)) {
        PKT_MNGLR_LOG(LOG_INFO, "Skipping option %x\n", tcp_opt_buf[i]);
        tcp_optlen -= tcp_opt_buf[i+1];
        i += tcp_opt_buf[i+1];
        continue;
    } else if (tcp_opt_buf[i] == TCP_OPT_MULTIPATH_TCP) {
        int j = 0;
        unsigned char mptcpoptlen = tcp_opt_buf[i+1];
        ...
        for (; j < mptcpoptlen && j < tcp_optlen; j++) {
            if (p_pkt_mnglr->proto_action_mask &
                PKT_MNGLR_TCP_ACT_NOP_MPTCP) {
                tcp_opt_buf[i+j] = 0x1;
            }
        }
        tcp_optlen -= mptcpoptlen;
        i += mptcpoptlen;
    } else {
        tcp_optlen--;
        i++;
    }
}
• while (tcp_optlen > 0): don’t allow negative values
• tcp_opt_buf[i+1] in the "Skipping option" branch: attacker controlled; could be zero
• unsigned char mptcpoptlen: cannot be negative
• j < tcp_optlen: bounds check
12. LONDON 18-19 OCT 2018
packet-mangler.c (macOS 10.13.5)
while (tcp_optlen > 0) {
    if (tcp_opt_buf[i] == 0x1) {
        PKT_MNGLR_LOG(LOG_INFO, "Skipping NOP\n");
        tcp_optlen--;
        i++;
        continue;
    } else if ((tcp_opt_buf[i] != 0) && (tcp_opt_buf[i] != TCP_OPT_MULTIPATH_TCP)) {
        PKT_MNGLR_LOG(LOG_INFO, "Skipping option %x\n", tcp_opt_buf[i]);
        /* Minimum TCP option size is 2 */
        if (tcp_opt_buf[i+1] < 2) {
            PKT_MNGLR_LOG(LOG_ERR, "Received suspicious TCP option");
            goto drop_it;
        }
        tcp_optlen -= tcp_opt_buf[i+1];
        i += tcp_opt_buf[i+1];
        continue;
    } else if (tcp_opt_buf[i] == TCP_OPT_MULTIPATH_TCP) {
        int j = 0;
        unsigned char mptcpoptlen = tcp_opt_buf[i+1];
        ...
        for (; j < mptcpoptlen && j < tcp_optlen; j++) {
            if (p_pkt_mnglr->proto_action_mask &
                PKT_MNGLR_TCP_ACT_NOP_MPTCP) {
                tcp_opt_buf[i+j] = 0x1;
            }
        }
• if (tcp_opt_buf[i+1] < 2): bounds check
13. LONDON 18-19 OCT 2018
packet-mangler.c (macOS 10.13)
• Two bugs
• Infinite loop
• Stack buffer overflow
• Both remotely triggerable (if packet-mangler is enabled)
14. LONDON 18-19 OCT 2018
packet-mangler.c (macOS 10.13.2)
int i = 0;
tcp_optlen = (tcp.th_off << 2)-sizeof(struct tcphdr);
PKT_MNGLR_LOG(LOG_INFO, "Packet from F5 is TCP\n");
PKT_MNGLR_LOG(LOG_INFO, "Optlen: %d\n", tcp_optlen);
orig_tcp_optlen = tcp_optlen;
if (orig_tcp_optlen) {
    error = mbuf_copydata(*data, offset+sizeof(struct tcphdr), orig_tcp_optlen, tcp_opt_buf);
    if (error) {
        PKT_MNGLR_LOG(LOG_ERR, "Failed to copy tcp options");
        goto input_done;
    }
}
while (tcp_optlen > 0) {
    if (tcp_opt_buf[i] == 0x1) {
        ...
• tcp.th_off: user controlled (could be zero)
• tcp_optlen: could be negative
• Length argument of mbuf_copydata: implicit cast to size_t could overflow negatively
• Unlimited amount of user-controlled data gets copied to the stack
15. LONDON 18-19 OCT 2018
packet-mangler.c (macOS 10.13.5)
int i = 0, off;
off = (tcp.th_off << 2);
if (off < (int) sizeof(struct tcphdr) || off > ip_pld_len) {
    PKT_MNGLR_LOG(LOG_ERR, "TCP header offset is wrong: %d", off);
    goto drop_it;
}
tcp_optlen = off - sizeof(struct tcphdr);
PKT_MNGLR_LOG(LOG_INFO, "Packet from F5 is TCP\n");
PKT_MNGLR_LOG(LOG_INFO, "Optlen: %d\n", tcp_optlen);
orig_tcp_optlen = tcp_optlen;
if (orig_tcp_optlen) {
    error = mbuf_copydata(*data, offset+sizeof(struct tcphdr), orig_tcp_optlen, tcp_opt_buf);
    if (error) {
        PKT_MNGLR_LOG(LOG_ERR, "Failed to copy tcp options: error %d offset %d optlen %d", error, offset, orig_tcp_optlen);
        goto input_done;
    }
}
while (tcp_optlen > 0) {
    if (tcp_opt_buf[i] == 0x1) {
        ...
• if (off < (int) sizeof(struct tcphdr) || off > ip_pld_len): bounds check
16. LONDON 18-19 OCT 2018
packet-mangler summary
• Multiple bugs found in 55 lines of code
• It took multiple tries to fix all the bugs:
• My initial PoC did not trigger all the bugs
• Apple only fixed the symptoms of the PoC
17. LONDON 18-19 OCT 2018
Reasons why bugs are rarely unique
1. Badly tested area of the codebase
2. Flawed design makes the code bug prone
3. Confusing API leads to errors
4. Bug duplication due to copy/paste
5. The responsible developer made similar mistakes elsewhere
Kev’s rule of thumb: #bugs > 100 * #vulns
[Diagram labels: vulns, bugs, co-located bugs, scattered bugs]
19. LONDON 18-19 OCT 2018
Techniques for discovering variants
0. Add a regression test
1. Code review:
• Thorough code review of the affected function/module
2. Add unit tests
• Check code coverage results
3. Fuzz testing
• Throwing random inputs at it might uncover other issues
• Use the known issue as a starting point
4. Check other code written by this developer
5. Search the code for similar patterns
20. LONDON 18-19 OCT 2018
Example of a dangerous coding pattern
librelp (rsyslog) CVE-2018-1000140
while(!bFoundPositiveMatch) { /* loop broken below */
    …
    iAllNames += snprintf(allNames+iAllNames, sizeof(allNames)-iAllNames,
                          "DNSname: %s; ", szAltName);
    …
}
• The output of snprintf is fed back into the size argument
21. LONDON 18-19 OCT 2018
Code as data
• Import source code into a database
• Write queries to find patterns
22. LONDON 18-19 OCT 2018
Michael Fanning: “A Microsoft DevSecOps Static Application Security Testing (SAST) Exercise”
https://blogs.msdn.microsoft.com/devops/2018/08/21/microsoft-devsecops-static-application-security-testing-sast-exercise/
23. LONDON 18-19 OCT 2018
kev@semmle.com @kevin_backhouse
sam@semmle.com @samlanning
lgtm.com
Editor's notes
Hi, my name is Kevin Backhouse. I am a security researcher at Semmle, focusing on C and C++ applications.
The proposal and abstract for this talk were originally written by my colleague Sam Lanning. But he is unfortunately double-booked today, so he asked me if I could give the talk instead.
Just to briefly explain my background. I am relatively new to the security field. I have only been doing security research since approximately last summer. Before that, I was a developer. I have spent most of my career as a compiler engineer. So there are large areas of the security field and the DevOps field that I know very little or even nothing about. So for this talk, I am just going to stick to the thing that I do know something about, which is how to find and fix bugs in software.
Ok, so who is this talk aimed at?
I am going to talk about finding bugs in software.
More specifically, I am going to talk about finding bugs in software that you wrote.
For example, this is not about figuring out whether people in your organization are running ancient versions of Internet Explorer that contain known vulnerabilities.
This is about finding and fixing vulnerabilities or safety issues that were created by your own developers.
Additionally, we are mainly talking here about software that is either security or safety critical.
For example, if the software is in any way exposed to the public internet or other potentially attacker-controlled input.
Or if the software is in some way safety critical. For example, you are developing software for self-driving cars, or something like that.
Before I start talking about variant analysis in general, I am going to start with a few examples. I want to show that there have been a lot of high profile cases in which the same bugs have kept reappearing over and over again. The goal of variant analysis is to try to solve that.
This first example is something that I saw on Twitter very recently. If you follow the trail of hyperlinks, what you see is a pretty incredible sequence of events.
If we follow the link from that tweet, we end up here. This is the bug tracker where Google Project Zero post the vulnerabilities that they have found.
You can see immediately from this comment that this is not the first bug that he has found in Ghostscript. He was reviewing the fix for a bug that he had reported previously and discovered that they hadn’t fixed it properly.
Something else that I just want to quickly highlight here is this comment about the 90 day disclosure deadline. Google Project Zero are pretty strict about this. If you don’t fix the vulnerability within 90 days, that’s just tough luck: they’re going to publish anyway. And this is pretty standard practice. Most security researchers don’t have an automated system like this that will automatically publish after a fixed period of time, but it is certainly common practice to set a reasonable deadline. So if you are on the receiving end of a bug report like this then there is usually time pressure involved. That’s a topic that I will return to later.
Ok, so let’s follow the trail to issue 1690. And yet again, we see a comment referring to a previous bug!
And if we click the link, the trail of misery continues! It just keeps on going.
I am going to stop here, but I think you get the picture.
Here’s another example. OGNL is a scripting language that is used by Apache Struts 2. It’s only supposed to be used inside the application, but over the years researchers have found numerous ways to pass OGNL into Struts by connecting to Struts with a specially crafted URL. You can see that the first of these bugs was found in 2012 and that they have kept popping up over the years. The most recent one was found by my colleague Mo, a few months ago.
This example is a bug that I found myself, so I am going to go into a bit more technical detail on this one. What I want to do is show you a very specific example of what the bugs were and how they weren’t fixed properly.
First though, I am going to show this video, which shows what the effect of the bug was.
So there were actually two distinct bugs in the same piece of code. I discovered the infinite loop bug first and the stack buffer overflow a little bit later. So I am going to explain the infinite loop bug first and move onto the buffer overflow later.
So returning to the slide that I showed you earlier, I said that there were two bugs. I have shown you the infinite loop bug, but what about the stack buffer overflow?
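The slide for macOS 10.13.2 flags the option-length calculation: the data offset comes from the packet, so the subtraction can go negative, and a negative int passed as a size_t length becomes a huge number. The following is a minimal sketch of that arithmetic, not the packet-mangler code itself. The constant TCP_HDR_SIZE (standing in for sizeof(struct tcphdr), assumed to be 20) and all three function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sizeof(struct tcphdr): the fixed TCP header
 * is assumed to be 20 bytes. */
#define TCP_HDR_SIZE 20

/* Mirrors the unchecked calculation. th_off is the data offset field from
 * the packet, counted in 32-bit words, so the sender chooses its value. */
static int unchecked_optlen(unsigned th_off)
{
    return (int)(th_off << 2) - TCP_HDR_SIZE;
}

/* What a copy routine that takes a size_t length would see. */
static size_t as_copy_length(int optlen)
{
    return (size_t)optlen;   /* a negative int wraps to a huge unsigned value */
}

/* Checked variant in the spirit of the later fix: reject offsets smaller
 * than the fixed header or larger than the payload. Returns -1 on error. */
static int checked_optlen(unsigned th_off, int payload_len)
{
    int off = (int)(th_off << 2);
    if (off < TCP_HDR_SIZE || off > payload_len)
        return -1;
    return off - TCP_HDR_SIZE;
}
```

With a data offset of zero, unchecked_optlen returns -20, which is non-zero and so passes an "if (optlen)" test, and as_copy_length turns it into a value close to SIZE_MAX. The checked variant rejects the same input before any length is derived from it.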
What can we learn from the packet-mangler bugs? There were multiple bugs in a 55 line section of code. And it took Apple more than one attempt to fix it properly.
Personally, I learned that I cannot assume that the developers will see everything that I see. Spell it out.
Bugs are rarely unique. These are some of the reasons why.
Maybe this is a low quality section of the codebase? It might have been written in a hurry with low quality standards. Has it been tested properly?
Sometimes the design of the software is fundamentally flawed and the developers are playing a game of whack-a-mole. I know from my own past experience working on large software projects that this is not an unusual scenario. A well-known and important example of this is Java deserialization. That was a bad design decision that was made a long time ago and that is very difficult to reverse due to backwards compatibility issues.
Sometimes an API is non-intuitive or has some subtle gotchas which can cause even very diligent developers to make mistakes. One example of this was the recent Zip Slip vulnerability, where you might use a library to unzip a file. And you might have no idea that this could expose you to path traversal vulnerabilities (where someone adds ../ to a filename in the zip archive). Another example is the snprintf overflow gotcha that I have written a blog post about. The return value of snprintf is quite non-intuitive which can lead to buffer overflow vulnerabilities in certain situations.
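To make the Zip Slip gotcha concrete, here is a minimal sketch of the kind of check that an extraction routine needs before it writes an archive entry to disk. The helper name entry_name_is_safe is hypothetical and is not taken from any particular unzip library. It only inspects the entry name as text, so it does not handle things like Windows drive letters or symbolic links.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if an archive entry name stays inside
 * the extraction directory. Rejects empty names, absolute paths, and any
 * ".." path component. Treats both '/' and '\\' as separators. */
static bool entry_name_is_safe(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return false;
    if (name[0] == '/' || name[0] == '\\')
        return false;

    const char *p = name;
    while (*p != '\0') {
        size_t len = strcspn(p, "/\\");   /* length of this path component */
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return false;
        p += len;
        if (*p != '\0')
            p++;                          /* skip the separator */
    }
    return true;
}
```

An entry named "../etc/passwd" or "a/../../b" is rejected, while "docs/readme.txt" and "a..b/c" are accepted. A more thorough implementation would probably also resolve the final path and confirm that it is still under the destination directory.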
Code gets copied all the time. And it isn’t necessarily bad practice. If you want to know how to implement something, then you go to look for examples of how other people have done similar things. Sometimes you might find an example elsewhere in the same codebase and other times you might find an example on a website like Stack Overflow. And it’s natural to assume that the person who wrote the code that you are copying knew what they were doing. So if that person did make a mistake then that mistake can easily start spreading to other parts of the codebase.
Some developers just aren’t very careful. And if they have introduced a vulnerability here, then there’s a good chance that they have introduced similar vulnerabilities in other parts of the code that they have worked on. I think it’s also worth mentioning that organizations often unintentionally encourage sloppy coding because it’s much easier to measure the quantity of code that someone produces than the quality. So somebody who quickly churns out a lot of new features is much more likely to be held up as a “rockstar coder” and the fact that the quality of their work is low might go unnoticed.
I have a theory that for every vulnerability in the code there are at least 100 regular bugs. I want to emphasize that I don’t have any statistical evidence for this. It is purely anecdotal, based on my experience of hunting for vulnerabilities. As a security researcher, I don’t care about bugs. I only care about exploitable bugs. So when I see a potential bug in some code, I ignore it unless I think I can write an exploit for it. And I estimate that I pass over approximately 99 out of every 100 bug candidates that I look at.
So if you are on the receiving end of a vulnerability report, I think you have to assume the opposite: that the security researcher ignored 99 other possible bugs before they found the one that they sent to you.
So a bug was found. Maybe it was found by your testing team, or maybe it was reported to you by a customer. Or more seriously, maybe there was an accident or a security breach. What now?
Obviously the first step is to diagnose what went wrong and find and fix the bug.
But, as we just saw, that’s not enough. Most bugs are not unique. There’s a good chance that a similar mistake was made elsewhere in the code. So we need to search for variants. And bear in mind that there’s time pressure involved here because you only have a limited amount of time before you have to disclose the vulnerability. So you have to find as many variants as you can before you hit that deadline.
Of course the final step is to make sure that similar bugs don’t happen again.
Here are some ways to discover variants. Step zero is the obvious first response, but it is very unlikely to find any new variants! That’s why you need to do the other stuff too.
Code review. This also seems obvious but, as the packet-mangler example showed, I don’t think Apple did it.
Unit tests: obvious. I recommend that you use a code coverage tool to check that the new tests give you really good coverage on the affected file/module. Code that hasn’t been tested usually doesn’t work.
Fuzz testing. This basically means hitting the code with randomly generated input. It isn’t always easy to do, but you have a known issue to start from, so you might be able to generate some random variations of it to check for other issues.
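As a rough sketch of what "start from the known issue" could look like, the code below takes a seed input, randomly changes a few bytes on each iteration, and checks that the parser either rejects the input or returns a sane result. Everything here is an assumption made for illustration: walk_options is a small stand-in option parser, not Apple's code, and a real project would more likely use a dedicated fuzzing tool than rand().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define OPT_EOL 0x0
#define OPT_NOP 0x1

/* Hypothetical target: walks a TCP option buffer and returns the number of
 * options seen, or -1 if the buffer is malformed. */
static int walk_options(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    int count = 0;
    while (i < len) {
        if (buf[i] == OPT_EOL)
            break;
        if (buf[i] == OPT_NOP) {          /* single-byte option */
            i++;
            count++;
            continue;
        }
        if (len - i < 2)                  /* no room for the length byte */
            return -1;
        size_t optlen = buf[i + 1];
        if (optlen < 2 || optlen > len - i)
            return -1;                    /* would stall or run off the end */
        i += optlen;                      /* always advances by at least 2 */
        count++;
    }
    return count;
}

/* Mutates a seed input and checks the target's invariants on every run.
 * Returns the number of mutated inputs that the target rejected. */
static unsigned fuzz_from_seed(const uint8_t *seed, size_t seed_len,
                               unsigned iterations, unsigned rng_seed)
{
    uint8_t buf[40];                      /* TCP options are at most 40 bytes */
    unsigned rejected = 0;
    assert(seed_len > 0 && seed_len <= sizeof(buf));
    srand(rng_seed);
    for (unsigned n = 0; n < iterations; n++) {
        memcpy(buf, seed, seed_len);
        int flips = 1 + rand() % 3;       /* overwrite one to three bytes */
        for (int f = 0; f < flips; f++)
            buf[(size_t)rand() % seed_len] = (uint8_t)(rand() & 0xff);
        int r = walk_options(buf, seed_len);
        /* Invariant: either rejected, or no more options than bytes. */
        assert(r == -1 || (r >= 0 && (size_t)r <= seed_len));
        if (r == -1)
            rejected++;
    }
    return rejected;
}
```

A natural seed is the input from the original report, for example an option whose length byte is zero, which is the kind of input that made the original loop stall. If a mutated input causes a hang, a crash, or an assertion failure, that is a candidate variant.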
The techniques up to number 3 are mainly good for finding the co-located bugs. But we also need to look for variants elsewhere in the codebase.
One thing that we can do is number 4. But that doesn’t help if the bugs were caused by copy/paste or something like a confusing API. So we also need to search the codebase for similar patterns.
So what do I mean when I say that we should look for similar code patterns?
If we were talking about the packet-mangler vulnerability that I showed you before, then one of the key features is that the loop doesn’t obviously terminate: it uses the -= operator to update the counter, and it’s far from obvious that the counter is decremented by the correct amount. Another thing to look for is code that handles the “tcphdr” type because that almost certainly means that it’s handling untrusted data.
This code snippet shows a different example. This is a snippet of code from the librelp library which is used by rsyslog, a widely used logging tool on Linux.
Everyone knows that snprintf is the safe version of sprintf. It stops you from getting buffer overflows, right?
You pass the size of the buffer in as the second argument and snprintf will never write off the end of the buffer, even if the string doesn’t fit.
This piece of code is writing multiple strings into a buffer. So on each iteration it updates the number of bytes that it has written so far, so that it can pass the correct size argument to snprintf.
So this code looks pretty sensible. What could be wrong with it?
The thing about snprintf is that its return value isn’t what you would probably expect. If the string was too big for the buffer, then it returns the number of bytes that it would have written, if the buffer had been big enough.
So this means that iAllNames can become bigger than the size of the buffer.
And then the real problem happens on the next iteration when you get a negative integer overflow in the calculation of the size argument. The size argument is unsigned, so it wraps and becomes huge, which means that an attacker can write an almost unlimited number of bytes to the buffer. The other thing that is really bad is that by controlling the length of the penultimate string, you can choose where you want the buffer overflow to go. It doesn’t have to go immediately after the end of the buffer. This means that you have a lot of control over which bytes you overwrite. It also means that you can skip over the stack canary, so you can easily bypass the stack protector mitigation.
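The return-value behaviour described above can be sketched in a few lines, along with one possible way of accumulating output safely. The helper append_fmt is hypothetical; it is one way to avoid the wrap-around and is not the fix that was applied to librelp.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Shows the gotcha: returns what snprintf reports when a 10-character
 * string is written into an 8-byte buffer. */
static int snprintf_reports(void)
{
    char small[8];
    return snprintf(small, sizeof(small), "%s", "0123456789");
}

/* Hypothetical helper: appends formatted text to buf at offset *used.
 * Returns 0 on success, -1 if the output was truncated or an error occurred.
 * *used never exceeds cap - 1, so cap - *used cannot wrap around. */
static int append_fmt(char *buf, size_t cap, size_t *used, const char *fmt, ...)
{
    if (cap == 0 || *used >= cap)
        return -1;

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *used, cap - *used, fmt, ap);
    va_end(ap);

    if (n < 0)
        return -1;                       /* output error */
    if ((size_t)n >= cap - *used) {
        *used = cap - 1;                 /* truncated: clamp to the terminator */
        return -1;
    }
    *used += (size_t)n;
    return 0;
}
```

snprintf_reports returns 10 even though only 7 characters and a terminator fit in the buffer. Adding that 10 to a running offset is what pushes the offset past the end of the buffer; append_fmt compares the return value against the remaining space before it updates the offset.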
Ok, so what’s the pattern that we want to look for here? There are three key components to it:
1. A call to snprintf
2. The format string contains a %s
3. The output of snprintf is used to calculate the size argument on the next iteration
So those are the key components of the pattern. How do we search for it?
The idea is to treat code as data. You import all of your source code into a database, and then you can use queries to search it.
When I say it like that, it probably sounds a bit far-fetched, but this concept of “code as data” is the basis of all our technology at Semmle. But I am not going to talk about this too much because I don’t want this to start sounding like a vendor pitch. So the main point is that I didn’t just make this up! It really works and you can read more about it after the presentation if you are interested.
So what we can do is write queries that look for dangerous patterns so that you can make sure that you have fixed all the problems, not just the one that you already know about.
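A proper query works on a parsed representation of the code and can follow how values flow between statements. As a much cruder illustration of the same idea, the sketch below treats source lines as data and flags any line that adds the return value of snprintf to a running total. It is purely textual, so it will miss cases and report false positives, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Crude textual check: does this source line add the return value of
 * snprintf to a running total? Matches "+=" followed by "snprintf(" with
 * only spaces or tabs in between. */
static bool line_accumulates_snprintf(const char *line)
{
    const char *p = line;
    while ((p = strstr(p, "+=")) != NULL) {
        p += 2;                          /* step over "+=" */
        p += strspn(p, " \t");           /* skip whitespace */
        if (strncmp(p, "snprintf(", 9) == 0)
            return true;
    }
    return false;
}

/* Counts the suspicious lines in an array of source lines. */
static size_t count_suspicious(const char *lines[], size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (line_accumulates_snprintf(lines[i]))
            hits++;
    return hits;
}
```

Run over the librelp snippet, this flags the line that starts with "iAllNames += snprintf(". It cannot tell whether the total is later used as a size argument, which is exactly the kind of question that needs a real query over the code's structure.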
This diagram is a more sophisticated version of the diagram with 4 boxes that I showed you earlier.
You can see this diagram on a blog post written by Michael Fanning, who works at Microsoft. It’s well worth a read and I recommend that you check it out.
Microsoft have a lot of experience dealing with security incidents. So they have spent a lot of time honing their process. And they are one of the pioneers of this concept of variant analysis.