2. About me
4th year speaking at 44CON
- 2012: Malware as a hobby [P]
- 2013: Controlling a PC using Arduino [WS]
- 2014: Malware analysis as a big data problem [P]
- 2015: Malware anti-reversing [P], Indicators of Compromise [WS]
Malware Researcher, Founder Malware Research Institute
6 kids, one more on the way…
5. Detecting the Unknown
FBI: There are only two types of companies: those that have been hacked,
and those that will be.
Always assume that you have been compromised and look for signs to
confirm the assumption
6. Where to look
There is gold in those logfiles!
Firewall
IDS / IPS
Proxy
DNS
System logfiles
Netflow data
7. Firewall
New sessions are enough, no need to log every packet
Ingress (incoming) AND Egress (outgoing)
Denied AND Permitted
8. IDS / IPS
Detecting attacks is "nice", detecting compromises is "cool"
You need actionable information from your IDS / IPS system
Custom rules are the path to salvation
10. DNS
Log queries
Establish DNS query & response baseline
Analyze NXDOMAIN responses
Analyze successful DNS lookups
Identify domain name abnormalities
11. System logfiles
Windows 7 regular expressions (pattern, log source, event ID):
.*APPCRASH.*                                       Application  1001
.*The protected system file.*                      Application  64004
.*EMET_DLL Module logged the following event:.*    Application  2
.*your virus/spyware.*                             Application  Depends
.*A new process has been created..*                Security     4688
.*A service was installed in the system..*         Security     4697
.*A scheduled task was created..*                  Security     4698
.*Logon Type:\W*(3|10).*                           Security     4624, 4625
.*Software\Microsoft\Windows\CurrentVersion\Run.*  Security     4657
.*service terminated unexpectedly..*               System       7034
.*service was successfully sent a.*                System       7035
.*service entered the.*                            System       7036
.*service was changed from.*                       System       7040
12. Netflow data
WHO is talking to WHOM
When doing incident response, being able to narrow down the scope is
key
13. Acquire the sample
Extraction from network traffic
File on disk
Memory dump
18. Cuckoo Sandbox
Uses DLL-injection techniques to intercept and log specific API calls
Uses TCPDump to capture network traffic
19. Minibis
Uses Microsoft ProcMon inside the instrumented environment
Uses TCPDump to capture network traffic
ProcDOT can be used to analyze / visualize the execution process
20. Identify IOCs
Identifiable patterns in the sample
Created files
Created / Modified registry keys
Network traffic
Memory patterns
25. Searching Network Traffic
Firewall
Detection, Block specific communication
IDS / IPS
Create signatures to Detect and Prevent C2 communication, additional
infections
Proxy
Detection, Block specific communication
DNS
Detection, Block communication to sites
27. Announcement
Public VXCage-server
Available at vxcage.malwareresearch.institute (http, soon https)
Feel free to apply for a personal account, free of charge:
TO: michael@michaelboman.org
SUBJECT: VXCage Access
BODY:
Who you are: name, twitter handle (if any, for cyberstalking), other contact info
Why you want access
Proposed username for the system (the password will be generated for you)
Please contact me at the above address for raw access to the archive
28. VXCage API: Quick intro
REST with JSON output
/malware/add – upload sample
/malware/get/<sha256> - download sample
/malware/find – search sample based on hash, date, tag
/tags/list – list tags
Docs & Source code at https://github.com/mboman/vxcage
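A minimal client sketch for the endpoints listed above, using only the standard library. The base URL is the one announced on the previous slide, but the request method and the exact parameter names accepted by /malware/find are assumptions on my part; check the docs in the GitHub repo before relying on them.

```python
import urllib.parse
import urllib.request

# Assumed base URL from the announcement slide (http for now, soon https).
BASE = "http://vxcage.malwareresearch.institute"

def build_find_request(tag):
    """Build (but do not send) a request searching samples by tag.

    Assumption: /malware/find accepts form-encoded POST parameters.
    """
    data = urllib.parse.urlencode({"tag": tag}).encode()
    return urllib.request.Request(BASE + "/malware/find", data=data)

req = build_find_request("banker")
print(req.full_url)      # -> http://vxcage.malwareresearch.institute/malware/find
print(req.get_method())  # -> POST
```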
Hi! Good morning. Thanks for having me.
My name is Michael Boman and I am a Senior Malware Analyst at the Malware Research Institute, an organization that promotes malware research and tools and techniques for malware analysis. We are a young organization, just started out this year. I myself have been speaking on the topic of malware analysis at conferences like 44CON in London and DEEPSEC in Vienna as well as at different OWASP chapters here in Sweden.
This talk will cover things like network monitoring, network forensics, log analysis, memory acquisition, malware analysis, creating signatures for files and network traffic and so on; all topics worth a talk of their own, so please excuse me if I don't go into great detail on every single topic.
The FBI recently said that basically everyone is hacked or going to be, and that your organization is either a target because you have something of value, or can be leveraged to gain something of value – or just for the LOLs.
If you assume that your systems and network infrastructure are compromised, how would you act differently? And how would you go about identifying the compromised assets?
<open feedback – whiteboard>
You might already have many of the systems on the list, but are you using them to the fullest? Make use of your existing IT investment.
Firewalls can be used for so much more than just blocking traffic; with the right rules your IDS or IPS can do much more than just detect attacks, and the proxy you have to cache internet traffic or keep users off questionable sites can also be used to detect malware infections. Have you ever thought about using the DNS – you know, the service which lets you type www.facebook.com instead of 31.13.64.1 – for malware hunting? The system logs, used correctly, are a gold mine for incident response. And you know your network switches? They are sitting on a gold mine when it comes to traffic analysis!
Don't start spending a lot of money on new toys, learn how to use the tools you already have in a new, efficient way.
Your firewall has real gold if you do your logging right.
A few years back, while I was working as a consultant, a colleague and I were assigned to a municipality that had been informed by their ISP that if they didn't stop spamming, their internet connection would be terminated, and as that connection was providing everything from the local schools to city hall, they were in a bit of a panic.
They didn't have any fancy equipment, nor any particularly new equipment at that, so sniffing traffic was kind of a headache. So what we did was block outgoing traffic on port 25 – that's SMTP, which is used for sending email out – from everything that isn't their official email server, and then log all blocked connections. That became their source of machines to take in and re-image. I was told that the first machine belonged to a student who had his laptop repossessed in the middle of class by three fairly large IT guys…
Anyway, make sure your firewall doesn't just block everything you don't want IN OR OUT of your network, but that you also log traffic regardless of whether you permit it or not. And I don't mean that you need to log every single packet, but all new connections are a good start.
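The port-25 trick from that engagement can be sketched in a few lines. The key=value log format and the mail server address here are made-up assumptions, so adapt the parsing to your firewall's actual log layout:

```python
# Hypothetical sketch: flag hosts generating denied outbound SMTP (port 25)
# traffic from anything that is not the official mail server.
MAIL_SERVER = "10.0.0.25"  # assumed address of the official mail server

def suspect_hosts(log_lines):
    """Return the set of source hosts with denied outbound port-25 traffic."""
    hosts = set()
    for line in log_lines:
        # Assumed key=value firewall log format, e.g. "action=deny src=... dport=25"
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if (fields.get("action") == "deny"
                and fields.get("dport") == "25"
                and fields.get("src") != MAIL_SERVER):
            hosts.add(fields["src"])
    return hosts

logs = [
    "action=deny src=10.0.3.17 dst=203.0.113.9 dport=25",
    "action=permit src=10.0.0.25 dst=203.0.113.9 dport=25",
    "action=deny src=10.0.3.17 dst=198.51.100.2 dport=25",
]
print(sorted(suspect_hosts(logs)))  # -> ['10.0.3.17']
```

Every host that shows up in that set is a candidate to take in and re-image.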
How many of you have got an IDS or IPS? Raise your hands.
For those who have one, what are you looking for? Does your vendor support custom rules – by that I mean, are you able to write your own signatures – and have you created any custom rules specific to your organization? One cool custom rule you can write is one that alerts, or logs, on any traffic that goes to your "dark" IPs, meaning IPs that you haven't assigned to a host yet. As they are unused there shouldn't be any traffic to them, except from misconfigured systems and attacks, both worth knowing about. Another important thing to take note of from your IDS or IPS is the answer to the question: "Did it succeed?"
Frankly, I don't care if we got ten thousand attacks against our systems in the last 24 hours; what I want to know is: "DID ANY OF THEM SUCCEED?" Make sure that if you are looking for an IDS or IPS solution, it can help you answer that question.
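The "dark IP" check is easy to prototype with the standard library's ipaddress module; the network range and assignment list below are invented examples:

```python
import ipaddress

# Sketch of the "dark IP" idea: flag traffic to addresses inside our own
# ranges that are not assigned to any host. Both values are example data.
OUR_NETWORK = ipaddress.ip_network("10.0.0.0/16")
ASSIGNED = {ipaddress.ip_address(a) for a in ("10.0.0.25", "10.0.3.17")}

def is_dark(dst):
    """True if dst is one of our addresses but not assigned to a host."""
    ip = ipaddress.ip_address(dst)
    return ip in OUR_NETWORK and ip not in ASSIGNED

print(is_dark("10.0.200.1"))  # unassigned internal address -> True
print(is_dark("10.0.0.25"))   # the mail server -> False
```

In practice you would feed this from your IDS alerts or firewall logs and keep the assignment list in sync with your IPAM data.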
If you can, you should record all network traffic using something like Daemonlogger – available at SourceForge – which logs all packets to disk, removing old packet captures based on its configuration. Having full packet captures is golden, because even if you missed the initial attack, or need to verify whether the attack was successful, you still have the ability to do so.
How many of you work in an organization that, for whatever reason, forces you to surf through a proxy? Raise your hands. Are those requests logged? Is anyone looking at those logs for anything more than "damn you surf the internet a lot" statistics? Doing some analysis of those logs can be a useful source of indications of compromise. What to look at are hostnames, URLs, downloaded files and user agents, and it is a great source for finding additional compromised systems.
You can also use the proxy logs to detect data exfiltration by looking at POST requests and their sizes.
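A toy version of that POST-size check, assuming a simplified client/method/url/size log layout (real proxy logs, like Squid's, have more fields, and the threshold is invented):

```python
from collections import defaultdict

# Tune against your own baseline; one megabyte is just an illustrative cut-off.
THRESHOLD = 1_000_000  # bytes

def big_posters(log_lines, threshold=THRESHOLD):
    """Return clients whose total POST bytes exceed the threshold."""
    totals = defaultdict(int)
    for line in log_lines:
        # Assumed layout: client method url size
        client, method, url, size = line.split()
        if method == "POST":
            totals[client] += int(size)
    return {c for c, n in totals.items() if n > threshold}

logs = [
    "10.0.3.17 POST http://203.0.113.9/up 900000",
    "10.0.3.17 POST http://203.0.113.9/up 900000",
    "10.0.5.5 GET http://example.com/ 120000",
]
print(big_posters(logs))  # -> {'10.0.3.17'}
```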
How about DNS traffic? Does anyone monitor your DNS traffic?
What you need to do is start logging DNS queries and responses. You can either configure your local resolver to perform this logging or use packet-capture techniques to log them. I would recommend using something like PassiveDNS – an open source tool available on GitHub – to achieve this goal, as you don't need to make any changes to your DNS infrastructure to get the data. If you place your sensor right you will also catch traffic that goes directly to external resolvers.
Once you have collected DNS requests and responses it is time to analyze the data. The first thing you need to do is establish a baseline: how does "business as usual" look in your environment? Unfortunately all environments are different, so I can't give you any shorthand tips on what it should look like.
After you have created a baseline you can take a look at all the NXDOMAIN responses. NXDOMAIN is the response you get when the hostname doesn't resolve. This data point is extremely useful, because the domain generation algorithms used by malware fail a lot: the bad guy only needs a successful response on one of the possible domains to control the botnet, and it can be quite expensive to buy more domains than required.
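Counting NXDOMAIN responses per client is trivial once the data is collected; the record layout below is an assumption about what your passive DNS collector emits:

```python
from collections import Counter

def nxdomain_counts(records):
    """Count NXDOMAIN responses per client.

    Assumed record layout: (client_ip, query, response_code).
    A machine running a DGA typically produces a burst of failed lookups.
    """
    return Counter(client for client, _query, rcode in records
                   if rcode == "NXDOMAIN")

records = [
    ("10.0.3.17", "qxvf1k2.com", "NXDOMAIN"),
    ("10.0.3.17", "p9zrt0a.net", "NXDOMAIN"),
    ("10.0.5.5", "www.example.com", "NOERROR"),
]
print(nxdomain_counts(records).most_common(1))  # -> [('10.0.3.17', 2)]
```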
By logging successful DNS lookups you can detect when a DNS entry changes from one IP to another, or when one IP has several hostnames (the hosting server is supplying malware under many different DNS names). Suddenly you can find a whole bunch of new malware-distributing sites just by looking at DNS requests and responses. Isn't that cool?
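The many-hostnames-per-IP signal is a simple group-by over your passive DNS records; the (domain, ip) layout and the example data are assumptions:

```python
from collections import defaultdict

def ips_with_many_names(records, min_names=2):
    """Return IPs that answer for at least min_names distinct hostnames.

    Assumed record layout: (domain, ip) pairs from a passive DNS collector.
    """
    names = defaultdict(set)
    for domain, ip in records:
        names[ip].add(domain)
    return {ip: ns for ip, ns in names.items() if len(ns) >= min_names}

records = [
    ("update-check.biz", "198.51.100.7"),
    ("cdn-fast.info", "198.51.100.7"),
    ("www.example.com", "93.184.216.34"),
]
print(sorted(ips_with_many_names(records)))  # -> ['198.51.100.7']
```

Legitimate shared hosting and CDNs will trigger this too, so treat the result as input for further analysis, not as a verdict.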
DGAs – Domain Generation Algorithms – are used to create domains for C2 communication, and they can make it very hard to block the traffic at the domain-name level. On the other hand, DGAs generate very distinct and easy-to-spot domain names, which you can locate using statistical analysis.
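One simple statistical signal is the Shannon entropy of the domain label: randomly generated names tend to score higher than human-chosen ones. A sketch (the example labels are invented, and any real cut-off must come from your own baseline):

```python
import math
from collections import Counter

def entropy(label):
    """Shannon entropy (bits per character) of a domain label."""
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

print(round(entropy("facebook"), 2))      # -> 2.75
print(round(entropy("xk2qz9vw1f7h"), 2))  # -> 3.58
```

Entropy alone misclassifies some legitimate names (CDN hostnames, hashes in subdomains), so combine it with the NXDOMAIN counts and blacklist checks before acting on it.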
You should also compare DNS requests against known malicious domain names using blacklists from sites like Malc0de, Malware Domain List, Malware URLs, VX Vault, URLquery, CleanMX, ZeusTracker etc., and use the result as an input for further analysis.
Another data source to add to the DNS data is WHOIS information: the registrar, who registered the domain, and how old the domain is.
How about system log files, are you actively collecting and looking through those logs for signs of compromise?
<CLICK>
Crashed applications, new services and scheduled jobs are just a few of many log entries that can indicate a system compromise. The approach you need to take is to filter out the known good and investigate all other events. The SANS Institute has several good Intrusion Discovery Cheat Sheets for both Windows and Linux systems.
One way to harden your Windows machines is to install EMET, the Microsoft Enhanced Mitigation Experience Toolkit, a free utility that helps prevent vulnerabilities in software from being successfully exploited. EMET achieves this goal by using security mitigation technologies like:
Data execution prevention -- a security feature that helps prevent code in system memory from being used incorrectly
Mandatory address space layout randomization -- a technology that makes it difficult for exploits to find specific addresses in a system's memory
Structured exception handler overwrite protection -- a mitigation that blocks exploits that attempt to exploit stack overflows
Export address table access filtering -- a technology that blocks an exploit's ability to find the location of a function
Anti-Return Oriented Programming -- a mitigation technique that prevents hackers from bypassing DEP
SSL/TLS certificate trust pinning -- a feature that helps detect man-in-the-middle attacks leveraging the public key infrastructure
Apart from hardening the Windows system, EMET will also feed additional events to the system and security logs when an exploit fails.
Is ANYONE here collecting netflow data, which contains information on WHO is talking to WHOM? Netflow is the protocol that keeps track of who speaks to whom, when, what ports are being used, and how much data is being transferred.
In an incident response scenario, being able to map which machines are talking to each other is a gold mine and a life saver. Think about it: any machine the compromised machine has spoken to is potentially compromised, and those machines that have not been contacted are fairly unlikely to be affected. This is very useful to know when you kick in your triage kit to verify whether a system has been compromised, as looking for signs of compromise can be a resource-intensive task. And think about the need to reinstall compromised machines from a known good state to make sure that the infection has been eradicated: would you reinstall perfectly healthy systems because you didn't know the scope of the compromise? How about missing a system or two? You can't go around and nuke the whole IT infrastructure just because Alice in HR got infected while opening a job application…
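The scoping idea can be sketched over (src, dst) flow pairs; the internal-prefix convention and the addresses are assumptions for illustration:

```python
def triage_candidates(flows, compromised, internal_prefix="10."):
    """Given flow records (src, dst) and a known-compromised host,
    list every internal machine it talked to -- the triage candidates.

    The internal_prefix convention is an assumption; adapt it to your
    actual address plan (or use the ipaddress module for real ranges).
    """
    peers = set()
    for src, dst in flows:
        if src == compromised and dst.startswith(internal_prefix):
            peers.add(dst)
        elif dst == compromised and src.startswith(internal_prefix):
            peers.add(src)
    return peers

flows = [
    ("10.0.3.17", "10.0.0.25"),   # compromised host -> internal server
    ("10.0.3.17", "203.0.113.9"), # compromised host -> external (C2?)
    ("10.0.5.5", "10.0.3.17"),    # internal host -> compromised host
]
print(sorted(triage_candidates(flows, "10.0.3.17")))
# -> ['10.0.0.25', '10.0.5.5']
```

Everything in that set gets the triage kit; everything outside it can wait.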
If you can't get Netflow from your network infrastructure you can use SANCP (Security Analyst Network Connection Profiler) to extract the same kind of information.
I am telling you, there is gold in those logs!
Let's say that you have now found an infected machine and you have decided to take a look at the particular malware. I believe this is a very important step; don't just re-image the system and walk away. I'd say it is your duty as a security guy to know why your defenses failed, and also to make sure you get the complete picture of the infection. Is it only this particular machine that is compromised, or is it elsewhere in your organization?
First of all you need to get hold of the sample. Grabbing the initial exploit and downloader can be challenging, because maybe it was just an in-memory kind of thing, but if the attackers want persistence – meaning surviving a simple reboot – they will need to commit some data to disk. Maybe you can't get hold of all the different parts of the compromise, but some is better than none. Where things get really challenging is when the binary on disk is encrypted, in which case you want to grab a copy from running memory, where it has to be unencrypted to be able to run. There are plenty of anti-forensic techniques to stop this as well, but fortunately they are not too common yet.
Let's start with extracting files from the network traffic. You can do it in many ways using tools like Wireshark, NetworkMiner, Foremost and Dshell – among others.
Foremost is an open source file carving tool originally developed by the US Air Force. It is mainly used for extracting files from hard drives or hard drive images, but can be used to extract files from network captures as well.
Dshell is an extensible network forensic analysis framework which enables rapid development of plugins to support the dissection of network packet captures. To extract files you just issue decode -d rip-http --rip-output_dir=output/ /path/to/pcap. Optionally you can specify, using the --rip-http_content_filter and --rip-http_name_filter options, what kinds and/or names of files you want to extract. Dshell is quite a new tool, but it can replace many of the other tools mentioned, like PassiveDNS, and SANCP for Netflow-like output. That said, Dshell is more of an analysis and interactive tool, so you still want to use PassiveDNS and Netflow or SANCP for collecting data.
MDD, or Memory Data Dumper, is a physical memory acquisition tool for imaging Windows-based computers. It is like the Unix tool dd, but for memory. Used together with PsExec from Microsoft Sysinternals you can execute it on remote machines, as long as you have a privileged account like a domain administrator. The resulting file is the same size as the physical RAM of the system. Mandiant Memoryze is another tool you can use to grab the RAM. If you are trying this from a Linux environment you can use Metasploit instead of PsExec to execute commands remotely, using the credentials of a privileged account.
Once you have a memory dump it is time to extract the malware from it. For this you can use the Volatility framework: its dlldump command extracts DLL files and procmemdump extracts executables. If the malware has some anti-forensics enabled it will be more difficult to dump the RAM and extract the malware from the memory image.
Once you have got hold of the malware sample you need to analyze it for capabilities and make sure you are acting on a real compromise. This step is important so you don't waste resources and activate incident management procedures for benign files. You also need to find out what kind of damage the malware may have done, and what kind of information has been sought, and possibly compromised.
There are two ways to analyze code: static and dynamic. Static analysis is when you pick the binary apart and look at what libraries it imports and what strings it contains, or even load the code into a disassembler like IDA Pro.
Dynamic analysis is when you run the sample in an instrumented environment, which is a fancy way of saying that you in some way detect and log what the malware does to the system. The instrumentation can range from using Regshot, available at SourceForge [http://sourceforge.net/projects/regshot/], and Sysdiff, a now discontinued tool from Microsoft, up to customized hypervisors. The problem with using Regshot and Sysdiff is that they don't record temporary files and registry entries that are created and removed between the snapshots.
Cuckoo Sandbox uses DLL-injection techniques to "hijack" API calls of interest, and tcpdump to record network traffic to and from the instrumented system. It can also take optional screenshots at configurable intervals. The drawback of this approach is that DLL injection can be discovered, and that it will miss any API call it hasn't specifically been told to log. Overall it is a very nice tool with a lot of features to simulate user interactivity and detect suspect behavior using behavior-based signatures.
You can download Cuckoo from cuckoo sandbox DOT org.
Minibis uses ProcMon from Microsoft, formerly Sysinternals, to record what is happening on the system, and tcpdump to record traffic to and from the instrumented system. By using standard tools like ProcMon it has less custom code to maintain, but more components, like an FTP server for supporting file transfers to and from the instrumented system. Another cool feature is the graphical viewer ProcDOT, which combines the ProcMon output with the tcpdump traffic to visualize a timeline of events.
Both Minibis and ProcDOT are available from the CERT dot AT website.
Indicators of Compromise, or IOCs for short, are ways to detect whether a system has been compromised by looking for specific patterns in files, created files and mutexes, or created or modified registry keys on a system. (A mutex, short for mutual exclusion object, is a program object that allows multiple program threads to share the same resource, such as file access, but not simultaneously.) On the network we can look at specific patterns in the command-and-control traffic, or we can search for patterns in system memory.
From easy to hard, I recommend you approach IOC identification in this order: files, mutexes and registry keys, followed by network analysis, and finally memory dumps.
YARA is the Swiss army knife of binary pattern matching. It understands both binary and printable patterns, is ASCII and Unicode aware, and you can ask it to match patterns at a specific distance or range from another pattern. I would need another hour just to tell you all the cool features of YARA, but here's a quick example.
First we have a rule name (“silent_banker”) and a tag (“banker”). If you are collecting malware like me you sooner or later end up with a large collection of samples and it is nice to keep track of what kind of malware they are.
Then we have some metadata, like a human readable description of what it is looking for, and some other metadata. You can write pretty much anything here you want to be able to find in the output later on.
In the “strings” section we are specifying what patterns we are looking for. Here you see that we are using variables and $A and $B is using hex to specify the pattern, while $C is using ASCII.
Finally we specify under what conditions this rule should trigger, and in this example all of the patterns are required to trigger a match.
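The rule walked through above has the same shape as the well-known silent_banker example from the YARA documentation; a sketch of that shape (the byte and string patterns are illustrative, taken from that example, and the condition is tightened to require all three patterns, as described):

```
rule silent_banker : banker
{
    meta:
        description = "This is just an example"
        threat_level = 3
        in_the_wild = true

    strings:
        $A = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $B = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
        $C = "UVODFRYSIHLNWPEJXQZAKCBGMT"

    condition:
        all of them
}
```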
For Snort rules it would look something like this if we were looking for the same patterns in the network traffic. First we specify that we want an alert on a match, and the network protocol – in this example we are looking at TCP traffic – then we specify the source network followed by the source port. After the direction arrow we specify the destination host or network and the port number.
The content tag specifies that it is content we are looking for, because you can also look at other aspects of the network traffic, like flags, time to live and so on. Again we specify the strings in both hex and ASCII format, and finally the message – the name that will be displayed when the alert triggers – using the msg tag.
Instead of specifying the known C2 server we just match the traffic pattern so we detect C2 traffic to C2 servers that we don’t yet know about. It is much easier for an attacker to change hostnames or IPs of C2 servers than it is to re-engineer the C2 protocol itself.
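A sketch of such a rule, reusing the illustrative byte pattern and string from the YARA example; the message text and sid are hypothetical placeholders:

```
alert tcp $HOME_NET any -> $EXTERNAL_NET any ( \
    content:"|6A 40 68 00 30 00 00 6A 14 8D 91|"; \
    content:"UVODFRYSIHLNWPEJXQZAKCBGMT"; \
    msg:"Possible SilentBanker C2 traffic"; \
    sid:1000001; rev:1;)
```

Note that the destination is "any": because we match on the protocol content rather than a known C2 address, the rule keeps working when the attacker moves the C2 server.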
Using existing network infrastructure we can detect additional infections.
The firewall can be used to detect C2 traffic based on IP address and port number, and block the communication.
The IDS or IPS can be used to detect C2 communication on a protocol level as well as known binary downloads, and in the case of IPS can be used to block the traffic.
The proxy can be used to detect infections as well as block access to malware-infested sites and C2 servers.
DNS can be used to detect infections and to blackhole known malicious hosts, either by redirecting them to localhost or – even better – by redirecting the traffic to a server with extra logging enabled, to single out additional infected machines. A quick way to achieve this is to use INetSim, the Internet Simulator.
In conclusion, your network has more capabilities to locate "bad stuff" than you know of or make use of. You don't need to spend tens of thousands on software and hardware; you can get a lot from what you already have by using the capabilities you already paid for and looking at the output from those systems. Of course there are many solutions on the market that put it all together in an easy-to-use, usually web-based, interface, and if you can I would recommend looking at some of those solutions, but it is not a requirement.
All the tools I have spoken about today are either free – as in beer – or very cheap. I would say that you are wasting your organization's resources by not employing the techniques discussed here today.
If you have any questions you can ask them now or catch me afterwards, or you can drop me an email at michael AT michaelboman DOT org, or stalk me on Twitter, where I am AT mboman.
I also recommend visiting the Malware Research Institute website at blog DOT malwareresearch DOT institute, where you can find more information on how to search for and destroy malware.