In this training session, two leading security experts review how adversaries use DNS to achieve their mission, how to use DNS data as a starting point for launching an investigation, the data science behind automated detection of DNS-based malicious techniques and how DNS tunneling and DGA machine learning algorithms work.
Watch the presentation with audio here: http://info.sqrrl.com/leveraging-dns-for-proactive-investigations
5. What is DNS?
(1) Client needs to connect to https://www.sqrrl.com
(2) Client's DNS server doesn't know where sqrrl.com is hosted, forwards query to upstream server
(3) Upstream DNS server knows sqrrl.com resolves to 104.196.225.76, returns response
(4) Client's DNS server caches response, sends response to client
(5) Client connects to https://www.sqrrl.com
[Diagram: client, the client's DNS server, the upstream DNS server, and https://sqrrl.com, with arrows numbered 1 to 5 matching the steps above]
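The forward-then-cache behavior in this walkthrough can be sketched with a toy resolver. This is a minimal illustration, not a real DNS implementation: the `CachingResolver` class, the `UPSTREAM` table, and the 300 second TTL are assumptions made for the example; only the sqrrl.com address comes from the slide.

```python
import time

# Hypothetical upstream records: name -> (IP address, TTL in seconds).
# The TTL value is an assumption for illustration.
UPSTREAM = {"sqrrl.com": ("104.196.225.76", 300)}


class CachingResolver:
    """Toy stand-in for the client's DNS server."""

    def __init__(self, upstream, clock=time.monotonic):
        self.upstream = upstream
        self.clock = clock
        self.cache = {}            # name -> (ip, expiry time)
        self.upstream_queries = 0  # how often we had to forward a query

    def resolve(self, name):
        now = self.clock()
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0]  # answered from cache
        # Cache miss or expired entry: forward the query upstream.
        self.upstream_queries += 1
        ip, ttl = self.upstream[name]
        self.cache[name] = (ip, now + ttl)
        return ip


resolver = CachingResolver(UPSTREAM)
first = resolver.resolve("sqrrl.com")   # forwarded upstream
second = resolver.resolve("sqrrl.com")  # served from cache
print(first, second, resolver.upstream_queries)
```

The second lookup never leaves the local server, which is why logs collected upstream of a caching resolver can miss repeated queries.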
6. How do attackers use DNS?
• Attackers target DNS
– DNS spoofing
– DNS reflection
• Attackers utilize DNS
– Tunneling
– Domain Generation Algorithms (DGA)
– Dynamic DNS
7. Why is DNS data useful?
• Threat Detection
– Attacker activity that relies on DNS leaves traceable footprints in your network
• Incident Investigations
– Keep track of attacker access in your network
8. DNS Tunneling Overview
• Data encoded inside of DNS queries is sent to an attacker-controlled server
• Used for command and control, data exfiltration
• Bypasses common security controls (firewalls, web proxies)
[Diagram: DNS Tunnel Client on the local network → Local DNS Resolver → Intermediate DNS Resolver → DNS Tunnel Server for *.tunnel.com on the remote network]
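To make the encoding idea concrete, here is a minimal sketch of how bytes could be packed into query names under a tunnel domain. Base32 and the 35 byte chunk size are assumptions chosen for the example; real tunneling tools use a variety of encodings. The function names are hypothetical, and `tunnel.com` is the placeholder domain from the slide.

```python
import base64

TUNNEL_DOMAIN = "tunnel.com"  # placeholder domain from the slide
MAX_LABEL = 63                # DNS limit on the length of one label


def encode_chunk(chunk: bytes) -> str:
    """Encode one chunk of data as a query name under the tunnel domain."""
    label = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
    if len(label) > MAX_LABEL:
        raise ValueError("chunk too large for one DNS label")
    return f"{label}.{TUNNEL_DOMAIN}"


def decode_query(name: str) -> bytes:
    """What a tunnel server could do: recover the bytes from the first label."""
    label = name.split(".", 1)[0].upper()
    padding = "=" * (-len(label) % 8)  # restore the stripped base32 padding
    return base64.b32decode(label + padding)


def to_queries(data: bytes, chunk_size: int = 35):
    """Split data into chunks and yield one query name per chunk."""
    for i in range(0, len(data), chunk_size):
        yield encode_chunk(data[i:i + chunk_size])


payload = b"secret data leaving the network, one query at a time"
queries = list(to_queries(payload))
for q in queries:
    print(q)
```

No query is actually sent here; the point is that each name is a valid hostname that a resolver would forward toward whoever is authoritative for the tunnel domain.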
9. DNS Tunneling Overview
• Many queries are required to transfer moderate amounts of data
– A 1 MB transfer would take ~5k domains
• Tunnels produce patterns
– paeqcigq.tunnel.com
– pafich3i.tunnel.com
– gxqwl0eaytioruga5.tunnel.com
• Queried DNS domains tend to be unique
– Assuming no repeats in the data, each domain will contain unique labels
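A rough back-of-the-envelope check of the query count, as a sketch under stated assumptions: the name limits (253 characters per name, 63 per label) are standard DNS limits, while the `.tunnel.com` suffix and the bits-per-character figures (5 for base32, 6 for a base64-like alphabet) are assumptions for illustration.

```python
import math

MAX_NAME = 253          # maximum length of a domain name in text form
MAX_LABEL = 63          # maximum length of one label
SUFFIX = ".tunnel.com"  # placeholder tunnel domain from the slide


def payload_chars_per_query(suffix: str = SUFFIX) -> int:
    """Characters left for encoded data after the suffix and separating dots."""
    budget = MAX_NAME - len(suffix)
    # Treat every label as costing its length plus one dot; the extra +1
    # accounts for the final label not needing a trailing dot.
    full_labels, remainder = divmod(budget + 1, MAX_LABEL + 1)
    return full_labels * MAX_LABEL + max(remainder - 1, 0)


def queries_needed(num_bytes: int, bits_per_char: int = 5) -> int:
    """Queries required if each character carries bits_per_char bits."""
    bytes_per_query = payload_chars_per_query() * bits_per_char // 8
    return math.ceil(num_bytes / bytes_per_query)


print(payload_chars_per_query())
print(queries_needed(1_000_000))                   # base32-style encoding
print(queries_needed(1_000_000, bits_per_char=6))  # denser alphabet
```

Under these assumptions a 1 MB transfer needs several thousand maximally packed queries, the same order of magnitude as the ~5k figure on the slide.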
10. DGA Overview
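def generate_domain(year, month, day):
    # Derive a 16-letter domain name from the date, which acts as the seed.
    domain = ""
    for i in range(16):
        # Scramble each date component with shifts and XORs.
        year = ((year ^ 8 * year) >> 11) ^ ((year & 0xFFFFFFF0) << 17)
        month = ((month ^ 4 * month) >> 25) ^ 16 * (month & 0xFFFFFFF8)
        day = ((day ^ (day << 13)) >> 19) ^ ((day & 0xFFFFFFFE) << 12)
        # Map the mixed value to one of the letters 'a'..'y'.
        domain += chr(((year ^ month ^ day) % 25) + 97)
    return domain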
• Method of establishing a connection with a command and control server
• Used to protect / hide infrastructure and evade detection
• Avoids DNS domain blacklisting
• Malware generates DNS domains based on an algorithm and a seed
• Seed may be hardcoded or determined dynamically (e.g., current datetime)
Source: en.wikipedia.org/wiki/Domain_generation_algorithm#Example
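Because the seed here is just the date, anyone who has the algorithm can compute the same domains ahead of time. The sketch below restates the slide's example function so the block runs on its own, and adds a hypothetical helper, `domains_for_range`, that lists candidate domains for consecutive dates. The start date is arbitrary.

```python
from datetime import date, timedelta


def generate_domain(year: int, month: int, day: int) -> str:
    """Example DGA from the slide: 16 letters derived from the date."""
    domain = ""
    for _ in range(16):
        year = ((year ^ 8 * year) >> 11) ^ ((year & 0xFFFFFFF0) << 17)
        month = ((month ^ 4 * month) >> 25) ^ 16 * (month & 0xFFFFFFF8)
        day = ((day ^ (day << 13)) >> 19) ^ ((day & 0xFFFFFFFE) << 12)
        domain += chr(((year ^ month ^ day) % 25) + 97)
    return domain


def domains_for_range(start: date, days: int) -> dict:
    """Hypothetical helper: map each date in the range to its generated domain."""
    result = {}
    for n in range(days):
        d = start + timedelta(days=n)
        result[d.isoformat()] = generate_domain(d.year, d.month, d.day)
    return result


upcoming = domains_for_range(date(2017, 1, 1), 3)
for day, domain in upcoming.items():
    print(day, domain)
```

The same determinism that lets malware and its operator agree on a rendezvous domain also lets a defender with the reverse-engineered algorithm precompute candidates.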
11. DGA Overview
Source: https://johannesbader.ch/2014/12/the-dga-of-newgoz/
• DGAs produce patterns
• Visually appear “off”
– A human would interpret the domain as strange (pmwtrdsv.ru) or nonsensical (turnipboxsea.com)
• Malware may attempt to resolve many unregistered domains
– ci4u0c10b77f5opvn211n5poa3.com
– wiqyhl13dkep615aec27ue2t2t.net
– mkguv3bd2hi317d9l8vdy4i6m.org
– 1xah67i2ayufesns8mh12h1kab.net
– 17m4oq6jngoka7zxtoq1taebe1.com
12. DGA Overview
Malware       Seed                                   # Domains in wild
Alureon       Thread ID + milliseconds since boot    5/day
Padcrypt      Date                                   24/day or 72/day
ProsLikeFan   Date, hardcoded                        100/day
Qadars        Date                                   200/day
Qakbot        Date                                   5000/day
Sisron        Date                                   4/day
Source: https://johannesbader.ch/
16. DNS Tunnel Detection
[Diagram: DNS Data → Filter DNS Data → Collation, producing one Domain Session per IP + Destination. Plot: number of DNS requests over time, in 1 hour buckets.]
17. DNS Tunnel Classification Features
• Number of queries
• Number of subdomains
• Average subdomain length
• Average information content of subdomains
[Diagram: the features are computed for each IP + Destination → Domain Session]
18. DNS Tunnel Classification
[Diagram: Features (number of queries, number of subdomains, average subdomain length, average information content of subdomains) → Classifier → Risk → Outliers]
19. DNS Tunnel Validation
[Diagram: example tunnel queries (paeqcigq.tunnel.com, pafich3i.tunnel.com, gxqwl0eaytioruga5.tunnel.com) shown alongside the pipeline DNS Data → Filter DNS Data → Collation, producing one Domain Session per IP + Destination]
20. Lessons Learned from testing on Sqrrl DNS data
• There are several potential sources of false positives:
– CDNs
– Anti-virus software
– Internal DNS traffic
– Popular services (Spotify, Slack, …)
• Many of these organize content under long, random-looking subdomain names
• Whitelisting can remove some of these false positives
• A hard cut requiring > K unique subdomains per user per hour helps significantly
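The hard cut can be sketched as a simple pre-filter. The threshold value `K = 50` is hypothetical (the slide does not give one), and keying the count on the queried domain as well as the user and hour is an assumption for the example.

```python
from collections import defaultdict

K = 50  # hypothetical threshold; the slide does not state a value


def passes_hard_cut(records, k=K):
    """records: iterable of (user, hour_bucket, domain, subdomain) tuples.

    Returns the (user, hour_bucket, domain) keys with more than k unique subdomains.
    """
    unique = defaultdict(set)
    for user, hour, domain, subdomain in records:
        unique[(user, hour, domain)].add(subdomain)
    return {key for key, subs in unique.items() if len(subs) > k}


# Synthetic example: one noisy source and one ordinary one.
records = [("10.0.0.5", 0, "tunnel.com", f"chunk{i}") for i in range(60)]
records += [("10.0.0.7", 0, "google.com", s) for s in ("www", "mail", "docs")]
flagged = passes_hard_cut(records)
print(flagged)
```

Only candidates that survive this cut would need to be scored by the classifier, which keeps low-volume lookups of random-looking names out of the results.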
21. Sqrrl traffic data feature plots
[Scatter plot: AverageLength (y-axis, 0 to 180) against Number of Subdomains (x-axis, 0 to 9000). Labeled points: Phishing; YouTube, Amazon AWS, CDNs, anti-virus, anti-spam; sqrrl-lab.net; slack-msgs.com]
22. Sqrrl traffic data feature plots
[Two scatter plots: UniqueQueries (y-axis, 0 to 1.25) against Number of Subdomains; one panel covers 0 to 11250 on the x-axis, the other 0 to 1125. Labeled points: eclampsialemontree.net, slack, sqrrl-lab, anti-virus, Ad servers]
23. eclampsialemontree.net
• Queries to 284 unique subdomains with names like:
– ykzcpj1j4ovv3nc1mcgg27ji7uzf4o,
– yhgir5h3ts3rppd3j3bph1se4rjqtj,
– Pkbenvnzwo2jl2onldka17rv5uu2kd,
– Kinkascic,
– Kinkascie,
– Kinkascig
• Most queried just once, a few 2-4 times
• Length always a multiple of 3, almost always 30 or 9 characters
• Appears to be a malware site that attempts to inject invisible frames into ads
25. DNS DGA Detection
[Diagram: DNS Data → Filter DNS Data → Collation, producing one Domain Session per IP. Plot: requests sent over time, with a DNS Session marked.]
26. DNS DGA Classification Features
• Session duration
• Number of unique NxDomains
• Average information content of subdomains
[Figure: features computed per IP → Domain Session, with a histogram for day of the week (0 to 6) and a histogram for hour of the day (0 to 24)]
27. DNS DGA Classification
[Diagram: Features (session duration, number of unique NxDomains, average information content of subdomains, histograms for day of the week and hour of the day) → Classifier → Risk → Outliers]
28. DNS DGA Validation
[Diagram: example DGA domains (ci4u0c10b77f5opvn211n5poa3.com, wiqyhl13dkep615aec27ue2t2t.net, mkguv3bd2hi317d9l8vdy4i6m.org, 1xah67i2ayufesns8mh12h1kab.net, 17m4oq6jngoka7zxtoq1taebe1.com) shown alongside the pipeline DNS Data → Filter DNS Data → Collation, producing one Domain Session per IP]
34. User & Entity Behavior Analytics: The Heart of Next-Generation Threat Hunting
info.sqrrl.com/download-ueba-ebook
What's included in this
• What you need to know about advanced behavioral analytics
• How it can automate and revolutionize threat hunting
• How to use it for streamlined threat detection practices
• Phonebook for the Internet
– Use a DNS domain name to look up an IP address
– You can’t stop DNS
• Protocol details
– Runs primarily over UDP (stateless)
– Queries recursively propagate until an answer is determined
– Server provides a time-to-live (TTL), which determines how long the answer should be cached
potentially mention Threat Intelligence
Monitor attacker infrastructure from afar
• Number of queries
– Should be large for tunnels
• Number of subdomains
– Should be large and equal to or approaching the number of queries
• Average subdomain length
– Should be large for tunnels
• Average information content of subdomains
– Should be higher for tunnels
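The four features can be sketched for a single session as follows. The notes do not define "information content"; the example assumes Shannon entropy of each subdomain's characters, which is one common choice and not necessarily the measure used in the talk. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits per character of the empirical character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def tunnel_features(subdomains):
    """subdomains: list of subdomain strings, one per query in a session."""
    if not subdomains:
        raise ValueError("a session needs at least one query")
    unique = set(subdomains)
    return {
        "num_queries": len(subdomains),
        "num_subdomains": len(unique),
        "avg_length": sum(map(len, unique)) / len(unique),
        "avg_entropy": sum(map(shannon_entropy, unique)) / len(unique),
    }


benign = tunnel_features(["www", "mail", "www", "docs"])
tunnel = tunnel_features(["paeqcigq", "pafich3i", "gxqwl0eaytioruga5"])
print(benign)
print(tunnel)
```

Even on this tiny sample the tunnel-style labels are longer, higher in entropy, and all unique, while the benign session repeats a short name.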
• Classify outlier-ness using a multivariate Bayesian classifier
– Assigns a ranking score to each detection candidate triple (source, destination, time)
– For each classifier feature (number of queries, number of subdomains, average length, and average information content), determine the probability of that feature value among all observed traffic
– Greater outliers are given higher ranks
– The final risk score depends on the rank, the expected rate of attacks, and the time span of the analyzed data
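The notes do not give the classifier's formulas, so the following is only a sketch of the ranking idea under loud assumptions: each feature's probability is estimated as an empirical tail probability, features are treated as independent, and the combined score is the sum of negative log probabilities. The final risk score (rank, expected attack rate, time span) is not modeled. All names are hypothetical.

```python
import math


def tail_probability(value, observed):
    """Smoothed fraction of observed values that are at least as large as value."""
    at_least = sum(1 for v in observed if v >= value)
    return (at_least + 1) / (len(observed) + 1)


def outlier_scores(candidates):
    """candidates: {key: {feature: value}}; a higher score means more unusual."""
    features = sorted(next(iter(candidates.values())))
    columns = {f: [c[f] for c in candidates.values()] for f in features}
    return {
        key: sum(-math.log(tail_probability(c[f], columns[f])) for f in features)
        for key, c in candidates.items()
    }


def rank(candidates):
    """Keys ordered from most to least unusual."""
    scores = outlier_scores(candidates)
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic candidates keyed by (source, destination, hour bucket).
candidates = {
    ("10.0.0.7", "google.com", 0): {"queries": 12, "subdomains": 3, "avg_len": 4.0, "entropy": 1.5},
    ("10.0.0.8", "slack.com", 0): {"queries": 30, "subdomains": 5, "avg_len": 6.0, "entropy": 2.0},
    ("10.0.0.9", "sqrrl.com", 0): {"queries": 8, "subdomains": 2, "avg_len": 3.0, "entropy": 1.2},
    ("10.0.0.5", "tunnel.com", 0): {"queries": 900, "subdomains": 900, "avg_len": 40.0, "entropy": 4.1},
}
print(rank(candidates))
```

A candidate that is extreme on every feature at once accumulates the highest score, which matches the intuition that greater outliers receive higher ranks.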
• To test the detector, we use the Sqrrl DNS data
– We “inject” tunnels, i.e., add them to the logs alongside regular traffic
– We can vary subdomain lengths; we have tried lengths from ~10 characters up to the maximum
– A tunnel injection typically includes ~500 to 10,000 queries
• The system finds all the injected tunnels
• In the Sqrrl data, we typically see two false positives, caused by sophosxl AV software on two separate computers
– However, these look very similar to tunneling activity
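An injection test of this kind can be sketched as follows. The record layout `(timestamp, source IP, query name)`, the random label alphabet, and the one-query-per-second timing are assumptions for illustration; the notes do not describe the actual injection tooling.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits
MAX_LABEL = 63  # longest label DNS allows


def synthetic_tunnel(src_ip, domain, num_queries, label_len, start_ts, seed=0):
    """Build fake tunnel log records: (timestamp, source IP, query name)."""
    if not 1 <= label_len <= MAX_LABEL:
        raise ValueError("label_len must be between 1 and 63")
    rng = random.Random(seed)  # seeded so a test run is repeatable
    records = []
    for i in range(num_queries):
        label = "".join(rng.choice(ALPHABET) for _ in range(label_len))
        records.append((start_ts + i, src_ip, f"{label}.{domain}"))
    return records


def inject(real_records, tunnel_records):
    """Merge injected records into the real log, keeping time order."""
    return sorted(list(real_records) + list(tunnel_records), key=lambda r: r[0])


real = [(0, "10.0.0.7", "www.google.com"), (700, "10.0.0.7", "mail.google.com")]
fake = synthetic_tunnel("10.0.0.5", "tunnel.com", num_queries=500, label_len=10, start_ts=100)
merged = inject(real, fake)
print(len(merged), merged[1])
```

Varying `label_len` and `num_queries` across runs mirrors the ranges mentioned in the notes and shows where the detector starts to miss injected tunnels.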
• Detection based on classifying sessions (source IP, time interval)
– Destination is a primary domain
– Can eliminate all legitimate primary domains before sessionization
• For each session, compute a feature vector
• Assume that most DGA-generated domains do not exist in DNS (queries return NxDomain)
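Sessionization by source IP might look like the sketch below. The notes do not say how a session's time interval is delimited; the example assumes a session ends after a fixed gap of silence (`SESSION_GAP`, a hypothetical 300 seconds), and the record layout is likewise an assumption.

```python
from collections import defaultdict

SESSION_GAP = 300  # hypothetical: seconds of silence that end a session


def sessionize(records, gap=SESSION_GAP):
    """records: (timestamp, source_ip, domain, rcode) tuples.

    Returns a list of sessions, each a dict with the source IP and its records.
    """
    by_ip = defaultdict(list)
    for rec in sorted(records):
        by_ip[rec[1]].append(rec)
    sessions = []
    for ip, recs in by_ip.items():
        current = [recs[0]]
        for rec in recs[1:]:
            if rec[0] - current[-1][0] > gap:
                sessions.append({"ip": ip, "records": current})
                current = []
            current.append(rec)
        sessions.append({"ip": ip, "records": current})
    return sessions


def session_features(session):
    """Duration and unique NxDomain count for one session."""
    recs = session["records"]
    nx = {domain for _, _, domain, rcode in recs if rcode == "NXDOMAIN"}
    return {"duration": recs[-1][0] - recs[0][0], "unique_nxdomains": len(nx)}


records = [
    (0, "10.0.0.5", "pmwtrdsv.ru", "NXDOMAIN"),
    (10, "10.0.0.5", "turnipboxsea.com", "NXDOMAIN"),
    (20, "10.0.0.5", "pmwtrdsv.ru", "NXDOMAIN"),
    (1000, "10.0.0.5", "sqrrl.com", "NOERROR"),
]
sessions = sessionize(records)
print([session_features(s) for s in sessions])
```

A burst of distinct NxDomain answers inside one short session is the pattern the NxDomain assumption is meant to surface.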
• Detection based on classifying triples (source IP, destination, time interval)
– Destination is a “registered domain”, usually a TLD plus the next level: google.com, guardian.co.uk, mysite.cloudfront.net
• Use records of DNS requests for subdomains under each registered domain
– E.g., “maps”, “docs”, “mail”, and “mymap.maps” might be subdomains of “google.com”
• For each triple, compute a feature vector to quantify properties of the subdomains under that registered domain
• We can ignore queries for registered domains with no subdomain, since without a subdomain there can’t be any encoded message
• Can reasonably whitelist domains of major sites
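Collation into triples can be sketched as below. Finding the registered domain properly requires the full Public Suffix List; the tiny `SUFFIXES` set here is a stand-in covering only the examples in the notes, and the function names and record layout are hypothetical.

```python
from collections import defaultdict

# Tiny illustrative suffix set; a real deployment would use the full Public Suffix List.
SUFFIXES = {"com", "net", "org", "co.uk", "cloudfront.net"}


def split_registered_domain(name: str):
    """Return (subdomain, registered_domain) using the longest matching suffix."""
    labels = name.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # i = 0 tries the longest candidate suffix first
        if ".".join(labels[i:]) in SUFFIXES:
            if i == 0:
                return None  # the name is itself a public suffix
            return ".".join(labels[:i - 1]), ".".join(labels[i - 1:])
    return None  # unknown suffix


def collate(records, bucket_seconds=3600):
    """records: (timestamp, source_ip, query_name) -> {(ip, domain, bucket): [subdomains]}."""
    triples = defaultdict(list)
    for ts, ip, name in records:
        parts = split_registered_domain(name)
        if parts is None or parts[0] == "":
            continue  # no subdomain means no room for an encoded message
        subdomain, domain = parts
        triples[(ip, domain, int(ts // bucket_seconds))].append(subdomain)
    return dict(triples)


records = [
    (10, "10.0.0.7", "mymap.maps.google.com"),
    (20, "10.0.0.7", "google.com"),
    (30, "10.0.0.7", "mail.google.com"),
    (4000, "10.0.0.5", "paeqcigq.tunnel.com"),
]
print(collate(records))
```

Each resulting list of subdomains is the input for the per-triple feature vector, and a whitelist of major sites could be applied by dropping keys before scoring.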
• Session duration
• Number of unique NxDomains
– Should be large
• Time of day and day of week
– DGAs are not constrained to normal work hours
• Average information content of subdomains
– Should be higher for DGA
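The time-of-day and day-of-week features can be sketched as normalized histograms over a session's request timestamps. Interpreting timestamps in UTC is an assumption made to keep the example deterministic; in practice the network's local time zone is what defines "normal work hours". The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def time_histograms(timestamps):
    """timestamps: Unix seconds.

    Returns (hour_hist, day_hist): fractions of requests per hour of the day
    (24 bins) and per day of the week (7 bins, 0 = Monday).
    """
    if not timestamps:
        raise ValueError("need at least one timestamp")
    times = [datetime.fromtimestamp(ts, tz=timezone.utc) for ts in timestamps]
    hours = Counter(t.hour for t in times)
    days = Counter(t.weekday() for t in times)
    n = len(times)
    return (
        [hours[h] / n for h in range(24)],
        [days[d] / n for d in range(7)],
    )


# 1970-01-01 was a Thursday (weekday 3), so these fall on Thursday and Friday.
hour_hist, day_hist = time_histograms([0, 3600, 86400])
print(hour_hist[:3], day_hist)
```

A session whose histogram mass sits at night or on weekends stands out against user-driven traffic, which is what the "not constrained to normal work hours" note relies on.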
• Multi-classifier approach
– One classifier for each of three focus areas
– Combine the results of the classifiers into a final risk score
• Domain Classifier
– How unusual is the given domain name compared with other domains seen in normal traffic?
• Record Classifier
– How unusual is the given DNS record?
• Session Classifier
– How unusual is the given DGA session?
• Bro logs of 90 days of Sqrrl DNS traffic
• Inject data with real DGA records
– Domains generated from reverse-engineered code of real DGAs
– Model real DGA timing