The document discusses using osquery, an open source host-based monitoring and detection agent, to detect compromise on Windows endpoints. It provides an overview of osquery's capabilities including scheduled and event-based queries, file carving, on-demand querying, and deployment at scale. Examples are given of using osquery to monitor processes, users, groups, USB activity, Windows events, and PowerShell for detection of suspicious activity.
3. Get-Host
• Nick Anderson
• Security Engineer at Facebook
• thor@fb.com
• Super legit, not an alias
• Github - github.com/muffins
• Twitter - twitter.com/poppyseedplehzr
• twitter.com/osquery is much more interesting
4. • Intro and Background
• Osquery crash course
• Configuration and deployment
• Detection use cases
• Conclusion and questions
Schedule
5. • FOSS host based IDS agent
• Built at Facebook for Facebook systems
• Not much for POSIX back in 2012
• Must run on many systems
• Abstracts OS as a SQL database
• SQLite
• Native statically linked binary
• Shell vs daemon components
What is osquery
6. • Grants “Snapshot view” of OS state
• “Show me all scheduled tasks in my fleet”
• Data is state + time focused
• Detect deviations in enterprise
• Breadth of Detection vs Depth
What is osquery
7. • What about CotS?
• Infra already established for scaling
• Why not WMI?
• Required cross-platform agent
• “One query to ask them all”
• Osquery has so much more to offer
• Expose WMI compatibility layer
• Rich data vs performance
Porting to Windows
20. osqueryi
osquery> .mode line
osquery> select key, name, data from registry where key like
'HKEY_USERS\%\SOFTWARE\Microsoft\Windows\CurrentVersion\Run';
key = HKEY_USERS\S-1-5-21-3535526762-2088972459-3486670382-1001\SOFTWARE\Microsoft\Windows\CurrentVersion\Run
name = totally legit
data = C:\Windows\temp\dolphin_not_virus.exe
osquery> select r.data, h.md5 from registry r, hash h where key like
'HKEY_USERS\%\SOFTWARE\Microsoft\Windows\CurrentVersion\Run' and h.path = r.data;
+---------------------------------------+----------------------------------+
| data                                  | md5                              |
+---------------------------------------+----------------------------------+
| C:\Windows\temp\dolphin_not_virus.exe | 13974cbf51996ab168c12d662fb3bfb7 |
+---------------------------------------+----------------------------------+
38. Event based queries
[Diagram: a timeline with event-based queries at t1 and t2; the infection occurs at p1 and the service is deleted/modified at p2; events are recorded in osquery.db]
40. Event based queries
C:\Users\Nick
λ osqueryi
--nodisable_events
--windows_event_channels="System,Security,Microsoft-Windows-PowerShell/Operational"
Using a virtual database. Need help, type '.help'
osquery> select time, eventid, source, data from windows_events where eventid = 4104;
time = 1508904136
eventid = 4104
source = Microsoft-Windows-PowerShell/Operational
data =
{"EventData":{"MessageNumber":"1","MessageTotal":"1","Path":"C:\\Users\\Nick\\Documents\\WindowsPowerShell\\Modules\\posh-git\\GitTabExpansion.ps1","ScriptBlockId":"0bf6389f-d4af-46fc-97ff-9069fb22fc3b","ScriptBlockText":"# Initial implementation by Jeremy Skinner\r\n# http://www.jeremyskinner.co.uk/2010/03/07/using-git-with-windows-powershell/\r\n\r\n$Global:GitTabSettings = New-Object PSObject -Property @{\r\n …
58. Where we are
• 1+ year ago
• Unified results data for all platforms
• Agnostic queries against our full enterprise
• Numerous Windows specific wins
59. WEL Secure Auditing
• Numerous guides online for secure GPO
• Turn these on full blast because unlimited storage
• Spend weeks playing with all of your shiny data
• Find all the evil
110. Let’s leverage open source communities to build use cases for detecting compromise beyond exploitation
Call to Action
111. Use case from Community
• Open Source Mentorship Program
• Universities and Students
• Engage more open source contributors
• Foster better open source community
Cover what we’re going to talk about and our roadmap
Agent built by Facebook to provide security and telemetry data for production systems, both CentOS and macOS
Treat the OS like a SQL table, and query different OS constructs
Single binary statically linked to ease dependency maintenance, also provides a standalone CLI utility
Gives high level over-arching state of operating system
State and time focused data
Talk doesn’t emphasize low level techniques
Not finding ROP in memory (yet ;))
Focus is on detecting compromise post exploitation
Build detection layers and seek compromised behavior
Focus here on why this was ported to windows
Sell Sell Sell
Osquery gives much more capability than WMI
Now let’s dive into the CLI utility.
While this isn’t a driving component of osquery, I still find it super useful
Very nifty for prototyping scheduled queries,
Handy in triaging system state, netstat example
Launching the shell
Similarity to sqlite shell
OS Constructs are groupings and abstracted behind tables
Can select from attributes relating to the construct
Pid, name, path, etc, can all be queried against
The real power of osquery shows up when you start joining the different tables
Here we take our processes table from the first query, and we join it against our listening ports table, which only contains the pi
This gives us a view of processes with listening ports on the system
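A rough sketch of that join, using Python's built-in sqlite3 module with two small stand-in tables rather than a live osquery shell. The rows are made up, and only a few columns of the real processes and listening ports tables are mimicked here.

```python
import sqlite3

# Stand-in data only: osquery is not involved. The two tables mimic a few
# columns of the processes and listening_ports tables for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processes (pid INTEGER, name TEXT);
    CREATE TABLE listening_ports (pid INTEGER, port INTEGER, address TEXT);
    INSERT INTO processes VALUES
        (4, 'System'),
        (812, 'svchost.exe'),
        (4242, 'notepad.exe');
    INSERT INTO listening_ports VALUES
        (4, 445, '0.0.0.0'),
        (812, 135, '0.0.0.0');
""")

# Join on pid: only processes that own a listening port survive the join.
rows = conn.execute("""
    SELECT p.pid, p.name, lp.port
    FROM processes p
    JOIN listening_ports lp ON p.pid = lp.pid
    ORDER BY lp.port
""").fetchall()

for pid, name, port in rows:
    print(pid, name, port)
```

notepad.exe drops out because it has no row in the ports table, which is the whole point of the join.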
Make sure to mention Globbing
We have a table for taking crypto digests of files, it cannot be directly queried against, but you can pass it paths from other tables
Imagine this data being auto-cross referenced against something like VT or your internal IoC databases
One of the more powerful tables on Windows is the registry table
Abstracted the entire registry hive behind SQL tables.
Can specify different hives and key paths
Continuing these feature sets, we brought globbing to the system registry
We can glob all user registry hives
And then pass this data to hashes table we saw from before
Prototyping with shell
How do we run these prototyped queries periodically?
The daemon is the primary component deployed
While the shell is great for prototyping queries and querying the system state, the daemon runs as a background service
Executes configured queries at specific time interval
Logs the differentials of results by default, shows results that have been added or removed
Large amounts of configuration options. Allows for one to turn off features they’re not interested in or do not want
Given all of the shell work, packs organize our queries
These are consumed by the daemon
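As a hedged illustration of what a pack might look like, the snippet below builds one as a Python dict and serializes it to JSON. The query names, intervals, and descriptions are invented for this sketch; only the general shape (a "queries" map whose entries carry a query string and an interval) is meant to be taken from it.

```python
import json

# Hypothetical pack: names, intervals, and descriptions are made up.
pack = {
    "platform": "windows",
    "queries": {
        "user_run_keys": {
            "query": (
                "select key, name, data from registry where key like "
                r"'HKEY_USERS\%\SOFTWARE\Microsoft\Windows\CurrentVersion\Run';"
            ),
            "interval": 3600,  # seconds between runs
            "description": "Per-user Run key entries",
        },
        "listening_processes": {
            "query": (
                "select p.pid, p.name, lp.port from processes p "
                "join listening_ports lp using (pid);"
            ),
            "interval": 600,
            "snapshot": True,  # log the full result set instead of differentials
            "description": "Processes with listening ports",
        },
    },
}

# json.dumps takes care of escaping the backslashes in the registry path.
pack_json = json.dumps(pack, indent=2)
print(pack_json)
```

Prototype the query in the shell first, then paste it into the pack once it returns what you expect.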
At a high level this is what scheduled queries look like
We execute at the specified time interval
Let’s walk through this
Looking at that data, at time t1 we execute
The data here gets reported to our results log
Decorations – Host uuid, this data gets ‘decorated’ on all of our results
Action – added, this can also be “removed”
This is an illustration of differentials
Differential component is that at t2 we don’t record anything new
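A minimal sketch of the differential idea, not the daemon's actual implementation: compare the previous run with the current one and report only what was added or removed. The rows and the extra third run are made up to show a non-empty diff.

```python
def diff_results(previous, current):
    """Return (added, removed) rows between two runs of the same query."""
    prev_set, curr_set = set(previous), set(current)
    added = [row for row in current if row not in prev_set]
    removed = [row for row in previous if row not in curr_set]
    return added, removed

# Made-up (name, path) rows standing in for query results.
run_t1 = [("svchost.exe", "C:\\Windows\\System32\\svchost.exe")]
run_t2 = list(run_t1)  # nothing changed between t1 and t2
run_t3 = run_t2 + [("dolphin_not_virus.exe", "C:\\Windows\\temp\\dolphin_not_virus.exe")]

first = diff_results([], run_t1)       # first run: everything is "added"
second = diff_results(run_t1, run_t2)  # t2: nothing to log
third = diff_results(run_t2, run_t3)   # hypothetical later run: one new row

print(first, second, third, sep="\n")
```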
Change our query pack to be snapshot
You might’ve noticed a potential issue
Consider a scenario with scheduled queries
At p1, the infection happens
At p2 our intel is dated, and threat intel isn’t going to catch it
Eventing solves this issue by “flight” recording
Records events in DB via pub subs
Similar to before, let’s consider an execution flow
This time we’ll cache events as they happen into our RocksDB
At time t1, we’ll execute our scheduled query
We won’t get any data, because nothing has happened yet
Now, at p1, when the infection occurs we’ll cache that event via a pub/sub model
We also potentially catch the “change” where the malware is attempting to hide
Lastly we query against the database and get any events that have transpired
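The flow above can be sketched with a toy event buffer. This is an assumption-laden stand-in (a Python list instead of RocksDB, arbitrary integer timestamps), meant only to show why events recorded between two scheduled runs are not lost.

```python
class EventBuffer:
    """Toy stand-in for the event cache; not osquery's actual implementation."""

    def __init__(self):
        self._events = []     # (timestamp, description) in arrival order
        self._last_query = 0  # time of the previous scheduled query

    def publish(self, timestamp, description):
        # Called as soon as something happens, independent of the schedule.
        self._events.append((timestamp, description))

    def query(self, now):
        # Return everything recorded since the previous query, up to `now`.
        rows = [e for e in self._events if self._last_query < e[0] <= now]
        self._last_query = now
        return rows


buf = EventBuffer()
at_t1 = buf.query(now=10)                    # t1: nothing has happened yet
buf.publish(12, "infection occurs")          # p1
buf.publish(15, "service deleted/modified")  # p2
at_t2 = buf.query(now=20)                    # t2: both events are still there

print(at_t1)
print(at_t2)
```

A plain scheduled query at t2 would only see the state after p2; the buffer still holds what happened at p1.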
Let’s look at event based queries in action
Then we trigger some events and see our event data happening.
In this example we’ve turned on powershell script block logging, and we’re seeing a script block of powershell
With nuances of osqueryd
How do we configure and deploy osquery
How do we get logs back?
Provision with Chef at FB
Puppet, SCCM, whatever you like
Packages hosted in Chocolatey
MSI coming soon! <3
Chef configures client boxes with certs and tokens
At a high level, our configuration consists of these three parts
Graph API endpoint
A configuration store
An enrollment store
Endpoints
User laptops hit our Graph API endpoint
Can happen anywhere they have internet
Happens over TLS with pinned certs provisioned from Chef
The Graph API verifies the enrollment of the Nodes
Logging happens in somewhat of a similar fashion
The endpoints write their data to the graph API endpoint
Again this is pretty awesome as we don’t need them to be on corp VPN to get our data
Data flows from graph API endpoint (again after enrollment verification) to a series of detection tubes
Once we’ve escalated things correctly and done some decoration of the data, we drop this into a long term “infinite” store
Whew. That was a lot
Now that we’ve gone over what an osquery configuration and deployment looks like, let’s take a quick look at some of the more advanced osquery feature sets
One of the features we get asked for quite a bit is the ability to take action
It’s something of a motto for us, that we’ll never alter system state
There’s one new exception to that rule, and whether or not it “alters” is debatable imo ;)
We have a feature we call the “Carver”
This allows one to remote pull files from systems running osquery
Disabled by default
Query against a table
Table itself is metadata
Similar to before, globbing can be used with carves
If we wanted to always grab all files off of a desktop we could
High level configuration of carve
Uses ODOS (We talk about this shortly)
Requires MORE TLS endpoints
User laptop gets path
TARs everything up
The tar is “chunked” and fired off to the TLS endpoints
Blocks are stored in a temp cache
Once all blocks received, we stash these in an encrypted store
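A small sketch of the tar, chunk, and reassemble steps, done entirely in memory with the standard library. The block size and file contents are assumptions for the example, and the upload to TLS endpoints and the encrypted store are left out.

```python
import io
import tarfile

BLOCK_SIZE = 1024  # assumed block size, chosen only for this sketch


def make_tar(files):
    """Pack {name: bytes} into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def chunk(blob, size=BLOCK_SIZE):
    """Split the archive into (block_number, bytes) pairs."""
    return [(i // size, blob[i:i + size]) for i in range(0, len(blob), size)]


def reassemble(blocks):
    """Join blocks by block number, regardless of arrival order."""
    return b"".join(data for _, data in sorted(blocks))


archive = make_tar({"notes.txt": b"hello from the endpoint\n" * 100})
blocks = chunk(archive)
received = list(reversed(blocks))  # pretend the blocks arrived out of order
rebuilt = reassemble(received)

print(len(blocks), "blocks,", len(rebuilt), "bytes rebuilt")
```

Numbering the blocks is what lets the receiving side wait until everything has arrived before stashing the result.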
Other TLS endpoints are working to implement these
Expect support for other services soon!
On-demand osquery (or ODOS, we know, we’ve been through the runs with names :) )
allows you to poll a remote system in your enterprise
“work” database
User issues queries by placing them into this db
Laptops periodically ask for work
Get work, execute work
Results posted back and stored
This capability can go one to many
Results are then presented back to the user
This data can be decorated
Accomplished agent port
Deploying osquery on Windows
“Ask one query to rule them all”
Thousands of corp, 100s of ks in prod
Already scaling on POSIX, Windows takes full advantage
Windows wins – registry globbing and WEL
Follow with Patch compliance and vuln mgmt
Start a conversation around detection logic
Let’s ask better questions around detecting compromise
What is WEL
Why are we talking about WEL?
Before we jump into use cases
WEL Secure auditing empowers WEL table
Let’s take a look at some wins
Note that all of this data is “as seen from” our back end infra
This is not osquery, this is interacting with our Hive cluster to view our “infinite” data
One of the more common suggestions for WEL secure auditing – turn on Process auditing
Think about the eventing example we used prior
While we don’t have a process events table, we can view proc starts in WEL, and see the bad process starting
This image shows how we’re extracting this out
Our logging infra brings back the data as decorated JSON blobs, so we need to extract
Focus on how we’re filtering
We obtain these by filtering on EID 4688 in the security WEL source
In particular, we get back these fields
PID, Command Line, the process name, as well as (potentially) the user who spawned, …
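Roughly what that extraction could look like outside of our Hive setup, as a Python sketch. The rows are invented, and the EventData field names used here (NewProcessId, NewProcessName, CommandLine, SubjectUserName) are assumptions for illustration rather than a dump of our real schema.

```python
import json

# Made-up result rows shaped like windows_events output. Values are invented
# and the EventData field names are assumptions for illustration.
rows = [
    {
        "eventid": "4688",
        "source": "Security",
        "data": json.dumps({"EventData": {
            "NewProcessId": "0x1a2c",
            "NewProcessName": "C:\\Windows\\temp\\dolphin_not_virus.exe",
            "CommandLine": "dolphin_not_virus.exe --totally-legit",
            "SubjectUserName": "Nick",
        }}),
    },
    {
        "eventid": "4104",
        "source": "Microsoft-Windows-PowerShell/Operational",
        "data": json.dumps({"EventData": {"ScriptBlockText": "Get-Host"}}),
    },
]


def process_starts(rows):
    """Keep EID 4688 rows from the Security source and flatten the JSON blob."""
    out = []
    for row in rows:
        if row["eventid"] != "4688" or row["source"] != "Security":
            continue
        event = json.loads(row["data"]).get("EventData", {})
        out.append({
            "pid": int(event.get("NewProcessId", "0x0"), 16),  # hex string to int
            "process_name": event.get("NewProcessName"),
            "command_line": event.get("CommandLine"),
            "user": event.get("SubjectUserName"),
        })
    return out


starts = process_starts(rows)
print(starts)
```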
This is how our data shows up
Can see fine grained process starts and their command line.
Noice.
The first detection scenario we wanted to look for was “Can we see when a new local user is added to Local admins?”
To get this it’s similar to the proc starts, we instead look for one of a couple of event ids.
Event ID 4720 is local users being created
This returned a surprising number of results
Specifically they all held a similar named pattern
Not depicted here but these were happening *in line* with being added to the local administrators group
We traced these down by looking for event ID 4720, 4735 very close to each other
Turns out that Lenovo installs a local user as a Local Admin in order to perform service updates
Filtered out noisy results
Tracked explicit user group adds
First view of the data
We now have detections rolled out around local users being added to the Admins group, and we can see which accounts are responsible
This detection happens through considering multiple Event IDs in sequence
Just user adds is too noisy
Just Admin adds might not be rich enough
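A hedged sketch of the "multiple Event IDs in sequence" idea: pair a 4720 with a 4735 on the same host when they land close together. The events, hostnames, and the five second window are all made up; the real pipeline does this in SQL against our back end, not in Python.

```python
WINDOW_SECONDS = 5  # assumed meaning of "very close to each other"

# Made-up (host, unix_time, event_id, target) rows. For 4720 the target is
# the new account; for 4735 it is the group that changed.
events = [
    ("laptop-1", 1000, 4720, "svc_updater"),
    ("laptop-1", 1002, 4735, "Administrators"),
    ("laptop-2", 2000, 4720, "intern"),          # user add only
    ("laptop-3", 3000, 4735, "Administrators"),  # group change only
    ("laptop-4", 4000, 4720, "latecomer"),
    ("laptop-4", 4900, 4735, "Administrators"),  # too far apart
]


def created_then_admin(events, window=WINDOW_SECONDS):
    """Pair each 4720 with a 4735 on the same host within `window` seconds."""
    creations = [e for e in events if e[2] == 4720]
    changes = [e for e in events if e[2] == 4735]
    hits = []
    for host, created_at, _, account in creations:
        for c_host, changed_at, _, group in changes:
            if c_host == host and 0 <= changed_at - created_at <= window:
                hits.append((host, account, group, created_at, changed_at))
                break
    return hits


hits = created_then_admin(events)
print(hits)
```

Either event alone shows up on three hosts here; the pair shows up on one.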
With our WEL pipeline we have turned on removable device auditing.
This lets us see USB devices plugged, files written, and files read
Another wall of our SQL
This allows us to view all removable device events, let’s break this down
You can see here 3/4, but we know that we can reassemble the blocks by using the block ID, total count, and script_block_id
This is us filtering for removable device audit logs in the security channel,
looking on my particular machine
The fields we are extracting
Most of these fields are pretty straightforward: we see the device name, the user initiating the action, the process name
The access mask indicates the type of transfer
We’ll see here in a second that ‘0x2’ indicates a file was _written_
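For reference, a tiny sketch of decoding that mask in Python. The 0x2 meaning comes from the notes above; the 0x1 and 0x4 names follow the standard Windows file access rights and are included here as an assumption about what else might show up.

```python
# Low bits of the file access mask. 0x2 as "write" matches the notes; the
# other two follow the standard Windows file access rights.
ACCESS_BITS = {
    0x1: "read",
    0x2: "write",
    0x4: "append",
}


def decode_access_mask(mask):
    """Turn a mask string such as '0x2' into a list of access names."""
    value = int(mask, 16)
    return [name for bit, name in ACCESS_BITS.items() if value & bit]


for mask in ("0x1", "0x2", "0x6"):
    print(mask, decode_access_mask(mask))
```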
This is our generic view of the data
Emphasize the device name and the process name
We could catch any types of transfers here, not just drag and drop from explorer.
One of the cool features of the WEL pipeline is we can turn on powershell script block logging.
This gives us transcriptions of powershell executed on clients, but it’s in blocks
Here’s a sample of that data
You can see here how we have the different blocks.
Fellow engineer Trevor Pottinger wrote some nightly jobs to reassemble this, and drop these into a new table
so now we can start hunting around for different powershell scripts
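This is not Trevor's job, just a minimal sketch of the reassembly idea in Python: group blocks by ScriptBlockId, order them by MessageNumber, and only emit a script once all MessageTotal pieces are present. The sample blocks are made up; only the four field names come from the event data shown earlier.

```python
from collections import defaultdict

# Made-up EventData blobs; only the four field names come from the sample.
blocks = [
    {"ScriptBlockId": "aaaa", "MessageNumber": "2", "MessageTotal": "3",
     "ScriptBlockText": "| Sort-Object CPU "},
    {"ScriptBlockId": "aaaa", "MessageNumber": "1", "MessageTotal": "3",
     "ScriptBlockText": "Get-Process "},
    {"ScriptBlockId": "bbbb", "MessageNumber": "1", "MessageTotal": "2",
     "ScriptBlockText": "Write-Host 'part one' "},
    {"ScriptBlockId": "aaaa", "MessageNumber": "3", "MessageTotal": "3",
     "ScriptBlockText": "| Select-Object -First 5"},
]


def reassemble(blocks):
    """Return (complete scripts by id, ids that are still missing blocks)."""
    parts = defaultdict(dict)
    totals = {}
    for b in blocks:
        block_id = b["ScriptBlockId"]
        parts[block_id][int(b["MessageNumber"])] = b["ScriptBlockText"]
        totals[block_id] = int(b["MessageTotal"])
    complete, incomplete = {}, []
    for block_id, numbered in parts.items():
        expected = range(1, totals[block_id] + 1)
        if all(n in numbered for n in expected):
            complete[block_id] = "".join(numbered[n] for n in expected)
        else:
            incomplete.append(block_id)
    return complete, incomplete


scripts, missing = reassemble(blocks)
print(scripts)
print(missing)
```

Tracking the incomplete ids separately matters later, since partial scripts can skew any statistics computed over the reassembled text.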
We started with the most obvious question, “can we find mimikatz in our network?”
We staged quite a few mimikatz runs between our cert team and myself. These all showed up
This found quite a few of my test cases, but also found some folks trying to make Facebook more secure
Progressing from there, we started wondering if we had seen any native Win32 API executions
This turned up numerous script runs, but after quite a bit of filtering we found some interesting scripts
This was a pretty big script, bigger than the others I had seen to this point
Turned out to be a ninja copy run, from the same person who was turning off SE-DebugPriv earlier
He generates loads of FPs
From kernel32, we checked out a few sampled api calls themselves
This actually turned up three hits
Legit hits. Looked to be some form of Shellcode loader in powershell
Snippet closer up.
Did some OS recon and found this was from the SET framework
Notified CERT, only to find these were old infections that had already been dealt with (*Dang*)
So, I’m very excited about scouring through our PowerShell scripts for strings
Numerous talks here and elsewhere by Lee Holmes and Daniel Bohannon (and many more I’m sure) show string matching isn’t enough
We started decomposing the PowerShell scripts in hopes of finding better patterns
Looked for really high percentage frequencies of white space characters
Here you see ~40% white space, and to the right you can see the histogram of chars
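The scoring itself is simple enough to sketch in a few lines of Python. The sample string is made up to land on 40% and is not one of the scripts we found.

```python
from collections import Counter


def whitespace_ratio(script):
    """Fraction of characters that are whitespace (0.0 for an empty script)."""
    if not script:
        return 0.0
    return sum(ch.isspace() for ch in script) / len(script)


def char_histogram(script, top=5):
    """Most common characters and their counts."""
    return Counter(script).most_common(top)


sample = "abc  de  f"  # 10 characters, 4 of them spaces
print(whitespace_ratio(sample))
print(char_histogram(sample, top=3))
```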
We checked out some of the higher scorers of all time
Unfortunately these looked to just be incomplete script blocks
Call to action, emphasis here on the grow as a community component.
My hope for this talk is that you’ll be able to see us pull off some relatively advanced behavioral detections with FOSS
Potential candidates:
“Let’s make open source tools and communities more robust, supported”
Through open source we can build and foster powerful relationships across company boundaries
Engaging in these communities allows us to build better tools, more robust detections, and lower the bar to entry for companies hoping to establish robust security programs
To close out, I have one last story
We have an Open Source Mentorship program at FB
Great hiring pipeline, great way to engage students
Also great way to encourage more open source developers
We got loads of issues out of this program, which was awesome
Lots more eyes gave us much better usability
Helped us focus on barriers to entry and how we can better foster our communities
One mentee in particular, Rubab Syed
From Lahore, Pakistan, graduate student in computer science
Wrote a virtual table for monitoring pip managed packages on Linux
We got her to also port this to macOS, and this was a great success for the program as she had never done OSS contribs
Then just recently, it came out that PyPI had received some masqueraded packages.
Notification went out Sept 9th
We had open source query packs on September 15th
This was a bit after we had already scoured our internal network
Using Rubab’s table we were able to very quickly check our whole production and corporate environment for these packages
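Once the package rows are back from the fleet, the check is essentially a set lookup. A sketch under stated assumptions: the package names are placeholders (not the ones from the real notification), and the rows are made up to look like results from a Python packages table.

```python
# Placeholder names only; these are not the packages from the real advisory.
FLAGGED = {"examplepkg-typo", "reqeusts-example"}

# Made-up (hostname, package name, version) rows, shaped like fleet-wide
# results from a query against a Python packages table.
fleet_rows = [
    ("web-001", "requests", "2.18.4"),
    ("web-002", "Examplepkg-Typo", "1.0"),
    ("dev-laptop-7", "reqeusts-example", "0.1"),
]


def flagged_hosts(rows, flagged=FLAGGED):
    """Map each host to the flagged packages found on it (case-insensitive)."""
    hits = {}
    for host, name, version in rows:
        if name.lower() in flagged:
            hits.setdefault(host, []).append((name, version))
    return hits


hits = flagged_hosts(fleet_rows)
print(hits)
```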
* Ways to get in touch with the osquery team and community