SOLUTIONS GUIDE
Splunk® for Cyber Threat Analysis
A Big Data Approach to Enterprise Security
Challenge of Discovering Known and Unknown Threats
In today’s cyber battlefield, a vast amount of information
collected from the IT architecture is commonly processed,
aggregated and correlated to identify security incidents. This
effort is largely a search for known threats: incidents that have
been predefined as security threats. The cyber analyst sets up
behavioral rules that identify such incidents and map each one to
an appropriate level of response. These rules are commonly built
into the detection technology itself or implemented via a security
information and event management (SIEM) technology.
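To make the tier-1 model concrete, the following is a minimal Python sketch of a predefined “known threat” rule: flag any source that produces five or more failed logins within 60 seconds, and map the match to a response level. The event fields (`time`, `src_ip`, `action`), the threshold, the window and the severity labels are all hypothetical; this is not the rule syntax of Splunk or of any particular SIEM.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical event shape; real log events carry many more fields.
@dataclass
class Event:
    time: int      # seconds since the epoch
    src_ip: str
    action: str    # e.g. "login_failure", "login_success"

# Hypothetical predefined rule parameters.
THRESHOLD = 5      # failures needed to match the rule
WINDOW = 60        # seconds

def failed_login_rule(events):
    """Return {src_ip: severity} for every source that matches the rule."""
    failures = defaultdict(list)
    for e in events:
        if e.action == "login_failure":
            failures[e.src_ip].append(e.time)

    alerts = {}
    for src, times in failures.items():
        times.sort()
        start, worst = 0, 0
        for end in range(len(times)):
            # Slide the window forward until it spans at most WINDOW seconds.
            while times[end] - times[start] > WINDOW:
                start += 1
            worst = max(worst, end - start + 1)
        if worst >= THRESHOLD:
            # Map the match to a predefined response level.
            alerts[src] = "high" if worst >= 2 * THRESHOLD else "medium"
    return alerts


events = [Event(t, "10.0.0.5", "login_failure") for t in range(0, 60, 10)]
events += [Event(t, "10.0.0.9", "login_failure") for t in (0, 100, 200)]
print(failed_login_rule(events))  # {'10.0.0.5': 'medium'}
```

A rule like this only finds what it was written to find: the pattern, threshold and response are all fixed in advance.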
From an enterprise security point of view, this methodology of
aggregation and correlation is often targeted at the tier-1 data
center level, which operates as the front-line defense of your IT
security. This combination of people and technology falls under
the broad term computer network defense (CND) and has long
served as the baseline for all security operations (SecOps).
While current technologies and methods are still somewhat
effective in identifying breaches, attackers have changed their
methodologies and have made the “what you know” proposition
much more difficult to quantify. Compounding the issue is
the explosion of unstructured data from increasingly complex
technologies that often do not fit nicely into the structured world
of SIEM, which can impose artificial restrictions on the collection
of specific data types and provide little visibility into attack
patterns and context.
In response to more sophisticated attacks, a new kind of cyber
threat analyst has emerged operating at the tier-3 level. This
analyst functions as a “security intelligence analyst” and is
often called upon to perform detailed analysis of a security
incident. Rather than the point-in-time, predetermined
analysis of the tier-1 analyst, the intelligence analyst must
consider threats against a much larger pool of information,
some machine generated and some human generated, over a
significantly longer period of time. The unfortunate truth is that
the pre-defined tools of the tier-1 analyst, which are designed to
reduce the amount of data for analysis, are not suitable for the
investigative needs of the security intelligence analyst.
A Big Data Approach to Discovering Unknown Threats
While Splunk can certainly address the tier-1 needs of reduction
and correlation, it was designed to support a new paradigm of
data discovery. This shift rejects a data reduction strategy in
favor of a data inclusion strategy, which supports analysis of
very large datasets through data indexing and the MapReduce
approach pioneered by Google. As a result, Splunk can collect
data from virtually any available data source without
normalization at collection time, then analyze security incidents
with statistical analysis.
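The data inclusion idea can be illustrated with a small sketch of schema-on-read processing in a map/reduce style: raw events from different sources are kept as-is, fields are extracted only when a question is asked, and the answer is assembled by a map step and a reduce step. This is a toy, single-process illustration of the general pattern, assuming simple `key=value` log text; the sample events and helper names are hypothetical, and it is not how Splunk implements indexing or distributes searches.

```python
import re
from collections import Counter
from functools import reduce

# Matches key=value or key="quoted value" anywhere in a raw event.
KV = re.compile(r'(\w+)=("[^"]*"|\S+)')

def extract(raw):
    """Search-time field extraction: pull key=value pairs from a raw line."""
    return {key: value.strip('"') for key, value in KV.findall(raw)}

def map_phase(raw, field):
    """Map: emit (value, 1) if this event happens to contain `field`."""
    fields = extract(raw)
    return [(fields[field], 1)] if field in fields else []

def reduce_phase(totals, pairs):
    """Reduce: add each emitted count into the running totals."""
    for value, n in pairs:
        totals[value] += n
    return totals

def count_by(raw_events, field):
    mapped = (map_phase(raw, field) for raw in raw_events)
    return reduce(reduce_phase, mapped, Counter())


# Heterogeneous raw events, stored exactly as collected (no normalization).
raw_events = [
    "2024-01-01T00:00:00 fw action=deny src=10.0.0.5 dst=192.0.2.10",
    'Jan  1 00:00:01 sshd: msg="Failed password" src=10.0.0.5 user=root',
    '{"note": "JSON event with no matching fields"}',
    "proxy src=10.0.0.7 url=http://example.com/",
]
print(count_by(raw_events, "src"))  # Counter({'10.0.0.5': 2, '10.0.0.7': 1})
```

Because nothing is discarded at collection time, the same raw events can later answer a different question, such as `count_by(raw_events, "user")`, without re-ingesting anything.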
Other Splunk functionality often leveraged for
threat analysis includes:
Indexed data storage with automated field extraction.
Splunk does not store data in a traditional schema-based
row-and-column format: events are interpreted as they are.
This is especially important for events with ‘multi-value’
fields, where the same field appears with several values in
a single event. Such fields are common in data sources that
track SMTP addresses, because the number of addresses in
each event varies. Splunk extracts each of these values
separately, regardless of how many a given event contains.
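The following Python sketch illustrates the multi-value idea, assuming a hypothetical mail log in which each recipient appears as its own `to=` field. A flat one-value-per-column record keeps only one value per field, while a multi-value extraction keeps every recipient, however many the event contains. The log format and the `extract_multivalue` helper are illustrative assumptions, not Splunk’s extraction engine.

```python
import re
from collections import defaultdict

# Matches key=value pairs; a field name may repeat within one event.
FIELD = re.compile(r'(\w+)=(\S+)')

def extract_multivalue(raw):
    """Keep every value of every field, so repeated fields become lists."""
    fields = defaultdict(list)
    for key, value in FIELD.findall(raw):
        fields[key].append(value)
    return dict(fields)


event = "mta: from=alice@example.com to=bob@example.org to=carol@example.net"
print(extract_multivalue(event)["to"])  # ['bob@example.org', 'carol@example.net']

# A flat one-value-per-column record silently keeps only the last recipient.
print(dict(FIELD.findall(event))["to"])  # carol@example.net
```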
Statistical analysis command language. Splunk offers
a ‘search language’ rather than an SQL-style query
language. While SQL is adequate for searching what you
know (such as values in indexed columns), it is poorly
suited to ad-hoc queries, since it is a highly structured
language designed to ‘dump’ the contents of a cell. In
contrast, the Splunk search language offers much greater
freedom in formulating questions on the fly, with a
search-friendly interface focused more on getting answers
than on formatting questions. Additionally, much of the
search language is designed to manipulate the data, not
just return it. For instance, the Splunk stats command can
process a field in many ways, such as average, first value,
list, max, mean, mode, percentile, per-hour rate, range,
standard deviation, sum and variance, to name just a few.
The ability to ask nearly any conceivable question of the
data, rather than simply dumping it, is a key capability for
threat analysis.
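As a rough illustration of what manipulating the data means, the Python sketch below groups events by one field and summarises another with several of the statistics listed above. In Splunk itself a comparable question might be phrased along the lines of `stats count avg(bytes) max(bytes) stdev(bytes) by src_ip`; the field names `bytes` and `src_ip`, the event dictionaries and the `stats_by` helper are hypothetical, and the output layout does not mirror Splunk’s.

```python
import statistics
from collections import defaultdict

def stats_by(events, value_field, by_field):
    """Group events by `by_field` and summarise the numeric `value_field`."""
    groups = defaultdict(list)
    for event in events:
        # Events missing either field are skipped rather than rejected.
        if value_field in event and by_field in event:
            groups[event[by_field]].append(float(event[value_field]))

    summary = {}
    for key, values in groups.items():
        # Sample stdev/variance need two or more values; 0.0 is used
        # otherwise, which is a choice made for this sketch.
        many = len(values) > 1
        summary[key] = {
            "count": len(values),
            "avg": statistics.mean(values),
            "max": max(values),
            "range": max(values) - min(values),
            "sum": sum(values),
            "stdev": statistics.stdev(values) if many else 0.0,
            "var": statistics.variance(values) if many else 0.0,
        }
    return summary


events = [
    {"src_ip": "10.0.0.5", "bytes": "100"},
    {"src_ip": "10.0.0.5", "bytes": "300"},
    {"src_ip": "10.0.0.7", "bytes": "50"},
    {"src_ip": "10.0.0.7"},  # no bytes field: skipped
]
print(stats_by(events, "bytes", "src_ip")["10.0.0.5"]["avg"])  # 200.0
```

Each summary is computed at question time from the raw values, so adding another statistic is a one-line change rather than a schema change.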
Add knowledge to make Splunk smarter. Tagging, combined
with Splunk’s ability to scale to very large datasets, allows
threat analysts to classify data independently of its source.
This can be as simple as tagging a particular IP address as
‘hostile,’ which can then feed an IP-hostile report or a
report organized by IP address that can be analyzed
separately. Since tagging is performed at search time
rather than at index time, you can view data by different