Learn about structured logging with rsyslog and how it can be used to do actual format conversions. Includes config samples for Linux and Windows log sources.
4. Rainer Gerhards, http://blog.gerhards.net
Logging is simple, isn't it?
• Just generate a log record when something
interesting happens
• BUT
▫ What is “interesting”?
▫ What is required to describe the event?
▫ How do we know what the actual data item means?
▫ What does a log record look like?
• So... making sense out of logs, especially in a
heterogeneous environment, is far from being
simple...
The Logging Dilemma
• There is no universally accepted format
• Logs looking very much the same describe different
events
• The same event is described in very different-
looking log records
• Often, pseudo-free-form text is used
• For consumers, it is very hard to digest even a
decent subset of important logging formats
It's a real-world problem!
One day in my mailbox...
“I am working with a customer who is deploying a
large rsyslog environment for central logging.
Basically they want a cluster of boxes to act as the
"log of record". They would also like to have the
logs fed to a couple security products for
analysis. The customer has a limited budget so
having each vendor write parsers is cost
prohibitive. ”
Log Producers & Consumers
[Diagram: log producers (Linux boxes, Windows, other *nix, firewalls, apps), log consumers (Security Analyzer I, log storage, Security Analyzer n, capacity planning, billing), and a “?”]
Some important log sources
• Free-form text formats
▫ Traditional syslog messages
▫ Application text log files
• Structured formats
▫ Windows Event Log
▫ Linux Journal (today mostly text messages)
▫ Application text log files (XML, CSV, WELF, Apache
CLF, whatever)
▫ SNMP traps
▫ New-style syslog
How to solve that dilemma?
• Several efforts try very hard to solve this
▫ For many years
▫ With limited success
• Resulted in an approach named
“Common Event Expression” (CEE)
▫ Cross-vendor team (both OSS & commercial)
▫ Driven by US MITRE
▫ Builds on existing infrastructure
CEE's core ideas
• Keep it simple & extensible
• Support existing technology
• As far as the format is concerned
▫ name/value pairs
▫ Keep the structure as flat as possible, but permit some
hierarchy
▫ Keep dictionaries of field names, syntax and semantics
▫ Profiles specify what needs to be present in specific
event types
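As a rough illustration of these ideas, the Python sketch below represents an event as mostly flat name/value pairs and checks it against a profile. The field names and the `PROFILES` table are hypothetical, chosen only for this example; they are not taken from the CEE dictionaries.

```python
import json

# Hypothetical profile table: event type -> fields that must be present.
PROFILES = {
    "logon": {"host", "time", "user"},
    "logoff": {"host", "time", "user"},
}

def missing_fields(event_type, event):
    """Return the sorted list of profile fields absent from the event."""
    required = PROFILES.get(event_type, set())
    return sorted(required - set(event))

# Mostly flat name/value pairs, with one small nested level.
event = {
    "host": "rger-virtual-machine",
    "time": "Jan 24 02:38:49",
    "user": "rger",
    "origin": {"app": "sshd"},
}

print(json.dumps(event, sort_keys=True))
print(missing_fields("logon", event))              # nothing missing
print(missing_fields("logoff", {"user": "rger"}))  # host and time missing
```

Keeping the record flat means a consumer can look up a field by name without knowing anything about the producer.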
Project Lumberjack
• Born at last year's Fedora DevConf, right here!
• Intends to
▫ Build on CEE and drive the ideas further
▫ Provide open source implementation of core
functionality
▫ Deliver something that actually works
• Driven by Logging Professionals from Red Hat,
Balabit (syslog-ng) and Adiscon (rsyslog), open to
everyone else
What did we do in the past year?
• Agreed on the log format
• Made rsyslog fully lumberjack-aware
• Made Adiscon's Windows products fully
lumberjack-aware
• Made syslog-ng fully lumberjack-aware
• Created a new syslog API --> libumberlog
Back to my mailbox...
“I am working with a customer who is deploying a
large rsyslog environment for central logging.
Basically they want a cluster of boxes to act as the
"log of record". They would also like to have the logs
fed to a couple security products for analysis. The
customer has a limited budget so having each vendor
write parsers is cost prohibitive. A commonality
for each of the additional destinations is the
ability to ingest logs in <some common
format>. I believe rsyslog has the capability to alter
the output...”
Some rsyslog basics
• Ruleset
▫ Like a function in a programming language
▫ Consists of (conditional) statements and actions
▫ Can be called from another ruleset or bound to a
listener
• Variables
▫ Message Variables (e.g. $msg, $rawmsg)
▫ System Variables (e.g. $$now)
▫ Structured Variables: form a tree-like structure, e.g. $!usr!somevar
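To make the tree idea concrete, here is a small Python sketch (not rsyslog code) that models structured variables as nested dictionaries and resolves a `!`-separated path such as `$!usr!somevar`. The helper names `set_var` and `get_var` are made up for this illustration, and returning an empty string for a missing variable is an assumption of the sketch.

```python
def _parts(path):
    """Split '$!usr!somevar' into ['usr', 'somevar']."""
    if not path.startswith("$!"):
        raise ValueError("structured variable paths start with '$!'")
    return [p for p in path[2:].split("!") if p]

def set_var(tree, path, value):
    """Create intermediate nodes as needed and store value at the leaf."""
    *containers, leaf = _parts(path)
    node = tree
    for name in containers:
        node = node.setdefault(name, {})
    node[leaf] = value

def get_var(tree, path, default=""):
    """Walk the tree; return default when any path element is missing."""
    node = tree
    for name in _parts(path):
        if not isinstance(node, dict) or name not in node:
            return default
        node = node[name]
    return node

tree = {}
set_var(tree, "$!usr!type", "logon")
set_var(tree, "$!usr!user", "rger")
print(get_var(tree, "$!usr!user"))    # a single leaf
print(get_var(tree, "$!usr"))         # the whole subtree
print(repr(get_var(tree, "$!nope")))  # missing variable
```

Addressing a whole subtree at once is what later allows a complete branch to be written out in one step.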
Let's look at a practical case
• Goal: Unified log files with logon/logoff report
▫ For processing by backend tools (not shown)
▫ Concentrate on just four fields: host system, reception
time, username, logon/logoff status
• Inputs
▫ Linux: traditional text log messages
▫ Windows: different Agents
• Output
▫ Lumberjack JSON style
▫ CSV
Have rsyslog gather the data
module(load="imtcp")
/* mmnormalize and mmjsonparse are used as actions in the
 * rulesets, so these modules need to be loaded as well.
 */
module(load="mmnormalize")
module(load="mmjsonparse")
/* We assume that all logging arrives via TCP (for simplicity).
 * Note that we use different ports to point different sources
 * to the right rulesets for normalization. While there are
 * other methods (e.g. based on tag or source), using multiple
 * ports is both the easiest and the fastest.
 */
input(type="imtcp" port="13514" ruleset="WindowsRsyslog")
input(type="imtcp" port="13515" ruleset="LinuxPlainText")
input(type="imtcp" port="13516" ruleset="WindowsSnare")
The Linux Input Data sample
• Free-text format
Jan 16 09:28:33 rger-virtual-machine sudo: pam_unix(sudo:session): session opened for user root by rger(uid=1000)
Jan 16 09:28:33 rger-virtual-machine sudo: pam_unix(sudo:session): session closed for user root
Jan 24 02:38:49 rger-virtual-machine sshd[2414]: pam_unix(sshd:session): session opened for user rger by (uid=0)
Jan 24 02:41:22 rger-virtual-machine sshd[2414]: pam_unix(sshd:session): session closed for user rger
Parsing Free-Text Messages:
mmnormalize
• Uses a “sample rule base”
▫ One sample for each expected message type
▫ Sample contains text (for matching) and property
descriptions (like IPv4 Address, char-matches, …)
▫ If sample matches, corresponding properties are
extracted
▫ Special parser for iptables
• Also implemented as an action
• Very fast algorithm (much faster than regex)
• Based on liblognorm (which you can use in your
own programs to gain this functionality!)
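The sample idea can be illustrated outside rsyslog. The following Python sketch does not reflect how liblognorm works internally: it uses a regular expression as a stand-in, which is exactly what liblognorm avoids for speed. It only shows what "literal text plus typed properties" extracts from an sshd session message. The `SAMPLE` pattern and the `normalize` helper are assumptions made for this example.

```python
import re

# Stand-in for one sample: literal text with named, typed fields.
SAMPLE = re.compile(
    r"^(?P<rcvdat>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<rcvdfrom>\S+) sshd\[\d+\]: pam_unix\(sshd:session\): "
    r"session (?P<type>\S+) for user (?P<user>\S+)"
)

def normalize(line):
    """Return the extracted properties, or None if the sample does not match."""
    m = SAMPLE.match(line)
    return m.groupdict() if m else None

line = ("Jan 24 02:41:22 rger-virtual-machine sshd[2414]: "
        "pam_unix(sshd:session): session closed for user rger")
print(normalize(line))
print(normalize("some unrelated message"))  # no sample matches
```

A message that matches no sample yields no properties, which is why the calling logic has to check whether parsing succeeded before using the fields.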
Needs to be normalized
• Job for rsyslog's mmnormalize
• rulebase:
# SSH and sudo logins
prefix=%rcvdat:date-rfc3164% %rcvdfrom:word%
rule=: sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word% by (uid=%-:number%)
rule=: sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word%
rule=: sudo: pam_unix(sudo:session): session %type:word% for user root by %user:char-to:(%(uid=%-:number%)
rule=: sudo: pam_unix(sudo:session): session %type:word% for user %user:word%
Putting it all together:
/* Plain Linux log messages (here: ssh and sudo) need to be
 * parsed. We use mmnormalize for fast and efficient parsing.
 */
ruleset(name="LinuxPlainText") {
    action(type="mmnormalize"
           rulebase="/home/rger/proj/rsyslog/linux.rb" userawmsg="on")
    if $parsesuccess == "OK" and $!user != "" then {
        if $!type == "opened" then {
            set $!usr!type = "logon";
        } else if $!type == "closed" then {
            set $!usr!type = "logoff";
        }
        set $!usr!rcvdfrom = $!rcvdfrom;
        set $!usr!rcvdat = $!rcvdat;
        set $!usr!user = $!user;
        call outwriter
    }
}
Windows Horrors: SNARE
• Tab-delimited mess:
<131>Feb 10 15:48:12 Win2008StdR2x64_vm
MSWinEventLog#0111#011Security#0114#011Tue Feb 05 16:39:27
2013#0114624#011Microsoft-Windows-Security-Auditing#011WIN2008STDR2X64\Administrator#011N/A#011Success
Audit#011Win2008StdR2x64_vm#011Anmelden#011#011Ein Konto wurde erfolgreich
angemeldet. Antragsteller: Sicherheits-ID: S-1-5-18 Kontoname:
WIN2008STDR2X64$ Kontodomäne: WORKGROUP Anmelde-ID: 0x3e7
Anmeldetyp: 2 Neue Anmeldung: Sicherheits-ID:
S-1-5-21-3148105976-3029560809-1855765213-500 Kontoname: Administrator
Kontodomäne: WIN2008STDR2X64 Anmelde-ID: 0x1d1feb Anmelde-GUID:
{00000000-0000-0000-0000-000000000000} Prozessinformationen: Prozess-ID: 0xc40
Prozessname: C:\Windows\System32\winlogon.exe Netzwerkinformationen:
Arbeitsstationsname: WIN2008STDR2X64 Quellnetzwerkadresse: 127.0.0.1 Quellport: 0
Detaillierte Authentifizierungsinformationen: Anmeldeprozess: User32
Authentifizierungspaket: Negotiate Übertragene Dienste: - Paketname (nur NTLM): -
Schlüssellänge: 0 Dieses Ereignis wird beim Erstellen einer Anmeldesitzung generiert.
Es wird auf dem Computer
Anyhow... digest by position:
ruleset(name="WindowsSnare") {
    /* field 6 of the "#011"-delimited record holds the Windows event ID */
    set $!usr!type = field($rawmsg, "#011", 6);
    if $!usr!type == 4634 then {
        set $!usr!type = "logoff";
        set $!doProces = 1;
    } else if $!usr!type == 4624 then {
        set $!usr!type = "logon";
        set $!doProces = 1;
    } else {
        set $!doProces = 0;
    }
    if $!doProces == 1 then {
        /* 32 is the character code of a space */
        set $!usr!rcvdfrom = field($rawmsg, 32, 4);
        set $!usr!rcvdat = field($rawmsg, "#011", 5);
        /* we need to fix up the snare date */
        set $!usr!rcvdat = field($!usr!rcvdat, 32, 2) & " " &
                           field($!usr!rcvdat, 32, 3) & " " &
                           field($!usr!rcvdat, 32, 4);
        set $!usr!user = field($rawmsg, "#011", 8);
        call outwriter
    }
}
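The positional extraction may be easier to follow with a small emulation. The Python sketch below mimics how the `field()` calls in the ruleset are used: the delimiter is either a string or a numeric character code (32 is a space), and field numbers start at 1. It is an approximation written for this example, not rsyslog's implementation; returning `None` for a missing field is this sketch's own choice. The sample message is shortened from the SNARE record.

```python
def field(text, delim, number):
    """Return the 1-based field of text split on delim, or None if absent.

    delim may be a string or an integer character code (e.g. 32 for space).
    """
    if isinstance(delim, int):
        delim = chr(delim)
    parts = text.split(delim)
    if 1 <= number <= len(parts):
        return parts[number - 1]
    return None

rawmsg = ("<131>Feb 10 15:48:12 Win2008StdR2x64_vm MSWinEventLog#0111"
          "#011Security#0114#011Tue Feb 05 16:39:27 2013#0114624"
          "#011Microsoft-Windows-Security-Auditing")

event_id = field(rawmsg, "#011", 6)
host = field(rawmsg, 32, 4)
snare_date = field(rawmsg, "#011", 5)
# Drop the weekday and the year, as the ruleset does.
rcvdat = " ".join(field(snare_date, 32, n) for n in (2, 3, 4))

print(event_id)  # 4624
print(host)      # Win2008StdR2x64_vm
print(rcvdat)    # Feb 05 16:39:27
```

Positional parsing like this is fragile: if the sender adds or drops a field, every index after it shifts.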
Windows: rsyslog Agent
• Native Lumberjack format with Windows field
names
• A structured mess ;-)
<133>Feb 05 11:15:56 win7fr.intern.adiscon.com EvntSLog: @cee: {"source":
"win7fr.intern.adiscon.com", "nteventlogtype": "Security", "sourceproc":
"Microsoft-Windows-Security-Auditing", "id": "4634", "categoryid": "12545",
"category": "12545", "keywordid": "0x8020000000000000", "user": "NA",
"TargetUserSid": "S-1-5-21-803433813-209592097-1264475144-8733",
"TargetUserName": "fr", "TargetDomainName": "ADISCON", "TargetLogonId":
"0xb8c7aed", "LogonType": "7", "catname": "Logoff", "keyword": "Audit Success",
"level": "Information", "msg": "An account was logged off.\r\n\r\nSubject:\r\n\tSecurity
ID:\t\tS-1-5-21-803433813-209592097-1264475144-8733\r\n\tAccount
Name:\t\tfr\r\n\tAccount Domain:\t\tADISCON\r\n\tLogon
ID:\t\t0xb8c7aed\r\n\r\nLogon Type:\t\t\t7\r\n\r\nThis event is generated when a
logon session is destroyed. It may be positively correlated with a logon event using
the Logon ID value. Logon IDs are only unique between reboots on the same
computer."}
Parsing Lumberjack Data:
mmjsonparse
• Checks if message contains Lumberjack structured
data
▫ If so: parse out the fields and use the field names directly from the
message
▫ If not: populate Lumberjack msg field
• Implemented via action interface
▫ Can be called based on rules, thus only for specific
events
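The decision logic described above can be sketched in a few lines of Python. This is an approximation for illustration, not mmjsonparse's code: the function name `parse_cee` is made up, and the exact whitespace handling around the `@cee:` cookie is an assumption.

```python
import json

COOKIE = "@cee:"

def parse_cee(msg):
    """Return (ok, fields). ok is True when msg carried valid structured data."""
    text = msg.lstrip()
    if text.startswith(COOKIE):
        try:
            fields = json.loads(text[len(COOKIE):])
            if isinstance(fields, dict):
                return True, fields
        except json.JSONDecodeError:
            pass
    # No (valid) structured data: keep the text in the msg field.
    return False, {"msg": msg}

ok, fields = parse_cee('@cee: {"id": "4634", "TargetUserName": "fr"}')
print(ok, fields["id"])

ok, fields = parse_cee("session closed for user rger")
print(ok, fields)
```

Either way the caller ends up with a set of named fields, so later processing steps do not need to care which kind of message came in.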
Reading the Lumberjack Data:
/* The rsyslog Windows Agent uses native Lumberjack format
 * (more precisely: it is configured to use it).
 */
ruleset(name="WindowsRsyslog") {
    action(type="mmjsonparse")
    if $parsesuccess == "OK" then {
        if $!id == 4634 then {
            set $!usr!type = "logoff";
        } else if $!id == 4624 then {
            set $!usr!type = "logon";
        }
        set $!usr!rcvdfrom = $!source;
        set $!usr!rcvdat = $timereported;
        /* build DOMAIN\user */
        set $!usr!user = $!TargetDomainName & "\\" & $!TargetUserName;
        call outwriter
    }
}
What did we do so far?
• We accepted input from three different sources
▫ Free-form text
▫ Tab-delimited semi-structured
▫ Native Lumberjack
• We extracted the same information items from these
messages
• And stored these inside the $!usr branch variables
So we now need to write the
normalized output!
/* this ruleset simulates forwarding to the final destination */
ruleset(name="outwriter"){
action(type="omfile"
file="/home/rger/proj/rsyslog/logfile.csv" template="csv")
action(type="omfile"
file="/home/rger/proj/rsyslog/logfile.cee" template="cee")
}
Templates do the actual work
template(name="csv" type="list") {
    property(name="$!usr!rcvdat" format="csv")
    constant(value=",")
    property(name="$!usr!rcvdfrom" format="csv")
    constant(value=",")
    property(name="$!usr!user" format="csv")
    constant(value=",")
    property(name="$!usr!type" format="csv")
    constant(value="\n")
}
template(name="cee" type="string"
         string="@cee: %$!usr%\n")
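For readers who want to check what the two templates should produce, here is a rough Python equivalent. It assumes that `format="csv"` quotes every field and doubles embedded quotes, which is what Python's `csv` module does with `QUOTE_ALL`. The record values come from the Linux sample; the helper names `render_csv` and `render_cee` are hypothetical.

```python
import csv
import io
import json

def render_csv(usr):
    """Render one record like the csv template: four quoted fields, newline."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([usr["rcvdat"], usr["rcvdfrom"], usr["user"], usr["type"]])
    return out.getvalue()

def render_cee(usr):
    """Render the record like the cee template: cookie plus JSON subtree."""
    return "@cee: " + json.dumps(usr) + "\n"

usr = {
    "type": "logon",
    "rcvdfrom": "rger-virtual-machine",
    "rcvdat": "Jan 24 02:38:49",
    "user": "rger",
}
print(render_csv(usr), end="")
print(render_cee(usr), end="")
```

The CSV variant fixes the field order in the template, while the CEE variant simply serializes whatever the `$!usr` branch contains.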
And the same in CSV:
"Jan 16 09:28:33","rger-virtual-machine","root","logon"
"Jan 16 09:28:33","rger-virtual-machine","root","logoff"
"Jan 24 02:38:49","rger-virtual-machine","rger","logon"
"Feb 05 16:39:27","Win2008StdR2x64_vm","WIN2008STDR2X64\Administrator","logon"
"Jan 25 15:44:35","WIN-VSBQP2NOITT","WIN-VSBQP2NOITT\te","logoff"
"Feb 5 11:15:56","win7fr.intern.adiscon.com","ADISCON\fr","logoff"
"Feb 5 13:41:28","win7fr.intern.adiscon.com","NT AUTHORITY\SYSTEM","logon"
Of course, this is just a small
example, but
• It shows how all the pieces can be put together
• mmnormalize is a very important building block to
integrate free-form text logs, no matter what the
source is
• The output format is highly flexible
• Of course, structured outputs like MongoDB or
Elasticsearch are also supported
• We can emit almost all output formats; new ones
require relatively little work in rsyslog's engine
Bottom line
• Rsyslog can already act today as a universal log format
translator
• We hope that consumer tools will make use of the
simple-to-process lumberjack format
• HOWEVER, we can already convert into what
today's real-world analysis tools can digest
Once again back to my inbox...
• “I know this is asking a lot since rsyslog would
have to do a bunch of processing. I also understand
there may be a delay in log delivery due to the
processing.”
• Well … actually it's far from being as bad as
described:
▫ Structured logs are ingested very quickly
▫ Liblognorm/mmnormalize is extremely fast in
converting classical text logs
▫ Reformatting is always done anyway, so... ;-)
Long-Term Vision
• There NEVER will be a single format
▫ Political reasons (vendors, projects, history, ...)
▫ Need for new features/functionality
• BUT: use as few as possible
▫ Less hassle for producer and consumer devs
▫ Forces closed-source vendors to support these
standards, making it easier for the OSS guys
▫ Big win for Enterprise folks who get plug&play
• We hope that Lumberjack will be dominant
▫ Stack already in place
▫ Good & simple solution
▫ Rsyslog converts everything running on Linux