A seminar presentation done for TUT's NoSQL course. A brief look into the possibility and the feasibility of using NoSQL databases to store RADIUS accounting and Syslog data. In this particular case, Syslog-NG, Radiator RADIUS server and MongoDB were used as trial platforms. The presentation includes configuration examples and also some code.
Using NoSQL databases to store RADIUS and Syslog data
1. Using NoSQL databases to store RADIUS and Syslog data, part 1: Idea
Karri Huhtanen
18.9.2012
2. Some background
• Currently, RADIUS accounting data is usually stored
in SQL databases with a fixed database schema
• For Syslog messages an SQL database can be used,
but commercial log analyzers (like Splunk) usually
use their own solutions, which may or may not be
SQL databases
• Started wondering whether a NoSQL database could be
applied to one or both of these
3. RADIUS accounting message
Wed Aug 8 13:49:33 2012
    User-Name = "jotain@realm"
    NAS-Port = 8
    NAS-IP-Address = 192.168.229.131
    Framed-IP-Address = 192.168.163.226
    NAS-Identifier = "Cisco_66:77:88"
    Airespace-WLAN-Id = 4
    Acct-Session-Id = "50223ea9/00:11:22:33:44:55/2292"
    Acct-Authentic = Remote
    Tunnel-Type = 0:VLAN
    Tunnel-Medium-Type = 0:802
    Tunnel-Private-Group-ID = 0:222
    Event-Timestamp = 1344422780
    Acct-Status-Type = Alive
    Acct-Input-Octets = 1262012
    Acct-Input-Gigawords = 0
    Acct-Output-Octets = 13518133
    Acct-Output-Gigawords = 0
    Acct-Input-Packets = 11692
    Acct-Output-Packets = 11154
    Acct-Session-Time = 1235
    Acct-Delay-Time = 19
    Calling-Station-Id = "00:11:22:33:44:55"
    Called-Station-Id = "f4:7f:35:5e:bf:b0"
    cisco-avpair = "nas-update=true"
    Digest-Response = "P"C<188>"
    Digest-Response = "P"C<194>"
    Timestamp = 1344422954
• One message contains an undetermined number of attributes. Some can be
interpreted, some stay unknown.
• The unknown attributes are usually left in OID:FieldDataType binary format.
• Because there can be a changing number of changing types of attributes, I
began to wonder if NoSQL could be used for storing these.
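The variable attribute list above maps fairly directly onto a schemaless document. The following is a minimal sketch, assuming the "Name = value" text form shown on this slide. The function name `parse_accounting` and the decision to collect repeated attributes (such as the two Digest-Response lines) into a list are illustrative assumptions, not something the slides prescribe.

```python
def parse_accounting(lines):
    """Turn 'Name = value' lines into a dict; repeated names become lists."""
    doc = {}
    for line in lines:
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # no attribute on this line (e.g. the date header)
        name = name.strip()
        value = value.strip()
        # Remove one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        if name in doc:
            # Keep every occurrence of a repeated attribute.
            if not isinstance(doc[name], list):
                doc[name] = [doc[name]]
            doc[name].append(value)
        else:
            doc[name] = value
    return doc


message = '''Wed Aug 8 13:49:33 2012
    User-Name = "jotain@realm"
    NAS-Port = 8
    Acct-Status-Type = Alive
    cisco-avpair = "nas-update=true"
    Digest-Response = "P"C<188>"
    Digest-Response = "P"C<194>"
'''
doc = parse_accounting(message.splitlines())
print(doc)
```

All values stay strings here; any type conversion is left to whoever processes the stored documents.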
4. Syslog message
The syslog message has the following ABNF [RFC5234] definition:

    SYSLOG-MSG      = HEADER SP STRUCTURED-DATA [SP MSG]

    HEADER          = PRI VERSION SP TIMESTAMP SP HOSTNAME
                      SP APP-NAME SP PROCID SP MSGID
    PRI             = "<" PRIVAL ">"
    PRIVAL          = 1*3DIGIT ; range 0 .. 191
    VERSION         = NONZERO-DIGIT 0*2DIGIT
    HOSTNAME        = NILVALUE / 1*255PRINTUSASCII

    APP-NAME        = NILVALUE / 1*48PRINTUSASCII
    PROCID          = NILVALUE / 1*128PRINTUSASCII
    MSGID           = NILVALUE / 1*32PRINTUSASCII

    TIMESTAMP       = NILVALUE / FULL-DATE "T" FULL-TIME
    FULL-DATE       = DATE-FULLYEAR "-" DATE-MONTH "-" DATE-MDAY
    DATE-FULLYEAR   = 4DIGIT
    DATE-MONTH      = 2DIGIT  ; 01-12
    DATE-MDAY       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                              ; month/year
    FULL-TIME       = PARTIAL-TIME TIME-OFFSET
    PARTIAL-TIME    = TIME-HOUR ":" TIME-MINUTE ":" TIME-SECOND
                      [TIME-SECFRAC]
    TIME-HOUR       = 2DIGIT  ; 00-23
    TIME-MINUTE     = 2DIGIT  ; 00-59
    TIME-SECOND     = 2DIGIT  ; 00-59
    TIME-SECFRAC    = "." 1*6DIGIT
    TIME-OFFSET     = "Z" / TIME-NUMOFFSET
    TIME-NUMOFFSET  = ("+" / "-") TIME-HOUR ":" TIME-MINUTE

    STRUCTURED-DATA = NILVALUE / 1*SD-ELEMENT
    SD-ELEMENT      = "[" SD-ID *(SP SD-PARAM) "]"
    SD-PARAM        = PARAM-NAME "=" %d34 PARAM-VALUE %d34
    SD-ID           = SD-NAME
    PARAM-NAME      = SD-NAME
    PARAM-VALUE     = UTF-8-STRING ; characters '"', '\' and
                                   ; ']' MUST be escaped.
    SD-NAME         = 1*32PRINTUSASCII
                      ; except '=', SP, ']', %d34 (")

    MSG             = MSG-ANY / MSG-UTF8
    MSG-ANY         = *OCTET ; not starting with BOM
    MSG-UTF8        = BOM UTF-8-STRING
    BOM             = %xEF.BB.BF

• Until researching this I thought Syslog messages had a fixed structure and
could then be handled with a fixed database schema.
• Then I read RFC5424: http://tools.ietf.org/html/rfc5424
• Here we have once again parameters, although they are within one defined
STRUCTURED-DATA field.
• So could NoSQL also be used for Syslog?
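To illustrate why the STRUCTURED-DATA field behaves like a variable set of parameters, here is a minimal sketch that turns it into a nested dictionary, which is the shape a document database would store as a subdocument. The function name `parse_structured_data` and the regular expressions are illustrative assumptions written against the grammar above, not a complete RFC5424 parser. The sample input is the structured-data example given in RFC5424 itself.

```python
import re

# SD-NAME: printable characters except '=', SP, ']' and '"'.
# PARAM-VALUE: any characters, with '"', '\' and ']' backslash-escaped.
SD_ELEMENT = re.compile(
    r'\[([^\s=\]"]+)((?: [^\s=\]"]+="(?:\\.|[^"\\\]])*")*)\]'
)
SD_PARAM = re.compile(r' ([^\s=\]"]+)="((?:\\.|[^"\\\]])*)"')


def parse_structured_data(sd):
    """Return {sd_id: {param_name: value}} for a STRUCTURED-DATA string."""
    if sd == "-":  # NILVALUE: no structured data present
        return {}
    result = {}
    for element in SD_ELEMENT.finditer(sd):
        sd_id, params = element.group(1), element.group(2)
        result[sd_id] = {
            # Undo the backslash escaping of '"', '\' and ']'.
            name: re.sub(r'\\(["\\\]])', r"\1", value)
            for name, value in SD_PARAM.findall(params)
        }
    return result


sample = (
    '[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"]'
    '[examplePriority@32473 class="high"]'
)
parsed = parse_structured_data(sample)
print(parsed)
```

Each SD-ID becomes a key whose value holds that element's parameters, so messages with different SD-ELEMENTs simply produce documents with different fields.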
5. So what happens next?
• Selection of NoSQL database:
  • Likely a Column Family Store, if no one can suggest a
  better one?
  • Something easy to set up and use; will concentrate on
  getting the RADIUS server and/or Syslogd to transfer
  data to the database.
• Setting up a WiFi access point and/or controller to
provide real RADIUS and Syslog data
• Storing data, retrieving data, searching data, deleting data
to see what works
• Writing and presenting Part II: “Implementation and
Results” of these slides
6. Results (hopefully)
• Is storing RADIUS accounting and Syslog messages in a
NoSQL database a brilliant idea, a brilliantly stupid idea or
something else?
• How hard can it be? What does it require to do this, is it
possible and how?
• Does it actually work? What can you do with the data? Is
there some indication of performance improvements or
problems?
• Will not do complete performance measurements
though; designing and setting up a reliable measurement
environment would probably take too much time.
7. Using NoSQL databases to store RADIUS and Syslog data, part II: The Saga Continues
Karri Huhtanen
27.11.2012
8. Happened earlier
• Currently, RADIUS accounting data is usually stored
in SQL databases with a fixed database schema
• For Syslog messages an SQL database can be used,
but commercial log analyzers (like Splunk) usually
use their own solutions, which may or may not be
SQL databases
• Started wondering whether a NoSQL database could be
applied to one or both of these
9. Results (luckily)
• Is storing RADIUS accounting and Syslog messages in a
NoSQL database a brilliant idea, a brilliantly stupid idea or
something else?
  → A good idea.
• How hard can it be? What does it require to do this, is it
possible and how?
  → Easy; one night before the presentation was required.
• Does it actually work? What can you do with the data? Is
there some indication of performance improvements or
problems?
  → Yes. Store and process. Unknown. Some issues to be considered.
• Will not do complete performance measurements
though; designing and setting up a reliable measurement
environment would probably take too much time.
  → Coded one Python script.
10. So what happened?
• Selection of NoSQL database:
  • Likely a Column Family Store, if no one can suggest a
  better one?
    → MongoDB
  • Something easy to set up and use; will concentrate on
  getting the RADIUS server and/or Syslogd to transfer
  data to the database.
• Setting up a WiFi access point and/or controller to
provide real RADIUS and Syslog data
• Storing data, retrieving data, searching data, deleting data
to see what works
  → Done, but not thoroughly
• Writing and presenting Part II: “Implementation and
Results” of these slides
  → Done
13. storing RADIUS accounting and Syslog
messages into NoSQL database
• It is a good idea because:
  • When we have a massive amount of log or accounting data, we need massive
  database clusters.
  • Data is mainly stored, read, analyzed and occasionally deleted. Data will not be
  updated or changed and is relatively simple (few tables with a lot of columns).
  • NoSQL may provide a better way to scale this horizontally by distribution and
  sharding.
  • It is already being done. Several log analyzers and log stores already use NoSQL
  databases as backends. There are projects such as Graylog2 which
  provide complete solutions covering log storage, visualization and analysis.
  • Logs and accounting data are actually documented use cases for some NoSQL
  databases, for example: http://docs.mongodb.org/manual/use-cases/storing-log-data/
14. storing RADIUS accounting and Syslog
messages into NoSQL database
• It is not a brilliant idea because:
  • If we look at what we need to do to optimize the performance, it starts to look
  a lot like designing and optimizing an SQL database:
  http://docs.mongodb.org/manual/use-cases/storing-log-data/
  • You cannot forget datatypes or database design even with NoSQL databases
  especially when going into production.
  • Prototypes may be faster and easier for developers, but creating a design and
  configuration which survives production use may be as hard as it has ever been.
  The difference is that instead of an SQL database expert, you now need a NoSQL
  expert.
• ... but it is not a brilliantly stupid idea either; it is an idea
worth considering, depending on the project.
15. How hard can it be?
• With Ubuntu Linux Server 12.04 LTS:
  • sudo apt-get install python-pymongo mongodb syslog-ng syslog-ng-mod-mongodb
  • for Syslog-NG, just some configuration
  • for Radiator, some configuration and coding an external
  Python script to handle accounting messages
• But this is far from production use; it is more like a prototype or
proof-of-concept implementation done in one working day.
18. Radiator RADIUS server
# /etc/radiator/radiator.cfg
#
# send all RADIUS accounting requests to external script
#
<Handler Request-Type = Accounting-Request>
    <AuthBy EXTERNAL>
        Command %D/acct2mongo.py
    </AuthBy>
    AcctLogFileName %L/acct-acct2mongodb-%Y-%M.log
</Handler>
19. acct2mongo.py
#!/usr/bin/env python
# Written against the PyMongo 2.x API (Connection, insert) current in 2012.
from pymongo import Connection
import datetime
import sys


def main():
    post = dict()

    # opening connection
    connection = Connection('localhost', 27017)
    # database 'radius'
    db = connection['radius']
    # collection 'accounting'
    collection = db['accounting']

    # record when this script handled the message
    post['acct2mongotimestamp'] = datetime.datetime.utcnow()

    # Radiator passes the request attributes on stdin as 'Name = value' lines
    for line in sys.stdin.readlines():
        pieces = line.split(' = ', 1)
        if len(pieces) == 2:
            post[pieces[0].strip().strip('"')] = pieces[1].strip().strip('"')

    collection.insert(post)
    connection.end_request()
    connection.disconnect()

    # 0 Means reply with an acceptance. For Access-Requests,
    # an Access-Accept will be sent. For Accounting-Requests,
    # an Accounting-Response will be sent.
    return 0


if __name__ == '__main__':
    # pass the return value back to Radiator as the exit status
    sys.exit(main())
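The script above uses the PyMongo 2.x API that was current at the time. Later PyMongo releases removed `Connection`, `Collection.insert`, `end_request` and `disconnect` in favour of `MongoClient`, `insert_one` and `close`. The following is a sketch of the same logic under that assumption. The function names `build_document` and `store` are illustrative, while the database and collection names are carried over from the original script. Parsing is kept separate from storage so that it can be tried without a database.

```python
import datetime


def build_document(lines):
    """Build the accounting document from Radiator's 'Name = value' lines."""
    post = {"acct2mongotimestamp": datetime.datetime.now(datetime.timezone.utc)}
    for line in lines:
        pieces = line.split(" = ", 1)
        if len(pieces) == 2:
            post[pieces[0].strip().strip('"')] = pieces[1].strip().strip('"')
    return post


def store(post, host="localhost", port=27017):
    """Insert one document into radius.accounting (needs a running MongoDB)."""
    # Imported here so that build_document works without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(host, port)
    try:
        client["radius"]["accounting"].insert_one(post)
    finally:
        client.close()


example = build_document([
    '\tUser-Name = "jotain@realm"\n',
    "\tAcct-Status-Type = Alive\n",
    "a line without an attribute\n",
])
print(sorted(example))
```

Used as the external command, the script would call `store(build_document(sys.stdin.readlines()))` and then exit with status 0, as in the original.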
20. Does it actually work? What
can you do with data?
• Yes, it does actually work, but once again it does not solve everything,
nor is it applicable to everything.
• One can store, read, search and delete data supposedly very
efficiently, but anything more complicated is harder and must be
implemented by the developer.
• For example: MongoDB does not have a reliable decimal datatype. It
is better to keep numbers as strings and convert them when
processing the data.
• Repeating the earlier statement: “You cannot forget datatypes or
database design even with NoSQL databases especially when going
into production.”
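As an illustration of converting string values at processing time, the sketch below combines the octet and gigaword counters from the accounting message shown earlier. In RADIUS accounting the gigaword attribute counts how many times the 32-bit octet counter has wrapped. The helper name `total_octets` and the assumption that every value is stored as a string are illustrative, not part of the original implementation.

```python
from decimal import Decimal


def total_octets(doc, direction="Input"):
    """Combine the 32-bit octet counter with its gigaword (wrap) counter."""
    octets = int(doc.get("Acct-%s-Octets" % direction, "0"))
    gigawords = int(doc.get("Acct-%s-Gigawords" % direction, "0"))
    return gigawords * 2**32 + octets


# Values as they would be read back from the database: all strings.
doc = {
    "Acct-Input-Octets": "1262012",
    "Acct-Input-Gigawords": "0",
    "Acct-Output-Octets": "13518133",
    "Acct-Output-Gigawords": "0",
}

received = total_octets(doc, "Input")
sent = total_octets(doc, "Output")
# Decimal keeps the division exact instead of relying on binary floats.
sent_megabytes = Decimal(sent) / Decimal(10**6)
print(received, sent, sent_megabytes)
```

Python integers do not overflow, so sessions whose counters have wrapped several times are handled without special cases.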
21. Performance?
• Would need to be measured and verified with a
real production environment or solution.
• Would also need to be compared with a well
designed and optimised SQL database, maybe even
one functioning as a NoSQL one.
• This was not tested in the implementation, as the
datasets were very small compared to real datasets.
22. Conclusions
• NoSQL should at least be considered as an option
when designing and implementing large-scale Syslog or
RADIUS accounting storage.
• For development it is flexible.
• For production use a NoSQL solution still needs design,
careful planning and testing to verify that the
performance, reliability and security are sufficient. Probably
as much as SQL database design.
• The key issue will probably be whether the SQL database can
handle the data or whether horizontal scaling is required.