2. Agenda
Lookups in General
Static Lookups
Dynamic Lookups
- Retrieve fields from a web site
- Retrieve fields from a database
- Retrieve fields from a persistent cache
4. Splunk: The Engine for Machine Data
[Diagram: sample of machine data sources]
Customer-Facing Data: Click-stream data, Shopping cart data, Online transaction data
Outside the Datacenter: Manufacturing, logistics, …; CDRs & IPDRs; Power consumption; RFID data; GPS data
Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets
• Windows: Registry, Event logs, File system, sysinternals
• Linux/Unix: Configurations, syslog, File system, ps, iostat, top
• Virtualization & Cloud: Hypervisor, Guest OS, Apps, Cloud
• Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
• Databases: Configurations, Audit/query logs, Tables, Schemas
• Networking: Configurations, syslog, SNMP, netflow
9. Interesting Things to Lookup
• User’s Mailing Address
• Error Code Descriptions
• Product Names
• Stock Symbol (from CUSIP)
• External Host Address
• Database Query
• Web Service Call for Status
• Geo Location
10. Other Reasons For Lookup
• Work around a developer or vendor that does not enrich logs
• Imaginative correlations
- Example: A website URL with a “Like” or “Dislike” count stored in an external source
• Make your data more interesting
- Better to see textual descriptions than arcane codes
12. Static vs. Dynamic Lookup
Static: External data comes from a CSV file
Dynamic: External data comes from the output of an external script, which resembles a CSV file
13. Static Lookup Review
• Pick the input fields that will be used to get output fields
• Create or locate a CSV file that has all the fields you need in the proper order
• Tell Splunk via the Manager about your CSV file and your lookup
- You can also define lookups manually via props.conf and transforms.conf
• If you use automatic lookups, they will run every time the source, sourcetype or associated host stanza is used in a search
• Non-automatic lookups run only when the lookup command is invoked in the search
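To make the mechanics concrete, here is a minimal sketch of what a static lookup does conceptually: load a CSV table keyed on an input field, then add the matching output field to each event. The table contents, the field names `status` and `status_description`, and the helpers `load_lookup` and `apply_lookup` are assumptions for illustration only; this is not Splunk's implementation.

```python
import csv
import io

# Hypothetical lookup table; in Splunk this would live in a CSV file
LOOKUP_CSV = """status,status_description
200,OK
404,Not Found
500,Internal Server Error
"""

def load_lookup(csv_text, input_field, output_field):
    """Build a dict mapping each input field value to its output field value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[input_field]: row[output_field] for row in reader}

def apply_lookup(events, table, input_field, output_field):
    """Return copies of the events, adding the output field when the key matches."""
    enriched = []
    for event in events:
        event = dict(event)  # copy so the caller's events are not modified
        value = table.get(event.get(input_field))
        if value is not None:
            event[output_field] = value
        enriched.append(event)
    return enriched

table = load_lookup(LOOKUP_CSV, "status", "status_description")
events = [
    {"uri": "/index.html", "status": "200"},
    {"uri": "/missing", "status": "404"},
    {"uri": "/odd", "status": "999"},
]
result = apply_lookup(events, table, "status", "status_description")
```

Events whose input value is not in the table (status 999 here) pass through unchanged, without the output field.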
15. Permissions
Define Lookups via Splunk Manager & set permissions there
local.meta
[lookups/http_status.csv]
export = system
[transforms/http_status]
export = system
18. Dynamic Lookups
• Write the script to simulate access to external source
• Test the script with one set of inputs
• Create the Splunk Version of the lookup script
• Register the script with Splunk via Manager or conf files
• Test the script explicitly before using automatic lookups
19. Lookups vs Custom Command
• Use dynamic lookups when returning fields given input fields
- Standard use case for users who already are familiar with lookups
• Use a custom command when doing MORE than a lookup
- Not all use cases involve just returning fields
- Decrypt event data
- Translate event data from one format to another with new fields (e.g. FIX)
20. Write/Test External Field Gathering Script
[Diagram: Your Python Script sends input fields to External Data in Cloud, which returns output fields]
21. Example Script to Test External Lookup
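import socket

# Given a host, find the corresponding IP addresses
def mylookup(host):
    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        hostname, aliaslist, ipaddrlist = socket.gethostbyname_ex(host)
        return ipaddrlist
    except socket.error:
        # Resolution failed: return an empty list
        return []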
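Testing the script with one set of inputs can be done without Splunk, and even without network access. The sketch below is one way to do it: it adds a `resolver` parameter to the lookup function (an assumption, not part of the original script) so that a hypothetical `fake_resolver` can stand in for DNS during the test.

```python
import socket

def mylookup(host, resolver=socket.gethostbyname_ex):
    """Return the list of IP addresses for host, or [] if resolution fails."""
    try:
        hostname, aliaslist, ipaddrlist = resolver(host)
        return ipaddrlist
    except socket.error:
        return []

# Hypothetical stand-in for DNS so the check runs offline
def fake_resolver(host):
    known = {"a.b.c.com": ["1.2.3.4"]}
    if host not in known:
        raise socket.gaierror("unknown host")
    return (host, [], known[host])

found = mylookup("a.b.c.com", resolver=fake_resolver)
missing = mylookup("no-such-host.example", resolver=fake_resolver)
```

Calling `mylookup(host)` without the second argument uses the real resolver.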
22. External Field Gathering Script with Splunk
[Diagram: External Data in Cloud; Your Python Script; Return: Output Fields]
23. Script for Splunk Simulates Reading Input CSV
hostname, ip
a.b.c.com
zorrosty.com
seemanny.com
24. Output of Script Returns Logically Complete CSV
hostname, ip
a.b.c.com, 1.2.3.4
zorrosty.com, 192.168.1.10
seemanny.com, 10.10.2.10
26. Example Dynamic Lookup conf files
transforms.conf
# Note – this is an explicit lookup
[whoisLookup]
external_cmd = whois_lookup.py ip whois
external_type = python
fields_list = ip, whois
27. Dynamic Lookup Python Flow
def lookup(input):
    Perform external lookup based on input. Return result.

main():
    Check standard input for CSV headers.
    Write headers to standard output.
    For each line in standard input (input fields):
        Gather input fields into a dictionary (key-value structure)
        ret = lookup(input fields)
        If ret:
            Send input values and return values from lookup to standard output
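The flow above can be sketched as a small runnable script. This is a minimal illustration under stated assumptions: `lookup` is a hypothetical stub backed by a hard-coded dictionary in place of a real external source, `run_lookup` is an illustrative name, and `csv.DictReader` replaces manual header handling.

```python
import csv
import io

def lookup(value):
    """Hypothetical external lookup; a real script would query a web site or database."""
    fake_source = {"a.b.c.com": "1.2.3.4", "zorrosty.com": "192.168.1.10"}
    return fake_source.get(value, "")

def run_lookup(infile, outfile, input_field, output_field):
    """Read CSV rows from infile, fill in output_field, write complete rows to outfile."""
    reader = csv.DictReader(infile)
    header = reader.fieldnames or []
    if input_field not in header or output_field not in header:
        raise ValueError("input and output fields must exist in the CSV header")
    writer = csv.DictWriter(outfile, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only call the external source when the output field is still empty
        if row[input_field] and not row[output_field]:
            row[output_field] = lookup(row[input_field])
        # Rows that could not be completed are not written
        if row[output_field]:
            writer.writerow(row)

# Simulate the CSV that would arrive on standard input
sample_in = io.StringIO("hostname,ip\na.b.c.com,\nzorrosty.com,\nunknown.example,\n")
sample_out = io.StringIO()
run_lookup(sample_in, sample_out, "hostname", "ip")
output_text = sample_out.getvalue()
```

In a script registered with Splunk, the call would instead pass `sys.stdin`, `sys.stdout`, and the two field names taken from `sys.argv`.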
28. Whois Lookup
import csv
import sys

def main():
    if len(sys.argv) != 3:
        print("Usage: python whois_lookup.py [ip field] [whois field]")
        sys.exit(0)
    ipf = sys.argv[1]
    whoisf = sys.argv[2]
    r = csv.reader(sys.stdin)
    w = None
    header = []
    first = True
    …
29. Whois Lookup (cont.) to Read CSV Header
    # First read the "CSV header" and output the field names
    for line in r:
        if first:
            header = line
            if whoisf not in header or ipf not in header:
                print("IP and whois fields must exist in CSV data")
                sys.exit(0)
            csv.writer(sys.stdout).writerow(header)
            w = csv.DictWriter(sys.stdout, header)
            first = False
            continue
        …
30. Whois Lookup (cont.) to Populate Input Fields
        # Read the result and populate the values for the input fields
        # (IP address in our case)
        result = {}
        i = 0
        while i < len(header):
            if i < len(line):
                result[header[i]] = line[i]
            else:
                result[header[i]] = ''
            i += 1
31. Whois Lookup (cont.) to Populate Input Fields
        # Perform the whois lookup if necessary
        if len(result[ipf]) and len(result[whoisf]):
            w.writerow(result)
        # Else call the external website to get the whois field, using the IP address as the key
        elif len(result[ipf]):
            result[whoisf] = lookup(result[ipf])
            if len(result[whoisf]):
                w.writerow(result)
33. Database Lookup
• Acquire proper modules to connect to the database
• Connect and authenticate to database
• Use a connection pool if possible
• Have lookup function query the database
• Return a list([]) of results
34. Database Lookup vs. Database Sent To Index
• Well, it depends…
• Use a Lookup when:
- Running needle-in-the-haystack searches with a few users
- Using form searches returning few results
• Index the database table or view when:
- Having LOTS of users and ad hoc reporting is needed
- It’s OK to have “stale” data (N minutes old) for a dynamic database
35. Example Database Lookup using MySQL
# First connect to the DB outside of the for loop
conn = MySQLdb.connect(host="localhost",
                       user="name of user",
                       passwd="password",
                       db="Name of DB")
cursor = conn.cursor()
36. Example Database Lookup (cont.) using MySQL
import MySQLdb…
# Given a city, find its country
def lookup(city, cur):
    try:
        # Pass the city as a query parameter so the driver handles quoting
        selString = "SELECT country FROM city_country WHERE city = %s"
        cur.execute(selString, (city,))
        row = cur.fetchone()
        return row[0]
    except:
        return []
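For a version that runs without a MySQL server, the same city-to-country lookup can be sketched with Python's built-in `sqlite3` module as a stand-in. The in-memory table and its sample rows are assumptions for illustration. The sketch returns a list, as the Database Lookup slide suggests, and uses a parameterized query so that odd input cannot alter the SQL. Note that `sqlite3` uses `?` placeholders.

```python
import sqlite3

def lookup(city, cur):
    """Given a city, return a list with its country, or [] when there is no match."""
    # The ? placeholder lets the driver handle quoting of the city value
    cur.execute("SELECT country FROM city_country WHERE city = ?", (city,))
    row = cur.fetchone()
    return [row[0]] if row else []

# In-memory database with hypothetical sample data
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE city_country (city TEXT, country TEXT)")
cursor.executemany(
    "INSERT INTO city_country VALUES (?, ?)",
    [("Paris", "France"), ("Tokyo", "Japan")],
)
conn.commit()

paris = lookup("Paris", cursor)
unknown = lookup("Atlantis", cursor)
tricky = lookup("x' OR '1'='1", cursor)  # treated as a literal city name
conn.close()
```

As in the MySQL example, the connection is opened once, outside the per-row lookup function.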
37. Lookup Using Key Value Persistent Cache
• Download and install Redis
• Download and install Redis Python module
• Import Redis module in Python and populate key value DB
• Import Redis module in lookup function given to Splunk to lookup a value given a key
Redis is an open source, advanced key-value store.
38. Redis Lookup
### CHANGE PATH according to your Redis install ###
sys.path.append("/Library/Python/2.6/…/redis-2.4.5-py.egg")
import redis
…
def main():
    …
    # Connect to Redis; change for your distribution
    pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
    redp = redis.Redis(connection_pool=pool)
40. Combine Persistent Cache with External Lookup
• For data that is “relatively static”
- First see if the data is in the persistent cache
- If not, look it up in the external source such as a database or web service
- If results come back, add results to the persistent cache and return results
• For data that changes often, you will need to create your own cache retention policies
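The cache-first pattern and a simple retention policy can be sketched as follows. This is a minimal illustration, not a Redis client: `TTLCache` is a hypothetical in-memory stand-in for the persistent store, `cached_lookup` and `slow_source` are illustrative names, and the 60-second time-to-live is an arbitrary choice. Redis itself also offers key expiry, which could replace the manual bookkeeping shown here.

```python
import time

class TTLCache:
    """Tiny in-memory stand-in for a persistent key-value store such as Redis."""
    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Retention policy: drop entries older than the TTL
            del self.store[key]
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)

def cached_lookup(cache, key, external_lookup):
    """Check the cache first; on a miss, query the external source and cache any result."""
    value = cache.get(key)
    if value is not None:
        return value
    value = external_lookup(key)
    if value:
        cache.set(key, value)
    return value

# Hypothetical external source that records how often it is called
calls = []
def slow_source(ip):
    calls.append(ip)
    return "owner-of-" + ip

now = [1000.0]  # controllable clock so expiry can be shown without waiting
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
first = cached_lookup(cache, "10.0.0.1", slow_source)
second = cached_lookup(cache, "10.0.0.1", slow_source)  # served from the cache
calls_before_expiry = len(calls)
now[0] += 61
third = cached_lookup(cache, "10.0.0.1", slow_source)  # entry expired, source queried again
```

A short TTL suits data that changes often; for "relatively static" data the TTL can be long or omitted.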
41. Combining Redis with Whois Lookup
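def lookup(redp, ip):
    try:
        # Check the persistent cache first
        ret = redp.get(ip)
        if ret is not None and ret != '':
            return ret
        else:
            # Cache miss: query the external whois service
            whois_ret = urllib.urlopen(LOCATION_URL + ip)
            lines = whois_ret.readlines()
            # readlines() returns a list, so test for emptiness
            if lines:
                redp.set(ip, lines)
                return lines
    …
    except: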
42. Where do I get the add-ons from today?
Splunkbase!
Add-On | Download Location | Release
Whois | http://splunk-base.splunk.com/apps/22381/whois-add-on | 4.x
DBLookup | http://splunk-base.splunk.com/apps/22394/example-lookup-using-a-database | 4.x
Redis Lookup | http://splunk-base.splunk.com/apps/27106/redis-lookup | 4.x
Geo IP Lookup (not in these slides) | http://splunk-base.splunk.com/apps/22282/geo-location-lookup-script-powered-by-maxmind | 4.x
43. Conclusion
Lookups are a powerful way to enhance your search experience beyond indexing the data.
Splunk is a data engine for your machine data. It gives you real-time visibility and intelligence into what’s happening across your IT infrastructure, whether it’s physical, virtual or in the cloud. Everybody now recognizes the value of this data; the problem up to now has been getting to it. At Splunk we applied the search engine paradigm to rapidly harness any and all machine data wherever it originates. The “no predefined schema” design means you can point Splunk at any of your data, regardless of format, source or location. There is no need to build custom parsers or connectors, there’s no traditional RDBMS, and there’s no need to filter and forward.
Here we see just a sample of the kinds of data Splunk can ‘eat’.
Reminder: what’s the ‘big deal’ about machine data? It holds a categorical record of the following:
• User transactions
• Customer behavior
• Machine behavior
• Security threats
• Fraudulent activity
You can imagine that a single user transaction can span many systems and sources of this data, or that a single service relies on many underlying systems. Splunk gives you one place to search, report on, analyze and visualize all this data.