Why it matters to IT
MACY CRONKRITE
@MACYCRON
www.facebook.com/safehex
Data Mining for Organization Value
Data you are already processing has value
• Audit trail & application status
• Automatic monitoring for errors and warnings
• Helping track down configuration problems
• Helping track down bugs
• Micro analysis of user behavior “click stream” and
complex events
• No more email to “monitor a process”
• Get alerts only when something critically fails.
What is it?
• Search and analysis engine
• Google-like search of your log data
RDBMS???
ARE YOU KIDDING?
Organization Data is
BIG DATA
(velocity-variety-volume)
So Map Reduce – Key Value Pairs FTW!!!!
Old way RDBMS
>>> New Way (Map Reduce)
• Could be better than user-supplied info (a.k.a. tickets and complaints), and it
catches unreported errors.
• Behavior Analysis (Good and Bad)
Versions
• Free
– 500MB/day
– Reporting
– Ad-hoc search
• Enterprise (all of the above, plus)
– More than 500MB/day
– Access controls
– Distributed Search, Load Balancing
– Monitoring & Alerting
Server 1
• Install Splunkd and SplunkWeb
• Via WebGUI under Manager tab
• Add Receiver Port to enable forwarders
Server1
Setup 2
Forwarder Setup (most common)
• Server1
– Install Splunkd and SplunkWeb
• ServerX
– Install Splunkd
[Diagram: Server1 with several ServerX forwarders]
MACHINE DATA
• Most sensors create log files
• Anything with a time-stamp
• Unstructured data (many source types)
• Anything that the system does on behalf of a
user can be tracked, aggregated, and
correlated across servers and applications
• At minimum, two keys are needed:
– a timestamp and a unique user session ID.
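A minimal sketch of the two-key idea, in Python rather than Splunk: events from different servers are grouped by session ID and ordered by timestamp. The field names (ts, session, host, msg) and the sample events are hypothetical, made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events from two servers; all field names are assumptions.
events = [
    {"ts": "2013-05-01T12:18:52", "session": "abc", "host": "app1", "msg": "order saved"},
    {"ts": "2013-05-01T12:18:50", "session": "abc", "host": "web1", "msg": "login"},
    {"ts": "2013-05-01T12:19:10", "session": "xyz", "host": "web1", "msg": "login"},
]

def correlate(events):
    """Group events by session ID, each group sorted by timestamp."""
    by_session = defaultdict(list)
    for event in events:
        by_session[event["session"]].append(event)
    for group in by_session.values():
        group.sort(key=lambda e: datetime.fromisoformat(e["ts"]))
    return dict(by_session)

timeline = correlate(events)
for session, group in timeline.items():
    print(session, [(e["host"], e["msg"]) for e in group])
```

With only those two keys, one user's activity can be followed across the web server and the app server in order.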
Why --- Event Correlation
• It leverages a natural query language to
perform searches and analysis of log files.
• A single search can cross multiple disparate
logs looking for key words and other
structures
• Splunk is licensed per volume of data
indexed, not on a per server basis
• Build Apps (custom views) for specific ROLES
Mix Human Event Reports AND Machine Events
Correlate your 1X / Base case instantly
LOGS are on all layers of your application stack
Alert when the combination of events meets criteria.
Less for humans to parse. Whew!!
Less data overload to ignore; you won’t go back
What is Splunk?
• Sounds like it’s expensive or it takes weeks to set up.
• There’s a free license. It installs in 15 minutes. On your laptop, while you’re testing it
out, search billions of events in seconds. When you’re ready, scale up to your datacenter and
search trillions. Basic searching and quite a lot of the reporting will work right out of the box.
• Bullsxxx.
Well I’m not saying that 15 minutes in, it’s going to be emailing your boss a pdf pie chart of
“lost revenue – top causes”. But that’s seriously possible in a couple of hours. Out of the
box, Splunk will parse your data and extract out a lot of meaning, and if it doesn’t get
everything, teaching it how to extract the juicy numbers and names from your events is really
pretty straightforward. Then, once all the numbers and names are extracted and ready to be
reported on, you’ll be able to do real searches and reports that help your people solve real
problems. And when you get to that point, from then on it’s pretty much crack. My goal in
this document is to get you addicted. Sorry.
• Download Splunk for free and try it for yourself from splunk.com, right now.
Uses
• Right Now we are using Splunk to calculate our VPN metrics
for the Remote Access service
• Total Sessions
– index="vpn" user authentication Successful | stats count AS Logins
• Unique users
– index="vpn" %ASA-6-113004 | rex field=_raw "user = (?<Username>.*)" | dedup Username | stats count AS UniqueUsers
• For information usage, “non ‘mm’ machines”
– index="vpn" Received request for DHCP hostname for DDNS | rex field=_raw "hostname for DDNS is: (?<Machine>.*)!" | eval Machine=lower(Machine) | search Machine!="mm*" | rex field=_raw "Username = (?<User>.*), IP" | table User, Machine
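To make the "Total Sessions" and "Unique users" searches concrete, here is a rough Python sketch of the same logic: filter the login events, pull the user name out with a regex, then count all logins and distinct users. The sample log lines are hypothetical stand-ins for the ASA message, and the substring filter is a simplification of Splunk's keyword matching. Note that Python spells named groups (?P<name>...) where Splunk's rex uses (?<name>...).

```python
import re

# Hypothetical log lines shaped like the message the search targets.
lines = [
    "%ASA-6-113004: AAA user authentication Successful : server = 10.0.0.1 : user = alice",
    "%ASA-6-113004: AAA user authentication Successful : server = 10.0.0.1 : user = bob",
    "%ASA-6-113004: AAA user authentication Successful : server = 10.0.0.1 : user = alice",
]

# Same extraction as the rex command, in Python's named-group syntax.
USER_RE = re.compile(r"user = (?P<Username>.*)")

def login_stats(lines):
    """Count successful logins and distinct user names."""
    logins = [line for line in lines if "user authentication Successful" in line]
    users = set()
    for line in logins:
        match = USER_RE.search(line)
        if match:
            users.add(match.group("Username").strip())
    return {"Logins": len(logins), "UniqueUsers": len(users)}

print(login_stats(lines))
```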
Transactions ACROSS devices
• Can we calculate the transaction duration IN SPLUNK, e.g. from the timestamp
where a transaction starts to the timestamp where it ends? We can, IF we
standardize on the Keys for the start and end.
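A sketch of the duration idea in Python, assuming every transaction logs a standardized "start" and "end" marker plus a session ID. The field names and events are hypothetical. Splunk has its own transaction command for this kind of grouping; the code below only illustrates the logic.

```python
from datetime import datetime

# Hypothetical events; "start"/"end" stand in for the standardized keys.
events = [
    {"ts": "2013-05-01T12:00:00", "session": "abc", "marker": "start"},
    {"ts": "2013-05-01T12:00:42", "session": "abc", "marker": "end"},
    {"ts": "2013-05-01T12:05:00", "session": "xyz", "marker": "start"},
]

def durations(events):
    """Return seconds between the start and end marker of each session."""
    starts, result = {}, {}
    # ISO timestamps sort correctly as plain strings.
    for event in sorted(events, key=lambda e: e["ts"]):
        when = datetime.fromisoformat(event["ts"])
        session = event["session"]
        if event["marker"] == "start":
            starts[session] = when
        elif event["marker"] == "end" and session in starts:
            result[session] = (when - starts.pop(session)).total_seconds()
    return result

print(durations(events))
```

Sessions that never log an end marker (like "xyz" here) simply produce no duration, which is itself a useful signal.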
• This is a different approach to solving
"duration"
Index Volume
Splunk Navigation and Basic Searching
REVIEW
• Splunk comes with several Apps, but the only relevant one now is the 'Search' app, which is
the interface for generic searching. To begin your Splunk search, type in terms you might
expect to find in your data. For example, if you want to find events that might be HTTP 404
errors (i.e., webpage not found), type in the keywords:
• http 404 --You'll get back all the events that have both HTTP and 404 in their text. Notice that
search terms are implicitly AND'd together. The search was the same as "http AND 404".
Let's make the search narrower:
• http 404 "like gecko" -- Using quotes tells Splunk to search for a literal phrase “like
gecko”, which returns more specific results than just searching for “like” and “gecko” because
they must be adjacent as a phrase.
• Splunk supports the Boolean operators AND, OR, and NOT (must be capitalized), as well as
parentheses to enforce grouping. To get all HTTP error events (i.e., any status code other
than 200), not including 403 or 404, use this:
• http NOT (200 OR 403 OR 404) Again, the AND operator is implied; the previous search is the
same as http AND NOT (200 OR 403 OR 404)
• Splunk supports the asterisk (*) wildcard for searching. For example, to retrieve events that
have the 40x and 50x classes of HTTP status codes, you could try: http (40* OR 50*)
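The matching rules above (implicit AND, NOT over an OR group, and the * wildcard) can be mimicked in a few lines of Python. This is a toy model and not how Splunk is implemented: the tokenizer is a simplification, quoted phrases are not handled, and the sample events are hypothetical.

```python
import re
from fnmatch import fnmatchcase

def tokens(event):
    """Split an event into lowercase word-like tokens (a simplification)."""
    return re.findall(r"[a-z0-9_]+", event.lower())

def has_term(event, term):
    """True if any token matches the term; '*' acts as a wildcard."""
    return any(fnmatchcase(tok, term.lower()) for tok in tokens(event))

def matches(event, all_of=(), none_of=()):
    """Implicit AND over all_of, then NOT (a OR b OR ...) over none_of."""
    return all(has_term(event, t) for t in all_of) and not any(
        has_term(event, t) for t in none_of
    )

# Hypothetical web server events.
events = [
    "GET /a.html HTTP/1.1 404 like Gecko",
    "GET /b.html HTTP/1.1 200",
    "GET /c.html HTTP/1.1 500",
]

# http 404
print([e for e in events if matches(e, ["http", "404"])])
# http NOT (200 OR 403 OR 404)
print([e for e in events if matches(e, ["http"], ["200", "403", "404"])])
```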
Intermediate Searching
• Splunk's search language is much more powerful than you think it is. So far we've only been
talking about 'search', which retrieves your indexed data, but there are dozens of other
operations you can perform on your data. You can "pipe" (i.e., transfer) the results of a
search to other commands to filter, modify, reorder, and group your results.
• If Google were Splunk, you'd be able to search the web for every single page mentioning your
ex-girlfriends, extract out geographical information, remove results without location info, sort
the results by when they were written, keeping only the most recent page per ex-girlfriend,
and finally generate a state-by-state count of where Mr. Don Juan's ladies currently
live. But Google isn't Splunk, so good luck with that.
• Let's do something similar, though, with our web data: let's find some interesting things
about URIs that have 404s. Here's our basic search:
• status=404
• Now let's take the result of that search and sort the results by URI:
• status=404 | sort - uri
• That special "pipe" character ("|") says "take the results of the thing on the left and process
it, in this case, with the 'sort' operator".
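One way to picture the pipe is as function composition: each stage takes the results of the previous stage and returns new results. A small Python sketch of status=404 | sort - uri follows. The helper names (search, sort_desc, pipeline) and the events are hypothetical, chosen only for this illustration.

```python
from functools import reduce

def search(**fields):
    """Stage that keeps events whose fields equal the given values."""
    return lambda events: [
        e for e in events if all(e.get(k) == v for k, v in fields.items())
    ]

def sort_desc(field):
    """Stage that sorts events by a field, descending (like 'sort - field')."""
    return lambda events: sorted(events, key=lambda e: e[field], reverse=True)

def pipeline(events, *stages):
    """Feed the output of each stage into the next, left to right."""
    return reduce(lambda results, stage: stage(results), stages, events)

# Hypothetical indexed events.
events = [
    {"status": "404", "uri": "/a"},
    {"status": "200", "uri": "/b"},
    {"status": "404", "uri": "/c"},
]

result = pipeline(events, search(status="404"), sort_desc("uri"))
print([e["uri"] for e in result])
```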
Splunk Navigation and Basic Searching
REVIEW
• Wildcards can appear anywhere in a term, so "f*ck" will return all events with
fack, feck, fick, fock, or flapjack, among others. A search for “*” will return all events. Note
that in these searches we’ve been playing fast and loose with precision. Any event that has
50 in it (e.g. “12:18:50”) would also unfortunately match. Let’s fix that.
• When you index data, Splunk automatically adds fields (i.e., attributes) to each of your
events. You can always add your own extraction rules for pulling out additional fields. To
narrow results with a search, just add attribute=value to your search:
• sourcetype=access_combined status=404
• This search shows a much more precise version of our first search (i.e., "http 404") because it
will only return events that come from access_combined sources (i.e., webserver events) and
that have a status code of 404, which is different than just having a 404 somewhere in the
text. In addition to <attribute>=<value>, you can also do != (not equals), and <, >, >=, and <=
for numeric fields.
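The difference between a loose keyword and a field comparison is easy to show in Python. In this sketch the keyword match is plain substring search (a simplification of what Splunk does), and the events, including the already-extracted status field, are hypothetical.

```python
# Hypothetical events with a raw text and fields already extracted.
events = [
    {"_raw": "12:18:50 GET /ok.html 200", "sourcetype": "access_combined", "status": 200},
    {"_raw": "12:20:01 GET /oops.html 500", "sourcetype": "access_combined", "status": 500},
    {"_raw": "12:21:07 GET /gone.html 404", "sourcetype": "access_combined", "status": 404},
]

def keyword(events, term):
    """Loose match: the term appears anywhere in the raw text."""
    return [e for e in events if term in e["_raw"]]

def where_status(events, predicate):
    """Precise match: webserver events whose status field passes the test."""
    return [
        e for e in events
        if e["sourcetype"] == "access_combined" and predicate(e["status"])
    ]

loose = keyword(events, "50")                        # also hits the 12:18:50 timestamp
precise = where_status(events, lambda s: s >= 500)   # like status>=500

print(len(loose), len(precise))
```

The loose search returns the healthy 200 response because of its timestamp; the field comparison returns only the real server error.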
Continued
• status=404 | top 5 referer_domain | search count>2
• OK math geeks, supposing you want to calculate a new field based on other fields, you can
use the 'eval' command. Let's make a new field kbytes, on the fly, based on the bytes field:
• * | eval kbytes = bytes/1024
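For readers who think in code, here is roughly what the two searches on this slide compute, sketched in Python. The events are hypothetical, the top helper is a stand-in that only returns counts (Splunk's top also reports a percentage), and the eval step is modeled as adding a key to each event.

```python
from collections import Counter

# Hypothetical web events with extracted fields.
events = [
    {"status": 404, "referer_domain": "a.example", "bytes": 2048},
    {"status": 404, "referer_domain": "a.example", "bytes": 512},
    {"status": 404, "referer_domain": "a.example", "bytes": 1024},
    {"status": 404, "referer_domain": "b.example", "bytes": 4096},
    {"status": 200, "referer_domain": "b.example", "bytes": 100},
]

def top(events, field, limit):
    """Most common values of a field, with their counts."""
    counts = Counter(e[field] for e in events)
    return [{field: value, "count": n} for value, n in counts.most_common(limit)]

# status=404 | top 5 referer_domain | search count>2
not_found = [e for e in events if e["status"] == 404]
frequent = [row for row in top(not_found, "referer_domain", 5) if row["count"] > 2]

# * | eval kbytes = bytes/1024
with_kbytes = [{**e, "kbytes": e["bytes"] / 1024} for e in events]

print(frequent)
print(with_kbytes[0]["kbytes"])
```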
• And now for something completely different: assuming you had indexed data from a dating
site, search for the smartest girl of each hair and eye color variation, calculating her BMI
(weight divided by height squared, assuming weight in kg and height in meters):
• gender=female | sort -iq | dedup hair, eyes | eval bmi=weight/pow(height,2)
• No hate mail.
• We've just shown you a tiny, tiny window of what is possible in a Splunk search. See the
Appendix for a quick cheatsheet of search commands and examples.
SPLUNK >
Real-time Big Data
• Search and analysis engine
• Google-like search of your ORGANIZATION'S data

Splunk bsides
