SlideShare una empresa de Scribd logo
1 de 26
Copyright © 2012 Splunk Inc.
How to Integrate Splunk
        with any Data Solution
Julian Hyde (Optiq) @julianhyde

http://github.com/julianhyde/optiq
http://github.com/julianhyde/optiq-splunk

Splunk Worldwide Users
Conference 2012
Why are we here?
I'm going to explain how to use Splunk to access all of the data in your
   enterprise.
And also to let people in your enterprise use data in Splunk.
This isn't easy. We'll be showing some raw technology – the new Optiq
   project and its Splunk adapter.
But it's open source, so you can all get your hands on it. :)
About me
 Database hacker
 Open source hacker
 Author of Mondrian (Pentaho Analysis)
 Startup fiend
http://www.flickr.com/photos/torkildr/3462606643
http://www.flickr.com/photos/sylvar/31436961/
“Big Data”
Right data, right time
Diverse data sources / Performance / Suitable format
Example
Accessing Splunk data via SQL
Sqlline (a standard JDBC client)
How do it (wrong)
                                             action =
                                           'purchase'
                     “search”



        Splunk                  Optiq   filter



SELECT “source”, “product_id”
FROM “splunk”.”splunk”
WHERE “action” = 'purchase'
How do it (right)
                              “search
                         action=purchase”




        Splunk                       Optiq



SELECT “source”, “product_id”
FROM “splunk”.”splunk”
WHERE “action” = 'purchase'
Example #2
Combining data from 2 sources (Splunk & MySQL)
Also possible: 3 or more sources; 3-way joins; unions
Expression tree                                 SELECT p.“product_name”, COUNT(*) AS c
                                                FROM “splunk”.”splunk” AS s
                                                  JOIN “mysql”.”products” AS p
                                                  ON s.”product_id” = p.”product_id”
                                                WHERE s.“action” = 'purchase'
                                                GROUP BY p.”product_name”
  Splunk                                        ORDER BY c DESC

 Table: splunk
                                                      Key: product_name
                     Key: product_id                  Agg: count
                                       Condition:                         Key: c DESC
                                         action =
                                       'purchase'
  scan
                          join
  MySQL                                filter             group           sort
     scan
                 Table: products
Expression tree                               SELECT p.“product_name”, COUNT(*) AS c
                                              FROM “splunk”.”splunk” AS s
(optimized)                                     JOIN “mysql”.”products” AS p
                                                ON s.”product_id” = p.”product_id”
                                              WHERE s.“action” = 'purchase'
                                              GROUP BY p.”product_name”
                 Splunk                       ORDER BY c DESC
                          Condition:
 Table: splunk              action =
                          'purchase'                     Key: product_name
                                                         Agg: count
                                                                             Key: c DESC
                                       Key: product_id
  scan                     filter

  MySQL
                                       join                  group           sort
     scan
                   Table: products
Optiq is not a database.
http://www.flickr.com/photos/torkildr/3462606643
http://www.flickr.com/photos/telstra-corp/5069403309/
Conventional database architecture
                 JDBC client


                 JDBC server
                 SQL parser /
                   validator           Metadata
                    Query
                  optimizer
                  Data-flow
                  operators

         Data                   Data
Optiq architecture
                         JDBC client


                          JDBC server
                 Optional SQL parser /          Metadata
                            validator             SPI
                   Core       Query             Pluggable
                            optimizer             rules
                           3rd     3rd
                Pluggable party party
                           ops     ops
         3rd party                       3rd party
           data                            data
What is Optiq?
A really, really smart JDBC driver
Framework
Potential core of a data management system
Writing an adapter
Driver – if you want a vanity URL like “jdbc:splunk:”
Schema – describes what tables exist (Splunk has just one)
Table – what are the columns, and how to get the data. (Splunk's table has
  any column you like... just ask for it.)
Operators (optional) – non-relational operations
Rules (optional, but recommended) – improve efficiency by changing the
   question
Parser (optional) – to query via a language other than SQL
Splunk Adapter
Rules for pushing down filters, projections
The tricky bit: changed the validator to allow tables to have any column
To be written: rules for pushing down aggregations, joins
(What you've seen today is in github.)


Would be really nice if... Splunk pushed down filters, projections,
  aggregations from its search pipeline to the MySQL connector.
  (Currently you have to hand-write a SQL statement.)
http://www.flickr.com/photos/walkercarpenter/4697637143/
Optiq roadmap ideas
Mondrian use Optiq to read from data sources such as Splunk
Kettle integration (read/write SQL to ETL)
Adapters: Cascading, MongoDB, Hbase, Apache Drill, …?
Front-ends: linq4j, Scala SLICK, Java8 streams
Contributions
Conclusions
Liberate your data!
Optiq is a framework
Build & share Optiq adapters
Questions?


@julianhyde
http://julianhyde.blogspot.com
http://github.com/julianhyde/optiq
http://github.com/julianhyde/optiq-splunk
Additional material: The following queries were used in the demo

select s."source", s."sourcetype"    select * from "mysql"."products";
   from "splunk"."splunk" as s;

                                     select p."product_name",
select s."source", s."sourcetype",      s."action"
   s."action" from
   "splunk"."splunk" as s            from "splunk"."splunk" as s

where s."action" = 'purchase';        join "mysql"."products" as p
                                      on s."product_id" =
                                       p."product_id";
select s."source", s."sourcetype",
   s."action" from

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

What's new in Mondrian 4?
What's new in Mondrian 4?What's new in Mondrian 4?
What's new in Mondrian 4?
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can help
 
SQL Now! How Optiq brings the best of SQL to NoSQL data.
SQL Now! How Optiq brings the best of SQL to NoSQL data.SQL Now! How Optiq brings the best of SQL to NoSQL data.
SQL Now! How Optiq brings the best of SQL to NoSQL data.
 
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 
Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0 Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Discardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With HadoopDiscardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With Hadoop
 

Destacado

SplunkLive! Splunk for IT Operations
SplunkLive! Splunk for IT OperationsSplunkLive! Splunk for IT Operations
SplunkLive! Splunk for IT Operations
Splunk
 
Data Mining with Splunk
Data Mining with SplunkData Mining with Splunk
Data Mining with Splunk
David Carasso
 

Destacado (9)

From Support to Success: How Splunk Evolved Its Success Services to Deliver M...
From Support to Success: How Splunk Evolved Its Success Services to Deliver M...From Support to Success: How Splunk Evolved Its Success Services to Deliver M...
From Support to Success: How Splunk Evolved Its Success Services to Deliver M...
 
Splunk in integration testing
Splunk in integration testingSplunk in integration testing
Splunk in integration testing
 
SplunkLive! Splunk for IT Operations
SplunkLive! Splunk for IT OperationsSplunkLive! Splunk for IT Operations
SplunkLive! Splunk for IT Operations
 
Splunk for NAC in Yandex
Splunk for NAC in YandexSplunk for NAC in Yandex
Splunk for NAC in Yandex
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
 
Driving Efficiency with Splunk Cloud at Gatwick Airport
Driving Efficiency with Splunk Cloud at Gatwick AirportDriving Efficiency with Splunk Cloud at Gatwick Airport
Driving Efficiency with Splunk Cloud at Gatwick Airport
 
How to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupHow to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest Group
 
Data Mining with Splunk
Data Mining with SplunkData Mining with Splunk
Data Mining with Splunk
 
Big Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data QualityBig Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data Quality
 

Similar a How to integrate Splunk with any data solution

Projeto-web-services-Spring-Boot-JPA.pdf
Projeto-web-services-Spring-Boot-JPA.pdfProjeto-web-services-Spring-Boot-JPA.pdf
Projeto-web-services-Spring-Boot-JPA.pdf
AdrianoSantos888423
 

Similar a How to integrate Splunk with any data solution (20)

Spark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael NitschingerSpark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael Nitschinger
 
Projeto-web-services-Spring-Boot-JPA.pdf
Projeto-web-services-Spring-Boot-JPA.pdfProjeto-web-services-Spring-Boot-JPA.pdf
Projeto-web-services-Spring-Boot-JPA.pdf
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
 
SplunkLive! Tampa: Splunk Ninjas: New Features, Pivot, and Search Dojo
SplunkLive! Tampa: Splunk Ninjas: New Features, Pivot, and Search Dojo SplunkLive! Tampa: Splunk Ninjas: New Features, Pivot, and Search Dojo
SplunkLive! Tampa: Splunk Ninjas: New Features, Pivot, and Search Dojo
 
Splunk Ninjas: New Features and Search Dojo
Splunk Ninjas: New Features and Search DojoSplunk Ninjas: New Features and Search Dojo
Splunk Ninjas: New Features and Search Dojo
 
The Django Book - Chapter 5: Models
The Django Book - Chapter 5: ModelsThe Django Book - Chapter 5: Models
The Django Book - Chapter 5: Models
 
Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
 
Scaling the Content Repository with Elasticsearch
Scaling the Content Repository with ElasticsearchScaling the Content Repository with Elasticsearch
Scaling the Content Repository with Elasticsearch
 
Automatically Scaling Your Kubernetes Workloads - SVC209-S - Anaheim AWS Summit
Automatically Scaling Your Kubernetes Workloads - SVC209-S - Anaheim AWS SummitAutomatically Scaling Your Kubernetes Workloads - SVC209-S - Anaheim AWS Summit
Automatically Scaling Your Kubernetes Workloads - SVC209-S - Anaheim AWS Summit
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
Using ElasticSearch as a fast, flexible, and scalable solution to search occu...
Using ElasticSearch as a fast, flexible, and scalable solution to search occu...Using ElasticSearch as a fast, flexible, and scalable solution to search occu...
Using ElasticSearch as a fast, flexible, and scalable solution to search occu...
 
How Klout is changing the landscape of social media with Hadoop and BI
How Klout is changing the landscape of social media with Hadoop and BIHow Klout is changing the landscape of social media with Hadoop and BI
How Klout is changing the landscape of social media with Hadoop and BI
 
Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
 
PYSPARK PROGRAMMING.pdf
PYSPARK PROGRAMMING.pdfPYSPARK PROGRAMMING.pdf
PYSPARK PROGRAMMING.pdf
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
Autoscaling Your Kubernetes Workloads (Sponsored by Datadog) - AWS Summit Sydney
Autoscaling Your Kubernetes Workloads (Sponsored by Datadog) - AWS Summit SydneyAutoscaling Your Kubernetes Workloads (Sponsored by Datadog) - AWS Summit Sydney
Autoscaling Your Kubernetes Workloads (Sponsored by Datadog) - AWS Summit Sydney
 
ETL 2.0 Data Engineering for developers
ETL 2.0 Data Engineering for developersETL 2.0 Data Engineering for developers
ETL 2.0 Data Engineering for developers
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Automatically scaling your Kubernetes workloads - SVC201-S - Chicago AWS Summit
Automatically scaling your Kubernetes workloads - SVC201-S - Chicago AWS SummitAutomatically scaling your Kubernetes workloads - SVC201-S - Chicago AWS Summit
Automatically scaling your Kubernetes workloads - SVC201-S - Chicago AWS Summit
 

Más de Julian Hyde

Más de Julian Hyde (20)

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using Calcite
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming language
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're Incubating
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

How to integrate Splunk with any data solution

  • 1. Copyright © 2012 Splunk Inc.
  • 2. How to Integrate Splunk with any Data Solution Julian Hyde (Optiq) @julianhyde http://github.com/julianhyde/optiq http://github.com/julianhyde/optiq-splunk Splunk Worldwide Users Conference 2012
  • 3. Why are we here? I'm going to explain how to use Splunk to access all of the data in your enterprise. And also to let people in your enterprise use data in Splunk. This isn't easy. We'll be showing some raw technology – the new Optiq project and its Splunk adapter. But it's open source, so you can all get your hands on it. :)
  • 4. About me Database hacker Open source hacker Author of Mondrian (Pentaho Analysis) Startup fiend
  • 7. “Big Data” Right data, right time Diverse data sources / Performance / Suitable format
  • 8. Example Accessing Splunk data via SQL Sqlline (a standard JDBC client)
  • 9. How do it (wrong) action = 'purchase' “search” Splunk Optiq filter SELECT “source”, “product_id” FROM “splunk”.”splunk” WHERE “action” = 'purchase'
  • 10. How do it (right) “search action=purchase” Splunk Optiq SELECT “source”, “product_id” FROM “splunk”.”splunk” WHERE “action” = 'purchase'
  • 11. Example #2 Combining data from 2 sources (Splunk & MySQL) Also possible: 3 or more sources; 3-way joins; unions
  • 12. Expression tree SELECT p.“product_name”, COUNT(*) AS c FROM “splunk”.”splunk” AS s JOIN “mysql”.”products” AS p ON s.”product_id” = p.”product_id” WHERE s.“action” = 'purchase' GROUP BY p.”product_name” Splunk ORDER BY c DESC Table: splunk Key: product_name Key: product_id Agg: count Condition: Key: c DESC action = 'purchase' scan join MySQL filter group sort scan Table: products
  • 13. Expression tree SELECT p.“product_name”, COUNT(*) AS c FROM “splunk”.”splunk” AS s (optimized) JOIN “mysql”.”products” AS p ON s.”product_id” = p.”product_id” WHERE s.“action” = 'purchase' GROUP BY p.”product_name” Splunk ORDER BY c DESC Condition: Table: splunk action = 'purchase' Key: product_name Agg: count Key: c DESC Key: product_id scan filter MySQL join group sort scan Table: products
  • 14. Optiq is not a database.
  • 17. Conventional database architecture JDBC client JDBC server SQL parser / validator Metadata Query optimizer Data-flow operators Data Data
  • 18. Optiq architecture JDBC client JDBC server Optional SQL parser / Metadata validator SPI Core Query Pluggable optimizer rules 3rd 3rd Pluggable party party ops ops 3rd party 3rd party data data
  • 19. What is Optiq? A really, really smart JDBC driver Framework Potential core of a data management system
  • 20. Writing an adapter Driver – if you want a vanity URL like “jdbc:splunk:” Schema – describes what tables exist (Splunk has just one) Table – what are the columns, and how to get the data. (Splunk's table has any column you like... just ask for it.) Operators (optional) – non-relational operations Rules (optional, but recommended) – improve efficiency by changing the question Parser (optional) – to query via a language other than SQL
  • 21. Splunk Adapter Rules for pushing down filters, projections The tricky bit: changed the validator to allow tables to have any column To be written: rules for pushing down aggregations, joins (What you've seen today is in github.) Would be really nice if... Splunk pushed down filters, projections, aggregations from its search pipeline to the MySQL connector. (Currently you have to hand-write a SQL statement.)
  • 23. Optiq roadmap ideas Mondrian use Optiq to read from data sources such as Splunk Kettle integration (read/write SQL to ETL) Adapters: Cascading, MongoDB, Hbase, Apache Drill, …? Front-ends: linq4j, Scala SLICK, Java8 streams Contributions
  • 24. Conclusions Liberate your data! Optiq is a framework Build & share Optiq adapters
  • 26. Additional material: The following queries were used in the demo select s."source", s."sourcetype" select * from "mysql"."products"; from "splunk"."splunk" as s; select p."product_name", select s."source", s."sourcetype", s."action" s."action" from "splunk"."splunk" as s from "splunk"."splunk" as s where s."action" = 'purchase'; join "mysql"."products" as p on s."product_id" = p."product_id"; select s."source", s."sourcetype", s."action" from

Notas del editor

  1. The obligatory “big data” definition slide. What is “big data”? It's not really about “big”. We need to access data from different parts of the organization, when we need it (which often means we don't have time to copy it), and the performance needs to be reasonable. If the data is large, it is often larger than the disks one can fit on one machine. It helps if we can process the data in place, leveraging the CPU and memory of the machines where the data is stored. We'd rather not copy it from one system to another. It needs to be flexible, to deal with diverse systems and formats. That often means that open source is involved. Some systems (e.g. reporting tools) can't easily be changed to accommodate new formats. So it helps if the data can be presented in standard formats, e.g. SQL.
  2. Demo connecting to Splunk via the Optiq driver. We aer using sqlline as the shell (it works with any JDBC driver). Se;ect “source” from “splunk”.”splunk” where “sourcetype=” = 'mysqld-4'; In the generated Java on the screen, Note how sourcetype is pushed down to Splunk.
  3. The wrong way to execute the query is for Splunk to send all of the data to Optiq. Splunk does more work than it needs to, it doesn't use any indexes, the network sends too much data, Optiq does too much work.
  4. The right way to execute the query is to pass the filter down to Splunk. This lets Splunk use its indexes, so it does less work, passes less data over the network, and the query finishes faster. This is just a simple answer, but a lot of problems can be solved by “pushing down” expressions, filters, computation of summaries. Do the work, and reduce the volume of data, as early in the process as possible.
  5. Demo connecting to Splunk via the Optiq driver. We aer using sqlline as the shell (it works with any JDBC driver). Se;ect “source” from “splunk”.”splunk” where “sourcetype=” = 'mysqld-4'; In the generated Java on the screen, Note how sourcetype is pushed down to Splunk.
  6. It's much more efficient if we psuh filters and aggregations to Splunk. But the user writing SQL shouldn't have to worry about that. This is not about processing data. This is about processing expressions. Reformulating the question. The question is the parse tree of a query. The parse tree is a data flow. In Splunk, a data flow looks like a pipeline of Linux commands. SQL systems have pipelines too (sometimes they are dataflow trees) built up of the basic relational operators. Think of the SQL SELECT, WHERE, JOIN, GROUP BY, ORDER BY clauses.
  7. It's much more efficient if we psuh filters and aggregations to Splunk. But the user writing SQL shouldn't have to worry about that. This is not about processing data. This is about processing expressions. Reformulating the question. The question is the parse tree of a query. The parse tree is a data flow. In Splunk, a data flow looks like a pipeline of Linux commands. SQL systems have pipelines too (sometimes they are dataflow trees) built up of the basic relational operators. Think of the SQL SELECT, WHERE, JOIN, GROUP BY, ORDER BY clauses.
  8. To recap. Optiq is not a database. It does as little of the database processing as it can get away with. Ideally, nothing at all. But what is it?
  9. Optiq is not a database... it is more like a telephone exchange. Applications can get the data they need, quickly and efficiently.
  10. Conventional database has ODBC/JDBC driver, SQL parser, . Data sources. Expression tree. Expression transformation rules. Optimizer. For NoSQL databases, the language may not be SQL, and the optimizer may be less sophisticated, but the picture is basically the same. For frameworks, such as Hadoop, there is no planner. You end up writing code (e.g MapReduce jobs).
  11. In Optiq, the query optimizer (we modestly call it the planner) is central. The JDBC driver/server and SQL parser are optional; skip them if you have another language. Plug-ins provide metadata (the schema), planner rules, and runtime operators. There are built-in relational operators and rules, and there are built-in operators implemented in Java. But to access data, you need to provide at least one operator.
  12. It needs to be said. Optiq is not a database. It looks like a database to your applications, and that's great. But when you want to integrate data from multiple sources, in different formats, and have those systems talk to each other, it doesn't force you to copy the data around. It gets out of your way. You configure Optiq by writing Java code. Therefore it is a framework, like Spring and, yes, like Hadoop. Optiq masquerades as a really, really smart JDBC driver. It has a SQL parser and JDBC driver. And actually you can embed it into another data management system, with a language other than SQL.