Real time analytics
of big data with Elasticsearch




Karel Minařík
JSON   Analytics   Facets

 http://www.youtube.com/watch?v=-GftBySG99Q
http://karmi.cz
http://elasticsearch.com


                  Realtime Analytics With ElasticSearch
Using a search engine for analytics?


wat?

HOW DOES SEARCH WORK?

A collection of documents




      file_1.txt
      The ruby is a pink to blood-red colored gemstone ...

      file_2.txt
      Ruby is a dynamic, reflective, general-purpose object-oriented
      programming language ...

      file_3.txt
      "Ruby" is a song by English rock band Kaiser Chiefs ...
HOW DOES SEARCH WORK?

How do you search documents?




File.read('file_1.txt').include?('ruby')
File.read('file_2.txt').include?('ruby')
...
HOW DOES SEARCH WORK?

The inverted index

TOKENS                         POSTINGS



 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
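A minimal sketch of how such a token/postings table could be built, assuming the three files are held as in-memory strings. The names `DOCUMENTS`, `tokenize` and `build_inverted_index` are made up for this illustration, and the tokenizer is far simpler than a real Lucene analyzer (lowercasing and splitting on non-alphanumerics only).

```ruby
# Hypothetical in-memory stand-ins for the three files on the slide.
DOCUMENTS = {
  "file_1.txt" => "The ruby is a pink to blood-red colored gemstone",
  "file_2.txt" => "Ruby is a dynamic, reflective, general-purpose object-oriented programming language",
  "file_3.txt" => "\"Ruby\" is a song by English rock band Kaiser Chiefs"
}

# Very simplified "analyzer": lowercase the text and split it into word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Map every token to the list of documents that contain it (the postings).
def build_inverted_index(documents)
  index = {}
  documents.each do |name, text|
    tokenize(text).uniq.each do |token|
      (index[token] ||= []) << name
    end
  end
  index
end

index = build_inverted_index(DOCUMENTS)
puts index["ruby"].inspect  # ["file_1.txt", "file_2.txt", "file_3.txt"]
puts index["song"].inspect  # ["file_3.txt"]
```

Unlike the `File.read(...).include?('ruby')` approach, the documents are scanned once at indexing time; a search is then a hash lookup.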
HOW DOES SEARCH WORK?

The inverted index

search "ruby"

 ruby                           file_1.txt        file_2.txt          file_3.txt

search "song"

 song                                                                 file_3.txt

search "ruby AND song"

 ruby                           file_1.txt        file_2.txt          file_3.txt
 song                                                                 file_3.txt

 ruby AND song                                                        file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
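The lookups above can be sketched as follows, assuming the postings table from the slide is held in a plain hash. `POSTINGS`, `search_term` and `search_and` are illustrative names, not part of any library; the AND query is modelled as an intersection of postings lists.

```ruby
# Postings copied from the table on the slide.
POSTINGS = {
  "ruby"        => ["file_1.txt", "file_2.txt", "file_3.txt"],
  "pink"        => ["file_1.txt"],
  "gemstone"    => ["file_1.txt"],
  "dynamic"     => ["file_2.txt"],
  "reflective"  => ["file_2.txt"],
  "programming" => ["file_2.txt"],
  "song"        => ["file_3.txt"],
  "english"     => ["file_3.txt"],
  "rock"        => ["file_3.txt"]
}

# Single-term search: read the postings list (empty if the token is unknown).
def search_term(postings, term)
  postings.fetch(term.downcase, [])
end

# "a AND b AND ...": intersect the postings lists of all terms.
def search_and(postings, *terms)
  terms.map { |term| search_term(postings, term) }
       .reduce { |left, right| left & right } || []
end

puts search_term(POSTINGS, "ruby").inspect         # ["file_1.txt", "file_2.txt", "file_3.txt"]
puts search_term(POSTINGS, "song").inspect         # ["file_3.txt"]
puts search_and(POSTINGS, "ruby", "song").inspect  # ["file_3.txt"]
```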
HOW DOES SEARCH WORK?

The inverted index

TOKENS                         POSTINGS
                              Statistics!


 ruby    3                      file_1.txt        file_2.txt          file_3.txt
 pink    1                      file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
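The "Statistics!" column falls out of the index almost for free: the number of documents per token is just the length of its postings list. A small sketch, assuming the same hand-copied postings as on the slide (`STAT_POSTINGS` and `document_frequencies` are names invented here):

```ruby
# Postings copied from the table on the slide.
STAT_POSTINGS = {
  "ruby"        => ["file_1.txt", "file_2.txt", "file_3.txt"],
  "pink"        => ["file_1.txt"],
  "gemstone"    => ["file_1.txt"],
  "dynamic"     => ["file_2.txt"],
  "reflective"  => ["file_2.txt"],
  "programming" => ["file_2.txt"],
  "song"        => ["file_3.txt"],
  "english"     => ["file_3.txt"],
  "rock"        => ["file_3.txt"]
}

# Count the documents per token, most frequent first (ties ordered by token).
def document_frequencies(postings)
  postings.map { |token, files| [token, files.size] }
          .sort_by { |token, count| [-count, token] }
end

document_frequencies(STAT_POSTINGS).each do |token, count|
  puts format("%-12s %d", token, count)
end
```

Counting values per term in this way is the idea the facets on the following slides build on.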
http://elasticsearch.org
ElasticSearch is an open source, scalable,
distributed, cloud-ready, highly-available full-
text search engine and database with powerful
aggregation features, communicating by JSON
over RESTful HTTP, based on Apache
Lucene.


FACETS

    Faceted Navigation



Query




Facets




                         http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
FACETS

Faceted Navigation with Elasticsearch

curl "http://localhost:9200/people/_search?pretty=true" -d '
{
    "query" : {
        "match" : { "name" : "John" }
    },
    "filter" : {
        "terms" : { "employer" : ["IBM"] }
    },
    "facets" : {
        "employer" : {
            "terms" : {
                "field" : "employer",
                "size"  : 3
            }
        }
    }
}'

"query"   User query
"filter"  “Checkboxes”
"facets"  Facets

Response

"facets" : {
    "employer" : {
        "missing" : 0,
        "total" : 10,
        "other" : 3,
        "terms" : [ {
            "term" : "ibm",
            "count" : 3
        }, {
            "term" : "twitter",
            "count" : 2
        }, {
            "term" : "apple",
            "count" : 2
        } ]
    }
}

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
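To make the response fields concrete, here is a rough in-memory model of what a terms facet computes. It is a sketch, not Elasticsearch's implementation: `EMPLOYERS` is hypothetical data chosen only so that the numbers line up with the response on the slide, and `terms_facet` is an invented helper. Ties are kept in first-seen order, which is an assumption of this sketch.

```ruby
require "json"

# Hypothetical employer values of the ten matching people (not from the talk).
EMPLOYERS = %w[ibm ibm ibm twitter twitter apple apple google oracle github]

# Simplified model of a "terms" facet: count each value, return the top `size`
# terms plus the "missing", "total" and "other" bookkeeping fields.
def terms_facet(values, size)
  present = values.compact
  counts = Hash.new(0)
  present.each { |value| counts[value] += 1 }

  # Highest count first; equal counts keep first-seen order.
  top = counts.each_with_index
              .sort_by { |(_term, count), position| [-count, position] }
              .map(&:first)
              .first(size)

  {
    "missing" => values.size - present.size,                 # documents without a value
    "total"   => present.size,                               # all counted values
    "other"   => present.size - top.sum { |_, count| count }, # values outside the top terms
    "terms"   => top.map { |term, count| { "term" => term, "count" => count } }
  }
end

puts JSON.pretty_generate("facets" => { "employer" => terms_facet(EMPLOYERS, 3) })
```

With `size` set to 3, the seven values in the top three terms are listed and the remaining three are reported as `"other"`.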
FACETS

Visualizing the Facets

  "facets" : {
      "employer" : {
          "missing" : 0,
          "total" : 10,
          "other" : 3,
          "terms" : [ {
              "term" : "ibm",
              "count" : 3
          }, {
              "term" : "twitter",
              "count" : 2
          }, {
              "term" : "apple",
              "count" : 2
          } ]
      }
  }

  DEMO: http://bl.ocks.org/4571766

  d3.js ~ A Bar Chart, Part 1
  http://mbostock.github.com/d3/tutorial/bar-1.html
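The demo draws the chart with d3.js in the browser. As a stand-in that runs in a terminal, the same `terms` array can be turned into a plain-text bar chart; this is a substitute for illustration, not the code behind the demo, and `FACET_TERMS` and `bar_chart` are names made up here.

```ruby
# The "terms" array from the facet response on the slide.
FACET_TERMS = [
  { "term" => "ibm",     "count" => 3 },
  { "term" => "twitter", "count" => 2 },
  { "term" => "apple",   "count" => 2 }
]

# Render one text bar per term, scaled so the largest count fills `width` characters.
def bar_chart(terms, width = 30)
  return [] if terms.empty?

  max_count   = terms.map { |t| t["count"] }.max
  label_width = terms.map { |t| t["term"].length }.max

  terms.map do |t|
    bar = "#" * (t["count"] * width / max_count)
    format("%-#{label_width}s %s %d", t["term"], bar, t["count"])
  end
end

puts bar_chart(FACET_TERMS)
```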
FACETS

Visualizing the Facets




http://demo.kibana.org
Important Concepts
‣ No batch orientation
‣ No stats precomputation and caching
‣ No predefined metrics or schemas

‣ Combination of free text search, structured
  search, and facets
‣ Scripting for performing ad-hoc analytics
‣ Extendable: write your own facet types


FACETS

Scripting
Extract and aggregate most popular domains from article URLs
curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {"type": "string", "index": "not_analyzed"}} } } }'

curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl -X POST localhost:9200/demo-articles/_refresh

curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
  "facets": {
    "popular-domains": {
      "terms": {
        "field"  : "url",
        "script" : "term.replace(new RegExp(\"https?://\"), \"\").split(\"/\")[0]",
        "lang"   : "javascript"
      }
    }
  }
}'

Response

"facets" : {
    "popular-domains" : {
        // ...
        "terms" : [ {
            "term" : "some.blogger.com", "count" : 3
        }, {
            "term" : "github.com", "count" : 1
        } ]
    }
}
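What the facet script does to each term can be tried out without a cluster. The sketch below mirrors the JavaScript one-liner in Ruby and counts the resulting domains; `ARTICLE_WRITES`, `domain` and `popular_domains` are invented for this illustration. It also models that id 5 is written twice in the curl calls, so the second PUT replaces the first, which would explain the count of 1 for github.com in the response.

```ruby
# [id, url] pairs in the order of the PUT requests on the slide.
ARTICLE_WRITES = [
  [1, "http://some.blogger.com/2012/09/01/index.html"],
  [2, "http://some.blogger.com/2012/09/11/index.html"],
  [3, "http://some.blogger.com/about.html"],
  [5, "https://github.com/user/A"],
  [5, "http://github.com/user/B"]
]

# Same idea as the facet script: drop the scheme, keep everything before the first "/".
def domain(url)
  url.sub(%r{https?://}, "").split("/").first
end

# Store documents by id (later writes replace earlier ones), then count domains.
def popular_domains(writes)
  stored = {}
  writes.each { |id, url| stored[id] = url }

  counts = Hash.new(0)
  stored.each_value { |url| counts[domain(url)] += 1 }
  counts.sort_by { |name, count| [-count, name] }
end

popular_domains(ARTICLE_WRITES).each do |name, count|
  puts "#{name} #{count}"
end
# some.blogger.com 3
# github.com 1
```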
FACETS

Demonstrations
Demo
Thanks!
  d

More Related Content

Viewers also liked

Cgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsCgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsbrock55
 
Real Time Recommendation System using Kiji
Real Time Recommendation System using KijiReal Time Recommendation System using Kiji
Real Time Recommendation System using KijiDaqing Zhao
 
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.comTDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.comDaqing Zhao
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 MinutesKarel Minarik
 
Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Karel Minarik
 
Real-Time Personalization
Real-Time PersonalizationReal-Time Personalization
Real-Time PersonalizationRichard Veryard
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoH2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoSri Ambati
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseKristijan Duvnjak
 
Big Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businessesBig Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businessesGopalakrishna Palem
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...DataStax
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data AnalyticsAmazon Web Services
 

Viewers also liked (16)

Cgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsCgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analytics
 
Real Time Recommendation System using Kiji
Real Time Recommendation System using KijiReal Time Recommendation System using Kiji
Real Time Recommendation System using Kiji
 
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.comTDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
 
Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)
 
Real-Time Personalization
Real-Time PersonalizationReal-Time Personalization
Real-Time Personalization
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoH2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
 
Big Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businessesBig Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businesses
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 
Customer Journey Analytics and Big Data
Customer Journey Analytics and Big DataCustomer Journey Analytics and Big Data
Customer Journey Analytics and Big Data
 

Similar to Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Karel Minarik
 
Using the whole web as your dataset
Using the whole web as your datasetUsing the whole web as your dataset
Using the whole web as your datasetTuri, Inc.
 
Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...Gasperi Jerome
 
ElasticSearch with Tire
ElasticSearch with TireElasticSearch with Tire
ElasticSearch with TireDavid Yun
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1medcl
 
Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016David Erickson
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)net2-project
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012Jimmy Lai
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Djangotow21
 
How to put an annotation in html
How to put an annotation in htmlHow to put an annotation in html
How to put an annotation in htmlSTIinnsbruck
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch BasicsShifa Khan
 
First steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic webFirst steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic webhorvadam
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearchMinsoo Jun
 
RESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatialRESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatialGasperi Jerome
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 

Similar to Realtime Analytics With Elasticsearch [New Media Inspiration 2013] (20)

Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)
 
Using the whole web as your dataset
Using the whole web as your datasetUsing the whole web as your dataset
Using the whole web as your dataset
 
Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...
 
ElasticSearch with Tire
ElasticSearch with TireElasticSearch with Tire
ElasticSearch with Tire
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1
 
Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
 
Linked Data on Rails
Linked Data on RailsLinked Data on Rails
Linked Data on Rails
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
 
Mining legal texts with Python
Mining legal texts with PythonMining legal texts with Python
Mining legal texts with Python
 
How to put an annotation in html
How to put an annotation in htmlHow to put an annotation in html
How to put an annotation in html
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
 
First steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic webFirst steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic web
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
 
RESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatialRESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatial
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
Google code search
Google code searchGoogle code search
Google code search
 

More from Karel Minarik

Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]Karel Minarik
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Karel Minarik
 
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)Karel Minarik
 
Redis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational DatabasesRedis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational DatabasesKarel Minarik
 
CouchDB – A Database for the Web
CouchDB – A Database for the WebCouchDB – A Database for the Web
CouchDB – A Database for the WebKarel Minarik
 
Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)Karel Minarik
 
Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)Karel Minarik
 
Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]Karel Minarik
 
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)Karel Minarik
 
Úvod do Ruby on Rails
Úvod do Ruby on RailsÚvod do Ruby on Rails
Úvod do Ruby on RailsKarel Minarik
 
Úvod do programování 7
Úvod do programování 7Úvod do programování 7
Úvod do programování 7Karel Minarik
 
Úvod do programování 6
Úvod do programování 6Úvod do programování 6
Úvod do programování 6Karel Minarik
 
Úvod do programování 5
Úvod do programování 5Úvod do programování 5
Úvod do programování 5Karel Minarik
 
Úvod do programování 4
Úvod do programování 4Úvod do programování 4
Úvod do programování 4Karel Minarik
 
Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)Karel Minarik
 
Historie programovacích jazyků
Historie programovacích jazykůHistorie programovacích jazyků
Historie programovacích jazykůKarel Minarik
 
Úvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra strojeÚvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra strojeKarel Minarik
 
Interaktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzoryInteraktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzoryKarel Minarik
 

More from Karel Minarik (18)

Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]
 
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
 
Redis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational DatabasesRedis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational Databases
 
CouchDB – A Database for the Web
CouchDB – A Database for the WebCouchDB – A Database for the Web
CouchDB – A Database for the Web
 
Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)
 
Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)
 
Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]
 
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
 
Úvod do Ruby on Rails
Úvod do Ruby on RailsÚvod do Ruby on Rails
Úvod do Ruby on Rails
 
Úvod do programování 7
Úvod do programování 7Úvod do programování 7
Úvod do programování 7
 
Úvod do programování 6
Úvod do programování 6Úvod do programování 6
Úvod do programování 6
 
Úvod do programování 5
Úvod do programování 5Úvod do programování 5
Úvod do programování 5
 
Úvod do programování 4
Úvod do programování 4Úvod do programování 4
Úvod do programování 4
 
Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)
 
Historie programovacích jazyků
Historie programovacích jazykůHistorie programovacích jazyků
Historie programovacích jazyků
 
Úvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra strojeÚvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra stroje
 
Interaktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzoryInteraktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzory
 

Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

  • 1. Real time analytics of big data with Elasticsearch Karel Minařík
  • 2. Facets, Analytics, JSON http://www.youtube.com/watch?v=-GftBySG99Q
  • 3. http://karmi.cz http://elasticsearch.com Realtime Analytics With ElasticSearch
  • 4. Using a search engine for analytics? wat?
  • 5. HOW DOES SEARCH WORK? A collection of documents
        file_1.txt: The ruby is a pink to blood-red colored gemstone ...
        file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...
        file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...
  • 6. HOW DOES SEARCH WORK? How do you search documents?
        File.read('file_1.txt').include?('ruby')
        File.read('file_2.txt').include?('ruby')
        ...
  • 7. HOW DOES SEARCH WORK? The inverted index
        TOKENS        POSTINGS
        ruby          file_1.txt   file_2.txt   file_3.txt
        pink          file_1.txt
        gemstone      file_1.txt
        dynamic                    file_2.txt
        reflective                 file_2.txt
        programming                file_2.txt
        song                                    file_3.txt
        english                                 file_3.txt
        rock                                    file_3.txt
        http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
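The inverted index on slide 7 can be built in a few lines of Ruby. This is a minimal sketch, not the way Lucene or Elasticsearch does it: the `tokenize` helper and its letters-only rule are assumptions made for illustration, and unlike the slide it also indexes common words such as "is" and "a".

```ruby
require 'set'

# The three example documents from slide 5 (text truncated as in the source).
documents = {
  'file_1.txt' => 'The ruby is a pink to blood-red colored gemstone',
  'file_2.txt' => 'Ruby is a dynamic, reflective, general-purpose object-oriented programming language',
  'file_3.txt' => '"Ruby" is a song by English rock band Kaiser Chiefs'
}

# Hypothetical tokenizer: lowercase the text and keep runs of letters only.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Map every token to the sorted list of documents that contain it (its postings).
def build_index(documents)
  index = Hash.new { |hash, token| hash[token] = Set.new }
  documents.each do |name, text|
    tokenize(text).each { |token| index[token] << name }
  end
  index.transform_values { |names| names.to_a.sort }
end

index = build_index(documents)
p index['ruby']
p index['song']
```

Lowercasing at index time is what lets "Ruby", "ruby" and "\"Ruby\"" end up in the same posting list, which the case-sensitive `include?` check on slide 6 would not do.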
  • 8. HOW DOES SEARCH WORK? The inverted index: search "ruby" (index as on slide 7)
  • 9. HOW DOES SEARCH WORK? The inverted index: search "song" (index as on slide 7)
  • 10. HOW DOES SEARCH WORK? The inverted index: search "ruby AND song" (index as on slide 7)
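The three searches on slides 8 to 10 come down to looking up posting lists and, for AND, intersecting them. A minimal sketch, with the postings copied from the slides; the `search` and `search_all` helpers are hypothetical names, not part of any library:

```ruby
# Postings copied from the inverted index on the slides.
index = {
  'ruby'        => %w[file_1.txt file_2.txt file_3.txt],
  'pink'        => %w[file_1.txt],
  'gemstone'    => %w[file_1.txt],
  'dynamic'     => %w[file_2.txt],
  'reflective'  => %w[file_2.txt],
  'programming' => %w[file_2.txt],
  'song'        => %w[file_3.txt],
  'english'     => %w[file_3.txt],
  'rock'        => %w[file_3.txt]
}

# Single-term search: one hash lookup; unknown terms match nothing.
def search(index, term)
  index.fetch(term.downcase, [])
end

# AND search: intersect the posting lists of all terms.
def search_all(index, *terms)
  terms.map { |term| search(index, term) }.reduce(:&) || []
end

p search(index, 'ruby')
p search(index, 'song')
p search_all(index, 'ruby', 'song')
```

No document is read at query time; the cost depends on the length of the posting lists rather than on the size of the files.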
  • 11. HOW DOES SEARCH WORK? The inverted index: Statistics!
        TOKENS        COUNT   POSTINGS
        ruby          3       file_1.txt   file_2.txt   file_3.txt
        pink          1       file_1.txt
        gemstone              file_1.txt
        dynamic                            file_2.txt
        reflective                         file_2.txt
        programming                        file_2.txt
        song                                            file_3.txt
        english                                         file_3.txt
        rock                                            file_3.txt
        http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
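The "Statistics!" column on slide 11 falls out of the index for free: the count for a token is the length of its posting list. A small sketch, with the postings copied from the slides:

```ruby
# Postings copied from the inverted index on the slides.
index = {
  'ruby'        => %w[file_1.txt file_2.txt file_3.txt],
  'pink'        => %w[file_1.txt],
  'gemstone'    => %w[file_1.txt],
  'dynamic'     => %w[file_2.txt],
  'reflective'  => %w[file_2.txt],
  'programming' => %w[file_2.txt],
  'song'        => %w[file_3.txt],
  'english'     => %w[file_3.txt],
  'rock'        => %w[file_3.txt]
}

# Document count per token: just the size of each posting list.
document_counts = index.transform_values(&:size)

# Print the tokens, most frequent first (ties broken alphabetically).
document_counts.sort_by { |token, count| [-count, token] }.each do |token, count|
  puts format('%-12s %d', token, count)
end
```

This is the observation the rest of the deck builds on: a structure made for search already holds the counts an analytics query needs.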
  • 13. ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene.
  • 14. FACETS Faceted Navigation Query Facets http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
  • 15. FACETS Faceted Navigation with Elasticsearch
        Request ("query" is the user query, "filter" the "checkboxes", "facets" the facets):

        curl "http://localhost:9200/people/_search?pretty=true" -d '
        {
          "query" : {
            "match" : { "name" : "John" }
          },
          "filter" : {
            "terms" : { "employer" : ["IBM"] }
          },
          "facets" : {
            "employer" : {
              "terms" : {
                "field" : "employer",
                "size"  : 3
              }
            }
          }
        }'

        Response:

        "facets" : {
          "employer" : {
            "missing" : 0,
            "total" : 10,
            "other" : 3,
            "terms" : [ {
              "term" : "ibm",
              "count" : 3
            }, {
              "term" : "twitter",
              "count" : 2
            }, {
              "term" : "apple",
              "count" : 2
            } ]
          }
        }

        http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
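To make the fields of the facet response on slide 15 concrete, here is an in-memory sketch of what a terms facet reports. It is not Elasticsearch code: the `terms_facet` helper and the list of employer values are hypothetical, chosen so that the numbers match the response on the slide, and lowercasing stands in for the analysis an analyzed field would get.

```ruby
# Hypothetical employer values, one per matching document (nil = no employer).
employers = %w[IBM IBM IBM Twitter Twitter Apple Apple Google Oracle Yahoo]

# Compute a terms-facet-like summary: the top `size` terms plus bookkeeping.
def terms_facet(values, size)
  present = values.compact
  counts = Hash.new(0)
  present.each { |value| counts[value.downcase] += 1 }

  # Highest count first; ties keep first-seen order (an arbitrary choice here).
  ranked = counts.each_with_index.sort_by { |(_term, count), position| [-count, position] }
  top = ranked.first(size).map { |(term, count), _position| { 'term' => term, 'count' => count } }

  {
    'missing' => values.size - present.size,                       # documents without a value
    'total'   => present.size,                                     # all counted values
    'other'   => present.size - top.sum { |entry| entry['count'] }, # values outside the top terms
    'terms'   => top
  }
end

facet = terms_facet(employers, 3)
p facet
```

With these inputs, "total" is 10, the three returned terms account for 7 of them, and the remaining 3 show up as "other".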
  • 16. FACETS Visualizing the Facets
        "facets" : {
          "employer" : {
            "missing" : 0,
            "total" : 10,
            "other" : 3,
            "terms" : [ {
              "term" : "ibm",
              "count" : 3
            }, {
              "term" : "twitter",
              "count" : 2
            }, {
              "term" : "apple",
              "count" : 2
            } ]
          }
        }
        DEMO: http://bl.ocks.org/4571766
        d3.js ~ A Bar Chart, Part 1 http://mbostock.github.com/d3/tutorial/bar-1.html
  • 20. Important Concepts
        ‣ No batch orientation
        ‣ No stats precomputation and caching
        ‣ No predefined metrics or schemas
        ‣ Combination of free text search, structured search, and facets
        ‣ Scripting for performing ad-hoc analytics
        ‣ Extendable: write your own facet types
  • 21. FACETS Scripting
        Extract and aggregate most popular domains from article URLs

        curl -X DELETE localhost:9200/demo-articles
        curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {"type": "string", "index": "not_analyzed"}} } } }'
        curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
        curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
        curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
        curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
        # Document ID 5 is reused here, so this request overwrites the previous one;
        # that is why "github.com" has a count of 1 in the response.
        curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
        curl -X POST localhost:9200/demo-articles/_refresh
        curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
          "facets": {
            "popular-domains": {
              "terms": {
                "field"  : "url",
                "script" : "term.replace(new RegExp(\"https?://\"), \"\").split(\"/\")[0]",
                "lang"   : "javascript"
              }
            }
          }
        }'

        Response:

        "facets" : {
          "popular-domains" : {
            // ...
            "terms" : [ {
              "term" : "some.blogger.com", "count" : 3
            }, {
              "term" : "github.com", "count" : 1
            } ]
          }
        }
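The facet script on slide 21 can be checked outside Elasticsearch. The sketch below applies the same transformation in plain Ruby to the URLs indexed on the slide; the `domain` helper is a hypothetical name, and keying the documents by ID is only there to model the repeated ID 5.

```ruby
# Same transformation as the JavaScript facet script on the slide:
# drop the first "http://" or "https://", then keep everything before the first "/".
def domain(url)
  url.sub(%r{https?://}, '').split('/').first
end

# Documents keyed by ID, indexed in the same order as the curl commands.
documents = {}
documents[1] = 'http://some.blogger.com/2012/09/01/index.html'
documents[2] = 'http://some.blogger.com/2012/09/11/index.html'
documents[3] = 'http://some.blogger.com/about.html'
documents[5] = 'https://github.com/user/A'
documents[5] = 'http://github.com/user/B' # same ID, so this replaces the previous entry

# Count documents per extracted domain, as the scripted terms facet does.
domain_counts = Hash.new(0)
documents.each_value { |url| domain_counts[domain(url)] += 1 }

p domain_counts
```

The counts agree with the response on the slide: three documents for some.blogger.com and one for github.com.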
  • 22. FACETS Demonstrations: Extract and aggregate most popular domains from article URLs. Demo of the commands and response shown on slide 21.