ElasticSearch
Beyond Ordinary Fulltext Search




Karel Minařík
http://karmi.cz


AUDIENCE POLL


Does your application have a search feature?




AUDIENCE POLL


What do you use for search?

1. SELECT  ...  LIKE  %foo%
2. Sphinx
3. Apache Solr
4. ElasticSearch




Search is the primary interface
for getting information today.



http://www.apple.com/macosx/what-is-macosx/spotlight.html
http://www.apple.com/iphone/features/search.html
#uxfail???
Y U NO ALIGN???
Search is hard.
Let's go write SQL queries!


WHY SEARCH SUCKS?

How do you implement search?




def  search
    @results  =  MyModel.search  params[:q]
    respond_with  @results
end
WHY SEARCH SUCKS?

How do you implement search?



                    Query       Results   Result




                            MAGIC


def  search
    @results  =  MyModel.search  params[:q]
    respond_with  @results
end
23px


                      670px




A personal story...
WHY SEARCH SUCKS?

Compare your search library with your ORM library



MyModel.search "(this OR that) AND NOT whatever"


articles = Arel::Table.new(:articles)
comments = Arel::Table.new(:comments)

articles.
    where(articles[:title].eq('On Search')).
    where(articles[:published_on].gteq(Time.now)).
    join(comments).
    on(articles[:id].eq(comments[:article_id])).
    take(5).
    skip(4).
    to_sql
How does search work?



HOW DOES SEARCH WORK?

A collection of documents




      file_1.txt
      The ruby is a pink to blood-red colored gemstone ...


      file_2.txt
      Ruby is a dynamic, reflective, general-purpose object-oriented
      programming language ...

      file_3.txt
      "Ruby"  is  a  song  by  English  rock  band  Kaiser  Chiefs  ...
HOW DOES SEARCH WORK?

How do you search documents?




File.read('file_1.txt').include?('ruby')
File.read('file_2.txt').include?('ruby')
...
HOW DOES SEARCH WORK?

The inverted index

TOKENS                         POSTINGS



 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
HOW DOES SEARCH WORK?

The inverted index

MySearchLib.search  "ruby"

 ruby                           file_1.txt        file_2.txt          file_3.txt

=> file_1.txt, file_2.txt, file_3.txt
HOW DOES SEARCH WORK?

The inverted index

MySearchLib.search  "song"

 song                                                                 file_3.txt

=> file_3.txt
HOW DOES SEARCH WORK?

The inverted index

MySearchLib.search  "ruby  AND  song"

 ruby                           file_1.txt        file_2.txt          file_3.txt
 song                                                                 file_3.txt

=> file_3.txt
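The "ruby AND song" lookup above boils down to intersecting two posting lists. A minimal sketch, assuming the index is a plain Hash of token to document names; `POSTINGS`, `search_all` and `search_any` are hypothetical names used only for illustration, not part of any search library.

```ruby
# Postings copied from the example index on the slides (subset).
POSTINGS = {
  "ruby" => ["file_1.txt", "file_2.txt", "file_3.txt"],
  "pink" => ["file_1.txt"],
  "song" => ["file_3.txt"]
}.freeze

# Boolean AND: documents that appear in the posting list of every token.
def search_all(index, *tokens)
  postings = tokens.map { |token| index.fetch(token.downcase, []) }
  return [] if postings.empty?
  postings.reduce { |common, list| common & list }
end

# Boolean OR: documents that appear in the posting list of any token.
def search_any(index, *tokens)
  tokens.flat_map { |token| index.fetch(token.downcase, []) }.uniq
end

p search_all(POSTINGS, "ruby", "song")   # => ["file_3.txt"]
p search_any(POSTINGS, "pink", "song")   # => ["file_1.txt", "file_3.txt"]
```

A token missing from the index yields an empty posting list, so an AND query with an unknown token returns nothing.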
A naïve Ruby implementation

module SimpleSearch

    def index document, content
        tokens = analyze content
        store document, tokens
        puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
    end

    def analyze content
        # >>> Split content by words into "tokens"
        content.split(/\W/).
        # >>> Downcase every word
        map    { |word| word.downcase }.
        # >>> Reject stop words, digits and whitespace
        reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
    end

    def store document_id, tokens
        tokens.each do |token|
            # >>> Save the "posting"
            ( (INDEX[token] ||= []) << document_id ).uniq!
        end
    end

    def search token
        puts "Results for token '#{token}':"
        # >>> Print documents stored in index for this token
        (INDEX[token] || []).each { |document| puts "* #{document}" }
    end

    INDEX = {}
    # >>> The stop word list is cut off on the source slide after "there"
    STOPWORDS = %w|a an and are as at but by for if in is it no not of on or that the then there|

    extend self

end
HOW DOES SEARCH WORK?

Indexing documents


SimpleSearch.index "file1", "Ruby is a language. Java is also a language."
SimpleSearch.index "file2", "Ruby is a song."
SimpleSearch.index "file3", "Ruby is a stone."
SimpleSearch.index "file4", "Java is a language."


Indexed document file1 with tokens:
["ruby", "language", "java", "also", "language"]

Indexed document file2 with tokens:
["ruby", "song"]

Indexed document file3 with tokens:
["ruby", "stone"]

Indexed document file4 with tokens:
["java", "language"]

Words downcased, stopwords removed.
HOW DOES SEARCH WORK?

The index


puts  "What's  in  our  index?"
p  SimpleSearch::INDEX
{
    "ruby"          =>  ["file1",  "file2",  "file3"],
    "language"  =>  ["file1",  "file4"],
    "java"          =>  ["file1",  "file4"],
    "also"          =>  ["file1"],
    "stone"        =>  ["file3"],
    "song"          =>  ["file2"]
}
HOW DOES SEARCH WORK?

Search the index



SimpleSearch.search  "ruby"
Results  for  token  'ruby':
*  file1
*  file2
*  file3
HOW DOES SEARCH WORK?

The inverted index

TOKENS                         POSTINGS



 ruby    3                      file_1.txt        file_2.txt          file_3.txt
 pink    1                      file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
It is very practical to know how search works.

For instance, now you know that
the analysis step is very important.

It's more important than the “search” step.
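To see why analysis matters more than the lookup itself, compare two analyzers on the same sentence. This is a minimal sketch; `whitespace_analyze` and `normalizing_analyze` are hypothetical helpers, not ElasticSearch or Lucene analyzers.

```ruby
TEXT = "Ruby is a dynamic, reflective programming language."

# Splits on whitespace only: case and punctuation stay in the tokens.
def whitespace_analyze(text)
  text.split(/\s+/)
end

# Splits on non-word characters and downcases every token.
def normalizing_analyze(text)
  text.split(/\W+/).map(&:downcase).reject(&:empty?)
end

raw        = whitespace_analyze(TEXT)
normalized = normalizing_analyze(TEXT)

puts raw.include?("ruby")            # false: the stored token is "Ruby"
puts raw.include?("dynamic")         # false: the stored token is "dynamic,"
puts normalized.include?("ruby")     # true
puts normalized.include?("dynamic")  # true
```

The lookup is the same `include?` in both cases; only the tokens produced at analysis time decide whether a query can ever match.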




HOW DOES SEARCH WORK?

The Search Engine Textbook




                                 Search Engines
                                 Information Retrieval in Practice
                                 Bruce Croft, Donald Metzler and Trevor Strohma
                                 Addison Wesley, 2009




http://search-engines-book.com
SEARCH IMPLEMENTATIONS

The Baseline Information Retrieval Implementation




                              Lucene in Action
                              Michael McCandless, Erik Hatcher and Otis Gospodnetic
                              July, 2010




http://manning.com/hatcher3
http://elasticsearch.org
ElasticSearch is an open source, scalable,
distributed, cloud-ready, highly-available
full-text search engine and database with powerful
aggregation features, communicating by JSON
over RESTful HTTP, based on Apache Lucene.


{ }
HTTP
JSON
Schema-free
Index as Resource
Distributed
Queries
Facets
Mapping
Ruby
ELASTICSEARCH FEATURES

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

# Add a document
#
#   INDEX:    articles
#   TYPE:     article
#   ID:       1
#   DOCUMENT: { "title" : "One" }
#
curl -X POST "http://localhost:9200/articles/article/1" -d '{ "title" : "One" }'
ELASTICSEARCH FEATURES

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
# Add a document
curl -X POST "http://localhost:9200/articles/article/1" -d '{ "title" : "One" }'

# Perform query
curl -X GET  "http://localhost:9200/articles/_search?q=One"
curl -X POST "http://localhost:9200/articles/_search" -d '{
    "query" : { "terms" : { "tags" : ["ruby", "python"], "minimum_match" : 2 } }
}'

# Delete index
curl -X DELETE "http://localhost:9200/articles"

# Create index with settings and mapping
curl -X PUT "http://localhost:9200/articles" -d '
{ "settings" : { "index" : { "number_of_shards" : 3, "number_of_replicas" : 2 } },
  "mappings" : { "document" : {
                     "properties" : {
                         "body" : { "type" : "string", "analyzer" : "snowball" }
                     }
                 } }
}'
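Because the API is plain JSON over HTTP, the "Add a document" call above needs nothing beyond Ruby's standard library. A minimal sketch: `build_index_request` is a hypothetical helper, and the request is only built here, not sent, so it runs without a server.

```ruby
require "net/http"
require "json"
require "uri"

# Builds (but does not send) the request for POST /<index>/<type>/<id>.
def build_index_request(base_url, index, type, id, document)
  uri     = URI.join(base_url, "/#{index}/#{type}/#{id}")
  request = Net::HTTP::Post.new(uri.path, "Content-Type" => "application/json")
  request.body = JSON.generate(document)
  [uri, request]
end

uri, request = build_index_request("http://localhost:9200",
                                   "articles", "article", 1,
                                   { "title" => "One" })

puts "#{request.method} #{uri}"
puts request.body

# To actually send it (assumes ElasticSearch is listening on localhost:9200):
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.body
```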
ELASTICSEARCH FEATURES

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
# GET http://user:password@localhost:8080/_search?q=*  =>  http://localhost:9200/user/_search?q=*
#
# #664 Add HTTPS and basic authentication support: NO.

http {
    server {

        listen       8080;
        server_name  search.example.com;

        error_log   elasticsearch-errors.log;
        access_log  elasticsearch.log;

        location / {

            # Deny access to Cluster API
            if ($request_filename ~ "_cluster") {
                return 403;
                break;
            }

            # Pass requests to ElasticSearch
            proxy_pass http://localhost:9200;
            proxy_redirect off;

            proxy_set_header  X-Real-IP  $remote_addr;
            proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header  Host $http_host;

            # Authorize access
            auth_basic           "ElasticSearch";
            auth_basic_user_file passwords;

            # Route all requests to authorized user's own index
            rewrite  ^(.*)$  /$remote_user$1  break;
            rewrite_log on;

            return 403;

        }
    }
}

https://gist.github.com/986390
ELASTICSEARCH FEATURES

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
{
    "id"    : "abc123",

    "title" : "ElasticSearch Understands JSON!",

    "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",

    "published_on" : "2011/05/27 10:00:00",

    "tags"  : ["search", "json"],

    "author" : {
        "first_name" : "Clara",
        "last_name"  : "Rice",
        "email"      : "clara@rice.org"
    }
}
ELASTICSEARCH FEATURES

HTTP /   JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

curl -X DELETE "http://localhost:9200/articles"; sleep 1
curl -X POST   "http://localhost:9200/articles/article" -d '
{
    "id"    : "abc123",

    "title" : "ElasticSearch Understands JSON!",

    "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",

    "published_on" : "2011/05/27 10:00:00",

    "tags"  : ["search", "json"],

    "author" : {
        "first_name" : "Clara",
        "last_name"  : "Rice",
        "email"      : "clara@rice.org"
    }
}'
curl -X POST   "http://localhost:9200/articles/_refresh"



curl -X GET "http://localhost:9200/articles/article/_search?q=author.first_name:clara"
ELASTICSEARCH FEATURES

HTTP /   JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
curl -X POST "http://localhost:9200/articles/article" -d '
...
"published_on" : "2011/05/27 10:00:00",
...

curl -X GET  "http://localhost:9200/articles/_mapping?pretty=true"
{
    "articles"  :  {
        "article"  :  {
            "properties"  :  {
                "title"  :  {
                    "type"  :  "string"
                },
                //  ...
                "author"  :  {
                    "dynamic"  :  "true",
                    "properties"  :  {
                        "first_name"  :  {
                            "type"  :  "string"
                        },
                        //  ...
                    }
                },
                "published_on"  :  {
                    "format"  :  "yyyy/MM/dd  HH:mm:ss||yyyy/MM/dd",
                    "type"  :  "date"
                }
            }
        }
    }
}
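The mapping above was never declared: the types were guessed from the first document, including the date format of "published_on". A rough sketch of that kind of guessing follows. It is an illustration only, not ElasticSearch's actual detection code; `guess_type` and `DATE_PATTERN` are hypothetical, and the type names for numbers are assumptions.

```ruby
# Matches "yyyy/MM/dd HH:mm:ss" or "yyyy/MM/dd", the formats reported in the mapping.
DATE_PATTERN = %r{\A\d{4}/\d{2}/\d{2}( \d{2}:\d{2}:\d{2})?\z}

# Guess a field type from a sample JSON value (illustrative only).
def guess_type(value)
  case value
  when true, false then "boolean"
  when Integer     then "long"     # assumed type name
  when Float       then "double"   # assumed type name
  when Hash        then "object"
  when String      then value.match?(DATE_PATTERN) ? "date" : "string"
  else "unknown"
  end
end

document = {
  "title"        => "ElasticSearch Understands JSON!",
  "published_on" => "2011/05/27 10:00:00",
  "author"       => { "first_name" => "Clara" }
}

document.each { |field, value| puts "#{field}: #{guess_type(value)}" }
```

The point is the same as on the slide: a string that looks like a date is treated as a date, so range queries on "published_on" work without any up-front schema.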
ELASTICSEARCH FEATURES

HTTP / JSON /   Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
# DIFFERENT TYPE: "comment" instead of "article"
curl -X POST "http://localhost:9200/articles/comment" -d '
{

    "body" : "Wow! Really nice JSON support.",

    "published_on" : "2011/05/27 10:05:00",

    "author" : {
        "first_name" : "John",
        "last_name"  : "Pear",
        "email"      : "john@pear.org"
    }
}'
curl -X POST "http://localhost:9200/articles/_refresh"


curl -X GET "http://localhost:9200/articles/comment/_search?q=author.first_name:john"
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free /   Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
# Search single type
curl -X GET "http://localhost:9200/articles/comment/_search?q=body:json"

# Search whole index
curl -X GET "http://localhost:9200/articles/_search?q=body:json"

# Search multiple indices
curl -X GET "http://localhost:9200/articles,users/_search?q=body:json"

# Search all indices
curl -X GET "http://localhost:9200/_search?q=body:json"
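The four URLs above follow one pattern: optional comma-separated indices, optional comma-separated types, then `_search`. A small sketch of a path builder; `search_path` is a hypothetical helper, not part of any client library.

```ruby
# Builds the search path for the given indices and types.
# Types without an index are rejected, because "/comment/_search"
# would be read as a search in an index named "comment".
def search_path(indices: [], types: [])
  indices = Array(indices)
  types   = Array(types)
  raise ArgumentError, "types require at least one index" if indices.empty? && !types.empty?

  segments = []
  segments << indices.join(",") unless indices.empty?
  segments << types.join(",")   unless types.empty?
  "/" + (segments + ["_search"]).join("/")
end

puts search_path(indices: "articles", types: "comment")  # /articles/comment/_search
puts search_path(indices: "articles")                    # /articles/_search
puts search_path(indices: ["articles", "users"])         # /articles,users/_search
puts search_path                                         # /_search
```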
ELASTICSEARCH FEATURES

 HTTP / JSON / Schema Free /    Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
curl -X DELETE "http://localhost:9200/articles"; sleep 1
curl -X POST   "http://localhost:9200/articles/article/abc123" -d '
{
    "id"    : "abc123",

    "title" : "ElasticSearch Understands JSON!",

    "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",

    "published_on" : "2011/05/27 10:00:00",

    "tags"  : ["search", "json"],

    "author" : {
        "first_name" : "Clara",
        "last_name"  : "Rice",
        "email"      : "clara@rice.org"
    }
}'
curl -X POST   "http://localhost:9200/articles/_refresh"



curl -X GET "http://localhost:9200/articles/article/abc123"
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free /   Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
{"_index":"articles","_type":"article","_id":"abc123","_version":1, "_source" :
{
    "id"    : "abc123",

    "title" : "ElasticSearch Understands JSON!",

    "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",

    "published_on" : "2011/05/27 10:00:00",

    "tags"  : ["search", "json"],

    "author" : {
        "first_name" : "Clara",
        "last_name"  : "Rice",
        "email"      : "clara@rice.org"
    }
}}


                                        “The Index Is Your Database”
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free /   Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
Index Aliases

my_alias -> index_A, index_B

curl -X POST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : {
                "index" : "index_1",
                "alias" : "myalias"
            }
        },
        { "add" : {
                "index" : "index_2",
                "alias" : "myalias"
            }
        }
    ]
}'




http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free /   Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
The “Sliding Window” problem

curl -X DELETE http://localhost:9200/logs_2010_01

logs -> logs_2010_02, logs_2010_03, logs_2010_04

“We can really store only three months' worth of data.”
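With one index per month, the sliding window becomes a naming exercise: work out which index names are still inside the window and delete the rest. A minimal sketch, assuming the `logs_YYYY_MM` naming from the slide; the helper names are hypothetical.

```ruby
require "date"

# Name of the monthly index for a given date, e.g. "logs_2010_04".
def monthly_index(prefix, date)
  format("%s_%04d_%02d", prefix, date.year, date.month)
end

# Names of the `window` most recent monthly indices, oldest first.
def window_indices(prefix, today, window)
  (0...window).map { |months_back| monthly_index(prefix, today << months_back) }.reverse
end

# Existing indices that have fallen out of the window and can be deleted.
def expired_indices(existing, prefix, today, window)
  existing - window_indices(prefix, today, window)
end

existing = %w[logs_2010_01 logs_2010_02 logs_2010_03 logs_2010_04]
today    = Date.new(2010, 4, 15)

p window_indices("logs", today, 3)             # => ["logs_2010_02", "logs_2010_03", "logs_2010_04"]
p expired_indices(existing, "logs", today, 3)  # => ["logs_2010_01"]
```

Each expired name maps to one `DELETE /<index>` call, which is much cheaper than deleting individual documents by date.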
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free /   Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
Index Templates

# Apply this configuration for every matching index being created
curl -X PUT localhost:9200/_template/bookmarks_template -d '
{
    "template" : "users_*",

    "settings" : {
        "index" : {
            "number_of_shards"   : 1,
            "number_of_replicas" : 3
        }
    },

    "mappings": {
        "url": {
            "properties": {
                "url": {
                    "type": "string", "analyzer": "url_ngram", "boost": 10
                },
                "title": {
                    "type": "string", "analyzer": "snowball", "boost": 5
                }
                // ...
            }
        }
    }
}
'
http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html
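The "template" key is a wildcard pattern matched against the name of every newly created index. A rough sketch of that matching, using Ruby's `File.fnmatch` as a stand-in for the server's own pattern matching (an assumption); `TEMPLATES` and `matching_templates` are hypothetical.

```ruby
# Template registry keyed by template name (settings trimmed to what the slide shows).
TEMPLATES = {
  "bookmarks_template" => {
    "template" => "users_*",
    "settings" => { "index" => { "number_of_shards" => 1, "number_of_replicas" => 3 } }
  }
}.freeze

# Templates whose pattern matches the name of the index being created.
def matching_templates(templates, index_name)
  templates.select { |_name, body| File.fnmatch(body["template"], index_name) }
end

p matching_templates(TEMPLATES, "users_42").keys  # => ["bookmarks_template"]
p matching_templates(TEMPLATES, "articles").keys  # => []
```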
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Index as Resource /   Distributed / Queries / Facets / Mapping / Ruby

                                             $  cat  elasticsearch.yml
                                             cluster:
                                                 name:  <YOUR  APPLICATION>




                                        Automatic Discovery Protocol



                                                                                                MASTER
    Node 1                            Node 2                          Node 3                    Node 4




http://www.elasticsearch.org/guide/reference/modules/discovery/
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Index as Resource /   Distributed / Queries / Facets / Mapping / Ruby



Index A is split into 3 shards, and duplicated in 2 replicas.

    Shards      Replicas
    A1          A1'       A1''
    A2          A2'       A2''
    A3          A3'       A3''

curl -XPUT 'http://localhost:9200/A/' -d '{
    "settings" : {
        "index" : {
            "number_of_shards"   : 3,
            "number_of_replicas" : 2
        }
    }
}'
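The layout above can be sketched in a few lines: every document is routed to one shard by hashing its ID, and every shard exists in one primary copy plus the configured replicas. `Zlib.crc32` is only a stand-in hash here (an assumption, not the function ElasticSearch uses), and both helpers are hypothetical.

```ruby
require "zlib"

NUMBER_OF_SHARDS   = 3
NUMBER_OF_REPLICAS = 2

# Pick the shard for a document: hash of the ID modulo the number of shards.
def shard_for(document_id, number_of_shards = NUMBER_OF_SHARDS)
  Zlib.crc32(document_id.to_s) % number_of_shards
end

# List every copy of every shard: A1, A1', A1'', A2, ...
def shard_copies(index_name, number_of_shards = NUMBER_OF_SHARDS, number_of_replicas = NUMBER_OF_REPLICAS)
  (1..number_of_shards).map do |shard|
    (0..number_of_replicas).map { |copy| index_name + shard.to_s + "'" * copy }
  end
end

shard_copies("A").each { |copies| puts copies.join("  ") }

%w[abc123 def456 ghi789].each do |id|
  puts "document #{id} -> shard #{shard_for(id) + 1}"
end
```

Because the shard is derived from the hash modulo the shard count, changing the number of shards later would send existing IDs to different shards; the number of replicas has no such constraint.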
ELASTICSEARCH FEATURES

 HTTP / JSON / Schema Free / Index as Resource /   Distributed / Queries / Facets / Mapping / Ruby
SHARDS: Improve indexing performance
REPLICAS: Improve search performance
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Index as Resource /   Distributed / Queries / Facets / Mapping / Ruby


                                                       Y U NO ASK FIRST???
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Index as Resource /   Distributed / Queries / Facets / Mapping / Ruby

Indexing 100 000 documents (~ 56MB), one shard, no replicas, MacBookAir SSD 2GB

# Index all at once
time curl -s -X POST "http://localhost:9200/_bulk" \
    --data-binary @data/bulk_all.json > /dev/null
real    2m1.142s

# Index in batches of 1000
for file in data/bulk_*.json; do
    time curl -s -X POST "http://localhost:9200/_bulk" \
        --data-binary @$file > /dev/null
done
real    1m36.697s (-25sec, 80%)

# Do not refresh during indexing in batches
"settings" : { "refresh_interval" : "-1" }
for file in data/bulk_*.json; do
...
real    0m38.859s (-82sec, 32%)
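The `bulk_*.json` files used above are newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end. A minimal sketch of generating such batches in Ruby; the helper names and the sample documents are hypothetical, and nothing is sent to a server.

```ruby
require "json"

# Build one _bulk payload: an action line, then the document, per document.
def bulk_payload(index, type, documents)
  lines = documents.flat_map do |document|
    action = { "index" => { "_index" => index, "_type" => type, "_id" => document["id"] } }
    [JSON.generate(action), JSON.generate(document)]
  end
  lines.join("\n") + "\n"
end

# Split the documents into payloads of `batch_size` documents each.
def bulk_batches(index, type, documents, batch_size = 1000)
  documents.each_slice(batch_size).map { |batch| bulk_payload(index, type, batch) }
end

documents = (1..2500).map { |i| { "id" => i, "title" => "Document #{i}" } }
batches   = bulk_batches("articles", "article", documents)

puts batches.size               # 3
puts batches.first.lines.first  # {"index":{"_index":"articles","_type":"article","_id":1}}
```

Each payload would be posted to `/_bulk` as its own request, as in the batched loop on the slide.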
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed /   Queries / Facets / Mapping / Ruby
$ curl -X GET "http://localhost:9200/_search?q=<YOUR QUERY>"

Terms        apple
             apple iphone
Phrases      "apple iphone"
Proximity    "apple safari"~5
Fuzzy        apple~0.8
Wildcards    app*
             *pp*
Boosting     apple^10 safari
Range        [2011/05/01 TO 2011/05/31]
             [java TO json]
Boolean      apple AND NOT iphone
             +apple -iphone
             (apple OR iphone) AND NOT review
Fields       title:iphone^15 OR body:iphone
             published_on:[2011/05/01 TO "2011/05/27 10:00:00"]

http://lucene.apache.org/java/3_1_0/queryparsersyntax.html
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby
Query DSL

curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
    "query"  :  {
        "terms"  :  {
            "tags"  :  [  "ruby",  "python"  ],
            "minimum_match"  :  2
        }
    }
}'




http://www.elasticsearch.org/guide/reference/query-dsl/
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed /   Queries / Facets / Mapping / Ruby

Geo Search

Accepted formats for Geo:
    [lon, lat]      # Array
    "lat,lon"       # String
    drm3btev3e86    # Geohash

curl -X POST "http://localhost:9200/venues/venue" -d '
{
    "name": "Pizzeria",
    "pin": {
        "location": {
            "lat": 50.071712,
            "lon": 14.386832
        }
    }
}'


curl -X POST "http://localhost:9200/venues/_search?pretty=true" -d '
{
    "query"  :  {
        "filtered"  :  {
                "query"  :  {  "query_string"  :  {  "query"  :  "pizzeria"  }  },
                "filter"  :  {
                        "geo_distance"  :  {
                                "distance"  :  "0.5km",
                                "pin.location"  :  {  "lat"  :  50.071481,  "lon"  :  14.387284  }
                        }
                }
        }
    }
}'

http://www.elasticsearch.org/guide/reference/query-dsl/geo-distance-filter.html
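What the "geo_distance" filter checks can be approximated by hand with the haversine formula. This is a sketch under that assumption; the exact distance calculation used by the server may differ, and `haversine_km` is a hypothetical helper.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance in kilometres between two lat/lon points.
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat   = (lat2 - lat1) * to_rad
  dlon   = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Coordinates from the slide: the indexed venue and the point in the filter.
venue    = { "lat" => 50.071712, "lon" => 14.386832 }
position = { "lat" => 50.071481, "lon" => 14.387284 }

distance = haversine_km(position["lat"], position["lon"], venue["lat"], venue["lon"])
puts distance <= 0.5  # true: the pizzeria is within the 0.5km filter
```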
ELASTICSEARCH FEATURES

    HTTP / JSON / Schema Free / Distributed / Queries /   Facets / Mapping / Ruby



Query and Facets, marked on a screenshot of LinkedIn faceted search:
http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed / Queries /   Facets / Mapping / Ruby

# "query"  : the user query
# "filter" : the “checkboxes”
# "facets" : the facets
curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
    "query" : {
        "query_string" : { "query" : "title:T*" }
    },
    "filter" : {
        "terms" : { "tags" : ["ruby"] }
    },
    "facets" : {
        "tags" : {
            "terms" : {
                "field" : "tags",
                "size" : 10
            }
        }
    }
}'

# "facets" : {
#     "tags" : {
#         "terms" : [ {
#             "term" : "ruby",
#             "count" : 2
#         }, {
#             "term" : "python",
#             "count" : 1
#         }, {
#             "term" : "java",
#             "count" : 1
#         } ]
#     }
# }




http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
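A terms facet is essentially a tally of field values across the matching documents. A minimal sketch over made-up documents (`ARTICLES` is sample data, not taken from the slides); `terms_facet` is a hypothetical helper.

```ruby
# Sample documents, invented for this illustration.
ARTICLES = [
  { "title" => "One",   "tags" => ["ruby"] },
  { "title" => "Two",   "tags" => ["ruby", "python"] },
  { "title" => "Three", "tags" => ["java"] }
].freeze

# Count how often each term occurs in `field`, most frequent first.
def terms_facet(documents, field, size = 10)
  counts = Hash.new(0)
  documents.each do |document|
    Array(document[field]).each { |term| counts[term] += 1 }
  end
  counts.sort_by { |term, count| [-count, term] }
        .first(size)
        .map { |term, count| { "term" => term, "count" => count } }
end

terms_facet(ARTICLES, "tags").each do |entry|
  puts "#{entry['term']}: #{entry['count']}"
end
```

Ties are broken alphabetically here only to keep the output stable; the order of equally frequent terms in a real response may differ.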
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed / Queries /   Facets / Mapping / Ruby


  curl  -X  POST  "http://localhost:9200/articles/_search?pretty=true"  -d  '
  {
      "facets"  :  {
          "published_on"  :  {
              "date_histogram"  :  {
                  "field"        :  "published",
                  "interval"  :  "day"
              }
          }
      }
  }'
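A date histogram facet groups documents into time buckets and counts the documents in each bucket. The Ruby sketch below shows the idea for "day" and "month" intervals. The date_histogram helper, the "published_on" field name and the ISO date strings in the output are assumptions of this example; the sample dates are the ones used in the Ruby slides of this talk.

```ruby
require "date"

# Group docs by day or month of the date in `field` and count each bucket.
def date_histogram(docs, field, interval)
  counts = Hash.new(0)

  docs.each do |doc|
    date = Date.parse(doc[field])
    bucket = case interval
             when "day"   then date
             when "month" then Date.new(date.year, date.month, 1)
             else raise ArgumentError, "unsupported interval: #{interval}"
             end
    counts[bucket] += 1
  end

  # Buckets in chronological order.
  counts.sort.map { |bucket, count| { "time" => bucket.to_s, "count" => count } }
end

ARTICLES = [
  { "title" => "One",   "published_on" => "2011-01-01" },
  { "title" => "Two",   "published_on" => "2011-01-02" },
  { "title" => "Three", "published_on" => "2011-01-02" },
  { "title" => "Four",  "published_on" => "2011-01-03" }
]

date_histogram(ARTICLES, "published_on", "day").each do |bucket|
  puts "#{bucket['time']}: #{bucket['count']}"
end
```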
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed / Queries /   Facets / Mapping / Ruby

   Geo Facets
  curl  -X  POST  "http://localhost:9200/venues/_search?pretty=true"  -d  '
  {
          "query"  :  {  "query_string"  :  {  "query"  :  "pizzeria"  }  },
          "facets"  :  {
                  "distance_count"  :  {
                          "geo_distance"  :  {
                                  "pin.location"  :  {
                                          "lat"  :  50.071712,
                                          "lon"  :  14.386832
                                  },
                                  "ranges"  :  [
                                          {  "to"  :  1  },
                                          {  "from"  :  1,  "to"  :  5  },
                                          {  "from"  :  5,  "to"  :  10  }
                                  ]
                          }
                  }
          }
  }'

   http://www.elasticsearch.org/guide/reference/api/search/facets/geo-distance-facet.html
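The geo distance facet sorts matching documents into distance ranges around a point and counts each range. Assuming the distance of every venue from the point is already known, the bucketing step might look like the Ruby sketch below. The range_facet helper, the sample distances and the choice of an inclusive "from" and exclusive "to" are assumptions of this example; the distances are expressed in the same unit as the ranges.

```ruby
# Count how many distances fall into each range.
# Assumption: "from" is inclusive, "to" is exclusive, a missing bound is open.
def range_facet(distances, ranges)
  ranges.map do |range|
    from = range[:from] || 0.0
    to   = range[:to]   || Float::INFINITY
    count = distances.count { |d| d >= from && d < to }
    range.merge(count: count)
  end
end

# The ranges from the query above.
RANGES = [
  { to: 1 },
  { from: 1, to: 5 },
  { from: 5, to: 10 }
]

# Hypothetical distances of matching venues from the pin.
DISTANCES = [0.04, 0.8, 2.5, 5.0, 7.3, 9.9, 12.0]

range_facet(DISTANCES, RANGES).each { |bucket| puts bucket.inspect }
```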
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed / Queries / Facets /     Mapping / Ruby
curl  -X  DELETE  "http://localhost:9200/articles"
curl  -X  POST      "http://localhost:9200/articles"  -d  '
{
    "mappings":  {
        "article":  {
            "properties":  {
                "tags":  {
                    "type":  "string",
                    "analyzer":  "keyword"
                },
                "content":  {
                    "type":  "string",
                    "analyzer":  "snowball"
                },
                "title":  {
                    "type":  "string",
                    "analyzer":  "snowball",
                    "boost":        10.0
                }
            }
        }
    }
}'

curl  -X  GET        'http://localhost:9200/articles/_mapping?pretty=true'

http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html

Remember?

def  analyze  content
    #  >>>  Split  content  by  words  into  "tokens"
    content.split(/\W/).
    #  >>>  Downcase  every  word
    map  {  |word|  word.downcase  }.
    #  ...
end
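Why map "tags" with the keyword analyzer but "title" and "content" with snowball? The keyword analyzer keeps the whole value as one token, so a tag such as "Ruby on Rails" is matched and counted as a unit, while a word-based analyzer breaks text into searchable words. The Ruby sketch below contrasts the two. Both helpers are simplified stand-ins written for this example; simple_analyze only splits and downcases, whereas the real snowball analyzer also removes stopwords and stems words.

```ruby
# "keyword"-style analysis: the entire field value becomes a single token.
def keyword_analyze(value)
  [value]
end

# Simplified stand-in for a word-based analyzer: split on non-word
# characters and downcase. No stopword removal or stemming is done here.
def simple_analyze(value)
  value.split(/\W+/).reject(&:empty?).map(&:downcase)
end

puts keyword_analyze("Ruby on Rails").inspect
puts simple_analyze("Ruby on Rails").inspect
```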
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed / Queries / Facets /      Mapping / Ruby
curl  -X  DELETE  "http://localhost:9200/urls"
curl  -X  POST      "http://localhost:9200/urls"  -d  '
{
    "settings"  :  {
        "index"  :  {
            "analysis"  :  {
                "analyzer"  :  {
                    "url_analyzer"  :  {
                        "type"  :  "custom",
                        "tokenizer"  :  "lowercase",
                        "filter"        :  ["stop",  "url_stop",  "url_ngram"]
                    }
                },
                "filter"  :  {
                    "url_stop"  :  {
                        "type"  :  "stop",
                        "stopwords"  :  ["http",  "https",  "www"]
                    },
                    "url_ngram"  :  {
                        "type"  :  "nGram",
                        "min_gram"  :  3,
                        "max_gram"  :  5
                    }
                }
            }
        }
    }
}'


https://gist.github.com/988923
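The url_analyzer chain above can be approximated in a few lines of Ruby: split the text into lowercase runs of letters, drop stopwords, then emit the n-grams of 3 to 5 characters for every remaining token. This is a sketch, not ElasticSearch's implementation. The helper names are invented, the English stopword list is assumed to resemble the default of the "stop" filter, and the order of the emitted n-grams may differ from the real filter.

```ruby
# Assumed stand-in for the default English stopword list of the "stop" filter.
ENGLISH_STOPWORDS = %w[
  a an and are as at be but by for if in into is it no not of on or such
  that the their then there these they this to was will with
]

# The custom "url_stop" list from the settings above.
URL_STOPWORDS = %w[http https www]

# "lowercase" tokenizer: runs of letters, downcased.
def lowercase_tokenize(text)
  text.scan(/\p{L}+/).map(&:downcase)
end

# All substrings of token with a length between min and max.
# Tokens shorter than min produce no n-grams.
def ngrams(token, min, max)
  (min..max).flat_map do |n|
    (0..token.length - n).map { |i| token[i, n] }
  end
end

def url_analyze(url)
  lowercase_tokenize(url)
    .reject { |token| ENGLISH_STOPWORDS.include?(token) || URL_STOPWORDS.include?(token) }
    .flat_map { |token| ngrams(token, 3, 5) }
end

puts url_analyze("http://github.com/karmi/tire").inspect
```

Indexing n-grams is what lets a partial input such as "karm" match the full URL.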
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping /   Ruby

Tire.index  'articles'  do
    delete
    create

    store  :title  =>  'One',      :tags  =>  ['ruby'],                      :published_on  =>  '2011-01-01'
    store  :title  =>  'Two',      :tags  =>  ['ruby',  'python'],  :published_on  =>  '2011-01-02'
    store  :title  =>  'Three',  :tags  =>  ['java'],                      :published_on  =>  '2011-01-02'
    store  :title  =>  'Four',    :tags  =>  ['ruby',  'php'],        :published_on  =>  '2011-01-03'

    refresh
end



s  =  Tire.search  'articles'  do
    query  {  string  'title:T*'  }

    filter  :terms,  :tags  =>  ['ruby']

    sort  {  title  'desc'  }

    facet  'global-tags'    {  terms  :tags,  :global  =>  true  }

    facet  'current-tags'  {  terms  :tags  }
end

http://github.com/karmi/tire
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping /   Ruby

class  Article  <  ActiveRecord::Base
    include  Tire::Model::Search
    include  Tire::Model::Callbacks
end



$  rake  environment  tire:import  CLASS='Article'



Article.search  do
    query  {  string  'love'  }
    facet('timeline')  {  date  :published_on,  :interval  =>  'month'  }
    sort    {  published_on  'desc'  }
end




                                  http://github.com/karmi/tire
ELASTICSEARCH FEATURES

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping /   Ruby

class  Article
    include  Whatever::ORM

    include  Tire::Model::Search
    include  Tire::Model::Callbacks
end



$  rake  environment  tire:import  CLASS='Article'



Article.search  do
    query  {  string  'love'  }
    facet('timeline')  {  date  :published_on,  :interval  =>  'month'  }
    sort    {  published_on  'desc'  }
end


                                  http://github.com/karmi/tire
Try ElasticSearch in a Ruby on Rails application with a one-line command

$  rails  new  tired  -m  "https://gist.github.com/raw/951343/tired.rb"




  A “batteries included” installation.
  Downloads and launches ElasticSearch.
  Sets up a Rails application and launches it.
  When you're tired of it, just delete the folder.
Thanks!
Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

  • 1. ElasticSearch Beyond Ordinary Fulltext Search Karel Minařík
  • 2. http://karmi.cz ElasticSearch
  • 3. AUDIENCE POLL Does your application have a search feature? ElasticSearch
  • 4. AUDIENCE POLL What do you use for search? 1. SELECT  ...  LIKE  %foo% 2. Sphinx 3. Apache Solr 4. ElasticSearch ElasticSearch
  • 5. Search is the primary interface for getting information today. ElasticSearch
  • 6.
  • 9. ???
  • 10. ???
  • 11.
  • 13. Y U NO ALIGN???
  • 14.
  • 15. ???
  • 16. ???
  • 17.
  • 18. Search is hard. Let's go write SQL queries! ElasticSearch
  • 19. WHY SEARCH SUCKS? How do you implement search? def  search    @results  =  MyModel.search  params[:q]    respond_with  @results end
  • 20. WHY SEARCH SUCKS? How do you implement search? Query Results Result MAGIC def  search    @results  =  MyModel.search  params[:q]    respond_with  @results end
  • 21. WHY SEARCH SUCKS? How do you implement search? Query Results Result MAGIC + / def  search    @results  =  MyModel.search  params[:q]    respond_with  @results end
  • 22. 23px 670px A personal story...
  • 23. WHY SEARCH SUCKS? Compare your search library with your ORM library
        MyModel.search "(this OR that) AND NOT whatever"

        Arel::Table.new(:articles).
          where(articles[:title].eq('On Search')).
          where(["published_on >= ?", Time.now]).
          join(comments).
          on(articles[:id].eq(comments[:article_id])).
          take(5).
          skip(4).
          to_sql
  • 24. How does search work? ElasticSearch
  • 25. HOW DOES SEARCH WORK? A collection of documents
        file_1.txt: The ruby is a pink to blood-red colored gemstone ...
        file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...
        file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...
  • 26. HOW DOES SEARCH WORK? How do you search documents? File.read('file_1.txt').include?('ruby') File.read('file_2.txt').include?('ruby') ...
  • 27. HOW DOES SEARCH WORK? The inverted index
        TOKENS        POSTINGS
        ruby          file_1.txt  file_2.txt  file_3.txt
        pink          file_1.txt
        gemstone      file_1.txt
        dynamic       file_2.txt
        reflective    file_2.txt
        programming   file_2.txt
        song          file_3.txt
        english       file_3.txt
        rock          file_3.txt
        http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 28. HOW DOES SEARCH WORK? The inverted index MySearchLib.search  "ruby" ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 29. HOW DOES SEARCH WORK? The inverted index MySearchLib.search  "song" ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 30. HOW DOES SEARCH WORK? The inverted index MySearchLib.search  "ruby  AND  song" ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 31. A naïve Ruby implementation
        module SimpleSearch
          def index document, content
            tokens = analyze content
            store document, tokens
            puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
          end

          def analyze content
            # >>> Split content by words into "tokens"
            content.split(/\W/).
            # >>> Downcase every word
            map    { |word| word.downcase }.
            # >>> Reject stop words, digits and whitespace
            reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
          end

          def store document_id, tokens
            tokens.each do |token|
              # >>> Save the "posting"
              ( (INDEX[token] ||= []) << document_id ).uniq!
            end
          end

          def search token
            puts "Results for token '#{token}':"
            # >>> Print documents stored in index for this token
            (INDEX[token] || []).each { |document| puts "  * #{document}" }
          end

          INDEX = {}
          # The rest of the stop word list is cut off on the slide
          STOPWORDS = %w|a an and are as at but by for if in is it no not of on or that the then there|

          extend self
        end
  • 32. HOW DOES SEARCH WORK? Indexing documents
        SimpleSearch.index "file1", "Ruby is a language. Java is also a language."
        SimpleSearch.index "file2", "Ruby is a song."
        SimpleSearch.index "file3", "Ruby is a stone."
        SimpleSearch.index "file4", "Java is a language."

        Indexed document file1 with tokens:
        ["ruby", "language", "java", "also", "language"]
        Indexed document file2 with tokens:
        ["ruby", "song"]
        Indexed document file3 with tokens:
        ["ruby", "stone"]
        Indexed document file4 with tokens:
        ["java", "language"]

        Words downcased, stopwords removed.
  • 33. HOW DOES SEARCH WORK? The index puts  "What's  in  our  index?" p  SimpleSearch::INDEX {    "ruby"          =>  ["file1",  "file2",  "file3"],    "language"  =>  ["file1",  "file4"],    "java"          =>  ["file1",  "file4"],    "also"          =>  ["file1"],    "stone"        =>  ["file3"],    "song"          =>  ["file2"] }
  • 34. HOW DOES SEARCH WORK? Search the index SimpleSearch.search  "ruby" Results  for  token  'ruby': *  file1 *  file2 *  file3
  • 35. HOW DOES SEARCH WORK? The inverted index TOKENS POSTINGS ruby 3 file_1.txt file_2.txt file_3.txt pink 1 file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 36. It is very practical to know how search works. For instance, now you know that the analysis step is very important. It's more important than the “search” step. ElasticSearch
  • 37. A naïve Ruby implementation
        module SimpleSearch
          def index document, content
            tokens = analyze content
            store document, tokens
            puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
          end

          def analyze content
            # >>> Split content by words into "tokens"
            content.split(/\W/).
            # >>> Downcase every word
            map    { |word| word.downcase }.
            # >>> Reject stop words, digits and whitespace
            reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
          end

          def store document_id, tokens
            tokens.each do |token|
              # >>> Save the "posting"
              ( (INDEX[token] ||= []) << document_id ).uniq!
            end
          end

          def search token
            puts "Results for token '#{token}':"
            # >>> Print documents stored in index for this token
            (INDEX[token] || []).each { |document| puts "  * #{document}" }
          end

          INDEX = {}
          # The rest of the stop word list is cut off on the slide
          STOPWORDS = %w|a an and are as at but by for if in is it no not of on or that the then there|

          extend self
        end
  • 38. HOW DOES SEARCH WORK? The Search Engine Textbook: Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler and Trevor Strohman. Addison Wesley, 2009. http://search-engines-book.com
  • 39. SEARCH IMPLEMENTATIONS The Baseline Information Retrieval Implementation Lucene in Action Michael McCandless, Erik Hatcher and Otis Gospodnetic July, 2010 http://manning.com/hatcher3
  • 41. ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene. ElasticSearch
  • 42.
  • 43. { } HTTP JSON Schema-free Index as Resource Distributed Queries Facets Mapping Ruby ElasticSearch
  • 44. ELASTICSEARCH FEATURES HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        # Add a document
        curl -X POST "http://localhost:9200/articles/article/1" -d '{ "title" : "One" }'
        # URL path: /articles = INDEX, /article = TYPE, /1 = ID; the request body is the DOCUMENT
  • 45. ELASTICSEARCH FEATURES HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        # Add a document
        curl -X POST "http://localhost:9200/articles/article/1" -d '{ "title" : "One" }'

        # Perform query
        curl -X GET  "http://localhost:9200/articles/_search?q=One"
        curl -X POST "http://localhost:9200/articles/_search" -d '{
          "query" : { "terms" : { "tags" : ["ruby", "python"], "minimum_match" : 2 } }
        }'

        # Delete index
        curl -X DELETE "http://localhost:9200/articles"

        # Create index with settings and mapping
        curl -X PUT "http://localhost:9200/articles" -d '
        { "settings" : { "index" : { "number_of_shards" : 3, "number_of_replicas" : 2 } },
          "mappings" : { "document" : {
            "properties" : {
              "body" : { "type" : "string", "analyzer" : "snowball" }
            }
          } }
        }'
  • 46. ELASTICSEARCH FEATURES HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        GET http://user:password@localhost:8080/_search?q=* => http://localhost:9200/user/_search?q=*
        Issue #664 "Add HTTPS and basic authentication support": NO.

        http {
          server {
            listen       8080;
            server_name  search.example.com;
            error_log    elasticsearch-errors.log;
            access_log   elasticsearch.log;
            location / {
              # Deny access to Cluster API
              if ($request_filename ~ "_cluster") {
                return 403;
                break;
              }
              # Pass requests to ElasticSearch
              proxy_pass http://localhost:9200;
              proxy_redirect off;
              proxy_set_header  X-Real-IP  $remote_addr;
              proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
              proxy_set_header  Host $http_host;
              # Authorize access
              auth_basic           "ElasticSearch";
              auth_basic_user_file passwords;
              # Route all requests to authorized user's own index
              rewrite  ^(.*)$  /$remote_user$1  break;
              rewrite_log on;
              return 403;
            }
          }
        }
        https://gist.github.com/986390
  • 47. ELASTICSEARCH FEATURES HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        {
          "id"    : "abc123",
          "title" : "ElasticSearch Understands JSON!",
          "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
          "published_on" : "2011/05/27 10:00:00",
          "tags"  : ["search", "json"],
          "author" : {
            "first_name" : "Clara",
            "last_name"  : "Rice",
            "email"      : "clara@rice.org"
          }
        }
  • 48. ELASTICSEARCH FEATURES HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        curl -X DELETE "http://localhost:9200/articles"; sleep 1
        curl -X POST   "http://localhost:9200/articles/article" -d '
        {
          "id"    : "abc123",
          "title" : "ElasticSearch Understands JSON!",
          "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
          "published_on" : "2011/05/27 10:00:00",
          "tags"  : ["search", "json"],
          "author" : {
            "first_name" : "Clara",
            "last_name"  : "Rice",
            "email"      : "clara@rice.org"
          }
        }'
        curl -X POST "http://localhost:9200/articles/_refresh"
        curl -X GET  "http://localhost:9200/articles/article/_search?q=author.first_name:clara"
  • 49. ELASTICSEARCH FEATURES HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby curl  -­‐X  POST      "http://localhost:9200/articles/article"  -­‐d  ' ... "published_on"  :  "2011/05/27  10:00:00", ... curl  -­‐X  GET        "http://localhost:9200/articles/_mapping?pretty=true" {    "articles"  :  {        "article"  :  {            "properties"  :  {                "title"  :  {                    "type"  :  "string"                },                //  ...                "author"  :  {                    "dynamic"  :  "true",                    "properties"  :  {                        "first_name"  :  {                            "type"  :  "string"                        },                        //  ...                    }                },                "published_on"  :  {                    "format"  :  "yyyy/MM/dd  HH:mm:ss||yyyy/MM/dd",                    "type"  :  "date"                }            }        }    } }
  • 50. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        # DIFFERENT TYPE: "comment" instead of "article", stored in the same index
        curl -X POST "http://localhost:9200/articles/comment" -d '
        {
          "body" : "Wow! Really nice JSON support.",
          "published_on" : "2011/05/27 10:05:00",
          "author" : {
            "first_name" : "John",
            "last_name"  : "Pear",
            "email"      : "john@pear.org"
          }
        }'
        curl -X POST "http://localhost:9200/articles/_refresh"
        curl -X GET  "http://localhost:9200/articles/comment/_search?q=author.first_name:john"
  • 51. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        # Search single type
        curl -X GET "http://localhost:9200/articles/comment/_search?q=body:json"
        # Search whole index
        curl -X GET "http://localhost:9200/articles/_search?q=body:json"
        # Search multiple indices
        curl -X GET "http://localhost:9200/articles,users/_search?q=body:json"
        # Search all indices
        curl -X GET "http://localhost:9200/_search?q=body:json"
  • 52. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby curl  -­‐X  DELETE  "http://localhost:9200/articles";  sleep  1 curl  -­‐X  POST      "http://localhost:9200/articles/article"  -­‐d  ' {    "id"        :  "abc123",    "title"  :  "ElasticSearch  Understands  JSON!",    "body"    :  "ElasticSearch  not  only  “works”  with  JSON,  it  understands  it!  Let’s  first  ...",    "published_on"  :  "2011/05/27  10:00:00",        "tags"    :  ["search",  "json"],    "author"  :  {        "first_name"  :  "Clara",        "last_name"    :  "Rice",        "email"            :  "clara@rice.org"    } }' curl  -­‐X  POST      "http://localhost:9200/articles/_refresh" curl  -­‐X  GET  "http://localhost:9200/articles/article/abc123"
  • 53. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby {"_index":"articles","_type":"article","_id":"1","_version":1,  "_source"  :   {    "id"        :  "1",    "title"  :  "ElasticSearch  Understands  JSON!",    "body"    :  "ElasticSearch  not  only  “works”  with  JSON,  it  understands  it!  Let’s      "published_on"  :  "2011/05/27  10:00:00",        "tags"    :  ["search",  "json"],    "author"  :  {        "first_name"  :  "Clara",        "last_name"    :  "Rice",        "email"            :  "clara@rice.org"    } }} “The Index Is Your Database”
  • 54. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        Index Aliases (diagram: index_A and index_B both reachable through my_alias)
        curl -X POST 'http://localhost:9200/_aliases' -d '
        {
          "actions" : [
            { "add" : {
                "index" : "index_1",
                "alias" : "myalias"
              }
            },
            { "add" : {
                "index" : "index_2",
                "alias" : "myalias"
              }
            }
          ]
        }'
        http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
  • 55. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        The “Sliding Window” problem
        logs: logs_2010_01, logs_2010_02, logs_2010_03, logs_2010_04
        curl -X DELETE http://localhost:9200/logs_2010_01
        “We can really store only three months' worth of data.”
  • 56. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        Index Templates: apply this configuration for every matching index being created
        curl -X PUT localhost:9200/_template/bookmarks_template -d '
        {
          "template" : "users_*",
          "settings" : {
            "index" : {
              "number_of_shards"   : 1,
              "number_of_replicas" : 3
            }
          },
          "mappings": {
            "url": {
              "properties": {
                "url": {
                  "type": "string", "analyzer": "url_ngram", "boost": 10
                },
                "title": {
                  "type": "string", "analyzer": "snowball", "boost": 5
                }
                // ...
              }
            }
          }
        }
        '
        http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html
  • 57. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby $  cat  elasticsearch.yml cluster:    name:  <YOUR  APPLICATION> Automatic Discovery Protocol MASTER Node 1 Node 2 Node 3 Node 4 http://www.elasticsearch.org/guide/reference/modules/discovery/
  • 58. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        Index A is split into 3 shards, and duplicated in 2 replicas.
                  Replicas
        Shards    A1  A1'  A1''
                  A2  A2'  A2''
                  A3  A3'  A3''
        curl -XPUT 'http://localhost:9200/A/' -d '{
          "settings" : {
            "index" : {
              "number_of_shards"   : 3,
              "number_of_replicas" : 2
            }
          }
        }'
  • 59. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        SHARDS: improve indexing performance
        REPLICAS: improve search performance
  • 60. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby Y U NO ASK FIRST???
  • 61. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
        Indexing 100 000 documents (~ 56MB), one shard, no replicas, MacBookAir SSD 2GB

        # Index all at once
        time curl -s -X POST "http://localhost:9200/_bulk" --data-binary @data/bulk_all.json > /dev/null
        real 2m1.142s

        # Index in batches of 1000
        for file in data/bulk_*.json; do
          time curl -s -X POST "http://localhost:9200/_bulk" --data-binary @$file > /dev/null
        done
        real 1m36.697s (-25sec, 80%)

        # Do not refresh during indexing in batches
        "settings" : { "refresh_interval" : "-1" }
        for file in data/bulk_*.json; do
        ...
        real 0m38.859s (-82sec, 32%)
  • 62. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby
        $ curl -X GET "http://localhost:9200/_search?q=<YOUR QUERY>"
        Terms       apple
                    apple iphone
        Phrases     "apple iphone"
        Proximity   "apple safari"~5
        Fuzzy       apple~0.8
        Wildcards   app*
                    *pp*
        Boosting    apple^10 safari
        Range       [2011/05/01 TO 2011/05/31]
                    [java TO json]
        Boolean     apple AND NOT iphone
                    +apple -iphone
                    (apple OR iphone) AND NOT review
        Fields      title:iphone^15 OR body:iphone
                    published_on:[2011/05/01 TO "2011/05/27 10:00:00"]
        http://lucene.apache.org/java/3_1_0/queryparsersyntax.html
  • 63. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby
        Query DSL
        curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
        {
          "query" : {
            "terms" : {
              "tags" : [ "ruby", "python" ],
              "minimum_match" : 2
            }
          }
        }'
        http://www.elasticsearch.org/guide/reference/query-dsl/
  • 64. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby
        Geo Search
        Accepted formats for Geo:
          [lon, lat]     # Array
          "lat,lon"      # String
          drm3btev3e86   # Geohash
        curl -X POST "http://localhost:9200/venues/venue" -d '
        {
          "name": "Pizzeria",
          "pin": {
            "location": {
              "lat": 50.071712,
              "lon": 14.386832
            }
          }
        }'
        curl -X POST "http://localhost:9200/venues/_search?pretty=true" -d '
        {
          "query" : {
            "filtered" : {
              "query" : { "query_string" : { "query" : "pizzeria" } },
              "filter" : {
                "geo_distance" : {
                  "distance" : "0.5km",
                  "pin.location" : { "lat" : 50.071481, "lon" : 14.387284 }
                }
              }
            }
          }
        }'
        http://www.elasticsearch.org/guide/reference/query-dsl/geo-distance-filter.html
  • 65. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby Query Facets http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
  • 66. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby
        Three parts of the request: "query" is the user query, "filter" is the “checkboxes”, "facets" are the facets.
        curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
        {
          "query" : {
            "query_string" : { "query" : "title:T*" }
          },
          "filter" : {
            "terms" : { "tags" : ["ruby"] }
          },
          "facets" : {
            "tags" : {
              "terms" : {
                "field" : "tags",
                "size" : 10
              }
            }
          }
        }'
        # "facets" : {
        #   "tags" : {
        #     "terms" : [ {
        #       "term" : "ruby",
        #       "count" : 2
        #     }, {
        #       "term" : "python",
        #       "count" : 1
        #     }, {
        #       "term" : "java",
        #       "count" : 1
        #     } ]
        #   }
        # }
        http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
  • 67. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby
        curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
        {
          "facets" : {
            "published_on" : {
              "date_histogram" : {
                "field"    : "published_on",
                "interval" : "day"
              }
            }
          }
        }'
  • 68. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby Geo Facets curl  -­‐X  POST  "http://localhost:9200/venues/_search?pretty=true"  -­‐d  ' {        "query"  :  {  "query_string"  :  {  "query"  :  "pizzeria"  }  },        "facets"  :  {                "distance_count"  :  {                        "geo_distance"  :  {                                "pin.location"  :  {                                        "lat"  :  50.071712,                                        "lon"  :  14.386832                                },                                "ranges"  :  [                                        {  "to"  :  1  },                                        {  "from"  :  1,  "to"  :  5  },                                        {  "from"  :  5,  "to"  :  10  }                                ]                        }                }        } }' http://www.elasticsearch.org/guide/reference/api/search/facets/geo-distance-facet.html
  • 69. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby
        curl -X DELETE "http://localhost:9200/articles"
        curl -X POST   "http://localhost:9200/articles" -d '
        {
          "mappings": {
            "article": {
              "properties": {
                "tags": {
                  "type": "string",
                  "analyzer": "keyword"
                },
                "content": {
                  "type": "string",
                  "analyzer": "snowball"
                },
                "title": {
                  "type": "string",
                  "analyzer": "snowball",
                  "boost": 10.0
                }
              }
            }
          }
        }'
        curl -X GET 'http://localhost:9200/articles/_mapping?pretty=true'
        http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html

        Remember?
          def analyze content
            # >>> Split content by words into "tokens"
            content.split(/\W/).
            # >>> Downcase every word
            map { |word| word.downcase }.
            # ...
          end
  • 70. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby
        curl -X DELETE "http://localhost:9200/urls"
        curl -X POST   "http://localhost:9200/urls" -d '
        {
          "settings" : {
            "index" : {
              "analysis" : {
                "analyzer" : {
                  "url_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "lowercase",
                    "filter" : ["stop", "url_stop", "url_ngram"]
                  }
                },
                "filter" : {
                  "url_stop" : {
                    "type" : "stop",
                    "stopwords" : ["http", "https", "www"]
                  },
                  "url_ngram" : {
                    "type" : "nGram",
                    "min_gram" : 3,
                    "max_gram" : 5
                  }
                }
              }
            }
          }
        }'
        https://gist.github.com/988923
  • 71. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby
        Tire.index 'articles' do
          delete
          create
          store :title => 'One',   :tags => ['ruby'],           :published_on => '2011-01-01'
          store :title => 'Two',   :tags => ['ruby', 'python'], :published_on => '2011-01-02'
          store :title => 'Three', :tags => ['java'],           :published_on => '2011-01-02'
          store :title => 'Four',  :tags => ['ruby', 'php'],    :published_on => '2011-01-03'
          refresh
        end

        s = Tire.search 'articles' do
          query { string 'title:T*' }
          filter :terms, :tags => ['ruby']
          sort { title 'desc' }
          facet 'global-tags'  { terms :tags, :global => true }
          facet 'current-tags' { terms :tags }
        end
        http://github.com/karmi/tire
  • 72. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby class  Article  <  ActiveRecord::Base    include  Tire::Model::Search    include  Tire::Model::Callbacks end $  rake  environment  tire:import  CLASS='Article' Article.search  do    query  {  string  'love'  }    facet('timeline')  {  date  :published_on,  :interval  =>  'month'  }    sort    {  published_on  'desc'  } end http://github.com/karmi/tire
  • 73. ELASTICSEARCH FEATURES HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby class  Article    include  Whatever::ORM    include  Tire::Model::Search    include  Tire::Model::Callbacks end $  rake  environment  tire:import  CLASS='Article' Article.search  do    query  {  string  'love'  }    facet('timeline')  {  date  :published_on,  :interval  =>  'month'  }    sort    {  published_on  'desc'  } end http://github.com/karmi/tire
  • 74. Try ElasticSearch in a Ruby on Rails application with a one-line command
        $ rails new tired -m "https://gist.github.com/raw/951343/tired.rb"
        A “batteries included” installation. Downloads and launches ElasticSearch. Sets up a Rails application and launches it. When you're tired of it, just delete the folder.
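
Slide 30 queries the inverted index with "ruby AND song", but the naïve SimpleSearch module only looks up a single token. The sketch below shows one way such a boolean AND could work: fetch the postings for each term and intersect them. It is an illustration, not code from the talk. The class name TinyIndex, the method search_all and the shortened stop word list are all assumptions made for this example.

```ruby
require 'set'

# Hypothetical helper for illustration; not part of the original slides.
class TinyIndex
  # Shortened stop word list (an assumption, the list on the slide is longer)
  STOPWORDS = %w[a an and by is the to].freeze

  def initialize
    # token => Set of document ids (the "postings")
    @postings = Hash.new { |hash, token| hash[token] = Set.new }
  end

  # Analyze the content and record the document under every token
  def index(document_id, content)
    analyze(content).each { |token| @postings[token] << document_id }
  end

  # Split on non-word characters, downcase, drop empty strings and stop words
  def analyze(content)
    content.split(/\W+/).
      map(&:downcase).
      reject { |word| word.empty? || STOPWORDS.include?(word) }
  end

  # Boolean AND: a document matches only if it is in the postings of every term
  def search_all(*terms)
    lists = terms.map { |term| @postings.fetch(term.downcase, Set.new) }
    return [] if lists.empty?
    lists.reduce(:&).to_a.sort
  end
end

index = TinyIndex.new
index.index 'file_1.txt', 'The ruby is a pink to blood-red colored gemstone'
index.index 'file_2.txt', 'Ruby is a dynamic, reflective, general-purpose object-oriented programming language'
index.index 'file_3.txt', '"Ruby" is a song by English rock band Kaiser Chiefs'

p index.search_all('ruby')          # => ["file_1.txt", "file_2.txt", "file_3.txt"]
p index.search_all('ruby', 'song')  # => ["file_3.txt"]
```

Keeping the postings in a Set makes the AND a plain set intersection. Real engines such as Lucene keep postings sorted and merge them, which this sketch does not attempt.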