99 Problems, But
The Search Ain’t One
Andrei Zmievski • PHP UK • Feb 25, 2011
who am I?
 curl http://localhost:9200/speaker/info/andrei


{"name":       "Andrei Zmievski",
 "projects":   ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
 "likes":      ["coding", "beer", "brewing", "photography"],
 "twitter":    "@a",
 "email":      "andrei@zmievski.org"}
what is elasticsearch?

a search engine for the NoSQL generation

  domain-driven

  distributed

  RESTful

  Hitchhiker’s Guide to the Galaxy (no, really)
document model


document-oriented

JSON-based

schema-free
engine


based on Lucene

multi-tenancy

distributed, out of the box
nomenclature

index

type

document

  _id

node
3 easy steps
1. index
request

           curl -XPOST http://localhost:9200/conf/speaker/1 -d'
           {
               "name": "Andrei Zmievski",
               "talk": "99 Problems, but the Search Ain'\''t One",
               "likes": ["coding", "beer", "photography"],
               "twitter": "a",
               "height": 1…
           }'

response

           {
               "ok": true,
               "_index": "conf",
               "_type": "speaker",
               "_id": "1"
           }
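The same index call can be sketched in Python with only the standard library. This is a minimal illustration, not the speaker's code: it builds the request and leaves sending it commented out, since that assumes a node listening on localhost:9200. The helper name `build_index_request` is hypothetical and the `height` value is a placeholder.

```python
import json
import urllib.request

# Document to index; "height" is a placeholder value for illustration.
doc = {
    "name": "Andrei Zmievski",
    "talk": "99 Problems, but the Search Ain't One",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": 180,
}

def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP request that indexes one document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )

req = build_index_request("http://localhost:9200", "conf", "speaker", 1, doc)
print(req.get_method(), req.full_url)

# To actually send it (assumes a node is listening on localhost:9200):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```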
2. search
request

           curl http://localhost:9200/conf/speaker/_search?q=beer

response

           { "took" : …,
             "_shards" : {
               "total" : 1,
               "successful" : 1,
               "failed" : 0
             },
             "hits" : {
               "total" : 1,
               "max_score" : 0.…,
               "hits" : [ {
                 "_index" : "conf",
                 "_type" : "speaker",
                 "_id" : "2",
                 "_score" : 0.…,
                 "_source" :
                 {
                     "name": "Andrei Zmievski",
                     "talk": "99 Problems, but the Search Ain't One",
                     "likes": ["coding", "beer", "photography"],
                     "twitter": "a",
                     "height": 1…
                 }
               } ]
             }
           }
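A client usually cares about three parts of this response: the hit count, each hit's score, and the original document under `_source`. The sketch below walks a response of the same shape. The numeric values for `took`, `max_score`, and `_score` are made up for illustration, and `summarize` is a hypothetical helper.

```python
import json

# Sample response in the shape shown on the slide; the numeric values for
# "took", "max_score" and "_score" are made up for illustration.
raw = """
{"took": 3,
 "_shards": {"total": 1, "successful": 1, "failed": 0},
 "hits": {"total": 1, "max_score": 0.5,
          "hits": [{"_index": "conf", "_type": "speaker", "_id": "2",
                    "_score": 0.5,
                    "_source": {"name": "Andrei Zmievski",
                                "likes": ["coding", "beer", "photography"]}}]}}
"""

def summarize(response):
    """Return (total hits, [(id, score, source), ...]) from a search response."""
    hits = response["hits"]
    rows = [(h["_id"], h["_score"], h["_source"]) for h in hits["hits"]]
    return hits["total"], rows

total, rows = summarize(json.loads(raw))
print(total)
for doc_id, score, source in rows:
    print(doc_id, score, source["name"])
```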
2. search
response fields

  hits.total: total number of hits

  _index: the index of the doc

  _type: the type of the doc

  _id: the id of the doc

  _score: the hit score

  _source: the original source

  took: the execution time
3. profit


that’s up to you
demo
distributed model


provides:

  performance

  resiliency (high-availability)
shards
a portion of the document space

each one is a separate Lucene index

  thus, many per-index settings are available

document is sharded by its _id value

  but can be assigned (routed) to a shard
  deterministically
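Routing by `_id` comes down to hashing a key and taking it modulo the shard count. The sketch below shows the idea under one loud assumption: `zlib.crc32` is a stand-in hash picked for this example, not the function Elasticsearch uses internally. `shard_for` is a hypothetical helper.

```python
import zlib

def shard_for(routing_key, number_of_shards):
    """Map a routing key (by default the document _id) to a shard number.

    crc32 is a stand-in hash chosen for this sketch; Elasticsearch uses its
    own hash function internally.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards

# Default: route by _id, so the same _id always lands on the same shard.
print(shard_for("1", 2), shard_for("2", 2))

# Custom routing: documents sharing a routing value share a shard.
print(shard_for("user-42", 2) == shard_for("user-42", 2))
```

Because the shard is a pure function of the key and the shard count, any node can compute where a document lives without a lookup table.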
zero-conf discovery


zen (multicast and unicast)

cloud (EC2 via API)
auto-routing

master node:

  maintains cluster state

  reassigns shards if nodes leave/join cluster

any node can serve as the request router

the query is handled via scatter-gather mechanism
replicas

each shard can have 1 or more replicas

# of replicas can be updated dynamically after
index creation

replicas can be used for querying in parallel
shard allocation
               node 1




       start with a single node
shard allocation
                node 1
                 person1
                 person2




      PUT /person {
         "index": {
            "number_of_shards": 2,
            "number_of_replicas": 1
      }}
shard allocation
       node 1          node 2
       person1         person1
       person2         person2




        start the second node
shard allocation
node 1    node 2         node 3   node 4
person1   person1
person2   person2




            start 2 more nodes
shard allocation
node 1    node 2         node 3    node 4
person1                  person1
          person2                  person2




            start 2 more nodes
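The allocation frames above can be mimicked with a small greedy allocator. This is only an illustration of the two constraints visible in the diagrams (no two copies of a shard on one node, load spread evenly), not Elasticsearch's actual allocation algorithm. `allocate` is a hypothetical function.

```python
def allocate(nodes, number_of_shards, number_of_replicas):
    """Spread shard copies over nodes, never two copies of a shard per node.

    Greedy illustration only, not Elasticsearch's actual allocator.
    Returns {node: [shard, ...]} and a list of copies left unassigned.
    """
    placement = {node: [] for node in nodes}
    unassigned = []
    for _copy in range(number_of_replicas + 1):   # first pass is the primary
        for shard in range(1, number_of_shards + 1):
            candidates = [n for n in nodes if shard not in placement[n]]
            if not candidates:
                unassigned.append(shard)
                continue
            # Pick the node currently holding the fewest shard copies.
            target = min(candidates, key=lambda n: len(placement[n]))
            placement[target].append(shard)
    return placement, unassigned

for count in (1, 2, 4):
    nodes = [f"node {i}" for i in range(1, count + 1)]
    print(count, allocate(nodes, number_of_shards=2, number_of_replicas=1))
```

With one node the replicas stay unassigned, with two nodes each node holds both shards, and with four nodes each node holds a single copy.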
document sharding
node 1    node 2         node 3     node 4
person1                   person1
          person2                   person2




            PUT /person/info/1
            {…}
document sharding
node 1    node 2         node 3     node 4
person1                   person1
          person2                   person2




            hashed to shard 1

            PUT /person/info/1
            {…}
document sharding
node 1    node 2         node 3      node 4
person1                   person1
          person2                    person2




                        replicated

            PUT /person/info/1
            {…}
document sharding
node 1    node 2         node 3     node 4
person1                   person1
          person2                   person2




            PUT /person/info/2
            {…}
document sharding
node 1    node 2         node 3     node 4
person1                   person1
          person2                   person2




            hashed to shard 2

            PUT /person/info/2
            {…}
document sharding
node 1    node 2         node 3      node 4
person1                   person1
          person2                    person2




                                    replicated

            PUT /person/info/2
            {…}
scatter-gather
node 1          node 2        node 3          node 4
person1                        person1
                person2                       person2




          GET /person/_search?q=name:thomas
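Scatter-gather means the receiving node fans the query out to one copy of every shard, each shard returns its own top hits, and the receiving node merges them. A minimal sketch follows. The documents are invented and the scoring is a toy term count, not Lucene scoring.

```python
import heapq

def search_shard(shard_docs, term, size):
    """Scatter step: each shard scores its own docs and returns its top hits."""
    hits = [(doc["name"].lower().split().count(term), doc["_id"])
            for doc in shard_docs]
    hits = [(score, doc_id) for score, doc_id in hits if score > 0]
    return heapq.nlargest(size, hits)

def scatter_gather(shards, term, size=10):
    """Gather step: merge the per-shard results into one global top list."""
    partial = [search_shard(docs, term, size) for docs in shards]
    merged = [hit for hits in partial for hit in hits]
    return heapq.nlargest(size, merged)

# Two shards of a hypothetical "person" index; scoring is a toy term count.
shards = [
    [{"_id": "1", "name": "Thomas Anderson"}, {"_id": "3", "name": "Ann Lee"}],
    [{"_id": "2", "name": "Thomas Thomas"}],
]
print(scatter_gather(shards, "thomas"))
```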
transactional model

per-document consistency

no need to commit/flush

uses write-behind transaction log

write consistency (W) can be controlled

  one, quorum, or all
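The three consistency levels translate into a minimum number of shard copies that must be available before a write goes ahead. The sketch below uses the textbook majority formula for `quorum`; that formula is an assumption here, and real Elasticsearch versions may treat small replica counts specially. `required_copies` is a hypothetical helper.

```python
def required_copies(level, number_of_replicas):
    """Active shard copies needed before a write proceeds, per consistency level.

    Uses the textbook majority formula for "quorum"; real Elasticsearch
    versions may special-case small replica counts.
    """
    total = number_of_replicas + 1          # primary plus its replicas
    if level == "one":
        return 1
    if level == "quorum":
        return total // 2 + 1
    if level == "all":
        return total
    raise ValueError(f"unknown consistency level: {level}")

for level in ("one", "quorum", "all"):
    print(level, required_copies(level, number_of_replicas=2))
```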
(near) real-time search


1 second refresh rate by default

_refresh API also
index storage

node data considered transient

can be stored in local file system, JVM heap,
native OS memory, or FS & memory combination

persistent storage requires a gateway
gateways
persistent store for cluster state and indices

asynchronous, translog-based write strategy

allows full recovery if a cluster restart is needed

supported gateways:
  local
  shared FS
  Hadoop via HDFS
  S3
mapping
describes document structure to the search
engine

automatically created with sensible defaults

explicit mapping can be provided (generally, a
good idea)

can run into merge conflicts
mapping

important meta fields:

  _source

  _all

  _boost
mapping types

simple:

  string, integer/long, float/double, boolean, and
  null

complex:

  array, object
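To see how these type names line up with JSON values, here is a rough sketch that guesses a type name for each field of a document. It is an illustration only: `guess_type` is hypothetical, and the engine's own automatic mapping applies more rules than this.

```python
def guess_type(value):
    """Guess a mapping type name for a JSON value (rough illustration only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):          # check bool before int: True is an int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

# Invented sample document for the demonstration.
doc = {"user": "derick", "tags": ["profiling", "php"], "priority": 2,
       "draft": False, "rating": 4.5, "author": {"name": "derick"}}
print({field: guess_type(value) for field, value in doc.items()})
```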
sample mapping
document

           {"user":      "derick",
            "title":     "Don’t Panic",
            "tags":      ["profiling", "debugging", "php"],
            "postDate":  "2010-12-22T1…:1…:12",
            "priority":  2}

mapping

           {"post": {
             "properties" : {
               "user": {"type": "string", "index": "not_analyzed"},
               "message": {"type": "string", "boost": 1.…},
               "tags": {"type": "string", "include_in_all": "no"},
               "postDate" : {"type" : "date", "store": "no"},
               "priority" : {"type" : "integer"}
           }}}
analyzers
break down (tokenize) and normalize fields during
indexing and query strings at search time

analyzer = tokenizer + token filters (0 or more)
Standard Analyzer =
   Standard Tokenizer +
       Standard Token Filter +
       Lowercase Token Filter +
       Stop Token Filter
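The "tokenizer + token filters" composition can be sketched as a plain function pipeline. Everything here is simplified for illustration: the tokenizer is a regular expression, the stop word list is a tiny sample, and none of it reproduces Lucene's actual analyzers.

```python
import re

STOP_WORDS = {"a", "an", "and", "but", "the", "of", "to"}  # tiny sample list

def tokenize(text):
    """Simplified tokenizer: keep runs of letters, digits and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)

def lowercase(tokens):
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    """Token filter: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, tokenizer=tokenize, filters=(lowercase, remove_stop_words)):
    """analyzer = tokenizer + token filters (0 or more), applied in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

print(analyze("99 Problems, but the Search Ain't One"))
```

The same `analyze` call would run on the query string at search time, so that indexed terms and query terms are normalized the same way.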
analyzers
analyzers, tokenizers, and filters can be customized

elasticsearch.yml

           index:
             analysis:
               analyzer:
                 eulang:
                   type: custom
                   tokenizer: standard
                   filter: [standard, lowercase, stop,
                            asciifolding, porterStem]

mapping

           …
           "title": {"type": "string", "analyzer": "eulang"},
           …
API
API conventions


append ?pretty=true to get readable JSON

boolean values: false/0/off = false, rest is true

JSONP support via callback parameter
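The boolean convention is easy to get wrong in client code, so here is a small sketch of it. Exact, case-sensitive matching is an assumption of this example, and `parse_bool` is a hypothetical helper.

```python
FALSE_VALUES = {"false", "0", "off"}

def parse_bool(value):
    """Interpret a query-string flag: false/0/off are false, the rest is true."""
    return value not in FALSE_VALUES

for raw in ("true", "false", "0", "off", "yes", "1"):
    print(raw, parse_bool(raw))
```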
API structure

http://host:port/[index]/[type]/[_action/id]

 GET http://es:9200/_status

 GET http://es:9200/twitter/_status

 POST http://es:9200/twitter/tweet/1

 GET http://es:9200/twitter/tweet/1
API structure
http://host:port/[index]/[type]/[_action/id]

 GET http://es:9200/twitter/tweet/_search

 GET http://es:9200/twitter/user/_search

 GET http://es:9200/twitter/tweet,user/_search

 GET http://es:9200/twitter,facebook/_search

 GET http://es:9200/_search
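The URL pattern above is regular enough to generate. The sketch below builds `_search` URLs for any combination of indices and types. `search_url` is a hypothetical helper, and rejecting a type without an index is a simplification made for this example.

```python
from urllib.parse import urlencode

def search_url(base, indices=(), types=(), **params):
    """Build a _search URL following /[index]/[type]/_search."""
    parts = [base.rstrip("/")]
    if indices:
        parts.append(",".join(indices))
    if types:
        if not indices:
            raise ValueError("a type requires at least one index")
        parts.append(",".join(types))
    parts.append("_search")
    url = "/".join(parts)
    return f"{url}?{urlencode(params)}" if params else url

print(search_url("http://es:9200"))
print(search_url("http://es:9200", ["twitter"], ["tweet", "user"]))
print(search_url("http://es:9200", ["twitter", "facebook"], q="beer", pretty="true"))
```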
_cluster API structure

GET /_cluster/health

GET /_cluster/health/index1,index2

GET /_cluster/nodes/stats

GET /_cluster/nodes/nodeId1,nodeId2/stats
API {core}
index             search

bulk               query

delete             from/size paging

delete by query    sort

get                highlighting

count              selective fields
API {indices}
create                   optimize

delete                   snapshot

open/close               update settings

get/put/delete mapping   analyze

refresh                  status

                         flush
API {cluster}

health

state

nodes info

nodes stats

nodes shutdown
Query DSL
term / terms   query_string

range            default_operator

prefix            analyzer

bool             phrase_slop

fuzzy            etc

wildcard
filters


share some similar features with queries (term,
range, etc)

why use a filter?
filters
faster than queries

cached (depends on the filter)

  the cache is used for different queries against
  the same filter

no scoring

more useful ones: term, terms, range, prefix, and,
or, not, exists, missing, query
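Why a cached, score-free filter helps can be shown with a toy model: the filter yields a set of matching document ids, that set is cached, and different queries reuse it while scoring only the documents that passed. The documents, field names, and scoring below are all invented for illustration, and a real filter cache works on Lucene-level structures rather than Python sets.

```python
from functools import lru_cache

DOCS = {
    1: {"user": "derick", "tags": ("profiling", "debugging", "php")},
    2: {"user": "andrei", "tags": ("search", "php")},
    3: {"user": "andrei", "tags": ("beer",)},
}

def _matches(field_value, value):
    """Exact match for single values, membership for multi-valued fields."""
    if isinstance(field_value, tuple):
        return value in field_value
    return field_value == value

@lru_cache(maxsize=None)
def term_filter(field, value):
    """Return the ids of matching docs: a yes/no answer, no scores."""
    return frozenset(i for i, doc in DOCS.items() if _matches(doc[field], value))

def search(text, field, value):
    """Score only the documents that pass the (cached) filter."""
    allowed = term_filter(field, value)
    hits = [(DOCS[i]["tags"].count(text), i) for i in allowed]
    return sorted((h for h in hits if h[0] > 0), reverse=True)

print(search("php", "user", "andrei"))
print(search("beer", "user", "andrei"))   # reuses the cached filter result
print(term_filter.cache_info().hits)
```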
facets

provide aggregated data based on the search
request

terms, histogram, date histogram, range,
statistical, and more
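A terms facet is essentially a count of field values across the documents that matched the search. The sketch below computes one in memory to show the idea. The sample hits are invented and `terms_facet` is a hypothetical function, not the engine's implementation.

```python
from collections import Counter

def terms_facet(hits, field, size=10):
    """Count how often each term of `field` occurs among the search hits."""
    counts = Counter()
    for doc in hits:
        value = doc.get(field, [])
        counts.update(value if isinstance(value, list) else [value])
    return counts.most_common(size)

hits = [
    {"user": "derick", "tags": ["profiling", "debugging", "php"]},
    {"user": "andrei", "tags": ["search", "php"]},
]
print(terms_facet(hits, "tags"))
print(terms_facet(hits, "user"))
```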
geo search

implemented as filters (and a facet)

  geo_distance

  geo_bounding_box

  geo_polygon
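A distance filter keeps the documents whose location lies within a radius of a center point. The sketch below does this with the haversine great-circle formula. The formula choice, the sample places, and the helper names are assumptions of this example rather than a description of the engine's internals.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_distance_filter(docs, center, max_km):
    """Keep the docs located within max_km of center (lat, lon)."""
    lat, lon = center
    return [d for d in docs
            if haversine_km(lat, lon, d["lat"], d["lon"]) <= max_km]

docs = [
    {"name": "London", "lat": 51.5074, "lon": -0.1278},
    {"name": "Brighton", "lat": 50.8225, "lon": -0.1372},
    {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
]
nearby = geo_distance_filter(docs, (51.5074, -0.1278), max_km=100)
print([d["name"] for d in nearby])
```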
interfaces
REST

  including memcached

Java / Groovy

Language clients (REST/Thrift):

  pyes, PHP (standalone and symfony), Ruby, Perl

Flume sink implementation
elastica

similar to the other PHP ElasticSearch client

API naming is consistent with Zend Framework

can be extended for new filters, facets, etc

still under development
elastica
example

          $es = new Elastica_Client('vm', 9200);
          $index = new Elastica_Index($es, 'test');
          $index->create(array(), true);
          $type = new Elastica_Type($index, 'person');
          $doc = new Elastica_Document(1, array('name' => 'Andrei Zmievski',
                                                 'email' => 'andrei@test.com',
                                                 'username' => 'andrei',
                                                 'bills' => array(2, 3, 5)));
          $type->addDocument($doc);

          $qs = new Elastica_Query_QueryString('andrei');
          $query = new Elastica_Query($qs);
          $resultSet = $type->search($query);
          print $resultSet->count();
data import

ES is not the primary data store (usually)

to import/synchronize data:

  write an agent (Gearman, message queues, etc)

  use rivers (CouchDB, RabbitMQ, Twitter)
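An import agent would typically batch its writes through the bulk API listed earlier. The sketch below builds a bulk request body in the newline-delimited format of that era: an action line followed by a source line for each document. The index name, type name, and documents are invented, and `bulk_body` is a hypothetical helper.

```python
import json

def bulk_body(index, doc_type, docs):
    """Serialize (id, source) pairs into the newline-delimited bulk format."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))   # what to do and where
        lines.append(json.dumps(source))   # the document itself
    return "\n".join(lines) + "\n"         # the body must end with a newline

# Rows as an agent might read them from the primary data store.
rows = [("1", {"name": "Thomas"}), ("2", {"name": "Ann"})]
body = bulk_body("person", "info", rows)
print(body, end="")
```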
10 more features
versioning                  load balancing nodes

index aliases               plugins

parent/child docs           more_like_this

scripting                   multi_field mapping

dynamic mapping templates   percolation
References

http://github.com/elasticsearch/elasticsearch

http://www.elasticsearch.org/community/forum

IRC: #elasticsearch on irc.freenode.net

twitter: @elasticsearch


             HTTP://ZMIEVSKI.ORG/TALKS

Más contenido relacionado

Destacado

The Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXThe Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXAndrea Iacono
 
How to build_a_search_engine
How to build_a_search_engineHow to build_a_search_engine
How to build_a_search_engineAndrea Iacono
 
03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data Out03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data OutOpenThink Labs
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningPetar Djekic
 
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6Andrei Zmievski
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextRafał Kuć
 
Building a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and LuceneBuilding a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and LuceneMirko Calvaresi
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheLeslie Samuel
 

Destacado (10)

The Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXThe Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphX
 
How to build_a_search_engine
How to build_a_search_engineHow to build_a_search_engine
How to build_a_search_engine
 
03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data Out03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data Out
 
Andrei's Regex Clinic
Andrei's Regex ClinicAndrei's Regex Clinic
Andrei's Regex Clinic
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
 
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - Sematext
 
Building a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and LuceneBuilding a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and Lucene
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 

Último

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Último (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx

99 Problems, But The Search Ain't One

  • 1. 99 Problems, But The Search Ain’t One Andrei Zmievski • PHP UK • Feb 25, 2011
  • 2. who am I? curl http://localhost:9200/speaker/info/andrei {"name": "Andrei Zmievski", "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"], "likes": ["coding", "beer", "brewing", "photography"], "twitter": "@a", "email": "andrei@zmievski.org"}
  • 3. what is elasticsearch? a search engine for the NoSQL generation: domain-driven; distributed; RESTful; Hitchhiker’s Guide to the Galaxy (no, really)
  • 8. 1. index
      request:
        curl -XPOST http://localhost:9200/conf/speaker/1 -d'
        {
            "name": "Andrei Zmievski",
            "talk": "99 Problems, but the Search Ain'\''t One",
            "likes": ["coding", "beer", "photography"],
            "twitter": "a",
            "height": 1??
        }'
      response:
        {
            "ok": true,
            "_index": "conf",
            "_type": "speaker",
            "_id": "1"
        }
  • 9. 2. search
      request:
        curl http://localhost:9200/conf/speaker/_search?q=beer
      response:
        { "took" : ?,
          "_shards" : {
            "total" : 1,
            "successful" : 1,
            "failed" : 0
          },
          "hits" : {
            "total" : 1,
            "max_score" : 0.?90??09,
            "hits" : [ {
              "_index" : "conf",
              "_type" : "speaker",
              "_id" : "2",
              "_score" : 0.?90??09,
              "_source" : {
                "name": "Andrei Zmievski",
                "talk": "99 Problems, but the Search Ain't One",
                "likes": ["coding", "beer", "photography"],
                "twitter": "a",
                "height": 1??
              }
            } ]
          }
        }
  • 10. 2. search (response): "total" : 1 under "hits" is the total number of hits
  • 11. 2. search (response): "_index" : "conf" is the index of the doc
  • 12. 2. search (response): "_type" : "speaker" is the type of the doc
  • 13. 2. search (response): "_id" : "2" is the id of the doc
  • 14. 2. search (response): "_id" : "2" is the id of the doc
  • 15. 2. search (response): "_score" is the hit score
  • 16. 2. search (response): "_source" is the original source
  • 17. 2. search (response): "took" is the execution time
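
The annotated fields of the search response can be read programmatically. The sketch below parses a response of the same shape with Python's standard `json` module. The `took` and score values in `SAMPLE_RESPONSE` are placeholders invented for illustration, and `summarize` is a hypothetical helper, not part of any client library.

```python
import json

# Hypothetical response in the shape shown on the slides; the "took" and
# score values are placeholders, not the numbers from the talk.
SAMPLE_RESPONSE = """
{"took": 3,
 "_shards": {"total": 1, "successful": 1, "failed": 0},
 "hits": {"total": 1, "max_score": 0.5,
          "hits": [{"_index": "conf", "_type": "speaker", "_id": "2",
                    "_score": 0.5,
                    "_source": {"name": "Andrei Zmievski",
                                "likes": ["coding", "beer", "photography"]}}]}}
"""

def summarize(raw):
    """Pull out the fields the slides annotate: took, total, and per-hit metadata."""
    response = json.loads(raw)
    hits = response["hits"]
    return {
        "took": response["took"],      # execution time
        "total": hits["total"],        # total number of hits
        "docs": [
            # index, type, id, score, and one field of the original source
            (h["_index"], h["_type"], h["_id"], h["_score"], h["_source"]["name"])
            for h in hits["hits"]
        ],
    }

if __name__ == "__main__":
    print(summarize(SAMPLE_RESPONSE))
```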
  • 19. demo
  • 20. distributed model provides: performance; resiliency (high-availability)
  • 21. shards: a portion of the document space; each one is a separate Lucene index, thus many per-index settings are available; a document is sharded by its _id value, but can be assigned (routed) to a shard deterministically
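
Deterministic routing can be pictured as hashing a routing value and taking it modulo the shard count. This is a minimal sketch under that assumption: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in stable hash, not necessarily the function elasticsearch uses internally.

```python
import zlib

def pick_shard(routing, number_of_shards):
    """Map a routing value (by default the document _id) to a shard number."""
    # crc32 is only a stand-in: any stable hash gives the same property,
    # namely that the same routing value always lands on the same shard.
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards

if __name__ == "__main__":
    for doc_id in ["1", "2", "andrei"]:
        print(doc_id, "->", pick_shard(doc_id, 2))
```

Passing a custom routing value (for example a user name) instead of the `_id` keeps all of that user's documents on one shard, which is the "assigned (routed)" case.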
  • 22. zero-conf discovery: zen (multicast and unicast); cloud (EC2 via API)
  • 23. auto-routing: master node maintains cluster state and reassigns shards if nodes leave/join cluster; any node can serve as the request router; the query is handled via scatter-gather mechanism
  • 24. replicas: each shard can have 1 or more replicas; # of replicas can be updated dynamically after index creation; replicas can be used for querying in parallel
  • 25. shard allocation: start with a single node (node 1)
  • 26. shard allocation: PUT /person {"index": {"number_of_shards": 2, "number_of_replicas": 1}} (node 1: person1, person2)
  • 27. shard allocation: start the second node (nodes 1 and 2; shard copies: person1, person1, person2, person2)
  • 28. shard allocation: start 2 more nodes (nodes 1 to 4; shard copies: person1, person1, person2, person2)
  • 29. shard allocation: start 2 more nodes (nodes 1 to 4; shard copies: person1, person1, person2, person2)
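
The allocation walkthrough can be imitated with a toy placement function. This is only an illustration of the idea that copies of the same shard never share a node; `allocate` is hypothetical, and the real allocator also rebalances and reacts to nodes leaving, which this sketch does not attempt.

```python
def allocate(index, number_of_shards, number_of_replicas, nodes):
    """Place primaries and replicas on nodes, one copy of a shard per node at most."""
    copies = 1 + number_of_replicas
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(number_of_shards):
        for copy in range(copies):
            role = "primary" if copy == 0 else "replica"
            label = f"{index}{shard + 1} ({role})"
            if copy >= len(nodes):
                # Not enough nodes to keep copies apart: leave the copy unassigned.
                unassigned.append(label)
                continue
            node = nodes[(shard * copies + copy) % len(nodes)]
            placement[node].append(label)
    return placement, unassigned

if __name__ == "__main__":
    for cluster in (["node 1"], ["node 1", "node 2"],
                    ["node 1", "node 2", "node 3", "node 4"]):
        print(allocate("person", 2, 1, cluster))
```

With one node both replicas stay unassigned; with four nodes each node ends up holding exactly one shard copy.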
  • 30. document sharding: PUT /person/info/1 {…}
  • 31. document sharding: PUT /person/info/1 {…} is hashed to shard 1
  • 32. document sharding: PUT /person/info/1 {…} is replicated
  • 33. document sharding: PUT /person/info/2 {…}
  • 34. document sharding: PUT /person/info/2 {…} is hashed to shard 2
  • 35. document sharding: PUT /person/info/2 {…} is replicated
  • 36. scatter-gather: GET /person/_search?q=name:thomas
  • 37. shard allocation: GET /person/_search?q=name:thomas
  • 38. shard allocation: GET /person/_search?q=name:thomas
  • 39. shard allocation: GET /person/_search?q=name:thomas
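
Scatter-gather can be sketched as "ask every shard for its best hits, then merge". The example below assumes each document already carries a precomputed `score`, which is a simplification for illustration; `scatter_gather` and the sample data are hypothetical.

```python
import heapq

def scatter_gather(shards, matches, size=10):
    """Query every shard, then merge the per-shard top hits by score."""
    partial_results = []
    for shard_docs in shards:  # scatter: each shard answers independently
        hits = [(doc["score"], doc["id"]) for doc in shard_docs if matches(doc)]
        partial_results.append(heapq.nlargest(size, hits))
    # gather: merge the partial lists and keep the overall best `size` hits
    merged = heapq.nlargest(size, (hit for part in partial_results for hit in part))
    return [doc_id for _score, doc_id in merged]

if __name__ == "__main__":
    shards = [
        [{"id": "1", "name": "thomas", "score": 0.5},
         {"id": "3", "name": "anna", "score": 0.9}],
        [{"id": "2", "name": "thomas", "score": 0.8}],
    ]
    print(scatter_gather(shards, lambda doc: doc["name"] == "thomas"))
```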
  • 40. transactional model: per-document consistency; no need to commit/flush; uses write-behind transaction log; write consistency (W) can be controlled: one, quorum, or all
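
The three write-consistency levels translate into a required number of live shard copies. This sketch uses a plain majority for `quorum`, which is an assumption: the engine may treat small replica counts specially. `required_copies` is a hypothetical helper.

```python
def required_copies(consistency, number_of_replicas):
    """Return how many shard copies must be available before a write proceeds."""
    copies = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "quorum":
        return copies // 2 + 1       # plain majority (an assumption of this sketch)
    if consistency == "all":
        return copies
    raise ValueError(f"unknown consistency level: {consistency}")

if __name__ == "__main__":
    for level in ("one", "quorum", "all"):
        print(level, required_copies(level, number_of_replicas=2))
```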
  • 41. (near) real-time search: 1 second refresh rate by default; the _refresh API is also available
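
"Near" real-time means a freshly indexed document becomes searchable only after the next refresh. The toy class below models just that visibility rule; `NearRealTimeIndex` is hypothetical and says nothing about how the engine implements refresh.

```python
class NearRealTimeIndex:
    """Toy model: documents become searchable only after a refresh."""

    def __init__(self):
        self._buffer = []       # indexed but not yet visible to search
        self._searchable = []   # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # In the real system this runs periodically (1 second by default)
        # or on demand through the _refresh API.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, predicate):
        return [doc for doc in self._searchable if predicate(doc)]

if __name__ == "__main__":
    idx = NearRealTimeIndex()
    idx.index({"name": "thomas"})
    print(idx.search(lambda d: d["name"] == "thomas"))  # not visible yet
    idx.refresh()
    print(idx.search(lambda d: d["name"] == "thomas"))  # visible after refresh
```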
  • 42. index storage: node data considered transient; can be stored in local file system, JVM heap, native OS memory, or FS & memory combination; persistent storage requires a gateway
  • 43. gateways: persistent store for cluster state and indices; asynchronous, translog-based write strategy; allows full recovery if a cluster restart is needed; supported gateways: local, shared FS, Hadoop (via HDFS), S3
  • 44. mapping: describes document structure to the search engine; automatically created with sensible defaults; explicit mapping can be provided (generally, a good idea); can run into merge conflicts
  • 45. mapping: important meta fields: _source, _all, _boost
  • 46. mapping types: simple: string, integer/long, float/double, boolean, and null; complex: array, object
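
Automatic mapping can be thought of as guessing a field type from each JSON value. The sketch below maps Python values onto the type names listed above. The exact defaults the engine picks (for example date detection) are not covered, and `infer_type` and `infer_mapping` are hypothetical helpers.

```python
def infer_type(value):
    """Guess a mapping type name from a decoded JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):   # checked before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):   # arrays take the type of their elements
        return infer_type(value[0]) if value else "null"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")

def infer_mapping(doc):
    """Build a properties block with one guessed type per top-level field."""
    return {"properties": {field: {"type": infer_type(v)} for field, v in doc.items()}}

if __name__ == "__main__":
    print(infer_mapping({"user": "derick", "tags": ["php"], "priority": 2}))
```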
  • 47. sample mapping
      document:
        {"user":      "derick",
         "title":     "Don't Panic",
         "tags":      ["profiling", "debugging", "php"],
         "postDate":  "2010-12-22T1?:1?:12",
         "priority":  2}
      mapping:
        {"post": {
          "properties" : {
            "user": {"type": "string", "index": "not_analyzed"},
            "message": {"type": "string", "boost": 1.?},
            "tags": {"type": "string", "include_in_all": "no"},
            "postDate" : {"type" : "date", "store": "no"},
            "priority" : {"type" : "integer"}
        }}}
  • 48. analyzers: break down (tokenize) and normalize fields during indexing and query strings at search time; analyzer = tokenizer + token filters (0 or more); Standard Analyzer = Standard Tokenizer + Standard Token Filter + Lowercase Token Filter + Stop Token Filter
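
The "tokenizer + token filters" composition can be sketched in a few lines. The regex tokenizer and the tiny stop-word list here are stand-ins chosen for illustration; Lucene's standard tokenizer and stop filter behave differently in the details (apostrophes, for instance), and every name in the block is hypothetical.

```python
import re

STOP_WORDS = {"a", "an", "and", "but", "of", "the", "to"}  # tiny illustrative list

def tokenize(text):
    """Stand-in tokenizer: split on anything that is not a word character."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    return [token.lower() for token in tokens]

def stop(tokens):
    return [token for token in tokens if token not in STOP_WORDS]

def analyze(text, tokenizer=tokenize, filters=(lowercase, stop)):
    """analyzer = tokenizer + token filters (0 or more), applied in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

if __name__ == "__main__":
    print(analyze("99 Problems, But The Search Ain't One"))
```

The same `analyze` call would be applied to field values at index time and to query strings at search time, so both sides produce comparable tokens.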
  • 49. analyzers: analyzers, tokenizers, and filters can be customized
      elasticsearch.yml:
        index:
          analysis:
            analyzer:
              eulang:
                type: custom
                tokenizer: standard
                filter: [standard, lowercase, stop, asciifolding, porterStem]
      mapping:
        … "title": {"type": "string", "analyzer": "eulang"}, …
  • 50. API
  • 51. API conventions: append ?pretty=true to get readable JSON; boolean values: false/0/off = false, rest is true; JSONP support via callback parameter
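
The boolean convention is easy to state in code: only three spellings mean false. `parse_bool` below is a hypothetical helper that mirrors the rule as the slide words it; whether the server compares case-sensitively is not stated in the talk, so this sketch compares exactly.

```python
def parse_bool(value):
    """Interpret a request parameter: false/0/off mean False, the rest is True."""
    return value not in ("false", "0", "off")

if __name__ == "__main__":
    for raw in ("false", "0", "off", "true", "1", "yes"):
        print(raw, "->", parse_bool(raw))
```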
  • 52. API structure: http://host:port/[index]/[type]/[_action/id]; GET http://es:9200/_status; GET http://es:9200/twitter/_status; POST http://es:9200/twitter/tweet/1; GET http://es:9200/twitter/tweet/1
  • 53. API structure: http://host:port/[index]/[type]/[_action/id]; GET http://es:9200/twitter/tweet/_search; GET http://es:9200/twitter/user/_search; GET http://es:9200/twitter/tweet,user/_search; GET http://es:9200/twitter,facebook/_search; GET http://es:9200/_search
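
The search URLs above all follow one pattern: optional comma-separated indices, optional comma-separated types, then `_search`. A small sketch of that pattern follows; `search_url` is a hypothetical helper, and it refuses types without an index rather than guessing how the server would treat that case.

```python
def search_url(host, indices=(), types=()):
    """Build a _search URL for zero or more indices and types."""
    if types and not indices:
        raise ValueError("types require at least one index in this sketch")
    parts = [host.rstrip("/")]
    if indices:
        parts.append(",".join(indices))
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/".join(parts)

if __name__ == "__main__":
    print(search_url("http://es:9200", ["twitter"], ["tweet", "user"]))
    print(search_url("http://es:9200", ["twitter", "facebook"]))
    print(search_url("http://es:9200"))
```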
  • 54. _cluster API structure: GET /_cluster/health; GET /_cluster/health/index1,index2; GET /_cluster/nodes/stats; GET /_cluster/nodes/nodeId1,nodeId2/stats
  • 55. API {core}: index, bulk, delete, delete by query, get, count; search: query, from/size paging, sort, highlighting, selective fields
  • 56. API {indices}: create, delete, open/close, get/put/delete mapping, refresh; optimize, snapshot, update settings, analyze, status, flush
  • 58. Query DSL: term / terms, range, prefix, bool, fuzzy, wildcard; query_string (default_operator, analyzer, phrase_slop, etc)
  • 59. filters: share some similar features with queries (term, range, etc); why use a filter?
  • 60. filters: faster than queries; cached (depends on the filter); the cache is used for different queries against the same filter; no scoring; more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query
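
Why a cached, score-free filter is cheap can be shown with a toy cache: the set of matching document ids is computed once per filter and reused by any later query. `FilterCache` is a hypothetical class for illustration and does not reflect the engine's actual cache structures.

```python
class FilterCache:
    """Toy term-filter cache: one set of matching ids per (field, value)."""

    def __init__(self, docs):
        self.docs = docs          # mapping of doc id -> document
        self._cache = {}
        self.evaluations = 0      # how often a filter was actually computed

    def term(self, field, value):
        key = ("term", field, value)
        if key not in self._cache:
            self.evaluations += 1
            matching = set()
            for doc_id, doc in self.docs.items():
                values = doc.get(field, [])
                if not isinstance(values, list):
                    values = [values]
                if value in values:   # yes/no match only, no scoring
                    matching.add(doc_id)
            self._cache[key] = frozenset(matching)
        return self._cache[key]

if __name__ == "__main__":
    cache = FilterCache({"1": {"likes": ["beer", "coding"]},
                         "2": {"likes": ["photography"]}})
    print(cache.term("likes", "beer"), cache.term("likes", "beer"))
    print("evaluations:", cache.evaluations)
```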
  • 61. facets: provide aggregated data based on the search request; terms, histogram, date histogram, range, statistical, and more
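
A terms facet amounts to counting field values across the documents a search matched. The sketch below does that counting locally with `collections.Counter`; `terms_facet` is a hypothetical helper and the input documents are made up for illustration.

```python
from collections import Counter

def terms_facet(hits, field, size=10):
    """Count the values of `field` across matching documents, most frequent first."""
    counts = Counter()
    for doc in hits:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(values)
    return counts.most_common(size)

if __name__ == "__main__":
    hits = [{"tags": ["php", "profiling"]}, {"tags": ["php"]}]
    print(terms_facet(hits, "tags"))
```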
  • 62. geo search: implemented as filters (and a facet); geo_distance, geo_bounding_box, geo_polygon
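
A geo_distance filter keeps documents whose location lies within a radius of a point. This sketch uses the haversine great-circle formula, which is an assumption; the engine's own distance calculation may differ. The `location` field layout and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_distance_filter(docs, lat, lon, max_km):
    """Keep documents whose 'location' is within max_km of (lat, lon)."""
    return [
        doc for doc in docs
        if distance_km(lat, lon, doc["location"]["lat"], doc["location"]["lon"]) <= max_km
    ]

if __name__ == "__main__":
    docs = [
        {"name": "London", "location": {"lat": 51.5074, "lon": -0.1278}},
        {"name": "Paris", "location": {"lat": 48.8566, "lon": 2.3522}},
    ]
    print([d["name"] for d in geo_distance_filter(docs, 51.5074, -0.1278, 100)])
```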
  • 63. interfaces: REST (including memcached); Java / Groovy; language clients (REST/Thrift): pyes, PHP (standalone and symfony), Ruby, Perl; Flume sink implementation
  • 64. elastica: similar to the other PHP ElasticSearch client; API naming is consistent with Zend Framework; can be extended for new filters, facets, etc; still under development
  • 65. elastica example
      $es = new Elastica_Client('vm', 9200);
      $index = new Elastica_Index($es, 'test');
      $index->create(array(), true);
      $type = new Elastica_Type($index, 'person');
      $doc = new Elastica_Document(1, array('name' => 'Andrei Zmievski',
                                            'email' => 'andrei@test.com',
                                            'username' => 'andrei',
                                            'bills' => array(2, 3, 5)));
      $type->addDocument($doc);
      $qs = new Elastica_Query_QueryString('andrei');
      $query = new Elastica_Query($qs);
      $resultSet = $type->search($query);
      print $resultSet->count();
  • 66. data import: ES is not the primary data store (usually); to import/synchronize data: write an agent (Gearman, message queues, etc); use rivers (CouchDB, RabbitMQ, Twitter)
  • 67. 10 more features: versioning, load balancing nodes, index aliases, plugins, parent/child docs, more_like_this, scripting, multi_field mapping, dynamic mapping, percolation, templates