SlideShare una empresa de Scribd logo
1 de 42
Descargar para leer sin conexión
Martijn van Groningen
@mvgroningen
Percolator
Thursday, September 5, 13
Topics
• What is percolator?
• Redesigned percolator
• New percolator features
• How does the percolator work?
Thursday, September 5, 13
Percolator?
coffee OR pots
Title : Coffee percolator
Body : A coffee percolator is a type of
pot used to brew coffee by continually
cycling the boiling or nearly-boiling
brew through ...
Title : Coffee percolator
Body : A coffee percolator is a type of
pot used to brew coffee by continually
cycling the boiling or nearly-boiling
brew through ...
Title : Coffee percolator
Body : A coffee percolator is a type of
pot used to brew coffee by continually
cycling the boiling or nearly-boiling
brew through ...
Title : Coffee percolator
Body : A coffee percolator is a type of
pot used to brew coffee by continually
cycling the boiling or nearly-boiling
brew through ...
1. Coffee percolator
2. Plain old telephone service (pots)
...
Hits
Query
Documents
Thursday, September 5, 13
Percolator?
coffee OR pots
Title : Coffee percolator
Body : A coffee percolator is a type of
pot used to brew coffee by continually
cycling the boiling or nearly-boiling
brew through ...
1. Coffee OR pots
2. boiling AND brew
...
Matches
Document Queries
boiling AND brew
other AND stuff
Thursday, September 5, 13
Percolator?
• Reversed search
• Document becomes a query and a query
becomes a document.
• Queries need to be stored.
• matches != hits
Because hits has relevancy whereas matches have not.
Thursday, September 5, 13
Percolator, but how?
• Search request:
• Queries are defined in JSON.
• But so are documents!
curl -XPOST 'localhost:9200/my-index/_search' -d '{
"query" : {
      "match" : {
         "body" : "coffee"
    }
  }
}'
Thursday, September 5, 13
Percolator, but how?
• Indexing a query (<=0.90):
• Any query can be indexed as a document.
Plus any arbitrary data
curl -XPUT 'localhost:9200/_percolator/my-index/my-id' -d '{
"query" : {
      "match" : {
         "body" : "coffee"
    }
  },
"click_id" : 12
}'
Thursday, September 5, 13
Percolator, but how?
• Indexing a query:
• Path structure
index: _percolator is a reserved index for queries.
type: The index to register a query to.
id: The unique identifier for a query.
curl -XPUT 'localhost:9200/_percolator/my-index/my-id' -d '{
"query" : {
      "match" : {
         "body" : "coffee"
    }
  },
"click_id" : 12
}'
Thursday, September 5, 13
Percolator, but how?
• Percolate api (<=0.90):
• All queries registered to ‘my-index’ are
consulted.
curl -XPUT 'localhost:9200/my-index/my-type/_percolate' -d '{
"doc" : {
      "title" : "Coffee percolator",
"body" : "A coffee percolator is a type of ..."
  }
}'
Thursday, September 5, 13
Percolator, but how?
• Percolate api response (<=0.90):
• A simple list of query ids.
• Also the percolate api work in realtime.
{
"ok" : true
"matches" : ["my-id",...]
}
Thursday, September 5, 13
Percolation in the wild
Thursday, September 5, 13
Alerting use case
• Store and register queries that monitor data.
End users can define their alerts via application.
• Execute the percolate api right after indexing.
No need to wait - percolator works in realtime.
• Examples:
Price monitor, News alerts, Stock alerts, Weather alerts
Thursday, September 5, 13
Alerting use case
curl -XPUT 'localhost:9200/_percolator/prices/user-1' -d '{
"query" : {
"bool" : [
{
"range" : {
"product.price" : {
"lte" : 500
}
}
},
{
"match" : {
         "product.name" : "my led tv"
    }
}
]
  }
}'
Triggered by user adding an user alert:
Thursday, September 5, 13
Percolator - alerting use case
curl -XPOST 'localhost:9200/prices/price/_percolate' -d '{
"doc" : {
"product" : {
"name" : "my led tv",
"price" : 499
}
  }
}'
Then when new TVs are added:
Thursday, September 5, 13
Pricing use case
• Store all users’ queries of a specific time frame
Last week’s, last month’s queries.
• Provide feedback to advertisement owner.
Execute percolate api while editing the ad.
• Examples:
Real estate, car sales or any other market place.
Thursday, September 5, 13
Contextual ads use case
• Store advertisement as queries.
• On page display percolate document against
the stored advertisements.
• Examples:
Gmail
Thursday, September 5, 13
Classification use case
• Store queries that can identify patterns in your
documents.
• Percolate a document before indexing it.
Enrich the document with the queries it matches with.
• Examples:
Automatically tag documents, geo tag documents and
ways to automatically categorize documents.
Thursday, September 5, 13
Distributed Percolator
Thursday, September 5, 13
Percolator - redesign
• The _percolator index can only have one
primary shard.
Node 1
p1
Node 2
p1
Node 3
p1
C
Percolate
?
? ?
?
? ?
?
? ?
Thursday, September 5, 13
Percolator - redesign
• The redesigned percolator has no dedicated
reserved _percolator index.
• Instead the redesigned percolator has a
_percolator type / mapping.
• Any index can become a percolator index.
Without any restrictions on (sharding) settings.
Thursday, September 5, 13
Percolator - redesign
• Because _percolator index has been
replaced by _percolator type:
• Queries and your data coexist in the same
index.
Percolator shares the settings of the index it sits in.
• Or have a number dedicated percolator
indices.
Thursday, September 5, 13
Percolator - redesign
• Redesigned percolator is fully distributed.
Node 1
a1
Node 2
a1
C
Percolate
a2a2
Thursday, September 5, 13
Percolator - redesign
• Indexing a query:
• Path structure
index: The index to hold the query.
type: The reserved _percolator type.
id: The unique identifier for a query.
curl -XPUT 'localhost:9200/my-index/_percolator/my-id' -d '{
"query" : {
      "match" : {
         "body" : "coffee"
    }
  },
"click_id" : 12
}'
Thursday, September 5, 13
• Percolate api remains similar, but:
Fully multi tenant:
Full alias support:
And routing support.
Percolator - redesign
curl -XGET 'localhost:9200/my-index1,my-index2/my-type/_percolate' -d '{
"doc" : {
      "title" : "Coffee percolator",
"body" : "A coffee percolator is a type of ..."
  }
}'
curl -XGET 'localhost:9200/my-alias/my-type/_percolate' -d '{
"doc" : {
      "title" : "Coffee percolator",
"body" : "A coffee percolator is a type of ..."
  }
}'
Thursday, September 5, 13
Percolator - redesign
• Percolate api response:
{
"took" : 19,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"count" : 4,
"matches" : [
{
"_index" : "my-index1",
"_id" : "my-id"
},
{
"_index" : "my-index2",
"_id" : "my-id"
},
...
]
}
Thursday, September 5, 13
Percolator - how does it work?
• Each shard holds a Collection of parsed
queries in memory.
• The queries are also stored on the shard
(Lucene index)
• The collection of queries get updated by
every index, create, update or delete
operation in realtime.
Thursday, September 5, 13
Percolator - how does it work?
• During percolating the document to be
percolated gets indexed into an in memory
index.
• All shard queries are executed against this one
document in memory index.
Shard level execution time is linear to the amount
queries to evaluate.
• After all queries have been evaluated the in
memory index gets cleaned up.
Thursday, September 5, 13
Distributed percolator
• Percolate api executes the request in parallel
on all shards.
• Use routing and multi tenancy to reduce the
amount of queries to evaluate.
- Routing will reduce the amount of shards.
- More indices (and therefore more shards) reduces the
amount of queries per shard.
Thursday, September 5, 13
Distributed percolator
• No routing / partitioning
Node 1
a1
Node 2
a1
C
Percolate
a2a2
a3 a3
Thursday, September 5, 13
Distributed percolator
• Percolating with routing:
Node 1
a1
Node 2
a1
C
Percolate,
but route
with XYZ
a2a2
a3 a3
Thursday, September 5, 13
Node 1
Distributed percolator
• Percolating based on location partitioning in
different indices.
a1
Node 2
a1
C
a2a2
b1 b1b2
index a = EU queries
index b = NA queries
b2
Thursday, September 5, 13
Percolator features
Thursday, September 5, 13
Feature - percolate existing doc
• Percolating a newly indexed document is very
common pattern.
curl -XGET 'localhost:9200/my-index1/my-type/1/_percolate'
curl -XGET 'localhost:9200/my-index1/my-type/1/_percolate?percolate_index=my-index2'
my-index1 is both percolate and source index:
my-index2 contains the queries to evaluate:
and my-index1 contains the document to percolate
Thursday, September 5, 13
Feature - count api
curl -XPUT 'localhost:9200/my-index1/my-type/_percolate/count' -d '{
"doc" : {
      "title" : "Coffee percolator",
"body" : "A coffee percolator is a type of ..."
  }
}'
{
"took" : 8,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"count" : 5
}
Response:
Count api:
curl -XPUT 'localhost:9200/my-index1/my-type/1/_percolate/count'
Count existing doc api:
Thursday, September 5, 13
Feature - filtering
curl -XGET 'localhost:9200/my-index1/my-type/_percolate/count' -d '{
"doc" : {
       "title" : "Coffee percolator",
"body" : "A coffee percolator is a type of ..."
  },
"query" : {
"term" : {"click_id" : "43"}
}
}'
Filtering by query:
curl -XGET 'localhost:9200/my-index1/my-type/_percolate/count' -d '{
"doc" : {
       "title" : "Coffee percolator",
"body" : "A coffee percolator is a type of ..."
  },
"filter" : {
"term" : {"click_id" : "43"}
}
}'
Filtering by filter:
Thursday, September 5, 13
Feature - sorting / scoring
• Build on top on the query support.
• Sorting based on percolator query fields.
Document being percolated isn’t scored!
• Three new options:
• size The amount of matches to return (required with sort)
• sort Whether to sort based on query.
• score Just include score, but don’t sort
• Like the query / filter support not realtime.
Thursday, September 5, 13
Feature - sorting / scoring
• Sorting support works nicely with function
score query.
curl -XGET 'localhost:9200/my-index1/my-type/_percolate' -d '{
"doc" : {
       ...
  },
"query" : {
"function_score" : {
"query" : { "match_all": {}},
"functions" : [
{
"exp" : {
"create_date" : {
"reference" : "2013/08/14",
"scale" : "1000d"
}
}
}
]
}
}
"sort" : true,
"size" : 10
}'
Field in query
Thursday, September 5, 13
Feature - sorting / scoring
{
"took": 2,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 2,
"matches": [
{
"_index": "my-index",
"_id": "2",
"_score": 0.85559505
},
{
"_index": "my-index",
"_id": "1",
"_score": 0.4002574
}
]
}
• Response:
Thursday, September 5, 13
Feature - highlighting
curl -XPUT 'localhost:9200/my-index/_percolator/1' -d '{
"query": {
"match" : {
"body" : "brown fox"
}
}
}'
curl -XPUT 'localhost:9200/my-index/_percolator/2' -d '{
"query": {
"match" : {
"body" : "lazy dog"
}
}
}'
• Lets index two queries:
Thursday, September 5, 13
Feature - highlighting
• The size option is required.
• All highlight options are supported.
curl -XGET 'localhost:9200/my-index/my-type/percolate' -d '{
"doc" : {
"body" : "The quick brown fox jumps over the lazy dog"
},
"highlight" : {
"fields" : {
"body" : {}
}
},
"size" : 5
}'
Thursday, September 5, 13
Feature - highlighting
{
...
"total": 2,
"matches": [
{
"_index": "my-index",
"_id": "1",
"highlight": {
"body": [
"The quick <em>brown</em> <em>fox</em> jumps over the lazy dog"
]
}
},
{
"_index": "my-index",
"_id": "2",
"highlight": {
"body": [
"The quick brown fox jumps over the <em>lazy</em> <em>dog</em>"
]
}
}
]
}
Thursday, September 5, 13
Feature - multi percolate
• Combine multiple percolate requests into a
single request.
{"percolate" : {"index" : "my-index", "type" : "my-tweet"}}
{"doc" : {"title" : "coffee percolator"}}
{"percolate" : "index" : "my-index", "type" : "my-type", "id" : "1"}
{}
{"count" : {"index" : "my-index", "type" : "my-type"}}
{"doc" : {"title" : "coffee percolator"}}
{"count" : "index" : "my-index", "type" : "my-type", "id" : "1"}
{}
curl -XGET 'localhost:9200/_mpercolate' --data-binary @requests.txt; echo
requests.txt:
Request:
Thursday, September 5, 13

Más contenido relacionado

La actualidad más candente

Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Portable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej BialeckiPortable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej Bialeckilucenerevolution
 
Spark SQL Bucketing at Facebook
 Spark SQL Bucketing at Facebook Spark SQL Bucketing at Facebook
Spark SQL Bucketing at FacebookDatabricks
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
Data Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowData Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowDatabricks
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberXiang Fu
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectDatabricks
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareKai Wähner
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureScyllaDB
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connectconfluent
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerOpenSource Connections
 
Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersJean-Paul Azar
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
 
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Amazon Web Services
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionJoel Koshy
 

La actualidad más candente (20)

Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Portable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej BialeckiPortable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej Bialecki
 
Spark SQL Bucketing at Facebook
 Spark SQL Bucketing at Facebook Spark SQL Bucketing at Facebook
Spark SQL Bucketing at Facebook
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Data Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowData Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache Arrow
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
 
KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connect
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced Producers
 
Kafka/SMM Crash Course
Kafka/SMM Crash CourseKafka/SMM Crash Course
Kafka/SMM Crash Course
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
 
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolution
 

Destacado

elasticsearch - advanced features in practice
elasticsearch - advanced features in practiceelasticsearch - advanced features in practice
elasticsearch - advanced features in practiceJano Suchal
 
Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesItamar
 
Scaling Elasticsearch at Synthesio
Scaling Elasticsearch at SynthesioScaling Elasticsearch at Synthesio
Scaling Elasticsearch at SynthesioFred de Villamil
 
Elasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easyElasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easyItamar
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic IntroductionMayur Rathod
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.Jurriaan Persyn
 

Destacado (6)

elasticsearch - advanced features in practice
elasticsearch - advanced features in practiceelasticsearch - advanced features in practice
elasticsearch - advanced features in practice
 
Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use cases
 
Scaling Elasticsearch at Synthesio
Scaling Elasticsearch at SynthesioScaling Elasticsearch at Synthesio
Scaling Elasticsearch at Synthesio
 
Elasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easyElasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easy
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 

Similar a Distributed percolator in elasticsearch

elasticsearch basics workshop
elasticsearch basics workshopelasticsearch basics workshop
elasticsearch basics workshopMathieu Elie
 
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup Alberto Paro
 
Cooking an Omelette with Chef
Cooking an Omelette with ChefCooking an Omelette with Chef
Cooking an Omelette with Chefctaintor
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutesDavid Pilato
 
Specking Interactors with PHPSpec and YOLO (DDD) at PHPConference Argentina 2013
Specking Interactors with PHPSpec and YOLO (DDD) at PHPConference Argentina 2013Specking Interactors with PHPSpec and YOLO (DDD) at PHPConference Argentina 2013
Specking Interactors with PHPSpec and YOLO (DDD) at PHPConference Argentina 2013cordoval
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Document relations - Berlin Buzzwords 2013
Document relations - Berlin Buzzwords 2013Document relations - Berlin Buzzwords 2013
Document relations - Berlin Buzzwords 2013martijnvg
 
Engineering culture
Engineering cultureEngineering culture
Engineering culturePamela Fox
 
CODAIT/Spark-Bench
CODAIT/Spark-BenchCODAIT/Spark-Bench
CODAIT/Spark-BenchEmily Curtin
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
Recommender Systems with Ruby (adding machine learning, statistics, etc)
Recommender Systems with Ruby (adding machine learning, statistics, etc)Recommender Systems with Ruby (adding machine learning, statistics, etc)
Recommender Systems with Ruby (adding machine learning, statistics, etc)Marcel Caraciolo
 
Padrino is agnostic
Padrino is agnosticPadrino is agnostic
Padrino is agnosticTakeshi Yabe
 
Become Master of Your Own Universe - DIBI 2013
Become Master of Your Own Universe - DIBI 2013Become Master of Your Own Universe - DIBI 2013
Become Master of Your Own Universe - DIBI 2013Phil Sturgeon
 
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010singingfish
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher lucenerevolution
 

Similar a Distributed percolator in elasticsearch (20)

elasticsearch basics workshop
elasticsearch basics workshopelasticsearch basics workshop
elasticsearch basics workshop
 
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup ElasticSearch 5.x -  New Tricks - 2017-02-08 - Elasticsearch Meetup
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
 
Cooking an Omelette with Chef
Cooking an Omelette with ChefCooking an Omelette with Chef
Cooking an Omelette with Chef
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
 
CouchDB Day NYC 2017: Full Text Search
CouchDB Day NYC 2017: Full Text SearchCouchDB Day NYC 2017: Full Text Search
CouchDB Day NYC 2017: Full Text Search
 
Specking Interactors with PHPSpec and YOLO (DDD) at PHPConference Argentina 2013
Specking Interactors with PHPSpec and YOLO (DDD) at PHPConference Argentina 2013Specking Interactors with PHPSpec and YOLO (DDD) at PHPConference Argentina 2013
Specking Interactors with PHPSpec and YOLO (DDD) at PHPConference Argentina 2013
 
Meet Couch DB
Meet Couch DBMeet Couch DB
Meet Couch DB
 
ACM BPM and elasticsearch AMIS25
ACM BPM and elasticsearch AMIS25ACM BPM and elasticsearch AMIS25
ACM BPM and elasticsearch AMIS25
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Document relations - Berlin Buzzwords 2013
Document relations - Berlin Buzzwords 2013Document relations - Berlin Buzzwords 2013
Document relations - Berlin Buzzwords 2013
 
Engineering culture
Engineering cultureEngineering culture
Engineering culture
 
CODAIT/Spark-Bench
CODAIT/Spark-BenchCODAIT/Spark-Bench
CODAIT/Spark-Bench
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Recommender Systems with Ruby (adding machine learning, statistics, etc)
Recommender Systems with Ruby (adding machine learning, statistics, etc)Recommender Systems with Ruby (adding machine learning, statistics, etc)
Recommender Systems with Ruby (adding machine learning, statistics, etc)
 
Padrino is agnostic
Padrino is agnosticPadrino is agnostic
Padrino is agnostic
 
Become Master of Your Own Universe - DIBI 2013
Become Master of Your Own Universe - DIBI 2013Become Master of Your Own Universe - DIBI 2013
Become Master of Your Own Universe - DIBI 2013
 
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher
 

Último

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Último (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Distributed percolator in elasticsearch

  • 2. Topics • What is percolator? • Redesigned percolator • New percolator features • How does the percolator work? Thursday, September 5, 13
  • 3. Percolator? coffee OR pots Title : Coffee percolator Body : A coffee percolator is a type of pot used to brew coffee by continually cycling the boiling or nearly-boiling brew through ... Title : Coffee percolator Body : A coffee percolator is a type of pot used to brew coffee by continually cycling the boiling or nearly-boiling brew through ... Title : Coffee percolator Body : A coffee percolator is a type of pot used to brew coffee by continually cycling the boiling or nearly-boiling brew through ... Title : Coffee percolator Body : A coffee percolator is a type of pot used to brew coffee by continually cycling the boiling or nearly-boiling brew through ... 1. Coffee percolator 2. Plain old telephone service (pots) ... Hits Query Documents Thursday, September 5, 13
  • 4. Percolator? coffee OR pots Title : Coffee percolator Body : A coffee percolator is a type of pot used to brew coffee by continually cycling the boiling or nearly-boiling brew through ... 1. Coffee OR pots 2. boiling AND brew ... Matches Document Queries boiling AND brew other AND stuff Thursday, September 5, 13
  • 5. Percolator? • Reversed search • Document becomes a query and a query becomes a document. • Queries need to be stored. • matches != hits Because hits has relevancy whereas matches have not. Thursday, September 5, 13
  • 6. Percolator, but how? • Search request: • Queries are defined in JSON. • But so are documents! curl -XPOST 'localhost:9200/my-index/_search' -d '{ "query" : {       "match" : {          "body" : "coffee"     }   } }' Thursday, September 5, 13
  • 7. Percolator, but how? • Indexing a query (<=0.90): • Any query can be indexed as a document. Plus any arbitrary data curl -XPUT 'localhost:9200/_percolator/my-index/my-id' -d '{ "query" : {       "match" : {          "body" : "coffee"     }   }, "click_id" : 12 }' Thursday, September 5, 13
  • 8. Percolator, but how? • Indexing a query: • Path structure index: _percolator is a reserved index for queries. type: The index to register a query to. id: The unique identifier for a query. curl -XPUT 'localhost:9200/_percolator/my-index/my-id' -d '{ "query" : {       "match" : {          "body" : "coffee"     }   }, "click_id" : 12 }' Thursday, September 5, 13
  • 9. Percolator, but how? • Percolate api (<=0.90): • All queries registered to ‘my-index’ are consulted. curl -XPUT 'localhost:9200/my-index/my-type/_percolate' -d '{ "doc" : {       "title" : "Coffee percolator", "body" : "A coffee percolator is a type of ..."   } }' Thursday, September 5, 13
  • 10. Percolator, but how? • Percolate api response (<=0.90): • A simple list of query ids. • Also the percolate api work in realtime. { "ok" : true "matches" : ["my-id",...] } Thursday, September 5, 13
  • 11. Percolation in the wild Thursday, September 5, 13
  • 12. Alerting use case • Store and register queries that monitor data. End users can define their alerts via application. • Execute the percolate api right after indexing. No need to wait - percolator works in realtime. • Examples: Price monitor, News alerts, Stock alerts, Weather alerts Thursday, September 5, 13
  • 13. Alerting use case curl -XPUT 'localhost:9200/_percolator/prices/user-1' -d '{ "query" : { "bool" : [ { "range" : { "product.price" : { "lte" : 500 } } }, { "match" : {          "product.name" : "my led tv"     } } ]   } }' Triggered by user adding an user alert: Thursday, September 5, 13
  • 14. Percolator - alerting use case curl -XPOST 'localhost:9200/prices/price/_percolate' -d '{ "doc" : { "product" : { "name" : "my led tv", "price" : 499 }   } }' Then when new TVs are added: Thursday, September 5, 13
  • 15. Pricing use case • Store all users’ queries of a specific time frame Last week’s, last month’s queries. • Provide feedback to advertisement owner. Execute percolate api while editing the ad. • Examples: Real estate, car sales or any other market place. Thursday, September 5, 13
  • 16. Contextual ads use case • Store advertisement as queries. • On page display percolate document against the stored advertisements. • Examples: Gmail Thursday, September 5, 13
  • 17. Classification use case • Store queries that can identify patterns in your documents. • Percolate a document before indexing it. Enrich the document with the queries it matches with. • Examples: Automatically tag documents, geo tag documents and ways to automatically categorize documents. Thursday, September 5, 13
  • 19. Percolator - redesign • The _percolator index can only have one primary shard. Node 1 p1 Node 2 p1 Node 3 p1 C Percolate ? ? ? ? ? ? ? ? ? Thursday, September 5, 13
  • 20. Percolator - redesign • The redesigned percolator has no dedicated reserved _percolator index. • Instead the redesigned percolator has a _percolator type / mapping. • Any index can become a percolator index. Without any restrictions on (sharding) settings. Thursday, September 5, 13
  • 21. Percolator - redesign • Because _percolator index has been replaced by _percolator type: • Queries and your data coexist in the same index. Percolator shares the settings of the index it sits in. • Or have a number dedicated percolator indices. Thursday, September 5, 13
  • 22. Percolator - redesign • Redesigned percolator is fully distributed. Node 1 a1 Node 2 a1 C Percolate a2a2 Thursday, September 5, 13
  • 23. Percolator - redesign • Indexing a query: • Path structure index: The index to hold the query. type: The reserved _percolator type. id: The unique identifier for a query. curl -XPUT 'localhost:9200/my-index/_percolator/my-id' -d '{ "query" : {       "match" : {          "body" : "coffee"     }   }, "click_id" : 12 }' Thursday, September 5, 13
  • 24. • Percolate api remains similar, but: Fully multi tenant: Full alias support: And routing support. Percolator - redesign curl -XGET 'localhost:9200/my-index1,my-index2/my-type/_percolate' -d '{ "doc" : {       "title" : "Coffee percolator", "body" : "A coffee percolator is a type of ..."   } }' curl -XGET 'localhost:9200/my-alias/my-type/_percolate' -d '{ "doc" : {       "title" : "Coffee percolator", "body" : "A coffee percolator is a type of ..."   } }' Thursday, September 5, 13
  • 25. Percolator - redesign • Percolate api response: { "took" : 19, "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "count" : 4, "matches" : [ { "_index" : "my-index1", "_id" : "my-id" }, { "_index" : "my-index2", "_id" : "my-id" }, ... ] } Thursday, September 5, 13
  • 26. Percolator - how does it work? • Each shard holds a Collection of parsed queries in memory. • The queries are also stored on the shard (Lucene index) • The collection of queries get updated by every index, create, update or delete operation in realtime. Thursday, September 5, 13
  • 27. Percolator - how does it work? • During percolating the document to be percolated gets indexed into an in memory index. • All shard queries are executed against this one document in memory index. Shard level execution time is linear to the amount queries to evaluate. • After all queries have been evaluated the in memory index gets cleaned up. Thursday, September 5, 13
  • 28. Distributed percolator • Percolate api executes the request in parallel on all shards. • Use routing and multi tenancy to reduce the amount of queries to evaluate. - Routing will reduce the amount of shards. - More indices (and therefore more shards) reduces the amount of queries per shard. Thursday, September 5, 13
  • 29. Distributed percolator • No routing / partitioning Node 1 a1 Node 2 a1 C Percolate a2a2 a3 a3 Thursday, September 5, 13
  • 30. Distributed percolator • Percolating with routing: Node 1 a1 Node 2 a1 C Percolate, but route with XYZ a2a2 a3 a3 Thursday, September 5, 13
  • 31. Node 1 Distributed percolator • Percolating based on location partitioning in different indices. a1 Node 2 a1 C a2a2 b1 b1b2 index a = EU queries index b = NA queries b2 Thursday, September 5, 13
  • 33. Feature - percolate existing doc • Percolating a newly indexed document is very common pattern. curl -XGET 'localhost:9200/my-index1/my-type/1/_percolate' curl -XGET 'localhost:9200/my-index1/my-type/1/_percolate?percolate_index=my-index2' my-index1 is both percolate and source index: my-index2 contains the queries to evaluate: and my-index1 contains the document to percolate Thursday, September 5, 13
  • 34. Feature - count api curl -XPUT 'localhost:9200/my-index1/my-type/_percolate/count' -d '{ "doc" : {       "title" : "Coffee percolator", "body" : "A coffee percolator is a type of ..."   } }' { "took" : 8, "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "count" : 5 } Response: Count api: curl -XPUT 'localhost:9200/my-index1/my-type/1/_percolate/count' Count existing doc api: Thursday, September 5, 13
  • 35. Feature - filtering curl -XGET 'localhost:9200/my-index1/my-type/_percolate/count' -d '{ "doc" : {        "title" : "Coffee percolator", "body" : "A coffee percolator is a type of ..."   }, "query" : { "term" : {"click_id" : "43"} } }' Filtering by query: curl -XGET 'localhost:9200/my-index1/my-type/_percolate/count' -d '{ "doc" : {        "title" : "Coffee percolator", "body" : "A coffee percolator is a type of ..."   }, "filter" : { "term" : {"click_id" : "43"} } }' Filtering by filter: Thursday, September 5, 13
  • 36. Feature - sorting / scoring • Build on top on the query support. • Sorting based on percolator query fields. Document being percolated isn’t scored! • Three new options: • size The amount of matches to return (required with sort) • sort Whether to sort based on query. • score Just include score, but don’t sort • Like the query / filter support not realtime. Thursday, September 5, 13
  • 37. Feature - sorting / scoring • Sorting support works nicely with function score query. curl -XGET 'localhost:9200/my-index1/my-type/_percolate' -d '{ "doc" : {        ...   }, "query" : { "function_score" : { "query" : { "match_all": {}}, "functions" : [ { "exp" : { "create_date" : { "reference" : "2013/08/14", "scale" : "1000d" } } } ] } } "sort" : true, "size" : 10 }' Field in query Thursday, September 5, 13
  • 38. Feature - sorting / scoring { "took": 2, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "total": 2, "matches": [ { "_index": "my-index", "_id": "2", "_score": 0.85559505 }, { "_index": "my-index", "_id": "1", "_score": 0.4002574 } ] } • Response: Thursday, September 5, 13
  • 39. Feature - highlighting curl -XPUT 'localhost:9200/my-index/_percolator/1' -d '{ "query": { "match" : { "body" : "brown fox" } } }' curl -XPUT 'localhost:9200/my-index/_percolator/2' -d '{ "query": { "match" : { "body" : "lazy dog" } } }' • Lets index two queries: Thursday, September 5, 13
  • 40. Feature - highlighting • The size option is required. • All highlight options are supported. curl -XGET 'localhost:9200/my-index/my-type/percolate' -d '{ "doc" : { "body" : "The quick brown fox jumps over the lazy dog" }, "highlight" : { "fields" : { "body" : {} } }, "size" : 5 }' Thursday, September 5, 13
  • 41. Feature - highlighting { ... "total": 2, "matches": [ { "_index": "my-index", "_id": "1", "highlight": { "body": [ "The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" ] } }, { "_index": "my-index", "_id": "2", "highlight": { "body": [ "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" ] } } ] } Thursday, September 5, 13
  • 42. Feature - multi percolate • Combine multiple percolate requests into a single request. {"percolate" : {"index" : "my-index", "type" : "my-tweet"}} {"doc" : {"title" : "coffee percolator"}} {"percolate" : "index" : "my-index", "type" : "my-type", "id" : "1"} {} {"count" : {"index" : "my-index", "type" : "my-type"}} {"doc" : {"title" : "coffee percolator"}} {"count" : "index" : "my-index", "type" : "my-type", "id" : "1"} {} curl -XGET 'localhost:9200/_mpercolate' --data-binary @requests.txt; echo requests.txt: Request: Thursday, September 5, 13