Doing magic with ElasticSearch
PEDRO FRANCESCHI
@pedroh96
pedro@pagar.me
github.com/pedrofranceschi
October/2010
Filters 
Full text search 
Sort 
Highlight 
Facets 
Pagination
You will need to search data.
You will need to understand data.
(My)SQL is not the solution.
(… nor NoSQL)
What is ElasticSearch?
ElasticSearch 
• “Open Source Distributed Real Time Search & Analytics” 
• RESTful API to index/search JSON documents (“NoSQL”)
• It is NOT a database
• Apache Lucene 
• Just works (and scales) 
• Full text search, aggregations, scripting, etc, etc, etc.
Names?
MySQL       ElasticSearch
Database    Index
Table       Type
Row         Document
Column      Field
Schema      Mapping
Partition   Shard
How to use ElasticSearch?
PUT data
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
  "user" : "pedroh96",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}'
URL parts: endpoint (http://localhost:9200), index (twitter), type (tweet), document ID (1). The request body is the document.
Response:
{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "1",
  "_version" : 1,
  "created" : true
}
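The same indexing request can be sketched from Ruby with only the standard library. This is a minimal sketch, not the author's code: it builds the request without sending it, and the `Content-Type` header is an assumption added here (the curl call on the slide does not set one).

```ruby
require "json"
require "net/http"
require "uri"

# URL parts, named as on the slide.
endpoint = "http://localhost:9200"
index    = "twitter"
type     = "tweet"
id       = 1

document = {
  user: "pedroh96",
  post_date: "2009-11-15T14:12:12",
  message: "trying out Elasticsearch"
}

uri = URI("#{endpoint}/#{index}/#{type}/#{id}")
request = Net::HTTP::Put.new(uri)
request["Content-Type"] = "application/json" # assumption, not on the slide
request.body = JSON.generate(document)

# Sending is left commented out so the sketch runs without a server:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.body

puts "#{request.method} #{request.path}"
puts request.body
```

Uncommenting the two `response` lines would send the document to a node listening on localhost:9200.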
GET data
$ curl -XGET 'http://localhost:9200/twitter/tweet/1'
URL parts: endpoint (http://localhost:9200), index (twitter), type (tweet), document ID (1).
Response (the document is returned under "_source"):
{
  "_id": "1",
  "_index": "twitter",
  "_source": {
    "message": "trying out Elasticsearch",
    "post_date": "2009-11-15T14:12:12",
    "user": "pedroh96"
  },
  "_type": "tweet",
  "_version": 1,
  "found": true
}
GET data
$ curl -XGET 'http://localhost:9200/twitter/_search' \
  -d '{ query: . . . }'
URL parts: endpoint (http://localhost:9200), index (twitter), search operator (_search). The request body is the search query.
ActiveRecords
class Tweet < ActiveRecord::Base
end
ActiveRecords
require 'elasticsearch/model'

class Tweet < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
end
Tweet.import
Tweet.search("pedroh96")
Why use ElasticSearch?
DISCLAIMER
Post.where(:author => "pedroh96")
vs
Post.search(query: { match: { author: "pedroh96" }})
Just Another Query Language?
1) Full text search
ActiveRecords
$ rails g scaffold Post title:string source:string
GET /posts/5 
Post.find(5) 
:-) 
ActiveRecords
ActiveRecords
“Amazon to Buy Video Site Twitch for More Than $1B”
Post.where(:title => "Amazon to Buy Video Site Twitch for More Than $1B")
:-)
“amazon” 
Post.where(["title LIKE ?", "%Amazon%"]) 
??? 
ActiveRecords
“amazon source:online.wsj.com” 
Post.where(["title LIKE ? AND source = ?", 
"%Amazon%", "online.wsj.com"]) 
?????? 
ActiveRecords
“amazon” 
Post.search("amazon") 
:-) 
ElasticSearch
ElasticSearch 
“amazon source:online.wsj.com” 
search = Post.search("amazon source:online.wsj.com") 
:-)
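Why the full-text version stays simple can be sketched with a toy inverted index, the kind of structure Lucene-based engines rely on. This is an illustration only: `TinyIndex` and its tokenizer are hypothetical names, the third post is made up, and real analyzers, scoring and the `source:` field syntax are left out.

```ruby
# Toy inverted index: term => ids of the posts whose title contains it.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Lowercase and split on non-alphanumerics, a crude stand-in for an analyzer.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end

  def add(id, title)
    tokenize(title).uniq.each { |term| @postings[term] << id }
  end

  # Ids of the documents that contain every term of the query.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    terms.map { |term| @postings.fetch(term, []) }.reduce(:&)
  end
end

index = TinyIndex.new
index.add(1, "Amazon to Buy Video Site Twitch for More Than $1B")
index.add(2, "Twitch officially acquired by Amazon")
index.add(3, "Unrelated post") # made-up document

p index.search("amazon")
p index.search("twitch AMAZON")
p index.search("video amazon")
```

Lookups go through terms rather than scanning every title, which is why case and word order in the query stop mattering.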
ElasticSearch
“amazon source:online.wsj.com”
search = Post.search(
  query: {
    match: {                                 # full-text search
      _all: "amazon source:online.wsj.com"
    }
  }
)
ElasticSearch
“amazon source:online.wsj.com”
search = Post.search(
  query: {
    multi_match: {                           # full-text search
      query: "amazon source:online.wsj.com",
      fields: ['title^10', 'source']         # title boost
    }
  }
)
ElasticSearch
“amazon source:online.wsj.com”
search = Post.search(
  query: {
    multi_match: {                           # full-text search
      query: "amazon source:online.wsj.com",
      fields: ['title^10', 'source']         # title boost
    }
  },
  highlight: {                               # title highlight
    fields: {
      title: {}
    }
  }
)
ElasticSearch 
Title highlight 
> search.results[0].highlight.title 
=> ["Twitch officially acquired by <em>Amazon</em>"]
2) Aggregations (faceting)
Geo distance aggregation
ActiveRecords
$ rails g scaffold Coordinate latitude:decimal longitude:decimal
ActiveRecords
class Coordinate < ActiveRecord::Base
end
ActiveRecords
class Coordinate < ActiveRecord::Base
  def distance_to(coordinate)
    # From http://en.wikipedia.org/wiki/Haversine_formula
    rad_per_deg = Math::PI / 180 # PI / 180
    rkm = 6371                   # Earth radius in kilometers
    rm = rkm * 1000              # Radius in meters

    dlon_rad = (coordinate.longitude.to_f - self.longitude.to_f) * rad_per_deg # Delta, converted to rad
    dlat_rad = (coordinate.latitude.to_f - self.latitude.to_f) * rad_per_deg

    lat1_rad = coordinate.latitude.to_f * rad_per_deg
    lat2_rad = self.latitude.to_f * rad_per_deg
    lon1_rad = coordinate.longitude.to_f * rad_per_deg
    lon2_rad = self.longitude.to_f * rad_per_deg

    a = Math.sin(dlat_rad / 2)**2 + Math.cos(lat1_rad) * Math.cos(lat2_rad) * Math.sin(dlon_rad / 2)**2
    c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

    rm * c # Delta in meters
  end
end
> c1 = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
> c2 = Coordinate.new(:latitude => -23.5538488, :longitude => -46.6530035)
> c1.distance_to(c2)
=> 66.07749735875552
ActiveRecords
origin = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
buckets = [
  { :to => 100, :coordinates => [] },
  { :from => 100, :to => 300, :coordinates => [] },
  { :from => 300, :coordinates => [] }
]
Coordinate.all.each do |coordinate|
  distance = origin.distance_to(coordinate)

  buckets.each do |bucket|
    # A missing :from means 0 and a missing :to means "no upper limit";
    # :from is inclusive so that boundary distances land in a bucket.
    if distance >= (bucket[:from] || 0) && distance < (bucket[:to] || Float::INFINITY)
      bucket[:coordinates] << coordinate
    end
  end
end
??????
ElasticSearch
query = {
  aggregations: {
    rings_around_rubyconf: {                 # aggregation name
      geo_distance: {                        # aggregation type
        field: "location",                   # field holding the location
        origin: "-23.5532636, -46.6528908",  # origin coordinates
        ranges: [                            # buckets to aggregate into
          { to: 100 },
          { from: 100, to: 300 },
          { from: 300 }
        ]
      }
    }
  }
}
search = Coordinate.search(query)
:-)
(Extended) stats aggregation
ActiveRecords
$ rails g scaffold Grade subject:string grade:decimal
ElasticSearch
query = {
  aggregations: {
    grades_stats: {          # aggregation name
      extended_stats: {      # aggregation type
        field: "grade"       # field name
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
> search.response.aggregations.grades_stats
=> #<Hashie::Mash avg=8.03 count=3 max=10.0 min=4.6 std_deviation=2.43 sum=24.1 sum_of_squares=211.41 variance=5.93>
(Extended) stats aggregation + Scripting
ElasticSearch
query = {
  aggregations: {
    grades_stats: {
      extended_stats: {
        field: "grade"
      }
    }
  }
}
ElasticSearch
query = {
  aggregations: {
    grades_stats: {          # aggregation name
      extended_stats: {      # aggregation type
        field: "grade",      # field name
        script: "_value < 7.0 ? _value * correction : _value",  # JavaScript to compute the new grade
        params: {
          correction: 1.2
        }
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
> search.response.aggregations.grades_stats
=> #<Hashie::Mash avg=8.34 count=3 max=10.0 min=5.52 std_deviation=2.00 sum=25.02 sum_of_squares=220.72 variance=4.01>
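What the aggregation and its script compute can be checked in plain Ruby. This is a sketch under assumptions: the three grades are not listed on the slides and are inferred here from the reported count, min, max and sum, and `extended_stats` is a local helper, not a library call.

```ruby
# Population statistics, mirroring the fields in the extended_stats output.
def extended_stats(values)
  count = values.size
  sum = values.sum
  sum_of_squares = values.sum { |value| value * value }
  avg = sum / count
  variance = sum_of_squares / count - avg**2
  {
    count: count, min: values.min, max: values.max, avg: avg, sum: sum,
    sum_of_squares: sum_of_squares, variance: variance,
    std_deviation: Math.sqrt(variance)
  }
end

grades = [10.0, 4.6, 9.5] # assumption: inferred from count=3, min, max and sum

# Same rule as the script on the slide: grades below 7.0 are scaled by 1.2.
correction = 1.2
corrected = grades.map { |value| value < 7.0 ? value * correction : value }

[extended_stats(grades), extended_stats(corrected)].each do |stats|
  puts stats.map { |name, value| "#{name}=#{value.round(2)}" }.join(" ")
end
```

The printed values are rounded, so the last digit can differ slightly from the figures shown on the slides.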
Term aggregation
ElasticSearch
query = {
  aggregations: {
    subjects: {              # aggregation name
      terms: {               # aggregation type
        field: "subject"     # field name
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
> search.response.aggregations.subjects
=> #<Hashie::Mash buckets=[
  #<Hashie::Mash doc_count=2 key="math">,
  #<Hashie::Mash doc_count=1 key="grammar">,
  #<Hashie::Mash doc_count=1 key="physics">
]>
Combined aggregations 
(term + stats)
ElasticSearch
query = {
  aggregations: {
    subjects: {
      terms: {
        field: "subject"
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
query = {
  aggregations: {
    subjects: {              # parent aggregation name
      terms: {
        field: "subject"     # field for the parent aggregation
      },
      aggregations: {
        grade_stats: {       # child aggregation name
          stats: {
            field: "grade"   # field for the child aggregation
          }
        }
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
> search.response.aggregations.subjects
=> #<Hashie::Mash buckets=[
  #<Hashie::Mash doc_count=2 grade_stats=#<Hashie::Mash avg=9.0 count=2 max=10.0 min=8.0 sum=18.0> key="math">,
  #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=4.6 count=1 max=4.6 min=4.6 sum=4.6> key="grammar">,
  #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=9.5 count=1 max=9.5 min=9.5 sum=9.5> key="physics">
]>
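The parent/child nesting reads as "group by subject, then compute stats per group". A plain-Ruby sketch of that idea follows. The four grade rows are an assumption reconstructed from the buckets above, and the tie-break by key is chosen only so the order matches the slide.

```ruby
Grade = Struct.new(:subject, :grade)

# Assumed rows, reconstructed from the bucket statistics on the slide.
grades = [
  Grade.new("math", 10.0),
  Grade.new("math", 8.0),
  Grade.new("grammar", 4.6),
  Grade.new("physics", 9.5)
]

# Parent step (terms): one bucket per distinct subject.
# Child step (stats): statistics over the grades inside each bucket.
buckets = grades.group_by(&:subject).map do |subject, rows|
  values = rows.map(&:grade)
  {
    key: subject,
    doc_count: rows.size,
    grade_stats: {
      count: values.size, min: values.min, max: values.max,
      sum: values.sum, avg: values.sum / values.size
    }
  }
end

# Largest buckets first; ties broken by key (an assumption of this sketch).
buckets = buckets.sort_by { |bucket| [-bucket[:doc_count], bucket[:key]] }
buckets.each { |bucket| p bucket }
```

Doing this in application code means loading every row first, which is the cost the aggregation avoids.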
Top Hits 
More like this 
Histogram 
Scripted metrics 
Geo bounds 
Stemmer (synonyms)
IPv4 ranges 
. . .
3) Scoring
ActiveRecords
$ rails g scaffold Post title:string source:string likes:integer
“amazon”
ElasticSearch
search = Post.search(
  query: {
    match: {                 # full-text search
      _all: "amazon"
    }
  }
)
> search.results.results[0]._score
=> 0.8174651
“amazon”
ElasticSearch
search = Post.search(
  query: {
    custom_score: {
      query: {
        match: {             # full-text search
          _all: "amazon"
        }
      },
      script: "_score * doc['likes'].value"  # likes influence the score
    }
  }
)
> search.results.results[0]._score
=> 31.8811388
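The effect of the script can be sketched outside ElasticSearch: each hit's text score is multiplied by its like count before ranking. Everything below is illustrative: the `Hit` struct is hypothetical, 39 likes is an assumption picked because it roughly reproduces the two scores on the slides, and the second hit is made up. Note that `custom_score` is the query type used in the deck; later Elasticsearch releases are understood to express this with `function_score`, so the version in use should be checked.

```ruby
Hit = Struct.new(:title, :score, :likes)

hits = [
  # Text score taken from the previous slide; 39 likes is an assumption.
  Hit.new("Twitch officially acquired by Amazon", 0.8174651, 39),
  # Made-up hit with a slightly better text score but few likes.
  Hit.new("Amazon to Buy Video Site Twitch for More Than $1B", 0.9, 2)
]

# Equivalent of "_score * doc['likes'].value", then sort by the new score.
rescored = hits
  .map { |hit| [hit.title, hit.score * hit.likes] }
  .sort_by { |_title, score| -score }

rescored.each { |title, score| puts format("%.7f  %s", score, title) }
```

A popular post can outrank one with a better text match, which is the trade-off this kind of scoring introduces.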
Score explained
GET http://localhost:9200/post/_search?explain
"_explanation": {
  "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
  "value": 0.076713204,
  "details": [
    {
      "description": "fieldWeight in 0, product of:",
      "value": 0.076713204,
      "details": [
        {
          "description": "tf(freq=1.0), with freq of:",
          "value": 1,
          "details": [
            {
              "description": "termFreq=1.0",
              "value": 1
            }
          ]
        },
        {
          "description": "idf(docFreq=1, maxDocs=1)",
          "value": 0.30685282
        },
        {
          "description": "fieldNorm(doc=0)",
          "value": 0.25
        }
      ]
    }
  ]
}
4) Indexing responses
$ rails g scaffold Post title:string source:string likes:integer
class PostsController < ApplicationController

  # ...

  def show
    @post = Post.find(params[:id])

    render json: @post
  end

  # ...

end
SELECT * FROM Posts WHERE id = params[:id]
class PostsController < ApplicationController

  # ...

  def show
    @post = Post.search(query: { match: { id: params[:id] }})

    render json: @post
  end

  # ...

end
GET http://localhost:9200/posts/posts/params[:id]
ActiveRecords
require 'elasticsearch/model'

class Post < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  belongs_to :author

  # Includes a parent (the author) in the indexed JSON
  def as_indexed_json(options={})
    self.as_json(
      include: { author: { only: [:name, :bio] } }
    )
  end
end
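The shape of the indexed document can be sketched without Rails. This is a plain-Ruby stand-in, not the elasticsearch-model implementation: the `Author` and `Post` structs, the sample values and the `email` attribute are assumptions used to show that only the whitelisted author fields end up in the JSON.

```ruby
require "json"

Author = Struct.new(:name, :bio, :email)

Post = Struct.new(:title, :source, :likes, :author) do
  # Stand-in for as_indexed_json: the author is embedded in the post,
  # restricted to name and bio, so no join is needed at read time.
  def as_indexed_json
    {
      "title" => title,
      "source" => source,
      "likes" => likes,
      "author" => { "name" => author.name, "bio" => author.bio }
    }
  end
end

author = Author.new("Pedro", "Example bio", "hidden@example.com")
post = Post.new("Twitch officially acquired by Amazon", "online.wsj.com", 39, author)

puts JSON.pretty_generate(post.as_indexed_json)
```

Because the response is already denormalized, a single document fetch can serve the whole API response.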
Exposing ElasticSearch
http://localhost:9200/pagarme/_search 
https://api.pagar.me/1/search
Pagar.me infrastructure
Router: api.pagar.me
Production environment: API server (Node.js), ElasticSearch, MySQL (transactions and relational data)
Test environment (customers' sandbox): API server (Node.js), ElasticSearch, MySQL (transactions and relational data)
MongoDB (customer data and non-relational data)
Exposing ElasticSearch
• ElasticSearch endpoint -> endpoint accessed by the customer…
• … but be careful: data must be restricted to the customer's account (of course)
• Advantage: access to the same ElasticSearch features (aggregations, statistics, scores, etc.)
• Security: disable ElasticSearch scripts
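The two precautions above (scoping to the customer's account and refusing scripts) could be sketched as a small wrapper applied before a query is forwarded. This is not Pagar.me's implementation: the `account_id` field, the helper names and the symbol-keyed request body are assumptions, and the `bool` query with a `filter` clause assumes an Elasticsearch version that supports it (older releases would need a different filtering query).

```ruby
# Raise if any key in the request body starts with "script"
# (script, script_score, script_fields, scripted_metric, ...).
# Side effect of this simple rule: fields named "script..." are refused too.
def reject_scripts!(node)
  case node
  when Hash
    node.each do |key, value|
      raise ArgumentError, "scripts are not allowed" if key.to_s.start_with?("script")
      reject_scripts!(value)
    end
  when Array
    node.each { |item| reject_scripts!(item) }
  end
end

# Wrap the customer's query so it can only match their own documents.
def scoped_search_body(client_body, account_id)
  reject_scripts!(client_body)
  client_query = client_body[:query] || { match_all: {} }
  client_body.merge(
    query: {
      bool: {
        must: client_query,
        filter: { term: { account_id: account_id } } # hypothetical field
      }
    }
  )
end

body = scoped_search_body({ query: { match: { title: "amazon" } }, size: 10 }, "acct_123")
p body
```

The filter is added on the server side, so a customer cannot remove it by crafting the request; disabling scripting in the cluster configuration is still needed and is outside this sketch.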
GET /search
• A single endpoint for all GETs
• All data indexed and ready to be used (no joins)
• Complex queries built on the front end (Angular.js)
• Front-end development does not depend on the back end
Overall…
1) There is a tool for every task.
2) A hammer is always the right tool.
3) Every tool is also a hammer.
MySQL 
!= 
NoSQL 
!= 
ElasticSearch
Thank you! :)
Questions?

Más contenido relacionado

La actualidad más candente

Morphia: Simplifying Persistence for Java and MongoDB
Morphia:  Simplifying Persistence for Java and MongoDBMorphia:  Simplifying Persistence for Java and MongoDB
Morphia: Simplifying Persistence for Java and MongoDB
Jeff Yemin
 
High Performance Django
High Performance DjangoHigh Performance Django
High Performance Django
DjangoCon2008
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for Cassandra
Edward Capriolo
 
Leveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPLeveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHP
Jeremy Kendall
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]
Karel Minarik
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
MongoDB
 

La actualidad más candente (20)

elasticsearch - advanced features in practice
elasticsearch - advanced features in practiceelasticsearch - advanced features in practice
elasticsearch - advanced features in practice
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
Использование Elasticsearch для организации поиска по сайту
Использование Elasticsearch для организации поиска по сайтуИспользование Elasticsearch для организации поиска по сайту
Использование Elasticsearch для организации поиска по сайту
 
Solr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg DonovanSolr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg Donovan
 
Simple search with elastic search
Simple search with elastic searchSimple search with elastic search
Simple search with elastic search
 
High Performance XQuery Processing in PHP with Zorba by Vikram Vaswani
High Performance XQuery Processing in PHP with Zorba by Vikram VaswaniHigh Performance XQuery Processing in PHP with Zorba by Vikram Vaswani
High Performance XQuery Processing in PHP with Zorba by Vikram Vaswani
 
Morphia: Simplifying Persistence for Java and MongoDB
Morphia:  Simplifying Persistence for Java and MongoDBMorphia:  Simplifying Persistence for Java and MongoDB
Morphia: Simplifying Persistence for Java and MongoDB
 
High Performance Django
High Performance DjangoHigh Performance Django
High Performance Django
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for Cassandra
 
Php 102: Out with the Bad, In with the Good
Php 102: Out with the Bad, In with the GoodPhp 102: Out with the Bad, In with the Good
Php 102: Out with the Bad, In with the Good
 
Leveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPLeveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHP
 
SunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQLSunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQL
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
 
Data Exploration with Elasticsearch
Data Exploration with ElasticsearchData Exploration with Elasticsearch
Data Exploration with Elasticsearch
 
Leveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPLeveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHP
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 
Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)
 

Destacado

Alto desempenho e escalabilidade em aplicações web utilizando banco de dados ...
Alto desempenho e escalabilidade em aplicações web utilizando banco de dados ...Alto desempenho e escalabilidade em aplicações web utilizando banco de dados ...
Alto desempenho e escalabilidade em aplicações web utilizando banco de dados ...
Vagner Santana
 

Destacado (11)

Node.js no Pagar.me
Node.js no Pagar.meNode.js no Pagar.me
Node.js no Pagar.me
 
Node.js: serious business
Node.js: serious businessNode.js: serious business
Node.js: serious business
 
Porque você deve aprender VIm hoje.
Porque você deve aprender VIm hoje.Porque você deve aprender VIm hoje.
Porque você deve aprender VIm hoje.
 
Primeiros Passos Com Elasticsearch
Primeiros Passos Com ElasticsearchPrimeiros Passos Com Elasticsearch
Primeiros Passos Com Elasticsearch
 
Processo de Contratação de Pessoas - É possível fazer bem melhor!
Processo de Contratação de Pessoas - É possível fazer bem melhor!Processo de Contratação de Pessoas - É possível fazer bem melhor!
Processo de Contratação de Pessoas - É possível fazer bem melhor!
 
Como o elasticsearch salvou minhas buscas
Como o elasticsearch salvou minhas buscasComo o elasticsearch salvou minhas buscas
Como o elasticsearch salvou minhas buscas
 
Alto desempenho e escalabilidade em aplicações web utilizando banco de dados ...
Alto desempenho e escalabilidade em aplicações web utilizando banco de dados ...Alto desempenho e escalabilidade em aplicações web utilizando banco de dados ...
Alto desempenho e escalabilidade em aplicações web utilizando banco de dados ...
 
Treinamento Elasticsearch - Parte 1
Treinamento Elasticsearch - Parte 1Treinamento Elasticsearch - Parte 1
Treinamento Elasticsearch - Parte 1
 
Node.js - Devo adotar na minha empresa?
Node.js - Devo adotar na minha empresa?Node.js - Devo adotar na minha empresa?
Node.js - Devo adotar na minha empresa?
 
Micro serviços com node.js
Micro serviços com node.jsMicro serviços com node.js
Micro serviços com node.js
 
Secure Your REST API (The Right Way)
Secure Your REST API (The Right Way)Secure Your REST API (The Right Way)
Secure Your REST API (The Right Way)
 

Similar a Fazendo mágica com ElasticSearch

Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
Alexei Gorobets
 
mobl presentation @ IHomer
mobl presentation @ IHomermobl presentation @ IHomer
mobl presentation @ IHomer
zefhemel
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
Dmitry Buzdin
 

Similar a Fazendo mágica com ElasticSearch (20)

Extreme Swift
Extreme SwiftExtreme Swift
Extreme Swift
 
Spark with Elasticsearch
Spark with ElasticsearchSpark with Elasticsearch
Spark with Elasticsearch
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Rails
 
Elastic tire demo
Elastic tire demoElastic tire demo
Elastic tire demo
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearch
 
Open Source Search: An Analysis
Open Source Search: An AnalysisOpen Source Search: An Analysis
Open Source Search: An Analysis
 
The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202
 
Finding the right stuff, an intro to Elasticsearch with Ruby/Rails
Finding the right stuff, an intro to Elasticsearch with Ruby/RailsFinding the right stuff, an intro to Elasticsearch with Ruby/Rails
Finding the right stuff, an intro to Elasticsearch with Ruby/Rails
 
huhu
huhuhuhu
huhu
 
mobl presentation @ IHomer
mobl presentation @ IHomermobl presentation @ IHomer
mobl presentation @ IHomer
 
Rails on Oracle 2011
Rails on Oracle 2011Rails on Oracle 2011
Rails on Oracle 2011
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Real life-coffeescript
Real life-coffeescriptReal life-coffeescript
Real life-coffeescript
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Finding the right stuff, an intro to Elasticsearch (at Rug::B)
Finding the right stuff, an intro to Elasticsearch (at Rug::B) Finding the right stuff, an intro to Elasticsearch (at Rug::B)
Finding the right stuff, an intro to Elasticsearch (at Rug::B)
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
JSON and the APInauts
JSON and the APInautsJSON and the APInauts
JSON and the APInauts
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 

Fazendo mágica com ElasticSearch

  • 1. Fazendo mágica com ElasticSearch PEDROFRANCESCHI @pedroh96 pedro@pagar.me github.com/pedrofranceschi
  • 3. Filters Full text search Sort Highlight Facets Pagination
  • 4. Você vai precisar buscar dados.
  • 5. Você vai precisar entender dados.
  • 6. (My)SQL não é a solução. (… nem NoSQL)
  • 7. O que é o ElasticSearch?
  • 8. ElasticSearch • “Open Source Distributed Real Time Search & Analytics” • API RESTful para indexar/buscar JSONs (“NoSQL”) • NÃO é um banco de dados • Apache Lucene • Just works (and scales) • Full text search, aggregations, scripting, etc, etc, etc.
  • 9. Nomes? MySQL ElasticSearch Database Index Table Type Row Document Column Field Schema Mapping Partition Shard
  • 10. Como usar o ElasticSearch?
  • 11. $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{! "user" : “pedroh96",! "post_date" : "2009-11-15T14:12:12",! "message" : "trying out Elasticsearch"! }' Endpoint Index Type Document ID Document {! "_index" : "twitter",! "_type" : "tweet",! "_id" : "1",! "_version" : 1,! "created" : true! } PUT data
  • 12. Endpoint Index Type $ curl -XGET 'http://localhost:9200/twitter/tweet/1' Document ID {! "_id": "1",! "_index": "twitter",! "_source": {! "message": "trying out Elasticsearch",! "post_date": "2009-11-15T14:12:12",! "user": "pedroh96"! },! "_type": "tweet",! "_version": 1,! "found": true! } Document GET data
  • 13. GET data Endpoint Index $ curl -XGET 'http://localhost:9200/twitter/_search'! -d ‘{ query: . . . }! ! ! Query de busca ! ! ! ! ! ! ! Operador de busca
  • 14. ActiveRecords class Tweet < ActiveRecord::Base! end
  • 15. ActiveRecords require 'elasticsearch/model'! ! class Tweet < ActiveRecord::Base! include Elasticsearch::Model! include Elasticsearch::Model::Callbacks! end! !
  • 18. Por que usar o ElasticSearch?
  • 20. Post.where(:all, :author => "pedroh96") vs Post.search(query: { match: { author: "pedroh96" }}) Just Another Query Language?
  • 21. 1) Full text search
  • 22. ActiveRecords $ rails g scaffold Post title:string! source:string
  • 23. GET /posts/5 Post.find(5) :-) ActiveRecords
  • 24. ActiveRecords “Amazon to Buy Video Site Twitch for More Than $1B” Post.where(:all, :title => "Amazon to Buy Video Site Twitch for More Than $1B") :-)
  • 25. “amazon” Post.where(["title LIKE ?", "%Amazon%"]) ??? ActiveRecords
  • 26. “amazon source:online.wsj.com” Post.where(["title LIKE ? AND source = ?", "%Amazon%", "online.wsj.com"]) ?????? ActiveRecords
  • 28. ElasticSearch “amazon source:online.wsj.com” search = Post.search("amazon source:online.wsj.com") :-)
  • 29. ElasticSearch “amazon source:online.wsj.com” search = Post.search( query:{ match: { _all: "amazon source:online.wsj.com", } } ) Full-text search
  • 30. ElasticSearch “amazon source:online.wsj.com” search = Post.search( query:{ multi_match: { query: "amazon source:online.wsj.com", fields: ['title^10', 'source'] } } ) Full-text search Title boost
  • 31. ElasticSearch “amazon source:online.wsj.com” search = Post.search( query:{ multi_match: { query: "amazon source:online.wsj.com", fields: ['title^10', 'source'] } }, highlight: { fields: { title: {} } } ) Title highlight Full-text search Title boost
  • 32. ElasticSearch Title highlight > search.results[0].highlight.title => ["Twitch officially acquired by <em>Amazon</em>"]
  • 33.
  • 36.
  • 37. ActiveRecords $ rails g scaffold Coordinate latitude:decimal longitude:decimal
  • 38. ActiveRecords class Coordinate < ActiveRecord::Base! end
  • 39. ActiveRecords class Coordinate < ActiveRecord::Base! def distance_to(coordinate)! # From http://en.wikipedia.org/wiki/Haversine_formula! rad_per_deg = Math::PI/180 # PI / 180! rkm = 6371 # Earth radius in kilometers! rm = rkm * 1000 # Radius in meters! ! dlon_rad = (coordinate.longitude.to_f - self.longitude.to_f) * rad_per_deg # Delta, converted to rad! dlat_rad = (coordinate.latitude.to_f - self.latitude.to_f) * rad_per_deg! ! lat1_rad = coordinate.latitude.to_f * rad_per_deg! lat2_rad = self.latitude.to_f * rad_per_deg! lon1_rad = coordinate.longitude.to_f * rad_per_deg! lon2_rad = self.longitude.to_f * rad_per_deg! ! a = Math.sin(dlat_rad/2)**2 + Math.cos(lat1_rad) * Math.cos(lat2_rad) * Math.sin(dlon_rad/2)**2! c = 2 * Math::atan2(Math::sqrt(a), Math::sqrt(1-a))! ! rm * c # Delta in meters! end! end > c1 = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908) > c2 = Coordinate.new(:latitude => -23.5538488, :longitude => -46.6530035) > c1.distance_to(c2) => 66.07749735875552
  • 40. ActiveRecords
        origin = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
        buckets = [
          { :to => 100, :coordinates => [] },
          { :from => 100, :to => 300, :coordinates => [] },
          { :from => 300, :coordinates => [] }
        ]
        Coordinate.all.each do |coordinate|
          distance = origin.distance_to(coordinate)
          buckets.each do |bucket|
            # A missing :from means 0 and a missing :to means no upper limit
            # (the last bucket has no :to, and comparing against nil would raise).
            if distance >= (bucket[:from] || 0) && distance < (bucket[:to] || Float::INFINITY)
              bucket[:coordinates] << coordinate
            end
          end
        end
        ??????
  • 41. ElasticSearch
        query = {
          aggregations: {
            rings_around_rubyconf: {                    # aggregation name
              geo_distance: {                           # aggregation type
                field: "location",                      # field holding the location
                origin: "-23.5532636, -46.6528908",     # origin coordinates
                ranges: [                               # buckets to aggregate into
                  { to: 100 },
                  { from: 100, to: 300 },
                  { from: 300 }
                ]
              }
            }
          }
        }
        search = Coordinate.search(query)
        :-)
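To make the comparison between slides 40 and 41 concrete, the ring bucketing can be reproduced in plain Ruby without ActiveRecord. This is only a sketch of the idea, not how ElasticSearch implements it: `haversine_m` and `ring_counts` are hypothetical helper names, the last two points are made up, and the ranges are assumed to be half-open (from inclusive, to exclusive), as in ElasticSearch range-style aggregations.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # same Earth radius as the slide, in meters

# Haversine distance in meters between two [latitude, longitude] pairs (degrees).
def haversine_m(a, b)
  lat1, lon1, lat2, lon2 = [*a, *b].map { |deg| deg * Math::PI / 180 }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h))
end

# Counts points per ring; each range is treated as from <= distance < to.
def ring_counts(origin, points, ranges)
  ranges.map do |range|
    from = range.fetch(:from, 0)
    to   = range.fetch(:to, Float::INFINITY)
    count = points.count do |point|
      distance = haversine_m(origin, point)
      distance >= from && distance < to
    end
    range.merge(doc_count: count)
  end
end

origin = [-23.5532636, -46.6528908]
points = [
  [-23.5538488, -46.6530035], # second coordinate from slide 39
  [-23.5550000, -46.6528908], # hypothetical point
  [-23.5600000, -46.6528908]  # hypothetical point
]
rings = ring_counts(origin, points, [{ to: 100 }, { from: 100, to: 300 }, { from: 300 }])
```

Each entry in `rings` keeps its range and gains a `doc_count`, which is roughly the shape of the buckets a `geo_distance` aggregation returns.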
  • 43. ActiveRecords $ rails g scaffold Grade subject:string grade:decimal
  • 44. ElasticSearch
        query = {
          aggregations: {
            grades_stats: {            # aggregation name
              extended_stats: {        # aggregation type
                field: "grade"         # field name
              }
            }
          }
        }
        search = Grade.search(query)
  • 45. ElasticSearch
        > search.response.aggregations.grades_stats
        => #<Hashie::Mash avg=8.03 count=3 max=10.0 min=4.6 std_deviation=2.43 sum=24.1 sum_of_squares=211.41 variance=5.93>
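The numbers on slide 45 can be checked by hand. The sketch below recomputes the same fields in plain Ruby; the three grades are not shown on the slides and are inferred here from `count`, `min`, `max`, `sum` and `sum_of_squares`, and `extended_stats` is a hypothetical local helper, not the ElasticSearch implementation.

```ruby
# Computes the same fields that appear in the extended_stats output.
def extended_stats(values)
  count = values.size
  sum = values.sum
  sum_of_squares = values.sum { |v| v * v }
  avg = sum / count
  variance = sum_of_squares / count - avg**2 # population variance
  {
    count: count, min: values.min, max: values.max, sum: sum, avg: avg,
    sum_of_squares: sum_of_squares, variance: variance,
    std_deviation: Math.sqrt(variance)
  }
end

grades = [10.0, 4.6, 9.5] # inferred from the slide's output (assumption)
stats = extended_stats(grades)
```

The variance here is the population variance (mean of squares minus squared mean), which is what reproduces the slide's values up to rounding.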
  • 47. ElasticSearch
        query = {
          aggregations: {
            grades_stats: {
              extended_stats: {
                field: "grade"
              }
            }
          }
        }
  • 48. ElasticSearch
        query = {
          aggregations: {
            grades_stats: {            # aggregation name
              extended_stats: {        # aggregation type
                field: "grade",        # field name
                # JavaScript that computes the new grade
                script: "_value < 7.0 ? _value * correction : _value",
                params: {
                  correction: 1.2
                }
              }
            }
          }
        }
        search = Grade.search(query)
  • 49. ElasticSearch
        > search.response.aggregations.grades_stats
        => #<Hashie::Mash avg=8.34 count=3 max=10.0 min=5.52 std_deviation=2.00 sum=25.02 sum_of_squares=220.72 variance=4.01>
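The effect of the script on slide 48 can be mirrored locally: every grade below 7.0 is multiplied by `correction` before the statistics are computed. A minimal sketch, assuming the same three grades inferred from slide 45 and a hypothetical `corrected` helper standing in for the server-side script:

```ruby
# Ruby stand-in for the script "_value < 7.0 ? _value * correction : _value".
def corrected(value, correction: 1.2)
  value < 7.0 ? value * correction : value
end

grades = [10.0, 4.6, 9.5] # inferred from the slide's output (assumption)
adjusted = grades.map { |grade| corrected(grade) }

sum = adjusted.sum
avg = sum / adjusted.size
```

Only the 4.6 grade changes, which is why `min`, `sum` and `avg` move on slide 49 while `max` and `count` stay the same.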
  • 51. ElasticSearch
        query = {
          aggregations: {
            subjects: {                # aggregation name
              terms: {                 # aggregation type
                field: "subject"       # field name
              }
            }
          }
        }
        search = Grade.search(query)
  • 52. ElasticSearch
        > search.response.aggregations.subjects
        => #<Hashie::Mash buckets=[
             #<Hashie::Mash doc_count=2 key="math">,
             #<Hashie::Mash doc_count=1 key="grammar">,
             #<Hashie::Mash doc_count=1 key="physics">
           ]>
  • 54. ElasticSearch
        query = {
          aggregations: {
            subjects: {
              terms: {
                field: "subject"
              }
            }
          }
        }
        search = Grade.search(query)
  • 55. ElasticSearch
        query = {
          aggregations: {
            subjects: {                # parent aggregation name
              terms: {
                field: "subject"       # field for the parent aggregation
              },
              aggregations: {
                grade_stats: {         # child aggregation name
                  stats: {
                    field: "grade"     # field for the child aggregation
                  }
                }
              }
            }
          }
        }
        search = Grade.search(query)
  • 56. ElasticSearch
        > search.response.aggregations.subjects
        => #<Hashie::Mash buckets=[
             #<Hashie::Mash doc_count=2 grade_stats=#<Hashie::Mash avg=9.0 count=2 max=10.0 min=8.0 sum=18.0> key="math">,
             #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=4.6 count=1 max=4.6 min=4.6 sum=4.6> key="grammar">,
             #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=9.5 count=1 max=9.5 min=9.5 sum=9.5> key="physics">
           ]>
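A nested aggregation is conceptually a group-by followed by a per-group computation. The sketch below reproduces the output of slide 56 in plain Ruby; the four grades are read off that output, `stats` is a hypothetical local helper, and the tie-break by key is an assumption chosen to match the order shown on the slide.

```ruby
grades = [
  { subject: "math",    grade: 10.0 },
  { subject: "math",    grade: 8.0 },
  { subject: "grammar", grade: 4.6 },
  { subject: "physics", grade: 9.5 }
]

# Child aggregation: the fields of a "stats" aggregation for one bucket.
def stats(values)
  { count: values.size, min: values.min, max: values.max,
    sum: values.sum, avg: values.sum / values.size }
end

# Parent aggregation: one bucket per subject, largest buckets first.
buckets = grades.group_by { |doc| doc[:subject] }.map do |subject, docs|
  { key: subject, doc_count: docs.size,
    grade_stats: stats(docs.map { |doc| doc[:grade] }) }
end
buckets = buckets.sort_by { |bucket| [-bucket[:doc_count], bucket[:key]] }
```

The difference in ElasticSearch is that this grouping runs on the cluster, so the application never has to load every document.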
  • 57. Top Hits, More like this, Histogram, Scripted metrics, Geo bounds, Stemmer (synonyms), IPv4 ranges, ...
  • 59. ActiveRecords $ rails g scaffold Post title:string source:string likes:integer
  • 60. ElasticSearch: "amazon"
        search = Post.search(
          query: {
            match: { _all: "amazon" }                       # full-text search
          }
        )
        search.results.results[0]._score
        => 0.8174651
  • 61. ElasticSearch: "amazon"
        search = Post.search(
          query: {
            custom_score: {
              query: {
                match: { _all: "amazon" }                   # full-text search
              },
              script: "_score * doc['likes'].value"         # likes influence the score
            }
          }
        )
        search.results.results[0]._score
        => 31.8811388
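The script on slide 61 multiplies each hit's text relevance by its number of likes, which can reorder the results. A small local illustration follows; the `Hit` struct and the second post are hypothetical, and `likes = 39` is inferred from the ratio of the two scores on slides 60 and 61 rather than stated in the deck.

```ruby
Hit = Struct.new(:title, :score, :likes)

# Ruby stand-in for the script "_score * doc['likes'].value".
def rescore(hits)
  hits.map { |hit| Hit.new(hit.title, hit.score * hit.likes, hit.likes) }
      .sort_by { |hit| -hit.score }
end

hits = [
  Hit.new("Hypothetical post with a better text match", 1.2, 3),
  Hit.new("Top hit from slide 60", 0.8174651, 39) # likes inferred (assumption)
]
ranked = rescore(hits)
```

With plain text relevance the first post would win (1.2 against roughly 0.82); after multiplying by likes the popular post moves to the top.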
  • 62. GET http://localhost:9200/post/_search?explain (explained score)
        "_explanation": {
          "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
          "value": 0.076713204,
          "details": [
            {
              "description": "fieldWeight in 0, product of:",
              "value": 0.076713204,
              "details": [
                {
                  "description": "tf(freq=1.0), with freq of:",
                  "value": 1,
                  "details": [
                    {
                      "description": "termFreq=1.0",
                      "value": 1
                    }
                  ]
                },
                {
                  "description": "idf(docFreq=1, maxDocs=1)",
                  "value": 0.30685282
                },
                {
                  "description": "fieldNorm(doc=0)",
                  "value": 0.25
                }
              ]
            }
          ]
        }
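The explanation on slide 62 reads as a product: the field weight is tf × idf × fieldNorm. The sketch below redoes that arithmetic, assuming Lucene's classic TF/IDF similarity (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))), which matches the values shown:

```ruby
# Inputs copied from the _explanation on slide 62.
term_freq  = 1.0
doc_freq   = 1
max_docs   = 1
field_norm = 0.25

tf  = Math.sqrt(term_freq)                            # tf(freq=1.0)
idf = 1 + Math.log(max_docs.to_f / (doc_freq + 1))    # idf(docFreq=1, maxDocs=1)
score = tf * idf * field_norm                         # fieldWeight
```

Here `idf` comes out at about 0.30685 and `score` at about 0.07671, in line with the `idf` and top-level `value` entries of the explanation.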
  • 64. $ rails g scaffold Post title:string source:string likes:integer
  • 65. class PostsController < ApplicationController
          # ...
          def show
            @post = Post.find(params[:id])
            render json: @post
          end
          # ...
        end
        SELECT * FROM Posts WHERE id = params[:id]
  • 66. class PostsController < ApplicationController
          # ...
          def show
            @post = Post.search(query: { match: { id: params[:id] }})
            render json: @post
          end
          # ...
        end
        GET http://localhost:9200/posts/posts/params[:id]
  • 67. ActiveRecords
        require 'elasticsearch/model'
        class Post < ActiveRecord::Base
          include Elasticsearch::Model
          include Elasticsearch::Model::Callbacks
          belongs_to :author
          # Includes the parent (author) in the indexed JSON
          def as_indexed_json(options={})
            self.as_json(
              include: { author: { only: [:name, :bio] } }
            )
          end
        end
  • 71. Pagar.me infrastructure (diagram): Router (api.pagar.me); a test environment (customer sandbox) and a production environment, each with an API server (Node.js), ElasticSearch, and MySQL (transactions and relational data); MongoDB (customer and non-relational data)
  • 72. Exposing ElasticSearch • The ElasticSearch endpoint becomes the endpoint the client accesses… • … but be careful: data must be restricted to the client's account (of course) • Advantage: access to the same ElasticSearch features (aggregations, statistics, scores, etc.) • Security: disable ElasticSearch scripts
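The two safeguards on slide 72, restricting results to the client's account and keeping scripts out, could be enforced in the API layer before a query is forwarded. The following is a rough sketch under several assumptions: the helper names and the `account_id` field are hypothetical, the request body is assumed to be parsed with symbol keys (for example `JSON.parse(body, symbolize_names: true)`), and `filtered` is the query form of the ElasticSearch 1.x DSL that the deck uses. Scripts should still be disabled on the cluster itself, as the slide says; the check here is only a naive extra guard.

```ruby
# True if any key containing "script" appears anywhere in the request body.
def contains_script?(node)
  case node
  when Hash
    node.any? { |key, value| key.to_s.include?("script") || contains_script?(value) }
  when Array
    node.any? { |item| contains_script?(item) }
  else
    false
  end
end

# Wraps the client's query so it can only match documents of its own account.
# `account_id` is a hypothetical field name.
def scope_to_account(client_body, account_id)
  raise ArgumentError, "scripts are not allowed" if contains_script?(client_body)

  client_query = client_body[:query] || { match_all: {} }
  client_body.merge(
    query: {
      filtered: {
        query: client_query,
        filter: { term: { account_id: account_id } }
      }
    }
  )
end

scoped = scope_to_account({ query: { match: { _all: "amazon" } }, size: 10 }, 42)
```

Everything else in the body (aggregations, size, sorting) passes through untouched, which is what keeps the full feature set available to the client.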
  • 73. GET /search • A single endpoint for all GETs • All data indexed and ready to use (no joins) • Complex queries built on the front end (Angular.js) • Front-end development does not depend on the back end
  • 78. 1) There is a tool for every task. 2) A hammer is always the right tool. 3) Every tool is also a hammer.
  • 79. MySQL != NoSQL != ElasticSearch
  • 80. Thank you! :) PEDRO FRANCESCHI @pedroh96 pedro@pagar.me github.com/pedrofranceschi
  • 81. Questions?
  • 82. Doing magic with ElasticSearch