Doing magic with ElasticSearch
PEDRO FRANCESCHI
@pedroh96
pedro@pagar.me
github.com/pedrofranceschi
October 2010
Filters 
Full text search 
Sort 
Highlight 
Facets 
Pagination
You will need to search data.
You will need to understand data.
(My)SQL is not the solution.
(… and neither is NoSQL)
What is ElasticSearch?
ElasticSearch 
• “Open Source Distributed Real Time Search & Analytics” 
• RESTful API for indexing and searching JSON documents (“NoSQL”)
• NOT a database
• Apache Lucene 
• Just works (and scales) 
• Full text search, aggregations, scripting, etc, etc, etc.
Names?
MySQL        ElasticSearch
Database     Index
Table        Type
Row          Document
Column       Field
Schema       Mapping
Partition    Shard
How do you use ElasticSearch?
PUT data
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
  "user" : "pedroh96",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}'
URL parts: endpoint (localhost:9200), index (twitter), type (tweet), document ID (1). The request body is the document.
Response:
{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "1",
  "_version" : 1,
  "created" : true
}
GET data
$ curl -XGET 'http://localhost:9200/twitter/tweet/1'
URL parts: endpoint (localhost:9200), index (twitter), type (tweet), document ID (1).
Response (the document is under "_source"):
{
  "_id": "1",
  "_index": "twitter",
  "_source": {
    "message": "trying out Elasticsearch",
    "post_date": "2009-11-15T14:12:12",
    "user": "pedroh96"
  },
  "_type": "tweet",
  "_version": 1,
  "found": true
}
GET data (search)
$ curl -XGET 'http://localhost:9200/twitter/_search' \
  -d '{ query: . . . }'
URL parts: endpoint (localhost:9200), index (twitter), search operator (_search). The request body is the search query.
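As a rough sketch of what can go into the request body, the snippet below builds a search body as a Ruby hash and serializes it to JSON. The slide leaves the actual query elided, so the `match` query on the `message` field (taken from the tweet indexed earlier) is only an assumption for illustration.

```ruby
require 'json'

# Hypothetical search body: a full-text `match` on the `message` field.
# The slide elides the real query, so this is only an illustration.
search_body = {
  query: {
    match: { message: 'trying out' }
  }
}

json = JSON.generate(search_body)
puts json
```

The printed string is what would be passed to `curl` with `-d`.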
ActiveRecords
class Tweet < ActiveRecord::Base
end
ActiveRecords
require 'elasticsearch/model'

class Tweet < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
end
Tweet.import
Tweet.search("pedroh96")
Why use ElasticSearch?
DISCLAIMER
Post.where(:author => "pedroh96")
vs 
Post.search(query: { match: { author: "pedroh96" }}) 
Just Another Query Language?
1) Full text search
ActiveRecords 
$ rails g scaffold Post title:string source:string
GET /posts/5 
Post.find(5) 
:-) 
ActiveRecords
ActiveRecords 
“Amazon to Buy Video Site Twitch for More Than $1B” 
Post.where(:title => "Amazon to Buy Video Site Twitch for More Than $1B")
:-)
“amazon” 
Post.where(["title LIKE ?", "%Amazon%"]) 
??? 
ActiveRecords
“amazon source:online.wsj.com” 
Post.where(["title LIKE ? AND source = ?", 
"%Amazon%", "online.wsj.com"]) 
?????? 
ActiveRecords
“amazon” 
Post.search("amazon") 
:-) 
ElasticSearch
ElasticSearch 
“amazon source:online.wsj.com” 
search = Post.search("amazon source:online.wsj.com") 
:-)
ElasticSearch 
“amazon source:online.wsj.com” 
search = Post.search(
  query: {
    match: {                                   # full-text search
      _all: "amazon source:online.wsj.com"
    }
  }
)
ElasticSearch 
“amazon source:online.wsj.com” 
search = Post.search(
  query: {
    multi_match: {                             # full-text search
      query: "amazon source:online.wsj.com",
      fields: ['title^10', 'source']           # title boost
    }
  }
)
ElasticSearch 
“amazon source:online.wsj.com” 
search = Post.search(
  query: {
    multi_match: {                             # full-text search
      query: "amazon source:online.wsj.com",
      fields: ['title^10', 'source']           # title boost
    }
  },
  highlight: {                                 # title highlight
    fields: {
      title: {}
    }
  }
)
ElasticSearch 
Title highlight 
> search.results[0].highlight.title 
=> ["Twitch officially acquired by <em>Amazon</em>"]
2) Aggregations (faceting)
Geo distance aggregation
ActiveRecords 
$ rails g scaffold Coordinate latitude:decimal longitude:decimal
ActiveRecords
class Coordinate < ActiveRecord::Base
end
ActiveRecords
class Coordinate < ActiveRecord::Base
  def distance_to(coordinate)
    # From http://en.wikipedia.org/wiki/Haversine_formula
    rad_per_deg = Math::PI / 180  # PI / 180
    rkm = 6371                    # Earth radius in kilometers
    rm = rkm * 1000               # Radius in meters

    dlon_rad = (coordinate.longitude.to_f - self.longitude.to_f) * rad_per_deg  # Delta, converted to rad
    dlat_rad = (coordinate.latitude.to_f - self.latitude.to_f) * rad_per_deg

    lat1_rad = coordinate.latitude.to_f * rad_per_deg
    lat2_rad = self.latitude.to_f * rad_per_deg
    lon1_rad = coordinate.longitude.to_f * rad_per_deg
    lon2_rad = self.longitude.to_f * rad_per_deg

    a = Math.sin(dlat_rad / 2)**2 + Math.cos(lat1_rad) * Math.cos(lat2_rad) * Math.sin(dlon_rad / 2)**2
    c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

    rm * c  # Delta in meters
  end
end
> c1 = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
> c2 = Coordinate.new(:latitude => -23.5538488, :longitude => -46.6530035)
> c1.distance_to(c2)
=> 66.07749735875552
ActiveRecords
origin = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
buckets = [
  { :to => 100, :coordinates => [] },
  { :from => 100, :to => 300, :coordinates => [] },
  { :from => 300, :coordinates => [] }
]
Coordinate.all.each do |coordinate|
  distance = origin.distance_to(coordinate)

  buckets.each do |bucket|
    # A missing :from means 0 and a missing :to means no upper bound;
    # :from is inclusive and :to is exclusive, so every distance lands in exactly one bucket.
    from = bucket[:from] || 0
    to = bucket[:to] || Float::INFINITY
    bucket[:coordinates] << coordinate if distance >= from && distance < to
  end
end
??????
ElasticSearch
query = {
  aggregations: {
    rings_around_rubyconf: {                   # aggregation name
      geo_distance: {                          # aggregation type
        field: "location",                     # field holding the location
        origin: "-23.5532636, -46.6528908",    # origin coordinates
        ranges: [                              # buckets to aggregate into
          { to: 100 },
          { from: 100, to: 300 },
          { from: 300 }
        ]
      }
    }
  }
}
search = Coordinate.search(query)
:-)
(Extended) stats aggregation
ActiveRecords 
$ rails g scaffold Grade subject:string grade:decimal
ElasticSearch
query = {
  aggregations: {
    grades_stats: {            # aggregation name
      extended_stats: {        # aggregation type
        field: "grade"         # field name
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
> search.response.aggregations.grades_stats

=> #<Hashie::Mash avg=8.03 count=3 max=10.0 min=4.6
std_deviation=2.43 sum=24.1 sum_of_squares=211.41
variance=5.93>
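To make the fields of this result concrete, here is a plain-Ruby sketch of the same statistics. The three grades are an assumption: they are the only values consistent with count=3, sum=24.1, min=4.6 and max=10.0 above. The variance is computed as a population variance (mean of squares minus square of the mean), which agrees with the figures shown above up to the last displayed digit.

```ruby
# Grades implied by count=3, sum=24.1, min=4.6, max=10.0 (an assumption;
# the slides do not list the indexed documents at this point).
grades = [10.0, 4.6, 9.5]

count          = grades.size
sum            = grades.sum
avg            = sum / count
sum_of_squares = grades.sum { |grade| grade**2 }
# Population variance: mean of the squares minus the square of the mean.
variance       = sum_of_squares / count - avg**2
std_deviation  = Math.sqrt(variance)

puts format('count=%d min=%.1f max=%.1f sum=%.2f avg=%.4f',
            count, grades.min, grades.max, sum, avg)
puts format('sum_of_squares=%.2f variance=%.4f std_deviation=%.4f',
            sum_of_squares, variance, std_deviation)
```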
(Extended) stats aggregation 
+ 
Scripting
ElasticSearch
query = {
  aggregations: {
    grades_stats: {
      extended_stats: {
        field: "grade"
      }
    }
  }
}
ElasticSearch
query = {
  aggregations: {
    grades_stats: {            # aggregation name
      extended_stats: {        # aggregation type
        field: "grade",        # field name
        # JavaScript that computes the new grade
        script: "_value < 7.0 ? _value * correction : _value",
        params: {
          correction: 1.2
        }
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
> search.response.aggregations.grades_stats

=> #<Hashie::Mash avg=8.34 count=3 max=10.0 min=5.52
std_deviation=2.00 sum=25.02 sum_of_squares=220.72
variance=4.01>
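The effect of the script can be checked by hand. The sketch below applies the same rule in plain Ruby, assuming the three grades implied by the unscripted stats (10.0, 4.6 and 9.5); only the 4.6 is below 7.0, so only it is multiplied by the correction factor.

```ruby
# Assumed grades, inferred from the unscripted stats (count=3, sum=24.1,
# min=4.6, max=10.0); the slides do not list them explicitly here.
grades     = [10.0, 4.6, 9.5]
correction = 1.2

# Mirror of the script: grades below 7.0 are multiplied by the correction.
corrected = grades.map { |value| value < 7.0 ? value * correction : value }

sum = corrected.sum
avg = sum / corrected.size
puts format('min=%.2f max=%.1f sum=%.2f avg=%.2f',
            corrected.min, corrected.max, sum, avg)
```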
Term aggregation
ElasticSearch
query = {
  aggregations: {
    subjects: {                # aggregation name
      terms: {                 # aggregation type
        field: "subject"       # field name
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
> search.response.aggregations.subjects

=> #<Hashie::Mash buckets=[
  #<Hashie::Mash doc_count=2 key="math">,
  #<Hashie::Mash doc_count=1 key="grammar">,
  #<Hashie::Mash doc_count=1 key="physics">
]>
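Conceptually, a terms aggregation counts documents per distinct field value. The plain-Ruby sketch below does the same over an in-memory list; the list of subjects is an assumption reconstructed from the bucket counts above, and ties are broken by key here only to keep the output deterministic.

```ruby
# Subjects implied by the bucket counts above (an assumption).
subjects = %w[math math grammar physics]

# Count documents per distinct term, like the `terms` buckets.
counts = subjects.each_with_object(Hash.new(0)) do |subject, acc|
  acc[subject] += 1
end

# Largest buckets first; ties broken alphabetically by key.
buckets = counts
          .sort_by { |key, doc_count| [-doc_count, key] }
          .map { |key, doc_count| { key: key, doc_count: doc_count } }

buckets.each { |bucket| p bucket }
```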
Combined aggregations 
(term + stats)
ElasticSearch
query = {
  aggregations: {
    subjects: {
      terms: {
        field: "subject"
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
query = {
  aggregations: {
    subjects: {                  # parent aggregation name
      terms: {
        field: "subject"         # field for the parent aggregation
      },
      aggregations: {
        grade_stats: {           # child aggregation name
          stats: {
            field: "grade"       # field for the child aggregation
          }
        }
      }
    }
  }
}

search = Grade.search(query)
ElasticSearch
> search.response.aggregations.subjects

=> #<Hashie::Mash buckets=[
  #<Hashie::Mash doc_count=2 grade_stats=#<Hashie::Mash
    avg=9.0 count=2 max=10.0 min=8.0 sum=18.0> key="math">,
  #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash
    avg=4.6 count=1 max=4.6 min=4.6 sum=4.6> key="grammar">,
  #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash
    avg=9.5 count=1 max=9.5 min=9.5 sum=9.5> key="physics">
]>
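The nesting can be read as "bucket first, then compute stats inside each bucket". A plain-Ruby sketch of that two-step idea follows; the rows are an assumption reconstructed from the per-subject stats above.

```ruby
# Rows implied by the per-subject stats above (an assumption).
grades = [
  { subject: 'math',    grade: 10.0 },
  { subject: 'math',    grade: 8.0 },
  { subject: 'grammar', grade: 4.6 },
  { subject: 'physics', grade: 9.5 }
]

# Parent step: bucket rows by subject.
# Child step: compute stats over the grades inside each bucket.
buckets = grades.group_by { |row| row[:subject] }.map do |subject, rows|
  values = rows.map { |row| row[:grade] }
  {
    key: subject,
    doc_count: rows.size,
    grade_stats: {
      count: values.size,
      min: values.min,
      max: values.max,
      sum: values.sum,
      avg: values.sum / values.size
    }
  }
end

buckets.each { |bucket| p bucket }
```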
Top Hits 
More like this 
Histogram 
Scripted metrics 
Geo bounds 
Stemmer (synonyms)
IPv4 ranges 
. . .
3) Scoring
ActiveRecords 
$ rails g scaffold Post title:string source:string likes:integer
“amazon” 
ElasticSearch 
search = Post.search(
  query: {
    match: {                   # full-text search
      _all: "amazon"
    }
  }
)
search.results.results[0]._score
=> 0.8174651
“amazon” 
ElasticSearch 
search = Post.search(
  query: {
    custom_score: {
      query: {
        match: {               # full-text search
          _all: "amazon"
        }
      },
      script: "_score * doc['likes'].value"   # likes influence the score
    }
  }
)
search.results.results[0]._score
=> 31.8811388
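The script simply multiplies the text relevance score by the document's like count. The sketch below repeats that arithmetic in plain Ruby; the value of 39 likes is a hypothetical figure chosen because it lands close to the score shown above, and the slides do not state the real count.

```ruby
# Base relevance score taken from the plain match query above.
base_score = 0.8174651
# Hypothetical like count; the slides do not show the document's `likes`.
likes = 39

# Same arithmetic as the script "_score * doc['likes'].value".
custom_score = base_score * likes
puts custom_score.round(4)
```

Later Elasticsearch releases replaced `custom_score` with the `function_score` query, so the query above applies to the version used in the talk.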
GET http://localhost:9200/post/_search?explain
"_explanation": {
  "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
  "value": 0.076713204,
  "details": [
    {
      "description": "fieldWeight in 0, product of:",
      "value": 0.076713204,
      "details": [
        {
          "description": "tf(freq=1.0), with freq of:",
          "value": 1,
          "details": [
            {
              "description": "termFreq=1.0",
              "value": 1
            }
          ]
        },
        {
          "description": "idf(docFreq=1, maxDocs=1)",
          "value": 0.30685282
        },
        {
          "description": "fieldNorm(doc=0)",
          "value": 0.25
        }
      ]
    }
  ]
}
Score explained
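The numbers in this explanation can be reproduced by hand. The sketch below assumes Lucene's classic TF-IDF similarity, where tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the inputs are the ones printed in the explanation above, and the field weight is the product of the three factors.

```ruby
# Inputs taken from the _explanation output above.
freq       = 1.0   # termFreq
doc_freq   = 1     # docFreq
max_docs   = 1     # maxDocs
field_norm = 0.25  # fieldNorm

# Classic Lucene TF-IDF factors (assumed similarity).
tf  = Math.sqrt(freq)
idf = 1 + Math.log(max_docs.to_f / (doc_freq + 1))

# "fieldWeight in 0, product of:" tf, idf and fieldNorm.
field_weight = tf * idf * field_norm

puts format('tf=%.1f idf=%.6f fieldWeight=%.6f', tf, idf, field_weight)
```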
4) Indexing responses
$ rails g scaffold Post title:string source:string likes:integer
class PostsController < ApplicationController

  # ...

  def show
    @post = Post.find(params[:id])

    render json: @post
  end

  # ...

end
SELECT * FROM Posts WHERE id = params[:id]
class PostsController < ApplicationController

  # ...

  def show
    @post = Post.search(query: { match: { id: params[:id] }})

    render json: @post
  end

  # ...

end
GET http://localhost:9200/posts/posts/params[:id]
ActiveRecords
require 'elasticsearch/model'

class Post < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  belongs_to :author

  # Includes a parent (the author) in the indexed JSON
  def as_indexed_json(options={})
    self.as_json(
      include: { author: { only: [:name, :bio] } }
    )
  end
end
Exposing ElasticSearch
http://localhost:9200/pagarme/_search 
https://api.pagar.me/1/search
Pagar.me infrastructure (diagram)
Router (api.pagar.me)
Production environment: API server (Node.js), ElasticSearch, MySQL (transactions and relational data)
Test environment (customers' sandbox): API server (Node.js), ElasticSearch, MySQL (transactions and relational data)
MongoDB (customer data and non-relational data)
Exposing ElasticSearch
• ElasticSearch endpoint -> endpoint accessed by the client…
• … but be careful: the data must be restricted to the client's account (of course)
• Advantage: access to the same ElasticSearch features (aggregations, statistics, scores, etc.)
• Security: disable ElasticSearch scripts
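One way to restrict a client's query to that client's account is to wrap whatever query arrives in a filter the client cannot override. The sketch below is only an illustration of that idea and not the Pagar.me implementation: the helper `scope_to_account` and the field name `account_id` are hypothetical, and the `bool` query with a `filter` clause follows the syntax of more recent Elasticsearch versions.

```ruby
require 'json'

# Wrap a client-supplied search body so it can only match documents that
# belong to the given account. `account_id` is a hypothetical field name.
def scope_to_account(client_body, account_id)
  safe = client_body.dup
  user_query = safe.delete('query') || { 'match_all' => {} }
  safe['query'] = {
    'bool' => {
      'must'   => [user_query],
      'filter' => [{ 'term' => { 'account_id' => account_id } }]
    }
  }
  safe
end

client_body = JSON.parse('{"query": {"match": {"title": "amazon"}}, "size": 10}')
scoped = scope_to_account(client_body, 'acc_123')
puts JSON.pretty_generate(scoped)
```

The client's own query and options such as `size` pass through unchanged, while the account filter is added on the server side. A real proxy would also need to reject script usage, as the last bullet above notes.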
GET /search
• A single endpoint for all GETs
• All data indexed and ready to use (no joins)
• Complex queries built on the front end (Angular.js)
• Front-end development does not depend on the back end
Overall…
1) There is a tool for every task.
2) A hammer is always the right tool.
3) Every tool is also a hammer.
MySQL 
!= 
NoSQL 
!= 
ElasticSearch
Thank you! :)
Questions?
Fazendo mágica com ElasticSearch

  • 1. Fazendo mágica com ElasticSearch PEDROFRANCESCHI @pedroh96 pedro@pagar.me github.com/pedrofranceschi
  • 3. Filters Full text search Sort Highlight Facets Pagination
  • 4. Você vai precisar buscar dados.
  • 5. Você vai precisar entender dados.
  • 6. (My)SQL não é a solução. (… nem NoSQL)
  • 7. O que é o ElasticSearch?
  • 8. ElasticSearch • “Open Source Distributed Real Time Search & Analytics” • API RESTful para indexar/buscar JSONs (“NoSQL”) • NÃO é um banco de dados • Apache Lucene • Just works (and scales) • Full text search, aggregations, scripting, etc, etc, etc.
  • 9. Nomes? MySQL ElasticSearch Database Index Table Type Row Document Column Field Schema Mapping Partition Shard
  • 10. Como usar o ElasticSearch?
  • 11. $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{! "user" : “pedroh96",! "post_date" : "2009-11-15T14:12:12",! "message" : "trying out Elasticsearch"! }' Endpoint Index Type Document ID Document {! "_index" : "twitter",! "_type" : "tweet",! "_id" : "1",! "_version" : 1,! "created" : true! } PUT data
  • 12. Endpoint Index Type $ curl -XGET 'http://localhost:9200/twitter/tweet/1' Document ID {! "_id": "1",! "_index": "twitter",! "_source": {! "message": "trying out Elasticsearch",! "post_date": "2009-11-15T14:12:12",! "user": "pedroh96"! },! "_type": "tweet",! "_version": 1,! "found": true! } Document GET data
  • 13. GET data Endpoint Index $ curl -XGET 'http://localhost:9200/twitter/_search'! -d ‘{ query: . . . }! ! ! Query de busca ! ! ! ! ! ! ! Operador de busca
  • 14. ActiveRecords class Tweet < ActiveRecord::Base! end
  • 15. ActiveRecords require 'elasticsearch/model'! ! class Tweet < ActiveRecord::Base! include Elasticsearch::Model! include Elasticsearch::Model::Callbacks! end! !
  • 18. Por que usar o ElasticSearch?
  • 20. Post.where(:all, :author => "pedroh96") vs Post.search(query: { match: { author: "pedroh96" }}) Just Another Query Language?
  • 21. 1) Full text search
  • 22. ActiveRecords $ rails g scaffold Post title:string! source:string
  • 23. GET /posts/5 Post.find(5) :-) ActiveRecords
  • 24. ActiveRecords “Amazon to Buy Video Site Twitch for More Than $1B” Post.where(:all, :title => "Amazon to Buy Video Site Twitch for More Than $1B") :-)
  • 25. “amazon” Post.where(["title LIKE ?", "%Amazon%"]) ??? ActiveRecords
  • 26. “amazon source:online.wsj.com” Post.where(["title LIKE ? AND source = ?", "%Amazon%", "online.wsj.com"]) ?????? ActiveRecords
  • 28. ElasticSearch “amazon source:online.wsj.com” search = Post.search("amazon source:online.wsj.com") :-)
  • 29. ElasticSearch “amazon source:online.wsj.com” search = Post.search( query:{ match: { _all: "amazon source:online.wsj.com", } } ) Full-text search
  • 30. ElasticSearch “amazon source:online.wsj.com” search = Post.search( query:{ multi_match: { query: "amazon source:online.wsj.com", fields: ['title^10', 'source'] } } ) Full-text search Title boost
  • 31. ElasticSearch “amazon source:online.wsj.com” search = Post.search( query:{ multi_match: { query: "amazon source:online.wsj.com", fields: ['title^10', 'source'] } }, highlight: { fields: { title: {} } } ) Title highlight Full-text search Title boost
  • 32. ElasticSearch Title highlight > search.results[0].highlight.title => ["Twitch officially acquired by <em>Amazon</em>"]
  • 33.
  • 36.
  • 37. ActiveRecords $ rails g scaffold Coordinate latitude:decimal longitude:decimal
  • 38. ActiveRecords class Coordinate < ActiveRecord::Base! end
  • 39. ActiveRecords class Coordinate < ActiveRecord::Base! def distance_to(coordinate)! # From http://en.wikipedia.org/wiki/Haversine_formula! rad_per_deg = Math::PI/180 # PI / 180! rkm = 6371 # Earth radius in kilometers! rm = rkm * 1000 # Radius in meters! ! dlon_rad = (coordinate.longitude.to_f - self.longitude.to_f) * rad_per_deg # Delta, converted to rad! dlat_rad = (coordinate.latitude.to_f - self.latitude.to_f) * rad_per_deg! ! lat1_rad = coordinate.latitude.to_f * rad_per_deg! lat2_rad = self.latitude.to_f * rad_per_deg! lon1_rad = coordinate.longitude.to_f * rad_per_deg! lon2_rad = self.longitude.to_f * rad_per_deg! ! a = Math.sin(dlat_rad/2)**2 + Math.cos(lat1_rad) * Math.cos(lat2_rad) * Math.sin(dlon_rad/2)**2! c = 2 * Math::atan2(Math::sqrt(a), Math::sqrt(1-a))! ! rm * c # Delta in meters! end! end > c1 = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908) > c2 = Coordinate.new(:latitude => -23.5538488, :longitude => -46.6530035) > c1.distance_to(c2) => 66.07749735875552
  • 40. ActiveRecords origin = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908) buckets = [! {! :to => 100,! :coordinates => []! },! {! :from => 100,! :to => 300,! :coordinates => []! },! {! :from => 300,! :coordinates => []! }! ]! Coordinate.all.each do |coordinate|! distance = origin.distance_to(coordinate)! ! buckets.each do |bucket|! if distance < bucket[:to] and distance > (bucket[:from] || 0)! bucket[:coordinates] << coordinate! end! end! end ??????
  • 41. ElasticSearch
      query = {
        aggregations: {
          rings_around_rubyconf: {                   # aggregation name
            geo_distance: {                          # aggregation type
              field: "location",                     # field holding the location
              origin: "-23.5532636, -46.6528908",    # origin coordinates
              ranges: [                              # buckets to aggregate into
                { to: 100 },
                { from: 100, to: 300 },
                { from: 300 }
              ]
            }
          }
        }
      }

      search = Coordinate.search(query)
      :-)
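The geo_distance aggregation on slide 41 needs a field mapped as geo_point, while the scaffolded Coordinate model stores separate latitude and longitude decimals. The slides do not show how the two are bridged; a minimal sketch, assuming a hypothetical `location_document` helper and a hypothetical mapping fragment, could look like this:

```ruby
# Hypothetical mapping fragment (not shown in the slides): the field used by
# geo_distance has to be mapped as a geo_point.
LOCATION_MAPPING = {
  properties: {
    location: { type: "geo_point" }
  }
}.freeze

# Hypothetical helper: builds the indexed JSON for one coordinate, merging
# the two decimal columns into a single lat/lon object.
def location_document(latitude, longitude)
  { location: { lat: latitude.to_f, lon: longitude.to_f } }
end

doc = location_document("-23.5532636", "-46.6528908")
puts doc.inspect
```

In a model that includes Elasticsearch::Model, a hash like this could be what `as_indexed_json` returns, so that the `location` field exists in the index.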
  • 43. ActiveRecords
      $ rails g scaffold Grade subject:string grade:decimal
  • 44. ElasticSearch
      query = {
        aggregations: {
          grades_stats: {            # aggregation name
            extended_stats: {        # aggregation type
              field: "grade",        # field name
            }
          }
        }
      }

      search = Grade.search(query)
  • 45. ElasticSearch
      > search.response.aggregations.grades_stats
      => #<Hashie::Mash avg=8.03 count=3 max=10.0 min=4.6 std_deviation=2.43 sum=24.1 sum_of_squares=211.41 variance=5.93>
  • 47. ElasticSearch
      query = {
        aggregations: {
          grades_stats: {
            extended_stats: {
              field: "grade",
            }
          }
        }
      }
  • 48. ElasticSearch
      query = {
        aggregations: {
          grades_stats: {            # aggregation name
            extended_stats: {        # aggregation type
              field: "grade",        # field name
              # JavaScript that computes the new grade
              script: "_value < 7.0 ? _value * correction : _value",
              params: {
                correction: 1.2
              }
            }
          }
        }
      }

      search = Grade.search(query)
  • 49. ElasticSearch
      > search.response.aggregations.grades_stats
      => #<Hashie::Mash avg=8.34 count=3 max=10.0 min=5.52 std_deviation=2.00 sum=25.02 sum_of_squares=220.72 variance=4.01>
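To make the two extended_stats results easier to check by hand, the sketch below recomputes them in plain Ruby. The three grades are not listed on the slides; they are inferred here from the reported min, max, sum and sum_of_squares, so treat them as an assumption. The helper names are hypothetical as well.

```ruby
# Grades inferred from the reported min, max and sum (an assumption).
GRADES = [10.0, 9.5, 4.6].freeze

# Plain-Ruby equivalent of the fields reported by extended_stats.
def extended_stats(values)
  count = values.size
  sum = values.sum
  sum_of_squares = values.sum { |v| v**2 }
  avg = sum / count
  variance = sum_of_squares / count - avg**2 # population variance
  {
    count: count, min: values.min, max: values.max, avg: avg, sum: sum,
    sum_of_squares: sum_of_squares, variance: variance,
    std_deviation: Math.sqrt(variance)
  }
end

# Mirrors the script: grades below 7.0 are multiplied by the correction.
def corrected(values, correction)
  values.map { |v| v < 7.0 ? v * correction : v }
end

plain = extended_stats(GRADES)
curved = extended_stats(corrected(GRADES, 1.2))
puts plain
puts curved
```

Only the 4.6 grade falls below 7.0, so the correction changes a single value (4.6 becomes 5.52) and every statistic that depends on it.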
  • 51. ElasticSearch
      query = {
        aggregations: {
          subjects: {                # aggregation name
            terms: {                 # aggregation type
              field: "subject"       # field name
            }
          }
        }
      }

      search = Grade.search(query)
  • 52. ElasticSearch
      > search.response.aggregations.subjects
      => #<Hashie::Mash buckets=[
           #<Hashie::Mash doc_count=2 key="math">,
           #<Hashie::Mash doc_count=1 key="grammar">,
           #<Hashie::Mash doc_count=1 key="physics">
         ]>
  • 54. ElasticSearch
      query = {
        aggregations: {
          subjects: {
            terms: {
              field: "subject"
            }
          }
        }
      }

      search = Grade.search(query)
  • 55. ElasticSearch
      query = {
        aggregations: {
          subjects: {                  # parent aggregation name
            terms: {
              field: "subject"         # field for the parent aggregation
            },
            aggregations: {
              grade_stats: {           # child aggregation name
                stats: {
                  field: "grade"       # field for the child aggregation
                }
              }
            }
          }
        }
      }

      search = Grade.search(query)
  • 56. ElasticSearch
      > search.response.aggregations.subjects
      => #<Hashie::Mash buckets=[
           #<Hashie::Mash doc_count=2 grade_stats=#<Hashie::Mash avg=9.0 count=2 max=10.0 min=8.0 sum=18.0> key="math">,
           #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=4.6 count=1 max=4.6 min=4.6 sum=4.6> key="grammar">,
           #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=9.5 count=1 max=9.5 min=9.5 sum=9.5> key="physics">
         ]>
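A nested aggregation like the one on slides 55 and 56 is roughly a group-by followed by per-group statistics. The plain-Ruby sketch below illustrates that idea; the four sample documents are reconstructed from the buckets on slide 56, and `subject_buckets` is a hypothetical helper, not part of any library.

```ruby
# Sample documents matching the buckets reported on the slide.
GRADE_DOCS = [
  { subject: "math", grade: 10.0 },
  { subject: "math", grade: 8.0 },
  { subject: "grammar", grade: 4.6 },
  { subject: "physics", grade: 9.5 }
].freeze

# Groups documents by subject (the parent terms aggregation) and computes
# count/min/max/avg/sum per group (the child stats aggregation).
def subject_buckets(docs)
  buckets = docs.group_by { |d| d[:subject] }.map do |subject, group|
    grades = group.map { |d| d[:grade] }
    {
      key: subject,
      doc_count: group.size,
      grade_stats: {
        count: grades.size, min: grades.min, max: grades.max,
        avg: grades.sum / grades.size, sum: grades.sum
      }
    }
  end
  # Largest buckets first, as a terms aggregation does by default.
  buckets.sort_by { |b| -b[:doc_count] }
end

buckets = subject_buckets(GRADE_DOCS)
buckets.each { |b| puts b.inspect }
```

The difference in practice is that Elasticsearch computes the buckets inside the cluster, so the application never has to load every document to group it.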
  • 57. Top Hits, More like this, Histogram, Scripted metrics, Geo bounds, Stemmer (synonyms), IPv4 ranges, ...
  • 59. ActiveRecords
      $ rails g scaffold Post title:string source:string likes:integer
  • 60. ElasticSearch: searching for “amazon”
      search = Post.search(
        query: {
          match: {                   # full-text search
            _all: "amazon",
          }
        }
      )

      search.results.results[0]._score
      => 0.8174651
  • 61. ElasticSearch: searching for “amazon”
      search = Post.search(
        query: {
          custom_score: {
            query: {
              match: {               # full-text search
                _all: "amazon",
              }
            },
            script: "_score * doc['likes'].value"   # likes influence the score
          }
        }
      )

      search.results.results[0]._score
      => 31.8811388
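The custom_score query on slide 61 was superseded by function_score in later Elasticsearch releases. On a version where custom_score is not available, the same "likes influence the score" idea could be expressed roughly as below. This is a sketch under that assumption, not the author's code: `likes_boosted_query` is a hypothetical helper, and the exact script syntax depends on the Elasticsearch version and scripting language in use.

```ruby
# Builds a function_score query body that multiplies the text relevance by
# the number of likes. boost_mode "replace" makes the script result the final
# score instead of multiplying it by the query score a second time.
def likes_boosted_query(text)
  {
    query: {
      function_score: {
        query: { match: { _all: text } },
        script_score: { script: "_score * doc['likes'].value" },
        boost_mode: "replace"
      }
    }
  }
end

body = likes_boosted_query("amazon")
puts body.inspect
```

The resulting hash has the same shape as the other query bodies in the slides, so it could be passed to `Post.search` in the same way.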
  • 62. Score explained
      GET http://localhost:9200/post/_search?explain

      "_explanation": {
        "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
        "value": 0.076713204,
        "details": [
          {
            "description": "fieldWeight in 0, product of:",
            "value": 0.076713204,
            "details": [
              {
                "description": "tf(freq=1.0), with freq of:",
                "value": 1,
                "details": [
                  {
                    "description": "termFreq=1.0",
                    "value": 1
                  }
                ]
              },
              {
                "description": "idf(docFreq=1, maxDocs=1)",
                "value": 0.30685282
              },
              {
                "description": "fieldNorm(doc=0)",
                "value": 0.25
              }
            ]
          }
        ]
      }
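The numbers in the explain output on slide 62 can be reproduced by hand. The sketch below assumes Lucene's classic TF-IDF similarity (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))), which is consistent with the values shown; the function names are only illustrative.

```ruby
# Term-frequency factor of the classic TF-IDF similarity.
def tf(freq)
  Math.sqrt(freq)
end

# Inverse document frequency of the classic TF-IDF similarity.
def idf(doc_freq, max_docs)
  1.0 + Math.log(max_docs.to_f / (doc_freq + 1))
end

# Values taken from the explain output.
term_freq = 1.0
doc_freq = 1
max_docs = 1
field_norm = 0.25

# fieldWeight is the product of the three factors.
score = tf(term_freq) * idf(doc_freq, max_docs) * field_norm
puts format("idf=%.8f score=%.9f", idf(doc_freq, max_docs), score)
```

Reading the explanation tree bottom-up this way shows where a score comes from: one matching term, in an index with a single document, in a field whose length norm is 0.25.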
  • 64. $ rails g scaffold Post title:string source:string likes:integer
  • 65. class PostsController < ApplicationController

        # ...

        def show
          # SELECT * FROM Posts WHERE id = params[:id]
          @post = Post.find(params[:id])

          render json: @post
        end

        # ...

      end
  • 66. class PostsController < ApplicationController

        # ...

        def show
          # GET http://localhost:9200/posts/posts/params[:id]
          @post = Post.search(query: { match: { id: params[:id] }})

          render json: @post
        end

        # ...

      end
  • 67. ActiveRecords
      require 'elasticsearch/model'

      class Post < ActiveRecord::Base
        include Elasticsearch::Model
        include Elasticsearch::Model::Callbacks

        belongs_to :author

        # Includes a parent (the author) in the indexed JSON
        def as_indexed_json(options={})
          self.as_json(
            include: { author: { only: [:name, :bio] },
          })
        end
      end
  • 71. Pagar.me infrastructure (diagram): Router (api.pagar.me); a production environment and a test environment (customer sandbox), each with an API server (Node.js), ElasticSearch and MySQL (transactions and relational data); MongoDB (customer and non-relational data)
  • 72. Exposing ElasticSearch
      • ElasticSearch endpoint -> endpoint accessed by the customer...
      • ... but be careful: the data must be restricted to the customer's account (of course)
      • Advantage: access to the same ElasticSearch features (aggregations, statistics, scores, etc.)
      • Security: disable ElasticSearch scripts
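One way to keep an exposed search endpoint restricted to the caller's account is to wrap whatever query the client sends in a filter the client cannot remove. The slides do not show how Pagar.me does this; the sketch below is an assumption-laden illustration in which `scope_to_account` and the `account_id` field are hypothetical. It uses the `filtered` query of the Elasticsearch 1.x era; newer versions express the same thing with a `bool` query and a `filter` clause.

```ruby
# Wraps a client-supplied search body so that it can only match documents
# belonging to one account. The account_id field name is hypothetical.
def scope_to_account(client_body, account_id)
  client_query = client_body[:query] || { match_all: {} }
  scoped = {
    query: {
      filtered: {
        query: client_query,
        filter: { term: { account_id: account_id } }
      }
    }
  }
  # Aggregations are passed through; they run over the filtered hits.
  aggregations = client_body[:aggregations]
  scoped[:aggregations] = aggregations if aggregations
  scoped
end

body = scope_to_account({ query: { match: { status: "paid" } } }, 42)
puts body.inspect
```

A real implementation would need more than this: for example, a `global` aggregation ignores the query scope, so it would have to be rejected or rewritten, and scripts should stay disabled as the slide recommends.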
  • 73. GET /search
      • A single endpoint for all GETs
      • All data indexed and ready to use (no joins)
      • Complex queries built on the front end (Angular.js)
      • Front-end development does not depend on the back end
  • 78. 1) There is a tool for every task.
      2) A hammer is always the right tool.
      3) Every tool is also a hammer.
  • 79. MySQL != NoSQL != ElasticSearch
  • 80. Thank you! :) Pedro Franceschi @pedroh96 pedro@pagar.me github.com/pedrofranceschi
  • 81. Questions?