Using Sphinx
for Search
Mike Lively
Slickdeals, LLC
What is Sphinx?
• A full-text search engine
• Quickly get high quality (relevant) results
• Designed to integrate well with SQL RDBMS
• Can work with any data source
• Can be queried using either an API or SQL
How do I know anything
about Sphinx?
• Manager of Software Architecture for
Slickdeals.net
• Alexa top 150 site (in the US)
• Have been working on improving our Sphinx
search engine for the last 2 months or so.
• Over 7 million searches a month go directly through
the interface; many more happen indirectly.
When should I use Sphinx?
• Site / Product / Document searches
• Auto-suggest / Auto-Correct functionality
• Finding relevant and related items
Simple Architecture
• Often, search is offloaded
straight to the database
• Search goes to the backend
which performs queries on the
database
• Obviously very easy to
implement
Simple Architecture
• Simple “starts with” searches
on indexed fields can
sometimes work: `city` LIKE
'Las%'
• Anything else forces a full
table scan, and on MyISAM a
long-running read blocks
writes to the table.
• MySQL is not a great or
flexible full text engine
• It can sometimes be adequate
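Why a “starts with” search can use an index while other patterns cannot is easier to see in a small sketch. This is not MySQL code: it uses a sorted Python list as a stand-in for an ordered index, and the city names are made up for illustration.

```python
import bisect

# A sorted list stands in for an ordered (B-tree style) index on `city`.
cities = sorted(["Las Cruces", "Las Vegas", "Los Angeles", "Salas", "Texas City"])

def starts_with(sorted_values, prefix):
    """Range scan: jump to the first candidate and stop at the last one,
    similar in spirit to how an index can serve LIKE 'Las%'."""
    lo = bisect.bisect_left(sorted_values, prefix)
    hi = bisect.bisect_left(sorted_values, prefix + "\uffff")
    return sorted_values[lo:hi]

def contains(values, needle):
    """No ordering helps here: every value must be inspected,
    which is what a pattern like LIKE '%as%' forces."""
    return [v for v in values if needle in v]

print(starts_with(cities, "Las"))
print(contains(cities, "as"))
```

The prefix search touches only the matching slice of the list, while the substring search has to walk every entry no matter how the data is ordered.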
Sphinx Architecture
• Searchd is responsible for
receiving requests from
clients and executing the
searches against the sphinx
index.
• Indexer is responsible for
getting data into the sphinx
index.
• This separation allows
indexing and searching to be
scaled separately.
Sphinx Architecture
• Searchd has a binary protocol
for which there are several
clients available in multiple
languages.
• Searchd also speaks the
MySQL wire protocol (as of
MySQL 4.1), so MySQL
clients can connect to it
• Searchd is a daemon that
runs on your search servers
Sphinx Architecture
• Indexer is a shell program that
you can execute to build any
number of indexes.
• Can handle index rotation for
live indexing
Not So Quick Side Note
MySQL IS SLOWWWWWWWWWWWWW
(at text matches)
Still Not Quick Side Note
Indexes won’t help you…
Quicker Side Note
Full Text Search isn’t so bad
IF….
Sphinx Concepts
• Sphinx Indexes “Documents”
• Each document has a unique, unsigned,
non-zero integer ID (either 32-bit or 64-bit)
• Each document has one or more fields
• Each document has zero or more attributes
Indexes / Sources
• Sphinx indexes are created from one or more
sources.
• The source can be a database or an XML or TSV
stream.
• You can use multiple sources
• This is useful for maintaining updated indexes
• Also used to implement a sphinx cluster
Sphinx Fields
• Fields are what the full-text index is built from.
• When searching you can search against any number
of fields.
• You can assign different relevancy weights to different
fields.
• By default, Sphinx does not store the original value of a field.
• Every index should have at least one field.
Sphinx Attributes
• Data that further describes the item being
indexed
• Can be returned as a part of the search
• Useful for filtering and sorting results
• These are not a part of the full text index.
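The split between fields and attributes can be sketched with a toy inverted index. This is only an illustration of the concept, not how Sphinx is implemented; the class name, the sample documents, and the attribute names (`namespace`, `created`) are all hypothetical.

```python
from collections import defaultdict

class TinyIndex:
    """Toy model: fields feed the keyword index, attributes are stored as-is."""

    def __init__(self):
        self.postings = defaultdict(set)   # keyword -> set of document IDs
        self.attributes = {}               # document ID -> attribute values

    def add(self, doc_id, fields, attributes):
        if not isinstance(doc_id, int) or doc_id <= 0:
            raise ValueError("document ID must be a non-zero unsigned integer")
        # Field text is broken into keywords; the original text is not kept.
        for text in fields.values():
            for word in text.lower().split():
                self.postings[word].add(doc_id)
        # Attributes are stored so they can be returned and filtered on.
        self.attributes[doc_id] = dict(attributes)

    def search(self, keyword, **filters):
        hits = []
        for doc_id in sorted(self.postings.get(keyword.lower(), ())):
            attrs = self.attributes[doc_id]
            if all(attrs.get(k) == v for k, v in filters.items()):
                hits.append((doc_id, attrs))
        return hits

index = TinyIndex()
index.add(1, {"title": "Unit testing in PHP"}, {"namespace": 0, "created": 1262304000})
index.add(2, {"title": "Testing search relevance"}, {"namespace": 1, "created": 1293840000})

print(index.search("testing"))
print(index.search("testing", namespace=1))
```

Note that a search returns IDs and attributes only; the title text went into the keyword index and is not available in the result.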
MySQL Full Text Search
• You can get away with MyISAM tables or, as of
version 5.6, InnoDB.
• You don’t care about morphology (think plurals)
• You don’t need anything but the most basic of
search operators
Creating An Index
• We are going to add an index that sources a
MySQL database.
• The data being sourced is a list of the titles of
Wikipedia posts.
Creating An Index
Indexer Configuration
• We are going to be peeking into a Sphinx
configuration file now.
• You can rebuild the config file by concatenating
each section into a single file.
• On my VM this file is located at
/usr/local/etc/sphinx.conf
Source Definition
Source Definition
Defines the connection information
Connection information
• Ideally, you should create a
separate account for sphinx
• You can also connect via unix
socket
• I didn’t specify it here, but you
can also add a port.
Source Definition
The query that pulls data to populate the index
Source Index
• The index query MUST return
the id field as the first column
• Remember, the id needs to be
a unique, unsigned integer of
64 bits or fewer
• The query must be on a single
line, unless you escape the
newlines with backslashes.
• Notice that we converted the
timestamp into a unix
timestamp. That is important.
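The two requirements above (a valid, unique ID in the first column and timestamps expressed as Unix time) can be checked on sample rows before indexing. The sketch below is a hypothetical pre-flight check, not part of Sphinx. It assumes the timestamps are naive strings in UTC; in the source query itself the conversion would typically be done in SQL, for example with MySQL's `UNIX_TIMESTAMP()`.

```python
from datetime import datetime, timezone

MAX_DOC_ID = 2**64 - 1  # largest unsigned 64-bit value

def to_unix_timestamp(value):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string (assumed UTC) to epoch seconds."""
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

def check_rows(rows):
    """Verify that the first column of every row is a usable document ID."""
    seen = set()
    for row in rows:
        doc_id = row[0]
        if not (0 < doc_id <= MAX_DOC_ID):
            raise ValueError(f"id {doc_id} is not a non-zero unsigned 64-bit integer")
        if doc_id in seen:
            raise ValueError(f"id {doc_id} appears more than once")
        seen.add(doc_id)
    return len(seen)

rows = [
    (1, "Unit testing", to_unix_timestamp("2010-01-01 00:00:00")),
    (2, "Software testing", to_unix_timestamp("1970-01-01 00:00:01")),
]
print(check_rows(rows), rows)
```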
Source Definition
How data is stored in the index
Source Fields
• The first column in the query is
always the ID.
• You specify any columns that
are attributes.
• Remember, attributes are
values stored with each
document that can be used for
filtering and sorting.
• Any column besides the id that is
not declared as an attribute is
treated as a full-text field (here, title)
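The rule for how each column of the source query is treated can be written out as a small function. This is a sketch of the rule as described on the slide, not Sphinx code; the column names and the sample row are hypothetical, and in a real config the attribute columns would be declared with directives such as `sql_attr_uint` and `sql_attr_timestamp`.

```python
def split_row(columns, row, attribute_names):
    """Return (document ID, attributes, full-text fields) for one result row."""
    if len(columns) != len(row):
        raise ValueError("column list and row must have the same length")
    doc_id = row[0]                      # the first column is always the ID
    attributes, fields = {}, {}
    for name, value in zip(columns[1:], row[1:]):
        if name in attribute_names:
            attributes[name] = value     # declared as an attribute
        else:
            fields[name] = value         # everything else is full-text
    return doc_id, attributes, fields

columns = ["id", "title", "namespace", "touched"]
row = (42, "Sphinx (search engine)", 0, 1262304000)
doc_id, attributes, fields = split_row(columns, row, {"namespace", "touched"})
print(doc_id, attributes, fields)
```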
Index Definition
Index Definition
• An Index includes one or
more sources.
• Each source gets its own
“source” line
• Multiple sources must all
define the same fields and
attributes.
• The ids need to be unique
across sources
Index Definition
• path is not actually a path, it’s
a filename with no extension.
• docinfo dictates if attributes
are stored in the index or
outside of the index.
• dict is not really important
now. Used to be either crc or
keywords. Now crc is
deprecated.
• min_word_len is the minimum
length of words to index
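To make the settings above concrete, here is a minimal sketch that renders an index block in the sphinx.conf layout. The directive names (`source`, `path`, `docinfo`, `dict`, `min_word_len`) are the ones discussed on the slide; the index name, source name, path, and the chosen values are assumptions for illustration and are not taken from the talk's actual config.

```python
def render_block(kind, name, settings):
    """Render one sphinx.conf-style block from a list of (key, value) pairs."""
    lines = [f"{kind} {name}", "{"]
    for key, value in settings:
        lines.append(f"    {key} = {value}")
    lines.append("}")
    return "\n".join(lines)

index_block = render_block("index", "wiki_titles", [
    ("source", "wiki_titles_src"),            # hypothetical source name
    ("path", "/var/lib/sphinx/wiki_titles"),  # file name prefix, no extension
    ("docinfo", "extern"),                    # attributes kept outside the main index data
    ("dict", "keywords"),                     # the non-deprecated dictionary type
    ("min_word_len", 3),                      # skip words shorter than 3 characters
])
print(index_block)
```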
Rest of the Index Configuration
It’s time to build the index
indexer <index name>
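If the build is scripted, the command line can be assembled in code. The sketch below only builds the argument list and does not run anything. `--config`, `--rotate`, and `--all` are indexer options; the index name is hypothetical, and the config path is the one mentioned earlier in the talk.

```python
import shlex

def indexer_command(index_names, config_path=None, rotate=False):
    """Build an argv list for the indexer tool."""
    argv = ["indexer"]
    if config_path:
        argv += ["--config", config_path]
    if rotate:
        # Ask a running searchd to swap in the rebuilt index files.
        argv.append("--rotate")
    # With no explicit names, rebuild every index in the config.
    argv += list(index_names) if index_names else ["--all"]
    return argv

command = indexer_command(["wiki_titles"], "/usr/local/etc/sphinx.conf", rotate=True)
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run` would execute it; that step is left out here because it needs a working Sphinx install.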
Searching the Index
• searchd is the daemon that searches the index
• Binary Protocol
OR
• MySQL Compatible too!
searchd config
Included in the same config file as the rest
Spinning up searchd
“I know MySQL”
–Sphinx
MySQL Compatible
MySQL Compatible
• Tables == Indexes
• SHOW TABLES…Shows indexes.
• SELECT * FROM <index> works too.
Selecting from an index
Querying Indexes
• Default limit of 20 rows
• Notice the text fields are not
returned…
• They would be if we made
them attributes
(sql_field_string)
Querying Indexes
• The magic function in
SphinxQL is match()
• match() performs a full text
search against the entire
index…usually
• The ‘@field’ operator can
isolate which field is searched
on.
Querying Indexes
• You can query against
attributes
• You can sort results
• You can use the weight()
function to determine
relevancy.
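Putting match(), the ‘@field’ operator, weight() and sorting together, a query could be assembled as in the sketch below. It only builds the SQL text and parameters and does not connect to searchd. The index name is hypothetical, the `%s` placeholder assumes a DB-API driver that substitutes parameters client-side, and the set of escaped characters is an assumption modeled on the escape helpers in the Sphinx client libraries, so check it against the documentation for your version.

```python
# Characters treated as operators by the extended query syntax (assumed set).
SPECIAL_CHARS = '\\()|-!@~"&/^$=<'

def escape_match(text):
    """Backslash-escape operator characters in user input."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

def build_search(index_name, user_input, field=None, limit=100):
    """Return (sql, params) for a relevance-sorted full-text query."""
    if not index_name.isidentifier():
        raise ValueError("index name must be a plain identifier")
    expression = escape_match(user_input)
    if field is not None:
        if not field.isidentifier():
            raise ValueError("field name must be a plain identifier")
        expression = f"@{field} {expression}"   # restrict the search to one field
    sql = (
        f"SELECT id, WEIGHT() AS relevance FROM {index_name} "
        f"WHERE MATCH(%s) ORDER BY relevance DESC LIMIT {int(limit)}"
    )
    return sql, (expression,)

sql, params = build_search("wiki_titles", "unit-testing", field="title")
print(sql)
print(params)
```

The explicit LIMIT matters because of the default limit of 20 rows mentioned earlier.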
Querying Indexes
• The 25387283 title was more
relevant because it matched
on the term “testing”
Getting PHP into the mix
• All we need? PDO.
• We will build a basic search page
• Accepts a query, displays up to 100 matching
results by relevancy with the matching keywords
highlighted.
Pulling data from Sphinx
Fetching the data from MySQL
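Because Sphinx returns IDs rather than the original text, the usual second step is to load the rows from the database and put them back into relevance order. The slide's PHP code is not in this transcript, so the sketch below shows the pattern in Python with an in-memory SQLite database standing in for MySQL; the `pages` table and its rows are hypothetical.

```python
import sqlite3

def fetch_in_rank_order(conn, ranked_ids):
    """Load rows for the given IDs and return them in the order Sphinx ranked them."""
    if not ranked_ids:
        return []
    placeholders = ", ".join("?" for _ in ranked_ids)
    rows = conn.execute(
        f"SELECT id, title FROM pages WHERE id IN ({placeholders})",
        list(ranked_ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) gives no ordering guarantee, so reorder by rank here.
    # IDs that are in the index but no longer in the database are skipped.
    return [by_id[i] for i in ranked_ids if i in by_id]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [(1, "Unit testing"), (2, "Software testing"), (3, "Sphinx")],
)

results = fetch_in_rank_order(conn, [2, 1, 99])
print(results)
```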
Adding the fancy yellow highlighting
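One way to do the highlighting is to wrap each matched keyword in a tag and style that tag yellow. The slide's own code is not in this transcript, so this is a generic sketch rather than the talk's implementation; the `<mark>` tag and the function name are assumptions.

```python
import html
import re

def highlight(title, keywords):
    """HTML-escape `title` and wrap case-insensitive keyword matches in <mark>."""
    words = [re.escape(w) for w in keywords if w]
    if not words:
        return html.escape(title)
    pattern = re.compile("(" + "|".join(words) + ")", re.IGNORECASE)
    parts = pattern.split(title)
    # With one capturing group, re.split places matched text at odd positions.
    return "".join(
        f"<mark>{html.escape(part)}</mark>" if i % 2 else html.escape(part)
        for i, part in enumerate(parts)
    )

print(highlight("Unit Testing <PHP>", ["testing"]))
```

Escaping each piece separately keeps the user-supplied title safe to print while leaving the inserted tags intact.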
The rest is pretty basic…
Cool things we would talk about
if I had like…3 more hours
• Auto-suggest, Auto-correct
• More on lemmatization and stemming
• Distributed Sphinx Clustering
• Delta indexes
• Real Time Indexes
• The plethora of operators you can use
• Ranged Queries
• ………
Additional Information
• The sphinx documentation is actually pretty
great
• http://sphinxsearch.com/docs/
• Slides are already on Slideshare
• Will link them to the meet up shortly
Questions?

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Using Sphinx for Search in PHP

  • 1. Using Sphinx for Search Mike Lively Slickdeals, LLC
  • 2. What is Sphinx? • A full-text search engine • Quickly get high quality (relevant) results • Designed to integrate well with SQL RDBMS • Can work with any data source • Can be queried using either an API or SQL
  • 3. How do I know anything about Sphinx? • Manager of Software Architecture for Slickdeals.net • Alexa top 150 site (in the US) • Have been working at improving our Sphinx search engine for the last 2 months or so. • Over 7 Million searches a month directly through the interface, lots more happen indirectly.
  • 4. When should I use Sphinx? • Site / Product / Document searches • Auto-suggest / Auto-Correct functionality • Finding relevant and related items
  • 5. Simple Architecture • Often, search is offloaded straight to the database • Search goes to the backend which performs queries on the database • Obviously very easy to implement
  • 6. Simple Architecture • Simple “starts with” searches on indexed fields can sometimes work: `city` LIKE ‘Las%’ • Anything else will lock your database for writes with MyISAM. • MySQL is not a great or flexible full text engine • It can sometimes be adequate
  • 7. Sphinx Architecture • Searchd is responsible for receiving requests from clients and executing the searches against the sphinx index. • Indexer is responsible for getting data into the sphinx index. • This separation allows indexing and searching to be scaled separately.
  • 8. Sphinx Architecture • Searchd has a binary protocol for which there are several clients available in multiple languages. • Searchd is also binary compatible with MySQL’s protocol since mysql 4.1 • Searchd is a daemon that runs on your search servers
  • 9. Sphinx Architecture • Indexer is a shell program that you can execute to build any number of indexes. • Can handle index rotation for live indexing
  • 10. Not So Quick Side Note MySQL IS SLOWWWWWWWWWWWWW (at text matches)
  • 11. Still Not Quick Side Note Indexes won’t help you…
  • 12. Quicker Side Note Full Text Search isn’t so bad IF….
  • 13. Sphinx Concepts • Sphinx Indexes “Documents” • Each document has a unique unsigned, non-zero integer ID (either 32 bit or 64 bit space) • Each document has one or more fields • Each document has zero or more attributes
  • 14. Indexes / Sources • Sphinx indexes are created from one or more sources. • The source can be a database, XML, or TSV stream. • You can use multiple sources • This is useful for maintaining updated indexes • Also used to implement a sphinx cluster
  • 15. Sphinx Fields • Fields are what the full text index is comprised of. • When searching you can search against any number of fields. • You can assign different relevancy weights to different fields. • The original value of a field is never stored by Sphinx. • You should always have at least one.
  • 16. Sphinx Attributes • data that helps further describe the item being indexed • Can be returned as a part of the search • Useful for filtering and sorting results • These are not a part of the full text index.
  • 17. MySQL Full Text Search • You can get away with MyISAM tables or as of version 5.6 InnoDB. • You don’t care about morphology (think plurals) • You don’t need anything but the most basic of search operators
  • 18. Creating An Index • We are going to add an index that sources a MySQL database. • The data being sourced is a list of the titles of Wikipedia posts.
  • 20. Indexer Configuration • We are going to be peeking into a sphinx configuration file now. • You can rebuild the config file by concatenating each section into a single file. • On my VM this file is located in /usr/local/etc/sphinx.conf
  • 22. Source Definition Defines the connection information
  • 23. Connection information • Ideally, you should create a separate account for sphinx • You can also connect via unix socket • I didn’t specify it here, but you can also add a port.
  • 24. Source Definition The query that pulls data to populate the index
  • 25. Source Index • The index query MUST return the id field as the first column • Remember, the id needs to be a unique, unsigned 64 bit (or smaller) number • The query must be on a single line, unless you escape new lines with backslashes. • Notice that we converted the timestamp into a unix timestamp. That is important.
  • 26. Source Definition How data is stored in the index
  • 27. Source Fields • The first column in the query is always the ID. • You specify any columns that are attributes. • Remember, attributes are stored in the index as values that can be used to filter and sort by. • Any column besides the id that is not specified as an attribute is assumed to be a text field (title)
  • 29. Index Definition • An Index includes one or more sources. • Each source gets its own “source” line • Multiple sources must all define the same fields and attributes. • The ids need to be unique across sources
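
The source definition described on slides 22 to 27 was not captured in this transcript. As a rough sketch only, a source block following those rules might look like the fragment below. The source name, credentials, database, table, and column names are all assumptions made up for illustration; the directive names are standard sphinx.conf options.

```
# Hypothetical source: every name and value here is a placeholder.
source wiki_titles
{
    type     = mysql
    sql_host = localhost
    sql_user = sphinx          # ideally a separate account for sphinx
    sql_pass = secret
    sql_db   = wiki

    # The id comes first, the timestamp is converted to a unix timestamp,
    # and line breaks inside the query are escaped with backslashes.
    sql_query = \
        SELECT id, title, UNIX_TIMESTAMP(updated_at) AS updated_at \
        FROM titles

    # Declared as an attribute; "title" is left undeclared, so it
    # becomes a full-text field.
    sql_attr_timestamp = updated_at
}
```
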
  • 30. Index Definition • path is not actually a path, it’s a filename with no extension. • docinfo dictates if attributes are stored in the index or outside of the index. • dict is not really important now. Used to be either crc or keywords. Now crc is deprecated. • min_word_len is the minimum length of words to index
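
The index block itself is also missing from the transcript. A minimal sketch using the four settings named on slide 30 could look like this; the index name, source name, path, and values are assumptions, not the speaker's configuration.

```
# Hypothetical index: names, path, and values are placeholders.
index wiki_titles
{
    source       = wiki_titles                   # one "source" line per source
    path         = /var/lib/sphinx/wiki_titles   # a filename stem, no extension
    docinfo      = extern                        # attributes kept outside the main index files
    dict         = keywords
    min_word_len = 1
}
```
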
  • 31. Rest of the Index Configuration
  • 32. It’s time to build the index indexer <index name>
  • 33. Searching the Index • searchd is the daemon that searches the index • Binary Protocol OR • MySQL Compatible too!
  • 34. searchd config Included in the same config file as the rest
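
The searchd section shown on the slide is not in the transcript. As a hedged sketch, a searchd block that exposes both the binary protocol and the MySQL-compatible protocol might look like the following. The port numbers and file paths are assumptions; `listen`, `log`, `query_log`, and `pid_file` are standard searchd directives.

```
# Hypothetical searchd section: ports and paths are placeholders.
searchd
{
    listen    = 9312            # binary protocol, used by the API clients
    listen    = 9306:mysql41    # MySQL-compatible protocol
    log       = /var/log/sphinx/searchd.log
    query_log = /var/log/sphinx/query.log
    pid_file  = /var/run/sphinx/searchd.pid
}
```
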
  • 38. MySQL Compatible • Tables == Indexes • SHOW TABLES… shows indexes. • SELECT * FROM <index> works too.
  • 40. Querying Indexes • Default limit of 20 rows • Notice the text fields are not returned… • They would be if we made them attributes (sql_field_string)
  • 41. Querying Indexes • The magic function in SphinxQL is match() • match() performs a full text search against the entire index…usually • The ‘@field’ operator can isolate which field is searched on.
  • 42. Querying Indexes • You can query against attributes • You can sort results • You can use the weight() function to determine relevancy.
  • 43. Querying Indexes • The 25387283 title was more relevant because it matched on the term “testing”
  • 44. Getting PHP into the mix • All we need? PDO. • We will build a basic search page • Accepts a query, displays up to 100 matching results by relevancy with the matching keywords highlighted.
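
The code slides for the search page are not in the transcript, so here is a minimal sketch of what the PDO side could look like, assuming searchd listens for the MySQL protocol on 127.0.0.1:9306 and the index is called `wiki_titles` with a `title` field. Those names, the port, and the helper functions are assumptions for illustration, not the speaker's code. It combines the pieces from the querying slides: `MATCH()`, the `@field` operator, `WEIGHT()` for relevancy, and a limit of 100.

```php
<?php
declare(strict_types=1);

// Open a PDO connection to searchd's MySQL-protocol listener.
// Host and port are assumptions; use whatever your searchd config declares.
function connectToSearchd(string $host = '127.0.0.1', int $port = 9306): PDO
{
    return new PDO("mysql:host={$host};port={$port}", null, null, [
        PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
        // Emulated prepares make PDO substitute the values itself
        // instead of asking the server to prepare the statement.
        PDO::ATTR_EMULATE_PREPARES => true,
    ]);
}

// Restrict a full-text query to one field with the @field operator.
function limitToField(string $field, string $terms): string
{
    return '@' . $field . ' ' . $terms;
}

// Return up to $limit matching document ids, most relevant first.
// The index name "wiki_titles" and field "title" are hypothetical.
function searchTitleIds(PDO $sphinx, string $terms, int $limit = 100): array
{
    $stmt = $sphinx->prepare(
        'SELECT id, WEIGHT() AS relevance FROM wiki_titles '
        . 'WHERE MATCH(:terms) ORDER BY relevance DESC LIMIT :limit'
    );
    $stmt->bindValue(':terms', limitToField('title', $terms), PDO::PARAM_STR);
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->execute();

    // Only the id column is needed; cast the values to integers.
    return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN, 0));
}
```

A page would call `searchTitleIds(connectToSearchd(), $_GET['q'] ?? '')`. Raw user input can contain query operators, so a real page would likely want to escape or strip them first.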
  • 47. Fetching the data from MySQL
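
Because Sphinx does not store the original field text, one common reading of this step is: take the ids Sphinx returned and load the titles from MySQL. The sketch below assumes a `titles` table with `id` and `title` columns (hypothetical names) and keeps the rows in the relevance order Sphinx produced, since an `IN (...)` lookup does not preserve it.

```php
<?php
declare(strict_types=1);

// Load rows for the given ids, keeping the order Sphinx returned them in.
// Table and column names are hypothetical.
function fetchTitles(PDO $mysql, array $ids): array
{
    if ($ids === []) {
        return [];
    }

    // One "?" placeholder per id.
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $mysql->prepare(
        "SELECT id, title FROM titles WHERE id IN ({$placeholders})"
    );
    $stmt->execute(array_values($ids));

    // Index the rows by id so they can be re-ordered below.
    $byId = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $byId[(int) $row['id']] = $row;
    }

    // Walk the ids in relevance order; skip ids that no longer exist.
    $ordered = [];
    foreach ($ids as $id) {
        if (isset($byId[(int) $id])) {
            $ordered[] = $byId[(int) $id];
        }
    }

    return $ordered;
}
```
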
  • 48. Adding the fancy yellow highlighting
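
The highlighting code from the slide is not in the transcript, and the talk may well have used Sphinx's own snippet support (SphinxQL has a `CALL SNIPPETS` statement). As a stand-in that needs no server, here is a plain PHP sketch that wraps whole-word matches in `<mark>`, which browsers render with a yellow background by default. The function name is hypothetical, and unlike Sphinx it knows nothing about stemming or morphology.

```php
<?php
declare(strict_types=1);

// Escape $title for HTML and wrap each query word found in it in <mark>.
function highlightKeywords(string $title, string $query): string
{
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false || $words === []) {
        return htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    }

    // Build one case-insensitive, whole-word alternation of the query words.
    $quoted = array_map(
        static fn (string $word): string => preg_quote($word, '/'),
        $words
    );
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/iu';

    // Split the raw title so that matches land at odd offsets.
    $parts = preg_split($pattern, $title, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    }

    // Escape every piece separately so the markup never lands inside an entity.
    $html = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $html .= ($i % 2 === 1) ? '<mark>' . $safe . '</mark>' : $safe;
    }

    return $html;
}
```
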
  • 49. The rest is pretty basic…
  • 50. Cool things we would talk about if I had like…3 more hours • Auto-suggest, Auto-correct • More on lemmatization and stemming • Distributed Sphinx Clustering • Delta indexes • Real Time Indexes • The plethora of operators you can use • Ranged Queries • ………
  • 51. Additional Information • The sphinx documentation is actually pretty great • http://sphinxsearch.com/docs/ • Slides are already on Slideshare • Will link them to the meet up shortly