4. All you need is…
>> query = "The cat sat on the mat"
=> "The cat sat on the mat"
>> where = "(email like '%#{ query.split(/\s+/).map{|term| term.downcase }.join("%') OR (email like '%") }')"
=> "(email like '%the%') OR (email like '%cat%') OR (email like '%sat%') OR (email like '%on%') OR (email like '%the%') OR (email like '%mat')"
>> execute("select * from users where #{where}")
=> fail
PHP
Congratulations, you are now a l33t programmer!
^
Job done!
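For comparison, here is a minimal sketch of the same search built with placeholders instead of string interpolation. The helper name `like_conditions` and its `column:` parameter are hypothetical, not part of the talk, and the backslash escaping assumes a database whose LIKE operator treats backslash as the escape character (this varies by database).

```ruby
# Build a parameterised WHERE fragment for a multi-term LIKE search.
# Returns [sql_fragment, bind_values]. The user's query text never ends up
# inside the SQL string; only the developer-supplied column name does.
def like_conditions(query, column: "email")
  terms = query.split(/\s+/).reject(&:empty?).map(&:downcase).uniq
  return ["1 = 0", []] if terms.empty? # nothing to search for: match no rows

  sql = terms.map { "(#{column} LIKE ?)" }.join(" OR ")
  # Escape LIKE wildcards so a literal % or _ in the query is not a pattern.
  # Assumption: backslash is the LIKE escape character in the target database.
  binds = terms.map { |t| "%#{t.gsub(/[\\%_]/) { |m| "\\#{m}" }}%" }
  [sql, binds]
end

sql, binds = like_conditions("The cat sat on the mat")
# sql   => "(email LIKE ?) OR (email LIKE ?) OR (email LIKE ?) OR (email LIKE ?) OR (email LIKE ?)"
# binds => ["%the%", "%cat%", "%sat%", "%on%", "%mat%"]
```

The fragment and bind values would then be handed to whatever database layer is in use, which is responsible for quoting. This removes the quoting and duplicate-term mistakes, but it is still a LIKE scan with no stemming or ranking.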
6. Why not use the DB?
• Building up SQL queries in code sucks
• Full-text indexing in DBs isn’t great either
• DBs are hard to scale
8. Sphinx is…
• Sphinx is a full-text search engine
• Open source (GPL version 2)
• Standalone
• Proven stable
• Performs well
12. Out of the box
• indexer - utility which creates full-text indexes
• searchd - daemon which enables external software to search full-text indexes
Amongst other things
13. Using with your app
Two Ruby on Rails APIs
• Ultra Sphinx
• Thinking Sphinx
- Welcome
- Beer and pizza sponsored by Brightbox
- Please consider talking!
-- So why not just use the DB?
- Building SQL in code makes it easy to introduce mistakes (in with image)
- Someone has already handled the hard stuff: stop-word removal, stemming, that sort of thing
- DBs are traditionally the hardest element of a stack to scale, so let’s not put more stuff there. This is one of the main points.
- Luckily there are a bunch of alternatives, next slide
…and Sphinx is one
- Standalone: runs as a separate process
- written in C++, small memory footprint
- stable
- high indexing speed (up to 10 MB/sec on modern CPUs)
- high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
- high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
- supports distributed searching (since v.0.9.6)
- Most importantly… read
- Don’t let anyone try to convince you otherwise with shady propaganda
Installing Sphinx is easy
- indexer, builds indexes
- searchd, where the magic happens
- Not much use to us unless we can use it with our applications; we have two choices
- Both widely used at EY
- differences