Searching With
Thinking Sphinx



                    Dan Pickett
I Know What You’re Thinking...




           But, No
The Sphinx We’re
 Talking About




    Yes, the Eye is looking at you
What is Full Text,
        Indexed Search?
•   Searches for keyword matches

    •   Think of the SQL LIKE operator on steroids

•   File based index (reduces DB load)

•   Relevance Ranking / Phrase Proximity

•   Two step process

    •   Query the DB and create indices (indexer)

    •   Search against created indices (searchd)
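
The two-step process above can be sketched in plain Ruby. This is a toy illustration, not how Sphinx is implemented: the method names `build_index` and `search` and the sample rows are made up for this example. The point is only that the search step reads the prebuilt index and never goes back to the original rows.

```ruby
# Toy illustration of "index first, then search". All names are hypothetical.

# Step 1 (the "indexer" role): read rows once and build an inverted index
# that maps each term to the ids of the rows containing it.
def build_index(rows)
  index = Hash.new { |hash, term| hash[term] = [] }
  rows.each do |id, text|
    text.downcase.scan(/\w+/).uniq.each { |term| index[term] << id }
  end
  index
end

# Step 2 (the "searchd" role): answer queries from the index alone.
# Returns the ids of rows that contain every term in the query.
def search(index, query)
  terms = query.downcase.scan(/\w+/)
  return [] if terms.empty?
  terms.map { |term| index.fetch(term, []) }.reduce(:&)
end

ROWS = {
  1 => "Carmen Sandiego was last seen in Boston",
  2 => "The detective flew to Boston",
  3 => "Carmen stole the Sphinx"
}

INDEX = build_index(ROWS)

p search(INDEX, "carmen boston")  # prints [1]
p search(INDEX, "boston")         # prints [1, 2]
```

A real engine adds ranking, stemming, and an on-disk format, but the split between a slow build step and a fast lookup step is the same idea.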
Can Haz Search?
   What’s Out There
• Direct SQL
• Ferret
• Solr
• Lucene
• Sphinx
     Every time you integrate Ferret, an angel weeps for you
Courtesy of: Evan Weaver, “Rails Search Benchmarks” 03/17/08




                      ‘Nuff Said
Although angels are known to be emotional characters
UltraSphinx




Also, Evan Weaver likes Thinking Sphinx
Why Sphinx Rocks
• Relevance Ranking and Phrase Proximity
• Active Development
• searchd Daemon doesn’t hog memory
• Delta Indexing
• Fast Indexing + Querying
• Distributed Capability
            You rock too, but Sphinx is cooler
Why TS Rocks
•   Maximizes use of the Riddle Client

    •   Sort modes

    •   Match modes

•   Great support and active community

•   Available as a gem and a plug-in

•   Beautiful Code

•   Pat Allan is the man

          That was mean - I apologize for the burn in the last slide.
                    You are equally as cool as Sphinx
Let’s Play A Game...
        Where the F*ck is Carmen Sandiego?©




                                                   Courtesy: Bob-Rz @ Deviant Art 02/19/07


 “Where the F*ck is Carmen Sandiego?” is a registered trademark of Enlight Solutions, Inc. Well, not really but it sounds cool.
Honestly, though, does anyone ever read the fine print? You should be paying attention to the presentation. On we go...seriously,
                                                        focus people.
Define your Index of Suspects
InstallShield FTL
       Let’s Use Rake
• rake ts:config
• rake ts:in (short for ts:index)
• rake ts:start
• rake ts:stop
• rake ts:restart
Get to Work, Detective
Make Your Arrest




    That was easy...
Additional Features
• Match Modes
• Sort Modes
• Polymorphism
• Field Weighting
• Integration with will_paginate
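
As a rough picture of what field weighting means, here is a minimal Ruby sketch. It is an assumption-laden toy: Sphinx's real ranking is considerably more involved, and `FIELD_WEIGHTS`, `weighted_score`, and the sample records are hypothetical names invented for this example. It only shows the idea that a hit in a heavily weighted field counts for more than a hit in a lightly weighted one.

```ruby
# Toy field weighting. Not Sphinx's ranking formula; names are hypothetical.

# A match in :name counts ten times as much as a match in :description.
FIELD_WEIGHTS = { name: 10, description: 1 }

# Score = sum over fields of (occurrences of the term * field weight).
def weighted_score(record, term, weights)
  weights.sum do |field, weight|
    hits = record.fetch(field, "").downcase.scan(/\w+/).count(term.downcase)
    hits * weight
  end
end

SUSPECT = { name: "Carmen Sandiego",
            description: "Carmen stole the Sphinx; Carmen fled" }
WITNESS = { name: "Detective", description: "Chasing Carmen" }

p weighted_score(SUSPECT, "carmen", FIELD_WEIGHTS)  # prints 12
p weighted_score(WITNESS, "carmen", FIELD_WEIGHTS)  # prints 1
```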
What I Wish I Knew


                                                  Serious Mullet




 Protip: Despite its misleading name, Rockapella does not rock
What I Wish I Knew
    About Integrating TS
•   Sometimes the indexer silently fails

    •   Watch your output

    •   Disregard the Distributed Index warning

•   Use delta indexing

•   Run regular index tasks

•   Use delayed_job or another queue manager to
    handle delta indexing

                  What time is it? Beer o’clock
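
To make the delta indexing advice concrete, here is a small Ruby model of the idea. It is a sketch under stated assumptions, not Thinking Sphinx's implementation: the `ToyIndex` class and its methods are hypothetical. A slow full run rebuilds the main index, a cheap delta step records rows changed since then, and searches consult both, with the delta taking precedence over stale main entries.

```ruby
# Toy model of delta indexing. Class and method names are hypothetical.
class ToyIndex
  def initialize
    @main  = {}  # id => text, rebuilt by the slow, regular full index run
    @delta = {}  # id => text, rows changed since the last full run
  end

  # Full reindex: snapshot every row, then start with an empty delta.
  def full_index(rows)
    @main = rows.dup
    @delta = {}
  end

  # Cheap update for one changed row. This is the step that could be
  # pushed onto a background queue instead of running inline.
  def delta_index(id, text)
    @delta[id] = text
  end

  # Search both indices; delta entries override stale main entries.
  def search(term)
    @main.merge(@delta)
         .select { |_id, text| text.downcase.include?(term.downcase) }
         .keys
         .sort
  end
end
```

In this model, skipping the regular full run means the delta keeps growing, which is why the slide pairs delta indexing with regular index tasks.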
What I Wish I Knew
    About Deploying TS
•   Store PID files in a shared folder

•   Ensure you’ve set proper permissions

•   Set memory limits on indexing

    •   mem_limit option in sphinx.yml

•   For large data sets, indices can be extremely large

    •   Ensure you have a surplus of storage capacity


            Are we done yet? It’s about that time for a beer...
What’s Missing?

• Excerpting
• Strong Facet Support
• Aspell Integration/Spell Check support

       Blah, blah, blah - You must be getting thirsty by now
It’s a Young but
     Awesome Utility
• Clone the source and see for yourself
 • freelancing-god/thinking-sphinx
 • Cucumber test-suite
 • Extremely well architected
• Join the mailing list (Google Groups)
          Did he mention Pat Allan is the man, yet?
Thanks

• Follow me on Twitter
 • www.twitter.com/dpickett
• Check out my blog
 • www.enlightsolutions.com
• Recommend me
Questions?

Thinking Sphinx Talk at Boston.rb
