Ferret
A Ruby Search Engine
  Brian Sam-Bodden
Agenda

• What is Ferret?
• Concepts
• Fields
• Indexing
• Installing Ferret
• The Recipe
• Documents
• Ferret::Index::Index
• FQL
• Ferret in your App
• Ferret in Rails
• Resources
What is Ferret?

• Information Retrieval (IR) Library
• Full-featured Text Search Engine
• Inspired by the Apache Lucene search engine
• Ported to Ruby by David Balmain
What is Ferret?

• Initially a 100% pure Ruby port
• Since 0.9 many core functions are
  implemented in C

• Fast! Now Faster than Lucene ;-)
Concepts

• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
• Term : A text string, keyed by field name
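The four concepts nest inside one another. The plain-Ruby sketch below models them with ordinary arrays and a Struct, purely as an illustration: `Term`, `make_field`, and `TOY_INDEX` are hypothetical names, not Ferret classes, and the whitespace tokenizing is a simplification.

```ruby
# Toy model of the four concepts in plain Ruby (not Ferret's own classes).

# Term: a text string keyed by a field name.
Term = Struct.new(:field, :text)

# Field: a named sequence of terms (naive whitespace tokenizing, for illustration).
def make_field(name, text)
  [name, text.downcase.split.map { |word| Term.new(name, word) }]
end

# Document: a sequence of fields. Index: a sequence of documents.
TOY_INDEX = [
  [make_field(:title, "Ferret Intro"), make_field(:body, "A Ruby search engine")],
  [make_field(:title, "Lucene"), make_field(:body, "A Java search engine")]
].freeze

TOY_INDEX.each_with_index do |doc, doc_id|
  doc.each { |name, terms| puts "doc #{doc_id} #{name}: #{terms.map(&:text).inspect}" }
end
```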
Fields of a Document in an Index
• Fields are individually searchable units that can be:
  • Stored: the original text of the field is kept in the index
  • Indexed: inverted so that all Documents containing any of its Terms can be found rapidly
  • Tokenized: the text is split into individual Terms, and each Term is indexed
  • Vectored: the frequency and location of each Term are stored
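To make the four properties concrete, here is a toy sketch in plain Ruby (not how Ferret stores anything internally): it keeps the original text (stored), splits it on whitespace (tokenized), builds a term-to-documents map (indexed), and records each term's positions per document (vectored). All names are hypothetical.

```ruby
# Toy illustration of the four field properties (plain Ruby, not Ferret).
FIELD_TEXTS = ["the quick fox", "the lazy dog and the fox"].freeze

STORED = FIELD_TEXTS.dup                   # Stored: original text kept for display
INVERTED = Hash.new { |h, k| h[k] = [] }   # Indexed: term -> ids of docs containing it
VECTORS = []                               # Vectored: per-doc term -> positions

FIELD_TEXTS.each_with_index do |text, doc_id|
  tokens = text.split                      # Tokenized: text split into individual terms
  tokens.uniq.each { |term| INVERTED[term] << doc_id }
  vector = Hash.new { |h, k| h[k] = [] }
  tokens.each_with_index { |term, pos| vector[term] << pos }
  VECTORS << vector
end

p INVERTED["fox"]    # ids of the documents containing "fox"
p VECTORS[1]["the"]  # positions of "the" in document 1; its frequency is the count
```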
It’s all about Indexing

• Indexing is the process of turning a source document into plain-text tokens that Ferret can manipulate
• For non-plain-text sources such as PDF, Word, or Excel files, you need to:
  • Extract the text
  • Analyze it
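The analysis half of that pipeline can be sketched in a few lines of Ruby. This assumes the text has already been extracted by some external tool, and the stop-word list and token pattern are illustrative assumptions, not Ferret's built-in analyzer.

```ruby
# Minimal analysis step: assumes the plain text was already extracted from
# the PDF/Word/Excel source by some external tool (not shown here).
# The stop-word list and the token pattern are assumptions of this sketch.
STOP_WORDS = %w[a an and is of the to].freeze

def analyze(text)
  text.downcase
      .scan(/[a-z0-9]+/)                          # keep runs of letters/digits as tokens
      .reject { |token| STOP_WORDS.include?(token) } # drop common words
end

p analyze("The Ferret: a Ruby port of Lucene!")
# => ["ferret", "ruby", "port", "lucene"]
```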
Installing Ferret

gem install ferret

• Pick the latest version for your platform
The Recipe

1. Create some Documents

2. Create an Index

3. Add Documents to the Index

4. Perform some Queries
Example Documents
 Create some Documents

 "Any String is a Document"

 ["This", "is also", "a document"]
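The slides show that a plain String and an Array of Strings can both act as documents, and a later slide adds a Hash of field => value. A minimal sketch of how such shapes might be normalized into fields follows; the `:content` field name and the choice to join Array elements into one string are assumptions of this sketch, not Ferret's documented behavior.

```ruby
# Hypothetical field name used only by this sketch for plain-string documents.
DEFAULT_FIELD = :content

# Normalize the document shapes from the slides into a Hash of field => text.
# Joining an Array into one string is a simplification made here.
def to_fields(doc)
  case doc
  when String then { DEFAULT_FIELD => doc }
  when Array  then { DEFAULT_FIELD => doc.join(" ") }
  when Hash   then doc.transform_values(&:to_s)
  else raise ArgumentError, "unsupported document: #{doc.inspect}"
  end
end

p to_fields("Any String is a Document")
p to_fields(["This", "is also", "a document"])
p to_fields({ first: "Bob", last: "Smith" })
```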
Ferret::Index::Index
 Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• An Index can be persistent (stored on disk)
 ➡ index = Ferret::I.new(:path => '/somepath')
• Or it can live completely in memory
 ➡ index = Ferret::I.new
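The difference between a persistent and an in-memory index can be imitated with a toy class that writes its documents to a file only when given a path. `FileBackedIndex` and its Marshal file format are hypothetical; Ferret's on-disk index format is entirely different.

```ruby
require "tmpdir"

# Toy index holding document strings; persisted with Marshal only when a
# path is given. The class and file format are assumptions of this sketch.
class FileBackedIndex
  attr_reader :docs

  def initialize(path: nil)
    @path = path
    @docs = (path && File.exist?(path)) ? Marshal.load(File.binread(path)) : []
  end

  def <<(text)
    @docs << text
    File.binwrite(@path, Marshal.dump(@docs)) if @path  # persistent: write through to disk
    self
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "toy.idx")
  FileBackedIndex.new(path: path) << "kept on disk"
  p FileBackedIndex.new(path: path).docs  # reopened from disk: ["kept on disk"]
  FileBackedIndex.new << "kept in memory"
  p FileBackedIndex.new.docs              # a fresh in-memory index starts empty: []
end
```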
Ferret::Index::Index
 Adding Documents to the Index

• Index provides the add_document method
• It also provides the << alias
• Adding documents is then as easy as:
 ➡ index << "This is a document"
 ➡ index << {:first => "Bob", :last => "Smith"}
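The `add_document`/`<<` pairing is plain Ruby aliasing. A toy class (hypothetical, far simpler than Ferret's Index) shows how one method can accept both a String and a Hash and expose `<<` as an alias; the `:content` field for plain strings is an assumed name.

```ruby
# Toy index illustrating an add_document method with a << alias
# (plain Ruby; Ferret's real Index does much more). The :content field
# used for plain strings is a hypothetical name chosen for this sketch.
class ToyIndex
  attr_reader :docs

  def initialize
    @docs = []
  end

  def add_document(doc)
    @docs << (doc.is_a?(Hash) ? doc : { content: doc.to_s })
    self  # returning self lets calls be chained: index << a << b
  end
  alias_method :<<, :add_document
end

index = ToyIndex.new
index << "This is a document"
index << { first: "Bob", last: "Smith" }
p index.docs
```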
Ferret::Index::Index
 Perform some Queries

• Index provides the search and search_each methods
• The search method takes a query and an optional hash of options:
 ➡ search(query, options = {})
• The search_each method yields each hit's document number and score to a block:
 ➡ search_each(query, options = {}) {|doc, score| ... }
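The shapes of `search` and `search_each` can be mimicked in plain Ruby to show what the block receives. In this toy, a document's score is just the number of its tokens that match a query term, and the `:limit` option (default 10) is an assumption of the sketch; Ferret's real scoring and option set are richer.

```ruby
# Toy index with the same call shapes as the slide:
#   search(query, options = {}) and search_each(query, options = {}) { |doc, score| ... }
# Scoring is simply "how many tokens of the document match a query term",
# and the :limit option is an assumption of this sketch.
class MiniIndex
  def initialize(texts)
    @docs = texts.map { |text| text.downcase.split }
  end

  # Returns [doc_id, score] pairs, best score first, ties broken by doc_id.
  def search(query, options = {})
    terms = query.downcase.split
    hits = @docs.each_with_index.map { |tokens, doc_id|
      score = tokens.count { |token| terms.include?(token) }
      [doc_id, score] if score > 0
    }.compact
    hits.sort_by { |doc_id, score| [-score, doc_id] }.first(options.fetch(:limit, 10))
  end

  # Yields each hit's document number and score to the block.
  def search_each(query, options = {})
    search(query, options).each { |doc_id, score| yield doc_id, score }
  end
end

index = MiniIndex.new(["Ruby search engine", "Java search engine", "Ruby on Rails"])
index.search_each("ruby search") { |doc, score| puts "doc #{doc} scored #{score}" }
```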
Playing with Ferret in irb
Ferret Query Language

• FQL, Ferret’s own query language, is a powerful way to specify search queries
• FQL supports many query types, including:
  • Term
  • Phrase
  • Field
  • Boolean
  • Range
  • Wildcard
  • Fuzzy
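Two of the less obvious query types, wildcard and fuzzy, boil down to how a query term is matched against the terms in the index. The sketch below assumes `*`/`?` wildcard semantics and a fixed maximum edit distance for fuzzy matching; both are simplifications for illustration, not Ferret's matcher, which may be parameterized differently.

```ruby
# Wildcard: '*' matches any run of characters and '?' exactly one character.
def wildcard_match?(pattern, term)
  body = Regexp.escape(pattern).gsub("\\*", ".*").gsub("\\?", ".")
  Regexp.new("\\A#{body}\\z").match?(term)
end

# Levenshtein edit distance, computed one row at a time.
def edit_distance(a, b)
  prev = (0..b.size).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Fuzzy: accept terms within a maximum number of edits (threshold assumed).
def fuzzy_match?(query, term, max_edits = 1)
  edit_distance(query, term) <= max_edits
end

terms = %w[ferret ferrets ferry feret rails]
p terms.select { |t| wildcard_match?("fer*", t) }
p terms.select { |t| fuzzy_match?("ferret", t) }
```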
Index.explain

• The explain method of Index describes how a document scores against a query
  • Very useful for debugging
  • and for learning how Ferret works
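To get a feel for the kind of breakdown explain produces, here is a toy scorer that reports its own arithmetic. The formula is loosely modeled on Lucene's classic tf-idf similarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))); Ferret's real explanations may include further factors such as field norms and boosts, so treat this only as an illustration.

```ruby
# Toy score explanation in the spirit of Index#explain (not Ferret's scorer).
CORPUS = [%w[ruby search engine], %w[java search engine], %w[ruby ruby rails]].freeze

def explain_term(term, doc_id)
  freq = CORPUS[doc_id].count(term)
  doc_freq = CORPUS.count { |tokens| tokens.include?(term) }
  tf = Math.sqrt(freq)                                  # term frequency factor
  idf = 1 + Math.log(CORPUS.size.to_f / (doc_freq + 1)) # rarer terms weigh more
  score = tf * idf
  explanation = [
    format("%.4f = tf(%.4f) * idf(%.4f)", score, tf, idf),
    format("  tf  = sqrt(freq=%d)", freq),
    format("  idf = 1 + ln(numDocs=%d / (docFreq=%d + 1))", CORPUS.size, doc_freq)
  ].join("\n")
  [score, explanation]
end

_score, explanation = explain_term("ruby", 2)
puts explanation
```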
Ferret in your App

[Diagram: an Application gathers data from a database, the file system, the web, and manual input, and indexes the resulting documents into a Ferret index. It gets the user's query, searches the index, and presents the search results.]
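The flow in the diagram can be sketched end to end in plain Ruby. The three data sources are hypothetical stand-ins (simple lambdas) for a database, the file system, and manual input, the toy inverted index takes the place of Ferret, and multi-term queries use AND semantics by assumption.

```ruby
# Hypothetical data sources; each returns an array of document strings.
SOURCES = {
  database: -> { ["Order 17 shipped to Bob Smith"] },
  file_system: -> { ["README: Ferret is a Ruby search engine"] },
  manual_input: -> { ["Note: call Bob about Ferret"] }
}.freeze

# Gather data, then index documents (toy inverted index: term -> doc ids).
def build_index(sources)
  docs = sources.values.flat_map(&:call)
  inverted = Hash.new { |h, k| h[k] = [] }
  docs.each_with_index do |text, id|
    text.downcase.scan(/\w+/).uniq.each { |term| inverted[term] << id }
  end
  [docs, inverted]
end

# Take the user's query, search the index (AND of all terms), present results.
def run_query(query, docs, inverted)
  ids = query.downcase.scan(/\w+/).map { |t| inverted.fetch(t, []) }.reduce(:&) || []
  ids.map { |id| "#{id}: #{docs[id]}" }
end

docs, inverted = build_index(SOURCES)
puts run_query("bob ferret", docs, inverted)
```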
Ferret in Rails

• Acts As Ferret is an ActiveRecord extension
• Available as a Rails plugin
• Provides a simplified interface to Ferret
• Maintained by Jens Krämer
Ferret in Rails

• Adding an index to an ActiveRecord
  model is as simple as:
Ferret in Rails
• A simple model with two searchable fields, title and body:
Ferret in Rails

• After a quick rake db:migrate, we now have some data to play with
• Fire up the Rails console and let’s see what acts_as_ferret can do for our models
Want more?

• Ferret is improving constantly
• Acts As Ferret seems to catch up
  quickly

• Real-life usage seems to require some
  good engineering on your part

  • Background indexing
  • Hot swap of indexes?
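Background indexing and index hot-swapping are general patterns rather than Ferret or acts_as_ferret features. A minimal sketch, assuming a single worker thread: writers enqueue documents, the worker rebuilds a fresh index (from scratch each time, for simplicity), and readers keep using the old index until the new one is swapped in under a mutex. `BackgroundIndexer` is a hypothetical name.

```ruby
# Generic background-indexing pattern (not a Ferret or acts_as_ferret API).
class BackgroundIndexer
  def initialize
    @queue = Queue.new
    @lock = Mutex.new
    @docs = []                   # only touched by the worker thread
    @live = {}.freeze            # the index readers see: term -> doc ids
    @worker = Thread.new { run }
  end

  def enqueue(text)
    @queue << text
  end

  # Readers grab the current index under the lock, then search it freely.
  def search(term)
    @lock.synchronize { @live }.fetch(term.downcase, [])
  end

  # Ask the worker to finish the queued work and wait for it.
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  def run
    while (text = @queue.pop) != :stop
      @docs << text
      fresh = Hash.new { |h, k| h[k] = [] }
      @docs.each_with_index do |doc, id|
        doc.downcase.split.uniq.each { |t| fresh[t] << id }
      end
      fresh.default_proc = nil
      @lock.synchronize { @live = fresh.freeze }  # hot swap of the index
    end
  end
end

indexer = BackgroundIndexer.new
indexer.enqueue("Ruby search engine")
indexer.enqueue("Ferret for Ruby")
indexer.shutdown
p indexer.search("ruby")
```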
Want more?

• We only covered the simplest
  constructs in Ferret

• Ferret’s API is flexible enough for even the most demanding search needs
Online Resources

• http://ferret.davebalmain.com
• http://lucene.apache.org
• http://lucenebook.com
• http://projects.jkraemer.net/acts_as_ferret
In-Print Resources
Thanks!
