5. What is Ferret?
• Information Retrieval (IR) Library
• Full-featured Text Search Engine
• Inspired by the Apache Lucene search engine
• Ported to Ruby by David Balmain
6. What is Ferret?
• Initially a 100% pure Ruby port
• Since 0.9, many core functions are implemented in C
• Fast! Now faster than Lucene ;-)
11. Concepts
• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
• Term : A text string, keyed by field name
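The hierarchy above can be sketched in a few lines of plain Ruby. This is only a toy model to make the nesting concrete; it is not how Ferret represents an index internally, and the names `Term`, `make_document` and `INDEX` are made up for this illustration.

```ruby
# Toy model of Index > Document > Field > Term (plain Ruby, not Ferret).

# A term is a text string keyed by the name of the field it came from.
Term = Struct.new(:field, :text)

# A document is a sequence of fields; each field is a named sequence of terms.
def make_document(fields)
  fields.map do |name, text|
    [name, text.downcase.scan(/\w+/).map { |word| Term.new(name, word) }]
  end.to_h
end

# An index is a sequence of documents.
INDEX = [
  make_document(:title => "Ferret Basics", :body => "Ferret is a search library"),
  make_document(:title => "Ruby Notes",    :body => "Ruby is fun")
]

INDEX.each_with_index do |doc, id|
  doc.each do |field, terms|
    puts "doc #{id} #{field}: #{terms.map(&:text).join(', ')}"
  end
end
```

Because a term is keyed by its field name, the word "ferret" in a title and the same word in a body are two different terms in this model.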
17. Fields of a Document in an Index
• Fields are individually searchable units that can be:
• Stored: The original Terms of the field are stored
• Indexed: Inverted to rapidly find all Documents containing any of the Terms
• Tokenized: Individual Terms are extracted and indexed
• Vectored: Frequency and location of Terms are stored
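To make the four options concrete, here is a minimal sketch in plain Ruby. It is a toy, not Ferret's implementation: the sample texts and the names `DOCS`, `STORED`, `INVERTED`, `VECTORS` and `tokenize` are all assumptions made for this illustration.

```ruby
# Toy illustration of stored / indexed / tokenized / vectored (not Ferret).
DOCS = ["the quick brown fox", "the lazy dog", "quick quick fox"]

# Tokenized: the field text is split into individual terms.
def tokenize(text)
  text.downcase.scan(/\w+/)
end

STORED   = {}                                   # Stored: doc id => original text
INVERTED = Hash.new { |h, term| h[term] = [] }  # Indexed: term => ids of docs containing it
VECTORS  = {}                                   # Vectored: doc id => { term => positions }

DOCS.each_with_index do |text, id|
  STORED[id] = text
  vector = Hash.new { |h, term| h[term] = [] }
  tokenize(text).each_with_index do |term, pos|
    INVERTED[term] << id unless INVERTED[term].include?(id)
    vector[term] << pos
  end
  VECTORS[id] = vector
end

p INVERTED["quick"]          # ids of the documents that contain "quick"
p VECTORS[2]["quick"]        # positions of "quick" inside document 2
p VECTORS[2]["quick"].length # frequency of "quick" inside document 2
p STORED[0]                  # original text of document 0
```

The inverted hash answers "which documents contain this term?" without scanning every document, which is the point of indexing a field.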
18. It’s all about Indexing
• Indexing is the processing of a source document into plain-text tokens that Ferret can manipulate
• For non-plaintext sources such as PDF, Word, or Excel, you need to:
• Extract
• Analyze
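The two steps can be sketched as a small pipeline in plain Ruby. This is an illustration under loud assumptions: `extract_text` only strips markup tags as a stand-in (real PDF, Word or Excel files need a dedicated converter), and `analyze` with its `STOP_WORDS` list is a simplified analyzer invented for this example, not one of Ferret's.

```ruby
# Sketch of Extract -> Analyze (plain Ruby; helper names are hypothetical).
STOP_WORDS = %w[a an and is of the to].freeze # illustrative list only

# Extract: reduce the raw source to plain text.
# Stand-in implementation: strip tags and collapse whitespace.
def extract_text(raw)
  raw.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ").strip
end

# Analyze: lowercase, split into tokens, drop stop words.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

raw  = "<h1>The Ferret Library</h1><p>Indexing is the heart of search.</p>"
text = extract_text(raw)
p text
p analyze(text)
```

Only the output of the analysis step, the tokens, is what ends up in the index.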
43. Ferret::Index::Index
Create an Index
• Indexes are encapsulated by the class
➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• Index can be persistent
➡ index = Ferret::I.new(:path => '/somepath')
• Or completely in memory
➡ index = Ferret::I.new()
44. Ferret::Index::Index
Adding Documents to the Index
• Index provides the add_document method
• It also provides the << alias
• Adding documents is then as easy as:
➡ index << "This is a document"
➡ index << {:first => "Bob", :last => "Smith"}
50. Ferret::Index::Index
Perform some Queries
• Index provides the search and search_each methods
• The search method takes a query and an optional set of parameters:
➡ search(query, options = {})
• The search_each method provides an iterator block
➡ search_each(query, options = {}) {|doc, score| ... }
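The call shapes above (add with <<, query with search, iterate with search_each) can be tried out without Ferret installed using a stand-in. `ToyIndex` below is hypothetical and written only for this sketch: it is not Ferret, it matches a single word, and its score is a naive occurrence count rather than Ferret's scoring. The `:limit` option is likewise just part of the toy.

```ruby
# ToyIndex is a hypothetical stand-in for illustration; it is NOT Ferret.
class ToyIndex
  def initialize
    @docs = []
  end

  # Accepts a plain string or a hash of field => text.
  def <<(doc)
    @docs << (doc.is_a?(Hash) ? doc : { :content => doc })
    self
  end

  # Fetch a document by its id (its position in the index).
  def [](id)
    @docs[id]
  end

  # Returns [doc_id, score] pairs, best score first.
  def search(query, options = {})
    word = query.downcase
    hits = @docs.each_with_index.map do |doc, id|
      [id, doc.values.join(" ").downcase.scan(/\w+/).count(word)]
    end
    hits = hits.select { |_, score| score > 0 }.sort_by { |id, score| [-score, id] }
    options[:limit] ? hits.first(options[:limit]) : hits
  end

  # Yields each hit's document id and score to the block.
  def search_each(query, options = {})
    search(query, options).each { |id, score| yield id, score }
  end
end

index = ToyIndex.new
index << "This is a document"
index << { :first => "Bob", :last => "Smith" }
index << "Bob wrote this document about Bob"

index.search_each("bob") do |id, score|
  puts "doc #{id} scored #{score}: #{index[id].inspect}"
end
```

The block form is convenient when each hit is processed as it comes; search is the better fit when the whole result set is needed at once.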
53. Ferret Query Language
• Ferret's own query language, FQL, is a powerful way to specify search queries
• FQL supports many query types, including:
• Term
• Phrase
• Field
• Boolean
• Range
• Wildcard
• Fuzzy
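As a rough guide, one query string per type might look like the following. The strings follow the Lucene-style syntax as recalled from Ferret's documentation, so treat them as assumptions and check them against the query parser of your installed version; the field names `title` and `date` are hypothetical.

```ruby
# Illustrative FQL strings, one per query type (syntax is an assumption;
# verify against your Ferret version). Field names are made up.
FQL_EXAMPLES = {
  :term     => 'ferret',                   # a single word
  :phrase   => '"quick brown fox"',        # words in this exact order
  :field    => 'title:ferret',             # restrict the match to one field
  :boolean  => 'ruby AND rails',           # combine clauses
  :range    => 'date:[20060101 20061231]', # values between two bounds
  :wildcard => 'ferr*',                    # pattern match on a term
  :fuzzy    => 'ferrit~'                   # terms similar to this spelling
}

FQL_EXAMPLES.each do |type, query|
  puts "#{type.to_s.ljust(9)} #{query}"
end
```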
54. Index.explain
• The explain method of Index describes how a document scores against a query
• Very useful for debugging and for learning how Ferret works
56. Ferret in your App
[Diagram: data sources (Database, Web, File System, Manual Input) feed the Application, which runs two flows against the Ferret Index: Gather Data → Index Documents, and Get User’s Query → Search Index → Present Search Results.]
57. Ferret in Rails
• Acts As Ferret is an ActiveRecord
extension
• Available as a plugin
• Provides a simplified interface to
Ferret
• Maintained by Jens Krämer
58. Ferret in Rails
• Adding an index to an ActiveRecord
model is as simple as:
60. Ferret in Rails
• A simple model has two searchable fields, title and body:
61. Ferret in Rails
• After a quick rake db:migrate we now
have some data to play with
• Fire up the Rails Console and let’s see
what acts_as_ferret can do for our
models
63. Want more?
• Ferret is improving constantly
• Acts As Ferret seems to catch up
quickly
• Real-life usage seems to require some
good engineering on your part
• Background indexing
• Hot swap of indexes?
64. Want more?
• We only covered the simplest
constructs in Ferret
• Ferret’s API provides enough
flexibility for the most demanding
searching needs