Guidelines for Managers: What Lucene and Solr Open Source Search can do for Enterprise Search
A Lucid Imagination White Paper
Abstract
Lucene/Solr is an open-source search development environment ideally suited for large-
scale, enterprise search applications. This paper provides some ways to think about your
enterprise search requirements from both technological and economic perspectives,
explains why a Lucene/Solr-based approach can be optimal, and describes how Lucid
Imagination can help you to design, develop, and deploy the necessary search solution.
What Lucene and Solr Open Source Search can do for Enterprise Search
A Lucid Imagination White Paper • April 2009 Page i
Table of Contents
Introduction
Preliminary Considerations
  Know Your Business Requirements
  Know Your Data
  Know Your Users
Advantages of a Lucene/Solr-Based Solution
  Technological Advantages
  Lower Cost, Greater Flexibility
How Lucid Imagination Can Help
Conclusion
Introduction
Markets are conversations. And today, increasingly communications-rich interactions
within companies, and between companies and their stakeholders, are typically preserved
and stored, creating ever larger reserves of documents and data.
Effective access to a company’s data can be a strategic advantage of potentially enormous
value. Email, office documents, databases, customer service chat logs, content management
systems, and other data types representing every form of communication within the company
and with its marketplace continue to grow in electronic form. It’s not just that the
proverbial haystack is growing larger; it also has more types of hay, with many different
types of needles to be found.
At some point, every function in the company needs access to such data, and these needs
can vary significantly across organizations. Search technology can be a standalone system
designed to provide a single point of access to the entirety of a company’s data, irrespective
of location, container format, or owner. Or, it may provide search functionality as a
component within another application. But enabling employees, customers, partners,
investors, and other stakeholders to find the information they need when they need it is the
goal of any enterprise search solution, no matter where it will be deployed or how it will be
used.
This white paper provides some ways to approach choosing and building enterprise search
solutions, and discusses why Lucene/Solr open source search solutions supported by Lucid
Imagination present key advantages. It starts with what must be considered when
presented with an enterprise search problem, discusses some attributes of a Lucene/Solr-
based solution that could be of special significance in selecting a solution strategy, and
concludes by describing how Lucid Imagination can help to design, implement, and support
a solution that meets your organization’s needs.
Lucene and Solr are state-of-the-art search technologies available for free as open source
from The Apache Software Foundation. Lucene is a powerful search library; Solr provides a
platform built on top of Lucene that makes it easy to build Lucene-based applications [1].
Both incarnations are full-featured and have excellent performance, relevancy ranking and
scalability. These technologies are used today by thousands of organizations. They power
substantial and diverse search applications at AOL, CNET, Comcast Interactive Media, IBM,
Netflix, LinkedIn, MySpace and many others. In many instances, Lucene/Solr solutions
regularly index and search tens or hundreds of millions of documents with sub-second
response time.
Lucid Imagination is exclusively dedicated to providing robust commercial support for
Apache Lucene/Solr open source search technology. Our products and services are
designed for enterprises currently using or evaluating Lucene/Solr for their search
solutions.
Preliminary Considerations
It is not unusual to think of the Web the minute search is mentioned, and with good reason:
nowadays, even small companies can have a large Web presence, and most workers and
consumers use the Web every day.
[1] Most organizations use Solr today as their search development platform. As Lucene is the older of the two
technologies, and serves as the core of Solr’s search capabilities, we’ll refer to them together as Lucene/Solr. For
more on the technologies, see http://www.lucidimagination.com.
But even in small companies, web pages typically represent only a fraction of the
important, text-based data to which stakeholders need access. Spreadsheets, slide decks,
PDFs, project management files, electronic design documents, chat logs and e-mail may all
contain information that will be critical in any number of business situations. Similarly,
within even small companies there can be a need to support search usage models not
typically found on Web-centric systems. For example, the ability to conduct collaborative
searches may be critical to productivity in some contexts.
To create an optimal enterprise search solution, it is essential to know your:
• Business requirements: What needs must be met to create competitive advantage
for your enterprise, and how will you know when they are met?
• Available data: What and where is the content you have to work on, and how is it
structured (e.g., does it form a natural “cascade” of sub-classes)?
• Users: What do they need to search, and how will they prefer to search for it?
Along any of these dimensions, there is potentially huge variance from one case to another.
The goal of the discussion here is not to provide an exhaustive checklist of issues. It is
intended rather to suggest the sorts of questions that should be considered at the earliest
possible stage of development.
Know Your Business Requirements
Applications for enterprise search and their associated requirements are as diverse as the
organizations that need them and the data they need to search. However, there are two
characteristics by which any search solution will ultimately be judged:
• Performance. Does the system return results quickly enough to fulfill the
expectations of the critical mass of users? How does it perform under peak loads?
Will the performance scale adequately as usage increases? Is enough known about
probable evolution to build the system in such a way that it will sustain projected
growth with minimum enhancement, let alone wholesale re-structuring?
Additionally, what is the cost associated with obtaining that scale?
• Relevance. How well will the system find the data that the user needs, and how
good a job will it do presenting query results in an appropriate way and in the best
order? What techniques, implicit or explicit, are required to get user assessments of
relevance?
In some cases, additional criteria of success will focus on areas such as system security or
legal and regulatory compliance. However, a focus on optimizing for both performance and
relevance is essential to designing and building an effective search solution.
Know Your Data
As noted above, your search solution can include input data of any type stored in any
format or container type — ranging from project-specific program management files to
sets of database records with relevant unstructured text fields. The better you understand
the data domain of the search system, the more effective your resulting searches will be,
and the higher the probability that your system’s success metrics (starting with
performance and relevance) will be achieved.
By beginning with a data audit, you can gain an understanding of:
• Number and types of documents. How many documents does your system need to
support, and how big are they, both individually and in aggregate? Answers to these
questions will have implications for performance design and planning. Similarly,
knowing about document types, formats, etc., is essential to ensure adequate access
by means of file filtering or other data preparation or pre-processing steps, and so is
crucial both for performance and relevance.
• Key fields. For structured or partially structured data, certain fields may carry more
weight than others in determining relevance. For example, a document’s title may
be assigned a higher relevancy weight than its size.
• Internal information structure. Even less formal documents can have key
structural attributes. To illustrate by example: imagine that the data domain of a
search query includes the unstructured text of consumer-electronics blogs.
Although the text itself is unstructured, the information within it may have a fair
amount of structure, including, for example, names of manufacturers and their
products, product capabilities such as storage or resolution, etc. The structure has a
“shape” to it, from more general to more specific. Thus, a manufacturer’s name can
be associated with many product names, product names with attributes or
capabilities, etc.
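The “key fields” point above can be illustrated with a toy scoring function. This is a minimal sketch in Python and is not Lucene’s actual ranking formula; the field names, the weights, and the sample documents are all hypothetical and chosen only to show how a title match can outweigh matches in the body text.

```python
# Hypothetical per-field weights: a match in the title counts three times
# as much as a match in the body. These values are illustrative only.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def score(doc, term):
    """Sum weighted occurrence counts of a term across the weighted fields."""
    term = term.lower()
    return sum(
        weight * doc.get(field, "").lower().split().count(term)
        for field, weight in FIELD_WEIGHTS.items()
    )


# Made-up sample documents.
docs = [
    {"id": "a", "title": "Camera review", "body": "A long post about lenses."},
    {"id": "b", "title": "Weekend notes", "body": "I bought a camera and a camera bag."},
]

# Rank documents from highest to lowest score for the term "camera".
ranked = sorted(docs, key=lambda d: score(d, "camera"), reverse=True)
for doc in ranked:
    print(doc["id"], score(doc, "camera"))
```

With these assumed weights, the document that mentions the term once in its title ranks ahead of the one that mentions it twice in its body, which is the kind of behavior a data audit helps you decide on.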
Know Your Users
The human dimension of the search solution of course presents the most variables of all.
Who will your users be, and in what roles are they most likely to use the search application,
e.g., consumer, research scientist, salesperson, manager, all of the above? Of equal if not
greater importance than the “who” and “what” of your users is the “how.” Is there a need
for collaborative search or to structure a “work flow” into the search process itself? Is there
a need to provide different levels or types of access to different classes of users? Or will
your application be extracting search results to feed to another application, without
presenting it to users at all?
Advantages of a Lucene/Solr-Based Solution
We discussed above some key success criteria for enterprise search solutions, primarily in
terms of the performance and relevance of the search application. As a business
decision-maker, you may find that useful, but still a little too abstract. You are likely to ask
yourself at some point: How will my enterprise search solution help me either to make or
save money? While it is true that the Lucene and Solr software are free, there’s much more
to it than the attractive price.
Let’s take a closer look at the question of making and saving money using Lucene and Solr,
from the vantage points of both technology and economics.
Technological Advantages
Lucene is the core search library; Solr is the logical starting point for most developers
building search applications with Lucene/Solr technology for their web site, product, or
internal organizational use. Let’s look at how Solr helps you build search, and then how
Lucene executes it.
Solr is a layer of code on top of Lucene that transforms Lucene into an enterprise search
platform, and simplifies programming by extending Lucene to a broad variety of common,
easier-to-use development environments. Key features include:
• Web services. Solr places Lucene over HTTP, allowing programs written in any
language to invoke Lucene search. It provides access via REST-like interfaces, or
from a full array of open-standards-based development environments, languages,
and tools, such as Python, PHP, Ruby, and Ruby on Rails.
• Faceting, which is the grouping of items or search results into categories that let
users drill into search results (or even skip searching entirely) by any value in any
field (for example, choosing different attributes of shoes at Zappos.com, or searching
Wikipedia by sub-articles, or navigating news articles at cnet.com ).
• Easy configuration for managing which fields are indexed, and their
characteristics.
• System administration tools for data loading, index replication, monitoring,
logging and cache management.
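To make the web services and faceting features above more concrete, here is a minimal Python sketch that builds a Solr search URL with faceting switched on and reshapes a facet result. The host, port, and path, the field names, and the sample counts are assumptions for illustration; the sketch also assumes the flat alternating list layout in which Solr's JSON output commonly returns facet counts. It only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumed local Solr address; adjust host, port, and path for a real installation.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(query, facet_fields=(), rows=10, base_url=SOLR_BASE):
    """Build a URL for Solr's select handler with optional field faceting."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"


def facet_counts(flat_list):
    """Turn an alternating [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat_list[::2], flat_list[1::2]))


# Hypothetical field names ("title", "manufacturer", "category").
url = build_select_url("title:camera", ["manufacturer", "category"])
print(url)

# Made-up fragment shaped like the facet section of a JSON response.
sample = ["Canon", 12, "Nikon", 9, "Sony", 4]
print(facet_counts(sample))
```

Because the interface is plain HTTP, the same request could be issued from PHP, Ruby, or any other language with an HTTP client; Python is used here only as an example.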
Lucene, the core search engine, is a Java-based search library available for free as open
source under the Apache License. At the heart of the application’s “search engine,”
Lucene exhibits attributes that enable applications employing it to deliver world-class user
satisfaction. These include:
• Outstanding speed. Supports sub-second performance for most queries.
• Strong relevancy ranking and full results processing. Great out-of-the-box precision
returns the information (documents) that users need without including a lot that
they don’t. These results are presented clearly by relevancy, date, field, or any
document property, and can be sorted by these attributes. Additional supported
features, like highlighting and spell checking, let you extend search interactivity,
making the refinement process easier and more conversational.
• Complete query capabilities. Offers a full array of query methods: keyword,
Boolean and +/- queries, proximity operators, wildcards, fielded searching,
term/field/document weights, find-similar, spell-checking, multi-lingual search, and
more. This means that your search solution can be flexible enough to accommodate
an enormous range of user preferences and data types, from the simplest to the
most complex. And because Lucene and Solr are open source, users can readily tailor
queries to very specific needs.
• Unsurpassed portability. Runs on any platform supporting Java, and indexes are
portable across platforms. You can build an index on Linux, copy it to a Microsoft
Windows machine, and search it there. This makes it easy to leverage advances in
hardware and operating systems while minimizing additional development costs
for faster and better search functionality. There are also open source ports of
Lucene for many languages besides Java, including .NET, C, Python and others.
• Excellent scalability. Scales from document sets of hundreds to hundreds of
millions and beyond.
• Easily manageable, highly flexible deployment options. Enables “shrink-to-fit”
deployments, ranging from single-server to fully distributed, multi-server systems,
with its low overhead indexes and rapid incremental indexing (especially with
versions 2.3 and later).
While no single search technology is best on each of these dimensions for every application,
Lucene is among the best out-of-the-box on all of them.
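Several of the query methods listed above can be shown as short strings in the classic Lucene query parser syntax. The following Python sketch simply lists one example per method and URL-encodes it for use as an HTTP parameter; the field name "title" and all search terms are hypothetical, and the list is a sample rather than a complete catalog of what the parser accepts.

```python
from urllib.parse import urlencode

# Example strings in the classic Lucene query parser syntax.
# The field name "title" and all search terms are hypothetical.
EXAMPLE_QUERIES = {
    "keyword": "camera",
    "boolean": "camera AND (lens OR tripod)",
    "required / prohibited": "+camera -film",
    "proximity": '"digital camera"~5',
    "wildcard": "cam*",
    "fielded": "title:camera",
    "term weight": "camera^2 lens",
}


def as_query_string(query):
    """URL-encode a query so it can be sent as the q parameter of an HTTP request."""
    return urlencode({"q": query})


for label, query in EXAMPLE_QUERIES.items():
    print(f"{label:22} {query:30} {as_query_string(query)}")
```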
Together, Lucene and Solr provide the foundations for a search solution that is fully
capable and functionally complete. When the capabilities and attributes listed above are
essential requirements for your enterprise search needs, Lucene/Solr is a prime candidate
for fulfilling them.
Lower Cost, Greater Flexibility
When evaluating the economic advantages of a Lucene/Solr-based enterprise search
solution, it is useful to consider competing solutions from the perspective of non-recurring
and recurring costs:
• Non-recurring costs: Requirements gathering, system design and specification,
system development (implementation), and testing are all more-or-less non-
recurring costs. Another important element in this set is the cost of software
acquisition (licensing or purchase).
• Recurring costs: The largest contributors here will be on-going technical and
customer support and system administration, management, and maintenance. These
are dependent on many factors including, for example, system size and complexity
and number of users and their level of sophistication.
In both sets, almost all costs are associated with labor, and despite possible assertions to
the contrary, those costs are going to be approximately the same across the competing
products and technologies. All search systems, no matter how they work or who provides
them, require design and specification, development, configuration, deployment, testing,
and on-going support and maintenance.
The only inarguably clear differentiator is the cost of software acquisition. As Lucene and
Solr are open-source software solutions based on open standards and community-driven
development processes, they are free. Assuming all other costs are about equal, therefore,
the open source solution is almost certain to be highly cost efficient.
That, however, is still not the whole story. A Lucene/Solr-based solution can be the most
cost effective as well. With its strong out-of-the-box performance and relevancy ranking;
complete query capabilities; portability, scalability, and manageability characteristics; and
easy-to-use, highly-standard programming interfaces, Lucene/Solr enables you to deploy
exactly the enterprise search functionality required to fulfill completely your customers’
needs.
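The cost argument in this section can be made concrete with a small calculation. The Python sketch below is illustrative only: every figure is a hypothetical placeholder rather than data from this paper, and it follows the section's simplifying assumption that labor-driven costs are the same for both options, so the only difference is the software acquisition fee.

```python
def total_cost(license_fee, build_labor, annual_support, years):
    """Non-recurring costs (license plus build labor) plus recurring costs over a period."""
    return license_fee + build_labor + annual_support * years


# All figures below are hypothetical placeholders, not data from this paper.
BUILD_LABOR = 200_000     # requirements, design, development, testing
ANNUAL_SUPPORT = 80_000   # support, administration, maintenance
YEARS = 3

commercial = total_cost(150_000, BUILD_LABOR, ANNUAL_SUPPORT, YEARS)
open_source = total_cost(0, BUILD_LABOR, ANNUAL_SUPPORT, YEARS)

print("commercial license:", commercial)
print("open source:       ", open_source)
print("difference:        ", commercial - open_source)
```

Under these assumptions the difference equals the license fee exactly; in practice, labor costs may also differ between options, which is why the remaining cost categories still deserve a careful estimate.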
How Lucid Imagination Can Help
Lucid Imagination has the expertise, resources and services you need to drive development
of Lucene/Solr-based enterprise search solutions. We offer a full portfolio of software and
services including:
Certified Distributions. Because Lucene/Solr distributions certified by Lucid Imagination
are tested and commercially supported, they speed up implementation, reduce the
risk of “gotchas”, and eliminate the need for familiarity with the fine points of the
community release process. Tested bug fixes are incorporated in an organized fashion,
so you need not comb through nightly open source community releases or
risk code forks between release cycles. The Get Started program helps users who
download our Certified Distributions with first-time installation, configuration, and basic
usage of Lucene/Solr and included utilities.
Technical Support. Although contemporary open-source solutions are typically at least as
robust and reliable as their commercial counterparts, problems can still arise. Because the
community may not address maintenance issues in a timely fashion, there is no substitute for the
“industrial strength” support provided by Lucid Imagination to ensure your enterprise IT
operation gets timely responses, so it can both meet market-driven development schedules
and maintain stringent service level commitments. Designed for customers with
Lucene/Solr installations, the support subscriptions we offer include:
o Regular updates and upgrades for Lucid-certified versions of Lucene/Solr
o Problem isolation and diagnosis of errors in Lucene/Solr software
o Bug patches and workarounds
o Troubleshooting of use-case issues that may arise
Support subscriptions are available in a variety of packages to fit different maintenance
profiles:
• Basic, for stable deployments that can rely on minimal intervention and can
wait a day for a response.
• Professional, for deployments with quicker response time requirements, featuring
both phone and email support.
• Enterprise, for mission-critical deployments requiring initial response within four
(4) business hours on the same business day, plus an annual Search Health Check
program.
• Advanced Support. Designed for customers with more demanding needs for expert
advice and guidance on an ongoing basis, Advanced Support subscriptions include
the services delivered under Enterprise Technical Support, plus consultative
support to help optimize development and/or deployment efforts. Two options are
available:
o Development Support: As noted above, enterprise search solutions are often
designed for deployment with enormous data domains and stringent user
requirements. Although it may be relatively easy to construct a solution that
works at a basic level of sophistication, when the requirements go beyond
straightforward design goals, we can help you get to a solution that is potentially
many times more capable for a relatively small amount of additional investment.
We help you optimize development of Lucene/Solr enterprise search
applications with reviews of architecture, design, code and configuration, along
with best-of-breed methodologies and powerful tools. Includes one annual
Search Health Check.
o Production Support: For large, relatively complex systems that have data
domains with continuously increasing size and complexity, on-going tuning and
performance enhancement may be critical to ensuring sustained customer
satisfaction. We help you achieve optimal performance and availability for
Lucene/Solr in your production environment. We provide advice on best
practices for configuration, operations, scaling, tuning, and tools, as well as two
annual Search Health Checks.
• Training. Our hands-on training programs in Lucene and Solr technologies help
your staff acquire skills and develop expertise. Training programs are offered as
classroom-based courses, and can be customized for on-site delivery.
• Consulting. Our consulting practice offers flexible-term engagements to assist you
with high value activities such as architecture and design reviews, training,
enablement, and best practices. As our consultants work on a broad variety of
implementations, they are well positioned to recommend optimal approaches to
your business and technical challenges. Their deep domain expertise can be
retained on a project basis, over several months, ad-hoc, or as a subscription.
Consultants are available on a remote basis or for short-term onsite work.
Our customers benefit from the years of collective expertise found in our technical staff,
who are themselves widely recognized leaders in the Lucene/Solr community. By
providing predictable, reliable resources, Lucid Imagination helps you meet your project
feature, function, and schedule requirements. We can help you reduce the risks and capture
the benefits of open source for your enterprise search solution.
We invite you to visit our Website (http://www.lucidimagination.com) for additional
details.
Conclusion
Lucene/Solr-based enterprise search solutions are among the most comprehensive,
complete, robust, and flexible in the world today. Whether you are merely contemplating
an open-source enterprise search solution or already have one deployed, Lucid Imagination
is the one company that is uniquely situated to help ensure that your customers are not
merely satisfied, but delighted in the fulfillment of their enterprise search needs.