ZendCon 2010 - Building Intelligent Search Applications with Apache Solr and PHP5. This is a presentation on how to create intelligent web-based search applications using PHP 5 and the out-of-the-box features available in Solr 1.4.1. After we finish illustrating how to add, update and remove data from the Solr index, we will discuss how to add features such as auto-completion, hit highlighting, faceted navigation and spelling suggestions.
Building Intelligent Search Applications with Apache Solr and PHP5
1. Israel Ekpo | Walt Disney Parks and Resorts Online
Building Intelligent Search
Applications with Solr 1.4.1
and PHP 5
2. About the Presenter
Husband (beautiful wife June)
Father (handsome son Joshua)
Sr. Software Engineer at Walt Disney Parks and Resorts Online
Resides in Orlando, FL
Open Source Contributor to Apache Solr / Apache Lucene Projects
Author of Apache Solr PECL Extension
Email : iekpo@php.net
Twitter : @israelekpo
Website : http://www.israelekpo.com
3. Summary
Why Create Search Applications?
What Solr is
What Solr is not
Why choose Apache Solr?
What Features Solr Has to Offer in Current Release 1.4.1
Taking Advantage of these features and more
Using Apache Solr via PHP 5
How do we make Search Applications Intelligent?
Additional Topics (Nutch, Local Solr, Bitwise Filtering, Plugins)
Links to slides, sample code
Upcoming Features
Where to get help with Apache Solr
4. Reasons to Create Search Applications
Users have been spoiled by Google and other search engines.
Users are used to navigation using a search box.
If unable to locate information immediately, they assume it's not there.
Less patience when attempting to look for information
Certain customer service tools may not have relevant navigation
Information needs to be located immediately
5. Reasons to Create Search Applications
Reduce the amount of time it takes to locate information
Make products and services more accessible to customers
Increase time spent by visitors on web application
Save time, Save money
Increase employee efficiency (CRM applications)
Improve user experience on web application
Increase revenue and increase profit
6. About Solr
Solr is written in Java
Standalone full text Search Server within servlet container
Uses the information retrieval library Lucene at its Core
Joined Apache Incubator in January 2006
First major release in December 2006
Latest stable release is 1.4.1 - June 2010
Next major release 4.0
http://lucene.apache.org/solr/
7. What Solr is NOT
Solr is not a wrapper around Lucene
Solr is not an RDBMS
8. What version should I use?
I am confused …
The current version of Apache Solr is 1.4.1
Why are there both a 1.5 and a 3.x anyway?
Not to mention a 4.x?
1.5 predates the Lucene/Solr merge (very unlikely to ever be released)
3.1 is the next Lucene/Solr point release (3x branch in svn)
4.0 is the next major release (trunk in svn)
9. Why Choose Apache Solr?
• Solr is FREE and Open Source (released under Apache License)
• Advanced Full Text Search Capabilities absent in an RDBMS
• Optimized for High Volume Web Traffic
• Standards Based Open Interfaces - XML, JSON and HTTP
• Comprehensive HTML Administration Interfaces
• Server statistics exposed over JMX for monitoring
• Scalability - Efficient Replication to other Solr Search Servers
• Flexible and Adaptable with XML configuration
• Extensible Plugin Architecture
• Because a Voice in my head told me to
• All of the above
10. Features in Apache Solr 1.4.1
A Real Data Schema - Numeric Types, Dynamic Fields, Unique Keys
Hit Highlighting, Spelling Suggestions, Auto Suggests
Faceted Search and Filtering
Advanced, Configurable Text Analysis
Highly Configurable and User Extensible Caching
External Configuration via XML
An Administration Interface
Monitorable Logging
Fast Incremental Updates and Index Replication
Highly Scalable Distributed search - sharded index across multiple hosts
XML, CSV/delimited-text update formats
Rich Document Indexing - PDF, Word, HTML using Apache Tika
Multiple search indices
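As a rough illustration of the XML update format listed above, here is a minimal Python sketch that builds the `<add>` and `<delete>` messages Solr accepts on its update handler. The field names (`id`, `title`, `cat`) and the helper names are assumptions made up for this example; real field names must match your schema.xml, and the messages only take effect after a commit.

```python
import xml.etree.ElementTree as ET


def build_add_xml(documents):
    """Serialize a list of {field: value} dicts into a Solr <add> message."""
    add = ET.Element("add")
    for document in documents:
        doc = ET.SubElement(add, "doc")
        for name, value in document.items():
            # A list value becomes repeated <field> elements (multi-valued field).
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def build_delete_xml(doc_id):
    """Build a <delete> message that removes one document by its unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical document; the field names are illustrative only.
    print(build_add_xml([{"id": "1", "title": "Fast & Free", "cat": ["a", "b"]}]))
    print(build_delete_xml(1))
```

Re-sending a document with the same unique key replaces the earlier copy, which is how updates are typically expressed in this format.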
11. Setting Up and Getting Started
Setup Instructions will be posted here later today
http://www.israelekpo.com/works
Download Links
http://www.apache.org/dyn/closer.cgi/lucene/solr/
http://tomcat.apache.org/download-60.cgi
http://pecl.php.net/package/solr
Documentation and Helpful Information
http://us.php.net/solr
http://lucene.apache.org/solr/tutorial.html
http://wiki.apache.org/solr/
12. Important Directories
conf/
This directory is mandatory and must contain your solrconfig.xml and schema.xml.
Any other optional configuration files would also be kept here.
data/
This directory is the default location where Solr will keep your index, and is used by the replication scripts dealing with snapshots.
You can override this location in the solrconfig.xml
Solr will create this directory if it does not already exist.
lib/
This directory is optional. If it exists, Solr will load any JARs found in this directory and use them to resolve any "plugins" specified in your solrconfig.xml or schema.xml (e.g. Analyzers, Request Handlers, etc.).
bin/
This directory is optional.
It is the default location used for keeping the replication scripts.
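The layout described above can be checked with a small script. This is only a sketch in Python: the function name `check_solr_home` and the shape of the report are assumptions for this example, not part of Solr.

```python
from pathlib import Path

# conf/ must hold these two files; the other directories are optional.
REQUIRED_CONF_FILES = ("solrconfig.xml", "schema.xml")
OPTIONAL_DIRS = ("data", "lib", "bin")


def check_solr_home(solr_home):
    """Report which expected entries exist under a Solr home directory."""
    home = Path(solr_home)
    conf = home / "conf"
    missing = [name for name in REQUIRED_CONF_FILES if not (conf / name).is_file()]
    optional_present = [name for name in OPTIONAL_DIRS if (home / name).is_dir()]
    return {
        "valid": not missing,
        "missing": missing,
        "optional_present": optional_present,
    }


if __name__ == "__main__":
    # "example/solr" is a hypothetical path; point it at your own Solr home.
    print(check_solr_home("example/solr"))
```

A missing data/ directory is not treated as an error here because Solr creates it on startup.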
13. solrconfig.xml – handlers, plugins
Defines data directory for index
Overrides directory for external libs
Defines and configures request handlers
Defines and enables response writers
Defines parameters for autowarming
Used to register additional plugins
14. synonyms.txt
Used for token or keyword substitution during indexing or queries
Source(s) => replacement(s)
colour => color
cheque => check
car, boat, truck => vehicle
dude, guy => man
trousers => pants
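To make the substitution rules concrete, here is a minimal Python sketch that parses `source => replacement` lines and applies them to a token list. It only imitates the explicit-mapping behaviour shown on the slide; it is not Solr's synonym filter, and the function names are made up for this example.

```python
def parse_synonym_rules(lines):
    """Parse 'a, b => c' rules into a {source: [replacements]} mapping."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and lines without an explicit mapping.
        if not line or line.startswith("#") or "=>" not in line:
            continue
        left, right = line.split("=>", 1)
        replacements = [t.strip() for t in right.split(",") if t.strip()]
        for source in left.split(","):
            source = source.strip()
            if source:
                mapping[source] = replacements
    return mapping


def apply_synonyms(tokens, mapping):
    """Replace each token that has a rule; pass other tokens through."""
    result = []
    for token in tokens:
        result.extend(mapping.get(token, [token]))
    return result


if __name__ == "__main__":
    rules = parse_synonym_rules([
        "colour => color",
        "car, boat, truck => vehicle",
    ])
    print(apply_synonyms(["red", "colour", "truck"], rules))
```

Because the same rules apply at index time and at query time, a search for "truck" and a document containing "car" both end up on the token "vehicle".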
22. Solr Query Syntax
Solr Uses and Extends the Lucene Query Syntax (Superset of Lucene Syntax)
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
http://wiki.apache.org/solr/SolrQuerySyntax
+Required, Optional (no prefix), -Prohibited
Booleans
Free AND Fast
+Free +Fast
+title:Fast AND -body:dollars
Range Queries
[* TO 500]
{300 TO *}
cost:[* TO 299.99]
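Characters such as `+`, `:`, `[` and spaces must be URL-encoded when a query is sent over HTTP. The Python sketch below builds a select URL from a query string and optional filter queries. The base URL `http://localhost:8983/solr` and the function name are assumptions for illustration; `q`, `fq`, `rows` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, filter_queries=(), rows=10):
    """Build a URL for Solr's select handler with properly encoded params."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # fq may be repeated; each filter query narrows the result set further.
    params.extend(("fq", fq) for fq in filter_queries)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr instance; adjust host, port and path as needed.
    url = build_select_url(
        "http://localhost:8983/solr",
        "+title:Fast -body:dollars",
        filter_queries=["cost:[* TO 500]"],
    )
    print(url)
```

Leaving the encoding to `urlencode` avoids the common mistake of sending a raw `+`, which a server would decode as a space.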
34. Taking advantage of features
Reasons to Use Spellchecker
• Alerts user of possible mistaeks in search kewyords
• Helps user in finding what they are looking for even when they don’t know how to spell it
• Helps in suggesting alternate queries that could provide better results
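As a sketch of how a client might ask for spelling suggestions, the Python snippet below assembles the request parameters for Solr's spellcheck component. It assumes the request handler you query has that component enabled in solrconfig.xml; the function name is made up for this example.

```python
from urllib.parse import urlencode


def build_spellcheck_query(keywords, count=5):
    """Return an encoded query string that requests spelling suggestions."""
    params = {
        "q": keywords,
        "spellcheck": "true",              # turn the component on for this request
        "spellcheck.count": str(count),    # maximum suggestions per misspelled term
        "spellcheck.collate": "true",      # also ask for one rewritten full query
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_spellcheck_query("mistaeks kewyords"))
```

The collated query is what a "Did you mean ...?" link would typically point to.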
39. Taking advantage of features
Why use auto suggest?
• Helps users to finish their thoughts or complete search phrase
• Helps reduce number of “no matches found” experiences
• Provides “mind-reader” experience to certain users.
• May propose alternate search phrases that are more useful
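One way to back an auto-suggest box is a prefix lookup through Solr's terms component. The Python sketch below builds such a request. It assumes a `/terms` handler is configured, that the base URL is `http://localhost:8983/solr`, and that the hypothetical field `name` is lowercased at index time; all three are illustrative assumptions.

```python
from urllib.parse import urlencode


def build_terms_url(base_url, field, prefix, limit=10):
    """Build a terms-component URL that lists indexed terms by prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,                 # field whose indexed terms are listed
        # Assumption: terms were lowercased at index time, so match that here.
        "terms.prefix": prefix.lower(),
        "terms.limit": str(limit),
        "wt": "json",
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


if __name__ == "__main__":
    print(build_terms_url("http://localhost:8983/solr", "name", "Orl"))
```

A front end would call this on each keystroke, usually with a short delay, and show the returned terms under the search box.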
48. Taking advantage of features
Why Facets and Filter Queries?
Allows users to narrow down humongous result set
Creates visual classification or categorization of result set
Gives user an idea of number of hits per category
Improves overall search experience
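The Python sketch below shows one possible shape for a faceted request and for reading back the per-category hit counts. It assumes the JSON response writer's default layout, in which each facet field arrives as a flat list alternating value and count. The field name `cat` and the function names are hypothetical.

```python
def facet_params(query, facet_fields, filter_queries=()):
    """Return request parameters that enable field faceting."""
    params = [("q", query), ("facet", "true"), ("facet.mincount", "1")]
    params += [("facet.field", field) for field in facet_fields]
    # Each fq narrows the results, e.g. after the user clicks a category.
    params += [("fq", fq) for fq in filter_queries]
    return params


def pair_facet_counts(flat_counts):
    """Turn ['a', 3, 'b', 1] into {'a': 3, 'b': 1}."""
    return dict(zip(flat_counts[0::2], flat_counts[1::2]))


if __name__ == "__main__":
    print(facet_params("camera", ["cat"], filter_queries=["cat:electronics"]))
    # Hypothetical facet data in the assumed flat-list layout.
    print(pair_facet_counts(["electronics", 14, "memory", 3]))
```

The counts give the "hits per category" shown next to each link, and clicking a link adds the matching `fq` to the next request.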
49. Additional Topics
Using Nutch and Solr – crawling and indexing intranet sites
http://nutch.apache.org/
Local Solr – filtering based on proximity
https://issues.apache.org/jira/browse/SOLR-773
Bitwise Filtering on Integer Fields
https://issues.apache.org/jira/browse/SOLR-1913
Using Different Response Writer for PHP
http://us.php.net/manual/en/solrclient.setresponsewriter.php
https://issues.apache.org/jira/browse/SOLR-1967
50. Upcoming Features
Apache Solr
• Local Solr
• Results Grouping
• Field Collapsing
PECL Extension
• Ability to Send Custom Requests to Custom URLs other than select, update, terms etc.
• Ability to add files (pdf, office documents etc)
• Windows version of latest releases.
• Ensuring that SolrQuery::getFields(), SolrQuery::getFacets() et al. return an array consistently.
• Lowering the minimum required libxml version to 2.6.16
51. Where to get help
Solr Wiki
http://wiki.apache.org/solr/
Solr Mailing Lists
solr-user@lucene.apache.org (send message)
solr-user-subscribe@lucene.apache.org (subscribe)
PECL Extension Documentation on PHP.net
http://www.php.net/solr
Additional Resources
http://wiki.apache.org/solr/SolrResources
Articles on LucidImagination.com – many articles from experts in the community.
53. WDPRO is Hiring
Walt Disney Parks and Resorts Online is hiring for the following positions
http://wdpro.jobs
• Software Architects
• Software Engineers
• Web Developers
• Automation Engineers
• System Engineers
• Release Engineers
• Quality Assurance Engineers
• Business Analysts
• Project Managers
• Technology Managers
Email your resume -> iekpo@php.net
mail("iekpo@php.net", "I am interested! Hook me up!", "resume , contact info");
54. Feedback and Links
Attendee Evaluation and Comments
http://joind.in/2261
Link to Slides
http://slidesha.re/bAXNF3
Sample Code (will be posted later)
http://www.israelekpo.com/works
Email
iekpo@php.net