andrew.janowczyk@searchbox.com
Solr is
◦ Blazing fast open source enterprise search platform
◦ Lucene-based search server
◦ Written in Java
◦ Has REST-like HTTP/XML and JSON APIs
◦ Extensive plugin architecture
http://lucene.apache.org/solr/
• Solr allows the development of plugins that provide advanced operations
• Types of plugins:
◦ RequestHandlers
▪ Take URL parameters and return their own response
◦ SearchComponents
▪ Their responses are embedded in other responses (such as /select)
◦ ProcessFactory (i.e. UpdateRequestProcessorFactory)
▪ Runs at index time; its output can be stored in a field along with the document
• A quick tutorial on how to program a SearchComponent that will
◦ Be initialized
◦ Parse configuration file arguments
◦ Do something useful on each search request (count some words in the indexed documents)
◦ Format and return a response
• We’ll name our plugin “DemoSearchComponent” and show how to register it in solrconfig.xml so that Solr loads it
• On the next slide, we define a list named “words” whose entries are strings, each named “word”
• We want to load these specific words and then count their occurrences in every document returned by a query
• Example: the config file lists “body”, “fish”, “dog”
◦ An indexed document contains: dog body body body fish fish fish fish orange
◦ The result should be:
▪ body=3.0
▪ fish=4.0
▪ dog=1.0
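The expected counts above can be checked with a few lines of plain Java. This is a minimal sketch that splits on single spaces, as the component's original loop does. The class name `WordCountExample` and its `count` method are illustrative only and are not part of Solr.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WordCountExample {
    // Count how often each configured word occurs in a space-separated text.
    static Map<String, Double> count(List<String> words, String text) {
        Map<String, Double> counts = new LinkedHashMap<String, Double>();
        for (String word : words) {
            counts.put(word, 0.0); // every configured word starts at zero
        }
        for (String token : text.split(" ")) {
            if (counts.containsKey(token)) {
                counts.put(token, counts.get(token) + 1.0);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("body", "fish", "dog");
        Map<String, Double> counts =
                count(words, "dog body body body fish fish fish fish orange");
        System.out.println(counts); // {body=3.0, fish=4.0, dog=1.0}
    }
}
```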
<searchComponent class="com.searchbox.DemoSearchComponent"
                 name="democomponent">
  <str name="field">myfield</str>
  <lst name="words">
    <str name="word">body</str>
    <str name="word">fish</str>
    <str name="word">dog</str>
  </lst>
</searchComponent>
• We tell Solr the name of the class that implements our component
• The variables in this section are passed to the component's init method
• We set a default field whose content will be analyzed
• We specify the list of words we’d like counts for
• Here we ask Solr to load com.searchbox.DemoSearchComponent
• Our project packages this class as a .jar file
• Copy the .jar file into the lib directory of the Solr installation (or any directory referenced by a <lib> directive in solrconfig.xml) so that Solr can find it
• That’s it!
package com.searchbox;
import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.plugin.SolrCoreAware;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
Just some of the common packages we’ll need to import to get things rolling!
public class DemoSearchComponent extends SearchComponent {
    private static final Logger LOGGER =
            LoggerFactory.getLogger(DemoSearchComponent.class);

    // Basic statistics, reported later by getStatistics()
    volatile long numRequests;
    volatile long numErrors;
    volatile long totalRequestsTime;
    volatile String lastnewSearcher;
    volatile String lastOptimizeEvent;

    // Configuration loaded in init()
    protected String defaultField;
    private List<String> words;
• We declare that our class extends SearchComponent, so Solr can use it as a search component
• We keep track of some basic statistics for later reporting
◦ Number of requests/errors
◦ Total processing time
• We add fields to store our defaultField and our words
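One caveat: `numRequests++` on a volatile field is a read-modify-write, so concurrent requests can lose increments. If exact counts matter, an `AtomicLong` is a common alternative. The sketch below assumes that swap. `RequestStats` and its methods are hypothetical names, not part of Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class RequestStats {
    // Atomic counters are safe to update from many request threads at once.
    final AtomicLong numRequests = new AtomicLong();
    final AtomicLong numErrors = new AtomicLong();
    final AtomicLong totalRequestsTime = new AtomicLong();

    void recordRequest(long elapsedMillis) {
        numRequests.incrementAndGet();
        totalRequestsTime.addAndGet(elapsedMillis);
    }

    void recordError() {
        numErrors.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        final RequestStats stats = new RequestStats();
        List<Thread> threads = new ArrayList<Thread>();
        // Simulate 8 threads, each handling 1000 requests that take 2 ms.
        for (int t = 0; t < 8; t++) {
            Thread thread = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) {
                        stats.recordRequest(2);
                    }
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(stats.numRequests.get() + " requests, "
                + stats.totalRequestsTime.get() + " ms"); // 8000 requests, 16000 ms
    }
}
```

With plain `volatile long` fields and `++`, the same program could print fewer than 8000 requests.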
• Initialization happens when the plugin is first loaded
• This most commonly occurs when Solr starts up (or when the core is reloaded)
• At this point we can load things from files (models, serialized objects, etc.)
• We have access to the variables set in solrconfig.xml
• We chose to pass a list called “words” containing the words we’d like to count: “body”, “fish” and “dog”
• During initialization we need to read this list from solrconfig.xml and store it locally
    @Override
    public void init(NamedList args) {
        super.init(args);
        defaultField = (String) args.get("field");
        if (defaultField == null) {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                    "Need to specify the default field for analysis");
        }
        // Collect every <str name="word"> entry inside <lst name="words">
        NamedList wordList = (NamedList) args.get("words");
        if (wordList != null) {
            words = wordList.getAll("word");
        }
        if (words == null || words.isEmpty()) {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                    "Need to specify at least one word in searchComponent config!");
        }
    }
Notice that we load the list “words”, then all of its entries named “word”, and store them in the class-level variable words. We also read our defaultField and throw an exception if the configuration is incomplete.
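Solr's `NamedList` is an ordered list of name/value pairs in which a name may repeat. That is why `getAll("word")` returns every `<str name="word">` entry. The sketch below imitates that behavior with a tiny stand-in class so it runs without Solr on the classpath. `MiniNamedList` is hypothetical and only mimics the `add`, `get` and `getAll` calls used in `init`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.solr.common.util.NamedList.
class MiniNamedList {
    private final List<String> names = new ArrayList<String>();
    private final List<Object> values = new ArrayList<Object>();

    void add(String name, Object value) {
        names.add(name);
        values.add(value);
    }

    // Value of the first entry with this name, or null if absent.
    Object get(String name) {
        int i = names.indexOf(name);
        return i < 0 ? null : values.get(i);
    }

    // Values of every entry with this name, in insertion order.
    List<Object> getAll(String name) {
        List<Object> result = new ArrayList<Object>();
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equals(name)) {
                result.add(values.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the <searchComponent> section of solrconfig.xml.
        MiniNamedList wordList = new MiniNamedList();
        wordList.add("word", "body");
        wordList.add("word", "fish");
        wordList.add("word", "dog");
        MiniNamedList config = new MiniNamedList();
        config.add("field", "myfield");
        config.add("words", wordList);

        String defaultField = (String) config.get("field");
        List<Object> words = ((MiniNamedList) config.get("words")).getAll("word");
        System.out.println(defaultField + " " + words); // myfield [body, fish, dog]
    }
}
```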
• A searchComponent has two phases
◦ Prepare
◦ Process
• During a query, the prepare method is called on all components before any work is done
• This allows modifying, adding or removing parameters or components in the stack
• Afterwards, the process methods are called on the components in the exact order specified in solrconfig.xml
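The two-phase contract can be illustrated with a small driver: every component's `prepare` runs first, then every `process` runs in configuration order. This is a simplified model of what Solr's `SearchHandler` does for a non-distributed request. The `Component` interface and `runRequest` method here are hypothetical and leave out the `ResponseBuilder`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LifecycleSketch {
    // Hypothetical, simplified stand-in for SearchComponent.
    interface Component {
        void prepare(List<String> log);
        void process(List<String> log);
    }

    static Component named(final String name) {
        return new Component() {
            public void prepare(List<String> log) { log.add("prepare:" + name); }
            public void process(List<String> log) { log.add("process:" + name); }
        };
    }

    // All prepare() calls happen before any process() call.
    static List<String> runRequest(List<Component> components) {
        List<String> log = new ArrayList<String>();
        for (Component c : components) {
            c.prepare(log);
        }
        for (Component c : components) {
            c.process(log);
        }
        return log;
    }

    public static void main(String[] args) {
        List<Component> chain = Arrays.asList(named("query"), named("democomponent"));
        System.out.println(runRequest(chain));
        // [prepare:query, prepare:democomponent, process:query, process:democomponent]
    }
}
```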
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // Nothing to prepare for this component
    }
Nothing goes on here, but prepare() is abstract in SearchComponent, so we must implement it; otherwise our class wouldn’t compile.
    @Override
    public void process(ResponseBuilder rb) throws IOException {
        numRequests++;
        SolrParams params = rb.req.getParams();
        long lstartTime = System.currentTimeMillis();
        SolrIndexSearcher searcher = rb.req.getSearcher();
        NamedList<Object> response = new SimpleOrderedMap<Object>();

        // A "field" URL parameter overrides the default from solrconfig.xml
        String field = params.get("field", defaultField);
        if (field == null) {
            LOGGER.error("Fields aren't defined, not performing counting.");
            return;
        }
• We start by incrementing a volatile counter of the requests we’ve seen (for use later in statistics), and we note the start time so we can measure how long processing takes. Incrementing a volatile field is not atomic, so under concurrent load this count is approximate.
• We create a new NamedList to hold this component's response
• We check the URL parameters for a “field” variable; if present, it overrides the default we loaded from the config file
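The override rule boils down to a single lookup with a fallback: use the URL parameter first, then the configured default. In the component itself, `SolrParams.get(name, default)` does this. The sketch below uses a plain `Map` in place of `SolrParams` so it runs standalone. `resolveField` is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;

class FieldResolution {
    // The URL parameter "field" wins; otherwise fall back to the configured default.
    // Returns null only if neither is set.
    static String resolveField(Map<String, String> urlParams, String defaultField) {
        String fromUrl = urlParams.get("field");
        return fromUrl != null ? fromUrl : defaultField;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<String, String>();
        System.out.println(resolveField(params, "myfield")); // myfield
        params.put("field", "title");
        System.out.println(resolveField(params, "myfield")); // title
    }
}
```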
        DocList docs = rb.getResults() == null ? null : rb.getResults().docList;
        if (docs == null || docs.size() == 0) {
            LOGGER.debug("No results");
            return;
        }
        LOGGER.debug("Processing {} docs", docs.size());

        // Load only the unique key and the analyzed field from stored documents
        Set<String> fieldSet = new HashSet<String>();
        SchemaField keyField = rb.req.getCore().getSchema().getUniqueKeyField();
        if (null != keyField) {
            fieldSet.add(keyField.getName());
        }
        fieldSet.add(field);
• Since the search has already been completed, we get the list of documents that will be returned
• We also need to get the field that contains the unique id from the schema. This lets us correlate our results with the rest of the response
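Two things let a client line our section up with the main result list: each document's counts are keyed by its unique id, and they are kept in result order. Solr's `SimpleOrderedMap` preserves insertion order. The sketch below uses a `LinkedHashMap` for the same effect, with hypothetical document data standing in for what `searcher.doc(...)` returns. `countPerDoc` is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CorrelationSketch {
    // For each document (in result order), count occurrences of word and key
    // the count by the document's unique id.
    static Map<String, Integer> countPerDoc(Map<String, String> resultsInOrder, String word) {
        Map<String, Integer> response = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, String> doc : resultsInOrder.entrySet()) {
            int n = 0;
            for (String token : doc.getValue().split(" ")) {
                if (token.equals(word)) {
                    n++;
                }
            }
            response.put(doc.getKey(), n);
        }
        return response;
    }

    public static void main(String[] args) {
        // Hypothetical results: unique id -> stored field text, in rank order.
        Map<String, String> results = new LinkedHashMap<String, String>();
        results.put("doc-2", "fish fish dog");
        results.put("doc-1", "dog");
        System.out.println(countPerDoc(results, "fish")); // {doc-2=2, doc-1=0}
    }
}
```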
        DocIterator iterator = docs.iterator();
        while (iterator.hasNext()) {
            try {
                int docId = iterator.nextDoc();
                HashMap<String, Double> counts = new HashMap<String, Double>();
                Document doc = searcher.doc(docId, fieldSet);
                // getFields() returns every value of a multi-valued field
                IndexableField[] multifield = doc.getFields(field);
                for (IndexableField singlefield : multifield) {
                    String value = singlefield.stringValue();
                    if (value == null) {
                        continue; // skip non-string (e.g. binary) values
                    }
                    // Split on runs of non-word characters so "body." counts as "body"
                    for (String token : value.split("\\W+")) {
                        if (words.contains(token)) {
                            Double oldcount = counts.get(token);
                            counts.put(token, oldcount == null ? 1.0 : oldcount + 1.0);
                        }
                    }
                }
                // Fall back to the internal doc id if the schema has no unique key
                String id = (keyField != null)
                        ? doc.get(keyField.getName()) : String.valueOf(docId);
                NamedList<Double> docresults = new NamedList<Double>();
                for (String word : words) {
                    Double count = counts.get(word);
                    docresults.add(word, count == null ? 0.0 : count); // 0.0 instead of null
                }
                response.add(id, docresults);
            } catch (IOException ex) {
                numErrors++;
                LOGGER.error("Could not load document", ex);
            }
        }
• Get a document iterator to walk through all returned docs
• Set up a count map for this doc
• Load the document through the searcher
• Get the values of the field
• BEWARE: if it is a multi-valued field, getField will only return the first value, not ALL values, so we use getFields
• Do our basic word counting
• Get the document's unique id from the key field
• Add each word's count to the results for the doc
• Add the doc's results to the overall response, using its id value as the name
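The multi-valued warning is easy to demonstrate outside Solr. In this sketch, a field's values are a plain `String...` standing in for what `doc.getFields(field)` returns. Counting only the first value, as `getField` would, misses matches in later values. The sketch also tokenizes on runs of non-word characters, so `body.` in the second sample document counts as `body`; splitting on single spaces would not. `MultiValuedSketch.count` is an illustrative name.

```java
class MultiValuedSketch {
    // Count occurrences of word across all the given field values.
    static int count(String word, String... values) {
        int n = 0;
        for (String value : values) {
            // Split on runs of non-word characters so "body." yields "body".
            for (String token : value.split("\\W+")) {
                if (token.equals(word)) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] values = {"the fish had a small body.", "the dog likes to eat fish"};
        System.out.println(count("fish", values));    // 2: all values
        System.out.println(count("fish", values[0])); // 1: first value only
        System.out.println(count("body", values));    // 1
    }
}
```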
        rb.rsp.add("demoSearchComponent", response);
        totalRequestsTime += System.currentTimeMillis() - lstartTime;
    }
• Add all results to the final response
• The name we pick here is the one that shows up in the Solr output
• Add the time the entire process took to the running total
    @Override
    public String getDescription() {
        return "Searchbox DemoSearchComponent";
    }

    @Override
    public String getVersion() {
        return "1.0";
    }

    @Override
    public String getSource() {
        return "http://www.searchbox.com";
    }
    @Override
    public NamedList<Object> getStatistics() {
        NamedList<Object> all = new SimpleOrderedMap<Object>();
        all.add("requests", "" + numRequests);
        all.add("errors", "" + numErrors);
        all.add("totalTime(ms)", "" + totalRequestsTime);
        return all;
    }
}
• For a production-grade plugin, users expect certain information to be available in their Solr admin panel
• Description, version and source are just Strings
• getStatistics() reports the volatile counters we have been maintaining: it puts them into another NamedList and returns it. These values appear under the statistics panel in Solr.
That’s it!
<requestHandler name="/demoendpoint" class="solr.SearchHandler">
<arr name="last-components">
<str>democomponent</str>
</arr>
</requestHandler>
We need some way to run our searchComponent, so we’ll add a quick requestHandler to test it. We simply declare a new handler that uses the standard SearchHandler class and, through last-components, tell it to run the component we defined on an earlier slide after its default components. Of course, you could also add your component directly to the /select handler and/or to a chain of other components. Solr is super versatile!
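A request URL for the handler can also be built programmatically. This sketch uses `java.net.URLEncoder` to encode the query `*:*`. The host, core name and handler path are the ones used in this tutorial and will differ on your installation. The optional `field` parameter, which our component reads to override the configured default, shows where an override would go; `otherfield` is a hypothetical field name, and `buildUrl` is an illustrative helper.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class DemoEndpointUrl {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Builds a request URL for the /demoendpoint handler; field may be null.
    static String buildUrl(String base, String query, int rows, String field) {
        StringBuilder url = new StringBuilder(base)
                .append("/demoendpoint?q=").append(enc(query))
                .append("&wt=xml&rows=").append(rows)
                .append("&fl=").append(enc("id,myfield"));
        if (field != null) {
            url.append("&field=").append(enc(field));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String base = "http://192.168.56.101:8983/solr/corename";
        System.out.println(buildUrl(base, "*:*", 2, null));
        System.out.println(buildUrl(base, "*:*", 2, "otherfield"));
    }
}
```

`URLEncoder` leaves `*` untouched and encodes `:` as `%3A`, which matches the `q=*%3A*` in the URL used here.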
http://192.168.56.101:8983/solr/corename/demoendpoint?q=*%3A*&wt=xml&rows=2&fl=id,myfield
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">79</int>
</lst>
<result name="response" numFound="13262" start="0">
<doc>
<str name="id">f73ca075-3826-45d5-85df-64b33c760efc</str>
<arr name="myfield">
<str>dog body body body fish fish fish fish orange</str>
</arr>
</doc>
<doc>
<str name="id">bc72dbef-87d1-4c39-b388-ec67babe6f05</str>
<arr name="myfield">
<str>the fish had a small body. the dog likes to eat fish</str>
</arr>
</doc>
</result>
<lst name="demoSearchComponent">
<lst name="f73ca075-3826-45d5-85df-64b33c760efc">
<double name="body">3.0</double>
<double name="fish">4.0</double>
<double name="dog">1.0</double>
</lst>
<lst name="bc72dbef-87d1-4c39-b388-ec67babe6f05">
<double name="body">1.0</double>
<double name="fish">2.0</double>
<double name="dog">1.0</double>
</lst>
</lst>
</response>
• Query results: the <result> block
• Our results: the demoSearchComponent block
• Same order and ids, for correlation
• Because we’ve overridden the getStatistics() method, we can get real-time stats from the admin panel!
• In this case, since it’s a component of the SearchHandler, our statistics are listed together with the handler's other statistics
Happy Developing!
Full Source Code available at:
http://www.searchbox.com/developing-a-solr-plugin/

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Tutorial on developing a Solr search component plugin

  • 2. Solr is:
    ◦ A blazing-fast, open-source enterprise search platform
    ◦ A Lucene-based search server
    ◦ Written in Java
    ◦ Has REST-like HTTP/XML and JSON APIs
    ◦ Has an extensive plugin architecture
    http://lucene.apache.org/solr/
  • 3.
    ▪ Solr allows the development of plugins that provide advanced operations
    ▪ Types of plugins:
      ◦ RequestHandlers
        ▪ Use URL parameters and return their own response
      ◦ SearchComponents
        ▪ Their responses are embedded in other responses (such as /select)
      ◦ ProcessFactory (update request processor factories, UpdateRequestProcessorFactory)
        ▪ The result is stored in a field along with the document at index time
  • 4.
    ▪ A quick tutorial on how to program a SearchComponent to:
      ◦ Be initialized
      ◦ Parse configuration file arguments
      ◦ Do something useful on each search request (count some words in the indexed documents)
      ◦ Format and return a response
    ▪ We’ll name our plugin “DemoSearchComponent” and show how to stick it into solrconfig.xml for loading
  • 5.
    ▪ On the next slide, we’ll specify a list called “words”; each entry in the list is a string named “word”.
    ▪ We want to load these specific words and then count them in every document of each query’s result set.
    ▪ Ex: the config file has “body”, “fish”, “dog”
      ◦ Indexed document has: dog body body body fish fish fish fish orange
      ◦ Result should be:
        ▪ body=3.0
        ▪ fish=4.0
        ▪ dog=1.0
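To make the expected output concrete, here is a minimal plain-Java sketch of the counting rule, with no Solr classes involved. The class and method names (WordCountExample, count) are hypothetical, and the sketch assumes tokens are separated by single spaces, exactly as in the example document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plain-Java sketch of the counting rule; not part of the plugin. */
class WordCountExample {

    /**
     * Counts how often each configured word appears in a space-separated text.
     * Counts are doubles to mirror the 3.0 / 4.0 / 1.0 values in the example.
     */
    static Map<String, Double> count(List<String> words, String text) {
        // LinkedHashMap keeps the configured word order in the output
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.put(word, 0.0); // every configured word gets an entry, even if absent
        }
        for (String token : text.split(" ")) {
            if (counts.containsKey(token)) { // ignore words that were not configured
                counts.put(token, counts.get(token) + 1);
            }
        }
        return counts;
    }
}
```

Calling `WordCountExample.count(List.of("body", "fish", "dog"), "dog body body body fish fish fish fish orange")` returns `{body=3.0, fish=4.0, dog=1.0}`, matching the result above; "orange" is ignored because it is not in the configured list.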
  • 6.
    <searchComponent class="com.searchbox.DemoSearchComponent" name="democomponent">
      <str name="field">myfield</str>
      <lst name="words">
        <str name="word">body</str>
        <str name="word">fish</str>
        <str name="word">dog</str>
      </lst>
    </searchComponent>

    • We tell Solr the name of the class that contains our component
    • Variables will be loaded from this section during the init method
    • We set a default field for analyzing the documents
    • We specify a list of words we’d like to have counts of
  • 7.
    ▪ We can see that we’re asking Solr to load com.searchbox.DemoSearchComponent.
    ▪ This class will be the output of our project, packaged as a .jar file.
    ▪ Copy the .jar file to the lib directory in the Solr installation so that Solr can find it.
    ▪ That’s it!
  • 8.
    package com.searchbox;

    import java.io.IOException;
    import java.util.Date;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.logging.Level;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexableField;
    import org.apache.solr.common.SolrException;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.common.util.SimpleOrderedMap;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.core.SolrEventListener;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.schema.SchemaField;
    import org.apache.solr.search.DocIterator;
    import org.apache.solr.search.DocList;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.util.plugin.SolrCoreAware;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    Just some of the common packages we’ll need to import to get things rolling!
  • 9.
    public class DemoSearchComponent extends SearchComponent {

        private static final Logger LOGGER = LoggerFactory.getLogger(DemoSearchComponent.class);

        volatile long numRequests;
        volatile long numErrors;
        volatile long totalRequestsTime;
        volatile String lastnewSearcher;
        volatile String lastOptimizeEvent;
        protected String defaultField;
        private List<String> words;

    • We specify that our class extends SearchComponent, so we know we’re in business!
    • We decide to keep track of some basic statistics for later use:
      ◦ Number of requests/errors
      ◦ Total time
    • We make variables to store our defaultField and our words.
  • 10.
    ▪ Initialization is called when the plugin is first loaded
    ▪ This most commonly occurs when Solr starts up
    ▪ At this point we can load things from file (models, serialized objects, etc.)
    ▪ We have access to the variables set in solrconfig.xml
  • 11.
    ▪ We have chosen to pass a list called “words”, containing the words we’d like to count: “body”, “fish” and “dog”.
    ▪ During initialization we need to load this list from solrconfig.xml and store it locally.
  • 12.
        @Override
        public void init(NamedList args) {
            super.init(args);
            defaultField = (String) args.get("field");
            if (defaultField == null) {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                        "Need to specify the default field for analysis");
            }
            // A missing <lst name="words"> would otherwise cause a NullPointerException
            NamedList wordList = (NamedList) args.get("words");
            if (wordList == null || wordList.getAll("word").isEmpty()) {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                        "Need to specify at least one word in searchComponent config!");
            }
            words = wordList.getAll("word");
        }

    Notice that we’ve loaded the list “words”, then all of its entries named “word”, and put them into the class-level variable words. We’ve also read our defaultField. Both checks make a misconfigured solrconfig.xml fail fast at startup.
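Why getAll("word") rather than get("word")? Solr’s NamedList is an ordered list of name/value pairs that may repeat a name, and get() returns only the first match. The class below is a hypothetical, stripped-down stand-in (MiniNamedList is not a Solr class) that mimics just those two lookups, so the difference is visible without a Solr dependency.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for Solr's NamedList: keeps insertion order and
 * allows the same name to appear more than once.
 */
class MiniNamedList {
    private final List<Map.Entry<String, Object>> entries = new ArrayList<>();

    MiniNamedList add(String name, Object value) {
        entries.add(new SimpleEntry<String, Object>(name, value));
        return this; // allow chaining while building the example config
    }

    /** Returns the value of the first entry with this name, or null if none. */
    Object get(String name) {
        for (Map.Entry<String, Object> e : entries) {
            if (e.getKey().equals(name)) {
                return e.getValue();
            }
        }
        return null;
    }

    /** Returns the values of every entry with this name, in insertion order. */
    List<Object> getAll(String name) {
        List<Object> values = new ArrayList<>();
        for (Map.Entry<String, Object> e : entries) {
            if (e.getKey().equals(name)) {
                values.add(e.getValue());
            }
        }
        return values;
    }
}
```

Built from the three `<str name="word">` entries of the config, get("word") returns only "body", while getAll("word") returns [body, fish, dog], which is what init() needs.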
  • 13.
    ▪ There are 2 phases in a SearchComponent:
      ◦ Prepare
      ◦ Process
    ▪ During a query, the prepare method is called on all components before any work is done.
    ▪ This allows modifying, adding or subtracting variables or components in the stack.
    ▪ Afterwards, the process methods are called on the components in the exact order specified in solrconfig.xml.
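The ordering can be modelled in a few lines of plain Java. This is a simplified sketch that ignores distributed search; TwoPhaseComponent, LoggingComponent and PipelineSketch are hypothetical names, not Solr types. In Solr itself, the SearchHandler drives these calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract mirroring the two phases of a search component. */
interface TwoPhaseComponent {
    void prepare(List<String> log);
    void process(List<String> log);
}

/** Records each call so the ordering is visible. */
class LoggingComponent implements TwoPhaseComponent {
    private final String name;

    LoggingComponent(String name) {
        this.name = name;
    }

    public void prepare(List<String> log) {
        log.add("prepare:" + name);
    }

    public void process(List<String> log) {
        log.add("process:" + name);
    }
}

class PipelineSketch {
    /** Calls every prepare() first, then every process(), both in configured order. */
    static List<String> run(List<TwoPhaseComponent> components) {
        List<String> log = new ArrayList<>();
        for (TwoPhaseComponent c : components) {
            c.prepare(log);
        }
        for (TwoPhaseComponent c : components) {
            c.process(log);
        }
        return log;
    }
}
```

Running `PipelineSketch.run(List.of(new LoggingComponent("query"), new LoggingComponent("democomponent")))` yields [prepare:query, prepare:democomponent, process:query, process:democomponent]: every prepare() happens before any process(). The two names are only labels here.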
  • 14.
        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            // none necessary
        }

    Nothing going on here, but we need to override it, otherwise we can’t extend SearchComponent.
  • 15.
        @Override
        public void process(ResponseBuilder rb) throws IOException {
            numRequests++;
            SolrParams params = rb.req.getParams();
            long lstartTime = System.currentTimeMillis();
            SolrIndexSearcher searcher = rb.req.getSearcher();
            NamedList response = new SimpleOrderedMap();

            // A "field" URL parameter overrides the default loaded from solrconfig.xml
            String queryField = params.get("field");
            String field = null;
            if (defaultField != null) {
                field = defaultField;
            }
            if (queryField != null) {
                field = queryField;
            }
            if (field == null) {
                LOGGER.error("Fields aren't defined, not performing counting.");
                return;
            }

    • We start by counting the request in a volatile variable (for use later in statistics), and since we’d like to know how long processing takes, we note the start time.
    • We create a new NamedList that will hold this component’s response.
    • We look at the URL parameters to see whether a “field” parameter is present. We have set this up to override the default we loaded from the config file.
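The precedence rule boils down to a tiny function. This sketch is hypothetical (FieldResolution.resolveField does not exist in the plugin) and only restates the if-statements above: a per-request field parameter wins, otherwise the configured default is used, and null means no counting is possible.

```java
/** Hypothetical restatement of the field-selection rule in process(). */
class FieldResolution {
    static String resolveField(String defaultField, String queryField) {
        if (queryField != null) {
            return queryField; // per-request override from the URL
        }
        return defaultField;   // value read from solrconfig.xml in init(); may be null
    }
}
```

With the slide-6 config, adding `&field=title` to a request (title being a hypothetical field name) would count words in title instead of myfield.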
  • 16.
            DocList docs = rb.getResults().docList;
            if (docs == null || docs.size() == 0) {
                LOGGER.debug("No results");
                return; // nothing to count; also avoids a NullPointerException below
            }
            LOGGER.debug("Doing this many docs:\t" + docs.size());

            Set<String> fieldSet = new HashSet<String>();
            SchemaField keyField = rb.req.getCore().getSchema().getUniqueKeyField();
            if (null != keyField) {
                fieldSet.add(keyField.getName());
            }
            fieldSet.add(field);

    • Since the search has already been completed, we get the list of documents that will be returned. If there are none, there is nothing to count.
    • We also need to pull from the schema the field that contains the unique id. This lets us correlate our results with the rest of the response.
  • 17.
            DocIterator iterator = docs.iterator();
            for (int i = 0; i < docs.size(); i++) {
                try {
                    int docId = iterator.nextDoc();
                    HashMap<String, Double> counts = new HashMap<String, Double>();
                    Document doc = searcher.doc(docId, fieldSet);
                    // getFields() returns every value of a multi-valued field
                    IndexableField[] multifield = doc.getFields(field);
                    for (IndexableField singlefield : multifield) {
                        // Split on runs of non-word characters so "body." still counts as "body"
                        for (String string : singlefield.stringValue().split("\\W+")) {
                            if (words.contains(string)) {
                                Double oldcount = counts.containsKey(string) ? counts.get(string) : 0.0;
                                counts.put(string, oldcount + 1);
                            }
                        }
                    }
                    String id = doc.getField(keyField.getName()).stringValue();
                    NamedList<Double> docresults = new NamedList<Double>();
                    for (String word : words) {
                        Double count = counts.get(word);
                        docresults.add(word, count == null ? 0.0 : count); // 0.0 instead of null
                    }
                    response.add(id, docresults);
                } catch (IOException ex) {
                    numErrors++; // reported by getStatistics()
                    LOGGER.error("Could not load document for counting", ex);
                }
            }

    • Get a document iterator to walk through all result docs
    • Set up a count map for this doc
    • Load the document through the searcher
    • Get the values of the field
      ◦ BEWARE: for a multi-valued field, getField() returns only the first value, not ALL values, so we use getFields()
    • Do our basic word counting
    • Get the document’s unique id from the key field
    • Add each word’s count to the results for the doc
    • Add the doc’s results to the overall response, keyed by its id value
    • Count and log any error instead of swallowing it
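Here is a plain-Java sketch of the counting loop, separated from Lucene’s Document API so it can be tried in isolation. MultiValueCounter is a hypothetical name; each string in fieldValues stands in for one value of a multi-valued field. Like the loop above, it assumes tokens are split on runs of non-word characters: splitting on single spaces would turn "body." into a token that never matches "body".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: count configured words across all values of a field. */
class MultiValueCounter {
    static Map<String, Double> count(List<String> words, List<String> fieldValues) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.put(word, 0.0); // absent words report 0.0 rather than null
        }
        for (String value : fieldValues) {             // every value, not just the first
            for (String token : value.split("\\W+")) { // split on non-word character runs
                if (counts.containsKey(token)) {
                    counts.merge(token, 1.0, Double::sum);
                }
            }
        }
        return counts;
    }
}
```

For the second document in the sample response on slide 21, "the fish had a small body. the dog likes to eat fish", this returns {body=1.0, fish=2.0, dog=1.0}. Matching stays case-sensitive, as in the plugin; lowercasing both sides would be a separate, optional change.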
  • 18.
            rb.rsp.add("demoSearchComponent", response);
            totalRequestsTime += System.currentTimeMillis() - lstartTime;
        }

    • Add all results to the final response
    • The name we pick here will show up in the Solr output
    • Note down how long the entire process took
  • 19.
        @Override
        public String getDescription() {
            return "Searchbox DemoSearchComponent";
        }

        @Override
        public String getVersion() {
            return "1.0";
        }

        @Override
        public String getSource() {
            return "http://www.searchbox.com";
        }

        @Override
        public NamedList<Object> getStatistics() {
            NamedList<Object> all = new SimpleOrderedMap<Object>();
            all.add("requests", "" + numRequests);
            all.add("errors", "" + numErrors);
            all.add("totalTime(ms)", "" + totalRequestsTime);
            return all;
        }

    • To have a production-grade plugin, we need to expose certain pieces of information that users expect to see in their Solr admin panel
    • Description, version and source are just Strings
    • getStatistics() uses the volatile variables we were keeping track of before, puts them into another NamedList and returns it. These appear under the statistics panel in Solr. That’s it!
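One caveat with these statistics: volatile makes updates visible across threads, but `numRequests++` is a read-modify-write, so two concurrent requests can lose an increment. A minimal alternative, assuming you are free to change the field types, is `java.util.concurrent.atomic.AtomicLong`. RequestStats below is a hypothetical helper, not part of the plugin or of Solr; it returns the same three keys as getStatistics().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical thread-safe replacement for the volatile counters. */
class RequestStats {
    private final AtomicLong numRequests = new AtomicLong();
    private final AtomicLong numErrors = new AtomicLong();
    private final AtomicLong totalRequestsTime = new AtomicLong();

    /** Atomically counts one request and adds its duration. */
    void recordRequest(long elapsedMillis) {
        numRequests.incrementAndGet();
        totalRequestsTime.addAndGet(elapsedMillis);
    }

    void recordError() {
        numErrors.incrementAndGet();
    }

    /** Same keys, same order and String values as getStatistics() on the slide. */
    Map<String, String> snapshot() {
        Map<String, String> all = new LinkedHashMap<>();
        all.put("requests", String.valueOf(numRequests.get()));
        all.put("errors", String.valueOf(numErrors.get()));
        all.put("totalTime(ms)", String.valueOf(totalRequestsTime.get()));
        return all;
    }
}
```

In the component, process() would call recordRequest(elapsed) at the end and recordError() in the catch block, and getStatistics() would copy snapshot() into its NamedList.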
  • 20.
    <requestHandler name="/demoendpoint" class="solr.SearchHandler">
      <arr name="last-components">
        <str>democomponent</str>
      </arr>
    </requestHandler>

    We need some way to run our searchComponent, so we’ll add a quick requestHandler to test it. This is done simply by declaring a standard SearchHandler and using last-components to tell it to run the component we defined on an earlier slide after its default components. Of course, you could also use your component directly in the select handler and/or add it to a chain of other components. Solr is super versatile!
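To exercise the endpoint from Java, the query parameters must be URL-encoded; that is where `*%3A*` in the request URL on slide 21 comes from (it is `*:*`, match all documents). A small sketch, assuming Java 10+ for `URLEncoder.encode(String, Charset)`; DemoEndpointUrl is a hypothetical helper name, and the host and core name are the ones from the example request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a GET URL for the /demoendpoint handler. */
class DemoEndpointUrl {
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            // Encode names and values; ':' becomes %3A, '*' is left as-is
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }
}
```

Building with q=`*:*`, wt=xml, rows=2 and fl=`id,myfield` (in that order, e.g. from a LinkedHashMap) reproduces the slide-21 request, except that the comma in fl is encoded as %2C, which is decoded back to `id,myfield` on the server side.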
  • 21.
    http://192.168.56.101:8983/solr/corename/demoendpoint?q=*%3A*&wt=xml&rows=2&fl=id,myfield

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">79</int>
      </lst>
      <result name="response" numFound="13262" start="0">
        <doc>
          <str name="id">f73ca075-3826-45d5-85df-64b33c760efc</str>
          <arr name="myfield">
            <str>dog body body body fish fish fish fish orange</str>
          </arr>
        </doc>
        <doc>
          <str name="id">bc72dbef-87d1-4c39-b388-ec67babe6f05</str>
          <arr name="myfield">
            <str>the fish had a small body. the dog likes to eat fish</str>
          </arr>
        </doc>
      </result>
      <lst name="demoSearchComponent">
        <lst name="f73ca075-3826-45d5-85df-64b33c760efc">
          <double name="body">3.0</double>
          <double name="fish">4.0</double>
          <double name="dog">1.0</double>
        </lst>
        <lst name="bc72dbef-87d1-4c39-b388-ec67babe6f05">
          <double name="body">1.0</double>
          <double name="fish">2.0</double>
          <double name="dog">1.0</double>
        </lst>
      </lst>
    </response>

    • Query results: the <result> section
    • Our results: the demoSearchComponent section, in the same order and with ids for correlation
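Although the two sections come back in the same order, a client is safer joining them by id than by position. A hypothetical plain-Java sketch (ResultJoin is not part of the plugin), where resultIds are the id values from the `<result>` section and componentSection maps each id to its counts from `demoSearchComponent`:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side join of query results with the component's counts. */
class ResultJoin {
    /** Pairs each result id, in result order, with its counts looked up by id. */
    static Map<String, Map<String, Double>> join(List<String> resultIds,
                                                 Map<String, Map<String, Double>> componentSection) {
        Map<String, Map<String, Double>> joined = new LinkedHashMap<>();
        for (String id : resultIds) {
            // Look up by id rather than by position, so a reordered section still matches
            joined.put(id, componentSection.getOrDefault(id, Map.of()));
        }
        return joined;
    }
}
```

The output keeps the order of the query results even if componentSection is an unordered map, and documents missing from the component section get an empty map rather than null.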
  • 22.
    • Because we’ve overridden the getStatistics() method, we can get real-time stats from the admin panel!
    • In this case, since it’s a component of the SearchHandler, our fields are listed together with the other statistics
  • 23. Happy Developing! Full Source Code available at: http://www.searchbox.com/developing-a-solr-plugin/