Examiness hints and tips from the trenches

Ismail Mayat
Senior Web Developer
@ The Cogworks

What this talk is not
• How to install
• How to configure

What we will cover
• Tools to help you
• Hints and tips regarding indexing
• GatheringNodeData event is your friend!
• Indexing media (PDF, Word, etc.)
• Deep in the bowels with DocumentWriting event
• Search highlighting
• Deployment to staging / production environments
• Faceting (not strictly Examine, but still useful)
• Food for thought
• Questions and answers

Tools to help you
“Use the source Luke!”
http://code.google.com/p/luke/

Tools to help you
• http://luke.codeplex.com/ (.NET port)
• Only a subset of the common features is present
• Some features are missing, e.g. scripting with Rhino

Using Luke
• Write out the generated query so you can test it in Luke
var criteria = searcher.CreateSearchCriteria(IndexTypes.Content);

IBooleanOperation query = criteria.NodeTypeAlias("NewsItem");

query = query.Not().Field("umbracoNaviHide", 1.ToString());

var results = searcher.Search(query.Compile());

// Capture the generated query text so it can be logged and pasted into Luke
var generatedQuery = criteria.ToString();

Generates the following query
SearchIndexType: content, LuceneQuery: +(+__NodeTypeAlias:newsitem -umbracoNaviHide:1) +__IndexType:content

Tools to help you
http://our.umbraco.org/projects/developer-tools/examine-dashboard

GatheringNodeData
• Examine has a rich event system
• In all my implementations I have used GatheringNodeData to:
– Merge fields into one contents field
– Search on path
– Add a nodeTypeAlias field to the PDF index

GatheringNodeData
Merge into contents field
• Example query
var query = searchCriteria.Field("nodeName", "hello").Or().Field("metaTitle", "hello").Or().Field("metaDescription", "hello").Compile();

GatheringNodeData
Merge to contents field
public class ExamineEvents : ApplicationBase {

    public ExamineEvents() {
        // Subscribe to the indexer's GatheringNodeData event at application start
        ExamineManager.Instance.IndexProviderCollection[Constants.ATGMainIndexerName].GatheringNodeData += ATGMainExamineEvents_GatheringNodeData;
    }

    void ATGMainExamineEvents_GatheringNodeData(object sender, IndexingNodeDataEventArgs e) {
        AddToContentsField(e);
    }

    private void AddToContentsField(IndexingNodeDataEventArgs e) {

        var fields = e.Fields;
        var combinedFields = new StringBuilder();

        // Concatenate every field value into one block of text
        foreach (var keyValuePair in fields) {
            combinedFields.AppendLine(keyValuePair.Value);
        }

        // Add the merged text as a single searchable field
        e.Fields.Add("contents", combinedFields.ToString());
    }
}

GatheringNodeData
Merge to contents field
• The query now looks like
query.Field("contents", "hello")
• Adding new fields only requires rebuilding the index

GatheringNodeData
Creating a searchable path
• The path is stored in the index as 1,1056,1078, which is not tokenised
• Add a new field with each comma replaced by a space
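
A minimal sketch of the comma-to-space idea, written against a plain string dictionary so it runs on its own. It assumes e.Fields in the GatheringNodeData handler is a string-to-string dictionary holding a "path" entry; the class name and the "searchablePath" field name are hypothetical.

```csharp
using System.Collections.Generic;

public static class SearchablePath
{
    // Hypothetical field name; use whatever suits your index.
    public const string FieldName = "searchablePath";

    // Copies the comma-separated "path" value into a new field with the
    // commas replaced by spaces, so each node id can be indexed as its own term.
    public static void AddTo(IDictionary<string, string> fields)
    {
        if (fields.TryGetValue("path", out var path) && !string.IsNullOrEmpty(path))
        {
            fields[FieldName] = path.Replace(",", " ");
        }
    }
}
```

Inside the event handler this would be called as SearchablePath.AddTo(e.Fields), after which a query on the new field for 1056 should match every node beneath that id.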

GatheringNodeData
• How do you query for a field with no value, e.g. a SQL query like
select where value=''
• Select all
• You cannot write a query like this in Examine / Lucene
• However, you can use the GatheringNodeData event to inject an arbitrary marker value and then query on that.
• E.g. field noData_Title with value 1
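
One way the marker field could be injected, as a sketch only. It assumes e.Fields behaves like a string-to-string dictionary; the class and method names are hypothetical, while noData_Title and the value 1 come from the slide above.

```csharp
using System.Collections.Generic;

public static class EmptyFieldMarker
{
    // If sourceField is missing or blank, add markerField with the value "1"
    // so that "has no value" becomes something Lucene can match on.
    public static void Mark(IDictionary<string, string> fields, string sourceField, string markerField)
    {
        fields.TryGetValue(sourceField, out var value);
        if (string.IsNullOrWhiteSpace(value))
        {
            fields[markerField] = "1";
        }
    }
}
```

In the handler this would be EmptyFieldMarker.Mark(e.Fields, "title", "noData_Title"), and the search side would then query Field("noData_Title", "1") to find nodes without a title.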

GatheringNodeData
• Re-indexing errors
• E.g. an MNTP (Multi-Node Tree Picker) field referencing a node that no longer exists
• Use try/catch and log the offending node
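
A sketch of the try/catch-and-log approach for picked ids. The resolveName and log delegates are hypothetical stand-ins for your own node lookup and logging calls; nothing here is Examine or Umbraco API.

```csharp
using System;
using System.Collections.Generic;

public static class SafeLookup
{
    // Resolves each id in a comma-separated picker value to a name.
    // A failed lookup is logged with the owning node id and skipped,
    // so one stale reference does not abort the whole re-index.
    public static string ResolveNames(int nodeId, string csvIds, Func<int, string> resolveName, Action<string> log)
    {
        var names = new List<string>();
        var ids = (csvIds ?? "").Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var raw in ids)
        {
            try
            {
                names.Add(resolveName(int.Parse(raw.Trim())));
            }
            catch (Exception ex)
            {
                log("Node " + nodeId + ": could not resolve picked id '" + raw + "': " + ex.Message);
            }
        }

        return string.Join(" ", names);
    }
}
```

The log line names both the node being indexed and the broken reference, which is what you need to find and fix the content afterwards.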

Document writing event
• Use it when you need lower-level Lucene access
• E.g. boosting a field
• What is boosting? Not all documents are equal: sometimes you need to artificially give certain documents a higher ranking, when sorting alone is not enough, e.g.

– Person doc type: people with an important title need to appear at the top of the person search list
– Boost documents by age: penalise older documents, useful for news and business documents
– Boost based on unique views (you would need to know these up front, and could also base them on time slots, e.g. last month, last week)
– Documents with more likes (custom like functionality)
– Tagging using XFS Term Selector with weighting
http://our.umbraco.org/projects/website-utilities/xfs-term-selector
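
To make the "boost by age" idea concrete, here is one possible way to turn a document's age into a boost factor. The decay formula, the class name and the 365-day default are all assumptions chosen for illustration; the talk does not prescribe a formula.

```csharp
using System;

public static class AgeBoost
{
    // Returns a boost of 1.0 for a brand-new document that falls towards 0
    // as the document ages. At halfLifeDays the boost is exactly 0.5.
    // Negative ages (future dates) are treated as age 0.
    public static float ForAge(double ageDays, double halfLifeDays = 365)
    {
        var age = Math.Max(0, ageDays);
        return (float)(1.0 / (1.0 + age / halfLifeDays));
    }
}
```

The returned value would be passed to the document's boost in a DocumentWriting handler. The curve is worth tuning against real search results before relying on it.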

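Document writing event
var indexer = (UmbracoContentIndexer)ExamineManager.Instance.IndexProviderCollection[Constants.ATGMDirectoryIndexerName];

indexer.DocumentWriting += indexer_DocumentWriting;

void indexer_DocumentWriting(object sender, Examine.LuceneEngine.DocumentWritingEventArgs e) {

    // Get returns the field's string value (null if the field is absent);
    // GetField returns a Field object, which cannot be compared to a string
    var title = e.Document.Get("title");

    if (title == "Partner") {
        // Rank this document higher than documents without a boost
        e.Document.SetBoost(1.5f);
    }
}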

Indexing media
• PDF indexer: only indexes PDF content.
• CogUmbracoExamineMediaIndexer (available as a package on our.umbraco.org)
– Uses Apache Tika. Indexes the content and any associated metadata for:
– XML and derived formats
– Microsoft Office document formats
– OpenDocument Format
– Portable Document Format
– Electronic Publication Format
– Rich Text Format
– Compression and packaging formats
– Text formats
– Audio formats (MP3 etc)
– Image formats
– Video formats
– Java class files and archives
– The mbox format

Search highlighting
• Lucene contrib package Highlighter.net
• Highlights occurrences of your search term in the search results summary fragment.
• Wiki on our: http://our.umbraco.org/wiki/how-tos/how-to-highlight-text-in-examine-search-results

Deployment to staging / production environments
• You cannot simply copy the index between environments
• You can check the index in, but it could become corrupted
• Use Selenium with an .ashx handler to rebuild the index
Deployment to staging / production environments
public class RebuildIndexes : IHttpHandler
{
    readonly List<string> indexes = new List<string> { "ATGIndexer", "InternalIndexer", "directoryIndexer" };

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        try
        {
            if (string.IsNullOrEmpty(context.Request.QueryString["index"]))
            {
                // No index specified: rebuild all known indexes
                foreach (var index in indexes)
                {
                    ExamineManager.Instance.IndexProviderCollection[index].RebuildIndex();
                }
            }
            else
            {
                // Rebuild only the index named in the query string
                ExamineManager.Instance.IndexProviderCollection[context.Request.QueryString["index"]].RebuildIndex();
            }
            context.Response.Write("done");
        }
        catch (Exception ex)
        {
            context.Response.Write(ex.ToString());
        }
    }

    public bool IsReusable
    {
        get
        {
            return false;
        }
    }
}

Deployment to staging / production environments
[SetUp]
public void SetupTest()
{
    selenium = new DefaultSelenium("localhost", 4444, "*chrome", "http://mydevsite");
    selenium.Start();
    _verificationErrors = new StringBuilder();
}

[Test]
public void RebuildIndex()
{
    // not a proper test but a hack to get indexes rebuilt after a deployment
    try
    {
        selenium.Open("/umbraco/webservices/RebuildIndexes.ashx");
    }
    catch (SeleniumException se)
    {
        // A rebuild can outlast the Selenium timeout, so ignore timeouts only
        if (!se.Message.StartsWith("Timed out"))
        {
            throw;
        }
    }
    catch (AssertionException e)
    {
        _verificationErrors.Append(e.Message);
    }
}

Faceting
• Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing information organised according to a faceted classification system, allowing users to explore a collection of information by applying multiple filters
• Examples: Amazon, LinkedIn
http://www.linkedin.com/search/fpsearch?type=people&keywords=umbraco&pplSearchOrigin=GLHD&pageKey=member-home&search=Search
• LinkedIn uses Bobo browser. Written in Java, it has been ported to .NET: http://bobo.codeplex.com/
• The demo uses SimpleFacetHandler; others are available, e.g. RangeFacet, PathFacet, GetFacet
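
To show what a facet count actually is, here is a library-free sketch that counts how many documents carry each value of a field. This is not Bobo's API: the class name is hypothetical, documents are modelled as plain string dictionaries, and a real facet handler works against the index rather than in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FacetCounter
{
    // Groups the documents by their value for the given field and returns
    // value -> number of documents. Documents without the field are ignored.
    public static Dictionary<string, int> Count(IEnumerable<IDictionary<string, string>> docs, string field)
    {
        return docs
            .Where(d => d.ContainsKey(field))
            .GroupBy(d => d[field])
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

The resulting counts are what a faceted UI shows next to each filter, e.g. the number of people per location in a LinkedIn-style search.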

Food for thought
• Using the index as an object database, a la RavenDB
• Scenario: you have nodes with a large number of Multi-Node Tree Pickers (MNTP) used as lookups

Food for thought
• In the index, node ids are stored as a CSV list if MNTP is set to CSV.
• Use the GatheringNodeData event to do the lookups: create a POCO with the lookup data, serialise the POCO to JSON and store that in the index.
• Advantage: instant lookup, all data ready to use
• Disadvantage: you need to keep up with lookup changes. E.g. if a country code changes, you would need to find the entries that already use that code and update them.
• A nice approach if the lookup data is fairly static
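
A sketch of the serialise-and-store idea. The CountryLookup type and its properties are hypothetical, and System.Text.Json is used only because it ships with current .NET; a project of the talk's era would more likely use Json.NET or similar.

```csharp
using System.Text.Json;

// Hypothetical POCO holding the data that would otherwise need a lookup per node id.
public class CountryLookup
{
    public int Id { get; set; }
    public string Code { get; set; }
    public string Name { get; set; }
}

public static class LookupJson
{
    // Serialises the POCO to the JSON string that would be stored in an index field.
    public static string ToIndexValue(CountryLookup lookup)
    {
        return JsonSerializer.Serialize(lookup);
    }

    // Rebuilds the POCO from the stored field value at search time.
    public static CountryLookup FromIndexValue(string json)
    {
        return JsonSerializer.Deserialize<CountryLookup>(json);
    }
}
```

In the GatheringNodeData handler the string from ToIndexValue would be added as an extra field, and search results would call FromIndexValue on that field instead of loading the picked nodes.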

Food for thought
• POCO hydration using activelucenenet, a la USiteBuilder
• Create POCOs and decorate them with attributes
public class Product
{
    [LuceneField("sku")]
    public string Sku { get; set; }

    [LuceneField("productName")]
    public string ProductName { get; set; }
}

Food for thought
var luceneProductDoc = GetItFromLucene(1234);
var product = LuceneMediator<Product>.ToRecord(luceneProductDoc);

This would need to use Lucene directly, as there appears to be no way of getting the Lucene document from the Examine search result wrapper?

Take home today
• Use the index!!!

Questions
• ????
• http://twitter.com/ismailmayat
