Ismail Mayat
Senior Web Developer
  @ The Cogworks
Examiness
Hints and tips from the trenches
What this talk is not
• How to install
• How to configure
What we will cover
•   Tools to help you
•   Hints and tips regarding indexing
•   GatheringNodeData event is your friend!
•   Indexing media (PDF, Word, etc.)
•   Deep in the bowels with the DocumentWriting event
•   Search highlighting
•   Deployment to staging / production environments
•   Faceting (not exactly Examine, but still useful)
•   Food for thought
•   Questions and answers
Tools to help you
         “Use the source Luke!”
http://code.google.com/p/luke/
Tools to help you
• http://luke.codeplex.com/ (.NET port)
• Only a subset of the common features is present
• Scripting with Rhino is missing, etc.
Using Luke
• Write out the generated query so you can test it in Luke
          var criteria = searcher.CreateSearchCriteria(IndexTypes.Content);

          IBooleanOperation query = criteria.NodeTypeAlias("NewsItem");

          query = query.Not().Field("umbracoNaviHide", 1.ToString());

          var results = searcher.Search(query.Compile());

          // ToString() returns the generated Lucene query; write it to a trace or hidden field
          var generatedQuery = criteria.ToString();

Generates the following query
SearchIndexType: content, LuceneQuery: +(+__NodeTypeAlias:newsitem -umbracoNaviHide:1) +__IndexType:content
Tools to help you
http://our.umbraco.org/projects/developer-tools/examine-dashboard
GatheringNodeData
• Examine has a rich event system
• In all my implementations I have used
  GatheringNodeData for
  – Merging into one contents field
  – Searching on path
  – Adding a nodeTypeAlias field to the PDF index
GatheringNodeData
         Merge into contents field
• Example query
 var query = searchCriteria.Field("nodeName", "hello")
     .Or().Field("metaTitle", "hello")
     .Or().Field("metaDescription", "hello")
     .Compile();
GatheringNodeData
                       Merge to contents field
using System.Text;
using Examine;
using umbraco.BusinessLogic;

public class ExamineEvents : ApplicationBase {

    public ExamineEvents() {
        // Subscribe once at application start-up
        ExamineManager.Instance.IndexProviderCollection[Constants.ATGMainIndexerName].GatheringNodeData
            += ATGMainExamineEvents_GatheringNodeData;
    }

    void ATGMainExamineEvents_GatheringNodeData(object sender, IndexingNodeDataEventArgs e) {
        AddToContentsField(e);
    }

    // Concatenates every field value into a single "contents" field to search on
    private void AddToContentsField(IndexingNodeDataEventArgs e) {

        var fields = e.Fields;
        var combinedFields = new StringBuilder();

        foreach (var keyValuePair in fields) {
            combinedFields.AppendLine(keyValuePair.Value);
        }
        e.Fields.Add("contents", combinedFields.ToString());
    }
}
GatheringNodeData
          Merge to contents field
• Query now looks like
  query.Field("contents", "hello")
• Adding new fields is just a case of rebuilding the index
GatheringNodeData
       Creating a searchable path
• Path is stored in the index as 1,1056,1078 and is not tokenised
• Add a new field with each , replaced by a space
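A minimal sketch of the path rewrite, assuming the fields arrive as a string dictionary (as e.Fields does in the handler shown earlier). The helper and the field name searchablePath are illustrative, not something Examine defines.

```csharp
using System;
using System.Collections.Generic;

public static class SearchablePathExample
{
    // Hypothetical helper: copies "path" into a new field with commas
    // replaced by spaces, so the analyser can split it into node ids.
    public static void AddSearchablePath(IDictionary<string, string> fields)
    {
        string path;
        if (fields.TryGetValue("path", out path) && !string.IsNullOrEmpty(path))
        {
            // "searchablePath" is an assumed field name for this sketch.
            fields["searchablePath"] = path.Replace(",", " ");
        }
    }
}
```

Inside a GatheringNodeData handler this would be called as SearchablePathExample.AddSearchablePath(e.Fields); a query on the new field with a parent id should then match that node's descendants.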
GatheringNodeData
• How to query for no value, e.g. a SQL query like
  select where value=''
• Select all
• Cannot do a query like this in Examine / Lucene
• However, you can use the GatheringNodeData event
  to inject some arbitrary value, then query on
  that.
• E.g. field noData_Title with value 1
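The marker-field idea can be sketched as below, again assuming a string dictionary of fields. The helper name is hypothetical; the noData_ prefix and the value 1 follow the example on the slide.

```csharp
using System;
using System.Collections.Generic;

public static class EmptyValueMarkerExample
{
    // Hypothetical helper: when fieldName is missing or blank, adds a
    // marker field "noData_<fieldName>" with value "1" that can be queried.
    public static void AddNoDataMarker(IDictionary<string, string> fields, string fieldName)
    {
        string value;
        bool hasValue = fields.TryGetValue(fieldName, out value)
                        && !string.IsNullOrWhiteSpace(value);
        if (!hasValue)
        {
            fields["noData_" + fieldName] = "1";
        }
    }
}
```

After a rebuild, nodes without a title could then be found with a query along the lines of criteria.Field("noData_Title", "1").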
GatheringNodeData
• Re-indexing errors
• E.g. an MNTP field referencing a node that no longer
  exists
• Use try/catch and log the offending node
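One way to apply the try/catch advice is a small wrapper around the per-node work, so a single bad node is logged rather than aborting the rebuild. This is a sketch only: the helper name, the Action-based shape and the TextWriter log target are all assumptions; in a real site you would pass whatever logger the project already uses.

```csharp
using System;
using System.IO;

public static class SafeIndexingExample
{
    // Hypothetical helper: runs the enrichment for one node and logs the
    // node id if it throws. Returns true when the enrichment succeeded.
    public static bool TryEnrich(int nodeId, Action enrich, TextWriter log)
    {
        try
        {
            enrich();
            return true;
        }
        catch (Exception ex)
        {
            // Record which node caused the problem so it can be fixed in the back office.
            log.WriteLine("Indexing failed for node {0}: {1}", nodeId, ex.Message);
            return false;
        }
    }
}
```

In a GatheringNodeData handler the call might look like SafeIndexingExample.TryEnrich(e.NodeId, () => AddToContentsField(e), log), assuming the event args expose the node id as NodeId.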
Document writing event
• Use it when you need lower-level Lucene access
• E.g. boosting a field
• What is boosting? Not all documents are equal: sometimes you need to artificially give
  a higher ranking to certain documents, when sorting is just not enough, e.g.

    – Person doc type. If they have an important title they need to appear at
      the top of the person search list
    – Boost documents by age. Penalizing older documents is useful for news
      and business documents.
    – Boost based on unique views (would need to be known up front, and could also be
      based on time slots, e.g. last month, last week)
    – Documents with more likes (custom like functionality)
    – Tagging using XFS Term selector with weighting
      http://our.umbraco.org/projects/website-utilities/xfs-term-selector
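As an illustration of boosting by age, here is one possible decay curve. The half-life formula, the helper name and its parameters are assumptions made for this sketch, not something taken from Examine or Lucene; any monotonically decreasing function would serve.

```csharp
using System;

public static class AgeBoostExample
{
    // Hypothetical helper: the boost starts at 1.0 for a brand-new document,
    // halves every halfLifeDays, and never drops below minBoost.
    public static float BoostForAge(DateTime published, DateTime now,
                                    double halfLifeDays, float minBoost)
    {
        double ageDays = Math.Max(0, (now - published).TotalDays);
        double boost = Math.Pow(0.5, ageDays / halfLifeDays);
        return (float)Math.Max(minBoost, boost);
    }
}
```

The result would be passed to e.Document.SetBoost(...) in a DocumentWriting handler. Because the boost is fixed at index time, the index needs rebuilding periodically for the ageing to take effect.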
Document writing event
var indexer = (UmbracoContentIndexer)ExamineManager.Instance
    .IndexProviderCollection[Constants.ATGMDirectoryIndexerName];

indexer.DocumentWriting += indexer_DocumentWriting;

void indexer_DocumentWriting(object sender, Examine.LuceneEngine.DocumentWritingEventArgs e) {

       // Document.Get returns the field's string value, or null if the field is missing
       var title = e.Document.Get("title");

       if (title == "Partner") {
               e.Document.SetBoost(1.5f);
       }
}
Indexing media
• PDF indexer. Only indexes PDF content.
• CogUmbracoExamineMediaIndexer (available as a package on our.umbraco.org)
   –   Uses Apache Tika. Indexes content and any associated metadata
   –   XML and derived formats
   –   Microsoft Office document formats
   –   OpenDocument Format
   –   Portable Document Format
   –   Electronic Publication Format
   –   Rich Text Format
   –   Compression and packaging formats
   –   Text formats
   –   Audio formats (MP3 etc)
   –   Image formats
   –   Video formats
   –   Java class files and archives
   –   The mbox format
Search highlighting
• Lucene contrib package Highlighter.net
• Highlights occurrences of your search term in
  the search results summary fragment.
• Wiki on our.umbraco.org:
  http://our.umbraco.org/wiki/how-tos/how-to-highlight-text-in-examine-search-results
Deployment to staging / production environments
• Cannot copy the index
• Can check it in to source control, but it could get corrupted
• Selenium with an ashx handler to rebuild the index
Deployment to staging / production environments
using System;
using System.Collections.Generic;
using System.Web;
using Examine;

public class RebuildIndexes : IHttpHandler
{
    readonly List<string> indexes = new List<string> { "ATGIndexer", "InternalIndexer", "directoryIndexer" };

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        try
        {
            var requested = context.Request.QueryString["index"];
            if (string.IsNullOrEmpty(requested))
            {
                // No index named: rebuild every known index
                foreach (var index in indexes)
                {
                    ExamineManager.Instance.IndexProviderCollection[index].RebuildIndex();
                }
            }
            else
            {
                // Rebuild only the index named in ?index=...
                ExamineManager.Instance.IndexProviderCollection[requested].RebuildIndex();
            }
            context.Response.Write("done");
        }
        catch (Exception ex)
        {
            context.Response.Write(ex.ToString());
        }
    }

    public bool IsReusable
    {
        get { return false; }
    }
}
Deployment to staging / production environments
[SetUp]
public void SetupTest()
{
    selenium = new DefaultSelenium("localhost", 4444, "*chrome", "http://mydevsite");
    selenium.Start();
    _verificationErrors = new StringBuilder();
}

[Test]
public void RebuildIndex()
{
    //not proper test but a hack to get indexes rebuilt after a deployment
    try
    {
        selenium.Open("/umbraco/webservices/RebuildIndexes.ashx");

    }
    catch (SeleniumException se)
    {
        if (!se.Message.StartsWith("Timed out"))
        {
            throw;
        }
    }
    catch (AssertionException e)
    {
        _verificationErrors.Append(e.Message);
    }
}
Faceting
• Faceted search, also called faceted navigation or faceted
  browsing, is a technique for accessing information organized
  according to a faceted classification system, allowing users to
  explore a collection of information by applying multiple filters
• Amazon, LinkedIn
  http://www.linkedin.com/search/fpsearch?type=people&keywords=umbraco&pplSearchOrigin=GLHD&pageKey=member-home&search=Search
• LinkedIn uses Bobo browser. Written in Java, it has been
  ported to .NET: http://bobo.codeplex.com/
• Demo uses SimpleFacetHandler; others are available, e.g.
  RangeFacet, PathFacet, GetFacet
Food for thought
•   Using the index as an object db, à la RavenDB
•   Scenario: you have nodes with a large number of multi-node tree pickers (MNTP) used as lookups
Index as object db
Food for thought
• In the index, node ids are stored as a CSV list if MNTP
  is set to CSV.
• Use the GatheringNodeData event to do the lookups:
  create a POCO with the lookup data, serialise the POCO
  to JSON and store that in the index.
• Advantage: instant lookup, all data ready to use
• Disadvantage: need to keep up with lookup
  changes. E.g. if a country code changes, you
  would need to find where the code is already in use and
  update it.
• Nice approach if lookup data is fairly static
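The serialise-to-JSON step might look like the sketch below. CountryLookup and the helper names are hypothetical, and System.Text.Json is used only because it ships with current .NET; on the Umbraco versions of the time a different serialiser (e.g. Json.NET) would be the likely choice.

```csharp
using System;
using System.Text.Json;

// Hypothetical POCO holding the looked-up data for one picked node.
public class CountryLookup
{
    public int NodeId { get; set; }
    public string Code { get; set; }
    public string Name { get; set; }
}

public static class LookupJsonExample
{
    // Serialises the lookup so it can be stored as a single index field value.
    public static string ToIndexValue(CountryLookup lookup)
    {
        return JsonSerializer.Serialize(lookup);
    }

    // Rehydrates the lookup from the stored field value at search time.
    public static CountryLookup FromIndexValue(string json)
    {
        return JsonSerializer.Deserialize<CountryLookup>(json);
    }
}
```

In a GatheringNodeData handler the JSON string would be added as an extra field (for example e.Fields.Add("countryJson", json), with the field name again an assumption), and read back from the search result without touching the content tree.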
Food for thought
• POCO hydration using activelucenenet, à la
  USiteBuilder
• Create POCOs and decorate them with attributes
   public class Product
   {
       [LuceneField("sku")]
       public string Sku { get; set; }

       [LuceneField("productName")]
       public string ProductName { get; set; }
   }
Food for thought
var luceneProductDoc = GetItFromLucene(1234);
var product = LuceneMediator<Product>.ToRecord(luceneProductDoc);

Would need to use Lucene directly, as there is no way of getting the Lucene
document from the Examine search result wrapper?
Take home today
• Use the index!!!
Questions
• ????
• http://twitter.com/ismailmayat

Editor's notes

  1. Rationale behind the talk. Ask how many people are using it. Examine / Lucene is awesome. Very, very fast! "Examiness" is not a real word, I don't think, but Shannon used it when he presented at CG; it describes the nuances of Examine.
  2. See the Umbraco TV videos, and also the Codegarden videos done by Tim Geyssens.
  3. This will be a more interactive session rather than me just going on for 20 mins.
  4. If you do not have this book you are doing it wrong. It's a deep grok into Lucene. Examine is just a wrapper. It covers the mechanics of analysers, the indexing and searching process, how a document is scored, etc.
  5. Mention the .NET port.
  6. If you don't want to stick Java on your local machine or server.
  7. Write the query to a hidden field or trace, then use Luke with the ATG index. Useful for testing grouped Or / And queries, date ranges, analysers, etc. Fire up Luke with the ATG index. It has helped to fix some strange errors, not all Examine related.
  8. If you want to rebuild your indexes, use this. I had written a simple one; this one is far superior. The latest version, I think, breaks. Update usercontrol.
  9. That list of fields to query on can get pretty big. You can pass in an array of fields, but you need to set those up front and know what they are. You will also need to add new fields to the query after you add them to your doc type.
  10. Fields is a dictionary of all fields defined in the ExamineIndex node IndexUserFields. I usually leave mine blank so all fields are in the index.
  11. To do a query that gets all items from a given parent. Show in the ATG index.
  12. Examine already has a field like that, e.g. indexType, so you can use that just to get all content nodes. Can also be used in the ATG directory search for all items.
  13. It is not as common to use this event. Examine abstracts away Lucene. E.g. company sentiment analysis?? For more details on boosting see Lucene in Action. Custom like: users get to like a document and this boosts its relevancy in a search because it is more popular.
  14. Have a class that inherits from ApplicationBase. Document is the Lucene document object. Field is the Lucene field.
  15. Uses IKVM, so it spins up a Java virtual machine. Image EXIF metadata enables image-location search type functionality (Lucene.Net spatial). NB: image and audio content is not indexed, only metadata.
  16. Mention how this is relevant to the munged contents field.
  17. Who uses Selenium? TeamCity to run the Selenium test. Cheating, as it's not really a test!
  18. Old Selenium, not WebDriver code, which is better!
  19. Amazon LOTR search: as well as your list of results you get categories on the left-hand side, e.g. Books, Music, Games. Demo ATG facets on directory.
  20. Lookups for price, codes, etc. Show advanced donate tool.
  21. Ask how many people use USiteBuilder. Hackathon??
  22. As far as I am aware, you cannot get the Lucene document when searching using Examine.