WORKSHOP TRACK
             Using Apache Solr to
             retrieve content
25.09.2012   Rüdiger Kurz, Alkacon Software
Project Collaboration
Agenda

     1.   What is Solr?
     2.   Benefits
     3.   Searching
     4.   Indexing
     5.   Configuration
Retrieving data fast

    ● Apache Solr is hopefully not able to answer this
      question!
    ● BUT it will return the results in less than a
      second
What is Apache Solr?

    ● Solr is an enterprise search platform from the
      Apache Lucene project
    ● Solr is highly scalable, providing distributed
      search and index replication
    ● Solr powers the search and navigation features
      of many of the world's largest internet sites
    ● Major features include
      ● Powerful full-text search
      ● Hit highlighting
      ● Faceted search
      ● Rich document (e.g., Word, PDF) handling
What is faceted search?

    ● Faceted search is the dynamic clustering of items
      or search results into categories that let users
      drill into search results (or even skip searching
      entirely)
    ● Each facet displayed typically shows the number of
      hits that match that category
    ● Users can then “drill down” by applying specific
      constraints to the search results
    ● Faceted search is also called faceted browsing,
      faceted navigation, guided navigation and
      sometimes parametric search
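
A minimal in-memory sketch of the idea, assuming each hit carries a resource type. The class `FacetCountSketch` and its `Hit` type are hypothetical and only illustrate facet counts and drill-down; Solr computes facets on its index, not like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of facet counting; this is not how Solr implements it. */
final class FacetCountSketch {

    /** A search hit reduced to its path and resource type (hypothetical type). */
    static final class Hit {
        final String path;
        final String type;

        Hit(String path, String type) {
            this.path = path;
            this.type = type;
        }
    }

    /** Counts how many hits fall into each value of the "type" facet. */
    static Map<String, Integer> countByType(List<Hit> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Hit hit : hits) {
            counts.merge(hit.type, 1, Integer::sum);
        }
        return counts;
    }

    /** Drills down: keeps only the hits that match the chosen facet value. */
    static List<Hit> drillDown(List<Hit> hits, String type) {
        List<Hit> matching = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.type.equals(type)) {
                matching.add(hit);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("/flowers/rose.html", "v8flower"),
            new Hit("/texts/intro.html", "v8textblock"),
            new Hit("/flowers/tulip.html", "v8flower"));
        // Facet counts per type, then the number of hits left after one constraint
        System.out.println(countByType(hits));
        System.out.println(drillDown(hits, "v8flower").size());
    }
}
```

The counts are what a facet list would display next to each value; applying `drillDown` corresponds to the user clicking one of those values.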
What is Faceted Search?

    Callouts on the screenshot of a search result page:

    ● The breadcrumb trail shows what constraints have
      already been applied and allows their removal
    ● “Resource types” is a facet, a way of categorizing
      the results
    ● containerpage, v8flower, v8textblock, … are
      constraints, or facet values
    ● The facet count shows how many results match each
      value
    ● Regular search results
    ● The tag bar shows other facet values of the found
      document that can be applied

    Benefits
Database as bottleneck

    ● DBs are proprietary
    ● Require elaborate infrastructures
    ● SQL queries are hard to formulate
    ● SQL on a DB is slower than search queries
    ● A large number of SQL statements turns the DB into
      a bottleneck
    ● Even lower-traffic sites slow down when executing
      too many statements on the DB layer
    ● Result: overall performance starts to degrade
Content retrieval so far

     ● OpenCms stores the content in an RDBMS
     ● To access values of an XML content you have to
       perform the following steps:
        1. Read the resource      ->  Resource (dates, refs, attr)
        2. Read binary content    ->  Content (blob)
        3. Un-marshal content     ->  Marshaled XML
        4. Access with getters    ->  Java Access Bean
The new way of content retrieval

     ● “Read” whole resource content by a single query
     ● Simpler data structures by storing documents
     ● New flexibility by using the power of Solr query syntax
     ● Best performance based on optimized index
     ● HTTP interface for external applications
     ● Secure, scalable and cost-effective access
     ● Reduced DB traffic and increased performance
OpenCms 8.5 Solr Integration

     Searching
Search with Solr in OpenCms

     ● Query OpenCms content using the power of Solr’s
       query syntax in one of three ways:

       1. Send an HTTP request to the Solr request handler
       2. Use the new Solr Collector
       3. Call the Java API search method
OpenCms Solr handler

     ● The REST-like interface of Solr lets you access
       indexed documents over HTTP without any
       knowledge of CMS-specific syntax
       ● A permission check is performed by OpenCms,
         making sure no secured documents are returned
       ● Solr-based UI frameworks like “Ajax Solr” can be
         used on your website without additional
         development costs
       ● It provides an open interface for external
         applications, e.g. mobile applications
Examples: REST / JAVA / Collector

      1. HTTP request to the Solr handler:

      http://localhost:8080/opencms/opencms/handleSolrSelect?fq=type:v8flower

      2. Solr Collector in a JSP:

      <cms:contentload collector="byQuery" param="type:v8flower">
          <cms:contentaccess var="content" />
          ${content.value.Title}
      </cms:contentload>

      3. Java API:

      CmsObject cms = getCmsObject();
      String query = "fq=type:v8flower";
      CmsSearchManager manager = OpenCms.getSearchManager();
      CmsSolrIndex index = manager.getIndexSolr("Solr Online");
      CmsSolrResultList results = index.search(cms, query);
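
An external client would usually build the handler URL programmatically. The following is a small sketch, assuming the handler path and host from the example above; the class `SolrHandlerUrlSketch` and its method are hypothetical helpers, not part of OpenCms. It URL-encodes the filter query so that characters such as `:` or spaces are transmitted safely.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds a request URL for the OpenCms Solr handler. */
final class SolrHandlerUrlSketch {

    /** Appends the handler name and a URL-encoded "fq" parameter to the base URL. */
    static String selectUrl(String baseUrl, String filterQuery) {
        String encoded = URLEncoder.encode(filterQuery, StandardCharsets.UTF_8);
        return baseUrl + "/handleSolrSelect?fq=" + encoded;
    }

    public static void main(String[] args) {
        // Base URL taken from the slide; adjust it to your own installation
        System.out.println(selectUrl("http://localhost:8080/opencms/opencms", "type:v8flower"));
    }
}
```

The resulting URL can be requested with any HTTP client; the permission check is still performed on the OpenCms side.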
Live Demo

     Indexing
Indexed data

     ● Data indexed by default (hard coded)

     ● Field configuration (opencms-search.xml)

     ● XSD field mapping (Content definition)

     ● Implement a custom field configuration (Java)
Solr schema

     ● The schema file contains all of the details
       about which fields your documents can
       contain
     ● OpenCms uses an adjusted version of the
       schema.xml that is contained within the
       Apache Solr standard distribution:
       WEB-INF/solr/conf/schema.xml
     ● If you want to add a new custom field or
       field type for documents you can modify
       this file
Advantages of field types

     ● Types are checked during the indexing
       process
     ● They enable easy range queries, even for
       dates, which makes a developer's life
       noticeably easier
     ● Custom types can be added, e.g.
       key/value tuples or some special JSON
       fields
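
As an illustration of a date range query, here is a short sketch that formats Solr's standard `field:[from TO to]` range syntax with ISO-8601 UTC timestamps. The helper class `SolrDateRangeSketch` is hypothetical; the field name `lastmodified` used in the example is one of the fields OpenCms indexes by default.

```java
import java.time.Instant;

/** Hypothetical helper that formats a Solr range query for a date field. */
final class SolrDateRangeSketch {

    /** Returns "field:[from TO to]" with both bounds as ISO-8601 UTC instants. */
    static String dateRange(String field, Instant from, Instant to) {
        if (from.isAfter(to)) {
            throw new IllegalArgumentException("from must not be after to");
        }
        // Instant.toString() yields the ISO-8601 form that Solr date fields expect
        return field + ":[" + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        Instant from = Instant.parse("2012-01-01T00:00:00Z");
        Instant to = Instant.parse("2012-09-25T00:00:00Z");
        System.out.println(dateRange("lastmodified", from, to));
    }
}
```

Because the field is typed as a date in the schema, the range is compared chronologically rather than as plain text.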
Default indexed data

     ●   id - Structure id of the resource, used as unique identifier for a document
     ●   path - Full root path of the resource (e.g. /sites/default/flower_en/.content/article.html)
     ●   path_hierarchy - The full path as a path-tokenized field (field type: text_path)
     ●   parent-folders - Parent folders (multi-valued field containing an entry for each parent path)
     ●   type - The resource type name
     ●   res_locales - Existing locale nodes for XML content, or all available locales in case of binary files
     ●   created - The creation date (the date when the resource itself was created)
     ●   lastmodified - The date last modified (the last modification date of the resource itself)
     ●   contentdate - The content date (the date when the resource's content was last modified)
     ●   released - The release and expiration date of the resource
     ●   content - A general content field that holds all extracted resource data (all languages, type text_general)
     ●   contentblob - The serialized extraction result, stored to improve the extraction performance while indexing
     ●   category - All categories as general text
     ●   category_exact - All categories as exact strings, for faceting
     ●   text_<locale> - Extracted textual content optimized for language-specific search
     ●   timestamp - The time when the document was last indexed
     ●   *_prop - All properties of a resource as searchable and stored text (<Property_Definition_Name>_prop)
     ●   *_exact - All properties of a resource as exact, not stored strings (<Property_Definition_Name>_exact)
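
The default fields can be combined in a single request. The sketch below assembles a query string that searches the `content` field, filters by `type`, and asks for facet counts on `category_exact`, using Solr's standard `q`, `fq`, `facet` and `facet.field` parameters. The class `SolrFacetQuerySketch` is hypothetical, and it is an assumption that your OpenCms handler passes the facet parameters through unchanged.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that assembles a faceted Solr query string. */
final class SolrFacetQuerySketch {

    /** Builds "q=...&fq=...&facet=true&facet.field=..." with URL-encoded values. */
    static String facetQuery(String text, String type, String facetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "content:" + text);     // full-text search on the general content field
        params.put("fq", "type:" + type);       // restrict results to one resource type
        params.put("facet", "true");            // switch faceting on
        params.put("facet.field", facetField);  // field whose values are counted

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> param : params.entrySet()) {
            query.add(param.getKey() + "=" + URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(facetQuery("rose", "v8flower", "category_exact"));
    }
}
```

Faceting on `category_exact` rather than `category` matters because the exact string field keeps each category as one value instead of splitting it into tokens.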
XSD field mapping

     ● Additional field mappings for XML contents can
       now be configured directly within the XSD schema
     ● This works without modifying opencms-search.xml,
       so no restart of the servlet container is required
       <searchsetting element="DisplayDate" searchcontent="false">
           <solrfield targetfield="myDisplayDateField" sourcefield="*_dt" />
       </searchsetting>
       <searchsetting element="Teaser">
           <solrfield targetfield="ateaser">
             <mapping type="item" default="Homepage n.a.">Homepage</mapping>
             <mapping type="property-search">search.special</mapping>
             <mapping type="dynamic" class="my.DynamicMapping">special</mapping>
           </solrfield>
       </searchsetting>

     Configuration
Enable Solr in OpenCms

     ● When installing OpenCms 8.5, Solr is enabled by default;
       after updating an existing system to OpenCms 8.5, Solr
       is disabled
     ● To enable Solr after updating, you must create a Solr home
       directory in the WEB-INF folder of your OpenCms application
     ● Copy the solr/ folder from the OpenCms standard distribution
       as a starting point for your configuration
     ● All search configuration is done as usual in
       opencms-search.xml below WEB-INF/config
     ● Adding the following lines will enable the embedded server
        <opencms><search>
            <solr enabled="true"/> […]
        </search></opencms>
Search index configuration

     ● You can add a custom Solr index with the known
       OpenCms search configuration syntax
     ● NOTE: class attributes are needed for the index and its
       field configuration
       <index
         class="org.opencms.search.solr.CmsSolrIndex">
           <name>Solr Online</name>
           <rebuild>auto</rebuild>
           <project>Online</project>
           <locale>all</locale>
           <configuration>solr_fields</configuration>
           <sources>
               <source>solr_source</source>
           </sources>
       </index>
Create field configuration (1/3)

     ● Convert an existing Lucene field configuration by:
       1. Copying a <fieldconfiguration> node
       2. Changing / setting the class attribute
       3. Optionally adding a type attribute for fields
        <fieldconfiguration
        class="org.opencms.search.solr.CmsSolrFieldConfiguration">
          <name>example</name>
          <description>Converted Lucene Index</description>
          <fields>
            <field name="meta" store="false" index="true" type="en">
              <mapping type="property">Title</mapping>
              <mapping type="property">Description</mapping>
            </field>
          </fields>
        </fieldconfiguration>
Create field configuration (2/3)

     ● As value for the type attribute of a field
       definition inside the opencms-search.xml
       you can use the name of any dynamic field
       defined in the schema.xml
     ● For example (attribute value and the Solr field
       type of the matching dynamic field):
       i     -   type="int"
       dt    -   type="date"
       txt   -   type="text_general"
       en    -   type="text_en"
       es    -   type="text_es"
       fr    -   type="text_fr"
Create field configuration (3/3)

     ● As previously said, the Solr field names
       (<solr_name>) are defined in the schema.xml; now
       we define additional fields (<opencms_name>)
       inside the opencms-search.xml
     ● How does that work?
       String fieldName = <opencms_name>_txt;
       if (existsInSolrSchema(fieldName)) {
           fieldName = <opencms_name>;
       } else if (isTypeAttributeSet()) {
           fieldName = <opencms_name>_<type>;
       }
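
One possible reading of the pseudo-code above, written as runnable Java. It assumes that the schema check is meant to test whether the plain OpenCms name is declared as a field in schema.xml; the class `FieldNameSketch`, its parameters and the sample schema contents are hypothetical and not taken from the OpenCms sources.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of how a Solr field name could be derived for an OpenCms field. */
final class FieldNameSketch {

    /**
     * Resolves the Solr field name for a field configured in opencms-search.xml.
     *
     * @param opencmsName  the field name from the OpenCms configuration
     * @param type         the optional type attribute (may be null or empty)
     * @param schemaFields field names assumed to be declared explicitly in schema.xml
     */
    static String resolve(String opencmsName, String type, Set<String> schemaFields) {
        if (schemaFields.contains(opencmsName)) {
            // Assumption: an explicitly declared schema field wins
            return opencmsName;
        }
        if (type != null && !type.isEmpty()) {
            // A type attribute selects the dynamic field with that suffix
            return opencmsName + "_" + type;
        }
        // Fallback: the general dynamic text field
        return opencmsName + "_txt";
    }

    public static void main(String[] args) {
        Set<String> schema = new HashSet<>(Arrays.asList("id", "path", "type"));
        System.out.println(resolve("meta", "en", schema));
        System.out.println(resolve("path", "en", schema));
        System.out.println(resolve("meta", null, schema));
    }
}
```

With this reading, the `meta` field from the earlier configuration example (type attribute `en`) would be stored in a dynamic field with the `_en` suffix.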
Live Demo
Future steps with IKS and Stanbol

     ● Having Solr and VIE integrated into OpenCms,
       we are well prepared to start using Apache
       Stanbol
     ● Stanbol is a top-level Apache project
     ● Stanbol guarantees a quality standard
     ● Stanbol opens the perspective of sustainability
     ● We are looking to integrate Stanbol into
       OpenCms 9
Live Demo
Integration Conclusion

     ● Permission checked search (secure)
     ● Solr Request handler (accessible)
     ● Solr Collector (integrated)
     ● Result highlighting (user-friendly)
     ● Configuration opportunities (flexible)
     ● Search field mapping (sensitive)
     ● Type based field schema (type-safe)
     ● Lucene conversion (compatible)
     Thank you very much for your
     attention!
     Rüdiger Kurz
     Alkacon Software GmbH

     http://www.alkacon.com
     http://www.opencms.org

     http://www.iks-project.eu
     http://stanbol.apache.org
Any Questions?
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Último (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content

  • 1. WORKSHOP TRACK Using Apache Solr to retrieve content 25.09.2012 Rüdiger Kurz, Alkacon Software
  • 3. Agenda: 1. What is Solr? 2. Benefits 3. Searching 4. Indexing 5. Configuration
  • 4. Retrieving data fast ● Apache Solr is hopefully not able to answer this question! ● But it will return the results in less than a second
  • 5. What is Apache Solr? ● Solr is an enterprise search platform from the Apache Lucene project ● Solr is highly scalable, providing distributed search and index replication ● Solr powers the search and navigation features ● Major features include: powerful full-text search, hit highlighting, faceted search, and rich document (e.g., Word, PDF) handling
  • 6. What is faceted search? ● Faceted search is the dynamic clustering of items or search results into categories ● Facets let users drill into search results (or even skip searching entirely) ● Each facet displayed typically shows the number of hits that match that category ● Users can then “drill down” by applying specific constraints to the search results ● Faceted search is also called faceted browsing, faceted navigation, guided navigation and sometimes parametric search
  • 7. What is Faceted Search? (annotated screenshot) ● The breadcrumb trail shows what constraints have already been applied and allows their removal ● “Resource types” is a facet, a way of categorizing the results ● containerpage, v8flower, v8textblock, … are constraints, or facet values ● The facet count shows how many results match each value ● The tag bar shows other facet values of the found document that can be applied ● Regular search results
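The facet count and the "drill down" step can be sketched in a few lines of plain Java. This is only an illustration of the idea, not how Solr computes facets internally; the `Doc` record, the class name and the sample paths are hypothetical, while the type names `v8flower` and `containerpage` come from the slide.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetCountSketch {

    /** Hypothetical stand-in for an indexed document: a path plus its resource type. */
    record Doc(String path, String type) {}

    /** Counts how many documents match each value of the "type" facet. */
    static Map<String, Integer> countByType(List<Doc> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc doc : docs) {
            counts.merge(doc.type(), 1, Integer::sum);
        }
        return counts;
    }

    /** Applies a constraint (one chosen facet value) to narrow the result list. */
    static List<Doc> drillDown(List<Doc> docs, String type) {
        return docs.stream().filter(d -> d.type().equals(type)).toList();
    }

    public static void main(String[] args) {
        List<Doc> results = List.of(
            new Doc("/flowers/rose.html", "v8flower"),
            new Doc("/flowers/tulip.html", "v8flower"),
            new Doc("/index.html", "containerpage"));
        System.out.println(countByType(results));               // {containerpage=1, v8flower=2}
        System.out.println(drillDown(results, "v8flower").size()); // 2
    }
}
```

The counts correspond to the numbers shown next to each facet value, and `drillDown` corresponds to clicking one of them.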
  • 8. Benefits
  • 9. Database as bottleneck ● DBs are proprietary ● They require elaborate infrastructures ● SQL queries are hard to formulate ● SQL on a DB is slower than search queries ● A large number of SQL statements turns the DB into a bottleneck ● Even lower-traffic sites will run slowly when executing too many statements on the DB layer → Overall performance starts to degrade
  • 10. Content retrieval so far ● OpenCms stores the content in an RDBMS ● To access values of an XML content you have to perform the following steps: 1. Read the resource (Resource: dates, refs, attr) 2. Read the binary content (Content: blob) 3. Un-marshal the content (Marshaled XML) 4. Access with getters (Java Access Bean)
  • 11. The new way of content retrieval ● “Read” whole resource content by a single query ● Simpler data structure by storing documents ● New flexibility by using the power of the Solr query syntax ● Best performance based on an optimized index ● HTTP interface for external applications ● Secure, scalable and cost-effective access ● Reduced DB traffic and increased performance
  • 12. OpenCms 8.5 Solr Integration
  • 13. Searching
  • 14. Search with Solr in OpenCms ● Querying OpenCms content using the power of Solr’s query syntax: 1. Send an HTTP request to the Solr request handler 2. Use the new Solr Collector 3. Call the Java API search method
  • 15. OpenCms Solr handler ● The REST-like interface of Solr lets you access indexed documents over HTTP without any knowledge of CMS-specific syntax ● A permission check is performed by OpenCms, making sure no secured documents are returned ● Solr-based UI frameworks like “Ajax Solr” can be used on your website without development costs ● Provides an open interface for external applications, e.g. mobile applications
  • 16. Examples: REST / Collector / Java
      1. REST:
         http://localhost:8080/opencms/opencms/handleSolrSelect?fq=type:v8flower
      2. Collector:
         <cms:contentload collector="byQuery" param="type:v8flower">
           <cms:contentaccess var="content" />
           ${content.value.Title}
         </cms:contentload>
      3. Java:
         CmsObject cms = getCmsObject();
         String query = "fq=type:v8flower";
         CmsSearchManager manager = OpenCms.getSearchManager();
         CmsSolrIndex index = manager.getIndexSolr("Solr Online");
         CmsSolrResultList results = index.search(cms, query);
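For external applications that call the handler over HTTP, the request URL from the REST example can be assembled programmatically. The following is a minimal sketch using only the Java standard library; the class and method names are hypothetical, the handler URL is the one shown on the slide, and repeating `fq` once per filter is standard Solr behaviour. It only builds the URL and does not send the request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class SolrHandlerUrlSketch {

    /** Builds a handler URL with one fq parameter per filter query. */
    static String buildSelectUrl(String handlerUrl, String... filterQueries) {
        if (filterQueries.length == 0) {
            return handlerUrl;
        }
        StringJoiner params = new StringJoiner("&");
        for (String fq : filterQueries) {
            // URL-encode each filter so characters such as ':' and spaces are transported safely.
            params.add("fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8));
        }
        return handlerUrl + "?" + params;
    }

    public static void main(String[] args) {
        String url = buildSelectUrl(
            "http://localhost:8080/opencms/opencms/handleSolrSelect",
            "type:v8flower");
        // prints http://localhost:8080/opencms/opencms/handleSolrSelect?fq=type%3Av8flower
        System.out.println(url);
    }
}
```

Encoding the colon as `%3A` is equivalent to the unencoded form on the slide; the server decodes it back to `type:v8flower`.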
  • 17. Live Demo
  • 18. Indexing
  • 19. Indexed data ● Data indexed by default (hard coded) ● Field configuration (opencms-search.xml) ● XSD field mapping (Content definition) ● Implement a custom field configuration (Java)
  • 20. Solr schema ● The schema file contains all of the details about which fields your documents can contain ● OpenCms uses an adjusted version of the schema.xml that is contained in the Apache Solr standard distribution: WEB-INF/solr/conf/schema.xml ● If you want to add a new custom field or field type for documents you can modify this file
  • 21. Advantages of field types ● Types are checked during the index process ● Typed fields enable easy range queries, even for dates, which makes developers' lives considerably easier ● Custom types can be added, e.g. key/value tuples or special JSON fields
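As an illustration of a range query on a date field, the sketch below builds a Solr range expression of the form `field:[from TO to]` with ISO-8601 UTC timestamps, which is the syntax Solr expects for date fields. The class and method names are hypothetical; `lastmodified` is one of the default fields listed in this deck.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class DateRangeQuerySketch {

    /** Builds an inclusive Solr range query for a date field. */
    static String dateRange(String field, Instant from, Instant to) {
        // Instant.toString() yields ISO-8601 in UTC, e.g. 2012-01-01T00:00:00Z;
        // truncating to seconds keeps the expression short.
        String start = from.truncatedTo(ChronoUnit.SECONDS).toString();
        String end = to.truncatedTo(ChronoUnit.SECONDS).toString();
        return field + ":[" + start + " TO " + end + "]";
    }

    public static void main(String[] args) {
        String query = dateRange(
            "lastmodified",
            Instant.parse("2012-01-01T00:00:00Z"),
            Instant.parse("2012-09-25T00:00:00Z"));
        // prints lastmodified:[2012-01-01T00:00:00Z TO 2012-09-25T00:00:00Z]
        System.out.println(query);
    }
}
```

The resulting string can be passed as a filter query, for example as the value of an `fq` parameter.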
  • 22. Default indexed data
      ● id - Structure id of the resource, used as the unique identifier of a document
      ● path - Full root path of the resource, e.g. /sites/default/flower_en/.content/article.html
      ● path_hierarchy - The full path as a path-tokenized field (field type: text_path)
      ● parent-folders - Parent folders (multi-valued field containing an entry for each parent path)
      ● type - Type name (the resource type name)
      ● res_locales - Existing locale nodes for XML content, and all available locales in the case of binary files
      ● created - The creation date (the date when the resource itself was created)
      ● lastmodified - The date last modified (the last modification date of the resource itself)
      ● contentdate - The content date (the date when the resource's content was modified)
      ● released - The release and expiration date of the resource
      ● content - A general content field that holds all extracted resource data (all languages, type text_general)
      ● contentblob - The serialized extraction result, stored to improve extraction performance while indexing
      ● category - All categories as general text
      ● category_exact - All categories as exact strings, for faceting
      ● text_<locale> - Extracted textual content optimized for language-specific search
      ● timestamp - The time when the document was last indexed
      ● *_prop - All properties of a resource as searchable and stored text (<Property_Definition_Name>_prop)
      ● *_exact - All properties of a resource as exact, not stored strings (<Property_Definition_Name>_exact)
  • 23. XSD field mapping ● Additional field mappings for XML contents can now be configured directly within the XSD schema ● Without modifying opencms-search.xml → No restart of the servlet container required
      <searchsetting element="DisplayDate" searchcontent="false">
        <solrfield targetfield="myDisplayDateField" sourcefield="*_dt" />
      </searchsetting>
      <searchsetting element="Teaser">
        <solrfield targetfield="ateaser">
          <mapping type="item" default="Homepage n.a.">Homepage</mapping>
          <mapping type="property-search">search.special</mapping>
          <mapping type="dynamic" class="my.DynamicMapping">special</mapping>
        </solrfield>
      </searchsetting>
  • 24. Configuration
  • 25. Enable Solr in OpenCms ● On a fresh installation of OpenCms 8.5, Solr is enabled by default; after updating an existing system to OpenCms 8.5, Solr is disabled ● To enable Solr after updating, you must create a Solr home directory in the WEB-INF folder of your OpenCms application ● Copy the solr/ folder from the OpenCms standard distribution as a starting point for your configuration ● All search configuration is done as usual in opencms-search.xml below WEB-INF/config ● Adding the following lines will enable the embedded server:
      <opencms><search>
        <solr enabled="true"/>
        […]
      </search></opencms>
  • 26. Search index configuration ● You can add a custom Solr index with the known OpenCms search configuration syntax ● NOTE: class attributes are needed for the index and its field configuration
      <index class="org.opencms.search.solr.CmsSolrIndex">
        <name>Solr Online</name>
        <rebuild>auto</rebuild>
        <project>Online</project>
        <locale>all</locale>
        <configuration>solr_fields</configuration>
        <sources>
          <source>solr_source</source>
        </sources>
      </index>
  • 27. Create field configuration (1/3) ● To convert a Lucene field configuration: 1. Copy a <fieldconfiguration> node 2. Change / set the class attribute 3. Optionally add a type attribute to fields
      <fieldconfiguration class="org.opencms.search.solr.CmsSolrFieldConfiguration">
        <name>example</name>
        <description>Converted Lucene Index</description>
        <fields>
          <field name="meta" store="false" index="true" type="en">
            <mapping type="property">Title</mapping>
            <mapping type="property">Description</mapping>
          </field>
        </fields>
      </fieldconfiguration>
  • 28. Create field configuration (2/3) ● As the value of the type attribute of a field definition inside opencms-search.xml you can use the name of any dynamic field defined in schema.xml ● For example: i - type="int", dt - type="date", txt - type="text_general", en - type="text_en", es - type="text_es", fr - type="text_fr"
  • 29. Create field configuration (3/3) ● As previously said, the field names <solr_name> are defined in the schema.xml of Solr; now we define additional fields <opencms_name> inside opencms-search.xml ● How does that work?
      String fieldName = <opencms_name>_txt;
      if (existsInSolrSchema(fieldName)) {
        fieldName = <opencms_name>;
      } else if (isTypeAttributeSet()) {
        fieldName = <opencms_name>_<type>;
      }
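One possible reading of the pseudocode on this slide, written as runnable Java. This is a sketch under assumptions, not the OpenCms implementation: it assumes the schema check looks up the plain OpenCms field name, it models the Solr schema as a simple set of field names, and the class, method and parameter names are hypothetical.

```java
import java.util.Set;

public class FieldNameSketch {

    /**
     * Resolves the Solr field name for a field configured in opencms-search.xml.
     *
     * @param opencmsName the field name from the OpenCms configuration
     * @param type the optional type attribute (e.g. "en", "dt"), or null if unset
     * @param solrSchemaFields field names explicitly defined in schema.xml (assumed model)
     */
    static String resolveFieldName(String opencmsName, String type, Set<String> solrSchemaFields) {
        // Default: fall back to the general text dynamic field.
        String fieldName = opencmsName + "_txt";
        if (solrSchemaFields.contains(opencmsName)) {
            // The field is declared in the Solr schema: use its name unchanged.
            fieldName = opencmsName;
        } else if (type != null && !type.isEmpty()) {
            // A type attribute is set: append it as the dynamic field suffix.
            fieldName = opencmsName + "_" + type;
        }
        return fieldName;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("content", "path");
        System.out.println(resolveFieldName("content", "en", schema)); // content
        System.out.println(resolveFieldName("meta", "en", schema));    // meta_en
        System.out.println(resolveFieldName("meta", null, schema));    // meta_txt
    }
}
```

Under this reading, the `meta` field with `type="en"` from the field configuration example would be indexed as `meta_en`.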
  • 30. Live Demo
  • 31. Future steps with IKS and Stanbol ● Having Solr and VIE integrated into OpenCms, we are well prepared to start using Apache Stanbol ● Stanbol is a top-level Apache project ● Stanbol guarantees a quality standard ● Stanbol opens the perspective of sustainability ● We are looking to integrate Stanbol into OpenCms 9
  • 32. Live Demo
  • 33. Integration Conclusion ● Permission checked search (secure) ● Solr Request handler (accessible) ● Solr Collector (integrated) ● Result highlighting (user-friendly) ● Configuration opportunities (flexible) ● Search field mapping (sensitive) ● Type based field schema (type-safe) ● Lucene conversion (compatible)
  • 34. Thank you very much for your attention! Rüdiger Kurz, Alkacon Software GmbH http://www.alkacon.com http://www.opencms.org http://www.iks-project.eu http://stanbol.apache.org
  • 35. Any Questions? Questions? Fragen? 質問 ¿Preguntas? Questiones?