Apache Solr
Ramzi Alqrainy
Search Guy
Part 1
What is Apache Solr?
Apache Solr is "a standalone full-text search
server with Apache Lucene at the backend."
Cont.
Apache Lucene is a high-performance, full-featured
text search engine library written entirely in Java.

In brief, Apache Solr exposes Lucene's Java API as
REST-like APIs that can be called over HTTP from any
programming language or platform.
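As a rough illustration of the HTTP interface described above, the sketch below builds a query URL in Python. The host, port, core name (`articles`), and the `/select` handler path are assumptions about a typical local setup, not details taken from the slides.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr core named "articles" (hypothetical).
SOLR_SELECT_URL = "http://localhost:8983/solr/articles/select"

def build_query_url(text, rows=10):
    """Return a Solr select URL for a simple keyword query."""
    params = {
        "q": text,     # the user's query string
        "rows": rows,  # number of results to return
        "wt": "json",  # ask for a JSON response
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

url = build_query_url("wine")
print(url)
```

Any HTTP client (a browser, `curl`, or `urllib.request.urlopen`) could then fetch that URL.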
Why use Apache Solr?
Features
•  Full-text search
•  Faceted navigation
•  More like this (recommendations) / related searches
•  Spell suggest / auto-complete
•  Custom document ranking/ordering
•  Snippet generation/highlighting
And a lot more...
Why Solr?
Solr also provides:
1. Result Grouping / Field Collapsing
2. Query Elevation
3. Pivot Facet
4. Pluggable Search/Update Workflow
5. Hash-Based Deduplication
Field Collapsing
"Collapses a group of results with the same
field value down to a single (or fixed
number) of entries."
For example, most search engines such as
Google collapse on site so only one or two
entries are shown, along with a link to click
to see more results from that site. Field
collapsing can also be used to suppress
duplicate documents.
Result Grouping
"Groups documents with a common field
value into groups, returning the top
documents per group, and the top groups
based on what documents are in the groups."
One example is a search at Best Buy for a
common term such as DVD, which shows the
top 3 results for each category ("TVs &
Video", "Movies", "Computers", etc.).
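The grouping behaviour described above can be requested with Solr's `group`, `group.field`, and `group.limit` parameters. The following is a minimal sketch of building such a request; the field name `category` is a hypothetical schema field chosen to mirror the Best Buy example.

```python
from urllib.parse import urlencode

def grouping_params(query, field, per_group=3):
    """Build request parameters that group results by one field."""
    return {
        "q": query,
        "group": "true",           # turn result grouping on
        "group.field": field,      # field whose values define the groups
        "group.limit": per_group,  # top documents returned per group
    }

params = grouping_params("DVD", "category")
print(urlencode(params))
```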
Query Elevation
Enables you to configure the top results for a
given query regardless of the normal Lucene
scoring. This is sometimes called "sponsored
search", "editorial boosting", or "best bets".
Pivot Facet
You can think of it as "decision tree
faceting": it tells you in advance what
the "next" set of facet results would be for a
field if you apply a constraint from the
current facet results.
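To make the "decision tree" idea concrete, here is a small in-memory sketch of the nested counts that pivot faceting produces. It is not Solr code: in Solr the equivalent request would use `facet=true` with a `facet.pivot` parameter listing the fields. The field names and documents below are hypothetical.

```python
from collections import defaultdict

def pivot_counts(docs, outer, inner):
    """Count docs per outer-field value, and per inner-field value within each."""
    tree = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        tree[doc[outer]][doc[inner]] += 1
    return {key: dict(children) for key, children in tree.items()}

# Hypothetical documents with two facet fields.
docs = [
    {"category": "Movies", "format": "DVD"},
    {"category": "Movies", "format": "Blu-ray"},
    {"category": "Movies", "format": "DVD"},
    {"category": "Computers", "format": "DVD"},
]
counts = pivot_counts(docs, "category", "format")
print(counts)
```

Each outer value carries the facet counts a user would see next if they selected it, which is what lets an interface show the following level before the constraint is applied.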
Pluggable Search/Update Workflow
You can modify the workflow of existing API
endpoints and of document inserts or updates.
Hash-Based Deduplication
Determines the uniqueness of a document
based not on an ID field but on the hash
signature of a field.

Useful for web pages, for example, where the
URL may be different but the content is the
same.
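The idea above can be sketched outside Solr in a few lines. This is an illustration of signature-based deduplication in general, not Solr's own implementation; the whitespace and case normalization and the choice of SHA-256 are assumptions of this sketch.

```python
import hashlib

def content_signature(text):
    """Hash normalized text so identical content yields the same signature."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(pages):
    """Keep the first page seen for each content signature."""
    seen = {}
    for page in pages:
        seen.setdefault(content_signature(page["content"]), page)
    return list(seen.values())

# Hypothetical pages: the first two differ only in URL and spacing.
pages = [
    {"url": "http://example.com/a", "content": "Solr in five minutes"},
    {"url": "http://example.com/b?ref=home", "content": "Solr  in five minutes"},
    {"url": "http://example.com/c", "content": "Lucene scoring explained"},
]
unique = deduplicate(pages)
print([page["url"] for page in unique])
```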
Boost documents by age
•  Just sort by age, newest first = done?
•  Boost more recent documents and
penalize older documents just for
being old
•  Useful for news, business docs, and
local search
Solr: Indexing
In schema.xml:
<fieldType name="tdate"
class="solr.TrieDateField"
omitNorms="true"
precisionStep="6"
positionIncrementGap="0"/>
<field name="pubdate"
type="tdate"
indexed="true"
stored="true"
required="true" />
Date published =
DateUtils.round(item.getPublishedOnDate(),Calendar.HOUR);
FunctionQuery Basics
•  FunctionQuery: Computes a value for each
document
–  Ranking
–  Sorting
•  Available functions: constant, literal, fieldvalue, ord,
rord, sum, sub, product, pow, abs, log, sqrt, map, scale,
query, linear, recip, max, min, ms, sqedist (squared
Euclidean distance), hsin and ghhsin (Haversine formula),
geohash (convert to geohash), strdist
Solr: Query Time Boost
•  Use the recip function with the ms function:
q={!boost b=$recency v=$qq}&
recency=recip(ms(NOW/HOUR,pubdate),3.16e-11,0.08,0.05)&
qq=wine
•  Use edismax instead of dismax if possible:
q=wine&
boost=recip(ms(NOW/HOUR,pubdate),3.16e-11,0.08,0.05)
•  Recip is a highly tunable function
–  recip(x,m,a,b) implements a / (m*x + b)
–  m = 3.16E-11, a = 0.08, b = 0.05, x = document age in milliseconds
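To get a feel for how the reciprocal function above behaves, the sketch below evaluates a / (m*x + b) in plain Python with the slide's constants. The helper names and the sample ages are illustrative only. Since m = 3.16e-11 is roughly one divided by the number of milliseconds in a year, m*x is approximately the document's age in years.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # about 3.16e10 ms

def recip(x, m, a, b):
    """Reciprocal function as defined above: a / (m*x + b)."""
    return a / (m * x + b)

def recency_boost(age_ms, m=3.16e-11, a=0.08, b=0.05):
    """Boost multiplier for a document that is age_ms milliseconds old."""
    return recip(age_ms, m, a, b)

# Sample ages chosen for illustration.
for years in (0, 1, 5):
    boost = recency_boost(years * MS_PER_YEAR)
    print(f"{years} year(s) old -> boost {boost:.3f}")
```

A brand-new document gets a/b = 1.6, and the multiplier falls off quickly during the first year; changing a and b alters how steep that drop is.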
Tune Solr recip function
Tips and Tricks
•  Boost should be a multiplier on the relevancy score
•  {!boost b=} syntax confuses the spell checker, so you
need to use spellcheck.q to be explicit
q={!boost b=$recency v=$qq}&spellcheck.q=wine
•  Bottom out the old-age penalty using max:
–  max(recip(…), 0.20)
•  Not a one-size-fits-all solution; academic research
has focused on when to apply it
Boost by Popularity
•  Score based on number of unique views
•  Not known at indexing time
•  View count should be broken into time slots
Popularity Illustrated
Solr: ExternalFileField
In schema.xml:
<fieldType name="externalPopularityScore"
keyField="id"
defVal="1"
stored="false" indexed="false"
class="solr.ExternalFileField"
valType="pfloat"/>
<field name="popularity"
type="externalPopularityScore" />
Popularity Boost: Nuts & Bolts
[Diagram: user activity on the Solr server is logged;
a view-counting job reads the logs and writes
solr-home/data/external_popularity (a=1.114, b=1.05,
c=1.111, …), followed by a commit.]
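The final step of that pipeline, writing the key=value file, might look like the sketch below. The function name is hypothetical, the temporary directory stands in for solr-home/data, and writing to a temporary file before swapping it into place is a design choice of this sketch rather than something the slides prescribe.

```python
import os
import tempfile

def write_popularity_file(scores, data_dir, field="popularity"):
    """Write key=value lines to external_<field> inside data_dir."""
    path = os.path.join(data_dir, "external_" + field)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as out:
        for doc_id in sorted(scores):
            out.write(f"{doc_id}={scores[doc_id]}\n")
    os.replace(tmp_path, path)  # swap the finished file into place
    return path

data_dir = tempfile.mkdtemp()  # stand-in for solr-home/data
path = write_popularity_file({"a": 1.114, "b": 1.05, "c": 1.111}, data_dir)
with open(path, encoding="utf-8") as handle:
    print(handle.read())
```

Swapping the file in with a single rename means a reader should never see a half-written score file.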
Popularity Tips & Tricks
•  For big, high-traffic sites, use log analysis
–  Perfect problem for MapReduce
–  Take a look at Hive for analyzing large volumes
of log data
•  Minimum popularity score is 1 (not zero) …
up to 2 or more
–  1 + (0.4*recent + 0.3*lastWeek + 0.2*lastMonth
…)
•  Watch out for spell checker "buildOnCommit"
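The weighted score in the list above can be sketched as follows. Only the three weights shown on the slide are used; the slide elides any further terms, and this sketch does not guess them. It also assumes each slot value has already been normalized to the range 0..1, which the slides do not state.

```python
# Weights shown on the slide; the slide elides any further terms.
WEIGHTS = {"recent": 0.4, "lastWeek": 0.3, "lastMonth": 0.2}

def popularity_score(slot_values):
    """Return 1 + the weighted sum of per-slot values.

    slot_values maps a time-slot name to a view measure that is
    assumed to be normalized to 0..1 (an assumption of this sketch).
    """
    weighted = sum(WEIGHTS[slot] * slot_values.get(slot, 0.0) for slot in WEIGHTS)
    return 1.0 + weighted

print(popularity_score({}))  # a document with no views keeps the minimum of 1
print(popularity_score({"recent": 1.0, "lastWeek": 0.5, "lastMonth": 0.0}))
```

Starting at 1 rather than 0 matters because the score is used as a multiplier: an unviewed document keeps its relevancy score instead of being zeroed out.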
Filtering By User Preferences
•  Easy approach is to build basic preference
fields into the index:
–  Content types of interest: content_type
–  High-level categories of interest: category
–  Source of interest: source

•  We had too many categories and sources that
a user could enable or disable to use basic
filtering
–  Custom SearchComponent with a connection to a
JDBC DataSource
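The basic approach in the first bullet can be expressed as Solr filter queries (`fq`). The sketch below is a minimal illustration: the preference values are hypothetical, and values are quoted but not otherwise escaped, so it assumes they contain no double quotes.

```python
from urllib.parse import urlencode

def preference_filters(prefs):
    """Turn {field: [allowed values]} into a list of fq clauses."""
    clauses = []
    for field in sorted(prefs):
        values = " OR ".join(f'"{value}"' for value in prefs[field])
        clauses.append(f"{field}:({values})")
    return clauses

# Hypothetical user preferences.
prefs = {"content_type": ["news", "blog"], "category": ["wine"]}
fq = preference_filters(prefs)
query = urlencode([("q", "wine")] + [("fq", clause) for clause in fq])
print(fq)
```

With many categories and sources per user these clauses grow long, which is the limitation the second bullet describes.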
Preferences Component
•  Connects to a database
•  Caches DocIdSet in a Solr FastLRUCache
•  Cached values marked as dirty using a
simple timestamp passed in the request

Declared in solrconfig.xml:
<searchComponent
class="demo.solr.PreferencesComponent"
name="pref">
<str name="jdbcJndi">jdbc/solr</str>
</searchComponent>
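The timestamp-based invalidation mentioned above can be sketched independently of Solr. This is not the author's component (which is a Java SearchComponent backed by FastLRUCache); it only illustrates the dirty-check logic, with a plain dictionary instead of an LRU cache and a stand-in function in place of the JDBC lookup.

```python
class PreferenceCache:
    """Cache per-user results; reload when the request carries a newer timestamp."""

    def __init__(self, loader):
        self._loader = loader  # callable: user_id -> allowed document ids
        self._entries = {}     # user_id -> (timestamp, document ids)

    def get(self, user_id, prefs_updated_at):
        cached = self._entries.get(user_id)
        if cached is None or cached[0] < prefs_updated_at:
            # Missing or dirty: preferences changed after we cached them.
            doc_ids = frozenset(self._loader(user_id))
            self._entries[user_id] = (prefs_updated_at, doc_ids)
            return doc_ids
        return cached[1]

calls = []

def load_from_db(user_id):
    """Stand-in for the database lookup (hypothetical data)."""
    calls.append(user_id)
    return {1, 2, 3}

cache = PreferenceCache(load_from_db)
cache.get("u1", prefs_updated_at=100)  # first request: loads from the database
cache.get("u1", prefs_updated_at=100)  # same timestamp: served from the cache
cache.get("u1", prefs_updated_at=200)  # newer timestamp: entry is dirty, reloads
print(len(calls))
```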
YOU HAVE QUESTIONS
… WE HAVE ANSWERS!
ramzi.alqrainy@gmail.com

CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity Ranking

  • 3. Apache Solr is "a standalone full-text search server with Apache Lucene at the backend."
  • 4. Cont. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. In brief, Apache Solr exposes Lucene's Java API as a REST-like API that can be called over HTTP from any programming language or platform.
  • 6. Features • Full-text search • Faceted navigation • More items like this (recommendation) / related searches • Spell suggest / auto-complete • Custom document ranking/ordering • Snippet generation/highlighting • And a lot more
  • 7. Why Solr? Solr also provides: 1. Result Grouping / Field Collapsing 2. Query Elevation 3. Pivot Facet 4. Pluggable Search/Update Workflow 5. Hash-Based Duplication
  • 8. Field Collapsing: "Collapses a group of results with the same field value down to a single (or fixed number) of entries." For example, most search engines such as Google collapse on site so only one or two entries are shown, along with a link to click to see more results from that site. Field collapsing can also be used to suppress duplicate documents.
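The collapsing behaviour described on slide 8 can be illustrated outside Solr with a few lines of plain Java. This is only a sketch of the idea, not Solr's implementation: the class name `FieldCollapse`, the `{site, title}` array layout, and the assumption that the input is already sorted by score are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FieldCollapse {
    // Hypothetical helper: each result is {site, title}, and the list is
    // assumed to be ordered best-first. Keeps at most perGroup entries
    // for each distinct site value, preserving the original order.
    static List<String[]> collapse(List<String[]> ranked, int perGroup) {
        Map<String, Integer> seenPerSite = new HashMap<>();
        List<String[]> collapsed = new ArrayList<>();
        for (String[] result : ranked) {
            int count = seenPerSite.merge(result[0], 1, Integer::sum);
            if (count <= perGroup) {
                collapsed.add(result);
            }
        }
        return collapsed;
    }

    public static void main(String[] args) {
        List<String[]> ranked = Arrays.asList(
                new String[] {"a.example", "page 1"},
                new String[] {"a.example", "page 2"},
                new String[] {"b.example", "page 3"});
        for (String[] r : collapse(ranked, 1)) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

Raising `perGroup` above 1 gives the "fixed number of entries" variant mentioned in the quote.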
  • 9. Result Grouping: "groups documents with a common field value into groups, returning the top documents per group, and the top groups based on what documents are in the groups." One example is a search at Best Buy for a common term such as DVD, which shows the top 3 results for each category ("TVs & Video", "Movies", "Computers", etc.).
  • 10. Query Elevation: enables you to configure the top results for a given query regardless of the normal Lucene scoring. This is sometimes called "sponsored search", "editorial boosting" or "best bets".
  • 11. Pivot Facet: You can think of it as "Decision Tree Faceting", which tells you in advance what the "next" set of facet results would be for a field if you apply a constraint from the current facet results.
  • 12. Pluggable Search/Update Workflow: You can modify the workflow of existing API endpoints and of document inserts or updates.
  • 13. Hash-Based Duplication: Determining the uniqueness of a document based not on an ID field but on the hash signature of a field. Useful for web pages, for example, where the URL may be different but the content the same.
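The signature idea on slide 13 can be sketched in plain Java as follows. This is an illustration of hashing field content to spot duplicates, not Solr's own code: the class name `ContentSignature`, the choice of MD5, and the normalization step (trim, lowercase, collapse whitespace) are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

final class ContentSignature {
    // Hypothetical helper: two documents with different URLs but the same
    // (normalized) content produce the same signature.
    static String signature(String content) {
        // Assumed normalization: trim, lowercase, collapse whitespace runs.
        String normalized = content.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", " ");
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String first = signature("Apache Solr is a search server.");
        String second = signature("  apache solr   is a search server. ");
        System.out.println(first.equals(second));
    }
}
```

A document whose signature has already been seen can then be treated as a duplicate, whatever its ID or URL.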
  • 14. Boost documents by age • Just do a descending sort by age = done? • Boost more recent documents and penalize older documents just for being old • Useful for news, business docs, and local search
  • 15. Solr: Indexing. In schema.xml: <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/> <field name="pubdate" type="tdate" indexed="true" stored="true" required="true" /> In Java: Date published = DateUtils.round(item.getPublishedOnDate(), Calendar.HOUR);
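Slide 15 rounds the publish date to the hour before indexing, which pairs naturally with the `NOW/HOUR` rounding used in the queries on slide 17. A rough equivalent using `java.time` instead of the `DateUtils` helper on the slide might look like this; the class name `HourRounding` is hypothetical, and rounding a value exactly on the half hour upwards is an assumption of this sketch.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class HourRounding {
    // Rounds an instant to the nearest full hour (UTC).
    // Assumption: exactly 30 minutes past the hour rounds up.
    static Instant roundToHour(Instant t) {
        Instant floor = t.truncatedTo(ChronoUnit.HOURS);
        long secondsPastHour = ChronoUnit.SECONDS.between(floor, t);
        return secondsPastHour >= 1800 ? floor.plus(1, ChronoUnit.HOURS) : floor;
    }

    public static void main(String[] args) {
        System.out.println(roundToHour(Instant.parse("2013-05-01T10:12:45Z")));
        System.out.println(roundToHour(Instant.parse("2013-05-01T10:47:03Z")));
    }
}
```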
  • 16. FunctionQuery Basics • FunctionQuery: computes a value for each document – Ranking – Sorting • Functions: constant, literal, fieldvalue, ord, rord, sum, sub, product, pow, abs, log, sqrt, map, scale, query, linear, recip, max, min, ms, sqedist (squared Euclidean distance), hsin and ghhsin (Haversine formula), geohash (convert to geohash), strdist
  • 17. Solr: Query Time Boost • Use the recip function with the ms function: q={!boost b=$recency v=$qq}&recency=recip(ms(NOW/HOUR,pubdate),3.16e-11,0.08,0.05)&qq=wine • Use edismax rather than dismax if possible: q=wine&boost=recip(ms(NOW/HOUR,pubdate),3.16e-11,0.08,0.05) • recip is a highly tunable function – recip(x,m,a,b) implements a / (m*x + b) – m = 3.16E-11, a = 0.08, b = 0.05, x = document age
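To get a feel for the numbers on slide 17, the formula a / (m*x + b) can be evaluated directly. The sketch below is plain Java, not Solr code; the class name `RecencyBoost` and the sample ages are made up for illustration. With the slide's parameters, a document of age zero gets a / b = 1.6, and since m is roughly one over the number of milliseconds in a year, a one-year-old document gets roughly 0.08 / 1.05, or about 0.076.

```java
final class RecencyBoost {
    // Milliseconds in an average year; m = 3.16e-11 is roughly 1 / MS_PER_YEAR.
    static final double MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

    // Same shape as the function on the slide: recip(x, m, a, b) = a / (m * x + b).
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        double m = 3.16e-11, a = 0.08, b = 0.05; // parameters from the slide
        double[] agesInYears = {0, 0.5, 1, 2, 5}; // illustrative sample ages
        for (double years : agesInYears) {
            double ageMs = years * MS_PER_YEAR; // x = document age in milliseconds
            System.out.printf("age %.1f years -> boost %.4f%n", years, recip(ageMs, m, a, b));
        }
    }
}
```

Changing `a` and `b` shifts the maximum boost, while `m` controls how quickly the boost decays with age.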
  • 18. Tune Solr recip function
  • 19. Tips and Tricks • Boost should be a multiplier on the relevancy score • The {!boost b=} syntax confuses the spell checker, so use spellcheck.q to be explicit: q={!boost b=$recency v=$qq}&spellcheck.q=wine • Bottom out the old-age penalty with a floor, using max: – max(recip(…), 0.20) • Not a one-size-fits-all solution – academic research has focused on when to apply it
  • 20. Boost by Popularity • Score based on number of unique views • Not known at indexing time • View count should be broken into time slots
  • 22. Solr: ExternalFileField. In schema.xml: <fieldType name="externalPopularityScore" keyField="id" defVal="1" stored="false" indexed="false" class="solr.ExternalFileField" valType="pfloat"/> <field name="popularity" type="externalPopularityScore" />
  • 23. Popularity Boost: Nuts & Bolts. Diagram: user activity on the Solr server is logged; a view counting job reads the logs and writes solr-home/data/external_popularity (a=1.114 b=1.05 c=1.111 …), followed by a commit.
  • 24. Popularity Tips & Tricks • For big, high-traffic sites, use log analysis – Perfect problem for MapReduce – Take a look at Hive for analyzing large volumes of log data • Minimum popularity score is 1 (not zero) … up to 2 or more – 1 + (0.4*recent + 0.3*lastWeek + 0.2*lastMonth …) • Watch out for spell checker "buildOnCommit"
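The scoring formula on slide 24 and the `id=score` file layout shown on slide 23 can be combined into a small sketch of what a view counting job might emit. The class name `PopularityFile` is hypothetical, the inputs are assumed to be view counts already normalized to the range 0 to 1, and only the three weights visible on the slide are used because the remaining terms are elided there.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class PopularityFile {
    // 1 + (0.4*recent + 0.3*lastWeek + 0.2*lastMonth ...), using only the
    // three terms shown on the slide. Inputs are assumed to be normalized
    // view counts in [0, 1]; a document with no views scores exactly 1.
    static double score(double recent, double lastWeek, double lastMonth) {
        return 1.0 + (0.4 * recent + 0.3 * lastWeek + 0.2 * lastMonth);
    }

    // Renders one "id=score" line per document, the key=value layout
    // shown for the external_popularity file.
    static String toLines(Map<String, Double> scoresById) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Double> e : scoresById.entrySet()) {
            out.append(e.getKey())
               .append('=')
               .append(String.format(Locale.ROOT, "%.3f", e.getValue()))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("a", score(0.2, 0.1, 0.02)); // illustrative inputs
        scores.put("b", score(0.0, 0.0, 0.0));
        System.out.print(toLines(scores));
    }
}
```

Keeping the minimum at 1 means an unviewed document is left unchanged when the popularity value is used as a multiplier on the relevancy score.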
  • 25. Filtering By User Preferences • Easy approach is to build basic preference fields into the index: – Content types of interest – content_type – High-level categories of interest – category – Source of interest – source • We had too many categories and sources that a user could enable or disable to use basic filtering – Custom SearchComponent with a connection to a JDBC DataSource
  • 26. Preferences Component • Connects to a database • Caches DocIdSet in a Solr FastLRUCache • Cached values marked as dirty using a simple timestamp passed in the request. Declared in solrconfig.xml: <searchComponent class="demo.solr.PreferencesComponent" name="pref"> <str name="jdbcJndi">jdbc/solr</str> </searchComponent>
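The caching behaviour described on slide 26 (an LRU cache whose entries are treated as dirty when the request carries a newer timestamp) can be approximated in plain Java. This sketch does not use Solr's FastLRUCache or DocIdSet; the class name `PreferenceCache`, the `loader` callback standing in for the database lookup, and the use of a `LinkedHashMap` in access order are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class PreferenceCache<V> {
    // A cached value together with the timestamp it was loaded for.
    private static final class Cached<T> {
        final long loadedFor;
        final T value;
        Cached(long loadedFor, T value) {
            this.loadedFor = loadedFor;
            this.value = value;
        }
    }

    private final Map<String, Cached<V>> lru;

    PreferenceCache(final int maxSize) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.lru = new LinkedHashMap<String, Cached<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Cached<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    // prefsUpdatedAt is the timestamp passed with the request; an entry
    // loaded for an older timestamp is considered dirty and is reloaded.
    synchronized V get(String userId, long prefsUpdatedAt, Function<String, V> loader) {
        Cached<V> cached = lru.get(userId);
        if (cached == null || cached.loadedFor < prefsUpdatedAt) {
            cached = new Cached<>(prefsUpdatedAt, loader.apply(userId));
            lru.put(userId, cached);
        }
        return cached.value;
    }

    public static void main(String[] args) {
        PreferenceCache<String> cache = new PreferenceCache<>(100);
        // The lambda stands in for a database query.
        System.out.println(cache.get("user-1", 1000L, id -> "prefs loaded at 1000"));
        System.out.println(cache.get("user-1", 1000L, id -> "not reloaded"));
        System.out.println(cache.get("user-1", 2000L, id -> "prefs loaded at 2000"));
    }
}
```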
  • 27. YOU HAVE QUESTIONS … WE HAVE ANSWERS. ramzi.alqrainy@gmail.com
  • 28. References • http://wiki.apache.org/solr/ • http://www.lucidworks.com/ • Apache Solr 4 Cookbook