Faceted Search

New York CTO Club
December 9, 2009

Daniel Tunkelang, Google
Otis Gospodnetić, Sematext


Agenda

Daniel:
- What is faceted search?
- Why use faceted search?
- Thoughts about design and user experience.

Otis:
- What are Lucene and Solr?
- Why use an open-source search library?
- Thoughts about implementation.


“Regular” Search

Interface:
- User expresses information need as short query.
- Search engine returns ranked, pageable result set.

User happy when...
- Top-ranked result satisfies information need.
- At least some result on first page is relevant.

User unhappy when...
- No result on first page satisfies information need.
- Results misleadingly appear relevant (bait and switch).


Relevance Is Subjective

Relevance is defined as a measure of information conveyed by a document relative to a query. It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance.

William Goffman, On relevance as a measure, 1964.
Regular Search Experience


What is Faceted Search?

- Best understood through examples.
  - See the following slides.
  - Or shop on almost any ecommerce site.
- Facets = multiple ways to organize information.
  - Often based on available structured information.
  - But not always, e.g., facets obtained via text mining.
- Typical interaction:
  - User starts with a full-text search.
  - Facets guide query refinement process.
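The typical interaction (full-text search first, then facet-guided refinement) can be sketched in a few lines of Python. This is only a toy illustration, not how Lucene or Solr work internally: the catalog, the field names, and the helpers `search`, `facet_counts`, and `refine` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy catalog; titles and facet fields are made up for illustration.
DOCS = [
    {"title": "red running shoes", "category": "shoes", "brand": "Acme"},
    {"title": "blue running shorts", "category": "apparel", "brand": "Acme"},
    {"title": "trail running shoes", "category": "shoes", "brand": "Bolt"},
    {"title": "leather dress shoes", "category": "shoes", "brand": "Bolt"},
]


def search(query, docs):
    """Naive full-text match: keep docs whose title contains every query word."""
    words = query.lower().split()
    return [d for d in docs if all(w in d["title"].split() for w in words)]


def facet_counts(results, field):
    """Count how many results carry each value of a facet field."""
    return Counter(d[field] for d in results)


def refine(results, field, value):
    """Narrow the result set to docs with the chosen facet value."""
    return [d for d in results if d[field] == value]


results = search("running", DOCS)                  # step 1: full-text search
categories = facet_counts(results, "category")     # step 2: show facet counts
shoes_only = refine(results, "category", "shoes")  # step 3: user picks a value
```

After each refinement the facet counts would be recomputed over the narrowed result set, so the user only ever sees refinements that still lead somewhere.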




Assumptions Are Dangerous

tf-idf, PageRank

- self-awareness
- self-expression
- model knows best
- answer is a document
- one-shot query


Faceted Search for News


Faceted Search for People


Faceted Search for Breakfast


But Facets are Not a Silver Bullet...

- Screen real estate is finite.
  - Choose facets wisely.
  - Choose facet values wisely for monster facets.
- Multiple selection within a facet is powerful, but...
  - Has to be intuitive, especially AND vs. OR.
  - Even trickier for hierarchical facets.
- Search relevance still matters!
  - Most faceted search applications rank results.
  - Irrelevant results lead to irrelevant facet refinements.
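The AND vs. OR point is easy to see on a small multivalued facet. The sketch below is a toy model with hypothetical data and helper names (`select_or`, `select_and`); it shows only the two set semantics, not any particular product's behavior.

```python
# Hypothetical multivalued facet: each recipe is tagged with several ingredients.
RECIPES = {
    "omelette": {"eggs", "cheese"},
    "pancakes": {"eggs", "flour", "milk"},
    "porridge": {"oats", "milk"},
}


def select_or(docs, chosen):
    """OR: keep docs that have at least one of the chosen facet values."""
    return sorted(name for name, values in docs.items() if values & chosen)


def select_and(docs, chosen):
    """AND: keep docs that have every chosen facet value."""
    return sorted(name for name, values in docs.items() if chosen <= values)


chosen = {"eggs", "milk"}
either = select_or(RECIPES, chosen)   # widens the result set
both = select_and(RECIPES, chosen)    # narrows the result set
```

The same two clicks give three recipes under OR and one under AND, which is why the interface has to make the chosen semantics obvious. For a single-valued facet, AND across two different values can never match anything.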
Exploring Information Science


Deliver Precision and Recall

Easier said than done!

Ranking of facet values is an open research topic.


Be Careful with Faceted Search!

Cameras have artists?!


Clarify, Then Refine
Take-Aways

- Faceted search addresses the subjectivity of relevance and information overload.
- But deploying faceted search effectively requires that you think about user experience.
- Recommended reading:
  - My thin book entitled Faceted Search
  - Marti Hearst's book on Search User Interfaces
  - Peter Morville's upcoming book on Search Patterns


Faceted Search with Lucene & Solr

Otis Gospodnetić, Sematext


What is / isn't Lucene

- Free, ASL, Java IR library, Jar
- Doug Cutting, ASF, 2001
- Application agnostic: Indexing & Searching
- High performance, scalable
- No dependencies
- Heavily ported
- No: crawler, rich doc parser, turn-key solution
- No: out of the box faceted search-capability... but...
What is/isn't Solr

- Indexing/Search server with HTTP API built on top of Lucene
- Fast & scalable (distributed search, index replication)
- XML, JSON, Ruby, Perl, PHP, javabin
- No: crawler (but Nutch ==> Solr works)
- Yes: rich text parser
- Yes: Faceted Search out of the box!


Facet Field Requirements

- Must be indexed
- Often not tokenized
- Often not altered (lowercase, punctuation)
- Storing not required
- Multivalued fields OK
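Why facet fields are often left untokenized and unaltered can be illustrated with a toy count. The sketch below does not use Solr; the sample values and the two helpers are hypothetical, and the "tokenized" variant only imitates what a lowercasing, whitespace-splitting analyzer might produce.

```python
from collections import Counter

# Hypothetical category values as they would appear on three documents.
VALUES = ["Digital Cameras", "Digital Cameras", "Laser Printers"]


def facet_untokenized(values):
    """Count each field value as one unit, exactly as stored."""
    return Counter(values)


def facet_tokenized(values):
    """Count after lowercasing and splitting on whitespace, as a text analyzer might."""
    return Counter(token for v in values for token in v.lower().split())


whole = facet_untokenized(VALUES)   # facet values a user can read and click
pieces = facet_tokenized(VALUES)    # fragments such as "digital" and "cameras"
```

Faceting on the untokenized values yields the labels a user expects to see ("Digital Cameras"), while the tokenized variant yields word fragments that no longer correspond to a category.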




Solr and Faceted Search

- 3 types of facets: Field Values (text), Dates, Queries.
- “Text”: return counts for all/top terms in a field for a result set, e.g. categories a la Amazon
- Dates: return counts for docs in specified date ranges
- Queries: return counts for docs that also match a given query; handy for number ranges (think prices!)


Turn It On

- 0 facets:
  - http://host:80/solr/select?q=foo
- 1 facet:
  - http://host:80/solr/select?q=foo&facet=true&facet.field=category
- N facets:
  - http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock
- facet=true or facet=on
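The 0, 1, and N facet URLs differ only in the repeated `facet.field` parameter, so a client can build them from a list. A minimal sketch using Python's standard `urllib.parse.urlencode`; the helper name `facet_url` is hypothetical, and `BASE` is the placeholder host from the slide rather than a real server.

```python
from urllib.parse import urlencode

# Placeholder host copied from the slide; adjust for a real server.
BASE = "http://host:80/solr/select"


def facet_url(query, facet_fields=()):
    """Build a select URL, adding facet=true and one facet.field per requested field."""
    params = [("q", query)]
    if facet_fields:
        params.append(("facet", "true"))
        # A list of pairs (not a dict) lets the same key repeat.
        params.extend(("facet.field", f) for f in facet_fields)
    return BASE + "?" + urlencode(params)


no_facets = facet_url("foo")
one_facet = facet_url("foo", ["category"])
n_facets = facet_url("foo", ["category", "inStock"])
```

Passing the parameters as a list of pairs rather than a dict is the design choice that matters here, since a dict cannot hold `facet.field` twice.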
Text Facet Response

    <result numFound="4" start="0"/>
    <lst name="facet_counts">
      <lst name="facet_fields">
        <lst name="category">
          <int name="electronics">3</int>
          <int name="copier">0</int>
        </lst>
        <lst name="inStock">
          <int name="false">3</int>
          <int name="true">1</int>
        </lst>
      </lst>
    </lst>

- facet.mincount=1 to avoid 0-count facet values
- facet.limit=N to limit to top N facet values
- facet.missing=true to catch uncategorized
- lots of other options!


Date Facets

- http://.../solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY
- (%2B1 ==> +1)
- Solr Date Math Parser syntax: /HOUR, +2YEARS, -1DAY, /DAY+6MONTHS+3DAYS, +6MONTHS+3DAYS/DAY


Date Facet Response

    <result name="response" numFound="42" start="0"/>
    <lst name="facet_counts">
      <lst name="facet_dates">
        <lst name="timestamp">
          <int name="2007-08-11T00:00:00.000Z">1</int>
          <int name="2007-08-12T00:00:00.000Z">5</int>
          <int name="2007-08-13T00:00:00.000Z">3</int>
          <int name="2007-08-14T00:00:00.000Z">7</int>
          <int name="2007-08-15T00:00:00.000Z">2</int>
          <int name="2007-08-16T00:00:00.000Z">16</int>
          <str name="gap">+1DAY</str>
          <date name="end">2007-08-17T00:00:00Z</date>
        </lst>
      </lst>
    </lst>


Query Facets

- http://.../solr/select?q=shoes&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*]
- Avoids the bucket-at-index-time work-around
- Keep queries disjoint
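The `%2B` note matters because a literal `+` in a query string is decoded as a space. One way to avoid hand-encoding is to let the HTTP client library do it. The sketch below uses Python's standard `urllib.parse`; the helper names are hypothetical and `BASE` is a placeholder host, while the parameter names and values are the ones shown on the slides.

```python
from urllib.parse import urlencode, parse_qsl, urlsplit

# Placeholder host; the slides elide the real one.
BASE = "http://host:80/solr/select"


def date_facet_url(field, start, end, gap):
    """Date facet request: count docs per gap-sized bucket between start and end."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", start),
        ("facet.date.end", end),
        ("facet.date.gap", gap),
    ]
    return BASE + "?" + urlencode(params)


def query_facet_url(query, facet_queries):
    """Query facet request: one facet.query parameter per bucket."""
    params = [("q", query), ("rows", "0"), ("facet", "true")]
    params.extend(("facet.query", fq) for fq in facet_queries)
    return BASE + "?" + urlencode(params)


# Date math is written with a plain "+"; urlencode turns it into %2B.
date_url = date_facet_url("timestamp", "NOW/DAY-5DAYS", "NOW/DAY+1DAY", "+1DAY")
# Spaces in range queries are encoded as "+", matching the slide's URL.
price_url = query_facet_url("shoes", ["price:[* TO 500]", "price:[500 TO *]"])

# Decoding the query string recovers the original values.
decoded = parse_qsl(urlsplit(date_url).query)
```

`urlencode` also percent-encodes characters such as `/`, `:`, `[`, and `]`, so the generated URLs look busier than the hand-written ones on the slides but decode to the same parameter values.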
Query Facet Response

    <result numFound="3" start="0"/>
    <lst name="facet_counts">
      <lst name="facet_queries">
        <int name="price:[* TO 500]">3</int>
        <int name="price:[500 TO *]">1</int>
      </lst>
      <lst name="facet_fields">
        <lst name="inStock">
          <int name="false">3</int>
          <int name="true">1</int>
        </lst>
      </lst>
    </lst>


UI Integration

- Use Filter Queries via fq
- http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]
- http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]&fq=inStock:true
- Important: single request does it all


State of Lucene & Solr

- Super healthy community, exploding development
- Lucene 3.0 – 2009-11-25:
  - Performance, faster range queries, clean API, better Unicode support, more non-English support
- Solr 1.4 – 2009-11-10:
  - Performance, new replication, Db indexing, rich-doc indexing, results clustering, faster response protocol, deduplication...


Lucene, Solr, Enterprise

- Free: Community
  - Lucene ~600 emails/month (dev: 2000/month)
  - Solr ~1300 emails/month (dev: 800/month)
- Commercial: Support Subscriptions
  - Sematext
  - Lucid Imagination
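The UI integration pattern (each selected refinement becomes one `fq`, and a single request returns both results and updated facet counts) might be wired up as sketched below. This is an illustration under assumptions: the helper names are hypothetical, `BASE` is a placeholder host, no HTTP call is made, and `SAMPLE` is a hand-written `facet_counts` fragment shaped like the responses on the slides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Placeholder host; the slides elide the real one.
BASE = "http://host:80/solr/select"


def refined_url(query, facet_fields, filters):
    """One request: the query, the facets to count, and one fq per selected refinement."""
    params = [("q", query), ("facet", "true")]
    params.extend(("facet.field", f) for f in facet_fields)
    params.extend(("fq", f) for f in filters)
    return BASE + "?" + urlencode(params)


def parse_facet_fields(xml_text):
    """Read facet_fields from a facet_counts element into {field: {value: count}}."""
    root = ET.fromstring(xml_text)
    fields = root.find("lst[@name='facet_fields']")
    return {
        field.get("name"): {v.get("name"): int(v.text) for v in field.findall("int")}
        for field in fields.findall("lst")
    }


# Hand-written sample shaped like the facet responses on the slides.
SAMPLE = """
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="inStock">
      <int name="false">3</int>
      <int name="true">1</int>
    </lst>
  </lst>
</lst>
"""

url = refined_url("shoes", ["category"], ["price:[0 TO 300]", "inStock:true"])
counts = parse_facet_fields(SAMPLE)
```

A UI would render `counts` as clickable refinements and append the clicked value to `filters` for the next request.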

Faceted Search and Solr
