SlideShare a Scribd company logo
1 of 44
Download to read offline
The Many Facets of Apache Solr
          Yonik Seeley, Lucid Imagination
     yonik@lucidimagination.com, Oct 20 2011
What I Will Cover

§    What is Faceted Search
§    Solr’s Faceted Search
§    Tips & Tricks
§    Performance & Algorithms




                           3
My Background
§  Creator of Solr
§  Co-founder of Lucid Imagination
§  Expertise: Distributed Search systems and
    performance
§  Lucene/Solr committer, a member of the
    Lucene PMC,
    member of the Apache Software
    Foundation
§  Work: CNET Networks, BEA, Telcordia,
    among others
§  M.S. in Computer Science, Stanford
What is Faceted Search




           5
Faceted Search Example

                                  The facet count or
                                    constraint count
 Manufacturer is
                                   shows how many
a facet, a way of
                                  results match each
categorizing the
                                         value
     results




 Canon, Sony,
 and Nikon are
constraints, or
  facet values



The breadcrumb
 trail shows what
constraints have
   already been
    applied and
  allows for their
       removal
Key Elements of Faceted Search
§  No hierarchy of options is enforced
   •  Users can apply facet constraints in any order
   •  Users can remove facet constraints in any order
§  No surprises
   •  The user is only given facets and constraints
      that make sense in the context of the items they
      are looking at
   •  The user always knows what to expect before
      they apply a constraint
§  Also known as guided navigation, faceted
    navigation, faceted browsing, parametric search

                            7
Solr’s Faceted Search




          8
Field Faceting
§  Specifies a Field to be used as a Facet
§  Uses each term indexed in that Field as a
    Constraint
§  Field must be indexed
§  Can be used multiple times for multiple fields

  q
  !                    =   iPhone!
    !
 !fq                   =   inStock:true!
 !facet                =   true!
 !facet.field          =   color!
 !facet.field          =   category!
                            9
Field Faceting Response
http://localhost:8983/solr/select?q=iPhone&fq=inStock:true!
&facet=true&facet.field=color&facet.field=category!


<lst name="facet_counts”>!
 <lst name="facet_fields">!
  <lst name="color">!
    <int name="red">17</int>!
    <int name="green">6</int>!
    <int name="blue">2</int>!
    <int name="yellow">2</int>!
  <lst name=”category">!
    <int name=“accessories”>16</int>!
    <int name=“electronics”>11</int>!
                             10
Or if you prefer JSON…
http://localhost:8983/solr/select?q=iPhone&fq=inStock:true!
&facet=true&facet.field=color&facet.field=category&wt=json!


"facet_counts":{!
  "facet_fields":{!
    "color":[!
      "red",17,!
      "green",6,!
      "blue",2,      !
      "yellow",2]!
    "category":[!
      "accessories",16,!
      "electronics",11]!     11
Applying Constraints
  Assume the user clicked on “red”…
                                         Simply add another filter
                                           query to apply that
                                                constraint

http://localhost:8983/solr/select?
q=iPhone&fq=inStock:true&fq=color:red&facet=true&
facet.field=color&facet.field=category!



   Remove redundant facet.field
   (assuming single valued field)

                                    12
facet.field Options
§  facet.prefix - Restricts the possible constraints to only
    indexed values with a specified prefix.
§  facet.mincount=0 - Restricts the constraints returned to
    those containing a minimum number of documents in the
    result set.
§  facet.sort=count - The ordering of constraints: count or
    index
§  facet.offset=0 - Indicates how many constraints in the
    specified sort ordering should be skipped in the response.
§  facet.limit=100 - The number of constraints to return
§  facet.missing=false – Return the number of docs with no
    value in the field
                              13
facet.query
                        !
§  Specifies a query string to be used as a Facet
    Constraint
§  Typically used multiple times to get multiple
    (discrete) sets
§  Any type of query supported


!facet.query = rank:[* TO 20]!
!facet.query = rank:[21 TO *]!


                          14
facet.query Results

<result numFound="27" ... />!
...!
<lst name="facet_counts">!
 <lst name="facet_queries">!
  <int name="rank:[* TO 20]">2</int>!
  <int name="rank:[21 TO *]">15</int>!
 </lst>!
 ...!
The lat,lon center point to
 Spatial faceting                  search from

  !q=*:*&facet=true!
 !pt=45.15,-93.85!       Name of the field
                       containing lat+lon data
 !sfield=store!
 !facet.query={!geofilt d=5}!
 !facet.query={!geofilt d=10}!

                            geospatial query type
"facet_counts":{!
   "facet_queries":{!
     "{!geofilt d=5}":3,!
     "{!geofilt d=10}":6},!
                     16
Range Faceting
                              "facet_counts":{
§  Simpler than a sequence     "facet_ranges":{
    of facet.query params         "price":{
                                   "counts”:[
                                     "0.0”,5,
http://...&facet=true                "50.0”,2,
&facet.range=price                   "100.0”,0,
                                     "150.0”,2,
&facet.range.start=0                 "200.0”,0,
&facet.range.end=500                 "250.0”,1,
                                     "300.0”,2,
&facet.range.gap=50                  "350.0”,2,
                                     "400.0”,0,
                                     "450.0”,1],
                                   "gap":50.0,
                                   "start":0.0,
                                   "end":500.0}}}}
Date Faceting
l  facet.date  is deprecated, use facet.range on a date field now
l  Creates Constraints based on evenly sized date ranges using
    the Gregorian Calendar
l  Ranges are specified using "Date Math" so they DWIM in spite
    of variable length months and leap years


  facet.range                    =   pubdate!
  facet.range.start              =   NOW/YEAR-1YEAR!
  facet.range.end                =   NOW/MONTH+1MONTH!
  facet.range.gap                =   +1MONTH!
Date Faceting Results
 "facet_counts":{!
   "facet_ranges":{!
      ”pubdate":{!
        "counts":[!
          "2010-01-01T00:00:00Z",4,!
          "2010-02-01T00:00:00Z",6,!
          "2010-03-01T00:00:00Z",0,!
          "2010-04-01T00:00:00Z",13,!
[…]!
          "2011-09-01T00:00:00Z",5,!
          "2011-10-01T00:00:00Z",2],!
        "gap":"+1MONTH",!
        "start":"2010-01-01T00:00:00Z",!
        "end":"2011-11-01T00:00:00Z”}}}!
Range Faceting Options
l  facet.range.hardend=false     - Determines what effective
    end value is used when the specified "start" and "end"
    don't divide into even "gap" sized buckets; false means
    the last Constraint range may be shorter then the others
l  facet.range.other=none - Allows you to specify what
    other Constraints you are interested in besides the
    generated ranges: before, after, between, none, all
l  facet.range.include=lower – Specifies what bounds are
    inclusive vs exclusive: lower, upper, edge, outer, all
Pivot Faceting (trunk)
l  Computes   a Matrix of Constraint Counts across multiple
    Facet Fields
l  Syntax: facet.pivot=field1,field2,field3,…


facet.pivot=cat,inStock
                      #docs #docs w/         #docs w/
                            inStock:true     instock:false
cat:electronics       14     10              4
cat:memory            3      3               0
cat:connector         2      0               2
cat:graphics card     2      0               2
cat:hard drive        2      2               0
Pivot Faceting
   http://...&facet=true&facet.pivot=cat,popularity
            "facet_counts":{                    (continued)
               "facet_pivot":{
                 "cat,popularity":[{           {
                   "field":"cat",                "field":"popularity",
14 docs w/         "value":"electronics",        "value”:1,
cat==electronics   "count":14,                   "count":2}]},
                   "pivot":[{               {
5 docs w/             "field":"popularity", "field":"cat",
cat==electronics      "value":6,              "value":"memory",
&& popularity==6      "count":5},             "count":3,
                    {                         "pivot":[]},
                      "field":"popularity",
                      "value":7,                […]
                      "count":4},
Tips & Tricks




      23
term QParser
l  Default Query Parser does special things with whitespace
    and punctuation
l  Problematic when "filtering" on Facet Field Constraints
    that contain whitespace, punctuation, or other reserved
    characters.
l  Use the term parser to filter on an exact Term


 fq = {!term f=category}Books & Magazines
                         OR
 fq = {!term f=category v=$t}
  t = Books & Magazines
Taxonomy Facets
l  What   If Your Documents Are Organized in a Taxonomy?
Taxonomy Facets: Data
l  Flattened   Data
  !Doc#1: NonFic > Law!
  !Doc#2: NonFic > Sci!
  !Doc#3: NonFic > Hist!
          NonFic > Sci > Phys!
l  Indexed   Terms (prepend number of nodes in path segment)

Doc#1: 1/NonFic, 2/NonFic/Law!
Doc#2: 1/NonFic, 2/NonFic/Sci!
Doc#3: 1/NonFic, 2/NonFic/Hist, !
       2/NonFic/Sci, 3/NonFic/Sci/Phys!
Taxonomy Facets: Initial Query

facet.field    = category!
facet.prefix   = 2/NonFic!
facet.mincount = 1!

<result numFound="164" ...!
<lst name="facet_fields">!
 <lst name="category">!
   <int name="2/NonFic/Sci">2</int>!
   <int name="2/NonFic/Hist">1</int>!
   <int name="2/NonFic/Law">1</int>!
Taxonomy Facets: Drill Down

fq           =   {!term f=category}2/NonFic/Sci!
facet.field      = category!
facet.prefix     = 3/NonFic/Sci!
facet.mincount   = 1!


<result numFound="2" ...!
<lst name="facet_fields">!
 <lst name="category">!
   <int name=”3/NonFic/Sci/Phys">1</int>!
Multi-Select Faceting
http://search.lucidimagination.com   §  Very	
  generic	
  support	
  
                                     •  Reuses	
  localParams	
  syntax	
  {!name=val}	
  
                                     •  Ability	
  to	
  tag	
  filters	
  
                                     •  Ability	
  to	
  exclude	
  certain	
  filters	
  when	
  
                                        faceCng,	
  by	
  tag	
  
                                       	
  

                                      	
  q=index	
  replicaCon	
  
                                      	
  facet=true	
  
                                      	
  fq={!tag=pr}project:(lucene	
  OR	
  solr)	
  
                                      	
  facet.field={!ex=pr}project	
  
                                      	
  facet.field={!ex=src}source	
  


                                              29	
  
Same Facet, Different Exclusions
  §  A key can be specified for a facet to change the
      name used to identify it in the response.
          q   =   Hot Rod!
         fq   =   {!df=colors tag=cx}purple green!
facet.field   =   {!key=all_colors ex=cx}colors!
facet.field   =   {!key=overlap_colors}colors!

"facet_counts":{!                 ”overlap_colors":[!
  "facet_fields":{!                   "red",7,!
    ”all_colors":[!                   "green",6,!
      "red",19,!                      "blue”,1]!
      "green",6,!                 }!
      "blue",2],!            }!
                            30
“Pretty” facet.field Terms
      §  Field Faceting uses Indexed Terms
      §  Leverage copyField and TokenFilters that will
          give you good looking Constraints

<tokenizer	
  class="solr.PaPernTokenizerFactory"	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  paPern="(,|;)s*"	
  />	
  
<filter	
  class="solr.PaPernReplaceFilterFactory"	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  paPern="s+"	
  replacement="	
  "	
  />	
  
<filter	
  class="solr.PaPernReplaceFilterFactory"	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  paPern="	
  and	
  "	
  replacement="	
  &amp;	
  "	
  />	
  
<filter	
  class="solr.TrimFilterFactory"	
  />	
  
<filter	
  class="solr.CapitalizaConFilterFactory"	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  onlyFirstWord="false"	
  />	
  
                                                      31
“Pretty” facet.field Results
{“id” : “MyExampleDoc”,!
 "category” : ” books !
 and magazines; !
computers, “!
}!
   copyField
  in schema    <copyField “source”=“category” “dest”=“category_pretty”/>

"facet_counts":{!
  "facet_fields":{!
    "category_pretty":[!
      "Books & Magazines",1,!
      "Computers",1]!
                                 32
facet.field Labels
  §  facet.query params are echoed verbatim when
      returning the constraint counts
  §  Optionally, one can declare a facet.query in
      solrconfig.xml and include a "label" that the
      presentation layer can parse out for display. “label” has no
                                                                                    meaning to solr
facet.query = {!label=‘Hot!’	
  }!
              +pop:[1 TO *] !
              +pub_date:[NOW/DAY-1DAY TO *]!

"facet_queries"	
  :	
  {	
  
	
  	
  "	
  {!label=‘Hot!’	
  }	
  pop:[1	
  TO	
  *]	
  ...	
  "	
  :	
  15	
  
}	
  
                                                        33
Performance




     34
facet.method
                                    !
name     facet.method description                 memory               CPU


enum     enum            Iterates over terms,     filter-per-term in   ~O(nTerms)
                         calculating set          the filterCache
                         intersections

field    fc (single-     Iterates over documents, Lucene               O(nDocs)
cache    valued field)   counting terms           FieldCache
                                                  Entry… int
                                                  [maxDoc]+terms
UnInvert fc (multi-      Iterates over documents, O(maxDoc * num       ~O(nDocs)
edField valued field)    counting terms           terms per doc)

Per-    fcs              Like field-cache, just   Lucene               O(nDocs)
segment                  better for NRT since     FieldCache           +O(nTerms)
field                    FieldCacheEntry is at    Entries… int
cache                    segment level.           [maxDoc]+terms

                                          35
facet.method=fc                                  Mem=int[maxDoc]

CPU=O(nDocs in
                      (single-valued field)                              +unique_values

   base set)
                             Documents
                             matching the
                             base query            Lucene FieldCache Entry
                              Juggernaut           (StringIndex) for the hero
 q=Juggernaut                                      field
                                 0                order: for each
 &facet=true                     2     lookup     doc, an index into   lookup: the
 &facet.field=hero                                the lookup array
                                                                       string values
                                 7
                                                         5                (null)
                                                         3              batman
                              accumulator
                                                         5                flash
                                 0
                                                         1             spiderman
                                 1
                                                         4             superman
            Priority queue       0    increment
               flash, 5
                                                         5             wolverine
                                 0
             Batman, 3                                   2
                                 0
                                                         1
                                 2
facet.method=fcs (trunk)

          (per-segment single-valued)
                  Segment1             Segment2           Segment3               Segment4
                  FieldCache           FieldCache         FieldCache             FieldCache
                     Entry                Entry              Entry                  Entry

                        accumulator1      accumulator2           accumulator3      accumulator4
                  inc
         lookup         0                 0                   1                       0
                        3                 2                   3                       1
         0
Base                    5                 1                   0                       0
DocSet   2
                        0                 0                   4
         7                                                                         thread4
                        1              thread2            thread3
                        2
                    thread1                                                     Priority queue
                                              FieldCache +
                                                                                   flash, 5
                                              accumulator                        Batman, 3
                                              merger
                                              (Priority queue)
facet.method=fcs!
l  Controllable
           multi-threading
   facet.method=fcs!
   facet.field={!threads=4}myfield!
   	
  
l  Disadvantages
   l     Larger memory use (FieldCaches + accumulators)
   l     Slower (extra FieldCache merge step needed) – O(nTerms)
l  Advantages
   l     Rebuilds FieldCache entries only for new segments (NRT friendly)
   l     Multi-threaded
Per-segment faceting performance
                 comparison
Test index: 10M documents, 18 segments, single valued field

    Base DocSet=100 docs, facet.field on a field with 100,000 unique terms
A   Time for request*           facet.method=fc        facet.method=fcs
    static index                3 ms                   244 ms
    quickly changing index      1388 ms                267 ms



    Base DocSet=1,000,000 docs, facet.field on a field with 100 unique terms

B   Time for request*          facet.method=fc          facet.method=fcs
    static index               26 ms                    34 ms
    quickly changing index     741 ms                   94 ms

                   *complete request time, measured externally
facet.method=fc 

                (multi-valued field)
§  UnInvertedField - like single-valued FieldCache algorithm, but
    with multi-valued FieldCache
§  Good for many unique terms, relatively few values per doc
   •  Best case: 50x faster, 5x smaller than “enum” (100K unique values,
      1-5 per doc)
   •  O(n_docs), but optimization to count the inverse when n>maxDoc/2
§  Memory efficient
   •  Terms in a document are delta coded variable width ords (vints)
   •  Ord list for document packed in an int or in a shared byte[]
   •  Hybrid approach: “big terms” that match >5% of index use
      filterCache instead
   •  Only 1/128th of string values in memory
facet.method=fc
              fieldValueCache
§  Implicit cache with UnInvertedField entries
   •  Not autowarmed – use static warming request
   •  http://localhost:8983/solr/admin/stats.jsp (mem size,
      time to create, etc)
Faceting: fieldValueCache
§  Implicit cache with UnInvertedField entries
   •  Not autowarmed – use static warming request
   •  http://localhost:8983/solr/admin/stats.jsp (mem size,
      time to create, etc)


item_cat:
{field=cat,memSize=5376,tindexSize=52,time=2,phase1=2,
nTerms=16,bigTerms=10,termInstances=6,uses=44}
Multi-valued faceting:
               facet.method=enum
§  facet.method=enum
§  For each term in field:
   •  Retrieve filter                                     Solr filterCache (in memory)
   •  Calculate intersection size                         hero:batman     hero:flash

         Lucene Inverted    Docs matching
         Index (on disk)    base query                        1             0
                                         intersection
                                 0                            3             1
      batman      1 3 5 8                   count
                                 1                            5             5
      flash       0 1 5          5                            8
      spiderman   2 4 7          9       Priority queue
                                          batman=2
      superman    0 6 9
      wolverine   1 2 7 8
facet.method=enum
                              !

§    O(n_terms_in_field)
§    Short circuits based on term.df
§    filterCache entries int[ndocs] or BitSet(maxDoc)
§    Size filterCache appropriately
      •  Either autowarm filterCache, or use static warming
         queries (via newSearcher event) in solrconfig.xml
§  facet.enum.cache.minDf - prevent filterCache use
    for small terms
      •  Also useful for huge index w/ no time constraints
Q&A




 45

More Related Content

What's hot

Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
Deep Dive Into Kafka Tiered Storage With Satish Duggana | Current 2022
Deep Dive Into Kafka Tiered Storage With Satish Duggana | Current 2022Deep Dive Into Kafka Tiered Storage With Satish Duggana | Current 2022
Deep Dive Into Kafka Tiered Storage With Satish Duggana | Current 2022HostedbyConfluent
 
Introduction to Kibana
Introduction to KibanaIntroduction to Kibana
Introduction to KibanaVineet .
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaObjectRocket
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterAttila Szegedi
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
Simplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaSimplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaDatabricks
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at ScaleFabian Reinartz
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDatabricks
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic IntroductionMayur Rathod
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?Anton Zadorozhniy
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Databricks
 
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Lucas Jellema
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Storesconfluent
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solrKnoldus Inc.
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpHostedbyConfluent
 
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevImage Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevDatabricks
 

What's hot (20)

Druid
DruidDruid
Druid
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Deep Dive Into Kafka Tiered Storage With Satish Duggana | Current 2022
Deep Dive Into Kafka Tiered Storage With Satish Duggana | Current 2022Deep Dive Into Kafka Tiered Storage With Satish Duggana | Current 2022
Deep Dive Into Kafka Tiered Storage With Satish Duggana | Current 2022
 
Introduction to Kibana
Introduction to KibanaIntroduction to Kibana
Introduction to Kibana
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Simplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaSimplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks Delta
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the Hood
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
 
Laravel and SOLR
Laravel and SOLRLaravel and SOLR
Laravel and SOLR
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solr
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
 
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevImage Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
 

Viewers also liked

Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Yonik Seeley
 
Grouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/SolrGrouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/Solrlucenerevolution
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Lucidworks
 
Solr6 の紹介(第18回 Solr勉強会 資料) (2016年6月10日)
Solr6 の紹介(第18回 Solr勉強会 資料) (2016年6月10日)Solr6 の紹介(第18回 Solr勉強会 資料) (2016年6月10日)
Solr6 の紹介(第18回 Solr勉強会 資料) (2016年6月10日)Issei Nishigata
 
Technology for reduce of mistakes - うっかりをなくす技術
Technology for reduce of mistakes - うっかりをなくす技術Technology for reduce of mistakes - うっかりをなくす技術
Technology for reduce of mistakes - うっかりをなくす技術karupanerura
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterLucidworks
 
Apache Solr-Webinar
Apache Solr-WebinarApache Solr-Webinar
Apache Solr-WebinarEdureka!
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solrTrey Grainger
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 

Viewers also liked (12)

Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
 
Grouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/SolrGrouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/Solr
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 
Solr6 の紹介(第18回 Solr勉強会 資料) (2016年6月10日)
Solr6 の紹介(第18回 Solr勉強会 資料) (2016年6月10日)Solr6 の紹介(第18回 Solr勉強会 資料) (2016年6月10日)
Solr6 の紹介(第18回 Solr勉強会 資料) (2016年6月10日)
 
Technology for reduce of mistakes - うっかりをなくす技術
Technology for reduce of mistakes - うっかりをなくす技術Technology for reduce of mistakes - うっかりをなくす技術
Technology for reduce of mistakes - うっかりをなくす技術
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Apache Solr-Webinar
Apache Solr-WebinarApache Solr-Webinar
Apache Solr-Webinar
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 

Similar to The Many Facets of Apache Solr - Yonik Seeley

Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchPedro Franceschi
 
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017Codemotion
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchSperasoft
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampAlexei Gorobets
 
Extreme Swift
Extreme SwiftExtreme Swift
Extreme SwiftMovel
 
Creating a Single View: Data Design and Loading Strategies
Creating a Single View: Data Design and Loading StrategiesCreating a Single View: Data Design and Loading Strategies
Creating a Single View: Data Design and Loading StrategiesMongoDB
 
visualisasi data praktik pakai excel, py
visualisasi data praktik pakai excel, pyvisualisasi data praktik pakai excel, py
visualisasi data praktik pakai excel, pyElmaLyrics
 
What's Coming Next in Sencha Frameworks
What's Coming Next in Sencha FrameworksWhat's Coming Next in Sencha Frameworks
What's Coming Next in Sencha FrameworksGrgur Grisogono
 
Webinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationWebinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationMongoDB
 
Philipp Krenn | Make Your Data FABulous | Codemotion Madrid 2018
Philipp Krenn | Make Your Data FABulous | Codemotion Madrid 2018Philipp Krenn | Make Your Data FABulous | Codemotion Madrid 2018
Philipp Krenn | Make Your Data FABulous | Codemotion Madrid 2018Codemotion
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-stepsMatteo Moci
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrlucenerevolution
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
elasticsearch - advanced features in practice
elasticsearch - advanced features in practiceelasticsearch - advanced features in practice
elasticsearch - advanced features in practiceJano Suchal
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutesDavid Pilato
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 MinutesKarel Minarik
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0Keshav Murthy
 

Similar to The Many Facets of Apache Solr - Yonik Seeley (20)

Solr 3.1 and beyond
Solr 3.1 and beyondSolr 3.1 and beyond
Solr 3.1 and beyond
 
Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearch
 
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
 
Extreme Swift
Extreme SwiftExtreme Swift
Extreme Swift
 
Creating a Single View: Data Design and Loading Strategies
Creating a Single View: Data Design and Loading StrategiesCreating a Single View: Data Design and Loading Strategies
Creating a Single View: Data Design and Loading Strategies
 
Hidden Gems of Ruby 1.9
Hidden Gems of Ruby 1.9Hidden Gems of Ruby 1.9
Hidden Gems of Ruby 1.9
 
visualisasi data praktik pakai excel, py
visualisasi data praktik pakai excel, pyvisualisasi data praktik pakai excel, py
visualisasi data praktik pakai excel, py
 
What's Coming Next in Sencha Frameworks
What's Coming Next in Sencha FrameworksWhat's Coming Next in Sencha Frameworks
What's Coming Next in Sencha Frameworks
 
Webinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationWebinar: Index Tuning and Evaluation
Webinar: Index Tuning and Evaluation
 
Philipp Krenn | Make Your Data FABulous | Codemotion Madrid 2018
Philipp Krenn | Make Your Data FABulous | Codemotion Madrid 2018Philipp Krenn | Make Your Data FABulous | Codemotion Madrid 2018
Philipp Krenn | Make Your Data FABulous | Codemotion Madrid 2018
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solr
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
elasticsearch - advanced features in practice
elasticsearch - advanced features in practiceelasticsearch - advanced features in practice
elasticsearch - advanced features in practice
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

The Many Facets of Apache Solr - Yonik Seeley

  • 1. The Many Facets of Apache Solr Yonik Seeley, Lucid Imagination yonik@lucidimagination.com, Oct 20 2011
  • 2. What I Will Cover §  What is Faceted Search §  Solr’s Faceted Search §  Tips & Tricks §  Performance & Algorithms 3
  • 3. My Background §  Creator of Solr §  Co-founder of Lucid Imagination §  Expertise: Distributed Search systems and performance §  Lucene/Solr committer, a member of the Lucene PMC, member of the Apache Software Foundation §  Work: CNET Networks, BEA, Telcordia, among others §  M.S. in Computer Science, Stanford
  • 4. What is Faceted Search 5
  • 5. Faceted Search Example The facet count or constraint count Manufacturer is shows how many a facet, a way of results match each categorizing the value results Canon, Sony, and Nikon are constraints, or facet values The breadcrumb trail shows what constraints have already been applied and allows for their removal
  • 6. Key Elements of Faceted Search §  No hierarchy of options is enforced •  Users can apply facet constraints in any order •  Users can remove facet constraints in any order §  No surprises •  The user is only given facets and constraints that make sense in the context of the items they are looking at •  The user always knows what to expect before they apply a constraint §  Also known as guided navigation, faceted navigation, faceted browsing, parametric search 7
  • 8. Field Faceting §  Specifies a Field to be used as a Facet §  Uses each term indexed in that Field as a Constraint §  Field must be indexed §  Can be used multiple times for multiple fields q ! = iPhone! ! !fq = inStock:true! !facet = true! !facet.field = color! !facet.field = category! 9
  • 9. Field Faceting Response http://localhost:8983/solr/select?q=iPhone&fq=inStock:true! &facet=true&facet.field=color&facet.field=category! <lst name="facet_counts”>! <lst name="facet_fields">! <lst name="color">! <int name="red">17</int>! <int name="green">6</int>! <int name="blue">2</int>! <int name="yellow">2</int>! <lst name=”category">! <int name=“accessories”>16</int>! <int name=“electronics”>11</int>! 10
  • 10. Or if you prefer JSON… http://localhost:8983/solr/select?q=iPhone&fq=inStock:true! &facet=true&facet.field=color&facet.field=category&wt=json! "facet_counts":{! "facet_fields":{! "color":[! "red",17,! "green",6,! "blue",2, ! "yellow",2]! "category":[! "accessories",16,! "electronics",11]! 11
  • 11. Applying Constraints Assume the user clicked on “red”… Simply add another filter query to apply that constraint http://localhost:8983/solr/select? q=iPhone&fq=inStock:true&fq=color:red&facet=true& facet.field=color&facet.field=category! Remove redundant facet.field (assuming single valued field) 12
  • 12. facet.field Options §  facet.prefix - Restricts the possible constraints to only indexed values with a specified prefix. §  facet.mincount=0 - Restricts the constraints returned to those containing a minimum number of documents in the result set. §  facet.sort=count - The ordering of constraints: count or index §  facet.offset=0 - Indicates how many constraints in the specified sort ordering should be skipped in the response. §  facet.limit=100 - The number of constraints to return §  facet.missing=false – Return the number of docs with no value in the field 13
  • 13. facet.query ! §  Specifies a query string to be used as a Facet Constraint §  Typically used multiple times to get multiple (discrete) sets §  Any type of query supported !facet.query = rank:[* TO 20]! !facet.query = rank:[21 TO *]! 14
  • 14. facet.query Results <result numFound="27" ... />! ...! <lst name="facet_counts">! <lst name="facet_queries">! <int name="rank:[* TO 20]">2</int>! <int name="rank:[21 TO *]">15</int>! </lst>! ...!
  • 15. The lat,lon center point to Spatial faceting search from !q=*:*&facet=true! !pt=45.15,-93.85! Name of the field containing lat+lon data !sfield=store! !facet.query={!geofilt d=5}! !facet.query={!geofilt d=10}! geospatial query type "facet_counts":{! "facet_queries":{! "{!geofilt d=5}":3,! "{!geofilt d=10}":6},! 16
  • 16. Range Faceting "facet_counts":{ §  Simpler than a sequence "facet_ranges":{ of facet.query params "price":{ "counts”:[ "0.0”,5, http://...&facet=true "50.0”,2, &facet.range=price "100.0”,0, "150.0”,2, &facet.range.start=0 "200.0”,0, &facet.range.end=500 "250.0”,1, "300.0”,2, &facet.range.gap=50 "350.0”,2, "400.0”,0, "450.0”,1], "gap":50.0, "start":0.0, "end":500.0}}}}
  • 17. Date Faceting l  facet.date is deprecated, use facet.range on a date field now l  Creates Constraints based on evenly sized date ranges using the Gregorian Calendar l  Ranges are specified using "Date Math" so they DWIM in spite of variable length months and leap years facet.range = pubdate! facet.range.start = NOW/YEAR-1YEAR! facet.range.end = NOW/MONTH+1MONTH! facet.range.gap = +1MONTH!
  • 18. Date Faceting Results "facet_counts":{! "facet_ranges":{! ”pubdate":{! "counts":[! "2010-01-01T00:00:00Z",4,! "2010-02-01T00:00:00Z",6,! "2010-03-01T00:00:00Z",0,! "2010-04-01T00:00:00Z",13,! […]! "2011-09-01T00:00:00Z",5,! "2011-10-01T00:00:00Z",2],! "gap":"+1MONTH",! "start":"2010-01-01T00:00:00Z",! "end":"2011-11-01T00:00:00Z”}}}!
  • 19. Range Faceting Options l  facet.range.hardend=false - Determines what effective end value is used when the specified "start" and "end" don't divide into even "gap" sized buckets; false means the last Constraint range may be shorter then the others l  facet.range.other=none - Allows you to specify what other Constraints you are interested in besides the generated ranges: before, after, between, none, all l  facet.range.include=lower – Specifies what bounds are inclusive vs exclusive: lower, upper, edge, outer, all
  • 20. Pivot Faceting (trunk) l  Computes a Matrix of Constraint Counts across multiple Facet Fields l  Syntax: facet.pivot=field1,field2,field3,… facet.pivot=cat,inStock #docs #docs w/ #docs w/ inStock:true instock:false cat:electronics 14 10 4 cat:memory 3 3 0 cat:connector 2 0 2 cat:graphics card 2 0 2 cat:hard drive 2 2 0
  • 21. Pivot Faceting http://...&facet=true&facet.pivot=cat,popularity "facet_counts":{ (continued) "facet_pivot":{ "cat,popularity":[{ { "field":"cat", "field":"popularity", 14 docs w/ "value":"electronics", "value”:1, cat==electronics "count":14, "count":2}]}, "pivot":[{ { 5 docs w/ "field":"popularity", "field":"cat", cat==electronics "value":6, "value":"memory", && popularity==6 "count":5}, "count":3, { "pivot":[]}, "field":"popularity", "value":7, […] "count":4},
  • 23. term QParser l  Default Query Parser does special things with whitespace and punctuation l  Problematic when "filtering" on Facet Field Constraints that contain whitespace, punctuation, or other reserved characters. l  Use the term parser to filter on an exact Term fq = {!term f=category}Books & Magazines OR fq = {!term f=category v=$t} t = Books & Magazines
  • 24. Taxonomy Facets l  What If Your Documents Are Organized in a Taxonomy?
  • 25. Taxonomy Facets: Data l  Flattened Data !Doc#1: NonFic > Law! !Doc#2: NonFic > Sci! !Doc#3: NonFic > Hist! NonFic > Sci > Phys! l  Indexed Terms (prepend number of nodes in path segment) Doc#1: 1/NonFic, 2/NonFic/Law! Doc#2: 1/NonFic, 2/NonFic/Sci! Doc#3: 1/NonFic, 2/NonFic/Hist, ! 2/NonFic/Sci, 3/NonFic/Sci/Phys!
  • 26. Taxonomy Facets: Initial Query facet.field = category! facet.prefix = 2/NonFic! facet.mincount = 1! <result numFound="164" ...! <lst name="facet_fields">! <lst name="category">! <int name="2/NonFic/Sci">2</int>! <int name="2/NonFic/Hist">1</int>! <int name="2/NonFic/Law">1</int>!
  • 27. Taxonomy Facets: Drill Down fq = {!term f=category}2/NonFic/Sci! facet.field = category! facet.prefix = 3/NonFic/Sci! facet.mincount = 1! <result numFound="2" ...! <lst name="facet_fields">! <lst name="category">! <int name=”3/NonFic/Sci/Phys">1</int>!
  • 28. Multi-Select Faceting http://search.lucidimagination.com §  Very  generic  support   •  Reuses  localParams  syntax  {!name=val}   •  Ability  to  tag  filters   •  Ability  to  exclude  certain  filters  when   faceCng,  by  tag      q=index  replicaCon    facet=true    fq={!tag=pr}project:(lucene  OR  solr)    facet.field={!ex=pr}project    facet.field={!ex=src}source   29  
  • 29. Same Facet, Different Exclusions §  A key can be specified for a facet to change the name used to identify it in the response. q = Hot Rod! fq = {!df=colors tag=cx}purple green! facet.field = {!key=all_colors ex=cx}colors! facet.field = {!key=overlap_colors}colors! "facet_counts":{! ”overlap_colors":[! "facet_fields":{! "red",7,! ”all_colors":[! "green",6,! "red",19,! "blue”,1]! "green",6,! }! "blue",2],! }! 30
  • 30. “Pretty” facet.field Terms §  Field Faceting uses Indexed Terms §  Leverage copyField and TokenFilters that will give you good looking Constraints <tokenizer  class="solr.PaPernTokenizerFactory"                          paPern="(,|;)s*"  />   <filter  class="solr.PaPernReplaceFilterFactory"                    paPern="s+"  replacement="  "  />   <filter  class="solr.PaPernReplaceFilterFactory"                    paPern="  and  "  replacement="  &amp;  "  />   <filter  class="solr.TrimFilterFactory"  />   <filter  class="solr.CapitalizaConFilterFactory"                    onlyFirstWord="false"  />   31
  • 31. “Pretty” facet.field Results {“id” : “MyExampleDoc”,! "category” : ” books ! and magazines; ! computers, “! }! copyField in schema <copyField “source”=“category” “dest”=“category_pretty”/> "facet_counts":{! "facet_fields":{! "category_pretty":[! "Books & Magazines",1,! "Computers",1]! 32
  • 32. facet.field Labels §  facet.query params are echoed verbatim when returning the constraint counts §  Optionally, one can declare a facet.query in solrconfig.xml and include a "label" that the presentation layer can parse out for display. “label” has no meaning to solr facet.query = {!label=‘Hot!’  }! +pop:[1 TO *] ! +pub_date:[NOW/DAY-1DAY TO *]! "facet_queries"  :  {      "  {!label=‘Hot!’  }  pop:[1  TO  *]  ...  "  :  15   }   33
  • 34. facet.method ! name facet.method description memory CPU enum enum Iterates over terms, filter-per-term in ~O(nTerms) calculating set the filterCache intersections field fc (single- Iterates over documents, Lucene O(nDocs) cache valued field) counting terms FieldCache Entry… int [maxDoc]+terms UnInvert fc (multi- Iterates over documents, O(maxDoc * num ~O(nDocs) edField valued field) counting terms terms per doc) Per- fcs Like field-cache, just Lucene O(nDocs) segment better for NRT since FieldCache +O(nTerms) field FieldCacheEntry is at Entries… int cache segment level. [maxDoc]+terms 35
  • 35. facet.method=fc Mem=int[maxDoc] CPU=O(nDocs in (single-valued field) +unique_values base set) Documents matching the base query Lucene FieldCache Entry Juggernaut (StringIndex) for the hero q=Juggernaut field 0 order: for each &facet=true 2 lookup doc, an index into lookup: the &facet.field=hero the lookup array string values 7 5 (null) 3 batman accumulator 5 flash 0 1 spiderman 1 4 superman Priority queue 0 increment flash, 5 5 wolverine 0 Batman, 3 2 0 1 2
  • 36. facet.method=fcs (trunk)
 (per-segment single-valued) Segment1 Segment2 Segment3 Segment4 FieldCache FieldCache FieldCache FieldCache Entry Entry Entry Entry accumulator1 accumulator2 accumulator3 accumulator4 inc lookup 0 0 1 0 3 2 3 1 0 Base 5 1 0 0 DocSet 2 0 0 4 7 thread4 1 thread2 thread3 2 thread1 Priority queue FieldCache + flash, 5 accumulator Batman, 3 merger (Priority queue)
  • 37. facet.method=fcs! l  Controllable multi-threading facet.method=fcs! facet.field={!threads=4}myfield!   l  Disadvantages l  Larger memory use (FieldCaches + accumulators) l  Slower (extra FieldCache merge step needed) – O(nTerms) l  Advantages l  Rebuilds FieldCache entries only for new segments (NRT friendly) l  Multi-threaded
  • 38. Per-segment faceting performance comparison Test index: 10M documents, 18 segments, single valued field Base DocSet=100 docs, facet.field on a field with 100,000 unique terms A Time for request* facet.method=fc facet.method=fcs static index 3 ms 244 ms quickly changing index 1388 ms 267 ms Base DocSet=1,000,000 docs, facet.field on a field with 100 unique terms B Time for request* facet.method=fc facet.method=fcs static index 26 ms 34 ms quickly changing index 741 ms 94 ms *complete request time, measured externally
  • 39. facet.method=fc 
 (multi-valued field) §  UnInvertedField - like single-valued FieldCache algorithm, but with multi-valued FieldCache §  Good for many unique terms, relatively few values per doc •  Best case: 50x faster, 5x smaller than “enum” (100K unique values, 1-5 per doc) •  O(n_docs), but optimization to count the inverse when n>maxDoc/2 §  Memory efficient •  Terms in a document are delta coded variable width ords (vints) •  Ord list for document packed in an int or in a shared byte[] •  Hybrid approach: “big terms” that match >5% of index use filterCache instead •  Only 1/128th of string values in memory
  • 40. facet.method=fc fieldValueCache §  Implicit cache with UnInvertedField entries •  Not autowarmed – use static warming request •  http://localhost:8983/solr/admin/stats.jsp (mem size, time to create, etc)
  • 41. Faceting: fieldValueCache §  Implicit cache with UnInvertedField entries •  Not autowarmed – use static warming request •  http://localhost:8983/solr/admin/stats.jsp (mem size, time to create, etc) item_cat: {field=cat,memSize=5376,tindexSize=52,time=2,phase1=2, nTerms=16,bigTerms=10,termInstances=6,uses=44}
  • 42. Multi-valued faceting: facet.method=enum §  facet.method=enum §  For each term in field: •  Retrieve filter Solr filterCache (in memory) •  Calculate intersection size hero:batman hero:flash Lucene Inverted Docs matching Index (on disk) base query 1 0 intersection 0 3 1 batman 1 3 5 8 count 1 5 5 flash 0 1 5 5 8 spiderman 2 4 7 9 Priority queue batman=2 superman 0 6 9 wolverine 1 2 7 8
  • 43. facet.method=enum ! §  O(n_terms_in_field) §  Short circuits based on term.df §  filterCache entries int[ndocs] or BitSet(maxDoc) §  Size filterCache appropriately •  Either autowarm filterCache, or use static warming queries (via newSearcher event) in solrconfig.xml §  facet.enum.cache.minDf - prevent filterCache use for small terms •  Also useful for huge index w/ no time constraints