Solr @ Etsy
Things I’m not going to talk about:
    • A/B Testing
    • i18n
    • Continuous Deployment
About Us
    • 10+ Million Listings
    • 500 qps
Architecture Overview
Thrift
    struct Listing {
        1: i64 listing_id
    }

    struct ListingResults {
        1: i64 count,
        2: list<Listing> listings
    }

    service Search {
        ListingResults search(1:string query)
    }
Generated Java server code:
  public class Search {

    public interface Iface {

      public ListingResults search(String query) throws TException;

    }

    /**...**/
  }

Generated PHP client code:
  class SearchClient implements SearchIf {

    /**...**/
    public function search($query)
    {
      $this->send_search($query);
      return $this->recv_search();
    }

    /**...**/
  }
Why use Thrift?
    • Service Encapsulation
    • Reduced Network Traffic
Why only return IDs?
    • Index Size
    • Easy to scale PK lookups
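A minimal sketch of the "IDs only" pattern: the search service returns ranked listing IDs and the caller resolves them with primary-key lookups. The class name ListingHydrator and the in-memory map are assumptions for illustration; the map stands in for whatever datastore actually holds the listings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ListingHydrator {
    // Hypothetical stand-in for a primary-key lookup against the listings datastore.
    private final Map<Long, String> listingTitlesById;

    ListingHydrator(Map<Long, String> listingTitlesById) {
        this.listingTitlesById = listingTitlesById;
    }

    // Keeps the ranking order returned by search and skips IDs that no longer resolve.
    List<String> hydrate(List<Long> rankedListingIds) {
        List<String> titles = new ArrayList<>();
        for (Long id : rankedListingIds) {
            String title = listingTitlesById.get(id);
            if (title != null) {
                titles.add(title);
            }
        }
        return titles;
    }
}
```

Because the index stores only IDs, display fields can change in the datastore without reindexing, at the cost of one extra lookup per result page.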
The Search Server
Architecture Overview
Search Server


 • Identical Code + Hardware
 • Roles/Behavior controlled by Env variables
 • Single Java Process
 • Solr running as a Jetty Servlet
 • Thrift Servers
 • Smoker
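One way role selection by environment variable could look, as a rough sketch. The variable name SEARCH_ROLE and the REPLICA role are assumptions; the slides only say that roles are driven by env variables and that some processes are master-specific.

```java
import java.util.Map;

class ServerRole {
    // REPLICA is an assumed name for the non-master role.
    enum Role { MASTER, REPLICA }

    // SEARCH_ROLE is a hypothetical variable name; the talk does not name the real ones.
    static Role fromEnv(Map<String, String> env) {
        String value = env.get("SEARCH_ROLE");
        if (value != null && value.trim().equalsIgnoreCase("master")) {
            return Role.MASTER;
        }
        return Role.REPLICA; // anything else falls back to the query-serving role
    }

    public static void main(String[] args) {
        Role role = fromEnv(System.getenv());
        System.out.println("Starting search server as " + role);
        if (role == Role.MASTER) {
            // Same code and hardware everywhere; only the master runs indexing processes.
            System.out.println("Would also start the incremental indexer");
        }
    }
}
```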

Master-specific processes:
 • Incremental Indexer
 • External File Field Updaters
Load Balancing
Load Balancing
Thrift TSocketPool
Load Balancing
Server Affinity
Load Balancing
Server Affinity Algorithm
// $servers is the list of available search hosts; $serversNew ends up as a
// per-query ordering of them, e.g. ["host2", "host3", "host1", "host4"]
$serversNew = array();
$numServers = count($servers);

while ($numServers > 0) {
    // Take the first 4 chars of the md5sum of the server count
    // and the query, mod the available servers
    $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % $numServers;
    $keySet = array_keys($servers);
    $serverId = $keySet[$key];

    // Push the chosen server onto the new list and remove it
    // from the initial list
    array_push($serversNew, $servers[$serverId]);
    unset($servers[$serverId]);
    --$numServers;
}
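The same ordering logic can be sketched in Java for readers who want to experiment with it outside PHP. This is a port written for illustration, not code from the talk; the class and method names are assumptions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class ServerAffinity {
    // Returns the servers in the preference order for this query.
    static List<String> order(List<String> servers, String query) {
        List<String> remaining = new ArrayList<>(servers);
        List<String> ordered = new ArrayList<>();
        while (!remaining.isEmpty()) {
            int n = remaining.size();
            // First 4 hex chars of md5(count + "+" + query), mod the remaining servers
            int key = Integer.parseInt(md5Hex(n + "+" + query).substring(0, 4), 16) % n;
            // Move the chosen server from the remaining list to the ordered list
            ordered.add(remaining.remove(key));
        }
        return ordered;
    }

    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            // Left-pad to 32 hex chars so leading zero bytes are not dropped
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Calling ServerAffinity.order(servers, "jewelry") twice with the same server list yields the same ordering, which is what lets repeated queries land on a server whose caches are already warm.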
Hashing on the query alone:
  $key = hexdec(substr(md5($query),0,4))

  "jewelry"    ["host2", "host3", "host1", "host4"]
  "scarf"      ["host2", "host3", "host1", "host4"]
Hashing on the server count plus the query:
  $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%(count($servers));

  "jewelry"    ["host2", "host3", "host1", "host4"]
  "scarf"      ["host2", "host1", "host4", "host3"]
Load Balancing
Server Affinity Results


      2%        20%
Load Balancing
Server Affinity Caveats

     • Stemming / Analysis
     • Be wary of query distribution
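One reading of the first caveat is that queries which analyze to the same terms should hash to the same server. A rough sketch of normalizing the query before hashing follows; the chosen normalization is an assumption and does not reproduce Solr's actual analysis chain.

```java
import java.util.Locale;

class AffinityKey {
    // Assumed normalization: trim, lowercase, collapse runs of whitespace.
    // Real analysis (stemming, ASCII folding) happens inside Solr and is not reproduced here.
    static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }
}
```

With this in place, "  Silver   RING " and "silver ring" would produce the same affinity key.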
Replication
Replication
The Problem
Replication
Multicast Rsync?
[15:25]  <engineer> patrick: i'm gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. I'll be watching the graphs and what not, but let me know if you see anything funky with the network
[15:26]  <patrick> ok
....
[15:31]  <keyur> is the site down?
Hmm...Bit Torrent?
Replication
Bit Torrent POC
Using BitTornado:
Replication
Bit Torrent + Solr
Fork of TTorrent: https://github.com/etsy/ttorrent
    • Multi-File Support
    • Performance Enhancements
Solr InterOp
QParsers
“writing query strings is for suckers”
  http://host:8393/solr/person/select/?q=_query_:%22{!dismax%20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf%20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq}%22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq=%22giovanni%20fernandez-kincade%22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_syn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez-kincade&lqf=last_name^3
  http://host:8393/solr/person/select/?q={!personrealqp}giovanni%20fernandez-kincade
  class PersonNameRealQParser extends QParser {
    public PersonNameRealQParser(String qstr, SolrParams localParams,
        SolrParams params, SolrQueryRequest req) {
      super(qstr, localParams, params, req);
    }

    @Override
    public Query parse() throws ParseException {
      // Exact match on the full name, boosted 4x
      TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));
      exactFullNameQuery.setBoost(4.0f);

      // Split the user's query on whitespace
      String[] userQueryTerms = qstr.split("\\s+");
      Query firstLastQuery = null;

      if (2 == userQueryTerms.length)
        firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);
      else
        firstLastQuery = parseAsFirstOrLast(userQueryTerms);

      // Score by the best-matching clause (tie breaker of 0)
      DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);
      realNameQuery.add(exactFullNameQuery);
      realNameQuery.add(firstLastQuery);

      return realNameQuery;
    }

    /**...**/
  }
The QParserPlugin that returns our new QParser:
  public class PersonNameRealQParserPlugin extends QParserPlugin {
   public static final String NAME = "personrealqp";

   @Override
   public void init(NamedList args) {}

   @Override
   public QParser createParser(String qstr, SolrParams localParams,
       SolrParams params, SolrQueryRequest req) {
     return new PersonNameRealQParser(qstr, localParams, params, req);
   }
 }

Registering the plugin in solrconfig.xml:

   <queryParser name="personrealqp"
      class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
Custom Stemmer
Solr InterOp
Custom Stemmer
 banded, banding, birding, bouldering, bounded, buffing, bundler, canning,
 carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned,
 lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter,
 roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher,
 strapped, threaded, yellowing
First we extend KStemmer and intercept stem calls:
  public class LStemmer extends KStemmer {

     /**.....**/

      @Override
      String stem(String term) {
          String override = overrideStemTransformations.get(term);
          if(override != null) return override;
          return super.stem(term);
      }
  }
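The override-then-delegate pattern can be tried without Lucene on the classpath. The sketch below is a self-contained illustration: OverrideStemmer, addOverride, and the toy fallback are assumptions, and the fallback is not the KStem algorithm.

```java
import java.util.HashMap;
import java.util.Map;

class OverrideStemmer {
    // Terms listed here bypass the fallback stemmer entirely.
    private final Map<String, String> overrides = new HashMap<>();

    void addOverride(String term, String stem) {
        overrides.put(term, stem);
    }

    String stem(String term) {
        String override = overrides.get(term);
        if (override != null) {
            return override;
        }
        return fallbackStem(term);
    }

    // Toy stand-in for KStemmer: strips a single trailing "s". Not the real KStem algorithm.
    private String fallbackStem(String term) {
        if (term.length() > 1 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}
```

Mapping a term to itself, for example addOverride("banded", "banded"), protects it from stemming, while every other term still goes through the fallback.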
Then create a TokenFilter that uses the new Stemmer:
  final class LStemFilter extends TokenFilter {

    /**.....**/
    protected LStemFilter(TokenStream input, int cacheSize) {
      super(input);
      stemmer = new LStemmer(cacheSize);
    }

    @Override
    public boolean incrementToken() throws IOException {
      /**....**/
    }
  }
Create a FilterFactory that exposes it:
  public class LStemFilterFactory extends BaseTokenFilterFactory {
    private int cacheSize = 20000;

    @Override
    public void init(Map<String, String> args) {
      super.init(args);
      String cacheSizeStr = args.get("cacheSize");
      if (cacheSizeStr != null) {
        cacheSize = Integer.parseInt(cacheSizeStr);
      }
    }

    @Override
    public TokenStream create(TokenStream in) {
      return new LStemFilter(in, cacheSize);
    }
  }
And finally plug it into your analysis chain:

 <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
       words="solr/common/conf/stopwords.txt"/>
    <filter class="com.etsy.solr.analysis.LStemFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
Thanks!

Solr @ Etsy - Apache Lucene Eurocon
