Search-time Parallelism 
Shikhar Bhushan, Etsy Inc. 
shikhar@etsy.com 
@shikhrr
26 million listings
Over 1 million shops
Search Infrastructure at Etsy 
Parallel clusters ‘flip’ & ‘flop’ - dark/live 
Listings index: 
2013: unsharded Solr, one large JVM 
2014: locally sharded Solr, 8 smaller JVMs
big win on latency tail
Speeding up search 
• Low-level improvements: use less CPU 
• Parallelize: use more cores
Amdahl’s law 
“The speedup of a program using multiple processors in parallel computing is 
limited by the time needed for the sequential fraction of the program.” 
Wikipedia
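As a worked illustration of the quoted bound (the symbols p, for the parallelizable fraction of the work, and n, for the number of cores, are introduced here and do not come from the slides):

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad
S(8)\Big|_{p = 0.9} = \frac{1}{0.1 + 0.1125} \approx 4.7
```

Even with 90% of the work parallelized, 8 cores give under a 5x speedup, and no number of cores can exceed 1 / (1 - p) = 10x.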
Sharding 
Using: 
• Solr distributed search 
• SolrCloud 
• Elasticsearch 
Documents are routed to one of eight shards (shard0 through shard7) by hash(pk) % num_shards.
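The routing rule hash(pk) % num_shards can be sketched as follows. This is a minimal illustration only: ShardRouter is a hypothetical class, and String.hashCode() stands in for whatever hash function the search platform actually applies.

```java
// Hypothetical helper for illustration; not part of Solr or Elasticsearch.
final class ShardRouter {
    private final int numShards;

    ShardRouter(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        this.numShards = numShards;
    }

    // floorMod keeps the result in [0, numShards) even when hashCode() is negative,
    // which the plain % operator would not.
    int shardFor(String primaryKey) {
        return Math.floorMod(primaryKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(8);
        for (String pk : new String[] {"listing-1", "listing-2", "listing-3"}) {
            System.out.println(pk + " -> shard" + router.shardFor(pk));
        }
    }
}
```

The same key always lands on the same shard, so a lookup by primary key needs to consult only one shard, while a search still has to fan out to all of them.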
Why not shard? 
New challenges arise once you go distributed, some examples: 
• More moving parts and failure modes to deal with 
• Missing features with distributed search 
• Index statistics on shards can vary, distorting IDF 
Tempting to defer sharding while index size does not yet require it.
Segment-level parallelism 
Segments are fully functioning mini-indexes within your Lucene index.
Collectors! 
• Hits get accumulated - ‘collected’ by the Collector abstraction. 
• Invoked for every hit that matches the Query. 
• Has the Scorer available to get the score for current hit if needed. 
• Output - e.g. top N hits, number of hits, grouped hits - can be retrieved when done.
Existing solution 
IndexSearcher(IndexReaderContext, ExecutorService) 
Special-cased for: 
TopScoreDocCollector (sort by score) 
and 
TopFieldCollector (arbitrary sort-spec) 
Only ships with parallel search support for the above TopDocs collectors.
Not composable 
Difficult to build parallelization for every possible permutation. 
With Solr you may have: 
TimeLimitingCollector 
|— MultiCollector 
|— TopScoreDocCollector 
|— DocSetCollector
Sync considered harmful 
protected void search(
        List<LeafReaderContext> leaves,
        Weight weight,
        Collector collector
) throws IOException {
    // TODO: should we make this
    // threaded...? the Collector could be sync'd?
    // always use single thread:
    for (LeafReaderContext ctx : leaves) { // search each subreader
        …
collect() called for every single document that matches the query. 
Can expect a lot of contention! 
IndexSearcher.java
Proposed solution 
LUCENE-5299 Refactor Collector API for parallelism
API review: Before 
public interface Collector {
    LeafCollector getLeafCollector(LeafReaderContext context) throws IOException;
}

public interface LeafCollector {
    void setScorer(Scorer scorer) throws IOException;
    void collect(int doc) throws IOException;
    boolean acceptsDocsOutOfOrder();
}
New methods: Collector 
public interface Collector {
    LeafCollector getLeafCollector(LeafReaderContext context) throws IOException;
    // NEW METHODS:
    boolean isParallelizable();
    void setParallelized();
    void done() throws IOException;
}
New methods: LeafCollector 
public interface LeafCollector {
    void setScorer(Scorer scorer) throws IOException;
    void collect(int doc) throws IOException;
    boolean acceptsDocsOutOfOrder();
    // NEW METHOD:
    void leafDone() throws IOException;
}
Opt-in 
Collector.isParallelizable() 
Need every Collector in the chain to be parallelizable - can start attacking at the 
level of individual collectors. 
public class MultiCollector implements Collector {
    …
    @Override
    public boolean isParallelizable() {
        // Parallelizable only if every wrapped collector is.
        for (Collector c : collectors) {
            if (!c.isParallelizable()) {
                return false;
            }
        }
        return true;
    }
    …
}
Don’t penalize serial 
Collector.setParallelized() 
‘Heads-up’ to the Collector whether collection will be parallelized, so it can adapt in 
case the parallelism-friendly approach has unnecessary cost in the serial case.
Non-blocking constructs 
Guaranteed to always execute on the primary search thread
(existing) 
LeafCollector Collector.getLeafCollector() 
(new) 
void LeafCollector.leafDone() 
void Collector.done() 
=> safe places to act on shared mutable state
New search strategy 
IndexSearcher(IndexReaderContext, SearchStrategy) 
IndexSearcher.search() factored into: 
• SerialSearchStrategy 
• ParallelSearchStrategy(Executor e, int parallelism) 
• parallelism used to throttle maximum concurrent tasks at the request-level
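One way the request-level throttle could work is a per-request semaphore in front of a shared Executor. The sketch below illustrates that idea under stated assumptions; it is not the LUCENE-5299 implementation, and ThrottledLeafRunner, runAll and demoMaxConcurrent are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a parallel search strategy; leaf searches are plain Runnables here.
final class ThrottledLeafRunner {
    private final Executor executor;
    private final int parallelism;

    ThrottledLeafRunner(Executor executor, int parallelism) {
        this.executor = executor;
        this.parallelism = parallelism;
    }

    // Runs every task, with at most `parallelism` of them in flight for this one request.
    void runAll(List<Runnable> leafTasks) {
        Semaphore permits = new Semaphore(parallelism); // one semaphore per request
        CountDownLatch finished = new CountDownLatch(leafTasks.size());
        try {
            for (Runnable task : leafTasks) {
                permits.acquire(); // the calling thread waits while the request is at its limit
                executor.execute(() -> {
                    try {
                        task.run();
                    } finally {
                        permits.release();
                        finished.countDown();
                    }
                });
            }
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        }
    }

    // Demo: returns the highest number of tasks observed running at the same time.
    static int demoMaxConcurrent(int tasks, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        List<Runnable> leafTasks = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            leafTasks.add(() -> {
                maxSeen.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(5); // pretend to search a segment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        try {
            new ThrottledLeafRunner(pool, parallelism).runAll(leafTasks);
        } finally {
            pool.shutdown();
        }
        return maxSeen.get();
    }
}
```

Because the semaphore belongs to the request rather than the pool, one expensive query cannot occupy every thread of an executor that is shared between requests.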
Parallel search - not just collection 
• Scoring is thread-safe and segment-level. 
• Collection is also segment-level, but typically computes its outcome as state shared 
across leaves, e.g. TopDocs over your index. 
• By making Collector API parallelism-friendly, we can parallelize search as a whole.
Stupidly parallelizable 
public class TotalHitCountCollector implements Collector {
    private int totalHits;

    @Override
    public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
        return new LeafCollector() {
            private int totalHits = 0; // leaf-private count; shadows the outer field
            ..
            @Override
            public void collect(int doc) throws IOException {
                totalHits++;
            }
            ..
            @Override
            public void leafDone() throws IOException {
                // Runs on the primary search thread, so no synchronization is needed.
                TotalHitCountCollector.this.totalHits += totalHits;
            }
        };
    }
    ..
    @Override
    public boolean isParallelizable() {
        return true;
    }
}
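To see the threading contract end to end without Lucene on the classpath, here is a self-contained sketch. It assumes a pared-down, hypothetical LeafCounter interface (no Scorer, no IOException) and a hypothetical driver, countHits, that plays the role of the search strategy: collect() runs on worker threads, while getLeafCollector() and leafDone() run on the calling thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelHitCountSketch {

    // Hypothetical, simplified version of the proposed LeafCollector.
    interface LeafCounter {
        void collect(int doc);
        void leafDone();
    }

    private int totalHits; // shared state: only touched on the primary thread

    LeafCounter getLeafCollector() {
        return new LeafCounter() {
            private int leafHits; // only touched by the single worker running this leaf

            @Override
            public void collect(int doc) {
                leafHits++;
            }

            @Override
            public void leafDone() {
                totalHits += leafHits;
            }
        };
    }

    // Each int[] holds the matching doc IDs of one segment (made-up data).
    static int countHits(List<int[]> hitsPerLeaf, int threads) {
        ParallelHitCountSketch collector = new ParallelHitCountSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<LeafCounter> leaves = new ArrayList<>();
            List<Future<?>> pending = new ArrayList<>();
            for (int[] hits : hitsPerLeaf) {
                LeafCounter leaf = collector.getLeafCollector(); // primary thread
                leaves.add(leaf);
                pending.add(pool.submit(() -> {
                    for (int doc : hits) {
                        leaf.collect(doc); // worker thread
                    }
                }));
            }
            for (int i = 0; i < pending.size(); i++) {
                pending.get(i).get();     // wait for the leaf; its writes become visible here
                leaves.get(i).leafDone(); // primary thread
            }
            return collector.totalHits;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<int[]> leaves = Arrays.asList(new int[] {0, 5, 7}, new int[] {1, 2}, new int[] {});
        System.out.println("total hits: " + countHits(leaves, 3));
    }
}
```

No lock or atomic is needed: the hot path touches only leaf-private state, and the shared counter is updated once per leaf from a single thread.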
Fun to parallelize 
Solr DocSetCollector 
Populates document IDs in a FixedBitSet(maxDoc), internally a long[]. 
leaf  docBase  maxDoc  docId range 
0     0        42      [0 - 41] 
1     42       20      [42 - 61] 
Adjacent segments can share a 64-bit word of the long[] (docs 41 and 42 above both fall in word 0), so unsynchronized writes could race at segment boundaries. To address this when parallelized: 
• collect() puts the first and last 64 document IDs of the segment into LeafCollector-private 
longs, and all others into the shared bitset. 
• leafDone() merges these boundary document IDs into the shared bitset.
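The boundary trick can be sketched in plain Java as below. This is an illustration under assumptions, not Solr's DocSetCollector: a bare long[] stands in for FixedBitSet, and BoundaryBitSetSketch, Leaf and collectMatching are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

final class BoundaryBitSetSketch {
    private final long[] bits; // shared; plays the role of FixedBitSet's backing array

    BoundaryBitSetSketch(int maxDoc) {
        bits = new long[(maxDoc + 63) >>> 6];
    }

    private void set(int globalDoc) {
        bits[globalDoc >>> 6] |= 1L << (globalDoc & 63);
    }

    boolean get(int globalDoc) {
        return (bits[globalDoc >>> 6] & (1L << (globalDoc & 63))) != 0;
    }

    final class Leaf {
        private final int docBase;
        private final int maxDoc;
        private long head; // bit i <=> local doc i (first 64 docs)
        private long tail; // bit i <=> local doc maxDoc - 64 + i (last 64 docs)

        Leaf(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }

        void collect(int localDoc) { // worker thread
            if (localDoc < 64) {
                head |= 1L << localDoc;
            } else if (localDoc >= maxDoc - 64) {
                tail |= 1L << (localDoc - (maxDoc - 64));
            } else {
                // At least 64 docs from either edge, so the 64-bit word containing this
                // doc lies wholly inside this leaf's range and no other leaf writes it.
                set(docBase + localDoc);
            }
        }

        void leafDone() { // primary thread
            for (int i = 0; i < 64; i++) {
                if ((head & (1L << i)) != 0) set(docBase + i);
                if ((tail & (1L << i)) != 0) set(docBase + maxDoc - 64 + i);
            }
        }
    }

    // Collects every doc accepted by `matches`, one task per leaf; leafSizes are the per-leaf maxDoc values.
    static BoundaryBitSetSketch collectMatching(int[] leafSizes, IntPredicate matches) {
        int total = 0;
        for (int size : leafSizes) total += size;
        BoundaryBitSetSketch docSet = new BoundaryBitSetSketch(total);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, leafSizes.length));
        try {
            List<Leaf> leaves = new ArrayList<>();
            List<Future<?>> pending = new ArrayList<>();
            int docBase = 0;
            for (int size : leafSizes) {
                Leaf leaf = docSet.new Leaf(docBase, size);
                leaves.add(leaf);
                pending.add(pool.submit(() -> {
                    for (int d = 0; d < leaf.maxDoc; d++) {
                        if (matches.test(leaf.docBase + d)) leaf.collect(d);
                    }
                }));
                docBase += size;
            }
            for (int i = 0; i < pending.size(); i++) {
                pending.get(i).get();
                leaves.get(i).leafDone();
            }
            return docSet;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Words that straddle two segments are written only from leafDone(), which runs on one thread, so the shared array needs no locking or atomic updates.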
Bigger tradeoffs 
Lucene TopScoreDocCollector uses a single priority queue in serial case. 
When parallelized: 
• More memory: a lazily filled pool of HitQueue instances - take one in getLeafCollector(), return 
it in leafDone(), merge in done(). 
• More computation: besides the merge step, each of the multiple priority queues sees only part 
of the hits, so hits that will not make the final top N are less likely to be discarded immediately.
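The per-leaf queue and merge approach can be sketched with java.util.PriorityQueue. This is an illustration only: Lucene's HitQueue and its pooling are not reproduced, the Hit class and all method names are hypothetical, and the per-leaf loops run serially here although each could run on its own worker thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopNMergeSketch {

    // Hypothetical stand-in for a scored hit.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Lowest score first, so the queue head is the next hit to evict.
    // On equal scores the higher doc ID counts as worse.
    private static final Comparator<Hit> WORST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    // Keeps at most n hits in the queue, replacing the worst when a better hit arrives.
    static void offer(PriorityQueue<Hit> queue, Hit hit, int n) {
        if (n <= 0) {
            return;
        }
        if (queue.size() < n) {
            queue.add(hit);
        } else if (WORST_FIRST.compare(hit, queue.peek()) > 0) {
            queue.poll();
            queue.add(hit);
        }
    }

    // One queue per leaf, merged at the end, mirroring getLeafCollector() and done().
    static List<Hit> topN(List<List<Hit>> hitsPerLeaf, int n) {
        PriorityQueue<Hit> merged = new PriorityQueue<>(WORST_FIRST);
        for (List<Hit> leafHits : hitsPerLeaf) {
            PriorityQueue<Hit> leafQueue = new PriorityQueue<>(WORST_FIRST);
            for (Hit hit : leafHits) {
                offer(leafQueue, hit, n); // leaf-local: knows nothing about other leaves
            }
            for (Hit hit : leafQueue) {
                offer(merged, hit, n); // merge step
            }
        }
        List<Hit> best = new ArrayList<>(merged);
        best.sort(WORST_FIRST.reversed()); // best hit first
        return best;
    }
}
```

Each leaf queue may retain up to n hits that the merge later throws away, which is the extra memory and computation described above.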
trunk (serial) vs patch (serial) 
Task QPS baseline StdDev QPS parcol StdDev Pct diff 
HighSloppyPhrase 175.08 (11.5%) 156.12 (12.7%) -10.8% ( -31% - 15%) 
LowSloppyPhrase 368.89 (11.5%) 337.39 (14.5%) -8.5% ( -30% - 19%) 
LowSpanNear 367.40 (13.3%) 336.12 (24.5%) -8.5% ( -40% - 33%) 
LowPhrase 364.59 (13.2%) 336.56 (15.2%) -7.7% ( -31% - 23%) 
HighPhrase 125.30 (13.6%) 116.24 (12.2%) -7.2% ( -29% - 21%) 
MedSpanNear 147.73 (13.6%) 137.15 (20.3%) -7.2% ( -36% - 30%) 
MedSloppyPhrase 414.70 (16.5%) 386.07 (13.6%) -6.9% ( -31% - 27%) 
LowTerm 2288.63 (17.4%) 2138.57 (12.7%) -6.6% ( -31% - 28%) 
Respell 62.78 (86.4%) 58.80 (76.0%) -6.3% ( -90% - 1143%) 
MedPhrase 350.78 (14.0%) 331.50 (12.5%) -5.5% ( -28% - 24%) 
HighTerm 779.36 (10.0%) 740.42 (11.8%) -5.0% ( -24% - 18%) 
PKLookup 238.68 (10.9%) 226.79 (12.8%) -5.0% ( -25% - 20%) 
OrHighMed 369.05 (11.2%) 351.05 (12.2%) -4.9% ( -25% - 20%) 
MedTerm 1166.96 (14.6%) 1116.42 (13.2%) -4.3% ( -27% - 27%) 
AndHighMed 616.82 (12.9%) 590.46 (13.8%) -4.3% ( -27% - 25%) 
AndHighHigh 155.12 (22.9%) 150.33 (22.2%) -3.1% ( -39% - 54%) 
Prefix3 522.67 (14.9%) 508.08 (13.4%) -2.8% ( -27% - 29%) 
OrHighLow 233.72 (11.6%) 227.83 (11.9%) -2.5% ( -23% - 23%) 
AndHighLow 982.58 (16.0%) 966.78 (13.2%) -1.6% ( -26% - 32%) 
IntNRQ 88.57 (15.1%) 87.37 (12.9%) -1.4% ( -25% - 31%) 
HighSpanNear 92.55 (21.2%) 91.55 (11.7%) -1.1% ( -28% - 40%) 
Wildcard 226.43 (11.5%) 225.37 (11.3%) -0.5% ( -20% - 25%) 
Fuzzy2 7.02 (17.1%) 7.02 (19.0%) 0.0% ( -30% - 43%) 
OrHighHigh 242.44 (14.3%) 243.60 (14.0%) 0.5% ( -24% - 33%) 
Fuzzy1 9.62 (30.7%) 12.40 (102.9%) 28.9% ( -80% - 234%) 
luceneutil with wikimedium500k 
SerialSearchStrategy 
SEARCH_NUM_THREADS=32 
Hardware: 32-core Sandy Bridge
trunk (serial) vs patch (parallel) 
Task QPS baseline StdDev QPS parcol StdDev Pct diff 
LowTerm 2401.88 (12.7%) 1799.27 (6.3%) -25.1% ( -39% - -6%) 
Fuzzy2 6.52 (14.4%) 5.74 (24.0%) -11.9% ( -43% - 30%) 
Respell 45.13 (90.2%) 40.94 (83.5%) -9.3% ( -96% - 1679%) 
PKLookup 232.02 (12.9%) 228.35 (12.4%) -1.6% ( -23% - 27%) 
MedTerm 1612.01 (14.0%) 1601.71 (10.9%) -0.6% ( -22% - 28%) 
Fuzzy1 14.19 (79.3%) 14.71 (177.6%) 3.7% (-141% - 1258%) 
AndHighLow 1205.65 (17.5%) 1254.76 (15.9%) 4.1% ( -24% - 45%) 
MedSpanNear 478.11 (25.4%) 946.72 (34.5%) 98.0% ( 30% - 211%) 
OrHighLow 424.71 (14.5%) 941.39 (31.4%) 121.7% ( 66% - 195%) 
AndHighHigh 377.82 (13.3%) 910.77 (32.2%) 141.1% ( 84% - 215%) 
HighTerm 325.35 (11.3%) 855.63 (8.9%) 163.0% ( 128% - 206%) 
AndHighMed 346.57 (11.7%) 914.59 (26.4%) 163.9% ( 112% - 228%) 
MedPhrase 227.47 (13.1%) 621.50 (22.9%) 173.2% ( 121% - 240%) 
LowSloppyPhrase 265.21 (10.4%) 748.30 (49.2%) 182.2% ( 110% - 269%) 
OrHighMed 221.49 (12.2%) 632.55 (23.9%) 185.6% ( 133% - 252%) 
LowPhrase 190.34 (14.9%) 586.71 (22.6%) 208.2% ( 148% - 288%) 
Prefix3 305.01 (15.9%) 948.63 (17.0%) 211.0% ( 153% - 289%) 
MedSloppyPhrase 229.15 (15.0%) 718.29 (41.4%) 213.5% ( 136% - 317%) 
LowSpanNear 102.98 (14.0%) 323.91 (37.1%) 214.5% ( 143% - 309%) 
Wildcard 249.66 (13.3%) 787.42 (17.0%) 215.4% ( 163% - 283%) 
OrHighHigh 124.76 (10.5%) 394.72 (35.0%) 216.4% ( 154% - 292%) 
HighSpanNear 119.23 (15.5%) 386.33 (57.5%) 224.0% ( 130% - 351%) 
HighPhrase 86.95 (14.4%) 293.00 (15.5%) 237.0% ( 180% - 311%) 
HighSloppyPhrase 136.37 (12.9%) 462.38 (21.7%) 239.1% ( 181% - 314%) 
IntNRQ 100.48 (14.1%) 391.02 (14.2%) 289.1% ( 228% - 369%) 
luceneutil with wikimedium500k 
ParallelSearchStrategy parallelism=8 
ForkJoinPool with 128 threads 
SEARCH_NUM_THREADS=32 
Hardware: 32-core Sandy Bridge
Replay testing 
Replayed traffic from Etsy listing search request logs to an experimental cluster 
running LUCENE-5299 changes in an unsharded setup. 
[Chart: p95 and p99 latency, serial vs. parallel]
Throughput 
In general, the system needs to do more work overall, which costs throughput: 
• concurrency overhead 
• context switches 
• locally optimal choices at the leaf-level 
• merge cost 
[Chart: user CPU %, serial vs. parallel]
Sharding comparison 
Segment-level parallelism: 
• Limited to a single JVM. 
• Distributed search not required. 
• Sensitive to segment count and sizing. 
• In-process merging is cheaper. 
• Existing solution has limited applicability; LUCENE-5299 solution not in trunk. 
Sharding: 
• Scalable across JVMs. 
• Distributed search required. 
• Index shards can be kept similarly sized. 
• Prone to “shard lag” - limited by the slowest shard. 
• Merge cost is higher due to serialization. 
• Tried-and-tested approach.
Not mutually exclusive 
Sharding + Segment-level parallelism = ?
Next steps 
• Figure out whether serial penalty is real. 
• Semantics around exceptions during collection and ‘done’ callbacks. 
• Lots more collectors can be made parallelizable. 
• Your contributions welcome - LUCENE-5299. 
• Committer interest especially welcome!
Thanks 
Shikhar Bhushan 
shikhar@etsy.com 
@shikhrr 
codeascraft.com 
etsy.com/careers

Más contenido relacionado

La actualidad más candente

Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksShalin Shekhar Mangar
 
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Lucidworks
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...thelabdude
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solrthelabdude
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLCloudera, Inc.
 
Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4thelabdude
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataShalin Shekhar Mangar
 
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Lucidworks
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudthelabdude
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkitthelabdude
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Lucidworks
 
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)Spark Summit
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scaleAnshum Gupta
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015Holden Karau
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormLester Martin
 
What's New on AWS and What it Means to You
What's New on AWS and What it Means to YouWhat's New on AWS and What it Means to You
What's New on AWS and What it Means to YouAmazon Web Services
 

La actualidad más candente (20)

Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
 
Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
 
Apache SolrCloud
Apache SolrCloudApache SolrCloud
Apache SolrCloud
 
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
 
Scaling search with SolrCloud
Scaling search with SolrCloudScaling search with SolrCloud
Scaling search with SolrCloud
 
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scale
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache Storm
 
What's New on AWS and What it Means to You
What's New on AWS and What it Means to YouWhat's New on AWS and What it Means to You
What's New on AWS and What it Means to You
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 

Similar a Search-time Parallelism: Presented by Shikhar Bhushan, Etsy

Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffTimescale
 
20-Line Lifesavers: Coding simple solutions in the GATK
20-Line Lifesavers: Coding simple solutions in the GATK20-Line Lifesavers: Coding simple solutions in the GATK
20-Line Lifesavers: Coding simple solutions in the GATKDan Bolser
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Altinity Ltd
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek PROIDEA
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackJakub Hajek
 
Microservices observability
Microservices observabilityMicroservices observability
Microservices observabilityMaxim Shelest
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentJim Mlodgenski
 
Towards Ultra-Large-Scale System: Design of Scalable Software and Next-Gen H...
Towards Ultra-Large-Scale System:  Design of Scalable Software and Next-Gen H...Towards Ultra-Large-Scale System:  Design of Scalable Software and Next-Gen H...
Towards Ultra-Large-Scale System: Design of Scalable Software and Next-Gen H...Arghya Kusum Das
 
Inside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTPInside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTPBob Ward
 
Optimizing Elastic for Search at McQueen Solutions
Optimizing Elastic for Search at McQueen SolutionsOptimizing Elastic for Search at McQueen Solutions
Optimizing Elastic for Search at McQueen SolutionsElasticsearch
 
vFabric SQLFire for high performance data
vFabric SQLFire for high performance datavFabric SQLFire for high performance data
vFabric SQLFire for high performance dataVMware vFabric
 
Performance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle CoherencePerformance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle Coherencearagozin
 
Looping the Loop with SPL Iterators
Looping the Loop with SPL IteratorsLooping the Loop with SPL Iterators
Looping the Loop with SPL IteratorsMark Baker
 
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Safe Automated Refactoring for Intelligent Parallelization of Java 8 StreamsSafe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Safe Automated Refactoring for Intelligent Parallelization of Java 8 StreamsRaffi Khatchadourian
 
Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming Cloudflare
 
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, LucidworksLifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, LucidworksLucidworks
 

Similar a Search-time Parallelism: Presented by Shikhar Bhushan, Etsy (20)

The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
 
20-Line Lifesavers: Coding simple solutions in the GATK
20-Line Lifesavers: Coding simple solutions in the GATK20-Line Lifesavers: Coding simple solutions in the GATK
20-Line Lifesavers: Coding simple solutions in the GATK
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
Shooting the Rapids
Shooting the RapidsShooting the Rapids
Shooting the Rapids
 
Microservices observability
Microservices observabilityMicroservices observability
Microservices observability
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
 
Towards Ultra-Large-Scale System: Design of Scalable Software and Next-Gen H...
Towards Ultra-Large-Scale System:  Design of Scalable Software and Next-Gen H...Towards Ultra-Large-Scale System:  Design of Scalable Software and Next-Gen H...
Towards Ultra-Large-Scale System: Design of Scalable Software and Next-Gen H...
 
Inside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTPInside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTP
 
AI Development with H2O.ai
AI Development with H2O.aiAI Development with H2O.ai
AI Development with H2O.ai
 
Optimizing Elastic for Search at McQueen Solutions
Optimizing Elastic for Search at McQueen SolutionsOptimizing Elastic for Search at McQueen Solutions
Optimizing Elastic for Search at McQueen Solutions
 
vFabric SQLFire for high performance data
vFabric SQLFire for high performance datavFabric SQLFire for high performance data
vFabric SQLFire for high performance data
 
Performance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle CoherencePerformance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle Coherence
 
Looping the Loop with SPL Iterators
Looping the Loop with SPL IteratorsLooping the Loop with SPL Iterators
Looping the Loop with SPL Iterators
 
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Safe Automated Refactoring for Intelligent Parallelization of Java 8 StreamsSafe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
 
Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming
 
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, LucidworksLifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
 

Más de Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...

Search-time Parallelism: Presented by Shikhar Bhushan, Etsy

  • 1.
  • 2. Search-time Parallelism Shikhar Bhushan, Etsy Inc. shikhar@etsy.com @shikhrr
  • 3.
  • 6. Search Infrastructure at Etsy
       Parallel clusters ‘flip’ & ‘flop’ - dark/live
       Listings index:
       2013: unsharded Solr, one large JVM
       2014: locally sharded Solr, 8 smaller JVMs
       big win on latency tail
  • 7. Speeding up search • Low-level improvements: use less CPU • Parallelize: use more cores
  • 8. Amdahl’s law “The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.” Wikipedia
  • 9. Sharding
       Using:
       • Solr distributed search
       • SolrCloud
       • Elasticsearch
       [Diagram: documents routed by hash(pk) % num_shards to shard0 through shard7]
  • 10. Why not shard?
       New challenges arise once you go distributed, some examples:
       • More moving parts and failure modes to deal with
       • Missing features with distributed search
       • Index statistics on shards can vary, distorting IDF
       Tempting to defer sharding if not necessary due to index size.
  • 11. Segment-level parallelism Fully-functioning mini-indexes within your Lucene index.
  • 12. Collectors!
       • Hits get accumulated - ‘collected’ by the Collector abstraction.
       • Invoked for every hit that matches the Query.
       • Has the Scorer available to get the score for current hit if needed.
       • Output - e.g. top N hits, number of hits, grouped hits - can be retrieved when done.
  • 13. Existing solution
       IndexSearcher(IndexReaderContext, ExecutorService)
       Special-cased for:
       TopScoreDocCollector (sort by score)
       and
       TopFieldCollector (arbitrary sort-spec)
       Only ships with parallel search support for the above TopDocs collectors.
  • 14. Not composable
       Difficult to build parallelization for every possible permutation.
       With Solr you may have:
       TimeLimitingCollector
       |— MultiCollector
          |— TopScoreDocCollector
          |— DocSetCollector
  • 15. Sync considered harmful
       IndexSearcher.java:
           protected void search(
               List<LeafReaderContext> leaves,
               Weight weight,
               Collector collector
           ) throws IOException {
             // TODO: should we make this
             // threaded...? the Collector could be sync'd?
             // always use single thread:
             for (LeafReaderContext ctx : leaves) { // search each subreader
               …
       collect() called for every single document that matches the query.
       Can expect a lot of contention!
  • 16. Proposed solution LUCENE-5299 Refactor Collector API for parallelism
  • 17. API review: Before
           public interface Collector {
             LeafCollector getLeafCollector(LeafReaderContext context) throws IOException;
           }

           public interface LeafCollector {
             void setScorer(Scorer scorer) throws IOException;
             void collect(int doc) throws IOException;
             boolean acceptsDocsOutOfOrder();
           }
  • 18. New methods: Collector
           public interface Collector {
             LeafCollector getLeafCollector(LeafReaderContext context) throws IOException;
             // NEW METHODS:
             boolean isParallelizable();
             void setParallelized();
             void done() throws IOException;
           }
  • 19. New methods: LeafCollector
           public interface LeafCollector {
             void setScorer(Scorer scorer) throws IOException;
             void collect(int doc) throws IOException;
             boolean acceptsDocsOutOfOrder();
             // NEW METHOD:
             void leafDone() throws IOException;
           }
  • 20. Opt-in
       Collector.isParallelizable()
       Need every Collector in the chain to be parallelizable - can start attacking at the level of individual collectors.
           public class MultiCollector implements Collector {
             …
             @Override
             public boolean isParallelizable() {
               for (Collector c : collectors) {
                 if (!c.isParallelizable()) {
                   return false;
                 }
               }
               return true;
             }
             …
           }
  • 21. Don’t penalize serial Collector.setParallelized() ‘Heads-up’ to the Collector whether collection will be parallelized, so it can adapt in case the parallelism-friendly approach has unnecessary cost in the serial case.
  • 22. Non-blocking constructs
       Guarantee to always execute on primary search thread:
       (existing) LeafCollector Collector.getLeafCollector()
       (new)      void LeafCollector.leafDone()
                  void Collector.done()
       => safe places to act on shared mutable state
  • 23. New search strategy
       IndexSearcher(IndexReaderContext, SearchStrategy)
       IndexSearcher.search() factored into:
       • SerialSearchStrategy
       • ParallelSearchStrategy(Executor e, int parallelism)
         • parallelism used to throttle maximum concurrent tasks at the request-level
  • 24. Parallel search - not just collection
       • Scoring is thread-safe and segment-level.
       • Collection is also segment-level, but typically computes its outcome as shared state between leaves, e.g. TopDocs over your index.
       • By making Collector API parallelism-friendly, we can parallelize search as a whole.
  • 25. Stupidly parallelizable
           public class TotalHitCountCollector implements Collector {
             private int totalHits;

             @Override
             public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
               return new LeafCollector() {
                 private int totalHits = 0;
                 ..
                 @Override
                 public void collect(int doc) throws IOException {
                   totalHits++;
                 }
                 ..
                 @Override
                 public void leafDone() throws IOException {
                   TotalHitCountCollector.this.totalHits += totalHits;
                 }
               };
             }
             ..
             @Override
             public boolean isParallelizable() {
               return true;
             }
           }
  • 26. Fun to parallelize
       Solr DocSetCollector populates document IDs in FixedBitSet(maxDoc) - internally a long[]

           leaf  docBase  maxDoc  docId range
           0     0        42      [0 - 41]
           1     42       20      [42 - 61]

       To address possible race condition at segment boundaries, when parallelized:
       • collect() first and last 64 document IDs for the segment into LeafCollector-private longs, all others into shared bitset.
       • when leafDone(), merge these boundary document IDs into shared bitset.
  • 27. Bigger tradeoffs
       Lucene TopScoreDocCollector uses a single priority queue in serial case. When parallelized:
       • More memory: lazy pool of HitQueue - grab when getLeafCollector(), return when leafDone(), merge when done().
       • More computation: in addition to the merge step - less likely to immediately discard hits that won’t eventually make it, as using multiple priority queues.
  • 28. trunk (serial) vs patch (serial)

           Task               QPS baseline  StdDev   QPS parcol  StdDev    Pct diff
           HighSloppyPhrase         175.08  (11.5%)      156.12  (12.7%)     -10.8% ( -31% -   15%)
           LowSloppyPhrase          368.89  (11.5%)      337.39  (14.5%)      -8.5% ( -30% -   19%)
           LowSpanNear              367.40  (13.3%)      336.12  (24.5%)      -8.5% ( -40% -   33%)
           LowPhrase                364.59  (13.2%)      336.56  (15.2%)      -7.7% ( -31% -   23%)
           HighPhrase               125.30  (13.6%)      116.24  (12.2%)      -7.2% ( -29% -   21%)
           MedSpanNear              147.73  (13.6%)      137.15  (20.3%)      -7.2% ( -36% -   30%)
           MedSloppyPhrase          414.70  (16.5%)      386.07  (13.6%)      -6.9% ( -31% -   27%)
           LowTerm                 2288.63  (17.4%)     2138.57  (12.7%)      -6.6% ( -31% -   28%)
           Respell                   62.78  (86.4%)       58.80  (76.0%)      -6.3% ( -90% - 1143%)
           MedPhrase                350.78  (14.0%)      331.50  (12.5%)      -5.5% ( -28% -   24%)
           HighTerm                 779.36  (10.0%)      740.42  (11.8%)      -5.0% ( -24% -   18%)
           PKLookup                 238.68  (10.9%)      226.79  (12.8%)      -5.0% ( -25% -   20%)
           OrHighMed                369.05  (11.2%)      351.05  (12.2%)      -4.9% ( -25% -   20%)
           MedTerm                 1166.96  (14.6%)     1116.42  (13.2%)      -4.3% ( -27% -   27%)
           AndHighMed               616.82  (12.9%)      590.46  (13.8%)      -4.3% ( -27% -   25%)
           AndHighHigh              155.12  (22.9%)      150.33  (22.2%)      -3.1% ( -39% -   54%)
           Prefix3                  522.67  (14.9%)      508.08  (13.4%)      -2.8% ( -27% -   29%)
           OrHighLow                233.72  (11.6%)      227.83  (11.9%)      -2.5% ( -23% -   23%)
           AndHighLow               982.58  (16.0%)      966.78  (13.2%)      -1.6% ( -26% -   32%)
           IntNRQ                    88.57  (15.1%)       87.37  (12.9%)      -1.4% ( -25% -   31%)
           HighSpanNear              92.55  (21.2%)       91.55  (11.7%)      -1.1% ( -28% -   40%)
           Wildcard                 226.43  (11.5%)      225.37  (11.3%)      -0.5% ( -20% -   25%)
           Fuzzy2                     7.02  (17.1%)        7.02  (19.0%)       0.0% ( -30% -   43%)
           OrHighHigh               242.44  (14.3%)      243.60  (14.0%)       0.5% ( -24% -   33%)
           Fuzzy1                     9.62  (30.7%)       12.40 (102.9%)      28.9% ( -80% -  234%)

       luceneutil with wikimedium500k
       SerialSearchStrategy
       SEARCH_NUM_THREADS=32
       Hardware: 32-core Sandy Bridge
  • 29. trunk (serial) vs patch (parallel)

           Task               QPS baseline  StdDev   QPS parcol  StdDev    Pct diff
           LowTerm                 2401.88  (12.7%)     1799.27   (6.3%)     -25.1% ( -39% -   -6%)
           Fuzzy2                     6.52  (14.4%)        5.74  (24.0%)     -11.9% ( -43% -   30%)
           Respell                   45.13  (90.2%)       40.94  (83.5%)      -9.3% ( -96% - 1679%)
           PKLookup                 232.02  (12.9%)      228.35  (12.4%)      -1.6% ( -23% -   27%)
           MedTerm                 1612.01  (14.0%)     1601.71  (10.9%)      -0.6% ( -22% -   28%)
           Fuzzy1                    14.19  (79.3%)       14.71 (177.6%)       3.7% (-141% - 1258%)
           AndHighLow              1205.65  (17.5%)     1254.76  (15.9%)       4.1% ( -24% -   45%)
           MedSpanNear              478.11  (25.4%)      946.72  (34.5%)      98.0% (  30% -  211%)
           OrHighLow                424.71  (14.5%)      941.39  (31.4%)     121.7% (  66% -  195%)
           AndHighHigh              377.82  (13.3%)      910.77  (32.2%)     141.1% (  84% -  215%)
           HighTerm                 325.35  (11.3%)      855.63   (8.9%)     163.0% ( 128% -  206%)
           AndHighMed               346.57  (11.7%)      914.59  (26.4%)     163.9% ( 112% -  228%)
           MedPhrase                227.47  (13.1%)      621.50  (22.9%)     173.2% ( 121% -  240%)
           LowSloppyPhrase          265.21  (10.4%)      748.30  (49.2%)     182.2% ( 110% -  269%)
           OrHighMed                221.49  (12.2%)      632.55  (23.9%)     185.6% ( 133% -  252%)
           LowPhrase                190.34  (14.9%)      586.71  (22.6%)     208.2% ( 148% -  288%)
           Prefix3                  305.01  (15.9%)      948.63  (17.0%)     211.0% ( 153% -  289%)
           MedSloppyPhrase          229.15  (15.0%)      718.29  (41.4%)     213.5% ( 136% -  317%)
           LowSpanNear              102.98  (14.0%)      323.91  (37.1%)     214.5% ( 143% -  309%)
           Wildcard                 249.66  (13.3%)      787.42  (17.0%)     215.4% ( 163% -  283%)
           OrHighHigh               124.76  (10.5%)      394.72  (35.0%)     216.4% ( 154% -  292%)
           HighSpanNear             119.23  (15.5%)      386.33  (57.5%)     224.0% ( 130% -  351%)
           HighPhrase                86.95  (14.4%)      293.00  (15.5%)     237.0% ( 180% -  311%)
           HighSloppyPhrase         136.37  (12.9%)      462.38  (21.7%)     239.1% ( 181% -  314%)
           IntNRQ                   100.48  (14.1%)      391.02  (14.2%)     289.1% ( 228% -  369%)

       luceneutil with wikimedium500k
       ParallelSearchStrategy parallelism=8
       ForkJoinPool with 128 threads
       SEARCH_NUM_THREADS=32
       Hardware: 32-core Sandy Bridge
  • 30. Replay testing
       Replayed traffic from Etsy listing search request logs to an experimental cluster running LUCENE-5299 changes in an unsharded setup.
       [Charts: p95 latency and p99 latency, serial vs parallel]
  • 31. Throughput
       In general, system needs to do more work overall, which impacts throughput:
       • concurrency overhead
       • context switches
       • locally optimal choices at the leaf-level
       • merge cost
       [Charts: user CPU %, serial vs parallel]
  • 32. Sharding comparison

           Segment-level parallelism                     | Sharding
           ----------------------------------------------+----------------------------------------------
           Limited to single JVM.                        | Scalable across JVMs.
           Distributed search not required.              | Distributed search required.
           Sensitive to segment count and sizing.        | Index shards can be kept similarly sized.
                                                         | Prone to “shard lag” - limited by slowest shard.
           In-process merging is cheaper.                | Merge cost higher due to serialization.
           Existing solution has limited applicability.  | Tried-and-tested approach.
           LUCENE-5299 solution not in trunk.            |
  • 33. Not mutually exclusive Sharding + Segment-level parallelism = ?
  • 34. Next steps
       • Figure out whether serial penalty is real.
       • Semantics around exceptions during collection and ‘done’ callbacks.
       • Lots more collectors can be made parallelizable.
       • Your contributions welcome - LUCENE-5299.
       • Committer interest especially welcome!
  • 35. Thanks Shikhar Bhushan shikhar@etsy.com @shikhrr codeascraft.com etsy.com/careers
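
Sketch for slides 22 and 25: the hit-count pattern keeps a private counter per leaf and only touches shared state once the leaf is finished, on a single thread. The Java below is a minimal stand-alone illustration of that idea, not the LUCENE-5299 patch. The class LeafLocalHitCount, its methods, and the use of int[] arrays as stand-ins for segments are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the pattern on the slides; not Lucene's Collector API.
class LeafLocalHitCount {

    /** Counts hits for one segment using thread-private state only, so no locking is needed. */
    static int countLeaf(int[] matchingDocs) {
        int leafHits = 0;
        for (int doc : matchingDocs) {
            leafHits++; // plays the role of collect(doc)
        }
        return leafHits;
    }

    /** Searches each leaf on the pool, then merges the per-leaf counts on the calling thread. */
    static int totalHits(List<int[]> leaves, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (int[] leaf : leaves) {
                pending.add(CompletableFuture.supplyAsync(() -> countLeaf(leaf), pool));
            }
            int total = 0;
            for (CompletableFuture<Integer> f : pending) {
                total += f.join(); // plays the role of leafDone(): shared total touched by one thread
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<int[]> leaves = Arrays.asList(new int[]{0, 3, 7}, new int[]{1}, new int[]{});
        System.out.println(totalHits(leaves, 2));
    }
}
```

Because the merge happens only on the calling thread, the shared total needs neither a lock nor an atomic, which matches the "safe places to act on shared mutable state" point on slide 22.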
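
Sketch for slide 26: two adjacent segments can share one 64-bit word of the bitset, so unsynchronized writes near a segment boundary can race. The Java below illustrates the buffering idea described on the slide under stated assumptions. BoundaryBufferedDocSet, Leaf, and collectAll are hypothetical names, the bitset is a plain long[] rather than Lucene's FixedBitSet, and the third segment in main is invented to exercise the interior path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: hypothetical names, not Solr's DocSetCollector.
class BoundaryBufferedDocSet {
    final long[] words; // shared bitset, one bit per global doc ID

    BoundaryBufferedDocSet(int totalMaxDoc) {
        words = new long[(totalMaxDoc + 63) >>> 6];
    }

    void set(int doc) { words[doc >>> 6] |= 1L << doc; } // long shifts use the low 6 bits of doc
    boolean get(int doc) { return (words[doc >>> 6] & (1L << doc)) != 0; }

    int cardinality() {
        int n = 0;
        for (long w : words) n += Long.bitCount(w);
        return n;
    }

    /** Per-segment state; head and tail are private to the worker collecting this leaf. */
    final class Leaf {
        final int docBase, maxDoc;
        long head, tail;

        Leaf(int docBase, int maxDoc) { this.docBase = docBase; this.maxDoc = maxDoc; }

        void collect(int localDoc) {
            if (localDoc < 64) {
                head |= 1L << localDoc;                      // first 64 docs of the segment
            } else if (localDoc >= maxDoc - 64) {
                tail |= 1L << (localDoc - (maxDoc - 64));    // last 64 docs of the segment
            } else {
                set(docBase + localDoc);                     // this word lies wholly inside the segment
            }
        }

        /** Must run on a single thread: folds the buffered boundary bits into the shared bitset. */
        void leafDone() {
            for (int i = 0; i < 64; i++) {
                if ((head & (1L << i)) != 0) set(docBase + i);
                if ((tail & (1L << i)) != 0) set(docBase + maxDoc - 64 + i);
            }
        }
    }

    static BoundaryBufferedDocSet collectAll(int[] maxDocs, int[][] hits, int parallelism) {
        int total = 0;
        for (int m : maxDocs) total += m;
        BoundaryBufferedDocSet docSet = new BoundaryBufferedDocSet(total);
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Leaf> leaves = new ArrayList<>();
            List<CompletableFuture<Void>> tasks = new ArrayList<>();
            int docBase = 0;
            for (int i = 0; i < maxDocs.length; i++) {
                Leaf leaf = docSet.new Leaf(docBase, maxDocs[i]);
                int[] leafHits = hits[i];
                leaves.add(leaf);
                tasks.add(CompletableFuture.runAsync(() -> {
                    for (int d : leafHits) leaf.collect(d);
                }, pool));
                docBase += maxDocs[i];
            }
            for (int i = 0; i < tasks.size(); i++) {
                tasks.get(i).join();      // wait for the leaf, then merge on this thread
                leaves.get(i).leafDone();
            }
            return docSet;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Segments of 42 and 20 docs as on the slide, plus an assumed third segment of 200 docs.
        BoundaryBufferedDocSet s = collectAll(
            new int[]{42, 20, 200},
            new int[][]{{0, 41}, {0, 19}, {0, 64, 100, 199}},
            3);
        System.out.println(s.cardinality());
    }
}
```

Any doc at least 64 positions away from both ends of its segment falls in a word that no other segment touches, which is why the interior writes can go straight to the shared array.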
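
Sketch for slide 27: with one priority queue per leaf, each leaf keeps its own top N and the queues are merged once at the end. The Java below is a simplified illustration, assuming scores only (no doc IDs, no tie-breaking) and java.util.PriorityQueue in place of Lucene's HitQueue. PerLeafTopN and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: hypothetical names, scores without doc IDs.
class PerLeafTopN {

    /** Offers a score to a min-heap that retains at most n of the largest scores. */
    static void offer(PriorityQueue<Float> heap, float score, int n) {
        if (heap.size() < n) {
            heap.add(score);
        } else if (score > heap.peek()) {
            heap.poll();   // evict the current smallest of the top n
            heap.add(score);
        }
    }

    /** Leaf-private queue: it only knows this leaf's threshold, so it keeps more hits than a global queue would. */
    static PriorityQueue<Float> topNForLeaf(float[] scores, int n) {
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float s : scores) offer(heap, s, n);
        return heap;
    }

    static List<Float> topN(List<float[]> leaves, int n, int parallelism) {
        if (n <= 0) return new ArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<PriorityQueue<Float>>> pending = new ArrayList<>();
            for (float[] leaf : leaves) {
                pending.add(CompletableFuture.supplyAsync(() -> topNForLeaf(leaf, n), pool));
            }
            // Merge step, run on the calling thread (the role of done()).
            PriorityQueue<Float> merged = new PriorityQueue<>();
            for (CompletableFuture<PriorityQueue<Float>> f : pending) {
                for (Float s : f.join()) offer(merged, s, n);
            }
            List<Float> result = new ArrayList<>(merged);
            result.sort(Comparator.reverseOrder());
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<float[]> leaves = Arrays.asList(
            new float[]{0.1f, 0.9f, 0.5f}, new float[]{0.7f, 0.3f}, new float[]{0.8f});
        System.out.println(topN(leaves, 3, 2));
    }
}
```

The extra memory and the merge pass are the costs the slide names. In this sketch up to n scores per leaf survive until the merge, where a single shared queue would have rejected many of them immediately.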