Elasticsearch @ ShopWiki 2014-03-20


Slides from the NY Elasticsearch Meetup on March 20, 2014. http://www.meetup.com/Elasticsearch-NY/events/170714812/ http://vimeo.com/90124531


What is ShopWiki?
• ShopWiki is the retail division of Oversee.net.
• We run a collection of retail websites,
including the Comparison Shopping Engines (CSEs):
– ShopWiki.com
– Compare.com

How do we use Elasticsearch?
• You know, for search (not logging).
• We index millions of products, offered from
hundreds of thousands of stores, and allow
users to search them.

Why Elasticsearch?
• ShopWiki was built using a proprietary search
server written in C++.
• Served us well for many years, but it needed
improvements, especially for non-English
language search.
• What about Lucene-based solutions?

Solr3
• We tried out Solr3 when building
CouponFinder.com.
• Solr worked well (for English & French), but
the coupon dataset is small in comparison to
our product dataset.
• The setup was simple master-slave replication.

How do we scale?
• To use Solr for our product data we needed to
shard the data across multiple machines.
• But, Solr3’s sharding capabilities were clunky
and difficult to use.
• Enter Elasticsearch!
• Designed to scale out-of-the-box.
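
The idea of sharding can be sketched as follows. This is a minimal illustration, not Elasticsearch's actual code: Elasticsearch assigns each document to a primary shard by hashing its routing value (the document ID by default) modulo the number of primary shards, but it uses its own hash function. `zlib.crc32` below is a stand-in, and the shard count of 5 and the product IDs are assumed values.

```python
import zlib

def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard for a document by hashing its ID.

    zlib.crc32 is a stand-in for Elasticsearch's internal hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# Hypothetical product IDs spread over an assumed 5 primary shards.
NUM_SHARDS = 5
product_ids = [f"product-{i}" for i in range(1000)]
shard_sizes = [0] * NUM_SHARDS
for pid in product_ids:
    shard_sizes[shard_for(pid, NUM_SHARDS)] += 1

print(shard_sizes)
```

Because the shard is derived from the ID alone, any node can work out where a document lives without a lookup table.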

Compare.com
• Compare.com was built using Elasticsearch
from the start.
• Allowed us to get up & running very quickly.
• Allowed us to scale up very quickly.
– 60 million products and growing.
• Allows us to iterate on new features quickly.

Other Languages
• ShopWiki search is being gradually ported to
Elasticsearch.
• Allows us to have better non-English search
right out-of-the-box.
– French
– German
– Dutch
– Spanish
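
Elasticsearch ships built-in analyzers named `french`, `german`, `dutch` and `spanish`. The sketch below shows how one might be selected per index, assuming Elasticsearch 1.x mapping syntax. The slides do not show ShopWiki's mappings, so the `product` type and `title` field are hypothetical names.

```python
import json

# Built-in Elasticsearch language analyzers for the languages on the slide.
ANALYZERS = {
    "French": "french",
    "German": "german",
    "Dutch": "dutch",
    "Spanish": "spanish",
}

def index_body(language: str) -> dict:
    """Build an index-creation body that analyzes `title` in one language.

    The `product` type and `title` field are hypothetical names.
    """
    return {
        "mappings": {
            "product": {
                "properties": {
                    "title": {"type": "string", "analyzer": ANALYZERS[language]}
                }
            }
        }
    }

print(json.dumps(index_body("German"), indent=2))
```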

Our Elasticsearch Cluster
• 12 indices, one for each website.
• 3 replicas per shard.
• 3 master nodes (quorum of 2).
• 6 data nodes.
• Plan to add more data nodes as we proceed with
our migration of ShopWiki (500m products).
• Expect to need less hardware than the C++
cluster (uses 50+ machines).
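
The arithmetic behind the numbers above can be sketched as follows. It assumes "quorum" means a simple majority of master-eligible nodes (the value usually given to `discovery.zen.minimum_master_nodes` in Elasticsearch 1.x) and that "replicas" is meant in the Elasticsearch sense of copies in addition to the primary.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

def copies_per_shard(replicas: int) -> int:
    """One primary plus its replicas."""
    return 1 + replicas

print(quorum(3))            # 2
print(copies_per_shard(3))  # 4
```

With a quorum of 2 out of 3 master nodes, the cluster can lose one master node and still elect a master, while an isolated single node cannot elect itself.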

Realtime Updates
• C++ search servers need to have the entire
dataset re-indexed and swapped out all at
once.
• Could only do this once a day, at night (affects
performance).
• With Elasticsearch, we can update our data all
the time (it’s not even a limiting factor).
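
The slides do not show how updates are sent. As one possible sketch, changed products could be pushed through the bulk API, whose request body is newline-delimited JSON: an action line followed by the document itself. The index name `compare`, the type `product` and the product fields below are hypothetical, and the `_type` key assumes Elasticsearch 1.x.

```python
import json

def bulk_body(index: str, doc_type: str, products: list) -> str:
    """Serialize products as newline-delimited JSON for the bulk API.

    Each product becomes an action line followed by a source line.
    """
    lines = []
    for product in products:
        action = {"index": {"_index": index, "_type": doc_type, "_id": product["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(product))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline

# Only the products that changed are sent, not the whole dataset.
changed = [
    {"id": "p1", "title": "Glass mixing bowl", "price": 12.99},
    {"id": "p2", "title": "Steel mixing bowl", "price": 18.50},
]
body = bulk_body("compare", "product", changed)
print(body)
```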

Top-N Faceting
• The solution in Solr is to limit facets to the
top-N results.
• Elasticsearch doesn’t have this feature (as
mentioned at last Meetup).
• Solution: TermsStatsFacet (AKA aggregations in 1.0)
• Allows us to get the brands/stores with the
most relevant results.
• E.g. Σ(score^n); n allows us to tune facet results to our liking

TermsStatsFacet for Brands
Query: “mixing bowl”
[Figure: brand facets ranked by Σ(score^n), shown for n = 0 (same as count) and n = 4.]
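
The effect of the exponent can be sketched in a few lines. This is a client-side illustration of the Σ(score^n) ranking, not the facet request itself. The brand names and scores are made up.

```python
from collections import defaultdict

def rank_brands(hits, n):
    """Rank brands by the sum of score**n over their matching products."""
    totals = defaultdict(float)
    for brand, score in hits:
        totals[brand] += score ** n
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Made-up (brand, relevance score) pairs for one query.
hits = [
    ("BrandA", 0.9), ("BrandA", 0.8),
    ("BrandB", 0.2), ("BrandB", 0.2), ("BrandB", 0.2), ("BrandB", 0.1),
]

by_count = rank_brands(hits, 0)      # n = 0: every hit contributes 1
by_relevance = rank_brands(hits, 4)  # n = 4: strong hits dominate
print(by_count[0][0], by_relevance[0][0])  # BrandB BrandA
```

With n = 0 the brand with many weak matches wins on count alone. Raising n shifts the ranking toward the brand whose products match the query strongly.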

De-duping Products
• Use “more_like_this” query to find similar products.
• If result’s score is “high enough”, it’s likely the
same product from a different store.
• “High enough” is defined as a fraction of the
identity match’s score.
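
The thresholding step might look like the sketch below. It assumes the results of the similarity query are already available as (product ID, score) pairs, with the product's match against itself (the identity match) ranked first. The fraction of 0.8 is an assumed value, since the slides do not give the real one.

```python
def likely_duplicates(hits, fraction=0.8):
    """Return IDs of hits scoring at least `fraction` of the identity match.

    `hits` is a list of (product_id, score) pairs sorted by descending score,
    with the product's match against itself (the identity match) first.
    """
    if not hits:
        return []
    _, identity_score = hits[0]
    threshold = fraction * identity_score
    return [pid for pid, score in hits[1:] if score >= threshold]

# Made-up results for product p1; p1 itself is the identity match.
hits = [("p1", 10.0), ("p7", 9.1), ("p3", 8.5), ("p9", 2.5)]
print(likely_duplicates(hits))  # ['p7', 'p3']
```

Expressing the cutoff relative to the identity match keeps it comparable across products, since raw scores vary from one query to the next.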

• Questions?
• Rob Stewart
• Lead Software Engineer
• rstewart@shopwiki.com



12. Elasticsearch Head


14. Challenges
• Use TermsFacet to suggest filters to the user.
• E.g. filter by stores or brands.
• Using the 10 most frequent brands from a search
can produce bad results.
– A single brand may have lots of products that
are all weakly relevant.


Editor's Notes

Similar functionality. Different business models (SEO vs SEM). ShopWiki.com was first.
Long tail shopping.
CouponFinder.com is a coupon search website.
Compare.com launched in September 2012.
shopwiki.com, shopwiki.co.uk, shopwiki.fr, shopwiki.de, shopwiki.nl, shopwiki.es
