Elasticsearch
@ ShopWiki
What is ShopWiki?
• ShopWiki is the retail division of Oversee.net.
• We run a collection of retail websites,
• Including the Comparison Shopping Engines (CSE)
– ShopWiki.com
– Compare.com
How do we use Elasticsearch?
• You know, for search (not logging).
• We index millions of products, offered from
hundreds of thousands of stores, and allow
users to search them.
Why Elasticsearch?
• ShopWiki was built using a proprietary search
server written in C++.
• Served us well for many years, but it needed
improvements, especially for non-English
language search.
• What about Lucene-based solutions?
Solr3
• We tried out Solr3 when building
CouponFinder.com.
• Solr worked well (for English & French), but
the coupon dataset is small in comparison to
our product dataset.
• The setup was simple master-slave replication.
How do we scale?
• To use Solr for our product data we needed to
shard the data across multiple machines.
• But, Solr3’s sharding capabilities were clunky
and difficult to use.
• Enter Elasticsearch!
• Designed to scale out-of-the-box.
Compare.com
• Compare.com was built using Elasticsearch
from the start.
• Allowed us to get up & running very quickly.
• Allowed us to scale up very quickly.
– 60 million products and growing.
• Allows us to iterate on new features quickly.
Other Languages
• ShopWiki search is being gradually ported to
Elasticsearch.
• Allows us to have better non-English search
right out-of-the-box.
– French
– German
– Dutch
– Spanish
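As a rough illustration of what "out-of-the-box" language support can look like, the sketch below builds a create-index body that assigns one of Elasticsearch's built-in language analyzers (`french`, `german`, `dutch`, `spanish`) to a product title field. The site codes, the `product` type, and the `title` field are assumptions for illustration, not ShopWiki's actual schema. The `string` field type matches the 1.x-era releases current at the time of the talk; later versions use `text`.

```python
import json

# Built-in Elasticsearch language analyzers, keyed by a hypothetical site code.
ANALYZERS = {"fr": "french", "de": "german", "nl": "dutch", "es": "spanish"}


def index_body(site: str) -> dict:
    """Build a create-index body that analyzes product titles per language."""
    return {
        "mappings": {
            "product": {  # hypothetical document type
                "properties": {
                    # hypothetical field; "string" is the 1.x-era text type
                    "title": {"type": "string", "analyzer": ANALYZERS[site]}
                }
            }
        }
    }


print(json.dumps(index_body("fr"), indent=2))
```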
Our Elasticsearch Cluster
• 12 indices, one for each website.
• 3 replicas per shard.
• 3 master nodes (quorum of 2).
• 6 data nodes.
• Plan to add more data nodes as we proceed with
our migration of ShopWiki (500m products).
• Expect to need less hardware than the C++
cluster (uses 50+ machines).
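The quorum figure follows from simple majority arithmetic, sketched below together with the settings it would typically feed into. `discovery.zen.minimum_master_nodes` and `number_of_replicas` are real settings in the 1.x-era releases, but how ShopWiki actually configured them is an assumption here. In particular, the slide does not say whether "3 replicas" counts the primary; this sketch assumes three copies in addition to the primary.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of master-eligible nodes."""
    return master_eligible // 2 + 1


# Node-level setting that guards against split brain (1.x-era zen discovery).
node_settings = {"discovery.zen.minimum_master_nodes": quorum(3)}

# Index-level setting; assumes "3 replicas" means three copies besides the primary.
index_settings = {"settings": {"number_of_replicas": 3}}

print(node_settings)
print(index_settings)
```

With three master-eligible nodes, any two that can see each other may elect a master, so the cluster tolerates the loss of one master node.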
Elasticsearch Head
Realtime Updates
• C++ search servers need to have the entire
dataset re-indexed and swapped out all at
once.
• Could only do this once a day, at night (affects
performance).
• With Elasticsearch, we can update our data all
the time (it’s not even a limiting factor).
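One common way to stream continuous updates into Elasticsearch is the bulk API, which takes newline-delimited JSON: an action line followed by the document source. The sketch below only builds such a request body; the index name, type name, and product fields are hypothetical, and the deck does not say which indexing API ShopWiki used.

```python
import json


def bulk_body(index: str, doc_type: str, products: list) -> str:
    """Build a newline-delimited bulk body that (re)indexes each product."""
    lines = []
    for product in products:
        # Action line: index (create or overwrite) the document with this id.
        action = {"index": {"_index": index, "_type": doc_type, "_id": product["id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(product))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = bulk_body("compare", "product", [{"id": "1", "title": "Mixing bowl"}])
print(body)
```

Because each document is overwritten individually, changed products can be sent as they arrive instead of rebuilding and swapping the whole dataset.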
Challenges
• Use TermsFacet to suggest filters to the user.
• E.g. filter by stores or brands.
• Using the 10 most frequent brands from a
search can produce bad results.
– A single brand may have lots of products that are
all weakly relevant.
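For reference, a terms facet request of the kind described above might look like the following sketch, using the facet syntax of the releases current at the time (later versions replaced facets with aggregations). The `title` and `brand` field names are assumptions. The facet counts every matching product equally, which is why a brand with many weakly relevant matches can dominate the list.

```python
def brand_facet_request(text: str, size: int = 10) -> dict:
    """Search body that asks for the `size` most frequent brands among all matches."""
    return {
        "query": {"match": {"title": text}},  # hypothetical field name
        "facets": {
            "brands": {"terms": {"field": "brand", "size": size}}  # hypothetical field name
        },
    }


print(brand_facet_request("mixing bowl"))
```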
Top-N Faceting
• The solution in Solr is to limit facets to the
top-N results.
• Elasticsearch doesn’t have this feature (as
mentioned at last Meetup).
• Solution: TermsStatsFacet (AKA aggregations in 1.0)
• Allows us to get the brands/stores with the
most relevant results.
• E.g. Σ(score^n), where n allows us to tune facet results to our liking
TermsStatsFacet for Brands
Query: “mixing bowl”
[Figure: brand facets ranked by Σ(score^n), for n = 0 (same as count) and n = 4]
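The effect of the exponent can be seen with a small client-side simulation. This is not the TermsStatsFacet itself, only a sketch of the same arithmetic in plain Python, with made-up brands and scores: each brand is ranked by the sum of its hits' scores raised to the power n.

```python
from collections import defaultdict


def rank_brands(hits, n, top=10):
    """Rank brands by the sum of score**n over their hits (n = 0 is a plain count)."""
    totals = defaultdict(float)
    for brand, score in hits:
        totals[brand] += score ** n
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical data: BrandA has many weak matches, BrandB a few strong ones.
hits = [("BrandA", 0.2)] * 5 + [("BrandB", 0.9)] * 2

print(rank_brands(hits, n=0))  # BrandA ranks first, by count alone
print(rank_brands(hits, n=4))  # BrandB ranks first, strong matches dominate
```

Raising n shrinks the contribution of low scores much faster than that of high scores, so brands with a few highly relevant products overtake brands with many weak ones.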
De-duping Products
• Use “more_like_this” query to find similar
products.
• If result’s score is “high enough”, it’s likely the
same product from a different store.
• “High enough” is defined as a fraction of the
identity match’s score.
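The thresholding step above can be sketched as follows. The `more_like_this` body uses the 1.x-era `like_text` parameter; the field names, the 0.8 fraction, and the assumption that the source product appears in its own result list (providing the identity score) are all illustrative rather than taken from ShopWiki's implementation.

```python
def mlt_query(text: str, fields: list) -> dict:
    """Search body for a more_like_this query (1.x-era `like_text` syntax)."""
    return {
        "query": {
            "more_like_this": {"fields": fields, "like_text": text, "min_term_freq": 1}
        }
    }


def likely_duplicates(hits, source_id, fraction=0.8):
    """Return ids whose score is at least `fraction` of the source product's own score.

    `hits` is a list of (doc_id, score) pairs and must include the source product.
    """
    identity = next(score for doc_id, score in hits if doc_id == source_id)
    threshold = fraction * identity
    return [doc_id for doc_id, score in hits if doc_id != source_id and score >= threshold]


# Hypothetical result list: "a" is the source product matching itself.
hits = [("a", 10.0), ("b", 9.0), ("c", 3.0)]
print(likely_duplicates(hits, "a"))  # "b" clears the 8.0 threshold, "c" does not
```

Expressing the cutoff relative to the identity score keeps it comparable across products, since raw scores vary with the length and vocabulary of each product's text.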
• Questions?
• Rob Stewart
• Lead Software Engineer
• rstewart@shopwiki.com

Elasticsearch @ ShopWiki 2014-03-20

Editor's Notes

  1. Similar functionality. Different business models (SEO vs SEM). ShopWiki.com was first.
  2. Long tail shopping.
  3. CouponFinder.com is a coupon search website.
  4. Compare.com launched September 2012.
  5. shopwiki.com, shopwiki.co.uk, shopwiki.fr, shopwiki.de, shopwiki.nl, shopwiki.es