Vlassios Rizopoulos
Chief Technology Officer @ pricesearcher.com
What a search engine can teach you about product
sitemaps
@Pricesearcher #BrightonSEO
BACKGROUND
Pricesearcher is a vertical search
engine focusing on products and their
prices.
Our mission is to provide access to all
the world’s prices in one place.
OUR MISSION IS TO INDEX ALL THE WORLD’S PRICES
SOURCES OF DATA
• Product feeds from 5000+ retailers
• Developed plugins
• Developed PriceBot to complete the picture
PROGRESS TO DATE
• Gathered data on 1.1 billion products
• Online in 11 countries: GB / US / DE / FR / IT / IE / NO / SE / FI / DK / NG
• Gathered 91 billion price points for our products
• On average we check the price of a product 3 times a day
We have gathered:
• 17,000,000 ISBNs
• 144,000,000 MPNs
• 73,000,000 SKUs
• 157,000,000 GTINs
WHAT IS PRICEBOT?
PriceBot is our proprietary crawler, built to discover products and turn unstructured data
from web pages into structured data for our product database
Pricesearcher is the only product search engine that crawls to complement its product
coverage
PriceBot is fully robots.txt compliant, leaves behind a footprint in its user agent and has a
built-in feedback mechanism
http://www.pricesearcher.com/pricebot
WHAT INFORMATION IS PRICEBOT COLLECTING?
We are looking to extract the following fields:
• Product Title
• Product Image
• Product Price
and optionally:
• Product Description
• Product Identifier (GTIN/UPC/EAN/ISBN)
• Product Brand
• Product Category
• Product Stock Availability
INITIAL CRAWLING TECH DEPENDED ON SITEMAPS
Sitemaps vastly simplified discovering all the products from retailers
DATA SAMPLE
We will focus on 4000 UK retailers that we currently crawl using XML sitemaps,
discovering 20 million+ products
TOP 10 DATA INSIGHTS FROM OUR CRAWLING TECH
1. SITEMAP DATA
• 91% of retailer websites have an XML sitemap
• 61% of retailer websites have one with product links
• 54% of retailer websites have one that’s regularly updated
2. BLOCKING OF CRAWLERS
• 2% of retailer websites have blocked us unintentionally (generic robots.txt entry or 403 automatic block)
• 0.05% of retailer websites have blocked us intentionally (robots.txt entry)
3. EXTRACTION USING METADATA STANDARDS
• 41% of retailer websites have product title + price + image defined using meta / opengraph tags
• 36% of retailer websites have product title + price + image defined using meta / itemprop tags (schema)
• 12% of retailer websites have product title + price + image defined using both
4. EXTRACTION USING JAVASCRIPT
• 2% of retailer websites: no info extracted due to heavy rendering being uneconomical
• 1% of retailer websites: price cannot be extracted as it is converted / calculated on the fly
5. SITEMAP LINKS
• 2% of retailer websites have multiple links to the same product pages
• 3% of retailer websites have multiple links to pages that return 404 codes
6. PRODUCT IDENTIFIERS
• 24% of retailer websites provide a GTIN-14, EAN-13, UPC-12/8 for their products
• 7% of retailer websites provide an SKU for their products
• 3% of retailer websites provide an ISBN for their products
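The GTIN family of identifiers listed above shares one check-digit scheme, so an extracted value can be sanity-checked before it is stored. The following is a minimal Python sketch, assuming the standard GS1 mod-10 check digit (which ISBN-13 also follows). The function name is illustrative, and nothing here describes how PriceBot itself validates identifiers.

```python
def is_valid_gtin(code: str) -> bool:
    """Return True if `code` is a GTIN-8/12/13/14 with a correct GS1 check digit."""
    # Only plain ASCII digits of a known GTIN length are accepted.
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    digits = [int(ch) for ch in code]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


if __name__ == "__main__":
    for candidate in ("4006381333931", "036000291452", "4006381333932"):
        print(candidate, is_valid_gtin(candidate))
```

A check like this only catches malformed or mistyped values; it cannot confirm that the identifier belongs to the product on the page.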
7. PRODUCT CATALOGUE SIZE
• 14% of retailer websites have fewer than 5,000 product links in their sitemap
• 79% of retailer websites have between 5,000 and 30,000 links
• 7% of retailer websites have more than 30,000 links
8. DATA RICHNESS #1
• 17% of retailer websites provide a brand for their products
• 44% of retailer websites provide a category for their products
• 62% of retailer websites provide a stock indicator for their products
9. DATA RICHNESS #2 – NUMBER OF DIMENSIONS
• Crawler: 6 dimensions
• Plugin: 12 dimensions
• Product feed: 23 dimensions
10. SITEMAP DISCOVERABILITY
• 33% of retailer websites list their sitemap in robots.txt
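Listing a sitemap in robots.txt takes a single `Sitemap:` line, and a crawler can read it back with Python's standard library. The sketch below is illustrative only: the robots.txt content and the `shop.example` URL are made up, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example shop; not taken from any real retailer.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/

Sitemap: https://shop.example/sitemap.xml
"""


def sitemaps_listed(robots_txt: str) -> list:
    """Return the sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_listed(ROBOTS_TXT))
```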
TOP 5 ACTION POINTS (SUGGESTIONS)
ACTION POINT #1 - SITEMAP
• Have an XML sitemap
• Have the path of your sitemap listed in robots.txt
• Have your product pages in your sitemap
• Regularly update your sitemap
• Don’t point to 404 pages from your sitemap
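Some of the points above can be checked offline before a sitemap is published. The following is a rough Python sketch, assuming a sitemap that follows the sitemaps.org 0.9 schema. It lists the `<loc>` entries and flags URLs that appear more than once. The sample document and helper names are hypothetical, and checking for 404s would additionally need an HTTP request per URL, which is left out here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap with one product page listed twice.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/product/1</loc><lastmod>2018-04-20</lastmod></url>
  <url><loc>https://shop.example/product/2</loc><lastmod>2018-04-21</lastmod></url>
  <url><loc>https://shop.example/product/1</loc><lastmod>2018-04-20</lastmod></url>
</urlset>"""


def sitemap_locs(xml_text: str) -> list:
    """Return every <loc> value found under <url> elements, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def duplicate_locs(locs: list) -> list:
    """Return the URLs that are listed more than once, sorted for stable output."""
    counts = Counter(locs)
    return sorted(url for url, n in counts.items() if n > 1)


if __name__ == "__main__":
    locs = sitemap_locs(SAMPLE_SITEMAP)
    print(len(locs), "links,", len(duplicate_locs(locs)), "duplicated")
```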
ACTION POINT #2 - META / OPENGRAPH / ITEMPROP
• Provide structured information on your products using meta
itemprop (schema) or opengraph tags
• Provide as much structured data as possible
• Implement them as close as possible to the standards
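To illustrate why these tags help a crawler, here is a small Python sketch that reads title, image and price from `<meta>` tags without rendering the page. It is an assumption-laden example: the property names (`og:title`, `og:image`, `product:price:amount`, and the schema.org `name` / `image` / `price` itemprops) are common conventions rather than a list of what PriceBot reads, and the sample page is invented.

```python
from html.parser import HTMLParser

# Candidate tag names per field, in order of preference (assumed, not from the slides).
FIELD_SOURCES = {
    "title": ("og:title", "name"),
    "image": ("og:image", "image"),
    "price": ("product:price:amount", "price"),
}

# Hypothetical product page mixing Open Graph and itemprop meta tags.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Blue Kettle 1.7L">
<meta property="og:image" content="https://shop.example/img/kettle.jpg">
<meta itemprop="price" content="24.99">
<meta itemprop="priceCurrency" content="GBP">
</head><body>...</body></html>
"""


class ProductMetaParser(HTMLParser):
    """Collects the content of <meta> tags keyed by their property or itemprop."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or attr.get("itemprop")
        content = attr.get("content")
        if key and content:
            # Keep the first value seen for each key.
            self.fields.setdefault(key, content)


def extract_product(html: str) -> dict:
    """Return title, image and price from meta tags; missing fields are None."""
    parser = ProductMetaParser()
    parser.feed(html)
    return {
        field: next((parser.fields[k] for k in keys if k in parser.fields), None)
        for field, keys in FIELD_SOURCES.items()
    }


if __name__ == "__main__":
    print(extract_product(SAMPLE_PAGE))
```

Because the values sit in static markup, this kind of extraction works even when the visible page is built by JavaScript.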
ACTION POINT #3 – JAVASCRIPT & PRICE
• Be wary of the side effects of a JavaScript-heavy site on crawling
• If you do implement a JavaScript-heavy site, meta tags with
structured information are even more important!
• Be wary when converting the price based on geolocation
• Don’t perform the price conversion in JavaScript
ACTION POINT #4 - ANTI-CRAWL & ROBOTS.TXT
• Ask yourselves what’s the benefit of an anti-crawl mechanism
• Ask yourselves what’s the benefit of blocking all crawlers in
robots.txt
• Control the speed of crawlers using crawl-delay
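A quick way to review these settings is to look at your robots.txt the way a compliant crawler would. The Python sketch below uses the standard library's `urllib.robotparser` to report whether a crawler may fetch a product URL, which crawl-delay applies, and roughly how long one pass over the catalogue would take at that delay. The user-agent token `PriceBot`, the robots.txt bodies, the 60-second delay and the 30,000-URL catalogue are all assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies: one with a large crawl-delay, one blocking everyone.
SLOW_ROBOTS = "User-agent: *\nCrawl-delay: 60\n"
BLOCK_ALL_ROBOTS = "User-agent: *\nDisallow: /\n"
SAMPLE_URL = "https://shop.example/product/1"


def crawl_outlook(robots_txt: str, user_agent: str, sample_url: str, url_count: int) -> dict:
    """Summarise what a robots.txt means for one crawler and a catalogue of url_count pages."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, sample_url)
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    # One request per URL, spaced by the crawl-delay, converted from seconds to days.
    days = (delay * url_count / 86_400) if (allowed and delay) else None
    return {"allowed": allowed, "crawl_delay": delay, "days_for_one_pass": days}


if __name__ == "__main__":
    print(crawl_outlook(SLOW_ROBOTS, "PriceBot", SAMPLE_URL, 30_000))
    print(crawl_outlook(BLOCK_ALL_ROBOTS, "PriceBot", SAMPLE_URL, 30_000))
```

With these made-up numbers a single pass takes about three weeks, so a very high crawl-delay can act as an unintended block.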
ACTION POINT #5 - HAVE A SITEMAP MEETING
• Have a sitemap strategy; it’s just as important as your SEO strategy
• Sitemaps contribute massively to discoverability, yet are often overlooked
• Make sure you are doing everything you can to provide structured information
• Review your robots.txt contents
• Address missed opportunities from your sitemap sooner rather than later
THANKS FOR LISTENING!
PriceBot
http://www.pricesearcher.com/pricebot
Keen to hear from you with feedback about PriceBot or Pricesearcher in general.
Feel free to drop me a line at vlassios@pricesearcher.com or catch up with me at
our stand B11 in the expo hall

What a search engine can teach you about product sitemaps - BrightonSEO April 2018


Editor's notes

  1. Unintentional blocks:
     • Crawl-delay is so high that it would take weeks to crawl a single site
     • All user-agents are blocked in robots.txt
     • An automated anti-crawl system kicks in and starts serving 403s