Checking Google Index status at scale with Node.js
Jose Luis Hernando
@jlhernando #BrightonSEO
Senior Technical SEO Consultant
Today’s agenda
1. Why it’s important to know your website’s indexing status
2. The challenge of extracting this data
3. Getting the data with Node.js – Live Demo!
4. Using this data for your SEO strategy
Why is it important?
Reason #1
Not in the Index => Not in the SERPs
Icons from Google, Flaticon & Sitecheckerpro
Why is it important?
Reason #2
Google evaluates site quality based on indexed pages
Low Quality Pages
• Uncontrolled Faceted Navigation URLs
• Unsupervised User Generated Content
• Indexable Non-Canonical URLs
High Quality Pages
• Category Pages
• Editorial Pages
• Canonical Product Pages
Sources:
• Google Only Can Judge Site Quality Based On Pages They Index – Barry Schwartz (Search Engine Roundtable)
• English Google Webmaster Central office-hours hangout – Google Webmasters YouTube Channel
Why is it important?
Reason #3
Inefficient use of Google’s resources
https://website.com/category-one/
HTML CSS JS
/category-one/?color=red
/category-one/?color=blue
/category-one/?color=red&blue
…
∞
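To make the "∞" concrete, here is a minimal sketch of how quickly filter combinations multiply. The facet names and values are hypothetical, and it assumes every facet value can be toggled independently (as in `?color=red&blue`); real faceted navigation may behave differently.

```javascript
// Hypothetical facets for a single category page; names and values are made up.
const facets = { color: ['red', 'blue'] };

// If each facet value can be switched on or off independently, a facet with
// n values yields 2^n selections (including "nothing selected").
function countFacetUrls(facetMap) {
  return Object.values(facetMap)
    .reduce((total, values) => total * 2 ** values.length, 1);
}

console.log(countFacetUrls(facets)); // 4: the base page plus three filtered variants
```

Under the same assumption, three facets with ten values each would already give 2^30 (over a billion) crawlable URLs for one category template.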
How much of your site is Googlebot crawling?
Site size    Avg. Crawl Ratio (%)    Avg. Active Ratio (%)
1-10k        71.7                    45.3
10k-100k     54.3                    30.2
100k-1M      41.7                    15.1
1M+          34.4                    10.1
Crawl Ratio: percentage of pages crawled by Google in 30 days.
Active Ratio: percentage of pages that have generated at least one organic visit in 30 days.
Source: How Does Google Crawl the Web? – Annabelle Bouard & Dimitri Brunel (Botify)
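The two ratios can be computed for your own site from three URL lists. This is a sketch under assumptions: the function name and the sample paths are invented, and the inputs are taken to be a full crawl of the site, Googlebot requests from 30 days of logs, and landing pages with organic visits over the same period.

```javascript
// Crawl ratio: share of known pages that Googlebot requested in the window.
// Active ratio: share of known pages with at least one organic visit.
function ratios(allUrls, crawledUrls, activeUrls) {
  const known = new Set(allUrls);
  const share = (urls) => {
    // Count only URLs that belong to the known site structure, once each.
    const hits = new Set(urls.filter((u) => known.has(u)));
    return known.size === 0 ? 0 : (hits.size / known.size) * 100;
  };
  return { crawlRatio: share(crawledUrls), activeRatio: share(activeUrls) };
}

// Made-up sample data.
const site = ['/a', '/b', '/c', '/d'];
const result = ratios(site, ['/a', '/b', '/c'], ['/a']);
console.log(result); // { crawlRatio: 75, activeRatio: 25 }
```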
The challenge: extracting this data
• Googlebot’s crawling behaviour doesn’t determine indexing status
• You rely on partial and sometimes inaccurate data points:
  • site: & inurl: operators
  • GSC Indexing reports:
    • URL Inspection Tool (< 200 URLs / day)
    • Coverage Reports (< 1,000 rows / report)
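A quick back-of-the-envelope calculation shows why the daily cap matters at scale. The sketch below takes the 200 URLs per day ceiling from the list above and, as an example input, the 111,772-URL sitemap used in the first use case of this deck; the function name is illustrative.

```javascript
// Days needed to inspect every URL when a tool caps the number of daily checks.
function daysToCheck(urlCount, dailyLimit) {
  return Math.ceil(urlCount / dailyLimit);
}

console.log(daysToCheck(111772, 200)); // 559
```

Since the stated limit is fewer than 200 URLs per day, 559 days is a lower bound for a single pass over that sitemap.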
Proxy metrics != Accurate data
If you can’t find it, build it
{Live demo}
bit.ly/google-index-checker-script
Quick FYI
Using the following method goes against Google’s Terms of Service, as it sends automated search queries to Google Search
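The demo script itself is not reproduced in these slides. As a rough illustration of what "automated search queries" means here, the sketch below only builds a `site:` query URL for one page and splits a URL list into batches. The function names and the batching approach are assumptions, not the script's actual code, and the Terms of Service caveat above applies to anything that sends such queries.

```javascript
// Build a Google Search URL that asks whether a single page is in the index.
// The site: operator and the /search?q= endpoint are standard; the function
// names below are illustrative only.
function buildIndexQuery(pageUrl) {
  return 'https://www.google.com/search?q=' + encodeURIComponent('site:' + pageUrl);
}

// Split a long URL list into fixed-size batches so requests can be throttled.
function toBatches(urls, size) {
  const batches = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}

console.log(buildIndexQuery('https://website.com/category-one/?color=red'));
console.log(toBatches(['/a', '/b', '/c', '/d', '/e'], 2).length); // 3
```

Encoding the whole query matters because characters such as `?`, `&` and `=` in the page URL would otherwise be read as parameters of the search URL.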
Our script outperforms every other method available
How can you use Google index data?
Identify inefficient use of crawl budget
Error Prioritisation
Identify holes in your architecture
Check for pages from your site that should be indexed but are not.
Find pages that should not be indexed but are indexed.
Detect pages that used to exist and now return an error (4xx) but are still indexed.
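The three checks above can be expressed as one small classifier over crawl data joined with index status. This is a sketch: the field names (`statusCode`, `indexed`, `shouldBeIndexed`) and the bucket labels are hypothetical, and `shouldBeIndexed` is assumed to come from your own rules (canonical, in sitemap, not noindex, and so on).

```javascript
// Sort one URL record into the bucket that describes its indexing problem.
function classify({ statusCode, indexed, shouldBeIndexed }) {
  // Pages that now return a client error but are still in the index.
  if (statusCode >= 400 && statusCode < 500 && indexed) return 'error-still-indexed';
  // Pages you want in the index that are missing from it.
  if (shouldBeIndexed && !indexed) return 'missing-from-index';
  // Pages in the index that you never wanted there.
  if (!shouldBeIndexed && indexed) return 'indexed-but-unwanted';
  return 'ok';
}

// Made-up sample rows.
const rows = [
  { url: '/category-one/', statusCode: 200, indexed: false, shouldBeIndexed: true },
  { url: '/category-one/?color=red', statusCode: 200, indexed: true, shouldBeIndexed: false },
  { url: '/old-product/', statusCode: 404, indexed: true, shouldBeIndexed: false },
];
const labels = rows.map((row) => classify(row));
console.log(labels);
```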
Use case #1
Sitemap Health Check
How many URLs from your XML sitemap are indexed?
Sitemaps = 111,772 URLs
Google Index Status of URLs from Sitemap
Status code        URLs     Indexed   Not Indexed   % Indexed
200 (2xx)        81,688      74,223         7,465         91%
404 (4xx)        29,969       6,268        23,701         21%
301 (3xx)           365          16           349          4%
Inspired by Data Secrets of the Index Coverage Report – AJ Kohn
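The indexed share per status-code group follows directly from the Indexed and Not Indexed counts in the charts. A minimal sketch, using the counts from this use case; the function and field names are illustrative.

```javascript
// Indexed / not indexed counts per status-code group, taken from the charts.
const buckets = [
  { status: '2xx', indexed: 74223, notIndexed: 7465 },
  { status: '4xx', indexed: 6268, notIndexed: 23701 },
  { status: '3xx', indexed: 16, notIndexed: 349 },
];

// Percentage of a group's URLs that are indexed, rounded to one decimal place.
function indexedShare({ indexed, notIndexed }) {
  const total = indexed + notIndexed;
  return total === 0 ? 0 : Math.round((indexed / total) * 1000) / 10;
}

for (const bucket of buckets) {
  console.log(bucket.status, indexedShare(bucket) + '%');
}
```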
Sitemap Health Check
Next Steps
1) Identify if these URLs are important to your site’s bottom line
2) Check whether a pool of these URLs has issues in GSC’s Index Coverage Report
3) Choose a tactic to improve the visibility of these URLs
4) Isolate the relevant URLs and modify the existing sitemap or create a new-sitemap.xml to monitor progress
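Step 4 can be sketched as a small generator that turns the isolated URLs into a sitemap file body. The element names and namespace follow the sitemaps.org protocol; the function names and sample URLs are illustrative, and writing the string to `new-sitemap.xml` is left to the caller.

```javascript
// Escape the five characters that are not allowed raw inside XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a minimal sitemap document from a list of absolute URLs.
function buildSitemap(urls) {
  const entries = urls.map((u) => `  <url><loc>${escapeXml(u)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

// Made-up sample URLs.
const xml = buildSitemap([
  'https://website.com/category-one/',
  'https://website.com/search?q=a&page=2',
]);
console.log(xml);
```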
Use case #2
Log File Analysis+
How many URLs with Googlebot hits are indexed?
• ~160k Googlebot hits to non-canonical URLs (/Uppercase/ vs /lowercase/)
• Identified whether the non-canonical URLs were indexed
• Identified whether the referenced canonical URLs were indexed
Chart: Indexed Non-Canonical URLs Requested by Googlebot – 35.8% / 64.2% (legend: Indexed / Not Indexed)
Undisclosed Client
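A sketch of how the uppercase variants could be pulled out of log data and matched against index status. It assumes the lowercase path is the canonical one (as the /Uppercase/ vs /lowercase/ example suggests) and that the index status comes from a separate check; the function names and sample paths are made up.

```javascript
// Map every requested path containing uppercase letters to its lowercase form.
function findCaseVariants(requestedPaths) {
  const variants = new Map();
  for (const path of requestedPaths) {
    const canonical = path.toLowerCase();
    if (path !== canonical) variants.set(path, canonical);
  }
  return variants;
}

// Percentage of the non-canonical variants that appear in the indexed list.
function indexedShareOfVariants(variants, indexedPaths) {
  const indexed = new Set(indexedPaths);
  let count = 0;
  for (const path of variants.keys()) {
    if (indexed.has(path)) count += 1;
  }
  return variants.size === 0 ? 0 : (count / variants.size) * 100;
}

// Made-up Googlebot hits and index results.
const hits = ['/Shoes/', '/shoes/', '/Bags/Leather/', '/bags/leather/'];
const variants = findCaseVariants(hits);
console.log(indexedShareOfVariants(variants, ['/Shoes/', '/shoes/'])); // 50
```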
Log File Analysis+
Next Steps
1) Identify if the canonical tag is correctly placed
2) Identify if the root cause is internal linking, external linking or other
3) Consider redirecting non-canonical URLs to canonical URLs
4) Create a new-sitemap.xml with the problematic URLs to encourage Googlebot to revisit them and to monitor progress
Other use cases
Inform your SEO strategy
• Check Real-time indexing (News sites, Offer sites, Job Boards)
• Check uncontrolled faceted navigation (Crawl budget optimisation)
• Check inactive product/category URLs (Site architecture improvements)
• Check old 4xx that are live now & haven't been deindexed yet (Recover organic opportunities)
Further reading
https://bit.ly/google-index-checks
https://bit.ly/gsc-index-coverage
The Google Index Checker script opens the door to useful, actionable data at scale for your sites.
Use it, and act on it.
Thank you.
builtvisible.com
Jose Luis Hernando
Senior Technical SEO Consultant
@jlhernando
Sources & additional reading
How Does Google Crawl the Web? – Annabelle Bouard & Dimitri Brunel (Botify)
English Google Webmaster Central office-hours hangout – Google Webmasters YouTube Channel
Google Only Can Judge Site Quality Based On Pages They Index – Barry Schwartz (Search Engine Roundtable)
Data Secrets of the Index Coverage Report – AJ Kohn (Blind Five Year Old)
How Google Search Works – Google Documentation
How Search organises information – Google Documentation
Our new search index: Caffeine – Carrie Grimes
When indexing goes wrong: how Google Search recovered from indexing issues & lessons learned since – Vincent Courson (Google Search Outreach)
How Search Engines Work: Crawling, Indexing & Ranking – Moz
(Please) Stop Using Unsafe Characters in URLs – Jeff Starr

Checking Google Index Status at Scale using Node.js - Jose Hernando - BrightonSEO Oct 2020

  • 1. Checking Google Index status at scale with Node.js. Jose Luis Hernando (@jlhernando), Senior Technical SEO Consultant. #BrightonSEO
  • 2. Today’s agenda: 1. Why it’s important to know your website’s indexing status 2. The challenge of extracting this data 3. Getting the data with Node.js (live demo) 4. Using this data for your SEO strategy
  • 3. Why is it important? Reason #1: Not in the index => not in the SERPs. (Icons from Google, Flaticon & Sitecheckerpro)
  • 4. Why is it important? Reason #2: Google evaluates site quality based on indexed pages. High-quality pages (category pages, editorial pages, canonical product pages) + low-quality pages (uncontrolled faceted navigation URLs, unsupervised user-generated content, indexable non-canonical URLs). Sources: "Google Only Can Judge Site Quality Based On Pages They Index", Barry Schwartz (Search Engine Roundtable); English Google Webmaster Central office-hours hangout, Google Webmasters YouTube channel.
  • 5. Why is it important? Reason #3: Inefficient use of Google’s resources. https://website.com/category-one/ (HTML, CSS, JS) spawns /category-one/?color=red, /category-one/?color=blue, /category-one/?color=red&blue, … ∞
  • 6. How much of your site is Googlebot crawling? Avg. crawl ratio / avg. active ratio by site size: 1-10k URLs: 71.7% / 45.3%; 10k-100k: 54.3% / 30.2%; 100k-1M: 41.7% / 15.1%; 1M+: 34.4% / 10.1%. Crawl ratio: percentage of pages crawled by Google in 30 days. Active ratio: percentage of pages that have generated at least one organic visit in 30 days. Source: "How Does Google Crawl the Web?", Annabelle Bouard & Dimitri Brunel (Botify).
  • 7. The challenge: extracting this data. Googlebot’s crawling behaviour doesn’t determine indexing status.
  • 8. The challenge: extracting this data. Googlebot’s crawling behaviour doesn’t determine indexing status. You rely on partial and sometimes inaccurate data points: site: & inurl: operators; GSC indexing reports: URL Inspection Tool (< 200 URLs/day) and Coverage reports (< 1,000 rows/report).
  • 9. Proxy metrics != accurate data
  • 10. If you can’t find it, build it
  • 11. {Live demo} bit.ly/google-index-checker-script
  • 12. Quick FYI: using the following method goes against Google’s Terms of Service, as it sends automated search queries to Google Search.
  • 13. Our script outperforms every other method available
  • 14. How can you use Google index data? Identify inefficient use of crawl budget; error prioritisation; identify holes in your architecture. Check for pages from your site that should be indexed but are not. Find pages that should not be indexed but are indexed. Detect pages that used to exist and now return an error (4xx) but are still indexed.
  • 15. Use case #1: Sitemap health check. How many URLs from your XML sitemap are indexed? Sitemaps = 111,772 URLs. 200 status code: 81,688 URLs, 80% indexed. Chart, Google index status of 2xx URLs from sitemap: 74,223 indexed, 7,465 not indexed. Inspired by "Data Secrets of the Index Coverage Report", AJ Kohn.
  • 16. Use case #1: Sitemap health check. How many URLs from your XML sitemap are indexed? Sitemaps = 111,772 URLs. 200 status code: 81,688 URLs, 80% indexed. 404 status code: 29,969 URLs, 21% indexed. Chart, Google index status of 4xx URLs from sitemap: 6,268 indexed, 23,701 not indexed. Inspired by "Data Secrets of the Index Coverage Report", AJ Kohn.
  • 17. Use case #1: Sitemap health check. How many URLs from your XML sitemap are indexed? Sitemaps = 111,772 URLs. 200 status code: 81,688 URLs, 80% indexed. 404 status code: 29,969 URLs, 21% indexed. 301 status code: 365 URLs, 4% indexed. Chart, Google index status of 3xx URLs from sitemap: 16 indexed, 349 not indexed. Inspired by "Data Secrets of the Index Coverage Report", AJ Kohn.
  • 18. Sitemap health check, next steps: 1) Identify if these URLs are important to your site’s bottom line 2) Check if a pool of these URLs has issues in GSC’s Index Coverage report 3) Choose a tactic to improve the visibility of these URLs 4) Isolate the relevant URLs and modify the existing sitemap or create a new-sitemap.xml to monitor progress
  • 19. Use case #2: Log File Analysis Plus+. How many URLs with Googlebot hits are indexed? ~160k Googlebot hits to non-canonical URLs (/Uppercase/ vs /lowercase/). Identified if non-canonical URLs were indexed. Identified if the referenced canonical URLs were indexed. Chart, indexed non-canonical URLs requested by Googlebot (undisclosed client): 35.8% indexed, 64.2% not indexed.
  • 20. Log File Analysis+, next steps: 1) Identify if the canonical tag is correctly placed 2) Identify if the root cause is internal linking, external linking or other 3) Consider redirecting non-canonical URLs to canonical URLs 4) Create a new-sitemap.xml with problematic URLs to encourage Googlebot to revisit those URLs and for monitoring purposes
  • 21. Other use cases to inform your SEO strategy: check real-time indexing (news sites, offer sites, job boards); check uncontrolled faceted navigation (crawl budget optimisation); check inactive product/category URLs (site architecture improvements); check old 4xx URLs that are live now and haven't been reindexed yet (recover organic opportunities).
  • 22. Further reading: https://bit.ly/google-index-checks
  • 23. Further reading: https://bit.ly/gsc-index-coverage
  • 24. The Google Index Checker script has opened a door to get useful, actionable data at scale for your sites. Use it, and act on it.
  • 25. Thank you. builtvisible.com. Jose Luis Hernando, Senior Technical SEO Consultant, @jlhernando
  • 26. Sources & additional reading: "How Does Google Crawl the Web?", Annabelle Bouard & Dimitri Brunel (Botify); English Google Webmaster Central office-hours hangout, Google Webmasters YouTube channel; "Google Only Can Judge Site Quality Based On Pages They Index", Barry Schwartz (Search Engine Roundtable); "Data Secrets of the Index Coverage Report", AJ Kohn (Blind Five Year Old); "How Google Search Works", Google documentation; "How Search organises information", Google documentation; "Our new search index: Caffeine", Carrie Grimes; "When indexing goes wrong: how Google Search recovered from indexing issues & lessons learned since", Vincent Courson (Google Search Outreach); "How Search Engines Work: Crawling, Indexing & Ranking", Moz; "(Please) Stop Using Unsafe Characters in URLs", Jeff Starr.
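
The sitemap health check in use case #1 comes down to grouping the checked URLs by status code class and computing the indexed share of each group. The sketch below shows one way to do that in Node.js. It is not taken from the Google Index Checker script: the row shape (`url`, `status`, `indexed`) and the function name `indexedShareByStatusClass` are assumptions made for illustration, and the sample rows are invented.

```javascript
// Hypothetical row shape: { url: string, status: number, indexed: boolean }.
// The real columns of the script's results CSV may be named differently.
function indexedShareByStatusClass(rows) {
  const groups = new Map();
  for (const { status, indexed } of rows) {
    const key = `${Math.floor(status / 100)}xx`; // 200 -> "2xx", 404 -> "4xx"
    const group = groups.get(key) ?? { total: 0, indexed: 0 };
    group.total += 1;
    if (indexed) group.indexed += 1;
    groups.set(key, group);
  }

  // Turn the counters into a plain object with a rounded percentage per class.
  const summary = {};
  for (const [key, { total, indexed }] of groups) {
    summary[key] = {
      total,
      indexed,
      pctIndexed: Math.round((indexed / total) * 100),
    };
  }
  return summary;
}

// Invented rows, for illustration only.
const sampleRows = [
  { url: "https://example.com/a", status: 200, indexed: true },
  { url: "https://example.com/b", status: 200, indexed: false },
  { url: "https://example.com/old", status: 404, indexed: true },
  { url: "https://example.com/moved", status: 301, indexed: false },
];
console.log(indexedShareByStatusClass(sampleRows));
```

In practice the rows would come from joining a crawler's status codes with the script's results file. A high indexed share among 4xx or 3xx sitemap URLs is the signal the slides highlight.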

Editor's notes

  1. Technical SEO Consultant at Builtvisible. Builtvisible is a digital marketing agency focusing exclusively on organic performance. We are specialists in technical SEO, content strategy, digital PR and analytics, and we deal primarily with medium and large-scale sites targeting both national and global audiences online.
  2. If you’re not in Google’s index, you will not appear in Google's SERPs. To appear in search results, Google has to discover, crawl, render and index your website’s pages. Only once you’re in the index are you eligible to appear in SERPs and acquire users through organic search. If you don’t know which pages are indexed, you don’t know which pages can acquire users organically.
  3. These are pages that you’ve probably spent lots of time customising to serve users. They will be evaluated in the same way as low-quality pages that are indexable: uncontrolled faceted navigation, user-generated content (UGC) and non-canonical URLs.
  4. If you have an e-commerce site with uncontrolled faceted navigation, Googlebot has to download each of those pages (and its resources), then crawl and render it, to evaluate whether it contains valuable information for future user queries. Since this is not controlled, it can go on ad infinitum, wasting Google’s resources on URLs that are very likely not as valuable as others in your site architecture.
  5. Key step in the indexing pipeline: crawling. For Google to index your site, it needs to crawl it. But how much of your site is Googlebot crawling? According to a Botify study of 270 sites with different architecture sizes, certainly not all of it. This graph shows two important concepts: crawl ratio (the percentage of pages crawled by Google in 30 days) and active ratio (the percentage of pages that have generated at least one organic visit in 30 days). If you are dealing with a site that has fewer than 10k URLs, Google crawls on average 71% of your site and only 45% of that gets organic clicks. As the size of a website increases, the rate at which Googlebot crawls it declines more and more, to the point where, if your site has more than 1M URLs, Googlebot crawls on average only 34% of your site and only 10% of those URLs get clicks from organic search.
  6. Challenges: 1) Even if you are lucky enough to have access to your logs on a regular basis, Googlebot’s crawling behaviour doesn’t determine indexing status. You cannot guarantee that URLs that have not received clicks from Google Search are actually part of Google’s index.
  7. 2) If you don’t have access to server logs you have even less data, and hence you rely on the information that Google provides through: a) site: & inurl: operators: a rough estimate for site-wide numbers and often inaccurate information for individual URLs. b) Google Search Console reports: the URL Inspection Tool (great, but you hit the quota limit after 200 URLs, so it is a bit pointless to automate) and the Coverage/Sitemap Coverage reports (great, but GSC only allows 1,000 rows of data per report).
  8. Download our Google Index Checker script from GitHub, developed by our Senior Developer Alvaro Fernandez. Download or update Node.js. The script relies on ScraperAPI to get information from Google Search: it is very easy to use and you can sign up for free to get the API key. Concurrent requests are limited to 5, the maximum on ScraperAPI's free plan, but Al has built a function that automatically adapts concurrency to the limit of your plan tier. There is no limit on the number of URLs. It is perfect for clean URLs, but it can also process parameterised URLs, case-sensitive URLs, internationally encoded characters and reserved/unreserved symbols. It has a recycling feature and gives a nice overview of the index status check when it finishes.
  9. Download your XML sitemap(s) using your preferred crawler (SF, DC, OC, SB), get your list of URLs, create a urls.csv file and add it to the Google Index Checker. Once it has finished, you will get a CSV file with your results and you can find out how much of your sitemap is indexed. In this example I’ve taken argos.co.uk because it is a large e-commerce site with a mix of normal URLs and URLs with unsafe characters.
  10. We found ~160k non-canonical category pages with a significant number of Googlebot requests. The problem was that the non-canonical URLs contained an uppercase character that wasn’t supposed to be there. First, we wanted to identify whether these pages were indexed. Second, we wanted to know whether the non-canonical URLs were being indexed instead of the canonicals. In the end we found that approximately 36% of the non-canonical URLs were indexed instead of their canonicals.
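
The two checks in the log file use case (is the non-canonical URL indexed, and is it indexed instead of its canonical) can be sketched as a small classification step over the index check results. This is a minimal illustration, not the method used for the client. It assumes that the canonical URL is simply the same URL with a lowercase path, as in the /Uppercase/ vs /lowercase/ case, and that index results are available as a `Map` from URL to boolean. The function names `toCanonical` and `classifyCanonicalPairs` are hypothetical.

```javascript
// Assumption: the canonical version is the same URL with its path lowercased.
// Percent-encoded sequences in the path would be lowercased too, which is
// acceptable for this sketch but may need care on real data.
function toCanonical(url) {
  const parsed = new URL(url);
  parsed.pathname = parsed.pathname.toLowerCase();
  return parsed.toString();
}

// indexStatus: Map<string, boolean> of URL -> "is indexed" (hypothetical shape).
function classifyCanonicalPairs(nonCanonicalUrls, indexStatus) {
  return nonCanonicalUrls.map((url) => {
    const canonical = toCanonical(url);
    const nonCanonicalIndexed = indexStatus.get(url) === true;
    const canonicalIndexed = indexStatus.get(canonical) === true;

    let verdict;
    if (nonCanonicalIndexed && !canonicalIndexed) {
      verdict = "non-canonical indexed instead of canonical";
    } else if (nonCanonicalIndexed && canonicalIndexed) {
      verdict = "both indexed";
    } else if (canonicalIndexed) {
      verdict = "only canonical indexed";
    } else {
      verdict = "neither indexed";
    }
    return { url, canonical, verdict };
  });
}

// Invented data, for illustration only.
const status = new Map([
  ["https://example.com/Shoes/", true],
  ["https://example.com/shoes/", false],
]);
console.log(classifyCanonicalPairs(["https://example.com/Shoes/"], status));
```

Counting the rows with the first verdict and dividing by the number of non-canonical URLs gives the kind of share quoted in the note above.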