Breaking Bad SEO
The Science of Crawl Space
Meet Walter…
Meet Walter, the chemistry teacher.
Despite getting his hands pretty dirty throughout the series, Walt is fundamentally a good guy,
and as the star of the show he should, from a search perspective, be considered a white hat SEO.
As a scientist, Walt is methodical in his approach: he understands the principles and practices
needed to achieve results and has the skill to produce the very best crystal meth.
Jesse Pinkman
Jesse Pinkman, on the other hand, is a black hat SEO.
An ex-student of Walt’s, he failed chemistry at school and became a drug dealer.
Jesse knows his industry and has the right contacts to get by, but he isn’t really interested in
delivering the very best results or in long-term planning.
So why "Crawl Space"?
The Totality of Possible URLs For a Website
“[You]…have several options to optimize the “crawl space”
(the totality of URLs on your site known to Googlebot) for unique
content pages, reduce crawling of duplicative pages, and
consolidate indexing signals.”
Google Webmaster Central Blog
http://googlewebmastercentral.blogspot.co.uk/2014/02/faceted-navigation-best-and-5-of-worst.html
What is Crawl Space?
Google regularly refers to crawl space. Identifying your crawl space is the first step towards knowing
your site architecture, and knowing your site architecture is the foundation of successful website optimisation.
Why should we care about crawl space?
Size Does Matter…
Google warns you about potential crawl space issues in Google Webmaster Tools (GWT) and outlines some of the implications…
•Unnecessarily crawling a large number of duplicate content URLs
•Discovering undesired parts of your site
•Consuming more bandwidth than necessary
•Potentially being unable to index your whole site
domain.com
domain.com/home
domain.com/home/
domain.com/Home
domain.com/?source=ppc
domain.com/index.html
How’d I Get so Many Pages?
So check what’s already in the indexes: is it a realistic number of pages for your site?
Do you recognise the URL structures and formats being returned?
The most common contributor to a large crawl space is duplicate content, seen here with
BooHoo.com. Duplicate URLs can arise for a number of reasons, some valid and some not.
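To see how variants like the six listed above collapse into a single page, here is a minimal Python sketch that groups URLs under an assumed canonical form. The normalisation rules (lower-casing paths, dropping trailing slashes, `index.html` and query strings, and treating `/home` as the homepage) are illustrative assumptions, not Google’s rules; lower-casing in particular is only safe if your server really treats paths case-insensitively.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def normalise(url):
    """Map a URL variant to an assumed canonical form (illustrative rules only)."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only finds the host when a scheme is present
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path.lower().rstrip("/")  # assumption: paths are case-insensitive
    if path.endswith("/index.html"):
        path = path[: -len("/index.html")]
    if path == "/home":  # assumption: /home is an alias of the homepage
        path = ""
    # The query string (e.g. ?source=ppc) is discarded entirely.
    return host + (path or "/")

def group_variants(urls):
    """Group URLs by their normalised form, preserving input order."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return dict(groups)

variants = [
    "domain.com", "domain.com/home", "domain.com/home/",
    "domain.com/Home", "domain.com/?source=ppc", "domain.com/index.html",
]
for canonical, found in group_variants(variants).items():
    print(canonical, "<-", found)
```

Any group with more than one member is a candidate for a redirect or a canonical tag.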
For SEO… There Can Only Be One!
But Don’t Lose Your Head
Auditing your crawl space will help you understand the full mix of URLs and formulate a
consistent implementation.
What Makes Up Your Crawl Space?
To build a picture of your site architecture and to discover your complete crawl space, you need
to consider what contributes to your URL Universe – the place where all theoretically possible
URLs exist.
Poor management has implications:
•Orphaned pages
•Incomplete sitemaps
•Dilution of backlinks and shares
•Performance is harder to track
•Limited volume of unique URLs in analytics
•Crawl inefficiency for SEO
•Traffic growth strategies just won’t work
The Threats
A lack of crawl space management can leave you wide open to threats and lead to a load of GWT warnings!
Do social signals impact organic rankings?
•Twitter?
•Facebook?
•Google+
Potentially
Social and Backlink Synergy
Who cares!? Even if social signals are not used in a ranking algorithm, social media and SEO can both
drive discovery at scale. Both have enormous reach, so understanding the crawl space for search and
social is crucial.
URL aggregation is not strictly required, but I’d recommend it.
So let’s take a look at how to tackle all this…
To get going we need all the components:
•A Cook
•A Lab
•The Formula
•Organic Ingredients
•Recipe
Let’s Cook!
Be the Boss (of URLs)
•An SEO needs to be a great cook…
• Make sure you have the right equipment
• Follow an accurate formula
• Use quality ingredients
•Take ownership of your crawl space
• Benchmark
• URL Roadmap
The Cook – YOU!
Just like Walt, you need to be the boss, the head chef, the daddy, and ideally a scientist, though that isn’t a prerequisite
if you have the right tools to help you take ownership of your crawl space.
The Lab
This is Jesse’s RV, effectively a mobile meth lab. We need to set up our lab with the tools and equipment
necessary to develop search marketing recipes, linking a range of data sources to understand the crawl
space:
•Webmaster Tools - Google & Bing
•Landing page data (Analytics)
•Linked URL data - WMT/Ahrefs/Majestic/OSE
•Website crawler (Xenu/Screaming Frog/DeepCrawl)
DeepCrawl’s (DC) new Universal Crawl now includes Google Analytics landing page data, along with link equity and social
tagging reports, so you can assess your complete crawl space in one place.
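Once the data sources are linked, a simple way to see where each URL was discovered is to tag every URL with its sources. The sketch below assumes you have already exported plain URL lists from each tool; the source names and URLs are hypothetical, and it does not reproduce how DeepCrawl’s Universal Crawl merges data.

```python
from collections import Counter, defaultdict

def build_inventory(sources):
    """Map each URL to the set of sources (crawl, sitemap, analytics...) it was found in."""
    inventory = defaultdict(set)
    for source_name, urls in sources.items():
        for url in urls:
            inventory[url.strip()].add(source_name)
    return dict(inventory)

def coverage_summary(inventory):
    """Count URLs per combination of sources, e.g. ('sitemap',) means only in sitemaps."""
    return Counter(tuple(sorted(found_in)) for found_in in inventory.values())

# Hypothetical exports from each tool in the lab.
sources = {
    "crawl": ["/", "/shoes", "/shoes?sort=price"],
    "sitemap": ["/", "/shoes", "/old-sale"],
    "analytics": ["/", "/shoes", "/landing/promo"],
}
inventory = build_inventory(sources)
for combo, count in coverage_summary(inventory).most_common():
    print(" + ".join(combo), count)
```

URLs that appear in only one source are usually the first ones worth investigating.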
Maximised + Minimised = Optimised
Maximise indexable space
•Increase volume of valuable pages
•Increase crawl efficiency
Minimise crawlable space
•Define your crawl space
•Identify and eliminate threats
Optimise canonical space
•A clean version of your website
•Your URL à la carte
The Formula
Discovery, management and optimisation of crawl space are essential and lay the foundation for strong
performance.
All spaces need to be carefully defined and efficiently managed.
Use Organic Ingredients
As SEOs, we have a whole host of juicy ingredients at our disposal to help cook up an optimised crawl space,
and picking the very best ones will make all the difference when you test your recipes.
Use the custom controls in DC to extract data and schedule regular comparisons for each source…
Overwrite the robots.txt rules to assess the full crawl space as well as the one delivered to search engines.
Schedule regular sitemap crawls, compare them against internal and external links, and review your canonical
setup and index controls to maintain consistency.
- DeepCrawl Backlinks crawl
- OG tags & Hreflang DeepCrawl report
So What’s the Recipe?
Crawl Space Solutions in One…
New Universal Crawl is now out of beta!
Review website, XML sitemap and organic landing page data in one Universal Crawl with DeepCrawl.
Take advantage of a significant head start in defining, managing and optimising your crawl spaces.
Universal Crawl is the New Heisenberg…
DeepCrawl Goes Universal
DC’s Universal Crawl helps you quickly and easily understand your crawl space and identify URL sources, gaps,
new formats and traffic value. You also get all the regular DC features, including fully customisable crawl
settings, the ability to compare a test environment against the live site (useful for QA), custom extraction
tools, and scheduled crawls that record change and impact and help quantify SEO deliverables.
Let’s take a look at some recipes to control your crawl space…
DeepCrawl automatically shows you what’s changed between crawls, so you can understand how much of the site
is changing.
You might spot some URL formats that change very frequently and hurt crawl efficiency.
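To get a feel for which URL formats churn between crawls, you can diff two crawl exports yourself. This sketch assumes each crawl is a plain list of URLs and defines a "format" as the first path segment plus the sorted query-parameter names; that grouping rule is an assumption chosen for illustration, not how DeepCrawl classifies changes.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

def url_format(url):
    """Reduce a URL to its first path segment plus sorted parameter names (assumed grouping)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    prefix = "/" + segments[0] if segments else "/"
    params = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return prefix + ("?" + "&".join(params) if params else "")

def churn_by_format(previous_crawl, current_crawl):
    """Return added URLs, removed URLs and a count of changed URLs per format."""
    prev, curr = set(previous_crawl), set(current_crawl)
    added, removed = curr - prev, prev - curr
    churn = Counter(url_format(url) for url in added | removed)
    return added, removed, churn

# Hypothetical crawl exports.
previous = ["/shoes", "/shoes?sessionid=a1", "/bags"]
current = ["/shoes", "/shoes?sessionid=b2", "/bags", "/hats"]
added, removed, churn = churn_by_format(previous, current)
print(churn.most_common())
```

A format that tops the churn count on every crawl, such as a session parameter, is a strong hint that it is wasting crawl budget.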
Understand what’s in your crawl space…
•Assess indexation reports
•Review the current index
•Test URL parameter changes
•Quick improvements through GWT Parameter settings
Identify Indexable & Non-indexable Parameters
DeepCrawl indexation reports help you quickly assess all crawlable URLs, unique pages and noindex pages, and
identify URL parameters that you might not want indexed.
Use Webmaster Tools data and site: checks to understand current search engine indexation, and use the
Parameter Removal controls in DeepCrawl to test the impact of stripping parameters. Monitor changes in crawl
efficiency, and for a quick win, update your URL parameter settings in Webmaster Tools once you
find the right formula.
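Before changing parameter settings, you can estimate how many unique URLs collapse when a parameter is stripped. This is a minimal Python sketch, not DeepCrawl’s Parameter Removal feature; the `sessionid` parameter and the example URLs are hypothetical. Note that `urlencode` re-encodes the parameters that are kept, so their encoding may differ slightly from the original URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_params(url, params_to_strip):
    """Remove the named query parameters, keeping the others in their original order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in params_to_strip]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))

def removal_impact(urls, params_to_strip):
    """Unique URL count before and after stripping the given parameters."""
    before = len(set(urls))
    after = len({strip_params(url, params_to_strip) for url in urls})
    return before, after

# Hypothetical crawled URLs.
crawled = [
    "https://example.com/shoes?sessionid=1",
    "https://example.com/shoes?sessionid=2",
    "https://example.com/shoes?colour=red&sessionid=3",
    "https://example.com/shoes?colour=red",
]
before, after = removal_impact(crawled, {"sessionid"})
print(f"{before} unique URLs -> {after} after stripping sessionid")
```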
Make sure you’re not self-harming
•Check canonical implementation
•All canonical pages should be linked internally
•Assess pages without canonical tags
Canonical URL Configuration
Which URLs Are Being Shared?
Check the consistency of canonical URLs against social URLs: are your OG and Twitter Card URLs consistent?
DeepCrawl has a report that automatically shows any errors.
You can also use DC to schedule custom extraction crawls that regularly assess changes in social share equity
for specific URLs. The same applies to blog comments and similar content.
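A quick way to spot-check canonical and social URL consistency on a single page is to parse its HTML and compare the `rel="canonical"` link with the share URLs. This sketch uses only Python’s standard `html.parser`. Checking `twitter:url` is an assumption (Twitter Card markup can fall back to `og:url`), so adjust the hypothetical `SHARE_TAGS` tuple to the markup your templates actually emit.

```python
from html.parser import HTMLParser

SHARE_TAGS = ("og:url", "twitter:url")  # assumption: adjust to your templates

class ShareUrlParser(HTMLParser):
    """Collect the canonical link and social share URLs from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.urls = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.urls["canonical"] = attr.get("href")
        elif tag == "meta":
            key = attr.get("property") or attr.get("name")
            if key in SHARE_TAGS:
                self.urls[key] = attr.get("content")

def share_url_mismatches(html):
    """Return share tags whose URL differs from the canonical URL."""
    parser = ShareUrlParser()
    parser.feed(html)
    canonical = parser.urls.get("canonical")
    return {tag: url for tag, url in parser.urls.items()
            if tag != "canonical" and url != canonical}

page = """<head>
<link rel="canonical" href="https://example.com/shoes">
<meta property="og:url" content="https://example.com/shoes?utm_source=fb">
<meta name="twitter:url" content="https://example.com/shoes">
</head>"""
print(share_url_mismatches(page))
```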
Where’s your thin
content hiding out?
Identify Low Value Navigational Pages
Take a good look at your analytics…
•Review URLs delivering minimal traffic
•Identify and assess URLs outside of your canonical setup
Identify Low Value Non Navigational URLs
Review whether you can make your crawl more efficient by excluding certain pages.
Check your non-canonical URLs and use Universal Crawl to find non-navigational pages that aren’t driving
traffic.
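The same analytics review can be sketched in a few lines of Python. It assumes a CSV export with hypothetical `landing_page` and `sessions` columns and an arbitrary threshold of 10 sessions; pick a threshold and time window that make sense for your own site.

```python
import csv
import io

def load_sessions(csv_text):
    """Parse an analytics export with hypothetical 'landing_page' and 'sessions' columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["landing_page"]: int(row["sessions"]) for row in reader}

def low_value_urls(sessions_by_url, canonical_urls, min_sessions=10):
    """List (url, sessions, is_canonical) for URLs below the traffic threshold, lowest first."""
    canonical = set(canonical_urls)
    ranked = sorted(sessions_by_url.items(), key=lambda item: item[1])
    return [(url, n, url in canonical) for url, n in ranked if n < min_sessions]

export = """landing_page,sessions
/shoes,540
/shoes?sort=price,3
/tag/red,0
/about,25
"""
report = low_value_urls(load_sessions(export), ["/shoes", "/about", "/tag/red"])
for url, sessions, is_canonical in report:
    print(url, sessions, "canonical" if is_canonical else "NON-canonical")
```

Low-traffic non-canonical URLs are natural candidates for exclusion; low-traffic canonical pages may instead need better content or internal links.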
Identify Domain Aliases
Understanding who’s working with you or who’s against you is important too. Review your domain aliases.
Test all registered domains to check if they return a duplicate or redirect.
Check www/non-www configuration and HTTP/HTTPS.
Implement redirects or use cross domain canonical setups.
Monitor your domain portfolio and stay alert!
Monitor domains using Robotto (www.robotto.org).
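Testing every registered domain can be scripted. The sketch below sends a HEAD request for `/` on each alias without following redirects (Python’s `http.client` does not follow them) and classifies the response. The domain names and classification labels are hypothetical, and a 200 response is only a duplicate risk if the page lacks a cross-domain canonical.

```python
import http.client
from urllib.parse import urlsplit

def fetch_root(domain, use_https=True, timeout=10):
    """HEAD request for '/' on a domain; http.client does not follow redirects."""
    conn_cls = http.client.HTTPSConnection if use_https else http.client.HTTPConnection
    conn = conn_cls(domain, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def classify_alias(status, location, canonical_host):
    """Label an alias domain's response relative to the (lower-case) canonical host."""
    if status in (301, 308):
        target = urlsplit(location or "").netloc.lower()
        if target == canonical_host:
            return "ok: permanent redirect to canonical host"
        return "check: permanent redirect to another host"
    if status in (302, 303, 307):
        return "check: temporary redirect"
    if status == 200:
        return "review: serves content (duplicate unless a cross-domain canonical is set)"
    return f"check: unexpected status {status}"

# Usage against live (hypothetical) domains, covering HTTP and HTTPS:
# for domain in ["example.co.uk", "example-shop.com"]:
#     for https in (True, False):
#         status, location = fetch_root(domain, use_https=https)
#         print(domain, https, classify_alias(status, location, "www.example.com"))
```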
Check all disallowed URLs…
•Webmaster Tools
•Deep Crawl Indexation Reports
Disallowed URLs
DeepCrawl reports all ‘Disallowed URLs’, so you can easily see what’s already being excluded.
Test changes to your robots.txt file using the Robots Overwrite functionality and develop an optimised
file that disallows more low-value URLs and focuses the crawl on primary pages, then test the impact.
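You can also trial a candidate robots.txt offline before deploying it. This sketch uses Python’s `urllib.robotparser` to list URLs that a candidate file would newly block or newly allow. Be aware that this parser only does simple prefix matching: it does not implement Google’s `*` and `$` wildcards or its longest-match precedence, so treat the result as an approximation and keep the rules to plain prefixes, as in the hypothetical example.

```python
from urllib.robotparser import RobotFileParser

def _parse(robots_txt):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def compare_robots(current_txt, candidate_txt, urls, agent="Googlebot"):
    """List URLs a candidate robots.txt would newly block or newly allow."""
    current, candidate = _parse(current_txt), _parse(candidate_txt)
    newly_blocked = [u for u in urls
                     if current.can_fetch(agent, u) and not candidate.can_fetch(agent, u)]
    newly_allowed = [u for u in urls
                     if not current.can_fetch(agent, u) and candidate.can_fetch(agent, u)]
    return newly_blocked, newly_allowed

current_rules = """User-agent: *
Disallow: /checkout/
"""
candidate_rules = """User-agent: *
Disallow: /checkout/
Disallow: /search
"""
urls = [
    "https://example.com/",
    "https://example.com/search?q=shoes",
    "https://example.com/checkout/basket",
    "https://example.com/products/red-shoes",
]
blocked, allowed = compare_robots(current_rules, candidate_rules, urls)
print("newly blocked:", blocked)
print("newly allowed:", allowed)
```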
Review & Validate all linked URLs…
Identify All Linked URLs
Check that your internal linking is working towards an optimised crawl space.
Use DeepCrawl’s internal broken links, redirected links, 4xx and 5xx error reports to identify
internal links that are broken or point to redirected URLs, which may be affecting your crawl efficiency.
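If you export internal links together with the status code of each target, triaging them is straightforward. The `(source, target, status)` tuple format below is an assumed export shape, not a specific DeepCrawl report format.

```python
from collections import defaultdict

def triage_links(links):
    """Bucket internal links by target status; links are (source_page, target_url, status)."""
    buckets = defaultdict(list)
    for source, target, status in links:
        if 300 <= status < 400:
            buckets["redirected"].append((source, target))
        elif 400 <= status < 500:
            buckets["broken_4xx"].append((source, target))
        elif status >= 500:
            buckets["server_5xx"].append((source, target))
    return dict(buckets)

# Hypothetical crawl export.
links = [
    ("/", "/shoes", 200),
    ("/", "/sale", 301),
    ("/shoes", "/shoes/discontinued", 404),
    ("/bags", "/api/stock", 503),
]
for bucket, pairs in triage_links(links).items():
    print(bucket, pairs)
```

Redirected links are usually fixed by pointing the link straight at the final URL; broken ones need a new target or removal.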
Crawl your sitemap regularly…
•Run analysis and compare:
• Scheduled sitemap crawls
• Scheduled website crawls
• All validated, linked URLs
Compare Sitemaps
Identify pages ‘Only in Sitemaps’ that are not linked internally, plus all internally linked pages that
are not in the sitemaps. Schedule regular sitemap and comparative website crawls to assess what’s changed,
so you can monitor how much of the site is changing and map it against performance.
You might spot some URL formats that change frequently and hurt crawl efficiency.
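The sitemap comparison itself comes down to a pair of set differences. This Python sketch parses a standard XML sitemap (the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace) and compares it with a set of internally linked URLs from a crawl. The URLs are hypothetical, and sitemap index files are not handled.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the set of <loc> URLs in a standard (non-index) XML sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}

def compare_sitemap(sitemap, linked):
    """Split URLs into 'only in sitemaps' and 'linked but not in sitemaps'."""
    return {
        "only_in_sitemaps": sorted(sitemap - linked),
        "linked_not_in_sitemaps": sorted(linked - sitemap),
    }

sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-sale</loc></url>
</urlset>"""
linked_urls = {"https://example.com/", "https://example.com/shoes"}
print(compare_sitemap(sitemap_urls(sitemap_xml), linked_urls))
```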
Where’s the link equity?
•Identify pages delivering traffic but not internally linked
•Understand the link profile of all pages:
• Crawl aggregated link data
•DeepCrawl automatically applies link metrics to all reports
Compare Landing Page URLs to Linked URLs
By comparing sitemaps, GA and internal links, a Universal Crawl easily highlights URLs discovered ‘Only in
Organic Landing Pages’ that are not linked internally.
You can add ‘Backlink Crawls’ to your DC projects: simply upload a URL list from a comprehensive backlink profile
and let DC crawl and validate the URLs. This crawl also helps identify pages that generate traffic but are not
necessarily linked internally, and it automatically applies inbound link equity metrics to all DC reports at
URL level, which is very useful.
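A similar comparison works for landing pages. The sketch below assumes you already have organic sessions per landing page, the set of internally linked URLs from a crawl, and optionally a backlink count per URL (all hypothetical inputs), and ranks traffic-driving pages that no internal link points to.

```python
def unlinked_landing_pages(sessions_by_url, internally_linked, backlinks_by_url=None):
    """Rank landing pages with organic traffic that no internal link points to."""
    backlinks_by_url = backlinks_by_url or {}
    linked = set(internally_linked)
    rows = [(url, sessions, backlinks_by_url.get(url, 0))
            for url, sessions in sessions_by_url.items() if url not in linked]
    # Highest-traffic pages first; backlink count breaks ties.
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)

organic_sessions = {"/": 900, "/spring-promo": 120, "/old-guide": 45}
linked = ["/"]
backlinks = {"/old-guide": 30}
for url, sessions, links_in in unlinked_landing_pages(organic_sessions, linked, backlinks):
    print(f"{url}: {sessions} sessions, {links_in} backlinks, not linked internally")
```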
Watch your Language…
Check your Hreflang!
International SEO Considerations
The correct use and implementation of hreflang tags is important for effective international
SEO, but it can be confusing, and even experienced SEOs get it wrong.
• Universal Crawl tests implementation across:
• Sitemaps
• Headers
• On-page
• Review a matrix of language alternatives for each page
• Assess gaps and inconsistencies in the setup
• Review a ‘Pages Without Hreflang Tags’ report
• See David Sottimano’s Moz post on hreflang:
http://moz.com/blog/hreflang-behaviour-insights
International SEO Considerations
DeepCrawl helps you manage a seamless hreflang integration by detecting hreflang tags in
sitemaps, headers and on-page, then showing a matrix of language alternatives for every page
so you can see the gaps and inconsistencies in the setup.
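One of the most common hreflang gaps is a missing return link: if page A lists page B as an alternate, B should list A back, and each page should also reference itself. This sketch checks those two conditions, assuming you have already extracted each crawled page’s annotations into a dictionary of `{hreflang_code: alternate_url}` (a hypothetical input shape). It does not validate the language or region codes themselves.

```python
def hreflang_issues(annotations):
    """annotations: {page_url: {hreflang_code: alternate_url}} as declared on each page."""
    issues = []
    for page, alternates in annotations.items():
        if page not in alternates.values():
            issues.append((page, "missing self-reference"))
        for code, alt_url in alternates.items():
            if alt_url == page:
                continue
            back = annotations.get(alt_url)
            if back is None:
                issues.append((page, f"{code} alternate {alt_url} not found in crawl"))
            elif page not in back.values():
                issues.append((page, f"no return link from {alt_url}"))
    return issues

annotations = {
    "https://example.com/en/": {"en": "https://example.com/en/", "de": "https://example.com/de/"},
    "https://example.com/de/": {"de": "https://example.com/de/"},
}
for page, problem in hreflang_issues(annotations):
    print(page, "->", problem)
```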
• Review your options and consider your URL Universe
• Set up your lab
• Google Analytics
• Google/Bing Webmaster Tools
• DeepCrawl – Universal Crawl
• Follow the formula:
• Maximised + Minimised = Optimised
• Develop and test new recipes to focus your crawl spaces.
#BreakingBadSEO
Thanks, Keep in Touch…

Más contenido relacionado

Destacado

How Journalistic Principles Will Shape Digital Marketing - BrightonSEO 2014
How Journalistic Principles Will Shape Digital Marketing - BrightonSEO 2014How Journalistic Principles Will Shape Digital Marketing - BrightonSEO 2014
How Journalistic Principles Will Shape Digital Marketing - BrightonSEO 2014Zazzle Media
 
Stop Blind Marketing, Start Selling Through Content - #BrightonSEO by Matt Evans
Stop Blind Marketing, Start Selling Through Content - #BrightonSEO by Matt EvansStop Blind Marketing, Start Selling Through Content - #BrightonSEO by Matt Evans
Stop Blind Marketing, Start Selling Through Content - #BrightonSEO by Matt EvansMatt Evans
 
Time = Money: Marketing Lifehacks
Time = Money: Marketing LifehacksTime = Money: Marketing Lifehacks
Time = Money: Marketing LifehacksNed Poulter
 
Quick Technical SEO Audit Checklist - Peter Handley Brighton SEO April 2014
Quick Technical SEO Audit Checklist - Peter Handley Brighton SEO April 2014Quick Technical SEO Audit Checklist - Peter Handley Brighton SEO April 2014
Quick Technical SEO Audit Checklist - Peter Handley Brighton SEO April 2014Peter Handley
 
The Habits that Land You Links #brightonseo 2014 by @staceycav
The Habits that Land You Links #brightonseo 2014 by @staceycavThe Habits that Land You Links #brightonseo 2014 by @staceycav
The Habits that Land You Links #brightonseo 2014 by @staceycavStacey MacNaught
 
The Need for Speed (5 Performance Optimization Tipps) - brightonSEO 2014
The Need for Speed (5 Performance Optimization Tipps) - brightonSEO 2014The Need for Speed (5 Performance Optimization Tipps) - brightonSEO 2014
The Need for Speed (5 Performance Optimization Tipps) - brightonSEO 2014Bastian Grimm
 
Adthena #BrightonSEO_Competitive Intelligence @_hifi_lofi
Adthena #BrightonSEO_Competitive Intelligence @_hifi_lofi Adthena #BrightonSEO_Competitive Intelligence @_hifi_lofi
Adthena #BrightonSEO_Competitive Intelligence @_hifi_lofi Adthena
 
BrightonSEO Sep 2015 - HTTPS | Mark Thomas
BrightonSEO Sep 2015 - HTTPS | Mark Thomas BrightonSEO Sep 2015 - HTTPS | Mark Thomas
BrightonSEO Sep 2015 - HTTPS | Mark Thomas Anna Morrison
 
From Concept to Completion: Tips for Designing Great Content
From Concept to Completion: Tips for Designing Great ContentFrom Concept to Completion: Tips for Designing Great Content
From Concept to Completion: Tips for Designing Great ContentVicke Cheung
 
Sem days mobile 2015
Sem days mobile 2015Sem days mobile 2015
Sem days mobile 2015Anna Morrison
 
Mark Thomas - 10 Step Technical SEO Game Plan (annotated edition)
Mark Thomas - 10 Step Technical SEO Game Plan (annotated edition)Mark Thomas - 10 Step Technical SEO Game Plan (annotated edition)
Mark Thomas - 10 Step Technical SEO Game Plan (annotated edition)Anna Morrison
 
How to deliver cheap (not nasty) SEO - Brighton SEO 04/2014
How to deliver cheap (not nasty) SEO - Brighton SEO 04/2014How to deliver cheap (not nasty) SEO - Brighton SEO 04/2014
How to deliver cheap (not nasty) SEO - Brighton SEO 04/2014Custard Online Marketing
 
Nancy Scott - Cancer Research UK - SEO & Crawlability
Nancy Scott - Cancer Research UK - SEO & CrawlabilityNancy Scott - Cancer Research UK - SEO & Crawlability
Nancy Scott - Cancer Research UK - SEO & CrawlabilityAnna Morrison
 
The Gather Project: A Review of Healthcare Social Media 10.10
The Gather Project: A Review of Healthcare Social Media 10.10The Gather Project: A Review of Healthcare Social Media 10.10
The Gather Project: A Review of Healthcare Social Media 10.10Peter Levitan & Co.
 
εφημεριδα πεντελης
εφημεριδα πεντεληςεφημεριδα πεντελης
εφημεριδα πεντεληςgympentelis
 
Wiki diagonismos
Wiki diagonismosWiki diagonismos
Wiki diagonismosgympentelis
 

Destacado (17)

How Journalistic Principles Will Shape Digital Marketing - BrightonSEO 2014
How Journalistic Principles Will Shape Digital Marketing - BrightonSEO 2014How Journalistic Principles Will Shape Digital Marketing - BrightonSEO 2014
How Journalistic Principles Will Shape Digital Marketing - BrightonSEO 2014
 
Stop Blind Marketing, Start Selling Through Content - #BrightonSEO by Matt Evans
Stop Blind Marketing, Start Selling Through Content - #BrightonSEO by Matt EvansStop Blind Marketing, Start Selling Through Content - #BrightonSEO by Matt Evans
Stop Blind Marketing, Start Selling Through Content - #BrightonSEO by Matt Evans
 
Time = Money: Marketing Lifehacks
Time = Money: Marketing LifehacksTime = Money: Marketing Lifehacks
Time = Money: Marketing Lifehacks
 
Quick Technical SEO Audit Checklist - Peter Handley Brighton SEO April 2014
Quick Technical SEO Audit Checklist - Peter Handley Brighton SEO April 2014Quick Technical SEO Audit Checklist - Peter Handley Brighton SEO April 2014
Quick Technical SEO Audit Checklist - Peter Handley Brighton SEO April 2014
 
The Habits that Land You Links #brightonseo 2014 by @staceycav
The Habits that Land You Links #brightonseo 2014 by @staceycavThe Habits that Land You Links #brightonseo 2014 by @staceycav
The Habits that Land You Links #brightonseo 2014 by @staceycav
 
The Need for Speed (5 Performance Optimization Tipps) - brightonSEO 2014
The Need for Speed (5 Performance Optimization Tipps) - brightonSEO 2014The Need for Speed (5 Performance Optimization Tipps) - brightonSEO 2014
The Need for Speed (5 Performance Optimization Tipps) - brightonSEO 2014
 
Adthena #BrightonSEO_Competitive Intelligence @_hifi_lofi
Adthena #BrightonSEO_Competitive Intelligence @_hifi_lofi Adthena #BrightonSEO_Competitive Intelligence @_hifi_lofi
Adthena #BrightonSEO_Competitive Intelligence @_hifi_lofi
 
BrightonSEO Sep 2015 - HTTPS | Mark Thomas
BrightonSEO Sep 2015 - HTTPS | Mark Thomas BrightonSEO Sep 2015 - HTTPS | Mark Thomas
BrightonSEO Sep 2015 - HTTPS | Mark Thomas
 
From Concept to Completion: Tips for Designing Great Content
From Concept to Completion: Tips for Designing Great ContentFrom Concept to Completion: Tips for Designing Great Content
From Concept to Completion: Tips for Designing Great Content
 
Sem days mobile 2015
Sem days mobile 2015Sem days mobile 2015
Sem days mobile 2015
 
Mark Thomas - 10 Step Technical SEO Game Plan (annotated edition)
Mark Thomas - 10 Step Technical SEO Game Plan (annotated edition)Mark Thomas - 10 Step Technical SEO Game Plan (annotated edition)
Mark Thomas - 10 Step Technical SEO Game Plan (annotated edition)
 
Seasonality in Keyword Research
Seasonality in Keyword ResearchSeasonality in Keyword Research
Seasonality in Keyword Research
 
How to deliver cheap (not nasty) SEO - Brighton SEO 04/2014
How to deliver cheap (not nasty) SEO - Brighton SEO 04/2014How to deliver cheap (not nasty) SEO - Brighton SEO 04/2014
How to deliver cheap (not nasty) SEO - Brighton SEO 04/2014
 
Nancy Scott - Cancer Research UK - SEO & Crawlability
Nancy Scott - Cancer Research UK - SEO & CrawlabilityNancy Scott - Cancer Research UK - SEO & Crawlability
Nancy Scott - Cancer Research UK - SEO & Crawlability
 
The Gather Project: A Review of Healthcare Social Media 10.10
The Gather Project: A Review of Healthcare Social Media 10.10The Gather Project: A Review of Healthcare Social Media 10.10
The Gather Project: A Review of Healthcare Social Media 10.10
 
εφημεριδα πεντελης
εφημεριδα πεντεληςεφημεριδα πεντελης
εφημεριδα πεντελης
 
Wiki diagonismos
Wiki diagonismosWiki diagonismos
Wiki diagonismos
 

Último

Beyond Resumes_ How Volunteering Shapes Career Trajectories by Kent Kubie
Beyond Resumes_ How Volunteering Shapes Career Trajectories by Kent KubieBeyond Resumes_ How Volunteering Shapes Career Trajectories by Kent Kubie
Beyond Resumes_ How Volunteering Shapes Career Trajectories by Kent KubieKent Kubie
 
Do More with Less: Navigating Customer Acquisition Challenges for Today's Ent...
Do More with Less: Navigating Customer Acquisition Challenges for Today's Ent...Do More with Less: Navigating Customer Acquisition Challenges for Today's Ent...
Do More with Less: Navigating Customer Acquisition Challenges for Today's Ent...Search Engine Journal
 
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO SuccessBrighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO SuccessVarn
 
Mastering SEO in the Evolving AI-driven World
Mastering SEO in the Evolving AI-driven WorldMastering SEO in the Evolving AI-driven World
Mastering SEO in the Evolving AI-driven WorldScalenut
 
How to Leverage Behavioral Science Insights for Direct Mail Success
How to Leverage Behavioral Science Insights for Direct Mail SuccessHow to Leverage Behavioral Science Insights for Direct Mail Success
How to Leverage Behavioral Science Insights for Direct Mail SuccessAggregage
 
Unraveling the Mystery of Roanoke Colony: What Really Happened?
Unraveling the Mystery of Roanoke Colony: What Really Happened?Unraveling the Mystery of Roanoke Colony: What Really Happened?
Unraveling the Mystery of Roanoke Colony: What Really Happened?elizabethella096
 
Local SEO Domination: Put your business at the forefront of local searches!
Local SEO Domination:  Put your business at the forefront of local searches!Local SEO Domination:  Put your business at the forefront of local searches!
Local SEO Domination: Put your business at the forefront of local searches!dstvtechnician
 
Social Samosa Guidebook for SAMMIES 2024.pdf
Social Samosa Guidebook for SAMMIES 2024.pdfSocial Samosa Guidebook for SAMMIES 2024.pdf
Social Samosa Guidebook for SAMMIES 2024.pdfSocial Samosa
 
Google 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
Google 3rd-Party Cookie Deprecation [Update] + 5 Best StrategiesGoogle 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
Google 3rd-Party Cookie Deprecation [Update] + 5 Best StrategiesSearch Engine Journal
 
Situation Analysis | Management Company.
Situation Analysis | Management Company.Situation Analysis | Management Company.
Situation Analysis | Management Company.DanielaQuiroz63
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Film Nagar high-profile Call ...
VIP 7001035870 Find & Meet Hyderabad Call Girls Film Nagar high-profile Call ...VIP 7001035870 Find & Meet Hyderabad Call Girls Film Nagar high-profile Call ...
VIP 7001035870 Find & Meet Hyderabad Call Girls Film Nagar high-profile Call ...aditipandeya
 
How To Utilize Calculated Properties in your HubSpot Setup
How To Utilize Calculated Properties in your HubSpot SetupHow To Utilize Calculated Properties in your HubSpot Setup
How To Utilize Calculated Properties in your HubSpot Setupssuser4571da
 

Último (20)

Beyond Resumes_ How Volunteering Shapes Career Trajectories by Kent Kubie
Beyond Resumes_ How Volunteering Shapes Career Trajectories by Kent KubieBeyond Resumes_ How Volunteering Shapes Career Trajectories by Kent Kubie
Beyond Resumes_ How Volunteering Shapes Career Trajectories by Kent Kubie
 
Turn Digital Reputation Threats into Offense Tactics - Daniel Lemin
Turn Digital Reputation Threats into Offense Tactics - Daniel LeminTurn Digital Reputation Threats into Offense Tactics - Daniel Lemin
Turn Digital Reputation Threats into Offense Tactics - Daniel Lemin
 
How to Create a Social Media Plan Like a Pro - Jordan Scheltgen
How to Create a Social Media Plan Like a Pro - Jordan ScheltgenHow to Create a Social Media Plan Like a Pro - Jordan Scheltgen
How to Create a Social Media Plan Like a Pro - Jordan Scheltgen
 
Creator Influencer Strategy Master Class - Corinne Rose Guirgis
Creator Influencer Strategy Master Class - Corinne Rose GuirgisCreator Influencer Strategy Master Class - Corinne Rose Guirgis
Creator Influencer Strategy Master Class - Corinne Rose Guirgis
 
Do More with Less: Navigating Customer Acquisition Challenges for Today's Ent...
Do More with Less: Navigating Customer Acquisition Challenges for Today's Ent...Do More with Less: Navigating Customer Acquisition Challenges for Today's Ent...
Do More with Less: Navigating Customer Acquisition Challenges for Today's Ent...
 
Brand Strategy Master Class - Juntae DeLane
Brand Strategy Master Class - Juntae DeLaneBrand Strategy Master Class - Juntae DeLane
Brand Strategy Master Class - Juntae DeLane
 
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO SuccessBrighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
 
Mastering SEO in the Evolving AI-driven World
Mastering SEO in the Evolving AI-driven WorldMastering SEO in the Evolving AI-driven World
Mastering SEO in the Evolving AI-driven World
 
How to Leverage Behavioral Science Insights for Direct Mail Success
How to Leverage Behavioral Science Insights for Direct Mail SuccessHow to Leverage Behavioral Science Insights for Direct Mail Success
How to Leverage Behavioral Science Insights for Direct Mail Success
 
Unraveling the Mystery of Roanoke Colony: What Really Happened?
Unraveling the Mystery of Roanoke Colony: What Really Happened?Unraveling the Mystery of Roanoke Colony: What Really Happened?
Unraveling the Mystery of Roanoke Colony: What Really Happened?
 
Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...
Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...
Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...
 
Local SEO Domination: Put your business at the forefront of local searches!
Local SEO Domination:  Put your business at the forefront of local searches!Local SEO Domination:  Put your business at the forefront of local searches!
Local SEO Domination: Put your business at the forefront of local searches!
 
BUY GMAIL ACCOUNTS PVA USA IP INDIAN IP GMAIL
BUY GMAIL ACCOUNTS PVA USA IP INDIAN IP GMAILBUY GMAIL ACCOUNTS PVA USA IP INDIAN IP GMAIL
BUY GMAIL ACCOUNTS PVA USA IP INDIAN IP GMAIL
 
The Future of Brands on LinkedIn - Alison Kaltman
The Future of Brands on LinkedIn - Alison KaltmanThe Future of Brands on LinkedIn - Alison Kaltman
The Future of Brands on LinkedIn - Alison Kaltman
 
Social Samosa Guidebook for SAMMIES 2024.pdf
Social Samosa Guidebook for SAMMIES 2024.pdfSocial Samosa Guidebook for SAMMIES 2024.pdf
Social Samosa Guidebook for SAMMIES 2024.pdf
 
No Cookies No Problem - Steve Krull, Be Found Online
No Cookies No Problem - Steve Krull, Be Found OnlineNo Cookies No Problem - Steve Krull, Be Found Online
No Cookies No Problem - Steve Krull, Be Found Online
 
Google 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
Google 3rd-Party Cookie Deprecation [Update] + 5 Best StrategiesGoogle 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
Google 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
 
Situation Analysis | Management Company.
Situation Analysis | Management Company.Situation Analysis | Management Company.
Situation Analysis | Management Company.
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Film Nagar high-profile Call ...
VIP 7001035870 Find & Meet Hyderabad Call Girls Film Nagar high-profile Call ...VIP 7001035870 Find & Meet Hyderabad Call Girls Film Nagar high-profile Call ...
VIP 7001035870 Find & Meet Hyderabad Call Girls Film Nagar high-profile Call ...
 
How To Utilize Calculated Properties in your HubSpot Setup
How To Utilize Calculated Properties in your HubSpot SetupHow To Utilize Calculated Properties in your HubSpot Setup
How To Utilize Calculated Properties in your HubSpot Setup
 

Breaking Bad SEO - The Science of Crawl Space

  • 1. Breaking Bad SEO The Science of Crawl Space
  • 2. Meet Walter… Meet Walter, the chemistry teacher. Despite getting his hands pretty dirty throughout the series, Walt is fundamentally a good guy and as the star of the show, from a search perspective he should be considered a white hat SEO. As a scientist, Walt is methodical in his approach, he understands the principles and practices needed to achieve results and has the skill to produce the very best crystal meth.
  • 3. Jesse Pinkman Jesse Pinkman, on the other hand, is a Black Hat SEO. An ex-student of Walt’s who failed chemistry at school and became a drug dealer. Jesse knows his industry and has the right contacts to get by, but he isn’t really interested in delivering the very best results or long term plans. So why "Crawl Space“?
  • 4. The Totality of Possible URLs For a Website “[You]…have several options to optimize the “crawl space” (the totality of URLs on your site known to Googlebot) for unique content pages, reduce crawling of duplicative pages, and consolidate indexing signals.” Google Webmaster Central Blog http://googlewebmastercentral.blogspot.co.uk/2014/02/faceted-navigation-best-and-5-of-worst.html What is Crawl Space? Google regularly refer to crawl space - it’s fundamentally about knowing your site architecture - the first step towards successful website optimisation. Identifying your crawl space is the first step towards knowing your site architecture. Why should we care about crawl space?
  • 5. Size Does Matter… Google warn you about potential crawl space issues in GWT and outline some of the implications… •Unnecessarily crawling a large number of duplicate content URLs •Discovering undesired parts of your site •Consuming more bandwidth than necessary •Potential inability to completely index all of your site.
  • 6. domain.com domain.com/home domain.com/home/ domain.com/Home domain.com/?source=ppc domain.com/index.html How’d I Get so Many Pages? So check what’s in the indexes already, is it a realistic number of pages for your site? Do you recognise the URL structure/formats being returned? The most common contributor to a large crawl space is duplicate content - seen here with BooHoo.com, which can be caused by a number of valid and invalid reasons.
  • 7. For SEO… There Can Only Be One!
  • 8. But Don’t Lose Your Head Auditing your Crawl Space will help you understand the full mix of URLs and help formulate a consistent implementation.
  • 9. What Makes Up Your Crawl Space? To build a picture of your site architecture and to discover your complete crawl space, you need to consider what contributes to your URL Universe – the place where all theoretically possible URLs exist.
  • 10. Poor management has implications: •Orphaned pages •Incomplete sitemaps •Dilution of backlinks and shares •Performance is harder to track •Limited volume of unique URLs in analytics •Crawl inefficiency for SEO •Traffic growth strategies just won’t work The Threats A lack of Crawl Space management can leave you wide open to threats, & lead to a load of GWT warnings!
  • 11. Do social signals impact organic rankings? •Twitter? •Facebook? •Google+ Potentially Social and Backlink Synergy Who cares!? Even if social signals are not used in a ranking algorithm, social media and SEO are both able to drive discovery at scale. They both have enormous reach so understanding the crawl space for search & social is crucial. URL aggregation is not necessarily required, I’d recommend it. So let take a look at how to tackle all this…
  • 12. To get going we need all the components: •A Cook •A Lab •The Formula •Organic Ingredients •Recipe Let’s Cook!
  • 13. Be the Boss (of URLs) •An SEO needs to be a great cook… • Make sure you have the right equipment • Follow an accurate formula • Use quality ingredients •Take ownership of your crawl space • Benchmark • URL Roadmap The Cook – YOU! Just like Walt, you need to be the boss, the head chef, the daddy, ideally a scientist – but not a prerequisite if you have the right tools to help you take ownership of your crawl space.
  • 14. The Lab This is Jesse’s RV – effectively a mobile meth lab. We need to set up our lab with the tools and equipment necessary to develop search marketing recipes. Need to link a range of data sources to understand the crawl space: •Webmaster Tools - Google & Bing •Landing page data (Analytics) •Linked URL data - WMT/hrefs/Majestic/OSE •Website crawler (Xenu/Screaming Frog/DeepCrawl) DC’s new Universal Crawl now includes Google Analytics landing page data, along with link equity and social tagging reports to assess a comprehensive crawl space in one.
  • 15. Maximised + Minimised = Optimised Maximise indexable space •Increase volume of valuable pages •Increase crawl efficiency Minimise crawlable space •Define your crawl space •Identify and eliminate threats Optimise canonical space •A clean version of your website •Your URL à la carte The Formula Discovery, management & optimisation of crawl space is essential and lays the foundation for strong performance. All spaces need to be carefully defined and managed efficiently.
  • 16. Use Organic Ingredients As SEOs, we have a whole host of juicy ingredients at our disposal to help cook up an optimised crawl space - picking the very best ingredients for your recipe will make all the difference in testing your recipes. Use the custom controls in DC to extract and schedule regular data comparisons for each source… Overwrite robots.txt rules to assess full crawl space as well as the one delivered to search engines. Schedule regular sitemap crawls and compare against internal and external links, review canonical setup and index controls to maintain consistency. - DeepCrawl Backlinks crawl - OG tags & Hreflang DeepCrawl report
  • 17. So What’s the Recipe?
  • 18. Crawl Space Solutions in One… New Universal Crawl is now out of Beta! Review website, XML sitemaps & organic landing page data in one Universal Crawl with Deep Crawl. Take advantage of a significant head start in defining, managing & optimising your crawl spaces. Universal Crawl is the New Heisenberg…
  • 19. Deep Crawl Goes Universal
  • 20. Deep Crawl Goes Universal DC’s Universal Crawl helps you quickly and easily understand your crawl space and identify URL sources, gaps, new formats and traffic value, plus you get all the regular DC features including fully customisable crawl settings, ability to compare test environment vs live site (support QA), custom extraction tools and scheduled crawls that record change, impact and help quantify SEO deliverables. Let’s take a look at some recipes to control your crawl space… DeepCrawl automatically shows you what’s changed between crawls so you can understand how much of the site is changing. You might spot some URL formats which are changing very frequently and affecting the crawl efficiency.
  • 21. Understand what’s in your crawl space… •Assess indexation reports •Review the current index •Test URL parameter changes •Quick improvements through GWT Parameter settings Identify Indexable & Non-indexable Parameters DeepCrawl indexation reports help you quickly assess all crawlable URLs, unique pages, noindex pages and identify URL parameters that you might not want indexed. Use Webmaster Tools data and site: checks to understand current search engine indexation and use the Parameter Removal controls in DeepCrawl to test the impact of stripping parameters. Monitor crawl efficiency changes and, for a quick win, update your URL parameter settings in Webmaster Tools when you find the right formula.
  • 22. Make sure you’re not self harming •Check canonical implementation •All canonical pages should be linked internally •Assess pages without canonical tags Canonical URL Configuration
  • 23. Which URLs Are Being Shared? Check the consistency of canonical URLs against social URLs – are your OG & TwitterCard URLs consistent? DeepCrawl has a report called ‘Inconsistent Open Graph and Canonical URLs’ to automatically show any errors. You can also schedule custom extraction crawls to regularly assess social share equity changes for specific URLs using DC. Likewise for blog comments etc.
  • 24. Where’s your thin content hiding out? Identify Low Value Navigational Pages
  • 25. Take a good look at your analytics… •Review URLs delivering minimal traffic •Identify and assess URLs outside of your canonical setup Identify Low Value Non Navigational URLs Review and assess if you can make your crawl more efficient by excluding certain pages. Check your non-canonical URLs & use Universal Crawl to see non-navigational pages not driving traffic.
  • 26. Identify Domain Aliases Understanding who’s working with you or who’s against you is important too. Review your domain aliases. Test all registered domains to check if they return a duplicate or redirect. Check www/non-www configuration and HTTP/HTTPS. Implement redirects or use cross domain canonical setups.
  • 27. Monitor your domain portfolio & keep alert! www.robotto.org Monitor domains using Robotto.org
  • 28. Check all disallowed URLs… •Webmaster Tools •Deep Crawl Indexation Reports Disallowed URLs DeepCrawl reports all ‘Disallowed URLs’ so you can easily see what's already being excluded. Test changes to your robots.txt file using the Robots Overwrite functionality & develop an optimised file that increases disallowed URLs & focuses the crawl on primary pages - test the impact.
  • 29. Review & Validate all linked URLs… Identify All Linked URLs Check that your website internal linking is working towards an optimised crawl space. Use DeepCrawl internal broken links, redirected links, 4xx and 5xx error reports to identify internal links that are broken or point to redirected URLs that may be affecting your crawl efficiency.
  • 30. Crawl your sitemap regularly… •Run analysis and compare: • Scheduled sitemap crawls • Scheduled website crawls • All validated, linked URLs Compare Sitemaps Identify pages ‘Only in Sitemaps’ and not linked internally, plus all pages linked internally, which are not in the Sitemaps. Schedule regular sitemap & comparative website crawls to assess what’s changed, you can monitor how much of the site is changing & map against performance. You might spot some URL formats which are changing frequently and impacting crawl efficiency.
  • 31. Where’s the link equity? •Identify pages delivering traffic but not internally linked •Understand the link profile of all pages: • Crawl aggregated link data •DeepCrawl automatically applies link metrics to all reports: Compare Landing Page URLs to Linked URLs By comparing sitemaps, GA and internal links, a Universal Crawl easily highlights URLs discovered ‘Only in Organic Landing Pages’ and not linked internally. You can add ‘Backlink Crawls’ to your DC projects, simply upload a comprehensive backlink profile URL list and let DC crawl and validate the URLs. This crawl also helps identify pages generating traffic but not necessarily linked internally, plus it automatically applies inbound link equity metrics to all DC reports at a URL level – very useful.
  • 32. Watch your Language… Check your Hreflang! International SEO Considerations The correct use and implementation of HrefLang tags is important for effective International SEO, but it can be confusing and even experienced SEOs get it wrong.
  • 33. • Universal Crawl tests implementation across: • Sitemaps • Headers • On-page • Review a matrix of language alternatives for each page • Assess gaps and inconsistencies in the setup • Review a ‘Pages Without Hreflang Tags’ report • See David Sottimano’s MOZ post on HrefLang: http://moz.com/blog/hreflang-behaviour-insights International SEO Considerations DeepCrawl helps manage a seamless HrefLang integration by detecting hreflang tags in sitemaps, headers and on-page before showing a matrix of language alternatives for every page so you can see the gaps and inconsistencies in the setup.
  • 34. • Review your options and consider your URL Universe • Set up your lab • Google Analytics • Google/Bing Webmaster Tools • Deep Crawl – Universal Crawl • Follow the formula: • Maximised + Minimised = Optimised • Develop and test new recipes to focus your crawl spaces. #BreakingBadSEO
  • 35. Thanks, Keep in Touch…

Editor’s Notes

  1. Why Breaking Bad? It’s the story of a high school chemistry teacher who discovers he has cancer and starts producing crystal meth to earn some fast cash to help fund his treatment and leave some money for his family if he dies. It got me thinking about SEO as a science: it requires knowledge, skill and regular testing to deliver the right results or improvements. Even the lead characters reflect certain aspects of the SEO industry…
  2. Meet Walter, the chemistry teacher. Despite getting his hands pretty dirty throughout the series, Walt is fundamentally a good guy and as the star of the show, from a search perspective he should be considered a white hat SEO. As a scientist, Walt is methodical in his approach, he understands the principles and practices needed to achieve results and has the skill to produce the very best crystal meth.
  3. Jesse Pinkman, on the other hand, is a Black Hat SEO. An ex-student of Walt’s who failed chemistry at school and became a drug dealer. Jesse knows his industry and has the right contacts to get by, but he isn’t really interested in delivering the very best results or long-term plans. So why “Crawl Space”?
  4. The Totality of Possible URLs For a Website Google regularly refer to crawl space - it’s fundamentally about knowing your site architecture - the first step towards successful website optimisation. Identifying your crawl space is the first step towards knowing your site architecture. Why should we care about crawl space?
  5. Because size matters. Google warn you about potential crawl space issues in GWT and outline some of the implications… Unnecessarily crawling a large number of duplicate content URLs Discovering undesired parts of your site Consuming more bandwidth than necessary Potential inability to completely index all of your site.
  6. So check what’s in the search engine indexes already, is it a realistic number of pages for your site? Do you recognise the URL structure/formats being returned? The most common contributor to a large crawl space is duplicate content - seen here with BooHoo.com, which can be caused by a number of valid and invalid reasons. For SEO, when it comes to URLs per page… (There can only be one!)
  7. There can only be one.
  8. But don’t lose your head… Auditing your Crawl Space will help you understand the full mix of URLs and help formulate a consistent implementation.
  9. To build a picture of your site architecture and to discover your complete crawl space, you need to consider what contributes to your URL Universe – the place where all theoretically possible URLs exist:
     • Invoked URLs – All of the URLs ever brought into existence, for any reason.
     • Uninvoked URLs – URLs that haven’t been invoked (but could be?).
     • Internally Linked URLs – All Invoked URLs which are linked from other internal pages.
     • URLs in Sitemaps – All of the Invoked URLs which are currently included in a Sitemap.
     • Crawlable URLs – All the URLs in the Universe which could be crawled by the search engine.
     • Indexable URLs – All the URLs in the Universe which could be indexed by the search engine.
     • Canonical URLs – All the clean canonical URLs; these should be your crawlable, unique pages.
     • Indexed URLs – All crawled pages which are now in the search engine’s index.
     • Socially Shared URLs – All URLs being shared across social media platforms: Facebook posts, tweets, etc.
     • Organic Landing Page URLs – All indexed URLs which have driven traffic from organic search results.
     • Externally Linked URLs – All URLs linked from another site.
     • Mobile URLs
     • Translated/Regional URLs
     • Shortened URLs – external URLs that point to internal pages.
     • Domain Duplicates – aliased domains, www/non-www, http/https.
  10. A lack of Crawl Space management can leave you wide open to threats, and lead to a whole load of GWT warnings!
  11. Who cares!? Even if social signals are not used in a ranking algorithm, social media and SEO are both able to drive discovery at scale. They both have enormous reach, so understanding the crawl space for search & social is crucial. URL aggregation is not necessarily required, but I’d recommend it. So let’s take a look at how to tackle all this… Let’s Cook.
  12. To get going we need all the components: A Cook A Lab The Formula Ingredients Recipe
  13. Just like Walt, you need to be the boss, the head chef, the daddy, ideally a scientist – but not a prerequisite if you have the right tools to help you take ownership of your crawl space. The Lab…
  14. This is Jesse’s RV – effectively a mobile meth lab. We need to set up our lab with the tools and equipment necessary to develop search marketing recipes. That means linking up a range of data sources to help us understand the crawl space: Webmaster Tools - Google & Bing Landing page data (Analytics) Linked URL data - WMT/Ahrefs/Majestic/OSE Website crawler (Xenu/Screaming Frog/DeepCrawl) DC’s new Universal Crawl now includes Google Analytics landing page data, along with link equity and social tagging reports to assess a comprehensive crawl space in one. The Formula…
  15. Maximised + Minimised = Optimised Discovery, management and optimisation of your crawl space is essential and lays the foundation for strong performance. All spaces need to be carefully defined and managed efficiently. Maximise indexable space, Minimise crawlable space, Optimise canonical space. Thankfully the search engines empower you with some great ingredients to help you develop your recipes…
  16. As SEOs, we have a whole host of juicy ingredients at our disposal to help cook up an optimised crawl space - picking the very best ingredients for your recipe will make all the difference in testing your recipes. Use the custom controls in DC to extract and schedule regular data comparisons for each source… Overwrite robots.txt rules to assess full crawl space as well as the one delivered to search engines. Schedule regular sitemap crawls and compare against internal and external links, review canonical setup and index controls to maintain consistency. - DeepCrawl Backlinks crawl - OG tags & Hreflang DeepCrawl report
  17. New Universal Crawl out of Beta – Capture and audit comprehensive website, XML sitemap & organic landing page data in one Universal Crawl - designed to capture and inform your crawl space optimisation for search and social using a wide range of data sources.
  18. DC’s Universal Crawl helps you quickly and easily understand your crawl space and identify URL sources, gaps, new formats and traffic value, plus you get all the regular DC features including fully customisable crawl settings, ability to compare test environment vs live site (support QA), custom extraction tools and scheduled crawls that record change, impact and help quantify SEO deliverables. Let’s take a look at some recipes to control your crawl space… DeepCrawl automatically shows you what’s changed between crawls so you can understand how much of the site is changing. You might spot some URL formats which are changing very frequently and affecting the crawl efficiency.
  20. DeepCrawl indexation reports help you quickly assess all crawlable URLs, unique pages, noindex pages and identify URL parameters that you might not want indexed. Use Webmaster Tools data and site: checks to understand current search engine indexation and use the Parameter Removal controls in DeepCrawl to test the impact of stripping parameters. Monitor crawl efficiency changes and, for a quick win, update your URL parameter settings in Webmaster Tools when you find the right formula.
  21. Use the ‘Canonicalized Pages’ report to ensure your canonical implementation is correct. Identify canonical URLs which aren’t linked internally with the ‘Unlinked Canonical Pages’ report – you’d expect every canonical URL to be linked somewhere internally. The ‘Pages without Canonical Tag’ report shows you pages that are missing canonical tags; make sure there aren’t any important pages included here.
  22. Check the consistency of your canonical URLs against social URLs – are your OG & TwitterCard URLs consistent? DeepCrawl has a report called ‘Inconsistent Open Graph and Canonical URLs’ to automatically show any errors. You can also schedule custom extraction crawls to regularly assess social share equity changes for specific URLs using DC. Likewise for blog comments etc.
  23. Use DeepCrawl’s Min. Content/HTML Ratio report to find potential Panda pages linked internally: all pages with a content/HTML ratio below 10 percent.
  24. Review and assess if you can make your crawl more efficient by excluding certain pages. Check your non-canonical URLs and use Universal Crawl to see non-navigational pages not driving traffic.
  25. Understanding who’s working with you or who’s against you is important too. Review your domain aliases. Test all registered domains to check if they return a duplicate or redirect. Check www/non-www configuration and HTTP/HTTPS. Implement redirects or use cross domain canonical setups. Monitor domains using Robotto.org
  26. DeepCrawl reports all ‘Disallowed URLs’ so you can easily see what's already being excluded. Test changes to your robots.txt file using the Robots Overwrite functionality and develop an optimised file that increases disallowed URLs and focuses your crawl on primary pages - test the impact.
  27. Check that your website internal linking is working towards an optimised crawl space. Use DeepCrawl internal broken links, redirected links, 4xx and 5xx error reports to identify internal links on your site that are broken or point to redirected URLs that may be affecting your crawl efficiency.
  28. Identify pages ‘Only in Sitemaps’ and not linked internally, plus all pages linked internally which are not in the Sitemaps. Schedule regular sitemap and comparative website crawls to assess what’s changed; you can monitor how much of the site is changing and map it against performance. You might spot some URL formats which are changing frequently and impacting crawl efficiency.
  29. By comparing sitemaps, GA and internal links, a Universal Crawl easily highlights URLs discovered ‘Only in Organic Landing Pages’ and not linked internally. You can add ‘Backlink Crawls’ to your DC projects, simply upload a comprehensive backlink profile URL list and let DC crawl and validate the URLs. This crawl also helps identify pages generating traffic but not necessarily linked internally, plus it automatically applies inbound link equity metrics to all DC reports at a URL level – very useful.
  30. The correct use and implementation of HrefLang tags is important for effective International SEO, but it can be confusing and even experienced SEOs get it wrong.
  31. DeepCrawl helps manage a seamless HrefLang integration by detecting hreflang tags in sitemaps, headers and on-page before showing a matrix of language alternatives for every page so you can see the gaps and inconsistencies in the setup.
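The duplicate variants behind “There can only be one” (notes 6–7) and the parameter-stripping test in note 20 can be prototyped without a crawler. The Python below is a minimal sketch, not DeepCrawl’s Parameter Removal feature, and it rests on three site-specific assumptions: paths are treated case-insensitively, `/index.html` serves the same page as its directory, and the parameters in the hypothetical `NON_INDEXABLE_PARAMS` set never change page content.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed NOT to change page content on this site.
NON_INDEXABLE_PARAMS = frozenset({"source", "utm_source", "utm_medium", "utm_campaign", "sessionid"})


def normalise(url: str, strip_params=NON_INDEXABLE_PARAMS) -> str:
    """Collapse common duplicate variants of a URL into one form.

    Assumptions (check against your own server): paths are case-insensitive,
    /index.html is the same page as its directory, and strip_params can be
    dropped without changing content.
    """
    parts = urlsplit(url)
    path = parts.path.lower() or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]   # /dir/index.html -> /dir/
    path = path.rstrip("/") or "/"           # drop trailing slash, keep root
    # Keep only content-changing parameters, sorted so their order doesn't matter.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in strip_params
    )
    # The fragment is dropped: browsers never send it to the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


def group_duplicates(urls):
    """Map each normalised URL to the raw variants that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return dict(groups)


variants = [
    "http://domain.com",
    "http://domain.com/home",
    "http://domain.com/home/",
    "http://domain.com/Home",
    "http://domain.com/?source=ppc",
    "http://domain.com/index.html",
]
for canonical, dupes in group_duplicates(variants).items():
    print(canonical, "<-", dupes)
```

Run on the six `domain.com` variants from the start of the deck, the sketch collapses them into two groups, `/` and `/home`; whether `/home` is really the homepage is something only your redirects or canonical tags can settle.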
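The canonical and Open Graph checks from notes 21–22 can be approximated with Python’s standard-library `html.parser`. This is an illustrative sketch, not how DeepCrawl’s reports work internally: it compares exact strings only, inspects `rel=canonical` and `og:url` (Twitter Card tags are left out), and assumes you already have each page’s HTML and a list of internally linked URLs from a crawl. The function names are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Records the rel=canonical href and og:url content of one HTML page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.og_url = None

    def handle_starttag(self, tag, attrs):
        # html.parser already lower-cases tag and attribute names;
        # attribute values may be None, so default them to "".
        a = {name: value or "" for name, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("property") == "og:url":
            self.og_url = a.get("content")


def audit_page(html: str) -> list:
    """Return the canonical/Open Graph issues found in one page."""
    parser = CanonicalParser()
    parser.feed(html)
    parser.close()
    issues = []
    if parser.canonical is None:
        issues.append("missing canonical tag")
    elif parser.og_url is not None and parser.og_url != parser.canonical:
        issues.append("og:url differs from canonical")
    return issues


def unlinked_canonicals(canonical_urls, internally_linked_urls):
    """Canonical URLs that no internal link points to (ideally empty)."""
    return sorted(set(canonical_urls) - set(internally_linked_urls))
```

An empty list from `audit_page` means the canonical tag exists and any `og:url` matches it exactly; URLs that differ only by a trailing slash would still be reported, which is usually worth fixing anyway.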
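Note 23’s thin-content check relies on a content/HTML ratio. DeepCrawl’s exact formula isn’t given in the deck, so this Python sketch assumes a simple definition: visible text characters (excluding `<script>` and `<style>`, with whitespace collapsed) divided by total HTML characters, flagged below the 10 percent threshold quoted in the note. `content_ratio` and `thin_pages` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def content_ratio(html: str) -> float:
    """Visible text length divided by raw HTML length (assumed definition)."""
    if not html:
        return 0.0
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    return len(text) / len(html)


def thin_pages(pages, threshold=0.10):
    """URLs whose content/HTML ratio falls below the threshold."""
    return [url for url, html in pages.items() if content_ratio(html) < threshold]
```

A markup-heavy page with a two-word paragraph lands well under 10 percent, while a page that is mostly body copy approaches 100 percent; boilerplate navigation counts as text here, so templates with large menus can mask genuinely thin pages.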
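Before changing robots.txt (note 26), you can test a proposed file against a list of crawled URLs with Python’s `urllib.robotparser`. Treat the result as a rough guide: the standard-library parser applies the first matching rule by simple prefix and does not implement Google’s `*` and `$` wildcards or longest-match precedence, so complex files may behave differently for Googlebot. Both robots.txt files and the URL list are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser


def disallowed_urls(robots_txt: str, urls, user_agent: str = "Googlebot"):
    """Return the URLs that robots_txt would block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


current = """User-agent: *
Disallow: /cart
"""
# Hypothetical proposed file that also blocks internal search results.
proposed = """User-agent: *
Disallow: /cart
Disallow: /search
"""
crawled = [
    "https://example.com/",
    "https://example.com/cart",
    "https://example.com/search?q=shoes",
    "https://example.com/product/1",
]
print(len(disallowed_urls(current, crawled)), "->", len(disallowed_urls(proposed, crawled)))
```

For these inputs the proposed file blocks one extra URL (`1 -> 2`), the kind of before-and-after comparison note 26 recommends running before a new file goes live.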
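The sitemap, internal-link and landing-page comparisons in notes 28–29 boil down to set differences. The labels below mirror the DeepCrawl report names quoted in those notes, but the definitions are this sketch’s own simplified assumptions; in particular, “only in organic landing pages” is taken to mean neither linked internally nor listed in a sitemap. All inputs are assumed to have been normalised the same way first.

```python
def compare_url_sources(sitemap_urls, linked_urls, landing_urls):
    """Set comparisons between three URL sources.

    Assumes all URLs were normalised identically beforehand; otherwise
    trivial variants (trailing slash, case) show up as false gaps.
    """
    sitemap, linked, landing = set(sitemap_urls), set(linked_urls), set(landing_urls)
    return {
        # In a sitemap but never linked internally (possible orphans).
        "only_in_sitemaps": sorted(sitemap - linked),
        # Linked internally but missing from the sitemaps.
        "linked_not_in_sitemaps": sorted(linked - sitemap),
        # Driving organic traffic, yet neither linked nor in a sitemap.
        "only_in_organic_landing_pages": sorted(landing - linked - sitemap),
    }


report = compare_url_sources(
    sitemap_urls=["/", "/a", "/b"],
    linked_urls=["/", "/a", "/c"],
    landing_urls=["/", "/b", "/d"],
)
for name, urls in report.items():
    print(name, urls)
```

Running the same comparison on each scheduled crawl and diffing the results gives a simple way to see which URL formats appear or disappear between crawls.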
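Google only honours hreflang annotations when the alternate pages link back to each other, and missing return links are one of the inconsistencies note 31’s matrix is meant to surface. This Python sketch assumes you have already extracted, for each page, the `{language code: alternate URL}` pairs declared in sitemaps, HTTP headers or on-page tags; it prints a simple matrix and lists missing return links. The function names and example URLs are hypothetical.

```python
def hreflang_matrix(annotations):
    """Print which language alternates each page declares ('-' = none)."""
    langs = sorted({lang for alts in annotations.values() for lang in alts})
    print("page".ljust(28), *(lang.ljust(4) for lang in langs))
    for page, alts in annotations.items():
        print(page.ljust(28), *(("yes" if lang in alts else "-").ljust(4) for lang in langs))


def missing_return_links(annotations):
    """Find hreflang alternates that don't link back to the declaring page.

    annotations maps each crawled page URL to {hreflang code: alternate URL}
    as declared on that page.
    """
    problems = []
    for page, alts in annotations.items():
        for lang, alt in alts.items():
            if alt == page:
                continue  # self-reference, nothing to reciprocate
            back = annotations.get(alt)
            if back is None:
                problems.append((page, lang, alt, "alternate not crawled"))
            elif page not in back.values():
                problems.append((page, lang, alt, "no return link"))
    return problems


pages = {
    "https://example.com/en/": {"en": "https://example.com/en/", "de": "https://example.com/de/", "fr": "https://example.com/fr/"},
    "https://example.com/de/": {"de": "https://example.com/de/", "en": "https://example.com/en/"},
    "https://example.com/fr/": {"fr": "https://example.com/fr/"},
}
hreflang_matrix(pages)
print(missing_return_links(pages))
```

Here the English page declares a French alternate, but the French page never points back, so that pair is reported; the German pair is reciprocal and passes.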