Bot Herding
               presented by Stephan Spencer,
             Founder & President, Netconcepts


© 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
Duplicate Content Mitigation
 Duplicate content is rampant on blogs. Herd bots to the permalink
  URL, and everywhere else (Archives by Date pages, Category pages,
  Tag pages, Home page, etc.) show only a lead-in: a paraphrased
  “Optional Excerpt”
   – Not just the first couple of paragraphs, i.e. the <!--more--> tag!
   – Requires you to revise your Main Index Template (index.php) theme file:
     <?php if (empty($post->post_excerpt) || is_single() || is_page()) {
         the_content();  // permalink pages, or no excerpt written: show the full post
     } else {
         the_excerpt();  // everywhere else: show the paraphrased Optional Excerpt
         echo "<a href='"; the_permalink(); echo "' rel='nofollow'>Continue reading &raquo;</a>";
     } ?>

Duplicate Content Mitigation
 Include a sig line (& headshot photo!) at the bottom of each
  post/article that links to the original article/post permalink URL!
   – http://www.naturalsearchblog.com/archives/2008/06/03/syndicating-your-articles/
   – http://www.businessblogconsulting.com/2008/05/brand-yourself-with-photo-sig-line

Duplicate Content Mitigation
 On ecommerce sites, dup content also rampant:
   – Manufacturer-provided product descriptions, inconsistent order
     of query string parameters, “guided navigation”, pagination
     within categories, tracking parameters
 Selectively append tracking codes for humans w/ “white
  hat cloaking” or use JavaScript to append the codes
   – REI.com used to append a “vcat” parameter on all brand links
     on their Shop By Brand page (see
     http://web.archive.org/web/20060823085548/www.rei.com/rei/sales_and_events/brands.html)
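The “white hat cloaking” idea can be sketched in Python: human visitors get brand links with a tracking parameter appended, while known crawlers get the clean URL, so only one version of each page is exposed to the engines. This is a minimal sketch under stated assumptions: the `BOT_SIGNATURES` list, the `link_for()` helper and the `vcat` value are hypothetical, and a real site would recognise crawlers from a maintained source rather than a hard-coded tuple.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical, incomplete list of crawler user-agent substrings (lowercase).
BOT_SIGNATURES = ("googlebot", "msnbot", "slurp", "ask jeeves")


def is_bot(user_agent: str) -> bool:
    """Crude user-agent sniffing: True if any known crawler substring appears."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)


def link_for(url: str, user_agent: str, tracking: dict) -> str:
    """Append tracking params for humans only; crawlers see the canonical URL."""
    if is_bot(user_agent):
        return url
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True) + list(tracking.items())
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    brand_link = "http://www.example.com/brands?id=7"
    print(link_for(brand_link, "Mozilla/5.0 (Windows NT 6.0)", {"vcat": "BRAND"}))
    # http://www.example.com/brands?id=7&vcat=BRAND
    print(link_for(brand_link, "Mozilla/5.0 (compatible; Googlebot/2.1)", {"vcat": "BRAND"}))
    # http://www.example.com/brands?id=7
```

The same effect can be had client-side with JavaScript, as the slide notes; the server-side version simply keeps the decision in one place.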
Pagination
 Pagination not only creates many pages that share the same
  keyword theme; in very large categories with thousands of
  products, it also leaves hundreds of pages of product listings
  uncrawled, which lowers product page indexation.
 Herd bots through keyword-rich subcategory links, a “View
  All” link, or both? How should page number links be displayed?
  What’s the optimal # of products to display/link per page? Test!
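A rough way to see why deep pagination hurts crawling is to count the clicks from page 1 to the last listing page. The sketch below assumes each listing page links only to the next `window` pages (window=1 is a bare “Next” link) and treats click depth as a loose proxy for how likely a page is to be crawled; the function names and the 3,000-product example are illustrative, not measured data.

```python
import math


def listing_pages(products: int, per_page: int) -> int:
    """Number of paginated listing pages needed for one category."""
    return math.ceil(products / per_page)


def clicks_to_page(page: int, window: int) -> int:
    """Clicks from page 1 to reach `page` when every page links to the next `window` pages."""
    return math.ceil((page - 1) / window)


if __name__ == "__main__":
    pages = listing_pages(3000, 20)           # 150 listing pages
    print(pages, clicks_to_page(pages, 1))    # 150 149  ("Next" link only)
    print(clicks_to_page(pages, 10))          # 15       (ten page-number links)
    # A "View All" page linked from page 1 would put every product one click away.
```

Changing `per_page` or `window` is one way to frame the “test!” question on the slide before running it on a live site.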

PageRank Leakage?
 If you’re using Robots.txt Disallow, you’re probably
  leaking PageRank
 Robots.txt Disallowed & Meta Robots Noindexed pages both
  accumulate PageRank, but only the Noindexed page (which still
  gets crawled) passes it on through its links
   – Meta Noindex tag on a Master sitemap page will de-index the
     page but still pass PageRank to linked sub-sitemap pages
 Meta Robots Nofollow blocks the flow of PageRank
   – http://www.stonetemple.com/articles/interview-matt-cutts.shtml
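To see why a noindexed master sitemap can still feed its sub-sitemaps while a nofollowed one cannot, here is a toy PageRank power iteration on a hypothetical five-page site. It is a sketch of the textbook formula, not Google’s implementation: noindex is modelled as leaving the link graph untouched, and nofollow as removing the sitemap’s outgoing links (the sitemap then has no outlinks, and its rank is spread evenly over all pages, one common textbook convention).

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Textbook PageRank by power iteration; pages with no outlinks spread rank evenly."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in nodes:
                    new[t] += damping * rank[node] / n
        rank = new
    return rank


SITE = {  # hypothetical site: home links to a master sitemap and an about page
    "home": ["sitemap", "about"],
    "sitemap": ["sub1", "sub2"],
    "sub1": ["home"],
    "sub2": ["home"],
    "about": ["home"],
}

# Meta noindex only keeps "sitemap" out of the index; its links still count.
noindexed = pagerank(SITE)
# Meta nofollow on "sitemap": its outgoing links no longer pass PageRank.
nofollowed = pagerank({**SITE, "sitemap": []})

if __name__ == "__main__":
    print(f"sub1, noindexed sitemap:  {noindexed['sub1']:.3f}")
    print(f"sub1, nofollowed sitemap: {nofollowed['sub1']:.3f}")
```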

Rewriting Spider-Unfriendly URLs
 3 approaches:
   1) Use a “URL rewriting” server module / plugin – such as
      mod_rewrite for Apache, or ISAPI_Rewrite for IIS Server
   2) Recode your scripts to extract variables out of the “path_info”
      part of the URL instead of the “query_string”
   3) Or, if IT department involvement must be minimized, use a
      proxy server based solution (e.g. Netconcepts' GravityStream)
   – With (1) and (2), replace all occurrences of your old URLs in
      links on your site with your new search-friendly URLs. 301
      redirect the old to new URLs too, so no link juice is lost.
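Approach (2) means reading variables from the path rather than from the query string. A minimal sketch of the two styles side by side, assuming a hypothetical catalog script that takes a category and a product ID; in CGI/WSGI terms, a request for /catalog.php/shoes/123 typically reaches the script with PATH_INFO set to "/shoes/123". The helper names are illustrative.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def vars_from_query_string(query_string: str) -> Dict[str, str]:
    """Old style: /catalog.php?category=shoes&product=123"""
    return {key: values[0] for key, values in parse_qs(query_string).items()}


def vars_from_path_info(path_info: str, names: List[str]) -> Dict[str, str]:
    """New style: /catalog.php/shoes/123, where PATH_INFO is "/shoes/123".

    Successive path segments are mapped onto the given variable names.
    """
    segments = [segment for segment in path_info.split("/") if segment]
    return dict(zip(names, segments))


if __name__ == "__main__":
    old = vars_from_query_string("category=shoes&product=123")
    new = vars_from_path_info("/shoes/123", ["category", "product"])
    print(old == new)  # True: same variables, search-friendlier URL
```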
mod_rewrite – the Foundation for URL Rewriting, Remapping & Redirecting
 Works with Apache and IBM HTTP Server
 Place “rules” within .htaccess or your Apache config file
  (e.g. httpd.conf, sites_conf/…)
   –   RewriteEngine on
   –   RewriteBase /
   –   RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]
   –   RewriteRule ^([^/]+)/([^/]+)\.htm$ /webapp/wcs/stores/servlet/ProductDisplay?storeId=10001&catalogId=10001&langId=-1&categoryID=$1&productID=$2 [QSA,P,L]
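One way to check what the first rule above does before deploying it is to exercise its pattern with Python’s `re` module, whose syntax matches mod_rewrite’s for a pattern this simple. This is only a sketch: it assumes the rule lives in the site’s root .htaccess, where mod_rewrite strips the leading "/" from the path before matching, and `rewrite()` is a hypothetical helper, not part of Apache.

```python
import re
from typing import Optional

# Copied from: RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]
PATTERN = re.compile(r"^products/([0-9]+)/?$")
SUBSTITUTION = r"/get_product.php?id=\1"  # Apache's $1 is written \1 in Python


def rewrite(request_path: str) -> Optional[str]:
    """Return the internal URL the rule would serve, or None if the rule doesn't match."""
    # In a root .htaccess, mod_rewrite strips the directory prefix "/" before matching.
    path = request_path[1:] if request_path.startswith("/") else request_path
    match = PATTERN.match(path)
    return match.expand(SUBSTITUTION) if match else None


if __name__ == "__main__":
    for url in ("/products/42", "/products/42/", "/products/abc"):
        print(url, "->", rewrite(url))
```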
Regular Expressions
 The magic of regular expressions / pattern matching
   –   * means 0 or more of the immediately preceding character
   –   + means 1 or more of the immediately preceding character
   –   ? means 0 or 1 occurrence of the immediately preceding char
   –   ^ means the beginning of the string, $ means the end of it
   –   . means any character (i.e. wildcard)
   –   \ “escapes” the character that follows, e.g. \. means dot
   –   [ ] is for character ranges, e.g. [A-Za-z].
   –   ^ inside [] brackets means “not”, e.g. [^/]
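Each metacharacter above can be checked quickly with Python’s `re` module, which behaves like mod_rewrite’s Perl-compatible syntax for these basics. The cases are illustrative, one per bullet; `run_cases()` is a hypothetical helper that returns any row whose result differs from what the definitions predict.

```python
import re

# (pattern, text, should_match): one row per metacharacter described above.
CASES = [
    (r"^ab*c$", "ac", True),              # *   zero or more "b"
    (r"^ab+c$", "ac", False),             # +   at least one "b" is required
    (r"^colou?r$", "colour", True),       # ?   zero or one "u"
    (r"^/blog", "/archive/blog", False),  # ^   anchors the match to the start
    (r"^page.htm$", "pageXhtm", True),    # .   an unescaped dot matches any character
    (r"^page\.htm$", "pageXhtm", False),  # \.  an escaped dot matches only "."
    (r"^[A-Za-z]+$", "Hello", True),      # [ ] a character range
    (r"^[^/]+$", "a/b", False),           # [^/] any character except "/"
]


def run_cases(cases):
    """Return the (pattern, text) pairs whose match result differs from the expectation."""
    return [(pattern, text) for pattern, text, expected in cases
            if bool(re.search(pattern, text)) != expected]


if __name__ == "__main__":
    print(run_cases(CASES))  # []
```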
Regular Expressions
   – () puts whatever is wrapped within it into memory
   – Access what’s in memory with $1 (what’s in first set of parens),
     $2 (what’s in second set of parens), and so on
 Gotchas to beware of:
   – “Greedy” expressions. Use a negated class such as [^/]+ instead of .*
   – .* can match on nothing (an empty string). Use .+ when at least
     one character is required
   – Unintentional substring matches because ^ or $ wasn’t
     specified
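The capture groups and the three gotchas can be demonstrated with Python’s `re`, where `group(1)` plays the role of mod_rewrite’s $1. The path and patterns are illustrative examples, not rules from any particular site.

```python
import re

PATH = "shoes/running/trail.htm"

# () captures: group(1) corresponds to mod_rewrite's $1, group(2) to $2.
greedy = re.match(r"^(.*)/(.*)\.htm$", PATH)
bounded = re.match(r"^([^/]+)/(.+)\.htm$", PATH)

# Greedy .* grabs as much as it can, so $1 swallows two directory levels.
print(greedy.group(1), "|", greedy.group(2))    # shoes/running | trail
# [^/]+ stops at the first slash, so $1 is just the top-level directory.
print(bounded.group(1), "|", bounded.group(2))  # shoes | running/trail

# .* happily matches an empty string; .+ insists on at least one character.
print(bool(re.match(r"^products/(.*)$", "products/")))  # True
print(bool(re.match(r"^products/(.+)$", "products/")))  # False

# Without ^ and $, the pattern matches a substring of an unrelated URL.
print(bool(re.search(r"products/([0-9]+)", "old-products/12/x")))    # True
print(bool(re.search(r"^products/([0-9]+)$", "old-products/12/x")))  # False
```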

mod_rewrite Specifics
 Proxy page using [P] flag
   – RewriteRule /blah\.html$ http://www.google.com/ [P]
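 [QSA] (query string append) flag is for when the substitution
  has its own query string but you don’t want the original query
  string params dropped (like when you want a tracking param
  preserved)
 [L] (last) flag stops processing any further rules in the current
  pass, which saves on server processing
 Got a huge pile of rewrites? Use RewriteMap and have
  a lookup table as a text file (RewriteMap itself must be declared
  in httpd.conf or a virtual host, not in .htaccess)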
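The [QSA] behaviour can be hard to picture. The function below is a simplified model, written for illustration, of how mod_rewrite combines the substitution’s query string with the one on the incoming request; it ignores escaping and other flags, and `apply_substitution()` is a hypothetical name, not an Apache API.

```python
def apply_substitution(target: str, original_qs: str, qsa: bool = False) -> str:
    """Simplified model of mod_rewrite's query-string handling for one RewriteRule.

    - No "?" in the substitution: the original query string passes through unchanged.
    - "?" in the substitution: its query string replaces the original one,
      unless [QSA] is set, in which case the original is appended after it.
    - A bare trailing "?" (without [QSA]) therefore discards the original query string.
    """
    if "?" not in target:
        return f"{target}?{original_qs}" if original_qs else target
    path, _, new_qs = target.partition("?")
    if qsa and original_qs:
        new_qs = f"{new_qs}&{original_qs}" if new_qs else original_qs
    return f"{path}?{new_qs}" if new_qs else path


if __name__ == "__main__":
    incoming = "utm_source=newsletter"  # hypothetical tracking param
    print(apply_substitution("/get_product.php?id=42", incoming))
    # /get_product.php?id=42  (tracking param dropped)
    print(apply_substitution("/get_product.php?id=42", incoming, qsa=True))
    # /get_product.php?id=42&utm_source=newsletter  (tracking param kept)
```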

IIS? ISAPI_Rewrite!
 What if your site is running Microsoft IIS Server?
 ISAPI_Rewrite plugin! Not that different from mod_rewrite
 In httpd.ini:
   – [ISAPI_Rewrite]
     RewriteRule ^/category/([0-9]+)\.htm$ /index.asp?PageAction=VIEWCATS&Category=$1 [L]
   – Lets a search-friendly URL like
     http://www.example.com/category/207.htm
     serve the content of a URL like
     http://www.example.com/index.asp?PageAction=VIEWCATS&Category=207
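The rewrite rule only handles incoming requests; the site’s own links still need to be changed to the new URLs. A minimal sketch, assuming links are available as absolute URLs, of mapping an old category URL from the ISAPI_Rewrite example to its search-friendly form. `friendly_url()` is a hypothetical helper; anything it does not recognise (including category URLs with extra parameters) is returned unchanged.

```python
import re
from urllib.parse import parse_qs, urlsplit, urlunsplit


def friendly_url(url: str) -> str:
    """Turn /index.asp?PageAction=VIEWCATS&Category=N into /category/N.htm."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    category = params.get("Category", [""])[0]
    if (
        parts.path.lower() == "/index.asp"
        and set(params) == {"PageAction", "Category"}
        and params["PageAction"] == ["VIEWCATS"]
        and re.fullmatch(r"[0-9]+", category)  # mirrors ([0-9]+) in the rule
    ):
        return urlunsplit((parts.scheme, parts.netloc, f"/category/{category}.htm", "", ""))
    return url


if __name__ == "__main__":
    print(friendly_url("http://www.example.com/index.asp?PageAction=VIEWCATS&Category=207"))
    # http://www.example.com/category/207.htm
```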
Implementing 301 Redirects Using Redirect Directives
 In .htaccess (or httpd.conf), you can redirect individual
  URLs, the contents of directories, entire domains…:
   – Redirect 301 /old_url.htm
     http://www.example.com/new_url.htm
   – Redirect 301 /old_dir/ http://www.example.com/new_dir/
   – Redirect 301 / http://www.example.com/
 Pattern matching can be done with RedirectMatch 301
   – RedirectMatch 301 ^/(.+)/index\.html$
     http://www.example.com/$1/
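RedirectMatch uses the same kind of regular expression as mod_rewrite, so the example rule can be dry-run in Python before it goes live. A sketch, assuming the dot in index.html is escaped as \. so it only matches a literal dot; `redirect_match()` is a hypothetical helper.

```python
import re
from typing import Optional

# From: RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/
PATTERN = re.compile(r"^/(.+)/index\.html$")
TARGET = r"http://www.example.com/\1/"  # Apache's $1 is written \1 in Python


def redirect_match(path: str) -> Optional[str]:
    """Return the 301 Location for `path`, or None if the rule doesn't apply."""
    match = PATTERN.match(path)
    return match.expand(TARGET) if match else None


if __name__ == "__main__":
    for path in ("/shoes/index.html", "/shoes/trail/index.html", "/index.html"):
        print(path, "->", redirect_match(path))
```

Note that /index.html itself is not redirected, because (.+) needs at least one character between two slashes.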

Implementing 301 Redirects Using Rewrite Rules
 Or use a rewrite rule with the [R=301] flag
   – RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
   – RewriteRule ^(.*)$ http://www.example.com/$1 [L,QSA,R=301]
 [NC] flag makes the rewrite condition case-insensitive
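The www-canonicalization rule boils down to a simple decision. A minimal Python sketch of that decision, with www.example.com as the assumed canonical host and `canonical_redirect()` as a hypothetical helper. Unlike %{HTTP_HOST}, urlsplit’s `hostname` is already lowercased and has any port removed, which stands in for the [NC] flag here.

```python
from typing import Optional
from urllib.parse import urlsplit

CANONICAL_HOST = "www.example.com"  # assumed canonical host name


def canonical_redirect(url: str) -> Optional[str]:
    """Return the 301 target for a request on a non-canonical host, else None."""
    parts = urlsplit(url)
    # .hostname is lowercased (like [NC]) and has any port stripped.
    if parts.hostname == CANONICAL_HOST:
        return None
    target = f"http://{CANONICAL_HOST}{parts.path or '/'}"
    if parts.query:  # the query string is carried over to the new URL
        target += f"?{parts.query}"
    return target


if __name__ == "__main__":
    print(canonical_redirect("http://example.com/shoes?id=5"))  # http://www.example.com/shoes?id=5
    print(canonical_redirect("http://WWW.Example.com/shoes"))   # None
```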

Conditional Redirects
 Conditional 301 for bots – great for capturing the link juice
  from inbound affiliate links
 Only works if you manage your own affiliate program
 Most affiliate programs are outsourced and use 302 redirects (e.g. C.J.)
 By outsourcing your affiliate marketing, none of your deep
  affiliate links pass link juice to your site
 If Amazon’s doing it, why can’t you?
   – (Credit to Brian Klais for hypothesizing Amazon was doing this)
   – http://tinyurl.com/5ubc28
Status Code: 200 for humans

301 for all bots. Muahaha!!

Implementing Conditional Redirects Using Rewrite Rules
 Selectively redirect bots that request URLs with session
  IDs to the URL sans session ID:
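   – # Server-config (httpd.conf) context, where the matched path starts with "/".
     # The trailing "?" drops the entire query string, PHPSESSID included.
     RewriteCond %{QUERY_STRING} PHPSESSID
     RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
     RewriteCond %{HTTP_USER_AGENT} ^msnbot [OR]
     RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
     RewriteCond %{HTTP_USER_AGENT} "Ask Jeeves"
     RewriteRule ^/(.*)$ /$1? [R=301,L]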
 Utilize browscap.ini instead of having to keep up with
  each spider’s name and version changes
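The same decision expressed in Python, for a site that cannot use mod_rewrite: a sketch that 301s known crawlers requesting a URL with a session ID to the same URL minus that parameter, and serves everyone else normally. The bot list and `respond()` are hypothetical; as the slide says, browscap.ini is a more maintainable way to recognise spiders. This version strips only PHPSESSID and keeps any other parameters.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical crawler substrings; browscap.ini would be a more maintainable source.
BOT_SIGNATURES = ("googlebot", "msnbot", "slurp", "ask jeeves")
SESSION_PARAM = "PHPSESSID"


def respond(url: str, user_agent: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location): 301 for bots carrying a session ID, else 200."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in params if key != SESSION_PARAM]
    is_bot = any(sig in user_agent.lower() for sig in BOT_SIGNATURES)
    if is_bot and len(kept) < len(params):
        return 301, urlunsplit(parts._replace(query=urlencode(kept)))
    return 200, None


if __name__ == "__main__":
    url = "http://www.example.com/shoes?PHPSESSID=abc123&color=red"
    print(respond(url, "Mozilla/5.0 (compatible; Googlebot/2.1)"))
    # (301, 'http://www.example.com/shoes?color=red')
    print(respond(url, "Mozilla/5.0 (Windows NT 6.0)"))
    # (200, None)
```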
Error Pages

Error Pages
 The traditional approach is to serve up a 404, which drops the
  obsolete or wrong URL out of the search indexes. This squanders
  any link juice pointing to that URL.
 But what if you return a 200 status code instead, so that the
  spiders follow the links on the error page? Then include a meta
  robots noindex so the error page itself doesn’t get indexed.
 Or do a 301 redirect to something valuable (e.g. your home page)
  and dynamically include a small error notice.
 (Credit to Francois Planque for this clever approach.)
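Both options can be sketched as a tiny WSGI application. Everything here is hypothetical: the page contents, the `make_app()` factory and the `notice` cookie, which is one assumed way for the home page to know it should show a small error message. Option 1 answers unknown URLs with 200 plus meta robots noindex, so the page stays out of the index while spiders can still follow its links; option 2 answers with a 301 to the home page.

```python
# Hypothetical site content; only "/" exists.
PAGES = {"/": "<html><body><h1>Welcome</h1></body></html>"}

ERROR_PAGE = (
    "<html><head><title>Page not found</title>"
    '<meta name="robots" content="noindex">'
    "</head><body><p>Sorry, that page has moved or no longer exists.</p>"
    '<p><a href="/">Home</a> | <a href="/products/">Products</a></p>'
    "</body></html>"
)


def make_app(redirect_missing: bool = False):
    """Build a WSGI app that never answers an unknown URL with a 404."""

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        if path in PAGES:
            body = PAGES[path]
        elif redirect_missing:
            # Option 2: 301 to something valuable; the home page can read the
            # (hypothetical) notice cookie and show a small error message.
            start_response("301 Moved Permanently", [
                ("Location", "http://www.example.com/"),
                ("Set-Cookie", "notice=page-not-found; Path=/"),
            ])
            return [b""]
        else:
            # Option 1: 200 + meta robots noindex; the links on the page still get followed.
            body = ERROR_PAGE
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [body.encode("utf-8")]

    return app


if __name__ == "__main__":
    def show(status, headers):
        print(status, dict(headers).get("Location", ""))

    make_app()({"PATH_INFO": "/old-page"}, show)
    make_app(redirect_missing=True)({"PATH_INFO": "/old-page"}, show)
```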

URL Stability
 An annually recurring feature, like a Holiday Gift Buying
  Guide, should have a stable, date-unspecified URL
   – No need for any 301s
   – When the current edition is to be retired and replaced with a
     new edition, assign a new URL to the archived edition
 Otherwise link juice earned over time is not carried over
  to future years’ editions

URL Testing
 The URL affects searcher clickthrough rates
 Short URLs get clicked on twice as often as long URLs
  (Source: MarketingSherpa, used with permission)

URL Testing
 Further, a long URL appears to deter clicks, drawing attention
  away from its own listing and toward the listing below it, which
  then gets clicked 2.5x more frequently.
    – http://searchengineland.com/080515-084124.php
 Don’t be complacent with search-friendly URLs. Test and optimize.
 Make iterative improvements to URLs, but don’t lose the link juice
  of previous URLs. 301 previous URLs to the latest. No chains of 301s.
 WordPress handles 301s automatically when renaming post slugs
 Mass editing URLs (post slugs) in WordPress – announcement
  tomorrow in Give It Up session
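Keeping every old URL pointed directly at the latest one, rather than through a chain of 301s, is easy to automate if the redirects are kept in a lookup table. A minimal sketch, assuming the table is a Python dict of old path to new path; `flatten_redirects()` and the slugs are made up for illustration.

```python
from typing import Dict


def flatten_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Point every old URL straight at its final destination (no 301 chains)."""
    flat = {}
    for start, target in redirects.items():
        seen = {start}
        while target in redirects:  # the target has itself been moved: follow it
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[start] = target
    return flat


if __name__ == "__main__":
    # Hypothetical slug history: each rename added one more hop.
    history = {
        "/archives/bot-herding-tips": "/archives/bot-herding",
        "/archives/bot-herding": "/archives/bot-herding-pagerank-sculpting",
    }
    for old, new in flatten_redirects(history).items():
        print(old, "->", new)
```

Both old slugs now point straight at the newest URL, and a loop in the table is reported instead of being followed forever.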
Yank Competitor’s Grouped Results from Google Page 1 SERPs
 Knock out your competitor’s second indented (grouped)
  listing by directing link juice to other non-competitive
  listings (e.g. on page 2 SERPs, or directly below
  indented result’s true position)
 First, find the true position of their indented result by
  appending &num=9 to the Google results URL and seeing if the
  indented listing drops off. If not, append &num=8. Rinse and
  repeat until the indented listing falls away. The worse its
  true position, the more susceptible the indented listing is.
This isn’t really #3

Nope, not yet

Gone! Its true position was #9

SEO the title of #12 to bump it up to page 1 – it will be grouped to #2. Then link to #11 and bump it up to page 1 to knock #4 to page 2

More Things I Wish I Had Time to Cover
 Robots.txt gotchas
 Webmaster Central tools (www vs no www, crawl rate, robots.txt
  builder, Sitemaps, etc.)
 Yahoo's Dynamic URLs tab in Site Explorer
 <div class="robots-nocontent">
 If-Modified-Since
 Status codes 404, 401, 500 etc.
 PageRank transfer from PDFs, RSS feeds, Word docs etc.
 Diagnostic tools (e.g. livehttpheaders, User Agent Switcher)

Thanks!
 This PowerPoint can be downloaded from
  www.netconcepts.com/learn/bot-herding.ppt
 For a 180-minute screencast (including 90 minutes
  of Q&A) on SEO for large dynamic websites (taught
  by myself and Chris Smith), including transcripts,
  email seo@netconcepts.com
 Questions after the show? Email me at
  stephan@netconcepts.com

             © 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com

Más contenido relacionado

Destacado

SEO Camp'us 2015 - Atelier pratique Digital Analytics
SEO Camp'us 2015 - Atelier pratique Digital AnalyticsSEO Camp'us 2015 - Atelier pratique Digital Analytics
SEO Camp'us 2015 - Atelier pratique Digital Analytics
Nicolas Malo
 

Destacado (6)

Actualités sur Google et le SEO - Février 2015
Actualités sur Google et le SEO - Février 2015Actualités sur Google et le SEO - Février 2015
Actualités sur Google et le SEO - Février 2015
 
21 avril 2015 : la compatibilité mobile, critère SEO officiel chez Google
21 avril 2015 : la compatibilité mobile, critère SEO officiel chez Google21 avril 2015 : la compatibilité mobile, critère SEO officiel chez Google
21 avril 2015 : la compatibilité mobile, critère SEO officiel chez Google
 
SEO Camp'us 2015 - Atelier pratique Digital Analytics
SEO Camp'us 2015 - Atelier pratique Digital AnalyticsSEO Camp'us 2015 - Atelier pratique Digital Analytics
SEO Camp'us 2015 - Atelier pratique Digital Analytics
 
Backlinks : pépites et pommes pourries - SEO Camp'us 2015
Backlinks : pépites et pommes pourries - SEO Camp'us 2015Backlinks : pépites et pommes pourries - SEO Camp'us 2015
Backlinks : pépites et pommes pourries - SEO Camp'us 2015
 
#Seocamp Paris 2015 Google Adwords: Domptez le et vous Convertirez !
#Seocamp Paris 2015 Google Adwords: Domptez le et vous Convertirez ! #Seocamp Paris 2015 Google Adwords: Domptez le et vous Convertirez !
#Seocamp Paris 2015 Google Adwords: Domptez le et vous Convertirez !
 
#SeoCamp 2015 Google Adwords: Innovez et Améliorez votre Visibilité
#SeoCamp 2015 Google Adwords: Innovez et Améliorez votre Visibilité#SeoCamp 2015 Google Adwords: Innovez et Améliorez votre Visibilité
#SeoCamp 2015 Google Adwords: Innovez et Améliorez votre Visibilité
 

Similar a Google Bot Herding, PageRank Sculpting and Manipulation

Advanced SEO for Web Developers
Advanced SEO for Web DevelopersAdvanced SEO for Web Developers
Advanced SEO for Web Developers
Nathan Buggia
 
3 coding101 fewd_lesson3_your_first_website 20210105
3 coding101 fewd_lesson3_your_first_website 202101053 coding101 fewd_lesson3_your_first_website 20210105
3 coding101 fewd_lesson3_your_first_website 20210105
John Picasso
 
Using Amazon Simple Db With Rails
Using Amazon Simple Db With RailsUsing Amazon Simple Db With Rails
Using Amazon Simple Db With Rails
Akhil Bansal
 
Desenvolvimento web com Ruby on Rails (parte 2)
Desenvolvimento web com Ruby on Rails (parte 2)Desenvolvimento web com Ruby on Rails (parte 2)
Desenvolvimento web com Ruby on Rails (parte 2)
Joao Lucas Santana
 

Similar a Google Bot Herding, PageRank Sculpting and Manipulation (20)

Seo mistakes
Seo mistakesSeo mistakes
Seo mistakes
 
Seo mistakes
Seo mistakesSeo mistakes
Seo mistakes
 
Advanced SEO for Web Developers
Advanced SEO for Web DevelopersAdvanced SEO for Web Developers
Advanced SEO for Web Developers
 
Getting More Traffic From Search Advanced Seo For Developers Presentation
Getting More Traffic From Search  Advanced Seo For Developers PresentationGetting More Traffic From Search  Advanced Seo For Developers Presentation
Getting More Traffic From Search Advanced Seo For Developers Presentation
 
(SEO) Search Engine Optimization
(SEO) Search Engine Optimization(SEO) Search Engine Optimization
(SEO) Search Engine Optimization
 
Java script
Java scriptJava script
Java script
 
.htaccess for SEOs - A presentation by Roxana Stingu
.htaccess for SEOs - A presentation by Roxana Stingu.htaccess for SEOs - A presentation by Roxana Stingu
.htaccess for SEOs - A presentation by Roxana Stingu
 
3 coding101 fewd_lesson3_your_first_website 20210105
3 coding101 fewd_lesson3_your_first_website 202101053 coding101 fewd_lesson3_your_first_website 20210105
3 coding101 fewd_lesson3_your_first_website 20210105
 
T5 Oli Aro
T5 Oli AroT5 Oli Aro
T5 Oli Aro
 
The Django Web Application Framework 2
The Django Web Application Framework 2The Django Web Application Framework 2
The Django Web Application Framework 2
 
The Django Web Application Framework 2
The Django Web Application Framework 2The Django Web Application Framework 2
The Django Web Application Framework 2
 
The Django Web Application Framework 2
The Django Web Application Framework 2The Django Web Application Framework 2
The Django Web Application Framework 2
 
The Django Web Application Framework 2
The Django Web Application Framework 2The Django Web Application Framework 2
The Django Web Application Framework 2
 
Nanoformats
NanoformatsNanoformats
Nanoformats
 
Using Amazon Simple Db With Rails
Using Amazon Simple Db With RailsUsing Amazon Simple Db With Rails
Using Amazon Simple Db With Rails
 
Migration from ASP to ASP.NET
Migration from ASP to ASP.NETMigration from ASP to ASP.NET
Migration from ASP to ASP.NET
 
Web performance essentials - Goodies
Web performance essentials - GoodiesWeb performance essentials - Goodies
Web performance essentials - Goodies
 
Desenvolvimento web com Ruby on Rails (parte 2)
Desenvolvimento web com Ruby on Rails (parte 2)Desenvolvimento web com Ruby on Rails (parte 2)
Desenvolvimento web com Ruby on Rails (parte 2)
 
SDPHP - Percona Toolkit (It's Basically Magic)
SDPHP - Percona Toolkit (It's Basically Magic)SDPHP - Percona Toolkit (It's Basically Magic)
SDPHP - Percona Toolkit (It's Basically Magic)
 
Sinatra
SinatraSinatra
Sinatra
 

Más de David Degrelle - Consultant SEO Expert

Chiffres clés et usages des sites vente de chaussures en ligne france 2011
Chiffres clés et usages des sites vente de chaussures en ligne france 2011Chiffres clés et usages des sites vente de chaussures en ligne france 2011
Chiffres clés et usages des sites vente de chaussures en ligne france 2011
David Degrelle - Consultant SEO Expert
 

Más de David Degrelle - Consultant SEO Expert (19)

Référencement et Web Sémantique SMX Paris 2013
Référencement et Web Sémantique SMX Paris 2013Référencement et Web Sémantique SMX Paris 2013
Référencement et Web Sémantique SMX Paris 2013
 
Référencement international et SEO en Suisse
Référencement international et SEO en SuisseRéférencement international et SEO en Suisse
Référencement international et SEO en Suisse
 
Le Référencement Multicanal sur internet en 2012
Le Référencement Multicanal sur internet en 2012Le Référencement Multicanal sur internet en 2012
Le Référencement Multicanal sur internet en 2012
 
Referencement multicanal en e-tourime
Referencement multicanal en e-tourimeReferencement multicanal en e-tourime
Referencement multicanal en e-tourime
 
Chiffres clés et usages des sites vente de chaussures en ligne france 2011
Chiffres clés et usages des sites vente de chaussures en ligne france 2011Chiffres clés et usages des sites vente de chaussures en ligne france 2011
Chiffres clés et usages des sites vente de chaussures en ligne france 2011
 
Personnalisation des recherches Google, Google Instant, Caffeine et Mayday
Personnalisation des recherches Google, Google Instant, Caffeine et MaydayPersonnalisation des recherches Google, Google Instant, Caffeine et Mayday
Personnalisation des recherches Google, Google Instant, Caffeine et Mayday
 
Du référencement naturel (SEO) au référencement Social (SMO)
Du référencement naturel (SEO) au référencement Social (SMO)Du référencement naturel (SEO) au référencement Social (SMO)
Du référencement naturel (SEO) au référencement Social (SMO)
 
Online Gaming and Casino SEO in France
Online Gaming and Casino SEO in FranceOnline Gaming and Casino SEO in France
Online Gaming and Casino SEO in France
 
SEOCampus 2010 : Referencement Universel
SEOCampus 2010 : Referencement UniverselSEOCampus 2010 : Referencement Universel
SEOCampus 2010 : Referencement Universel
 
Référencement Social et SMO
Référencement Social et SMORéférencement Social et SMO
Référencement Social et SMO
 
Référencement Multimédia et Universel sur Google
Référencement Multimédia et Universel sur GoogleRéférencement Multimédia et Universel sur Google
Référencement Multimédia et Universel sur Google
 
Etude Google Adwords/Metrix Lab pour L'oréal
Etude Google Adwords/Metrix Lab pour L'oréalEtude Google Adwords/Metrix Lab pour L'oréal
Etude Google Adwords/Metrix Lab pour L'oréal
 
1ere Position Ebusiness Archamps 2007
1ere Position Ebusiness Archamps 20071ere Position Ebusiness Archamps 2007
1ere Position Ebusiness Archamps 2007
 
Le PageRank est mort, vive le TrustRank !
Le PageRank est mort, vive le TrustRank !Le PageRank est mort, vive le TrustRank !
Le PageRank est mort, vive le TrustRank !
 
Long Distance WiFi record : 382 km by air !
Long Distance WiFi record : 382 km by air !Long Distance WiFi record : 382 km by air !
Long Distance WiFi record : 382 km by air !
 
Barometre e-mailing secteur E-Commerce en France
Barometre e-mailing secteur E-Commerce en FranceBarometre e-mailing secteur E-Commerce en France
Barometre e-mailing secteur E-Commerce en France
 
Référencement 2.0 et 3.0
Référencement 2.0 et 3.0Référencement 2.0 et 3.0
Référencement 2.0 et 3.0
 
Réussir sa campagne de liens sponsorisés
Réussir sa campagne de liens sponsorisésRéussir sa campagne de liens sponsorisés
Réussir sa campagne de liens sponsorisés
 
Internet et les élections présidentielles 2007
Internet et les élections présidentielles 2007Internet et les élections présidentielles 2007
Internet et les élections présidentielles 2007
 

Último

Último (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Google Bot Herding, PageRank Sculpting and Manipulation

  • 1. Bot Herding presented by Stephan Spencer, Founder & President, Netconcepts © 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
  • 2. Duplicate Content Mitigation  Dup content is rampant on blogs. Herd bots to permalink URL & lead in everywhere else (Archives by Date pages, Category pages, Tag pages, Home page, etc.) with paraphrased “Optional Excerpt” – Not just the first couple paragraphs, i.e. the <!--more--> tag! – Requires you to revise your Main Index Template theme file: if (empty($post->post_excerpt) || is_single() || is_page()) { the_content(); } else { the_excerpt(); echo quot;<a href='”; the_permalink(); echo quot;' rel='nofollow'>Continue reading &raquo;</a>quot;; } © 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
  • 3. Duplicate Content Mitigation  Include sig line (& headshot photo!) at bottom of post/article. Link to original article/post permalink URL! – http://www.naturalsearchblog.com/archives/2008/06/03/syndic ating-your-articles/ – http://www.businessblogconsulting.com/2008/05/brand- yourself-with-photo-sig-line © 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
  • 4. Duplicate Content Mitigation  On ecommerce sites, dup content also rampant: – Manufacturer-provided product descriptions, inconsistent order of query string parameters, “guided navigation”, pagination within categories, tracking parameters  Selectively append tracking codes for humans w/ “white hat cloaking” or use JavaScript to append the codes – REI.com used to append a quot;vcatquot; parameter on all brand links on their Shop By Brand page (see http://web.archive.org/web/20060823085548/www.rei.com/rei/s ales_and_events/brands.html) © 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
  • 5. Pagination  Not only creates many pages that share the same keyword theme, also very large categories with thousands of products result in hundreds of pages of product listings not getting crawled. Thus lowered product page indexation.  Herd bots through keyword-rich subcat links or “View All” link or both? How to display page number links? Optimal # of products to display/link per page? Test! © 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
  • 6. PageRank Leakage?  If you’re using Robots.txt Disallow, you’re probably leaking PageRank  Robots.txt Disallow & Meta Robots Noindex both accumulate and pass PageRank – Meta Noindex tag on a Master sitemap page will de-index the page but still pass PageRank to linked sub-sitemap pages  Meta Robots Nofollow blocks the flow of PageRank – http://www.stonetemple.com/articles/interview-matt-cutts.shtml © 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
  • 7. Rewriting Spider-Unfriendly URLs  3 approaches: 1) Use a “URL rewriting” server module / plugin – such as mod_rewrite for Apache, or ISAPI_Rewrite for IIS Server 2) Recode your scripts to extract variables out of the “path_info” part of the URL instead of the “query_string” 3) Or, if IT department involvement must be minimized, use a proxy server based solution (e.g. Netconcepts' GravityStream) – With (1) and (2), replace all occurrences of your old URLs in links on your site with your new search-friendly URLs. 301 redirect the old to new URLs too, so no link juice is lost. © 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
  • 8. mod_rewrite – the Foundation for URL Rewriting, Remapping & Redirecting  Works with Apache and IBM HTTP Server  Place “rules” within .htaccess or your Apache config file (e.g. httpd.conf, sites_conf/…) – RewriteEngine on – RewriteBase / – RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L] – RewriteRule ^([^/]+)/([^/]+).htm$ /webapp/wcs/stores/servlet/ProductDisplay?storeId=10 001&cat alogId=10001&langId=-1 &categoryID=$1&productID=$2 [QSA,P,L] © 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
Regular Expressions
 • The magic of regular expressions / pattern matching
   – * means 0 or more of the immediately preceding character
   – + means 1 or more of the immediately preceding character
   – ? means 0 or 1 occurrence of the immediately preceding character
   – ^ means the beginning of the string, $ means the end of it
   – . means any character (i.e. wildcard)
   – \ “escapes” the character that follows, e.g. \. means dot
   – [ ] is for character ranges, e.g. [A-Za-z]
   – ^ inside [] brackets means “not”, e.g. [^/]
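Each metacharacter above is exercised below with Python's `re` module, whose syntax for these basics is the same as what mod_rewrite accepts. The sample patterns and strings are made up for illustration.

```python
import re

# Each case: (pattern, text, should it match?)
CASES = [
    (r"^ab*c$", "ac", True),            # *  zero or more "b"
    (r"^ab*c$", "abbbc", True),
    (r"^ab+c$", "ac", False),           # +  one or more "b"
    (r"^colou?r$", "color", True),      # ?  zero or one "u"
    (r"^colou?r$", "colour", True),
    (r"^index$", "index.html", False),  # $  anchors the end of the string
    (r"^a.c$", "abc", True),            # .  any single character
    (r"^a\.c$", "abc", False),          # \. a literal dot only
    (r"^a\.c$", "a.c", True),
    (r"^[A-Za-z]+$", "Shoes", True),    # [] a character range
    (r"^[^/]+$", "dir/file", False),    # [^/] anything except a slash
]

def check(pattern, text):
    """Return True if the pattern matches anywhere in the text (re.search)."""
    return re.search(pattern, text) is not None

for pattern, text, expected in CASES:
    result = check(pattern, text)
    print(f"{pattern!r:16} {text!r:14} -> {result}")
    assert result == expected
```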
Regular Expressions
 • () puts whatever is wrapped within it into memory
   – Access what’s in memory with $1 (what’s in the first set of parens), $2 (what’s in the second set of parens), and so on
 • Gotchas to beware of:
   – “Greedy” expressions. Use a negated character class such as [^/] instead of .*
   – .* can match on nothing. Use .+ instead
   – Unintentional substring matches because ^ or $ wasn’t specified
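The three gotchas are reproduced below in Python, using made-up paths:

```python
import re

path = "shoes/boots/hiking.htm"

# Gotcha 1: greedy .* grabs as much as it can, slashes included.
greedy = re.search(r"^(.*)/(.*)\.htm$", path)
print(greedy.groups())    # ('shoes/boots', 'hiking')

# A negated class stops at each slash, so a two-segment rule rejects a
# three-segment path instead of silently splitting it in the wrong place.
bounded = re.search(r"^([^/]+)/([^/]+)\.htm$", path)
print(bounded)            # None

# Gotcha 2: .* also matches nothing at all; .+ needs at least one character.
empty_star = re.search(r"^products/(.*)$", "products/")
empty_plus = re.search(r"^products/(.+)$", "products/")
print(repr(empty_star.group(1)), empty_plus)    # '' None

# Gotcha 3: without ^ and $, the pattern matches a substring anywhere.
unanchored = re.search(r"products/[0-9]+", "old-products/42-backup")
anchored = re.search(r"^products/[0-9]+$", "old-products/42-backup")
print(bool(unanchored), bool(anchored))         # True False
```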
mod_rewrite Specifics
 • Proxy a page using the [P] flag
   – RewriteRule /blah.html$ http://www.google.com/ [P]
 • The [QSA] flag is for when you don’t want query string params dropped: when your substitution adds its own query string, [QSA] appends the original params to it (like when you want a tracking param preserved)
 • The [L] flag stops processing of further rules once this one matches, which saves on server processing
 • Got a huge pile of rewrites? Use RewriteMap and have a lookup table as a text file
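A txt: RewriteMap file holds one whitespace-separated key/value pair per line, with # starting a comment line. A rule looks values up with `${mapname:key|default}`. Note that `RewriteMap` itself has to be declared in the server or virtual-host config, not in .htaccess. The hypothetical Python helper below parses the same format and emulates the lookup. It can be handy for sanity-checking a large map before deploying it. The map contents are invented.

```python
def parse_rewrite_map(text):
    """Parse txt:-style RewriteMap content into a dict.

    One 'key value' pair per line, separated by whitespace. Blank lines,
    lines starting with '#', and lines with only a key are skipped.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping

def lookup(mapping, key, default):
    """Emulate ${map:key|default}: the mapped value, or the default if the key is absent."""
    return mapping.get(key, default)

# Hypothetical map: old product slugs -> new product IDs.
MAP_TEXT = """
# old-slug        new-id
red-widget        1001
blue-widget       1002
"""

product_map = parse_rewrite_map(MAP_TEXT)
print(lookup(product_map, "red-widget", "NOTFOUND"))    # 1001
print(lookup(product_map, "green-widget", "NOTFOUND"))  # NOTFOUND
```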
IIS? ISAPI_Rewrite!
 • What if your site is running Microsoft IIS Server?
 • ISAPI_Rewrite plugin! Not that different from mod_rewrite
 • In httpd.ini:
     [ISAPI_Rewrite]
     RewriteRule ^/category/([0-9]+)\.htm$ /index.asp?PageAction=VIEWCATS&Category=$1 [L]
   – Lets a search-friendly URL like http://www.example.com/category/207.htm serve the content of http://www.example.com/index.asp?PageAction=VIEWCATS&Category=207
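The rule only handles incoming requests. The old dynamic URLs in your site's links still need replacing, and the old URLs still need 301 redirecting. The hypothetical helper below maps an old category URL to its new form, for use in a link-replacement or redirect script. It assumes category IDs are plain digits, as the rule's ([0-9]+) implies. It also doesn't care about parameter order.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

def friendly_category_url(old_url):
    """Map an old /index.asp?PageAction=VIEWCATS&Category=N URL to /category/N.htm.

    Returns None for anything that isn't a category page, so callers leave it alone.
    Parameter order doesn't matter, since the query string is parsed into a dict.
    """
    parts = urlsplit(old_url)
    if parts.path.lower() != "/index.asp":
        return None
    params = parse_qs(parts.query)
    if params.get("PageAction") != ["VIEWCATS"]:
        return None
    category = params.get("Category", [""])[0]
    if not category.isdigit():
        return None
    # Keep scheme and host (if any); drop the old query string entirely.
    return urlunsplit((parts.scheme, parts.netloc, f"/category/{category}.htm", "", ""))

print(friendly_category_url(
    "http://www.example.com/index.asp?PageAction=VIEWCATS&Category=207"))
```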
Implementing 301 Redirects Using Redirect Directives
 • In .htaccess (or httpd.conf), you can redirect individual URLs, the contents of directories, entire domains…:
     Redirect 301 /old_url.htm http://www.example.com/new_url.htm
     Redirect 301 /old_dir/ http://www.example.com/new_dir/
     Redirect 301 / http://www.example.com/
 • Pattern matching can be done with RedirectMatch 301:
     RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/
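mod_alias matches RedirectMatch patterns against the URL path with the leading slash included, which is why the pattern starts with ^/. The Python sketch below mimics the RedirectMatch line above so you can check which paths it would catch. The sample paths are hypothetical.

```python
import re

# Python equivalent of:
#   RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/
PATTERN = re.compile(r"^/(.+)/index\.html$")
TARGET = r"http://www.example.com/\1/"

def redirect_match(path):
    """Return (301, location) if the path matches, else None (request served normally)."""
    match = PATTERN.search(path)
    return (301, match.expand(TARGET)) if match else None

for path in ["/shoes/index.html", "/shoes/boots/index.html", "/index.html", "/shoes/index.htm"]:
    print(f"{path:26} -> {redirect_match(path)}")
```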
Implementing 301 Redirects Using Rewrite Rules
 • Or use a rewrite rule with the [R=301] flag:
     RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
     RewriteRule ^(.*)$ http://www.example.com/$1 [L,QSA,R=301]
 • The [NC] flag makes the rewrite condition case-insensitive
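The decision made by the two lines above is sketched below in Python. Any Host header other than the canonical one gets a 301 to the same path and query string on the canonical host. The comparison is case-insensitive, as [NC] makes it. `www.example.com` is a placeholder.

```python
import re

CANONICAL_HOST = "www.example.com"  # placeholder for your canonical hostname
# Same test as RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
CANONICAL_PATTERN = re.compile(r"^www\.example\.com$", re.IGNORECASE)

def canonical_redirect(host, path, query=""):
    """Return (301, location) for non-canonical hosts, else None.

    The path and any query string are carried over to the canonical host.
    """
    if CANONICAL_PATTERN.search(host):
        return None
    location = f"http://{CANONICAL_HOST}{path}"
    if query:
        location += "?" + query
    return 301, location

print(canonical_redirect("example.com", "/shoes", "color=red"))
```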
Conditional Redirects
 • Conditional 301 for bots: great for capturing the link juice from inbound affiliate links
 • Only works if you manage your own affiliate program
 • Most affiliate programs are outsourced (e.g. C.J.) and use 302 redirects
 • By outsourcing your affiliate marketing, none of your deep affiliate links are counting
 • If Amazon’s doing it, why can’t you?
   – (Credit to Brian Klais for hypothesizing Amazon was doing this)
   – http://tinyurl.com/5ubc28
Status Code 200 for humans
301 for all bots. Muahaha!!
Implementing Conditional Redirects Using Rewrite Rules
 • Selectively redirect bots that request URLs with session IDs to the URL sans session ID:
     RewriteCond %{QUERY_STRING} PHPSESSID
     RewriteCond %{HTTP_USER_AGENT} Googlebot.* [OR]
     RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR]
     RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
     RewriteCond %{HTTP_USER_AGENT} "Ask Jeeves"
     RewriteRule ^/(.*)$ /$1? [R=301,L]
   – The trailing ? drops the query string (session ID included) from the redirect target; without it, the session ID is carried along and the redirect loops. In .htaccess, drop the leading / from the pattern.
 • Utilize browscap.ini instead of having to keep up with each spider’s name and version changes
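The RewriteRule above discards the entire query string. If other parameters matter (sort order, color, etc.), a script-level check can remove just the one parameter. The Python sketch below is a hypothetical variant, not the talk's implementation. Its bot list mirrors the RewriteCond lines and is illustrative only. The same function could also strip an in-house affiliate parameter for the affiliate-link case; the `affid` name in the usage lines is made up.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Same user-agent tests as the RewriteCond lines (illustrative, not exhaustive).
BOT_PATTERN = re.compile(r"Googlebot|^msnbot|Slurp|Ask Jeeves")

def bot_param_redirect(url, user_agent, param="PHPSESSID"):
    """Return a URL to 301 a bot to (with `param` removed), or None to serve normally.

    Unlike the RewriteRule, which drops the whole query string, this keeps
    every other parameter.
    """
    if not BOT_PATTERN.search(user_agent or ""):
        return None                      # humans get the page as requested (200)
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in params if key != param]
    if len(kept) == len(params):
        return None                      # parameter absent: nothing to redirect
    return urlunsplit(parts._replace(query=urlencode(kept)))

ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(bot_param_redirect("http://www.example.com/shoes?PHPSESSID=abc123&color=red", ua))
print(bot_param_redirect("http://www.example.com/shoes?affid=42", ua, param="affid"))
```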
Error Pages
 • The traditional approach is to serve up a 404, which drops the error page at the obsolete or wrong URL out of the search indexes. This squanders the link juice pointing to that URL.
 • But what if you return a 200 status code instead, so that the spiders follow the links? Then include a meta robots noindex so the error page itself doesn’t get indexed.
 • Or do a 301 redirect to something valuable (e.g. your home page) and dynamically include a small error notice
 • (Credit to Francois Planque for this clever approach.)
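Below is a minimal sketch of the 200-plus-noindex approach as a Python WSGI app. The page store and the wording and links on the error page are hypothetical. The part that matters is the `200 OK` status combined with `<meta name="robots" content="noindex">`.

```python
# Hypothetical page store; a real site would look pages up in its CMS or database.
PAGES = {"/": b"<h1>Home</h1>"}

ERROR_PAGE = b"""<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex">
<title>Page not found</title>
</head><body>
<p>Sorry, that page has moved or no longer exists. Try one of these:</p>
<ul><li><a href="/">Home</a></li></ul>
</body></html>"""

def app(environ, start_response):
    """WSGI app: known pages as usual; unknown URLs get a 200 noindex error page."""
    body = PAGES.get(environ.get("PATH_INFO", "/"))
    if body is None:
        # 200 rather than 404, so spiders keep following the links on this page;
        # the robots meta tag keeps the error page itself out of the index.
        body = ERROR_PAGE
    headers = [("Content-Type", "text/html; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]
```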
URL Stability
 • An annually recurring feature, like a Holiday Gift Buying Guide, should have a stable, date-unspecified URL
   – No need for any 301s
   – When the current edition is to be retired and replaced with a new edition, assign a new URL to the archived edition
 • Otherwise link juice earned over time is not carried over to future years’ editions
URL Testing
 • URLs affect searcher clickthrough rates
 • Short URLs get clicked 2X as often as long URLs (Source: MarketingSherpa, used with permission)
URL Testing
 • Further, long URLs appear to act as a deterrent to clicking, drawing attention away from their listing and directing it instead to the listing below, which then gets clicked 2.5x more frequently.
   – http://searchengineland.com/080515-084124.php
 • Don’t be complacent with search-friendly URLs. Test and optimize.
 • Make iterative improvements to URLs, but don’t lose the link juice earned by previous URLs. 301 previous URLs to the latest one. No chains of 301s.
 • WordPress handles 301s automatically when renaming post slugs
 • Mass editing URLs (post slugs) in WordPress: announcement tomorrow in the Give It Up session
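“No chains of 301s” means each old URL should redirect straight to the latest URL rather than hop through every intermediate version. Suppose you have a redirect table accumulated over several rounds of URL changes; the table format and URLs here are hypothetical. The Python sketch below rewrites such a table so that every old URL points at its final destination. It also refuses redirect loops.

```python
def collapse_redirects(redirects):
    """Point every old URL straight at its final destination.

    `redirects` maps old URL -> new URL (one hop each). Returns a new dict in
    which no target is itself a redirect source, so each old URL is a single
    301 away from the latest URL. Raises ValueError on a redirect loop.
    """
    final = {}
    for start in redirects:
        seen = {start}
        target = redirects[start]
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        final[start] = target
    return final

# Hypothetical history: a dynamic URL, then a first rewrite, then a keyword-rich slug.
history = {
    "/widgets.php?id=7": "/widgets/7",
    "/widgets/7": "/blue-widgets",
}
print(collapse_redirects(history))
```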
Yank Competitor’s Grouped Results from Google Page 1 SERPs
 • Knock out your competitor’s second, indented (grouped) listing by directing link juice to other non-competitive listings (e.g. on page 2 of the SERPs, or directly below the indented result’s true position)
 • First, find the true position of their indented result by appending &num=9 to the URL and seeing if the indented listing drops off. If not, append &num=8. Rinse and repeat until the indented listing falls away. The worse its true position, the more susceptible the indented listing is.
This isn’t really #3
Nope, not yet
Gone! Its true position was #9
SEO the title of #12 to bump it up to page 1; it will be grouped under #2. Then link to #11 and bump it up to page 1 to knock #4 to page 2
More Things I Wish I Had Time to Cover
 • Robots.txt gotchas
 • Webmaster Central tools (www vs. no www, crawl rate, robots.txt builder, Sitemaps, etc.)
 • Yahoo's Dynamic URLs tab in Site Explorer
 • <div class="robots-nocontent">
 • If-Modified-Since
 • Status codes 404, 401, 500, etc.
 • PageRank transfer from PDFs, RSS feeds, Word docs, etc.
 • Diagnostic tools (e.g. livehttpheaders, User Agent Switcher)
Thanks!
 • This PowerPoint can be downloaded from www.netconcepts.com/learn/bot-herding.ppt
 • For a 180-minute screencast (including 90 minutes of Q&A) on SEO for large dynamic websites, taught by myself and Chris Smith and including transcripts, email seo@netconcepts.com
 • Questions after the show? Email me at stephan@netconcepts.com