Google Bot Herding, PageRank Sculpting and Manipulation
- 1. Bot Herding
presented by Stephan Spencer,
Founder & President, Netconcepts
© 2008 Stephan M Spencer Netconcepts www.netconcepts.com sspencer@netconcepts.com
- 2. Duplicate Content Mitigation
Dup content is rampant on blogs. Herd bots to permalink
URL & lead in everywhere else (Archives by Date
pages, Category pages, Tag pages, Home page, etc.)
with paraphrased “Optional Excerpt”
– Not just the first couple paragraphs, i.e. the <!--more--> tag!
– Requires you to revise your Main Index Template theme file:
if (empty($post->post_excerpt) || is_single() || is_page()) { the_content(); }
else { the_excerpt(); echo "<a href='"; the_permalink(); echo "' rel='nofollow'>Continue reading »</a>"; }
- 3. Duplicate Content Mitigation
Include sig line (& headshot photo!) at bottom of
post/article. Link to original article/post permalink URL!
– http://www.naturalsearchblog.com/archives/2008/06/03/syndicating-your-articles/
– http://www.businessblogconsulting.com/2008/05/brand-yourself-with-photo-sig-line
- 4. Duplicate Content Mitigation
On ecommerce sites, dup content also rampant:
– Manufacturer-provided product descriptions, inconsistent order
of query string parameters, “guided navigation”, pagination
within categories, tracking parameters
Selectively append tracking codes for humans w/ “white
hat cloaking” or use JavaScript to append the codes
– REI.com used to append a "vcat" parameter on all brand links
on their Shop By Brand page (see
http://web.archive.org/web/20060823085548/www.rei.com/rei/sales_and_events/brands.html)
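One way to check whether two ecommerce URLs are really the same page is to normalize them before comparing. The sketch below is only an illustration: the function name and the list of tracking parameters are assumptions (only "vcat" comes from the REI example), and a real site would need its own list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking parameters; only "vcat" appears in the slide.
TRACKING_PARAMS = {"vcat", "utm_source", "utm_medium", "utm_campaign"}


def canonicalize(url: str) -> str:
    """Drop tracking parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    params.sort()  # a consistent parameter order removes one source of duplicates
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))
```

Running every internal link through a function like this before it is written into a page keeps bots on one URL per product, while tracking codes can still be appended for humans.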
- 5. Pagination
Pagination creates many pages that share the same
keyword theme. In very large categories with
thousands of products, it also leaves hundreds of pages of
product listings uncrawled, which lowers
product page indexation.
Herd bots through keyword-rich subcat links or “View
All” link or both? How to display page number links?
Optimal # of products to display/link per page? Test!
- 6. PageRank Leakage?
If you’re using Robots.txt Disallow, you’re probably
leaking PageRank
Robots.txt Disallow & Meta Robots Noindex both
accumulate and pass PageRank
– Meta Noindex tag on a Master sitemap page will de-index the
page but still pass PageRank to linked sub-sitemap pages
Meta Robots Nofollow blocks the flow of PageRank
– http://www.stonetemple.com/articles/interview-matt-cutts.shtml
- 7. Rewriting Spider-Unfriendly URLs
3 approaches:
1) Use a “URL rewriting” server module / plugin – such as
mod_rewrite for Apache, or ISAPI_Rewrite for IIS Server
2) Recode your scripts to extract variables out of the “path_info”
part of the URL instead of the “query_string”
3) Or, if IT department involvement must be minimized, use a
proxy server based solution (e.g. Netconcepts' GravityStream)
– With (1) and (2), replace all occurrences of your old URLs in
links on your site with your new search-friendly URLs. 301
redirect the old to new URLs too, so no link juice is lost.
- 8. mod_rewrite – the Foundation for URL
Rewriting, Remapping & Redirecting
Works with Apache and IBM HTTP Server
Place “rules” within .htaccess or your Apache config file
(e.g. httpd.conf, sites_conf/…)
– RewriteEngine on
– RewriteBase /
– RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]
– RewriteRule ^([^/]+)/([^/]+)\.htm$ /webapp/wcs/stores/servlet/ProductDisplay?storeId=10001&catalogId=10001&langId=-1&categoryID=$1&productID=$2 [QSA,P,L]
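To see what the first rule does to a request, here is a rough Python imitation of it. This is a sketch, not how Apache works internally: the function name is made up, and it assumes the leading slash has already been stripped from the path, as happens to paths matched in .htaccess.

```python
import re

# Mirrors: RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]
PRODUCT_RULE = re.compile(r"^products/([0-9]+)/?$")


def rewrite(path: str) -> str:
    """Return the internal URL for a search-friendly path, or the path unchanged."""
    match = PRODUCT_RULE.match(path)
    if match:
        # $1 in the rule corresponds to the first parenthesized group.
        return "/get_product.php?id=" + match.group(1)
    return path
```

Testing a pattern this way against a list of real URLs from your logs is a cheap check before a rule goes live.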
- 9. Regular Expressions
The magic of regular expressions / pattern matching
– * means 0 or more of the immediately preceding character
– + means 1 or more of the immediately preceding character
– ? means 0 or 1 occurrence of the immediately preceding char
– ^ means the beginning of the string, $ means the end of it
– . means any character (i.e. wildcard)
– \ “escapes” the character that follows, e.g. \. means dot
– [ ] is for character ranges, e.g. [A-Za-z].
– ^ inside [] brackets means “not”, e.g. [^/]
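The metacharacters listed here behave the same way in Python's re module for these simple cases, so they are easy to try out. The helper name below is an assumption made for illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# An unescaped dot is a wildcard, so it accepts any character in that position.
wildcard_dot = matches(r"^page.htm$", "pageXhtm")    # True
# Escaping the dot restricts it to a literal ".".
escaped_dot = matches(r"^page\.htm$", "pageXhtm")    # False

# + needs at least one digit; ? makes the trailing slash optional.
with_digits = matches(r"^products/[0-9]+/?$", "products/42")   # True
without_digits = matches(r"^products/[0-9]+/?$", "products/")  # False

# [^/] is any single character except a slash.
one_segment = matches(r"^[^/]+$", "shoes")        # True
two_segments = matches(r"^[^/]+$", "shoes/red")   # False
```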
- 10. Regular Expressions
– () puts whatever is wrapped within it into memory
– Access what’s in memory with $1 (what’s in first set of parens),
$2 (what’s in second set of parens), and so on
Gotchas to beware of:
– “Greedy” expressions. Use a negated class such as [^/]+ instead of .*
– .* can match on nothing. Use .+ instead
– Unintentional substring matches because ^ or $ wasn’t
specified
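Each of these gotchas can be reproduced in a few lines. The sketch below uses Python's re module and made-up paths; the helper name is an assumption.

```python
import re


def capture(pattern: str, text: str):
    """Return the captured groups as a tuple, or None when there is no match."""
    match = re.search(pattern, text)
    return match.groups() if match else None


# Greedy: the first .* swallows everything up to the LAST slash.
greedy = capture(r"^(.*)/(.*)\.htm$", "mens/shoes/red.htm")

# A negated class stops at the first slash, so a three-segment path is rejected.
strict = capture(r"^([^/]+)/([^/]+)\.htm$", "mens/shoes/red.htm")

# .* can match nothing at all; .+ insists on at least one character.
empty_ok = capture(r"^(.*)/(.*)\.htm$", "/.htm")
empty_rejected = capture(r"^(.+)/(.+)\.htm$", "/.htm")

# Without ^ and $ the pattern matches inside a longer, unrelated path.
unanchored = capture(r"products/([0-9]+)", "old-products/12345/reviews")
anchored = capture(r"^products/([0-9]+)$", "old-products/12345/reviews")
```

Here greedy captures "mens/shoes" and "red", while strict, empty_rejected and anchored all come back as None.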
- 11. mod_rewrite Specifics
Proxy page using [P] flag
– RewriteRule /blah\.html$ http://www.google.com/ [P]
[QSA] flag is for when you don’t want query string
params dropped (like when you want a tracking param
preserved)
[L] flag saves on server processing
Got a huge pile of rewrites? Use RewriteMap and have
a lookup table as a text file
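A text-file lookup table for RewriteMap is, as far as I know, just one key and one value per line separated by whitespace, with # for comments. Under that assumption, a small script can generate the file from whatever holds your old-to-new URL mapping. The function name and sample paths below are hypothetical.

```python
def render_rewrite_map(mapping: dict) -> str:
    """Render old-path -> new-path pairs as the body of a text lookup table."""
    lines = ["# old path -> new path"]
    for old, new in sorted(mapping.items()):
        # Whitespace separates key from value, so neither side may contain any.
        if any(ch.isspace() for ch in old + new):
            raise ValueError(f"whitespace not allowed in map entry: {old!r} -> {new!r}")
        lines.append(f"{old} {new}")
    return "\n".join(lines) + "\n"
```

Generating the table from a database export keeps thousands of one-off rewrites out of the rule file itself.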
- 12. IIS? ISAPI_Rewrite!
What if your site is running Microsoft IIS Server?
ISAPI_Rewrite plugin! Not that different from mod_rewrite
In httpd.ini :
– [ISAPI_Rewrite]
RewriteRule ^/category/([0-9]+)\.htm$ /index.asp?PageAction=VIEWCATS&Category=$1 [L]
– Lets a search-friendly URL like
http://www.example.com/category/207.htm serve the content of
http://www.example.com/index.asp?PageAction=VIEWCATS&Category=207
- 13. Implementing 301 Redirects Using
Redirect Directives
In .htaccess (or httpd.conf), you can redirect individual
URLs, the contents of directories, entire domains… :
– Redirect 301 /old_url.htm http://www.example.com/new_url.htm
– Redirect 301 /old_dir/ http://www.example.com/new_dir/
– Redirect 301 / http://www.example.com/
Pattern matching can be done with RedirectMatch 301
– RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/
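A prefix-style Redirect appends whatever follows the matched prefix to the target URL, which is what lets one line cover a whole directory or domain. The sketch below imitates only that behavior; it is a simplification with a made-up function name and ignores details such as matching on whole path segments.

```python
def redirect(prefix: str, target: str, request_path: str):
    """Imitate a prefix Redirect: the unmatched remainder is appended to the target."""
    if not request_path.startswith(prefix):
        return None  # the directive does not apply to this request
    return target + request_path[len(prefix):]


# A directory redirect carries the file name across.
moved = redirect("/old_dir/", "http://www.example.com/new_dir/", "/old_dir/page.htm")

# A site-wide redirect needs the trailing slash on the target...
good = redirect("/", "http://www.example.com/", "/page.htm")
# ...because without it the remainder is glued straight onto the host name.
bad = redirect("/", "http://www.example.com", "/page.htm")
```

Here bad comes out as http://www.example.compage.htm, so it is worth checking the trailing slash on any domain-level redirect.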
- 14. Implementing 301 Redirects Using
Rewrite Rules
Or use a rewrite rule with the [R=301] flag
– RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
– RewriteRule ^(.*)$ http://www.example.com/$1 [L,QSA,R=301]
[NC] flag makes the rewrite condition case-insensitive
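The logic of this canonical-hostname redirect is simple enough to express as a function, which helps when testing edge cases such as mixed-case hosts. This is a sketch only: the function name and return shape are assumptions, and it hard-codes http as the slide does.

```python
import re

CANONICAL_HOST = "www.example.com"
# re.IGNORECASE plays the role of the [NC] flag.
HOST_PATTERN = re.compile(r"^www\.example\.com$", re.IGNORECASE)


def canonical_redirect(host: str, path: str, query: str = ""):
    """Return (301, location) when the host is not canonical, otherwise None."""
    if HOST_PATTERN.match(host):
        return None  # already on the canonical host, serve normally
    location = "http://" + CANONICAL_HOST + path
    if query:
        location += "?" + query  # keep the query string, as [QSA] would
    return 301, location
```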
- 15. Conditional Redirects
Conditional 301 for bots – great for capturing the link juice
from inbound affiliate links
Only works if you manage your own affiliate program
Most programs are outsourced and use 302 redirects (e.g. Commission Junction)
By outsourcing your affiliate marketing, none of your deep
affiliate links are counting
If Amazon’s doing it, why can’t you?
– (Credit to Brian Klais for hypothesizing Amazon was doing this)
– http://tinyurl.com/5ubc28
- 16. Status Code
200 for humans
- 17. 301 for all bots.
Muahaha!!
- 18. Implementing Conditional Redirects
Using Rewrite Rules
Selectively redirect bots that request URLs with session
IDs to the URL sans session ID:
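– RewriteCond %{QUERY_STRING} PHPSESSID
RewriteCond %{HTTP_USER_AGENT} Googlebot.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} Ask\ Jeeves
RewriteRule ^/(.*)$ /$1? [R=301,L]
(The trailing ? drops the whole query string, session ID included; without it the session ID is carried into the redirect target.)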
Utilize browscap.ini instead of having to keep up with
each spider’s name and version changes
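The same decision can be sketched at the application level. The version below is an illustration with assumed names; unlike a rule that drops the entire query string, it removes only the PHPSESSID parameter and keeps the rest. The user-agent list is taken from the conditions above and will go stale, which is the argument for browscap.ini.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Spider names from the slide; a maintained list is the better long-term source.
BOT_PATTERN = re.compile(r"Googlebot|msnbot|Slurp|Ask Jeeves")


def bot_redirect(url: str, user_agent: str):
    """Return (301, clean_url) for a bot requesting a URL with a session ID, else None."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in params if key != "PHPSESSID"]
    if len(kept) == len(params):
        return None  # no session ID present, nothing to clean up
    if not BOT_PATTERN.search(user_agent):
        return None  # humans keep their session
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return 301, clean
```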
- 19. Error Pages
- 20. Error Pages
Traditional approach is to serve up a 404, which drops that error
page with the obsolete or wrong URL out of the search indexes.
This squanders the link juice to that page.
But what if you return a 200 status code instead, so that the
spiders follow the links? Then include a meta robots noindex so the
error page itself doesn’t get indexed.
Or do a 301 redirect to something valuable (e.g. your home page)
and dynamically include a small error notice
(Credit to Francois Planque for this clever approach.)
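The first alternative, a 200 response carrying a noindex tag, might look like the sketch below. Everything here is hypothetical (the page table, the error markup, the function name); it only shows the shape of the response described above.

```python
# Hypothetical site content, keyed by path.
PAGES = {"/": "<html><body><h1>Home</h1></body></html>"}

# Error page that bots may crawl for links but should not index.
ERROR_BODY = (
    "<html><head>"
    '<meta name="robots" content="noindex">'
    "</head><body>"
    "<p>Sorry, that page has moved or never existed.</p>"
    '<a href="/">Home</a>'
    "</body></html>"
)


def respond(path: str):
    """Return (status, body); unknown paths get a 200 error page marked noindex."""
    if path in PAGES:
        return 200, PAGES[path]
    return 200, ERROR_BODY
```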
- 21. URL Stability
An annually recurring feature, like a Holiday Gift Buying
Guide, should have a stable, date-unspecified URL
– No need for any 301s
– When the current edition is to be retired and replaced with a
new edition, assign a new URL to the archived edition
Otherwise link juice earned over time is not carried over
to future years’ editions
- 22. URL Testing
URL affects searcher clickthrough rates
Short URLs get clicked on 2X as often as long URLs
(Source: MarketingSherpa, used with permission)
- 23. URL Testing
Further, a long URL appears to act as a deterrent to clicking, drawing
attention away from its listing and toward the listing
below it, which then gets clicked 2.5x more frequently.
– http://searchengineland.com/080515-084124.php
Don’t be complacent with search-friendly URLs. Test and optimize.
Make iterative improvements to URLs, but don’t lose link juice to
previous URLs. 301 previous URLs to latest. No chains of 301s.
WordPress handles 301s automatically when renaming post slugs
Mass editing URLs (post slugs) in WordPress – announcement
tomorrow in Give It Up session
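Avoiding chains of 301s after several rounds of URL changes comes down to pointing every retired URL directly at the current one. A minimal sketch, assuming the redirects are held as a simple old-to-new dictionary (the function name and paths are made up):

```python
def flatten_redirects(redirects: dict) -> dict:
    """Point every old URL straight at its final destination."""
    flattened = {}
    for source in redirects:
        target = redirects[source]
        seen = {source}
        # Follow the chain until the target is no longer itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened
```

For example, a map of /v1 to /v2 and /v2 to /v3 comes back with both /v1 and /v2 pointing at /v3, so no request needs more than one hop.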
- 24. Yank Competitor’s Grouped Results
from Google page 1 SERPs
Knock out your competitor’s second indented (grouped)
listing by directing link juice to other non-competitive
listings (e.g. on page 2 SERPs, or directly below
indented result’s true position)
First, find the true position of their indented result by
appending &num=9 to the URL and see if the indented
listing drops off. If not, append &num=8. Rinse and
repeat until the indented listing falls away. Indented
listing is more susceptible the worse its true position.
- 25. This isn’t really #3
- 26. Nope, not yet
- 27. Gone! Its true position was #9
- 28. SEO the title of #12 to bump it up to page 1, where it will be grouped to #2.
Then link to #11 and bump it up to page 1 to knock #4 to page 2
- 29. More Things I Wish I Had Time to
Cover
Robots.txt gotchas
Webmaster Central tools (www vs no www, crawl rate, robots.txt
builder, Sitemaps, etc.)
Yahoo's Dynamic URLs tab in Site Explorer
<div class="robots-nocontent">
If-Modified-Since
Status codes 404, 401, 500 etc.
PageRank transfer from PDFs, RSS feeds, Word docs etc.
Diagnostic tools (e.g. livehttpheaders, User Agent Switcher)
- 30. Thanks!
This PowerPoint can be downloaded from
www.netconcepts.com/learn/bot-herding.ppt
For a 180-minute screencast (including 90 minutes
of Q&A) on SEO for large dynamic websites (taught
by myself and Chris Smith) – including transcripts –
email seo@netconcepts.com
Questions after the show? Email me at
stephan@netconcepts.com