Content's trendy. Links are talked to death. But everyone ignores infrastructure. Why? This presentation covers technical issues, from log file crawls to response codes to obfuscated content.
27. grep -i "googlebot" biglog.log >> googlelog.txt
“Find me every line that mentions ‘Googlebot’ (in any letter case) and append the result to a file called ‘googlelog.txt’, creating it if it doesn't exist.”
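To see what this command does before running it against a large log, it can be tried on a tiny sample first. The sketch below is only an illustration: the three log lines, the IP addresses, and the temporary directory are made up, and it assumes the crawler identifies itself as "Googlebot" in the user-agent field. Because grep is case-sensitive by default, the sketch uses -i so that any capitalization matches.

```shell
# Hypothetical three-line sample log; a real log will be far larger.
workdir=$(mktemp -d)
cat > "$workdir/biglog.log" <<'EOF'
GET /index.aspx - 80 - 66.249.66.1 Mozilla/5.0+(compatible;+Googlebot/2.1) - 200 0 0 15
GET /index.aspx - 80 - 84.3.92.94 Mozilla/5.0+ - 200 0 0 14
GET /old.aspx - 80 - 66.249.66.1 Mozilla/5.0+(compatible;+Googlebot/2.1) - 404 0 0 12
EOF

# -i ignores case, so "Googlebot", "GoogleBot" and "googlebot" all match.
# ">" overwrites the output file on every run; ">>" would append instead.
grep -i "googlebot" "$workdir/biglog.log" > "$workdir/googlelog.txt"

# Count the matching lines as a quick sanity check.
googlebot_hits=$(grep -ci "googlebot" "$workdir/biglog.log")
echo "Googlebot lines: $googlebot_hits"
```

Using ">" rather than ">>" here is a deliberate choice for repeat runs: appending to an existing file would double-count lines if the same log is processed twice.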
28. grep -v "foo.com" biglog.log | grep -v "bot" | grep -v " 200 " | grep -v "spider" | grep ".aspx" >> bar.txt
In English: Find me all non-200 responses for .aspx pages that didn't come from bots, spiders, or links on foo.com itself. In practice, that's mostly people clicking on links on other sites. Append them to bar.txt.
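The filter chain can be checked the same way on a handful of lines. This is a sketch under assumptions: the five sample lines are invented (only the first mirrors the log entry shown in the deck), foo.com stands in for your own domain, and the output goes to a temporary directory rather than a real bar.txt.

```shell
# Hypothetical sample log: one line per case the filters are meant to handle.
workdir=$(mktemp -d)
cat > "$workdir/biglog.log" <<'EOF'
GET /gaggle/sheckel/ourstuff.aspx - 80 - 84.3.92.94 Mozilla/5.0+ http://www.othersite.com/datstuff.aspx 302 0 0 14
GET /gaggle/missing.aspx - 80 - 84.3.92.94 Mozilla/5.0+ http://www.foo.com/home.aspx 404 0 0 9
GET /gaggle/missing.aspx - 80 - 66.249.66.1 Mozilla/5.0+(compatible;+Googlebot/2.1) - 404 0 0 12
GET /gaggle/fine.aspx - 80 - 84.3.92.94 Mozilla/5.0+ http://www.othersite.com/datstuff.aspx 200 0 0 11
GET /style.css - 80 - 84.3.92.94 Mozilla/5.0+ http://www.othersite.com/ 404 0 0 3
EOF

# Drop internal referrers, bots, 200s and spiders, then keep only .aspx requests.
grep -v "foo.com" "$workdir/biglog.log" \
  | grep -v "bot" \
  | grep -v " 200 " \
  | grep -v "spider" \
  | grep ".aspx" > "$workdir/bar.txt"

# Only the 302 from the external link should survive.
kept=$(grep -c "" "$workdir/bar.txt")
echo "Lines kept: $kept"
```

Each pattern matches anywhere on the line, not in a specific field, so "bot" would also drop a human request for a URL that happens to contain "bot". For a first pass that is usually an acceptable trade-off.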
29. GET /gaggle/sheckel/ourstuff.aspx - 80 - 84.3.92.94 Mozilla/5.0+ http://www.othersite.com/datstuff.aspx 302 0 0 14
A visitor who followed a link on that site (datstuff.aspx) to this page (ourstuff.aspx) got a 302 redirect response.
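Reading entries like this by eye gets tedious, so the interesting fields can be pulled out with awk. The sketch below assumes the field layout of the entry above: the request path is the second field, and the line ends with referrer, status, and three further numeric fields. Other log configurations will put these in different positions, so treat the field numbers as placeholders.

```shell
# The log entry from the slide, stored as a single line.
log_line='GET /gaggle/sheckel/ourstuff.aspx - 80 - 84.3.92.94 Mozilla/5.0+ http://www.othersite.com/datstuff.aspx 302 0 0 14'

# Assumed layout: $2 is the requested path; counting back from the end,
# $(NF-3) is the status code and $(NF-4) is the referrer.
summary=$(printf '%s\n' "$log_line" | awk '{ print $(NF-3), $2, "<-", $(NF-4) }')
echo "$summary"
```

Counting fields from the end of the line keeps the sketch working if the log also has leading fields, such as a date and time, that the entry on the slide omits.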