3. GOOGLEBOT
Google's crawler, which reads web pages
File formats read by Googlebot
1. Adobe PDF documents and PostScript (ps)
2. Microsoft Office document types:- Excel, Word, PowerPoint
3. Lotus document types:- wk1, wk2, wk3, wk4, wk5, wki, wks, wku
4. Lotus WordPro (lwp), MacWrite (mw), Rich Text Format (rtf), text files (txt)
Naveen Gujar
4. More on GOOGLEBOT
1. File formats avoided:- exe, dll, zip, dmg
2. Can be told which pages it may or may not crawl through the
use of a robots.txt file
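The robots.txt point above can be checked with a short Python sketch. It uses the standard library's urllib.robotparser, which is only an approximation of how Googlebot itself reads the file. The rules, the /private/ path and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def googlebot_may_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given rules let the Googlebot user agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/report.html"):
        print(url, googlebot_may_fetch(ROBOTS_TXT, url))
```

The file only expresses crawl permissions. It does not send the crawler to a page, so pages that should be found still need links or a Sitemap.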
5. GOOGLEBOT Partners
FRESHBOT:-
Used to crawl updated pages on the web.
DEEPBOT:-
Follows as many links and downloads as many pages as possible.
6. MEDIABOT
Used for serving contextually relevant ads on publishers' sites
Purpose:- To analyze the content of web pages so that AdSense can
serve meaningful ads on the site.
This crawler should not be blocked on sites that use AdSense
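One way to keep MEDIABOT unrestricted while limiting other crawlers is to give it its own robots.txt group. The sketch below assumes the AdSense crawler identifies itself with the user-agent token Mediapartners-Google. The /drafts/ path, the SomeOtherBot name and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the AdSense crawler may read everything,
# every other crawler is kept out of /drafts/.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /drafts/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/drafts/post.html"
    for agent in ("Mediapartners-Google", "SomeOtherBot"):
        print(agent, may_fetch(agent, page))
```

An empty Disallow line means "nothing is disallowed", so the dedicated group overrides the general rule for that one crawler.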
7. IMAGEBOT
Scours the web for images to place in Google's image index
Ranking of an image for a particular keyword depends upon:-
Filename, surrounding text, alt text and page title
If a website is not focused on image inventory and downloads,
it makes sense to restrict IMAGEBOT using robots.txt
Restricting IMAGEBOT also saves some bandwidth
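To see what three of the ranking inputs listed above look like on a real page, the following sketch pulls the page title and each image's filename and alt text out of some HTML with the standard library's html.parser. It is an illustration of where those signals live, not Google's extraction logic. Surrounding text is left out for brevity, and the sample markup is hypothetical.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageSignalParser(HTMLParser):
    """Collects the page title and, for each <img>, its filename and alt text."""

    def __init__(self):
        super().__init__()
        self.page_title = ""
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            attr = dict(attrs)
            src = attr.get("src") or ""
            self.images.append({
                "filename": basename(urlparse(src).path),
                "alt": attr.get("alt") or "",
            })

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.page_title += data


# Hypothetical page used only for illustration.
SAMPLE_HTML = """<html><head><title>Red Running Shoes</title></head>
<body><p>Lightweight shoes for road running.</p>
<img src="/img/red-running-shoes.jpg" alt="Red running shoes, side view">
</body></html>"""


def extract_image_signals(html):
    """Return (page_title, list of image dicts) for the given HTML string."""
    parser = ImageSignalParser()
    parser.feed(html)
    parser.close()
    return parser.page_title.strip(), parser.images


if __name__ == "__main__":
    print(extract_image_signals(SAMPLE_HTML))
```

Running this against your own pages is a quick way to spot images with generic filenames or missing alt text.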
8. ADSBOT
It serves a very specific purpose as far as crawling is concerned
Geared to inform the Google AdWords program by:-
Analyzing the content of the landing pages that ads point to.
This content analysis helps in determining the Quality Score
for a particular ad.
The Quality Score, together with the bid amount and CTR
(click-through rate), is used by Google to determine the
ranking score of an ad for a particular keyword.
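The interaction between bid and Quality Score can be illustrated with a deliberately simplified model in which the ranking score is the bid multiplied by the Quality Score. This is an assumption made for illustration, not Google's actual formula. CTR is treated here as already folded into the Quality Score, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    max_bid: float      # hypothetical bid, in currency units per click
    quality_score: int  # hypothetical score on a 1-10 scale


def ad_rank(ad: Ad) -> float:
    """Simplified ranking score: bid times Quality Score (an assumption)."""
    return ad.max_bid * ad.quality_score


def rank_ads(ads):
    """Return the ads ordered from highest to lowest simplified ranking score."""
    return sorted(ads, key=ad_rank, reverse=True)


if __name__ == "__main__":
    ads = [Ad("high bid, weak page", 2.0, 4), Ad("low bid, strong page", 1.0, 9)]
    for ad in rank_ads(ads):
        print(ad.name, ad_rank(ad))
```

Under this model an ad with a lower bid can still outrank a higher bidder when its landing page earns a better Quality Score.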
9. GOOGLEBOT-MOBILE
Google uses a specific crawler for indexing mobile content.
Google indexes public mobile content.
If the content appears to be available only to a subset of all
mobile users, it is NOT indexed.
Users can search the mobile web on their mobile devices using
Google Mobile Web Search.
10. Getting Your Mobile Content Indexed
The steps are roughly the same as for non-mobile content:-
Submit Mobile Sitemaps to the Google Mobile Index in the
same way that non-mobile Sitemaps are submitted.
Create and add Mobile Sitemaps in your Google Webmaster
Tools account just as you would Sitemaps for non-mobile
content.
If your mobile site has changed, you can resubmit your
Sitemap.
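A Mobile Sitemap is an ordinary XML Sitemap in which each URL entry carries an extra empty mobile marker element. The sketch below builds one with the standard library. It assumes the sitemap-mobile/1.0 namespace from Google's Mobile Sitemap format, and the m.example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed namespace for Google's Mobile Sitemap extension.
MOBILE_NS = "http://www.google.com/schemas/sitemap-mobile/1.0"


def build_mobile_sitemap(urls):
    """Return a Mobile Sitemap (as an XML string) listing the given page URLs."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("mobile", MOBILE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = page_url
        # The empty <mobile:mobile/> element marks the URL as mobile content.
        ET.SubElement(url_el, f"{{{MOBILE_NS}}}mobile")
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_mobile_sitemap([
        "http://m.example.com/",
        "http://m.example.com/news.html",
    ]))
```

Regenerating the file after the site changes and resubmitting it follows the same routine as for a non-mobile Sitemap.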
11. FEEDFETCHER-GOOGLE
This is Google's RSS and Atom feed crawler
Covers blogs published through Blogger, WordPress, TypePad,
etc.
Covers blogs written in English, French, German, Italian,
Spanish, Brazilian Portuguese, etc.
The average interval between crawls is more than an hour, depending on
how often the blog is updated.
If your blog publishes a site feed in any format and pings an
update service, then the contents of this feed will be indexed in
Blog Search.
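Pinging an update service is commonly done with the XML-RPC method weblogUpdates.ping, which takes the blog's name and URL. The sketch below only builds the request body with the standard library's xmlrpc.client and does not send anything. The blog name and URL are hypothetical, and which update service to ping is left to the publisher.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


if __name__ == "__main__":
    # Hypothetical blog details, used only for illustration.
    print(build_ping_request("My Blog", "http://example.com/blog/"))
```

To actually send the ping, the same call can be made through xmlrpc.client.ServerProxy pointed at the update service's endpoint. Most blogging platforms do this automatically when a post is published.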