Robots, or spiders, visit web sites to discover and analyse their content. Depending on who operates the robot, the discovered content may be indexed for use by search engines or harvested for e-mail addresses.
This page gives details for the CCBot Internet robot, including its owner, description, HTTP user agent and whether it adheres to the robots exclusion standard.
The aim of CommonCrawl is to develop a comprehensive crawl of the Internet.
The website reference, given below, confirms support for the robots.txt exclusion standard, which is described at http://www.robotstxt.org/wc/exclusion.html#robotstxt.
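As a rough illustration of what adherence to the exclusion standard means in practice, the sketch below uses Python's standard urllib.robotparser module to check whether a robot may fetch a URL under a given robots.txt. The robots.txt content, the www.example.com host and the is_allowed helper are invented for this example, and using "CCBot" as the robots.txt token is an assumption based on the robot's name rather than something this page states.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
# It blocks the "CCBot" token from /private/ and allows everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "CCBot" as the token is an assumption taken from the robot's name.
    for path in ("/index.html", "/private/report.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(SAMPLE_ROBOTS_TXT, "CCBot", url))
```

A robot that honours the standard would perform a check of this kind before each request and skip any URL for which the answer is negative.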
NAT August 2008
Copyright © 2004-2008 Janet Systems Ltd.