The "robots.txt" file is a simple text file you can place on your website to tell search engines which parts of your site they may visit and index, and which parts they should avoid. With a robots.txt file you control how search engines interact with your site, including how CompanySpotter does. It is especially useful when certain parts of your site should not appear in search results, such as admin pages, private sections, or pages that are still under construction. The file must reside in the root directory of your site: if your site is www.test.com, your robots.txt file is found at www.test.com/robots.txt.
The robots.txt file is a basic building block of SEO (Search Engine Optimization): it helps ensure that the right content from your site is indexed and presented to search engine users. It can be a useful tool, but like all SEO strategies, it must be used carefully and wisely.
The important thing to remember is that although most search engines (including CompanySpotter) respect the rules in the robots.txt file, compliance is voluntary and not an absolute guarantee. Not all crawlers honor the rules, and malicious bots can willfully ignore the instructions, so robots.txt should never be used to protect sensitive content.
Below are some useful examples of how to build a robots.txt file:
Example 1: Block all search engines
If you don't want search engines to index your website, you can put the following in your robots.txt file:
User-agent: *
Disallow: /
Here User-agent: * says that the following rules apply to all search engines, and Disallow: / tells them to avoid the entire site. In effect, every search engine is asked not to index any page.
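You can check how a crawler interprets these rules with Python's standard-library robots.txt parser. This is an illustrative sketch: the www.test.com URLs are the hypothetical site from this article, and the rules are the two lines from Example 1.

```python
from urllib import robotparser

# The Example 1 rules: ask all crawlers to avoid the whole site.
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Any well-behaved crawler, whatever its name, is told to stay away
# from every URL on the (hypothetical) site www.test.com.
print(parser.can_fetch("Googlebot", "https://www.test.com/"))        # False
print(parser.can_fetch("AnyBot", "https://www.test.com/page.html"))  # False
```

In practice a crawler fetches www.test.com/robots.txt itself; parsing the lines directly, as above, is just a convenient way to test your rules before publishing them.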
Example 2: Blocking a specific search engine
For example, if you don't want Google to index your site, but you would like other search engines to index it:
User-agent: Googlebot
Disallow: /
Here User-agent: Googlebot says that the following rules apply to Google's crawler. Googlebot is asked not to index any pages, while all other search engines remain free to index the site.
Example 3: Blocking specific directories
If you want to prevent search engines from indexing certain directories of your site:
User-agent: *
Disallow: /private/
Disallow: /test/
In this example, all bots are instructed to avoid the "/private/" and "/test/" directories. The bottom line is that all search engines may index every page except those under "/private/" and "/test/".
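The same standard-library parser can confirm that only the listed directories are blocked. Again, the www.test.com paths below are hypothetical examples, not real pages.

```python
from urllib import robotparser

# The Example 3 rules: only /private/ and /test/ are off limits.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Disallow: /test/",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Pages under a disallowed directory are blocked; everything else is allowed.
print(parser.can_fetch("*", "https://www.test.com/private/report.html"))  # False
print(parser.can_fetch("*", "https://www.test.com/about.html"))           # True
```

Note that Disallow rules are prefix matches: "Disallow: /private/" covers every URL whose path begins with /private/, including files in its subdirectories.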
Example 4: Blocking specific files
If you want to prevent search engines from indexing certain files on your site:
User-agent: *
Disallow: /directory/my-file.html
This example instructs all bots to avoid the specific file "my-file.html" in the "/directory/" directory. The bottom line is that all search engines may index every page except /directory/my-file.html.