Internal Duplicate Pages
If you use modules that generate printer-friendly versions of your pages, such as the Book module, or if your forum has sorting enabled, you will want to keep search engine robots from indexing those pages to avoid duplicate content.
Most of these issues can be handled through your robots.txt file. Drupal 5 comes preinstalled with a default robots.txt file, which mostly takes care of keeping the spiders out of the appropriate directories. However, it needs some changes and additions.
If you use the Printer Friendly Pages module, add the following to your robots.txt file:
Disallow: /print/
The Book module has a built-in printer-friendly option. To keep it out of the index, add:
Disallow: /book/
For the Forward to Friend module, add:
Disallow: /forward/
If you are using the Forum or Views modules, add the following to block sorted listings:
Disallow: /*sort=
For taxonomy vocabularies that allow multiple terms per node, you can choose which term paths to disallow:
Disallow: /term/
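Before uploading the file, it can help to check that the rules block what you expect. The following is a minimal sketch using Python's standard `urllib.robotparser`; the `User-agent: *` line, the `example.com` host, the `is_blocked` helper, and the sample paths are assumptions added for illustration. The `/*sort=` rule is left out because, as far as I know, the standard-library parser only does plain prefix matching and does not interpret `*` wildcards.

```python
from urllib.robotparser import RobotFileParser

# Prefix rules taken from this page. The "User-agent: *" line and the
# example.com host below are assumptions added so the parser has a
# complete record to work with.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /book/
Disallow: /forward/
Disallow: /term/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path, agent="*"):
    """Return True if the rules above tell `agent` not to crawl `path`."""
    return not parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    # Hypothetical sample paths; substitute paths from your own site.
    for path in ["/print/12", "/book/export/html/3", "/forward/5", "/node/12"]:
        print(path, "blocked" if is_blocked(path) else "allowed")
```

This only tells you how one parser reads the rules. Individual search engines may interpret a robots.txt file differently, especially wildcard patterns, so treat it as a sanity check rather than a guarantee.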
Additionally, there are some potential problems with the default robots.txt file that need correcting.
Special thanks to Drupalzilla, whose article on Drupal's robots.txt file pointed out many of the problems with the default file. Below you can download my variation of his improved robots.txt file, to which I've added a few extra files and directories to disallow.
NOTE: This should not be considered a security feature. A robots.txt file is only a request that well-behaved crawlers choose to honor. If you find malicious bots ignoring your robots.txt file, you will need to block them directly in the .htaccess file.
| Attachment | Size |
|---|---|
| enhanced-robots-2.txt | 1.61 KB |



enhanced-robots-2.txt compatible with 6.x?
This is a great site for a Drupal newb (me). Is the enhanced-robots-2.txt file compatible with Drupal 6? Thanks for the info here.