Blocking the Big Bad Bots!

One of our sites was recently running out of bandwidth every 5-10 minutes. A quick look at the stats showed a vast amount of traffic from bots and spiders, and not necessarily bots that would be useful to us either!

[Photo: "Robot"]

We tend to block IP addresses at the server level; it’s quick and easy, and blocked requests don’t eat any bandwidth that way. We could probably block these bots at the server level too, but some of our customers might actually want to be indexed by them (although I can’t think why!).

So we decided to block them in .htaccess based on the User-Agent string. The following works really well:

# Tag any request whose User-Agent matches one of these bots (mod_setenvif)
BrowserMatchNoCase "Baiduspider" bots
BrowserMatchNoCase "buzzsumo" bots
BrowserMatchNoCase "AhrefsBot" bots
# Allow everyone else, deny anything tagged as a bot (Apache 2.2 syntax)
Order Allow,Deny
Allow from all
Deny from env=bots
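
One caveat: Order, Allow and Deny are Apache 2.2 directives and are deprecated in Apache 2.4. If your server runs 2.4, a rough equivalent using the newer Require syntax would be the following (a sketch, assuming mod_authz_core is loaded, which it is by default in 2.4):

# Tag matching User-Agents exactly as before
BrowserMatchNoCase "Baiduspider" bots
BrowserMatchNoCase "buzzsumo" bots
BrowserMatchNoCase "AhrefsBot" bots
# Apache 2.4: grant access unless the request was tagged as a bot
<RequireAll>
    Require all granted
    Require not env bots
</RequireAll>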

Either way, the request still reaches Apache, but the bot gets a 403 Forbidden error rather than the page itself, which uses next to no bandwidth.
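
To check the block is working, you can request a page while spoofing one of the blocked User-Agent strings, for example with curl -I -A "AhrefsBot" https://www.example.com/ (substitute your own site for the placeholder domain); it should come back with 403 Forbidden, while a request with a normal User-Agent still returns 200.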

Photo: “Robot” by andreavallejos is licensed under CC BY-ND 2.0
