One of our sites was recently running out of bandwidth every 5-10 minutes. A quick look at the stats showed a vast amount of traffic from bots/spiders, and not necessarily bots that would be useful for us either!
We tend to block IP addresses at a server level; it's quick and easy, and they don't eat any bandwidth that way. We could probably block these bots at a server level too, but some of our customers might actually want to be indexed by them (although I can't think why!).
So we decided to block them in .htaccess based on the User-Agent string. The following works really well:
BrowserMatchNoCase "Baiduspider" bots BrowserMatchNoCase "buzzsumo" bots BrowserMatchNoCase "AhrefsBot" bots Order Allow,Deny Allow from ALL Deny from env=bots
The requests still hit Apache, but the bots get a 403 error rather than getting through to the site itself.
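A side note: on Apache 2.4 and later the Order/Allow/Deny directives are deprecated and only work via mod_access_compat, so if that module isn't enabled you'd need the newer Require syntax instead. A rough, untested sketch of the equivalent rules (assuming mod_authz_core, which ships with 2.4) would be:

# Flag requests from unwanted crawlers by User-Agent
BrowserMatchNoCase "Baiduspider" bots
BrowserMatchNoCase "buzzsumo" bots
BrowserMatchNoCase "AhrefsBot" bots
# Allow everything except requests flagged with the "bots" variable
<RequireAll>
    Require all granted
    Require not env bots
</RequireAll>

The effect should be the same: a matching User-Agent sets the bots environment variable and the request is refused with a 403.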
Photo: “Robot” by andreavallejos is licensed under CC BY-ND 2.0