Block Bad Bots Globally On your cPanel/WHM CentOS Server

When it comes to the web, there are good bots and bad bots. An example of a good bot is Googlebot, Google's web crawler, which crawls new content and adds it to Google's search index. An example of a bad bot would be Cheesebot. Bad bots can include spiders, crawlers, and scrapers. They are not always malicious, but most of the time there is no need for them to crawl your site: they consume your bandwidth, take up server resources, and steal your content.

You could simply block the bots via your .htaccess file with a rule like this one here:

BrowserMatchNoCase "Baiduspider" bots

Order Allow,Deny
Allow from ALL
Deny from env=bots

To check if the rule is working you can run the following:

curl -A "Baiduspider" http://yourdomain.com
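The plain curl call above prints the whole page body, which makes it hard to tell at a glance whether the request was blocked. Here is a minimal sketch that prints only the HTTP status code instead. The function name bot_status is made up for illustration, and yourdomain.com is the same placeholder as above:

```shell
# bot_status is a hypothetical helper name, not part of curl or cPanel.
# Usage: bot_status "<user agent>" "<url>"
bot_status() {
  # -s: no progress output, -o /dev/null: discard the response body,
  # -w '%{http_code}\n': print only the HTTP status code,
  # -A: send the given string as the User-Agent header
  curl -s -o /dev/null -w '%{http_code}\n' -A "$1" "$2"
}
```

Running bot_status "Baiduspider" http://yourdomain.com should print 403 once the rule is active, while a regular browser User-Agent should still get whatever status the site normally returns.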

However, if you host a lot of sites on a single server, adding that to each account would be time-consuming.

Here is an easy way to block some bots for all sites at once, directly in the Apache configuration of your cPanel server.

One thing to keep in mind is that you should not edit httpd.conf directly on a cPanel server, as your changes will be overwritten the next time cPanel rebuilds httpd.conf.

You would need to add the rules to the following file:

/usr/local/apache/conf/includes/pre_main_global.conf
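Since this include file applies to every site on the server, it may be worth keeping a copy of it before you change anything. A minimal sketch, assuming you have write access to the directory the file lives in; backup_conf is a made-up helper name, not a cPanel tool:

```shell
# backup_conf is a hypothetical helper: it copies a file next to itself
# with a timestamp suffix and prints the name of the copy.
backup_conf() {
  src="$1"
  dest="$src.bak.$(date +%Y%m%d%H%M%S)"
  # -p keeps the original mode, ownership and timestamps where possible
  cp -p "$src" "$dest" && echo "$dest"
}
```

For example: backup_conf /usr/local/apache/conf/includes/pre_main_global.conf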

Depending on the bots that you would like to block, you could simply change the name or add more rules. In my case I want to block Yandex and MJ12bot, so I'm going to use the rules below:


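<LocationMatch ".*">
  SetEnvIfNoCase User-Agent "MJ12bot" bad_user
  SetEnvIfNoCase User-Agent "YandexBot" bad_user
  Deny from env=bad_user
</LocationMatch>

Alternatively, on Apache 2.4 you can use <If> blocks. The condition has to be a regular expression match rather than an exact comparison, because the real User-Agent header contains more than just the bot name. Also note that "Deny from" is the older 2.2-style access syntax, which on Apache 2.4 is provided by mod_access_compat:

<If "%{HTTP_USER_AGENT} =~ /MJ12bot/i">
  Deny from all
</If>

<If "%{HTTP_USER_AGENT} =~ /YandexBot/i">
  Deny from all
</If>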

You can change the MJ12bot and the YandexBot parts with any bots that you would like to block or simply add more rules for the other bots that you want to block.
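If the list of bots grows, typing one line per bot gets tedious. As a rough sketch, a small shell function can print the <LocationMatch> block for any list of names. The function name gen_bot_rules is made up for illustration; the directives it prints are the same ones used above:

```shell
# gen_bot_rules is a hypothetical helper: it prints one SetEnvIfNoCase line
# per bot name passed as an argument, wrapped in a <LocationMatch> block.
gen_bot_rules() {
  echo '<LocationMatch ".*">'
  for bot in "$@"; do
    printf '  SetEnvIfNoCase User-Agent "%s" bad_user\n' "$bot"
  done
  echo '  Deny from env=bad_user'
  echo '</LocationMatch>'
}
```

For example, gen_bot_rules MJ12bot YandexBot prints the block for those two bots, which you can review and then paste into the include file. Keep in mind that SetEnvIfNoCase treats each name as a regular expression, so names containing special characters would need escaping.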

Save the file, then test the syntax by running one of the following:

httpd -t

Or:

service httpd configtest

If you get "Syntax OK" then go ahead and restart Apache:

service httpd restart
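The syntax check and the restart can also be chained, so that Apache is only restarted when the check passes. A minimal sketch, assuming httpd and service are available as in the commands above; restart_if_valid is a made-up name:

```shell
# restart_if_valid is a hypothetical helper name.
# It restarts Apache only if "httpd -t" exits successfully.
restart_if_valid() {
  if httpd -t; then
    service httpd restart
  else
    echo "Apache config test failed; not restarting" >&2
    return 1
  fi
}
```

This way a typo in the include file does not take down every site on the server, because the running Apache is left untouched when the check fails.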

That is pretty much it. If one of the blocked bots now tries to access any of your sites, it will get a 403 Forbidden response.

 
