Block Bad Bots Globally On your cPanel/WHM CentOS Server

When it comes to the web, there are good bots and bad bots. An example of a good bot would be Googlebot. Googlebot is Google’s web crawling bot, which crawls new content and adds it to Google’s search index. An example of a bad bot would be Cheesebot. Bad bots can include spiders, crawlers, and scrapers. They are not always malicious, but most of the time there is no need for them to crawl your site. They consume your bandwidth, take up server resources, and may scrape your content.
You could simply block the bots via your .htaccess file with a rule like this one here:
BrowserMatchNoCase "Baiduspider" bots
Order Allow,Deny
Allow from ALL
Deny from env=bots
To check if the rule is working you can run the following:
curl -A "Baiduspider" http://yourdomain.com
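If you only care about the response code rather than the page body, the check above can be wrapped in a small helper. This is a minimal sketch, not part of the original setup: the function name check_bot is made up for illustration, and it assumes that a blocked request is answered with HTTP 403.

```shell
# Hypothetical helper: request a URL with a given User-Agent and
# report whether the server answered with 403 (blocked) or not.
check_bot() {
    ua="$1"
    url="$2"
    # -s: silent, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the HTTP status code
    code=$(curl -s -o /dev/null -w '%{http_code}' -A "$ua" "$url")
    if [ "$code" = "403" ]; then
        echo "blocked"
    else
        echo "allowed (HTTP $code)"
    fi
}
```

You could then run check_bot "Baiduspider" http://yourdomain.com once for each bot name that you have added a rule for.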
However, if you host a lot of sites on a single server, adding that rule to each account would be time-consuming.
Here is an easy way to block some bots directly in the Apache configuration of your cPanel server.
One thing to keep in mind is that you should not edit httpd.conf directly on a cPanel server, as your changes will be overwritten by the next cPanel httpd.conf rebuild.
Instead, add the rules to the following include file:
/usr/local/apache/conf/includes/pre_main_global.conf
Depending on which bots you would like to block, you can change the names or add more rules. In my case, I want to block YandexBot and MJ12bot, so I'm going to use the rules below:
<LocationMatch ".*">
    # Flag requests whose User-Agent contains one of these names (case-insensitive)
    SetEnvIfNoCase User-Agent "MJ12bot" bad_user
    SetEnvIfNoCase User-Agent "YandexBot" bad_user
    Deny from env=bad_user
</LocationMatch>
# Alternative form using Apache 2.4 <If> expressions. A regex match (=~) is used
# because real User-Agent strings contain more than just the bot name.
<If "%{HTTP_USER_AGENT} =~ /MJ12bot/i">
    Deny from all
</If>
<If "%{HTTP_USER_AGENT} =~ /YandexBot/i">
    Deny from all
</If>
You can replace MJ12bot and YandexBot with any bots that you would like to block, or add more rules for other bots.
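If the list of bots grows, typing one SetEnvIfNoCase line per bot gets tedious. As a rough sketch, the rules could be generated from a list of names. The function name gen_bot_rules is hypothetical, and the output only mirrors the SetEnvIfNoCase and Deny from env=bad_user rules used in this post; review it before adding it to your include file.

```shell
# Hypothetical helper: print a LocationMatch block with one
# SetEnvIfNoCase rule per bot name passed as an argument.
gen_bot_rules() {
    echo '<LocationMatch ".*">'
    for bot in "$@"; do
        printf '    SetEnvIfNoCase User-Agent "%s" bad_user\n' "$bot"
    done
    echo '    Deny from env=bad_user'
    echo '</LocationMatch>'
}
```

For example, gen_bot_rules MJ12bot YandexBot prints a block with two SetEnvIfNoCase lines, which you can check by eye and then paste into the include file.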
Save the file, then test the syntax by running one of the following:
httpd -t
Or:
service httpd configtest
If you get "Syntax OK" then go ahead and restart Apache:
service httpd restart
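The test and restart steps can also be chained so that Apache is only restarted when the syntax check passes. This is a minimal sketch using the same two commands as above; the function name safe_restart is made up for illustration.

```shell
# Hypothetical helper: restart Apache only if the config test succeeds.
safe_restart() {
    if httpd -t; then
        service httpd restart
    else
        echo "Syntax check failed; Apache was not restarted." >&2
        return 1
    fi
}
```

Running safe_restart as root after each change to the include file avoids restarting Apache with a broken configuration.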
That is pretty much it. If one of the blocked bots tries to access any of your sites, it will get a 403 Forbidden response.