If you run a Drupal site for any length of time you will quickly realise that a few paths that have nothing to do with Drupal receive a lot of traffic. All of these requests result in page not found errors, so their only impact is to take up your server resources. It's common to see paths like wp-login, xmlrpc.php, phpBB/page_header.php and postnuke/article.php, as well as a multitude of others. These requests are clearly bots probing the site to see what sort of CMS is in use and whether they can exploit it.
It's a bit of a shame that the internet is like this, but it's just one of the things you need to be aware of when managing a website. Users, and more often, bots, will continuously probe your site and servers for exploits. This is why you need to have firewalls and ensure your software is up to date as people are only too willing to crack your site and expose your data.
As well as probing for exploits, bots will also look for files left in the web root. Database dumps, backup files, testing files and unsecured directories used by common modules are all requested on a regular basis. It's important to realise that security through obscurity isn't security at all. Thinking that a user won't find that database dump in your web root is simply naive.
To find out what sort of paths are being commonly requested on your Drupal site you can use the following SQL. This assumes that you have the DBLog module active on the site, which might not always be the case. The results will be a neatly ranked list of 404 (i.e. page not found) pages on your site.
SELECT location, COUNT(location) AS location_count
FROM watchdog
WHERE type = 'page not found'
  AND location NOT LIKE '%/sites/all/%'
  AND location NOT LIKE '%/sites/default/%'
  AND location NOT LIKE '%/styles/%'
GROUP BY location
ORDER BY location_count DESC
LIMIT 50;
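If DBLog isn't enabled, the web server's access log holds the same information. The sketch below assumes an Apache or Nginx access log in the common "combined" format and ranks the paths that returned a 404, skipping the same directories as the query above. The regular expression and the top_404_paths helper are illustrative, not part of Drupal, and may need adjusting for a custom log format.

```python
import re
from collections import Counter

# Captures the path and status code from a combined-format line such as:
# 203.0.113.5 - - [01/Jan/2020:00:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 209 "-" "bot"
LOG_LINE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')

# Fragments to ignore, mirroring the exclusions in the SQL query.
IGNORED = ("/sites/all/", "/sites/default/", "/styles/")


def top_404_paths(lines, limit=50):
    """Return the most requested 404 paths as (path, count) pairs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or match.group("status") != "404":
            continue
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if any(fragment in path for fragment in IGNORED):
            continue
        counts[path] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [01/Jan/2020:00:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 209 "-" "bot"',
        '203.0.113.6 - - [01/Jan/2020:00:00:02 +0000] "POST /wp-login.php HTTP/1.1" 404 209 "-" "bot"',
        '203.0.113.7 - - [01/Jan/2020:00:00:03 +0000] "GET /xmlrpc.php?rsd HTTP/1.1" 404 209 "-" "bot"',
        '198.51.100.2 - - [01/Jan/2020:00:00:04 +0000] "GET /sites/default/files/styles/thumb/a.png HTTP/1.1" 404 0 "-" "browser"',
        '198.51.100.2 - - [01/Jan/2020:00:00:05 +0000] "GET /node/1 HTTP/1.1" 200 5120 "-" "browser"',
    ]
    for path, count in top_404_paths(sample):
        print(count, path)
```

On a real server you would pass an open log file (for example open("/var/log/apache2/access.log")) instead of the sample list; that path is only an example.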
In the last week alone this site has received well over 300 requests for wp-login.php, and because there is nothing there to respond, each request is given a 404 response. This still bootstraps Drupal and takes up valuable server resources, so I decided to do something about it.
It turns out there are a few strategies for solving this issue.
Drupal Fast 404
By default, Drupal comes with a fast 404 feature that lets you skip building a full Drupal page and return a lightweight page instead. This feature has been around since Drupal 7 and is still part of core in Drupal 8.
Look in your settings.php file and you will find some configuration items for fast_404.
- exclude_paths - This is a regular expression that allows you to exclude anything you don't want the fast 404 system to respond to. By default this covers the styles and system/files directories, which means dynamically generated content in these directories is still served correctly.
- paths - A regular expression containing a list of paths that you want to trigger the fast 404 page for. By default this is a list of file extensions.
- html - This is the actual HTML that your site will serve when a request triggers the fast 404 response.
To activate the fast 404 functionality you just need to uncomment these lines.
$config['system.performance']['fast_404']['exclude_paths'] = '/\/(?:styles)|(?:system\/files)\//';
$config['system.performance']['fast_404']['paths'] = '/\.(?:txt|png|gif|jpe?g|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/i';
$config['system.performance']['fast_404']['html'] = '<!DOCTYPE html><html><head><title>404 Not Found</title></head><body><h1>Not Found</h1><p>The requested URL "@path" was not found on this server.</p></body></html>';
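Before editing settings.php it can help to see which requests a pair of patterns will actually catch. This sketch assumes the check works the way the settings describe: exclude_paths is tested first, then paths, both as unanchored searches against the request path including its leading slash. It uses Python's re module as a stand-in for PHP's preg_match, which behaves the same for simple patterns like these. php_regex and serves_fast_404 are hypothetical helpers, not Drupal functions.

```python
import re


def php_regex(pattern):
    """Compile a PHP-style '/body/flags' pattern with Python's re module.

    Assumes '/' is the delimiter and only understands the 'i' flag, which is
    all the fast_404 settings use.
    """
    body, flags = pattern[1:].rsplit("/", 1)
    return re.compile(body, re.IGNORECASE if "i" in flags else 0)


EXCLUDE_PATHS = php_regex(r'/\/(?:styles)|(?:system\/files)\//')
PATHS = php_regex(r'/\.(?:txt|png|gif|jpe?g|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/i')


def serves_fast_404(path):
    """Approximate the fast 404 decision for a path that has already 404'd."""
    if EXCLUDE_PATHS.search(path):
        return False  # excluded paths fall through to normal 404 handling
    return PATHS.search(path) is not None


if __name__ == "__main__":
    for path in ["/missing.png", "/LOGO.PNG", "/sites/default/files/styles/thumb/a.png",
                 "/system/files/report.txt", "/wp-login.php", "/stylesheet.css"]:
        print(f"{path:45} {serves_fast_404(path)}")
```

The last probe shows a quirk of the default exclude pattern: because the alternation splits it into \/styles and system\/files\/, any path that merely contains /styles, such as /stylesheet.css, is excluded as well. Note also that wp-login.php is not caught, since .php is not in the default extension list.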
When you now visit a missing file that matches one of these patterns, you will see a cut-down page that tells the user that the file they were looking for does not exist.
The output of this feature isn't exactly fully featured. Because this page is issued for any missing file that matches the pattern, there is a chance that your normal users might trip over it if they are unlucky, so you will need to spend some time tweaking the 404 HTML so that it is more on brand for your site. Also remember that the fast 404 page is produced quite early in the Drupal bootstrap, so you won't have access to the modules and blocks that you would otherwise use on a Drupal page. You don't even have access to the Drupal services, so you can't easily query the database to pull out the site name. Fast 404 can also clash with a few other modules, like the robots.txt module: you need to add an exclude rule for the robots.txt file to prevent fast 404 issuing a 404 before the module can do its job.
To add the common 404 paths to this feature you need to add them to the paths configuration item in the fast_404 settings.
$config['system.performance']['fast_404']['paths'] = '/^\/(?:wp-login|wp-admin|xmlrpc|phpbb)|\.(?:php|txt|png|gif|jpe?g|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/i';
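Where the ^ anchor sits matters when adding path prefixes like these. Assuming the string being matched is the request path with its leading slash (which is what Drupal 8's fast 404 check passes to preg_match, as far as I can tell), a prefix anchored with ^ directly in front of the name never matches. This sketch compares the two forms using Python's re as a stand-in for preg_match; WITHOUT_SLASH, WITH_SLASH and is_fast_404 are illustrative names.

```python
import re

EXTENSIONS = r"\.(?:php|txt|png|gif|jpe?g|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$"

# Prefix alternative anchored at the very start of the string.
WITHOUT_SLASH = re.compile(r"^(wp-login|wp-admin|xmlrpc|phpbb)|" + EXTENSIONS, re.IGNORECASE)
# Prefix alternative anchored just after the leading slash.
WITH_SLASH = re.compile(r"^/(?:wp-login|wp-admin|xmlrpc|phpbb)|" + EXTENSIONS, re.IGNORECASE)


def is_fast_404(pattern, path):
    """Return True if the compiled pattern matches anywhere in the path."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    for path in ["/wp-login.php", "/wp-admin/", "/phpBB/", "/node/1"]:
        print(f"{path:15} without slash={is_fast_404(WITHOUT_SLASH, path)} "
              f"with slash={is_fast_404(WITH_SLASH, path)}")
```

Files such as wp-login.php are caught by the extension alternative either way; it is directory-style probes such as /wp-admin/ or /phpBB/ that slip through when the prefix is anchored without the slash.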
When users now visit pages like wp-login they will see a cut-down 404 page that uses far fewer resources than a full Drupal page.
Ultimately, the fast 404 system has a few limitations, but for preventing access to useless pages it might be the solution you want. I wouldn't recommend using it unless you have a very specific use case, though, as it can take quite a bit of setting up and can often interfere with other aspects of the site (robots.txt, for example).
Return A 404 Via .htaccess
Of course, we could opt to issue a 404 response before ever bootstrapping Drupal. This can be done in a variety of ways, such as adding rules to your web server, adding VCL rules to your Varnish servers, adding exceptions to your Cloudflare configuration, or even blocking requests using a hardware firewall.
To return a 404 for these paths from your .htaccess file, add rules like the following just after the "RewriteBase /" line in the Drupal core default .htaccess file.
RewriteRule ^wp-admin/? - [L,R=404]
RewriteRule ^(?:wp-login|xmlrpc)\.php$ - [NC,L,R=404]
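In an .htaccess file, mod_rewrite matches against the path with the directory prefix removed, which is why the rule starts with ^wp-admin rather than ^/wp-admin. The sketch below uses Python's re to approximate how the wp-admin pattern behaves for an .htaccess file in the web root. htaccess_returns_404 is a hypothetical helper, and real mod_rewrite processing (RewriteBase, earlier rules, flags) involves more than this.

```python
import re

# Pattern from the example rule: RewriteRule ^wp-admin/? - [L,R=404]
RULE = re.compile(r"^wp-admin/?")


def htaccess_returns_404(request_path):
    """Approximate whether the rule fires when it lives in the web root's .htaccess.

    In per-directory context mod_rewrite strips the directory prefix before
    matching, so a request for "/wp-admin/" is tested as "wp-admin/". The
    query string is never part of the string being matched.
    """
    path = request_path.split("?", 1)[0]
    if path.startswith("/"):
        path = path[1:]
    return RULE.search(path) is not None


if __name__ == "__main__":
    for probe in ["/wp-admin/", "/wp-admin/install.php", "/wp-administrator",
                  "/blog/wp-admin/", "/node/1"]:
        print(f"{probe:25} {htaccess_returns_404(probe)}")
```

Because the pattern is only anchored at the start, the optional slash makes no real difference: /wp-administrator is caught too, while a probe under a subdirectory such as /blog/wp-admin/ is not. Adjust the anchors if either case matters for your site.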
With this in place the user will be given a 404 page issued by the server itself rather than Drupal. Again, much like the fast 404 feature, you will need to spend time customising that page if you want anything on it. At least it takes responsibility for issuing the 404 page away from Drupal, so these requests no longer use up Drupal's resources.
The Perimeter Module
If you want to take things to the next level then the Perimeter Defence module might be for you. This module has a pretty simple administration interface that allows you to add some paths that will cause the user to be entirely blocked from the site.
The administration interface holds a list of paths, and you just need to add any paths that should get a visitor blocked if they happen to request them.
With the module active, users who visit one of these paths, wp-login for example, are banned straight away and shown a plain message saying that their IP address has been banned.
The Perimeter Defence module ties into the Ban module, which is a core Drupal module. This means that when a user is banned from the site they are banned permanently. For this reason you need to make very sure that you have database level access before trying out any of the paths. The module doesn't care if you are logged in or not, it will just ban your IP address straight away.
Thankfully, if any of your users have managed to get themselves banned then there is a simple enough admin interface to allow you to administer the IP addresses.
Clicking delete next to an IP address in that list removes the ban for that IP, allowing the user to visit the site again. The good thing about this method is that it automatically bans users who are attempting to visit your site with known exploit paths. If you take the approach of "if you are looking there then you're up to no good" then the module fits the purpose and will perform very well. The only bad thing (aside from accidentally banning yourself very easily) is that Drupal still handles these requests and queries the database, so they still take up resources.
One word of warning as well: the administration interface will let you enter broken regular expressions, and if you do, you'll find your 404 pages producing errors. Just treat each line in the interface as a separate regular expression and you should be able to spot any problems.
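One way to spot those problems before saving is to compile each line on its own. This is a minimal sketch: check_patterns is a hypothetical helper, it uses Python's re as a stand-in for PHP's PCRE engine (the two disagree on some syntax, so treat the result as a hint), and it doesn't know how the Perimeter module wraps the patterns internally, so it mainly catches mistakes like unbalanced brackets.

```python
import re


def check_patterns(text):
    """Compile each non-empty line as a separate regular expression.

    Returns (line_number, pattern, error) tuples for the lines that fail.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        pattern = line.strip()
        if not pattern:
            continue  # blank lines are skipped
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append((number, pattern, str(exc)))
    return problems


if __name__ == "__main__":
    patterns = "\n".join(["wp-login", "wp-admin", r"xmlrpc\.php", "phpbb/(page_header"])
    for number, pattern, error in check_patterns(patterns):
        print(f"line {number}: {pattern!r}: {error}")
```

Here the fourth line is reported because its opening parenthesis is never closed.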
Also, if you are behind a CDN or proxy like Cloudflare, make sure Drupal is configured to see your users' real IP addresses, as otherwise you might be banning your proxy's traffic instead of the bad actors you want to ban.
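The idea is to trust a forwarded header only when the request really came through your proxy. In Drupal 8 this is what the reverse_proxy and reverse_proxy_addresses settings in settings.php are for. The sketch below is not Drupal's code, just an illustration of the usual right-to-left X-Forwarded-For logic; the trusted range 203.0.113.0/24 is a made-up documentation range standing in for your proxy's real addresses, and the helpers assume well-formed IP addresses.

```python
import ipaddress

# Hypothetical proxy ranges you trust; replace with your proxy's real ranges.
TRUSTED_PROXIES = [ipaddress.ip_network("203.0.113.0/24")]


def is_trusted(address):
    """Return True if the address falls inside a trusted proxy range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in TRUSTED_PROXIES)


def client_ip(remote_addr, forwarded_for=""):
    """Return the address a ban should apply to.

    X-Forwarded-For is only trusted when the direct peer is a known proxy.
    The header is then read from right to left, skipping further trusted
    proxies, and the first untrusted hop is taken as the client.
    """
    if not is_trusted(remote_addr):
        return remote_addr  # not from our proxy, so ignore the header
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return remote_addr


if __name__ == "__main__":
    print(client_ip("198.51.100.7", "192.0.2.1"))  # spoofed header is ignored
    print(client_ip("203.0.113.10", "10.0.0.1, 192.0.2.55, 203.0.113.20"))
```

Reading from the right matters because a client can put anything it likes at the left of the header; only the entries appended by your own proxies can be relied on.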
By the way, if you're using Drupal 7 then path2ban is the equivalent module here.
In the end, I decided to install the Perimeter Defence module to see how things went. After just 24 hours of the module being active on the site it had blocked 30 IP addresses. Obviously, please don't visit the above paths on this site or you will be permanently blocked. I will clear out the blocked IP addresses from time to time to give anyone who accidentally visited one of the paths a chance to get back onto the site.
What methods have you employed to stop these requests? Comment below and let us know.
One advantage of a 404 response over the Perimeter module is that a "404 error" is just a "404 error", while "xxx has been banned" announces that a wall has been raised and might tempt the attacker to search deeper for a breach.
That's true, I'm just trying Perimeter at the moment. I'll see how it goes.
Looking at the sort of 404 requests the site gets it's clear that most of the time it's just a script being run. There'll be multiple instances of a path and variations of that path all requested within a second of each other.