A recent Wired article about the Parler data hack talked about how a hacker group was able to steal publicly available information from the Parler website using an Insecure Direct Object Reference (IDOR) or enumeration attack. This type of attack involves a hacker looking at the structure of the site and attempting to guess the next available resource by looking at the URL. Apparently, terabytes of Parler's data were downloaded by simply enumerating through the IDs of their publicly available posts.
Taking an arbitrary example in Drupal, let's say that you had a content type called post that was loaded using the type and ID of the content in the URL. This means that when a user visits the post they might see a URL with the path /post/1. If a user wanted to see the next available post on the site then they could just increase the ID by one and see if the next post loads or not. It becomes almost trivial to build a script to download every post on the site by just looping through the IDs of the posts. This is almost what happened in the Parler attack, although that attack was done through their API service rather than through their website. The principle is the same.
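To show how little effort this takes, here is a minimal Python sketch of the loop described above. The `fetch_status` callable, the `FAKE_SITE` dictionary and the `/post/<id>` path pattern are hypothetical stand-ins used for illustration, so no real site is contacted.

```python
from typing import Callable, List


def enumerate_posts(fetch_status: Callable[[str], int], first_id: int, last_id: int) -> List[str]:
    """Return every /post/<id> path in the given ID range that answers with a 200."""
    found = []
    for post_id in range(first_id, last_id + 1):
        path = f"/post/{post_id}"
        # A 200 tells the attacker that a post exists at this ID.
        if fetch_status(path) == 200:
            found.append(path)
    return found


# Hypothetical stand-in for a site: posts 1, 2 and 4 exist, everything else is a 404.
FAKE_SITE = {"/post/1": 200, "/post/2": 200, "/post/4": 200}


def fake_fetch(path: str) -> int:
    return FAKE_SITE.get(path, 404)


print(enumerate_posts(fake_fetch, 1, 5))
```

The only knowledge the loop needs is the URL pattern and a starting number, which is exactly what a sequential ID in the path gives away.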
You might not think this is much of a problem, but you can actually leak a lot of information from a site like this without even realising it. User profiles are especially problematic, as improperly controlled user profile pages can mean that a site leaks all of its user data without hackers needing to break into the system. It doesn't have to be user profiles though. Many users will post identifiable information in public posts, so any system that allows all of these posts to be enumerated will leak quite a lot of information that can be easily linked back to users.
Twitter have a good mechanism of preventing enumeration attacks on their Tweets by giving each one a long unique number. Given the number of posts being sent to Twitter every minute this is quite a feat of engineering to get right without clashes. As an example, if I take a recent Tweet and look at the numbers one higher and one lower than the ID of the Tweet then I find nothing.
- https://twitter.com/philipnorton42/status/1354010770338697215 <- 404
- https://twitter.com/philipnorton42/status/1354010770338697216 <- an actual tweet
- https://twitter.com/philipnorton42/status/1354010770338697217 <- 404
This prevents me from just cycling through all the Tweets I can find to grab data. You would be blocked by Twitter a long time before you were able to find the next Tweet via an enumeration attack. You could try and grab a load of data from the API through searching, but Twitter have very careful usage limits on the number of requests that can be performed and so you wouldn't be able to download any significant portion of Tweets before being blocked.
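A rough way to see why a large, sparsely used ID space resists enumeration is to work out how many guesses an attacker needs per hit. The sketch below assumes that IDs are scattered uniformly across the ID space, which is a simplification and not a description of how Twitter actually allocates IDs, and the figures used are illustrative rather than real Twitter numbers.

```python
def expected_guesses_per_hit(id_space: int, items: int) -> float:
    """Average number of uniformly random guesses needed to land on one existing ID."""
    if items <= 0 or items > id_space:
        raise ValueError("items must be between 1 and id_space")
    # Each guess succeeds with probability items / id_space, so the
    # expected number of guesses until the first success is the inverse.
    return id_space / items


# Sequential IDs with no gaps: every guess is a hit.
print(expected_guesses_per_hit(1_000_000, 1_000_000))

# Illustrative sparse case: a billion items spread across a 64-bit ID space.
print(f"{expected_guesses_per_hit(2**64, 10**9):.2e}")
```

With sequential IDs every request returns content, whereas in the sparse case billions of requests are wasted for each hit, and that is long enough for usage limits to block the client.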
Out of the box, Drupal is quite susceptible to this kind of attack as everything tends to be referenced in the same kind of way. You normally see a URL constructed using the type of entity, followed by the ID of the entity (eg, /node/1). This is the case for pages, taxonomy terms, users and most entities created in Drupal. Interestingly, the Drupal security team do not see this as a problem, and I'm inclined to agree with their response here. Drupal is meant to be a web framework and allows all kinds of sites to be created by plugging together components in different ways. Drupal is really good at being a community driven site and does a lot of things that go towards this aim with things like user profile pages, moderated comments and an extensive permissions system baked in. This kind of information leakage only becomes a problem when the page itself contains either sensitive or personally identifiable information and it is up to the site owners to act on this information accordingly.
If you are interested in preventing this kind of attack or information leakage for your Drupal site then I have put together a list of things you can start looking at. From this point on I'm going to assume that you want to lock down your content so that public posts are only found via a unique URL and that user profile pages are inaccessible.
User Profiles Permissions
Drupal will create user profile pages under the path "user/uid", where uid is the ID of the user. These pages contain the user name and how long they have been a member of the site. The default permission to access the user profile page (called "view user information") is disabled for non administrator accounts so you need to actively turn this on if you want your users to be viewable to each other or the public. This means that if a non-administrator user attempted to access a user account page they would get an access denied (403) error. This doesn't stop anyone from guessing how many user accounts you have and what their user IDs are, but at least their information is safe by default.
Even if you don't allow anonymous users to see user profile pages it is probably a very good idea to heavily restrict the data that will appear on the page. If you create a field for a user then you need to make absolutely sure that the information is secure and only printed to the user profile if absolutely needed. I have seen many sites that have meta information or private fields for users, so if you have fields of that type you need to make very sure that they are hidden from the user unless needed. Drupal doesn't have field level permissions out of the box, but you can easily use a hook_form_alter() to manipulate the user form to hide fields. Preventing them from being printed out is easier and just means having the display formats not contain those fields.
Thankfully, there is no direct way to configure Drupal to print out the user's email address on a Drupal site. In order to do this you will need to actively install modules or write template code. It is essential that you do not print out the user's email address to anonymous users from a spam point of view. There are bots on the internet that are specifically designed to seek out email addresses and by exposing your users' email addresses you make them a target for email spam.
The Pathauto module is pretty much a default requirement for Drupal projects and can be used to prevent this type of attack by simply obscuring the URL. If you do allow user profiles to be shown on your site then preventing enumeration of your users' profiles can be done easily by using Pathauto to remove the ID from the URL structure.
The most basic approach, or when creating a community driven site, is to use a path that contains the user's name. Something like this is quite common.
I feel I should point out that exposing the username like this can make it easier for an attacker to gain access to the user's account as they will now have half of the information needed. All they need to do is guess the password and they are into the account. Drupal, however, has a built in brute force prevention mechanism in the form of the flood service. This means that if an attacker was to attempt to break into the user's account they would only be able to guess the password a few times before blocking the account. Whilst annoying to the end user, it does prevent their account from being compromised. To read more about using the flood service take a look at my previous article on injecting flood into Drupal forms. If you are still worried about this then there are two factor authentication modules that can be used to add another layer of protection to the user accounts on your site.
Pathauto, used in conjunction with proper permissions, can obscure and prevent access to user profiles across your site. An alternative approach would be to include a hash or a UUID in the user's profile path, although that method is not built into Drupal and so you would need to write code in order to do that. Thankfully, there is a module called Token UUID that allows you to use the entity UUID of every object in Drupal as a token, which therefore allows for easy Pathauto integration.
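The difference between an ID-based path and a UUID-based path can be sketched in a few lines of Python. This is only an illustration of the idea and not how Pathauto or Token UUID are implemented. The `/profile/` prefix and both function names are hypothetical.

```python
import uuid


def serial_alias(uid: int) -> str:
    """Path built from a sequential ID: the neighbouring paths are trivial to guess."""
    return f"/user/{uid}"


def uuid_alias(entity_uuid: uuid.UUID) -> str:
    """Path built from a random UUID: knowing one path says nothing about the others."""
    return f"/profile/{entity_uuid}"


# Knowing /user/41 immediately suggests /user/42.
print(serial_alias(41), serial_alias(42))

# Two randomly generated (version 4) UUIDs share no guessable relationship.
print(uuid_alias(uuid.uuid4()))
print(uuid_alias(uuid.uuid4()))
```

A version 4 UUID carries 122 random bits, so stepping through neighbouring values is not a practical way to find another profile.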
Pathauto is only half the story when it comes to paths, the other half being filled by the Rabbit Hole module.
Drupal exposes a lot of different paths connected to entities and even if you have your permissions set up correctly you can leak information about how many users you have and a few other structural bits and pieces. Many Drupal sites also use things like taxonomy terms in order to segment content within the site. All taxonomy terms are given a path by default so it is possible for a user to guess the URL and see a list of all the internal categories in the site. Not only this, but if your normal users happen to visit a structural taxonomy page then the chances are that it will not have been themed correctly and they will get a bad experience.
The Rabbit Hole module can allow you to restrict or even just hide access to all forms of entities on your site. This means that instead of accidentally visiting a structural taxonomy page the user would get a 404.
The Rabbit Hole module's role in preventing user enumeration is to change the access denied response (403) to a page not found response (404) when trying to view a user's profile. This makes it impossible to enumerate over your users as there will be no difference between a normal page not found and a user's profile. If you set your user entities to be configured like this in the Administration - Configuration - People - Account settings form then this will have the desired effect.
I use the Rabbit Hole module all the time on Drupal sites and it has come in very handy in reducing the footprint of Drupal. It's less useful with pages of content, but it does prevent accidental URL exposure that you weren't expecting.
Username Enumeration Prevention
If you are worried about information disclosure around users then you can use the Username Enumeration Prevention module. This module has a number of different functions, but will prevent anyone trying to guess how many users you have and what their usernames are. This starts with things like preventing the normal user path from being used, but also prevents the user login and registration forms from exposing information about users. This means that if a user attempts to log in using an email address they won't be told anything identifiable about their login attempt. Instead of saying "your password is incorrect" the error message will say "username or password is incorrect" and won't give away that the user is, in fact, a user.
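The principle behind the generic error message can be shown with a small Python sketch. It is not the module's code. The `login_error` function and the `accounts` dictionary are hypothetical, and plain-text passwords are used only to keep the example short (a real system would compare password hashes).

```python
from typing import Dict, Optional

GENERIC_ERROR = "Username or password is incorrect."


def login_error(accounts: Dict[str, str], name: str, password: str) -> Optional[str]:
    """Return None on success, otherwise one message that is identical for every failure."""
    stored = accounts.get(name)
    if stored is not None and stored == password:
        return None
    # Unknown user and wrong password are deliberately indistinguishable.
    return GENERIC_ERROR


accounts = {"alice": "correct horse"}
print(login_error(accounts, "alice", "wrong"))
print(login_error(accounts, "nobody", "wrong"))
```

Because both failures produce the same text, the login form can no longer be used to test whether a given account exists.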
This module also has a similar function as the Rabbit Hole module in that it will automatically produce a 404 response instead of a 403 when viewing a user's profile page. It also plugs a little gap in the path redirect structure where Drupal will helpfully redirect the page from /user/1 to the proper canonical page, even if that page ends up being not found. As this gives a little clue to the existence of a user at that address this is an important prevention step in making sure that the user can't be guessed at all.
Adding this module is especially important when looking at sites that handle sensitive user data. Things like specialist interest sites, dating sites or even sites that sell alcohol should be very careful about leaking what user accounts are registered.
I've concentrated on handling enumeration attacks through the front end interface so far. Just as important is making sure that your Drupal API layer is secure. By default, the JSON:API module will enable a bunch of paths to various objects in your site. For example, you can visit URLs like /jsonapi/user/user to see a list of users available on the site. In the JSON:API interface entity and field access is respected, along with any other validation constraints. This essentially means that you can't just get and post data to the API without having authentication.
Although the JSON:API is controlled by the Drupal permissions and access system, it is somewhat vulnerable to information leakage due to the fact that you can ask it for a neatly paginated list of user accounts. Thankfully, the user accounts are controlled by the permission system and so they won't be able to get much more than a display name. It is also important to know that the JSON:API doesn't use the UID to find the user. Instead the user records are returned by the UUID of the user so to find information on a particular user you would need to use the URL /jsonapi/user/user/dd0d36fb-d136-4e72-982e-545294ae9ad8.
There isn't much of an interface to the JSON:API module, but you can install the JSON:API Extras module to find out more about what endpoints are exposed and what data fields are included.
With the JSON:API Extras module installed you can see exactly what endpoints you have enabled and disable them. I highly recommend you disable access to everything unless you have a specific requirement to use it. Although keeping an endpoint enabled 'just in case' seems like a good idea, in reality it is just another attack surface for your data.
In addition to preventing access to certain endpoints it is also a very good idea to add rate limiting to your API. Rate limiting is the practice of setting a limit on the number of requests per minute that a user can perform, which prevents them from using too much resources on your site. The Rate Limits module can be installed and configured to do just this. This module is built upon the Drupal Flood system and so can be used to prevent any user or IP address from accessing any route on your Drupal site without limit. The module is highly configurable and pluggable into different areas of Drupal. This isn't built just for the JSON:API module, but is a very good way of restricting unfettered access to your APIs.
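As an illustration of what rate limiting does, here is a minimal sliding window limiter in Python. It is a sketch of the general technique under simple assumptions (in-memory storage, a single process) and is not how the Rate Limits module or the Drupal flood system is implemented. The class name and parameters are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window_seconds` period."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # Injectable so the limiter can be tested without sleeping.
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str) -> bool:
        now = self.clock()
        hits = self.hits[client]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Usage with a fake clock: two requests per minute per client.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])
print([limiter.allow("203.0.113.7") for _ in range(3)])
```

The client key here is an IP address taken from a documentation range, but it could equally be a user ID or an API key, depending on what you want to limit.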
Fundamentally, if you run a Drupal site then enumeration attacks (or attacks of any kind) should always be on your radar. You need to know what sort of "shape" your Drupal site has, because if you forget about a corner that leaks information there is a good chance that your attackers will find it and exploit it. Just because information is hidden behind a URL and a sequential ID doesn't mean it's secure. Also, don't enable the JSON:API just to have a play and not do anything with it. It just creates another attack surface that, if you don't manage it correctly, can lead to disaster.
You should always have tests to double check that the pages you think are secure aren't fully accessible by anonymous users. This means that any future updates to your site that break that security can be caught before going live. Something as simple as "as an anonymous user, I should get a 404 when attempting to access a user profile page" should be part of your testing suite.
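A check like that can be very small. The following Python sketch expresses it as a helper that reports any profile path that doesn't return a 404. The `get_status` callable is an assumption: in a real test suite it would make an anonymous HTTP request against your site, whereas here a hypothetical in-memory `FAKE_SITE` stands in for it.

```python
from typing import Callable, Iterable, List


def leaking_profile_paths(get_status: Callable[[str], int], uids: Iterable[int]) -> List[str]:
    """Return every /user/<uid> path that answers with anything other than a 404."""
    leaks = []
    for uid in uids:
        path = f"/user/{uid}"
        if get_status(path) != 404:
            leaks.append(path)
    return leaks


# Hypothetical responses seen by an anonymous visitor: /user/2 is misconfigured
# and returns a 403, which reveals that the account exists.
FAKE_SITE = {"/user/1": 404, "/user/2": 403, "/user/3": 404}


def fake_status(path: str) -> int:
    return FAKE_SITE.get(path, 404)


print(leaking_profile_paths(fake_status, range(1, 4)))
```

Asserting that this list is empty for a range of user IDs gives you a regression test that fails as soon as a configuration change starts exposing profiles again.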
Modules like Pathauto and Rabbit Hole are quite commonly installed on Drupal sites and although they do provide a nice safety net for your paths you need to make sure they are configured correctly. It's quite a simple mistake to create a new structural content type like a webform or a FAQ question and not restrict the paths that the content type has. This can undo all of the careful configuration you have put in place so if you add anything new then add them to your enumeration tests.
Even if you are building a community site that allows access to posts and user profiles publicly then you should install some safeguards to prevent your entire site from being downloaded. Installing the Rate Limits module is a first step in preventing this sort of thing happening, but you should also lock your APIs behind some form of authentication to prevent users from using proxy servers to bypass your imposed limits.
There are a number of different techniques to securing your site from enumeration attacks, but they must be used as a coherent strategy rather than as separate parts. Don't just install the Pathauto module and think that the problem is solved as you will also need another module to, for example, prevent access to structural taxonomy tags.