PHP:CSI - Random Increment Insert

26th January 2019

A while ago I was working on some changes to a website and came across a block of code that made me stare blankly at my screen. The website I was working on was a custom build website, created by another developer at the company I was working with at the time. I have never done a PHP:CSI on this site before but remember being so amazed at what I found at the time that I made a note of it for future reference. I have pondered recently how to approach the analysis of the code.

The code I found was in a method that inserted an item into a database table. For some reason that I can't fathom the developer had opted to not use auto increment ID's and has instead developed a method that essentially randomly decided on an ID number for the item.

I can't paste the entire block of code here, but I can include the region of code that made me scratch my head in bewilderment.

  1. do {
  2. $ref = mt_rand(1, pow(2, 31)-1); // Generate a ref number
  3. $result = $mysqli->query("SELECT id FROM tblrefs WHERE ref = {$ref}");
  4. } while($result->num_rows == 0);

If you are experienced enough you'll probably realise the issue here. I thought it might be a good idea to step through everything going on here.

  • First we enter a do-while loop. This will continue to run whilst the condition of the while is true.
  • Next we generate a reference number and store it in the variable $ref. This is a random number (generated by the mt_rand() function) that lies between 1 and the maximum number of a 32 bit integer (calculated by raising 2 to the power of 31 and subtracting 1).
  • The next line is an SQL query to find out if the random number we generated already exists in the database table we are interested in.
  • The while condition is then looked at. If the number number of rows returned from the SQL query is not 0 then we have already used that number in the database and so we need to loop again to regenerate the number and so on. If it is 0 then the number is new and we can exit the do-while loop and go on to insert the item to the database.

This has a few problems.

  • To insert anything into the table we first need to first query the database to see if we have a matching ID.
  • For some reason we are figuring out the value of a maximum integer using maths, and not the built in PHP constant PHP_MAX_INT. Calculating this value on the fly by raising 2 to the power of 31 is actually quite an expensive operation. Why the developer decided not to use PHP_MAX_INT here is beyond me, but that's not the main issue here.
  • As we are generating a random number there is an ever increasing chance that we would create a number that already exists. This means that the more items are in the database the more likely it is that we will find an existing item in the database and have to regenerate our random number again.
  • As this was a busy website with a number of editors writing content there is the chance that this block of code would generate two identical IDs and attempt to insert them into the database at the same time. 

As a quick test I decided to see how long it would take this method to find a duplicate number from the randomly created value. I put together a quick script to generate random numbers using the same method as above until a duplicate was found, at which point we count the number of values created and start again.

  1. for ($outer = 0; $outer <= 100000; ++$outer) {
  2. // Reset refs array.
  3. $refs = array();
  4.  
  5. do {
  6. // Generate the random number.
  7. $ref = mt_rand(1, pow(2, 31)-1);
  8.  
  9. // Has it already been generated.
  10. if (in_array($ref, $refs)) {
  11. // Number already exists, record clash and break the loop.
  12.  
  13. // Count the number of random numbers generated.
  14. $count = count($refs);
  15.  
  16. // Record data to file.
  17. file_put_contents('counts.txt', $count . "\n", FILE_APPEND);
  18.  
  19. // Break out while loop.
  20. break;
  21. }
  22.  
  23. // Loop not broken, add the number to the list and continue.
  24. $refs[] = $ref;
  25.  
  26. } while(1==1);
  27. }

After running this script (which took a while to complete) I ran some quick analysis on it. Here are the results.

  1. Min: 824
  2. Max: 200448
  3. Mean: 58774.052132701
  4. Standard Deviation: 8117.1541955524

So it turns out that we only need to create a few thousand items before we start clashing numbers. This script only looks at the first clash, but it's clear that we are quite likely to generate duplicate ID's in the database the longer this situation continues.

A Better Approach

Whilst this approach works, a much better solution would be to simply use the existing auto increment system that exists in MySQL. I still can't figure out why the original developer didn't use this system.

Even if they didn't use auto increment they could have used a hash table algorithm to generate the ID for the item. This could have been done by generating a hash using the SHA of the information we are about to insert. In the unlikely event of a clash the database would return an error, at which point we would then need to figure out a new hash. Even in this situation it would still be more performant than randomly assigning an ID to a value and seeing if that exists first.

Once the code had gone down this route the problem was that it became difficult to swap to another method without a lot of unpicking. When I noticed this code I flagged it with my managers as an expensive and potentially dangerous operation and that we should spend some time fixing it. I left the company not too long after this, but I did see the site being migrated to a different platform so I can safely say that this code is no longer in production.

Comments

Permalink

He had used this may be because to make a unique random number to product (instead of showing user a primary key) that will be visible at the users end and from back-end there will be primary key to map that product related information .....
Yes but this is quite expensive..

rajesh (Wed, 09/11/2019 - 10:05)

Permalink

He had used that may be because to give product an unique number rather than showing him primary key.

Still it is bad practice.

rajesh (Wed, 09/11/2019 - 10:15)

Add new comment

The content of this field is kept private and will not be shown publicly.