Fun MySQL fact of the day: the phantom record menace
Yesterday, we created an index to prevent a full table scan from locking the entire list_of_things table during a single UPDATE. I mentioned it was an improvement, but not a complete fix. Today, we'll see why.
Databases aren't incredibly useful if you can't INSERT data into them, so let's have a few other accounts try to insert some data into the same table as yesterday:
account 5> BEGIN; UPDATE list_of_things SET modified = now() WHERE account_id = 5;
account 3> BEGIN; INSERT INTO list_of_things VALUES(0, 3, now(), 'account 3''s thing');
account 6> BEGIN; INSERT INTO list_of_things VALUES(0, 6, now(), 'account 6''s thing');
account 9> BEGIN; INSERT INTO list_of_things VALUES(0, 9, now(), 'account 9''s thing');
What happens here? Well, the INSERT for account_id=9 completes normally, but the INSERTs for account_id={3,6} stall until the UPDATE on account_id=5 commits. This may again seem somewhat surprising, but it is once again InnoDB preventing phantom reads: all records with account_id=5 must be updated, with no exception. This means that no new records with account_id=5 may be added until the transaction commits or rolls back.
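By the way, you don't have to guess at who is blocking whom. If you have the sys schema (shipped with MySQL 5.7 and later), a quick sketch from any other connection (labelled admin here just for illustration) while accounts 3 and 6 are stalled:

admin> SELECT waiting_query, blocking_query, wait_age FROM sys.innodb_lock_waits;

You should see both stalled INSERTs waiting on the UPDATE from account 5.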
"But 3
and 6
aren't 5
", you protest! Indeed, they aren't. This is the result of gap locks, which InnoDB uses to prevent new rows with account_id=5
when updating all existing rows with account_id=5
. More specifically, this is actually the result of two gap locks: a "next-key lock", which locks account_id=(1,5]
, and a "gap lock" for account_id=[5,9)
. We'll consider the technical reasons tomorrow, but in any case, the net result for us is that as many as 6
accounts are impacted by the actions of 1 account.
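You don't have to take my word for those ranges, either. On MySQL 8.0, you can inspect the locks directly while the UPDATE still holds them: in performance_schema.data_locks, next-key locks report a LOCK_MODE of X, pure gap locks report X,GAP, and LOCK_DATA shows the index record each lock hangs off of. A sketch:

admin> SELECT index_name, lock_type, lock_mode, lock_data FROM performance_schema.data_locks WHERE object_name = 'list_of_things';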
Now, since this is a gap lock, we can say that the denser the data are, the narrower the locks will be: had rows existed for account_id=4 and account_id=6, the locked ranges would shrink to (4,5] and [5,6). But that's a variable often out of our control. Thankfully, some things are in our control, and we can address this in a few ways:
* Add a unique constraint to account_id, which will generally prevent gap locks (sketched after this list). This, naturally, has broad implications on your schema design and may not be applicable.
* Use READ-COMMITTED isolation, which will generally prevent gap locks (also sketched below). This may have subtle implications on your data access patterns and should be reasoned out.
* Use multiple transactions to find all of the records where account_id=5 (preferably from a replica) and update them in small batches (remember the undo logs) by primary key, as in the final sketch below. This is the most preferable option, for a number of reasons we have already discussed and will discuss more in the future.
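For the first option, the DDL is a one-liner. Of course, it only helps if your model truly allows at most one row per account, which our sample data clearly doesn't, so treat this purely as a sketch: with an equality search on a unique index, InnoDB locks only the matching record and skips the gaps.

admin> ALTER TABLE list_of_things ADD UNIQUE KEY uniq_account_id (account_id);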
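For the second option, account 5 only needs to change its own session before opening the transaction. Under READ-COMMITTED, InnoDB skips gap locking for ordinary searches (it still uses it for duplicate-key and foreign-key checks), so the INSERTs sail through:

account 5> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
account 5> BEGIN; UPDATE list_of_things SET modified = now() WHERE account_id = 5;
account 3> BEGIN; INSERT INTO list_of_things VALUES(0, 3, now(), 'account 3''s thing'); -- no longer stalls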
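And for the third option, a sketch, assuming the leading 0 in our INSERTs means the first column is an auto-increment primary key named id (the specific ids below are illustrative):

replica> SELECT id FROM list_of_things WHERE account_id = 5;
account 5> BEGIN; UPDATE list_of_things SET modified = now() WHERE id IN (101, 117, 123); COMMIT;
account 5> BEGIN; UPDATE list_of_things SET modified = now() WHERE id IN (148, 156, 171); COMMIT;

Each batch locks only the handful of rows it names by primary key, commits quickly, and keeps its undo log small.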
Either of the last two solutions makes phantom reads possible, so you just have to decide which is more important for your specific use case: preventing phantom reads, or higher throughput and more predictable latency by means of narrower locks.