Fun MySQL fact of the day: unsafe logging
Over the last couple weeks, we've been discussing the MySQL binary log and building up our mental mode around the STATEMENT
-based binary log format. When using a STATEMENT
binary log format, MySQL very simply puts the SQL statement that issued a change into the binary log. Then, when we re-read the log through either mysqlbinlog
or replication, the statements can be re-executed verbatim. And while this seems like a pretty simple solution that's easy to understand and implement, it comes with a whole lot of fun.
Let's consider, for example, INSERT INTO my_table(id, uuid) VALUES(0, UUID())
, or even UPDATE my_table SET some_value = 1 WHERE some_value < RAND() * 100
. When MySQL logs these using the STATEMENT
binary log format, they'll be recorded in the bin log exactly the same way. And, of course, every time you run these queries, you'll get differing values for RAND()
and UUID()
. This puts our ability to perform point-in-time recovery in jeopardy and will cause a replica to have data inconsistent with its master. Similarly, as we'd discussed back on February 28, the same problem holds true for DML (Data Manipulation Language) statements with a LIMIT
but no ORDER BY
: the query is, essentially, non-deterministic. To our convenience, MySQL has documented all of the safe/unsafe operations when using STATEMENT
-based logging. Be certain you are looking at the documentation for your current version, because it changes on each release.
Thankfully, the MySQL community noticed this was somewhat undesirable, if not risky, out-of-the-box behaviour and MySQL eventually changed the default of binlog_format
from STATEMENT
to ROW
in MySQL 5.7.7 and above. We'll discuss the ROW
format next week, but in the meantime, I suggest you SHOW VARIABLES LIKE 'binlog_format'
to make sure you are, in fact, not using STATEMENT
-based binary logging.