Fun MySQL fact of the day: collation filtering
Yesterday, we considered some cases where MySQL may indicate Using where; Using index
in the Extra
field of a query plan. Today, we're going to look at one more case that existed up until MySQL 5.7, and while I haven't made a habit of digging up too many bugs as fun facts, this one was a shocker to me, personally, and is something you'll want to keep in mind.
In MySQL (in general, really), a collation is a set of rules that governs the way character sets are evaluated. For example, a character set can have a case-insensitive collation that considers 'a'
and 'A'
as equal (which is the out-of-the-box default) or one that considers 'Ö'
and 'OE'
equal. Another type of collation is padding: PAD SPACE
vs. NO PAD
. These properties specify the equality of 2 string types by either ignoring or not ignoring the white-space trailing a string. For example, in a PAD SPACE
collation, 'a'
equals 'a '
, but not in a NO PAD
collation. Until MySQL 8.0.0, all collations were PAD SPACE
, and you can demonstrating it by running SELECT 'a ' = 'a';
.
Over the last couple days, we've seen a couple situations where the MySQL query optimiser may or may not decide to do additional filtering on a query's results. This decision is based on a set of rules dictating when MySQL can "eliminate" a specific condition that MySQL knows is guaranteed to be true. For example, if a query with WHERE a=1 AND b=1
is run against a covering index in InnoDB, (a,b)
, MySQL knows that it doesn't have to do any extra filtering (InnoDB won't lie) and can drop both conditions on a=1
and b=1
thereby preventing any supplemental filtering. Now, consider a similar SELECT ... WHERE a='a '
: in a collation with PAD SPACE
, we expect all records starting with a
and ending zero or more spaces. But, until MySQL 5.7, the optimiser assumed otherwise and excluded all string types from condition elimination and always filtered them (bug #66983).
Now, when I said this was a bug, well, it was when all you have are PAD SPACE
collations. But for NOPAD
collations, this isn't a bug: when you use NOPAD
on a CHAR
type field, you will continue to get supplemental filtering due to the way storage engines may choose to optimise CHAR
storage. Thankfully, it's all fixed up the best it can be and you just have to remember the trade-offs of collation when designing your table.