2019-05-13 23:05 UTC

Fun MySQL fact of the day: subqueries in MySQL5.5

The year 2013 can be remembered by 3 separate things: Justin Timberlake, the series finale of Breaking Bad, and the release of MySQL 5.6. Though, if you aren't a MySQL historian, the release of MySQL 5.6 may have passed you by and you may have missed the significance as it pertains to subquery optimisation. So, today, we're going to take a quick look back at MySQL 5.5's subquery execution strategy.

Generally speaking, MySQL 5.5's subquery execution strategy can be defined as "simple", and while it's not quite as simple as I'm letting on, we can pretty much lump it into 2 separate strategies:

SUBQUERY, which tends to be chosen when using a scalar subquery that returns 1 column and 1 row for equality comparison, like SELECT id FROM outer_table WHERE inner_table_id = (SELECT id FROM inner_table WHERE a_unique_val = 'yay'). When using the SUBQUERY strategy, MySQL will first derive the scalar value of the subquery and then resolve all of the rows in outer_table where inner_table_id matches. In SQL terms, this is an uncorrelated subquery.
DEPENDENT SUBQUERY - which tends to be resolved in all other cases, like in SELECT id FROM outer_table WHERE inner_table_id IN (SELECT id FROM inner_table WHERE a_non_unique_val = 'sad') where the result of the subqueryis not a scalar value. In this case, MySQL will execute the inner query once for each row in outer_table. In this specific example, that's outer_table * inner_table rows, which, depending on the size of each table, could be pretty substantial. In SQL terms, this is a correlated subquery.

Sure, there's some more finesse to this and a dependent subquery isn't always the worst choice, but tomorrow, we'll start looking at how MySQL 5.6 solved this limitation. And then we'll never look back.