Fun MySQL fact of the day: table pullouts
As noted yesterday, MySQL 5.6 was released with a large number of optimisations to the subquery engine. In particular, a number of changes were made to optimise for non-scalar, uncorrelated subqueries, allowing them to be run as... you guess it! Joins.
Except, these aren't any ordinary joins. No. They're semi-joins. "Why," you may ask? Well, unlike a "normal" join, a subquery has the following properties:
- The values of a subquery are never included in the result set. As such, MySQL only needs to retain the information whether or not a join would have happened and does not need to actually resolve the join.
- Further to this point, the results of a query with a subquery are, by ANSI-SQL specifications, unique: given an outer table with keys
(1,2,3}
and an inner table with join keys(1,1,2,2,3,3)
, the result set must be resolved to only(1,2,3)
.
And so it follows that these aren't "complete" joins but are, instead, semi-joins (which sounds better than "kinda-like-joins"). But how does this look? Today, we'll briefly discuss the "table pullout" semi-join strategy, which is used when MySQL determines it is able to "pull" the table out of a subquery and make it a join for you. Let's consider a company->employee
relation and a query wherein we wish to select all employees for an company by date of company incorporation:
EXPLAIN SELECT * FROM employee WHERE company_id IN (SELECT id FROM company WHERE date_of_incorporation = 1996);
If you run this EXPLAIN
in MySQL 5.5, you will likely see the company
table is resolved with a DEPENDENT SUBQUERY
, meaning the subquery will be executed once per employee
row. And, because this is a full table scan of employee
, that's a lot of rows! But, in MySQL 5.6, if a table pullout strategy is chosen, you will likely see that company
is being queried as a SIMPLE
type first, and then being joined to employee
, also using a SIMPLE
query type. In this case, MySQL will only need to evaluate the matching employee * company
rows, as opposed to an index scan over company
once per row in the employee
table.
When MySQL uses a table pullout strategy, you may not immediately identify it based on the EXPLAIN
plan. But, once you run the EXPLAIN
, what's actually happening will become completely apparent once you SHOW WARNINGS
:
***************************[ 1. row ]***************************
Level | Note
Code | 1003
Message | /* select#1 */ select `test`.`employee`.`id` AS `id`,`test`.`employee`.`company_id` AS `company_id` from `test`.`company` = `test`.`company`.`id`) and (`test`.`company`.`date_of_incorporation` = 1996))
Of course, if this still isn't enough, you can always SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE
, which, when table pullout strategy is chosen, will explain that select#2
is undergoing a transformation
to
semi-join
, and lists company
in pulled_out_semijoin_tables
.