Fun MySQL fact of the day: splitting pages
Last week, we looked at how InnoDB uses records, pages, and extents to organise data, and we mostly considered inserts on the PRIMARY
index using a monotonic primary key. Before moving on to secondary indices, we'll consider random-order inserts.
If we can continue to ignore some more details, let's consider an InnoDB table file like this next one, which we know is 256 pages for a total of 4MB. For simplicity and demonstrative purposes, let's pretend that each record is shy of 8KB, giving us 2 records per page:
[
extent [ page0=[11,10], page1=[20,21], ... page63[32,30] ],
extent [ page64=[22,23], page65=[26,25], ... page127[12,13] ],
extent [ page128=[15,16], page129=[33,34], ... page191[36,38] ],
extent [ page192=[50,39], page193=[17,19], ... page255[27,28] ],
]
I've purposefully jumbled up the data demonstrate how physical ordering differs from logical ordering, but the logical page order is as follows:
[0, 127, 128, 193, 1, 64, 65, 255, 63, 129, 191, 192]
Before we move on, one piece of information to consider is that InnoDB only uses the minimum key of a page to find a page. Now, let's say we need to insert the keys, 40
and 29
.
40
"naturally" belongs onpage 192
, since it's greater than39
but less than50
. Because this page is full, we'll need a new page. Because we we also have to preserve key order, we need to "split" the page and move50
.29
"naturally" belongs onpage 255
, since it's greater than28
and less than30
(which is on a different page). Again, we need to "split"page 255
and move28
.
[
extent [ page0=[11,10], page1=[20,21], ... page63[32,30] ],
extent [ page64=[22,23], page65=[26,25], ... page127[12,13] ],
extent [ page128=[15,16], page129=[33,34], ... page191[36,38] ],
extent [ page192=[39, ], page193=[17,19], ... page255[27, ] ],
extent [ page256=[50,40], page257=[28,29], ... page319[ ] ],
]
Our updated logical page order is now
[0, 127, 128, 193, 1, 64, 65, 255, 257, 63, 129, 191, 192, 256]
This example is, of course, overly simplified, but it should help visualize the consequences of random-order inserts:
- writes take extra work to move unrelated records,
- queries that need to visit multiple pages require
n
IO operations, wheren
is the number of extents/pages visited, and - unnecessary disk space is wasted and can only be reclaimed by dumping+restoring your database.
In an industry that continues to shift toward pricing by IOPS and storage, I hope you'll agree with me that random-order inserts is rarely ideal, especially in an OLTP (Online Transaction Processing) database.