Fun MySQL fact of the day: buffer pool size
Yesterday, we started to discuss the InnoDB buffer pool and I noted that 80% of a system's memory makes a decent starting size for the buffer pool. However, this is somewhat meaningless since we didn't consider how much memory we'd want our system to have. Today we will.
To start, we need to summarise a (fun) fact we're going to expand upon over the next couple days: in InnoDB, all reads and writes happen in the buffer pool. Specifically, if you can remember back to late March and early April when we discussed InnoDB's extents, pages, and records, you'll recall that InnoDB organises index data into records that are stored on 16KB pages. It's these very same 16KB pages that get loaded into the InnoDB buffer pool, and when InnoDB needs to read from or write to a page, it happens only in the buffer pool. We'll discuss this more in the coming days.
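Just to give a sense of scale, here's a quick back-of-the-envelope calculation (plain Python, nothing MySQL-specific) of how many of those 16KB pages a data set like the 200GB one in the scenarios below would span:

```python
# How many default 16KB InnoDB pages does a 200GB data set span?
# (200GB is the example data set used in the scenarios below.)
PAGE_SIZE_BYTES = 16 * 1024      # default innodb_page_size
DATA_SET_BYTES = 200 * 1024**3   # 200GB of index data

pages = DATA_SET_BYTES // PAGE_SIZE_BYTES
print(f"{pages:,} pages")        # 13,107,200 pages
```

That's about 13 million pages that InnoDB may shuffle between disk and the buffer pool.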
Now, let's imagine we're running a database with 200GB of data in two distinct scenarios:
- a server with 64GB of memory dedicating 51.2GB to the InnoDB buffer pool, and
- a server with 256GB of memory dedicating 204.8GB to the InnoDB buffer pool.
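Those two buffer pool sizes aren't arbitrary, by the way; they're just yesterday's 80%-of-memory rule of thumb applied to each (hypothetical) server:

```python
# Yesterday's rule of thumb: start the buffer pool at ~80% of system memory.
# These are the two hypothetical servers from the scenarios above.
for ram_gb in (64, 256):
    pool_gb = ram_gb * 0.8
    print(f"{ram_gb}GB of memory -> {pool_gb:.1f}GB buffer pool")
# 64GB of memory -> 51.2GB buffer pool
# 256GB of memory -> 204.8GB buffer pool
```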
Knowing what we have just learned, a few things probably pop into your head:
- A 200GB database won't fit into a 51.2GB buffer pool. This means that we can hold about 25% of the database in memory at any point in time and will have to read from disk for the other 75%.
- A 200GB database will theoretically fit into a 204.8GB buffer pool. But, if you remember back to March 29 and April 1, index fragmentation is a real thing. In practice, the entire data set still likely won't fit into the buffer pool, but it depends on how badly fragmented your indices are. Let's pretend we have 10% fragmentation for an extra 20GB of pages. In this case, we can hold about 93% of the database in memory at any point in time and will have to read from disk for the other 7%.
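If you want to sanity-check those percentages, the arithmetic is simply buffer pool size divided by on-disk data size (the 10% fragmentation overhead is, again, just an illustrative guess, and it only matters for the large pool, which would otherwise fit the whole data set):

```python
# Resident fraction = buffer pool size / total on-disk page size.
data_gb = 200.0

small_pool = 51.2 / data_gb            # pool is far smaller than the data,
                                       # so fragmentation barely moves this
large_pool = 204.8 / (data_gb * 1.10)  # 10% fragmentation -> 220GB of pages

print(f"51.2GB pool:  ~{small_pool:.1%} in memory")   # ~25.6% in memory
print(f"204.8GB pool: ~{large_pool:.1%} in memory")   # ~93.1% in memory
```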
Of course, it's not exactly as simple as a 1:1 disk-to-memory size mapping, as there are more fun facts we'll eventually discuss. But, ignoring that, the consequences of having either 25% or 93% of your data in memory at any point in time are specific to your database's access patterns. Generally speaking, more is probably going to be better since, as we said yesterday, memory is orders of magnitude faster than disk. So, how big should your buffer pool actually be? Well, you'll have to consider your own budget, schema design, access patterns, and SLOs (Service Level Objectives) to figure that out.