I have a MySQL 8.0 table on Amazon RDS with ~35 columns, most of which are sizable
JSON blobs. The primary key is a 4-byte `INT UNSIGNED`. The table has 8M+ rows and is 50GB+ in size.
A simple COUNT(*) on this table with no WHERE clause took over 20 minutes. I read online that the clustered PRIMARY KEY B-tree stores at least 20 bytes for each JSON/TEXT/BLOB column, and that creating a separate secondary index on the primary key column could help. I tried this, and it improved the COUNT(*) to 10 minutes. EXPLAIN confirms that a full index scan on the new index was used.
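For reference, the statements I'm describing look like this (table/column/index names are placeholders for my actual schema):

```sql
-- Secondary index on the PK column alone. It stores only the 4-byte
-- INT values plus B-tree overhead, unlike the clustered PRIMARY KEY
-- index, which carries the full rows including the large JSON columns.
CREATE INDEX idx_pk_only ON my_table (id);

-- The query in question; EXPLAIN reports a full index scan
-- (type: index) on idx_pk_only rather than a clustered-index scan.
EXPLAIN SELECT COUNT(*) FROM my_table;
```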
I measured the size of this second index to be ~87MB (using the query in this SO answer: https://stackoverflow.com/a/36573801). That makes sense to me:
8M rows * 4 bytes ≈ 31MB of key data, and the other ~56MB is presumably B-tree overhead. The entire index easily fits in RAM, so I'd expect even a full index scan to complete in seconds, not 10 minutes.
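The back-of-the-envelope arithmetic above, as a sketch (the 8M row count is approximate):

```python
rows = 8_000_000        # approximate row count
key_bytes = 4           # 4-byte INT UNSIGNED primary key
payload_mib = rows * key_bytes / 2**20  # raw key data only, in MiB
print(f"{payload_mib:.1f} MiB")         # ~30.5 MiB of key data
# The measured ~87MB index leaves roughly 56MB for B-tree page
# headers, per-record overhead, and fill-factor slack.
```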
The AWS console shows Read IOPS spiking to the maximum while the COUNT(*) runs, so MySQL is reading from disk for some reason, despite the second index easily fitting in RAM. Re-running the same COUNT(*) immediately afterward took just as long, so it seems unlikely that the time was spent loading the index into RAM (and even if it was, the disk is an SSD, so 87MB would load quickly).
What’s causing my query to read from disk? How can I improve performance?