There’s a popular and seemingly authoritative blog post called On Rocks and Sand about optimizing PostgreSQL tables for size by re-ordering their columns to eliminate internal alignment padding. It explains how variable-length types incur some extra padding if they’re not at the end of the table:
This means we can chain variable length columns all day long without introducing padding except at the right boundary. Consequently, we can deduce that variable length columns introduce no bloat so long as they’re at the end of a column listing.
And at the end of the post, to summarize:
Sort the columns by their type length as defined in pg_type.
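As a rough illustration of that rule, here is a minimal Python sketch. The table and its column names are hypothetical. The typlen values are the ones pg_type reports for these built-in types. Reading the post's advice as "sort by typlen, largest first" is our interpretation.

```python
# pg_type.typlen for a few built-in types; -1 marks a varlena type.
TYPLEN = {"int8": 8, "int4": 4, "int2": 2, "bool": 1, "text": -1}

# Hypothetical table: (column name, type name).
columns = [("note", "text"), ("flag", "bool"), ("id", "int8"),
           ("qty", "int2"), ("ref", "int4")]


def blog_order(cols):
    """Sort columns by pg_type.typlen, largest first.

    Because varlena types have typlen = -1, a descending sort always
    pushes them to the end of the column list.
    """
    return sorted(cols, key=lambda c: TYPLEN[c[1]], reverse=True)


print([name for name, _ in blog_order(columns)])
# ['id', 'ref', 'qty', 'flag', 'note']
```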
There’s a library called pg_column_byte_packer that integrates with Ruby’s ActiveRecord to automatically re-order columns to reduce padding. Its README cites the above blog post, and the library generally does what the post describes.
However, pg_column_byte_packer does not return results consistent with the blog post it cites. The blog post draws on PostgreSQL’s internal pg_type.typlen, which always puts variable-length columns at the end because their typlen is -1. pg_column_byte_packer instead gives them an alignment of 3.
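For contrast, this Python sketch models the ordering described for pg_column_byte_packer: columns are sorted by alignment, with varlena columns treated as though their alignment were 3. It is a hypothetical reimplementation for illustration, not the gem's actual code. It ignores any special treatment the gem may give to length-limited types, and the table is made up.

```python
# Alignment in bytes implied by pg_type.typalign
# ('d' = 8, 'i' = 4, 's' = 2, 'c' = 1). Note text's real typalign is 'i'.
TYPALIGN = {"int8": 8, "int4": 4, "int2": 2, "bool": 1, "text": 4}
# pg_type.typlen; -1 marks a variable-length (varlena) type.
TYPLEN = {"int8": 8, "int4": 4, "int2": 2, "bool": 1, "text": -1}

# Hypothetical table: (column name, type name).
columns = [("note", "text"), ("flag", "bool"), ("id", "int8"),
           ("qty", "int2"), ("ref", "int4")]


def packer_like_order(cols):
    """Sort by alignment, largest first, treating varlena columns as alignment 3.

    Hypothetical model, not the gem's code: varlena columns land after
    4-byte-aligned columns and before 2-byte-aligned ones.
    """
    def sort_key(col):
        type_name = col[1]
        return 3 if TYPLEN[type_name] == -1 else TYPALIGN[type_name]

    return sorted(cols, key=sort_key, reverse=True)


print([name for name, _ in packer_like_order(columns)])
# ['id', 'ref', 'note', 'qty', 'flag']
```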
pg_column_byte_packer has an explanatory comment:
    # These types generally have an alignment of 4 (as designated by pg_type
    # having a typalign value of 'i', but they're special in that small values
    # have an optimized storage layout. Beyond the optimized storage layout, though,
    # these small values also are not required to respect the alignment the type
    # would otherwise have. Specifically, values with a size of at most 127 bytes
    # aren't aligned. That 127 byte cap, however, includes an overhead byte to store
    # the length, and so in reality the max is 126 bytes. Interestingly TOASTable
    # values are also treated that way, but we don't have a good way of knowing which
    # values those will be.
    #
    # See: `fill_val()` in src/backend/access/common/heaptuple.c (in the conditional
    # `else if (att->attlen == -1)` branch.
    #
    # When no limit modifier has been applied we don't have a good heuristic for
    # determining which columns are likely to be long or short, so we currently
    # just slot them all after the columns we believe will always be long.
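The rule that comment describes can be sketched in Python as a simplified model of the varlena branch of fill_val(). It assumes the value is stored inline and uncompressed (no TOAST) and that the column is packable with typalign 'i'. The helper names align_up and place_varlena are hypothetical. Offsets are relative to the start of the tuple's data area, which PostgreSQL itself MAXALIGNs.

```python
# Largest total size (header included) of a 1-byte-header "short" varlena.
VARATT_SHORT_MAX = 0x7F  # 127


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def place_varlena(offset, payload_len, typalign=4):
    """Return (start, end) offsets of an inline, uncompressed varlena value.

    If payload + 1-byte header fits in 127 bytes, the value is stored as a
    short varlena with no alignment. Otherwise it keeps its 4-byte header
    and is aligned to the type's typalign.
    """
    if payload_len + 1 <= VARATT_SHORT_MAX:
        return offset, offset + 1 + payload_len
    start = align_up(offset, typalign)
    return start, start + 4 + payload_len


print(place_varlena(6, 10))   # (6, 17): short, no padding
print(place_varlena(6, 200))  # (8, 212): long, padded from 6 to 8
```

The 126-byte cutoff in the comment falls out of the `payload_len + 1 <= 127` check: a 126-byte payload stays unaligned, a 127-byte one does not.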
The comment appears to be correct, as text columns do have a pg_type.typalign of 'i' (4 bytes), but they’ve also got a pg_type.typlen of -1, which the blog post argues packs most tightly when placed at the end of the table.
So for a table with a four-byte-aligned column, a text column, and a two-byte-aligned column, pg_column_byte_packer will put the text column right between the other two. There’s even a unit test asserting that this always happens.
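To see how much the two placements can differ, this Python sketch computes the size of the tuple's data area for an int4, int2, and text column in both orders. It assumes one row with no NULLs, an uncompressed inline text value, and the short-varlena rule from fill_val() (payloads up to 126 bytes get a 1-byte header and no alignment). The helper names are hypothetical, and the tuple header is ignored.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# Each column: (typlen, alignment in bytes). typlen -1 marks a varlena.
INT4 = (4, 4)
INT2 = (2, 2)
TEXT = (-1, 4)


def data_size(columns, text_len):
    """Bytes used by the data area for one row (no header, NULLs, or TOAST)."""
    offset = 0
    for typlen, align in columns:
        if typlen == -1:
            if text_len + 1 <= 127:   # short varlena: 1-byte header, unaligned
                offset += 1 + text_len
            else:                     # 4-byte header, aligned to typalign
                offset = align_up(offset, align) + 4 + text_len
        else:
            offset = align_up(offset, align) + typlen
    return offset


packer = [INT4, TEXT, INT2]  # text between the 4- and 2-byte columns
blog = [INT4, INT2, TEXT]    # text last

for n in (10, 200):
    print(n, data_size(packer, n), data_size(blog, n))
# 10 18 17
# 200 210 212
```

Under these assumptions, text-last wins by a byte when the value is short, while the in-between placement wins by two bytes when it is long. Keep in mind that each tuple is also padded to MAXALIGN (8 bytes on common 64-bit builds) when placed on the page, so differences this small may not change the space actually consumed.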
My question here is: what order of columns actually packs for minimal space, given that text has a pg_type.typalign of 'i' (4 bytes) but also a pg_type.typlen of -1?
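One way to explore the question empirically is to brute-force every column order for an assumed row and keep the smallest data area. In this Python sketch, the table, its value lengths, and all helper names are hypothetical. It models only inline, uncompressed values and ignores the tuple header, null bitmap, and TOAST.

```python
from itertools import permutations


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def data_size(columns):
    """Bytes used by a tuple's data area for one assumed row.

    Each column is (name, typlen, alignment_bytes, payload_len); payload_len
    is only used for varlena columns (typlen == -1).
    """
    offset = 0
    for _name, typlen, align, payload in columns:
        if typlen == -1:
            if payload + 1 <= 127:   # short varlena: 1-byte header, unaligned
                offset += 1 + payload
            else:                    # 4-byte header, aligned to typalign
                offset = align_up(offset, align) + 4 + payload
        else:
            offset = align_up(offset, align) + typlen
    return offset


def best_orders(columns):
    """Try every column order; return (min size, list of best orders)."""
    by_size = {}
    for perm in permutations(columns):
        by_size.setdefault(data_size(perm), []).append([c[0] for c in perm])
    smallest = min(by_size)
    return smallest, by_size[smallest]


# Hypothetical row: a 20-byte title and a 300-byte body, listed in typlen order.
table = [
    ("id", 8, 8, 0),       # int8
    ("ref", 4, 4, 0),      # int4
    ("qty", 2, 2, 0),      # int2
    ("title", -1, 4, 20),  # text, short
    ("body", -1, 4, 300),  # text, long
]

size, orders = best_orders(table)
print(size)              # 339
print(data_size(table))  # 340 for the typlen-sorted order
```

Because the short/long decision is made per value, the best order depends on each row's actual lengths, so no single column order is guaranteed to be optimal for every row. Factorial growth also limits brute force to small tables.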